Mathematical Analysis
A Concise Introduction

Bernd S. W. Schröder
Louisiana Tech University
Program of Mathematics and Statistics
Ruston, LA

WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication

Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:
Schröder, Bernd S. W. (Bernd Siegfried Walter), 1966-
Mathematical analysis : a concise introduction / Bernd S. W. Schröder
p. cm.
ISBN 978-0-470-10796-6 (cloth)
1. Mathematical analysis. I. Title.
QA300.S376 2007
515-dc22
2007024690

Printed in the United States of America.

Contents

Table of Contents
Preface

Part I: Analysis of Functions of a Single Real Variable

1 The Real Numbers
  1.1 Field Axioms
  1.2 Order Axioms
  1.3 Lowest Upper and Greatest Lower Bounds
  1.4 Natural Numbers, Integers, and Rational Numbers
  1.5 Recursion, Induction, Summations, and Products
2 Sequences of Real Numbers
  2.1 Limits
  2.2 Limit Laws
  2.3 Cauchy Sequences
  2.4 Bounded Sequences
  2.5 Infinite Limits
3 Continuous Functions
  3.1 Limits of Functions
  3.2 Limit Laws
  3.3 One-sided Limits and Infinite Limits
  3.4 Continuity
  3.5 Properties of Continuous Functions
  3.6 Limits at Infinity
4 Differentiable Functions
  4.1 Differentiability
  4.2 Differentiation Rules
  4.3 Rolle’s Theorem and the Mean Value Theorem
5 The Riemann Integral I
  5.1 Riemann Sums and the Integral
  5.2 Uniform Continuity and Integrability of Continuous Functions
  5.3 The Fundamental Theorem of Calculus
  5.4 The Darboux Integral
6 Series of Real Numbers I
  6.1 Series as a Vehicle To Define Infinite Sums
  6.2 Absolute Convergence and Unconditional Convergence
7 Some Set Theory
  7.1 The Algebra of Sets
  7.2 Countable Sets
  7.3 Uncountable Sets
8 The Riemann Integral II
  8.1 Outer Lebesgue Measure
  8.2 Lebesgue’s Criterion for Riemann Integrability
  8.3 More Integral Theorems
  8.4 Improper Riemann Integrals
9 The Lebesgue Integral
  9.1 Lebesgue Measurable Sets
  9.2 Lebesgue Measurable Functions
  9.3 Lebesgue Integration
  9.4 Lebesgue Integrals versus Riemann Integrals
10 Series of Real Numbers II
  10.1 Limits Superior and Inferior
  10.2 The Root Test and the Ratio Test
  10.3 Power Series
11 Sequences of Functions
  11.1 Notions of Convergence
  11.2 Uniform Convergence
12 Transcendental Functions
  12.1 The Exponential Function
  12.2 Sine and Cosine
  12.3 L’Hôpital’s Rule
13 Numerical Methods
  13.1 Approximation with Taylor Polynomials
  13.2 Newton’s Method
  13.3 Numerical Integration

Part II: Analysis in Abstract Spaces

14 Integration on Measure Spaces
  14.1 Measure Spaces
  14.2 Outer Measures
  14.3 Measurable Functions
  14.4 Integration of Measurable Functions
  14.5 Monotone and Dominated Convergence
  14.6 Convergence in Mean, in Measure, and Almost Everywhere
  14.7 Product $\sigma$-Algebras
  14.8 Product Measures and Fubini’s Theorem
15 The Abstract Venues for Analysis
  15.1 Abstraction I: Vector Spaces
  15.2 Representation of Elements: Bases and Dimension
  15.3 Identification of Spaces: Isomorphism
  15.4 Abstraction II: Inner Product Spaces
  15.5 Nicer Representations: Orthonormal Sets
  15.6 Abstraction III: Normed Spaces
  15.7 Abstraction IV: Metric Spaces
  15.8 $L^p$ Spaces
  15.9 Another Number Field: Complex Numbers
16 The Topology of Metric Spaces
  16.1 Convergence of Sequences
  16.2 Completeness
  16.3 Continuous Functions
  16.4 Open and Closed Sets
  16.5 Compactness
  16.6 The Normed Topology of $\mathbb{R}^d$
  16.7 Dense Subspaces
  16.8 Connectedness
  16.9 Locally Compact Spaces
17 Differentiation in Normed Spaces
  17.1 Continuous Linear Functions
  17.2 Matrix Representation of Linear Functions
  17.3 Differentiability
  17.4 The Mean Value Theorem
  17.5 How Partial Derivatives Fit In
  17.6 Multilinear Functions (Tensors)
  17.7 Higher Derivatives
  17.8 The Implicit Function Theorem
18 Measure, Topology, and Differentiation
  18.1 Lebesgue Measurable Sets in $\mathbb{R}^d$
  18.2 $C^\infty$ and Approximation of Integrable Functions
  18.3 Tensor Algebra and Determinants
  18.4 Multidimensional Substitution
19 Introduction to Differential Geometry
  19.1 Manifolds
  19.2 Tangent Spaces and Differentiable Functions
  19.3 Differential Forms, Integrals Over the Unit Cube
  19.4 k-Forms and Integrals Over k-Chains
  19.5 Integration on Manifolds
  19.6 Stokes’ Theorem
20 Hilbert Spaces
  20.1 Orthonormal Bases
  20.2 Fourier Series
  20.3 The Riesz Representation Theorem

Part III: Applied Analysis

21 Physics Background
  21.1 Harmonic Oscillators
  21.2 Heat and Diffusion
  21.3 Separation of Variables, Fourier Series, and Ordinary Differential Equations
  21.4 Maxwell’s Equations
  21.5 The Navier Stokes Equation for the Conservation of Mass
22 Ordinary Differential Equations
  22.1 Banach Space Valued Differential Equations
  22.2 An Existence and Uniqueness Theorem
  22.3 Linear Differential Equations
23 The Finite Element Method
  23.1 Ritz-Galerkin Approximation
  23.2 Weakly Differentiable Functions
  23.3 Sobolev Spaces
  23.4 Elliptic Differential Operators
  23.5 Finite Elements

Conclusion and Outlook

Appendices

A Logic
  A.1 Statements
  A.2 Negations
B Set Theory
  B.1 The Zermelo-Fraenkel Axioms
  B.2 Relations and Functions
C Natural Numbers, Integers, and Rational Numbers
  C.1 The Natural Numbers
  C.2 The Integers
  C.3 The Rational Numbers

Bibliography
Index

[Figure 1: chart of chapter dependencies, grouped into Part I: Analysis on R, Part II: Abstract Analysis, and Part III: Applications; not reproduced here.]

Figure 1: Content dependency chart with minimum prerequisites indicated by arrows. Some remarks, examples, and exercises in the later chapters might still depend on other earlier chapters, but this problem typically can be resolved by quoting a single result.
Details about where and how the reader can "branch out" are given in the text.

Preface

This text is a self-contained introduction to the fundamentals of analysis. The only prerequisite is some experience with mathematical language and proofs. That is, it helps to be familiar with the structure of mathematical statements and with proof methods, such as direct proofs, proofs by contradiction, or induction. With some support in the right places, mostly in the early chapters, this text can also be used without prerequisites in a first proof class.

Mastering proofs in analysis is one of the key steps toward becoming a mathematician. To develop sound proof writing techniques, standard proof techniques are discussed early in the text and for a while they are pointed out explicitly. Throughout, proofs are presented with as much detail and as little hand waving as possible. This makes some proofs (for example, the density of $C[a,b]$ in $L^p[a,b]$ in Part II) notationally a bit complicated. With computers now being a regular tool in mathematics, the author considers this appropriate. When code is written for a problem, all details must be implemented, even those that are omitted in proofs. Seeing a few highly detailed proofs is reasonable preparation for such tasks. Moreover, to facilitate the transition to more abstract settings, such as measure, inner product, normed, and metric spaces, the results for single variable functions are proved using methods that translate to these abstract settings. For example, early proofs rely extensively on sequences and we also use the completeness of the real numbers rather than their order properties.

Analysis is important for applications, because it provides the abstract background that allows us to apply the full power of mathematics to scientific problems. This text shows that all abstractions are well motivated by the desire to build a strong theory that connects to specific applications.
Readers who complete this text will be ready for all analysis-based and analysis-related subjects in mathematics, including complex analysis, differential equations, differential geometry, functional analysis, harmonic analysis, mathematical physics, measure theory, numerical analysis, partial differential equations, probability theory, and topology. Readers interested in motivation from physics are advised to browse Chapter 21, even if they have not read any of the earlier chapters. Aside from the topics covered, readers interested in applications should note that the axiomatic approach of mathematics is similar to problem solving in other fields. In mathematics, theories are built on axioms. Similarly, in applications, models are subject to constraints. Neither the axioms, nor the constraints can be violated by the theory or model. Building a theory based on axioms fosters the reader's discipline to not make unwarranted assumptions.

Organization of the content. The text consists of three large parts. Part I, comprised of Chapters 1-13, presents the analysis of functions of one real variable, including a motivated introduction to the Lebesgue integral. Chapters 1-6 and 10-13 could be called “single variable calculus with proofs.” For a smooth transition from calculus and a gradual increase in abstraction, Chapters 1-6 require very little set theory. Chapter 1 presents the properties of the real line and limits of sequences are introduced in Chapter 2. Chapters 3-5 present the fundamentals on continuity, differentiation, and (Riemann) integration in this order and Chapter 6 gives a first introduction to series. Chapters 6-8 are motivated by the desire to further explore the Riemann integral while avoiding the excessive use of Riemann sums. This exploration is done with the Lebesgue criterion for Riemann integrability. Although this criterion requires the Lebesgue measure, the payoff is that many proofs become simpler.
To quickly reach this criterion, the first presentation of series in Chapter 6 is deliberately kept short. It presents enough about series to allow the definition of Lebesgue measure. Chapter 7 presents fundamental notions of set theory. Most of these ideas are needed for Lebesgue measure, but, overall, Chapter 7 contains all the set theory needed in the remainder of the text. Chapter 8 finishes the presentation of the Riemann integral. With Lebesgue measure available, it is natural to investigate the Lebesgue integral in Chapter 9. This chapter could also be delayed to the end of Part I, but the author believes that early exposure to the crucial ideas will ease the later transition to measure spaces.

The analysis of single variable functions is finished with the rigorous introduction of the transcendental functions. The necessary background on power series is explored in Chapter 10. Chapter 11 presents some fundamentals on the convergence of sequences of functions and Chapter 12 is devoted to the transcendental functions themselves. Chapter 13 discusses general numerical methods, but transcendental functions provide a rich test bed for the methods presented.

Part I of the text can be read or presented in many orders. Figure 1 shows the prerequisite structure of the text. Prerequisites for each chapter have deliberately been kept minimal. In this fashion, the order of topics in the reader’s first contact with proofs in analysis can be adapted to many readers’ preferences. Most notably, the intentionally early presentation of Lebesgue integration can be postponed to the end of Part I if so desired. Throughout, the author intends to keep the reader engaged by providing motivation for all abstractions. Consequently, as Figure 1 and the table of contents indicate, some concepts and results are presented in a “just-in-time” fashion rather than in what may be considered their traditional place.
If a concept is needed in an exercise before the concept is “officially” defined in the text, the concept will be defined in the exercise and in the text.

Part II, comprised of Chapters 14-20, explores how the appropriate abstractions lead to a powerful and widely applicable theoretical foundation for all branches of applied mathematics. The desire to define an integral in d-dimensional space provides a natural motivation to introduce measure spaces in Chapter 14. This chapter facilitates the transition to more abstract mathematics by frequently referring back to corresponding results for the one-dimensional Lebesgue integral. The proofs of these results usually are verbatim the same as in the one-dimensional setting. Moreover, this early introduction makes $L^p$ spaces available as examples for the rest of the text. The abstract venues of analysis are then presented in Chapter 15, which provides all examples for the rest of Part II. The fundamentals on metric spaces and continuity are presented in Chapter 16. As with measure spaces, for several results on metric spaces the reader is referred back to the corresponding proof for single variable functions. Proofs are no longer verbatim the same and abstraction is facilitated by translating proofs from a familiar setting to the new setting while analyzing similarities and differences. In a class, the author suggests that the teacher fill in some of these proofs to demonstrate the process. Chapter 17 presents the fundamentals on normed spaces and differentiation. Again, ideas are similar to those for functions of a single variable, but this time the abstraction goes beyond translation. With all three fundamental concepts (integration, continuity, and differentiation) available in the abstract setting, Chapter 18 shows the interrelationship between concepts presented separately before, culminating in the Multivariable Substitution Formula.
The second part is completed by a presentation of the fundamentals of analysis on manifolds, together with a physical interpretation of key concepts in Chapter 19 and by an introduction to Hilbert spaces in Chapter 20.

The remaining chapters give a brief outlook to applied subjects in which analysis is used, specifically, physics in Chapter 21, ordinary differential equations in Chapter 22, and partial differential equations and the finite element method in Chapter 23. Each of these chapters can only give a taste of its subject and I encourage the reader to go deeper into the utterly fascinating applications that lie behind Part III. The mathematical preparation through this text should facilitate the transition.

It should be possible to cover the bulk of the text in a two-course sequence. Although Chapters 14-16 should be read in order, depending on the available time, the pace and the choice of topics, any of Chapters 17-23 can serve as a capstone experience.

How to read this text. Mathematics in general, and analysis in particular, is not a spectator sport. It is learned by doing. To allow the reader to “do” mathematics, each section has exercises of varying degrees of difficulty. Some exercises require the adaptation of an argument in the text. These exercises are also intended to make the reader critically analyze the argument before adapting it. This is the first step toward being able to write proofs. Of course the need for very critical (and slow) reading of mathematics is nicely summed up in the old quote that “To read without a pencil is daydreaming.” The reader should ask him/herself after every sentence “What does this mean? Why is this justified?” Making notes in the margin to explain the harder steps will allow the reader to answer these questions more easily in the second and third readings of a proof. So it is important to read thoroughly and slowly, to make notes and to reread as often as needed.
The extensive index should help with unknown or forgotten terminology as necessary. Other exercises have hints on how to create a proof that the reader has not seen before. These exercises require the use of proof techniques in a new setting. Finally, there are also exercises without hints. Being able to create the proof with nothing but the result given is the deepest task in a mathematics course. This is not to say that exercises without hints are always the hardest and adaptations are always the easiest, but in many cases this is true. Finally, some exercises give a sequence of hints and intermediate results leading up to a famous theorem or a specific example. These exercises could also be used as mini-projects. In a class, some of them could be the basis for separate lectures that spotlight a particular theorem or example.

To get the most out of this text, the reader is encouraged to not look for hints and solutions in other background materials. In fact, even for proofs that are adaptations of proofs in this text, it is advantageous to try to create the proof without looking up the proof that is to be adapted. There is evidence that the struggle to solve a problem, which can take days for a single proof, is exactly what ultimately contributes to the development of strong skills. “Shortcuts,” while pleasant, can actually diminish this development. Readers interested in quantitative evidence that shows how the struggle to acquire a skill actually can lead to deeper learning may find the article [4] quite enlightening. A better survival mechanism than shortcuts is the development of connections between newly learned content and existing knowledge. The reader will need to find these connections to his/her existing knowledge, but the structure of the text is intended to help by motivating all abstractions. Readers interested in how knowledge is activated more easily when it was learned in a known context may be interested in the article [5].
Acknowledgments. Strange as it may sound, I started writing this text in the spring of 1987, as I prepared for my oral final examination in the traditional Analysis I-III sequence in Germany. Basically, I took all topics in the sequence and arranged them in what was the most logical fashion to me at the time. Of course, these notes are, in retrospect, immature. But they did a lot to shape my abilities and they were a good source of ideas and exercises. In this respect, I am indebted to my teachers for this sequence: Professor Wegener and teaching assistant Ms. Lange for Analysis I, Professor Kutzler and teaching assistant Herr Böttger for Analysis II-III as well as Professor Herz in whose Differential Equations class I first saw analysis “at work.” With all due respect to the other individuals, to me and many of my fellow students, the force that drove us in analysis (and beyond) was Herr Böttger. This gentleman was uncompromising in his pursuit of mathematical excellence and we feared as well as looked forward to his demanding exercise sets. He was highly respected because he was ready to spend hours with anyone who wanted to talk mathematics. Those who kept up with him were extremely well prepared for their mathematical careers. Incidentally, Dr. Ansgar Jüngel, whose notes I used for the chapter on the finite element method, took the above-mentioned classes with me. The thorough preparation through these classes is the main reason why most of this text was comparatively easy to write. If this text does half as good a job as Herr Böttger did with us, it has more than achieved its purpose.

It was thrilling to test my limitations, it was humbling to find them and ultimately I was left awed once more by the beauty of mathematics. When my abilities were insufficient to proceed, I used the texts listed in the bibliography for proofs, hints or to structure the presentation.
To make the reader fully concentrate on matters at hand, and to force myself to make the exposition self-contained, outside references are limited to places where results were beyond the scope of this exposition. A solid foundation will allow readers to judiciously pick their own resources for further study. Nonetheless, it is appropriate to recognize the influence of the works of a number of outstanding individuals. I used Adams [2], Renardy and Rogers [23], Yosida [33] and Zeidler [34] for Sobolev spaces, Aris [3], Cramer’s http://www.navier-stokes.net/, and Welty, Wicks and Wilson [31] for fluid dynamics, Chapman [6] for heat transfer, Cohn [7] for measure theory, Dieudonné [8] for differentiation in Banach spaces, Dodge [9] and Halmos [13] for set theory, Ferguson [10], Sandefur [24] and Stoer and Bulirsch [28] for numerical analysis, Halliday, Resnick and Walker [12] for elementary physics, Hewitt and Stromberg [14], Heuser [15], [16], Johnsonbaugh and Pfaffenberger [20], Lehn [22] and Stromberg [29] for general background on analysis, Heuser [17] for functional analysis, Hurd and Loeb [18] for the use of quantifiers in logic, Jüngel [21] and Solin [25] for the finite element method, Spivak [26], [27] for manifolds, Torchinsky [30] for Fourier series, Willard [32] for topology, and the Online Encyclopaedia of Mathematics http://eom.springer.de/ for quick checks of notation and definitions. Readers interested in further study of these subjects may wish to start with the above references.

The first draft of the manuscript was used in my analysis classes in the Winter and Spring quarters of 2007. The first class covered Chapters 1-9, the second covered Chapters 11 and 14-18 (with some strategic “fast forwards”).
This setup assured that graduating students would have full exposure to the essentials of analysis on the real line and to as much abstract analysis as possible without “handwaving arguments.” I am grateful to the students in these classes for keeping up with the pace, solving large numbers of homework problems, being patient with the typos we found and also for suggesting at least one order in which to present the material that I had not considered. The students’ evaluations (my best ever) also reaffirmed for me that people will enjoy, or at least accept and honor, a challenge, and that an ambitious, motivated course should be the way to go. Devery Rowland once more did an excellent job printing drafts of the text for the classes.

Aside from the referees, several colleagues also commented on this text and I owe them my thanks for making it a better product. In particular, I would like to thank Natalia Zotov for some comments on an early version that significantly improved the presentation, and Ansgar Jüngel for pointing out some key references on Sobolev spaces. Although I hope that we have found all remaining errors and typos, any that remain are my responsibility and mine alone. I request readers to report errors and typos to me so I can post errata. My contacts at Wiley, Susanne Steitz, Jacqueline Palmieri, and Melissa Yanuzzi bore with me when the stress level rose and their patience made the publishing process very smooth.

As always, this work would not have been possible without the love of my family. It is truly wonderful to be supported by individuals who accept your decision to spend large amounts of time reliving your formative years. Finally, I was sad to learn that Herr Böttger died unexpectedly a few years after I had my last class with him. Sir, this one’s for you.
Ruston, LA, August 30, 2007
Bernd Schroder

Part I Analysis of Functions of a Single Real Variable

Chapter 1 The Real Numbers

This investigation of analysis starts with minimal prerequisites. Regarding set theory, the terms “set” and “element” will remain undefined, as is customary in mathematics to avoid paradoxes. The empty set $\emptyset$ is the set that has no elements. The statement “$e \in S$” says that $e$ is an element of the set $S$. The statement “$A \subseteq B$” says that every element of $A$ is an element of $B$. Sets $A$ and $B$ are equal if and only if $A \subseteq B$ and $B \subseteq A$. The statement “$A \subset B$” says that $A \subseteq B$ and $A \neq B$. Subsets will be defined as “$A = \{x \in S : \text{(property)}\}$,” that is, with a statement from which set $S$ the elements of $A$ are taken and a property describing them. The union of two sets $A$ and $B$ is $A \cup B = \{x : x \in A \text{ or } x \in B\}$, the intersection is $A \cap B = \{x : x \in A \text{ and } x \in B\}$. Union and intersection of finitely many sets are denoted $\bigcup_{j=1}^{n} A_j$ and $\bigcap_{j=1}^{n} A_j$, respectively, and the relative complement of $B$ in $A$ is $A \setminus B = \{x \in A : x \notin B\}$.

Further details on set theory are purposely delayed until Section 7.1. Until then, we focus on analytical techniques. Any required notions of set theory will be clarified on the spot. To define properties, sometimes the universal quantifier “$\forall$” (read “for all”) or the existential quantifier “$\exists$” (read “there exists”) are used. Formal logic is described in more detail in Appendix A. Finally, the reader needs an intuitive idea of what a function, a relation and a binary operation are. Details are relegated to Appendices B.2 and C.2.

The real numbers $\mathbb{R}$ are the “staging ground” for analysis. They can be characterized as the unique (up to isomorphism) mathematical entity that satisfies Axioms 1.1, 1.6, and 1.19. That is, they are the unique linearly ordered, complete field (see Exercise 1-30). In this chapter, we introduce the axioms for the real numbers and some fundamental consequences.
These results assure that the real numbers indeed have the properties that we are familiar with from algebra and calculus.

1.1 Field Axioms

The description of the real numbers starts with their algebraic properties.

Axiom 1.1 The real numbers $\mathbb{R}$ are a field. That is, $\mathbb{R}$ has at least two elements and there are two binary operations, addition $+ : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ and multiplication $\cdot : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, so that
1. Addition is associative, that is, for all $x, y, z \in \mathbb{R}$ we have $(x + y) + z = x + (y + z)$.
2. Addition is commutative, that is, for all $x, y \in \mathbb{R}$ we have $x + y = y + x$.
3. There is a neutral element $0$ for addition, that is, there is an element $0 \in \mathbb{R}$ so that for all $x \in \mathbb{R}$ we have $x + 0 = x$.
4. For every element $x \in \mathbb{R}$ there is an additive inverse element $(-x)$ so that $x + (-x) = 0$.
5. Multiplication is associative, that is, for all $x, y, z \in \mathbb{R}$ we have $(x \cdot y) \cdot z = x \cdot (y \cdot z)$.
6. Multiplication is commutative, that is, for all $x, y \in \mathbb{R}$ we have $x \cdot y = y \cdot x$.
7. There is a neutral element $1$ for multiplication, that is, there is an element $1 \in \mathbb{R}$ so that for all $x \in \mathbb{R}$ we have $1 \cdot x = x$.
8. For every element $x \in \mathbb{R} \setminus \{0\}$ there is a multiplicative inverse element $x^{-1}$ so that $x \cdot x^{-1} = 1$.
9. Multiplication is (left) distributive over addition, that is, for all $a, x, y \in \mathbb{R}$ we have $a \cdot (x + y) = a \cdot x + a \cdot y$.

As is customary for multiplication, the dot between factors is usually omitted. Fields are investigated in detail in abstract algebra. For analysis, it is most effective to remember that the field axioms guarantee the properties needed so that we can perform algebra and arithmetic “as usual.” Some of these properties are exhibited in this section and in the exercises. The exercises also include examples that show that not every field needs to be infinite (see Exercises 1-7 to 1-9).

Theorem 1.2 The following are true in $\mathbb{R}$:
1. For all $x \in \mathbb{R}$, we have $0x = 0$.
2. $0 \neq 1$.
3. Additive inverses are unique.
That is, if $x \in \mathbb{R}$ and $x'$ and $\tilde{x}$ both have the property in part 4 of Axiom 1.1, then $x' = \tilde{x}$.
4. For all $x \in \mathbb{R}$, we have $(-1)x = -x$.

Proof. Early in the text, proofs will sometimes be interrupted by comments in italics to point out standard formulations and proof techniques. To prove part 1, let $x \in \mathbb{R}$. Then the axioms allow us to obtain the following equation.
$0x \overset{\text{Ax. 3}}{=} (0 + 0)x \overset{\text{Ax. 6}}{=} x(0 + 0) \overset{\text{Ax. 9}}{=} x0 + x0 \overset{\text{Ax. 6}}{=} 0x + 0x.$
Adding the additive inverse of $0x$ to both sides, this implies $0 = 0x$, as was claimed.

The proof of part 1 shows how every step in a proof needs to be justified. Usually we will not explicitly justify each step in a computation with an axiom or a previous result. However, the reader should always mentally fill in the justification. The practice of filling in these justifications should be started in the computations in the remainder of this proof.

To prove part 2, first note that, because $\mathbb{R}$ has at least two elements, there is an $x \in \mathbb{R} \setminus \{0\}$. Now suppose for a contradiction (see Standard Proof Technique 1.4 below) that $0 = 1$. Then $x = 1 \cdot x = 0 \cdot x = 0$ is a contradiction to $x \in \mathbb{R} \setminus \{0\}$.

For part 3, note that if $x'$ and $\tilde{x}$ both have the property in part 4 of Axiom 1.1, then $x' = x' + 0 = x' + (x + \tilde{x}) = (x' + x) + \tilde{x} = (x + x') + \tilde{x} = 0 + \tilde{x} = \tilde{x} + 0 = \tilde{x}$. Note that the statement of part 3 already encodes the typical approach to a uniqueness proof (see Standard Proof Technique 1.5 below).

Finally, for part 4 note that $x + (-1)x = 1x + (-1)x = (1 + (-1))x = 0x = 0$. Because by part 3 additive inverses are unique, $(-1)x$ must be the additive inverse $-x$ of $x$. The last step is a typical application of modus ponens, see Standard Proof Technique 1.3 below. ∎

To familiarize the reader with standard proof techniques, these techniques will be pointed out explicitly in the early part of the text. The techniques presented in Chapter 1 are general proof techniques applicable throughout mathematics. Techniques presented in later chapters are mostly specific to analysis.
Standard Proof Technique 1.3 The simplest mathematical proof technique is a direct proof in which a result that says “$A$ implies $B$” is applied after we have proved that $A$ is true. Truth of $A$ and of “$A$ implies $B$” guarantees truth of $B$. This technique is also called modus ponens. An example is in the proof of part 4 of Theorem 1.2. □

Standard Proof Technique 1.4 In a proof by contradiction, we suppose the contrary (the negation, also see Appendix A.2) of what is claimed is true and then we derive a contradiction. Typically, we derive a statement and its negation, which is a contradiction, because they cannot both be true. For an example, see the proof of part 2 of Theorem 1.2 above. Given that the reasoning that led to the contradiction is correct, the contradiction must be caused by the assumption that the contrary of the claim is true. Hence, the contrary of the claim must be false, because true statements cannot imply false statements like contradictions (see part 3 of Definition A.2 in Appendix A). But this means the claim must be true. We will usually indicate proofs by contradiction with a starting statement like “suppose for a contradiction.” □

Standard Proof Technique 1.5 For many mathematical objects it is important to assure that they are the only object that has certain properties. That is, we want to assure that the object is unique. In a typical uniqueness proof, we assume that there is more than one object with the properties under investigation and we prove that any two of these objects must be equal. Part 3 of Theorem 1.2 shows this approach.

Exercises

1-1. Prove that $(-1) \cdot (-1) = 1$.
1-2. $\cdot$ is right distributive over $+$. Prove that for all $x, y, z \in \mathbb{R}$ we have $(x + y)z = xz + yz$.
1-3. Multiplicative inverses are unique. Prove that if $x \in \mathbb{R}$ and $x'$ and $\tilde{x}$ both have the property in part 8 of Axiom 1.1 then $x' = \tilde{x}$.
1-4. Prove that $0$ does not have a multiplicative inverse.
1-5.
Prove that if $x, y \neq 0$, then $(xy)^{-1} = y^{-1}x^{-1}$. Conclude in particular that $xy \neq 0$.
1-6. Prove each of the binomial formulas below. Justify each step with the appropriate axiom.
(a) $(a + b)^2 = a^2 + 2ab + b^2$
(b) $(a - b)^2 = a^2 - 2ab + b^2$
(c) $(a + b)(a - b) = a^2 - b^2$
1-7. Prove that the set $\{0, 1\}$ with the usual multiplication and the usual addition, except that $1 + 1 := 0$, is a field. That is, prove that the set and addition and multiplication as stated have the properties listed in Axiom 1.1.
1-8. Prove that the set $\{0, 1, 2\}$ with the sum and product of two elements being the remainder obtained when dividing the regular sum and product by 3 is a field.
1-9. A property and some finite fields
(a) Let $F$ be a field and let $x, y \in F$. Prove that $xy = 0$ if and only if $x = 0$ or $y = 0$.
(b) Prove that the set $\{0, 1, 2, 3\}$ with the sum and product of two elements being the remainder obtained when dividing the regular sum and product by 4 is not a field.
(c) Prove that the set $\{0, 1, \ldots, p - 1\}$ with the sum and product of two elements being the remainder obtained when dividing the regular sum and product by $p$ is a field if and only if $p$ is a prime number.

1.2 Order Axioms

Exercises 1-7 to 1-9c show that the field axioms alone are not enough to describe the real numbers. In fact, fields need not even be infinite. However, aside from executing the familiar algebraic operations, we can also compare real numbers. This section presents the order relation on the real numbers and its properties.

Axiom 1.6 The real numbers $\mathbb{R}$ contain a subset $\mathbb{R}^+$, called the positive real numbers, such that
1. For all $x, y \in \mathbb{R}^+$, we have $x + y \in \mathbb{R}^+$ and $xy \in \mathbb{R}^+$,
2. For all $x \in \mathbb{R}$, exactly one of the following three properties holds. Either $x \in \mathbb{R}^+$ or $-x \in \mathbb{R}^+$ or $x = 0$.

A real number $x$ is called negative if and only if $-x \in \mathbb{R}^+$. Once positive numbers are defined, we can define an order relation.
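The remark that fields need not be infinite can be explored numerically. The following is a minimal sketch, not a proof and not part of the original exercises: it checks the nine parts of Axiom 1.1 by brute force for the set $\{0, \ldots, n-1\}$ with sums and products reduced modulo $n$, as in Exercises 1-8 and 1-9. The function name `is_field_mod` and the range of moduli tested are illustrative assumptions.

```python
from itertools import product


def is_field_mod(n):
    """Brute-force check of Axiom 1.1 for {0, ..., n-1} with arithmetic mod n.

    Hypothetical helper for illustration only.
    """
    if n < 2:  # a field has at least two elements
        return False
    elems = range(n)

    def add(x, y):
        return (x + y) % n

    def mul(x, y):
        return (x * y) % n

    for x, y, z in product(elems, repeat=3):
        if add(add(x, y), z) != add(x, add(y, z)):          # part 1
            return False
        if mul(mul(x, y), z) != mul(x, mul(y, z)):          # part 5
            return False
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):  # part 9
            return False
    for x, y in product(elems, repeat=2):
        if add(x, y) != add(y, x) or mul(x, y) != mul(y, x):  # parts 2 and 6
            return False
    for x in elems:
        if add(x, 0) != x or mul(1, x) != x:                # parts 3 and 7
            return False
        if not any(add(x, y) == 0 for y in elems):          # part 4
            return False
        if x != 0 and not any(mul(x, y) == 1 for y in elems):  # part 8
            return False
    return True


# Moduli between 2 and 12 for which the axioms hold
field_sizes = [n for n in range(2, 13) if is_field_mod(n)]
print(field_sizes)
```

For the moduli tested, the only part that ever fails is part 8, the existence of multiplicative inverses, which is consistent with the statement of Exercise 1-9c. A finite search of this kind illustrates the claim but does not replace the proofs requested in the exercises.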
As usual, instead of writing $y + (-x)$ we write $y - x$ and call it the difference of $x$ and $y$. The binary operation “$-$” is called subtraction. The phrase “if and only if,” which is used in definitions and biconditionals, is normally abbreviated with the artificial word “iff.”

Definition 1.7 For $x, y \in \mathbb{R}$, we say $x$ is less than $y$, in symbols $x < y$, iff $y - x \in \mathbb{R}^+$. We say $x$ is less than or equal to $y$, denoted $x \leq y$, iff $x < y$ or $x = y$. Finally, we say $x$ is greater than $y$, denoted $x > y$, iff $y < x$, and we say $x$ is greater than or equal to $y$, denoted $x \geq y$, iff $y \leq x$.

The relation $\leq$ satisfies the properties that define an order relation.

Proposition 1.8 The relation $\leq$ is an order relation on $\mathbb{R}$. That is,
1. $\leq$ is reflexive. For all $x \in \mathbb{R}$ we have $x \leq x$,
2. $\leq$ is antisymmetric. For all $x, y \in \mathbb{R}$ we have that $x \leq y$ and $y \leq x$ implies $x = y$,
3. $\leq$ is transitive. For all $x, y, z \in \mathbb{R}$, we have that $x \leq y$ and $y \leq z$ implies $x \leq z$.
Moreover, the relation $\leq$ is a total order relation, that is, for any two $x, y \in \mathbb{R}$ we have that $x \leq y$ or $y \leq x$.

Proof. The relation $\leq$ is reflexive, because it includes equality.

For antisymmetry, let $x \leq y$ and $y \leq x$ and suppose for a contradiction that $x \neq y$. Then $x - y \in \mathbb{R}^+$ and $-(x - y) = y - x \in \mathbb{R}^+$, which cannot be by Axiom 1.6. Thus $\leq$ must be antisymmetric.

For transitivity, let $x \leq y$ and $y \leq z$. There is nothing to prove if one of the inequalities is an equality. Thus we can assume that $x < y$ and $y < z$, which means $y - x \in \mathbb{R}^+$ and $z - y \in \mathbb{R}^+$. But then $\mathbb{R}^+$ contains $(z - y) + (y - x) = z - x$, and hence $x < z$. We have shown that for all $x, y, z \in \mathbb{R}$ the inequalities $x \leq y$ and $y \leq z$ imply $x \leq z$, which means that $\leq$ is transitive.

For the “moreover” part note that if $x, y \in \mathbb{R}$, then $y - x \in \mathbb{R}$ and we have either $y - x \in \mathbb{R}^+$, which means $x < y$, or $y - x = 0$, which means $y = x$, or $x - y = -(y - x) \in \mathbb{R}^+$, which means $y < x$. Therefore for all $x, y \in \mathbb{R}$ one of $x \leq y$ or $y \leq x$ holds, and hence $\leq$ is a total order. ∎
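To see how Definition 1.7 builds the order purely from a set of positive elements, the following sketch spot-checks Proposition 1.8 on a small sample. It is an illustration under stated assumptions, not a proof: rational numbers (Python's `Fraction`) stand in for real numbers, the positive set is taken to be the rationals with built-in sign `> 0`, and the names `is_positive`, `less` and `less_eq` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def is_positive(x):
    """Membership in the positive set (assumption: built-in sign of a Fraction)."""
    return x > 0


def less(x, y):
    """Definition 1.7: x < y iff y - x is positive."""
    return is_positive(y - x)


def less_eq(x, y):
    """Definition 1.7: x <= y iff x < y or x = y."""
    return less(x, y) or x == y


samples = [Fraction(p, q) for p in range(-3, 4) for q in (1, 2, 3)]
pairs = list(product(samples, repeat=2))

# Part 2 of Axiom 1.6: exactly one of "x positive", "-x positive", "x = 0" holds
trichotomy = all(
    [is_positive(x), is_positive(-x), x == 0].count(True) == 1 for x in samples
)
reflexive = all(less_eq(x, x) for x in samples)
antisymmetric = all(x == y for x, y in pairs if less_eq(x, y) and less_eq(y, x))
transitive = all(
    less_eq(x, z)
    for x, y, z in product(samples, repeat=3)
    if less_eq(x, y) and less_eq(y, z)
)
total = all(less_eq(x, y) or less_eq(y, x) for x, y in pairs)

print(trichotomy, reflexive, antisymmetric, transitive, total)
```

The only place where the built-in comparison is used is inside `is_positive`; every other comparison goes through the difference $y - x$, mirroring the way the proof of Proposition 1.8 reduces each property to Axiom 1.6.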
Once an order relation is established, we can define intervals.

Definition 1.9 An interval is a set $I \subseteq \mathbb{R}$ so that for all $c, d \in I$ and $x \in \mathbb{R}$ the inequalities $c < x < d$ imply $x \in I$. In particular for $a, b \in \mathbb{R}$ with $a < b$ we define
1. $[a, b] := \{x \in \mathbb{R} : a \leq x \leq b\}$,
2. $(a, b) := \{x \in \mathbb{R} : a < x < b\}$, $(a, \infty) := \{x \in \mathbb{R} : a < x\}$, $(-\infty, b) := \{x \in \mathbb{R} : x < b\}$, $(-\infty, \infty) := \mathbb{R}$,
3. $[a, b) := \{x \in \mathbb{R} : a \leq x < b\}$, $[a, \infty) := \{x \in \mathbb{R} : a \leq x\}$,
4. $(a, b] := \{x \in \mathbb{R} : a < x \leq b\}$, $(-\infty, b] := \{x \in \mathbb{R} : x \leq b\}$.
The points $a$ and $b$ are also called the endpoints of the interval. An interval that does not contain either of its endpoints (where $\pm\infty$ are also considered to be “endpoints”) is called open. An interval that contains exactly one of its endpoints is called half-open and an interval that contains both its endpoints is called closed.

For the first part of this text, the domains of functions will almost exclusively be intervals. Because analysis requires extensive work with inequalities, we need to investigate how the order relation relates to the algebraic operations.

Theorem 1.10 Properties of the order relation. Let $x, y, z \in \mathbb{R}$.
1. The number $x$ is positive iff $x > 0$ and $x$ is negative iff $x < 0$.
2. If $x \leq y$, then $x + z \leq y + z$.
3. If $x \leq y$ and $z > 0$, then $xz \leq yz$.
4. If $x \leq y$ and $z < 0$, then $xz \geq yz$.
5. If $0 < x \leq y$, then $y^{-1} \leq x^{-1}$.
Similar results can be proved for other combinations of strict and nonstrict inequalities. We will not state these here, but instead trust that the reader can make the requisite translation from the statements in this theorem.

Proof. Parts 1 and 2 are left to the reader as Exercises 1-10a and 1-10b. Throughout this text, parts of proofs will be delegated to the reader to facilitate a better connection to the material presented.

For part 3, let $x \leq y$ and let $z > 0$. Then, $y - x \in \mathbb{R}^+$ or $y = x$. In case $y = x$, we obtain $yz = xz$ and thus, in particular, $xz \leq yz$.
In case $y - x \in \mathbb{R}^+$, note that $z > 0$ means $z \in \mathbb{R}^+$, and hence $yz - xz = (y - x)z \in \mathbb{R}^+$. By definition, this implies $xz < yz$, and in particular $xz \leq yz$. Because we have shown $xz \leq yz$ in each case, the result is established.

All proofs in this section are done with the above kind of case distinction (see Standard Proof Technique 1.11).

For part 4, let $x \leq y$ and let $z < 0$. Then, $y - x \in \mathbb{R}^+$ or $y = x$. In case $y = x$, we obtain $yz = xz$, and hence $xz \geq yz$. In case $y - x \in \mathbb{R}^+$, note that $z < 0$ means $-z \in \mathbb{R}^+$, and hence $xz - yz = (x - y)z = (y - x)(-z) \in \mathbb{R}^+$. By definition, this implies $yz < xz$, and hence $yz \leq xz$, which establishes the result.

For part 5, first note that there is nothing to prove if $x = y$. Hence, we can assume that $x < y$. Suppose for a contradiction that $x^{-1} < y^{-1}$. Then by part 3 we have that $1 = x^{-1}x < y^{-1}x$, and hence $x < y \cdot 1 < yy^{-1}x = x$, contradiction. ∎

Standard Proof Technique 1.11 When several possibilities must be considered in a proof, the proof usually continues with separate arguments for each possibility. The proof is complete when each separate argument has led to the desired conclusion. This type of proof is also called a proof by case distinction. □

We conclude this section by introducing the absolute value function and some of its properties.

Definition 1.12 For $x \in \mathbb{R}$, we set
$|x| = \begin{cases} x; & \text{if } x \geq 0, \\ -x; & \text{if } x < 0, \end{cases}$
and we call it the absolute value of $x$.

Theorem 1.13 summarizes the properties of the absolute value. The numbering is adjusted so that properties 1, 2, and 3 correspond to the analogous properties for norms (see Definition 15.38).

We will formulate many results in the first part of the text to be analogous or easily generalizable to more abstract settings, but we will usually do so without explicit forward references. In this fashion many abstract situations will be more familiar because of similarities to situations investigated in the first part.
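Definition 1.12 translates directly into a two-case function. The sketch below is an illustration only: it uses rational samples (Python's `Fraction`) as stand-ins for real numbers and spot-checks the five properties that Theorem 1.13 collects. The name `absolute` and the choice of sample points are assumptions made for this example, and a finite check of this kind is not a substitute for the proof.

```python
from fractions import Fraction
from itertools import product


def absolute(x):
    """Definition 1.12: |x| = x if x >= 0 and |x| = -x if x < 0."""
    return x if x >= 0 else -x


# Quarter steps from -2 to 2 (arbitrary illustrative sample)
samples = [Fraction(p, 4) for p in range(-8, 9)]
pairs = list(product(samples, repeat=2))

checks = {
    # part 0: |x| >= 0
    "nonnegative": all(absolute(x) >= 0 for x in samples),
    # part 1: |x| = 0 iff x = 0
    "zero_iff_zero": all((absolute(x) == 0) == (x == 0) for x in samples),
    # part 2: |xy| = |x||y|
    "multiplicative": all(
        absolute(x * y) == absolute(x) * absolute(y) for x, y in pairs
    ),
    # part 3: |x + y| <= |x| + |y|
    "triangular": all(
        absolute(x + y) <= absolute(x) + absolute(y) for x, y in pairs
    ),
    # part 4: ||x| - |y|| <= |x - y|
    "reverse_triangular": all(
        absolute(absolute(x) - absolute(y)) <= absolute(x - y) for x, y in pairs
    ),
}
print(checks)
```

The case split in `absolute` is the same case distinction that drives the proof of the theorem: each property is verified separately according to the signs of the numbers involved.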
Theorem 1.13 Properties of the absolute value.
0. For all $x \in \mathbb{R}$, we have $|x| \geq 0$,
1. For all $x \in \mathbb{R}$, we have $|x| = 0$ iff $x = 0$,
2. For all $x, y \in \mathbb{R}$, we have $|xy| = |x||y|$,
3. Triangular inequality. For all $x, y \in \mathbb{R}$, we have $|x + y| \leq |x| + |y|$.
4. Reverse triangular inequality. For all $x, y \in \mathbb{R}$, we have $\big| |x| - |y| \big| \leq |x - y|$.

Proof. For part 0, let $x \in \mathbb{R}$. In case $x \geq 0$, by Definition 1.12 we have $|x| = x \geq 0$. In case $x < 0$, we have $x \notin \mathbb{R}^+$ and by part 2 of Axiom 1.6 we conclude $-x > 0$. Because in this case $|x| = -x > 0$, part 0 follows.

Throughout the text, the two implications of a biconditional “$A$ iff $B$” will be referred to as “$\Rightarrow$,” denoting “if $A$, then $B$” and “$\Leftarrow$,” denoting “if $B$, then $A$.”

For part 1, note that the direction “$\Leftarrow$” is trivial, because $|0| = 0$. For the direction “$\Rightarrow$,” let $x \in \mathbb{R}$ be so that $|x| = 0$ and suppose for a contradiction that $x \neq 0$. If $x > 0$, then $0 < x = |x| = 0$, a contradiction. (Note that the previous sentence is a short proof by contradiction that is part of a longer proof by contradiction.) Therefore $x < 0$. But then $0 < -x = |x| = 0$, a contradiction. Hence, $x$ must be equal to $0$.

For part 2, let $x, y \in \mathbb{R}$. If $x \geq 0$ and $y \geq 0$, then by part 3 of Theorem 1.10 $xy \geq 0$, and hence $|xy| = xy = |x||y|$. If $x \geq 0$ and $y < 0$, then by part 4 of Theorem 1.10 we infer $xy \leq 0$. Hence, $|xy| = -xy = x(-y) = |x||y|$. The case $x < 0$ and $y \geq 0$ is similar and the reader will produce it in Exercise 1-11a. Finally, if $x < 0$ and $y < 0$, then by part 4 of Theorem 1.10 we obtain $xy > 0$. Hence, $|xy| = xy = (-1)(-1)xy = (-x)(-y) = |x||y|$.

To prove the triangular inequality, first note that for all $x \in \mathbb{R}$ we have that $x \leq |x|$. This is clear for $x \geq 0$ and for $x < 0$ we simply note $x < 0 < -x = |x|$. Moreover, (see Exercise 1-11b) for all $x \in \mathbb{R}$ we have $-x \leq |x|$. Now let $x, y \in \mathbb{R}$. If the inequality $x + y \geq 0$ holds, then by part 2 of Theorem 1.10 at least one of $x, y$ is greater than or equal to $0$. (Otherwise $x < 0$ and $y < 0$ would imply $x + y < 0$.)
Hence, by part 2 of Theorem 1.10 $|x + y| = x + y \leq |x| + y \leq |x| + |y|$. If $x + y < 0$, then at least one of $x$ and $y$ is less than $0$. Hence, by part 2 of Theorem 1.10 we obtain $|x + y| = -(x + y) = -x + (-y) \leq |-x| + (-y) \leq |-x| + |-y| = |x| + |y|$.

Finally, for the reverse triangular inequality, let $x, y \in \mathbb{R}$. Without loss of generality (see Standard Proof Technique 1.14) assume that $|x| \geq |y|$. (The proof for the case $|x| < |y|$ is left as Exercise 1-11c.) Then $|x| = |x - y + y| \leq |x - y| + |y|$, which implies $\big| |x| - |y| \big| = |x| - |y| \leq |x - y|$. ∎

Standard Proof Technique 1.14 If the proofs for the cases in a case distinction are very similar, it is customary to assume without loss of generality that one of these similar cases is true. This is not a loss of generality, because it is assumed that what is presented enables the reader to fill in the proof(s) for the other case(s). In this text, the omitted part is sometimes included as an explicit exercise for the reader. □

Exercises

1-10. Finishing the proof of Theorem 1.10.
(a) Prove part 1 of Theorem 1.10.
(b) Prove part 2 of Theorem 1.10.
1-11. Finishing the proof of Theorem 1.13.
(a) Let $x, y \in \mathbb{R}$. Prove that if $x < 0$ and $y \geq 0$, then $|xy| = |x||y|$.
(b) Prove that for all $x \in \mathbb{R}$ we have $-x \leq |x|$.
(c) Prove that if $|x| < |y|$, then $\big| |x| - |y| \big| \leq |x - y|$.
1-12. Let $I, J \subseteq \mathbb{R}$ be intervals. Prove that $I \cap J = \{x \in \mathbb{R} : x \in I \text{ and } x \in J\}$ is again an interval.
1-13. Let $a < b$ and let $x, y \in [a, b]$. Prove that $|x - y| \leq b - a$.
1-14. Prove that none of the fields from Exercise 1-9c can satisfy Axiom 1.6 by showing that for these fields part 2 of Axiom 1.6 fails for $x = 1$.
Note. This result shows that Axiom 1.6 distinguishes $\mathbb{R}$ from the finite fields of Exercise 1-9c.

1.3 Lowest Upper and Greatest Lower Bounds

A structure that has the properties outlined in Axioms 1.1 and 1.6 is also called a linearly ordered field.
The rational numbers satisfy these properties just as well as the real numbers. Thus we are not done with our characterization of $\mathbb{R}$. The final axiom for the real numbers addresses upper and lower bounds of sets.

Definition 1.15 Let $A$ be a subset of $\mathbb{R}$.
1. The number $u \in \mathbb{R}$ is called an upper bound of $A$ iff $u \geq a$ for all $a \in A$. If $A$ has an upper bound, it is also called bounded above.
2. The number $l \in \mathbb{R}$ is called a lower bound of $A$ iff $l \leq a$ for all $a \in A$. If $A$ has a lower bound, it is also called bounded below.
A subset $A \subseteq \mathbb{R}$ that is bounded above and bounded below is also called bounded.

Among all upper bounds of a set, the smallest one (if it exists) plays a special role. Similarly, the greatest lower bound plays a special role if it exists.

Definition 1.16 Let $A \subseteq \mathbb{R}$.
1. The number $s \in \mathbb{R}$ is called lowest upper bound of $A$ or supremum of $A$, denoted $\sup(A)$, iff $s$ is an upper bound of $A$ and for all upper bounds $u$ of $A$ we have that $s \leq u$.
2. The number $i \in \mathbb{R}$ is called greatest lower bound of $A$ or infimum of $A$, denoted $\inf(A)$, iff $i$ is a lower bound of $A$ and for all lower bounds $l$ of $A$ we have that $l \leq i$.

Formally, it is not guaranteed that suprema and infima are unique, but the next result shows that this is indeed the case. Note that the statement of Proposition 1.17 follows the standard pattern for a uniqueness statement.

Proposition 1.17 Suprema are unique. That is, if the set $A \subseteq \mathbb{R}$ is bounded above and $s, t \in \mathbb{R}$ both are suprema of $A$, then $s = t$.

Proof. Let $A \subseteq \mathbb{R}$ and $s, t \in \mathbb{R}$ be as indicated. Then $s$ is an upper bound of $A$ and, because $t$ is a supremum of $A$, we infer $s \geq t$. Similarly, $t$ is an upper bound of $A$ and, because $s$ is a supremum of $A$, we infer $t \geq s$. This implies $s = t$. ∎

Standard Proof Technique 1.18 (Also compare with Standard Proof Technique 1.14.) When, as in the proof of Proposition 1.17, two parts of a proof are very similar, it is common to only prove one part and state that the other part is similar.
Throughout the text, the reader will become familiar with this idea through exercises that require the construction of proofs that are similar to proofs given in the narrative.

The proof that infima are unique is similar (see Exercise 1-15). Because suprema and infima are unique if they exist, we speak of the supremum and the infimum. The final axiom for the real numbers now states that suprema and infima exist under mild hypotheses.

Axiom 1.19 Completeness Axiom. Every nonempty subset $S$ of $\mathbb{R}$ that has an upper bound has a lowest upper bound.

Although the Completeness Axiom formally only guarantees that nonempty subsets of $\mathbb{R}$ that are bounded above have suprema, existence of infima is a consequence.

Proposition 1.20 Let $S \subseteq \mathbb{R}$ be nonempty and bounded below. Then $S$ has a greatest lower bound.

Proof. Let $L := \{x \in \mathbb{R} : x \text{ is a lower bound of } S\}$. Then $L \neq \emptyset$. Let $s \in S$. Then for all $l \in L$ we have that $l \leq s$. Because $S \neq \emptyset$ this means that $L$ is bounded above. Because $L \neq \emptyset$, by the Completeness Axiom, $L$ has a supremum $\sup(L)$. Every $s \in S$ is an upper bound of $L$, which means that $s \geq \sup(L)$ and so $\sup(L)$ is a lower bound of $S$. By definition of suprema, $\sup(L)$ is greater than or equal to all elements of $L$, that is, it is greater than or equal to all lower bounds of $S$. By definition of infima, this means that $\sup(L) = \inf(S)$. ∎

We will see that suprema and infima are valuable tools in analysis on the real line. The next result shows that in any set with a supremum we can find numbers that are arbitrarily close to the supremum. This fact is important, because analysis ultimately is about objects “getting close to each other.”

Proposition 1.21 Let $S \subseteq \mathbb{R}$ be a nonempty subset of $\mathbb{R}$ that is bounded above and let $s := \sup(S)$. Then for every $\varepsilon > 0$ there is an element $x \in S$ so that $s - x < \varepsilon$.

Proof. Suppose for a contradiction that there is an $\varepsilon > 0$ so that for all $x \in S$ we have that $s - x \geq \varepsilon$.
Then for all $x \in S$ we would obtain $s - \varepsilon \geq x$, that is, $s - \varepsilon$ would be an upper bound of $S$. But $s - \varepsilon < s$ contradicts the fact that $s$ is the lowest upper bound of $S$. ∎

Although the supremum and infimum of a set need not be elements of the set, we have different names for them in case they are in the set.

Definition 1.22 Let $A$ be a subset of $\mathbb{R}$.
1. If $A$ is bounded above and $\sup(A) \in A$, then the supremum of $A$ is also called the maximum of $A$, denoted $\max(A)$.
2. If $A$ is bounded below and $\inf(A) \in A$, then the infimum of $A$ is also called the minimum of $A$, denoted $\min(A)$.

Although the distinctions between suprema and maxima and between infima and minima are small, the notions are distinct. For example, the open interval $(0, 1)$ has a supremum ($1$) and an infimum ($0$), but it has neither a maximum, nor a minimum.

Exercises

1-15. Let $A \subseteq \mathbb{R}$ be bounded below and let $s, t \in \mathbb{R}$ both be infima of $A$. Prove that $s = t$.
1-16. Approaching infima. State and prove a version of Proposition 1.21 that applies to infima. Is the proof significantly different from that of Proposition 1.21?
1-17. Let $S \subseteq \mathbb{R}$ be bounded above. Prove that $s \in \mathbb{R}$ is the supremum of $S$ iff $s$ is an upper bound of $S$ and for all $\varepsilon > 0$ there is an $x \in S$ so that $|s - x| < \varepsilon$.
1-18. Suprema and infima vs. containment of sets.
(a) Let $A, B \subseteq \mathbb{R}$ be bounded above. Prove that $A \subseteq B$ implies $\sup(A) \leq \sup(B)$.
(b) Let $A, B \subseteq \mathbb{R}$ be bounded below. Prove that $A \subseteq B$ implies $\inf(A) \geq \inf(B)$.
1-19. Let $A \subseteq \mathbb{R}$ be bounded above. Prove that $\inf\{x \in \mathbb{R} : -x \in A\} = -\sup(A)$.

1.4 Natural Numbers, Integers, and Rational Numbers

Although Axioms 1.1, 1.6 and 1.19 uniquely describe the real numbers, they do not mention familiar subsets, such as natural numbers, integers, and rational numbers. This is because these sets can be constructed from the axioms as subsets of the real numbers. We start with the natural numbers, which are the unique subset with properties as stated in Theorem 1.23.
While their existence is easy to establish, the uniqueness of the natural numbers can only be proved in Theorem 1.28 after some more machinery has been developed.

Theorem 1.23 There is a subset $\mathbb{N} \subseteq \mathbb{R}$, called the natural numbers, so that
1. $1 \in \mathbb{N}$.
2. For each $n \in \mathbb{N}$ the number $n + 1$ is also in $\mathbb{N}$.
3. Principle of Induction. If $S \subseteq \mathbb{N}$ is such that $1 \in S$ and for each $n \in S$ we also have $n + 1 \in S$, then $S = \mathbb{N}$.

Proof. Call a subset $A \subseteq \mathbb{R}$ a successor set iff $1 \in A$ and for all $a \in A$ we also have $a + 1 \in A$. Successor sets exist, because, for example, $\mathbb{R}$ itself is a successor set. Let $\mathbb{N}$ be the set of all elements of $\mathbb{R}$ that are in all successor sets. Because $1$ is an element of every successor set, we infer $1 \in \mathbb{N}$. Moreover, if $n \in \mathbb{N}$, then $n$ is in every successor set, which means $n + 1$ is in every successor set, and hence $n + 1 \in \mathbb{N}$. Finally, any subset $S \subseteq \mathbb{N}$ as given in the Principle of Induction is a successor set. Because the elements of $\mathbb{N}$ are contained in all successor sets, we conclude that $\mathbb{N} \subseteq S$, and hence $\mathbb{N} = S$. ∎

Of course, we will denote the natural numbers by their usual names $1, 2, 3, \ldots$ As algebraic objects, natural numbers are suited for addition and multiplication (see Proposition 1.24), but they are not so well suited for subtraction (see Proposition 1.25). Although all results until Theorem 1.28 are stated for $\mathbb{N}$, they hold “for every subset of $\mathbb{R}$ that satisfies the properties in Theorem 1.23.” The reader should keep this in mind and double check, because we will need it in the proof of Theorem 1.28. To avoid awkward formulations, the results up to Theorem 1.28 are formulated for $\mathbb{N}$, however.

Proposition 1.24 The natural numbers are closed under addition and multiplication. That is, if $m, n \in \mathbb{N}$, then $m + n$ and $mn$ are in $\mathbb{N}$ also.

Proof. The key to this result is the Principle of Induction. Let $m \in \mathbb{N}$ be arbitrary and let $S_m := \{n \in \mathbb{N} : m + n \in \mathbb{N}\}$. Then $m \in \mathbb{N}$ implies $m + 1 \in \mathbb{N}$, and hence $1 \in S_m$.
Moreover, if $n \in S_m$, then $m + n \in \mathbb{N}$, and hence $m + (n + 1) = (m + n) + 1 \in \mathbb{N}$, which means that $n + 1 \in S_m$. By the Principle of Induction we conclude that $S_m = \mathbb{N}$. Because $m \in \mathbb{N}$ was arbitrary, this means that for any $m, n \in \mathbb{N}$ we have $m + n \in \mathbb{N}$. The proof for products is similar and left to the reader as Exercise 1-20. ∎

Readers familiar with induction recognize the part “$1 \in S_m$” of the preceding proof as the base step of an induction and the part “$n \in S_m \Rightarrow n + 1 \in S_m$” as the induction step. In this section, we use the “induction on sets” as done in the preceding proof. The more commonly known Principle of Induction is introduced in Theorem 1.39.

Proposition 1.25 Let $m, n \in \mathbb{N}$ be such that $m > n$. Then $m - n \in \mathbb{N}$.

Proof. We first show that if $m \in \mathbb{N}$, then $m - 1 \in \mathbb{N}$ or $m - 1 = 0$. To do this, let $A := \{m \in \mathbb{N} : m - 1 \in \mathbb{N} \text{ or } m - 1 = 0\}$. Then $1 \in A$ and if $m \in A$, then $(m + 1) - 1 = m \in A \subseteq \mathbb{N}$, which means $m + 1 \in A$. Hence, $A = \mathbb{N}$ by the Principle of Induction.

Now let $S := \{n \in \mathbb{N} : (\forall m \in \mathbb{N} : m > n \text{ implies } m - n \in \mathbb{N})\}$. If $n = 1$ and $m \in \mathbb{N}$ satisfies $m > 1$, then $m - 1 > 0$ and so by the above $m - 1 \in \mathbb{N}$, which means $1 \in S$. Let $n \in S$. If $m > n + 1$, then $m - 1 > n$, and hence $m - (n + 1) = (m - 1) - n \in \mathbb{N}$, which means $n + 1 \in S$. By the Principle of Induction we conclude that $S = \mathbb{N}$, and hence for all $m, n \in \mathbb{N}$ we have proved that $m > n$ implies $m - n \in \mathbb{N}$. ∎

Proposition 1.26 shows that the natural numbers are positive and the smallest difference between any two of them is $1$.

Proposition 1.26 For all $n \in \mathbb{N}$, the inequality $n \geq 1$ holds and there is no $m \in \mathbb{N}$ so that the inequalities $n < m < n + 1$ hold.

Proof. The proof that all natural numbers are greater than or equal to $1$ is left to Exercise 1-21. Now suppose for a contradiction that there is an $n \in \mathbb{N}$ and an $m \in \mathbb{N}$ so that $n < m < n + 1$. Then $m - n \in \mathbb{N}$ and $m - n < 1$, a contradiction. ∎

The Well-ordering Theorem turns out to be equivalent to the Principle of Induction (see Exercise 1-22).
Theorem 1.27 Well-ordering Theorem. Every nonempty subset of $\mathbb{N}$ has a smallest element.

Proof. Suppose for a contradiction that $B \subseteq \mathbb{N}$ is not empty and does not have a smallest element. Let $S := \{n \in \mathbb{N} : (\forall m \in \mathbb{N} : m \leq n \text{ implies } m \notin B)\}$. By Proposition 1.26, $1$ is less than or equal to all elements of $\mathbb{N}$, so $1 \notin B$, and hence $1 \in S$. Now let $n \in S$. Then all $m \in \mathbb{N}$ with $m \leq n$ are not in $B$. But then $n + 1 \in B$ would by Proposition 1.26 imply that $n + 1$ is the smallest element of $B$. Hence, $n + 1 \notin B$ and we conclude $n + 1 \in S$. By the Principle of Induction, $S = \mathbb{N}$ and consequently $B = \emptyset$, a contradiction. ∎

Now we are finally ready to show that the natural numbers are unique.

Theorem 1.28 The natural numbers $\mathbb{N}$ are the unique subset of $\mathbb{R}$ that satisfies the properties in Theorem 1.23.

Proof. Examination of the proofs of all results since Theorem 1.23 reveals that any set $S \subseteq \mathbb{R}$ that satisfies the properties in Theorem 1.23 must also have the properties given in these results.

It may feel tedious to go back and verify the above statement. However, mathematical presentations more often than not will ask a reader to use a modification of a known proof to prove a result (also see Standard Proof Technique 1.14). When this occurs, the reader is expected to verify that the result(s) can indeed be proved with similar methods as were used for earlier results.

Now suppose for a contradiction that there is a set $S \neq \mathbb{N}$ with properties as in Theorem 1.23. Then $S$ is a successor set, so $\mathbb{N} \subseteq S$. Let $B := S \setminus \mathbb{N} = \{s \in S : s \notin \mathbb{N}\}$. Then $B \neq \emptyset$, and hence by the Well-ordering Theorem, which is valid for $S$, $B$ has a smallest element $b$. Because $1 \in \mathbb{N}$ we infer $b \neq 1$, and hence by Proposition 1.25, which is valid for $S$, we have $b - 1 \in S$. But then $b - 1 \notin \mathbb{N}$, because $b - 1 \in \mathbb{N}$ would imply $b = (b - 1) + 1 \in \mathbb{N}$. Hence, $b - 1 \in B$, which is a contradiction to the fact that $b$ is the smallest element of $B$. ∎
Once we have constructed the natural numbers, the next number system to consider is the integers.

Definition 1.29 The set $\mathbb{Z} := \{m \in \mathbb{R} : m \in \mathbb{N} \text{ or } m = 0 \text{ or } -m \in \mathbb{N}\}$ is called the set of integers.

We leave several proofs of natural properties of the integers to the reader.

Proposition 1.30 The integers are closed under addition, subtraction and multiplication. Moreover, for any two integers $k, l$ with $k > l$ we have that $k - l \geq 1$, every nonempty set $A \subseteq \mathbb{Z}$ that is bounded below has a minimum, and every nonempty set $A \subseteq \mathbb{Z}$ that is bounded above has a maximum.

Proof. To prove that $\mathbb{Z}$ is closed under addition, let $m, n \in \mathbb{Z}$. In case both are natural numbers or in case one of them is zero, there is nothing to prove. Moreover, in case $-m, -n \in \mathbb{N}$ we have $m + n = -((-m) + (-n))$, which is in $\mathbb{Z}$, because $(-m) + (-n) \in \mathbb{N}$. Now consider the case $m \in \mathbb{N}$ and $-n \in \mathbb{N}$. If $m = -n$, we obtain $m + n = 0 \in \mathbb{Z}$. If $m > -n$, then by Proposition 1.25 we conclude that $m + n = m - (-n) \in \mathbb{N} \subseteq \mathbb{Z}$. Finally, if $m < -n$ again by Proposition 1.25 we conclude that $-(m + n) = (-n) - m \in \mathbb{N}$, which means by definition of $\mathbb{Z}$ that $m + n \in \mathbb{Z}$. The case $-m \in \mathbb{N}$ and $n \in \mathbb{N}$ is treated similarly (see Exercise 1-23a).

Closedness under subtraction and multiplication as well as the claim about differences are left to Exercises 1-23b to 1-23d.

Now let $A \subseteq \mathbb{Z}$ be nonempty and bounded below. Then, because $A \subseteq \mathbb{R}$, it has an infimum $a$. By the version of Proposition 1.21 for infima, there is an integer $m \in A$ with $m - a < 1$. Because the absolute value of the difference between any two distinct integers is at least $1$, $m$ is the only integer in $[a, a + 1)$. Hence, $m$ is below all elements of $A$ that are not in $[a, a + 1)$. Because $m$ is the only element of $A$ in $[a, a + 1)$, $m$ must be the minimum of $A$.

The proof of the corresponding result for nonempty subsets $A \subseteq \mathbb{Z}$ that are bounded above is left to Exercise 1-23e. ∎

A key property of the natural numbers is that any real number is exceeded by a natural number.
To prove this, we need the usual fractions, which are easily introduced.

Definition 1.31 For all $a \in \mathbb{R} \setminus \{0\}$ we set $\frac{1}{a} := a^{-1}$ and call it the reciprocal of $a$. For $b \in \mathbb{R}$ and $a \in \mathbb{R} \setminus \{0\}$ we set $\frac{b}{a} := b \cdot \frac{1}{a} = b a^{-1}$ and call it a fraction.

Because $\frac{1}{2} + \frac{1}{2} = 2^{-1} + 2^{-1} = (1 + 1) \cdot 2^{-1} = 2 \cdot 2^{-1} = 1$ we can now prove the following.

Theorem 1.32 For every $x \in \mathbb{R}$, there is an $n \in \mathbb{N}$ so that $n \ge x$.

Proof. For a contradiction, suppose that $x$ is such that for all $n \in \mathbb{N}$ we have that $n < x$. Then $B := \{ y \in \mathbb{R} : (\forall n \in \mathbb{N} : n < y) \}$ is not empty. Moreover, $B$ is bounded below by all $n \in \mathbb{N}$. By the Completeness Axiom, $B$ has an infimum, call it $b$. Then $b - \frac{1}{2} \notin B$, which means there is an $n \in \mathbb{N}$ with $n \ge b - \frac{1}{2}$. But then $n + 1 \ge b + \frac{1}{2}$ is a lower bound of $B$, a contradiction to $b = \inf(B)$.

Because $\mathbb{N} \subseteq \mathbb{Z}$ and because subsets of $\mathbb{Z}$ that are bounded below have a minimum, we infer that for every real number $x$ there is a unique smallest integer that is greater than or equal to $x$. Similarly there is a unique largest integer that is less than or equal to $x$. These numbers are useful when we need integers instead of real numbers, so we define the following.

Definition 1.33 For every $x \in \mathbb{R}$, let $\lceil x \rceil$ be the smallest integer greater than or equal to $x$. Moreover, let $\lfloor x \rfloor$ be the largest integer less than or equal to $x$. As functions from $\mathbb{R}$ to $\mathbb{Z}$, $\lceil \cdot \rceil$ is called the ceiling function and $\lfloor \cdot \rfloor$ is called the floor function.

The last subset of $\mathbb{R}$ that we introduce is the set of rational numbers. Rational numbers are naturally defined as fractions.

Definition 1.34 The set $\mathbb{Q} := \left\{ \frac{n}{d} : n \in \mathbb{Z}, d \in \mathbb{N} \right\}$ is called the set of rational numbers. The set $\mathbb{R} \setminus \mathbb{Q} := \{ x \in \mathbb{R} : x \notin \mathbb{Q} \}$ is called the set of irrational numbers.

Proposition 1.35 The rational numbers are closed under addition, subtraction and multiplication. Moreover, if $q, r \in \mathbb{Q}$ and $r \ne 0$, then $\frac{q}{r} \in \mathbb{Q}$.

Proof. Let $m, n \in \mathbb{Z}$, let $c, d \in \mathbb{N}$ and consider the rational numbers $\frac{m}{c}$ and $\frac{n}{d}$.
Then $\mathbb{Q}$ is closed under addition because
\[ \frac{m}{c} + \frac{n}{d} = m c^{-1} + n d^{-1} = m d d^{-1} c^{-1} + n c c^{-1} d^{-1} = (md + nc) c^{-1} d^{-1} = \frac{md + nc}{cd}. \]
For multiplication, note that $\frac{m}{c} \cdot \frac{n}{d} = m c^{-1} n d^{-1} = m n c^{-1} d^{-1} = \frac{mn}{cd}$. The remainder is left to Exercise 1-24.

Rational numbers can be found between any two real numbers and Exercise 1-45 will establish a similar result for irrational numbers.

Theorem 1.36 Let $a, b \in \mathbb{R}$ with $a < b$. Then there is a rational number $q \in \mathbb{Q}$ such that $a < q < b$.

Proof. By Theorem 1.32, there is an $n \in \mathbb{N}$ so that $0 < \frac{1}{b-a} < n$. By part 5 of Theorem 1.10, we obtain $\frac{1}{n} < b - a$. Now let $u := \min \left\{ m \in \mathbb{Z} : \frac{m}{n} \ge b \right\}$ and similarly let $l := \max \left\{ m \in \mathbb{Z} : \frac{m}{n} \le a \right\}$. Then $\frac{u}{n} - \frac{l}{n} \ge b - a > \frac{1}{n}$, which means $\frac{l+1}{n} < \frac{u}{n}$. Hence, by definition of $l$ and $u$ we infer $a < \frac{l+1}{n} < b$.

We conclude with a simple looking result that is actually at the heart of a standard proof technique (see Standard Proof Technique 2.7). Exercise 1-25 extends Theorem 1.37 to inequalities.

Theorem 1.37 Let $x \in \mathbb{R}$. If $x \ge 0$ and for all $\varepsilon > 0$ we have $x \le \varepsilon$, then $x = 0$.

Proof. Let $x$ be as indicated and suppose for a contradiction that $x > 0$. Then $\varepsilon := \frac{x}{2}$ is positive and $x \le \varepsilon = \frac{x}{2}$ implies $1 \le \frac{1}{2}$, a contradiction.

Exercises

1-20. Prove that if $m, n \in \mathbb{N}$, then $mn \in \mathbb{N}$.
Hint. Same idea as the first part of the proof of Proposition 1.24 with sets $S_m := \{ n \in \mathbb{N} : mn \in \mathbb{N} \}$.

1-21. Prove that if $n \in \mathbb{N}$, then $n \ge 1$.
Hint. Use $S := \{ n \in \mathbb{N} : n \ge 1 \}$.

1-22. Use the Well-ordering Theorem to prove the Principle of Induction.

1-23. Finish the proof of Proposition 1.30 by proving the following.
(a) Finish the proof that $\mathbb{Z}$ is closed under addition. That is, prove that if $-m \in \mathbb{N}$ and $n \in \mathbb{N}$, then $m + n \in \mathbb{Z}$.
(b) Prove that $\mathbb{Z}$ is closed under subtraction. That is, prove that $m - n \in \mathbb{Z}$ for all $m, n \in \mathbb{Z}$.
(c) Prove that $\mathbb{Z}$ is closed under multiplication. That is, prove that $mn \in \mathbb{Z}$ for all $m, n \in \mathbb{Z}$.
(d) Prove that for any two integers $m, n$ with $m > n$ we have $m - n \ge 1$.
Hint. Find a contradiction to Proposition 1.26.
(e) Prove that every nonempty set $A \subseteq \mathbb{Z}$ that is bounded above has a maximum.

1-24. Finish the proof of Proposition 1.35. That is,
(a) Prove that $\mathbb{Q}$ is closed under subtraction.
(b) Prove that if $q, r \in \mathbb{Q}$ and $r \ne 0$, then $\frac{q}{r} \in \mathbb{Q}$.
Hint. First show that for $n \in \mathbb{Z} \setminus \{0\}$ and $d \in \mathbb{N}$ we have that $\left( \frac{n}{d} \right)^{-1} = \frac{d}{n}$.

1-25. Prove that if $a, b \in \mathbb{R}$ are such that for all $\varepsilon > 0$ we have $a \le b + \varepsilon$, then $a \le b$.

1-26. Prove that for every real number $x$ there is an integer $n$ so that $n \le x$.

1-27. Prove that for any real numbers $x, \varepsilon > 0$ there is an $n \in \mathbb{N}$ so that $\frac{x}{n} < \varepsilon$.
Hint. Theorem 1.32.

1-28. Prove that $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$.

1-29. A rational number $r$ is called a dyadic rational number iff there are $p \in \mathbb{Z}$ and $n \in \mathbb{N}$ so that $r = \frac{p}{2^n}$. Dyadic rational numbers are useful in analysis because they can provide a sequence of "grids" such that each new grid contains the old one (see part 1-29a below), the whole set is the union of the "grids" (see part 1-29b) and between any two real numbers there is a dyadic rational number (see part 1-29c). Let $D$ be the set of dyadic rational numbers and for each $n \in \mathbb{N}$ let $D_n := \left\{ \frac{p}{2^n} : p \in \mathbb{Z} \right\}$.
(a) Prove that for all $n \in \mathbb{N}$ we have $D_n \subseteq D_{n+1}$.
(b) Let $\bigcup_{n=1}^{\infty} D_n := \{ x \in \mathbb{R} : (\exists n \in \mathbb{N} : x \in D_n) \}$ and prove that $D = \bigcup_{n=1}^{\infty} D_n$.
(c) Prove that for any $x, y \in \mathbb{R}$ with $x < y$ there is a dyadic rational number $d$ so that $x < d < y$.

1-30. In this exercise, we will prove that the real numbers are the (up to isomorphism) unique linearly ordered complete field. That is, we will prove that every mathematical object that satisfies Axioms 1.1, 1.6, and 1.19 is in a certain sense (defined below) "the same as $\mathbb{R}$." First notice that, similar to the proof of Theorem 1.28, all results proved so far hold for any object that satisfies Axioms 1.1, 1.6, and 1.19 (because the results are derived from these axioms).
That is, every set $\mathbb{R}$ that satisfies Axioms 1.1, 1.6, and 1.19 contains subsets $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Q}$ that have the properties that we have proved up to now for the natural numbers, the integers and the rational numbers.
(a) Prove that for all $x \in \mathbb{R}$ we have that $x = \sup \{ r \in \mathbb{Q} : r \le x \}$.
(b) Now let $\tilde{\mathbb{R}}$ be a set that satisfies Axioms 1.1, 1.6, and 1.19 and let $\tilde{\mathbb{N}}$, $\tilde{\mathbb{Z}}$, and $\tilde{\mathbb{Q}}$ be subsets of $\tilde{\mathbb{R}}$ that have the properties that we have proved up to now for the natural numbers, the integers and the rational numbers, including Exercise 1-30a.
i. Define a function $f : \mathbb{Q} \to \tilde{\mathbb{Q}}$ as follows. For $n \in \mathbb{N}$, let $f(1) := \tilde{1}$ and once $f(n)$ is defined let $f(n+1) := f(n) + \tilde{1}$. Also let $f(-n) := -f(n)$. For $n \in \mathbb{Z}$ and $d \in \mathbb{N}$ let $f\left( \frac{n}{d} \right) := \frac{f(n)}{f(d)}$. Prove that for all $x \in \mathbb{Q}$ the above definition is not self-contradictory by proving that it assigns exactly one value to each $x \in \mathbb{Q}$. Then prove that $f(x) \in \tilde{\mathbb{Q}}$ for each $x \in \mathbb{Q}$ and that $f$ preserves the order, that is, if $x < z$, then $f(x) < f(z)$.
ii. For $x \in \mathbb{R}$ let $f(x) := \sup \{ f(r) : r \in \mathbb{Q} \text{ and } r \le x \}$. Prove that for all $x \in \mathbb{R}$ the above definition is not self-contradictory by proving it assigns exactly one value to each $x \in \mathbb{R}$. (Formally this says that $f$ is well-defined.)
iii. Prove that the above function does not map any two points to the same image by proving that for all $x, y \in \mathbb{R}$ the inequality $x \ne y$ implies that $f(x) \ne f(y)$. (Formally, this says that the function $f$ is one-to-one or injective.)
iv. Prove that the above function "reaches" every element of $\tilde{\mathbb{R}}$ by proving that for all $\tilde{x} \in \tilde{\mathbb{R}}$ there is an $x \in \mathbb{R}$ so that $f(x) = \tilde{x}$. (Formally, this says that the function $f$ is onto or surjective.)
v. Prove that the above function is consistent with the algebraic operations by proving that for all $x, y \in \mathbb{R}$ we have that $f(x + y) = f(x) \mathbin{\tilde{+}} f(y)$ and $f(xy) = f(x) \mathbin{\tilde{\cdot}} f(y)$. (Formally, this says that $f$ is a field isomorphism.)
vi.
Prove that the above function is consistent with the order relation by proving that for all $x, y \in \mathbb{R}$ we have that $x \le y$ implies that $f(x) \mathrel{\tilde{\le}} f(y)$. (Formally, this says that $f$ is an order isomorphism.)
The above steps show that the points and operations in $\mathbb{R}$ and in $\tilde{\mathbb{R}}$ can be identified with each other in such a way that it does not matter if we are working in $\mathbb{R}$ or in $\tilde{\mathbb{R}}$. Thus for all intents and purposes, $\mathbb{R}$ and $\tilde{\mathbb{R}}$ are "the same." This is the essence of saying that the real numbers are up to isomorphism the unique linearly ordered, complete field.

1.5 Recursion, Induction, Summations, and Products

A recursive definition defines an entity $X_n$ that depends on a natural number $n$ first for $n = 1$ and then it defines $X_{n+1}$ in terms of $X_n$. By the Principle of Induction the set $S = \{ n \in \mathbb{N} : X_n \text{ is defined} \}$ is equal to $\mathbb{N}$, which means that a recursive definition defines the entity $X_n$ for all natural numbers $n$. In this fashion, the sum of finitely many numbers can be defined.

Definition 1.38 For each $j \in \mathbb{N}$ let $a_j \in \mathbb{R}$. Define the sum $\sum_{j=1}^{1} a_j := a_1$ and for $n \in \mathbb{N}$ define the sum $\sum_{j=1}^{n+1} a_j := a_{n+1} + \sum_{j=1}^{n} a_j$. For $m \in \mathbb{N} \cup \{0\}$, set $\sum_{j=1}^{-m} a_j := 0$. The parameter $j$ is also called the summation index.

In particular, note that a sum whose index starts at 1 and ends at a number smaller than 1 is always zero. It is also called an empty sum. Summations that start at numbers other than 1 are defined similarly (Exercise 1-31).

By their nature, recursive definitions are closely linked to induction. Unlike what is stated in Theorem 1.23, induction normally is used to prove statements about natural numbers. This is possible, because a proof that a statement is true for all natural numbers is the same as a proof that a certain set is equal to $\mathbb{N}$.

Theorem 1.39 Principle of Induction. Let $P(n)$ be a statement about the natural number $n$.
If $P(1)$ is true and if for all $n \in \mathbb{N}$ truth of $P(n)$ implies truth of $P(n+1)$, then $P(n)$ holds for all natural numbers.

Proof. Let $P$ be as indicated and consider the set $S := \{ n \in \mathbb{N} : P(n) \text{ is true} \}$. Then $1 \in S$. For every $n \in S$ the statement $P(n)$ is true, hence $P(n+1)$ is true, which means $n + 1 \in S$. By Theorem 1.23 we conclude $S = \mathbb{N}$ and thus $P(n)$ is true for all $n \in \mathbb{N}$.

Standard Proof Technique 1.40 In the form of Theorem 1.39, induction is a standard proof technique. It involves a two-step process. In the first step, called the base step, $P(1)$ is proved. Then, in the induction step, $P(n)$ is used to prove $P(n+1)$. In this context, $P(n)$ is also called the induction hypothesis. All proofs in this section rely on induction. Moreover, Exercise 1-32 exhibits another way to carry out an induction (sometimes called strong induction).

Example 1.41 For all $n \in \mathbb{N}$, the summation formula $\sum_{j=1}^{n} j = \frac{1}{2} n (n+1)$ holds.

Proof. The statement is $P(n) =$ "$\sum_{j=1}^{n} j = \frac{1}{2} n (n+1)$."

Base step. We prove $P(n)$ for $n = 1$. $\sum_{j=1}^{1} j = 1 = \frac{1}{2} \cdot 1 \cdot (1 + 1)$, so $P(1)$ holds.

Induction step. Under the induction hypothesis $\sum_{j=1}^{n} j = \frac{1}{2} n (n+1)$ we must prove $\sum_{j=1}^{n+1} j = \frac{1}{2} (n+1)((n+1)+1)$. A standard step in induction for recursively defined quantities is to split off the last term. This is done in the first step here.
\[ \sum_{j=1}^{n+1} j = (n+1) + \sum_{j=1}^{n} j = (n+1) + \frac{1}{2} n (n+1) = \frac{1}{2} \cdot 2 (n+1) + \frac{1}{2} n (n+1) = \frac{1}{2} (n+2)(n+1). \]

Further examples of similar inductions can be found in Exercise 1-33. Similar to sums we can define products. Although products occur less frequently than sums, they are useful to define powers.

Definition 1.42 For each $j \in \mathbb{N}$, let $a_j \in \mathbb{R}$. Define the product $\prod_{j=1}^{1} a_j := a_1$ and for all $n \in \mathbb{N}$ define the product $\prod_{j=1}^{n+1} a_j := a_{n+1} \cdot \prod_{j=1}^{n} a_j$. For all $m \in \mathbb{N} \cup \{0\}$, set $\prod_{j=1}^{-m} a_j := 1$. The parameter $j$ is also called the product index.

Products that start at numbers other than 1 are defined similarly (Exercise 1-31).
Products that end at an index that is smaller than the starting index are set to 1 and are also called empty products.

Definition 1.43 For all $a \in \mathbb{R}$, and all $n \in \mathbb{N} \cup \{0\}$, we define the $n$th power $a^n := \prod_{j=1}^{n} a$.

Aside from integer powers of numbers, we want to work with rational powers. To define rational powers, we need $n$th roots of nonnegative real numbers. To formally prove their existence, we need the Binomial Theorem. As a start we need binomial coefficients and one of their key properties.

Definition 1.44 For all $n \in \mathbb{N} \cup \{0\}$, we define $n! := \prod_{j=1}^{n} j$ and call it the factorial of $n$. For all $n, k \in \mathbb{N} \cup \{0\}$ with $k \le n$, we define the binomial coefficient as $\binom{n}{k} := \frac{n!}{k!(n-k)!}$.

Theorem 1.45 The equation $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ holds for all $n, k \in \mathbb{N}$ with $k \le n$.

Proof. This result can be proved by direct computation.
\[
\begin{aligned}
\binom{n}{k-1} + \binom{n}{k} &= \frac{n!}{(k-1)!(n-(k-1))!} + \frac{n!}{k!(n-k)!} \\
&= \frac{n! \, k}{k!(n+1-k)!} + \frac{n! \, (n+1-k)}{k!(n+1-k)!} \\
&= \frac{n! \, (k + n + 1 - k)}{k!(n+1-k)!} = \frac{(n+1)!}{k!(n+1-k)!} = \binom{n+1}{k}.
\end{aligned}
\]

Now we are ready to prove the Binomial Theorem.

Theorem 1.46 The Binomial Theorem. For all real numbers $a, b \in \mathbb{R}$, and all $n \in \mathbb{N}$, we have
\[ (a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \]

Proof. Throughout the proof we will freely use the properties of sums proved in Exercise 1-34. The proof is by induction on $n$, with $P(n)$ being the statement about $(a+b)^n$.

Base step. For $n = 1$, note that
\[ (a+b)^1 = a + b = \binom{1}{0} a^0 b^1 + \binom{1}{1} a^1 b^0 = \sum_{k=0}^{1} \binom{1}{k} a^k b^{1-k}, \]
which proves the base step.

Induction step. Assuming that the result holds for $n$, we must prove it for $n + 1$. First note that it follows easily from Definition 1.43 that for all $x \in \mathbb{R}$ and all $m \in \mathbb{N}$ we have $x \cdot x^m = x^{m+1}$.
\[
\begin{aligned}
(a+b)^{n+1} &= (a+b)(a+b)^n = (a+b) \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \\
&= \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} \\
&= \sum_{j=1}^{n+1} \binom{n}{j-1} a^{j} b^{n+1-j} + \sum_{j=0}^{n} \binom{n}{j} a^j b^{n+1-j} \\
&= \binom{n}{n} a^{n+1} b^{n+1-(n+1)} + \sum_{j=1}^{n} \left[ \binom{n}{j-1} + \binom{n}{j} \right] a^j b^{n+1-j} + \binom{n}{0} a^0 b^{n+1-0} \\
&= \binom{n+1}{n+1} a^{n+1} b^{0} + \sum_{j=1}^{n} \binom{n+1}{j} a^j b^{n+1-j} + \binom{n+1}{0} a^0 b^{n+1} \\
&= \sum_{j=0}^{n+1} \binom{n+1}{j} a^j b^{n+1-j}.
\end{aligned}
\]

With the Binomial Theorem, we can prove that $n$th roots exist.
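As a quick sanity check of Theorems 1.45 and 1.46, the following sketch works out the smallest nontrivial instance by hand. The choice of $n = 2$ for Theorem 1.45 and $n = 3$ for the Binomial Theorem is an illustrative assumption, not part of the original text, and the fragment assumes the `amsmath` package for `align*`.

```latex
% Illustrative instance only; n = 2 and n = 3 are chosen for the example.
% Theorem 1.45 with n = 2, for k = 1 and k = 2:
\begin{align*}
\binom{2}{0} + \binom{2}{1} &= 1 + 2 = 3 = \binom{3}{1}, \\
\binom{2}{1} + \binom{2}{2} &= 2 + 1 = 3 = \binom{3}{2}.
\end{align*}
% Theorem 1.46 with n = 3, terms listed in order k = 0, 1, 2, 3:
\begin{align*}
(a+b)^3 &= \sum_{k=0}^{3} \binom{3}{k} a^k b^{3-k} \\
        &= \binom{3}{0} a^0 b^3 + \binom{3}{1} a^1 b^2
         + \binom{3}{2} a^2 b^1 + \binom{3}{3} a^3 b^0 \\
        &= b^3 + 3ab^2 + 3a^2 b + a^3.
\end{align*}
```

The two coefficients equal to 3 in the expansion are exactly the values $\binom{3}{1}$ and $\binom{3}{2}$ obtained from Theorem 1.45, which is how the induction step passes from exponent 2 to exponent 3.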
The proof of Theorem 1.47 is the first proof in this text in which we have to choose a number to make another number smaller than a given bound. That is, this is our first proof with a distinct analytical flavor.

Theorem 1.47 Let $n \in \mathbb{N}$. For every nonnegative real number $a$, there exists a unique nonnegative real number $r$ such that $r^n = a$.

Proof. We first prove the existence of $r$. Let $R := \{ x \in \mathbb{R} : x \ge 0 \text{ and } x^n \le a \}$. Then $0 \in R$ and $R$ is bounded above by $\max\{1, a\}$. Let $r := \sup(R)$. To show that $r^n = a$, we will show that $r^n \not< a$ and $r^n \not> a$.

First, suppose for a contradiction that $r^n < a$. Then there is a $\delta > 0$ so that $r^n + \delta < a$. By Theorem 1.32 (or Exercise 1-27), for each $k \in \{1, \ldots, n\}$ we can find an $m_k \in \mathbb{N}$ so that $\binom{n}{k} r^{n-k} \frac{1}{m_k} < \frac{\delta}{n}$. Let $m := \max\{m_1, \ldots, m_n\}$. Then by the Binomial Theorem we conclude
\[ \left( r + \frac{1}{m} \right)^n = r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \left( \frac{1}{m} \right)^k \le r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \frac{1}{m_k} < r^n + \sum_{k=1}^{n} \frac{\delta}{n} = r^n + \delta < a. \]
The above shows that $r + \frac{1}{m} \in R$, contradicting the fact that $r = \sup(R)$. Hence, $r^n \not< a$. The proof that $r^n \not> a$ is similar and left to the reader as Exercise 1-36.

For uniqueness, suppose for a contradiction that there is another $b \ge 0$ with $b^n = a$. Then $b < r$ or $b > r$. But if $b > r$, then with $\delta := b - r$ we obtain $a = b^n = (r + \delta)^n = r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \delta^k > a$, a contradiction. Hence, $b < r$. But then with $\delta := r - b$ we have $a = r^n = (b + \delta)^n = b^n + \sum_{k=1}^{n} \binom{n}{k} b^{n-k} \delta^k > a$, a contradiction. Therefore $r$ is unique.

We conclude by defining rational powers of nonnegative numbers and by proving some of their properties.

Definition 1.48 Let $n \in \mathbb{N}$ and let $a \in \mathbb{R}$ be nonnegative. The unique nonnegative real number $r$ such that $r^n = a$ is called the $n$th root of $a$, denoted $\sqrt[n]{a}$. For $n = 2$ the root is called the square root, denoted $\sqrt{a}$.

Existence of $n$th roots is another property that distinguishes $\mathbb{R}$ from $\mathbb{Q}$. Although Theorem 1.36 indicates that there are "many" rational numbers, the rational number system has some shortcomings when it comes to powers.

Proposition 1.49 There is no rational number $r$ such that $r^2 = 2$.
Proof. We first prove by induction as stated in Exercise 1-32 (strong induction) that if $n^2 = 2z$ for some $z \in \mathbb{N}$, then $n = 2z'$ for some $z' \in \mathbb{N}$. The base step for $n = 1$ is vacuously true. That is, because the hypothesis $1^2 = 2z$ leads to the contradiction $1 = 1^2 = 2z = z + z > 1$, the hypothesis is never true, which means that the implication is automatically true (see Definition A.2 in Appendix A).

For the induction step, first note that the result is trivial for $n = 2$, because $2 = 2 \cdot 1$. Now assume that $n > 2$ and the statement has been proved for all natural numbers less than $n$. Then $2z = n^2 = (n - 2 + 2)^2 = (n-2)^2 + 4(n-2) + 4$ implies that $(n-2)^2 = 2\tilde{z}$ for some $\tilde{z} \in \mathbb{N}$. By induction hypothesis, we conclude that $n - 2 = 2\hat{z}$ for some $\hat{z} \in \mathbb{N}$, and hence $n = 2\hat{z} + 2 = 2z'$ for some $z' \in \mathbb{N}$. This proves that if $n^2 = 2z$ for some $z \in \mathbb{N}$, then $n = 2z'$ for $z' = \hat{z} + 1 \in \mathbb{N}$.

Now suppose for a contradiction that there are $n \in \mathbb{Z}$ and $d \in \mathbb{N}$ so that $\left( \frac{n}{d} \right)^2 = 2$ and such that there is no $k \in \mathbb{N} \setminus \{1\}$ such that $n = n_k \cdot k$ and $d = d_k \cdot k$ with $n_k \in \mathbb{Z}$ and $d_k \in \mathbb{N}$. But by the above, $n^2 = 2d^2$ implies $n = n_2 \cdot 2$. Consequently, $2d^2 = (n_2 \cdot 2)^2$, that is, $d^2 = n_2^2 \cdot 2$, which implies $d = d_2 \cdot 2$, a contradiction.

We conclude from Theorem 1.47 and Proposition 1.49 that $\sqrt{2}$ is irrational.

For odd natural numbers (that is, natural numbers of the form $n = 2k + 1$), it is possible to define the $n$th root of a negative number $a < 0$ as $\sqrt[n]{a} := -\sqrt[n]{-a}$. For the most part, powers are considered for nonnegative numbers, though.

Definition 1.50 For all real numbers $a \ge 0$, all $m \in \mathbb{N} \cup \{0\}$, $n \in \mathbb{N}$ and all $q \in \mathbb{Q}$ with $q > 0$ we define
1. $a^{\frac{1}{n}} := \sqrt[n]{a}$. That is, the $\left( \frac{1}{n} \right)$th power of $a$ is the $n$th root of $a$.
2. $a^{\frac{m}{n}} := (a^m)^{\frac{1}{n}}$.
3. $a^{-q} := (a^q)^{-1} = \frac{1}{a^q}$ for $a \ne 0$.

Theorem 1.51 For all positive numbers $a$ and $b$ and all rational numbers $x$ and $y$, the following power laws hold:
\[ a^x a^y = a^{x+y}, \qquad \frac{a^x}{a^y} = a^{x-y}, \qquad (ab)^x = a^x b^x, \qquad \left( \frac{a}{b} \right)^x = \frac{a^x}{b^x}, \qquad (a^x)^y = a^{xy}. \]

Proof. We first prove $(ab)^x = a^x b^x$. For exponents $n \in \mathbb{N}$, this is an easy induction.
The base step $n = 1$ is trivial and the induction step from $n$ to $n + 1$ is $(ab)^{n+1} = ab(ab)^n = ab \, a^n b^n = a a^n b b^n = a^{n+1} b^{n+1}$.

For rational exponents $\frac{n}{d}$ with $n, d \in \mathbb{N}$, we have that $\left( (ab)^{\frac{n}{d}} \right)^d = (ab)^n$ and $\left( a^{\frac{n}{d}} b^{\frac{n}{d}} \right)^d = \left( a^{\frac{n}{d}} \right)^d \left( b^{\frac{n}{d}} \right)^d = a^n b^n = (ab)^n$. Note that in both equalities we used the definition of fractional powers, not the power law that we are currently proving. Because all numbers involved are positive and $d$th roots are unique, we conclude that $(ab)^{\frac{n}{d}} = a^{\frac{n}{d}} b^{\frac{n}{d}}$.

For $x = 0$, the equality $(ab)^x = a^x b^x$ is trivial. Finally, for all positive $x \in \mathbb{Q}$ we note $(ab)^{-x} a^x b^x = (ab)^{-x} (ab)^x = 1$. Therefore $(ab)^{-x}$ is the multiplicative inverse of $a^x b^x$, that is, $(ab)^{-x} = a^{-x} b^{-x}$. Thus $(ab)^x = a^x b^x$ for all $a, b > 0$ and all $x \in \mathbb{Q}$.

To prove that $a^{x+y} = a^x a^y$ we proceed similarly. For exponents $m, n \in \mathbb{N}$, the proof for arbitrary $m$ is an induction on $n$. The base step $a^m a^1 = a^m a = a^{m+1}$ follows straight from the definition of powers with natural exponents. For the induction step from $n$ to $n + 1$, note that $a^m a^{n+1} = a^m a^n a = a^{m+n} a = a^{m+(n+1)}$, which proves the result for exponents $m, n \in \mathbb{N}$.

For positive rational exponents $x$ and $y$, note that there are $m, n, d \in \mathbb{N}$ so that $x = \frac{m}{d}$ and $y = \frac{n}{d}$. Then, using the equality we already proved, we obtain
\[ a^x a^y = a^{\frac{m}{d}} a^{\frac{n}{d}} = (a^m)^{\frac{1}{d}} (a^n)^{\frac{1}{d}} = (a^m a^n)^{\frac{1}{d}} = (a^{m+n})^{\frac{1}{d}} = a^{\frac{m+n}{d}} = a^{x+y}. \]

The equality is trivial if one of $x$ and $y$ is zero. In case both exponents are negative, note that for all positive $x, y \in \mathbb{Q}$ we have $a^{x+y} a^{-x} a^{-y} = a^x a^y a^{-x} a^{-y} = 1$, which means $a^{-x} a^{-y} = a^{-x+(-y)}$ as was to be proved.

This leaves the case in which one exponent is positive and the other is negative. Let $x, y \in \mathbb{Q}$ be positive and consider $a^{x-y}$. If the inequality $|x| > |y|$ holds we have $a^{x-y} a^{y} a^{-x} = a^x a^{-x} = 1$, which means that $a^{x-y} = a^x a^{-y}$. If $|x| < |y|$ we have $a^{y-x} a^x a^{-y} = a^y a^{-y} = 1$, which means that $a^{y-x} = a^y a^{-x}$. If $|x| = |y|$ the claim is trivial.
Thus $a^{x+y} = a^x a^y$ for all $a > 0$ and all $x, y \in \mathbb{Q}$.

We leave the remaining three equalities as Exercise 1-37. Power laws for $a \le 0$ and $b \le 0$ (as applicable) can be proved similarly.

To conclude, note that the results presented in this chapter guarantee that the real numbers have the properties we expect them to have. We will therefore use the usual notation (fractions, etc.) and laws of algebra throughout this text without further qualms about the need to justify that we are indeed allowed to do so.

Exercises

1-31. Let $k, m \in \mathbb{Z}$ and for each $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Define the sum $\sum_{j=k}^{m} a_j$ and the product $\prod_{j=k}^{m} a_j$.

1-32. Let $P(n)$ be a statement about the natural number $n$. Prove that if $P(1)$ is true and if for all $n \in \mathbb{N} \setminus \{1\}$ truth of $P(1), \ldots, P(n-1)$ implies truth of $P(n)$, then $P(n)$ holds for all natural numbers. This type of induction is sometimes called strong induction.
Hint. Consider $S := \{ n \in \mathbb{N} : (\forall k < n : P(k) \text{ holds}) \}$.

1-33. Prove each of the following by induction.
(b) $\sum_{j=1}^{n} j^3 = \frac{1}{4} n^2 (n+1)^2$
(d) Bernoulli's inequality. Prove that for all real numbers $x > -1$, $x \ne 0$ and $n \ge 2$ we have that $(1 + x)^n > 1 + nx$.

1-34. Properties of sums and products. Let $c \in \mathbb{R}$ and for all $j \in \mathbb{N}$ let $a_j$ and $b_j$ be real numbers.
(a) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} (a_j + b_j) = \sum_{j=1}^{n} a_j + \sum_{j=1}^{n} b_j$.
(b) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} (c a_j) = c \sum_{j=1}^{n} a_j$.
(c) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} 1 = n$.
(d) Prove that for all $n \in \mathbb{N}$ we have $\prod_{j=1}^{n} (a_j \cdot b_j) = \left( \prod_{j=1}^{n} a_j \right) \cdot \left( \prod_{j=1}^{n} b_j \right)$.

1-35. Reindexing sums. Let $s \in \mathbb{Z}$, $n \in \mathbb{N}$ and for $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Prove that $\sum_{i=s}^{s+n} a_i = \sum_{k=1}^{n+1} a_{k+s-1}$.

1-36. Finish the proof of Theorem 1.47 by showing that $r^n \not> a$.
Hint. Suppose $r^n > a$ and prove that then for some $\varepsilon > 0$ and all $\delta \in (0, \varepsilon)$ we have $r - \delta \notin R$.

1-37. Finish the proof of Theorem 1.51. That is, let $a$ and $b$ be positive real numbers, let $x, y \in \mathbb{Q}$ and prove each of the following.
(c) $(a^x)^y = a^{xy}$

1-38.
Let $0 \le a < b$ and let $q > 0$ be rational. Prove that $a^q < b^q$.

1-39. Let $a, x \in (0, \infty)$ and let $x$ be a rational number.
(a) Prove that if $a > 1$ and $x > 1$, then $a^x > a$.
Hint. Let $p, q \in \mathbb{N}$ be so that $x = \frac{p}{q}$ and compare $a^p$ and $a^q$.
(b) Prove that if $a < 1$ and $x < 1$, then $a^x > a$.
(c) Prove that if $a > 1$ and $x < 1$, then $a^x < a$.
(d) Prove that if $a < 1$ and $x > 1$, then $a^x < a$.

1-40. Let $n \in \mathbb{N}$. Prove that $\binom{n}{0} = 1$ and that $\binom{n}{n} = 1$.

1-41. Prove that there is no rational number $r$ such that $r^2 = 3$.

1-42. Prove that for any $n$ real numbers $x_1, \ldots, x_n$ the inequality …

1-43. Prove that for all $a, b \ge 0$ the inequality $\sqrt{ab} \le \frac{a+b}{2}$ holds.

1-44. (a) Prove that for all $a, b \in \mathbb{R}$ we have $\sqrt{a^2 + b^2} \le |a| + |b|$.
(b) Prove that for any $a_1, \ldots, a_n \in \mathbb{R}$ we have …

1-45. Let $a, b \in \mathbb{R}$ with $a < b$. Prove that there is an irrational number $x \in \mathbb{R} \setminus \mathbb{Q}$ such that $a < x < b$.
Hint. Use that $\sqrt{2}$ is irrational and Exercise 1-27 and mimic the proof of Theorem 1.36.

Chapter 2
Sequences of Real Numbers

Convergence is the fundamental concept of analysis. It explores what happens when two quantities get close to each other, or when a quantity grows beyond all bounds. This chapter exhibits these ideas for sequences, with special emphasis on standard proof techniques.

2.1 Limits

We start by defining sequences.

Definition 2.1 A sequence of real numbers is a function $f$ from the natural numbers to the real numbers. To emphasize their discrete nature, we denote sequences as $\{a_n\}_{n=1}^{\infty}$ with the understanding that $a_n = f(n)$ for all $n \in \mathbb{N}$.

Similar to sums and products, a sequence can actually start at any integer $k$ (Exercise 2-1). The limit of a sequence should be the place where the sequence "stabilizes" for large $n$. Definition 2.2 encodes this property by demanding that for every given tolerance $\varepsilon$, there is a threshold $N$ so that once the running index $n$ has gone past the threshold $N$, the sequence can only deviate from the limit by less than the tolerance $\varepsilon$.

Definition 2.2 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
Then $L \in \mathbb{R}$ is called limit of $\{a_n\}_{n=1}^{\infty}$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have that $|a_n - L| < \varepsilon$ (see Figure 2). A sequence that has a limit will be called convergent, a sequence that does not have a limit will be called divergent.

[Figure 2 labels: finitely many $n$ are mapped below $L - \varepsilon$; all $n \ge N$ (a "tail" of $\mathbb{N}$) are mapped to $(L - \varepsilon, L + \varepsilon)$; finitely many $n$ are mapped above $L + \varepsilon$.]
Figure 2: Visualization of convergence to $L$. For every $\varepsilon > 0$ a "tail" of the sequence is in $(L - \varepsilon, L + \varepsilon)$.

Remark 2.3 It can be helpful to restate the definition using quantifiers (see Definition A.3 in Appendix A). $L \in \mathbb{R}$ is a limit of $\{a_n\}_{n=1}^{\infty}$ iff
\[ \forall \varepsilon > 0 : \exists N \in \mathbb{N} : \forall n \ge N : |a_n - L| < \varepsilon. \]
Formal statements with quantifiers enforce the rule that a variable must be defined or quantified before it can be used. We will usually enforce the same rule in natural language. Although the prose becomes a bit rigid this way, in nested quantifications like the definition of limits it is clearer to say ". . . for all $x$ we have $P(x)$ . . ." than to say ". . . $P(x)$ holds for all $x$ . . ."

Note that we did not speak of the limit of a sequence in Definition 2.2. This is because we have not proved yet that every convergent sequence has only one limit. The next result shows that this is indeed the case.

Proposition 2.4 Limits of sequences of real numbers are unique. That is, if $\{a_n\}_{n=1}^{\infty}$ is a sequence of real numbers and both $L$ and $M$ are limits of $\{a_n\}_{n=1}^{\infty}$, then $L = M$.

Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence and let $L$ and $M$ be limits of $\{a_n\}_{n=1}^{\infty}$. We need to prove that $L = M$. By Theorem 1.37 we can do so by showing that for all $\varepsilon > 0$ the inequality $|L - M| < \varepsilon$ holds.

Let $\varepsilon > 0$ be arbitrary but fixed. Then there is an $N_1 \in \mathbb{N}$ such that for all $n \ge N_1$ we have that $|a_n - L| < \frac{\varepsilon}{2}$. There also is an $N_2 \in \mathbb{N}$ such that for all $n \ge N_2$ we have $|a_n - M| < \frac{\varepsilon}{2}$. Let $N := \max\{N_1, N_2\}$. Because $N \ge N_1$, for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$.
Because $N \ge N_2$, for all $n \ge N$ we have $|a_n - M| < \frac{\varepsilon}{2}$. Then by adding and subtracting $a_N$ and applying the triangular inequality we obtain (with $n = N$)
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Because for arbitrary $\varepsilon > 0$ we have the inequality $|L - M| < \varepsilon$, by Theorem 1.37 we conclude that $|L - M| = 0$, and hence $L = M$.

We will ultimately read and produce proofs that are much more complicated than the proof of Proposition 2.4. Therefore it is only appropriate to analyze how such proofs can be conceived. The standard proof techniques discussed later in this section reveal that certain details are indeed standard techniques which simply need to be internalized and used at the right time. Other than that, the novice usually is impressed by the sometimes "strange" choices for $\varepsilon$. The reason why $\frac{\varepsilon}{2}$ is chosen in the proof of Proposition 2.4 is that the proof is actually created backwards. Consider the following.

To show that $L = M$ we first note that because the $a_n$ are eventually close to $L$ and close to $M$, we can put an $a_n$ with a sufficiently large index between $L$ and $M$. After applying the triangular inequality the resulting differences should be small. By Theorem 1.37, if for all $\varepsilon > 0$ we can make the difference $|L - M|$ less than or equal to $\varepsilon$ then $|L - M| = 0$, that is, $L = M$. So we want to make the sum of the differences $|L - a_n|$ and $|a_n - M|$ smaller than $\varepsilon$. It is most natural to make each of the two terms smaller than $\frac{\varepsilon}{2}$ to obtain
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
This argument provides the final few lines of the proof. Note that up to here we have not chosen any $N_1$, $N_2$, or $N$. However, now that we have the "meat" of the argument, it is easy to create the "header."

To make $|a_n - L|$ and $|a_n - M|$ smaller than $\frac{\varepsilon}{2}$, we use that by the definition of convergence there are $N_1$ and $N_2$ as mentioned in the proof and choose $N$ to be their maximum so that both required inequalities hold for indices beyond $N$.
Note that even though the "header" is the first thing we encounter, it is often the last thing that materializes as a proof is created. So, to set up an analysis proof it is standard practice to start by working with inequalities. Once the inequalities work, we create a "header" with the appropriate choices for $\varepsilon$, $n$, and so on.

Standard proof techniques in analysis. Certain steps occur so frequently in analysis proofs that they should become second nature. In this fashion, communication becomes more effective because memory is less strained to recall details of proofs. This is a cognitive technique commonly known as "chunking" of data. By internalizing certain standard "chunks," larger amounts of data can be recalled, because we only need to recall which chunks are involved rather than all details. Unlike the standard proof techniques listed so far, from here on most standard proof techniques will be specific to analysis. The standard techniques used in the proof of Proposition 2.4 are listed below.

Standard Proof Technique 2.5 It is common practice to rearrange terms and to add and subtract the same term to obtain more manageable expressions. When working with absolute values, this is often done in conjunction with the triangular inequality. In Proposition 2.4 this is the step $|L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M|$. Such a step is usually abbreviated as $|L - M| \le |L - a_N| + |a_N - M|$. In other computations, we will see that it can also be useful to multiply and divide by the same nonzero term.

Standard Proof Technique 2.6 If finitely many numbers $N_1, \ldots, N_k \in \mathbb{N}$ are such that for all $n \ge N_i$ a certain inequality holds, we can choose $N := \max\{N_1, \ldots, N_k\}$. Then for all $n \ge N$ all these inequalities hold. We usually claim directly that such an $N$ exists, skipping the intermediate $N_i$.
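To make Standard Proof Technique 2.6 concrete, here is a small worked instance. The two inequalities, the tolerance $\varepsilon = \frac{1}{10}$, and the resulting thresholds are hypothetical values chosen purely for illustration; they do not appear in the original text. The fragment assumes the `amsmath` package for `align*`.

```latex
% Hypothetical illustration of choosing N as a maximum of thresholds.
% Assumed tolerance: epsilon = 1/10, so epsilon/2 = 1/20.
\begin{align*}
\frac{1}{n} < \frac{1}{20} &\iff n > 20, & &\text{so } N_1 := 21 \text{ works}, \\
\frac{3}{n} < \frac{1}{20} &\iff n > 60, & &\text{so } N_2 := 61 \text{ works}.
\end{align*}
% With N := max{N_1, N_2} = 61, both inequalities hold for all n >= N:
\begin{align*}
\frac{1}{n} + \frac{3}{n} < \frac{1}{20} + \frac{1}{20} = \frac{1}{10}
\qquad \text{for all } n \ge N = 61.
\end{align*}
```

Taking the maximum costs nothing: any index past the larger threshold is automatically past the smaller one, which is why proofs usually state only the final $N$.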
Standard Proof Technique 2.7 To prove that two quantities are equal, we can prove that for any $\varepsilon > 0$ the absolute value of the difference is less than $\varepsilon$. This is usually done without explicit reference to Theorem 1.37. To prove an inequality $a \le b$ we often prove that for all $\varepsilon > 0$ the inequality $a \le b + \varepsilon$ holds (see Exercise 1-25).

Standard Proof Technique 2.8 In many analysis proofs we prove results about a universally quantified variable, often denoted $\varepsilon$. To prove such results, we pick one such $\varepsilon$ that is "arbitrary, but fixed," throughout the proof. It must be fixed throughout the proof so we can uniquely define quantities that depend on it, and it must be arbitrary so that we really prove something about all variables in the scope of the universal quantification. Once the result is proved, we can conclude that "Because $\varepsilon$ was arbitrary we have proved . . . for all such $\varepsilon$." This final statement reiterates that, even though we made specific choices for the $\varepsilon$ in the proof, we can indeed make these choices for all $\varepsilon$, which proves the universally quantified statement. Because this approach is so common, the bracketing statements about the variable $\varepsilon$ are usually left out or abbreviated.

Standard Proof Technique 2.9 Finally, statements like "We need to prove $L = M$," that are put at the start to remind the reader what we will prove are often left out. Similarly, statements put at the end to reiterate what we have proved are often left out, too.

To phase in these techniques, we will first carry them out explicitly and give a reference to the above list. Then we will omit the explicit step, but still refer to the appropriate entry in the above list. Eventually a proof for something like Proposition 2.4 will condense to the following.

"Expert Proof" of Proposition 2.4. Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ such that for all $n \ge N$ the inequalities $|a_n - L| < \frac{\varepsilon}{2}$ and $|a_n - M| < \frac{\varepsilon}{2}$ hold.
Therefore
\[ |L - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]
which implies $L = M$.

The "expert proof" contains all relevant information needed to communicate the idea. It does not list the details of standard techniques, which are assumed to be known. In this fashion the reader may have to fill in more "steps between the lines," but as long as these steps can be considered manageable, communication becomes more effective.

Definition 2.10 Because the limit is unique, we speak of the limit of a sequence. The notation $\lim_{n \to \infty} a_n = L$ will indicate that the limit of $\{a_n\}_{n=1}^{\infty}$ exists and is equal to the number $L$.

The next result shows that the first finitely many values of a sequence do not affect its convergence behavior.

Theorem 2.11 Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be sequences of real numbers and let $K \in \mathbb{N}$ be so that for all $n \ge K$ we have $a_n = b_n$. Then $\{a_n\}_{n=1}^{\infty}$ converges iff $\{b_n\}_{n=1}^{\infty}$ converges and in this case the equality $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n$ holds.

Proof. Left to Exercise 2-2.

We conclude this section with an explicit proof that a limit is a certain number. This exercise is good for learning how to choose more exotic values for $N$. Remember that while the proof is presented in linear fashion, it is actually created by working with the inequalities that we see at the end. In fact, $N$ was found by solving the inequality $\frac{17}{4n + 10} < \varepsilon$, which arises naturally from the computation at the end.

Example 2.12 Prove that $\lim_{n \to \infty} \frac{3n - 1}{2n + 5} = \frac{3}{2}$.

Let $\varepsilon > 0$ be arbitrary but fixed, and let $N \in \mathbb{N}$ be such that $N > \frac{\frac{17}{\varepsilon} - 10}{4}$. Then for all $n \ge N$ the following holds.
\[ \left| \frac{3n - 1}{2n + 5} - \frac{3}{2} \right| = \left| \frac{2(3n - 1) - 3(2n + 5)}{2(2n + 5)} \right| = \left| \frac{-17}{4n + 10} \right| = \frac{17}{4n + 10} \le \frac{17}{4N + 10} < \varepsilon. \]
Because $\varepsilon > 0$ was arbitrary, this proves that $\lim_{n \to \infty} \frac{3n - 1}{2n + 5} = \frac{3}{2}$.

Exercises

2-1. Let $k \in \mathbb{Z}$ and for each $n \in \mathbb{Z}$ with $n \ge k$ let $a_n \in \mathbb{R}$. Define the sequence $\{a_n\}_{n=k}^{\infty}$ and define what it means for $L \in \mathbb{R}$ to be its limit.

2-2. Prove Theorem 2.11.

2-3. Write out the argument that produced the choice for $N$ in Example 2.12.

2-4.
Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be sequences such that for all $n \in \mathbb{N}$ we have $|a_n - b_n| < \frac{1}{n}$. Prove that if $\{a_n\}_{n=1}^{\infty}$ converges then so does $\{b_n\}_{n=1}^{\infty}$ and $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n$.

2-5. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Prove that $\lim_{n\to\infty} a_n = L$ iff $\lim_{n\to\infty} |a_n - L| = 0$.

2-6. Prove each of the following. […]

2-7. The usefulness of quantifiers.
(a) State the negation of the statement “$L$ is the limit of the sequence $\{a_n\}_{n=1}^{\infty}$” by finding the negation of Definition 2.2.
(b) State the negation of the statement “$L$ is the limit of the sequence $\{a_n\}_{n=1}^{\infty}$” by finding the negation of the quantifier version of Definition 2.2 in Remark 2.3.
Hint. Appendix A.2.

2.2 Limit Laws

Example 2.12 shows how to prove that a limit is a certain number. However, to write such a proof we must know the limit in advance. It would be nice to have a computational tool that provides the limit. It would be even nicer if the computation somehow could justify (prove) that the result of the computation really is the limit. This can be done with the limit laws discussed in this section. Because we are investigating the theory, it will be important to understand how the limit laws come into being. Applying the limit laws is fairly simple (see Exercise 2-8).

To use the limit laws, we must know the limits of certain standard sequences.

Theorem 2.13 $\lim_{n\to\infty} \frac{1}{n} = 0$.

Proof. Exercise 2-9. ∎

The limit laws state that limits can be moved into the familiar algebraic operations. The proof consists of several similar arguments. The argument for sums is spelled out in great detail. The argument for differences is almost verbatim the same argument as for sums and it is left to the reader in Exercise 2-10. The argument for products is proved closer to the fashion of the “expert proof” on page 28, but with remarks interspersed.
For the argument for quotients, we only present the final inequalities, leaving the remaining parts to the reader in Exercise 2-11.

Theorem 2.14 Limit laws for sequences. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be two convergent sequences. Then the following hold.

1. The sum $\{a_n + b_n\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} a_n + b_n = \lim_{n\to\infty} a_n + \lim_{n\to\infty} b_n$.

2. The difference $\{a_n - b_n\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} a_n - b_n = \lim_{n\to\infty} a_n - \lim_{n\to\infty} b_n$.

3. The product $\{a_n b_n\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} a_n \cdot b_n = \lim_{n\to\infty} a_n \cdot \lim_{n\to\infty} b_n$.

4. If all $b_n \ne 0$ and $\lim_{n\to\infty} b_n \ne 0$, then the quotient $\left\{ \frac{a_n}{b_n} \right\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} \frac{a_n}{b_n} = \frac{\lim_{n\to\infty} a_n}{\lim_{n\to\infty} b_n}$.

Proof. Throughout this proof let $L := \lim_{n\to\infty} a_n$ and let $M := \lim_{n\to\infty} b_n$.

To show for part 1 that the sum $\{a_n + b_n\}_{n=1}^{\infty}$ converges to the limit $L + M$, we must show that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequality $|(a_n + b_n) - (L + M)| < \varepsilon$.

Let $\varepsilon > 0$ be arbitrary but fixed (see Standard Proof Technique 2.8). Because $\lim_{n\to\infty} a_n = L$ there is an $N_L \in \mathbb{N}$ so that for all $n \ge N_L$ we have $|a_n - L| < \frac{\varepsilon}{2}$. Similarly, because $\lim_{n\to\infty} b_n = M$ there is an $N_M \in \mathbb{N}$ so that for all $n \ge N_M$ we have $|b_n - M| < \frac{\varepsilon}{2}$. Let $N := \max\{N_L, N_M\}$. Then (compare with Standard Proof Technique 2.6) for all $n \ge N$ the inequalities $|a_n - L| < \frac{\varepsilon}{2}$ and $|b_n - M| < \frac{\varepsilon}{2}$ hold. By rearranging the terms and applying the triangular inequality (compare with Standard Proof Technique 2.5) we obtain the following for all $n \ge N$.
$$|(a_n + b_n) - (L + M)| = |a_n - L + b_n - M| \le |a_n - L| + |b_n - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
We have proved that for arbitrary $\varepsilon > 0$ we can find an $N \in \mathbb{N}$ so that for all $n \ge N$ we have that $|(a_n + b_n) - (L + M)| < \varepsilon$ (see Standard Proof Technique 2.9). Therefore by the definition of the limit, $\lim_{n\to\infty} a_n + b_n = L + M$.

Part 2 is Exercise 2-10.

To show how an abbreviated proof still contains the standard statement of the definition of the limit, the key parts of the definition are in the proof of part 3.
We also intersperse comments in italics on how certain choices arise.

For part 3, let $\varepsilon > 0$. The choices for $\varepsilon$ look a bit strange. They are made so that the inequalities later on work out and they can be motivated by reading the final estimates. Recall the discussion after the proof of Proposition 2.4. The first typed part of this proof was the final inequalities.

Because $\lim_{n\to\infty} a_n = L$ there is an $N_L \in \mathbb{N}$ so that for all $n \ge N_L$ we have that $|a_n - L| < \frac{\varepsilon}{2(|M| + 1)}$. (We cannot divide by $|M|$ alone, because it could be zero.) Similarly, because $\lim_{n\to\infty} b_n = M$ there is an $N_M \in \mathbb{N}$ so that for all $n \ge N_M$ we have $|b_n - M| < \frac{\varepsilon}{2(|L| + 1)}$. Finally, because $\lim_{n\to\infty} a_n = L$ there is a $K_1 \in \mathbb{N}$ so that for all $n \ge K_1$ we have $|a_n - L| < 1$. Consequently, for all $n \ge K_1$ we obtain by the reverse triangular inequality $|a_n| - |L| \le \big| |a_n| - |L| \big| \le |a_n - L| < 1$ and thus $|a_n| < |L| + 1$. Let $N := \max\{N_L, N_M, K_1\}$. Then (compare with Standard Proof Technique 2.6) for all $n \ge N$ the inequalities $|a_n| < |L| + 1$, $|a_n - L| < \frac{\varepsilon}{2(|M| + 1)}$ and $|b_n - M| < \frac{\varepsilon}{2(|L| + 1)}$ hold.

To “connect” the $a_n b_n$ and the $LM$, we add and subtract the term $a_n M$. This allows us to apply the triangular inequality (compare with Standard Proof Technique 2.5) to split the original difference $|a_n b_n - LM|$ into two summands. Each of these summands contains one factor that can be made small.

For all $n \ge N$, we obtain the following.
$$|a_n b_n - LM| = |a_n b_n - a_n M + a_n M - LM| \le |a_n b_n - a_n M| + |a_n M - LM| = |a_n| |b_n - M| + |a_n - L| |M| < (|L| + 1) \frac{\varepsilon}{2(|L| + 1)} + \frac{\varepsilon}{2(|M| + 1)} |M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$
which proves part 3 (see Standard Proof Technique 2.9).

For part 4, the final estimates, which are the hard part to figure out, are
$$\left| \frac{a_n}{b_n} - \frac{L}{M} \right| = \frac{|a_n M - L b_n|}{|b_n| |M|} \le \frac{|a_n M - LM| + |LM - L b_n|}{|b_n| |M|} = \frac{|a_n - L| |M| + |L| |M - b_n|}{|b_n| |M|} \le \frac{2}{|M|} |a_n - L| + \frac{2|L|}{M^2} |M - b_n| \le \frac{2}{|M|} \cdot \frac{\varepsilon |M|}{4} + \frac{2|L|}{M^2} \cdot \frac{\varepsilon M^2}{2(2|L| + 1)} < \varepsilon.$$
To train the reader in generating the appropriate “header,” the complete proof of part 4 (including final estimates) is left to Exercise 2-11. ∎
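The choice of $N$ in the proof of part 3 can be traced numerically. The following is only a sketch under stated assumptions: the sample sequences $a_n = L + \frac{1}{n}$ and $b_n = M - \frac{1}{n}$ and the function names are made up for this illustration and are not part of the text. The value $M = 0$ is chosen on purpose, because it is exactly the case in which dividing by $|M|$ alone would fail.

```python
from fractions import Fraction
import math

def product_law_threshold(eps, L, M):
    """Return N = max(N_L, N_M, K_1) for the hypothetical sample
    sequences a_n = L + 1/n and b_n = M - 1/n (assumed for illustration)."""
    # |a_n - L| = 1/n < eps / (2(|M| + 1)) holds for all n > 2(|M| + 1)/eps.
    n_l = math.floor(2 * (abs(M) + 1) / eps) + 1
    # |b_n - M| = 1/n < eps / (2(|L| + 1)) holds for all n > 2(|L| + 1)/eps.
    n_m = math.floor(2 * (abs(L) + 1) / eps) + 1
    # |a_n - L| = 1/n < 1 holds for all n >= 2.
    k_1 = 2
    return max(n_l, n_m, k_1)

def product_error(n, L, M):
    """Return |a_n * b_n - L * M| for the sample sequences."""
    a_n = L + Fraction(1, n)
    b_n = M - Fraction(1, n)
    return abs(a_n * b_n - L * M)

# Exact rational arithmetic avoids floating-point rounding in the thresholds.
L, M, eps = Fraction(3), Fraction(0), Fraction(1, 100)
N = product_law_threshold(eps, L, M)
# A finite spot check of the final estimate; it does not replace the proof.
print(N, all(product_error(n, L, M) < eps for n in range(N, N + 1000)))
```

The thresholds are far from optimal for these particular sequences. As in the proof, they are chosen so that the final estimate works out, not so that $N$ is as small as possible.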
Standard Proof Technique 2.15 In an analysis proof, it can be necessary to divide by a nonnegative quantity $|a|$. Usually the quotient will be multiplied by that same quantity later in an estimate and the goal is to cancel it (consider the proofs of parts 3 and 4 of Theorem 2.14). However, we cannot divide by zero. To avoid any undue distractions here, when defining certain quotients, we will usually add 1 to nonnegative quantities in denominators. In this fashion, the fact that $\frac{|a|}{|a| + 1} < 1$ still allows us to “cancel” the term in an estimate. This is more effective than to separately consider the case that a quantity is equal to zero. □

The limit laws allow us to efficiently establish convergence and compute the limits for many complicated looking sequences. For instance, finding the limit of the sequence in Example 2.12 reduces to the following.

Example 2.16 Find the limit of $\left\{ \frac{3n-1}{2n+5} \right\}_{n=1}^{\infty}$.

Using Theorem 2.13 and the limit laws we obtain the following:
$$\lim_{n\to\infty} \frac{3n-1}{2n+5} = \lim_{n\to\infty} \frac{3n-1}{2n+5} \cdot \frac{\frac{1}{n}}{\frac{1}{n}} = \lim_{n\to\infty} \frac{3 - \frac{1}{n}}{2 + 5 \cdot \frac{1}{n}} = \frac{3 - \lim_{n\to\infty} \frac{1}{n}}{2 + 5 \lim_{n\to\infty} \frac{1}{n}} = \frac{3}{2}.$$
Note that by the limit laws, the computation also proves that the limit is $\frac{3}{2}$. That is, we will not need to perform the tedious argument given in Example 2.12. □

For limits of powers, we will consider integer powers here, leaving rational powers to Theorem 3.42 and real powers to Theorem 12.11.

Theorem 2.17 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence. Then for all $k \in \mathbb{N}$ the sequence $\{a_n^k\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} a_n^k = \left( \lim_{n\to\infty} a_n \right)^k$. If none of the terms are zero and the limit is not equal to zero, the result holds for all $k \in \mathbb{Z}$.

Proof. For $k \in \mathbb{N}$ the proof is an induction on $k$. The base step for $k = 1$ is trivial. For the induction step $k \to k + 1$, assume that for a given $k \ge 1$ the sequence $\{a_n^k\}_{n=1}^{\infty}$ converges and that $\lim_{n\to\infty} a_n^k = \left( \lim_{n\to\infty} a_n \right)^k$. We must prove that $\{a_n^{k+1}\}_{n=1}^{\infty}$ converges with $\lim_{n\to\infty} a_n^{k+1} = \left( \lim_{n\to\infty} a_n \right)^{k+1}$.
By part 3 of Theorem 2.14, we can argue as follows.
$$\left( \lim_{n\to\infty} a_n \right)^{k+1} = \left( \lim_{n\to\infty} a_n \right) \left( \lim_{n\to\infty} a_n \right)^{k} = \lim_{n\to\infty} a_n \cdot \lim_{n\to\infty} a_n^{k} = \lim_{n\to\infty} a_n \cdot a_n^{k} = \lim_{n\to\infty} a_n^{k+1},$$
where each expression in the chain of equations is guaranteed to exist because the expression preceding it exists. This completes the induction step and the proof of the result for $k \in \mathbb{N}$.

Now for $k \in \mathbb{Z}$, let all terms and the limit be nonzero. For $k > 0$, we just proved the result and for $k = 0$ the result is trivial. For $k < 0$, note that
$$\left( \lim_{n\to\infty} a_n \right)^{k} = \frac{1}{\left( \lim_{n\to\infty} a_n \right)^{-k}} = \frac{1}{\lim_{n\to\infty} a_n^{-k}} = \lim_{n\to\infty} \frac{1}{a_n^{-k}} = \lim_{n\to\infty} a_n^{k},$$
where again each expression in the chain of equations is guaranteed to exist because the expression preceding it exists. ∎

Standard Proof Technique 2.18 Many statements about limits implicitly assert two things. First, that the limit exists, and second, what the limit is. This means a proof will need to verify both these claims. To simplify the language and to shorten proofs, it is customary to dispense with an explicit existence proof for the limit. Instead, the computation of the limit can be used to establish the existence of the limit as in the proof of Theorem 2.17. If we start with an existing quantity and finish with the quantity in question, then the existence claim in known limit laws establishes the existence of all intermediate quantities from left to right. Exercise 2-23 shows that the order of progression is important. The limit of a sum, a difference, a product, or a quotient can exist even when the sequences of the individual terms diverge. □

Standard Proof Technique 2.19 An induction as in the first part of the proof of Theorem 2.17, which takes a result for an operation on two objects and turns it into a result for an operation with $k \in \mathbb{N}$ objects, is often omitted by saying “a simple induction argument shows ...” The proofs will all be similar. The base step is usually simple. For the induction step, split off the last element, apply the induction hypothesis to the first $k$ elements, then apply the original result and thus conclude the proof. □
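As an illustration of the pattern in Standard Proof Technique 2.19, here is how the induction step could look for sums of finitely many convergent sequences. This is a sketch that is not taken from the text, and the notation $\{a_n^{(j)}\}_{n=1}^{\infty}$ for the $j$-th sequence is an assumption made only for this example. The induction hypothesis is that the sum of any $k$ convergent sequences converges to the sum of the limits.

```latex
\begin{align*}
\sum_{j=1}^{k+1} \lim_{n\to\infty} a_n^{(j)}
  &= \Biggl( \sum_{j=1}^{k} \lim_{n\to\infty} a_n^{(j)} \Biggr)
     + \lim_{n\to\infty} a_n^{(k+1)}
     && \text{(split off the last element)} \\
  &= \lim_{n\to\infty} \sum_{j=1}^{k} a_n^{(j)}
     + \lim_{n\to\infty} a_n^{(k+1)}
     && \text{(induction hypothesis)} \\
  &= \lim_{n\to\infty} \Biggl( \sum_{j=1}^{k} a_n^{(j)} + a_n^{(k+1)} \Biggr)
     && \text{(part 1 of Theorem 2.14)} \\
  &= \lim_{n\to\infty} \sum_{j=1}^{k+1} a_n^{(j)}.
\end{align*}
```

In the spirit of Standard Proof Technique 2.18, the chain starts with a quantity that is known to exist and each subsequent expression exists because the one preceding it does.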
Although it would feel natural to prove that Theorem 2.17 holds for real exponents, real exponents must be postponed until we define real exponentiation in Definition 12.8. Our main focus is on the abstract underpinnings and fundamentals of analysis, not on certain specifics. Hence, we postpone the introduction of transcendental functions until we have all the machinery to introduce them very efficiently. Their introduction will also make us more fully appreciate the power of the abstract tools we build.

We could at least prove Theorem 2.17 for rational exponents, but at this stage the proof would be cumbersome. We will obtain the result easily in Theorem 3.42, once we have developed some more tools. We will use this approach throughout. If a result can be obtained easily later, then (unless an important technique needs to be introduced) we will postpone the result rather than set up an unnecessarily complicated argument.

We conclude this section by investigating the relationship between limits and inequalities.

Theorem 2.20 Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be two convergent sequences of real numbers so that for all $n \in \mathbb{N}$ the inequality $a_n \le b_n$ holds. Then $\lim_{n\to\infty} a_n \le \lim_{n\to\infty} b_n$.

Proof. Let $L := \lim_{n\to\infty} a_n$ and let $M := \lim_{n\to\infty} b_n$. We will prove (see Standard Proof Technique 2.7) that for every $\varepsilon > 0$ the inequality $L - M < \varepsilon$ holds, which proves that $L - M \le 0$, that is, $L \le M$.

Let $\varepsilon > 0$. Then (see Standard Proof Technique 2.6) there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$ and $|b_n - M| < \frac{\varepsilon}{2}$. But, using Standard Proof Technique 2.5, this implies
$$L - M = L - a_n + b_n - M + a_n - b_n \le |L - a_n| + |b_n - M| + a_n - b_n < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} + 0 = \varepsilon.$$
We have proved that for all $\varepsilon > 0$ we have $L - M < \varepsilon$, which by Exercise 1-25 proves that $\lim_{n\to\infty} a_n = L \le M = \lim_{n\to\infty} b_n$. ∎

The next theorem is a refinement of the fact that limits preserve existing inequalities. If a sequence $\{b_n\}_{n=1}^{\infty}$ is “trapped” between two convergent sequences with the same limit, then $\{b_n\}_{n=1}^{\infty}$ must converge to that limit also.
The novelty is that we need not assume that $\{b_n\}_{n=1}^{\infty}$ converges.

Theorem 2.21 The Squeeze Theorem for sequences. Let $\{a_n\}_{n=1}^{\infty}$, $\{b_n\}_{n=1}^{\infty}$, $\{c_n\}_{n=1}^{\infty}$ be sequences of real numbers so that for all $n \in \mathbb{N}$ the inequalities $a_n \le b_n \le c_n$ hold and so that $\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n$. Then $\{b_n\}_{n=1}^{\infty}$ converges and the limits are equal, that is, $\lim_{n\to\infty} b_n = \lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n$.

Proof. Let $L := \lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n$. We need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $|b_n - L| < \varepsilon$ holds.

Let $\varepsilon > 0$. Because $\{a_n\}_{n=1}^{\infty}$ and $\{c_n\}_{n=1}^{\infty}$ both converge to $L$, there are $N_a \in \mathbb{N}$ and $N_c \in \mathbb{N}$ so that for all $n \ge N_a$ we have $|a_n - L| < \varepsilon$ and so that for all $n \ge N_c$ we have $|c_n - L| < \varepsilon$. Let $N := \max\{N_a, N_c\}$. Then for all $n \ge N$ we obtain $L - \varepsilon < a_n$ and $c_n < L + \varepsilon$. Therefore, for all $n \ge N$ we conclude $L - \varepsilon < a_n \le b_n \le c_n < L + \varepsilon$, which means that $|b_n - L| < \varepsilon$.

We have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequality $|b_n - L| < \varepsilon$. This means that $\{b_n\}_{n=1}^{\infty}$ converges to $L$. ∎

Standard Proof Technique 2.22 To prove that the limit of a sequence $\{b_n\}_{n=1}^{\infty}$ is zero, it is common to prove that $|b_n|$ is bounded by some sequence, such as a multiple of $\frac{1}{n}$, that goes to zero itself. The Squeeze Theorem then guarantees that the limit of $\{b_n\}_{n=1}^{\infty}$ also is zero. In particular (compare with Exercise 2-5), we often prove that $\lim_{n\to\infty} a_n = L$ by proving that $|a_n - L|$ is bounded by a sequence that goes to zero. □

Exercises

2-8. Use the limit laws to compute the limit of the sequence.

2-9. Prove Theorem 2.13. Hint. Exercise 1-27.

2-10. Prove part 2 of Theorem 2.14.

2-11. Prove part 4 of Theorem 2.14. Explain all choices for $N$ and for upper or lower bounds. Hint. There are three inequalities that must hold for all $n \ge N$.

2-12. Prove that if $\{a_n\}_{n=1}^{\infty}$ converges, then $\lim_{n\to\infty} |a_n| = \left| \lim_{n\to\infty} a_n \right|$.

2-13. Prove that if $a_n \ge 0$ for all $n \in \mathbb{N}$ and $\{a_n\}_{n=1}^{\infty}$ converges, then $\lim_{n\to\infty} \sqrt{a_n} = \sqrt{\lim_{n\to\infty} a_n}$.

2-14.
Let $q > 0$. Prove that $\lim_{n\to\infty} \frac{q^n}{n!} = 0$.

2-15. Conjecture the value of $\lim_{n\to\infty} \frac{n!}{n^n}$ and then prove your conjecture.

2-16. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and let $\{p_n\}_{n=1}^{\infty}$ be a sequence of positive real numbers so that $\lim_{n\to\infty} \frac{1}{\sum_{k=1}^{n} p_k} = 0$.
(a) Prove that if $\{a_n\}_{n=1}^{\infty}$ converges to $a \in \mathbb{R}$, then $\lim_{n\to\infty} \frac{\sum_{k=1}^{n} p_k a_k}{\sum_{k=1}^{n} p_k} = a$.
(b) Give an example to show that the convergence of $\left\{ \frac{\sum_{k=1}^{n} p_k a_k}{\sum_{k=1}^{n} p_k} \right\}_{n=1}^{\infty}$ need not imply the convergence of $\{a_n\}_{n=1}^{\infty}$.

Figure 3: An injective function maps all elements of the domain to distinct images, but some elements of the range may not have a preimage (a). For a surjective function, every element of the range has a preimage, but some elements of the domain may be mapped to the same image (b). A bijective function maps all elements of the domain to distinct images and each element of the range has a preimage (c). This is why the existence of a bijection between two sets indicates that the two sets are “of the same size” (see Definitions 2.25 and 7.11).

2.3 Cauchy Sequences

A sequence so that for all $\varepsilon > 0$ all elements with a sufficiently large index are within $\varepsilon$ of each other should converge. Indeed, this condition guarantees that the elements with large indices cluster ever more tightly. However, the number system may have a hole just where these elements cluster. For example, the sequence 1.4, 1.41, 1.414, 1.4142, . . . of successively better decimal approximations of $\sqrt{2}$ does not converge in $\mathbb{Q}$ because the value that the sequence approaches is not in $\mathbb{Q}$ (see Exercise 2-17). This section shows that this problem does not arise in the real numbers.

Sequences for which elements with large indices cluster ever more tightly play an important role in analysis. They are called Cauchy sequences.

Definition 2.23 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then $\{a_n\}_{n=1}^{\infty}$ is called a Cauchy sequence iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have that $|a_m - a_n| < \varepsilon$.
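The decimal approximations 1.4, 1.41, 1.414, 1.4142, . . . of $\sqrt{2}$ mentioned above can be explored with exact rational arithmetic. The sketch below is an illustration only: the function names are hypothetical, the check covers a finite window of indices and therefore proves nothing, and it merely makes the condition in Definition 2.23 concrete.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(k):
    """Return sqrt(2) cut off after k decimal places, as an exact fraction."""
    # isqrt(2 * 10**(2k)) is the largest integer m with m**2 <= 2 * 10**(2k),
    # so m / 10**k is the k-digit decimal truncation of sqrt(2).
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)

def is_cauchy_on_window(eps, N, width=30):
    """Check |a_m - a_n| < eps for all N <= m, n < N + width (finite spot check)."""
    terms = [sqrt2_truncation(k) for k in range(N, N + width)]
    return all(abs(s - t) < eps for s in terms for t in terms)

# The first four terms, printed as reduced fractions.
print([str(sqrt2_truncation(k)) for k in range(1, 5)])
# For eps = 10**-6 the index N = 6 works on the inspected window.
print(is_cauchy_on_window(Fraction(1, 10 ** 6), N=6))
# No inspected term squares to exactly 2: each term is a rational number.
print(any(sqrt2_truncation(k) ** 2 == 2 for k in range(1, 40)))
```

Every term is a rational number and the terms cluster ever more tightly, yet none of them squares to 2. This is the “hole” in $\mathbb{Q}$ described in the paragraph that opens this section.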
In the real numbers, convergence and being a Cauchy sequence are equivalent. Before we can prove this result, we need to define finite and infinite sets.

Definition 2.24 Let $A$, $B$ be sets and let $f : A \to B$ be a function. Then $f$ is called injective or one-to-one iff for all $x, y \in A$ the inequality $x \ne y$ implies $f(x) \ne f(y)$. $f$ is called surjective or onto iff for all $b \in B$ there is an $a \in A$ with $f(a) = b$. Finally, $f$ is called bijective iff $f$ is both injective and surjective.

Figure 3 gives a visualization of injective, surjective, and bijective functions and some properties of injective and surjective functions are investigated in Exercises 2-18 and 2-19. Once we have bijective functions, we can define finite sets as “sets of size $n$,” where $n \in \mathbb{N} \cup \{0\}$.

Definition 2.25 A set $F$ is called finite iff $F$ is empty or there is an $n \in \mathbb{N}$ and a bijective function $f : \{1, \ldots, n\} \to F$. Sets that are not finite are called infinite. For finite sets $F \ne \emptyset$ we set $|F| := n$ with $n$ as above and we set $|\emptyset| := 0$. For infinite sets $I$ we set $|I| := \infty$.

Lemma 2.26 Let $F$ be a finite set and let $I$ be an infinite set. Then $I \setminus F$ is infinite.

Proof. In case $F$ is empty, there is nothing to prove. In case $F$ is not empty let $n \in \mathbb{N}$ be so that there is a bijective function $f : \{1, \ldots, n\} \to F$. Suppose for a contradiction that $I \setminus F$ is finite. Then there are a natural number $m \in \mathbb{N}$ and a bijective function $g : \{1, \ldots, m\} \to I \setminus F$. Define the function $h : \{1, \ldots, m + n\} \to I$ by
$$h(j) := \begin{cases} f(j) & \text{if } j \le n, \\ g(j - n) & \text{if } j > n. \end{cases}$$
Then it is easy to show that $h$ is bijective (Exercise 2-20). But this means that $I$ is finite, a contradiction. ∎

Theorem 2.27 A sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers converges iff it is a Cauchy sequence.

Proof. For the direction “$\Rightarrow$,” let $L := \lim_{n\to\infty} a_n$. We need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $|a_m - a_n| < \varepsilon$.

Let $\varepsilon > 0$.
Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$. Therefore for all $m, n \ge N$ we obtain
$$|a_m - a_n| = |a_m - L + L - a_n| \le |a_m - L| + |L - a_n| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
We have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $|a_m - a_n| < \varepsilon$. Hence, $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence. Once more we have used Standard Proof Technique 2.5. It is so common, and we have used it often enough, that we will no longer explicitly refer back to it.

“$\Leftarrow$:” Let $\{a_n\}_{n=1}^{\infty}$ be a Cauchy sequence. We need to prove that there is an $L \in \mathbb{R}$ so that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$.

We first need to find a suitable number $L$. Because $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence, for $\varepsilon := 1$ there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_m - a_n| < 1$. In particular, for all $n \ge N$ we obtain $|a_n - a_N| < 1$, and hence $a_n < a_N + 1$. Therefore the set $\{n \in \mathbb{N} : a_n \le a_N + 1\}$ is infinite and thus $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\} \ne \emptyset$. For all $n \ge N$ we also have that $a_n > a_N - 1$, so that for all $x \le a_N - 1$ the set $\{n \in \mathbb{N} : a_n \le x\}$ is finite. Therefore $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ is bounded below. This means $L := \inf \{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ exists by Proposition 1.20. The idea how to obtain $L$ is visualized in Figure 4.

To prove that $L$ is the limit of $\{a_n\}_{n=1}^{\infty}$ we need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$.

Let $\varepsilon > 0$. By definition of $L$, the set $H_- := \left\{ n \in \mathbb{N} : a_n \le L - \frac{\varepsilon}{2} \right\}$ is finite and the set $H_+ := \left\{ n \in \mathbb{N} : a_n \le L + \frac{\varepsilon}{2} \right\}$ is infinite. Therefore, by Lemma 2.26 the relative complement $H_+ \setminus H_- = \left\{ n \in \mathbb{N} : a_n \in \left( L - \frac{\varepsilon}{2}, L + \frac{\varepsilon}{2} \right] \right\}$ is infinite.

Figure 4: Visualization of the construction of $L$ in the proof of Theorem 2.27. (Diagram labels: finitely many $n$ are mapped below $a_N - 1$ or above $a_N + 1$; finitely many $n$ are mapped to any $(-\infty, x]$ with $x < L$; infinitely many $n$ are mapped to any $(-\infty, x]$ with $x > L$; $L$ is the cutoff point.)

Because
$\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence, there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_m - a_n| < \frac{\varepsilon}{2}$. Moreover, because $\left\{ n \in \mathbb{N} : a_n \in \left( L - \frac{\varepsilon}{2}, L + \frac{\varepsilon}{2} \right] \right\}$ is infinite, there is a $k \ge N$ so that $|a_k - L| \le \frac{\varepsilon}{2}$. Therefore for all $n \ge N$ we obtain
$$|a_n - L| \le |a_n - a_k| + |a_k - L| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Because $\varepsilon$ was arbitrary we have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Hence, $\lim_{n\to\infty} a_n = L$. ∎

Standard Proof Technique 2.28 Application of the Completeness Axiom to get the infimum or supremum of the “right” set is a standard technique on the real line. We will also see this approach in the proofs of Theorems 2.37, 2.41, 3.34, and 8.4. In abstract spaces, this technique is not available and we usually substitute compactness (see Definition 16.57), a property which, for closed and bounded intervals on the real line, is a consequence of the Completeness Axiom (see Theorems 2.41 and 8.4). □

Convergence of Cauchy sequences is a fundamental analytical property called “completeness,” which is introduced in Section 16.2. Another way of formulating Theorem 2.27 is to say that the real numbers are complete. Although Axiom 1.19 is already called the Completeness Axiom of the real numbers, this terminology makes sense, because Exercise 2-25 shows that Theorem 2.27 and Axiom 1.19 are equivalent. This means that either one of them could rightly be called the Completeness Axiom. There are many other equivalent formulations of the Completeness Axiom. We will encounter some more of them in Theorems 2.37, 2.41, and 8.4 as well as in Exercise 2-50c. Whenever we encounter one of these formulations, there will be an exercise similar to Exercise 2-25 to show that the new result is equivalent to one of the equivalent formulations of the Completeness Axiom that we already know.

Aside from the fundamental importance of completeness, Theorem 2.27 has immediate value for showing if a sequence is divergent.
By Definition 2.2 a sequence is divergent iff (using negations as stated in Appendix A.2) for every real number $L$ there is an $\varepsilon > 0$ such that for all $N \in \mathbb{N}$ there is an $n \ge N$ so that $|a_n - L| \ge \varepsilon$. This is a four times nested quantification that would require us to show for every real number that it is not the limit of the sequence. Theorem 2.27 reduces proofs of divergence to showing the sequence is not a Cauchy sequence.

Example 2.29 The sequence $\{(-1)^n\}_{n=1}^{\infty}$ diverges.

To prove that the sequence diverges, we need to prove that it is not a Cauchy sequence. This means (see Appendix A.2) we must find an $\varepsilon > 0$ so that for all $N \in \mathbb{N}$ there are $m, n \ge N$ so that $|a_m - a_n| \ge \varepsilon$. But with $\varepsilon := 1$ for every $N \in \mathbb{N}$ we have that $\left| (-1)^N - (-1)^{N+1} \right| = 2 > 1$. Hence, $\{(-1)^n\}_{n=1}^{\infty}$ is not a Cauchy sequence and therefore it diverges. □

Standard Proof Technique 2.30 In Example 2.29, we had to negate the statement “for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_m - a_n| < \varepsilon$.” When negating such a complicated statement, it is helpful to write the statement in quantifiers and then negate it. In this fashion, the definition of a Cauchy sequence is
$$\forall \varepsilon > 0 : \exists N \in \mathbb{N} : \forall m, n \ge N : |a_m - a_n| < \varepsilon,$$
and the negation becomes (see Appendix A.2)
$$\exists \varepsilon > 0 : \forall N \in \mathbb{N} : \exists m, n \ge N : |a_m - a_n| \ge \varepsilon,$$
which is what was needed in Example 2.29. The schematic way in which quantified statements can be negated is very helpful, especially the first few times one negates a complicated statement. However, the quantifiers must not become a crutch. It is advisable to first try the negation verbally and then double check with quantifiers. □

Standard Proof Technique 2.31 To prove that a sequence of real numbers converges, we often simply prove that it is a Cauchy sequence. To prove that a sequence of real numbers diverges, we prove that it is not a Cauchy sequence. □

Exercises

2-17. Prove that […]
is a Cauchy sequence of rational numbers whose limit is not a rational number.

2-18. Let $A$, $B$ be sets and let $f : A \to B$ be a function. Prove that $f$ is injective iff for all $a_1, a_2 \in A$ we have that $f(a_1) = f(a_2)$ implies $a_1 = a_2$.

2-19. Compositions of injective and surjective functions. Let $A$, $B$, $C$ be sets and let $f : B \to C$ and $g : A \to B$ be functions. The composition of $f$ and $g$ is defined by $f \circ g(a) := f(g(a))$ for all $a \in A$.
(a) Prove that if $f$ and $g$ are injective, then so is $f \circ g$.
(b) Prove that if $f$ and $g$ are surjective, then so is $f \circ g$.

2-20. Prove that the function $h$ in the proof of Lemma 2.26 is bijective.

2-21. State the definition of a convergent sequence using quantifiers.

2-22. For each of the following sequences, prove that it converges or prove that it diverges.

2-23. Existence of the limit on the left side of a limit law as in Theorem 2.14 does not imply the existence of the limits on the right side.
(a) Use $\{a_n\}_{n=1}^{\infty} := \{(-1)^n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty} := \{(-1)^{n+1}\}_{n=1}^{\infty}$ to show that $\lim_{n\to\infty} a_n + b_n$ can exist without either sequence being convergent.
(b) Show that $\lim_{n\to\infty} a_n - b_n$ can exist without either sequence being convergent.
(c) Show that $\lim_{n\to\infty} a_n \cdot b_n$ can exist without either sequence being convergent.
(d) Show that $\lim_{n\to\infty} \frac{a_n}{b_n}$ can exist without either sequence being convergent.

2-24. Can a Cauchy sequence have two limits? Explain your answer.

2-25. Use the fact that Cauchy sequences converge in the real numbers and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove Axiom 1.19. Hint. Let $S \subseteq \mathbb{R}$ be bounded above and not empty. Construct a Cauchy sequence $\{a_n\}_{n=1}^{\infty}$ so that for all $x \in S$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n \ge x$ holds.

2.4 Bounded Sequences

If the elements of a sequence cannot become arbitrarily large, we speak of a bounded sequence. Unlike being a Cauchy sequence, boundedness is not equivalent to convergence, but it still has some important consequences.
Definition 2.32 A sequence $\{a_n\}_{n=1}^{\infty}$ is called bounded above iff there is a number $A \in \mathbb{R}$ such that for all $n \in \mathbb{N}$ the inequality $a_n \le A$ holds. In this case, $A$ is also called an upper bound of the sequence. A sequence $\{a_n\}_{n=1}^{\infty}$ is called bounded below iff there is a $B \in \mathbb{R}$ such that for all $n \in \mathbb{N}$ the inequality $a_n \ge B$ holds. In this case, $B$ is also called a lower bound of the sequence. Finally, a sequence is called bounded iff it is bounded above and bounded below and we call it unbounded if not.

Example 2.33 The sequence […] is bounded, while the sequence $\{n\}_{n=1}^{\infty}$ is not bounded. □

Proposition 2.34 Any convergent sequence of real numbers is bounded.

Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence and let $L := \lim_{n\to\infty} a_n$. We need to prove that there is a number $M \in \mathbb{R}$ so that for all $n \in \mathbb{N}$ the inequality $|a_n| \le M$ holds.

Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Let $M := \max\{|L| + \varepsilon, |a_1|, \ldots, |a_{N-1}|\}$. Then for all $n < N$ we trivially have $|a_n| \le M$ and for $n \ge N$ we obtain $|a_n| \le |a_n - L| + |L| < \varepsilon + |L| \le M$.

We have proved that $\{a_n\}_{n=1}^{\infty}$ is bounded below by $-M$ and above by $M$. ∎

In general, the converse of Proposition 2.34 is not true as the next example shows.

Example 2.35 The sequence $\{(-1)^n\}_{n=1}^{\infty}$ is bounded, but it does not converge (see Example 2.29). □

For monotone sequences, however, boundedness does imply convergence.

Definition 2.36 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Then $\{a_n\}_{n=1}^{\infty}$ is called nondecreasing iff for all $n \in \mathbb{N}$ we have $a_n \le a_{n+1}$. It is called nonincreasing iff for all $n \in \mathbb{N}$ we have $a_n \ge a_{n+1}$. If $\{a_n\}_{n=1}^{\infty}$ is either nonincreasing or nondecreasing, then it is called monotone. Moreover, $\{a_n\}_{n=1}^{\infty}$ is called (strictly) increasing iff for all $n \in \mathbb{N}$ we have $a_n < a_{n+1}$ and it is called (strictly) decreasing iff for all $n \in \mathbb{N}$ we have $a_n > a_{n+1}$.

The sequence $\{n\}_{n=1}^{\infty}$ shows that nondecreasing sequences can grow beyond all bounds. But if this is not the case, a monotone sequence converges. The key to the proof is Standard Proof Technique 2.28.
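To see what “monotone and not growing beyond all bounds” can look like, the following sketch inspects the first 200 terms of the sequence $\left\{ \left( 1 + \frac{1}{n} \right)^n \right\}_{n=1}^{\infty}$ with exact rational arithmetic. The choice of this sequence, the cutoff of 200 terms, the candidate upper bound 4 and the name `term` are assumptions made for this illustration. A finite check of this kind is no proof of monotonicity or boundedness.

```python
from fractions import Fraction

def term(n):
    """Exact value of (1 + 1/n)**n as a fraction."""
    return (1 + Fraction(1, n)) ** n

# Inspect finitely many terms only.
terms = [term(n) for n in range(1, 201)]

# Strictly increasing on the inspected range: a_n < a_{n+1}.
is_increasing = all(s < t for s, t in zip(terms, terms[1:]))

# 4 is an upper bound for every inspected term.
is_bounded_by_4 = all(t < 4 for t in terms)

print(is_increasing, is_bounded_by_4, float(terms[-1]))
```

Both checks succeed on the inspected range, which is the behavior one expects from a sequence that is bounded above and nondecreasing.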
Theorem 2.37 Monotone Sequence Theorem. If $\{a_n\}_{n=1}^{\infty}$ is bounded and monotone, then $\{a_n\}_{n=1}^{\infty}$ converges. More precisely,

1. If $\{a_n\}_{n=1}^{\infty}$ is bounded above and nondecreasing, then it converges.

2. If $\{a_n\}_{n=1}^{\infty}$ is bounded below and nonincreasing, then it converges.

Proof. We only prove part 1. The proof of part 2 is similar (Exercise 2-30).

To prove part 1, let $\{a_n\}_{n=1}^{\infty}$ be bounded above and nondecreasing. Then the set $\{a_n : n \in \mathbb{N}\}$ is bounded above, and hence by Axiom 1.19 it has a supremum $L$. To prove that $L$ is the limit of the sequence, we must prove that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $|a_n - L| < \varepsilon$ holds.

Let $\varepsilon > 0$. By Proposition 1.21 there is an $N \in \mathbb{N}$ with $a_N > L - \varepsilon$. But then for all $n \ge N$ we have $L - \varepsilon < a_N \le a_n \le L$, and hence $|a_n - L| < \varepsilon$.

We have proved that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Hence, $\lim_{n\to\infty} a_n = L$. ∎

Although boundedness does not imply convergence, it forces the sequence to “cluster” in some places. To make this idea more precise, we need the notion of a subsequence.

Definition 2.38 Let $A$, $B$, $C$ be sets and let $f : B \to C$ and $g : A \to B$ be functions. The composition of $f$ and $g$ is defined by $f \circ g(a) := f(g(a))$ for all $a \in A$.

Definition 2.39 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and let $\{n_k\}_{k=1}^{\infty}$ be a strictly increasing sequence of natural numbers. Then $\{a_{n_k}\}_{k=1}^{\infty}$ is called a subsequence of $\{a_n\}_{n=1}^{\infty}$. Formally, a subsequence is the composition of the function that maps $k$ to $n_k$ and the function that maps $n$ to $a_n$.

Convergence is what happens when the indices get large. To obtain a notion that is useful to analyze convergence behavior, in the definition of a subsequence we had to specifically demand that the $n_k$ are strictly increasing.
If we had allowed $n_k = n_{k+1}$, then by choosing $\{n_k\}_{k=1}^{\infty}$ to be a constant sequence, any sequence would have infinitely many convergent “subsequences.” This would be counterintuitive, because a sequence such as $\left\{ \frac{1}{n} \right\}_{n=1}^{\infty}$ would have a “subsequence” that converges to 1 even though the sequence itself converges to 0.

With the definition as it is, subsequences behave sensibly when the sequence converges.

Proposition 2.40 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of real numbers with limit $L$. Then every subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ also converges to $L$.

Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of real numbers with limit $L$ and let $\{a_{n_k}\}_{k=1}^{\infty}$ be a subsequence. We must prove that for all $\varepsilon > 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|a_{n_k} - L| < \varepsilon$.

Let $\varepsilon > 0$. Because $n_k < n_{k+1}$ for all $k \in \mathbb{N}$, an easy induction shows that $n_k \ge k$ for all $k \in \mathbb{N}$ (Exercise 2-31). Because $\{a_n\}_{n=1}^{\infty}$ converges, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Therefore for all $k \ge N$ we obtain $n_k \ge k \ge N$, and hence $|a_{n_k} - L| < \varepsilon$.

We have proved that for all $\varepsilon > 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|a_{n_k} - L| < \varepsilon$. Hence, $\{a_{n_k}\}_{k=1}^{\infty}$ converges to $L$, too. ∎

The precise statement of the idea that a bounded sequence of real numbers must “cluster” somewhere is that a bounded sequence of real numbers must have a convergent subsequence. This is an important property of the real numbers, which is ultimately encoded in the notion of compactness (see Section 16.5). The proof of the Bolzano-Weierstrass Theorem utilizes Standard Proof Technique 2.28.

Theorem 2.41 Bolzano-Weierstrass Theorem. Any bounded sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers has a convergent subsequence.

Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a bounded sequence and let $b > 0$ be such that for all $n \in \mathbb{N}$ we have $|a_n| < b$. Then $b$ is a real number so that $\{n \in \mathbb{N} : a_n \le b\} = \mathbb{N}$ and $\{n \in \mathbb{N} : a_n \le -b\} = \emptyset$. Therefore $b$ is contained in the set of real numbers $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$.
Moreover, this set is bounded below by $-b$. Hence, the infimum $L := \inf \{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ exists by Proposition 1.20.

We will prove that $L$ is the limit of a subsequence of $\{a_n\}_{n=1}^{\infty}$. To do this, we will employ the Standard Proof Technique 2.22 and find a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ so that $|a_{n_k} - L|$ is bounded by a sequence that goes to zero.

For each $k \in \mathbb{N}$ the set $H_k^+ := \left\{ n \in \mathbb{N} : a_n \le L + \frac{1}{k} \right\}$ is infinite and the set $H_k^- := \left\{ n \in \mathbb{N} : a_n \le L - \frac{1}{k} \right\}$ is finite. Therefore the set $T_k := \left\{ n \in \mathbb{N} : a_n \in \left( L - \frac{1}{k}, L + \frac{1}{k} \right] \right\} = H_k^+ \setminus H_k^-$ is infinite for each $k \in \mathbb{N}$ (see Lemma 2.26).

Construct $\{n_k\}_{k=1}^{\infty}$ inductively as follows. Because $T_1$ is infinite, it is not empty and we let $n_1 \in T_1$. Once $n_k$ was chosen, let $n_{k+1}$ be any natural number in $T_{k+1}$ that is greater than $n_k$. Such a natural number exists, because $T_{k+1}$ is infinite. Then $\{a_{n_k}\}_{k=1}^{\infty}$ is a subsequence of $\{a_n\}_{n=1}^{\infty}$ and for all $k \in \mathbb{N}$ we have $|a_{n_k} - L| \le \frac{1}{k}$, because $n_k \in T_k$. By the Squeeze Theorem this means $\lim_{k\to\infty} |a_{n_k} - L| = 0$, and hence $\lim_{k\to\infty} a_{n_k} = L$. ∎

Figure 5: Implications between the various notions related to convergence that are introduced in this chapter. Implications are indicated with arrows and the examples near the arrows indicate that the opposite implication does not hold. (Diagram nodes: bounded; convergent subsequence; bounded and monotone; convergent; monotone; finite or infinite limit.)

The Bolzano-Weierstrass Theorem is a useful tool in single variable analysis. We will see examples of its use in the proofs of Theorem 3.44 and Lemma 5.19.

For many sets of properties, it is instructive to explore which property implies which other properties and which properties are equivalent. Figure 5 summarizes the properties introduced in this chapter (including those from the next section) and how they are related to each other.

Exercises

2-26. For each of the sequences below determine if it is bounded, and then prove your claim.

2-27.
For the given sequence $\{a_n\}_{n=1}^\infty$, find an expression for the terms of the subsequence $\{a_{n_k}\}_{k=1}^\infty$, where $n_k$ is as indicated.

2-28. Use Proposition 2.40 to prove that each of the sequences below diverges.

2-29. Explain why we could have chosen $\varepsilon = 1$ in the proof of Proposition 2.34. Then explain why we cannot choose $\varepsilon = 1$ in a general convergence proof.

2-30. Prove part 2 of the Monotone Sequence Theorem.

2-31. Perform the induction mentioned in the proof of Proposition 2.40. First state exactly what it is that you prove, then execute the proof.

2-32. Sketch a visualization for the construction of $L$ in the proof of the Bolzano-Weierstrass Theorem that is similar to Figure 4.

2-33. Let $x \in \mathbb{R}$ and let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers that does not converge to $x$. Prove that there is an $\varepsilon > 0$ and a subsequence $\{x_{n_k}\}_{k=1}^\infty$ so that for all $k \in \mathbb{N}$ we have $|x_{n_k} - x| \ge \varepsilon$.

2-34. Let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers that has no convergent subsequence. Prove that for each $x \in \mathbb{R}$ there is an $\varepsilon_x > 0$ so that $\{n \in \mathbb{N} : |x_n - x| < \varepsilon_x\}$ is finite.

2-35. Let $L \in \mathbb{R}$ and let $\{x_n\}_{n=1}^\infty$ be a sequence of real numbers such that every subsequence has a subsequence that converges to $L$. Prove that $\{x_n\}_{n=1}^\infty$ converges.

2-36. A well-known convergent sequence.
(a) Let $a, b \in \mathbb{R}$ and $n \in \mathbb{N}$. Prove that $b^{n+1} - a^{n+1} = (b - a) \sum_{j=0}^{n} a^j b^{n-j}$.
(b) Let $a, b \in \mathbb{R}$ with $0 \le a < b$ and let $n \in \mathbb{N}$. Prove that $\frac{b^{n+1} - a^{n+1}}{b - a} < (n+1) b^n$.
(c) Prove that $\left\{\left(1 + \frac{1}{n}\right)^n\right\}_{n=1}^\infty$ is increasing and that it converges.
Hint. To prove it is increasing, bring $a^{n+1}$ to the right in part 2-36b and use $a = 1 + \frac{1}{n+1}$ and $b = 1 + \frac{1}{n}$. To prove the sequence is bounded above, use $a = 1$ and $b = 1 + \frac{1}{2n}$.

2-37. Use the Monotone Sequence Theorem and the axioms for $\mathbb{R}$ except Axiom 1.19 to prove Axiom 1.19.

2-38. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except Axiom 1.19 to prove Axiom 1.19.

2-39. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove directly (that is, without using Theorem 2.27 or its proof) that every Cauchy sequence of real numbers converges.

2-40. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Monotone Sequence Theorem.

2-41. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be bounded sequences.
(a) Prove that $\{a_n + b_n\}_{n=1}^\infty$ is bounded.
Hint. Unlike for convergence, the bound need not be one number $M$. An upper bound of the form $M_a + M_b$ would work just fine.
(b) Prove that $\{a_n b_n\}_{n=1}^\infty$ is bounded.
(c) Prove that if there is a $\delta > 0$ so that $b_n > \delta$ for all $n \in \mathbb{N}$, then $\left\{\frac{a_n}{b_n}\right\}_{n=1}^\infty$ is bounded.

2-42. Let $x \in [0, 1]$. Prove that the sequence defined recursively by $a_0 := 0$ and the recurrence relation $a_{n+1} := a_n + \frac{1}{2}\left(x - a_n^2\right)$ converges to $\sqrt{x}$.
Hint. Prove by induction that the sequence is bounded above by $\sqrt{x}$. Then prove that it is nondecreasing.

2.5 Infinite Limits

Although unbounded sequences diverge, they can display some types of regular behavior, which are explored in this section.

Definition 2.42 Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers. Then we say that the limit of $\{a_n\}_{n=1}^\infty$ is infinity iff for every $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n \ge M$ holds. In this case, we write $\lim_{n\to\infty} a_n = \infty$. A limit of negative infinity is defined similarly (Exercise 2-43) and denoted $\lim_{n\to\infty} a_n = -\infty$.

Intuitively an infinite limit should mean that eventually the sequence gets close to infinity. Similar to Definition 2.2, this idea is encoded in Definition 2.42 by saying that for any given bound $M$, there is a threshold $N$ so that once the running index $n$ goes past the threshold $N$, the sequence will not drop below the bound $M$ any more. It is also said that a sequence with $\lim_{n\to\infty} a_n = \infty$ grows beyond all bounds.

Example 2.43 Let $x > 1$. Then $\lim_{n\to\infty} x^n = \infty$.
Clearly, for all $n \in \mathbb{N}$ the inequality $x^{n+1} > x^n$ holds. Suppose for a contradiction that $\{x^n\}_{n=1}^\infty$ does not go to infinity. Then (refer to Standard Proof Technique 2.30 as necessary) there is a $B > 0$ so that for all $N \in \mathbb{N}$ there is an $m \ge N$ with $x^m \le B$. Let $n \in \mathbb{N}$. By the above, there is an $m \ge n$ so that $x^n \le x^m \le B$. Hence, the sequence $\{x^n\}_{n=1}^\infty$ is bounded above. Let $M := \sup\{x^n : n \in \mathbb{N}\}$. Now by Proposition 1.21, because $\frac{M}{x} < M$, there is an $n \in \mathbb{N}$ so that $x^n > \frac{M}{x}$. But then $x^{n+1} = x \cdot x^n > x \cdot \frac{M}{x} = M$, a contradiction. Thus $\lim_{n\to\infty} x^n = \infty$.

With some exceptions, discussed at the end of this section, the limit laws for infinite limits are similar to those for convergent sequences.

Theorem 2.44 Limit laws involving $\infty$. Let $\{a_n\}_{n=1}^\infty$ be such that $\lim_{n\to\infty} a_n = \infty$.

1. If $\{b_n\}_{n=1}^\infty$ is a bounded sequence, then $\lim_{n\to\infty} (a_n + b_n) = \infty$ and if all $a_n$ are nonzero, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.
2. If $\lim_{n\to\infty} b_n = \infty$, then $\lim_{n\to\infty} (a_n + b_n) = \infty$ and $\lim_{n\to\infty} a_n b_n = \infty$.
3. If $c > 0$ is a real number, then $\lim_{n\to\infty} c a_n = \infty$.
4. If $c < 0$ is a real number, then $\lim_{n\to\infty} c a_n = -\infty$.

Proof. To prove part 1 let $\lim_{n\to\infty} a_n = \infty$ and let $\{b_n\}_{n=1}^\infty$ be bounded. Let $B \in \mathbb{R}$ be such that for all $n \in \mathbb{N}$ the inequality $|b_n| < B$ holds.

First consider the sum. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n + b_n \ge M$.

Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge M + B$. But then for all $n \ge N$ we obtain $a_n + b_n > M + B - B = M$, and hence $\lim_{n\to\infty} (a_n + b_n) = \infty$.

Now consider the quotient. We need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $\left|\frac{b_n}{a_n}\right| < \varepsilon$ holds.

Let $\varepsilon > 0$. By Theorem 1.32 there is an $M \in \mathbb{N}$ so that $\frac{|B|}{\varepsilon} < M$, which means $\frac{|B|}{M} < \varepsilon$. Now there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge M$. Thus for all $n \ge N$ we obtain $\left|\frac{b_n}{a_n}\right| = \frac{|b_n|}{a_n} < \frac{|B|}{M} < \varepsilon$, and hence $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.

To prove part 2 let $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = \infty$.
For the sum, we need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n + b_n \ge M$ holds.

Let $M \in \mathbb{R}$ and note that there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequalities $a_n \ge \frac{M}{2}$ and $b_n \ge \frac{M}{2}$ (see Standard Proof Technique 2.6). But then for all $n \ge N$ we have $a_n + b_n \ge \frac{M}{2} + \frac{M}{2} = M$. Hence, $\lim_{n\to\infty} (a_n + b_n) = \infty$.

For the product, we need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n b_n \ge M$ holds.

Let $M \in \mathbb{R}$ and note that there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequalities $a_n \ge \sqrt{|M|}$ and $b_n \ge \sqrt{|M|}$. But then for all $n \ge N$ the product is at least $M$ because $a_n b_n \ge \sqrt{|M|}\sqrt{|M|} = |M| \ge M$. Hence, $\lim_{n\to\infty} a_n b_n = \infty$.

To prove part 3, let $\lim_{n\to\infty} a_n = \infty$ and let $c > 0$. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $c a_n \ge M$ holds.

Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge \frac{M}{c}$. But then for all $n \ge N$ we obtain $c a_n \ge c \frac{M}{c} = M$. Hence, $\lim_{n\to\infty} c a_n = \infty$.

Finally, to prove part 4 let $\lim_{n\to\infty} a_n = \infty$ and let $c < 0$. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $c a_n \le M$ holds.

Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge -\frac{M}{|c|}$. But then for all $n \ge N$ we obtain $c a_n \le c \left(-\frac{M}{|c|}\right) = M$. Hence, $\lim_{n\to\infty} c a_n = -\infty$. ∎

Infinite limits can also help indirectly to establish the existence of finite limits.

Example 2.45 Let $x > 0$. Then $\lim_{n\to\infty} x^{1/n} = 1$.

The result is trivial for $x = 1$. We first consider $x > 1$. Suppose for a contradiction that $\{x^{1/n}\}_{n=1}^\infty$ does not converge to 1. For every $n \in \mathbb{N}$ we have $1 < x^{\frac{1}{n+1}} = \left(x^{\frac{1}{n}}\right)^{\frac{n}{n+1}} < x^{\frac{1}{n}}$. Thus if $\{x^{1/n}\}_{n=1}^\infty$ does not converge to 1, then (refer to Standard Proof Technique 2.30 as necessary) because $x^{1/n} > 1$ for all $n \in \mathbb{N}$, there is an $\varepsilon > 0$ so that for every $N \in \mathbb{N}$ there is an $n \ge N$ with $x^{1/n} > 1 + \varepsilon$. Because $\{x^{1/n}\}_{n=1}^\infty$ is decreasing this would mean that $x^{1/n} > 1 + \varepsilon$ for all $n \in \mathbb{N}$. But then for all $n \in \mathbb{N}$ we would have $(1 + \varepsilon)^n < \left(x^{1/n}\right)^n = x$, contradicting the fact that $\lim_{n\to\infty} (1 + \varepsilon)^n = \infty$ (see Example 2.43). Thus for all $x > 1$ we have $\lim_{n\to\infty} x^{1/n} = 1$.

The proof for $0 < x < 1$ is deferred to Exercise 2-46.

Just as for infinite limits, there are limit laws for limits that equal negative infinity.

Theorem 2.46 Limit laws involving $-\infty$. Let $\{a_n\}_{n=1}^\infty$ be such that $\lim_{n\to\infty} a_n = -\infty$.

1. If $\{b_n\}_{n=1}^\infty$ is a bounded sequence, then $\lim_{n\to\infty} (a_n + b_n) = -\infty$ and if all $a_n$ are nonzero, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.
2. If $\lim_{n\to\infty} b_n = -\infty$, then $\lim_{n\to\infty} (a_n + b_n) = -\infty$ and $\lim_{n\to\infty} a_n b_n = \infty$.
3. If $\lim_{n\to\infty} b_n = \infty$, then $\lim_{n\to\infty} a_n b_n = -\infty$.
4. If $c > 0$ is a real number, then $\lim_{n\to\infty} c a_n = -\infty$.
5. If $c < 0$ is a real number, then $\lim_{n\to\infty} c a_n = \infty$.

Proof. The proof of Theorem 2.46 is similar to that of Theorem 2.44. It is thus left to the reader as Exercise 2-44. ∎

The addition of two sequences such that one has limit $\infty$ and the other has limit $-\infty$ is absent from Theorems 2.44 and 2.46. This is because by Exercise 2-48b the sum need not converge. Exercise 2-48c shows that even if there is a limit, it is not the same number in all cases. The situation is similar for the product of a sequence with infinite limit and a sequence with limit zero (see Exercises 2-48d and 2-48e), as well as for the quotient of two sequences with infinite limits (see Exercises 2-48f and 2-48g). These types of limits are called indeterminate forms and they are discussed in more detail in Section 12.3.

Exercises

2-43. State the definition of a sequence whose limit is negative infinity.

2-44. Prove Theorem 2.46. That is, prove each of the following.
(a) If $\lim_{n\to\infty} a_n = -\infty$ and $\{b_n\}_{n=1}^\infty$ is a bounded sequence, then $\lim_{n\to\infty} (a_n + b_n) = -\infty$ and if all $a_n$ are nonzero, then $\lim_{n\to\infty} \frac{b_n}{a_n} = 0$.
(b) If $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = -\infty$, then $\lim_{n\to\infty} (a_n + b_n) = -\infty$ and $\lim_{n\to\infty} a_n b_n = \infty$.
(c) If $\lim_{n\to\infty} a_n = -\infty$ and $\lim_{n\to\infty} b_n = \infty$, then $\lim_{n\to\infty} a_n b_n = -\infty$.
(d) If $c > 0$ is a real number and $\lim_{n\to\infty} a_n = -\infty$, then $\lim_{n\to\infty} c a_n = -\infty$.
(e) If $c < 0$ is a real number and $\lim_{n\to\infty} a_n = -\infty$, then $\lim_{n\to\infty} c a_n = \infty$.

2-45. Let $0 < x < 1$.
(a) Prove that $\lim_{n\to\infty} x^n = 0$ by using the limit laws.
(b) Prove that $\lim_{n\to\infty} x^n = 0$ by mimicking the proof in Example 2.43.

2-46. Let $0 < x < 1$. Prove that $\lim_{n\to\infty} x^{1/n} = 1$.

2-47. Prove that $\left\{n^{1/n}\right\}_{n=3}^\infty$ is decreasing and that the limit is 1.
Hint. Use Exercise 2-36c to show that $\left(1 + \frac{1}{n}\right)^n \le n$ for all large enough $n$ and derive the inequality $(n+1)^{\frac{1}{n+1}} \le n^{\frac{1}{n}}$ from this. For the limit $L$, use the result from Example 2.45 to show that $L = L^2$.

2-48. A first encounter with indeterminate forms.
(a) For $\{a_n\}_{n=1}^\infty := \{n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty := \{-n\}_{n=1}^\infty$, prove that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = -\infty$ and that the sequence $\{a_n + b_n\}_{n=1}^\infty$ converges.
(b) For $\{a_n\}_{n=1}^\infty := \{n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty := \{-n^2\}_{n=1}^\infty$, prove that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = -\infty$ and that the sequence $\{a_n + b_n\}_{n=1}^\infty$ diverges.
(c) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ so that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = -\infty$ and $\lim_{n\to\infty} (a_n + b_n) = c$.
(d) Find sequences $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ so that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = 0$ and $\{a_n b_n\}_{n=1}^\infty$ diverges.
(e) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ so that $\lim_{n\to\infty} a_n = \infty$, $\lim_{n\to\infty} b_n = 0$ and $\lim_{n\to\infty} a_n b_n = c$.
(f) Find sequences $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ so that $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = \infty$ and $\left\{\frac{a_n}{b_n}\right\}_{n=1}^\infty$ diverges.
(g) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ with infinite limits so that $\lim_{n\to\infty} \frac{a_n}{b_n} = c$.

2-49. Prove that if $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ are sequences so that $\lim_{n\to\infty} a_n = \infty$ and there are an $\varepsilon > 0$ and an $N \in \mathbb{N}$ so that $b_n > \varepsilon$ for all $n \ge N$, then $\lim_{n\to\infty} a_n b_n = \infty$.

2-50. A characterization of divergent sequences.
(a) Let $\{a_n\}_{n=1}^\infty$ be an unbounded sequence. Prove that there is a subsequence $\{a_{n_k}\}_{k=1}^\infty$ so that $\lim_{k\to\infty} a_{n_k} = \infty$ or $\lim_{k\to\infty} a_{n_k} = -\infty$.
(b) Let $\{a_n\}_{n=1}^\infty$ be a bounded divergent sequence. Prove that there are two convergent subsequences $\{a_{l_m}\}_{m=1}^\infty$ and $\{a_{n_k}\}_{k=1}^\infty$ such that $\lim_{m\to\infty} a_{l_m}$ and $\lim_{k\to\infty} a_{n_k}$ exist, but are not equal.
(c) Let $\{a_n\}_{n=1}^\infty$ be a sequence. Prove that $\{a_n\}_{n=1}^\infty$ diverges if and only if there is a subsequence $\{a_{n_k}\}_{k=1}^\infty$ such that $\lim_{k\to\infty} a_{n_k} = \infty$ or $\lim_{k\to\infty} a_{n_k} = -\infty$, or there are two subsequences $\{a_{l_m}\}_{m=1}^\infty$ and $\{a_{n_k}\}_{k=1}^\infty$ such that $\lim_{m\to\infty} a_{l_m}$ and $\lim_{k\to\infty} a_{n_k}$ exist, but are not equal.
(d) Use the characterization of divergent sequences in part 2-50c and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Bolzano-Weierstrass Theorem.

2-51. Prove Cauchy's Limit Theorem. That is, let $\{b_n\}_{n=1}^\infty$ be a strictly increasing sequence of positive numbers that goes to infinity and let $\{a_n\}_{n=1}^\infty$ be a sequence. Prove that if the sequence of quotients $\frac{a_n - a_{n-1}}{b_n - b_{n-1}}$ converges to $c$, then $\lim_{n\to\infty} \frac{a_n}{b_n} = c$.
Hint. Exercise 2-16 with $p_n := b_n - b_{n-1}$ and another appropriate sequence.

Chapter 3
Continuous Functions

Functions are the central objects of analysis. This chapter defines limits and continuity for functions of a real variable and it presents some consequences of continuity. To avoid problems with complicated domains (see Exercise 3-32 and the end of Section 16.3.1 for some details), in this chapter functions are usually considered on intervals or on intervals from which at most finitely many points were removed. These domains are sufficient to build the traditional calculus of functions of one variable. More complicated domains are handled in metric spaces in the second part of the text.

3.1 Limits of Functions

The limit of a function at a point $x$ is supposed to express what happens near $x$, but not necessarily at $x$. This is similar to the running index $n$ of a sequence never actually becoming $\infty$. While $\infty$ is not in the domain of a sequence, a real number $x$ can be in the domain of a function. Hence, we must explicitly remove $x$ from consideration.
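As a small numeric sketch of why the point $x$ itself is set aside, the Python snippet below uses a hypothetical function `f` (not taken from the text) whose value at $x = 1$ is deliberately unrelated to its values nearby. The names `f`, `approach`, and `values` are illustrative assumptions, and finitely many samples only suggest a trend; they prove nothing.

```python
from fractions import Fraction

def f(z):
    # Hypothetical function: defined at x = 1, but with an unrelated value there.
    return Fraction(100) if z == 1 else 2 * z + 1

x = Fraction(1)

# Points z_n = x + 1/n get close to x but never equal x.
approach = [x + Fraction(1, n) for n in (1, 10, 100, 1000)]
values = [f(z) for z in approach]

print("values near x:", [float(v) for v in values])
print("value at x:   ", float(f(x)))
```

The sampled values settle near 3, while the value at $x$ itself is 100, so what happens at $x$ carries no information about what happens near $x$. Exact rational arithmetic via `fractions.Fraction` is used so that the samples are not blurred by floating-point rounding.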
In this section, functions are defined on an open interval from which a point $x$ has been removed. In this fashion, we assure that each function is defined “close to the left of $x$” and “close to the right of $x$,” which is what we need to investigate for (two-sided) limits. Because convergence of sequences is already defined, we can use sequences to define convergence of functions.

Definition 3.1 Sequence formulation of the limit of a function. Let $I \subseteq \mathbb{R}$ be an open interval and let $x \in I$. The number $L \in \mathbb{R}$ is called the limit of the function $f : I \setminus \{x\} \to \mathbb{R}$ at $x$ iff for all sequences $\{z_n\}_{n=1}^\infty$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = L$. In this case, we denote $\lim_{z\to x} f(z) := L$ and we also say that $f$ converges (to $L$) at $x$.

Similar to Theorem 2.11, the limit of a function at $x$ is only affected by the values of the function near $x$.

Theorem 3.2 Let $I \subseteq \mathbb{R}$ be an open interval, $x \in I$, and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions. If there is a number $\delta > 0$ so that $f$ and $g$ are equal on the subset $\{z \in I \setminus \{x\} : |z - x| < \delta\}$ of $I \setminus \{x\}$, then $f$ converges at $x$ iff $g$ converges at $x$ and in this case the equality $\lim_{z\to x} f(z) = \lim_{z\to x} g(z)$ holds.

Proof. Exercise 3-2. ∎

By Theorem 3.2, Definition 3.1 also defines the limit at $x$ for any function that is defined on a set $D$ that contains a set $I \setminus \{x\}$, where $I$ is an open interval that contains $x$. Formally, we define the following.

Definition 3.3 Let $D$, $R$, $S$ be sets with $R \subseteq D$ and let $f : D \to S$ be a function. The restriction of the function $f$ to $R$, denoted $f|_R$, is defined by $f|_R(x) := f(x)$ for all $x \in R$.

Definition 3.4 Let $f : D \to \mathbb{R}$ be a function and let $x \in \mathbb{R}$ be so that there is an open interval $I$ with $x \in I$ and $I \setminus \{x\} \subseteq D$. We define the limit of $f$ at $x$ as the limit of the restriction $f|_{I \setminus \{x\}}$ at $x$ and denote it $\lim_{z\to x} f(z) := \lim_{z\to x} f|_{I \setminus \{x\}}(z)$.
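Definitions 3.1, 3.3, and 3.4 can be tried out numerically. The sketch below is only an illustration under stated assumptions: the function `f`, the helper `restrict`, the interval $I = (1, 3)$, the candidate limit 4, and the three sample sequences are all hypothetical choices, and checking finitely many terms of three sequences is of course not a proof that the limit exists.

```python
from fractions import Fraction

def restrict(f, in_subdomain):
    """Return f restricted to the points z for which in_subdomain(z) is True."""
    def restricted(z):
        if not in_subdomain(z):
            raise ValueError(f"{z} is outside the restricted domain")
        return f(z)
    return restricted

def f(z):
    # Hypothetical function on all of R; its value at 2 plays no role in the limit at 2.
    return z * z if z != 2 else Fraction(-7)

x = Fraction(2)
# I = (1, 3) is an open interval containing x; the restriction lives on I \ {x}.
f_restricted = restrict(f, lambda z: 1 < z < 3 and z != x)

# Three sequences in I \ {x} that converge to x: from the right, from the left, alternating.
sequences = {
    "right": lambda n: x + Fraction(1, n + 1),
    "left": lambda n: x - Fraction(1, n + 1),
    "alternating": lambda n: x + Fraction((-1) ** n, n + 1),
}

def tail_error(seq, L, n):
    """Distance |f(z_n) - L| for the n-th term of the sequence seq."""
    return abs(f_restricted(seq(n)) - L)

for name, seq in sequences.items():
    errors = [tail_error(seq, Fraction(4), n) for n in (1, 10, 100, 1000)]
    print(name, [float(e) for e in errors])
```

For each sample sequence the distances to the candidate limit 4 shrink as $n$ grows, which is consistent with $\lim_{z\to 2} f(z) = 4$. Calling `f_restricted(x)` raises a `ValueError`, mirroring the fact that $x$ is not in the domain of the restriction.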
All the following results on functions $f : I \setminus \{x\} \to \mathbb{R}$ also apply to functions with larger domains. Strictly speaking we would need to apply all definitions and results to the restriction of the function to a set $I \setminus \{x\}$, where $I$ is an open interval and $x \in I$. We will usually avoid this simple formality. Ultimately, Definition 16.33 will encompass this situation as well as some situations in which, for single variable functions, we use one-sided limits (see Definition 3.15 below).

Example 3.5
1. For all $x \in \mathbb{R}$, we have $\lim_{z\to x} z = x$.
2. For all $x \in \mathbb{R}$, we have $\lim_{z\to x} |z| = |x|$.
3. The function $f(x) := \begin{cases} 1; & \text{for } x > 0, \\ 0; & \text{for } x = 0, \\ -1; & \text{for } x < 0, \end{cases}$ does not have a limit at $x = 0$.

Part 1 is trivial and part 2 follows from Exercise 2-12. To see that the function in part 3 does not have a limit at $x = 0$, it suffices to produce two sequences $\{y_n\}_{n=1}^\infty$ and $\{z_n\}_{n=1}^\infty$ that converge to zero so that $y_n \ne 0$ and $z_n \ne 0$ for all $n \in \mathbb{N}$ and $\{f(y_n)\}_{n=1}^\infty$ and $\{f(z_n)\}_{n=1}^\infty$ do not converge to the same limit. With $y_n := \frac{1}{n}$ and $z_n := -\frac{1}{n}$ for all $n \in \mathbb{N}$ we obtain $\lim_{n\to\infty} f\left(\frac{1}{n}\right) = 1 \ne -1 = \lim_{n\to\infty} f\left(-\frac{1}{n}\right)$, which completes the argument.

Standard Proof Technique 3.6 If a sequence that converges to a number $x$ from the right is needed, $\left\{x + \frac{1}{n}\right\}_{n=1}^\infty$ is usually a good choice. For a sequence that converges to $x$ from the left, $\left\{x - \frac{1}{n}\right\}_{n=1}^\infty$ is usually a good choice. This idea is extended in Standard Proof Technique 3.8.

There are at least two ways to define the limit of a function. We have already seen the formulation with sequences and we give the formulation with $\varepsilon$ and $\delta$ below. The two formulations are equivalent, and hence either one could serve as the definition. With both formulations available, we can choose which one to use. Depending on the situation, one formulation may be preferable over the other to produce a simpler statement or proof. The proof of Theorem 3.19 is a good example of how each formulation is better suited for certain settings than the other.

Theorem 3.7 $\varepsilon$-$\delta$ formulation of the limit of a function.
Let $I \subseteq \mathbb{R}$ be an open interval and let $x \in I$. Then $L \in \mathbb{R}$ is the limit of the function $f : I \setminus \{x\} \to \mathbb{R}$ at $x$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have that $|f(z) - L| < \varepsilon$.

Proof. For “$\Rightarrow$,” let $\lim_{z\to x} f(z) = L$. Suppose, for a contradiction, the statement on the right is false. Then there is an $\varepsilon > 0$ so that for each $\delta > 0$ there is a $z \in I \setminus \{x\}$ with $|z - x| < \delta$ and $|f(z) - L| \ge \varepsilon$ (if necessary, use Standard Proof Technique 2.30 for the negation). Then for $\delta := \frac{1}{n}$ there is a $z_n \in I \setminus \{x\}$ with $|z_n - x| < \frac{1}{n}$ and $|f(z_n) - L| \ge \varepsilon$. But then $\lim_{n\to\infty} z_n = x$, while $\lim_{n\to\infty} f(z_n)$ either does not exist, or if it exists, then $\lim_{n\to\infty} f(z_n) \ne L$ (see Exercise 2-5). Either way we have arrived at a contradiction.

For “$\Leftarrow$,” let $f : I \setminus \{x\} \to \mathbb{R}$ be such that for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have that $|f(z) - L| < \varepsilon$. We need to prove that for each sequence $\{z_n\}_{n=1}^\infty$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have that $\lim_{n\to\infty} f(z_n) = L$.

Let $\{z_n\}_{n=1}^\infty$ be a sequence with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$, and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|f(z) - L| < \varepsilon$. Moreover, for $\delta$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|z_n - x| < \delta$. But then for all $n \ge N$ we infer $|f(z_n) - L| < \varepsilon$, and hence $\lim_{n\to\infty} f(z_n) = L$. This proves that $\lim_{z\to x} f(z) = L$. ∎

Standard Proof Technique 3.8 In the “$\Rightarrow$” part of the proof of Theorem 3.7, for all $\delta > 0$ there is a $z$ with $|z - x| < \delta$ and other properties. To obtain a sequence $\{z_n\}_{n=1}^\infty$ that converges to $x$ so that each $z_n$ has the other desired properties, it is standard practice to use $\delta := \frac{1}{n}$ and then pick an appropriate element $z_n$ with $|z_n - x| < \frac{1}{n}$.

Exercises

3-1. Explain why after Definition 3.1 it is not necessary to prove that the limit of a function is unique.

3-2. Prove Theorem 3.2.
3-3. Let $m, b \in \mathbb{R}$. Prove that $\lim_{z\to x} (mz + b) = mx + b$.

3-4. Let $I$ be an open interval and let $x \in I$. Prove that if $f : I \setminus \{x\} \to \mathbb{R}$ does not converge to $L$ at $x$, then there are an $\varepsilon > 0$ and a sequence $\{z_n\}_{n=1}^\infty$ so that $\lim_{n\to\infty} z_n = x$, $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $|f(z_n) - L| \ge \varepsilon$ for all $n \in \mathbb{N}$.

3-5. Prove that if $m \in \mathbb{Z}$ and $\lfloor \cdot \rfloor$ is the floor function, then $\lim_{z\to m} \lfloor z \rfloor$ does not exist.

3-6. Prove that $\lim_{z\to 3} \frac{z - 3}{z^2 - 9} = \frac{1}{6}$.

3-7. Prove that the Dirichlet function $f(x) = \begin{cases} 1; & \text{for } x \in \mathbb{Q}, \\ 0; & \text{for } x \notin \mathbb{Q}, \end{cases}$ does not converge at any $x \in \mathbb{R}$.

3-8. Explain why, with the present definition, $\lim_{z\to 0} \sqrt{z}$ is not defined. Then state what the limit should be and how we could circumvent our purely formal problem.
Note. This problem will be resolved in Exercise 16-28.

3.2 Limit Laws

Just as for limits of sequences, we are interested in how limits of functions relate to the algebraic operations, because this should simplify the computation of limits. First note that all algebraic operations on functions are defined pointwise.

Definition 3.9 Let $D \subseteq \mathbb{R}$ be a set and let $f, g : D \to \mathbb{R}$ be functions. For all $x \in D$ we define $(f + g)(x) := f(x) + g(x)$, $(f - g)(x) := f(x) - g(x)$, and $(f \cdot g)(x) := f(x) \cdot g(x)$. For all $x \in D$ with $g(x) \ne 0$ we define $\left(\frac{f}{g}\right)(x) := \frac{f(x)}{g(x)}$.

Theorem 3.10 Limit laws for functions. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z\to x} f(z)$ and $\lim_{z\to x} g(z)$ exist. Then the following hold.

1. $\lim_{z\to x} (f + g)(z) = \lim_{z\to x} f(z) + \lim_{z\to x} g(z)$.
2. $\lim_{z\to x} (f - g)(z) = \lim_{z\to x} f(z) - \lim_{z\to x} g(z)$.
3. $\lim_{z\to x} (f \cdot g)(z) = \lim_{z\to x} f(z) \cdot \lim_{z\to x} g(z)$.
4. If $\lim_{z\to x} g(z) \ne 0$, then $\lim_{z\to x} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z\to x} f(z)}{\lim_{z\to x} g(z)}$.

Each equation implicitly asserts that the limit on the left side exists (see box on p. 33). Moreover, formally, in part 4 we would need to demand also that $g(z) \ne 0$ for all $z \in I \setminus \{x\}$. But $\lim_{z\to x} g(z) \ne 0$ implies that $g(z) \ne 0$ for all $z \in I \setminus \{x\}$ that are near $x$.
Hence, if $g$ has zeros, we implicitly assume that $g$ has been restricted appropriately rather than worry about zeros that do not affect the convergence behavior.

Proof. Throughout this proof let $L_f := \lim_{z\to x} f(z)$ and let $L_g := \lim_{z\to x} g(z)$.

In this proof we will use Definition 3.1 as well as Theorem 3.7. Although it will turn out that Definition 3.1 in conjunction with the limit laws for sequences is more effective for this proof, it will also be instructive to see how $\varepsilon$-$\delta$ proofs are constructed. The reader will compare the two approaches by proving each part with the respective other approach in Exercises 3-9 and 3-10.

To prove part 1, we use Theorem 3.7. We need to prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|(f + g)(z) - (L_f + L_g)| < \varepsilon$.

Let $\varepsilon > 0$. Then there are $\delta_f > 0$ and $\delta_g > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_f$ we have $|f(z) - L_f| < \frac{\varepsilon}{2}$ and for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_g$ we have $|g(z) - L_g| < \frac{\varepsilon}{2}$. Let $\delta := \min\{\delta_f, \delta_g\}$ (compare with Standard Proof Technique 2.6). Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we obtain via the triangular inequality that

$$|(f + g)(z) - (L_f + L_g)| \le |f(z) - L_f| + |g(z) - L_g| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

This means we have proved that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|(f + g)(z) - (L_f + L_g)| < \varepsilon$. Consequently, $\lim_{z\to x} (f + g)(z) = L_f + L_g$.

To prove part 2 using Definition 3.1, we need to prove that for all sequences $\{z_n\}_{n=1}^\infty$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} (f - g)(z_n) = L_f - L_g$.

Let $\{z_n\}_{n=1}^\infty$ be a sequence in $I \setminus \{x\}$ with $\lim_{n\to\infty} z_n = x$. By Theorem 2.14 we infer

$$\lim_{n\to\infty} (f - g)(z_n) = \lim_{n\to\infty} f(z_n) - \lim_{n\to\infty} g(z_n) = L_f - L_g.$$

Because $\{z_n\}_{n=1}^\infty$ was arbitrary this implies $\lim_{z\to x} (f - g)(z) = L_f - L_g$.

The proofs of parts 1 and 2 show that to prove the limit laws, Definition 3.1 is more effective than Theorem 3.7. Nonetheless, both ways are actually equally complex overall. If we compare the proof of part 1 with the proof of part 1 of Theorem 2.14 we see striking similarities in the arguments. This means that the complexity of a proof using Definition 3.1 is simply delegated to the proof of an earlier result (Theorem 2.14). The reader will have the chance to analyze the similarities and the differences in the proofs for part 3 in Exercise 3-9. The similarity between the proofs here and the proofs for Theorem 2.14 can be used to translate the proof for part 4 into a complete proof of part 4 of Theorem 2.14. The rather complicated choices in the header were of course made after the final inequalities had been analyzed carefully.

The proof of part 3 is left to the reader as Exercise 3-9.

For part 4, we use Theorem 3.7. So we need to prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $\left|\frac{f}{g}(z) - \frac{L_f}{L_g}\right| < \varepsilon$.

Let $\varepsilon > 0$. Then there is a $\delta_f > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_f$ we have $|f(z) - L_f| < \frac{\varepsilon |L_g|}{4}$. Moreover, there is a $\delta_g > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_g$ we have $|g(z) - L_g| < \frac{\varepsilon |L_g|^2}{2(2|L_f| + 1)}$. Finally, there is a $\nu > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \nu$ we have $|g(z)| > \frac{|L_g|}{2}$. Let $\delta := \min\{\delta_f, \delta_g, \nu\}$. Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we obtain

$$\left|\frac{f(z)}{g(z)} - \frac{L_f}{L_g}\right| = \frac{|f(z) L_g - L_f g(z)|}{|g(z)|\,|L_g|} \le \frac{|f(z) - L_f|\,|L_g| + |L_f|\,|L_g - g(z)|}{|g(z)|\,|L_g|} \le \frac{2}{|L_g|} |f(z) - L_f| + \frac{2|L_f|}{|L_g|^2} |g(z) - L_g| < \frac{\varepsilon}{2} + \frac{|L_f|}{2|L_f| + 1} \varepsilon < \varepsilon.$$

We have proved that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $\left|\frac{f}{g}(z) - \frac{L_f}{L_g}\right| < \varepsilon$. Therefore the limit of the quotient exists and $\lim_{z\to x} \left(\frac{f}{g}\right)(z) = \frac{L_f}{L_g}$. ∎

Just like limits of sequences, limits of functions preserve inequalities and there also is a Squeeze Theorem.

Definition 3.11 Let $D \subseteq \mathbb{R}$ be a set and let $f, g : D \to \mathbb{R}$ be functions. We say $f \le g$, that is, $f$ is pointwise less than or equal to $g$, iff for all $x \in D$ the inequality $f(x) \le g(x)$ holds.

Theorem 3.12 Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions. If $f \le g$ on $I \setminus \{x\}$ and $f$ and $g$ converge at $x$, then $\lim_{z\to x} f(z) \le \lim_{z\to x} g(z)$.

Proof. Exercise 3-11. ∎

Theorem 3.13 The Squeeze Theorem for functions. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g, h : I \setminus \{x\} \to \mathbb{R}$ be functions.
If $f \le g \le h$ on $I \setminus \{x\}$ and $f$ and $h$ converge at $x$ with $\lim_{z\to x} f(z) = \lim_{z\to x} h(z)$, then $g$ converges at $x$ and $\lim_{z\to x} g(z) = \lim_{z\to x} f(z) = \lim_{z\to x} h(z)$.

Proof. Exercise 3-12. ∎

Finally, convergence is also preserved by the composition of functions.

Theorem 3.14 Let $I, J \subseteq \mathbb{R}$ be open intervals, let $x \in I$, let $g : I \setminus \{x\} \to \mathbb{R}$ and $f : J \to \mathbb{R}$ be functions with $g[I \setminus \{x\}] \subseteq J$, and let $\lim_{z\to x} g(z) = L \in J$. Assume that $\lim_{y\to L} f(y)$ exists and that $g[I \setminus \{x\}] \subseteq J \setminus \{L\}$, or, in case $L \in g[I \setminus \{x\}]$, that $\lim_{y\to L} f(y) = f(L)$. Then $f \circ g$ converges at $x$ and $\lim_{z\to x} (f \circ g)(z) = \lim_{y\to L} f(y)$.

Proof. Let $M := \lim_{y\to L} f(y)$ and let $\{z_n\}_{n=1}^\infty$ be a sequence in the set $I \setminus \{x\}$ so that $\lim_{n\to\infty} z_n = x$. Then $\lim_{n\to\infty} g(z_n) = L$. If no $z_n$ satisfies $g(z_n) = L$, we obtain $\lim_{n\to\infty} f(g(z_n)) = M$, while if some $g(z_n)$ are equal to $L$, then $M = f(L)$ and we can infer $\lim_{n\to\infty} f(g(z_n)) = f(L) = M$ in this case also. Because the sequence $\{z_n\}_{n=1}^\infty$ was arbitrary the result is established. ∎

Exercises

3-9. Completing the proof of Theorem 3.10. Let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z\to x} f(z)$ and $\lim_{z\to x} g(z)$ exist.
(a) Prove part 3 using Definition 3.1.
(b) Prove part 3 using Theorem 3.7.

3-10. Alternative proofs for the proved parts of Theorem 3.10. Let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z\to x} f(z)$ and $\lim_{z\to x} g(z)$ exist.
(a) Prove that $\lim_{z\to x} (f + g)(z) = \lim_{z\to x} f(z) + \lim_{z\to x} g(z)$ (part 1) using Definition 3.1.
(b) Prove that $\lim_{z\to x} (f - g)(z) = \lim_{z\to x} f(z) - \lim_{z\to x} g(z)$ (part 2) using Theorem 3.7.
(c) Prove that if $\lim_{z\to x} g(z) \ne 0$, then $\lim_{z\to x} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z\to x} f(z)}{\lim_{z\to x} g(z)}$ (part 4) using Definition 3.1.

3-11. Prove Theorem 3.12.
(a) Using Definition 3.1.
(b) Using Theorem 3.7.

3-12. Prove Theorem 3.13.
(a) Using Definition 3.1.
(b) Using Theorem 3.7.

3-13.
Explain the similarities between the proof of part 1 of Theorem 3.10 as presented and the proof of part 1 of Theorem 2.14.

3-14. Computation of limits.
(a) Let $I$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Prove that $\lim_{z\to x} f(z) = \lim_{h\to 0} f(x + h)$.
(b) Compute each of the following limits.
i. $\lim_{x\to 3} \frac{x^2 - 5x + 6}{x^2 - x - 6}$
ii. $\lim_{x\to 2} \frac{3x^3 - x^2 - 12x + 4}{x^2 - 4}$
iii. $\lim_{x\to 9}$ [expression illegible in the source; its numerator is $x - 9$]

3.3 One-sided Limits and Infinite Limits

Section 3.5 will show that it is advantageous to consider continuous functions (see Definition 3.23) on closed intervals. But that means we also need a notion of convergence at the endpoints of a closed interval. One-sided limits provide just that.

Definition 3.15 Sequence formulation of the left limit of a function. Let $a < b$. The number $L \in \mathbb{R}$ is called the left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ iff for all sequences $\{z_n\}_{n=1}^\infty$ in $[a, b)$ with $\lim_{n\to\infty} z_n = b$ we have $\lim_{n\to\infty} f(z_n) = L$. In this case, we denote $\lim_{z\to b^-} f(z) := L$ and we say $f$ converges (to $L$) at $b$ from the left.

The right limit at $a$ for a function $f : (a, b] \to \mathbb{R}$ is defined similarly. It is denoted $\lim_{z\to a^+} f(z)$, and we say $f$ converges at $a$ from the right.

If $f : D \to \mathbb{R}$ is a function and the domain $D$ contains an interval $[a, b)$ with $a < b$, we define $\lim_{z\to b^-} f(z) := \lim_{z\to b^-} f|_{[a,b)}(z)$ if it exists. (Exercise 3-15 shows that these left limits are well defined.) Right limits are defined similarly.

Similar to limits, we prove most results for functions defined on half-open intervals. These results are also valid for functions with larger domains. We simply apply them to the appropriate restrictions.

Example 3.16 Let $m \in \mathbb{Z}$. For the floor and ceiling functions of Definition 1.33, we have $\lim_{z\to m^-} \lfloor z \rfloor = m - 1$, $\lim_{z\to m^+} \lfloor z \rfloor = m$, $\lim_{z\to m^-} \lceil z \rceil = m$, and $\lim_{z\to m^+} \lceil z \rceil = m + 1$.

Definitions 3.15 and 3.1 differ in only one way.
For a one-sided limit, the sequences must all stay on one side of the number, while in Definition 3.1 the sequences can have values on either side. To emphasize the ability to approach from either side, limits of functions are sometimes called two-sided limits.

With such strong similarity in the definitions, it is only natural that the theorems that govern one-sided limits are similar to the theorems that govern (two-sided) limits.

Theorem 3.17 Limit laws for one-sided limits. Let $f, g : [a, b) \to \mathbb{R}$ be functions such that $\lim_{z\to b^-} f(z)$ and $\lim_{z\to b^-} g(z)$ exist. Then the following hold.

1. $\lim_{z\to b^-} (f + g)(z) = \lim_{z\to b^-} f(z) + \lim_{z\to b^-} g(z)$.
2. $\lim_{z\to b^-} (f - g)(z) = \lim_{z\to b^-} f(z) - \lim_{z\to b^-} g(z)$.
3. $\lim_{z\to b^-} (f \cdot g)(z) = \lim_{z\to b^-} f(z) \cdot \lim_{z\to b^-} g(z)$.
4. If $\lim_{z\to b^-} g(z) \ne 0$, then $\lim_{z\to b^-} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z\to b^-} f(z)}{\lim_{z\to b^-} g(z)}$.

Each equation implicitly asserts that the limit on the left side exists. (See box on page 33.) Moreover, because $\lim_{z\to b^-} g(z) \ne 0$ in part 4 implies that $g(z) \ne 0$ for $z$ near $b$ (where it matters), we did not demand $g(z) \ne 0$ for all $z \in [a, b)$. Similar limit laws hold for right limits.

Proof. Exercise 3-16. ∎

Theorem 3.18 $\varepsilon$-$\delta$ formulation of the left limit of a function. The number $L \in \mathbb{R}$ is the left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in [a, b)$ with $|z - b| < \delta$ we have $|f(z) - L| < \varepsilon$.

Proof. Exercise 3-17. ∎

Theorem 3.19 connects one-sided limits to (two-sided) limits. Note that to make the proof efficient, we use the sequence formulation of the limit for one direction and the $\varepsilon$-$\delta$ formulation for the other direction.

Theorem 3.19 Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Then $\lim_{z\to x} f(z)$ exists iff $\lim_{z\to x^-} f(z)$ and $\lim_{z\to x^+} f(z)$ both exist and are equal. In this case the limit is equal to the left and the right limit.

Proof. For “$\Rightarrow$,” let $L := \lim_{z\to x} f(z)$.
Using Definition 3.15 we must prove that for all sequences $\{z_n\}_{n=1}^\infty$ in $I$ with $z_n < x$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = L$ and we must prove the same result for all sequences $\{z_n\}_{n=1}^\infty$ in $I$ with $z_n > x$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$.

Let $\{z_n\}_{n=1}^\infty$ be a sequence in $I$ with $z_n < x$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} z_n = x$. By Definition 3.1 we have $\lim_{n\to\infty} f(z_n) = L$, which was to be proved. Sequences with $z_n > x$ for all $n \in \mathbb{N}$ are treated similarly. Hence, $\lim_{z\to x^-} f(z) = \lim_{z\to x^+} f(z) = L$.

For “$\Leftarrow$,” let $\lim_{z\to x^-} f(z) = \lim_{z\to x^+} f(z) =: L$. By Theorem 3.7, we must prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|f(z) - L| < \varepsilon$.

Let $\varepsilon > 0$. By Theorem 3.18 there is a $\delta_l > 0$ so that for all $z \in I \setminus \{x\}$ with $z < x$ and $|z - x| < \delta_l$ we have $|f(z) - L| < \varepsilon$. By the corresponding result for right limits, there is a $\delta_r > 0$ so that for all $z \in I \setminus \{x\}$ with $z > x$ and $|z - x| < \delta_r$ we have $|f(z) - L| < \varepsilon$. Let $\delta := \min\{\delta_l, \delta_r\}$. Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we infer $|f(z) - L| < \varepsilon$. By Theorem 3.7, this proves $\lim_{z\to x} f(z) = L$. ∎

Standard Proof Technique 3.20 To prove that a function has a limit at $x \in \mathbb{R}$ one often proves that the left and the right limits exist and that they are equal.

Infinity and negative infinity are not numbers, so formally they do not qualify as limits. But knowing that a function grows beyond all bounds near a point gives more information than a statement that the limit does not exist. Hence, we extend the language to allow infinite limits.

Definition 3.21 Let $f : [a, b) \to \mathbb{R}$ be a function. Then the left limit of $f$ at $b$ is said to be infinity iff for all sequences $\{z_n\}_{n=1}^\infty$ in $[a, b)$ with $\lim_{n\to\infty} z_n = b$ we have $\lim_{n\to\infty} f(z_n) = \infty$. We denote $\lim_{z\to b^-} f(z) := \infty$.
Infinite right limits at $a$ for a function $f : (a,b] \to \mathbb{R}$ and infinite (two-sided) limits of a function $f : I \setminus \{x\} \to \mathbb{R}$ at $x \in I$, where $I$ is an open interval, are defined similarly. Limits equal to negative infinity are also defined similarly and they are denoted by $-\infty$. Finally, as before, infinite one-sided and two-sided limits of functions with larger domains are defined via the limits of appropriate restrictions.

We chose to put one-sided infinite limits in the spotlight in Definition 3.21, because functions often have different behavior to the left and to the right of a point. For example, $\lim_{z \to 0^+} \frac{1}{z} = \infty$ and $\lim_{z \to 0^-} \frac{1}{z} = -\infty$.

It is not surprising that there is a formulation of infinite limits that is similar to the $\varepsilon$-$\delta$ formulation of finite limits.

Theorem 3.22 $M$-$\delta$ formulation of infinite left limits. The left limit of the function $f : [a,b) \to \mathbb{R}$ at $b$ is infinity iff for all $M \in \mathbb{R}$ there is a $\delta > 0$ such that for all $z \in [a,b)$ with $|z - b| < \delta$ we have $f(z) > M$.

Proof. Exercise 3-18. ∎

Of course, similar results also hold for right-sided and two-sided limits. Limit laws for infinite limits and a version of Theorem 3.19 are given in Exercises 3-22 and 3-23.

Exercises

3-15. Let $f, g : [a,b) \to \mathbb{R}$ be functions. Prove that if there is a number $\delta > 0$ so that $f$ and $g$ are equal on the subset $\{z \in [a,b) : |z - b| < \delta\}$ of $[a,b)$, then the left limit of $f$ at $b$ exists iff the left limit of $g$ at $b$ exists and in this case we have $\lim_{z \to b^-} f(z) = \lim_{z \to b^-} g(z)$.

3-16. Prove Theorem 3.17. That is, let $f, g : [a,b) \to \mathbb{R}$ be functions such that $\lim_{z \to b^-} f(z)$ and $\lim_{z \to b^-} g(z)$ exist and prove each of the following.
(a) $\lim_{z \to b^-} (f+g)(z) = \lim_{z \to b^-} f(z) + \lim_{z \to b^-} g(z)$
(b) $\lim_{z \to b^-} (f-g)(z) = \lim_{z \to b^-} f(z) - \lim_{z \to b^-} g(z)$
(c) $\lim_{z \to b^-} (f \cdot g)(z) = \lim_{z \to b^-} f(z) \cdot \lim_{z \to b^-} g(z)$
(d) If $\lim_{z \to b^-} g(z) \neq 0$, then $\lim_{z \to b^-} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z \to b^-} f(z)}{\lim_{z \to b^-} g(z)}$

3-17. Prove Theorem 3.18.

3-18.
Prove Theorem 3.22.

3-19. Prove that $f(x) =$ […] (a piecewise definition with separate cases for $x < 1$, for $x = 1$ and for $x > 1$) has a limit at $x = 1$ and state its value.

3-20. Prove that $\lim_{z \to 0^+}$ […] $= 0$. Explain why this is a satisfactory resolution of the formal problem in Exercise 3-8 or why it is not.

3-21. A function $f$ is called nondecreasing on $I \subseteq \mathbb{R}$ iff for all $x_1 < x_2$ in $I$ we have $f(x_1) \le f(x_2)$. Let $f : [a,b] \to \mathbb{R}$ be a nondecreasing function.
(a) Prove that for every $x \in (a,b]$ we have $\lim_{z \to x^-} f(z) = \sup\{f(z) : z < x\}$. Hint. Mimic the proof of the Monotone Sequence Theorem.
(b) Prove that for every $x \in [a,b)$ the right limit $\lim_{z \to x^+} f(z)$ exists and state its value.

3-22. Limit laws involving infinite limits of functions. Let $f, g : [a,b) \to \mathbb{R}$ be functions, and let $\lim_{z \to b^-} f(z) = \infty$. Prove the following.
(a) If $g$ is bounded (that is, there is an $M > 0$ so that for all $z \in [a,b)$ we have $|g(z)| < M$), then $\lim_{z \to b^-} \big(f(z) + g(z)\big) = \infty$ and if all $f(z)$ are nonzero, then $\lim_{z \to b^-} \frac{g(z)}{f(z)} = 0$.
(b) If $\lim_{z \to b^-} g(z) = \infty$, then $\lim_{z \to b^-} \big(f(z) + g(z)\big) = \infty$ and $\lim_{z \to b^-} f(z) g(z) = \infty$.
(c) If $c > 0$ is a real number, then $\lim_{z \to b^-} c f(z) = \infty$.
(d) If $c < 0$ is a real number, then $\lim_{z \to b^-} c f(z) = -\infty$.

3-23. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Prove that $\lim_{z \to x} f(z) = \infty$ iff $\lim_{z \to x^-} f(z) = \lim_{z \to x^+} f(z) = \infty$.

3.4 Continuity

Continuous functions are usually defined as functions for which the limit is computed by substituting the value. Because we may need to take care of endpoints, the formalization of the elementary definition from calculus requires an extra item (see number 4 below).

Definition 3.23 Let $D \subseteq \mathbb{R}$ be an interval of nonzero length from which at most finitely many points have been removed and let $f : D \to \mathbb{R}$ be a function. Then $f$ is called continuous at $x$ iff
1. $f(x)$ is defined, that is, $x \in D$, and
2. $\lim_{z \to x} f(z)$ exists, and
3. $\lim_{z \to x} f(z) = f(x)$, and
4.
If $x$ is an endpoint of $D$, use left or right limits in 2 and 3, as appropriate.
$f$ is called continuous (on $D$) iff $f$ is continuous at every $x \in D$.

We could also define continuity for functions defined on sets for which every point of the domain is contained in an interval of nonzero length. Exercise 3-32 shows that with the present definition this idea is a bit too simple to produce a sensible result. This is not a problem, because in the early part of the text the only functions whose domains are not intervals are rational functions. For these functions the pathology of Exercise 3-32 is not an issue. Therefore we relegate all concerns regarding more complicated domains to Section 16.3.

For theorems, we will usually work with functions that are defined on intervals, because if $D$ is an interval from which at most finitely many points were removed, then $f : D \to \mathbb{R}$ is continuous at $x \in D$ iff $f|_I$ is continuous at $x$, where $I$ is a maximum-sized (with respect to containment) interval contained in $D$ that contains $x$.

Example 3.24
1. Constant functions are continuous at every $x \in \mathbb{R}$.
2. The function $f(x) = x$ is continuous at every $x \in \mathbb{R}$.
3. The function $f(x) = |x|$ is continuous at every $x \in \mathbb{R}$.
Parts 1 and 2 are trivial and part 3 follows from part 2 of Example 3.5. □

It is useful to incorporate the definition of limits directly into a characterization of continuity. With such a characterization it is not necessary to resolve the multiple parts of the original definition whenever we work with continuity. Figure 6 shows graphical interpretations of the conditions.

Theorem 3.25 Let $I \subseteq \mathbb{R}$ be an interval, let $x \in I$ and let $f : I \to \mathbb{R}$ be a function. Then the following are equivalent.
1. $f$ is continuous at $x$.
2. For every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$, we have $\lim_{n \to \infty} f(z_n) = f(x)$.
3. For every $\varepsilon > 0$, there is a $\delta > 0$ such that for all $z \in I$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \varepsilon$.
Proof. We will prove “1⇒2,” “2⇒3” and “3⇒1.” The remaining implications follow because logical implications are transitive. That is, the implications “1⇒2” and “2⇒3” imply “1⇒3,” the implications “2⇒3” and “3⇒1” imply “2⇒1,” and so on. We will assume throughout that $x$ is not an endpoint of the domain. The arguments are easily modified (by using appropriate one-sided limits and theorems) for the case that $x$ is an endpoint.

For “1⇒2,” we need to prove that for every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} f(z_n) = f(x)$. Because $f$ is continuous at $x$, we have $\lim_{z \to x} f(z) = f(x)$. By Definition 3.1, this means that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ and $\lim_{n \to \infty} z_n = x$ we have that $\lim_{n \to \infty} f(z_n) = f(x)$. Now let $\{z_n\}_{n=1}^{\infty}$ be a sequence with $z_n \in I$ and $\lim_{n \to \infty} z_n = x$. If there is an $N \in \mathbb{N}$ so that $z_n = x$ for all $n \ge N$, then there is nothing to prove. Otherwise let $\{z_{n_k}\}_{k=1}^{\infty}$ be the subsequence of all elements with $z_{n_k} \in I \setminus \{x\}$. Then $\lim_{k \to \infty} z_{n_k} = x$, and hence $\lim_{k \to \infty} f(z_{n_k}) = f(x)$. Now let $\varepsilon > 0$. Then there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|f(z_{n_k}) - f(x)| < \varepsilon$. Then for all $n \ge n_K$ either $z_n = x$ and $|f(z_n) - f(x)| = 0 < \varepsilon$ or there is a $k \ge K$ with $n = n_k$ and $|f(z_{n_k}) - f(x)| < \varepsilon$. This means $\lim_{n \to \infty} f(z_n) = f(x)$ for every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$, which establishes this part of the proof.

The proof of “2⇒3” is similar to the proof of “⇒” in Theorem 3.7 (Exercise 3-24).

For “3⇒1,” note that, because $f$ is defined on $I$ and $x \in I$, $f$ is defined at $x$. The condition in part 3 implies by Theorem 3.7 that $\lim_{z \to x} f(z) = f(x)$, which completes the proof. ∎

Because of the formal problems with limits of functions outlined at the end of Section 16.3.1, in general settings one of the conditions in Theorem 3.25 is normally used to define continuity.
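As an informal numerical illustration of condition 2 in Theorem 3.25, the following Python sketch evaluates $|f(z_n) - f(x)|$ along one particular sequence $z_n = x + (-1)^n/n$. The helper name `gaps`, the test functions `square` and `step`, and the choice of sequence are assumptions made for this illustration only; none of them come from the text. A single sequence can refute condition 2, but it can never establish continuity, which requires the condition for all sequences.

```python
def gaps(f, x, n_max):
    """Return [|f(z_n) - f(x)| for n = 1..n_max], where z_n = x + (-1)^n / n.

    The sequence z_n converges to x and alternates between both sides of x.
    """
    return [abs(f(x + (-1) ** n / n) - f(x)) for n in range(1, n_max + 1)]


def square(z):
    """Continuous everywhere; used here at x = 2."""
    return z * z


def step(z):
    """Hypothetical step function: 1 for z >= 0 and 0 for z < 0."""
    return 1.0 if z >= 0 else 0.0


square_gaps = gaps(square, 2.0, 1000)
step_gaps = gaps(step, 0.0, 1000)

# For the square function the gaps shrink (they are bounded by 5/n).
print(square_gaps[9], square_gaps[99], square_gaps[999])

# For the step function the gaps alternate between 1 (odd n, z_n < 0)
# and 0 (even n, z_n > 0), so f(z_n) does not converge to f(0).
print(step_gaps[998], step_gaps[999])
```

For the step function, this one sequence already shows that condition 2 fails at $x = 0$, so by Theorem 3.25 the function is not continuous there. For the square function, the shrinking gaps are only consistent with continuity at $x = 2$; they do not replace a proof.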
Standard Proof Technique 3.26 To prove the equivalence of several conditions, it is standard practice to prove that the first implies the second, the second implies the third, and so on, and finally that the last condition implies the first. All other implications follow from the transitivity of logical implications. □

The proofs of parts 5 and 6 of Theorem 3.27 serve as good examples of how the conditions in Theorem 3.25 can be used in conjunction.

Theorem 3.27 Let $I \subseteq \mathbb{R}$ be an interval, let $x \in I$ and let $f, g : I \to \mathbb{R}$ be continuous at $x \in I$. Then the following hold.
1. $f + g$ is continuous at $x$.
2. $f - g$ is continuous at $x$.
3. $f \cdot g$ is continuous at $x$.
4. If $g(x) \neq 0$ for all $x \in I$, then $\frac{f}{g}$ is continuous at $x$.
5. $\max\{f, g\}$ is continuous at $x$. (The maximum is defined pointwise.)
6. $\min\{f, g\}$ is continuous at $x$. (The minimum is defined pointwise.)

Proof. The first four parts are direct consequences of the corresponding limit laws. For part 5, let $x \in I$. We will establish continuity of $\max\{f, g\}$ at $x$ by proving that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} \max\{f(z_n), g(z_n)\} = \max\{f(x), g(x)\}$.

Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $I$ with $\lim_{n \to \infty} z_n = x$. In case $f(x) \neq g(x)$, assume without loss of generality that $f(x) > g(x)$. To prove that the limit of the image sequence is $\lim_{n \to \infty} \max\{f, g\}(z_n) = \max\{f, g\}(x) = f(x)$, let $\varepsilon > 0$. Because $f$ and $g$ are continuous at $x$, there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \min\left\{\varepsilon, \frac{f(x) - g(x)}{2}\right\}$ and $|g(z) - g(x)| < \frac{f(x) - g(x)}{2}$. In particular, for all $z \in I$ with $|z - x| < \delta$ we obtain $f(z) > \frac{f(x) + g(x)}{2} > g(z)$ and hence $\max\{f, g\}(z) = f(z)$. For $\delta$, find an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|z_n - x| < \delta$. Then for all $n \ge N$ we infer $|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |f(z_n) - f(x)| < \varepsilon$, and hence $\max\{f, g\}$ is continuous at $x$ when $f(x) > g(x)$.
This leaves the case $f(x) = g(x)$. Let $\varepsilon > 0$. Because $f$ and $g$ are continuous at $x$, there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have that $|f(z) - f(x)| < \varepsilon$ and $|g(z) - g(x)| < \varepsilon$. For $\delta$, find an $N \in \mathbb{N}$ so that for all $n \ge N$ we have that $|z_n - x| < \delta$. For all $n \ge N$, the maximum $\max\{f, g\}(z_n)$ is equal to $f(z_n)$ or $g(z_n)$. Because $|z_n - x| < \delta$, if $\max\{f, g\}(z_n) = f(z_n)$ we infer $|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |f(z_n) - f(x)| < \varepsilon$ and if $\max\{f, g\}(z_n) = g(z_n)$ we infer $|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |g(z_n) - g(x)| < \varepsilon$. Thus the maximum $\max\{f, g\}$ is continuous at $x$ when $f(x) = g(x)$.

The proof of part 6 is left as Exercise 3-25b. ∎

There is a faster proof of part 5 that relies on Theorem 3.30 below and on an algebraic representation of the maximum (see Exercise 3-26a). Because our main focus is on standard techniques in analysis, the longer proof was presented here. Theorem 3.27 gives access to two standard examples of continuous functions.

Example 3.28 A polynomial is a function $p : \mathbb{R} \to \mathbb{R}$ for which there are a nonnegative integer $n$ and $a_0, \ldots, a_n \in \mathbb{R}$ so that $a_n \neq 0$ and for all $x \in \mathbb{R}$ we have $p(x) = \sum_{i=0}^{n} a_i x^i$. The number $n$ is called the degree of the polynomial. The constant function $p(x) = 0$ is also considered to be a polynomial. Its degree is defined to be $-\infty$. Every polynomial is continuous on $\mathbb{R}$. (See Exercise 3-28.) □

Example 3.29 A rational function is a function $r$ for which there are two polynomials $p, q : \mathbb{R} \to \mathbb{R}$ so that for all $x \in \mathbb{R}$ for which $q(x) \neq 0$ we have $r(x) = \frac{p(x)}{q(x)}$. By Theorem 3.27 and Example 3.28, every rational function is continuous on its domain $\{x \in \mathbb{R} : q(x) \neq 0\}$. (We implicitly use here that every polynomial has at most finitely many zeroes.) □

Continuity is also preserved by compositions.

Theorem 3.30 Let $I, J \subseteq \mathbb{R}$ be intervals, let $g : I \to \mathbb{R}$ be continuous at $x \in I$, let $g[I] \subseteq J$ and let $f : J \to \mathbb{R}$ be continuous at $g(x)$.
Then $f \circ g : I \to \mathbb{R}$ is continuous at $x$.

Proof. We will prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(g(z)) - f(g(x))| < \varepsilon$. Let $\varepsilon > 0$. Because $f$ is continuous at $g(x)$, there is a $\delta_f > 0$ so that for all $y \in J$ with $|y - g(x)| < \delta_f$ we have $|f(y) - f(g(x))| < \varepsilon$. Because $g$ is continuous at $x$, for $\delta_f$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|g(z) - g(x)| < \delta_f$. But then because $|g(z) - g(x)| < \delta_f$ we infer $|f(g(z)) - f(g(x))| < \varepsilon$. Therefore we have proved that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(g(z)) - f(g(x))| < \varepsilon$, which means that $f \circ g$ is continuous at $x$. ∎

We conclude this section by characterizing discontinuities.

Definition 3.31 Let $D$ be an interval of nonzero length from which at most finitely many points $x_1, \ldots, x_n$ have been removed. If the function $f : D \to \mathbb{R}$ is not continuous at $x \in D \cup \{x_1, \ldots, x_n\}$, we speak of a discontinuity at $x$. There are several types of discontinuities. (For visualizations, see Figure 7; for examples, see Exercise 3-31.)
1. If $\lim_{z \to x} f(z)$ exists, or if $x$ is an endpoint of $D \cup \{x_1, \ldots, x_n\}$ and the appropriate one-sided limit exists, but the limit is not equal to $f(x)$; or if $x$ is not an endpoint of $D \cup \{x_1, \ldots, x_n\}$ and $f$ is not defined at $x$, we speak of a removable discontinuity.
2. If $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ exist, but they are not equal, we speak of a jump discontinuity.
3. If (at least) one of $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ does not exist and if there is a sequence $\{z_n\}_{n=1}^{\infty}$ in $D$ that converges to $x$ and $\lim_{n \to \infty} |f(z_n)| = \infty$, we speak of an infinite discontinuity.
4.
If (at least) one of $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ does not exist, if there is a $\delta > 0$ so that $f$ is bounded in $\{z \in D : |z - x| < \delta\}$ and if there are two sequences $\{z_n\}_{n=1}^{\infty}$ and $\{w_n\}_{n=1}^{\infty}$ that converge to $x$ such that for all $n \in \mathbb{N}$ we have $z_n, w_n < x$ (or $z_n, w_n > x$) and such that both $\lim_{n \to \infty} f(z_n)$ and $\lim_{n \to \infty} f(w_n)$ exist, but they are not equal, we speak of a discontinuity by oscillation.

Figure 7: Visualization of the possible discontinuities of a function.

Theorem 3.32 Let $D \subseteq \mathbb{R}$ be an interval of nonzero length from which at most finitely many points $x_1, \ldots, x_n$ have been removed and let $f : D \to \mathbb{R}$ be a function. Then every discontinuity $x \in D \cup \{x_1, \ldots, x_n\}$ of $f$ is of one of the four types listed in Definition 3.31.

Proof. Let $x \in D$ or let $x$ be one of the finitely many elements that were removed and assume that $f$ is not continuous at $x$. Let the discontinuity at $x$ be neither a removable discontinuity, nor a jump discontinuity, nor an infinite discontinuity. We will prove that it must be a discontinuity by oscillation. If $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ both existed, then they would either be equal and the discontinuity would be removable, or not, in which case the discontinuity would be a jump discontinuity. Hence, one of the one-sided limits does not exist at $x$. If $x$ is the supremum or the infimum of $D$, then $f$ is defined at $x$ and one of the two one-sided limits does not exist by default. In this case, the respective other one-sided limit also must not exist, because otherwise there would be a removable discontinuity at $x$. By symmetry, we can assume without loss of generality that $\lim_{z \to x^-} f(z)$ does not exist and $f$ is defined on some interval $[x - \delta, x)$.

Because the discontinuity at $x$ is not an infinite discontinuity, there is a $\nu > 0$ so that $f$ is bounded on $[x - \nu, x)$. Let $z_n := x - \frac{\nu}{n}$. Then $\lim_{n \to \infty} z_n = x$. First consider the case that $\lim_{n \to \infty} f(z_n) =: L$ exists.
Because $\lim_{z \to x^-} f(z)$ does not exist, there must be an $\varepsilon > 0$ and a sequence $\{v_k\}_{k=1}^{\infty}$ so that $\lim_{k \to \infty} v_k = x$, and for all $k \in \mathbb{N}$ we have $x - \nu \le v_k < x$ and $|f(v_k) - L| > \varepsilon$. Because $f$ is bounded on $[x - \nu, x)$, the sequence $\{f(v_k)\}_{k=1}^{\infty}$ is bounded. By the Bolzano-Weierstrass Theorem, it has a convergent subsequence $\{f(v_{k_n})\}_{n=1}^{\infty}$. Then $\lim_{n \to \infty} f(v_{k_n}) \neq L$, and hence $\{w_n\}_{n=1}^{\infty} := \{v_{k_n}\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ are sequences as required in the definition of a discontinuity by oscillation.

If $\{f(z_n)\}_{n=1}^{\infty}$ does not converge, note that it is bounded, and hence it has a convergent subsequence $\{f(z_{n_k})\}_{k=1}^{\infty}$. Now $f$ has a discontinuity by oscillation at $x$, because we can replace $\{z_n\}_{n=1}^{\infty}$ with $\{z_{n_k}\}_{k=1}^{\infty}$ in the above argument and then repeat it. ∎

Standard Proof Technique 3.33 When an argument requires a convergent sequence and all that is guaranteed for a given sequence $\{z_n\}_{n=1}^{\infty}$ is a convergent subsequence, then one often assumes without loss of generality that $\{z_n\}_{n=1}^{\infty}$ converges. This is because, just like at the end of the proof of Theorem 3.32, the given sequence can be replaced with a convergent subsequence that (usually) has all the properties of $\{z_n\}_{n=1}^{\infty}$, plus it converges. □

Although Theorem 3.32 characterizes all possible discontinuities for functions from $\mathbb{R}$ to $\mathbb{R}$, other types of discontinuities exist for functions with infinite dimensional range (see Exercise 16-36).

Exercises

3-24. Prove part “2⇒3” of Theorem 3.25.

3-25. Completing the proof of Theorem 3.27.
(a) Give all details of the proofs of parts 1-4.
(b) Prove part 6.

3-26. Alternative proofs of parts 5 and 6 of Theorem 3.27. Let $I$ be an interval and let $f, g : I \to \mathbb{R}$ be functions.
(a) Use parts 1 and 2 and Theorem 3.30 to prove that if $f$ and $g$ are continuous at $x \in I$, then $\max\{f, g\}$ is continuous at $x$. Hint. First prove that for all $a, b \in \mathbb{R}$ the equality $\max\{a, b\} = \frac{1}{2}\left(a + b + |a - b|\right)$ holds.
(b) Give a similar proof of part 6 of Theorem 3.27.

3-27.
Prove that for all $n \in \mathbb{N}$ the function $f(x) = x^n$ is continuous on $\mathbb{R}$. Then prove that for all $m \in \mathbb{N}$ the function $f(x) = x^{-m}$ is continuous on $\mathbb{R} \setminus \{0\}$.

3-28. Prove that all polynomials are continuous on $\mathbb{R}$. Hint. Induction on the degree.

3-29. Alternative proofs of Theorem 3.30.
(a) Prove Theorem 3.30 using Definition 3.23 and Theorem 3.14.
(b) Prove Theorem 3.30 using part 2 of Theorem 3.25.

3-30. Explain why we demand that the interval in Definition 3.23 must have nonzero length.

3-31. Examples of discontinuities.
(a) Prove that $f(x) = \frac{x^2}{x}$ has a removable discontinuity at 0.
(b) Prove that $f(x) = \lceil x \rceil$ has a jump discontinuity at 0.
(c) Prove that $f(x) = \frac{1}{x}$ has an infinite discontinuity at 0.
(d) Let $g(x) :=$ […] for $0 \le x \le \frac{1}{2}$ and […] for $\frac{1}{2} < x \le 1$. […] has a discontinuity by oscillation at 0.

3-32. Let $U := \bigcup_{n=1}^{\infty} \left[\frac{1}{n} - \frac{1}{10^n}, \frac{1}{n}\right] := \left\{x \in \mathbb{R} : \left(\exists n \in \mathbb{N} : x \in \left[\frac{1}{n} - \frac{1}{10^n}, \frac{1}{n}\right]\right)\right\}$ and let the function $f : [-1, 0] \cup U \to \mathbb{R}$ be defined by $f(x) := 0$ for $x \in [-1, 0]$ and $f(x) := 1$ for $x \in U$.
(a) Prove that $f$ is continuous on every interval $I$ that is contained in its domain $D := [-1, 0] \cup U$.
(b) Prove that every point in $D$ is contained in an interval of nonzero length.
(c) Explain why the function still should not be considered to be continuous on $D$.
(d) Suggest a generalization of the definition of continuity that would allow domains such as $D$ and that would make $f$ discontinuous at 0.
This function will be revisited in Exercise 16-29.

Figure 8: Visualizing the Intermediate Value Theorem. Intuitively (a) it is clear that an unbroken graph that goes from a point below the $x$-axis to a point above the $x$-axis must cross the $x$-axis at least once (solid graph) and that it could even cross the $x$-axis multiple times (dotted continuation past the first intercept). Part (b) gives a visualization of the proof.
Because the sequences $\{x_n\}_{n=1}^{\infty}$ and $\{x_n'\}_{n=1}^{\infty}$ meet in the middle at $(c, f(c))$, we conclude $f(c) = 0$.

3.5 Properties of Continuous Functions

Aside from their obvious connection to limits, continuous functions are interesting because they have several useful properties.

Theorem 3.34 Intermediate Value Theorem. Let $a < b$ and let $f : [a,b] \to \mathbb{R}$ be a continuous function. If $f(a) < 0$ and $f(b) > 0$ (or vice versa) then there is a $c \in (a,b)$ such that $f(c) = 0$. (Also see Figure 8(a).)

Proof. Assume without loss of generality that $f(a) < 0$ and $f(b) > 0$. The proof for the other case is similar. The set $G := \{x \in [a,b] : f(x) > 0\}$ contains $b$ and it is bounded below by $a$. Let $c := \inf(G)$. We will show that $c$ is as claimed in the theorem.

First, we show that $c \notin \{a, b\}$. For a contradiction suppose that $c = a$. Then by the version of Proposition 1.21 for infima, for each $n \in \mathbb{N}$ there would be an $x_n \in \left(a, a + \frac{1}{n}\right)$ with $f(x_n) > 0$. (Note that we are using Standard Proof Technique 3.8 here.) But then $\lim_{n \to \infty} x_n = a$, and by continuity of $f$ we could infer that $0 > f(a) = \lim_{n \to \infty} f(x_n) \ge 0$, a contradiction. The inequality $c \neq b$ is proved similarly (see Exercise 3-33).

Because $c \neq b$, again by the version of Proposition 1.21 for infima, for each $n \in \mathbb{N}$ there is an $x_n \in \left[c, c + \frac{1}{n}\right)$ with $f(x_n) > 0$. In particular, $\lim_{n \to \infty} x_n = c$. Because $c \neq a$, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the number $x_n' := c - \frac{1}{n}$ is greater than or equal to $a$. Hence, the sequence $\{x_n'\}_{n=N}^{\infty}$ satisfies $\lim_{n \to \infty} x_n' = c$ and $f(x_n') \le 0$ for all $n \ge N$. For a visualization of the sequences, consider Figure 8.

Because $f$ is continuous, we infer $\lim_{n \to \infty} f(x_n) = \lim_{n \to \infty} f(x_n') = f(c)$. But the inequalities for $f(x_n)$ and $f(x_n')$ show that $\lim_{n \to \infty} f(x_n) \ge 0$ and $\lim_{n \to \infty} f(x_n') \le 0$. Thus $f(c)$ must be greater than or equal to zero and less than or equal to zero, which implies $f(c) = 0$. ∎
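The proof above locates the zero as an infimum, which does not directly tell us how to compute it. A different, well-known technique, bisection, approximates such a zero numerically by repeatedly halving an interval on which the sign change persists. The Python sketch below shows that swapped-in technique, not the argument of the proof; the names `bisect_zero`, `tol` and `shifted_square` are hypothetical choices for this illustration, and the code assumes that the supplied function really is continuous.

```python
def bisect_zero(f, a, b, tol=1e-12):
    """Approximate a zero of f in (a, b) by bisection.

    Assumes (this is not checked) that f is continuous on [a, b].
    Requires a < b and f(a) < 0 < f(b), as in the Intermediate Value Theorem.
    """
    if not (a < b and f(a) < 0 < f(b)):
        raise ValueError("need a < b and f(a) < 0 < f(b)")
    # Invariant: f(a) <= 0 < f(b), so for continuous f there is always
    # a zero of f in the current interval [a, b].
    while b - a > tol:
        m = (a + b) / 2
        if f(m) > 0:
            b = m  # keep the sign change in [a, m]
        else:
            a = m  # keep the sign change in [m, b]
    return (a + b) / 2


def shifted_square(x):
    """Illustrative test function with f(1) < 0 < f(2) and zero at sqrt(2)."""
    return x * x - 2.0


root = bisect_zero(shifted_square, 1.0, 2.0)
print(root)
```

The default tolerance is chosen well above floating-point resolution for inputs of moderate size, so the loop terminates; for much larger $|a|$ or $|b|$ a relative tolerance would be the safer choice.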
The Intermediate Value Theorem immediately implies that continuous images of intervals are intervals, too.

Definition 3.35 Let $A, B$ be sets, let $f : A \to B$ be a function and let $C \subseteq A$. Then the image of $C$ under $f$ is defined to be $f[C] := \{f(c) : c \in C\}$.

Theorem 3.36 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be continuous. Then $f[I]$ is an interval.

Proof. Let $l, u \in f[I]$ with $l < u$ and let $m \in (l, u)$. Then there are $a, b \in I$ with $f(a) = l$ and $f(b) = u$. Without loss of generality assume $a < b$. The function $g(x) := f(x) - m$ is continuous on $[a,b]$. By the Intermediate Value Theorem there is a $c \in (a,b)$ so that $g(c) = 0$. But then $f(c) = g(c) + m = m$. Because $m, l, u$ were arbitrary we have proved that for any two elements $l < u$ of $f[I]$ and any element $m$ between them we have that $m \in f[I]$. Therefore $f[I]$ is an interval. ∎

We now turn to inverse functions.

Definition 3.37 Let $A, B$ be sets and let $f : A \to B$ be a bijective function. Then the inverse function of $f$ is the unique (see Exercise 3-34) function $f^{-1} : B \to A$ that maps each $b \in B$ to the unique $a \in A$ so that $f(a) = b$.

Theorem 3.38 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a continuous injective function. Then the inverse function $f^{-1} : f[I] \to I$ is also continuous.

Proof. Clearly, $f$ maps $I$ bijectively onto $f[I]$, and hence $f$ has an inverse function $f^{-1}$ that is defined on $f[I]$. By Theorem 3.36, $f[I]$ is an interval. All that is left is to show that if a sequence $\{y_n\}_{n=1}^{\infty}$ in $f[I]$ converges to $y \in f[I]$, then $\{f^{-1}(y_n)\}_{n=1}^{\infty}$ converges to $f^{-1}(y)$.

Let $\{y_n\}_{n=1}^{\infty}$ be a sequence in $f[I]$ with $\lim_{n \to \infty} y_n = y$. For $n \in \mathbb{N}$, let $x_n := f^{-1}(y_n)$ and let $x := f^{-1}(y)$. For $\varepsilon > 0$, let $J := \{z \in I : |z - x| < \varepsilon\}$, $J_l := \{z \in J : z \le x\}$ and $J_u := \{z \in J : z \ge x\}$. Then $f[J]$, $f[J_l]$ and $f[J_u]$ are all intervals that contain $y = f(x)$.
Because $f$ is injective, the only point $f[J_l]$ and $f[J_u]$ have in common is $y$. Therefore $y$ is the maximum of one of the intervals $f[J_l]$ and $f[J_u]$ and it is the minimum of the other.

If both $f[J_l]$ and $f[J_u]$ have more than one element, then there is a $\delta > 0$ so that $(y - \delta, y + \delta) \subseteq f[J]$. In this case, because $\{y_n\}_{n=1}^{\infty}$ converges to $y$, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $y_n \in (y - \delta, y + \delta) \subseteq f[J]$. Consequently, for all $n \ge N$ the point $x_n$ is in $J$, which means $|x_n - x| < \varepsilon$.

If $f[J_u]$ has exactly one element, then $J_u$ has exactly one element and $x$ is the largest point of $I$. Therefore $f[J_l]$ has more than one element and there is a $\delta > 0$ so that $(y - \delta, y] \subseteq f[J_l]$ or $[y, y + \delta) \subseteq f[J_l]$. Without loss of generality assume $(y - \delta, y] \subseteq f[J_l]$. We claim that then $y$ is the largest point of $f[I]$. Suppose for a contradiction that there was a $y' > y$ in $f[I]$. Then $a := f^{-1}(y') < x$ and there would also be a $b < x$ with $f(b) < y$. By Theorem 3.36 there would be a $c \neq x$ between $a$ and $b$ so that $f(c) = y = f(x)$, contradicting the injectivity of $f$. Thus $y = \sup f[I]$. Because $\{y_n\}_{n=1}^{\infty}$ converges to $y$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $y_n \in (y - \delta, y] \subseteq f[J_l]$. Thus for all $n \ge N$ we infer $x_n \in J_l$, which means $|x_n - x| < \varepsilon$.

The case in which $f[J_l]$ has exactly one element is handled similarly. We have shown that in each of the above cases $f^{-1}\left(\lim_{n \to \infty} y_n\right) = \lim_{n \to \infty} f^{-1}(y_n)$, which implies that $f^{-1}$ is continuous. ∎

Now that we know that inverses of continuous functions are continuous, we can establish limit laws for powers with rational exponents.

Definition 3.39 A number $n \in \mathbb{Z}$ is called even iff there is a $k \in \mathbb{Z}$ so that $n = 2k$ and it is called odd iff there is a $k \in \mathbb{Z}$ so that $n = 2k + 1$.

Corollary 3.40 Let $d \in \mathbb{N}$. Then $f(x) = x^{1/d}$ is continuous on $[0, \infty)$ if $d$ is even and it is continuous on $\mathbb{R}$ if $d$ is odd.

Proof.
Use Theorem 3.38 (Exercise 3-35). ∎

Corollary 3.41 Let $r \in \mathbb{Q}$ be positive. Then $f(x) = x^r$ is continuous on $[0, \infty)$, and if $r$ can be represented as a fraction with odd denominator, $f$ is continuous on $\mathbb{R}$. For negative $r \in \mathbb{Q}$, $f(x) = x^r$ is continuous on $(0, \infty)$, and if $r$ can be represented as a fraction with odd denominator, $f$ is continuous on $\mathbb{R} \setminus \{0\}$.

Proof. Use Exercise 3-27 and Corollary 3.40 (Exercise 3-36). ∎

Theorem 3.42 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of nonnegative numbers. Then for all positive $r \in \mathbb{Q}$ we have $\lim_{n \to \infty} a_n^r = \left(\lim_{n \to \infty} a_n\right)^r$.

Proof. Exercise 3-37. ∎

We conclude this section by showing that on closed and bounded intervals continuous functions assume an absolute minimum and an absolute maximum.

Definition 3.43 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a real valued function. The number $y_m$ is called the absolute minimum value of $f$ (in $I$) if and only if there is an $x_m \in I$ with $f(x_m) = y_m$ and for all $x \in I$ we have $f(x) \ge f(x_m)$. The number $y_M$ is called the absolute maximum value of $f$ (in $I$) if and only if there is an $x_M \in I$ with $f(x_M) = y_M$ and for all $x \in I$ we have $f(x) \le f(x_M)$. A value that is the absolute maximum or the absolute minimum is also called an absolute extremum.

Theorem 3.44 Let $f : [a,b] \to \mathbb{R}$ be continuous. Then there is an $x \in [a,b]$ such that for all $z \in [a,b]$ we have $f(x) \ge f(z)$.

Proof. First we show that $f$ is bounded above on $[a,b]$. For a contradiction, suppose that for every $n \in \mathbb{N}$ there is an $x_n \in [a,b]$ such that $f(x_n) \ge n$. Then by the Bolzano-Weierstrass Theorem there is a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ with limit $x \in [a,b]$. But then for all $k$ we have $f(x_{n_k}) \ge n_k$ while at the same time $\lim_{k \to \infty} f(x_{n_k}) = f(x) < \infty$, which is not possible.

Thus $f$ is bounded above. Let $M := \sup\{f(z) : z \in [a,b]\}$. For each $n \in \mathbb{N}$ find $x_n \in [a,b]$ with $f(x_n) \ge M - \frac{1}{n}$. Again by the Bolzano-Weierstrass Theorem there is a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ with limit $x \in [a,b]$. For all $k$ we have $M - \frac{1}{n_k} \le f(x_{n_k}) \le M$, which means by the Squeeze Theorem that $f(x) = \lim_{k \to \infty} f(x_{n_k}) = M = \sup\{f(z) : z \in [a,b]\}$. ∎

Exercises

3-33. Prove that $c \neq b$ in the proof of the Intermediate Value Theorem.

3-34. Prove that if $f : A \to B$ is bijective and $g, h : B \to A$ are inverses of $f$, then $g(b) = h(b)$ for all $b \in B$.

3-35. Prove Corollary 3.40.

3-36. Prove Corollary 3.41.

3-37. Prove Theorem 3.42.

3-38. Let $f : [a,b] \to \mathbb{R}$ be continuous. Prove that there is an $x \in [a,b]$ such that for all $z \in [a,b]$ the inequality $f(x) \le f(z)$ holds.

3-39. Let $I \subseteq \mathbb{R}$ be an interval. Give an example that shows that even if $f : I \to \mathbb{R}$ is continuous, $f[I]$ need not be bounded. Then explain why this example does not contradict Theorem 3.44.

3-40. Let $I$ be an interval and let $f : I \to \mathbb{R}$ be continuous and injective. Prove that $f$ is either increasing (that is, for all $x_1 < x_2$ we have $f(x_1) < f(x_2)$) or decreasing (that is, for all $x_1 < x_2$ we have $f(x_1) > f(x_2)$).

3-41. Although the contrapositive is often used to clarify a mathematical statement, some contrapositives can be quite confusing. Determine if the statement “If $f(c) \neq 0$ for all $c \in [a,b]$, then $f(a) > 0$ or $f(b) \le 0$ or $f$ is not continuous on $[a,b]$.” is true or false by analyzing its (much simpler) contrapositive.

3.6 Limits at Infinity

For functions defined on an interval $(t, \infty)$ it is sensible to investigate the behavior as the argument $x$ gets large. The resulting notion of a limit at infinity is set up in exactly the same way as the other limits discussed in this chapter. Thus it is not surprising that similar laws hold.

Definition 3.45 Let $L$ be a real number and let $f : (t, \infty) \to \mathbb{R}$ be a function. We say $f$ converges to the limit $L$ at $\infty$ and write $\lim_{z \to \infty} f(z) = L$ iff for every sequence $\{z_n\}_{n=1}^{\infty}$ in $(t, \infty)$ such that $\lim_{n \to \infty} z_n = \infty$ we have $\lim_{n \to \infty} f(z_n) = L$. Limits at $-\infty$ are defined similarly.
For functions with domains that contain intervals $(t, \infty)$ or $(-\infty, t)$ the limits at $\pm\infty$ are defined as the limits of the appropriate restriction.

Theorem 3.46 Limit laws for limits of functions at $\infty$. Let $f, g : (t, \infty) \to \mathbb{R}$ be functions so that $\lim_{z \to \infty} f(z)$ and $\lim_{z \to \infty} g(z)$ exist. Then the following hold.
1. $\lim_{z \to \infty} (f + g)(z) = \lim_{z \to \infty} f(z) + \lim_{z \to \infty} g(z)$.
2. $\lim_{z \to \infty} (f - g)(z) = \lim_{z \to \infty} f(z) - \lim_{z \to \infty} g(z)$.
3. $\lim_{z \to \infty} (f \cdot g)(z) = \lim_{z \to \infty} f(z) \cdot \lim_{z \to \infty} g(z)$.
4. If $\lim_{z \to \infty} g(z) \neq 0$, then $\lim_{z \to \infty} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z \to \infty} f(z)}{\lim_{z \to \infty} g(z)}$.
Each equation implicitly asserts that if the right side exists, so does the left side and in this case the equality holds.

Proof. Exercise 3-42. ∎

Theorem 3.47 $\varepsilon$-$M$ formulation for the limit of a function at infinity. The function $f : (t, \infty) \to \mathbb{R}$ converges to the limit $L$ at infinity if and only if for every $\varepsilon > 0$ there is an $M$ such that for all $z \ge M$ we have $|f(z) - L| < \varepsilon$.

Proof. Exercise 3-43. ∎

Similar results hold for limits at $-\infty$. Moreover, infinite limits at $\pm\infty$ can be defined similar to infinite limits at a finite number and similar limit laws hold. By now, the reader is sufficiently familiar with the underlying ideas to formulate these definitions independently (see Exercise 3-44).

Exercises

3-42. Prove Theorem 3.46.
(a) Prove part 1 of Theorem 3.46.
(b) Prove part 2 of Theorem 3.46.
(c) Prove part 3 of Theorem 3.46.
(d) Prove part 4 of Theorem 3.46.

3-43. Prove Theorem 3.47.

3-44. Infinite limits at infinity.
(a) State the definition of $\lim_{z \to \infty} f(z) = \infty$.
(b) State limit laws for infinite limits at $\infty$.
(c) State and prove a result similar to Theorem 3.47 for infinite limits at infinity.

3-45. Explain why we do not define left and right limits at infinity.

Chapter 4 Differentiable Functions

Geometrically speaking, differentiable functions have unbroken graphs without corners. This “smoothness” of differentiable functions is useful in applications.
In the present chapter differentiability is introduced in Section 4.1, the relation between differentiability and the common operations on functions is considered in Section 4.2, and some geometric consequences of differentiability are provided in Section 4.3.

4.1 Differentiability

The derivative provides the slope of the tangent line, if the function has a tangent line at the point. Definition 4.1 encodes this idea by demanding that as we fix one point and move the other one closer to this “base point,” the slopes of the secant lines through these points approach a limit. The left part of Figure 9 shows that this convergence of the slopes means that the secant lines tilt towards a “limiting line,” the tangent line. It should be noted that differentiability typically is considered on open intervals so that secant lines “in both directions” can be used to obtain the limit.

Definition 4.1 A function $f : (a,b) \to \mathbb{R}$ is differentiable at $x \in (a,b)$ iff the limit $\lim_{z \to x} \frac{f(z) - f(x)}{z - x}$ exists. In this case we set $f'(x) := \lim_{z \to x} \frac{f(z) - f(x)}{z - x}$ and call it the derivative of $f$ at $x$. Other notations for the derivative at $x$ are $\frac{df}{dx}(x)$ and $Df(x)$. Similar to limits of functions, if $D \subseteq \mathbb{R}$, a function $f : D \to \mathbb{R}$ is called differentiable at $x \in D$ iff there is an open interval $(a,b) \subseteq D$ so that $x \in (a,b)$ and so that $f|_{(a,b)}$ is differentiable at $x$.

Moreover, similar to what we did so far, we will mostly work with functions defined on open intervals, trusting that the reader can make the jump to larger domains.

Example 4.2 The following are verified with routine computations (Exercise 4-1).
1. At every $x \in \mathbb{R}$ the function $f(x) = x$ is differentiable with $f'(x) = 1$.
2. The function $f(x) = |x|$ is not differentiable at $x = 0$. □

As with continuity, the local property of differentiability can be demanded at every point of the domain to obtain a global definition.
Definition 4.3 Let $D \subseteq \mathbb{R}$. A function $f : D \to \mathbb{R}$ is differentiable, or differentiable on $D$, iff it is differentiable at every $x \in D$.

Continuity at $x$ means that $\lim_{z\to x} f(z) - f(x) = 0$. Theorem 4.4 affirms that convergence of the quotient $\frac{f(z) - f(x)}{z - x}$ to a number is a stronger condition.

Theorem 4.4 Let $f : (a, b) \to \mathbb{R}$ be differentiable at $x$. Then $f$ is continuous at $x$. Hence, every differentiable function is continuous. However, not every continuous function is differentiable.

Proof. Let $f$ be differentiable at $x$. Then $\lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ exists. This implies
\[ 0 = \lim_{z\to x}(z - x) \cdot \lim_{z\to x} \frac{f(z) - f(x)}{z - x} = \lim_{z\to x} (z - x)\frac{f(z) - f(x)}{z - x} = \lim_{z\to x} f(z) - f(x), \]
that is, $\lim_{z\to x} f(z) = \lim_{z\to x} \big(f(z) - f(x)\big) + \lim_{z\to x} f(x) = 0 + f(x) = f(x)$, and $f$ is continuous at $x$.

To see that not every continuous function is differentiable, consider $f(x) = |x|$, which is continuous, but it is not differentiable at $x = 0$. ∎

Ultimately, we will generalize differentiability to higher dimensions. In higher dimensions, division is not possible, but it is possible to define entities that are similar to tangent lines. Theorem 4.5 shows the difference between differentiability at $x$ and continuity at $x$ without using division. A differentiable function $f$ can be approximated by a straight line $g(z) = f(x) + f'(x)(z - x)$ through $(x, f(x))$ in such a way that the difference between $f$ and $g$ goes to zero faster than $|z - x|$. Geometrically, this means (see Exercise 4-2a and Figure 9(b)) that, no matter how small the width, near $x$ the differentiable function $f$ will enter all “cones” which are centered at $(x, f(x))$ and symmetric about the line $g$. This idea ultimately leads to the definition of differentiability in higher dimensional spaces (see Definition 17.24). For a continuous function $f$, the difference between $f$ and any straight line through $(x, f(x))$ goes to zero (see Exercise 4-2c), but the function need not enter arbitrarily narrow cones about the line.
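The contrast between entering narrow cones and merely approaching a line can be checked numerically. The following is a minimal sketch, not part of the text's development: the helper names `cone_ratio` and `worst_ratio`, the sample functions, the trial slope 0.5 and the step sizes are all our own assumptions, and floating point values only suggest the behavior of the limit.

```python
def cone_ratio(f, x, slope, h):
    """Return |f(x+h) - f(x) - slope*h| / |h| for h != 0.

    For a function that enters every cone about the line with the given
    slope, this ratio can be made smaller than any epsilon > 0 by taking
    |h| small enough.
    """
    return abs(f(x + h) - f(x) - slope * h) / abs(h)


def worst_ratio(f, x, slope, h):
    """Largest ratio over the two points x + h and x - h."""
    return max(cone_ratio(f, x, slope, h), cone_ratio(f, x, slope, -h))


def square(t):
    return t * t


# Hypothetical sample step sizes h = 0.1, 0.01, ..., 0.00001.
steps = [10.0 ** (-k) for k in range(1, 6)]

# f(z) = z^2 at x = 1 with slope 2: the ratio equals |h| up to rounding,
# so it shrinks together with h.
square_ratios = [worst_ratio(square, 1.0, 2.0, h) for h in steps]

# f(z) = |z| at x = 0 with the trial slope 0.5: the ratio stays near 1.5,
# so the graph never enters a cone of width less than that about this line.
abs_ratios = [worst_ratio(abs, 0.0, 0.5, h) for h in steps]

for h, rs, ra in zip(steps, square_ratios, abs_ratios):
    print(f"h={h:.0e}  square: {rs:.2e}  abs: {ra:.2e}")
```

For $|z|$ at $0$ and any trial slope $m$ the larger of the two one-sided ratios is $\max\{|1-m|, |1+m|\} \ge 1$, so no choice of line does better than the one tried here.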
Figure 9: Two ways to view differentiability. In (a), as in Definition 4.1, the slopes of the secant lines through $(x, f(x))$ and $(z, f(z))$ approach a number, which is the slope of the tangent line. In (b), as in Theorem 4.5 and Exercise 4-2a, the function and a certain straight line, the tangent line, are such that for any width $\varepsilon > 0$, near $x$ the function will ultimately enter the “cone of width $\varepsilon$” about the tangent line.

Theorem 4.5 Let $f : (a, b) \to \mathbb{R}$ be a function and let $x \in (a, b)$. Then $f$ is differentiable at $x$ iff there is an $L \in \mathbb{R}$ so that for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \neq x$ with $|z - x| < \delta$ we have $|f(z) - f(x) - L(z - x)| < \varepsilon |z - x|$. Moreover, in this case $f'(x) = L$.

Proof. For “$\Rightarrow$,” let $f$ be differentiable at $x$, let $L := f'(x)$ and let $\varepsilon > 0$. Then there is a $\delta > 0$ so that for all $z \neq x$ with $|z - x| < \delta$ we have $\left| \frac{f(z) - f(x)}{z - x} - L \right| < \varepsilon$. Multiplying both sides by $|z - x|$ turns this inequality into $|f(z) - f(x) - L(z - x)| < \varepsilon |z - x|$.

For “$\Leftarrow$,” dividing the inequality $|f(z) - f(x) - L(z - x)| < \varepsilon |z - x|$ by $|z - x| > 0$ shows that for every $\varepsilon > 0$ there is a $\delta > 0$ so that $\left| \frac{f(z) - f(x)}{z - x} - L \right| < \varepsilon$ for all $z \neq x$ with $|z - x| < \delta$. Hence $L = \lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ and $f$ is differentiable at $x$ with $f'(x) = L$. ∎

Exercises

4-1. (a) Prove that the function $f(x) = x$ is differentiable at every $x \in \mathbb{R}$ and that $f'(x) = 1$.
(b) Prove that the function $f(x) = |x|$ is not differentiable at $x = 0$.

4-2. Differentiability versus continuity.
(a) Let $f : (a, b) \to \mathbb{R}$. Prove that $f$ is differentiable at $x$ iff there is an $L \in \mathbb{R}$ so that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in \mathbb{R}$ with $|z - x| < \delta$ we have
\[ \big(f(x) + L(z - x)\big) - \varepsilon |z - x| \le f(z) \le \big(f(x) + L(z - x)\big) + \varepsilon |z - x|. \]
Also prove that in this case $f'(x) = L$. Use this result to explain the right part of Figure 9.
(b) Prove that for any $m \in \mathbb{R}$ we have $\lim_{z\to 0} |z| - mz = 0$, even though $f(x) = |x|$ is not differentiable at $x = 0$.
(c) Let $f : (a, b) \to \mathbb{R}$ be a function that is continuous at $x \in (a, b)$ and let $m \in \mathbb{R}$. Prove that $\lim_{z\to x} \big| f(z) - [f(x) + m(z - x)] \big| = 0$.

4-3.
For $f : (a, b) \to \mathbb{R}$ and $x \in (a, b)$ we define the left-sided derivative of $f$ at $x$ via the left limit $D^{l} f(x) := \lim_{z\to x^-} \frac{f(z) - f(x)}{z - x}$ if the limit exists and we define the right-sided derivative of $f$ at $x$ via the right limit $D^{r} f(x) := \lim_{z\to x^+} \frac{f(z) - f(x)}{z - x}$ if the limit exists.
(a) Prove that $f$ is differentiable at $x$ iff the left-sided and right-sided derivatives exist at $x$ and $D^{l} f(x) = D^{r} f(x)$. (Compare with Standard Proof Technique 3.20.)
(b) Prove that for $f(x) = |x|$ we have that $D^{r} f(0) = 1$ and $D^{l} f(0) = -1$.

4.2 Differentiation Rules

The results in this section show how differentiability relates to the algebraic operations for functions as well as to composition. So far, whenever limits were concerned, we worked with estimates that ultimately made certain differences smaller than $\varepsilon$. However, we can also use the available limit theorems and we will do so in this section. The following computations reemphasize the fact that facility in symbolic computation remains important in abstract mathematics.

Theorem 4.6 Let the functions $f : (a, b) \to \mathbb{R}$ and $g : (a, b) \to \mathbb{R}$ both be differentiable at $x \in (a, b)$ and let $c$ be a real number. Then the functions $f + g$, $f - g$ and $cf$ are all differentiable at $x$ and
\[ (f + g)'(x) = f'(x) + g'(x), \qquad (f - g)'(x) = f'(x) - g'(x), \qquad (cf)'(x) = c f'(x). \]

Proof. We use the definition of the derivative with the goal of extracting the difference quotients for $f$ and $g$, respectively.
\[ (f + g)'(x) = \lim_{z\to x} \frac{(f + g)(z) - (f + g)(x)}{z - x} = \lim_{z\to x} \frac{f(z) + g(z) - f(x) - g(x)}{z - x} = \lim_{z\to x} \frac{f(z) - f(x)}{z - x} + \lim_{z\to x} \frac{g(z) - g(x)}{z - x} = f'(x) + g'(x). \]
The remainder of the proof is left to the reader as Exercises 4-4 and 4-5. ∎

Derivatives are also well-behaved when products, quotients, powers and compositions are involved. We first prove the quotient rule, which is a bit harder than the product rule and we leave the proof of the product rule as an exercise.

Theorem 4.7 Quotient Rule.
Let the functions $f, g : (a, b) \to \mathbb{R}$ be differentiable at $x \in (a, b)$ and let $g(x) \neq 0$. Then the quotient $\frac{f}{g}$ is differentiable at $x$ with
\[ \left(\frac{f}{g}\right)'(x) = \frac{f'(x) g(x) - g'(x) f(x)}{\big(g(x)\big)^2}. \]

Proof. Similar to the proof of Theorem 4.6 we compute the limit of the difference quotients. The computations are a bit more involved than before. Because $g$ is continuous at $x$ (Theorem 4.4) and $g(x) \neq 0$, we have $g(z) \neq 0$ for all $z$ near $x$ and $\lim_{z\to x} g(z) = g(x)$. Hence
\[ \lim_{z\to x} \frac{\frac{f}{g}(z) - \frac{f}{g}(x)}{z - x} = \lim_{z\to x} \frac{f(z) g(x) - f(x) g(z)}{g(z) g(x) (z - x)} = \lim_{z\to x} \frac{1}{g(z) g(x)} \left[ \frac{f(z) - f(x)}{z - x}\, g(x) - f(x)\, \frac{g(z) - g(x)}{z - x} \right] = \frac{f'(x) g(x) - g'(x) f(x)}{\big(g(x)\big)^2}. \] ∎

Figure 10: Visualization of the Product Rule, $\Delta A = \Delta f \cdot g + \Delta g \cdot f + \Delta f \Delta g$. The growth rate of a rectangle with side lengths $f$ and $g$ can be obtained from the picture above by dividing the formula for $\Delta A$ by $\Delta t$ and letting $\Delta t$ go to zero. It is $A' = f'g + g'f$. The term $\frac{\Delta f \Delta g}{\Delta t}$ does not contribute to the rate because in its numerator two quantities are going to zero. The proof of the product rule (Exercise 4-6) makes this idea more precise.

Theorem 4.8 Product Rule. Let $f, g : (a, b) \to \mathbb{R}$ be differentiable at $x \in (a, b)$. Then $fg$ is differentiable at $x$ with $(fg)'(x) = f'(x) g(x) + g'(x) f(x)$. (For a visualization, consider Figure 10.)

Proof. Exercise 4-6. ∎

Now that products and quotients are taken care of, we can consider powers. The Power Rule could have been proved earlier in a more direct fashion, but the present proof is faster.

Theorem 4.9 Power Rule. For every integer $n \neq 0$, the function $f(x) = x^n$ is differentiable with $\frac{d}{dx} x^n = n x^{n-1}$ at every $x \in \mathbb{R}$ for which the right side is defined.

Proof. For $n > 0$, we use induction on the exponent $n$. For the base step with $n = 1$, note that $\frac{d}{dx} x^1 = \lim_{z\to x} \frac{z - x}{z - x} = 1$ for all $x \in \mathbb{R}$.

For the induction step $n \to (n + 1)$, we need to prove $\frac{d}{dx} x^{n+1} = (n + 1) x^n$ for all $x \in \mathbb{R}$ and we can use the induction hypothesis that $\frac{d}{dx} x^n = n x^{n-1}$. Via the Product Rule we obtain
\[ \frac{d}{dx} x^{n+1} = \frac{d}{dx} (x \cdot x^n) = 1 \cdot x^n + n x^{n-1} \cdot x = (n + 1) x^n. \]
This establishes the power rule for positive integer exponents $n$. For any $m \in \mathbb{N}$, we have $-m < 0$ and we can differentiate $x^{-m}$ with the Quotient Rule as follows.
\[ \frac{d}{dx} x^{-m} = \frac{d}{dx} \frac{1}{x^m} = \frac{0 \cdot x^m - 1 \cdot m x^{m-1}}{x^{2m}} = -m x^{-m-1}. \]
[Figure 11 panel labels: speed 1 in the $x$-direction; speed $g'(x)$ in the $g$-direction (slope $g'(x)$); speed $f'(g(x))g'(x)$ in the $f$-direction (slope $f'(g(x))$). The output of $g$ becomes the input of $f$. Position and speed are preserved.]

Figure 11: Visualization of the chain rule. The derivative can also be understood as a magnification factor for speeds. If a particle at $x$ moves at unit speed along the horizontal axis, then its image particle under $g$ moves at speed $g'(x)$ along the vertical axis. Now, if a particle at $g(x)$ moves at speed $g'(x)$ along the horizontal axis then its image particle moves at speed $f'(g(x))g'(x)$ along the vertical axis.

The above shows that the power rule holds for all nonzero integers. ∎

Now that we have the power rule, the last computation can only be viewed as a clumsy way to compute the derivative of powers with negative exponents. However, in this proof we had no choice, because we were proving the very rule that abbreviates this computation. We will revisit the Power Rule in Theorems 4.22 and 12.10.

With algebraic operations taken care of, we turn our attention to composition. Figure 11 shows a kinematic way to explain the Chain Rule.

Theorem 4.10 Chain Rule. Let $g : (a, b) \to \mathbb{R}$ and $f : (c, d) \to \mathbb{R}$ be functions with $g[(a, b)] \subseteq (c, d)$ and let $x \in (a, b)$ be such that $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$. Then $f \circ g : (a, b) \to \mathbb{R}$ is differentiable at $x$ and the derivative is $(f \circ g)'(x) = f'(g(x)) g'(x)$.

Proof. First, consider the case that there is no sequence $\{z_n\}_{n=1}^{\infty}$ with $\lim_{n\to\infty} z_n = x$ and $z_n \neq x$ and $g(z_n) = g(x)$ for all $n \in \mathbb{N}$. Then $g(z) \neq g(x)$ for all $z \neq x$ that are sufficiently close to $x$, and we proceed as follows.
\[ \lim_{z\to x} \frac{f \circ g(z) - f \circ g(x)}{z - x} = \lim_{z\to x} \frac{f(g(z)) - f(g(x))}{g(z) - g(x)} \cdot \frac{g(z) - g(x)}{z - x} = \lim_{u\to g(x)} \frac{f(u) - f(g(x))}{u - g(x)} \cdot \lim_{z\to x} \frac{g(z) - g(x)}{z - x} = f'(g(x)) g'(x). \]
This leaves the case in which there is a sequence $\{z_n\}_{n=1}^{\infty}$ with $\lim_{n\to\infty} z_n = x$ and $z_n \neq x$ and $g(z_n) = g(x)$ for all $n \in \mathbb{N}$. In this case,
\[ g'(x) = \lim_{n\to\infty} \frac{g(z_n) - g(x)}{z_n - x} = 0. \]
Because $f$ is differentiable at $g(x)$, there is a $\nu > 0$ so that for all $u \neq g(x)$ with $|u - g(x)| < \nu$ we have $\left| \frac{f(u) - f(g(x))}{u - g(x)} - f'(g(x)) \right| < 1$, and hence (by the reverse triangular inequality) $\left| \frac{f(u) - f(g(x))}{u - g(x)} \right| < |f'(g(x))| + 1$. Moreover, for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \neq x$ with $|z - x| < \delta$ we have that $\left| \frac{g(z) - g(x)}{z - x} \right| < \frac{\varepsilon}{|f'(g(x))| + 1}$ (because $g'(x) = 0$) and $|g(z) - g(x)| < \nu$ (by Theorem 4.4). Formally, we would need to find a $\delta$ so that the first inequality holds and another so that the second one holds and then use the minimum of the two. We used a simple modification of Standard Proof Technique 2.6 to abbreviate this step.

Therefore for all $z \neq x$ with $|z - x| < \delta$ we obtain the following. In case $g(z) = g(x)$, we have $\frac{f \circ g(z) - f \circ g(x)}{z - x} = 0$. In case $g(z) \neq g(x)$, we have $|g(z) - g(x)| < \nu$, and hence
\[ \left| \frac{f \circ g(z) - f \circ g(x)}{z - x} \right| = \left| \frac{f(g(z)) - f(g(x))}{g(z) - g(x)} \right| \cdot \left| \frac{g(z) - g(x)}{z - x} \right| < \big( |f'(g(x))| + 1 \big) \frac{\varepsilon}{|f'(g(x))| + 1} = \varepsilon. \]
Because $\varepsilon$ was arbitrary we conclude $(f \circ g)'(x) = 0 = f'(g(x)) g'(x)$ and the proof is complete. ∎

With compositions taken care of, it would be natural to also consider inverse functions. Because we need Rolle's Theorem to dispose of a technicality, consideration of inverse functions is postponed to Theorem 4.21.

Example 4.11 Maxima and minima do not preserve differentiability. Even though $f(x) = x$ and $g(x) = -x$ are both differentiable, the functions $\max\{f, g\}(x) = |x|$ and $\min\{f, g\}(x) = -|x|$ are not differentiable at 0. Thus, while differentiability is compatible with the natural algebraic operations for functions, it is not compatible with the natural order-theoretical operations for functions. □

If a function $f : (a, b) \to \mathbb{R}$ is differentiable on $(a, b)$, then the derivative $f'$ is a function in its own right. Hence, we can consider continuity and differentiability for $f'$.

Definition 4.12 Let $D \subseteq \mathbb{R}$ and let $f : D \to \mathbb{R}$ be a function.
Then $f$ is called continuously differentiable iff it is differentiable on $D$ and the derivative $f' : D \to \mathbb{R}$ is continuous. The function $f$ is also often considered to be its own “zeroth derivative” $f^{(0)} := f$. The function $f$ is called $n$ times differentiable iff it is $n - 1$ times differentiable and its $(n-1)$st derivative $f^{(n-1)} : D \to \mathbb{R}$ is differentiable. The $n$th derivative of $f$ is $f^{(n)}(x) := \frac{d}{dx} f^{(n-1)}(x)$, also denoted $\frac{d^n}{dx^n} f$. The function $f$ is called $n$ times continuously differentiable iff it is $n$ times differentiable and its $n$th derivative $f^{(n)} : D \to \mathbb{R}$ is continuous. Finally, $f$ is called infinitely differentiable iff for all $n \in \mathbb{N}$ it is $n$ times differentiable.

The differentiation rules we have derived here immediately show the following.

Example 4.13 Polynomials and rational functions are infinitely differentiable on their domains. □

Example 4.14 For every $n \in \mathbb{N}$, the function $f(x) = [\ldots]$ is $n$ times continuously differentiable, but it is not $(n + 1)$-times differentiable. (Exercise 4-7.) □

Exercises

4-4. Prove that if $f, g : (a, b) \to \mathbb{R}$ are both differentiable at $x \in (a, b)$, then $f - g$ is differentiable at $x$ and $(f - g)'(x) = f'(x) - g'(x)$.

4-5. Prove that if $f : (a, b) \to \mathbb{R}$ is differentiable at $x \in (a, b)$ and $c \in \mathbb{R}$, then $cf$ is differentiable at $x$ and $(cf)'(x) = c f'(x)$.

4-6. Prove the product rule. Hint. It's similar to the proof of the quotient rule, but simpler.

4-7. Prove the claim in Example 4.14. Hint. Use induction and at $x = 0$ use Exercise 4-3.

4-8. Prove that $f : (a, b) \to \mathbb{R}$ is differentiable at $x \in (a, b)$ iff the limit $\lim_{h\to 0} \frac{f(x + h) - f(x)}{h}$ exists and that in this case $f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}$.

4-9. Let $n \in \mathbb{N}$. Use the Binomial Theorem and Exercise 4-8 to prove the Power Rule for $f(x) = x^n$ without using induction.

4-10. Prove that the derivative of $f(x) = \sqrt{x}$ is $f'(x) = \frac{1}{2\sqrt{x}}$.

4-11. Compute the derivative of each of the following functions.
(b) $f(x) = (x^2 + 5)\sqrt{x}$ (You may use Exercise 4-10.)
(c) $f(x) = [\ldots]$

4-12. Use induction to prove that $\frac{d^n}{dx^n} \frac{1}{x} = (-1)^n n!\, x^{-n-1}$ for all $x \neq 0$.

4-13. Prove that if $f, g : (a, b) \to \mathbb{R}$ are both $n$ times differentiable, then the product $fg$ is $n$ times differentiable and $(fg)^{(n)} = \sum_{k=0}^{n} \binom{n}{k} f^{(k)} g^{(n-k)}$. Hint. Mimic the proof of the Binomial Theorem.

4.3 Rolle's Theorem and the Mean Value Theorem

One of the main applications of derivatives is to use the sign of the derivative to compute relative extrema of a function and intervals where a function is increasing or decreasing. The formal justification follows from Rolle's Theorem and the Mean Value Theorem.

Definition 4.15 Let $f : (a, b) \to \mathbb{R}$ be a function. Then $f$ is said to have a relative (or local) minimum at $x_m$ iff there is a $\delta > 0$ such that $f(x_m) \le f(x)$ for all $x \in (x_m - \delta, x_m + \delta)$. $f$ is said to have a relative (or local) maximum at $x_M$ if and only if there is a $\delta > 0$ such that $f(x_M) \ge f(x)$ for all $x \in (x_M - \delta, x_M + \delta)$. If $f$ has a local maximum or a local minimum at the point $c$ we also say that $f$ has a relative (or local) extremum at $c$.

Intuitively, relative extrema are the locally highest or lowest points of the graph. (Note, however, that stagnation also is possible, see Exercise 4-14.) At the location of a relative maximum there cannot be an incline in any direction. Hence, the derivative should be zero at a relative maximum.

Theorem 4.16 Let $f : (a, b) \to \mathbb{R}$ be a function and let $m \in (a, b)$. If $f$ is differentiable at $m$ and $f$ has a relative maximum at $m$, then $f'(m) = 0$.

Proof. Because $f$ has a relative maximum at $m$ there is a positive number $\delta$ so that $f(z) - f(m) \le 0$ for all $z$ with $|z - m| < \delta$. We infer that $\lim_{z\to m^+} \frac{f(z) - f(m)}{z - m} \le 0$ and $\lim_{z\to m^-} \frac{f(z) - f(m)}{z - m} \ge 0$. Because $f$ is differentiable at $m$, these two limits must
be equal to $f'(m)$. Therefore $f'(m)$ is greater than or equal to 0 and smaller than or equal to 0. This implies $f'(m) = 0$. ∎

Figure 12: Rolle's Theorem states that if a differentiable function starts and ends at the same height, then it must have a flat tangent in between (a). The Mean Value Theorem states that some tangent must be parallel to the secant through the starting point and the ending point (b).

Rolle's Theorem states that if a function's values are equal at the endpoints of an interval, then the function must have a horizontal tangent line in the interval (see Figure 12). Note that the proof is a collection of direct proofs (modus ponens), which is possible because we can rely on strong results that we proved earlier.

Theorem 4.17 Rolle's Theorem. Let $f : [a, b] \to \mathbb{R}$ be differentiable on the open interval $(a, b)$ and continuous on the closed interval $[a, b]$. If $f(a) = f(b)$, then there is an $m \in (a, b)$ with $f'(m) = 0$.

Proof. Because the result is trivial if $f$ is constant, we can assume $f$ is not constant. By Theorem 3.44, $f$ assumes an absolute maximum and an absolute minimum on $[a, b]$. Because $f$ is not constant, one of these values is not equal to $f(a)$. Assume without loss of generality that the absolute maximum value is greater than $f(a)$. Let $m \in [a, b]$ be so that $f(m) \ge f(x)$ for all $x \in [a, b]$. Because $f(m) > f(a) = f(b)$ we infer $m \in (a, b)$. Hence, $f$ is differentiable at $m$. Because $f(m) \ge f(x)$ for all $x \in (a, b)$, $f$ has a relative maximum at $m$. Now by Theorem 4.16 we conclude $f'(m) = 0$. ∎

The Mean Value Theorem generalizes Rolle's Theorem by no longer demanding that the values at the endpoints are equal. It guarantees that there is a point in the interval so that the tangent line at that point is parallel to the secant line through the endpoints.
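To make the geometric statement concrete, the following minimal sketch locates such a point for one sample function. Everything specific in it is our own assumption rather than part of the text: the function $x^3$ on $[0, 2]$, the hand-computed derivative $3x^2$, the helper names, and the use of bisection, which only works here because the derivative minus the secant slope is continuous and changes sign on the interval.

```python
def secant_slope(f, a, b):
    """Slope of the secant line through (a, f(a)) and (b, f(b))."""
    return (f(b) - f(a)) / (b - a)


def mean_value_point(df, slope, lo, hi, tol=1e-12):
    """Approximate a point c in (lo, hi) with df(c) = slope by bisection.

    Assumption of this sketch (not of the Mean Value Theorem): df is
    continuous and df - slope has opposite signs at lo and hi.
    """
    g_lo = df(lo) - slope
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = df(mid) - slope
        if (g_lo < 0) == (g_mid < 0):
            # Same sign as at the left end: the crossing is to the right.
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2


def cube(x):
    return x ** 3


def cube_prime(x):
    # Derivative of x^3, supplied by hand via the Power Rule.
    return 3 * x ** 2


slope = secant_slope(cube, 0.0, 2.0)   # (8 - 0) / (2 - 0)
c = mean_value_point(cube_prime, slope, 0.0, 2.0)
print(slope, c)
```

Solving $3c^2 = 4$ by hand gives $c = 2/\sqrt{3}$, which the bisection approximates. The theorem itself only asserts that some such $c$ exists; it does not provide a method to find it.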
In the proof, we subtract the line $l(x) = (x - a) \frac{f(b) - f(a)}{b - a}$ from the function $f$. This reduces the proof to an application of Rolle's Theorem.

Theorem 4.18 Mean Value Theorem. Let $f : [a, b] \to \mathbb{R}$ be differentiable on the open interval $(a, b)$ and continuous on the closed interval $[a, b]$. Then there is a number $c \in (a, b)$ so that $\frac{f(b) - f(a)}{b - a} = f'(c)$.

Proof. The function $g(x) := f(x) - (x - a) \frac{f(b) - f(a)}{b - a}$ is continuous on $[a, b]$, differentiable on $(a, b)$ and $g(b) = f(a) = g(a)$. By Rolle's Theorem there is a $c$ in the interval $(a, b)$ such that $g'(c) = 0$. But that means $f'(c) - \frac{f(b) - f(a)}{b - a} = g'(c) = 0$, that is, $f'(c) = \frac{f(b) - f(a)}{b - a}$. ∎

Figure 13: Visualization of Theorem 4.21. Reflection across the diagonal produces the graph of the inverse function and the slope of the reflected line is the multiplicative inverse of the slope of the original line.

With the Mean Value Theorem available we can now prove a sufficient criterion for when a function is increasing or decreasing.

Definition 4.19 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a function. Then $f$ is called (strictly) increasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) < f(x_2)$. $f$ is called (strictly) decreasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) > f(x_2)$. Moreover, $f$ is called nondecreasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) \le f(x_2)$ and $f$ is called nonincreasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have $f(x_1) \ge f(x_2)$.

Theorem 4.20 Let $f : [a, b] \to \mathbb{R}$ be differentiable on the open interval $(a, b)$ and continuous on the closed interval $[a, b]$. If $f'(x) > 0$ for all $x \in (a, b)$, then $f$ is increasing on $[a, b]$.

Proof. Let $f'(x) > 0$ for all $x$ in $(a, b)$. Suppose, for a contradiction, that there are points $x_1, x_2$ in $[a, b]$ such that $x_1 < x_2$ and $f(x_1) \ge f(x_2)$.
Then by the Mean Value Theorem there is a $c$ in the interval $(x_1, x_2)$ (and therefore in $(a, b)$) such that $f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le 0$, which is a contradiction. ∎

Exercise 4-21 provides a similar criterion for nonincreasing and nondecreasing functions.

Rolle's Theorem also enables us to prove that the inverses of differentiable functions are differentiable. For a visualization, consider Figure 13.

Theorem 4.21 Let $f : (a, b) \to \mathbb{R}$ be a continuous injective function, let $f^{-1}$ be its inverse and let $x \in f[(a, b)]$ be such that $f$ is differentiable at $f^{-1}(x)$ with $f'(f^{-1}(x)) \neq 0$. Then $f^{-1}$ is differentiable at $x$ and $(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}$.

Proof. By Theorem 3.36 $f[(a, b)]$ is an interval and by Theorem 4.16 $x$ is neither its maximum nor its minimum. Thus there is a $\delta > 0$ so that the containment $(x - \delta, x + \delta) \subseteq f[(a, b)]$ holds. Hence, we can talk about differentiability of the function $f^{-1}$ at $x$.

Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $f[(a, b)] \setminus \{x\}$ so that $\lim_{n\to\infty} z_n = x$. By Definition 3.1, we are done if we can show that $\lim_{n\to\infty} \frac{f^{-1}(z_n) - f^{-1}(x)}{z_n - x} = \frac{1}{f'(f^{-1}(x))}$. For each $n \in \mathbb{N}$ let $y_n := f^{-1}(z_n)$ and let $y = f^{-1}(x)$. Then $y_n = f^{-1}(z_n) \neq f^{-1}(x) = y$ for all $n \in \mathbb{N}$. By assumption $f$ is continuous, and hence by Theorem 3.38 so is $f^{-1}$. This means that $\lim_{n\to\infty} y_n = \lim_{n\to\infty} f^{-1}(z_n) = f^{-1}\left(\lim_{n\to\infty} z_n\right) = f^{-1}(x) = y$, and therefore
\[ \lim_{n\to\infty} \frac{f^{-1}(z_n) - f^{-1}(x)}{z_n - x} = \lim_{n\to\infty} \frac{y_n - y}{f(y_n) - f(y)} = \lim_{n\to\infty} \frac{1}{\frac{f(y_n) - f(y)}{y_n - y}} = \frac{1}{f'(y)} = \frac{1}{f'(f^{-1}(x))}. \] ∎

Theorem 4.21 allows us to extend the Power Rule to rational exponents.

Theorem 4.22 Power Rule. Let $r \in \mathbb{Q} \setminus \{0\}$. Then $f(x) = x^r$ is differentiable with $\frac{d}{dx} x^r = r x^{r-1}$ at every $x$ for which the right side is defined.

Proof. Let $m \in \mathbb{N}$ and consider $f(x) = x^{\frac{1}{m}}$. Then $f$ is the inverse function of $g(x) = x^m$. By Theorem 4.21 for all nonzero $x \in \mathbb{R}$ for which $x^{\frac{1}{m}}$ is defined, we obtain
\[ \frac{d}{dx} x^{\frac{1}{m}} = \frac{1}{g'\left(x^{\frac{1}{m}}\right)} = \frac{1}{m \left(x^{\frac{1}{m}}\right)^{m-1}} = \frac{1}{m} x^{\frac{1}{m} - 1}. \]
Now let $n \in \mathbb{Z}$ and $m \in \mathbb{N}$. By the Chain Rule,
for all nonzero $x$ for which $x^{\frac{n}{m}}$ is defined, we obtain
\[ \frac{d}{dx} x^{\frac{n}{m}} = \frac{d}{dx} \left(x^{\frac{1}{m}}\right)^n = n \left(x^{\frac{1}{m}}\right)^{n-1} \cdot \frac{1}{m} x^{\frac{1}{m} - 1} = \frac{n}{m} x^{\frac{n}{m} - 1}. \] ∎

Exercises

4-14. Geometry versus utility in the definition of relative extrema.
(a) Explain why by Definition 4.15 every point in the interval $(0, 1)$ is a relative maximum and a relative minimum of the function
\[ f(x) = \begin{cases} [\ldots] & \text{for } x \le 0, \\ [\ldots] & \text{for } x \in [0, 1], \\ (x - 1)^3 & \text{for } x > 1. \end{cases} \]
(b) Sketch the graph of the function and comment whether intuition agrees that $f$ has a relative maximum and a relative minimum at each $x \in (0, 1)$.
(c) Use the proof of Rolle's Theorem to explain why the definition of relative maxima as in Definition 4.15 is preferable to a definition of relative maxima that requires the value at a relative maximum to be strictly larger than the values of the function near the relative maximum.

4-15. Prove that if $f : (a, b) \to \mathbb{R}$ is differentiable and has a relative minimum at $m \in (a, b)$, then $f'(m) = 0$.

4-16. Let $f : [a, b] \to \mathbb{R}$ be differentiable on the open interval $(a, b)$ and continuous on the closed interval $[a, b]$. Prove that if $f'(x) < 0$ for all $x \in (a, b)$, then $f$ is decreasing on $[a, b]$.

4-17. Let $f : [a, b] \to \mathbb{R}$ be differentiable on the open interval $(a, b)$ and continuous on the closed interval $[a, b]$. Prove that if $f'(x) = 0$ for all $x \in (a, b)$, then $f$ is constant on $[a, b]$.

4-18. Give a direct proof of Theorem 4.20. That is, give a proof that does not argue via contradiction.

4-19. Let $f(x) := ax^2 + bx + c$ be a quadratic function defined on $\mathbb{R}$. Prove that if $a > 0$ then $f$ is decreasing on $\left(-\infty, -\frac{b}{2a}\right]$ and increasing on $\left[-\frac{b}{2a}, \infty\right)$. State and prove a similar result for $a < 0$.

4-20. Let $f(x) := ax^3 + bx^2 + cx + d$ be a cubic function defined on $\mathbb{R}$ with $a \neq 0$. Prove that if $4b^2 - 12ac > 0$, then $f$ has two relative extrema. For $a > 0$ and for $a < 0$ separately describe where $f$ is increasing and where it is decreasing.

4-21. Let $f : [a, b] \to \mathbb{R}$ be differentiable on $(a, b)$ and continuous on $[a, b]$.
(a) Prove that $f$ is nondecreasing iff for all $x \in (a, b)$ we have $f'(x) \ge 0$. (Note that this is an equivalence.)
(b) Prove that $f$ is nonincreasing iff for all $x \in (a, b)$ we have $f'(x) \le 0$. (Note that this is an equivalence.)
(c) Give an example that shows that the condition in Theorem 4.20 is not equivalent to $f$ being strictly increasing.

4-22. Use Theorem 4.21 to prove that the derivative of $f(x) = \sqrt{x}$ is $f'(x) = \frac{1}{2\sqrt{x}}$.

4-23. Explain why the following is not a proof for Theorem 4.21.
“Proof.” We know that $x = f(f^{-1}(x))$. Differentiating both sides with respect to $x$ gives $1 = \frac{d}{dx} x = \frac{d}{dx} f(f^{-1}(x)) = f'(f^{-1}(x)) (f^{-1})'(x)$, so $(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}$.

4-24. Let $f : (a, b) \to \mathbb{R}$ be continuous and let $x \in (a, b)$. Prove that if $f'(z)$ exists for all $z \in (a, b) \setminus \{x\}$ and $\lim_{z\to x} f'(z)$ exists, then $f$ is differentiable at $x$ with $f'(x) = \lim_{z\to x} f'(z)$.

4-25. Let $f : (a, b) \to \mathbb{R}$ be differentiable. Prove that $f'$ (which need not be continuous) has the intermediate value property. That is, prove that for all $c < d$ in $(a, b)$ and all $\nu$ between $f'(c)$ and $f'(d)$ there is an $m \in (c, d)$ so that $f'(m) = \nu$.
Hint. Show that
\[ g(x) := \begin{cases} f'(c) & \text{for } x = c, \\ \frac{f(x) - f(c)}{x - c} & \text{for } x \neq c, \end{cases} \qquad \text{and} \qquad h(x) := \begin{cases} \frac{f(d) - f(x)}{d - x} & \text{for } x \neq d, \\ f'(d) & \text{for } x = d, \end{cases} \]
are continuous on $[c, d]$, that $g(d) = h(c)$, apply the Intermediate Value Theorem to one of them, then apply the Mean Value Theorem to $f$.

4-26. Prove that for $n \ge 2$ the $n$th derivative of $f(x) = \sqrt{x}$ is
\[ f^{(n)}(x) = (-1)^{n+1} \frac{1 \cdot 3 \cdot 5 \cdots (2n - 3)}{2^n} x^{-\frac{2n-1}{2}} = (-1)^{n+1} \frac{(2n - 2)!}{2^{n-1} (n - 1)!\, 2^n} x^{-\frac{2n-1}{2}}. \]

4-27. For each function, find an expression for $f^{(n)}(x)$ and prove that it is the $n$th derivative of $f$.
(a) $f(x) = \frac{1}{\sqrt{x}}$
(b) $f(x) = [\ldots]$

Chapter 5 The Riemann Integral I

The geometric goal of integration is to compute the area under a graph. In Riemann integration this is done by approximating the area with rectangles.
This chapter presents the idea behind Riemann integration and some integration criteria, examples and theorems. Although intuitively the Riemann integral seems to be the right idea, by the end of the chapter we will need more machinery to fully characterize Riemann integrability and we will also have exposed some key weaknesses. An equivalent criterion for Riemann integrability will be presented in Theorem 8.12. The observed weaknesses of the Riemann integral will be addressed in Chapter 9.

5.1 Riemann Sums and the Integral

To define the area of rectangles under the graph of a function, we first need to determine the base for each rectangle. This is done with a partition of the interval.

Definition 5.1 Let $[a, b]$ be a closed interval. Then any finite set $P \subseteq [a, b]$ such that $a, b \in P$ will be called a partition of $[a, b]$. Because the order of the points will be important, we also write $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ when working with a partition $P$.

With the partition giving the bases of the rectangles, we still need to determine the heights. Each height will be a value that the function assumes within the respective interval of the partition (see Figure 14(a)).

Definition 5.2 Let $[a, b]$ be a closed interval and let $f : [a, b] \to \mathbb{R}$ be bounded. For any partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ a set $T = \{t_1, \ldots, t_n\}$ such that for all $i \in \{1, \ldots, n\}$ we have that $t_i \in [x_{i-1}, x_i]$ will be called an evaluation set. We define the Riemann sum of $f$ with respect to the partition $P$ and evaluation set $T$ to be
\[ R(f, P, T) := \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}). \]
We will also use the notation $\Delta x_i := x_i - x_{i-1}$.

Figure 14: The Riemann integral approximates the area under the graph of a function with the areas of rectangles. Definition 5.2 demands that the height of each rectangle is the value of the function at some point in the base interval (one point marked in (a), “Riemann sum”). Of particular importance are the lower sum (see Definition 5.13) of the areas of the largest rectangles that can be fit under the graph of the function (b) and the upper sum (see Definition 5.13) of the areas of the smallest rectangles that contain the graph of the function (c).

Clearly, a Riemann sum can only accidentally be equal to the area under the graph. However, the narrower we make the rectangles, the closer the Riemann sums should be to the actual area. The norm of a partition gives a uniform measure of how narrow the rectangles in a partition are.

Definition 5.3 Let $[a, b]$ be a closed interval in the real numbers. For a partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$, we define the norm of the partition $P$ to be $\|P\| := \max\{(x_i - x_{i-1}) : i = 1, \ldots, n\}$.

We now say that a function is Riemann integrable iff all Riemann sums get close to one value, the integral, as the norm of the partitions is made small.

Definition 5.4 Let $D$ be a set. A function $f : D \to \mathbb{R}$ is called bounded iff there is an $M \in \mathbb{R}$ so that $|f(x)| \le M$ for all $x \in D$.

Definition 5.5 The function $f : [a, b] \to \mathbb{R}$ is called Riemann integrable (on $[a, b]$) iff $f$ is bounded and there is a number $I$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P$ with $\|P\| < \delta$ and all evaluation sets $T$ the inequality
\[ \left| \sum_{i=1}^{n} f(t_i) \Delta x_i - I \right| = |R(f, P, T) - I| < \varepsilon \]
holds. The number $I$ will be called the Riemann integral of $f$ and it will also be denoted $\int_a^b f(x)\, dx := I$. The function $f$ is also called the integrand and $a$ and $b$ are called the bounds of the integral, with $a$ being the lower bound and $b$ being the upper bound.
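The definitions of the Riemann sum and of the norm of a partition translate directly into a short computation. The sketch below is an illustration under our own assumptions: the function names `riemann_sum` and `partition_norm`, the representation of a partition as a sorted Python list, and the sample partition and evaluation set are not from the text.

```python
def riemann_sum(f, partition, tags):
    """Compute R(f, P, T) = sum of f(t_i) * (x_i - x_{i-1}).

    partition is the sorted list [x_0, ..., x_n] and tags is the list
    [t_1, ..., t_n] with t_i in [x_{i-1}, x_i].
    """
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one evaluation point per subinterval")
    total = 0.0
    for i, t in enumerate(tags, start=1):
        left, right = partition[i - 1], partition[i]
        if not left <= t <= right:
            raise ValueError("evaluation point outside its subinterval")
        total += f(t) * (right - left)
    return total


def partition_norm(partition):
    """Compute the norm of P, that is, the largest width x_i - x_{i-1}."""
    return max(b - a for a, b in zip(partition, partition[1:]))


# Hypothetical sample data: a non-uniform partition of [0, 1] with the
# midpoints of the subintervals as evaluation set, and f(x) = x.
P = [0.0, 0.25, 0.5, 1.0]
T = [0.125, 0.375, 0.75]
value = riemann_sum(lambda x: x, P, T)
width = partition_norm(P)
print(value, width)
```

For this particular linear function the midpoint choice happens to reproduce the area of the triangle under the graph; as noted in the text, such agreement is accidental and is no substitute for checking the condition in Definition 5.5.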
The definition of the Riemann integral allows us to compute the value of the integral, if it exists, as a limit of a sequence of Riemann sums. Note that in Lemma 5.6 below the only condition on the partitions is that their norm goes to zero and that we are free to choose the evaluation sets any way we want to.

Lemma 5.6 Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable. Then for any sequence $\{P_k\}_{k=1}^{\infty}$ of partitions of $[a, b]$ with $\lim_{k\to\infty} \|P_k\| = 0$ and any associated sequence of evaluation sets $\{T_k\}_{k=1}^{\infty}$ we have $\lim_{k\to\infty} R(f, P_k, T_k) = \int_a^b f(x)\, dx$.

Proof. Let $\{P_k\}_{k=1}^{\infty}$ be a sequence of partitions of $[a, b]$ with $\lim_{k\to\infty} \|P_k\| = 0$, let $\{T_k\}_{k=1}^{\infty}$ be an associated sequence of evaluation sets and let $\varepsilon > 0$. There is a number $\delta > 0$ so that for all partitions $P$ of $[a, b]$ with $\|P\| < \delta$ and any associated evaluation set $T$ we have $\left| R(f, P, T) - \int_a^b f(x)\, dx \right| < \varepsilon$. Because $\lim_{k\to\infty} \|P_k\| = 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $\|P_k\| < \delta$. Hence, for all $k \ge K$ we infer that $\left| R(f, P_k, T_k) - \int_a^b f(x)\, dx \right| < \varepsilon$. ∎

The idea in Lemma 5.6 is very useful for numerical integration (see Exercise 13-25) and for the proof of the Fundamental Theorem of Calculus (see Theorem 5.23). To demonstrate how Lemma 5.6 is applied, consider the following example.

Example 5.7 If $f(x) = x$ is Riemann integrable on $[0, 1]$, then $\int_0^1 x\, dx = \frac{1}{2}$.

If $f$ is Riemann integrable, then by Lemma 5.6 for any sequence $\{P_n\}_{n=1}^{\infty}$ of partitions with $\lim_{n\to\infty} \|P_n\| = 0$ and evaluation sets $T_n$ we have $\lim_{n\to\infty} R(f, P_n, T_n) = \int_0^1 f(x)\, dx$. We choose $P_n := \left\{ \frac{j}{n} : j = 0, \ldots, n \right\}$ and $T_n := \left\{ \frac{j}{n} : j = 1, \ldots, n \right\}$ and obtain
\[ \int_0^1 f(x)\, dx = \lim_{n\to\infty} R(f, P_n, T_n) = \lim_{n\to\infty} \sum_{j=1}^{n} \frac{j}{n} \cdot \frac{1}{n} = \lim_{n\to\infty} \frac{1}{n^2} \sum_{j=1}^{n} j = \lim_{n\to\infty} \frac{1}{n^2} \cdot \frac{n}{2}(n + 1) = \lim_{n\to\infty} \frac{n^2 + n}{2n^2} = \frac{1}{2}. \] □

Example 5.7 exhibits a key problem that we will ultimately resolve in Theorem 8.12. Although we may be able to compute a value that should be the Riemann integral, it may not be clear if the function actually is Riemann integrable.
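The closed form in Example 5.7 can be confirmed with exact rational arithmetic. This is a minimal sketch under our own assumptions: the function name `right_endpoint_sum` is hypothetical, and the computation only evaluates the specific Riemann sums $R(f, P_n, T_n)$ chosen in the example, so it says nothing about whether $f$ is Riemann integrable.

```python
from fractions import Fraction


def right_endpoint_sum(n):
    """Exact Riemann sum of f(x) = x on [0, 1] for the partition
    P_n = {j/n : j = 0, ..., n} with evaluation set T_n = {j/n : j = 1, ..., n}.
    """
    width = Fraction(1, n)  # every subinterval has length 1/n
    return sum(Fraction(j, n) * width for j in range(1, n + 1))


for n in (1, 10, 100, 1000):
    s = right_endpoint_sum(n)
    # The sum equals (n^2 + n) / (2 n^2) = (n + 1) / (2 n).
    print(n, s, s - Fraction(1, 2))
```

The difference to $\frac{1}{2}$ is exactly $\frac{1}{2n}$, which goes to zero as $n \to \infty$, in agreement with the limit computed in the example.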
The following results show that it is reasonably simple to prove Riemann integrability if the value of the integral is known. Note that the strong similarity between Theorem 5.8 and the appropriate parts of Theorems 2.14 and 3.10 is not accidental. The definition of the Riemann integral is similar to the definitions of the limits of sequences and functions. Hence, similar “limit laws” hold and they are proved with similar methods. Unfortunately the similarity does not apply to integrals of products and quotients. (Products are addressed in Exercises 5-21 and 5-22 and quotients are typically treated as products in integration.)

Theorem 5.8 Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable and let $c \in \mathbb{R}$. Then $f + g$ and $cf$ are Riemann integrable and the following equations hold.

1. $\int_a^b (f + g)(x)\, dx = \int_a^b f(x)\, dx + \int_a^b g(x)\, dx$.
2. $\int_a^b cf(x)\, dx = c \int_a^b f(x)\, dx$.

Proof. We will only prove that $f + g$ is Riemann integrable and that equation 1 holds, leaving the rest to Exercise 5-3. Let $\varepsilon > 0$. Then there is a $\delta > 0$ so that for all partitions $P$ of $[a, b]$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we have
\[ \left| \sum_{i=1}^{n} f(t_i) \Delta x_i - \int_a^b f(x)\, dx \right| < \frac{\varepsilon}{2} \qquad \text{and} \qquad \left| \sum_{i=1}^{n} g(t_i) \Delta x_i - \int_a^b g(x)\, dx \right| < \frac{\varepsilon}{2}. \]
Hence, for all partitions $P$ of $[a, b]$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we obtain
\[ \left| \sum_{i=1}^{n} (f + g)(t_i) \Delta x_i - \left( \int_a^b f(x)\, dx + \int_a^b g(x)\, dx \right) \right| \le \left| \sum_{i=1}^{n} f(t_i) \Delta x_i - \int_a^b f(x)\, dx \right| + \left| \sum_{i=1}^{n} g(t_i) \Delta x_i - \int_a^b g(x)\, dx \right| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
By Definition 5.5, this implies that $f + g$ is Riemann integrable and the equation $\int_a^b (f + g)(x)\, dx = \int_a^b f(x)\, dx + \int_a^b g(x)\, dx$ holds. ∎

The prototypes with which we approximate the area under functions are rectangles, that is, areas under step functions. Therefore it is only natural that step functions should be Riemann integrable and that their integrals should be the sum of the areas of the rectangles involved.

Definition 5.9 Let $M$ be a set and let $S \subseteq M$ be a subset. Then the indicator function of $S$ is the function
\[ \mathbf{1}_S(x) := \begin{cases} 1 & \text{for } x \in S, \\ 0 & \text{for } x \notin S. \end{cases} \]

Proposition 5.10 Let $a < b$ and let $c, d \in [a, b]$ with $c < d$.
Then f [ c , d ) is Riemann l [ c , d ) dx = d - c. > 0, let 6 := min { i,T}, let P = {a = xo < be a partition with II P )I < 6 and let T = {tl, ... < x, = b } . . . , t , } be an associated evaluation set. d-c Because 6 5 -, there are j , k E 11,. . . , n } with j i k so that l r c , d ) ( t i ) = 1 iff 3 i E { j ,j 1 , . . . , k } and l f c , d ) ( t i = ) 0 otherwise. Then c E ( t j - 1 , t j ] G [ x j - 2 , x j ] , Or, if j = 1, C E [Xo, t i ] 5 [ X j - l , X j ] . Similarly, d E [ x k - I , x k + l ] , or, if k = n , + d E [x,-l, x,]. Either way, & IXj-1 - c J < - and IXk 2 - dl < & -. Hence, 2 I I i=i -&-I)- By Definition 5.5, l (d - c ) ~ ~ ,isd Riemann ) integrable and the integral is (d - c). rn Standard Proof Technique 5.11 A sum of differences as in the proof of Proposition 5.10, for which the negative part of one term cancels the positive part of the previous or the next term, is also called a telescoping sum. Telescoping sums are typically used to collapse sums of many terms into shorter sums (see the proof of Theorem 5.23), or, conversely (see, for example, the proofs of Theorems 13.14 and 17.33) to make a connection between two terms using differences that are easier to work with. Proposition 5.12 Let Q := { and f o r k and E 11, . . . , m ) let lb eakl[zk-I,zk) k=l a = ak E zo < z1 < “m < zm = b } be a partition of [ a ,b ] R. Then f := ~ a k l ~ z k - , ,isi kRiemann ) integrable k= 1 in dx = z a k ( Z k - z k - 1 ) . k=l Proof. Combine Theorem 5.8 and Proposition 5.10 (Exercise 5-5). rn Exercises 5-1. Prove that Lemma 5.6 can be turned into a biconditional. That is, let f : [ a , b] + W be a bounded function and prove that f is Riemann integrable iff there is an I E R so that for every sequence of partitions of [a, b] with lim llPk 11 = 0 and any associated sequence of evaluation sets (&)p=, k+cc 5. The Riemann Integral I 90 5-2. 
Prove each of the following.
(a) If the function $f(x) = x^2$ is Riemann integrable on $[0,1]$, then $\int_0^1 x^2\,dx = \frac{1}{3}$.
(b) If the function $f(x) = x^3$ is Riemann integrable on $[0,1]$, then $\int_0^1 x^3\,dx = \frac{1}{4}$.
(c) If the function $f(x) = x^4$ is Riemann integrable on $[0,1]$, then $\int_0^1 x^4\,dx = \frac{1}{5}$.
Hint. Use summation formulas from Exercise 1-33.

5-3. Prove part 2 of Theorem 5.8.

5-4. Prove that if $f_1, \ldots, f_m : [a,b] \to \mathbb{R}$ are Riemann integrable on $[a,b]$, then $\sum_{k=1}^{m} f_k$ is Riemann integrable on $[a,b]$ and $\int_a^b \sum_{k=1}^{m} f_k(x)\,dx = \sum_{k=1}^{m} \int_a^b f_k(x)\,dx$.

5-5. Prove Proposition 5.12. You may use the result of Exercise 5-4.

5-6. Let $f, g : [a,b] \to \mathbb{R}$ be Riemann integrable. Prove that $f - g$ is Riemann integrable and $\int_a^b (f-g)(x)\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx$.

5-7. Let $a < b$ and let $c, d \in [a,b]$ with $c < d$. Prove that $1_{[c,d]}$ is Riemann integrable over $[a,b]$ and $\int_a^b 1_{[c,d]}\,dx = d - c$.

5-8. Prove that if $c$ is a constant, then $f_c(x) := c$ is Riemann integrable and $\int_a^b f_c(x)\,dx = c(b-a)$.

5-9. Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $m, M \in \mathbb{R}$ be such that for all $x \in [a,b]$ we have $m \le f(x) \le M$. Prove that $m(b-a) \le \int_a^b f(x)\,dx \le M(b-a)$.

5-10. In the definitions of the Riemann integral and of Riemann sums we demand that the function is bounded. We could define Riemann sums for unbounded functions also, but this exercise shows that the Riemann integral cannot be defined for unbounded functions. Let $f : [a,b] \to \mathbb{R}$ be a function. Prove that if there is a number $I$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we have $|R(f,P,T) - I| < \varepsilon$, then $f$ must be bounded.
Hint. Assume the function is not bounded and choose the right partitions and evaluation sets to obtain a contradiction to Riemann integrability.

5-11. Cauchy Criterion for Riemann integrability. Let $f : [a,b] \to \mathbb{R}$ be a function. Prove that if for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P, Q$ with $\|P\|, \|Q\| < \delta$ and all associated evaluation sets $T, U$ we have $|R(f,P,T) - R(f,Q,U)| < \varepsilon$, then $f$ must be Riemann integrable.

5-12. Riemann-Stieltjes integrals. Let $[a,b]$ be a closed interval, let $g : [a,b] \to \mathbb{R}$ be nondecreasing and let $f : [a,b] \to \mathbb{R}$ be bounded. For any partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ and associated evaluation set $T = \{t_1, \ldots, t_n\}$, define the Riemann-Stieltjes sum of $f$ with respect to the partition $P$, evaluation set $T$ and integrator $g$ as $S_g(f,P,T) := \sum_{i=1}^{n} f(t_i)\bigl(g(x_i) - g(x_{i-1})\bigr)$. We will also use the notation $\Delta g_i := g(x_i) - g(x_{i-1})$. The integrand $f$ is called Riemann-Stieltjes integrable on $[a,b]$ with respect to the integrator $g$ iff $f$ is bounded and there is a number $I$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we have $|S_g(f,P,T) - I| < \varepsilon$. The number $I$ will be called the Riemann-Stieltjes integral of $f$, denoted $\int_a^b f\,dg := I$.
(a) Prove that if $f_1, f_2$ are Riemann-Stieltjes integrable with respect to $g$, then $f_1 + f_2$ is Riemann-Stieltjes integrable with respect to $g$ and $\int_a^b (f_1 + f_2)\,dg = \int_a^b f_1\,dg + \int_a^b f_2\,dg$.
(b) Prove that if $f$ is Riemann-Stieltjes integrable with respect to $g$ and $c \in \mathbb{R}$, then $cf$ is Riemann-Stieltjes integrable with respect to $g$ and $\int_a^b cf\,dg = c \int_a^b f\,dg$.
(c) Prove that if $g : [0,1] \to \mathbb{R}$ is the indicator function 1 [ 1 4 , then the function f = 1 is not Riemann-Stieltjes integrable with respect to $g$.

5.2 Uniform Continuity and Integrability of Continuous Functions

Unlike continuity and differentiability, Riemann integrability is not easily checked. Therefore, we need to invest some effort into finding criteria for Riemann integrability that are simpler than the definition.
The first such criterion is that continuity implies Riemann integrability.

The fundamental idea behind the Squeeze Theorem (see Theorem 2.21) is that if a sequence is trapped between two other sequences that converge to the same limit, then the sequence in the middle must converge to that limit, too. To use this idea in our investigation of Riemann integrability, we define lower and upper bounds for all Riemann sums associated with a given partition.

Definition 5.13 Let $f : [a,b] \to \mathbb{R}$ be bounded, let $P = \{a = x_0 < \cdots < x_n = b\}$ be a partition of $[a,b]$ and for each $i \in \{1, \ldots, n\}$ let
\[ m_i := \inf\{ f(x) : x \in [x_{i-1}, x_i] \} \quad\text{and}\quad M_i := \sup\{ f(x) : x \in [x_{i-1}, x_i] \}. \]
We define the lower sum of $f$ with respect to $P$ to be $L(f,P) := \sum_{i=1}^{n} m_i\,\Delta x_i$ and we define the upper sum of $f$ with respect to $P$ to be $U(f,P) := \sum_{i=1}^{n} M_i\,\Delta x_i$.

Because the supremum and the infimum in Definition 5.13 need not be assumed by $f$, lower and upper sums need not be Riemann sums of $f$. However, the next result shows that the Riemann sums are "trapped" between appropriate upper and lower sums.

Lemma 5.14 Let $f : [a,b] \to \mathbb{R}$ be bounded and let $P = \{a = x_0, \ldots, x_n = b\}$ be a partition of $[a,b]$. Then for all associated evaluation sets $T = \{t_1, \ldots, t_n\}$ the inequalities $L(f,P) \le R(f,P,T) \le U(f,P)$ hold.

Proof. Because $t_i \in [x_{i-1}, x_i]$ for all $i \in \{1, \ldots, n\}$ we have $m_i \le f(t_i) \le M_i$ for all $i \in \{1, \ldots, n\}$. Hence, $\sum_{i=1}^{n} m_i\,\Delta x_i \le \sum_{i=1}^{n} f(t_i)\,\Delta x_i \le \sum_{i=1}^{n} M_i\,\Delta x_i$, as claimed. ∎

To prove that a function is Riemann integrable, we would need to show that lower and upper sums get closer to a certain limit as the norm of the partition tends to zero. The following monotonicity properties are a first step in this direction.

Definition 5.15 Let $P, Q$ be partitions of $[a,b]$. Then $Q$ is called a refinement of $P$ iff $P \subseteq Q$.
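Definition 5.13, Lemma 5.14 and Definition 5.15 can be illustrated with a small Python sketch. The code below rests on an assumption that is not part of the definitions: the function is nondecreasing, so that $m_i = f(x_{i-1})$ and $M_i = f(x_i)$ can be read off at the endpoints. The names `lower_upper_increasing` and `riemann_sum`, as well as the particular partitions, are hypothetical choices made for the illustration.

```python
from fractions import Fraction


def lower_upper_increasing(f, partition):
    """L(f, P) and U(f, P) for a nondecreasing f.

    Assumption: f is nondecreasing, so the infimum on [x_{i-1}, x_i] is
    f(x_{i-1}) and the supremum is f(x_i).
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper


def riemann_sum(f, partition, tags):
    """R(f, P, T): sum of f(t_i) * (x_i - x_{i-1})."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(partition, partition[1:], tags))


def square(x):
    return x * x


# P is a partition of [0, 1]; Q adds two points and is a refinement of P.
P = [Fraction(0), Fraction(1, 2), Fraction(1)]
Q = sorted(set(P) | {Fraction(1, 4), Fraction(3, 4)})
T = [Fraction(1, 4), Fraction(3, 4)]  # one evaluation point per interval of P

LP, UP = lower_upper_increasing(square, P)
LQ, UQ = lower_upper_increasing(square, Q)
RP = riemann_sum(square, P, T)

print("L(f,P) <= R(f,P,T) <= U(f,P):", LP, RP, UP)
print("L(f,Q), U(f,Q) for the refinement Q:", LQ, UQ)
```

For $f(x) = x^2$ on $[0,1]$ the Riemann sum lies between the lower and upper sum of $P$, and passing to the refinement $Q$ raises the lower sum and lowers the upper sum.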
Lemma 5.16 Let $f : [a,b] \to \mathbb{R}$ be bounded and let $P, Q$ be partitions of $[a,b]$ so that $Q$ is a refinement of $P$. Then $L(f,P) \le L(f,Q) \le U(f,Q) \le U(f,P)$.

Proof. From Lemma 5.14, we know that $L(f,Q) \le U(f,Q)$. Therefore we can focus on the other two inequalities. Let $\{z_1, \ldots, z_k\} := Q \setminus P$. Then with $Q_0 := P$ and $Q_j := Q_{j-1} \cup \{z_j\}$ for $j = 1, \ldots, k$ we have that each $Q_j$ is a refinement of $Q_{j-1}$ and $Q_k = Q$. Thus we are done if we can prove the outer inequalities for $Q = P \cup \{z\}$ for some $z \in [a,b] \setminus P$. We will only prove $L(f,P) \le L(f,Q)$, leaving the other inequality to Exercise 5-13. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ and let $j \in \{1, \ldots, n\}$ be such that $z \in [x_{j-1}, x_j]$. Then
\[ L(f,P) = \sum_{i \ne j} m_i\,\Delta x_i + m_j (z - x_{j-1}) + m_j (x_j - z) \le \sum_{i \ne j} m_i\,\Delta x_i + \inf\{ f(x) : x \in [x_{j-1}, z] \}\,(z - x_{j-1}) + \inf\{ f(x) : x \in [z, x_j] \}\,(x_j - z) = L(f,Q). \]
∎

Lemma 5.17 Let $f : [a,b] \to \mathbb{R}$ be bounded and let $P, Q$ be partitions of $[a,b]$. Then $L(f,Q) \le U(f,P)$.

Proof. Because $P \cup Q$ is a refinement of $P$ and of $Q$, the proof follows from Lemma 5.16: $L(f,Q) \le L(f, P \cup Q) \le U(f, P \cup Q) \le U(f,P)$. ∎

Turning to continuous functions, we note that continuity guarantees that for every point $x$ and every $\varepsilon > 0$, there is an interval $(x - \delta, x + \delta)$ so that the inequality $\sup\{ f(z) : z \in (x-\delta, x+\delta) \} - \inf\{ f(z) : z \in (x-\delta, x+\delta) \} < \varepsilon$ holds. With this difference expected to be small, the difference between upper and lower sums of continuous functions should also become small as the partitions become finer. Unfortunately, while the norm of a partition is given for the whole interval, the $\delta$ may vary from point to point. To prove that continuous functions are Riemann integrable, we need to show that for any $\varepsilon > 0$ we can choose a $\delta$ as above that works at every point of the closed interval $[a,b]$. This is the idea behind uniform continuity.

Definition 5.18 Let $J \subseteq \mathbb{R}$ be an interval. The function $f : J \to \mathbb{R}$ is called uniformly continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $u, v \in J$ with $|u - v| < \delta$ we have $|f(u) - f(v)| < \varepsilon$.
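As a short worked illustration of Definition 5.18 (an example chosen here, not taken from the text), consider $f(x) = x^2$ on $J = [0,1]$. Given $\varepsilon > 0$, the single choice $\delta := \frac{\varepsilon}{2}$ works at every point of $J$, because $|u + v| \le 2$ for all $u, v \in [0,1]$:

```latex
\[
  |f(u) - f(v)| = |u^2 - v^2| = |u + v|\,|u - v| \le 2\,|u - v| < 2\delta = \varepsilon
  \qquad \text{for all } u, v \in [0,1] \text{ with } |u - v| < \delta .
\]
```

The point of the definition is visible in the estimate: $\delta$ depends on $\varepsilon$ only, not on $u$ or $v$.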
Lemma 5.19 Let $f : [a,b] \to \mathbb{R}$ be continuous. Then $f$ is uniformly continuous.

Proof. Suppose for a contradiction that $f$ is not uniformly continuous. Then there is an $\varepsilon > 0$ such that for all $\delta > 0$ there are $u, v \in [a,b]$ such that $|u - v| < \delta$ and $|f(u) - f(v)| \ge \varepsilon$. Therefore, for all $k \in \mathbb{N}$ with $\delta_k := \frac{1}{k}$ we can find numbers $u_k, v_k \in [a,b]$ so that $|u_k - v_k| < \frac{1}{k}$ and $|f(u_k) - f(v_k)| \ge \varepsilon$. By the Bolzano-Weierstrass Theorem, the bounded sequence $\{u_k\}_{k=1}^{\infty}$ has a subsequence $\{u_{k_m}\}_{m=1}^{\infty}$ that converges to a $t \in [a,b]$. Because for all $m \in \mathbb{N}$ the inequality $|u_{k_m} - v_{k_m}| < \frac{1}{k_m}$ holds, we conclude that $\lim_{m\to\infty} u_{k_m} = \lim_{m\to\infty} v_{k_m} = t$. But then, because $f$ is continuous, $\lim_{m\to\infty} f(u_{k_m}) = f(t) = \lim_{m\to\infty} f(v_{k_m})$, which means $\lim_{m\to\infty} |f(u_{k_m}) - f(v_{k_m})| = 0$, a contradiction to $|f(u_k) - f(v_k)| \ge \varepsilon$ for all natural numbers $k$. ∎

Uniform continuity was the last piece in the puzzle to establish that continuous functions on closed and bounded intervals are Riemann integrable.

Theorem 5.20 Let $f : [a,b] \to \mathbb{R}$ be continuous. Then $f$ is Riemann integrable.

Proof. By Theorem 3.44, there is a real number $M$ such that $f(x) \le M$ for all $x \in [a,b]$. Then for all partitions $P$ of $[a,b]$ we have that $L(f,P) \le M(b-a)$. Set $I := \sup\{ L(f,P) : P \text{ is a partition of } [a,b] \}$. Because by Lemma 5.17 for any partitions $P, Q$ of $[a,b]$ we have $L(f,Q) \le U(f,P)$, we infer $L(f,P) \le I \le U(f,P)$ for all partitions $P$ of $[a,b]$. By Lemma 5.14 for all evaluation sets $T$, we have $L(f,P) \le R(f,P,T) \le U(f,P)$, so by Exercise 1-13, for all partitions $P$ of $[a,b]$ the inequality $|R(f,P,T) - I| \le U(f,P) - L(f,P)$ holds.

Now let $\varepsilon > 0$. By Lemma 5.19, there is a $\delta > 0$ such that for all $x, y \in [a,b]$ with $|x - y| < \delta$ we have $|f(x) - f(y)| < \frac{\varepsilon}{b-a}$. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be any partition of $[a,b]$ with $\|P\| < \delta$.
Because $f$ is continuous on $[x_{i-1}, x_i]$, by Theorem 3.44 and Exercise 3-38 there are $t_i^L, t_i^U \in [x_{i-1}, x_i]$ so that $f(t_i^L) = m_i$ and $f(t_i^U) = M_i$. Therefore, for all evaluation sets $T$,
\[ |R(f,P,T) - I| \le U(f,P) - L(f,P) = \sum_{i=1}^{n} \bigl( f(t_i^U) - f(t_i^L) \bigr)\,\Delta x_i < \sum_{i=1}^{n} \frac{\varepsilon}{b-a}\,\Delta x_i = \varepsilon. \]
By Definition 5.5 $f$ is Riemann integrable with $\int_a^b f(x)\,dx = I$. ∎

We conclude this section by proving that for continuous functions the “right” choice of an evaluation point will produce the value of the integral. Proposition 5.21 is a lemma needed to establish the result, which is stated in Theorem 5.22.

Proposition 5.21 Let $f, g : [a,b] \to \mathbb{R}$ be Riemann integrable. If $f(x) \le g(x)$ for all $x \in [a,b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

Proof. Let $\varepsilon > 0$ and let $\delta > 0$ be such that for all partitions $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ with $\|P\| < \delta$ and all evaluation sets $T = \{t_1, \ldots, t_n\}$ we have
\[ \left| \int_a^b f(x)\,dx - \sum_{i=1}^{n} f(t_i)\,\Delta x_i \right| < \frac{\varepsilon}{2} \quad\text{and}\quad \left| \int_a^b g(x)\,dx - \sum_{i=1}^{n} g(t_i)\,\Delta x_i \right| < \frac{\varepsilon}{2}. \]
Then with $P$ being a partition with $\|P\| < \delta$ and $T$ being an evaluation set we obtain
\[ \int_a^b f(x)\,dx < \sum_{i=1}^{n} f(t_i)\,\Delta x_i + \frac{\varepsilon}{2} \le \sum_{i=1}^{n} g(t_i)\,\Delta x_i + \frac{\varepsilon}{2} < \int_a^b g(x)\,dx + \varepsilon, \]
which proves the result (see Standard Proof Technique 2.7). ∎

Theorem 5.22 Mean Value Theorem for the Integral. Let $f : [a,b] \to \mathbb{R}$ be continuous. Then there is a $c \in (a,b)$ so that $\int_a^b f(x)\,dx = f(c)(b-a)$.

Proof. Let $M := \max\{ f(x) : x \in [a,b] \}$, $m := \min\{ f(x) : x \in [a,b] \}$ and let $x_m, x_M \in [a,b]$ be such that $f(x_m) = m$ and $f(x_M) = M$. Then the constant functions $k_m(x) := m$ and $k_M(x) := M$ are continuous, and hence Riemann integrable over $[a,b]$. Moreover, $k_m(x) \le f(x) \le k_M(x)$ for all $x \in [a,b]$ and it is easy to show (see Exercise 5-8) that if $k$ is a constant, then $\int_a^b k\,dx = k(b-a)$. Therefore by Proposition 5.21 we obtain $m(b-a) \le \int_a^b f(x)\,dx \le M(b-a)$. But then
\[ f(x_m) = m \le \frac{\int_a^b f(x)\,dx}{b-a} \le M = f(x_M) \]
and by the Intermediate Value Theorem there must be a $c$ between $x_m$ and $x_M$ so that $f(c) = \frac{\int_a^b f(x)\,dx}{b-a}$. ∎

Exercises

5-13. Finish the proof of Lemma 5.16 by proving that if $f : [a,b] \to \mathbb{R}$ is bounded, $P$ is a partition of $[a,b]$ and $Q = P \cup \{z\}$ for some $z \in [a,b] \setminus P$, then $U(f,Q) \le U(f,P)$.

5-14. Prove that $f : (0,1] \to \mathbb{R}$ defined by $f(x) = \frac{1}{x}$ is continuous but not uniformly continuous. Then explain why this example is not a contradiction to Lemma 5.19.

5-15. A function $f : [a,b] \to \mathbb{R}$ is called Lipschitz continuous iff there is an $L > 0$ such that for all $x, y \in [a,b]$ the inequality $|f(x) - f(y)| \le L|x - y|$ holds.
(a) Prove that if $f : [a,b] \to \mathbb{R}$ is Lipschitz continuous, then $f$ is uniformly continuous.
(b) Prove that if $f : (c,d) \to \mathbb{R}$ is differentiable and $f'$ is bounded on $[a,b] \subset (c,d)$, then $f$ is Lipschitz continuous on $[a,b]$.
(c) Prove that $f(x) := \sqrt{x}$ is uniformly continuous on $[0,1]$, but not Lipschitz continuous.

5-16. Prove that if $f : [a,b] \to [0,\infty)$ is a continuous nonnegative function with $\int_a^b f(x)\,dx = 0$, then $f(x) = 0$ for all $x \in [a,b]$.

5-17. Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $\{P_k\}_{k=1}^{\infty}$ be a sequence of partitions with

5-18. Prove that if $f : [a,b] \to [0,\infty)$ is a nonnegative Riemann integrable function with $\int_a^b f(x)\,dx > 0$, then there are an $\varepsilon > 0$ and $c < d$ so that $f(x) \ge \varepsilon$ for all $x \in [c,d]$.
Hint. For a contradiction, suppose the opposite. Use lower sums to show the integral would be zero.

5-19. Let $f : [a,b] \to \mathbb{R}$ be bounded and let $g : [a,b] \to \mathbb{R}$ be nondecreasing. With $m_i$ and $M_i$ as in Definition 5.13, let $L_g(f,P) := \sum_{i=1}^{n} m_i\,\Delta g_i$ be the lower sum and let $U_g(f,P) := \sum_{i=1}^{n} M_i\,\Delta g_i$ be the upper sum of $f$ with respect to $g$ and $P$.
(a) Let $P, Q$ be partitions of $[a,b]$ so that $Q$ is a refinement of $P$. Prove that $L_g(f,P) \le L_g(f,Q) \le U_g(f,Q) \le U_g(f,P)$.
(b) Prove that if $f$ is continuous, then $f$ is Riemann-Stieltjes integrable with respect to $g$.
(c) Let $f_1, f_2 : [a,b] \to \mathbb{R}$ be Riemann-Stieltjes integrable with respect to $g$. Prove that if $f_1(x) \le f_2(x)$ for all $x \in [a,b]$, then $\int_a^b f_1\,dg \le \int_a^b f_2\,dg$.
(d) Mean Value Theorem for Riemann-Stieltjes Integrals. Let $f : [a,b] \to \mathbb{R}$ be continuous. Prove that there is a $c \in (a,b)$ so that $\int_a^b f\,dg = f(c)\bigl(g(b) - g(a)\bigr)$.
(e) Let $c \in (a,b)$ and let $g := 1_{[c,b]}$.
  i. Prove that if $f : [a,b] \to \mathbb{R}$ is continuous at $c$, then $f$ is Riemann-Stieltjes integrable with respect to $g$ and $\int_a^b f\,d1_{[c,b]} = f(c)$.
  ii. Prove that if $f : [a,b] \to \mathbb{R}$ is not continuous at $c$, then $f$ is not Riemann-Stieltjes integrable with respect to $g$.
(f) Prove that if $f$ is continuous and $g$ is continuously differentiable, then, with the integral on the right being a Riemann integral, we have $\int_a^b f\,dg = \int_a^b f(x) g'(x)\,dx$.
Note. Integrals against $dg$ are often used to abbreviate Riemann integrals as on the right side.

5.3 The Fundamental Theorem of Calculus

The Fundamental Theorem of Calculus connects differential and integral calculus by stating that differentiation and integration are essentially inverses of each other.

Theorem 5.23 Fundamental Theorem of Calculus, Antiderivative Form. Let $[a,b]$ be contained in $(c,d)$ and let $F : (c,d) \to \mathbb{R}$ be a differentiable function whose derivative $f$ is Riemann integrable on $[a,b]$. Then
\[ \int_a^b f(x)\,dx = F(b) - F(a) =: F(x)\Big|_a^b. \]

Proof. Because $f$ is Riemann integrable, the integral exists. Hence, we can use Lemma 5.6 to prove the equation. For $k \in \mathbb{N}$, let $P_k = \left\{ a = x_0^{(k)} < \cdots < x_{n_k}^{(k)} = b \right\}$ be a partition with $\|P_k\| < \frac{1}{k}$. By the Mean Value Theorem (see Theorem 4.18), for each $k \in \mathbb{N}$ and each $i \in \{1, \ldots, n_k\}$ there is a point $t_i^{(k)} \in \left[ x_{i-1}^{(k)}, x_i^{(k)} \right]$ such that the equation
\[ f\left(t_i^{(k)}\right) = F'\left(t_i^{(k)}\right) = \frac{F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right)}{x_i^{(k)} - x_{i-1}^{(k)}} \]
holds. Therefore we obtain $f\left(t_i^{(k)}\right) \Delta x_i^{(k)} = F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right)$. Let $T_k := \left\{ t_i^{(k)} : i = 1, \ldots, n_k \right\}$. Then $T_k$ is an evaluation set for $P_k$ and by Lemma 5.6 we conclude via a telescoping sum
\[ \int_a^b f(x)\,dx = \lim_{k\to\infty} R(f, P_k, T_k) = \lim_{k\to\infty} \sum_{i=1}^{n_k} f\left(t_i^{(k)}\right) \Delta x_i^{(k)} = \lim_{k\to\infty} \sum_{i=1}^{n_k} \left[ F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right) \right] = F(b) - F(a), \]
which proves the result. ∎
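The telescoping argument in the proof of Theorem 5.23 can be observed numerically. The following Python sketch is an illustration under stated assumptions: it fixes the concrete choice $F(x) = x^3$, $f(x) = 3x^2$ on an interval $[a,b]$ with $0 \le a < b$, for which the Mean Value Theorem point has the closed form $t = \sqrt{(x_i^2 + x_i x_{i-1} + x_{i-1}^2)/3}$. The names `mvt_point` and `mvt_riemann_sum` are hypothetical.

```python
import math


def F(x):
    """Antiderivative used for the illustration."""
    return x ** 3


def f(x):
    """Derivative of F."""
    return 3 * x ** 2


def mvt_point(x0, x1):
    """Point t in [x0, x1] with f(t) = (F(x1) - F(x0)) / (x1 - x0).

    Assumes 0 <= x0 < x1, so that 3 t^2 = x1^2 + x1*x0 + x0^2 has its
    nonnegative solution inside [x0, x1].
    """
    return math.sqrt((x1 * x1 + x1 * x0 + x0 * x0) / 3)


def mvt_riemann_sum(a, b, n):
    """Riemann sum of f over the uniform partition, evaluated at MVT points."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(mvt_point(x0, x1)) * (x1 - x0) for x0, x1 in zip(xs, xs[1:]))


value = mvt_riemann_sum(0.0, 2.0, 7)
print(value, F(2.0) - F(0.0))
```

Because every summand equals $F(x_i) - F(x_{i-1})$, the sum agrees with $F(b) - F(a)$ up to floating-point rounding for every number of subintervals, not only in the limit.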
Definition 5.24 Because of the Fundamental Theorem of Calculus, if $F$ and $f$ are functions with $F' = f$, then $F$ will also be called an indefinite integral of $f$, denoted $F = \int f(x)\,dx$.

One way to compute Riemann integrals is to evaluate an indefinite integral at the upper and lower bound and to compute the difference.

The hypothesis that the derivative of $F$ is Riemann integrable feels quite artificial. Nonetheless, this hypothesis is best possible. Exercise 12-24 will exhibit a differentiable function whose derivative is bounded, but not Riemann integrable. Although this example is a bit pathological, it points out a weakness of the Riemann integral that motivates the development of the Lebesgue integral. The Antiderivative Form of the Fundamental Theorem of Calculus for the Lebesgue integral, which has no artificial looking hypotheses, will be proved in Exercise 23-8. The Derivative Form of the Fundamental Theorem of Calculus is proved in Theorem 8.17 for the Riemann integral and in Exercise 18-6 for the Lebesgue integral.

Exercises

5-20. Power Rule for integration. Let $r \in \mathbb{Q} \setminus \{-1\}$ and let $a < b$. In case $r < 0$, let $a$ and $b$ either both be positive or both be negative. Prove that $\int_a^b x^r\,dx = \frac{1}{r+1} b^{r+1} - \frac{1}{r+1} a^{r+1}$. Then explain why we needed to require $a$ and $b$ to be both positive or both negative for $r < 0$.

5-21. Integration by Parts. Let $[a,b] \subset (c,d)$ and let $F, g : (c,d) \to \mathbb{R}$ be continuously differentiable with derivatives $f$ and $g'$. Prove that $\int_a^b f(x) g(x)\,dx = F(b)g(b) - F(a)g(a) - \int_a^b F(x) g'(x)\,dx$.

5-22. Integration by Substitution. Let $[a,b] \subset (c,d)$, let $g : (c,d) \to \mathbb{R}$ be continuously differentiable with derivative $g'$ and let $F$ be continuously differentiable with derivative $f$ such that the domain of $F$ contains $g\bigl[[a,b]\bigr]$. Prove that $\int_a^b f\bigl(g(x)\bigr) g'(x)\,dx = F\bigl(g(b)\bigr) - F\bigl(g(a)\bigr)$.

5-23.
What would we need to prove so that the hypothesis that the derivatives are continuous in Exercises 5-21 and 5-22 can be replaced with the hypothesis that the derivatives are Riemann integrable?

5.4 The Darboux Integral

Lemma 5.6 is an efficient tool to establish properties of the Riemann integral, provided all functions involved are Riemann integrable. It is now time to look for a similarly efficient criterion to prove that a function is Riemann integrable. Riemann’s Condition below is inspired by the idea of trapping the Riemann sums between lower and upper sums, which was already used in the proof that continuous functions are Riemann integrable. Riemann’s Condition is simpler to verify than Definition 5.5 of Riemann integrability, because, for $\varepsilon > 0$, instead of working with all partitions of sufficiently small norm and all evaluation sets, we only need to find one partition so that the upper and lower sums are closer together than $\varepsilon$. The price is paid in the proof, where for the “$\Leftarrow$” direction, our only tool is one partition and we must prove something for all partitions of sufficiently small norm.

Theorem 5.25 Riemann’s Condition. Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then $f$ is Riemann integrable on $[a,b]$ iff for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f,P) - L(f,P) < \varepsilon$.

Proof. For “$\Leftarrow$,” let $f : [a,b] \to \mathbb{R}$ be bounded and such that for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f,P) - L(f,P) < \varepsilon$. By Lemma 5.17 the set $B := \{ L(f,P) : P \text{ is a partition of } [a,b] \}$ is bounded above. Let $C := \sup B$ and let $\varepsilon > 0$. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be a partition of $[a,b]$ such that $U(f,P) - L(f,P) < \frac{\varepsilon}{2}$. Then for all refinements $Q$ of $P$ and all evaluation sets $T$ for $Q$, Lemmas 5.14 and 5.16 imply
\[ |R(f,Q,T) - C| \le U(f,Q) - L(f,Q) \le U(f,P) - L(f,P) < \frac{\varepsilon}{2}. \]

To show that $f$ is Riemann integrable, let $M := \sup\{ |f(x)| : x \in [a,b] \}$ and let $\delta := \min\left\{ \frac{\varepsilon}{4n(M+1)}, \frac{\Delta x_1}{3}, \ldots, \frac{\Delta x_n}{3} \right\}$. We will now show that for all partitions $S$ with $\|S\| < \delta$ and all associated evaluation sets $T_S$ we have $|R(f,S,T_S) - C| < \varepsilon$.

Let $S = \left\{ a = x_0^S < x_1^S < \cdots < x_{n_S}^S = b \right\}$ be any partition of $[a,b]$ with $\|S\| < \delta$. Then $Q := S \cup P$ is a refinement of $P$. Therefore, for any evaluation set $T$ for $Q$ we have $|R(f,Q,T) - C| < \frac{\varepsilon}{2}$. Moreover, by choice of $\delta$, every interval $\left[ x_{i-1}^S, x_i^S \right]$ contains at most one point of $P$ that is not in $S$ and any two intervals $\left[ x_{j-1}^S, x_j^S \right]$ that contain such a point do not intersect. Let $T_S$ be any evaluation set for $S$ and let $T$ be an evaluation set for $Q$ that contains $T_S$. Then $T \setminus T_S = \{ t_{i_1}, \ldots, t_{i_k} \}$ with $k < n$, because the addition of at most $n - 1$ points to $S$ that are all in distinct intervals creates at most $n - 1$ intervals that do not already have an evaluation point assigned. Each of these at most $n - 1$ intervals has length less than $\delta$ and on it the two Riemann sums use function values that differ by at most $2M$. Therefore
\[ |R(f,Q,T) - R(f,S,T_S)| \le (n-1) \cdot 2M\delta \le 2(n-1)M \frac{\varepsilon}{4n(M+1)} < \frac{\varepsilon}{2} \]
and hence
\[ |R(f,S,T_S) - C| \le |R(f,S,T_S) - R(f,Q,T)| + |R(f,Q,T) - C| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Thus $f$ is Riemann integrable and its Riemann integral is $C$.

For “$\Rightarrow$,” let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $\varepsilon > 0$. Then there are a number $I$ and a $\delta > 0$ such that for all partitions $P$ with $\|P\| < \delta$ and all evaluation sets $T$ we have that $|R(f,P,T) - I| < \frac{\varepsilon}{4}$. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be such a partition and for each $i \in \{1, \ldots, n\}$ find $t_i^L, t_i^U \in [x_{i-1}, x_i]$ such that $f\left(t_i^L\right) \le m_i + \frac{\varepsilon}{4n\,\Delta x_i}$ and $f\left(t_i^U\right) \ge M_i - \frac{\varepsilon}{4n\,\Delta x_i}$. Let $T^L := \left\{ t_i^L : i = 1, \ldots, n \right\}$ and $T^U := \left\{ t_i^U : i = 1, \ldots, n \right\}$. Then
\[ 0 \le U(f,P) - L(f,P) \le \left[ U(f,P) - R\left(f,P,T^U\right) \right] + \left| R\left(f,P,T^U\right) - I \right| + \left| I - R\left(f,P,T^L\right) \right| + \left[ R\left(f,P,T^L\right) - L(f,P) \right] \]
\[ < \sum_{i=1}^{n} \frac{\varepsilon}{4n\,\Delta x_i}\,\Delta x_i + \frac{\varepsilon}{2} + \sum_{i=1}^{n} \frac{\varepsilon}{4n\,\Delta x_i}\,\Delta x_i = \frac{\varepsilon}{4} + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} = \varepsilon, \]
which was to be proved. ∎
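Riemann’s Condition asks for a single partition with $U(f,P) - L(f,P) < \varepsilon$. The Python sketch below shows how such a partition can be found in one special situation, which is an assumption of the illustration and not a hypothesis of Theorem 5.25: for a nondecreasing $f$ and the uniform partition with $n$ subintervals, the gap telescopes to $\bigl(f(b) - f(a)\bigr)\frac{b-a}{n}$. The function names `gap_uniform` and `partition_size_for` are hypothetical.

```python
import math
from fractions import Fraction


def gap_uniform(f, a, b, n):
    """U(f, P) - L(f, P) on the uniform partition with n subintervals.

    Assumption: f is nondecreasing, so M_i - m_i = f(x_i) - f(x_{i-1}).
    """
    xs = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    return sum((f(x1) - f(x0)) * (x1 - x0) for x0, x1 in zip(xs, xs[1:]))


def partition_size_for(f, a, b, eps):
    """Smallest n with (f(b) - f(a)) * (b - a) / n < eps (f nondecreasing)."""
    total = (f(b) - f(a)) * (b - a)
    return math.floor(total / eps) + 1


def cube(x):
    return x ** 3


a, b, eps = Fraction(0), Fraction(2), Fraction(1, 100)
n = partition_size_for(cube, a, b, eps)
print(n, gap_uniform(cube, a, b, n))
```

For $f(x) = x^3$ on $[0,2]$ and $\varepsilon = \frac{1}{100}$ the sketch selects $n = 1601$, and the resulting gap $\frac{16}{1601}$ is below $\varepsilon$, so this one partition already certifies integrability in the sense of Theorem 5.25.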
Riemann’s Condition shows that if the lower and upper sums of a function $f$ can get arbitrarily close to each other, then $f$ is Riemann integrable. The only way this cannot happen is if the function oscillates too much, that is, if the function is highly discontinuous.

Example 5.26 The Dirichlet function
\[ f(x) = \begin{cases} 0; & \text{for } x \in [0,1] \setminus \mathbb{Q}, \\ 1; & \text{for } x \in [0,1] \cap \mathbb{Q}, \end{cases} \]
is not Riemann integrable on $[0,1]$.

All lower sums of the Dirichlet function are zero and all upper sums are 1. By Theorem 5.25 the Dirichlet function is not Riemann integrable.

It is natural to ask now how much oscillation or discontinuity there can be without losing Riemann integrability. Moreover, there are further important results about Riemann integrals that could be presented here. Theorem 5.25 could be used to prove these results, but the proofs would involve substantial work with Riemann sums. We will avoid this work by first using Theorem 5.25 to prove the Lebesgue criterion for Riemann integrability in Theorem 8.12. This criterion makes the proofs of further results about Riemann integrals, presented in Section 8.3, more effective. Moreover, the Lebesgue criterion will answer how discontinuous a Riemann integrable function can be.

To formulate the Lebesgue criterion, we need to introduce the Lebesgue measure of a set, which is fundamental for more advanced analysis. To define Lebesgue measure we need “infinite summations” (introduced in Chapter 6) and some set theoretical notions of “size” for sets (introduced in Chapter 7). In Chapter 8, we will continue where we leave off here.

To conclude this chapter, we note that the approximation of the area under a function with lower and upper sums is also known as Darboux integration. To acquaint the reader with the language involved, we formally introduce Darboux integration, which is equivalent to Riemann integration.

Definition 5.27 Let $f : [a,b] \to \mathbb{R}$ be a bounded function.
We define
\[ \mathcal{L}_f := \sup\{ L(f,P) : P \text{ is a partition of } [a,b] \}, \qquad \mathcal{U}_f := \inf\{ U(f,P) : P \text{ is a partition of } [a,b] \}. \]
$\mathcal{L}_f$ is also called the lower integral of $f$ and $\mathcal{U}_f$ is also called the upper integral of $f$. We will say that $f$ is Darboux integrable on $[a,b]$ iff $\mathcal{L}_f = \mathcal{U}_f$. In this case $\mathcal{L}_f = \mathcal{U}_f$ is also called the Darboux integral of $f$.

Proposition 5.28 Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then $\mathcal{L}_f \le \mathcal{U}_f$.

Proof. Easy consequence of Lemma 5.17 (Exercise 5-24). ∎

Theorem 5.29 Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then $f$ is Darboux integrable on $[a,b]$ iff $f$ is Riemann integrable on $[a,b]$. Moreover, the Darboux and Riemann integrals are equal in this case, that is, $\int_a^b f(x)\,dx = \mathcal{L}_f = \mathcal{U}_f$.

Proof. For “$\Rightarrow$,” let $f : [a,b] \to \mathbb{R}$ be Darboux integrable and let $\varepsilon > 0$. Then $\mathcal{U}_f = \mathcal{L}_f$ and there are partitions $Q$ and $R$ of $[a,b]$ such that $\mathcal{L}_f - \frac{\varepsilon}{2} < L(f,Q)$ and $U(f,R) < \mathcal{U}_f + \frac{\varepsilon}{2}$. Now $P := Q \cup R$ is a refinement of $Q$ and $R$ and by Lemma 5.16 we infer
\[ \mathcal{L}_f - \frac{\varepsilon}{2} < L(f,Q) \le L(f,P) \le \mathcal{L}_f = \mathcal{U}_f \le U(f,P) \le U(f,R) < \mathcal{U}_f + \frac{\varepsilon}{2} = \mathcal{L}_f + \frac{\varepsilon}{2}, \]
and hence $U(f,P) - L(f,P) < \varepsilon$. By Theorem 5.25 $f$ is Riemann integrable.

For “$\Leftarrow$,” let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then by Theorem 5.25 for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f,P) - L(f,P) < \varepsilon$. Hence, for any $\varepsilon > 0$ we infer $\mathcal{U}_f - \mathcal{L}_f \le U(f,P) - L(f,P) < \varepsilon$, which means $\mathcal{U}_f = \mathcal{L}_f$.

The “moreover” part follows upon examination of the “$\Leftarrow$” part of the proof of Theorem 5.25, which also shows that the Riemann integral of $f$ is equal to $\mathcal{L}_f$. ∎

Exercises

5-24. Prove Proposition 5.28.

5-25. Let $f, g : [a,b] \to \mathbb{R}$ be Darboux integrable. Prove that then $f + g$ also is Darboux integrable and $\mathcal{L}_{f+g} = \mathcal{L}_f + \mathcal{L}_g$.

5-26. Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable on $[a,b]$.
(a) For each $n \in \mathbb{N}$, let $P_n = \left\{ a = x_0^{(n)} < \cdots < x_{k_n}^{(n)} = b \right\}$ be a partition of the interval $[a,b]$ with $\|P_n\| < \frac{1}{n}$ and let
\[ s_n := \sum_{i=1}^{k_n - 1} m_i^{(n)} 1_{\left[ x_{i-1}^{(n)}, x_i^{(n)} \right)} + m_{k_n}^{(n)} 1_{\left[ x_{k_n - 1}^{(n)}, x_{k_n}^{(n)} \right]}, \quad\text{where } m_i^{(n)} = \inf\left\{ f(x) : x \in \left[ x_{i-1}^{(n)}, x_i^{(n)} \right] \right\}. \]
  i. Prove that for all $n \in \mathbb{N}$ and all $x \in [a,b]$ we have $s_n(x) \le f(x)$.
  ii. Prove that if $f$ is continuous at $x \in [a,b]$, then $\{ s_n(x) \}_{n=1}^{\infty}$ converges to $f(x)$.
  iii. Prove that $\lim_{n\to\infty} \int_a^b |f - s_n|\,dx = 0$.
(b) Prove that there is a sequence $\{c_n\}_{n=1}^{\infty}$ of continuous functions on $[a,b]$ such that for all $n \in \mathbb{N}$ we have $|c_n| \le |f|$ and so that $\lim_{n\to\infty} \int_a^b |f - c_n|\,dx = 0$.

5-27. Prove Riemann’s Condition for Riemann-Stieltjes integrals. That is, let $f : [a,b] \to \mathbb{R}$ be bounded, let $g : [a,b] \to \mathbb{R}$ be nondecreasing and prove that $f$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$ iff for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U_g(f,P) - L_g(f,P) < \varepsilon$.

Chapter 6

Series of Real Numbers I

Series facilitate the computation of “sums with infinitely many terms.” This first introduction mostly showcases the results needed to define outer Lebesgue measure (see Definition 8.1). Further results about series will be presented in Chapter 10.

6.1 Series as a Vehicle To Define Infinite Sums

Of course it is impossible to add infinitely many numbers. But we can consider the convergence behavior of a sequence of finite sums.

Definition 6.1 Let $\{a_j\}_{j=1}^{\infty}$ be a sequence of real numbers. The partial sums of the series $\sum_{j=1}^{\infty} a_j$ of real numbers are defined to be $s_n := \sum_{j=1}^{n} a_j$. The series is said to converge iff the sequence of partial sums $\left\{ \sum_{j=1}^{n} a_j \right\}_{n=1}^{\infty}$ converges and it is said to diverge otherwise. For a convergent series, the limit is usually denoted $\sum_{j=1}^{\infty} a_j$.

Series that start at numbers other than 1 are defined similarly (Exercise 6-1).

This section is devoted to introductory examples and some fundamental results. Geometric series are the prototypical examples of convergent series. Because there is a simple formula for the partial sums, geometric series nicely fit the mold of Definition 6.1.
Figure 15 gives an indication why geometric series are called “geometric” and Exercise 6-2 gives some examples of computations.

Theorem 6.2 Geometric sums and geometric series. Let $a, q \in \mathbb{R}$ and let $q \ne 1$. Then for all $n \in \mathbb{N}$ the summation formula $\sum_{j=1}^{n} a q^j = aq\,\frac{1 - q^n}{1 - q}$ holds. Moreover, for all real numbers $|q| < 1$ the series $\sum_{j=1}^{\infty} a q^j$ converges and $\sum_{j=1}^{\infty} a q^j = \frac{aq}{1-q}$.

[Figure 15: $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots = 1$. The “sum” of infinitely many terms can be finite. Each rectangle inside the square above is one-half of the size of the previous rectangle and the sum of all their areas should be the area of the square.]

Proof. The first statement follows from the following telescoping sum argument.
\[ (1 - q) \sum_{j=1}^{n} a q^j = \sum_{j=1}^{n} a q^j - \sum_{j=1}^{n} a q^{j+1} = \sum_{j=1}^{n} a q^j - \sum_{j=2}^{n+1} a q^j = aq - a q^{n+1}. \]
For $|q| < 1$, note that $\sum_{j=1}^{\infty} a q^j = \lim_{n\to\infty} \sum_{j=1}^{n} a q^j = \lim_{n\to\infty} aq\,\frac{1 - q^n}{1 - q} = \frac{aq}{1-q}$. ∎

Series with nonnegative terms play an important role in analysis. Their partial sums are nondecreasing. Therefore, by the Monotone Sequence Theorem, whenever the terms of a series are nonnegative and the partial sums are bounded, the series converges.

Lemma 6.3 For all $j \in \mathbb{N}$, let $a_j \ge 0$. Then the series $\sum_{j=1}^{\infty} a_j$ converges iff the sequence of the partial sums $\left\{ \sum_{j=1}^{n} a_j \right\}_{n=1}^{\infty}$ is bounded above.

Proof. For “$\Rightarrow$,” recall that by Proposition 2.34 any convergent sequence is bounded, and hence, in particular, it is bounded above.

For “$\Leftarrow$,” let $\sum_{j=1}^{\infty} a_j$ be a series such that all $a_j$ are nonnegative and such that the sequence $\left\{ \sum_{j=1}^{n} a_j \right\}_{n=1}^{\infty}$ of its partial sums is bounded above. Because for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n+1} a_j - \sum_{j=1}^{n} a_j = a_{n+1} \ge 0$, the sequence of the partial sums is nondecreasing. Thus by the Monotone Sequence Theorem $\left\{ \sum_{j=1}^{n} a_j \right\}_{n=1}^{\infty}$ must converge, and hence the series converges. ∎

Series can easily be added, subtracted, and multiplied by numbers.
Multiplication of series is more complicated and is therefore deferred to Theorem 10.14. c c 00 Theorem 6.4 Let x) bj be convergent series of real numbers and let c be j=1 j=1 a real numbel: Then the following hold. aj and c c c 00 1. The series aj c c 00 + bj converges and aj j=l j=1 00 2. The series j=1 33 c bj . aj j=l 00 aj - j=1 00 caj converges and 00 00 aj - bj = j=l 00 c +c c c j=1 00 aj - bj converges and j=l 3. The series + bj = bj. j=1 c 00 caj = c j=l aj . j=1 Proof. To prove part 1, note that 00 n x C a j+ c b j j=1 = n-oo lim c a j j=1 n j=l = n + n+m lim n I1 c b j =)i-ncaj + c b j j=l j=l j=l 30 lim c a j + b j = c a j + b j , n+w j=l j=l where in each step the existence of the quantity on the left side of the equal sign implies c c c +c 00 c+ x c 00 cc 00 aj and the existence of the quantity on the right side. Hence, if j=1 bj converge, j=1 00 bj . j=l j=1 j=1 j=1 The remaining parts are left to the reader as Exercises 6-3a and 6-3b. then aj bj converges and we have aj +bj = aj 6. Series of Real Numbers I 104 Similar to the algebraic operations, comparabilities can be moved through the summations. c c c c cc w Theorem 6.5 Let cc 00 Then j=1 aj 5 j=1 bj be convergent series so that a j 5 bj for all j E a j and N. j=1 bj. j=1 Proof. Exercise 6-4. The most concrete example of series in our daily experience is so fundamental, it is often not even recognized as a series. Geometric series are the foundation for the decimal expansion of real numbers. Formally, decimal expansions are an identification of numbers with sequences of integers. The connection is made via series with nonnegative terms as shown below. Proposition 6.6 Let D be the set of all sequences { d j ) p of integers in the set (of digits) (0, 1, 2, 3 , 4 , 5 , 6 , 7 ,8,9) such that there is a k E &=with dk # 0 and so that for every n E N there is an m 2 n so that d,,, # 9. Then for each { d j ) g l E D the series O0 d. converges and the function r : D + (0, 1) is bijective. 
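Here $r\bigl(\{d_j\}_{j=1}^{\infty}\bigr) := \sum_{j=1}^{\infty} \frac{d_j}{10^j}$. For every $x \in (0, 1)$, we call $r^{-1}(x)$ the decimal expansion of $x$.

Proof. Let $\{d_j\}_{j=1}^{\infty} \in D$. For all $n \in \mathbb{N}$, we infer the inequalities
\[ 0 \leq \sum_{j=1}^{n} \frac{d_j}{10^j} \leq \sum_{j=1}^{n} \frac{9}{10^j} = \frac{9}{10} \cdot \frac{1 - \left(\frac{1}{10}\right)^n}{1 - \frac{1}{10}} < \frac{9}{10} \cdot \frac{1}{1 - \frac{1}{10}} = 1. \]
Because the terms $\frac{d_j}{10^j}$ are nonnegative, Lemma 6.3 guarantees that $\sum_{j=1}^{\infty} \frac{d_j}{10^j}$ converges. Moreover, because at least one $d_k$ is positive and not all $d_k$ are equal to nine, we infer that $0 < \sum_{j=1}^{\infty} \frac{d_j}{10^j} < 1$ for all $\{d_j\}_{j=1}^{\infty} \in D$.

To prove that $r$ is injective, let $\{c_j\}_{j=1}^{\infty} \neq \{d_j\}_{j=1}^{\infty}$ be sequences in $D$. Let $n \in \mathbb{N}$ be the smallest natural number so that $c_n \neq d_n$ and assume without loss of generality that $c_n > d_n$. Then
\[ r\bigl(\{c_j\}_{j=1}^{\infty}\bigr) - r\bigl(\{d_j\}_{j=1}^{\infty}\bigr) = \sum_{j=n}^{\infty} \frac{c_j - d_j}{10^j} \geq \frac{1}{10^n} + \sum_{j=n+1}^{\infty} \frac{c_j - d_j}{10^j} > \frac{1}{10^n} - \sum_{j=n+1}^{\infty} \frac{9}{10^j} = \frac{1}{10^n} - \frac{1}{10^n} \sum_{j=1}^{\infty} \frac{9}{10^j} = \frac{1}{10^n} - \frac{1}{10^n} = 0, \]
where the strict inequality holds because not all $d_j$ with $j > n$ are equal to nine. This implies that $r\bigl(\{c_j\}_{j=1}^{\infty}\bigr) \neq r\bigl(\{d_j\}_{j=1}^{\infty}\bigr)$.

For surjectivity, let $x \in (0, 1)$ and construct $\{d_j\}_{j=1}^{\infty}$ recursively as follows. Set the digit $d_0 := 0$. Once $d_n$ has been defined so that $0 \leq x - \sum_{j=1}^{n} \frac{d_j}{10^j} < \frac{1}{10^n}$, let $d_{n+1}$ be the largest integer $k$ so that $0 \leq x - \sum_{j=1}^{n} \frac{d_j}{10^j} - \frac{k}{10^{n+1}}$. Then $d_{n+1}$ is at most 9 and the inequalities $0 \leq x - \sum_{j=1}^{n+1} \frac{d_j}{10^j} < \frac{1}{10^{n+1}}$ hold. The series $\sum_{j=1}^{\infty} \frac{d_j}{10^j}$ converges to $x$, because for all $n \in \mathbb{N}$ we have by construction $0 \leq x - \sum_{j=1}^{n} \frac{d_j}{10^j} < \frac{1}{10^n}$. Because $x > 0$, not all digits are zero. Finally, for each $n \in \mathbb{N}$ there must be an $m > n$ with $d_m \neq 9$, because otherwise we would have that
\[ x - \sum_{j=1}^{n} \frac{d_j}{10^j} = \sum_{j=n+1}^{\infty} \frac{9}{10^j} = \frac{1}{10^n}, \]
which cannot be, because the left side is strictly less than $\frac{1}{10^n}$.

In particular, the representation in Proposition 6.6 can be used to convert infinite repeating decimals into fractions (see Exercise 6-5).

Of course not every series converges. For example, if all terms are equal to 1, the sum is infinite. The limit test below is a criterion for divergence, because it says (Exercise 6-6) that if the terms do not go to zero, the series diverges.

Theorem 6.7 Limit Test.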
If the series $\sum_{j=1}^{\infty} a_j$ converges, then $\lim_{j\to\infty} a_j = 0$.

Proof. Note that if $\sum_{j=1}^{\infty} a_j$ converges, then
\[ \lim_{n\to\infty} a_n = \lim_{n\to\infty} \Bigl( \sum_{j=1}^{n} a_j - \sum_{j=1}^{n-1} a_j \Bigr) = \sum_{j=1}^{\infty} a_j - \sum_{j=1}^{\infty} a_j = 0. \]

If Theorem 6.7 were a biconditional, the theory of series would be very easy indeed. The next example shows that this is not the case.

Example 6.8 The harmonic series $\sum_{j=1}^{\infty} \frac{1}{j}$ diverges, even though the terms converge to zero.

The partial sums $s_n = \sum_{j=1}^{n} \frac{1}{j}$ form an increasing sequence. If we can show that a subsequence $\{s_{n_i}\}_{i=1}^{\infty}$ goes to infinity, then the sequence $\{s_n\}_{n=1}^{\infty}$ diverges. We choose $n_i = 2^i$. Grouping the terms into blocks whose lengths double, we obtain
\[ s_{2^i} = 1 + \sum_{k=0}^{i-1} \sum_{j=2^k+1}^{2^{k+1}} \frac{1}{j} \geq 1 + \sum_{k=0}^{i-1} 2^k \cdot \frac{1}{2^{k+1}} = 1 + \frac{i}{2}. \]
Because the $s_{2^i}$ go to infinity, the sequence of partial sums diverges, which means that $\sum_{j=1}^{\infty} \frac{1}{j}$ diverges.

We will be confronted with “infinite sums” like the harmonic series throughout measure theory. Therefore it is useful to formally define infinite sums.

Definition 6.9 Let $\sum_{j=1}^{\infty} a_j$ be a series. We write $\sum_{j=1}^{\infty} a_j = \infty$ and call it infinite or an infinite sum iff $\lim_{n\to\infty} \sum_{j=1}^{n} a_j = \infty$. We write $\sum_{j=1}^{\infty} a_j = -\infty$ iff $\lim_{n\to\infty} \sum_{j=1}^{n} a_j = -\infty$.

Exercises

6-1. Let $k \in \mathbb{Z}$ and for each $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Define the series $\sum_{j=-\infty}^{k} a_j$ and $\sum_{j=k}^{\infty} a_j$.

6-2. Compute the value of each of the series below.

6-3. More on the arithmetic of series (Theorem 6.4)
(a) Prove part 2 of Theorem 6.4.
(b) Prove part 3 of Theorem 6.4.
(c) Give an example of two series $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ such that $\sum_{j=1}^{\infty} (a_j + b_j)$ converges, but neither $\sum_{j=1}^{\infty} a_j$ nor $\sum_{j=1}^{\infty} b_j$ converges.
(d) Is it possible to find two series $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ such that $\sum_{j=1}^{\infty} (a_j + b_j)$ converges, and exactly one of $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ diverges?
(e) Is there a series $\sum_{j=1}^{\infty} a_j$ and a $c \in \mathbb{R}$ such that $\sum_{j=1}^{\infty} c a_j$ converges, and $\sum_{j=1}^{\infty} a_j$ diverges?

6-4. Prove Theorem 6.5.

6-5. Convert each of the following infinite repeating decimals into a fraction.
A bar over a set of digits means that these digits repeat indefinitely. (a) 0.25 (b) 0.25 (c) 0.9462 (d) 0.1473 (e) 12.004%

6-6. Why does the limit test say that if $\lim_{j\to\infty} a_j \neq 0$, then the series $\sum_{j=1}^{\infty} a_j$ diverges?

6-7. Another proof for the summation formula in Theorem 6.2. Let $q \neq 1$ and $a$ be real numbers. Prove by induction that $\sum_{j=1}^{n} a q^j = aq\,\frac{1-q^n}{1-q}$.

6-8. $2^k$ test. Let $\{a_j\}_{j=1}^{\infty}$ be a nonincreasing sequence with nonnegative terms. Prove that $\sum_{j=1}^{\infty} a_j$ converges iff $\sum_{k=1}^{\infty} 2^k a_{2^k}$ converges.
Hint. Use Example 6.8 as guidance.

6-9. Prove that $\sum_{j=1}^{\infty} (-1)^{j+1} \frac{1}{j}$ converges by showing that the partial sums $s_n$ form a Cauchy sequence ($|s_n - s_m| \leq \frac{1}{m} + \frac{1}{n}$). Be careful to distinguish all possible combinations of $n, m$ being even or odd.

6-10. Translating between sequences and series. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Prove that $\{a_n\}_{n=1}^{\infty}$ converges iff the series $\sum_{j=1}^{\infty} (a_{j+1} - a_j)$ converges and that in this case we obtain the limit as $\lim_{n\to\infty} a_n = a_1 + \sum_{j=1}^{\infty} (a_{j+1} - a_j)$.

6-11. Use Lemma 6.3 to prove the Monotone Sequence Theorem.

6-12. Let $\sum_{j=1}^{\infty} a_j$ be a convergent series, let $\{j_k\}_{k=0}^{\infty}$ be a strictly increasing sequence of natural numbers with $j_0 = 1$ and for all $k \in \mathbb{N}$ let $A_k := \sum_{j=j_{k-1}}^{j_k - 1} a_j$. Prove that $\sum_{k=1}^{\infty} A_k$ converges and $\sum_{k=1}^{\infty} A_k = \sum_{j=1}^{\infty} a_j$.

6.2 Absolute Convergence and Unconditional Convergence

For series, there is no simple condition that is equivalent to convergence. Therefore we need criteria that imply convergence. This section presents the criteria that are needed to define and work with outer Lebesgue measure. More criteria will be presented in Section 10.2.

Cauchy sequences play an important role throughout analysis, so it is natural to relate the convergence of series to Cauchy sequences.

Proposition 6.10 Cauchy Criterion. A series $\sum_{j=1}^{\infty} a_j$ converges iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \geq m \geq N$ we have $\Bigl| \sum_{j=m}^{n} a_j \Bigr| < \varepsilon$.

Proof.
The series $\sum_{j=1}^{\infty} a_j$ converges iff the sequence $\{s_n\}_{n=1}^{\infty}$ of its partial sums converges, which by Theorem 2.27 is the case iff it is a Cauchy sequence. This is the case iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \geq m \geq N$ the inequality $|s_n - s_{m-1}| < \varepsilon$ holds. Since $\Bigl| \sum_{j=m}^{n} a_j \Bigr| = |s_n - s_{m-1}|$, we have proved the result.

The Alternating Series Test shows that there are indeed many convergent series.

Theorem 6.11 Alternating Series Test. Let $\{b_j\}_{j=1}^{\infty}$ be a nonincreasing nonnegative sequence such that $\lim_{j\to\infty} b_j = 0$. Then $\sum_{j=1}^{\infty} (-1)^{j+1} b_j$ converges.

Proof. Let $\varepsilon > 0$ be arbitrary. There is an $N \in \mathbb{N}$ so that for all $n \geq N$ we have $b_n < \frac{\varepsilon}{2}$. Let $m > n \geq N$ and let $k$ be the largest integer with $n + 2k + 1 \leq m$. Pairing consecutive terms, where the last term $b_m$ remains unpaired exactly when $m - n$ is even, we obtain
\[ \Bigl| \sum_{j=n}^{m} (-1)^{j+1} b_j \Bigr| = \Bigl| \sum_{i=0}^{m-n} (-1)^{i} b_{n+i} \Bigr| \leq \sum_{i=0}^{k} \bigl( b_{n+2i} - b_{n+2i+1} \bigr) + b_m \leq \sum_{i=0}^{k} \bigl( b_{n+2i} - b_{n+2(i+1)} \bigr) + b_m = b_n - b_{n+2(k+1)} + b_m \leq b_n + b_m < \varepsilon, \]
where we use $b_{n+2i+1} \geq b_{n+2(i+1)}$ and a telescoping sum. By the Cauchy Criterion we conclude that the series converges.

In many situations we will work with series of nonnegative numbers. If negative summands occur, it is natural to take absolute values and hope the sum still converges. This is not always the case, but the idea of absolute convergence is fundamental.

Definition 6.12 A series $\sum_{j=1}^{\infty} a_j$ converges absolutely iff the series $\sum_{j=1}^{\infty} |a_j|$ converges.

Absolute convergence is a strictly stronger condition than convergence. That is, absolutely convergent series converge, but the converse is not true.

Proposition 6.13 If the series $\sum_{j=1}^{\infty} a_j$ converges absolutely, then it converges. Moreover, the triangular inequality $\Bigl| \sum_{j=1}^{\infty} a_j \Bigr| \leq \sum_{j=1}^{\infty} |a_j|$ holds.

Proof. Let $\varepsilon > 0$. Because $\sum_{j=1}^{\infty} a_j$ converges absolutely, $\sum_{j=1}^{\infty} |a_j|$ converges. Thus there is an $N \in \mathbb{N}$ so that for all $n \geq m \geq N$ we have that $\sum_{j=m}^{n} |a_j| < \varepsilon$. But then for all $n \geq m \geq N$ we obtain $\Bigl| \sum_{j=m}^{n} a_j \Bigr| \leq \sum_{j=m}^{n} |a_j| < \varepsilon$. Therefore by the Cauchy Criterion the series $\sum_{j=1}^{\infty} a_j$ converges.

Moreover, for all $n \in \mathbb{N}$ we have $\Bigl| \sum_{j=1}^{n} a_j \Bigr| \leq \sum_{j=1}^{n} |a_j| \leq \sum_{j=1}^{\infty} |a_j|$, which implies the triangular inequality.
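The estimates used in the proofs of the Alternating Series Test and of Proposition 6.13 can be explored on a concrete sequence. The sketch below is illustrative only: the choice $b_j = \frac{1}{j}$, the sample index ranges, and the function names are assumptions of this example. It checks that a block $\sum_{j=n}^{m} (-1)^{j+1} b_j$ is bounded in absolute value by $b_n + b_m$, that partial sums satisfy the triangular inequality, and that the partial sums of the absolute values at the indices $2^i$ are at least $1 + \frac{i}{2}$, so they are not bounded above.

```python
from fractions import Fraction

def b(j):
    """Nonincreasing nonnegative sequence with limit zero (sample choice b_j = 1/j)."""
    return Fraction(1, j)

def alternating_block(n, m):
    """Return sum_{j=n}^{m} (-1)**(j+1) * b(j) in exact arithmetic."""
    return sum((-1) ** (j + 1) * b(j) for j in range(n, m + 1))

def absolute_block(n, m):
    """Return sum_{j=n}^{m} |(-1)**(j+1) * b(j)| = sum_{j=n}^{m} b(j)."""
    return sum(b(j) for j in range(n, m + 1))

# Cauchy-type bound for blocks of the alternating series.
for n, m in [(5, 6), (10, 25), (100, 137)]:
    assert abs(alternating_block(n, m)) <= b(n) + b(m)

# Triangular inequality for partial sums.
partial = alternating_block(1, 1000)
absolute = absolute_block(1, 1000)
assert abs(partial) <= absolute

# The absolute values have unbounded partial sums: s_(2^i) >= 1 + i/2.
for i in range(0, 11):
    assert absolute_block(1, 2 ** i) >= 1 + Fraction(i, 2)

print(float(partial), float(absolute))
```

With this choice of $b_j$ the alternating partial sums stay in a narrow range while the partial sums of the absolute values keep growing, which is the behavior that separates convergence from absolute convergence.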
Example 6.14 By the Alternating Series Test, the series $\sum_{j=1}^{\infty} (-1)^{j+1} \frac{1}{j}$ converges. However, $\sum_{j=1}^{\infty} \Bigl| (-1)^{j+1} \frac{1}{j} \Bigr| = \sum_{j=1}^{\infty} \frac{1}{j} = \infty$, so it does not converge absolutely.

Absolute convergence of a series is often established with the Comparison Test.

Theorem 6.15 Comparison Test. Let $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ be series with $0 \leq a_j \leq b_j$ for all $j \in \mathbb{N}$. If $\sum_{j=1}^{\infty} b_j$ converges, then $\sum_{j=1}^{\infty} a_j$ converges, too.

Proof. The sequence $\bigl\{ \sum_{j=1}^{n} a_j \bigr\}_{n=1}^{\infty}$ is a nondecreasing sequence and for all $n$ we have $\sum_{j=1}^{n} a_j \leq \sum_{j=1}^{n} b_j \leq \sum_{j=1}^{\infty} b_j < \infty$. This means that $\bigl\{ \sum_{j=1}^{n} a_j \bigr\}_{n=1}^{\infty}$ is bounded above by $\sum_{j=1}^{\infty} b_j$. By the Monotone Sequence Theorem, the series $\sum_{j=1}^{\infty} a_j$ converges.

The Comparison Test allows us to establish convergence and divergence for series that can be compared to series with known convergence behavior. For example, $\sum_{j=1}^{\infty} \frac{1}{2^j j^2}$ converges because $\frac{1}{2^j j^2} \leq \left( \frac{1}{2} \right)^j$. Similarly, $\sum_{j=1}^{\infty} \frac{1}{\sqrt{j}}$ diverges because $\frac{1}{\sqrt{j}} \geq \frac{1}{j}$. We will use the Comparison Test extensively in Section 10.2. Moreover, it should be noted that the Comparison Test can also be applied when there is a $k \in \mathbb{N}$ so that the comparability $0 \leq a_j \leq b_j$ holds only for $j \geq k$ (Exercise 6-16).

We conclude this section with the subtle, but nonetheless necessary, notion of unconditional convergence. Assume that the terms of a series are provided in no specific order. (This is the case in the definition of outer Lebesgue measure.) Then we could add the terms in any order we choose. It would be catastrophic if the value of this summation depended on the order in which we sum the terms. Unfortunately, for arbitrary series this can actually happen. For example, $\sum_{j=1}^{\infty} (-1)^{j+1} \frac{1}{j}$ converges. However, a rearrangement can destroy the convergence. Consider that $\sum_{j=1}^{\infty} \frac{1}{2j+1} = \infty$ and $\sum_{j=1}^{\infty} (-1) \frac{1}{2j} = -\infty$ (Exercise 6-13).
Suppose we first sum enough odd numbered terms to get a number $> 1$, say, $\frac{1}{1} + \frac{1}{3} > 1$, then add the first even numbered term, so we get $\frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right)$, then add enough odd numbered terms to get a number $> 2$, say,
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} > 2, \]
then add the second even numbered term, so we get
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} + \left( -\frac{1}{4} \right), \]
then add enough odd numbered terms to get a number $> 3$, say,
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} + \left( -\frac{1}{4} \right) + \sum_{k=21}^{254} \frac{1}{2k+1} > 3, \]
then add the third even numbered term, so we get
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} + \left( -\frac{1}{4} \right) + \sum_{k=21}^{254} \frac{1}{2k+1} + \left( -\frac{1}{6} \right), \]
and proceed in this fashion indefinitely. This process uses all the available numbers and it produces a sequence of sums that goes to $\infty$. Of course we will need to use more and more odd numbered terms to make up for the one new even numbered term and the increment of 1, but nonetheless, in one arrangement the series sums to a finite number, while in another the sum is infinite. We will make this idea more precise in the proof of Theorem 6.18 and we will push it to its full extent in Exercise 6-23.

Series in which we can rearrange the terms in any way without affecting the “sum” are called unconditionally convergent.

Definition 6.16 The series $\sum_{j=1}^{\infty} a_j$ converges unconditionally iff for all bijective functions $\sigma : \mathbb{N} \to \mathbb{N}$ the series $\sum_{i=1}^{\infty} a_{\sigma(i)}$ converges and $\sum_{i=1}^{\infty} a_{\sigma(i)} = \sum_{j=1}^{\infty} a_j$. If a series converges, but not unconditionally, we will say it converges conditionally.

For series of real numbers, Theorem 6.18 below shows that unconditional convergence and absolute convergence are equivalent. This is very helpful, because it is a lot easier to check whether one series of absolute values converges than to check if all rearranged series converge to the same limit.

Definition 6.17 Let $\sum_{j=1}^{\infty} a_j$ be a series and let $B \subseteq \mathbb{N}$. We define the series $\sum_{j \in B} a_j$ to be the series $\sum_{j=1}^{\infty} b_j$, where $b_j := a_j$ for $j \in B$ and $b_j := 0$ for $j \notin B$.

Theorem 6.18 The series $\sum_{j=1}^{\infty} a_j$ converges absolutely iff it converges unconditionally.

Proof.
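For “⇒,” let $\varepsilon > 0$ and let $N \in \mathbb{N}$ be such that for all $n \geq m \geq N$ we have $\sum_{j=m}^{n} |a_j| < \frac{\varepsilon}{2}$. Then $\sum_{j=N}^{\infty} |a_j| \leq \frac{\varepsilon}{2} < \varepsilon$. Now let $\sigma : \mathbb{N} \to \mathbb{N}$ be an arbitrary bijection. Then there is an $L \in \mathbb{N}$ so that $\{1, \ldots, N-1\} \subseteq \sigma[\{1, \ldots, L\}]$. Hence, for all $m \geq L$ the set $\mathbb{N} \setminus \sigma[\{1, \ldots, m\}]$ is contained in $\{N, N+1, \ldots\}$ and, with sums over subsets of $\mathbb{N}$ as in Definition 6.17, we obtain
\[ \Bigl| \sum_{i=1}^{m} a_{\sigma(i)} - \sum_{j=1}^{\infty} a_j \Bigr| = \Bigl| \sum_{j \in \mathbb{N} \setminus \sigma[\{1, \ldots, m\}]} a_j \Bigr| \leq \sum_{j \in \mathbb{N} \setminus \sigma[\{1, \ldots, m\}]} |a_j| \leq \sum_{j=N}^{\infty} |a_j| \leq \frac{\varepsilon}{2} < \varepsilon, \]
which means $\sum_{i=1}^{\infty} a_{\sigma(i)} = \sum_{j=1}^{\infty} a_j$.

For “⇐,” we prove that if $\sum_{j=1}^{\infty} a_j$ does not converge absolutely, then it does not converge unconditionally. There is nothing to prove if $\sum_{j=1}^{\infty} a_j$ does not converge, so we can assume $\sum_{j=1}^{\infty} a_j$ converges, but not absolutely.

For $j \in \mathbb{N}$, let $a_j^+ := \max\{a_j, 0\}$ and let $a_j^- := -\min\{a_j, 0\}$. We first claim that $\sum_{j=1}^{\infty} a_j^+ = \sum_{j=1}^{\infty} a_j^- = \infty$. Suppose for a contradiction the series $\sum_{j=1}^{\infty} a_j^+$ did converge. Then the series $\sum_{j=1}^{\infty} a_j^- = -\sum_{j=1}^{\infty} (a_j - a_j^+)$ would converge, so $\sum_{j=1}^{\infty} |a_j| = \sum_{j=1}^{\infty} (a_j^+ + a_j^-)$ would converge, and hence $\sum_{j=1}^{\infty} a_j$ would converge absolutely, a contradiction. The case that $\sum_{j=1}^{\infty} a_j^-$ converges is handled similarly.

We now recursively construct a bijection $\sigma : \mathbb{N} \to \mathbb{N}$ so that $\sum_{i=1}^{\infty} a_{\sigma(i)}$ diverges.

Because $\sum_{j=1}^{\infty} a_j^+ = \infty$ there are $n_1 \in \mathbb{N}$ and $\sigma(1) < \sigma(2) < \cdots < \sigma(n_1)$ so that for all $i \in \{1, \ldots, n_1\}$ we have $a_{\sigma(i)} > 0$, so that for all $j \leq \sigma(n_1)$ with $a_j > 0$ we have $j = \sigma(i)$ for some $i \in \{1, \ldots, n_1\}$ and so that $\sum_{i=1}^{n_1 - 1} a_{\sigma(i)} \leq 1 < \sum_{i=1}^{n_1} a_{\sigma(i)}$. Because $\sum_{j=1}^{\infty} a_j^- = \infty$ there are $m_1 \in \mathbb{N}$ and $\sigma(n_1 + 1) < \sigma(n_1 + 2) < \cdots < \sigma(n_1 + m_1)$ so that for all $i \in \{n_1 + 1, \ldots, n_1 + m_1\}$ we have $a_{\sigma(i)} \leq 0$, so that for all $j \leq \sigma(n_1 + m_1)$ with $a_j \leq 0$ we have $j = \sigma(i)$ for some $i \in \{n_1 + 1, \ldots, n_1 + m_1\}$ and so that $\sum_{i=1}^{n_1 + m_1 - 1} a_{\sigma(i)} > 1 \geq \sum_{i=1}^{n_1 + m_1} a_{\sigma(i)}$.

For the recursive step, let $n_0 := m_0 := 0$ and assume that $n_1 < \cdots < n_k$, $m_1, \ldots, m_k \in \mathbb{N}$ and distinct $\sigma(1), \ldots, \sigma(n_k + m_k) \in \mathbb{N}$ have been chosen as follows. For all $i \in P_k := \bigcup_{p=1}^{k} \{n_{p-1} + m_{p-1} + 1, \ldots, n_p\}$ we have $a_{\sigma(i)} > 0$, for all $j \leq \sigma(n_k)$ with $a_j > 0$ we have $j = \sigma(i)$ for some $i \in P_k$ and the sums satisfy the inequalities $\sum_{i=1}^{n_k - 1} a_{\sigma(i)} \leq k < \sum_{i=1}^{n_k} a_{\sigma(i)}$.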
Moreover, for all $i \in N_k := \bigcup_{p=1}^{k} \{n_p + 1, \ldots, n_p + m_p\}$ we have $a_{\sigma(i)} \leq 0$, for all $j \leq \sigma(n_k + m_k)$ with $a_j \leq 0$ we have $j = \sigma(i)$ for some $i \in N_k$ and the sums satisfy the inequalities $\sum_{i=1}^{n_k + m_k - 1} a_{\sigma(i)} > k \geq \sum_{i=1}^{n_k + m_k} a_{\sigma(i)}$.

Choose natural numbers $\sigma(n_k + m_k + 1) < \cdots < \sigma(n_{k+1})$ greater than $\sigma(n_k)$ so that for all $i \in P_{k+1} := \bigcup_{p=1}^{k+1} \{n_{p-1} + m_{p-1} + 1, \ldots, n_p\}$ we have $a_{\sigma(i)} > 0$, for all $j \leq \sigma(n_{k+1})$ with $a_j > 0$ we have $j = \sigma(i)$ for some $i \in P_{k+1}$ and the inequalities $\sum_{i=1}^{n_{k+1} - 1} a_{\sigma(i)} \leq k + 1 < \sum_{i=1}^{n_{k+1}} a_{\sigma(i)}$ hold. (This can be done because $\sum_{j=1}^{\infty} a_j^+ = \infty$.) Because $k \geq \sum_{i=1}^{n_k + m_k} a_{\sigma(i)}$ we infer $n_{k+1} > n_k + m_k$.

Choose the natural numbers $\sigma(n_{k+1} + 1) < \cdots < \sigma(n_{k+1} + m_{k+1})$ greater than $\sigma(n_k + m_k)$ such that for all indices $i \in N_{k+1} := \bigcup_{p=1}^{k+1} \{n_p + 1, \ldots, n_p + m_p\}$ we have $a_{\sigma(i)} \leq 0$, for all $j \leq \sigma(n_{k+1} + m_{k+1})$ with $a_j \leq 0$ we have $j = \sigma(i)$ for some $i \in N_{k+1}$ and the sums satisfy the inequalities $\sum_{i=1}^{n_{k+1} + m_{k+1} - 1} a_{\sigma(i)} > k + 1 \geq \sum_{i=1}^{n_{k+1} + m_{k+1}} a_{\sigma(i)}$. (This can be done because $\sum_{j=1}^{\infty} a_j^- = \infty$.) Because $k + 1 < \sum_{i=1}^{n_{k+1}} a_{\sigma(i)}$ we infer $m_{k+1} \geq 1$.

Then $\sigma : \mathbb{N} \to \mathbb{N}$ as constructed is injective by construction, surjective because in every recursive step at least one more positive term and at least one more nonpositive term of $\sum_{j=1}^{\infty} a_j$ are added to the set $\{a_{\sigma(1)}, \ldots, a_{\sigma(n_k + m_k)}\}$, and the construction shows $\bigl\{ \sum_{i=1}^{n} a_{\sigma(i)} \bigr\}_{n=1}^{\infty}$ is unbounded, and hence divergent.

Standard Proof Technique 6.19 In the proof of “⇒” of Theorem 6.18, our estimates only yielded a nonstrict inequality $\sum_{j=N}^{\infty} |a_j| \leq \ldots$. By making the sum less than or equal to $\frac{\varepsilon}{2}$ we made it strictly less than $\varepsilon$. Because $\varepsilon$ can be replaced with $\frac{\varepsilon}{2}$ in any argument, in analysis it is usually not a problem if an estimate only yields a nonstrict inequality $\leq \varepsilon$ rather than a strict inequality $< \varepsilon$.

Sometimes we need to partition an infinite sum into infinitely many “chunks” (see the proof of part 3 of Theorem 8.6).
For this type of rearrangement, we use double series. For brevity’s sake, we limit ourselves to double series of nonnegative terms in Proposition 6.22.

Definition 6.20 For $i, j \in \mathbb{N}$, we define the ordered pair $(i, j) := \{i, \{i, j\}\}$ and we define $\mathbb{N} \times \mathbb{N} = \{(i, j) : i, j \in \mathbb{N}\}$. A (doubly indexed) family of numbers $\{a_{(i,j)}\}_{i,j=1}^{\infty}$ is a function from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{R}$.

The definition of ordered pairs guarantees that the order in which the numbers are listed matters (Exercise 6-14). Definition 6.20 also indicates that it is time to investigate set theory in more detail. We will do so in the next chapter.

Definition 6.21 Let $\{a_{(i,j)}\}_{i,j=1}^{\infty}$ be a doubly indexed family of numbers. The double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ is called convergent iff for all $i \in \mathbb{N}$ the series $\sum_{j=1}^{\infty} a_{(i,j)}$ converges and furthermore the series $\sum_{i=1}^{\infty} \Bigl( \sum_{j=1}^{\infty} a_{(i,j)} \Bigr)$ also converges. We will also denote the double series as well as its limit by $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{i,j}$ instead of $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$.

Proposition 6.22 shows that for double series of nonnegative numbers the order of summation is immaterial.

Proposition 6.22 Let $\{a_{(i,j)}\}_{i,j=1}^{\infty}$ be a family of nonnegative numbers. Then the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ converges iff for all bijections $\sigma : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ the sum $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{\sigma(i,j)}$ converges. Furthermore, in this case the values are equal.

Proof. The direction “⇐” is trivial, because we can choose $\sigma(i, j) := (i, j)$. To prove “⇒,” let $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ be convergent and let $\sigma : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a bijection.

Let $i \in \mathbb{N}$. Then for any $m \in \mathbb{N}$ we have $\sum_{j=1}^{m} a_{\sigma(i,j)} \leq \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} a_{(k,j)}$. Hence, by Lemma 6.3 each series $\sum_{j=1}^{\infty} a_{\sigma(i,j)}$ converges.

For the convergence of the rearranged double series to $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$, let $\varepsilon > 0$ be arbitrary but fixed, and let $m \in \mathbb{N}$. Then for all $i \in \{1, \ldots, m\}$ there is an $N_i$ so that $\sum_{j=N_i+1}^{\infty} a_{\sigma(i,j)} < \frac{\varepsilon}{m}$.
Let $N := \max\{N_i : i \in \{1, \ldots, m\}\}$. Then
\[ \sum_{i=1}^{m} \sum_{j=1}^{\infty} a_{\sigma(i,j)} < \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{N} a_{\sigma(i,j)} + \frac{\varepsilon}{m} \Bigr) = \sum_{i=1}^{m} \sum_{j=1}^{N} a_{\sigma(i,j)} + \varepsilon \leq \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)} + \varepsilon. \]
Because $\varepsilon$ was arbitrary, for all $m \in \mathbb{N}$ the inequality
\[ \sum_{i=1}^{m} \sum_{j=1}^{\infty} a_{\sigma(i,j)} \leq \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)} \]
holds. In particular, this means that the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{\sigma(i,j)}$ converges and
\[ \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{\sigma(i,j)} \leq \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}. \]
To show that the inequality actually is an equation, note that the roles of the double series can be reversed in the above argument to arrive at the reversed inequality (see Exercise 6-24). Therefore, we conclude that $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{\sigma(i,j)} = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$, which completes the proof.

We will revisit the summation of double series that also have negative terms with more sophisticated tools in Exercise 14-50.

Exercises

6-13. Prove that $\sum_{j=1}^{\infty} \frac{1}{2j+1} = \infty$ and $\sum_{j=1}^{\infty} (-1) \frac{1}{2j} = -\infty$.

6-14. Let $i, j, m, n \in \mathbb{N}$. Prove that $(i, j) = (m, n)$ iff $i = m$ and $j = n$.

6-15. Cauchy Criterion for absolute convergence. Prove that a series $\sum_{j=1}^{\infty} a_j$ converges absolutely iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \geq m \geq N$ we have $\sum_{j=m}^{n} |a_j| < \varepsilon$.

6-16. Comparison Test revisited. Let $k \in \mathbb{N}$ and let $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ be series with $0 \leq a_j \leq b_j$ for all $j \geq k$. Prove that if $\sum_{j=1}^{\infty} b_j$ converges, then $\sum_{j=1}^{\infty} a_j$ converges, too.

6-17. Prove that a series $\sum_{j=1}^{\infty} a_j$ converges absolutely iff the sequence $\bigl\{ \sum_{j=1}^{n} |a_j| \bigr\}_{n=1}^{\infty}$ is bounded.

6-18. Prove that the sum of two absolutely convergent series also converges absolutely.

6-19. Give an example of an absolutely convergent series $\sum_{j=1}^{\infty} a_j$ so that $\Bigl| \sum_{j=1}^{\infty} a_j \Bigr| \neq \sum_{j=1}^{\infty} |a_j|$.

6-20. Let $\sum_{j=1}^{\infty} a_j$ be an absolutely convergent series and let $\{b_j\}_{j=1}^{\infty}$ be a bounded sequence. Prove that $\sum_{j=1}^{\infty} a_j b_j$ converges absolutely.

6-21. Determine which of the following series converges. If it converges, determine if it converges absolutely.

6-22. Limit Comparison Test for series. Let $\sum_{j=1}^{\infty} a_j$ and $\sum_{j=1}^{\infty} b_j$ be series with positive terms.
Prove that if $\lim_{j\to\infty} \frac{a_j}{b_j} = c > 0$, then either both series converge or both series diverge.
Hint. For $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $j \geq N$ we have $(c - \varepsilon) b_j \leq a_j \leq (c + \varepsilon) b_j$.

6-23. Let $\sum_{j=1}^{\infty} a_j$ be a conditionally convergent series and let $z \in \mathbb{R}$. Prove that there is a bijective function $\sigma : \mathbb{N} \to \mathbb{N}$ such that $\sum_{j=1}^{\infty} a_{\sigma(j)} = z$.
Hint. Mimic the proof of the “⇐” part of Theorem 6.18. Oscillate about $z$ rather than increasing with $k$. Use that the limit of the terms is zero and that the distance of the partial sum $\sum_{j=1}^{n} a_{\sigma(j)}$ to $z$ ultimately is always bounded by some $|a_m|$ with $m$ being large.

6-24. Reverse the roles of $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{\sigma(i,j)}$ and $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ as indicated in the proof of Proposition 6.22.

6-25. Let $\{a_{(i,j)}\}_{i,j=1}^{\infty}$ be a family of real numbers so that the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{(i,j)}|$ converges. Prove that $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ converges to a number $L$ and for all bijections $\sigma : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ the sum $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{\sigma(i,j)}$ converges to the same number $L$.

Chapter 7 Some Set Theory

Up to this point our development of analysis had minimal need for details of set theory. However, to define the outer Lebesgue measure of a set (see Definition 8.1), we will cover the set with an infinite family of intervals and sum the lengths. To be able to compute the sum as a series, we can use at most as many intervals as there are natural numbers. Surprisingly enough this is not trivial, because infinite sets can come in different sizes. This chapter presents the fundamentals on arbitrary families of sets and on the notion of size for sets. These ideas will be needed immediately for outer Lebesgue measure and they will be useful throughout our work with measures. All sophisticated set theory needed in this text is presented in this chapter.

7.1 The Algebra of Sets

Recall from the introduction to Chapter 1 that the terms “set” and “element” remain undefined.
To be able to work with arbitrary families of sets, we need to formally define them and show some fundamental properties. Because every family of sets must consist of subsets of another set, we start with the power set of a set.

Definition 7.1 Let $S$ be a set. The power set $\mathcal{P}(S)$ of $S$ is the set of all subsets of $S$.

Definition 7.2 Let $S$ be a set. A family of sets $\mathcal{C}$ is simply a set of subsets of $S$, that is, $\mathcal{C} \subseteq \mathcal{P}(S)$. An indexed family of sets is denoted $\{C_i\}_{i \in I}$, where it is understood that there is a function from the index set $I$ to $\mathcal{P}(S)$ that maps each $i \in I$ to $C_i$.

So while families of sets do not introduce any new set theoretical ideas, we call them “families of sets” rather than “sets of sets” to indicate that we are now considering objects whose elements are set theoretically more complicated than just elements that cannot be decomposed any further. A family of sets can always be turned into an indexed family of sets by making each set its own index. Conversely, an indexed family can be turned into a nonindexed family by considering the set $\mathcal{C} := \{C_i : i \in I\}$, but this process will collapse sets $C_i = C_j$ with $i \neq j$ into one set in $\mathcal{C}$. In our work, we may need repeated sets, so most of the families of sets we consider will be indexed. Nonetheless, proofs for both kinds of families are very similar and we will present both notations in this section.

Figure 16: Venn diagrams of the intersection (a) and the union (b) of three sets.

The most elementary operations for families of sets are unions and intersections (also see the Venn diagrams in Figure 16).

Definition 7.3 Let $\mathcal{C}$ be a family of sets in $S$. Then we define the union of $\mathcal{C}$ as the set $\bigcup \mathcal{C} := \{x \in S : (\exists C \in \mathcal{C} : x \in C)\}$. If $\{C_i\}_{i \in I}$ is an indexed family, the union is denoted $\bigcup_{i \in I} C_i := \{x \in S : (\exists i \in I : x \in C_i)\}$.

Definition 7.4 Let $\mathcal{C}$ be a family of sets in $S$. Then we define the intersection of $\mathcal{C}$ as the set $\bigcap \mathcal{C} := \{x \in S : (\forall C \in \mathcal{C} : x \in C)\}$.
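If $\{C_i\}_{i \in I}$ is an indexed family, the intersection is $\bigcap_{i \in I} C_i := \{x \in S : (\forall i \in I : x \in C_i)\}$.

The relation between complements (see page 1) and unions/intersections is described by DeMorgan’s Laws.

Theorem 7.5 DeMorgan’s Laws. Let $\mathcal{C}$ and $\{C_i\}_{i \in I}$ be families of sets and let $X$ be another set. Then

1. $X \setminus \bigcup_{i \in I} C_i = \bigcap_{i \in I} X \setminus C_i$ and $X \setminus \bigcup \mathcal{C} = \bigcap \{X \setminus C : C \in \mathcal{C}\}$.

2. $X \setminus \bigcap_{i \in I} C_i = \bigcup_{i \in I} X \setminus C_i$ and $X \setminus \bigcap \mathcal{C} = \bigcup \{X \setminus C : C \in \mathcal{C}\}$.

Proof. Proofs for indexed and nonindexed families are very similar. Hence, we will only present one proof for each part.

For part 1, we will prove the containments $X \setminus \bigcup \mathcal{C} \subseteq \bigcap \{X \setminus C : C \in \mathcal{C}\}$ and $X \setminus \bigcup \mathcal{C} \supseteq \bigcap \{X \setminus C : C \in \mathcal{C}\}$, which (see introduction to Chapter 1) shows the two sets are equal.

For the containment relation $X \setminus \bigcup \mathcal{C} \subseteq \bigcap \{X \setminus C : C \in \mathcal{C}\}$, consider an arbitrary $x \in X \setminus \bigcup \mathcal{C}$. Then $x \notin \bigcup \mathcal{C}$, which means that $x \notin C$ for all $C \in \mathcal{C}$. But then $x \in X \setminus C$ for all $C \in \mathcal{C}$, which means $x \in \bigcap \{X \setminus C : C \in \mathcal{C}\}$. This proves that $X \setminus \bigcup \mathcal{C}$ is contained in $\bigcap \{X \setminus C : C \in \mathcal{C}\}$.

For the reversed containment relation $X \setminus \bigcup \mathcal{C} \supseteq \bigcap \{X \setminus C : C \in \mathcal{C}\}$, consider an arbitrary $x \in \bigcap \{X \setminus C : C \in \mathcal{C}\}$. Then for all $C \in \mathcal{C}$ we have $x \in X \setminus C$, that is, $x \notin C$. But then $x \notin \bigcup \mathcal{C}$, which means $x \in X \setminus \bigcup \mathcal{C}$. This proves that $X \setminus \bigcup \mathcal{C}$ contains $\bigcap \{X \setminus C : C \in \mathcal{C}\}$.

Because each set is contained in the respective other set, the sets must be equal. It is also possible to prove part 1 with a chain of equalities, like part 2 below (Exercise 7-1a). The proof of part 1 for indexed families of sets is similar (Exercise 7-1b).

We prove part 2 for indexed families.
\[ X \setminus \bigcap_{i \in I} C_i = \Bigl\{ x \in X : x \notin \bigcap_{i \in I} C_i \Bigr\} = \{x \in X : (\exists i \in I : x \notin C_i)\} = \{x \in X : (\exists i \in I : x \in X \setminus C_i)\} = \bigcup_{i \in I} X \setminus C_i. \]
It is also possible to prove part 2 with a mutual containment argument as done for part 1 (Exercise 7-1c). The proof of part 2 for (nonindexed) families of sets is similar (Exercise 7-1d).

Standard Proof Technique 7.6 To prove equality of two sets $A$ and $B$ it is common to prove mutual containment, that is, $A \subseteq B$ and $B \subseteq A$.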
This is similar to proving that two numbers are equal by proving that one number is both greater than or equal to and less than or equal to the other one.

Ordered pairs and products also have formal set theoretical definitions.

Definition 7.7 Let $A$ and $B$ be sets and let $a \in A$ and $b \in B$. Then we define the ordered pair $(a, b)$ to be $(a, b) := \{a, \{a, b\}\}$. If $A_1, \ldots, A_n$ are sets and $a_i \in A_i$ for $i = 1, \ldots, n$, we define $(a_1, \ldots, a_n) := \{(a_1, \ldots, a_{n-1}), \{(a_1, \ldots, a_{n-1}), a_n\}\}$ and call it an ordered $n$-tuple.

Definition 7.8 Let $A_1, \ldots, A_n$ be sets. Then the product $A_1 \times \cdots \times A_n$ of these sets is defined to be the set of all ordered $n$-tuples $(a_1, \ldots, a_n)$ so that for all $i = 1, \ldots, n$ we have $a_i \in A_i$. The product is also denoted $\prod_{i=1}^{n} A_i$. Moreover, for $j \in \{1, \ldots, n\}$ we define the natural projection $\pi_{A_j} : \prod_{i=1}^{n} A_i \to A_j$ onto $A_j$ by $\pi_{A_j}(a_1, \ldots, a_n) := a_j$. We will also abbreviate the natural projection onto the $j$th factor as $\pi_j$.

For a visualization of the product of two sets, see Figure 17. The definition of a product is fundamental for the formal definition of functions and relations. Because we will not need this level of detail in this text, these definitions have been relegated to Appendix B.2.

Figure 17: The product of two sets is a “rectangle” with the natural projections $\pi_i$ providing the “coordinates” of an element.

Theorem 7.9 Let $\{C_i\}_{i \in I}$ be a family of sets and let $A$ be a set. Then the equalities $A \times \bigcup_{i \in I} C_i = \bigcup_{i \in I} A \times C_i$ and $A \times \bigcap_{i \in I} C_i = \bigcap_{i \in I} A \times C_i$ hold.

Proof. Left to Exercise 7-2.

The distributive laws express the relationship between unions and intersections.

Theorem 7.10 Distributive laws. Let $\{A_i\}_{i \in I}$ and $\{B_j\}_{j \in J}$ be indexed families of sets. Then
\[ \bigcup_{i \in I} A_i \cap \bigcup_{j \in J} B_j = \bigcup_{(i,j) \in I \times J} A_i \cap B_j \quad \text{and} \quad \bigcap_{i \in I} A_i \cup \bigcap_{j \in J} B_j = \bigcap_{(i,j) \in I \times J} A_i \cup B_j. \]

Proof.
For the first equation let $x \in \bigcup_{i \in I} A_i \cap \bigcup_{j \in J} B_j$. Then there are an $i \in I$ so that $x \in A_i$ and a $j \in J$ so that $x \in B_j$, which means $x \in A_i \cap B_j \subseteq \bigcup_{(i,j) \in I \times J} A_i \cap B_j$. For the reverse inclusion, let $x \in \bigcup_{(i,j) \in I \times J} A_i \cap B_j$. Then there is an $(i, j) \in I \times J$ so that $x \in A_i \cap B_j \subseteq \bigcup_{i \in I} A_i \cap \bigcup_{j \in J} B_j$.

For the second equation, first note that if $x \in \bigcap_{i \in I} A_i \cup \bigcap_{j \in J} B_j$, then $x \in A_i \cup B_j$ for all $(i, j) \in I \times J$. For the reverse inclusion, let $x \in \bigcap_{(i,j) \in I \times J} A_i \cup B_j$. If $x \in A_i$ for all $i \in I$, then $x \in \bigcap_{i \in I} A_i \subseteq \bigcap_{i \in I} A_i \cup \bigcap_{j \in J} B_j$, and if $x \notin A_{i_0}$ for some $i_0 \in I$, then $x \in B_j$ for all $j \in J$, and hence $x \in \bigcap_{j \in J} B_j \subseteq \bigcap_{i \in I} A_i \cup \bigcap_{j \in J} B_j$.

Exercises

7-1. Proving DeMorgan’s Laws.
(a) Prove part 1 of Theorem 7.5 for families of sets using a chain of equalities.
(b) Prove part 1 of Theorem 7.5 for indexed families of sets.
(c) Prove part 2 of Theorem 7.5 for indexed families of sets using a mutual containment argument.
(d) Prove part 2 of Theorem 7.5 for nonindexed families of sets.

7-2. Prove Theorem 7.9. That is, let $\{C_i\}_{i \in I}$ be a family of sets and let $A$ be a set.
(a) Prove that $A \times \bigcup_{i \in I} C_i = \bigcup_{i \in I} A \times C_i$.
(b) Prove that $A \times \bigcap_{i \in I} C_i = \bigcap_{i \in I} A \times C_i$.

7-3. Let $A, B, C$ be sets. Prove each of the following.
(a) $C \setminus (A \cap B) = (C \setminus A) \cup (C \setminus B)$
(b) $C \setminus (A \cup B) = (C \setminus A) \cap (C \setminus B)$
(c) $C \cup (A \cap B) = (C \cup A) \cap (C \cup B)$
(d) $C \cap (A \cup B) = (C \cap A) \cup (C \cap B)$
(e) $(A \setminus B) \cap C = (A \cap C) \setminus (B \cap C)$
(f) If $A, B \subseteq C$, then $A \setminus B = A \cap (C \setminus B)$.

7-4. Give an example of a family of sets $\mathcal{C}$ with $\bigcap \mathcal{C} = \emptyset$ so that any two sets in the family intersect.

7-5. Let $X$ be a set. An algebra is a set of sets $\mathcal{A} \subseteq \mathcal{P}(X)$ such that $\emptyset \in \mathcal{A}$, if $A \in \mathcal{A}$, then $X \setminus A \in \mathcal{A}$, and if $A_j \in \mathcal{A}$ for all $j = 1, \ldots, n$, then $\bigcup_{j=1}^{n} A_j \in \mathcal{A}$.
(a) Prove that if $A_j \in \mathcal{A}$ for all $j = 1, \ldots, n$, then $\bigcap_{j=1}^{n} A_j \in \mathcal{A}$.
(b) Let $X$ be a set. Prove that the power set of $X$ is an algebra.
(c) Let $X$ be a set.
Prove that $\mathcal{A} := \{A \subseteq X : A \text{ or } X \setminus A \text{ is finite}\}$ is an algebra.
(d) Prove that an algebra need not contain countable unions of its elements.

7-6. Let $A \subseteq X$ and $B \subseteq Y$ be sets. Prove that $(X \times Y) \setminus (A \times B) = [(X \setminus A) \times (Y \setminus B)] \cup [(X \setminus A) \times B] \cup [A \times (Y \setminus B)]$.

7-7. Let $A, B, C, D$ be sets. Prove that $(A \times B) \cap (C \times D) = (A \cap C) \times (B \cap D)$.

7-8. Let $f$ be a function whose domain is contained in $X$ and whose range is contained in $Y$.
(a) Prove that $\pi_X[\{(x, y) \in X \times Y : y = f(x)\}]$ is the domain of $f$.
(b) Prove that $\pi_Y[\{(x, y) \in X \times Y : y = f(x)\}]$ is the range of $f$.

7-9. Unions and intersections versus functions. Let $X, Y$ be sets, let $f : X \to Y$ be a function, let $\{X_i\}_{i \in I}$ be a family of subsets of $X$ and let $\{Y_i\}_{i \in I}$ be a family of subsets of $Y$. Prove each of the following.
(e) For $f(x) := x^2$ there are sets $A, B$ with $f[A \cap B] \neq f[A] \cap f[B]$.

7.2 Countable Sets

Two sets are considered to be of the same size iff their elements can be matched one-by-one without any leftovers on either side. Recall that bijective functions were defined in Definition 2.24.

Definition 7.11 Two sets $A$ and $B$ are called equivalent iff there is a bijective function $f : A \to B$.

With this language, Definition 2.25 says that a nonempty set is finite iff it is equivalent to a subset $\{1, \ldots, n\}$ of $\mathbb{N}$. In Figure 3 on page 36, only the two sets on the right are equivalent. For analysis, the most important size distinction between infinite sets is if they are countable or uncountable. Countable sets are, roughly speaking, sets for which it is possible to “count” the elements, where the counting process may stop or it may not.

Definition 7.12 A set $C$ is called countably infinite iff there is a bijective function $f : \mathbb{N} \to C$. A set $C$ is called countable iff $C$ is finite or countably infinite.

Subsets of a set cannot be larger than the set, so it is natural that subsets of countable sets are countable.
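The idea that a subset of a countable set can itself be counted can be sketched computationally: walk through an enumeration of the larger set in order and keep only the elements that lie in the subset. The code below is only an illustration under stated assumptions. The function name `enumerate_subset`, the sample enumeration $f(n) = 2n$ of the even natural numbers, and the sample subset of multiples of 6 are all hypothetical choices, and the generator only produces further values as long as the subset has further elements.

```python
from itertools import count, islice

def enumerate_subset(f, in_subset):
    """Yield f(n) for n = 1, 2, 3, ... in order, keeping only values in the subset.

    The k-th value produced is f(n_k), where n_k is the k-th index n
    for which f(n) belongs to the subset.
    """
    for n in count(1):
        value = f(n)
        if in_subset(value):
            yield value

# Hypothetical sample data: C = even natural numbers, enumerated by f(n) = 2n,
# and S = multiples of 6, which is an infinite subset of C.
def f(n):
    return 2 * n

def is_multiple_of_six(c):
    return c % 6 == 0

first_five = list(islice(enumerate_subset(f, is_multiple_of_six), 5))
print(first_five)  # [6, 12, 18, 24, 30]
```

Because the enumeration `f` is injective and the indices are visited in increasing order, no element of the subset is listed twice, and every element of the subset is reached after finitely many steps.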
Theorem 7.13 If $C$ is countable and $S \subseteq C$, then $S$ is countable.

Proof. If $S$ is finite, there is nothing to prove. Hence, we can assume that $S$ is infinite. Let $f : \mathbb{N} \to C$ be a bijection and let $n_1 := \min f^{-1}[S]$. For $k \in \mathbb{N}$, we define $n_{k+1}$ recursively. Once $n_1, \ldots, n_k$ are chosen, let $n_{k+1} := \min \bigl( f^{-1}[S] \setminus \{n_1, \ldots, n_k\} \bigr)$.

Define $g : \mathbb{N} \to S$ by $g(k) := f(n_k)$. Because all $n_k$ are in $f^{-1}[S]$, $g$ maps $\mathbb{N}$ into $S$. Because no two $n_k$ are equal and $f$ is injective, $g$ is injective. Finally, suppose for a contradiction that $g$ is not surjective. Let $b$ be the smallest element of $f^{-1}[S \setminus g[\mathbb{N}]]$ and let the number of elements of $f^{-1}[S] \cap \{1, \ldots, b-1\}$ be $k$. Then $f^{-1}[S] \cap \{1, \ldots, b-1\} = \{n_1, \ldots, n_k\}$ (which is the empty set in case $k = 0$) and $b = \min \bigl( f^{-1}[S] \setminus \{n_1, \ldots, n_k\} \bigr)$, which means $b = n_{k+1}$, a contradiction.

Theorem 7.13 says that to prove that a set is countable, it is enough to embed a copy of it into a countable set. In the language of analysis, it is good enough to find an “upper bound” that is countable. We will use this idea repeatedly in the remainder of this section. Interestingly enough, even sets that look uncountable may be countable.

Lemma 7.14 The set $\mathbb{N} \times \mathbb{N}$ is countable.

Proof. The function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ defined by $f(m, n) := 2^m 3^n$ is injective. Thus the set $\mathbb{N} \times \mathbb{N}$ is equivalent to a subset of $\mathbb{N}$, and hence by Theorem 7.13 $\mathbb{N} \times \mathbb{N}$ is countable.

Note how Theorem 7.13 allowed us to avoid the detailed construction of a bijective function between $\mathbb{N}$ and $\mathbb{N} \times \mathbb{N}$. Defining such a function is not trivial. Exercise 7-10 explicitly presents a bijective function between $\mathbb{N} \times \mathbb{N}$ and $\mathbb{N}$.

Definition 7.15 Two sets $A$ and $B$ are called disjoint iff $A \cap B = \emptyset$. A family $\{C_i\}_{i \in I}$ is called pairwise disjoint iff for all $i \neq j$ we have $C_i \cap C_j = \emptyset$.

Theorem 7.16 Countable unions of countable sets are countable.

Proof. Let $\{C_n\}_{n=1}^{a}$ with $a \in \mathbb{N} \cup \{\infty\}$ be a countable family of countable sets.
For each $C_n$, let $B_n := C_n \setminus \bigcup_{j=1}^{n-1} C_j$. We claim $\bigcup_{n=1}^{a} B_n = \bigcup_{n=1}^{a} C_n$. The containment “$\subseteq$” follows from $B_n \subseteq C_n$ for all $n \in \mathbb{N}$. For “$\supseteq$” let $x \in \bigcup_{n=1}^{a} C_n$. Let $n \in \mathbb{N}$ be the smallest natural number so that $x \in C_n$. Then $x \in B_n$, which proves “$\supseteq$.” Moreover, the $B_n$ are pairwise disjoint, because if $m < n$, then $B_m = C_m \setminus \bigcup_{j=1}^{m-1} C_j \subseteq C_m$ and $B_n = C_n \setminus \bigcup_{j=1}^{n-1} C_j \subseteq C_n \setminus C_m$. Now for each $n$ let $B_n = \{ b_k^n : k \in I_n \}$, where $I_n$ is $\mathbb{N}$ or a set of the form $\{1, \ldots, m_n\}$. Then $f(n, k) := b_k^n$ is a bijective function between $\{(n, k) \in \mathbb{N} \times \mathbb{N} : n \le a, k \in I_n\} \subseteq \mathbb{N} \times \mathbb{N}$ and $\bigcup_{n=1}^{a} B_n = \bigcup_{n=1}^{a} C_n$. Because $\mathbb{N} \times \mathbb{N}$ is countable by Lemma 7.14, Theorem 7.13 shows that the domain of $f$ is countable. Thus $\bigcup_{n=1}^{a} C_n$ is countable.

Maybe the most surprising fact about countability is that the rational numbers are countable. The reason is that the order in which the rational numbers are counted has nothing to do with their natural ordering.

Theorem 7.17 The rational numbers $\mathbb{Q}$ are countable.

Proof. We have $\{ r \in \mathbb{Q} : r > 0 \} = \bigcup_{d=1}^{\infty} \left\{ \frac{n}{d} : n \in \mathbb{N} \right\}$, that is, the positive rational numbers are a countable union of countable sets, and hence by Theorem 7.16 they are countable. Similarly, $\{ r \in \mathbb{Q} : r < 0 \}$ is countable and of course $\{0\}$ is finite. Now $\mathbb{Q} = \{ r \in \mathbb{Q} : r > 0 \} \cup \{ r \in \mathbb{Q} : r < 0 \} \cup \{0\}$ is countable by Theorem 7.16.

Exercises

7-10. Prove that the function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ defined by $f(m, n) := \frac{1}{2}(m + n - 1)(m + n - 2) + n$ is bijective. (For a visualization, consider the middle of Figure 18.)

7-11. Prove that the set of integers $\mathbb{Z}$ is countable.

7-12. Prove that the set of integers $\mathbb{Z}$ is countable by constructing a bijective function $f : \mathbb{N} \to \mathbb{Z}$. Hint. Figure 18(a).

7-13. Prove that the set of dyadic rational numbers is countable.

7-14. Use Theorem 7.16 to prove Lemma 7.14.

7-15. Give a direct proof that the union of TWO countable sets is countable.

7-16. Prove that if $C_1, \ldots, C_n$ are countable, then $C_1 \times C_2 \times \cdots \times C_n$ is countable.

Figure 18: Some standard visualizations for countability arguments.
Part (a) shows the construction for a direct proof that the union of two countable sets is countable. Part (b) shows an explicit bijective function between $\mathbb{N}$ and the product of two countable sets. Part (c) shows the idea behind the proof that $(0, 1)$ is not countable.

7-17. Let $\{a(i, j)\}_{i,j=1}^{\infty}$ be a family of nonnegative numbers. Then the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a(i, j)$ converges iff for all bijections $\sigma : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ the sum $\sum_{i=1}^{\infty} a_{\sigma(i)}$ converges. Furthermore, in this case the values are equal. Hint. This is similar to the proof of Proposition 6.22.

7.3 Uncountable Sets

Not all sets are countable. In fact (see Exercise 7-18), there is an infinite hierarchy of sizes for infinite sets, because any time we form a power set, we obtain a set that is strictly larger than the set we started with.

Theorem 7.18 If $X$ is a set, then $X$ is not equivalent to its power set $\mathcal{P}(X)$.

Proof. Suppose for a contradiction that $f : X \to \mathcal{P}(X)$ is a bijection. Define $B := \{ x \in X : x \notin f(x) \}$. Because $f$ is surjective, there is a $b \in X$ with $B = f(b)$. Now $b \in B$ would imply $b \in B = f(b)$ and by definition of $B$ this would mean $b \notin f(b) = B$. Thus we infer $b \notin B$. But then $b \notin B = f(b)$, which by definition of $B$ forces $b \in B$, a contradiction.

For analysis, we typically do not need the full hierarchy. Instead we only need to distinguish sets that are countable from those that are not.

Definition 7.19 A set $U$ is called uncountable iff it is not countable.

The real numbers, which are fundamental for analysis, are not countable.

Theorem 7.20 The interval $(0, 1)$ is uncountable.

Figure 19: Cantor sets are the intersection of a sequence of unions of closed intervals where at each step only the left and right segments of each interval are kept.

Proof. Suppose for a contradiction that $(0, 1)$ were countable. Then there is a sequence $\{x_n\}_{n=1}^{\infty}$ such that for every $x \in (0, 1)$ there is an $n \in \mathbb{N}$ so that $x = x_n$.
For each $n \in \mathbb{N}$, let $\{x_n^{(k)}\}_{k=1}^{\infty}$ be the decimal expansion of $x_n$ as in Proposition 6.6. For each $n \in \mathbb{N}$, let $y_n$ be a number in the set $\{1, 2, 3, 4, 5, 6, 7, 8\} \setminus \{x_n^{(n)}\}$. Then $\{y_n\}_{n=1}^{\infty}$ is a decimal expansion of a number $y \in (0, 1)$. However, for all $n \in \mathbb{N}$ we have that $y_n \neq x_n^{(n)}$, and hence $y \neq x_n$, a contradiction.

The remainder of the proof that $\mathbb{R}$ is uncountable is left to Exercise 7-19b. The uncountability of the real numbers shows that, even though in real life we work mostly with rational numbers, there are many more irrational numbers than there are rational numbers.

Theorem 7.21 The set $\mathbb{R} \setminus \mathbb{Q}$ of irrational numbers is uncountable.

Proof. By Exercise 7-19b, the real numbers are uncountable. Now for a contradiction suppose that $\mathbb{R} \setminus \mathbb{Q}$ were countable. Then $\mathbb{R} = (\mathbb{R} \setminus \mathbb{Q}) \cup \mathbb{Q}$ would be countable by Theorem 7.16, a contradiction.

We conclude this section by defining Cantor sets. These sets are very useful to construct counterexamples which show that certain hypotheses in analytical theorems cannot be dispensed with. Because these counterexamples can be considered a bit pathological, we defer their construction to the exercises and we will only refer to Cantor sets when necessary. Cantor sets are constructed from a sequence of unions of closed intervals so that in each step we remove the middle of each interval. Figure 19 shows the first six stages in the construction of the ternary Cantor set, which is constructed by successively removing the middle third of the intervals at each stage.

Definition 7.22 Let $[a, b]$ be an interval and let $0 < q < \frac{1}{2}$. Then we define the left part of $[a, b]$ to be the interval $L_q[a, b] := [a, a + q(b - a)]$ and the right part to be the interval $R_q[a, b] := [b - q(b - a), b]$. For a sequence $Q := \{q_n\}_{n=1}^{\infty}$ with $0 < q_n < \frac{1}{2}$ for all $n \in \mathbb{N}$ define $C_n^Q$ recursively as follows.
Let $C_0^Q := I_{1,0}^Q := [0, 1]$ and once the set $C_n^Q$ is defined as a union of pairwise disjoint closed intervals $\bigcup_{i=1}^{2^n} I_{i,n}^Q$, for $i = 1, \ldots, 2^n$ let $I_{2i-1,n+1}^Q := L_{q_{n+1}}\big[I_{i,n}^Q\big]$, let $I_{2i,n+1}^Q := R_{q_{n+1}}\big[I_{i,n}^Q\big]$ and let $C_{n+1}^Q := \bigcup_{i=1}^{2^{n+1}} I_{i,n+1}^Q$. Then $C^Q := \bigcap_{n=1}^{\infty} C_n^Q$ is called the Cantor set associated with the sequence $Q$.

Even though the construction looks like it should only leave the boundary points of the intervals, Cantor sets are in fact uncountable. The details can be explored in Exercise 7-25.

Exercises

7-18. Construct a sequence $\{P_n\}_{n=1}^{\infty}$ of infinite sets so that no two sets $P_n$ and $P_{n+1}$ are equivalent, but for each $n \in \mathbb{N}$ there is an injective function $f_n : P_n \to P_{n+1}$.

7-19. Containment and uncountable sets.
(a) Let $U, V$ be sets. Prove that if $U$ is uncountable and $U \subseteq V$, then $V$ is uncountable.
(b) Prove that $\mathbb{R}$ is not a countable set.

7-20. Prove that every uncountable set contains a countably infinite subset.

7-21. Prove that the set of all functions from $\mathbb{N}$ to $\{0, 1\}$ is uncountable.

7-22. Let $\{a_i\}_{i \in I}$ be an uncountable family of positive numbers. Prove that there are an $\varepsilon > 0$ and a countable subfamily $\{a_{i_n}\}_{n=1}^{\infty}$ so that $a_{i_n} > \varepsilon$ for all $n \in \mathbb{N}$.

7-23. Let $F : [a, b] \to \mathbb{R}$ be a nondecreasing bounded function. Prove that $F$ can have at most countably many discontinuities. Hint. Suppose the set of discontinuities is uncountable and use Exercise 3-21 to conclude that there must be an $\varepsilon$ so that the set of all $x$ with $\lim_{z \to x^+} F(z) - \lim_{z \to x^-} F(z) > \varepsilon$ is uncountable.

7-24. Prove that for every countable subset $A \subseteq \mathbb{R}$ there is a nondecreasing function $f : \mathbb{R} \to [0, 1]$ that is continuous on $\mathbb{R} \setminus A$ and discontinuous at every $a \in A$. Hint. With $A = \{a_n : n \in \mathbb{N}\}$ set $f(x) := \sum_{n : a_n < x} \frac{1}{2^n}$.

7-25. Cantor sets.
(a) Prove that for any sequence $Q = \{q_n\}_{n=1}^{\infty}$ of numbers $q_n \in \left(0, \frac{1}{2}\right)$ there is a bijective function from $C^Q$ to the set of all sequences of zeroes and ones. Hint. For $x \in C^Q$ and $n \ge 1$, set $a_n(x) := 0$ iff $x \in I_{2i-1,n}^Q = L_{q_n}\big[I_{i,n-1}^Q\big]$ for some $i$, that is, iff we have to turn “left” at the $n$th stage of the construction to keep $x$ in the interval.
(b) Prove that $C^Q$ is uncountable. Hint. Exercise 7-21.
(c) Prove that the set of endpoints of the intervals $I_{j,n}^Q$ that make up the $C_n^Q$ is countable.
(d) Prove that every $x \in C^Q$ is the limit of a sequence of endpoints of intervals $I_{j,n}^Q$.

Chapter 8 The Riemann Integral II

The Dirichlet function in Example 5.26 shows that functions with too many discontinuities may not be Riemann integrable. This is because for a discontinuity $d$ there is an $\varepsilon > 0$ so that, independent of $\delta$, the numbers $\inf\{ f(x) : x \in (d - \delta, d + \delta) \}$ and $\sup\{ f(x) : x \in (d - \delta, d + \delta) \}$ will always be at least $\varepsilon$ apart. By Theorem 5.25, too many such discontinuities will cause the function to not be Riemann integrable. At the same time, Proposition 5.12 shows that a function can have some discontinuities and still be Riemann integrable. To determine when a function is Riemann integrable, we need to determine “how many” discontinuities are acceptable. For a graphical motivation, consider Figure 20 on page 133.

Section 8.1 introduces outer Lebesgue measure, which is the tool to measure “how many” discontinuities a function has. Section 8.2 introduces Lebesgue’s integrability criterion and Section 8.3 shows how this criterion allows us to easily obtain new results about the Riemann integral. We conclude in Section 8.4 with improper integrals.

8.1 Outer Lebesgue Measure

Outer Lebesgue measure covers a set with open intervals and assigns the infimum of the sums of the lengths of these intervals as the measure (“size”) of the set. With respect to Riemann integrability, we should note that if all discontinuities of our function are trapped in a union of intervals, then outside of this union of intervals the function is continuous and Riemann integrability should not be a problem there.
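The covering idea can be tried out on a finite sample of points. The following is a minimal Python sketch, not part of the original text: the helper names `cover` and `total_length`, the sample of fractions and the value of `eps` are all illustrative assumptions. It places an open interval of length `eps / 2**j` around the $j$-th point and checks with exact rational arithmetic that every point is covered while the sum of the lengths stays below `eps`.

```python
from fractions import Fraction


def cover(points, eps):
    """Return open intervals (left, right); the j-th one has length
    eps / 2**j and is centred at the j-th point (j = 1, 2, ...)."""
    intervals = []
    for j, c in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (j + 1)  # half of the length eps / 2**j
        intervals.append((c - half, c + half))
    return intervals


def total_length(intervals):
    """Sum of the lengths of the given intervals."""
    return sum(right - left for left, right in intervals)


# Hypothetical sample: all fractions k/d in [0, 1] with denominator d <= 6.
points = sorted({Fraction(k, d) for d in range(1, 7) for k in range(d + 1)})
eps = Fraction(1, 100)
intervals = cover(points, eps)

covered = all(left < c < right for c, (left, right) in zip(points, intervals))
print(covered, total_length(intervals) < eps)
```

Because the lengths form a geometric series, adding more points to the sample never pushes the total length up to `eps`, no matter how many points are listed.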
Definition 8.1 For an open interval $I = (a, b)$ in $\mathbb{R}$, we define $|I| := b - a$. For any set $S \subseteq \mathbb{R}$, we define the outer Lebesgue measure of $S$ to be

$$\lambda(S) := \inf\left\{ \sum_{j=1}^{\infty} |I_j| : S \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is an open interval} \right\},$$

where we set $\lambda(S) = \infty$ if none of the series in the set on the right converge.

The proof of Proposition 8.5 will show why we use open intervals in the definition of outer Lebesgue measure. We first turn our attention to sets with outer Lebesgue measure zero. These sets, and properties that hold on the complement of such a set, will be of particular importance for the Riemann integral.

Definition 8.2 A set of outer Lebesgue measure zero is called a set of measure zero or a null set. A property $P(x)$ such that $\lambda(\{ x \in D : P(x) \text{ is not true} \}) = 0$ is said to hold almost everywhere in $D$. Almost everywhere is also abbreviated as a.e.

Countable sets are considered “small” in set theory and they are also “small” with respect to outer Lebesgue measure.

Proposition 8.3 Countable subsets of $\mathbb{R}$ have outer Lebesgue measure 0.

Proof. Let $C = \{c_1, c_2, \ldots\}$ be a countable subset of $\mathbb{R}$ and let $\varepsilon > 0$. For $j \in \mathbb{N}$ let $I_j := \left( c_j - \frac{1}{2} \frac{\varepsilon}{2^j}, c_j + \frac{1}{2} \frac{\varepsilon}{2^j} \right)$. Then $C \subseteq \bigcup_{j=1}^{\infty} I_j$ and $\sum_{j=1}^{\infty} |I_j| = \sum_{j=1}^{\infty} \frac{\varepsilon}{2^j} = \varepsilon$. Because $\varepsilon > 0$ was arbitrary, $\lambda(C) = 0$.

Although the proof of Proposition 8.3 is quite simple, it takes a little to get used to the result. Recall that $\mathbb{Q}$ is countable, which means it is a null set! Exercise 8-3d will show that null sets can be uncountable, too.

Of course, not all subsets of $\mathbb{R}$ are null sets. To prove that outer Lebesgue measure provides the “right” measure for intervals, we need the Heine-Borel Theorem. The conclusion of the Heine-Borel Theorem is the inspiration for the topological definition of compactness (see Theorem 16.72). Until we formally introduce compactness in Section 16.5 we will rely on Standard Proof Technique 2.28 as used in the proof of the Heine-Borel Theorem.

Theorem 8.4 Heine-Borel Theorem. Let $[a, b] \subset \mathbb{R}$ and let $\mathcal{I}$ be a family of open intervals with $[a, b] \subseteq \bigcup_{I \in \mathcal{I}} I$.
Then there are finitely many intervals $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $[a, b] \subseteq \bigcup_{j=1}^{n} I_j$.

Proof. Suppose for a contradiction that

$$C := \left\{ x \in [a, b] : \forall I_1, I_2, \ldots, I_n \in \mathcal{I} : [a, x] \not\subseteq \bigcup_{j=1}^{n} I_j \right\} \neq \emptyset.$$

Let $c := \inf C$. Because $a \in I$ for some $I \in \mathcal{I}$ we infer $c > a$. Let $J \in \mathcal{I}$ be an open interval with $c \in J$. If $c < b$ there is a $\delta > 0$ with $(c - \delta, c + \delta) \subseteq J \cap [a, b]$. If $c = b$ there is a $\delta > 0$ with $(c - \delta, b] \subseteq J \cap [a, b]$. By definition of $c$, there are finitely many $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $[a, c - \delta] \subseteq \bigcup_{j=1}^{n} I_j$. But then if $c < b$ we obtain $\left[a, c + \frac{\delta}{2}\right] \subseteq J \cup \bigcup_{j=1}^{n} I_j$, contradicting the definition of $c$. Hence, $c = b$ and we obtain $[a, b] \subseteq J \cup \bigcup_{j=1}^{n} I_j$, implying $C = \emptyset$, a contradiction.

Proposition 8.5 Let $a, b \in \mathbb{R}$ with $a < b$. Then $\lambda([a, b]) = b - a$.

Proof. Let $\varepsilon > 0$. Because

$$[a, b] \subseteq \left( a - \frac{\varepsilon}{4}, b + \frac{\varepsilon}{4} \right) \cup \bigcup_{n=2}^{\infty} \left( -\frac{\varepsilon}{2 \cdot 2^n}, \frac{\varepsilon}{2 \cdot 2^n} \right)$$

we obtain $\lambda([a, b]) \le (b - a) + \varepsilon$. Because $\varepsilon > 0$ was arbitrary, $\lambda([a, b]) \le b - a$.

To show the reverse inequality, let $\{I_j\}_{j=1}^{\infty}$ be a countable family of open intervals so that $[a, b] \subseteq \bigcup_{j=1}^{\infty} I_j$. By the Heine-Borel Theorem, there is a finite number of intervals $I_{j_1}, \ldots, I_{j_n}$ so that $[a, b] \subseteq \bigcup_{k=1}^{n} I_{j_k}$. For $k = 1, \ldots, n$ let $I_{j_k} = (a_k, b_k)$. Without loss of generality assume that no interval $I_{j_k}$ is contained in another. Reorder the intervals so that for $k = 2, \ldots, n$ we have $b_{k-1} \le b_k$. Then for all $k = 2, \ldots, n$ we have $a_{k-1} < a_k$. Moreover, $b_n > b$, $a_1 < a$ and for all $k = 2, \ldots, n$ we infer $a_k < b_{k-1}$. Hence,

$$\sum_{k=1}^{n} (b_k - a_k) \ge b_1 - a_1 + \sum_{k=2}^{n} (b_k - b_{k-1}) = b_1 - a_1 + b_n - b_1 > b - a,$$

which implies $\sum_{j=1}^{\infty} |I_j| > b - a$, and hence $\lambda([a, b]) \ge b - a$.

Aside from its use related to Riemann integration, outer Lebesgue measure also is the foundation for Lebesgue integration. We conclude this section with some of the properties of outer Lebesgue measure, which will also be helpful for some exercises.

Theorem 8.6 The properties of outer Lebesgue measure $\lambda$.
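With $\infty$ defined to be greater than all real numbers and the sum of a divergent series of nonnegative numbers being $\infty$ we have the following.

1. $\lambda(\emptyset) = 0$.
2. If $A \subseteq B$, then $\lambda(A) \le \lambda(B)$.
3. Outer Lebesgue measure is countably subadditive. That is, for all sequences $\{A_n\}_{n=1}^{\infty}$ of subsets $A_n \subseteq \mathbb{R}$ the inequality $\lambda\left( \bigcup_{n=1}^{\infty} A_n \right) \le \sum_{n=1}^{\infty} \lambda(A_n)$ holds.

Proof. For part 2, let $A \subseteq B$. Every countable family of open intervals that covers $B$ also covers $A$, so we obtain the following.

$$\lambda(A) = \inf\left\{ \sum_{j=1}^{\infty} |I_j| : A \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is an open interval} \right\} \le \inf\left\{ \sum_{j=1}^{\infty} |I_j| : B \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is an open interval} \right\} = \lambda(B).$$

For part 1, note that for all $n \in \mathbb{N}$ the empty set is contained in a closed interval of length $\frac{1}{n}$, which by Proposition 8.5 has outer Lebesgue measure $\frac{1}{n}$. Thus, via part 2, $\lambda(\emptyset) \le \frac{1}{n}$ for all $n \in \mathbb{N}$, which implies $\lambda(\emptyset) = 0$.

For part 3, first note that there is nothing to prove if the right side is infinite. So assume the right side is finite and let $\varepsilon > 0$. For each $n \in \mathbb{N}$, find a countable family $\{I_j^n\}_{j=1}^{\infty}$ of open intervals so that $A_n \subseteq \bigcup_{j=1}^{\infty} I_j^n$ and $\sum_{j=1}^{\infty} |I_j^n| < \lambda(A_n) + \frac{\varepsilon}{2^n}$. By Lemma 7.14 the family $\{I_j^n\}_{j,n=1}^{\infty}$ is a countable family of open intervals. The union of this family is such that $\bigcup_{n=1}^{\infty} A_n \subseteq \bigcup_{n=1}^{\infty} \bigcup_{j=1}^{\infty} I_j^n = \bigcup_{j,n=1}^{\infty} I_j^n$. By Proposition 6.22, the convergence behavior and value of a doubly infinite sum of nonnegative numbers do not depend on the order of summation and by Exercise 7-17 it does not matter if we represent the sum as a double sum or a single sum. Thus we can conclude

$$\lambda\left( \bigcup_{n=1}^{\infty} A_n \right) \le \sum_{j,n=1}^{\infty} |I_j^n| = \sum_{n=1}^{\infty} \sum_{j=1}^{\infty} |I_j^n| \le \sum_{n=1}^{\infty} \left( \lambda(A_n) + \frac{\varepsilon}{2^n} \right) = \sum_{n=1}^{\infty} \lambda(A_n) + \varepsilon.$$

Because $\varepsilon$ was arbitrary, this proves part 3.

Exercises

8-1. Let $\{A_n\}_{n=1}^{\infty}$ be a countable family of null sets. Prove that $\lambda\left( \bigcup_{n=1}^{\infty} A_n \right) = 0$.

8-2. Let $a, b \in \mathbb{R}$ with $a < b$. Prove that $\lambda((a, b)) = b - a$. Hint. For “$\ge$” approximate the open interval “from the inside” with closed intervals.

8-3. Let $Q = \{q_n\}_{n=1}^{\infty}$ be a sequence of numbers $q_n \in \left(0, \frac{1}{2}\right)$ and let $C^Q$ be the associated Cantor set as in Definition 7.22. We will use the notation of Definition 7.22 throughout this exercise.
Hint. Prove that for a finite union of pairwise disjoint intervals the outer Lebesgue measure is the sum of the lengths of the intervals.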
This requires repeated use of the argument in the proof of Proposition 8.5.
(b) Prove that $\left\{ \prod_{j=1}^{n} 2q_j \right\}_{n=1}^{\infty}$ converges.
(c) Prove that $\lambda\left(C^Q\right) = \lim_{n \to \infty} \prod_{j=1}^{n} 2q_j$. Hint. For “$\ge$,” first consider a family $\mathcal{I}$ of open intervals so that $C^Q \subseteq \bigcup \mathcal{I}$. Prove that $C := \left\{ x \in [0, 1] : \forall I_1, I_2, \ldots, I_n \in \mathcal{I} : C^Q \cap [0, x] \not\subseteq \bigcup_{j=1}^{n} I_j \right\} = \emptyset$, by assuming it is not empty and showing that $\inf C \in C^Q$, which leads to a contradiction similar to the proof of the Heine-Borel Theorem. (It also helps to look ahead to the proof of Lemma 8.11 for a similar argument.) Then show that for any countable family $\{I_j\}_{j=1}^{\infty}$ with $C^Q \subseteq \bigcup_{j=1}^{\infty} I_j$ there are intervals $I_{j_1}, \ldots, I_{j_m}$ with $C^Q \subseteq \bigcup_{k=1}^{m} I_{j_k}$. Conclude that there must be an $n \in \mathbb{N}$ so that $C_n^Q \subseteq \bigcup_{k=1}^{m} I_{j_k}$.
(d) Prove that for any $q \in \left(0, \frac{1}{2}\right)$ the constant sequence $Q = \{q\}_{n=1}^{\infty}$ yields a Cantor set $C^Q$ of measure zero. Note. By Exercise 7-25b in Section 7.3 Cantor sets are uncountable. This means there are uncountable sets of measure zero.
(e) Use $Q = \left\{ \frac{2^{n+1} - 1}{2^{n+2}} \right\}_{n=1}^{\infty}$ to prove that there are Cantor sets that are not of measure zero. Hint. Prove that for all $n \in \mathbb{N}$ we have $\prod_{j=1}^{n} 2q_j \ge 1 - \sum_{j=2}^{n+1} \frac{1}{2^j}$.
(f) Prove that there are Cantor sets whose Lebesgue measure is arbitrarily close to 1. Hint. For $k \in \mathbb{N}$ fixed, use $n + k$ instead of $n$ in Exercise 8-3e.

8-4. Use the Heine-Borel Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Bolzano-Weierstrass Theorem. Hint. We will ultimately do this in a more abstract setting in Theorem 16.72.

8-5. Prove that if $f : [a, b] \to \mathbb{R}$ is continuous and $\lambda(\{ x \in [a, b] : f(x) \neq 0 \}) = 0$, then $f(x) = 0$ for all $x \in [a, b]$.

8-6. Prove that if $f, g : [a, b] \to \mathbb{R}$ are continuous almost everywhere, then $f + g$ is continuous almost everywhere.

8.2 Lebesgue’s Criterion for Riemann Integrability

The oscillation of a function is a quantitative measure of “how discontinuous” the function is. It is the last tool we need to characterize Riemann integrable functions.

Definition 8.7 Let $f : [a, b] \to \mathbb{R}$ be a bounded function and let $x \in [a, b]$.
For any interval $I$ let $\omega_f(I) := \sup\{ |f(y) - f(z)| : y, z \in I \cap [a, b] \}$ be the oscillation of $f$ over the interval $I$. Define the oscillation of $f$ at the point $x \in [a, b]$ as the infimum $\omega_f(x) := \inf\{ \omega_f(J) : x \in J, J \text{ is an open interval} \}$.

Exercise 8-8 shows that the oscillation measures the height of “jumps” in the function and Exercise 12-23 shows that the oscillation is also a measure of the size of oscillations. Regarding the details of the definition, Exercise 8-10 shows that it is important that we use open intervals in the definition of the oscillation $\omega_f(x)$ at a point.

Theorem 8.8 The bounded function $f : [a, b] \to \mathbb{R}$ is continuous at $x \in [a, b]$ iff $\omega_f(x) = 0$.

Proof. For “$\Rightarrow$,” let $f$ be continuous at $x \in [a, b]$ and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $y \in [a, b]$ with $|y - x| < \delta$ we have $|f(y) - f(x)| < \frac{\varepsilon}{2}$. Then for all $y, z \in (x - \delta, x + \delta) \cap [a, b]$ we obtain

$$|f(y) - f(z)| \le |f(y) - f(x)| + |f(x) - f(z)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

Hence, $\omega_f(x) \le \omega_f((x - \delta, x + \delta)) \le \varepsilon$ and because $\varepsilon$ was arbitrary we conclude that $\omega_f(x) = 0$.

Conversely, for “$\Leftarrow$,” let $x \in [a, b]$ with $\omega_f(x) = 0$ and let $\varepsilon > 0$. Then there is an interval $(c, d)$ with $x \in (c, d)$ such that $\omega_f((c, d)) < \varepsilon$. Let $\delta := \min\{x - c, d - x\}$ unless $x = a$, in which case we set $\delta := d - x$, or $x = b$, in which case we set $\delta := x - c$. Then for all $y \in [a, b]$ with $|x - y| < \delta$ we have $y \in (c, d)$, and hence $|f(x) - f(y)| < \varepsilon$. Thus $f$ is continuous at $x$.

The next two lemmas show that, when it comes to Riemann integrability, small oscillation means that lower and upper sums can get close to each other, while nonzero oscillation on a set of positive outer Lebesgue measure prevents Riemann integrability (also see Figure 20).

Lemma 8.9 Let $f : [a, b] \to \mathbb{R}$ be bounded. If $\omega_f(x) < \varepsilon$ for all $x \in [a, b]$, then there is a partition $P$ of $[a, b]$ so that $U(f, P) - L(f, P) < \varepsilon |b - a|$.

Proof. For every $z \in [a, b]$, there is an open interval $I_z = (z - \delta_z, z + \delta_z)$ so that $\omega_f(I_z) < \varepsilon$.
By the Heine-Borel Theorem, there are finitely many $z_1, \ldots, z_m \in [a, b]$ so that $[a, b] \subseteq \bigcup_{j=1}^{m} \left( z_j - \frac{\delta_{z_j}}{2}, z_j + \frac{\delta_{z_j}}{2} \right)$. Let $P := \{ a = x_0 < x_1 < \cdots < x_n = b \}$ be the set comprised of $a$, $b$ and the endpoints $z_j \pm \frac{\delta_{z_j}}{2}$ that are in $[a, b]$. Then each interval $[x_{i-1}, x_i]$ is contained in an interval $I_{z_j}$ and consequently, with notation as in Definition 5.13, we infer $M_i - m_i < \varepsilon$. But then

$$U(f, P) - L(f, P) = \sum_{i=1}^{n} M_i \Delta x_i - \sum_{i=1}^{n} m_i \Delta x_i = \sum_{i=1}^{n} (M_i - m_i) \Delta x_i < \varepsilon \sum_{i=1}^{n} \Delta x_i = \varepsilon |b - a|,$$

which finishes the proof.

Figure 20: The area of the boxes indicates the difference between the lower and the upper sum of the function for the given partition. Boxes are tall where the function has large slopes or discontinuities. Comparison of (a) and (b) shows that as the norm of the partition goes to zero, the height of the boxes goes to zero where the function is continuous. Where the function is discontinuous the height remains bounded away from zero (consider the two discontinuities). Thus $f$ can only be Riemann integrable if the discontinuities (unavoidable tall boxes) can be trapped inside a set of intervals whose total length is small. This is the idea for the Lebesgue criterion.

Lemma 8.10 Let $f : [a, b] \to \mathbb{R}$ be bounded. If $\lambda(\{ x \in [a, b] : \omega_f(x) > 0 \}) > 0$, then $f$ is not Riemann integrable.

Proof. Because $\{ x \in [a, b] : \omega_f(x) > 0 \} = \bigcup_{j=1}^{\infty} \left\{ x \in [a, b] : \omega_f(x) > \frac{1}{j} \right\}$, there is an $\varepsilon > 0$ so that $L := \lambda(\{ x \in [a, b] : \omega_f(x) > \varepsilon \}) > 0$. Let $P = \{ a = x_0 < x_1 < \cdots < x_n = b \}$ be any partition of $[a, b]$. Define the set $D := \{ x \in [a, b] \setminus \{x_0, \ldots, x_n\} : \omega_f(x) > \varepsilon \}$. By Theorem 8.6, we obtain

$$L \ge \lambda(D) = \lambda(D) + \sum_{j=0}^{n} \lambda(\{x_j\}) \ge \lambda\left( D \cup \bigcup_{j=0}^{n} \{x_j\} \right) \ge \lambda(\{ x \in [a, b] : \omega_f(x) > \varepsilon \}) = L,$$

which means $\lambda(D) = L$. With $i_1, \ldots, i_k \in \{1, \ldots, n\}$ being the indices of the intervals $(x_{i-1}, x_i)$ that intersect $D$ we obtain $D \subseteq \bigcup_{j=1}^{k} (x_{i_j - 1}, x_{i_j})$ and $\sum_{j=1}^{k} \Delta x_{i_j} \ge \lambda(D) = L$.
But then in each interval $(x_{i_j - 1}, x_{i_j})$ there is an $s_j$ with $\omega_f(s_j) > \varepsilon$, and hence $\omega_f((x_{i_j - 1}, x_{i_j})) > \varepsilon$. For each $j = 1, \ldots, k$ we infer $M_{i_j} - m_{i_j} > \varepsilon$ and thus

$$U(f, P) - L(f, P) = \sum_{i=1}^{n} (M_i - m_i) \Delta x_i \ge \sum_{j=1}^{k} (M_{i_j} - m_{i_j}) \Delta x_{i_j} > \varepsilon L > 0.$$

Because $P$ was arbitrary we have shown that for any partition $P$ of $[a, b]$ we have $U(f, P) - L(f, P) > \varepsilon L > 0$, where $\varepsilon$ and $L$ are fixed. By Theorem 5.25 this means that $f$ is not Riemann integrable.

Lemma 8.10 shows that only functions that are continuous almost everywhere have a chance to be Riemann integrable. Lemma 8.9 shows that if we can “trap” the discontinuities in a small enough set, a function should be Riemann integrable. The only obstacle left is that outer Lebesgue measure works with countably many open intervals. To obtain a complement that is made up of closed intervals, it would be nice if we could use finitely many open intervals to cover the set of discontinuities. The next lemma shows that this is possible.

Lemma 8.11 Let $f : [a, b] \to \mathbb{R}$ be a bounded function and for each $p > 0$ define $B_p := \{ x \in [a, b] : \omega_f(x) \ge p \}$. If $\mathcal{I}$ is a family of open intervals such that $B_p \subseteq \bigcup_{I \in \mathcal{I}} I$, then there are finitely many $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $B_p \subseteq \bigcup_{j=1}^{n} I_j$.

Proof. Suppose for a contradiction that

$$C := \left\{ x \in [a, b] : \forall I_1, I_2, \ldots, I_n \in \mathcal{I} : B_p \cap [a, x] \not\subseteq \bigcup_{j=1}^{n} I_j \right\} \neq \emptyset.$$

Then $c := \inf C \le b$. We first claim $c \in B_p$. There is a sequence $\{a_n\}_{n=1}^{\infty}$ of elements of $B_p$ so that $a_n < c$ and $\lim_{n \to \infty} a_n = c$. Let $\varepsilon > 0$. Then there is an $n \in \mathbb{N}$ so that $|a_n - c| < \frac{\varepsilon}{2}$. Moreover, because $a_n \in B_p$ we have that $\omega_f\left( \left( a_n - \frac{\varepsilon}{2}, a_n + \frac{\varepsilon}{2} \right) \right) \ge p$. Because of the containment $\left( a_n - \frac{\varepsilon}{2}, a_n + \frac{\varepsilon}{2} \right) \subseteq (c - \varepsilon, c + \varepsilon)$ we infer $\omega_f((c - \varepsilon, c + \varepsilon)) \ge p$. Because for all $l < c < r$ there is an $\varepsilon > 0$ with $(c - \varepsilon, c + \varepsilon) \subseteq (l, r)$ we conclude that $\omega_f(c) \ge p$.

Because $c \in B_p$, there is an open interval $I \in \mathcal{I}$ that contains $c$. Clearly, $c > a$. But then $x := \max\{ a, \inf(I) \} \in [a, b]$ and because $x \notin C$, there are intervals $I_1, \ldots, I_n \in \mathcal{I}$ so that $B_p \cap [a, x] \subseteq \bigcup_{j=1}^{n} I_j$. But then for all $y \in I$ with $y \ge c$ we obtain $B_p \cap [a, y] \subseteq I \cup \bigcup_{j=1}^{n} I_j$. If $c < b$ this means $\inf C \ge \sup(I) > c$ and if $c = b$ this means $C = \emptyset$, a contradiction either way. This proves the result.

Now we can characterize Riemann integrability as follows.

Theorem 8.12 Lebesgue’s criterion for Riemann integrability. The bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable on $[a, b]$ iff $f$ is continuous a.e. on $[a, b]$.

Proof. The part “$\Rightarrow$” is the contrapositive of Lemma 8.10.

For “$\Leftarrow$,” let $f$ be continuous a.e. on $[a, b]$ and let $\varepsilon > 0$. Choose $m \in \mathbb{N}$ so that $\frac{b - a}{m} < \frac{\varepsilon}{2}$. Let $X_m := \left\{ x \in [a, b] : \omega_f(x) \ge \frac{1}{m} \right\}$. Then $\lambda(X_m) = 0$, and there is a sequence $\{I_j\}_{j=1}^{\infty}$ of open intervals with $X_m \subseteq \bigcup_{j=1}^{\infty} I_j$ and $\sum_{j=1}^{\infty} |I_j| < \frac{\varepsilon}{2(\omega_f([a, b]) + 1)}$. By Lemma 8.11, there are finitely many open intervals $I_{j_1}, \ldots, I_{j_k}$ so that $X_m \subseteq \bigcup_{l=1}^{k} I_{j_l}$ and without loss of generality we may assume the $I_{j_l}$ are pairwise disjoint. For each $I_{j_l}$ let $C_{j_l} := I_{j_l} \cup \{ \sup(I_{j_l}), \inf(I_{j_l}) \}$. The set $[a, b] \setminus \bigcup_{l=1}^{k} I_{j_l}$ is a finite union of closed intervals $J_1, \ldots, J_p$ and singleton sets $\{s_1\}, \ldots, \{s_q\}$. By Lemma 8.9 for each $J_i$, there is a partition $P_i$ so that $U(f, P_i) - L(f, P_i) < \frac{|J_i|}{m}$. Set $P := \bigcup_{i=1}^{p} P_i \cup \{a, b\}$. Then

$$U(f, P) - L(f, P) \le \sum_{i=1}^{p} \frac{|J_i|}{m} + \omega_f([a, b]) \sum_{l=1}^{k} |I_{j_l}| \le \frac{b - a}{m} + \omega_f([a, b]) \frac{\varepsilon}{2(\omega_f([a, b]) + 1)} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

By Theorem 5.25 $f$ is Riemann integrable on $[a, b]$.

Lebesgue’s integrability criterion shows some of our earlier examples in a different light. Clearly, the Lebesgue criterion implies that continuous functions are Riemann integrable, so it supersedes Theorem 5.20. Similarly, any function with finitely many discontinuities is Riemann integrable, so Proposition 5.12 also is an easy consequence of the Lebesgue criterion. Regarding the Dirichlet function we can say the following.

Example 8.13 (Example 5.26 revisited.)
The Dirichlet function is not Riemann integrable on $[0, 1]$. In light of Theorem 8.12, the function

$$f(x) = \begin{cases} 0; & \text{for } x \in [0, 1] \setminus \mathbb{Q}, \\ 1; & \text{for } x \in [0, 1] \cap \mathbb{Q}, \end{cases}$$

is not Riemann integrable, because it is discontinuous at every point of $[0, 1]$ and by Proposition 8.5 we have $\lambda([0, 1]) = 1 > 0$.

Exercises

8-7. Let $f : [a, b] \to \mathbb{R}$ be bounded and let $I, J$ be intervals with $I \subseteq J$. Prove that $\omega_f(I) \le \omega_f(J)$.

8-8. The oscillation as a measure of “jumps” in the function.
(a) Let $f(x) := \mathbf{1}_{[0, \infty)}(x)$. Prove that $\omega_f(0) = 1$.
(b) Let $f : [a, b] \to \mathbb{R}$ be bounded and let $x \in (a, b)$ be so that $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ both exist. Prove that $\omega_f(x) \ge \left| \lim_{z \to x^+} f(z) - \lim_{z \to x^-} f(z) \right|$.
(c) State when the inequality in Exercise 8-8b actually is an equality and prove your claim.

8-9. Let $f : [a, b] \to \mathbb{R}$ be bounded and let $x \in [a, b]$. Prove that $\omega_f(x) = \lim_{n \to \infty} \omega_f\left( \left( x - \frac{1}{n}, x + \frac{1}{n} \right) \right)$.

8-10. Prove that for $f(x) := \begin{cases} 0; & \text{for } x \ge 0, \\ 1; & \text{for } x < 0, \end{cases}$ we have $\inf\{ \omega_f(J) : 0 \in J, J \text{ is an interval} \} = 0$, even though $f$ is discontinuous at 0. Then explain why this does not contradict Exercise 8-8b.

8-11. Prove that any nondecreasing function $F : [a, b] \to \mathbb{R}$ is Riemann integrable. Hint. Exercise 7-23.

8-12. Let $a < b$. A function $f : [a, b] \to \mathbb{R}$ is said to be of bounded variation iff

$$V_a^b(f) := \sup\left\{ \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a = a_0 < a_1 < \cdots < a_n = b \right\} < \infty.$$

(a) Prove that every function of bounded variation is the difference of two nondecreasing functions. Hint. Use $V_a^x(f) := \sup\left\{ \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a = a_0 < a_1 < \cdots < a_n = x \right\}$.
(b) Prove that every function of bounded variation has at most countably many discontinuities. Hint. Exercise 7-23.
(c) Prove that every function of bounded variation on $[a, b]$ is Riemann integrable on $[a, b]$.
(d) Prove that if $f$ and $g$ are of bounded variation and $\alpha \in \mathbb{R}$, then $f + g$ and $\alpha f$ are of bounded variation, too.

8-13. A function $f : [a, b] \to \mathbb{R}$ is called piecewise continuous iff there are $a = z_0 < z_1 < \cdots < z_n = b$ so that for all $i = 1, \ldots, n$ the restriction $f|_{(z_{i-1}, z_i)}$ is continuous.
Prove that piecewise continuous bounded functions are Riemann integrable.

8-14. Prove that if $A, N \subseteq \mathbb{R}$ and $N$ is a null set, then $\lambda(A \setminus N) = \lambda(A)$.

8-15. Prove that if $C^Q$ is a Cantor set with nonzero measure (see Exercise 8-3e), then $\mathbf{1}_{C^Q}$ is not Riemann integrable. Hint. Prove that $\mathbf{1}_{C^Q}$ is discontinuous at every $x \in C^Q$.

8.3 More Integral Theorems

The most tedious part in any proof involving Riemann integrable functions is to prove Riemann integrability. Lebesgue’s integrability criterion is a powerful tool to address this issue. Without it, the proofs of Theorems 8.14 and 8.16 would be a lot longer.

Theorem 8.14 Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable. Then
1. The product $fg$ is Riemann integrable on $[a, b]$,
2. If there is an $\varepsilon > 0$ so that $|g(x)| \ge \varepsilon$ for all $x \in [a, b]$, then the quotient $\frac{f}{g}$ is Riemann integrable on $[a, b]$,
3. The absolute value $|f|$ is Riemann integrable and the triangular inequality $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$ holds.

Proof. For part 1, note that if $f$ and $g$ are continuous a.e., then the set of discontinuities $\{ x : \omega_{fg}(x) \neq 0 \}$ is contained in the union $\{ x : \omega_f(x) \neq 0 \} \cup \{ x : \omega_g(x) \neq 0 \}$, which has measure zero. Thus $fg$ is continuous a.e., and hence Riemann integrable. The remaining parts of this proof are left as Exercise 8-16.

It is now easy to see that once $f : [a, b] \to \mathbb{R}$ is Riemann integrable, then it is also Riemann integrable over any closed subinterval. Formally, we define the following.

Definition 8.15 Let $f : [a, b] \to \mathbb{R}$ and let $c, d \in [a, b]$ with $c < d$. Then $f$ is called Riemann integrable over $[c, d]$ iff the restriction $f|_{[c,d]}$ is Riemann integrable. In this case we set $\int_c^d f(x)\,dx := \int_c^d f|_{[c,d]}(x)\,dx$.

Theorem 8.16 Let $f : [a, b] \to \mathbb{R}$ and let $m \in (a, b)$. Then $f$ is Riemann integrable over $[a, b]$ iff $f$ is Riemann integrable over $[a, m]$ and over $[m, b]$. In this case, the integrals satisfy the equation $\int_a^b f(x)\,dx = \int_a^m f(x)\,dx + \int_m^b f(x)\,dx$.

Proof. Exercise 8-17.
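The additivity in Theorem 8.16 can be checked numerically for a concrete function. The following is a minimal Python sketch offered as a sanity check rather than a proof: the midpoint rule in `riemann_sum`, the sample function `f` with one jump, the split point `m` and the tolerance are all illustrative choices that do not come from the text.

```python
def riemann_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    """Bounded sample function with a single jump at x = 1, so it is
    continuous almost everywhere on [0, 2]."""
    return x * x if x < 1.0 else 3.0


a, m, b = 0.0, 0.75, 2.0
whole = riemann_sum(f, a, b)
parts = riemann_sum(f, a, m) + riemann_sum(f, m, b)

# The two values agree up to the discretization error of the sums.
print(abs(whole - parts) < 1e-3)
```

The jump at $x = 1$ does not disturb the agreement, which matches the expectation that a single discontinuity is harmless for Riemann integrability.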
We can now consider integrals in which the upper bound is an independent variable. This idea provides another connection between derivatives and integrals. For any continuous function $f$, definite integrals produce a function $G$ with $G' = f$.

Theorem 8.17 Fundamental Theorem of Calculus, Derivative Form. Let $f$ be a Riemann integrable function on $[a, b]$. Then the function $G(x) := \int_a^x f(t)\,dt$ is uniformly continuous on $[a, b]$ and if $f$ is continuous at $x \in (a, b)$, then $G$ is differentiable at $x$ with $G'(x) = f(x)$.

Proof. Let $B$ be an upper bound so that $|f(x)| < B$ for all $x \in [a, b]$. To see that $G$ is uniformly continuous on $[a, b]$, let $\varepsilon > 0$. Set $\delta := \frac{\varepsilon}{B}$. Then for all $x, z \in [a, b]$ with $|x - z| < \delta$ we obtain (assuming without loss of generality that $x < z$)

$$|G(z) - G(x)| = \left| \int_a^z f(t)\,dt - \int_a^x f(t)\,dt \right| = \left| \int_x^z f(t)\,dt \right| \le \int_x^z |f(t)|\,dt \le B(z - x) < B \frac{\varepsilon}{B} = \varepsilon,$$

which means that $G$ is uniformly continuous.

Now let $x \in (a, b)$, let $f$ be continuous at $x$ and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $z \in [a, b]$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \varepsilon$. Hence, for all $z > x$ with $z \in [a, b]$ and $|z - x| < \delta$ we obtain

$$\left| \frac{G(z) - G(x)}{z - x} - f(x) \right| = \left| \frac{1}{z - x} \int_x^z f(t)\,dt - f(x) \right| = \left| \frac{1}{z - x} \int_x^z \big( f(t) - f(x) \big)\,dt \right| \le \frac{1}{z - x} \int_x^z |f(t) - f(x)|\,dt \le \varepsilon.$$

For $z < x$, the proof is similar, and hence $\lim_{z \to x} \frac{G(z) - G(x)}{z - x} = f(x)$.

Functions as mentioned in Theorem 8.17 sometimes are also defined with an integral such that $a > x$. If we went through a partition from right to left rather than left to right, then all the bases of the rectangles in a Riemann sum would have negative length. Thus it makes sense to define the following.

Definition 8.18 Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable. Then we define the integral with the reversed bounds to be $\int_b^a f(x)\,dx := -\int_a^b f(x)\,dx$.

Corollary 8.19 Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable and let $x_0 \in [a, b]$. Then the function $G(x) := \int_{x_0}^x f(t)\,dt$ is uniformly continuous on $[a, b]$ and if $f$ is continuous at $x \in (a, b)$, then $\frac{d}{dx} \left( \int_{x_0}^x f(t)\,dt \right) = f(x)$.

Proof. Exercise 8-18.

Exercises

8-16. Proving Theorem 8.14.
(a) Prove part 2 of Theorem 8.14.
(b) Prove part 3 of Theorem 8.14. Hint. Use Lemma 5.6 to prove the inequality.
(c) To see how valuable the Lebesgue criterion is, use Riemann’s Condition (Theorem 5.25) to prove that $|f|$ is Riemann integrable over $[a, b]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler. Hint. Use a partition $P$ with $U(f, P) - L(f, P) < \varepsilon$.

8-17. Proving Theorem 8.16.
(a) Prove Theorem 8.16. Hint. In each direction of the proof use the Lebesgue criterion to establish Riemann integrability. Use Lemma 5.6 for the equations, making sure that the point $m$ is an element of each partition $P_k$.
(b) To see how valuable the Lebesgue criterion is, use Riemann’s Condition (Theorem 5.25) to prove that if $f$ is Riemann integrable over $[a, b]$, then $f$ is Riemann integrable over $[a, m]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler. Hint. Use a partition $P$ with $U(f, P) - L(f, P) < \varepsilon$.

8-18. Prove Corollary 8.19.

8-19. Use Riemann’s Condition (Theorem 5.25) to prove that if $f$ and $g$ are Riemann integrable over $[a, b]$, then $fg$ is Riemann integrable over $[a, b]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler.

8-20. Determine which result from this section that was used in the proof of Theorem 8.17 was not available in Section 5.3. (This prevented us from placing Theorem 8.17 right after the Antiderivative Form of the Fundamental Theorem of Calculus.)

8-21. Mean Value Theorem for the Integral. Let the function $f : [a, b] \to \mathbb{R}$ be continuous and let the function $g : [a, b] \to [0, \infty)$ be nonnegative and Riemann integrable. Prove that there is a $c \in [a, b]$ so that the integral satisfies $\int_a^b f(x) g(x)\,dx = f(c) \int_a^b g(x)\,dx$.

8-22. Let $g : [a, b] \to [0, \infty)$ be Riemann integrable and let $f : [a, b] \to \mathbb{R}$ be nondecreasing. Prove that there is a $c \in [a, b]$ so that $\int_a^b f(x) g(x)\,dx = f(a) \int_a^c g(x)\,dx + f(b) \int_c^b g(x)\,dx$. You may use that nondecreasing functions are Riemann integrable (see Exercise 8-11).

8-23. Integration by Parts. Let $[a, b] \subset (c, d)$ and let $F, g : (c, d) \to \mathbb{R}$ be differentiable functions with derivatives $f$ and $g'$ that are Riemann integrable on $[a, b]$. Prove that the integral satisfies $\int_a^b f(x) g(x)\,dx = F(b) g(b) - F(a) g(a) - \int_a^b F(x) g'(x)\,dx$.

8-24. Integration by Substitution. Let $[a, b] \subset (c, d)$, let $g : (c, d) \to \mathbb{R}$ be differentiable, let its derivative $g'$ be Riemann integrable on $[a, b]$, let $(u, v) \supseteq g\big[[a, b]\big]$ and let $F : (u, v) \to \mathbb{R}$ be differentiable with continuous derivative $f$. Prove that $\int_a^b f(g(x)) g'(x)\,dx = F(g(b)) - F(g(a))$.

8-25. Let $a > 0$. A function $f : [-a, a] \to \mathbb{R}$ is called even iff $f(x) = f(-x)$ for all $x \in [-a, a]$. A function $f : [-a, a] \to \mathbb{R}$ is called odd iff $f(x) = -f(-x)$ for all $x \in [-a, a]$.
(a) Prove that if $f$ is even and Riemann integrable, then $\int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx$.
(b) Prove that if $f$ is odd and Riemann integrable, then $\int_{-a}^{a} f(x)\,dx = 0$.
(c) Prove that any function $f : [-a, a] \to \mathbb{R}$ is the sum of an even and an odd function. Hint. $\frac{f(x) + f(-x)}{2}$ is even.

8-26. Compute the derivative. (Should the integrands be unknown, simply note that they are continuous.)
(a) $\frac{d}{dx} \int_{\ldots}^{\ldots} e^{t^2}\,dt$

8-27. Let $f : [a, b] \to \mathbb{R}$ be continuous and let $l, u : (c, d) \to [a, b]$ be differentiable. Prove that $\frac{d}{dx} \left( \int_{l(x)}^{u(x)} f(t)\,dt \right) = f(u(x)) u'(x) - f(l(x)) l'(x)$.

8-28. Construct a function $f : [0, 1] \to \mathbb{R}$ so that $|f|$ is Riemann integrable and $f$ is not Riemann integrable.

8-29. A function $f : [a, b] \to \mathbb{R}$ is called absolutely continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all sequences $(a_1, b_1), \ldots, (a_n, b_n)$ of pairwise disjoint open intervals the inequality $\sum_{i=1}^{n} (b_i - a_i) < \delta$ implies $\sum_{i=1}^{n} |f(b_i) - f(a_i)| < \varepsilon$.
(a) Let $f : [a, b] \to \mathbb{R}$ be a Riemann integrable function.
Prove that the function $G : [a,b] \to \mathbb{R}$ defined by $G(x) := \int_a^x f(t)\,dt$ is absolutely continuous on $[a,b]$.
(b) Prove that every absolutely continuous function is uniformly continuous.
(c) Prove that $f(x) = \frac{1}{x}$ is continuous, but not absolutely continuous on $(0,1]$.

8-30. Results for Riemann-Stieltjes integrals. Let $g : [a,b] \to \mathbb{R}$ be nondecreasing, and let the functions $f, h : [a,b] \to \mathbb{R}$ be bounded and Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$.
(a) Prove that $|f|$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$ and that the triangular inequality $\left| \int_a^b f\,dg \right| \le \int_a^b |f|\,dg$ holds.
(b) Prove that for all $m \in [a,b]$ the function $f$ is Riemann-Stieltjes integrable with respect to $g$ over $[a,m]$ and over $[m,b]$ and that $\int_a^b f\,dg = \int_a^m f\,dg + \int_m^b f\,dg$.
(c) Prove that the product $fh$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$.
Hint (for all). Use the Riemann Condition for Riemann-Stieltjes integrals (see Exercise 5-27).

8.4 Improper Riemann Integrals

The Riemann integral allows us to compute the "area" under bounded functions defined on closed and bounded intervals. Sometimes we are interested in the area under functions that are defined on infinite intervals or that are unbounded. These areas can be approximated with Riemann integrals.

Definition 8.20 Let $a \in \mathbb{R}$ and let $f : [a,\infty) \to \mathbb{R}$ be such that for all $c > a$ the restriction $f|_{[a,c]}$ is Riemann integrable. If the limit $\lim_{t \to \infty} \int_a^t f(x)\,dx$ exists, it is called the improper Riemann integral of $f$ over $[a,\infty)$ and it is denoted $\int_a^\infty f(x)\,dx$. Improper Riemann integrals for functions $f : (-\infty,b] \to \mathbb{R}$ are defined similarly (Exercise 8-31).

Example 8.21 The p-integral test for integrals over infinite intervals. Let $p > 0$ be rational. Then $f(x) = \frac{1}{x^p}$ is improperly Riemann integrable over $[1,\infty)$ iff $p > 1$.

For $p \neq 1$ and $t > 1$ we have $\int_1^t \frac{1}{x^p}\,dx = \frac{t^{1-p} - 1}{1 - p}$. For $p > 1$ this expression converges to $\frac{1}{p-1}$ as $t \to \infty$, while for $p < 1$ it is unbounded. Finally, we need to show that the improper integral does not exist for $p = 1$.
To do this, note that $f(x) = \frac{1}{x}$ is greater than $\frac{1}{m+1}$ on each interval $[m, m+1)$. Therefore, for all $n \in \mathbb{N}$ we infer $f \ge \sum_{k=1}^n \frac{1}{k+1} \mathbf{1}_{[k,k+1)}$ on $[1, n+1)$, and hence
$$\int_1^{n+1} \frac{1}{x}\,dx \ge \sum_{k=1}^n \frac{1}{k+1}.$$
The latter sum is unbounded, so $\int_1^\infty \frac{1}{x}\,dx$ does not exist. □

Note that because we have not yet defined logarithms, the last argument in Example 8.21 is unavoidable. Similarly, we had to restrict ourselves to rational powers, because powers with arbitrary real exponents have not been defined yet. Of course, the p-integral test ultimately holds for real exponents $p$, and we will not restate it once real powers are defined.

Improper integrals can also be defined for (potentially) unbounded functions.

Definition 8.22 Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : [a,b) \to \mathbb{R}$ be such that for all $c \in (a,b)$ the restriction $f|_{[a,c]}$ is Riemann integrable. If the limit $\lim_{t \to b^-} \int_a^t f(x)\,dx$ exists, it is called the improper Riemann integral of $f$ over $[a,b)$ and it is denoted $\int_a^b f(x)\,dx$. Improper Riemann integrals for functions $f : (a,b] \to \mathbb{R}$ are defined similarly (Exercise 8-32).

For Riemann integrable functions on $[a,b]$, the improper Riemann integral over $[a,b)$ agrees with the Riemann integral over $[a,b]$.

Proposition 8.23 Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then the improper Riemann integral of $f$ over $[a,b)$ exists and it is equal to the Riemann integral of $f$ over $[a,b]$.

Proof. Let $B > 0$ be an upper bound of $|f|$ and let $\varepsilon > 0$. Then for $\delta := \frac{\varepsilon}{B}$ and all $t \in [a,b)$ with $|t - b| < \delta$ we obtain
$$\left| \int_a^b f(x)\,dx - \int_a^t f(x)\,dx \right| = \left| \int_t^b f(x)\,dx \right| \le B(b - t) < \varepsilon.$$ ∎

Riemann integrable functions are not the only functions that are improperly Riemann integrable. Example 8.24 shows that there are unbounded functions for which the improper Riemann integral exists.

Example 8.24 The p-integral test for improper integrals over $(0,1]$. Let $p > 0$ be rational. Then $f(x) = \frac{1}{x^p}$ is improperly Riemann integrable over $(0,1]$ iff $p < 1$.

Mimic the proof for Example 8.21. (Exercise 8-33.) □
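As a numerical illustration of Examples 8.21 and 8.24, the following Python sketch evaluates the integrals in closed form and shows how they behave as the bound moves toward $\infty$ or toward $0$. It is only a sketch under stated assumptions: the names `tail_integral` and `head_integral` are hypothetical, the antiderivative $x^{1-p}/(1-p)$ for $p \neq 1$ is taken for granted, and floating point powers stand in for the rational powers used in the text.

```python
def tail_integral(p: float, t: float) -> float:
    """Integral of x**(-p) over [1, t] for p != 1 (assumed closed form)."""
    return (t ** (1.0 - p) - 1.0) / (1.0 - p)


def head_integral(p: float, t: float) -> float:
    """Integral of x**(-p) over [t, 1] for 0 < t <= 1 and p != 1 (assumed closed form)."""
    return (1.0 - t ** (1.0 - p)) / (1.0 - p)


if __name__ == "__main__":
    # Over [1, t]: the values settle down for p = 2 and keep growing for p = 1/2.
    for t in (1e2, 1e4, 1e6):
        print(t, tail_integral(2.0, t), tail_integral(0.5, t))
    # Over [t, 1]: the roles are reversed as t approaches 0 from the right.
    for t in (1e-2, 1e-4, 1e-6):
        print(t, head_integral(0.5, t), head_integral(2.0, t))
```

The printed values only suggest the behavior of the limits; they do not replace the proofs in the examples.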
Note that the improper integrals $\int_0^1 \frac{1}{x^p}\,dx$ converge for $p < 1$, while the integrals $\int_1^\infty \frac{1}{x^p}\,dx$ converge for $p > 1$. Irrespective of this important difference, similar laws hold for both types of improper integrals, and there also is a Comparison Test that is similar to the Comparison Test for series.

Theorem 8.25 Let $f, g : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be improperly Riemann integrable over $[a,b)$ and let $c \in \mathbb{R}$. Then
1. $f + g$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b (f+g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
2. $cf$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx$.

Proof. Similar to the proof of Theorem 6.4. (Exercise 8-34.) ∎

Theorem 8.26 Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Then $f$ is improperly Riemann integrable over $[a,b)$ if and only if for all $c \in [a,b)$ the function $f$ is improperly Riemann integrable over $[c,b)$, and in this case $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$.

Proof. Exercise 8-35. ∎

Theorem 8.27 Comparison Test for improper integrals. Let $f, g : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be such that $0 \le f \le g$, $f$ is Riemann integrable over every closed interval in $[a,b)$ and $g$ is improperly Riemann integrable over $[a,b)$. Then $f$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.

Proof. The function $F(t) := \int_a^t f(x)\,dx$ is continuous and nondecreasing on $[a,b)$ and it is bounded by $\int_a^b g(x)\,dx$. The reader will show in Exercise 8-36 that $\lim_{t \to b^-} F(t) = \sup \{ F(t) : t \in [a,b) \} \le \int_a^b g(x)\,dx$ to complete the proof. ∎

Theorem 8.28 Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over every closed interval in $[a,b)$. If $|f|$ is improperly Riemann integrable over $[a,b)$, then $f$ is improperly Riemann integrable over $[a,b)$ and the triangular inequality $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$ holds.
Proof. Exercise 8-37. ∎

Finally, we should note that the occurrence of series in Example 8.21 is not an accident. The Integral Test connects the convergence of certain series to the convergence of improper integrals over infinite intervals.

Theorem 8.29 Integral Test. Let $f : [1,\infty) \to [0,\infty)$ be a bounded nonincreasing nonnegative function. Then the series $\sum_{j=1}^\infty f(j)$ converges iff the improper integral $\int_1^\infty f(x)\,dx$ converges (also see Figure 21).

Figure 21: In the integral test, the improper integral of a nonincreasing function is related to the series that give the Riemann sums with left and right endpoint evaluations for the partition with step length 1. The integral test says that for an improperly integrable function the series obtained by right endpoint evaluation cannot be infinite (a), while for a function that is not improperly integrable the series obtained by left endpoint evaluation cannot be finite (b).

Proof. Throughout the proof let $g(x) = \sum_{j=1}^\infty f(j) \mathbf{1}_{[j,j+1)}(x)$. (For every $x \ge 1$, this sum has at most one nonzero term.)

For "$\Rightarrow$," let $\sum_{j=1}^\infty f(j)$ be convergent. Then $g$ as defined above is improperly Riemann integrable over $[1,\infty)$ with $\int_1^\infty g(x)\,dx = \sum_{j=1}^\infty f(j) < \infty$. Because $0 \le f \le g$, the Comparison Test for improper integrals implies that $\int_1^\infty f(x)\,dx$ converges.

Conversely, for "$\Leftarrow$," let $\int_1^\infty f(x)\,dx$ be convergent. Then the improper integral $\int_2^\infty f(x-1)\,dx = \int_1^\infty f(x)\,dx$ converges. Because $g(x) \le f(x-1)$ for all $x \ge 2$, by the Comparison Test for improper integrals the integral $\int_2^\infty g(x)\,dx$ converges, which means the series $\sum_{j=2}^\infty f(j)$ converges. ∎

The connections and similarities between integrals over infinite intervals and series need to be considered with caution. For example, Exercise 8-38 shows that there is no Limit Test for improper integrals over infinite intervals.
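The comparison behind the Integral Test can be checked numerically for a concrete nonincreasing function. The Python sketch below is illustrative only: the helper `integral_test_bounds` and the choice $f(x) = 1/x^2$ with antiderivative $-1/x$ are assumptions made for this example, not part of the text. For each $n$ it returns the right endpoint sum $\sum_{j=2}^{n} f(j)$, the integral $\int_1^n f(x)\,dx$ and the left endpoint sum $\sum_{j=1}^{n-1} f(j)$, which should appear in nondecreasing order.

```python
def integral_test_bounds(f, antiderivative, n):
    """Return (right_sum, integral, left_sum) for a nonincreasing f on [1, n].

    `antiderivative` is assumed to be an antiderivative of f, so that the
    integral over [1, n] equals antiderivative(n) - antiderivative(1).
    """
    right_sum = sum(f(j) for j in range(2, n + 1))  # right endpoint evaluation
    left_sum = sum(f(j) for j in range(1, n))       # left endpoint evaluation
    integral = antiderivative(n) - antiderivative(1)
    return right_sum, integral, left_sum


def inverse_square(x):
    """Example function f(x) = 1 / x**2 (an assumption for this sketch)."""
    return 1.0 / (x * x)


def inverse_square_antiderivative(x):
    """Antiderivative -1 / x of the example function."""
    return -1.0 / x


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, integral_test_bounds(inverse_square, inverse_square_antiderivative, n))
```

Both sums and the integral stay bounded as $n$ grows here, which is consistent with the theorem; the sketch says nothing about functions that are not nonincreasing.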
Finally, improper integrals can also be defined for functions on the whole real line and for functions with multiple singularities.

Definition 8.30 Let $a = a_0 < a_1 < a_2 < \cdots < a_{n-1} < a_n = b$ (where $a$ could be $-\infty$ and $b$ could be $\infty$) and let $f : (a,b) \setminus \{a_1, \ldots, a_{n-1}\} \to \mathbb{R}$ be Riemann integrable over all closed subintervals of its domain. We define the improper Riemann integral of $f$ as follows. Let $r_1 < r_2 < \cdots < r_n$ be so that $a_{j-1} < r_j < a_j$ and define
$$\int_a^b f(x)\,dx := \sum_{j=1}^n \left[ \int_{a_{j-1}}^{r_j} f(x)\,dx + \int_{r_j}^{a_j} f(x)\,dx \right]$$
if all the summands exist.

Exercises

8-31. Let $b \in \mathbb{R}$ and let $f : (-\infty,b] \to \mathbb{R}$ be such that for all $a < b$ the restriction $f|_{[a,b]}$ is Riemann integrable. Define the improper Riemann integral $\int_{-\infty}^b f(x)\,dx$ of $f$ over $(-\infty,b]$.

8-32. Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : (a,b] \to \mathbb{R}$ be such that for all $c \in (a,b)$ the restriction $f|_{[c,b]}$ is Riemann integrable. Define the improper Riemann integral $\int_a^b f(x)\,dx$ of $f$ over $(a,b]$.

8-33. Prove the claim in Example 8.24.

8-34. Prove Theorem 8.25.

8-35. Prove Theorem 8.26.

8-36. Finish the proof of Theorem 8.27 by proving that $\lim_{t \to b^-} F(t) = \sup \{ F(t) : t \in [a,b) \}$.

8-37. Prove Theorem 8.28. Hint. $0 \le f + |f| \le 2|f|$.

8-38. Construct a function $f : [1,\infty) \to [0,1]$ that is improperly Riemann integrable, but does not converge to zero as $x \to \infty$. Hint. Triangles of height 1 and area $\frac{1}{2^n}$.

8-39. Cauchy Criterion for improper Riemann integrability. Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Prove that $f$ is improperly Riemann integrable over $[a,b)$ iff for all $\varepsilon > 0$ there is an $M \in [a,b)$, so that for all $c, d \in (M,b)$ we have $\left| \int_c^d f(x)\,dx \right| < \varepsilon$.

8-40. Limit Comparison Test for improper integrals. Let $f, g : [a,b) \to [0,\infty)$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Prove that if $\lim_{x \to b^-} \frac{f(x)}{g(x)} = K > 0$, then $\int_a^b f(x)\,dx$ converges iff $\int_a^b g(x)\,dx$ converges. Hint. Close to $b$ we have $g(x)(K - \varepsilon) \le f(x) \le g(x)(K + \varepsilon)$.
8-41. Prove that the function $f : (a,b] \to \mathbb{R}$ is improperly Riemann integrable over $(a,b]$ iff the function … is improperly Riemann integrable over $[\ldots, \infty)$ and that in this case the integrals are equal.

Chapter 9

The Lebesgue Integral

The geometric idea behind integration is to approximate the area under the graph of a function with areas that are easier to compute. In the Riemann integral, we partition the x-axis and erect a rectangle over each partition interval to approximate the area under the graph. However, Lebesgue's criterion for Riemann integrability shows that geometric rectangles will not approximate the area well if the function "oscillates" too much. That is, if the differences between the possible choices for the heights of the rectangles do not shrink to zero, then we cannot uniquely identify the area. Equivalently, in the Darboux formulation (see Section 5.4) excessive oscillations force the upper and lower approximations of the area with rectangles to stay a finite distance apart.

If we change our point of view and partition the y-axis instead of the x-axis, the problem with different choices for the height goes away (see Figure 22). For a set $S = \{ x \in [a,b] : y_{i-1} \le f(x) < y_i \}$, all sensible values for the height of a generalized rectangle with base $S$ are between $y_{i-1}$ and $y_i$. Because the difference between these values can be made small, oscillations are not an issue. However, this approach requires that the bases of our generalized rectangles are no longer intervals. The area of such a generalized rectangle will be the Lebesgue measure of the base set times the height of the rectangle. In this fashion, we retain all benefits of the geometric motivation for integration, while being able to integrate many more functions.¹

This chapter introduces the fundamentals of Lebesgue integration. These fundamentals will be revisited in Chapter 14 when we generalize our work to arbitrary measure spaces.
Our presentation is designed to readily translate to the more abstract setting of Chapter 14.

¹ The Lebesgue integral also remedies a more abstract problem with the Riemann integral. Spaces of Riemann integrable functions are usually not complete (see Exercise 16-15d), while spaces of Lebesgue integrable functions are (see Theorem 16.19). Completeness is such a fundamental abstract property that this may be the main reason why the Lebesgue integral is preferred.

Figure 22: The idea behind the Lebesgue integral is to partition the y-axis instead of the x-axis and to approximate the area with simple functions. This figure shows that this can lead to "scattered" base sets on the x-axis, which we will treat first in this chapter. The proof of Theorem 9.19 uses the partition of the y-axis into intervals with dyadic rational endpoints.

Before we start, it is time to extend our arithmetic from the real numbers to the real numbers with $\infty$ and $-\infty$ included. This is sometimes called the extended real number system. In the extended real number system, every set has a supremum. This will make our definitions of measures and of the Lebesgue integral simpler, because we will not need to explicitly distinguish between bounded and unbounded sets.

Proposition 9.1 In the extended real number system $[-\infty,\infty] := \mathbb{R} \cup \{\infty, -\infty\}$, $\infty$ is the greatest element and $-\infty$ is the smallest element. Consequently, every set has an infimum and a supremum.

Proof. Let $A \subseteq [-\infty,\infty]$. The supremum of $\{-\infty\}$ is $-\infty$, and the supremum of $\emptyset$ also is $-\infty$, so we can assume that $A \neq \emptyset, \{-\infty\}$. If $A$ is bounded above by a real number, then $A \cap \mathbb{R} \neq \emptyset$ has a supremum in the real numbers, which is also the supremum of $A$ in $[-\infty,\infty]$. If $A$ is not bounded above by a real number, then $\sup(A) = \infty$. Infima are treated similarly. ∎
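For finite collections, the boundary cases in the proof of Proposition 9.1 can be mirrored in a few lines of Python. This is a minimal sketch, not a model of the full statement: it assumes that floating point numbers together with `math.inf` and `-math.inf` stand in for $[-\infty,\infty]$, it only handles finite collections, and the names `ext_sup` and `ext_inf` are hypothetical.

```python
import math


def ext_sup(values):
    """Supremum in [-inf, inf] of a finite collection of extended reals.

    The empty collection gets -inf, matching the convention in the proof.
    """
    return max(values, default=-math.inf)


def ext_inf(values):
    """Infimum in [-inf, inf] of a finite collection; the empty one gets inf."""
    return min(values, default=math.inf)


if __name__ == "__main__":
    print(ext_sup([]), ext_sup([-math.inf]), ext_sup([1.0, 2.5, math.inf]))
    print(ext_inf([]), ext_inf([3.0, -1.0]))
```

The point of the sketch is that no case distinction between "bounded" and "unbounded" is needed once the two infinite values are available.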
Proposition 9.1 assures that from here on, we can take suprema and infima of sets of numbers fairly indiscriminately, as long as we know how to handle infinite values algebraically. Arithmetic on $[-\infty,\infty]$ is the same as on $\mathbb{R}$ with the additional conventions of Definition 9.2. These conventions are inspired by the corresponding limit laws in Theorems 2.44 and 2.46.

Definition 9.2 Arithmetic involving $\infty$. Let $c \in \mathbb{R}$. Then
$$c - \infty = -\infty, \qquad c + \infty = \infty,$$
$$c \cdot \infty = \begin{cases} \infty; & \text{if } c > 0, \\ -\infty; & \text{if } c < 0, \\ \text{undefined}; & \text{if } c = 0, \end{cases} \qquad c \cdot (-\infty) = \begin{cases} -\infty; & \text{if } c > 0, \\ \infty; & \text{if } c < 0, \\ \text{undefined}; & \text{if } c = 0, \end{cases}$$
$$\frac{c}{\infty} = 0, \qquad \frac{c}{-\infty} = 0, \qquad \infty \cdot (-\infty) = -\infty, \qquad \infty \cdot \infty = \infty, \qquad (-\infty) \cdot (-\infty) = \infty.$$

All other attempts at "arithmetic with infinity" lead to what is called indeterminate forms. For these, the result can be any number, and hence rules of arithmetic cannot be stated. We will consider indeterminate forms in Section 12.3.

9.1 Lebesgue Measurable Sets

We will work with generalized rectangles whose bases are no longer intervals. Therefore we need to measure the size of sets that are more complicated than rectangles. Outer Lebesgue measure is a reliable upper bound for the (one-dimensional) "volume" of a set. For all examples we have seen, it gives the right "volume." Hence, we can (and do) consider it the right way to measure the "outer volume" of a set. Unfortunately, there are complications. It is possible to split a set $T$ into two sets so that the outer Lebesgue measures of the two sets add up to more than the outer Lebesgue measure of $T$. If we were to involve such pathological sets in a definition of integration, the integral would not even be a reliable measure of the area of generalized rectangles. It would make no sense if the total "length" of the base depended on how we split the base. The definition of Lebesgue measurable sets is designed to safeguard exactly against this problem (see Theorem 9.11).
Because all our sets are subsets of $\mathbb{R}$ (and later of a measure space $M$), we introduce abbreviated notation for the complement.

Notation 9.3 When there is one underlying set $X$ that contains all sets that we currently investigate (as is the case in measure theory) and $S \subseteq X$, then we denote the complement of $S$ in $X$ also as $S' := X \setminus S$. □

Definition 9.4 A subset $S \subseteq \mathbb{R}$ is called Lebesgue measurable iff for all $T \subseteq \mathbb{R}$ the equality $\lambda(T) = \lambda(S \cap T) + \lambda(S' \cap T)$ holds. We will also call the set $T$ a test set. We denote the set of Lebesgue measurable subsets of $\mathbb{R}$ by $\Sigma_\lambda$.

The existence of non-Lebesgue measurable sets, which would cause the above-mentioned problems, is equivalent to the Axiom of Choice. That is, whether or not non-Lebesgue measurable sets exist depends on what axiomatic system of set theory is used. In practical terms it means that non-Lebesgue measurable sets do not occur in physical phenomena. Hence, from an applied point of view we need not be too concerned with nonmeasurable sets and we will not consider them any further in this text. Exercise 9-7 illustrates the problems mentioned above with a simpler measure of "size," the Jordan content, for which even simple sets can behave badly. The construction is more complicated for Lebesgue measure and the interested reader can find such constructions in [14] and [29].

For the remainder of this section, we explore the properties of the set of Lebesgue measurable subsets of $\mathbb{R}$.

Definition 9.5 Let $S \subseteq \mathbb{R}$ be a Lebesgue measurable set. Then the outer Lebesgue measure $\lambda(S)$ of $S$ is also called the Lebesgue measure of $S$.

We first note that "half" of the definition of measurability is always satisfied.

Corollary 9.6 For all subsets $S, T \subseteq \mathbb{R}$, we have $\lambda(T) \le \lambda(S \cap T) + \lambda(S' \cap T)$.

Proof. This follows from part 3 of Theorem 8.6 with $A_1 := S \cap T$, $A_2 := S' \cap T$ and $A_n := \emptyset$ for $n \ge 3$. ∎

It is now easy to see that null sets are Lebesgue measurable.
Proposition 9.7 If $\lambda(S) = 0$, then $S$ is Lebesgue measurable.

Proof. Let $S$ be a null set. By part 2 of Theorem 8.6 for all sets $T \subseteq \mathbb{R}$ we obtain $0 \le \lambda(S \cap T) \le \lambda(S) = 0$. Hence, for all subsets $T \subseteq \mathbb{R}$ we have the inequality $\lambda(S \cap T) + \lambda(S' \cap T) \le \lambda(S' \cap T) \le \lambda(T)$, which by Corollary 9.6 is all we need to establish Lebesgue measurability of $S$. ∎

In the following, we establish that certain set theoretical operations preserve measurability. Theorem 9.10 summarizes the most important facts. Although there are more results, the properties listed in Theorem 9.10 suffice for our purposes. Set systems that satisfy these properties are called $\sigma$-algebras (see Definition 14.1). These set systems are fundamental for measure theory.

Lemma 9.8 If $A$ and $B$ are Lebesgue measurable sets, then the intersection $A \cap B$ is Lebesgue measurable.

Proof. Proofs of Lebesgue measurability typically involve the appropriate rewriting of terms and the use of the right test sets. To show that $A \cap B$ is Lebesgue measurable, let $T \subseteq \mathbb{R}$ be any subset of $\mathbb{R}$. Then
$$\begin{aligned}
\lambda\bigl((A \cap B) \cap T\bigr) + \lambda\bigl((A \cap B)' \cap T\bigr)
&= \lambda(A \cap B \cap T) + \lambda\bigl((A' \cup B') \cap T\bigr) \\
&= \lambda(A \cap B \cap T) + \lambda\bigl(A \cap (A' \cup B') \cap T\bigr) + \lambda\bigl(A' \cap (A' \cup B') \cap T\bigr) \\
&= \lambda(B \cap A \cap T) + \lambda(B' \cap A \cap T) + \lambda(A' \cap T) \\
&= \lambda(A \cap T) + \lambda(A' \cap T) \\
&= \lambda(T).
\end{aligned}$$
Because $T$ was an arbitrary subset of $\mathbb{R}$ we have proved that $A \cap B$ is Lebesgue measurable. ∎

The next lemma will be useful in two ways. It is a step toward proving that countable unions of Lebesgue measurable sets are Lebesgue measurable. It also is a step toward proving that if a Lebesgue measurable set consists of countably many pairwise disjoint Lebesgue measurable pieces, then the Lebesgue measure of the set is the sum of the Lebesgue measures of the pieces.

Lemma 9.9 Let $\{A_n\}_{n=1}^\infty$ be a sequence of pairwise disjoint Lebesgue measurable sets. Then the union $\bigcup_{n=1}^\infty A_n$ is Lebesgue measurable and for all $T \subseteq \mathbb{R}$ we have
$$\lambda(T) = \sum_{n=1}^\infty \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^\infty A_n \right)' \cap T \right).$$

Proof. Let $T \subseteq \mathbb{R}$.
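We first prove by induction that for all $k \in \mathbb{N}$ we have that $\lambda(T) = \sum_{n=1}^k \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^k A_n \right)' \cap T \right)$. The base step with $k = 1$ follows from the Lebesgue measurability of $A_1$.

For the induction step $k \to (k+1)$, we can assume that the induction hypothesis $\lambda(T) = \sum_{n=1}^k \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^k A_n \right)' \cap T \right)$ is true, and we need to prove that $\lambda(T) = \sum_{n=1}^{k+1} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k+1} A_n \right)' \cap T \right)$. We start the induction step with the induction hypothesis and use the set $\left( \bigcup_{n=1}^k A_n \right)' \cap T$ as a test set for the Lebesgue measurable set $A_{k+1}$. Because the $A_n$ are pairwise disjoint, $A_{k+1} \subseteq \left( \bigcup_{n=1}^k A_n \right)'$, and hence
$$\begin{aligned}
\lambda(T) &= \sum_{n=1}^k \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^k A_n \right)' \cap T \right) \\
&= \sum_{n=1}^k \lambda(A_n \cap T) + \lambda\left( A_{k+1} \cap \left( \bigcup_{n=1}^k A_n \right)' \cap T \right) + \lambda\left( A_{k+1}' \cap \left( \bigcup_{n=1}^k A_n \right)' \cap T \right) \\
&= \sum_{n=1}^{k+1} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k+1} A_n \right)' \cap T \right),
\end{aligned}$$
which finishes the induction step.

Thus for all $k \in \mathbb{N}$ we have $\lambda(T) \ge \sum_{n=1}^k \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^\infty A_n \right)' \cap T \right)$, and letting $k$ go to infinity we obtain the following.
$$\lambda(T) \ge \sum_{n=1}^\infty \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^\infty A_n \right)' \cap T \right) \ge \lambda\left( \left( \bigcup_{n=1}^\infty A_n \right) \cap T \right) + \lambda\left( \left( \bigcup_{n=1}^\infty A_n \right)' \cap T \right) \ge \lambda(T).$$
The above establishes measurability of the union as well as the desired equality. ∎

Theorem 9.10 The set $\Sigma_\lambda$ of Lebesgue measurable subsets of $\mathbb{R}$ has the following properties.
1. $\emptyset \in \Sigma_\lambda$.
2. If $S \in \Sigma_\lambda$, then $S' \in \Sigma_\lambda$.
3. If $A_n \in \Sigma_\lambda$ for all $n \in \mathbb{N}$, then $\bigcup_{n=1}^\infty A_n \in \Sigma_\lambda$.

Proof. Parts 1 and 2 are left to the reader as Exercises 9-1a and 9-1b. For part 3, let $A_n \in \Sigma_\lambda$ for all $n \in \mathbb{N}$ and let $T \subseteq \mathbb{R}$. Define $B_1 := A_1$ and then inductively for $n \in \mathbb{N}$ set $B_{n+1} := A_{n+1} \cap \left( \bigcup_{i=1}^n B_i \right)' = A_{n+1} \cap \bigcap_{i=1}^n B_i'$. An easy induction shows that for all $n \in \mathbb{N}$ the set $B_n$ is Lebesgue measurable (use part 2 and Lemma 9.8), that $B_1, \ldots, B_n$ are pairwise disjoint and that $\bigcup_{i=1}^n A_i = \bigcup_{i=1}^n B_i$ (see Exercise 9-1c). But then $\bigcup_{i=1}^\infty A_i = \bigcup_{i=1}^\infty B_i$ (see Exercise 9-1d) and the latter set is Lebesgue measurable by Lemma 9.9. ∎

Now that we have established that countable unions preserve Lebesgue measurability, it is reassuring to note that Lebesgue measure is additive for countable unions of pairwise disjoint Lebesgue measurable sets. This is exactly what we expect from a sensible measure: the size of a whole set is the sum of the sizes of its pairwise disjoint parts.

Theorem 9.11 Let $\{A_n\}_{n=1}^\infty$ be a sequence of pairwise disjoint Lebesgue measurable sets. Then $\bigcup_{n=1}^\infty A_n$ is Lebesgue measurable and $\lambda\left( \bigcup_{n=1}^\infty A_n \right) = \sum_{n=1}^\infty \lambda(A_n)$.

Proof.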
The union is Lebesgue measurable by Lemma 9.9 and if we apply Lemma 9.9 to $T := \bigcup_{n=1}^\infty A_n$, we obtain
$$\lambda\left( \bigcup_{n=1}^\infty A_n \right) = \sum_{n=1}^\infty \lambda\left( A_n \cap \bigcup_{m=1}^\infty A_m \right) + \lambda\left( \left( \bigcup_{n=1}^\infty A_n \right)' \cap \bigcup_{m=1}^\infty A_m \right) = \sum_{n=1}^\infty \lambda(A_n),$$
which proves the result. ∎

The next result is a nice application of the countable additivity we just proved and it will also be needed when we show that sums of Lebesgue integrable functions are Lebesgue integrable.

Theorem 9.12 Let $\{A_n\}_{n=1}^\infty$ be a sequence of Lebesgue measurable subsets of $\mathbb{R}$ so that $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$. Then $\lambda\left( \bigcup_{n=1}^\infty A_n \right) = \lim_{n \to \infty} \lambda(A_n)$.

Proof. Set $B_1 := A_1$ and for all $n \in \mathbb{N}$ define $B_{n+1} := A_{n+1} \setminus A_n$. Then for all $N \in \mathbb{N}$ the equality $\bigcup_{n=1}^N B_n = \bigcup_{n=1}^N A_n = A_N$ holds, which means $\bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n$. Because the sets $B_n$ are pairwise disjoint, Theorem 9.11 yields
$$\lambda\left( \bigcup_{n=1}^\infty A_n \right) = \lambda\left( \bigcup_{n=1}^\infty B_n \right) = \sum_{n=1}^\infty \lambda(B_n) = \lim_{N \to \infty} \sum_{n=1}^N \lambda(B_n) = \lim_{N \to \infty} \lambda(A_N).$$ ∎

The whole idea of Lebesgue measurability is only useful if it indeed allows us to extend the idea of Riemann integrability. For that to happen, intervals must be Lebesgue measurable. We have delayed this result to the end of the section, because now we can use some of the machinery built so far.

Proposition 9.13 Intervals are Lebesgue measurable and the Lebesgue measure of an interval is its length.

Proof. Because singletons are null sets and thus Lebesgue measurable, and because countable unions of Lebesgue measurable sets are Lebesgue measurable, intervals are shown to be Lebesgue measurable if we can prove that open intervals of finite length are Lebesgue measurable.

So let $A = (a,b)$ be an open interval of finite length and let $T \subseteq \mathbb{R}$. By Corollary 9.6, we only need to prove $\lambda(T) \ge \lambda(A \cap T) + \lambda(A' \cap T)$. The inequality is trivial if $\lambda(T) = \infty$, so we only need to prove the inequality if $\lambda(T) < \infty$. Let $T \subseteq \mathbb{R}$ be a set of finite outer Lebesgue measure, let $\varepsilon > 0$ and let $\{I_j\}_{j=1}^\infty$ be a family of open intervals with $T \subseteq \bigcup_{j=1}^\infty I_j$ and $\sum_{j=1}^\infty |I_j| \le \lambda(T) + \frac{\varepsilon}{2}$. Then $\mathcal{I} := \{ I_j \cap A : j \in \mathbb{N} \}$ is a countable family of open intervals whose union contains $A \cap T$.
Moreover,
$$\mathcal{O} := \{ I_j \setminus [a,\infty) : j \in \mathbb{N} \} \cup \{ I_j \setminus (-\infty,b] : j \in \mathbb{N} \} \cup \left\{ \left( a - \frac{\varepsilon}{8}, a + \frac{\varepsilon}{8} \right), \left( b - \frac{\varepsilon}{8}, b + \frac{\varepsilon}{8} \right) \right\}$$
is a countable family of open intervals whose union contains $A' \cap T$. Thus
$$\lambda(A \cap T) + \lambda(A' \cap T) \le \sum_{I \in \mathcal{I}} |I| + \sum_{O \in \mathcal{O}} |O| \overset{(*)}{\le} \sum_{j=1}^\infty |I_j| + \frac{\varepsilon}{2} \le \lambda(T) + \varepsilon.$$
Because $\varepsilon$ was arbitrary this proves that $\lambda(T) \ge \lambda(A \cap T) + \lambda(A' \cap T)$ and we have proved that $A$ is Lebesgue measurable.

Regarding the Lebesgue measure of intervals, by Proposition 8.5 for closed intervals in $\mathbb{R}$ we have $\lambda([a,b]) = b - a$. For closed, unbounded intervals of the form $[a,\infty)$, we infer for all $b > a$ that $\lambda([a,\infty)) \ge \lambda([a,b]) = b - a$, and hence $\lambda([a,\infty)) = \infty$. Intervals of the form $(-\infty,b]$ are handled similarly. For open and half-open intervals, note that the singleton sets consisting of the endpoints have measure zero. This means (see Exercise 9-2) that adding or removing these points does not affect the Lebesgue measurability of the set or its Lebesgue measure. ∎

Standard Proof Technique 9.14 Note that the inequality marked with (*) in the proof of Proposition 9.13 can actually be shown to be an equality. This is not necessary because we only need the inequality. In complicated estimates, it can happen that an inequality sign is put between quantities that are actually equal. Usually, this happens when the equality would not have helped in the proof (as in the example just mentioned) and when the writer did not want (the reader) to spend extra effort to think about why the quantities may be equal. □

Exercises

9-1. Finish the proof of Theorem 9.10. That is,
(a) Prove that $\emptyset \in \Sigma_\lambda$.
(b) Prove that if $S \in \Sigma_\lambda$, then $S' \in \Sigma_\lambda$.
(c) Perform the induction mentioned in the proof of part 3.
(d) Prove that if $\{A_i\}_{i=1}^\infty$ and $\{B_i\}_{i=1}^\infty$ are countable families of sets so that for all $n \in \mathbb{N}$ we have $\bigcup_{i=1}^n A_i = \bigcup_{i=1}^n B_i$, then $\bigcup_{i=1}^\infty A_i = \bigcup_{i=1}^\infty B_i$.

9-2. Let $A$ be a Lebesgue measurable set and let $N$ be a null set. Prove that $A \setminus N$ is Lebesgue measurable and that $\lambda(A \setminus N) = \lambda(A)$.
Hint. Use Theorem 9.10.

9-3. Let $A, B \subseteq \mathbb{R}$ be Lebesgue measurable sets. Prove that $A \setminus B$ is Lebesgue measurable.

9-4. Let $\{A_n\}_{n=1}^N$ be a finite sequence of pairwise disjoint Lebesgue measurable sets. Prove that the union $\bigcup_{n=1}^N A_n$ is Lebesgue measurable with $\lambda\left( \bigcup_{n=1}^N A_n \right) = \sum_{n=1}^N \lambda(A_n)$.

9-5. Let $\{A_n\}_{n=1}^\infty$ be a sequence of Lebesgue measurable sets. Prove that $\bigcap_{n=1}^\infty A_n$ is Lebesgue measurable and that for all $k \in \mathbb{N}$ we have $\lambda\left( \bigcap_{n=1}^\infty A_n \right) \le \lambda(A_k)$.

9-6. Let $C^Q$ be a Cantor set. Prove that $C^Q$ is Lebesgue measurable. Hint. Use Exercise 9-5.

9-7. For all $S \subseteq [0,1]$, let $J(S) := \inf \left\{ \sum_{j=1}^n \ldots \right\}$ be the Jordan content of $S$. Prove that $J([0,1] \cap \mathbb{Q}) = 1$ and $J([0,1] \setminus \mathbb{Q}) = 1$.

9-8. Let $A \subseteq B \subseteq \mathbb{R}$ and let $A$ (but not necessarily $B$) be Lebesgue measurable. Prove that $\lambda(B) = \lambda(A) + \lambda(B \setminus A)$.

9.2 Lebesgue Measurable Functions

We now introduce the functions for which the Lebesgue integral can be defined. By first defining what (potentially) integrable functions should look like, we avoid the Riemann integral's conceptual complications that are characterized in Lebesgue's criterion (see Theorem 8.12). Existence or nonexistence of the Lebesgue integral, defined in Section 9.3, will then merely be a question of whether there is too much area under the graph of the function. We will revisit the original motivation of partitioning the y-axis after the proof of Theorem 9.19.

To approximate areas, in Lebesgue integration indicator functions take the place of rectangles. Recall that by Definition 5.9 the indicator function of a set $S \subseteq \mathbb{R}$ is
$$\mathbf{1}_S(x) := \begin{cases} 1; & \text{for } x \in S, \\ 0; & \text{for } x \notin S. \end{cases}$$
Just as Riemann integrable functions can be approximated a.e. with step functions (see Exercise 5-26), Lebesgue integrable functions will be approximated with functions that are constant on measurable sets and which only assume finitely many values.

Definition 9.15 A function $s : \mathbb{R} \to \mathbb{R}$ is called a simple Lebesgue measurable function, or, a simple function, iff there are numbers $a_1, \ldots, a_n \in \mathbb{R}$ and pairwise disjoint Lebesgue measurable sets $A_1, \ldots, A_n \subseteq \mathbb{R}$ so that $s = \sum_{k=1}^n a_k \mathbf{1}_{A_k}$.

For functions $f$ that assume more than finitely many values, we consider the positive and negative parts of $f$ separately.

Definition 9.16 For $f : \mathbb{R} \to [-\infty,\infty]$, we define $f^+(x) := \max\{ f(x), 0 \}$ and $f^-(x) := -\min\{ f(x), 0 \}$ for all $x \in \mathbb{R}$.

Because we will successively approximate measurable functions from below we want to speak of sequences of functions.

Definition 9.17 A family $\{f_n\}_{n \in \mathbb{N}}$ of functions will also be called a sequence of functions, denoted $\{f_n\}_{n=1}^\infty$.

Definition 9.18 A function $f : \mathbb{R} \to [0,\infty]$ is called Lebesgue measurable iff there is a sequence $\{s_n\}_{n=1}^\infty$ of simple functions $s_n : \mathbb{R} \to [0,\infty)$ such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^\infty$ is nondecreasing and $\lim_{n \to \infty} s_n(x) = f(x)$. A function $f : \mathbb{R} \to [-\infty,\infty]$ is called Lebesgue measurable iff $f^+$ and $f^-$ are both Lebesgue measurable.

The key problem in Riemann integration is that for some functions the approximations from above and below will not "meet." Definition 9.18 does not simply circumvent this problem by only focusing on approximations from below. Exercise 9-9 shows that a bounded function (only bounded functions are considered in Riemann integration) is Lebesgue measurable iff it can be approximated from above with simple functions. That is, Definition 9.18 may look biased, but for bounded functions the concept of Lebesgue measurability could also be defined with approximations from above. Because we are also interested in unbounded functions, we choose to work with approximations from below throughout.

Because Lebesgue measurability is a key concept, it is useful to have several equivalent formulations available.

Theorem 9.19 Let $f : \mathbb{R} \to [-\infty,\infty]$ be a function. Then the following are equivalent.
1. The function $f$ is Lebesgue measurable.
2. For all $a \in \mathbb{R}$, the set $\{ x \in \mathbb{R} : f(x) > a \}$ is Lebesgue measurable.
3.
For all a E R,the set {x E R : f (x) 5 a } is Lebesgue measurable. 4. For all a E R,the set {x E R : f (x) < a } is Lebesgue measurable. 5. For all a E R,the set {x E R : f (x) L a } is Lebesgue measurable. Proof. We first prove the result for a function f : R + [0, w]. For the implication “1=+2,” let a E R and let {s,]:=~ be a sequence of simple functions such that for all x E R the sequence {sn(x)}Z1is nondecreasing and converges to f ( x ) . For all x E R,if f ( x ) > a , then for some n E N the inequality s,(x) > a holds. Conversely, if for some n E N we have that s n ( x ) > a , then because { S , ( X ) } ~ is~ nondecreasing we must have f ( x ) > a . This means that u {x 30 {x E R : f ( x ) > a } = E R : s n ( x ) > a } . But each set {x E R : sn(x) > a ) i1=1 is a union of finitely many Lebesgue measurable sets, which means it is Lebesgue measurable. Therefore, as a countable union of Lebesgue measurable sets, the set {x : f ( x ) > a } is Lebesgue measurable. For “2+3,” let a E R.Then {x E R : f ( x ) 5 a } = R \ {x E R : f ( x ) > a } , which is the complement of a Lebesgue measurable set. For “3=+4,” note that for all real numbers a the set {x E R : f ( x ) < a } is equal to the union {x E R :f(x) <a}= u7x E 7 R : f ( x ) 5 a - - and the latter set is k a countable union of Lebesgue measurable sets. For “4+5,” let a E R. Then {x E R : f ( x ) 2 a } = R \ {x E R : f ( x ) < a } , which is the complement of a Lebesgue measurable set. For “5+1,” by Exercise 9-10 the set {x E R : f ( x ) = 00) is Lebesgue measurable. For each n E N,define the simple function k= 1 Then each s, is nonnegative, because for all k , n E N we have To see that each { s n ( x ) } Z lis nondecreasing, let x E R and let n E N. The claim is trivial if f ( x ) = 00, so we can assume f ( x ) < w. If f ( x ) 2 n , then k-1 s , ( x ) = 0 5 s,+l(x). If f ( x ) < n , find k E { 1 , . . . , n2”} so that s,(x) = -. 2” k-1 k 2 ( k - 1) k- 1 - -= s,(x). 
Then $\frac{k-1}{2^n} \le f(x) < \frac{k}{2^n}$, and hence $s_{n+1}(x) \ge \frac{2(k-1)}{2^{n+1}} = \frac{k-1}{2^n} = s_n(x)$. Finally, each $\{s_n(x)\}_{n=1}^{\infty}$ converges to $f(x)$ because, if $f(x) \in [N-1, N)$, then for all $n \ge N$ the inequality $|f(x) - s_n(x)| < \frac{1}{2^n}$ holds, and if $f(x) = \infty$, then $s_n(x) = n$.

Finally, consider $f : \mathbb{R} \to [-\infty, \infty]$. For "$1 \Rightarrow 2$" let $f$ be Lebesgue measurable. Then $f^-$ and $f^+$ are Lebesgue measurable. But then for $a \ge 0$ we have that $\{x \in \mathbb{R} : f(x) > a\} = \{x \in \mathbb{R} : f^+(x) > a\}$, which is Lebesgue measurable, and for $a < 0$ we have $\{x \in \mathbb{R} : f(x) > a\} = \{x \in \mathbb{R} : f^-(x) < -a\}$, which is also Lebesgue measurable. Parts "$2 \Rightarrow 3$," "$3 \Rightarrow 4$," and "$4 \Rightarrow 5$" are similar to what was done for nonnegative functions.

To prove part "$5 \Rightarrow 1$," first note that for all $a > 0$ we have $\{x \in \mathbb{R} : f^+(x) \ge a\} = \{x \in \mathbb{R} : f(x) \ge a\}$, which is Lebesgue measurable, and for all $a \le 0$ we have $\{x \in \mathbb{R} : f^+(x) \ge a\} = \mathbb{R}$, which is also Lebesgue measurable. Hence, $f^+$ is Lebesgue measurable. Considering the negative part $f^-$, for all $a \ge 0$ we have that $\{x \in \mathbb{R} : f^-(x) \le a\} = \{x \in \mathbb{R} : f(x) \ge -a\}$, which is Lebesgue measurable, and for $a < 0$ we have $\{x \in \mathbb{R} : f^-(x) \le a\} = \emptyset$, which is also Lebesgue measurable. Hence, $f^-$ is Lebesgue measurable and because we already proved that $f^+$ is Lebesgue measurable we have proved that $f$ is Lebesgue measurable. ∎

The underlying idea of the proof of part "$5 \Rightarrow 1$" for nonnegative functions $f$ is to partition the interval $[0, n)$ on the $y$-axis into intervals of length $\frac{1}{2^n}$. The proof shows that the area under the functions that are used to approximate $f$ should approximate the area under $f$, which means the idea of partitioning the $y$-axis can lead to a sensible notion of integration.

Once measurable functions are characterized, it is helpful to determine how measurability relates to common algebraic operations.

Theorem 9.20 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
If $f + g$ is defined everywhere, then $f + g$ is Lebesgue measurable. Similarly, if $f - g$ or $f \cdot g$ is defined everywhere, then it is Lebesgue measurable. Moreover, $f^+$, $f^-$ and $|f|$ are Lebesgue measurable.

Proof. To see that $f + g$ is Lebesgue measurable, let $a \in \mathbb{R}$. We will use Theorem 1.36 to show that the set $\{x \in \mathbb{R} : (f+g)(x) < a\}$ is a countable union of Lebesgue measurable sets. If $(f+g)(x) < a$, then there is an $\varepsilon > 0$ so that $(f+g)(x) + 2\varepsilon < a$. By Theorem 1.36, there are rational numbers $r$ and $s$ so that $f(x) < r < f(x) + \varepsilon$ and $g(x) < s < g(x) + \varepsilon$. This means that $f(x) + g(x) < r + s < (f+g)(x) + 2\varepsilon < a$, which proves the containment "$\subseteq$" in the equation below. The containment "$\supseteq$" is trivial.
\[ \{x \in \mathbb{R} : (f+g)(x) < a\} = \bigcup_{\substack{r, s \in \mathbb{Q} \\ r + s < a}} \bigl( \{x \in \mathbb{R} : f(x) < r\} \cap \{x \in \mathbb{R} : g(x) < s\} \bigr). \]
Because the latter set is a countable union of Lebesgue measurable sets, the set $\{x \in \mathbb{R} : (f+g)(x) < a\}$ is Lebesgue measurable. Because $a \in \mathbb{R}$ was arbitrary this means that $f + g$ is a Lebesgue measurable function.

The proofs that $f - g$ and $f \cdot g$ are Lebesgue measurable functions are similar (see Exercise 9-11). The functions $f^+$ and $f^-$ are Lebesgue measurable by Definition 9.18 and the Lebesgue measurability of $|f| = f^+ + f^-$ follows from the Lebesgue measurability of sums of Lebesgue measurable functions. ∎

Exercises

9-9. Prove that a bounded function $f : \mathbb{R} \to [0, \infty)$ is Lebesgue measurable iff there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions $s_n : \mathbb{R} \to \mathbb{R}$ such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nonincreasing and $\lim_{n \to \infty} s_n(x) = f(x)$.
Hint. Mimic part "$5 \Rightarrow 1$" of the proof of Theorem 9.19 for nonnegative functions.

9-10. Let $f : \mathbb{R} \to [-\infty, \infty]$ be a Lebesgue measurable function. Use part 4 of Theorem 9.19 to prove that $\{x \in \mathbb{R} : f(x) = \infty\}$ is Lebesgue measurable.

9-11. Finish the proof of Theorem 9.20. That is,
(a) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f - g$ is defined everywhere, then $f - g$ is Lebesgue measurable.
(b) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f \cdot g$ is defined everywhere, then $f \cdot g$ is Lebesgue measurable.
Hint. This one is complicated because of negative signs. Prove the result first for $f, g \ge 0$, then use $f = f^+ - f^-$ and $g = g^+ - g^-$.

9-12. Prove that the sum of two simple functions is again a simple function.

9-13. Prove that $f : \mathbb{R} \to [-\infty, \infty]$ is Lebesgue measurable iff for any two numbers $a < b$ in $\mathbb{R}$ the set $\{x \in \mathbb{R} : f(x) \in [a, b)\}$ is Lebesgue measurable.

9-14. Use Definition 9.18 to prove that if $f, g : \mathbb{R} \to [0, \infty]$ are Lebesgue measurable functions, then $f + g$ is Lebesgue measurable.

9-15. Let $f, h : \mathbb{R} \to [-\infty, \infty]$ and let $f$ be Lebesgue measurable. Prove that if $f = h$ a.e., then $h$ is Lebesgue measurable.

9-16. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f(x) + g(x)$ is defined almost everywhere, then
\[ (f+g)(x) := \begin{cases} f(x) + g(x); & \text{if } f(x) + g(x) \text{ is defined,} \\ 0; & \text{otherwise,} \end{cases} \]
is Lebesgue measurable.
Hint. Apply Exercise 9-15 to the right auxiliary functions and then use Theorem 9.20.
(b) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f(x) - g(x)$ is defined almost everywhere, then
\[ (f-g)(x) := \begin{cases} f(x) - g(x); & \text{if } f(x) - g(x) \text{ is defined,} \\ 0; & \text{otherwise,} \end{cases} \]
is Lebesgue measurable.
(c) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f(x) g(x)$ is defined almost everywhere, then
\[ (fg)(x) := \begin{cases} f(x) g(x); & \text{if } f(x) g(x) \text{ is defined,} \\ 0; & \text{otherwise,} \end{cases} \]
is Lebesgue measurable.

9-17. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that $\{x \in \mathbb{R} : f(x) = g(x)\}$ is Lebesgue measurable.
(b) Prove that $\{x \in \mathbb{R} : f(x) \le g(x)\}$ is Lebesgue measurable.
(c) Prove that $\{x \in \mathbb{R} : f(x) < g(x)\}$ is Lebesgue measurable.

9-18. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that $\max(f, g)$ (defined pointwise) is Lebesgue measurable.
(b) Prove that $\min(f, g)$ (defined pointwise) is Lebesgue measurable.

9-19. Let $f : \mathbb{R} \to \mathbb{R}$ be a nondecreasing function. Prove that $f$ is Lebesgue measurable.

9.3 Lebesgue Integration

Independent of whether the base is an interval or a potentially more scattered Lebesgue measurable set, the area of a "rectangle" should be the measure of the base times the height. This idea is behind the Lebesgue integral of simple functions.

Definition 9.21 Let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be pairwise disjoint Lebesgue measurable sets, let $a_1, \ldots, a_n \in [0, \infty)$ be nonnegative numbers and let $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$ be a simple function. We define the Lebesgue integral of $s$ by
\[ \int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} \sum_{k=1}^{n} a_k \mathbf{1}_{A_k} \, d\lambda := \sum_{k=1}^{n} a_k \lambda(A_k). \]

By Exercise 9-20a for any given simple function $s$ the value $\sum_{k=1}^{n} a_k \lambda(A_k)$ does not depend on the representation $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$ that was chosen for $s$. Hence, Definition 9.21 is sensible and we can proceed to more general functions. For a more general function, the Lebesgue integral is defined by approximating the area under the function from below with the area under simple functions.

Definition 9.22 Let $f : \mathbb{R} \to [0, \infty]$ be a Lebesgue measurable function. We define the Lebesgue integral of $f$ to be
\[ \int_{\mathbb{R}} f \, d\lambda := \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\} \]
and we will call $f$ Lebesgue integrable iff the supremum is finite. A function $g : \mathbb{R} \to \mathbb{R}$ will be called Lebesgue integrable iff $g^+$ and $g^-$ are both Lebesgue integrable. We set
\[ \int_{\mathbb{R}} g \, d\lambda := \int_{\mathbb{R}} g^+ \, d\lambda - \int_{\mathbb{R}} g^- \, d\lambda \]
and call it the Lebesgue integral of $g$.

Continuing our comparison with the Riemann integral, Exercise 9-21 guarantees that for bounded functions that differ from zero only on a set of finite Lebesgue measure (the Riemann integral is defined for bounded functions on bounded intervals) there also is an approximation from above that will give the value from Definition 9.22.
That is, unlike the Riemann/Darboux integral (see Definition 5.27 and Theorem 5.29), for bounded functions that differ from zero on closed and bounded intervals the Lebesgue integral does not have any problems with an upper and a lower approximation not being equal.

As noted after the definition of Lebesgue measurable sets (see Definition 9.4), nonmeasurable sets are quite hard to come by. Similarly, although we will always need to prove measurability for functions that we want to integrate, nonmeasurable functions are not expected to arise in practical applications. This means that the only possible problem in the definition of the Lebesgue integral is the potential for infinite area under the graph. This is not a problem, because functions whose graphs enclose an infinite area cannot have a finite integral, independent of what notion of integration is used.

Now that we have a sensible notion of integration that (as it ultimately turns out) does not have the weaknesses of the Riemann integral, we can establish some theorems about Lebesgue integrals.

Theorem 9.23 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
1. If $0 \le f \le g$ a.e. and $g$ is Lebesgue integrable, then $f$ is also Lebesgue integrable and $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} g \, d\lambda$.
2. $f$ is Lebesgue integrable iff $|f|$ is Lebesgue integrable and in this case the triangular inequality $\left| \int_{\mathbb{R}} f \, d\lambda \right| \le \int_{\mathbb{R}} |f| \, d\lambda$ holds.
3. If $f \ge 0$, then $\int_{\mathbb{R}} f \, d\lambda = 0$ iff $f = 0$ a.e.

Proof. For part 1, let $N := \{x \in \mathbb{R} : f(x) > g(x)\}$. By hypothesis, $N$ is a null set. Let $s : \mathbb{R} \to [0, \infty)$ be a simple function with $0 \le s \le f$. Then $s \mathbf{1}_{\mathbb{R} \setminus N} \le g$ is a simple function also and $\int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} s \mathbf{1}_{\mathbb{R} \setminus N} \, d\lambda$. Hence,
\begin{align*}
\sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\}
&= \sup \left\{ \int_{\mathbb{R}} s \mathbf{1}_{\mathbb{R} \setminus N} \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\} \\
&\le \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le g \right\}.
\end{align*}
Because $g$ is Lebesgue integrable, the latter supremum is finite, and hence $f$ is Lebesgue integrable.
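Because the suprema are equal to the respective Lebesgue integrals, we conclude that $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} g \, d\lambda$.

For part 2, first note that by Theorem 9.20 the function $|f|$ is Lebesgue measurable. The direction "$\Leftarrow$" of the claim follows straight from part 1, because if $|f|$ is Lebesgue integrable, then $0 \le f^+ \le |f|$ and $0 \le f^- \le |f|$ imply that $f^+$ and $f^-$ are both Lebesgue integrable, which means by Definition 9.22 that $f$ is Lebesgue integrable.

For the direction "$\Rightarrow$," let $f : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable. Then $f^+$ and $f^-$ are Lebesgue integrable. To prove that $|f| = f^+ + f^-$ is Lebesgue integrable let $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$ be a simple function with $0 \le s \le |f|$. Let $P := \{x \in \mathbb{R} : f(x) \ge 0\}$ and $N := \{x \in \mathbb{R} : f(x) < 0\}$. Then $P \cup N = \mathbb{R}$ and $P$ and $N$ are disjoint. Therefore
\[ s^+ := \sum_{k=1}^{n} a_k \mathbf{1}_{A_k \cap P} \quad \text{and} \quad s^- := \sum_{k=1}^{n} a_k \mathbf{1}_{A_k \cap N} \]
are simple functions with $s = s^+ + s^-$, $0 \le s^+ \le f^+$ and $0 \le s^- \le f^-$. But this means that
\[ \int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} s^+ \, d\lambda + \int_{\mathbb{R}} s^- \, d\lambda \le \int_{\mathbb{R}} f^+ \, d\lambda + \int_{\mathbb{R}} f^- \, d\lambda < \infty \]
and thus the supremum in Definition 9.22 is finite for $|f|$. Therefore we conclude that $|f|$ is Lebesgue integrable. Finally, because $f^-, f^+ \le |f|$ we conclude by part 1 that
\[ \left| \int_{\mathbb{R}} f \, d\lambda \right| = \left| \int_{\mathbb{R}} f^+ \, d\lambda - \int_{\mathbb{R}} f^- \, d\lambda \right| \le \max \left\{ \int_{\mathbb{R}} f^+ \, d\lambda, \int_{\mathbb{R}} f^- \, d\lambda \right\} \le \int_{\mathbb{R}} |f| \, d\lambda. \]

For part 3, let $f \ge 0$. First, consider the direction "$\Leftarrow$." Because $f = 0$ a.e., we obtain $0 \le f \le 0$ a.e. and by part 1 we conclude that $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} 0 \, d\lambda = 0$, which means $\int_{\mathbb{R}} f \, d\lambda = 0$.

Conversely, for the direction "$\Rightarrow$" let $\int_{\mathbb{R}} f \, d\lambda = 0$ and suppose for a contradiction that $\lambda(\{x \in \mathbb{R} : f(x) > 0\}) > 0$. Then, because the countable union of null sets is again a null set and $\{x \in \mathbb{R} : f(x) > 0\} = \bigcup_{n=1}^{\infty} \left\{x \in \mathbb{R} : f(x) > \frac{1}{n}\right\}$, there must be an $n \in \mathbb{N}$ so that $\lambda\left(\left\{x \in \mathbb{R} : f(x) > \frac{1}{n}\right\}\right) > 0$. With $s := \frac{1}{n} \mathbf{1}_{\{x \in \mathbb{R} : f(x) > \frac{1}{n}\}}$ we obtain
\[ 0 < \frac{1}{n} \lambda\left(\left\{x \in \mathbb{R} : f(x) > \frac{1}{n}\right\}\right) = \int_{\mathbb{R}} s \, d\lambda \le \int_{\mathbb{R}} f \, d\lambda = 0 \]
by part 1, a contradiction. Therefore we conclude that $f = 0$ a.e. ∎

Exercise 9-15 shows that if a function is equal a.e. to a Lebesgue measurable function, then it must be Lebesgue measurable, too. Part 1 of Theorem 9.23 can be used to show that if two Lebesgue measurable functions are equal a.e., then their Lebesgue integrals must be equal, too (see Exercise 9-26). Basically this means that for integration null sets are insignificant.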
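The insignificance of null sets can be checked by hand for simple functions. The following is a minimal sketch, not part of the formal development: it evaluates the sum from Definition 9.21 for simple functions whose sets $A_k$ are finite unions of bounded intervals, so that $\lambda(A_k)$ is a sum of lengths. The representation as (value, intervals) pairs and the names `measure` and `integral_simple` are assumptions made for this illustration only.

```python
from fractions import Fraction

def measure(intervals):
    """Lebesgue measure of a finite union of pairwise disjoint intervals.

    Each interval is a pair (left, right) of endpoints with left <= right.
    Whether the endpoints belong to the interval does not affect the length;
    a single point {c} is written (c, c) and has measure zero.
    """
    return sum(Fraction(right) - Fraction(left) for left, right in intervals)

def integral_simple(pieces):
    """Sum of a_k * lambda(A_k) as in Definition 9.21.

    `pieces` is a list of pairs (a_k, A_k) with a_k >= 0 and pairwise
    disjoint sets A_k, each given as a list of disjoint intervals.
    """
    return sum(Fraction(a) * measure(A) for a, A in pieces)

half = Fraction(1, 2)

# s equals 2 on [0, 1) and 3 on [1, 3).
s = [(2, [(0, 1)]), (3, [(1, 3)])]

# t agrees with s except at the two points 1/2 and 2, where t equals 100.
# The points are removed from the intervals and listed as a separate set.
t = [
    (2, [(0, half), (half, 1)]),    # [0, 1/2) and (1/2, 1)
    (3, [(1, 2), (2, 3)]),          # [1, 2) and (2, 3)
    (100, [(half, half), (2, 2)]),  # the null set {1/2, 2}
]

print(integral_simple(s))  # 8
print(integral_simple(t))  # 8
```

The two functions differ only on the null set $\{1/2, 2\}$, and the large value 100 contributes $100 \cdot 0 = 0$ to the sum.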
The following definition is therefore sensible because independent of how we extend the function $f$ to all of $\mathbb{R}$, either all extensions are Lebesgue measurable or none of them are (because for all $a > 0$ the set $\{x \in \mathbb{R} : g(x) < a\}$ with $g$ as in Definition 9.24 below differs from $\{x \in \mathbb{R} : f(x) \text{ exists and } f(x) < a\}$ at most by a null set) and either all extensions are Lebesgue integrable with the same Lebesgue integral or none of them are Lebesgue integrable (by part 1 of Theorem 9.23).

Definition 9.24 If the function $f : \mathbb{R} \to [-\infty, \infty]$ is defined a.e. and the function
\[ g(x) := \begin{cases} f(x); & \text{if } f(x) \text{ is defined,} \\ 0; & \text{if } f(x) \text{ is not defined,} \end{cases} \]
is Lebesgue measurable, then we will call $f$ Lebesgue integrable iff $g$ is Lebesgue integrable and we define the Lebesgue integral of $f$ to be $\int_{\mathbb{R}} f \, d\lambda := \int_{\mathbb{R}} g \, d\lambda$.

Theorem 9.25 shows that the Lebesgue integral is well-behaved with respect to the linear operations of multiplying with a real number and addition. Because of Exercise 9-16a and Definition 9.24 and because the set where the sum $f + g$ is undefined is a null set (see Exercise 9-27), we do not need to place any additional hypotheses on the functions in part 2 of Theorem 9.25.

Theorem 9.25 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable functions and let $a \in \mathbb{R}$. Then the following are true:
1. $af$ is Lebesgue integrable and $\int_{\mathbb{R}} af \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda$.
2. $f + g$ is Lebesgue integrable and $\int_{\mathbb{R}} f + g \, d\lambda = \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda$.

Proof. For part 1, note that by Theorem 9.20 with $g(x) = a$ the function $af$ is Lebesgue measurable.
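If $f \ge 0$ and $a \ge 0$, then $af$ is Lebesgue integrable because
\begin{align*}
\int_{\mathbb{R}} af \, d\lambda
&= \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le af \right\} \\
&= \sup \left\{ \int_{\mathbb{R}} as \, d\lambda : s \text{ is a simple function with } 0 \le as \le af \right\} \\
&= a \sup \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\} = a \int_{\mathbb{R}} f \, d\lambda.
\end{align*}
But this means that for any Lebesgue integrable function $f$ and any $a \ge 0$ the functions $(af)^+ = af^+$ and $(af)^- = af^-$ are Lebesgue integrable and
\[ \int_{\mathbb{R}} af \, d\lambda = \int_{\mathbb{R}} (af)^+ \, d\lambda - \int_{\mathbb{R}} (af)^- \, d\lambda = a \int_{\mathbb{R}} f^+ \, d\lambda - a \int_{\mathbb{R}} f^- \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda. \]
Finally, for any Lebesgue integrable function $f$ and any $a < 0$ the functions $(af)^+ = -af^-$ and $(af)^- = -af^+$ are Lebesgue integrable and
\[ \int_{\mathbb{R}} af \, d\lambda = \int_{\mathbb{R}} (af)^+ \, d\lambda - \int_{\mathbb{R}} (af)^- \, d\lambda = -a \int_{\mathbb{R}} f^- \, d\lambda + a \int_{\mathbb{R}} f^+ \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda. \]

For part 2 first note that by Exercise 9-16a, Definition 9.24 and the preceding discussion, we can assume that $f + g$ is defined and finite everywhere and that $f + g$ is Lebesgue measurable. (Simply set $f$ and $g$ equal to zero where the sum is not defined.) Also note that part 2 is easily proved for simple functions (Exercise 9-20b), so we can use the additivity of the Lebesgue integral for simple functions in the following.

We will first prove the result for nonnegative $f$ and $g$. To see that $f + g$ is Lebesgue integrable, suppose for a contradiction that $f + g$ is not Lebesgue integrable. Then for each $n \in \mathbb{N}$ there is a simple function $s_n$ with $0 \le s_n \le f + g$ and $\int_{\mathbb{R}} s_n \, d\lambda > n$. Let
\[ F := \left\{ x \in \mathbb{R} : f(x) \ge \frac{f(x) + g(x)}{2} \right\} \quad \text{and} \quad G := \left\{ x \in \mathbb{R} : g(x) \ge \frac{f(x) + g(x)}{2} \right\}. \]
For each $n \in \mathbb{N}$ one of the inequalities $\int_{\mathbb{R}} \frac{1}{2} s_n \mathbf{1}_F \, d\lambda > \frac{n}{4}$ or $\int_{\mathbb{R}} \frac{1}{2} s_n \mathbf{1}_G \, d\lambda > \frac{n}{4}$ holds. Without loss of generality, assume that $\int_{\mathbb{R}} \frac{1}{2} s_n \mathbf{1}_F \, d\lambda > \frac{n}{4}$ holds for all $n \in \mathbb{N}$. Because $0 \le \frac{1}{2} s_n \mathbf{1}_F \le \frac{f + g}{2} \mathbf{1}_F \le f$ this implies that $f$ is not Lebesgue integrable, which is a contradiction. Hence, $f + g$ must be Lebesgue integrable.

Now if $s_1$ is a simple function with $0 \le s_1 \le f$ and $s_2$ is a simple function with $0 \le s_2 \le g$, then $s_1 + s_2$ is a simple function with $0 \le s_1 + s_2 \le f + g$. This implies
\[ \int_{\mathbb{R}} s_1 \, d\lambda + \int_{\mathbb{R}} s_2 \, d\lambda = \int_{\mathbb{R}} s_1 + s_2 \, d\lambda \le \int_{\mathbb{R}} f + g \, d\lambda \]
and because $s_1, s_2$ were arbitrary, we obtain $\int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda \le \int_{\mathbb{R}} f + g \, d\lambda$.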
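For the reversed inequality, let $C_n := \left\{ x \in \mathbb{R} : \frac{1}{n} \le f(x) + g(x) \le n \right\}$ for each $n \in \mathbb{N}$. We first prove that $\lim_{n \to \infty} \int_{\mathbb{R}} (f+g) \mathbf{1}_{C_n} \, d\lambda = \int_{\mathbb{R}} f + g \, d\lambda$. Let $\varepsilon > 0$ and let $s$ be a simple function so that $0 \le s \le f + g$ and $\int_{\mathbb{R}} s \, d\lambda > \int_{\mathbb{R}} f + g \, d\lambda - \frac{\varepsilon}{2}$. Note that $\bigcup_{n=1}^{\infty} C_n = \{x \in \mathbb{R} : f(x) + g(x) > 0\}$. Hence, by Theorem 9.12 for all measurable subsets $A \subseteq \{x \in \mathbb{R} : f(x) + g(x) > 0\}$ we obtain $\lim_{n \to \infty} \lambda(A \cap C_n) = \lambda(A)$. With $s = \sum_{j=1}^{m} a_j \mathbf{1}_{A_j}$ and all $a_j > 0$ this implies
\[ \lim_{n \to \infty} \int_{\mathbb{R}} s \mathbf{1}_{C_n} \, d\lambda = \lim_{n \to \infty} \sum_{j=1}^{m} a_j \lambda(A_j \cap C_n) = \sum_{j=1}^{m} a_j \lambda(A_j) = \int_{\mathbb{R}} s \, d\lambda. \]
Therefore there is an $n \in \mathbb{N}$ so that the inequality $\int_{\mathbb{R}} s \mathbf{1}_{C_n} \, d\lambda > \int_{\mathbb{R}} s \, d\lambda - \frac{\varepsilon}{2}$ holds. But then $0 \le s \mathbf{1}_{C_n} \le (f+g) \mathbf{1}_{C_n}$, and hence
\[ \int_{\mathbb{R}} (f+g) \mathbf{1}_{C_n} \, d\lambda \ge \int_{\mathbb{R}} s \mathbf{1}_{C_n} \, d\lambda > \int_{\mathbb{R}} s \, d\lambda - \frac{\varepsilon}{2} > \int_{\mathbb{R}} f + g \, d\lambda - \varepsilon. \]
Because $f$ and $g$ are both nonnegative, the sequence $\left\{ \int_{\mathbb{R}} (f+g) \mathbf{1}_{C_n} \, d\lambda \right\}_{n=1}^{\infty}$ is nondecreasing and we conclude that $\lim_{n \to \infty} \int_{\mathbb{R}} (f+g) \mathbf{1}_{C_n} \, d\lambda = \int_{\mathbb{R}} f + g \, d\lambda$.

Now let $\varepsilon > 0$ and let $n \in \mathbb{N}$ be so that $\int_{\mathbb{R}} (f+g) \mathbf{1}_{C_n} \, d\lambda > \int_{\mathbb{R}} f + g \, d\lambda - \frac{\varepsilon}{2}$. Let $\nu := \min \left\{ \frac{1}{2n}, \frac{\varepsilon}{4(\lambda(C_n) + 1)} \right\}$. Because $f$ and $g$ are bounded by $n$ on $C_n$, the proof of part "$5 \Rightarrow 1$" of Theorem 9.19 shows that there are simple functions $s_f$ and $s_g$ so that $0 \le s_f \le f \mathbf{1}_{C_n}$, $0 \le s_g \le g \mathbf{1}_{C_n}$ and for all $x \in \mathbb{R}$ the inequalities $f(x) \mathbf{1}_{C_n}(x) - s_f(x) < \nu$ and $g(x) \mathbf{1}_{C_n}(x) - s_g(x) < \nu$ hold. Therefore, $(f+g) \mathbf{1}_{C_n} \le s_f + s_g + 2\nu \mathbf{1}_{C_n}$ and
\begin{align*}
\int_{\mathbb{R}} f + g \, d\lambda - \frac{\varepsilon}{2}
&< \int_{\mathbb{R}} (f+g) \mathbf{1}_{C_n} \, d\lambda \le \int_{\mathbb{R}} s_f + s_g + 2\nu \mathbf{1}_{C_n} \, d\lambda \\
&= \int_{\mathbb{R}} s_f \, d\lambda + \int_{\mathbb{R}} s_g \, d\lambda + 2\nu \lambda(C_n) \le \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda + \frac{\varepsilon}{2}.
\end{align*}
Because $\varepsilon > 0$ was arbitrary, this proves the additivity of the Lebesgue integral for nonnegative functions.

For not necessarily nonnegative functions, note that because $|f + g| \le |f| + |g|$, the above and part 2 of Theorem 9.23 show that $f + g$ is Lebesgue integrable when $f$ and $g$ are Lebesgue integrable. For the equality of the integrals, first notice that if $f_1, f_2, g_1, g_2$ are nonnegative, integrable and satisfy $f_1 - f_2 = g_1 - g_2$, then via
\[ \int_{\mathbb{R}} f_1 \, d\lambda + \int_{\mathbb{R}} g_2 \, d\lambda = \int_{\mathbb{R}} f_1 + g_2 \, d\lambda = \int_{\mathbb{R}} f_2 + g_1 \, d\lambda = \int_{\mathbb{R}} f_2 \, d\lambda + \int_{\mathbb{R}} g_1 \, d\lambda \]
we can conclude that $\int_{\mathbb{R}} f_1 \, d\lambda - \int_{\mathbb{R}} f_2 \, d\lambda = \int_{\mathbb{R}} g_1 \, d\lambda - \int_{\mathbb{R}} g_2 \, d\lambda$. Therefore with
\[ (f+g)^+ - (f+g)^- = f + g = (f^+ + g^+) - (f^- + g^-) \]
we obtain
\begin{align*}
\int_{\mathbb{R}} f + g \, d\lambda
&= \int_{\mathbb{R}} (f+g)^+ \, d\lambda - \int_{\mathbb{R}} (f+g)^- \, d\lambda = \int_{\mathbb{R}} f^+ + g^+ \, d\lambda - \int_{\mathbb{R}} f^- + g^- \, d\lambda \\
&= \int_{\mathbb{R}} f^+ \, d\lambda + \int_{\mathbb{R}} g^+ \, d\lambda - \int_{\mathbb{R}} f^- \, d\lambda - \int_{\mathbb{R}} g^- \, d\lambda = \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda,
\end{align*}
which completes the proof. ∎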
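The proof above relies on the additivity of the integral for simple functions (Exercise 9-20b), whose hint is to pass to sets on which both functions are constant. The following is a minimal sketch of that idea for the special case of step functions, that is, simple functions whose sets are half-open intervals. The list-based representation and the names `integral_step`, `value_at` and `add_steps` are assumptions made for this illustration; general simple functions would need general measurable sets.

```python
from fractions import Fraction

def integral_step(breaks, values):
    """Integral of the step function equal to values[i] on [breaks[i], breaks[i+1])."""
    return sum(
        Fraction(v) * (Fraction(breaks[i + 1]) - Fraction(breaks[i]))
        for i, v in enumerate(values)
    )

def value_at(breaks, values, x):
    """Value of the step function at x; it is zero outside [breaks[0], breaks[-1])."""
    for i, v in enumerate(values):
        if breaks[i] <= x < breaks[i + 1]:
            return Fraction(v)
    return Fraction(0)

def add_steps(b1, v1, b2, v2):
    """Sum of two step functions, written on the common refinement of both partitions."""
    breaks = sorted(set(b1) | set(b2))
    # On each refined interval both summands are constant, so it suffices
    # to evaluate them at the left endpoint.
    values = [value_at(b1, v1, left) + value_at(b2, v2, left) for left in breaks[:-1]]
    return breaks, values

# s1 = 1 on [0, 2) and 4 on [2, 3);  s2 = 5 on [1, 3) and 2 on [3, 4)
b1, v1 = [0, 2, 3], [1, 4]
b2, v2 = [1, 3, 4], [5, 2]
b, v = add_steps(b1, v1, b2, v2)

lhs = integral_step(b, v)
rhs = integral_step(b1, v1) + integral_step(b2, v2)
print(lhs, rhs)  # 18 18
```

On the refined partition $0 < 1 < 2 < 3 < 4$ the sum takes the values $1, 6, 9, 2$, so both sides equal $18$.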
The approximation of the integral of $f + g$ with integrals of functions $(f+g) \mathbf{1}_{C_n}$ in part 2 of the proof of Theorem 9.25 shows that sequences of functions should be powerful tools. Convergence of sequences of functions is discussed in Chapter 11 and the fundamental limit theorems for (Lebesgue) integration are introduced in Section 14.5. Moreover, Exercise 14-33 gives a more efficient proof of part 2 using limit theorems for integrals.

Exercises

9-20. Integration of simple functions.
(a) Let $s$ be a simple nonnegative function, let $y_1, \ldots, y_m \in (0, \infty)$ be the nonzero values that $s$ assumes, let $a_1, \ldots, a_n \in [0, \infty)$, and let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be pairwise disjoint Lebesgue measurable sets so that $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$. Prove that $\sum_{k=1}^{n} a_k \lambda(A_k) = \sum_{j=1}^{m} y_j \lambda\bigl(s^{-1}(y_j)\bigr)$. (This proves that the integral in Definition 9.21 does not depend on the representation of $s$.)
Hint. Each of the $a_k$ must be a $y_j$ or zero. Group the first sum so that equal values $a_k$ are contiguous and then prove that the union of the corresponding sets $A_k$ is the inverse image of the appropriate $y_j$.
(b) Prove that if $s_1$ and $s_2$ are simple functions, then $\int_{\mathbb{R}} s_1 + s_2 \, d\lambda = \int_{\mathbb{R}} s_1 \, d\lambda + \int_{\mathbb{R}} s_2 \, d\lambda$.
Hint. Find Lebesgue measurable sets $A_1, \ldots, A_n$ so that $s_1$ and $s_2$ are constant on each $A_j$.

9-21. Let $f : \mathbb{R} \to [0, \infty)$ be bounded and Lebesgue measurable so that $\lambda(\{x \in \mathbb{R} : f(x) > 0\})$ is finite. Prove that $f$ is Lebesgue integrable and
\[ \int_{\mathbb{R}} f \, d\lambda = \inf \left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } f \le s \right\}. \]

9-22. Let $f : \mathbb{R} \to [0, \infty]$ be a Lebesgue measurable function. Prove that $f$ is Lebesgue integrable iff the supremum $S := \sup \left\{ \int_{\mathbb{R}} \min(f, n) \mathbf{1}_{[-n, n]} \, d\lambda : n \in \mathbb{N} \right\}$ is finite and that in this case $S$ is the Lebesgue integral of $f$.

9-23. Let $f : \mathbb{R} \to [0, \infty]$ be Lebesgue integrable, let $a > 0$ and let $A \subseteq \mathbb{R}$ be Lebesgue measurable and so that $f - a \mathbf{1}_A \ge 0$. Prove that $\int_{\mathbb{R}} f - a \mathbf{1}_A \, d\lambda = \int_{\mathbb{R}} f \, d\lambda - a \lambda(A)$.

9-24. Construct Lebesgue integrable functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ so that $(f+g)^+ \ne f^+ + g^+$ and $(f+g)^- \ne f^- + g^-$.
9-25. Let $a_1, \ldots, a_n \in \mathbb{R}$, let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be (not necessarily pairwise disjoint) Lebesgue measurable sets and let $f = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$ be a simple function. Prove that $\int_{\mathbb{R}} f \, d\lambda = \sum_{k=1}^{n} a_k \lambda(A_k)$. Then explain how this result differs from the result in Exercise 9-20a and why we could not have used it to prove Exercise 9-20a.
Hint. For the proof of the equation, you may use Theorem 9.25.

9-26. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions so that $f = g$ a.e. Use only Theorem 9.23 to prove that $f$ is Lebesgue integrable iff $g$ is Lebesgue integrable and that in this case we have $\int_{\mathbb{R}} f \, d\lambda = \int_{\mathbb{R}} g \, d\lambda$.

9-27. Let $f : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable. Prove that $\{x \in \mathbb{R} : f(x) = \infty\}$ is a null set.

9-28. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable functions.
(a) Prove that $\max(f, g)$ (defined pointwise) is Lebesgue integrable.
(b) Prove that $\min(f, g)$ (defined pointwise) is Lebesgue integrable.
(c) Prove that $f - g$ is Lebesgue integrable and $\int_{\mathbb{R}} f - g \, d\lambda = \int_{\mathbb{R}} f \, d\lambda - \int_{\mathbb{R}} g \, d\lambda$.

9-29. Let $C^Q$ be a Cantor set. Prove that $\mathbf{1}_{C^Q}$ is Lebesgue integrable.
Hint. Exercise 9-6.

9-30. Prove that the Dirichlet function $f(x) = \mathbf{1}_{\mathbb{Q} \cap [0,1]}$ is Lebesgue integrable.

9.4 Lebesgue Integrals versus Riemann Integrals

We conclude this chapter by establishing the relationship between the Lebesgue integral and regular as well as improper Riemann integrals. The Lebesgue integral truly is an extension of the Riemann integral of bounded functions on closed and bounded intervals. To see this, we first show that Riemann integrable functions are Lebesgue integrable and that for these functions the Lebesgue integral equals the Riemann integral. Theorem 9.26 also shows how to overcome the small nuisance of Riemann integrals being defined on sets $[a, b]$, while the Lebesgue integral is defined on $\mathbb{R}$.
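Theorem 9.26 If $f$ is Riemann integrable over $[a, b]$, then $f_{\mathbb{R}} : \mathbb{R} \to \mathbb{R}$ defined by
\[ f_{\mathbb{R}}(x) := \begin{cases} f(x); & \text{for } x \in [a, b], \\ 0; & \text{otherwise,} \end{cases} \]
is Lebesgue integrable and the Riemann integral of $f$ is equal to the Lebesgue integral of $f_{\mathbb{R}}$. That is, $\int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda = \int_a^b f \, dx$.

Proof. First, let $f \ge 0$. Let $P = \{a = x_0 < \cdots < x_n = b\}$ be a partition of $[a, b]$. With $m_i$ and $M_i$ as in Definition 5.13, let
\[ s_l^P := \sum_{k=1}^{n} m_k \mathbf{1}_{[x_{k-1}, x_k)} + m_n \mathbf{1}_{\{x_n\}} \quad \text{and} \quad s_u^P := \sum_{k=1}^{n} M_k \mathbf{1}_{[x_{k-1}, x_k)} + M_n \mathbf{1}_{\{x_n\}}. \]
Then $0 \le s_l^P \le f_{\mathbb{R}} \le s_u^P$. For all $n \in \mathbb{N}$, consider the partition $P_n := \left\{ a + k \frac{b-a}{2^n} : k = 0, \ldots, 2^n \right\}$. Then $s_l^{P_n} \le s_l^{P_{n+1}}$ for all $n \in \mathbb{N}$ and by Lebesgue's criterion for Riemann integrability (Theorem 8.12) we infer that $\lim_{n \to \infty} s_l^{P_n}(x) = f_{\mathbb{R}}(x)$ a.e. Because $f_{\mathbb{R}}$ and the $s_l^{P_n}$ are zero outside $[a, b]$, by Exercise 9-15 this means that $f_{\mathbb{R}}$ is Lebesgue measurable. Because $f_{\mathbb{R}} \le M \mathbf{1}_{[a,b]}$ for some $M > 0$, we conclude that $f_{\mathbb{R}}$ is Lebesgue integrable. Moreover, for all $n \in \mathbb{N}$ we have
\[ L(f, P_n) = \int_{\mathbb{R}} s_l^{P_n} \, d\lambda \le \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda \le \int_{\mathbb{R}} s_u^{P_n} \, d\lambda = U(f, P_n). \]

For each $n \in \mathbb{N}$, there is an evaluation set $T_n = \left\{ t_1^{(n)}, \ldots, t_{2^n}^{(n)} \right\}$ so that for all integers $k = 1, \ldots, 2^n$ the inequality $f\left(t_k^{(n)}\right) - m_k < \frac{1}{n(b-a)}$ holds. But then we infer $R(f, P_n, T_n) - L(f, P_n) < \frac{1}{n}$. Because $\lim_{n \to \infty} \|P_n\| = 0$ we obtain
\[ \lim_{n \to \infty} L(f, P_n) = \lim_{n \to \infty} R(f, P_n, T_n) = \int_a^b f(x) \, dx. \]
Similarly, we can prove $\lim_{n \to \infty} U(f, P_n) = \int_a^b f(x) \, dx$. But this means
\[ \int_a^b f(x) \, dx = \lim_{n \to \infty} L(f, P_n) \le \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda \le \lim_{n \to \infty} U(f, P_n) = \int_a^b f(x) \, dx, \]
which establishes the desired equality.

For $f$ being an arbitrary Riemann integrable function on $[a, b]$, note that because $f$ is bounded there is a real number $B$ so that $h := f + B \mathbf{1}_{[a,b]} \ge 0$. Then $h_{\mathbb{R}}$ is Lebesgue integrable, so $f_{\mathbb{R}} = h_{\mathbb{R}} - B \mathbf{1}_{[a,b]}$ is Lebesgue integrable and
\[ \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda = \int_{\mathbb{R}} h_{\mathbb{R}} \, d\lambda - \int_{\mathbb{R}} B \mathbf{1}_{[a,b]} \, d\lambda = \int_a^b h \, dx - \int_a^b B \, dx = \int_a^b h - B \, dx = \int_a^b f \, dx. \]
∎

The Lebesgue integral also incorporates parts of the improper Riemann integral as the next result shows. For improperly Riemann integrable functions as in Theorem 9.27, we also often say that the improper Riemann integral converges absolutely.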
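Theorem 9.27 If $f$ is Riemann integrable over every closed subinterval of $[a, b)$ and $|f|$ is improperly Riemann integrable over $[a, b)$ (where $b$ is either a number or infinity), then $f_{\mathbb{R}} : \mathbb{R} \to \mathbb{R}$ defined pointwise by
\[ f_{\mathbb{R}}(x) := \begin{cases} f(x); & \text{for } x \in [a, b), \\ 0; & \text{otherwise,} \end{cases} \]
is Lebesgue integrable and the integrals are equal, that is, $\int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda = \int_a^b f \, dx$. A similar result holds for functions that are improperly Riemann integrable over an interval $(a, b]$ (where $a$ is either a number or negative infinity) or over a set of the form $(a, b) \setminus \{a_1, \ldots, a_n\}$.

Proof. We will prove the result for $b \in (a, \infty)$ and leave the case $b = \infty$ to Exercise 9-31b.

For Lebesgue measurability, first consider the case that $f \ge 0$. For all $n \in \mathbb{N}$, let $P_n := \left\{ a + k \frac{b-a}{2^n} : k = 0, \ldots, 2^n \right\}$ and with $m_i$ and $M_i$ as in Definition 5.13, let
\[ s_l^n := \sum_{k=1}^{2^n - 1} m_k \mathbf{1}_{\left[ a + (k-1) \frac{b-a}{2^n},\, a + k \frac{b-a}{2^n} \right)}. \]
Then for all $n \in \mathbb{N}$ we have $s_l^n \le s_l^{n+1} \le f_{\mathbb{R}}$ and by Lebesgue's criterion for Riemann integrability (Theorem 8.12) for almost all $x \in [a, b)$ we infer that $\lim_{n \to \infty} s_l^n(x) = f(x)$. Because $f_{\mathbb{R}}$ and the $s_l^n$ are zero outside $[a, b)$, by Exercise 9-15 this means that if $f \ge 0$, then $f_{\mathbb{R}}$ is Lebesgue measurable. Applying this result to $f^+$ and $f^-$ separately implies that $f_{\mathbb{R}}$ is Lebesgue measurable regardless of what sign $f$ takes.

Now for all simple functions $s$ with $0 \le s \le |f_{\mathbb{R}}|$ and all $n \in \mathbb{N}$ the inequalities
\[ \int_{\mathbb{R}} s \mathbf{1}_{[a, b - \frac{1}{n}]} \, d\lambda \le \int_a^{b - \frac{1}{n}} |f(x)| \, dx \le \int_a^b |f(x)| \, dx < \infty \]
hold. Because $n \in \mathbb{N}$ was arbitrary we can conclude that for all simple functions $s$ with $0 \le s \le |f_{\mathbb{R}}|$ we have the inequality $\int_{\mathbb{R}} s \, d\lambda \le \int_a^b |f(x)| \, dx < \infty$ (Exercise 9-31a). Hence, $|f_{\mathbb{R}}|$ is Lebesgue integrable, so $f_{\mathbb{R}}$ is Lebesgue integrable and $\int_{\mathbb{R}} |f_{\mathbb{R}}| \, d\lambda \le \int_a^b |f(x)| \, dx < \infty$.

To prove that the two integrals are equal, let $\varepsilon > 0$ and find a $\delta > 0$ so that $\int_{b-\delta}^{b} |f(x)| \, dx < \frac{\varepsilon}{2}$. Then
\begin{align*}
\left| \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda - \int_a^b f \, dx \right|
&\le \left| \int_{\mathbb{R}} f_{\mathbb{R}} \mathbf{1}_{[a, b-\delta]} \, d\lambda - \int_a^{b-\delta} f \, dx \right| + \int_{\mathbb{R}} |f_{\mathbb{R}}| \mathbf{1}_{(b-\delta, b)} \, d\lambda + \int_{b-\delta}^{b} |f(x)| \, dx \\
&\le 0 + 2 \int_{b-\delta}^{b} |f(x)| \, dx < \varepsilon.
\end{align*}
Because $\varepsilon$ was arbitrary the two integrals must be equal. ∎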
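The step functions used for measurability in the proof above can also be inspected numerically. The following is a minimal sketch, assuming the example $f(x) = 1/\sqrt{1-x}$ on $[0, 1)$, which is unbounded near $b = 1$ and whose improper Riemann integral equals $2$. Because this $f$ is nonnegative and nondecreasing, the infimum $m_k$ on each dyadic subinterval is the value at the left endpoint; the function name `lower_step_integral` is hypothetical and the computation uses floating point arithmetic.

```python
import math

def lower_step_integral(f, a, b, n):
    """Integral of a dyadic lower step function on [a, b).

    The interval is cut into 2**n pieces of equal length and the last piece,
    on which f may be unbounded, is left out. The function f is assumed to be
    nonnegative and nondecreasing, so its infimum on each piece is attained
    at the left endpoint.
    """
    h = (b - a) / 2**n
    return sum(f(a + (k - 1) * h) * h for k in range(1, 2**n))

def f(x):
    # Improperly Riemann integrable over [0, 1) with integral 2.
    return 1.0 / math.sqrt(1.0 - x)

values = [lower_step_integral(f, 0.0, 1.0, n) for n in range(1, 15)]
for n, v in enumerate(values, start=1):
    print(n, v)
```

The printed values increase with $n$ and stay below $2$, in line with the step functions being nondecreasing in $n$ and bounded above by $f_{\mathbb{R}}$. The approach to $2$ is slow because the omitted last piece still carries area of order $2^{-n/2}$.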
Note that not every improperly Riemann integrable function is Lebesgue integrable (Exercise 9-34). Moreover, not every Lebesgue integrable function can be integrated with a regular or improper Riemann integral (Exercises 9-29, 9-30, 12-248). Hence, neither integral can formally be replaced by the other. In fact, because cancellations as in Exercise 9-34 are sometimes desired in Lebesgue integration, improper Lebesgue integrals can be defined similar to improper Riemann integrals (Exercise 9-35).

If we consider the earlier development of the Riemann integral in Chapter 5, it would be natural to target the Fundamental Theorem of Calculus as the next big result after the fundamental properties and examples that were presented in this chapter. Unfortunately, although it is quite beautiful, the Fundamental Theorem of Calculus for the Lebesgue integral has a rather technical proof. Exercise 9-36 is the first lemma for this result. Further lemmas are presented in Exercises 10-7, 11-20, 14-36 and 14-38 before the result itself is proved in Exercises 18-6 and 23-8.

Exercises

9-31. Finish the proof of Theorem 9.27.
(a) Prove that if $a, b \in \mathbb{R}$, $a < b$, $M \ge 0$ and $s$ is a nonnegative simple function so that for all $n \in \mathbb{N}$ we have that $\int_{\mathbb{R}} s \mathbf{1}_{[a, b - \frac{1}{n}]} \, d\lambda \le M$, then $\int_{\mathbb{R}} s \mathbf{1}_{[a, b)} \, d\lambda \le M$.
Hint. Recall that simple functions are bounded.
(b) Prove Theorem 9.27 for $b = \infty$.

9-32. Let $f(x) = \frac{1}{x} \mathbf{1}_{(0,1]}$. Prove that $\sqrt{f}$ is Lebesgue integrable while $f$ is not.

9-33. Let $f(x) = \frac{1}{x} \mathbf{1}_{[1,\infty)}$. Prove that $f^2$ is Lebesgue integrable while $f$ is not.

9-34. Let $f(x) := \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} \mathbf{1}_{[n, n+1)}$, where the sum is taken pointwise. Prove that the improper Riemann integral of $f$ over $[1, \infty)$ exists and that $f$ is not Lebesgue integrable.
Hint. Harmonic series and the Alternating Series Test.

9-35. Define the improper Lebesgue integral of a function $f : [a, b) \to \mathbb{R}$, where $b \in \mathbb{R} \cup \{\infty\}$.

9-36.
Use the following steps to prove that if $E \subseteq [a, b]$ and $\mathcal{I}$ is a family of closed subintervals of $[a, b]$ with $E \subseteq \bigcup \mathcal{I}$, then there is a finite, pairwise disjoint subfamily $\mathcal{F} \subseteq \mathcal{I}$ so that $\sum_{I \in \mathcal{F}} |I| \ge \frac{1}{6} \lambda(E)$.
(a) For any closed interval $I$, let $I^*$ be the closed interval with the same midpoint and $|I^*| = 5|I|$. Prove that if $I, J \in \mathcal{I}$, $I \cap J \ne \emptyset$ and $|I| < 2|J|$, then $I \subseteq J^*$.
(b) Let $\delta_0 := \sup \{ |I| : I \in \mathcal{I} \}$, let $I_1 \in \mathcal{I}$ be so that $|I_1| > \frac{\delta_0}{2}$. Assume pairwise disjoint $I_1, \ldots, I_n$ are chosen so that, with $A_n := \bigcup_{j=1}^{n} I_j$, for each $I \in \mathcal{I}$ we have $I \cap A_n = \emptyset$ or $I \subseteq \bigcup_{j=1}^{n} I_j^*$. Prove that if there is an $I \in \mathcal{I}$ with $I \cap A_n = \emptyset$, we can continue the construction.
Hint. Use $\delta_{n+1} := \sup \{ |I| : I \in \mathcal{I}, I \cap A_n = \emptyset \}$.
(c) Prove that, independent of whether the construction in part 9-36b terminates in finitely many steps or not, if $\mathcal{F}$ is the family of intervals constructed in part 9-36b, then $\lambda(E) \le 5 \sum_{I \in \mathcal{F}} |I|$.
(d) Prove that part 9-36c yields the requisite intervals.

Chapter 10

Series of Real Numbers II

The essentials on series were introduced in Chapter 6 to facilitate the "summation of infinitely many numbers." This chapter presents further aspects of the theory of series. Particular attention is given to power series, which are needed to define the transcendental functions in Chapter 12.

10.1 Limits Superior and Inferior

The terms of a series are usually easier to handle than the partial sums. Hence, it would be useful to have convergence criteria based on properties of the terms. The first obstacle for devising such criteria is that sequences obtained from the terms of a series need not converge. The limits superior and inferior defined in this section can be computed for any sequence and they describe the sequence's limiting behavior.

Proposition 10.1 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers that is bounded above. Then the sequence $\{\sup\{a_j : j \ge n\}\}_{n=1}^{\infty}$ converges to a real number or to $-\infty$.

Proof. The sequence of suprema is nonincreasing.
If it is bounded below, then it converges to a real number. If not, then it converges to $-\infty$. ∎

Proposition 10.1 and a simple modification (see Exercise 10-1) show that the following definition is sensible.

Definition 10.2 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers that is bounded above. The limit superior of $\{a_n\}_{n=1}^{\infty}$ is defined to be $\limsup_{n \to \infty} a_n := \lim_{n \to \infty} \sup\{a_j : j \ge n\}$. For sequences that are not bounded above, we say that the limit superior is $\infty$. If $\{a_n\}_{n=1}^{\infty}$ is a sequence of real numbers that is bounded below, then the limit inferior of $\{a_n\}_{n=1}^{\infty}$ is defined to be $\liminf_{n \to \infty} a_n := \lim_{n \to \infty} \inf\{a_j : j \ge n\}$. For sequences that are not bounded below, we say that the limit inferior is $-\infty$.

The relationship between limit superior and limit inferior is the obvious one.

Proposition 10.3 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$. Then $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$.

Proof. Clearly, for all $n \in \mathbb{N}$ we have $\inf\{a_j : j \ge n\} \le \sup\{a_j : j \ge n\}$, which implies $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$. ∎

Exercise 10-3 shows that precise arithmetic with limits inferior and superior is impossible for most operations. At least for negative signs we have a simple "reversal."

Proposition 10.4 For any sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers we have the equations $\liminf_{n \to \infty} -a_n = -\limsup_{n \to \infty} a_n$ and $\limsup_{n \to \infty} -a_n = -\liminf_{n \to \infty} a_n$.

Proof. Recalling Exercise 1-19 we note that
\[ \liminf_{n \to \infty} -a_n = \lim_{n \to \infty} \inf\{-a_j : j \ge n\} = \lim_{n \to \infty} \bigl( -\sup\{a_j : j \ge n\} \bigr) = -\limsup_{n \to \infty} a_n. \]
The other equation is proved similarly. ∎

Finally, we should establish the relationship between the limit (if it exists) and the limits superior and inferior.

Lemma 10.5 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and let $\{b_n\}_{n=1}^{\infty}$ be a convergent sequence of real numbers so that for all $n \in \mathbb{N}$ the inequality $b_n \le a_n$ holds. Then $\lim_{n \to \infty} b_n \le \liminf_{n \to \infty} a_n$.

Proof. Let $\varepsilon > 0$ and let $L = \lim_{n \to \infty} b_n$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|b_n - L| < \varepsilon$, and hence $a_n \ge b_n > L - \varepsilon$. This means that for all $n \ge N$ we have $\inf\{a_j : j \ge n\} \ge L - \varepsilon$, and hence $\liminf_{n \to \infty} a_n \ge L - \varepsilon$.
Because $\varepsilon > 0$ was arbitrary we conclude $\liminf_{n \to \infty} a_n \ge L$, which was to be proved. ∎

Theorem 10.6 A sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers converges iff its limits superior and inferior are equal and real, that is, $\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n \in \mathbb{R}$. In this case, we have the equalities $\lim_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n$.

Proof. For "$\Rightarrow$," let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence. Then by Lemma 10.5 and Exercise 10-4a we obtain $\limsup_{n \to \infty} a_n \le \lim_{n \to \infty} a_n \le \liminf_{n \to \infty} a_n \in \mathbb{R}$. By Proposition 10.3, we also have $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$, which implies the two must be equal to each other and to the limit of the sequence.

For "$\Leftarrow$," let $\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n =: L \in \mathbb{R}$ and let $\varepsilon > 0$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $\inf\{a_j : j \ge n\} > L - \varepsilon$ and $\sup\{a_j : j \ge n\} < L + \varepsilon$. In particular, for all $n \ge N$ we infer $|a_n - L| < \varepsilon$, so $\{a_n\}_{n=1}^{\infty}$ converges to $L$.

The equation at the end of the theorem is proved in either of the above parts. ∎

Exercises

10-1. State and prove a version of Proposition 10.1 that can be used to justify the definition of the limit inferior.

10-2. Limit superior, limit inferior, and subsequences. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers.
(a) Prove that the limit superior of $\{a_n\}_{n=1}^{\infty}$ is the largest $S \in [-\infty, \infty]$ such that there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ with $\lim_{k \to \infty} a_{n_k} = S$.
(b) Prove that the limit inferior of $\{a_n\}_{n=1}^{\infty}$ is the smallest $S \in [-\infty, \infty]$ such that there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ with $\lim_{k \to \infty} a_{n_k} = S$.

10-3. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be bounded sequences of real numbers.
(a) Prove that $\liminf_{n \to \infty} (a_n + b_n) \ge \liminf_{n \to \infty} a_n + \liminf_{n \to \infty} b_n$.
(b) Give an example that shows that the inequality can be strict.
(c) Give examples that show that the limit inferior of a product of sequences can be greater or smaller than the product of the individual limit inferiors.

10-4. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be sequences of real numbers so that $a_n \le b_n$ for all $n \in \mathbb{N}$.
(a) Prove that if {bn]gl converges, then limsupa, 5 lim b,. n-tm n- - tm (b) Prove that in general it is possible to have lirn sup a, > lirn inf b,. n+oc n+m 10-5. Let { L Z , ] ~be =a ~ bounded sequence of positive numbers so that the sequence of the reciprocals is bounded, too. Prove each of the following. 1 1 (a) liminf - = n+m a, limsup,,, a, 1 (b) limsup - = n--tm ’ a, 1 a, liminf,,, ’ 10-6. Let be a sequence of positive numbers and let (b,]r=l be a convergent sequence of positive numbers with nonzero limit. Prove each of the following. limn+= bn (b) limsupa,b” = limsupa, (a) lirn sup anbn = lim sup a, lim b,. n+w n-sm n-oc n-+m (n+m 1 10-7. Lebesgue’s Differentiation Theorem states that for any function f : [ a , b] + B of bounded variation the derivative f ’ ( x ) exists a.e. We will prove this result using the following steps. (a) For any function f : ( a , b ) + B andx E (a. b),define D + f ( x ) := lirn sup f(x +h) - f ( x ) h h, - f ( x ) , where “h + O+” means that h approaches zero form h the right ( h > 0) and “h -+ 0-” means that h approaches zero form the left ( h < 0). Prove that f is differentiable at x iff D + f ( x ) = D + f ( x ) = D - f ( x ) = D - f ( x ) and that in this case these values are equal to f ’ ( x ) . Note. The four values above are also known as the Dini derivatives of f at x. D - f ( x ) := liminf f ( x + h+O- (b) Let f : [ a , b] + R be of bounded variation, let A := { x E (a. b ) : D - f ( x ) < D + f ( x ) }. let B := [ x E ( a, b ) : D - f ( x ) > D + f ( x ) }, and let C := [ x E ( a . b ) : D + f ( x ) = co Prove that f is differentiable at every x E ( a , b ) \ (A U B U C ) . Note. In particular, this means we have proved Lebesgue’s Theorem if we can prove that A , B and C are null sets. The remaining parts of this exercise are devoted to this task. 1 1 1. 10. Series of Real Numbers I1 172 (c) Suppose for a contradiction that h ( A ) > 0 and obtain a contradiction as follows. 1 1 i. 
Prove that $\lambda\left(\left\{x \in A : D_- f(x) < q - \frac{1}{n} < q < q + \frac{1}{n} < D^+ f(x)\right\}\right) > 0$ for some $q \in \mathbb{Q}$ and $n \in \mathbb{N}$.
ii. With $q, n$ as above let $E := \left\{x \in A : D_- f(x) < q - \frac{1}{n} < q < q + \frac{1}{n} < D^+ f(x)\right\}$, let $g(x) := f(x) - qx$, let $\varepsilon := \frac{1}{n}$ and let $a = x_0 < x_1 < \cdots < x_n = b$ be so t h a t x /g(xk)-g(xk-l) 1 > Vig- k=l I EZ Eh ( E) .Pr o v et h at f o r al l x 6 1 )I1 > - h ( E ) and use this family to obtain the contradiction 6 E En(xk-1,xk) n VXx-, g > V t g . k=l
(d) Use $D^- f(x) = -D_-(-f)(x)$ and $D_+ f(x) = -D^+(-f)(x)$ to prove that $\lambda(B) = 0$.
(e) Lead the assumption $\lambda(C) > 0$ to a contradiction by letting $M := 2 \cdot \frac{6 V_a^b f}{\lambda(C)}$ and finding for each $x \in C$ an $h_x$ with $|f(x + h_x) - f(x)| > M h_x$. Use Exercise 9-36 as in part 10-7(c)iv and arrive at a contradiction in a similar way.

10.2 The Root Test and the Ratio Test

When it works, the Ratio Test for the convergence of a series is computationally convenient, because it only involves a division. Note that by using the limit superior and the limit inferior we avoid any issues with convergence of the sequence of quotients.

Theorem 10.7 Ratio Test. Let $\sum_{j=1}^{\infty} a_j$ be a series with $a_j \ne 0$ for all $j \in \mathbb{N}$.
1. If $\limsup_{j\to\infty} \left|\frac{a_{j+1}}{a_j}\right| < 1$, then $\sum_{j=1}^{\infty} a_j$ converges absolutely.
2. If $\liminf_{j\to\infty} \left|\frac{a_{j+1}}{a_j}\right| > 1$, then $\sum_{j=1}^{\infty} a_j$ diverges.
If neither of the above conditions holds, then the series might converge or diverge. We also say that the Ratio Test failed in this situation.

Proof. For part 1, let $q$ be such that $\limsup_{j\to\infty} \left|\frac{a_{j+1}}{a_j}\right| < q < 1$. Then there is a $J \in \mathbb{N}$ such that for all $j \ge J$ we have $|a_{j+1}| < q|a_j|$, and hence $|a_j| \le q^{j-J}|a_J|$. Let $M := \frac{|a_J|}{q^J}$. Then for all $j \ge J$ we obtain $0 \le |a_j| \le q^j \frac{|a_J|}{q^J} = M q^j$. By Comparison Test (Theorem 6.15 and Exercise 6-16), $\sum_{j=1}^{\infty} a_j$ converges absolutely.

For part 2, note that if $\liminf_{j\to\infty} \left|\frac{a_{j+1}}{a_j}\right| > 1$, then there is a $J \in \mathbb{N}$ so that for all $j \ge J$ we have $|a_{j+1}| > |a_j|$. Hence, for all $j > J$ we obtain $|a_j| > |a_J| > 0$ and the terms of the series do not converge to zero. Thus the series diverges.

The last statement will be illuminated in Example 10.10. ∎

The limit superior and the limit inferior remove any analytic concerns about convergence of the quotient from the Ratio Test. But algebraic concerns remain regarding the possible division by zero. The Ratio Test cannot be applied directly to any series for which a subsequence of the terms is zero. The Root Test does not involve divisions. Thus it can be applied directly to any series.

Theorem 10.8 Root Test. Let $\sum_{j=1}^{\infty} a_j$ be a series.
1. If $\limsup_{j\to\infty} \sqrt[j]{|a_j|} < 1$, then $\sum_{j=1}^{\infty} a_j$ converges absolutely.
2. If $\limsup_{j\to\infty} \sqrt[j]{|a_j|} > 1$, then $\sum_{j=1}^{\infty} a_j$ diverges.
If $\limsup_{j\to\infty} \sqrt[j]{|a_j|} = 1$, then the series might converge or diverge. We also say that the Root Test failed in this situation.

Proof. For part 1, let $q$ be such that $\limsup_{j\to\infty} \sqrt[j]{|a_j|} < q < 1$. There is a $J \in \mathbb{N}$ such that $|a_j| < q^j$ for all $j \ge J$. By Comparison Test, $\sum_{j=1}^{\infty} a_j$ converges absolutely.

For part 2, note that if $\limsup_{j\to\infty} \sqrt[j]{|a_j|} > 1$, then for any $n \in \mathbb{N}$ there is a $j > n$ with $|a_j| > 1$. Hence, the terms do not converge to zero, and so the series diverges.

The last statement will be illuminated in Example 10.10. ∎

In theoretical investigations, the Root Test usually is preferred over the Ratio Test, because we do not need to worry about terms that are equal to zero. Exercise 10-8 shows that for any series for which convergence can be proved with the Ratio Test, convergence can (in theory) also be proved with the Root Test. This is another indication that the Root Test is preferable when developing a theory. The p-series test below provides examples of convergent and divergent series for which the Root Test and the Ratio Test both fail.

Theorem 10.9 p-series test. Let $p \in \mathbb{Q}$. The sum $\sum_{j=1}^{\infty} \frac{1}{j^p}$ converges iff $p > 1$.

Proof. If $p \le 1$, then for all $j \ge 1$ we have $\frac{1}{j^p} \ge \frac{1}{j}$. Because the harmonic series $\sum_{j=1}^{\infty} \frac{1}{j}$ diverges (see Chapter 6), by Comparison Test $\sum_{j=1}^{\infty} \frac{1}{j^p}$ diverges for $p \le 1$.
This leaves the case $p > 1$. Note that the sequence of partial sums $S_k = \sum_{j=1}^{k} \frac{1}{j^p}$ is increasing. Hence, we are done if we can show that it is bounded. To do this we use a chunking argument similar to the proof in Example 6.8. For every $m \in \mathbb{N}$ we have

$$S_{2^m - 1} = \sum_{i=0}^{m-1} \sum_{j=2^i}^{2^{i+1}-1} \frac{1}{j^p} \le \sum_{i=0}^{m-1} 2^i \frac{1}{(2^i)^p} = \sum_{i=0}^{m-1} \left(\frac{1}{2^{p-1}}\right)^i < \frac{1}{1 - \frac{1}{2^{p-1}}}.$$

Because the right side of this inequality does not depend on $m$, the $S_k$ are bounded and the series converges for $p > 1$. ∎

Example 10.10 The Root Test and the Ratio Test both fail for the harmonic series $\sum_{j=1}^{\infty} \frac{1}{j}$, which diverges, and for the series $\sum_{j=1}^{\infty} \frac{1}{j^2}$, which converges. For the Ratio Test, this is a simple computation. For the Root Test, we need to use that $\lim_{j\to\infty} \sqrt[j]{j} = 1$. We postpone the verification to Exercise 10-16c. □

Exercises

10-8. The Root Test versus the Ratio Test. Let $\sum_{j=1}^{\infty} a_j$ be a series with $a_j \ne 0$ for all $j$. Explain why part 10-8a shows that if the Ratio Test proves that a series converges, then the Root Test also proves that the series converges.

10-9. Construct a convergent series $\sum_{j=1}^{\infty} a_j$ with

10-10. Prove the p-series test (Theorem 10.9) using the Integral Test (Theorem 8.29).

10-11. Use the Ratio Test or the Root Test to determine which of the following series converge. If a series converges, determine if it converges absolutely.

10-12. Use any of the convergence tests introduced so far (including those from Chapter 6) to determine which of the following series converge. If a series converges, determine if it converges absolutely.

10.3 Power Series

With series giving access to "sums with infinitely many terms" it is natural to consider what happens when we let the sum in the definition of polynomials have infinitely many terms. The resulting notion of a power series is very versatile. In particular, it will enable us to introduce the transcendental functions in Chapter 12.

Definition 10.11 For a sequence $\{c_k\}_{k=0}^{\infty}$, the expression $\sum_{k=0}^{\infty} c_k (x-a)^k$ is called a power series centered at $a$ (or a power series about $a$) with coefficients $c_k$. The power series is called convergent at $x \in \mathbb{R}$ if and only if $\lim_{N\to\infty} \sum_{k=0}^{N} c_k (x-a)^k$ exists. Otherwise it is called divergent at $x$.

Theorem 10.12 shows that power series converge on a symmetric open interval $(a-R, a+R)$ about the center point (where $R$ could be infinite), and they diverge outside $[a-R, a+R]$. So only if $R$ is finite are there two points, $a+R$ and $a-R$, for which the convergence behavior is unknown. Note how in Theorem 10.12 $R$ is defined in terms of the Root Test. Exercise 10-13 provides an equally useful conceptual characterization of $R$.

Theorem 10.12 Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ be a power series. If $\limsup_{k\to\infty} \sqrt[k]{|c_k|} \in (0, \infty)$ let $R := \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}}$, if $\limsup_{k\to\infty} \sqrt[k]{|c_k|} = \infty$ let $R := 0$ and if $\limsup_{k\to\infty} \sqrt[k]{|c_k|} = 0$ let $R := \infty$. Then $\sum_{k=0}^{\infty} c_k (x-a)^k$ converges absolutely for all $x \in \mathbb{R}$ with $|x-a| < R$ and it diverges for all $x \in \mathbb{R}$ with $|x-a| > R$.

Proof. We apply the Root Test. Let the formal quotient $\frac{1}{0}$ be $\infty$. For $x \ne a$, the limit superior $\limsup_{k\to\infty} \sqrt[k]{|c_k (x-a)^k|} = |x-a| \limsup_{k\to\infty} \sqrt[k]{|c_k|}$ is less than 1 iff $|x-a| < \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}}$ and it is greater than 1 iff $|x-a| > \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}}$. This proves the result. ∎

Definition 10.13 Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ be a power series. Then, with $\frac{1}{0} := \infty$ this once, $R := \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}}$ is called the radius of convergence of $\sum_{k=0}^{\infty} c_k (x-a)^k$.

Theorem 10.12 shows that power series define functions on intervals. These functions are limits of polynomials of increasing degree. To find out more about the analytic properties of these functions we need to investigate sequences of functions, which is done in Chapter 11.

We conclude the present section with some remarks on the arithmetic of power series. Sums, differences and constant multiples of power series are straightforward (see Exercise 10-14). The multiplication of power series is similar to that of polynomials.
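The formula for the radius of convergence in Definition 10.13 can be explored numerically. The following is only a sketch under stated assumptions: the coefficients $c_k = 2^k$ for even $k$ and $c_k = 1$ for odd $k$ are a hypothetical example chosen for illustration, and the supremum over all $j \ge n$ is replaced by a maximum over a finite range, so the output is an estimate and not a proof. For these coefficients the quotients $|c_{k+1}/c_k|$ oscillate, while the roots $\sqrt[k]{|c_k|}$ have limit superior 2.

```python
import math


def log_abs_coeff(k: int) -> float:
    """Return log|c_k| for the illustrative coefficients
    c_k = 2^k (k even) and c_k = 1 (k odd). Working with logarithms
    avoids overflow for large k."""
    return k * math.log(2.0) if k % 2 == 0 else 0.0


def tail_sup_root(n: int, horizon: int) -> float:
    """Finite stand-in for sup{ |c_j|^(1/j) : j >= n }:
    the maximum is taken over n <= j < horizon only (requires n >= 1)."""
    return max(math.exp(log_abs_coeff(j) / j) for j in range(n, horizon))


def radius_estimate(n: int, horizon: int) -> float:
    """Estimate R = 1 / limsup |c_k|^(1/k) from one tail supremum."""
    return 1.0 / tail_sup_root(n, horizon)


estimate = radius_estimate(n=50, horizon=2000)
print(estimate)  # close to 0.5 for the illustrative coefficients
```

Taking the maximum over a tail, and not the value at a single index, mirrors the definition of the limit superior: the odd-indexed roots are all equal to 1 and are ignored by the supremum.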
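We want to multiply each term of the first series with each term of the second and then collect terms with equal exponents. The product of a term with exponent $i$ in the first power series with a term with exponent $k-i$ in the second power series gives a term with exponent $k$ for the product. We use this observation to first define a product for series.

Theorem 10.14 The Cauchy Product of two series. Let $\sum_{k=0}^{\infty} a_k$ and $\sum_{k=0}^{\infty} b_k$ be absolutely convergent series and for each $k \ge 0$ define $p_k := \sum_{i=0}^{k} a_i b_{k-i}$. Then $\sum_{k=0}^{\infty} p_k$ converges absolutely and $\sum_{k=0}^{\infty} p_k = \sum_{k=0}^{\infty} a_k \sum_{k=0}^{\infty} b_k$.

Proof. First note that $\sum_{k=0}^{n} p_k = \sum_{k=0}^{n} \sum_{i=0}^{k} a_i b_{k-i} = \sum_{0 \le j+l \le n} a_j b_l = \sum_{k=0}^{n} a_k \sum_{i=0}^{n-k} b_i$ for every $n \in \mathbb{N}$. To see that $\sum_{k=0}^{\infty} p_k$ converges absolutely, note that for all $n \in \mathbb{N}$ the inequalities

$$\sum_{k=0}^{n} |p_k| \le \sum_{k=0}^{n} \sum_{i=0}^{k} |a_i| |b_{k-i}| = \sum_{k=0}^{n} |a_k| \sum_{i=0}^{n-k} |b_i| \le \sum_{k=0}^{\infty} |a_k| \sum_{k=0}^{\infty} |b_k|$$

hold. Now let $\varepsilon > 0$ and assume without loss of generality that each series has at least one nonzero term. Then there is an even $N \in \mathbb{N}$ so that $\sum_{k=N/2}^{\infty} |a_k| < \frac{\varepsilon}{3 \sum_{k=0}^{\infty} |b_k|}$ and $\sum_{k=N/2}^{\infty} |b_k| < \frac{\varepsilon}{3 \sum_{k=0}^{\infty} |a_k|}$. Therefore for all $n \ge N$ we obtain

$$\left| \sum_{k=0}^{n} p_k - \sum_{k=0}^{\infty} a_k \sum_{k=0}^{\infty} b_k \right| = \left| \sum_{k=0}^{n} a_k \sum_{i=0}^{n-k} b_i - \sum_{k=0}^{\infty} a_k \sum_{i=0}^{\infty} b_i \right| \le \sum_{k=0}^{n} |a_k| \sum_{i=n-k+1}^{\infty} |b_i| + \sum_{k=n+1}^{\infty} |a_k| \sum_{i=0}^{\infty} |b_i|$$

$$\le \sum_{k=0}^{N/2} |a_k| \sum_{i=n-k+1}^{\infty} |b_i| + \sum_{k=N/2+1}^{n} |a_k| \sum_{i=0}^{\infty} |b_i| + \sum_{k=n+1}^{\infty} |a_k| \sum_{i=0}^{\infty} |b_i| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$

Hence, $\sum_{k=0}^{\infty} p_k = \sum_{k=0}^{\infty} a_k \sum_{k=0}^{\infty} b_k$, which completes the proof. ∎

It can be shown that in Theorem 10.14 it is enough to demand that one series converges (not necessarily absolutely) and the other converges absolutely. Because the proof is much more involved, we demanded here that both series converge absolutely.

Corollary 10.15 Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ and $\sum_{k=0}^{\infty} d_k (x-a)^k$ be power series with radii of convergence $R_c$ and $R_d$, respectively. With $p_k := \sum_{i=0}^{k} c_i d_{k-i}$ for $k \ge 0$, the radius of convergence of $\sum_{k=0}^{\infty} p_k (x-a)^k$ is at least $R_p := \min\{R_c, R_d\}$ and for all $x$ with $|x-a| < R_p$ the product of the power series is the power series with coefficients $p_k$, that is, $\sum_{k=0}^{\infty} p_k (x-a)^k = \left(\sum_{k=0}^{\infty} c_k (x-a)^k\right) \left(\sum_{k=0}^{\infty} d_k (x-a)^k\right)$.

Proof.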
Exercise 10-15. ∎

Exercises

10-13. Prove that the radius of convergence of a power series $\sum_{k=0}^{\infty} a_k (x-a)^k$ is the largest $R \in [0, \infty]$ so that $\sum_{k=0}^{\infty} a_k (x-a)^k$ converges absolutely for all $x$ with $|x-a| < R$.
Note. This conceptual formulation is often more helpful than the formula in Definition 10.13.

10-14. Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ and $\sum_{k=0}^{\infty} d_k (x-a)^k$ be power series with radii of convergence $R_c$ and $R_d$.
(a) Prove that the radius of convergence of $\sum_{k=0}^{\infty} (c_k + d_k)(x-a)^k$ is at least $\min\{R_c, R_d\}$ and that $\sum_{k=0}^{\infty} (c_k + d_k)(x-a)^k = \sum_{k=0}^{\infty} c_k (x-a)^k + \sum_{k=0}^{\infty} d_k (x-a)^k$ for all $x$ for which both series converge.
Hint. Use the result from Exercise 10-13.
(b) State and prove a result similar to part 10-14a for the difference of the two power series.
(c) Prove that if $b \in \mathbb{R} \setminus \{0\}$, then the radius of convergence of $\sum_{k=0}^{\infty} b c_k (x-a)^k$ is $R_c$ and for all $x$ for which $\sum_{k=0}^{\infty} c_k (x-a)^k$ converges, we have $\sum_{k=0}^{\infty} b c_k (x-a)^k = b \sum_{k=0}^{\infty} c_k (x-a)^k$.

10-15. Prove Corollary 10.15.

10-16. Consider the power series $\sum_{k=1}^{\infty} \frac{x^k}{k}$.
(a) Use the Ratio Test to show that the radius of convergence is 1.
(b) Use Theorem 10.12 to show that $\lim_{n\to\infty} \sqrt[n]{n} = 1$.
(c) Prove that the Root Test fails for $\sum_{j=1}^{\infty} \frac{1}{j}$ and $\sum_{j=1}^{\infty} \frac{1}{j^2}$.

10-17. Use the formula from Definition 10.13 to compute the radius of convergence of the given power series. You may use the result from Exercise 10-16b.

Chapter 11 Sequences of Functions

In Section 10.3, we considered power series as series of numbers for fixed $x \in \mathbb{R}$. Another way to look at them is as sequences of polynomials. This change in point-of-view is the first step toward function spaces, which are very important in analysis. This chapter describes ways in which sequences of functions can converge. Formally, a sequence of functions is a map from the natural numbers into a function space.
Until we encounter function spaces in Example 15.3, a sequence of functions will simply be, as in Definition 9.17, a sequence whose "values" are functions, not numbers.

11.1 Notions of Convergence

The most obvious way in which a sequence of functions can converge is at every point.

Definition 11.1 Let $S$ be a set and for every $n \in \mathbb{N}$ let $f_n : S \to \mathbb{R}$ be a function. The sequence of functions $\{f_n\}_{n=1}^{\infty}$ is called pointwise convergent to the function $f : S \to \mathbb{R}$ iff for all $x \in S$ we have $\lim_{n\to\infty} f_n(x) = f(x)$.

Example 11.2 For $n \in \mathbb{N}$, let $f_n(x) := x^{\frac{1}{n}}$. On the set $(0, 1)$ the sequence $\{f_n\}_{n=1}^{\infty}$ converges pointwise to the constant function $f(x) = 1$. By Exercise 2-46, for all $x \in (0, 1)$ the sequence $\left\{x^{\frac{1}{n}}\right\}_{n=1}^{\infty}$ converges to 1. □

Example 11.3 By Definition 9.18 every real valued Lebesgue measurable function $f$ is the pointwise limit of a sequence of simple functions. This is because if $\{s_n\}_{n=1}^{\infty}$ is a sequence of simple functions that converges pointwise to $f^+$ and $\{t_n\}_{n=1}^{\infty}$ is a sequence of simple functions that converges pointwise to $f^-$, then the sequence $\{s_n - t_n\}_{n=1}^{\infty}$ converges pointwise to $f = f^+ - f^-$. □

Pointwise convergence means the sequence of functions converges at every point, but the "speed" of convergence can vary from point to point. Uniform convergence requires that convergence happens at a uniform minimum speed (see Figure 23).

[Figure 23: the graph of $f$ together with the graphs of $f + \varepsilon$ and $f - \varepsilon$.]

Definition 11.4 Let $S$ be a set and for every $n \in \mathbb{N}$ let $f_n : S \to \mathbb{R}$ be a function. The sequence of functions $\{f_n\}_{n=1}^{\infty}$ is called uniformly convergent to the function $f : S \to \mathbb{R}$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in S$ we have $|f_n(x) - f(x)| < \varepsilon$.

If a sequence of functions only converges (pointwise or uniformly) on a part $T$ of the common domain $S$, we also say that the sequence converges (pointwise or uniformly) on $T$.

It is easy to see that uniform convergence implies pointwise convergence (Exercise 11-1).
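The difference between Definitions 11.1 and 11.4 can be made tangible with the functions $f_n(x) = x^{1/n}$ from Example 11.2. The sketch below is a numerical illustration and not a proof. The helper names, the value $\delta = 0.1$, the grid size and the sample points $y_n = 2^{-n}$ are choices made here for illustration. On $(\delta, 1)$ the largest sampled distance to the constant function 1 shrinks as $n$ grows, while on $(0, 1)$ the points $y_n$ always satisfy $|f_n(y_n) - 1| = \frac{1}{2}$, so no single $N$ can work for $\varepsilon = \frac{1}{2}$ and all $x \in (0, 1)$.

```python
def f(n: int, x: float) -> float:
    """f_n(x) = x^(1/n), as in Example 11.2."""
    return x ** (1.0 / n)


def sampled_sup_distance(n: int, left: float, right: float,
                         samples: int = 10_000) -> float:
    """max |f_n(x) - 1| over an evenly spaced grid strictly inside
    (left, right). A finite grid only approximates the supremum."""
    step = (right - left) / (samples + 1)
    return max(abs(f(n, left + i * step) - 1.0)
               for i in range(1, samples + 1))


delta = 0.1  # illustrative choice of the left endpoint

# On (delta, 1): the sampled supremum decreases toward 0.
on_subinterval = [sampled_sup_distance(n, delta, 1.0) for n in (1, 5, 25, 50)]

# On (0, 1): the witness points y_n = 2^(-n) keep the distance at 1/2.
witness_distances = [abs(f(n, 0.5 ** n) - 1.0) for n in range(1, 51)]

print(on_subinterval)
print(min(witness_distances), max(witness_distances))
```

The witness points are needed because a fixed grid cannot detect the failure of uniform convergence on $(0, 1)$: for any fixed sample point $x > 0$ the values $f_n(x)$ do converge to 1.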
The converse is false, as the next example shows.

Example 11.5 For $n \in \mathbb{N}$, let $f_n(x) := x^{\frac{1}{n}}$. Then for any $\delta > 0$ on the interval $(\delta, 1)$ the sequence $\{f_n\}_{n=1}^{\infty}$ converges uniformly to the constant function $f(x) = 1$. However, on the set $(0, 1)$ the sequence $\{f_n\}_{n=1}^{\infty}$ does not converge uniformly to the constant function $f(x) = 1$.

For the first claim, let $\delta \in (0, 1)$. Then for all points $x \in (\delta, 1)$ we have that $\delta^{\frac{1}{n}} < x^{\frac{1}{n}} < 1$. If $\varepsilon > 0$ is given, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $1 - \varepsilon < \delta^{\frac{1}{n}} < 1$. Consequently, for all points $x \in (\delta, 1)$ and all $n \ge N$ we infer $1 - \varepsilon < \delta^{\frac{1}{n}} < x^{\frac{1}{n}} < 1$. Therefore on the interval $(\delta, 1)$ the sequence $\{f_n\}_{n=1}^{\infty}$ converges uniformly to the constant function $f(x) = 1$.

To prove that the sequence $\{f_n\}_{n=1}^{\infty}$ does not converge uniformly to $f(x) = 1$ on $(0, 1)$, suppose for a contradiction that it does. Then for every $\varepsilon \in (0, 1)$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in (0, 1)$ we have $1 - \varepsilon < x^{\frac{1}{n}} < 1$. But $y := (1 - \varepsilon)^N$ is in $(0, 1)$ and $f_N\left((1 - \varepsilon)^N\right) = \left((1 - \varepsilon)^N\right)^{\frac{1}{N}} = 1 - \varepsilon \not> 1 - \varepsilon$, a contradiction. Therefore, on the set $(0, 1)$ the sequence $\{f_n\}_{n=1}^{\infty}$ does not converge uniformly to the constant function $f(x) = 1$. □

Power series are important examples of uniformly convergent series.

Theorem 11.6 Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ be a power series with radius of convergence $R$. Then $\sum_{k=0}^{\infty} c_k (x-a)^k$ converges uniformly on any closed subinterval of $(a-R, a+R)$.

Proof. Any closed subinterval of $(a-R, a+R)$ is contained in a symmetric closed subinterval $[a-r, a+r]$ with $0 < r < R$. Thus we are done if we can prove the result for symmetric closed subintervals $[a-r, a+r] \subseteq (a-R, a+R)$. Let $r \in (0, R)$. For all $x \in [a-r, a+r]$ and all $k \in \mathbb{N} \cup \{0\}$, we have that $|c_k (x-a)^k| \le |c_k| r^k$. For each $x \in [a-r, a+r]$, let $P(x) := \sum_{k=0}^{\infty} c_k (x-a)^k$ be the limit of the power series. By Theorem 10.12, $\sum_{k=0}^{\infty} c_k ((a+r) - a)^k$ converges absolutely.
Thus for any $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $\sum_{k=n}^{\infty} |c_k| r^k < \varepsilon$. Therefore for all $x \in [a-r, a+r]$ and all $n \ge N$ we obtain

$$\left| P(x) - \sum_{k=0}^{n} c_k (x-a)^k \right| = \left| \sum_{k=n+1}^{\infty} c_k (x-a)^k \right| \le \sum_{k=n+1}^{\infty} |c_k| r^k < \varepsilon,$$

and hence the power series converges uniformly on $[a-r, a+r]$. ∎

More notions of convergence for sequences of functions are discussed in Section 14.6.

Exercises

11-1. Prove that every uniformly convergent sequence of functions is pointwise convergent.

11-2. In Example 11.5, consider the proof that the $f_n(x) = x^{\frac{1}{n}}$ do not converge uniformly to the constant function $f(x) = 1$ on $(0, 1)$. Could we have used $\varepsilon = \frac{1}{2}$ in this part of the proof? Explain.

11-3. Prove that for each $\delta > 0$ the sequence $\{f_n\}_{n=1}^{\infty}$ with $f_n(x) = x^n$ converges uniformly on $(0, 1-\delta)$, but it does not converge uniformly on $(0, 1)$.

11-4. Prove that a bounded function $f : \mathbb{R} \to \mathbb{R}$ is Lebesgue measurable iff it is a uniform limit of simple functions.
Hint. Theorem 9.19.

11-5. Give an example of a sequence $\{f_n\}_{n=1}^{\infty}$ of bounded functions on an interval $I$ that converges pointwise to an unbounded function $f : I \to \mathbb{R}$.

11-6. A sequence of functions $\{f_n\}_{n=1}^{\infty}$ defined on a set $S$ is called pointwise Cauchy on the set $S$ iff for all $x \in S$ the sequence $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence. Prove that every pointwise Cauchy sequence of functions is pointwise convergent to a function $f : S \to \mathbb{R}$.

11-7. Approximating continuous functions with differentiable functions. Let $[a, b]$ be an interval.
(a) For continuous $f : [a, b] \to \mathbb{R}$ and all $x \in \mathbb{R}$, let $\tilde{f}(x) := f(a)$ if $x < a$, $\tilde{f}(x) := f(x)$ if $x \in [a, b]$, and $\tilde{f}(x) := f(b)$ if $x > b$. For $n \in \mathbb{N}$ and all $x \in \mathbb{R}$, let $f_n(x) := n \int_x^{x+\frac{1}{n}} \tilde{f}(z)\, dz$. Prove that $f_n$ is differentiable on $\mathbb{R}$.
(b) Prove that for every continuous function $f$ on $[a, b]$ there is a sequence of functions $\{f_n\}_{n=1}^{\infty}$ that converges uniformly to $f$ and so that every function in the sequence is continuous on $[a, b]$ and differentiable on $(a, b)$.

11-8.
A sequence of functions $\{f_n\}_{n=1}^{\infty}$ defined on a set $S$ is called uniformly Cauchy on the set $S$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n, m \ge N$ and all $x \in S$ we have $|f_n(x) - f_m(x)| < \varepsilon$. Prove that every uniformly Cauchy sequence of functions is uniformly convergent to a function $f : S \to \mathbb{R}$.

11-9. For each $j \in \mathbb{N}$, let $f_j : [a, b] \to \mathbb{R}$ be so that $a_j := \sup\{|f_j(x)| : x \in [a, b]\} < \infty$. Prove that if $\sum_{j=1}^{\infty} a_j$ converges, then $\sum_{j=1}^{\infty} f_j$ converges absolutely at every $x \in [a, b]$ and uniformly on $[a, b]$.

11-10. Let $f : (a, b) \to \mathbb{R}$ be continuous, let $[c, d] \subset (a, b)$ be a closed subinterval and let $N \in \mathbb{N}$ be such that $\frac{1}{N} < b - d$. Prove that $\{g_n\}_{n=N}^{\infty}$ defined by $g_n(x) := f\left(x + \frac{1}{n}\right)$ converges uniformly to $f$ on $[c, d]$.
Hint. Uniform continuity.

11-11. Let $f : (a, b) \to \mathbb{R}$ be continuously differentiable, let $[c, d] \subset (a, b)$ be a closed subinterval and let $N \in \mathbb{N}$ be such that $\frac{1}{N} < b - d$. Prove that the sequence $\{g_n\}_{n=N}^{\infty}$ defined by $g_n(x) := \frac{f\left(x + \frac{1}{n}\right) - f(x)}{\frac{1}{n}}$ converges uniformly to $f'$ on $[c, d]$.
Hint. Mean Value Theorem.

11.2 Uniform Convergence

If the function $f$ is the (pointwise or uniform) limit of a sequence of functions $\{f_n\}_{n=1}^{\infty}$, it is natural to ask which properties of the functions in the sequence are inherited by the limit $f$. It turns out that pointwise convergence preserves neither continuity (see Exercise 11-12) nor Riemann integrability (see Exercise 11-13). Uniform convergence on the other hand preserves continuity (see Theorem 11.7), as well as Riemann integrability (see Theorem 11.9).

Theorem 11.7 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions on $[a, b]$ that converges uniformly to $f : [a, b] \to \mathbb{R}$ and let all $f_n$ be continuous at $x \in [a, b]$. Then $f$ is continuous at $x$.

Proof. Let $\varepsilon > 0$. Because $\{f_n\}_{n=1}^{\infty}$ converges uniformly to $f$ on $[a, b]$, there is an $N \in \mathbb{N}$ so that for all $z \in [a, b]$ we have $|f_N(z) - f(z)| < \frac{\varepsilon}{3}$.
Moreover, because $f_N$ is continuous at $x$, there is a $\delta > 0$ so that for all $z \in [a, b]$ with $|z - x| < \delta$ we have $|f_N(z) - f_N(x)| < \frac{\varepsilon}{3}$. Then for all $z \in [a, b]$ with $|z - x| < \delta$ we conclude

$$|f(z) - f(x)| \le |f(z) - f_N(z)| + |f_N(z) - f_N(x)| + |f_N(x) - f(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,$$

and hence $f$ is continuous at $x$. ∎

Corollary 11.8 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions on $[a, b]$ that converges uniformly to $f : [a, b] \to \mathbb{R}$ and let all $f_n$ be continuous on $[a, b]$. Then $f$ is continuous on $[a, b]$.

Proof. Apply Theorem 11.7 at every point in $[a, b]$. ∎

Theorem 11.9 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of Riemann integrable functions on the interval $[a, b]$ that converges uniformly to the function $f : [a, b] \to \mathbb{R}$. Then $f$ is Riemann integrable and $\int_a^b f(x)\, dx = \lim_{n\to\infty} \int_a^b f_n(x)\, dx$.

Proof. For each $n \in \mathbb{N}$, the function $f_n$ is Riemann integrable. Therefore by Theorem 8.12 each set $B_n := \{x \in [a, b] : \omega_{f_n}(x) > 0\}$ is a null set. Hence, by countable subadditivity of outer Lebesgue measure (part 3 of Theorem 8.6), the set $B := \bigcup_{n=1}^{\infty} B_n$ is a null set. Now for every $x \in [a, b] \setminus B$ all functions $f_n$ are continuous at $x$, and by Theorem 11.7 we infer that $f$ is continuous at $x$. Because $B$ is a null set, by Theorem 8.12 $f$ is Riemann integrable.

For the integrals, let $\varepsilon > 0$ and find an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in [a, b]$ we have $|f_n(x) - f(x)| < \frac{\varepsilon}{b-a}$. Then for all $n \ge N$ we obtain

$$\left| \int_a^b f_n(x)\, dx - \int_a^b f(x)\, dx \right| \le \int_a^b |f_n(x) - f(x)|\, dx \le \int_a^b \frac{\varepsilon}{b-a}\, dx = \varepsilon,$$

which establishes the equality. ∎

Lebesgue's integrability criterion for Riemann integrals reveals that being Riemann integrable is a substantially weaker property than continuity. It would be nice if this weaker property would be preserved by a weaker notion of convergence, such as pointwise convergence. However, Exercise 11-13 shows that the pointwise limit of Riemann integrable functions need not be Riemann integrable. This is another situation in which the Lebesgue integral has an advantage over the Riemann integral.
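The interchange of limit and integral in Theorem 11.9 can be checked numerically on a simple example. The sketch below makes several assumptions of its own: the sequence $f_n(x) = x + \frac{\sin(nx)}{n}$ on $[0, 1]$ is a hypothetical example chosen here (it converges uniformly to $f(x) = x$ because $|f_n(x) - f(x)| \le \frac{1}{n}$), and the integrals are approximated with a midpoint rule whose helper name and number of pieces are arbitrary. The estimate from the proof then predicts that the integrals differ by at most $\frac{1}{n} \cdot (1 - 0)$.

```python
import math


def midpoint_integral(g, a: float, b: float, pieces: int = 20_000) -> float:
    """Approximate the Riemann integral of g over [a, b] with the
    midpoint rule on an equidistant partition."""
    h = (b - a) / pieces
    return h * sum(g(a + (i + 0.5) * h) for i in range(pieces))


def make_f_n(n: int):
    """Illustrative sequence f_n(x) = x + sin(n x) / n on [0, 1]."""
    return lambda x: x + math.sin(n * x) / n


def f(x: float) -> float:
    """Uniform limit of the illustrative sequence."""
    return x


limit_integral = midpoint_integral(f, 0.0, 1.0)

# |integral of f_n - integral of f| for a few values of n.
gaps = {n: abs(midpoint_integral(make_f_n(n), 0.0, 1.0) - limit_integral)
        for n in (1, 10, 100)}

for n, gap in gaps.items():
    print(n, gap, gap <= 1.0 / n)
```

The bound $\frac{1}{n}$ used in the comparison is exactly the uniform distance between $f_n$ and $f$ multiplied by the length of the interval, which is the quantity controlled in the proof of Theorem 11.9.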
For pointwise limits of (Lebesgue) integrable functions, please consider Section 14.5.

With continuity and integrability investigated, we turn to differentiability. For sequences of differentiable functions, uniform convergence does not guarantee the differentiability of the limit.

Example 11.10 Uniform limits of differentiable functions need not be differentiable. For every $n \in \mathbb{N}$, the function $f_n(x) := x^{\frac{2n+2}{2n+1}}$ is differentiable on $(-1, 1)$, but the uniform limit of $\{f_n\}_{n=1}^{\infty}$ is $f(x) = |x|$, which is not differentiable (also see Figure 24).

It is clear that the $f_n$ are all differentiable. To see the uniform convergence to the absolute value function, note that for all $x \in (-1, 1)$ we have

$$|f_n(x) - f(x)| = \left| |x|^{\frac{2n+2}{2n+1}} - |x| \right| = |x| \left| |x|^{\frac{1}{2n+1}} - 1 \right|,$$

where $|x|^{\frac{1}{2n+1}} \le 1$ and $|x| < 1$. Moreover, $\lim_{k\to\infty} r^{\frac{1}{k}} = 1$ for every positive real number $r$. Let $\varepsilon > 0$ and let $N \in \mathbb{N}$ be such that for all $n \ge N$ we have $\left| \varepsilon^{\frac{1}{2n+1}} - 1 \right| < \varepsilon$. Then for all $x \in (-\varepsilon, \varepsilon)$ and all $n \ge N$ we obtain

$$|f_n(x) - f(x)| = |x| \left| |x|^{\frac{1}{2n+1}} - 1 \right| \le |x| < \varepsilon,$$

while for all $x \in (-1, 1) \setminus (-\varepsilon, \varepsilon)$ and all $n \ge N$ we obtain

$$|f_n(x) - f(x)| = |x| \left( 1 - |x|^{\frac{1}{2n+1}} \right) \le 1 - \varepsilon^{\frac{1}{2n+1}} < \varepsilon.$$

We have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and for all $x \in (-1, 1)$ we have $|f_n(x) - f(x)| < \varepsilon$, which means that the sequence $\{f_n\}_{n=1}^{\infty}$ converges uniformly on $(-1, 1)$ to $f(x) = |x|$. □

Figure 24: A sequence of differentiable functions that converges uniformly to $|x|$. For an indication how pathological the situation can become, consider that by Exercise 11-7 every continuous function is the uniform limit of differentiable functions, while by Exercise 11-21 there are continuous functions that are not differentiable at any $x \in \mathbb{R}$.

To obtain a differentiable limit function, we require continuous derivatives and we impose a uniform convergence criterion on the derivatives.

Theorem 11.11 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of differentiable functions on $(a, b)$ so that for all $n \in \mathbb{N}$ the derivative $f_n'$ is continuous on $(a, b)$.
If $\{f_n\}_{n=1}^{\infty}$ converges pointwise to $f : (a, b) \to \mathbb{R}$ and the sequence $\{f_n'\}_{n=1}^{\infty}$ converges uniformly to the function $g : (a, b) \to \mathbb{R}$, then $f$ is differentiable and $f' = g$.

Proof. By Theorem 11.7, the function $g$ is continuous, because (on every closed subinterval of $(a, b)$) it is the uniform limit of continuous functions. To show that $f' = g$, let $x \in (a, b)$ and let $\varepsilon > 0$. Let $\delta > 0$ be such that for all $y \in (a, b)$ with $|y - x| < \delta$ we have $|g(y) - g(x)| < \frac{\varepsilon}{3}$. Let $z \in (a, b) \setminus \{x\}$ with $|z - x| < \delta$ be fixed. Because $\{f_n'\}_{n=1}^{\infty}$ converges uniformly to $g$, there is an $N_g \in \mathbb{N}$ so that for all $n \ge N_g$ and for all $y \in (a, b)$ we have $|f_n'(y) - g(y)| < \frac{\varepsilon}{3}$. Because $\{f_n\}_{n=1}^{\infty}$ converges pointwise to $f$, there is an $N_f \in \mathbb{N}$ so that for all $n \ge N_f$ the inequality $\left| \frac{f(z) - f(x)}{z - x} - \frac{f_n(z) - f_n(x)}{z - x} \right| < \frac{\varepsilon}{3}$ holds. Let $n \ge \max\{N_f, N_g\}$. By the Mean Value Theorem there is a $c$ between $z$ and $x$ so that $\frac{f_n(z) - f_n(x)}{z - x} = f_n'(c)$. Therefore we conclude

$$\left| \frac{f(z) - f(x)}{z - x} - g(x) \right| \le \left| \frac{f(z) - f(x)}{z - x} - \frac{f_n(z) - f_n(x)}{z - x} \right| + \left| \frac{f_n(z) - f_n(x)}{z - x} - f_n'(c) \right| + |f_n'(c) - g(c)| + |g(c) - g(x)| < \frac{\varepsilon}{3} + 0 + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$

Thus for all $z \in (a, b) \setminus \{x\}$ with $|z - x| < \delta$ we have $\left| \frac{f(z) - f(x)}{z - x} - g(x) \right| < \varepsilon$. Hence, $f$ is differentiable at $x$ with $f'(x) = g(x)$. ∎

Corollary 11.12 Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ be a power series with radius of convergence $R = \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}}$. Then $\sum_{k=0}^{\infty} c_k (x-a)^k$ is differentiable on $(a-R, a+R)$ and the derivative is $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$.

Proof. First, note that (by Exercises 10-6 and 10-16b)

$$\frac{1}{\limsup_{k\to\infty} \sqrt[k]{|k c_k|}} = \frac{1}{\lim_{k\to\infty} \sqrt[k]{k} \cdot \limsup_{k\to\infty} \sqrt[k]{|c_k|}} = \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}},$$

which means that the termwise derivative $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$ has the same radius of convergence as $\sum_{k=0}^{\infty} c_k (x-a)^k$. Let $r \in (0, R)$. Then for every $n \in \mathbb{N}$ and all $x \in (a-r, a+r)$ we have $\frac{d}{dx} \left( \sum_{k=0}^{n} c_k (x-a)^k \right) = \sum_{k=1}^{n} k c_k (x-a)^{k-1}$ and the derivative is continuous on $(a-r, a+r)$. Moreover, $\left\{ \sum_{k=0}^{n} c_k (x-a)^k \right\}_{n=1}^{\infty}$ converges pointwise to $\sum_{k=0}^{\infty} c_k (x-a)^k$ on $(a-r, a+r)$ and the sequence of the derivatives $\left\{ \sum_{k=1}^{n} k c_k (x-a)^{k-1} \right\}_{n=1}^{\infty}$ converges uniformly to $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$ on $(a-r, a+r)$. Therefore by Theorem 11.11 the function $\sum_{k=0}^{\infty} c_k (x-a)^k$ is differentiable on the interval $(a-r, a+r)$ and the derivative is $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$. Because $r \in (0, R)$ was arbitrary, $\sum_{k=0}^{\infty} c_k (x-a)^k$ is differentiable at every point $x \in (a-R, a+R)$ and the derivative at $x$ is $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$. ∎

In particular, Corollary 11.12 shows that the derivative of a power series is again a power series. Therefore Corollary 11.12 can be applied to derivatives and higher derivatives of power series and we obtain the following.

Example 11.13 Any power series $f(x) = \sum_{k=0}^{\infty} c_k (x-a)^k$ with radius of convergence $R > 0$ is infinitely differentiable on the interval $(a-R, a+R)$. Moreover, the $k$th derivative at $x = a$ is $f^{(k)}(a) = k!\, c_k$. □

In light of Example 11.13, it is natural to ask if every infinitely differentiable function is a power series. Lemma 18.8 will show that this is not the case.

Exercises

11-12. Let $n \in \mathbb{N}$. Prove that $f_n(x) := 0$ for $0 \le x \le 1 - \frac{1}{n}$, $f_n(x) := n(x - 1) + 1$ for $1 - \frac{1}{n} < x < 1$, $f_n(x) := 1$ for $1 \le x \le 2$, defines a continuous function on $[0, 2]$ and that $f(x) := 0$ for $0 \le x < 1$, $f(x) := 1$ for $1 \le x \le 2$, is not continuous on $[0, 2]$. Then prove that $\{f_n\}_{n=1}^{\infty}$ converges pointwise to $f$.

11-13. Let $\{q_j\}_{j=1}^{\infty}$ be an enumeration of all rational numbers in $[0, 1]$ and let $n \in \mathbb{N}$. Prove that each function $f_n(x) := 0$ for $x \in [0, 1] \setminus \{q_1, \ldots, q_n\}$, $f_n(x) := 1$ for $x \in \{q_1, \ldots, q_n\}$, is Riemann integrable on $[0, 1]$. Then prove that the sequence $\{f_n\}_{n=1}^{\infty}$ converges pointwise to the Dirichlet function $f(x) := 0$ for $x \in [0, 1] \setminus \mathbb{Q}$, $f(x) := 1$ for $x \in [0, 1] \cap \mathbb{Q}$, which is not Riemann integrable on $[0, 1]$.

11-14. Prove Theorem 11.11 as follows. Fix $x_0 \in (a, b)$. Prove that $g$ is Riemann integrable from $x_0$ to any $x \in (a, b)$. Then use the Fundamental Theorem of Calculus to prove that for all $x \in (a, b)$ we have $\int_{x_0}^{x} g(t)\, dt = \lim_{n\to\infty} \bigl( f_n(x) - f_n(x_0) \bigr)$.
From this finding, conclude that f ' = g 11-15. Explain why in the proof of Theorem 11.11 the hypothesis that all with the hypothesis that g is continuous. fi are continuous can be replaced [fn]r=l 11-16. A sequence of functions f,,: [ a , b] + R is called equicontinuous at x iff for all thereisaS z Osothatforalln E Nandallz E [a. 61 with It-XI < 6 wehave f n ( Z ) - f n ( x ) 1 E 1 > 0 < E. (a) Prove that if { fn):=l is equicontinuous at x and converges pointwise to f : [ a , b] + B,then f is continuous at x. (b) Prove that if [fn),X=l is a sequence of continuous functions that converges uniformly to the continuous function f,then {fi2),"=1 is equicontinuous at every x E [ a ,b]. 11-17. Give an example of a sequence of functions [f,],"=, on [0, 11 that converges pointwise on [0, 11 to . . 1 1 a Riemann integrable function f : [0, 11 + X and f ( x ) d x # n$% 1 1 fn(x) dx. 11-18. Prove that if ( fn}F=l is a sequence of bounded functions on an interval I that converges uniformly to the function f : I --+ B, then f is bounded also. s a3 c k ( x - a ) k and 11-19. Let k=O let E x d k ( x - a ) k be two power series with positive radius of convergence and k=O x E N we have C k - a$ for all x E X with 1x - a J < E. Prove that k=O k=O then for all k dk(x ck(x - a ) k = z 0 be so that = dk. Hint. Derivatives at a . fn)FZ1 1 1-20. Fubini's Differentiation Theorem states that if { is a sequence of nondecreasing functions on [ a , b] so that f n 5 f n + l for all n E N,fn+l - f n is nondecreasing for all n E N and so that converges pointwise to f : [a. b] + R, then f ' ( x ) = lim f L ( x ) for almost all x E [a. b ] . [fn]r=l n+m We will prove this result using the steps below. (a) Prove that f is nondecreasing. (b) Prove that there is a set A & [ a , b] so that h ( [ a , b] \ A ) = 0 and all the derivatives involved exist at every x E A . Conclude that we only need to prove the equality for almost all x E A. Hint. Lebesgue's Differentiation Theorem (Exercise 10-7). 
(c) Prove that for all n E N and all x E A we have f,'(x) 9 Hint Consider that fn+l = f n ( f n + l - f n ) . + fi+l(x). (d) Prove that for all n E N the function f - fn is nondecreasing. (e) Prove that for all x E A we have f i ( x ) 5 f ' ( x ) cc [ f ( b )- f n k ( b ) ]converges and prove that (f) Construct a subsequence [ f n k ] E sol that k=l x C [f(.x) k=l - f n k (XI] converges for all x E [ a , b] 188 11. Sequences o f Functions m (g) Apply what we have proved so far to g ( x ) := [f(x) - f n k ( x ) ] to prove that the series k=l c4 [f'(x) k= 1 fik ( x )1 converges at almost every x E A with limit less than or equal to g ' ( x ) . { Conclude that f ' ( x ) - fik(x)\ k=m 1 converges to zero at almost every x (h) Prove that lim f A ( x ) = f'(x) at almost every x "+a, E A , which establishes the result. x; 1 1-21. A continuous nowhere differentiable function. Let r (x) := and let s ( x ) := t ( x ) c13 + (t(x-j ) + t ( x + j ) ). E A. For all n 1-x; 0; E 1 forO<x<-, 2 1 for-<x<l, 2forx # (0, l), 1 (1000"~) 100" N let f n ( x ) := --s i=l m fn. and define f := n=l Prove that f is continuous because it is the uniform limit of continuous functions. 1 so that Prove that for every x E R and n E N there is a z E R with /z- x / = 4 ' 1000" -f(x) > and conclude that f is not differentiable at any x E W. 1 I 1 z-x ~ Hint.The nth sawtooth function f n is made up of straight line segments with slopes 1 1 0 " on 1 intervals of length __ 2 ' 1000" ' Prove that for every continuous f : [ a .b] + R that is differentiable on ( a , b ) there is a of continuous functions that are not differentiable at any point in ( a , b ) , sequence . v J l i n ~ l . L ~ ~ ~ ~ ~ ~ . ~ i f fn r. m l i q ' n , (fn)F=, 11-22. By Exercise 4-17, if the derivative of the function f : ( a , b ) -+ R is zero for all x E ( a , b ) ,then f is constant. This exercise shows that if the derivative is only zero almost everywhere, then f need not be constant even if it is continuous. 
The function $\psi$ below is called Lebesgue's singular function.
Let $Q = \left\{ \frac{1}{3} \right\}_{n=1}^{\infty}$, let $C^Q$ be the ternary Cantor set and use the notation of Definition 7.22.
Let $n \in \mathbb{N}$. Prove that for all $i \in \{1, \ldots, 2^n\}$ we have [...].
Let $n \in \mathbb{N}$. For every $x \in [0,1] \setminus$ [...] let $i_x \in \{1, \ldots, 2^n\}$ be the largest number in $\{1, \ldots, 2^n\}$ so that $x$ is an upper bound of [...] and $x \notin$ [...]. For $x \in$ [...], let $i_x := 0$. Define [...]. Prove that $\psi_n : [0,1] \to [0,1]$ is continuous, surjective and $\psi_n[\,\ldots\,] = [0,1]$.
Prove that $\{\psi_n\}_{n=1}^{\infty}$ converges uniformly to a continuous, nondecreasing function $\psi$.
Prove that $\psi\left[C^Q\right] = [0,1]$ and that $\psi$ is constant on every subinterval of $[0,1] \setminus C^Q$, which means $\psi' = 0$ a.e.

Chapter 12 Transcendental Functions

In this chapter, we use the machinery built so far to define the transcendental functions and to investigate their properties. It would have been possible to define the transcendental functions earlier, but the proofs would have been harder.

12.1 The Exponential Function

The question "Is there a nonzero function that is equal to its own derivative?" can be tackled with Corollary 11.12. If there is a power series $\sum_{k=0}^{\infty} c_k x^k$ with this property, then for all $k \in \mathbb{N}$ we infer $k c_k = c_{k-1}$. That is, $c_k = \frac{1}{k} c_{k-1} = \cdots = \frac{1}{k!} c_0$. The most natural nonzero value for $c_0$ is 1. Thus we are interested in the power series $\sum_{k=0}^{\infty} \frac{x^k}{k!}$.

Definition 12.1 For $x \in \mathbb{R}$, we define $\exp(x) := \sum_{k=0}^{\infty} \frac{x^k}{k!}$.

Theorem 12.2 The series that defines $\exp(\cdot)$ has infinite radius of convergence. Therefore $\exp(\cdot)$ is differentiable on $\mathbb{R}$ with $\frac{d}{dx} \exp(x) = \exp(x)$ for all $x \in \mathbb{R}$.

Proof. Because the last [...] terms of $k!$ exceed [...], we obtain for the radius of convergence [...], which proves that the radius of convergence is infinite. The differentiability of $\exp(\cdot)$ on $\mathbb{R}$ follows from Corollary 11.12. To see that $\frac{d}{dx} \exp(x) = \exp(x)$ we also use Corollary 11.12 and obtain the following.
\[ \frac{d}{dx} \exp(x) = \sum_{k=1}^{\infty} k \frac{x^{k-1}}{k!} = \sum_{k=1}^{\infty} \frac{x^{k-1}}{(k-1)!} = \sum_{n=0}^{\infty} \frac{x^n}{n!} = \exp(x). \]
∎

Aside from being equal to its own derivative, the function $\exp(\cdot)$ has interesting algebraic properties.

Theorem 12.3 For all $x, y \in \mathbb{R}$, we have $\exp(x) \exp(y) = \exp(x + y)$.

Proof. Let $x, y \in \mathbb{R}$. Then via Theorem 10.14 (or Corollary 10.15) and the Binomial Theorem we obtain
\[ \exp(x)\exp(y) = \left( \sum_{k=0}^{\infty} \frac{x^k}{k!} \right) \left( \sum_{j=0}^{\infty} \frac{y^j}{j!} \right) = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \frac{x^k}{k!} \frac{y^{n-k}}{(n-k)!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = \sum_{n=0}^{\infty} \frac{(x+y)^n}{n!} = \exp(x+y). \]
∎

Theorem 12.4 For all rational numbers $r$, we have $\exp(r) = (\exp(1))^r$.

Proof. With an easy induction it can be proved that for all $n \in \mathbb{N}$ and $x \in \mathbb{R}$ we have $\exp(nx) = (\exp(x))^n$ (Exercise 12-1). It is also easy to see that $\exp(0) = 1$. Moreover, for all $x \in \mathbb{R}$ we have $\exp(x) \exp(-x) = \exp(x - x) = \exp(0) = 1$, which means that $\exp(-x) = \frac{1}{\exp(x)}$. In particular, for all $n \in \mathbb{Z}$ we infer $\exp(nx) = (\exp(x))^n$. Hence, for all rational numbers $\frac{p}{q}$ with $p \in \mathbb{Z}$ and $q \in \mathbb{N}$ we obtain
\[ \exp\left( \frac{p}{q} \right) = \left( \left( \exp\left( \frac{p}{q} \right) \right)^{q} \right)^{\frac{1}{q}} = \big( \exp(p) \big)^{\frac{1}{q}} = \big( (\exp(1))^p \big)^{\frac{1}{q}} = (\exp(1))^{\frac{p}{q}}. \]
∎

Theorem 12.4 shows that on $\mathbb{Q}$ the function $\exp(\cdot)$ is an exponential function. Exponentials with irrational exponents have remained undefined so far. Because $\exp(\cdot)$ is differentiable, it is continuous. The most sensible way to extend an exponential function defined on $\mathbb{Q}$ to all of $\mathbb{R}$ is so that the resulting function is continuous. Thus the following definition is reasonable.

Definition 12.5 Euler's number is defined to be $e := \exp(1)$. The function $\exp(\cdot)$ is called the natural exponential function and for all $x \in \mathbb{R}$ we set $e^x := \exp(x)$.

By Theorem 12.4 for $x \in \mathbb{Q}$, the equality $\exp(x) = e^x$ holds with the power on the right side being a power as in Definition 1.50. Next, we want to define powers with real exponents for all positive real bases. To do so, we first need to prove that $e^x$ has an inverse function.

Proposition 12.6 The natural exponential function is a strictly increasing, bijective function from $\mathbb{R}$ to $(0, \infty)$.

Proof. Let $x < y$. By Definition 12.1 for any $z > 0$, we have $e^z > e^0 = 1$. But this means $e^y = e^{y-x+x} = e^{y-x} e^x > e^x$, so the natural exponential function is strictly increasing and, in particular, injective.
Moreover, its values are greater than or equal to 1 on $[0, \infty)$. For $z < 0$ note that $e^z e^{-z} = e^{z-z} = 1$, which means that $e^z = \frac{1}{e^{-z}}$ must be greater than 0. Therefore, $e^x$ maps $\mathbb{R}$ into $(0, \infty)$.

Now note that for $x > 0$ we have $e^x > 1 + x$, and hence $\lim_{x\to\infty} e^x = \infty$. Consequently, $\lim_{x\to-\infty} e^x = \lim_{x\to-\infty} \frac{1}{e^{-x}} = 0$. By Theorem 12.2, the natural exponential function is differentiable, and hence continuous. The above limits and the Intermediate Value Theorem show that the natural exponential function is surjective onto $(0, \infty)$. (Mentally fill in the details.) ∎

Definition 12.7 We define the natural logarithm function $\ln : (0, \infty) \to \mathbb{R}$ to be the inverse function of $\exp : \mathbb{R} \to (0, \infty)$.

The natural logarithm allows us to define powers of positive bases with arbitrary real exponents. With an argument similar to the proof of Theorem 12.4 we can show that this definition agrees with Definition 1.50 for rational exponents. (Exercise 12-2.)

Definition 12.8 Let $a > 0$ be a positive real number and let $r \in \mathbb{R}$. Then we define the $r$th power of $a$ to be $a^r := \exp(r \ln(a)) = e^{r \ln(a)}$.

Theorem 12.9 For all positive numbers $a$ and $b$, and all real numbers $x$ and $y$ the following hold.
1. $a^x a^y = a^{x+y}$
2. $(ab)^x = a^x b^x$
3. $a^{x-y} = \frac{a^x}{a^y}$
4. $\left( \frac{a}{b} \right)^x = \frac{a^x}{b^x}$
5. $(a^x)^y = a^{xy}$
6. $a^x > 0$

Proof. All properties follow directly from corresponding properties of the natural exponential function. The first property is proved as follows.
\[ a^x a^y = e^{x \ln(a)} e^{y \ln(a)} = e^{x \ln(a) + y \ln(a)} = e^{(x+y) \ln(a)} = a^{x+y}. \]
Working rows left to right we obtain the following for the second property.
\[ (ab)^x = \left( e^{\ln(a)} e^{\ln(b)} \right)^x = \left( e^{\ln(a) + \ln(b)} \right)^x = e^{\ln\left( e^{\ln(a) + \ln(b)} \right) x} \]
\[ = e^{(\ln(a) + \ln(b)) x} = e^{\ln(a) x + \ln(b) x} = e^{\ln(a) x} e^{\ln(b) x} = a^x b^x. \]
The third property follows from $a^{x-y} a^{y-x} = 1$ and $a^{x-y} a^y = a^x$. The remaining properties are left as Exercise 12-3. ∎

Of course, the properties of the natural logarithm function are also of interest.

Theorem 12.10 The natural logarithm function is differentiable on $(0, \infty)$ with derivative $\frac{d}{dx} \ln(x) = \frac{1}{x}$.
Consequently, all power functions $f(x) := x^r$ with $r \in \mathbb{R} \setminus \mathbb{Q}$ are differentiable on $(0, \infty)$ and the Power Rule $\frac{d}{dx} x^r = r x^{r-1}$ holds.

Proof. By Theorem 4.21, the natural logarithm function is differentiable at every $x \in (0, \infty)$ and $\frac{d}{dx} \ln(x) = \frac{1}{e^{\ln(x)}} = \frac{1}{x}$. For the derivative of powers, the Chain Rule implies
\[ \frac{d}{dx} x^r = \frac{d}{dx} e^{r \ln(x)} = e^{r \ln(x)}\, r\, \frac{1}{x} = r x^{r-1}. \]
∎

Because the natural logarithm function is differentiable, it is continuous, which allows us to extend the limit law for powers to arbitrary exponents.

Corollary 12.11 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of nonnegative numbers and let $r \in \mathbb{R}$. Then $\lim_{n\to\infty} a_n^r = \left( \lim_{n\to\infty} a_n \right)^r$ unless $r < 0$ and $\lim_{n\to\infty} a_n = 0$.

Proof. Exercise 12-4b. ∎

Further properties of the natural logarithm function are exhibited in Exercises 12-5 and 12-6.

Exercises

12-1. Prove by induction that for all $n \in \mathbb{N}$ we have $\exp(n) = (\exp(1))^n$ as claimed in the proof of Theorem 12.4.

12-2. Prove that for all $x > 0$ and all $r \in \mathbb{Q}$ the definitions of the power $x^r$ in Definition 1.50 and Definition 12.8 agree.

12-3. Finish the proof of Theorem 12.9. That is, prove the following for all $a, b > 0$ and $x, y \in \mathbb{R}$.
(a) $\left( \frac{a}{b} \right)^x = \frac{a^x}{b^x}$
(b) $(a^x)^y = a^{xy}$
(c) $a^x > 0$

12-4. The limit law for powers
(a) Prove that $\lim_{x\to-\infty} e^x = 0$.
(b) Prove Corollary 12.11. Be careful with sequences that go to zero.

12-5. Prove that the natural logarithm function is a strictly increasing bijective function from $(0, \infty)$ to $\mathbb{R}$ with $\lim_{x\to\infty} \ln(x) = \infty$ and $\lim_{x\to 0^+} \ln(x) = -\infty$.

12-6. Let $u, v > 0$. Prove each of the following.
(a) $\ln(1) = 0$
(b) $\ln(e) = 1$
(c) $\ln(uv) = \ln(u) + \ln(v)$
(d) $\ln\left( \frac{u}{v} \right) = \ln(u) - \ln(v)$
(e) $\ln(u^v) = v \ln(u)$

12-7. Limits of $n$th roots
(a) Prove that if $\lim_{n\to\infty} a_n = a > 0$, then [...]. Hint. Exercises 2-16 and 12-6.
(b) Prove that $\lim_{n\to\infty}$ [...] $= 1$.

12-8. Compute the integral $\int x^3 e^{x^2}\,dx$.

12-9. Gronwall's Inequality. Let $u, v : [a,b] \to [0, \infty)$ be continuous functions and let $c \ge$
0 be so that for all $x \in [a,b]$ we have $u(x) \le c + \int_a^x u(t) v(t)\,dt$. Prove that for all $x \in [a,b]$ we have
\[ u(x) \le c \exp\left( \int_a^x v(t)\,dt \right). \]
Hint. Divide by the right side, multiply by $v(x)$ and integrate.

12-10. The function $\Gamma(\alpha) := \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$ defined for $\alpha > 0$ is called the Gamma function.
(a) Prove that the improper integral $\int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$ converges for $\alpha \ge 1$.
(b) Prove that the improper integral also converges for $0 < \alpha < 1$.
(c) Prove that the improper integral diverges for $\alpha \le 0$.
(d) Prove that $\Gamma(1) = 1$.
(e) Prove that for all $\alpha \ge 1$ we have $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$.
(f) Prove by induction that for every natural number $n$ we have $\Gamma(n) = (n-1)!$. For this reason, the Gamma function is also referred to as the generalized factorial function.

12-11. Compute the following parameter dependent indefinite integrals. These integrals are useful for the integration of rational functions after a partial fraction decomposition. [...]

12.2 Sine and Cosine

Similar to the natural exponential function, the sine and cosine functions are defined via power series that have the right derivatives at the origin. Of course, this is a bit of "reverse engineering," because to obtain these derivatives we would need to quote arguments that rely on the geometric definition of these functions.

Definition 12.12 For $x \in \mathbb{R}$, we define $\sin(x) := \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}$, which is called the sine function, and $\cos(x) := \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}$, which is called the cosine function.

Theorem 12.13 The power series that define $\sin(\cdot)$ and $\cos(\cdot)$ have infinite radius of convergence. Therefore, both $\sin(\cdot)$ and $\cos(\cdot)$ are differentiable and moreover $\frac{d}{dx} \sin(x) = \cos(x)$ and $\frac{d}{dx} \cos(x) = -\sin(x)$ for all $x \in \mathbb{R}$.

Proof. We prove the result for $\sin(\cdot)$, leaving $\cos(\cdot)$ to the reader in Exercise 12-12. For the sine function, note that for every $x \in \mathbb{R}$ we have
\[ \lim_{k\to\infty} \left| \frac{ \frac{(-1)^{k+1} x^{2(k+1)+1}}{(2(k+1)+1)!} }{ \frac{(-1)^k x^{2k+1}}{(2k+1)!} } \right| = \lim_{k\to\infty} \left| \frac{x^{2k+3}}{(2k+3)!} \cdot \frac{(2k+1)!}{x^{2k+1}} \right| = \lim_{k\to\infty} \frac{x^2}{(2k+2)(2k+3)} = 0. \]
Hence, by the Ratio Test, the power series converges for all $x \in \mathbb{R}$, so its radius of convergence is infinite. By Corollary 11.12, the sine function is differentiable on $\mathbb{R}$ and
\[ \frac{d}{dx} \sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k (2k+1) x^{2k}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = \cos(x). \]
∎

The following identities are useful when working with sine and cosine.

Theorem 12.14 For all $x, y \in \mathbb{R}$ the following identities hold.
1. $\sin^2(x) + \cos^2(x) = 1$ (trigonometric law of Pythagoras)
2. $\sin(x + y) = \sin(x) \cos(y) + \cos(x) \sin(y)$
3. $\cos(x + y) = \cos(x) \cos(y) - \sin(x) \sin(y)$

Proof. To prove the first identity, we proceed as follows.
\[ \sin^2(x) + \cos^2(x) = \left( \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} \right)^2 + \left( \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} \right)^2 \]
\[ = \sum_{n=1}^{\infty} \left( \sum_{2j+1+2k+1=2n} \frac{(-1)^j}{(2j+1)!} \frac{(-1)^k}{(2k+1)!} \right) x^{2n} + 1 + \sum_{n=1}^{\infty} \left( \sum_{2j+2k=2n} \frac{(-1)^j}{(2j)!} \frac{(-1)^k}{(2k)!} \right) x^{2n} \]
\[ = 1 + \sum_{n=1}^{\infty} \frac{(-1)^n}{(2n)!} \left( \sum_{m=0}^{2n} \binom{2n}{m} (-1)^m \right) x^{2n} = 1 + \sum_{n=1}^{\infty} \frac{(-1)^n}{(2n)!} (1 - 1)^{2n} x^{2n} = 1. \]
The remaining two identities are left for Exercise 12-13. ∎

The smallest positive zero of the sine function also has a special place in mathematics. Of course, we first must show that the sine function has a positive zero.

Proposition 12.15 The function $\sin(x)$ is positive on $\left( 0, \sqrt{6} \right)$ and it is negative at 4.

Proof. For all $x \in \left( 0, \sqrt{6} \right)$, we have
\[ \sin(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{x^{4k+1}}{(4k+1)!} \left( 1 - \frac{x^2}{(4k+2)(4k+3)} \right) > 0. \]
On the other hand, for $x = 4$ we obtain
\[ \sin(4) = \sum_{k=0}^{\infty} (-1)^k \frac{4^{2k+1}}{(2k+1)!} = \sum_{k=0}^{4} (-1)^k \frac{4^{2k+1}}{(2k+1)!} - \sum_{k=3}^{\infty} \frac{4^{4k-1}}{(4k-1)!} \left( 1 - \frac{4^2}{4k(4k+1)} \right) < 0, \]
where the first term is negative because
\[ \sum_{k=0}^{4} (-1)^k \frac{4^{2k+1}}{(2k+1)!} = 4 - \frac{4^3}{1 \cdot 2 \cdot 3} + \frac{4^5}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} - \frac{4^7}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7} + \frac{4^9}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 9} \]
\[ = 4 - \frac{118 \cdot 4^3}{1 \cdot 2 \cdot 3 \cdot 5 \cdot 6 \cdot 9} = 4 - 4 \cdot \frac{472}{405} < 0. \]
∎

Definition 12.16 The smallest positive zero of the function $\sin(x)$ is called $\pi$.

We conclude by proving that the sine and cosine functions are periodic.

Definition 12.17 Let $p > 0$ be a real number. A function $f : \mathbb{R} \to \mathbb{R}$ is called periodic with period $p$ iff for all $x \in \mathbb{R}$ we have $f(x + p) = f(x)$.

Theorem 12.18 Both the sine and the cosine function have period $2\pi$.

Proof.
Note that $\sin(2\pi) = \sin(\pi + \pi) = \sin(\pi) \cos(\pi) + \cos(\pi) \sin(\pi) = 0$. This implies $\cos^2(2\pi) = 1 - \sin^2(2\pi) = 1$ and because $\cos(2\pi) = \cos(\pi + \pi) = \cos(\pi) \cos(\pi) - \sin(\pi) \sin(\pi) = \cos^2(\pi) > 0$, we infer that $\cos(2\pi) = 1$. Hence, for all $x \in \mathbb{R}$ we obtain
\[ \sin(x + 2\pi) = \sin(x) \cos(2\pi) + \cos(x) \sin(2\pi) = \sin(x), \]
\[ \cos(x + 2\pi) = \cos(x) \cos(2\pi) - \sin(x) \sin(2\pi) = \cos(x). \]
∎

Exercises

12-12. Prove that the power series that defines the cosine function has infinite radius of convergence and that $\frac{d}{dx} \cos(x) = -\sin(x)$.

12-13. Finish the proof of Theorem 12.14.
(a) Use power series to prove $\sin(x + y) = \sin(x) \cos(y) + \cos(x) \sin(y)$ for all $x, y \in \mathbb{R}$.
(b) Use the above and the law of Pythagoras to prove $\cos\left( \frac{\pi}{2} \right) = 0$ and $\sin\left( \frac{\pi}{2} \right) = 1$.
(c) Use the power series to prove $\sin(-x) = -\sin(x)$ and $\cos(-x) = \cos(x)$ for all $x \in \mathbb{R}$.
(d) Prove that $\sin\left( \frac{\pi}{2} - x \right) = \cos(x)$ for all $x \in \mathbb{R}$. (Use parts 12-13a, 12-13b and 12-13c.)
(e) Prove that $\cos\left( \frac{\pi}{2} - x \right) = \sin(x)$ for all $x \in \mathbb{R}$.
(f) Prove that $\cos(x + y) = \cos(x) \cos(y) - \sin(x) \sin(y)$ for all $x, y \in \mathbb{R}$.

12-14. Prove that $\pi > 3$.

12-15. A function $f : [0, 2\pi] \to \mathbb{R}$ of the form $f(x) = a_0 + \sum_{j=1}^{n} \big( a_j \cos(jx) + b_j \sin(jx) \big)$ is called a trigonometric polynomial.
(a) Prove the following product-to-sum formulas.
i. $\cos(x) \cos(y) = \frac{1}{2} \big[ \cos(x + y) + \cos(x - y) \big]$
ii. $\sin(x) \sin(y) = \frac{1}{2} \big[ \cos(x - y) - \cos(x + y) \big]$
iii. $\sin(x) \cos(y) = \frac{1}{2} \big[ \sin(x + y) + \sin(x - y) \big]$
(b) Prove that the product of two trigonometric polynomials is a trigonometric polynomial.

12-16. Inverse trigonometric functions.
(a) Prove that the sine function is injective on $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$.
(b) The inverse of the sine function restricted to $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$ is called the arcsine $\arcsin(\cdot)$. Prove that $\frac{d}{dx} \arcsin(x) = \frac{1}{\sqrt{1 - x^2}}$ for all $x \in (-1, 1)$.
(c) Prove that the cosine function is injective on $[0, \pi]$.
(d) The inverse of the cosine function restricted to $[0, \pi]$ is called the arccosine $\arccos(\cdot)$. Prove, for all $x \in (-1, 1)$,
that - arccos(x) = -dx qF2 sin(x) 12-17. The tangent function is defined to be tan(x) := cos(x) (a) Prove that the tangent function is differentiable on its domain and d dx - tan(x) 1 cosZ(x) =- n (- n2 , -). 2 The inverse of the tangent function restricted to (- 5, ); (b) Prove that the tangent function is injective on - is called the arctangent arctan(,). 1 d Prove that - arctan(x) = dx 1 +x2' (d) (Another integral for integration with partial fraction decompositions.) Compute the inte1 gralS dx. (c) ~ 12-18. (The last integral needed for integration with partial fraction decompositions.)Proceed as follows to prove that for all natural numbers n > 1 we have (a) Use integration by parts on 1 1 (x2 + b2)n dx = 1 1 . ( x 2 + b2)" d x X2 (b) The resulting equation contains an integral (x2 + b2)"" (b) 1 d x . Expand the numerator with i b 2 and cancel what can be canceled. (c) Solve the equation for s (x2 + b2)"fl dx. 12-19. Compute each of the integrals below (a) / Cxsin(x) dx. .rex2 sin (x2) dx. 12-20. A representation of n (a) Prove that l b 1 . sin"(x) dx = - - sin"-'(x)cos(x) sinnP2(x) dx for all nat- ural numbers n E N and all real numbers a < b. Hint. Use sinn(x) = sin"-'(x) ( 1 - cos2(x) ) and use integration by parts for the summand ( sinn-2(x) cos(x) ) cos(x). 198 12. Transcendental Functions Prove by induction that 1'. 2n-1 2n-3 IT sin 2n (x) dx = __ . -. . . - - for all n 2n 2n-2 2 2 2n 2n - 2 d x = -. -. . . - for all n E 2 n + 1 2n-1 3 Prove that ,7 - 2 = lim - n - + ~2n n k= 1 ~ E W. N. (2k)2 This is called Wallis' Product Formula. (2k- 1 ) 2 ' s,' (x)dx ? sin2"(x) dx ? 1% sinZnf1(x) dx for all n E N, IT substitute the above expressions and divide by the expression in front of the -. 2 R is Riemann integrable and has a Riemann integrable derivative, then that if f : [0, n ] + 12-21. I -1" (x - + bzn f ( k ) = L n f(x) dxf ler's Summation Formula. Hint. Note that J';l (x i) - f'(x) = 1 dx.This formula is called Eu- f'(x) 2 . 
_ [ f (0) + f (1) ] - / 1 f ( x ) dx and that the last integral 0 on the right side of Euler's Summation Formula can be turned into a sum of integrals like these by integrating from one integer to the next and applying the appropriate shift. n 12-22 An asymptotic expression for n ! . To see the idea for the first step, note that ln(n!) = ln(k). k=1 (a) Apply Euler's Summation Formula to f ( x ) = ln(x { (b) Prove that / n (x Hint. 1 0 1' i) (x X 1 - 1x1) __ dx] converges. 't 1; + I' k) x+l dx = - + 1) to prove that for all n E W we have n=l (x - 1) 2 x (x i k n!e" (c) Let bn := __ Prove that (bn}r=lconverges to a limit b nnJTi' 1 b (d) Use Wallis' Product Formula to show that - = lim E - 2 1 dx. R 1 5 = ~.& n+m b: n!e" (e) Prove that lim -n+m nn& - This result is called Stirling's Formula. It is often written as n ! read "is asymptotically equal to." 12-23 ($1' where "->, -.en fin&= is for # O3 Prove that w ~ ( 0=) 2. for x = 0. 12-24 A differentiable function with bounded, but not Riemann integrable, derivative (see Figure 25) (a) Prove that for all 8 > 0 there is an xg (b) Prove that f g ( x ) := 'OS (4) E ; part 12-24a, is differentiable on R, but ( 0 , 8 )so that 2x cos (:) + sin (:) =0 for x 5 0, for (O' x8), where xg is a positive number as in ' fi is not continuous at x = 0. 12.3. L’H6pital’sRule 199 If IIII IIV V I l v IIII I V w vI IIII I1 v v II IIII Figure 25: The differentiable function h in Exercise 12-24 oscillates so that the derivative h’ is discontinuous on a Cantor set (marked by blocks). (c) Prove that g a ( x ) := L f4 (XI; 6 forx 5 - ’ is differentiable on W,g; is discontinuous at forx z -, 2 5 (0.8). fs (8 - x ) : 0 and at 6 and (x : ga(x) #0] (d) For any open interval ( a , b ) , let g ( , , b ) ( x ) := gIb-,l ( x - a ) , Prove that g(,,b) is differentiable, g;u,b) is discontinuous at a and at b and (x E W : g(,,b)(X) # 0) g ( a . b ) . cx (e) Let C Q = n Cf be a Cantor set of nonzero Lebesgue measure, which exists by Exercise n=l 8-3e. 
For each $n \in \mathbb{N}$ let $D_i^n$, $i = 1, \ldots, 2^n - 1$ be a sequence of pairwise disjoint open intervals so that $[0,1] \setminus C_n^Q = \bigcup_{i=1}^{2^n - 1} D_i^n$. Let $h_n := \sum_{i=1}^{2^n - 1} g_{D_i^n}$. Prove that $h := \lim_{n\to\infty} h_n$ (taken pointwise) is differentiable on $\mathbb{R}$ and $h'$ is discontinuous on $C^Q$.
(f) Prove that $h'$ is bounded, but not Riemann integrable on $[0,1]$.
(g) Prove that $h'$ is Lebesgue measurable, and hence Lebesgue integrable on $[0,1]$. Hint. Represent $h'$ as a sum of measurable functions and obtain $(h')^{-1}\big[ (a, \infty) \big]$ as a union or intersection of preimages of $(a, \infty)$ under these functions.

12.3 L'Hôpital's Rule

Limits of functions involving transcendental functions can be hard to compute. The algebra either involves power series, or it might even be so complicated that it is virtually impossible. L'Hôpital's Rule is a way to replace the limit of a quotient with the limit of the quotient of the derivatives, which may be easier to compute. The idea behind L'Hôpital's Rule is easily explained with power series. If $f(x) = \sum_{k=0}^{\infty} f_k (x-a)^k$ and $g(x) = \sum_{k=0}^{\infty} g_k (x-a)^k$ are power series with $f(a) = g(a) = 0$ and if $g'(a) \neq 0$, then
\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{\sum_{k=1}^{\infty} f_k (x-a)^k}{\sum_{k=1}^{\infty} g_k (x-a)^k} = \lim_{x\to a} \frac{\sum_{k=1}^{\infty} f_k (x-a)^{k-1}}{\sum_{k=1}^{\infty} g_k (x-a)^{k-1}} = \frac{f_1}{g_1} = \frac{f'(a)}{g'(a)}. \]
That is, the limit of the quotient of the functions is the quotient of the derivatives. Because not every function is a power series and because derivatives cannot be defined at infinity, in general we expect the limit of the quotient of the derivatives to be on the right side. To prove $\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}$, it is tempting to apply the Mean Value Theorem to the quotient, using that $f(a) = g(a) = 0$. But arguing
\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{ \frac{f(x) - f(a)}{x - a} }{ \frac{g(x) - g(a)}{x - a} } = \lim_{x\to a} \frac{f'(c_f)}{g'(c_g)} \]
is problematic. If $c_f$ and $c_g$ approach $a$ at different rates, then the limit need not be $\lim_{x\to a} \frac{f'(x)}{g'(x)}$. To get the right limit, we need $c_f = c_g$, which can be achieved with a stronger form of the Mean Value Theorem.

Theorem 12.19 Generalized Mean Value Theorem.
Let $f, g$ be functions that are continuous on $[a,b]$ and differentiable on $(a,b)$ and let $g(a) \neq g(b)$. Then there is a number $c \in (a,b)$ such that $\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}$.

Proof. Let $h(x) := [f(b) - f(a)]\, g(x) - [g(b) - g(a)]\, f(x)$ and apply Rolle's Theorem. (See Exercise 12-25.) ∎

Now we are ready to prove L'Hôpital's Rule.

Theorem 12.20 L'Hôpital's Rule. Let $a \in [-\infty, \infty]$ and let $f, g$ be differentiable functions defined on an interval $(z, \infty)$ (if $a = \infty$), or $(-\infty, z)$ (if $a = -\infty$), or $(a - \delta, a + \delta) \setminus \{a\}$ (if $a \in \mathbb{R}$). If the limits of $f$ and $g$ satisfy $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$ or both limits are in $\{\pm\infty\}$ and $\lim_{x\to a} \frac{f'(x)}{g'(x)}$ exists as a number or is infinite, then $\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}$. This rule also applies to one-sided limits.

Proof. Let $L := \lim_{x\to a} \frac{f'(x)}{g'(x)}$. First consider $a \in (-\infty, \infty]$ and $L \neq -\infty$. We claim that for all $y_0 < L$ there is an $x_0 < a$ so that for all $x \in (x_0, a)$ we have $\frac{f(x)}{g(x)} > y_0$.

Let $y_0 < L$. Then there is an $x_1 < a$ such that for all $x \in (x_1, a)$ we have $\frac{f'(x)}{g'(x)} > y_0$.

In case $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$, by the Generalized Mean Value Theorem for all $x \in (x_1, a)$ there is a $c \in (x, a)$ so that $\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)} > y_0$, which means in this case the claim holds with $x_0 := x_1$.

In case $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = \infty$, we can assume without loss of generality that $f$ and $g$ are positive on $[x_1, a)$. Let $\varepsilon > 0$ be such that $\frac{y_0}{(1 - \varepsilon)^2} < L$. Find an $x_2 \in (x_1, a)$ so that for all $c \in (x_2, a)$ we have that $\frac{f'(c)}{g'(c)} > \frac{y_0}{(1 - \varepsilon)^2}$. Then find an $x_0 \in (x_2, a)$ so that for all $x \in (x_0, a)$ we have that $\frac{f(x)}{f(x) - f(x_2)} > 1 - \varepsilon$ and $\frac{g(x) - g(x_2)}{g(x)} > 1 - \varepsilon$.
Then for all $x \in (x_0, a)$ by the Generalized Mean Value Theorem there is a $c \in (x_2, x)$ so that $\frac{f(x) - f(x_2)}{g(x) - g(x_2)} = \frac{f'(c)}{g'(c)} > \frac{y_0}{(1 - \varepsilon)^2}$, and hence for all $x \in (x_0, a)$ we obtain
\[ \frac{f(x)}{g(x)} = \frac{f(x) - f(x_2)}{g(x) - g(x_2)} \cdot \frac{f(x)}{f(x) - f(x_2)} \cdot \frac{g(x) - g(x_2)}{g(x)} > \frac{f'(c)}{g'(c)} (1 - \varepsilon)^2 > y_0, \]
which proves the claim in this case. The other cases for the limits of $f$ and $g$ being infinite are proved similarly, so the claim is proved.

The claim proves the result for $L = \infty$ for left limits at $a \in \mathbb{R}$ and for limits at infinity. Similar to the above we can prove that for $L \neq \infty$ for all $y_0 > L$ there is an $x_0 < a$ so that for all $x \in (x_0, a)$ we have $\frac{f(x)}{g(x)} < y_0$. This proves the result for $L = -\infty$ for left limits at $a \in \mathbb{R}$ and for limits at infinity.

Putting the two results above together, for $L \in \mathbb{R}$ and every $\varepsilon > 0$ there is an $x_0 < a$ so that for all $x \in (x_0, a)$ we have $L - \varepsilon < \frac{f(x)}{g(x)} < L + \varepsilon$, which proves the result for $L \in \mathbb{R}$ for left limits at $a \in \mathbb{R}$ and for limits at $\infty$.

Repeating the above process for $a \in [-\infty, \infty)$ to the right of $a$ establishes the result for $a = -\infty$, for right limits at $a \in \mathbb{R}$ and for two-sided limits at $a \in \mathbb{R}$. ∎

Exercise 12-28 shows that L'Hôpital's Rule is not a one-for-one swap. The limit $\lim_{x\to a} \frac{f(x)}{g(x)}$ can exist even if $\lim_{x\to a} \frac{f'(x)}{g'(x)}$ fails to exist.

Aside from the obvious applications to quotients, L'Hôpital's Rule also allows us to derive a well-known representation for the exponential function.

Theorem 12.21 For all $x \in \mathbb{R}$ we have $e^x = \lim_{n\to\infty} \left( 1 + \frac{x}{n} \right)^n$.

Proof. For all $x \in \mathbb{R}$, we obtain
\[ \lim_{n\to\infty} \ln\left( \left( 1 + \frac{x}{n} \right)^n \right) = \lim_{n\to\infty} \frac{\ln\left( 1 + \frac{x}{n} \right)}{\frac{1}{n}} = \lim_{z\to 0^+} \frac{\ln(1 + xz)}{z} = \lim_{z\to 0^+} \frac{x}{1 + xz} = x. \]
Because the natural exponential function is continuous, the result follows. ∎

Exercises

12-25. Prove Theorem 12.19. Remember to prove that Rolle's Theorem can be applied.

12-26. Compute each of the limits below.
(a) $\lim_{x\to 4} \frac{2x^4 - 4x^3 + x^2 + x - 276}{x^2 - 16}$
(b) $\lim_{x\to 0^+} x \ln(x)$
[...]

12-27. Prove by induction that for all $n \in \mathbb{N}$ we have $\lim_{x\to\infty} x^n e^{-x} = 0$.

12-28.
Prove that for $f(x) = 2x + \sin(x)$, $g(x) = 2x - \sin(x)$ and $a = \infty$ we have $\lim_{x\to\infty} \frac{f(x)}{g(x)} = 1$, but $\lim_{x\to\infty} \frac{f'(x)}{g'(x)}$ does not exist.

12-29. Let $f : (0, \infty) \to \mathbb{R}$ be differentiable. Does $\lim_{x\to\infty} f'(x) = 0$ imply that $\lim_{x\to\infty} f(x)$ exists? Justify your answer.

12-30. Is there a differentiable function $f : (0, \infty) \to \mathbb{R}$ with $\lim_{x\to\infty} f'(x) = 0$ so that for every real number $L$ there is a sequence $\{x_n\}_{n=1}^{\infty}$ that goes to infinity and so that $\lim_{n\to\infty} f(x_n) = L$? Justify your answer.

12-31. Creating summation formulas. This exercise shows one way in which summation formulas for powers of integers (see Exercise 1-33) can be discovered. Let $f(x) := \sum_{k=1}^{n} e^{kx}$.
(a) Prove that for all $p \in \mathbb{N}$ we have $\sum_{k=1}^{n} k^p = f^{(p)}(0)$.
(b) Prove that for all $x \in \mathbb{R} \setminus \{0\}$ we have $\sum_{k=1}^{n} e^{kx} = \frac{e^x - e^{(n+1)x}}{1 - e^x}$.
(c) Prove that for all $p \in \mathbb{N}$ we have $f^{(p)}(0) = \lim_{x\to 0} \frac{d^p}{dx^p} \frac{e^x - e^{(n+1)x}}{1 - e^x}$.
(d) Use L'Hôpital's Rule to verify that $\sum_{k=1}^{n} k = \frac{n(n+1)}{2}$ for all $n \in \mathbb{N}$.
(e) Use a computer algebra system to generate a closed formula for $\sum_{k=1}^{n} k^5$.

12-32. Use Cauchy's Limit Theorem (see Exercise 2-51) to give an alternative proof of L'Hôpital's Rule.

Chapter 13 Numerical Methods

Many problems cannot be solved exactly. Therefore it is natural to consider computational approaches to mathematics. Numerical analysis is a wide field. For any problem that can be solved exactly (under good circumstances), there is at least one numerical method to provide an approximate solution in case exact methods fail. Usually, a numerical method contains a parameter, call it $n$, that indicates the computational effort required to obtain the approximation. With enough computational effort, a numerical method should provide approximations close to the exact solution. More formally, this means that as $n$ goes to infinity the limit of our approximations should be the exact solution. But just having approximations that converge to the exact answers usually is not enough.
We want to obtain good approximations with as little computational effort as possible. This means we not only need to assure that, given enough computational effort, the approximations converge to the correct result. We also must analyze how fast the approximations converge. In the language that we have developed, this means that just showing that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the $n$th approximation is within $\varepsilon$ of the exact solution is not enough. We also want our estimates to be sharp enough so that when, say, $N = 10$ guarantees a desired accuracy, we do not use a larger $N$ and waste computational effort. So, where in proofs so far we were satisfied with the fact that $N$ exists, in numerical analysis we want to know what $N$ is. Where in proofs so far we were satisfied that estimates ultimately showed that a certain difference is smaller than $\varepsilon$, in numerical analysis we want to perform the estimate with an $N$ that is as small as possible.

For this reason, this chapter will emphasize error analysis. We present numerical approaches for three typical tasks: the representation of functions in Section 13.1, the solution of equations in Section 13.2 and the computation of integrals in Section 13.3.

13.1 Approximation with Taylor Polynomials

It seems mundane, but the most fundamental numerical task is the computation of the values of functions such as exponential and trigonometric functions. The exact values of these functions can only be computed for certain special input values $x$. For all other input values, we need to use approximation techniques. The most fundamental of these techniques is the approximation with Taylor polynomials.

There are several ways to motivate the use of polynomials. Most importantly, polynomials are easy to compute, which is a paramount concern in numerical analysis. Moreover, for each of the many functions defined as power series there is a sequence of polynomials that converges to it.
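The last point can be made concrete with a short numerical sketch. The Python snippet below is only an illustration and not part of the text's development: the name `exp_partial_sum` is a hypothetical helper introduced here. It evaluates the polynomial $\sum_{k=0}^{n} \frac{x^k}{k!}$, that is, the $n$th partial sum of the series from Definition 12.1, and prints its distance to the library value `math.exp(x)` for a few degrees.

```python
import math


def exp_partial_sum(x, n):
    """Return sum_{k=0}^{n} x**k / k!, the n-th partial sum of the exp series.

    Hypothetical helper for illustration only.
    """
    term = 1.0   # the k = 0 term, x**0 / 0!
    total = 1.0
    for k in range(1, n + 1):
        term *= x / k   # turns x**(k-1)/(k-1)! into x**k/k!
        total += term
    return total


# Compare a few partial sums at x = 1 with the library value of e.
for degree in (1, 2, 3, 5, 10):
    approximation = exp_partial_sum(1.0, degree)
    print(degree, approximation, abs(approximation - math.exp(1.0)))
```

The loop updates each term from the previous one instead of recomputing powers and factorials, which keeps the cost at one multiplication and one division per term. How large the degree must be for a prescribed accuracy is a separate question that requires an error estimate.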
Geometrically, we can argue that the tangent line of a differentiable function $f$ at a point $a$ has the same value and the same first derivative as $f$ at $a$ and, locally, it approximates $f$ rather well. We have reason to hope that, by increasing the number of derivatives that agree with the derivatives of the function at $a$, we can enlarge the interval on which we have a good approximation for $f$. To increase the number of derivatives that agree at $a$, we need to use polynomials of degree greater than 1.

Theorem 13.1 Let the function $f$ be $n$ times differentiable at $a$. Then the polynomial $T_n(x) := \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x-a)^j$ is such that the first $n$ derivatives of $T_n$ at $a$ are equal to the first $n$ derivatives of $f$ at $a$. That is, $T_n^{(k)}(a) = f^{(k)}(a)$ for $k = 0, \ldots, n$.

Proof. Prove by induction that for $1 \le k \le n$ the $k$th derivative of $T_n(x)$ is $T_n^{(k)}(x) = \sum_{j=k}^{n} \frac{f^{(j)}(a)}{j!}\, j (j-1) \cdots (j-k+1) (x-a)^{j-k}$. (Exercise 13-1.) ∎

Theorem 13.1 motivates the definition of Taylor polynomials and Taylor series.

Definition 13.2 Let the function $f$ be $n$ times differentiable at $a$. Then the polynomial $T_n(x) := \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x-a)^j$ is called the $n$th Taylor polynomial of $f$ at $a$. If $f$ is infinitely differentiable at $a$, the series $T(x) := \sum_{j=0}^{\infty} \frac{f^{(j)}(a)}{j!} (x-a)^j$ is called the Taylor series of $f$ at $a$.

The definitions of the exponential and the trigonometric functions guarantee that the Taylor polynomials at $a = 0$ ultimately provide good approximations for these functions (see Exercise 13-2). However, as mentioned in the introduction, for numerical purposes it is not sufficient to just know that for some degree $n$ the $n$th Taylor polynomial of $f$ is close to $f$. We need to know how close $T_n$ is to $f$.

Theorem 13.3 Taylor's Formula. If $f$ is $(n+1)$ times continuously differentiable on $(a - R, a + R)$, then for all $x \in \mathbb{R}$ with $|x - a| < R$ we have
\[ f(x) = T_n(x) + (x-a)^{n+1} \int_0^1 \frac{(1-u)^n}{n!} f^{(n+1)}\big( a + u(x-a) \big)\,du. \]
In particular, if $|x - a| < R$ and $M$ is such that for all $z \in \mathbb{R}$ with $|z - a| < R$ we have $\left| f^{(n+1)}(z) \right| \le M$, then $\left| f(x) - T_n(x) \right| \le \frac{M}{(n+1)!} |x - a|^{n+1}$.

Proof. This proof is an induction on $n$. For the base step $n = 0$ note that for all $x \in (a - R, a + R)$ we have
\[ f(x) = f(a) + \int_a^x f'(t)\,dt = T_0(x) + (x-a) \int_0^1 f'\big( a + u(x-a) \big)\,du. \]
For the induction step, we use integration by parts on the induction hypothesis. Let $x \in (a - R, a + R)$ and let $f$ be $n+2$ times continuously differentiable on $(a - R, a + R)$. Then
\[ f(x) = T_n(x) + (x-a)^{n+1} \int_0^1 \frac{(1-u)^n}{n!} f^{(n+1)}\big( a + u(x-a) \big)\,du \]
\[ = T_n(x) + (x-a)^{n+1} \left( \left[ -\frac{(1-u)^{n+1}}{(n+1)!} f^{(n+1)}\big( a + u(x-a) \big) \right]_0^1 + \int_0^1 \frac{(1-u)^{n+1}}{(n+1)!} f^{(n+2)}\big( a + u(x-a) \big) (x-a)\,du \right) \]
\[ = T_n(x) + \frac{f^{(n+1)}(a)}{(n+1)!} (x-a)^{n+1} + (x-a)^{n+2} \int_0^1 \frac{(1-u)^{n+1}}{(n+1)!} f^{(n+2)}\big( a + u(x-a) \big)\,du \]
\[ = T_{n+1}(x) + (x-a)^{n+2} \int_0^1 \frac{(1-u)^{n+1}}{(n+1)!} f^{(n+2)}\big( a + u(x-a) \big)\,du. \]
The remainder of the proof is left to Exercise 13-3. ∎

There are two ways to use error estimates like Taylor's formula. In an a posteriori or after the fact estimate, the polynomial and the interval are given and we estimate the error. This is straight substitution into the formula (see Exercise 13-4). In an a priori or before the fact estimate, we have a desired accuracy for an interval and need to find an $n$ that guarantees this accuracy.

Example 13.4 Determine the degree $n$ so that the $n$th Taylor polynomial of $f(x) = e^x$ about $a = 0$ is within $10^{-4}$ of $e^x$ for all $x \in [-2, 2]$.

To bound the error by the maximum acceptable error of $10^{-4}$ we make the upper bound of the error from Theorem 13.3 less than the specified acceptable error. This makes the actual error less than the acceptable error. In the notation of Theorem 13.3, we have $a = 0$, $|x - a| \le 2$, and $n$ is to be determined. All derivatives of $e^x$ are again $e^x$ and $e^x$ is increasing. Hence, the upper bound $M$ for the $(n+1)$st derivative on $[-2, 2]$ is $e^2$. We set
\[ \left| f(x) - T_n(x) \right| \le \frac{e^2\, 2^{n+1}}{(n+1)!} \overset{!}{<} 10^{-4}, \]
where the inequality we need to solve is marked with an exclamation sign. The above inequality cannot be solved algebraically for $n$. However, $n!$ grows much faster than $2^n$. Thus by substituting values for $n$ and checking if the inequality is satisfied we find that $n = 11$ is large enough to guarantee the desired accuracy. For a visualization of the Taylor polynomials, consider Figure 26. □

Figure 26: Taylor polynomials are local approximations to the function. The left graph shows the exponential function and its first (dashed), second (dotted) and third (dash-dotted) Taylor polynomials. Note how the approximation gets better as the degree of the polynomials increases. The right graph shows the exponential function and its eleventh Taylor polynomial as demanded in Example 13.4.

Remark 13.5 Not every function can be approximated well with Taylor polynomials. For example, Lemma 18.8 exhibits a function that is infinitely differentiable, not identical to zero, and yet all its Taylor polynomials at $a = 0$ are identical to zero. □

Remark 13.6 Early operating systems, such as the one on the Commodore 64 in the 1980s, used Taylor polynomials of sufficiently high degree to compute many functions. Nowadays, computational schemes that are faster than Taylor polynomials, but also more memory intensive, are used to compute functions. The reason is that memory is not as much of an issue as it was in the early days of computing, while speed remains a crucial concern. □

Remark 13.7 Taylor polynomials are also used in physics to obtain low-order approximations of complicated functions $f$. Typically, if $f$ is to be evaluated at $x + \Delta x$, where $\Delta x$ is small, the exact expression $f(x + \Delta x)$ is replaced with the approximately equal expression $f(x) + f'(x) \Delta x$. This is feasible because, as Taylor's Formula shows, the difference is often bounded by $C (\Delta x)^2$, where $C$ is a constant. If $\Delta x$ is small, terms of the order $(\Delta x)^2$ are usually negligible. The determination what is small and what is negligible is made based on practical, nonmathematical considerations. A posteriori, if the approximate formula correctly predicts an experiment, then the approximation must have been permissible.
A priori, one could say that if other effects influence the quantity given by $f$ by, say, 0.1% (of the underlying base unit), and $\Delta x$ is at most 1%, then $(\Delta x)^2$ is at most 0.01%, so it can be ignored because other effects will have greater influence. If a first order approximation as indicated does not work, higher order Taylor polynomials can be used in more sophisticated models. □

Functions can also be approximated with trigonometric polynomials. This idea is motivated by problems as described in Section 21.3. We will present the corresponding series, called Fourier series, in Section 20.2. The powerful tools available by then allow for a more efficient presentation than what would be possible now.

Exercises

13-1. Prove Theorem 13.1.

13-2. Prove that if $f(x)=\sum_{k=0}^{\infty}c_kx^k$ is a power series with nonzero radius of convergence, then the Taylor series of $f$ about $a=0$ is $\sum_{k=0}^{\infty}c_kx^k$.

13-3. Prove that if $f$ is $(n+1)$ times continuously differentiable on $(a-R,a+R)$, then for all $x\in\mathbb{R}$ with $|x-a|<R$ we have that $|f(x)-T_n(x)|\le\frac{M}{(n+1)!}|x-a|^{n+1}$, where $M$ is such that for all $|x-a|<R$ we have $|f^{(n+1)}(x)|\le M$.

13-4. Find an upper bound for the error incurred when approximating $f$ on $[l,r]$ with its $n$th degree Taylor polynomial at $a$.
(a) $f(x)=e^x$, $a=0$, $n=10$, $[l,r]=[-5,5]$
(b) $f(x)=\sin(x)$, $a=0$, $n=7$, $[l,r]=\left[-\frac{\pi}{2},\frac{\pi}{2}\right]$

13-5. Use induction to prove that the given expression is the $n$th derivative of the given function.
(a) $f(x)=\ln|x|$, $f^{(n)}(x)=\frac{(-1)^{n+1}(n-1)!}{x^n}$ for $n\ge 1$
(b) $f(x)=2^x$, $f^{(n)}(x)=2^x(\ln(2))^n$
(c) $f(x)=xe^x$, $f^{(n)}(x)=xe^x+ne^x$
(d) $f(x)=xe^{ax}$, $f^{(n)}(x)=a^nxe^{ax}+na^{n-1}e^{ax}$

13-6. Determine the smallest $n$ so that $|T_n(x)-f(x)|<\varepsilon$ for all $x\in[l,r]$, where $T_n$ is the $n$th Taylor polynomial of $f$ about $a$.
(a) $f(x)=e^x$, $a=0$, $[l,r]=[-10,10]$, $\varepsilon=10^{-5}$
(b) $f(x)=\cos(x)$, $a=0$, $[l,r]=[-\pi,\pi]$, $\varepsilon=$ …
(c) $f(x)=\sqrt{x}$, $a=1$, $[l,r]=[0.5,1.5]$, $\varepsilon=$ …
(d) $f(x)=\ln(x)$, $a=2$, $[l,r]=[1,3]$, $\varepsilon=$ …

13-7. Prove that for any $a>0$, the Taylor polynomials of $f(x)=\ln|x|$ at $a$ converge for $x\in(0,2a)$ and they diverge for $|x-a|>a$.

13-8. Second Derivative Test. Let $f:(a,b)\to\mathbb{R}$ be twice continuously differentiable and let $x\in(a,b)$ be so that $f'(x)=0$.
(a) Prove that if $f''(x)>0$, then there is an $\varepsilon>0$ so that for all $z\ne x$ with $|z-x|<\varepsilon$ we have that $f(x)<f(z)$.
(b) Prove that if $f''(x)<0$, then there is an $\varepsilon>0$ so that for all $z\ne x$ with $|z-x|<\varepsilon$ we have that $f(x)>f(z)$.
(c) State and prove a similar result for $f:(a,b)\to\mathbb{R}$ being $n$ times continuously differentiable with $f'(x)=f''(x)=\cdots=f^{(n-1)}(x)=0$. (Distinguish even and odd $n$.)

13-9. Efficient evaluation of polynomials. Let $p(x)=\sum_{j=0}^{n}a_jx^j$ be a polynomial.
(a) Prove that $p(x)=a_0+x\big(a_1+x\big(a_2+\cdots+x\big(a_{n-1}+x(a_n)\big)\cdots\big)\big)$.
(b) Count the number of operations in the evaluation of the sum $\sum_{j=0}^{n}a_jx^j$ and in the evaluation in part 13-9a to prove that evaluation as in part 13-9a takes fewer operations (and is thus more efficient) than evaluation of the original sum. Hint. Evaluating $a_jx^j$ takes $j$ floating point multiplications and floating point multiplications take much more time than floating point additions.
(c) State an $n$ step recursive procedure that evaluates polynomials as in part 13-9a. Hint. Start with $H_n:=a_n$ and define $H_{n-1},\ldots,H_1$ in such a way that $H_1=p(x)$.

13.2 Newton's Method

Solving equations is a common numerical task. The Intermediate Value Theorem guarantees that for equations $f(x)=0$ the issue usually is not if we can find solutions, but rather how fast we can compute them.

Example 13.8 The bisection method.
Let $f:[a,b]\to\mathbb{R}$ be continuous with $f(a)f(b)<0$. Then by the Intermediate Value Theorem $f$ has a zero in $[a,b]$. To simplify the presentation, assume without loss of generality that $f$ has a unique zero $z$ in $[a,b]$. We will recursively construct a sequence $\{x_n\}_{n=0}^{\infty}$ that converges to $z$. Let $x_0:=a$, $x_1:=b$, and $j(1):=0$. For the recursive construction, let $x_0,\ldots,x_n$ and $j(n)\in\{0,\ldots,n-1\}$ be so that $f(x_n)f(x_{j(n)})<0$ and $|x_n-x_{j(n)}|=\frac{b-a}{2^{n-1}}$. Define $x_{n+1}:=\frac{x_n+x_{j(n)}}{2}$. If $f(x_{n+1})=0$, stop the recursion because $x_{n+1}$ is the desired zero. Otherwise, if $f(x_{n+1})f(x_{j(n)})<0$ set $j(n+1):=j(n)$, and if $f(x_{n+1})f(x_n)<0$ set $j(n+1):=n$. In either case, $|x_{n+1}-x_{j(n+1)}|=\frac{b-a}{2^{n}}$, and we can continue the recursion.

By the Intermediate Value Theorem and the uniqueness of $z$, for each $n\in\mathbb{N}$ we obtain $|x_n-z|\le\frac{b-a}{2^{n-1}}$, and hence $\lim_{n\to\infty}x_n=z$. If $f$ has multiple zeroes in $[a,b]$ the argument becomes a bit messier, but the sequence will converge to one of the zeroes of $f$.

Aside from establishing convergence, the estimate that in the $n$th step we are within $\frac{b-a}{2^{n-1}}$ of a zero of $f$ tells us how many steps we need to execute to obtain an estimate of a given quality. For example, if $b-a=1$ and we want a number that approximates the first 15 decimal places of the zero $z$, we would need to execute 51 steps of the bisection method ($2^{-51+1}=2^{-50}=1{,}024^{-5}<10^{-15}$). □

Figure 27: An illustration of Newton's method that will be analyzed precisely in Exercise 13-12.

This is a large number of steps to carry out by hand. Nonetheless, to solve a single equation, a program would find the desired approximation rather quickly. However, when these methods were first designed, computers had not been invented, so a faster computation was desirable.
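A minimal Python sketch of the bisection idea from Example 13.8 follows. It keeps a bracketing interval instead of the index bookkeeping $j(n)$ used in the example, so the function name `bisection`, its parameters, and the step counter are illustrative assumptions rather than the text's notation.

```python
def bisection(f, a, b, tol=1e-15, max_steps=200):
    """Approximate a zero of a continuous f on [a, b], assuming f(a) * f(b) < 0.

    Returns the midpoint of the final bracket and the number of halvings.
    """
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    steps = 0
    while (b - a) > tol and steps < max_steps:
        mid = (a + b) / 2.0
        fm = f(mid)
        steps += 1
        if fm == 0:
            # The midpoint is an exact zero, so the recursion stops.
            return mid, steps
        if fa * fm < 0:
            # The sign change (and hence a zero) lies in [a, mid].
            b, fb = mid, fm
        else:
            # The sign change lies in [mid, b].
            a, fa = mid, fm
    return (a + b) / 2.0, steps


# Hypothetical sample problem: the positive zero of x^2 - 2 on [1, 2].
root, steps = bisection(lambda x: x * x - 2.0, 1.0, 2.0)
```

For this sample problem $b-a=1$, so the bracket length drops to $2^{-50}<10^{-15}$ after 50 halvings, which corresponds to $x_{51}$ in the indexing of Example 13.8.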
For today's applications, speed remains an issue, because in large scale computations we may need to solve billions of equations. This means every step we save would be magnified by a large factor, leading to a more efficient use of computational resources.

Newton's method reduces the (hard) computation of the zero of a differentiable function to the (easier) computation of the zero of the tangent line at a point. Say we are currently at a point $x_n$. Then the tangent line of $f$ at $x_n$ is the unique straight line that goes through the point $(x_n,f(x_n))$ and has slope $f'(x_n)$. Setting its equation $y=f(x_n)+f'(x_n)(x-x_n)$ equal to zero and solving for $x$ gives the expression $x=x_n-\frac{f(x_n)}{f'(x_n)}$ for the zero of the tangent line. Of course, the zero of the tangent line is not the zero of the function, but we would hope that it is closer to the zero of $f$ than $x_n$ was. Hence, we set $x_{n+1}:=x_n-\frac{f(x_n)}{f'(x_n)}$ and repeat the above with $x_{n+1}$ in place of $x_n$ (see Figure 27).

Definition 13.9 Let $f$ be a differentiable function and let $x_0$ be a point in its domain. Then Newton's method started at $x_0$ generates a sequence defined recursively by $x_{n+1}:=x_n-\frac{f(x_n)}{f'(x_n)}$ for $n\ge 0$.

If the sequence generated by Newton's method converges, then under mild hypotheses the limit is a solution of the equation $f(x)=0$.

Theorem 13.10 Let $f$ be a continuously differentiable function. If the sequence generated by Newton's method converges to a limit $L$ in the domain of $f$ and $f'(L)\ne 0$, then $f(L)=0$.

Proof. Because the sequence $\{x_n\}_{n=0}^{\infty}$ converges, we can take limits on both sides of the recurrence relation.

$$\lim_{n\to\infty}x_{n+1}=\lim_{n\to\infty}\left(x_n-\frac{f(x_n)}{f'(x_n)}\right)$$

leads to the equation $L=L-\frac{f(L)}{f'(L)}$, which clearly implies $f(L)=0$. ∎

The reader will prove in Exercise 13-10 that $L$ is also a zero of $f$ if $f'(L)=0$. Another way of looking at the proof of Theorem 13.10 is to say that the limit $L$ is a fixed point of the function $F(x)=x-\frac{f(x)}{f'(x)}$. Fixed points are a standard tool in numerical and abstract mathematics. We will encounter fixed points again in Theorems 17.64, 17.65, and 22.6.

Theorem 13.10 says that if Newton's method converges, the limit is a zero of the function. It does not guarantee that the sequence from Newton's method converges. However, there are mild conditions that guarantee convergence.

Lemma 13.11 Let $q\in[0,1)$, let $F:(a,b)\to\mathbb{R}$ be a differentiable function so that $|F'(x)|<q$ for all $x\in(a,b)$ and let $p\in(a,b)$ be so that $F(p)=p$ ($p$ is also called a fixed point of $F$). Then $p$ is the only fixed point of $F$ in $(a,b)$ and for all $x_0\in(a,b)$ the sequence generated by the recursive equation $x_{n+1}:=F(x_n)$ converges to $p$.

Proof. To prove that there is no $\tilde{p}\in(a,b)\setminus\{p\}$ with $\tilde{p}=F(\tilde{p})$ note that if $\tilde{p}=F(\tilde{p})$, then by the Mean Value Theorem there would be a $c$ between $\tilde{p}$ and $p$ so that $|\tilde{p}-p|=|F(\tilde{p})-F(p)|=|F'(c)|\,|\tilde{p}-p|<|\tilde{p}-p|$, which is a contradiction. Hence, $p$ is the only fixed point of $F$ in $(a,b)$.

Now let $x_0\in(a,b)$. Because the powers $q^n$ converge to zero, the result is proved if we can establish the inequality $|x_n-p|\le q^n|x_0-p|$ for all $n\in\mathbb{N}$. The inequality is proved by induction on $n$. For the base step $n=1$ note that by the Mean Value Theorem between $x_0$ and $p$ there must be a $c$ so that $|x_1-p|=|F(x_0)-F(p)|=|F'(c)|\,|x_0-p|\le q|x_0-p|$. For the induction step note that there must be a $c$ between $x_n$ and $p$ so that

$$|x_{n+1}-p|=|F(x_n)-F(p)|=|F'(c)|\,|x_n-p|\le q\big(q^n|x_0-p|\big)=q^{n+1}|x_0-p|.\ ∎$$

Theorem 13.12 Let $f:(a,b)\to\mathbb{R}$ be twice continuously differentiable. Then for each zero $z$ of $f$ in $(a,b)$ with $f'(z)\ne 0$ there is a $\delta_z>0$ so that for every starting point $x_0\in(z-\delta_z,z+\delta_z)$ the sequence generated by Newton's method converges to $z$.

Proof. We apply Lemma 13.11 to $F(x):=x-\frac{f(x)}{f'(x)}$. First note that

$$F'(x)=1-\frac{\big(f'(x)\big)^2-f(x)f''(x)}{\big(f'(x)\big)^2}=\frac{f(x)f''(x)}{\big(f'(x)\big)^2}.$$

In particular this means that $F'(z)=0$, and hence there is a $\delta_z>0$ so that $|F'(x)|<\frac{1}{2}<1$ for all $x\in(z-\delta_z,z+\delta_z)$. But then by Lemma 13.11 every sequence generated by Newton's method started at any $x_0\in(z-\delta_z,z+\delta_z)$ converges to $z$. ∎

From Theorem 13.12 and its proof we infer that the closer $x_0$ is to a zero of $f$, the faster Newton's method will converge. But when Newton's method converges, the numbers $x_n$ will get ever closer to a zero of $f$. Hence, as Newton's method is executed, the speed of convergence should accelerate. Theorem 13.14 below makes this statement more precise.

Lemma 13.13 Let $f:(a,b)\to\mathbb{R}$ be continuously differentiable and let $\gamma>0$ be so that for all $x,z\in(a,b)$ we have that $|f'(z)-f'(x)|\le\gamma|z-x|$. Then for all $x,z\in(a,b)$ the inequality $|f(z)-f(x)-f'(x)(z-x)|\le\frac{\gamma}{2}|z-x|^2$ holds.

Proof. Without loss of generality assume that $x<z$. Then

$$|f(z)-f(x)-f'(x)(z-x)|=\left|\int_x^z f'(t)-f'(x)\,dt\right|\le\int_x^z|f'(t)-f'(x)|\,dt\le\int_x^z\gamma(t-x)\,dt=\frac{\gamma}{2}(t-x)^2\Big|_x^z=\frac{\gamma}{2}(z-x)^2.\ ∎$$

Theorem 13.14 Let $f:(a,b)\to\mathbb{R}$ be a continuously differentiable function so that $f'(x)\ne 0$ for all $x\in(a,b)$. Assume there are $x_0\in(a,b)$ and $\alpha,\beta,\gamma>0$ so that $\left|\frac{f(x_0)}{f'(x_0)}\right|\le\alpha$, so that for all $x\in(a,b)$ we have $\left|\frac{1}{f'(x)}\right|\le\beta$, so that for all $x,z\in(a,b)$ we have $|f'(z)-f'(x)|\le\gamma|z-x|$, so that $h:=\frac{\alpha\beta\gamma}{2}<1$ and so that with $r:=\frac{\alpha}{1-h}$ we have $[x_0-r,x_0+r]\subseteq(a,b)$. Then

1. Each recursively defined point $x_{n+1}:=x_n-\frac{f(x_n)}{f'(x_n)}$ is in $(x_0-r,x_0+r)$.
2. The sequence $\{x_n\}_{n=0}^{\infty}$ converges to a point $u\in[x_0-r,x_0+r]$ with $f(u)=0$.
3. For all $n\ge 0$ we have $|u-x_n|\le\alpha\frac{h^{2^n-1}}{1-h^{2^n}}$.

Proof. We first prove by induction that for all $n\in\mathbb{N}$ the point $x_n$ is well defined, $|x_n-x_{n-1}|\le\alpha h^{2^{n-1}-1}$, and $|x_n-x_0|<r$. For $n=1$ the above is trivial. For the induction step $n\to n+1$, first note that because $|x_n-x_0|<r$ the number $x_{n+1}$ is well defined.
The definition of $x_n$ implies that $f(x_{n-1})+f'(x_{n-1})(x_n-x_{n-1})=0$, which, together with Lemma 13.13, implies

$$|x_{n+1}-x_n|=\left|\frac{f(x_n)}{f'(x_n)}\right|\le\beta\,\big|f(x_n)-f(x_{n-1})-f'(x_{n-1})(x_n-x_{n-1})\big|\le\frac{\beta\gamma}{2}|x_n-x_{n-1}|^2\le\frac{\beta\gamma}{2}\alpha^2h^{2^n-2}=\alpha h^{2^n-1}.$$

But then, using a telescoping sum, we obtain the following.

$$|x_{n+1}-x_0|\le\sum_{k=1}^{n+1}|x_k-x_{k-1}|\le\alpha\sum_{k=1}^{n+1}h^{2^{k-1}-1}<\alpha\sum_{j=0}^{\infty}h^j=\frac{\alpha}{1-h}=r,$$

which finishes the induction.

To prove that $\{x_n\}_{n=0}^{\infty}$ converges, let $m\ge n$. Then

$$|x_m-x_n|\le\sum_{k=n+1}^{m}|x_k-x_{k-1}|\le\alpha\sum_{k=n+1}^{m}h^{2^{k-1}-1}=\alpha\sum_{j=0}^{m-n-1}h^{2^n-1}h^{2^n(2^j-1)}\le\alpha h^{2^n-1}\sum_{j=0}^{\infty}\big(h^{2^n}\big)^j=\alpha\frac{h^{2^n-1}}{1-h^{2^n}}.$$

Because $0<h<1$ the bound $\alpha\frac{h^{2^n-1}}{1-h^{2^n}}$ goes to zero as $n$ goes to infinity, so $\{x_n\}_{n=0}^{\infty}$ is a Cauchy sequence. (Mentally fill in the argument.) In particular, $\{x_n\}_{n=0}^{\infty}$ converges to a number $u$, which by Theorem 13.10 must satisfy $f(u)=0$. Letting $m$ go to infinity in the above estimate also shows that $|u-x_n|\le\alpha\frac{h^{2^n-1}}{1-h^{2^n}}$, which finishes the proof. ∎

Theorem 13.14 shows that once Newton's method is "close enough" to a zero of a function, it converges quite rapidly. Indeed, near a zero $u$ of a continuously differentiable function $f$, the hypotheses of Theorem 13.14 can be satisfied if $f'(u)\ne 0$, because we can make $\alpha$ small by starting near $u$. For comparison with the bisection method, suppose, for argument's sake, $\alpha=\beta=\gamma=1$. Then $h=\frac{1}{2}$ and $|u-x_n|\le\frac{(1/2)^{2^n-1}}{1-(1/2)^{2^n}}=\frac{2}{2^{2^n}-1}$, which first drops below $10^{-15}$ at $n=6$. So we only need to execute Newton's method six times to obtain an approximation that gives the first 15 digits behind the decimal point of $u$.

Another nice feature of Theorem 13.14 is that it can be generalized to several variables (see Exercise 17-44). Finally, note that even though Newton's method is only applicable to differentiable functions, Exercise 13-13 shows that it can be modified to provide a method that is applicable to all functions.

Exercises

13-10. Let $f$ be a continuously differentiable function. Prove that if the sequence generated by Newton's method converges to a limit $L$ in the domain of $f$ and $f'(L)=0$, then $f(L)=0$.

13-11. Let $q>1$, let $F:(a,b)\to\mathbb{R}$ be a differentiable function so that $|F'(x)|>q$ for all $x\in(a,b)$ and let $p\in(a,b)$ be so that $F(p)=p$.
Prove that $p$ is the only fixed point of $F$ in $(a,b)$ and that for all $x_0\in(a,b)\setminus\{p\}$ the sequence generated by the recursive equation $x_{n+1}:=F(x_n)$ terminates after finitely many steps with a value that is not in $(a,b)$.

13-12. Let $f:(a,b)\to\mathbb{R}$ be twice differentiable.
(a) Let $z\in(a,b)$ be so that $f(z)=0$, $f'(x)>0$ for all $x\in(z,b)$ (that is, $f$ is increasing on $(z,b)$), and so that $f''(x)>0$ for all $x\in(z,b)$ (that is, $f$ is concave up on $(z,b)$). Prove that the sequence generated by Newton's method started at any $x_0\in(z,b)$ converges to $z$. Hint. Prove that $z\le x_{n+1}\le x_n$ for all $n\in\mathbb{N}$. Note. Figure 27 provides a geometric visualization of the claim in this exercise. Second note. In particular, it is allowed that $f'(z)=0$ in this exercise.
(b) Prove that if $f''$ is continuous, then for each $z\in(a,b)$ with $f(z)=f'(z)=0$ and $f''(z)\ne 0$ there is a $\delta_z>0$ so that for every starting point $x_0\in(z-\delta_z,z+\delta_z)$ the sequence generated by Newton's method converges to $z$.

13-13. Let $f$ be a function, let $x_0,x_1\in\mathbb{R}$ with $f(x_0)\ne f(x_1)$ and consider the recursively defined sequence $x_{n+1}:=x_n-f(x_n)\frac{x_n-x_{n-1}}{f(x_n)-f(x_{n-1})}$.
(a) Show that the recursive formula is obtained by taking the equation of the secant line of $f$ through $(x_{n-1},f(x_{n-1}))$ and $(x_n,f(x_n))$ and computing its unique zero.
(b) Prove that if the sequence generated with this method converges to $L$ and $\frac{f(z)-f(x)}{z-x}$ is bounded for $x,z$ near $L$, then $f(L)=0$.
(c) Prove that if $z$ is a zero of $f$, $z<x_1<x_0$, $f$ is twice differentiable on $(z-\delta,x_0+\delta)$ for some $\delta>0$, and $f$ is increasing and concave up on $(z-\delta,x_0+\delta)$ (that is, $f''(x)>0$ for all $x\in(z-\delta,x_0+\delta)$), then the sequence generated by this method converges to $z$. Hint. Prove that $z\le x_{n+1}\le x_n$ for all $n\in\mathbb{N}$.
(d) Prove that in a situation as in part 13-13c the sequence $\{x_n\}_{n=0}^{\infty}$ converges at least as fast to $z$ as the sequence $\{x_n^N\}_{n=0}^{\infty}$ generated with Newton's method and $x_0^N=x_0$. Hint. Prove that $z\le x_n\le x_n^N$ for all $n$.

13-14. Explain why $x_0=1$ is not a useful starting point for finding the zeroes of $f(x)=x^3-3x+4$ with Newton's method.

13-15. Explain why $x_0=1$ is not a useful starting point for using Newton's method to find the zeroes of the function $f(x)=x^6-4x^2-x+1$. Sketch a rough graph of $f$ and of the tangent lines used to compute $x_1$ and $x_2$ to illustrate your point.

13-16. For the function $f(x)=x^3-5x-5$, execute Newton's method started with $x_0=-2$. Use a calculator or a computer.
(a) Find $x_1,x_2,x_3,x_4$.
(b) Find the first $n$ such that your computer shows $x_n=x_{n+1}$. Explain why $n$ is so large.

13-17. Apply Newton's method to $f(x)=6x^4-18x^2-6x+1$ with $x_0=-1$. Explain why the limit is not the zero of $f$ that is closest to the starting point.

13-18. The limit $L$ computed with Newton's method does not always give $f(L)=0$.
(a) Let $f(x)=(10^{100}+10^{50})\,x$ … Apply Newton's method with $x_0=1$. Use a computer and call the apparent limit $L$.
(b) Compute $f(L)$. Is $f(L)\approx 0$?
(c) Compute the zero of $f$ exactly and compare it to $L$. Is $L$ close to the zero?
(d) Explain why Newton's method cannot produce a better approximation to the actual zero of $f$.

13-19. Square roots.
(a) Use Newton's method and the fact that $\sqrt{a}$ is the solution of the equation $x^2-a=0$ to devise a recursive method to approximate square roots. Note. This is one algorithm that is used in computers to approximate square roots.
(b) Prove that for any $x_0>0$ the sequence generated in part 13-19a converges to $\sqrt{a}$ and for any $x_0<0$ the sequence generated in part 13-19a converges to $-\sqrt{a}$.

13.3 Numerical Integration

We conclude our introduction to numerics with numerical integration. Although the Darboux and Lebesgue integrals are useful, even essential, for the development of integration theory, the Riemann integral and the Fundamental Theorem of Calculus are the best tools for numerical considerations.
The key to numerical integration is to replace the function we want to integrate with a function that is close to it and which can easily be integrated. The simplest geometric idea is to choose points on the function and let a polynomial go through these points. This section will focus solely on this idea. For other approaches to numerical integration, consider [28].

Definition 13.15 Let $x_0,\ldots,x_n\in\mathbb{R}$ with $x_i\ne x_k$ for $i\ne k$ and define the polynomial

$$L_i(x):=\prod_{\substack{k=0\\k\ne i}}^{n}\frac{x-x_k}{x_i-x_k}.$$

Then $L_i$ is called a Lagrange polynomial.

Theorem 13.16 Lagrange's Interpolation Formula. Let $n\in\mathbb{N}$, $x_0,\ldots,x_n\in\mathbb{R}$ with $x_i\ne x_k$ for $i\ne k$, and let $f_0,\ldots,f_n\in\mathbb{R}$. Then the polynomial defined by $P_n(x):=\sum_{i=0}^{n}f_iL_i(x)$ is the unique polynomial of degree $\le n$ so that for all $i=0,\ldots,n$ we have $P_n(x_i)=f_i$.

Proof. The equation follows easily from the fact that for all $j\in\{0,\ldots,n\}$ we have $L_i(x_j)=\begin{cases}0,&\text{if }j\ne i,\\1,&\text{if }j=i.\end{cases}$ Regarding uniqueness, suppose that $Q$ was another polynomial of degree $\le n$ with $Q(x_j)=f_j$ for all $j=0,\ldots,n$. Then $P-Q$ is a polynomial of degree $\le n$ with $n+1$ zeroes. By the Fundamental Theorem of Algebra (see Exercise 16-74 for a proof), this means that $P-Q=0$ or $P=Q$. ∎

With the polynomial $P_n$ going through the points $(x_i,f_i)$ the most natural way to approximate a function $f$ is to set $f_i=f(x_i)$. We first note that if we start with a polynomial $f$ of sufficiently low degree, then we simply get $f$ back.

Corollary 13.17 Let $f$ be a polynomial of degree $\le n$ and let $P_n$ be a polynomial computed as in Theorem 13.16 with numbers $f_i=f(x_i)$ for distinct numbers $x_i\in\mathbb{R}$, $i=0,\ldots,n$. Then $P_n(x)=f(x)$ for all $x\in\mathbb{R}$.

Proof. Both $P_n$ and $f$ are polynomials of degree $\le n$ that go through the points $(x_i,f(x_i))$. The equality follows from the uniqueness of $P_n$. ∎

To obtain a convenient integration formula, we need to express the integral of $P_n$ in terms of the $f_i=f(x_i)$.
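The following Python sketch evaluates the Lagrange polynomials of Definition 13.15 and the interpolation polynomial $P_n$ of Theorem 13.16 directly from their product and sum formulas. The function names `lagrange_basis` and `interpolate` and the sample cubic are illustrative assumptions, not part of the text.

```python
def lagrange_basis(xs, i, x):
    """Evaluate L_i(x) = prod over k != i of (x - x_k) / (x_i - x_k)."""
    value = 1.0
    for k, xk in enumerate(xs):
        if k != i:
            value *= (x - xk) / (xs[i] - xk)
    return value


def interpolate(xs, fs, x):
    """Evaluate P_n(x) = sum over i of f_i * L_i(x)."""
    return sum(fi * lagrange_basis(xs, i, x) for i, fi in enumerate(fs))


# Hypothetical sample data: a cubic sampled at four distinct nodes (n = 3).
def cubic(x):
    return x**3 - 2.0 * x + 1.0


nodes = [0.0, 1.0, 2.0, 3.0]
values = [cubic(x) for x in nodes]
```

By Corollary 13.17, `interpolate(nodes, values, x)` should agree with `cubic(x)` up to rounding for every `x`, because the sample cubic has degree at most $n=3$.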
Interestingly enough, as long as the $x_i$ are equidistant, the coefficients in the formula do not depend on the interval over which we integrate.

Proposition 13.18 Newton-Cotes formulas. Let $n\in\mathbb{N}$ and let $[c,d]$ be an interval. Define $h:=\frac{d-c}{n}$, for $i=0,\ldots,n$ let $x_i:=c+ih$, let $f_i\in\mathbb{R}$ and $\alpha_i:=\int_0^n\prod_{k=0,\,k\ne i}^{n}\frac{t-k}{i-k}\,dt$, and let $P_n$ be the polynomial from Theorem 13.16. Then

$$\int_c^d P_n(x)\,dx=h\sum_{i=0}^{n}f_i\alpha_i.$$

Proof. The key to the proof is the substitution $t:=\frac{x-c}{h}$. ∎

Figure 28: Visualization of the approximation of an integral over two subintervals with Riemann sums with evaluation at left endpoints (a), trapezoidal sums (b) and Simpson's Rule (c).

The most obvious way to approximate an integral is to approximate a function $f:[a,b]\to\mathbb{R}$ with a polynomial of sufficiently large degree and then use the appropriate Newton-Cotes formula. However, for larger $n$ (starting with $n=8$) some of the $\alpha_i$ become negative (see Exercise 13-20), which leads to problems with cancellations of digits. Moreover, the $\alpha_i$ are hard to compute for large $n$. Therefore an interval $[a,b]$ is usually first partitioned into shorter subintervals, the formula from Proposition 13.18 is applied on each subinterval and then the results are added. This type of integration formula is called a composite integration formula.

For example, for $n=1$, we obtain $\alpha_0=\frac{1}{2}$ and $\alpha_1=\frac{1}{2}$. If we now partition the interval $[a,b]$ into $N$ subintervals of length $\Delta x:=\frac{b-a}{N}$ and apply Proposition 13.18 on each subinterval we obtain the approximation

$$\int_a^b f(x)\,dx\approx\Delta x\left[\frac{f(a)}{2}+\sum_{k=1}^{N-1}f(a+k\Delta x)+\frac{f(b)}{2}\right],$$

which is called the trapezoidal rule, also shown in Figure 28(b). In the trapezoidal rule, we need to evaluate the function $N+1$ times.

The effort in numerical integration is proportional to the number of times the function $f$ is evaluated. To compare the performance of two numerical integration formulas, it is thus important that both formulas evaluate the function equally many times. For this reason, we will construct composite Newton-Cotes formulas so that $f$ is evaluated $N+1$ times.
For $n=2$, note that each additional subinterval on which the formula is applied has length $2\Delta x$ and requires two additional evaluations of the function. Thus for $n=2$ we must demand that $N$ is even. With $\alpha_0=\frac{1}{3}$, $\alpha_1=\frac{4}{3}$ and $\alpha_2=\frac{1}{3}$ we obtain the approximation formula

$$\int_a^b f(x)\,dx\approx\frac{\Delta x}{3}\left[f(a)+\sum_{k=1}^{N-1}\big(3+(-1)^{k+1}\big)f(a+k\Delta x)+f(b)\right],$$

which is called Simpson's Rule, also shown in Figure 28(c). In Exercise 13-21, the reader will state composite integration formulas based on Newton-Cotes formulas for $n=3,\ldots,6$.

We now turn to the error analysis once more. Peano's error representation (Theorem 13.19) shows that a more "abstract" point of view can have benefits for concrete tasks, which makes it a perfect conclusion for Part I of this text and a lead-in for the more abstract Part II. Consider that the integral as well as the Newton-Cotes formulas are functions themselves. They map functions on an interval to real numbers. Moreover, the integral as well as the Newton-Cotes formulas are linear. That is, sums and constant factors can be moved through the integral (see Theorems 5.8 and 9.25) and also through the Newton-Cotes formulas (easy computation). Linear functions on vector spaces (like vector spaces of integrable functions) will be important in abstract analysis (see Chapter 17).

Because we adopt this more abstract point of view, the error representation actually is valid for any numerical approximation of integrals that is linear, that gives exact results for low-degree polynomials and that can be "moved into the integral" as described in the hypothesis of Theorem 13.19 (see [28] for examples beyond the Newton-Cotes formulas). By Corollary 13.17 the $n$th Newton-Cotes formula is exact for polynomials up to degree $n$. Because it only evaluates the function at select points $x_k$, multiplies the values with numbers, and adds the results, it is linear and it can be moved into the integral. Hence, Peano's error representation applies to the $n$th Newton-Cotes formula.
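The composite trapezoidal rule and Simpson's Rule described above can be sketched in Python as follows. The function names and the test integrand are illustrative assumptions; the weights are the coefficients for $n=1$ and $n=2$ stated in the text.

```python
import math


def trapezoidal(f, a, b, N):
    """Composite trapezoidal rule with N subintervals (N + 1 evaluations)."""
    dx = (b - a) / N
    total = 0.5 * (f(a) + f(b))
    for k in range(1, N):
        total += f(a + k * dx)
    return dx * total


def simpson(f, a, b, N):
    """Composite Simpson's Rule with N subintervals; N must be even."""
    if N % 2 != 0:
        raise ValueError("Simpson's Rule needs an even number of subintervals")
    dx = (b - a) / N
    total = f(a) + f(b)
    for k in range(1, N):
        # Interior weights alternate 4, 2, 4, ..., 4.
        total += (4.0 if k % 2 == 1 else 2.0) * f(a + k * dx)
    return dx / 3.0 * total


# Both rules use N + 1 = 11 evaluations of the sample integrand sin on [0, pi].
trap_error = abs(trapezoidal(math.sin, 0.0, math.pi, 10) - 2.0)
simp_error = abs(simpson(math.sin, 0.0, math.pi, 10) - 2.0)
```

With the same number of evaluations, Simpson's Rule should come out noticeably closer to the exact value $\int_0^\pi\sin(x)\,dx=2$ than the trapezoidal rule, which is the kind of comparison the equal-evaluation convention is meant to make fair.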
Although the hypotheses for Peano's error representation look complicated, the proof shows that they are just what is needed to get an estimate.

Finally, Peano's error representation also shows that we need to establish more abstract results to fully justify more concrete results, like error estimates. In the proof of Peano's error representation, we work with double integrals and we reverse their order. Formally, we have not proved yet that this is possible. Fubini's Theorem (see Theorem 14.66) will show that this reversal is indeed allowed. For the specific case of double Lebesgue integrals, Fubini's Theorem is stated in Exercise 16-80. Because Theorem 13.19 is not used to prove these results, we will reverse the order of integration in the proof of Theorem 13.19, anticipating that this can be justified.

Theorem 13.19 Peano's error representation. Let $n\in\mathbb{N}$, $c<d$ and let $F(\cdot)$ be an integration formula which is linear, that is, $F(\alpha f+\beta g)=\alpha F(f)+\beta F(g)$ for all $\alpha,\beta\in\mathbb{R}$ and $f,g:[c,d]\to\mathbb{R}$, which for polynomials $p$ of degree at most $n$ gives $F(p)=\int_c^d p(x)\,dx$, and which for all continuous functions $g:[c,d]\to\mathbb{R}$ satisfies

$$F\left(\int_c^d g(t)(x-t)^n\mathbf{1}_{x-t\ge 0}\,dt\right)=\int_c^d g(t)\,F\big((x-t)^n\mathbf{1}_{x-t\ge 0}\big)\,dt.$$

For every Riemann integrable function $f:[c,d]\to\mathbb{R}$, let $R(f):=F(f)-\int_c^d f(x)\,dx$ be the error of the numerical integration and let $K(t):=\frac{1}{n!}R\big((\cdot-t)^n\mathbf{1}_{\cdot-t\ge 0}\big)$. Then for every function $f$ that for some $\delta>0$ is $n+1$ times continuously differentiable on the interval $(c-\delta,d+\delta)$ we have $R(f)=\int_c^d f^{(n+1)}(t)K(t)\,dt$. The function $K$ is also called the Peano kernel.

Proof. By Taylor's Formula, we obtain the following for all $x\in[c,d]$.

$$\begin{aligned}
f(x)&=T_n(x)+(x-c)^{n+1}\int_0^1 f^{(n+1)}\big(c+u(x-c)\big)\frac{1}{n!}(1-u)^n\,du\\
&=T_n(x)+\int_c^x f^{(n+1)}(t)\frac{1}{n!}(x-t)^n\,dt\\
&=T_n(x)+\int_c^d f^{(n+1)}(t)\frac{1}{n!}(x-t)^n\mathbf{1}_{x-t\ge 0}\,dt,
\end{aligned}$$

where $T_n(x)$ is the $n$th Taylor polynomial of $f$ at $c$.
Because the integration formula is linear, we infer for all functions $g,h$ that

$$R(g+h)=F(g+h)-\int_c^d g+h\,dx=F(g)-\int_c^d g\,dx+F(h)-\int_c^d h\,dx=R(g)+R(h).$$

Now because the integration formula gives exact results for polynomials of degree $\le n$ we know that $R(T_n)=0$. Because the integral in Taylor's formula above is a function of $x$ we obtain the following.

$$\begin{aligned}
R(f)&=R(T_n)+R\left(\int_c^d f^{(n+1)}(t)\frac{1}{n!}(x-t)^n\mathbf{1}_{x-t\ge 0}\,dt\right)\\
&=F\left(\int_c^d f^{(n+1)}(t)\frac{1}{n!}(x-t)^n\mathbf{1}_{x-t\ge 0}\,dt\right)-\int_c^d\int_c^d f^{(n+1)}(t)\frac{1}{n!}(x-t)^n\mathbf{1}_{x-t\ge 0}\,dt\,dx\\
&=\int_c^d f^{(n+1)}(t)\frac{1}{n!}F\big((x-t)^n\mathbf{1}_{x-t\ge 0}\big)\,dt-\int_c^d f^{(n+1)}(t)\frac{1}{n!}\int_c^d(x-t)^n\mathbf{1}_{x-t\ge 0}\,dx\,dt\\
&=\int_c^d f^{(n+1)}(t)\frac{1}{n!}\left[F\big((x-t)^n\mathbf{1}_{x-t\ge 0}\big)-\int_c^d(x-t)^n\mathbf{1}_{x-t\ge 0}\,dx\right]dt\\
&=\int_c^d f^{(n+1)}(t)K(t)\,dt.\ ∎
\end{aligned}$$

Although Peano's error representation is very versatile, we need something more concrete for applications. One key weakness of Peano's error representation is that it depends in a rather complicated way on the interval of integration. The next result makes this dependency more manageable for the Newton-Cotes formulas by reducing the error to a power of half the length of the interval, times a constant, times a value of an appropriate derivative.

Theorem 13.20 Let $n\in\mathbb{N}$ and let $f:[c,d]\to\mathbb{R}$ be $n+1$ times continuously differentiable if $n$ is odd and $n+2$ times continuously differentiable if $n$ is even. Let $R_n^{[c,d]}(f)$ be the error when approximating the integral $\int_c^d f(t)\,dt$ with the $n$th Newton-Cotes formula $F_n(f)$, that is, $R_n^{[c,d]}(f)=F_n(f)-\int_c^d f(t)\,dt$. For odd $n$, there is a $\xi\in(c,d)$ so that

$$R_n^{[c,d]}(f)=\left(\frac{d-c}{2}\right)^{n+2}R_n^{[-1,1]}\big(x^{n+1}\big)\frac{f^{(n+1)}(\xi)}{(n+1)!}.$$

For even $n$, there is a $\xi\in(c,d)$ so that

$$R_n^{[c,d]}(f)=\left(\frac{d-c}{2}\right)^{n+3}R_n^{[-1,1]}\big(x^{n+2}\big)\frac{f^{(n+2)}(\xi)}{(n+2)!}.$$

Proof. For odd $n$, by its definition the $n$th Newton-Cotes formula gives exact results for polynomials of degree $n$. For even $n$, the $n$th Newton-Cotes formula even gives exact results for polynomials of degree $n+1$ (see Exercise 13-22). Because the proofs for even and odd $n$ are very similar, we will assume throughout that $n$ is fixed and odd. The idea for the proof is a substitution that changes the domain of the integral from $[c,d]$ to $[-1,1]$. For all real numbers $c<d$, let $K_n^{[c,d]}$ denote the Peano kernel for the $n$th Newton-Cotes formula for the interval $[c,d]$.
This Peano kernel can be reduced to the Peano kernel $K_n^{[-1,1]}$ on $[-1,1]$ as follows. A simple substitution (similar to the computation below, see Exercise 13-23) shows that $K_n^{[c,d]}\left(t+\frac{c+d}{2}\right)=K_n^{\left[-\frac{d-c}{2},\frac{d-c}{2}\right]}(t)$ for all $t\in\left[-\frac{d-c}{2},\frac{d-c}{2}\right]$. Moreover, for all $x\in[-1,1]$ we have

$$\begin{aligned}
n!\,K_n^{\left[-\frac{d-c}{2},\frac{d-c}{2}\right]}\left(\tfrac{d-c}{2}x\right)
&=F_n^{\left[-\frac{d-c}{2},\frac{d-c}{2}\right]}\left(\left(\cdot-\tfrac{d-c}{2}x\right)^n\mathbf{1}_{\cdot-\frac{d-c}{2}x\ge 0}\right)-\int_{-\frac{d-c}{2}}^{\frac{d-c}{2}}\left(u-\tfrac{d-c}{2}x\right)^n\mathbf{1}_{u-\frac{d-c}{2}x\ge 0}\,du\\
&=\left(\frac{d-c}{2}\right)^{n+1}\left[F_n^{[-1,1]}\big((\cdot-x)^n\mathbf{1}_{\cdot-x\ge 0}\big)-\int_{-1}^{1}(u-x)^n\mathbf{1}_{u-x\ge 0}\,du\right]\\
&=\left(\frac{d-c}{2}\right)^{n+1}n!\,K_n^{[-1,1]}(x).
\end{aligned}$$

To complete the error estimate, we need to use that the Peano kernels $K_n^{[c,d]}(t)$ for the Newton-Cotes formulas on $[c,d]$ are of constant sign. The proof of this fact is geometrically obvious for $n=1$ (see Exercise 13-24b) and still manageable for $n=2$ (see Exercise 13-24c), but the algebra becomes increasingly tedious. Because we will focus only on $n=1$ and $n=2$ (trapezoidal and Simpson's Rule) in the examples, the presentation remains (somewhat) self-contained. For a general argument that also holds for other integration formulas, consider [10], which uses matrix methods to prove that the Peano kernel does not change its sign.

Because the Peano kernel for the Newton-Cotes formulas does not change sign, by the Mean Value Theorem for the Integral as in Exercise 8-21 there is a $\xi\in(c,d)$ so that

$$\begin{aligned}
R_n^{[c,d]}(f)&=\int_c^d f^{(n+1)}(t)K_n^{[c,d]}(t)\,dt=f^{(n+1)}(\xi)\int_c^d K_n^{[c,d]}(t)\,dt\\
&=f^{(n+1)}(\xi)\int_{-\frac{d-c}{2}}^{\frac{d-c}{2}}K_n^{[c,d]}\left(u+\frac{c+d}{2}\right)du
=f^{(n+1)}(\xi)\left(\frac{d-c}{2}\right)^{n+2}\int_{-1}^{1}K_n^{[-1,1]}(x)\,dx\\
&=\left(\frac{d-c}{2}\right)^{n+2}R_n^{[-1,1]}\big(x^{n+1}\big)\frac{f^{(n+1)}(\xi)}{(n+1)!}.\ ∎
\end{aligned}$$

Note that the proof of Theorem 13.20 (and hence its conclusion) will work for any linear integration formula which can be moved into the integral, which gives exact results for polynomials up to degree $n$ and for which the Peano kernel does not change sign. The reader will verify this for the Midpoint Rule in Exercise 13-25.

With Theorem 13.20 providing a bound for the error for individual Newton-Cotes formulas, bounds for the error for composite Newton-Cotes formulas are now an easy consequence.

Corollary 13.21 Let $n\in\mathbb{N}$, $a<b$, $\delta>0$ and let $f:(a-\delta,b+\delta)\to\mathbb{R}$ be a function.
If $n$ is odd let $f$ be $n+1$ times continuously differentiable and let $M:=\max\left\{\left|f^{(n+1)}(x)\right|:x\in[a,b]\right\}$. If $n$ is even let $f$ be $n+2$ times continuously differentiable and let $M:=\max\left\{\left|f^{(n+2)}(x)\right|:x\in[a,b]\right\}$. Let $C_n^{[a,b]}(f)$ be the absolute value of the error when approximating the integral $\int_a^b f(t)\,dt$ with the $n$th composite Newton-Cotes formula with $N=nj$ intervals, that is, $N+1=nj+1$ evaluations, where $j\in\mathbb{N}$. Then if $n$ is odd we have

$$C_n^{[a,b]}(f)\le M\,\frac{(b-a)^{n+2}}{N^{n+1}}\,\frac{n^{n+1}}{(n+1)!\,2^{n+2}}\left|R_n^{[-1,1]}\big(x^{n+1}\big)\right|,$$

and if $n$ is even we have

$$C_n^{[a,b]}(f)\le M\,\frac{(b-a)^{n+3}}{N^{n+2}}\,\frac{n^{n+2}}{(n+2)!\,2^{n+3}}\left|R_n^{[-1,1]}\big(x^{n+2}\big)\right|.$$

Proof. If $n$ is odd, by Theorem 13.20 we obtain

$$C_n^{[a,b]}(f)\le\sum_{i=1}^{j}\left|R_n^{\left[a+(i-1)\frac{n(b-a)}{N},\,a+i\frac{n(b-a)}{N}\right]}(f)\right|\le j\left(\frac{n(b-a)}{2N}\right)^{n+2}\frac{M}{(n+1)!}\left|R_n^{[-1,1]}\big(x^{n+1}\big)\right|=M\,\frac{(b-a)^{n+2}}{N^{n+1}}\,\frac{n^{n+1}}{(n+1)!\,2^{n+2}}\left|R_n^{[-1,1]}\big(x^{n+1}\big)\right|,$$

and a similar computation gives the result for even $n$. ∎

Note that in the error bounds in Corollary 13.21 only $M$ depends on the function $f$, only $(b-a)$ depends on the interval and only $N$ depends on the number of evaluations. The remainder, although it looks complicated, is merely a constant. Therefore, as $N\to\infty$ the numerical approximation of an integral of an $n+1$ times (if $n$ is odd) or $n+2$ times (if $n$ is even) continuously differentiable function with the $n$th composite Newton-Cotes formula will converge to the actual integral.

Of course, more concrete formulas will be better for given fixed $n$. For the trapezoidal rule and Simpson's Rule, we obtain the error bounds indicated below.

Corollary 13.22 Let $f:(a-\delta,b+\delta)\to\mathbb{R}$ be a function.

1. If $f$ is twice continuously differentiable, we have $C_1^{[a,b]}(f)\le K\frac{(b-a)^3}{12N^2}$, where $K=\max\{|f''(x)|:x\in[a,b]\}$. This is the error formula for the trapezoidal rule with $N$ intervals.
2. If $f$ is four times continuously differentiable, we have $C_2^{[a,b]}(f)\le C\frac{(b-a)^5}{180N^4}$, where $C=\max\left\{\left|f^{(iv)}(x)\right|:x\in[a,b]\right\}$. This is the error formula for Simpson's Rule with $N$ intervals.

Proof. For $n=1$, we obtain

$$R_1^{[-1,1]}\big(x^2\big)=2\left(\frac{1}{2}\cdot 1+\frac{1}{2}\cdot 1\right)-\int_{-1}^{1}x^2\,dx=2-\frac{2}{3}=\frac{4}{3},\qquad\text{so}\qquad K\,\frac{(b-a)^3}{N^2}\,\frac{1}{2!\,2^3}\cdot\frac{4}{3}=K\frac{(b-a)^3}{12N^2},$$

which proves $C_1^{[a,b]}(f)\le K\frac{(b-a)^3}{12N^2}$. The computation for Simpson's Rule is similar and left to the reader as Exercise 13-26. ∎
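The bounds in Corollary 13.22 can be solved for $N$. The Python sketch below does this under the assumption that the derivative bounds $K$ and $C$ are supplied by the caller; the function names are hypothetical, and the sample numbers ($K=1$, $b-a=2$, tolerance $10^{-4}$) are chosen only for illustration.

```python
import math


def trapezoid_intervals(K, a, b, tol):
    """Smallest N with K * (b - a)**3 / (12 * N**2) < tol."""
    bound = math.sqrt(K * (b - a) ** 3 / (12.0 * tol))
    # N must be strictly larger than the bound, so always round up.
    return math.floor(bound) + 1


def simpson_intervals(C, a, b, tol):
    """Smallest even N with C * (b - a)**5 / (180 * N**4) < tol."""
    bound = (C * (b - a) ** 5 / (180.0 * tol)) ** 0.25
    N = math.floor(bound) + 1
    # Simpson's Rule needs an even number of intervals.
    return N if N % 2 == 0 else N + 1


# Sample values (assumed for illustration): K = 1 on an interval of length 2.
n_trap = trapezoid_intervals(1.0, -1.0, 1.0, 1e-4)
```

With these sample numbers `n_trap` evaluates to 82, since $\sqrt{8/(12\cdot 10^{-4})}\approx 81.65$ has to be rounded up.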
If the function, the interval and $N$ are given, the a posteriori error estimate is simply a substitution into the formula (see Exercise 13-27). If the function, the interval and a desired accuracy are given, the a priori estimate is obtained by demanding that the error bound is less than the desired accuracy and solving for $N$.

Example 13.23 Find the number of intervals needed to estimate $\int_{-1}^{1}e^{-\frac{x^2}{2}}\,dx$ with the trapezoidal rule so that the error is less than $10^{-4}$.

First, note that with $f(x)=e^{-\frac{x^2}{2}}$ we have the derivatives $f'(x)=-xe^{-\frac{x^2}{2}}$ and $f''(x)=(x^2-1)e^{-\frac{x^2}{2}}$. The maximum of $|f''|$ is assumed at $x=0$, so $K=1$. Now $K\frac{(b-a)^3}{12N^2}<10^{-4}$ means $\frac{2^3}{12N^2}<10^{-4}$ or $N^2>\frac{2}{3}10^4$. Hence, for $N$ we obtain $N>100\sqrt{\frac{2}{3}}\approx 81.65$ and so $N=82$ would work. Note that in the last step, we always need to round up. □

Exercises

13-20. Compute the coefficients $\alpha_i$ in the Newton-Cotes formulas for $n=7$. Use a computer.

13-21. For each given $n$ compute $\alpha_0,\ldots,\alpha_n$ for the Newton-Cotes formula. Then state the corresponding composite integration formula for $N$ divisible by $n$. Finally, compute the error for the Newton-Cotes formula and for the composite integration formula. For the computation of the $\alpha_i$, a computer is recommended.
(a) $n=3$. This is called the $\frac{3}{8}$-rule.
(b) $n=4$. This is called Milne's Rule.
(c) $n=5$. This rule does not have a specific name.
(d) $n=6$. This is called Weddle's Rule.

13-22. Prove as follows that for even $n$ the $n$th Newton-Cotes formula gives accurate results for polynomials of degree $n+1$.
(a) Use the linearity property to prove that if the $n$th Newton-Cotes formula gives the exact integral for one polynomial of degree $n+1$, then it gives the exact integral for all of them.
(b) Prove that the coefficients $\alpha_i$ of the $n$th Newton-Cotes formula satisfy $\alpha_i=\alpha_{n-i}$ for all $i\in\{0,\ldots,n\}$.
(c) Prove that if $n$ is even, the $n$th Newton-Cotes formula gives the exact integral for the polynomial $p(x) = \left(x - \frac{c+d}{2}\right)^{n+1}$ and conclude that the $n$th Newton-Cotes formula gives the exact integral for all polynomials of degree $n+1$.
(d) To illustrate that the result does not work for odd $n$, prove that the first Newton-Cotes formula does not give the integral of $f(x) = x^2$ on $[-1,1]$.

13-23. Prove that for all $t \in$ […] we have $K_n^{[c,d]}\bigl(\,[…]\,\bigr) = […]\,K_n^{[-1,1]}(t)$.

13-24. The sign of the Peano kernel for the Newton-Cotes formulas.
(a) Explain why proving that the Peano kernel $K_n^{[-1,1]}$ for the $n$th Newton-Cotes formula on $[-1,1]$ does not change its sign on $[-1,1]$ proves that all Peano kernels $K_n^{[c,d]}$ for the $n$th Newton-Cotes formula on $[c,d]$ do not change signs on $[c,d]$.
(b) Prove that for $n = 1$ the Peano kernel $K_1^{[-1,1]}$ is nonnegative on $[-1,1]$.
Hint. Direct computation of the integrals, using that the approximating polynomial is a straight line.
(c) Prove that for $n = 2$ the Peano kernel $K_2^{[-1,1]}$ is nonnegative on $[-1,1]$.
Hint. This computation is more tedious. Make sure you use $(x-t)^3\,\mathbf{1}_{x-t\ge 0}$ and do separate computations for $t > 0$ and $t \le 0$.
(d) In a computer algebra system implement a short program that graphs the Peano kernel $K_n^{[-1,1]}$ on $[-1,1]$ for arbitrary $n$.
Note. While these graphs do not formally prove that the Peano kernel does not change its sign, the graphs and the implementation are instructive.

13-25. Let $f : [a,b] \to \mathbb{R}$ be twice continuously differentiable. Prove that when approximating $\int_a^b f(x)\,dx$ with the midpoint rule, which uses Riemann sums with evaluation in the middle of the interval to approximate the integral, the error is bounded by $K\,\frac{(b-a)^3}{24n^2}$, where $K = \max\{ |f''(x)| : a \le x \le b \}$.
Hint. Use Peano's representation of the error and use the fact that the midpoint rule is accurate for polynomials of degree $n \le 1$.
Then emulate the rest of the proof for the Newton-Cotes formulas, including the proof that the Peano kernel does not change its sign.

13-26. Prove part 2 of Corollary 13.22.

13-27. In each part, give an upper bound for the error of the approximation of the given integral with the given rule and the given number of intervals.
(a) $\int_{[…]}^{[…]} e^{-[…]}\,dx$, trapezoidal rule, $N = 20$
(b) $\int_{[…]}^{[…]} e^{-[…]}\,dx$, Simpson's Rule, $N = 20$
(c) $\int_{[…]}^{[…]} \sin(x^2)\,dx$, trapezoidal rule, $N = 50$
(d) $\int_{[…]}^{[…]} \sin(x^2)\,dx$, Simpson's Rule, $N = 50$

13-28. Compute the number $N$ of intervals needed to approximate the integral with the indicated rule so that the error is at most the given $\nu$.
(a) $\int_{[…]}^{[…]} e^{-[…]}\,dx$, Simpson's Rule, $\nu \le$ […]
(b) $\int_{[…]}^{[…]}$ […] $dx$, trapezoidal rule, $\nu \le 10^{-[…]}$
(c) $\int_{[…]}^{[…]} \ln(x)$ […] $dx$, Simpson's Rule, $\nu \le 10^{-[…]}$

13-29. Approximate the integral with the indicated rule so that the error is at most the given $\nu$.
Hint. Use the error bounds in Corollary 13.22 to compute the number of intervals. Then compute the requisite sum with a computer.
(a) (bj (c) ll Ll‘ L2 1 1 2/5;;e -2 2 d x , trapezoidal rule. u 5 lo-* 1 2 Z e - T dx,Simpson’s Rule, v 5 2 z sin (e) 1 e (:) -2 2 d x , Simpson’s Rule, u 5 lops dx,trapezoidal rule, u 5 10 (0 1‘ f i e X d x , Simpson’s Rule, v 5 I

13-30. Let the function $f : [a,b] \to \mathbb{R}$ be continuously differentiable. Use Euler's Summation Formula (see Exercise 12-21) and an appropriate substitution to prove that the approximations of the integral $\int_a^b f(x)\,dx$ with the trapezoidal rule with $N$ trapezoids converge to $\int_a^b f(x)\,dx$ as $N \to \infty$.

Part II
Analysis in Abstract Spaces

Chapter 14
Integration on Measure Spaces

Throughout this second part of the text, we need to integrate multivariable functions. Therefore we start our investigation of the more abstract realms of analysis with integration. Recall that the fundamental idea behind Lebesgue integration was the partition of the range (see Figure 22).
Moreover, when we worked with Lebesgue measure, we were concerned with properties stated in terms of sets. We rarely used the fact that these sets were subsets of the real numbers. Therefore integration of functions from a more general space to $[-\infty, \infty]$ should be similar to the theory developed in Chapter 9. Indeed, Sections 14.1-14.4 basically recast and sort the results of Chapter 9 to show how these ideas generalize to arbitrary measure spaces. In particular, this generalization makes it possible to talk about Lebesgue integration in $\mathbb{R}^d$. Sections 14.5 and 14.6 provide fundamental results on sequences of integrable functions. These results are the cornerstone for the proofs of many important facts about integrable functions. Finally, Sections 14.7 and 14.8 show how measures on products (such as $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$) are related to measures on the factors.

In this chapter, we experience the full power of abstraction for the first time. Once the abstract core of concrete results for the real numbers is identified, the familiar results from the real numbers can be established in much more general contexts, sometimes even with the same proof. It is important to realize, however, that this generalization comes at a price. We must very carefully check that we did not use any specific properties of the real line in the proof of the concrete result. The most important property of the real line that is lost in the general setting is the linear ordering. Proofs and results that depend on this ordering do not generalize easily. This is why the linear ordering of the real line was used sparingly in the first part of the text.

14.1 Measure Spaces

To be most widely applicable, abstract integration is defined on sets equipped with a structure that makes them a "measure space." In this fashion, we do not need to worry about details regarding the shape and dimension of the domain. The fundamental idea for integration in arbitrary spaces is the same as for Lebesgue integration.
Partition the range and approximate the "area" or "volume" under the function with "generalized rectangles" or "generalized boxes" whose bases are sets for which we can determine the "measure" (see Figure 29).

Figure 29: For integration in more complicated spaces than the real line, we retain the main idea from Lebesgue integration. We partition the range (here the $z$-axis) and measure the size of the inverse images $A_k$ of intervals in the range (the $z$-axis). The sum of these sizes times the corresponding heights should approximate the "volume" under the graph of the function.

It was noted after Definition 9.4 that even on the real line there are sets for which we cannot determine a sensible "measure" using outer Lebesgue measure. Thus, it is not surprising that on a general set $M$ we need to consider the subset of the power set $\mathcal{P}(M)$ that contains all subsets for which we can determine the "measure." The properties that define these subsets are directly inspired by properties of Lebesgue measurable sets (see Theorem 9.10).

Definition 14.1 Let $M$ be a set. A subset $\Sigma \subseteq \mathcal{P}(M)$ is called a sigma algebra or $\sigma$-algebra iff
1. $\emptyset \in \Sigma$,
2. If $S \in \Sigma$, then $S' \in \Sigma$,
3. If $A_n \in \Sigma$ for all $n \in \mathbb{N}$, then $\bigcup_{n=1}^{\infty} A_n \in \Sigma$.

Our most important examples of $\sigma$-algebras so far are the power set itself and the set of Lebesgue measurable subsets of $\mathbb{R}$.

Example 14.2
1. Let $M$ be a set. Then the power set $\mathcal{P}(M)$ of $M$ is a $\sigma$-algebra.
2. The set $\Sigma_\lambda$ of Lebesgue measurable subsets of $\mathbb{R}$ is a $\sigma$-algebra. □

Part 1 is trivial and part 2 is Theorem 9.10. More examples of $\sigma$-algebras are given in Exercise 14-1 and Theorem 14.25. Because $\sigma$-algebras are newly introduced entities, we need to prove that the properties we know from Lebesgue measurable sets also hold in this general context.

Proposition 14.3 Let $M$ be a set and let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra. If for every $n \in \mathbb{N}$ we have $A_n \in \Sigma$, then $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.

Proof.
For each $n \in \mathbb{N}$ by part 2 of Definition 14.1, we infer $M \setminus A_n \in \Sigma$. By part 3 of Definition 14.1, we obtain $\bigcup_{n=1}^{\infty} M \setminus A_n \in \Sigma$. By DeMorgan's Laws, this means that $M \setminus \bigcap_{n=1}^{\infty} A_n \in \Sigma$. Therefore by part 2 of Definition 14.1 $\bigcap_{n=1}^{\infty} A_n \in \Sigma$, which concludes the proof. ∎

Proposition 14.4 Let $M$ be a set, let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra, let $N \in \mathbb{N}$ and let $A_1, \ldots, A_N \in \Sigma$. Then $\bigcup_{n=1}^{N} A_n \in \Sigma$.

Proof. For $n \ge N+1$, let $A_n := \emptyset$. Then $\bigcup_{n=1}^{N} A_n = \bigcup_{n=1}^{\infty} A_n \in \Sigma$. ∎

Further properties of $\sigma$-algebras are presented in Exercise 14-2. Recall that for the definition of Lebesgue measurable functions we never referred to the measure itself. Thus for some purposes it will be sufficient to work with a set and its measurable subsets. This is the idea behind a measurable space.

Definition 14.5 A measurable space is a pair $(M, \Sigma)$ consisting of a set $M$ and a $\sigma$-algebra $\Sigma \subseteq \mathcal{P}(M)$. The sets in $\Sigma$ will also be called $\Sigma$-measurable sets.

Finally, a measure is a function that assigns each measurable set its "measure." The only conditions for a sensible "measure" function are that the empty set has no volume and that the volumes of pairwise disjoint sets are added to obtain the volume of their union.

Definition 14.6 Let $(M, \Sigma)$ be a measurable space. Then $\mu : \Sigma \to [0, \infty]$ is called a measure iff
1. $\mu(\emptyset) = 0$, and
2. $\mu$ is countably additive, that is, if $\{A_n\}_{n=1}^{\infty}$ is a sequence of pairwise disjoint sets $A_n \in \Sigma$, then $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n)$.

The definition of a measure space connects the "measurable sets" with a function that assigns the measure.

Definition 14.7 A measure space is a triple $(M, \Sigma, \mu)$ consisting of a set $M$, a sigma algebra $\Sigma \subseteq \mathcal{P}(M)$ and a measure $\mu : \Sigma \to [0, \infty]$.

Example 14.8 With $\lambda$ denoting outer Lebesgue measure and $\Sigma_\lambda$ denoting the sigma algebra of Lebesgue measurable sets, $(\mathbb{R}, \Sigma_\lambda, \lambda)$ is a measure space. We have already noted that $\Sigma_\lambda$ is a $\sigma$-algebra.
Trivially $\lambda(\emptyset) = 0$, and the countable additivity of Lebesgue measure is given by Theorem 9.11. □

Example 14.9 Let $M$ be a set. For $A \subseteq M$, we define the counting measure $\gamma_M(A)$ to be the number of elements of $A$ if $A$ is finite and $\infty$ if $A$ is infinite. Then $(M, \mathcal{P}(M), \gamma_M)$ is a measure space. The reader will prove this in Exercise 14-3. Note that counting measure will allow us to model absolutely convergent series as integrals in Examples 14.36 and 14.37 below. □

In Chapter 9, we defined Lebesgue integration over $\mathbb{R}$, but we never formally defined Lebesgue integration over subsets of $\mathbb{R}$ such as intervals. The formalism of measure spaces allows us to easily fill this gap. Every measurable subset of a measure space can be equipped with the structure of a measure space.

Example 14.10 Let $(M, \Sigma, \mu)$ be a measure space and let $\Omega \in \Sigma$ be measurable. Let $\Sigma^\Omega := \{ S \in \Sigma : S \subseteq \Omega \}$ and let $\mu_\Omega := \mu|_{\Sigma^\Omega}$. Then $(\Omega, \Sigma^\Omega, \mu_\Omega)$ is a measure space. The reader will prove this in Exercise 14-4. □

Because measure spaces are newly introduced, we must prove their properties "from scratch." Specifically, we cannot use familiar properties of Lebesgue measure without first proving them for measure spaces. Nonetheless, the next three propositions should be familiar from Lebesgue measure.

Proposition 14.11 Let $(M, \Sigma, \mu)$ be a measure space and let $A_1, \ldots, A_N$ be pairwise disjoint sets in $\Sigma$. Then $\mu\left(\bigcup_{n=1}^{N} A_n\right) = \sum_{n=1}^{N} \mu(A_n)$.

Proof. By Proposition 14.4, we have that $\bigcup_{n=1}^{N} A_n \in \Sigma$. For $n \ge N+1$, let $A_n := \emptyset$. Then countable additivity and $\mu(\emptyset) = 0$ give $\mu\left(\bigcup_{n=1}^{N} A_n\right) = \mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{N} \mu(A_n)$. ∎

Proposition 14.12 Let $(M, \Sigma, \mu)$ be a measure space and let $A, B \in \Sigma$ with $A \subseteq B$. Then $\mu(A) \le \mu(B)$.

Proof. By Exercise 14-2b we have $B \setminus A \in \Sigma$. Therefore $\mu(A) \le \mu(A) + \mu(B \setminus A) = \mu\bigl(A \cup (B \setminus A)\bigr) = \mu(B)$. ∎

Definition 14.13 Let $(M, \Sigma, \mu)$ be a measure space. Then $A \in \Sigma$ is called a set of measure zero or a null set iff $\mu(A) = 0$.
A property is said to hold almost everywhere in $M$ iff the subset of $M$ for which the property does not hold is of measure zero. Almost everywhere is also abbreviated as a.e.

Proposition 14.14 Let $(M, \Sigma, \mu)$ be a measure space and for all $n \in \mathbb{N}$ let the set $A_n \in \Sigma$ be a null set. Then $\bigcup_{n=1}^{\infty} A_n$ also is a null set.

Proof. By Proposition 14.12, subsets of null sets are null sets also. For $n \in \mathbb{N}$, define $B_n := A_n \setminus \bigcup_{j=1}^{n-1} A_j$. Then each $B_n$ is a null set and $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n$. But $\mu\left(\bigcup_{n=1}^{\infty} B_n\right) = \sum_{n=1}^{\infty} \mu(B_n) = 0$, which establishes the result. ∎

We conclude this section with a result that shows that the measure of the union of a nested sequence of sets is equal to the limit of the nondecreasing sequence of the measures of the sets.

Theorem 14.15 Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$ so that $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$. Then $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n\to\infty} \mu(A_n)$.

Proof. Mimic the proof of Theorem 9.12. (Exercise 14-5.) ∎

Exercises

14-1. Let $M$ be a set. Prove that the set of all subsets $S \subseteq M$ so that $S$ is countable or $M \setminus S$ is countable is a $\sigma$-algebra.

14-2. Let $M$ be a set and let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra.
(a) Prove that if $A_1, \ldots, A_N \in \Sigma$, then $\bigcap_{n=1}^{N} A_n \in \Sigma$.
(b) Prove that if $A, B \in \Sigma$, then $A \setminus B \in \Sigma$.

14-3. Counting measure. Let $M$ be a set and let $\gamma_M : \mathcal{P}(M) \to [0, \infty]$ be counting measure on $M$.
(a) Prove that $\gamma_M(A) = 0$ iff $A = \emptyset$.
(b) Prove that $\gamma_M$ is a measure.

14-4. Measures on subsets. Let $(M, \Sigma, \mu)$ be a measure space, let $\Omega \in \Sigma$, let $\Sigma^\Omega := \{ S \in \Sigma : S \subseteq \Omega \}$ and let $\mu_\Omega := \mu|_{\Sigma^\Omega}$.
(a) Prove that $\Sigma^\Omega$ is a $\sigma$-algebra.
(b) Prove that $\mu_\Omega$ is a measure.

14-5. Prove Theorem 14.15.

14-6. Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$. Prove the inequality […]

14-7. Let $M$ be a set. Prove that a subset $\Sigma \subseteq \mathcal{P}(M)$ is a $\sigma$-algebra iff $\emptyset \in \Sigma$; if $S \in \Sigma$, then $S' \in \Sigma$; and if $A_n \in \Sigma$ for all $n \in \mathbb{N}$, then $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.

14-8.
Let $(M, \Sigma, \mu)$ be a measure space and let $A, B \in \Sigma$. Prove $\mu(A) - \mu(B) = \mu(A \setminus B) - \mu(B \setminus A)$.

14-9. The measure of nested intersections.
(a) Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$ such that for all $n \in \mathbb{N}$ we have $A_n \supseteq A_{n+1}$ and such that for some $m \in \mathbb{N}$ we have $\mu(A_m) < \infty$. Prove […]
(b) Show that the condition $\mu(A_m) < \infty$ for some $m \in \mathbb{N}$ cannot be dropped.
Hint. Let $\gamma_{\mathbb{N}}$ be counting measure on $\mathbb{N}$ and let $A_n := \{ i \in \mathbb{N} : i \ge n \}$.

14-10. Let $(M, \Sigma, \mu)$ be a measure space.
(a) Prove that $\Sigma_\mu := \{ A \subseteq M : (\exists E, F \in \Sigma : E \subseteq A \subseteq F,\ \mu(F \setminus E) = 0) \}$ is a $\sigma$-algebra that contains $\Sigma$.
(b) Prove that for all $A \in \Sigma_\mu$ and all $E, F \in \Sigma$ with $E \subseteq A \subseteq F$ and $\mu(F \setminus E) = 0$ we have $\mu(E) = \mu(F)$.
(c) For all $A \in \Sigma_\mu$, let $E, F \in \Sigma$ with $E \subseteq A \subseteq F$ and $\mu(F \setminus E) = 0$ and define $\bar{\mu}(A) := \mu(F)$. Prove that $\bar{\mu} : \Sigma_\mu \to [0, \infty]$ is a measure. (You also need to show that $\bar{\mu}$ is well-defined.)
(d) Prove that for all $B \in \Sigma$ we have $\bar{\mu}(B) = \mu(B)$.
(e) Prove that the measure space $(M, \Sigma_\mu, \bar{\mu})$ is complete. That is, prove that if $N \in \Sigma_\mu$ is so that $\bar{\mu}(N) = 0$ and $S \subseteq N$, then $S \in \Sigma_\mu$.
The $\sigma$-algebra $\Sigma_\mu$ is also called the completion of $\Sigma$ and $\bar{\mu}$ is also called the completion of $\mu$.

14.2 Outer Measures

Although not all subsets of the real numbers are Lebesgue measurable, outer Lebesgue measure is defined for all subsets of $\mathbb{R}$. The idea of an outer measure can be transplanted to an abstract space. This section shows that outer measures produce a measure space similar to how outer Lebesgue measure produced a measure space. In particular, we will obtain a Lebesgue measure on $d$-dimensional space.

Definition 14.16 Let $M$ be a set. Then $\mu : \mathcal{P}(M) \to [0, \infty]$ is called an outer measure iff
1. $\mu(\emptyset) = 0$,
2. If $A \subseteq B$, then $\mu(A) \le \mu(B)$,
3. The function $\mu$ is countably subadditive, that is, for all sequences $\{A_n\}_{n=1}^{\infty}$ of sets in $M$ we have $\mu\left(\bigcup_{n=1}^{\infty} A_n\right) \le \sum_{n=1}^{\infty} \mu(A_n)$.
By Theorem 8.6, outer Lebesgue measure on $\mathbb{R}$ is an outer measure. To integrate in $\mathbb{R}^d$ (and via Example 14.10 on subsets $\Omega \subseteq \mathbb{R}^d$) we need to define an outer measure on $\mathbb{R}^d$ that is similar to outer Lebesgue measure on $\mathbb{R}$. The definition is very much the same as for the real line, except that instead of open intervals we use $d$-dimensional boxes. The outer measure thus defined is also called outer Lebesgue measure and it is also denoted by $\lambda$.

Definition 14.17 Let $d \ge 1$ and for $i = 1, \ldots, d$ let $a_i < b_i$. Then a set of the form $B := \prod_{i=1}^{d} (a_i, b_i)$ is called an open box in $\mathbb{R}^d$. We define $|B| := \prod_{i=1}^{d} (b_i - a_i)$. For a set $S \subseteq \mathbb{R}^d$ we define the outer Lebesgue measure of $S$ to be

$$\lambda(S) := \inf\left\{ \sum_{j=1}^{\infty} |B_j| : S \subseteq \bigcup_{j=1}^{\infty} B_j, \text{ each } B_j \text{ is an open box in } \mathbb{R}^d \right\},$$

where we set $\lambda(S) = \infty$ if none of the series in the set on the right converge.

It would be nice to show here, similar to Proposition 8.5, that the outer Lebesgue measure of a box is exactly the volume of the box. We postpone this result to Theorem 16.81, where we can use compactness for an efficient proof. The argument that outer Lebesgue measure on $\mathbb{R}^d$ defines a measure on its measurable sets is exactly the same as for $\mathbb{R}$. Thus the following results and the definition of measurable sets can be seen as a recap of the appropriate parts of Section 9.1.

Theorem 14.18 Outer Lebesgue measure on $\mathbb{R}^d$ is an outer measure.

Proof. Mimic the proof of Theorem 8.6. (Exercise 14-11.) ∎

The motivation and definition of measurable sets in the abstract setting are the same as for Lebesgue measure (see Definition 9.4). We must prevent the sum of the measures of the pairwise disjoint measurable parts of a set from being different from the measure of the whole measurable set.

Definition 14.19 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. A subset $S \subseteq M$ is called $\mu$-measurable iff for all $T \subseteq M$ we have that $\mu(T) = \mu(S \cap T) + \mu(S' \cap T)$. The set of $\mu$-measurable subsets of $M$ is denoted $\Sigma_\mu$.
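On a finite set, the measurability condition of Definition 14.19 can be checked by brute force. The following Python sketch is only an illustration: the function names and the toy outer measure on $M = \{0, 1, 2\}$ (value $0$ on $\emptyset$, $2$ on $M$, $1$ on every other subset) are assumptions made for this example and do not come from the text.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def caratheodory_measurable(outer, universe):
    """Sets S with outer(T) == outer(S & T) + outer(S' & T) for every T."""
    full = frozenset(universe)
    subsets = powerset(full)
    return {S for S in subsets
            if all(outer(T) == outer(S & T) + outer((full - S) & T)
                   for T in subsets)}

M = frozenset({0, 1, 2})

def toy_outer(A):
    """Hypothetical outer measure: monotone and subadditive, not additive."""
    if not A:
        return 0
    return 2 if A == M else 1

def counting(A):
    """Counting measure, viewed as an outer measure on the power set."""
    return len(A)
```

For `toy_outer` only $\emptyset$ and $M$ pass the test, because for instance $S = \{0\}$ and $T = \{0, 1\}$ give $1 \ne 1 + 1$. For `counting` all eight subsets are measurable. This is the situation the definition is designed to detect: the measurable sets are exactly those that split every test set additively.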
As in Definition 9.5 the Lebesgue measure is obtained by restricting outer Lebesgue measure to the set $\Sigma_\lambda$ of Lebesgue measurable sets.

Definition 14.20 Let $S \subseteq \mathbb{R}^d$ be a Lebesgue measurable set. Then the outer Lebesgue measure $\lambda(S)$ of $S$ is also called the Lebesgue measure of $S$.

As before, "half" of the equality for measurability is always satisfied. Moreover, the proof that outer measures induce measure spaces runs along the same lines as the proofs of the corresponding results in Section 9.1.

Proposition 14.21 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. For all subsets $S, T \subseteq M$, we have $\mu(T) \le \mu(S \cap T) + \mu(S' \cap T)$.

Proof. Mimic the proof of Corollary 9.6. (Exercise 14-12.) ∎

Proposition 14.22 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. If $\mu(S) = 0$, then $S$ is $\mu$-measurable.

Proof. Mimic the proof of Proposition 9.7. (Exercise 14-13.) ∎

Lemma 14.23 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. If $A$ and $B$ are $\mu$-measurable, then the intersection $A \cap B$ is $\mu$-measurable.

Proof. Mimic the proof of Lemma 9.8. (Exercise 14-14.) ∎

Lemma 14.24 Let $M$ be a set, let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint $\mu$-measurable sets. Then $\bigcup_{n=1}^{\infty} A_n$ is […]

Proof. Mimic the proof of Lemma 9.9. (Exercise 14-15.) ∎

Theorem 14.25 Let $M$ be a set, let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure and let $\Sigma_\mu$ be the set of $\mu$-measurable sets. Then $(M, \Sigma_\mu, \mu)$ is a measure space.

Proof. Mimic the proof of Theorem 9.10 to prove that $\Sigma_\mu$ is a $\sigma$-algebra and then mimic the proof of Theorem 9.11 to prove that $\mu$ is countably additive on $\Sigma_\mu$. (Exercise 14-16.) ∎

In particular, Theorem 14.25 shows that the triple $(\mathbb{R}^d, \Sigma_\lambda, \lambda)$ is a measure space. Lebesgue measure is the standard measure for integration in $d$-dimensional space.
Thus, as we proceed to define measurable and integrable functions, we are constructing a theory that allows us to integrate on $\mathbb{R}^d$ and its subsets.

Exercises

14-11. Prove Theorem 14.18.

14-12. Prove Proposition 14.21.

14-13. Prove Proposition 14.22.

14-14. Prove Lemma 14.23.

14-15. Prove Lemma 14.24.

14-16. Prove Theorem 14.25.

14-17. Let $d \ge 1$ and for $i = 1, \ldots, d$ let the numbers $a_i < b_i$ be dyadic rational numbers. Then a set of the form $D := \prod_{i=1}^{d} (a_i, b_i)$ is called a dyadic open box in $\mathbb{R}^d$. Prove that for any set $S \subseteq \mathbb{R}^d$ we have

$$\lambda(S) = \inf\left\{ \sum_{j=1}^{\infty} |D_j| : S \subseteq \bigcup_{j=1}^{\infty} D_j, \text{ each } D_j \text{ is a dyadic open box in } \mathbb{R}^d \right\}.$$

14-18. Prove that if $A, B \subseteq \mathbb{R}$ and $\lambda(A) = 0$ (where $\lambda$ is Lebesgue measure on $\mathbb{R}$), then $\lambda(A \times B) = 0$ (where $\lambda$ is Lebesgue measure on $\mathbb{R}^2$).

14-19. Looking at Riemann integrals from a measure theoretic point of view. Let $[a,b]$ be an interval. For a set $S \subseteq [a,b]$ we define

$$J(S) := \inf\left\{ \sum_{j=1}^{n} |I_j| : S \subseteq \bigcup_{j=1}^{n} I_j, \text{ each } I_j \text{ is an open interval} \right\}$$

and call it the Jordan content of $S$. Note that the only difference between the Jordan content and outer Lebesgue measure is that the sums over which the infimum is taken are finite.
(a) Prove that for any closed interval $[c,d] \subseteq [a,b]$ we have $J([c,d]) = d - c$.
(b) Prove that the Jordan content is not an outer measure. Hint: $\mathbb{Q}$.
(c) Let

$$J_i(S) := \sup\left\{ \sum_{j=1}^{n} |b_j - a_j| : S \supseteq \bigcup_{j=1}^{n} [a_j, b_j],\ a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le a_n \le b_n \right\},$$

and call it the inner Jordan content of $S \subseteq [a,b]$. Prove that for any closed interval $[c,d] \subseteq [a,b]$ we have $J_i([c,d]) = d - c$.
(d) Prove that the inner Jordan content is not an outer measure.
(e) Call a set $S \subseteq [a,b]$ Jordan measurable iff $J(S) = J_i(S)$. An algebra of sets satisfies the first two properties of a $\sigma$-algebra, but it is only closed under finite unions rather than infinite unions. Prove that the set $\Sigma_{[a,b]}$ of Jordan measurable subsets of $[a,b]$ forms an algebra.
(f) Prove that the Jordan content $J : \Sigma_{[a,b]} \to [0, \infty)$ is a finitely additive measure. That is, $J(\emptyset) = 0$ and if $\{A_n\}_{n=1}^{N}$ is a finite sequence of pairwise disjoint sets $A_n \in \Sigma_{[a,b]}$, then $J\left(\bigcup_{n=1}^{N} A_n\right) = \sum_{n=1}^{N} J(A_n)$.
(g) Prove that a set $S \subseteq [a,b]$ is Jordan measurable iff its indicator function $\mathbf{1}_S$ is Riemann integrable.
Note. This exercise shows it is fair to say that the problem with the Riemann integral is that its associated notion of a content is only finitely additive.

14-20. Let $a < b$ be real numbers and let $g : [a,b] \to \mathbb{R}$ be nondecreasing. For numbers $c, d \in (a,b)$ so that $c < d$ define $|(c,d)|_g := \lim_{x \to d^-} g(x) - \lim_{x \to c^+} g(x)$, define $|[a,d)|_g := \lim_{x \to d^-} g(x) - g(a)$, and define $|(c,b]|_g := g(b) - \lim_{x \to c^+} g(x)$. Set $|[a,b]|_g := g(b) - g(a)$. Let $\mathcal{I}$ be the set of all subintervals of $[a,b]$ that are either open, closed at $a$ and open on the right, closed at $b$ and open on the left or equal to $[a,b]$. For any set $S \subseteq [a,b]$, we define the outer Lebesgue-Stieltjes measure of $S$ to be

$$\lambda_g(S) := \inf\left\{ \sum_{j=1}^{\infty} |I_j|_g : S \subseteq \bigcup_{j=1}^{\infty} I_j, \text{ each } I_j \text{ is in } \mathcal{I} \right\}.$$

(a) Prove that the outer Lebesgue-Stieltjes measure really is an outer measure.
(b) Prove that open intervals $(c,d) \subseteq [a,b]$ are $\lambda_g$-measurable.
Hint. Compare with Proposition 9.13.
(c) Prove that for $c < d$ both in $[a,b]$ we have $\lambda_g([c,d]) = \lim_{x \to d^+} g(x) - \lim_{x \to c^-} g(x)$, where the one-sided limit is understood to be the value of $g$ if $c = a$ or $d = b$.
(d) […] $g(x) := 0$ for […], $g(x) := 1$ for $x > c$. Prove that $\lambda_g(S) = 1$ if $c \in S$ and $\lambda_g(S) = 0$ if […], for any set $S \subseteq [a,b]$.
(e) Construct a nondecreasing function $g : [a,b] \to \mathbb{R}$ and an interval $[c,d] \subseteq [a,b]$ so that $\lambda_g([c,d]) > g(d) - g(c)$.
(f) Prove that the function $f : [a,b] \to \mathbb{R}$ is Riemann-Stieltjes integrable with respect to $g$ iff $\lambda_g(\{ x : f \text{ is discontinuous at } x \}) = 0$.

14.3 Measurable Functions

As we did on the real line, we first define the functions for which there is a chance that the integral exists and then we define the integral.
Indicator functions (see Definition 5.9) are once more our "rectangles" and simple functions are composed of finitely many disjoint "rectangles." From here on, algebraic operations on functions will always be understood to be pointwise, that is, for example, the sum $f + g$ of two functions $f$ and $g$ with the same domain is defined as $(f + g)(x) := f(x) + g(x)$ for all elements $x$ of the domain.

Definition 14.26 Let $(M, \Sigma)$ be a measurable space. A simple measurable function is a function $s : M \to \mathbb{R}$ such that there are $n \in \mathbb{N}$, $a_1, \ldots, a_n \in \mathbb{R}$ and pairwise disjoint sets $A_1, \ldots, A_n \in \Sigma$ so that $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$. We will also call these functions simple functions.

For measurable functions, we consider the positive and negative parts separately.

Definition 14.27 Let $M$ be a set. For $f : M \to [-\infty, \infty]$ we define the positive part $f^+(x) := \max\{ f(x), 0 \}$ and the negative part $f^-(x) := -\min\{ f(x), 0 \}$.

Definition 14.28 Let $(M, \Sigma)$ be a measurable space. The nonnegative function $f : M \to [0, \infty]$ is called measurable iff there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions $s_n : M \to [0, \infty)$ such that for all $x \in M$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing and $\lim_{n\to\infty} s_n(x) = f(x)$. A function $f : M \to [-\infty, \infty]$ is called measurable iff $f^+$ and $f^-$ are both measurable. If it is necessary to distinguish between several $\sigma$-algebras, we will also call these functions $\Sigma$-measurable.

Once again, measurable functions have many characterizations.

Theorem 14.29 Let $(M, \Sigma)$ be a measurable space and let $f : M \to [-\infty, \infty]$ be a function. Then the following are equivalent.
1. $f$ is measurable,
2. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) > a \} \in \Sigma$,
3. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) \le a \} \in \Sigma$,
4. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) < a \} \in \Sigma$,
5. For all $a \in \mathbb{R}$, we have $\{ x \in M : f(x) \ge a \} \in \Sigma$.

Proof. Mimic the proof of Theorem 9.19. (Exercise 14-21.) ∎
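The nondecreasing sequence of simple functions required in Definition 14.28 can be made concrete. One standard choice, which is not necessarily the construction used in the text, is the dyadic staircase $s_n := \min\{ n, \lfloor 2^n f \rfloor / 2^n \}$. Each $s_n$ takes finitely many values, and its level sets have the form $\{ x : k/2^n \le f(x) < (k+1)/2^n \}$ or $\{ x : f(x) \ge n \}$, which are in $\Sigma$ when $f$ satisfies the conditions of Theorem 14.29. The Python sketch below uses the hypothetical name `simple_approx` and assumes $f$ is nonnegative.

```python
import math

def simple_approx(f, n):
    """Return s_n = min(n, floor(2^n f) / 2^n) for a nonnegative function f."""
    def s_n(x):
        value = f(x)
        if value >= n:            # also covers f(x) = +infinity
            return float(n)
        # Round f(x) down to the nearest multiple of 2^(-n).
        return math.floor(value * 2 ** n) / 2 ** n
    return s_n
```

For fixed $x$ the values $s_1(x), s_2(x), \ldots$ never decrease and stay at or below $f(x)$, and once $n > f(x)$ the gap $f(x) - s_n(x)$ is less than $2^{-n}$. Where $f(x) = \infty$, the values are $s_n(x) = n$, which increase without bound as required.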
The characterizations of measurable functions can be used to prove that certain operations preserve measurability.

Theorem 14.30 Let $(M, \Sigma)$ be a measurable space and let $f, g : M \to [-\infty, \infty]$ be measurable functions. If $f + g$ is defined everywhere, then $f + g$ is measurable. If $f - g$ or $f \cdot g$ is defined everywhere, then it is measurable. Finally, $f^+$, $f^-$ and $|f|$ are measurable.

Proof. Mimic the proof of Theorem 9.20. (Exercise 14-22.) ∎

Exercises

14-21. Prove Theorem 14.29.

14-22. Prove Theorem 14.30.

14-23. Let $(M, \Sigma)$ be a measurable space. Prove that a bounded function $f : M \to [0, \infty)$ is measurable iff there is a sequence of simple functions $s_n : M \to [0, \infty)$ such that for all $x \in M$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nonincreasing and $\lim_{n\to\infty} s_n(x) = f(x)$.
Hint. Mimic part "5⇒1" of the proof of Theorem 14.29 for nonnegative functions.

14-24. Let $(M, \Sigma)$ be a measurable space and let $f, g : M \to [-\infty, \infty]$ be measurable functions.
(a) Prove that $\{ x \in M : f(x) = g(x) \} \in \Sigma$.
(b) Prove that $\{ x \in M : f(x) \le g(x) \} \in \Sigma$.
(c) Prove that $\{ x \in M : f(x) < g(x) \} \in \Sigma$.

14.4 Integration of Measurable Functions

Once measurable functions are defined, the definition of the integral also is similar to that of the Lebesgue integral.

Definition 14.31 Let $(M, \Sigma, \mu)$ be a measure space, let $n \in \mathbb{N}$, let $A_1, \ldots, A_n \in \Sigma$ be pairwise disjoint, let $a_1, \ldots, a_n \in [0, \infty)$ and let $s = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}$ be a simple function. We define the integral of $s$ to be $\int_M s\,d\mu := \sum_{k=1}^{n} a_k \mu(A_k)$.

The fact that the integral in Definition 14.31 does not depend on the representation of $s$ can be proved just as for the Lebesgue integral in Exercise 9-20a.

Definition 14.32 Let $(M, \Sigma, \mu)$ be a measure space. Let $f : M \to [0, \infty]$ be a measurable function. We define the integral of $f$ to be

$$\int_M f\,d\mu := \sup\left\{ \int_M s\,d\mu : s \text{ is a simple function with } 0 \le s \le f \right\}$$

and we call $f$ integrable iff the supremum is finite. A function $g : M \to [-\infty, \infty]$ is called integrable iff $g^+$ and $g^-$ are both integrable.
Its integral is defined to be $\int_M g\,d\mu := \int_M g^+\,d\mu - \int_M g^-\,d\mu$.

The integral on measure spaces is very versatile. Examples 14.33 and 14.34 show that integration on the real line as well as on $d$-dimensional spaces are special cases. Examples 14.36 and 14.37 show that absolutely convergent series and absolutely convergent double series are in one-to-one correspondence with integrable functions on the right measure spaces.

Example 14.33 With $M = \mathbb{R}$, $\Sigma = \Sigma_\lambda$ and $\mu = \lambda$, Definition 14.32 gives the Lebesgue integral on the real line. □

Example 14.34 With $M = \mathbb{R}^d$, $\Sigma = \Sigma_\lambda$ and $\mu = \lambda$, Definition 14.32 gives the Lebesgue integral in $d$-dimensional space. We will investigate in Section 14.8 how the Lebesgue integrals in various dimensions are related to each other. □

For the mentioned examples on series, we first need to establish that a function is integrable iff its absolute value is integrable. As was (by now) to be expected, the properties of the abstract integral are proved in exactly the same way as for the Lebesgue integral.

Theorem 14.35 Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be measurable functions.
1. If $0 \le f \le g$ a.e. and $g$ is integrable, then so is $f$ and $\int_M f\,d\mu \le \int_M g\,d\mu$.
2. $f$ is integrable iff $|f|$ is integrable and in that case the triangular inequality $\left| \int_M f\,d\mu \right| \le \int_M |f|\,d\mu$ holds.
3. If $f \ge 0$, then $\int_M f\,d\mu = 0$ iff $f = 0$ a.e.

Proof. Mimic the proof of Theorem 9.23. (Exercise 14-25.) ∎

Example 14.36 With $M = \mathbb{N}$, $\Sigma = \mathcal{P}(\mathbb{N})$ and $\mu = \gamma_{\mathbb{N}}$ being counting measure on $\mathbb{N}$, a series $\sum_{j=1}^{\infty} a_j$ converges absolutely iff the function $f : \mathbb{N} \to [-\infty, \infty]$ defined by $f(j) = a_j$ is integrable over the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \gamma_{\mathbb{N}})$. Moreover, for every integrable function $f$ on this measure space we have $\int_{\mathbb{N}} f\,d\gamma_{\mathbb{N}} = \sum_{j=1}^{\infty} f(j)$. □

Example 14.37 Let $\{a_{(i,j)}\}_{i,j=1}^{\infty}$ be a doubly indexed countable family of numbers. We say the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ converges absolutely iff the double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} |a_{(i,j)}|$ converges. With $M = \mathbb{N} \times \mathbb{N}$, $\Sigma = \mathcal{P}(\mathbb{N} \times \mathbb{N})$ and $\mu = \gamma_{\mathbb{N} \times \mathbb{N}}$ being counting measure on $\mathbb{N} \times \mathbb{N}$, a double series $\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{(i,j)}$ converges absolutely iff the function $f(i,j) := a_{(i,j)}$ is integrable on the measure space $(\mathbb{N} \times \mathbb{N}, \mathcal{P}(\mathbb{N} \times \mathbb{N}), \gamma_{\mathbb{N} \times \mathbb{N}})$. For every integrable function $f$ on this measure space, we have $\int_{\mathbb{N} \times \mathbb{N}} f\,d\gamma_{\mathbb{N} \times \mathbb{N}} = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} f(i,j)$. □

As for Lebesgue measure, we consider null sets to be insignificant. Therefore with the same motivation as given for Definition 9.24 we define the following.

Definition 14.38 Let $(M, \Sigma, \mu)$ be a measure space. If $f : M \to [-\infty, \infty]$ is defined a.e., we call it integrable iff

$$g(x) := \begin{cases} f(x); & \text{if } f(x) \text{ is defined}, \\ 0; & \text{if } f(x) \text{ is not defined}, \end{cases}$$

is integrable.

As long as a function is constructed from other integrable functions, like the sum in Theorem 14.39, measurability of the set where the function is not defined is not an issue. However, even if this was a problem and the function is undefined on a subset of a null set, we could simply switch to the completion of the measure (see Exercise 14-10) and note that subsets of null sets are also insignificant. Overall, with the same conventions as for the Lebesgue integral we can prove that the integral over a measure space has the right linearity properties.

Theorem 14.39 Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be integrable functions. Then the following hold.
1. For all $\alpha \in \mathbb{R}$, the scalar multiple $\alpha f$ is integrable and $\int_M \alpha f\,d\mu = \alpha \int_M f\,d\mu$.
2. The sum $f + g$ is integrable and $\int_M f + g\,d\mu = \int_M f\,d\mu + \int_M g\,d\mu$.

Proof. Mimic the proof of Theorem 9.25. (Exercises 14-26a and 14-26b.) ∎

Exercises

14-25. Prove Theorem 14.35. That is, let $(M, \Sigma, \mu)$ be a measure space and let $f, g$ be measurable functions.
(a) Prove that if $0 \le f \le g$ a.e. and $g$ is integrable, then so is $f$ and $\int_M f\,d\mu \le \int_M g\,d\mu$.
(b) Prove that $f$ is integrable iff $|f|$ is integrable and that in this case $\left| \int_M f\,d\mu \right| \le \int_M |f|\,d\mu$.
(c) Prove that if $f \ge 0$, then $\int_M f\,d\mu = 0$ iff $f = 0$ a.e.

14-26. Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be integrable functions.
(a) Prove that if $\alpha \in \mathbb{R}$, then the scalar multiple $\alpha f$ is integrable and $\int_M \alpha f\,d\mu = \alpha \int_M f\,d\mu$.
(b) Prove that the sum $f + g$ is integrable and $\int_M f + g\,d\mu = \int_M f\,d\mu + \int_M g\,d\mu$.
Note. Exercise 14-33 gives a more effective proof than mimicking the proof of Theorem 9.25.
(c) Prove that $f - g$ is integrable and $\int_M f - g\,d\mu = \int_M f\,d\mu - \int_M g\,d\mu$.
(d) Prove that $\max\{f, g\}$ (defined pointwise) is integrable.
(e) Prove that $\min\{f, g\}$ (defined pointwise) is integrable.
(f) Give an example that shows that the product of $f$ and $g$ need not be integrable.

14-27. Markov's inequality. Let $(M, \Sigma, \mu)$ be a measure space and let $f : M \to [0, \infty)$ be integrable. Prove that for all $c > 0$ we have $\mu(\{ x \in M : f(x) > c \}) \le \frac{1}{c} \int_M f\,d\mu$.

14.5 Monotone and Dominated Convergence

Sequences have been a standard tool throughout our investigation of single variable functions. Sequences play an equally important role in more abstract settings. In Exercise 11-13, we have seen that pointwise convergence of a bounded, monotone sequence of Riemann integrable functions need not produce a Riemann integrable limit function. The two fundamental results about pointwise convergence in measure theory are the Monotone Convergence Theorem, which says that the pointwise limit of a nondecreasing sequence of nonnegative integrable functions is integrable if the limit of the integrals is finite (see Theorem 14.41), and the Dominated Convergence Theorem, which says that the pointwise limit of a sequence of integrable functions is integrable provided all functions are below one integrable function that dominates them all (see Theorem 14.43). Hence, when it comes to pointwise convergence of functions, the Lebesgue-type integral has more favorable properties than the Riemann integral.
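A small computation suggests why some domination hypothesis is needed before a limit and an integral can be exchanged. The sketch below is an illustration only: the functions $f_n := n \cdot \mathbf{1}_{(0, 1/n]}$ on $[0,1]$ with Lebesgue measure, and all Python names, are assumptions made for this example. Each $f_n$ has integral $1$, the pointwise limit is $0$, and the smallest function that dominates every $f_n$, namely $\sup_n f_n$, which equals $k$ on $(1/(k+1), 1/k]$, has integrals over $(1/(K+1), 1]$ that grow like the harmonic series.

```python
from fractions import Fraction

def f_n(n, x):
    """Value at x of the step function n * 1_{(0, 1/n]}."""
    return n if 0 < x <= Fraction(1, n) else 0

def integral_f_n(n):
    """Integral of f_n over [0, 1]: height n times length 1/n."""
    return n * Fraction(1, n)

def integral_of_envelope(K):
    """Integral over (1/(K+1), 1] of sup_n f_n, summed over the K steps."""
    return sum(k * (Fraction(1, k) - Fraction(1, k + 1))
               for k in range(1, K + 1))
```

Every `integral_f_n(n)` equals $1$, while for each fixed $x > 0$ the values `f_n(n, x)` are $0$ as soon as $n > 1/x$. The envelope integrals equal $\sum_{k=1}^{K} \frac{1}{k+1}$, which is unbounded in $K$, so no integrable function dominates the whole sequence.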
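Because every subset $A\in\Sigma$ of a measure space $(M,\Sigma,\mu)$ can be turned into a measure space, and because by Theorem 14.35 for any integrable function $g$, the function $g1_A$ is also integrable, we can define integrals over subsets.

Definition 14.40 Let $(M,\Sigma,\mu)$ be a measure space, let $A\in\Sigma$ and let the function $f : M \to [-\infty,\infty]$ be integrable. Then we define the integral of $f$ over the subset $A$ as
\[ \int_A f\,d\mu := \int_M f1_A\,d\mu. \]
For Lebesgue integrals over intervals, we will also write
\[ \int_a^b f(x)\,d\lambda(x) := \int_a^b f\,d\lambda := \int_{[a,b]} f\,d\lambda. \]

Because the integral is defined as a supremum of integrals of simple functions, it should not be too surprising that the integral is well-behaved with respect to monotone sequences of functions.

Theorem 14.41 Monotone Convergence Theorem. Let $(M,\Sigma,\mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of nonnegative measurable functions so that $\{f_n(x)\}_{n=1}^{\infty}$ is nondecreasing for almost all $x\in M$ and $\lim_{n\to\infty} f_n(x)$ exists for almost all $x\in M$. Let $f : M \to [0,\infty]$ be a measurable function so that $f(x) = \lim_{n\to\infty} f_n(x)$ a.e. Then
\[ \int_M f\,d\mu = \lim_{n\to\infty}\int_M f_n\,d\mu. \]

Proof. First assume that for all $x\in M$ we have $0 \le f_1(x) \le f_2(x) \le \cdots$ and the limit $\lim_{n\to\infty} f_n(x)$ exists. Because $f_n \le f_{n+1} \le f$ for all $n\in\mathbb{N}$, the integrals $\int_M f_n\,d\mu$ form a nondecreasing sequence that is bounded above by $\int_M f\,d\mu$, so $\lim_{n\to\infty}\int_M f_n\,d\mu \le \int_M f\,d\mu$. For the reversed inequality, let $s = \sum_{k=1}^{m} a_k 1_{A_k}$ be a simple function such that $0 \le s \le f$ and let $t\in(0,1)$. For all $n\in\mathbb{N}$, define $E_n := \{x\in M : f_n(x) \ge t s(x)\}$. Then for all $n\in\mathbb{N}$ we have the containment $E_n \subseteq E_{n+1}$, $M = \bigcup_{n=1}^{\infty} E_n$, and, by Theorem 14.15,
\[ \int_M ts\,d\mu = \sum_{k=1}^{m} t a_k\,\mu(A_k) = \lim_{n\to\infty}\sum_{k=1}^{m} t a_k\,\mu(A_k\cap E_n). \]
Let $\varepsilon > 0$. If $\int_M ts\,d\mu$ is finite, then there is an $N\in\mathbb{N}$ so that for all $n \ge N$
\[ \int_M ts\,d\mu - \varepsilon \le \sum_{k=1}^{m} t a_k\,\mu(A_k\cap E_n) = \int_M ts1_{E_n}\,d\mu \le \int_M f_n\,d\mu \le \lim_{n\to\infty}\int_M f_n\,d\mu. \]
(If $\int_M ts\,d\mu = \infty$, the same chain of inequalities without its first term shows that $\lim_{n\to\infty}\int_M f_n\,d\mu = \infty$.) Because $\varepsilon$ was arbitrary we obtain $\int_M ts\,d\mu \le \lim_{n\to\infty}\int_M f_n\,d\mu$, and because the number $t\in(0,1)$ was arbitrary, we can let $t$ approach 1 and obtain
\[ \int_M s\,d\mu = \lim_{t\to 1}\int_M ts\,d\mu \le \lim_{n\to\infty}\int_M f_n\,d\mu. \]
Because $s \le f$ was an arbitrary simple function, we conclude that
\[ \int_M f\,d\mu = \sup\left\{\int_M s\,d\mu : s \text{ simple},\ 0 \le s \le f\right\} \le \lim_{n\to\infty}\int_M f_n\,d\mu. \]
Now assume that $0 \le f_1(x) \le f_2(x) \le \cdots$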
and $\lim_{n\to\infty} f_n(x) = f(x)$ for almost all $x\in M$. Let $N\in\Sigma$ be a null set so that $0 \le f_1(x) \le f_2(x) \le \cdots$ and $\lim_{n\to\infty} f_n(x) = f(x)$ for all $x\in M\setminus N$. Changing a function on a null set does not change its integral. Therefore we can replace $f$ with $f1_{M\setminus N}$ and each $f_n$ with $f_n1_{M\setminus N}$ and apply what we have proved above to obtain the equation for the integrals.

We need the next lemma for the proof of the Dominated Convergence Theorem.

Lemma 14.42 Fatou's Lemma. Let $(M,\Sigma,\mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of nonnegative measurable functions. Then $\liminf_{n\to\infty} f_n$ (defined pointwise) is measurable and
\[ \int_M \liminf_{n\to\infty} f_n\,d\mu \le \liminf_{n\to\infty}\int_M f_n\,d\mu. \]

Proof. Because the limit inferior is $\liminf_{n\to\infty} f_n = \lim_{n\to\infty}\inf\{f_j : j \ge n\}$, where the infimum and the limits are taken pointwise, we first consider the sequence of functions $p_n := \inf\{f_j : j \ge n\}$. Let $n\in\mathbb{N}$ and $a\in\mathbb{R}$. Then
\[ \{x\in M : p_n(x) < a\} = \bigcup_{j=n}^{\infty}\{x\in M : f_j(x) < a\}. \]
Because the union on the right side is measurable and $a$ was arbitrary we conclude that $p_n$ is measurable. Moreover, clearly we have $p_n \le p_{n+1}$. By Exercise 14-28a, this means that $\liminf_{n\to\infty} f_n = \lim_{n\to\infty} p_n$ is measurable and by the Monotone Convergence Theorem $\int_M \liminf_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty}\int_M p_n\,d\mu$. Now for all numbers $n\in\mathbb{N}$ we have $p_n \le f_n$, which by Lemma 10.5 means
\[ \int_M \liminf_{n\to\infty} f_n\,d\mu = \lim_{n\to\infty}\int_M p_n\,d\mu \le \liminf_{n\to\infty}\int_M f_n\,d\mu. \]

The inequality in Fatou's Lemma can be strict (Exercise 14-29).

Finally, as long as all functions' absolute values are bounded ("dominated") by one integrable function, pointwise limits preserve integrability and the limit can be moved out of the integral.

Theorem 14.43 Dominated Convergence Theorem. Let $(M,\Sigma,\mu)$ be a measure space, let $\{f_n\}_{n=1}^{\infty}$ be a sequence of measurable functions and let $f$ be a measurable function such that $f(x) = \lim_{n\to\infty} f_n(x)$ for almost all $x$. Moreover, let $g$ be an integrable
function such that for all $n\in\mathbb{N}$ and almost all $x\in M$ we have $|f_n(x)| \le g(x)$. Then $f$ is integrable and
\[ \int_M f\,d\mu = \lim_{n\to\infty}\int_M f_n\,d\mu. \]

Proof. Because $|f(x)| = \lim_{n\to\infty}|f_n(x)| \le g(x)$ for almost all $x\in M$, by part 1 of Theorem 14.35 $|f|$ is integrable and then by part 2 of Theorem 14.35 $f$ is integrable. Because changing a function on a null set does not change the integral, we can assume that $|f_n(x)| \le g(x)$ and $|f(x)| \le g(x)$ for all $x\in M$. Now for all $n\in\mathbb{N}$ we have $f_n(x) + g(x) \ge 0$ and thus by Fatou's Lemma we obtain
\[ \int_M f\,d\mu + \int_M g\,d\mu = \int_M f+g\,d\mu = \int_M \liminf_{n\to\infty}(f_n+g)\,d\mu \le \liminf_{n\to\infty}\int_M f_n+g\,d\mu = \liminf_{n\to\infty}\int_M f_n\,d\mu + \int_M g\,d\mu, \]
which means that $\int_M f\,d\mu \le \liminf_{n\to\infty}\int_M f_n\,d\mu$. The same argument applied to the functions $(-f_n)+g \ge 0$ gives $\int_M -f\,d\mu \le \liminf_{n\to\infty}\int_M -f_n\,d\mu$, that is (recall Proposition 10.4), $\int_M f\,d\mu \ge \limsup_{n\to\infty}\int_M f_n\,d\mu$. By Proposition 10.3 and Theorem 10.6, we conclude that $\lim_{n\to\infty}\int_M f_n\,d\mu$ exists and equals $\int_M f\,d\mu$.

The hypothesis that all functions $f_n$ in the Dominated Convergence Theorem are below an integrable function $g$ cannot be dropped. For example, on $(0,1)$ equipped with Lebesgue measure, a sequence of functions such as $\left\{\frac{1}{x}1_{\left[\frac{1}{2n},\frac{1}{n}\right)}\right\}_{n=1}^{\infty}$ converges pointwise to zero, but all integrals are equal to $\ln(2)$. Moreover, Exercise 14-28 shows that the demand that the pointwise a.e. limit is measurable cannot be dropped in general, but that it can be dropped for Lebesgue measure.

Exercises

14-28. Pointwise almost everywhere limits of measurable functions. Let $(M,\Sigma,\mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of measurable functions.
(a) Prove that if $f(x) = \lim_{n\to\infty} f_n(x)$ for all $x\in M$, then $f$ is measurable.
(b) Give an example that shows that the pointwise a.e. limit of a sequence of measurable functions need not be measurable. Hint. There is an example with $M = \{0,1\}$, $\Sigma = \{\emptyset, M\}$ and $\mu = 0$.
(c) Prove that if $(M,\Sigma,\mu)$ is a complete measure space, then $f(x) = \lim_{n\to\infty} f_n(x)$ a.e. implies that $f$ is measurable.
(d) Prove that $(\mathbb{R}^d, \Sigma_\lambda, \lambda)$ is a complete measure space.

14-29. Give an example that shows that the inequality in Fatou's Lemma can be strict. Then explain why this is not a counterexample to the Dominated Convergence Theorem.

14-30. Let $(M,\Sigma,\mu)$ be a measure space and let $g : M \to [0,\infty]$ be integrable. Prove that the function $\mu_g : \Sigma \to [0,\infty)$ defined by $\mu_g(A) := \int_A g\,d\mu$ is a measure.

14-31. Let $(M,\Sigma,\mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of integrable functions that converges uniformly to a measurable function $f$.
(a) Prove that if $\mu(M) < \infty$, then $f$ is integrable and $\int_M f\,d\mu = \lim_{n\to\infty}\int_M f_n\,d\mu$.
(b) Give an example that shows this result need not hold for infinite measure spaces.

14-32. Let $(M,\Sigma,\mu)$ be a measure space and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of integrable functions that converges pointwise to a measurable function $f$ and for which there is a $B\in\mathbb{R}$ so that for all $n\in\mathbb{N}$ and all $x\in M$ we have $|f_n(x)| \le B$.
(a) Prove that if $\mu(M) < \infty$, then $f$ is integrable and $\int_M f\,d\mu = \lim_{n\to\infty}\int_M f_n\,d\mu$.
(b) Give an example that shows this result need not hold for infinite measure spaces.

14-33. Let $f, g : M \to [-\infty,\infty]$ be integrable functions. Give a more effective proof that $f+g$ is integrable and $\int_M f+g\,d\mu = \int_M f\,d\mu + \int_M g\,d\mu$ than was given in Theorem 9.23. Hint. Use the Monotone Convergence Theorem and two nondecreasing sequences of simple functions that converge to $|f|$ and $|g|$, respectively, to prove that $|f|+|g|$ is integrable. Use a similar idea to prove the equation for the integrals. Do not use the Dominated Convergence Theorem (why?).

14-34. Explain how the hypotheses of the Dominated Convergence Theorem prevent the occurrence of examples as in Exercise 11-17 or as mentioned at the end of this section.

14-35. Prove that for every Lebesgue integrable function $f : \mathbb{R} \to \mathbb{R}$ there is a sequence of Riemann integrable functions $r_n : \mathbb{R} \to \mathbb{R}$ such that $\lim_{n\to\infty}\int_{\mathbb{R}} |f - r_n|\,d\lambda = 0$.
14.6 Convergence in Mean, in Measure, and Almost Everywhere

Theorem 15.50 will show that the "distance" between two functions can be measured with the integral of the absolute value of the difference. Because convergence with respect to this distance is fundamental in analysis, we investigate this notion of convergence and some of its consequences here.

Definition 14.44 Let $(M,\Sigma,\mu)$ be a measure space, let $\{f_n\}_{n=1}^{\infty}$ be a sequence of measurable functions and let $f : M \to [-\infty,\infty]$ be a measurable function. If $\lim_{n\to\infty}\int_M |f_n - f|\,d\mu = 0$, then we say that $\{f_n\}_{n=1}^{\infty}$ converges in mean to $f$.

Convergence in mean is near the heart of the definition of integration.

Theorem 14.45 Let $(M,\Sigma,\mu)$ be a measure space and let $f : M \to [-\infty,\infty]$ be an integrable function. Then there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions that converges in mean to $f$.

Proof. By definition of the integral, for every $n\in\mathbb{N}$ there are simple functions $0 \le s_n^- \le f^-$ and $0 \le s_n^+ \le f^+$ so that $\int_M f^- - s_n^-\,d\mu < \frac{1}{n}$ and $\int_M f^+ - s_n^+\,d\mu < \frac{1}{n}$. With $s_n := s_n^+ - s_n^-$ we obtain
\[ \int_M |f - s_n|\,d\mu \le \int_M f^+ - s_n^+\,d\mu + \int_M f^- - s_n^-\,d\mu < \frac{1}{n} + \frac{1}{n} = \frac{2}{n}, \]
which means that $\{s_n\}_{n=1}^{\infty}$ converges in mean to $f$.

Unfortunately, convergence in mean does not even imply pointwise a.e. convergence, so the visualization of convergence in mean is a bit complicated.

Example 14.46 On the interval $[0,1]$, for $n\in\mathbb{N}$ and $k\in\{0,\ldots,2^n-1\}$ define the function $f_{2^n+k} := 1_{\left[\frac{k}{2^n},\frac{k+1}{2^n}\right]}$. Then the sequence $\{f_m\}_{m=2}^{\infty}$ converges in mean to 0, but it does not converge at any point in $[0,1]$.

We can say that convergence in mean implies that the measure of any set on which the difference $|f_n - f|$ is greater than a given number must go to zero.

Proposition 14.47 Let $(M,\Sigma,\mu)$ be a measure space. If the sequence $\{f_n\}_{n=1}^{\infty}$ of measurable functions converges in mean to the measurable function $f$, then for all $\varepsilon > 0$ we have
\[ \lim_{n\to\infty}\mu\left(\{x\in M : |f_n(x) - f(x)| > \varepsilon\}\right) = 0. \]

Proof. Let $\varepsilon > 0$. Then
\[ \mu\left(\{x\in M : |f_n(x) - f(x)| > \varepsilon\}\right) = \int_M 1_{\{x\in M : |f_n(x)-f(x)| > \varepsilon\}}\,d\mu \le \int_M \frac{1}{\varepsilon}|f_n - f|\,d\mu = \frac{1}{\varepsilon}\int_M |f_n - f|\,d\mu \]
and the latter sequence goes to zero.
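The estimate in Proposition 14.47 can be checked by hand on the sequence from Example 14.46. The following computation is a sketch that assumes the indexing $f_{2^n+k} = 1_{[k/2^n,(k+1)/2^n]}$ for $n\in\mathbb{N}$ and $k\in\{0,\ldots,2^n-1\}$, with $f := 0$ as the limit in mean.

```latex
% Assumption: f_{2^n+k} = 1_{[k/2^n,(k+1)/2^n]} on [0,1], limit in mean f = 0.
% The integral of |f_{2^n+k} - 0| is the length of the supporting interval:
\[
  \int_{[0,1]} \left| f_{2^n+k} - 0 \right| d\lambda
  = \lambda\left(\left[\tfrac{k}{2^n},\tfrac{k+1}{2^n}\right]\right)
  = \frac{1}{2^n} \longrightarrow 0 \qquad (n\to\infty).
\]
% For 0 < \varepsilon < 1 the exceptional set is that same interval, and the
% bound from the proof of Proposition 14.47 holds with room to spare:
\[
  \lambda\left(\left\{x\in[0,1] : |f_{2^n+k}(x)| > \varepsilon\right\}\right)
  = \frac{1}{2^n}
  \le \frac{1}{\varepsilon}\int_{[0,1]} |f_{2^n+k}|\,d\lambda
  = \frac{1}{\varepsilon\,2^n}.
\]
```

At the same time, for every $n \ge 2$ each point $x\in[0,1]$ lies in at least one and at most two of the intervals $\left[\frac{k}{2^n},\frac{k+1}{2^n}\right]$, so $f_m(x) = 1$ for infinitely many $m$ and $f_m(x) = 0$ for infinitely many $m$. This is why the sequence converges in mean and in measure, yet converges at no point.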
The condition derived in Proposition 14.47 is called convergence in measure. Convergence in measure implies at least that a subsequence converges a.e.

Definition 14.48 Let $(M,\Sigma,\mu)$ be a measure space. The sequence $\{f_n\}_{n=1}^{\infty}$ of measurable functions converges in measure to the measurable function $f$ iff for every $\varepsilon > 0$ we have
\[ \lim_{n\to\infty}\mu\left(\{x\in M : |f_n(x) - f(x)| > \varepsilon\}\right) = 0. \]

Proposition 14.49 Let $(M,\Sigma,\mu)$ be a measure space. If the sequence $\{f_n\}_{n=1}^{\infty}$ of measurable functions converges in measure to the measurable function $f$, then it has a subsequence that converges a.e. to $f$.

Proof. Define $\{n_k\}_{k=0}^{\infty}$ as follows. Let $n_0 := 1$ and for $k \ge 1$ let $n_k > n_{k-1}$ be so that
\[ \mu\left(\left\{x\in M : |f_{n_k}(x) - f(x)| > \tfrac{1}{k}\right\}\right) < \frac{1}{2^k}. \]
Suppose for a contradiction that $\{f_{n_k}\}_{k=1}^{\infty}$ does not converge a.e. to $f$. Then there are an $m\in\mathbb{N}$ and an $\varepsilon > 0$ so that
\[ \mu\left(\left\{x\in M : (\forall K\in\mathbb{N} : \exists k \ge K : |f_{n_k}(x) - f(x)| > \tfrac{1}{m})\right\}\right) > \varepsilon. \]
Let $K \ge m$ be so that $\sum_{k=K}^{\infty}\frac{1}{2^k} < \varepsilon$. Then
\[ \left\{x\in M : (\forall K'\in\mathbb{N} : \exists k \ge K' : |f_{n_k}(x) - f(x)| > \tfrac{1}{m})\right\} \subseteq \bigcup_{k=K}^{\infty}\left\{x\in M : |f_{n_k}(x) - f(x)| > \tfrac{1}{k}\right\}, \]
but the measure of the latter set is less than $\varepsilon$, a contradiction.

Exercises

14-36. Let $(M,\Sigma,\mu)$ be a measure space and let $g : M \to [0,\infty]$ be integrable.
(b) Prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that if $\mu(A) < \delta$, then $\int_A g\,d\mu < \varepsilon$. Hint. First prove the result for bounded $g$.
(c) Prove that with $M = [a,b]$, $\mu$ being Lebesgue measure $\lambda$ on $[a,b]$ and $\Sigma$ being the set $\Sigma_\lambda$ of Lebesgue measurable subsets of $[a,b]$, the function $f(x) := \int_a^x g\,d\lambda$ is continuous.

14-37. Pointwise convergence and convergence in measure.
(a) Give an example that shows that, in general, pointwise convergence does not imply convergence in measure.
(b) Prove that if $(M,\Sigma,\mu)$ is a measure space with $\mu(M) < \infty$ and $\{f_n\}_{n=1}^{\infty}$ is a sequence of measurable functions that converges a.e. to the real valued measurable function $f$, then $\{f_n\}_{n=1}^{\infty}$ converges in measure to $f$.
(c) Prove Egoroff's Theorem, which states that if $(M,\Sigma,\mu)$ is a measure space with $\mu(M) < \infty$ and $\{f_n\}_{n=1}^{\infty}$ is a sequence of measurable functions that converges a.e.
to the real valued measurable function $f$, then for each $\varepsilon > 0$ there is a $B\in\Sigma$ so that $\mu(B') < \varepsilon$ and $\{f_n\}_{n=1}^{\infty}$ converges uniformly to $f$ on $B$.

14-38. First results on derivatives of integrable functions.
(a) Let $f : [a,b] \to \mathbb{R}$ be a Lebesgue measurable function that is differentiable a.e. Prove that $f'$ is Lebesgue measurable. You may use the result of Exercise 14-28c.
(b) Let $f : [a,b] \to \mathbb{R}$ be a nondecreasing function. Prove that $f'$ (which exists a.e. by Exercise 10-7) is Lebesgue integrable on $[a,b]$ and $\int_a^b f'(x)\,d\lambda \le f(b) - f(a)$. Hint. Apply Fatou's Lemma to $f_n(x) := \dfrac{f\left(x+\frac{1}{n}\right) - f(x)}{\frac{1}{n}}$ (where we set $f(b+\delta) := f(b)$ for all $\delta > 0$) and use Exercise 7-23, Theorem 9.26 and an adaptation of the proof of the Derivative Form of the Fundamental Theorem of Calculus for Riemann integrals.

14.7 Product $\sigma$-Algebras

One motivation for introducing measure theory was the desire to define an integral on multidimensional spaces. Integration with respect to $d$-dimensional Lebesgue measure provides such an integral as an abstract entity. Computationally, it is standard practice to compute higher dimensional integrals as iterated lower dimensional integrals. Sections 14.7 and 14.8 show that this is also possible in measure theory. As a consequence, we not only progress towards showing that $d$-dimensional integrals can be computed as iterated integrals, but we also are able to establish some results on double series (see Exercise 14-50).

The relevant constructions actually start with the construction of the product space of two measure spaces. For this product space, we need a $\sigma$-algebra that contains certain sets. This section will provide the $\sigma$-algebra on the product space and Section 14.8 will provide the product measure. As before, to safeguard against nonmeasurable sets, we must make sure that our $\sigma$-algebras do not get too large. The smallest $\sigma$-algebra that contains certain sets is generated as follows.
Definition 14.50 Let $M$ be a set and let $\mathcal{U} \subseteq \mathcal{P}(M)$. Then the $\sigma$-algebra generated by $\mathcal{U}$ is defined to be the intersection of all $\sigma$-algebras that contain $\mathcal{U}$.

Proposition 14.51 Let $M$ be a set and let $\mathcal{U} \subseteq \mathcal{P}(M)$. Then the $\sigma$-algebra generated by $\mathcal{U}$ is a $\sigma$-algebra.

Proof. Exercise 14-39.

Exercise 14-41a shows that the $\sigma$-algebra of Lebesgue measurable sets is generated by a family of fairly simple sets. To prove the equality of two measures on the $\sigma$-algebra generated by the set $\mathcal{U}$ we want to concentrate on the sets in $\mathcal{U}$. Unfortunately, because the definition of a measure involves unions of pairwise disjoint sets, this is not as straightforward as it might seem. Instead of being equal on $\sigma$-algebras, measures usually agree on Dynkin systems.

Definition 14.52 Let $M$ be a set and let $\mathcal{D} \subseteq \mathcal{P}(M)$. Then $\mathcal{D}$ is called a Dynkin system iff
1. $M\in\mathcal{D}$,
2. If $A, B\in\mathcal{D}$ and $A \supseteq B$, then $A\setminus B\in\mathcal{D}$, and
3. If $\{A_n\}_{n=1}^{\infty}$ is a sequence in $\mathcal{D}$ with $A_n \subseteq A_{n+1}$ for all $n\in\mathbb{N}$, then $\bigcup_{n=1}^{\infty} A_n\in\mathcal{D}$.

Definition 14.53 Let $(M,\Sigma,\mu)$ be a measure space. Then the measure $\mu$ is called finite iff $\mu(M) < \infty$.

Proposition 14.54 Let $(M,\Sigma)$ be a measurable space and let $\mu$ and $\nu$ be finite measures on $\Sigma$ so that $\mu(M) = \nu(M)$. Then $\{S\in\Sigma : \mu(S) = \nu(S)\}$ is a Dynkin system.

Proof. Exercise 14-43a (for part 3 use Theorem 14.15).

As with $\sigma$-algebras, the intersection of all Dynkin systems that contain a given set of sets $\mathcal{U}$ is again a Dynkin system, called the Dynkin system generated by $\mathcal{U}$ (see Exercise 14-44). The connection between $\sigma$-algebras and Dynkin systems is made via $\pi$-systems.

Definition 14.55 Let $M$ be a set and let $\mathcal{U} \subseteq \mathcal{P}(M)$. Then $\mathcal{U}$ is called a $\pi$-system iff for all $A, B\in\mathcal{U}$ the intersection satisfies $A\cap B\in\mathcal{U}$.

Theorem 14.56 Monotone Class Theorem or Dynkin's Lemma. Let $M$ be a set and let $\mathcal{U}$ be a $\pi$-system that contains $M$. Then the $\sigma$-algebra generated by $\mathcal{U}$ equals the Dynkin system generated by $\mathcal{U}$.

Proof.
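Let $\Sigma$ be the $\sigma$-algebra generated by $\mathcal{U}$ and let $\mathcal{D}$ be the Dynkin system generated by $\mathcal{U}$. Because every $\sigma$-algebra also is a Dynkin system, we infer $\mathcal{D} \subseteq \Sigma$. For the reversed inclusion, first consider $\mathcal{D}_1 := \{A\in\mathcal{D} : A\cap U\in\mathcal{D} \text{ for all } U\in\mathcal{U}\}$. Because $\mathcal{U}$ contains $M$ we infer $M\in\mathcal{D}_1$. If $A, B\in\mathcal{D}_1$ and $A \supseteq B$, then for all $U\in\mathcal{U}$ we have $(A\setminus B)\cap U = (A\cap U)\setminus(B\cap U)\in\mathcal{D}$ (see Exercise 7-3e), which means $A\setminus B\in\mathcal{D}_1$. Moreover, if $\{A_n\}_{n=1}^{\infty}$ is a sequence of sets in $\mathcal{D}_1$ with $A_n \subseteq A_{n+1}$ for all $n\in\mathbb{N}$, then for all $U\in\mathcal{U}$ we obtain $\left(\bigcup_{n=1}^{\infty} A_n\right)\cap U = \bigcup_{n=1}^{\infty}(A_n\cap U)\in\mathcal{D}$, which means that $\bigcup_{n=1}^{\infty} A_n\in\mathcal{D}_1$. Hence, $\mathcal{D}_1$ is a Dynkin system. Because $\mathcal{U}$ is a $\pi$-system, we infer $\mathcal{U} \subseteq \mathcal{D}_1$ and therefore $\mathcal{D} \subseteq \mathcal{D}_1$, which implies that $\mathcal{D} = \mathcal{D}_1$.

Now define $\mathcal{D}_2 := \{A\in\mathcal{D} : A\cap D\in\mathcal{D} \text{ for all } D\in\mathcal{D}\}$. Because $\mathcal{D}_1 = \mathcal{D}$ we infer $\mathcal{U} \subseteq \mathcal{D}_2$. Similar to the above we can prove that $\mathcal{D}_2$ is a Dynkin system, which means that $\mathcal{D} = \mathcal{D}_2$. We conclude that if $A, B\in\mathcal{D}$, then $A\cap B\in\mathcal{D}$.

Now we can prove that $\mathcal{D}$ is a $\sigma$-algebra. First, note that because $M\in\mathcal{D}$, for all sets $S\in\mathcal{D}$ the complement $S' = M\setminus S$ is in $\mathcal{D}$. In particular, we trivially obtain $\emptyset = M'\in\mathcal{D}$. Now let $\{B_n\}_{n=1}^{\infty}$ be a sequence of sets in $\mathcal{D}$. For $n\in\mathbb{N}$, define the set $A_n := \bigcup_{i=1}^{n} B_i = M\setminus\bigcap_{i=1}^{n} B_i'\in\mathcal{D}$. Then $A_n \subseteq A_{n+1}$ for all $n\in\mathbb{N}$ and so $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n\in\mathcal{D}$ and $\mathcal{D}$ is a $\sigma$-algebra. This proves $\Sigma \subseteq \mathcal{D}$, hence $\Sigma = \mathcal{D}$.

Figure 30: The "horizontal sections" $S^y$ of a set $S \subseteq X\times Y$ are obtained by intersecting the set with "horizontal lines" $X\times\{y\}$. The "vertical sections" $S_x$ are obtained by intersecting the set with "vertical lines" $\{x\}\times Y$.

The Monotone Class Theorem now allows us to prove that under certain circumstances, two measures on the same $\sigma$-algebra must be equal.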
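Before turning to that result, it may help to see that agreement of two measures on a generating family alone is not enough when the family is not a $\pi$-system. The following finite example is only an illustration: the set $M$, the family $\mathcal{U}$ and the measures $\mu$ and $\nu$ are chosen here and are not taken from the text.

```latex
% Illustrative assumption: M, U, mu and nu are chosen for this example only.
\[
  M := \{1,2,3,4\}, \qquad
  \mathcal{U} := \bigl\{ M,\ \{1,2\},\ \{1,3\} \bigr\}, \qquad
  \{1,2\}\cap\{1,3\} = \{1\} \notin \mathcal{U}.
\]
% U generates the power set P(M): the generated sigma-algebra contains
% {1}, {2} = {1,2} \ {1}, {3} = {1,3} \ {1} and {4} = M \ {1,2,3}.
\[
  \mu(\{j\}) := \tfrac14 \quad (j = 1,2,3,4), \qquad
  \nu(\{1\}) := \nu(\{4\}) := \tfrac12, \qquad
  \nu(\{2\}) := \nu(\{3\}) := 0.
\]
% The two measures agree on every set in U, but not on the sigma-algebra
% generated by U:
\[
  \mu(M) = \nu(M) = 1, \qquad
  \mu(\{1,2\}) = \nu(\{1,2\}) = \tfrac12, \qquad
  \mu(\{1,3\}) = \nu(\{1,3\}) = \tfrac12, \qquad
  \mu(\{1\}) = \tfrac14 \ne \tfrac12 = \nu(\{1\}).
\]
```

In this example the sets on which $\mu$ and $\nu$ agree are exactly $\emptyset$, $M$, $\{1,2\}$, $\{3,4\}$, $\{1,3\}$ and $\{2,4\}$. By Proposition 14.54 they form a Dynkin system, but this system is not closed under intersections and hence is not a $\sigma$-algebra.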
Theorem 14.57 Let $M$ be a set, let $\mathcal{U} \subseteq \mathcal{P}(M)$ be a $\pi$-system, let $\Sigma$ be the $\sigma$-algebra generated by $\mathcal{U}$ and let $\mu, \nu : \Sigma \to [0,\infty]$ be measures with $\mu|_{\mathcal{U}} = \nu|_{\mathcal{U}}$. If there is a sequence $\{A_n\}_{n=1}^{\infty}$ in $\mathcal{U}$ so that all $\mu(A_n) = \nu(A_n)$ are finite, $A_n \subseteq A_{n+1}$ for all $n\in\mathbb{N}$ and $\bigcup_{n=1}^{\infty} A_n = M$, then $\mu = \nu$.

Proof. First prove that each $\{A\in\Sigma : \mu(A\cap A_n) = \nu(A\cap A_n)\}$ is a Dynkin system. Then apply Theorem 14.56 to show $\mu(A\cap A_n) = \nu(A\cap A_n)$ for all $A\in\Sigma$ and $n\in\mathbb{N}$. Finally, show $\mu = \nu$ by using $A_0 := \emptyset$ and $\mu(A) = \sum_{n=1}^{\infty}\mu\left(A\cap(A_n\setminus A_{n-1})\right)$ and $\nu(A) = \sum_{n=1}^{\infty}\nu\left(A\cap(A_n\setminus A_{n-1})\right)$ for all $A\in\Sigma$. (Exercise 14-45.)

To define a product of two measurable spaces we now simply form all the products of sets in the respective $\sigma$-algebras and consider the $\sigma$-algebra generated by these products. We also define the intersections of sets with "horizontal lines" or "vertical lines" and the intersections of graphs of functions with "vertical planes." These intersections will allow us to reduce integrals on the product of two measure spaces to iterated integrals. To emphasize the visualization as $x$- and $y$-coordinates, for the remainder of this chapter, the first factor space will be $X$ and the second factor space will be $Y$.

Figure 31: The "$y$-sections" $f^y$ of a function $f : X\times Y \to [-\infty,\infty]$ are obtained by intersecting the graph of the function with "vertical planes $X\times\{y\}\times[-\infty,\infty]$ in the $X$-direction at $y$." The "$x$-sections" $f_x$ are obtained by intersecting the graph with "vertical planes $\{x\}\times Y\times[-\infty,\infty]$ in the $Y$-direction at $x$."

Definition 14.58 Let $(X,\Sigma)$ and $(Y,\Gamma)$ be measurable spaces. A subset $S$ of $X\times Y$ is called a rectangle with measurable sides iff there are subsets $A\in\Sigma$ and $B\in\Gamma$ such that $S = A\times B$. The $\sigma$-algebra generated by the set of all rectangles with measurable sides is called the product $\sigma$-algebra $\Sigma\times\Gamma$ of $\Sigma$ and $\Gamma$. Let $x\in X$ and $y\in Y$.
For all sets $S\in\Sigma\times\Gamma$ in the product $\sigma$-algebra we define the sets $S_x := \pi_Y[\{s\in S : \pi_X(s) = x\}]$ and $S^y := \pi_X[\{s\in S : \pi_Y(s) = y\}]$ (see Figure 30). For all $\Sigma\times\Gamma$-measurable functions $f$, we define $f_x(y) := f(x,y)$ and $f^y(x) := f(x,y)$ (see Figure 31).

The standard approach to prove results about product $\sigma$-algebras is to establish the result for rectangles with measurable sides and then to show that the sets for which the result holds form a $\sigma$-algebra. We start by making sure that the sections $S_x$, $S^y$, $f_x$ and $f^y$ of measurable sets and functions are measurable.

Proposition 14.59 Let $(X,\Sigma)$ and $(Y,\Gamma)$ be measurable spaces and let $x\in X$ and $y\in Y$. Then for all sets $S\in\Sigma\times\Gamma$ the set $S_x$ is $\Gamma$-measurable and the set $S^y$ is $\Sigma$-measurable. Moreover, for all $\Sigma\times\Gamma$-measurable functions $f$, the function $f_x$ is $\Gamma$-measurable and the function $f^y$ is $\Sigma$-measurable.

Proof. First, we prove that for each $y\in Y$ and $S\in\Sigma\times\Gamma$ we have $S^y\in\Sigma$. To do this let $y\in Y$ and let $\mathcal{B}^y$ be the set of all $S\in\Sigma\times\Gamma$ so that $S^y\in\Sigma$. For all $A\in\Sigma$ and $B\in\Gamma$ the rectangle with measurable sides $A\times B$ is in $\mathcal{B}^y$, because if $y\in B$ we have $(A\times B)^y = A\in\Sigma$ and if $y\notin B$ we have $(A\times B)^y = \emptyset\in\Sigma$. To prove that $\mathcal{B}^y = \Sigma\times\Gamma$ we need to show that $\mathcal{B}^y$ is a $\sigma$-algebra, because then it must contain $\Sigma\times\Gamma$ (see Exercise 14-42) and must hence be equal to it. To see this, first note that $\emptyset = \emptyset\times\emptyset\in\mathcal{B}^y$. Now if $S\in\mathcal{B}^y$, then
\[ (S')^y = \pi_X[\{s\in S' : \pi_Y(s) = y\}] = \pi_X[\{s\notin S : \pi_Y(s) = y\}] = X\setminus\pi_X[\{s\in S : \pi_Y(s) = y\}] = (S^y)'\in\Sigma, \]
so $S'\in\mathcal{B}^y$. Finally, if $\{S_n\}_{n=1}^{\infty}$ is a sequence in $\mathcal{B}^y$, then
\[ \left(\bigcup_{n=1}^{\infty} S_n\right)^y = \pi_X\left[\left\{s\in\bigcup_{n=1}^{\infty} S_n : \pi_Y(s) = y\right\}\right] = \bigcup_{n=1}^{\infty}\pi_X[\{s\in S_n : \pi_Y(s) = y\}] = \bigcup_{n=1}^{\infty}(S_n)^y\in\Sigma, \]
so $\bigcup_{n=1}^{\infty} S_n\in\mathcal{B}^y$. Thus $\mathcal{B}^y$ is a $\sigma$-algebra that contains the rectangles with measurable sides and is contained in $\Sigma\times\Gamma$, which means that $\mathcal{B}^y = \Sigma\times\Gamma$. Hence, for all $y\in Y$ and all $S\in\Sigma\times\Gamma$ we have that $S^y\in\Sigma$. Similarly, we prove that for all $x\in X$ and all $S\in\Sigma\times\Gamma$ we have that $S_x\in\Gamma$.

Now let $f : X\times Y \to [-\infty,\infty]$ be a $\Sigma\times\Gamma$-measurable function.
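To prove that for every $y\in Y$, the function $f^y$ is $\Sigma$-measurable, let $y\in Y$ and $a\in\mathbb{R}$. Then
\[ (f^y)^{-1}[(a,\infty]] = \{x\in X : f(x,y) > a\} = \pi_X[\{z\in X\times Y : f(z) > a,\ \pi_Y(z) = y\}], \]
which is the $y$-section of the set $f^{-1}[(a,\infty]]\in\Sigma\times\Gamma$ and hence is in $\Sigma$. Similarly, we prove that for every $x\in X$ the function $f_x$ is $\Gamma$-measurable.

We conclude this section by showing that the constructions presented here give reasonable results for Lebesgue measure.

Proposition 14.60 Let $m, n, d\in\mathbb{N}$ be such that $m+n = d$ and let $\lambda_m$, $\lambda_n$ and $\lambda_d$ denote Lebesgue measure on $\mathbb{R}^m$, $\mathbb{R}^n$ and $\mathbb{R}^d$, respectively. Then $\Sigma_{\lambda_d} \supseteq \Sigma_{\lambda_m}\times\Sigma_{\lambda_n}$ and for all $A\in\Sigma_{\lambda_m}$ and $B\in\Sigma_{\lambda_n}$ we have $\lambda_d(A\times B) \le \lambda_m(A)\lambda_n(B)$.

Proof. By Exercise 14-41b, $\Sigma_{\lambda_m}\times\Sigma_{\lambda_n}$ is generated by the sets of the form $A\times B$, where each $A$ and $B$ is either an open box or a null set. The proof that products of open boxes $A\times B$ (which are of course open boxes themselves) are $\lambda_d$-measurable is similar to the proof of Proposition 9.13 (see Exercise 14-40).

Now let $A\in\Sigma_{\lambda_m}$, let $B\in\Sigma_{\lambda_n}$ and let $\varepsilon > 0$. Find sequences of open boxes $\{I_j\}_{j=1}^{\infty}$ and $\{K_l\}_{l=1}^{\infty}$ so that the containments $A \subseteq \bigcup_{j=1}^{\infty} I_j$, $B \subseteq \bigcup_{l=1}^{\infty} K_l$ and the inequality $\sum_{j=1}^{\infty}|I_j|\sum_{l=1}^{\infty}|K_l| < \lambda_m(A)\lambda_n(B) + \varepsilon$ hold. Then
\[ \lambda_d(A\times B) \le \sum_{j,l=1}^{\infty}|I_j\times K_l| = \sum_{j=1}^{\infty}|I_j|\sum_{l=1}^{\infty}|K_l| < \lambda_m(A)\lambda_n(B) + \varepsilon \]
and for all $A\in\Sigma_{\lambda_m}$ and $B\in\Sigma_{\lambda_n}$ we obtain $\lambda_d(A\times B) \le \lambda_m(A)\lambda_n(B)$. In particular, this means that for $A\in\Sigma_{\lambda_m}$ and $B\in\Sigma_{\lambda_n}$ such that one of $A$, $B$ is a null set we have that $\lambda_d(A\times B) = 0$, which means $A\times B$ is $\lambda_d$-measurable. Together with the $\lambda_d$-measurability of products of open boxes, this means $\Sigma_{\lambda_d} \supseteq \Sigma_{\lambda_m}\times\Sigma_{\lambda_n}$, because $\Sigma_{\lambda_m}\times\Sigma_{\lambda_n}$ is generated by products of open boxes or null sets in $\mathbb{R}^m$ with open boxes or null sets in $\mathbb{R}^n$.

Proposition 14.60 has two apparent shortcomings. First, the $\sigma$-algebras are not shown to be equal to each other. Exercise 14-47 shows that the containment is indeed strict, so this part of the result cannot be improved. Second, we did not show that $\lambda_d(A\times B)$ and $\lambda_m(A)\lambda_n(B)$ are equal, even though they should be equal.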
Recall that in Proposition 8.5 we needed the Heine-Borel Theorem to prove that the Lebesgue measure of an interval is its length. To prove that $\lambda_d(A\times B)$ and $\lambda_m(A)\lambda_n(B)$ are indeed equal, we need a higher dimensional version of the Heine-Borel Theorem. This version will be provided in Theorems 16.72 and 16.80. Thus we delay the proof that $\lambda_d(A\times B) = \lambda_m(A)\lambda_n(B)$ to Theorem 16.81. To keep notation simple, unless the distinction of different dimensions is necessary, we will denote Lebesgue measure in all dimensions by $\lambda$.

Exercises

14-39. Prove Proposition 14.51.

14-40. Prove that every $d$-dimensional open box is $\lambda_d$-measurable.

14-41. Generating the Lebesgue measurable sets. For $d\in\mathbb{N}$ let $\mathcal{G}_d$ be the set of all $d$-dimensional open boxes and all $d$-dimensional null sets.
(a) Prove that $\Sigma_{\lambda_d}$, the $\sigma$-algebra of Lebesgue measurable sets in $\mathbb{R}^d$, is generated by $\mathcal{G}_d$. Hint. $\mathcal{G}_d \subseteq \Sigma_{\lambda_d}$ by Exercise 14-40. Prove that every $S\in\Sigma_{\lambda_d}$ with $\lambda(S) < \infty$ differs by a null set from a countable intersection of countable unions of open boxes. Then use $\sigma$-finiteness.
(b) Prove that for $m, n\in\mathbb{N}$ the $\sigma$-algebra generated by $\{A\times B : A\in\mathcal{G}_m, B\in\mathcal{G}_n\}$ is $\Sigma_{\lambda_m}\times\Sigma_{\lambda_n}$.
(c) Prove that if $m+n = d$ and $A\in\Sigma_{\lambda_d}$, then there is a $B\in\Sigma_{\lambda_m}\times\Sigma_{\lambda_n}$ with $\lambda_d(A\setminus B\cup B\setminus A) = 0$.
(d) Prove that if $m+n = d$ and the function $f : \mathbb{R}^d \to [-\infty,\infty]$ is $\Sigma_{\lambda_d}$-measurable, then there is a $\Sigma_{\lambda_m}\times\Sigma_{\lambda_n}$-measurable function $g$ with $f = g$ a.e.

14-42. Let $M$ be a set and let $\mathcal{U}$ be a set of subsets of $M$. Prove that the $\sigma$-algebra generated by $\mathcal{U}$ is contained in all $\sigma$-algebras that contain $\mathcal{U}$.

14-43. On the equality of measures.
(a) Prove Proposition 14.54.
(b) Prove that the $\sigma$-algebra generated by all the singleton subsets of $\mathbb{R}$ is the set $\Sigma := \{C \subseteq \mathbb{R} : C \text{ or } \mathbb{R}\setminus C \text{ is countable}\}$.
(c) Construct two finite measures on $\Sigma$ that agree on all singletons, but which are not equal. Hint. Set $\mu(\{k\}) := \frac{1}{2^k}$ for $k\in\mathbb{N}$ and set it equal to zero for all other singletons.
(d) Construct a measurable space $(X,\Sigma)$ and two finite measures $\mu$ and $\nu$ on $\Sigma$ so that the equality $\mu(X) = \nu(X)$ holds, but $\{S\in\Sigma : \mu(S) = \nu(S)\}$ is not a $\sigma$-algebra. Hint. There is a finite example.

14-44. Prove that if $\{\mathcal{D}_i\}_{i\in I}$ is a family of Dynkin systems on the set $M$, then $\bigcap_{i\in I}\mathcal{D}_i$ is a Dynkin system.

14-45. Prove Theorem 14.57.

14-46. Prove that if $\mu$ is a measure defined on the $\sigma$-algebra $\mathcal{B}$ generated by the finite open intervals such that $\mu((a,b)) = b-a$ for all finite open intervals, then $\mu$ must be equal to the restriction of Lebesgue measure to $\mathcal{B}$.

14-47. The containment of the $\sigma$-algebras in Proposition 14.60 is strict. To see this, prove that for $m < d$ every subset of $\mathbb{R}^m$ (considered as the subset $\mathbb{R}^m\times\{0\}^{d-m}$ of $\mathbb{R}^d$) can be a section of a Lebesgue measurable set in $\mathbb{R}^d$.

14.8 Product Measures and Fubini's Theorem

We are now ready to define a measure on the product $\sigma$-algebra. Because Theorems 14.56 and 14.57 are needed to construct an unambiguous product measure in Theorem 14.62, we introduce the notion of $\sigma$-finiteness.

Definition 14.61 A measure space $(M,\Sigma,\mu)$ is called $\sigma$-finite iff there is a sequence $\{A_j\}_{j=1}^{\infty}$ of subsets $A_j \subseteq M$ of finite measure so that $M = \bigcup_{j=1}^{\infty} A_j$. We will also sometimes call the measure $\sigma$-finite.

Clearly, Lebesgue measure is $\sigma$-finite, so we will be able to use the results from this section for $d$-dimensional integration.

Theorem 14.62 Let $(X,\Sigma,\mu)$ and $(Y,\Gamma,\nu)$ be $\sigma$-finite measure spaces. Then there is a unique measure $\mu\times\nu$ on $\Sigma\times\Gamma$ so that with the convention $0\cdot\infty = 0$ for all $A\in\Sigma$ and $B\in\Gamma$ we have $(\mu\times\nu)(A\times B) = \mu(A)\nu(B)$.

Proof. The natural idea for the product measure of a set $S\in\Sigma\times\Gamma$ is to integrate the $\nu$-measures of the sections $S_x$ over $X$. To do this we first need to prove that for all $S\in\Sigma\times\Gamma$ the function $x\mapsto\nu(S_x)$ is $\Sigma$-measurable. Let $\mathcal{H}$ be the set of sets $S\in\Sigma\times\Gamma$ so that the function $x\mapsto\nu(S_x)$ is $\Sigma$-measurable. We need to prove that $\mathcal{H} = \Sigma\times\Gamma$.
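For all $A\in\Sigma$ and $B\in\Gamma$ we have that $\nu((A\times B)_x)$ equals $\nu(B)$ when $x\in A$ and 0 when $x\notin A$. Therefore, the function $x\mapsto\nu((A\times B)_x) = \nu(B)1_A(x)$ is $\Sigma$-measurable for all $A\times B\in\Sigma\times\Gamma$. To prove $\mathcal{H}$ is a $\sigma$-algebra, first note that the rectangles with measurable sides form a $\pi$-system (see Exercise 7-7) and $X\times Y\in\mathcal{H}$. To be able to apply Theorem 14.56, we will now prove $\mathcal{H}$ is a Dynkin system when $\nu$ is finite. If $S, F\in\mathcal{H}$ and $S \subseteq F$, then for $F\setminus S$ we have $\nu((F\setminus S)_x) = \nu(F_x\setminus S_x) = \nu(F_x) - \nu(S_x)$, which implies that $F\setminus S\in\mathcal{H}$. Now let $\{A_n\}_{n=1}^{\infty}$ be a sequence in $\mathcal{H}$ with $A_n \subseteq A_{n+1}$ for all $n\in\mathbb{N}$. Then
\[ \nu\left(\left(\bigcup_{n=1}^{\infty} A_n\right)_x\right) = \nu\left(\bigcup_{n=1}^{\infty}(A_n)_x\right) = \lim_{n\to\infty}\nu((A_n)_x) \]
by Theorem 14.15, so by Exercise 14-28a $\mathcal{H}$ is closed under the formation of increasing unions and thus $\mathcal{H}$ is a Dynkin system. Therefore, by Theorem 14.56 $\mathcal{H} = \Sigma\times\Gamma$. Hence, if $\nu$ is finite, for all $S\in\Sigma\times\Gamma$ the function $x\mapsto\nu(S_x)$ is $\Sigma$-measurable.

To prove the result for $\sigma$-finite measure spaces $(Y,\Gamma,\nu)$, let $\{F_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint sets in $\Gamma$ with $\nu(F_n) < \infty$ so that $\bigcup_{n=1}^{\infty} F_n = Y$. Define $\nu_n(S) := \nu(F_n\cap S)$. Then each $\nu_n$ is a finite measure and so each function $x\mapsto\nu_n(S_x)$ is $\Sigma$-measurable. But then $\nu(S_x) = \sum_{n=1}^{\infty}\nu_n(S_x)$ is also $\Sigma$-measurable (Exercise 14-28a), which proves that the functions $x\mapsto\nu(S_x)$ are $\Sigma$-measurable.

Now we prove that $(\mu\times\nu)(S) := \int_X \nu(S_x)\,d\mu$ defines a measure on $\Sigma\times\Gamma$. Clearly, $(\mu\times\nu)(\emptyset) = \int_X \nu(\emptyset_x)\,d\mu = \int_X 0\,d\mu = 0$. Now let $\{S_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint sets in $\Sigma\times\Gamma$. Then for all $x\in X$ we have that $\{(S_n)_x\}_{n=1}^{\infty}$ is a sequence of pairwise disjoint sets in $\Gamma$. Therefore, by the Monotone Convergence Theorem,
\[ (\mu\times\nu)\left(\bigcup_{n=1}^{\infty} S_n\right) = \int_X \nu\left(\bigcup_{n=1}^{\infty}(S_n)_x\right)d\mu = \int_X \sum_{n=1}^{\infty}\nu((S_n)_x)\,d\mu = \sum_{n=1}^{\infty}\int_X \nu((S_n)_x)\,d\mu = \sum_{n=1}^{\infty}(\mu\times\nu)(S_n). \]
Thus $\mu\times\nu$ defines a measure on $\Sigma\times\Gamma$. Moreover, for all $A\in\Sigma$ and $B\in\Gamma$ we have $(\mu\times\nu)(A\times B) = \int_X \nu((A\times B)_x)\,d\mu = \int_X \nu(B)1_A\,d\mu = \mu(A)\nu(B)$, where the overall result is zero if one of $A$ or $B$ has measure zero. Because $\mu$ and $\nu$ are both $\sigma$-finite, any measure with the above property must be $\sigma$-finite.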
Thus by Theorem 14.57 $\mu\times\nu$ is the unique measure on $\Sigma\times\Gamma$ so that $(\mu\times\nu)(A\times B) = \mu(A)\nu(B)$ for all $A\in\Sigma$ and $B\in\Gamma$.

Definition 14.63 The measure $\mu\times\nu$ in Theorem 14.62 is called the product measure of $\mu$ and $\nu$.

Theorem 14.62 not only proves the existence of the product measure, its proof also provides a representation of $\mu\times\nu$. Because the product measure is unique and we could have switched the roles of the two factors, there are two representations. These two representations are needed so that we can ultimately switch the order of integration.

Corollary 14.64 Let $(X,\Sigma,\mu)$, $(Y,\Gamma,\nu)$ be $\sigma$-finite measure spaces and let $S\in\Sigma\times\Gamma$. Then $a(x) := \nu(S_x)$ is $\Sigma$-measurable, $b(y) := \mu(S^y)$ is $\Gamma$-measurable and the equality
\[ (\mu\times\nu)(S) = \int_X \nu(S_x)\,d\mu = \int_Y \mu(S^y)\,d\nu \]
holds.

Proof. Similar to the proof of Theorem 14.62 we can prove that for all $S\in\Sigma\times\Gamma$ the function $y\mapsto\mu(S^y)$ is $\Gamma$-measurable and $(\mu\times\nu)(S) = \int_Y \mu(S^y)\,d\nu$.

We are now ready to prove that integrals with respect to product measures can be computed as iterated integrals.

Proposition 14.65 Let $(X,\Sigma,\mu)$ and $(Y,\Gamma,\nu)$ be $\sigma$-finite measure spaces and let the function $f : X\times Y \to [0,\infty]$ be $\Sigma\times\Gamma$-measurable. Then the function $x\mapsto\int_Y f_x\,d\nu$ is $\Sigma$-measurable and the function $y\mapsto\int_X f^y\,d\mu$ is $\Gamma$-measurable. Moreover,
\[ \int_{X\times Y} f\,d(\mu\times\nu) = \int_X\left(\int_Y f_x\,d\nu\right)d\mu = \int_Y\left(\int_X f^y\,d\mu\right)d\nu. \]

Proof. We first prove the result for $f = 1_S$ where $S\in\Sigma\times\Gamma$. By Proposition 14.59, for all $y\in Y$ the function $f^y = (1_S)^y = 1_{S^y}$ is $\Sigma$-measurable. Now, by Corollary 14.64,
\[ \int_{X\times Y} 1_S\,d(\mu\times\nu) = (\mu\times\nu)(S) = \int_Y \mu(S^y)\,d\nu = \int_Y\left(\int_X 1_{S^y}\,d\mu\right)d\nu = \int_Y\left(\int_X f^y\,d\mu\right)d\nu \]
and similarly we can prove that
\[ \int_{X\times Y} 1_S\,d(\mu\times\nu) = \int_X\left(\int_Y f_x\,d\nu\right)d\mu. \]
By Theorem 14.39 the equalities must hold for all simple $\Sigma\times\Gamma$-measurable functions and by the Monotone Convergence Theorem the equalities hold for all nonnegative $\Sigma\times\Gamma$-measurable functions.

Theorem 14.66 Fubini's Theorem. Let $(X,\Sigma,\mu)$ and $(Y,\Gamma,\nu)$ be $\sigma$-finite measure spaces and let $f : X\times Y \to [-\infty,\infty]$ be $\mu\times\nu$-integrable.
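Then for $\mu$-almost all $x\in X$ the function $f_x$ is $\nu$-integrable and for $\nu$-almost all $y\in Y$ the function $f^y$ is $\mu$-integrable. The function
\[ x\mapsto\begin{cases} \int_Y f_x\,d\nu & \text{if } f_x \text{ is } \nu\text{-integrable,} \\ 0 & \text{otherwise,} \end{cases} \]
is $\mu$-integrable, the correspondingly defined function of $y$ is $\nu$-integrable, and
\[ \int_{X\times Y} f\,d(\mu\times\nu) = \int_X\left(\int_Y f_x\,d\nu\right)d\mu = \int_Y\left(\int_X f^y\,d\mu\right)d\nu. \]

Proof. By Proposition 14.65 we have that the functions $x\mapsto\int_Y (f^+)_x\,d\nu$ and $x\mapsto\int_Y (f^-)_x\,d\nu$ are $\mu$-integrable, so $f_x$ is $\nu$-integrable for $\mu$-a.e. $x\in X$. Now
\[ \int_{X\times Y} f\,d(\mu\times\nu) = \int_{X\times Y} f^+\,d(\mu\times\nu) - \int_{X\times Y} f^-\,d(\mu\times\nu) = \int_X\left(\int_Y (f^+)_x\,d\nu\right)d\mu - \int_X\left(\int_Y (f^-)_x\,d\nu\right)d\mu = \int_X\left(\int_Y f_x\,d\nu\right)d\mu. \]
The other equation and the assertions about the $f^y$ are proved similarly.

Exercises

14-48. Construct a Lebesgue integrable function $f : [0,1]\times[0,1] \to [-\infty,\infty]$ so that not all $f_x$ and $f^y$ are Lebesgue integrable.

14-49. Let $(\mathbb{R},\Sigma_\lambda,\lambda)$ be the real numbers with Lebesgue measure, let $(\mathbb{R},\mathcal{P}(\mathbb{R}),\gamma_{\mathbb{R}})$ be the real numbers with counting measure and let $f := 1_{y=x}$ be the indicator function of the set $\{(x,y)\in\mathbb{R}^2 : y = x\}$. Prove that
\[ \int_{\mathbb{R}}\left(\int_{\mathbb{R}} f^y\,d\lambda\right)d\gamma_{\mathbb{R}} \ne \int_{\mathbb{R}}\left(\int_{\mathbb{R}} f_x\,d\gamma_{\mathbb{R}}\right)d\lambda. \]
Note. This example shows that the assumption of $\sigma$-finiteness of the measure spaces is needed throughout this section.

14-50. Let $(\mathbb{N},\mathcal{P}(\mathbb{N}),\gamma_{\mathbb{N}})$ be the natural numbers with counting measure.
(a) Prove that $\mathcal{P}(\mathbb{N})\times\mathcal{P}(\mathbb{N}) = \mathcal{P}(\mathbb{N}\times\mathbb{N})$, where the first product denotes the product $\sigma$-algebra of the two power sets.
(b) Prove that $\gamma_{\mathbb{N}}\times\gamma_{\mathbb{N}} = \gamma_{\mathbb{N}\times\mathbb{N}}$.
(c) Prove that if the double series $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a(i,j)$ converges absolutely, then we can exchange the order of summation, that is, $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a(i,j) = \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} a(i,j)$. Hint. Recall Example 14.37.
(d) Prove that if the double series $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a(i,j)$ converges absolutely, then for all bijective functions $\sigma : \mathbb{N}\times\mathbb{N} \to \mathbb{N}\times\mathbb{N}$ the sum $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a_{\sigma(i,j)}$ converges to $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} a(i,j)$. Hint. Dominated Convergence Theorem and Fubini.

14-51. Let $(M,\Sigma,\mu)$ be a $\sigma$-finite measure space, let $f : M \to [0,\infty]$ be $\Sigma$-measurable and let $(\mathbb{R},\Sigma_\lambda,\lambda)$ be the real numbers with Lebesgue measure.
(a) Prove that $E = \{(x,y)\in M\times\mathbb{R} : 0 \le y < f(x)\}$ is in $\Sigma\times\Sigma_\lambda$.
(b) Prove that $\int_M f\,d\mu = \int_0^{\infty}\mu(\{x\in M : f(x) > y\})\,dy$.

Chapter 15 The Abstract Venues for Analysis

It is now time to introduce the broad spectrum of settings in which analysis is done.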
Applications are usually set in multidimensional Euclidean space, so we need to find a way to do analysis in $\mathbb{R}^d$. Moreover, because the solutions for many modeling processes are functions, we need to consider spaces of functions. To make the theory applicable in many situations, we analyze the abstract commonalities of the structures in question, define spaces based on these commonalities and then prove theorems in these abstract spaces. The beauty and power of the abstraction are founded in the wide applicability of the results as well as in their relative simplicity. Regarding the wide applicability, recall, for example, that the initial abstraction in Chapter 14 was but a simple step from the corresponding notions for functions of one real variable. Yet it ultimately allowed us to integrate in arbitrary dimensions. Regarding the relative simplicity, note that the notions of convergence for functions presented in Chapter 11 and in Section 14.6 involve a significant amount of detail about functions, which may be quite complicated objects themselves. The generalizations of $d$-dimensional space presented in this chapter will allow us to visualize the functions and their convergence behavior as if they were points and sequences in low dimensional spaces. This simplification is significant. The subsequent chapters will provide more details on the metric, normed and inner product spaces presented in this chapter.

15.1 Abstraction I: Vector Spaces

Vector spaces describe the algebraic properties of multidimensional spaces and of spaces of functions. These properties guarantee that objects can be added to each other and multiplied with numbers.

Definition 15.1 A real vector space or a vector space is a triple $(X,+,\cdot)$ of a set $X$ and two binary operations, vector addition $+:X\times X\to X$ and scalar multiplication $\cdot:\mathbb{R}\times X\to X$, so that the following hold.
1. Vector addition is associative.
That is, for all $x,y,z\in X$ we have $(x+y)+z=x+(y+z)$.
2. Vector addition is commutative. That is, for all $x,y\in X$ we have $x+y=y+x$.
3. There is a neutral element $0\in X$ for addition. That is, there is an element $0\in X$ so that for all $x\in X$ we have $x+0=x$.
4. For every element $x\in X$, there is an additive inverse element. That is, for every $x\in X$ there is a $(-x)\in X$ so that $x+(-x)=0$.
5. Scalar multiplication is (left) distributive over vector addition. That is, for all $x,y\in X$ and $\alpha\in\mathbb{R}$ we have $\alpha\cdot(x+y)=\alpha\cdot x+\alpha\cdot y$.
6. Scalar multiplication is (right) distributive over scalar addition. That is, for all $x\in X$ and $\alpha,\beta\in\mathbb{R}$ we have $(\alpha+\beta)\cdot x=\alpha\cdot x+\beta\cdot x$.
7. Scalar multiplication is "associative." That is, for all $x\in X$ and $\alpha,\beta\in\mathbb{R}$ we have $\alpha\cdot(\beta\cdot x)=(\alpha\beta)\cdot x$.
8. The number $1\in\mathbb{R}$ is a neutral element for scalar multiplication. That is, for all $x\in X$ we have $1\cdot x=x$.

An element of a vector space is also called a vector and a real number is also called a scalar in this context. We will usually refer to the set $X$ as the vector space, implicitly assuming addition and multiplication are denoted as usual. As is customary for multiplications, the dot is usually omitted. The standard example and visualization for a vector space is $d$-dimensional space.

Example 15.2 Let $d\in\mathbb{N}$. The set $\mathbb{R}^d:=\{(x_1,\ldots,x_d) : x_i\in\mathbb{R}\}$ with componentwise addition and scalar multiplication is a vector space. □

To indicate how abstraction can simplify arguments, we will not directly prove the vector space properties for $d$-dimensional space. Instead, we note that $d$-dimensional space can be interpreted as the space of functions $f:\{1,\ldots,d\}\to\mathbb{R}$. The fact that $d$-dimensional space is a vector space then follows from Example 15.3 below. In a way, Example 15.3 is universal and it should be thoroughly understood. All examples of spaces in this text are subspaces of spaces as in Example 15.3.

Example 15.3 Let $D$ be a set.
The set $F(D,\mathbb{R})$ of all functions $f:D\to\mathbb{R}$ with addition defined pointwise by $(f+g)(x):=f(x)+g(x)$ and scalar multiplication defined pointwise by $(\alpha\cdot f)(x):=\alpha f(x)$ is a vector space. All properties follow from the appropriate pointwise properties for real numbers. For example, addition of functions is commutative, because for all $x\in D$ we have that $(f+g)(x)=f(x)+g(x)=g(x)+f(x)=(g+f)(x)$. The neutral element with respect to addition is the function that is equal to $0\in\mathbb{R}$ for all $x\in D$. □

By identifying each vector $(x_1,\ldots,x_d)\in\mathbb{R}^d$ with the function that maps each index $i\in\{1,\ldots,d\}$ to $x_i$, we see that $\mathbb{R}^d$ is the vector space $F(\{1,\ldots,d\},\mathbb{R})$. We conclude this section by introducing several important subspaces of spaces $F(D,\mathbb{R})$.

Definition 15.4 A subset $S$ of a vector space $X$ is called a vector subspace or subspace of $X$ iff it is a vector space when equipped with the restrictions of addition and scalar multiplication to $S$.

The advantage of working with subspaces of a somewhat universal vector space like $F(D,\mathbb{R})$ is that subspaces are easily proved to be vector spaces.

Lemma 15.5 Let $X$ be a vector space. Then the neutral element $0\in X$ with respect to addition is unique. Moreover, for the scalar $0\in\mathbb{R}$ and all vectors $x\in X$ we have that $0\cdot x=0$.

Proof. Suppose $0,0'\in X$ are both neutral elements with respect to addition. Then $0=0+0'=0'+0=0'$. Hence, the neutral element with respect to addition is unique. Now let $x\in X$. Then for all $y\in X$ we have
$$y+0x=y+x+0x-x=y+(1+0)x-x=y+x-x=y.$$
Therefore $0x$ is neutral with respect to addition, and hence it is equal to $0\in X$. ∎

Proposition 15.6 A nonempty subset $S$ of the vector space $X$ is a subspace iff for all $x,y\in S$ we have $x+y\in S$ and for all $\alpha\in\mathbb{R}$ and $x\in S$ we have $\alpha\cdot x\in S$.

Proof. For "$\Rightarrow$," note that if $S$ is a subspace, then the results of operations with elements of $S$ must again be in $S$.
Conversely, for "$\Leftarrow$" note that if the sums and scalar multiples of elements of $S$ are again in $S$, then the properties of the restrictions of the operations to $S$ follow from the corresponding properties for the overall operations on $X$. The existence of a neutral element for addition follows from the fact that if $x\in S$, then $0=0x\in S$. Similarly, additive inverses stay in $S$ because $-x=(-1)x\in S$ (see Exercise 15-1). ∎

Example 15.7 The set $l^\infty:=\big\{\{x_i\}_{i=1}^\infty : x_i\in\mathbb{R},\ \sup\{|x_i| : i\in\mathbb{N}\}<\infty\big\}$ with addition and scalar multiplication defined termwise is a vector space.

First, note that $l^\infty$ is a subset of $F(\mathbb{N},\mathbb{R})$. Now if $f,g\in l^\infty$ and $\{|f(n)| : n\in\mathbb{N}\}$ is bounded by $B_f$ and $\{|g(n)| : n\in\mathbb{N}\}$ is bounded by $B_g$, then for all $n\in\mathbb{N}$ we have $|(f+g)(n)|=|f(n)+g(n)|\le|f(n)|+|g(n)|\le B_f+B_g<\infty$, so $f+g\in l^\infty$. Similarly, if $\alpha\in\mathbb{R}$, then for all $n\in\mathbb{N}$ we have $|(\alpha f)(n)|=|\alpha||f(n)|\le|\alpha|B_f<\infty$. Thus $l^\infty$ is a subspace of $F(\mathbb{N},\mathbb{R})$. □

Example 15.8 The set $C^0([a,b],\mathbb{R})$ of continuous functions $f:[a,b]\to\mathbb{R}$ with pointwise addition and scalar multiplication is a vector space.

Clearly, $C^0([a,b],\mathbb{R})$ is a subset of $F([a,b],\mathbb{R})$. Moreover, part 1 of Theorem 3.27 implies that sums of continuous functions are continuous. Part 3 of Theorem 3.27, applied with one function being constant, implies that scalar multiples of continuous functions are continuous. □

Notation 15.9 To simplify notation, we write $C^0[a,b]$ instead of $C^0([a,b],\mathbb{R})$.

Example 15.8 shows how switching to more abstract situations is often just a change in point-of-view. We no longer consider individual objects, but instead we consider classes of objects. Results about individual objects are then often summarized in statements about the class. For example, the fact that sums and constant multiples of continuous functions are continuous is summarized in the statement that the continuous functions form a vector space.
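The pointwise operations of Example 15.3 and the closure test of Proposition 15.6 can be tried out on a small finite set $D$. The following Python sketch is only an illustration under stated assumptions: representing a function $f:D\to\mathbb{R}$ as a dictionary and the helper names `add`, `scale`, `zero` and `sup_abs` are choices made here, not notation from the text.

```python
# Illustrative sketch: F(D, R) for a finite set D, with functions stored as dicts.
# The representation and all helper names are assumptions made for this example.

def add(f, g):
    """Pointwise sum (f + g)(x) := f(x) + g(x) on the common domain D."""
    assert f.keys() == g.keys(), "both functions must be defined on the same set D"
    return {x: f[x] + g[x] for x in f}

def scale(alpha, f):
    """Pointwise scalar multiple (alpha . f)(x) := alpha * f(x)."""
    return {x: alpha * f[x] for x in f}

def zero(D):
    """Neutral element for addition: the function that is 0 at every x in D."""
    return {x: 0 for x in D}

def sup_abs(f):
    """sup{|f(x)| : x in D}; a maximum here because D is finite."""
    return max(abs(v) for v in f.values())

# With D = {1, 2, 3} the functions below are exactly vectors in R^3.
D = (1, 2, 3)
f = {1: 2, 2: -1, 3: 5}
g = {1: 0, 2: 4, 3: -7}

commutes = add(f, g) == add(g, f)
neutral = add(f, zero(D)) == f
distributes = scale(3, add(f, g)) == add(scale(3, f), scale(3, g))
# The bound used in Example 15.7: |(f + g)(n)| <= B_f + B_g.
bound_ok = sup_abs(add(f, g)) <= sup_abs(f) + sup_abs(g)
```

Each check only spot-tests one axiom on one pair of functions; the actual proofs are the pointwise arguments given in the examples above.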
Example 15.10 For $a<b$ and $k\in\mathbb{N}$, let $C^k(a,b)$ be the set of $k$ times continuously differentiable functions on $(a,b)$. Then $C^k(a,b)$ with pointwise addition and scalar multiplication is a vector space. Moreover, $C^\infty(a,b):=\bigcap_{k=1}^\infty C^k(a,b)$, the space of infinitely differentiable functions on $(a,b)$, with pointwise addition and scalar multiplication is a vector space.

This is an induction using Theorem 4.6. □

Spaces of "$p$-integrable functions" are of central importance in analysis.

Definition 15.11 Let $(M,\Sigma,\mu)$ be a measure space and let $1\le p<\infty$. We define
$$\mathcal{L}^p(M,\Sigma,\mu):=\left\{f\in F(M,\mathbb{R}) : \int_M |f|^p\,d\mu<\infty\right\}.$$

Example 15.12 Let $(M,\Sigma,\mu)$ be a measure space and let $1\le p<\infty$. Then $\mathcal{L}^p(M,\Sigma,\mu)$ is a vector space.

Because $\mathcal{L}^p(M,\Sigma,\mu)\subseteq F(M,\mathbb{R})$ we can apply Proposition 15.6. Let $f,g\in\mathcal{L}^p(M,\Sigma,\mu)$. Then
$$\int_M |f+g|^p\,d\mu\le\int_M \big(2\max\{|f|,|g|\}\big)^p\,d\mu\le\int_M 2^p|f|^p+2^p|g|^p\,d\mu<\infty,$$
so $f+g\in\mathcal{L}^p(M,\Sigma,\mu)$. Moreover, if $\alpha\in\mathbb{R}$, then
$$\int_M |\alpha f|^p\,d\mu=|\alpha|^p\int_M |f|^p\,d\mu<\infty,$$
so $\alpha\cdot f\in\mathcal{L}^p(M,\Sigma,\mu)$. Hence, $\mathcal{L}^p(M,\Sigma,\mu)$ is a vector space. □

For concrete examples, the notation for $\mathcal{L}^p$ with an abstract measure space becomes a bit cumbersome.

Definition 15.13 When we work with real valued functions on a Lebesgue measurable subset $D$ of $\mathbb{R}^d$, we will assume that $M=D$, $\Sigma=\Sigma_\lambda$, the $\sigma$-algebra of Lebesgue measurable subsets of $D$, and $\mu=\lambda$, the Lebesgue measure. In this case, we denote $\mathcal{L}^p(D):=\mathcal{L}^p(M,\Sigma,\mu)$. Also, if $D$ is an interval, we may leave out the outer parentheses. That is, we may write $\mathcal{L}^p[a,b]$ for $\mathcal{L}^p([a,b])$, and so on.

Aside from the capital $\mathcal{L}^p$ spaces, the lowercase $l^p$-spaces of Definition 15.14 below are sometimes used in analysis. In this chapter, we will frequently translate results between "little $l^p$" and "big $\mathcal{L}^p$." This will not be a problem, because the computations are very similar. The only adjustment usually is that sums are interchanged with integrals and vice versa.
As a first demonstration of the similarities between "little $l^p$" and "big $\mathcal{L}^p$" we leave the proof that $l^p$ is a vector space to Exercise 15-2a. The two ways in which the analogy between $l^p$ and $\mathcal{L}^p$ can be used will be presented in Sections 15.4 and 15.6, respectively. In Section 15.4, we will prove results for $l^2$ and then show how the proof is analogous for $\mathcal{L}^2$. In Section 15.6, we will prove results for $\mathcal{L}^p$ and show how results for $l^p$ can be obtained as corollaries.

Exercises

15-1. Prove that if $X$ is a vector space, then for all $x\in X$ we have $(-1)x=-x$.

15-2. Examples of vector spaces.
(a) Prove that for $1\le p<\infty$ the space $l^p$ is a vector space.
(b) Let $(M,\Sigma)$ be a measurable space. Prove that the set $\mathcal{M}(M,\Sigma)$ of measurable real valued functions is a vector space.
(c) Let $a<b$ and let $BV[a,b]$ denote the set of all functions of bounded variation on $[a,b]$ (see Exercise 8-12). Prove that $BV[a,b]$ with pointwise addition and scalar multiplication is a vector space.

15-3. Containments of $\mathcal{L}^p$-spaces.
(a) Prove that $l^1\subset l^2\subset l^\infty$. Then give concrete examples to show the containment is proper. Hint. Harmonic series.
(b) Prove that if $1\le p<q$, then $l^p\subseteq l^q$. Give an example to show the containment is proper.
(c) Let $(M,\Sigma,\mu)$ be a measure space with $\mu(M)<\infty$. Prove that if $1\le p<q$, then $\mathcal{L}^p(M,\Sigma,\mu)\supseteq\mathcal{L}^q(M,\Sigma,\mu)$. Give an example that shows that the containment is proper. Hint. Exercise 9-32.
(d) Give an example of a measure space $(M,\Sigma,\mu)$ such that for all $1\le p<q$ we have that $\mathcal{L}^p(M,\Sigma,\mu)\not\subseteq\mathcal{L}^q(M,\Sigma,\mu)$ and $\mathcal{L}^p(M,\Sigma,\mu)\not\supseteq\mathcal{L}^q(M,\Sigma,\mu)$. Hint. By part 15-3b the measure space cannot be discrete and by part 15-3c it cannot be finite.

15.2 Representation of Elements: Bases and Dimension

It is often useful to represent vectors in terms of certain standard vectors. The most familiar example is the standard coordinate system in $d$-dimensional space.

Definition 15.15 Let $X$ be a vector space.
A subset $S\subseteq X\setminus\{0\}$ is called linearly independent iff for all finite subsets $\{x_1,\ldots,x_n\}\subseteq S$ and all sets of scalars $\{\alpha_1,\ldots,\alpha_n\}\subseteq\mathbb{R}$ the equation $\sum_{i=1}^n \alpha_i x_i=0$ implies $\alpha_1=\alpha_2=\cdots=\alpha_n=0$. A sum $\sum_{i=1}^n \alpha_i x_i$ with $\alpha_i\in\mathbb{R}$ and $x_i\in X$ is also called a linear combination of $x_1,\ldots,x_n$.

Definition 15.16 A linearly independent set $B\subseteq X$ such that for every $x\in X$ there are a finite subset $\{b_1,\ldots,b_n\}\subseteq B$ and a set of scalars $\{\alpha_1,\ldots,\alpha_n\}\subseteq\mathbb{R}$ so that $x=\sum_{i=1}^n \alpha_i b_i$ is called a base of the vector space $X$.

Obviously, linear independence and bases are finitary notions. They are usually investigated in linear algebra. For our purposes, they are important to establish the difference between finite dimensional and infinite dimensional spaces.

Example 15.17 In $\mathbb{R}^d$, let $e_i$ denote the vector such that the $i$th component is 1 and all other components are zero. Then $\{e_1,\ldots,e_d\}$ is a base of $\mathbb{R}^d$.

To prove linear independence, for each $i=1,\ldots,d$ let $e_i^{(j)}$ denote the $j$th component of $e_i$. Then for any $\alpha_1,\ldots,\alpha_d$ the vector equation $\sum_{i=1}^d \alpha_i e_i=0$ leads with $j=1,\ldots,d$ to the scalar equations $\sum_{i=1}^d \alpha_i e_i^{(j)}=0$, which for each $j$ simply state that $\alpha_j=0$ as was to be proved.

Regarding the representation of elements, for each $x=(x_1,\ldots,x_d)\in\mathbb{R}^d$ we have that $x=(x_1,\ldots,x_d)=\sum_{i=1}^d x_i e_i$. □

The nice thing about bases is that the representation of elements is unique.

Proposition 15.18 Let $X$ be a vector space and let $B\subseteq X$ be a base. Then for each $x\in X$ the $b_1,\ldots,b_n\in B$ and $\alpha_1,\ldots,\alpha_n\in\mathbb{R}$ in Definition 15.16 are unique, except that any vector of $B\setminus\{b_1,\ldots,b_n\}$ can be added to the $b_i$ with a coefficient zero and that any $b_i$ with $\alpha_i=0$ can be omitted.

Proof. Suppose for a contradiction that there is an $x\in X$ with two different representations. Let $\{b_1,\ldots,b_n\}\subseteq B$ be the set of base vectors that occur in either representation.
Then there are two distinct $n$-tuples $(\alpha_1,\ldots,\alpha_n)\neq(\beta_1,\ldots,\beta_n)$ in $\mathbb{R}^n$ so that $\sum_{i=1}^n \alpha_i b_i=x=\sum_{i=1}^n \beta_i b_i$. But then $\sum_{i=1}^n(\alpha_i-\beta_i)b_i=0$, and hence by linear independence of $B$, for all $i\in\{1,\ldots,n\}$ we conclude $\alpha_i-\beta_i=0$, that is, $\alpha_i=\beta_i$, a contradiction. ∎

Every vector space is either finite dimensional, that is, it has a finite base, or not. Finite dimensional spaces are investigated in linear algebra and in analysis. Infinite dimensional spaces are mostly investigated in (functional) analysis.

Proposition 15.19 Let $X$ be a vector space with a finite base $F$. Then every linearly independent subset $L$ of $X$ has at most as many elements as $F$. Moreover, all bases of $X$ have as many elements as $F$.

Proof. Let $F=\{f_1,\ldots,f_n\}$ be a finite base of $X$. Suppose for a contradiction that there is a linearly independent set $L\subseteq X$ that has more elements than $F$. Without loss of generality we can assume that $L$ is such that $|L\cap F|$ is maximal. That is, if $\tilde{L}$ is another linearly independent subset of $X$ with more elements than $F$, then $|\tilde{L}\cap F|\le|L\cap F|$. Now let $b\in L\setminus F$ and consider the linearly independent sets $L\setminus\{b\}$ and $F$. By Exercise 15-4a, there is a subset $H\subseteq F\setminus L$ so that $C:=(L\setminus\{b\})\cup H$ is a base of $X$. By Exercise 15-4b, $L\setminus\{b\}$ is not a base of $X$, so $H\neq\emptyset$. Hence, $C$ has at least as many elements as $L$ and in particular it has more elements than $F$. Because $b\notin F$, we obtain $\big|[(L\setminus\{b\})\cup H]\cap F\big|=|(L\cap F)\cup H|>|L\cap F|$, a contradiction. Thus no linearly independent subset of $X$ has more elements than $F$.

In particular, if $B$ is another base of $X$, then $|B|\le|F|$. Therefore $B$ is finite and with the same argument as above, if $L\subseteq X$ is linearly independent, then $|L|\le|B|$. Because $F$ is linearly independent, we obtain $|F|\le|B|$, and hence $|F|=|B|$. ∎
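For $X=\mathbb{R}^d$, linear independence in the sense of Definition 15.15 and the bound of Proposition 15.19 can be checked mechanically. The following Python sketch is one possible illustration, not a method used in the text: it assumes vectors are given as tuples of integers or fractions and decides independence by Gaussian elimination over exact fractions. The function names `rank` and `is_linearly_independent` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors in R^d via Gaussian elimination with exact fractions."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """A finite list of vectors is linearly independent iff its rank equals its length."""
    return rank(vectors) == len(vectors)

standard_base = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
other_base = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
# Four vectors in R^3: by Proposition 15.19 they cannot be linearly independent.
too_many = other_base + [(2, -3, 5)]
```

Here both `standard_base` and `other_base` pass the independence test, while `too_many` fails it, in line with the statement that no linearly independent subset of $\mathbb{R}^3$ has more than three elements.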
The fact that if one base is finite then all bases have the same size allows us to assign one number as the dimension.

Definition 15.20 A vector space $X$ is called finite dimensional iff it has a finite base. In this case the dimension of $X$ is the number of vectors in a finite base. If $X$ has no finite base it is called infinite dimensional.

Proposition 15.21 A vector space $X\neq\{0\}$ is infinite dimensional iff it contains an infinite linearly independent set.

Proof. For "$\Rightarrow$," note that no finite linearly independent subset of $X$ is a base. Construct a sequence of finite linearly independent subsets $F_n$ of $X$ as follows. Let $F_1:=\{f_1\}$ with $f_1\in X\setminus\{0\}$ arbitrary. Once $F_n=\{f_1,\ldots,f_n\}$ is chosen, note that because $F_n$ is not a base of $X$, there is an $f_{n+1}\in X$ so that there are no $\alpha_1,\ldots,\alpha_n\in\mathbb{R}$ so that $\sum_{j=1}^n \alpha_j f_j=f_{n+1}$. This means that $F_{n+1}:=F_n\cup\{f_{n+1}\}$ is linearly independent (Exercise 15-5). But then the infinite set $\bigcup_{n=1}^\infty F_n$ is linearly independent in $X$.

For "$\Leftarrow$" let $A\subseteq X$ be an infinite linearly independent subset. Then by Proposition 15.19 $X$ cannot have a finite base. ∎

Our first example of an infinite dimensional space is easy to come by.

Example 15.22 In $l^1$ let $e_i$ be the sequence with $j$th entry
$$e_i^{(j)}=\begin{cases}0; & \text{for } j\neq i,\\ 1; & \text{for } j=i.\end{cases}$$
Then $\{e_i : i\in\mathbb{N}\}$ is linearly independent, and hence $l^1$ is infinite dimensional.

The reader will verify this in Exercise 15-7. □

Example 15.22 shows a shortcoming of the definition of a base that is ultimately addressed analytically in Banach and Hilbert Spaces. The set of "unit vectors" in Example 15.22 has everything we want in a base, except that we would need to sum infinitely many vectors for the representations. To define "infinite sums" we need a notion of convergence, which will be introduced as soon as we investigate metric spaces.
For now, we continue in our exploration of the spaces of abstract analysis and we conclude this section with a notation we will use in $l^p$ spaces from here on.

Convention 15.23 When working with sequences of objects in spaces whose elements are finite or infinite sequences themselves, we will move the sequence index for the elements into the exponent in parentheses. That is, $a_n^{(j)}$ will denote the $j$th element in the sequence $a_n$, which is itself an element of the sequence $\{a_n\}_{n=1}^\infty$. The vector $e_i$ with $j$th entry
$$e_i^{(j)}=\begin{cases}0; & \text{for } j\neq i,\\ 1; & \text{for } j=i,\end{cases}$$
will be called the $i$th standard unit vector. For all $1\le p\le\infty$, the standard unit vectors are contained in $l^p$.

Exercises

15-4. The span of linearly independent sets. Let $X$ be a vector space and let $W\subseteq X$. Define the span of $W$ to be $\operatorname{span}(W):=\left\{\sum_{i=1}^n \alpha_i w_i : \alpha_i\in\mathbb{R},\ w_i\in W\right\}$.
(a) Let $B$ and $F$ be linearly independent sets and let $F$ be finite. Prove that if $\operatorname{span}(B\cup F)=X$, then there is a subset $H$ of $F$ so that $B\cup H$ is a base of $X$. Hint. Induction on the size of $F$. In the induction step, let $f\in F$ and distinguish the cases $f\in\operatorname{span}(B)$ and $f\notin\operatorname{span}(B)$.
(b) Let $B$ be a linearly independent set and let $b\in B$. Prove that $\operatorname{span}(B)\neq\operatorname{span}(B\setminus\{b\})$.

15-5. Let $F=\{f_1,\ldots,f_n\}$ be linearly independent in the vector space $X$. Prove that if $f_{n+1}\in X$ is so that there are no $\alpha_1,\ldots,\alpha_n\in\mathbb{R}$ so that $\sum_{j=1}^n \alpha_j f_j=f_{n+1}$, then $F\cup\{f_{n+1}\}$ is linearly independent.

15-6. Base Exchange Theorem. Prove that if $X$ is a vector space and $\{v_1,\ldots,v_n\}$ and $\{w_1,\ldots,w_n\}$ are bases of $X$, then there is a $j\in\{1,\ldots,n\}$ so that $\{v_1,\ldots,v_{n-1},w_j\}$ is a base of $X$.

15-7. Prove the claim in Example 15.22.

15-8. Prove that for $1<p\le\infty$ the space $l^p$ is infinite dimensional.

15-9. Prove that for $1\le p<\infty$ the space $\mathcal{L}^p[0,1]$ is infinite dimensional.

15-10. Prove that the space $C^0[0,1]$ is infinite dimensional.

15.3 Identification of Spaces: Isomorphism

Isomorphisms are structure preserving maps.
Basically, isomorphic structures are for all intents and purposes "the same," as long as we only care about the operations and concepts that are preserved by the isomorphism.

Definition 15.24 Let $X,Y$ be vector spaces. Then the function $\Phi:X\to Y$ is called an isomorphism iff $\Phi$ is bijective and for all $u,v\in X$ and $\alpha\in\mathbb{R}$ we have that $\Phi(u+v)=\Phi(u)+\Phi(v)$ and $\Phi(\alpha u)=\alpha\Phi(u)$.

Definition 15.24 and Exercise 15-11, which proves that the inverse of an isomorphism is again an isomorphism, show why isomorphic structures are considered to be "the same." Let $\Phi$ be an isomorphism from the vector space $X$ to the vector space $Y$. Obviously, we have a one-for-one matching between the elements of $X$ and $Y$. Moreover, as long as we only care about linear operations, it does not matter if we carry out the operations in $X$ or in $Y$, because we can always map back and forth between the spaces and get the corresponding results.

The next result shows that from the point-of-view of linear algebra, all finite dimensional spaces of the same dimension are "the same." Theorem 16.76 will show that they are also "the same" from an analytical point-of-view.

Proposition 15.25 Every $d$-dimensional vector space $X$ is isomorphic to $\mathbb{R}^d$.

Proof. Let $\{b_1,\ldots,b_d\}$ be a base of $X$. Then for each vector $x\in X$ there are unique numbers $x_1,\ldots,x_d\in\mathbb{R}$ so that $x=\sum_{i=1}^d x_i b_i$. Define the function $\Phi:X\to\mathbb{R}^d$ by $\Phi(x)=\Phi\left(\sum_{i=1}^d x_i b_i\right):=\sum_{i=1}^d x_i e_i$. Because the $x_i$ are unique, $\Phi$ is well-defined. Moreover, for all $\alpha\in\mathbb{R}$ and all $x,y\in X$ with $x=\sum_{i=1}^d x_i b_i$ and $y=\sum_{i=1}^d y_i b_i$ we have
$$\Phi(x+y)=\Phi\left(\sum_{i=1}^d x_i b_i+\sum_{i=1}^d y_i b_i\right)=\Phi\left(\sum_{i=1}^d (x_i+y_i)b_i\right)=\sum_{i=1}^d (x_i+y_i)e_i=\sum_{i=1}^d x_i e_i+\sum_{i=1}^d y_i e_i=\Phi(x)+\Phi(y)$$
and
$$\Phi(\alpha x)=\Phi\left(\sum_{i=1}^d \alpha x_i b_i\right)=\sum_{i=1}^d \alpha x_i e_i=\alpha\sum_{i=1}^d x_i e_i=\alpha\Phi(x),$$
which means that $\Phi$ has the linearity properties of an isomorphism. For surjectivity, note that if $y=\sum_{i=1}^d y_i e_i$ is in $\mathbb{R}^d$ then $\Phi\left(\sum_{i=1}^d y_i b_i\right)=\sum_{i=1}^d y_i e_i=y$.
Finally, to prove that $\Phi$ is injective, let $x=\sum_{i=1}^d x_i b_i$ and $z=\sum_{i=1}^d z_i b_i$ be in $X$ with $\Phi(x)=\Phi(z)$. Then $0=\Phi(x)-\Phi(z)=\Phi(x-z)=\Phi\left(\sum_{i=1}^d (x_i-z_i)b_i\right)=\sum_{i=1}^d (x_i-z_i)e_i$, which implies $x_i=z_i$ for all $i$, and hence $x=z$. ∎

Exercises

15-11. Let $X$ and $Y$ be vector spaces. Prove that if $\Phi:X\to Y$ is an isomorphism, then so is its inverse $\Phi^{-1}:Y\to X$.

15-12. Prove that for any $1\le p\le\infty$ the space $\mathbb{R}^d$ is isomorphic to the subspace of $l^p$ consisting of sequences $\{x_j\}_{j=1}^\infty$ so that for all $j>d$ we have $x_j=0$.

15-13. Prove that $\Phi:X\to Y$ is an isomorphism iff $\Phi$ is bijective and for all $\alpha\in\mathbb{R}$ and $x,y\in X$ we have that $\Phi(\alpha x+y)=\alpha\Phi(x)+\Phi(y)$.

15.4 Abstraction II: Inner Product Spaces

In three dimensional space there are two ways to multiply vectors. The vector (or cross) product, inspired by observations of magnetic forces on moving charges and also used in the definition of the torque, is native to three dimensional space. However, the scalar (or dot) product, which is inspired by the consideration of work done by a force and which can be used to measure angles, can be generalized to other vector spaces. The corresponding notion is called an inner product. Strictly speaking, Definition 15.26 below defines a real inner product space. For complex inner product spaces, consider Definition 15.81.

Definition 15.26 An inner product space is a pair $(X,\langle\cdot,\cdot\rangle)$ of a vector space $X$ and a function $\langle\cdot,\cdot\rangle:X\times X\to\mathbb{R}$, called the inner product or scalar product on $X$, with the following properties.
1. The inner product is positive definite. That is, for all $x\in X$ we have $\langle x,x\rangle\in[0,\infty)$ with $\langle x,x\rangle=0$ iff $x=0$.
2. The inner product is symmetric. That is, for all $x,y\in X$ we have $\langle x,y\rangle=\langle y,x\rangle$.
3. The inner product is linear in the first factor. That is, for all scalars $\alpha$ and all $x,y\in X$ we have $\langle\alpha x,y\rangle=\alpha\langle x,y\rangle$ and for all $x,y,z\in X$ we have $\langle x+y,z\rangle=\langle x,z\rangle+\langle y,z\rangle$.
We will usually call $X$ itself an inner product space. When we do so, we implicitly assume that there is an inner product on $X$, which will usually be denoted $\langle\cdot,\cdot\rangle$. Exercise 15-14 shows that the inner product is also linear in the second factor.

Rather than staying with $d$-dimensional space, in our first example we go directly to an infinite dimensional inner product space.

Example 15.27 The set $l^2:=\left\{\{x_i\}_{i=1}^\infty : x_i\in\mathbb{R},\ \sum_{i=1}^\infty x_i^2<\infty\right\}$ is a vector subspace of $F(\mathbb{N},\mathbb{R})$ and with $\left\langle\{x_i\}_{i=1}^\infty,\{y_i\}_{i=1}^\infty\right\rangle:=\sum_{i=1}^\infty x_i y_i$ it is an inner product space.

To prove that $l^2$ is a subspace of $F(\mathbb{N},\mathbb{R})$ first let $\{x_i\}_{i=1}^\infty\in l^2$ and $\alpha\in\mathbb{R}$. Then $\sum_{i=1}^\infty(\alpha x_i)^2=\alpha^2\sum_{i=1}^\infty x_i^2<\infty$, and hence $\{\alpha x_i\}_{i=1}^\infty\in l^2$. To prove that $l^2$ is closed under addition, let $\{x_i\}_{i=1}^\infty,\{y_i\}_{i=1}^\infty\in l^2$ and note that (see Exercise 15-15) for all $x,y\in\mathbb{R}$ the inequality $2|xy|\le x^2+y^2$ holds. Then
$$\sum_{i=1}^\infty (x_i+y_i)^2\le 2\sum_{i=1}^\infty x_i^2+2\sum_{i=1}^\infty y_i^2<\infty,$$
and hence $\{x_i\}_{i=1}^\infty+\{y_i\}_{i=1}^\infty=\{x_i+y_i\}_{i=1}^\infty\in l^2$. Moreover, the series in the proposed inner product satisfy $\sum_{i=1}^\infty |x_i y_i|\le\sum_{i=1}^\infty (x_i^2+y_i^2)<\infty$. Therefore for all sequences $\{x_i\}_{i=1}^\infty,\{y_i\}_{i=1}^\infty\in l^2$ the series $\left\langle\{x_i\}_{i=1}^\infty,\{y_i\}_{i=1}^\infty\right\rangle=\sum_{i=1}^\infty x_i y_i$ converges absolutely, and hence $\langle\cdot,\cdot\rangle$ is defined on all of $l^2\times l^2$. The fact that $\langle\cdot,\cdot\rangle$ defines an inner product now follows easily from standard results about series. (Exercise 15-16.) □

Subspaces of inner product spaces clearly are inner product spaces themselves. Example 15.28 shows how $d$-dimensional space can be obtained as a subspace of $l^2$, which means $d$-dimensional space can be equipped with an inner product.

Example 15.28 Let $d\in\mathbb{N}$. The set $\mathbb{R}^d:=\{(x_1,\ldots,x_d) : x_i\in\mathbb{R}\}$ with termwise addition and scalar multiplication and $\langle(x_1,\ldots,x_d),(y_1,\ldots,y_d)\rangle:=\sum_{i=1}^d x_i y_i$ is an inner product space.

For $x=(x_1,\ldots,x_d)\in\mathbb{R}^d$, define $\iota(x)$ to be the sequence $\{z_n\}_{n=1}^\infty$ defined termwise by
$$z_n:=\begin{cases}x_n; & \text{for } n\le d,\\ 0; & \text{otherwise}.\end{cases}$$
Then $\iota[\mathbb{R}^d]$ is a subspace of $l^2$, and hence it is an inner product space. Because $\iota:\mathbb{R}^d\to l^2$ is injective, preserves sums and scalar multiples and satisfies $\langle x,y\rangle=\langle\iota(x),\iota(y)\rangle$ for all $x,y\in\mathbb{R}^d$ we conclude that $\langle\cdot,\cdot\rangle$ on $\mathbb{R}^d$ must satisfy the properties of an inner product. Thus $\mathbb{R}^d$ is an inner product space. □

In fact, $\mathbb{R}^d$ is, as an inner product space, isomorphic to the subspace $\iota[\mathbb{R}^d]$ of $l^2$, where the isomorphisms between inner product spaces are defined as follows.

Definition 15.29 Let $(X,\langle\cdot,\cdot\rangle_X)$ and $(Y,\langle\cdot,\cdot\rangle_Y)$ be inner product spaces. Then the function $\Phi:X\to Y$ is called an isomorphism iff it is an isomorphism between the vector spaces $X$ and $Y$ and for all $u,v\in X$ we have $\langle u,v\rangle_X=\langle\Phi(u),\Phi(v)\rangle_Y$.

Many important results in analysis come from the fact that $\mathcal{L}^2$ is almost an inner product space. There is one problem, which the reader will explain in Exercise 15-18. Yet Proposition 15.30 shows that most properties hold just fine. The problem with $\mathcal{L}^2$, and indeed with $\mathcal{L}^p$ in general, will be resolved in Section 15.8.

Proposition 15.30 Let $(M,\Sigma,\mu)$ be a measure space. Consider the vector space $\mathcal{L}^2(M,\Sigma,\mu)$ of square integrable functions. Then $\langle f,g\rangle:=\int_M f(x)g(x)\,d\mu$ is a function from $\mathcal{L}^2(M,\Sigma,\mu)\times\mathcal{L}^2(M,\Sigma,\mu)$ to $\mathbb{R}$ such that
1. For all $f\in\mathcal{L}^2(M,\Sigma,\mu)$, we have $\langle f,f\rangle\in[0,\infty)$. Moreover, $\langle 0,0\rangle=0$.
2. For all $f,g\in\mathcal{L}^2(M,\Sigma,\mu)$, we have $\langle f,g\rangle=\langle g,f\rangle$.
3. For all $\alpha\in\mathbb{R}$ and all $f,g\in\mathcal{L}^2(M,\Sigma,\mu)$, we have $\langle\alpha f,g\rangle=\alpha\langle f,g\rangle$, and for all $f,g,h\in\mathcal{L}^2(M,\Sigma,\mu)$ we have $\langle f+g,h\rangle=\langle f,h\rangle+\langle g,h\rangle$.

Proof. The proof that $\langle f,g\rangle$ exists for all $f,g\in\mathcal{L}^2(M,\Sigma,\mu)$ is similar to the proof for $l^2$ and left to the reader as Exercise 15-17. For the properties of $\langle\cdot,\cdot\rangle$, let $f,g,h\in\mathcal{L}^2(M,\Sigma,\mu)$ and $\alpha\in\mathbb{R}$.
Then $\langle f,f\rangle=\int_M f^2\,d\mu\in[0,\infty)$ and $\langle 0,0\rangle=\int_M 0\,d\mu=0$. Moreover, $\langle f,g\rangle=\int_M fg\,d\mu=\int_M gf\,d\mu=\langle g,f\rangle$. Finally, $\langle\alpha f,g\rangle=\int_M \alpha fg\,d\mu=\alpha\int_M fg\,d\mu=\alpha\langle f,g\rangle$ and $\langle f+g,h\rangle=\int_M (f+g)h\,d\mu=\int_M fh\,d\mu+\int_M gh\,d\mu=\langle f,h\rangle+\langle g,h\rangle$. ∎

Note that on $\mathcal{L}^2[-\pi,\pi)$ we usually work with $\langle f,g\rangle:=\frac{1}{\pi}\int_{[-\pi,\pi)} fg\,d\lambda$. The extra factor $\frac{1}{\pi}$ is motivated by Example 15.34 below. Inner product spaces will be investigated in more detail in Chapter 20, with groundwork laid in this chapter, Chapter 16 and Section 17.1.

Exercises

15-14. Prove that if $X$ is an inner product space, then for all $x,y,z\in X$ and $\alpha\in\mathbb{R}$ we have $\langle x,y+z\rangle=\langle x,y\rangle+\langle x,z\rangle$ and $\langle x,\alpha y\rangle=\alpha\langle x,y\rangle$.

15-15. Prove that for all $x,y\in\mathbb{R}$ we have $2|xy|\le x^2+y^2$.

15-16. Finish Example 15.27 by proving that $\langle\cdot,\cdot\rangle$ is an inner product on $l^2$.

15-17. Prove that $\langle f,g\rangle$ as in Proposition 15.30 exists for all $f,g\in\mathcal{L}^2(M,\Sigma,\mu)$.

15-18. Explain why $\mathcal{L}^2(M,\Sigma,\mu)$ with $\langle f,g\rangle:=\int_M f(x)g(x)\,d\mu$ is not necessarily an inner product space. Hint. Part 3 of Theorem 14.35.

15-19. Prove that $C^0[-\pi,\pi]$ with $\langle f,g\rangle:=\frac{1}{\pi}\int_{-\pi}^{\pi} fg\,d\lambda$ is an inner product space.

15.5 Nicer Representations: Orthonormal Sets

An inner product allows the definition of right angles. This in turn leads to the desire to represent vectors in terms of "nice" systems in which any two vectors are at right angles.

Definition 15.31 Let $X$ be an inner product space. Then $x,y\in X$ are called orthogonal iff $\langle x,y\rangle=0$.

Definition 15.32 Let $X$ be an inner product space. A subset $S\subseteq X$ is called an orthonormal system iff for any two $a,b\in S$ we have
$$\langle a,b\rangle=\begin{cases}1; & \text{if } a=b,\\ 0; & \text{if } a\neq b.\end{cases}$$
A maximal orthonormal system is an orthonormal system $S$ so that if $x\in X$ is such that $\langle x,s\rangle=0$ for all $s\in S$, then $x=0$.

Example 15.33 The standard unit vectors $e_i$ of Convention 15.23 form a maximal orthonormal system in $l^2$.

For all $i,j\in\mathbb{N}$ we have
$$\langle e_i,e_j\rangle=\sum_{k=1}^\infty e_i^{(k)}e_j^{(k)}=\begin{cases}0; & \text{if } i\neq j,\\ 1; & \text{if } i=j.\end{cases}$$
Hence, the standard unit vectors form an orthonormal system in $l^2$. Let $a\in l^2$. Then $\langle a,e_i\rangle=\sum_{j=1}^\infty a^{(j)}e_i^{(j)}=a^{(i)}$ for all $i\in\mathbb{N}$. Hence, if $a\in l^2$ is so that $\langle a,e_i\rangle=0$ for all $i\in\mathbb{N}$, then $a=0$. This means that $\{e_i : i\in\mathbb{N}\}$ is a maximal orthonormal system in $l^2$. □
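The two computations in Example 15.33 can be mirrored on finite truncations. The following Python sketch is an illustration only: it assumes that sequences are cut off after $d$ entries (so it really works in $\mathbb{R}^d$), that a system is passed as a list without repeated vectors, and the names `inner`, `unit_vector` and `is_orthonormal` are hypothetical helpers introduced here.

```python
def inner(x, y):
    """Inner product <x, y> = sum of x_i * y_i for two finite sequences of equal length."""
    return sum(a * b for a, b in zip(x, y))

def unit_vector(i, d):
    """The i-th standard unit vector (1-based index), truncated to its first d entries."""
    return tuple(1 if j == i else 0 for j in range(1, d + 1))

def is_orthonormal(system):
    """Check <a, b> = 1 if a = b and 0 otherwise, for a list of distinct vectors."""
    return all(
        inner(a, b) == (1 if i == j else 0)
        for i, a in enumerate(system)
        for j, b in enumerate(system)
    )

d = 5
E = [unit_vector(i, d) for i in range(1, d + 1)]
a = (3, -1, 4, 1, -5)
# <a, e_i> picks out the i-th entry of a, as in Example 15.33.
coords = [inner(a, e) for e in E]
```

In this truncated setting `coords` reproduces the entries of `a`, which is the finite counterpart of the argument that only $a=0$ is orthogonal to all standard unit vectors.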
Example 15.34 For all $m,n\in\mathbb{N}$, we conclude the following from Exercise 12-15a.
$$\int_{-\pi}^{\pi}\cos(mx)\cos(nx)\,dx=\begin{cases}0; & \text{if } m\neq n,\\ \pi; & \text{if } m=n,\end{cases}$$
$$\int_{-\pi}^{\pi}\sin(mx)\sin(nx)\,dx=\begin{cases}0; & \text{if } m\neq n,\\ \pi; & \text{if } m=n,\end{cases}$$
$$\int_{-\pi}^{\pi}\sin(mx)\cos(nx)\,dx=0.$$
This means that $T:=\{\sin(nx),\cos(mx) : n\ge 1, m\ge 1\}\cup\left\{\frac{1}{\sqrt{2}}\right\}$ is an orthonormal system in the space of continuous bounded functions on $[-\pi,\pi)$ with the inner product $\langle f,g\rangle:=\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)g(x)\,dx$. Even more is true, as this set is a maximal orthonormal system. The proof is quite sophisticated however. We will present it in Theorem 20.12.

Note that the extra factor in the inner product assures that for each function in $T$ the inner product with itself is 1. Also note that the same properties hold in $\mathcal{L}^2[-\pi,\pi)$, except that $\mathcal{L}^2[-\pi,\pi)$ is not quite an inner product space. □

Orthonormal systems possess a key property for being a base.

Proposition 15.35 Orthonormal systems are linearly independent.

Proof. Let $S$ be an orthonormal system in an inner product space $X$ and let $c_1,\ldots,c_n\in S$ and $\alpha_1,\ldots,\alpha_n\in\mathbb{R}$ be such that $\sum_{i=1}^n \alpha_i c_i=0$. Then for all indices $j=1,\ldots,n$ we obtain
$$0=\langle 0,c_j\rangle=\left\langle\sum_{i=1}^n \alpha_i c_i,c_j\right\rangle=\sum_{i=1}^n \alpha_i\langle c_i,c_j\rangle=\alpha_j.$$
Therefore $S$ is linearly independent. ∎

This linear independence means that for finite dimensional spaces we should be able to find an orthonormal base, that is, a base that also is an orthonormal system.

Definition 15.36 Let $X$ be a vector space and let $W\subseteq X$. Define the span of $W$ to be $\operatorname{span}(W):=\left\{\sum_{i=1}^n \alpha_i w_i : \alpha_i\in\mathbb{R},\ w_i\in W\right\}$.

Theorem 15.37 The Gram-Schmidt Orthonormalization Procedure. Every finite dimensional inner product space $X$ has an orthonormal base.

Proof. Let $d$ be the dimension of $X$, let $\{b_1,\ldots,b_d\}$ be a base of $X$ and for all vectors $x\in X$ use the notation $\|x\|:=\sqrt{\langle x,x\rangle}$. Set $c_1:=\frac{b_1}{\|b_1\|}$. After $c_1,\ldots,c_{k-1}$ have been defined so that $\{c_1,\ldots,c_{k-1}\}$ is an orthonormal base for $\operatorname{span}(\{b_1,\ldots,b_{k-1}\})$, let $d_k:=b_k-\sum_{i=1}^{k-1}\langle b_k,c_i\rangle c_i$.
Because $\{b_1,\ldots,b_k\}$ is linearly independent, we infer $d_k\neq 0$. Define $c_k:=\frac{d_k}{\|d_k\|}$. Then for all $j<k$ we obtain
$$\langle c_k,c_j\rangle=\frac{1}{\|d_k\|}\left\langle b_k-\sum_{i=1}^{k-1}\langle b_k,c_i\rangle c_i,\ c_j\right\rangle=\frac{1}{\|d_k\|}\big(\langle b_k,c_j\rangle-\langle b_k,c_j\rangle\big)=0.$$
Because $\|c_k\|=1$, the set $\{c_1,\ldots,c_k\}$ is an orthonormal set and because all $c_j$ are linear combinations of $b_1,\ldots,b_k$ it is contained in $\operatorname{span}(\{b_1,\ldots,b_k\})$. Because orthonormal sets are linearly independent, the set $\{c_1,\ldots,c_k\}$ is a base of the span of $b_1,\ldots,b_k$. After $d-1$ steps as above we obtain that $\{c_1,\ldots,c_d\}$ is a base of $X$. ∎

The Gram-Schmidt Orthonormalization Procedure can be executed indefinitely if we start with a countably infinite linearly independent set (see Exercise 15-20). We will discuss in Section 20.1 how we need to adjust the notion of a base to obtain standard representations of elements of an infinite dimensional inner product space.

Exercises

15-20. Prove that every infinite dimensional inner product space contains an infinite orthonormal system.

15.6 Abstraction III: Normed Spaces

The definition of distances in vector spaces is a two step process. First, norms can be viewed as a generalization of absolute values. They still connect to the algebraic properties of the surrounding space. Second, metrics (discussed in Section 15.7) need not connect to these properties and can thus also be defined on subsets that are not vector spaces any more.

Definition 15.38 A normed space is a pair $(X,\|\cdot\|)$ of a vector space $X$ and a function $\|\cdot\|:X\to[0,\infty)$, called the norm on $X$, such that
1. For all $x\in X$, we have $\|x\|=0$ iff $x=0$.
2. For all scalars $\alpha$ and all $x\in X$, we have that $\|\alpha x\|=|\alpha|\|x\|$.
3. The triangular inequality holds. That is, for all $x,y\in X$ we have $\|x+y\|\le\|x\|+\|y\|$.
We will usually call $X$ itself a normed space. When we do so, we implicitly assume that there is a norm on $X$, which will usually be denoted $\|\cdot\|$.
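Before continuing with normed spaces, the construction in the proof of Theorem 15.37, which already used the notation $\|x\|:=\sqrt{\langle x,x\rangle}$, can be sketched numerically. The Python code below is an illustration under assumptions that are not part of the text: it works in $\mathbb{R}^d$ with the inner product of Example 15.28, it uses floating point arithmetic, and the tolerance `1e-12` for detecting $d_k=0$ as well as the function name `gram_schmidt` are choices made for this example.

```python
import math

def inner(x, y):
    """Inner product of Example 15.28 on R^d."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """||x|| := sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

def gram_schmidt(base):
    """Turn linearly independent vectors b_1, ..., b_d into orthonormal c_1, ..., c_d."""
    ortho = []
    for b in base:
        # d_k := b_k - sum_{i<k} <b_k, c_i> c_i
        d = list(b)
        for c in ortho:
            coeff = inner(b, c)
            d = [dj - coeff * cj for dj, cj in zip(d, c)]
        length = norm(d)
        if length < 1e-12:  # assumed numerical tolerance for "d_k = 0"
            raise ValueError("input vectors are not linearly independent")
        # c_k := d_k / ||d_k||
        ortho.append([dj / length for dj in d])
    return ortho

base = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
C = gram_schmidt(base)
```

Up to rounding, the vectors in `C` satisfy $\langle c_i,c_j\rangle=0$ for $i\neq j$ and $\|c_i\|=1$. The subtraction step is written exactly as in the proof; numerically more stable variants exist, but they are beyond what the text discusses.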
As we introduce more abstract settings, the spaces we are familiar with will be special cases of the more general spaces. For example, Theorem 15.40 below shows that every inner product space is a normed space, too. Because the function defined in Lemma 15.39 ultimately turns out to be a norm, we denote it like a norm right away. Note that until we establish that it is a norm, we are not using any of the properties of a norm.

Lemma 15.39 Cauchy-Schwarz inequality. Let $X$ be an inner product space and for all $x \in X$ let $\|x\| := \sqrt{\langle x, x \rangle}$. Then for all $x, y \in X$ we have $|\langle x, y \rangle| \leq \|x\| \|y\|$.

Proof. For $x = 0$ both sides are zero, so let $x \neq 0$. The function $f(t) := \langle tx + y, tx + y \rangle = t^2 \langle x, x \rangle + 2t \langle x, y \rangle + \langle y, y \rangle$ is a nonnegative quadratic function with absolute minimum at $t = -\frac{\langle x, y \rangle}{\langle x, x \rangle}$. But then
\[
0 \leq f\left( -\frac{\langle x, y \rangle}{\langle x, x \rangle} \right) = \frac{\langle x, y \rangle^2}{\langle x, x \rangle} - 2 \frac{\langle x, y \rangle^2}{\langle x, x \rangle} + \langle y, y \rangle = \langle y, y \rangle - \frac{\langle x, y \rangle^2}{\langle x, x \rangle},
\]
which implies $\langle x, y \rangle^2 \leq \|x\|^2 \|y\|^2$, and hence $|\langle x, y \rangle| \leq \|x\| \|y\|$. ∎

Now we can show that the norm notation is justified.

Theorem 15.40 Let $X$ be an inner product space. Then $\|x\| := \sqrt{\langle x, x \rangle}$ defines a norm on $X$. Therefore any inner product space is also a normed space.

Proof. First, note that $\|x\| = 0$ iff $\sqrt{\langle x, x \rangle} = 0$ iff $\langle x, x \rangle = 0$ iff $x = 0$. Moreover, for all $\alpha \in \mathbb{R}$ and $x \in X$ we obtain $\|\alpha x\| = \sqrt{\langle \alpha x, \alpha x \rangle} = \sqrt{\alpha^2 \langle x, x \rangle} = |\alpha| \|x\|$. To prove the triangular inequality, let $x, y \in X$. Via the Cauchy-Schwarz inequality we infer the following.
\[
\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2 \langle x, y \rangle + \langle y, y \rangle \leq \|x\|^2 + 2 \|x\| \|y\| + \|y\|^2 = \bigl( \|x\| + \|y\| \bigr)^2,
\]
which finishes the proof. ∎

Definition 15.41 On $\mathbb{R}^d$, the norm $\|x\|_2 := \sqrt{\sum_{i=1}^{d} x_i^2}$ that is induced by the inner product is called the Euclidean norm.

Of course, a more general definition is only useful if it introduces new and interesting objects. The following examples are not inner product spaces.

Example 15.42 The function $\left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty := \sup \{ |x_n| : n \in \mathbb{N} \}$ is a norm on $l^\infty$. It is clear that $\left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty \geq 0$ for all sequences $\{x_n\}_{n=1}^{\infty} \in l^\infty$.
Moreover, we have $\left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty = 0$ iff $\sup \{ |x_n| : n \in \mathbb{N} \} = 0$, which is the case iff all $x_n$ are zero, which is the case iff $\{x_n\}_{n=1}^{\infty} = \{0\}_{n=1}^{\infty} \in l^\infty$. For $\alpha \in \mathbb{R}$ and $\{x_n\}_{n=1}^{\infty} \in l^\infty$, note that
\[
\left\| \{\alpha x_n\}_{n=1}^{\infty} \right\|_\infty = \sup \{ |\alpha| |x_n| : n \in \mathbb{N} \} = |\alpha| \left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty.
\]
Finally, for $\{x_n\}_{n=1}^{\infty}, \{y_n\}_{n=1}^{\infty} \in l^\infty$ note that
\[
\begin{aligned}
\left\| \{x_n\}_{n=1}^{\infty} + \{y_n\}_{n=1}^{\infty} \right\|_\infty
&= \left\| \{x_n + y_n\}_{n=1}^{\infty} \right\|_\infty = \sup \{ |x_n + y_n| : n \in \mathbb{N} \} \\
&\leq \sup \{ |x_n| + |y_n| : n \in \mathbb{N} \} \\
&\leq \sup \{ |x_n| : n \in \mathbb{N} \} + \sup \{ |y_n| : n \in \mathbb{N} \} \\
&= \left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty + \left\| \{y_n\}_{n=1}^{\infty} \right\|_\infty,
\end{aligned}
\]
where the last inequality is true because for each natural number $m \in \mathbb{N}$ we have that $|x_m| \leq \sup \{ |x_n| : n \in \mathbb{N} \}$ and $|y_m| \leq \sup \{ |y_n| : n \in \mathbb{N} \}$. □

Example 15.43 Let $D$ be a set. The function $\|f\|_\infty := \sup \{ |f(x)| : x \in D \}$ defines a norm on $B(D, \mathbb{R})$, the space of bounded functions on $D$. The norm $\|\cdot\|_\infty$ is also called the uniform norm. This is proved in Exercise 15-21. □

Example 15.43 gives access to two more commonly used spaces.

Example 15.44 The function $\|(x_1, \ldots, x_d)\|_\infty := \max \{ |x_j| : j = 1, \ldots, d \}$ defines a norm on $\mathbb{R}^d = B(\{1, \ldots, d\}, \mathbb{R})$. □

Example 15.45 Because linear subspaces of normed spaces are normed spaces, too, the space $(C^0[a, b], \|\cdot\|_\infty)$ is a normed space. □

We can now turn our attention to the $\mathcal{L}^p$ spaces once more.

Definition 15.46 Let $(M, \Sigma, \mu)$ be a measure space and let $p \geq 1$. Then for all $f \in \mathcal{L}^p(M, \Sigma, \mu)$ we define $\|f\|_p := \left( \int_M |f|^p \, d\mu \right)^{\frac{1}{p}}$.

Although $\|\cdot\|_p$ is not quite a norm, we have good reason to use norm notation. Part 1 of Theorem 15.50 will identify the only part of the definition of a norm that is not satisfied by $\|\cdot\|_p$ and we will resolve this problem in Section 15.8. The biggest immediate challenge is to prove the triangular inequality for $\|\cdot\|_p$, which is done in Theorem 15.49. Hölder's inequality (Theorem 15.48) is a lemma leading up to Theorem 15.49, but it is also of independent interest.
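For a concrete feel for Definition 15.46, one can take counting measure on the finite set $\{1, \ldots, d\}$, so that the integral becomes a finite sum and $\|x\|_p = \left( \sum_{i=1}^{d} |x_i|^p \right)^{1/p}$ for $x \in \mathbb{R}^d$. The Python sketch below is only an illustration under that assumption. The function names and sample vectors are made up for this example, and a numerical check on two vectors is of course no substitute for the proof of the triangular inequality.

```python
def p_norm(x, p):
    """(sum |x_i|^p)^(1/p) for p >= 1: Definition 15.46 with counting measure on {1, ..., d}."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def sup_norm(x):
    """max |x_j|, the norm of Example 15.44."""
    return max(abs(t) for t in x)

x = [3.0, -4.0, 1.0]
y = [-1.0, 2.0, 2.0]
s = [a + b for a, b in zip(x, y)]

# Triangular inequality ||x + y||_p <= ||x||_p + ||y||_p for several exponents
# (small slack added for floating point rounding).
minkowski_ok = all(
    p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
    for p in (1, 1.5, 2, 3, 10)
)

# The p-norms of a fixed vector for growing p, to compare with the sup norm.
norms = {p: p_norm(x, p) for p in (1, 2, 10, 100)}
```

For the sample vector the values in `norms` decrease toward `sup_norm(x)` as $p$ grows, which matches the behavior explored in Exercises 15-32 and 15-33.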
In $\mathcal{L}^p$, Hölder's inequality often serves as a substitute for the Cauchy-Schwarz inequality, which is only valid in inner product spaces.

Lemma 15.47 Young's inequality. Let $1 < p, q < \infty$ be such that $\frac{1}{p} + \frac{1}{q} = 1$ and let $x, y \in [0, \infty)$. Then $xy \leq \frac{x^p}{p} + \frac{y^q}{q}$.

Proof. The inequality is trivial for $x = 0$ or $y = 0$, so we can assume both numbers are positive. We first perform some substitutions that simplify the inequality. With $u = x^p$ and $v = y^q$ the inequality is equivalent to the inequality $u^{\frac{1}{p}} v^{\frac{1}{q}} \leq \frac{u}{p} + \frac{v}{q}$ for all $u, v \in (0, \infty)$. With $t := \frac{u}{v}$ the inequality is equivalent to $t^{\frac{1}{p}} \leq \frac{t}{p} + \frac{1}{q}$, where we multiply or divide by $v$ to go back and forth. But it is easy to verify with elementary calculus (see Exercise 15-22) that the function $f(t) := \frac{t}{p} + \frac{1}{q} - t^{\frac{1}{p}}$ has an absolute minimum value of 0 on $(0, \infty)$. ∎

Theorem 15.48 Hölder's inequality for integrals. Let $(M, \Sigma, \mu)$ be a measure space, let $1 < p, q < \infty$ satisfy the equality $\frac{1}{p} + \frac{1}{q} = 1$ and let $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g \in \mathcal{L}^q(M, \Sigma, \mu)$. Then $fg \in \mathcal{L}^1(M, \Sigma, \mu)$ and $\|fg\|_1 \leq \|f\|_p \|g\|_q$.

Proof. If $\|f\|_p = 0$ or $\|g\|_q = 0$, then $fg = 0$ $\mu$-a.e. and there is nothing to prove, so assume both are positive. The computation below shows first of all that $\int_M |fg| \, d\mu = \|fg\|_1$ is finite, which establishes that $fg \in \mathcal{L}^1(M, \Sigma, \mu)$. Moreover, the claimed inequality can then be obtained by multiplying by $\|f\|_p \|g\|_q$. By Young's inequality,
\[
\int_M \frac{|f|}{\|f\|_p} \frac{|g|}{\|g\|_q} \, d\mu \leq \int_M \frac{1}{p} \frac{|f|^p}{\|f\|_p^p} + \frac{1}{q} \frac{|g|^q}{\|g\|_q^q} \, d\mu = \frac{1}{p} + \frac{1}{q} = 1.
\]
∎

Theorem 15.49 Minkowski's inequality for integrals. Let $(M, \Sigma, \mu)$ be a measure space and let $p \geq 1$. Then for all functions $f, g \in \mathcal{L}^p(M, \Sigma, \mu)$ we have the inequality $\|f + g\|_p \leq \|f\|_p + \|g\|_p$.

Proof. For $p = 1$, the inequality is an easy consequence of the triangular inequality for absolute values. So for the remainder we can assume $1 < p < \infty$. Choose $q$ such that $\frac{1}{p} + \frac{1}{q} = 1$. Then $p + q = pq$, and hence $(p - 1)q = pq - q = p$. Thus, because we already know from Example 15.12 that $|f + g|^p$ is integrable, we conclude $|f + g|^{p-1} \in \mathcal{L}^q(M, \Sigma, \mu)$.
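Now via Hölder's inequality we obtain
\[
\begin{aligned}
\|f + g\|_p^p = \int_M |f + g|^p \, d\mu
&\leq \int_M |f| \, |f + g|^{p-1} \, d\mu + \int_M |g| \, |f + g|^{p-1} \, d\mu \\
&\leq \bigl( \|f\|_p + \|g\|_p \bigr) \left( \int_M |f + g|^{(p-1)q} \, d\mu \right)^{\frac{1}{q}}
= \bigl( \|f\|_p + \|g\|_p \bigr) \|f + g\|_p^{\frac{p}{q}}.
\end{aligned}
\]
If $\|f + g\|_p = 0$ there is nothing to prove. Otherwise, dividing by $\|f + g\|_p^{p/q}$ and using $p - \frac{p}{q} = 1$ gives $\|f + g\|_p \leq \|f\|_p + \|g\|_p$, which was to be proved. ∎

We can now determine how close the $\|\cdot\|_p$ are to being norms. The only difference between the properties established in Theorem 15.50 and the properties of a norm is that $\|f\|_p = 0$ only implies that $f = 0$ almost everywhere, not everywhere. This minor nuisance will be remedied in Section 15.8.

Theorem 15.50 Let $(M, \Sigma, \mu)$ be a measure space and let $p \geq 1$. Then the following hold.
0. For all $f \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|f\|_p \geq 0$.
1. For all $f \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|f\|_p = 0$ iff $f = 0$ $\mu$-a.e.
2. For all $\alpha \in \mathbb{R}$ and all $f \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|\alpha f\|_p = |\alpha| \|f\|_p$.
3. For all $f, g \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|f + g\|_p \leq \|f\|_p + \|g\|_p$.

Proof. Part 0 is trivial and part 3 is Minkowski's inequality. The remaining parts are left as Exercise 15-23. ∎

The functions $\|\cdot\|_p$ can be defined on $l^p$ also, and on these spaces they do define a norm. This is the key advantage that $l^p$ has over $\mathcal{L}^p$. It actually is a normed space.

Definition 15.51 For $1 \leq p < \infty$ and $\{x_n\}_{n=1}^{\infty} \in l^p$ we define the p-norm of $\{x_n\}_{n=1}^{\infty}$ to be $\left\| \{x_n\}_{n=1}^{\infty} \right\|_p := \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{\frac{1}{p}}$.

Theorem 15.52 For $1 \leq p < \infty$ the space $(l^p, \|\cdot\|_p)$ is a normed space.

Proof. First note that with $\gamma_{\mathbb{N}}$ denoting counting measure on $\mathbb{N}$ the space $l^p$ is the space $\mathcal{L}^p(\mathbb{N}, \mathcal{P}(\mathbb{N}), \gamma_{\mathbb{N}})$ and that the function $\|\cdot\|_p : l^p \to [0, \infty)$ is exactly the function $\|\cdot\|_p : \mathcal{L}^p(\mathbb{N}, \mathcal{P}(\mathbb{N}), \gamma_{\mathbb{N}}) \to [0, \infty)$ in Theorem 15.50. By Exercise 14-3a the only set of $\gamma_{\mathbb{N}}$-measure zero is the empty set. But this means that $\left\| \{x_n\}_{n=1}^{\infty} \right\|_p = 0$ iff $\{x_n\}_{n=1}^{\infty} = 0$ $\gamma_{\mathbb{N}}$-a.e., which is the case iff $x_n = 0$ for all $n \in \mathbb{N}$. Thus Theorem 15.50 proves that $(l^p, \|\cdot\|_p)$ is a normed space. ∎

Note how the proof of Theorem 15.52 reveals that the $l^p$-spaces are special cases of the $\mathcal{L}^p$-spaces. Another consequence of Theorem 15.52 is that $\mathbb{R}^d$ equipped with the norm $\|(x_1, \ldots, x_d)\|_p := \left( \sum_{i=1}^{d} |x_i|^p \right)^{\frac{1}{p}}$ is a normed space (Exercise 15-24). That is, one space such as $\mathbb{R}^d$ can be equipped with several norms. For d-dimensional space, this issue will be addressed in Section 16.6.

Normed spaces will be investigated in more detail in Chapter 17, with groundwork laid in this chapter and in Chapter 16.

Exercises

15-21. Finishing Example 15.43.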
Let $B(D, \mathbb{R})$ be the set of bounded functions on the set $D$.
(a) Prove that with pointwise addition and scalar multiplication $B(D, \mathbb{R})$ is a vector space.
(b) Prove that the function $\|f\|_\infty := \sup \{ |f(x)| : x \in D \}$ defines a norm on $B(D, \mathbb{R})$.

15-22. Finish the proof of Young's inequality (Lemma 15.47) by proving that $f(t) := \frac{t}{p} + \frac{1}{q} - t^{\frac{1}{p}}$ has an absolute minimum value of 0 on $(0, \infty)$.

15-23. Finish the proof of Theorem 15.50. That is,
(a) Prove part 1 of Theorem 15.50, and
(b) Prove part 2 of Theorem 15.50.

15-24. For $p \geq 1$ define $\|\cdot\|_p : \mathbb{R}^d \to [0, \infty)$ by $\|(x_1, \ldots, x_d)\|_p := \left( \sum_{i=1}^{d} |x_i|^p \right)^{\frac{1}{p}}$ and prove that $\|\cdot\|_p$ is a norm on $\mathbb{R}^d$.

15-25. Let $a < b$ and let $BV_0[a, b]$ denote the vector space of all functions of bounded variation on $[a, b]$ so that $f(a) = 0$. Prove that
\[
\|f\|_{BV} := \sup \left\{ \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a = a_0 < a_1 < \cdots < a_n = b \right\}
\]
defines a norm on $BV_0[a, b]$.

15-26. "Concrete" versions of the inequalities in this chapter.
(a) Let $x_1, \ldots, x_d, y_1, \ldots, y_d \in \mathbb{R}$ and let $p, q \in (1, \infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$. Prove the inequalities
\[
\sum_{i=1}^{d} |x_i y_i| \leq \left( \sum_{i=1}^{d} |x_i|^p \right)^{\frac{1}{p}} \left( \sum_{i=1}^{d} |y_i|^q \right)^{\frac{1}{q}}
\quad \text{and} \quad
\left( \sum_{i=1}^{d} |x_i + y_i|^p \right)^{\frac{1}{p}} \leq \left( \sum_{i=1}^{d} |x_i|^p \right)^{\frac{1}{p}} + \left( \sum_{i=1}^{d} |y_i|^p \right)^{\frac{1}{p}}.
\]
(b) Let $p \in [1, \infty)$. Prove that $\|f\|_p := \left( \int_a^b |f(x)|^p \, dx \right)^{\frac{1}{p}}$ defines a norm on $C^0[a, b]$.
(c) State the Cauchy-Schwarz, Hölder and Minkowski inequalities in integral notation for the norms from part 15-26b.

15-27. Young's inequality revisited. Let $1 < p, q < \infty$ be such that $\frac{1}{p} + \frac{1}{q} = 1$, let $\varepsilon > 0$ and let $x, y \in [0, \infty)$. Prove that $xy \leq \varepsilon \frac{x^p}{p} + \varepsilon^{-\frac{q}{p}} \frac{y^q}{q}$.
Hint. Absorb the $\varepsilon$ into $x$ and $y$.

15-28. Give a direct proof of Theorem 15.52. The first two parts of the definition of a norm are straightforward. For the triangular inequality, proceed as follows.
(a) First prove Hölder's inequality for series. Let $1 < p, q < \infty$ be such that $\frac{1}{p} + \frac{1}{q} = 1$ and let $\{x_n\}_{n=1}^{\infty} \in l^p$ and $\{y_n\}_{n=1}^{\infty} \in l^q$. Prove that then $\{x_n y_n\}_{n=1}^{\infty} \in l^1$ and we have the inequality $\left\| \{x_n y_n\}_{n=1}^{\infty} \right\|_1 \leq \left\| \{x_n\}_{n=1}^{\infty} \right\|_p \left\| \{y_n\}_{n=1}^{\infty} \right\|_q$.
(b) Now prove Minkowski's inequality for series. Prove that for all $\{x_n\}_{n=1}^{\infty}, \{y_n\}_{n=1}^{\infty} \in l^p$ we have $\left\| \{x_n\}_{n=1}^{\infty} + \{y_n\}_{n=1}^{\infty} \right\|_p \leq \left\| \{x_n\}_{n=1}^{\infty} \right\|_p + \left\| \{y_n\}_{n=1}^{\infty} \right\|_p$.
Hint. Use the proofs of the corresponding inequalities for integrals as guidance.

15-29. Let $X$ be a normed space. Prove that $\left\| \sum_{i=1}^{n} x_i \right\| \leq \sum_{i=1}^{n} \|x_i\|$ for any $n$ elements $x_1, \ldots, x_n \in X$.

15-30. When is a normed space an inner product space?
(a) Parallelogram law. Let $X$ be an inner product space. Prove that for all $x, y \in X$ we have $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$.
(b) Polarization identity. Let $X$ be an inner product space. Prove that for all $x, y \in X$ we have $\|x + y\|^2 - \|x - y\|^2 = 4 \langle x, y \rangle$.
(c) Let $X$ be a normed space in which the parallelogram law holds. Prove that the equation $\langle x, y \rangle := \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right)$ defines an inner product on $X$ with $\|x\| = \sqrt{\langle x, x \rangle}$ for all $x \in X$.
(d) State an equivalent formulation for the statement "$X$ is a normed space and the norm of $X$ is induced by an inner product."
(e) Give an example of functions $f, g \in C^0[a, b]$ that do not satisfy the parallelogram law for $\|\cdot\|_\infty$, thus proving that $(C^0[a, b], \|\cdot\|_\infty)$ is not an inner product space.

15-31. Let $(M, \Sigma, \mu)$ be a measure space with $\mu(M) < \infty$. Prove that if $1 \leq p < q < \infty$, then $\mathcal{L}^p(M, \Sigma, \mu) \supseteq \mathcal{L}^q(M, \Sigma, \mu)$ and for all $f \in \mathcal{L}^q(M, \Sigma, \mu)$ we have $\|f\|_p \leq \|f\|_q \, \mu(M)^{\frac{1}{p} - \frac{1}{q}}$.
Hint. Hölder's inequality.

15-32. Jensen's inequality.
(a) Prove that for all $1 \leq p < q < \infty$ and all $x \in l^p$ we have $\|x\|_p \geq \|x\|_q \geq \|x\|_\infty$.
Hint. The second inequality is simple. For the first inequality consider $\frac{x}{\|x\|_\infty}$.
(b) Prove that Jensen's inequality can fail for continuous functions. That is, find $p < q$, $a < b$ and a continuous function $g : [a, b] \to \mathbb{R}$ so that $\|g\|_p < \|g\|_q$ and $\|g\|_p < \|g\|_\infty$.
Hint. A straight line rising from 0 to 1 on a sufficiently short interval will do.

15-33. The norm $\|\cdot\|_\infty$
as limit of the $\|\cdot\|_p$-norms.
(a) Prove that for all $x \in \mathbb{R}^d$ we have $\lim_{p \to \infty} \|x\|_p = \|x\|_\infty$.
(b) Prove that for all $x \in l^1$ we have $\lim_{p \to \infty} \|x\|_p = \|x\|_\infty$.
(c) Let $f : [a, b] \to \mathbb{R}$ be continuous. Prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $p \geq 1$ we have $\left( \|f\|_\infty - \varepsilon \right) \delta^{\frac{1}{p}} \leq \left( \int_a^b |f(x)|^p \, dx \right)^{\frac{1}{p}} \leq \|f\|_\infty (b - a)^{\frac{1}{p}}$.
(d) Prove that for all $f \in C^0[a, b]$ we have $\lim_{p \to \infty} \|f\|_p = \|f\|_\infty$.

15.7 Abstraction IV: Metric Spaces

Norms are used to measure distances in a vector space. Natural phenomena are often modeled in bounded subsets of d-dimensional space, which means sums and constant multiples do not necessarily stay in the subset. When there is no linear structure, distances are measured with metrics. The properties of a metric are inspired by the real life properties of distances. Distances are nonnegative, distinct objects have a nonzero distance from each other, the distance is independent of whether we go from point A to point B or vice versa, and detours through a third point cannot provide a shortcut.

Definition 15.53 A metric space is a pair $(X, d)$ of a set $X$ (without additional properties; in particular, $X$ need not be a vector space) and a function $d : X \times X \to [0, \infty)$, called the metric on $X$, with the following properties.
1. For all $x, y \in X$, we have that $d(x, y) = 0$ iff $x = y$.
2. For all $x, y \in X$, we have $d(x, y) = d(y, x)$.
3. For all $x, y, z \in X$, we have $d(x, z) \leq d(x, y) + d(y, z)$.
We will usually call $X$ itself a metric space. When we do so, we implicitly assume that there is a metric on $X$, which will usually be denoted $d$.

Normed spaces are metric spaces, too. So once more we have generalized a known concept.

Proposition 15.54 Let $X$ be a normed space. Then $d(x, y) := \|x - y\|$ defines a metric on $X$.

Proof. Clearly, $d(x, y) = \|x - y\| \geq 0$ for all $x, y \in X$ and $d(x, y) = 0$ iff $\|x - y\| = 0$ iff $x - y = 0$ iff $x = y$.
Also $d(x, y) = \|x - y\| = \|y - x\| = d(y, x)$. Finally, $d(x, z) = \|x - z\| \leq \|x - y\| + \|y - z\| = d(x, y) + d(y, z)$. ∎

With metric spaces we are no longer tied to the linear structure of a vector space. In fact, any subset of a metric space is again a metric space.

Proposition 15.55 Let $(X, d)$ be a metric space and let $S \subseteq X$ be any subset of $X$. Let $d_S := d|_{S \times S}$ be the restriction of the metric $d$ to the subset $S$. Then $(S, d_S)$ is a metric space.

Proof. Clearly, any property of $d$ that holds for all elements of $X$ will also hold for all elements of $S$. ∎

Proposition 15.55 provides a wide range of examples of metric spaces. For example, we can now consider intervals on the real line and subsets of $\mathbb{R}^d$ as metric spaces.

Definition 15.56 Let $X$ be a metric space. Then we will automatically consider any subset $S \subseteq X$ to be a metric space, also called a metric subspace, carrying the metric $d_S$ of Proposition 15.55. Said metric will usually also be denoted $d$. It is sometimes called the induced metric or the relative metric.

Example 15.57 Not every metric space is a metric subspace of a normed space. On $\mathbb{R}^2$, for $x \neq y$ let $d(x, y) := \sqrt{x_1^2 + x_2^2} + \sqrt{y_1^2 + y_2^2}$, which is the sum of the lengths of the straight line segments from $x$ to 0 and from 0 to $y$, and for $x = y$ let $d(x, y) := 0$. Then $d$ is a metric on $\mathbb{R}^2$ that is not induced by a norm. This metric models distances in a situation in which all travel must go through a central hub. □

For more examples of metric spaces that are not subspaces of normed spaces, consider Exercise 15-34.

We conclude this short section by proving the reverse triangular inequality for metric spaces. We could have proved it in normed spaces first, but with this approach we obtain it for normed spaces as a corollary.

[Figure 32: A hierarchy of structures for analysis. The $L^p$ spaces will be introduced in the next section.]
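The hub metric of Example 15.57 is easy to experiment with. The Python sketch below implements $d$, checks the triangular inequality on a handful of sample points, and compares two distances before and after a translation. A metric induced by a norm satisfies $d(x + a, y + a) = \|x - y\| = d(x, y)$, so a change under translation is one way to see that $d$ is not induced by a norm. The function name `hub_distance` and the sample points are illustrative assumptions.

```python
import math
from itertools import product

def hub_distance(x, y):
    """Distance in R^2 when all travel between distinct points passes through the origin."""
    if x == y:
        return 0.0
    return math.hypot(x[0], x[1]) + math.hypot(y[0], y[1])

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0), (-1.0, -1.0)]

# Triangular inequality d(x, z) <= d(x, y) + d(y, z) on all triples of sample points
# (small slack added for floating point rounding).
triangle_ok = all(
    hub_distance(x, z) <= hub_distance(x, y) + hub_distance(y, z) + 1e-12
    for x, y, z in product(points, repeat=3)
)

# Translate both points by a = (1, 0) and compare the distances.
d_before = hub_distance((1.0, 0.0), (2.0, 0.0))  # 1 + 2
d_after = hub_distance((2.0, 0.0), (3.0, 0.0))   # 2 + 3
```

Here `d_before` and `d_after` differ although the second pair of points is just the first pair shifted by $(1, 0)$.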
Theorem 15.58 The reverse triangular inequality for metric spaces. Let $X$ be a metric space. Then for all $x, y, z \in X$ we have $|d(x, y) - d(y, z)| \leq d(x, z)$.

Proof. Let $x, y, z \in X$ and without loss of generality assume $d(y, z) \leq d(x, y)$. Then the inequality $d(x, y) \leq d(x, z) + d(z, y)$ implies the reverse triangular inequality $|d(x, y) - d(y, z)| = d(x, y) - d(y, z) \leq d(x, z)$. ∎

Corollary 15.59 The reverse triangular inequality for normed spaces. Let $X$ be a normed space. Then for all $x, y \in X$ we have $\bigl| \|x\| - \|y\| \bigr| \leq \|x - y\|$.

The only spaces more abstract than metric spaces that occur frequently in mathematics are topological spaces. In analysis, spaces usually carry a metric. Therefore we will not present topological spaces in this text.

Metric spaces will be investigated in more detail in Chapter 16. Figure 32 shows the hierarchy of structures that arise in analysis. To avoid notational confusion when working in normed spaces, we will use norm notation for the metrics on subsets of normed spaces. This is sensible, because all metrics that we will consider on subsets of normed spaces are induced by a norm.

Exercises

15-34. Examples of metric spaces.
(a) Let $X$ be a set and for $x, y \in X$ let $d(x, y) := \begin{cases} 0; & \text{if } x = y, \\ 1; & \text{if } x \neq y. \end{cases}$ Prove that $d$ is a metric on $X$.
Note. $d$ is called the discrete metric on $X$.
(b) Prove that $d(x, y) := \frac{|x - y|}{1 + |x - y|}$ defines a metric on $\mathbb{R}$.
Hint. For the triangular inequality, expand with $\frac{1}{|x - z|}$.
(c) Consider the surface of a sphere in $\mathbb{R}^d$ with the usual distance function. Let the distance between two points be the length of the shortest path (on the sphere) between these two points. Explain why this distance function defines a metric between the points on the sphere and why this metric is not the metric induced by the usual distance function.

15-35.
Explain why the metric induced on $\mathbb{R}^2$ by $\|\cdot\|_1$ is called the taxicab metric.

15-36. Let $1 \leq p \leq \infty$. In $\mathbb{R}^d$ equipped with the metric induced by $\|\cdot\|_p$ compute the distance from the origin to the point $(1, \ldots, 1)$. In which metric does the cube $[0, 1]^d$ have the longest diagonal? In which metric is the diagonal shortest?

15-37. The following, purportedly true, story illustrates the importance of having examples for an abstract notion.
Warning. The following notion is absolutely useless. We simply debunk it here. The story itself may well be a "mathematical urban legend."
A mathematician once spent a lot of time proving abstract properties about so-called "anti-metric" spaces. An "anti-metric" $d : X \times X \to [0, \infty)$ is a function so that for all $x, y \in X$ we have $d(x, y) = 0$ iff $x = y$, for all $x, y \in X$ we have that $d(x, y) = d(y, x)$ and for all $x, y, z \in X$ we have that $d(x, z) \geq d(x, y) + d(y, z)$. So, all that has changed from metric spaces is that the triangular inequality is reversed.
Prove that an anti-metric space can have at most one point.
The mathematician could have simplified all his proofs by using that these spaces have at most one point. Spaces with at most one point have lots of properties, but they are not very interesting. So if you try to "be wise, generalize," make sure that your generalization/modification still has models (examples) that are useful.

15.8 $L^p$ Spaces

Part 1 of Theorem 15.50 shows that $\mathcal{L}^p$ spaces fail to be metric spaces because it is possible for distinct objects to have distance zero from each other. When all other properties of a metric are given, we speak of a semimetric space.

Definition 15.60 A semimetric space is a pair $(X^s, d^s)$ of a set $X^s$ and a function $d^s : X^s \times X^s \to [0, \infty)$ with the following properties.
1. For all $x \in X^s$, we have $d^s(x, x) = 0$.
2. For all $x, y \in X^s$, we have $d^s(x, y) = d^s(y, x)$.
3. For all $x, y, z \in X^s$, we have $d^s(x, z) \leq d^s(x, y) + d^s(y, z)$.
It would be cumbersome to develop a theory of semimetric spaces parallel to that of metric spaces. It is also more appropriate to work with metric spaces, because practical observation tells us that two distinct objects cannot occupy the same space at the same time. To overcome the minor deficiency of a semimetric, objects that have distance zero from each other are combined to become single points. This process produces a metric space in natural fashion.

Theorem 15.61 Let $(X^s, d^s)$ be a semimetric space. Then ${\sim} \subseteq X^s \times X^s$ defined by $x \sim y$ iff $d^s(x, y) = 0$ is an equivalence relation (see Definition C.5 in Appendix C.2). If we denote the equivalence classes of $\sim$ with $[x]$, then the set $X := \{ [x] : x \in X^s \}$ equipped with $d([x], [y]) := d^s(x, y)$ is a metric space.

Proof. It is clear that the relation $\sim$ is reflexive, symmetric and transitive. The function $d$ is defined for equivalence classes, but it is defined in terms of representatives of each class. Therefore we must prove that $d$ is well defined. That is, we must show that the value of $d$ does not depend on which representatives are used. Let $[x], [y] \in X$, let $x_1, x_2 \in [x]$ and let $y_1, y_2 \in [y]$. Then
\[
d^s(x_1, y_1) \leq d^s(x_1, x_2) + d^s(x_2, y_2) + d^s(y_2, y_1) = d^s(x_2, y_2)
\]
and we prove the reversed inequality similarly. Hence, $d^s(x_1, y_1) = d^s(x_2, y_2)$ and the definition of $d$ is independent of the representatives chosen from each equivalence class.

Now $d([x], [y]) = 0$ implies $d^s(x, y) = 0$, that is, $x \sim y$ and thus $[x] = [y]$ (Exercise 15-38a). Conversely, if $[x] = [y]$, then $d([x], [y]) = d^s(x, x) = 0$. Hence, $d([x], [y]) = 0$ is equivalent to $[x] = [y]$ and the first condition for being a metric is satisfied. Symmetry and the triangular inequality for $d$ are easily verified (Exercises 15-38b and 15-38c), so $d$ is a metric. ∎
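The construction in Theorem 15.61 can be illustrated on a small finite example. The Python sketch below assumes that $X^s$ is a handful of points in $\mathbb{R}^2$ with the semimetric $d^s((a, b), (c, e)) := |a - c|$, which ignores the second coordinate, so distinct points can have distance zero. This example and all names in the code are illustrative assumptions, not taken from the text.

```python
def semi_d(x, y):
    """Semimetric on R^2 that only sees the first coordinate."""
    return abs(x[0] - y[0])

points = [(0, 0), (0, 5), (1, 2), (1, -3), (4, 4)]

# Group the points into equivalence classes: x ~ y iff semi_d(x, y) == 0.
classes = []
for p in points:
    for cls in classes:
        if semi_d(p, cls[0]) == 0:
            cls.append(p)
            break
    else:
        classes.append([p])

def d(cls_a, cls_b):
    """Distance between classes, computed from one representative of each class."""
    return semi_d(cls_a[0], cls_b[0])

# Well-definedness: every choice of representatives gives the same value.
well_defined = all(
    semi_d(a, b) == d(A, B)
    for A in classes for B in classes for a in A for b in B
)

# After passing to classes, distinct elements have positive distance.
separated = all(d(A, B) > 0 for A in classes for B in classes if A is not B)
```

The check stored in `well_defined` mirrors the first part of the proof, where the value of $d$ is shown to be independent of the representatives.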
Theorem 15.61 allows us to define metric spaces of "p-integrable functions." Formally these spaces consist of classes of p-integrable functions, but the distinction blurs at times. Metric considerations are taken care of in $L^p(M, \Sigma, \mu)$, while integral equations etc. are proved in $\mathcal{L}^p(M, \Sigma, \mu)$.

Definition 15.62 Let $(M, \Sigma, \mu)$ be a measure space. For $1 \leq p < \infty$, we denote by $L^p(M, \Sigma, \mu)$ the metric space obtained from $\mathcal{L}^p(M, \Sigma, \mu)$ via Theorem 15.61.

Exercises 15-39 and 15-40 show that the $L^p$ spaces actually are normed spaces and that the $L^2$ spaces are inner product spaces. We conclude this section by defining a space similar to $l^\infty$ on measure spaces.

Definition 15.63 Let $(M, \Sigma, \mu)$ be a measure space. We define
\[
\mathcal{L}^\infty(M, \Sigma, \mu) := \bigl\{ f \in F(M, \mathbb{R}) : (\exists B \in \mathbb{R} : |f(x)| \leq B \ \mu\text{-a.e.}) \bigr\}.
\]
For $f \in \mathcal{L}^\infty(M, \Sigma, \mu)$, we define $\|f\|_\infty := \inf \{ B \in \mathbb{R} : |f(x)| \leq B \ \mu\text{-a.e.} \}$.

Note that sometimes the $\mu$-a.e. in the definition of $\mathcal{L}^\infty$ is replaced with "$\mu$-locally a.e." Exercise 15-45 shows that for $\sigma$-finite measure spaces null sets and locally null sets are the same. We consider $\sigma$-finite measure spaces in this text, so the author chose the simpler definition.

Proposition 15.64 Let $(M, \Sigma, \mu)$ be a measure space. Then $\mathcal{L}^\infty(M, \Sigma, \mu)$ equipped with $d_\infty(f, g) := \|f - g\|_\infty$ is a semimetric space.

Proof. The reader will prove a little more in Exercise 15-41. ∎

Definition 15.65 Let $(M, \Sigma, \mu)$ be a measure space. We denote by $L^\infty(M, \Sigma, \mu)$ the metric space obtained from $\mathcal{L}^\infty(M, \Sigma, \mu)$ via Theorem 15.61.

Note that by Exercises 15-39 and 15-41 the space $L^\infty(M, \Sigma, \mu)$ is actually a normed space.

Notation 15.66 If $M = D$ is a Lebesgue measurable subset of $\mathbb{R}^d$, $\Sigma$ is the $\sigma$-algebra of Lebesgue measurable subsets of $D$ and $\mu$ is Lebesgue measure restricted to $\Sigma$, we will also write $L^p(D)$ for $L^p(M, \Sigma, \mu)$, and if $D$ is an interval we also write $L^p[a, b]$ for $L^p([a, b])$, and so on.
Notation 15.67 Certain spaces are usually assumed to carry a certain norm or metric. The space $C^0[a, b]$ is usually assumed to be normed by $\|\cdot\|_\infty$. The spaces $L^p$ are usually assumed to be normed by $\|\cdot\|_p$. The space $L^\infty$ is usually assumed to be normed by $\|\cdot\|_\infty$. Unless otherwise stated, we will assume that each of the above spaces carries the mentioned norm and that any subset of these spaces carries the metric induced by the mentioned norm.

For finite dimensional spaces, Theorem 16.76 will show that although many norms are available, for most purposes any one of them can be used.

Exercises

15-38. Fill in the remaining details in the proof of Theorem 15.61.
(a) Prove that if $\sim$ is an equivalence relation with equivalence classes denoted by $[\cdot]$ and $x \sim y$, then $[x] = [y]$.
(b) Prove that $d : X \times X \to [0, \infty)$ as in Theorem 15.61 satisfies $d([x], [y]) = d([y], [x])$ for all $[x], [y] \in X$.
(c) Prove that $d : X \times X \to [0, \infty)$ as in Theorem 15.61 satisfies the triangular inequality.

15-39. A seminormed space is a pair $(X^s, \|\cdot\|^s)$ of a vector space $X^s$ and a function $\|\cdot\|^s : X^s \to [0, \infty)$, called the seminorm, such that the following hold.
• $\|0\|^s = 0$.
• For all real numbers $\alpha$ and all $x \in X^s$ we have $\|\alpha x\|^s = |\alpha| \|x\|^s$.
• For all $x, y \in X^s$ we have $\|x + y\|^s \leq \|x\|^s + \|y\|^s$.
(a) Prove that ${\sim} \subseteq X^s \times X^s$ defined by $x \sim y$ iff $\|x - y\|^s = 0$ is an equivalence relation.
(b) Prove that if $[x]$ denotes the equivalence class of $x$ under $\sim$, then $X := \{ [x] : x \in X^s \}$ equipped with $\|[x]\| := \|x\|^s$ is a normed space. Be careful. You must also prove that $X$ is a vector space.

15-40. A semi-inner product space is a pair $(X^s, \langle \cdot, \cdot \rangle^s)$ consisting of a vector space $X^s$ and a function $\langle \cdot, \cdot \rangle^s : X^s \times X^s \to \mathbb{R}$, called the semi-inner product, such that the following hold.
• $\langle 0, 0 \rangle^s = 0$.
• For all $x \in X^s$, we have $\langle x, x \rangle^s \geq 0$.
• For all $x, y \in X^s$, we have $\langle x, y \rangle^s = \langle y, x \rangle^s$.
• For all real numbers $\alpha$ and all $x, y \in X^s$, we have $\langle \alpha x, y \rangle^s = \alpha \langle x, y \rangle^s$.
• For all $x, y, z \in X^s$, we have $\langle x + y, z \rangle^s = \langle x, z \rangle^s + \langle y, z \rangle^s$.
(a) Prove that with $\|x\|^s := \sqrt{\langle x, x \rangle^s}$ the pair $(X^s, \|\cdot\|^s)$ is a seminormed space.
(b) Prove that ${\sim} \subseteq X^s \times X^s$ defined by $x \sim y$ iff $\|x - y\|^s = 0$ is an equivalence relation.
(c) Prove that if $[x]$ denotes the equivalence class of $x$ under $\sim$, then the set $X := \{ [x] : x \in X^s \}$ equipped with $\langle [x], [y] \rangle := \langle x, y \rangle^s$ is an inner product space.

15-41. Prove that $(\mathcal{L}^\infty(M, \Sigma, \mu), \|\cdot\|_\infty)$ in Proposition 15.64 is a seminormed space.

15-42. Let $(M, \Sigma, \mu)$ be a finite measure space. For any two measurable functions $f, g : M \to \mathbb{R}$, define $d(f, g) := \int_M \frac{|f - g|}{1 + |f - g|} \, d\mu$. Prove that $d$ is a semimetric on the space of measurable functions.

15-43. Hölder's inequality for $p = 1$, $q = \infty$. Let $(M, \Sigma, \mu)$ be a measure space. Prove that for all functions $f \in \mathcal{L}^1(M, \Sigma, \mu)$ and $g \in \mathcal{L}^\infty(M, \Sigma, \mu)$ we have $fg \in \mathcal{L}^1(M, \Sigma, \mu)$ and the inequality $\|fg\|_1 \leq \|f\|_1 \|g\|_\infty$ holds.
Hint. Use that $g$ is bounded a.e. by $\|g\|_\infty$.

15-44. Let $(M, \Sigma, \mu)$ be a measure space with $\mu(M) < \infty$.
(a) Prove that $\mathcal{L}^\infty(M, \Sigma, \mu) \subseteq \bigcap_{p \in [1, \infty)} \mathcal{L}^p(M, \Sigma, \mu)$.
(b) Give an example that shows that the containment is proper.
(c) Prove that for all $f \in \mathcal{L}^\infty(M, \Sigma, \mu)$ we have $\lim_{p \to \infty} \|f\|_p = \|f\|_\infty$.

15-45. Let $(M, \Sigma, \mu)$ be a measure space. A set $L \in \Sigma$ is called locally $\mu$-null iff for all $S \in \Sigma$ with $\mu(S) < \infty$ we have $\mu(S \cap L) = 0$.
(a) Prove that every null set is locally $\mu$-null.
(b) Prove that if $(M, \Sigma, \mu)$ is $\sigma$-finite, then every locally $\mu$-null set is a null set.
(c) Consider the function $\mu(A) := \begin{cases} \infty; & \text{if } 0 \in A, \\ 0; & \text{if } 0 \notin A, \end{cases}$ defined on the Lebesgue measurable subsets of $\mathbb{R}$.
i. Prove that $\mu$ is a measure.
ii. Prove that $\mathbb{Q}$ is locally $\mu$-null, but not $\mu$-null.

15.9 Another Number Field: Complex Numbers

Complex numbers are often used in analysis, typically when "square roots of negative numbers" are needed.
Exercise 15-52 shows that there is a price to be paid, because we lose the order relation. That in itself is not a problem. This section shows that the field axioms and an absolute value function remain available. Moreover, Theorem 15.75 will show that the convergence of Cauchy sequences, which is fundamental to analysis, is also preserved. Consequently, in abstract analysis $\mathbb{R}$ and $\mathbb{C}$ are often used interchangeably.

Definition 15.68 The complex numbers $\mathbb{C}$ are the set $\mathbb{R} \times \mathbb{R}$ equipped with addition and multiplication defined as follows. For all complex numbers $(a, b), (c, d) \in \mathbb{C}$, we set $(a, b) + (c, d) := (a + c, b + d)$ and $(a, b) \cdot (c, d) := (ac - bd, ad + bc)$. We define $i := (0, 1)$ and $1 := (1, 0)$ and write complex numbers also in the form $(a, b) = a \cdot 1 + b \cdot i = a + ib$. For $z = a + ib \in \mathbb{C}$, the number $a$ is also called the real part of $z$, denoted $\Re(z)$, and the number $b$ is also called the imaginary part of $z$, denoted $\Im(z)$.

The algebraic properties of $\mathbb{C}$ are summarized in Theorems 15.69 and 15.70.

Theorem 15.69 The complex numbers $\mathbb{C}$ are a field.

Proof. The field axioms are easily verified with $0 = 0 + 0i$ being neutral for addition, $1 = 1 + 0i$ being neutral for multiplication, $-(a + bi) = (-a) + (-b)i$ being the additive inverse and $(a + ib)^{-1} = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2} i$ being the multiplicative inverse. (Exercise 15-46.) ∎

The special element $i$ serves as the closest we can get to a "square root of $(-1)$."

Theorem 15.70 $i^2 = -1$.

Proof. $i^2 = (0 + 1i) \cdot (0 + 1i) = (0 \cdot 0 - 1 \cdot 1) + (0 \cdot 1 + 1 \cdot 0) i = -1$. ∎

Aside from the above algebraic properties, we need to know how to measure distances in the complex numbers. This is done via the absolute value function.

Definition 15.71 For $z = a + ib \in \mathbb{C}$, the absolute value of $z$ is $|z| := \sqrt{a^2 + b^2}$.

Theorem 15.72 Properties of the absolute value.
0. For all $z \in \mathbb{C}$, we have $|z| \geq 0$.
1. For all $z \in \mathbb{C}$, we have $|z| = 0$ iff $z = 0$.
2. For all $z_1, z_2 \in \mathbb{C}$, we have $|z_1 z_2| = |z_1| |z_2|$.
3. The triangular inequality holds. That is, for all $z_1, z_2 \in \mathbb{C}$ we have $|z_1 + z_2| \leq |z_1| + |z_2|$.
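Definition 15.68 can be tried out directly by computing with pairs of real numbers. The Python sketch below is an illustration only. The function names `add`, `mul`, `inverse` and `absolute` and the sample values are assumptions made for this example.

```python
import math

def add(z, w):
    """(a, b) + (c, d) := (a + c, b + d)."""
    return (z[0] + w[0], z[1] + w[1])

def mul(z, w):
    """(a, b) . (c, d) := (ac - bd, ad + bc)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def inverse(z):
    """Multiplicative inverse a/(a^2 + b^2) - i b/(a^2 + b^2) of z = a + ib, for z != 0."""
    a, b = z
    n = a * a + b * b
    return (a / n, -b / n)

def absolute(z):
    """|z| := sqrt(a^2 + b^2)."""
    return math.hypot(z[0], z[1])

i = (0.0, 1.0)
i_squared = mul(i, i)                 # Theorem 15.70 says this is -1, i.e. the pair (-1, 0)

z, w = (3.0, 4.0), (1.0, -2.0)
product_zw = mul(z, w)                # (3 + 4i)(1 - 2i)
z_times_inverse = mul(z, inverse(z))  # should be (1, 0) up to rounding
triangle_ok = absolute(add(z, w)) <= absolute(z) + absolute(w)
```

Working with pairs makes it visible that nothing beyond real arithmetic is needed: $i^2 = -1$ is simply the statement that `mul((0, 1), (0, 1))` equals `(-1, 0)`.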
Proof. Exercise 15-47. ∎

If we switch the sign of the imaginary part of a complex number we obtain the complex conjugate.

Definition 15.73 For $z = a + ib \in \mathbb{C}$, the complex conjugate of $z$ is $\bar{z} := a - ib$.

Absolute value and complex conjugate are related via a simple equation. This equation can be used to express multiplicative inverses.

Proposition 15.74 For all $z \in \mathbb{C}$, the equalities $z + \bar{z} = 2\Re(z)$ and $|z|^2 = z \bar{z}$ hold. Moreover, for all $z \in \mathbb{C} \setminus \{0\}$ the multiplicative inverse is $\frac{1}{z} = \frac{\bar{z}}{|z|^2}$.

Proof. Exercise 15-48. ∎

The definitions of complex valued sequences, convergence in $\mathbb{C}$, and Cauchy sequences in $\mathbb{C}$ are similar to the corresponding definitions in $\mathbb{R}$ (Exercise 15-49).

Theorem 15.75 Every complex valued Cauchy sequence converges in $\mathbb{C}$.

Proof. Let $\{z_n\}_{n=1}^{\infty}$ be a Cauchy sequence of complex numbers. For each $n \in \mathbb{N}$, let $a_n := \Re(z_n)$ and $b_n := \Im(z_n)$ so that $z_n = a_n + i b_n$. Then for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \geq N$ we have $|z_n - z_m| < \varepsilon$. Thus for all $m, n \geq N$ we obtain
\[
|a_n - a_m| \leq \sqrt{(a_n - a_m)^2 + (b_n - b_m)^2} = |z_n - z_m| < \varepsilon,
\]
so $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence in $\mathbb{R}$. Similarly, $\{b_n\}_{n=1}^{\infty}$ is shown to be a Cauchy sequence in $\mathbb{R}$. Let $a := \lim_{n \to \infty} a_n$ and $b := \lim_{n \to \infty} b_n$, where the limits are taken in $\mathbb{R}$. Let $z := a + ib$ and let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \geq N$ we have $|a - a_n| < \frac{\varepsilon}{\sqrt{2}}$ and $|b - b_n| < \frac{\varepsilon}{\sqrt{2}}$. Therefore, for all $n \geq N$ we obtain
\[
|z - z_n| = \sqrt{(a - a_n)^2 + (b - b_n)^2} < \sqrt{\frac{\varepsilon^2}{2} + \frac{\varepsilon^2}{2}} = \varepsilon,
\]
which means that $z = \lim_{n \to \infty} z_n$ in $\mathbb{C}$. ∎

Throughout this text we will rely on the properties that $\mathbb{C}$ has in common with $\mathbb{R}$. It is a field in which Cauchy sequences converge and its absolute value function has properties similar to those of the absolute value function for $\mathbb{R}$. The fact that $\mathbb{C}$ contains an element $i$ such that $i^2 = -1$ is what sometimes requires the explicit use of $\mathbb{C}$ instead of $\mathbb{R}$. As long as we do not use this fact, we could use $\mathbb{C}$ and $\mathbb{R}$ interchangeably.
In particular, unless otherwise indicated, all theorems on normed real vector spaces also hold for normed complex vector spaces. Moreover, because no linear structure is used in metric spaces, all results for metric spaces hold in real spaces as well as in complex spaces.

In the remainder of this section, we highlight the main adjustments that need to be made to examples and definitions when using $\mathbb{C}$ instead of $\mathbb{R}$ as the underlying field. All results for series, except for the alternating series test (obviously), and all results for power series can be translated verbatim to results for series and power series of complex numbers. The definitions of the exponential and trigonometric functions retain their familiar form.

Definition 15.76 For all $z \in \mathbb{C}$ we define
1. The complex exponential function $e^z := \sum_{k=0}^{\infty} \frac{z^k}{k!}$.
2. The complex sine function $\sin(z) := \sum_{k=0}^{\infty} \frac{(-1)^k z^{2k+1}}{(2k+1)!}$.
3. The complex cosine function $\cos(z) := \sum_{k=0}^{\infty} \frac{(-1)^k z^{2k}}{(2k)!}$.

These functions have the same properties as were proved for their real counterparts in Sections 12.1 and 12.2. Moreover, they are related via the Euler identities.

Theorem 15.77 Euler identities. For all $z \in \mathbb{C}$ we have $e^{iz} = \cos(z) + i \sin(z)$. Equivalently, for all $z \in \mathbb{C}$ we have $\sin(z) = \frac{e^{iz} - e^{-iz}}{2i}$ and $\cos(z) = \frac{e^{iz} + e^{-iz}}{2}$.

Proof. Exercise 15-50. ∎

To see how the all-important function spaces can be translated into complex vector spaces, we need to define integrability for complex valued functions. A complex vector space is of course defined similarly to a real vector space, the only difference being that the scalars are taken from the field $\mathbb{C}$ of complex numbers rather than the field $\mathbb{R}$ of real numbers.

Definition 15.78 Let $(M, \Sigma, \mu)$ be a measure space. Then a function $f : M \to \mathbb{C}$ will be called measurable iff its real part defined by $\Re(f)(x) := \Re(f(x))$ and its imaginary part defined by $\Im(f)(x) := \Im(f(x))$ are both measurable.
It will be called integrable iff both real and imaginary parts are integrable and the integral will be $\int_M f \, d\mu := \int_M \Re(f) \, d\mu + i \int_M \Im(f) \, d\mu$.

One of the main reasons why little changes for the integration of complex valued functions is that a complex valued function is integrable iff its absolute value is integrable.

Proposition 15.79. Let $(M, \Sigma, \mu)$ be a measure space. Then a complex valued function $f : M \to \mathbb{C}$ is integrable iff the absolute value $|f| : M \to \mathbb{R}$ (of course taken pointwise) is integrable.

Proof. For "$\Leftarrow$," note that $|\Re(f)| \le |f|$ and $|\Im(f)| \le |f|$, and apply part 2 of Theorem 14.35. For "$\Rightarrow$," note that $|f| = \sqrt{\Re(f)^2 + \Im(f)^2} \le \sqrt{2} \max\{|\Re(f)|, |\Im(f)|\}$, where the maximum is taken pointwise, and apply part 1 of Theorem 14.35. ∎

Definition 15.80. If $D$ is a set, then $F(D, \mathbb{C})$ is the space of all functions from $D$ to $\mathbb{C}$. Spaces $l^p$, $\mathcal{L}^p(M, \Sigma, \mu)$ and $L^p(M, \Sigma, \mu)$ are defined similar to their counterparts for real valued functions and they have similar properties.

The similar properties mentioned above include being a normed space or an inner product space. A complex normed space is defined similar to a real normed space, with the only difference being that we use a complex vector space instead of a real vector space. In particular, the norm on a complex normed space still maps into $[0, \infty) \subseteq \mathbb{R}$. The most significant adjustment when going from real vector spaces to complex vector spaces is in the definition of an inner product space.

Definition 15.81. A complex inner product space is a pair $(X, \langle \cdot, \cdot \rangle)$ of a complex vector space $X$ and a function $\langle \cdot, \cdot \rangle : X \times X \to \mathbb{C}$, called the inner product on $X$, with the following properties.

1. The inner product is positive definite. That is, for all $x \in X$ we have $\langle x, x \rangle \in [0, \infty)$ with $\langle x, x \rangle = 0$ iff $x = 0$.

2. For all $x, y \in X$, we have $\langle x, y \rangle = \overline{\langle y, x \rangle}$.

3.
The inner product is linear in the first factor. That is, for all scalars $\alpha$ and all $x, y \in X$ we have $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ and for all $x, y, z \in X$ we have $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$.

We will usually call $X$ itself an inner product space. When we do so, we implicitly assume that there is an inner product on $X$, which will usually be denoted $\langle \cdot, \cdot \rangle$.

The adjustment in part 2 of Definition 15.81 guarantees that the Cauchy-Schwarz inequality still holds and that complex inner product spaces are also complex normed spaces (see Exercise 15-51). To make $L^2(M, \Sigma, \mu)$ into a complex inner product space, we use the product $\langle f, g \rangle := \int_M f \bar{g} \, d\mu$ on $\mathcal{L}^2(M, \Sigma, \mu)$ and then identify functions whose distance from each other is zero as was done in Theorem 15.61.

Throughout the rest of this text, unless we explicitly demand the space to be a real or complex space, it is usually safe to pretend the underlying field is $\mathbb{R}$. But since we will only use field axioms, absolute values and convergence of Cauchy sequences in our proofs, the results will hold for complex vector spaces as well as for real vector spaces.

Exercises

15-46. Prove that $\mathbb{C}$ is a field. That is, prove each of the following.
(a) Prove that for all $x, y, z \in \mathbb{C}$ we have $(x + y) + z = x + (y + z)$.
(b) Prove that for all $x, y \in \mathbb{C}$ we have $x + y = y + x$.
(c) Prove that with $0 = 0 + 0i$ for all $x \in \mathbb{C}$ we have $x + 0 = x$.
(d) Prove that for every element $x = a + ib \in \mathbb{C}$ the element $(-x) = (-a) + i(-b)$ is so that $x + (-x) = 0$.
(e) Prove that for all $x, y, z \in \mathbb{C}$ we have $(x \cdot y) \cdot z = x \cdot (y \cdot z)$.
(f) Prove that for all $x, y \in \mathbb{C}$ we have $x \cdot y = y \cdot x$.
(g) Prove that the element $1 := 1 + i0$ is so that for all $x \in \mathbb{C}$ we have $1 \cdot x = x$.
(h) Prove that for every element $x = a + ib \in \mathbb{C} \setminus \{0\}$ the element $x^{-1} := \frac{a}{a^2 + b^2} + i \frac{-b}{a^2 + b^2}$ is so that $x \cdot x^{-1} = 1$.
(i) Prove that for all $\alpha, x, y \in \mathbb{C}$ we have $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$.

15-47. Prove Theorem 15.72. That is, prove each of the following.
(a) Prove that for all $z \in \mathbb{C}$ we have $|z| \ge 0$.
(b) Prove that for all $z \in \mathbb{C}$ we have $|z| = 0$ iff $z = 0$.
(c) Prove that for all $z_1, z_2 \in \mathbb{C}$ we have $|z_1 z_2| = |z_1||z_2|$.
(d) Prove that for all $z_1, z_2 \in \mathbb{C}$ we have $|z_1 + z_2| \le |z_1| + |z_2|$.
Hint. For the triangular inequality, use the Cauchy-Schwarz inequality for $\mathbb{R}^2$.

15-48. Prove Proposition 15.74.

15-49. Define each of the following for complex numbers.
(a) Sequences of complex numbers.
(b) Convergence of sequences of complex numbers.
(c) Cauchy sequences of complex numbers.

15-50. Prove the Euler identities.
(a) First, use the power series to prove that $e^{iz} = \cos(z) + i\sin(z)$ for all $z \in \mathbb{C}$.
(b) Use part 15-50a and the appropriate version of Exercise 12-13c to prove that for all $z \in \mathbb{C}$ we have $\sin(z) = \frac{e^{iz} - e^{-iz}}{2i}$ and $\cos(z) = \frac{e^{iz} + e^{-iz}}{2}$.
(c) Prove that the two identities in part 15-50b imply the identity in part 15-50a.

15-51. Let $X$ be a complex inner product space with inner product $\langle \cdot, \cdot \rangle$.
(a) Prove that for all $x, y, z \in X$ and all $\alpha \in \mathbb{C}$ we have $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$ and $\langle x, \alpha y \rangle = \bar{\alpha} \langle x, y \rangle$.
(b) Prove that $|\Re(\langle x, y \rangle)| \le \sqrt{\langle x, x \rangle \langle y, y \rangle}$ holds for all $x, y \in X$.
Hint. Use the proof of Lemma 15.39 as guidance and realize that $f(t) = \langle tx + y, tx + y \rangle = t^2 \langle x, x \rangle + 2t\Re(\langle x, y \rangle) + \langle y, y \rangle$.
(c) Prove that every complex inner product space is a complex normed space.
Hint. Adjust the proof of Theorem 15.40 appropriately.
(d) Prove the Cauchy-Schwarz inequality. That is, prove that $|\langle x, y \rangle| \le \sqrt{\langle x, x \rangle \langle y, y \rangle}$ holds for all $x, y \in X$.

15-52. Prove that $\mathbb{C}$ cannot be ordered.
Hint. Suppose there was a subset $\mathbb{C}^+$ of $\mathbb{C}$ with properties as in Axiom 1.6 for the real numbers. Prove that it must contain both $1$ and $-1$, which is not possible. Use that it must contain $i$ or $-i$.

15-53. Representation of complex numbers and $n$th roots.
(a) Prove that for every complex number $z = x + iy$ there are real numbers $r \ge 0$ and $\theta \in [0, 2\pi)$ so that $z = re^{i\theta}$.
Hint. Exercise 15-50a.
(b) Prove that for every complex number $z \ne 0$ and every $n \in \mathbb{N}$ there are $n$ distinct complex numbers $w_1, \ldots, w_n$ with $w_j^n = z$.

Chapter 16

The Topology of Metric Spaces

In Chapter 15, we started with $d$-dimensional space as our underlying motivation and visualization. By systematically stripping away specific properties, we obtained the successively more general structures of inner product spaces, normed spaces and metric spaces. We will now journey back from metric spaces in this chapter, to normed spaces in Chapter 17, and to inner product spaces in Chapter 20. Because we now move from more general to more specific structures, the results we prove here and in Chapter 17 will also apply to structures discussed in later chapters.

16.1 Convergence of Sequences

A metric provides a distance function on a set, which is enough to discuss convergence, the central notion of analysis. We start by defining convergence of sequences and by investigating examples and properties of convergent sequences. Note that convergence is defined exactly as for sequences of real numbers in Definition 2.2. A sequence in $X$ is of course just a function $f : \mathbb{N} \to X$, denoted as before by $\{x_n\}_{n=1}^{\infty}$.

Definition 16.1. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in the metric space $X$. Then $L \in X$ is called limit of $\{x_n\}_{n=1}^{\infty}$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $d(x_n, L) < \varepsilon$. A sequence that has a limit will be called convergent, a sequence that does not have a limit will be called divergent.

As in the real numbers, limits are unique and the proof is similar.

Proposition 16.2. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in the metric space $X$. If both $L$ and $M$ are limits of $\{x_n\}_{n=1}^{\infty}$, then $L = M$.

Proof. We mimic the proof of Proposition 2.4. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $X$ and let $L$ and $M$ be limits of $\{x_n\}_{n=1}^{\infty}$. We need to prove that $L = M$. Let $\varepsilon > 0$ be arbitrary but fixed.
Then there is an $N_1 \in \mathbb{N}$ such that for all $n \ge N_1$ we have $d(x_n, L) < \frac{\varepsilon}{2}$. There also is an $N_2 \in \mathbb{N}$ such that for all $n \ge N_2$ we have $d(x_n, M) < \frac{\varepsilon}{2}$. Let $N := \max\{N_1, N_2\}$. Then for all $n \ge N$ we infer $d(x_n, L) < \frac{\varepsilon}{2}$ and $d(x_n, M) < \frac{\varepsilon}{2}$. Hence, with $n = N$ we obtain
$$d(L, M) \le d(L, x_N) + d(x_N, M) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Because $\varepsilon > 0$ was arbitrary, by Theorem 1.37 we conclude that $d(L, M) = 0$, and hence $L = M$. ∎

The similarity to the proof of Proposition 2.4 is obvious. It seems the main change is that we work with $d(x_n, L)$ instead of $|a_n - L|$. It would be naive to hope that all proofs translate this easily from the real line to metric spaces. But, especially for results early in this chapter, the proof of the corresponding result for the real line will provide more than adequate guidance.

Definition 16.3. Because the limit is unique, we speak of the limit of a sequence. The notation $\lim_{n\to\infty} x_n = L$ will indicate that the limit of $\{x_n\}_{n=1}^{\infty}$ exists and is equal to $L$.

Although we are proving results for metric spaces in this chapter, if a metric $d(\cdot, \cdot)$ is induced by a norm $\|\cdot\|$ we will write $\|x - y\|$ instead of $d(x, y)$. We can do this because every normed space is a metric space, which means that all results proved here are also valid for normed spaces (and inner product spaces, too).

As a first example, we consider the convergence of sequences in $d$-dimensional space. It turns out that sequences in $\mathbb{R}^d$ converge iff they converge componentwise. In Theorem 16.4 we prove this for the metric induced by the uniform norm $\|\cdot\|_\infty$. Section 16.6 and specifically Theorem 16.78 will show that the same is true for all norms on $\mathbb{R}^d$. Until Section 16.6 we will consider only the uniform norm on $\mathbb{R}^d$. For the notation for the component sequences, recall Convention 15.23.

Theorem 16.4. Consider $\mathbb{R}^d$ with the metric induced by the uniform norm $\|\cdot\|_\infty$. Then a sequence $\{x_n\}_{n=1}^{\infty}$ converges to $L$ in $(\mathbb{R}^d, \|\cdot\|_\infty)$ iff for all $k = 1, \ldots, d$ the $k$th component sequence $\{x_n^{(k)}\}_{n=1}^{\infty}$ converges in $\mathbb{R}$ to the $k$th component $L^{(k)}$ of $L$.

Proof. We have $\lim_{n\to\infty} x_n = L$ iff $\lim_{n\to\infty} \|x_n - L\|_\infty = 0$. For all $k \in \{1, \ldots, d\}$, we infer $|x_n^{(k)} - L^{(k)}| \le \|x_n - L\|_\infty$, so convergence of the vectors implies convergence of the components in $\mathbb{R}$. (Mentally fill in the argument.)

Conversely, assume that for all $k \in \{1, \ldots, d\}$ we have $\lim_{n\to\infty} |x_n^{(k)} - L^{(k)}| = 0$. Let $\varepsilon > 0$. For each $k \in \{1, \ldots, d\}$ there is an $N_k$ such that for all $n \ge N_k$ we have $|x_n^{(k)} - L^{(k)}| < \varepsilon$. Let $N := \max\{N_k : k = 1, \ldots, d\}$. Then for all $n \ge N$ we obtain $|x_n^{(k)} - L^{(k)}| < \varepsilon$ for all $k \in \{1, \ldots, d\}$, and hence $\|x_n - L\|_\infty < \varepsilon$. ∎

Definition 16.7. A metric space $X$ is called bounded iff there is a point $c \in X$ and an $M \in \mathbb{R}$ so that for all $x \in X$ we have $d(x, c) \le M$.

We will frequently work with subsets of metric spaces. Because it would be inefficient to define properties for metric spaces and also for subsets of metric spaces we recall that any subset of a metric space is a metric space in its own right.

Definition 16.8. Any property of metric spaces can also be viewed as a property of the subsets of metric spaces. Formally, we say that a subset $B$ of a metric space $X$ has property $P$ iff $B$ as a metric space with the induced metric $d|_{B \times B}$ has property $P$.

Proposition 16.9. Let $\{x_n\}_{n=1}^{\infty}$ be a convergent sequence in the metric space $X$. Then $\{x_n : n \in \mathbb{N}\}$ is bounded.

Proof. Adapt the proof of Proposition 2.34. (See Exercise 16-2.) ∎

Subsequences are defined just as for sequences of real numbers.

Definition 16.10. (Compare with Definition 2.39.) Let $X$ be a metric space, let $\{x_n\}_{n=1}^{\infty}$ be a sequence of elements of $X$ and let $\{n_k\}_{k=1}^{\infty}$ be a sequence of natural numbers so that $n_k < n_{k+1}$ for all $k \in \mathbb{N}$. Then $\{x_{n_k}\}_{k=1}^{\infty}$ is called a subsequence of $\{x_n\}_{n=1}^{\infty}$.

As for real numbers, subsequences of convergent sequences have the same limit.
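The componentwise criterion of Theorem 16.4 and the behavior of subsequences can be checked numerically. The following is only an illustrative sketch: the sequence `x`, the limit `L`, and the helper names are hypothetical choices made for this example and do not come from the text. It shows that the uniform-norm distance to the limit is exactly the largest componentwise error, and that the subsequence with indices $n_k = 2^k$ approaches the same limit.

```python
import math

def sup_norm(v):
    """Uniform norm on R^d: the largest absolute value of a component."""
    return max(abs(c) for c in v)

# Hypothetical example data (an assumption for illustration only).
L = (0.0, 1.0, 2.0)

def x(n):
    """n-th term of an example sequence in R^3 whose components tend to L."""
    return (1.0 / n, 1.0 + (-1.0) ** n / n ** 2, 2.0 - 1.0 / math.sqrt(n))

def component_errors(n):
    """Absolute errors |x_n^(k) - L^(k)| for k = 1, 2, 3."""
    return [abs(a - b) for a, b in zip(x(n), L)]

def sup_dist(n):
    """Distance ||x_n - L||_inf in the uniform norm."""
    return sup_norm(tuple(a - b for a, b in zip(x(n), L)))

# Full sequence: the uniform distance equals the worst component error.
for n in (1, 10, 100, 10_000):
    print(n, sup_dist(n), component_errors(n))

# Subsequence with indices n_k = 2**k: the distances also shrink to 0.
for k in (1, 5, 10, 20):
    print(k, sup_dist(2 ** k))
```

A finite table of values is of course no proof; it only makes the inequality $|x_n^{(k)} - L^{(k)}| \le \|x_n - L\|_\infty$ and its converse use in the proof of Theorem 16.4 visible on concrete numbers.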
Proposition 16.11. Let $X$ be a metric space and let $\{x_n\}_{n=1}^{\infty}$ be a convergent sequence with limit $L$. Then every subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ also converges to $L$.

Proof. Mimic the proof of Proposition 2.40. (See Exercise 16-3.) ∎

Exercises

16-1. Componentwise convergence is not equivalent to convergence.
(a) Prove that the sequence
$$f_n(x) := \begin{cases} 0; & \text{for } 0 \le x \le 1 - \frac{1}{n}, \\ 2n\left(x - \left(1 - \frac{1}{n}\right)\right); & \text{for } 1 - \frac{1}{n} \le x \le 1 - \frac{1}{2n}, \\ 2n(1 - x); & \text{for } 1 - \frac{1}{2n} < x \le 1, \end{cases}$$
converges pointwise to $f(x) = 0$, but it does not converge to $f$ in $\left(C^0[0, 1], \|\cdot\|_\infty\right)$. In particular you need to prove that the $f_n$ are continuous.
(b) Prove that convergence in $l^\infty$ implies componentwise convergence.
Hint. Mimic the appropriate part of the proof of Theorem 16.4.
(c) Construct a sequence $\{x_n\}_{n=1}^{\infty}$ in $l^\infty$ so that for all $i \in \mathbb{N}$ the component sequence $\{x_n^{(i)}\}_{n=1}^{\infty}$ converges, but $\{x_n\}_{n=1}^{\infty}$ does not converge in $l^\infty$.
(d) Explain why the proof of Theorem 16.4 cannot be generalized to apply to $l^\infty$. That is, determine which part of the proof fails for $l^\infty$.

16-2. Prove Proposition 16.9.

16-3. Prove Proposition 16.11.

16-4. Let $X$ be a metric space and let $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ be sequences with $\lim_{n\to\infty} d(x_n, y_n) = 0$. Prove that if $\{x_n\}_{n=1}^{\infty}$ converges then so does $\{y_n\}_{n=1}^{\infty}$ and $\lim_{n\to\infty} x_n = \lim_{n\to\infty} y_n$.

16-5. Let $X$ be a subset of $\mathbb{R}^d$ with the metric induced by the Euclidean norm $\|\cdot\|_2$ on $\mathbb{R}^d$. Prove that a sequence $\{x_n\}_{n=1}^{\infty}$ converges to $L$ in $X$ iff for all $k = 1, \ldots, d$ the $k$th component sequence converges in $\mathbb{R}$ to the $k$th component $L^{(k)}$ of $L$.
Hint. For the "$\Leftarrow$" direction, the difference in each component is best chosen to be $\frac{\varepsilon}{\sqrt{d}}$.

16-6. Let $(M, \Sigma, \mu)$ be a finite measure space and let $M(M, \Sigma, \mu)$ be the metric space obtained as in Theorem 15.61 from the measurable functions with the semimetric $d(f, g) = \int_M \frac{|f - g|}{1 + |f - g|} \, d\mu$ from Exercise 15-42. Prove that the sequence $\{[f_n]\}_{n=1}^{\infty}$ converges to $[g]$ in $M(M, \Sigma, \mu)$ iff $\{f_n\}_{n=1}^{\infty}$ converges in measure to $g$.

16-7.
Let $X$ be a metric space. Prove that the following are equivalent.
(a) $X$ is bounded.
(b) For all $c \in X$ there is an $M_c \in \mathbb{R}$ so that for all $x \in X$ we have $d(x, c) \le M_c$.
(c) There is an $M \in \mathbb{R}$ so that for all $x, y \in X$ we have $d(x, y) \le M$.

16-8. Let $X$ be a metric space, let $x \in X$ and let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $X$ such that every subsequence of $\{x_n\}_{n=1}^{\infty}$ has a subsequence that converges to $x$. Prove that $\{x_n\}_{n=1}^{\infty}$ converges to $x$.

16-9. Because every normed space is also a metric space, we have also defined convergence in normed spaces. So let $X$ be a normed space, let $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ be sequences in $X$ and let $\{c_n\}_{n=1}^{\infty}$ be a sequence in $\mathbb{R}$.
(a) Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$, then $\lim_{n\to\infty} (x_n + y_n) = x + y$.
(b) Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$, then $\lim_{n\to\infty} (x_n - y_n) = x - y$.
(c) Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} c_n = c$, then $\lim_{n\to\infty} c_n x_n = cx$.
Hint. Mimic the proofs of the appropriate parts of Theorem 2.14.

16-10. Because every inner product space is also a metric space, we have also defined convergence in inner product spaces. So let $X$ be an inner product space and let $\{x_n\}_{n=1}^{\infty}$ and $\{y_n\}_{n=1}^{\infty}$ be sequences in $X$. Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$, then $\lim_{n\to\infty} \langle x_n, y_n \rangle = \langle x, y \rangle$.

16-11. Let $X$ be a metric space and let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be convergent sequences in $X$ with limits $a$ and $b$, respectively. Prove that $\lim_{n\to\infty} d(a_n, b_n) = d(a, b)$.

16.2 Completeness

By Theorem 2.27, every convergent sequence of real numbers is a Cauchy sequence. The definition of Cauchy sequences easily translates to metric spaces and the implication "convergence implies Cauchy" still holds.

Definition 16.12. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in the metric space $X$. Then $\{x_n\}_{n=1}^{\infty}$ is called a Cauchy sequence iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $d(x_m, x_n) < \varepsilon$.

Theorem 16.13. If the sequence $\{x_n\}_{n=1}^{\infty}$ in the metric space $X$ converges, then it is a Cauchy sequence.

Proof.
Mimic part of the proof of Theorem 2.27. (See Exercise 16-12.) ∎

Conversely, the fact that Cauchy sequences converge is one of the fundamental properties of the real numbers. Recall that by Exercise 2-25 this property is equivalent to the Completeness Axiom (Axiom 1.19). Hence, without this property analysis on the real line would be all but impossible. Convergence of Cauchy sequences is equally important in metric spaces. For example, in numerical considerations, the limit usually is not known. Therefore, to prove convergence of a numerical scheme it is common to prove that the scheme produces a Cauchy sequence and then use convergence of Cauchy sequences to conclude that the scheme will produce a limit (for an example, see the proof of Theorem 13.14). On the theoretical side, convergence of Cauchy sequences is crucial for the proofs of such fundamental results as the differentiability of the inversion operator (Theorem 17.32), Banach's Fixed Point Theorem (Theorem 17.64), the Implicit Function Theorem (Theorem 17.65), Riesz' Representation Theorem (Theorem 20.26), Picard and Lindelöf's Existence and Uniqueness Theorem (Theorem 22.6), and the Lax-Milgram Lemma (Lemma 23.4). Hence, we will always give special attention to spaces in which Cauchy sequences converge. Note that in a normed space we automatically assume the metric is induced by the norm.

Definition 16.14. A metric space $X$ is called complete iff all Cauchy sequences in $X$ converge. A complete normed space is called a Banach space and a complete inner product space is called a Hilbert space.

The real numbers are the simplest example of a complete metric space. Unsurprisingly, the spaces that are most commonly used in analysis are complete.

Theorem 16.15. $\left(\mathbb{R}^d, \|\cdot\|_\infty\right)$ is complete, that is, it is a Banach space.

Proof. We need to prove that every Cauchy sequence in $\mathbb{R}^d$ converges.
A typical completeness proof starts with a Cauchy sequence, constructs an element that should be the limit and then proves that the element is indeed the limit. Let $\{x_n\}_{n=1}^{\infty}$ be a Cauchy sequence in $\mathbb{R}^d$ and let $\varepsilon > 0$. There is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $\|x_m - x_n\|_\infty < \varepsilon$. But then for all $j = 1, \ldots, d$ and all $m, n \ge N$ we obtain $|x_m^{(j)} - x_n^{(j)}| \le \|x_m - x_n\|_\infty < \varepsilon$, which means that each component sequence $\{x_n^{(j)}\}_{n=1}^{\infty}$ is a Cauchy sequence. Because $\mathbb{R}$ is complete, each component sequence $\{x_n^{(j)}\}_{n=1}^{\infty}$ has a limit $L^{(j)}$. Let $L = \left(L^{(1)}, \ldots, L^{(d)}\right)$. By Theorem 16.4 $\{x_n\}_{n=1}^{\infty}$ converges to $L$. Because $\{x_n\}_{n=1}^{\infty}$ was arbitrary, every Cauchy sequence in $\mathbb{R}^d$ converges and the space is complete. ∎

Completeness of $\left(C^0[a, b], \|\cdot\|_\infty\right)$ will be proved in Exercise 16-13. To prove that the $L^p$ spaces are Banach spaces, it is helpful to first analyze convergence of series in normed spaces. Series, convergence of series and absolute convergence are defined just like for real numbers. Lemma 16.18 characterizes Banach spaces in terms of convergent series.

Definition 16.16. (Compare with Definition 6.1.) Let $X$ be a normed space and let $\{a_j\}_{j=1}^{\infty}$ be a sequence in $X$. The partial sums of the series $\sum_{j=1}^{\infty} a_j$ are defined to be $s_n := \sum_{j=1}^{n} a_j$. The series is said to converge iff the sequence of partial sums converges and it is said to diverge otherwise. For a convergent series, the limit is denoted $\sum_{j=1}^{\infty} a_j$.

Definition 16.17. (Compare with Definition 6.12.) Let $X$ be a normed space. Then a series $\sum_{j=1}^{\infty} a_j$ in $X$ converges absolutely iff the series $\sum_{j=1}^{\infty} \|a_j\|$ converges in $\mathbb{R}$.

In normed spaces, absolutely convergent series do not automatically converge. In fact, every absolutely convergent series converges if and only if the space is a Banach space.

Lemma 16.18. A normed space $X$ is a Banach space iff every absolutely convergent series in $X$ also converges in $X$.

Proof. For "$\Rightarrow$," mimic the proof of Proposition 6.13. (Exercise 16-14.)
For "$\Leftarrow$," let $X$ be a normed space in which every absolutely convergent series converges and let $\{a_n\}_{n=1}^{\infty}$ be a Cauchy sequence. Set $a_0 := 0$, $N_0 := 0$ and inductively for each $\varepsilon_k := \frac{1}{2^k}$ find an $N_k > N_{k-1}$ so that for all $n, m \ge N_k$ the inequality $\|a_n - a_m\| \le \frac{1}{2^k}$ holds. For all $k \in \mathbb{N}$, let $d_k := a_{N_k} - a_{N_{k-1}}$. By construction of the $N_k$, for all $j \ge 2$ we have $\|d_j\| \le \frac{1}{2^{j-1}}$, so
$$\sum_{j=1}^{\infty} \|d_j\| \le \|d_1\| + \sum_{j=2}^{\infty} \frac{1}{2^{j-1}} < \infty.$$
Thus the series $\sum_{j=1}^{\infty} d_j$ converges absolutely, and hence, by hypothesis, it converges to a limit $L \in X$. But for all $k \in \mathbb{N}$ we have $\sum_{j=1}^{k} d_j = a_{N_k}$, so $\{a_{N_k}\}_{k=1}^{\infty}$ is a convergent subsequence of $\{a_n\}_{n=1}^{\infty}$.

To prove that $L := \lim_{k\to\infty} a_{N_k}$ is the limit of $\{a_n\}_{n=1}^{\infty}$, let $\varepsilon > 0$. Then there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $\|a_{N_k} - L\| < \frac{\varepsilon}{2}$ and there is an $M \in \mathbb{N}$ so that for all $n, m \ge M$ we have $\|a_n - a_m\| < \frac{\varepsilon}{2}$. But then we can find a $k \ge K$ so that $N_k \ge M$. Then for all $n \ge M$ we obtain the inequality
$$\|a_n - L\| \le \|a_n - a_{N_k}\| + \|a_{N_k} - L\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Hence, $\{a_n\}_{n=1}^{\infty}$ converges to $L$ and $X$ is a Banach space. ∎

Now we are ready to tackle $L^p$ spaces.

Theorem 16.19. Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p \le \infty$. Then $L^p(M, \Sigma, \mu)$ is complete, that is, it is a Banach space.

Proof. By Lemma 16.18 we need to prove that every absolutely convergent series converges in $L^p$. So let $\sum_{j=1}^{\infty} [f_j]$ be a series in $L^p(M, \Sigma, \mu)$ with $\sum_{j=1}^{\infty} \|[f_j]\|_p < \infty$. For each $j \in \mathbb{N}$, the function $f_j$ will be a fixed representative of $[f_j]$.

First, consider the case $p = \infty$. For all natural numbers $j \in \mathbb{N}$, we define the set $N_j := \{x \in M : |f_j(x)| > \|f_j\|_\infty\}$. Then $N := \bigcup_{j=1}^{\infty} N_j$ is a $\mu$-null set and $\sum_{j=1}^{\infty} f_j$ converges on $M \setminus N$. Define
$$f(x) := \begin{cases} \sum_{j=1}^{\infty} f_j(x); & \text{for } x \in M \setminus N, \\ 0; & \text{for } x \in N. \end{cases}$$
Because $\bigcup_{j=1}^{\infty} N_j$ is $\mu$-null we infer
$$\lim_{n\to\infty} \left\| [f] - \sum_{j=1}^{n} [f_j] \right\|_\infty \le \lim_{n\to\infty} \sum_{j=n+1}^{\infty} \|[f_j]\|_\infty = 0,$$
and hence by Lemma 16.18 $L^\infty(M, \Sigma, \mu)$ is complete.

This leaves the case $p \in [1, \infty)$.
For all $x \in M$ set $g(x) := \left(\sum_{j=1}^{\infty} |f_j(x)|\right)^p$. By Minkowski's Inequality, for each $n \in \mathbb{N}$ we obtain
$$\left(\int_M \left(\sum_{j=1}^{n} |f_j|\right)^p d\mu\right)^{\frac{1}{p}} \le \sum_{j=1}^{n} \|f_j\|_p \le \sum_{j=1}^{\infty} \|[f_j]\|_p.$$
Thus by the Monotone Convergence Theorem
$$\int_M g \, d\mu = \lim_{n\to\infty} \int_M \left(\sum_{j=1}^{n} |f_j|\right)^p d\mu \le \left(\sum_{j=1}^{\infty} \|[f_j]\|_p\right)^p < \infty,$$
and hence $g$ is integrable. In particular, $g$ is finite $\mu$-a.e., which means that for almost every $x \in M$ the series $\sum_{j=1}^{\infty} f_j(x)$ converges absolutely. Define the function
$$f(x) := \begin{cases} \sum_{j=1}^{\infty} f_j(x); & \text{if } g(x) < \infty, \\ 0; & \text{otherwise.} \end{cases}$$
Then the function $f$ is measurable and it satisfies $|f|^p \le g$, so $f \in \mathcal{L}^p(M, \Sigma, \mu)$. Because $\lim_{n\to\infty} \left| f(x) - \sum_{j=1}^{n} f_j(x) \right|^p = 0$ $\mu$-a.e. and $\left| f(x) - \sum_{j=1}^{n} f_j(x) \right|^p \le g(x)$ $\mu$-a.e., by the Dominated Convergence Theorem we conclude $\lim_{n\to\infty} \left\| f - \sum_{j=1}^{n} f_j \right\|_p = 0$. Therefore the series $\sum_{j=1}^{\infty} [f_j]$ converges to $[f]$. By Lemma 16.18, this means that $L^p(M, \Sigma, \mu)$ is complete. ∎

The most natural example of a metric space that is not complete is the set of rational numbers $\mathbb{Q}$. To see why $\mathbb{Q}$ is not complete, recall that by Proposition 1.49 $\sqrt{2}$ is not a rational number. Use Theorem 1.36 to construct a sequence $\{q_n\}_{n=1}^{\infty}$ of rational numbers that (in $\mathbb{R}$) converges to $\sqrt{2}$ (for a concrete example, consider Exercise 2-17). Then, because $\{q_n\}_{n=1}^{\infty}$ converges in $\mathbb{R}$, $\{q_n\}_{n=1}^{\infty}$ is a Cauchy sequence. But $\{q_n\}_{n=1}^{\infty}$ does not have a limit in $\mathbb{Q}$, because if $L \in \mathbb{Q} \subseteq \mathbb{R}$ was a limit of $\{q_n\}_{n=1}^{\infty}$, then by the uniqueness of limits in $\mathbb{R}$ we would obtain $L = \sqrt{2} \notin \mathbb{Q}$, a contradiction.

The above argument shows the typical way in which metric spaces fail to be complete. Incomplete spaces contain sequences that appear to converge to an element that "just is not there." Theorem 16.89 will show that this is indeed the only way in which a space can fail to be complete. Further examples of spaces that are not complete are given in Exercise 16-15. In particular, Exercise 16-15d shows that if we use the method that produced $L^1$ with the Riemann integral rather than the Lebesgue integral, then we do not obtain a complete space. This is the reason why for theoretical considerations the Lebesgue integral is preferred over the Riemann integral.
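The incompleteness of $\mathbb{Q}$ can be made concrete with exact rational arithmetic. The sketch below uses the Heron (Babylonian) iteration $q_{n+1} = \frac{1}{2}\left(q_n + \frac{2}{q_n}\right)$ with $q_1 = 1$; this particular sequence is an assumption chosen for illustration and need not be the one intended in Exercise 2-17. Every term is an exact fraction, consecutive terms get closer and closer, and yet no term squares to 2.

```python
from fractions import Fraction

def sqrt2_rationals(count):
    """Return the first `count` terms of q_{n+1} = (q_n + 2/q_n)/2 with q_1 = 1.

    All arithmetic is exact, so every term is a rational number.
    """
    q = Fraction(1)
    terms = []
    for _ in range(count):
        terms.append(q)
        q = (q + 2 / q) / 2
    return terms

terms = sqrt2_rationals(6)

# Each line shows a rational term and how far its square is from 2.
for q in terms:
    print(q, float(q * q - 2))

# Distances between consecutive terms, which shrink rapidly
# (the behavior one expects from a Cauchy sequence).
gaps = [abs(b - a) for a, b in zip(terms, terms[1:])]
print([float(g) for g in gaps])
```

Printing more terms only makes $q_n^2 - 2$ smaller; it never becomes zero, in line with the fact that $\sqrt{2} \notin \mathbb{Q}$.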
Exercises

16-12. Prove Theorem 16.13.

16-13. Prove that $C^0[a, b]$ with the $\|\cdot\|_\infty$ norm is a Banach space.
Hint. First, prove that if $\{f_n\}_{n=1}^{\infty}$ is a Cauchy sequence, then every sequence $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence. Then prove that the pointwise limit $f$ actually is a uniform limit. Conclude that $f$ is continuous (Corollary 11.8). Then prove that $\|f_n - f\|_\infty$ goes to zero.

16-14. Prove the "$\Rightarrow$" part of Lemma 16.18.

16-15. Metric spaces that are not complete.
(a) Prove that the interval $(0, 1)$ is not a complete metric space.
(b) Prove that $C^1(-1, 1)$ with the uniform norm $\|\cdot\|_\infty$ is not a Banach space.
Hint. Example 11.10.
(c) Prove that $C^0[0, 1]$ with the norm $\|f\|_1 := \int_0^1 |f(x)| \, dx$ is not a Banach space.
(d) Let $\left(R^1[0, 1], \|\cdot\|_1\right)$ be the space of Riemann integrable functions on $[0, 1]$ with functions $f, g$ with $\int_0^1 |f(x) - g(x)| \, dx = 0$ identified into one object like in $L^1[0, 1]$ and with the norm $\|f\|_1 := \int_0^1 |f(x)| \, dx$. Prove that this space is not a Banach space.
Hint. Let $C^Q$ be a Cantor set with positive measure (see Exercise 8-3e) and let $\{C_n^Q\}_{n=1}^{\infty}$ be the sets used to construct $C^Q$ as in Definition 7.22. Prove that $\left\{\left[\mathbf{1}_{C_n^Q}\right]\right\}_{n=1}^{\infty}$ is a Cauchy sequence in $L^1[0, 1]$ that converges to $\left[\mathbf{1}_{C^Q}\right]$ (which is in $L^1[0, 1]$ by Exercise 9-29) in the $L^1$ norm. Use the fact that $R^1[0, 1]$ is a subspace of $L^1[0, 1]$ to conclude that $\left\{\left[\mathbf{1}_{C_n^Q}\right]\right\}_{n=1}^{\infty}$ is a Cauchy sequence in $R^1[0, 1]$. Then use that $\mathbf{1}_{C^Q}$ is not Riemann integrable (Exercise 8-15) to prove that $\left\{\left[\mathbf{1}_{C_n^Q}\right]\right\}_{n=1}^{\infty}$ cannot have a limit in $R^1[0, 1]$. In the last step some care is needed because formally the objects we are working with are equivalence classes.
(e) Let $1 < p < \infty$. Let $\left(R^p[0, 1], \|\cdot\|_p\right)$ be the Riemann integrable functions on $[0, 1]$ with $f, g$ so that $\int_0^1 |f(x) - g(x)|^p \, dx = 0$ identified into one object like in $L^p[0, 1]$ and with norm $\|f\|_p := \left(\int_0^1 |f(x)|^p \, dx\right)^{\frac{1}{p}}$. Prove that this space is not a Banach space.
(f) Prove that $l^1$ with the norm $\left\|\{x_n\}_{n=1}^{\infty}\right\|_\infty := \sup\{|x_n| : n \in \mathbb{N}\}$ is not a Banach space.

16-16. Let $X$ be a metric space and let $\{x_n\}_{n=1}^{\infty}$ be a Cauchy sequence. Prove that if $\{x_n\}_{n=1}^{\infty}$ has a convergent subsequence, then it converges.

16-17. Let $a < b$. Prove that on the space $B := \left\{f \in C^1(a, b) : \|f\|_\infty < \infty, \|f'\|_\infty < \infty\right\}$ the function $\|f\|_1 := \|f\|_\infty + \|f'\|_\infty$ defines a norm and that $\left(B, \|\cdot\|_1\right)$ is a Banach space.

16-18. Give two proofs that for $1 \le p \le \infty$ the space $l^p$ is a Banach space.
(a) Prove it by applying Theorem 16.19.
(b) Give a direct proof.
Hint. Mimic the proof of Theorem 16.19.

16-19. Unconditional convergence of series in Banach spaces.
(a) Define what it means when a series converges unconditionally in a normed space.
(b) Prove that in Banach spaces absolute convergence implies unconditional convergence.
Hint. Mimic the "$\Rightarrow$" part of the proof of Theorem 6.18.
(c) In Banach spaces, unconditional convergence does not imply absolute convergence. Prove that the series $\sum_{k=1}^{\infty} \frac{1}{k} e_k$ converges unconditionally, but not absolutely, in $l^\infty$.

16.3 Continuous Functions

Once convergence has been defined, it is possible to talk about continuity. After all, continuous functions are exactly those functions that preserve convergent sequences. We will define continuity directly, that is, not via limits of functions. The technicalities that would make a definition via limits of functions tedious are discussed at the end of this section.

Definition 16.20. $\varepsilon$-$\delta$ formulation of continuity. Let $X, Y$ be metric spaces and let $f : X \to Y$. Then $f$ is called continuous at $x \in X$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in X$ with $d_X(z, x) < \delta$ we have $d_Y(f(z), f(x)) < \varepsilon$. A function that is continuous at every $x \in X$ is simply called continuous on $X$.

Notation 16.21. When the distance between two objects is computed, the space in which the objects reside determines which metric needs to be used.
Therefore, unlike in Definition 16.20, we typically do not index metrics and norms with the space on which they act. Metrics and norms will only be indexed when we are working with several metrics or norms on the same space (for examples, see Section 16.6) or when a particular norm is needed for a proof (Section 18.4 is a good example).

The first examples are (of course) all continuous functions $f : I \to \mathbb{R}$, where $I$ is an interval. Next note that the natural projections on $\mathbb{R}^d$ are continuous.

Example 16.22. For $k \in \{1, \ldots, d\}$, define the natural projection $\pi_k : \mathbb{R}^d \to \mathbb{R}$ on the $k$th coordinate by $\pi_k(x_1, \ldots, x_d) := x_k$. Then $\pi_k$ is continuous on $\mathbb{R}^d$ when $\mathbb{R}^d$ is equipped with the $\|\cdot\|_\infty$ norm.

Let $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ and let $\varepsilon > 0$. Then, with $\delta := \varepsilon$, for all elements $z \in \mathbb{R}^d$ with $\|z - x\|_\infty < \delta$ we obtain $|\pi_k(z) - \pi_k(x)| = |z_k - x_k| \le \|z - x\|_\infty < \varepsilon$, and hence $\pi_k$ is continuous at $x$. ∎

To obtain more examples, we need to establish some properties of continuous functions. To make some of these results easier to prove, we first connect continuity to sequences.

Theorem 16.23. Sequence formulation of continuity. Let $X, Y$ be metric spaces and let $x \in X$. Then $f : X \to Y$ is continuous at $x \in X$ iff for all sequences $\{z_n\}_{n=1}^{\infty}$ in $X$ with $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = f(x)$.

Proof. Adapt the proof of Theorem 3.25 (Exercise 16-20). ∎

Continuity is compatible with algebraic operations, if they are available, and it is preserved by compositions.

Theorem 16.24. Let $X$ be a metric space, let $Y$ be a normed space and let the functions $f, g : X \to Y$ be continuous. Then $f + g$ and $f - g$ are continuous.

Proof. Adjust the appropriate parts of the proof of Theorem 3.27. This can be done via Theorem 16.23 and Exercises 16-9a and 16-9b or with a direct $\varepsilon$-$\delta$ argument that mimics the appropriate part of the proof of Theorem 2.14. (Exercise 16-21.) ∎
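As a numerical aside on Example 16.22 and Theorem 16.23, the inequality $|\pi_k(z) - \pi_k(x)| \le \|z - x\|_\infty$ can be observed along a convergent sequence. The sketch below is an illustration under stated assumptions only: the point `x`, the sequence `z`, and the helper names are hypothetical and were picked for this example.

```python
def sup_norm(v):
    """Uniform norm on R^d: the largest absolute value of a component."""
    return max(abs(c) for c in v)

def projection(k, v):
    """Natural projection pi_k onto the k-th coordinate (k counts from 1)."""
    return v[k - 1]

# Hypothetical example data (an assumption for illustration only).
x = (1.0, -2.0, 0.5)

def z(n):
    """n-th term of an example sequence in R^3 that tends to x."""
    return (1.0 + 1.0 / n, -2.0 + 1.0 / n ** 2, 0.5 - 1.0 / (2 * n))

def gap(n):
    """Distance ||z_n - x||_inf in the uniform norm."""
    return sup_norm(tuple(a - b for a, b in zip(z(n), x)))

# For each n, every projection error is bounded by the uniform distance,
# so pi_k(z_n) tends to pi_k(x) whenever z_n tends to x.
for n in (1, 10, 100, 1000):
    errors = [abs(projection(k, z(n)) - projection(k, x)) for k in (1, 2, 3)]
    print(n, gap(n), errors)
```

The choice $\delta := \varepsilon$ in Example 16.22 is visible here: whenever `gap(n)` drops below a tolerance, all three projection errors are below the same tolerance.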
Theorem 16.25. Let $X$ be a metric space, let $Y$ be a normed space, and let the functions $f : X \to Y$ and $\alpha : X \to \mathbb{R}$ be continuous. Then $\alpha f : X \to Y$ is continuous.

Proof. Adjust the appropriate part of the proof of Theorem 3.27. This can be done via Theorem 16.23 and Exercise 16-9c or with a direct $\varepsilon$-$\delta$ argument that mimics the appropriate part of the proof of Theorem 2.14. (Exercise 16-22.) ∎

Theorem 16.26. Let $X$ be a metric space, let $Y$ be an inner product space, and let $f, g : X \to Y$ be continuous. Then $\langle f, g \rangle : X \to \mathbb{R}$ is continuous.

Proof. Adjust the appropriate part of the proof of Theorem 3.27. This can be done via Theorem 16.23 and Exercise 16-10 or with a direct $\varepsilon$-$\delta$ argument that mimics the appropriate part of the proof of Theorem 2.14. (Exercise 16-23.) ∎

Theorem 16.27. Let $X, Y, Z$ be metric spaces and let $g : X \to Y$ and $f : Y \to Z$ be continuous functions. Then $f \circ g : X \to Z$ is continuous.

Proof. Mimic the proof of Theorem 3.30. (See Exercise 16-24.) ∎

The above shows that functions on $\mathbb{R}^d$, like $f(x, y) := x + y \cos\left(x^2 - y^2\right)$ (see Exercise 16-25), which combine the components using known continuous operations and functions are continuous. Our definition of continuity also gives access to more complicated situations. For example, it is well known that integration is a "continuous operation." The following example makes this observation more precise.

Example 16.28. Let $(M, \Sigma, \mu)$ be a measure space, let $1 \le p \le \infty$, let $q$ be such that $\frac{1}{p} + \frac{1}{q} = 1$ and let $[g] \in L^q(M, \Sigma, \mu)$. Then $I_{[g]}([f]) := \int_M fg \, d\mu$ defines a continuous function $I_{[g]} : L^p(M, \Sigma, \mu) \to \mathbb{R}$.

Because the elements of $L^p(M, \Sigma, \mu)$ and $L^q(M, \Sigma, \mu)$ are equivalence classes and $I_{[g]}$ is defined in terms of representatives of the equivalence classes, we must first prove that the functions $I_{[g]}$ are well defined. For functions $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g \in \mathcal{L}^q(M, \Sigma, \mu)$ set $I_g(f) := \int_M fg \, d\mu$.
By Hölder's inequality (Theorem 15.48 and Exercise 15-43), the integral in $I_g(f)$ exists for all functions $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g \in \mathcal{L}^q(M, \Sigma, \mu)$ and the inequality $|I_g(f)| \le \|f\|_p \|g\|_q$ holds. In particular this means if $f_1, f_2 \in [f]$ and $g_1, g_2 \in [g]$, then
$$\begin{aligned} |I_{g_1}(f_1) - I_{g_2}(f_2)| &\le |I_{g_1}(f_1) - I_{g_1}(f_2)| + |I_{g_1}(f_2) - I_{g_2}(f_2)| \\ &= |I_{g_1}(f_1 - f_2)| + |I_{g_1 - g_2}(f_2)| \\ &\le \|f_1 - f_2\|_p \|g_1\|_q + \|f_2\|_p \|g_1 - g_2\|_q = 0. \end{aligned}$$
That is, the value of $I_g(f)$ does not change if different representatives of the equivalence class of $[f] \in L^p(M, \Sigma, \mu)$ and $[g] \in L^q(M, \Sigma, \mu)$ are chosen.

Now if the functions $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g_1, g_2 \in \mathcal{L}^q(M, \Sigma, \mu)$ satisfy $[g_1] = [g_2]$, then $|I_{g_1}(f) - I_{g_2}(f)| = 0$. Therefore, for $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $[g] \in L^q(M, \Sigma, \mu)$ setting $J_{[g]}(f) := I_g(f)$ defines a unique function on $\mathcal{L}^p(M, \Sigma, \mu)$. Moreover, if the functions $f_1, f_2 \in \mathcal{L}^p(M, \Sigma, \mu)$ satisfy $[f_1] = [f_2]$ and $[g] \in L^q(M, \Sigma, \mu)$, then $|J_{[g]}(f_1) - J_{[g]}(f_2)| = |I_g(f_1) - I_g(f_2)| = 0$. Therefore $I_{[g]}([f]) := J_{[g]}(f)$ defines a well-defined function on $L^p(M, \Sigma, \mu)$.

Now let $[g] \in L^q(M, \Sigma, \mu)$. To see that $I_{[g]}$ is continuous, first note that by Hölder's inequality we infer $|I_{[g]}([f])| \le \|[f]\|_p \|[g]\|_q$ for all $[f] \in L^p(M, \Sigma, \mu)$ and $[g] \in L^q(M, \Sigma, \mu)$. Let $[f_1] \in L^p(M, \Sigma, \mu)$ and let $\varepsilon > 0$. Then for all elements $[f_2] \in L^p(M, \Sigma, \mu)$ with $\|[f_1] - [f_2]\|_p < \delta := \frac{\varepsilon}{\|[g]\|_q + 1}$ we obtain
$$|I_{[g]}([f_1]) - I_{[g]}([f_2])| = |I_{[g]}([f_1 - f_2])| \le \|[f_1 - f_2]\|_p \|[g]\|_q < \varepsilon.$$
Hence, $I_{[g]}$ is continuous. ∎

Note that each $I_{[g]}$ is actually a pretty complicated function. It maps equivalence classes of functions to numbers. For such complicated constructions, it is important to establish some type of mental hierarchy so that the functions that are elements of the space are distinguished from the functions that act on the space itself.
The best way to achieve this is to practice with the complicated settings. (Consider, for example, Exercise 16-26, where functions map sequences to numbers.) When working with such functions, the following notations can also help establish a mental hierarchy.

Definition 16.29 A continuous function into the real numbers is also referred to as a functional. A continuous function between metric spaces is also called an operator.

Note that in Example 16.28 we have proved a bit more than just continuity. We have established the inequality $|I_{[g]}([f_1]) - I_{[g]}([f_2])| \le \|[g]\|_q \, \|[f_1] - [f_2]\|_p$ for all $[f_1], [f_2] \in L^p(M, \Sigma, \mu)$, where $\|[g]\|_q$ actually is a constant for the function $I_{[g]}$. Functions that satisfy such a condition are called Lipschitz continuous.

Definition 16.30 Let $X, Y$ be metric spaces. A function $f : X \to Y$ is called Lipschitz continuous iff there is an $L > 0$, called the Lipschitz constant of $f$, so that for all $x, y \in X$ we have $d(f(x), f(y)) \le L \, d(x, y)$.

It is easy to see that Lipschitz continuous functions are continuous.

Proposition 16.31 Let $X, Y$ be metric spaces and let $f : X \to Y$ be a Lipschitz continuous function. Then $f$ is continuous.

Proof. Exercise 16-30a. ∎

16.3.1 The Trouble with Limits of Functions

The familiar Definition 3.23 from calculus was not used to define continuity in metric spaces because a technical problem allows us to only define limits at accumulation points of the space.

Definition 16.32 Let $X$ be a metric space. Then $x \in X$ is called an accumulation point of $X$ iff for all $\delta > 0$ there is a $z \in X \setminus \{x\}$ such that $d(z, x) < \delta$.

Definition 16.33 $\varepsilon$-$\delta$ formulation of the limit of a function. Let $X, Y$ be metric spaces, let $f : X \setminus \{x\} \to Y$ and let $x$ be an accumulation point of $X$. Then $L \in Y$ is called the limit of $f$ at $x \in X$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in X \setminus \{x\}$ with $d(z, x) < \delta$ we have $d(f(z), L) < \varepsilon$.
We require that the point $x$ is an accumulation point, because otherwise the limit of a function would not be unique. Indeed, if $x$ is not an accumulation point (that is, if $x$ is an isolated point), then for some $\delta > 0$ there is no $z \in X \setminus \{x\}$ with $d(z, x) < \delta$. Because there are no such points, it is vacuously true that for all such points $z$ and any $\varepsilon > 0$ and $L \in Y$ we have $d(f(z), L) < \varepsilon$, no matter what $L$ is. So at an isolated point $x$ any element of $Y$ would be a limit of $f$, which is a rather pathological situation. To avoid this pathology, we demand that limits are only taken at accumulation points.

Note that for functions $f : I \setminus \{x\} \to \mathbb{R}$, where $I$ is an open interval and $x \in I$, the definition of the limit via Definition 16.33 is consistent with Definition 3.1, because $x$ clearly is an accumulation point of $I \setminus \{x\}$ and because we can apply Theorem 3.7. Exercises 16-28 and 16-29 show that Definition 16.33 removes the formal problems we had identified for Definition 3.1.

Regarding continuity, it is reassuring to note that there is no need to worry about isolated points. The definition of continuity states that as the inputs get close to $x$, the outputs must get close to $f(x)$. If the inputs cannot get close to $x$, then the outputs vacuously get close to $f(x)$. This is still a silly notion, but there is no logical problem and usually no one worries about continuity at an isolated point. The definition implies that if a function is defined at an isolated point, then it is continuous there. So be it.

Aside from the problem with isolated points, limits work very much as they do for functions on intervals $(a, b)$.

Theorem 16.34 Sequence formulation of the limit of a function. Let $X, Y$ be metric spaces, let $x$ be an accumulation point of $X$ and let $L \in Y$. Then $\lim_{z \to x} f(z) = L$ iff for all sequences $\{z_n\}_{n=1}^{\infty}$ in $X \setminus \{x\}$ with $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} f(z_n) = L$.

Proof. Mimic the proof of Theorem 3.7. (Exercise 16-27.) ∎
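The vacuous-truth argument for isolated points can be checked by brute force on a small example. The following is only a sketch under stated assumptions: the three-point space, the function, the finite grids of $\varepsilon$ and $\delta$ values, and the helper name `is_limit` are all hypothetical choices made for illustration, and a finite grid can of course only illustrate the quantifiers of Definition 16.33, not verify them.

```python
def is_limit(points, f, x, L, epsilons, deltas, d=lambda a, b: abs(a - b)):
    """Check the epsilon-delta condition of Definition 16.33 on finite grids.

    For every eps in `epsilons` there must be a delta in `deltas` such that
    every z != x in `points` with d(z, x) < delta satisfies d(f(z), L) < eps.
    """
    for eps in epsilons:
        found_delta = any(
            all(d(f(z), L) < eps for z in points if z != x and d(z, x) < delta)
            for delta in deltas
        )
        if not found_delta:
            return False
    return True


# Hypothetical space X = {0, 1, 2} with the usual distance: every point is isolated.
X = [0.0, 1.0, 2.0]
f = lambda z: z * z
eps_grid = [1.0, 0.1, 0.001]
delta_grid = [1.0, 0.5]

# No z != 1 lies within distance 0.5 of 1, so the inner all() runs over an
# empty collection and is vacuously true for every candidate limit L.
candidates = [-7.0, 0.0, 1.0, 42.0]
results = [is_limit(X, f, 1.0, L, eps_grid, delta_grid) for L in candidates]
print(results)
```

Every candidate passes the test, which is exactly the non-uniqueness that the restriction to accumulation points rules out.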
Exercises

16-20. Prove Theorem 16.23.

16-21. Prove Theorem 16.24.

16-22. Prove Theorem 16.25.

16-23. Prove Theorem 16.26.

16-24. Prove Theorem 16.27.

16-25. Prove that $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) := xy + \cos\left(x^2 - y^2\right)$ is continuous.
Hint. Break up the way the function is computed into elementary steps that are continuous.

16-26. Let $1 \le p \le \infty$. For $k \in \mathbb{N}$, define the projection on the $k$th component $\pi_k : l^p \to \mathbb{R}$ by $\pi_k\left(\{a^{(j)}\}_{j=1}^{\infty}\right) := a^{(k)}$. Prove that for all $p$ and $k$ the function $\pi_k : l^p \to \mathbb{R}$ is continuous.

16-27. Prove Theorem 16.34.

16-28. Exercise 3-8 revisited.
(a) Prove that $\lim_{z \to 0} \sqrt{z} = 0$. In particular, this means that with Definition 16.33 there are no formal problems with the limit of the square root function at 0.
(b) Explain why, despite the result in Exercise 16-28a, Definition 16.33 does not supersede the definition of one-sided limits.
Hint. Step functions.

16-29. Prove that the function from Exercise 3-32 does not have a limit at $x = 0$.

16-30. Let $X, Y$ be metric spaces and let $f : X \to Y$ be a function.
(a) Prove that if $f$ is Lipschitz continuous, then $f$ is continuous.
(b) Define uniform continuity for functions $g : X \to Y$.
(c) Prove that if $f$ is Lipschitz continuous, then $f$ is uniformly continuous.

16-31. Let $X$ be a metric space and let $z \in X$. Prove that $d(\cdot, z) : X \to \mathbb{R}$ is continuous.

16-32. Let $X$ be a metric space and let $f, g : X \to \mathbb{R}$ be continuous functions. Prove that if $g(x) \ne 0$ for all $x \in X$, then $\frac{f}{g}$ is continuous on $X$.
Hint. Use the proof of Theorem 2.14 as guidance.

16-33. Paths and continuity.
(a) Let $C$ be a subset of a normed space $X$, let $x \in C$ be such that there is an $\varepsilon > 0$ so that $\{ z \in X : \|z - x\| < \varepsilon \} \subseteq C$ and let $Y$ be a metric space. Prove that $f : C \to Y$ is continuous at $x$ iff for all continuous functions $p : [0, 1] \to X$ so that $p(0) = x$, the composition $f \circ p : [0, 1] \to Y$ is continuous at 0.
(b) Prove that $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) := \begin{cases} [\ldots] & \text{for } (x, y) \ne (0, 0), \\ 0 & \text{for } x = y = 0, \end{cases}$ is not continuous at the origin.

16-34. A polynomial $p : \mathbb{R}^d \to \mathbb{R}$ is a finite sum of finite products of the variables. That is, there are an $n \in \mathbb{N}$, $c_1, \ldots, c_n \in \mathbb{R}$ and $(a_1^i, \ldots, a_d^i) \in (\mathbb{N} \cup \{0\})^d$ so that for all $(x_1, \ldots, x_d) \in \mathbb{R}^d$ we have $p(x_1, \ldots, x_d) = \sum_{i=1}^{n} c_i x_1^{a_1^i} \cdots x_d^{a_d^i}$. Prove that every polynomial $p : \mathbb{R}^d \to \mathbb{R}$ is continuous.
Hint. Induction. First, for $n = 1$, on the sum of the exponents $a_j^1$ to take care of the products, then on $n$.

16-35. Let $X$ and $Y$ be metric spaces. A sequence of functions $\{f_n\}_{n=1}^{\infty}$ from $X$ to $Y$ is called uniformly convergent to the function $f : X \to Y$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in X$ we have $d(f_n(x), f(x)) < \varepsilon$. Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions $f_n : X \to Y$ that converges uniformly to $f : X \to Y$ and let all $f_n$ be continuous at $c \in X$. Prove that $f$ is continuous at $c$.
Hint. Mimic the proof of Theorem 11.7.

16-36. For natural numbers $j \ge 2$, let
$$g_j(x) := \begin{cases} j(j+1)\left(x - \frac{1}{j+1}\right); & \text{for } \frac{1}{j+1} \le x \le \frac{1}{j}, \\ j(j-1)\left(\frac{1}{j-1} - x\right); & \text{for } \frac{1}{j} \le x \le \frac{1}{j-1}, \\ 0; & \text{otherwise.} \end{cases}$$
Prove that the function $f : \mathbb{R} \to l^{\infty}$ defined by $f(x) := \sum_{j=2}^{\infty} g_j(x) e_j$ is continuous on $\mathbb{R} \setminus \{0\}$, bounded, and for any sequence $x_n \to 0$ with $x_n > 0$ the limit $\lim_{n \to \infty} f(x_n)$ does not exist.
Note. This function shows that in infinite dimensional spaces functions can have discontinuities that are neither removable, nor jumps, nor infinite, nor by oscillation.

16.4 Open and Closed Sets

Results, such as the Intermediate Value Theorem (Theorem 3.34) or the fact that continuous functions assume absolute maxima and minima (Theorem 3.44), refer specifically to the ordering of the real numbers. This ordering is not available in metric spaces. To obtain more general versions of these theorems, we must first establish properties that will take the place of the ordering of the real numbers.
Connectedness (see Definition 16.90), which is needed to prove a version of the Intermediate Value Theorem, is defined purely in topological terms. Compactness, which allows us to prove a version of Theorem 3.44 for metric spaces, can be defined in terms of sequences, but it is more commonly defined in topological terms (see Theorem 16.72). Thus, before we continue, we introduce the requisite topological ideas. In a nutshell, topology revolves around open and closed sets. Figure 34 on page 304 summarizes the main properties of these sets.

With a metric envisioned as measuring distances, a ball must be the set of points within a certain distance of a center point.

Definition 16.35 Let $X$ be a metric space, let $x \in X$ and let $\varepsilon > 0$. Then the open ball of radius $\varepsilon$ about $x$ is defined to be $B_\varepsilon(x) := \{ p \in X : d(p, x) < \varepsilon \}$.

An open set contains for each of its points a small open ball around that point. So, just like in an open interval, at every point in an open set we have a certain amount of room to go in any direction (also see Figure 34(a)).

Figure 33: Visualization of the proof of Proposition 16.37.

Definition 16.36 Let $X$ be a metric space. A subset $O \subseteq X$ is called open iff for all $x \in O$ there is an $\varepsilon_x > 0$ such that $B_{\varepsilon_x}(x) \subseteq O$.

Open intervals $(a, b)$ on the real line are open sets as in Definition 16.36, because for each point $x \in (a, b)$ with $\varepsilon := \min\{b - x, x - a\}$ we obtain the containment $B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon) \subseteq (a, b)$. Thus the nomenclature is consistent. In general, open balls are the prototypical examples of open sets. Although it is convenient to envision these balls as round entities, we should bear in mind that they need not be round at all. Balls with respect to the uniform norm $\| \cdot \|_\infty$ on $\mathbb{R}^d$ are cubes (also see Figure 37 on page 317).

Proposition 16.37 Let $X$ be a metric space, let $z \in X$ and let $\varepsilon > 0$. Then the open ball $B_\varepsilon(z)$ is open.

Proof. To prove that a set is open, we prove that it contains a small ball around each of its points. Let $x \in B_\varepsilon(z)$ be arbitrary. Set $\varepsilon_x := \varepsilon - d(x, z) > 0$. Then for all $y \in B_{\varepsilon_x}(x)$ we have that $d(y, z) \le d(y, x) + d(x, z) < \varepsilon - d(x, z) + d(x, z) = \varepsilon$. Hence, $B_{\varepsilon_x}(x) \subseteq B_\varepsilon(z)$ (see Figure 33) and since $x$ was arbitrary, $B_\varepsilon(z)$ is open. ∎

We can summarize the most important properties of open sets as follows.

Theorem 16.38 Let $X$ be a metric space. Then the following results hold:
1. Both $\emptyset$ and $X$ are open subsets of $X$.
2. If $O_1, \ldots, O_n$ are open subsets of $X$, then $\bigcap_{k=1}^{n} O_k$ is open.
3. If $\mathcal{O}$ is a family of open subsets of $X$ then $\bigcup \mathcal{O}$ is open.

Proof. Part 1 is trivial.

For part 2, let $O_1, \ldots, O_n$ be open subsets of $X$, and let $x \in \bigcap_{k=1}^{n} O_k$ be arbitrary. Then for each integer $k \in \{1, \ldots, n\}$ there is an $\varepsilon_k > 0$ such that $B_{\varepsilon_k}(x) \subseteq O_k$. Let $\varepsilon := \min\{ \varepsilon_k : k = 1, \ldots, n \}$. Then for all integers $k \in \{1, \ldots, n\}$ the containments $B_\varepsilon(x) \subseteq B_{\varepsilon_k}(x) \subseteq O_k$ hold. Hence, $B_\varepsilon(x) \subseteq \bigcap_{k=1}^{n} O_k$. Since $x \in \bigcap_{k=1}^{n} O_k$ was arbitrary, the intersection $\bigcap_{k=1}^{n} O_k$ is open.

For part 3, let $\mathcal{O}$ be a family of open subsets of $X$ and let $x \in \bigcup \mathcal{O}$ be arbitrary. Then there is an $O \in \mathcal{O}$ such that $x \in O$ and there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq O$. But then $B_\varepsilon(x) \subseteq O \subseteq \bigcup \mathcal{O}$. Since $x \in \bigcup \mathcal{O}$ was arbitrary, $\bigcup \mathcal{O}$ is open. ∎

Example 16.39 Arbitrary intersections of open sets need not be open. For any real numbers $a < b$, we have that $[a, b] = \bigcap_{n=1}^{\infty} \left( a - \frac{1}{n}, b + \frac{1}{n} \right)$ is not an open subset of $\mathbb{R}$. □

Once open sets are defined we can briefly summarize the fundamental concepts of topology. A topology is a family $\mathcal{T}$ of subsets of a set $X$ such that the three conditions in Theorem 16.38 hold. The pair $(X, \mathcal{T})$ is called a topological space and the sets $O \in \mathcal{T}$ are called open sets. Any idea that can be stated purely in terms of open sets is actually a topological idea. In this text, we will focus on metric spaces and only touch upon the topological notions and vocabulary as necessary.
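On a finite set, the three conditions of Theorem 16.38 can be checked mechanically. The sketch below assumes a finite ground set and a family given as an explicit list of subsets; the function name `is_topology` and the two sample families are hypothetical. For a finite family, closure under pairwise unions and intersections already gives closure under all unions and all finite intersections by induction.

```python
from itertools import combinations


def is_topology(ground_set, family):
    """Check the three conditions of Theorem 16.38 for a finite family of subsets."""
    X = frozenset(ground_set)
    opens = {frozenset(s) for s in family}
    # Every member must be a subset of the ground set.
    if any(not s <= X for s in opens):
        return False
    # Condition 1: the empty set and the whole space are open.
    if frozenset() not in opens or X not in opens:
        return False
    for a, b in combinations(opens, 2):
        # Condition 2: finite intersections (pairwise suffices by induction).
        if a & b not in opens:
            return False
        # Condition 3: unions (pairwise suffices because the family is finite).
        if a | b not in opens:
            return False
    return True


chain = [set(), {1}, {1, 2}, {1, 2, 3}]   # nested sets: a topology on {1, 2, 3}
broken = [set(), {1}, {2}, {1, 2, 3}]     # the union {1, 2} is missing
print(is_topology({1, 2, 3}, chain), is_topology({1, 2, 3}, broken))
```

The second family fails only because the union $\{1\} \cup \{2\}$ is not a member, which is condition 3.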
As an example consider how continuous functions would be defined in terms of open sets.

Theorem 16.40 Topological formulation of continuity. Let $X$ and $Y$ be metric spaces and let $f : X \to Y$ be a function. Then $f$ is continuous on $X$ iff for all open subsets $V \subseteq Y$ the inverse image $f^{-1}[V] \subseteq X$ is open.

Proof. For "$\Rightarrow$," let $f$ be continuous and let $V \subseteq Y$ be open. Let $x \in f^{-1}[V]$. Then there is an $\varepsilon > 0$ so that $B_\varepsilon(f(x)) \subseteq V$. Because $f$ is continuous, there is a $\delta > 0$ such that for all $z \in X$ with $d(z, x) < \delta$ we have $d(f(z), f(x)) < \varepsilon$. Therefore, for all $z \in B_\delta(x)$ we obtain $f(z) \in B_\varepsilon(f(x)) \subseteq V$, and hence $B_\delta(x) \subseteq f^{-1}[V]$. Because $x \in f^{-1}[V]$ was arbitrary, $f^{-1}[V]$ is open.

For "$\Leftarrow$," let the function $f$ be such that for all open subsets $V \subseteq Y$ the inverse image $f^{-1}[V] \subseteq X$ is open, let $x \in X$ and let $\varepsilon > 0$. Then $f^{-1}[B_\varepsilon(f(x))]$ is open. Hence, there is a $\delta > 0$ with $B_\delta(x) \subseteq f^{-1}[B_\varepsilon(f(x))]$. But this means for any $z \in X$ with $d(z, x) < \delta$ the inequality $d(f(z), f(x)) < \varepsilon$ holds, and hence $f$ is continuous at $x$. Because $x \in X$ was arbitrary, $f$ is continuous. ∎

We will often be concerned with subsets of metric spaces that contain an open set around a given point $x$. To abbreviate the wording and because open sets contain a little "padding" around each of their points (see Figure 34(a)), we call such sets neighborhoods.

Figure 34: Open sets contain a certain amount of "padding" around each of their points (a), while closed sets contain all their limit points (b).

Definition 16.41 Let $X$ be a metric space and let $x \in X$. Then $N \subseteq X$ is called a neighborhood of $x$ iff $x \in N$ and there is an open set $O \subseteq X$ so that $x \in O \subseteq N$. If $S \subseteq X$ and $N$ is a neighborhood of each $s \in S$, then $N$ is also called a neighborhood of $S$.

The complements of open sets are called closed sets.

Definition 16.42 Let $X$ be a metric space. A subset $C \subseteq X$ is called closed iff its complement $X \setminus C$ is open.
Closed intervals on the real line are also closed sets as in Definition 16.42. Once again, the nomenclature is consistent.

Example 16.43 Although it is tempting to hope that open and closed are mutually exclusive and exhaustive properties, sets can actually be open, closed, both, or neither.
1. The interval $[0, 1]$ is a closed subset of the real line.
2. In any metric space, the sets $\emptyset$ and $X$ are both open and closed.
3. The interval $[0, 1)$ as a subset of the real line is neither open nor closed. □

Although properties of closed sets can in theory be obtained through complementation, closed sets are interesting in their own right, because closed sets "keep their limits" (also see Figure 34(b)).

Definition 16.44 Let $X$ be a metric space and let $A \subseteq X$. Then $x \in X$ is called a limit point of $A$ iff there is a sequence $\{a_n\}_{n=1}^{\infty}$ with $a_n \in A$ for all $n \in \mathbb{N}$ so that $x = \lim_{n \to \infty} a_n$.

Limit points are different from accumulation points in that the sequence that converges to a limit point $x$ can assume $x$ as a value (and even be constant with value $x$), which is forbidden for accumulation points.

Theorem 16.45 Let $X$ be a metric space. Then $C \subseteq X$ is closed iff for all convergent sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in C$ for all $n \in \mathbb{N}$ we have $\lim_{n \to \infty} z_n \in C$.

Proof. For "$\Rightarrow$," let $C \subseteq X$ be closed, let $\{z_n\}_{n=1}^{\infty}$ be a convergent sequence with $z_n \in C$ for all $n \in \mathbb{N}$ and let $x := \lim_{n \to \infty} z_n$. For a contradiction, suppose that $x \notin C$. Because $X \setminus C$ is open, there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq X \setminus C$. But then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the point $z_n$ is in $B_\varepsilon(x) \subseteq X \setminus C$ and in $C$, a contradiction.

For "$\Leftarrow$," suppose for a contradiction that $X \setminus C$ is not open. Then there is an $x \in X \setminus C$ so that for every $n \in \mathbb{N}$ there is a point $z_n \in B_{\frac{1}{n}}(x) \cap C$. But then $\{z_n\}_{n=1}^{\infty}$ is a sequence of points in $C$ that converges to a point $x \notin C$, a contradiction. ∎

Remark 16.46 Definition 16.36 and Theorem 16.45 provide descriptive characterizations of open and closed sets, respectively.
Note (also see Figure 34) that sequences in open sets need not take their limit in the open set (consider the sequence $\left\{ \frac{1}{n} \right\}_{n=1}^{\infty}$ in the interval $(0, 1)$) and that closed sets need not contain a small ball around each of their points (consider the point 0 in $[0, 1]$). □

Because completeness is important, we should note that closed subsets of complete spaces will again be complete.

Corollary 16.47 Let $X$ be a complete metric space and let $C \subseteq X$ be closed. Then $C$ with the induced metric is a complete metric space.

Proof. Exercise 16-37. ∎

16.4.1 The Interior and The Closure

For each subset $A$ of a metric space, there are a largest open set contained in $A$ and a smallest closed set that contains $A$. These sets are called the interior and the closure, respectively.

Definition 16.48 Let $X$ be a metric space and let $A \subseteq X$. The interior $A^\circ$ of $A$ is the set of all points $x \in A$ so that a small ball around the point is also in $A$, that is, $A^\circ := \{ x \in A : (\exists \varepsilon > 0 : B_\varepsilon(x) \subseteq A) \}$. The points in $A^\circ$ are also called the interior points of $A$.

Proposition 16.49 Let $X$ be a metric space and let $A \subseteq X$. Then $A^\circ$ is an open set that contains all open subsets of $A$. Moreover, $A^\circ = \bigcup \{ O \subseteq A : O \text{ is open} \}$.

Proof. To see that $A^\circ$ is open, let $x \in A^\circ$. Then $x \in A$ and there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq A$. By Proposition 16.37, for every $z \in B_\varepsilon(x)$ there is an $\varepsilon_z > 0$ so that $B_{\varepsilon_z}(z) \subseteq B_\varepsilon(x) \subseteq A$. But then $B_\varepsilon(x) \subseteq A^\circ$, which proves that $A^\circ$ is open.

Now let $U \subseteq A$ be an open subset of $A$. Then for all $x \in U$ there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq U \subseteq A$, which means $x \in A^\circ$. Hence, $U \subseteq A^\circ$ for all open subsets $U$ of $A$.

Finally, to prove the equation, note that the set on the right is an open subset of $A$, which means (by what we just proved) that the set on the right is contained in $A^\circ$. Conversely, $A^\circ$ is an open set contained in $A$, so $A^\circ$ is contained in the union on the right, thus establishing the equation. ∎

Example 16.50
1. The interior of an open ball $B_r(x)$ obviously is the ball itself, $(B_r(x))^\circ = B_r(x)$.
2. The interior of the rational numbers as a subset of the real numbers is $\mathbb{Q}^\circ = \emptyset$, because every ball around a rational number contains an irrational number. □

Because every singleton set $\{x\}$ is contained in all open balls of radius $r > 0$ about $x$, it is easy to see that there is no smallest open set that contains a given set.

Definition 16.51 Let $X$ be a metric space and let $A \subseteq X$. The closure $A^-$ (or $\overline{A}$) of $A$ is defined to be the set of all limit points of $A$, that is,
$$A^- := \overline{A} := \left\{ x \in X : \exists \{a_n\}_{n=1}^{\infty} : \lim_{n \to \infty} a_n = x \text{ and } \forall n \in \mathbb{N} : a_n \in A \right\}.$$

Proposition 16.52 Let $X$ be a metric space and let $A \subseteq X$. Then $A^-$ is closed and it is contained in all closed supersets of $A$. Moreover, $A^- = \bigcap \{ C \supseteq A : C \text{ is closed} \}$.

Proof. To prove that $A^-$ is closed, let $x \in X$ and let $\{x_n\}_{n=1}^{\infty}$ be a sequence with $x_n \in A^-$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} x_n = x$. Then for each $n \in \mathbb{N}$ there is an $a_n \in A$ with $d(x_n, a_n) < \frac{1}{n}$. But then $\lim_{n \to \infty} a_n = \lim_{n \to \infty} x_n = x$ and $x \in A^-$. Hence, by Theorem 16.45, $A^-$ is closed.

Now let $C \supseteq A$ be a closed superset of $A$ and let $x \in A^-$. Then there is a sequence $\{a_n\}_{n=1}^{\infty}$ so that $a_n \in A$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} a_n = x$. But then, because $C$ is closed and contains $A$ we conclude $x = \lim_{n \to \infty} a_n \in C$, and hence $A^- \subseteq C$.

Finally, to prove the equation, note that the set on the right is a closed superset of $A$ (use Exercise 16-41c), so $A^-$ must be contained in the set on the right. Conversely, because $A^-$ is a closed superset of $A$ it must contain the intersection on the right, which establishes the equality. ∎

Example 16.53
1. In a normed space, the closure of an open ball is $\overline{B_r(x)} = \{ p \in X : d(p, x) \le r \}$ (Exercise 16-48a). In a general metric space only the containment "$\subseteq$" holds, as the discrete metric with $r = 1$ shows.
2. The closure of the rational numbers in the real numbers is $\overline{\mathbb{Q}} = \mathbb{R}$, because every real number is the limit of a sequence of rational numbers. □

Because every open interval is the union of all its closed subintervals, there is no largest closed set that is contained in a given set.
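The union and intersection formulas of Propositions 16.49 and 16.52 translate directly into a computation when the open sets can be listed. The following sketch makes an assumption that goes beyond the metric setting of this section: it works in a small finite topological space given by an explicit family of open sets, where the formulas are used as the definitions of interior and closure. The names `interior`, `closure`, and the sample space are hypothetical.

```python
def interior(A, opens):
    """Union of all open sets contained in A (formula of Proposition 16.49)."""
    A = frozenset(A)
    result = frozenset()
    for O in opens:
        if O <= A:
            result |= O
    return result


def closure(A, X, opens):
    """Intersection of all closed supersets of A (formula of Proposition 16.52)."""
    A, X = frozenset(A), frozenset(X)
    result = X  # X itself is always a closed superset of A
    for O in opens:
        C = X - O  # closed sets are the complements of open sets
        if A <= C:
            result &= C
    return result


# Hypothetical finite space: X = {1, 2, 3} with the nested open sets below.
X = frozenset({1, 2, 3})
opens = [frozenset(), frozenset({1}), frozenset({1, 2}), X]

print(sorted(interior({1, 3}, opens)))    # largest open set inside {1, 3}
print(sorted(closure({2}, X, opens)))     # smallest closed set containing {2}
```

In this sample space the complement of the interior of $\{1, 3\}$ equals the closure of the complement $\{2\}$, which is the identity of Exercise 16-46a.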
The set that is geometrically between the interior of a set and the interior of the set's complement is called the boundary.

Definition 16.54 Let $X$ be a metric space and let $A \subseteq X$. The boundary $\delta A$ of $A$ is defined to be $\delta A := A^- \setminus A^\circ$.

Figure 35: Visualization of Proposition 16.56. Relatively open sets $U$ are intersections of the subset with open sets $V$ in the space (a). In particular, as subsets of the space itself, they need not be open (b).

Example 16.55
1. In a normed space, the boundary of an open ball is $\delta B_r(x) = \{ p \in X : d(p, x) = r \}$ (Exercise 16-48b).
2. The boundary of the rational numbers in the real numbers is $\delta \mathbb{Q} = \mathbb{R}$. □

Further properties of the interior, the closure, and the boundary of a set will be investigated in Exercises 16-43 through 16-47.

16.4.2 Relatively Open Sets

When subspaces $S$ of a metric space $X$ are investigated, subsets of $S$ can be open in the subspace $S$, as well as in the space $X$ itself. However, it is important to realize that a set $U \subseteq S$ can be open in $S$ and still it may not be open in $X$. For example, the interval $[0, 1)$ is an open subset of the metric space $[0, 2]$. Indeed, for every $x \in [0, 1)$ there is a small $\varepsilon > 0$ so that the ball in $[0, 2]$ around $x$ of radius $\varepsilon$ is contained in $[0, 1)$. Proposition 16.56 describes the relation between open sets in the space $X$ and open sets in a subset $S$. Open sets in a subset are also called relatively open.

Proposition 16.56 Let $X$ be a metric space and let $S \subseteq X$ be a subset. Then $U \subseteq S$ is open with respect to the induced metric on $S$ iff there is a set $V \subseteq X$ that is open with respect to the metric on $X$ and such that $U = V \cap S$. (Also see Figure 35.)

Proof. To prove "$\Rightarrow$," for each element $u \in U$ find $\varepsilon_u > 0$ so that the containment $B^S_{\varepsilon_u}(u) := \{ x \in S : d(x, u) < \varepsilon_u \} \subseteq U$ holds. Let
$$V := \bigcup_{u \in U} B^X_{\varepsilon_u}(u) = \bigcup_{u \in U} \{ x \in X : d(x, u) < \varepsilon_u \}.$$
Then $V$ is open and we claim that $U = S \cap V$. Clearly, $U \subseteq S \cap V$. Now let $v \in S \cap V$. Then there is a $u \in U$ so that $v \in B^X_{\varepsilon_u}(u)$.
Because $v \in S$ we infer $v \in B^S_{\varepsilon_u}(u) \subseteq U$, and hence $S \cap V \subseteq U$.

The part "$\Leftarrow$" is left as Exercise 16-38. ∎

A similar result is proved for closed sets in Exercise 16-49.

Exercises

16-37. Prove Corollary 16.47.

16-38. Finish the proof of Proposition 16.56 by proving the "$\Leftarrow$" part.

16-39. Let $X$ be a metric space. Prove that for all $x \in X$ the set $X \setminus \{x\}$ is open.

16-40. Let $X$ be a metric space. Prove that $U \subseteq X$ is open iff $U$ is a union of open balls $B_\varepsilon(x)$.

16-41. Closed sets. Let $X$ be a metric space.
(a) Prove that both $\emptyset$ and $X$ are closed subsets of $X$.
(b) Prove that if $C_1, \ldots, C_n$ are closed subsets of $X$, then $\bigcup_{k=1}^{n} C_k$ is closed.
(c) Prove that if $\mathcal{C}$ is a family of closed subsets of $X$ then $\bigcap \mathcal{C}$ is closed.
(d) Give an example of an infinite union of closed sets that is not closed.

16-42. Let $C \subseteq \mathbb{R}^d$ be so that for all $n \in \mathbb{N}$ the set $C \cap \prod_{i=1}^{d} [-n, n]$ is closed. Prove that $C$ is closed.

16-43. The interior of a set. Let $X$ be a metric space.
(a) Prove that $A^{\circ\circ} = A^\circ$ for all $A \subseteq X$.
(b) Prove that if $A_1, \ldots, A_n \subseteq X$, then $\left( \bigcap_{j=1}^{n} A_j \right)^\circ = \bigcap_{j=1}^{n} A_j^\circ$.
(c) Prove that if $A_j \subseteq X$ for all $j \in \mathbb{N}$, then $\left( \bigcap_{j=1}^{\infty} A_j \right)^\circ \subseteq \bigcap_{j=1}^{\infty} A_j^\circ$ and give an example that shows that the containment can be proper.
(d) Prove that $U \subseteq X$ is open iff $U^\circ = U$.

16-44. The closure of a set. Let $X$ be a metric space.
(a) Prove that $A^{--} = A^-$ for all $A \subseteq X$.
(b) Prove that if $A_1, \ldots, A_n \subseteq X$, then $\overline{\bigcup_{j=1}^{n} A_j} = \bigcup_{j=1}^{n} \overline{A_j}$.
(c) Prove that if $A_j \subseteq X$ for all $j \in \mathbb{N}$, then $\overline{\bigcup_{j=1}^{\infty} A_j} \supseteq \bigcup_{j=1}^{\infty} \overline{A_j}$ and give an example that shows that the containment can be proper.
(d) Prove that $C \subseteq X$ is closed iff $C^- = C$.

16-45. The boundary of a set. Let $X$ be a metric space and let $A \subseteq X$.
(a) Prove that the boundary $\delta A$ of $A$ is closed.
(b) Find a set $B$ in a metric space $X$ so that $B$, $\delta B$ and $\delta(\delta B)$ are three distinct sets.
(c) Prove that if $A$ is closed, then $(X \setminus A) \cup A^\circ = X$.
(d) Prove that for any set $A$ we have $\delta(\delta A) = \delta(\delta(\delta A))$.

16-46. Closure, interior and the boundary.
Let $X$ be a metric space and let $A \subseteq X$.
(a) Prove that $X \setminus (A^\circ) = (X \setminus A)^-$.
(b) Prove that $X \setminus (A^-) = (X \setminus A)^\circ$.
(c) Prove that $\delta A = A^- \cap (X \setminus A)^-$.
(d) Prove that $A^{-\circ-} \subseteq A^-$ and show that the containment can be proper.
(e) Prove that $A^{\circ-\circ} \supseteq A^\circ$ and show that the containment can be proper.

16-47. Let $X$ be a normed space and let $Y \subseteq X$ be a normed subspace. Prove that $Y^-$ also is a normed subspace of $X$.

16-48. Let $X$ be a normed space, let $x \in X$ and let $r > 0$.
(a) Prove that $\overline{B_r(x)} = \{ p \in X : d(p, x) \le r \}$.
(b) Prove that $\delta B_r(x) = \{ p \in X : d(p, x) = r \}$.

16-49. Let $X$ be a metric space and let $S \subseteq X$ be a subset. Prove that $C \subseteq S$ is closed with respect to the induced metric on $S$ iff there is a set $D \subseteq X$ that is closed with respect to the metric on $X$ and such that $C = D \cap S$.
Hint. Use Theorem 16.45. For "$\Rightarrow$," let $D$ be the set of all limits of sequences in $C$.

16-50. Let $X, Y$ be metric spaces and let $f : X \to Y$. Prove that $f$ is continuous at $x \in X$ iff for all open subsets $V \subseteq Y$ that contain $f(x)$ the inverse image $f^{-1}[V] \subseteq X$ contains an open ball around $x$.

16-51. Let $x$ be a point in a metric space $X$. Can a neighborhood of $x$ be closed? Explain.

16-52. Let $X$ be a normed space and let $\Omega \subseteq X$ be an open subset. Prove that every $x \in \Omega$ is an accumulation point of $\Omega$.

16-53. Let $X, Y$ be metric spaces and let $f : X \to Y$ be a function. Define the oscillation of $f$ over the open set $U \subseteq X$ as $\omega_f(U) := \sup \{ d(f(y), f(z)) : y, z \in U \}$. Define the oscillation of $f$ at $x \in X$ as $\omega_f(x) := \inf \{ \omega_f(U) : x \in U, U \text{ open} \}$.
(a) Prove that $f$ is continuous at $x$ iff $\omega_f(x) = 0$.
(b) Prove that for all $\rho > 0$, the set $\{ x \in X : \omega_f(x) \ge \rho \}$ is closed.
(c) Prove that for all $\rho \ge 0$, the set $\{ x \in X : \omega_f(x) < \rho \}$ is open.

16-54. Prove that Cantor sets are closed.

16.5 Compactness

The Bolzano-Weierstrass Theorem plays a key role in the proofs of several important results for functions of a single variable.
For example, it is used to prove that a continuous function $f : [a, b] \to \mathbb{R}$ always assumes an absolute maximum value (see Theorem 3.44), as well as to show that continuous functions $f : [a, b] \to \mathbb{R}$ are uniformly continuous (see Lemma 5.19). It therefore is natural to investigate spaces that satisfy the conclusion of the Bolzano-Weierstrass Theorem.

Definition 16.57 Bolzano-Weierstrass formulation of compactness. A metric space $X$ is called compact iff every sequence $\{x_n\}_{n=1}^{\infty}$ of elements in $X$ has a convergent subsequence.

Compactness is usually formulated in topological terms. We will investigate this formulation, which is reminiscent of the Heine-Borel Theorem (see Theorem 8.4), in Theorem 16.72. In metric spaces, the Bolzano-Weierstrass formulation of compactness is equivalent to the topological formulation, but we need to be careful. In general topological spaces, the Bolzano-Weierstrass formulation of compactness is called sequential compactness, and it is not equivalent to (topological) compactness.

Closed and bounded subsets of finite dimensional spaces are the prototypical examples of compact sets (also see Theorem 16.80).

Example 16.58 Let $r > 0$ and let $C_r(0) := \{ x \in \mathbb{R}^d : \|x\|_\infty \le r \}$. Then $C_r(0)$ is compact when equipped with the metric induced by the $\| \cdot \|_\infty$-norm.

A typical compactness proof with the Bolzano-Weierstrass formulation will take a sequence in the space and produce a convergent subsequence. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $C_r(0)$. Then each component sequence $\left\{ x_n^{(i)} \right\}_{n=1}^{\infty}$ is bounded. In particular, $\left\{ x_n^{(1)} \right\}_{n=1}^{\infty}$ has a convergent subsequence $\left\{ x_{n_k}^{(1)} \right\}_{k=1}^{\infty}$. Now let $1 \le j < d$ and assume $\left\{ n_m^j \right\}_{m=1}^{\infty}$ is a strictly increasing sequence of integers such that for $1 \le i \le j$ the subsequences $\left\{ x_{n_m^j}^{(i)} \right\}_{m=1}^{\infty}$ converge. Then $\left\{ x_{n_m^j}^{(j+1)} \right\}_{m=1}^{\infty}$ has a convergent subsequence. That is, there is a strictly increasing sequence of integers $\left\{ n_l^{j+1} \right\}_{l=1}^{\infty}$ such that for all $1 \le i \le j + 1$ the subsequences $\left\{ x_{n_l^{j+1}}^{(i)} \right\}_{l=1}^{\infty}$ converge. Inductively we conclude that there is a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ such that for $1 \le i \le d$ the subsequences $\left\{ x_{n_k}^{(i)} \right\}_{k=1}^{\infty}$ converge. Call each limit $x^{(i)}$ and let $x := \sum_{i=1}^{d} x^{(i)} e_i$. Then by Theorem 16.4 $\{x_{n_k}\}_{k=1}^{\infty}$ is a convergent subsequence of $\{x_n\}_{n=1}^{\infty}$ with limit $x$. Moreover, $x \in C_r(0)$, because $C_r(0)$ is closed. Since $\{x_n\}_{n=1}^{\infty}$ was arbitrary this means that $C_r(0)$ is compact. □

Compact subspaces of $\mathbb{R}^d$ will be investigated in detail in Section 16.6. As Example 16.58 indicates, compact spaces are usually subsets of larger metric spaces.

Definition 16.59 Let $X$ be a metric space. A subset $C \subseteq X$ is called compact iff $C$ with the induced metric is a compact metric space. Equivalently, $C$ is compact iff every sequence in $C$ has a convergent subsequence whose limit is in $C$.

Compact subsets of metric spaces are closed and bounded.

Proposition 16.60 Let $X$ be a metric space and let $C$ be a compact subset of $X$. Then $C$ is closed and bounded.

Proof. Let $C \subseteq X$ be compact. To prove that $C$ is closed, suppose for a contradiction that $C$ is not closed. Then there is a sequence $\{x_n\}_{n=1}^{\infty}$ of elements of $C$ that converges in $X$ to a limit $x \notin C$. But then by Proposition 16.11 all subsequences of $\{x_n\}_{n=1}^{\infty}$ converge (in $X$) to $x$, and hence no subsequence of $\{x_n\}_{n=1}^{\infty}$ has a limit in $C$, a contradiction.

To prove that $C$ is bounded, suppose for a contradiction that $C$ is not bounded. Let $x_1 \in C$. Once $x_1, \ldots, x_n \in C$ have been chosen, we can find an $x_{n+1} \in C$ so that $d(x_{n+1}, x_n) \ge (n + 1) + \max \{ d(x_k, x_n) : k = 1, \ldots, n \}$. But then the inequality $d(x_{n+1}, x_k) \ge d(x_{n+1}, x_n) - d(x_n, x_k) \ge n + 1$ holds for all $k = 1, \ldots, n$, and hence all subsequences of the inductively constructed sequence $\{x_n\}_{n=1}^{\infty}$ are unbounded.
Now by Proposition 16.9 no subsequence of $\{x_n\}_{n=1}^{\infty}$ has a limit in $C$ (or even in $X$), a contradiction. ∎

Exercise 16-55 shows that the converse of Proposition 16.60 does not hold in general. Next, we note that the closed subsets of a compact space inherit the compactness.

Proposition 16.61 Let $X$ be a compact metric space and let $C \subseteq X$ be closed. Then $C$ is compact.

Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of elements of $C$. Because $C \subseteq X$ there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ that converges in $X$ to a limit $L$. But then by Theorem 16.45 $L \in C$, and hence $C$ is compact. ∎

The general version of Theorem 3.44 is now the following.

Theorem 16.62 Let $X$ be a compact metric space, let $Y$ be a metric space and let $f : X \to Y$ be continuous and surjective. Then $Y$ is compact.

Proof. Let $\{y_n\}_{n=1}^{\infty}$ be a sequence in $Y$. For each $n \in \mathbb{N}$ let $x_n \in X$ be such that $f(x_n) = y_n$. Because $X$ is compact, there is a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ of $\{x_n\}_{n=1}^{\infty}$ and an $x \in X$ such that $\lim_{k \to \infty} x_{n_k} = x$. But then $y := f(x) \in Y$, $\{y_{n_k}\}_{k=1}^{\infty}$ is a subsequence of $\{y_n\}_{n=1}^{\infty}$ and $\lim_{k \to \infty} y_{n_k} = \lim_{k \to \infty} f(x_{n_k}) = f(x) = y$. Therefore $Y$ is compact. ∎

Corollary 16.63 (Compare with Theorem 3.44.) Let $X$ be a compact metric space and let $f : X \to \mathbb{R}$ be continuous. Then $f$ assumes its absolute maximum on $X$. That is, there is an $x \in X$ so that $f(x) \ge f(z)$ for all $z \in X$.

Proof. The function $f$ is surjective onto $f[X] \subseteq \mathbb{R}$. By Theorem 16.62 this means $f[X]$ is compact and by Proposition 16.60 this means that $f[X]$ is closed and bounded. Then $M := \sup (f[X])$ is an element of $f[X]$. Therefore there is an $x \in X$ so that $f(x) = M$, and $M$ is greater than or equal to all other values $f$ assumes. ∎

It is also easy to see that the inverses of continuous functions on compact metric spaces are continuous.

Theorem 16.64 (Compare with Theorem 3.38.) Let $X$ be a compact metric space, let $Y$ be a metric space and let the function $f : X \to Y$ be continuous and injective.
Then the inverse function $f^{-1} : f[X] \to X$ is continuous, too.

Proof (sketch). Let $\{y_n\}_{n=1}^{\infty}$ be a sequence in $f[X]$ that converges to $y \in f[X]$. Prove that every subsequence of $\left\{ f^{-1}(y_n) \right\}_{n=1}^{\infty}$ has a subsequence that converges to $f^{-1}(y)$ and use Exercise 16-8. The full proof is left to the reader as Exercise 16-56. ∎

An important metric property of compact spaces is that they are complete.

Theorem 16.65 Let $X$ be a compact metric space. Then $X$ is complete.

Proof. Exercise 16-57. ∎

Another important consequence of compactness is that continuous functions are uniformly continuous on compact metric spaces.

Definition 16.66 Let $X, Y$ be metric spaces. Then the function $f : X \to Y$ is called uniformly continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $u, v \in X$ with $d(u, v) < \delta$ we have that $d(f(u), f(v)) < \varepsilon$.

Lemma 16.67 Let $X$ be a compact metric space, let $Y$ be a metric space and let the function $f : X \to Y$ be continuous. Then $f$ is uniformly continuous.

Proof. Mimic the proof of Lemma 5.19. (Exercise 16-58.) ∎

Compactness typically is not formulated in terms of the Bolzano-Weierstrass Theorem, but in terms similar to the Heine-Borel Theorem (see Theorem 8.4). For metric spaces, both formulations are equivalent. Because the Heine-Borel formulation is exclusively in terms of open sets it is the topological (and thus more general) description of compactness. Recall that the Heine-Borel Theorem said that each open cover of a closed and bounded interval has a finite subcover.

Definition 16.68 A cover of a metric space $X$ is a family $\mathcal{C}$ of sets such that $X \subseteq \bigcup \mathcal{C}$. An open cover $\mathcal{C}$ is a cover such that all sets in $\mathcal{C}$ are open.

For a subset $S$ of a metric space $X$, it is usually more natural to cover $S$ with sets that are open in $X$, even if this means that the sets are not contained in $S$. Hence, if $S \subseteq X$, we will also call a family $\mathcal{C}$ an open cover of $S$ iff all sets in $\mathcal{C}$ are open (in $X$) and $S \subseteq \bigcup \mathcal{C}$.
For a visualization of open covers, consider Figure 36.

Example 16.69 Open covers.

1. $\{B_n(0) : n \in \mathbb{N}\}$ is an open cover of $\mathbb{R}^d$.
2. $\{B_{\ldots}(0) : x \in B_1(0)\}$ is an open cover of $B_1(0)$.
3. For any $a, b \in (0, 1)$, the set $\left\{\left(\frac{1}{n}, 1 - \frac{1}{n}\right) : n \in \mathbb{N}\right\} \cup \{[0, a), (b, 1]\}$ is an open cover of the interval $[0, 1]$. Moreover, the set $\left\{\left(\frac{1}{n}, 1 - \frac{1}{n}\right) : n \in \mathbb{N}\right\} \cup \{(-1, a), (b, 2)\}$ is an open cover of the interval $[0, 1]$ if we consider $[0, 1]$ as a subspace of $\mathbb{R}$.

Figure 36: Because compact sets are usually subsets of other spaces, open covers are typically visualized with sets that are open in the surrounding space (a) rather than with relatively open sets (b).

We have encountered the proof technique of finding finite subcovers of open covers in Proposition 8.5, Exercise 8-3c and Lemma 8.11. Definition 14.17 of the $d$-dimensional outer Lebesgue measure also relies on open covers, and the remarks after the proof of Proposition 14.60 suggest that we need a version of the Heine-Borel Theorem to prove that $d$-dimensional Lebesgue measure is a product of lower dimensional Lebesgue measures. Indeed, in this text, the main use of finite subcovers of open covers is in connecting topology with measure theory. Theorem 16.72 below shows that compactness provides an abstract version of the Heine-Borel Theorem.

Lemma 16.70 Let $X$ be a compact metric space. Then for every $\varepsilon > 0$ there are $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$.

Proof. Let $\varepsilon > 0$. Let $x_1 \in X$ be arbitrary. If $X \subseteq B_\varepsilon(x_1)$, stop. Otherwise continue as follows. If $x_1, \ldots, x_{n-1} \in X$ are so that $X \not\subseteq \bigcup_{j=1}^{n-1} B_\varepsilon(x_j)$, choose $x_n \notin \bigcup_{j=1}^{n-1} B_\varepsilon(x_j)$. If $X \subseteq \bigcup_{j=1}^{n} B_\varepsilon(x_j)$, stop, otherwise continue. This process cannot continue indefinitely, because if it did, $\{x_n\}_{n=1}^\infty$ would be a sequence such that for any distinct $m, n \in \mathbb{N}$ we have $d(x_m, x_n) \ge \varepsilon$. This would mean that $\{x_n\}_{n=1}^\infty$ has no convergent subsequence, a contradiction to the compactness of $X$. The $x_1, \ldots, x_n$ for which the construction stops are as desired. ∎
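The selection process in the proof of Lemma 16.70 can be tried out numerically. The following Python sketch is only an illustration under stated assumptions: a finite grid stands in for the compact square $[0,1]^2$ with the uniform metric, and the names `sup_dist`, `greedy_centers` and `grid` are hypothetical helpers, not notation from the text. On a finite set the process stops trivially; the point of the sketch is that the chosen centers are pairwise at least $\varepsilon$ apart, while every point lies in some ball $B_\varepsilon(x_j)$.

```python
from itertools import product


def sup_dist(p, q):
    """Uniform (maximum) metric on tuples of equal length."""
    return max(abs(a - b) for a, b in zip(p, q))


def greedy_centers(points, eps, dist=sup_dist):
    """Choose centers as in the proof of Lemma 16.70.

    A point becomes a new center only if it lies outside every
    open eps-ball around the centers chosen so far.
    """
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers


# Assumption: a 21 x 21 grid is used as a finite stand-in for [0,1]^2.
grid = [(i / 20, j / 20) for i, j in product(range(21), repeat=2)]
eps = 0.25
centers = greedy_centers(grid, eps)

# Every grid point lies in the open eps-ball around some center.
covered = all(any(sup_dist(p, c) < eps for c in centers) for p in grid)
# Distinct centers are at least eps apart.
separated = all(
    sup_dist(c1, c2) >= eps
    for k, c1 in enumerate(centers)
    for c2 in centers[k + 1:]
)
print(len(centers), covered, separated)
```

The separation property is exactly what rules out an infinite run of the process in a compact space: an infinite $\varepsilon$-separated sequence could not have a convergent subsequence.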
Definition 16.71 Let $X$ be a metric space and let $\mathcal{O}$ be an open cover. A finite subset $\{O_1, \ldots, O_n\} \subseteq \mathcal{O}$ so that $X \subseteq \bigcup_{j=1}^n O_j$ is also called a finite subcover.

Theorem 16.72 Heine-Borel formulation of compactness. A metric space $X$ is compact iff every open cover $\mathcal{O}$ of $X$ has a finite subcover.

Proof. For "$\Leftarrow$," we will prove the contrapositive. So assume that $X$ is not compact. Let $\{x_n\}_{n=1}^\infty$ be a sequence that does not have a convergent subsequence. Then (Exercise 16-59) for every $x \in X$ there is an $\varepsilon_x > 0$ so that $\{n \in \mathbb{N} : x_n \in B_{\varepsilon_x}(x)\}$ is finite. But then $\mathcal{C} = \{B_{\varepsilon_x}(x) : x \in X\}$ is an open cover of $X$ that cannot have a finite subcover.

For "$\Rightarrow$," let $X$ be compact and let $\mathcal{O}$ be an open cover. We first prove that there is an $\varepsilon > 0$ so that for every $x \in X$ there is an $O \in \mathcal{O}$ so that $B_\varepsilon(x) \subseteq O$. For a contradiction, suppose that this is not the case. Then for each $n \in \mathbb{N}$ there is an $x_n \in X$ such that $B_{\frac{1}{n}}(x_n)$ is not contained in any set $O \in \mathcal{O}$. Because $X$ is compact, $\{x_n\}_{n=1}^\infty$ has a convergent subsequence $\{x_{n_k}\}_{k=1}^\infty$. Let $x := \lim_{k\to\infty} x_{n_k}$. Then there is an $O \in \mathcal{O}$ so that $x \in O$. Moreover, there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq O$. Now let $k \in \mathbb{N}$ be such that $\frac{1}{n_k} < \frac{\varepsilon}{2}$ and such that $d(x_{n_k}, x) < \frac{\varepsilon}{2}$. Then for all $y \in B_{\frac{1}{n_k}}(x_{n_k})$ we have that $d(y, x) \le d(y, x_{n_k}) + d(x_{n_k}, x) < \frac{1}{n_k} + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$. Consequently, the containments $B_{\frac{1}{n_k}}(x_{n_k}) \subseteq B_\varepsilon(x) \subseteq O$ provide the desired contradiction.

Now let $\varepsilon > 0$ be so that for every $x \in X$ there is an $O \in \mathcal{O}$ so that $B_\varepsilon(x) \subseteq O$. By Lemma 16.70, there are finitely many $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$. For each $j = 1, \ldots, n$, let $O_j \in \mathcal{O}$ be such that $B_\varepsilon(x_j) \subseteq O_j$. Then $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j) \subseteq \bigcup_{j=1}^n O_j$ and $\{O_j\}_{j=1}^n$ is the desired finite subcover of $\mathcal{O}$. ∎

Typically, when we invoke compactness we obtain a finite subcover of a cover with sets that are open in a surrounding space.
It is not necessary to explicitly prove that a subset $S$ of a metric space is compact iff every open cover $\mathcal{O}$ (with sets that are open in the surrounding space) has a finite subcover. The translation is easily made by finding a finite subcover $\{O_1 \cap S, \ldots, O_n \cap S\}$ of the corresponding cover $\mathcal{O}_S := \{O \cap S : O \in \mathcal{O}\}$ with relatively open sets and then going back to the sets $\{O_1, \ldots, O_n\}$ that are open in $X$ (also see Figure 36).

Exercises

16-55. Prove that $B_1 := \{x \in l^\infty : \|x\|_\infty \le 1\}$ is a closed and bounded subset of $l^\infty$ that is not compact.

16-56. Prove Theorem 16.64.

16-57. Prove Theorem 16.65. Hint. Exercise 16-16.

16-58. Prove Lemma 16.67.

16-59. Prove that if $X$ is a metric space, $\{x_n\}_{n=1}^\infty$ is a sequence and $x \in X$ is such that for all $\varepsilon > 0$ the set $\{n \in \mathbb{N} : x_n \in B_\varepsilon(x)\}$ is infinite, then $\{x_n\}_{n=1}^\infty$ has a convergent subsequence.

16-60. Let $X$ be a metric space. Prove that if $C \subseteq X$ is not closed, then there exists an open cover without a finite subcover.

16-61. Let $X$ be a metric space. Prove that if $C \subseteq X$ is not bounded, then there exists an open cover without a finite subcover.

16-62. Let $X$ be a metric space, let $K \subseteq X$ be compact, and let $O \subseteq X$ be open such that $K \subseteq O$. Prove that there is an $\varepsilon > 0$ such that for all $x \in K$ we have $B_\varepsilon(x) \subseteq O$.

16-63. Let $X$ be a metric space such that all closed and bounded subsets are compact and let $\{a_n\}_{n=1}^\infty$ be a sequence. Prove that $\{a_n\}_{n=1}^\infty$ diverges if and only if there is a subsequence $\{a_{n_k}\}_{k=1}^\infty$ that is unbounded, or there are two subsequences $\{a_{n_k}\}_{k=1}^\infty$ and $\{a_{l_m}\}_{m=1}^\infty$ such that $\lim_{m\to\infty} a_{l_m}$ and $\lim_{k\to\infty} a_{n_k}$ both exist, but are not equal.

16-64. More on Lemma 16.70.
(a) Prove that the conclusion of Lemma 16.70 is not equivalent to compactness by showing that the open interval $(0, 1)$ satisfies the conclusion of Lemma 16.70.
(b) Prove that a metric space $X$ is compact iff
i. $X$ is complete and
ii. For every $\varepsilon > 0$, there are $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$.
Hint.
We only need to prove "$\Leftarrow$." For this direction, let $\{y_j\}_{j=1}^\infty$ be a sequence in $X$. Construct a Cauchy sequence of points $\{z_k\}_{k=1}^\infty$ in $X$ so that $\{j \in \mathbb{N} : y_j \in B_{\frac{1}{k}}(z_k)\}$ is infinite. You will need to take a subsequence of a subsequence when constructing $z_{k+1}$ after obtaining $z_1, \ldots, z_k$.

16-65. Continuous functions need not be bounded on noncompact closed and bounded sets. For each $n \in \mathbb{N}$ define the function $f : l^\infty \to \mathbb{R}$ to be $f(x) := n\,(1 - \ldots)$ on the ball $B_{\frac{1}{10}}(e_n)$ around the $n$th unit vector and zero on $l^\infty \setminus \bigcup_{n=1}^\infty B_{\frac{1}{10}}(e_n)$. Prove that $f$ is continuous on $l^\infty$ and unbounded on $B_1(0) \subseteq l^\infty$.

16-66. The tangent function can be inverted on the interval $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. Its inverse is the arctangent function $\arctan : \mathbb{R} \to \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. Extend $\arctan(\cdot)$ to the interval $[-\infty, \infty]$ by defining $\arctan(\infty) := \frac{\pi}{2}$ and $\arctan(-\infty) := -\frac{\pi}{2}$. For $x, y \in [-\infty, \infty]$ define $d_c(x, y) := |\arctan(x) - \arctan(y)|$.
(a) Prove that $([-\infty, \infty], d_c)$ is a compact metric space.
(b) Prove that if $\{x_n\}_{n=1}^\infty$ is a sequence of real numbers and $x \in \mathbb{R}$, then $\lim_{n\to\infty} d_c(x_n, x) = 0$ iff $\lim_{n\to\infty} |x_n - x| = 0$.
(c) Prove that if $\{x_n\}_{n=1}^\infty$ is a sequence of real numbers, then we have $\lim_{n\to\infty} d_c(x_n, \infty) = 0$ iff $\lim_{n\to\infty} x_n = \infty$ in the sense of Definition 2.42.

16-67. Prove that if $f : [c, d] \times [a, b] \to \mathbb{R}$ is continuous, then the function $g : [a, b] \to \mathbb{R}$ defined by $g(t) := \int_c^d f(x, t)\, dx$ is continuous. Hint. Use the uniform continuity of $f$.

16-68. Lemmas for the Stone-Weierstrass Theorem.
(a) Dini's Theorem. Let $\{f_n\}_{n=1}^\infty$ be a nondecreasing sequence of continuous real-valued functions on $[a, b]$ that converges pointwise to the continuous function $f : [a, b] \to \mathbb{R}$. Prove that $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$. Hint. Cover the interval with open intervals $(x - \delta_x, x + \delta_x)$ on which $f$ is close to a function $f_{n_x}$. Then take a finite subcover.
(b) Prove that the sequence $\{P_n\}_{n=0}^\infty$ of polynomials defined recursively for all $x \in [0, 1]$ by $P_0(x) := 0$ and $P_{n+1}(x) := P_n(x) + \frac{1}{2}\left(x - P_n^2(x)\right)$ converges uniformly on $[0, 1]$ to the function $f(x) = \sqrt{x}$. Hint. Same approach as in Exercise 2-42.
(c) Prove that for any $M > 0$ there is a sequence of polynomials that converges uniformly to $f(x) = \sqrt{x}$ on $[0, M]$.

16.6 The Normed Topology of $\mathbb{R}^d$

So far, on $d$-dimensional space we have only worked with the uniform norm $\|\cdot\|_\infty$. This norm is easy to work with, but it does not measure the usual Euclidean distance. Therefore we will now investigate how norms in $d$-dimensional space relate to each other. It turns out that all norms on finite dimensional spaces are equivalent and that they induce the same notion of convergence. This means there is no loss of generality in working with the uniform norm on a finite dimensional space.

Lemma 16.73 Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$. Then for all $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ the inequality $\|x\| \le \|x\|_\infty \sum_{i=1}^d \|e_i\|$ holds, where $e_i$ denotes the $i$th unit vector in $\mathbb{R}^d$.

Proof. $\|x\| = \left\|\sum_{i=1}^d x_i e_i\right\| \le \sum_{i=1}^d |x_i|\, \|e_i\| \le \|x\|_\infty \sum_{i=1}^d \|e_i\|$. ∎

Theorem 16.74 If both $\|\cdot\|_1$ and $\|\cdot\|_2$ are norms on $\mathbb{R}^d$, then there are real numbers $c, C > 0$ such that for all $x \in \mathbb{R}^d$ the inequalities $c\|x\|_1 \le \|x\|_2 \le C\|x\|_1$ hold.

Proof. Let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$. By Lemma 16.73, for all points $x, y \in \mathbb{R}^d$ we infer $\big|\, \|x\| - \|y\| \,\big| \le \|x - y\| \le \|x - y\|_\infty \sum_{i=1}^d \|e_i\|$. Thus $\|\cdot\|$ is continuous with respect to $\|\cdot\|_\infty$. By Example 16.58, Proposition 16.61, and Corollary 16.63 we conclude that the norm $\|\cdot\|$ assumes an absolute minimum and an absolute maximum on the compact set $B := \{y \in \mathbb{R}^d : \|y\|_\infty = 1\}$. Moreover, the absolute minimum cannot be zero, because $\|x\| = 0$ implies $x = 0$ and $0 \notin B$.

The result will be proved if we can show that for any norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on $\mathbb{R}^d$ there is a $C > 0$ such that for all $x \in \mathbb{R}^d \setminus \{0\}$ we have that $\|x\|_2 \le C\|x\|_1$.
Let $M := \max\{\|y\|_2 : y \in B\}$ and $m := \min\{\|y\|_1 : y \in B\}$, and let $x \in \mathbb{R}^d \setminus \{0\}$. Then $\frac{x}{\|x\|_\infty} \in B$, and hence $\|x\|_2 = \|x\|_\infty \left\|\frac{x}{\|x\|_\infty}\right\|_2 \le M\|x\|_\infty \le \frac{M}{m}\, \|x\|_\infty \left\|\frac{x}{\|x\|_\infty}\right\|_1 = \frac{M}{m}\|x\|_1$. ∎

Figure 37: Geometrically, Theorem 16.76 says that on a finite dimensional space inside any ball with respect to one norm we can find a ball with respect to any other norm and with the same center. Moreover, this smaller ball contains a ball with respect to the first norm with the same center. The figure shows this nesting for balls with respect to the three most common norms on $\mathbb{R}^2$, the uniform norm $\|\cdot\|_\infty$ (dashed), the Euclidean norm $\|\cdot\|_2$ (solid), and the taxicab norm $\|\cdot\|_1$ (dotted).

Norms that satisfy the conclusion of Theorem 16.74 are also called equivalent. Theorem 16.76 below shows that any two norms on a finite dimensional vector space are equivalent. Figure 37 provides a visualization of equivalence for norms.

Definition 16.75 Let $X$ be a vector space and let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on $X$. Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are called equivalent iff there are real numbers $c, C > 0$ such that for all $x \in X$ we have $c\|x\|_1 \le \|x\|_2 \le C\|x\|_1$.

Theorem 16.76 Let $X$ be a finite dimensional vector space. Then all norms on $X$ are equivalent.

Proof. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms on $X$. Let $\{b_1, \ldots, b_d\}$ be a base of $X$ and let $\Phi : X \to \mathbb{R}^d$ be the isomorphism that maps each $b_i$ to $e_i$. For $k = 1, 2$, define $\left\|\sum_{i=1}^d x^{(i)} e_i\right\|_k^{\mathbb{R}^d} := \left\|\sum_{i=1}^d x^{(i)} b_i\right\|_k$. Then $\|\cdot\|_1^{\mathbb{R}^d}$ and $\|\cdot\|_2^{\mathbb{R}^d}$ are norms on $\mathbb{R}^d$, so by Theorem 16.74 there are $c, C > 0$ such that $c\|y\|_1^{\mathbb{R}^d} \le \|y\|_2^{\mathbb{R}^d} \le C\|y\|_1^{\mathbb{R}^d}$ for all $y \in \mathbb{R}^d$. But then for all $x := \sum_{i=1}^d x^{(i)} b_i \in X$ we infer $c\|x\|_1 = c\|\Phi(x)\|_1^{\mathbb{R}^d} \le \|\Phi(x)\|_2^{\mathbb{R}^d} = \|x\|_2 \le C\|\Phi(x)\|_1^{\mathbb{R}^d} = C\|x\|_1$. ∎

Equivalent norms induce the same notion of convergence.

Proposition 16.77 Let $X$ be a vector space, let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two equivalent norms on $X$, let $\{x_n\}_{n=1}^\infty$ be a sequence in $X$ and let $x \in X$. Then $\lim_{n\to\infty} x_n = x$ in $(X, \|\cdot\|_1)$ iff $\lim_{n\to\infty} x_n = x$ in $(X, \|\cdot\|_2)$.

Proof. Exercise 16-69. ∎

Because equivalent norms induce the same notion of convergence, sequences in $d$-dimensional space converge iff their component sequences converge.
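For the three norms named in Figure 37, explicit equivalence constants are classical: for every $x \in \mathbb{R}^d$ we have $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le d\,\|x\|_\infty$, where the subscripts now denote the uniform, Euclidean and taxicab norms. The Python sketch below checks this chain on random vectors and then shows, for one hypothetical sequence in $\mathbb{R}^2$, that the distance to the limit shrinks in all three norms at once. The function names, the sample sequence and the tolerance `tol` are assumptions made for this illustration only.

```python
import math
import random


def norm_inf(x):
    """Uniform norm: largest absolute component."""
    return max(abs(t) for t in x)


def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))


def norm_1(x):
    """Taxicab norm: sum of absolute components."""
    return sum(abs(t) for t in x)


def check_chain(x, tol=1e-12):
    """Check ||x||_inf <= ||x||_2 <= ||x||_1 <= d ||x||_inf up to rounding."""
    d = len(x)
    a, b, c = norm_inf(x), norm_2(x), norm_1(x)
    return a <= b + tol and b <= c + tol and c <= d * a + tol


random.seed(0)
samples = [[random.uniform(-5, 5) for _ in range(4)] for _ in range(1000)]
all_ok = all(check_chain(x) for x in samples)

# Hypothetical sequence x_n = (1/n, 2 + (-1)^n / n) with limit (0, 2).
limit = (0.0, 2.0)


def term(n):
    return (1.0 / n, 2.0 + (-1) ** n / n)


def error(n, norm):
    """Distance from the n-th term to the limit in the given norm."""
    return norm([a - b for a, b in zip(term(n), limit)])


for n in (10, 100, 1000):
    print(n, error(n, norm_inf), error(n, norm_2), error(n, norm_1))
```

Both component sequences converge, and the three error columns differ only by bounded factors, which is the content of Proposition 16.77 in this special case.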
Theorem 16.78 Let $X$ be a finite dimensional normed space and let $\{b_1, \ldots, b_d\}$ be a base of $X$. For each element $x$ in $X$, let $x^{(1)}, \ldots, x^{(d)}$ be the components such that $x = \sum_{i=1}^d x^{(i)} b_i$. Then a sequence $\{x_n\}_{n=1}^\infty$ converges to $L$ in $X$ iff all component sequences $\{x_n^{(i)}\}_{n=1}^\infty$ converge to $L^{(i)}$.

Proof. Use Theorems 16.4, 16.76, and Proposition 16.77. (Exercise 16-70.) ∎

In particular, we obtain that all finite dimensional normed spaces are complete.

Theorem 16.79 Let $X$ be a finite dimensional normed space. Then $X$ is complete.

Proof. Exercise 16-71. ∎

Moreover, in finite dimensional spaces compactness is equivalent to being closed and bounded.

Theorem 16.80 A subset $C$ of a finite dimensional normed space $X$ is compact iff it is closed and bounded.

Proof. The direction "$\Rightarrow$" follows from Proposition 16.60. For "$\Leftarrow$," let $\{b_1, \ldots, b_d\}$ be a base of $X$ and for each $x \in X$ let $x^{(1)}, \ldots, x^{(d)}$ be the components so that $x = \sum_{i=1}^d x^{(i)} b_i$. Let $C \subseteq X$ be closed and bounded and let $\{x_n\}_{n=1}^\infty$ be a sequence in $C$. Because the component sequence $\{x_n^{(1)}\}_{n=1}^\infty$ is bounded in $\mathbb{R}$ there is a convergent subsequence $\{x_{n_j^1}^{(1)}\}_{j=1}^\infty$ with limit $x^{(1)}$. Now suppose $\{n_j^i\}_{j=1}^\infty$ has been chosen so that for $m = 1, \ldots, i$ the sequence $\{x_{n_j^i}^{(m)}\}_{j=1}^\infty$ has a limit $x^{(m)}$. Then we can choose a subsequence $\{n_j^{i+1}\}_{j=1}^\infty$ of $\{n_j^i\}_{j=1}^\infty$ so that $\{x_{n_j^{i+1}}^{(i+1)}\}_{j=1}^\infty$ has a limit $x^{(i+1)}$. But then for $m = 1, \ldots, i+1$ the sequence $\{x_{n_j^{i+1}}^{(m)}\}_{j=1}^\infty$ has a limit $x^{(m)}$. Continue this selection process up to $i = d$. By Theorem 16.78 the subsequence $\{x_{n_j^d}\}_{j=1}^\infty$ converges to $x := \sum_{i=1}^d x^{(i)} b_i$, and because $C$ is closed, $x \in C$. ∎

Theorem 16.80 and the Heine-Borel formulation of compactness allow us to prove that if $m, n, d \in \mathbb{N}$ with $m + n = d$, then for all sets $S \in \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ the $d$-dimensional Lebesgue measure $\lambda_d(S)$ is equal to the product measure $\lambda_m \times \lambda_n(S)$ of $m$-dimensional and $n$-dimensional Lebesgue measure. In particular, this completes the investigation started in Proposition 14.60.

Theorem 16.81 Let $d \in \mathbb{N}$ and for $i = 1, \ldots, d$ let $J_i$ be an interval of finite length.
Then $\lambda_d\left(\prod_{i=1}^d J_i\right) = \prod_{i=1}^d |J_i|$. Moreover, if the numbers $m, n \in \mathbb{N}$ satisfy $m + n = d$, then $\lambda_d|_{\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}} = \lambda_m \times \lambda_n$, that is, the restriction of the Lebesgue measure on $\mathbb{R}^d$ to $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ is equal to the product of the Lebesgue measures on $\mathbb{R}^m$ and $\mathbb{R}^n$.

Proof. The inequality $\lambda_d\left(\prod_{i=1}^d J_i\right) \le \prod_{i=1}^d |J_i|$ follows directly from the definition of outer Lebesgue measure. To prove the reversed inequality, we proceed as follows. For $i = 1, \ldots, d$, let $a_i$ be the left endpoint of $J_i$ and let $b_i$ be the right endpoint. Let $K := \prod_{i=1}^d [a_i, b_i]$. It is easy to prove (see Exercise 16-72) the equality $\lambda_d\left(\prod_{i=1}^d J_i\right) = \lambda_d(K)$. Now let $\varepsilon > 0$ and let $\{D_j\}_{j=1}^\infty$ be a sequence of dyadic open boxes so that $K \subseteq \bigcup_{j=1}^\infty D_j$ and $\lambda_d(K) + \varepsilon > \sum_{j=1}^\infty |D_j|$. (By Exercise 14-17, such a sequence exists.) Because $K$ is compact, we can assume without loss of generality that there is an $N \in \mathbb{N}$ so that $K \subseteq \bigcup_{j=1}^N D_j$. For each $j \in \{1, \ldots, N\}$, let $D_j = \prod_{i=1}^d (a_i^j, b_i^j)$. Let $M$ be the largest integer so that $2^M$ is the denominator of any of the completely simplified dyadic rational numbers $a_i^j$ and $b_i^j$. Let $C_M$ be the set of all cubes of the form $\prod_{i=1}^d \left(\frac{c_i}{2^M}, \frac{c_i + 1}{2^M}\right)$ where the $c_i$ are integers. Then for each $D_j$ the equality $|D_j| = \sum_{E \in C_M,\, E \cap D_j \ne \emptyset} |E|$ holds (see Exercise 16-73). For $i = 1, \ldots, d$, let $l_i$ be the largest integer so that $\frac{l_i}{2^M} < a_i$ and let $r_i$ be the smallest integer so that $\frac{r_i}{2^M} > b_i$. Then for every $E \in C_M$ that is contained in $Q := \prod_{i=1}^d \left(\frac{l_i}{2^M}, \frac{r_i}{2^M}\right)$ there is a $j \in \{1, \ldots, N\}$ so that $E \subseteq D_j$. Therefore $\lambda_d(K) + \varepsilon > \sum_{j=1}^N |D_j| \ge \sum_{E \in C_M,\, E \subseteq Q} |E| = |Q| \ge \prod_{i=1}^d |J_i|$. Because $\varepsilon$ was arbitrary we conclude $\lambda_d\left(\prod_{i=1}^d J_i\right) \ge \prod_{i=1}^d |J_i|$, and hence the two sides are equal.

In particular, this means that if $A$ is an $m$-dimensional open box and $B$ is an $n$-dimensional open box, then $\lambda_d(A \times B) = \lambda_m(A)\lambda_n(B)$. By Proposition 14.60, this equation also holds when one of $A$ or $B$ is a null set and the other is an arbitrary Lebesgue measurable set.
But then, because the ($m$- and $n$-dimensional) Lebesgue measurable sets are the $\sigma$-algebra generated by the open boxes and the null sets, the above and Theorem 14.57 prove that $\lambda_d|_{\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}} = \lambda_m \times \lambda_n$. ∎

Exercises

16-69. Prove Proposition 16.77.

16-70. Prove Theorem 16.78.

16-71. Prove Theorem 16.79.

16-72. Let $d \in \mathbb{N}$ and for $i = 1, \ldots, d$ let $J_i$ be a, not necessarily closed, interval of finite length with left endpoint $a_i$ and right endpoint $b_i$. Prove that $\lambda_d\left(\prod_{i=1}^d J_i\right) = \lambda_d\left(\prod_{i=1}^d [a_i, b_i]\right)$.

16-73. Let $M \in \mathbb{N}$ and let $C_M$ be the set of all cubes of the form $\prod_{i=1}^d \left(\frac{c_i}{2^M}, \frac{c_i + 1}{2^M}\right)$ where the $c_i$ are integers. Prove that for each dyadic open box of the form $D = \prod_{i=1}^d \left(\frac{a_i}{2^M}, \frac{b_i}{2^M}\right)$ the equality $|D| = \sum_{E \in C_M,\, E \cap D \ne \emptyset} |E|$ holds.

16-74. The Fundamental Theorem of Algebra. Let $P : \mathbb{C} \to \mathbb{C}$ defined by $P(z) := \sum_{k=0}^n a_k z^k$ be a nonconstant complex polynomial.
(a) Prove that $P$ is continuous.
(b) Prove that $|P| : \mathbb{C} \to [0, \infty)$ assumes an absolute minimum in $\mathbb{C}$. Hint. Recall that $\mathbb{C}$ is (as a metric space) isomorphic to $\mathbb{R}^2$. Prove that for any $M > 0$ there is an $r > 0$ so that for all $|z| > r$ the inequality $|P(z)| > M$ holds.
(c) Now prove the Fundamental Theorem of Algebra. That is, prove that there must be a $z \in \mathbb{C}$ so that $P(z) = 0$. Hint. Suppose there is no such $z$, let the absolute minimum of $|P|$ be assumed at $z_0$ and consider $Q(z) := \frac{1}{P(z_0)} P(z + z_0)$. Then $Q(z) = 1 + b_m z^m + \sum_{j=m+1}^n b_j z^j$ for some $m \in \mathbb{N}$. Apply the triangular inequality and find a $z$ with $|Q(z)| < 1$.
(d) Prove that there are, not necessarily distinct, $z_1, \ldots, z_n \in \mathbb{C}$ so that $P(z) = a_n \prod_{j=1}^n (z - z_j)$ for all $z \in \mathbb{C}$.

16-75. Partial Fraction Decompositions.
(a) Let $P$ be a polynomial with real coefficients. Prove that if $z \in \mathbb{C}$ is so that $P(z) = 0$, then $P(\bar{z}) = 0$.
(b) Use the Fundamental Theorem of Algebra to prove that each polynomial with real coefficients can be written as a product of the leading coefficient, linear factors $(x - c)$ and irreducible quadratic factors $\left((x - a)^2 + b^2\right)$, where all constants $a$, $b$, and $c$ are real.
(c) Prove that every rational function with real coefficients can be written as the sum of a polynomial and a linear combination of horizontally shifted rational functions as in Exercises 12-11, 12-17d, and 12-18. Hint. Induction on the degree of the denominator.
(d) Explain why (at least in principle) it is possible to find a symbolic antiderivative for every rational function with real coefficients.

16-76. Prove that on $l^2$ the norms $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are not equivalent.

16-77. Prove that in a finite dimensional normed space a series converges unconditionally iff it converges absolutely.

16-78. Let $X$ be an infinite dimensional inner product space. Prove that $\{x \in X : \|x\| \le 1\}$ is closed and bounded, but not compact. Hint. Apply the Gram-Schmidt Orthonormalization Procedure to a sequence $\{b_n\}_{n=1}^\infty$ of linearly independent vectors in $X$ to obtain a countable orthonormal system in $\{x \in X : \|x\| \le 1\}$.

16-79. Proceed as follows to prove that a normed space $X$ is finite dimensional iff compactness is equivalent to being closed and bounded.
(a) Briefly explain why we only need to prove "$\Leftarrow$."
For "$\Leftarrow$" we prove the contrapositive. So for the remainder, let $X$ be an infinite dimensional normed space and let $\{b_n\}_{n=1}^\infty$ be a sequence in $X$ so that any finite subset is linearly independent. Even though we do not necessarily have an inner product in $X$, we can adapt the idea from Exercise 16-78.
Let $A \subseteq X$ be a nonempty subset. For all $x \in X$, define the distance from $x$ to $A$ as $\operatorname{dist}(x, A) := \inf\{d(x, a) : a \in A\}$. (We will investigate this function in Section 16.9.)
For $v_1, \ldots$
$, v_n \in X$ and $r \ge 0$ let $B_r^{\operatorname{span}(v_1, \ldots, v_n)} := \{x \in \operatorname{span}(v_1, \ldots, v_n) : \|x\| \le r\}$. Prove that for any element $w \notin \operatorname{span}(v_1, \ldots, v_n)$ there is an $a(w) \in B_1^{\operatorname{span}(v_1, \ldots, v_n)}$ so that $\|w - a(w)\| = \operatorname{dist}\left(w, B_1^{\operatorname{span}(v_1, \ldots, v_n)}\right) > 0$.
Prove that $\|a(w)\| \le 2\|w\|$.
Prove that if $\|w\| < \frac{1}{4}$, then $\operatorname{dist}\left(w - a(w), B_1^{\operatorname{span}(v_1, \ldots, v_n)}\right) = \|w - a(w)\|$.
Construct a sequence $\{u_n\}_{n=1}^\infty$ in $X$ so that $\|u_n\| = 1$ for all $n \in \mathbb{N}$ and so that for all distinct $i, j \in \mathbb{N}$ we have $\|u_i - u_j\| \ge 1$.
Finish the proof of the contrapositive of "$\Leftarrow$."

16-80. Fubini's Theorem revisited. Let $\lambda_2$ be Lebesgue measure on $\mathbb{R}^2$ and let $\lambda_x$ and $\lambda_y$ denote Lebesgue measure on the $x$- and $y$-axes, respectively.
(a) Prove that if the function $f : \mathbb{R}^2 \to [-\infty, \infty]$ is Lebesgue integrable and $\lambda_x \times \lambda_y$-measurable, then for almost all elements $x \in \mathbb{R}$ the function $f_x(y) := f(x, y)$ is Lebesgue integrable, for almost all elements $y \in \mathbb{R}$ the function $f^y(x) := f(x, y)$ is Lebesgue integrable, the function $x \mapsto \begin{cases} \int f_x \, d\lambda_y; & \text{if } f_x \text{ is Lebesgue integrable,} \\ 0; & \text{otherwise,} \end{cases}$ is Lebesgue integrable and the function …
(c) State and prove a result similar to the result in part 16-80a for the Lebesgue integral on $\mathbb{R}^3$, representing it as an iteration of three single variable Lebesgue integrals.
(d) Compute the following integrals:

16.7 Dense Subspaces

Recall that the integral of a nonnegative measurable function is defined as a supremum of integrals of simple functions. This means that for every integrable function there should be simple functions arbitrarily "close" to it. The concept of a dense subset expresses this idea in precise terms.

Definition 16.82 Let $X$ be a metric space. A set $S \subseteq X$ is called dense in $X$ iff for every $\varepsilon > 0$ and every $x \in X$ there is an $s \in S$ so that $d(x, s) < \varepsilon$.

So a subset $S$ of a metric space $X$ is dense iff every neighborhood of every point of $X$ contains a point in $S$. In terms of approximating elements, we can say the following.

Proposition 16.83 Let $X$ be a metric space.
Then $S \subseteq X$ is dense in $X$ iff for all $x \in X$ there is a sequence $\{s_n\}_{n=1}^\infty$ of elements in $S$ so that $\lim_{n\to\infty} s_n = x$.

Proof. Use Standard Proof Technique 3.8. (Exercise 16-81.) ∎

The simplest example of a dense subset is the set of rational numbers as a subset of the real numbers.

Theorem 16.84 $\mathbb{Q}$ is dense in $\mathbb{R}$.

Proof. Use Theorem 1.36. (Exercise 16-82.) ∎

Once we take care of the usual problem of equality almost everywhere, the above mentioned simple functions can be considered "dense in $L^p$."

Theorem 16.85 Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p < \infty$. Then the set $S := \{[s] : s \in \mathcal{F}(M, \mathbb{R}) \text{ is simple}\}$ is dense in $L^p(M, \Sigma, \mu)$.

Proof. First, consider a nonnegative function $g \in \mathcal{L}^p(M, \Sigma, \mu)$. From the proof of Theorem 14.29 (see proof of Theorem 9.19), we infer that there is a sequence $\{s_n\}_{n=1}^\infty$ of nonnegative simple functions that converges pointwise to $g$ with $0 \le s_n \le g$ for all $n \in \mathbb{N}$. Hence, the sequence $\{|s_n - g|^p\}_{n=1}^\infty$ converges pointwise to zero and it is bounded by the integrable function $g^p$. Thus by the Dominated Convergence Theorem we obtain $\lim_{n\to\infty} \int_M |s_n - g|^p \, d\mu = 0$, that is, $\lim_{n\to\infty} \|[s_n] - [g]\|_p = 0$.

Now let $f \in \mathcal{L}^p(M, \Sigma, \mu)$. Let $\{s_n\}_{n=1}^\infty$ be a sequence of simple functions so that $\lim_{n\to\infty} \|s_n - f^+\|_p = 0$ and let $\{t_n\}_{n=1}^\infty$ be a sequence of simple functions so that $\lim_{n\to\infty} \|t_n - f^-\|_p = 0$. Then $\{s_n - t_n\}_{n=1}^\infty$ is a sequence of simple functions and we conclude $0 \le \lim_{n\to\infty} \|[s_n - t_n] - [f]\|_p \le \lim_{n\to\infty} \left(\|s_n - f^+\|_p + \|t_n - f^-\|_p\right) = 0$. By Proposition 16.83, $S$ is dense in $L^p(M, \Sigma, \mu)$. ∎

Although simple functions can be defined for arbitrary measure spaces, when additional structure is available it would be desirable to have dense subsets of functions with properties related to that structure. The next result shows that the continuous functions are "dense in $L^p[a, b]$." For some $L^p$ spaces, we will find an even nicer dense subspace in Theorem 18.12.
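The approximation by simple functions in Theorem 16.85 can be observed numerically. The Python sketch below uses the standard dyadic staircase $s_n := \min\left\{n, 2^{-n}\lfloor 2^n g \rfloor\right\}$, which may differ in detail from the sequence obtained in the proof of Theorem 14.29, and estimates $\|s_n - g\|_p$ on $[0,1]$ with a midpoint rule. The test function $g(x) = x^2$, the sample count and all function names are assumptions chosen for illustration; the midpoint rule only approximates the Lebesgue integral.

```python
import math


def staircase(g_value, n):
    """n-th dyadic simple approximation of a nonnegative value.

    Rounds down to a multiple of 2**-n and caps the result at n,
    so the resulting function takes only finitely many values.
    """
    return min(n, math.floor(g_value * 2 ** n) / 2 ** n)


def lp_distance(g, n, p, a=0.0, b=1.0, samples=10_000):
    """Midpoint-rule estimate of ||s_n - g||_p on [a, b]."""
    h = (b - a) / samples
    total = 0.0
    for k in range(samples):
        x = a + (k + 0.5) * h
        total += abs(staircase(g(x), n) - g(x)) ** p * h
    return total ** (1.0 / p)


def g(x):
    # Hypothetical test function: nonnegative and bounded by 1 on [0, 1].
    return x * x


errors = [lp_distance(g, n, p=2) for n in range(1, 9)]
for n, e in enumerate(errors, start=1):
    print(n, e)
```

Because $0 \le g - s_n < 2^{-n}$ wherever $g \le n$, the estimated distances are bounded by $2^{-n}$ and decrease as $n$ grows, in line with the limit $\lim_{n\to\infty} \|[s_n] - [g]\|_p = 0$ in the proof.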
Theorem 16.86 Let $a < b$, let $C[a, b] := \{[f] : f \in C^0[a, b]\}$ and let $1 \le p < \infty$. Then $C[a, b]$ is dense in $L^p[a, b]$.

Figure 38: Illustration of the approximation of indicator functions in the proof of Theorem 16.86. First (a) cover the set $A$ with open intervals so that the measure of the union of the intervals is close to $\lambda(A)$. Then (b) discard all but finitely many intervals but do not decrease the measure of the union too much. Then (c) approximate the indicator function of each interval with a continuous function so that the integrals remain close.

Proof. By Theorems 5.20 and 9.26, $C[a, b]$ is a subset of $L^p[a, b]$. We are done if we can show that for any $\varepsilon > 0$ and $f \in \mathcal{L}^p(\mathbb{R})$ there is a continuous $g \in \mathcal{L}^p(\mathbb{R})$ so that $\|f - g\|_p < \varepsilon$. The result for $L^p[a, b]$ will follow because $\mathcal{L}^p[a, b]$ is embedded in $\mathcal{L}^p(\mathbb{R})$ by setting each function equal to zero outside $[a, b]$ and because the restriction of a continuous function on $\mathbb{R}$ to $[a, b]$ is continuous on $[a, b]$.

First, let $(l, r)$ be an open interval on the real line and let $\varepsilon > 0$. Define … Then each $h_{(l,r),\varepsilon}$ is continuous on $\mathbb{R}$ and the following inequalities hold. …

We now prove that for every measurable set $A$ and every $\varepsilon > 0$ there is a continuous function $g_{A,\varepsilon}$ so that $\|\mathbf{1}_A - g_{A,\varepsilon}\|_p < \varepsilon$. For the idea, consider Figure 38. Let $\{I_j\}_{j=1}^\infty$ be a sequence of open intervals so that $A \subseteq \bigcup_{j=1}^\infty I_j$ and $\sum_{j=1}^\infty |I_j| < \lambda(A) + (\ldots)^p$. Let $n \in \mathbb{N}$ be such that $\left(\sum_{j=n+1}^\infty |I_j|\right)^{\frac{1}{p}} < \ldots$ Then the function $g_{A,\varepsilon} := \sum_{j=1}^n h_{I_j, \ldots}$ is continuous on $\mathbb{R}$ and …

Now we prove that for every simple function $s$ and every $\varepsilon > 0$ there is a continuous function $g_{s,\varepsilon}$ so that $\|g_{s,\varepsilon} - s\|_p < \varepsilon$. Let $s = \sum_{j=1}^n a_j \mathbf{1}_{A_j}$ be a simple function on $\mathbb{R}$ and let $\varepsilon > 0$. Then $g_{s,\varepsilon} := \sum_{j=1}^n a_j\, g_{A_j, \frac{\varepsilon}{n(|a_j| + 1)}}$ is continuous on $\mathbb{R}$ and …

Now finally let $f \in \mathcal{L}^p(\mathbb{R})$ and let $\varepsilon > 0$. By Theorem 16.85 there is a simple function $s$ so that $\|f - s\|_p < \frac{\varepsilon}{2}$.
Moreover, $g_{s, \frac{\varepsilon}{2}}$ is continuous, and $\|f - g_{s, \frac{\varepsilon}{2}}\|_p \le \|f - s\|_p + \|s - g_{s, \frac{\varepsilon}{2}}\|_p < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$. Hence, for every $[f] \in L^p(\mathbb{R})$ and every $\varepsilon > 0$ there is a $g \in C(\mathbb{R})$ so that $\|[f] - [g]\|_p < \varepsilon$, which proves that $C[a, b]$ is dense in $L^p[a, b]$. ∎

It is worth noting that $S$ as well as $C[a, b]$ are actually linear subspaces of the normed spaces $L^p(M, \Sigma, \mu)$ and $L^p[a, b]$, respectively. In finite dimensional spaces, proper linear subspaces cannot be dense in the whole space. In infinite dimensional spaces there can be many dense linear subspaces comprised of "nice" elements. For integrable functions, it is standard practice to prove results for a dense subset of functions with nice properties and then use a limit argument to get the result for all functions. The proof of Theorem 18.37 is a prime example of this approach. Theorem 16.87 gives a first impression of how an equality on a dense subset translates to an equality on the whole space.

Theorem 16.87 Let $X, Y$ be metric spaces, let $D \subseteq X$ be dense and let the functions $f, g : X \to Y$ be continuous with $f|_D = g|_D$. Then $f = g$.

Proof. Let $x \in X \setminus D$ and let $\{d_n\}_{n=1}^\infty$ be a sequence in $D$ so that $\lim_{n\to\infty} d_n = x$. Then $f(x) = f\left(\lim_{n\to\infty} d_n\right) = \lim_{n\to\infty} f(d_n) = \lim_{n\to\infty} g(d_n) = g\left(\lim_{n\to\infty} d_n\right) = g(x)$. Because $f(x) = g(x)$ for all $x \in D$ this proves $f = g$. ∎

Because completeness is such a useful analytical property, we conclude this section by proving that every metric space can be viewed as a dense subspace of a complete metric space.

Definition 16.88 Let $X, Y$ be metric spaces. A function $f : X \to Y$ is called an isometry iff for all $x, x' \in X$ we have $d(f(x), f(x')) = d(x, x')$. If there is an isometry $f : X \to Y$, we will also say that $X$ can be isometrically embedded into $Y$.

Theorem 16.89 Every metric space $X$ can be isometrically embedded as a dense subspace into a complete metric space $C(X)$.

Proof. For this proof, let $d_X$ denote the metric on $X$. Define $\mathcal{C}(X) := \left\{ \{x^{(i)}\}_{i=1}^\infty : \{x^{(i)}\}_{i=1}^\infty \text{ is a Cauchy sequence in } X \right\}$.
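Let $\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$. We will first show that $\{d_X(x^{(i)}, y^{(i)})\}_{i=1}^\infty$ is a Cauchy sequence. To do this let $i, j \in \mathbb{N}$. Assume without loss of generality that $d_X(x^{(i)}, y^{(i)}) \ge d_X(x^{(j)}, y^{(j)})$. Then $\left|d_X(x^{(i)}, y^{(i)}) - d_X(x^{(j)}, y^{(j)})\right| \le d_X(x^{(i)}, x^{(j)}) + d_X(x^{(j)}, y^{(j)}) + d_X(y^{(j)}, y^{(i)}) - d_X(x^{(j)}, y^{(j)}) = d_X(x^{(i)}, x^{(j)}) + d_X(y^{(i)}, y^{(j)})$, and the right side is arbitrarily small for large $i, j$ because both sequences are Cauchy sequences. Thus for any $\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$ the sequence $\{d_X(x^{(i)}, y^{(i)})\}_{i=1}^\infty$ is a Cauchy sequence, and hence it has a limit. For $\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$, define $d^S\left(\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty\right) := \lim_{i\to\infty} d_X(x^{(i)}, y^{(i)})$.

We claim that $d^S$ is a semimetric on the set $\mathcal{C}(X)$. Clearly, for all $x, y \in \mathcal{C}(X)$ we have $d^S(x, y) \ge 0$, $d^S(x, x) = 0$ and $d^S(x, y) = d^S(y, x)$. Now consider three elements $\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty, \{z^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$. Then $d^S(x, z) = \lim_{i\to\infty} d_X(x^{(i)}, z^{(i)}) \le \lim_{i\to\infty} \left(d_X(x^{(i)}, y^{(i)}) + d_X(y^{(i)}, z^{(i)})\right) = d^S(x, y) + d^S(y, z)$.

Let $\sim$ be the equivalence relation on $\mathcal{C}(X)$ as in Theorem 15.61. Let $C(X)$ be the set of equivalence classes and let $d$ be the metric on $C(X)$ obtained from $(\mathcal{C}(X), d^S)$ via Theorem 15.61. As in Theorem 15.61 we will denote the elements of $C(X)$ by $[x]$, where $x \in \mathcal{C}(X)$.

We claim that for every $\{x^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$ and every $n \in \mathbb{N}$ there is an equivalent $\{y^{(i)}\}_{i=1}^\infty$ so that for all $i, j \in \mathbb{N}$ we have $d_X(y^{(i)}, y^{(j)}) < \frac{1}{n}$. Let $n \in \mathbb{N}$. There is an $m \in \mathbb{N}$ so that for $i, j \ge m$ we have that $d_X(x^{(i)}, x^{(j)}) < \frac{1}{n}$. The sequence $\{y^{(i)}\}_{i=1}^\infty := \{x^{(m+i)}\}_{i=1}^\infty$ is a Cauchy sequence and clearly for all $i, j \in \mathbb{N}$ we have $d_X(y^{(i)}, y^{(j)}) < \frac{1}{n}$. Moreover, because $\{x^{(i)}\}_{i=1}^\infty$ is a Cauchy sequence, we infer $\lim_{i\to\infty} d_X(y^{(i)}, x^{(i)}) = \lim_{i\to\infty} d_X(x^{(m+i)}, x^{(i)}) = 0$, and hence $\{y^{(i)}\}_{i=1}^\infty \sim \{x^{(i)}\}_{i=1}^\infty$.

To prove that $C(X)$ is complete, let $\{[x_n]\}_{n=1}^\infty$ be a Cauchy sequence in $C(X)$. By the above, for each $n \in \mathbb{N}$ we can assume without loss of generality that the sequence $x_n = \{x_n^{(i)}\}_{i=1}^\infty$ is such that for all $i, j \in \mathbb{N}$ we have $d_X(x_n^{(i)}, x_n^{(j)}) < \frac{1}{n}$. Define $x := \{x_n^{(1)}\}_{n=1}^\infty$. We claim that $x$ is a Cauchy sequence. Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that $\frac{1}{N} < \frac{\varepsilon}{3}$ and for all $m, n \ge N$ we have $d([x_m], [x_n]) < \frac{\varepsilon}{3}$. Let $n, m \ge N$ be fixed.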
Because $\lim_{i\to\infty} d_X(x_m^{(i)}, x_n^{(i)}) = d([x_m], [x_n]) < \frac{\varepsilon}{3}$, there is a $k \in \mathbb{N}$ so that $d_X(x_m^{(k)}, x_n^{(k)}) < \frac{\varepsilon}{3}$. Hence, for all $m, n \ge N$ we obtain $d_X(x_m^{(1)}, x_n^{(1)}) \le d_X(x_m^{(1)}, x_m^{(k)}) + d_X(x_m^{(k)}, x_n^{(k)}) + d_X(x_n^{(k)}, x_n^{(1)}) < \frac{1}{m} + \frac{\varepsilon}{3} + \frac{1}{n} < \varepsilon$. Because $m, n \ge N$ were arbitrary, $x$ is a Cauchy sequence.

We now claim that $\lim_{n\to\infty} [x_n] = [x]$. Let $\varepsilon > 0$. With $N \in \mathbb{N}$ as above, so that $\frac{1}{N} < \frac{\varepsilon}{3}$ and $d([x_m], [x_n]) < \frac{\varepsilon}{3}$ for all $m, n \ge N$, we obtain that for all $n, i \ge N$ there is a $k \ge N$ with $d_X(x_n^{(k)}, x_i^{(k)}) < \frac{\varepsilon}{3}$, and hence $d_X(x_n^{(i)}, x_i^{(1)}) \le d_X(x_n^{(i)}, x_n^{(k)}) + d_X(x_n^{(k)}, x_i^{(k)}) + d_X(x_i^{(k)}, x_i^{(1)}) < \frac{1}{n} + \frac{\varepsilon}{3} + \frac{1}{i} < \varepsilon$. Therefore $d([x_n], [x]) = \lim_{i\to\infty} d_X(x_n^{(i)}, x_i^{(1)}) \le \varepsilon$ for all $n \ge N$ and we conclude, as was claimed, that $\lim_{n\to\infty} d([x_n], [x]) = 0$.

Clearly, the map $f : X \to C(X)$ defined by $f(x) := \left[\{x\}_{i=1}^\infty\right]$ is well-defined and for all $x, y \in X$ we have $d_X(x, y) = d(f(x), f(y))$. This means that $X$ can be isometrically embedded into $C(X)$.

To prove that $f[X]$ is dense in $C(X)$, let $[x] \in C(X)$. Fix a representative $\{x^{(i)}\}_{i=1}^\infty$ of $[x]$. Then the sequence $\{f(x^{(i)})\}_{i=1}^\infty$ converges to $[x]$ in $C(X)$ (Exercise 16-83). ∎

A complete superspace $Y$ of a metric space $X$ so that $X$ is dense in $Y$ is also called a completion of the space. Any two completions are isometrically isomorphic (see Exercise 16-84d), so we can really speak of the completion. Thus Theorem 16.84 says that the real numbers are the completion of the rational numbers. Indeed, one way to construct the real numbers from the rational numbers in set theory is to apply the construction in the proof of Theorem 16.89 to $\mathbb{Q}$ (see Exercise 16-93). When considering function spaces, Theorem 16.86 says that $L^p[a, b]$ is the completion of $C[a, b]$ with respect to the $\|\cdot\|_p$ norm. Because continuous functions are Riemann integrable, $L^p[a, b]$ is also the completion of the set of (equivalence classes of) Riemann integrable functions with respect to the $\|\cdot\|_p$ norm.
It is possible to introduce the $L^p$ spaces as these completions, but by explicitly defining the integrals the elements of the completion are better understood. Finally, note that Exercises 16-94 and 16-95 show that the completions of normed and inner product spaces are Banach and Hilbert spaces, respectively.

Exercises

16-81. Prove Proposition 16.83.

16-82. Prove Theorem 16.84.

16-83. Finish the proof of Theorem 16.89 as follows. Let $[x] \in C(X)$. Fix a representative $\{x^{(i)}\}_{i=1}^\infty$ of $[x]$. Prove that the sequence $\{f(x^{(i)})\}_{i=1}^\infty$ converges to $[x]$ in $C(X)$.

16-84. Extensions of continuous functions.
(a) Let $X$ be a metric space, $Y$ a complete metric space, $D \subseteq X$ dense and let $f : D \to Y$ be uniformly continuous. Prove that $f$ can be extended to a unique continuous function $F : X \to Y$ so that $F|_D = f$.
(b) Prove that there are continuous functions $f : (0, 1] \to \mathbb{R}$ that do not have a continuous extension $F : [0, 1] \to \mathbb{R}$ as in Exercise 16-84a.
(c) Give an example that shows that the space $Y$ in Exercise 16-84a must be complete.
(d) Prove that if $X, Y$ are metric spaces, $D_X \subseteq X$ is dense in $X$, $D_Y \subseteq Y$ is dense in $Y$ and $f : D_X \to D_Y$ is an isometry, then there is an isometry between $X$ and $Y$.

16-85. Examples of dense subspaces.
(a) Let $(M, \Sigma, \mu)$ be a measure space. Prove that the space of "simple functions" $S := \{[s] : s \in \mathcal{F}(M, \mathbb{R}) \text{ is simple}\}$ is dense in $L^\infty(M, \Sigma, \mu)$.
(b) Prove that for $1 \le p < \infty$, $C^1 := \{[f] : f \in C^1(a, b)\}$ is dense in $L^p(a, b)$. Hint. You can defer most of the proof to corresponding parts of the proof of Theorem 16.86. Only the start needs to be modified.

16-86. Prove that $C[a, b]$ is not dense in $L^\infty[a, b]$.

16-87. Dense subspaces of $(C^0(X, \mathbb{R}), \|\cdot\|_\infty)$. Let $X$ be a compact metric space and let $C^0(X, \mathbb{R})$ be the space of continuous functions from $X$ to $\mathbb{R}$ with the uniform norm $\|\cdot\|_\infty$. A vector subspace $L$
of $C^0(X, \mathbb{R})$ is called a sublattice of $C^0(X, \mathbb{R})$ iff for all $f, g \in L$ we have that $\max(f,g) \in L$ and $\min(f,g) \in L$. A subset $S$ of $C^0(X, \mathbb{R})$ is called point-separating iff for all $x, y \in X$ with $x \ne y$ there is a $g \in S$ with $g(x) \ne g(y)$. A vector subspace $A$ of $C^0(X, \mathbb{R})$ is called a subalgebra of $C^0(X, \mathbb{R})$ iff for all $f, g \in A$ we have that $fg \in A$.
(a) Prove that if $L$ is a point-separating sublattice of $C^0(X, \mathbb{R})$ that contains the constant functions, then for all $\varepsilon > 0$, all functions $f \in C^0(X, \mathbb{R})$ and all $x \in X$ there is a $g \in L$ so that $g(x) = f(x)$ and for all $y \in X$ we have $g(y) \le f(y) + \varepsilon$. Hint. Set $h_x(z) := f(x)$ and for $y \in X \setminus \{x\}$ find $g \in L$ with $g(y) \ne g(x)$ and set $h_y(z) := f(x) + (f(y) - f(x)) \frac{g(z) - g(x)}{g(y) - g(x)}$. Cover $X$ with open sets $I_y$ on which the inequality $h_y(z) \le f(z) + \varepsilon$ holds for all $z \in I_y$. Then use a finite open subcover and minima.
(b) Prove that if $L$ is a point-separating sublattice of $C^0(X, \mathbb{R})$ that contains the constant functions, then $L$ is dense in $C^0(X, \mathbb{R})$. Hint. Let $f \in C^0(X, \mathbb{R})$. For each $x \in X$ find an open subset $I_x \ni x$ and a function $g_x \in L$ so that $g_x(y) \le f(y) + \varepsilon$ for all $y \in X$ and so that $|g_x(z) - f(z)| < \varepsilon$ for all $z \in I_x$. Then use a finite open subcover and maxima.
(c) Prove that if $L$ is a subspace of $C^0(X, \mathbb{R})$ so that for all $f \in L$ we have that $|f| \in L$, then $L$ is a sublattice of $C^0(X, \mathbb{R})$.
(d) Let $A$ be a subalgebra of $C^0(X, \mathbb{R})$ and let $P : \mathbb{R} \to \mathbb{R}$ be a polynomial. Prove that for all $f \in A$ the function $P \circ f$ is also in $A$.
(e) Let $A$ be a subalgebra of the space $C^0(X, \mathbb{R})$. Prove that the subspace $\overline{A}$ of $C^0(X, \mathbb{R})$ is a sublattice. Hint. First prove that $\overline{A}$ is a subalgebra. Then prove that for each $f \in \overline{A}$ the function $|f| = \sqrt{f^2}$ is also in $\overline{A}$. To prove that the square root can be taken, use Exercise 16-68c.
(f) Stone-Weierstrass Theorem. Let $X$ be compact.
Prove that if $A$ is a point-separating subalgebra of $C^0(X, \mathbb{R})$ that contains the constant functions, then $A$ is dense in $C^0(X, \mathbb{R})$.
(g) Prove that the subspace of all polynomials on $[a,b]$ is dense in $C^0[a,b]$.
(h) Let $\varepsilon > 0$. Prove that the subspace of all trigonometric polynomials on $[-\pi, \pi - \varepsilon]$ is dense in $C^0[-\pi, \pi - \varepsilon]$. Hint. Exercise 12-15b.
(i) Why are the trigonometric polynomials not dense in $C^0[-\pi, \pi]$?
(j) Let $C := [-\pi, \pi)$ with the metric $d(x,y) := \left\| (\cos(x), \sin(x)) - (\cos(y), \sin(y)) \right\|_2$.
i. Prove that $C$ is isometrically isomorphic to $\{ z \in \mathbb{R}^2 : \|z\|_2 = 1 \}$.
ii. Prove that $C$ is compact.
iii. Prove that the trigonometric polynomials are dense in $\left( C^0(C, \mathbb{R}), \|\cdot\|_{\infty} \right)$.

16-88. Density of polynomials does not imply that all Taylor series converge.
(a) Let $X \subseteq Y \subseteq Z$ be metric spaces so that $X$ is dense in $Z$. Prove that $X$ is dense in $Y$.
(b) Prove that the set of polynomials is dense in $\left( C^{\infty}[a,b], \|\cdot\|_{\infty} \right)$. Derivatives on the boundary are understood to be one-sided. You may use Exercise 16-87g.
(c) Explain why part 16-88b does not imply that every function in the space $\left( C^{\infty}[-1,1], \|\cdot\|_{\infty} \right)$ is the limit of its Taylor series about zero. (Lemma 18.8 will ultimately provide an example of a function whose Taylor series does not converge to the function.)

16-89. Use Egoroff's Theorem (see Exercise 14-37c) and Theorem 16.86 to prove that for every function $f \in L^p[a,b]$ and every $\varepsilon > 0$ there is a $B \subseteq [a,b]$ with $\lambda(B) > (b-a) - \varepsilon$ and a continuous function $g : B \to \mathbb{R}$ so that $g|_B = f|_B$.

16-90. A metric space with a countable dense subset is called separable.
(a) Prove that every open subset in a separable metric space is a countable union of open balls.
(b) Prove that a compact metric space is separable. Hint. Use Lemma 16.70 for $\varepsilon_k := \frac{1}{k}$ and all $k \in \mathbb{N}$.
(c) Let $O \subseteq \mathbb{R}^d$ be an open set. Prove that $O$ is separable. Hint. $\mathbb{Q}^d$ is countable.
(d) Give an example of a complete, separable metric space that is not compact.
(e) Prove that if $X$, $Y$ are metric spaces, $X$ is separable and $f : X \to Y$ is continuous and surjective, then $Y$ is separable.

16-91. Let $X$ be a metric space and let $D \subseteq X$. Prove that $D$ is dense in $X$ iff $\overline{D} = X$.

16-92. Let $X$ be a metric space. List all closed dense subspaces of $X$.

16-93. Constructing the real numbers from the rational numbers. Prove that the metric space obtained when applying the construction in the proof of Theorem 16.89 to the rational numbers is an ordered field. Hint. Define the operations $+$ and $\cdot$ for sequences termwise and prove that they induce well-defined operations on the equivalence classes. Then prove that these operations satisfy the field axioms. Define $\mathbb{R}^+$ as the set of equivalence classes of sequences for which there is a positive rational number $r$ so that the terms are eventually greater than $r$. Then prove that Axiom 1.6 is satisfied. The above, together with Theorem 16.89, shows that the space is a complete, ordered field. The Completeness Axiom as stated in this text can be proved using convergence of Cauchy sequences as in Exercise 2-25.

16-94. Prove that the completion of a normed space is a Banach space.

16-95. Prove that the completion of an inner product space is a Hilbert space.

16.8 Connectedness

Intuitively, a metric space should be connected iff it is possible to get from any place to any other place by going along an unbroken path. Unfortunately, this would define connectedness in terms of itself, because the only interpretation of "unbroken" is "connected." Using continuous paths to connect points also leads to some problems, which will be explained in Example 16.96. Therefore, we define connectedness by what it means to be disconnected. Sensibly, being disconnected should mean that the space can be split into two nonempty pieces that are separate from each other.
Two disjoint open sets can be considered to be separate entities, because each element in each set has nonzero distance to the respective other set.

Definition 16.90 A metric space $X$ is called disconnected iff there are two disjoint nonempty open sets $U, V \subseteq X$ such that $U \cup V = X$. A metric space $X$ is called connected iff no such disjoint nonempty open sets exist.

Note that if $X$ is disconnected, then the sets $U$ and $V$ are closed as well as open (Exercise 16-96). Such sets are sometimes called clopen.

Figure 39: A disconnected subset $D$ of a metric space can be separated by two open sets $U$ and $V$ into at least two nonempty components ($D_1$ and $D_2 \cup D_3$ in this figure).

As with any topological notion, subsets of metric spaces are disconnected iff they are disconnected as metric spaces. Figure 39 gives the idea and Exercise 16-97 investigates the properties of sets that disconnect a subset of a metric space. Most importantly, subspaces can only be disconnected by open subsets of the surrounding space.

The most natural example of a connected set is an interval. It turns out that intervals are the only connected subsets of the real numbers.

Theorem 16.91 A subset of the real line is connected iff it is an interval.

Proof. For "$\Rightarrow$," let $S$ be a connected subset of $\mathbb{R}$ and suppose for a contradiction that it is not an interval. Then there are $l, r \in S$ with $l < r$ such that there is an $m \in \mathbb{R} \setminus S$ with $l < m < r$. But then $U := S \cap (-\infty, m)$ and $V := S \cap (m, \infty)$ are both open in $S$, nonempty and disjoint, and because $m \notin S$ their union is $S$. This implies that $S$ is disconnected, contradiction.

For "$\Leftarrow$," let $I$ be an interval. Suppose for a contradiction that $I$ is not connected and let $U, V \subseteq I$ be disjoint nonempty open subsets of $I$ so that $U \cup V = I$. Let $c \in I$. Then $c \in U$ or $c \in V$. Without loss of generality assume that $c \in U$. Then there is an element $v \in V$ so that $v > c$ or $v < c$. Without loss of generality assume $v > c$.
Consider the set $W := \{ x \in V : x > c \}$, which is nonempty and bounded below by $c$. Let $b := \inf W$. Then because $c \notin V$ and $U$ is open we infer $c < b$. We claim that $b \notin W$. Indeed, otherwise $b \in V$ and then there is an $\varepsilon > 0$ so that $I \cap (b - \varepsilon, b + \varepsilon) \subseteq V$. Because $c < b$ and $c \notin V$ we obtain $c \le b - \varepsilon$. Because $I$ is an interval, we infer $(b - \varepsilon, b] \subseteq I$, and then $(b - \varepsilon, b] \subseteq W$, contradicting the choice of $b$. Thus $b \notin W$. Because $v > b > c$ and $I$ is an interval we obtain $b \in I$, and hence $b \in U$. But then there is an $\varepsilon > 0$ so that $I \cap (b - \varepsilon, b + \varepsilon) \subseteq U$. In particular, this means that $\inf(W) \ge b + \varepsilon$, contradicting the choice of $b$. ∎

Continuous functions satisfy an abstract version of the Intermediate Value Theorem, with connected spaces taking the place of the intervals on the real line.

Theorem 16.92 Let $X$, $Y$ be metric spaces and let $f : X \to Y$ be continuous. If $X$ is connected, then the image $f[X]$ is a connected subspace of $Y$.

Proof. Suppose for a contradiction that $f[X]$ is not connected. Then there are disjoint nonempty sets $U, V \subseteq f[X]$ that are open in $f[X]$ and satisfy $U \cup V = f[X]$. But then $f^{-1}[U]$ and $f^{-1}[V]$ are disjoint nonempty open subsets of $X$ so that the equality $f^{-1}[U] \cup f^{-1}[V] = X$ holds, contradicting the assumption that $X$ is connected. Thus $f[X]$ must be connected. ∎

The Intermediate Value Theorem is more readily recognized in the following result.

Corollary 16.93 Intermediate Value Theorem. Let $X$ be a connected metric space and let $f : X \to \mathbb{R}$ be continuous. Then for all $a, b \in X$ with $f(a) \ne f(b)$ and all $t$ between $f(a)$ and $f(b)$ there is a $c \in X$ with $f(c) = t$.

Proof. Exercise 16-98. ∎

By using images of intervals in Theorem 16.92 we obtain the idea of connectedness that was mentioned in the introduction.
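For $X = [a,b]$, Corollary 16.93 is the classical Intermediate Value Theorem, and a point $c$ with $f(c) = t$ can be approximated by repeatedly halving the interval. The following Python sketch only illustrates this special case and is not part of the text's development. The name `bisect_ivt` and the parameters `tol` and `max_iter` are assumptions made for the example.

```python
from typing import Callable


def bisect_ivt(f: Callable[[float], float], a: float, b: float,
               t: float, tol: float = 1e-12, max_iter: int = 200) -> float:
    """Approximate c in [a, b] with f(c) = t for a continuous f.

    Hypothetical helper: it assumes that t lies between f(a) and f(b),
    which is the hypothesis of the Intermediate Value Theorem.
    """
    ga, gb = f(a) - t, f(b) - t
    if ga == 0.0:
        return a
    if gb == 0.0:
        return b
    if ga * gb > 0.0:
        raise ValueError("t must lie between f(a) and f(b)")
    lo, hi, g_lo = a, b, ga
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = f(mid) - t
        if g_mid == 0.0 or abs(hi - lo) <= tol:
            return mid
        # Keep the half on which f - t still changes sign.
        if g_lo * g_mid < 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


# Usage: approximate the cube root of 2 as the solution of x**3 = 2 on [0, 2].
c = bisect_ivt(lambda x: x ** 3, 0.0, 2.0, 2.0)
```

The sketch relies on the order structure of the real line. In a general connected metric space the corollary only asserts that $c$ exists and gives no procedure for finding it.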
Definition 16.94 A metric space $X$ is called pathwise connected iff for any two points $x, y \in X$ there is a continuous function $f : [0,1] \to X$ such that $f(0) = x$ and $f(1) = y$.

Some examples of pathwise connected sets are given in Exercise 16-99. The next result shows that these are also examples of connected sets.

Theorem 16.95 Let $X$ be a pathwise connected metric space. Then $X$ is connected.

Proof. Suppose for a contradiction that $X$ is not connected. Then there are two nonempty disjoint open subsets $U$ and $V$ of $X$ so that $U \cup V = X$. Let $u \in U$ and $v \in V$ and let $f : [0,1] \to X$ be continuous with $f(0) = u$ and $f(1) = v$. Then $f^{-1}[U]$ and $f^{-1}[V]$ are disjoint nonempty open subsets of $[0,1]$ whose union is $[0,1]$, which is impossible because $[0,1]$ is connected. ∎

Exercise 16-100 will show that connectedness and pathwise connectedness are equivalent for open subsets of normed spaces. However, in general pathwise connectedness is strictly stronger than connectedness.

Example 16.96 The space $X := \{ (x, 0) : x \le 0 \} \cup \left\{ \left( x, \sin\left(\frac{1}{x}\right) \right) : x > 0 \right\}$ is connected, but it is not pathwise connected.

The space $X$ is not pathwise connected, because there is no continuous function $f : [0,1] \to X$ such that $f(0) = \left( \frac{1}{\pi}, 0 \right)$ and $f(1) = (0,0)$. On the other hand, both $\{ (x,0) : x \le 0 \}$ and $\left\{ \left( x, \sin\left(\frac{1}{x}\right) \right) : x > 0 \right\}$ are pathwise connected, and hence connected. Thus, if $X$ was disconnected, there would be disjoint open sets $U, V \subseteq \mathbb{R}^2$ so that $\{ (x,0) : x \le 0 \} \subseteq U$ and $\left\{ \left( x, \sin\left(\frac{1}{x}\right) \right) : x > 0 \right\} \subseteq V$. But $(0,0) \in U$ implies that $U$ intersects $\left\{ \left( x, \sin\left(\frac{1}{x}\right) \right) : x > 0 \right\}$, so this is not possible. □

Exercises

16-96. Prove that a metric space $X$ is disconnected iff there are disjoint nonempty closed sets $C, D \subseteq X$ so that $C \cup D = X$.

16-97. Disconnected subspaces. Let $X$ be a metric space and let $S \subseteq X$.
(a) Prove that $S$ is disconnected iff there are disjoint nonempty subsets $U, V \subseteq X$ that are open in $X$ such that $S \subseteq U \cup V$.
(b) Give an example that shows that a subspace $S$ can be disconnected and there are no disjoint nonempty subsets $C, D \subseteq X$ that are closed in $X$ such that $S \subseteq C \cup D$.

16-98. Prove Corollary 16.93.

16-99. Let $X$ be a normed space.
(a) Prove that for any two points $a, b \in X$ the function $f(t) := a + t(b - a)$ is continuous.
(b) Prove that $X$ is pathwise connected.
(c) Prove that for all $x \in X$ and all $\varepsilon > 0$ the ball $B_{\varepsilon}(x)$ is pathwise connected.

16-100. Let $X$ be a normed space and let $\Omega \subseteq X$ be open. Prove that $\Omega$ is connected iff it is pathwise connected.

16-101. Let $X$, $Y$ be metric spaces and let $f : X \to Y$ be continuous. Prove that if $X$ is pathwise connected, then $f[X]$ is a pathwise connected subspace of $Y$.

16-102. Unit "circles."
(a) Prove that $\{ (x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1 \}$ is connected. Hint. $c(t) = \cos(t) e_1 + \sin(t) e_2$.
(b) Let $1 \le p < \infty$. Prove that $\{ (x,y) \in \mathbb{R}^2 : \|(x,y)\|_p = 1 \}$ is connected.
(c) Prove that $\{ (x,y) \in \mathbb{R}^2 : \|(x,y)\|_{\infty} = 1 \}$ is connected.

16-103. Connected components. Let $X$ be a metric space. A subset $C \subseteq X$ is called a component of $X$ iff $C$ is connected and there is no proper superset $D \supset C$ that is connected.
(a) Prove that if $A, B \subseteq X$ are connected and $A \cap B \ne \emptyset$, then $A \cup B$ is connected.
(b) Prove that if $C_1$, $C_2$ are components of $X$ then either $C_1 = C_2$ or $C_1 \cap C_2 = \emptyset$.
(c) Prove that every $x \in X$ is contained in a component of $X$.
(d) Prove that every open subset of $\mathbb{R}$ is a countable union of pairwise disjoint open intervals.

16.9 Locally Compact Spaces

The goal of this section is to construct (families of) continuous functions that are equal to one on a specified set and equal to zero on another set. To construct such functions, we introduce the distance function.

Definition 16.97 Let $X$ be a metric space and let $A \subseteq X$ be nonempty. For all $x \in X$, we define $\mathrm{dist}(x, A) := \inf \{ d(x,a) : a \in A \}$ and call it the distance from $x$ to $A$.

Lemma 16.98 Let $X$ be a metric space and let $A \subseteq X$ be nonempty.
Then the function $\mathrm{dist}(\cdot, A)$ is Lipschitz continuous.

Proof. Let $x, y \in X$, let $\varepsilon > 0$ and let $a \in A$ be so that $d(y,a) \le \mathrm{dist}(y, A) + \varepsilon$. Then $\mathrm{dist}(x, A) \le d(x,a) \le d(x,y) + d(y,a) \le d(x,y) + \mathrm{dist}(y, A) + \varepsilon$, and hence $\mathrm{dist}(x, A) - \mathrm{dist}(y, A) \le d(x,y) + \varepsilon$. Because $\varepsilon > 0$ was arbitrary this means that $\mathrm{dist}(x, A) - \mathrm{dist}(y, A) \le d(x,y)$. We can prove $\mathrm{dist}(y, A) - \mathrm{dist}(x, A) \le d(x,y)$ in similar fashion. Hence, $|\mathrm{dist}(x, A) - \mathrm{dist}(y, A)| \le d(x,y)$ and $\mathrm{dist}(\cdot, A)$ is Lipschitz continuous with Lipschitz constant 1. ∎

Lemma 16.99 below says that for any closed set $C$ that is contained in an open set $U$ the distance function allows us to slip another open set $V$ between $C$ and $U$ so that the closure of $V$ is also between $C$ and $U$. For an illustration of Lemma 16.99, see Figure 40.

Lemma 16.99 Let $X$ be a metric space, let $C \subseteq X$ be closed and let $U \subseteq X$ be open so that $C \subseteq U$. Then there is a continuous function $f : X \to [0,1]$ so that $f|_C = 1$ and $f|_{X \setminus U} = 0$. Moreover, there is an open set $V$ so that $C \subseteq V \subseteq \overline{V} \subseteq U$.

Proof. For each $x \in X$, let $f(x) := \frac{\mathrm{dist}(x, X \setminus U)}{\mathrm{dist}(x, X \setminus U) + \mathrm{dist}(x, C)}$. Because $C$ and $X \setminus U$ are disjoint closed sets, the denominator is greater than zero for all $x \in X$ (see Exercise 16-104b). Thus $f$ is continuous on $X$. Moreover, (see Exercise 16-105) $f|_C = 1$ and $f|_{X \setminus U} = 0$.

To prove the claim about the sets, let $V := f^{-1}\left[ \left( \frac{1}{2}, 1 \right] \right]$. Because $\left( \frac{1}{2}, 1 \right]$ is open in $[0,1]$ and $f$ is continuous, $V$ is an open set in $X$ and because $f|_C = 1$ it contains $C$. Moreover, because $f$ is continuous, for all $x \in \overline{V}$ we infer $f(x) \ge \frac{1}{2}$, which means that $\overline{V} \subseteq U$. ∎

For compact sets $C$, we would like to separate $C$ from its neighborhood $U$ with an open set $V$ whose closure is compact. While this is not possible in general (see Exercise 16-108), it is possible in spaces with sufficiently many compact subsets.
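The function from the proof of Lemma 16.99 can be written down explicitly in simple cases. The following Python sketch is a hedged illustration on the real line with $C = [0,1]$ and $U = (-1,2)$. These two sets and all function names are assumptions chosen for the example, not objects from the text.

```python
def dist_to_closed_interval(x: float, lo: float, hi: float) -> float:
    """dist(x, [lo, hi]) in the real line with d(x, y) = |x - y|."""
    return max(lo - x, x - hi, 0.0)


def dist_to_complement_of_open_interval(x: float, lo: float, hi: float) -> float:
    """Distance from x to the complement of (lo, hi).

    The value is zero outside (lo, hi) and equals the distance to the
    nearer endpoint inside (lo, hi).
    """
    return max(min(x - lo, hi - x), 0.0)


def separating_function(x: float,
                        c: tuple = (0.0, 1.0),
                        u: tuple = (-1.0, 2.0)) -> float:
    """f(x) = dist(x, X \\ U) / (dist(x, X \\ U) + dist(x, C)) for the
    assumed sets C = [c[0], c[1]] and U = (u[0], u[1]) with C inside U."""
    d_out = dist_to_complement_of_open_interval(x, *u)
    d_c = dist_to_closed_interval(x, *c)
    # The denominator is positive because C and the complement of U are
    # disjoint closed sets.
    return d_out / (d_out + d_c)


# Usage: f equals 1 on C, 0 outside U, and takes the value 1/2 at x = -0.5.
values = [separating_function(x) for x in (-1.0, -0.5, 0.5, 1.5, 2.5)]
```

For these particular sets, $V = f^{-1}\left[\left(\frac{1}{2}, 1\right]\right]$ is the interval $\left(-\frac{1}{2}, \frac{3}{2}\right)$, whose closure lies between $C$ and $U$ as the lemma asserts.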
Local compactness guarantees that locally there are enough compact subsets by demanding that every point has a compact neighborhood.

Definition 16.100 A metric space $X$ is called locally compact iff every $x \in X$ has a compact neighborhood.

Proposition 16.101 A metric space $X$ is locally compact iff for every $x \in X$ there is an $\varepsilon > 0$ so that $\overline{B_{\varepsilon}(x)}$ is compact.

Proof. For "$\Rightarrow$," let $X$ be locally compact and let $x \in X$. Then $x$ has a compact neighborhood $N$. Let $\varepsilon > 0$ be so that $\overline{B_{\varepsilon}(x)} \subseteq N$. Then by Proposition 16.61 the set $\overline{B_{\varepsilon}(x)}$ is compact.

Conversely, let $X$ be so that for every $x \in X$ there is an $\varepsilon > 0$ so that $\overline{B_{\varepsilon}(x)}$ is compact. Then for every $x \in X$ the neighborhood $N := \overline{B_{\varepsilon}(x)}$ is compact. ∎

It is easy to infer from Proposition 16.101 that all open subsets of $\mathbb{R}^d$ and all closed subsets of $\mathbb{R}^d$ are locally compact. In particular, we obtain that surfaces like the unit sphere $\{ x \in \mathbb{R}^d : \|x\|_2 = 1 \}$ are locally compact. More generally, we can say the following.

Definition 16.102 Let $X$, $Y$ be metric spaces. Then $f : X \to Y$ is called a homeomorphism iff $f$ is continuous and bijective and its inverse is continuous, too.

Example 16.103 Any metric space for which each point has a neighborhood that is homeomorphic to an open set in $\mathbb{R}^d$ is locally compact. If the dimension $d$ does not depend on the point, we call the space a manifold. Manifolds are discussed in detail in Chapter 19. Surfaces such as the unit sphere are manifolds. For much of the following, solids and surfaces in $\mathbb{R}^d$ are a good visualization and motivation. □

In locally compact spaces, between any compact set $C$ and any open neighborhood $U$ of $C$ we can slip a compact neighborhood of $C$ (also see Figure 40(a)).

Lemma 16.104 Let $X$ be a locally compact metric space, let $C \subseteq X$ be compact and let $U \subseteq X$ be open with $C \subseteq U$. Then $C$ has a neighborhood $V$ so that $\overline{V}$ is compact and contained in $U$.

Proof.
For each $c \in C$, there is an $\varepsilon_c > 0$ so that $\overline{B_{\varepsilon_c}(c)}$ is compact and contained in $U$. Because $C \subseteq \bigcup_{c \in C} B_{\varepsilon_c}(c)$ and $C$ is compact, there are $c_1, \ldots, c_n \in C$ so that $C \subseteq \bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)$. But then $V := \bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)$ is a neighborhood of $C$ so that by Exercises 16-44b and 16-110 the closure $\overline{V} = \overline{\bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)} = \bigcup_{j=1}^{n} \overline{B_{\varepsilon_{c_j}}(c_j)}$ is compact. Moreover, because all $\overline{B_{\varepsilon_{c_j}}(c_j)}$ are contained in $U$, the union is contained in $U$, too. ∎

Standard Proof Technique 16.105 The argument in the proof of Lemma 16.104 is a typical application of the Heine-Borel formulation of compactness. We start with an open cover and because the notion we want to preserve is only preserved by finite unions, we use a finite subcover. □

Local compactness only applies near individual points. As it turns out, for connected spaces we can turn this local idea into a property that allows us to use compactness in a more global fashion.

Definition 16.106 A metric space $X$ is called $\sigma$-compact iff $X$ is the union of countably many compact sets.

Clearly, closed subsets of $\mathbb{R}^d$ are $\sigma$-compact, because their intersections with the closed balls $\overline{B_n(0)}$ are compact. More specifically we can say the following.

Theorem 16.107 Let $X$ be a connected, locally compact metric space. Then $X$ is $\sigma$-compact.

Figure 40: Part (a) illustrates Lemmas 16.99 and 16.104, which say that between any closed (compact) set $C$ and any open set $U$ surrounding $C$ there is an open set $V$ so that $V$ and $\overline{V}$ are "between" $C$ and $U$. Part (b) shows a $\sigma$-compact space and the idea for the proof of Theorem 16.112. The concentric circles depict a compact exhaustion (the sets $K_n$ in the proof of Theorem 16.112). The shells $S_n$ and their neighborhoods $U_n$ are set up so that only finitely many of the $U_n$ can intersect.

Proof. For each $x \in X$, let $r_x := \sup \left\{ r > 0 : \overline{B_r(x)} \text{ is compact} \right\}$. Because $X$ is locally compact, each $r_x$ is greater than zero.
If $x \in X$ is so that $r_x$ is infinity, then for all $n \in \mathbb{N}$ the set $\overline{B_n(x)}$ is compact, and hence $X = \bigcup_{n=1}^{\infty} \overline{B_n(x)}$ is $\sigma$-compact. This leaves the case in which each $r_x$ is finite.

In this case, we first prove that the function $x \mapsto r_x$ is continuous. Let $x, z \in X$ be so that $d(x,z) < r_x$. Then for all $r \in (d(x,z), r_x)$ we infer $\overline{B_{r - d(x,z)}(z)} \subseteq \overline{B_r(x)}$ and the ball on the right is compact. Hence, $r_z \ge r_x - d(x,z)$, that is, $r_x - r_z \le d(x,z)$. Reversing the roles of $x$ and $z$ we can also prove $r_z - r_x \le d(x,z)$, which means $|r_x - r_z| \le d(x,z)$ for all $z$ with $d(x,z) < r_x$. Hence, $x \mapsto r_x$ is continuous at each $x \in X$.

Now for each compact subset $C$ of $X$ we define $N(C) := \bigcup_{c \in C} \overline{B_{r_c/2}(c)}$. We claim that the set $N(C)$ is compact. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $N(C)$. For each $x_n$ there is a $c_n \in C$ so that $x_n \in \overline{B_{r_{c_n}/2}(c_n)}$. Because $C$ is compact, $\{c_n\}_{n=1}^{\infty}$ has a convergent subsequence with limit $c \in C$. Without loss of generality we can assume that $\{c_n\}_{n=1}^{\infty}$ itself converges to $c$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequalities $d(c_n, c) < \frac{r_c}{4}$ and $|r_{c_n} - r_c| < \frac{r_c}{4}$ hold. But then for all $n \ge N$ we have $\frac{r_{c_n}}{2} < \frac{5 r_c}{8}$, and hence $d(x_n, c) \le d(x_n, c_n) + d(c_n, c) < \frac{r_{c_n}}{2} + \frac{r_c}{4} < \frac{7 r_c}{8}$, so $x_n \in \overline{B_{7r_c/8}(c)}$. Because $\overline{B_{7r_c/8}(c)}$ is compact, $\{x_n\}_{n=1}^{\infty}$ has a convergent subsequence. This proves that $N(C)$ is compact.

Now let $x \in X$ be arbitrary and let $C_1 := \{x\}$. Recursively define $C_{n+1} := N(C_n)$ for $n \in \mathbb{N}$. Let $H := \bigcup_{n=1}^{\infty} C_n$. Then because each $C_n$ is compact, $H$ is $\sigma$-compact. Moreover, because each $C_{n+1}$ is a neighborhood of every element of $C_n$, $H$ is open. To see that $H$ is closed, let $x \in \overline{H}$. Then there is an $h \in H$ so that $d(h,x) < \frac{r_x}{4}$ and $|r_h - r_x| < \frac{r_x}{4}$. But then $\frac{r_h}{2} > \frac{3 r_x}{8} > \frac{r_x}{4}$, which means that if $n \in \mathbb{N}$ is so that $h \in C_n$, then $x \in N(C_n)$. Hence, $H = \overline{H}$ is closed and so $X \setminus H$ is open. Because $X$ is connected, this means that $X = H$ and since $H$ is $\sigma$-compact the result is proved. ∎
We conclude this section by proving that locally compact spaces have a partition of unity (see Definition 16.110 below) with certain properties.

Definition 16.108 The cover $\mathcal{O}$ of the metric space $X$ is called locally finite iff each $p \in X$ has a neighborhood that intersects only finitely many elements of $\mathcal{O}$.

Definition 16.109 Let $X$ be a metric space and let $f : X \to \mathbb{R}$. Then the support of $f$ is defined to be $\mathrm{supp}(f) := \overline{\{ x \in X : f(x) \ne 0 \}}$.

Definition 16.110 A family $\{\varphi_j\}_{j \in J}$ of continuous functions on a metric space $X$ is called a partition of unity iff
1. The collection $\left\{ \{ x \in X : \varphi_j(x) \ne 0 \} \right\}_{j \in J}$ is a locally finite cover of $X$.
2. For all $x \in X$ we have that $\sum_{j \in J} \varphi_j(x) = 1$. (By part 1 for each $x \in X$ this sum has only finitely many nonzero terms.)
If $\mathcal{O}$ is an open cover of $X$ and for each $j \in J$ the containment $\mathrm{supp}(\varphi_j) \subseteq U$ holds for some $U \in \mathcal{O}$, then the partition of unity is called subordinate to $\mathcal{O}$.

The importance of partitions of unity will become clear in Section 19.5. Until then, consider the following. Many surfaces in $\mathbb{R}^d$ cannot be parametrized with just one function that has an open domain. (Open domains are needed, because differentiable functions typically have open domains, see Chapter 17.) For example, a parametrization of the unit sphere, say, with spherical coordinates, will always either hit a few points twice or it will miss at least a "seam." This is because a parametrization must be a homeomorphism and the unit sphere is compact, which means it cannot be homeomorphic to an open subset of $\mathbb{R}^d$. However, roughly speaking, a function is integrated over the sphere by integrating its composition with the right parametrization. Thus, it is problematic to double count points or miss points. Either case would distort the integral. This problem does not arise for functions that are zero except on some small open set. For such functions, we could simply use a parametrization for which the missed seam does not intersect the support of the function.
A partition of unity $\{\varphi_j\}_{j \in J}$ allows us to represent arbitrary functions $f$ as sums $\sum_{j \in J} \varphi_j f$ of functions $\varphi_j f$ whose supports are contained in "small" open sets. We can then integrate these functions separately and the overall sum will be the integral of $f$. There is still a tremendous amount of detail left to be considered (think about independence of the parametrization), and this is why we will later need partitions of unity subordinate to open covers that have further nice properties.

Because not every cover is locally finite, we need the notion of a refinement.

Definition 16.111 Let $\mathcal{O}$ be a cover of the metric space $X$. A cover $\tilde{\mathcal{O}}$ is called a refinement of $\mathcal{O}$ iff for all $U \in \tilde{\mathcal{O}}$ there is a $V \in \mathcal{O}$ so that $U \subseteq V$.

Theorem 16.112 Let $\mathcal{O}$ be an open cover of the locally compact, $\sigma$-compact metric space $X$. Then $\mathcal{O}$ has a countable locally finite open refinement. Moreover, the closures of the sets in the refinement are compact.

Proof. There is nothing to prove if $X$ is compact. If $X$ is not compact, let $\{C_j\}_{j=1}^{\infty}$ be a sequence of compact sets with $X = \bigcup_{j=1}^{\infty} C_j$. Let $K_1 := C_1$ and once $K_1, \ldots, K_n$ are defined, let $K_{n+1}$ be a compact neighborhood of $K_n \cup C_n$ so that $K_n \cup C_n \subseteq K_{n+1}^{\circ}$. Then $X = \bigcup_{n=1}^{\infty} K_n$ and for all $n \in \mathbb{N}$ we have the containment $K_n \subseteq K_{n+1}^{\circ}$.

Let $K_{-1} := K_0 := \emptyset$. For all $n \in \mathbb{N}$ let $S_n := K_n \setminus K_{n-1}^{\circ}$ and $U_n := K_{n+1}^{\circ} \setminus K_{n-2}$. Then $X = \bigcup_{j=1}^{\infty} S_j$ and for all $n \in \mathbb{N}$ the set $S_n$ is compact, $U_n$ is open and $S_n \subseteq U_n$. For $|n - m| > 1$, we have $S_n \cap S_m = \emptyset$ and for all $|n - m| > 2$ we have $U_n \cap U_m = \emptyset$ (also see the right part of Figure 40).

For each $x \in S_n$, let $N_x$ be an open neighborhood of $x$ that is contained in $U_n$ and in some $O \in \mathcal{O}$. Then $S_n \subseteq \bigcup_{x \in S_n} N_x$ and because $S_n$ is compact, there are $x_1^{(n)}, \ldots, x_{k_n}^{(n)} \in S_n$ so that $S_n \subseteq \bigcup_{j=1}^{k_n} N_{x_j^{(n)}}$. We define $\tilde{\mathcal{O}} := \left\{ N_{x_j^{(n)}} : n \in \mathbb{N}, j = 1, \ldots, k_n \right\}$. Clearly, $\tilde{\mathcal{O}}$ is countable. Because $X = \bigcup_{j=1}^{\infty} S_j$ and $S_n \subseteq \bigcup_{j=1}^{k_n} N_{x_j^{(n)}}$ for all $n \in \mathbb{N}$ we conclude that $\tilde{\mathcal{O}}$ is a cover of $X$. All $N_{x_j^{(n)}}$ are open and contained in an $O \in \mathcal{O}$, so $\tilde{\mathcal{O}}$ is an open refinement of $\mathcal{O}$.

Finally, to see that $\tilde{\mathcal{O}}$ is locally finite, let $x \in X$. Then there is a $k \in \mathbb{N}$ so that $x \notin U_m$ unless $m \in \{k, k+1, k+2\}$. Any $N_{x_j^{(n)}}$ that intersects $U_k \cup U_{k+1} \cup U_{k+2}$ must be contained in one of $U_{k-2}, \ldots, U_{k+4}$. These sets contain finitely many $N_{x_j^{(n)}}$ each. Therefore $x$ can be in at most finitely many $N_{x_j^{(n)}}$, and hence $\tilde{\mathcal{O}}$ is locally finite. Finally, because each $N_x$ is contained in a $U_n \subseteq K_{n+1}$, the closure of each $N_x$ is compact. ∎

As it turns out, any locally finite open cover of a $\sigma$-compact space is countable.

Proposition 16.113 Let $\mathcal{O}$ be a locally finite open cover of the $\sigma$-compact metric space $X$. Then $\mathcal{O}$ is countable.

Proof. Exercise 16-109. ∎

To construct a partition of unity subordinate to a locally finite open cover we need to find a way to define functions that are supported inside the open sets so that the sets where the functions are not zero cover the whole metric space. To do that, we need to be able to shrink every set in the open cover a little bit while making sure that we still cover the whole space.

Theorem 16.114 Shrinking Lemma. Let $\mathcal{O}$ be a locally finite open cover of the locally compact, $\sigma$-compact metric space $X$. Then for each $U \in \mathcal{O}$ there is an open set $V_U$ so that $\overline{V_U} \subseteq U$ and so that $\tilde{\mathcal{O}} := \{ V_U : U \in \mathcal{O} \}$ is a locally finite open cover of $X$.

Proof. By Proposition 16.113, $\mathcal{O}$ is at most countable, so let $\{ U_n : n \in \mathbb{N} \} := \mathcal{O}$. (For finite covers $\mathcal{O}$, the construction below terminates in finitely many steps with the desired cover $\tilde{\mathcal{O}}$.) The set $C_1 := U_1 \setminus \bigcup_{n=2}^{\infty} U_n = X \setminus \bigcup_{n=2}^{\infty} U_n$ is closed and contained in $U_1$. By Lemma 16.99, there is an open set $V_1$ so that $C_1 \subseteq V_1 \subseteq \overline{V_1} \subseteq U_1$. Then $\mathcal{O}_1 := \{V_1\} \cup \{ U_n : n > 1 \}$ is an open cover of $X$ and $\overline{V_1} \subseteq U_1$. Once an open cover $\mathcal{O}_k = \{V_1, \ldots, V_k\} \cup \{ U_n : n > k \}$ has been constructed so that $\overline{V_j} \subseteq U_j$ holds for $j = 1, \ldots, k$, let $C_{k+1} := U_{k+1} \setminus \left( \bigcup_{j=1}^{k} V_j \cup \bigcup_{n=k+2}^{\infty} U_n \right)$.
Then $C_{k+1}$ is closed and contained in $U_{k+1}$. By Lemma 16.99, there is an open set $V_{k+1}$ so that $C_{k+1} \subseteq V_{k+1} \subseteq \overline{V_{k+1}} \subseteq U_{k+1}$. Let $\mathcal{O}_{k+1} := \{V_1, \ldots, V_k, V_{k+1}\} \cup \{ U_n : n > k+1 \}$. Then $\mathcal{O}_{k+1}$ is an open cover of $X$ and $\overline{V_j} \subseteq U_j$ for all $j = 1, \ldots, k+1$.

Now consider $\tilde{\mathcal{O}} := \{ V_j : j \in \mathbb{N} \}$. Because $\mathcal{O}$ is locally finite, for each $x \in X$ there is an $N \in \mathbb{N}$ so that $x \notin U_n$ for all $n \ge N$. But then, because $\mathcal{O}_N$ was an open cover of $X$, there must be a $j < N$ so that $x \in V_j$. Hence, $\tilde{\mathcal{O}}$ is a cover of $X$. By construction, for all $j \in \mathbb{N}$ the set $V_j$ is open and satisfies $\overline{V_j} \subseteq U_j$. Because $\mathcal{O}$ is locally finite, $\tilde{\mathcal{O}}$ must also be locally finite and thus the result is proved. ∎

Theorem 16.115 Let $\mathcal{O}$ be an open cover of the locally compact, $\sigma$-compact metric space $X$. Then there is a partition of unity subordinate to $\mathcal{O}$.

Proof. Let $\mathcal{U}$ be a locally finite open refinement of $\mathcal{O}$ as guaranteed by Theorem 16.112. Let $\tilde{\mathcal{U}}$ be a locally finite open cover so that for every $U \in \mathcal{U}$ there is a $V_U \in \tilde{\mathcal{U}}$ so that $\overline{V_U} \subseteq U$, as guaranteed by the Shrinking Lemma. For each $U \in \mathcal{U}$, let $W_U$ be an open set so that $\overline{V_U} \subseteq W_U \subseteq \overline{W_U} \subseteq U$ as provided by Lemma 16.99 and let $\psi_U : X \to [0,1]$ be a continuous function so that $\psi_U|_{\overline{V_U}} = 1$ and $\psi_U|_{X \setminus W_U} = 0$ as provided by Lemma 16.99.

Because $\mathcal{U}$ is locally finite, each $x \in X$ has a neighborhood $V$ so that for all $v \in V$ the equality $\psi_U(v) = 0$ holds for all but finitely many $U \in \mathcal{U}$. Because $\tilde{\mathcal{U}}$ is a cover of $X$, for at least one $U \in \mathcal{U}$ we have $\psi_U(x) \ne 0$. Hence, for all $x \in X$ the sum $\psi(x) := \sum_{U \in \mathcal{U}} \psi_U(x)$ is a positive real number. Moreover, for each $x \in X$ on a neighborhood $V$ of $x$ the function $\psi$ is the sum of finitely many continuous functions. Hence, $\psi$ is continuous on this neighborhood of $x$ and so $\psi$ is continuous at $x$. Because $x$ was arbitrary, $\psi$ is continuous on $X$.

For all $U \in \mathcal{U}$ define $\varphi_U := \frac{\psi_U}{\psi}$. For each $U \in \mathcal{U}$, we have $\{ x \in X : \varphi_U(x) \ne 0 \} = \{ x \in X : \psi_U(x) \ne 0 \}$. Then $\left\{ \{ x \in X : \varphi_U(x) \ne 0 \} \right\}_{U \in \mathcal{U}}$ is locally finite. Moreover, for all $x \in X$ we have

$$\sum_{U \in \mathcal{U}} \varphi_U(x) = \sum_{U \in \mathcal{U}} \frac{\psi_U(x)}{\psi(x)} = \frac{\psi(x)}{\psi(x)} = 1.$$

Hence, $\{\varphi_U\}_{U \in \mathcal{U}}$ is a partition of unity. For each $U$, we have $\mathrm{supp}(\varphi_U) \subseteq U \subseteq O$ for some $O \in \mathcal{O}$, so the partition of unity $\{\varphi_U\}_{U \in \mathcal{U}}$ is subordinate to $\mathcal{O}$. ∎

Exercises

16-104. Let $X$ be a metric space and let $A \subseteq X$ be a nonempty subset.
(a) Prove that $\mathrm{dist}(x, A) = 0$ iff there is a sequence $\{a_n\}_{n=1}^{\infty}$ in $A$ with $\lim_{n\to\infty} a_n = x$.
(b) Let $C \subseteq X$ be closed and nonempty. Prove that $\mathrm{dist}(x, C) = 0$ iff $x \in C$.

16-105. Let $a$, $b$ be distinct nonnegative numbers that are not both zero. Prove that we have $\frac{a}{a+b} \in [0,1]$, that $\frac{a}{a+b} = 0$ iff $a = 0$, and that $\frac{a}{a+b} = 1$ iff $b = 0$.

16-106. Let $X$ be a metric space and let $C \subseteq X$ be a nonempty compact subset. Prove that for all $x \in X$ there is a $c_x \in C$ so that $d(x, c_x) = \mathrm{dist}(x, C)$.

16-107. Let $X$ be a metric space. For any two nonempty subsets $A, B \subseteq X$, define the distance from $A$ to $B$ as $\mathrm{dist}(A, B) := \inf \{ \mathrm{dist}(a, B) : a \in A \}$.
(a) Prove that for all nonempty subsets $A, B \subseteq X$ we have $\mathrm{dist}(A, B) = \mathrm{dist}(B, A)$.
(b) Give an example of two closed, disjoint, nonempty sets $A$, $B$ such that $\mathrm{dist}(A, B) = 0$.
(c) Prove that the function in Lemma 16.99 need not be uniformly continuous, and hence in particular it need not be Lipschitz continuous.
(d) Prove that if $B$ and $C$ are not empty and $C$ is compact, then there is a $c \in C$ so that $\mathrm{dist}(C, B) = \mathrm{dist}(c, B)$.

16-108. Give an example of a metric space in which Lemma 16.104 fails. That is, give an example of a metric space in which no neighborhood of a compact set is compact.

16-109. Prove Proposition 16.113.

16-110. Let $X$ be a metric space and let $C_1, \ldots, C_n$ be compact subsets of $X$. Prove that $\bigcup_{j=1}^{n} C_j$ is compact.

16-111. Give an example of a bijective continuous linear function whose inverse is not continuous.

Chapter 17 Differentiation in Normed Spaces

To discuss differentiation, we need algebraic operations and a metric.
Normed spaces have the algebraic structure of a vector space and a metric induced by the norm (see Proposition 15.54). The vector space structure is typically discussed in linear algebra. To keep the text self-contained, we introduce the requisite concepts and ideas in this chapter. Moreover, we freely use metric concepts discussed in Chapter 16. The presentation will be coordinate-free and valid for arbitrary (including infinite) dimensions. Although derivatives are mainly used in finite dimensional spaces and computed through partial derivatives along coordinate axes, this abstraction does not make the proofs more complicated. Instead, the omission of coordinates allows us to focus on the conceptual core of differentiation. The relevant results for finite dimensional spaces are given as consequences of the general theory.

To understand differentiation in multidimensional spaces, we must first adjust our expectation of what a derivative should be. The derivative cannot be a number or a slope, because it is not clear in which direction this slope would go. Instead, differentiation is defined similar to Theorem 4.5, which says that the derivative at $x$ determines the unique straight line through $(x, f(x))$ for which the difference (at $z$) between the function and the line goes to zero faster than $|z - x|$. Geometrically, (hyper)planes will take the place of lines. Linear functions are the analytical tool used to define (hyper)planes.

We start our investigation with linear functions in Sections 17.1 and 17.2. Derivatives and partial derivatives are introduced in Sections 17.3 and 17.5. In between, Section 17.4 introduces the Mean Value Theorem, which is crucial for using derivatives to estimate differences. Section 17.6 introduces tensors, which are needed to represent higher derivatives in Section 17.7. We conclude in Section 17.8 with the Implicit Function Theorem, which provides important examples of manifolds.
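The idea that the difference between a function and its approximating (hyper)plane goes to zero faster than the distance to the base point can be observed numerically. The following Python sketch is an illustration only. The function `f`, the base point $(1,2)$, the direction of the increments and all names are assumptions made for the example, and the linear part is simply assembled from the partial derivatives of this particular `f`.

```python
import math


def f(x: float, y: float) -> float:
    """Sample function on R^2 (an assumption made for this illustration)."""
    return x * x * y + math.sin(y)


def linear_part(x: float, y: float, h: tuple) -> float:
    """Candidate linear function L[h] at (x, y), built from the partial
    derivatives 2xy and x^2 + cos(y) of the sample function f."""
    return 2.0 * x * y * h[0] + (x * x + math.cos(y)) * h[1]


def error_ratio(x: float, y: float, h: tuple) -> float:
    """|f(p + h) - f(p) - L[h]| / ||h||_2 for the base point p = (x, y)."""
    num = abs(f(x + h[0], y + h[1]) - f(x, y) - linear_part(x, y, h))
    return num / math.hypot(h[0], h[1])


# Ratios for increments h = (t, -t) with t = 10^-1, ..., 10^-5.
ratios = [error_ratio(1.0, 2.0, (10.0 ** -k, -(10.0 ** -k))) for k in range(1, 6)]
```

If the linear part is the right one, the ratios shrink as the increments shrink. Replacing `linear_part` with any other linear function would leave a ratio that does not tend to zero along some direction.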
Differentiation in Normed Spaces 342 17.1 Continuous Linear Functions The definition of a linear function is entirely algebraic. It simply states that the function is compatible with the vector space operations. Definition 17.1 Let X , Y be vector spaces. A function L : X -+ Y is called linear i f s E X we have L[ xl x2] = L [ x l ]+ L[x2]andfor all (Y E IR and x E X we have L [ ( Y x ]= a L [ x ] .A linearfunction is also sometimes called a linear operator. for all X I ,x2 + Notation 17.2 Derivatives will be functions that map points to linear functions. To distinguish the various evaluations, throughout the text we will enclose the argument of a linear function in square brackets rather than round parentheses. To avoid confusion with the square brackets which indicate that the elements of LP spaces are equivalence classes, we will henceforth omit the brackets around the elements of LP spaces, as is customary in analysis. 0 Differentiation and integration both lead to examples of linear functions. Example 17.3 1 1 1. Let ( M , C, p ) be a measure space, let 1 5 p , q 5 co with - - = 1 and let P 9 g E L q ( M , C , p ) . By Example 16.28, Zg[f ] := fg d p defines a continuous + function Zg : L p ( M , 0,p ) -+ is linear. s, IR and it is easy to see via Theorem 14.39 that Zg 2. Let 1 5 p < co and let X be the subspace of functions f E C' ( a , b ) n L p ( a , b) so that f' E L p ( a , b ) , too. When it is not mentioned explicitly that elements of L P ( a , b) are equivalence classes, it is customary to use intersections like C ' ( a , b ) n L p ( a , b ) rather than the more complicated notation from Theorem 16.86. We claim that the function D : X + LP(a, b ) defined by D [f ] := f' is linear, but not continuous. If two functions in X are equal almost everywhere, then they must be equal (see Exercise 8-5). That is, i f f , g E X g L p ( a , b ) and f = g a.e., then f = g, which implies that D is well-defined. 
(Just because we do not mention that the elements of $L^p$ are equivalence classes does not mean we do not need to pay attention to this fact when we define a function.) Linearity of $D$ follows easily from the corresponding linearity properties of the derivative (see Theorem 4.6). To see that $D$ is not continuous, we will only consider the case $a=0$ and $b=1$ here. The general case is left to the reader in Exercise 17-1. Because
\[
\lim_{n\to\infty}\|x^n\|_p=\lim_{n\to\infty}\left(\frac{1}{np+1}\right)^{1/p}=0
\quad\text{and}\quad
\lim_{n\to\infty}\|D[x^n]\|_p=\lim_{n\to\infty}\frac{n}{(np-p+1)^{1/p}}=
\begin{cases}\infty; & \text{if } p>1,\\ 1; & \text{if } p=1,\end{cases}
\]
the function $D$ is not continuous at $0$. □

It is a good rule of thumb that integration usually defines continuous linear functions and differentiation usually defines discontinuous linear functions. We have to be careful, though. Exercise 17-10 shows that integration need not always define a continuous linear function and Exercise 17-13 shows that differentiation can be continuous.

The multiplication of a matrix with column vectors is another fundamental example of a linear function. We will investigate the connection between the "abstract" concept of continuous linear functions and the more "concrete" idea of matrix multiplication in Section 17.2. Before then, we need to investigate continuity for linear functions.

In Example 17.3, we have only proved that $D$ is not continuous at the origin. Loosely speaking, for linear functions the behavior at the origin is duplicated at every point. In particular, Exercise 17-2 shows that $D$ is discontinuous at every point of $L^p(a,b)$. As a positive result, Theorem 17.4 below shows that for linear functions continuity at the origin is equivalent to continuity everywhere. Note how the proof uses linearity to reduce every situation to a configuration near the origin.

Theorem 17.4 Let $X,Y$ be normed spaces and let $L:X\to Y$ be a linear function. The following are equivalent:

1. $L$ is continuous on $X$.
2. $L$ is continuous at $0$.
3. $L$ is bounded on $\overline{B_1(0)}\subseteq X$.
4. There is a $c\in\mathbb{R}$ so that for all $x\in X$ we have $\|L[x]\|\le c\|x\|$.

Proof. The implication "1⇒2" is trivial.

For "2⇒3," let $\varepsilon>0$. Because $L$ is continuous at $0$, there is a $\delta>0$ so that for all $x\in X$ with $\|x\|=\|x-0\|\le\delta$ we have $\|L[x]\|<\varepsilon$. Therefore for all points $x\in X$ with $\|x\|=\|x-0\|\le 1$ we obtain
\[
\|L[x]\|=\left\|L\left[\tfrac{1}{\delta}\,\delta x\right]\right\|=\frac{1}{\delta}\,\|L[\delta x]\|\le\frac{\varepsilon}{\delta}.
\]
Hence, $L$ is bounded on $\overline{B_1(0)}$.

For "3⇒4," let $c\in\mathbb{R}$ be such that for all $x\in\overline{B_1(0)}$ we have $\|L[x]\|\le c$. Then for all $x\in X\setminus\{0\}$ we infer
\[
\|L[x]\|=\left\|\,\|x\|\,L\left[\frac{x}{\|x\|}\right]\right\|=\|x\|\left\|L\left[\frac{x}{\|x\|}\right]\right\|\le c\|x\|,
\]
and the inequality is trivial for $x=0$.

For "4⇒1," let $c>0$ be such that for all $x\in X$ we have $\|L[x]\|\le c\|x\|$. Let $\varepsilon>0$ and set $\delta:=\frac{\varepsilon}{c}$. Then for all $x,y\in X$ with $\|x-y\|<\delta$ we obtain
\[
\|L[x]-L[y]\|=\|L[x-y]\|\le c\|x-y\|<c\,\frac{\varepsilon}{c}=\varepsilon.
\]
∎

In particular, Theorem 17.4 shows that every continuous linear function is Lipschitz continuous.

Definition 17.5 Because of part 3 of Theorem 17.4, continuous linear functions are also called bounded linear functions.

It is now easy to show that any linear function on a finite dimensional normed space is continuous.

Corollary 17.6 Let $X$ be a finite dimensional normed space, let $Y$ be a normed space and let $L:X\to Y$ be a linear function. Then $L$ is continuous.

Proof. Let $\{u_1,\dots,u_d\}$ be a base of $X$. Then for all $x\in X$ there are unique coefficients $c_1,\dots,c_d$ so that $x=\sum_{i=1}^{d}c_iu_i$. Denote the norm on $X$ by $\|\cdot\|$ and note that $\|x\|_\infty:=\max\{|c_i|:i=1,\dots,d\}$ is another norm on $X$. By Theorem 16.76, $\|\cdot\|_\infty$ and the original norm $\|\cdot\|$ of $X$ are equivalent. Let $r>0$ be such that for all $x\in X$ we have $\|x\|_\infty\le r\|x\|$ and let $c:=r\sum_{i=1}^{d}\|L[u_i]\|$. Then for all $x=\sum_{i=1}^{d}c_iu_i\in X$ we obtain
\[
\|L[x]\|=\left\|\sum_{i=1}^{d}c_iL[u_i]\right\|\le\sum_{i=1}^{d}|c_i|\,\|L[u_i]\|\le\|x\|_\infty\sum_{i=1}^{d}\|L[u_i]\|\le r\|x\|\sum_{i=1}^{d}\|L[u_i]\|=c\|x\|,
\]
which means that $L$ is continuous. ∎

Linear functions can be added pointwise and they can be multiplied with real numbers. Hence, the linear functions from one normed space to another form a vector space. Part 4 of Theorem 17.4 enables us to define a norm on this space.
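The constant from the proof of Corollary 17.6 can be checked numerically. The following is only a sketch under stated assumptions: the map $L:\mathbb{R}^3\to\mathbb{R}^2$ is a made-up example given by the images of the standard base vectors, both spaces carry the maximum norm (so that $r=1$), and the helper names `apply` and `max_norm` are hypothetical.

```python
import random

def apply(columns, x):
    """Evaluate L[x] = sum_i x_i * L[e_i], where columns[i] holds L[e_i]."""
    m = len(columns[0])
    return [sum(x[i] * columns[i][k] for i in range(len(x))) for k in range(m)]

def max_norm(v):
    """Maximum norm of a vector given as a list of numbers."""
    return max(abs(t) for t in v)

# Hypothetical linear map L : R^3 -> R^2, described by L[e_1], L[e_2], L[e_3].
columns = [[1.0, -2.0], [0.5, 4.0], [-3.0, 1.0]]

# c := r * sum_i ||L[u_i]|| with r = 1, because X itself carries the max norm.
c = sum(max_norm(col) for col in columns)

# Sample random vectors and record the largest observed ratio ||L[x]|| / ||x||.
random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in range(3)]
    nx = max_norm(x)
    if nx > 0:
        worst_ratio = max(worst_ratio, max_norm(apply(columns, x)) / nx)

print(c, worst_ratio)
```

Random sampling cannot prove the inequality of part 4 of Theorem 17.4; it only illustrates that no sampled vector violates $\|L[x]\|\le c\|x\|$ and that the constant $c$ from the proof is usually not the smallest possible one.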
Definition 17.7 Let $X,Y$ be normed spaces. We define $\mathcal{L}(X,Y)$ to be the set of all continuous linear functions from $X$ to $Y$.

Theorem 17.8 Let $X,Y$ be normed spaces. Then, with pointwise addition and scalar multiplication, $\mathcal{L}(X,Y)$ is a vector space. Moreover, the function
\[
\|L\|:=\min\{c\ge 0:(\forall x\in X:\|L[x]\|\le c\|x\|)\}
\]
defines a norm on $\mathcal{L}(X,Y)$.

Proof. The proof that $\mathcal{L}(X,Y)$ is a vector space is left to Exercise 17-3a. To prove that $\mathcal{L}(X,Y)$ is a normed space, we start by defining for all $L\in\mathcal{L}(X,Y)$ the quantity $\|L\|:=\inf\{c\ge 0:(\forall x\in X:\|L[x]\|\le c\|x\|)\}$. This is necessary, because we do not know a priori that the infimum is assumed, as is implicitly claimed in the definition of the norm on $\mathcal{L}(X,Y)$ in the statement of the theorem.

To prove that the $\|L\|$ defined above is a norm, first note that if $L:X\to Y$ is linear and bounded, then $\|L\|=0$ iff $\inf\{c\ge 0:(\forall x\in X:\|L[x]\|\le c\|x\|)\}=0$ iff for all $x\in X$ we have $L[x]=0$ iff $L=0$. Moreover, if $L\in\mathcal{L}(X,Y)$ and $\alpha\in\mathbb{R}$, then
\[
\|\alpha L\|=\inf\{c\ge 0:(\forall x\in X:\|\alpha L[x]\|\le c\|x\|)\}
=|\alpha|\inf\{c\ge 0:(\forall x\in X:\|L[x]\|\le c\|x\|)\}
=|\alpha|\,\|L\|.
\]
For the triangular inequality, let $L,M\in\mathcal{L}(X,Y)$. Then
\[
\begin{aligned}
\|L+M\| &= \inf\{c\ge 0:(\forall x\in X:\|L[x]+M[x]\|\le c\|x\|)\}\\
&\le \inf\{c\ge 0:(\forall x\in X:\|L[x]\|+\|M[x]\|\le c\|x\|)\}\\
&\le \inf\{a\ge 0:(\forall x\in X:\|L[x]\|\le a\|x\|)\}+\inf\{b\ge 0:(\forall x\in X:\|M[x]\|\le b\|x\|)\}\\
&= \|L\|+\|M\|.
\end{aligned}
\]
By Exercise 17-3b, we have $\|L[x]\|\le\|L\|\,\|x\|$ for all $x\in X$. In particular, the infimum that is used to define the norm is actually a minimum, as claimed. ∎

Unless otherwise indicated, throughout this text the norm on any space of continuous linear functions will be assumed to be the norm from Theorem 17.8.

Definition 17.9 The norm from Theorem 17.8 is also called the operator norm of the continuous linear function $L$.

Another way to represent the norm of a continuous linear function is the following.

Proposition 17.10 Let $X,Y$ be normed spaces and let $L:X\to Y$ be a continuous linear function. Then $\|L\|=\sup\{\|L[x]\|:x\in\overline{B_1(0)}\}$.

Proof. Exercise 17-4. ∎

The completeness of the spaces $\mathcal{L}(X,Y)$ solely depends on the completeness of the image space $Y$.

Theorem 17.11 Let $X$ be a normed space and let $Y$ be a Banach space. Then $\mathcal{L}(X,Y)$ is a Banach space.

Proof. Let $\{L_n\}_{n=1}^{\infty}$ be a Cauchy sequence in $\mathcal{L}(X,Y)$. We first claim that for all $x\in X$ the sequence $\{L_n[x]\}_{n=1}^{\infty}$ is a Cauchy sequence. To prove this claim, let $x\in X$ and let $\varepsilon>0$. Find an $N\in\mathbb{N}$ so that for all $m,n\ge N$ we have the inequality $\|L_m-L_n\|<\frac{\varepsilon}{\|x\|+1}$. Then for all $m,n\ge N$ we infer
\[
\|L_m[x]-L_n[x]\|\le\|L_m-L_n\|\,\|x\|<\frac{\varepsilon}{\|x\|+1}\,\|x\|<\varepsilon,
\]
and $\{L_n[x]\}_{n=1}^{\infty}$ is a Cauchy sequence.

Because $Y$ is complete, for all $x\in X$ we can define $L[x]:=\lim_{n\to\infty}L_n[x]$. It is easy to prove that $L$ is linear (see Exercise 17-5). Because the reverse triangular inequality $\bigl|\,\|L_m\|-\|L_n\|\,\bigr|\le\|L_m-L_n\|$ holds for all $m,n\in\mathbb{N}$ we obtain that $\{\|L_n\|\}_{n=1}^{\infty}$ is a Cauchy sequence, and hence $\lim_{n\to\infty}\|L_n\|$ exists. Now for all $x\in X$ the inequality $\|L[x]\|=\lim_{n\to\infty}\|L_n[x]\|\le\left(\lim_{n\to\infty}\|L_n\|\right)\|x\|$ holds, and hence $L\in\mathcal{L}(X,Y)$.

To prove that $L$ is the limit of $\{L_n\}_{n=1}^{\infty}$ in $\mathcal{L}(X,Y)$ let $\varepsilon>0$. Let $N\in\mathbb{N}$ be such that for all $m,n\ge N$ we have $\|L_m-L_n\|<\varepsilon$. Then for all $n\ge N$ we obtain
\[
\|L-L_n\|=\sup\left\{\lim_{m\to\infty}\|L_m[x]-L_n[x]\|:x\in\overline{B_1(0)}\right\}
\le\sup\left\{\limsup_{m\to\infty}\|L_m-L_n\|\,\|x\|:x\in\overline{B_1(0)}\right\}\le\varepsilon.
\]
Thus for all $n\ge N$ we have $\|L-L_n\|\le\varepsilon$ and $\{L_n\}_{n=1}^{\infty}$ converges to $L$. ∎

For our investigation of derivatives, it is important to realize that there is a simple bijective correspondence between the elements of a normed space $Y$ and the linear functions from $\mathbb{R}$ to $Y$.

Theorem 17.12 Let $Y$ be a normed space. For $f\in Y$ define $L_f:\mathbb{R}\to Y$ by $L_f[x]:=xf$. Then the function $I[f]:=L_f$ defines an isometric isomorphism from $Y$ to $\mathcal{L}(\mathbb{R},Y)$.

Proof. Clearly, for all $f\in Y$ the function $L_f$ is linear and $\|L_f\|=\|f\|$. Linearity of $I$ is trivial. Because $\|L_f\|=\|f\|$, $I$ is an isometry, and hence it is injective.
To prove that $I$ is surjective, let $L\in\mathcal{L}(\mathbb{R},Y)$. Set $f:=L[1]$. Then for all $x\in\mathbb{R}$ we have $L[x]=L[x\cdot 1]=xL[1]=xf$, which means that $L=L_f$ and $I$ is surjective. ∎

We conclude this section by noting that for continuous linear functions we could always assume that the domain is a Banach space. Recall that by Theorem 16.89 and Exercise 16-94 every normed space can be densely embedded into a Banach space. Theorem 17.13 shows that any continuous linear function $L:X\to Y$ can be extended to a unique continuous linear function from the completion of $X$ to the completion of $Y$. This means that as long as we work with continuous linear functions, which we do exclusively in this chapter, it would be no loss of generality to assume (as is often done) that domain and range are Banach spaces.

Theorem 17.13 Let $X$ be a normed space, let $Y$ be a Banach space, let $D\subseteq X$ be a dense linear subspace of $X$ and let $L:D\to Y$ be a continuous linear function. Then there is a unique continuous linear function $M:X\to Y$ such that $M|_D=L$.

Proof. Exercise 17-6. ∎

Exercises

17-1. Let $1\le p<\infty$, let $a<b$ and let $X:=\{f\in C^1(a,b)\cap L^p(a,b):f'\in L^p(a,b)\}$. Prove that $D:X\to L^p(a,b)$ defined by $D[f]:=[f']$ is not continuous.

17-2. Let $X,Y$ be normed spaces and let $L:X\to Y$ be linear. Prove that if $L$ is discontinuous at the origin, then it is discontinuous at every $x\in X$.

17-3. Finish the proof of Theorem 17.8. Let $X,Y$ be normed spaces.
  (a) Prove that $\mathcal{L}(X,Y)$ with pointwise addition and scalar multiplication is a vector space.
  (b) Let $L:X\to Y$ be a continuous linear function. Prove that $\|L[x]\|\le\|L\|\,\|x\|$ for all $x\in X$.

17-4. Prove Proposition 17.10.

17-5. Finish the proof of Theorem 17.11 by proving that the function $L[x]:=\lim_{n\to\infty}L_n[x]$ defined in the proof is linear.

17-6. Prove Theorem 17.13.

17-7. Let $X,Y$ be vector spaces and let $\mathcal{B}$ be a base of $X$.
  (a) Let $L,M:X\to Y$ be linear.
Prove that if $L(b)=M(b)$ for all $b\in\mathcal{B}$, then $L=M$.
  (b) Prove that if $f:\mathcal{B}\to Y$ is a function defined on the base $\mathcal{B}$, then there is a unique linear function $L_f:X\to Y$ so that $L_f|_{\mathcal{B}}=f$.

17-8. Let $1\le p\le\infty$, let $q$ be so that $\frac{1}{p}+\frac{1}{q}=1$ and let $\{a_j\}_{j=1}^{\infty}\in l^q$. Prove that the function $S_{\{a_j\}_{j=1}^{\infty}}\left(\{x_j\}_{j=1}^{\infty}\right):=\sum_{j=1}^{\infty}a_jx_j$ is a continuous linear operator from $l^p$ to $\mathbb{R}$.

17-9. Let $X,Y$ be normed spaces and let $L:X\to Y$ be linear. Prove that if $L$ is continuous at some $x\in X$, then $L$ is continuous on $X$.

17-10. Integration need not always define a continuous linear function. Let $X$ be the space of all continuous functions $f:\mathbb{R}\to\mathbb{R}$ so that $\{x:f(x)\neq 0\}$ is bounded.
  (a) Prove that $X$ is a normed subspace of $\left(C(\mathbb{R},\mathbb{R}),\|\cdot\|_\infty\right)$.
  (b) Prove that $L:X\to\mathbb{R}$ defined by $L[f]:=\int_{-\infty}^{\infty}f(x)\,dx$ is a linear function on $X$.
  (c) Prove that $L$ is not bounded on $B_1(0)$.
  (d) Let $f\in X$. Find a sequence $\{f_n\}_{n=1}^{\infty}$ of elements of $X$ that converges to $f$ and such that $\{L[f_n]\}_{n=1}^{\infty}$ does not converge to $L[f]$.
  (e) Prove that $X$ is not a Banach space.
  (f) Prove that $X$ is not dense in $\left(C(\mathbb{R},\mathbb{R}),\|\cdot\|_\infty\right)$.

17-11. Let $X$ be the space of polynomials of order at most 3 on the interval $(0,1)$. Prove that $D[p]:=p'$ is a continuous linear function from $X$ to $X$.

17-12. Let $X$, $Y$ and $Z$ be normed spaces and let $K:X\to Y$ and $L:Y\to Z$ be continuous linear functions.
  (a) Prove that $\|L\circ K\|\le\|L\|\,\|K\|$.
  (b) Prove that the inequality in part 17-12a can be strict.

17-13. Prove that if $C^1(a,b)$ is equipped with the norm $\|f\|:=\|f\|_\infty+\|f'\|_\infty$ (see Exercise 16-17), then $f\mapsto f'$ is a continuous mapping from $C^1(a,b)$ to $C^0(a,b)$.

17.2 Matrix Representation of Linear Functions

The coordinate-free introduction in Section 17.1 provides a concise description of linear functions.
Without specifics about a coordinate system, an abstract notion can usually be investigated more easily, because there are fewer details to keep track of. On the other hand, coordinates bridge the gap between abstract notions and concrete applications. Therefore, the connection between a concept and its coordinatized version should be investigated very carefully. This section shows that coordinatization of linear functions is done by carefully reinterpreting some natural coefficients.

A coordinate system in a vector space ultimately is nothing but a base, because any vector can be expressed as a unique linear combination of the base vectors (see Proposition 15.18). The coefficients in the base representation of the vector can be viewed as the coordinates. In this section, we investigate how a linear function between finite dimensional spaces maps the coordinates of its input vectors to the coordinates of its output vectors. Because all finite dimensional spaces are isomorphic to some $\mathbb{R}^d$ (see Proposition 15.25) we will work with spaces $\mathbb{R}^m$, $\mathbb{R}^n$ etc. throughout this section. The choice of a base in domain and range leads to the connection between linear functions and matrices.

Proposition 17.14 Let $m,n\in\mathbb{N}$, let $L:\mathbb{R}^n\to\mathbb{R}^m$ be linear, let $\{u_1,\dots,u_n\}$ be a base of $\mathbb{R}^n$ and let $\{w_1,\dots,w_m\}$ be a base of $\mathbb{R}^m$. For all $j=1,\dots,n$, let $a_{ij}$ with $i=1,\dots,m$ be such that $L[u_j]=\sum_{i=1}^{m}a_{ij}w_i$. Because $\{u_1,\dots,u_n\}$ is a base of $\mathbb{R}^n$, for all $x\in\mathbb{R}^n$, there are unique coefficients $c_1,\dots,c_n$ so that $x=\sum_{j=1}^{n}c_ju_j$. The image of $x$ under $L$ is
\[
L[x]=\sum_{i=1}^{m}\left(\sum_{j=1}^{n}a_{ij}c_j\right)w_i.
\]

Proof. Exercise 17-14. ∎

Proposition 17.14 shows that, once we fix bases in $\mathbb{R}^n$ and $\mathbb{R}^m$, for each linear function $L:\mathbb{R}^n\to\mathbb{R}^m$ there is a unique rectangular array of real numbers $a_{ij}$, with indices $i=1,\dots,m$, $j=1,\dots,n$ that can be used to represent $L$. Such rectangular arrays of numbers are called matrices. The set of matrices with the natural addition and scalar multiplication is a vector space.
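The coefficients $a_{ij}$ of Proposition 17.14 can be computed by hand in small cases. The sketch below does so for a made-up linear function on $\mathbb{R}^2$ with two made-up bases; the function `L`, the bases `U` and `W`, and the helper `coords` are assumptions chosen for illustration, and exact rational arithmetic is used so that the comparison at the end is not affected by rounding.

```python
from fractions import Fraction

def coords(base, v):
    """Coordinates of v in R^2 with respect to base = (b1, b2), via Cramer's rule."""
    (p, q), (r, s) = base
    det = p * s - q * r  # nonzero because base is a base of R^2
    return (Fraction(v[0] * s - v[1] * r, det),
            Fraction(p * v[1] - q * v[0], det))

def L(v):
    """Hypothetical linear function L(x, y) = (x + y, 2y)."""
    return (v[0] + v[1], 2 * v[1])

U = ((1, 1), (1, -1))  # base u_1, u_2 of the domain
W = ((1, 0), (1, 1))   # base w_1, w_2 of the range

# Column j of the matrix holds the W-coordinates of L[u_j].
cols = [coords(W, L(u)) for u in U]
A = [[cols[j][i] for j in range(2)] for i in range(2)]  # A[i][j] = a_ij

# Take x = 3 u_1 + 1 u_2 and compare both sides of Proposition 17.14.
c = (3, 1)
x = tuple(c[0] * U[0][k] + c[1] * U[1][k] for k in range(2))
lhs = coords(W, L(x))                                          # coordinates of L[x]
rhs = tuple(sum(A[i][j] * c[j] for j in range(2)) for i in range(2))  # sum_j a_ij c_j

print(A, lhs, rhs)
```

Changing either base changes the array `A` even though `L` stays the same, which is the dependence on the chosen bases that the text points out.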
Definition 17.15 Let $m,n\in\mathbb{N}$. A real $m\times n$-matrix with $m$ rows and $n$ columns is a function $A:\{1,\dots,m\}\times\{1,\dots,n\}\to\mathbb{R}$, denoted $A=(a_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}$. The index $i$ is called the row index and the index $j$ is called the column index. We define $M(m\times n,\mathbb{R})$ to be the set of all real $m\times n$-matrices.

Proposition 17.16 Let $m,n\in\mathbb{N}$, let $(a_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}$, $(b_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}$ be real $m\times n$ matrices and let $c\in\mathbb{R}$. With addition defined by
\[
(a_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}+(b_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}:=(a_{ij}+b_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}
\]
and with scalar multiplication defined by
\[
c\,(a_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}:=(c\,a_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}
\]
the set of matrices $M(m\times n,\mathbb{R})$ is a vector space.

Proof. Exercise 17-15. ∎

The coefficients from Proposition 17.14 immediately lead to an isomorphism between $\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$ and $M(m\times n,\mathbb{R})$, where both are considered as vector spaces. Note that the specific isomorphism will depend on which bases we choose in $\mathbb{R}^n$ and $\mathbb{R}^m$.

Theorem 17.17 Let $m,n\in\mathbb{N}$, and let $\{u_1,\dots,u_n\}$ and $\{w_1,\dots,w_m\}$ be bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. For each $L\in\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)$, let $A(L)=(a_{ij})_{\substack{i=1,\dots,m\\ j=1,\dots,n}}$ be the matrix with coefficients $a_{ij}$ provided by Proposition 17.14. Then the function $A:\mathcal{L}(\mathbb{R}^n,\mathbb{R}^m)\to M(m\times n,\mathbb{R})$ thus defined is a vector space isomorphism.

Proof. Exercise 17-16. ∎

Similar to Theorem 17.12, for $m=n=1$ Theorem 17.17 shows that linear functions from $\mathbb{R}$ to $\mathbb{R}$ are in bijective correspondence with numbers (considered as "$1\times 1$ matrices").

Composition of functions can be used to define a multiplication on $\mathcal{L}(\mathbb{R}^n,\mathbb{R}^n)$. Exercise 17-17 shows that this multiplication is compatible with addition and scalar multiplication. For $M(m\times n,\mathbb{R})$, we can define multiplication of matrices as follows.

Definition 17.18 Matrix multiplication. Let $m,n,p\in\mathbb{N}$.
For the real matrices $A=(a_{jk})_{\substack{j=1,\dots,m\\ k=1,\dots,n}}\in M(m\times n,\mathbb{R})$ and $B=(b_{ij})_{\substack{i=1,\dots,p\\ j=1,\dots,m}}\in M(p\times m,\mathbb{R})$, we define the product
\[
BA:=\left(\sum_{j=1}^{m}b_{ij}a_{jk}\right)_{\substack{i=1,\dots,p\\ k=1,\dots,n}}\in M(p\times n,\mathbb{R}).
\]

Theorems 17.20 and 17.21 below show the connection between matrix multiplication and the evaluation and composition of linear functions. For the remainder of this chapter, the base used in any space $\mathbb{R}^d$ is the standard base $\{e_1,\dots,e_d\}$.

Proposition 17.19 Let $m\in\mathbb{N}$. The function $V_m\left((a_{ij})_{\substack{i=1,\dots,m\\ j=1}}\right):=\sum_{i=1}^{m}a_{i1}e_i$ is an isomorphism from $M(m\times 1,\mathbb{R})$ to $\mathbb{R}^m$.

Proof. Exercise 17-18. ∎

[Figure 41: commutative diagram. One column shows $\mathbb{R}^n$, $\mathbb{R}^m$, $\mathbb{R}^p$ connected by $L$ and $H$; the other shows $M(n\times 1,\mathbb{R})$, $M(m\times 1,\mathbb{R})$, $M(p\times 1,\mathbb{R})$ connected by $A=A(L)$ and $B=A(H)$; the two columns are linked by the isomorphisms $V_n$, $V_m$, $V_p$.]

Figure 41: The connection between linear functions and their matrix representations. Note that instead of representing linear functions between spaces $\mathbb{R}^d$ with the standard basis, we could have also represented linear functions between arbitrary finite dimensional vector spaces using any base in each space.

The isomorphisms $V_m$ are the key to representing evaluations and compositions of linear functions as matrix multiplications and vice versa. It is customary to drop the second index for elements of $M(m\times 1,\mathbb{R})$ and we will do so in the following. The representation of elements of $\mathbb{R}^m$ as in Proposition 17.19 is also called the representation with column vectors. The corresponding representation with $1\times m$ matrices is called the representation with row vectors.

Theorem 17.20 Matrix multiplication and evaluation of linear functions. Let $m,n\in\mathbb{N}$, let $L:\mathbb{R}^n\to\mathbb{R}^m$ be a linear function and let $A:=A(L)\in M(m\times n,\mathbb{R})$ be the matrix obtained from Theorem 17.17, using the standard bases in $\mathbb{R}^n$ and $\mathbb{R}^m$. Then for all $v\in\mathbb{R}^n$ we have $L[v]=V_m\left[A\,V_n^{-1}[v]\right]$, where $A$ and $V_n^{-1}[v]$ are multiplied as matrices.

Proof. Exercise 17-19. ∎
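Definition 17.18 and the evaluation formula of Theorem 17.20 translate almost literally into code. The following sketch stores matrices as lists of rows; the matrices `A` and `B`, the vector `v`, and the helper names `matmul`, `to_column` and `from_column` are hypothetical choices for illustration, with `to_column` playing the role of $V_n^{-1}$ and `from_column` the role of $V_m$.

```python
def matmul(B, A):
    """Product BA as in Definition 17.18: entry (i, k) is sum_j B[i][j] * A[j][k]."""
    p, m, n = len(B), len(A), len(A[0])
    assert all(len(row) == m for row in B), "inner dimensions must agree"
    return [[sum(B[i][j] * A[j][k] for j in range(m)) for k in range(n)]
            for i in range(p)]

def to_column(v):
    """Vector in R^n -> n x 1 matrix (the inverse of V_n)."""
    return [[t] for t in v]

def from_column(M):
    """m x 1 matrix -> vector in R^m (the isomorphism V_m)."""
    return [row[0] for row in M]

A = [[1, 2, 0], [0, -1, 3]]    # hypothetical matrix of some L : R^3 -> R^2
B = [[2, 1], [0, 1], [1, -1]]  # hypothetical matrix of some H : R^2 -> R^3
v = [1, 1, 2]

# Evaluation as in Theorem 17.20: L[v] = V_m[A V_n^{-1}[v]].
Lv = from_column(matmul(A, to_column(v)))

# Applying H after L, once step by step and once through the product BA.
HLv = from_column(matmul(B, to_column(Lv)))
HLv_via_product = from_column(matmul(matmul(B, A), to_column(v)))

print(Lv, HLv, HLv_via_product)
```

That the last two results agree for this one vector is an instance of the associativity of matrix multiplication; it illustrates, but does not prove, the general statement.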
Exercise 17-20 shows that matrix multiplication is associative, which means we can write a product of three or more matrices without parentheses indicating which pairs of matrices are multiplied first.

Theorem 17.21 Matrix multiplication and composition of linear functions. Let $m,n,p\in\mathbb{N}$, let $L:\mathbb{R}^n\to\mathbb{R}^m$ and $H:\mathbb{R}^m\to\mathbb{R}^p$ be linear functions and let $A:=A(L)\in M(m\times n,\mathbb{R})$ and $B:=A(H)\in M(p\times m,\mathbb{R})$ be the matrices obtained from Theorem 17.17, using the standard bases in $\mathbb{R}^n$, $\mathbb{R}^m$, and $\mathbb{R}^p$. Then for all $v\in\mathbb{R}^n$ we have $H\circ L[v]=V_p\left[BA\,V_n^{-1}[v]\right]$, where $B$, $A$ and $V_n^{-1}[v]$ are multiplied as matrices.

Proof. Exercise 17-21. ∎

The connection between linear functions and the matrices that represent them is also expressed in Figure 41. Diagrams as in this figure are often called commutative diagrams, because the order in which the arrows are followed can be interchanged and the arrows representing isomorphisms can be reversed.

To further familiarize ourselves with the correspondence between matrix multiplication and composition of linear functions, let us cast the well-known Gauss-Jordan algorithm from linear algebra in the language of linear functions. We will focus the result on bijective linear functions because we need this reinterpretation for Lemma 18.35 in the proof of the Multidimensional Substitution Formula.

Definition 17.22 Elementary row operation functions. Let $d\in\mathbb{N}$ and let every vector $x\in\mathbb{R}^d$ be represented as $x=\sum_{i=1}^{d}x_ie_i=:(x_1,\dots,x_d)$.

1. $D:\mathbb{R}^d\to\mathbb{R}^d$ is called a diagonal operator iff there are $c_1,\dots,c_d\in\mathbb{R}$ so that $D(x_1,\dots,x_d)=(c_1x_1,\dots,c_dx_d)$.

2. $A:\mathbb{R}^d\to\mathbb{R}^d$ is called a row addition operator iff there are a number $a\in\mathbb{R}$ and distinct indices $i,j\in\{1,\dots,d\}$ so that for all $x=(x_1,\dots,x_d)\in\mathbb{R}^d$ we have $A(x_1,\dots,x_d)=(x_1,\dots,x_{j-1},x_j+ax_i,x_{j+1},\dots,x_d)$.

3. $T:\mathbb{R}^d\to\mathbb{R}^d$ is called a row transposition operator iff there are indices $i,j\in\{1,\dots,d\}$ with $i<j$ such that for all $x=(x_1,\dots,x_d)\in\mathbb{R}^d$ we have $T(x_1,\dots,x_d)=(x_1,\dots,x_{i-1},x_j,x_{i+1},\dots,x_{j-1},x_i,x_{j+1},\dots,x_d)$.

Theorem 17.23 Gauss-Jordan Algorithm. Let $d\in\mathbb{N}$. Every bijective linear function $L:\mathbb{R}^d\to\mathbb{R}^d$ is a composition of one diagonal operator so that all $c_j\neq 0$ with row addition and row transposition operators.

Proof. We will provide an outline here and leave the full proof to Exercise 17-22. The proof is an induction on the dimension $d$. The base step $d=1$ is trivial. For the induction step, let $A^0:=A(L)\in M(d\times d,\mathbb{R})$ be the matrix obtained from Theorem 17.17 using the standard base in $\mathbb{R}^d$. Because $L$ is bijective, there is a coefficient $a^0_{i1}$ that is not equal to zero (explain). This means the transposition of rows $1$ and $i$ produces a matrix $A^1$ with $a^1_{11}\neq 0$. Execute $d-1$ row additions to produce a matrix $A^2$ with $a^2_{11}\neq 0$ and $a^2_{i1}=0$ for $i=2,\dots,d$. Now consider the matrix $B$ obtained from $A^2$ by erasing the first row and the first column. This matrix is the image $B=A(L')$ of a bijective linear function $L':\mathbb{R}^{d-1}\to\mathbb{R}^{d-1}$ (explain). Therefore, by induction hypothesis (explain) there is a sequence of row additions and transpositions that turns $B$ into a diagonal matrix $C$. The corresponding row additions and transpositions turn $A^2$ into a matrix $A^3$ whose only nonzero entries are in the first row and on the diagonal (explain). Moreover, all of the entries on the diagonal are not zero (explain). Perform the appropriate row additions to obtain a matrix $A^4$ whose only nonzero entries are on the diagonal (explain).

For the above constructed sequence of row transpositions and additions, let the operators $M_1,\dots,M_n$ be the corresponding row transposition and row addition operators in the order in which the operations were performed. Then $N:=M_n\circ\cdots\circ M_1\circ L$ is a linear function so that $A(N)=A^4$ (explain; a commutative diagram may help). That is, $N$ is a diagonal operator and $L=M_1^{-1}\circ\cdots\circ M_n^{-1}\circ N$. Row transposition operators are their own inverses and the inverse of a row addition operator is another row addition operator (explain by stating the inverse). Thus we have proved the theorem. ∎

Note how the whole proof of Theorem 17.23 depends on the fluent translation between matrix operations and their interpretations as compositions of linear operators. Exercise 17-24 shows another application of this translation by assuring that the solution $x$ of a uniquely solvable system of equations $Ax=b$ depends continuously on the coefficients of $A$ and $b$.

Exercises

17-14. Prove Proposition 17.14.

17-15. Prove Proposition 17.16.

17-16. Prove Theorem 17.17.

17-17. Let $V,X,Y,Z$ be vector spaces, let $L:Y\to Z$, $M,N:X\to Y$ and $K:V\to X$ be linear and let $c\in\mathbb{R}$. Prove each of the following.
  (a) $L\circ(M+N)=L\circ M+L\circ N$
  (b) $(M+N)\circ K=M\circ K+N\circ K$
  (c) $(cL)\circ M=c(L\circ M)=L\circ(cM)$

17-18. Prove Proposition 17.19.

17-19. Prove Theorem 17.20.

17-20. Prove that matrix multiplication is associative. That is, prove that if $A\in M(m\times n,\mathbb{R})$, $B\in M(p\times m,\mathbb{R})$, and $C\in M(q\times p,\mathbb{R})$, then $C(BA)=(CB)A$.

17-21. Prove Theorem 17.21.

17-22. Prove Theorem 17.23.

17-23. Interpret the $m\times n$ matrix $A$ with entries $a_{ij}$ as a linear function from $\left(\mathbb{R}^n,\|\cdot\|_2\right)$ to $\left(\mathbb{R}^m,\|\cdot\|_2\right)$. Prove that the operator norm of $A$ satisfies $\|A\|\le\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}^2}$.

17-24. The continuous dependence of solutions $x$ of uniquely solvable systems of linear equations $Ax=b$ on the entries of the coefficient matrix $A$ and the right side $b$.
  (a) Let $X,Y$ be normed spaces and let $\mathcal{H}(X,Y)$ be the set of all invertible continuous linear functions from $X$ to $Y$ with continuous inverse ("linear homeomorphisms").
Prove that the function $J:\mathcal{H}(X,Y)\to\mathcal{H}(Y,X)$, which maps each $A\in\mathcal{H}(X,Y)$ to $A^{-1}\in\mathcal{H}(Y,X)$ is continuous on $\mathcal{H}(X,Y)$.
  Hint. Prove that for all $A,B\in\mathcal{H}(X,Y)$ that are close enough together, the inequality
  \[
  \left\|A^{-1}-B^{-1}\right\|\le\frac{\left\|A^{-1}\right\|^2}{1-\left\|A^{-1}\right\|\,\|B-A\|}\,\|B-A\|
  \]
  holds and prove that this implies that $J$ is continuous at $A$.
  (b) Prove that if $A$ is an invertible $d\times d$ matrix and $b\in\mathbb{R}^d$, then $x=J(A)b$ is the unique solution of the system of equations $Ax=b$.
  (c) Prove that the map $S:\mathcal{H}\left(\mathbb{R}^d,\mathbb{R}^d\right)\times\mathbb{R}^d\to\mathbb{R}^d$ defined by $S(A,b):=J(A)b$ is continuous.
  (d) Suppose an industrial process allows the measurement of the coefficient matrix $A$ and of the right hand side $b$ of a linear system of equations $Ax=b$. Moreover, suppose that the process requires the computation of the solution $x$. Explain why (unavoidable) sufficiently small measurement errors are not likely to have a large effect on the computed solution $x$.

[Figure 42: two panels, "3d view" and "side view".]

Figure 42: Differentiation is approximation with linear functions. The graph of a linear function from $\mathbb{R}^2$ to $\mathbb{R}$ is a plane. The figure shows that the difference between the graph of a differentiable function and the graph of an appropriately shifted linear function fits into arbitrarily small "cones." In the side view, the plane is seen "edge-on" and we only indicate the sides of the cone with dotted lines.

17.3 Differentiability

Although it may be geometrically intuitive, defining the derivative of a multivariable function in terms of partial derivatives can become a notational nightmare. Just consider the indices and notations in the matrix representation of a linear function in Section 17.2 and imagine them as part of a more complex definition. To circumvent this level of detail, which might obscure the forest for the trees, we introduce derivatives with a coordinate-free definition.
Aside from simpler notation, we avoid the pathology presented in Exercises 17-60 and 17-61 and we gain conceptual insights into a theory that is not bound to finite dimensional spaces. Indeed, all results in this section hold for infinite dimensional spaces, too. Interestingly enough, restriction to finite dimensional spaces would not simplify the proofs. This is similar to Sections 14.2-14.4, where proofs originally designed for functions of one real variable translated verbatim to the setting of measure spaces. In this chapter, we state the proofs in the abstract setting and provide results for $d$-dimensional space as corollaries.

By staying with this level of generality, we will ultimately produce some rather elegant proofs of important results. For examples, consider the proof of Leibniz' Rule in Exercise 17-58, as well as the ends of Sections 17.7 and 17.8. In each case, important results for the familiar $d$-dimensional setting are obtained as corollaries of coordinate-free results on differentiation.

The key to differentiation in higher dimensional spaces lies in Figure 9 and its analytical formulation in Theorem 4.5. A function should be differentiable iff it can be approximated very closely by a shifted linear function. As in Figure 9, the idea in Definition 17.24 below is that near the point where the derivative is taken, for any multidimensional analogue of a cone, both the function and the approximating linear entity should ultimately be in the same "cone" (see Figure 42). Similar to the single variable setting, the natural domains for differentiable functions are open sets.

Definition 17.24 Let $X,Y$ be normed spaces, let $\Omega\subseteq X$ be open, let $f:\Omega\to Y$ be a function and let $x\in\Omega$. Then $f$ is called differentiable at $x$ iff there is a continuous linear function $L:X\to Y$ so that for all $\varepsilon>0$ there is a $\delta>0$ such that for all $z\in\Omega$ with $\|z-x\|<\delta$ we have
\[
\|f(z)-f(x)-L[z-x]\|\le\varepsilon\|z-x\|.
\]
In this case, we set $Df(x):=L$ and call it the derivative of $f$ at $x$.

By Exercise 17-25b, the derivative is unique, so we can speak of the derivative.

Notation 17.25 The argument in round parentheses behind a derivative $Df$ will denote the point at which the derivative is taken and the argument in square brackets will denote the place where the derivative (remember that it is a linear function) is evaluated. That is, for $f:\Omega\to Y$, $Df(x)[a]$ will denote the derivative of $f$ taken at $x\in\Omega$ and evaluated at $a\in X$. □

The function $A[z]:=f(x)+Df(x)[z-x]$ is also often called the linear approximation of $f$ at $x$. The name should be clear. The function $A$ is "linear" (affine linear to be precise, but that distinction is not always made) and it approximates the function $f$ (see Figure 42). Exercise 17-25 investigates the quality of the approximation and it also gives a geometric interpretation for functions into the real numbers.

Theorem 17.12 expresses the connection between linear functions $L:\mathbb{R}\to Y$ and vectors and Theorem 17.17 expresses (among other things) the connection between linear functions $L:\mathbb{R}\to\mathbb{R}$ and numbers. These connections are the reason why derivatives of functions defined on intervals $(a,b)$ are usually considered to be numbers or (tangent) vectors. The formal justification is the following.

Proposition 17.26 Let $Y$ be a normed space and let $a<b$ be real numbers. The function $f:(a,b)\to Y$ is differentiable at $x$ in the sense of Definition 17.24 iff the limit $f'(x):=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$ exists. In this case, we call $f'(x)$ the velocity vector¹ and we have $Df(x)[a]=f'(x)a$ for all $a\in\mathbb{R}$.

Proof. Mimic the proof of Theorem 4.5. (Exercise 17-26.) ∎
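The inequality in Definition 17.24 can be explored numerically. The sketch below is an illustration under stated assumptions: the function $f(x,y)=x^2+3xy$ on $\mathbb{R}^2$, the point $(1,2)$ and the single approach direction are made up, and the candidate derivative $(a,b)\mapsto(2x+3y)a+3xb$ is simply asserted rather than derived from the text. The code tabulates the quotient $\|f(z)-f(x)-L[z-x]\|/\|z-x\|$ for points $z$ approaching $x$.

```python
import math

def f(x, y):
    """Hypothetical example function f : R^2 -> R."""
    return x * x + 3.0 * x * y

def Df(x, y):
    """Candidate derivative at (x, y): the linear map (a, b) -> (2x + 3y) a + 3x b."""
    return lambda a, b: (2.0 * x + 3.0 * y) * a + 3.0 * x * b

x0, y0 = 1.0, 2.0
L = Df(x0, y0)
direction = (0.6, 0.8)  # a vector of Euclidean norm 1

ratios = []
for k in range(1, 5):
    t = 10.0 ** (-k)
    a, b = t * direction[0], t * direction[1]
    remainder = f(x0 + a, y0 + b) - f(x0, y0) - L(a, b)
    ratios.append(abs(remainder) / math.hypot(a, b))

for k, r in enumerate(ratios, start=1):
    print(f"|z - x| = 1e-{k}: quotient = {r:.3e}")
```

For this particular $f$ the remainder equals $a^2+3ab$ exactly, so the quotient shrinks in proportion to $\|z-x\|$. Testing one direction is of course weaker than the definition, which quantifies over all $z$ in a ball around $x$.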
Continuous linear functions are the simplest example of differentiable functions that do not follow the pattern of Proposition 17.26.

Proposition 17.27 Let $X,Y$ be normed spaces and let $L:X\to Y$ be continuous and linear. Then $L$ is differentiable at every $x\in X$ with $DL(x)[a]=L[a]$.

Proof. Exercise 17-27. ∎

¹The name comes from the fact that if $f:(a,b)\to\mathbb{R}^3$ gives the position of a particle at time $t$, then $f'(t)$ is the velocity of the particle.

Note that the derivative of a continuous linear function actually is a constant function (whose value at every point is $L[\cdot]$)! This is similar to the derivative of $f(x)=cx$ being $f'(x)=c$. More examples of differentiable functions will be encountered throughout this chapter. For the rest of this section, we will work with derivatives in their full generality. First note that differentiability still implies continuity.

Theorem 17.28 Let $X,Y$ be normed spaces, let $\Omega\subseteq X$ be an open set, and let the function $f:\Omega\to Y$ be differentiable at $x\in\Omega$. Then $f$ is continuous at $x$.

Proof. Exercise 17-28. ∎

We conclude this section with differentiation rules.

Theorem 17.29 Let $X,Y$ be normed spaces, let $\Omega\subseteq X$ be open, let the functions $f,g:\Omega\to Y$ be differentiable at $x\in\Omega$ and let $\alpha\in\mathbb{R}$. Then the sum $f+g$ is differentiable at $x$ with $D(f+g)(x)=Df(x)+Dg(x)$ and the scalar multiple $\alpha f$ is differentiable at $x$ with $D(\alpha f)(x)=\alpha Df(x)$.

Proof. When the derivative is given, differentiability is proved directly with the definition. Consider $\alpha f$. For given $\varepsilon>0$, find $\delta>0$ so that for all $z\in\Omega$ with $\|z-x\|<\delta$ we have $\|f(z)-f(x)-Df(x)[z-x]\|\le\frac{\varepsilon}{|\alpha|+1}\|z-x\|$. Then for all $z\in\Omega$ with $\|z-x\|<\delta$ we infer
\[
\|(\alpha f)(z)-(\alpha f)(x)-\alpha Df(x)[z-x]\|=|\alpha|\,\|f(z)-f(x)-Df(x)[z-x]\|\le|\alpha|\frac{\varepsilon}{|\alpha|+1}\|z-x\|\le\varepsilon\|z-x\|,
\]
which means that $\alpha f$ is differentiable at $x$ and the derivative is $\alpha Df(x)$. The claim about the sum is proved similarly (Exercise 17-29). ∎
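The estimate for the sum can be organized in the same way as the one for the scalar multiple. The following is only one possible outline of Exercise 17-29, not necessarily the argument the text has in mind; the choice of $\varepsilon/2$ for both remainders is an assumption of this sketch.

```latex
% Sketch: given \varepsilon > 0, choose \delta > 0 so that for all
% z \in \Omega with \|z - x\| < \delta both remainders are at most
% (\varepsilon/2)\|z - x\|. Then the triangular inequality gives
\begin{align*}
&\bigl\| (f+g)(z) - (f+g)(x) - \bigl(Df(x) + Dg(x)\bigr)[z-x] \bigr\| \\
&\qquad\le \bigl\| f(z) - f(x) - Df(x)[z-x] \bigr\|
         + \bigl\| g(z) - g(x) - Dg(x)[z-x] \bigr\| \\
&\qquad\le \frac{\varepsilon}{2}\,\|z-x\| + \frac{\varepsilon}{2}\,\|z-x\|
         = \varepsilon\,\|z-x\| .
\end{align*}
```

A common $\delta$ exists because one may take the minimum of the two values of $\delta$ that Definition 17.24 provides for $f$ and for $g$, and $Df(x)+Dg(x)$ is again a continuous linear function by Theorem 17.8.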
The Chain Rule retains its familiar form from calculus, except that the multiplication is replaced by composition. This is natural, because the composition of two linear functions from $\mathbb{R}$ to $\mathbb{R}$ corresponds to the multiplication of the numbers that represent the functions (see Theorem 17.21).

Theorem 17.30 Chain Rule. Let $X, Y, Z$ be normed spaces, let $\Omega_1 \subseteq X$ and $\Omega_2 \subseteq Y$ be open, let $g : \Omega_1 \to \Omega_2$ be differentiable at $x$ and let $f : \Omega_2 \to Z$ be differentiable at $g(x)$. Then $f \circ g$ is differentiable at $x$ with derivative $D(f \circ g)(x) = Df(g(x)) \circ Dg(x)$.

Proof. Let $\varepsilon > 0$. Find $\delta_1 > 0$ so that for all $y \in Y$ with $\|y - g(x)\| < \delta_1$ we have [...] $= \varepsilon\,\|z - x\|$, which proves the Chain Rule. ∎

Similar to the Chain Rule, the rule for the differentiation of inverse functions retains its overall form. The reciprocal in Theorem 4.21 is replaced with the inversion of the derivative. This is once again natural, because if two linear functions from $\mathbb{R}$ to $\mathbb{R}$ are inverses of each other, then the numbers that represent the functions are reciprocals of each other. Unlike in Theorem 4.21 we must demand that the image of the domain of $f$ is open and that the inverse is continuous. This is because the argument at the beginning of the proof of Theorem 4.21 is not easily translated to the setting of normed spaces. Corollary 17.66 will show that the translation is possible, but it requires the Implicit Function Theorem.

Theorem 17.31 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$, $\tilde{\Omega} \subseteq Y$ be open and let $f : \Omega \to \tilde{\Omega}$ be a continuous bijective function with continuous inverse. If $f$ is differentiable at $x_0$ and $Df(x_0)$ is continuous, bijective and linear with continuous inverse, then $f^{-1}$ is differentiable at $y_0 := f(x_0)$ and $D(f^{-1})(y_0) = \left(Df(f^{-1}(y_0))\right)^{-1}$.

Proof. First note that there is a $\delta_1 > 0$ so that for all $x \in X$ with $\|x - x_0\| < \delta_1$ we have
$$\|f(x) - f(x_0) - Df(x_0)[x - x_0]\| \le \frac{1}{2\,\|(Df(x_0))^{-1}\|}\,\|x - x_0\|.$$
(Why is the denominator not zero?)
Hence, for all $x \in X$ with $\|x - x_0\| < \delta_1$ the following holds: [...] In particular, for all elements $x \in X$ with $\|x - x_0\| < \delta_1$ we have the inequality $\|x - x_0\| \le 2\,\|(Df(x_0))^{-1}\|\,\|f(x) - f(x_0)\|$. Now let $\varepsilon > 0$. Find $\delta_2 \in (0, \delta_1)$ so that for all $x \in X$ with $\|x - x_0\| < \delta_2$ we have [...] $= \varepsilon\,\|y - y_0\|$. ∎

The derivative of the inverse of a function must not be confused with the derivative of the function that maps an invertible linear function to its inverse, which is considered in the following. (Recall that in Exercise 17-24a we have already proved that inversion is a continuous operation. Theorem 17.32 provides another proof of this fact.) Note that for the theorem to make sense, we first must also prove that the linear homeomorphisms (invertible continuous linear functions with continuous inverse) form an open subset of the space $\mathcal{L}(X, Y)$.

Theorem 17.32 Let $X$ be a Banach space, let $Y$ be a normed space and let $\mathcal{H}(X, Y)$ be the set of linear homeomorphisms from $X$ to $Y$. Then $\mathcal{H}(X, Y)$ is an open subset of $\mathcal{L}(X, Y)$ and $J(A) := A^{-1}$ is a differentiable function from $\mathcal{H}(X, Y)$ to $\mathcal{H}(Y, X)$ whose derivative at $A \in \mathcal{H}(X, Y)$ is $DJ(A)[F] = -A^{-1} \circ F \circ A^{-1}$.

Proof. For $K \in \mathcal{L}(X, X)$ with $\|K\| < 1$, the series $\sum_{j=0}^{\infty} (-1)^j K^j$ converges absolutely. Because $X$ is a Banach space, the series converges. Let $I : X \to X$ denote the identity. It can be verified directly that $(I + K)^{-1} = \sum_{j=0}^{\infty} (-1)^j K^j$. In particular, this means that if $\|K\| < 1$, then $I + K$ is invertible with continuous inverse and its norm is bounded by
$$\|(I + K)^{-1}\| \le \frac{1}{1 - \|K\|}.$$
Now let $A \in \mathcal{H}(X, Y)$ and consider $F \in \mathcal{L}(X, Y)$ with $\|F\| < \frac{1}{\|A^{-1}\|}$. Then $\|A^{-1}F\| < 1$, which means that $A$ and $I + A^{-1}F$ are invertible with continuous inverse. Hence, $A + F = A \circ (I + A^{-1}F)$ is invertible with continuous inverse. Thus for all $A \in \mathcal{H}(X, Y)$ we infer that $B_{1/\|A^{-1}\|}(A) \subseteq \mathcal{H}(X, Y)$ and $\mathcal{H}(X, Y)$ is open in $\mathcal{L}(X, Y)$.
To prove the differentiability of $J$ at $A \in \mathcal{H}(X, Y)$ let $\varepsilon > 0$ and find a positive $\delta < \min\{[\ldots]\}$ so that for all $F \in \mathcal{L}(X, Y)$ the inequality $\|F\| < \delta$ implies that $I + A^{-1}F$ is invertible with continuous inverse and [...] Then for all functions $F \in \mathcal{L}(X, Y)$ with $\|F\| < \delta$ we conclude that $A + F$ is invertible with continuous inverse and [...] which proves the result. ∎

Exercises

17-25. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and let $f, g : \Omega \to Y$ be functions. We will say that $g$ is tangential to $f$ at $x \in \Omega$ iff $\lim_{z \to x} \frac{\|f(z) - g(z)\|}{\|z - x\|} = 0$. (Note that we can work with limits, because by Exercise 16-52 every point of $\Omega$ is an accumulation point.)
(a) Prove that the function $f : \Omega \to Y$ is differentiable at $x \in \Omega$ with derivative $L$ iff the function $g(z) := f(x) + L[z - x]$ is tangential to $f$ at $x$.
(b) Prove that if two continuous linear functions $L_1, L_2 : X \to Y$ are so that both shifted functions $f(x) + L_j[\,\cdot - x]$ are tangential to $f$ at $x$, then $L_1 = L_2$.
(c) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be differentiable at the point $(x_0, y_0) \in \mathbb{R}^2$. Explain why the function $P(x, y) := f(x_0, y_0) + Df(x_0, y_0)[(x - x_0, y - y_0)]$ is also called the tangent plane of $f$ at $(x_0, y_0)$. Hint. Represent $Df(x_0, y_0)$ as a matrix.
(d) Let $X$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $x \in \Omega$. Define the tangent hyperplane of $f : \Omega \to \mathbb{R}$ at $x$.

17-26. Prove Proposition 17.26.

17-27. Prove Proposition 17.27.

17-28. Prove Theorem 17.28. Hint. Add and subtract $Df(x)[z - x]$ inside $\|f(z) - f(x)\|$.

17-29. Finish the proof of Theorem 17.29. That is, let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f, g : \Omega \to Y$ be differentiable at $x \in \Omega$ and prove that $f + g$ is differentiable at $x$ with derivative $D(f + g)(x) = Df(x) + Dg(x)$.

17-30. Explain why we must use "less than or equal" in Definition 17.24 while we can use the strict inequality in Theorem 4.5.

17-31.
Let $\Omega \subseteq \mathbb{R}^n$ be open and let $x \in \Omega$. Explain why a function $f : \Omega \to \mathbb{R}^m$ is differentiable at $x \in \Omega$ iff there is a matrix $A \in M(m \times n, \mathbb{R})$ so that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have $\|f(z) - f(x) - A(z - x)\| \le \varepsilon\,\|z - x\|$, where $A(z - x)$ is the matrix product of $A$ with the representation of $z - x$ as a column vector with respect to the standard base.

17-32. Let $(M, \Sigma, \mu)$ be a measure space. Define $I : L^2(M, \Sigma, \mu) \to \mathbb{R}$ by $I(f) := \int_M f^2\, d\mu$. Prove that $I$ is differentiable with $DI(f)[u] = \int_M 2fu\, d\mu$.

17-33. Use Theorem 17.32 to prove that $\frac{d}{dx}\left(\frac{1}{x}\right) = -\frac{1}{x^2}$.

17-34. Compare the proofs of Theorems 4.10 and 17.30. Decide which is simpler and explain your decision.

17-35. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to \mathbb{R}$ be differentiable at $a \in \Omega$, let $y \in Y$ and let $(fy)(x) := f(x)y$ for all $x \in \Omega$. Prove that $D(fy)(a)[\cdot] = Df(a)[\cdot]\,y$.

17-36. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$ be continuous at the point $x \in \Omega$. Prove that if there is a linear function $L : X \to Y$ (we do not assume $L$ is continuous) so that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have $\|f(z) - f(x) - L[z - x]\| \le \varepsilon\,\|z - x\|$, then $f$ is differentiable at $x$.

17-37. Let $(X, \|\cdot\|_X)$, $(Y, \|\cdot\|_Y)$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$, let $x \in \Omega$, let $\|\cdot\|'_X$ be a norm on $X$ that is equivalent to $\|\cdot\|_X$ and let $\|\cdot\|'_Y$ be a norm on $Y$ that is equivalent to $\|\cdot\|_Y$. Prove that $f$ considered as a function from a subset of $(X, \|\cdot\|_X)$ to $(Y, \|\cdot\|_Y)$ is differentiable at $x$ iff $f$ considered as a function from a subset of $(X, \|\cdot\|'_X)$ to $(Y, \|\cdot\|'_Y)$ is differentiable at $x$.

17-38. Let $X$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $x \in \Omega$. Prove that if there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have that $f(z) \le f(x)$ (that is, there is a local maximum at $x$), then $Df(x) = 0$. Hint.
Suppose the contrary and use an $a \in X$ so that $Df(x)[a] \ne 0$.

17-39. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and for all $n \in \mathbb{N}$ let $f_n : \Omega \to Y$ be differentiable on $\Omega$ with continuous derivative $Df_n : \Omega \to \mathcal{L}(X, Y)$. Prove that if $(f_n)_{n=1}^{\infty}$ converges pointwise to a function $f : \Omega \to Y$ and $(Df_n)_{n=1}^{\infty}$ converges uniformly to a function $T : \Omega \to \mathcal{L}(X, Y)$, then $f$ is differentiable and $Df = T$. Hint. First define pointwise and uniform convergence.

17.4 The Mean Value Theorem

The Mean Value Theorem (Theorem 4.18) cannot be translated directly to higher dimensional settings. Exercise 17-40 shows that for functions $f : [a, b] \to X$, where $X$ is a normed space, the derivative (velocity vector) need not be parallel to $f(b) - f(a)$ at any $t \in (a, b)$. However, the Mean Value Theorem is mainly used to bound the difference $f(b) - f(a)$ by the product of $b - a$ with the supremum of the derivative. Such a result can be proved in a more general setting and we will call the general result "Mean Value Theorem," too. A natural idea for the proof is to first use the Fundamental Theorem of Calculus for an appropriately defined integral and to then use the triangular inequality (Exercises 17-41 and 17-42). This approach ultimately requires the continuity of the derivative or a technical integrability condition. Conditions of this kind can be avoided by working with compactness instead.

Theorem 17.33 Mean Value Theorem. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$ be differentiable and let $a, b \in \Omega$ be distinct points so that for all $t \in [0, 1]$ we have $a + t(b - a) \in \Omega$. Then
$$\|f(b) - f(a)\| \le \sup\{\|Df(a + t(b - a))\| : t \in [0, 1]\}\,\|b - a\|.$$

Proof. Consider $g(t) := f(a + t(b - a))$, defined on an interval $(-\nu, 1 + \nu)$ for some $\nu > 0$. By the Chain Rule, we obtain $g'(t) = \frac{d}{dt} g(t) = Df(a + t(b - a))[b - a]$. Let $\varepsilon > 0$.
For every $t \in [0, 1]$, there is a $\delta_t > 0$ so that for all $z \in X$ with $\|z\| < \delta_t\,\|b - a\|$ we have
$$\|f(a + t(b - a) + z) - f(a + t(b - a)) - Df(a + t(b - a))[z]\| \le \frac{\varepsilon}{\|b - a\|}\,\|z\|.$$
Therefore, with $c := \sup\{\|Df(a + t(b - a))\| : t \in [0, 1]\}$, for all $t, x \in [0, 1]$ with $|x - t| < \delta_t$ we infer
$$\begin{aligned}
\|g(x) - g(t)\| &\le \|g(x) - g(t) - g'(t)(x - t)\| + \|g'(t)(x - t)\| \\
&= \|f(a + x(b - a)) - f(a + t(b - a)) - Df(a + t(b - a))[(x - t)(b - a)]\| + \|Df(a + t(b - a))[(x - t)(b - a)]\| \\
&\le \frac{\varepsilon}{\|b - a\|}\,\|(x - t)(b - a)\| + \|Df(a + t(b - a))\|\,\|b - a\|\,|x - t| \\
&\le (c\,\|b - a\| + \varepsilon)\,|x - t|.
\end{aligned}$$
Now $\{(t - \delta_t, t + \delta_t) : t \in [0, 1]\}$ is an open cover of the compact set $[0, 1]$, which means there is a finite subcover $\{(t_i - \delta_{t_i}, t_i + \delta_{t_i}) : i = 1, \ldots, n\}$. Without loss of generality, we can assume that $t_1 < t_2 < \cdots < t_n$, $t_1 - \delta_{t_1} < 0$, $1 < t_n + \delta_{t_n}$, and $t_{i+1} - \delta_{t_{i+1}} < t_i + \delta_{t_i}$ for $i = 1, \ldots, n - 1$. Set $x_0 := 0$, $x_n := 1$ and for $i = 1, \ldots, n - 1$ let $x_i \in (t_{i+1} - \delta_{t_{i+1}}, t_i + \delta_{t_i}) \cap (t_i, t_{i+1})$. Then, using a telescoping sum,
$$\begin{aligned}
\|f(b) - f(a)\| &= \|g(1) - g(0)\| = \left\|\sum_{i=1}^{n} \big(g(x_i) - g(x_{i-1})\big)\right\| \\
&\le \sum_{i=1}^{n} \|g(x_i) - g(t_i)\| + \sum_{i=1}^{n} \|g(t_i) - g(x_{i-1})\| \\
&\le \sum_{i=1}^{n} (c\,\|b - a\| + \varepsilon)(x_i - t_i) + \sum_{i=1}^{n} (c\,\|b - a\| + \varepsilon)(t_i - x_{i-1}) \\
&= c\,\|b - a\| + \varepsilon.
\end{aligned}$$
Because $\varepsilon > 0$ was arbitrary, the theorem is proved. ∎

The requirement that there is a straight line connection between the points $a$ and $b$ feels limiting, but it cannot be dropped. In particular, mere connectedness of the domain is not enough. (See Exercise 17-62.)

Exercises

17-40. There is no direct translation of the Mean Value Theorem to vector valued functions. Let the function $f : [0, \pi] \to \mathbb{R}^3$ be defined by $f(t) := (\cos(t), \sin(t), t^2(\pi - t))$. Prove that there is no $c \in (0, \pi)$ so that $f(\pi) - f(0) = Df(c)[\pi - 0]$.

17-41. The Riemann integral for Banach space valued functions. Let $X$ be a Banach space and let $f : [a, b] \to X$ be a function.
(a) Define what it means for $f$ to be Riemann integrable.
(b) Prove that if $f$ is continuous, then $f$ is Riemann integrable. Hint. Prove that for a sufficiently fine partition $P$ any two Riemann sums $R(f, P, T_1)$ and $R(f, P, T_2)$ are close to each other. Then prove that for any refinement $Q$ the Riemann sums for $P$ and for $Q$ are close to each other. Prove that for $P_n$ being the equidistant partition with $n$ intervals, the Riemann sums converge. Then prove that all Riemann sums converge.
(c) Prove the Antiderivative Form of the Fundamental Theorem of Calculus. That is, prove that if $f : [a, b] \to X$ is differentiable and $f'$ is integrable, then $\int_a^b f'(x)\, dx = f(b) - f(a)$. Hint. Cover $[a, b]$ with intervals $(x - \delta_x, x + \delta_x)$ so that for all $z \in [a, b]$ with $|z - x| < \delta_x$ we have $\|f(z) - f(x) - f'(x)(z - x)\| \le \frac{\varepsilon}{b - a}\,|z - x|$. Take a finite subcover and construct a partition.
(d) Triangular inequality. Prove that $\left\|\int_a^b f(x)\, dx\right\| \le \int_a^b \|f(x)\|\, dx$ if both integrals exist.
(e) Prove the Derivative Form of the Fundamental Theorem of Calculus. That is, prove that if $f : [a, b] \to X$ is continuous, then $\frac{d}{dt} \int_a^t f(x)\, dx = f(t)$ for all $t \in [a, b]$.

17-42. Use Exercises 17-41c and 17-41d to prove that if $X, Y$ are normed spaces, $\Omega \subseteq X$ is an open subset, $f : \Omega \to Y$ is differentiable, $Df$ is continuous, and $a, b \in \Omega$ are so that for all $t \in [0, 1]$ we have $a + t(b - a) \in \Omega$, then $\|f(b) - f(a)\| \le \sup\{\|Df(a + t(b - a))\| : t \in [0, 1]\}\,\|b - a\|$. Then explain why we cannot avoid the continuity hypothesis in this approach.

17-43. Use Exercises 17-41c and 17-41d to prove that if $X, Y$ are normed spaces, $\Omega \subseteq X$ is an open subset, $f : \Omega \to Y$ is differentiable, $Df$ is continuous, and $a, b \in \Omega$ and $\gamma > 0$ are so that for all $t \in [0, 1]$ we have $a + t(b - a) \in \Omega$ and $\|Df(a + t(b - a)) - Df(a)\| \le \gamma t\,\|b - a\|$, then the inequality
$$\|f(b) - f(a) - Df(a)[b - a]\| \le \frac{\gamma}{2}\,\|b - a\|^2$$
holds. Hint. Mimic the proof of Lemma 13.13 and use Exercise 17-41c.

17-44.
Newton's method in several variables. Let $X$ be a Banach space, let $\Omega \subseteq X$ be open and let $f : \Omega \to X$ be a continuously differentiable function so that $Df(x) \in \mathcal{H}(X, X)$ for all $x \in \Omega$. Prove that if there are $x_0 \in \Omega$ and $\alpha, \beta, \gamma > 0$ so that $\|Df(x_0)^{-1}[f(x_0)]\| \le \alpha$, so that for all $x \in \Omega$ we have $\|Df(x)^{-1}\| \le \beta$, so that for all $x, z \in \Omega$ we have $\|Df(z) - Df(x)\| \le \gamma\,\|z - x\|$, so that $h := \frac{\alpha\beta\gamma}{2} < 1$ and so that with $r := \frac{\alpha}{1 - h}$ we have $B_r(x_0) \subseteq \Omega$, then
(a) Each recursively defined point $x_{n+1} := x_n - Df(x_n)^{-1}[f(x_n)]$ is in $B_r(x_0)$.
(b) The sequence $(x_n)_{n=0}^{\infty}$ converges to a point $u \in B_r(x_0)$ with $f(u) = 0$.
(c) For all $n \ge 0$ we have $\|u - x_n\| \le \alpha\,\frac{h^{2^n - 1}}{1 - h^{2^n}}$.
Hint. Mimic the proof of Theorem 13.14.

17.5 How Partial Derivatives Fit In

Partial derivatives are usually defined in the direction of a coordinate axis. Rather than tying our presentation to $\mathbb{R}^d$ and its coordinate axes, we consider a product $\prod_{i=1}^{d} X_i$ of normed spaces.

Proposition 17.34 For $i = 1, \ldots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. Then $\prod_{i=1}^{d} X_i$ with componentwise addition and scalar multiplication is a vector space and $\|(x_1, \ldots, x_d)\| := \left\|\left(\|x_1\|_1, \ldots, \|x_d\|_d\right)\right\|_{\mathbb{R}^d}$ defines a norm on it.

Proof. Exercise 17-45. ∎

Definition 17.35 For $i = 1, \ldots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. The norm $\|(x_1, \ldots, x_d)\| := \left\|\left(\|x_1\|_1, \ldots, \|x_d\|_d\right)\right\|_{\mathbb{R}^d}$ is called a product norm. From now on, we will assume that any product $\prod_{i=1}^{d} X_i$ of normed spaces is equipped with a product norm and we will call it a product space.

Because all norms on $\mathbb{R}^d$ are equivalent, all product norms are equivalent. Hence, unless otherwise stated, we will use $\max\{\|x_1\|_1, \ldots, \|x_d\|_d\}$ as the product norm.
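To make the notion of a product norm concrete, the following sketch compares three product norms on a small product space. It is an illustration under assumptions that are not in the text: each factor space is taken to be some $\mathbb{R}^{n_i}$ with the Euclidean norm, the outer norms on $\mathbb{R}^d$ are the maximum norm, the sum norm, and the Euclidean norm, and all function names are hypothetical.

```python
import math

def euclid(v):
    # Euclidean norm of a tuple of real numbers.
    return math.sqrt(sum(c * c for c in v))

def factor_norms(x):
    # x = (x_1, ..., x_d) with x_i in the factor space X_i (here R^{n_i}, an assumption);
    # returns the vector (||x_1||_1, ..., ||x_d||_d) in R^d.
    return [euclid(xi) for xi in x]

def product_norm_max(x):
    # Product norm built from the maximum norm on R^d (the default choice in the text).
    return max(factor_norms(x))

def product_norm_sum(x):
    # Product norm built from the sum norm on R^d.
    return sum(factor_norms(x))

def product_norm_euclid(x):
    # Product norm built from the Euclidean norm on R^d.
    return euclid(factor_norms(x))

samples = [
    ((3.0, 4.0), (1.0,), (0.0, -2.0, 2.0)),
    ((0.1, 0.0), (-7.0,), (1.0, 1.0, 1.0)),
]

for x in samples:
    d = len(x)
    m, e, s = product_norm_max(x), product_norm_euclid(x), product_norm_sum(x)
    # Equivalence constants for these three norms: max <= euclid <= sum <= d * max.
    print(f"max = {m:.4f}, euclid = {e:.4f}, sum = {s:.4f}, d * max = {d * m:.4f}")
```

The printed chain of inequalities is one instance of the equivalence of product norms; it does not replace the proof requested in Exercise 17-46.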
Exercise 17-46 shows that all product norms are equivalent, so we are free to interchange specific norms in $\mathbb{R}^d$ in the definition of the product norm we use. The definition of product norms implies that the product of Banach spaces is again a Banach space and that the natural projections are continuous.

Proposition 17.36 For $i = 1, \ldots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space. The product $\prod_{i=1}^{d} X_i$ is a Banach space iff all factor spaces $(X_i, \|\cdot\|_i)$ are Banach spaces.

Proof. Exercise 17-47. ∎

Figure 43: For partial derivatives, we restrict our attention to an appropriately translated subspace and take the derivative in that subspace.

Proposition 17.37 For $i = 1, \ldots, d$ let $(X_i, \|\cdot\|_i)$ be a normed space and let $\prod_{i=1}^{d} X_i$ be the product space. Then the natural projections $\pi_j : \prod_{i=1}^{d} X_i \to X_j$ defined by $\pi_j(x_1, \ldots, x_d) := x_j$ are continuous.

Proof. Exercise 17-48. ∎

Exercise 17-49 now shows that not every norm on a product of vector spaces is a product norm. Proposition 17.38 assures that as we consider partial derivatives with respect to a factor space, the domains of the requisite functions are open.

Proposition 17.38 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $\Omega \subseteq X$ be open, and let $a = (a_1, \ldots, a_d) \in \Omega$. Then for all $j \in \{1, \ldots, d\}$ the set $\Omega_j^a := \{x_j \in X_j : (a_1, \ldots, a_{j-1}, x_j, a_{j+1}, \ldots, a_d) \in \Omega\}$ is open in $(X_j, \|\cdot\|_j)$.

Proof. Exercise 17-50. ∎

Definition 17.39 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open, let $a = (a_1, \ldots, a_d) \in \Omega$, and let $j \in \{1, \ldots, d\}$. For $f : \Omega \to Y$, define $f_j^a : \Omega_j^a \to Y$ by $f_j^a(x_j) := f(a_1, \ldots, a_{j-1}, x_j, a_{j+1}, \ldots, a_d)$.
If $f_j^a$ is differentiable at $a_j$, then the derivative of $f_j^a$ at $a_j$ is denoted $D_j f(a)$ and it is called the partial derivative of $f$ with respect to $X_j$ at $a$ or the partial derivative of $f$ with respect to the $j$th variable at $a$. For a visualization, consider Figure 43.

It is easy to see that if $f$ is differentiable at $a \in \prod_{i=1}^{d} X_i$, then the restriction of its derivative to $X_j$ is the partial derivative with respect to $X_j$.

Proposition 17.40 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$. If $f$ is differentiable at $a \in \Omega$, then for all integers $j \in \{1, \ldots, d\}$ the partial derivative at $a$ with respect to $X_j$ exists and is equal to $D_j f(a) = Df(a)\big|_{\{0\} \times \cdots \times \{0\} \times X_j \times \{0\} \times \cdots \times \{0\}}$. Moreover, for all $u \in X$ we have
$$Df(a)[u_1, \ldots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j].$$

Proof. Exercise 17-51. ∎

Unfortunately, the existence of partial derivatives is not sufficient for differentiability (in fact, not even for continuity) as Exercises 17-60 and 17-61 show. The pathology exhibited in these exercises is why we build the theory around derivatives as in Definition 17.24 instead of partial derivatives. Nonetheless, it is possible to construct the derivative from partial derivatives, as long as the partial derivatives are continuous.

Theorem 17.41 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$. If for every $j \in \{1, \ldots, d\}$ the function $f$ is differentiable with respect to the $j$th variable at every $x \in \Omega$ and the function $x \mapsto D_j f(x)$ is continuous at $a \in \Omega$, then $f$ is differentiable at $a$ and the derivative of $f$ at $a$ is
$$Df(a)[u_1, \ldots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j].$$

Proof. Recall that we said we would use $\|x\| := \max\{\|x_i\|_i : i = 1, \ldots, d\}$ as the product norm. Let $\varepsilon > 0$. Find $\delta > 0$ so that for all $j = 1, \ldots, d$ and all $x \in B_\delta(a)$ we have $\|D_j f(x) - D_j f(a)\| < \frac{\varepsilon}{d}$. Then for all $(z_1, \ldots, z_d) \in B_\delta(a_1, \ldots, a_d)$ we obtain the following. [...] The above proves that $Df(a)[u_1, \ldots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j]$. ∎

With the general results established, we can now turn to the familiar partial derivatives on $\mathbb{R}^n$. Because $\mathbb{R}^n = \prod_{i=1}^{n} \mathbb{R}$ is the prototypical product space, all results proved so far apply to $\mathbb{R}^n$. Therefore we can concentrate on translating the abstract notions to the more concrete setting of $n$-dimensional space. Partial derivatives in $\mathbb{R}^n$ are typically defined in the direction of a coordinate vector.

Definition 17.42 Let the set $\Omega \subseteq \mathbb{R}^n$ be open, let $f : \Omega \to \mathbb{R}^m$ be a function and let $a \in \Omega$. Then the partial derivative of $f$ with respect to $x_j$ at $a$ is defined to be
$$\frac{\partial f}{\partial x_j}(a) := \lim_{h \to 0} \frac{f(a + h e_j) - f(a)}{h}$$
if this limit exists.

The connection between these partial derivatives and the derivative is an exercise in reinterpreting abstract concepts as matrices.

Theorem 17.43 Let $\Omega \subseteq \mathbb{R}^n$ be an open set, let $f : \Omega \to \mathbb{R}^m$ be differentiable at $a \in \Omega$, for $i = 1, \ldots, m$ let $f_i := \pi_i \circ f$ and let $u \in \mathbb{R}^n$ be so that $u = \sum_{j=1}^{n} u_j e_j$. Then
$$Df(a)[u] = \sum_{i=1}^{m} e_i \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(a)\, u_j.$$
That is, the matrix $\left(\frac{\partial f_i}{\partial x_j}(a)\right)_{i = 1, \ldots, m;\ j = 1, \ldots, n}$ represents $Df(a)$ with respect to the standard bases in $\mathbb{R}^n$ and $\mathbb{R}^m$. This matrix is also called the Jacobian matrix of $f$ at $a$. Conversely, if $f : \Omega \to \mathbb{R}^m$ is so that for all $i = 1, \ldots, m$ and $j = 1, \ldots, n$ and all $x \in \Omega$ the partial derivative $\frac{\partial f_i}{\partial x_j}(x)$ exists and if all these partial derivatives are continuous at $a$, then $f$ is differentiable at $a$.

Proof. For $j = 1, \ldots, n$, let $\mathbb{R}_j := \mathbb{R}$ so that $\mathbb{R}^n = \prod_{j=1}^{n} \mathbb{R}_j$. (In this fashion, we can distinguish the partial derivatives in the coordinate directions.) For $g : \Omega \to \mathbb{R}$ differentiable at $a$, we infer by Proposition 17.40 that $Dg(a)[u] = \sum_{j=1}^{n} D_{\mathbb{R}_j} g(a)[u_j]$. By Proposition 17.26 for all $j = 1, \ldots, n$ we obtain $D_{\mathbb{R}_j} g(a)[u_j] = \frac{\partial g}{\partial x_j}(a)\, u_j$, hence $Dg(a)[u] = \sum_{j=1}^{n} \frac{\partial g}{\partial x_j}(a)\, u_j$. Now for $f : \Omega \to \mathbb{R}^m$ by Exercise 17-35 we conclude [...] The last statement follows from Theorem 17.41. ∎

For $m = 1$, the Jacobian matrix of a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is given special attention because of its physical meaning.

Definition 17.44 Let $\Omega \subseteq \mathbb{R}^n$ be open and let $f : \Omega \to \mathbb{R}$ be so that all partial derivatives of $f$ exist at $a \in \Omega$. Then we define the gradient of $f$ at $a$ to be
$$\operatorname{grad}(f)(a) := \nabla f(a) := \begin{pmatrix} \frac{\partial f}{\partial x_1}(a) \\ \vdots \\ \frac{\partial f}{\partial x_n}(a) \end{pmatrix}.$$
With $\nabla := \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{pmatrix}$ this looks like a formal multiplication of $f$ with the "vector" $\nabla$. The (purely formal) "vector" $\nabla$ is also called the nabla operator.

The physical meaning of the gradient is easily explained with what we have derived so far. Let $f : \Omega \to \mathbb{R}$ be differentiable at $a$. Then by Theorem 17.43 for all $u \in \mathbb{R}^n$ we obtain $Df(a)[u] = \langle \nabla f(a), u \rangle$. Moreover, the derivative describes the local behavior of $f$ near $a$. In particular, for any vector $u$ with $\|u\| = 1$ we infer $\lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \langle \nabla f(a), u \rangle$ (Exercise 17-52). That is, the inner product $\langle \nabla f(a), u \rangle$ gives the derivative of $f$ in the direction $u$. This inner product is largest when $u = \frac{\nabla f(a)}{\|\nabla f(a)\|}$, so the gradient vector gives the direction of steepest ascent with its norm being the slope in that direction. Similarly, $-\nabla f(a)$ gives the direction of steepest descent with the negative of its norm being the slope in that direction. Physical systems usually strive for equilibrium. Whenever there is an imbalance in a physical quantity described by $f$ in a homogeneous medium, there will be a flow in the direction of $-\nabla f$ that tries to restore equilibrium. For more on the physical ideas, consider Section 21.2.

Exercises

17-45. Prove Proposition 17.34 as follows.
(a) For $i = 1, \ldots, d$, let $X_i$ be a vector space. Prove that $\prod_{i=1}^{d} X_i$ with componentwise addition and scalar multiplication is a vector space.
(b) For $i = 1, \ldots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. Prove that $\|(x_1, \ldots, x_d)\| := \left\|\left(\|x_1\|_1, \ldots, \|x_d\|_d\right)\right\|_{\mathbb{R}^d}$ defines a norm on $\prod_{i=1}^{d} X_i$.

17-46. Prove that any two product norms are equivalent. Note. This and Exercise 17-37 justify the free interchange of one product norm for another.

17-47. Prove Proposition 17.36.

17-48. Prove Proposition 17.37.

17-49. Not every norm on a product is a product norm.
(a) Let $X_1, \ldots, X_d$ be vector spaces and let $\|\cdot\|_X$ be a norm on $\prod_{j=1}^{d} X_j$.
i. Prove that for all $j = 1, \ldots, d$ the function $\|x_j\|_j := \|(0, \ldots, 0, x_j, 0, \ldots, 0)\|_X$ is a norm on $X_j$.
ii. Prove that if for each $i \in \{1, \ldots, d\}$ we pick a fixed $x_i \in X_i \setminus \{0\}$, then the function $\|(\alpha_1, \ldots, \alpha_d)\| := \|(\alpha_1 x_1, \ldots, \alpha_d x_d)\|_X$ defines a norm on $\mathbb{R}^d$.
(b) Let $(X, \|\cdot\|)$ be a normed space, let $S \subseteq X$ be a dense subspace (like the simple functions in $L^p$, see Theorem 16.85) and let $F$ be a subspace so that $S \cap F = \{0\}$ (with $S$ being the set of simple functions in $L^p$, $F$ could be the set of scalar multiples of a function $f \notin S$). Define $Y := S \times F$ and let $\|(s, f)\|_{np} := \|s + f\|$.
i. Prove that $\|\cdot\|_{np}$ is a norm on $S \times F$.
ii. Prove that the natural projection $\pi_S : S \times F \to S$ is not continuous. Hint. Let $s_n \to f$ and consider $s_n - f$.
iii. Conclude that $\|\cdot\|_{np}$ is not a product norm on $S \times F$.

17-50. Prove Proposition 17.38.

17-51. Prove Proposition 17.40.

17-52. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $a \in \Omega$. Prove that for any vector $u$ with $\|u\| = 1$ we have $\lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \langle \nabla f(a), u \rangle$.

17-53. Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$ be a function. Prove that if $f$ is differentiable on $\Omega$ and $Df$ is continuous, then all partial derivatives $D_j f$ exist for all $a \in \Omega$ and the functions $D_j f$ are continuous on $\Omega$.

17-54. Let $X$ be a vector space and let $E, F \subseteq X$ be vector subspaces of $X$.
Then $X$ is called the direct sum of $E$ and $F$, denoted $E \oplus F$, iff $E \cap F = \{0\}$ and for all $x \in X$ there are $e \in E$ and $f \in F$ so that $x = e + f$. Prove that if $X$ is the direct sum of $E$ and $F$, then $X = E \oplus F$ is isomorphic to $E \times F$.

17-55. Prove the Multivariable Chain Rule. That is, prove that if $f(x_1, \ldots, x_n) : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function of $n$ variables and the components are differentiable functions $x_j(t_1, \ldots, t_m)$ of $m$ variables, then
$$\frac{\partial f}{\partial t_i} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial t_i} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t_i} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial t_i}.$$

17-56. Coordinate transformations for differential operators. Let $(x, y)$ be rectangular coordinates on $\mathbb{R}^2$, let $r = \sqrt{x^2 + y^2}$ and let
$$\theta := \begin{cases} \arctan\left(\frac{y}{x}\right); & \text{if } x > 0, \\ \arctan\left(\frac{y}{x}\right) + \pi; & \text{if } x < 0, \end{cases}$$
be polar coordinates on $\mathbb{R}^2$.
(a) Prove that if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable, then $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r}\cos(\theta) - \frac{\partial f}{\partial \theta}\frac{\sin(\theta)}{r}$. Hint. Use the Chain Rule as stated in Exercise 17-55.
(b) Prove that if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable and $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial \theta}$ are differentiable, then
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial r^2}\cos^2(\theta) - 2\,\frac{\partial^2 f}{\partial r\,\partial \theta}\frac{\sin(\theta)\cos(\theta)}{r} + 2\,\frac{\partial f}{\partial \theta}\frac{\sin(\theta)\cos(\theta)}{r^2} + \frac{\partial f}{\partial r}\frac{\sin^2(\theta)}{r} + \frac{\partial^2 f}{\partial \theta^2}\frac{\sin^2(\theta)}{r^2}.$$
The second partial derivatives simply denote partial derivatives of partial derivatives.
(c) Let $f$ be a function whose partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial \theta}$ are differentiable. Derive an expression for $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$ in polar coordinates. Hint. Derive a formula similar to that in part 17-56b for $\frac{\partial^2 f}{\partial y^2}$ and add the two.

17-57. Let $W$ be a normed space, let $\prod_{i=1}^{d} X_i$ be a product space, let $\Omega \subseteq W$ be open and for $i = 1, \ldots, d$ let $f_i : \Omega \to X_i$ be differentiable at $x \in \Omega$. Prove that $f := (f_1, \ldots, f_d)$ is differentiable at $x$ with $Df(x)[\cdot] = \left(Df_1(x)[\cdot], \ldots, Df_d(x)[\cdot]\right)$.

17-58. Leibniz' Rule. Let $a, b : (c, d) \to (l, u)$ be differentiable and let $g : (l, u) \times (c, d) \to \mathbb{R}$ be differentiable with respect to the second variable. Let $F \in L^1(l, u)$ be so that all $g(\cdot, t)$, $\frac{g(\cdot, t + h) - g(\cdot, t)}{h}$ and $\frac{\partial g}{\partial t}(\cdot, t)$ are bounded by $F(\cdot)$ and let all $g(\cdot, t)$ be continuous. Use the steps outlined below to prove that
$$\frac{d}{dt} \int_{a(t)}^{b(t)} g(x, t)\, dx = g(b(t), t)\, b'(t) - g(a(t), t)\, a'(t) + \int_{a(t)}^{b(t)} \frac{\partial g}{\partial t}(x, t)\, dx.$$
(a) Prove that $u : (c, d) \to (l, u) \times (l, u) \times L^1(l, u)$ defined by $u(t) := \left(a(t), b(t), g(\cdot, t)\right)$ is differentiable. Hint. Use the result of Exercise 17-57. Use Proposition 17.26 and the Dominated Convergence Theorem for the differentiability of the third component.
(b) Prove that $s : (l, u) \times (l, u) \times C(l, u) \to \mathbb{R}$ defined by $s(a, b, h) := \int_a^b h(x)\, dx$ is differentiable, where $C(l, u)$ is a normed subspace of $L^1(l, u)$. Hint. Theorem 17.41. Use the linearity of the integral operator $h \mapsto \int_a^b h(x)\, dx$ on $L^1(l, u)$ for the partial derivative with respect to the third component.
(c) Prove Leibniz' Rule using the Chain Rule.

17-59. Let $X$ be a normed space, let $D \subseteq X$ be a dense subspace, let $Y$ be a Banach space, let $\Omega \subseteq D$ be open in $D$ and let $f : \Omega \to Y$ be continuously differentiable with bounded uniformly continuous derivative. Prove that there is an open set $\tilde{\Omega} \subseteq X$ and a unique continuously differentiable function $e : \tilde{\Omega} \to Y$ so that $\tilde{\Omega} \cap D = \Omega$ and so that $e|_{\Omega} = f$. Hint. Use the Mean Value Theorem (Theorem 17.33).

17-60. Consider the function
$$f(x, y) = \begin{cases} \frac{xy}{x^2 + y^2}; & \text{for } (x, y) \ne (0, 0), \\ 0; & \text{for } (x, y) = (0, 0). \end{cases}$$
(a) Prove that $\frac{\partial f}{\partial x}(0, 0) = \frac{\partial f}{\partial y}(0, 0) = 0$.
(b) Prove that $f$ is not continuous at $(0, 0)$.

17-61. Consider the function
$$f(x, y) = \begin{cases} \frac{x^2 y}{x^2 + y^2}; & \text{for } (x, y) \ne (0, 0), \\ 0; & \text{for } (x, y) = (0, 0). \end{cases}$$
(a) Prove that $f$ is continuous at $(0, 0)$.
(b) Let $X = \operatorname{span}(u) \subset \mathbb{R}^2$ be an arbitrary one dimensional linear subspace of $\mathbb{R}^2$ with $\|u\| = 1$. Prove that $D_X f(0, 0) := \lim_{t \to 0} \frac{f(tu) - f(0, 0)}{t}$ exists.
(c) Prove that $f$ is not differentiable at $(0, 0)$. Hint. If $f$ was differentiable at $(0, 0)$, what would the derivatives in part 17-61b be equal to? Use Theorem 17.43.

17-62.
An unbounded function with bounded derivative and bounded, connected two-dimensional domain. Let $A := \{(\theta, \Delta r) \in \mathbb{R}^2 : \theta > [\ldots],\ \Delta r \in [\ldots]\}$ and define $f : A \to \mathbb{R}^2$ by $f(\theta, \Delta r) := \left(([\ldots] + \Delta r)\cos(\theta),\ ([\ldots] + \Delta r)\sin(\theta)\right)$.
(a) Prove that $S := f[A]$ is open, connected and contained in $B_2(0, 0)$.
(b) Sketch $S$ and state the geometric meaning of $\theta$ and $\Delta r$.
(c) Prove that $f$ is injective.
(d) For $(x, y) \in S$, define $\theta(x, y)$ to be the first component of $f^{-1}(x, y)$. Prove that the function $(x, y) \mapsto \theta(x, y)$ is differentiable at every point of $S$ by showing that on every open disk contained in $S$ it differs from $\arctan$ [...]
(e) For $(x, y) \in S$, define $r(x, y) := \sqrt{x^2 + y^2}$ and prove that $r$ is differentiable on $S$.
(f) For $(x, y) \in S$ define $B(x, y) := \ln(\theta(x, y))$. Prove that $B$ is differentiable with bounded derivative. Hint. Prove that the absolute values of the partial derivatives of $\theta$ are equal to one of the expressions $\frac{|\cos(\theta(x, y))|}{r(x, y)}$ or $\frac{|\sin(\theta(x, y))|}{r(x, y)}$ and use that the radius [...]
(g) Prove that $\lim_{(x, y) \to (0, 0)} B(x, y) = \infty$.

17.6 Multilinear Functions (Tensors)

If $f : \Omega \to Y$ is differentiable at every $x \in \Omega$, then we can also try to differentiate the derivative $Df : \Omega \to \mathcal{L}(X, Y)$. If the thus computed second derivative exists, it would be a function that maps points $x \in \Omega$ to linear functions $D^2 f(x)[\cdot] \in \mathcal{L}(X, \mathcal{L}(X, Y))$ and these linear functions map points $u \in X$ to linear functions $D^2 f(x)[u][\cdot]$. It turns out that such functions are linear in both square bracketed arguments (see Proposition 17.53 below). To simplify notation, higher derivatives are usually identified with functions that have several arguments and are linear in each one of them.

Definition 17.45 Let $X = \prod_{i=1}^{k} X_i$ be a product space and let $Y$ be a normed space. Then $T : \prod_{i=1}^{k} X_i \to Y$ is called multilinear or $k$-linear or a $k$-tensor iff for all indices $j \in \{1, \ldots, k\}$, all $x, y \in X_j$ and all $\alpha, \beta \in \mathbb{R}$ we have
$$T[x_1, \ldots, x_{j-1}, \alpha x + \beta y, x_{j+1}, \ldots, x_k] = \alpha\, T[x_1, \ldots, x_{j-1}, x, x_{j+1}, \ldots, x_k] + \beta\, T[x_1, \ldots, x_{j-1}, y, x_{j+1}, \ldots, x_k].$$
2-linear functions are also called bilinear.

As for linear functions, we enclose the argument of multilinear functions in square brackets instead of round parentheses. This is because as multilinear functions are identified with higher derivatives we will evaluate higher derivatives at a point $x$ (enclosed in round parentheses) for an argument $[t_1, \ldots, t_m]$, which will be distinguished by being enclosed in square brackets.

Example 17.46 Examples of bilinear functions.
1. The function $m : \mathbb{R}^2 \to \mathbb{R}$ defined by $m[x, y] := xy$ is bilinear.
2. Let $X$ be a real vector space and let $\langle \cdot, \cdot \rangle$ be an inner product on $X$. Then $\langle \cdot, \cdot \rangle$ is bilinear. An inner product on a complex vector space $X$ is not bilinear, because for $x, y \in X$ and $\alpha \in \mathbb{C}$ we have $\langle x, \alpha y \rangle = \overline{\alpha}\,\langle x, y \rangle$. Complex inner products are also called sesquilinear.
3. The cross product
$$\begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} \times \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} := \begin{pmatrix} y_1 z_2 - z_1 y_2 \\ z_1 x_2 - x_1 z_2 \\ x_1 y_2 - y_1 x_2 \end{pmatrix}$$
is a bilinear function from $\mathbb{R}^3 \times \mathbb{R}^3$ to $\mathbb{R}^3$. □

Continuity of multilinear functions is characterized similar to continuity of linear functions.

Theorem 17.47 Let $X = \prod_{i=1}^{k} X_i$ be a product space, let $Y$ be a normed space and let $T : \prod_{i=1}^{k} X_i \to Y$ be $k$-linear. Then the following are equivalent:
1. $T$ is continuous on $X$.
2. $T$ is continuous at $0 \in X$.
3. $T$ is bounded on $\overline{B_1(0)} \subset X$.
4. There is a $c \in \mathbb{R}$ so that for all elements $(x_1, \ldots, x_k) \in X = \prod_{i=1}^{k} X_i$ we have that $\|T[x_1, \ldots, x_k]\| \le c\,\|x_1\| \cdots \|x_k\|$.

Proof. Mimic the proof of Theorem 17.4 (Exercise 17-63). ∎

Corollary 17.48 Let $X = \prod_{i=1}^{k} X_i$ be a finite dimensional product space, let $Y$ be a normed space and let $T : \prod_{i=1}^{k} X_i \to Y$ be $k$-linear. Then $T$ is continuous.

Proof. Mimic the proof of Corollary 17.6 (Exercise 17-64). ∎
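The cross product from Example 17.46 gives a concrete test case for Definition 17.45 and for condition 4 of Theorem 17.47. The sketch below is illustrative only: it samples random vectors (the seed, sample count, and helper names are arbitrary assumptions), measures how far the cross product is from being linear in each slot, and records the largest observed quotient $\|x \times y\| / (\|x\|\,\|y\|)$. A finite sample can support but not prove the bound with $c = 1$.

```python
import math
import random

def cross(x, y):
    # Cross product on R^3, the bilinear function from Example 17.46.
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def combo(alpha, x, beta, y):
    # Linear combination alpha * x + beta * y of two vectors.
    return tuple(alpha * a + beta * b for a, b in zip(x, y))

def norm(v):
    # Euclidean norm on R^3.
    return math.sqrt(sum(c * c for c in v))

def rand_vec(rng):
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(3))

rng = random.Random(0)  # fixed seed so the experiment is repeatable
max_linearity_gap = 0.0
max_bound_ratio = 0.0
for _ in range(1000):
    x, y, z = rand_vec(rng), rand_vec(rng), rand_vec(rng)
    alpha, beta = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
    # Linearity in the first argument.
    gap1 = norm(combo(1.0, cross(combo(alpha, x, beta, y), z),
                      -1.0, combo(alpha, cross(x, z), beta, cross(y, z))))
    # Linearity in the second argument.
    gap2 = norm(combo(1.0, cross(z, combo(alpha, x, beta, y)),
                      -1.0, combo(alpha, cross(z, x), beta, cross(z, y))))
    max_linearity_gap = max(max_linearity_gap, gap1, gap2)
    # Quotient from condition 4 of Theorem 17.47; it should not exceed c = 1.
    if norm(x) > 0.0 and norm(y) > 0.0:
        max_bound_ratio = max(max_bound_ratio, norm(cross(x, y)) / (norm(x) * norm(y)))

print(max_linearity_gap, max_bound_ratio)
```

The linearity gap should be at rounding level, and the quotient should stay at or below 1 because $\|x \times y\| = \|x\|\,\|y\|\,|\sin \varphi|$ for the angle $\varphi$ between $x$ and $y$.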
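Similar to continuous linear functions, continuous multilinear functions form a normed space.

Definition 17.49 Let $X_1, \ldots, X_k, Y$ be normed spaces. Define $\mathcal{T}^k(X_1, \ldots, X_k, Y)$ to be the set of all continuous $k$-linear functions from the product space $\prod_{i=1}^k X_i$ to $Y$. If $X_1 = \cdots = X_k = X$, we also write $\mathcal{T}^k(X, Y)$ instead of $\mathcal{T}^k(X_1, \ldots, X_k, Y)$.

Theorem 17.50 Let $X_1, \ldots, X_k, Y$ be normed spaces. Then, with pointwise addition and scalar multiplication, $\mathcal{T}^k(X_1, \ldots, X_k, Y)$ is a vector space and the function
\[ \|T\| := \sup\bigl\{ \|T[x_1, \ldots, x_k]\| : \|x_1\| \le 1, \ldots, \|x_k\| \le 1 \bigr\} \]
is a norm on $\mathcal{T}^k(X_1, \ldots, X_k, Y)$ so that $\|T[x_1, \ldots, x_k]\| \le \|T\|\,\|x_1\| \cdots \|x_k\|$ for all $(x_1, \ldots, x_k) \in \prod_{i=1}^k X_i$. Moreover, if $Y$ is a Banach space, then so is $\mathcal{T}^k(X_1, \ldots, X_k, Y)$.

Proof. Mimic the proofs of Theorems 17.8 and 17.11 (Exercise 17-65).

Definition 17.51 Similar to the operator norm of a continuous linear function, we will call the norm from Theorem 17.50 the tensor norm of the continuous $k$-tensor $T$.

Continuing with similarities to continuous linear functions, multilinear functions are differentiable iff they are continuous.

Theorem 17.52 Let $X_1, \ldots, X_k$ and $Y$ be normed spaces and consider the function $T \in \mathcal{T}^k(X_1, \ldots, X_k, Y)$. Then $T$ is differentiable and for each $(x_1, \ldots, x_k) \in \prod_{i=1}^k X_i$ the derivative is
\[ DT(x_1, \ldots, x_k)[u_1, \ldots, u_k] = \sum_{j=1}^k T[x_1, \ldots, x_{j-1}, u_j, x_{j+1}, \ldots, x_k]. \]

Proof. The case $k = 1$ is Proposition 17.27. Hence, we will assume $k \ge 2$ throughout. We use a telescoping sum that is similar to the one in the proof of Theorem 17.41. Let $x := (x_1, \ldots, x_k) \in \prod_{j=1}^k X_j$. Then for all elements $z := (z_1, \ldots, z_k) \in \prod_{j=1}^k X_j$ so that $\|(z_1, \ldots, z_k)\| \le \|(x_1, \ldots, x_k)\| + 1 =: M$ we obtain the following, where we use that the norm of each component is bounded by the norm of the tuple.
\begin{align*}
&\Bigl\| T[z_1, \ldots, z_k] - T[x_1, \ldots, x_k] - \sum_{j=1}^k T[x_1, \ldots, x_{j-1}, z_j - x_j, x_{j+1}, \ldots, x_k] \Bigr\| \\
&\quad= \Bigl\| \sum_{j=1}^k \bigl( T[z_1, \ldots, z_{j-1}, z_j - x_j, x_{j+1}, \ldots, x_k] - T[x_1, \ldots, x_{j-1}, z_j - x_j, x_{j+1}, \ldots, x_k] \bigr) \Bigr\| \\
&\quad\le \sum_{j=1}^k \sum_{i=1}^{j-1} \bigl\| T[z_1, \ldots, z_{i-1}, z_i - x_i, x_{i+1}, \ldots, x_{j-1}, z_j - x_j, x_{j+1}, \ldots, x_k] \bigr\| \\
&\quad\le \frac{k(k-1)}{2} \|T\| M^{k-2} \|z - x\|^2 =: C \|z - x\|^2 .
\end{align*}
Now for any $\varepsilon > 0$ (without loss of generality $\varepsilon \le 1$, so that $\|z - x\| < \delta$ below implies $\|z\| \le M$) we can choose $\delta := \frac{\varepsilon}{C+1}$ to make the difference smaller than $\varepsilon \|z - x\|$ for all $z \ne x$ with $\|z - x\| < \delta$. Hence, $T$ is differentiable with the indicated derivative.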
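The derivative formula of Theorem 17.52 can be compared with a difference quotient. The sketch below assumes the 3-linear map $m[x_1, x_2, x_3] := x_1 x_2 x_3$ on $\mathbb{R} \times \mathbb{R} \times \mathbb{R}$ as the tensor. The function names `tensor_derivative` and `difference_quotient` are hypothetical helpers introduced for this illustration, not notation from the text.

```python
def product3(x1, x2, x3):
    """The 3-linear map m[x1, x2, x3] := x1 * x2 * x3 on R x R x R."""
    return x1 * x2 * x3


def tensor_derivative(T, xs, us):
    """Evaluate sum_j T[x_1, ..., x_{j-1}, u_j, x_{j+1}, ..., x_k],
    the formula for DT(x_1, ..., x_k)[u_1, ..., u_k] in Theorem 17.52."""
    total = 0.0
    for j in range(len(xs)):
        args = list(xs)
        args[j] = us[j]  # replace the j-th argument by u_j
        total += T(*args)
    return total


def difference_quotient(T, xs, us, h=1e-6):
    """(T[x + h u] - T[x]) / h, a one-sided approximation of DT(x)[u]."""
    shifted = [x + h * u for x, u in zip(xs, us)]
    return (T(*shifted) - T(*xs)) / h


xs = (1.0, 2.0, 3.0)
us = (0.5, -1.0, 2.0)
exact = tensor_derivative(product3, xs, us)     # 3 - 3 + 4 = 4.0
approx = difference_quotient(product3, xs, us)  # close to 4.0
print(exact, approx)
```

The difference quotient agrees with the formula up to an error of order $h$, which reflects the quadratic remainder in the proof of Theorem 17.52.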
In particular, Theorem 17.52 says that the derivative of a $k$-tensor is a sum of $(k-1)$-tensors. We conclude this section with the result that allows us to identify higher derivatives with continuous $k$-linear functions.

Proposition 17.53 The spaces $\mathcal{L}(X, \mathcal{T}^k(X, Y))$ and $\mathcal{T}^{k+1}(X, Y)$ are isomorphic via the map that sends the function $D : X \to \mathcal{T}^k(X, Y)$ in $\mathcal{L}(X, \mathcal{T}^k(X, Y))$ to the multilinear function in $\mathcal{T}^{k+1}(X, Y)$ that maps $(u_0, u_1, \ldots, u_k)$ to $D[u_0][u_1, \ldots, u_k]$.

Proof. Exercise 17-66.

Starting with Exercise 17-69 below the exercises emphasize an important idea that was already used in Exercise 17-58. If we can write a complicated function as the appropriate composition of simpler functions, taking the derivative becomes a comparatively easy task of combining the Chain Rule and Exercise 17-57. This is one of the advantages of the coordinate free approach to differentiation.

Exercises

17-63. Prove Theorem 17.47.

17-64. Prove Corollary 17.48.

17-65. Prove Theorem 17.50.

17-66. Prove Proposition 17.53.

17-67. More examples of $k$-linear maps.
(a) Prove that $m : \mathbb{R}^k \to \mathbb{R}$ defined by $m[x_1, \ldots, x_k] := x_1 \cdots x_k$ is continuous and $k$-linear.
(b) Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p, q \le \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$. Prove that $I : L^p(M, \Sigma, \mu) \times L^q(M, \Sigma, \mu) \to \mathbb{R}$ defined by $I[f, g] := \int_M fg \, d\mu$ is a continuous bilinear map.
(c) Let $X, Y, Z$ be normed spaces and let $\circ : \mathcal{L}(Y, Z) \times \mathcal{L}(X, Y) \to \mathcal{L}(X, Z)$ be defined by $\circ[L, M] := L \circ M$. Prove that $\circ$ is a continuous bilinear map.

17-68. Prove that for every 2-tensor $T : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ there is a $d \times d$ matrix $A$ so that $T[v, w] = v^T A w$, where $v, w$ are column vectors with respect to the standard base and $v^T$ is the transpose of $v$, that is, a row vector.

17-69. Prove each of the following as a consequence of Theorem 17.52 and Exercise 17-57.
(a) Let $f, g : (a, b) \to \mathbb{R}$ be differentiable. Prove that $(f \cdot g)' = f'g + fg'$.
(b) Let $Y$ be an inner product space and let $f, g : (a, b) \to Y$ be differentiable. Prove that $\langle f, g \rangle' = \langle f', g \rangle + \langle f, g' \rangle$.
(c) Let $\Omega \subseteq \mathbb{R}^3$ be open, let $f, g : \Omega \times \Omega \to \mathbb{R}^3$ be differentiable and let $\times$ denote the cross product on $\mathbb{R}^3$. Prove that $(f \times g)' = f' \times g + f \times g'$.
(d) Let $W, X, Y, Z$ be normed spaces, let $\Omega \subseteq W$ be open and let $L : \Omega \to \mathcal{L}(Y, Z)$ and $M : \Omega \to \mathcal{L}(X, Y)$ be differentiable. Prove that $D(L \circ M) = DL \circ M + L \circ DM$. Hint. Exercise 17-67c.
(e) Explain why all the above product rules "look the same."

17-70. Derive a product rule for products of $k$ functions $f_1, \ldots, f_k : (a, b) \to \mathbb{R}$.

17-71. Let $GL(n \times n, \mathbb{R})$ be the set of invertible $n \times n$ matrices and let $S : GL(n \times n, \mathbb{R}) \times \mathbb{R}^n \to \mathbb{R}^n$ be the function that maps the pair $(A, b)$ of an invertible matrix $A$ and a "right hand side vector" $b$ to the solution $x$ of the system of equations $Ax = b$.
(a) Prove that $S$ is differentiable. Hint. Theorem 17.32.
(b) Compute the derivative of $S$ at an arbitrary $(A, b)$.

17.7 Higher Derivatives

Now we are ready to investigate higher order derivatives. The underlying definition is the obvious one.

Definition 17.54 Let $X, Y$ be normed spaces and let $\Omega \subseteq X$ be open. The function $f : \Omega \to Y$ is called $k$ times differentiable at $x$ iff $f$ is $(k-1)$ times differentiable on $\Omega$ and its $(k-1)$st derivative $D^{k-1} f$ is differentiable at $x$. The $k$th derivative of $f$ at $x$ is denoted $D^k f(x)$. The function $f$ is called $k$ times differentiable on $\Omega$ iff $f$ is $k$ times differentiable at every $x \in \Omega$. It is called $k$ times continuously differentiable on $\Omega$ iff it is $k$ times differentiable on $\Omega$ and $D^k f$ is continuous on $\Omega$. Finally, the function $f$ is called infinitely differentiable on $\Omega$ iff for all $k \in \mathbb{N}$ the function $f$ is $k$ times differentiable on $\Omega$.

Appropriate application of Proposition 17.53 allows us to identify $k$th derivatives with $k$-tensors. This identification is common in analysis and we will use it throughout this text.
That is, $D^k f(x)$ will denote the $k$-tensor that is associated with the $k$th derivative of $f$ at $x$. The next result shows how higher derivatives of higher derivatives are related.

Proposition 17.55 Let $X, Y$ be normed spaces, let $m, n \in \mathbb{N}$, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$, and let $x \in \Omega$. Then $f$ is $m + n$ times differentiable at $x$ iff $f$ is $m$ times differentiable on $\Omega$ and $D^m f$ is $n$ times differentiable at $x \in \Omega$. Moreover, $D^{m+n} f(x) = D^n (D^m f)(x)$ and for $(t_1, \ldots, t_{m+n}) \in X^{m+n}$ we have the identity
\[ D^{m+n} f(x)[t_1, \ldots, t_{m+n}] = D^n (D^m f)(x)[t_1, \ldots, t_n][t_{n+1}, \ldots, t_{m+n}]. \]

Proof. Let $m \in \mathbb{N}$ be arbitrary. The proof is an induction on $n$, with the definition being the base case $n = 1$. For the representations of the tensors, note that for $(t_1, \ldots, t_{m+1}) \in X^{m+1}$ the derivative $D(D^m f)(x)[t_1][t_2, \ldots, t_{m+1}]$ is by Proposition 17.53 identified with the value $D^{m+1} f(x)[t_1, \ldots, t_{m+1}]$, where $D^{m+1} f(x)$ is the corresponding $(m+1)$-linear map.

For the induction step, note that $f$ is $m + n + 1$ times differentiable at $x$ iff $f$ is $m + 1$ times differentiable on $\Omega$ and $D^{m+1} f$ is $n$ times differentiable at $x \in \Omega$. This is the case iff $f$ is $m$ times differentiable on $\Omega$, $D^m f$ is differentiable on $\Omega$ and $D^{m+1} f$ is $n$ times differentiable at $x \in \Omega$, which by induction hypothesis is the case iff $f$ is $m$ times differentiable on $\Omega$ and $D^m f$ is $n + 1$ times differentiable at $x \in \Omega$. For the representations of the tensors note that for all $(t_1, \ldots, t_{m+n+1}) \in X^{m+n+1}$ the following hold.
\begin{align*}
D^{n+1} (D^m f)(x)[t_1, \ldots, t_{n+1}][t_{n+2}, \ldots, t_{m+n+1}]
&= D\bigl(D^n (D^m f)\bigr)(x)[t_1][t_2, \ldots, t_{n+1}][t_{n+2}, \ldots, t_{m+n+1}] \\
&= D\bigl(D^{n+m} f\bigr)(x)[t_1][t_2, \ldots, t_{m+n+1}] \\
&= D^{n+m+1} f(x)[t_1, \ldots, t_{m+n+1}].
\end{align*}

For $k$th derivatives (or, more accurately, for the tensors associated with them) the order of the arguments does not matter. The key to this insight is to prove the result for second derivatives.

Theorem 17.56 Hermann Amandus Schwarz' Theorem.
Let X and Y be normed spaces, let 52 & X be open, and let f : R + Y be twice differentiable at x E R.Then for all (s,t ) E X 2 we have D2f ( x ) [ s ,t ] = D2f ( x ) [ t ,sl. Proof. The main idea is that the sum f ( x + s + t ) - f ( x + t ) - f ( x + s ) + f (x) should be close to D2f ( x ) [ s ,t ] and D2f ( x ) [ t ,s ] . To understand where this expression comes f (x + s) - f from, recall that for single variable functions the difference quotient is close to f '(x) for small enough s. Hence, the difference quotient t St S 1 7.7. Higher Derivatives 375 should be close to f ” ( x > , andso f(x+s+t)-f(x+t)-f(x+s)+f(x) should be close to theproductf”(x)st, whichfor general f would be D 2 f ( x ) [ s t, ] or D 2 f ( x ) [ t s, ] . Let E > 0 and let SSt > 0 be so that for all s , t E X with llsll < SSt and ljtll < Ssr & we have x s t E L 2 and Df (x s t ) - D f ( x >- D 2 f ( x > [ s tll! I - 11s t 11, + + !I + + + where D’J is interpreted as a linear map into C ( X , Y ) . Then for all s, t 11s /I < 6,, and lltll < 6,,we obtain ‘i f Gt 5 4 E + X with + + r ) - J (-u + t ) - f ( x + $1 + f ( x ) - D ’ f ( X ) [ S , fill 5 /lf(x+s+t) - f ( x + t ) - (Df(X+S)[fI - Df(X)[fI) - (f(x+s) - f(x))/l 376 17. Differentiation in Normed Spaces But this implies the following for all s , t E X \ {O}. lpf(x)[s> tl - D 2 f ( x ) [ t SIII , Therefore, for Als, t ] := D 2 f ( x ) [ st, ] and B [ s , t ] := D 2 f ( x ) [ t s, ] the tensor norm of the difference is IIA - BII 5 48. Because E > 0 was arbitrary this implies IIA - BII = 0, and hence D 2 f ( x ) [ s t, ] = D 2 f ( x ) [ ts] , for all s, t E X . Definition 17.57 A bijective finction a : { 1, . . . , k } + (1, . . . , k } is also called a n k permutation. A k-tensor T : n Xi --f Y is called symmetric ifs for all k-tuples i=l k ( x i , . . . ,X k ) E Xi and all permutations a : { 1, . . . , k } + { 1, . . . 
, k } the equal- Corollary 17.58 Let X , Y be normed spaces, let fi X be open, let x f : fi + Y be k times differentiable at x. Then Dk f (x) is symmetric. E Q, and let Proof. The proof is an induction on k with k = 1 being trivial and k = 2 proved in Theorem 17.56. For the induction step, assume that k > 2 and that the result is proved for all j < k . Let ( t l , . . . , t k ) E X k and let a : 11, . . . , k } + 11, . . . , k } be a permutation. If a(1) # 1 let t : { 2 , .. . , k } + { 2 , .. . , k } be a permutation with t ( 2 ) = a(1). If ~ ( 1= ) 1 let t ( i ) = i for all i E [ 2 , . . . , k } and skip the middle three lines in the computation below. D k f ( x ) [ t i , .. . , t k l = D ( D k - ' f (x)) [til[t2,.. . , tkl = D ( D k - ' f ( x ) ) [tll[t,(2), = D2 ( D k - ' f ( X ) ) [ti,t,(2)l[h(3),. . . , t ~ ( k ) 1 = D2 ( D k - ' . f ( J ) ) [ & ( 2 ) , tll[tr(3),* * . = D (D"-'S(x)) = D ( D k - l f ( x ) ) [t,(l)I[t,(2)3 = D k f ( x ) [ & T ( l' ).> . t,(k)I, . . ' > t,(k)I 3 k(k)] . . > t,(k)I [ ~ , ( l ) l l ~t,(3)>. l~ b(3), . . . t,(k)I 3 1 where in the second to last step we use { l ,t(3),. . . , t ( k ) } = {a(2),. . . , a ( k ) }and apply an appropriate permutation. 17.7. Higher Derivatives 377 Clairaut's Theorem says that the order in which partial derivatives are taken is not important as long as the function is twice differentiable. It is usually stated for second partial derivatives of functions from Rd to R. The corollary below is easily seen to imply Clairaut's Theorem (see Corollary 17.60). Note, however (Exercise 17-73), that mixed partial derivatives can exist and not be equal. n n d d Corollary 17.59 Let Xi be aproduct space, let Y be a normed space, let R C i=l Xi i=l be open, let x E R, and let f : R + Y be k times differentiable at x. Then f o r all indices i l , . . . , i k E { 1, . . . , d } the partial derivative Di, . . . Di, f exists and f o r all permutations D : 11,. . . , k } + 11,. . . , k ] and all (xi,,. . . 
, x i , ) we have Dil ' . . D i , f ( x i , , ...,x i,)=Du(i,)...Do(ik)f(x,(il),. . . $ x o ( i k ) ) . Proof. For j E (1, . . . , d } , define e [ X j ] := { (0, . . . , 0 , x j , 0 , . . . , 0 ) : x j E X j ) , where the xj is in the jthcomponent of each vector. First we prove by induction on k that Dil ' . . D i k f = D k f I n l = l e [ X i , l The . case k = 1 follows from Di f = D f le[xil (see Proposition 17.40) wherever f is differentiable. For the induction step, assume the result has been proved for all j < k . Then Now Corollary 17.58 implies Dj, . . . D j k f ( x i l ,. . . , xi,) I n5=,e [ X i k1 (xi, . . . xi,) = Dk f = ~ ' f l n ;e[X,(,k)1(~o(ii)3 =, . . xo(ik)) = Du(il) ' ' ' Du(i,)f (xu(il), I . 3 ... 9 xu(ik)), which finishes the proof. Corollary 17.60 Clairaut's Theorem. Let R C Rd be an open set and let the function f : C2 -+ R be twice diferentiable at x E R. Then for all i l , i 2 E { 1, . . . , d } we have a2f (x) = a 2 f (x). axi, axi, axi,axi, r f f is k times differentiable at x, then for all i l , . . , , i k ~ mutations D of { 1, . . . , k } we have akf axi, . . . axi, af Proof. Easy consequence of -(x) axi (x) = E { 1, . . . , d akf axu(i1). ' axo(i,) ) and all per- (x). . t = D R f~( x ) [ t ]and Corollary 17.59. Now that we have made the connection to partial derivatives, we can also give an explicit formula for the kth derivative in terms of partial derivatives that is similar to Theorem 17.43. 17. Differen tiation in Normed Spaces 378 Rd be open, let x Corollary 17.61 Let B differentiable at x. Then E B and let f : B + Iw be k times Moreovel; for hi = h2 = . . . = hk = c = ( c l , . . . , C d ) we have d n;,[hjlei, for all j = 1, . . . , k. For the second part, note Proof. Use that hj = i,=l that because the kth partial derivatives do not depend on the order in which the partial derivatives are taken, we can sort the first sum by how often each kth partial derivative occurs. 
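For $k = 2$ and $f : \mathbb{R}^2 \to \mathbb{R}$, the representation of the second derivative through partial derivatives reads $D^2 f(x)[s, t] = \sum_{i,j} \frac{\partial^2 f}{\partial x_i \partial x_j}(x)\, s_i t_j$, and the proof idea of Theorem 17.56 says that the second difference $f(x+s+t) - f(x+t) - f(x+s) + f(x)$ is close to this value for small $s, t$. The sketch below compares the two for the sample function $f(x, y) = x^2 y + y^3$, which is an assumption made for illustration only, as are the helper names. The partial derivatives in `hessian` were computed by hand.

```python
def f(x, y):
    """Sample function on R^2 (chosen for illustration)."""
    return x * x * y + y ** 3


def hessian(x, y):
    """Second partial derivatives of f, computed by hand."""
    return ((2.0 * y, 2.0 * x),
            (2.0 * x, 6.0 * y))


def second_derivative(p, s, t):
    """D^2 f(p)[s, t] = sum over i, j of d^2 f/dx_i dx_j (p) * s_i * t_j."""
    H = hessian(*p)
    return sum(H[i][j] * s[i] * t[j] for i in range(2) for j in range(2))


def second_difference(p, s, t, h=1e-4):
    """(f(p+hs+ht) - f(p+ht) - f(p+hs) + f(p)) / h^2."""
    def shift(v, w):
        return (p[0] + h * (v[0] + w[0]), p[1] + h * (v[1] + w[1]))

    zero = (0.0, 0.0)
    total = f(*shift(s, t)) - f(*shift(zero, t)) - f(*shift(s, zero)) + f(*p)
    return total / (h * h)


p, s, t = (1.0, 2.0), (1.0, 2.0), (-1.0, 0.5)
print(second_derivative(p, s, t))  # 5.0
print(second_derivative(p, t, s))  # 5.0, the order of s and t does not matter
print(second_difference(p, s, t))  # approximately 5.0
```

The symmetry of the printed values is an instance of Theorem 17.56, and the agreement with the scaled second difference mirrors the heuristic given at the start of its proof.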
We conclude this section with a proof that k-fold differentiability is preserved by compositions. Proposition 17.62 Chain Rule. Let X , Y , Z be normed spaces, let R1 & X , B2 E Y be open, let g : R1 -+ Q2 be k times differentiable at x E 521 and let f : R2 + Z be k times differentiable at g ( x ) E B2. Then f o g is k times direrentiable at x. Proof. This proof is an induction on k, with the base step k = 1 being the Chain Rule (Theorem 17.30). For the induction step (k - 1) -+ k, let f and g be k times differentiable. Recall that by Theorem 17.30 we have D ( f o g)(x) = D f ( g ( x ) ) o D g ( x ) . By Proposition 17.55, D g ( . ) is k - 1 times differentiable at x and by induction hypothesis D f ( g ( . ) )is k - 1 times differentiable at x. Therefore by an easy generalization of Exercise 17-57 the function x H ( D f ( g ( x ) ) D , g ( x ) ) is k - 1 times differentiable. Moreover, the function ( L , M ) H L o M from L(Y, Z ) x C ( X , Y ) to C ( X , Z ) is continuous and bilinear, and hence k - 1 times differentiable by Exercise 17-74. But then by induction hypothesis, the composition x H ( D f ( g ( x ) ) ,D g ( x ) ) H D f ( g ( x ) )o D g ( x ) is k - 1 times differentiable, which completes the proof. Exercises 17-72. Finish the proof of Theorem 17.56 by proving that for all & z 0 we can find a St, > 0 so that for all s, r E X with llsli < St, and llrli < St, we have 17-73. Even when mixed partial derivatives exist, they need not be equal. Consider the function 17.7. Higher Derivatives (a) Prove that both 379 a2f ~ a2f axay (0, 0) and -(0, 0) exist and are not equal ayax af . af . (b) Prove that neither - nor - is differentiable at (0, 0). ax ay (c) Prove that f is differentiable at every (x.y ) E R2 (d) Prove that D f is not differentiable at (0,0). 17-74. Prove that every continuous k-tensor T is infinitely differentiable with D JT = 0 for j > k 17-75. 
Let R G Rd be open, let f : R + W be a twice differentiable function and let x E (a) Prove that A := ( D 2 f ( x ) [ e ; e, j l R. ) is such that D 2 f ( x ) [ u ,w ] = u T A w , = 1 , ,,, , j=l,..,,d where u , w are column vectors with respect to the standard base and uT is the transpose of u , that is, a row vector. (b) Prove that A is a symmetric matrix, that is, for all i, j E (1, . . . , d ) we have a j j = a j j . 17-76. For each f : R2 + 22, compute the second derivative. Use the representation of Exercise 17-75 (a) f ( x , y) = ye" (b) f ( x . J) = x 3 + 3xy + y 2 + 17-77. Taylor's Formula Let X , Y be Banach spaces, let R C X be open, let f : R --f Y be k 1 times continuously differentiable on R, and let x E 52, z E X be so that for all r E [0, 11 we have x +tz E R. + Hint. Consider the function t w f(x t z ) . The Riemann integral for continuous Banach space valued functions is defined in Exercise 17-41. Use Theorem 13.3 as guidance. (b) For R Wd and Y = R state Taylor's formula from part 17-77a in terms of partial derivatives. Then decide which of the two formulas is easier to use computationally and which formula is easier to read. Hint. Corollary 17.61. (c) Let o ( i ):= s,' + Dk+'f(x k The function T k ( z ) := i=o + u z ) d u [ i, . ,, . i ] . Prove that lim O ( z ) = o ~ llzIl+O llzllk 1 -t D ' f ( x ) [ z ,. . . , i ] is called the kth Taylor polynomial o f f at x -' 17-78. For each function f : R2 R below, compute the second Taylor polynomial at x = 0. Use the representation of Exercise 17-75 for the second derivative. (b) f ( x , y ) = x 2 (a) f ( x , y ) = ex?.* + y2 (Is the result a surprise?) 17-79. Let X be a normed space. A bilinear function T : X x X + 22 is called positive definite iff for all x E X \ ( 0 )we have T [ x ,x] > 0. (a) Second Derivative Test. Let Q differentiable and let u E E X be open, let f R be so that D f ( u ) = 0. : X --f R be twice continuously i. 
Prove that if D 2 f ( u ) is positive definite, then there is an E > 0 so that for all points u E X \ ( u }with lIu - ulI < E we have f ( u ) < f ( u ) , ii. Prove that if - D 2 f ( u ) is positive definite, then there is an E > 0 so that for all points u E X \ ( u ) with )1u - u / / < E we have f ( u ) > f ( u ) , 380 17. Differentiation in Normed Spaces iii. Prove that if there are x1. x 2 E X so that D2 f ( u ) [ x l ,xl] > 0 and D 2 f ( u ) [ x 2 x2] , < 0, then for every E > 0 there are elements u , w E X \ ( u )with IIu - ulI iE , Ilu - wll < E , f ( u ) < f (v), and f ( u ) > f (w). Hint. Use Exercise 17-77. a2f ( u ) > 0 (b) Prove that if X = W2, then the second derivative D 2 f ( u ) is positive definite iff ax2 and -a2f ( u ) ~a2f ( u ) - (*(u))* ax2 ay axay > 0. Hint. Use Exercise 17-75. a2f a2f ax2 ay2 (c) Prove that if X = W2 and -(u)-(u) (a(.)) 2 - < 0, then there are x1,x2 E X axay so that D2f ( u ) [ x l ,xi] > 0 and D 2 f ( u ) [ x 2 , xz] < 0. (d) State and prove a result similar to part 17-79a for a k times continuously differentiable function f : C2 + W so that Of (x) = 0, . . . , Dk-’ f (n) = 0. (Distinguish even and odd k.) 17-80. A characterization of symmetry n k (a) Prove that a continuous k-linear function T : there is a S > 0 so that for all k-tuples (XI,. . . . xk) E > 0 1 (XI,. . . , x k ) / <S Xi + Y is symmetric iff for every i=l n k E X i that satisfy i=l and all permutations 0 : [ 1, . , , , k ] + [ 1, , . , , k ] we have /I ~ ( x l . . xk) - ~ ( x o ( 1 ) s . . , x o ( k ) ) 1 ,,, , 5 E C IIxi 11 (ill Ik . (b) Consider the continuous bilinear map T : B2 x W2 + R defined by T ( x , y ) := 7r1 ( x ) 7 r 2 ( y ) . Prove that for every E > 0 there is a 8 > 0 so that for all ( x , y ) E R2 x E2with I/(x,Y)il < 6 we have T ( x , y ) - T ( y ,x) 5 E ( //x/I IIyIl ) , but T is not symmetric. 1 1 + 17-81. Let X be a Banach space, let Y be a normed space and let X ( X , Y ) be the set of linear homeomorphisms from X to Y . 
Prove that the map J ( A ) := A-l is an infinitely differentiable function from X(X, Y ) to X ( Y , X ) . 17.8 The Implicit Function Theorem To investigate the solution sets of equations f (x,y ) = 0 in more detail, it is often helpful to represent y as a function g(x). The Implicit Function Theorem says that under mild hypotheses this is possible and g has the same differentiability properties as f.The first step toward the Implicit Function Theorem is a result about fixed points. Definition 17.63 Let S be a set and let f : S -+ S be a function. Then p E S is called a fixed point off i f S f ( p ) = p . Fixed points are important throughout applied mathematics because many equations can be rewritten as fixed point equations. (Recall Newton’s Method from Section 13.2.) Under certain conditions, fixed points must exist. That is, if a fixed point equation from a concrete application satisfies the right abstract condition, then it must have a solution. Banach’s Fixed Point Theorem provides one such condition. 17.8. The Implicit Function Theorem 381 Theorem 17.64 Banach's Fixed Point Theorem. Let X be a complete metric space, let 0 5 q < 1 and let f : X -+ X be afunction so that for all points x,y E X we have d ( f ( x ) ,f ( y ) ) 5 q d ( x , y ) . Then f has a uniquefucedpoint p E X . Proof. Let x E proves that for all n E X and consider the sequence { f n ( x ) } Z l .An easy induction N we have d f n ( x ) , f n+l (x)) ( 5 q " d ( x , f ( x ) ) . Hence, for all m-1 q k d ( x ,f (x)). Therefore, { f n ( x ) } z lis n , m E W we infer d (fn(x), f m ( x ) ) 5 k=n a Cauchy sequence. Let p := lim f (x) and let E > 0. Then there is an n E d ( p , f n ( x ) )< 3 n-30 N so that & a n d d ( f " ( x ) , f " + ' ( x ) ) < -, whichimplies 3 d ( p , f ( p ) ) i d ( p , f n ( x ) )+ d ( f " ( x ) , f n + ' ( x ) ) + d ( f " + ' ( x ) , f ( p ) ) 5 < d ( P , f n ( x ) )+ d & & (fa@), f n + l ( x ) )+ q d ( f n ( x >P, ) & -+-Sq-<&. 
3 3 3 Because E > 0 was arbitrary we have proved that p = f ( p ) . Regarding uniqueness, suppose that p is another fixed point of f . Then we have d ( p , F) = d ( f ( p ) ,f@)) 5 q d ( p , which implies d ( p , p"> = 0, and hence we conclude p = F. a, Now we are ready to state and prove the Implicit Function Theorem. Note that the function h in the proof is similar to applying Newton's Method in the Y-coordinate. Theorem 17.65 Implicit Function Theorem. Let X , Y and Z be Banach spaces, let i-2 C X x Y be an open set, let (xo,yo) E i-2, and let f : i-2 -+ Z be continuously diferentiable so that f (xo,yo) = 0 and so that D y f (xo,yo) : Y -+ Z is a linear homeomorphism. Then there is an open neighborhood N 5 X of xo such that there is a unique continuously difSerentiable function g : N + Y so that g(x0) = yo and f o r all x E N we have (x,g ( x ) ) E R and f ( x , g(x)) = 0. The derivative of g is Dg(x> = - ( ~ y f ( x g, ( x ) ) ) - l o D x f ( x , g(x>). Moreovei; iff i s k times continuously differentiable, then so is g . Proof. Let LO := D y f ( x 0 , yo), let h ( x , y ) := y - L i l [ f ( x , y ) ] and let 6 > 0 be so that for all (x,y ) E X x Y with I / ( x , y ) - (xo,y o ) 5 6 we have that (x,y ) E R, 1 D y f ( x , y ) is invertible and ~ / DfY(x,y ) - Dr f (xo, y o ) / / < I ~ is bounded on Bs((x0, yo)). Then for all points (x, yi) (x. y l ) - (xo,yo) 5 6 ( i = 1, 2 ) the following holds. 1 Ilh(x, Y 2 ) - h(x3 Y l ) l l E X x Y with 17. Differentiation in Normed Spaces 382 5 1 ~ I l Y2 Ylll. Let N := n-x [Bs ((xo, y o ) ) ] = BJ (xo) (equality holds because so far we are using the norm max { 11 . IIx, I/ . I l y } on X x Y ) . By Theorem 17.64 for each point x E N - the function h(x,.) : Bs(y0) -+ &(yo) has a unique fixed point g ( x ) . Because we have g ( x ) = h(x, g ( x ) ) = g ( x ) - L,'[f(x, g(x))] iff f ( x , g ( x ) ) = 0, a function g : N + Y with f ( x , g ( x ) ) = 0 exists on N and by Theorem 17.64 it is unique. 
To prove that g is continuous, note that by the proof of Theorem 17.64 the functions g,, where go(x) := yo and g , ( x ) := h(x,g , - l ( x ) ) for n ? 1, form a uniform Cauchy sequence. Thus g is the uniform limit of the functions g,. Because the g, are continuous, so is g (easy adaptation of the proof of Theorem 11.7, see Exercise 17-82). To prove that g is differentiable, fix x E N . To simplify the following argument, we will continue the proof with the product norm (x,y ) := /Ix/I II y II on X x Y . Note that by continuity of g and Proposition 17.40 for every x E N and every E > 0 with 1 & < there is a 6 > 0 so that for all z E N with /Iz - x 11 i6 2 !1-(Dyf(x, g(xN)-l we have 1 1 /I + 383 17.8. The Implicit Function Theorem Via (*) and the reverse triangular inequality, for all z with I/z - x 11 < S we infer L which implies that llg(z) - g ( x ))I I a ( z )llz - x 11. Because a ( z ) is bounded on B s ( x ) , substituting this into the equation (*) proves that g is differentiable with the derivative being as claimed. To prove that g is k times continuously differentiable if f is, we proceed by induction. The base step was just proved. For the induction step k + ( k I), note that both - ( D y f ( x , g( x) ) ) - ' and D x f ( x , g ( x ) )are compositions of k times continuously differentiable functions, and hence k times continuously differentiable themselves (for the inversion, Exercise 17-81 is needed). But then, because the composition operator is continuous and bilinear from C ( Z , Y ) x C ( X , Z ) to C ( X , Y ) the derivative D g ( x ) = - ( D y f ( x , g ( x ) ) ) - l o D x f ( x ,g ( x ) ) is k times continuously differentiable and therefore g is k 1 times continuously differentiable. + + The Implicit Function Theorem allows us to show that in Theorem 17.31 the hypotheses that f is bijective, f - ' is continuous and the range is open can be dropped. 
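The fixed point construction in the proof of the Implicit Function Theorem can be carried out numerically. The following is a minimal sketch, assuming the sample equation $f(x, y) = x^2 + y^2 - 1 = 0$ near $(x_0, y_0) = (0, 1)$, so that $L_0 = D_y f(x_0, y_0) = 2$ and $h(x, y) = y - \frac{1}{2} f(x, y)$. The example and all function names are assumptions for illustration. The exact solution $g(x) = \sqrt{1 - x^2}$ is used only to check the result.

```python
import math

X0, Y0 = 0.0, 1.0  # base point with f(X0, Y0) = 0


def f(x, y):
    """Sample equation f(x, y) = 0 describing the unit circle."""
    return x * x + y * y - 1.0


def h(x, y):
    """h(x, y) := y - L0^{-1}[f(x, y)] with L0 = D_y f(X0, Y0) = 2."""
    return y - f(x, y) / 2.0


def g(x, steps=60):
    """Fixed point of h(x, .), obtained by iterating g_n := h(x, g_{n-1})
    starting from g_0 := Y0."""
    y = Y0
    for _ in range(steps):
        y = h(x, y)
    return y


def dg_formula(x):
    """Dg(x) = -(D_y f(x, g(x)))^{-1} D_x f(x, g(x)) = -(2x) / (2 g(x))."""
    y = g(x)
    return -(2.0 * x) / (2.0 * y)


def dg_numeric(x, step=1e-6):
    """Central difference approximation of the derivative of g."""
    return (g(x + step) - g(x - step)) / (2.0 * step)


x = 0.3
print(g(x), math.sqrt(1.0 - x * x))  # both close to 0.9539
print(dg_formula(x), dg_numeric(x))  # both close to -0.3145
```

Near $y_0 = 1$ the map $y \mapsto h(x, y)$ has derivative $1 - y$, which is small, so the iteration is a contraction and converges quickly, as Banach's Fixed Point Theorem predicts for a sufficiently small neighborhood.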
Corollary 17.66 Let X , Y be Banach spaces, let V 5 X be open, let f : V + Y be continuously differentiable and let xo E V be so that D f (xo) : X + Y is a linear homeomorphism. Then there exists an open neighborhood U of xo so that f Iu is a homeomorphism from U to an open neighborhood of f (xo). Moreovel; if f is k times continuously differentiable in U , then the inverse f - ' is k times continuously differentiable in f [ U ] . Proof. Define F : V x Y + Y by F ( x . y ) := f ( x ) - y and let yo := ~ ( x o )By . the Implicit Function Theorem, because D x f ( x 0 , yo) = D f ( x 0 ) there is a neighborhood N of yo and a k times continuously differentiable function g from N to X so that F ( g ( y ) ,y ) = 0 for all y E N . Now g is the inverse of f and the result follows. W The Implicit Function Theorem also is fundamental for the theory of manifolds, because it allows us to express sets of the form { ( x , y ) : f ( x , y ) = 0) as manifolds, or, as Corollaries 17.67 and 17.69 show, as embedded manifolds. We will revisit the results below when we prove Theorem 19.7. Corollary 17.67 Let X , Y , Z be Banach spaces, let U 5 X and V Y be open, let f : U x V + Z be k times continuously differentiable and let (xo, y o ) E U x V be so that Dy f (xo, yo) is a linear homeomorphism. Then there are open neighborhoods UOof xo and WOoff (xo, yo) and a unique k times continuously difSerentiablefunction g : UO x Wo + Y so thatfor all ( x , w)E UOx WOwe have f ( x ,g ( x , w))= w. Proof. Apply the Implicit Function Theorem to H ( x , y , w) := w - f ( x , y ) (Exercise 17-83). 17. Differentiation in Normed Spaces 384 Definition 17.68 The rank of a matrix is the largest number of linearly independent column vectors in the matrix. Corollary 17.69 Let R E Rd be open and let f : S-2 + RdPmbe a k times continuously differentiable function such that the matrix (g) i = l , ..., d - m has rank j = l , . . . ,d Rd and d - m. 
If a E C2 is such that f ( a ) = 0, then there is an open set G a k times continuously diyerentiable function g : G --f C2 with k times continuR and the equation ously differentiable inverse so that g[G] is open, a E g[G] f 0 g ( U 1 , . . . , U d ) = (Um+l, . . . , U d ) holds for all ( U l , . . . , U d ) E G. Proof. Let j1, . . . , j d P m be the indices of the columns so that the corresponding column vectors of (y (a)) ax are linearly independent. Represent Rd i = l , . . . ,d - m j = 1. . . . , d as span{ek : k 9 { j l , . . . , j d - m ) } x span{ej, : i = 1 , . . . ,d - m ) (this permutes the components appropriately) and apply Corollary 17.67. The details are left to the reader as Exercise 17-84. Exercises 17-82. Let X , Y be Banach spaces, let R C X and let [fn]T=l be a sequence of continuous functions f n : S2 + Y so that for all E > 0 there is an N E N so that for all m ,n 2 N and all x E R we have / f n ( x ) - f m ( x ) < E . Prove that then there is a continuous function f : R +. Y so that lim f n (x) = f ( x ) for all x E R. 1 n+cc 17-83. Prove Corollary 17.67. 17-84. Prove Corollary 17.69. 17-85. Explain why in the proof of the Implicit Function Theorem we must prove that g is continuous before we can prove that it is differentiable. 17-86. Lagrange multipliers. (a) Let X , Y be finite dimensional normed spaces with dim(X) = n ? m = dim(Y), let R & X be open, let f : R +. R be continuously differentiable, let g : R + Y be continuously differentiable, and let x E R be so that g ( x ) = 0, the rank of D g ( x ) is m, and there is an E > 0 so that for all z E R with g(z) = 0 and I/z - x i / < E we have that f ( x ) > f ( z ) . Prove that there is a continuous linear function cp E L(Y,R) so that D f ( x ) cp o D g ( x ) = 0. Hint. Let X1 := [ a E X : D g ( x ) [ a ]= 0 ] and represent X as a product X = X I x X 2 with x = (xl . x 2 ) . 
Find a neighborhood $U \times V$ of $x$ so that there is a continuously differentiable $\alpha : U \to V$ so that $g(x_1, \alpha(x_1)) = 0$ for all $x_1 \in U$. Then consider $H(x_1) := f(x_1, \alpha(x_1))$.
(b) Prove that if $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$, then there are $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ so that we have $\operatorname{grad}(f)(x) + \sum_{i=1}^m \lambda_i \operatorname{grad}(\pi_i \circ g)(x) = 0$.

17-87. Let $X$ be a complete metric space and let $f : X \to X$ be a function so that there is a sequence $\{b_n\}_{n=1}^\infty$ of nonnegative numbers such that $\sum_{n=1}^\infty b_n$ converges and for all $n \in \mathbb{N}$ and all $x, y \in X$ we have $d(f^n(x), f^n(y)) \le b_n d(x, y)$. Prove that $f$ has a unique fixed point $p$ in $X$.

17-88. Let $\|\cdot\|_2$ be the Euclidean norm on $\mathbb{R}^3$. Find a map $f : \mathbb{R}^3 \to \mathbb{R}^3$ that has no fixed points and so that for all $u, v \in \mathbb{R}^3$ we have $\|f(u) - f(v)\|_2 = \|u - v\|_2$.

Chapter 18 Measure, Topology, and Differentiation

Continuity, differentiation, and integration are the three main topics in analysis. Part I of the text has shown that the combination of these concepts can lead to powerful new insights, such as the Fundamental Theorem of Calculus or the Lebesgue criterion for Riemann integrability. For abstract spaces, we have seen in Chapter 14 how integration leads to measure theory, in Chapter 16 how the investigation of limits and continuity leads to topological concepts, and in Chapter 17 how differentiation is approximation with continuous linear functions. In this chapter, we investigate the connections between measure, topology, and differentiation in $d$-dimensional space.

Section 18.1 characterizes Lebesgue measurable sets topologically by approximating them from the inside with closed sets and from the outside with open sets. Similarly, Section 18.2 shows that $p$-integrable functions can be approximated with infinitely differentiable functions. After a brief excursion into tensor algebra in Section 18.3 (placed there to keep the presentation self-contained), the chapter concludes with the proof of the Multidimensional Substitution Formula in Section 18.4.
18.1 Lebesgue Measurable Sets in $\mathbb{R}^d$

This section shows how Lebesgue measurable sets in $\mathbb{R}^d$ can be characterized almost exclusively with the topological ideas of openness and closedness. We start by proving that the most fundamental subsets of $\mathbb{R}^d$ are Lebesgue measurable.

Theorem 18.1 Open and closed subsets of $\mathbb{R}^d$ are Lebesgue measurable.

Proof. By Proposition 14.60, for any $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ and any $\varepsilon > 0$ the open cube $C_\varepsilon(x) := \prod_{i=1}^d (x_i - \varepsilon, x_i + \varepsilon)$ is Lebesgue measurable. Moreover, $C_\varepsilon(x)$ is the open ball of radius $\varepsilon$ around $x$ in the uniform norm $\|\cdot\|_\infty$.

Let $O \subseteq \mathbb{R}^d$ be an open set. Then $O_{\mathbb{Q}} := O \cap \mathbb{Q}^d$ is a countable dense subset of $O$. For all $x \in O_{\mathbb{Q}}$, let $\varepsilon_x := \sup\{\varepsilon < 1 : C_\varepsilon(x) \subseteq O\}$. Then clearly $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) \subseteq O$. To prove the reversed inclusion, let $y \in O$. There are an $\varepsilon \in (0, 1)$ so that $C_\varepsilon(y) \subseteq O$ and an $x \in O_{\mathbb{Q}}$ so that $\|x - y\|_\infty < \frac{\varepsilon}{2}$. But then $y \in C_{\frac{\varepsilon}{2}}(x) \subseteq C_\varepsilon(y) \subseteq O$, which means $y \in C_{\varepsilon_x}(x)$. Because $y$ was arbitrary this proves that $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) \supseteq O$, and hence $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) = O$. Because $O_{\mathbb{Q}}$ is countable and every $C_{\varepsilon_x}(x)$ is Lebesgue measurable we conclude that $O$ is Lebesgue measurable because it is a countable union of Lebesgue measurable sets. Therefore, the open subsets of $\mathbb{R}^d$ are Lebesgue measurable.

Now let $C \subseteq \mathbb{R}^d$ be closed. Then $O = \mathbb{R}^d \setminus C$ is open and thus Lebesgue measurable. But then $C$ is the complement of a Lebesgue measurable set, and hence it is Lebesgue measurable, too.

Corollary 18.2 Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable and let $f : S \to \mathbb{R}$ be continuous. Then $f$ is Lebesgue measurable.

Proof. Let $a \in \mathbb{R}$. Then the set $(a, \infty)$ is open in $\mathbb{R}$. Because $f$ is continuous, $f^{-1}[(a, \infty)]$ is relatively open. That is, there is an open set $O \subseteq \mathbb{R}^d$ so that $f^{-1}[(a, \infty)] = S \cap O$, which by Theorem 18.1 is Lebesgue measurable. Hence, $f$ is Lebesgue measurable.
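The proof of Theorem 18.1 writes an open set as a countable union of open cubes. A related picture can be sketched numerically. Note that the sketch swaps in a different family of cubes: instead of the rational-centered cubes $C_{\varepsilon_x}(x)$ from the proof, it uses disjoint dyadic squares of side $2^{-n}$ that lie inside the open unit disk in $\mathbb{R}^2$, and it adds up their areas. The function name `inner_area` and the choice of the disk are assumptions made for illustration.

```python
import math


def inner_area(n):
    """Total area of the dyadic squares of side 2**-n whose open interior
    is contained in the open unit disk."""
    side = 2.0 ** -n
    cells = 2 ** n
    count = 0
    for i in range(-cells, cells):
        for j in range(-cells, cells):
            # Coordinates of the corner of the square [i, i+1] x [j, j+1]
            # (scaled by side) that is farthest from the origin.
            cx = max(abs(i), abs(i + 1)) * side
            cy = max(abs(j), abs(j + 1)) * side
            # The open square lies in the open disk iff that corner has
            # norm at most 1.
            if cx * cx + cy * cy <= 1.0:
                count += 1
    return count * side * side


for n in (3, 4, 5, 6):
    print(n, inner_area(n), math.pi - inner_area(n))
```

Because every admissible square at level $n$ splits into four admissible squares at level $n + 1$, the computed areas are nondecreasing in $n$. They stay below $\pi$ and the gap shrinks as the squares get finer.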
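Lebesgue measurable sets can now be characterized as the subsets of $\mathbb{R}^d$ that can be approximated by open sets from the outside and by closed sets from the inside so that the measure of the difference can be made arbitrarily small. We will first prove this result for sets of finite measure and then for sets of arbitrary measure.

Theorem 18.3 A subset $S \subseteq \mathbb{R}^d$ with $\lambda(S) < \infty$ is Lebesgue measurable iff for every $\varepsilon > 0$ there are a compact set $C$ and an open set $O$ such that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$.

Proof. For "$\Rightarrow$," let $S$ be Lebesgue measurable with $\lambda(S) < \infty$ and let $\varepsilon > 0$. Then there is a family of open boxes $\{B_j\}_{j=1}^\infty$ with $S \subseteq \bigcup_{j=1}^\infty B_j$ and $\sum_{j=1}^\infty |B_j| < \lambda(S) + \frac{\varepsilon}{2}$. Let $O := \bigcup_{j=1}^\infty B_j$. Then $O$ is open, $S \subseteq O$, and $\lambda(O) \le \sum_{j=1}^\infty |B_j| < \lambda(S) + \frac{\varepsilon}{2}$.

For any $n \in \mathbb{N}$ let $K_n := \prod_{i=1}^d [-n, n]$. By Theorem 14.15, we obtain $\lambda(S) = \lim_{n \to \infty} \lambda(S \cap K_n)$. Hence, there is an $N \in \mathbb{N}$ so that $K := K_N$ satisfies $\lambda(S \cap K) \ge \lambda(S) - \frac{\varepsilon}{4}$. The set $K \setminus S$ is Lebesgue measurable and there is a family of open boxes $\{A_j\}_{j=1}^\infty$ so that $K \setminus S \subseteq \bigcup_{j=1}^\infty A_j$ and $\sum_{j=1}^\infty |A_j| < \lambda(K \setminus S) + \frac{\varepsilon}{4}$. But then $C := K \setminus \bigcup_{j=1}^\infty A_j$ is closed and bounded, and hence by Theorem 16.80 it is compact. Moreover, the containment $C = K \setminus \bigcup_{j=1}^\infty A_j \subseteq K \setminus (K \setminus S) \subseteq S$ holds. Now
\[ \lambda(C) \ge \lambda(K) - \sum_{j=1}^\infty |A_j| > \lambda(K) - \lambda(K \setminus S) - \frac{\varepsilon}{4} = \lambda(S \cap K) - \frac{\varepsilon}{4} \ge \left( \lambda(S) - \frac{\varepsilon}{4} \right) - \frac{\varepsilon}{4} = \lambda(S) - \frac{\varepsilon}{2}. \]
Now $\lambda(O \setminus C) = \lambda(O) - \lambda(C) < \lambda(S) + \frac{\varepsilon}{2} - \left( \lambda(S) - \frac{\varepsilon}{2} \right) = \varepsilon$, which proves the direction "$\Rightarrow$."

Conversely, for "$\Leftarrow$" let $S \subseteq \mathbb{R}^d$ be such that $\lambda(S) < \infty$ and for every $\varepsilon > 0$ there are a compact set $C$ and an open set $O$ such that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$. Let $T \subseteq \mathbb{R}^d$ and let $\varepsilon > 0$. Then with $O$ and $C$ as described we infer
\begin{align*}
\lambda(T) &= \lambda(O \cap T) + \lambda(O' \cap T) \\
&> \lambda(O \cap T) + \lambda(O' \cap T) + \lambda(O \setminus C) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(O' \cap T) + \lambda((O \setminus C) \cap T) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(O' \cap C' \cap T) + \lambda((O \cap C') \cap T) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(C' \cap T) - \varepsilon \\
&\ge \lambda(S \cap T) + \lambda(S' \cap T) - \varepsilon.
\end{align*}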
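Because $\varepsilon$ was arbitrary we obtain $\lambda(T) \ge \lambda(S \cap T) + \lambda(S' \cap T)$, and because $T$ was arbitrary, this means that $S$ is Lebesgue measurable. ∎

Theorem 18.3 indicates that an "inner volume" or "inner Lebesgue measure" for a set $S$ could be defined as the supremum of the (outer) Lebesgue measures of all compact sets contained in $S$ (also see Exercise 18-3). A set would then be measurable iff the inner and outer volume are equal. Measurability in this sense turns out to be the same as Lebesgue measurability. However, this idea is quite complicated and it requires some topological structure, while Definition 14.19 of measurability can be used on arbitrary sets. This is why, even though it intuitively feels like a good idea, we do not work with "inner measures."

We can characterize Lebesgue measurable sets in $\mathbb{R}^d$ as follows.

Theorem 18.4 Let $S \subseteq \mathbb{R}^d$. Then the following are equivalent.
1. $S$ is Lebesgue measurable.
2. For every $\varepsilon > 0$, there are a closed set $C$ and an open set $O$ so that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$.
3. There is a sequence $\{O_n\}_{n=1}^\infty$ of open sets and a sequence $\{C_n\}_{n=1}^\infty$ of closed sets with $O_n \supseteq O_{n+1}$ and $C_n \subseteq C_{n+1}$ for all $n \in \mathbb{N}$ so that $\bigcup_{n=1}^\infty C_n \subseteq S \subseteq \bigcap_{n=1}^\infty O_n$ and so that $\lambda\left( \bigcap_{n=1}^\infty O_n \setminus \bigcup_{n=1}^\infty C_n \right) = 0$.

Proof. To prove "$1 \Rightarrow 2$," for any natural number $n \in \mathbb{N}$ let $K_n := \prod_{i=1}^d [-n, n]$ and let $K_{-1} := K_0 := \emptyset$. For each $n \in \mathbb{N}$ the set $S \cap (K_n \setminus K_{n-1})$ is Lebesgue measurable with finite Lebesgue measure, so by Theorem 18.3 there are a closed set $C_n$ and an open set $O_n$ with $C_n \subseteq S \cap (K_n \setminus K_{n-1}) \subseteq O_n \subseteq K_{n+1}^\circ \setminus K_{n-2}$ and so that $\lambda(O_n \setminus C_n) < \frac{\varepsilon}{2 \cdot 2^n}$. Then the set $O := \bigcup_{n=1}^\infty O_n$ is open and by Exercise 16-42, the set $C := \bigcup_{n=1}^\infty C_n$ is closed. Moreover, $C \subseteq S \subseteq O$. Finally, with $O_0 := \emptyset$ we obtain the following, where the first inequality holds because only $O_{n-1}$, $O_n$ and $O_{n+1}$ can intersect $K_n \setminus K_{n-1}$, and the second holds because the sets $K_n \setminus K_{n-1}$ are pairwise disjoint.
\begin{align*}
\lambda(O \setminus C) &= \sum_{n=1}^\infty \lambda\big( (O \setminus C) \cap (K_n \setminus K_{n-1}) \big) \\
&\le \sum_{n=1}^\infty \sum_{m=n-1}^{n+1} \lambda\big( (O_m \setminus C) \cap (K_n \setminus K_{n-1}) \big) \\
&\le \sum_{m=1}^\infty \lambda(O_m \setminus C_m) < \sum_{m=1}^\infty \frac{\varepsilon}{2 \cdot 2^m} = \frac{\varepsilon}{2} < \varepsilon.
\end{align*}

For "$2 \Rightarrow 3$," for each natural number $k \in \mathbb{N}$ let $\tilde{O}_k$ be open and let $\tilde{C}_k$ be closed with $\tilde{C}_k \subseteq S \subseteq \tilde{O}_k$ and $\lambda(\tilde{O}_k \setminus \tilde{C}_k) < \frac{1}{k}$.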
Then for each $n \in \mathbb{N}$, the set $O_n := \bigcap_{k=1}^n \tilde{O}_k$ is open, the set $C_n := \bigcup_{k=1}^n \tilde{C}_k$ is closed, $O_n \supseteq O_{n+1}$, $C_n \subseteq C_{n+1}$ and $C_n \subseteq S \subseteq O_n$. Therefore, $\bigcup_{n=1}^\infty C_n \subseteq S \subseteq \bigcap_{n=1}^\infty O_n$. Moreover, for all $k \in \mathbb{N}$ we infer
\[ \lambda\left( \bigcap_{n=1}^\infty O_n \setminus \bigcup_{n=1}^\infty C_n \right) \le \lambda(\tilde{O}_k \setminus \tilde{C}_k) < \frac{1}{k}, \]
and because $k \in \mathbb{N}$ is arbitrary we conclude $\lambda\left( \bigcap_{n=1}^\infty O_n \setminus \bigcup_{n=1}^\infty C_n \right) = 0$.

The part "$3 \Rightarrow 1$" follows from the fact that open sets, closed sets and null sets are Lebesgue measurable. ∎

Lebesgue measure is also often considered on Lebesgue measurable subsets of $\mathbb{R}^d$ (see Example 14.10). Let $A \subseteq \mathbb{R}^d$ be Lebesgue measurable. Because the intersection of Lebesgue measurable sets is again Lebesgue measurable, Theorem 18.4 also holds for Lebesgue measurable subsets of $A$. If we want all sets involved to be subsets of $A$, we need to replace the demand that the $O_n$ are open with the demand that the $O_n$ are open in $A$ and we need to replace the demand that the $C_n$ are closed with the demand that the $C_n$ are closed in $A$. If $A$ itself is open, this is not necessary.

Theorem 18.4 also shows that the $\sigma$-algebra generated by the open subsets of $\mathbb{R}^d$ is interesting in itself. It is investigated further in Exercise 18-7.

Exercises

18-1. Explain why the hypothesis $\lambda(S) < \infty$ in Theorem 18.3 cannot be dropped.

18-2. Prove that every open subset of $\mathbb{R}^d$ is a countable union of compact sets.

18-3. Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable.
(a) Prove that $\lambda(S) = \inf\{\lambda(O) : S \subseteq O, O \text{ is open}\}$.
(b) Prove that $\lambda(S) = \sup\{\lambda(C) : C \subseteq S, C \text{ is compact}\}$.
(c) Prove that $\lambda(S) = \sup\{\lambda(C) : C \subseteq S, C \text{ is closed}\}$.

18-4. Prove that $S \subseteq \mathbb{R}^d$ with $\lambda(S) < \infty$ is Lebesgue measurable iff for every $\varepsilon > 0$ there is a compact set $C \subseteq S$ with $\lambda(C) > \lambda(S) - \varepsilon$.

18-5. Lebesgue measurable functions. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$.
(a) Prove that $f$ is Lebesgue measurable iff for all open sets $O \subseteq \mathbb{R}$ the set $f^{-1}[O]$ is Lebesgue measurable.
(b) Prove that $f$ is Lebesgue measurable iff for all closed sets $C \subseteq \mathbb{R}$ the set $f^{-1}[C]$ is Lebesgue measurable.
(c) Prove that $f$ is Lebesgue measurable iff for all compact sets $K \subseteq \mathbb{R}$ the set $f^{-1}[K]$ is Lebesgue measurable.

18-6. The Derivative Form of the Fundamental Theorem of Calculus for the Lebesgue integral states the following. If $h : [a, b] \to \mathbb{R}$ is Lebesgue integrable, then the function $H(x) := \int_a^x h(t) \, d\lambda(t)$ is differentiable a.e. with derivative $\frac{d}{dx} \int_a^x h(t) \, d\lambda(t) = h(x)$ for almost all $x \in [a, b]$. In this exercise, we will prove the result with the steps given below. (Recall that by Exercise 10-7, $H$ is differentiable a.e. if $h$ is nonnegative.)
(a) Let $a_1 < b_1 < a_2 < b_2 < \cdots < a_n < b_n$ and let $A := \bigcup_{j=1}^n (a_j, b_j)$. Prove the result for the indicator function $\mathbf{1}_A$.
(b) Use Fubini's Theorem on limits of nondecreasing functions (see Exercise 11-20) and Exercise 16-103d to prove the result for indicator functions of open subsets of $[a, b]$.
(c) Prove the result for indicator functions of closed subsets of $[a, b]$.
(d) Use Fubini's Theorem on limits of nondecreasing functions (see Exercise 11-20) and part 3 of Theorem 18.4 to prove the result for indicator functions of Lebesgue measurable subsets of $[a, b]$.
(e) Prove the result for simple Lebesgue measurable functions on $[a, b]$.
(f) Use Fubini's Theorem on limits of nondecreasing functions (see Exercise 11-20) to prove the result for nonnegative Lebesgue integrable functions on $[a, b]$.
(g) Prove the result for all Lebesgue integrable functions on $[a, b]$.

18-7. Borel Sets. Let $X$ be a metric space. The $\sigma$-algebra generated by the open sets in $X$ is called the $\sigma$-algebra of Borel sets of $X$, denoted $\mathcal{B}(X)$. A function $f : X \to \mathbb{R}$ is called Borel measurable iff it is measurable as a function on the measurable space $(X, \mathcal{B}(X))$.
(a) Prove that if $X$ is a Lebesgue measurable subset of $\mathbb{R}^d$ considered as a metric subspace of $\mathbb{R}^d$, then every Borel set of $X$ is also Lebesgue measurable.
(b) Let $X \subseteq \mathbb{R}^d$ be Lebesgue measurable.
Prove that for every Lebesgue measurable set $S \subseteq X$ there are Borel sets $F, G \in \mathcal{B}(X)$ so that $F \subseteq S \subseteq G$ and $\lambda(G \setminus F) = 0$.
(c) Let $X \subseteq \mathbb{R}^d$ be Lebesgue measurable and let $f : X \to \mathbb{R}$ be Lebesgue measurable. Prove that there is a Borel measurable function $f_B : X \to \mathbb{R}$ so that $\lambda\big( \{x \in X : f(x) \ne f_B(x)\} \big) = 0$.
Note. This result is the reason why the focus of measure theory can be restricted to Borel measurable functions when necessary.
(d) Prove that if $m + n = d$ and $X \in \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$, then every Borel measurable function $f : X \to \mathbb{R}$ is also $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$-measurable.
(e) A measure $\mu : \mathcal{B}(\mathbb{R}^d) \to [0, \infty]$ is called a Borel measure. Let the norm on $\mathbb{R}^d$ be the uniform norm. Prove that two Borel measures $\mu$ and $\nu$ are equal iff for all $x \in \mathbb{Q}^d$ and all $r \in (0, 1) \cap \mathbb{Q}$ we have that $\mu(B_r(x))$ is finite and $\mu(B_r(x)) = \nu(B_r(x))$.
(f) Let the norm on $\mathbb{R}^d$ be the uniform norm. Prove that there are two distinct measures $\mu, \nu : \mathcal{B}(\mathbb{R}^d) \to [0, \infty]$ so that $\mu(B_r(x)) = \nu(B_r(x))$ holds for all $x \in \mathbb{Q}^d$ and all $r \in (0, 1) \cap \mathbb{Q}$.

18-8. The $\sigma$-algebra of subsets of $[a, b]$ generated by the open sets in $[a, b]$ (as a metric space in its own right) is also called the $\sigma$-algebra of Borel sets (of $[a, b]$). Let $g : [a, b] \to \mathbb{R}$ be nondecreasing and let $\lambda_g$ be the Lebesgue-Stieltjes measure defined in Exercise 14-20.
(a) Prove that all Borel sets of $[a, b]$ are $\lambda_g$-measurable.
(b) Prove that $\lambda_g$ is a finite measure on the Borel sets of $[a, b]$.
(c) Let $f : [a, b] \to \mathbb{R}$ be a function. Then $f$ is called Lebesgue-Stieltjes integrable with respect to $g$ iff $f$ is $\lambda_g$-integrable. Prove that if $f$ is continuous, then $f$ is Lebesgue-Stieltjes integrable with respect to $g$ and $\int_a^b f \, d\lambda_g = \int_a^b f \, dg$, where the right side is the Riemann-Stieltjes integral of $f$, which exists by Exercise 5-19b.
18.2 $C^\infty$ and Approximation of Integrable Functions

Theorem 16.86 makes a first connection between topological and measure theoretical notions by showing that, loosely speaking, the continuous functions are dense in $L^p[a, b]$ for $1 \le p < \infty$. In this section, we show that even more is true. The set of infinitely often differentiable functions is dense in $L^p$, too. Because differentiability is usually defined on open sets, we work with open domains $\Omega \subseteq \mathbb{R}^d$ throughout this section. Recall that by Notation 15.66, with terminology as in Example 14.10, $\mathcal{L}^p(\Omega)$ and $L^p(\Omega)$ denote the corresponding spaces for Lebesgue measure restricted to the Lebesgue measurable subsets of $\Omega$.

Definition 18.5 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $k \ge 0$. We define $C^k(\Omega)$ to be the set of all $k$ times continuously differentiable functions $f : \Omega \to \mathbb{R}$. Moreover, we define $C^\infty(\Omega) := \bigcap_{k=1}^\infty C^k(\Omega)$.

Note that infinite differentiability is compatible with the usual algebraic operations.

Theorem 18.6 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$ and let $f, g \in C^\infty(\Omega)$. Then the functions $f + g$, $f - g$, $fg$ are in $C^\infty(\Omega)$. Moreover, $\frac{f}{g} \in C^\infty(\Omega \setminus \{x : g(x) = 0\})$.

Proof. We first show that for the first three operations, the result is a consequence of Proposition 17.62. If $f, g \in C^\infty(\Omega)$, then $x \mapsto (f(x), g(x))$ is infinitely differentiable. Moreover, any bilinear function on $\mathbb{R}^2$ is continuous, and hence (use Theorem 17.52 and note that the derivative is the sum of two linear functions) infinitely differentiable. Therefore, by Proposition 17.62 for all $k \in \mathbb{N}$ we infer $f + g \in C^k(\Omega)$, because it is the composition of the $k$ times differentiable functions $x \mapsto (f(x), g(x))$ and $(y, z) \mapsto y + z$. Because $k$ was arbitrary, $f + g \in C^\infty(\Omega)$. Differences and products are treated similarly.

For the quotient, note that the function $(y, z) \mapsto \left( y, \frac{1}{z} \right)$ is infinitely differentiable on $\mathbb{R} \times (\mathbb{R} \setminus \{0\})$ and that the quotient is the composition of the functions $x \mapsto (f(x), g(x))$, $(y, z) \mapsto \left( y, \frac{1}{z} \right)$ and $(y, z) \mapsto yz$. ∎

We will prove more than that infinitely differentiable functions are dense in $L^p(\Omega)$.
We will show that there is a dense subset consisting of infinitely differentiable functions that are zero outside a compact set.

Definition 18.7 Let $\Omega \subseteq \mathbb{R}^d$ be open. A function $f \in C^\infty(\Omega)$ is called a test function iff $\operatorname{supp}(f)$ is compact. The set of all test functions on $\Omega$ is denoted $C_0^\infty(\Omega)$.

The terminology "test function" comes from the role these functions play in the investigation of partial differential equations. One concrete example is the definition of weak derivatives (see Definition 23.11). With $C_0^\infty(\Omega)$ defined, our next step is to approximate the indicator function of an interval with a function in $C_0^\infty(\mathbb{R})$. The construction is visualized in Figure 44.

Figure 44: Constructing a $C^\infty$ "indicator function" for intervals.

Lemma 18.8 The $C^\infty$ connector. The function
\[ c(x) := \begin{cases} 0; & \text{for } x \le 0, \\ e^{-\frac{1}{x}}; & \text{for } x > 0, \end{cases} \]
is infinitely differentiable on $\mathbb{R}$.

Proof. It is clear that $c \in C^\infty(\mathbb{R} \setminus \{0\})$. It remains to be proved that $c$ is infinitely differentiable at $x = 0$. For all $x < 0$, we have $c^{(n)}(x) = 0$. Therefore, if they exist, all derivatives of $c$ at $x = 0$ must be zero. For all $n \in \mathbb{N} \cup \{0\}$, we have $\lim_{z \to 0^-} \frac{c^{(n)}(z) - 0}{z - 0} = 0$. If we can show that the right-sided limit also is 0, then we have proved that $c \in C^\infty(\mathbb{R})$ with
\[ c^{(n)}(x) := \begin{cases} 0; & \text{for } x \in (-\infty, 0], \\ \frac{d^n}{dx^n} e^{-\frac{1}{x}}; & \text{for } x \in (0, \infty), \end{cases} \]
for all $n \in \mathbb{N}$. (Formally, this would require an induction on $n$, but it is not hard to mentally fill in.)

An easy induction shows that for every $n \in \mathbb{N} \cup \{0\}$ there is a polynomial $p_n$ so that $\frac{d^n}{dx^n} e^{-\frac{1}{x}} = p_n\left( \frac{1}{x} \right) e^{-\frac{1}{x}}$ for all $x > 0$ (see Exercise 18-9). For all $j \in \mathbb{N}$ by Exercise 12-27, we obtain
\[ \lim_{x \to 0^+} \frac{1}{x^j} e^{-\frac{1}{x}} = \lim_{x \to 0^+} \left( \frac{1}{x} \right)^j e^{-\frac{1}{x}} = \lim_{u \to \infty} u^j e^{-u} = 0, \]
which means that for every polynomial $p(x) = \sum_{j=0}^k a_j x^j$ we can conclude $\lim_{x \to 0^+} p\left( \frac{1}{x} \right) e^{-\frac{1}{x}} = 0$. But then for all $n \in \mathbb{N} \cup \{0\}$ we obtain
\[ \lim_{z \to 0^+} \frac{\frac{d^n}{dz^n} e^{-\frac{1}{z}} - 0}{z - 0} = \lim_{z \to 0^+} \frac{p_n\left( \frac{1}{z} \right) e^{-\frac{1}{z}} - 0}{z - 0} = \lim_{z \to 0^+} \frac{1}{z} \, p_n\left( \frac{1}{z} \right) e^{-\frac{1}{z}} = 0. \]
∎
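The key limit in the proof of Lemma 18.8, namely that $\left( \frac{1}{x} \right)^j e^{-\frac{1}{x}}$ tends to 0 as $x \to 0^+$, can be observed numerically. The following is a minimal sketch and not a proof: the function names `c` and `scaled` and the sample points are hypothetical choices, and floating point evaluation only illustrates the decay.

```python
import math


def c(x):
    """C-infinity connector of Lemma 18.8: 0 for x <= 0, exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def scaled(x, j):
    """(1/x)**j * exp(-1/x) for x > 0, the quantity whose limit at 0+ the proof uses."""
    return (1.0 / x) ** j * math.exp(-1.0 / x)


# Tabulate the connector and the scaled term for j = 5 at points approaching 0.
for x in (0.5, 0.1, 0.05, 0.01):
    print(x, c(x), scaled(x, 5))
```

For a fixed exponent $j$ the scaled term can first grow as $x$ decreases, but once $\frac{1}{x}$ exceeds $j$ the exponential factor dominates and the values fall rapidly toward zero.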
Note that the function in Lemma 18.8 also shows that not every $C^\infty$ function can be represented as a power series. This is because $c^{(n)}(0) = 0$ for all $n \in \mathbb{N}$, but $c \ne 0$.

Lemma 18.9 The $C^\infty$ jump function. Let $\delta > 0$. Then $j_\delta(x) := \frac{c(x)}{c(\delta - x) + c(x)}$ is infinitely differentiable on $\mathbb{R}$. Moreover, it takes values in $[0, 1]$, it is identical to zero on $(-\infty, 0]$, it is identical to 1 on $[\delta, \infty)$ and $j_\delta[(0, \delta)] \subseteq (0, 1)$.

Proof. Because $c \ge 0$ and for each $x \in \mathbb{R}$ at least one of $c(x)$ and $c(\delta - x)$ is not zero, Proposition 17.62 and Theorem 18.6 imply that $j_\delta \in C^\infty(\mathbb{R})$. Because $c \ge 0$ for all $x \in \mathbb{R}$ we obtain $c(x) \le c(\delta - x) + c(x)$, which implies $0 \le \frac{c(x)}{c(\delta - x) + c(x)} \le 1$ for all $x \in \mathbb{R}$ and the inequalities are strict for $x \in (0, \delta)$. Finally, because $c(x) = 0$ for $x \le 0$ we conclude $j_\delta(x) = 0$ for $x \le 0$ and because $c(\delta - x) = 0$ for $x \ge \delta$ we conclude $j_\delta(x) = 1$ for $x \ge \delta$. ∎

Lemma 18.10 The $C^\infty$ interval indicator. Let $a < b$ and $\delta > 0$ be real numbers. The function $\mathbf{1}_{(a,b),\delta}(x) := j_\delta(x - a) \, j_\delta(b - x)$ is infinitely differentiable on $\mathbb{R}$. Moreover, it takes values in $[0, 1]$, it is identical to zero on $\mathbb{R} \setminus (a, b)$ and it is identical to 1 on $[a + \delta, b - \delta]$.

Proof. Exercise 18-10. ∎

$C^\infty$ interval indicator functions are the key ingredient to constructing similar "indicator functions" for closed sets. These functions are then used to show that the compactly supported infinitely differentiable functions are dense in $L^p(\Omega)$.

Lemma 18.11 Let $C \subseteq \mathbb{R}^d$ be closed and let $U \subseteq \mathbb{R}^d$ be open so that $C \subseteq U$. Then there is a function $\mathbf{1}_{C,U} \in C^\infty(\mathbb{R}^d)$ that takes values in $[0, 1]$ and satisfies $\mathbf{1}_{C,U}|_C = 1$ and $\operatorname{supp}(\mathbf{1}_{C,U}) \subseteq U$.

Proof. First consider the case that $C$ is compact. Throughout, we will use the uniform norm $\|\cdot\|_\infty$. So, "balls" around points will actually be cubes. For each $x \in C$, let $\varepsilon_x > 0$ be so that $B_{\varepsilon_x}(x) \subseteq U$. Then $\{B_{\varepsilon_x/2}(x)\}_{x \in C}$ is an open cover of $C$. Let $x_1, \ldots$
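, $x_n$ be so that $C \subseteq \bigcup_{j=1}^n B_{\varepsilon_{x_j}/2}(x_j)$. For each $x_j$, let $x_j^{(i)}$ denote the $i$th component of the representation of $x_j$ with respect to the standard base. For $j = 1, \ldots, n$ let $h_j : \mathbb{R}^d \to [0, 1]$ be a product of $d$ $C^\infty$ interval indicators as in Lemma 18.10, the $i$th factor being applied to the $i$th component of the argument and centered at $x_j^{(i)}$, with the intervals and the parameter $\delta$ chosen so that $h_j$ is identical to 1 on $B_{\varepsilon_{x_j}/2}(x_j)$ and $\operatorname{supp}(h_j) \subseteq B_{\varepsilon_{x_j}}(x_j) \subseteq U$. Then $h_j \in C^\infty(\mathbb{R}^d)$, because it is a product of infinitely differentiable functions. Hence, for $h := \sum_{j=1}^n h_j$ we infer $\operatorname{supp}(h) \subseteq U$, $h \in C^\infty(\mathbb{R}^d)$ and for all $x \in C$ there is a $j \in \{1, \ldots, n\}$ so that $x \in B_{\varepsilon_{x_j}/2}(x_j)$, which means $h(x) \ge h_j(x) = 1$. With $j_1$ being a $C^\infty$ jump function as in Lemma 18.9, define $\mathbf{1}_{C,U} := j_1 \circ h$. By Proposition 17.62, this function is in $C^\infty(\mathbb{R}^d)$. By the above, we have $\mathbf{1}_{C,U}|_C = 1$ and $\operatorname{supp}(\mathbf{1}_{C,U}) \subseteq U$, so the result is established for compact $C$.

Now consider the case that $C$ is closed, but not necessarily compact. For each $n \in \mathbb{N}$, let $C_n := C \cap \big( \overline{B_n(0)} \setminus B_{n-1}(0) \big)$. Then each $C_n$ is compact. For each $x \in C_n$, let $\varepsilon_x \in (0, 1)$ be so that $B_{\varepsilon_x}(x) \subseteq U$ and let $U_n := \bigcup_{x \in C_n} B_{\varepsilon_x}(x)$. Then for each $n \in \mathbb{N}$, the set $U_n$ is open and contains the compact set $C_n$. Let the functions $\mathbf{1}_{C_n,U_n}$ be as constructed above and let $g := \sum_{n=1}^\infty \mathbf{1}_{C_n,U_n}$. For each $x \in \mathbb{R}^d$, there is a neighborhood $V$ of $x$ so that at most four terms in this series are not equal to zero on $V$. Thus $g$ is in $C^\infty(\mathbb{R}^d)$. Moreover, $g(x) \ge 1$ for all $x \in C$, and $\operatorname{supp}(g) \subseteq U$. Now $\mathbf{1}_{C,U} := j_1 \circ g$ is as desired. ∎

Note that Lemma 18.11 shows in particular that $C_0^\infty(\Omega)$ contains more than the zero function, which is not necessarily trivial. The next result shows that even more is true.

Theorem 18.12 Let $1 \le p < \infty$ and let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$. Then $C_0^\infty(\Omega)$, or more formally $T := \{[f] : f \in C_0^\infty(\Omega)\}$, is dense in $L^p(\Omega)$.

Proof. The proof runs along the same lines as the proof of Theorem 16.86. The main change is that we need to approximate the indicator functions of open boxes contained in $\Omega$ (rather than those of intervals) with functions in $C_0^\infty(\Omega)$ (rather than with continuous functions). Let $B = \prod_{i=1}^d (l_i, r_i)$ be a box that is contained in $\Omega$ and let $\varepsilon > 0$.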
By Theorem 18.3, there is a compact set $C \subseteq B$ so that $\lambda(B \setminus C) < \varepsilon^p$. Let $\mathbf{1}_{C,B}$ be as in Lemma 18.11. Then $\operatorname{supp}(\mathbf{1}_{C,B}) \subseteq B \subseteq \Omega$, $\mathbf{1}_{C,B} \in C^\infty(\Omega)$ and
\[ \left\| \mathbf{1}_B - \mathbf{1}_{C,B} \right\|_p = \left( \int_\Omega \left| \mathbf{1}_B - \mathbf{1}_{C,B} \right|^p d\lambda \right)^{\frac{1}{p}} \le \big( \lambda(B \setminus C) \big)^{\frac{1}{p}} < \varepsilon. \]
We conclude that in any $L^p$-neighborhood of an indicator function of a box in $\Omega$ we can find a function in $C_0^\infty(\Omega)$.

Now that we can approximate indicator functions of boxes with $C_0^\infty(\Omega)$-functions, the proof proceeds just like the proof of Theorem 16.86. To approximate indicator functions of sets, cover the set $A$ with open boxes $B_j \subseteq \Omega$ such that the sum of the volumes of the boxes $B_j$ is close to that of the set. Truncate the sum to obtain a finite number of boxes $B_j$ so that the sum of the volumes of the finitely many boxes is close to the total sum of volumes. Approximate the indicator functions of these finitely many boxes with $C_0^\infty(\Omega)$ functions as indicated above to prove that indicator functions of sets can be approximated with $C_0^\infty(\Omega)$ functions. The details are to be given in Exercise 18-11a. Next, just like the proof of Theorem 16.86, approximate simple functions with functions in $C_0^\infty(\Omega)$ (Exercise 18-11b). Finally, just like the proof of Theorem 16.86, apply Theorem 16.85 (Exercise 18-11c). ∎

Figure 45: Illustration how the indicator function of an interval can be approximated in $L^p$ with $C^\infty$ functions. The area bounded by the difference shrinks to zero, which allows the approximation (in the $L^p$ sense) of the discontinuities with infinitely smooth functions.

Geometrically, Theorem 18.12 says that even highly discontinuous functions can be approximated arbitrarily well with infinitely smooth functions in $L^p$. Figure 45 shows such an approximation for the indicator function of an interval. This visualization illustrates the crucial part of the proof of Theorem 18.12 and it also shows that the result is not counterintuitive. The infinitely smooth approximations do not in any way "fix" the jumps at the discontinuities.

Exercises

18-9.
Prove by induction that for every $n \in \mathbb{N}$ there is a polynomial $p_n$ so that for all $x > 0$ we have $\frac{d^n}{dx^n} e^{-\frac{1}{x}} = p_n\left( \frac{1}{x} \right) e^{-\frac{1}{x}}$.

18-10. Prove Lemma 18.10.

18-11. Finish the proof of Theorem 18.12.
(a) Prove that for every measurable subset $A$ of $\Omega$ and every $\varepsilon > 0$ there is a test function $g_{A,\varepsilon} \in C_0^\infty(\Omega)$ so that $\| \mathbf{1}_A - g_{A,\varepsilon} \|_p < \varepsilon$.
(b) Prove that for every simple function $s \in L^p(\Omega)$ and every $\varepsilon > 0$ there is a test function $g_{s,\varepsilon} \in C_0^\infty(\Omega)$ so that $\| s - g_{s,\varepsilon} \|_p < \varepsilon$.
(c) Apply Theorem 16.85 to complete the proof.

18-12. Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Prove that for every function $f \in L^1(\Omega)$ there is a sequence of functions in $C_0^\infty(\Omega)$ that converges a.e. to $f$.

18-13. Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Prove that if $f \in L^1(\Omega)$ is so that $\int_\Omega f \varphi \, d\lambda = 0$ for all $\varphi \in C_0^\infty(\Omega)$, then $f = 0$ a.e.

18-14. A half-open box in $\mathbb{R}^d$ is a set $H$ for which there are real numbers $a_i < b_i$, $i = 1, \ldots, d$ so that $H = \prod_{i=1}^d [a_i, b_i)$. Let $O \subseteq \mathbb{R}^d$ be open and let $K \subseteq O$ be compact. Prove that for every $\varepsilon > 0$ there is a finite family $\{H_j\}_{j=1}^n$ of pairwise disjoint half-open cubes of side length less than $\varepsilon$ so that $K \subseteq \bigcup_{j=1}^n H_j \subseteq O$.
Hint. Work with the $\|\cdot\|_\infty$ norm, let $\delta := \frac{1}{2} \min\left\{ \varepsilon, \operatorname{dist}\left( K, \mathbb{R}^d \setminus O \right) \right\}$ (need to prove $\delta > 0$) and use cubes of the form $\prod_{i=1}^d [\, l_i \delta, (l_i + 1)\delta \,)$.

18-15. Prove that the function $g : \mathbb{R}^d \to \mathbb{R}$ defined by
\[ g(x) := \begin{cases} e^{-\frac{1}{1 - \|x\|_2^2}}; & \text{for } \|x\|_2 < 1, \\ 0; & \text{for } \|x\|_2 \ge 1, \end{cases} \]
is a $C_0^\infty$ function.
Hint. Chain Rule and Lemma 18.8.

18-16. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f \in L^p(\Omega)$. With $f$ extended to all of $\mathbb{R}^d$ by setting it equal to zero outside of $\Omega$, prove that $\lim_{z \to 0} \int_{\mathbb{R}^d} \left| f(x) - f(x + z) \right|^p d\lambda(x) = 0$.
Hints. First prove the result for $C_0^\infty$ functions. Then prove the result in general by approximating $f$ with $C_0^\infty$ functions.

18-17. Let $\Omega \subseteq \mathbb{R}^d$ be open, let $f \in L^p(\Omega)$, let $\varphi \in C_0^\infty(\Omega)$ and for all $x \in \mathbb{R}^d$ define the function $f_\varphi$ by $f_\varphi(x) := \int_\Omega f(z) \varphi(x - z) \, d\lambda(z)$. (The integrals exist because $L^p(\operatorname{supp}(\varphi)) \subseteq L^1(\operatorname{supp}(\varphi))$.)
(a) Prove that $f_\varphi$ is continuous on $\mathbb{R}^d$.
Hint. Dominated Convergence Theorem.
(b) Prove that at every $x \in \mathbb{R}^d$ all first partial derivatives $\frac{\partial f_\varphi}{\partial x_i}$ of $f_\varphi$ exist and are equal to $\int_\Omega f(z) \frac{\partial \varphi}{\partial x_i}(x - z) \, d\lambda(z)$.
Hint. Dominated Convergence Theorem, Mean Value Theorem for functions of one variable and boundedness of the first partial derivatives of $\varphi$.
(c) Prove that all first partial derivatives $\frac{\partial f_\varphi}{\partial x_i}$ of $f_\varphi$ are continuous at every $x \in \mathbb{R}^d$.
Hint. Apply parts 18-17a and 18-17b.
(d) Prove that $f_\varphi \in C^\infty(\mathbb{R}^d)$.
Hint. Apply part 18-17c and use induction.
Note. The operation that produces $f_\varphi$ from $f$ and $\varphi$ is also called the convolution of $f$ and $\varphi$.

18-18. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $p \in [1, \infty)$.
(a) Prove that $L^p(\Omega)$ has a countable dense subset consisting of simple functions.
Hint. Boxes with dyadic rational bounds.
(b) Prove that $L^p(\Omega)$ has a countable dense subset consisting of $C_0^\infty$ functions.

18-19. Prove that for every Riemann integrable function $f : [a, b] \to \mathbb{R}$ on $[a, b]$ there is a sequence $\{g_n\}_{n=1}^\infty$ of infinitely differentiable functions on $[a, b]$ such that $g_n(a) = g_n(b) = 0$ for all $n \in \mathbb{N}$, $\lim_{n \to \infty} \int_a^b |f - g_n| \, dx = 0$ and for all $n \in \mathbb{N}$ we have $|g_n| \le |f|$.
Hint. First find $a = x_0 < x_1 < \cdots < x_n = b$ and a step function $s(x) = \sum_{k=1}^n a_k \mathbf{1}_{[x_{k-1}, x_k)}$ so that $|s| \le |f|$ and $\|s - f\|_1$ is small. Then use $C^\infty$ interval indicator functions.

18.3 Tensor Algebra and Determinants

Figure 46: Geometric arguments that give the area of a parallelogram and the volume of a parallelepiped. (Formally, the cross product on the left is obtained by making $a$ and $b$ vectors in $\mathbb{R}^3$ whose third component is zero.)

To prove the Multidimensional Substitution Formula we need a volume function for $n$-dimensional parallelepipeds spanned by vectors $a_1, \ldots, a_n$. Moreover, for integration on manifolds we need functions that allow us to compute the lower dimensional volume of lower dimensional parallelepipeds in higher dimensional spaces.
For example, we will be interested in the area (two dimensional volume) of a parallelogram (two dimensional parallelepiped) in three dimensional space.

We start by analyzing some formulas from geometry (also see Figure 46). It can be proved that the area of the parallelogram in $\mathbb{R}^2$ spanned by the vectors $a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ is $A(a, b) = |a_1 b_2 - b_1 a_2|$. Similarly, the volume of the three dimensional parallelepiped spanned by the vectors $a, b, c$ is $V(a, b, c) = |\langle a, b \times c \rangle|$. Note that if we drop the absolute values, the (oriented) area is bilinear and the (oriented) volume is 3-linear (see Definition 17.45) in the input vectors. Throughout, we will consider oriented volumes, that is, volumes that can be positive or negative. This is not much of a loss, because if we really want a nonnegative number, we simply take absolute values. Because the two dimensional volume formula is incorporated in the three dimensional volume formula, the above suggests that we should try to construct a general tensor formalism that produces the volume function for parallelepipeds in arbitrary dimensions.

We start by representing $k$-tensors on finite dimensional vector spaces. The representation in Theorem 18.16 below is already implicit in Corollary 17.61.

Definition 18.13 Let $V$ be a vector space. The set of real valued $k$-tensors on $V$ is denoted by $\mathcal{T}^k(V)$. For a finite dimensional vector space $V$, the dual space is defined to be $V^* := \mathcal{T}^1(V) = \mathcal{L}(V, \mathbb{R})$.

With pointwise addition and scalar multiplication $\mathcal{T}^k(V)$ is a vector space (see Exercise 18-20). The (higher dimensional) volume of a box is the product of the lower dimensional volumes of the projections: for a rectangle, area is the product of the lengths of the sides (length is one dimensional volume), and for a three dimensional box, the volume is the area of the base times the height.
It is thus not surprising that tensors can be multiplied in the same simplistic fashion.

Definition 18.14 Let $V$ be a vector space and $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$. We define
\[ S \otimes T[v_1, \ldots, v_k, v_{k+1}, \ldots, v_{k+l}] := S[v_1, \ldots, v_k] \, T[v_{k+1}, \ldots, v_{k+l}] \]
and call it the tensor product of $S$ and $T$. Clearly, $S \otimes T \in \mathcal{T}^{k+l}(V)$.

The tensor product is not commutative, but it is associative (see Exercise 18-21). This means that while we need to be careful with the order of the factors, tensor products with more than two factors can be written without parentheses. Tensor products allow us to represent tensors in terms of a very natural base.

Theorem 18.15 Let $V$ be a finite dimensional vector space with base $\{v_1, \ldots, v_d\}$. Then the maps $\varphi_i : V \to \mathbb{R}$ defined by $\varphi_i\left[ \sum_{j=1}^d a_j v_j \right] := a_i$ form a base of the dual space of $V$, called the dual base of $\{v_1, \ldots, v_d\}$.

Proof. To see that $\{\varphi_1, \ldots, \varphi_d\}$ is linearly independent, let $a_1, \ldots, a_d \in \mathbb{R}$ be so that $\sum_{i=1}^d a_i \varphi_i = 0$. Then for all $j \in \{1, \ldots, d\}$ we infer $0 = \sum_{i=1}^d a_i \varphi_i[v_j] = a_j$, and hence $\{\varphi_1, \ldots, \varphi_d\}$ is linearly independent.

To see that $\{\varphi_1, \ldots, \varphi_d\}$ is a base of $V^*$, let $\psi \in V^*$. Then for all $v = \sum_{i=1}^d a_i v_i$ in $V$ we have $\psi[v] = \sum_{i=1}^d a_i \psi[v_i] = \sum_{i=1}^d \psi[v_i] \varphi_i[v]$, which means that $\psi$ is a linear combination of the $\varphi_i$. Hence, $\{\varphi_1, \ldots, \varphi_d\}$ is a base of $V^*$. ∎

Theorem 18.16 Let $V$ be a finite dimensional vector space with base $\{v_1, \ldots, v_d\}$ and let $\{\varphi_1, \ldots, \varphi_d\}$ be the associated dual base as in Theorem 18.15. Then the set $\mathcal{B} := \{\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k} : i_1, \ldots, i_k \in \{1, \ldots, d\}\}$ is a base for $\mathcal{T}^k(V)$.

Proof. The proof is similar to the proof of Theorem 18.15 (Exercise 18-22). ∎

Both the area of a parallelogram as well as the volume of a three dimensional parallelepiped have multiple summands.
Hence, the volume function for parallelepipeds cannot just be a simple tensor product. A key property of both geometric formulas is that if we switch any two vectors, the sign of the result changes. This observation leads us to alternating tensors.

Definition 18.17 Let $V$ be a vector space. A $k$-tensor $\omega \in \mathcal{T}^k(V)$ is called alternating iff for all $1 \le i < j \le k$ and all $v_1, \ldots, v_k \in V$ we have
\[ \omega[v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_{j-1}, v_j, v_{j+1}, \ldots, v_k] = -\omega[v_1, \ldots, v_{i-1}, v_j, v_{i+1}, \ldots, v_{j-1}, v_i, v_{j+1}, \ldots, v_k]. \]
The set of alternating $k$-tensors on $V$ is denoted $\Lambda^k(V)$.

Exercise 18-23 shows that the set of alternating $k$-tensors forms a vector space. Because they have only one argument, linear functions $\varphi : V \to \mathbb{R}$ are alternating 1-tensors. (The universal quantification in the definition of an "alternating 1-tensor" is over the empty set, so it is vacuously true.)

For any tensor, we can define a corresponding alternating tensor. The idea is to sum all terms that can be obtained by permuting the entries in such a way that a transposition of two vectors will switch a positive summand with a negative summand.

Definition 18.18 For $k \in \mathbb{N}$ we let $S_k$ be the set of all permutations on $\{1, \ldots, k\}$. A transposition is a permutation $\tau$ that fixes all but two elements of the set. The sign $\operatorname{sgn}(\sigma)$ of a permutation $\sigma \in S_k$ is 1 if $\sigma$ is a composition of an even number of transpositions and it is $-1$ if $\sigma$ is a composition of an odd number of transpositions. (Exercise 18-24 shows that the sign is well-defined.) If $T \in \mathcal{T}^k(V)$ we define
\[ \operatorname{Alt}(T)[v_1, \ldots, v_k] := \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma) \, T[v_{\sigma(1)}, \ldots, v_{\sigma(k)}]. \]

Theorem 18.19 shows that $\operatorname{Alt}(T)$ is an alternating tensor, thus explaining the notation. The proof also shows why we need the coefficient $\frac{1}{k!}$. Without it, $\operatorname{Alt}(\cdot)$ applied to an alternating tensor would multiply the alternating tensor with a factor that is not equal to 1.

Theorem 18.19 Let $V$ be a vector space and let $k \in \mathbb{N}$.
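For all $T \in \mathcal{T}^k(V)$ we have $\operatorname{Alt}(T) \in \Lambda^k(V)$ and for all $\omega \in \Lambda^k(V)$ we have $\operatorname{Alt}(\omega) = \omega$.

Proof. First note that a tensor $\omega \in \mathcal{T}^k(V)$ is alternating iff for all transpositions $\tau \in S_k$ and all $v_1, \ldots, v_k \in V$ we have $\omega[v_1, \ldots, v_k] = -\omega[v_{\tau(1)}, \ldots, v_{\tau(k)}]$. Moreover, note that for all transpositions $\tau \in S_k$ we have $\tau \circ \tau = \operatorname{id}_{\{1, \ldots, k\}}$. Now let $T \in \mathcal{T}^k(V)$ and let $v_1, \ldots, v_k \in V$. Then for all transpositions $\tau \in S_k$ we obtain the following, where the third equality uses that $\sigma \mapsto \tau \circ \sigma$ is a bijection of $S_k$.
\begin{align*}
\operatorname{Alt}(T)[v_{\tau(1)}, \ldots, v_{\tau(k)}] &= \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma) \, T[v_{\tau(\sigma(1))}, \ldots, v_{\tau(\sigma(k))}] \\
&= -\frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\tau \circ \sigma) \, T[v_{\tau \circ \sigma(1)}, \ldots, v_{\tau \circ \sigma(k)}] \\
&= -\frac{1}{k!} \sum_{\sigma' \in S_k} \operatorname{sgn}(\sigma') \, T[v_{\sigma'(1)}, \ldots, v_{\sigma'(k)}] \\
&= -\operatorname{Alt}(T)[v_1, \ldots, v_k].
\end{align*}
Therefore $\operatorname{Alt}(T)$ is alternating.

For the second part, note that if $\omega \in \Lambda^k(V)$ and $\sigma \in S_k$, then for all $v_1, \ldots, v_k \in V$ we have $\omega[v_1, \ldots, v_k] = \operatorname{sgn}(\sigma) \, \omega[v_{\sigma(1)}, \ldots, v_{\sigma(k)}]$ (Exercise 18-25). Now
\[ \operatorname{Alt}(\omega)[v_1, \ldots, v_k] = \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma) \, \omega[v_{\sigma(1)}, \ldots, v_{\sigma(k)}] = \frac{1}{k!} \sum_{\sigma \in S_k} \omega[v_1, \ldots, v_k] = \omega[v_1, \ldots, v_k]. \]
∎

It is also easy to see that $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is linear (see Exercise 18-26). With $\operatorname{Alt}(\cdot)$ we can now multiply alternating tensors in a way that produces a new alternating tensor.

Definition 18.20 Let $V$ be a vector space. For $\omega \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$ define the wedge product $\omega \wedge \eta$ to be $\omega \wedge \eta := \frac{(k + l)!}{k! \, l!} \operatorname{Alt}(\omega \otimes \eta)$.

Example 18.21 The wedge product is the key to obtaining volume functions. Let $\{e_1, \ldots, e_d\}$ be the standard base in $\mathbb{R}^d$ and let $\{\pi_1, \ldots, \pi_d\}$ be the corresponding dual base.

1. For all $a, b \in \mathbb{R}^d$, we have $\pi_1 \wedge \pi_2(a, b) = \pi_1(a)\pi_2(b) - \pi_1(b)\pi_2(a)$. That is, the wedge product of $\pi_1$ and $\pi_2$ maps any two vectors $a, b \in \mathbb{R}^d$ to the oriented area of the parallelogram spanned by the vectors made up of the first two components of $a$ and $b$. More generally, the wedge product of $\pi_i$ and $\pi_j$ maps $a, b \in \mathbb{R}^d$ to the area of the parallelogram spanned by the vectors made up of the $i$th and $j$th components of $a$ and $b$.

2. For all $a, b, c \in \mathbb{R}^d$, we have
\[ \big( \pi_1 \wedge (\pi_2 \wedge \pi_3) \big)(a, b, c) = a_1 (b_2 c_3 - b_3 c_2) + a_2 (b_3 c_1 - b_1 c_3) + a_3 (b_1 c_2 - b_2 c_1). \]
(The lengthy componentwise proof can be produced in Exercise 18-27.)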
That is, the wedge product of $\pi_1$, $\pi_2$, and $\pi_3$ maps three vectors $a, b, c \in \mathbb{R}^d$ to the oriented volume of the parallelepiped spanned by the vectors made up of the first three components of $a$, $b$ and $c$. Similarly, the wedge product of $\pi_i$, $\pi_j$ and $\pi_k$ maps any three vectors $a, b, c \in \mathbb{R}^d$ to the volume of the parallelepiped spanned by the vectors made up of the $i$th, $j$th and $k$th components of $a$, $b$ and $c$.

The above indicates that the wedge product of $k$ vectors of the dual base of the standard base $e_1, \ldots, e_d$ should map any $k$-tuple of vectors to the lower dimensional volume of the projection of the parallelepiped spanned by the vectors into the right lower dimensional subspace (see Exercise 18-28). Moreover, part 2 indicates that the wedge product of "projection volume functions" gives a higher dimensional volume function. Therefore the wedge product formalism should be the version of the familiar "base times height" formula for boxes that holds for parallelepipeds. □

With the wedge product accepted as the right formalism to compute volumes of parallelepipeds we need to investigate its properties. Bilinearity of the wedge product is established in Exercise 18-29. Moreover, the wedge product allows a base representation of alternating tensors on finite dimensional spaces similar to Theorem 18.16. For this base representation, we first need to establish associativity and the behavior when two factors are transposed. The first two parts of the following lemma are motivated by the proof of the wedge product's associativity.

Lemma 18.22 Let $V$ be a vector space.

1. If $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $\operatorname{Alt}(S) = 0$, then $\operatorname{Alt}(S \otimes T) = \operatorname{Alt}(T \otimes S) = 0$.

2. If $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $U \in \mathcal{T}^m(V)$, then $\operatorname{Alt}(\operatorname{Alt}(S \otimes T) \otimes U) = \operatorname{Alt}(S \otimes T \otimes U) = \operatorname{Alt}(S \otimes \operatorname{Alt}(T \otimes U))$.

3. If $\omega \in \Lambda^k(V)$, $\eta \in \Lambda^l(V)$ and $\theta \in \Lambda^m(V)$, then
\[ (\omega \wedge \eta) \wedge \theta = \omega \wedge (\eta \wedge \theta) = \frac{(k + l + m)!}{k! \, l! \, m!} \operatorname{Alt}(\omega \otimes \eta \otimes \theta). \]
In particular this means that the wedge product is associative.

4. The wedge product is "anti-commutative." If $\omega \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$, then $\omega \wedge \eta = (-1)^{kl} \, \eta \wedge \omega$.

5.
Zfk is odd and C#J E A k ( V ) ,then C#J A C#J E A k ( V )and rj E A’(V), then = 0. Proof. For part 1, we simply compute Alt(S @ T ) and represent the permutations cr E Sk+l as the composition of three permutations. The first sorts the elements of { 1, . . . ,k 1 ) into a set of size k and a set of size 1, so that in each set the elements + 18. Measure, Topology, and Differentiation 402 are listed in increasing order and the sets are listed as two consecutive blocks. The remaining two permutations then act on these sets. Alt(S €3 T)[vi, . . . , w + ~ I =Alt(S)[u,, , ..., u U k ] = O = 0. Part 2 is a straightforward application of the linearity of Alt(.), the multilinearity of tensor products and part 1. Alt(Alt(S €3 T ) €3 U ) - Alt(S €3 T €3 U ) = Alt(Alt(S @ T ) €3 U - ( S €3 T ) €3 U ) = Alt([Alt(S @ T ) - ( S €3 T ) ]€3 U ) = 0, and the other equality is proved similarly. Part 3 now follows from part 2. (W A q) A 8 = [ %Ai)!t(W k!l! €3 q ) ] A 8 18.3. Tensor Algebra and Determinants 403 and the other equality is proved similarly. For part 4, represent w A r j and r j A o similar to the representation for part 1. Then note that 6u,w o 6 w t u , the permutation that transposes the “blocks” 11, . . . , k} and {k 1, . . . , k 1 ) can be represented as a composition of kl transpositions by going left to right in {k 1, . . . , k I } , transposing each element k times with its current predecessor. This establishes the claimed equality. Part 5 immediately follows from part 4. + + + + With associativity of the wedge product established, we no longer need to include parentheses in the wedge multiplication of three or more alternating tensors. Theorem 18.23 Let V be a $finite dimensional vector space with base { u1, . . . , u d } , and let {$I, . . . , &) be the associated dual base as in Theorem 18.15. Then the set d:={@ilA..+q5i,:lsi1 < . . . < i k s d } i s a b a s e f o r A k ( V ). Proof. 
By Theorem 18.16, every $\varphi \in \mathcal{T}^k(V)$ is a linear combination of tensor products $\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k}$. The function $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is surjective by Theorem 18.19 and by Exercise 18-26 it is linear. Therefore every $\omega \in \Lambda^k(V)$ is a linear combination of tensors $\operatorname{Alt}(\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k})$. By associativity of the wedge product, $\operatorname{Alt}(\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k})$ is a multiple of $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$. If any two indices $i_j$ are equal, then the corresponding wedge product vanishes, because by part 5 of Lemma 18.22 $\varphi_{i_j} \wedge \varphi_{i_j} = 0$. Moreover, for any permutation $\sigma \in S_k$, by part 4 of Lemma 18.22 we infer that $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} \in \{\pm \varphi_{i_{\sigma(1)}} \wedge \cdots \wedge \varphi_{i_{\sigma(k)}}\}$. This means that every $\omega \in \Lambda^k(V)$ is a linear combination of tensors $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$ with $i_1 < i_2 < \cdots < i_k$.

To see that these tensors are linearly independent, let the numbers $a_{i_1 \ldots i_k}$ be so that $\sum_{1 \le i_1 < \cdots < i_k \le d} a_{i_1 \ldots i_k}\, \varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} = 0$. For fixed $1 \le j_1 < \cdots < j_k \le d$ note that
\[
0 = \Bigl( \sum_{1 \le i_1 < \cdots < i_k \le d} a_{i_1 \ldots i_k}\, \varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} \Bigr) [v_{j_1}, \ldots, v_{j_k}] = a_{j_1 \ldots j_k},
\]
which proves that the tensors $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$ with $i_1 < i_2 < \cdots < i_k$ are linearly independent.

In particular, Theorem 18.23 shows that $\Lambda^d(V)$ is one dimensional. Therefore all alternating $d$-tensors on $d$-dimensional space are multiples of one tensor. Geometrically this means that on $d$-dimensional space there is only one volume function for $d$-dimensional parallelepipeds, which is the only sensible possibility.

Definition 18.24 The unique alternating $d$-tensor $\omega$ on $\mathbb{R}^d$ with $\omega(e_1, \ldots, e_d) = 1$ is called the $d$-dimensional determinant. It is denoted $\det(v_1, \ldots, v_d) := \omega(v_1, \ldots, v_d)$. If $A = (a_{ij})_{i,j = 1, \ldots, d}$ is a $d \times d$ matrix, we define the determinant of the matrix as the determinant of its column vectors,
\[
\det(A) := \det\bigl( (a_{i1})_{i=1,\ldots,d}, \ldots, (a_{id})_{i=1,\ldots,d} \bigr).
\]

The connection between determinants and volumes, which retroactively validates our geometric motivation, is made in Exercise 18-48. To be able to prove the results needed to get there, we conclude this section with a few key properties of determinants.

Proposition 18.25 Let $V$ be a vector space with base $\{v_1, \ldots, v_d\}$ and let $\omega \in \Lambda^d(V)$.
If $w_j = \sum_{i=1}^d a_{ij} v_i$ for $j = 1, \ldots, d$, then $\omega(w_1, \ldots, w_d) = \det\bigl( (a_{ij})_{i,j=1,\ldots,d} \bigr)\, \omega(v_1, \ldots, v_d)$.

Proof. If $\omega = 0$, then both sides of the equation are zero, so we can assume $\omega \ne 0$. Because $\{v_1, \ldots, v_d\}$ is linearly independent, $\omega(v_1, \ldots, v_d) \ne 0$ (see Exercise 18-30) and so for $A = (a_{ij})_{i,j=1,\ldots,d}$ we can define
\[
\eta(A) := \frac{\omega\bigl( \sum_{i=1}^d a_{i1} v_i, \ldots, \sum_{i=1}^d a_{id} v_i \bigr)}{\omega(v_1, \ldots, v_d)}.
\]
It is easy to see that $\eta$ is $d$-linear in the columns of $A$, that the transposition of any two columns of $A$ changes the sign of $\eta(A)$ and that $\eta(I) = 1$ when $I$ is the $d \times d$ identity matrix. This means that if we represent each vector $w = \sum_{i=1}^d w_i e_i$ in $\mathbb{R}^d$ with the column vector $(w_1, \ldots, w_d)^T$, then $\eta \in \Lambda^d(\mathbb{R}^d)$ and $\eta(e_1, \ldots, e_d) = 1$. But this means that $\eta$ is the determinant on $\mathbb{R}^d$, that is,
\[
\frac{\omega\bigl( \sum_{i=1}^d a_{i1} v_i, \ldots, \sum_{i=1}^d a_{id} v_i \bigr)}{\omega(v_1, \ldots, v_d)} = \det(A).
\]
The result now follows from applying the above to the vectors $w_j = \sum_{i=1}^d a_{ij} v_i$.

Proposition 18.26 Let $V$ be a $d$-dimensional vector space and let $\omega \in \Lambda^d(V)$. If $\{w_1, \ldots, w_d\}$ is not linearly independent, then $\omega(w_1, \ldots, w_d) = 0$.

Proof. Without loss of generality assume that there are $a_1, \ldots, a_{d-1} \in \mathbb{R}$ so that $w_d = \sum_{i=1}^{d-1} a_i w_i$. Then
\[
\omega(w_1, \ldots, w_d) = \sum_{i=1}^{d-1} a_i\, \omega(w_1, \ldots, w_{d-1}, w_i) = 0,
\]
because an alternating tensor vanishes when two of its arguments are equal.

Corollary 18.27 Let $A$, $B$ be real $d \times d$ matrices. Then $\det(AB) = \det(A)\det(B)$.

Proof. If the columns of $A$ are linearly independent, we prove the result similar to the proof of Proposition 18.25. The function $\eta(B) := \dfrac{\det(AB)}{\det(A)}$ is linear in each column of $B$, transposing two columns of $B$ transposes two columns of $AB$ and thus it changes the sign of $\eta$, and $\eta(I) = 1$, where $I$ is the identity matrix. Hence, we infer $\eta(B) = \det(B)$ in this case. If the columns of $A$ are not linearly independent, then the columns of $AB$ are not linearly independent either, so by Proposition 18.26 both sides of the equation are zero.

When arguing in general, we usually work with linear functions, not matrices. Therefore, it is sensible to define the determinant of a linear function.

Definition 18.28 Let $L : \mathbb{R}^d \to \mathbb{R}^d$ be a linear function and let $A$ be its matrix representation with respect to the standard base. Then we define the determinant of $L$ as $\det(L) := \det(A)$.
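Corollary 18.27 is easy to test on small examples. The following is a minimal sketch, not part of the formal development: it evaluates the determinant with the permutation sum stated in Exercise 18-32 and compares $\det(AB)$ with $\det(A)\det(B)$ for two integer matrices. The function names `sign`, `det` and `matmul` and the sample matrices are illustrative assumptions.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation of 0..d-1, computed from its inversion count."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det(a):
    """Permutation-sum formula: sum over sigma of sgn(sigma) * a[0][sigma(0)] * ..."""
    d = len(a)
    total = 0
    for perm in permutations(range(d)):
        term = sign(perm)
        for row in range(d):
            term *= a[row][perm[row]]
        total += term
    return total


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    d = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)]
        for i in range(d)
    ]


if __name__ == "__main__":
    A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
    B = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
    print(det(matmul(A, B)), det(A) * det(B))
```

With integer entries the comparison is exact. The sum has $d!$ terms, so this formula is only practical for small $d$; it is used here because it follows the definition closely, not because it is efficient.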
Exercise 18-31 shows that the determinant of a linear function $L$ does not depend on the specific base chosen, as long as we use the same base to represent arguments and images.

Finally, because Exercise 18-48 shows that the determinant defines volumes in $\mathbb{R}^d$, we want to define volumes on more general spaces. As long as the space carries an inner product, this is sensible. With an inner product we can define orthonormal bases, and cubes whose sides are defined by the vectors of an orthonormal base should have volume 1. To make a volume definition with an alternating $d$-tensor possible, we must assure that these tensors are constant on the orthonormal bases. The next result shows that this is almost the case.

Proposition 18.29 Let $V$ be a $d$-dimensional inner product space and let $\omega \in \Lambda^d(V)$. Then for any two orthonormal bases $\{v_1, \ldots, v_d\}$ and $\{w_1, \ldots, w_d\}$ of $V$ the equation $|\omega(v_1, \ldots, v_d)| = |\omega(w_1, \ldots, w_d)|$ holds.

Proof. For $i, j = 1, \ldots, d$ let $a_{ij} \in \mathbb{R}$ be so that $w_j = \sum_{i=1}^d a_{ij} v_i$. Then for all $i, j \in \{1, \ldots, d\}$ we have
\[
\delta_{ij} = \langle w_i, w_j \rangle = \Bigl\langle \sum_{k=1}^d a_{ki} v_k, \sum_{l=1}^d a_{lj} v_l \Bigr\rangle = \sum_{k=1}^d a_{ki} a_{kj}.
\]
Hence if $A$ is the matrix with entries $a_{ij}$, then $A^T A = I$, and therefore the product $A A^T$ of $A$ and its transpose is also the identity $I$. By Corollary 18.27 and Exercise 18-34 we infer that the determinant of $A$ is $1$ or $-1$ because $(\det(A))^2 = \det(A)\det(A^T) = \det(A A^T) = \det(I) = 1$. Therefore by Proposition 18.25 we conclude
\[
|\omega(w_1, \ldots, w_d)| = |\det(A)\, \omega(v_1, \ldots, v_d)| = |\omega(v_1, \ldots, v_d)|.
\]

With $d$-forms evaluating to the same absolute value when applied to orthonormal bases, it makes sense to pick one of the two $d$-forms that assign values of $\pm 1$ to orthonormal bases and to designate it as the volume element. We will see in Section 19.5 how to choose one of these two to obtain a volume element for very general surfaces.

Exercises

18-20. Let $V$ be a vector space. Prove that $\mathcal{T}^k(V)$ with pointwise addition and scalar multiplication is a vector space.
That is, prove that
   (a) If $\omega, \eta \in \mathcal{T}^k(V)$, then $\nu(v_1, \ldots, v_k) := \omega(v_1, \ldots, v_k) + \eta(v_1, \ldots, v_k)$ is a $k$-tensor on $V$.
   (b) If $\omega \in \mathcal{T}^k(V)$ and $c \in \mathbb{R}$, then $\nu(v_1, \ldots, v_k) := c\,\omega(v_1, \ldots, v_k)$ is a $k$-tensor on $V$.

18-21. Properties of the tensor product. Let $V$ be a vector space.
   (a) Prove that for all tensors $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $U \in \mathcal{T}^m(V)$ we have $(S \otimes T) \otimes U = S \otimes (T \otimes U)$.
   (b) Prove that there are tensors $S$ and $T$ on $\mathbb{R}^2$ so that $S \otimes T \ne T \otimes S$.

18-22. Prove Theorem 18.16.

18-23. Let $V$ be a vector space and let $k \in \mathbb{N}$. Prove that $\Lambda^k(V)$ is a vector space.

18-24. Let $k \in \mathbb{N}$. Proceed as follows to prove that the sign of a permutation is well-defined.
   (a) Prove that every $\sigma \in S_k$ can be represented as a composition of transpositions. Hint. Induction on $k$. In the induction step, transpose $\sigma(k)$ back to $k$.
   (b) Prove that if $\sigma$ can be represented as the composition of $n$ transpositions and as the composition of $m > n$ transpositions, then $m - n$ is even. Hint. Again, induction on $k$.
   (c) Prove that the function $\operatorname{sgn} : S_k \to \{1, -1\}$ is well-defined.

18-25. Let $V$ be a vector space. Prove that if $\omega \in \Lambda^k(V)$ and $\sigma \in S_k$, then for all $v_1, \ldots, v_k \in V$ we have $\omega[v_1, \ldots, v_k] = \operatorname{sgn}(\sigma)\, \omega[v_{\sigma(1)}, \ldots, v_{\sigma(k)}]$. Hint. Induction on the number of transpositions that need to be composed to get $\sigma$.

18-26. Prove that $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is linear, that is, for all tensors $S, T \in \mathcal{T}^k(V)$ and all scalars $\alpha \in \mathbb{R}$ we have $\operatorname{Alt}(S + T) = \operatorname{Alt}(S) + \operatorname{Alt}(T)$ and $\operatorname{Alt}(\alpha T) = \alpha \operatorname{Alt}(T)$.

18-27. Prove the formula in part 2 of Example 18.21.

18-28. The wedge product $\pi_{i_1} \wedge \cdots \wedge \pi_{i_k}$ gives the $k$-dimensional volume of the projection of a parallelepiped into $\operatorname{span}\{e_{i_1}, \ldots, e_{i_k}\}$.
   (a) Let $a_1, \ldots, a_k \in \mathbb{R}^d$ be zero in all but the first $k$ components. Prove that the wedge product $\pi_1 \wedge \cdots \wedge \pi_k(a_1, \ldots, a_k)$ gives the $k$-dimensional volume of the parallelepiped spanned by $a_1, \ldots, a_k$. Hint.
Prove that $\pi_1 \wedge \cdots \wedge \pi_k$ restricted to $\{x \in \mathbb{R}^d : x_{k+1} = \cdots = x_d = 0\}^k$, composed with the isomorphism between $(\mathbb{R}^k)^k$ and $\{x \in \mathbb{R}^d : x_{k+1} = \cdots = x_d = 0\}^k$, is equal to the $k$-dimensional determinant and use that the $k$-dimensional determinant is the volume function for parallelepipeds spanned by $k$ vectors in $k$-dimensional space. (This fact will be established in Exercise 18-48 without using this exercise.)
   (b) Let $a_1, \ldots, a_k \in \mathbb{R}^d$. Prove that the wedge product $\pi_{i_1} \wedge \cdots \wedge \pi_{i_k}(a_1, \ldots, a_k)$ gives the $k$-dimensional volume of the projection of the parallelepiped spanned by $a_1, \ldots, a_k$ into $\operatorname{span}\{e_{i_1}, \ldots, e_{i_k}\}$.

18-29. Let $V$ be a vector space. Prove that the wedge product is bilinear, that is, prove the following.
   (a) Prove that if $\omega_1, \omega_2 \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$, then $(\omega_1 + \omega_2) \wedge \eta = \omega_1 \wedge \eta + \omega_2 \wedge \eta$.
   (b) Prove that if $\omega \in \Lambda^k(V)$ and $\eta_1, \eta_2 \in \Lambda^l(V)$, then $\omega \wedge (\eta_1 + \eta_2) = \omega \wedge \eta_1 + \omega \wedge \eta_2$.
   (c) Prove that if $\omega \in \Lambda^k(V)$, $\eta \in \Lambda^l(V)$ and $c \in \mathbb{R}$, then $(c\omega) \wedge \eta = c(\omega \wedge \eta) = \omega \wedge (c\eta)$.

18-30. Let $V$ be a $d$-dimensional vector space, let the set $\{v_1, \ldots, v_d\} \subseteq V$ be linearly independent and let $\omega \in \Lambda^d(V) \setminus \{0\}$. Prove that $\omega(v_1, \ldots, v_d) \ne 0$. Hint. It suffices to prove the result for $\mathbb{R}^d$. Suppose for a contradiction that $\omega(v_1, \ldots, v_d) = 0$ and use linear combinations to derive that $\omega(e_1, \ldots, e_d) = 0$.

18-31. The determinant of a linear function is independent of the base with respect to which it is represented. Let $L : \mathbb{R}^d \to \mathbb{R}^d$ be a linear function and let $B$ be its matrix representation with respect to the base $\{v_1, \ldots, v_d\}$. Prove that $\det(B) = \det(L)$.

18-32. Summation formula for the determinant. Let $d \in \mathbb{N}$. Prove that the determinant of any real $d \times d$ matrix $A = (a_{ij})_{i,j=1,\ldots,d}$ equals $\det(A) = \sum_{\sigma \in S_d} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{d\sigma(d)}$. Hint. Prove that the determinant is the image under $\operatorname{Alt}(\cdot)$ of the tensor that multiplies $d!$ with the diagonal entries.

18-33. Row Expansion of the Determinant.
Let $A \in M(d \times d, \mathbb{R})$. We claim that for any number $i \in \{1, \ldots, d\}$ we have $\det(A) = \sum_{j=1}^d (-1)^{i+j} a_{ij} \det(A_{ij})$, where $A_{ij} \in M((d-1) \times (d-1), \mathbb{R})$ is obtained from $A$ by erasing the $i$th row and the $j$th column.
   (a) Prove the claim using Exercise 18-32.
   (b) Prove the claim by proving that the formula defines an alternating $d$-tensor that is equal to 1 for the identity matrix.

18-34. The transpose of a matrix $A = (a_{ij})_{i,j=1,\ldots,d}$ is $A^T := (a_{ji})_{i,j=1,\ldots,d}$. Prove that for all real $d \times d$ matrices $A$ we have $\det(A) = \det(A^T)$. Hint. Exercise 18-32.

18-35. Determinants of certain linear functions.
   (a) Let $c_1, \ldots, c_d \in \mathbb{R}$ and let $D : \mathbb{R}^d \to \mathbb{R}^d$ be the diagonal operator $D(x_1, \ldots, x_d) := (c_1 x_1, \ldots, c_d x_d)$. Prove that $\det(D) = c_1 \cdots c_d$.
   (b) Let $a \in \mathbb{R}$, let $i, j \in \{1, \ldots, d\}$ be distinct and let $A : \mathbb{R}^d \to \mathbb{R}^d$ be the row addition operator $A(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_i + a x_j, x_{i+1}, \ldots, x_d)$. Prove that $\det(A) = 1$.
   (c) Let $i, j \in \{1, \ldots, d\}$ with $i < j$ and let $T : \mathbb{R}^d \to \mathbb{R}^d$ be the row transposition operator $T(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_d)$. Prove that $\det(T) = -1$.

18.4 Multidimensional Substitution

The $d$-dimensional substitution formula (see Theorem 18.37) is a surprisingly deep result. Its proof draws on topology, differential calculus and measure theory as well as on linear algebra. However, the underlying idea is fairly simple. Consider the image of a small cube under a differentiable function $g$. If the cube is sufficiently small, then on the cube the function $g$ is approximately equal to its derivative $L$. The image of the cube under $L$ is a parallelepiped. This means that the $g$-images of our most elementary volume elements (cubes) are approximately parallelepipeds.
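The heuristic that a differentiable map distorts the volume of a small cube by roughly the absolute value of the determinant of its derivative can be observed numerically in the plane. The sketch below is an illustration under stated assumptions: it uses the map $g(x, y) = (x^2 - y^2, 2xy)$ from Exercise 18-51, whose derivative has determinant $4x^2 + 4y^2$, approximates the area of the image of a small square by the shoelace formula for a polygon through sampled boundary points, and divides by the area of the square. The helper names, the center $(1, 0.5)$ and the sampling density are hypothetical choices, and the polygon only approximates the image, whose boundary consists of curved arcs.

```python
def g(x, y):
    """Sample map from Exercise 18-51: squaring (x, y) as a complex number."""
    return (x * x - y * y, 2.0 * x * y)


def jacobian_det(x, y):
    """Determinant of Dg(x, y) for the sample map above."""
    return 4.0 * (x * x + y * y)


def square_boundary(cx, cy, h, n):
    """Counterclockwise boundary points of the square with center (cx, cy), half side h."""
    steps = [-h + 2.0 * h * k / n for k in range(n)]
    points = [(cx + t, cy - h) for t in steps]    # bottom edge, left to right
    points += [(cx + h, cy + t) for t in steps]   # right edge, bottom to top
    points += [(cx - t, cy + h) for t in steps]   # top edge, right to left
    points += [(cx - h, cy - t) for t in steps]   # left edge, top to bottom
    return points


def polygon_area(points):
    """Shoelace formula for the area enclosed by a closed polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def distortion_ratio(cx, cy, h, n=200):
    """Approximate area of g[square] divided by the area of the square."""
    image = [g(x, y) for (x, y) in square_boundary(cx, cy, h, n)]
    return polygon_area(image) / (2.0 * h) ** 2


if __name__ == "__main__":
    for h in (0.1, 0.01, 0.001):
        print(h, distortion_ratio(1.0, 0.5, h), jacobian_det(1.0, 0.5))
```

As the half side length shrinks, the printed ratio should move toward the value of the determinant at the center, which is the behavior that Lemma 18.36 makes precise. The square is chosen away from the origin so that $g$ is injective on it.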
It can be shown geometrically that the absolute value of the determinant of a $3 \times 3$ matrix is the volume of the parallelepiped spanned by the column vectors. So, in three dimensions, the volume of the image of the cube spanned by the standard unit vectors under a linear function $L$ is the absolute value of the determinant of $L$. Because translations do not affect the volume (see Lemma 18.31) and linear factors in the columns can be factored out of the determinant, we expect that the absolute value of the determinant of a linear function $L$ is exactly the factor by which the volume of a cube is multiplied to obtain the volume of its image under $L$. Lemma 18.35 shows that this is the case for arbitrary sets in arbitrary finite dimensions. Lemma 18.36 then shows that for continuously differentiable functions the absolute value of the determinant of the derivative is the local volume distortion factor, and Theorem 18.37 pulls everything together to establish the Multidimensional Substitution Formula.

Throughout this section we work with the uniform norm $\|\cdot\|_\infty$ on $\mathbb{R}^d$ because the "balls" in this norm are cubes.

Lemma 18.30 Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^d$ be open subsets of $\mathbb{R}^d$ and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable function. If $S \subseteq \Omega_1$ is Lebesgue measurable, then $g[S]$ is Lebesgue measurable, too.

Proof. We first prove that $g$ maps null sets to null sets. Let $N \subseteq \Omega_1$ be a null set. For $n \in \mathbb{N}$, let $K_n := \left\{ x \in \Omega_1 : \operatorname{dist}(x, \mathbb{R}^d \setminus \Omega_1) \ge \frac{1}{n} \text{ and } \|x\|_\infty \le n \right\}$. Then all $K_n$ are closed and bounded and $\Omega_1 = \bigcup_{n=1}^\infty K_n$. It is enough to prove that each $g[N \cap K_n]$ is a null set. Because $K_{2n}$ is compact and $Dg$ is continuous, we can find an upper bound $L > 0$ for $\|Dg\|$ on $K_{2n}$, with all distances measured with the $\|\cdot\|_\infty$-norm on $\mathbb{R}^d$. Let $\varepsilon > 0$ and let $\{\tilde{B}_j\}_{j=1}^\infty$ be a family of open boxes such that $N \cap K_n \subseteq \bigcup_{j=1}^\infty \tilde{B}_j$ and $\sum_{j=1}^\infty |\tilde{B}_j| < \dfrac{\varepsilon}{2L^d}$. From these boxes, construct a family of open cubes $\{B_j\}_{j=1}^\infty$ so that $N \cap K_n \subseteq \bigcup_{j=1}^\infty B_j$, $\sum_{j=1}^\infty |B_j| < \dfrac{\varepsilon}{L^d}$ and the diameter of each $B_j$ is at most $\dfrac{1}{2n}$ (see Exercise 18-36a). Because we can discard any cubes that do not intersect $K_n$, we can assume that all $B_j$ intersect $K_n$. Then $\bigcup_{j=1}^\infty B_j \subseteq K_{2n}$. By Theorem 17.33, this means that $g$ is Lipschitz continuous on each $B_j$ and the Lipschitz constant is $L$ when distances are measured with the $\|\cdot\|_\infty$-norm on $\mathbb{R}^d$. Hence, each $g[B_j]$ is contained in a cube $C_j$ with $\lambda(C_j) \le L^d \lambda(B_j)$ (see Exercise 18-36b). Therefore, $g[N \cap K_n] \subseteq \bigcup_{j=1}^\infty C_j$ and
\[
\lambda(g[N \cap K_n]) \le \sum_{j=1}^\infty \lambda(C_j) \le \sum_{j=1}^\infty L^d \lambda(B_j) < L^d \frac{\varepsilon}{L^d} = \varepsilon.
\]
Because $\varepsilon$ was arbitrary, we obtain $\lambda(g[N \cap K_n]) = 0$ and we have proved that $g$ maps null sets to null sets.

Now let $S \subseteq \Omega_1$ be Lebesgue measurable. Then by Theorem 18.4 there is a sequence $\{F_i\}_{i=1}^\infty$ of closed sets and a sequence $\{G_i\}_{i=1}^\infty$ of open sets so that for all $i \in \mathbb{N}$ we have $F_i \subseteq F_{i+1}$ and $G_i \supseteq G_{i+1}$ and so that with $F := \bigcup_{i=1}^\infty F_i$ and $G := \bigcap_{i=1}^\infty G_i$ we have $F \subseteq S \subseteq G \subseteq \Omega_1$ and $\lambda(G \setminus F) = 0$. Clearly, $g[F] \subseteq g[S] \subseteq g[G] \subseteq \Omega_2$ and by the above $\lambda(g[G] \setminus g[F]) \le \lambda(g[G \setminus F]) = 0$. For $i \in \mathbb{N}$, set $\tilde{F}_i := F_i \cap K_i$. Then each $\tilde{F}_i$ is closed and bounded, and hence compact, and
\[
\bigcup_{i=1}^\infty \tilde{F}_i = \bigcup_{i=1}^\infty (F_i \cap K_i) = \bigcup_{i=1}^\infty F_i \cap \bigcup_{i=1}^\infty K_i = \bigcup_{i=1}^\infty F_i \cap \Omega_1 = \bigcup_{i=1}^\infty F_i = F.
\]
By Theorem 16.62, all $g[\tilde{F}_i]$ are compact, and hence closed, so $g[F]$ is the union of a sequence of closed sets $g[\tilde{F}_i]$ with $g[\tilde{F}_i] \subseteq g[\tilde{F}_{i+1}]$ for all $i \in \mathbb{N}$. This means that $g[F]$ is Lebesgue measurable. Because $\lambda(g[S] \setminus g[F]) \le \lambda(g[G] \setminus g[F]) = 0$ and null sets are Lebesgue measurable, this means that $g[S] \setminus g[F]$ is Lebesgue measurable, and hence $g[S]$ is Lebesgue measurable.

Exercise 18-54 shows that continuity alone is not enough to assure that the images of null sets are null sets.
Unsurprisingly, translations do not affect the Lebesgue measure of a set, as the next result shows.

Lemma 18.31 The effect of a translation on Lebesgue measure. For $S \subseteq \mathbb{R}^d$ and $x \in \mathbb{R}^d$, define $x + S := \{x + s \in \mathbb{R}^d : s \in S\}$. If $S$ is Lebesgue measurable, then $x + S$ is Lebesgue measurable and $\lambda(x + S) = \lambda(S)$.

Proof. Exercise 18-37.

The key to establishing Lemma 18.35 is to prove it for sufficiently basic linear functions from which arbitrary linear functions can be built. The first step is to treat linear functions that stretch or shrink objects parallel to the coordinate axes.

Lemma 18.32 The effect of a diagonal operator on Lebesgue measure. Let the set $S \subseteq \mathbb{R}^d$ be Lebesgue measurable, let $c_1, \ldots, c_d \in \mathbb{R} \setminus \{0\}$ and let the function $D : \mathbb{R}^d \to \mathbb{R}^d$ be defined by $D(x_1, \ldots, x_d) := (c_1 x_1, \ldots, c_d x_d)$. Then $D[S]$ is Lebesgue measurable with $\lambda(D[S]) = |c_1 \cdots c_d|\, \lambda(S) = |\det(D)|\, \lambda(S)$.

Proof. The Lebesgue measurability of $D[S]$ follows from Lemma 18.30. Now let $\varepsilon > 0$. To compute $\lambda(D[S])$, let $\{B_j\}_{j=1}^\infty$ be a family of open boxes so that $S \subseteq \bigcup_{j=1}^\infty B_j$ and so that $\sum_{j=1}^\infty |B_j| < \lambda(S) + \dfrac{\varepsilon}{|c_1 \cdots c_d| + 1}$. Then $\{D[B_j]\}_{j=1}^\infty$ is a family of open boxes so that $D[S] \subseteq \bigcup_{j=1}^\infty D[B_j]$ and for each $j \in \mathbb{N}$ we have $|D[B_j]| = |c_1 \cdots c_d|\, |B_j|$ (Exercise 18-38). This means that
\[
\lambda(D[S]) \le \sum_{j=1}^\infty |D[B_j]| = |c_1 \cdots c_d| \sum_{j=1}^\infty |B_j| < |c_1 \cdots c_d|\, \lambda(S) + \varepsilon.
\]
Because $\varepsilon > 0$ was arbitrary, we infer $\lambda(D[S]) \le |c_1 \cdots c_d|\, \lambda(S)$. A similar result holds for $D^{-1}(x_1, \ldots, x_d) = \left( \frac{1}{c_1} x_1, \ldots, \frac{1}{c_d} x_d \right)$. For all Lebesgue measurable sets $A$ the set $D^{-1}[A]$ is Lebesgue measurable and $\lambda\left(D^{-1}[A]\right) \le \dfrac{1}{|c_1 \cdots c_d|} \lambda(A)$. So $\lambda(S) = \lambda\left(D^{-1}[D[S]]\right) \le \dfrac{1}{|c_1 \cdots c_d|} \lambda(D[S])$, that is, $\lambda(D[S]) \ge |c_1 \cdots c_d|\, \lambda(S)$, which establishes $\lambda(D[S]) = |c_1 \cdots c_d|\, \lambda(S)$. Finally, it was shown in Exercise 18-35a that $\det(D) = c_1 \cdots c_d$.

The next step is to treat functions that shear objects in one coordinate direction. Figure 47 shows the geometric idea for the proof of Lemma 18.33.

Figure 47: Visualization of the proof of Lemma 18.33. Maps as in Lemma 18.33 only affect two coordinates, $x_i$ and $x_j$. In these coordinates, they enforce a uniform "shear" on the space. This shear turns rectangles into parallelograms without affecting the area. These parallelograms can be covered with rectangles whose sides are parallel to the coordinate axes.

Lemma 18.33 The effect of row addition on Lebesgue measure. Let $S \subseteq \mathbb{R}^d$ be a Lebesgue measurable set, let $a \in \mathbb{R}$, let $i, j \in \{1, \ldots, d\}$ be distinct and define $A : \mathbb{R}^d \to \mathbb{R}^d$ by $A(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_i + a x_j, x_{i+1}, \ldots, x_d)$. Then $A[S]$ is Lebesgue measurable with $\lambda(A[S]) = \lambda(S) = |\det(A)|\, \lambda(S)$.

Proof. By Lemma 18.30, $A[S]$ is Lebesgue measurable. Without loss of generality assume $i < j$ and $a > 0$. Let $\varepsilon > 0$. To compute $\lambda(A[S])$, let $\{B_k\}_{k=1}^\infty$ be a family of open boxes so that $S \subseteq \bigcup_{k=1}^\infty B_k$ and $\sum_{k=1}^\infty |B_k| < \lambda(S) + \dfrac{\varepsilon}{2}$. For each $k$, let $\prod_{l=1}^d (a_l^k, b_l^k) := B_k$, let $n_k \in \mathbb{N}$ be so that $a\, \dfrac{(b_j^k - a_j^k)^2}{n_k} \prod_{l \ne i,j} (b_l^k - a_l^k) < \dfrac{\varepsilon}{2^{k+1}}$ and let $\Delta_k := \dfrac{b_j^k - a_j^k}{n_k}$. Then for every $x = (x_1, \ldots, x_d) \in B_k$ there is an $m \in \{1, \ldots, n_k\}$ so that $x_j \in [a_j^k + (m-1)\Delta_k,\ a_j^k + m\Delta_k]$. This means that the $i$th component of $A(x)$ satisfies
\[
x_i + a x_j \in \bigl[ x_i + a(a_j^k + (m-1)\Delta_k),\ x_i + a(a_j^k + m\Delta_k) \bigr].
\]
Hence $A[B_k]$ is contained in the union over $m = 1, \ldots, n_k$ of the boxes whose $j$th side is $[a_j^k + (m-1)\Delta_k,\ a_j^k + m\Delta_k]$, whose $i$th side is $\bigl( a_i^k + a(a_j^k + (m-1)\Delta_k),\ b_i^k + a(a_j^k + m\Delta_k) \bigr)$ and whose remaining sides are $(a_l^k, b_l^k)$ (see Figure 47), and therefore
\[
\lambda(A[B_k]) \le \sum_{m=1}^{n_k} (b_i^k - a_i^k + a\Delta_k) \cdot \Delta_k \cdot \prod_{l \ne i,j} (b_l^k - a_l^k)
= |B_k| + a\, \frac{(b_j^k - a_j^k)^2}{n_k} \prod_{l \ne i,j} (b_l^k - a_l^k)
< |B_k| + \frac{\varepsilon}{2^{k+1}}.
\]
We obtain
\[
\lambda(A[S]) \le \sum_{k=1}^\infty \lambda(A[B_k]) < \sum_{k=1}^\infty |B_k| + \sum_{k=1}^\infty \frac{\varepsilon}{2^{k+1}} < \lambda(S) + \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \lambda(S) + \varepsilon.
\]
Because $\varepsilon > 0$ was arbitrary, we infer $\lambda(A[S]) \le \lambda(S)$. We can obtain the same inequality for $A^{-1}(x_1, \ldots, x_d) = (x_1, \ldots, x_{i-1}, x_i - a x_j, x_{i+1}, \ldots, x_d)$. But this means $\lambda(S) = \lambda\left(A^{-1}[A[S]]\right) \le \lambda(A[S])$, and hence $\lambda(S) = \lambda(A[S])$. Finally, it was shown in Exercise 18-35b that $\det(A) = 1$.

Lemma 18.34 The effect of row transposition on Lebesgue measure.
Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable, let $i, j \in \{1, \ldots, d\}$ with $i < j$ and let $T : \mathbb{R}^d \to \mathbb{R}^d$ be $T(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_d)$. Then $T[S]$ is Lebesgue measurable with $\lambda(T[S]) = \lambda(S) = |\det(T)|\, \lambda(S)$.

Proof. Exercise 18-39.

Lemma 18.35 The effect of a bijective linear operator on Lebesgue measure. Let the function $L : \mathbb{R}^d \to \mathbb{R}^d$ be linear and bijective and let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable. Then $L[S]$ is Lebesgue measurable and $\lambda(L[S]) = |\det(L)|\, \lambda(S)$.

Proof. The Gauss-Jordan algorithm from Theorem 17.23 shows that there is a diagonal operator $D$ with nonzero diagonal entries and a sequence of row transposition and row addition operators $A_1, \ldots, A_n$ so that $L = A_n A_{n-1} \cdots A_1 D$. Because the determinant of a composition is the product of the determinants of the factors (see Corollary 18.27), we infer that $|\det(L)| = \left( \prod_{i=1}^n |\det(A_i)| \right) |\det(D)|$, and hence by Lemmas 18.32, 18.33 and 18.34
\[
\begin{aligned}
\lambda(L[S]) &= \lambda(A_n A_{n-1} \cdots A_1 D[S]) \\
&= |\det(A_n)|\, \lambda(A_{n-1} \cdots A_1 D[S]) \\
&= \cdots = \left( \prod_{i=1}^n |\det(A_i)| \right) |\det(D)|\, \lambda(S) = |\det(L)|\, \lambda(S).
\end{aligned}
\]

Lemma 18.35 confirms our initial geometric idea for arbitrary dimensions. If a parallelepiped is spanned by vectors $v_1, \ldots, v_d$, then it is the image of the unit cube under the linear map whose matrix representation is the matrix $A$ with columns $v_1, \ldots, v_d$. Lemma 18.35 shows that the volume of this parallelepiped is the absolute value of the determinant of $A$. (Details are left to Exercise 18-48.)

For the remaining results, we will work with half-open cubes, which are cubes of the form $\prod_{i=1}^d [a_i, b_i)$ with all $b_i - a_i$ being equal. The radius of a half-open cube is half its side length.

Lemma 18.36 The effect of a diffeomorphism on Lebesgue measure. Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^d$, let $K \subseteq \Omega_1$ be compact and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function with $\det(Dg(x)) \ne 0$ for all $x \in \Omega_1$. Then for every
6 > 0 there is a 6 > 0 so that for every half-open cube B K with center point x and with radius less than S we have h ( g [ B ] )- det(Dg(x))(h(B) < s h ( B ) . I Proof. Because K is compact, the derivative Dg is uniformly continuous on K and Dg(x)-llI is uniformly bounded on K by an M > 0. Moreover, we can 18.4. Multidimensional Substitution find a v > 0 such that (1 +u )~ 1< 413 & max{/detDg(x)I : x E K } . There is a 6 > 0 1 U so that for all x,y E K with lIy - x / / <~6 we have //Dg(y)- Dg(x) < -. Now M let B be a half-open cube of radius less than 6, contained in K and centered at x. Then for all L E B the following holds. //g(z)- (g(x) + Dg(x)[z - X I ) Now we are ready to prove the Multidimensional Substitution Formula. Because we work with individual functions, we state the result for functions f in C' . It should be clear that a similar result holds for L1. 18. Measure, Topology, and Differentiation 414 Theorem 18.37 Multidimensional SubstitutionFormula. Let 521, 522 G EXd be open subsets of Rd and let g : Q1 -+ Q2 be a continuously differentiable bijective function so that for all x E 521 we have det (Dg(x)) # 0. Then for all f E L'(S-22) we have Proof. By Corollary 17.66, g-' is also continuously differentiable. The measurability of f o g now follows from Lemma 18.30 applied to 8-l. We will first prove the identity for all nonnegative f E Cr(522). Let f E Cp(522). Then g-'[supp( f ) ] is compact. For every x E g-'[supp(f)] find E, > 0 so that B E s ( x ) G Q l . Then { B y ( x ): x E there are X I , . . . , X k so that g-'[supp(f)] G g-'[supp(f)]} is an open cover of the set g-'[supp(f)], u Bcx,(xi). Then K := 3 i=l Q1. u k k compact subset of and hence BEXi ( x i ) is a i=l Moreover, if we define JSuPP:= min 1 .L : ;& i = 1, . . . , n 1, then every half-open cube of radius less than AsUpP that intersects g-' [supp( f ) ] is contained in K . (Recall that we are working with // . [lo.) Let E > 0. 
The function f is uniformly continuous on the compact set g[K]. Therefore there is a 6f > 0 so that for all x , y E g[K] with IIy - x ( / < 6f we have If(v) - f ( x ) / < & The functions ( f o g ) / det(Dg)/ and g are uniformly 3h.( g [KI) . continuous on K . Therefore there is a Sfos > 0 so that for all points x , y E K with & I I Y - ~ I I < 6 f o g wehave lf(g(x))l det(Dg(x))I-f(g(y))l det(Dg(y))lI < 3h(K) and + I/g(y) - g ( x ) / I < 6 f . Let M := max{I f(y)I : y E Q2) 1. By Lemma 18.36, there is a 6 E (0, min{6fog,8suPP))so that for every half-open cube B g K centered 1 at x with radius less than 6 we have h(g[B]) - det(Dg(x))lh(B) < n d Let B1, . . . , Bn be all half-open cubes of the form n g-I[supp(f)]. Then g-'[supp(f)] [ l i s , (li i=l U B j and supp(f) & 3Mh(K) h(B). + 1)6) that intersect n 5 U g [ B i ] . Because i=l i=l 8 < SsuPP, all Bi are contained in K. Let xi be the center point of Bi. Then for all points Y E Bi we have If(g(y))l det(Dg(y))l - f (g(xi))ldet(Dg(xi))ll < I / g ( y ) - g(xi)/l < Sf and for all Therefore n z E 1 g[Bil we have f ( z ) - f(g(xi))/ < & 3h(~), & 3k(g[Kl). 18.4. Multidimensional Substitution 415 n n i=l i=l n Because E was arbitrary the inequality h2 (f o g ) 1 det(Dg) I d h holds f dh 5 1 for all nonnegative f E C,"(Q). The same argument with the roles of domain and range reversed proves the reversed inequality as follows. By Theorem 17.31, we have that Dg-' = ((Dg) o g-')-l, so that (det(Dg) o g-') det (Dg-') [ = det ((Dg) o g-I) ((Dg) o g-l)-'] = 1, and hence which establishes the result for all nonnegative f E Cr(S22). Now let f be a nonnegative simple function on S22 whose support is contained in the interior of a compact set K S22 and for which there is a 6 > 0 so that := u Bs(x) JE quence is compact and contained in Q2. By Theorem 18.12, there is a se- K {fn}zl in C 2 ( s 2 2 ) so that lim n-+w If - f n l dA. = 0. 
By Lemma 18.11, we can assume without loss of generality that supp(fn) & k for all natural numbers 416 18. Measure, Topology, and Differentiation n E N. Let U := max { f ( z ) : z E Qz}. Because we can subtract the function h n ( x ) := (f n ( x )- U ) j l (f n ( x ) - U ) (see Lemma 18.9 for the definition of j l ) from f n without affecting the convergence, we can assume without loss of generality that the f n are uniformly bounded by a multiple of 1,-. Clearly, 1~ is integrable. By Proposition 14.49 we can assume without loss of generality that { also converges pointwise a.e. to f . Because g-' maps null sets to null sets (see proof of Lemma 18.30) { ( f n o g) d e t ( D g ) l } z l converges pointwise a.e. to ( f o g)l det(Dg)l. Moreover, all these functions are bounded by a multiple of (1,- o g) det( Dg) which is integrable. Thus the result follows for f via the Dominated Convergence Theorem. For nonnegative functions f E C'(S22), the result now follows via the Monotone Convergence Theorem (Exercise 18-40a). Finally, for arbitrary functions f E C'(Q2) the result follows by considering f + and f - separately (Exercise 18-40b). fn}gl 1 1, I We conclude this section by assigning the function g and the volume distortion factor det( Dg) their commonly used names. Definition 18.38 Let S21, S22 E JRd be open subsets of Rd and let g : Q I + Q 2 be a continuously differentiable bijective function so that for all x E Q l we have det (Dg(x)) # 0. Then g is also called a coordinate transformation and the determinant det (Dg(x)) is also called the Jacobian of g. Exercises 18-36. Finishing the proof of Lemma 18.30 n d (a) Let B = (ai . h i ) be an open box and let i=l cubes E l , . . . , Bn SO that B C S z 0. Prove that there is a finite set of open 0 (6 B,, L Bj j=1 B j (with respect to E, ) < h(B) + E and the diameter of each j=1 11 /Irn) is at most 8. & Hint. Define u := min n and let the sets - a i ) n f = j + l ( b i -ail d B I , . . . 
, Bn be all the cubes of the form measure of the union satisfies E. (6 ( I I u. ( I r + 1)u) that intersect B . Show that the 1=1 B J ) < L(B) j=1 + 52 . Then slightly enlarge each cube to get the result n d (b) Let B = (ai , h i ) be an open cube, let f : B +. Rd be a Lipschitz continuous function and i=l let L be its Lipschitz constant when all distances are measured with respect to 1) . /Irn. Prove b. - a that with (cl, . . . , c d ) := f and r := 2 (remember that all T) d b, - a, are equal), we have f[B]C n ( c i - Lr, c, i=l + Lr) 18.4. Multidimensional Substitution 417 n d (c) Explain why it is nor possible to prove that if B = ( a i , bi) is an open box and f : B -+ Rd i=l is Lipschitz continuous with Lipschitz constant L when all distances are measured with respect to rhe Euclidean norm !&?..%)and I/ 2 Hinr. Rotations. 18-37. Prove Lemma 18.31. Hinr. Use the proof of Lemma 18.32 as guidance. d 18-38. Let D be as in Lemma 18.32 and let B = n [ a i , bi] be a d-dimensional box. Prove that D [ B ] is a box with 1 D [ B ] 1 = 1 ~ 1 . .c,llB1. . i=l 18-39. Prove Lemma 18.34. Hint. Use the proof of Lemma 18.32 as guidance. 18-40. Finishing the proof of the Multivariable Substitution Theorem. (a) Prove Theorem 18.37 for nonnegative f E L 1 ( R 2 ) , assuming that it holds for nonnegative simple functions f E L ' ( R 2 ) whose support is contained in the interior of a compact set K R2 for which there is a S > 0 so that i? := u B s ( x ) is compact and contained in Q2. xcK (b) Prove Theorem 18.37 for arbitrary f E L 1(R2),assuming that it holds for nonnegative functions f E ~ ' ( ~ 2 1 . 18-41. Let R1 g Wd be an open subset of W d . Prove that i f f : R1 + Rd is Lipschitz continuous, then there is a constant L > 0 so that for all S _C R1 we have h ( f[S] ) 5 L h ( S ) . 18-42. Compute the Jacobians of the following coordinate transformations g. (a) Polar Coordinates. The coordinate transformation s2 : (0. m) x (0.27c) + by q ( r . 0) = ( r cos(0), r sin(@ ). 
W2 is defined (b) Cylindrical Coordinates. The coordinate transformation c : (0,m ) x (0, 2 n ) x defined by c(r, 0 , z ) = ( r cos(Q),r sin(Q),z ). W + B3 is (c) Spherical Coordinates. The coordinate transformation s3 : (0. m) x (0,2rr) x (0, 7r) -+ W3 is defined by s3(p, 0 , q ) = ( p cos(0) sin(cp), p sin(0) sin(q), p cos(q) ). 18-43. Compute the following integrals where d x , dy, dz denote the Lebesgue integrals along the x-. y - and z-axes, respectively. Use the coordinate transforms from Exercise 18-42 as appropriate. 418 18. Measure, Topology, and Differentiation (1-ii? 18-44. The integral of N F , o ( x ) = -e 6 - 7 U * 2 .2 (a) Compute the integral of f ( x , J ) = eover a disk of radius u centered at the origin. Then take the limit of the integrals as u + 00. s_, X2 (b) Compute (Lm m x2 e--T dA(x) by proving x2 x2+2 e-TdA(x))' = LXI".e-- dh(x, J ) and then computing the latter integral. Hint. Proposition 14.65 and part 18-44a. L, so (c) Prove that 1 ( x - d e 2u2 dh(x) = 1 18-45. The volume of d-dimensional balls. We first need to define d-dimensional Spherical Coordinates. Two and three dimensional spherical coordinates s2 are defined in Exercises 18-42a and 18-42c. For d > 3 with Sd-1 : (0, 00) x ( 0 , 2 n )x (0,7r)d-3 + Rd-l denoting d - 1 dimensional spherical coordinates, let Sd(p3 8 %$91, , , . , (Pd-2) := Sd-1 . . . . Vd-3)) sin(Vd-2), . . . . ( p , 8. rd-1 (Sd-I(P. 8, V l , . ,,> Vd-3)) Sin(%-2). Pcos(Vd-2) ldP2 Let Jd be the Jacobian of Sd. Prove that IJd 1 = p (sin(cpd-2) IJd- 11 f o r d 2 3. Hint. The last row of the matrix (which contains the derivatives of the last coordinate) has cos(rpd-2) as its first entry and --p sin(cpd-2) as its last entry. Expand the determinant with respect to the last row (see Exercise 18-33). Prove that the volume of the d-dimensional Euclidean ball of radius I about the origin is Prove the area formula for circles in R2, Prove the volume formula for balls in B3. Hint. 
It is not necessary to compute the integrals

18-46. Let $(M, \Sigma, \mu)$ be a $\sigma$-finite measure space, let $f : M \to [0, \infty]$ be $\Sigma$-measurable and let $(\mathbb{R}, \Sigma_\lambda, \lambda)$ be the real numbers with Lebesgue measure.
Prove that $\int_M f^p \, d\mu = \int_0^\infty p \, t^{p-1} \mu\bigl( \{ x \in M : f(x) > t \} \bigr) \, dt$ for all $1 \le p < \infty$. (Measurability is not an issue because of Exercise 14-51a.)
Prove that if $g : (0, \infty) \to (0, \infty)$ is differentiable and $g'(x) > 0$ for all $x$, then $\int_M g \circ f \, d\mu = \int_0^\infty g'(t) \, \mu\bigl( \{ x \in M : f(x) > t \} \bigr) \, dt$.

18-47. Let $f, g \in L^1(\mathbb{R}^d)$.
(a) Prove that $C(x, t) := f(x - t) g(t)$ is Lebesgue measurable.
(b) Prove that for almost all $x \in \mathbb{R}^d$ the function $c_x(t) := f(x - t) g(t)$ is Lebesgue integrable and that the function
$f * g(x) := \int_{\mathbb{R}^d} f(x - t) g(t) \, dt$ if $c_x$ is Lebesgue integrable, and $f * g(x) := 0$ otherwise,
is Lebesgue integrable and $\| f * g \|_1 \le \| f \|_1 \| g \|_1$. The function $f * g$ is also called the convolution of $f$ and $g$.
(c) Prove that if $h \in L^1(\mathbb{R}^d)$, then $k(x) := h(-x)$ also is Lebesgue integrable.
(d) Prove that $f * g = g * f$.
(e) Prove that if $g$ is bounded, then $f * g$ is continuous. Hint. First prove the result for continuous $f$, then use that the continuous functions are dense in $L^1(\mathbb{R}^d)$.

18-48. Prove that the determinant $\det(v_1, \ldots, v_d)$ is the oriented $d$-dimensional volume of the parallelepiped spanned by the column vectors $v_1, \ldots, v_d$. Hint. Map a box to the parallelepiped and use the Multivariable Substitution Formula.

18-49. Prove that the hypothesis that $L$ is bijective can be dropped from Lemma 18.35. Hint. Prove that if the linear function $L$ is not bijective, then the Lebesgue measure of every image of a Lebesgue measurable set is 0.

18-50. Multidimensional Substitution with weaker hypotheses. Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^d$ be open subsets of $\mathbb{R}^d$ and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable function so that for almost all $x \in \Omega_1$ we have $\det( Dg(x) ) \neq 0$ and so that $\{ x \in \Omega_1 : (\exists z \in \Omega_1 : f(x) = f(z)) \}$ is a null set. Then for all [...]
Hint.
The set $\{ x \in \Omega_1 : \det(Dg(x)) = 0 \}$ is closed in $\Omega_1$. Prove that if $K$ is compact, then the set $\{ x \in K : (\exists z \in K : f(x) = f(z)) \}$ is closed, too. Apply Theorem 18.37 to the appropriate subset of $\Omega_1$ to first prove the result for compactly supported functions.

18-51. On the injectivity hypothesis of Theorem 18.37. Consider the function $g : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $g(x, y) = (x^2 - y^2, 2xy)$ (this function interprets $(x, y)$ as a complex number and squares it).
(a) Prove that for all $(x, y) \in \mathbb{R}^2$ the function $g$ is differentiable at $(x, y)$ and that $\det( Dg(x, y) ) = 4x^2 + 4y^2$.
(b) Prove that for all $(x, y) \in \mathbb{R}^2$ we have $\| g(x, y) \|_2 = x^2 + y^2$.
(c) Prove that for all $(a, b) \neq (0, 0)$ we have $g( [...] ) = (a, b)$.
(d) Prove that $g[ B_1(0, 0) ] = B_1(0, 0)$.
(e) Prove that for the function $f(x, y) = 1$ we have $\int_{B_1(0,0)} f \, d\lambda = \pi$, but the transformed integral is $\int_{B_1(0,0)} f \circ g \, |\det(Dg)| \, d\lambda = 2\pi$. Then explain why this result does not contradict Theorem 18.37. Hint. You may use the result of Exercise 18-42.

18-52. The effect of a diffeomorphism on Lebesgue measure. Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^d$, let $K \subseteq \Omega_1$ be compact and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function with $\det( Dg(x) ) \neq 0$ for all $x \in \Omega_1$. Then for every $\varepsilon > 0$ there is a $\delta > 0$ so that for every box $B \subseteq K$ with center point $x$ and with diameter less than $\delta$ we have $\bigl| \lambda( g[B] ) - |\det(Dg(x))| \, \lambda(B) \bigr| < \varepsilon \lambda(B)$. Hint. The differences between this result and Lemma 18.36 are the absolute value signs and that we can work with boxes rather than cubes. Use Theorem 18.37 with $f = 1$.

18-53. Line integrals and surface integrals are independent of the parametrization.
(a) Let $m \le d$, let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^m$ be open and let $r_1 : \Omega_1 \to \mathbb{R}^d$ and $r_2 : \Omega_2 \to \mathbb{R}^d$ be injective and continuously differentiable so that $r_1[\Omega_1] = r_2[\Omega_2]$ and so that all derivatives $Dr_i(x)$ are injective. Prove that $r_2^{-1} \circ r_1$ is a differentiable function from $\Omega_1$ to $\Omega_2$.
Hint. Let $x \in \Omega_1$. Find $v_{m+1}, \ldots, v_d$ so that $\bigl\{ \frac{\partial}{\partial x_1} r_1(x), \ldots, \frac{\partial}{\partial x_m} r_1(x), v_{m+1}, \ldots, v_d \bigr\}$ is a base of $\mathbb{R}^d$ and on $\Omega_1 \times \mathbb{R}^{d-m}$ set $R_1(z_1, \ldots, z_d) := r_1(z_1, \ldots, z_m) + \sum_{j=m+1}^{d} z_j v_j$. Use Corollary 17.66 to prove that $R_1$ is differentiable with differentiable inverse on a neighborhood of $(x_1, \ldots, x_m, 0, \ldots, 0)$. Define a similar function $R_2$ for $r_2$.
(b) Let $\delta > 0$ and let $r_1 : (a_1 - \delta, b_1 + \delta) \to \mathbb{R}^d$ and $r_2 : (a_2 - \delta, b_2 + \delta) \to \mathbb{R}^d$ be continuously differentiable functions with $r_1[ [a_1, b_1] ] = r_2[ [a_2, b_2] ]$, $r_1(a_1) = r_2(a_2)$, $r_1(b_1) = r_2(b_2)$ and so that all derivatives of the $r_j$ are not zero. Let $\Omega \subseteq \mathbb{R}^d$ be an open set that contains $r_1[ [a_1, b_1] ]$ and let $F : \Omega \to \mathbb{R}^d$ be continuously differentiable. Prove that [...]. Hint. $r_1 = r_2 \circ r_2^{-1} \circ r_1$.
(c) Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^2$ and let $r_1 : \Omega_1 \to \mathbb{R}^3$ and $r_2 : \Omega_2 \to \mathbb{R}^3$ be continuously differentiable with $r_1[\Omega_1] = r_2[\Omega_2]$ so that for all $(x, y) \in \Omega_1$ we have $\frac{\partial}{\partial x} r_1(x, y) \times \frac{\partial}{\partial y} r_1(x, y) \neq 0$, for all $(u, v) \in \Omega_2$ we have $\frac{\partial}{\partial u} r_2(u, v) \times \frac{\partial}{\partial v} r_2(u, v) \neq 0$, and so that there are points $(x, y) \in \Omega_1$ and $(u, v) \in \Omega_2$ so that $r_1(x, y) = r_2(u, v)$ and $\frac{\partial}{\partial x} r_1(x, y) \times \frac{\partial}{\partial y} r_1(x, y) = h \, \frac{\partial}{\partial u} r_2(u, v) \times \frac{\partial}{\partial v} r_2(u, v)$ for some $h > 0$. Let $\Omega \subseteq \mathbb{R}^3$ be an open set that contains $r_1[\Omega_1]$ and let $F : \Omega \to \mathbb{R}^3$ be continuously differentiable. Prove that [...]. Hint. $r_1 = r_2 \circ r_2^{-1} \circ r_1$. Then work out componentwise that the effect on the cross product is a multiplication with the determinant of the appropriate $2 \times 2$ matrix.

18-54. Continuous images of null sets can have nonzero measure. For this exercise, let the function $\psi : [0, 1] \to [0, 1]$ be Lebesgue's singular function from Exercise 11-22.
(a) Prove that if $g : [a, b] \to \mathbb{R}$ is a nondecreasing function, $h(x) = g(x) + x$ and $A$ is a set with $\lambda( g[A] ) > 0$, then $\lambda( h[A] ) > 0$.
(b) Prove that $F(x) := \psi(x) + x$ (where $\psi$ is Lebesgue's singular function from Exercise 11-22) defines a continuous bijective function from $[0, 1]$ to $[0, 2]$ so that $F^{-1}$ also is continuous and $\lambda( F[C^Q] ) > 0$.
(c) Prove that for all $d \in \mathbb{N}$ there is a null set $N$ in $\mathbb{R}^d$ and a continuous bijective function $G$ on an open subset of $\mathbb{R}^d$ that contains $N$ so that $\lambda( G[N] ) > 0$.

18-55. Prove that if $f : [a, b] \to \mathbb{R}$ is absolutely continuous and nondecreasing and $N \subseteq [a, b]$ is a null set, then $f[N]$ also is a null set.

18-56. Uniform limits of continuous functions with additional properties need not have these additional properties.
(a) Use Exercises 18-54 and 18-55 to prove that if $\{ f_n \}_{n=1}^{\infty}$ is a uniformly convergent sequence of absolutely continuous functions on $[a, b]$, then the limit need not be absolutely continuous.
(b) Prove that if $\{ f_n \}_{n=1}^{\infty}$ is a uniformly convergent sequence of Lipschitz continuous functions on $[a, b]$, then the limit need not be Lipschitz continuous.

Chapter 19
Introduction to Differential Geometry

In applications, it is often necessary to describe surfaces, like the $d$-dimensional spheres $\{ p \in \mathbb{R}^d : \|p\|_2 = r \}$ of radius $r$, or, more generally, lower dimensional subsets $S$ of a higher dimensional space. To describe such subsets or surfaces, we use parametrizations, that is, bijective, continuously differentiable functions $g : \Omega \to S$ with injective derivatives $Dg(x)$, where $\Omega$ is an open subset of $\mathbb{R}^m$ for an appropriate $m$. Unfortunately, for many surfaces such an overall parametrization cannot be defined. For example, for the spheres of radius 1 it can be shown (Exercise 19-12) that a parametrization $g$ as mentioned must have a continuous inverse function $g^{-1} : S \to \Omega$. But $S$ is compact and $\Omega$ is not compact, which is impossible by Theorem 16.62.
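As a hedged illustration of this obstruction (the example and the names $S$, $g_1$, $g_2$ are chosen here and do not come from the text), consider the unit circle in $\mathbb{R}^2$. Each of the two maps below is injective and continuously differentiable with injective derivative, but each misses one point of the circle, so only the two together cover it.

```latex
% Illustrative sketch; S, g_1 and g_2 are names assumed for this example only.
\begin{align*}
S &:= \{ p \in \mathbb{R}^2 : \|p\|_2 = 1 \}, \\
g_1 &: (0, 2\pi) \to S, & g_1(t) &:= (\cos t, \sin t), & g_1\bigl[ (0, 2\pi) \bigr] &= S \setminus \{ (1, 0) \}, \\
g_2 &: (-\pi, \pi) \to S, & g_2(t) &:= (\cos t, \sin t), & g_2\bigl[ (-\pi, \pi) \bigr] &= S \setminus \{ (-1, 0) \}, \\
Dg_i(t) &= \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix} \neq 0 & &\text{for all } t.
\end{align*}
```

Neither map alone parametrizes all of $S$, which matches the compactness argument: the domains are open intervals, while $S$ is compact.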
On the other hand, the Implicit Function Theorem guarantees that for many surfaces that are solution sets of equations $f(x, y) = 0$, where both $x$ and $y$ can be in Banach spaces, we can locally find parametrizations. This is the idea behind the definition of a manifold. This chapter introduces manifolds in Section 19.1 and their tangent spaces and differentiable functions in Section 19.2. Sections 19.3, 19.4, and 19.5 build the integration theory on manifolds and the chapter culminates in Section 19.6 with Stokes’ Theorem.

19.1 Manifolds

The fundamental idea behind a manifold is that it “locally looks like $m$-dimensional space.” This reflects our intuition that differentiable surfaces, despite being curved, locally look like two-dimensional space.

Figure 48: Manifolds (a) are spaces that locally look like $\mathbb{R}^m$. The atlas of a $C^\infty$-manifold is a collection of homeomorphisms $\{ x_i \}_{i \in I}$ into the appropriate $\mathbb{R}^m$ so that the compositions $x_i \circ x_j^{-1}$ are diffeomorphisms wherever they are defined. Embedded manifolds (b) are subspaces of $\mathbb{R}^d$ so that each point of the manifold has a neighborhood in $\mathbb{R}^d$ that can be transformed so that the image of the intersection of the manifold with the neighborhood lies in the hyperplane in which the $(m+1)$st through $d$th coordinates are zero.

Definition 19.1 Let $m \in \mathbb{N}$. A metric space $M$ is called an $m$-dimensional (topological) manifold iff for each $p \in M$ there is an open set $O \subseteq M$ so that $p \in O$ and $O$ is homeomorphic to $\mathbb{R}^m$. (See Figure 48.)

Equivalently, in the definition of a manifold we could demand that each $O$ is homeomorphic to an open subset of $\mathbb{R}^m$ (see Exercise 19-1). For further investigations, we also need differentiability properties. Because $M$ is just a metric space, these properties are introduced through differentiability properties of compositions of the homeomorphisms from the definition.

Definition 19.2 Let $U, V \subseteq \mathbb{R}^d$ be open. A bijective, infinitely differentiable function $h : U \to V$ with infinitely differentiable inverse is called a diffeomorphism.

Definition 19.3 Let $m \in \mathbb{N}$ and let $M$ be an $m$-dimensional manifold. A family $\{ x_i \}_{i \in I}$ is called an atlas for $M$ iff each $x_i$ is a homeomorphism from an open subset $O_i$ of $M$ to an open subset of $\mathbb{R}^m$, for each $p \in M$ there is an $i \in I$ so that $p \in O_i$ and for all $i, j \in I$ the composition $x_i \circ x_j^{-1} : x_j[ O_i \cap O_j ] \to x_i[ O_i \cap O_j ]$ is a diffeomorphism. The functions $x_i$ are also called charts or coordinate systems. (See Figure 48.)

The inverse $x^{-1}$ of a coordinate system $x : U \to \mathbb{R}^m$ can be interpreted as a parametrization of the subset $U$ of $M$.

Definition 19.4 A pair $( M, \{ x_i \}_{i \in I} )$ of an $m$-dimensional manifold $M$ and an atlas $\{ x_i \}_{i \in I}$ is also called an $m$-dimensional $C^\infty$-manifold. As for spaces, we typically refer to $C^\infty$-manifolds through the set $M$, implicitly assuming that an atlas is given.

Of course, every $C^\infty$-manifold is a manifold. A $C^k$-diffeomorphism is a bijective, $k$ times continuously differentiable function with $k$ times continuously differentiable inverse. By using $C^k$-diffeomorphisms instead of diffeomorphisms, it is possible to define $C^k$-manifolds. Working with $C^k$-manifolds would require us to keep track of detailed differentiability conditions. Hence, throughout this chapter we will work with $C^\infty$-manifolds and we will simply refer to them as manifolds. Results similar to the ones derived in this chapter also hold for $C^k$-manifolds.

Trivially $( \mathbb{R}^d, \{ \mathrm{id}_{\mathbb{R}^d} \} )$ is a manifold. Whenever we work with $\mathbb{R}^d$ as a manifold we will assume that the atlas is $\{ \mathrm{id}_{\mathbb{R}^d} \}$. Our first nontrivial examples of manifolds are subsets of $d$-dimensional space. These "embedded manifolds" will be of particular interest throughout. They arise frequently in applications and we will use them as examples and for motivation of abstract definitions.

Definition 19.5 Let $m, d \in \mathbb{N}$ be so that $m \le d$.
A set $M \subseteq \mathbb{R}^d$ is called an $m$-dimensional embedded manifold iff for every $p \in M$ there is an open neighborhood $U \subseteq \mathbb{R}^d$ of $p$, an open set $V \subseteq \mathbb{R}^d$ and a diffeomorphism $h : U \to V$ for which $h[ U \cap M ] = \{ v \in V : v_{m+1} = \cdots = v_d = 0 \}$. (See Figure 48.)

Proposition 19.6 Every $m$-dimensional embedded manifold $M$ is an $m$-dimensional manifold.

Proof. For each $p \in M$, let $U_p$ be an open subset of $\mathbb{R}^d$ that contains $p$ and for which there is a diffeomorphism $h_p : U_p \to V_p$ with $V_p \subseteq \mathbb{R}^d$ open and so that $h_p[ U_p \cap M ] = \{ v \in V_p : v_{m+1} = \cdots = v_d = 0 \}$. Let $\pi_{\mathbb{R}^m} : \mathbb{R}^d \to \mathbb{R}^m$ be the projection onto the first $m$ coordinates. For all $p \in M$ let $x_p := \pi_{\mathbb{R}^m} \circ h_p |_{U_p \cap M}$. Clearly, each $x_p$ is a homeomorphism from the set $O_p := U_p \cap M$ to the subset $\pi_{\mathbb{R}^m}\bigl[ \{ v \in V_p : v_{m+1} = \cdots = v_d = 0 \} \bigr]$ of $\mathbb{R}^m$. For all $p, q \in M$, the composition $h_q \circ h_p^{-1} : h_p[ U_p \cap U_q ] \to h_q[ U_p \cap U_q ]$ is a diffeomorphism. Moreover, the sets $x_p[ O_p \cap O_q ] = \pi_{\mathbb{R}^m} \circ h_p[ U_p \cap U_q \cap M ]$ and $x_q[ O_p \cap O_q ] = \pi_{\mathbb{R}^m} \circ h_q[ U_p \cap U_q \cap M ]$ are projections of the intersections of open subsets with the subspace $\mathbb{R}^m \times \{0\}^{d-m}$ of $\mathbb{R}^d$, which means they are open subsets of $\mathbb{R}^m$. With $e_{\mathbb{R}^m} : \mathbb{R}^m \to \mathbb{R}^d$ being the natural embedding that maps $\mathbb{R}^m$ to $\mathbb{R}^m \times \{0\}^{d-m}$, the composition $x_q \circ x_p^{-1} = \pi_{\mathbb{R}^m} \circ h_q \circ h_p^{-1} \circ e_{\mathbb{R}^m} \big|_{\pi_{\mathbb{R}^m}[ h_p[ U_p \cap U_q \cap M ] ]}$ is a diffeomorphism. Therefore $\{ x_p \}_{p \in M}$ is an atlas, and hence $M$ is an $m$-dimensional manifold. ∎

Whenever we work with an embedded manifold, we will assume that its atlas was generated as in the proof of Proposition 19.6. Embedded manifolds arise naturally when solving equations.

Theorem 19.7 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}^{d-m}$ be an infinitely differentiable function so that for all $p \in \Omega$ with $f(p) = 0$ the matrix $Df(p)$ has rank $d - m$. Then $f^{-1}(0)$ is an $m$-dimensional embedded manifold in $\mathbb{R}^d$.

Proof. Let $p \in \Omega$ be so that $f(p) = 0$. Apply Corollary 17.69 to obtain an open set $G \subseteq \mathbb{R}^d$ and a diffeomorphism $g : G \to g[G]$ so that $p \in g[G] \subseteq \Omega$ and $f \circ g(v_1, \ldots, v_d) = (v_{m+1}, \ldots, v_d)$ for all $(v_1, \ldots, v_d) \in G$. Then for all $(v_1, \ldots, v_d) \in g^{-1}\bigl[ f^{-1}(0) \cap g[G] \bigr]$ we obtain $f \circ g(v_1, \ldots, v_d) = 0$, which means $v_{m+1} = \cdots = v_d = 0$. With $U := g[G]$, $V := G$ and $h := g^{-1}$ we see that, because $p$ was arbitrary, $f^{-1}(0)$ is an $m$-dimensional embedded manifold. ∎

Proposition 19.6 and Theorem 19.7 provide a multitude of concrete examples.

Example 19.8 Examples of (embedded) manifolds.

1. Trivially, every open subset $\Omega$ of $\mathbb{R}^d$ is a $d$-dimensional manifold with atlas $\{ i_\Omega \}$, where $i_\Omega(p) = p$ for all $p \in \Omega$. This observation is trivial, but it will help with integration over embedded manifolds.

2. Let $r > 0$. Because $f : \mathbb{R}^d \to \mathbb{R}$ defined by $f(x_1, \ldots, x_d) := r^2 - \sum_{j=1}^{d} x_j^2$ is infinitely differentiable, every sphere $\{ p \in \mathbb{R}^d : f(p) = 0 \}$ centered around the origin is a $(d-1)$-dimensional manifold.

3. A function $a : \mathbb{R}^d \to \mathbb{R}$ is called affine linear iff there is a nonzero linear function $L : \mathbb{R}^d \to \mathbb{R}$ and an $r \in \mathbb{R}$ so that $a(x) = r + L[x]$ for all $x \in \mathbb{R}^d$. Because affine linear functions are infinitely differentiable, the intersection of any hyperplane $\{ p \in \mathbb{R}^d : a(p) = 0 \}$ with an open set is a $(d-1)$-dimensional manifold.

4. More generally, the level surfaces $f(x) = k$ of any infinitely differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ with nonzero derivatives $Df(x)$ are $(d-1)$-dimensional manifolds. Under mild hypotheses on how the surfaces intersect, for $n < d$ the intersection of $n$ level surfaces $f_1(x) = k_1, \ldots, f_n(x) = k_n$ is a $(d-n)$-dimensional manifold, because we can use $F(x) := ( f_1(x) - k_1, \ldots, f_n(x) - k_n )$ and apply Proposition 19.6 and Theorem 19.7. □

Exercise 19-2 provides further examples. We can also obtain examples of manifolds by considering subspaces.

Definition 19.9 Let $M$ be a manifold and let $U \subseteq M$ be an open subset of $M$. Then $U$ is called an open submanifold of $M$.

Exercise 19-3 shows that open submanifolds are indeed manifolds themselves.
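The sphere in Example 19.8 can be checked against the rank hypothesis of Theorem 19.7 directly. The following short computation is a sketch added for illustration, with $m = d - 1$ so that the required rank is $d - m = 1$; it is not part of the original example.

```latex
% Rank check for the sphere of radius r > 0, assuming the setting of Theorem 19.7 with m = d - 1.
\begin{align*}
f(x_1, \ldots, x_d) &= r^2 - \sum_{j=1}^{d} x_j^2, \\
Df(p) &= \begin{pmatrix} -2p_1 & \cdots & -2p_d \end{pmatrix}, \\
f(p) = 0 \;&\Longrightarrow\; \sum_{j=1}^{d} p_j^2 = r^2 > 0
  \;\Longrightarrow\; p \neq 0
  \;\Longrightarrow\; \operatorname{rank} Df(p) = 1 = d - (d - 1).
\end{align*}
```

The only point where $Df$ vanishes is the origin, which does not lie on the sphere because $r > 0$.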
Some interesting sets cannot be described as manifolds. For example, the closed ball $\overline{B_1(0)} \subset \mathbb{R}^d$ (with respect to the Euclidean norm) is not a manifold, because none of the boundary points has a neighborhood that is isomorphic to $\mathbb{R}^d$. The observation that these points have neighborhoods that are isomorphic to a half-space leads to the idea of a manifold with boundary.

Definition 19.10 Let $m \in \mathbb{N}$. A metric space $M$ is called an $m$-dimensional (topological) manifold with boundary iff for each point $p \in M$ there is an open neighborhood $O \subseteq M$ of $p$ that is homeomorphic to Euclidean space $\mathbb{R}^m$ or to the upper half space $H^m := \{ (x_1, \ldots, x_m) \in \mathbb{R}^m : x_m \ge 0 \}$. Points $p \in M$ that do not have neighborhoods isomorphic to $\mathbb{R}^m$ are also called boundary points and the set of all these points is called the boundary $\partial M$ of $M$.

To define atlases for manifolds with boundary, we define the following.

Definition 19.11 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$ and let $B \subseteq \mathbb{R}^d$ be so that $\Omega \subseteq B \subseteq \overline{\Omega}$. Then $C^k(B)$ denotes the set of restrictions to $B$ of functions that are $k$ times differentiable on a neighborhood of $B$. Similarly, $C^\infty(B)$ denotes the set of restrictions to $B$ of functions that are infinitely differentiable on a neighborhood of $B$.

Because the ranges of the coordinate systems of manifolds with boundary are sets as in Definition 19.11, atlases for manifolds with boundary, $C^k$-manifolds with boundary and $C^\infty$-manifolds with boundary are defined similar to atlases for manifolds, $C^k$-manifolds, and $C^\infty$-manifolds, respectively. We will also refer to $C^\infty$-manifolds with boundary simply as manifolds with boundary. Exercise 19-4 shows that the boundary of a manifold with boundary is a manifold. As for manifolds, subsets of $\mathbb{R}^d$ are of particular interest.

Definition 19.12 Let $m, d \in \mathbb{N}$ be so that $m \le d$. A set $M \subseteq \mathbb{R}^d$ is called an $m$-dimensional embedded manifold with boundary iff for every $p \in M$ there is an open set $U \subseteq \mathbb{R}^d$ containing $p$ and an open set $V \subseteq \mathbb{R}^d$ such that either

1. There is a diffeomorphism $h : U \to V$ so that $h[ U \cap M ] = \{ v \in V : v_{m+1} = \cdots = v_d = 0 \}$, or

2. There is a diffeomorphism $h : U \to V$ so that the $m$th component of $h(p)$ is zero and $h[ U \cap M ] = \{ v \in V : v_m \ge 0, v_{m+1} = \cdots = v_d = 0 \}$.

The reader will prove in Exercise 19-5 that every embedded manifold with boundary is a manifold with boundary. Corners cannot be described with differentiable functions. Therefore, some interesting sets, such as the cube $[0, 1]^d$, do not have a satisfactory description as embedded manifolds with boundary (see Exercise 19-13b). To include these sets, we define manifolds with corners similar to manifolds with boundary.

Definition 19.13 Let $m \in \mathbb{N}$. A metric space $M$ is called an $m$-dimensional (topological) manifold with corners iff each $p \in M$ has an open neighborhood $O \subseteq M$ that is homeomorphic to a subspace $C_k := \{ (x_1, \ldots, x_m) \in \mathbb{R}^m : x_k \ge 0, \ldots, x_m \ge 0 \}$ with $k \in \{ 1, \ldots, m \}$ or to $\mathbb{R}^m$. Points $p \in M$ that do not have neighborhoods isomorphic to $\mathbb{R}^m$ are also called boundary points and the set of all these points is called the boundary $\partial M$ of $M$.

Atlases for manifolds with corners, $C^k$-manifolds with corners and $C^\infty$-manifolds with corners are defined similar to atlases for manifolds with boundary, $C^k$-manifolds with boundary, and $C^\infty$-manifolds with boundary, respectively. We will also refer to $C^\infty$-manifolds with corners simply as manifolds with corners. For a $C^\infty$-manifold with corners, we will say that a point $p \in M$ is contained in a corner iff there is a homeomorphism $x$ from a neighborhood of $p$ to a space $C_k$ with $k < m$ and $x(p) = 0$. Embedded manifolds with corners are touched upon in Exercise 19-6.

We should also note that, formally, every manifold with boundary is also a manifold with corners. We will typically not unify the two concepts, even when proving results valid for manifolds with corners, because “manifold with corners” should explicitly indicate the presence of corners, while a manifold with boundary is assumed to be smooth.

Exercises

19-1. Let $M$ be a metric space, let $p \in M$, let $O \subseteq M$ be an open neighborhood of $p$, let $V \subseteq \mathbb{R}^m$ be open and let $x : O \to V$ be a homeomorphism. Prove that there is an open neighborhood $U$ of $p$ and a homeomorphism $y : U \to \mathbb{R}^m$. Hint. Consider the restriction of $x$ to the inverse image of a small ball around $x(p)$ and then map that ball diffeomorphically to $\mathbb{R}^m$.

19-2. More examples of manifolds.
(a) Let $a, b, c \in (0, \infty)$. Prove that the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ in $\mathbb{R}^3$ is a 2-dimensional manifold. Also construct an atlas.
(b) Let $\Omega \subseteq \mathbb{R}^2$ and let $f : \Omega \to \mathbb{R}$ be infinitely differentiable with $Df(x) \neq 0$ for all $x \in \Omega$. Prove that every level curve $f(x, y) = k$ is a one-dimensional manifold.
(c) Prove that $SL(n, \mathbb{R}) := \{ A \in M(n \times n, \mathbb{R}) : \det(A) = 1 \}$ is an $n^2 - 1$ dimensional manifold.

19-3. Prove that if $M$ is a manifold and $U$ is an open submanifold, then $U$ is also a manifold.

19-4. Prove that if $M$ is a manifold with boundary, then $\partial M$ is a manifold.

19-5. Let $M \subseteq \mathbb{R}^d$ be an embedded manifold with boundary.
(a) Prove that $M$ is a manifold with boundary.
(b) Prove that the two conditions in the definition of an embedded manifold with boundary are mutually exclusive. That is, every point of $M$ satisfies either 1 or 2, but not both. Hint. Assume a point satisfies both and use Corollary 17.66.
(c) Prove that the points of $M$ that satisfy condition 2 are the boundary points of $M$.
(d) Prove that if $M$ is the closure of an open subset of $\mathbb{R}^d$, then $\partial M = \delta M$, that is, the boundary of $M$ as a manifold with boundary equals its topological boundary.
(e) Prove that $\partial M$ is an embedded manifold.

19-6.
Define embedded manifolds with corners and prove results similar to those in Exercise 19-5.

19-7. Prove that if $M$ is a manifold with corners, then $\partial M$ is a union of manifolds with corners.

19-8. Prove that every connected manifold is pathwise connected.

19-9. Prove that every manifold is locally compact and that every connected manifold is $\sigma$-compact.

19-10. Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^m$ be open sets. Prove that if $g : \Omega_1 \to \Omega_2$ is a $C^1$-diffeomorphism, then $\det( Dg(x) ) \neq 0$ for all $x \in \Omega_1$.

19-11. Prove that a set $M \subseteq \mathbb{R}^d$ is an $m$-dimensional embedded manifold in $\mathbb{R}^d$ iff for each $p \in M$ there are an open neighborhood $G \subseteq \mathbb{R}^d$, an open set $N \subseteq \mathbb{R}^m$ and an injective infinitely differentiable function $f : N \to G$ so that $f[N] = M \cap G$, $Df(z)$ has rank $m$ for all $z \in N$ and $f^{-1} : f[N] \to N$ is continuous. Hint. For “$\Leftarrow$,” let $x \in N$ be so that $f(x) = p$ and let the vectors $v_{m+1}, \ldots, v_d \in \mathbb{R}^d$ be so that the set $\bigl\{ \frac{\partial}{\partial x_1} f(x), \ldots, \frac{\partial}{\partial x_m} f(x), v_{m+1}, \ldots, v_d \bigr\}$ is a base of $\mathbb{R}^d$. Then apply Corollary 17.66 to the function $F(z_1, \ldots, z_d) = f(z_1, \ldots, z_m) + \sum_{j=m+1}^{d} z_j v_j$.

19-12. Let $\Omega \subseteq \mathbb{R}^2$ be open. Prove that no function $f : \Omega \to \{ p \in \mathbb{R}^2 : \|p\|_2 = 1 \}$ can be bijective, continuously differentiable and so that all $Df(x)$ are injective. Hint. Use an idea similar to that for Exercise 19-11 to prove that $f^{-1}$ is continuous.

19-13. Corners cannot be described with embedded manifolds. Let $d \ge 2$.
(a) Prove that the graph $\{ (t, |t|, 0, \ldots, 0) : t \in (-1, 1) \}$ of the absolute value function is not an embedded manifold in $\mathbb{R}^d$. Hint. Use Exercise 19-11.
(b) Prove that the cube $[0, 1]^d$ is not an embedded manifold with boundary in $\mathbb{R}^d$. Hint. Use Exercise 19-5e and arguments similar to part 19-13a.

19-14. Let $M$ be a manifold with corners.
(a) Prove that if $p \in M$ is in a corner isomorphic to the origin in $C_k$, then there are continuous functions $c_k, \ldots, c_m : [0, 1] \to M$ so that each $c_i[ [0, 1] ]$ is contained in a corner and the intersection of any two distinct sets $c_i[ [0, 1] ]$ is $\{ p \}$.
(b) Prove that the set $C_M := \{ p \in M : p \text{ is in a corner} \}$ is closed.
(c) Prove that if $M$ is compact, then there are finitely many continuous functions $c_1, \ldots, c_n$ so that $C_M = \bigcup_{i=1}^{n} c_i[ [0, 1] ]$.

19-15. Prove that if $\mathcal{A}$ is an atlas of the manifold $M$, then the set $\bigcup \{ \mathcal{A}' : \mathcal{A} \subseteq \mathcal{A}' \text{ and } \mathcal{A}' \text{ is an atlas of } M \}$ is also an atlas of $M$. Note. This atlas is called the maximal atlas of $M$.

19.2 Tangent Spaces and Differentiable Functions

The definition of differentiable functions on manifolds (see Definition 19.14 below) is motivated by the fact that compositions of differentiable functions are again differentiable. However, rather than using differentiability of the factors to obtain differentiability of the composition, differentiability of the composition is used to define differentiability of the middle factor. Exercise 19-16a shows that this idea is consistent with the idea of differentiability according to Definition 17.24.

Definition 19.14 Let $M, N$ be manifolds and let $f : M \to N$ be a function. Then $f$ is called differentiable iff for all $x$ in the atlas of $M$ and all $y$ in the atlas of $N$ the composition $y \circ f \circ x^{-1}$ is differentiable.

Note that Definition 19.14 does not depend on the atlases used for $M$ and $N$, as long as there is a containment relation between the atlases for each manifold. This is because if $x_1$ and $x_2$ are coordinate systems of $M$ with overlapping domains, then the composition $x_1 \circ x_2^{-1}$ is differentiable and similar for coordinate systems $y_1, y_2$ for $N$. Therefore differentiability of $y_1 \circ f \circ x_1^{-1}$ implies differentiability of $y_2 \circ f \circ x_2^{-1}$ on a subset of its domain and the domain of $y_2 \circ f \circ x_2^{-1}$ can be pieced together from overlaps with domains of similar compositions using functions $x_1$ and $y_1$ from smaller atlases of $M$ and $N$, respectively. The details are left to Exercise 19-16b.
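The change-of-chart argument in the preceding paragraph can be written as a single identity. The display below is a sketch added for illustration; the names $O_i$ for the domain of $x_i$ and $W_i$ for the domain of $y_i$ are assumptions introduced here and are not notation from the text.

```latex
% O_i denotes the domain of x_i and W_i the domain of y_i (names assumed for this sketch).
\[
y_2 \circ f \circ x_2^{-1}
  = \bigl( y_2 \circ y_1^{-1} \bigr) \circ \bigl( y_1 \circ f \circ x_1^{-1} \bigr) \circ \bigl( x_1 \circ x_2^{-1} \bigr)
  \quad \text{on } x_2\bigl[ O_1 \cap O_2 \cap f^{-1}[ W_1 \cap W_2 ] \bigr].
\]
```

The two outer factors are diffeomorphisms between open subsets of Euclidean spaces, so on this set the left side is differentiable whenever the middle factor is.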
Proposition 19.15 and Exercise 19-17 show that differentiable functions on manifolds behave as we would expect differentiable functions to behave when it comes to domain restrictions and compositions.

Proposition 19.15 Let $M, N$ be manifolds, let $U \subseteq M$ be an open submanifold and let $f : M \to N$ be a differentiable function. Then $f|_U : U \to N$ also is differentiable.

Proof. Exercise 19-18. ∎

Before we show that there are indeed many examples of differentiable functions, we introduce higher orders of differentiability. Throughout, unless otherwise stated, the dimension of a manifold $M$ will be $m$ and the dimension of a manifold $N$ will be $n$.

Definition 19.16 Let $M, N$ be manifolds, let $k \in \mathbb{N}$ and let $U \subseteq M$ be open. Then $f : M \to N$ is a $C^k$ function on $U$ iff for all coordinate systems $x : V \to \mathbb{R}^m$ and $y : W \to \mathbb{R}^n$ the composition $y \circ f \circ x^{-1} \big|_{x[ U \cap V \cap f^{-1}[W] ]}$ is a $C^k$ function. A function that is $C^k$ for all $k \in \mathbb{N}$ is called $C^\infty$. A bijective $C^k$ function whose inverse also is $C^k$ is called a $C^k$-diffeomorphism.

Proposition 19.17 Let $M$ be a manifold. A function $f : M \to \mathbb{R}$ is $C^\infty$ iff for each $p \in M$ there is a neighborhood $U_p$ of $p$ so that $f$ is $C^\infty$ on $U_p$.

Proof. Exercise 19-20. ∎

The next result is a translation of Lemma 18.11 to manifolds.

Theorem 19.18 Let $M$ be a manifold, let $C \subseteq M$ be compact and let $U \subseteq M$ be open so that $C \subseteq U$. Then there is a $C^\infty$ function $f : M \to [0, 1]$ so that $f|_C = 1$ and $\operatorname{supp}(f) \subseteq U$.

Proof. Let $p \in C$ and let $x : V \to \mathbb{R}^m$ be a coordinate system around $p$. Then there is an $\varepsilon_p > 0$ so that $B_{\varepsilon_p}(p) \subseteq U \cap V$. Because $x^{-1}$ is continuous, the image $U_{\mathbb{R}^d} := x\bigl[ B_{2\varepsilon_p/3}(p) \bigr]$ is open and $C_{\mathbb{R}^d} := x\bigl[ \overline{B_{\varepsilon_p/3}(p)} \cap C \bigr]$ is closed. With $1_{C_{\mathbb{R}^d}, U_{\mathbb{R}^d}}$ as in Lemma 18.11 let $f_p(q) := 1_{C_{\mathbb{R}^d}, U_{\mathbb{R}^d}} \circ x(q)$ for all $q \in U \cap V$ and let $f_p(q) := 0$ for all other $q \in M$. It is easy to see that $\operatorname{supp}(f_p) \subseteq \overline{B_{2\varepsilon_p/3}(p)} \subseteq U$.

To prove that $f_p$ is $C^\infty$, let $q \in M$. If $f_p$ is identical to zero in a neighborhood of $q$, then clearly $f_p$ is $C^\infty$ in that neighborhood. If $f_p$ is not identical to zero in any neighborhood of $q$, then $q \in \operatorname{supp}(f_p) \subseteq \overline{B_{2\varepsilon_p/3}(p)}$. Let $y$ be any coordinate system around $q$ and let $\varepsilon \in \bigl( 0, \frac{\varepsilon_p}{3} \bigr)$ be so that $B_\varepsilon(q)$ is contained in the domain of $y$. Because $\varepsilon < \frac{\varepsilon_p}{3}$ and $q \in \operatorname{supp}(f_p)$, $B_\varepsilon(q)$ is also contained in the domain of $x$. Thus $f_p \circ y^{-1} \big|_{y[ B_\varepsilon(q) ]} = 1_{C_{\mathbb{R}^d}, U_{\mathbb{R}^d}} \circ x \circ y^{-1} \big|_{y[ B_\varepsilon(q) ]}$ is $C^\infty$. Because $q \in M$ was arbitrary, $f_p$ is $C^\infty$.

Because $C \subseteq \bigcup_{p \in C} B_{\varepsilon_p/3}(p)$ and $C$ is compact, there are points $p_1, \ldots, p_n \in C$ so that $C \subseteq \bigcup_{j=1}^{n} B_{\varepsilon_{p_j}/3}(p_j)$. Now $g := \sum_{j=1}^{n} f_{p_j}$ is $C^\infty$, $g|_C \ge 1$ and $\operatorname{supp}(g) \subseteq U$. Hence, $f := j_1 \circ g$ (see Lemma 18.9 for the definition of $j_1$) is as desired. ∎

Figure 49: If $M$ is an embedded manifold and $h$ is as in Definition 19.5, then the derivative of $h^{-1}$ maps the horizontal space into which $h[ U \cap M ]$ is embedded to a space that is tangential to $M$.

Although Definition 19.14 defines differentiability on a manifold, it is not satisfactory by itself. After all, differentiable functions have a derivative and Definition 19.14 does not tell us what the derivative is or what it should be. The problem is that a manifold does not have any linear structure. Without linear structure there is no way to shift a linear function $L[\,\cdot - x]$ so that $f(x) + L[\,\cdot - x]$ is locally a good approximation of $f$, as we did in Definition 17.24 (also see Exercise 17-25). In fact, we cannot even define linear functions, which means on a manifold we must start from scratch.

For embedded manifolds, there is a notion of differentiability in the surrounding space. Hence, we will use embedded manifolds as guidance. Throughout, we will make sure that our newly defined notions are consistent with what we should expect for embedded manifolds. So consider an embedded manifold $M \subseteq \mathbb{R}^d$ and let $h : U \to V$ be as in Definition 19.5.
Then for all $p \in U \cap M$ the function $h^{-1}$ is differentiable at $h(p)$ and near $p$ the affine function $h^{-1}( h(p) ) + Dh^{-1}( h(p) )[\,\cdot - h(p)]$ is tangential to $h^{-1}$ (see Figure 49). With $\mathbb{R}^m \times \{0\}^{d-m} := \{ v \in \mathbb{R}^d : v_{m+1} = \cdots = v_d = 0 \}$ there is an open set $V$ in $\mathbb{R}^d$ so that $h$ maps $U \cap M$ to the set $V \cap \bigl( \mathbb{R}^m \times \{0\}^{d-m} \bigr)$. It follows directly from the definition of differentiability that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in V \cap \bigl( \mathbb{R}^m \times \{0\}^{d-m} \bigr)$ with $\| z - h(p) \| < \delta$ we have $\bigl\| h^{-1}(z) - h^{-1}( h(p) ) - Dh^{-1}( h(p) )[ z - h(p) ] \bigr\| \le \varepsilon \| z - h(p) \|$. Therefore, the function $p + Dh^{-1}( h(p) )[\,\cdot - h(p)] \big|_{\mathbb{R}^m \times \{0\}^{d-m}}$ is tangential to $h^{-1} \big|_{V \cap ( \mathbb{R}^m \times \{0\}^{d-m} )}$. Geometrically, this means that the set of points $p + Dh^{-1}( h(p) )\bigl[ \mathbb{R}^m \times \{0\}^{d-m} \bigr]$ is the tangent space of $M$ at $p$ (see Figure 49). It can be shown via the Chain Rule that this tangent space does not depend on the choice of $h$ (see Exercise 19-19). Moreover, if $M$ and $N$ are embedded manifolds and $f$ is a differentiable function on a neighborhood of $M$ so that $f[M] \subseteq N$, then the derivative of $f$ at $p$ maps the above defined tangent space of $M$ at $p$ to the similarly defined tangent space of $N$ at $f(p)$.

This means we can reinterpret derivatives as functions that map the right tangent spaces to each other. But for an abstract manifold there is no surrounding space in which to define tangent spaces on which the derivative of a function $f : M \to N$ could operate. Thus we first need to create tangent spaces. There are numerous ways in which tangent spaces can be defined in differential geometry and they are all equivalent in a certain sense. We choose a simple definition here and reinterpret it when this becomes conducive to the investigation in Section 19.4. If $x$ is a coordinate system for the manifold $M$, then for every element $p$ in the domain of $x$ the space $\mathbb{R}^m$ is the tangent space (of $\mathbb{R}^m$ itself) at $x(p)$.
This is because the only space that could be tangential to an open set is the surrounding space itself. Proposition 19.19 shows that if $x$ and $y$ are coordinate systems of $M$ and $p$ is in the domains of $x$ and $y$, then the tangent vectors in the tangent spaces at points $x(p)$ and $y(p)$ can be identified in such a way that they are useable as tangent vectors for $M$ at $p$. Proposition 19.20 shows that this idea is consistent with the idea of a tangent space for embedded manifolds as described above.

The remaining results in this section will involve a lot of work with equivalence relations. Because every point of a manifold can be in the domain of several coordinate systems, at each step we must assure that our definitions do not depend on the specific coordinate system we use. This level of detail would be hard to work with on a regular basis. It would be similar to recalling that formally every real number is an equivalence class of Cauchy sequences of rational numbers (see remarks after the proof of Theorem 16.89) whenever we work with real numbers. Clearly, this would be overkill. Therefore, we will establish that tangent spaces and the functions that take the place of derivatives behave as they should and we will subsequently use these properties, with the details of the definitions only to be used when this is unavoidable.

Proposition 19.19 Let $M$ be an $m$-dimensional manifold and let $p \in M$. For all $v, w \in \mathbb{R}^m$ and for all coordinate systems $x, y$ so that $p$ is in the domains of $x$ and $y$, define the relation $\sim_p$ by $(x, v) \sim_p (y, w)$ iff $w = D\bigl( y \circ x^{-1} \bigr)( x(p) )[v]$. Then the relation $\sim_p$ is an equivalence relation. Moreover, if we denote the equivalence classes by $[x, v]_p$ and the set of equivalence classes by $M_p$, then the binary operations $[x, v]_p + [x, w]_p := [x, v + w]_p$ and $\alpha [x, v]_p := [x, \alpha v]_p$, where $\alpha \in \mathbb{R}$, are well-defined and with these operations $M_p$ is a vector space that is isomorphic to $\mathbb{R}^m$.

Proof. Left to the reader as Exercise 19-21.
The proof that $\sim_p$ is an equivalence relation relies on the formula for the derivative of the inverse function for symmetry and on the Chain Rule for transitivity.

Proposition 19.20 Let $M \subseteq \mathbb{R}^d$ be an $m$-dimensional embedded manifold, let $p \in M$ and let $x : U \to \mathbb{R}^m$ be a coordinate system around $p$ as constructed in Proposition 19.6. Then $x^{-1}$ is differentiable in the sense of Definition 17.24 and the "embedded tangent space" $M_p^{\mathrm{emb}} := \{p\} \times Dx^{-1}(x(p))[\mathbb{R}^m]$ is isomorphic to $M_p$ via the isomorphism $F((p, v)) := \left[ x, \left( Dx^{-1}(x(p)) \right)^{-1}[v] \right]_p$.

Proof. Exercise 19-22.

Proposition 19.19 introduces an object $M_p$ that could serve as a tangent space for $M$ at $p$ and Proposition 19.20 shows that for embedded manifolds there is a natural isomorphism between $M_p$ and the space that we would expect to be the tangent space. Thus we call $M_p$ the tangent space of $M$ at $p$.

Definition 19.21 The space $M_p$ of Proposition 19.19 will be called the tangent space of $M$ at $p$. The set $TM := \bigcup_{p \in M} M_p$ is called the tangent bundle of $M$.

Simplistically speaking, at every $p \in M$ Definition 19.21 merely tacks a tangent space on to the manifold. However, even if this was all, this approach is consistent with an important physical motivation. Vectors in physics have magnitude and direction, just like vectors in mathematics. But vectors in physics also have a point of action. For example, consider a car that moves in a straight line at constant speed. The magnitude and direction of the force that the car exerts on the particles in front of it stay the same, independent of whether these particles are air or whether they constitute another car. Obviously, the effect is different in either situation. If the force acts on a set of air molecules, the car travels regularly. If it acts on another car, the car crashes. To give vectors in mathematics a point of action, we need to incorporate the point of action into the definition of the vector.
This is done in the definition of $TM$. For $(\mathbb{R}^d, \{\mathrm{id}_{\mathbb{R}^d}\})$, note that $\mathbb{R}^d_p$ is just $\{ [\mathrm{id}_{\mathbb{R}^d}, v]_p : v \in \mathbb{R}^d \}$, which is consistent with the idea that our vectors now have a point of action $p$. Similarly, for an open set $\Omega \subseteq \mathbb{R}^d$ considered as a manifold $(\Omega, \{i_\Omega\})$ we obtain $\Omega_p = \{ [i_\Omega, v]_p : v \in \mathbb{R}^d \}$ for the tangent space. Although these realizations are trivial, they will be very useful when we consider integration over embedded manifolds.

Now that tangent spaces are defined, we can tackle the definition of a "derivative" on $M$. Consider embedded manifolds $M$ and $N$ and a differentiable function $f$ defined on a neighborhood of $M$ so that $f[M] \subseteq N$. Then the Chain Rule implies
\[ Df(p)\left[ Dh^{-1}(h(p))\left[ \mathbb{R}^m \times \{0\}^{d-m} \right] \right] = D(f \circ h^{-1})(h(p))\left[ \mathbb{R}^m \times \{0\}^{d-m} \right], \]
which is contained in the tangent space of $N$ at $f(p)$, and it is equal to the tangent space if we assume that $f$ is a diffeomorphism. Therefore derivatives on manifolds should map the tangent space at a point into the tangent space at the image point (see Figure 50). Proposition 19.22 shows that it is possible to define such a mapping on the tangent vectors and Proposition 19.24 shows that this map is consistent with what we expect it to do for embedded manifolds.

Proposition 19.22 Let $M, N$ be manifolds, let the function $f : M \to N$ be differentiable, let $p \in M$, let $x$ be a coordinate system around $p$ and let $y$ be a coordinate system around $f(p)$. Then $f_{*p}([x, v]_p) := \left[ y, D(y \circ f \circ x^{-1})(x(p))[v] \right]_{f(p)}$ defines a linear function $f_{*p} : M_p \to N_{f(p)}$.

Proof. We first need to prove that $f_{*p}$ is well-defined. Let $x_1$ and $x_2$ be coordinate systems around $p$, let $(x_1, v) \sim_p (x_2, w)$ and let $y_1$ and $y_2$ be coordinate systems about $f(p)$. Then $w = D(x_2 \circ x_1^{-1})(x_1(p))[v]$ and the Chain Rule gives
\[ D(y_2 \circ f \circ x_2^{-1})(x_2(p))[w] = D(y_2 \circ f \circ x_1^{-1})(x_1(p))[v] = D(y_2 \circ y_1^{-1})(y_1(f(p)))\left[ D(y_1 \circ f \circ x_1^{-1})(x_1(p))[v] \right], \]
so that $\left( y_1, D(y_1 \circ f \circ x_1^{-1})(x_1(p))[v] \right) \sim_{f(p)} \left( y_2, D(y_2 \circ f \circ x_2^{-1})(x_2(p))[w] \right)$. Hence $f_{*p}$ is well-defined. Linearity of $f_{*p}$ follows from the linearity of the derivative.

Figure 50: The tangent space $M_p$ of a manifold $M$ at a point $p$ (see Definition 19.21) can be viewed as a tangential plane attached at $p$.
For embedded manifolds, $M_p$ can be considered to be the image of $\mathbb{R}^m$ under the derivative of the right parametrization (see Proposition 19.20). For a differentiable function $f : M \to N$ the map $f_*$ (see Proposition 19.22) plays the role of the derivative. If $M$ and $N$ are both embedded manifolds and $f$ is a differentiable function from a neighborhood of $M$ to a neighborhood of $N$, then $f_*$ combines the function and the derivative (see Proposition 19.24).

Definition 19.23 We denote the function from $TM$ to $TN$ whose restriction to each $M_p$ is $f_{*p}$ by $f_*$. Moreover, unless necessary, we will not explicitly mention $p$ and thus denote $f_{*p}$ by $f_*$, too.

Proposition 19.24 Let $M, N$ be embedded manifolds, let $p \in M$ and let $f$ be a differentiable function from a neighborhood of $M$ to a neighborhood of $N$. Then the function $f_*^{\mathrm{emb}}$ that is defined for all $(p, v) \in M_p^{\mathrm{emb}}$ by $f_*^{\mathrm{emb}}((p, v)) := (f(p), Df(p)[v])$ satisfies the equation $f_* = F_N \circ f_*^{\mathrm{emb}} \circ F_M^{-1}$, where $F_M$ and $F_N$ are the isomorphisms from Proposition 19.20 for $M$ and $N$, respectively.

Proof. Let $x$ be the coordinate system around $p$ that is used to construct $F_M$ and let $y$ be the coordinate system around $f(p)$ that is used to construct $F_N$. Then for all $[x, v]_p \in M_p$ the Chain Rule gives
\begin{align*}
F_N \circ f_*^{\mathrm{emb}} \circ F_M^{-1}\left( [x, v]_p \right)
&= F_N\left( f_*^{\mathrm{emb}}\left( \left( p, Dx^{-1}(x(p))[v] \right) \right) \right) \\
&= F_N\left( \left( f(p), D(f \circ x^{-1})(x(p))[v] \right) \right) \\
&= \left[ y, \left( Dy^{-1}(y(f(p))) \right)^{-1}\left[ D(f \circ x^{-1})(x(p))[v] \right] \right]_{f(p)} \\
&= \left[ y, D(y \circ f \circ x^{-1})(x(p))[v] \right]_{f(p)} = f_*\left( [x, v]_p \right).
\end{align*}

As noted in the beginning, the details of chasing these compositions through the equivalence classes are too cumbersome to do on a regular basis. To avoid this level of detail, we accept that $TM$ really defines the tangent spaces and that $f_*$ is the "derivative" of a function $f : M \to N$, and we subsequently use the fundamental properties of these entities rather than their definitions.

For manifolds with boundary and manifolds with corners, we can also define tangent spaces. Because differentiable functions on sets that are not open are restrictions of differentiable functions on larger sets, the derivatives $D(y \circ x^{-1})$ are also defined for boundary points. This means we can say the following.
Definition 19.25 If $M$ is a manifold with boundary or a manifold with corners, the tangent bundle $TM$ is defined in the same way as for manifolds.

For points on the boundary of the manifold, there are two kinds of tangent vectors.

Definition 19.26 Let $M$ be a manifold with boundary and let $p \in \partial M$. A tangent vector $[x, v]_p$ will be called outward pointing iff $v_m < 0$ and it will be called inward pointing iff $v_m > 0$. For $p$ being in the boundary of a manifold with corners the vector $[x, v]_p$ will be called outward pointing iff $x(p) + v \notin C^k$. The vector $[x, v]_p$ will be called inward pointing iff $x(p) + v \in (C^k)^\circ$.

With tangent spaces defined, it is now easy to define (tangential) vector fields.

Definition 19.27 Let $M$ be a manifold. A function $F : M \to TM$ so that $F(p) \in M_p$ for all $p \in M$ is called a vector field on $M$.

Finally, note that even though we assumed throughout that our manifolds were $C^\infty$ manifolds, everything in this section can be defined and proved for $C^1$ manifolds.

Exercises

19-16. Consistency of Definition 19.14 with the original definition of differentiability and with itself.
(a) Let $\Omega_1, \Omega_1' \subseteq \mathbb{R}^m$ and $\Omega_2, \Omega_2' \subseteq \mathbb{R}^n$ be open sets, let $f : \Omega_1 \to \Omega_2$ be a function and let $\varphi : \Omega_1 \to \Omega_1'$ and $\psi : \Omega_2 \to \Omega_2'$ be differentiable bijective functions with differentiable inverse. Prove that $f$ is differentiable iff $\psi \circ f \circ \varphi^{-1}$ is differentiable.
(b) Let $M$ be a manifold with atlas $\mathcal{A}_M$, let $N$ be a manifold with atlas $\mathcal{A}_N$ and let $f : M \to N$ be so that for all $x_A \in \mathcal{A}_M$ and all $y_A \in \mathcal{A}_N$ the composition $y_A \circ f \circ x_A^{-1}$ is differentiable. Let $\tilde{\mathcal{A}}_M \supseteq \mathcal{A}_M$ and $\tilde{\mathcal{A}}_N \supseteq \mathcal{A}_N$ be atlases of $M$ and $N$, let $x : U \to \mathbb{R}^m$ be in $\tilde{\mathcal{A}}_M$ and let $y : V \to \mathbb{R}^n$ be in $\tilde{\mathcal{A}}_N$ so that the composition $y \circ f \circ x^{-1}$ has nonempty domain. Prove that $y \circ f \circ x^{-1}$ is differentiable.

19-17. Let $M, N, O$ be manifolds and let $f : M \to N$ and $g : N \to O$ be differentiable. Prove that $g \circ f$ is differentiable and that $(g \circ f)_* = g_* \circ f_*$.

19-18. Prove Proposition 19.15.

19-19.
Prove that if $M$ is an embedded manifold and $h : U \to V$ and $k : \tilde{U} \to \tilde{V}$ are as in Definition 19.5, then $Dh^{-1}(h(p))\left[ \mathbb{R}^m \times \{0\}^{d-m} \right] = Dk^{-1}(k(p))\left[ \mathbb{R}^m \times \{0\}^{d-m} \right]$.

19-20. Prove Proposition 19.17.

19-21. Prove Proposition 19.19. That is, prove that the relation $\sim_p$ is an equivalence relation and that the vector addition and scalar multiplication are well-defined.

19-22. Prove Proposition 19.20.

19-23. Let $M$ be a manifold. Prove that if $\mathrm{id} : M \to M$ is the identity, then $\mathrm{id}_* : TM \to TM$ also is the identity.

19-24. Let $M, N$ be manifolds and let $U \subseteq M$ be an open submanifold of $M$.
(a) Prove that the tangent bundle $TU = \bigcup_{p \in U} U_p$ of $U$ is equal to $\bigcup_{p \in U} M_p$.
(b) Prove that $(f|_U)_* = f_*|_{TU}$ for all differentiable functions $f : M \to N$.

19-25. Let $M$ be an $m$-dimensional manifold. Prove that $TM$ is a $2m$-dimensional manifold.

19-26. Prove that if $M \subseteq \mathbb{R}^d$ is an embedded $C^\infty$ manifold and $\Omega$ is an open neighborhood of $M$, then for every $f \in C^k(\Omega)$ the restriction $f|_M$ is $C^k$ on $M$.

19-27. Let $M$ be a connected manifold, let $C \subseteq M$ be closed (but not necessarily compact) and let $U \subseteq M$ be open so that $C \subseteq U$. Prove that there is a $C^\infty$ function $f : M \to [0, 1]$ so that $f|_C = 1$ and $\mathrm{supp}(f) \subseteq U$.

19.3 Differential Forms, Integrals Over the Unit Cube

As we start our investigation of integration on manifolds, we first note the following three shortcomings of the tools we currently have available.

First, although vector fields as in Definition 19.27 are important, they have a fatal flaw for applications in fluid mechanics and field theory. Any vector field that is defined as a map from the manifold into the tangent bundle is necessarily tangential to the manifold. In fluid mechanics and field theory, manifolds are test surfaces and vector fields usually go through these test surfaces or at least they have a component that causes some "transfer" through the surface. Obviously such vector fields cannot be modeled with the tangent bundle.
Second, a typical application of the interplay between fields and surfaces is the computation of how much "matter" a vector field transfers through a test surface. To compute this quantity, we need to integrate the field over the manifold. In $\mathbb{R}^d$, integrals usually are computed with Fubini's Theorem and we do not give much thought to the fact that the coordinate directions are provided by the standard base. On a manifold there is no standard base and each neighborhood of a point has infinitely many parametrizations via coordinate systems. Thus we cannot just pick one coordinate system and use the standard base in its image space.

Third, lower dimensional objects, like two dimensional surfaces in three dimensional space, typically have measure zero in the higher dimensional space. Therefore, we need a notion of integration that gives nonzero integrals, even if our manifold resides in a higher dimensional space.

The above only reemphasizes that the requisite definitions will require a lot of attention to detail. Differential forms will allow us to model vector fields that are not necessarily parallel to the manifold. The integral of such a form over a manifold will be pieced together from simpler integrals. The simplest such integral is the integral of a differential form over a cube, which is presented in this section. In Section 19.4, cubes in $\mathbb{R}^m$ are lifted to the manifold as $k$-cubes and the differential forms will be $k$-forms on the manifold. Finally, Section 19.5 defines the integral of a form over the whole manifold. Some definitions will be repeated in this presentation, but the insight gained by first working out the details in a simple setting will be well worth it. (Compare with the double coverage of Lebesgue integration in Chapters 9 and 14.)

For starters, recall that $\Lambda^k(V)$ denotes the space of alternating $k$-tensors on $V$ (Definition 18.17).

Definition 19.28 Let $\Omega$ be a subset of $\mathbb{R}^d$.
A function $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k\left( \mathbb{R}^d_p \right)$ so that $\omega(p) \in \Lambda^k\left( \mathbb{R}^d_p \right)$ for all $p \in \Omega$ is called a $k$-form on $\Omega$ or simply a differential form. We formally set $\Lambda^0\left( \mathbb{R}^d_p \right) := \mathbb{R}$, so that a 0-form simply is a function.

Because it is natural to identify each $[\mathrm{id}_{\mathbb{R}^d}, v]_p \in \mathbb{R}^d_p$ with $v$, we will denote tangent vectors to $\mathbb{R}^d$ by single letters in this section. Because alternating $k$-tensors are associated with $k$-dimensional volumes (see Example 18.21) we introduce the following notation for the dual base.

Definition 19.29 Let $\{e_1, \ldots, e_d\}$ be the standard base for $\mathbb{R}^d_p$. We will denote the dual base $\{\pi_1, \ldots, \pi_d\}$ as $\{dx_1, \ldots, dx_d\}$ where $dx_i(e_j) = \pi_i(e_j)$.

The notation is similar to that for integrals because forms are connected to integrals of vector fields as follows. A differentiable function $r : [a, b] \to \mathbb{R}^3$ with $r'(t) \neq 0$ for all $t \in [a, b]$ can be interpreted as describing the position $r(t)$ of a traveling particle at time $t$. If $\Omega$ contains $r[[a, b]]$ and the vector field $F : \Omega \to T\Omega$ is so that $F(x, y, z)$ describes a force that is acting on a particle at the point $(x, y, z)$, then the work that is done as the particle travels from $r(a)$ to $r(b)$ is the line integral
\[ W = \int_a^b \left\langle F(r(t)), r'(t) \right\rangle \, dt, \]
where we assume that each $F(r(t))$ was projected back from $\Omega_{r(t)}$ to $\mathbb{R}^3$ via $[i_\Omega, v]_{r(t)} \mapsto v$. By Exercise 18-53b, the value of this integral depends only on the geometric shape of $r[[a, b]]$ and the direction of travel, but not on the speed at which the particle travels from $r(a)$ to $r(b)$ along this path. With $F = (P, Q, R)$ and $r = (r_1, r_2, r_3)$ we can write the integral as
\[ W = \int_a^b P(r(t))\, r_1'(t) + Q(r(t))\, r_2'(t) + R(r(t))\, r_3'(t) \, dt. \]
Each $r_i'(t)$ can be obtained from $r'(t)$ by applying the form $dx_i$. The integral is thus also abbreviated as $W = \int_a^b P \, dx_1 + \int_a^b Q \, dx_2 + \int_a^b R \, dx_3$, where $dx_i$ stands for $dx_i(r'(t)) \, dt$.
The form $\omega = P \, dx_1 + Q \, dx_2 + R \, dx_3$ can be interpreted as the differential amount of work that is done as the particle moves a differential step from its current position $r(t)$ to $r(t) + r'(t) \, dt = r(t) + (dx_1, dx_2, dx_3)^T$. The integration formalism that we will define in Section 19.5 will indeed locally reduce the integral of the form $\omega$ to the above line integral.

Similarly, an injective differentiable function $r : [a, b] \times [c, d] \to \mathbb{R}^3$ so that $\left( \frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y} \right)(x, y) \neq 0$ for all $(x, y) \in [a, b] \times [c, d]$ can be interpreted as a surface in $\mathbb{R}^3$. If we interpret $F$ as a flow field, then (with the right hypotheses) the throughput of $F$ through the surface defined by $r$ is the surface integral
\[ \int_a^b \int_c^d \left\langle F(r(x, y)), \left( \frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y} \right)(x, y) \right\rangle \, dy \, dx. \]
Exercise 18-53c shows that this integral also only depends on the geometric shape of the surface as long as the parametrizations respect the orientation of the surface. (Details on the rather subtle notion of orientation will be addressed later.) With the wedge product from Definition 18.20 the factors behind the components of $F$ can be obtained by applying the forms $dx_2 \wedge dx_3$, $dx_3 \wedge dx_1$ and $dx_1 \wedge dx_2$ to the vectors $\frac{\partial r}{\partial x}$ and $\frac{\partial r}{\partial y}$. Thus $\omega = P \, dx_2 \wedge dx_3 + Q \, dx_3 \wedge dx_1 + R \, dx_1 \wedge dx_2$ can be interpreted as the differential throughput of the vector field $F$ through a parallelogram spanned by the differential vectors $\frac{\partial r}{\partial x} \, dx$ and $\frac{\partial r}{\partial y} \, dy$ attached at $r(x, y)$ and located on the surface defined by $r$ (also see Figure 51).

Finally, for a scalar function $f : [a, b] \times [c, d] \times [l, h] \to \mathbb{R}$ the integral is $\int_l^h \int_c^d \int_a^b f(x, y, z) \, dx \, dy \, dz$. This time there is no extra factor involved. The form $f(p) \, dx \wedge dy \wedge dz$ can be interpreted as the differential contribution of $f(p)$ to the overall integral. To stay consistent with the above, we could say that the box is parametrized by the identity.
The scaling factor, which is 1, is obtained by applying $dx \wedge dy \wedge dz$ to the partial derivatives of the identity with respect to $x$, $y$, and $z$.

The above indicates that differential forms should allow us to describe line integrals of vector fields, surface integrals of vector fields over surfaces with rectangular parameter domain, and integrals of scalar functions over boxes with one formalism. Moreover, in this formalism the forms carry almost all the information needed for the integral. The components of the field as well as the directions in which we integrate are part of the form. Only the parameter domain is not given and it can be supplied by a coordinate system. Therefore, this approach should be the right idea and, after taking care of the details, we will indeed see that forms on manifolds can be used to model vector fields on surfaces.

Definition 19.30 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k(\Omega_p)$ be a $k$-form on $\Omega$. By Theorem 18.23, at every $p \in \Omega$ there is a base representation of the form
\[ \omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le d} \omega_{i_1, \ldots, i_k}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}. \]
We will call $\omega$ differentiable at $p$ iff each of the $\omega_{i_1, \ldots, i_k}$ is differentiable at $p$. If $\omega$ is differentiable at each $p \in \Omega$ we will call $\omega$ differentiable on $\Omega$.

Recall that by Theorem 17.43 for differentiable functions $f : \mathbb{R}^d \to \mathbb{R}$ the derivative $Df$ is $Df = \frac{\partial f}{\partial x_1} \, dx_1 + \cdots + \frac{\partial f}{\partial x_d} \, dx_d$ and it is only a small abuse of notation to set $df := Df$. We define the differential of a $k$-form similar to the $(k+1)$st derivative of regular functions (see Corollary 17.61), except that we work with wedge products instead of tensor products. In this fashion, we will retain the connection to differential contributions to integrals.

Definition 19.31 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable. We define
\[ df := \frac{\partial f}{\partial x_1} \, dx_1 + \cdots + \frac{\partial f}{\partial x_d} \, dx_d. \]
If $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k(\Omega_p)$ is differentiable and $\omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le d} \omega_{i_1, \ldots, i_k}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$, then we define the $(k+1)$-form $d\omega$ by
\[ d\omega(p) := \sum_{1 \le i_1 < \cdots < i_k \le d} d\omega_{i_1, \ldots, i_k}(p) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \]
and call it the differential of $\omega$.

Definition 19.31 allows us to make some connections that may be familiar from multivariable calculus or physics.

Example 19.32 If $\omega = P \, dx_2 \wedge dx_3 + Q \, dx_3 \wedge dx_1 + R \, dx_1 \wedge dx_2$, then
\[ d\omega = \left( \frac{\partial P}{\partial x_1} + \frac{\partial Q}{\partial x_2} + \frac{\partial R}{\partial x_3} \right) dx_1 \wedge dx_2 \wedge dx_3. \]
If $\omega = P \, dx_1 + Q \, dx_2 + R \, dx_3$, then
\[ d\omega = \left( \frac{\partial R}{\partial x_2} - \frac{\partial Q}{\partial x_3} \right) dx_2 \wedge dx_3 + \left( \frac{\partial P}{\partial x_3} - \frac{\partial R}{\partial x_1} \right) dx_3 \wedge dx_1 + \left( \frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2} \right) dx_1 \wedge dx_2. \]
Both equalities are a direct application of the definition of the differential and of $dx_j \wedge dx_j = 0$ (Exercise 19-28).

The connection to vector fields defined as functions from a subset of $\mathbb{R}^3$ to $\mathbb{R}^3$ is now quite simple.

Definition 19.33 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $F(p) = \left[ i_\Omega, (F_1(p), \ldots, F_d(p))^T \right]_p$ be a vector field on $\Omega$. We define the divergence of $F$ as $\mathrm{div}(F) := \sum_{j=1}^d \frac{\partial F_j}{\partial x_j}$. The divergence is also denoted by $\nabla \cdot F$, because the sum is the formal scalar product of the column vector in $F$ with the nabla operator.

Definition 19.34 Let $\Omega \subseteq \mathbb{R}^3$ be open and let $F = \left[ i_\Omega, (P, Q, R)^T \right]$ be a vector field on $\Omega$. We define
\[ \mathrm{curl}(F) := \left[ i_\Omega, \left( \frac{\partial R}{\partial x_2} - \frac{\partial Q}{\partial x_3}, \; \frac{\partial P}{\partial x_3} - \frac{\partial R}{\partial x_1}, \; \frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2} \right)^T \right] \]
and call it the curl of $F$. The curl is also denoted by $\nabla \times F$, because the components can be obtained as the formal cross product of the column vector in $F$ with the nabla operator.

Example 19.32 shows that the differential of the appropriate form encodes the divergence and the curl of a vector field together with the appropriate volume or area element that is needed to integrate the divergence or the curl. Also recall that the gradient of a differentiable function $f : \Omega \to \mathbb{R}$ is $\mathrm{grad}(f) = \sum_{j=1}^d \frac{\partial f}{\partial x_j} e_j$. Hence, the differential of a 0-form (a function) encodes the gradient together with the differential length element that is needed for a line integral.

We now turn to the properties of differential forms.

Theorem 19.35 Let $\Omega \subseteq \mathbb{R}^d$ be open, let $\omega$ be a differentiable $k$-form and let $\eta$ be a differentiable $l$-form on $\Omega$. Then the following hold.
1. If $l = k$, then $d(\omega + \eta) = d\omega + d\eta$.
2. $d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^k \omega \wedge d\eta$.
3. If $\omega$ is twice differentiable, then $d(d\omega) = 0$.

Proof. Part 1 is trivial. Moreover, part 1 shows that we only need to consider forms $\omega = f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ and $\eta = g \, dx_{j_1} \wedge \cdots \wedge dx_{j_l}$ for parts 2 and 3. For such forms, we obtain the following for part 2.
\begin{align*}
d(\omega \wedge \eta)
&= d\left[ \left( f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right) \wedge \left( g \, dx_{j_1} \wedge \cdots \wedge dx_{j_l} \right) \right] \\
&= d\left[ (fg) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l} \right] \\
&= d(fg) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l} \\
&= \left( (df) g + f (dg) \right) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l} \\
&= (df) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge g \, dx_{j_1} \wedge \cdots \wedge dx_{j_l} + (-1)^k f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dg \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l} \\
&= (d\omega) \wedge \eta + (-1)^k \omega \wedge d\eta.
\end{align*}
Finally, part 3 is proved as follows with $D_a$ denoting the $a$th partial derivative.
\begin{align*}
d(d\omega)(p)
&= d\left( \sum_{a=1}^d D_a f(p) \, dx_a \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right) \\
&= \sum_{a=1}^d d\left( D_a f(p) \right) \wedge dx_a \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \\
&= \sum_{a=1}^d \sum_{b=1}^d D_b D_a f(p) \, dx_b \wedge dx_a \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \\
&= 0,
\end{align*}
where the last step uses $D_b D_a f = D_a D_b f$ and $dx_b \wedge dx_a = -dx_a \wedge dx_b$. The above completes the proof.

Exercise 19-31 shows how Theorem 19.35 includes certain equations from vector analysis as special cases. We are now ready to define the integral of a $k$-form over the cube $[0, 1]^k$.

Definition 19.36 If $\omega$ is a $k$-form on $[0, 1]^k$, then (by Theorem 18.23) there is a unique function $f$ so that $\omega = f \, dx_1 \wedge \cdots \wedge dx_k$ and we define
\[ \int_{[0,1]^k} \omega := \int_{[0,1]^k} f \, d\lambda. \]

In the following, we will often work with $k$-tuples from which a certain element, say, the $l$th, has been removed. To simplify notation, we define the following.

Definition 19.37 If $a_1 \ldots a_k$ is a sequence of $k$ mathematical symbols denoting elements of a set and we want to investigate the sequence from which the $l$th entry has been removed, we write $a_1 \ldots \widehat{a_l} \ldots a_k$ instead of $a_1 \ldots a_{l-1} a_{l+1} \ldots a_k$.

Stokes' Theorem now says that the integral of the differential of a $(k-1)$-form (which is a $k$-form) over the cube $[0, 1]^k$ can be expressed as a sum of integrals of the $(k-1)$-form over the faces of the cube.
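We first state the result in its raw form and subsequently reduce the rather complicated right side to something more manageable.

Theorem 19.38 Stokes' Theorem for $[0, 1]^k$. Let $k \in \mathbb{N}$ and let $\omega = \sum_{a=1}^k \omega_{1, \ldots, \hat{a}, \ldots, k} \, dx_1 \wedge \cdots \wedge \widehat{dx_a} \wedge \cdots \wedge dx_k$ be a continuously differentiable $(k-1)$-form that is defined on a neighborhood of $[0, 1]^k$. Then
\[ \int_{[0,1]^k} d\omega = \sum_{a=1}^k (-1)^{a-1} \left[ \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{a}, \ldots, k}(x_1, \ldots, x_{a-1}, 1, x_{a+1}, \ldots, x_k) \, d\lambda - \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{a}, \ldots, k}(x_1, \ldots, x_{a-1}, 0, x_{a+1}, \ldots, x_k) \, d\lambda \right], \]
where for $k = 1$ the integral on the right (in this case there is only one "integral") simply evaluates the functions at the given point.

Proof. With $\omega(p) = \sum_{1 \le i_1 < \cdots < i_{k-1} \le k} \omega_{i_1, \ldots, i_{k-1}}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_{k-1}}$, for each subset $\{i_1 < \cdots < i_{k-1}\} \subseteq \{1, \ldots, k\}$ there is a unique index $a \in \{1, \ldots, k\}$ so that the equality $\{i_1 < \cdots < i_{k-1}\} = \{1, \ldots, k\} \setminus \{a\}$ holds. Hence,
\[ d\omega = \sum_{a=1}^k \frac{\partial \omega_{1, \ldots, \hat{a}, \ldots, k}}{\partial x_a} \, dx_a \wedge dx_1 \wedge \cdots \wedge \widehat{dx_a} \wedge \cdots \wedge dx_k = \sum_{a=1}^k (-1)^{a-1} \frac{\partial \omega_{1, \ldots, \hat{a}, \ldots, k}}{\partial x_a} \, dx_1 \wedge \cdots \wedge dx_k. \]
Via Fubini's Theorem and the Fundamental Theorem of Calculus we obtain
\[ \int_{[0,1]^k} \frac{\partial \omega_{1, \ldots, \hat{a}, \ldots, k}}{\partial x_a} \, d\lambda = \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{a}, \ldots, k}(x_1, \ldots, x_{a-1}, 1, x_{a+1}, \ldots, x_k) - \omega_{1, \ldots, \hat{a}, \ldots, k}(x_1, \ldots, x_{a-1}, 0, x_{a+1}, \ldots, x_k) \, d\lambda \]
for each $a \in \{1, \ldots, k\}$, and summing over $a$ with the signs $(-1)^{a-1}$ gives the claimed equation.

Geometrically, the sum of integrals on the right side of the equation in Theorem 19.38 runs over all the faces of the cube $[0, 1]^k$. We thus would like to abbreviate it as the integral of the form $\omega$ over the surface of the cube. To incorporate the negative signs we have picked up, we attach the negative signs to the pieces of the surface. This is similar to how we incorporated the integration variables into forms. Moreover, in anticipation of the next section, we define our cubes and boundary pieces as functions.

Definition 19.39 We define $I^k$ as the identity function from $[0, 1]^k$ to $\mathbb{R}^k$. The $(i, 0)$-face of $I^k$ is $I^k_{(i,0)}(x) := (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_k)$, and the $(i, 1)$-face of $I^k$ is $I^k_{(i,1)}(x) := (x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_k)$, where $I^k_{(i,0)}$ and $I^k_{(i,1)}$ are functions from $[0, 1]^{k-1}$ to $\mathbb{R}^k$. The (formal) sum $\partial I^k = \sum_{i=1}^k (-1)^i \left( I^k_{(i,0)} - I^k_{(i,1)} \right)$ is called the boundary of $I^k$. For a $(k-1)$-form $\omega(p) = \sum_{a=1}^k \omega_{1, \ldots, \hat{a}, \ldots, k}(p) \, dx_1 \wedge \cdots \wedge \widehat{dx_a} \wedge \cdots \wedge dx_k$ we define
\[ \int_{\partial I^k} \omega := \sum_{a=1}^k (-1)^a \left[ \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{a}, \ldots, k} \circ I^k_{(a,0)} \, d\lambda - \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{a}, \ldots, k} \circ I^k_{(a,1)} \, d\lambda \right]. \]

Clearly, the integral of $\omega$ over $\partial I^k$ is defined so that the inconvenient sum from Theorem 19.38 is pushed into the formal sum that defines the domain of integration.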
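As a quick sanity check of this bookkeeping, the following worked computation verifies the case $k = 2$ for one concrete 1-form. It is only a sketch: it assumes that the equation in Theorem 19.38 has the usual form $\int_{[0,1]^k} d\omega = \sum_{a=1}^k (-1)^{a-1} \left[ \int \omega_{1,\ldots,\hat{a},\ldots,k}|_{x_a = 1} - \int \omega_{1,\ldots,\hat{a},\ldots,k}|_{x_a = 0} \right]$, and the form $\omega = -x_2^2 \, dx_1 + x_1 x_2 \, dx_2$ is chosen purely for illustration.

```latex
% Illustrative check of Stokes' Theorem on the unit square [0,1]^2.
% Assumption: omega_{\hat 1, 2} is the coefficient of dx_2 and
% omega_{1, \hat 2} is the coefficient of dx_1.
\begin{align*}
\omega &= -x_2^2 \, dx_1 + x_1 x_2 \, dx_2,
  \qquad \omega_{1,\hat 2} = -x_2^2, \quad \omega_{\hat 1, 2} = x_1 x_2, \\
d\omega &= \left( \frac{\partial (x_1 x_2)}{\partial x_1}
  - \frac{\partial (-x_2^2)}{\partial x_2} \right) dx_1 \wedge dx_2
  = 3 x_2 \, dx_1 \wedge dx_2, \\
\int_{[0,1]^2} d\omega &= \int_0^1 \int_0^1 3 x_2 \, dx_1 \, dx_2 = \frac{3}{2}, \\
a = 1: \quad & (+1) \left[ \int_0^1 \omega_{\hat 1, 2}(1, x_2) \, dx_2
  - \int_0^1 \omega_{\hat 1, 2}(0, x_2) \, dx_2 \right]
  = \frac{1}{2} - 0 = \frac{1}{2}, \\
a = 2: \quad & (-1) \left[ \int_0^1 \omega_{1, \hat 2}(x_1, 1) \, dx_1
  - \int_0^1 \omega_{1, \hat 2}(x_1, 0) \, dx_1 \right]
  = (-1) \left[ -1 - 0 \right] = 1, \\
\text{sum over faces} &= \frac{1}{2} + 1 = \frac{3}{2}
  = \int_{[0,1]^2} d\omega.
\end{align*}
```

The two faces $x_1 = 0$ and $x_2 = 0$ contribute nothing here because the relevant coefficient vanishes on them, so the whole value comes from the faces $x_1 = 1$ and $x_2 = 1$.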
More importantly, this formalism has a solid physical intuition behind it. Let us interpret $\omega$ as a vector field, say a flow of a fluid, that is decomposed into components so that the $dx_1 \wedge \cdots \wedge \widehat{dx_a} \wedge \cdots \wedge dx_k$-component is perpendicular to any plane of the form $x_a = c$, where $c \in \mathbb{R}$ is a constant (also see Figure 51). Components that are parallel to the surface do not contribute to the surface integral, because flow parallel to a membrane does not transport material through it. On the other hand, components that are perpendicular to the surface contribute with their magnitude, because a flow of velocity $\vec{v}$ that is perpendicular to a small surface $S$ transports per unit time a material volume of $\|\vec{v}\| A(S)$ through the surface, where $A(S)$ is the area of the surface $S$. Therefore, to compute the net flux of $\omega$ across the surface of $[0, 1]^k$, for each $a \in \{1, \ldots, k\}$ we only need the integrals of $\omega_{1, \ldots, \hat{a}, \ldots, k}$ over the parts of the surface of the cube for which $x_a = 0$ or $x_a = 1$. The opposite signs of $\omega_{1, \ldots, \hat{a}, \ldots, k}$ for $x_a = 0$ and $x_a = 1$ are needed to assure that the integral counts net transfer of material into or out of the cube, because a flow direction that enters the cube at $x_a = 0$ is exiting the cube at $x_a = 1$.

Figure 51: Visualization of the 2-form $Q \, dx \wedge dz$ as a vector field parallel to the $y$-axis. If $Q \, dx \wedge dz$ is interpreted as a flow field, then there is no transfer of material through the bottom, the top, the front or the back face of the cube. Moreover, if we assume that $Q$ is a constant, then the transfer through the left face is $Q \cdot (\text{area of face})$ into the cube and the transfer through the right face is $Q \cdot (\text{area of face})$ out of the cube.

Corollary 19.40 With Definition 19.39 the equation in Stokes' Theorem for $[0, 1]^k$ becomes the much more readable equation $\int_{[0,1]^k} d\omega = \int_{\partial I^k} \omega$.

Exercises

19-28. Prove the claims in Example 19.32.

19-29.
Prove that if $f$ and $g$ are 0-forms, then $d(fg) = (df) g + f (dg)$.

19-30. Let $\Omega \subseteq \mathbb{R}^d$ and let $\omega$ be a $k$-form on $\Omega$. Prove that for all $i_1 < \cdots < i_k$ the function $\omega_{i_1, \ldots, i_k}$ from Definition 19.30 is $\omega[e_{i_1}, \ldots, e_{i_k}]$.

19-31. Let $\Omega \subseteq \mathbb{R}^3$ be open. Use Theorem 19.35 to prove the following:
(a) If $F : \Omega \to \mathbb{R}^3$ is a twice differentiable vector field, then $\mathrm{div}(\mathrm{curl}(F)) = 0$.
(b) If $f : \Omega \to \mathbb{R}$ is a twice differentiable function, then $\mathrm{curl}(\mathrm{grad}(f)) = 0$.
(c) Why is it not possible to use Theorem 19.35 to prove the wrong equality $\mathrm{grad}(\mathrm{div}(F)) = 0$?

19-32. Prove that the function $f$ in Definition 19.36 is $f(p) = \omega(p)[e_1, \ldots, e_k]$.

19-33. Divergence Theorem for $[0, 1]^3$.
(a) Prove that if $F = (F_1, F_2, F_3)$ is a differentiable vector field defined on a neighborhood of the unit cube $[0, 1]^3$, then
\[ \int_{[0,1]^3} \mathrm{div}(F) \, d\lambda = \int_{\partial I^3} F_1 \, dx_2 \wedge dx_3 + F_2 \, dx_3 \wedge dx_1 + F_3 \, dx_1 \wedge dx_2. \]
Then explain why the integral on the right gives the net outflow from the unit cube.
(b) Integrate the vector field $F(x, y, z) =$ (1: 1 y2 over the surface of the unit cube $[0, 1]^3$.

19.4 k-Forms and Integrals Over k-Chains

To define integration on manifolds, first note that a manifold locally looks like its tangent space. Hence, locally, volumes in the manifold are essentially volumes in the tangent space. Because volumes are computed via $k$-tensors we define a $k$-form on a manifold as a function into the spaces of $k$-tensors on the tangent spaces.

Definition 19.41 A function $\omega : M \to \bigcup_{p \in M} \Lambda^k(M_p)$ such that $\omega(p) \in \Lambda^k(M_p)$ for each $p \in M$ is called a $k$-form on $M$ or a differential form.

Let $x : U \to \mathbb{R}^m$ be a coordinate system and let $p \in U$. To obtain a base in $\Lambda^k(M_p)$, let $\{[x, e_1]_p, \ldots, [x, e_m]_p\}$ be the "standard base" in $M_p$ and denote its dual base in $M_p^*$ by $\{dx_1, \ldots, dx_m\}$. Then the $k$-form $\omega$ on $M$ can be written as $\omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le m} \omega^x_{i_1, \ldots, i_k}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$, where the functions $\omega^x_{i_1, \ldots, i_k}$ are defined on $U$.
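The coefficient functions in such a representation are tied to the coordinate system that produced the dual base. The following small worked example is a sketch under stated assumptions: the manifold $M = (0, \infty)$ and the two coordinate systems $x = \mathrm{id}$ and $y = \ln$ are chosen only for illustration and do not come from the text. It uses nothing beyond the relation $\sim_p$ of Proposition 19.19.

```latex
% Illustrative example (assumed setting): M = (0, \infty) with the
% coordinate systems x(p) = p and y(p) = \ln p, and the 1-form
% \omega = dx_1, i.e. \omega^x_1(p) = 1 for all p.
\begin{align*}
(x \circ y^{-1})(s) &= e^s,
  \qquad D(x \circ y^{-1})(y(p))[e_1] = e^{\ln p} \, e_1 = p \, e_1, \\
[y, e_1]_p &= \left[ x, D(x \circ y^{-1})(y(p))[e_1] \right]_p
  = [x, p \, e_1]_p = p \, [x, e_1]_p, \\
dx_1\left( [y, e_1]_p \right) &= p \, dx_1\left( [x, e_1]_p \right) = p,
  \qquad dy_1\left( [y, e_1]_p \right) = 1, \\
\omega(p) &= dx_1 = p \, dy_1,
  \qquad \text{so} \quad \omega^x_1(p) = 1
  \quad \text{but} \quad \omega^y_1(p) = p.
\end{align*}
```

Both coefficient functions are differentiable on $M$, even though they are different functions, which is the kind of behavior one expects when the same form is written in two coordinate systems.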
Formally, the $\omega^x_{i_1, \ldots, i_k}$ depend on the choice of the coordinate system $x$. Similar to Exercise 19-16b it can be proved that if the $\omega^x_{i_1, \ldots, i_k}(p)$ in the representation with respect to the coordinate system $x$ are differentiable at $p$, then for any other coordinate system $y$ the $\omega^y_{i_1, \ldots, i_k}(p)$ in the representation with respect to $y$ are also differentiable at $p$. Therefore, although the specific representation of the form depends on the coordinate system, the differentiability properties of the form are independent of the coordinate system chosen to represent it.

Definition 19.42 A $k$-form $\omega : M \to \bigcup_{p \in M} \Lambda^k(M_p)$ will be called differentiable iff for every $p \in M$ there is a coordinate system $x : U \to \mathbb{R}^m$ so that $p \in U$, the form equals $\omega = \sum_{1 \le i_1 < \cdots < i_k \le m} \omega^x_{i_1, \ldots, i_k} \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ on $U$ and the $\omega^x_{i_1, \ldots, i_k}$ are differentiable at $p$. Similarly, we define $C^j$-forms for $j \in \mathbb{N}$ and $C^\infty$-forms.

Throughout, unless otherwise stated, $k$-forms will be assumed to be $C^\infty$. We will also drop the superscript on the functions $\omega_{i_1, \ldots, i_k}$ in the coordinate representation of a form.

Section 19.3 has shown that the differential of a form is useful in connecting integrals over solids with integrals over their boundary. The differential of a form on a manifold is defined similar to the differential of a form on $\mathbb{R}^m$. For the definition, we first need the differential of a function. To make the notation bearable, we introduce new notation for tangent vectors. Let $M$ be a manifold, let $p \in M$, let $x, y$ be coordinate systems of $M$ whose domains contain $p$, let $f : M \to \mathbb{R}$ be differentiable, and let $[x, v]_p, [y, w]_p \in M_p$ be so that $[x, v]_p = [y, w]_p$. Then
\begin{align*}
D(f \circ y^{-1})(y(p))[w]
&= D(f \circ y^{-1})(y(p))\left[ D(y \circ x^{-1})(x(p))[v] \right] \\
&= D(f \circ x^{-1})(x(p))[v].
\end{align*}
Hence, for a manifold $M$, a point $p \in M$, a differentiable function $f : M \to \mathbb{R}$ and a tangent vector $[x, v]_p \in M_p$ the quantity $df(p)\left[ [x, v]_p \right] := D(f \circ x^{-1})(x(p))[v]$ does not depend on the representative $(x, v)$. With $v = (v_1, \ldots, v_m)$ we obtain
\[ df(p)\left[ [x, v]_p \right] = \sum_{j=1}^m v_j \frac{\partial (f \circ x^{-1})}{\partial x_j}(x(p)). \]
Because of the above we also denote the tangent vector $[x, e_j]_p$ as $\left. \frac{\partial}{\partial x_j} \right|_p$ and we also write $\frac{\partial f}{\partial x_j}(p) := \left. \frac{\partial}{\partial x_j} \right|_p f := \frac{\partial (f \circ x^{-1})}{\partial x_j}(x(p))$. This notation expresses the fact that every tangent vector can be interpreted as a geometric object, but also as a differential operator. As was already hinted in the previous section, it is quite common that objects on manifolds have several interpretations. For practical purposes, it is sensible to remember that $\frac{\partial f}{\partial x_j}$ really is a partial derivative, namely, $\frac{\partial (f \circ x^{-1})}{\partial x_j}(x(p))$, but that the composition with the parametrization $x^{-1}$ is omitted in the notation. Now we can define the differential.

Definition 19.43 Let $f : M \to \mathbb{R}$ be differentiable and let $x : U \to \mathbb{R}^m$ be a coordinate system. Then for all $p \in U$ and $v = \sum_{j=1}^m v_j e_j$ we define
\[ df(p)\left[ [x, v]_p \right] := \sum_{j=1}^m \frac{\partial f}{\partial x_j}(p) \, v_j = \sum_{j=1}^m \frac{\partial f}{\partial x_j}(p) \, dx_j\left[ [x, v]_p \right]. \]
If $\omega$ is a $k$-form on the manifold $M$ and $x : U \to \mathbb{R}^m$ is a coordinate system on $M$, then for all $p \in U$ we define the $(k+1)$-form $d\omega$ by
\[ d\omega(p) := \sum_{1 \le i_1 < \cdots < i_k \le m} d\omega_{i_1, \ldots, i_k}(p) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \]
and call it the differential of $\omega$.

To make sure everything is in order, we must prove that this definition is independent of the coordinate system used to define the differential. For 0-forms, this was already done before we defined the differential. Because the brute force computation for $k$-forms is quite tedious, we first establish some properties of $d$ and then show that all operators with the same properties as $d$ must be equal to $d$. This also establishes the properties of the differential that we will need later on. Because the differential is defined locally, it has the same properties as the differential from Definition 19.31.

Theorem 19.44 Let $\omega$ be a $k$-form and let $\eta$ be an $l$-form on the manifold $M$. Then the following hold:
1. If $k = l$, then $d(\omega + \eta) = d\omega + d\eta$.
2. $d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^k \omega \wedge d\eta$.
3. $d(d\omega) = 0$.

Proof. Mimic the proof of Theorem 19.35 (Exercise 19-34).

Theorem 19.45 Let $M$ be a manifold, let $x : U \to \mathbb{R}^m$ be a coordinate system and let $d'$ be a map that takes $k$-forms on $U$ to $(k+1)$-forms on $U$ so that for all infinitely differentiable $f$ on $U$ the equation $d'f = \sum_{j=1}^m \frac{\partial f}{\partial x_j}(p) \, dx_j = df$ holds and so that the properties of Theorem 19.44 hold for $d'$. Then $d' = d$.

Proof. By additivity of $d$ and $d'$ (see part 1 of Theorem 19.44) it is enough to show that $d\left( f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right) = d'\left( f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right)$ for all $i_1 < \cdots < i_k$ and $f$ infinitely differentiable on $U$. With $f = x_i$, we have $d'x_i = \sum_{j=1}^m \frac{\partial x_i}{\partial x_j}(p) \, dx_j = dx_i$, but we should recall that the "partial derivatives" are not quite ordinary partial derivatives. Therefore, although it is simple, the reader should verify that the partials really vanish for $i \neq j$ and that the $i$th partial is 1 (Exercise 19-35).

We now prove by induction on $k$ that $d'\left( dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right) = 0$. For $k = 1$, this follows from $d'(dx_i) = d'(d'x_i) = 0$. For the step $k \to (k+1)$, we note
\begin{align*}
d'\left( dx_{i_1} \wedge \cdots \wedge dx_{i_{k+1}} \right)
&= d'\left( d'x_{i_1} \wedge \left( d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}} \right) \right) \\
&= d'\left( d'x_{i_1} \right) \wedge \left( d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}} \right) - d'x_{i_1} \wedge d'\left( d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}} \right) \\
&= d'\left( d'x_{i_1} \right) \wedge \left( d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}} \right) - d'x_{i_1} \wedge d'\left( dx_{i_2} \wedge \cdots \wedge dx_{i_{k+1}} \right) \\
&= 0,
\end{align*}
where the last step is justified by $d'(d'x_i) = 0$ and by the induction hypothesis. Therefore, with $f$ being a 0-form, we conclude the following for all $k$.
\begin{align*}
d'\left( f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right)
&= d'f \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} + (-1)^0 f \wedge d'\left( dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right) \\
&= df \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} + 0 \\
&= d\left( f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \right),
\end{align*}
which establishes the result.

Theorem 19.45 shows that Definition 19.43 does not depend on the coordinate system we choose.
Indeed, on the domain of each coordinate system, any differential $d$ defined as in Definition 19.43 with a specific coordinate system has the properties in Theorem 19.44. By Theorem 19.45, any two such operators must be equal on the intersection of the domains of the coordinate systems used in their definitions.

With forms and their differentials defined on manifolds, we can translate some results from the unit cube to manifolds. We first turn to the parametrizations of the domain of integration.

Definition 19.46 Let $M$ be a manifold, let $k \in \mathbb{N}$ and let $x$ be a coordinate system whose range contains $[0,1]^k \times \{0\}^{m-k}$. A singular $k$-cube $c : [0,1]^k \to M$ in $M$ is the composition of the restriction of the function $x^{-1}$ to $[0,1]^k \times \{0\}^{m-k}$ with the natural bijection from $[0,1]^k$ to $[0,1]^k \times \{0\}^{m-k}$. For $k = 0$, we let a singular 0-cube be the function that maps 0 to $x^{-1}(0)$.

A singular 0-cube is a point, a singular 1-cube is a curve and a singular 2-cube is a surface with square parameter domain. The standard $k$-cube is the identity function $I^k(x) = x$ on $[0,1]^k$ (as we already noted in Definition 19.39). Section 19.3 indicated the importance of chains, so we define them similar to Definition 19.39.

Definition 19.47 Let $M$ be a manifold and let $S$ be the set of all singular $k$-cubes in $M$. Define the set $C$ of all $k$-chains in $M$ to be the set of all functions $f : S \to \mathbb{Z}$ such that $f(c) = 0$ for all but finitely many $c$. The elements of $C$ are usually written as $\sum_{i=1}^n a_i c_i$, where the $a_i$ are integers and the sum has purely formal character: $f = \sum_{c\in S,\ f(c)\neq 0} f(c)\,c$.

The definition of the boundary of a singular $k$-cube is motivated by Theorem 19.38 and the discussion thereafter.

Definition 19.48 For a singular $k$-cube $c$ in the manifold $M$, we define $c_{(i,0)} := c\circ I^k_{(i,0)}$ and $c_{(i,1)} := c\circ I^k_{(i,1)}$, and we set the boundary of $c$ equal to
$$\partial c = \sum_{i=1}^k (-1)^i\big(c_{(i,0)} - c_{(i,1)}\big).$$
For a $k$-chain $\sum_{j=1}^n a_j c_j$, we define $\partial\Big(\sum_{j=1}^n a_j c_j\Big) := \sum_{j=1}^n a_j\,\partial c_j$.

To integrate a form, we want to compose it with a singular $k$-chain and integrate the resulting form. That means we must first devise a way to compose forms with $k$-chains. For a linear map, it is easy to define a corresponding map from the $k$-tensors on the range to the $k$-tensors on the domain as follows.

Definition 19.49 Let $V$, $W$ be vector spaces. For $T \in \mathcal{T}^k(W)$, $f : V \to W$ linear and $v_1,\dots,v_k \in V$ define $f^*T(v_1,\dots,v_k) = T(f(v_1),\dots,f(v_k))$.

For $f : M \to N$ differentiable, we know that $f_{*p}$ is a linear map from $M_p$ to $N_{f(p)}$.

Definition 19.50 Let $M$, $N$ be manifolds, let $f : M \to N$ be differentiable and let $\omega$ be a $k$-form on $N$. Similar to Definition 19.49, for each $p \in M$ we define the $k$-form $f^*\omega$ on $M$ by
$$(f^*\omega)(p)[v_1,\dots,v_k] := \omega(f(p))\big[f_{*p}[v_1],\dots,f_{*p}[v_k]\big].$$
In coordinates, this definition reads as follows:
$$f^*\Big(\sum_{i_1<\dots<i_k}\omega_{i_1,\dots,i_k}\,dy_{i_1}\wedge\dots\wedge dy_{i_k}\Big) = \sum_{i_1<\dots<i_k}\big(\omega_{i_1,\dots,i_k}\circ f\big)\,f^*\big(dy_{i_1}\wedge\dots\wedge dy_{i_k}\big).$$
For a 0-form $g$, we define $f^*g := g\circ f$.

Note the similarity between Definition 19.50 and the Multidimensional Substitution Formula. If the $\omega_{i_1,\dots,i_k}$ are the functions and the $dy_{i_1}\wedge\dots\wedge dy_{i_k}$ are the volume elements, then the right side consists of the composition of the function with the diffeomorphism $f$ multiplied with the transformed volume element. To make working with the coordinate representations easier we first state a simple lemma.

Lemma 19.51 Let $M$, $N$ be manifolds, let $x : U \to \mathbb{R}^m$ be a coordinate system, let $p \in U$, let $f : M \to N$ be differentiable and let $y : V \to \mathbb{R}^n$ be a coordinate system with $f(p) \in V$. Then
$$f_{*p}\Big(\frac{\partial}{\partial x_i}\Big|_p\Big) = \sum_{j=1}^n \frac{\partial (y_j\circ f)}{\partial x_i}(p)\,\frac{\partial}{\partial y_j}\Big|_{f(p)}.$$

Proof. Represent $D(y\circ f\circ x^{-1})$ componentwise (Exercise 19-37). ∎

Now we can show that the differential commutes with $f^*$, which is the key to Stokes' Theorem for $k$-chains.

Theorem 19.52 Let $M$, $N$ be manifolds, let $f : M \to N$ be differentiable and let $\omega$ be a $k$-form on $N$. Then the equality $f^*(d\omega) = d(f^*\omega)$ holds.

Proof.
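It is enough to prove the result for forms $\omega = g\,dy_{i_1}\wedge\dots\wedge dy_{i_k}$. For these forms, we proceed by induction on $k$. For $k = 0$, let $x$ be a coordinate system around $p$ and let $y$ be a coordinate system around $f(p)$. It is enough to prove the claimed equality for the base vectors $\frac{\partial}{\partial x_j}\Big|_p$, $j = 1,\dots,m$. Via Lemma 19.51 we obtain the following for all $j \in \{1,\dots,m\}$:
$$\begin{aligned}
f^*(dg)(p)\Big[\frac{\partial}{\partial x_j}\Big|_p\Big]
&= dg(f(p))\Big[f_{*p}\Big(\frac{\partial}{\partial x_j}\Big|_p\Big)\Big]
= \sum_{i=1}^n \frac{\partial (y_i\circ f)}{\partial x_j}(p)\,\frac{\partial g}{\partial y_i}(f(p))\\
&= \frac{\partial (g\circ f)}{\partial x_j}(p)
= d(f^*g)(p)\Big[\frac{\partial}{\partial x_j}\Big|_p\Big].
\end{aligned}$$
For the induction step, we argue as follows, using that $f^*$ is compatible with wedge products, that $d(dy_i) = 0$ and part 2 of Theorem 19.44. With $\alpha := g\,dy_{i_1}\wedge\dots\wedge dy_{i_k}$ we have
$$\begin{aligned}
f^*\big(d(\alpha\wedge dy_{i_{k+1}})\big)
&= f^*\big(d\alpha\wedge dy_{i_{k+1}}\big)
= f^*(d\alpha)\wedge f^*(dy_{i_{k+1}})\\
&= d(f^*\alpha)\wedge d(f^*y_{i_{k+1}})
= d\big(f^*\alpha\wedge d(f^*y_{i_{k+1}})\big)\\
&= d\big(f^*\alpha\wedge f^*(dy_{i_{k+1}})\big)
= d\big(f^*(\alpha\wedge dy_{i_{k+1}})\big). \qquad ∎
\end{aligned}$$

Now the following definition is sensible because open subsets of $\mathbb{R}^k$ are manifolds themselves. To assure everything is defined everywhere, we should assume that the function $c$ in the singular $k$-cube is actually defined on a neighborhood of $[0,1]^k$, but a look at Definition 19.46 reveals that this is not a problem. Conscientious readers may want to mentally substitute $x^{-1}$ instead of $c$ in the definition below.

Definition 19.53 Let $k \in \mathbb{N}$, let $c$ be a singular $k$-cube in $M$ and let $\omega$ be a $k$-form on $M$. We define $\int_c \omega := \int_{[0,1]^k} c^*\omega$. For $k = 0$, we define $\int_c \omega := \omega(c(0))$. For a $k$-chain $c = \sum_{i=1}^n a_i c_i$ in $M$, we define $\int_c \omega := \sum_{i=1}^n a_i \int_{c_i}\omega$.

Theorem 19.54 Stokes' Theorem for chains. If $\omega$ is a $(k-1)$-form on a manifold $M$ and $c$ is a $k$-chain in $M$, then $\int_c d\omega = \int_{\partial c}\omega$.

Proof. Recall that by Theorem 19.38 the result holds for cubes $I^k$. (This is most easily seen with notation as in Corollary 19.40.) The idea for the proof therefore is to use Theorem 19.38 and we use it with the notation of Corollary 19.40. For a singular $k$-cube $c$ we obtain
$$\int_c d\omega = \int_{I^k} c^*(d\omega) = \int_{I^k} d(c^*\omega) = \int_{\partial I^k} c^*\omega = \int_{\partial c}\omega,$$
and the result for $k$-chains follows by summation. ∎

Unfortunately, the integral of a form over a $k$-chain may depend on the coordinate system we chose to define the chain. Ultimately, we want an integral that only depends on the set $x^{-1}\big[[0,1]^k\big]$. Thus we need to investigate how integrals over $k$-cubes behave under a change of coordinates. The next result shows that a reparametrization $f$ of the set $x^{-1}\big[[0,1]^k\big]$ should not affect the value of the integral, because the reparametrized form $f^*(\omega)$ will be multiplied by the factor that we recognize from the Multivariable Substitution Formula.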
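To see the factor from the substitution formula appear in a concrete case, consider the following worked computation. It is only a sketch under assumptions that are not taken from the text: $M = (0,\infty)\times(0,2\pi)$ and $N = \mathbb{R}^2$ are given their identity coordinate systems $(r,\theta)$ and $(y_1,y_2)$, and $f$ is the polar coordinate map. The computation uses $f^*(dy_i) = d(y_i\circ f)$, which is the case $k = 0$ of Theorem 19.52.

```latex
\begin{align*}
f(r,\theta) &= (r\cos\theta,\ r\sin\theta),\\
f^*(dy_1) &= d(r\cos\theta) = \cos\theta\,dr - r\sin\theta\,d\theta,\\
f^*(dy_2) &= d(r\sin\theta) = \sin\theta\,dr + r\cos\theta\,d\theta,\\
f^*(dy_1\wedge dy_2)
  &= (\cos\theta\,dr - r\sin\theta\,d\theta)\wedge(\sin\theta\,dr + r\cos\theta\,d\theta)\\
  &= r\cos^2\theta\,dr\wedge d\theta - r\sin^2\theta\,d\theta\wedge dr\\
  &= r\,dr\wedge d\theta .
\end{align*}
```

The coefficient $r$ is the determinant of the Jacobian matrix of $f$, which is the familiar factor in the change to polar coordinates.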
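Theorem 19.55 Let $M$ and $N$ be $m$-dimensional manifolds, let $f : M \to N$ be differentiable, let $x : U \to \mathbb{R}^m$ be a coordinate system of $M$ and let $y : V \to \mathbb{R}^m$ be a coordinate system of $N$ with $f[U] \subseteq V$. Then for every differentiable function $g : N \to \mathbb{R}$ we have
$$f^*(g\,dy_1\wedge\dots\wedge dy_m) = (g\circ f)\,\det\Big(\frac{\partial (y_i\circ f)}{\partial x_j}\Big)\,dx_1\wedge\dots\wedge dx_m.$$

Proof. Let $p \in U$. To prove the claimed equation at $p$, we only need to consider the action of the functions on $\Big(\frac{\partial}{\partial x_1}\Big|_p,\dots,\frac{\partial}{\partial x_m}\Big|_p\Big)$ (see Exercise 19-36). After rewriting $f_{*p}\Big(\frac{\partial}{\partial x_i}\Big|_p\Big)$ as $\sum_{j=1}^m \frac{\partial (y_j\circ f)}{\partial x_i}(p)\,\frac{\partial}{\partial y_j}\Big|_{f(p)}$ (see Lemma 19.51) we obtain the following:
$$\begin{aligned}
f^*(g\,dy_1\wedge\dots\wedge dy_m)(p)\Big[\frac{\partial}{\partial x_1}\Big|_p,\dots,\frac{\partial}{\partial x_m}\Big|_p\Big]
&= g(f(p))\,dy_1\wedge\dots\wedge dy_m\Big[f_{*p}\Big(\frac{\partial}{\partial x_1}\Big|_p\Big),\dots,f_{*p}\Big(\frac{\partial}{\partial x_m}\Big|_p\Big)\Big]\\
&= g(f(p))\,\det\Big(\frac{\partial (y_j\circ f)}{\partial x_i}(p)\Big)\,dy_1\wedge\dots\wedge dy_m\Big[\frac{\partial}{\partial y_1}\Big|_{f(p)},\dots,\frac{\partial}{\partial y_m}\Big|_{f(p)}\Big]\\
&= (g\circ f)(p)\,\det\Big(\frac{\partial (y_j\circ f)}{\partial x_i}(p)\Big)\,dx_1\wedge\dots\wedge dx_m\Big[\frac{\partial}{\partial x_1}\Big|_p,\dots,\frac{\partial}{\partial x_m}\Big|_p\Big],
\end{aligned}$$
where the last step holds because both $m$-forms evaluate to 1 at the given vectors. ∎

Now we can prove that the integrals of certain $m$-forms over $m$-cubes are independent of the parametrization.

Theorem 19.56 Let $c_1, c_2 : [0,1]^m \to M$ be two $m$-cubes in the $m$-dimensional manifold $M$ so that $\det\big(D(c_1^{-1}\circ c_2)\big) > 0$ at every point in the domain of $c_1^{-1}\circ c_2$ and let $\omega$ be an $m$-form that vanishes outside $c_1\big[[0,1]^m\big]\cap c_2\big[[0,1]^m\big]$. Then $\int_{c_1}\omega = \int_{c_2}\omega$.

Proof. Let $x$, $y$ be coordinate systems on $M$ so that the cubes are $c_1 = x^{-1}\big|_{[0,1]^m}$ and $c_2 = y^{-1}\big|_{[0,1]^m}$. Then with $\omega = h\,dx_1\wedge\dots\wedge dx_m$ and $\{e_1,\dots,e_m\}$ the standard base for $\mathbb{R}^m$ we obtain
$$\begin{aligned}
\int_{c_2}\omega
&= \int_{[0,1]^m} c_2^*\omega
= \int_{[0,1]^m} (c_1^{-1}\circ c_2)^*\big(c_1^*\omega\big)\\
&= \int_{[0,1]^m} h\circ c_1\circ(c_1^{-1}\circ c_2)\,\det\big(D(c_1^{-1}\circ c_2)\big)\,d\lambda\\
&= \int_{c_1^{-1}\circ c_2[[0,1]^m]} h\circ c_1\,d\lambda
= \int_{[0,1]^m} h\circ c_1\,d\lambda
= \int_{c_1}\omega. \qquad ∎
\end{aligned}$$

We conclude by noting that everything we have done in this section also works for manifolds with boundaries and for manifolds with corners. The only difference is that the $k$-cubes could actually touch the boundary in these settings. Hence, we will freely use the results established here for these entities, too.

Exercises

19-34. Prove Theorem 19.44.

19-35. Let $M$ be a manifold, let $x : U \to \mathbb{R}^m$ be a coordinate system and let $x_i := \pi_i\circ x$ be the $i$th component of $x$. Prove that with the new definition of $\frac{\partial}{\partial x_j}$ we have, as we would expect, that $\frac{\partial x_i}{\partial x_j} = 0$ for $i \neq j$ and $\frac{\partial x_i}{\partial x_i} = 1$.

19-36. Let $\omega : M \to \bigcup_{p\in M}\Lambda^k(M_p)$ be a $k$-form on $M$.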
Prove that each function $\omega^x_{i_1,\dots,i_k}$ from Definition 19.42 satisfies $\omega^x_{i_1,\dots,i_k}(p) = \omega(p)\big[[x,e_{i_1}]_p,\dots,[x,e_{i_k}]_p\big]$.

19-37. Prove Lemma 19.51.

19-38. Let $M$, $N$, $O$ be manifolds and let $g : M \to N$ and $f : N \to O$ be differentiable. Prove that $g^*f^* = (f\circ g)^*$.

19-39. The boundary of the boundary of a singular $k$-cube.
(a) Let $k \in \mathbb{N}$ be so that $k \geq 2$, let $1 \leq i \leq j \leq k-1$ and let $\alpha,\beta \in \{0,1\}$. Prove that $\big(I^k_{(i,\alpha)}\big)_{(j,\beta)} = \big(I^k_{(j+1,\beta)}\big)_{(i,\alpha)}$.
Hint. Straight substitution of a point $z \in [0,1]^{k-2}$ into both sides.
(b) Prove that if $c$ is a $k$-chain in the manifold $M$, then $\partial(\partial c) = 0$.
Hint. Represent the boundary operator as $\partial c = \sum_{i=1}^k\sum_{\alpha=0,1}(-1)^{i+\alpha}c_{(i,\alpha)}$ and use the translation of part 19-39a to singular $k$-cubes to prove that the summands of $\partial(\partial c)$ cancel in pairs.

19.5 Integration on Manifolds

Theorem 19.56 indicates that integration over $m$-cubes should make it possible to define integration "locally" on a manifold. To obtain a "global" notion of integration, we need to piece the local integrals together. With a partition of unity as in Section 16.9 we can reduce any globally defined function to a sum of "locally supported" functions. However, we need to assure that our transition from local to global integrals does not have any unintended consequences. Theorem 19.56 rules out the danger of local inconsistencies. Globally, one of the ideas behind the integral of a form (a vector field) over a manifold is to measure the throughput of a flow (modeled by the form/vector field) through a surface (modeled by the manifold). This idea of throughput is only sensible if the surface has a definite front and back, or, if it is closed, a definite inside and outside. Locally, it is always possible to define a front and a back, but an awkward global situation can cause problems. Consider the Möbius strip depicted in Figure 52.
If we start as indicated in the figure, traveling on what we perceive as the "front" of the strip, then after one full traversal of the strip we find ourselves on what we perceive as the back of the strip, plus our directions of "up" and "down" have been reversed. Therefore it is not possible to globally define a positive direction of throughput through the Möbius strip. To avoid such problems, we must limit our attention to those manifolds for which we cannot lose our orientation like that. The first step is to recall that, because for $d$-dimensional vector spaces $V$ the space $\Lambda^d(V)$ is one dimensional, we can represent the set of bases of $V$ as a union of two disjoint subsets.

Definition 19.57 Let $V$ be a $d$-dimensional real vector space and let $\omega \in \Lambda^d(V)\setminus\{0\}$. The two sets of bases $P := \{B : B = \{v_1,\dots,v_d\}$ is a base of $V$, $\omega(v_1,\dots,v_d) > 0\}$ and $N := \{B : B = \{v_1,\dots,v_d\}$ is a base of $V$, $\omega(v_1,\dots,v_d) < 0\}$ are called orientations of $V$ and the orientation to which a base $(v_1,\dots,v_d)$ belongs is denoted $[v_1,\dots,v_d]$.

For example, two orthonormal bases in $\mathbb{R}^2$ (with the order of the vectors assumed to be fixed) are in the same orientation iff each can be obtained from the other by rotating both base vectors by the same amount. This simple visualization shows the problem with the Möbius strip. If we choose an orthonormal base at our starting point (see Figure 52) and carry it with us in a way that tries to maintain the orientation (one such way is indicated), then, no matter what we do, after one full traversal of the strip, our base will be in the other orientation of $\mathbb{R}^2$. In Figure 52, this is easy to see because if we rotate the base labeled "problem" back onto our original base, we notice that the base vectors have changed roles. The originally horizontal vector is now vertical and vice versa. Such an interchange cannot be achieved with rotations alone.
In terms of the form $\omega$, the value of $\omega$ on the pair of vectors has changed sign, which means the bases are in opposite orientations.

For a general manifold, we cannot refer to surrounding space, but the tangent spaces allow us to model the idea described above. Simply speaking, we demand that in each domain of a chart $x$ the orientation of the tangent spaces is given by the images of a fixed orientation of $\mathbb{R}^m$ under the parametrization $x^{-1}$ of the manifold.

Figure 52: It is impossible to orient a Möbius strip (b). Because of the half-twist in the strip, any orientation that is carried around the strip in a continuous fashion will not arrive as the same orientation at the starting point. On the other hand, spheres are orientable as indicated by the consistent orientation in (a). (We need to be careful interpreting this figure. There is a difficult theorem in topology that says that on a sphere there is no continuous vector field without zeroes. Thus the bases indicated cannot be interpolated by vector fields. However, any two bases can be mapped to each other by translating tangentially along the sphere and then rotating and this process also works if we carry a base in any fashion around the sphere until we are back at the starting point.)

Definition 19.58 A choice of an orientation $\mu_p$ for every tangent space $M_p$ of the $m$-dimensional manifold $M$ is called consistent iff for every chart $x : U \to \mathbb{R}^m$ and all $a, b \in U$ we have $\Big[x^{-1}_*\big[[\mathrm{id}_{x[U]},e_1]_{x(a)}\big],\dots,x^{-1}_*\big[[\mathrm{id}_{x[U]},e_m]_{x(a)}\big]\Big] = \mu_a$ if and only if $\Big[x^{-1}_*\big[[\mathrm{id}_{x[U]},e_1]_{x(b)}\big],\dots,x^{-1}_*\big[[\mathrm{id}_{x[U]},e_m]_{x(b)}\big]\Big] = \mu_b$. Manifolds with a consistent orientation are called orientable and the function $\mu$ is called an orientation. A manifold together with an orientation is called an oriented manifold. Charts $x : U \to \mathbb{R}^m$ such that for all $a \in U$ we have $\Big[x^{-1}_*\big[[\mathrm{id}_{x[U]},e_1]_{x(a)}\big],\dots,x^{-1}_*\big[[\mathrm{id}_{x[U]},e_m]_{x(a)}\big]\Big] = \mu_a$ are called orientation preserving.
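The sign test behind Definition 19.57 and the Möbius strip discussion can be checked by hand in the plane. The following is a small worked computation, assuming for illustration only that $V = \mathbb{R}^2$ and that $\omega = \det$ is the determinant of the matrix with columns $v_1, v_2$ (this choice of $\omega$ and the angle $\theta$ are ours, not the text's).

```latex
\begin{align*}
\omega(e_1,e_2) &= \det\begin{pmatrix}1&0\\0&1\end{pmatrix} = 1 > 0,\\
\omega(R_\theta e_1,R_\theta e_2)
  &= \det\begin{pmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{pmatrix}
   = \cos^2\theta+\sin^2\theta = 1 > 0,\\
\omega(e_2,e_1) &= \det\begin{pmatrix}0&1\\1&0\end{pmatrix} = -1 < 0 .
\end{align*}
```

Rotating both base vectors by the same angle $\theta$ keeps the base in the orientation $[e_1,e_2]$, whereas interchanging the two vectors moves it to the opposite orientation. This is the sign change that occurs after one traversal of the Möbius strip.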
Note that because $k$-cubes are defined in terms of coordinate systems, Definition 19.58 also defines orientation preserving $k$-cubes. For orientation preserving $m$-cubes $c_1$ and $c_2$, the determinant $\det\big(D(c_1^{-1}\circ c_2)\big)$ is always positive. Therefore Theorem 19.56 shows that the integral of an $m$-form over an orientation preserving $m$-cube only depends on the range $c\big[[0,1]^m\big]$ of the $m$-cube, not on the specific $m$-cube itself. That is, the integral over $c\big[[0,1]^m\big]$ does not depend on the parametrization, as long as we only use orientation preserving parametrizations ($m$-cubes).

Definition 19.59 Let $c : [0,1]^m \to M$ be an orientation-preserving $m$-cube in the $m$-dimensional oriented manifold $M$ and let $\omega$ be an $m$-form that vanishes outside $c\big[[0,1]^m\big]$. Then we define $\int_M \omega := \int_c \omega$.

Orientations and forms are defined similarly for manifolds with boundary and for manifolds with corners.

Now that we have taken care of the orientability issue, the definition of the integral of a form over a manifold is straightforward. To avoid formal problems, we stay with connected manifolds. This is not a problem, because we will mostly be interested in manifolds that are the disjoint union of at most finitely many connected manifolds. To break up the integral, we need infinitely differentiable partitions of unity that are supported inside singular $m$-cubes. The following results show that such partitions exist.

Definition 19.60 A partition of unity of the manifold $M$ is called a $C^\infty$ partition of unity iff all functions in the partition are $C^\infty$ functions.

Theorem 19.61 Let $\mathcal{O}$ be an open cover of the connected manifold $M$. Then there is a $C^\infty$ partition of unity subordinate to $\mathcal{O}$.

Proof. This proof is the same as for Theorem 16.115, except that we choose the functions involved to be $C^\infty$ instead of just continuous. By Theorem 19.18, this is possible. ∎

Theorem 19.62 Let $M$ be an $m$-dimensional oriented manifold.
Then there is an open cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular $m$-cube $c_U$ with $U \subseteq c_U\big[[0,1]^m\big]$.

Proof. We provide a slightly more elaborate proof than necessary so that it is easy to see how to generalize the result to manifolds with boundary or corners. For each $p \in M$, let $x : V \to \mathbb{R}^m$ be an orientation-preserving coordinate system around $p$ that maps $p$ to the origin. Let $C_p \subseteq x[V]$ be a compact cube in $\mathbb{R}^m$ so that $x(p)$ is in the relative interior (with respect to $x[V]$) of $C_p$. Then with $i_{C_p} : [0,1]^m \to C_p$ being the natural bijection between the cubes, the function $c_p := x^{-1}\circ i_{C_p}$ is an orientation preserving singular $m$-cube so that the $M$-interior of $c_p\big[[0,1]^m\big]$ contains $p$. Now let $W_p \subseteq C_p$ be a relatively open subset of $x[V]$ that contains $x(p)$. Then $U_p := x^{-1}[W_p]$ is open, $p \in U_p \subseteq c_p\big[[0,1]^m\big]$ and $\mathcal{O} := \{U_p\}_{p\in M}$ covers $M$. ∎

Versions of Theorems 19.61 and 19.62 can also be proved as follows for manifolds with boundary and manifolds with corners. For the $C^\infty$ partition of unity, we just need to modify Theorem 19.18 appropriately for the boundary points. For the covers, the relatively open sets in the proof of Theorem 19.62 will not be open in $\mathbb{R}^m$, because for boundary points of manifolds with boundary one face of the cube $C_p$ will be contained in the boundary of the space $\mathbb{H}^m$. For boundary points of manifolds with corners, several faces could be contained in the boundary of the range of the coordinate system.

With all machinery in place it is now easy to see (Exercise 19-41) that the definition below really defines only one number that does not depend on the parametrizations or the choice of partition of unity. The same definition also gives the integral over manifolds with boundary and manifolds with corners.

Definition 19.63 Let $M$ be an $m$-dimensional oriented manifold and let $\omega$ be a compactly supported $m$-form on $M$.
Let $\Phi$ be a partition of unity subordinate to an open cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular $m$-cube $c_U$ with $U \subseteq c_U\big[[0,1]^m\big]$. Then we define
$$\int_M \omega := \sum_{\varphi\in\Phi}\int_M \varphi\,\omega.$$

It is particularly noteworthy that because $\omega$ is compactly supported, the sum in Definition 19.63 actually is finite (see Exercise 19-40).

To visualize these ideas, consider integration over embedded manifolds in $\mathbb{R}^d$. For an embedded manifold $M$, we can consider every tangent space $M_p$ to be a subspace of the tangent space $\mathbb{R}^d_p$ of $\mathbb{R}^d$ at $p$. The identification is done as follows. Let $p \in M$, let $U \subseteq \mathbb{R}^d$ be a neighborhood of $p$ in $\mathbb{R}^d$, let $V \subseteq \mathbb{R}^d$ be open and let $h : U \to V$ be a diffeomorphism as in Definition 19.5. As in the proof of Proposition 19.6 let $x := \pi_{\mathbb{R}^m}\circ h|_{U\cap M}$ be the coordinate system around $p$ obtained from $h$. The function $e : M_p \to \mathbb{R}^d_p$ defined by $e[x,v]_p := [h,v]_p$ is the desired isomorphism (see Exercise 19-42). The definition $\big\langle[\mathrm{id}_{\mathbb{R}^d},v]_p,[\mathrm{id}_{\mathbb{R}^d},w]_p\big\rangle_p := \langle v,w\rangle$, where $\langle\cdot,\cdot\rangle$ is the usual inner product in $\mathbb{R}^d$, produces a natural inner product on $\mathbb{R}^d_p$. (This inner product on $\mathbb{R}^d_p$ is well-defined, because the definition states that we will use the second component of one unique representative of the tangent vector.) Hence, via the above embedding, the tangent spaces $M_p$ of embedded manifolds also carry an inner product. Recalling Proposition 18.29 we can define the following.

Definition 19.64 Let $M$ be an $m$-dimensional embedded oriented manifold in $\mathbb{R}^d$. Then for each $p \in M$ we define the volume element $\omega_V(p)$ to be the unique $m$-form so that for all orthonormal bases $\{v_1,\dots,v_m\}$ in $\mu_p$ we have $\omega_V(p)[v_1,\dots,v_m] = 1$.

Example 19.65 Volume elements encode the integral over open subsets of $\mathbb{R}^d$, as well as the integrals over curves and surfaces.

1. Let $\Omega \subseteq \mathbb{R}^d$ be an open set considered as an oriented embedded manifold with $\mu_p = [e_1,\dots,e_d]$ for all $p \in \Omega$. Then with $x : \Omega \to \mathbb{R}^d$ denoting the natural embedding, the volume element $\omega_V$ can be represented as the wedge product $\omega_V = dx_1\wedge\dots\wedge dx_d$.
In particular, this means that for all $f \in C_c^\infty(\Omega)$ the integral of $f\omega_V$ from Definition 19.63 coincides with the Lebesgue integral of $f$ over $\Omega$. We will also denote this integral by $\int_\Omega f\,dV$.

2. Let $M$ be a $(d-1)$-dimensional embedded oriented manifold and let $x$ be an orientation-preserving coordinate system around $p$, constructed from a diffeomorphism $h : U \to V$ as in Definition 19.5. We define the unit normal vector to be the unique unit vector $n(p)$ in $\mathbb{R}^d_p$ so that $\big(n(p),h^{-1}_*(e_1),\dots,h^{-1}_*(e_{d-1})\big)$ is a positively oriented base of $\mathbb{R}^d_p$. Because the hyperplane determined by $Dh^{-1}(h(p))[e_1],\dots,Dh^{-1}(h(p))[e_{d-1}]$ (see Exercise 19-19) does not depend on the diffeomorphism $h$, the vector $n(p)$ does not depend on the coordinate system. Moreover, the functions $n_j : M \to \mathbb{R}$ that map each $p \in M$ to the $j$th coordinate of the unit normal vector are differentiable. To see this, in a neighborhood of $p$, let $\tilde{n}$ be the solution vector of the system of equations $\big\langle\tilde{n},Dh^{-1}[e_i]\big\rangle = 0$, $i = 1,\dots,d-1$ and $\tilde{n}_j = 1$ for some $j$ so that $\Big(\frac{\partial h^{-1}}{\partial x_d}\Big)_j \neq 0$ in the $h$-image of the neighborhood of $p$. Then $\tilde{n}$ is parallel or antiparallel to $n$. By Exercise 17-71 and the differentiability of the coefficients of the system, the components $\tilde{n}_j$ are differentiable. The components of $n$ are either the components of $\frac{\tilde{n}}{\|\tilde{n}\|}$ or of $-\frac{\tilde{n}}{\|\tilde{n}\|}$, so the $n_j$ are differentiable, too. The volume element of $M$ is $\omega_V(p)[v_1,\dots,v_{d-1}] = \det\big(n(p),v_1,\dots,v_{d-1}\big)$, where in the determinant we use the coordinate representation of the vectors with respect to the base $\{[\mathrm{id},e_1]_p,\dots,[\mathrm{id},e_d]_p\}$.
Specifically for the case $d = 3$, if $r$ is an orientation preserving singular 2-cube and $f : M \to \mathbb{R}$ is differentiable, then
$$\int_r f\,\omega_V = \int_{[0,1]^2} f(r(s,t))\,\Big\|\frac{\partial r}{\partial s}(s,t)\times\frac{\partial r}{\partial t}(s,t)\Big\|\,d\lambda(s,t),$$
which is the parametric formula (from multivariable calculus) for the scalar surface integral of a function $f$ over a parametric surface parametrized by $r$. For the integral of a vector field $F : \Omega \to \mathbb{R}^d$ defined on a neighborhood $\Omega$ of $M$, we define the surface integral to be $\int_M F\cdot dS := \int_M \langle F,n\rangle\,\omega_V$. With the above, we obtain the following in case $d = 3$:
$$\int_r \langle F,n\rangle\,\omega_V = \int_{[0,1]^2}\Big\langle F(r(s,t)),\ \frac{\partial r}{\partial s}(s,t)\times\frac{\partial r}{\partial t}(s,t)\Big\rangle\,d\lambda(s,t),$$
which is the parametric formula (from multivariable calculus) for the surface integral of a vector field $F$ over a parametric surface parametrized by $r$ (also see Exercise 18-53c).

3. Let $M$ be a one dimensional embedded oriented manifold in $\mathbb{R}^d$, let $x : U \to \mathbb{R}$ be an orientation preserving coordinate system and let $a := x^{-1}$. Then (Exercise 19-43a) the volume element can be written as $\omega_V(t)[v] = \Big\langle v,\frac{a'(t)}{\|a'(t)\|}\Big\rangle$. The line integral for scalar functions is defined in the obvious way. The line integral for vector fields is defined by $\int_M F\cdot d\vec{r} := \int_M \Big\langle F,\frac{a'}{\|a'\|}\Big\rangle\,\omega_V$. As for the surface integral, these formulas reduce to formulas familiar from multivariable calculus when specific parametrizations are used (see Exercise 19-43).

The above examples show that the integral over manifolds encodes the integrals of scalar functions, as well as of vector fields over solids, hypersurfaces, and curves with one formalism. Moreover, this formalism shows that the integrals are independent of the parametrizations chosen for the objects in question, which is a big advantage for theoretical considerations. The computation of numerical values of these integrals still uses parametrizations and proceeds along the same lines as in calculus.

Finally, note that the definitions presented here actually work in more general settings. The $\sigma$-algebra of Borel sets on $M$ is the $\sigma$-algebra generated by the open subsets of $M$.
The integral can actually be defined for $m$-forms $g\omega_V$ for which the function $g$ is Borel measurable. Hence, if we use the measure $\nu$ that assigns to each Borel subset the integral of its indicator function, we can define the integral like we did on measure spaces. In particular, this means that we can talk about $L^p$ spaces for which the domain is a manifold. Regarding the domain of the integral we note that the manifold need not be $C^\infty$; $C^1$ is sufficient, and boundaries and corners are permissible.

Exercises

19-40. Prove that in any sum as in Definition 19.63 only finitely many summands are not zero.
Hint. Use that the sets $\{p : \varphi(p) \neq 0\}$ form a locally finite open cover of $M$.

19-41. Let $M$ be an $m$-dimensional oriented manifold, let $\omega$ be a compactly supported $m$-form on $M$, let $\Phi$ be a partition of unity subordinate to an open cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular $m$-cube $c_U$ with $U \subseteq c_U\big[[0,1]^m\big]$ and let $\Psi$ be a partition of unity subordinate to an open cover $\widetilde{\mathcal{O}}$ of $M$ so that for each $V \in \widetilde{\mathcal{O}}$ there is an orientation-preserving singular $m$-cube $c_V$ with $V \subseteq c_V\big[[0,1]^m\big]$. Prove that $\sum_{\varphi\in\Phi}\int_M \varphi\,\omega = \sum_{\psi\in\Psi}\int_M \psi\,\omega$.
Hints. Use Theorem 19.56 to prove that the integrals of the $\varphi\psi\omega$ do not depend on whether we choose the cube $c_U$ or $c_V$. Then prove that $\int_M \varphi\,\omega = \sum_{\psi\in\Psi}\int_M \varphi\psi\,\omega$. To switch the order of summations use that by Exercise 19-40 only finitely many summands are not zero.

19-42. Let $M$ be an $m$-dimensional embedded manifold, let $p \in M$, let $U \subseteq \mathbb{R}^d$ be a neighborhood of $p$ in $\mathbb{R}^d$, let $V \subseteq \mathbb{R}^d$ be open and let $h : U \to V$ be a diffeomorphism as in Definition 19.5. As in the proof of Proposition 19.6 let $x := \pi_{\mathbb{R}^m}\circ h|_{U\cap M}$ be the coordinate system around $p$ obtained from $h$. Prove that $e[x,v]_p := [h,v]_p$ defines an isomorphism from $M_p$ onto a subspace of $\mathbb{R}^d_p$. You must also prove that $e$ is well-defined.

19-43. Let $M$ be a one dimensional embedded oriented manifold and let $r : [0,1] \to M$ be an orientation preserving singular 1-cube.
(a) Let $x : U \to \mathbb{R}$ be an orientation preserving coordinate system and let $a := x^{-1}$. Prove that the volume element can be written as $\omega_V(t)[v] = \Big\langle v,\frac{a'(t)}{\|a'(t)\|}\Big\rangle$.
(b) Prove that $\int_r f\,\omega_V = \int_0^1 f(r(t))\,\|r'(t)\|\,d\lambda(t)$ for all differentiable $f : M \to \mathbb{R}$.
(c) Prove that $\int_r \Big\langle F,\frac{a'}{\|a'\|}\Big\rangle\,\omega_V = \int_0^1 \big\langle F(r(t)),r'(t)\big\rangle\,d\lambda(t)$ for all vector fields $F : \Omega \to \mathbb{R}^3$ defined on a neighborhood of $M$.

Figure 53: Visualization of the proof of Stokes' Theorem for embedded manifolds. The parametrization $x^{-1}$ maps the outward normal direction of the cube in its domain to the outward normal direction of the $k$-cube that touches the boundary in the manifold.

19.6 Stokes' Theorem

Because the tangent space $T\partial M$ of the boundary of an oriented manifold $M$ with boundary or with corners is contained in $TM$, we can obtain an orientation for $\partial M$ from the orientation of $M$.

Definition 19.66 Let $M$ be an oriented $m$-dimensional manifold with boundary or corners and let $\mu$ be its orientation. For every $p \in \partial M$ that is not contained in a corner, we let $[v_1,\dots,v_{m-1}] \in (\partial\mu)_p$ iff $[w,v_1,\dots,v_{m-1}] \in \mu_p$ for all outward pointing vectors $w$. The orientation $\partial\mu$ is called the induced orientation or the positive orientation on the boundary. For $d$-dimensional embedded manifolds in $d$-dimensional space, it is also called the outward orientation, because the associated normal vector literally points outward.

We will always assume that the boundary carries the induced orientation. Formally, for a manifold with corners, the integral over the boundary is a sum of integrals over the pieces of the boundary that are manifolds with corners themselves. To ease notation, it will be understood that the integral over the boundary of a manifold with corners is such a sum.

Theorem 19.67 Stokes' Theorem. Let $M$ be an oriented $m$-dimensional manifold with boundary or corners, let $\omega$ be a compactly supported $(m-1)$-form on $M$ and let $\partial M$ carry the induced orientation.
Then $\int_M d\omega = \int_{\partial M}\omega$.

Proof. We first consider forms that are supported in the $M$-interior of an orientation-preserving $m$-cube. The result is trivial if $\omega$ is supported in the interior of an $m$-cube $c$ so that $c\big[[0,1]^m\big] \subseteq M\setminus\partial M$ because then $\int_M d\omega = \int_c d\omega = \int_{\partial c}\omega = 0 = \int_{\partial M}\omega$.

For a form $\omega$ that is supported in the $M$-interior of an orientation preserving $m$-cube $c$ in $M$ that intersects the boundary but not the corners, note that $\partial M\cap c\big[[0,1]^m\big] \subseteq \partial c$ and that $\omega$ is zero on the parts of the boundary of $c$ that do not intersect $\partial M$. Let $p \in \partial M\cap c\big[[0,1]^m\big]$ and let $x$ be an orientation preserving coordinate system around $p$. (For the next statement, let a negative sign indicate the opposite orientation.) Then $x(p) \in \partial\mathbb{H}^m$ and $\mu_{x(p)} = [e_1,\dots,e_m] = (-1)^m[-e_m,e_1,\dots,e_{m-1}]$, where $-e_m$ is the outward unit normal vector. This means that the induced orientation on $\partial M\cap c\big[[0,1]^m\big]$ is $(-1)^m$ times the usual orientation $[e_1,\dots,e_{m-1}]$. We can choose the orientation-preserving singular $m$-cube $c$ so that $\partial M\cap c\big[[0,1]^m\big] = c_{(m,0)}\big[[0,1]^{m-1}\big]$. Note that by the above $c_{(m,0)} : [0,1]^{m-1} \to \partial M$ is orientation preserving for even $m$ and orientation reversing for odd $m$. Hence, $\int_{c_{(m,0)}}\omega = (-1)^m\int_{\partial M}\omega$. But in $\partial c$ the coefficient of $c_{(m,0)}$ is $(-1)^m$. (The left side of Figure 53 gives a visual idea of what we do here.) Hence, we obtain
$$\int_M d\omega = \int_c d\omega = \int_{\partial c}\omega = (-1)^m\int_{c_{(m,0)}}\omega = \int_{\partial M}\omega.$$
Finally, when $c$ intersects the corners, a similar argument for the faces of $c$ that are contained in the boundary gives the same result. (Exercise 19-44. The right side of Figure 53 gives a visual idea of what to do.)

When we integrate a general form, each form $\varphi\omega$ will be supported in the interior of an $m$-cube as indicated above. Thus we should be able to prove the general result by summation. To move the functions $\varphi$ from the partition of unity into the differential $d$, we will use the equality $0 = 0\wedge\omega = d(1)\wedge\omega = d\Big(\sum_{\varphi\in\Phi}\varphi\Big)\wedge\omega = \sum_{\varphi\in\Phi}(d\varphi)\wedge\omega$. By part 2 of Theorem 19.44 with $\varphi$ being a 0-form, we obtain $d(\varphi\omega) = (d\varphi)\wedge\omega + \varphi\,d\omega$.
Hence, because the restrictions of the $\varphi$ to the boundary are a partition of unity with the requisite properties for defining the integral on the boundary, we conclude
$$\int_M d\omega = \sum_{\varphi\in\Phi}\int_M \varphi\,d\omega = \sum_{\varphi\in\Phi}\int_M \big((d\varphi)\wedge\omega + \varphi\,d\omega\big) = \sum_{\varphi\in\Phi}\int_M d(\varphi\omega) = \sum_{\varphi\in\Phi}\int_{\partial M}\varphi\,\omega = \int_{\partial M}\omega. \qquad ∎$$

With the general result established, we can now present Stokes' Theorem in the forms that are familiar from calculus (also see Figure 54). To prove a Divergence Theorem in $\mathbb{R}^d$, we first need to consider the volume element of $(d-1)$-dimensional embedded manifolds.

Theorem 19.68 Let $M$ be an embedded oriented $(d-1)$-dimensional manifold in $\mathbb{R}^d$ and let $n$ be its unit normal vector. Then the volume element can be represented as $\omega_V(p) = \sum_{j=1}^d (-1)^{1+j}n_j(p)\,dx_1\wedge\dots\wedge\widehat{dx_j}\wedge\dots\wedge dx_d$. Moreover, the equality $n_j\,\omega_V = (-1)^{1+j}\,dx_1\wedge\dots\wedge\widehat{dx_j}\wedge\dots\wedge dx_d$ holds.

Proof. Let $v_1,\dots,v_{d-1} \in M_p$. For the representation of $\omega_V$, by Exercises 18-33 and 18-34 via expansion with respect to the first column, we obtain the following, where $\pi_j$ denotes the projection that erases the $j$th component of each vector.
$$\begin{aligned}
\omega_V(p)[v_1,\dots,v_{d-1}]
&= \det\big(n(p),v_1,\dots,v_{d-1}\big)\\
&= \sum_{j=1}^d (-1)^{1+j}n_j(p)\,\det\big(\pi_j(v_1),\dots,\pi_j(v_{d-1})\big)\\
&= \sum_{j=1}^d (-1)^{1+j}n_j(p)\,dx_1\wedge\dots\wedge\widehat{dx_j}\wedge\dots\wedge dx_d\,[v_1,\dots,v_{d-1}].
\end{aligned}$$
For $n_j\,\omega_V$, we obtain with $n$ being the unit normal vector,
$$\begin{aligned}
n_j(p)\,\omega_V(p)[v_1,\dots,v_{d-1}]
&= n_j(p)\,\det\big(n(p),v_1,\dots,v_{d-1}\big)\\
&= \det\big((n_j(p)n(p)-e_j)+e_j,v_1,\dots,v_{d-1}\big)\\
&= \det\big(e_j,v_1,\dots,v_{d-1}\big)\\
&= (-1)^{1+j}\det\big(\pi_j(v_1),\dots,\pi_j(v_{d-1})\big)\\
&= (-1)^{1+j}\,dx_1\wedge\dots\wedge\widehat{dx_j}\wedge\dots\wedge dx_d\,[v_1,\dots,v_{d-1}]. \qquad ∎
\end{aligned}$$

Now we can prove the Divergence Theorem. For the remainder of this section, we adopt notation that is seen in physics and the sciences with vectors indicated by arrows on top of the letter and with integrals over closed surfaces and curves denoted with a circle in the integral sign(s).

Theorem 19.69 Gauss' Theorem, also known as the Divergence Theorem.
If the set $E \subseteq \mathbb{R}^d$ is an embedded oriented connected compact $d$-dimensional manifold with boundary or corners, if $S = \partial E$ with positive (outward) orientation and if $\vec{F}$ is a vector field with continuous partial derivatives on an open region that contains $E$, then
$$\oint_S \vec{F}\cdot d\vec{S} = \int_S \langle\vec{F},n\rangle\,\omega_V^S = \int_E \mathrm{div}(\vec{F})\,\omega_V^E = \int_E \mathrm{div}(\vec{F})\,dV$$
(also see Figure 54(a)).

Proof. This is a consequence of Stokes' Theorem once we note the following:
$$d\big(\langle\vec{F},n\rangle\,\omega_V^S\big) = d\Big(\sum_{j=1}^d F_j\,n_j\,\omega_V^S\Big) = d\Big(\sum_{j=1}^d (-1)^{1+j}F_j\,dx_1\wedge\dots\wedge\widehat{dx_j}\wedge\dots\wedge dx_d\Big) = \sum_{j=1}^d \frac{\partial F_j}{\partial x_j}\,dx_1\wedge\dots\wedge dx_d = \mathrm{div}(\vec{F})\,\omega_V^E. \qquad ∎$$

Figure 54: The Divergence Theorem (a) says that the integral of a vector field $\vec{F}$ over a closed surface $S$ equals the integral of the field's divergence $\mathrm{div}\,\vec{F}$ over the enclosed solid $E$. Stokes' Theorem (b) says that the line integral of a vector field $\vec{F}$ along a closed curve $C$ equals the surface integral of $\mathrm{curl}\,\vec{F}$ over any surface $S$ bounded by the curve.

The result that is typically called "Stokes' Theorem" is also a special case of Theorem 19.67. Note that integrals over two and three dimensional objects are often denoted with two and three integral signs, respectively.

Theorem 19.70 Stokes' Theorem for compact surfaces in $\mathbb{R}^3$. If $S$ is an embedded oriented connected compact two dimensional manifold with boundary or corners, if $C = \partial S$ with positive orientation and if $\vec{F}$ is a vector field with continuous partial derivatives in an open region of $\mathbb{R}^3$ that contains $S$, then (also see Figure 54(b))
$$\oint_C \vec{F}\cdot d\vec{r} = \iint_S \mathrm{curl}(\vec{F})\cdot d\vec{S}.$$

Proof. Exercise 19-45. ∎

We conclude with an important result for line integrals.

Theorem 19.71 Fundamental Theorem for Line Integrals. Let the curve $C$ be parametrized by the continuously differentiable function $\vec{r}(t)$, $a \leq t \leq b$. If $f$ is differentiable and $\nabla f$ is continuous on $C$, then $\int_C \nabla f\cdot d\vec{r} = f(\vec{r}(b)) - f(\vec{r}(a))$.

Proof. This can be proved with manifolds or directly. (Exercise 19-46.) ∎

Exercises

19-44. Let $M$ be an $m$-dimensional manifold with corners, let $x : M \to C_k$ be a coordinate system and let $c = x^{-1}\big|_{[0,1]^m} : [0,1]^m \to M$ be an orientation-preserving $m$-cube.
Prove that for all $(m-1)$-forms $\omega$ that are supported in the $M$-interior of $c\big[[0,1]^m\big]$ the equality $\int_M d\omega = \int_{\partial M} \omega$ holds.
Hint. $[e_1, \ldots, e_m] = (-1)^j\,[-e_j, e_1, \ldots, \widehat{e_j}, \ldots, e_m]$.

19-45. Prove Theorem 19.70.
Hint. With $\vec{F} = (P, Q, R)$, prove that the integrand of the line integral is $P\,dx_1 + Q\,dx_2 + R\,dx_3$. Then use Theorem 19.68 for the double integral.

19-46. Prove Theorem 19.71.

19-47. Some integrals.
(a) Compute the integral of the vector field $\vec{F}(x,y,z) = \ldots$ over $\{ (x,y,0) \in \mathbb{R}^3 : x^2 + y^2 = 1 \}$.
(b) Compute the integral of the vector field $\vec{F}(x,y,z) = \ldots$ over the line segment from $(0,0,1)$ to $(4,-3,2)$.
(c) Compute the integral of the vector field $\vec{F}(x,y,z) = (x^2z^2 + \ldots,\ \ldots,\ z - y)$ over the surface of the cylinder $\{ (x,y,z) \in \mathbb{R}^3 : x^2 + y^2 \le 1,\ -1 \le z \le 1 \}$.

19-48. A subset $\Omega \subseteq \mathbb{R}^m$ is called convex iff for all $x, y \in \Omega$ the line segment $\{ x + t(y - x) : t \in [0,1] \}$ is contained in $\Omega$.
(a) Prove that if $\Omega \subseteq \mathbb{R}^3$ is open and convex and $\vec{F} : \Omega \to \mathbb{R}^3$ satisfies $\operatorname{curl}\vec{F} = 0$ on $\Omega$, then there is a differentiable function $\varphi : \Omega \to \mathbb{R}$ so that $\vec{F} = \nabla\varphi$.
Hint. Let $x \in \Omega$ be fixed and define $\varphi(z)$ to be the line integral $\varphi(z) := \int_{[x,z]} \vec{F} \cdot d\vec{r}$, where the integral is over the line segment from $x$ to $z$.
(b) A connected open subset $\Omega \subseteq \mathbb{R}^3$ is called simply connected iff every closed curve that is parametrized by a continuous function $\vec{r} : [a,b] \to \mathbb{R}^3$, for which there is a $c \in [a,b]$ so that $\vec{r}|_{[a,c]}$ and $\vec{r}|_{[c,b]}$ are continuously differentiable, is the boundary of a compact embedded two dimensional manifold with corners that is contained in $\Omega$. Explain why the result from part 19-48a also holds for simply connected sets $\Omega$.

19-49. Green's Theorem. Let $D$ be a two dimensional embedded connected compact oriented manifold with boundary or corners, let $C = \partial D$ be the boundary curve with positive orientation. Let $D$ be the region in the plane bounded by $C$.
Prove that if $P$ and $Q$ have continuous partial derivatives on an open set that contains $D$, then
$$\oint_C \begin{pmatrix} P \\ Q \end{pmatrix} \cdot d\vec{r} = \iint_D \Big( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \Big)\, dA .$$

Chapter 20 Hilbert Spaces

In addition to the topological structure of a metric space and the linear structure of a normed space, in an inner product space we can measure angles and in particular we can define orthogonality. This additional structure allows us to derive results that are not easily accessible otherwise. The properties of orthonormal bases investigated in Section 20.1 will allow us to establish the $L^2$-convergence of Fourier series in Section 20.2 and we conclude in Section 20.3 with Riesz' Representation Theorem for linear functionals on Hilbert spaces.

As noted in Section 15.9, the inner products of real and complex inner product spaces have slightly different properties. To avoid stating all results for real and for complex spaces, in Sections 20.1 and 20.3 we will assume that our inner product spaces are complex. The proofs will also work for real inner product spaces, because for a real number, the real part and the complex conjugate are equal to the number itself.

20.1 Orthonormal Bases

Because an inner product allows us to define orthogonality, we are interested in representing the elements of an inner product space as a sum of orthonormal vectors, similar to the base representation of vectors in $d$-dimensional space. This section presents the general results and Section 20.2 shows the consequences for the representation of functions with trigonometric polynomials. We first need to make sure that in a representation with an orthonormal system there are not too many nonzero coefficients.

Proposition 20.1 Bessel's inequality. Let $S$ be an orthonormal system in the inner product space $H$ and let $x \in H$. Then $\{ s \in S : \langle x, s \rangle \ne 0 \}$ is countable and
$$\sum_{s \in S} |\langle x, s \rangle|^2 \le \|x\|^2 .$$

Proof. First let $C \subseteq S$ be finite. Then
$$0 \le \Big\| x - \sum_{c \in C} \langle x, c \rangle c \Big\|^2 = \|x\|^2 - 2\Re\Big( \Big\langle x, \sum_{c \in C} \langle x, c \rangle c \Big\rangle \Big) + \Big\| \sum_{c \in C} \langle x, c \rangle c \Big\|^2 = \|x\|^2 - \sum_{c \in C} |\langle x, c \rangle|^2 ,$$
which means $\sum_{c \in C} |\langle x, c \rangle|^2 \le \|x\|^2$. Now
suppose for a contradiction that the set $\{ s \in S : \langle x, s \rangle \ne 0 \}$ is not countable. Then there are an $\varepsilon > 0$ and a set $B \subseteq S$ so that $B$ is at least countably infinite and for all $b \in B$ the inequality $|\langle x, b \rangle| > \varepsilon$ holds. Let $N \in \mathbb{N}$ be greater than $\frac{\|x\|^2}{\varepsilon^2}$ and let $B_N \subseteq B$ be an $N$-element subset of $B$. Then
$$\sum_{b \in B_N} |\langle x, b \rangle|^2 > N\varepsilon^2 > \frac{\|x\|^2}{\varepsilon^2}\,\varepsilon^2 = \|x\|^2 ,$$
a contradiction. Therefore the set $\{ s \in S : \langle x, s \rangle \ne 0 \}$ must be countable. The inequality follows from the inequality for finite subsets of $S$ proved above.

Bessel's inequality shows that the sum $\sum_{s \in S} \langle x, s \rangle s$, which, under the right circumstances, should represent the element $x$, must converge in a Hilbert space (see Exercise 20-1). Thus we can define orthonormal bases.

Definition 20.2 An orthonormal system $S$ in an inner product space $H$ is called an orthonormal base iff for all $x \in H$ the series $\sum_{s \in S} \langle x, s \rangle s$ converges to $x$. The numbers $\langle x, s \rangle$ are also called the Fourier coefficients of $x$ with respect to $S$.

The term "Fourier coefficients" is usually associated with the expansion of functions in terms of trigonometric functions. Section 20.2 will show that the results here generalize the original Fourier expansions. Theorems 20.3 and 20.4 give several criteria for an orthonormal system to be an orthonormal base. Note that we will freely use the continuity of the inner product in both factors, which is guaranteed by the Cauchy-Schwarz inequality.

Theorem 20.3 Parseval's identity. An orthonormal system $S$ in an inner product space $H$ is an orthonormal base iff for all $x \in H$ we have $\|x\|^2 = \sum_{s \in S} |\langle x, s \rangle|^2$.

Proof. For "$\Rightarrow$," note that if $S$ is an orthonormal base, then, by the continuity of the inner product,
$$\|x\|^2 = \langle x, x \rangle = \Big\langle \sum_{s \in S} \langle x, s \rangle s, x \Big\rangle = \sum_{s \in S} \langle x, s \rangle \langle s, x \rangle = \sum_{s \in S} |\langle x, s \rangle|^2 .$$
For "$\Leftarrow$," let $\|x\|^2 = \sum_{s \in S} |\langle x, s \rangle|^2$ for all $x \in H$. By Proposition 20.1 the set $S_x := \{ s \in S : \langle x, s \rangle \ne 0 \}$ is countable. Let $(s_j)_{j=1}^{\infty}$ be an enumeration of $S_x$, let $\varepsilon > 0$ and let $N \in \mathbb{N}$ be so that for all $n \ge N$ the inequality $\sum_{j=n+1}^{\infty} |\langle x, s_j \rangle|^2 < \varepsilon^2$ holds. Then for all $n \ge$
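$N$ we obtain the following:
$$\Big\| x - \sum_{j=1}^{n} \langle x, s_j \rangle s_j \Big\|^2 = \|x\|^2 - \sum_{j=1}^{n} |\langle x, s_j \rangle|^2 = \sum_{j=n+1}^{\infty} |\langle x, s_j \rangle|^2 < \varepsilon^2 .$$
Hence, $\big\| x - \sum_{j=1}^{n} \langle x, s_j \rangle s_j \big\| < \varepsilon$ for all $n \ge N$ and the series converges to $x$.

Theorem 20.4 Let $S$ be an orthonormal system in the Hilbert space $H$. Then the following are equivalent.
1. $S$ is an orthonormal base.
2. $S$ is maximal.
3. $\operatorname{span}(S)$ is dense in $H$.

Proof. "1 $\Rightarrow$ 3" follows directly from the definition of orthonormal bases.

For "3 $\Rightarrow$ 2," let $\operatorname{span}(S)$ be dense in $H$. For a contradiction, suppose $S$ is not maximal. Then there is a $b \in H$ so that $\|b\| = 1$ and $\langle b, s \rangle = 0$ for all $s \in S$. But then $\langle b, c \rangle = 0$ for all $c \in \operatorname{span}(S)$, and for all $c \in \operatorname{span}(S)$ we conclude that
$$\|b - c\|^2 = \|b\|^2 - 2\Re\big(\langle b, c \rangle\big) + \|c\|^2 = \|b\|^2 + \|c\|^2 \ge 1 .$$
This is a contradiction to $\operatorname{span}(S)$ being dense in $H$.

For "2 $\Rightarrow$ 1," let $S$ be a maximal orthonormal system in $H$. Suppose for a contradiction that there is an $x \in H$ so that $x \ne \sum_{s \in S} \langle x, s \rangle s$. Let $b := x - \sum_{s \in S} \langle x, s \rangle s \ne 0$. Then for all $t \in S$ we infer
$$\langle b, t \rangle = \langle x, t \rangle - \sum_{s \in S} \langle x, s \rangle \langle s, t \rangle = \langle x, t \rangle - \langle x, t \rangle = 0 ,$$
which contradicts the maximality of $S$, because $S \cup \big\{ \tfrac{b}{\|b\|} \big\}$ would be a strictly larger orthonormal system.

A metric space is called separable iff it has a countable dense subset. Separable Hilbert spaces are important in quantum mechanics. The next result shows that, similar to Proposition 15.25 and Theorem 16.76 for finite dimensional spaces, all infinite dimensional separable Hilbert spaces are "the same."

Theorem 20.5 Every infinite dimensional separable Hilbert space $H$ is isomorphic to the space $l^2$.

Proof. We will first prove that $H$ has a countable orthonormal base. To do this let $C = \{ c_n : n \in \mathbb{N} \} \subseteq H$ be a countable dense subset with $c_1 \ne 0$. Construct the subset $B \subseteq C$ recursively from $C$ as follows. Let $c_1 \in B$. For all integers $n \ge 2$, let $c_n \in B$ iff $c_n \notin \operatorname{span}(\{ c_1, \ldots, c_{n-1} \})$. Then for all $n \in \mathbb{N}$ the set $\{ c_k : k \le n, c_k \in B \}$ is linearly independent and $\operatorname{span}(\{ c_k : k \le n, c_k \in B \}) = \operatorname{span}(\{ c_1, \ldots, c_n \})$. Hence, $B$ is linearly independent and $\operatorname{span}(B) = \operatorname{span}(C)$. The set $B$ must be infinite, because otherwise, $H$ has a finite dimensional dense subspace and is thus itself finite dimensional (Exercise 20-2).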
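As an aside to the proof, the recursive selection of $B$ from $C$ described above can be tried out in a finite dimensional setting. The following is only a sketch under stated assumptions: the function name `select_independent`, the sample list `C`, and the use of exact rational arithmetic in $\mathbb{Q}^3$ are illustrative choices and not part of the text. Unlike the proof, the sketch does not require the first vector to be nonzero; a zero vector is simply never selected.

```python
from fractions import Fraction

def select_independent(vectors):
    """Return the 0-based indices k with vectors[k] not in span(vectors[:k]).

    Hypothetical helper mirroring the rule "c_n is in B iff c_n is not in
    span({c_1, ..., c_{n-1}})" for finitely many vectors in Q^d.
    """
    basis = []   # pairs (pivot position, reduced vector), kept in insertion order
    chosen = []
    for k, vec in enumerate(vectors):
        v = [Fraction(entry) for entry in vec]
        # Eliminate the pivot entries of all previously selected vectors.
        for pivot, b in basis:
            if v[pivot] != 0:
                factor = v[pivot] / b[pivot]
                v = [vi - factor * bi for vi, bi in zip(v, b)]
        nonzero = [i for i, vi in enumerate(v) if vi != 0]
        if nonzero:
            # A nonzero remainder means vec is not in the span of its predecessors.
            basis.append((nonzero[0], v))
            chosen.append(k)
    return chosen

# Sample data (an assumption for illustration only).
C = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 0), (1, 1, 1)]
B_indices = select_independent(C)
print(B_indices)  # prints [0, 2, 5]
```

The selected vectors are linearly independent and have the same span as the whole list, which is the finite dimensional analogue of $\operatorname{span}(B) = \operatorname{span}(C)$.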
Now let $\{ b_n : n \in \mathbb{N} \}$ be an enumeration of $B$ and apply the Gram-Schmidt Orthonormalization Procedure indefinitely to obtain the orthonormal system $S = \{ s_n : n \in \mathbb{N} \}$ with $\operatorname{span}(S) = \operatorname{span}(B) = \operatorname{span}(C)$. Then $S$ is an orthonormal system whose span is dense in $H$, which means $S$ is an orthonormal base of $H$.

Because $S$ is a countable orthonormal base, every element $x \in H$ has a unique representation $x = \sum_{j=1}^{\infty} \langle x, s_j \rangle s_j$. The map $I : H \to l^2$ with $I(x) := \sum_{j=1}^{\infty} \langle x, s_j \rangle e_j$ is the desired isomorphism (details left to Exercise 20-4a).

Exercises

20-1. Prove that if $H$ is a Hilbert space, $S \subseteq H$ is an orthonormal system and $x \in H$, then $\sum_{s \in S} \langle x, s \rangle s$ converges.

20-2. Prove that if the Hilbert space $H$ has a finite dimensional dense subspace $F$, then $H$ is finite dimensional.
Hint. Find an orthonormal base for $F$.

20-3. A characterization of equality. Let $H$ be an inner product space, let $D \subseteq H$ be dense and let $u, f \in H$. Prove that $u = f$ iff for all $x \in D$ we have $\langle u, x \rangle = \langle f, x \rangle$.

20-4. Finishing the proof of Theorem 20.5.
(a) Prove that the function $I$ defined at the end of the proof of Theorem 20.5 is well-defined, linear, bijective, and continuous and that its inverse is continuous, too.
(b) Explain why we need that $H$ is a Hilbert space in Theorem 20.5.

20-5. Let $H$ be a Hilbert space and let $S$ be an orthonormal base of $H$.
(a) Prove that for all $x, y \in H$ we have $x = y$ iff $\langle x, s \rangle = \langle y, s \rangle$ for all $s \in S$.
(b) Let $Y$ be a Banach space. Prove that if $L, M : H \to Y$ are continuous linear functions and $L(s) = M(s)$ for all $s \in S$, then $L = M$.

20-6. Prove that every orthonormal system in a separable inner product space $H$ is at most countable.
Hint.
The distance between any two distinct elements in an orthonormal system is $\sqrt{2}$. Use that if $C$ is dense, then for each element $u$ in an uncountable orthonormal system, there must be an element $c_u \in C$ that is closer than $\frac{\sqrt{2}}{2}$ to $u$.

20.2 Fourier Series

The representation of elements of inner product spaces is motivated by the corresponding representation in $\mathbb{R}^d$ as well as by the representation of functions via trigonometric polynomials. This representation is important, because it (and similar representations) arises naturally when solving partial differential equations (see Section 21.3). We will now use the tools we have introduced for inner product spaces to investigate the convergence of Fourier series of functions on $[-\pi, \pi)$. In this section, we work with the real normed spaces $L^p[-\pi, \pi)$ =