Mathematical Analysis
A Concise Introduction
Bernd S. W. Schröder
Louisiana Tech University
Program of Mathematics and Statistics
Ruston, LA
WILEY-INTERSCIENCE
A John Wiley & Sons, Inc., Publication
The Wiley Bicentennial: Knowledge for Generations
Each generation has its unique needs and aspirations. When Charles Wiley first
opened his small printing shop in lower Manhattan in 1807, it was a generation
of boundless potential searching for an identity. And we were there, helping to
define a new American literary tradition. Over half a century later, in the midst
of the Second Industrial Revolution, it was a generation focused on building the
future. Once again, we were there, supplying the critical scientific, technical, and
engineering knowledge that helped frame the world. Throughout the 20th
Century, and into the new millennium, nations began to reach out beyond their
own borders and a new international community was born. Wiley was there,
expanding its operations around the world to enable a global exchange of ideas,
opinions, and know-how.
For 200 years, Wiley has been an integral part of each generation's journey,
enabling the flow of information and understanding necessary to meet their needs
and fulfill their aspirations. Today, bold new technologies are changing the way
we live and learn. Wiley will be there, providing you the must-have knowledge
you need to imagine new worlds, new possibilities, and new opportunities.
Generations come and go, but you can always count on Wiley to provide you the
knowledge you need, when and where you need it!
WILLIAM J. PESCE
PRESIDENT AND CHIEF EXECUTIVE OFFICER
PETER BOOTH WILEY
CHAIRMAN OF THE BOARD
Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax
(978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should
be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be suitable
for your situation. You should consult with a professional where appropriate. Neither the publisher nor
author shall be liable for any loss of profit or any other commercial damages, including but not limited
to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our
Customer Care Department within the United States at (800) 762-2974, outside the United States at
(317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
not be available in electronic format. For information about Wiley products, visit our web site at
www.wiley.com.
Wiley Bicentennial Logo: Richard J. Pacifico
Library of Congress Cataloging-in-Publication Data:
Schröder, Bernd S. W. (Bernd Siegfried Walter), 1966-
Mathematical analysis : a concise introduction / Bernd S. W. Schröder.
p. cm.
ISBN 978-0-470-10796-6 (cloth)
1. Mathematical analysis. I. Title.
QA300.S376 2007
515-dc22
2007024690
Printed in the United States of America.
Contents

Preface

Part I: Analysis of Functions of a Single Real Variable

1 The Real Numbers
    1.1 Field Axioms
    1.2 Order Axioms
    1.3 Lowest Upper and Greatest Lower Bounds
    1.4 Natural Numbers, Integers, and Rational Numbers
    1.5 Recursion, Induction, Summations, and Products

2 Sequences of Real Numbers
    2.1 Limits
    2.2 Limit Laws
    2.3 Cauchy Sequences
    2.4 Bounded Sequences
    2.5 Infinite Limits

3 Continuous Functions
    3.1 Limits of Functions
    3.2 Limit Laws
    3.3 One-sided Limits and Infinite Limits
    3.4 Continuity
    3.5 Properties of Continuous Functions
    3.6 Limits at Infinity

4 Differentiable Functions
    4.1 Differentiability
    4.2 Differentiation Rules
    4.3 Rolle’s Theorem and the Mean Value Theorem

5 The Riemann Integral I
    5.1 Riemann Sums and the Integral
    5.2 Uniform Continuity and Integrability of Continuous Functions
    5.3 The Fundamental Theorem of Calculus
    5.4 The Darboux Integral

6 Series of Real Numbers I
    6.1 Series as a Vehicle To Define Infinite Sums
    6.2 Absolute Convergence and Unconditional Convergence

7 Some Set Theory
    7.1 The Algebra of Sets
    7.2 Countable Sets
    7.3 Uncountable Sets

8 The Riemann Integral II
    8.1 Outer Lebesgue Measure
    8.2 Lebesgue’s Criterion for Riemann Integrability
    8.3 More Integral Theorems
    8.4 Improper Riemann Integrals

9 The Lebesgue Integral
    9.1 Lebesgue Measurable Sets
    9.2 Lebesgue Measurable Functions
    9.3 Lebesgue Integration
    9.4 Lebesgue Integrals versus Riemann Integrals

10 Series of Real Numbers II
    10.1 Limits Superior and Inferior
    10.2 The Root Test and the Ratio Test
    10.3 Power Series

11 Sequences of Functions
    11.1 Notions of Convergence
    11.2 Uniform Convergence

12 Transcendental Functions
    12.1 The Exponential Function
    12.2 Sine and Cosine
    12.3 L’Hôpital’s Rule

13 Numerical Methods
    13.1 Approximation with Taylor Polynomials
    13.2 Newton’s Method
    13.3 Numerical Integration

Part II: Analysis in Abstract Spaces

14 Integration on Measure Spaces
    14.1 Measure Spaces
    14.2 Outer Measures
    14.3 Measurable Functions
    14.4 Integration of Measurable Functions
    14.5 Monotone and Dominated Convergence
    14.6 Convergence in Mean, in Measure, and Almost Everywhere
    14.7 Product $\sigma$-Algebras
    14.8 Product Measures and Fubini’s Theorem

15 The Abstract Venues for Analysis
    15.1 Abstraction I: Vector Spaces
    15.2 Representation of Elements: Bases and Dimension
    15.3 Identification of Spaces: Isomorphism
    15.4 Abstraction II: Inner Product Spaces
    15.5 Nicer Representations: Orthonormal Sets
    15.6 Abstraction III: Normed Spaces
    15.7 Abstraction IV: Metric Spaces
    15.8 $L^p$ Spaces
    15.9 Another Number Field: Complex Numbers

16 The Topology of Metric Spaces
    16.1 Convergence of Sequences
    16.2 Completeness
    16.3 Continuous Functions
    16.4 Open and Closed Sets
    16.5 Compactness
    16.6 The Normed Topology of $\mathbb{R}^d$
    16.7 Dense Subspaces
    16.8 Connectedness
    16.9 Locally Compact Spaces

17 Differentiation in Normed Spaces
    17.1 Continuous Linear Functions
    17.2 Matrix Representation of Linear Functions
    17.3 Differentiability
    17.4 The Mean Value Theorem
    17.5 How Partial Derivatives Fit In
    17.6 Multilinear Functions (Tensors)
    17.7 Higher Derivatives
    17.8 The Implicit Function Theorem

18 Measure, Topology, and Differentiation
    18.1 Lebesgue Measurable Sets in $\mathbb{R}^d$
    18.2 $C^\infty$ and Approximation of Integrable Functions
    18.3 Tensor Algebra and Determinants
    18.4 Multidimensional Substitution

19 Introduction to Differential Geometry
    19.1 Manifolds
    19.2 Tangent Spaces and Differentiable Functions
    19.3 Differential Forms, Integrals Over the Unit Cube
    19.4 k-Forms and Integrals Over k-Chains
    19.5 Integration on Manifolds
    19.6 Stokes’ Theorem

20 Hilbert Spaces
    20.1 Orthonormal Bases
    20.2 Fourier Series
    20.3 The Riesz Representation Theorem

Part III: Applied Analysis

21 Physics Background
    21.1 Harmonic Oscillators
    21.2 Heat and Diffusion
    21.3 Separation of Variables, Fourier Series, and Ordinary Differential Equations
    21.4 Maxwell’s Equations
    21.5 The Navier Stokes Equation for the Conservation of Mass

22 Ordinary Differential Equations
    22.1 Banach Space Valued Differential Equations
    22.2 An Existence and Uniqueness Theorem
    22.3 Linear Differential Equations

23 The Finite Element Method
    23.1 Ritz-Galerkin Approximation
    23.2 Weakly Differentiable Functions
    23.3 Sobolev Spaces
    23.4 Elliptic Differential Operators
    23.5 Finite Elements

Conclusion and Outlook

Appendices

A Logic
    A.1 Statements
    A.2 Negations

B Set Theory
    B.1 The Zermelo-Fraenkel Axioms
    B.2 Relations and Functions

C Natural Numbers, Integers, and Rational Numbers
    C.1 The Natural Numbers
    C.2 The Integers
    C.3 The Rational Numbers

Bibliography

Index
[Figure 1: chart not reproduced. It groups the chapters into Part I: Analysis on R, Part II: Abstract Analysis, and Part III: Applications, with background given briefly in the appendices.]
Figure 1: Content dependency chart with minimum prerequisites indicated by arrows.
Some remarks, examples, and exercises in the later chapters might still depend on other
earlier chapters, but this problem typically can be resolved by quoting a single result.
Details about where and how the reader can "branch out" are given in the text.
Preface
This text is a self-contained introduction to the fundamentals of analysis. The only
prerequisite is some experience with mathematical language and proofs. That is, it
helps to be familiar with the structure of mathematical statements and with proof methods, such as direct proofs, proofs by contradiction, or induction. With some support
in the right places, mostly in the early chapters, this text can also be used without
prerequisites in a first proof class.
Mastering proofs in analysis is one of the key steps toward becoming a mathematician. To develop sound proof writing techniques, standard proof techniques are
discussed early in the text and for a while they are pointed out explicitly. Throughout,
proofs are presented with as much detail and as little hand waving as possible. This
makes some proofs (for example, the density of $C[a,b]$ in $L^p[a,b]$ in Part II) notationally a bit complicated. With computers now being a regular tool in mathematics, the
author considers this appropriate. When code is written for a problem, all details must
be implemented, even those that are omitted in proofs. Seeing a few highly detailed
proofs is reasonable preparation for such tasks. Moreover, to facilitate the transition
to more abstract settings, such as measure, inner product, normed, and metric spaces,
the results for single variable functions are proved using methods that translate to these
abstract settings. For example, early proofs rely extensively on sequences and we also
use the completeness of the real numbers rather than their order properties.
Analysis is important for applications, because it provides the abstract background
that allows us to apply the full power of mathematics to scientific problems. This text
shows that all abstractions are well motivated by the desire to build a strong theory
that connects to specific applications. Readers who complete this text will be ready
for all analysis-based and analysis-related subjects in mathematics, including complex
analysis, differential equations, differential geometry, functional analysis, harmonic
analysis, mathematical physics, measure theory, numerical analysis, partial differential
equations, probability theory, and topology. Readers interested in motivation from
physics are advised to browse Chapter 21, even if they have not read any of the earlier
chapters.
Aside from the topics covered, readers interested in applications should note that
the axiomatic approach of mathematics is similar to problem solving in other fields.
In mathematics, theories are built on axioms. Similarly, in applications, models are
subject to constraints. Neither the axioms, nor the constraints can be violated by the
theory or model. Building a theory based on axioms fosters the reader's discipline to
not make unwarranted assumptions.
Organization of the content. The text consists of three large parts. Part I, comprised of Chapters 1-13, presents the analysis of functions of one real variable, including a motivated introduction to the Lebesgue integral. Chapters 1-6 and 10-13 could
be called “single variable calculus with proofs.” For a smooth transition from calculus
and a gradual increase in abstraction, Chapters 1-6 require very little set theory. Chapter 1 presents the properties of the real line and limits of sequences are introduced in
Chapter 2. Chapters 3-5 present the fundamentals on continuity, differentiation, and
(Riemann) integration in this order and Chapter 6 gives a first introduction to series.
Chapters 6-8 are motivated by the desire to further explore the Riemann integral
while avoiding the excessive use of Riemann sums. This exploration is done with
the Lebesgue criterion for Riemann integrability. Although this criterion requires the
Lebesgue measure, the payoff is that many proofs become simpler. To quickly reach
this criterion, the first presentation of series in Chapter 6 is deliberately kept short.
It presents enough about series to allow the definition of Lebesgue measure. Chapter 7 presents fundamental notions of set theory. Most of these ideas are needed for
Lebesgue measure, but, overall, Chapter 7 contains all the set theory needed in the remainder of the text. Chapter 8 finishes the presentation of the Riemann integral. With
Lebesgue measure available, it is natural to investigate the Lebesgue integral in Chapter 9. This chapter could also be delayed to the end of Part I, but the author believes
that early exposure to the crucial ideas will ease the later transition to measure spaces.
The analysis of single variable functions is finished with the rigorous introduction of the transcendental functions. The necessary background on power series is
explored in Chapter 10. Chapter 11 presents some fundamentals on the convergence of
sequences of functions and Chapter 12 is devoted to the transcendental functions themselves. Chapter 13 discusses general numerical methods, but transcendental functions
provide a rich test bed for the methods presented.
Part I of the text can be read or presented in many orders. Figure 1 shows the
prerequisite structure of the text. Prerequisites for each chapter have deliberately been
kept minimal. In this fashion, the order of topics in the reader’s first contact with
proofs in analysis can be adapted to many readers’ preferences. Most notably, the
intentionally early presentation of Lebesgue integration can be postponed to the end
of Part I if so desired. Throughout, the author intends to keep the reader engaged by
providing motivation for all abstractions. Consequently, as Figure 1 and the table of
contents indicate, some concepts and results are presented in a “just-in-time” fashion
rather than in what may be considered their traditional place. If a concept is needed
in an exercise before the concept is “officially” defined in the text, the concept will be
defined in the exercise and in the text.
Part II, comprised of Chapters 14-20, explores how the appropriate abstractions
lead to a powerful and widely applicable theoretical foundation for all branches of applied mathematics. The desire to define an integral in d-dimensional space provides a
natural motivation to introduce measure spaces in Chapter 14. This chapter facilitates
the transition to more abstract mathematics by frequently referring back to corresponding results for the one dimensional Lebesgue integral. The proofs of these results
usually are verbatim the same as in the one-dimensional setting. Moreover, this early
introduction makes $L^p$ spaces available as examples for the rest of the text. The abstract venues of analysis are then presented in Chapter 15, which provides all examples
for the rest of Part II.
The fundamentals on metric spaces and continuity are presented in Chapter 16. As
with measure spaces, for several results on metric spaces the reader is referred back to
the corresponding proof for single variable functions. Proofs are no longer verbatim the
same and abstraction is facilitated by translating proofs from a familiar setting to the
new setting while analyzing similarities and differences. In a class, the author suggests
that the teacher fill in some of these proofs to demonstrate the process.
Chapter 17 presents the fundamentals on normed spaces and differentiation. Again,
ideas are similar to those for functions of a single variable, but this time the abstraction
goes beyond translation. With all three fundamental concepts (integration, continuity,
and differentiation) available in the abstract setting, Chapter 18 shows the interrelationship between concepts presented separately before, culminating in the Multivariable
Substitution Formula.
The second part is completed by a presentation of the fundamentals of analysis on
manifolds, together with a physical interpretation of key concepts in Chapter 19 and by
an introduction to Hilbert spaces in Chapter 20.
The remaining chapters give a brief outlook to applied subjects in which analysis
is used, specifically, physics in Chapter 21, ordinary differential equations in Chapter
22, and partial differential equations and the finite element method in Chapter 23. Each
of these chapters can only give a taste of its subject and I encourage the reader to go
deeper into the utterly fascinating applications that lie behind Part III. The mathematical preparation through this text should facilitate the transition.
It should be possible to cover the bulk of the text in a two course sequence. Although Chapters 14-16 should be read in order, depending on the available time, the
pace and the choice of topics, any of Chapters 17-23 can serve as a capstone experience.
How to read this text. Mathematics in general, and analysis in particular, is not
a spectator sport. It is learned by doing. To allow the reader to “do” mathematics,
each section has exercises of varying degrees of difficulty. Some exercises require the
adaptation of an argument in the text. These exercises are also intended to make the
reader critically analyze the argument before adapting it. This is the first step toward
being able to write proofs. Of course the need for very critical (and slow) reading of
mathematics is nicely summed up in the old quote that “To read without a pencil is
daydreaming.” The reader should ask him/herself after every sentence “What does this
mean? Why is this justified?” Making notes in the margin to explain the harder steps
will allow the reader to answer these questions more easily in the second and third
readings of a proof. So it is important to read thoroughly and slowly, to make notes
and to reread as often as needed. The extensive index should help with unknown or
forgotten terminology as necessary. Other exercises have hints on how to create a proof
that the reader has not seen before. These exercises require the use of proof techniques
in a new setting. Finally, there are also exercises without hints. Being able to create
the proof with nothing but the result given is the deepest task in a mathematics course.
This is not to say that exercises without hints are always the hardest and adaptations
are always the easiest, but in many cases this is true. Finally, some exercises give a
sequence of hints and intermediate results leading up to a famous theorem or a specific
example. These exercises could also be used as mini-projects. In a class, some of them
could be the basis for separate lectures that spotlight a particular theorem or example.
To get the most out of this text, the reader is encouraged to not look for hints and
solutions in other background materials. In fact, even for proofs that are adaptations
of proofs in this text, it is advantageous to try to create the proof without looking up
the proof that is to be adapted. There is evidence that the struggle to solve a problem,
which can take days for a single proof, is exactly what ultimately contributes to the
development of strong skills. “Shortcuts,” while pleasant, can actually diminish this
development. Readers interested in quantitative evidence that shows how the struggle
to acquire a skill actually can lead to deeper learning may find the article [4] quite
enlightening. A better survival mechanism than shortcuts is the development of connections between newly learned content and existing knowledge. The reader will need
to find these connections to his/her existing knowledge, but the structure of the text is
intended to help by motivating all abstractions. Readers interested in how knowledge
is activated more easily when it was learned in a known context may be interested in
the article [5].
Acknowledgments. Strange as it may sound, I started writing this text in the spring
of 1987, as I prepared for my oral final examination in the traditional Analysis I-III sequence in Germany. Basically, I took all topics in the sequence and arranged
them in what was the most logical fashion to me at the time. Of course, these notes
are, in retrospect, immature. But they did a lot to shape my abilities and they were
a good source of ideas and exercises. In this respect, I am indebted to my teachers
for this sequence: Professor Wegener and teaching assistant Ms. Lange for Analysis
I, Professor Kutzler and teaching assistant Herr Böttger for Analysis II-III as well as
Professor Herz in whose Differential Equations class I first saw analysis “at work.”
With all due respect to the other individuals, to me and many of my fellow students,
the force that drove us in analysis (and beyond) was Herr Böttger. This gentleman
was uncompromising in his pursuit of mathematical excellence and we feared as well
as looked forward to his demanding exercise sets. He was highly respected because
he was ready to spend hours with anyone who wanted to talk mathematics. Those
who kept up with him were extremely well prepared for their mathematical careers.
Incidentally, Dr. Ansgar Jüngel, whose notes I used for the chapter on the finite element
method, took the above mentioned classes with me. The thorough preparation through
these classes is the main reason why most of this text was comparatively easy to write.
If this text does half as good a job as Herr Böttger did with us, it has more than achieved
its purpose.
It was thrilling to test my limitations, it was humbling to find them and ultimately
I was left awed once more by the beauty of mathematics. When my abilities were insufficient to proceed, I used the texts listed in the bibliography for proofs, hints or to
structure the presentation. To make the reader fully concentrate on matters at hand, and
to force myself to make the exposition self-contained, outside references are limited to
places where results were beyond the scope of this exposition. A solid foundation will
allow readers to judiciously pick their own resources for further study. Nonetheless, it
is appropriate to recognize the influence of the works of a number of outstanding individuals. I used Adams [2], Renardy and Rogers [23], Yosida [33] and Zeidler [34] for
Sobolev spaces, Aris [3], Cramer’s http://www.navier-stokes.net/, and
Welty, Wicks and Wilson [31] for fluid dynamics, Chapman [6] for heat transfer, Cohn
[7] for measure theory, Dieudonné [8] for differentiation in Banach spaces, Dodge [9]
and Halmos [13] for set theory, Ferguson [10], Sandefur [24] and Stoer and Bulirsch
[28] for numerical analysis, Halliday, Resnick and Walker [12] for elementary physics,
Hewitt and Stromberg [14], Heuser [15], [16], Johnsonbaugh and Pfaffenberger [20],
Lehn [22] and Stromberg [29] for general background on analysis, Heuser [17] for
functional analysis, Hurd and Loeb [18] for the use of quantifiers in logic, Jüngel [21]
and Solin [25] for the finite element method, Spivak [26], [27] for manifolds, Torchinsky [30] for Fourier series, Willard [32] for topology, and the Online Encyclopaedia of
Mathematics http://eom.springer.de/ for quick checks of notation and definitions. Readers interested in further study of these subjects may wish to start with the
above references.
The first draft of the manuscript was used in my analysis classes in the Winter and
Spring quarters of 2007. The first class covered Chapters 1-9, the second covered
Chapters 11 and 14-18 (with some strategic “fast forwards”). This setup assured that
graduating students would have full exposure to the essentials of analysis on the real
line and to as much abstract analysis as possible without “handwaving arguments.” I
am grateful to the students in these classes for keeping up with the pace, solving large
numbers of homework problems, being patient with the typos we found and also for
suggesting at least one order in which to present the material that I had not considered.
The students’ evaluations (my best ever) also reaffirmed for me that people will enjoy,
or at least accept and honor, a challenge, and that an ambitious, motivated course should
be the way to go. Devery Rowland once more did an excellent job printing drafts of
the text for the classes.
Aside from the referees, several colleagues also commented on this text and I owe
them my thanks for making it a better product. In particular, I would like to thank Natalia Zotov for some comments on an early version that significantly improved the presentation, and Ansgar Jüngel for pointing out some key references on Sobolev spaces.
Although I hope that we have found all errors and typos, any that remain are
my responsibility and mine alone. I request readers to report errors and typos to me
so I can post errata. My contacts at Wiley, Susanne Steitz, Jacqueline Palmieri, and
Melissa Yanuzzi bore with me when the stress level rose and their patience made the
publishing process very smooth.
As always, this work would not have been possible without the love of my family.
It is truly wonderful to be supported by individuals who accept your decision to spend
large amounts of time reliving your formative years.
Finally, I was sad to learn that Herr Bottger died unexpectedly a few years after I
had my last class with him. Sir, this one’s for you.
Ruston, LA, August 30, 2007
Bernd Schroder
Part I
Analysis of Functions
of a Single Real Variable
Chapter 1
The Real Numbers
This investigation of analysis starts with minimal prerequisites. Regarding set theory,
the terms “set” and “element” will remain undefined, as is customary in mathematics
to avoid paradoxes. The empty set $\emptyset$ is the set that has no elements. The statement
“$e \in S$” says that $e$ is an element of the set $S$. The statement “$A \subseteq B$” says that every
element of $A$ is an element of $B$. Sets $A$ and $B$ are equal if and only if $A \subseteq B$ and
$B \subseteq A$. The statement “$A \subset B$” says that $A \subseteq B$ and $A \neq B$. Subsets will be defined
as “$A = \{x \in S : \text{(property)}\}$,” that is, with a statement from which set $S$ the elements
of $A$ are taken and a property describing them. The union of two sets $A$ and $B$ is
$A \cup B = \{x : x \in A \text{ or } x \in B\}$, the intersection is $A \cap B = \{x : x \in A \text{ and } x \in B\}$.
Union and intersection of finitely many sets are denoted $\bigcup_{j=1}^{n} A_j$ and $\bigcap_{j=1}^{n} A_j$, respectively,
and the relative complement of $B$ in $A$ is $A \setminus B = \{x \in A : x \notin B\}$. Further
details on set theory are purposely delayed until Section 7.1. Until then, we focus on
analytical techniques. Any required notions of set theory will be clarified on the spot.
To define properties, sometimes the universal quantifier “V” (read “for all”) or the
existential quantifier “3” (read “there exists”) are used. Formal logic is described in
more detail in Appendix A. Finally, the reader needs an intuitive idea what a function,
a relation and a binary operation are. Details are relegated to Appendices B.2 and C.2.
The real numbers R are the “staging ground” for analysis. They can be characterized as the unique (up to isomorphism) mathematical entity that satisfies Axioms
1.1, 1.6, and 1.19. That is, they are the unique linearly ordered, complete field (see
Exercise 1-30). In this chapter, we introduce the axioms for the real numbers and some
fundamental consequences. These results assure that the real numbers indeed have the
properties that we are familiar with from algebra and calculus.
1.1 Field Axioms
The description of the real numbers starts with their algebraic properties.
Axiom 1.1 The real numbers $\mathbb{R}$ are a field. That is, $\mathbb{R}$ has at least two elements
and there are two binary operations, addition $+ : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ and multiplication
$\cdot : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, so that
1. Addition is associative, that is, for all $x, y, z \in \mathbb{R}$ we have
$(x + y) + z = x + (y + z)$.
2. Addition is commutative, that is, for all $x, y \in \mathbb{R}$ we have
$x + y = y + x$.
3. There is a neutral element 0 for addition, that is, there is an element $0 \in \mathbb{R}$ so
that for all $x \in \mathbb{R}$ we have $x + 0 = x$.
4. For every element $x \in \mathbb{R}$ there is an additive inverse element $(-x)$ so that
$x + (-x) = 0$.
5. Multiplication is associative, that is, for all $x, y, z \in \mathbb{R}$ we have
$(x \cdot y) \cdot z = x \cdot (y \cdot z)$.
6. Multiplication is commutative, that is, for all $x, y \in \mathbb{R}$ we have
$x \cdot y = y \cdot x$.
7. There is a neutral element 1 for multiplication, that is, there is an element $1 \in \mathbb{R}$
so that for all $x \in \mathbb{R}$ we have $1 \cdot x = x$.
8. For every element $x \in \mathbb{R} \setminus \{0\}$ there is a multiplicative inverse element $x^{-1}$ so
that $x \cdot x^{-1} = 1$.
9. Multiplication is (left) distributive over addition, that is, for all $a, x, y \in \mathbb{R}$ we
have $a \cdot (x + y) = a \cdot x + a \cdot y$.
As is customary for multiplication, the dot between factors is usually omitted.
Fields are investigated in detail in abstract algebra. For analysis, it is most effective
to remember that the field axioms guarantee the properties needed so that we can perform algebra and arithmetic “as usual.” Some of these properties are exhibited in this
section and in the exercises. The exercises also include examples that show that not
every field needs to be infinite (see Exercises 1-7-1-9).
Theorem 1.2 The following are true in $\mathbb{R}$:
1. For all $x \in \mathbb{R}$, we have $0x = 0$.
2. $0 \neq 1$.
3. Additive inverses are unique. That is, if $x \in \mathbb{R}$ and $x'$ and $\bar{x}$ both have the
property in part 4 of Axiom 1.1, then $x' = \bar{x}$.
4. For all $x \in \mathbb{R}$, we have $(-1)x = -x$.
Proof. Early in the text, proofs will sometimes be interrupted by comments in italics
to point out standard formulations and proof techniques.
To prove part 1, let $x \in \mathbb{R}$. Then the axioms allow us to obtain the following
equation.
$0x \overset{\text{Ax.3}}{=} (0 + 0)x \overset{\text{Ax.6}}{=} x(0 + 0) \overset{\text{Ax.9}}{=} x0 + x0 \overset{\text{Ax.6}}{=} 0x + 0x$.
This implies
$0 = 0x + (-0x) = (0x + 0x) + (-0x) = 0x + (0x + (-0x)) = 0x + 0 = 0x$,
as was claimed. The proof of part 1 shows how every step in a proof needs to be
justified. Usually we will not explicitly justify each step in a computation with an axiom
or a previous result. However, the reader should always mentally fill in the justification.
The practice of filling in these justifications should be started in the computations in
the remainder of this proof.
To prove part 2, first note that, because $\mathbb{R}$ has at least two elements, there is an
$x \in \mathbb{R} \setminus \{0\}$. Now suppose for a contradiction (see Standard Proof Technique 1.4
below) that $0 = 1$. Then $x = 1 \cdot x = 0 \cdot x = 0$ is a contradiction to $x \in \mathbb{R} \setminus \{0\}$.
For part 3, note that if $x'$ and $\bar{x}$ both have the property in part 4 of Axiom 1.1, then
$x' = x' + 0 = x' + (x + \bar{x}) = (x' + x) + \bar{x} = (x + x') + \bar{x} = 0 + \bar{x} = \bar{x} + 0 = \bar{x}$. Note that
the statement of part 3 already encodes the typical approach to a uniqueness proof (see
Standard Proof Technique 1.5 below).
Finally, for part 4 note that $x + (-1)x = 1x + (-1)x = (1 + (-1))x = 0x = 0$.
Because by part 3 additive inverses are unique, $(-1)x$ must be the additive inverse
$-x$ of $x$. The last step is a typical application of modus ponens, see Standard Proof
Technique 1.3 below.
To familiarize the reader with standard proof techniques, these techniques will be
pointed out explicitly in the early part of the text. The techniques presented in Chapter 1
are general proof techniques applicable throughout mathematics. Techniques presented
in later chapters are mostly specific to analysis.
Standard Proof Technique 1.3 The simplest mathematical proof technique is a direct proof in which a result that says “ A implies B” is applied after we have proved
that A is true. Truth of A and of “ A implies B” guarantees truth of B . This technique
is also called modus ponens. An example is in the proof of part 4 of Theorem 1.2.
Standard Proof Technique 1.4 In a proof by contradiction, we suppose the contrary
(the negation, also see Appendix A.2) of what is claimed is true and then we derive
a contradiction. Typically, we derive a statement and its negation, which is a contradiction, because they cannot both be true. For an example, see the proof of part 2 of
Theorem 1.2 above. Given that the reasoning that led to the contradiction is correct, the
contradiction must be caused by the assumption that the contrary of the claim is true.
Hence, the contrary of the claim must be false, because true statements cannot imply
false statements like contradictions (see part 3 of Definition A.2 in Appendix A). But
this means the claim must be true.
We will usually indicate proofs by contradiction with a starting statement like “suppose for a contradiction.”
Standard Proof Technique 1.5 For many mathematical objects it is important to assure that they are the only object that has certain properties. That is, we want to assure
that the object is unique. In a typical uniqueness proof, we assume that there is more
than one object with the properties under investigation and we prove that any two of
these objects must be equal. Part 3 of Theorem 1.2 shows this approach.
Exercises
1-1. Prove that $(-1) \cdot (-1) = 1$.
1-2. $\cdot$ is right distributive over $+$. Prove that for all $x, y, z \in \mathbb{R}$ we have $(x + y)z = xz + yz$.
1-3. Multiplicative inverses are unique. Prove that if $x \in \mathbb{R}$ and $x'$ and $\bar{x}$ both have the property in part
8 of Axiom 1.1 then $x' = \bar{x}$.
1-4. Prove that 0 does not have a multiplicative inverse.
1-5. Prove that if $x, y \neq 0$, then $(xy)^{-1} = y^{-1}x^{-1}$.
Conclude in particular that $xy \neq 0$.
1-6. Prove each of the binomial formulas below. Justify each step with the appropriate axiom.
(a) $(a + b)^2 = a^2 + 2ab + b^2$
(b) $(a - b)^2 = a^2 - 2ab + b^2$
(c) $(a + b)(a - b) = a^2 - b^2$
1-7. Prove that the set $\{0, 1\}$ with the usual multiplication and the usual addition, except that $1 + 1 := 0$, is
a field. That is, prove that the set and addition and multiplication as stated have the properties listed
in Axiom 1.1.
1-8. Prove that the set $\{0, 1, 2\}$ with the sum and product of two elements being the remainder obtained
when dividing the regular sum and product by 3 is a field.
1-9. A property and some finite fields.
(a) Let $F$ be a field and let $x, y \in F$. Prove that $xy = 0$ if and only if $x = 0$ or $y = 0$.
(b) Prove that the set $\{0, 1, 2, 3\}$ with the sum and product of two elements being the remainder
obtained when dividing the regular sum and product by 4 is not a field.
(c) Prove that the set $\{0, 1, \ldots, p - 1\}$ with the sum and product of two elements being the
remainder obtained when dividing the regular sum and product by $p$ is a field if and only if $p$
is a prime number.
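The statements in Exercises 1-7 through 1-9 can be explored numerically before they are proved. The following Python sketch checks the nine parts of Axiom 1.1 by brute force for the set {0, ..., n-1} with addition and multiplication taken as remainders modulo n. The function names is_field_mod and is_prime are chosen for this illustration only, and a finite check of this kind is an experiment, not a substitute for the requested proofs.

```python
from itertools import product

def is_field_mod(n):
    """Brute-force check of Axiom 1.1 for {0, ..., n-1} with arithmetic mod n."""
    if n < 2:  # a field has at least two elements
        return False
    elems = range(n)
    add = lambda x, y: (x + y) % n
    mul = lambda x, y: (x * y) % n
    for x, y, z in product(elems, repeat=3):
        if add(add(x, y), z) != add(x, add(y, z)):          # part 1
            return False
        if mul(mul(x, y), z) != mul(x, mul(y, z)):          # part 5
            return False
        if mul(x, add(y, z)) != add(mul(x, y), mul(x, z)):  # part 9
            return False
    for x, y in product(elems, repeat=2):
        if add(x, y) != add(y, x) or mul(x, y) != mul(y, x):  # parts 2 and 6
            return False
    for x in elems:
        if add(x, 0) != x or mul(1, x) != x:                # parts 3 and 7
            return False
        if not any(add(x, y) == 0 for y in elems):          # part 4
            return False
        if x != 0 and not any(mul(x, y) == 1 for y in elems):  # part 8
            return False
    return True

def is_prime(p):
    """Trial division, sufficient for the small values used here."""
    return p >= 2 and all(p % d != 0 for d in range(2, int(p ** 0.5) + 1))

# Values of n between 2 and 12 for which the check succeeds.
print([n for n in range(2, 13) if is_field_mod(n)])
```

For composite n the check fails at part 8, which is consistent with part (a) of Exercise 1-9: a product of two nonzero elements can be zero modulo n.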
1.2 Order Axioms
Exercises 1-7-1-9c show that the field axioms alone are not enough to describe the real
numbers. In fact, fields need not even be infinite. However, aside from executing the
familiar algebraic operations, we can also compare real numbers. This section presents
the order relation on the real numbers and its properties.
Axiom 1.6 The real numbers $\mathbb{R}$ contain a subset $\mathbb{R}^+$, called the positive real numbers,
such that
1. For all $x, y \in \mathbb{R}^+$, we have $x + y \in \mathbb{R}^+$ and $xy \in \mathbb{R}^+$,
2. For all $x \in \mathbb{R}$, exactly one of the following three properties holds.
Either $x \in \mathbb{R}^+$ or $-x \in \mathbb{R}^+$ or $x = 0$.
A real number $x$ is called negative if and only if $-x \in \mathbb{R}^+$.
Once positive numbers are defined, we can define an order relation. As usual,
instead of writing $y + (-x)$ we write $y - x$ and call it the difference of $x$ and $y$. The
binary operation “$-$” is called subtraction.
The phrase “if and only if,” which is used in definitions and biconditionals, is normally abbreviated with the artificial word “iff.”
Definition 1.7 For $x, y \in \mathbb{R}$, we say $x$ is less than $y$, in symbols $x < y$, iff $y - x \in \mathbb{R}^+$.
We say $x$ is less than or equal to $y$, denoted $x \le y$, iff $x < y$ or $x = y$. Finally, we say
$x$ is greater than $y$, denoted $x > y$, iff $y < x$, and we say $x$ is greater than or equal
to $y$, denoted $x \ge y$, iff $y \le x$.
The relation $\le$ satisfies the properties that define an order relation.
Proposition 1.8 The relation $\le$ is an order relation on $\mathbb{R}$. That is,
1. $\le$ is reflexive. For all $x \in \mathbb{R}$ we have $x \le x$,
2. $\le$ is antisymmetric. For all $x, y \in \mathbb{R}$ we have that $x \le y$ and $y \le x$ implies
$x = y$,
3. $\le$ is transitive. For all $x, y, z \in \mathbb{R}$, we have that $x \le y$ and $y \le z$ implies
$x \le z$.
Moreover, the relation $\le$ is a total order relation, that is, for any two $x, y \in \mathbb{R}$ we
have that $x \le y$ or $y \le x$.
Proof. The relation $\le$ is reflexive, because it includes equality.
For antisymmetry, let $x \le y$ and $y \le x$ and suppose for a contradiction that $x \neq y$.
Then $x - y \in \mathbb{R}^+$ and $-(x - y) = y - x \in \mathbb{R}^+$, which cannot be by Axiom 1.6. Thus
$\le$ must be antisymmetric.
For transitivity, let $x \le y$ and $y \le z$. There is nothing to prove if one of the
inequalities is an equality. Thus we can assume that $x < y$ and $y < z$, which means
$y - x \in \mathbb{R}^+$ and $z - y \in \mathbb{R}^+$. But then $\mathbb{R}^+$ contains $(z - y) + (y - x) = z - x$, and
hence $x < z$. We have shown that for all $x, y, z \in \mathbb{R}$ the inequalities $x \le y$ and $y \le z$
imply $x \le z$, which means that $\le$ is transitive.
For the “moreover” part note that if $x, y \in \mathbb{R}$, then $y - x \in \mathbb{R}$ and we have
either $y - x \in \mathbb{R}^+$, which means $x < y$, or $y - x = 0$, which means $y = x$, or
$x - y = -(y - x) \in \mathbb{R}^+$, which means $y < x$. Therefore for all $x, y \in \mathbb{R}$ one of $x \le y$
or $y \le x$ holds, and hence $\le$ is a total order.
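The way Definition 1.7 builds the order relation from the set of positive numbers can be mirrored in a short Python sketch. It works with rational numbers only, using fractions.Fraction as a stand-in for the real numbers, and the helper is_positive is a hypothetical replacement for membership in the set of positive real numbers. The checks run over a finite sample, so they illustrate Proposition 1.8 without proving it.

```python
from fractions import Fraction
from itertools import product

def is_positive(x):
    # Hypothetical stand-in for membership in R+, restricted to rationals.
    # A Fraction is stored with a positive denominator, so the sign of the
    # numerator is the sign of the number.
    return x.numerator > 0

def less(x, y):
    # Definition 1.7: x < y iff y - x is positive.
    return is_positive(y - x)

def leq(x, y):
    # Definition 1.7: x <= y iff x < y or x = y.
    return less(x, y) or x == y

def check_order(values):
    for x in values:
        # Part 2 of Axiom 1.6: exactly one of the three properties holds.
        assert [is_positive(x), is_positive(-x), x == 0].count(True) == 1
        assert leq(x, x)                      # reflexive
    for x, y in product(values, repeat=2):
        if leq(x, y) and leq(y, x):
            assert x == y                     # antisymmetric
        assert leq(x, y) or leq(y, x)         # total
    for x, y, z in product(values, repeat=3):
        if leq(x, y) and leq(y, z):
            assert leq(x, z)                  # transitive
    return True

samples = [Fraction(p, q) for p in range(-3, 4) for q in (1, 2, 3)]
print(check_order(samples))
```

Note that less and leq never compare two numbers directly. Every comparison is reduced to a question about positivity of a difference, which is the design of Definition 1.7.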
Once an order relation is established, we can define intervals.
Definition 1.9 An interval is a set $I \subseteq \mathbb{R}$ so that for all $c, d \in I$ and $x \in \mathbb{R}$ the
inequalities $c < x < d$ imply $x \in I$. In particular for $a, b \in \mathbb{R}$ with $a < b$ we define
1. $[a, b] := \{x \in \mathbb{R} : a \le x \le b\}$,
2. $(a, b) := \{x \in \mathbb{R} : a < x < b\}$, $(a, \infty) := \{x \in \mathbb{R} : a < x\}$,
$(-\infty, b) := \{x \in \mathbb{R} : x < b\}$, $(-\infty, \infty) := \mathbb{R}$,
3. $[a, b) := \{x \in \mathbb{R} : a \le x < b\}$, $[a, \infty) := \{x \in \mathbb{R} : a \le x\}$,
4. $(a, b] := \{x \in \mathbb{R} : a < x \le b\}$, $(-\infty, b] := \{x \in \mathbb{R} : x \le b\}$.
The points $a$ and $b$ are also called the endpoints of the interval. An interval that
does not contain either of its endpoints (where $\pm\infty$ are also considered to be “endpoints”) is called open. An interval that contains exactly one of its endpoints is called
half-open and an interval that contains both its endpoints is called closed.
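The interval types of Definition 1.9 can be modeled with a small data structure. The Python sketch below is one possible encoding and not part of the text: the class name Interval, its field names, and the method kind are assumptions made for this illustration, floating point numbers stand in for real numbers, and an infinite endpoint is expected to be passed as math.inf or -math.inf with its "closed" flag left at False.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float                 # left endpoint, possibly -math.inf
    hi: float                 # right endpoint, possibly math.inf
    lo_closed: bool = False   # True if the left endpoint belongs to the interval
    hi_closed: bool = False   # True if the right endpoint belongs to the interval

    def __contains__(self, x):
        above = x >= self.lo if self.lo_closed else x > self.lo
        below = x <= self.hi if self.hi_closed else x < self.hi
        return above and below

    def kind(self):
        # Count how many endpoints the interval contains.
        count = int(self.lo_closed) + int(self.hi_closed)
        return ("open", "half-open", "closed")[count]

unit_open = Interval(0, 1)                    # (0, 1)
unit_closed = Interval(0, 1, True, True)      # [0, 1]
ray = Interval(0, math.inf, lo_closed=True)   # [0, infinity)

print(0.5 in unit_open, 0 in unit_open, 0 in unit_closed)
print(unit_open.kind(), ray.kind(), unit_closed.kind())
```

With this encoding the ray from 0 to infinity that contains 0 is classified as half-open, in agreement with the convention that infinite endpoints count as endpoints that are never contained in the interval.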
For the first part of this text, the domains of functions will almost exclusively be
intervals. Because analysis requires extensive work with inequalities, we need to investigate how the order relation relates to the algebraic operations.
Theorem 1.10 Properties of the order relation. Let $x, y, z \in \mathbb{R}$.
1. The number $x$ is positive iff $x > 0$ and $x$ is negative iff $x < 0$.
2. If $x \le y$, then $x + z \le y + z$.
3. If $x \le y$ and $z > 0$, then $xz \le yz$.
4. If $x \le y$ and $z < 0$, then $xz \ge yz$.
5. If $0 < x \le y$, then $y^{-1} \le x^{-1}$.
Similar results can be proved for other combinations of strict and nonstrict inequalities.
We will not state these here, but instead trust that the reader can make the requisite
translation from the statements in this theorem.
Proof. Parts 1 and 2 are left to the reader as Exercises 1-10a and 1-10b. Throughout
this text, parts of proofs will be delegated to the reader to facilitate a better connection
to the material presented.
For part 3, let $x \le y$ and let $z > 0$. Then, $y - x \in \mathbb{R}^+$ or $y = x$. In case $y = x$,
we obtain $yz = xz$ and thus, in particular, $xz \le yz$. In case $y - x \in \mathbb{R}^+$, note that
$z > 0$ means $z \in \mathbb{R}^+$, and hence $yz - xz = (y - x)z \in \mathbb{R}^+$. By definition, this implies
$xz < yz$, and in particular $xz \le yz$. Because we have shown $xz \le yz$ in each case,
the result is established. All proofs in this section are done with the above kind of case
distinction (see Standard Proof Technique 1.11).
For part 4, let $x \le y$ and let $z < 0$. Then, $y - x \in \mathbb{R}^+$ or $y = x$. In case $y = x$,
we obtain $yz = xz$, and hence $xz \ge yz$. In case $y - x \in \mathbb{R}^+$, note that $z < 0$ means
$-z \in \mathbb{R}^+$, and hence $xz - yz = (x - y)z = (y - x)(-z) \in \mathbb{R}^+$. By definition, this
implies $yz < xz$, and hence $yz \le xz$, which establishes the result.
For part 5, first note that there is nothing to prove if $x = y$. Hence, we can assume
that $x < y$. Suppose for a contradiction that $x^{-1} < y^{-1}$. Then by part 3 we have that
$1 = x^{-1}x < y^{-1}x$, and hence $x < y \cdot 1 < y y^{-1} x = x$, contradiction.
Standard Proof Technique 1.11 When several possibilities must be considered in a
proof, the proof usually continues with separate arguments for each possibility. The
proof is complete when each separate argument has led to the desired conclusion. This
type of proof is also called a proof by case distinction.
We conclude this section by introducing the absolute value function and some of
its properties.
Definition 1.12 For $x \in \mathbb{R}$, we set
$|x| = \begin{cases} x; & \text{if } x \ge 0, \\ -x; & \text{if } x < 0, \end{cases}$
and we call it the absolute value of $x$.
Theorem 1.13 summarizes the properties of the absolute value. The numbering is
adjusted so that properties 1, 2, and 3 correspond to the analogous properties for norms
(see Definition 15.38). We will formulate many results in the first part of the text to
be analogous or easily generalizable to more abstract settings, but we will usually do
so without explicit forward references. In this fashion many abstract situations will be
more familiar because of similarities to situations investigated in the first part.
Theorem 1.13 Properties of the absolute value.
0. For all $x \in \mathbb{R}$, we have $|x| \ge 0$,
1. For all $x \in \mathbb{R}$, we have $|x| = 0$ iff $x = 0$,
2. For all $x, y \in \mathbb{R}$, we have $|xy| = |x||y|$,
3. Triangular inequality. For all $x, y \in \mathbb{R}$, we have $|x + y| \le |x| + |y|$.
4. Reverse triangular inequality. For all $x, y \in \mathbb{R}$, we have $\bigl| |x| - |y| \bigr| \le |x - y|$.
Proof. For part 0, let $x \in \mathbb{R}$. In case $x \ge 0$, by Definition 1.12 we have $|x| = x \ge 0$.
In case $x < 0$, we have $x \notin \mathbb{R}^+$ and by part 2 of Axiom 1.6 we conclude $-x > 0$.
Because in this case $|x| = -x > 0$, part 0 follows.
Throughout the text, the two implications of a biconditional “$A$ iff $B$” will be referred to as “$\Rightarrow$,” denoting “if $A$, then $B$” and “$\Leftarrow$,” denoting “if $B$, then $A$.”
For part 1, note that the direction “$\Leftarrow$” is trivial, because $|0| = 0$. For the direction
“$\Rightarrow$,” let $x \in \mathbb{R}$ be so that $|x| = 0$ and suppose for a contradiction that $x \neq 0$. If $x > 0$,
then $0 < x = |x| = 0$, a contradiction. (Note that the previous sentence is a short proof
by contradiction that is part of a longer proof by contradiction.) Therefore $x < 0$. But
then $0 < -x = |x| = 0$, a contradiction. Hence, $x$ must be equal to 0.
For part 2, let $x, y \in \mathbb{R}$. If $x \ge 0$ and $y \ge 0$, then by part 3 of Theorem 1.10
$xy \ge 0$, and hence $|xy| = xy = |x||y|$. If $x \ge 0$ and $y < 0$, then by part 4 of
Theorem 1.10 we infer $xy \le 0$. Hence, $|xy| = -xy = x(-y) = |x||y|$. The case
$x < 0$ and $y \ge 0$ is similar and the reader will produce it in Exercise 1-11a. Finally,
if $x < 0$ and $y < 0$, then by part 4 of Theorem 1.10 we obtain $xy > 0$. Hence,
$|xy| = xy = (-1)(-1)xy = (-x)(-y) = |x||y|$.
To prove the triangular inequality, first note that for all $x \in \mathbb{R}$ we have that $x \le |x|$.
This is clear for $x \ge 0$ and for $x < 0$ we simply note $x < 0 < -x = |x|$. Moreover,
(see Exercise 1-11b) for all $x \in \mathbb{R}$ we have $-x \le |x|$. Now let $x, y \in \mathbb{R}$. If the
inequality $x + y \ge 0$ holds, then by part 2 of Theorem 1.10 at least one of $x, y$ is
greater than or equal to 0. (Otherwise $x < 0$ and $y < 0$ would imply $x + y < 0$.)
Hence, by part 2 of Theorem 1.10 $|x + y| = x + y \le |x| + y \le |x| + |y|$. If $x + y < 0$,
then at least one of $x$ and $y$ is less than 0. Hence, by part 2 of Theorem 1.10 we obtain
$|x + y| = -(x + y) = -x + (-y) \le |-x| + (-y) \le |-x| + |-y| = |x| + |y|$.
Finally, for the reverse triangular inequality, let $x, y \in \mathbb{R}$. Without loss of generality
(see Standard Proof Technique 1.14) assume that $|x| \ge |y|$. (The proof for the case
$|x| < |y|$ is left as Exercise 1-11c.) Then $|x| = |x - y + y| \le |x - y| + |y|$, which
implies $\bigl| |x| - |y| \bigr| = |x| - |y| \le |x - y|$.
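The properties in Theorem 1.13 lend themselves to a quick numerical sanity check. The Python sketch below implements the absolute value exactly as in Definition 1.12 and tests parts 0 through 4 on a finite sample of rational numbers. The names absolute and check_theorem_1_13 are chosen for this illustration, and exact rational arithmetic from fractions.Fraction is an assumption that avoids rounding effects. A finite test of this kind cannot replace the proof above.

```python
from fractions import Fraction
from itertools import product

def absolute(x):
    # Definition 1.12: |x| = x if x >= 0, and |x| = -x if x < 0.
    return x if x >= 0 else -x

def check_theorem_1_13(values):
    for x, y in product(values, repeat=2):
        assert absolute(x) >= 0                                    # part 0
        assert (absolute(x) == 0) == (x == 0)                      # part 1
        assert absolute(x * y) == absolute(x) * absolute(y)        # part 2
        assert absolute(x + y) <= absolute(x) + absolute(y)        # part 3
        assert absolute(absolute(x) - absolute(y)) <= absolute(x - y)  # part 4
    return True

samples = [Fraction(p, q) for p in range(-6, 7) for q in (1, 2, 5)]
print(check_theorem_1_13(samples))
```

The sample contains positive numbers, negative numbers, and zero, so every case of the case distinctions in the proof is exercised at least once.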
Standard Proof Technique 1.14 If the proofs for the cases in a case distinction are
very similar, it is customary to assume without loss of generality that one of these
similar cases is true. This is not a loss of generality, because it is assumed that what is
presented enables the reader to fill in the proof(s) for the other case(s). In this text, the
omitted part is sometimes included as an explicit exercise for the reader.
Exercises
1-10. Finishing the proof of Theorem 1.10.
(a) Prove part 1 of Theorem 1.10.
(b) Prove part 2 of Theorem 1.10.
1-11. Finishing the proof of Theorem 1.13.
(a) Let $x, y \in \mathbb{R}$. Prove that if $x < 0$ and $y \ge 0$, then $|xy| = |x||y|$.
(b) Prove that for all $x \in \mathbb{R}$ we have $-x \le |x|$.
(c) Prove that if $|x| < |y|$, then $\bigl| |x| - |y| \bigr| \le |x - y|$.
1-12. Let $I, J \subseteq \mathbb{R}$ be intervals. Prove that $I \cap J = \{x \in \mathbb{R} : x \in I \text{ and } x \in J\}$ is again an interval.
1-13. Let $a < b$ and let $x, y \in [a, b]$. Prove that $|x - y| \le b - a$.
1-14. Prove that none of the fields from Exercise 1-9c can satisfy Axiom 1.6 by showing that for these
fields part 2 of Axiom 1.6 fails for n = 1.
Note. This result shows that Axiom 1.6 distinguishes $\mathbb{R}$ from the finite fields of Exercise 1-9c.
1.3 Lowest Upper and Greatest Lower Bounds
A structure that has the properties outlined in Axioms 1.1 and 1.6 is also called a
linearly ordered field. The rational numbers satisfy these properties just as well as the
real numbers. Thus we are not done with our characterization of $\mathbb{R}$. The final axiom
for the real numbers addresses upper and lower bounds of sets.
Definition 1.15 Let $A$ be a subset of $\mathbb{R}$.
1. The number $u \in \mathbb{R}$ is called an upper bound of $A$ iff $u \ge a$ for all $a \in A$. If $A$
has an upper bound, it is also called bounded above.
2. The number $l \in \mathbb{R}$ is called a lower bound of $A$ iff $l \le a$ for all $a \in A$. If $A$ has
a lower bound, it is also called bounded below.
A subset $A \subseteq \mathbb{R}$ that is bounded above and bounded below is also called bounded.
Among all upper bounds of a set, the smallest one (if it exists) plays a special role.
Similarly, the greatest lower bound plays a special role if it exists.
Definition 1.16 Let $A \subseteq \mathbb{R}$.
1. The number $s \in \mathbb{R}$ is called lowest upper bound of $A$ or supremum of $A$,
denoted $\sup(A)$, iff $s$ is an upper bound of $A$ and for all upper bounds $u$ of $A$ we
have that $s \le u$.
2. The number $i \in \mathbb{R}$ is called greatest lower bound of $A$ or infimum of $A$, denoted $\inf(A)$, iff $i$ is a lower bound of $A$ and for all lower bounds $l$ of $A$ we have
that $l \le i$.
Formally, it is not guaranteed that suprema and infima are unique, but the next
result shows that this is indeed the case. Note that the statement of Proposition 1.17
follows the standard pattern for a uniqueness statement.
Proposition 1.17 Suprema are unique. That is, if the set $A \subseteq \mathbb{R}$ is bounded above
and $s, t \in \mathbb{R}$ both are suprema of $A$, then $s = t$.
Proof. Let $A \subseteq \mathbb{R}$ and $s, t \in \mathbb{R}$ be as indicated. Then $s$ is an upper bound of $A$ and,
because $t$ is a supremum of $A$, we infer $s \ge t$. Similarly, $t$ is an upper bound of $A$ and,
because $s$ is a supremum of $A$, we infer $t \ge s$. This implies $s = t$.
Standard Proof Technique 1.18 (Also compare with Standard Proof Technique 1.14.)
When, as in the proof of Proposition 1.17, two parts of a proof are very similar, it is
common to only prove one part and state that the other part is similar. Throughout the
text, the reader will become familiar with this idea through exercises that require the
construction of proofs that are similar to proofs given in the narrative.
The proof that infima are unique is similar (see Exercise 1-15). Because suprema
and infima are unique if they exist, we speak of the supremum and the infimum.
The final axiom for the real numbers now states that suprema and infima exist under
mild hypotheses.
Axiom 1.19 Completeness Axiom. Every nonempty subset S of R that has an upper
bound has a lowest upper bound.
Although the Completeness Axiom formally only guarantees that nonempty subsets
of $\mathbb{R}$ that are bounded above have suprema, existence of infima is a consequence.
Proposition 1.20 Let $S \subseteq \mathbb{R}$ be nonempty and bounded below. Then $S$ has a greatest
lower bound.
Proof. Let $L := \{x \in \mathbb{R} : x \text{ is a lower bound of } S\}$. Then $L \neq \emptyset$. Let $s \in S$. Then
for all $l \in L$ we have that $l \le s$. Because $S \neq \emptyset$ this means that $L$ is bounded above.
Because $L \neq \emptyset$, by the Completeness Axiom, $L$ has a supremum $\sup(L)$. Every $s \in S$
is an upper bound of $L$, which means that $s \ge \sup(L)$ and so $\sup(L)$ is a lower bound
of $S$. By definition of suprema, $\sup(L)$ is greater than or equal to all elements of $L$,
that is, it is greater than or equal to all lower bounds of $S$. By definition of infima, this
means that $\sup(L) = \inf(S)$.
We will see that suprema and infima are valuable tools in analysis on the real line.
The next result shows that in any set with a supremum we can find numbers that are
arbitrarily close to the supremum. This fact is important, because analysis ultimately
is about objects “getting close to each other.”
Proposition 1.21 Let $S \subseteq \mathbb{R}$ be a nonempty subset of $\mathbb{R}$ that is bounded above and let
$s := \sup(S)$. Then for every $\varepsilon > 0$ there is an element $x \in S$ so that $s - x < \varepsilon$.
Proof. Suppose for a contradiction that there is an $\varepsilon > 0$ so that for all $x \in S$ we
have that $s - x \ge \varepsilon$. Then for all $x \in S$ we would obtain $s - \varepsilon \ge x$, that is, $s - \varepsilon$ would
be an upper bound of $S$. But $s - \varepsilon < s$ contradicts the fact that $s$ is the lowest upper
bound of $S$.
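Proposition 1.21 can be illustrated with a concrete set. The Python sketch below uses the set of all numbers 1 - 1/n with n a natural number, and it takes for granted that the supremum of this set is 1. That value is an assumption of the example (it is not established at this point in the text), and the names SUP, element, and witness are chosen for this illustration. For a given positive rational eps, the function witness searches for an element x of the set with sup - x < eps.

```python
from fractions import Fraction

# Assumed supremum of the example set {1 - 1/n : n = 1, 2, 3, ...}.
SUP = Fraction(1)

def element(n):
    """The n-th element 1 - 1/n of the example set."""
    return 1 - Fraction(1, n)

def witness(eps):
    """Return an element x of the set with SUP - x < eps, for rational eps > 0."""
    n = 1
    while SUP - element(n) >= eps:
        n += 1
    return element(n)

for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)):
    x = witness(eps)
    print(eps, x, SUP - x < eps)
```

The supremum 1 is not an element of this set, yet elements of the set come arbitrarily close to it, which is the situation that Proposition 1.21 describes.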
Although the supremum and infimum of a set need not be elements of the set, we
have different names for them in case they are in the set.
Definition 1.22 Let $A$ be a subset of $\mathbb{R}$.
1. If $A$ is bounded above and $\sup(A) \in A$, then the supremum of $A$ is also called
the maximum of $A$, denoted $\max(A)$.
2. If $A$ is bounded below and $\inf(A) \in A$, then the infimum of $A$ is also called the
minimum of $A$, denoted $\min(A)$.
Although the distinctions between suprema and maxima and between infima and
minima are small, the notions are distinct. For example, the open interval (0, 1) has a
supremum (1) and an infimum (0), but it has neither a maximum, nor a minimum.
Exercises
1-15. Let $A \subseteq \mathbb{R}$ be bounded below and let $s, t \in \mathbb{R}$ both be infima of $A$. Prove that $s = t$.
1-16. Approaching infima. State and prove a version of Proposition 1.21 that applies to infima. Is the proof
significantly different from that of Proposition 1.21?
1-17. Let $S \subseteq \mathbb{R}$ be bounded above. Prove that $s \in \mathbb{R}$ is the supremum of $S$ iff $s$ is an upper bound of $S$
and for all $\varepsilon > 0$ there is an $x \in S$ so that $|s - x| < \varepsilon$.
1-18. Suprema and infima vs. containment of sets.
(a) Let $A, B \subseteq \mathbb{R}$ be bounded above. Prove that $A \subseteq B$ implies $\sup(A) \le \sup(B)$.
(b) Let $A, B \subseteq \mathbb{R}$ be bounded below. Prove that $A \subseteq B$ implies $\inf(A) \ge \inf(B)$.
1-19. Let $A \subseteq \mathbb{R}$ be bounded above. Prove that $\inf\{x \in \mathbb{R} : -x \in A\} = -\sup(A)$.
1.4 Natural Numbers, Integers, and Rational Numbers
Although Axioms 1.1, 1.6 and 1.19 uniquely describe the real numbers, they do not
mention familiar subsets, such as natural numbers, integers, and rational numbers. This
is because these sets can be constructed from the axioms as subsets of the real numbers.
We start with the natural numbers, which are the unique subset with properties as stated
in Theorem 1.23. While their existence is easy to establish, the uniqueness of the
natural numbers can only be proved in Theorem 1.28 after some more machinery has
been developed.
Theorem 1.23 There is a subset $\mathbb{N} \subseteq \mathbb{R}$, called the natural numbers, so that
1. $1 \in \mathbb{N}$.
2. For each $n \in \mathbb{N}$ the number $n + 1$ is also in $\mathbb{N}$.
3. Principle of Induction. If $S \subseteq \mathbb{N}$ is such that $1 \in S$ and for each $n \in S$ we also
have $n + 1 \in S$, then $S = \mathbb{N}$.
Proof. Call a subset $A \subseteq \mathbb{R}$ a successor set iff $1 \in A$ and for all $a \in A$ we also
have $a + 1 \in A$. Successor sets exist, because, for example, $\mathbb{R}$ itself is a successor
set. Let $\mathbb{N}$ be the set of all elements of $\mathbb{R}$ that are in all successor sets. Because 1 is
an element of every successor set, we infer $1 \in \mathbb{N}$. Moreover, if $n \in \mathbb{N}$, then $n$ is in
every successor set, which means $n + 1$ is in every successor set, and hence $n + 1 \in \mathbb{N}$.
Finally, any subset $S \subseteq \mathbb{N}$ as given in the Principle of Induction is a successor set.
Because the elements of $\mathbb{N}$ are contained in all successor sets, we conclude that $\mathbb{N} \subseteq S$,
and hence $\mathbb{N} = S$.
Of course, we will denote the natural numbers by their usual names 1, 2, 3, . . .
As algebraic objects, natural numbers are suited for addition and multiplication (see
Proposition 1.24), but they are not so well suited for subtraction (see Proposition 1.25).
Although all results until Theorem 1.28 are stated for $\mathbb{N}$, they hold “for every subset of
$\mathbb{R}$ that satisfies the properties in Theorem 1.23.” The reader should keep this in mind
and double check, because we will need it in the proof of Theorem 1.28. To avoid
awkward formulations, the results up to Theorem 1.28 are formulated for $\mathbb{N}$, however.
Proposition 1.24 The natural numbers are closed under addition and multiplication.
That is, if $m, n \in \mathbb{N}$, then $m + n$ and $mn$ are in $\mathbb{N}$ also.
Proof. The key to this result is the Principle of Induction. Let $m \in \mathbb{N}$ be arbitrary
and let $S_m := \{n \in \mathbb{N} : m + n \in \mathbb{N}\}$. Then $m \in \mathbb{N}$ implies $m + 1 \in \mathbb{N}$, and hence $1 \in S_m$.
Moreover, if $n \in S_m$, then $m + n \in \mathbb{N}$, and hence $m + (n + 1) = (m + n) + 1 \in \mathbb{N}$,
which means that $n + 1 \in S_m$. By the Principle of Induction we conclude that $S_m = \mathbb{N}$.
Because $m \in \mathbb{N}$ was arbitrary, this means that for any $m, n \in \mathbb{N}$ we have $m + n \in \mathbb{N}$.
The proof for products is similar and left to the reader as Exercise 1-20.
Readers familiar with induction recognize the part “$1 \in S_m$” of the preceding proof
as the base step of an induction and the part “$n \in S_m \Rightarrow n + 1 \in S_m$” as the induction
step. In this section, we use the “induction on sets” as done in the preceding proof.
The more commonly known Principle of Induction is introduced in Theorem 1.39.
Proposition 1.25 Let $m, n \in \mathbb{N}$ be such that $m > n$. Then $m - n \in \mathbb{N}$.
Proof. We first show that if $m \in \mathbb{N}$, then $m - 1 \in \mathbb{N}$ or $m - 1 = 0$. To do this,
let $A := \{m \in \mathbb{N} : m - 1 \in \mathbb{N} \text{ or } m - 1 = 0\}$. Then $1 \in A$ and if $m \in A$, then
$(m + 1) - 1 = m \in A \subseteq \mathbb{N}$, which means $m + 1 \in A$. Hence, $A = \mathbb{N}$ by the Principle
of Induction.
Now let $S := \{n \in \mathbb{N} : (\forall m \in \mathbb{N} : m > n \text{ implies } m - n \in \mathbb{N})\}$. If $n = 1$ and $m \in \mathbb{N}$
satisfies $m > 1$, then $m - 1 > 0$ and so by the above $m - 1 \in \mathbb{N}$, which means $1 \in S$.
Let $n \in S$. If $m > n + 1$, then $m - 1 > n$, and hence $m - (n + 1) = (m - 1) - n \in \mathbb{N}$,
which means $n + 1 \in S$. By the Principle of Induction we conclude that $S = \mathbb{N}$, and
hence for all $m, n \in \mathbb{N}$ we have proved that $m > n$ implies $m - n \in \mathbb{N}$.
Proposition 1.26 shows that the natural numbers are positive and the smallest difference between any two of them is 1.
Proposition 1.26 For all $n \in \mathbb{N}$, the inequality $n \ge 1$ holds and there is no $m \in \mathbb{N}$ so
that the inequalities $n < m < n + 1$ hold.
Proof. The proof that all natural numbers are greater than or equal to 1 is left to
Exercise 1-21.
Now suppose for a contradiction that there is an $n \in \mathbb{N}$ and an $m \in \mathbb{N}$ so that
$n < m < n + 1$. Then $m - n \in \mathbb{N}$ and $m - n < 1$, a contradiction.
The Well-ordering Theorem turns out to be equivalent to the Principle of Induction
(see Exercise 1-22).
Theorem 1.27 Well-ordering Theorem. Every nonempty subset of N has a smallest
element.
Proof. Suppose for a contradiction that $B \subseteq \mathbb{N}$ is not empty and does not have a
smallest element. Let $S := \{n \in \mathbb{N} : (\forall m \in \mathbb{N} : m \le n \text{ implies } m \notin B)\}$. By Proposition 1.26, 1 is less than or equal to all elements of $\mathbb{N}$, so $1 \notin B$, and hence $1 \in S$. Now
let $n \in S$. Then all $m \in \mathbb{N}$ with $m \le n$ are not in $B$. But then $n + 1 \in B$ would by
Proposition 1.26 imply that $n + 1$ is the smallest element of $B$. Hence, $n + 1 \notin B$ and
we conclude $n + 1 \in S$. By the Principle of Induction, $S = \mathbb{N}$ and consequently $B = \emptyset$,
a contradiction.
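The set S in the proof of the Well-ordering Theorem has a direct computational reading: as long as every natural number up to n lies outside B, the search for the smallest element of B moves on to n + 1. The Python sketch below follows that idea for a set B given by a membership test. The function name smallest_element, the parameter in_B, and the cutoff search_limit are assumptions made for this illustration. The cutoff is needed only because a program, unlike the theorem, cannot know in advance that B is nonempty.

```python
def smallest_element(in_B, search_limit=10**6):
    """Return the smallest natural number n with in_B(n) true.

    When the loop reaches n, all m <= n - 1 have been found to lie outside B,
    which mirrors membership of n - 1 in the set S from the proof.
    """
    n = 1
    while n <= search_limit:
        if in_B(n):
            return n  # no smaller natural number is in B
        n += 1
    raise ValueError("no element of B found up to search_limit")

# B = {n : n * n > 50}
print(smallest_element(lambda n: n * n > 50))

# B = {n : n is divisible by 5 and by 7}
print(smallest_element(lambda n: n % 5 == 0 and n % 7 == 0))
```

The search starts at 1 because, by Proposition 1.26, no natural number is smaller than 1, and it moves in steps of 1 because there is no natural number strictly between n and n + 1.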
Now we are finally ready to show that the natural numbers are unique.
Theorem 1.28 The natural numbers N are the unique subset of R that satisfies the
properties in Theorem 1.23.
Proof. Examination of the proofs of all results since Theorem 1.23 reveals that any
set $S \subseteq \mathbb{R}$ that satisfies the properties in Theorem 1.23 must also have the properties
given in these results.
It may feel tedious to go back and verify the above statement. However, mathematical presentations more often than not will ask a reader to use a modification of a known
proof to prove a result (also see Standard Proof Technique 1.14). When this occurs, the
reader is expected to verify that the result(s) can indeed be proved with similar methods
as were used for earlier results.
Now suppose for a contradiction that there is a set $S \neq \mathbb{N}$ with properties as in
Theorem 1.23. Then $S$ is a successor set, so $\mathbb{N} \subseteq S$. Let $B := S \setminus \mathbb{N} = \{s \in S : s \notin \mathbb{N}\}$.
Then $B \neq \emptyset$, and hence by the Well-ordering Theorem, which is valid for $S$, $B$ has a
smallest element $b$. Because $1 \in \mathbb{N}$ we infer $b \neq 1$, and hence by Proposition 1.25,
which is valid for $S$, we have $b - 1 \in S$. But then $b - 1 \notin \mathbb{N}$, because this would imply
$b = (b - 1) + 1 \in \mathbb{N}$. Hence, $b - 1 \in B$, which is a contradiction to the fact that $b$ is
the smallest element of $B$.
Once we have constructed the natural numbers, the next number system to consider is the set of integers.
Definition 1.29 The set $\mathbb{Z} := \{ m \in \mathbb{R} : m \in \mathbb{N} \text{ or } m = 0 \text{ or } -m \in \mathbb{N} \}$ is called the set of integers.
We leave several proofs of natural properties of the integers to the reader.
Proposition 1.30 The integers are closed under addition, subtraction and multiplication. Moreover, for any two integers $k, l$ with $k > l$ we have that $k - l \ge 1$, every nonempty set $A \subseteq \mathbb{Z}$ that is bounded below has a minimum, and every nonempty set $A \subseteq \mathbb{Z}$ that is bounded above has a maximum.
Proof. To prove that $\mathbb{Z}$ is closed under addition, let $m, n \in \mathbb{Z}$. In case both are natural numbers or in case one of them is zero, there is nothing to prove. Moreover, in case $-m, -n \in \mathbb{N}$ we have $m + n = -((-m) + (-n))$, which is in $\mathbb{Z}$, because $(-m) + (-n) \in \mathbb{N}$. Now consider the case $m \in \mathbb{N}$ and $-n \in \mathbb{N}$. If $m = -n$, we obtain $m + n = 0 \in \mathbb{Z}$. If $m > -n$, then by Proposition 1.25 we conclude that $m + n = m - (-n) \in \mathbb{N} \subseteq \mathbb{Z}$. Finally, if $m < -n$, again by Proposition 1.25 we conclude that $-(m + n) = (-n) - m \in \mathbb{N}$, which means by definition of $\mathbb{Z}$ that $m + n \in \mathbb{Z}$. The case $-m \in \mathbb{N}$ and $n \in \mathbb{N}$ is treated similarly (see Exercise 1-23a).
Closedness under subtraction and multiplication as well as the claim about differences are left to Exercises 1-23b-1-23d.
Now let $A \subseteq \mathbb{Z}$ be nonempty and bounded below. Then, because $A \subseteq \mathbb{R}$, it has an infimum $a$. By the version of Proposition 1.21 for infima, there is an integer $m \in A$ with $m - a < 1$. Because the absolute value of the difference between any two distinct integers is at least 1, $m$ is the only integer in $[a, a+1)$. Hence, $m$ is below all elements of $A$ that are not in $[a, a+1)$. Because $m$ is the only element of $A$ in $[a, a+1)$, $m$ must be the minimum of $A$.
The proof of the corresponding result for nonempty subsets $A \subseteq \mathbb{Z}$ that are bounded above is left to Exercise 1-23e.
A key property of the natural numbers is that any real number is exceeded by a natural number. To prove this, we need the usual fractions, which are easily introduced.
Definition 1.31 For all $a \in \mathbb{R} \setminus \{0\}$ we set $\frac{1}{a} := a^{-1}$ and call it the reciprocal of $a$. For $b \in \mathbb{R}$ and $a \in \mathbb{R} \setminus \{0\}$ we set $\frac{b}{a} := b \cdot \frac{1}{a} = ba^{-1}$ and call it a fraction.
Because $\frac{1}{2} + \frac{1}{2} = 2^{-1} + 2^{-1} = (1 + 1) \cdot 2^{-1} = 2 \cdot 2^{-1} = 1$ we can now prove the following.
Theorem 1.32 For every $x \in \mathbb{R}$, there is an $n \in \mathbb{N}$ so that $n \ge x$.
Proof. For a contradiction, suppose that $x$ is such that for all $n \in \mathbb{N}$ we have that $n < x$. Then $B := \{ y \in \mathbb{R} : (\forall n \in \mathbb{N} : n < y) \}$ is not empty. Moreover, $B$ is bounded below by all $n \in \mathbb{N}$. By the Completeness Axiom, $B$ has an infimum, call it $b$. Then $b - \frac{1}{2} \notin B$, which means there is an $n \in \mathbb{N}$ with $n \ge b - \frac{1}{2}$. But then $n + 1 \ge b + \frac{1}{2}$ is a lower bound of $B$, a contradiction to $b = \inf(B)$.
Because $\mathbb{N} \subseteq \mathbb{Z}$ and because subsets of $\mathbb{Z}$ that are bounded below have a minimum, we infer that for every real number $x$ there is a unique smallest integer that is greater than or equal to $x$. Similarly there is a unique largest integer that is less than or equal to $x$. These numbers are useful when we need integers instead of real numbers, so we define the following.
Definition 1.33 For every $x \in \mathbb{R}$, let $\lceil x \rceil$ be the smallest integer greater than or equal to $x$. Moreover, let $\lfloor x \rfloor$ be the largest integer less than or equal to $x$. As functions from $\mathbb{R}$ to $\mathbb{Z}$, $\lceil \cdot \rceil$ is called the ceiling function and $\lfloor \cdot \rfloor$ is called the floor function.
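As a small illustration of Definition 1.33, the following Python sketch evaluates the floor and ceiling functions with the standard library functions `math.floor` and `math.ceil`. The helper name `floor_and_ceiling` is an assumption made for this illustration and is not notation from the text. Exact rationals are used as inputs so that rounding plays no role.

```python
import math
from fractions import Fraction


def floor_and_ceiling(x):
    """Return (floor(x), ceil(x)) for a rational or integer x.

    floor(x) is the largest integer <= x and ceil(x) is the
    smallest integer >= x, as in Definition 1.33.
    (Hypothetical helper name, chosen for this sketch only.)
    """
    return math.floor(x), math.ceil(x)


if __name__ == "__main__":
    for x in (Fraction(7, 2), Fraction(-7, 2), Fraction(3)):
        low, high = floor_and_ceiling(x)
        # The defining inequalities low <= x <= high always hold.
        print(x, low, high, low <= x <= high)
```

Note that for negative inputs the floor moves away from zero, which is exactly what "largest integer less than or equal to $x$" requires.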
The last subset of $\mathbb{R}$ that we introduce is the set of rational numbers. Rational numbers are naturally defined as fractions.
Definition 1.34 The set $\mathbb{Q} := \left\{ \frac{n}{d} : n \in \mathbb{Z}, d \in \mathbb{N} \right\}$ is called the set of rational numbers. The set $\mathbb{R} \setminus \mathbb{Q} := \{ x \in \mathbb{R} : x \notin \mathbb{Q} \}$ is called the set of irrational numbers.
Proposition 1.35 The rational numbers are closed under addition, subtraction and multiplication. Moreover, if $q, r \in \mathbb{Q}$ and $r \neq 0$, then $\frac{q}{r} \in \mathbb{Q}$.
Proof. Let $m, n \in \mathbb{Z}$, let $c, d \in \mathbb{N}$ and consider the rational numbers $\frac{m}{c}$ and $\frac{n}{d}$. Then $\mathbb{Q}$ is closed under addition because
\[ \frac{m}{c} + \frac{n}{d} = mc^{-1} + nd^{-1} = mdd^{-1}c^{-1} + ncc^{-1}d^{-1} = (md + nc)c^{-1}d^{-1} = \frac{md + nc}{cd}. \]
For multiplication, note that $\frac{m}{c} \cdot \frac{n}{d} = mc^{-1}nd^{-1} = mnc^{-1}d^{-1} = \frac{mn}{cd}$. The remainder is left to Exercise 1-24.
Rational numbers can be found between any two real numbers and Exercise 1-45 will establish a similar result for irrational numbers.
Theorem 1.36 Let $a, b \in \mathbb{R}$ with $a < b$. Then there is a rational number $q \in \mathbb{Q}$ such that $a < q < b$.
Proof. By Theorem 1.32, there is an $n \in \mathbb{N}$ so that $0 < \frac{1}{b-a} < n$. By part 5 of Theorem 1.10, we obtain $\frac{1}{n} < b - a$. Now let $u := \min\left\{ m \in \mathbb{Z} : \frac{m}{n} \ge b \right\}$ and similarly let $l := \max\left\{ m \in \mathbb{Z} : \frac{m}{n} \le a \right\}$. Then $\frac{u}{n} - \frac{l}{n} \ge b - a > \frac{1}{n}$, which means $\frac{l+1}{n} < \frac{u}{n}$. Hence, by definition of $l$ and $u$ we infer $a < \frac{l+1}{n} < b$.
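The construction in the proof of Theorem 1.36 can be traced computationally. The sketch below is an illustration under stated assumptions: the inputs are restricted to exact rationals (standing in for the real numbers $a$ and $b$), the function name `rational_between` is hypothetical, and the integer $l$ is computed with `math.floor` rather than found as a maximum.

```python
import math
from fractions import Fraction


def rational_between(a, b):
    """Return a rational q with a < q < b, mimicking the proof of Theorem 1.36.

    Hypothetical helper for illustration; a and b must be exactly
    representable (int, Fraction, or float) with a < b.
    """
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    # Pick n in N with n > 1/(b - a), so that 1/n < b - a.
    n = math.floor(1 / (b - a)) + 1
    # l is the largest integer m with m/n <= a.
    l = math.floor(a * n)
    # (l + 1)/n exceeds a by maximality of l and stays below a + 1/n < b.
    return Fraction(l + 1, n)


if __name__ == "__main__":
    q = rational_between(Fraction(1, 3), Fraction(1, 2))
    print(q, Fraction(1, 3) < q < Fraction(1, 2))
```

Only $l$ is needed in the computation; the integer $u$ from the proof serves to justify the upper bound and never has to be evaluated.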
We conclude with a simple looking result that is actually at the heart of a standard proof technique (see Standard Proof Technique 2.7). Exercise 1-25 extends Theorem 1.37 to inequalities.
Theorem 1.37 Let $x \in \mathbb{R}$. If $x \ge 0$ and for all $\varepsilon > 0$ we have $x \le \varepsilon$, then $x = 0$.
Proof. Let $x$ be as indicated and suppose for a contradiction that $x > 0$. Then $\varepsilon := \frac{x}{2}$ is positive and $x \le \varepsilon = \frac{x}{2}$ implies $1 \le \frac{1}{2}$, a contradiction.
Exercises
1-20. Prove that if $m, n \in \mathbb{N}$, then $mn \in \mathbb{N}$.
Hint. Same idea as the first part of the proof of Proposition 1.24 with sets $S_m := \{ n \in \mathbb{N} : mn \in \mathbb{N} \}$.
1-21. Prove that if $n \in \mathbb{N}$, then $n \ge 1$.
Hint. Use $S := \{ n \in \mathbb{N} : n \ge 1 \}$.
1-22. Use the Well-ordering Theorem to prove the Principle of Induction.
1-23. Finish the proof of Proposition 1.30 by proving the following.
(a) Finish the proof that $\mathbb{Z}$ is closed under addition. That is, prove that if $-m \in \mathbb{N}$ and $n \in \mathbb{N}$, then $m + n \in \mathbb{Z}$.
(b) Prove that $\mathbb{Z}$ is closed under subtraction. That is, prove that $m - n \in \mathbb{Z}$ for all $m, n \in \mathbb{Z}$.
(c) Prove that $\mathbb{Z}$ is closed under multiplication. That is, prove that $mn \in \mathbb{Z}$ for all $m, n \in \mathbb{Z}$.
(d) Prove that for any two integers $m, n$ with $m > n$ we have $m - n \ge 1$.
Hint. Find a contradiction to Proposition 1.26.
(e) Prove that every nonempty set $A \subseteq \mathbb{Z}$ that is bounded above has a maximum.
1-24. Finish the proof of Proposition 1.35. That is,
(a) Prove that $\mathbb{Q}$ is closed under subtraction.
(b) Prove that if $q, r \in \mathbb{Q}$ and $r \neq 0$, then $\frac{q}{r} \in \mathbb{Q}$.
Hint. First show that for $n \in \mathbb{Z} \setminus \{0\}$ and $d \in \mathbb{N}$ we have that $\left( \frac{n}{d} \right)^{-1} = \frac{d}{n}$.
1-25. Prove that if $a, b \in \mathbb{R}$ are such that for all $\varepsilon > 0$ we have $a \le b + \varepsilon$, then $a \le b$.
1-26. Prove that for every real number $x$ there is an integer $n$ so that $n \le x$.
1-27. Prove that for any real numbers $x, \varepsilon > 0$ there is an $n \in \mathbb{N}$ so that $\frac{x}{n} < \varepsilon$.
Hint. Theorem 1.32.
1-28. Prove that $\frac{1}{3} + \frac{1}{3} + \frac{1}{3} = 1$.
1-29. A rational number $r$ is called a dyadic rational number iff there are $p \in \mathbb{Z}$ and $n \in \mathbb{N}$ so that $r = \frac{p}{2^n}$. Dyadic rational numbers are useful in analysis because they can provide a sequence of “grids” such that each new grid contains the old one (see part 1-29a below), the whole set is the union of the “grids” (see part 1-29b) and between any two real numbers there is a dyadic rational number (see part 1-29c).
Let $D$ be the set of dyadic rational numbers and for each $n \in \mathbb{N}$ let $D_n := \left\{ \frac{p}{2^n} : p \in \mathbb{Z} \right\}$.
(a) Prove that for all $n \in \mathbb{N}$ we have $D_n \subseteq D_{n+1}$.
(b) Let $\bigcup_{n=1}^{\infty} D_n := \{ x \in \mathbb{R} : (\exists n \in \mathbb{N} : x \in D_n) \}$ and prove that $D = \bigcup_{n=1}^{\infty} D_n$.
(c) Prove that for any $x, y \in \mathbb{R}$ with $x < y$ there is a dyadic rational number $d$ so that $x < d < y$.
1-30. In this exercise, we will prove that the real numbers are the (up to isomorphism) unique linearly ordered complete field. That is, we will prove that every mathematical object that satisfies Axioms 1.1, 1.6, and 1.19 is in a certain sense (defined below) “the same as $\mathbb{R}$.”
First notice that, similar to the proof of Theorem 1.28, all results proved so far hold for any object that satisfies Axioms 1.1, 1.6, and 1.19 (because the results are derived from these axioms). That is, every set $\tilde{\mathbb{R}}$ that satisfies Axioms 1.1, 1.6, and 1.19 contains subsets $\tilde{\mathbb{N}}$, $\tilde{\mathbb{Z}}$, and $\tilde{\mathbb{Q}}$ that have the properties that we have proved up to now for the natural numbers, the integers and the rational numbers.
(a) Prove that for all $x \in \mathbb{R}$ we have that $x = \sup\{ r \in \mathbb{Q} : r \le x \}$.
(b) Now let $\tilde{\mathbb{R}}$ be a set that satisfies Axioms 1.1, 1.6, and 1.19 and let $\tilde{\mathbb{N}}$, $\tilde{\mathbb{Z}}$, and $\tilde{\mathbb{Q}}$ be subsets of $\tilde{\mathbb{R}}$ that have the properties that we have proved up to now for the natural numbers, the integers and the rational numbers, including Exercise 1-30a.
i. Define a function $f : \mathbb{Q} \to \tilde{\mathbb{Q}}$ as follows. For $n \in \mathbb{N}$, let $f(1) := \tilde{1}$ and once $f(n)$ is defined let $f(n+1) := f(n) \,\tilde{+}\, \tilde{1}$. Also let $f(-n) := \tilde{-} f(n)$. For $n \in \mathbb{Z}$ and $d \in \mathbb{N}$ let $f\left( \frac{n}{d} \right) := \frac{f(n)}{f(d)}$. Prove that for all $x \in \mathbb{Q}$ the above definition is not self-contradictory by proving that it assigns exactly one value to each $x \in \mathbb{Q}$. Then prove that $f(x) \in \tilde{\mathbb{Q}}$ for each $x \in \mathbb{Q}$ and that $f$ preserves the order, that is, if $x < z$, then $f(x) \,\tilde{<}\, f(z)$.
ii. For $x \in \mathbb{R}$ let $f(x) := \sup\{ f(r) : r \in \mathbb{Q} \text{ and } r \le x \}$. Prove that for all $x \in \mathbb{R}$ the above definition is not self-contradictory by proving it assigns exactly one value to each $x \in \mathbb{R}$.
(Formally this says that $f$ is well-defined.)
iii. Prove that the above function does not map any two points to the same image by proving that for all $x, y \in \mathbb{R}$ the inequality $x \neq y$ implies that $f(x) \neq f(y)$.
(Formally, this says that the function $f$ is one-to-one or injective.)
iv. Prove that the above function “reaches” every element of $\tilde{\mathbb{R}}$ by proving that for all $\tilde{x} \in \tilde{\mathbb{R}}$ there is an $x \in \mathbb{R}$ so that $f(x) = \tilde{x}$.
(Formally, this says that the function $f$ is onto or surjective.)
v. Prove that the above function is consistent with the algebraic operations by proving that for all $x, y \in \mathbb{R}$ we have that $f(x + y) = f(x) \,\tilde{+}\, f(y)$ and $f(xy) = f(x) \,\tilde{\cdot}\, f(y)$.
(Formally, this says that $f$ is a field isomorphism.)
vi. Prove that the above function is consistent with the order relation by proving that for all $x, y \in \mathbb{R}$ we have that $x \le y$ implies that $f(x) \,\tilde{\le}\, f(y)$.
(Formally, this says that $f$ is an order isomorphism.)
The above steps show that the points and operations in $\mathbb{R}$ and in $\tilde{\mathbb{R}}$ can be identified with each other in such a way that it does not matter if we are working in $\mathbb{R}$ or in $\tilde{\mathbb{R}}$. Thus for all intents and purposes, $\mathbb{R}$ and $\tilde{\mathbb{R}}$ are “the same.” This is the essence of saying that the real numbers are up to isomorphism the unique linearly ordered, complete field.
1.5 Recursion, Induction, Summations, and Products
A recursive definition defines an entity $X_n$ that depends on a natural number $n$ first for $n = 1$ and then it defines $X_{n+1}$ in terms of $X_n$. By the Principle of Induction the set $S = \{ n \in \mathbb{N} : X_n \text{ is defined} \}$ is equal to $\mathbb{N}$, which means that a recursive definition defines the entity $X_n$ for all natural numbers $n$. In this fashion, the sum of finitely many numbers can be defined.
Definition 1.38 For each $j \in \mathbb{N}$ let $a_j \in \mathbb{R}$. Define the sum $\sum_{j=1}^{1} a_j := a_1$ and for $n \in \mathbb{N}$ define the sum $\sum_{j=1}^{n+1} a_j := a_{n+1} + \sum_{j=1}^{n} a_j$. For $m \in \mathbb{N} \cup \{0\}$, set $\sum_{j=1}^{-m} a_j := 0$. The parameter $j$ is also called the summation index.
In particular, note that a sum whose index starts at 1 and ends at a number smaller
than 1 is always zero. It is also called an empty sum. Summations that start at numbers
other than 1 are defined similarly (Exercise 1-31). By their nature, recursive definitions
are closely linked to induction. Unlike what is stated in Theorem 1.23, induction normally is used to prove statements about natural numbers. This is possible, because a
proof that a statement is true for all natural numbers is the same as a proof that a certain
set is equal to N.
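The recursive definition of a sum translates almost verbatim into a recursive function. The following Python sketch is only an illustration: the name `recursive_sum` and the convention of passing the terms as a function `a` of the index are assumptions made here, not part of the text.

```python
def recursive_sum(a, n):
    """Compute a(1) + ... + a(n) by the recursion of Definition 1.38.

    a is a function giving the j-th term; n is an integer upper index.
    (Hypothetical helper, written for illustration.)
    """
    if n <= 0:
        # Upper index below the starting index 1: empty sum.
        return 0
    if n == 1:
        # Start of the recursion: the sum up to 1 is a_1.
        return a(1)
    # Recursive step: split off the last term.
    return a(n) + recursive_sum(a, n - 1)


if __name__ == "__main__":
    print(recursive_sum(lambda j: j, 4))       # 1 + 2 + 3 + 4
    print(recursive_sum(lambda j: j * j, 3))   # 1 + 4 + 9
    print(recursive_sum(lambda j: j, 0))       # empty sum
```

The recursion terminates for every natural number $n$ for the same reason the definition is valid: each call reduces the upper index by one until the case $n = 1$ is reached.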
Theorem 1.39 Principle of Induction. Let $P(n)$ be a statement about the natural number $n$. If $P(1)$ is true and if for all $n \in \mathbb{N}$ truth of $P(n)$ implies truth of $P(n+1)$, then $P(n)$ holds for all natural numbers.
Proof. Let $P$ be as indicated and consider the set $S := \{ n \in \mathbb{N} : P(n) \text{ is true} \}$. Then $1 \in S$. For every $n \in S$ the statement $P(n)$ is true, hence $P(n+1)$ is true, which means $n + 1 \in S$. By Theorem 1.23 we conclude $S = \mathbb{N}$ and thus $P(n)$ is true for all $n \in \mathbb{N}$.
Standard Proof Technique 1.40 In the form of Theorem 1.39, induction is a standard proof technique. It involves a two-step process. In the first step, called the base step, $P(1)$ is proved. Then, in the induction step, $P(n)$ is used to prove $P(n+1)$. In this context, $P(n)$ is also called the induction hypothesis. All proofs in this section rely on induction. Moreover, Exercise 1-32 exhibits another way to carry out an induction (sometimes called strong induction).
Example 1.41 For all $n \in \mathbb{N}$, the summation formula $\sum_{j=1}^{n} j = \frac{1}{2} n(n+1)$ holds.
Proof. The statement is $P(n) =$ “$\sum_{j=1}^{n} j = \frac{1}{2} n(n+1)$.”
Base step. We prove $P(n)$ for $n = 1$. $\sum_{j=1}^{1} j = 1 = \frac{1}{2} \cdot 1 \cdot (1+1)$, so $P(1)$ holds.
Induction step. Under the induction hypothesis $\sum_{j=1}^{n} j = \frac{1}{2} n(n+1)$ we must prove $\sum_{j=1}^{n+1} j = \frac{1}{2}(n+1)((n+1)+1)$. A standard step in induction for recursively defined quantities is to split off the last term. This is done in the first step here.
\[ \sum_{j=1}^{n+1} j = (n+1) + \sum_{j=1}^{n} j = (n+1) + \frac{1}{2} n(n+1) = \frac{1}{2} \cdot 2(n+1) + \frac{1}{2} n(n+1) = \frac{1}{2}(n+2)(n+1). \]
Further examples of similar inductions can be found in Exercise 1-33.
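The two steps of the induction in Example 1.41 can be spot-checked numerically. The sketch below is an illustration only, with hypothetical function names; a finite check of this kind supports the algebra but does not replace the proof.

```python
def gauss_formula(n):
    """Right-hand side of Example 1.41: n(n+1)/2 (always an integer)."""
    return n * (n + 1) // 2


def base_step_holds():
    """P(1): the sum consisting of the single term 1 equals 1(1+1)/2."""
    return 1 == gauss_formula(1)


def induction_step_holds(n):
    """Splitting off the last term: (n+1) + n(n+1)/2 equals (n+1)(n+2)/2."""
    return (n + 1) + gauss_formula(n) == gauss_formula(n + 1)


if __name__ == "__main__":
    print(base_step_holds())
    print(all(induction_step_holds(n) for n in range(1, 1000)))
    # Direct comparison with the sum itself for a few values of n.
    print(all(sum(range(1, n + 1)) == gauss_formula(n) for n in range(1, 200)))
```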
Similar to sums we can define products. Although products occur less frequently
than sums, they are useful to define powers.
Definition 1.42 For each $j \in \mathbb{N}$, let $a_j \in \mathbb{R}$. Define the product $\prod_{j=1}^{1} a_j := a_1$ and for all $n \in \mathbb{N}$ define the product $\prod_{j=1}^{n+1} a_j := a_{n+1} \cdot \prod_{j=1}^{n} a_j$. For all $m \in \mathbb{N} \cup \{0\}$, set $\prod_{j=1}^{-m} a_j := 1$. The parameter $j$ is also called the product index.
Products that start at numbers other than 1 are defined similarly (Exercise 1-31). Products that end at an index that is smaller than the starting index are set to 1 and are also called empty products.
Definition 1.43 For all $a \in \mathbb{R}$, and all $n \in \mathbb{N} \cup \{0\}$, we define the $n$th power $a^n := \prod_{j=1}^{n} a$.
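Definitions 1.42 and 1.43 can be mirrored in code in the same way as sums. In the hedged sketch below, `recursive_product` and `power` are hypothetical names introduced for illustration; the power $a^n$ is obtained, as in Definition 1.43, as a product of $n$ equal factors, with the empty product giving $a^0 = 1$.

```python
def recursive_product(a, n):
    """Compute a(1) * ... * a(n) by the recursion of Definition 1.42.

    (Hypothetical helper, written for illustration.)
    """
    if n <= 0:
        # Upper index below the starting index 1: empty product.
        return 1
    if n == 1:
        return a(1)
    # Recursive step: split off the last factor.
    return a(n) * recursive_product(a, n - 1)


def power(base, n):
    """n-th power as a product of n copies of base (Definition 1.43)."""
    return recursive_product(lambda j: base, n)


if __name__ == "__main__":
    print(power(2, 10))
    print(power(5, 0))
    print(recursive_product(lambda j: j, 5))  # the factorial of 5
```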
Aside from integer powers of numbers, we want to work with rational powers. To
define rational powers, we need nth roots of nonnegative real numbers. To formally
prove their existence, we need the Binomial Theorem. As a start we need binomial
coefficients and one of their key properties.
Definition 1.44 For all $n \in \mathbb{N} \cup \{0\}$, we define $n! := \prod_{j=1}^{n} j$ and call it the factorial of $n$. For all $n, k \in \mathbb{N} \cup \{0\}$ with $k \le n$, we define the binomial coefficient as $\binom{n}{k} := \frac{n!}{k!(n-k)!}$.
Theorem 1.45 The equation $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$ holds for all $n, k \in \mathbb{N}$ with $k \le n$.
Proof. This result can be proved by direct computation.
\[ \binom{n}{k-1} + \binom{n}{k} = \frac{n!}{(k-1)!(n-(k-1))!} + \frac{n!}{k!(n-k)!} = \frac{n!\,k}{k!(n+1-k)!} + \frac{n!\,(n+1-k)}{k!(n+1-k)!} = \frac{n!\,(k+n+1-k)}{k!(n+1-k)!} = \frac{(n+1)!}{k!(n+1-k)!} = \binom{n+1}{k}. \]
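The identity of Theorem 1.45 can be checked for small values directly from Definition 1.44. The following sketch is an illustration with hypothetical helper names; it uses only `math.factorial` from the standard library and exact integer arithmetic.

```python
from math import factorial


def binomial(n, k):
    """Binomial coefficient n! / (k! (n-k)!) for 0 <= k <= n (Definition 1.44)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    # The quotient is always an integer, so integer division is exact.
    return factorial(n) // (factorial(k) * factorial(n - k))


def pascal_rule_holds(n, k):
    """Check C(n, k-1) + C(n, k) == C(n+1, k) for 1 <= k <= n."""
    return binomial(n, k - 1) + binomial(n, k) == binomial(n + 1, k)


if __name__ == "__main__":
    print(binomial(5, 2))
    print(all(pascal_rule_holds(n, k)
              for n in range(1, 30) for k in range(1, n + 1)))
```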
Now we are ready to prove the Binomial Theorem.
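Theorem 1.46 The Binomial Theorem. For all real numbers $a, b \in \mathbb{R}$, and all $n \in \mathbb{N}$, we have
\[ (a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}. \]
Proof. Throughout the proof we will freely use the properties of sums proved in Exercise 1-34. The proof is by induction on $n$, with $P(n)$ being the statement about $(a + b)^n$.
Base step. For $n = 1$, note that
\[ (a + b)^1 = a + b = \binom{1}{0} a^0 b^1 + \binom{1}{1} a^1 b^0 = \sum_{k=0}^{1} \binom{1}{k} a^k b^{1-k}, \]
which proves the base step.
Induction step. Assuming that the result holds for $n$, we must prove it for $n + 1$. First note that it follows easily from Definition 1.43 that for all $x \in \mathbb{R}$ and all $m \in \mathbb{N}$ we have $x \cdot x^m = x^{m+1}$.
\[ (a + b)^{n+1} = (a + b)(a + b)^n = (a + b) \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = \sum_{k=0}^{n} \binom{n}{k} a^{k+1} b^{n-k} + \sum_{k=0}^{n} \binom{n}{k} a^k b^{n+1-k} \]
\[ = \binom{n}{n} a^{n+1} b^{n+1-(n+1)} + \binom{n}{0} a^0 b^{n+1-0} + \sum_{j=1}^{n} \left[ \binom{n}{j-1} + \binom{n}{j} \right] a^j b^{n+1-j} \]
\[ = \binom{n+1}{n+1} a^{n+1} b^{n+1-(n+1)} + \binom{n+1}{0} a^0 b^{n+1-0} + \sum_{j=1}^{n} \binom{n+1}{j} a^j b^{n+1-j} = \sum_{j=0}^{n+1} \binom{n+1}{j} a^j b^{n+1-j}. \]
Here the first sum was reindexed with $j = k + 1$, the terms for $j = n + 1$ and $j = 0$ were split off, and Theorem 1.45 was applied to the bracketed sum of binomial coefficients.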
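As a sanity check on the statement of the Binomial Theorem, the right-hand side can be evaluated exactly and compared with $(a + b)^n$. The sketch below is an illustration under the assumption that rational inputs are enough to exercise the formula; `binomial_expansion` is a hypothetical name, while `math.comb` and `fractions.Fraction` are standard library tools.

```python
from fractions import Fraction
from math import comb


def binomial_expansion(a, b, n):
    """Evaluate sum_{k=0}^{n} C(n, k) a^k b^(n-k) exactly.

    (Hypothetical helper for illustration; works for ints and Fractions.)
    """
    return sum(comb(n, k) * a**k * b**(n - k) for k in range(n + 1))


if __name__ == "__main__":
    a, b = Fraction(1, 2), Fraction(-3, 4)
    for n in range(1, 8):
        # Both sides are exact rationals, so == is a genuine equality test.
        print(n, binomial_expansion(a, b, n) == (a + b) ** n)
```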
With the Binomial Theorem, we can prove that nth roots exist. The proof of Theorem 1.47 is the first proof in this text in which we have to choose a number to make
another number smaller than a given bound. That is, this is our first proof with a distinct
analytical flavor.
Theorem 1.47 Let $n \in \mathbb{N}$. For every nonnegative real number $a$, there exists a unique nonnegative real number $r$ such that $r^n = a$.
Proof. We first prove the existence of $r$. Let $R := \{ x \in \mathbb{R} : x \ge 0 \text{ and } x^n \le a \}$. Then $0 \in R$ and $R$ is bounded above by $\max\{1, a\}$. Let $r := \sup(R)$. To show that $r^n = a$, we will show that $r^n \not< a$ and $r^n \not> a$. First, suppose for a contradiction that $r^n < a$. Then there is a $\delta > 0$ so that $r^n + \delta < a$. By Theorem 1.32 (or Exercise 1-27), for each $k \in \{1, \ldots, n\}$ we can find an $m_k \in \mathbb{N}$ so that $\binom{n}{k} r^{n-k} \frac{1}{m_k} < \frac{\delta}{n}$. Let $m := \max\{m_1, \ldots, m_n\}$. Then by the Binomial Theorem we conclude
\[ \left( r + \frac{1}{m} \right)^n = r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \frac{1}{m^k} \le r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \frac{1}{m_k} < r^n + \sum_{k=1}^{n} \frac{\delta}{n} = r^n + \delta < a. \]
The above shows that $r + \frac{1}{m} \in R$, contradicting the fact that $r = \sup(R)$. Hence, $r^n \not< a$. The proof that $r^n \not> a$ is similar and left to the reader as Exercise 1-36.
For uniqueness, suppose for a contradiction that there is another $b \ge 0$ with $b^n = a$. Then $b < r$ or $b > r$. But if $b > r$, then with $\delta := b - r$ we obtain $a = b^n = (r + \delta)^n = r^n + \sum_{k=1}^{n} \binom{n}{k} r^{n-k} \delta^k > a$, a contradiction. Hence, $b < r$. But then with $\delta := r - b$ we have $a = r^n = (b + \delta)^n = b^n + \sum_{k=1}^{n} \binom{n}{k} b^{n-k} \delta^k > a$, a contradiction. Therefore $r$ is unique.
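The set $R$ from the proof also suggests a way to approximate the $n$th root numerically. The sketch below uses bisection, which is a technique swapped in for illustration and not the argument of the proof: it keeps a rational `lo` that belongs to $R$ and a rational `hi` that is an upper bound of $R$, and halves the gap until it is below a tolerance. The function name, the default tolerance, and the restriction to rational $a$ are all assumptions of this sketch.

```python
from fractions import Fraction


def nth_root_bounds(a, n, tolerance=Fraction(1, 10**6)):
    """Return rationals (lo, hi) with lo**n <= a <= hi**n and hi - lo <= tolerance.

    lo always lies in R = {x >= 0 : x**n <= a}; hi is always an upper
    bound of R, so sup(R) is trapped between them.
    (Hypothetical helper; bisection is used purely for illustration.)
    """
    a = Fraction(a)
    if a < 0 or n < 1:
        raise ValueError("need a >= 0 and n >= 1")
    lo, hi = Fraction(0), max(Fraction(1), a)  # max{1, a} bounds R above
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if mid**n <= a:
            lo = mid  # mid is an element of R
        else:
            hi = mid  # mid is an upper bound of R
    return lo, hi


if __name__ == "__main__":
    lo, hi = nth_root_bounds(2, 2)
    print(float(lo), float(hi))
```

Because all arithmetic is done with exact fractions, the two inequalities `lo**n <= a <= hi**n` hold exactly at every step rather than only up to rounding.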
We conclude by defining rational powers of nonnegative numbers and by proving
some of their properties.
Definition 1.48 Let $n \in \mathbb{N}$ and let $a \in \mathbb{R}$ be nonnegative. The unique nonnegative real number $r$ such that $r^n = a$ is called the $n$th root of $a$, denoted $\sqrt[n]{a}$. For $n = 2$ the root is called the square root, denoted $\sqrt{a}$.
Existence of $n$th roots is another property that distinguishes $\mathbb{R}$ from $\mathbb{Q}$. Although Theorem 1.36 indicates that there are “many” rational numbers, the rational number system has some shortcomings when it comes to powers.
Proposition 1.49 There is no rational number $r$ such that $r^2 = 2$.
Proof. We first prove by induction as stated in Exercise 1-32 (strong induction) that if $n^2 = 2z$ for some $z \in \mathbb{N}$, then $n = 2z'$ for some $z' \in \mathbb{N}$. The base step for $n = 1$ is vacuously true. That is, because the hypothesis $1^2 = 2z$ leads to the contradiction $1 = 1^2 = 2z = z + z > 1$, the hypothesis is never true, which means that the implication is automatically true (see Definition A.2 in Appendix A).
For the induction step, first note that the result is trivial for $n = 2$, because $2 = 2 \cdot 1$. Now assume that $n > 2$ and the statement has been proved for all natural numbers less than $n$. Then $2z = n^2 = (n - 2 + 2)^2 = (n-2)^2 + 4(n-2) + 4$ implies that $(n-2)^2 = 2\tilde{z}$ for some $\tilde{z} \in \mathbb{N}$. By induction hypothesis, we conclude that $n - 2 = 2\hat{z}$ for some $\hat{z} \in \mathbb{N}$, and hence $n = 2\hat{z} + 2 = 2z'$ for $z' = \hat{z} + 1 \in \mathbb{N}$. This proves that if $n^2 = 2z$ for some $z \in \mathbb{N}$, then $n = 2z'$ for some $z' \in \mathbb{N}$.
Now suppose for a contradiction that there are $n \in \mathbb{Z}$ and $d \in \mathbb{N}$ so that $\left( \frac{n}{d} \right)^2 = 2$ and such that there is no $k \in \mathbb{N} \setminus \{1\}$ such that $n = n_k \cdot k$ and $d = d_k \cdot k$. But by the above $n^2 = 2d^2$ implies $n = n_2 \cdot 2$. Consequently, $2d^2 = (n_2 \cdot 2)^2$, that is, $d^2 = n_2^2 \cdot 2$, which implies $d = d_2 \cdot 2$, a contradiction.
We conclude from Theorem 1.47 and Proposition 1.49 that $\sqrt{2}$ is irrational.
For odd natural numbers (that is, natural numbers of the form $n = 2k + 1$), it is possible to define the $n$th root of a negative number $a < 0$ as $\sqrt[n]{a} := -\sqrt[n]{-a}$. For the most part, powers are considered for nonnegative numbers, though.
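Definition 1.50 For all real numbers $a \ge 0$, all $m \in \mathbb{N} \cup \{0\}$, $n \in \mathbb{N}$ and all $q \in \mathbb{Q}$ with $q > 0$ we define
1. $a^{\frac{1}{n}} := \sqrt[n]{a}$. That is, the $\left( \frac{1}{n} \right)$th power of $a$ is the $n$th root of $a$.
2. $a^{\frac{m}{n}} := (a^m)^{\frac{1}{n}}$.
3. $a^{-q} := (a^q)^{-1} = \frac{1}{a^q}$ for $a \neq 0$.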
Theorem 1.51 For all positive numbers $a$ and $b$ and all rational numbers $x$ and $y$, the following power laws hold:
\[ a^x a^y = a^{x+y}, \qquad \frac{a^x}{a^y} = a^{x-y}, \qquad (ab)^x = a^x b^x, \qquad \left( \frac{a}{b} \right)^x = \frac{a^x}{b^x}, \qquad (a^x)^y = a^{xy}. \]
Proof. We first prove $(ab)^x = a^x b^x$. For exponents $n \in \mathbb{N}$, this is an easy induction. The base step $n = 1$ is trivial and the induction step from $n$ to $n + 1$ is $(ab)^{n+1} = ab(ab)^n = ab\,a^n b^n = a a^n b b^n = a^{n+1} b^{n+1}$.
For rational exponents $\frac{n}{d}$ with $n, d \in \mathbb{N}$, we have that $\left( (ab)^{\frac{n}{d}} \right)^d = (ab)^n$ and $\left( a^{\frac{n}{d}} b^{\frac{n}{d}} \right)^d = \left( a^{\frac{n}{d}} \right)^d \left( b^{\frac{n}{d}} \right)^d = a^n b^n = (ab)^n$. Note that in both equalities we used the definition of fractional powers, not the power law that we are currently proving. Because all numbers involved are positive and $d$th roots are unique, we conclude that $(ab)^{\frac{n}{d}} = a^{\frac{n}{d}} b^{\frac{n}{d}}$.
For $x = 0$, the equality $(ab)^x = a^x b^x$ is trivial. Finally, for all positive $x \in \mathbb{Q}$ we note $(ab)^{-x} a^x b^x = (ab)^{-x} (ab)^x = 1$. Therefore $(ab)^{-x}$ is the multiplicative inverse of $a^x b^x$, that is, $(ab)^{-x} = a^{-x} b^{-x}$. Thus $(ab)^x = a^x b^x$ for all $a, b > 0$ and all $x \in \mathbb{Q}$.
To prove that $a^{x+y} = a^x a^y$ we proceed similarly. For exponents $m, n \in \mathbb{N}$, the proof for arbitrary $m$ is an induction on $n$. The base step $a^m a^1 = a^m a = a^{m+1}$ follows straight from the definition of powers with natural exponents. For the induction step from $n$ to $n + 1$, note that $a^m a^{n+1} = a^m a^n a = a^{m+n} a = a^{m+(n+1)}$, which proves the result for exponents $m, n \in \mathbb{N}$.
For positive rational exponents $x$ and $y$, note that there are $m, n, d \in \mathbb{N}$ so that $x = \frac{m}{d}$ and $y = \frac{n}{d}$. Then, using the equality we already proved, we obtain
\[ a^x a^y = a^{\frac{m}{d}} a^{\frac{n}{d}} = (a^m)^{\frac{1}{d}} (a^n)^{\frac{1}{d}} = (a^m a^n)^{\frac{1}{d}} = (a^{m+n})^{\frac{1}{d}} = a^{\frac{m+n}{d}} = a^{x+y}. \]
The equality is trivial if one of $x$ and $y$ is zero.
In case both exponents are negative, note that for all positive $x, y \in \mathbb{Q}$ we have $a^{x+y} a^{-x} a^{-y} = a^x a^y a^{-x} a^{-y} = 1$, which means $a^{-x} a^{-y} = a^{-x+(-y)}$ as was to be proved. This leaves the case in which one exponent is positive and the other is negative. Let $x, y \in \mathbb{Q}$ be positive and consider $a^{x-y}$. If the inequality $|x| > |y|$ holds we have $a^{x-y} a^y a^{-x} = a^x a^{-x} = 1$, which means that $a^{x-y} = a^x a^{-y}$. If $|x| < |y|$ we have $a^{y-x} a^x a^{-y} = a^y a^{-y} = 1$, which means that $a^{y-x} = a^y a^{-x}$. If $|x| = |y|$ the claim is trivial. Thus $a^{x+y} = a^x a^y$ for all $a > 0$ and all $x, y \in \mathbb{Q}$.
We leave the remaining three equalities as Exercise 1-37.
Power laws for $a \le 0$ and $b \le 0$ (as applicable) can be proved similarly. To conclude, note that the results presented in this chapter guarantee that the real numbers have the properties we expect them to have. We will therefore use the usual notation (fractions, etc.) and laws of algebra throughout this text without further qualms about the need to justify that we are indeed allowed to do so.
Exercises
1-31. Let $k, m \in \mathbb{Z}$ and for each $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Define the sum $\sum_{j=k}^{m} a_j$ and the product $\prod_{j=k}^{m} a_j$.
1-32. Let $P(n)$ be a statement about the natural number $n$. Prove that if $P(1)$ is true and if for all $n \in \mathbb{N} \setminus \{1\}$ truth of $P(1), \ldots, P(n-1)$ implies truth of $P(n)$, then $P(n)$ holds for all natural numbers. This type of induction is sometimes called strong induction.
Hint. Consider $S := \{ n \in \mathbb{N} : (\forall k \le n : P(k) \text{ holds}) \}$.
1-33. Prove each of the following by induction.
(b) $\sum_{j=1}^{n} j^3 = \frac{1}{4} n^2 (n+1)^2$
(d) Bernoulli's inequality. Prove that for all real numbers $x > -1$, $x \neq 0$ and $n \ge 2$ we have that $(1 + x)^n > 1 + nx$.
1-34. Properties of sums and products. Let $c \in \mathbb{R}$ and for all $j \in \mathbb{N}$ let $a_j$ and $b_j$ be real numbers.
(a) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} (a_j + b_j) = \sum_{j=1}^{n} a_j + \sum_{j=1}^{n} b_j$.
(b) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} (c a_j) = c \sum_{j=1}^{n} a_j$.
(c) Prove that for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n} 1 = n$.
(d) Prove that for all $n \in \mathbb{N}$ we have $\prod_{j=1}^{n} (a_j \cdot b_j) = \left( \prod_{j=1}^{n} a_j \right) \cdot \left( \prod_{j=1}^{n} b_j \right)$.
1-35. Reindexing sums. Let $s \in \mathbb{Z}$, $n \in \mathbb{N}$ and for $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Prove that $\sum_{i=s}^{s+n} a_i = \sum_{k=1}^{n+1} a_{k+s-1}$.
1-36. Finish the proof of Theorem 1.47 by showing that $r^n \not> a$.
Hint. Suppose $r^n > a$ and prove that then for some $\varepsilon > 0$ and all $\delta \in (0, \varepsilon)$ we have $r - \delta \notin R$.
1-37. Finish the proof of Theorem 1.51. That is, let $a$ and $b$ be positive real numbers, let $x, y \in \mathbb{Q}$ and prove each of the following.
(c) $(a^x)^y = a^{xy}$
1-38. Let $0 \le a < b$ and let $q > 0$ be rational. Prove that $a^q < b^q$.
1-39. Let $a, x \in (0, \infty)$ and let $x$ be a rational number.
(a) Prove that if $a > 1$ and $x > 1$, then $a^x > a$.
Hint. Let $p, q \in \mathbb{N}$ be so that $x = \frac{p}{q}$ and compare $a^p$ and $a^q$.
(b) Prove that if $a < 1$ and $x < 1$, then $a^x > a$.
(c) Prove that if $a > 1$ and $x < 1$, then $a^x < a$.
(d) Prove that if $a < 1$ and $x > 1$, then $a^x < a$.
1-40. Let $n \in \mathbb{N}$. Prove that $\binom{n}{0} = 1$ and that $\binom{n}{n} = 1$.
1-41. Prove that there is no rational number $r$ such that $r^2 = 3$.
1-42. Prove that for any $n$ real numbers $x_1, \ldots, x_n$ the inequality
1-43. Prove that for all $a, b \ge 0$ the inequality $\sqrt{ab} \le \frac{a+b}{2}$ holds.
1-44.
(a) Prove that for all $a, b \in \mathbb{R}$ we have $\sqrt{a^2 + b^2} \le |a| + |b|$.
(b) Prove that for any $a_1, \ldots, a_n \in \mathbb{R}$ we have
1-45. Let $a, b \in \mathbb{R}$ with $a < b$. Prove that there is an irrational number $x \in \mathbb{R} \setminus \mathbb{Q}$ such that $a < x < b$.
Hint. Use that $\sqrt{2}$ is irrational and Exercise 1-27 and mimic the proof of Theorem 1.36.
Chapter 2
Sequences of Real Numbers
Convergence is the fundamental concept of analysis. It explores what happens when
two quantities get close to each other, or when a quantity grows beyond all bounds.
This chapter exhibits these ideas for sequences, with special emphasis on standard
proof techniques.
2.1 Limits
We start by defining sequences.
Definition 2.1 A sequence of real numbers is a function $f$ from the natural numbers to the real numbers. To emphasize their discrete nature, we denote sequences as $\{a_n\}_{n=1}^{\infty}$ with the understanding that $a_n = f(n)$ for all $n \in \mathbb{N}$.
Similar to sums and products, a sequence can actually start at any integer $k$ (Exercise 2-1). The limit of a sequence should be the place where the sequence “stabilizes” for large $n$. Definition 2.2 encodes this property by demanding that for every given tolerance $\varepsilon$, there is a threshold $N$ so that once the running index $n$ has gone past the threshold $N$, the sequence can only deviate from the limit by less than the tolerance $\varepsilon$.
Definition 2.2 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then $L \in \mathbb{R}$ is called limit of $\{a_n\}_{n=1}^{\infty}$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have that $|a_n - L| < \varepsilon$ (see Figure 2). A sequence that has a limit will be called convergent, a sequence that does not have a limit will be called divergent.
[Figure 2 (diagram): finitely many $n$ are mapped below $L - \varepsilon$, all $n$ in a “tail” of $\mathbb{N}$ are mapped to $(L - \varepsilon, L + \varepsilon)$, and finitely many $n$ are mapped above $L + \varepsilon$.]
Figure 2: Visualization of convergence to $L$. For every $\varepsilon > 0$ a “tail” of the sequence is in $(L - \varepsilon, L + \varepsilon)$.
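To see the roles of the tolerance $\varepsilon$ and the threshold $N$ in Definition 2.2, consider the sequence $a_n = \frac{1}{n}$ with candidate limit $L = 0$. This example and the helper name `threshold_for_reciprocal` are assumptions chosen for illustration, not taken from the text. The sketch computes one admissible $N$ for a given rational $\varepsilon$ and spot-checks the tail; the check over finitely many indices illustrates the definition and does not prove it.

```python
import math
from fractions import Fraction


def threshold_for_reciprocal(eps):
    """Return an N in N such that |1/n - 0| < eps for all n >= N.

    Any natural number N > 1/eps works, because n >= N then gives
    1/n <= 1/N < eps. (Hypothetical helper for illustration.)
    """
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("need eps > 0")
    return math.floor(1 / eps) + 1


if __name__ == "__main__":
    eps = Fraction(1, 100)
    N = threshold_for_reciprocal(eps)
    # Spot-check a finite part of the tail n >= N.
    print(N, all(abs(Fraction(1, n) - 0) < eps for n in range(N, N + 1000)))
```

Smaller tolerances force larger thresholds, which matches the picture of an ever longer initial segment of the sequence being allowed to lie outside $(L - \varepsilon, L + \varepsilon)$.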
Remark 2.3 It can be helpful to restate the definition using quantifiers (see Definition A.3 in Appendix A).
\[ L \in \mathbb{R} \text{ is a limit of } \{a_n\}_{n=1}^{\infty} \text{ iff } \forall \varepsilon > 0 : \exists N \in \mathbb{N} : \forall n \ge N : |a_n - L| < \varepsilon. \]
Formal statements with quantifiers enforce the rule that a variable must be defined or quantified before it can be used. We will usually enforce the same rule in natural language. Although the prose becomes a bit rigid this way, in nested quantifications like the definition of limits it is clearer to say “. . . for all $x$ we have $P(x)$ . . . ” than to say “. . . $P(x)$ holds for all $x$ . . . ”
Note that we did not speak of the limit of a sequence in Definition 2.2. This is because we have not proved yet that every convergent sequence has only one limit. The next result shows that this is indeed the case.
Proposition 2.4 Limits of sequences of real numbers are unique. That is, if $\{a_n\}_{n=1}^{\infty}$ is a sequence of real numbers and both $L$ and $M$ are limits of $\{a_n\}_{n=1}^{\infty}$, then $L = M$.
Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence and let $L$ and $M$ be limits of $\{a_n\}_{n=1}^{\infty}$. We need to prove that $L = M$. By Theorem 1.37 we can do so by showing that for all $\varepsilon > 0$ the inequality $|L - M| < \varepsilon$ holds.
Let $\varepsilon > 0$ be arbitrary but fixed. Then there is an $N_1 \in \mathbb{N}$ such that for all $n \ge N_1$ we have that $|a_n - L| < \frac{\varepsilon}{2}$. There also is an $N_2 \in \mathbb{N}$ such that for all $n \ge N_2$ we have $|a_n - M| < \frac{\varepsilon}{2}$. Let $N := \max\{N_1, N_2\}$. Because $N \ge N_1$, for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$. Because $N \ge N_2$, for all $n \ge N$ we have $|a_n - M| < \frac{\varepsilon}{2}$. Then by adding and subtracting $a_N$ and applying the triangular inequality we obtain (with $n = N$)
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Because for arbitrary $\varepsilon > 0$ we have the inequality $|L - M| < \varepsilon$, by Theorem 1.37 we conclude that $|L - M| = 0$, and hence $L = M$.
We will ultimately read and produce proofs that are much more complicated than the proof of Proposition 2.4. Therefore it is only appropriate to analyze how such proofs can be conceived. The standard proof techniques discussed later in this section reveal that certain details are indeed standard techniques which simply need to be internalized and used at the right time. Other than that, the novice usually is impressed by the sometimes “strange” choices for $\varepsilon$. The reason why $\frac{\varepsilon}{2}$ is chosen in the proof of Proposition 2.4 is that the proof is actually created backwards. Consider the following.
To show that $L = M$ we first note that because the $a_n$ are eventually close to $L$ and close to $M$, we can put an $a_n$ with a sufficiently large index between $L$ and $M$. After applying the triangular inequality
\[ |L - M| \le |L - a_n| + |a_n - M| \]
the resulting differences should be small. By Theorem 1.37, if for all $\varepsilon > 0$ we can make the difference $|L - M|$ less than or equal to $\varepsilon$ then $|L - M| = 0$, that is, $L = M$. So we want to make the sum of the differences $|L - a_n|$ and $|a_n - M|$ smaller than $\varepsilon$. It is most natural to make each of the two terms smaller than $\frac{\varepsilon}{2}$ to obtain
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
This argument provides the final few lines of the proof. Note that up to here we have not chosen any $N_1$, $N_2$, or $N$. However, now that we have the “meat” of the argument, it is easy to create the “header.”
To make $|a_n - L|$ and $|a_n - M|$ smaller than $\frac{\varepsilon}{2}$, we use that by the definition of convergence there are $N_1$ and $N_2$ as mentioned in the proof and choose $N$ to be their maximum so that both required inequalities hold for indices beyond $N$. Note that even though the “header” is the first thing we encounter, it is often the last thing that materializes as a proof is created. So, to set up an analysis proof it is standard practice to start by working with inequalities. Once the inequalities work, we create a “header” with the appropriate choices for $\varepsilon$, $n$, and so on.
Standard proof techniques in analysis. Certain steps occur so frequently in analysis proofs that they should become second nature. In this fashion, communication
becomes more effective because memory is less strained to recall details of proofs.
This is a cognitive technique commonly known as “chunking” of data. By internalizing certain standard “chunks,” larger amounts of data can be recalled, because we only
need to recall which chunks are involved rather than all details. Unlike the standard
proof techniques listed so far, from here on most standard proof techniques will be
specific to analysis. The standard techniques used in the proof of Proposition 2.4 are
listed below.
Standard Proof Technique 2.5 It is common practice to rearrange terms and to add and subtract the same term to obtain more manageable expressions. When working with absolute values, this is often done in conjunction with the triangular inequality. In Proposition 2.4 this is the step
\[ |L - M| = |L - a_N + a_N - M| \le |L - a_N| + |a_N - M|. \]
Such a step is usually abbreviated as $|L - M| \le |L - a_N| + |a_N - M|$. In other computations, we will see that it can also be useful to multiply and divide by the same nonzero term.
Standard Proof Technique 2.6 If finitely many numbers $N_1, \ldots, N_k \in \mathbb{N}$ are such that for all $n \ge N_i$ a certain inequality holds, we can choose $N := \max\{N_1, \ldots, N_k\}$. Then for all $n \ge N$ all these inequalities hold. We usually claim directly that such an $N$ exists, skipping the intermediate $N_i$.
Standard Proof Technique 2.7 To prove that two quantities are equal, we can prove that for any $\varepsilon > 0$ the absolute value of the difference is less than $\varepsilon$. This is usually done without explicit reference to Theorem 1.37. To prove an inequality $a \le b$ we often prove that for all $\varepsilon > 0$ the inequality $a \le b + \varepsilon$ holds (see Exercise 1-25).
28
2. Sequences of Real Numbers
Standard Proof Technique 2.8 In many analysis proofs we prove results about a universally
quantified variable, often denoted $\varepsilon$. To prove such results, we pick one such
$\varepsilon$ that is “arbitrary, but fixed,” throughout the proof. It must be fixed throughout the
proof so we can uniquely define quantities that depend on it, and it must be arbitrary so
that we really prove something about all variables in the scope of the universal
quantification. Once the result is proved, we can conclude that “Because $\varepsilon$ was arbitrary
we have proved . . . for all such $\varepsilon$.” This final statement reiterates that, even though we
made specific choices for the $\varepsilon$ in the proof, we can indeed make these choices for all
$\varepsilon$, which proves the universally quantified statement.
Because this approach is so common, the bracketing statements about the variable
$\varepsilon$ are usually left out or abbreviated. □
Standard Proof Technique 2.9 Finally, statements like “We need to prove L = M,”
that are put at the start to remind the reader what we will prove are often left out.
Similarly, statements put at the end to reiterate what we have proved are often left out,
too. □
To phase in these techniques, we will first carry them out explicitly and give a
reference to the above list. Then we will omit the explicit step, but still refer to the
appropriate entry in the above list. Eventually a proof for something like Proposition
2.4 will condense to the following.
“Expert Proof” of Proposition 2.4. Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ such that
for all $n \ge N$ the inequalities $|a_n - L| < \frac{\varepsilon}{2}$ and $|a_n - M| < \frac{\varepsilon}{2}$ hold. Therefore
$$|L - M| \le |L - a_N| + |a_N - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$
which implies $L = M$. ∎
The “expert proof” contains all relevant information needed to communicate the
idea. It does not list the details of standard techniques, which are assumed to be known.
In this fashion the reader may have to fill in more “steps between the lines,” but as long
as these steps can be considered manageable, communication becomes more effective.
Definition 2.10 Because the limit is unique, we speak of the limit of a sequence. The
notation $\lim_{n\to\infty} a_n = L$ will indicate that the limit of $\{a_n\}_{n=1}^{\infty}$ exists and is equal to the
number $L$.
The next result shows that the first finitely many values of a sequence do not affect
its convergence behavior.
Theorem 2.11 Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be sequences of real numbers and let $K \in \mathbb{N}$
be so that for all $n \ge K$ we have $a_n = b_n$. Then $\{a_n\}_{n=1}^{\infty}$ converges iff $\{b_n\}_{n=1}^{\infty}$
converges and in this case the equality $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n$ holds.
Proof. Left to Exercise 2-2. ∎
2.1. Limits
We conclude this section with an explicit proof that a limit is a certain number. This
exercise is good for learning how to choose more exotic values for $N$. Remember that
while the proof is presented in linear fashion, it is actually created by working with
the inequalities that we see at the end. In fact, $N$ was found by solving the inequality
$\frac{17}{4n+10} < \varepsilon$, which arises naturally from the computation at the end.
Example 2.12 Prove that $\lim_{n\to\infty} \frac{3n-1}{2n+5} = \frac{3}{2}$.
Let $\varepsilon > 0$ be arbitrary but fixed, and let $N \in \mathbb{N}$ be such that $N > \frac{\frac{17}{\varepsilon} - 10}{4}$. Then
for all $n \ge N$ the following holds.
$$\left| \frac{3n-1}{2n+5} - \frac{3}{2} \right| = \left| \frac{2(3n-1) - 3(2n+5)}{2(2n+5)} \right| = \frac{17}{4n+10} \le \frac{17}{4N+10} < \varepsilon.$$
Because $\varepsilon > 0$ was arbitrary, this proves that $\lim_{n\to\infty} \frac{3n-1}{2n+5} = \frac{3}{2}$. □
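The choice of $N$ in Example 2.12 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the helper names `a` and `choose_N` are introduced here and are not part of the text, and the check covers finitely many indices, whereas the proof covers all $n \ge N$.

```python
from fractions import Fraction
import math

def a(n):
    """n-th term of the sequence (3n - 1)/(2n + 5) as an exact fraction."""
    return Fraction(3 * n - 1, 2 * n + 5)

def choose_N(eps):
    """Smallest natural number N with N > (17/eps - 10)/4 (eps given as a Fraction)."""
    bound = (Fraction(17) / eps - 10) / 4
    return max(1, math.floor(bound) + 1)

eps = Fraction(1, 100)
N = choose_N(eps)
# Spot check of |a_n - 3/2| < eps on finitely many indices n >= N.
assert all(abs(a(n) - Fraction(3, 2)) < eps for n in range(N, N + 1000))
# For this eps the index just before N fails, so N is not wastefully large.
assert abs(a(N - 1) - Fraction(3, 2)) >= eps
print(N)  # 423
```

Exact rational arithmetic is used so that the strict inequalities are not blurred by floating-point rounding.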
Exercises
2-1. Let $k \in \mathbb{Z}$ and for each $n \in \mathbb{Z}$ with $n \ge k$ let $a_n \in \mathbb{R}$. Define the sequence $\{a_n\}_{n=k}^{\infty}$
and define what it means for $L \in \mathbb{R}$ to be its limit.
2-2. Prove Theorem 2.11.
2-3. Write out the argument that produced the choice for $N$ in Example 2.12.
2-4. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be sequences such that for all $n \in \mathbb{N}$ we have $|a_n - b_n| < \frac{1}{n}$. Prove that
if $\{a_n\}_{n=1}^{\infty}$ converges then so does $\{b_n\}_{n=1}^{\infty}$ and $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n$.
2-5. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Prove that $\lim_{n\to\infty} a_n = L$ iff $\lim_{n\to\infty} |a_n - L| = 0$.
2-6. Prove each of the following.
(d)
l i m m - & = O
n+m
2-7. The usefulness of quantifiers.
(a) State the negation of the statement “$L$ is the limit of the sequence $\{a_n\}_{n=1}^{\infty}$” by finding the
negation of Definition 2.2.
(b) State the negation of the statement “$L$ is the limit of the sequence $\{a_n\}_{n=1}^{\infty}$” by finding the
negation of the quantifier version of Definition 2.2 in Remark 2.3.
Hint. Appendix A.2.
2.2 Limit Laws
Example 2.12 shows how to prove that a limit is a certain number. However, to write
such a proof we must know the limit in advance. It would be nice to have a computational tool that provides the limit. It would be even nicer if the computation somehow
could justify (prove) that the result of the computation really is the limit. This can be
done with the limit laws discussed in this section. Because we are investigating the theory, it will be important to understand how the limit laws come into being. Applying
the limit laws is fairly simple (see Exercise 2-8).
To use the limit laws, we must know the limits of certain standard sequences.
Theorem 2.13 $\lim_{n\to\infty} \frac{1}{n} = 0$.
Proof. Exercise 2-9. ∎
The limit laws state that limits can be moved into the familiar algebraic operations.
The proof consists of several similar arguments. The argument for sums is spelled out
in great detail. The argument for differences is almost verbatim the same argument
as for sums and it is left to the reader in Exercise 2-10. The argument for products
is proved closer to the fashion of the “expert proof” of Proposition 2.4, but with remarks
interspersed. For the argument for quotients, we only present the final inequalities,
leaving the remaining parts to the reader in Exercise 2-11.
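Theorem 2.14 Limit laws for sequences. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be two convergent
sequences. Then the following hold.
1. The sum $\{a_n + b_n\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} (a_n + b_n) = \lim_{n\to\infty} a_n + \lim_{n\to\infty} b_n$.
2. The difference $\{a_n - b_n\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} (a_n - b_n) = \lim_{n\to\infty} a_n - \lim_{n\to\infty} b_n$.
3. The product $\{a_n b_n\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} a_n \cdot b_n = \lim_{n\to\infty} a_n \cdot \lim_{n\to\infty} b_n$.
4. If all $b_n \ne 0$ and $\lim_{n\to\infty} b_n \ne 0$, then the quotient $\left\{ \frac{a_n}{b_n} \right\}_{n=1}^{\infty}$ converges and
$\lim_{n\to\infty} \frac{a_n}{b_n} = \frac{\lim_{n\to\infty} a_n}{\lim_{n\to\infty} b_n}$.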
Proof. Throughout this proof let $L := \lim_{n\to\infty} a_n$ and let $M := \lim_{n\to\infty} b_n$.
To show for part 1 that the sum $\{a_n + b_n\}_{n=1}^{\infty}$ converges to the limit $L + M$, we must
show that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequality
$|(a_n + b_n) - (L + M)| < \varepsilon$.
Let $\varepsilon > 0$ be arbitrary but fixed (see Standard Proof Technique 2.8). Because
$\lim_{n\to\infty} a_n = L$ there is an $N_L \in \mathbb{N}$ so that for all $n \ge N_L$ we have $|a_n - L| < \frac{\varepsilon}{2}$.
Similarly, because $\lim_{n\to\infty} b_n = M$ there is an $N_M \in \mathbb{N}$ so that for all $n \ge N_M$ we have
$|b_n - M| < \frac{\varepsilon}{2}$. Let $N := \max\{N_L, N_M\}$. Then (compare with Standard Proof Technique 2.6)
for all $n \ge N$ the inequalities $|a_n - L| < \frac{\varepsilon}{2}$ and $|b_n - M| < \frac{\varepsilon}{2}$ hold. By
rearranging the terms and applying the triangular inequality (compare with Standard
Proof Technique 2.5) we obtain the following for all $n \ge N$.
$$|(a_n + b_n) - (L + M)| = |a_n - L + b_n - M| \le |a_n - L| + |b_n - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
We have proved that for arbitrary $\varepsilon > 0$ we can find an $N \in \mathbb{N}$ so that for all $n \ge N$
we have that $|(a_n + b_n) - (L + M)| < \varepsilon$ (see Standard Proof Technique 2.9). Therefore
by the definition of the limit, $\lim_{n\to\infty} (a_n + b_n) = L + M$.
Part 2 is Exercise 2-10.
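The choice $N := \max\{N_L, N_M\}$ from the proof of part 1 can be tried out on concrete sequences. The Python sketch below is an illustration only: the sequences $a_n = 1/n$ and $b_n = 2 + 3/n$ and the helper names `index_for_a`, `index_for_b`, and `index_for_sum` are assumptions made here, not objects from the text.

```python
from fractions import Fraction
import math

def a(n):
    return Fraction(1, n)          # converges to L = 0

def b(n):
    return 2 + Fraction(3, n)      # converges to M = 2

def index_for_a(eps):
    """An N_L with |a_n - 0| < eps for all n >= N_L (1/n < eps iff n > 1/eps)."""
    return math.floor(1 / eps) + 1

def index_for_b(eps):
    """An N_M with |b_n - 2| < eps for all n >= N_M (3/n < eps iff n > 3/eps)."""
    return math.floor(3 / eps) + 1

def index_for_sum(eps):
    """N := max{N_L, N_M}, each obtained for eps/2 as in the proof of part 1."""
    return max(index_for_a(eps / 2), index_for_b(eps / 2))

eps = Fraction(1, 50)
N = index_for_sum(eps)
# Spot check on finitely many indices; the proof covers all n >= N.
assert all(abs((a(n) + b(n)) - 2) < eps for n in range(N, N + 500))
```

The index produced this way is usually larger than necessary, which is harmless: the definition of the limit only asks for some $N$ that works.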
To show how an abbreviated proof still contains the standard statement of the definition
of the limit, the key parts of the definition are in the proof of part 3. We
also intersperse comments in italics on how certain choices arise.
For part 3, let $\varepsilon > 0$.
The choices for $\varepsilon$ look a bit strange. They are made so that the inequalities later on
work out and they can be motivated by reading the final estimates. Recall the discussion
after the proof of Proposition 2.4. The first typed part of this proof were the final
inequalities.
Because $\lim_{n\to\infty} a_n = L$ there is an $N_L \in \mathbb{N}$ so that for all $n \ge N_L$ we have that
$|a_n - L| < \frac{\varepsilon}{2(|M| + 1)}$. (We cannot divide by $|M|$ alone, because it could be zero.)
Similarly, because $\lim_{n\to\infty} b_n = M$ there is an $N_M \in \mathbb{N}$ so that for all $n \ge N_M$ we have
$|b_n - M| < \frac{\varepsilon}{2(|L| + 1)}$. Finally, because $\lim_{n\to\infty} a_n = L$ there is a $K_1 \in \mathbb{N}$ so that for all
$n \ge K_1$ we have $|a_n - L| < 1$. Consequently, for all $n \ge K_1$ we obtain by the reverse
triangular inequality $|a_n| - |L| \le \big| |a_n| - |L| \big| \le |a_n - L| < 1$ and thus $|a_n| < |L| + 1$.
Let $N := \max\{N_L, N_M, K_1\}$. Then (compare with Standard Proof Technique 2.6) for all $n \ge N$
the inequalities $|a_n| < |L| + 1$, $|a_n - L| < \frac{\varepsilon}{2(|M| + 1)}$, and $|b_n - M| < \frac{\varepsilon}{2(|L| + 1)}$ hold.
To “connect” the $a_n b_n$ and the $LM$, we add and subtract the term $a_n M$. This
allows us to apply the triangular inequality (compare with Standard Proof Technique
2.5) to split the original difference $|a_n b_n - LM|$ into two summands. Each of these
summands contains one factor that can be made small.
For all $n \ge N$, we obtain the following.
$$|a_n b_n - LM| = |a_n b_n - a_n M + a_n M - LM| \le |a_n b_n - a_n M| + |a_n M - LM| = |a_n||b_n - M| + |a_n - L||M|$$
$$< (|L| + 1)\frac{\varepsilon}{2(|L| + 1)} + \frac{\varepsilon}{2(|M| + 1)}|M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$
which proves part 3 (see Standard Proof Technique 2.9).
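For part 4, the final estimates, which are the hard part to figure out, are
$$\left| \frac{a_n}{b_n} - \frac{L}{M} \right| = \frac{|a_n M - L b_n|}{|b_n||M|} \le \frac{|a_n M - LM| + |LM - L b_n|}{|b_n||M|} = \frac{|a_n - L||M| + |L||M - b_n|}{|b_n||M|}$$
$$\le \frac{|a_n - L||M| + |L||M - b_n|}{\frac{|M|}{2}|M|} < \frac{\varepsilon |M|}{4} \cdot \frac{2}{|M|} + \frac{2|L|}{M^2} \cdot \frac{\varepsilon M^2}{2(2|L| + 1)} < \varepsilon.$$
To train the reader in generating the appropriate “header,” the complete proof of
part 4 (including final estimates) is left to Exercise 2-11. ∎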
Standard Proof Technique 2.15 In an analysis proof, it can be necessary to divide
by a nonnegative quantity $|a|$. Usually the quotient will be multiplied by that same
quantity later in an estimate and the goal is to cancel it (consider the proofs of parts
3 and 4 of Theorem 2.14). However, we cannot divide by zero. To avoid any undue
distractions here, when defining certain quotients, we will usually add 1 to nonnegative
quantities in denominators. In this fashion, the fact that $\frac{|a|}{|a| + 1} < 1$ still allows us to
“cancel” the term in an estimate. This is more effective than to separately consider the
case that a quantity is equal to zero. □
The limit laws allow us to efficiently establish convergence and compute the limits for many complicated looking sequences. For instance, finding the limit of the
sequence in Example 2.12 reduces to the following.
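Example 2.16 Find the limit of $\left\{ \frac{3n-1}{2n+5} \right\}_{n=1}^{\infty}$.
Using Theorem 2.13 and the limit laws we obtain the following:
$$\lim_{n\to\infty} \frac{3n-1}{2n+5} = \lim_{n\to\infty} \frac{3n-1}{2n+5} \cdot \frac{\frac{1}{n}}{\frac{1}{n}} = \lim_{n\to\infty} \frac{3 - \frac{1}{n}}{2 + 5 \cdot \frac{1}{n}} = \frac{3 - \lim_{n\to\infty} \frac{1}{n}}{2 + 5 \lim_{n\to\infty} \frac{1}{n}} = \frac{3}{2}.$$
Note that by the limit laws, the computation also proves that the limit is $\frac{3}{2}$. That is,
we will not need to perform the tedious argument given in Example 2.12.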
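As a sanity check on the algebra in Example 2.16, the rewritten quotient can be compared with the original one for a few indices. This Python sketch is an illustration only; the function names `original` and `rewritten` are introduced here, and a numerical check of finitely many terms does not replace the limit laws.

```python
from fractions import Fraction

def original(n):
    """(3n - 1)/(2n + 5) as an exact fraction."""
    return Fraction(3 * n - 1, 2 * n + 5)

def rewritten(n):
    """(3 - 1/n)/(2 + 5 * (1/n)), the form to which the limit laws are applied."""
    inv = Fraction(1, n)
    return (3 - inv) / (2 + 5 * inv)

for n in (1, 10, 100, 1000, 10000):
    # Multiplying numerator and denominator by 1/n does not change the value.
    assert original(n) == rewritten(n)
    print(n, float(original(n)))
```

The printed values approach 1.5, in line with the limit $\frac{3}{2}$ obtained from Theorem 2.13 and Theorem 2.14.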
For limits of powers, we will consider integer powers here, leaving rational powers
to Theorem 3.42 and real powers to Theorem 12.11.
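Theorem 2.17 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence. Then for all $k \in \mathbb{N}$ the sequence
$\{a_n^k\}_{n=1}^{\infty}$ converges and $\lim_{n\to\infty} a_n^k = \left( \lim_{n\to\infty} a_n \right)^k$. If none of the terms are zero and the
limit is not equal to zero, the result holds for all $k \in \mathbb{Z}$.
Proof. For $k \in \mathbb{N}$ the proof is an induction on $k$.
The base step for $k = 1$ is trivial.
For the induction step $k \to k + 1$, assume that for a given $k \ge 1$ the sequence
$\{a_n^k\}_{n=1}^{\infty}$ converges and that $\lim_{n\to\infty} a_n^k = \left( \lim_{n\to\infty} a_n \right)^k$. We must prove that $\{a_n^{k+1}\}_{n=1}^{\infty}$
converges with $\lim_{n\to\infty} a_n^{k+1} = \left( \lim_{n\to\infty} a_n \right)^{k+1}$. By part 3 of Theorem 2.14, we can argue
as follows.
$$\left( \lim_{n\to\infty} a_n \right)^{k+1} = \left( \lim_{n\to\infty} a_n \right) \left( \lim_{n\to\infty} a_n \right)^k = \left( \lim_{n\to\infty} a_n \right) \left( \lim_{n\to\infty} a_n^k \right) = \lim_{n\to\infty} a_n \cdot a_n^k = \lim_{n\to\infty} a_n^{k+1},$$
where each expression in the chain of equations is guaranteed to exist because the
expression preceding it exists. This completes the induction step and the proof of the
result for $k \in \mathbb{N}$.
Now for $k \in \mathbb{Z}$, let all terms and the limit be nonzero. For $k > 0$, we just proved
the result and for $k = 0$ the result is trivial. For $k < 0$, note that
$$\left( \lim_{n\to\infty} a_n \right)^k = \frac{1}{\left( \lim_{n\to\infty} a_n \right)^{-k}} = \frac{1}{\lim_{n\to\infty} a_n^{-k}} = \lim_{n\to\infty} \frac{1}{a_n^{-k}} = \lim_{n\to\infty} a_n^k,$$
where again each expression in the chain of equations is guaranteed to exist because
the expression preceding it exists. ∎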
Standard Proof Technique 2.18 Many statements about limits implicitly assert two
things. First, that the limit exists, and second, what the limit is. This means a proof
will need to verify both these claims. To simplify the language and to shorten proofs,
it is customary to dispense with an explicit existence proof for the limit. Instead, the
computation of the limit can be used to establish the existence of the limit as in the
proof of Theorem 2.17. If we start with an existing quantity and finish with the quantity
in question, then the existence claim in known limit laws establishes the existence of
all intermediate quantities from left to right. Exercise 2-23 shows that the order of
progression is important. The limit of a sum, a difference, a product, or a quotient can
exist even when the sequences of the individual terms diverge. □
Standard Proof Technique 2.19 An induction as in the first part of the proof of Theorem
2.17, which takes a result for an operation on two objects and turns it into a result
for an operation with $k \in \mathbb{N}$ objects, is often omitted by saying “a simple induction
argument shows ...” The proofs will all be similar. The base step is usually simple.
For the induction step, split off the last element, apply the induction hypothesis to the
first $k$ elements, then apply the original result and thus conclude the proof. □
Although it would feel natural to prove that Theorem 2.17 holds for real exponents,
real exponents must be postponed until we define real exponentiation in Definition
12.8. Our main focus is on the abstract underpinnings and fundamentals of analysis, not
on certain specifics. Hence, we postpone the introduction of transcendental functions
until we have all the machinery to introduce them very efficiently. Their introduction
will also make us more fully appreciate the power of the abstract tools we build. We
could at least prove Theorem 2.17 for rational exponents, but at this stage the proof
would be cumbersome. We will obtain the result easily in Theorem 3.42, once we have
developed some more tools. We will use this approach throughout. If a result can be
obtained easily later, then (unless an important technique needs to be introduced) we
will postpone the result rather than set up an unnecessarily complicated argument.
We conclude this section by investigating the relationship between limits and inequalities.
Theorem 2.20 Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be two convergent sequences of real numbers
so that for all $n \in \mathbb{N}$ the inequality $a_n \le b_n$ holds. Then $\lim_{n\to\infty} a_n \le \lim_{n\to\infty} b_n$.
Proof. Let $L := \lim_{n\to\infty} a_n$ and let $M := \lim_{n\to\infty} b_n$. We will prove (see Standard Proof
Technique 2.7) that for every $\varepsilon > 0$ the inequality $L - M < \varepsilon$ holds, which proves that
$L - M \le 0$, that is, $L \le M$.
Let $\varepsilon > 0$. Then (see Standard Proof Technique 2.6) there is an $N \in \mathbb{N}$ so that
for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$ and $|b_n - M| < \frac{\varepsilon}{2}$. But, using Standard Proof
Technique 2.5, this implies
$$L - M = L - a_n + b_n - M + a_n - b_n \le |L - a_n| + |b_n - M| + a_n - b_n < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} + 0 = \varepsilon.$$
We have proved that for all $\varepsilon > 0$ we have $L - M < \varepsilon$, which by Exercise 1-25
proves that $\lim_{n\to\infty} a_n = L \le M = \lim_{n\to\infty} b_n$. ∎
The next theorem is a refinement of the fact that limits preserve existing inequalities.
If a sequence $\{b_n\}_{n=1}^{\infty}$ is “trapped” between two convergent sequences with the
same limit, then $\{b_n\}_{n=1}^{\infty}$ must converge to that limit also. The novelty is that we need
not assume that $\{b_n\}_{n=1}^{\infty}$ converges.
Theorem 2.21 The Squeeze Theorem for sequences. Let $\{a_n\}_{n=1}^{\infty}$, $\{b_n\}_{n=1}^{\infty}$, $\{c_n\}_{n=1}^{\infty}$
be sequences of real numbers so that for all $n \in \mathbb{N}$ the inequalities $a_n \le b_n \le c_n$ hold
and so that $\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n$. Then $\{b_n\}_{n=1}^{\infty}$ converges and the limits are equal, that
is, $\lim_{n\to\infty} b_n = \lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n$.
Proof. Let $L := \lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n$. We need to prove that for all $\varepsilon > 0$ there is an
$N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $|b_n - L| < \varepsilon$ holds.
Let $\varepsilon > 0$. Because $\{a_n\}_{n=1}^{\infty}$ and $\{c_n\}_{n=1}^{\infty}$ both converge to $L$, there are $N_a \in \mathbb{N}$ and
$N_c \in \mathbb{N}$ so that for all $n \ge N_a$ we have $|a_n - L| < \varepsilon$ and so that for all $n \ge N_c$ we have
$|c_n - L| < \varepsilon$. Let $N := \max\{N_a, N_c\}$. Then for all $n \ge N$ we obtain $L - \varepsilon < a_n$ and
$c_n < L + \varepsilon$. Therefore, for all $n \ge N$ we conclude $L - \varepsilon < a_n \le b_n \le c_n < L + \varepsilon$,
which means that $|b_n - L| < \varepsilon$.
We have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have
the inequality $|b_n - L| < \varepsilon$. This means that $\{b_n\}_{n=1}^{\infty}$ converges to $L$. ∎
Standard Proof Technique 2.22 To prove that the limit of a sequence $\{b_n\}_{n=1}^{\infty}$ is zero,
it is common to prove that $|b_n|$ is bounded by some sequence, such as a multiple of $\frac{1}{n}$,
that goes to zero itself. The Squeeze Theorem then guarantees that the limit of $\{b_n\}_{n=1}^{\infty}$
also is zero.
In particular (compare with Exercise 2-5), we often prove that $\lim_{n\to\infty} a_n = L$ by
proving that $|a_n - L|$ is bounded by a sequence that goes to zero. □
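To make Standard Proof Technique 2.22 concrete, one can take a sequence whose absolute value is bounded by $\frac{1}{n}$. The sequence $b_n = \frac{\sin(n)}{n}$ in the Python sketch below is an example chosen here as an assumption, not one taken from the text; the sketch only spot-checks the trapping inequalities $-\frac{1}{n} \le b_n \le \frac{1}{n}$ on finitely many indices.

```python
import math

def b(n):
    """Example sequence sin(n)/n, trapped between -1/n and 1/n."""
    return math.sin(n) / n

def bound(n):
    """The bounding sequence 1/n, which goes to zero by Theorem 2.13."""
    return 1 / n

for n in range(1, 10001):
    # Since |sin(n)| <= 1, the term b(n) lies between -1/n and 1/n.
    assert -bound(n) <= b(n) <= bound(n)
```

Once the bound is established for all $n$, the Squeeze Theorem gives $\lim_{n\to\infty} b_n = 0$ without any further $\varepsilon$-$N$ argument.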
Exercises
2-8. Use the limit laws to compute the limit of the sequence.
2-9. Prove Theorem 2.13.
Hint. Exercise 1-27
2-10. Prove part 2 of Theorem 2.14.
2-11. Prove part 4 of Theorem 2.14. Explain all choices for $N$ and for upper or lower bounds.
Hint. There are three inequalities that must hold for all $n \ge N$.
2-12. Prove that if $\{a_n\}_{n=1}^{\infty}$ converges, then $\lim_{n\to\infty} |a_n| = \left| \lim_{n\to\infty} a_n \right|$.
2-13. Prove that if $a_n \ge 0$ for all $n \in \mathbb{N}$ and $\{a_n\}_{n=1}^{\infty}$ converges, then $\lim_{n\to\infty} \sqrt{a_n} = \sqrt{\lim_{n\to\infty} a_n}$.
2-14. Let $q > 0$. Prove that $\lim_{n\to\infty} \frac{q^n}{n!} = 0$.
2-15. Conjecture the value of $\lim_{n\to\infty} \frac{n!}{n^n}$ and then prove your conjecture.
2-16. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and let $\{p_n\}_{n=1}^{\infty}$ be a sequence of positive real numbers
so that $\lim_{n\to\infty} \frac{1}{\sum_{k=1}^{n} p_k} = 0$.
(a) Prove that if $\{a_n\}_{n=1}^{\infty}$ converges to $a \in \mathbb{R}$, then $\lim_{n\to\infty} \frac{\sum_{k=1}^{n} p_k a_k}{\sum_{k=1}^{n} p_k} = a$.
(b) Give an example to show that the convergence of $\left\{ \frac{\sum_{k=1}^{n} p_k a_k}{\sum_{k=1}^{n} p_k} \right\}_{n=1}^{\infty}$ need not imply the
convergence of $\{a_n\}_{n=1}^{\infty}$.
Figure 3: An injective function maps all elements of the domain to distinct images, but
some elements of the range may not have a preimage ( a ) . For a surjective function,
every element of the range has a preimage, but some elements of the domain may be
mapped to the same image ( b ) . A bijective function maps all elements of the domain
to distinct images and each element of the range has a preimage (c). This is why the
existence of a bijection between two sets indicates that the two sets are “of the same
size” (see Definitions 2.25 and 7.11).
2.3 Cauchy Sequences
A sequence so that for all E > 0 all elements with a sufficiently large index are within
E of each other should converge. Indeed, this condition guarantees that the elements
with large indices cluster ever more tightly. However, the number system may have
a hole just where these elements cluster. For example, the sequence 1.4, 1.41, 1.414,
1.4142, . . . of successively better decimal approximations of $\sqrt{2}$ does not converge in
Q because the value that the sequence approaches is not in Q (see Exercise 2-17).
This section shows that this problem does not arise in the real numbers. Sequences
for which elements with large indices cluster ever more tightly play an important role
in analysis. They are called Cauchy sequences.
Definition 2.23 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then $\{a_n\}_{n=1}^{\infty}$ is called a
Cauchy sequence iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have
that $|a_n - a_m| < \varepsilon$.
In the real numbers, convergence and being a Cauchy sequence are equivalent.
Before we can prove this result, we need to define finite and infinite sets.
Definition 2.24 Let $A, B$ be sets and let $f : A \to B$ be a function. Then $f$ is called
injective or one-to-one iff for all $x, y \in A$ the inequality $x \ne y$ implies $f(x) \ne f(y)$.
$f$ is called surjective or onto iff for all $b \in B$ there is an $a \in A$ with $f(a) = b$.
Finally, $f$ is called bijective iff $f$ is both injective and surjective.
Figure 3 gives a visualization of injective, surjective, and bijective functions and
some properties of injective and surjective functions are investigated in Exercises 2-18
and 2-19. Once we have bijective functions, we can define finite sets as “sets of size
$n$,” where $n \in \mathbb{N} \cup \{0\}$.
Definition 2.25 A set $F$ is called finite iff $F$ is empty or there is an $n \in \mathbb{N}$ and a
bijective function $f : \{1, \ldots, n\} \to F$. Sets that are not finite are called infinite. For
finite sets $F \ne \emptyset$ we set $|F| := n$ with $n$ as above and we set $|\emptyset| := 0$. For infinite sets
$I$ we set $|I| := \infty$.
Lemma 2.26 Let $F$ be a finite set and let $I$ be an infinite set. Then $I \setminus F$ is infinite.
Proof. In case $F$ is empty, there is nothing to prove. In case $F$ is not empty let
$n \in \mathbb{N}$ be so that there is a bijective function $f : \{1, \ldots, n\} \to F$. Suppose for a
contradiction that $I \setminus F$ is finite. Then there are a natural number $m \in \mathbb{N}$ and a bijective
function $g : \{1, \ldots, m\} \to I \setminus F$. Define the function $h : \{1, \ldots, m + n\} \to I$ by
$$h(j) := \begin{cases} f(j); & \text{if } j \le n, \\ g(j - n); & \text{if } j > n. \end{cases}$$
Then it is easy to show that $h$ is bijective (Exercise 2-20). But this means that $I$ is finite,
a contradiction. ∎
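The piecewise definition of $h$ in the proof of Lemma 2.26 can be mirrored on small finite sets. In the Python sketch below, the helper `glue` and the sample sets are hypothetical illustrations introduced here; the dictionaries stand in for the bijections $f$ and $g$, and the check shows that gluing them produces distinct values covering both pieces.

```python
def glue(f, n, g, m):
    """Return h on {1, ..., m + n} with h(j) = f(j) for j <= n and h(j) = g(j - n) for j > n."""
    def h(j):
        if not 1 <= j <= m + n:
            raise ValueError("j must lie in {1, ..., m + n}")
        return f(j) if j <= n else g(j - n)
    return h

f_table = {1: "a", 2: "b", 3: "c"}   # stands in for a bijection f : {1, 2, 3} -> F
g_table = {1: "x", 2: "y"}           # stands in for a bijection g : {1, 2} -> I \ F
h = glue(f_table.__getitem__, 3, g_table.__getitem__, 2)

values = [h(j) for j in range(1, 6)]
assert values == ["a", "b", "c", "x", "y"]
assert len(set(values)) == len(values)   # no value is hit twice
```

Injectivity of the glued function relies on the two pieces being disjoint, which is exactly why the proof works with $F$ and $I \setminus F$.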
Theorem 2.27 A sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers converges iff it is a Cauchy sequence.
Proof. For the direction “$\Rightarrow$,” let $L := \lim_{n\to\infty} a_n$. We need to prove that for all $\varepsilon > 0$
there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $|a_n - a_m| < \varepsilon$.
Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \frac{\varepsilon}{2}$.
Therefore for all $m, n \ge N$ we obtain
$$|a_n - a_m| \le |a_n - L| + |L - a_m| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
We have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we
have $|a_n - a_m| < \varepsilon$. Hence, $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence.
Once more we have used Standard Proof Technique 2.5. It is so common, and we
have used it often enough, that we will no longer explicitly refer back to it.
“$\Leftarrow$:” Let $\{a_n\}_{n=1}^{\infty}$ be a Cauchy sequence. We need to prove that there is an $L \in \mathbb{R}$
so that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$.
We first need to find a suitable number $L$. Because $\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence,
for $\varepsilon := 1$ there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_n - a_m| < 1$. In
particular, for all $n \ge N$ we obtain $|a_n - a_N| < 1$, and hence $a_n < a_N + 1$. Therefore the set
$\{n \in \mathbb{N} : a_n \le a_N + 1\}$ is infinite and thus $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\} \ne \emptyset$.
For all $n \ge N$ we also have that $a_n > a_N - 1$, so that for all $x \le a_N - 1$ the
set $\{n \in \mathbb{N} : a_n \le x\}$ is finite. Therefore $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ is
bounded below. This means $L := \inf \{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ exists by
Proposition 1.20. The idea how to obtain $L$ is visualized in Figure 4.
To prove that $L$ is the limit of $\{a_n\}_{n=1}^{\infty}$ we need to prove that for all $\varepsilon > 0$ there is
an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$.
Let $\varepsilon > 0$. By definition of $L$, the set $H_- := \left\{ n \in \mathbb{N} : a_n \le L - \frac{\varepsilon}{2} \right\}$ is finite
and the set $H_+ := \left\{ n \in \mathbb{N} : a_n \le L + \frac{\varepsilon}{2} \right\}$ is infinite. Therefore, by Lemma 2.26 the
relative complement $H_+ \setminus H_- = \left\{ n \in \mathbb{N} : a_n \in \left( L - \frac{\varepsilon}{2}, L + \frac{\varepsilon}{2} \right] \right\}$ is infinite. Because
$\{a_n\}_{n=1}^{\infty}$ is a Cauchy sequence, there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have
$|a_n - a_m| < \frac{\varepsilon}{2}$. Moreover, because $\left\{ n \in \mathbb{N} : a_n \in \left( L - \frac{\varepsilon}{2}, L + \frac{\varepsilon}{2} \right] \right\}$ is infinite, there
is a $k \ge N$ so that $|a_k - L| \le \frac{\varepsilon}{2}$. Therefore for all $n \ge N$ we obtain
$$|a_n - L| \le |a_n - a_k| + |a_k - L| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Because $\varepsilon$ was arbitrary we have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that
for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Hence, $\lim_{n\to\infty} a_n = L$. ∎
Figure 4: Visualization of the construction of $L$ in the proof of Theorem 2.27.
[Figure labels: cutoff point $L$; infinitely many $n$ mapped to any $(-\infty, x]$ with $x > L$; finitely many $n$ mapped to any $(-\infty, x]$ with $x < L$; finitely many $n$ mapped below $a_N - 1$; finitely many $n$ mapped above $a_N + 1$.]
Standard Proof Technique 2.28 Application of the Completeness Axiom to get the
infimum or supremum of the “right” set is a standard technique on the real line. We
will also see this approach in the proofs of Theorems 2.37, 2.41, 3.34, and 8.4. In
abstract spaces, this technique is not available and we usually substitute compactness
(see Definition 16.57), a property which, for closed and bounded intervals on the real
line, is a consequence of the Completeness Axiom (see Theorems 2.41 and 8.4). □
Convergence of Cauchy sequences is a fundamental analytical property called “completeness,” which is introduced in Section 16.2. Another way of formulating Theorem
2.27 is to say that the real numbers are complete. Although Axiom 1.19 is already
called the Completeness Axiom of the real numbers, this terminology makes sense,
because Exercise 2-25 shows that Theorem 2.27 and Axiom 1.19 are equivalent. This
means that either one of them could rightly be called the Completeness Axiom.
There are many other equivalent formulations of the Completeness Axiom. We will
encounter some more of them in Theorems 2.37, 2.41, and 8.4 as well as in Exercise
2-50c. Whenever we encounter one of these formulations, there will be an exercise
similar to Exercise 2-25 to show that the new result is equivalent to one of the equivalent
formulations of the Completeness Axiom that we already know.
Aside from the fundamental importance of completeness, Theorem 2.27 has immediate
value for showing if a sequence is divergent. By Definition 2.2 a sequence is
divergent iff (using negations as stated in Appendix A.2) for every real number $L$ there
is an $\varepsilon > 0$ such that for all $N \in \mathbb{N}$ there is an $n \ge N$ so that $|a_n - L| \ge \varepsilon$. This is
a four times nested quantification that would require us to show for every real number
that it is not the limit of the sequence. Theorem 2.27 reduces proofs of divergence to
showing the sequence is not a Cauchy sequence.
Example 2.29 The sequence $\{(-1)^n\}_{n=1}^{\infty}$ diverges.
To prove that the sequence diverges, we need to prove that it is not a Cauchy sequence.
This means (see Appendix A.2) we must find an $\varepsilon > 0$ so that for all $N \in \mathbb{N}$
there are $m, n \ge N$ so that $|a_n - a_m| \ge \varepsilon$.
But with $\varepsilon := 1$ for every $N \in \mathbb{N}$ we have that $\left| (-1)^N - (-1)^{N+1} \right| = 2 > 1$.
Hence, $\{(-1)^n\}_{n=1}^{\infty}$ is not a Cauchy sequence and therefore it diverges. □
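The negated Cauchy condition used in Example 2.29 asks for witnesses $m, n \ge N$ for every $N$. The Python sketch below searches for such witnesses in a finite window; the function `cauchy_witness` and its `search` parameter are assumptions introduced here for illustration. Finding witnesses for finitely many $N$ only illustrates the argument, and failing to find one in a finite window proves nothing by itself.

```python
from fractions import Fraction

def a(n):
    return (-1) ** n

def cauchy_witness(seq, eps, N, search=50):
    """Return (m, n) with N <= m < n < N + search and |seq(m) - seq(n)| >= eps, or None."""
    for m in range(N, N + search):
        for n in range(m + 1, N + search):
            if abs(seq(m) - seq(n)) >= eps:
                return m, n
    return None

# For eps = 1 the consecutive indices N and N + 1 already differ by 2.
assert all(cauchy_witness(a, 1, N) == (N, N + 1) for N in range(1, 100))

# For comparison: no witness shows up for 1/n once N >= 2 (finite search only).
assert cauchy_witness(lambda n: Fraction(1, n), 1, 2) is None
```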
Standard Proof Technique 2.30 In Example 2.29, we had to negate the statement
“for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that for all $m, n \ge N$ we have $|a_n - a_m| < \varepsilon$.”
When negating such a complicated statement, it is helpful to write the statement in
quantifiers and then negate it. In this fashion, the definition of a Cauchy sequence is
$$\forall \varepsilon > 0 : \exists N \in \mathbb{N} : \forall m, n \ge N : |a_n - a_m| < \varepsilon,$$
and the negation becomes (see Appendix A.2)
$$\exists \varepsilon > 0 : \forall N \in \mathbb{N} : \exists m, n \ge N : |a_n - a_m| \ge \varepsilon,$$
which is what was needed in Example 2.29. The schematic way in which quantified
statements can be negated is very helpful, especially the first few times one negates
a complicated statement. However, the quantifiers must not become a crutch. It is
advisable to first try the negation verbally and then double check with quantifiers. □
Standard Proof Technique 2.31 To prove that a sequence of real numbers converges,
we often simply prove that it is a Cauchy sequence. To prove that a sequence of real
numbers diverges, we prove that it is not a Cauchy sequence. □
Exercises
2-17. Prove that
number.
{
1
x.
is a Cauchy sequence of rational numbers whose limit is not a rational
n=l
2-18. Let $A, B$ be sets and let $f : A \to B$ be a function. Prove that $f$ is injective iff for all $b_1, b_2 \in A$ we
have that $f(b_1) = f(b_2)$ implies $b_1 = b_2$.
2-19. Compositions of injective and surjective functions. Let $A, B, C$ be sets and let $f : B \to C$ and
$g : A \to B$ be functions. The composition of $f$ and $g$ is defined by $f \circ g(a) := f(g(a))$ for all
$a \in A$.
(a) Prove that if $f$ and $g$ are injective, then so is $f \circ g$.
(b) Prove that if $f$ and $g$ are surjective, then so is $f \circ g$.
2-20. Prove that the function $h$ in the proof of Lemma 2.26 is bijective.
2-21. State the definition of a convergent sequence using quantifiers.
2-22. For each of the following sequences, prove that it converges or prove that it diverges.
2-23. Existence of the limit on the left side of a limit law as in Theorem 2.14 does not imply the existence
of the limits on the right side.
(a) Use $\{a_n\}_{n=1}^{\infty} := \{(-1)^n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty} := \{(-1)^{n+1}\}_{n=1}^{\infty}$ to show that $\lim_{n\to\infty} a_n + b_n$
can exist without either sequence being convergent.
(b) Show that $\lim_{n\to\infty} a_n - b_n$ can exist without either sequence being convergent.
(c) Show that $\lim_{n\to\infty} a_n \cdot b_n$ can exist without either sequence being convergent.
(d) Show that $\lim_{n\to\infty} \frac{a_n}{b_n}$ can exist without either sequence being convergent.
2-24. Can a Cauchy sequence have two limits? Explain your answer.
2-25. Use the fact that Cauchy sequences converge in the real numbers and the axioms for $\mathbb{R}$ except for
Axiom 1.19 to prove Axiom 1.19.
Hint. Let $S \subseteq \mathbb{R}$ be bounded above and not empty. Construct a Cauchy sequence $\{a_n\}_{n=1}^{\infty}$ so that for
all $x \in S$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n \ge x$ holds.
2.4 Bounded Sequences
If the elements of a sequence cannot become arbitrarily large, we speak of a bounded
sequence. Unlike being a Cauchy sequence, boundedness is not equivalent to convergence, but it still has some important consequences.
Definition 2.32 A sequence $\{a_n\}_{n=1}^{\infty}$ is called bounded above iff there is a number
$A \in \mathbb{R}$ such that for all $n \in \mathbb{N}$ the inequality $a_n \le A$ holds. In this case, $A$ is also
called an upper bound of the sequence. A sequence $\{a_n\}_{n=1}^{\infty}$ is called bounded below
iff there is a $B \in \mathbb{R}$ such that for all $n \in \mathbb{N}$ the inequality $a_n \ge B$ holds. In this case, $B$
is also called a lower bound of the sequence. Finally, a sequence is called bounded
iff it is bounded above and bounded below and we call it unbounded if not.
Example 2.33 The sequence
is bounded, while the sequence
$\{n\}_{n=1}^{\infty}$ is not bounded. □
Proposition 2.34 Any convergent sequence of real numbers is bounded.
Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence and let $L := \lim_{n\to\infty} a_n$. We need to
prove that there is a number $M \in \mathbb{R}$ so that for all $n \in \mathbb{N}$ the inequality $|a_n| \le M$
holds.
Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Let
$M := \max\{|L| + \varepsilon, |a_1|, \ldots, |a_{N-1}|\}$. Then for all $n < N$ we trivially have $|a_n| \le M$
and for $n \ge N$ we obtain $|a_n| \le |a_n - L| + |L| < \varepsilon + |L| \le M$.
We have proved that $\{a_n\}_{n=1}^{\infty}$ is bounded below by $-M$ and above by $M$. ∎
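The bound $M$ from the proof of Proposition 2.34 can be computed for a concrete sequence. The Python sketch below is an illustration under assumptions: the sequence $a_n = 1 + (-1)^n \frac{5}{n}$, the values $\varepsilon = 1$ and $N = 6$, and the helper name `bound_from_proof` are all chosen here and do not appear in the text.

```python
from fractions import Fraction

def a(n):
    return 1 + Fraction((-1) ** n * 5, n)   # converges to L = 1

def bound_from_proof(seq, L, eps, N):
    """M := max{|L| + eps, |a_1|, ..., |a_{N-1}|} as in the proof of Proposition 2.34."""
    return max([abs(L) + eps] + [abs(seq(n)) for n in range(1, N)])

# With eps = 1 we may take N = 6, because |a_n - 1| = 5/n < 1 for all n >= 6.
M = bound_from_proof(a, 1, 1, 6)
# Spot check of |a_n| <= M on finitely many indices.
assert all(abs(a(n)) <= M for n in range(1, 2001))
print(M)  # 4
```

Here the maximum is attained by the early term $a_1 = -4$, which shows why the finitely many terms before $N$ have to be included in $M$.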
In general, the converse of Proposition 2.34 is not true as the next example shows.
Example 2.35 The sequence $\{(-1)^n\}_{n=1}^{\infty}$ is bounded, but it does not converge (see
Example 2.29). □
For monotone sequences, however, boundedness does imply convergence.
Definition 2.36 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Then $\{a_n\}_{n=1}^{\infty}$ is called nondecreasing iff
for all $n \in \mathbb{N}$ we have $a_n \le a_{n+1}$. It is called nonincreasing iff for all $n \in \mathbb{N}$ we
have $a_n \ge a_{n+1}$. If $\{a_n\}_{n=1}^{\infty}$ is either nonincreasing or nondecreasing, then it is called
monotone. Moreover, $\{a_n\}_{n=1}^{\infty}$ is called (strictly) increasing iff for all $n \in \mathbb{N}$ we have
$a_n < a_{n+1}$ and it is called (strictly) decreasing iff for all $n \in \mathbb{N}$ we have $a_n > a_{n+1}$.
The sequence $\{n\}_{n=1}^{\infty}$ shows that nondecreasing sequences can grow beyond all
bounds. But if this is not the case, a monotone sequence converges. The key to the
proof is Standard Proof Technique 2.28.
Theorem 2.37 Monotone Sequence Theorem. If $\{a_n\}_{n=1}^{\infty}$ is bounded and monotone,
then $\{a_n\}_{n=1}^{\infty}$ converges. More precisely,
1. If $\{a_n\}_{n=1}^{\infty}$ is bounded above and nondecreasing, then it converges.
2. If $\{a_n\}_{n=1}^{\infty}$ is bounded below and nonincreasing, then it converges.
Proof. We only prove part 1. The proof of part 2 is similar (Exercise 2-30).
To prove part 1, let $\{a_n\}_{n=1}^{\infty}$ be bounded above and nondecreasing. Then the set
$\{a_n : n \in \mathbb{N}\}$ is bounded above, and hence by Axiom 1.19 it has a supremum $L$. To
prove that $L$ is the limit of the sequence, we must prove that for every $\varepsilon > 0$ there is
an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $|a_n - L| < \varepsilon$ holds.
Let $\varepsilon > 0$. By Proposition 1.21 there is an $N \in \mathbb{N}$ with $a_N > L - \varepsilon$. But then for
all $n \ge N$ we have $L - \varepsilon < a_N \le a_n \le L$, and hence $|a_n - L| < \varepsilon$.
We have proved that for every $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we
have $|a_n - L| < \varepsilon$. Hence, $\lim_{n\to\infty} a_n = L$. ∎
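The key step in the proof of part 1, finding one index $N$ with $a_N > L - \varepsilon$ and letting monotonicity handle all later indices, can be traced on an example. The Python sketch below assumes the sequence $a_n = \frac{n}{n+1}$ with supremum $L = 1$; the sequence and the helper name `first_index_above` are chosen here for illustration and are not from the text.

```python
from fractions import Fraction

def a(n):
    return Fraction(n, n + 1)      # nondecreasing, bounded above, supremum 1

def first_index_above(threshold, limit=10**6):
    """Smallest N <= limit with a(N) > threshold, found by linear search."""
    for N in range(1, limit + 1):
        if a(N) > threshold:
            return N
    raise ValueError("no index found below the search limit")

L = 1
eps = Fraction(1, 100)
N = first_index_above(L - eps)
# Monotonicity carries the estimate from the single index N to later indices.
assert all(L - eps < a(N) <= a(n) <= L for n in range(N, N + 500))
print(N)  # 100
```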
Although boundedness does not imply convergence, it forces the sequence to “cluster” in some places. To make this idea more precise, we need the notion of a subsequence.
Definition 2.38 Let $A, B, C$ be sets and let $f : B \to C$ and $g : A \to B$ be functions.
The composition of $f$ and $g$ is defined by $f \circ g(a) := f(g(a))$ for all $a \in A$.
Definition 2.39 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and let $\{n_k\}_{k=1}^{\infty}$ be a strictly
increasing sequence of natural numbers. Then $\{a_{n_k}\}_{k=1}^{\infty}$ is called a subsequence of
$\{a_n\}_{n=1}^{\infty}$. Formally, a subsequence is the composition of the function that maps $k$ to $n_k$
and the function that maps $n$ to $a_n$.
Convergence is what happens when the indices get large. To obtain a notion that is
useful to analyze convergence behavior, in the definition of a subsequence we had to
specifically demand that the $n_k$ are strictly increasing. If we had allowed $n_k = n_{k+1}$,
then by choosing $\{n_k\}_{k=1}^{\infty}$ to be a constant sequence, any sequence would have infinitely
many convergent “subsequences.” This would be counterintuitive, because a sequence
such as $\left\{ \frac{1}{n} \right\}_{n=1}^{\infty}$ would have a “subsequence” that converges to 1 even though the
sequence itself converges to 0.
With the definition as it is, subsequences behave sensibly when the sequence converges.
Proposition 2.40 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of real numbers with limit $L$. Then every subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ also converges to $L$.
Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of real numbers with limit $L$ and let $\{a_{n_k}\}_{k=1}^{\infty}$ be a subsequence. We must prove that for all $\varepsilon > 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|a_{n_k} - L| < \varepsilon$.
Let $\varepsilon > 0$. Because $n_k < n_{k+1}$ for all $k \in \mathbb{N}$, an easy induction shows that $n_k \ge k$ for all $k \in \mathbb{N}$ (Exercise 2-31). Because $\{a_n\}_{n=1}^{\infty}$ converges, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|a_n - L| < \varepsilon$. Therefore for all $k \ge N$ we obtain $n_k \ge k \ge N$, and hence $|a_{n_k} - L| < \varepsilon$.
We have proved that for all $\varepsilon > 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|a_{n_k} - L| < \varepsilon$. Hence, $\{a_{n_k}\}_{k=1}^{\infty}$ converges to $L$, too. ∎
The precise statement of the idea that a bounded sequence of real numbers must “cluster” somewhere is that a bounded sequence of real numbers must have a convergent subsequence. This is an important property of the real numbers, which is ultimately encoded in the notion of compactness (see Section 16.5). The proof of the Bolzano-Weierstrass Theorem utilizes Standard Proof Technique 2.28.
Theorem 2.41 Bolzano-Weierstrass Theorem. Any bounded sequence $\{a_n\}_{n=1}^{\infty}$ of real numbers has a convergent subsequence.
Proof. Let $\{a_n\}_{n=1}^{\infty}$ be a bounded sequence and let $b > 0$ be such that for all $n \in \mathbb{N}$ we have $|a_n| < b$. Then $b$ is a real number so that $\{n \in \mathbb{N} : a_n \le b\} = \mathbb{N}$ and $\{n \in \mathbb{N} : a_n \le -b\} = \emptyset$. Therefore $b$ is contained in the set of real numbers $\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$. Moreover, this set is bounded below by $-b$. Hence, the infimum $L := \inf\{x \in \mathbb{R} : \{n \in \mathbb{N} : a_n \le x\} \text{ is infinite}\}$ exists by Proposition 1.20.
We will prove that $L$ is the limit of a subsequence of $\{a_n\}_{n=1}^{\infty}$. To do this, we will employ the Standard Proof Technique 2.22 and find a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ so that $|a_{n_k} - L|$ is bounded by a sequence that goes to zero. For each $k \in \mathbb{N}$ the set $H_k^+ := \left\{n \in \mathbb{N} : a_n \le L + \frac{1}{k}\right\}$ is infinite and the set $H_k^- := \left\{n \in \mathbb{N} : a_n \le L - \frac{1}{k}\right\}$ is finite. Therefore the set $T_k := \left\{n \in \mathbb{N} : a_n \in \left(L - \frac{1}{k}, L + \frac{1}{k}\right]\right\} = H_k^+ \setminus H_k^-$ is infinite for each $k \in \mathbb{N}$ (see Lemma 2.26). Construct $\{n_k\}_{k=1}^{\infty}$ inductively as follows. Because $T_1$ is infinite, it is not empty and we let $n_1 \in T_1$. Once $n_k$ was chosen, let $n_{k+1}$ be any natural number in $T_{k+1}$ that is greater than $n_k$. Such a natural number exists, because $T_{k+1}$ is infinite. Then $\{a_{n_k}\}_{k=1}^{\infty}$ is a subsequence of $\{a_n\}_{n=1}^{\infty}$ and for all $k \in \mathbb{N}$ we have $|a_{n_k} - L| \le \frac{1}{k}$, because $n_k \in T_k$. By the Squeeze Theorem this means $\lim_{k \to \infty} |a_{n_k} - L| = 0$, and hence $\lim_{k \to \infty} a_{n_k} = L$. ∎
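The inductive choice of the indices $n_k$ can be sketched in code for one concrete sequence. Everything specific here is an assumption made for illustration: the sequence $a_n = (-1)^n (1 + 1/n)$ is a hypothetical example, the value $L = -1$ of the infimum was worked out by hand for it, and the helper names `in_T` and `subsequence_indices` are invented. The sketch picks, for each $k$, the smallest admissible index in $T_k$ beyond the previous one.

```python
from fractions import Fraction

def a(n):
    """Hypothetical bounded sequence a_n = (-1)^n (1 + 1/n)."""
    return (-1) ** n * (1 + Fraction(1, n))

# For this sequence, inf{x : a_n <= x for infinitely many n} is -1 (computed by hand).
L = Fraction(-1)

def in_T(n, k):
    """Membership of n in T_k = {n : L - 1/k < a_n <= L + 1/k}."""
    return L - Fraction(1, k) < a(n) <= L + Fraction(1, k)

def subsequence_indices(count):
    """Choose n_1 < n_2 < ... with n_k in T_k, following the inductive construction."""
    indices, n = [], 0
    for k in range(1, count + 1):
        n += 1                      # force n_k to exceed the previous index
        while not in_T(n, k):       # terminates because T_k is infinite
            n += 1
        indices.append(n)
    return indices

indices = subsequence_indices(5)
```

The proof allows any element of $T_{k+1}$ greater than $n_k$; taking the smallest one is merely a convenient deterministic choice.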
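[Figure 5: implication diagram with the nodes “bounded,” “convergent subsequence,” “bounded and monotone,” “convergent,” “monotone,” and “finite or infinite limit”; the examples shown near the arrows are $(1 + (-1)^n)n$, $(-1)^n$, and $n$.]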
Figure 5: Implications between the various notions related to convergence that are
introduced in this chapter. Implications are indicated with arrows and the examples
near the arrows indicate that the opposite implication does not hold.
The Bolzano-Weierstrass Theorem is a useful tool in single variable analysis. We
will see examples of its use in the proofs of Theorem 3.44 and Lemma 5.19.
For many sets of properties, it is instructive to explore which property implies
which other properties and which properties are equivalent. Figure 5 summarizes the
properties introduced in this chapter (including those from the next section) and how
they are related to each other.
Exercises
2-26. For each of the sequences below determine if it is bounded, and then prove your claim.
2-27. For the given sequence $\{a_n\}_{n=1}^{\infty}$, find an expression for the terms of the subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ where $n_k$ is as indicated.
2-28. Use Proposition 2.40 to prove that each of the sequences below diverges.
2-29. Explain why we could have chosen E = 1 in the proof of Proposition 2.34. Then explain why we
cannot choose E = 1 in a general convergence proof.
2-30. Prove part 2 of the Monotone Sequence Theorem.
2-31. Perform the induction mentioned in the proof of Proposition 2.40. First state exactly what it is that
you prove, then execute the proof.
2-32. Sketch a visualization for the construction of $L$ in the proof of the Bolzano-Weierstrass Theorem that is similar to Figure 4.
2-33. Let $x \in \mathbb{R}$ and let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers that does not converge to $x$. Prove that there is an $\varepsilon > 0$ and a subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ so that for all $k \in \mathbb{N}$ we have $|x_{n_k} - x| \ge \varepsilon$.
2-34. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers that has no convergent subsequence. Prove that for each $x \in \mathbb{R}$ there is an $\varepsilon_x > 0$ so that $\{n \in \mathbb{N} : |x_n - x| < \varepsilon_x\}$ is finite.
2-35. Let $L \in \mathbb{R}$ and let $\{x_n\}_{n=1}^{\infty}$ be a sequence of real numbers such that every subsequence has a subsequence that converges to $L$. Prove that $\{x_n\}_{n=1}^{\infty}$ converges.
2-36. A well-known convergent sequence.
(a) Let $a, b \in \mathbb{R}$ and $n \in \mathbb{N}$. Prove that $b^{n+1} - a^{n+1} = (b - a) \sum_{j=0}^{n} a^j b^{n-j}$.
(b) Let $a, b \in \mathbb{R}$ with $0 \le a < b$ and let $n \in \mathbb{N}$. Prove that $\frac{b^{n+1} - a^{n+1}}{b - a} < (n + 1) b^n$.
(c) Prove that $\left\{\left(1 + \frac{1}{n}\right)^n\right\}_{n=1}^{\infty}$ is increasing and that it converges.
Hint. To prove it is increasing, bring $a^{n+1}$ to the right in part 2-36b and use $a = 1 + \frac{1}{n+1}$ and $b = 1 + \frac{1}{n}$. To prove the sequence is bounded above, use $a = 1$ and $b = 1 + \frac{1}{2n}$.
2-37. Use the Monotone Sequence Theorem and the axioms for $\mathbb{R}$ except Axiom 1.19 to prove Axiom 1.19.
2-38. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except Axiom 1.19 to prove Axiom 1.19.
2-39. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove directly (that is, without using Theorem 2.27 or its proof) that every Cauchy sequence of real numbers converges.
2-40. Use the Bolzano-Weierstrass Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Monotone Sequence Theorem.
2-41. Let $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ be bounded sequences.
(a) Prove that $\{a_n + b_n\}_{n=1}^{\infty}$ is bounded.
Hint. Unlike for convergence, the bound need not be one number $M$. An upper bound of the form $M_a + M_b$ would work just fine.
(b) Prove that $\{a_n b_n\}_{n=1}^{\infty}$ is bounded.
(c) Prove that if there is a $\delta > 0$ so that $b_n > \delta$ for all $n \in \mathbb{N}$ then $\left\{\frac{a_n}{b_n}\right\}_{n=1}^{\infty}$ is bounded.
2-42. Let $x \in [0, 1]$. Prove that the sequence $\{a_n\}_{n=0}^{\infty}$ defined recursively by $a_0 := 0$ and the recurrence relation $a_{n+1} := a_n + \frac{1}{2}\left(x - a_n^2\right)$ converges to $\sqrt{x}$.
Hint. Prove by induction that the sequence is bounded above by $\sqrt{x}$. Then prove that it is nondecreasing.
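The behavior claimed in Exercise 2-42 can be observed numerically before attempting the proof. This is only an exploratory sketch: the choice $x = 1/4$, the number of steps, and the helper name `iterate` are assumptions made for illustration, and a finite computation does not replace the induction asked for in the hint.

```python
from fractions import Fraction

def iterate(x, steps):
    """Return a_0, ..., a_steps for a_0 = 0 and a_{n+1} = a_n + (x - a_n^2)/2."""
    terms = [Fraction(0)]
    for _ in range(steps):
        a = terms[-1]
        terms.append(a + (x - a * a) / 2)
    return terms

# Hypothetical test value x = 1/4, whose square root is exactly 1/2.
terms = iterate(Fraction(1, 4), 12)
gap = Fraction(1, 2) - terms[-1]   # remaining distance to sqrt(x)
```

Exact fractions are used so that the monotonicity and the upper bound can be checked without rounding effects.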
2.5 Infinite Limits
Although unbounded sequences diverge, they can display some types of regular behavior, which are explored in this section.
Definition 2.42 Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers. Then we say that the limit of $\{a_n\}_{n=1}^{\infty}$ is infinity iff for every $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n \ge M$ holds. In this case, we write $\lim_{n \to \infty} a_n = \infty$. A limit of negative infinity is defined similarly (Exercise 2-43) and denoted $\lim_{n \to \infty} a_n = -\infty$.
Intuitively an infinite limit should mean that eventually the sequence gets close to
infinity. Similar to Definition 2.2, this idea is encoded in Definition 2.42 by saying that
for any given bound M , there is a threshold N so that once the running index n goes
past the threshold N ,the sequence will not drop below the bound M any more. It is
also said that a sequence with $\lim_{n \to \infty} a_n = \infty$ grows beyond all bounds.
Example 2.43 Let $x > 1$. Then $\lim_{n \to \infty} x^n = \infty$.
Clearly, for all $n \in \mathbb{N}$ the inequality $x^{n+1} > x^n$ holds. Suppose for a contradiction that $\{x^n\}_{n=1}^{\infty}$ does not go to infinity. Then (refer to Standard Proof Technique 2.30 as necessary) there is a $B > 0$ so that for all $N \in \mathbb{N}$ there is an $m \ge N$ with $x^m \le B$. Let $n \in \mathbb{N}$. By the above, there is an $m \ge n$ so that $x^n \le x^m \le B$. Hence, the sequence $\{x^n\}_{n=1}^{\infty}$ is bounded above. Let $M := \sup\{x^n : n \in \mathbb{N}\}$. Now by Proposition 1.21, because $\frac{M}{x} < M$, there is an $n \in \mathbb{N}$ so that $x^n > \frac{M}{x}$. But then $x^{n+1} = x \cdot x^n > x \frac{M}{x} = M$, a contradiction. Thus $\lim_{n \to \infty} x^n = \infty$.
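Definition 2.42 asks for a threshold $N$ for every bound $M$. For the sequence in Example 2.43 such a threshold can be found by direct search, as the following sketch shows. The values $x = 3/2$ and $M = 1000$ and the helper name `threshold` are assumptions chosen for illustration.

```python
from fractions import Fraction

def threshold(x, M):
    """Smallest N with x**N >= M, assuming x > 1.

    Because the powers of x increase, x**n >= M then holds for all n >= N.
    """
    n, power = 1, x
    while power < M:
        n += 1
        power *= x
    return n

N = threshold(Fraction(3, 2), 1000)
```

The loop terminates precisely because the powers are unbounded, which is the content of the example.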
With some exceptions, discussed at the end of this section, the limit laws for infinite
limits are similar to those for convergent sequences.
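Theorem 2.44 Limit laws involving $\infty$. Let $\{a_n\}_{n=1}^{\infty}$ be such that $\lim_{n \to \infty} a_n = \infty$.
1. If $\{b_n\}_{n=1}^{\infty}$ is a bounded sequence, then $\lim_{n \to \infty} a_n + b_n = \infty$ and if all $a_n$ are nonzero, then $\lim_{n \to \infty} \frac{b_n}{a_n} = 0$.
2. If $\lim_{n \to \infty} b_n = \infty$, then $\lim_{n \to \infty} a_n + b_n = \infty$ and $\lim_{n \to \infty} a_n b_n = \infty$.
3. If $c > 0$ is a real number, then $\lim_{n \to \infty} c a_n = \infty$.
4. If $c < 0$ is a real number, then $\lim_{n \to \infty} c a_n = -\infty$.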
Proof. To prove part 1 let $\lim_{n \to \infty} a_n = \infty$ and let $\{b_n\}_{n=1}^{\infty}$ be bounded. Let $B \in \mathbb{R}$ be such that for all $n \in \mathbb{N}$ the inequality $|b_n| < B$ holds.
First consider the sum. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n + b_n \ge M$.
Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge M + B$. But then for all $n \ge N$ we obtain $a_n + b_n > M + B - B = M$, and hence $\lim_{n \to \infty} a_n + b_n = \infty$.
Now consider the quotient. We need to prove that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $\left|\frac{b_n}{a_n}\right| < \varepsilon$ holds.
Let $\varepsilon > 0$. By Theorem 1.32 there is an $M \in \mathbb{N}$ so that $\frac{|B|}{\varepsilon} < M$, which means $\frac{|B|}{M} < \varepsilon$. Now there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge M$. Thus for all $n \ge N$ we obtain $\left|\frac{b_n}{a_n}\right| \le \frac{|B|}{M} < \varepsilon$, and hence $\lim_{n \to \infty} \frac{b_n}{a_n} = 0$.
To prove part 2 let $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n = \infty$.
For the sum, we need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n + b_n \ge M$ holds.
Let $M \in \mathbb{R}$ and note that there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequalities $a_n \ge \frac{M}{2}$ and $b_n \ge \frac{M}{2}$ (see Standard Proof Technique 2.6). But then for all $n \ge N$ we have $a_n + b_n \ge \frac{M}{2} + \frac{M}{2} = M$. Hence, $\lim_{n \to \infty} a_n + b_n = \infty$.
For the product, we need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $a_n b_n \ge M$ holds.
Let $M \in \mathbb{R}$ and note that there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have the inequalities $a_n \ge \sqrt{|M|}$ and $b_n \ge \sqrt{|M|}$. But then for all $n \ge N$ the product exceeds $M$ because $a_n b_n \ge \sqrt{|M|}\sqrt{|M|} = |M| \ge M$. Hence, $\lim_{n \to \infty} a_n b_n = \infty$.
To prove part 3, let $\lim_{n \to \infty} a_n = \infty$ and let $c > 0$. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $c a_n \ge M$ holds.
Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge \frac{M}{c}$. But then for all $n \ge N$ we obtain $c a_n \ge c \frac{M}{c} = M$. Hence, $\lim_{n \to \infty} c a_n = \infty$.
Finally, to prove part 4 let $\lim_{n \to \infty} a_n = \infty$ and let $c < 0$. We need to prove that for all $M \in \mathbb{R}$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequality $c a_n \le M$ holds.
Let $M \in \mathbb{R}$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $a_n \ge -\frac{M}{|c|}$. But then for all $n \ge N$ we obtain $c a_n \le c \left(-\frac{M}{|c|}\right) = M$. Hence, $\lim_{n \to \infty} c a_n = -\infty$. ∎
Infinite limits can also help indirectly to establish the existence of finite limits.
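Example 2.45 Let $x > 0$. Then $\lim_{n \to \infty} x^{\frac{1}{n}} = 1$.
The result is trivial for $x = 1$. We first consider $x > 1$. Suppose for a contradiction that $\lim_{n \to \infty} x^{\frac{1}{n}} \ne 1$. For every $n \in \mathbb{N}$ we have $1 < x^{\frac{1}{n+1}} = \left(x^{\frac{1}{n}}\right)^{\frac{n}{n+1}} < x^{\frac{1}{n}}$. Thus if $\left\{x^{\frac{1}{n}}\right\}_{n=1}^{\infty}$ does not converge to 1, then (refer to Standard Proof Technique 2.30 as necessary) because $x^{\frac{1}{n}} > 1$ for all $n \in \mathbb{N}$, there is an $\varepsilon > 0$ so that for every $N \in \mathbb{N}$ there is an $n \ge N$ with $x^{\frac{1}{n}} > 1 + \varepsilon$. Because $\left\{x^{\frac{1}{n}}\right\}_{n=1}^{\infty}$ is decreasing this would mean that $x^{\frac{1}{n}} > 1 + \varepsilon$ for all $n \in \mathbb{N}$. But then for all $n \in \mathbb{N}$ we would have $(1 + \varepsilon)^n < \left(x^{\frac{1}{n}}\right)^n = x$, contradicting the fact that $\lim_{n \to \infty} (1 + \varepsilon)^n = \infty$ (see Example 2.43). Thus for all $x > 1$ we have $\lim_{n \to \infty} x^{\frac{1}{n}} = 1$.
The proof for $0 < x < 1$ is deferred to Exercise 2-46.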
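The two facts used in Example 2.45 for $x > 1$, namely that the roots $x^{\frac{1}{n}}$ decrease and stay above 1, can be observed numerically. The sketch below assumes the hypothetical value $x = 5$ and uses floating-point arithmetic, so it only approximates the exact statement.

```python
# Hypothetical choice x = 5 (any x > 1 would show the same pattern).
x = 5.0

# n-th roots x**(1/n) for n = 1, ..., 2000, computed in floating point.
roots = [x ** (1.0 / n) for n in range(1, 2001)]

decreasing = all(u > v for u, v in zip(roots, roots[1:]))
above_one = all(r > 1.0 for r in roots)
distance_to_one = roots[-1] - 1.0
```

The last quantity is small but positive, in line with a decreasing sequence that approaches 1 from above.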
Just as for infinite limits, there are limit laws for limits that equal negative infinity.
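Theorem 2.46 Limit laws involving $-\infty$. Let $\{a_n\}_{n=1}^{\infty}$ be such that $\lim_{n \to \infty} a_n = -\infty$.
1. If $\{b_n\}_{n=1}^{\infty}$ is a bounded sequence, then $\lim_{n \to \infty} a_n + b_n = -\infty$ and if all $a_n$ are nonzero, then $\lim_{n \to \infty} \frac{b_n}{a_n} = 0$.
2. If $\lim_{n \to \infty} b_n = -\infty$, then $\lim_{n \to \infty} a_n + b_n = -\infty$ and $\lim_{n \to \infty} a_n b_n = \infty$.
3. If $\lim_{n \to \infty} b_n = \infty$, then $\lim_{n \to \infty} a_n b_n = -\infty$.
4. If $c > 0$ is a real number, then $\lim_{n \to \infty} c a_n = -\infty$.
5. If $c < 0$ is a real number, then $\lim_{n \to \infty} c a_n = \infty$.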
Proof. The proof of Theorem 2.46 is similar to that of Theorem 2.44. It is thus left
to the reader as Exercise 2-44.
The addition of two sequences such that one has limit $\infty$ and the other has limit $-\infty$ is absent from Theorems 2.44 and 2.46. This is because by Exercise 2-48b the sum need not converge. Exercise 2-48c shows that even if there is a limit, it is not the same number in all cases. The situation is similar for the product of a sequence with infinite limit and a sequence with limit zero (see Exercises 2-48d and 2-48e), as well as for the quotient of two sequences with infinite limits (see Exercises 2-48f and 2-48g).
These types of limits are called indeterminate forms and they are discussed in more
detail in Section 12.3.
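A small computation makes the indeterminacy of “$\infty - \infty$” concrete. In the sketch below, the particular sequences, the constant `c = 7`, and the helper name `sums` are assumptions chosen for illustration; in both cases the first sequence has limit $\infty$ and the second has limit $-\infty$, yet the sums behave differently.

```python
def sums(a, b, count=5):
    """First `count` terms of the sequence a_n + b_n, for n = 1, ..., count."""
    return [a(n) + b(n) for n in range(1, count + 1)]

c = 7  # hypothetical target value

# a_n = n + c and b_n = -n: the sum is constantly c.
constant = sums(lambda n: n + c, lambda n: -n)

# a_n = n and b_n = -n^2: the sum n - n^2 decreases without bound.
unbounded = sums(lambda n: n, lambda n: -n * n)
```

Since the outcome depends on the particular sequences and not only on their limits, no general limit law for this sum can be stated.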
Exercises
2-43. State the definition of a sequence whose limit is negative infinity.
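2-44. Prove Theorem 2.46. That is, prove each of the following.
(a) If $\lim_{n \to \infty} a_n = -\infty$ and $\{b_n\}_{n=1}^{\infty}$ is a bounded sequence, then $\lim_{n \to \infty} a_n + b_n = -\infty$ and if all $a_n$ are nonzero, then $\lim_{n \to \infty} \frac{b_n}{a_n} = 0$.
(b) If $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n = -\infty$, then $\lim_{n \to \infty} a_n + b_n = -\infty$ and $\lim_{n \to \infty} a_n b_n = \infty$.
(c) If $\lim_{n \to \infty} a_n = -\infty$ and $\lim_{n \to \infty} b_n = \infty$, then $\lim_{n \to \infty} a_n b_n = -\infty$.
(d) If $c > 0$ is a real number and $\lim_{n \to \infty} a_n = -\infty$, then $\lim_{n \to \infty} c a_n = -\infty$.
(e) If $c < 0$ is a real number and $\lim_{n \to \infty} a_n = -\infty$, then $\lim_{n \to \infty} c a_n = \infty$.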
2-45. Let $0 < x < 1$.
(a) Prove that $\lim_{n \to \infty} x^n = 0$ by using the limit laws.
(b) Prove that $\lim_{n \to \infty} x^n = 0$ by mimicking the proof in Example 2.43.
2-46. Let $0 < x < 1$. Prove that $\lim_{n \to \infty} x^{\frac{1}{n}} = 1$.
2-47. Prove that $\left\{n^{\frac{1}{n}}\right\}_{n=3}^{\infty}$ is decreasing and that the limit is 1.
Hint. Use Exercise 2-36c to show that $\left(1 + \frac{1}{n}\right)^n \le n$ for all large enough $n$ and derive the inequality $(n + 1)^{\frac{1}{n+1}} \le n^{\frac{1}{n}}$ from this. For the limit $L$, use the result from Example 2.45 to show that $L = L^2$.
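2-48. A first encounter with indeterminate forms.
(a) For $\{a_n\}_{n=1}^{\infty} := \{n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty} := \{-n^2\}_{n=1}^{\infty}$, prove that $\lim_{n \to \infty} a_n = \infty$, $\lim_{n \to \infty} b_n = -\infty$ and that the sequence $\{a_n + b_n\}_{n=1}^{\infty}$ diverges.
(b) For $\{a_n\}_{n=1}^{\infty} := \{n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty} := \{-n\}_{n=1}^{\infty}$, prove that $\lim_{n \to \infty} a_n = \infty$, $\lim_{n \to \infty} b_n = -\infty$ and that the sequence $\{a_n + b_n\}_{n=1}^{\infty}$ converges.
(c) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n \to \infty} a_n = \infty$, $\lim_{n \to \infty} b_n = -\infty$ and $\lim_{n \to \infty} a_n + b_n = c$.
(d) Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n \to \infty} a_n = \infty$, $\lim_{n \to \infty} b_n = 0$ and $\{a_n b_n\}_{n=1}^{\infty}$ diverges.
(e) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n \to \infty} a_n = \infty$, $\lim_{n \to \infty} b_n = 0$ and $\lim_{n \to \infty} a_n b_n = c$.
(f) Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n = \infty$ and $\left\{\frac{a_n}{b_n}\right\}_{n=1}^{\infty}$ diverges.
(g) Let $c \in \mathbb{R}$. Find sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ so that $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n = \infty$ and $\lim_{n \to \infty} \frac{a_n}{b_n} = c$.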
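2-49. Prove that if $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ are sequences so that $\lim_{n \to \infty} a_n = \infty$ and there are an $\varepsilon > 0$ and an $N \in \mathbb{N}$ so that $b_n > \varepsilon$ for all $n \ge N$, then $\lim_{n \to \infty} a_n b_n = \infty$.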
2-50. A characterization of divergent sequences.
(a) Let $\{a_n\}_{n=1}^{\infty}$ be an unbounded sequence. Prove that there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ so that $\lim_{k \to \infty} a_{n_k} = \infty$ or $\lim_{k \to \infty} a_{n_k} = -\infty$.
(b) Let $\{a_n\}_{n=1}^{\infty}$ be a bounded divergent sequence. Prove that there are two convergent subsequences $\{a_{l_m}\}_{m=1}^{\infty}$ and $\{a_{n_k}\}_{k=1}^{\infty}$ such that $\lim_{m \to \infty} a_{l_m}$ and $\lim_{k \to \infty} a_{n_k}$ exist, but are not equal.
(c) Let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Prove that $\{a_n\}_{n=1}^{\infty}$ diverges if and only if there is a subsequence $\{a_{n_k}\}_{k=1}^{\infty}$ such that $\lim_{k \to \infty} a_{n_k} = \infty$ or $\lim_{k \to \infty} a_{n_k} = -\infty$, or there are two subsequences $\{a_{l_m}\}_{m=1}^{\infty}$ and $\{a_{n_k}\}_{k=1}^{\infty}$ such that $\lim_{m \to \infty} a_{l_m}$ and $\lim_{k \to \infty} a_{n_k}$ exist, but are not equal.
(d) Use the characterization of divergent sequences in part 2-50c and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Bolzano-Weierstrass Theorem.
2-51. Prove Cauchy's Limit Theorem. That is, let $\{b_n\}_{n=1}^{\infty}$ be a strictly increasing sequence of positive numbers that goes to infinity and let $\{a_n\}_{n=1}^{\infty}$ be a sequence. Prove that if the sequence $\left\{\frac{a_n - a_{n-1}}{b_n - b_{n-1}}\right\}_{n=2}^{\infty}$ converges to $c$, then $\lim_{n \to \infty} \frac{a_n}{b_n} = c$.
Hint. Exercise 2-16 with $p_n := b_n - b_{n-1}$ and another appropriate sequence.
Chapter 3
Continuous Functions
Functions are the central objects of analysis. This chapter defines limits and continuity
for functions of a real variable and it presents some consequences of continuity. To
avoid problems with complicated domains (see Exercise 3-32 and the end of Section
16.3.1 for some details), in this chapter functions are usually considered on intervals
or on intervals from which at most finitely many points were removed. These domains
are sufficient to build the traditional calculus of functions of one variable. More complicated domains are handled in metric spaces in the second part of the text.
3.1 Limits of Functions
The limit of a function at a point x is supposed to express what happens near x, but
not necessarily at x. This is similar to the running index n of a sequence never actually
becoming $\infty$. While $\infty$ is not in the domain of a sequence, a real number x can be in
the domain of a function. Hence, we must explicitly remove x from consideration. In
this section, functions are defined on an open interval from which a point x has been
removed. In this fashion, we assure that each function is defined “close to the left of
x” and “close to the right of x,” which is what we need to investigate for (two-sided)
limits. Because convergence of sequences is already defined, we can use sequences to
define convergence of functions.
Definition 3.1 Sequence formulation of the limit of a function. Let $I \subseteq \mathbb{R}$ be an open interval and let $x \in I$. The number $L \in \mathbb{R}$ is called the limit of the function $f : I \setminus \{x\} \to \mathbb{R}$ at $x$ iff for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} f(z_n) = L$. In this case, we denote $\lim_{z \to x} f(z) := L$ and we also say that $f$ converges (to $L$) at $x$.
Similar to Theorem 2.11, the limit of a function at x is only affected by the values
of the function near x .
Theorem 3.2 Let $I \subseteq \mathbb{R}$ be an open interval, $x \in I$, and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions. If there is a number $\delta > 0$ so that $f$ and $g$ are equal on the subset $\{z \in I \setminus \{x\} : |z - x| < \delta\}$ of $I \setminus \{x\}$, then $f$ converges at $x$ iff $g$ converges at $x$ and in this case the equality $\lim_{z \to x} f(z) = \lim_{z \to x} g(z)$ holds.
Proof. Exercise 3-2.
By Theorem 3.2, Definition 3.1 also defines the limit at x for any function that is
defined on a set $D$ that contains a set $I \setminus \{x\}$, where $I$ is an open interval that contains
x. Formally, we define the following.
Definition 3.3 Let $D$, $R$, $S$ be sets with $R \subseteq D$ and let $f : D \to S$ be a function. The restriction of the function $f$ to $R$, denoted $f|_R$, is defined by $f|_R(x) := f(x)$ for all $x \in R$.
Definition 3.4 Let $f : D \to \mathbb{R}$ be a function and let $x \in \mathbb{R}$ be so that there is an open interval $I$ with $x \in I$ and $I \setminus \{x\} \subseteq D$. We define the limit of $f$ at $x$ as the limit of the restriction $f|_{I \setminus \{x\}}$ at $x$ and denote it $\lim_{z \to x} f(z) := \lim_{z \to x} f|_{I \setminus \{x\}}(z)$.
All the following results on functions f : I \ {x} -+ R also apply to functions with
larger domains. Strictly speaking we would need to apply all definitions and results to
the restriction of the function to a set $I \setminus \{x\}$, where $I$ is an open interval and $x \in I$. We
will usually avoid this simple formality. Ultimately, Definition 16.33 will encompass
this situation as well as some situations in which, for single variable functions, we use
one-sided limits (see Definition 3.15 below).
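Example 3.5
1. For all $x \in \mathbb{R}$, we have $\lim_{z \to x} z = x$.
2. For all $x \in \mathbb{R}$, we have $\lim_{z \to x} |z| = |x|$.
3. The function $f(x) := \begin{cases} 1; & \text{for } x > 0, \\ 0; & \text{for } x = 0, \\ -1; & \text{for } x < 0, \end{cases}$ does not have a limit at $x = 0$.
Part 1 is trivial and part 2 follows from Exercise 2-12. To see that the function in part 3 does not have a limit at $x = 0$, it suffices to produce two sequences $\{y_n\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ that converge to zero so that $y_n \ne 0$ and $z_n \ne 0$ for all $n \in \mathbb{N}$ and $\{f(y_n)\}_{n=1}^{\infty}$ and $\{f(z_n)\}_{n=1}^{\infty}$ have different limits. With $y_n := \frac{1}{n}$ and $z_n := -\frac{1}{n}$ for all $n \in \mathbb{N}$ we obtain $\lim_{n \to \infty} f\left(\frac{1}{n}\right) = 1 \ne -1 = \lim_{n \to \infty} f\left(-\frac{1}{n}\right)$, which completes the argument.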
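The two-sequence argument for part 3 can be traced in a few lines. This is a sketch only: the name `sign` for the function of part 3 is an assumption, and evaluating finitely many terms illustrates the argument without replacing it.

```python
from fractions import Fraction

def sign(x):
    """The function from part 3: 1 for x > 0, 0 for x = 0, -1 for x < 0."""
    return (x > 0) - (x < 0)

# Values of f along y_n = 1/n (approaching 0 from the right).
right_values = [sign(Fraction(1, n)) for n in range(1, 11)]

# Values of f along z_n = -1/n (approaching 0 from the left).
left_values = [sign(Fraction(-1, n)) for n in range(1, 11)]
```

Both sequences of arguments converge to 0, but the function values are constantly 1 along the first and constantly $-1$ along the second, so no single number can serve as the limit in Definition 3.1.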
Standard Proof Technique 3.6 If a sequence that converges to a number $x$ from the right is needed, $\left\{x + \frac{1}{n}\right\}_{n=1}^{\infty}$ is usually a good choice. For a sequence that converges to $x$ from the left, $\left\{x - \frac{1}{n}\right\}_{n=1}^{\infty}$ is usually a good choice. This idea is extended in Standard Proof Technique 3.8.
There are at least two ways to define the limit of a function. We have already seen
the formulation with sequences and we give the formulation with $\varepsilon$ and $\delta$ below. The
two formulations are equivalent, and hence either one could serve as the definition.
With both formulations available, we can choose which one to use. Depending on the
situation, one formulation may be preferable over the other to produce a simpler statement or proof. The proof of Theorem 3.19 is a good example of how each formulation
is better suited for certain settings than the other.
Theorem 3.7 $\varepsilon$-$\delta$ formulation of the limit of a function. Let $I \subseteq \mathbb{R}$ be an open interval and let $x \in I$. Then $L \in \mathbb{R}$ is the limit of the function $f : I \setminus \{x\} \to \mathbb{R}$ at $x$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have that $|f(z) - L| < \varepsilon$.
Proof. For “$\Rightarrow$,” let $\lim_{z \to x} f(z) = L$. Suppose, for a contradiction, the statement on the right is false. Then there is an $\varepsilon > 0$ so that for each $\delta > 0$ there is a $z \in I \setminus \{x\}$ with $|z - x| < \delta$ and $|f(z) - L| \ge \varepsilon$ (if necessary, use Standard Proof Technique 2.30 for the negation). Then for $\delta := \frac{1}{n}$ there is a $z_n \in I \setminus \{x\}$ with $|z_n - x| < \frac{1}{n}$ and $|f(z_n) - L| \ge \varepsilon$. But then $\lim_{n \to \infty} z_n = x$, while $\lim_{n \to \infty} f(z_n)$ either does not exist, or if it exists, then $\lim_{n \to \infty} f(z_n) \ne L$ (see Exercise 2-5). Either way we have arrived at a contradiction.
For “$\Leftarrow$,” let $f : I \setminus \{x\} \to \mathbb{R}$ be such that for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have that $|f(z) - L| < \varepsilon$. We need to prove that for each sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have that $\lim_{n \to \infty} f(z_n) = L$.
Let $\{z_n\}_{n=1}^{\infty}$ be a sequence with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$, and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|f(z) - L| < \varepsilon$. Moreover, for $\delta$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|z_n - x| < \delta$. But then for all $n \ge N$ we infer $|f(z_n) - L| < \varepsilon$, and hence $\lim_{n \to \infty} f(z_n) = L$. This proves that $\lim_{z \to x} f(z) = L$. ∎
Standard Proof Technique 3.8 In the “$\Rightarrow$” part of the proof of Theorem 3.7, for all $\delta > 0$ there is a $z$ with $|z - x| < \delta$ and other properties. To obtain a sequence $\{z_n\}_{n=1}^{\infty}$ that converges to $x$ so that each $z_n$ has the other desired properties, it is standard practice to use $\delta := \frac{1}{n}$ and then pick an appropriate element $z_n$ with $|z_n - x| < \frac{1}{n}$.
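The $\varepsilon$-$\delta$ condition of Theorem 3.7 can be spot-checked for a simple function. The sketch below is an illustration under assumptions: the function $f(z) = 3z + 1$, the point $x = 2$ with $L = 7$, and the helper names `delta_for` and `check` are hypothetical. For this $f$ we have $|f(z) - L| = 3|z - x|$, so $\delta := \varepsilon / 3$ is a valid choice.

```python
from fractions import Fraction

def f(z):
    """Hypothetical example function f(z) = 3z + 1."""
    return 3 * z + 1

x, L = Fraction(2), Fraction(7)

def delta_for(eps):
    """For this f, |f(z) - L| = 3|z - x|, so delta = eps/3 suffices."""
    return eps / 3

def check(eps, samples=50):
    """Test |f(z) - L| < eps at sample points z != x with |z - x| < delta."""
    delta = delta_for(eps)
    offsets = [delta * Fraction(j, samples + 1) for j in range(1, samples + 1)]
    points = [x + h for h in offsets] + [x - h for h in offsets]
    return all(abs(f(z) - L) < eps for z in points)

results = [check(Fraction(1, 10 ** k)) for k in range(1, 6)]
```

Sampling finitely many points cannot prove the statement for all $z$; the algebraic identity in the docstring of `delta_for` is what actually justifies the choice of $\delta$.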
Exercises
3-1. Explain why after Definition 3.1 it is not necessary to prove that the limit of a function is unique.
3-2. Prove Theorem 3.2.
3-3. Let $m, b \in \mathbb{R}$. Prove that $\lim_{z \to x} mz + b = mx + b$.
3-4. Let $I$ be an open interval and let $x \in I$. Prove that if $f : I \setminus \{x\} \to \mathbb{R}$ does not converge to $L$ at $x$, then there are an $\varepsilon > 0$ and a sequence $\{z_n\}_{n=1}^{\infty}$ so that $\lim_{n \to \infty} z_n = x$, $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $|f(z_n) - L| \ge \varepsilon$ for all $n \in \mathbb{N}$.
3-5. Prove that if $m \in \mathbb{Z}$ and $\lfloor \cdot \rfloor$ is the floor function, then $\lim_{z \to m} \lfloor z \rfloor$ does not exist.
3-6. Prove that $\lim_{z \to 3} \frac{z - 3}{z^2 - 9} = \frac{1}{6}$.
3-7. Prove that the Dirichlet function $f(x) = \begin{cases} 1; & \text{for } x \in \mathbb{Q}, \\ 0; & \text{for } x \notin \mathbb{Q}, \end{cases}$ does not converge at any $x \in \mathbb{R}$.
3-8. Explain why, with the present definition, $\lim_{z \to 0} \sqrt{z}$ is not defined. Then state what the limit should be and how we could circumvent our purely formal problem.
Note. This problem will be resolved in Exercise 16-28.
3.2 Limit Laws
Just as for limits of sequences, we are interested in how limits of functions relate to the
algebraic operations, because this should simplify the computation of limits. First note
that all algebraic operations on functions are defined pointwise.
Definition 3.9 Let $D \subseteq \mathbb{R}$ be a set and let $f, g : D \to \mathbb{R}$ be functions. For all $x \in D$ we define $(f + g)(x) := f(x) + g(x)$, $(f - g)(x) := f(x) - g(x)$, and $(f \cdot g)(x) := f(x) \cdot g(x)$. For all $x \in D$ with $g(x) \ne 0$ we define $\left(\frac{f}{g}\right)(x) := \frac{f(x)}{g(x)}$.
Theorem 3.10 Limit laws for functions. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z \to x} f(z)$ and $\lim_{z \to x} g(z)$ exist. Then the following hold.
1. $\lim_{z \to x} (f + g)(z) = \lim_{z \to x} f(z) + \lim_{z \to x} g(z)$.
2. $\lim_{z \to x} (f - g)(z) = \lim_{z \to x} f(z) - \lim_{z \to x} g(z)$.
3. $\lim_{z \to x} (f \cdot g)(z) = \lim_{z \to x} f(z) \cdot \lim_{z \to x} g(z)$.
4. $\lim_{z \to x} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z \to x} f(z)}{\lim_{z \to x} g(z)}$ if $\lim_{z \to x} g(z) \ne 0$.
Each equation implicitly asserts that the limit on the left side exists (see box on p. 33). Moreover, formally, in part 4 we would need to demand also that $g(z) \ne 0$ for all $z \in I \setminus \{x\}$. But $\lim_{z \to x} g(z) \ne 0$ implies that $g(z) \ne 0$ for all $z \in I \setminus \{x\}$ that are near $x$. Hence, if $g$ has zeros, we implicitly assume that $g$ has been restricted appropriately rather than worry about zeros that do not affect the convergence behavior.
Proof. Throughout this proof let $L_f := \lim_{z \to x} f(z)$ and let $L_g := \lim_{z \to x} g(z)$.
In this proof we will use Definition 3.1 as well as Theorem 3.7. Although it will turn out that Definition 3.1 in conjunction with the limit laws for sequences is more effective for this proof, it will also be instructive to see how $\varepsilon$-$\delta$ proofs are constructed. The reader will compare the two approaches by proving each part with the respective other approach in Exercises 3-9 and 3-10.
To prove part 1, we use Theorem 3.7. We need to prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|(f + g)(z) - (L_f + L_g)| < \varepsilon$.
Let $\varepsilon > 0$. Then there are $\delta_f > 0$ and $\delta_g > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_f$ we have $|f(z) - L_f| < \frac{\varepsilon}{2}$ and for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_g$ we have $|g(z) - L_g| < \frac{\varepsilon}{2}$. Let $\delta := \min\{\delta_f, \delta_g\}$ (compare with Standard Proof Technique 2.6). Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we obtain via the triangular inequality that
$$|(f + g)(z) - (L_f + L_g)| \le |f(z) - L_f| + |g(z) - L_g| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
This means we have proved that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|(f + g)(z) - (L_f + L_g)| < \varepsilon$. Consequently, $\lim_{z \to x} (f + g)(z) = L_f + L_g$.
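To prove part 2 using Definition 3.1, we need to prove that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} (f - g)(z_n) = L_f - L_g$.
Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $I \setminus \{x\}$ with $\lim_{n \to \infty} z_n = x$. By Theorem 2.14 we infer
$$\lim_{n \to \infty} (f - g)(z_n) = \lim_{n \to \infty} f(z_n) - \lim_{n \to \infty} g(z_n) = L_f - L_g.$$
Because $\{z_n\}_{n=1}^{\infty}$ was arbitrary this implies $\lim_{z \to x} (f - g)(z) = L_f - L_g$.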
The proofs of parts 1 and 2 show that to prove the limit laws, Definition 3.1 is more effective than Theorem 3.7. Nonetheless, both ways are actually equally complex overall. If we compare the proof of part 1 with the proof of part 1 of Theorem 2.14 we see striking similarities in the arguments. This means that the complexity of a proof using Definition 3.1 is simply delegated to the proof of an earlier result (Theorem 2.14). The reader will have the chance to analyze the similarities and the differences in the proofs for part 3 in Exercise 3-9. The similarity between the proofs here and the proofs for Theorem 2.14 can be used to translate the proof for part 4 into a complete proof of part 4 of Theorem 2.14. The rather complicated choices in the header were of course made after the final inequalities had been analyzed carefully.
The proof of part 3 is left to the reader as Exercise 3-9.
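For part 4, we use Theorem 3.7. So we need to prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $\left|\frac{f}{g}(z) - \frac{L_f}{L_g}\right| < \varepsilon$.
Let $\varepsilon > 0$. Then there is a $\delta_f > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_f$ we have $|f(z) - L_f| < \frac{\varepsilon |L_g|}{4}$. Moreover, there is a $\delta_g > 0$ such that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta_g$ we have $|g(z) - L_g| < \frac{\varepsilon |L_g|^2}{2(2|L_f| + 1)}$. Finally, there is a $\nu > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \nu$ we have $|g(z)| > \frac{|L_g|}{2}$. Let $\delta := \min\{\delta_f, \delta_g, \nu\}$. Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we obtain
$$\left|\frac{f}{g}(z) - \frac{L_f}{L_g}\right| = \frac{|f(z) L_g - L_f g(z)|}{|g(z)| |L_g|} \le \frac{|f(z) - L_f| |L_g| + |L_f| |L_g - g(z)|}{|g(z)| |L_g|} < \frac{2}{|L_g|} \cdot \frac{\varepsilon |L_g|}{4} + \frac{2 |L_f|}{|L_g|^2} \cdot \frac{\varepsilon |L_g|^2}{2(2|L_f| + 1)} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
We have proved that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $\left|\frac{f}{g}(z) - \frac{L_f}{L_g}\right| < \varepsilon$. Therefore the limit of the quotient exists and $\lim_{z \to x} \left(\frac{f}{g}\right)(z) = \frac{L_f}{L_g}$. ∎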
Just like limits of sequences, limits of functions preserve inequalities and there also
is a Squeeze Theorem.
Definition 3.11 Let $D \subseteq \mathbb{R}$ be a set and let $f, g : D \to \mathbb{R}$ be functions. We say $f \le g$, that is, $f$ is pointwise less than or equal to $g$, iff for all $x \in D$ the inequality $f(x) \le g(x)$ holds.
Theorem 3.12 Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions. If $f \le g$ on $I \setminus \{x\}$ and $f$ and $g$ converge at $x$, then $\lim_{z \to x} f(z) \le \lim_{z \to x} g(z)$.
Proof. Exercise 3-11.
Theorem 3.13 The Squeeze Theorem for functions. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f, g, h : I \setminus \{x\} \to \mathbb{R}$ be functions. If $f \le g \le h$ on $I \setminus \{x\}$ and $f$ and $h$ converge at $x$ with $\lim_{z \to x} f(z) = \lim_{z \to x} h(z)$, then $g$ converges at $x$ and $\lim_{z \to x} g(z) = \lim_{z \to x} f(z) = \lim_{z \to x} h(z)$.
Proof. Exercise 3-12.
Finally, convergence is also preserved by the composition of functions.
Theorem 3.14 Let $I, J \subseteq \mathbb{R}$ be open intervals, let $x \in I$, let $g : I \setminus \{x\} \to \mathbb{R}$ and $f : J \to \mathbb{R}$ be functions with $g[I \setminus \{x\}] \subseteq J$, and let $\lim_{z \to x} g(z) = L \in J$. Assume that $\lim_{y \to L} f(y)$ exists and that $g[I \setminus \{x\}] \subseteq J \setminus \{g(x)\}$, or, in case $g(x) \in g[I \setminus \{x\}]$, that $\lim_{y \to L} f(y) = f(L)$. Then $f \circ g$ converges at $x$ and $\lim_{z \to x} f \circ g(z) = \lim_{y \to L} f(y)$.
Proof. Let $M := \lim_{y \to L} f(y)$ and let $\{z_n\}_{n=1}^{\infty}$ be a sequence in the set $I \setminus \{x\}$ so that $\lim_{n \to \infty} z_n = x$. Then $\lim_{n \to \infty} g(z_n) = L$. If no $z_n$ satisfies $g(z_n) = g(x)$, we obtain $\lim_{n \to \infty} f(g(z_n)) = M$, while if some $g(z_n)$ are equal to $g(x)$, then $M = f(L)$ and we can infer $\lim_{n \to \infty} f(g(z_n)) = f(L) = M$ in this case also. Because the sequence $\{z_n\}_{n=1}^{\infty}$ was arbitrary the result is established.
Exercises
3-9. Completing the proof of Theorem 3.10. Let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z \to x} f(z)$ and $\lim_{z \to x} g(z)$ exist.
(a) Prove part 3 using Definition 3.1.
(b) Prove part 3 using Theorem 3.7.
3-10. Alternative proofs for the proved parts of Theorem 3.10. Let $x \in I$ and let $f, g : I \setminus \{x\} \to \mathbb{R}$ be functions such that $\lim_{z \to x} f(z)$ and $\lim_{z \to x} g(z)$ exist.
(a) Prove that $\lim_{z \to x} (f + g)(z) = \lim_{z \to x} f(z) + \lim_{z \to x} g(z)$ (part 1) using Definition 3.1.
(b) Prove that $\lim_{z \to x} (f - g)(z) = \lim_{z \to x} f(z) - \lim_{z \to x} g(z)$ (part 2) using Theorem 3.7.
(c) Prove that if $\lim_{z \to x} g(z) \ne 0$, then $\lim_{z \to x} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z \to x} f(z)}{\lim_{z \to x} g(z)}$ (part 4) using Definition 3.1.
3-11. Prove Theorem 3.12.
(a) Using Definition 3.1.
(b) Using Theorem 3.7.
3-12. Prove Theorem 3.13.
(a) Using Definition 3.1.
(b) Using Theorem 3.7.
3-13. Explain the similarities between the proof of part 1 of Theorem 3.10 as presented and the proof of part 1 of Theorem 2.14.
3-14. Computation of limits.
(a) Let $I$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Prove that $\lim_{z \to x} f(z) = \lim_{h \to 0} f(x + h)$.
(b) Compute each of the following limits.
i. $\lim_{x \to 3} \frac{x^2 - 5x + 6}{x^2 - x - 6}$
ii. $\lim_{x \to 2} \frac{3x^3 - x^2 - 12x + 4}{x^2 - 4}$
iii. $\lim_{x \to 9} \frac{x - 9}{x^2 - (9 + \text{[illegible]})x + \text{[illegible]}}$
3.3 One-sided Limits and Infinite Limits
Section 3.5 will show that it is advantageous to consider continuous functions (see Definition 3.23) on closed intervals. But that means we also need a notion of convergence
at the endpoints of a closed interval. One-sided limits provide just that.
Definition 3.15 Sequence formulation of the left limit of a function. Let $a < b$. The number $L \in \mathbb{R}$ is called the left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ iff for all sequences $\{z_n\}_{n=1}^{\infty}$ in $[a, b)$ with $\lim_{n \to \infty} z_n = b$ we have $\lim_{n \to \infty} f(z_n) = L$. In this case, we denote $\lim_{z \to b^-} f(z) := L$ and we say $f$ converges (to $L$) at $b$ from the left.
The right limit at $a$ for a function $f : (a, b] \to \mathbb{R}$ is defined similarly. It is denoted $\lim_{z \to a^+} f(z)$, and we say $f$ converges at $a$ from the right.
If $f : D \to \mathbb{R}$ is a function and the domain $D$ contains an interval $[a, b)$ with $a < b$, we define $\lim_{z \to b^-} f(z) := \lim_{z \to b^-} f|_{[a, b)}(z)$ if it exists. (Exercise 3-15 shows that these left limits are well defined.) Right limits are defined similarly.
Similar to limits, we prove most results for functions defined on half-open intervals.
These results are also valid for functions with larger domains. We simply apply them
to the appropriate restrictions.
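Example 3.16 Let $m \in \mathbb{Z}$. For the floor and ceiling functions of Definition 1.33, we have $\lim_{z \to m^-} \lfloor z \rfloor = m - 1$, $\lim_{z \to m^+} \lfloor z \rfloor = m$, $\lim_{z \to m^-} \lceil z \rceil = m$, and $\lim_{z \to m^+} \lceil z \rceil = m + 1$.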
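The four one-sided limits of Example 3.16 can be sanity-checked numerically. The following is only a sketch: it evaluates the floor and ceiling functions along the single sequence $z_n = m \mp 10^{-n}$ (the helper name `one_sided_samples` and the choice $m = 3$ are assumptions made for illustration), whereas Definition 3.15 quantifies over all sequences, so this is an illustration and not a proof.

```python
import math

def one_sided_samples(f, m, side, count=6):
    """Evaluate f along z_n = m - 10**(-n) (left) or m + 10**(-n) (right)."""
    sign = -1 if side == "left" else 1
    return [f(m + sign * 10.0 ** (-n)) for n in range(1, count + 1)]

m = 3  # any integer works; 3 is an arbitrary choice

floor_left = one_sided_samples(math.floor, m, "left")    # expected to sit at m - 1
floor_right = one_sided_samples(math.floor, m, "right")  # expected to sit at m
ceil_left = one_sided_samples(math.ceil, m, "left")      # expected to sit at m
ceil_right = one_sided_samples(math.ceil, m, "right")    # expected to sit at m + 1

print(floor_left, floor_right, ceil_left, ceil_right)
```

Because both functions are constant on each side of an integer, every sampled value already equals the corresponding one-sided limit.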
Definitions 3.15 and 3.1 differ in only one way. For a one-sided limit, the sequences
must all stay on one side of the number, while in Definition 3.1 the sequences can have
values on either side. To emphasize the ability to approach from either side, limits of
functions are sometimes called two-sided limits. With such strong similarity in the
definitions, it is only natural that the theorems that govern one-sided limits are similar
to the theorems that govern (two-sided) limits.
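Theorem 3.17 Limit laws for one-sided limits. Let $f, g : [a, b) \to \mathbb{R}$ be functions such that $\lim_{z \to b^-} f(z)$ and $\lim_{z \to b^-} g(z)$ exist. Then the following hold.
1. $\lim_{z \to b^-} (f + g)(z) = \lim_{z \to b^-} f(z) + \lim_{z \to b^-} g(z)$.
2. $\lim_{z \to b^-} (f - g)(z) = \lim_{z \to b^-} f(z) - \lim_{z \to b^-} g(z)$.
3. $\lim_{z \to b^-} (f \cdot g)(z) = \lim_{z \to b^-} f(z) \cdot \lim_{z \to b^-} g(z)$.
4. If $\lim_{z \to b^-} g(z) \neq 0$, then $\lim_{z \to b^-} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z \to b^-} f(z)}{\lim_{z \to b^-} g(z)}$.
Each equation implicitly asserts that the limit on the left side exists. (See box on page 33.) Moreover, because $\lim_{z \to b^-} g(z) \neq 0$ in part 4 implies that $g(z) \neq 0$ for $z$ near $b$ (where it matters), we did not demand $g(z) \neq 0$ for all $z \in [a, b)$. Similar limit laws hold for right limits.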
Proof. Exercise 3-16.
Theorem 3.18 $\varepsilon$-$\delta$ formulation of the left limit of a function. The number $L \in \mathbb{R}$ is the left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in [a, b)$ with $|z - b| < \delta$ we have $|f(z) - L| < \varepsilon$.
Proof. Exercise 3-17.
Theorem 3.19 connects one-sided limits to (two-sided) limits. Note that to make the proof efficient, we use the sequence formulation of the limit for one direction and the $\varepsilon$-$\delta$ formulation for the other direction.
Theorem 3.19 Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Then $\lim_{z \to x} f(z)$ exists iff $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ both exist and are equal. In this case the limit is equal to the left and the right limit.
Proof. For "$\Rightarrow$," let $L := \lim_{z \to x} f(z)$. Using Definition 3.15 we must prove that for all sequences $\{z_n\}_{n=1}^{\infty}$ in $I$ with $z_n < x$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} f(z_n) = L$ and we must prove the same result for all sequences $\{z_n\}_{n=1}^{\infty}$ in $I$ with $z_n > x$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$.
Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $I$ with $z_n < x$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$. By Definition 3.1 we have $\lim_{n \to \infty} f(z_n) = L$, which was to be proved. Sequences with $z_n > x$ for all $n \in \mathbb{N}$ are treated similarly. Hence, $\lim_{z \to x^-} f(z) = \lim_{z \to x^+} f(z) = L$.
For "$\Leftarrow$," let $\lim_{z \to x^-} f(z) = \lim_{z \to x^+} f(z) =: L$. By Theorem 3.7, we must prove that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we have $|f(z) - L| < \varepsilon$.
Let $\varepsilon > 0$. By Theorem 3.18 there is a $\delta_1 > 0$ so that for all $z \in I \setminus \{x\}$ with $z < x$ and $|z - x| < \delta_1$ we have $|f(z) - L| < \varepsilon$. By the corresponding result for right limits, there is a $\delta_2 > 0$ so that for all $z \in I \setminus \{x\}$ with $z > x$ and $|z - x| < \delta_2$ we have $|f(z) - L| < \varepsilon$. Let $\delta := \min\{\delta_1, \delta_2\}$. Then for all $z \in I \setminus \{x\}$ with $|z - x| < \delta$ we infer $|f(z) - L| < \varepsilon$. By Theorem 3.7, this proves $\lim_{z \to x} f(z) = L$. ∎
Standard Proof Technique 3.20 To prove that a function has a limit at $x \in \mathbb{R}$ one often proves that the left and the right limits exist and that they are equal.
Infinity and negative infinity are not numbers, so formally they do not qualify as
limits. But knowing that a function grows beyond all bounds near a point gives more information than a statement that the limit does not exist. Hence, we extend the language
to allow infinite limits.
Definition 3.21 Let $f : [a, b) \to \mathbb{R}$ be a function. Then the left limit of $f$ at $b$ is said to be infinity iff for all sequences $\{z_n\}_{n=1}^{\infty}$ in $[a, b)$ with $\lim_{n \to \infty} z_n = b$ we have $\lim_{n \to \infty} f(z_n) = \infty$. We denote $\lim_{z \to b^-} f(z) := \infty$.
Infinite right limits at $a$ for a function $f : (a, b] \to \mathbb{R}$ and infinite (two-sided) limits of a function $f : I \setminus \{x\} \to \mathbb{R}$ at $x \in I$, where $I$ is an open interval, are defined similarly. Limits equal to negative infinity are also defined similarly and they are denoted by $-\infty$.
Finally, as before, infinite one-sided and two-sided limits of functions with larger domains are defined via the limits of appropriate restrictions.
We chose to put one-sided infinite limits in the spotlight in Definition 3.21, because functions often have different behavior to the left and to the right of a point. For example, $\lim_{z \to 0^+} \frac{1}{z} = \infty$ and $\lim_{z \to 0^-} \frac{1}{z} = -\infty$.
It is not surprising that there is a formulation of infinite limits that is similar to the $\varepsilon$-$\delta$ formulation of finite limits.
Theorem 3.22 $M$-$\delta$ formulation of infinite left limits. The left limit of the function $f : [a, b) \to \mathbb{R}$ at $b$ is infinite iff for all $M \in \mathbb{R}$ there is a $\delta > 0$ such that for all $z \in [a, b)$ with $|z - b| < \delta$ we have $f(z) > M$.
Proof. Exercise 3-18.
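To see the $M$-$\delta$ formulation at work, consider the function $f(z) = \frac{1}{1 - z}$ on $[0, 1)$, which is an example chosen here for illustration and does not come from the text. For $M > 0$ the choice $\delta = \frac{1}{M}$ works, because $0 < 1 - z < \frac{1}{M}$ gives $\frac{1}{1 - z} > M$; for $M \le 0$ any $\delta$ works. The sketch below spot-checks this choice on finitely many sample points, so it supports the claim without proving it.

```python
def f(z):
    """Illustrative function on [0, 1) whose left limit at b = 1 is infinity."""
    return 1.0 / (1.0 - z)

def delta_for(M):
    """A delta that works for f at b = 1: 1/M if M > 0, otherwise any positive number."""
    return 1.0 / M if M > 0 else 1.0

def check(M, samples=1000):
    """Test f(z) > M on sample points z in [0, 1) with |z - 1| < delta_for(M)."""
    delta = delta_for(M)
    lo = max(0.0, 1.0 - delta)  # left end of the part of [0, 1) within delta of 1
    for k in range(1, samples + 1):
        z = 1.0 - (1.0 - lo) * k / (samples + 1)  # lo < z < 1
        if not f(z) > M:
            return False
    return True

results = {M: check(M) for M in (-5.0, 1.0, 10.0, 1000.0, 1e6)}
print(results)
```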
Of course, similar results also hold for right-sided and two-sided limits. Limit laws
for infinite limits and a version of Theorem 3.19 are given in Exercises 3-22 and 3-23.
Exercises
3-15. Let $f, g : [a, b) \to \mathbb{R}$ be functions. Prove that if there is a number $\delta > 0$ so that $f$ and $g$ are equal on the subset $\{z \in [a, b) : |z - b| < \delta\}$ of $[a, b)$, then the left limit of $f$ at $b$ exists iff the left limit of $g$ at $b$ exists and in this case we have $\lim_{z \to b^-} f(z) = \lim_{z \to b^-} g(z)$.
3-16. Prove Theorem 3.17. That is, let $f, g : [a, b) \to \mathbb{R}$ be functions such that $\lim_{z \to b^-} f(z)$ and $\lim_{z \to b^-} g(z)$ exist and prove each of the following.
(a) $\lim_{z \to b^-} (f + g)(z) = \lim_{z \to b^-} f(z) + \lim_{z \to b^-} g(z)$.
(b) $\lim_{z \to b^-} (f - g)(z) = \lim_{z \to b^-} f(z) - \lim_{z \to b^-} g(z)$.
(c) $\lim_{z \to b^-} (f \cdot g)(z) = \lim_{z \to b^-} f(z) \cdot \lim_{z \to b^-} g(z)$.
(d) If $\lim_{z \to b^-} g(z) \neq 0$, then $\lim_{z \to b^-} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z \to b^-} f(z)}{\lim_{z \to b^-} g(z)}$.
3-17. Prove Theorem 3.18.
3-18. Prove Theorem 3.22.
3-19. Prove that $f(x) = \begin{cases} x + 3; & \text{for } x < 1, \\ (x + \ldots; & \text{for } x > 1, \\ 10; & \text{for } x = 1, \end{cases}$ has a limit at $x = 1$ and state its value.
3-20. Prove that $\lim_{z \to 0^+} \sqrt{z} = 0$. Explain why this is a satisfactory resolution of the formal problem in Exercise 3-8 or why it is not.
3-21. A function $f$ is called nondecreasing on $I \subseteq \mathbb{R}$ iff for all $x_1 < x_2$ in $I$ we have $f(x_1) \le f(x_2)$. Let $f : [a, b] \to \mathbb{R}$ be a nondecreasing function.
(a) Prove that for every $x \in (a, b]$ we have $\lim_{z \to x^-} f(z) = \sup\{f(z) : z < x\}$.
Hint. Mimic the proof of the Monotone Sequence Theorem.
(b) Prove that for every $x \in [a, b)$ the right limit $\lim_{z \to x^+} f(z)$ exists and state its value.
3-22. Limit laws involving infinite limits of functions. Let $f, g : [a, b) \to \mathbb{R}$ be functions, and let $\lim_{z \to b^-} f(z) = \infty$. Prove the following.
(a) If $g$ is bounded (that is, there is an $M > 0$ so that for all $z \in [a, b)$ we have $|g(z)| < M$), then $\lim_{z \to b^-} f(z) + g(z) = \infty$ and if all $f(z)$ are nonzero, then $\lim_{z \to b^-} \frac{g(z)}{f(z)} = 0$.
(b) If $\lim_{z \to b^-} g(z) = \infty$, then $\lim_{z \to b^-} f(z) + g(z) = \infty$ and $\lim_{z \to b^-} f(z) g(z) = \infty$.
(c) If $c > 0$ is a real number, then $\lim_{z \to b^-} c f(z) = \infty$.
(d) If $c < 0$ is a real number, then $\lim_{z \to b^-} c f(z) = -\infty$.
3-23. Let $I \subseteq \mathbb{R}$ be an open interval, let $x \in I$ and let $f : I \setminus \{x\} \to \mathbb{R}$ be a function. Prove that $\lim_{z \to x} f(z) = \infty$ iff $\lim_{z \to x^-} f(z) = \lim_{z \to x^+} f(z) = \infty$.
3.4 Continuity
Continuous functions are usually defined as functions for which the limit is computed
by substituting the value. Because we may need to take care of endpoints, the formalization of the elementary definition from calculus requires an extra item (see number 4
below).
Definition 3.23 Let $D \subseteq \mathbb{R}$ be an interval of nonzero length from which at most finitely many points have been removed and let $f : D \to \mathbb{R}$ be a function. Then $f$ is called continuous at $x$ iff
1. $f(x)$ is defined, that is, $x \in D$, and
2. $\lim_{z \to x} f(z)$ exists, and
3. $\lim_{z \to x} f(z) = f(x)$, and
4. If $x$ is an endpoint of $D$, use left or right limits in 2 and 3, as appropriate.
$f$ is called continuous (on $D$) iff $f$ is continuous at every $x \in D$.
We could also define continuity for functions defined on sets for which every point
of the domain is contained in an interval of nonzero length. Exercise 3-32 shows that
with the present definition this idea is a bit too simple to produce a sensible result. This
is not a problem, because in the early part of the text the only functions whose domains
are not intervals are rational functions. For these functions the pathology of Exercise
3-32 is not an issue. Therefore we relegate all concerns regarding more complicated
domains to Section 16.3.
For theorems, we will usually work with functions that are defined on intervals, because if $D$ is an interval from which at most finitely many points were removed, then $f : D \to \mathbb{R}$ is continuous at $x \in D$ iff $f|_I$ is continuous at $x$, where $I$ is a maximum-sized (with respect to containment) interval contained in $D$ that contains $x$.
Example 3.24
1. Constant functions are continuous at every $x \in \mathbb{R}$.
2. The function $f(x) = x$ is continuous at every $x \in \mathbb{R}$.
3. The function $f(x) = |x|$ is continuous at every $x \in \mathbb{R}$.
Parts 1 and 2 are trivial and part 3 follows from part 2 of Example 3.5.
It is useful to incorporate the definition of limits directly into a characterization of
continuity. With such a characterization it is not necessary to resolve the multiple parts
of the original definition whenever we work with continuity. Figure 6 shows graphical
interpretations of the conditions.
Theorem 3.25 Let $I \subseteq \mathbb{R}$ be an interval, let $x \in I$ and let $f : I \to \mathbb{R}$ be a function. Then the following are equivalent.
1. $f$ is continuous at $x$.
2. For every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$, we have $\lim_{n \to \infty} f(z_n) = f(x)$.
3. For every $\varepsilon > 0$, there is a $\delta > 0$ such that for all $z \in I$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \varepsilon$.
Proof. We will prove "1$\Rightarrow$2," "2$\Rightarrow$3" and "3$\Rightarrow$1." The remaining implications follow because logical implications are transitive. That is, the implications "1$\Rightarrow$2" and "2$\Rightarrow$3" imply "1$\Rightarrow$3," the implications "2$\Rightarrow$3" and "3$\Rightarrow$1" imply "2$\Rightarrow$1," and so on.
We will assume throughout that $x$ is not an endpoint of the domain. The arguments are easily modified (by using appropriate one-sided limits and theorems) for the case that $x$ is an endpoint.
For "1$\Rightarrow$2," we need to prove that for every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} f(z_n) = f(x)$.
Because $f$ is continuous at $x$, we have $\lim_{z \to x} f(z) = f(x)$. By Definition 3.1, this means that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I \setminus \{x\}$ and $\lim_{n \to \infty} z_n = x$ we have that $\lim_{n \to \infty} f(z_n) = f(x)$. Now let $\{z_n\}_{n=1}^{\infty}$ be a sequence with $z_n \in I$ and $\lim_{n \to \infty} z_n = x$. If there is an $N \in \mathbb{N}$ so that $z_n = x$ for all $n \ge N$, then there is nothing to prove. Otherwise let $\{z_{n_k}\}_{k=1}^{\infty}$ be the subsequence of all elements with $z_{n_k} \in I \setminus \{x\}$. Then $\lim_{k \to \infty} z_{n_k} = x$, and hence $\lim_{k \to \infty} f(z_{n_k}) = f(x)$. Now let $\varepsilon > 0$. Then there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $|f(z_{n_k}) - f(x)| < \varepsilon$. Then for all $n \ge n_K$ either $z_n = x$ and $|f(z_n) - f(x)| = 0 < \varepsilon$ or there is a $k \ge K$ with $n = n_k$ and $|f(z_{n_k}) - f(x)| < \varepsilon$. This means $\lim_{n \to \infty} f(z_n) = f(x)$ for every sequence $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$, which establishes this part of the proof.
The proof of "2$\Rightarrow$3" is similar to the proof of "$\Rightarrow$" in Theorem 3.7 (Exercise 3-24).
For "3$\Rightarrow$1," note that, because $f$ is defined on $I$ and $x \in I$, $f$ is defined at $x$. The condition in part 3 implies by Theorem 3.7 that $\lim_{z \to x} f(z) = f(x)$, which completes the proof. ∎
Because of the formal problems with limits of functions outlined at the end of
Section 16.3.1, in general settings one of the conditions in Theorem 3.25 is normally
used to define continuity.
Standard Proof Technique 3.26 To prove the equivalence of several conditions, it is standard practice to prove that the first implies the second, the second implies the third, and so on, and finally that the last condition implies the first. All other implications follow from the transitivity of logical implications.
The proofs of parts 5 and 6 of Theorem 3.27 serve as good examples of how the conditions in Theorem 3.25 can be used in conjunction.
Theorem 3.27 Let $I \subseteq \mathbb{R}$ be an interval, let $x \in I$ and let $f, g : I \to \mathbb{R}$ be continuous at $x \in I$. Then the following hold.
1. $f + g$ is continuous at $x$.
2. $f - g$ is continuous at $x$.
3. $f \cdot g$ is continuous at $x$.
4. If $g(x) \neq 0$ for all $x \in I$, then $\frac{f}{g}$ is continuous at $x$.
5. $\max\{f, g\}$ is continuous at $x$. (The maximum is defined pointwise.)
6. $\min\{f, g\}$ is continuous at $x$. (The minimum is defined pointwise.)
Proof. The first four parts are direct consequences of the corresponding limit laws.
For part 5, let $x \in I$. We will establish continuity of $\max\{f, g\}$ at $x$ by proving that for all sequences $\{z_n\}_{n=1}^{\infty}$ with $z_n \in I$ for all $n \in \mathbb{N}$ and $\lim_{n \to \infty} z_n = x$ we have $\lim_{n \to \infty} \max\{f(z_n), g(z_n)\} = \max\{f(x), g(x)\}$.
Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $I$ with $\lim_{n \to \infty} z_n = x$. In case $f(x) \neq g(x)$, assume without loss of generality that $f(x) > g(x)$. To prove that the limit of the image sequence is $\lim_{n \to \infty} \max\{f, g\}(z_n) = \max\{f, g\}(x) = f(x)$, let $\varepsilon > 0$. Because $f$ and $g$ are continuous at $x$, there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \min\left\{\varepsilon, \frac{f(x) - g(x)}{2}\right\}$ and $|g(z) - g(x)| < \frac{f(x) - g(x)}{2}$. In particular, for all $z \in I$ with $|z - x| < \delta$ we obtain
$$f(z) > f(x) - \frac{f(x) - g(x)}{2} = g(x) + \frac{f(x) - g(x)}{2} > g(z),$$
and hence $\max\{f, g\}(z) = f(z)$. For $\delta$, find an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|z_n - x| < \delta$. Then for all $n \ge N$ we infer
$$|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |f(z_n) - f(x)| < \varepsilon,$$
and hence $\max\{f, g\}$ is continuous at $x$ when $f(x) > g(x)$.
This leaves the case $f(x) = g(x)$. Let $\varepsilon > 0$. Because $f$ and $g$ are continuous at $x$, there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have that $|f(z) - f(x)| < \varepsilon$ and $|g(z) - g(x)| < \varepsilon$. For $\delta$, find an $N \in \mathbb{N}$ so that for all $n \ge N$ we have that $|z_n - x| < \delta$. For all $n \ge N$, the maximum $\max\{f, g\}(z_n)$ is equal to $f(z_n)$ or $g(z_n)$. Because $|z_n - x| < \delta$, if $\max\{f, g\}(z_n) = f(z_n)$ we infer $|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |f(z_n) - f(x)| < \varepsilon$ and if $\max\{f, g\}(z_n) = g(z_n)$ we infer $|\max\{f, g\}(z_n) - \max\{f, g\}(x)| = |g(z_n) - g(x)| < \varepsilon$. Thus the maximum $\max\{f, g\}$ is continuous at $x$ when $f(x) = g(x)$.
The proof of part 6 is left as Exercise 3-25b. ∎
There is a faster proof of part 5 that relies on Theorem 3.30 below and on an algebraic representation of the maximum (see Exercise 3-26a). Because our main focus is
on standard techniques in analysis, the longer proof was presented here.
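The algebraic representation of the maximum mentioned above is $\max\{a, b\} = \frac{1}{2}(a + b + |a - b|)$, as stated in the hint to Exercise 3-26a. The sketch below checks the identity on randomly chosen rational numbers, using exact arithmetic so that equality can be tested directly; the names `max_via_abs` and `pointwise_max` are illustrative assumptions. A finite check like this does not replace the proof asked for in the exercise.

```python
from fractions import Fraction
import random

def max_via_abs(a, b):
    """Maximum of a and b written with sums, differences and an absolute value."""
    return (a + b + abs(a - b)) / 2

rng = random.Random(0)  # fixed seed so the check is reproducible
pairs = [
    (Fraction(rng.randint(-50, 50), rng.randint(1, 9)),
     Fraction(rng.randint(-50, 50), rng.randint(1, 9)))
    for _ in range(1000)
]
identity_holds = all(max_via_abs(a, b) == max(a, b) for a, b in pairs)

def pointwise_max(f, g):
    """Pointwise maximum of two functions, built from the same formula."""
    return lambda x: max_via_abs(f(x), g(x))

h = pointwise_max(lambda x: x, lambda x: x * x)  # max{x, x^2}
print(identity_holds, h(Fraction(1, 2)), h(2))
```

Written this way, $\max\{f, g\}$ is a combination of $f + g$, $f - g$ and the absolute value function, which is why parts 1 and 2 together with Theorem 3.30 suffice for the faster proof.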
Theorem 3.27 gives access to two standard examples of continuous functions.
Example 3.28 A polynomial is a function $p : \mathbb{R} \to \mathbb{R}$ for which there are an $n \in \mathbb{N}_0$ and $a_0, \ldots, a_n \in \mathbb{R}$ so that $a_n \neq 0$ and for all $x \in \mathbb{R}$ we have $p(x) = \sum_{i=0}^{n} a_i x^i$. The number $n$ is called the degree of the polynomial. The constant function $p(x) = 0$ is also considered to be a polynomial. Its degree is defined to be $-\infty$. Every polynomial is continuous on $\mathbb{R}$. (See Exercise 3-28.)
Example 3.29 A rational function is a function $r$ for which there are two polynomials $p, q : \mathbb{R} \to \mathbb{R}$ so that for all $x \in \mathbb{R}$ for which $q(x) \neq 0$ we have $r(x) = \frac{p(x)}{q(x)}$. By Theorem 3.27 and Example 3.28, every rational function is continuous on its domain $\{x \in \mathbb{R} : q(x) \neq 0\}$. (We implicitly use here that every polynomial has at most finitely many zeroes.)
Continuity is also preserved by compositions.
Theorem 3.30 Let $I, J \subseteq \mathbb{R}$ be intervals, let $g : I \to \mathbb{R}$ be continuous at $x \in I$, let $g[I] \subseteq J$ and let $f : J \to \mathbb{R}$ be continuous at $g(x)$. Then $f \circ g : I \to \mathbb{R}$ is continuous at $x$.
Proof. We will prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(g(z)) - f(g(x))| < \varepsilon$.
Let $\varepsilon > 0$. Because $f$ is continuous at $g(x)$, there is a $\delta_f > 0$ so that for all $y \in J$ with $|y - g(x)| < \delta_f$ we have $|f(y) - f(g(x))| < \varepsilon$. Because $g$ is continuous at $x$, for $\delta_f$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|g(z) - g(x)| < \delta_f$. But then because $|g(z) - g(x)| < \delta_f$ we infer $|f(g(z)) - f(g(x))| < \varepsilon$.
Therefore we have proved that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in I$ with $|z - x| < \delta$ we have $|f(g(z)) - f(g(x))| < \varepsilon$, which means that $f \circ g$ is continuous at $x$. ∎
We conclude this section by characterizing discontinuities.
Definition 3.31 Let $D$ be an interval of nonzero length from which at most finitely many points $x_1, \ldots, x_n$ have been removed. If the function $f : D \to \mathbb{R}$ is not continuous at $x \in D \cup \{x_1, \ldots, x_n\}$, we speak of a discontinuity at $x$. There are several types of discontinuities. (For visualizations, see Figure 7; for examples, see Exercise 3-31.)
1. If $\lim_{z \to x} f(z)$ exists, or if $x$ is an endpoint of $D \cup \{x_1, \ldots, x_n\}$ and the appropriate one-sided limit exists, but the limit is not equal to $f(x)$; or if $x$ is not an endpoint of $D \cup \{x_1, \ldots, x_n\}$ and $f$ is not defined at $x$, we speak of a removable discontinuity.
2. If $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ exist, but they are not equal, we speak of a jump discontinuity.
3. If (at least) one of $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ does not exist and if there is a sequence $\{z_n\}_{n=1}^{\infty}$ in $D$ that converges to $x$ and $\lim_{n \to \infty} |f(z_n)| = \infty$, we speak of an infinite discontinuity.
4. If (at least) one of $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ does not exist, if there is a $\delta > 0$ so that $f$ is bounded in $\{z \in D : |z - x| < \delta\}$ and if there are two sequences $\{z_n\}_{n=1}^{\infty}$ and $\{w_n\}_{n=1}^{\infty}$ that converge to $x$ such that for all $n \in \mathbb{N}$ we have $z_n, w_n < x$ (or $z_n, w_n > x$) and such that both $\lim_{n \to \infty} f(z_n)$ and $\lim_{n \to \infty} f(w_n)$ exist, but they are not equal, we speak of a discontinuity by oscillation.
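The two sequences required for a discontinuity by oscillation can be made concrete with the function $f(z) = \sin\left(\frac{1}{z}\right)$ on $\mathbb{R} \setminus \{0\}$. This function and the sequences $z_n = \frac{1}{2\pi n}$ and $w_n = \frac{1}{2\pi n + \pi/2}$ are assumptions chosen for illustration; they are not taken from the text. Both sequences are positive and converge to $0$, the function is bounded by $1$, and the image sequences are constant with values $0$ and $1$, respectively. The sketch below reproduces this numerically, up to floating-point rounding.

```python
import math

def f(z):
    """Illustrative function: bounded near 0 and not defined at 0."""
    return math.sin(1.0 / z)

# Two sequences of positive numbers that converge to 0.
z = [1.0 / (2.0 * math.pi * n) for n in range(1, 201)]
w = [1.0 / (2.0 * math.pi * n + math.pi / 2.0) for n in range(1, 201)]

f_z = [f(t) for t in z]  # sin(2*pi*n), which is 0 up to rounding
f_w = [f(t) for t in w]  # sin(2*pi*n + pi/2), which is 1 up to rounding

print(z[-1], w[-1], max(abs(v) for v in f_z), min(f_w))
```

Since the two image sequences have different limits, the right limit of this $f$ at $0$ cannot exist.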
Figure 7: Visualization of the possible discontinuities of a function.
Theorem 3.32 Let $D \subseteq \mathbb{R}$ be an interval of nonzero length from which at most finitely many points $x_1, \ldots, x_n$ have been removed and let $f : D \to \mathbb{R}$ be a function. Then every discontinuity $x \in D \cup \{x_1, \ldots, x_n\}$ of $f$ is of one of the four types listed in Definition 3.31.
Proof. Let $x \in D$ or let $x$ be one of the finitely many elements that were removed and assume that $f$ is not continuous at $x$. Let the discontinuity at $x$ be neither a removable discontinuity, nor a jump discontinuity, nor an infinite discontinuity. We will prove that it must be a discontinuity by oscillation. If $\lim_{z \to x^-} f(z)$ and $\lim_{z \to x^+} f(z)$ both existed, then they would either be equal and the discontinuity would be removable, or not, in which case the discontinuity would be a jump discontinuity. Hence, one of the one-sided limits does not exist at $x$. If $x$ is the supremum or the infimum of $D$, then $f$ is defined at $x$ and one of the two one-sided limits does not exist by default. In this case, the respective other one-sided limit also must not exist, because otherwise there would be a removable discontinuity at $x$. By symmetry, we can assume without loss of generality that $\lim_{z \to x^-} f(z)$ does not exist and $f$ is defined on some interval $[x - \delta, x)$.
Because the discontinuity at $x$ is not an infinite discontinuity, there is a $\nu > 0$ so that $f$ is bounded on $[x - \nu, x)$. Let $z_n := x - \frac{\nu}{n}$. Then $\lim_{n \to \infty} z_n = x$. First consider the case that $\lim_{n \to \infty} f(z_n) =: L$ exists. Because $\lim_{z \to x^-} f(z)$ does not exist, there must be an $\varepsilon > 0$ and a sequence $\{v_k\}_{k=1}^{\infty}$ so that $\lim_{k \to \infty} v_k = x$, and for all $k \in \mathbb{N}$ we have $x - \nu \le v_k < x$ and $|f(v_k) - L| > \varepsilon$. Because $f$ is bounded on $[x - \nu, x)$, the sequence $\{f(v_k)\}_{k=1}^{\infty}$ is bounded. By the Bolzano-Weierstrass Theorem, it has a convergent subsequence $\{f(v_{k_n})\}_{n=1}^{\infty}$. Then $\lim_{n \to \infty} f(v_{k_n}) \neq L$, and hence $\{w_n\}_{n=1}^{\infty} := \{v_{k_n}\}_{n=1}^{\infty}$ and $\{z_n\}_{n=1}^{\infty}$ are sequences as required in the definition of a discontinuity by oscillation.
If $\{f(z_n)\}_{n=1}^{\infty}$ does not converge, note that it is bounded, and hence it has a convergent subsequence $\{f(z_{n_k})\}_{k=1}^{\infty}$. Now $f$ has a discontinuity by oscillation at $x$, because we can replace $\{z_n\}_{n=1}^{\infty}$ with $\{z_{n_k}\}_{k=1}^{\infty}$ in the above argument and then repeat it. ∎
Standard Proof Technique 3.33 When an argument requires a convergent sequence and all that is guaranteed for a given sequence $\{z_n\}_{n=1}^{\infty}$ is a convergent subsequence, then one often assumes without loss of generality that $\{z_n\}_{n=1}^{\infty}$ converges. This is because, just like at the end of the proof of Theorem 3.32, the given sequence can be replaced with a convergent subsequence that (usually) has all the properties of $\{z_n\}_{n=1}^{\infty}$, plus it converges.
Although Theorem 3.32 characterizes all possible discontinuities for functions from $\mathbb{R}$ to $\mathbb{R}$, other types of discontinuities exist for functions with infinite dimensional range (see Exercise 16-36).
Exercises
3-24. Prove part "2$\Rightarrow$3" of Theorem 3.25.
3-25. Completing the proof of Theorem 3.27.
(a) Give all details of the proofs of parts 1-4.
(b) Prove part 6.
3-26. Alternative proofs of parts 5 and 6 of Theorem 3.27. Let $I$ be an interval and let $f, g : I \to \mathbb{R}$ be functions.
(a) Use parts 1 and 2 and Theorem 3.30 to prove that if $f$ and $g$ are continuous at $x \in I$, then $\max\{f, g\}$ is continuous at $x$.
Hint. First prove that for all $a, b \in \mathbb{R}$ the equality $\max\{a, b\} = \frac{1}{2}(a + b + |a - b|)$ holds.
(b) Give a similar proof of part 6 of Theorem 3.27.
3-27. Prove that for all $n \in \mathbb{N}$ the function $f(x) = x^n$ is continuous on $\mathbb{R}$. Then prove that for all $m \in \mathbb{N}$ the function $f(x) = x^{-m}$ is continuous on $\mathbb{R} \setminus \{0\}$.
3-28. Prove that all polynomials are continuous on $\mathbb{R}$.
Hint. Induction on the degree.
3-29. Alternative proofs of Theorem 3.30.
(a) Prove Theorem 3.30 using Definition 3.23 and Theorem 3.14.
(b) Prove Theorem 3.30 using part 2 of Theorem 3.25.
3-30. Explain why we demand that the interval in Definition 3.23 must have nonzero length.
3-31. Examples of discontinuities.
(a) Prove that $f(x) = \frac{x^2}{x}$ has a removable discontinuity at 0.
(b) Prove that $f(x) = \lceil x \rceil$ has a jump discontinuity at 0.
(c) Prove that $f(x) = \frac{1}{x}$ has an infinite discontinuity at 0.
(d) Let g(x) :=
+2;
.
1
f o r O S x i -,
2
1
for - < x 5 1.
2 -
,
has a discontinuity by oscillation at 0
3-32. Let $U := \bigcup_{n=1}^{\infty} \left[\frac{1}{n} - \frac{1}{10^n}, \frac{1}{n}\right] := \left\{x \in \mathbb{R} : \left(\exists n \in \mathbb{N} : x \in \left[\frac{1}{n} - \frac{1}{10^n}, \frac{1}{n}\right]\right)\right\}$ and let the function $f : [-1, 0] \cup U \to \mathbb{R}$ be defined by $f(x) := \begin{cases} 0; & \text{for } x \in [-1, 0], \\ 1; & \text{for } x \in U. \end{cases}$
(a) Prove that $f$ is continuous on every interval $I$ that is contained in its domain $D := [-1, 0] \cup U$.
(b) Prove that every point in $D$ is contained in an interval of nonzero length.
(c) Explain why the function still should not be considered to be continuous on $D$.
(d) Suggest a generalization of the definition of continuity that would allow domains such as $D$ and that would make $f$ discontinuous at 0.
This function will be revisited in Exercise 16-29.
Figure 8: Visualizing the Intermediate Value Theorem. Intuitively (a) it is clear that an unbroken graph that goes from a point below the x-axis to a point above the x-axis must cross the x-axis at least once (solid graph) and that it could even cross the x-axis multiple times (dotted continuation past the first intercept). Part (b) gives a visualization of the proof. Because the sequences $\{x_n\}_{n=1}^{\infty}$ and $\{x_n'\}_{n=1}^{\infty}$ meet in the middle at $(c, f(c))$, we conclude $f(c) = 0$.
3.5 Properties of Continuous Functions
Aside from their obvious connection to limits, continuous functions are interesting
because they have several useful properties.
Theorem 3.34 Intermediate Value Theorem. Let $a < b$ and let $f : [a, b] \to \mathbb{R}$ be a continuous function. If $f(a) < 0$ and $f(b) > 0$ (or vice versa) then there is a $c \in (a, b)$ such that $f(c) = 0$. (Also see Figure 8(a).)
Proof. Assume without loss of generality that $f(a) < 0$ and $f(b) > 0$. The proof for the other case is similar.
The set $G := \{x \in [a, b] : f(x) > 0\}$ contains $b$ and it is bounded below by $a$. Let $c := \inf(G)$. We will show that $c$ is as claimed in the theorem.
First, we show that $c \notin \{a, b\}$. For a contradiction suppose that $c = a$. Then by the version of Proposition 1.21 for infima, for each $n \in \mathbb{N}$ there would be an $x_n \in \left(a, a + \frac{1}{n}\right)$ with $f(x_n) > 0$. (Note that we are using Standard Proof Technique 3.8 here.) But then $\lim_{n \to \infty} x_n = a$, and by continuity of $f$ we could infer that $0 > f(a) = \lim_{n \to \infty} f(x_n) \ge 0$, a contradiction. The inequality $c \neq b$ is proved similarly (see Exercise 3-33).
Because $c \neq b$, again by the version of Proposition 1.21 for infima, for each $n \in \mathbb{N}$ there is an $x_n \in \left[c, c + \frac{1}{n}\right)$ with $f(x_n) > 0$. In particular, $\lim_{n \to \infty} x_n = c$. Because $c \neq a$, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the number $x_n' := c - \frac{1}{n}$ is greater than or equal to $a$. Hence, the sequence $\{x_n'\}_{n=N}^{\infty}$ satisfies $\lim_{n \to \infty} x_n' = c$ and $f(x_n') \le 0$ for all $n \ge N$. For a visualization of the sequences, consider Figure 8.
Because $f$ is continuous, we infer $\lim_{n \to \infty} f(x_n) = \lim_{n \to \infty} f(x_n') = f(c)$. But the inequalities for $f(x_n)$ and $f(x_n')$ show that $\lim_{n \to \infty} f(x_n) \ge 0$ and $\lim_{n \to \infty} f(x_n') \le 0$. Thus $f(c)$ must be greater than or equal to zero and less than or equal to zero, which implies $f(c) = 0$. ∎
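The Intermediate Value Theorem guarantees that a zero exists, and this guarantee is what makes the bisection method work. Note that bisection is a different technique from the infimum argument in the proof above: it repeatedly halves an interval on which $f$ changes sign. The following is a minimal sketch; the function name `bisect_root`, the tolerance and the sample function $x^3 - 2$ are assumptions chosen for illustration.

```python
def bisect_root(f, a, b, tol=1e-12, max_iter=200):
    """Approximate a zero of a continuous f on [a, b], assuming f(a) < 0 < f(b)."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("bisection as sketched here needs f(a) < 0 < f(b)")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        value = f(mid)
        if value == 0 or (b - a) < tol:
            return mid
        if value < 0:
            a = mid  # keep f(a) < 0, so a sign change remains in [a, b]
        else:
            b = mid  # keep f(b) > 0
    return 0.5 * (a + b)

# f(x) = x^3 - 2 is a polynomial, hence continuous, with f(0) < 0 < f(2).
root = bisect_root(lambda x: x ** 3 - 2, 0.0, 2.0)
print(root, root ** 3)
```

At every step the interval $[a, b]$ satisfies the hypotheses of Theorem 3.34, so it always contains a zero of $f$.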
The Intermediate Value Theorem immediately implies that continuous images of
intervals are intervals, too.
Definition 3.35 Let $A, B$ be sets, let $f : A \to B$ be a function and let $C \subseteq A$. Then the image of $C$ under $f$ is defined to be $f[C] := \{f(c) : c \in C\}$.
Theorem 3.36 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be continuous. Then $f[I]$ is an interval.
Proof. Let $l, u \in f[I]$ with $l < u$ and let $m \in (l, u)$. Then there are $a, b \in I$ with $f(a) = l$ and $f(b) = u$. Without loss of generality assume $a < b$. The function $g(x) := f(x) - m$ is continuous on $[a, b]$. By the Intermediate Value Theorem there is a $c \in (a, b)$ so that $g(c) = 0$. But then $f(c) = g(c) + m = m$. Because $m, l, u$ were arbitrary we have proved that for any two elements $l < u$ of $f[I]$ and any element $m$ between them we have that $m \in f[I]$. Therefore $f[I]$ is an interval. ∎
We now turn to inverse functions.
Definition 3.37 Let $A, B$ be sets and let $f : A \to B$ be a bijective function. Then the inverse function of $f$ is the unique (see Exercise 3-34) function $f^{-1} : B \to A$ that maps each $b \in B$ to the unique $a \in A$ so that $f(a) = b$.
Theorem 3.38 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a continuous injective function. Then the inverse function $f^{-1} : f[I] \to I$ is also continuous.
Proof. Clearly, $f$ maps $I$ bijectively onto $f[I]$, and hence $f$ has an inverse function $f^{-1}$ that is defined on $f[I]$. By Theorem 3.36, $f[I]$ is an interval. All that is left is to show that if a sequence $\{y_n\}_{n=1}^{\infty}$ in $f[I]$ converges to $y \in f[I]$, then $\{f^{-1}(y_n)\}_{n=1}^{\infty}$ converges to $f^{-1}(y)$.
Let $\{y_n\}_{n=1}^{\infty}$ be a sequence in $f[I]$ with $\lim_{n \to \infty} y_n = y$. For $n \in \mathbb{N}$, let $x_n := f^{-1}(y_n)$ and let $x := f^{-1}(y)$. For $\varepsilon > 0$, let $J := \{z \in I : |z - x| < \varepsilon\}$, $J_l := \{z \in J : z \le x\}$ and $J_u := \{z \in J : z \ge x\}$. Then $f[J]$, $f[J_l]$ and $f[J_u]$ are all intervals that contain $y = f(x)$. Because $f$ is injective, the only point $f[J_l]$ and $f[J_u]$ have in common is $y$. Therefore $y$ is the maximum of one of the intervals $f[J_l]$ and $f[J_u]$ and it is the minimum of the other.
If both $f[J_l]$ and $f[J_u]$ have more than one element, then there is a $\delta > 0$ so that $(y - \delta, y + \delta) \subseteq f[J]$. In this case, because $\{y_n\}_{n=1}^{\infty}$ converges to $y$, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $y_n \in (y - \delta, y + \delta) \subseteq f[J]$. Consequently, for all $n \ge N$ the point $x_n$ is in $J$, which means $|x_n - x| < \varepsilon$.
If $f[J_u]$ has exactly one element, then $J_u$ has exactly one element and $x$ is the largest point of $I$. Therefore $f[J_l]$ has more than one element and there is a $\delta > 0$ so that $(y - \delta, y] \subseteq f[J_l]$ or $[y, y + \delta) \subseteq f[J_l]$. Without loss of generality assume $(y - \delta, y] \subseteq f[J_l]$. We claim that then $y$ is the largest point of $f[I]$. Suppose for a contradiction that there was a $y' > y$ in $f[I]$. Then $a := f^{-1}(y') < x$ and there would also be a $b < x$ with $f(b) < y$. By Theorem 3.36 there would be a $c \neq x$ between $a$ and $b$ so that $f(c) = y = f(x)$, contradicting the injectivity of $f$. Thus $y = \sup f[I]$. Because $\{y_n\}_{n=1}^{\infty}$ converges to $y$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $y_n \in (y - \delta, y] \subseteq f[J_l]$. Thus for all $n \ge N$ we infer $x_n \in J_l$, which means $|x_n - x| < \varepsilon$.
The case in which $f[J_l]$ has exactly one element is handled similarly.
We have shown that in each of the above cases $f^{-1}\left(\lim_{n \to \infty} y_n\right) = \lim_{n \to \infty} f^{-1}(y_n)$, which implies that $f^{-1}$ is continuous. ∎
Now that we know that inverses of continuous functions are continuous, we can
establish limit laws for powers with rational exponents.
Definition 3.39 A number $n \in \mathbb{Z}$ is called even iff there is a $k \in \mathbb{Z}$ so that $n = 2k$ and it is called odd iff there is a $k \in \mathbb{Z}$ so that $n = 2k + 1$.
Corollary 3.40 Let $d \in \mathbb{N}$. Then $f(x) = x^{\frac{1}{d}}$ is continuous on $[0, \infty)$ if $d$ is even and it is continuous on $\mathbb{R}$ if $d$ is odd.
Proof. Use Theorem 3.38 (Exercise 3-35).
Corollary 3.41 Let $r \in \mathbb{Q}$ be positive. Then $f(x) = x^r$ is continuous on $[0, \infty)$, and if $r$ can be represented as a fraction with odd denominator, $f$ is continuous on $\mathbb{R}$. For negative $r \in \mathbb{Q}$, $f(x) = x^r$ is continuous on $(0, \infty)$, and if $r$ can be represented as a fraction with odd denominator, $f$ is continuous on $\mathbb{R} \setminus \{0\}$.
Proof. Use Exercise 3-27 and Corollary 3.40 (Exercise 3-36).
Theorem 3.42 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of nonnegative numbers. Then for all $r \in \mathbb{Q}$ we have $\lim_{n \to \infty} a_n^r = \left(\lim_{n \to \infty} a_n\right)^r$.
Proof. Exercise 3-37.
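The limit law of Theorem 3.42 can be observed numerically. The sketch below uses the sequence $a_n = 4 + \frac{1}{n}$ and the exponent $r = \frac{3}{2}$, both of which are assumptions chosen for illustration, so that $\left(\lim_{n \to \infty} a_n\right)^r = 4^{3/2} = 8$. It only inspects finitely many terms and therefore illustrates the statement without proving it.

```python
def a(n):
    """Illustrative convergent sequence of nonnegative numbers with limit 4."""
    return 4.0 + 1.0 / n

r = 1.5               # the rational exponent 3/2
limit_a = 4.0
target = limit_a ** r  # (lim a_n)^r

# Distance between a_n^r and (lim a_n)^r for growing n.
errors = [abs(a(n) ** r - target) for n in (10, 100, 1000, 10000)]
print(target, errors)
```

The distances shrink roughly like $\frac{3}{n}$, which is consistent with $a_n^r$ converging to $8$.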
We conclude this section by showing that on closed and bounded intervals continuous functions assume an absolute minimum and an absolute maximum.
Definition 3.43 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a real valued function. The number $y_m$ is called the absolute minimum value of $f$ (in $I$) if and only if there is an $x_m \in I$ with $f(x_m) = y_m$ and for all $x \in I$ we have $f(x) \ge f(x_m)$. The number $y_M$ is called the absolute maximum value of $f$ (in $I$) if and only if there is an $x_M \in I$ with $f(x_M) = y_M$ and for all $x \in I$ we have $f(x) \le f(x_M)$. A value that is the absolute maximum or the absolute minimum is also called an absolute extremum.
Theorem 3.44 Let $f : [a, b] \to \mathbb{R}$ be continuous. Then there is an $x \in [a, b]$ such that, for all $z \in [a, b]$ we have $f(x) \ge f(z)$.
Proof. First we show that $f$ is bounded above on $[a, b]$. For a contradiction, suppose that for every $n \in \mathbb{N}$ there is an $x_n \in [a, b]$ such that $f(x_n) \ge n$. Then by the Bolzano-Weierstrass Theorem there is a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ with limit $x \in [a, b]$. But then for all $k$ we have $f(x_{n_k}) \ge n_k$ while at the same time $\lim_{k \to \infty} f(x_{n_k}) = f(x) < \infty$, which is not possible.
Thus $f$ is bounded above. Let $M := \sup\{f(z) : z \in [a, b]\}$. For each $n \in \mathbb{N}$ find $x_n \in [a, b]$ with $f(x_n) \ge M - \frac{1}{n}$. Again by the Bolzano-Weierstrass Theorem there is a convergent subsequence $\{x_{n_k}\}_{k=1}^{\infty}$ with limit $x \in [a, b]$. For all $k$ we have $M - \frac{1}{n_k} \le f(x_{n_k}) \le M$, which means by the Squeeze Theorem that
$$f(x) = \lim_{k \to \infty} f(x_{n_k}) = M = \sup\{f(z) : z \in [a, b]\}.$$ ∎
Exercises
3-33. Prove that $c \neq b$ in the Intermediate Value Theorem.
3-34. Prove that if $f : A \to B$ is bijective and $g, h : B \to A$ are inverses of $f$, then $g(b) = h(b)$ for all $b \in B$.
3-35. Prove Corollary 3.40.
3-36. Prove Corollary 3.41.
3-37. Prove Theorem 3.42.
3-38. Let $f : [a, b] \to \mathbb{R}$ be continuous. Prove that there is an $x \in [a, b]$ such that for all $z \in [a, b]$ the inequality $f(x) \le f(z)$ holds.
3-39. Let $I \subseteq \mathbb{R}$ be an interval. Give an example that shows that even if $f : I \to \mathbb{R}$ is continuous, $f[I]$ need not be bounded. Then explain why this example does not contradict Theorem 3.44.
3-40. Let $I$ be an interval and let $f : I \to \mathbb{R}$ be continuous and injective. Prove that $f$ is either increasing (that is, for all $x_1 < x_2$ we have $f(x_1) < f(x_2)$) or decreasing (that is, for all $x_1 < x_2$ we have $f(x_1) > f(x_2)$).
3-41. Although the contrapositive is often used to clarify a mathematical statement, some contrapositives can be quite confusing. Determine if the statement "If $f(c) \neq 0$ for all $c \in [a, b]$, then $f(a) > 0$ or $f(b) \le 0$ or $f$ is not continuous on $[a, b]$." is true or false by analyzing its (much simpler) contrapositive.
3.6 Limits at Infinity
For functions defined on an interval $(t, \infty)$ it is sensible to investigate the behavior as
the argument $x$ gets large. The resulting notion of a limit at infinity is set up in exactly
the same way as the other limits discussed in this chapter. Thus it is not surprising that
similar laws hold.
Definition 3.45 Let $L$ be a real number and let $f : (t, \infty) \to \mathbb{R}$ be a function. We
say $f$ converges to the limit $L$ at $\infty$ and write $\lim_{z\to\infty} f(z) = L$ iff for every sequence
$\{z_n\}_{n=1}^{\infty}$ in $(t, \infty)$ such that $\lim_{n\to\infty} z_n = \infty$ we have $\lim_{n\to\infty} f(z_n) = L$.
Limits at $-\infty$ are defined similarly. For functions with domains that contain intervals $(t, \infty)$ or $(-\infty, t)$ the limits at $\pm\infty$ are defined as the limits of the appropriate
restriction.
Theorem 3.46 Limit laws for limits of functions at $\infty$. Let $f, g : (t, \infty) \to \mathbb{R}$ be
functions so that $\lim_{z\to\infty} f(z)$ and $\lim_{z\to\infty} g(z)$ exist. Then the following hold.
1. $\lim_{z\to\infty} (f + g)(z) = \lim_{z\to\infty} f(z) + \lim_{z\to\infty} g(z)$.
2. $\lim_{z\to\infty} (f - g)(z) = \lim_{z\to\infty} f(z) - \lim_{z\to\infty} g(z)$.
3. $\lim_{z\to\infty} (f \cdot g)(z) = \lim_{z\to\infty} f(z) \cdot \lim_{z\to\infty} g(z)$.
4. $\lim_{z\to\infty} \left(\frac{f}{g}\right)(z) = \frac{\lim_{z\to\infty} f(z)}{\lim_{z\to\infty} g(z)}$ if $\lim_{z\to\infty} g(z) \ne 0$.
Each equation implicitly asserts that if the right side exists, so does the left side
and in this case the equality holds.
Proof. Exercise 3-42.
Theorem 3.47 $\varepsilon$-$M$ formulation for the limit of a function at infinity. The function
$f : (t, \infty) \to \mathbb{R}$ converges to the limit $L$ at infinity if and only if for every $\varepsilon > 0$ there
is an $M$ such that for all $z \ge M$ we have $|f(z) - L| < \varepsilon$.
Proof. Exercise 3-43.
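The $\varepsilon$-$M$ formulation can be tried out numerically. The following is a minimal sketch, not a proof: the function $f(z) = (2z+1)/z$ on $(0, \infty)$, the limit $L = 2$ and the helper names `threshold` and `check` are assumptions chosen for illustration and do not come from the text. Because $|f(z) - 2| = 1/z$ for $z > 0$, every $M > 1/\varepsilon$ works.

```python
def f(z):
    # Hypothetical example function (an assumption, not from the text).
    return (2.0 * z + 1.0) / z

L = 2.0  # the limit of f at infinity


def threshold(eps):
    # |f(z) - 2| = 1/z for z > 0, so any M > 1/eps works; we take M = 1/eps + 1.
    return 1.0 / eps + 1.0


def check(eps, samples=1000):
    # Test |f(z) - L| < eps at finitely many points z >= M on a geometric grid.
    # A finite check can only illustrate the definition; it cannot prove it.
    M = threshold(eps)
    return all(abs(f(M * 1.01 ** k) - L) < eps for k in range(samples))


for eps in (0.1, 1e-3, 1e-6):
    print(eps, threshold(eps), check(eps))
```

The choice of $M$ depends on $\varepsilon$ only, which is exactly the quantifier order in Theorem 3.47.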
Similar results hold for limits at $-\infty$. Moreover, infinite limits at $\pm\infty$ can be
defined similar to infinite limits at a finite number and similar limit laws hold. By
now, the reader is sufficiently familiar with the underlying ideas to formulate these
definitions independently (see Exercise 3-44).
Exercises
3-42. Prove Theorem 3.46.
(a) Prove part 1 of Theorem 3.46.
(b) Prove part 2 of Theorem 3.46.
( c ) Prove part 3 of Theorem 3.46.
(d) Prove part 4 of Theorem 3.46.
3-43. Prove Theorem 3.47.
3-44. Infinite limits at infinity.
(a) State the definition of $\lim_{z\to\infty} f(z) = \infty$.
(b) State limit laws for infinite limits at $\infty$.
(c) State and prove a result similar to Theorem 3.47 for infinite limits at infinity.
3-45. Explain why we do not define left and right limits at infinity.
Chapter 4
Differentiable Functions
Geometrically speaking, differentiable functions have unbroken graphs without
corners. This “smoothness” of differentiable functions is useful in applications. In
the present chapter differentiability is introduced in Section 4.1, the relation between
differentiability and the common operations on functions is considered in Section 4.2,
and some geometric consequences of differentiability are provided in Section 4.3.
4.1 Differentiability
The derivative provides the slope of the tangent line, if the function has a tangent line
at the point. Definition 4.1 encodes this idea by demanding that as we fix one point and
move the other one closer to this “base point,” the slopes of the secant lines through
these points approach a limit. The left part of Figure 9 shows that this convergence of
the slopes means that the secant lines tilt towards a “limiting line,” the tangent line. It
should be noted that differentiability typically is considered on open intervals so that
secant lines “in both directions” can be used to obtain the limit.
Definition 4.1 A function $f : (a,b) \to \mathbb{R}$ is differentiable at $x \in (a,b)$ iff the limit
$\lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ exists. In this case we set $f'(x) := \lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ and call it the
derivative of $f$ at $x$. Other notations for the derivative at $x$ are $\frac{d}{dx} f(x)$ and $Df(x)$.
Similar to limits of functions, if $D \subseteq \mathbb{R}$, a function $f : D \to \mathbb{R}$ is called differentiable at $x \in D$ iff there is an open interval $(a,b) \subseteq D$ so that $x \in (a,b)$ and so that
$f|_{(a,b)}$ is differentiable at $x$. Moreover, similar to what we did so far, we will mostly
work with functions defined on open intervals, trusting that the reader can make the
jump to larger domains.
Example 4.2 The following are verified with routine computations (Exercise 4-1).
1. At every $x \in \mathbb{R}$ the function $f(x) = x$ is differentiable with $f'(x) = 1$.
2. The function $f(x) = |x|$ is not differentiable at $x = 0$.
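The two claims can be made plausible by tabulating difference quotients. The sketch below is only an illustration under stated assumptions: the helper name `difference_quotient`, the base point $x = 3$ for the identity and the step sizes $10^{-k}$ are arbitrary choices that do not come from the text.

```python
def difference_quotient(f, x, z):
    """Slope of the secant line through (x, f(x)) and (z, f(z)); requires z != x."""
    return (f(z) - f(x)) / (z - x)


def identity(t):
    return t


steps = [10.0 ** (-k) for k in range(1, 8)]

# f(x) = x at the (arbitrarily chosen) base point x = 3: every secant slope is 1.
slopes_identity = [difference_quotient(identity, 3.0, 3.0 + h) for h in steps]

# f(x) = |x| at x = 0: secant slopes from the right and from the left.
right = [difference_quotient(abs, 0.0, h) for h in steps]
left = [difference_quotient(abs, 0.0, -h) for h in steps]

print(slopes_identity)
print(right)
print(left)
```

The slopes from the right and from the left stay at two different values, so no single limit of the difference quotients at $0$ can exist for the absolute value function.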
As with continuity, the local property of differentiability can be demanded at every
point of the domain to obtain a global definition.
Definition 4.3 Let $D \subseteq \mathbb{R}$. A function $f : D \to \mathbb{R}$ is differentiable, or differentiable
on $D$, iff it is differentiable at every $x \in D$.
Continuity at $x$ means that $\lim_{z\to x} f(z) - f(x) = 0$. Theorem 4.4 affirms that convergence of the quotient $\frac{f(z) - f(x)}{z - x}$ to a number is a stronger condition.
Theorem 4.4 Let $f : (a,b) \to \mathbb{R}$ be differentiable at $x$. Then $f$ is continuous at
$x$. Hence, every differentiable function is continuous. However, not every continuous
function is differentiable.
Proof. Let $f$ be differentiable at $x$. Then $\lim_{z\to x} \frac{f(z) - f(x)}{z - x}$ exists. This implies
$$0 = \lim_{z\to x}(z - x) \cdot \lim_{z\to x} \frac{f(z) - f(x)}{z - x} = \lim_{z\to x} (z - x)\frac{f(z) - f(x)}{z - x} = \lim_{z\to x} f(z) - f(x),$$
that is, $\lim_{z\to x} f(z) = \lim_{z\to x} (f(z) - f(x)) + \lim_{z\to x} f(x) = 0 + f(x) = f(x)$, and $f$ is continuous at $x$.
To see that not every continuous function is differentiable, consider $f(x) = |x|$,
which is continuous, but it is not differentiable at $x = 0$.
Ultimately, we will generalize differentiability to higher dimensions. In higher dimensions, division is not possible, but it is possible to define entities that are similar to
tangent lines. Theorem 4.5 shows the difference between differentiability at $x$ and continuity at $x$ without using division. A differentiable function $f$ can be approximated
by a straight line $g(z) = f(x) + f'(x)(z - x)$ through $(x, f(x))$ in such a way that
the difference between $f$ and $g$ goes to zero faster than $|z - x|$. Geometrically, this
means (see Exercise 4-2a and Figure 9(b)) that, no matter how small the width, near $x$
the differentiable function $f$ will enter all "cones" which are centered at $(x, f(x))$ and
symmetric about the line $g$. This idea ultimately leads to the definition of differentiability in higher dimensional spaces (see Definition 17.24). For a continuous function
$f$, the difference between $f$ and any straight line through $(x, f(x))$ goes to zero (see
Exercise 4-2c), but the function need not enter arbitrarily narrow cones about the line.
Figure 9: Two ways to view differentiability. In (a), as in Definition 4.1, the slopes
of the secant lines through $(x, f(x))$ and $(z, f(z))$ approach a number, which is the
slope of the tangent line. In (b), as in Theorem 4.5 and Exercise 4-2a, the function and
a certain straight line, the tangent line, are such that for any width $\varepsilon > 0$, near $x$ the
function will ultimately enter the "cone of width $\varepsilon$" about the tangent line.
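Theorem 4.5 Let $f : (a,b) \to \mathbb{R}$ be a function and let $x \in (a,b)$. Then $f$ is differentiable at $x$ iff there is an $L \in \mathbb{R}$ so that for every $\varepsilon > 0$ there is a $\delta > 0$ such that for
all $z \ne x$ with $|z - x| < \delta$ we have $|f(z) - f(x) - L(z - x)| < \varepsilon |z - x|$. Moreover,
in this case $f'(x) = L$.
Proof. For the direction "$\Rightarrow$", let $f$ be differentiable at $x$, let $L := f'(x)$ and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $z \ne x$ with $|z - x| < \delta$ we have
$\left| \frac{f(z) - f(x)}{z - x} - L \right| < \varepsilon$.
Multiplying both sides by $|z - x|$ turns this inequality into $|f(z) - f(x) - L(z - x)| < \varepsilon |z - x|$.
For the direction "$\Leftarrow$", dividing the inequality $|f(z) - f(x) - L(z - x)| < \varepsilon |z - x|$ by $|z - x| > 0$ shows that for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \ne x$ with $|z - x| < \delta$ we have $\left| \frac{f(z) - f(x)}{z - x} - L \right| < \varepsilon$. Hence
$L = \lim_{z\to x} \frac{f(z) - f(x)}{z - x}$
and $f$ is differentiable at $x$ with $f'(x) = L$.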
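The inequality in Theorem 4.5 says that the ratio $|f(z) - f(x) - L(z - x)| / |z - x|$ can be made smaller than any $\varepsilon$ near $x$. The following sketch illustrates this under assumptions that are not from the text: the function $f(z) = z^2$, the base point $x = 1$ and the helper name `cone_ratio` are hypothetical choices. For the correct slope $L = 2$ the ratio equals $|z - 1|$ and shrinks; for the wrong slope $L = 3$ it stays near $1$.

```python
def cone_ratio(f, x, L, z):
    """|f(z) - f(x) - L (z - x)| / |z - x| for z != x."""
    return abs(f(z) - f(x) - L * (z - x)) / abs(z - x)


def square(t):
    # Hypothetical example function (an assumption, not from the text).
    return t * t


offsets = [10.0 ** (-k) for k in range(1, 7)]

# With L = f'(1) = 2 the ratio tends to zero as z approaches 1.
good = [cone_ratio(square, 1.0, 2.0, 1.0 + h) for h in offsets]

# With the wrong slope L = 3 the ratio does not tend to zero.
bad = [cone_ratio(square, 1.0, 3.0, 1.0 + h) for h in offsets]

for h, r_good, r_bad in zip(offsets, good, bad):
    print(h, r_good, r_bad)
```

Only one slope $L$ can keep the function inside every cone, which is why the theorem identifies that $L$ with $f'(x)$.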
Exercises
4-1.
(a) Prove that the function $f(x) = x$ is differentiable at every $x \in \mathbb{R}$ and that $f'(x) = 1$.
(b) Prove that the function $f(x) = |x|$ is not differentiable at $x = 0$.
4-2. Differentiability versus continuity.
(a) Let $f : (a,b) \to \mathbb{R}$. Prove that $f$ is differentiable at $x$ iff there is an $L \in \mathbb{R}$ so that for every
$\varepsilon > 0$ there is a $\delta > 0$ so that for all $z$ with $|z - x| < \delta$ we have
$$\big(f(x) + L(z - x)\big) - \varepsilon |z - x| \le f(z) \le \big(f(x) + L(z - x)\big) + \varepsilon |z - x|.$$
Also prove that in this case $f'(x) = L$. Use this result to explain the right part of Figure 9.
(b) Prove that for any $m \in \mathbb{R}$ we have $\lim_{z\to 0} |z| - mz = 0$, even though $f(x) = |x|$ is not differentiable at $x = 0$.
(c) Let $f : (a,b) \to \mathbb{R}$ be a function that is continuous at $x \in (a,b)$ and let $m \in \mathbb{R}$. Prove that
$\lim_{z\to x} \big| f(z) - [f(x) + m(z - x)] \big| = 0$.
4-3. For $f : (a,b) \to \mathbb{R}$ and $x \in (a,b)$ we define the left-sided derivative of $f$ at $x$ via the left limit
$D^l f(x) := \lim_{z\to x^-} \frac{f(z) - f(x)}{z - x}$ if the limit exists and we define the right-sided derivative of $f$ at
$x$ via the right limit $D^r f(x) := \lim_{z\to x^+} \frac{f(z) - f(x)}{z - x}$ if the limit exists.
(a) Prove that $f$ is differentiable at $x$ iff the left-sided and right-sided derivatives exist at $x$ and
$D^l f(x) = D^r f(x)$.
(Compare with Standard Proof Technique 3.20.)
(b) Prove that for $f(x) = |x|$ we have that $D^r f(0) = 1$ and $D^l f(0) = -1$.
4.2 Differentiation Rules
The results in this section show how differentiability relates to the algebraic operations
for functions as well as to composition. So far, whenever limits were concerned, we
worked with estimates that ultimately made certain differences smaller than $\varepsilon$. However, we can also use the available limit theorems and we will do so in this section.
The following computations reemphasize the fact that facility in symbolic computation
remains important in abstract mathematics.
Theorem 4.6 Let the functions $f : (a,b) \to \mathbb{R}$ and $g : (a,b) \to \mathbb{R}$ both be differentiable at $x \in (a,b)$ and let $c$ be a real number. Then the functions $f + g$, $f - g$ and
$cf$ are all differentiable at $x$ and
$$(f + g)'(x) = f'(x) + g'(x),$$
$$(f - g)'(x) = f'(x) - g'(x),$$
$$(cf)'(x) = c f'(x).$$
Proof. We use the definition of the derivative with the goal of extracting the difference quotients for $f$ and $g$, respectively.
$$(f + g)'(x) = \lim_{z\to x} \frac{(f + g)(z) - (f + g)(x)}{z - x} = \lim_{z\to x} \frac{f(z) + g(z) - f(x) - g(x)}{z - x}$$
$$= \lim_{z\to x} \frac{f(z) - f(x)}{z - x} + \lim_{z\to x} \frac{g(z) - g(x)}{z - x} = f'(x) + g'(x).$$
The remainder of the proof is left to the reader as Exercises 4-4 and 4-5.
Derivatives are also well-behaved when products, quotients, powers and compositions are involved. We first prove the quotient rule, which is a bit harder than the
product rule and we leave the proof of the product rule as an exercise.
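Theorem 4.7 Quotient Rule. Let the functions $f, g : (a,b) \to \mathbb{R}$ be differentiable
at $x \in (a,b)$ and let $g(x) \ne 0$. Then the quotient $\frac{f}{g}$ is differentiable at $x$ with
$$\left(\frac{f}{g}\right)'(x) = \frac{f'(x) g(x) - g'(x) f(x)}{\big(g(x)\big)^2}.$$
Proof. Similar to the proof of Theorem 4.6 we compute the limit of the difference
quotients. The computations are a bit more involved than before. Because $g$ is continuous at $x$ (Theorem 4.4) and $g(x) \ne 0$, we have $g(z) \ne 0$ for all $z$ sufficiently close to $x$ and $\lim_{z\to x} g(z) = g(x)$.
$$\lim_{z\to x} \frac{\frac{f}{g}(z) - \frac{f}{g}(x)}{z - x} = \lim_{z\to x} \frac{f(z) g(x) - f(x) g(z)}{g(z) g(x) (z - x)}$$
$$= \lim_{z\to x} \frac{1}{g(z) g(x)} \left( \frac{f(z) - f(x)}{z - x}\, g(x) - f(x)\, \frac{g(z) - g(x)}{z - x} \right) = \frac{f'(x) g(x) - g'(x) f(x)}{\big(g(x)\big)^2}.$$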
[Figure 10 diagram label: $\Delta A = \Delta f \cdot g + \Delta g \cdot f + \Delta f \Delta g$.]
Figure 10: Visualization of the Product Rule. The growth rate of a rectangle with side
lengths $f$ and $g$ can be obtained from the picture above by dividing the formula for
$\Delta A$ by $\Delta t$ and letting $\Delta t$ go to zero. It is $A' = f'g + g'f$. The term $\frac{\Delta f \Delta g}{\Delta t}$ does
not contribute to the rate because in its numerator two quantities are going to zero. The
proof of the product rule (Exercise 4-6) makes this idea more precise.
Theorem 4.8 Product Rule. Let $f, g : (a,b) \to \mathbb{R}$ be differentiable at $x \in (a,b)$.
Then $fg$ is differentiable at $x$ with $(fg)'(x) = f'(x) g(x) + g'(x) f(x)$. (For a visualization, consider Figure 10.)
Proof. Exercise 4-6.
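The rectangle picture behind the Product Rule can be checked numerically. The sketch below splits the growth of the area $A = fg$ over a small step $\Delta t$ into the part $(\Delta f \cdot g + \Delta g \cdot f)/\Delta t$ and the corner term $\Delta f \Delta g / \Delta t$. The side lengths $f(t) = t^2$ and $g(t) = t^3$, the point $t = 1$ and the name `area_growth` are assumptions made for illustration only.

```python
def f(t):
    # Hypothetical side length (an assumption, not from the text).
    return t ** 2


def g(t):
    # Hypothetical side length (an assumption, not from the text).
    return t ** 3


def area_growth(t, dt):
    """Split (A(t + dt) - A(t)) / dt for A = f * g into the main part and the corner term."""
    df = f(t + dt) - f(t)
    dg = g(t + dt) - g(t)
    main = (df * g(t) + dg * f(t)) / dt  # approaches f'(t) g(t) + g'(t) f(t)
    corner = (df * dg) / dt              # approaches 0
    return main, corner


for dt in (1e-2, 1e-4, 1e-6):
    main, corner = area_growth(1.0, dt)
    print(dt, main, corner)
```

At $t = 1$ the Product Rule predicts $f'(1)g(1) + g'(1)f(1) = 2 + 3 = 5$; the main part approaches this value while the corner term shrinks proportionally to $\Delta t$.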
Now that products and quotients are taken care of, we can consider powers. The
Power Rule could have been proved earlier in a more direct fashion, but the present
proof is faster.
Theorem 4.9 Power Rule. For every integer $n \ne 0$, the function $f(x) = x^n$ is differentiable with $\frac{d}{dx} x^n = n x^{n-1}$ at every $x \in \mathbb{R}$ for which the right side is defined.
Proof. For $n > 0$, we use induction on the exponent $n$. For the base step with
$n = 1$, note that $\frac{d}{dx} x^1 = \lim_{z\to x} \frac{z - x}{z - x} = 1$ for all $x \in \mathbb{R}$.
For the induction step $n \to (n + 1)$, we need to prove $\frac{d}{dx} x^{n+1} = (n + 1) x^n$ for all
$x \in \mathbb{R}$ and we can use the induction hypothesis that $\frac{d}{dx} x^n = n x^{n-1}$. Via the Product
Rule we obtain $\frac{d}{dx} x^{n+1} = \frac{d}{dx} (x \cdot x^n) = 1 \cdot x^n + n x^{n-1} \cdot x = (n + 1) x^n$.
This establishes the power rule for positive integer exponents $n$. For any $m \in \mathbb{N}$,
we have $-m < 0$ and we can differentiate $x^{-m}$ as follows. For all $x \ne 0$ the Quotient Rule gives
$$\frac{d}{dx} x^{-m} = \frac{d}{dx} \frac{1}{x^m} = \frac{0 \cdot x^m - m x^{m-1} \cdot 1}{x^{2m}} = -m x^{-m-1}.$$
[Figure 11 diagram labels: speed 1 in the $x$-direction; slope $g'(x)$; speed $g'(x)$ in the $g$-direction; slope $f'(g(x))$; speed $f'(g(x)) g'(x)$ in the $f$-direction. The output of $g$ becomes the input of $f$. Position and speed are preserved.]
Figure 11: Visualization of the chain rule. The derivative can also be understood as
a magnification factor for speeds. If a particle at $x$ moves at unit speed along the
horizontal axis, then its image particle under $g$ moves at speed $g'(x)$ along the vertical
axis. Now, if a particle at $g(x)$ moves at speed $g'(x)$ along the horizontal axis then its
image particle moves at speed $f'(g(x)) g'(x)$ along the vertical axis.
The above shows that the power rule holds for all nonzero integers. Now that we
have the power rule, the last computation can only be viewed as a clumsy way to compute the derivative of powers with negative exponents. However, in this proof we had
no choice, because we were proving the very rule that abbreviates this computation.
We will revisit the Power Rule in Theorems 4.22 and 12.10. With algebraic operations taken care of, we turn our attention to composition. Figure 11 shows a kinematic
way to explain the Chain Rule.
Theorem 4.10 Chain Rule. Let $g : (a,b) \to \mathbb{R}$ and $f : (c,d) \to \mathbb{R}$ be functions
with $g[(a,b)] \subseteq (c,d)$ and let $x \in (a,b)$ be such that $g$ is differentiable at $x$ and $f$ is
differentiable at $g(x)$. Then $f \circ g : (a,b) \to \mathbb{R}$ is differentiable at $x$ and the derivative
is $(f \circ g)'(x) = f'(g(x)) g'(x)$.
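Proof. First, consider the case that there is no sequence $\{z_n\}_{n=1}^{\infty}$ with $\lim_{n\to\infty} z_n = x$
and $z_n \ne x$ and $g(z_n) = g(x)$ for all $n \in \mathbb{N}$. In this case, we proceed as follows.
$$\lim_{z\to x} \frac{f \circ g(z) - f \circ g(x)}{z - x} = \lim_{z\to x} \frac{f(g(z)) - f(g(x))}{g(z) - g(x)} \cdot \frac{g(z) - g(x)}{z - x}$$
$$= \lim_{u\to g(x)} \frac{f(u) - f(g(x))}{u - g(x)} \cdot \lim_{z\to x} \frac{g(z) - g(x)}{z - x} = f'(g(x)) g'(x).$$
This leaves the case in which there is a sequence $\{z_n\}_{n=1}^{\infty}$ with $\lim_{n\to\infty} z_n = x$ and
$z_n \ne x$ and $g(z_n) = g(x)$ for all $n \in \mathbb{N}$. In this case, $g'(x) = \lim_{n\to\infty} \frac{g(z_n) - g(x)}{z_n - x} = 0$.
Because $f$ is differentiable at $g(x)$, there is a $\nu > 0$ so that for all $u \ne g(x)$
with $|u - g(x)| < \nu$ we have $\left| \frac{f(u) - f(g(x))}{u - g(x)} - f'(g(x)) \right| < 1$, and hence (by the
reverse triangular inequality) $\left| \frac{f(u) - f(g(x))}{u - g(x)} \right| < |f'(g(x))| + 1$. Moreover, for all
$\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \ne x$ with $|z - x| < \delta$ we have that
$\left| \frac{g(z) - g(x)}{z - x} \right| < \frac{\varepsilon}{|f'(g(x))| + 1}$ (because $g'(x) = 0$) and $|g(z) - g(x)| < \nu$ (by
Theorem 4.4). Formally, we would need to find a $\delta$ so that the first inequality holds and
another so that the second one holds and then use the minimum of the two. We used a
simple modification of Standard Proof Technique 2.6 to abbreviate this step. Therefore
for all $z \ne x$ with $|z - x| < \delta$ we obtain the following. In case $g(z) = g(x)$, we have
$\frac{f(g(z)) - f(g(x))}{z - x} = 0$. In case $g(z) \ne g(x)$, we have $|g(z) - g(x)| < \nu$, and hence
$$\left| \frac{f(g(z)) - f(g(x))}{z - x} \right| = \left| \frac{f(g(z)) - f(g(x))}{g(z) - g(x)} \right| \cdot \left| \frac{g(z) - g(x)}{z - x} \right| < \big( |f'(g(x))| + 1 \big) \frac{\varepsilon}{|f'(g(x))| + 1} = \varepsilon.$$
Because $\varepsilon$ was arbitrary we conclude
$$(f \circ g)'(x) = \lim_{z\to x} \frac{f(g(z)) - f(g(x))}{z - x} = 0 = f'(g(x)) g'(x),$$
and the proof is complete.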
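The second case of the proof is not vacuous: the inner function can return to the value $g(x)$ infinitely often near $x$. A standard example, chosen here as an assumption and not taken from the text, is $g(x) = x^2 \sin(1/x)$ with $g(0) = 0$, which satisfies $g(1/(n\pi)) = 0 = g(0)$ and $g'(0) = 0$. With the hypothetical outer function $f(u) = u^2 + 3u$ the sketch below tabulates the difference quotients of $f \circ g$ at $0$, which should approach $f'(g(0))\,g'(0) = 3 \cdot 0 = 0$.

```python
import math


def g(x):
    # Hypothetical inner function: returns to g(0) = 0 at every point 1/(n*pi).
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0


def f(u):
    # Hypothetical outer function with f'(0) = 3.
    return u * u + 3.0 * u


def composite_quotient(z):
    """Difference quotient of f o g at the base point 0; requires z != 0."""
    return (f(g(z)) - f(g(0.0))) / z


for k in range(1, 7):
    z = 10.0 ** (-k)
    print(z, composite_quotient(z), composite_quotient(-z))

# Points where g(z) = g(0) up to rounding, so the naive factorization
# of the difference quotient would divide by (nearly) zero there.
zero_points = [1.0 / (n * math.pi) for n in range(1, 11)]
print([g(z) for z in zero_points])
```

Since $|g(z)| \le z^2$, the quotients are bounded by $(z^2 + 3)|z|$, in line with the estimate used in the proof.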
With compositions taken care of, it would be natural to also consider inverse functions. Because we need Rolle's Theorem to dispose of a technicality, consideration of
inverse functions is postponed to Theorem 4.21.
Example 4.11 Maxima and minima do not preserve differentiability.
Even though $f(x) = x$ and $g(x) = -x$ are both differentiable, the functions
$\max\{f, g\}(x) = |x|$ and $\min\{f, g\}(x) = -|x|$ are not differentiable at $0$. Thus, while
differentiability is compatible with the natural algebraic operations for functions, it is
not compatible with the natural order-theoretical operations for functions.
If a function $f : (a,b) \to \mathbb{R}$ is differentiable on $(a,b)$, then the derivative $f'$ is a
function in its own right. Hence, we can consider continuity and differentiability for
$f'$.
Definition 4.12 Let $D \subseteq \mathbb{R}$ and let $f : D \to \mathbb{R}$ be a function. Then $f$ is called continuously differentiable iff it is differentiable on $D$ and the derivative $f' : D \to \mathbb{R}$
is continuous. The function $f$ is also often considered to be its own "zeroth derivative" $f^{(0)} := f$. The function $f$ is called $n$ times differentiable iff it is $n - 1$ times
differentiable and its $(n-1)$st derivative $f^{(n-1)} : D \to \mathbb{R}$ is differentiable. The $n$th
derivative of $f$ is $f^{(n)}(x) := \frac{d}{dx} f^{(n-1)}(x)$, also denoted $\frac{d^n}{dx^n} f$. The function $f$ is
called $n$ times continuously differentiable iff it is $n$ times differentiable and its $n$th
derivative $f^{(n)} : D \to \mathbb{R}$ is continuous. Finally, $f$ is called infinitely differentiable
iff for all $n \in \mathbb{N}$ it is $n$ times differentiable.
The differentiation rules we have derived here immediately show the following.
Example 4.13 Polynomials and rational functions are infinitely differentiable on their
domains.
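For polynomials the claim can be made concrete: by Theorem 4.6 and the Power Rule the derivative of a polynomial is again a polynomial, so differentiation can be repeated indefinitely. The sketch below represents a polynomial by its coefficient list; this representation and the names `derivative` and `nth_derivative` are assumptions for illustration and do not come from the text.

```python
def derivative(coeffs):
    """Map the coefficients [c0, c1, ..., cn] of c0 + c1 x + ... + cn x^n
    to the coefficients of the derivative c1 + 2 c2 x + ... + n cn x^(n-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def nth_derivative(coeffs, n):
    """Apply the derivative n times; the empty list stands for the zero polynomial."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs


# Hypothetical example: p(x) = 1 + 2x + 3x^3.
p = [1, 2, 0, 3]
for n in range(6):
    print(n, nth_derivative(p, n))
```

Each differentiation lowers the degree by one, so all derivatives beyond the degree are the zero polynomial, which is itself differentiable.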
Example 4.14 For every $n \in \mathbb{N}$, the function $f(x) =$ [formula missing in the source] is $n$
times continuously differentiable, but it is not $(n + 1)$-times differentiable. (Exercise
4-7.)
Exercises
4-4. Prove that if $f, g : (a,b) \to \mathbb{R}$ are both differentiable at $x \in (a,b)$, then $f - g$ is differentiable at $x$
and $(f - g)'(x) = f'(x) - g'(x)$.
4-5. Prove that if $f : (a,b) \to \mathbb{R}$ is differentiable at $x \in (a,b)$ and $c \in \mathbb{R}$, then $cf$ is differentiable at $x$
and $(cf)'(x) = c f'(x)$.
4-6. Prove the product rule.
Hint. It's similar to the proof of the quotient rule, but simpler.
4-7. Prove the claim in Example 4.14.
Hint. Use induction and at $x = 0$ use Exercise 4-3.
4-8. Prove that $f : (a,b) \to \mathbb{R}$ is differentiable at $x \in (a,b)$ iff the limit $\lim_{h\to 0} \frac{f(x + h) - f(x)}{h}$ exists
and that in this case $f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}$.
4-9. Let $n \in \mathbb{N}$. Use the Binomial Theorem and Exercise 4-8 to prove the Power Rule for $f(x) = x^n$
without using induction.
4-10. Prove that the derivative of $f(x) = \sqrt{x}$ is $f'(x) = \frac{1}{2\sqrt{x}}$.
4-11. Compute the derivative of each of the following functions.
(b) f ( x ) = (x2
+ 5>1 d
x (You may use Exercise 4-10.)
(c) f ( x ) =
4-12. Use induction to prove that $\frac{d^n}{dx^n} \frac{1}{x} = (-1)^n n!\, x^{-n-1}$ for all $x \ne 0$.
4-13. Prove that if $f, g : (a,b) \to \mathbb{R}$ are both $n$ times differentiable, then the product $fg$ is $n$ times
differentiable and $(fg)^{(n)} = \sum_{k=0}^{n} \binom{n}{k} f^{(k)} g^{(n-k)}$.
Hint. Mimic the proof of the Binomial Theorem.
4.3 Rolle's Theorem and the Mean Value Theorem
One of the main applications of derivatives is to use the sign of the derivative to compute relative extrema of a function and intervals where a function is increasing or decreasing. The formal justification follows from Rolle's Theorem and the Mean Value
Theorem.
Definition 4.15 Let $f : (a,b) \to \mathbb{R}$ be a function. Then $f$ is said to have a relative (or local) minimum at $x_m$ iff there is a $\delta > 0$ such that $f(x_m) \le f(x)$ for all
$x \in (x_m - \delta, x_m + \delta)$. $f$ is said to have a relative (or local) maximum at $x_M$ if and
only if there is a $\delta > 0$ such that $f(x_M) \ge f(x)$ for all $x \in (x_M - \delta, x_M + \delta)$. If $f$ has
a local maximum or a local minimum at the point $c$ we also say that $f$ has a relative
(or local) extremum at $c$.
Intuitively, relative extrema are the locally highest or lowest points of the graph.
(Note, however, that stagnation also is possible, see Exercise 4-14.) At the location of
a relative maximum there cannot be an incline in any direction. Hence, the derivative
should be zero at a relative maximum.
Theorem 4.16 Let $f : (a,b) \to \mathbb{R}$ be a function and let $m \in (a,b)$. If $f$ is differentiable at $m$ and $f$ has a relative maximum at $m$, then $f'(m) = 0$.
Proof. Because $f$ has a relative maximum at $m$ there is a positive number $\delta$ so that
$f(z) - f(m) \le 0$ for all $z$ with $|z - m| < \delta$. We infer that $\lim_{z\to m^+} \frac{f(z) - f(m)}{z - m} \le 0$
and $\lim_{z\to m^-} \frac{f(z) - f(m)}{z - m} \ge 0$. Because $f$ is differentiable at $m$, these two limits must
be equal to $f'(m)$. Therefore $f'(m)$ is greater than or equal to $0$ and smaller than or
equal to $0$. This implies $f'(m) = 0$.
Figure 12: Rolle's Theorem states that if a differentiable function starts and ends at the
same height, then it must have a flat tangent in between (a). The Mean Value Theorem
states that some tangent must be parallel to the secant through the starting point and
the ending point (b).
Rolle’s Theorem states that if a function’s values are equal at the endpoints of
an interval, then the function must have a horizontal tangent line in the interval (see
Figure 12). Note that the proof is a collection of direct proofs (modus ponens), which
is possible because we can rely on strong results that we proved earlier.
Theorem 4.17 Rolle's Theorem. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open
interval $(a,b)$ and continuous on the closed interval $[a,b]$. If $f(a) = f(b)$, then there
is an $m \in (a,b)$ with $f'(m) = 0$.
Proof. Because the result is trivial if $f$ is constant, we can assume $f$ is not constant.
By Theorem 3.44, $f$ assumes an absolute maximum and an absolute minimum on
$[a,b]$. Because $f$ is not constant, one of these values is not equal to $f(a)$. Assume
without loss of generality that the absolute maximum value is greater than $f(a)$. Let
$m \in [a,b]$ be so that $f(m) \ge f(x)$ for all $x \in [a,b]$.
Because $f(m) > f(a) = f(b)$ we infer $m \in (a,b)$. Hence, $f$ is differentiable at
$m$. Because $f(m) \ge f(x)$ for all $x \in (a,b)$, $f$ has a relative maximum at $m$. Now by
Theorem 4.16 we conclude $f'(m) = 0$.
The Mean Value Theorem generalizes Rolle's Theorem by no longer demanding
that the values at the endpoints are equal. It guarantees that there is a point in the
interval so that the tangent line at that point is parallel to the secant line through the
endpoints. In the proof, we subtract the line $l(x) = (x - a) \frac{f(b) - f(a)}{b - a}$ from the
function $f$. This reduces the proof to an application of Rolle's Theorem.
Theorem 4.18 Mean Value Theorem. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the
open interval $(a,b)$ and continuous on the closed interval $[a,b]$. Then there is a number $c \in (a,b)$ so that $\frac{f(b) - f(a)}{b - a} = f'(c)$.
Proof. The function $g(x) := f(x) - (x - a) \frac{f(b) - f(a)}{b - a}$ is continuous on $[a,b]$,
differentiable on $(a,b)$ and $g(b) = f(a) = g(a)$. By Rolle's Theorem there is a $c$ in the
interval $(a,b)$ such that $g'(c) = 0$. But that means $f'(c) - \frac{f(b) - f(a)}{b - a} = g'(c) = 0$,
that is, $f'(c) = \frac{f(b) - f(a)}{b - a}$.
Figure 13: Visualization of Theorem 4.21. Reflection across the diagonal produces the
graph of the inverse function and the slope of the reflected line is the multiplicative
inverse of the slope of the original line.
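The Mean Value Theorem only asserts that a suitable point $c$ exists. In concrete cases $c$ can be located numerically. The following sketch is an illustration under assumptions that are not part of the theorem: it presumes that $f'$ is continuous and that $f'(c) - \frac{f(b) - f(a)}{b - a}$ changes sign between the two starting points, so that bisection applies. The example $f(x) = x^3$ on $[0, 2]$ and the name `mean_value_point` are hypothetical choices.

```python
import math


def mean_value_point(f_prime, slope, lo, hi, tol=1e-12):
    """Bisection for a root of f_prime(c) - slope in [lo, hi].

    Assumes (beyond the hypotheses of the Mean Value Theorem) that f_prime is
    continuous and that f_prime - slope has opposite signs at lo and hi.
    """
    g_lo = f_prime(lo) - slope
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = f_prime(mid) - slope
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid  # the sign change lies in [mid, hi]
        else:
            hi = mid               # the sign change lies in [lo, mid]
    return 0.5 * (lo + hi)


# Hypothetical example: f(x) = x^3 on [0, 2], secant slope (8 - 0) / (2 - 0) = 4.
a, b = 0.0, 2.0
secant_slope = (b ** 3 - a ** 3) / (b - a)
c = mean_value_point(lambda x: 3.0 * x * x, secant_slope, a, b)
print(c, 2.0 / math.sqrt(3.0))
```

For this example the equation $3c^2 = 4$ can be solved by hand, which gives $c = 2/\sqrt{3}$ as a check on the numerical value.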
With the Mean Value Theorem available we can now prove a sufficient criterion for
when a function is increasing or decreasing.
Definition 4.19 Let $I \subseteq \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be a function. Then
$f$ is called (strictly) increasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we have
$f(x_1) < f(x_2)$. $f$ is called (strictly) decreasing on $I$ if and only if for all $x_1 < x_2$ in
$I$ we have $f(x_1) > f(x_2)$.
Moreover, $f$ is called nondecreasing on $I$ if and only if for all $x_1 < x_2$ in $I$ we
have $f(x_1) \le f(x_2)$ and $f$ is called nonincreasing on $I$ if and only if for all $x_1 < x_2$
in $I$ we have $f(x_1) \ge f(x_2)$.
Theorem 4.20 Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and
continuous on the closed interval $[a,b]$. If $f'(x) > 0$ for all $x \in (a,b)$, then $f$ is
increasing on $[a,b]$.
Proof. Let $f'(x) > 0$ for all $x$ in $(a,b)$. Suppose, for a contradiction, that there
are points $x_1, x_2$ in $[a,b]$ such that $x_1 < x_2$ and $f(x_1) \ge f(x_2)$. Then by the Mean
Value Theorem there is a $c$ in the interval $(x_1, x_2)$ (and therefore in $(a,b)$) such that
$f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le 0$, which is a contradiction.
Exercise 4-21 provides a similar criterion for nonincreasing and nondecreasing
functions. Rolle's Theorem also enables us to prove that the inverses of differentiable
functions are differentiable. For a visualization, consider Figure 13.
Theorem 4.21 Let $f : (a,b) \to \mathbb{R}$ be a continuous injective function, let $f^{-1}$ be
its inverse and let $x \in f[(a,b)]$ be such that $f$ is differentiable at $f^{-1}(x)$ with
$f'(f^{-1}(x)) \ne 0$. Then $f^{-1}$ is differentiable at $x$ and $(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}$.
Proof. By Theorem 3.36 $f[(a,b)]$ is an interval and by Theorem 4.16 $x$ is neither its maximum nor its minimum. Thus there is a $\delta > 0$ so that the containment
$(x - \delta, x + \delta) \subseteq f[(a,b)]$ holds. Hence, we can talk about differentiability of the
function $f^{-1}$ at $x$.
Let $\{z_n\}_{n=1}^{\infty}$ be a sequence in $f[(a,b)] \setminus \{x\}$ so that $\lim_{n\to\infty} z_n = x$. By Definition 3.1,
we are done if we can show that $\lim_{n\to\infty} \frac{f^{-1}(z_n) - f^{-1}(x)}{z_n - x} = \frac{1}{f'(f^{-1}(x))}$. For each
$n \in \mathbb{N}$ let $y_n := f^{-1}(z_n)$ and let $y = f^{-1}(x)$. Then $y_n = f^{-1}(z_n) \ne f^{-1}(x) = y$ for
all $n \in \mathbb{N}$. By assumption $f$ is continuous, and hence by Theorem 3.38 so is $f^{-1}$. This
means that $\lim_{n\to\infty} y_n = \lim_{n\to\infty} f^{-1}(z_n) = f^{-1}\left(\lim_{n\to\infty} z_n\right) = f^{-1}(x) = y$, and therefore
$$\lim_{n\to\infty} \frac{f^{-1}(z_n) - f^{-1}(x)}{z_n - x} = \lim_{n\to\infty} \frac{y_n - y}{f(y_n) - f(y)} = \lim_{n\to\infty} \frac{1}{\frac{f(y_n) - f(y)}{y_n - y}} = \frac{1}{f'(y)} = \frac{1}{f'(f^{-1}(x))}.$$
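The formula for the derivative of the inverse can be compared with a difference quotient of a numerically computed inverse. The sketch below is only an illustration: the function $f(y) = y^3 + y$ on $(-10, 10)$, the point $x = 2$ (where $f^{-1}(2) = 1$) and the helper names are assumptions that do not come from the text. The inverse is computed by bisection, which is legitimate here because this particular $f$ is increasing.

```python
def f(y):
    # Hypothetical example: continuous, injective, with f'(y) = 3 y^2 + 1 != 0.
    return y ** 3 + y


def f_prime(y):
    return 3.0 * y ** 2 + 1.0


def f_inverse(x, lo=-10.0, hi=10.0, tol=1e-13):
    """Bisection for the unique y in (lo, hi) with f(y) = x; uses that f is increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = 2.0   # f(1) = 2, so the inverse maps 2 to 1
h = 1e-4
quotient = (f_inverse(x + h) - f_inverse(x)) / h
predicted = 1.0 / f_prime(f_inverse(x))
print(quotient, predicted)
```

Theorem 4.21 predicts $(f^{-1})'(2) = 1/f'(1) = 1/4$, and the difference quotient agrees with this value up to the discretization error of the step $h$.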
Theorem 4.21 allows us to extend the Power Rule to rational exponents.
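Theorem 4.22 Power Rule. Let $r \in \mathbb{Q} \setminus \{0\}$. Then $f(x) = x^r$ is differentiable with
$\frac{d}{dx} x^r = r x^{r-1}$ at every $x$ for which the right side is defined.
Proof. Let $m \in \mathbb{N}$ and consider $f(x) = x^{\frac{1}{m}}$. Then $f$ is the inverse function of
$g(x) = x^m$. By Theorem 4.21 for all nonzero $x \in \mathbb{R}$ for which $x^{\frac{1}{m}}$ is defined, we obtain
$$\frac{d}{dx} x^{\frac{1}{m}} = \frac{1}{g'\big(x^{\frac{1}{m}}\big)} = \frac{1}{m \big(x^{\frac{1}{m}}\big)^{m-1}} = \frac{1}{m} x^{\frac{1}{m} - 1}.$$
Now let $n \in \mathbb{Z} \setminus \{0\}$ and $m \in \mathbb{N}$. By the Chain Rule, for all nonzero $x$ for which $x^{\frac{n}{m}}$ is
defined, we obtain
$$\frac{d}{dx} x^{\frac{n}{m}} = \frac{d}{dx} \big(x^{\frac{1}{m}}\big)^n = n \big(x^{\frac{1}{m}}\big)^{n-1} \cdot \frac{1}{m} x^{\frac{1}{m} - 1} = \frac{n}{m} x^{\frac{n}{m} - 1}.$$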
Exercises
4-14. Geometry versus utility in the definition of relative extrema.
(a) Explain why by Definition 4.15 every point in the interval $(0, 1)$ is a relative maximum and a
for x 5 0,
relative minimum of the function f ( x ) =
for x E [O. I],
( x - 113; f o r x > 1.
(b) Sketch the graph of the function and comment whether intuition agrees that f has a relative
maximum and a relative minimum at each x E (0, 1).
(c) Use the proof of Rolle’s Theorem to explain why the definition of relative maxima as in Definition 4.15 is preferable to a definition of relative maxima that requires the value at a relative
maximum to be strictly larger than the values of the function near the relative maximum.
4-15. Prove that if $f : (a,b) \to \mathbb{R}$ is differentiable and has a relative minimum at $m \in (a,b)$, then
$f'(m) = 0$.
4-16. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and continuous on the closed interval
$[a,b]$. Prove that if $f'(x) < 0$ for all $x \in (a,b)$, then $f$ is decreasing on $[a,b]$.
4-17. Let $f : [a,b] \to \mathbb{R}$ be differentiable on the open interval $(a,b)$ and continuous on the closed interval
$[a,b]$. Prove that if $f'(x) = 0$ for all $x \in (a,b)$, then $f$ is constant on $[a,b]$.
4-18. Give a direct proof of Theorem 4.20. That is, give a proof that does not argue via contradiction
4-19. Let $f(x) := ax^2 + bx + c$ be a quadratic function defined on $\mathbb{R}$. Prove that if $a > 0$ then $f$ is
decreasing on $\left(-\infty, -\frac{b}{2a}\right]$ and increasing on $\left[-\frac{b}{2a}, \infty\right)$.
State and prove a similar result for $a < 0$.
4-20. Let $f(x) := ax^3 + bx^2 + cx + d$ be a cubic function defined on $\mathbb{R}$ with $a \ne 0$. Prove that if
$4b^2 - 12ac > 0$, then $f$ has two relative extrema. For $a > 0$ and for $a < 0$ separately describe
where $f$ is increasing and where it is decreasing.
4-21. Let $f : [a,b] \to \mathbb{R}$ be differentiable on $(a,b)$ and continuous on $[a,b]$.
(a) Prove that $f$ is nondecreasing iff for all $x \in (a,b)$ we have $f'(x) \ge 0$. (Note that this is an
equivalence.)
(b) Prove that $f$ is nonincreasing iff for all $x \in (a,b)$ we have $f'(x) \le 0$. (Note that this is an
equivalence.)
(c) Give an example that shows that the condition in Theorem 4.20 is not equivalent to $f$ being
strictly increasing.
4-22. Use Theorem 4.21 to prove that the derivative of $f(x) = \sqrt{x}$ is $f'(x) = \frac{1}{2\sqrt{x}}$.
4-23. Explain why the following is not a proof for Theorem 4.21.
"Proof." We know that $x = f(f^{-1}(x))$. Differentiating both sides with respect to $x$ gives
$1 = \frac{d}{dx} x = \frac{d}{dx} f(f^{-1}(x)) = f'(f^{-1}(x)) (f^{-1})'(x)$, so $(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}$.
4-24. Let $f : (a,b) \to \mathbb{R}$ be continuous and let $x \in (a,b)$. Prove that if $f'(z)$ exists for all $z \in (a,b) \setminus \{x\}$
and $\lim_{z\to x} f'(z)$ exists, then $f$ is differentiable at $x$ with $f'(x) = \lim_{z\to x} f'(z)$.
4-25. Let $f : (a,b) \to \mathbb{R}$ be differentiable. Prove that $f'$ (which need not be continuous) has the intermediate value property. That is, prove that for all $c < d$ in $(a,b)$ and all $u$ between $f'(c)$ and $f'(d)$
there is an $m \in (c,d)$ so that $f'(m) = u$.
Hint. Show that
$$g(x) := \begin{cases} \frac{f(x) - f(c)}{x - c}; & \text{for } x \ne c, \\ f'(c); & \text{for } x = c, \end{cases} \qquad \text{and} \qquad h(x) := \begin{cases} \frac{f(d) - f(x)}{d - x}; & \text{for } x \ne d, \\ f'(d); & \text{for } x = d, \end{cases}$$
are continuous on $[c,d]$, that $g(d) = h(c)$, apply the Intermediate Value Theorem to one of them, then
apply the Mean Value Theorem to $f$.
4-26. Prove that for $n \ge 2$ the $n$th derivative of $f(x) = \sqrt{x}$ is
$$f^{(n)}(x) = (-1)^{n+1} \frac{(2n - 2)!}{2^{n-1} (n - 1)!\, 2^n} x^{-\frac{2n-1}{2}} = (-1)^{n+1} \frac{1 \cdot 3 \cdot 5 \cdots (2n - 3)}{2^n} x^{-\frac{2n-1}{2}}.$$
4-27. For each function, find an expression for $f^{(n)}(x)$ and prove that it is the $n$th derivative of $f$.
(a) $f(x) = \frac{1}{\sqrt{x}}$
(b) f ( x ) =
%
Chapter 5
The Riemann Integral I
The geometric goal of integration is to compute the area under a graph. In Riemann integration this is done by approximating the area with rectangles. This chapter
presents the idea behind Riemann integration and some integration criteria, examples
and theorems. Although intuitively the Riemann integral seems to be the right idea, by
the end of the chapter we will need more machinery to fully characterize Riemann integrability and we will also have exposed some key weaknesses. An equivalent criterion
for Riemann integrability will be presented in Theorem 8.12. The observed weaknesses
of the Riemann integral will be addressed in Chapter 9.
5.1 Riemann Sums and the Integral
To define the area of rectangles under the graph of a function, we first need to determine
the base for each rectangle. This is done with a partition of the interval.
Definition 5.1 Let $[a,b]$ be a closed interval. Then any finite set $P \subseteq [a,b]$ such that
$a, b \in P$ will be called a partition of $[a,b]$. Because the order of the points will be
important, we also write $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ when working with a
partition $P$.
With the partition giving the bases of the rectangles, we still need to determine the
heights. Each height will be a value that the function assumes within the respective
interval of the partition (see Figure 14(a)).
Definition 5.2 Let $[a,b]$ be a closed interval and let $f : [a,b] \to \mathbb{R}$ be bounded. For
any partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ a set $T = \{t_1, \ldots, t_n\}$ such that
for all $i \in \{1, \ldots, n\}$ we have that $t_i \in [x_{i-1}, x_i]$ will be called an evaluation set. We
define the Riemann sum of $f$ with respect to the partition $P$ and evaluation set $T$ to be
$$R(f, P, T) := \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1}).$$
We will also use the notation $\Delta x_i := x_i - x_{i-1}$.
[Figure 14 panel titles: Riemann sum, "upper sum", "lower sum".]
Figure 14: The Riemann integral approximates the area under the graph of a function
with the areas of rectangles. Definition 5.2 demands that the height of each rectangle
is the value of the function at some point in the base interval (one point marked in (a)).
Of particular importance are the lower sum (see Definition 5.13) of the areas of the
largest rectangles that can be fit under the graph of the function (b) and the upper sum
(see Definition 5.13) of the areas of the smallest rectangles that contain the graph of
the function (c).
Clearly, a Riemann sum can only accidentally be equal to the area under the graph.
However, the narrower we make the rectangles, the closer the Riemann sums should be
to the actual area. The norm of a partition gives a uniform measure of how narrow the
rectangles in a partition are.
Definition 5.3 Let $[a,b]$ be a closed interval in the real numbers. For a partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$, we define the norm of the partition $P$ to be
$$\|P\| := \max\{(x_i - x_{i-1}) : i = 1, \ldots, n\}.$$
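Definitions 5.2 and 5.3 translate directly into a few lines of code. The following is a minimal sketch, assuming a partition is stored as a sorted list of points and an evaluation set as a list of tags in the same order; the function names `norm` and `riemann_sum` and the sample data are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def norm(partition):
    """Norm of a partition: the largest width x_i - x_{i-1}."""
    return max(x1 - x0 for x0, x1 in zip(partition, partition[1:]))

def riemann_sum(f, partition, tags):
    """R(f, P, T): sum of f(t_i) * (x_i - x_{i-1}) with t_i in [x_{i-1}, x_i]."""
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0
    for x0, x1, t in zip(partition, partition[1:], tags):
        if not (x0 <= t <= x1):
            raise ValueError("each tag must lie in its subinterval")
        total += f(t) * (x1 - x0)
    return total

# Hypothetical sample data: a partition of [0, 1] and one tag per subinterval.
P = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
T = [Fraction(1, 8), Fraction(1, 2), Fraction(3, 4)]
print(norm(P), riemann_sum(lambda x: x * x, P, T))
```

Exact rational arithmetic is used here only so that the sums can be compared with hand computations without rounding effects.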
We now say that a function is Riemann integrable iff all Riemann sums get close to
one value, the integral, as the norm of the partitions is made small.
Definition 5.4 Let $D$ be a set. A function $f : D \to \mathbb{R}$ is called bounded iff there is an $M \in \mathbb{R}$ so that $|f(x)| \le M$ for all $x \in D$.
Definition 5.5 The function $f : [a,b] \to \mathbb{R}$ is called Riemann integrable (on $[a,b]$) iff $f$ is bounded and there is a number $I$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P$ with $\|P\| < \delta$ and all evaluation sets $T$ the inequality
$$\left| \sum_{i=1}^{n} f(t_i)\Delta x_i - I \right| = |R(f, P, T) - I| < \varepsilon$$
holds. The number $I$ will be called the Riemann integral of $f$ and it will also be denoted $\int_a^b f(x)\,dx := I$. The function $f$ is also called the integrand and $a$ and $b$ are called the bounds of the integral, with $a$ being the lower bound and $b$ being the upper bound.
The definition of the Riemann integral allows us to compute the value of the integral, if it exists, as a limit of a sequence of Riemann sums. Note that in Lemma 5.6
below the only condition on the partitions is that their norm goes to zero and that we
are free to choose the evaluation sets any way we want to.
Lemma 5.6 Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then for any sequence $\{P_k\}_{k=1}^{\infty}$ of partitions of $[a,b]$ with $\lim_{k\to\infty} \|P_k\| = 0$ and any associated sequence of evaluation sets $\{T_k\}_{k=1}^{\infty}$ we have $\lim_{k\to\infty} R(f, P_k, T_k) = \int_a^b f(x)\,dx$.
Proof. Let $\{P_k\}_{k=1}^{\infty}$ be a sequence of partitions of $[a,b]$ with $\lim_{k\to\infty} \|P_k\| = 0$, let $\{T_k\}_{k=1}^{\infty}$ be an associated sequence of evaluation sets and let $\varepsilon > 0$. There is a number $\delta > 0$ so that for all partitions $P$ of $[a,b]$ with $\|P\| < \delta$ and any associated evaluation set $T$ we have $\left| R(f, P, T) - \int_a^b f(x)\,dx \right| < \varepsilon$. Because $\lim_{k\to\infty} \|P_k\| = 0$ there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $\|P_k\| < \delta$. Hence, for all $k \ge K$ we infer that
$$\left| R(f, P_k, T_k) - \int_a^b f(x)\,dx \right| < \varepsilon.$$
∎
The idea in Lemma 5.6 is very useful for numerical integration (see Exercise 13-25) and for the proof of the Fundamental Theorem of Calculus (see Theorem 5.23). To demonstrate how Lemma 5.6 is applied, consider the following example.
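Example 5.7 If $f(x) = x$ is Riemann integrable on $[0,1]$, then $\int_0^1 x\,dx = \frac{1}{2}$.
If $f$ is Riemann integrable, then by Lemma 5.6 for any sequence $\{P_n\}_{n=1}^{\infty}$ of partitions with $\lim_{n\to\infty}\|P_n\| = 0$ and evaluation sets $T_n$ we have $\lim_{n\to\infty} R(f, P_n, T_n) = \int_0^1 f(x)\,dx$. We choose $P_n := \left\{ \frac{j}{n} : j = 0, \ldots, n \right\}$ and $T_n := \left\{ \frac{j}{n} : j = 1, \ldots, n \right\}$ to obtain
$$\int_0^1 x\,dx = \lim_{n\to\infty} \sum_{j=1}^{n} \frac{j}{n} \cdot \frac{1}{n} = \lim_{n\to\infty} \frac{1}{n^2} \sum_{j=1}^{n} j = \lim_{n\to\infty} \frac{1}{n^2} \cdot \frac{n}{2}(n+1) = \lim_{n\to\infty} \frac{n^2 + n}{2n^2} = \frac{1}{2}.$$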
Example 5.7 exhibits a key problem that we will ultimately resolve in Theorem
8.12. Although we may be able to compute a value that should be the Riemann integral,
it may not be clear if the function actually is Riemann integrable.
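The computation in Example 5.7 can be checked numerically. The sketch below evaluates the Riemann sums for the partitions $P_n = \{j/n\}$ with right endpoints as evaluation points and compares them with the closed form $(n^2+n)/(2n^2)$ from the example; the helper name `right_endpoint_sum` is an illustrative choice. Like the example itself, it only shows what the value of the integral must be, not that the function is Riemann integrable.

```python
from fractions import Fraction

def right_endpoint_sum(n):
    """Riemann sum of f(x) = x on [0, 1] for P_n = {j/n : j = 0..n}
    with evaluation set T_n = {j/n : j = 1..n}."""
    width = Fraction(1, n)
    return sum(Fraction(j, n) * width for j in range(1, n + 1))

for n in (1, 10, 100, 1000):
    s = right_endpoint_sum(n)
    # Closed form derived in Example 5.7.
    assert s == Fraction(n * n + n, 2 * n * n)
    # The distance to 1/2 is exactly 1/(2n), which tends to zero.
    print(n, float(s), float(s - Fraction(1, 2)))
```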
The following results show that it is reasonably simple to prove Riemann integrability if the value of the integral is known. Note that the strong similarity between
Theorem 5.8 and the appropriate parts of Theorems 2.14 and 3.10 is not accidental.
The definition of the Riemann integral is similar to the definitions of the limits of sequences and functions. Hence, similar “limit laws” hold and they are proved with
similar methods. Unfortunately the similarity does not apply to integrals of products
and quotients. (Products are addressed in Exercises 5-21 and 5-22 and quotients are
typically treated as products in integration.)
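Theorem 5.8 Let $f, g : [a,b] \to \mathbb{R}$ be Riemann integrable and let $c \in \mathbb{R}$. Then $f + g$ and $cf$ are Riemann integrable and the following equations hold.
1. $\int_a^b (f+g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.$
2. $\int_a^b cf(x)\,dx = c \int_a^b f(x)\,dx.$
Proof. We will only prove that $f + g$ is Riemann integrable and that equation 1 holds, leaving the rest to Exercise 5-3. Let $\varepsilon > 0$. Then there is a $\delta > 0$ so that for all partitions $P$ of $[a,b]$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we have
$$\left| \sum_{i=1}^{n} f(t_i)\Delta x_i - \int_a^b f(x)\,dx \right| < \frac{\varepsilon}{2} \quad\text{and}\quad \left| \sum_{i=1}^{n} g(t_i)\Delta x_i - \int_a^b g(x)\,dx \right| < \frac{\varepsilon}{2}.$$
Hence, for all partitions $P$ of $[a,b]$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we obtain
$$\left| \sum_{i=1}^{n} (f+g)(t_i)\Delta x_i - \left( \int_a^b f(x)\,dx + \int_a^b g(x)\,dx \right) \right| \le \left| \sum_{i=1}^{n} f(t_i)\Delta x_i - \int_a^b f(x)\,dx \right| + \left| \sum_{i=1}^{n} g(t_i)\Delta x_i - \int_a^b g(x)\,dx \right| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
By Definition 5.5, this implies that $f + g$ is Riemann integrable and that equation 1 holds.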
The prototypes with which we approximate the area under functions are rectangles,
that is, areas under step functions. Therefore it is only natural that step functions should
be Riemann integrable and that their integrals should be the sum of the areas of the
rectangles involved.
Definition 5.9 Let $M$ be a set and let $S \subseteq M$ be a subset. Then the indicator function of $S$ is the function
$$1_S(x) := \begin{cases} 1; & \text{for } x \in S, \\ 0; & \text{for } x \notin S. \end{cases}$$
Proposition 5.10 Let $a < b$ and let $c, d \in [a,b]$ with $c < d$. Then $1_{[c,d)}$ is Riemann integrable over $[a,b]$ and $\int_a^b 1_{[c,d)}\,dx = d - c$.
Proof. Let $\varepsilon > 0$, let $\delta := \min\left\{ \frac{\varepsilon}{2}, \frac{d-c}{3} \right\}$, let $P = \{a = x_0 < \cdots < x_n = b\}$ be a partition with $\|P\| < \delta$ and let $T = \{t_1, \ldots, t_n\}$ be an associated evaluation set. Because $\delta \le \frac{d-c}{3}$, there are $j, k \in \{1, \ldots, n\}$ with $j < k$ so that $1_{[c,d)}(t_i) = 1$ iff $i \in \{j, j+1, \ldots, k\}$ and $1_{[c,d)}(t_i) = 0$ otherwise. Then $c \in (t_{j-1}, t_j] \subseteq [x_{j-2}, x_j]$, or, if $j = 1$, $c \in [x_0, t_1] \subseteq [x_{j-1}, x_j]$. Similarly, $d \in [x_{k-1}, x_{k+1}]$, or, if $k = n$, $d \in [x_{n-1}, x_n]$. Either way, $|x_{j-1} - c| < \frac{\varepsilon}{2}$ and $|x_k - d| < \frac{\varepsilon}{2}$. Hence,
$$\left| \sum_{i=1}^{n} 1_{[c,d)}(t_i)\Delta x_i - (d - c) \right| = \left| \sum_{i=j}^{k} (x_i - x_{i-1}) - (d - c) \right| = \left| (x_k - d) - (x_{j-1} - c) \right| \le |x_k - d| + |x_{j-1} - c| < \varepsilon.$$
By Definition 5.5, $1_{[c,d)}$ is Riemann integrable and the integral is $(d - c)$.
∎
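The estimate in the proof of Proposition 5.10 says that a Riemann sum of $1_{[c,d)}$ misses $d - c$ by at most the two boundary gaps $|x_k - d|$ and $|x_{j-1} - c|$, each of which is at most the norm of the partition. The following sketch checks this on uniform partitions of $[0,1]$ with midpoints as evaluation points; the values $c = 1/3$, $d = 3/4$ and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction

def indicator_half_open(c, d):
    """Indicator function of the half-open interval [c, d)."""
    return lambda x: 1 if c <= x < d else 0

def riemann_sum_uniform(f, a, b, n, tag_position):
    """Riemann sum on the uniform partition of [a, b] into n pieces.
    tag_position in [0, 1] selects the tag inside each piece
    (0 = left endpoint, 1/2 = midpoint, 1 = right endpoint)."""
    width = (b - a) / n
    total = 0
    for i in range(n):
        left = a + i * width
        total += f(left + tag_position * width) * width
    return total

c, d = Fraction(1, 3), Fraction(3, 4)
f = indicator_half_open(c, d)
for n in (10, 100, 1000):
    s = riemann_sum_uniform(f, Fraction(0), Fraction(1), n, Fraction(1, 2))
    # The error is bounded by twice the norm 1/n of the partition.
    assert abs(s - (d - c)) <= Fraction(2, n)
    print(n, s, float(abs(s - (d - c))))
```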
Standard Proof Technique 5.11 A sum of differences as in the proof of Proposition
5.10, for which the negative part of one term cancels the positive part of the previous
or the next term, is also called a telescoping sum. Telescoping sums are typically used
to collapse sums of many terms into shorter sums (see the proof of Theorem 5.23),
or, conversely (see, for example, the proofs of Theorems 13.14 and 17.33) to make a
connection between two terms using differences that are easier to work with.
Proposition 5.12 Let $Q := \{a = z_0 < z_1 < \cdots < z_m = b\}$ be a partition of $[a,b]$ and for $k \in \{1, \ldots, m\}$ let $a_k \in \mathbb{R}$. Then $f := \sum_{k=1}^{m} a_k 1_{[z_{k-1}, z_k)}$ is Riemann integrable and
$$\int_a^b \sum_{k=1}^{m} a_k 1_{[z_{k-1}, z_k)}\,dx = \sum_{k=1}^{m} a_k (z_k - z_{k-1}).$$
Proof. Combine Theorem 5.8 and Proposition 5.10 (Exercise 5-5).
∎
Exercises
5-1. Prove that Lemma 5.6 can be turned into a biconditional. That is, let $f : [a,b] \to \mathbb{R}$ be a bounded function and prove that $f$ is Riemann integrable iff there is an $I \in \mathbb{R}$ so that for every sequence $\{P_k\}_{k=1}^{\infty}$ of partitions of $[a,b]$ with $\lim_{k\to\infty} \|P_k\| = 0$ and any associated sequence of evaluation sets $\{T_k\}_{k=1}^{\infty}$ we have $\lim_{k\to\infty} R(f, P_k, T_k) = I$.
5-2. Prove each of the following.
(a) If the function $f(x) = x^2$ is Riemann integrable on $[0,1]$, then $\int_0^1 x^2\,dx = \frac{1}{3}$.
(b) If the function $f(x) = x^3$ is Riemann integrable on $[0,1]$, then $\int_0^1 x^3\,dx = \frac{1}{4}$.
(c) If the function $f(x) = x^4$ is Riemann integrable on $[0,1]$, then $\int_0^1 x^4\,dx = \frac{1}{5}$.
Hint. Use summation formulas from Exercise 1-33.
5-3. Prove part 2 of Theorem 5.8.
5-4. Prove that if $f_1, \ldots, f_m : [a,b] \to \mathbb{R}$ are Riemann integrable on $[a,b]$, then $\sum_{k=1}^{m} f_k$ is Riemann integrable on $[a,b]$ and
$$\int_a^b \sum_{k=1}^{m} f_k(x)\,dx = \sum_{k=1}^{m} \int_a^b f_k(x)\,dx.$$
5-5. Prove Proposition 5.12. You may use the result of Exercise 5-4.
5-6. Let $f, g : [a,b] \to \mathbb{R}$ be Riemann integrable. Prove that $f - g$ is Riemann integrable and
$$\int_a^b (f - g)(x)\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx.$$
5-7. Let $a < b$ and let $c, d \in [a,b]$ with $c < d$. Prove that $1_{[c,d]}$ is Riemann integrable over $[a,b]$ and $\int_a^b 1_{[c,d]}\,dx = d - c$.
5-8. Prove that if $c$ is a constant, then $f_c(x) := c$ is Riemann integrable and $\int_a^b f_c(x)\,dx = c(b - a)$.
5-9. Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $m, M \in \mathbb{R}$ be such that for all $x \in [a,b]$ we have $m \le f(x) \le M$. Prove that $m(b - a) \le \int_a^b f(x)\,dx \le M(b - a)$.
5-10. In the definitions of the Riemann integral and of Riemann sums we demand that the function is
bounded. We could define Riemann sums for unbounded functions also, but this exercise shows that
the Riemann integral cannot be defined for unbounded functions.
Let $f : [a,b] \to \mathbb{R}$ be a function. Prove that if there is a number $I$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we have $|R(f, P, T) - I| < \varepsilon$, then $f$ must be bounded.
Hint. Assume the function is not bounded and choose the right partitions and evaluation sets to obtain a contradiction to Riemann integrability.
5-11. Cauchy Criterion for Riemann integrability. Let $f : [a,b] \to \mathbb{R}$ be a function. Prove that if for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P, Q$ with $\|P\|, \|Q\| < \delta$ and all associated evaluation sets $T, U$ we have $|R(f, P, T) - R(f, Q, U)| < \varepsilon$, then $f$ must be Riemann integrable.
5-12. Riemann-Stieltjes integrals. Let $[a,b]$ be a closed interval, let $g : [a,b] \to \mathbb{R}$ be nondecreasing and let $f : [a,b] \to \mathbb{R}$ be bounded. For any partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ and associated evaluation set $T = \{t_1, \ldots, t_n\}$, define the Riemann-Stieltjes sum of $f$ with respect to the partition $P$, evaluation set $T$ and integrator $g$ as
$$S_g(f, P, T) := \sum_{i=1}^{n} f(t_i)\bigl(g(x_i) - g(x_{i-1})\bigr).$$
We will also use the notation $\Delta g_i := g(x_i) - g(x_{i-1})$.
The integrand $f$ is called Riemann-Stieltjes integrable on $[a,b]$ with respect to the integrator $g$ iff $f$ is bounded and there is a number $I$ such that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all partitions $P$ with $\|P\| < \delta$ and all associated evaluation sets $T$ we have $|S_g(f, P, T) - I| < \varepsilon$. The number $I$ will be called the Riemann-Stieltjes integral of $f$, denoted $\int_a^b f\,dg := I$.
(a) Prove that if $f_1, f_2$ are Riemann-Stieltjes integrable with respect to $g$, then $f_1 + f_2$ is Riemann-Stieltjes integrable with respect to $g$ and $\int_a^b (f_1 + f_2)\,dg = \int_a^b f_1\,dg + \int_a^b f_2\,dg$.
(b) Prove that if $f$ is Riemann-Stieltjes integrable with respect to $g$ and $c \in \mathbb{R}$, then $cf$ is Riemann-Stieltjes integrable with respect to $g$ and $\int_a^b cf\,dg = c \int_a^b f\,dg$.
(c) Prove that if g : [0, 1] → R is the indicator function 1 [ 1 4 , then the function f = 1 is not Riemann-Stieltjes integrable with respect to g.
5.2 Uniform Continuity and Integrability of Continuous Functions
Unlike continuity and differentiability, Riemann integrability is not easily checked.
Therefore, we need to invest some effort into finding criteria for Riemann integrability
that are simpler than the definition. The first such criterion is that continuity implies
Riemann integrability.
The fundamental idea behind the Squeeze Theorem (see Theorem 2.21) is that if
a sequence is trapped between two other sequences that converge to the same limit,
then the sequence in the middle must converge to that limit, too. To use this idea in
our investigation of Riemann integrability, we define lower and upper bounds for all
Riemann sums associated with a given partition.
Definition 5.13 Let $f : [a,b] \to \mathbb{R}$ be bounded, let $P = \{a = x_0 < \cdots < x_n = b\}$ be a partition of $[a,b]$ and for each $i \in \{1, \ldots, n\}$ let
$$m_i := \inf\{f(x) : x \in [x_{i-1}, x_i]\} \quad\text{and}\quad M_i := \sup\{f(x) : x \in [x_{i-1}, x_i]\}.$$
We define the lower sum of $f$ with respect to $P$ to be $L(f, P) := \sum_{i=1}^{n} m_i \Delta x_i$ and we define the upper sum of $f$ with respect to $P$ to be $U(f, P) := \sum_{i=1}^{n} M_i \Delta x_i$.
Because the supremum and the infimum in Definition 5.13 need not be assumed by
f,lower and upper sums need not be Riemann sums of f . However, the next result
shows that the Riemann sums are "trapped" between appropriate upper and lower sums.
Lemma 5.14 Let $f : [a,b] \to \mathbb{R}$ be bounded and let $P = \{a = x_0, \ldots, x_n = b\}$ be a partition of $[a,b]$. Then for all associated evaluation sets $T = \{t_1, \ldots, t_n\}$ the inequalities $L(f, P) \le R(f, P, T) \le U(f, P)$ hold.
Proof. Because $t_i \in [x_{i-1}, x_i]$ for all $i \in \{1, \ldots, n\}$ we have $m_i \le f(t_i) \le M_i$ for all $i \in \{1, \ldots, n\}$. Hence, $\sum_{i=1}^{n} m_i \Delta x_i \le \sum_{i=1}^{n} f(t_i)\Delta x_i \le \sum_{i=1}^{n} M_i \Delta x_i$, as claimed.
To prove that a function is Riemann integrable, we would need to show that lower
and upper sums get closer to a certain limit as the norm of the partition tends to zero.
The following monotonicity properties are a first step in this direction.
Definition 5.15 Let $P, Q$ be partitions of $[a,b]$. Then $Q$ is called a refinement of $P$ iff $P \subseteq Q$.
Lemma 5.16 Let $f : [a,b] \to \mathbb{R}$ be bounded and let $P, Q$ be partitions of $[a,b]$ so that $Q$ is a refinement of $P$. Then $L(f, P) \le L(f, Q) \le U(f, Q) \le U(f, P)$.
Proof. From Lemma 5.14, we know that $L(f, Q) \le U(f, Q)$. Therefore we can focus on the other two inequalities. Let $\{z_1, \ldots, z_k\} := Q \setminus P$. Then with $Q_0 := P$ and $Q_j := Q_{j-1} \cup \{z_j\}$ for $j = 1, \ldots, k$ we have that each $Q_j$ is a refinement of $Q_{j-1}$ and $Q_k = Q$. Thus we are done if we can prove the outer inequalities for $Q = P \cup \{z\}$ for some $z \in [a,b] \setminus P$. We will only prove $L(f, P) \le L(f, Q)$, leaving the other inequality to Exercise 5-13. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ and let $j \in \{1, \ldots, n\}$ be such that $z \in [x_{j-1}, x_j]$. Then
$$L(f, P) = \sum_{i \ne j} m_i \Delta x_i + m_j (z - x_{j-1}) + m_j (x_j - z) \le \sum_{i \ne j} m_i \Delta x_i + \inf\{f(x) : x \in [x_{j-1}, z]\}(z - x_{j-1}) + \inf\{f(x) : x \in [z, x_j]\}(x_j - z) = L(f, Q).$$
Lemma 5.17 Let $f : [a,b] \to \mathbb{R}$ be bounded and let $P, Q$ be partitions of $[a,b]$. Then $L(f, Q) \le U(f, P)$.
Proof. Because $P \cup Q$ is a refinement of $P$ and of $Q$, the proof follows from Lemma 5.16: $L(f, Q) \le L(f, P \cup Q) \le U(f, P \cup Q) \le U(f, P)$.
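Lemmas 5.14 and 5.16 can be observed on a small example. The sketch below assumes a nondecreasing function, because then the infimum and supremum over $[x_{i-1}, x_i]$ are simply the values at the endpoints and the lower and upper sums can be computed exactly; for a general bounded function this shortcut is not available. The function $x^2$ on $[0,1]$, the partitions and the helper names are illustrative choices.

```python
from fractions import Fraction

def lower_upper_sums(f, partition):
    """L(f, P) and U(f, P) for a NONDECREASING f: on [x0, x1] the
    infimum is f(x0) and the supremum is f(x1)."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper

def midpoint_riemann_sum(f, partition):
    """Riemann sum with the midpoint of each subinterval as evaluation point."""
    return sum(f((x0 + x1) / 2) * (x1 - x0)
               for x0, x1 in zip(partition, partition[1:]))

def square(x):
    return x * x

P = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
Q = sorted(set(P) | {Fraction(3, 4)})  # a refinement of P

L_P, U_P = lower_upper_sums(square, P)
L_Q, U_Q = lower_upper_sums(square, Q)
R_P = midpoint_riemann_sum(square, P)

assert L_P <= R_P <= U_P          # Lemma 5.14
assert L_P <= L_Q <= U_Q <= U_P   # Lemma 5.16
print(L_P, L_Q, R_P, U_Q, U_P)
```

Adding the single point $3/4$ raises the lower sum and lowers the upper sum, which is exactly the one-point step used in the proof of Lemma 5.16.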
Turning to continuous functions, we note that continuity guarantees that for every point $x$ and every $\varepsilon > 0$, there is an interval $(x - \delta, x + \delta)$ so that the inequality $\sup\{f(z) : z \in (x - \delta, x + \delta)\} - \inf\{f(z) : z \in (x - \delta, x + \delta)\} < \varepsilon$ holds. With this difference expected to be small, the difference between upper and lower sums of continuous functions should also become small as the partitions become finer. Unfortunately, while the norm of a partition is given for the whole interval, the $\delta$ may vary from point to point. To prove that continuous functions are Riemann integrable, we need to show that for any $\varepsilon > 0$ we can choose a $\delta$ as above that works at every point of the closed interval $[a,b]$. This is the idea behind uniform continuity.
Definition 5.18 Let $J \subseteq \mathbb{R}$ be an interval. The function $f : J \to \mathbb{R}$ is called uniformly continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $u, v \in J$ with $|u - v| < \delta$ we have $|f(u) - f(v)| < \varepsilon$.
Lemma 5.19 Let $f : [a,b] \to \mathbb{R}$ be continuous. Then $f$ is uniformly continuous.
Proof. Suppose for a contradiction that $f$ is not uniformly continuous. Then there is an $\varepsilon > 0$ such that for all $\delta > 0$ there are $u, v \in [a,b]$ such that $|u - v| < \delta$ and $|f(u) - f(v)| \ge \varepsilon$. Therefore, for all $k \in \mathbb{N}$ with $\delta_k := \frac{1}{k}$ we can find numbers $u_k, v_k \in [a,b]$ so that $|u_k - v_k| < \frac{1}{k}$ and $|f(u_k) - f(v_k)| \ge \varepsilon$. By the Bolzano-Weierstrass Theorem, the bounded sequence $\{u_k\}_{k=1}^{\infty}$ has a subsequence $\{u_{k_m}\}_{m=1}^{\infty}$ that converges to a $t \in [a,b]$. Because for all $m \in \mathbb{N}$ the inequality $|u_{k_m} - v_{k_m}| < \frac{1}{k_m}$ holds, we conclude that $\lim_{m\to\infty} u_{k_m} = \lim_{m\to\infty} v_{k_m} = t$. But then, because $f$ is continuous, $\lim_{m\to\infty} f(u_{k_m}) = f(t) = \lim_{m\to\infty} f(v_{k_m})$, which means $\lim_{m\to\infty} |f(u_{k_m}) - f(v_{k_m})| = 0$, a contradiction to $|f(u_k) - f(v_k)| \ge \varepsilon$ for all natural numbers $k$.
Uniform continuity was the last piece in the puzzle to establish that continuous
functions on closed and bounded intervals are Riemann integrable.
Theorem 5.20 Let $f : [a,b] \to \mathbb{R}$ be continuous. Then $f$ is Riemann integrable.
Proof. By Theorem 3.44, there is a real number $M$ such that $f(x) \le M$ for all $x \in [a,b]$. Then for all partitions $P$ of $[a,b]$ we have that $L(f, P) \le M(b - a)$. Set $I := \sup\{L(f, P) : P \text{ is a partition of } [a,b]\}$. Because by Lemma 5.17 for any partitions $P, Q$ of $[a,b]$ we have $L(f, Q) \le U(f, P)$, we infer $L(f, P) \le I \le U(f, P)$ for all partitions $P$ of $[a,b]$. By Lemma 5.14 for all evaluation sets $T$, we have $L(f, P) \le R(f, P, T) \le U(f, P)$, so by Exercise 1-13, for all partitions $P$ of $[a,b]$ the inequality $|R(f, P, T) - I| \le U(f, P) - L(f, P)$ holds.
Now let $\varepsilon > 0$. By Lemma 5.19, there is a $\delta > 0$ such that for all $x, y \in [a,b]$ with $|x - y| < \delta$ we have $|f(x) - f(y)| < \frac{\varepsilon}{b - a}$. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be any partition of $[a,b]$ with $\|P\| < \delta$. Because $f$ is continuous on $[x_{i-1}, x_i]$, by Theorem 3.44 and Exercise 3-38 there are $t_i^l, t_i^u \in [x_{i-1}, x_i]$ so that $f(t_i^l) = m_i$ and $f(t_i^u) = M_i$. Therefore, for every associated evaluation set $T$,
$$|R(f, P, T) - I| \le U(f, P) - L(f, P) = \sum_{i=1}^{n} (M_i - m_i)\Delta x_i = \sum_{i=1}^{n} \bigl(f(t_i^u) - f(t_i^l)\bigr)\Delta x_i < \sum_{i=1}^{n} \frac{\varepsilon}{b - a}\Delta x_i = \varepsilon.$$
By Definition 5.5 $f$ is Riemann integrable with $\int_a^b f(x)\,dx = I$.
We conclude this section by proving that for continuous functions the “right” choice
of an evaluation point will produce the value of the integral. Proposition 5.21 is a
lemma needed to establish the result, which is stated in Theorem 5.22.
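Proposition 5.21 Let $f, g : [a,b] \to \mathbb{R}$ be Riemann integrable. If $f(x) \le g(x)$ for all $x \in [a,b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
Proof. Let $\varepsilon > 0$ and let $\delta > 0$ be such that for all partitions $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ with $\|P\| < \delta$ and all evaluation sets $T = \{t_1, \ldots, t_n\}$ we have
$$\left| \int_a^b f(x)\,dx - \sum_{i=1}^{n} f(t_i)\Delta x_i \right| < \frac{\varepsilon}{2} \quad\text{and}\quad \left| \int_a^b g(x)\,dx - \sum_{i=1}^{n} g(t_i)\Delta x_i \right| < \frac{\varepsilon}{2}.$$
Then with $P$ being a partition with $\|P\| < \delta$ and $T$ being an evaluation set we obtain
$$\int_a^b f(x)\,dx < \sum_{i=1}^{n} f(t_i)\Delta x_i + \frac{\varepsilon}{2} \le \sum_{i=1}^{n} g(t_i)\Delta x_i + \frac{\varepsilon}{2} < \int_a^b g(x)\,dx + \varepsilon,$$
which proves the result (see Standard Proof Technique 2.7).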
Theorem 5.22 Mean Value Theorem for the Integral. Let $f : [a,b] \to \mathbb{R}$ be continuous. Then there is a $c \in (a,b)$ so that $\int_a^b f(x)\,dx = f(c)(b - a)$.
Proof. Let $M := \max\{f(x) : x \in [a,b]\}$, $m := \min\{f(x) : x \in [a,b]\}$ and let $x_m, x_M \in [a,b]$ be such that $f(x_m) = m$ and $f(x_M) = M$. Then the constant functions $k_m(x) := m$ and $k_M(x) := M$ are continuous, and hence Riemann integrable over $[a,b]$. Moreover, $k_m(x) \le f(x) \le k_M(x)$ for all $x \in [a,b]$ and it is easy to show (see Exercise 5-8) that if $k$ is a constant, then $\int_a^b k\,dx = k(b - a)$. Therefore by Proposition 5.21 we obtain $m(b - a) \le \int_a^b f(x)\,dx \le M(b - a)$. But then
$$f(x_m) = m \le \frac{\int_a^b f(x)\,dx}{b - a} \le M = f(x_M)$$
and by the Intermediate Value Theorem there must be a $c$ between $x_m$ and $x_M$ so that $f(c) = \dfrac{\int_a^b f(x)\,dx}{b - a}$.
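The last step of the proof of Theorem 5.22 is an application of the Intermediate Value Theorem, and the point $c$ can be located numerically by bisection. The sketch below does this for the illustrative choice $f(x) = x^2$ on $[0,1]$, using the value $\int_0^1 x^2\,dx = \frac{1}{3}$ (compare Exercise 5-2); the helper name `bisect_for_value` and the tolerance are assumptions of this example, not part of the text.

```python
import math

def bisect_for_value(f, lo, hi, target, tol=1e-12):
    """Approximate a c in [lo, hi] with f(c) = target, assuming f is
    continuous and target lies between f(lo) and f(hi)."""
    g_lo = f(lo) - target
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        # Keep the half on which f - target still changes sign.
        if (g_mid <= 0) == (g_lo <= 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return (lo + hi) / 2

a, b = 0.0, 1.0
integral = 1.0 / 3.0                 # value of the integral of x^2 over [0, 1]
average = integral / (b - a)         # the quantity f(c) must equal
c = bisect_for_value(lambda x: x * x, a, b, average)
print(c, c * c * (b - a))
```

For this function the point can also be found by hand: $c^2 = \frac{1}{3}$ gives $c = 1/\sqrt{3}$, which is the value the bisection approaches.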
Exercises
5-13. Finish the proof of Lemma 5.16 by proving that if $f : [a,b] \to \mathbb{R}$ is bounded, $P$ is a partition of $[a,b]$ and $Q = P \cup \{z\}$ for some $z \in [a,b] \setminus P$, then $U(f, Q) \le U(f, P)$.
5-14. Prove that $f : (0,1] \to \mathbb{R}$ defined by $f(x) = \frac{1}{x}$ is continuous but not uniformly continuous. Then explain why this example is not a contradiction to Lemma 5.19.
5-15. A function $f : [a,b] \to \mathbb{R}$ is called Lipschitz continuous iff there is an $L > 0$ such that for all $x, y \in [a,b]$ the inequality $|f(x) - f(y)| \le L|x - y|$ holds.
(a) Prove that if $f : [a,b] \to \mathbb{R}$ is Lipschitz continuous, then $f$ is uniformly continuous.
(b) Prove that if $f : (c,d) \to \mathbb{R}$ is differentiable and $f'$ is bounded on $[a,b] \subset (c,d)$, then $f$ is Lipschitz continuous on $[a,b]$.
(c) Prove that $f(x) := \sqrt{x}$ is uniformly continuous on $[0,1]$, but not Lipschitz continuous.
5-16. Prove that if $f : [a,b] \to [0,\infty)$ is a continuous nonnegative function with $\int_a^b f(x)\,dx = 0$, then $f(x) = 0$ for all $x \in [a,b]$.
5-17. Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $\{P_k\}_{k=1}^{\infty}$ be a sequence of partitions with
5-18. Prove that if $f : [a,b] \to [0,\infty)$ is a nonnegative Riemann integrable function with $\int_a^b f(x)\,dx > 0$, then there are an $\varepsilon > 0$ and $c < d$ so that $f(x) \ge \varepsilon$ for all $x \in [c,d]$.
Hint. For a contradiction, suppose the opposite. Use lower sums to show the integral would be zero.
5-19. Let $f : [a,b] \to \mathbb{R}$ be bounded and let $g : [a,b] \to \mathbb{R}$ be nondecreasing. With $m_i$ and $M_i$ as in Definition 5.13, let $L_g(f, P) := \sum_{i=1}^{n} m_i \Delta g_i$ be the lower sum and let $U_g(f, P) := \sum_{i=1}^{n} M_i \Delta g_i$ be the upper sum of $f$ with respect to $g$ and $P$.
(a) Let $P, Q$ be partitions of $[a,b]$ so that $Q$ is a refinement of $P$. Prove that $L_g(f, P) \le L_g(f, Q) \le U_g(f, Q) \le U_g(f, P)$.
(b) Prove that if $f$ is continuous, then $f$ is Riemann-Stieltjes integrable with respect to $g$.
(c) Let $f_1, f_2 : [a,b] \to \mathbb{R}$ be Riemann-Stieltjes integrable with respect to $g$. Prove that if $f_1(x) \le f_2(x)$ for all $x \in [a,b]$, then $\int_a^b f_1\,dg \le \int_a^b f_2\,dg$.
(d) Mean Value Theorem for Riemann-Stieltjes Integrals. Let $f : [a,b] \to \mathbb{R}$ be continuous. Prove that there is a $c \in (a,b)$ so that $\int_a^b f\,dg = f(c)\bigl(g(b) - g(a)\bigr)$.
(e) Let $c \in (a,b)$ and let $g := 1_{[c,b]}$.
i. Prove that if $f : [a,b] \to \mathbb{R}$ is continuous at $c$, then $f$ is Riemann-Stieltjes integrable with respect to $g$ and $\int_a^b f\,d1_{[c,b]} = f(c)$.
ii. Prove that if $f : [a,b] \to \mathbb{R}$ is not continuous at $c$, then $f$ is not Riemann-Stieltjes integrable with respect to $g$.
(f) Prove that if $f$ is continuous and $g$ is continuously differentiable, then, with the integral on the right being a Riemann integral, we have $\int_a^b f\,dg = \int_a^b f(x) g'(x)\,dx$.
Note. Integrals against $dg$ are often used to abbreviate Riemann integrals as on the right side.
5.3 The Fundamental Theorem of Calculus
The Fundamental Theorem of Calculus connects differential and integral calculus by
stating that differentiation and integration are essentially inverses of each other.
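Theorem 5.23 Fundamental Theorem of Calculus, Antiderivative Form. Let $[a,b]$ be contained in $(c,d)$ and let $F : (c,d) \to \mathbb{R}$ be a differentiable function whose derivative $f$ is Riemann integrable on $[a,b]$. Then
$$\int_a^b f(x)\,dx = F(b) - F(a) =: F(x)\Big|_a^b.$$
Proof. Because $f$ is Riemann integrable, the integral exists. Hence, we can use Lemma 5.6 to prove the equation. For $k \in \mathbb{N}$, let $P_k = \left\{ a = x_0^{(k)} < \cdots < x_{n_k}^{(k)} = b \right\}$ be a partition with $\|P_k\| \le \frac{1}{k}$. By the Mean Value Theorem (see Theorem 4.18), for each $k \in \mathbb{N}$ and each $i \in \{1, \ldots, n_k\}$ there is a point $t_i^{(k)} \in \left[ x_{i-1}^{(k)}, x_i^{(k)} \right]$ such that the equation
$$f\left(t_i^{(k)}\right) = F'\left(t_i^{(k)}\right) = \frac{F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right)}{x_i^{(k)} - x_{i-1}^{(k)}}$$
holds. Therefore we obtain $f\left(t_i^{(k)}\right)\Delta x_i^{(k)} = F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right)$. Let $T_k := \left\{ t_i^{(k)} : i = 1, \ldots, n_k \right\}$. Then $T_k$ is an evaluation set for $P_k$ and by Lemma 5.6 we conclude via a telescoping sum
$$\int_a^b f(x)\,dx = \lim_{k\to\infty} \sum_{i=1}^{n_k} f\left(t_i^{(k)}\right)\Delta x_i^{(k)} = \lim_{k\to\infty} \sum_{i=1}^{n_k} \left[ F\left(x_i^{(k)}\right) - F\left(x_{i-1}^{(k)}\right) \right] = \lim_{k\to\infty} \bigl[ F(b) - F(a) \bigr] = F(b) - F(a),$$
which proves the result.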
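The key observation in the proof of Theorem 5.23 is that with the evaluation points supplied by the Mean Value Theorem, every Riemann sum already equals $F(b) - F(a)$, regardless of how fine the partition is. The following sketch illustrates this for the assumed example $F(x) = x^3$, $f(x) = 3x^2$ on $[1,2]$, where the Mean Value Theorem point can be written down explicitly; the function names and the uniform partitions are illustrative choices.

```python
import math

def mvt_tag(x0, x1):
    """For F(x) = x^3 and 0 <= x0 < x1, the point t in [x0, x1] with
    3 t^2 = (x1^3 - x0^3) / (x1 - x0) = x1^2 + x1*x0 + x0^2."""
    return math.sqrt((x1 * x1 + x1 * x0 + x0 * x0) / 3)

def riemann_sum_with_mvt_tags(a, b, n):
    """Riemann sum of f(x) = 3 x^2 on the uniform partition of [a, b]
    into n pieces, evaluated at the Mean Value Theorem points."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(3 * mvt_tag(x0, x1) ** 2 * (x1 - x0)
               for x0, x1 in zip(points, points[1:]))

for n in (1, 5, 50):
    # Each sum telescopes to F(2) - F(1) = 8 - 1 = 7, up to rounding.
    print(n, riemann_sum_with_mvt_tags(1.0, 2.0, n))
```

Up to floating-point rounding, every printed value agrees with $F(2) - F(1) = 7$, even for $n = 1$; Lemma 5.6 is what connects these special sums to the integral.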
Definition 5.24 Because of the Fundamental Theorem of Calculus, if $F$ and $f$ are functions with $F' = f$, then $F$ will also be called an indefinite integral of $f$, denoted $F = \int f(x)\,dx$. One way to compute Riemann integrals is to evaluate an indefinite integral at the upper and lower bound and to compute the difference.
The hypothesis that the derivative of F is Riemann integrable feels quite artificial.
Nonetheless, this hypothesis is best possible. Exercise 12-24 will exhibit a differentiable function whose derivative is bounded, but not Riemann integrable. Although this
example is a bit pathological, it points out a weakness of the Riemann integral that
motivates the development of the Lebesgue integral. The Antiderivative Form of the
Fundamental Theorem of Calculus for the Lebesgue integral, which has no artificial
looking hypotheses, will be proved in Exercise 23-8. The Derivative Form of the Fundamental Theorem of Calculus is proved in Theorem 8.17 for the Riemann integral and
in Exercise 18-6 for the Lebesgue integral.
Exercises
5-20. Power Rule for integration. Let $r \in \mathbb{Q} \setminus \{-1\}$ and let $a < b$. In case $r < 0$, let $a$ and $b$ either both be positive or both be negative. Prove that
$$\int_a^b x^r\,dx = \frac{1}{r+1} b^{r+1} - \frac{1}{r+1} a^{r+1}.$$
Then explain why we needed to require $a$ and $b$ to be both positive or both negative for $r < 0$.
5-21. Integration by Parts. Let $[a,b] \subset (c,d)$ and let $F, g : (c,d) \to \mathbb{R}$ be continuously differentiable with derivatives $f$ and $g'$. Prove that
$$\int_a^b f(x) g(x)\,dx = F(b)g(b) - F(a)g(a) - \int_a^b F(x) g'(x)\,dx.$$
5-22. Integration by Substitution. Let $[a,b] \subset (c,d)$, let $g : (c,d) \to \mathbb{R}$ be continuously differentiable with derivative $g'$ and let $F$ be continuously differentiable with derivative $f$ such that the domain of $F$ contains $g\bigl[[a,b]\bigr]$. Prove that
$$\int_a^b f\bigl(g(x)\bigr) g'(x)\,dx = F\bigl(g(b)\bigr) - F\bigl(g(a)\bigr).$$
5-23. What would we need to prove so that the hypothesis that the derivatives are continuous in Exercises
5-21 and 5-22 can be replaced with the hypothesis that the derivatives are Riemann integrable?
5.4 The Darboux Integral
Lemma 5.6 is an efficient tool to establish properties of the Riemann integral, provided
all functions involved are Riemann integrable. It is now time to look for a similarly
efficient criterion to prove that a function is Riemann integrable.
Riemann's Condition below is inspired by the idea of trapping the Riemann sums between lower and upper sums, which was already used in the proof that continuous functions are Riemann integrable. Riemann's Condition is simpler to verify than Definition 5.5 of Riemann integrability, because, for $\varepsilon > 0$, instead of working with all partitions of sufficiently small norm and all evaluation sets, we only need to find one partition so that the upper and lower sums are closer together than $\varepsilon$. The price is paid in the proof, where for the "$\Leftarrow$" direction, our only tool is one partition and we must prove something for all partitions of sufficiently small norm.
Theorem 5.25 Riemann's Condition. Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then $f$ is Riemann integrable on $[a,b]$ iff for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f, P) - L(f, P) < \varepsilon$.
Proof. For “+,”let f : [ a ,b] + EX be bounded and such that for all E > 0 there
is a partition P of [ a ,b] such that U ( f , P ) - L ( f , P ) < E . By Lemma 5.17 the set
B := { L ( f , P ) : P is a partition of [ a ,b ] }is bounded above. Let C := sup B and let
E > 0. Let P = {a = xg < x1 < . . . < x, = b } be a partition of [ a ,b ] such that
E
U (f,P ) - L ( f , P ) < -. Then for all refinements Q of P and all evaluation sets T for
2
Q , Lemmas 5.14 and 5.16 imply
IR(f,Q , TI
u(f, P ) - L ( f 3P ) < 5.
- 131 I
u ( f ,Q ) - L ( f , Q ) I
&
To show that f is Riemann integrable, let M := sup {If (x)l : x E [ a ,b ] }and let
&
Ax1
&?.! . We will now show that for all partitions S
6 := min ( 4 n ( M + 1 ) ’ 3 ’ ” ’
3
with IlSIl < 6 and all associated evaluation sets TS we have l R ( f , S, Ts)- CI < E .
Let S = a = xo < x1 < . . . < x& = b be any partition of [ a ,b] with IlSll < 6.
s
s
Then Q : S U P is a refinement of P . Therefore, for any evaluation set T for Q
&
we have R ( f , Q , T ) - CI < -. Moreover, by choice of 6, every interval [x:-~, xs]
2
contains at most one point of P that is not in S and any two intervals [x;-, , x/”]that
contain such a point do not intersect. Let TS be any evaluation set for S and let T be an
evaluation set for Q that contains Ts.Then T \ TS = { t i , , . . . , t i k } with k < n , because
1
1
I
I
98
5. The Riemann Integral I
the addition of at most n - 1 points to S that are all in distinct intervals
at most n - 1 intervals that do not already have an evaluation point assigned. Therefore
1 R ( f ,Q , T ) - R
( f 9
S,
Ts)1
and hence
IR ( f , S , Ts)- CI
I IR
( f , S , Ts) - R ( f , Q, T ) l + I R ( f , Q , T ) - Cl
&
&
-+-=&.
<
2
2
Thus f is Riemann integrable and its Riemann integral is C.
For "$\Rightarrow$," let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $\varepsilon > 0$. Then there are a number $I$ and a $\delta > 0$ such that for all partitions $P$ with $\|P\| < \delta$ and all evaluation sets $T$ we have that $|R(f, P, T) - I| < \frac{\varepsilon}{4}$. Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be such a partition and for each $i \in \{1, \ldots, n\}$ find $t_i^l, t_i^u \in [x_{i-1}, x_i]$ such that $f\left(t_i^l\right) \le m_i + \frac{\varepsilon}{4n\Delta x_i}$ and $f\left(t_i^u\right) \ge M_i - \frac{\varepsilon}{4n\Delta x_i}$. Let $T^l := \left\{ t_i^l : i = 1, \ldots, n \right\}$ and $T^u := \left\{ t_i^u : i = 1, \ldots, n \right\}$. Then
$$U(f, P) - L(f, P) \le \bigl[ U(f, P) - R(f, P, T^u) \bigr] + |R(f, P, T^u) - I| + |I - R(f, P, T^l)| + \bigl[ R(f, P, T^l) - L(f, P) \bigr]$$
$$< \sum_{i=1}^{n} \frac{\varepsilon}{4n\Delta x_i}\Delta x_i + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} + \sum_{i=1}^{n} \frac{\varepsilon}{4n\Delta x_i}\Delta x_i = \frac{\varepsilon}{4} + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} = \varepsilon,$$
which was to be proved.
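Riemann's Condition only asks for one partition per $\varepsilon$. For a nondecreasing function this partition can be written down directly, because on a uniform partition with $n$ pieces the difference of upper and lower sums telescopes to $(f(b) - f(a))(b - a)/n$. The sketch below, which assumes monotonicity and uses $x^3$ on $[0,2]$ as an illustrative example with hypothetical helper names, picks such an $n$ and verifies the condition.

```python
import math
from fractions import Fraction

def pieces_for_epsilon(f, a, b, eps):
    """For a NONDECREASING f on [a, b], return an n such that the uniform
    partition with n pieces satisfies U(f, P) - L(f, P) < eps."""
    gap = (f(b) - f(a)) * (b - a)
    return math.floor(gap / eps) + 1

def upper_minus_lower(f, a, b, n):
    """U(f, P) - L(f, P) on the uniform partition, using that for a
    nondecreasing f we have M_i - m_i = f(x_i) - f(x_{i-1})."""
    width = (b - a) / n
    points = [a + i * width for i in range(n + 1)]
    return sum((f(x1) - f(x0)) * (x1 - x0)
               for x0, x1 in zip(points, points[1:]))

def cube(x):
    return x ** 3

a, b, eps = Fraction(0), Fraction(2), Fraction(1, 100)
n = pieces_for_epsilon(cube, a, b, eps)
difference = upper_minus_lower(cube, a, b, n)
assert difference < eps
print(n, difference)
```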
Riemann’s Condition shows that if the lower and upper sums of a function f can
get arbitrarily close to each other, then f is Riemann integrable. The only way this
cannot happen is if the function oscillates too much, that is, if the function is highly
discontinuous.
Example 5.26 The Dirichlet function
$$f(x) = \begin{cases} 0; & \text{for } x \in [0,1] \setminus \mathbb{Q}, \\ 1; & \text{for } x \in [0,1] \cap \mathbb{Q}, \end{cases}$$
is not Riemann integrable on $[0,1]$.
All lower sums of the Dirichlet function are zero and all upper sums are 1. By Theorem 5.25 the Dirichlet function is not Riemann integrable.
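The claim about the lower and upper sums in Example 5.26 can be written out as a short computation. It relies on the standard fact, not restated in the example, that every interval of positive length contains both rational and irrational numbers, so that $m_i = 0$ and $M_i = 1$ for every subinterval of every partition $P = \{0 = x_0 < \cdots < x_n = 1\}$.

```latex
\begin{align*}
L(f,P) &= \sum_{i=1}^{n} m_i \,\Delta x_i = \sum_{i=1}^{n} 0 \cdot \Delta x_i = 0, \\
U(f,P) &= \sum_{i=1}^{n} M_i \,\Delta x_i = \sum_{i=1}^{n} \Delta x_i = x_n - x_0 = 1, \\
U(f,P) - L(f,P) &= 1 \quad \text{for every partition } P \text{ of } [0,1].
\end{align*}
```

In particular, for $\varepsilon = 1$ there is no partition with $U(f,P) - L(f,P) < \varepsilon$, which is how Theorem 5.25 applies.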
It is natural to ask now how much oscillation or discontinuity there can be without losing Riemann integrability. Moreover, there are further important results about
Riemann integrals that could be presented here. Theorem 5.25 could be used to prove
these results, but the proofs would involve substantial work with Riemann sums. We
will avoid this work by first using Theorem 5.25 to prove the Lebesgue criterion for
Riemann integrability in Theorem 8.12. This criterion makes the proofs of further results about Riemann integrals, presented in Section 8.3, more effective. Moreover, the
Lebesgue criterion will answer how discontinuous a Riemann integrable function can
be.
To formulate the Lebesgue criterion, we need to introduce the Lebesgue measure of
a set, which is fundamental for more advanced analysis. To define Lebesgue measure
we need “infinite summations” (introduced in Chapter 6) and some set theoretical notions of “size” for sets (introduced in Chapter 7). In Chapter 8, we will continue where
we leave off here.
To conclude this chapter, we note that the approximation of the area under a function with lower and upper sums is also known as Darboux integration. To acquaint the
reader with the language involved, we formally introduce Darboux integration, which
is equivalent to Riemann integration.
Definition 5.27 Let $f : [a,b] \to \mathbb{R}$ be a bounded function. We define
$$\mathcal{L}_f := \sup\{L(f, P) : P \text{ is a partition of } [a,b]\},$$
$$\mathcal{U}_f := \inf\{U(f, P) : P \text{ is a partition of } [a,b]\}.$$
$\mathcal{L}_f$ is also called the lower integral of $f$ and $\mathcal{U}_f$ is also called the upper integral of $f$. We will say that $f$ is Darboux integrable on $[a,b]$ iff $\mathcal{L}_f = \mathcal{U}_f$. In this case $\mathcal{L}_f = \mathcal{U}_f$ is also called the Darboux integral of $f$.
Proposition 5.28 Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then $\mathcal{L}_f \le \mathcal{U}_f$.
Proof. Easy consequence of Lemma 5.17 (Exercise 5-24).
Theorem 5.29 Let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then $f$ is Darboux integrable on $[a,b]$ iff $f$ is Riemann integrable on $[a,b]$. Moreover, the Darboux and Riemann integrals are equal in this case, that is,
\[ \int_a^b f(x)\,dx = \mathcal{L} = \mathcal{U}. \]
Proof. For "$\Rightarrow$," let $f : [a,b] \to \mathbb{R}$ be Darboux integrable and let $\varepsilon > 0$. Then $\mathcal{U} = \mathcal{L}$ and there are partitions $Q$ and $R$ of $[a,b]$ such that $\mathcal{L} - \frac{\varepsilon}{2} < L(f,Q)$ and $U(f,R) < \mathcal{U} + \frac{\varepsilon}{2}$. Now $P := Q \cup R$ is a refinement of $Q$ and $R$ and by Lemma 5.16 we infer
\[ \mathcal{L} - \frac{\varepsilon}{2} < L(f,Q) \le L(f,P) \le \mathcal{L} = \mathcal{U} \le U(f,P) \le U(f,R) < \mathcal{U} + \frac{\varepsilon}{2} = \mathcal{L} + \frac{\varepsilon}{2}, \]
and hence $U(f,P) - L(f,P) < \varepsilon$. By Theorem 5.25 $f$ is Riemann integrable.
For "$\Leftarrow$," let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then by Theorem 5.25 for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U(f,P) - L(f,P) < \varepsilon$. Hence, for any $\varepsilon > 0$ we infer $\mathcal{U} - \mathcal{L} \le U(f,P) - L(f,P) < \varepsilon$, which means $\mathcal{U} = \mathcal{L}$.
The "moreover" part follows upon examination of the "$\Leftarrow$" part of the proof of Theorem 5.25, which also shows that the Riemann integral of $f$ is equal to $\mathcal{L}$.
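To make the lower and upper sums concrete, the following is a minimal sketch that computes them for a nondecreasing function on a uniform partition. The function names `darboux_sums` and `square` and the choice of $f(x) = x^2$ on $[0,1]$ are illustrative assumptions, not taken from the text. Monotonicity is assumed so that the infimum and supremum on each subinterval sit at the endpoints.

```python
from fractions import Fraction

def darboux_sums(f, a, b, n):
    """Lower and upper sums of a NONDECREASING f on the uniform partition of [a, b] into n pieces."""
    width = Fraction(b - a, n)
    points = [a + k * width for k in range(n + 1)]
    # For nondecreasing f the infimum on [x_k, x_{k+1}] is f(x_k), the supremum is f(x_{k+1}).
    lower = sum(f(points[k]) * width for k in range(n))
    upper = sum(f(points[k + 1]) * width for k in range(n))
    return lower, upper

def square(x):
    return x * x

for n in (10, 100, 1000):
    lower, upper = darboux_sums(square, 0, 1, n)
    # The gap upper - lower shrinks as the partition gets finer.
    print(n, float(lower), float(upper), float(upper - lower))
```

Since the gap between the two sums can be made smaller than any $\varepsilon > 0$, the lower and upper integrals of this particular function coincide.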
Exercises
5-24. Prove Proposition 5.28.
5-25. Let $f, g : [a,b] \to \mathbb{R}$ be Darboux integrable. Prove that then $f + g$ also is Darboux integrable and $\mathcal{L}_{f+g} = \mathcal{L}_f + \mathcal{L}_g$.
5-26. Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable on $[a,b]$.
(a) For each $n \in \mathbb{N}$, let $P_n = \{ a = x_0^{(n)} < \cdots < x_{k_n}^{(n)} = b \}$ be a partition of the interval $[a,b]$ with $\|P_n\| < \frac{1}{n}$ and let
\[ s_n := \sum_{j=1}^{k_n - 1} m_j^{(n)} \mathbf{1}_{[x_{j-1}^{(n)},\, x_j^{(n)})} + m_{k_n}^{(n)} \mathbf{1}_{[x_{k_n-1}^{(n)},\, x_{k_n}^{(n)}]}, \quad \text{where } m_j^{(n)} = \inf \{ f(x) : x \in [x_{j-1}^{(n)}, x_j^{(n)}] \}. \]
i. Prove that for all $n \in \mathbb{N}$ and all $x \in [a,b]$ we have $s_n(x) \le f(x)$.
ii. Prove that if $f$ is continuous at $x \in [a,b]$, then $\{ s_n(x) \}_{n=1}^\infty$ converges to $f(x)$.
iii. Prove that $\lim_{n\to\infty} \int_a^b |f - s_n| \, dx = 0$.
(b) Prove that there is a sequence $\{c_n\}_{n=1}^\infty$ of continuous functions on $[a,b]$ such that for all $n \in \mathbb{N}$ we have $|c_n| \le |f|$ and so that
5-27. Prove Riemann's Condition for Riemann-Stieltjes integrals. That is, let $f : [a,b] \to \mathbb{R}$ be bounded, let $g : [a,b] \to \mathbb{R}$ be nondecreasing and prove that $f$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$ iff for all $\varepsilon > 0$ there is a partition $P$ of $[a,b]$ such that $U_g(f,P) - L_g(f,P) < \varepsilon$.
Chapter 6
Series of Real Numbers I
Series facilitate the computation of “sums with infinitely many terms.” This first introduction mostly showcases the results needed to define outer Lebesgue measure (see
Definition 8.1). Further results about series will be presented in Chapter 10.
6.1 Series as a Vehicle To Define Infinite Sums
Of course it is impossible to add infinitely many numbers. But we can consider the
convergence behavior of a sequence of finite sums.
Definition 6.1 Let $\{a_j\}_{j=1}^\infty$ be a sequence of real numbers. The partial sums of the series $\sum_{j=1}^\infty a_j$ of real numbers are defined to be $s_n := \sum_{j=1}^n a_j$. The series is said to converge iff the sequence of partial sums $\left\{ \sum_{j=1}^n a_j \right\}_{n=1}^\infty$ converges and it is said to diverge otherwise. For a convergent series, the limit is usually denoted $\sum_{j=1}^\infty a_j$.
Series that start at numbers other than 1 are defined similarly (Exercise 6-1). This
section is devoted to introductory examples and some fundamental results.
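As an illustration of Definition 6.1, here is a small sketch that lists the first few partial sums of a series given its terms. The helper name `partial_sums` and the two sample series are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import accumulate

def partial_sums(term, n):
    """Return [s_1, ..., s_n], where s_k = term(1) + ... + term(k)."""
    return list(accumulate(term(j) for j in range(1, n + 1)))

# Terms 1/2^j: the partial sums 1/2, 3/4, 7/8, ... stay below 1 and approach it.
halves = partial_sums(lambda j: Fraction(1, 2**j), 6)
# Terms all equal to 1: the partial sums 1, 2, 3, ... are unbounded.
ones = partial_sums(lambda j: 1, 6)

print(halves)
print(ones)
```

A finite list of partial sums can only suggest the behavior of the sequence; convergence or divergence still has to be proved.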
Geometric series are the prototypical examples of convergent series. Because there
is a simple formula for the partial sums, geometric series nicely fit the mold of Definition 6.1. Figure 15 gives an indication why geometric series are called “geometric”
and Exercise 6-2 gives some examples of computations.
Theorem 6.2 Geometric sums and geometric series. Let $a, q \in \mathbb{R}$ and let $q \ne 1$. Then for all $n \in \mathbb{N}$ the summation formula $\sum_{j=1}^n a q^j = aq \frac{1-q^n}{1-q}$ holds. Moreover, for all real numbers $|q| < 1$ the series $\sum_{j=1}^\infty a q^j$ converges and $\sum_{j=1}^\infty a q^j = \frac{aq}{1-q}$.
Figure 15: The "sum" of infinitely many terms can be finite ($\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots = 1$). Each rectangle inside the square above is one-half of the size of the previous rectangle and the sum of all their areas should be the area of the square.
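Proof. The first statement follows from the following telescoping sum argument.
\[ (1-q) \sum_{j=1}^n a q^j = \sum_{j=1}^n a q^j - \sum_{j=1}^n a q^{j+1} = \sum_{j=1}^n a q^j - \sum_{j=2}^{n+1} a q^j = aq - a q^{n+1}. \]
For $|q| < 1$, note that
\[ \sum_{j=1}^\infty a q^j = \lim_{n\to\infty} \sum_{j=1}^n a q^j = \lim_{n\to\infty} aq \frac{1-q^n}{1-q} = \frac{aq}{1-q}. \]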
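The summation formula of Theorem 6.2 can be spot-checked with exact rational arithmetic. The sketch below is only a numerical sanity check, not a proof; the function names and the sample values $a = 3$, $q = -\frac{2}{5}$ are assumptions chosen for illustration.

```python
from fractions import Fraction

def geometric_partial_sum(a, q, n):
    """Add a*q^1 + ... + a*q^n term by term."""
    return sum(a * q**j for j in range(1, n + 1))

def geometric_closed_form(a, q, n):
    """Closed form a*q*(1 - q^n)/(1 - q) from Theorem 6.2 (requires q != 1)."""
    return a * q * (1 - q**n) / (1 - q)

a, q = Fraction(3), Fraction(-2, 5)
for n in range(1, 21):
    # Exact equality holds because Fraction arithmetic has no rounding error.
    assert geometric_partial_sum(a, q, n) == geometric_closed_form(a, q, n)

limit = a * q / (1 - q)  # value of the series, valid because |q| < 1
print(float(geometric_partial_sum(a, q, 20)), float(limit))
```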
Series with nonnegative terms play an important role in analysis. Their partial sums
are nondecreasing. Therefore, by the Monotone Sequence Theorem, whenever the
terms of a series are nonnegative and the partial sums are bounded, the series converges.
Lemma 6.3 For all $j \in \mathbb{N}$, let $a_j \ge 0$. Then the series $\sum_{j=1}^\infty a_j$ converges iff the sequence of the partial sums $\left\{ \sum_{j=1}^n a_j \right\}_{n=1}^\infty$ is bounded above.
Proof. For "$\Rightarrow$," recall that by Proposition 2.34 any convergent sequence is bounded, and hence, in particular, it is bounded above.
For "$\Leftarrow$," let $\sum_{j=1}^\infty a_j$ be a series such that all $a_j$ are nonnegative and such that the sequence $\left\{ \sum_{j=1}^n a_j \right\}_{n=1}^\infty$ of its partial sums is bounded above. Because for all $n \in \mathbb{N}$ we have $\sum_{j=1}^{n+1} a_j - \sum_{j=1}^n a_j = a_{n+1} \ge 0$, the sequence of the partial sums is nondecreasing. Thus by the Monotone Sequence Theorem $\left\{ \sum_{j=1}^n a_j \right\}_{n=1}^\infty$ must converge, and hence the series converges.
Series can easily be added, subtracted, and multiplied by numbers. Multiplication
of series is more complicated and is therefore deferred to Theorem 10.14.
Theorem 6.4 Let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be convergent series of real numbers and let $c$ be a real number. Then the following hold.
1. The series $\sum_{j=1}^\infty (a_j + b_j)$ converges and $\sum_{j=1}^\infty (a_j + b_j) = \sum_{j=1}^\infty a_j + \sum_{j=1}^\infty b_j$.
2. The series $\sum_{j=1}^\infty (a_j - b_j)$ converges and $\sum_{j=1}^\infty (a_j - b_j) = \sum_{j=1}^\infty a_j - \sum_{j=1}^\infty b_j$.
3. The series $\sum_{j=1}^\infty c a_j$ converges and $\sum_{j=1}^\infty c a_j = c \sum_{j=1}^\infty a_j$.
Proof. To prove part 1, note that
\[ \sum_{j=1}^\infty a_j + \sum_{j=1}^\infty b_j = \lim_{n\to\infty} \sum_{j=1}^n a_j + \lim_{n\to\infty} \sum_{j=1}^n b_j = \lim_{n\to\infty} \left( \sum_{j=1}^n a_j + \sum_{j=1}^n b_j \right) = \lim_{n\to\infty} \sum_{j=1}^n (a_j + b_j) = \sum_{j=1}^\infty (a_j + b_j), \]
where in each step the existence of the quantity on the left side of the equal sign implies the existence of the quantity on the right side. Hence, if $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ converge, then $\sum_{j=1}^\infty (a_j + b_j)$ converges and we have $\sum_{j=1}^\infty (a_j + b_j) = \sum_{j=1}^\infty a_j + \sum_{j=1}^\infty b_j$.
The remaining parts are left to the reader as Exercises 6-3a and 6-3b.
Similar to the algebraic operations, comparabilities can be moved through the summations.
Theorem 6.5 Let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be convergent series so that $a_j \le b_j$ for all $j \in \mathbb{N}$. Then $\sum_{j=1}^\infty a_j \le \sum_{j=1}^\infty b_j$.
Proof. Exercise 6-4.
The most concrete example of series in our daily experience is so fundamental, it
is often not even recognized as a series. Geometric series are the foundation for the
decimal expansion of real numbers. Formally, decimal expansions are an identification of numbers with sequences of integers. The connection is made via series with
nonnegative terms as shown below.
Proposition 6.6 Let $D$ be the set of all sequences $\{d_j\}_{j=1}^\infty$ of integers in the set (of digits) $\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ such that there is a $k \in \mathbb{N}$ with $d_k \ne 0$ and so that for every $n \in \mathbb{N}$ there is an $m \ge n$ so that $d_m \ne 9$. Then for each $\{d_j\}_{j=1}^\infty \in D$ the series $r\left( \{d_j\}_{j=1}^\infty \right) := \sum_{j=1}^\infty \frac{d_j}{10^j}$ converges and the function $r : D \to (0,1)$ is bijective.
For every $x \in (0,1)$, we call $r^{-1}(x)$ the decimal expansion of $x$.
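Proof. Let $\{d_j\}_{j=1}^\infty \in D$. For all $n \in \mathbb{N}$, we infer the inequalities
\[ 0 \le \sum_{j=1}^n \frac{d_j}{10^j} \le \sum_{j=1}^n \frac{9}{10^j} = \frac{9}{10} \cdot \frac{1 - \left( \frac{1}{10} \right)^n}{1 - \frac{1}{10}} < \frac{9}{10} \cdot \frac{1}{1 - \frac{1}{10}} = 1. \]
Because the terms $\frac{d_j}{10^j}$ are nonnegative, Lemma 6.3 guarantees that $\sum_{j=1}^\infty \frac{d_j}{10^j}$ converges. Moreover, because at least one $d_k$ is positive and not all $d_k$ are equal to nine, we infer that $0 < \sum_{j=1}^\infty \frac{d_j}{10^j} < 1$ for all $\{d_j\}_{j=1}^\infty \in D$.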
To prove that $r$ is injective, let $\{c_j\}_{j=1}^\infty \ne \{d_j\}_{j=1}^\infty$ be sequences in $D$. Let $n \in \mathbb{N}$ be the smallest natural number so that $c_n \ne d_n$ and assume without loss of generality that $c_n > d_n$. Then
\[ r\left( \{c_j\}_{j=1}^\infty \right) - r\left( \{d_j\}_{j=1}^\infty \right) = \sum_{j=n}^\infty \frac{c_j - d_j}{10^j} \ge \frac{1}{10^n} + \sum_{j=n+1}^\infty \frac{c_j - d_j}{10^j} > \frac{1}{10^n} - \sum_{j=n+1}^\infty \frac{9}{10^j} = \frac{1}{10^n} - \frac{1}{10^n} \sum_{j=1}^\infty \frac{9}{10^j} = \frac{1}{10^n} - \frac{1}{10^n} = 0, \]
which implies that $r\left( \{c_j\}_{j=1}^\infty \right) \ne r\left( \{d_j\}_{j=1}^\infty \right)$.
For surjectivity, let $x \in (0,1)$ and construct $\{d_j\}_{j=1}^\infty$ recursively as follows. Set the digit $d_0 := 0$. Once $d_n$ has been defined so that $0 \le x - \sum_{j=1}^n \frac{d_j}{10^j} < \frac{1}{10^n}$, let $d_{n+1}$ be the largest integer $k$ so that $0 \le x - \sum_{j=1}^n \frac{d_j}{10^j} - \frac{k}{10^{n+1}}$. Then $d_{n+1}$ is at most 9 and the inequalities $0 \le x - \sum_{j=1}^{n+1} \frac{d_j}{10^j} < \frac{1}{10^{n+1}}$ hold.
The series $\sum_{j=1}^\infty \frac{d_j}{10^j}$ converges to $x$, because for all $n \in \mathbb{N}$ we have by construction that $0 \le x - \sum_{j=1}^n \frac{d_j}{10^j} < \frac{1}{10^n}$. Finally, for each $n \in \mathbb{N}$ there must be an $m > n$ with $d_m \ne 9$, because otherwise we would have
\[ x - \sum_{j=1}^n \frac{d_j}{10^j} = \sum_{j=n+1}^\infty \frac{9}{10^j} = \frac{1}{10^n}, \]
which cannot be.
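The recursive construction in the surjectivity argument is effectively an algorithm for producing digits. The following sketch mirrors it with exact fractions; the function name `decimal_digits` and the sample inputs are assumptions for illustration.

```python
from fractions import Fraction

def decimal_digits(x, count):
    """First `count` digits d_1, ..., d_count of x in (0, 1), chosen greedily."""
    digits = []
    remainder = Fraction(x)  # x minus the partial sum built so far
    for n in range(1, count + 1):
        scale = Fraction(1, 10**n)
        # Largest integer k with k/10^n <= remainder; floor division of Fractions gives an int.
        d = remainder // scale
        digits.append(d)
        remainder -= d * scale  # keeps 0 <= remainder < 1/10^n
    return digits

print(decimal_digits(Fraction(1, 7), 6))
print(decimal_digits(Fraction(1, 4), 4))
```

Because each digit is chosen as large as possible, the construction yields $0.2500\ldots$ rather than $0.2499\ldots$ for $\frac{1}{4}$, which matches the requirement that sequences in $D$ do not end in all nines.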
In particular, the representation in Proposition 6.6 can be used to convert infinite
repeating decimals into fractions (see Exercise 6-5). Of course not every series converges. For example, if all terms are equal to 1, the sum is infinite. The limit test below
is a criterion for divergence, because it says (Exercise 6-6) that if the terms do not go
to zero, the series diverges.
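Theorem 6.7 Limit Test. If the series $\sum_{j=1}^\infty a_j$ converges, then $\lim_{j\to\infty} a_j = 0$.
Proof. Note that if $\sum_{j=1}^\infty a_j$ converges, then
\[ \lim_{n\to\infty} a_n = \lim_{n\to\infty} \left( \sum_{j=1}^n a_j - \sum_{j=1}^{n-1} a_j \right) = \sum_{j=1}^\infty a_j - \sum_{j=1}^\infty a_j = 0. \]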
If Theorem 6.7 were a biconditional, the theory of series would be very easy indeed.
The next example shows that this is not the case.
Example 6.8 The harmonic series $\sum_{j=1}^\infty \frac{1}{j}$ diverges, even though the terms converge to zero.
The partial sums $s_n = \sum_{j=1}^n \frac{1}{j}$ form an increasing sequence. If we can show that a subsequence $\{s_{n_i}\}_{i=1}^\infty$ goes to infinity, then the sequence $\{s_n\}_{n=1}^\infty$ diverges. We choose $n_i = 2^i$.
\[ s_{2^i} = 1 + \sum_{k=0}^{i-1} \sum_{j=2^k+1}^{2^{k+1}} \frac{1}{j} \ge 1 + \sum_{k=0}^{i-1} 2^k \cdot \frac{1}{2^{k+1}} = 1 + \frac{i}{2}. \]
Because the $s_{2^i}$ go to infinity, the sequence of partial sums diverges, which means that $\sum_{j=1}^\infty \frac{1}{j}$ diverges.
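The growth of the harmonic partial sums along the indices $n_i = 2^i$ can be observed numerically. The sketch below checks the standard block estimate $s_{2^i} \ge 1 + \frac{i}{2}$ for small $i$ with exact fractions; the helper name `harmonic` is an assumption made for this example.

```python
from fractions import Fraction

def harmonic(n):
    """Partial sum s_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

for i in range(0, 11):
    s = harmonic(2**i)
    # Each block 1/(2^k + 1) + ... + 1/2^(k+1) contributes at least 1/2.
    assert s >= 1 + Fraction(i, 2)
    print(i, 2**i, float(s))
```

The printed values grow very slowly, which is why divergence of this series is not apparent from the first few partial sums.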
We will be confronted with “infinite sums” like the harmonic series throughout
measure theory. Therefore it is useful to formally define infinite sums.
Definition 6.9 Let $\sum_{j=1}^\infty a_j$ be a series. We write $\sum_{j=1}^\infty a_j = \infty$ and call it infinite or an infinite sum iff $\lim_{n\to\infty} \sum_{j=1}^n a_j = \infty$. We write $\sum_{j=1}^\infty a_j = -\infty$ iff $\lim_{n\to\infty} \sum_{j=1}^n a_j = -\infty$.
Exercises
6-1. Let $k \in \mathbb{Z}$ and for each $j \in \mathbb{Z}$ let $a_j \in \mathbb{R}$. Define the series $\sum_{j=k}^\infty a_j$ and $\sum_{j=-\infty}^k a_j$.
6-2. Compute the value o f each of the series below.
6-3. More on the arithmetic of series (Theorem 6.4)
(a) Prove part 2 of Theorem 6.4
(b) Prove part 3 of Theorem 6.4.
(c) Give an example of two series $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ such that $\sum_{j=1}^\infty (a_j + b_j)$ converges, but neither $\sum_{j=1}^\infty a_j$ nor $\sum_{j=1}^\infty b_j$ converges.
(d) Is it possible to find two series $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ such that $\sum_{j=1}^\infty (a_j + b_j)$ converges, and exactly one of $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ diverges?
(e) Is there a series $\sum_{j=1}^\infty a_j$ and a $c \in \mathbb{R}$ such that $\sum_{j=1}^\infty c a_j$ converges, and $\sum_{j=1}^\infty a_j$ diverges?
6-4. Prove Theorem 6.5
6-5. Convert each of the following infinite repeating decimals below into a fraction. A bar over a set of
digits means that these digits repeat indefinitely.
(b) 0.25
(a) 0.25
(c)
0.9462
(d) 0.1473
(e) 12.004%
6-6. Why does the limit test say that if $\lim_{j\to\infty} a_j \ne 0$, then the series $\sum_{j=1}^\infty a_j$ diverges?
6-7. Another proof for the summation formula in Theorem 6.2. Let $q \ne 1$ and $a$ be real numbers. Prove by induction that $\sum_{j=1}^n a q^j = aq \frac{1-q^n}{1-q}$.
6-8. $2^k$ test. Let $\{a_j\}_{j=1}^\infty$ be a nonincreasing sequence with nonnegative terms. Prove that $\sum_{j=1}^\infty a_j$ converges iff $\sum_{k=1}^\infty 2^k a_{2^k}$ converges.
Hint. Use Example 6.8 as guidance.
c
m
6-9. Prove that
converges by showing that the partial sums form a Cauchy sequence
(-
j
j=1
5
1
-
m
+ -.n1 Be careful to distinguish all possible combinations of
n . m being even or odd.
6-10. Translating between sequences and series. Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers. Prove that $\{a_n\}_{n=1}^\infty$ converges iff the series $\sum_{j=1}^\infty (a_{j+1} - a_j)$ converges and that in this case we obtain the limit as $\lim_{n\to\infty} a_n = a_1 + \sum_{j=1}^\infty (a_{j+1} - a_j)$.
6-11. Use Lemma 6.3 to prove the Monotone Sequence Theorem.
6-12. Let $\sum_{j=1}^\infty a_j$ be a convergent series, let $\{j_k\}_{k=0}^\infty$ be a strictly increasing sequence of natural numbers with $j_0 = 1$ and for all $k \in \mathbb{N}$ let $A_k := \sum_{j=j_{k-1}}^{j_k - 1} a_j$. Prove that $\sum_{k=1}^\infty A_k$ converges and $\sum_{k=1}^\infty A_k = \sum_{j=1}^\infty a_j$.
6.2 Absolute Convergence and Unconditional Convergence
For series, there is no simple condition that is equivalent to convergence. Therefore we
need criteria that imply convergence. This section presents the criteria that are needed
to define and work with outer Lebesgue measure. More criteria will be presented in
Section 10.2.
Cauchy sequences play an important role throughout analysis, so it is natural to
relate the convergence of series to Cauchy sequences.
Proposition 6.10 Cauchy Criterion. A series $\sum_{j=1}^\infty a_j$ converges iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge m \ge N$ we have $\left| \sum_{j=m}^n a_j \right| < \varepsilon$.
Proof. The series $\sum_{j=1}^\infty a_j$ converges iff the sequence $\{s_n\}_{n=1}^\infty$ of its partial sums converges, which by Theorem 2.27 is the case iff it is a Cauchy sequence. This is the case iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge m \ge N$ the inequality $|s_n - s_{m-1}| < \varepsilon$ holds. Since $\left| \sum_{j=m}^n a_j \right| = |s_n - s_{m-1}|$, we have proved the result.
The Alternating Series Test shows that there are indeed many convergent series.
Theorem 6.11 Alternating Series Test. Let $\{b_j\}_{j=1}^\infty$ be a nonincreasing nonnegative sequence such that $\lim_{j\to\infty} b_j = 0$. Then $\sum_{j=1}^\infty (-1)^{j+1} b_j$ converges.
Proof. Let $\varepsilon > 0$ be arbitrary. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $b_n < \frac{\varepsilon}{2}$. Then for all $m > n \ge N$ we obtain
\[ \left| \sum_{j=n}^m (-1)^{j+1} b_j \right| = \left| \sum_{i=0}^{m-n} (-1)^i b_{n+i} \right| \le b_n + b_m < \varepsilon \]
(group the terms in pairs, use $b_{n+2i+1} \ge b_{n+2(i+1)}$ and a telescoping sum), and by the Cauchy Criterion we conclude that the series converges.
In many situations we will work with series of nonnegative numbers. If negative
summands occur, it is natural to take absolute values and hope the sum still converges.
This is not always the case, but the idea of absolute convergence is fundamental.
Definition 6.12 A series $\sum_{j=1}^\infty a_j$ converges absolutely iff the series $\sum_{j=1}^\infty |a_j|$ converges.
Absolute convergence is a strictly stronger condition than convergence. That is,
absolutely convergent series converge, but the converse is not true.
Proposition 6.13 If the series $\sum_{j=1}^\infty a_j$ converges absolutely, then it converges. Moreover, the triangular inequality $\left| \sum_{j=1}^\infty a_j \right| \le \sum_{j=1}^\infty |a_j|$ holds.
Proof. Let $\varepsilon > 0$. Because $\sum_{j=1}^\infty a_j$ converges absolutely, $\sum_{j=1}^\infty |a_j|$ converges. Thus there is an $N \in \mathbb{N}$ so that for all $n \ge m \ge N$ we have that $\sum_{j=m}^n |a_j| < \varepsilon$. But then for all $n \ge m \ge N$ we obtain $\left| \sum_{j=m}^n a_j \right| \le \sum_{j=m}^n |a_j| < \varepsilon$. Therefore by the Cauchy Criterion the series $\sum_{j=1}^\infty a_j$ converges.
Moreover, for all $n \in \mathbb{N}$ we have $\left| \sum_{j=1}^n a_j \right| \le \sum_{j=1}^n |a_j| \le \sum_{j=1}^\infty |a_j|$, which implies the triangular inequality.
Example 6.14 By the Alternating Series Test, the series $\sum_{j=1}^\infty (-1)^{j+1} \frac{1}{j}$ converges. However, $\sum_{j=1}^\infty \left| (-1)^{j+1} \frac{1}{j} \right| = \sum_{j=1}^\infty \frac{1}{j} = \infty$, so it does not converge absolutely.
Absolute convergence of a series is often established with the Comparison Test.
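Theorem 6.15 Comparison Test. Let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be series with $0 \le a_j \le b_j$ for all $j \in \mathbb{N}$. If $\sum_{j=1}^\infty b_j$ converges, then $\sum_{j=1}^\infty a_j$ converges, too.
Proof. The sequence $\left\{ \sum_{j=1}^n a_j \right\}_{n=1}^\infty$ of partial sums is a nondecreasing sequence and for all $n$ we have $\sum_{j=1}^n a_j \le \sum_{j=1}^n b_j \le \sum_{j=1}^\infty b_j < \infty$. This means that $\left\{ \sum_{j=1}^n a_j \right\}_{n=1}^\infty$ is bounded above by $\sum_{j=1}^\infty b_j$. By the Monotone Sequence Theorem, the series $\sum_{j=1}^\infty a_j$ converges.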
The Comparison Test allows us to establish convergence and divergence for series that can be compared to series with known convergence behavior. For example, $\sum_{j=1}^\infty \frac{1}{2^j j^2}$ converges because $\frac{1}{2^j j^2} \le \left( \frac{1}{2} \right)^j$. Similarly, $\sum_{j=1}^\infty \frac{1}{\sqrt{j}}$ diverges because $\frac{1}{\sqrt{j}} \ge \frac{1}{j}$. We will use the Comparison Test extensively in Section 10.2. Moreover, it should be noted that the Comparison Test can also be applied when there is a $k \in \mathbb{N}$ so that the comparability $0 \le a_j \le b_j$ holds only for $j \ge k$ (Exercise 6-16).
We conclude this section with the subtle, but nonetheless necessary, notion of unconditional convergence. Assume that the terms of a series are provided in no specific order. (This is the case in the definition of outer Lebesgue measure.) Then we could add the terms in any order we choose. It would be catastrophic if the value of this summation depended on the order in which we sum the terms. Unfortunately, for arbitrary series this can actually happen. For example, $\sum_{j=1}^\infty (-1)^{j+1} \frac{1}{j}$ converges. However, a rearrangement can destroy the convergence. Consider that $\sum_{j=1}^\infty \frac{1}{2j+1} = \infty$ and $\sum_{j=1}^\infty (-1) \frac{1}{2j} = -\infty$ (Exercise 6-13). Suppose we first sum enough odd numbered terms to get a number $> 1$, say, $\frac{1}{1} + \frac{1}{3} > 1$, then add the first even numbered term, so we get $\frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right)$, then add enough odd numbered terms to get a number $> 2$, say,
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} > 2, \]
then add the second even numbered term, so we get
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} + \left( -\frac{1}{4} \right), \]
then add enough odd numbered terms to get a number $> 3$, say
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} + \left( -\frac{1}{4} \right) + \sum_{k=21}^{254} \frac{1}{2k+1} > 3, \]
then add the third even numbered term, so we get
\[ \frac{1}{1} + \frac{1}{3} + \left( -\frac{1}{2} \right) + \sum_{j=2}^{20} \frac{1}{2j+1} + \left( -\frac{1}{4} \right) + \sum_{k=21}^{254} \frac{1}{2k+1} + \left( -\frac{1}{6} \right), \]
and proceed in this fashion indefinitely. This process uses all the available numbers and it produces a sequence of sums that goes to $\infty$. Of course we will need to use more and more odd numbered terms to make up for the one new even numbered term and the increment of 1, but nonetheless, in one arrangement the series sums to a finite number, while in another the sum is infinite. We will make this idea more precise in the proof of Theorem 6.18 and we will push it to its full extent in Exercise 6-23.
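The rearrangement idea can be simulated. The sketch below applies a greedy rule to the alternating harmonic series: for each stage $k$ it adds unused positive terms until the running sum exceeds $k$, then adds one unused negative term. The function name `rearranged_partial_sums` and the use of floating-point numbers are assumptions of this illustration, and the greedy rule takes the fewest terms possible at each stage, so its cut-offs need not agree with any particular hand-chosen ones.

```python
def rearranged_partial_sums(stages):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - 1/4 + ... that drives the sums upward.

    Returns a list of (k, last_odd_denominator_used, sum_after_stage_k).
    """
    total = 0.0
    next_odd = 1    # denominator of the next unused positive term
    next_even = 2   # denominator of the next unused negative term
    history = []
    for k in range(1, stages + 1):
        # Add positive terms until the running sum exceeds k.
        while total <= k:
            total += 1.0 / next_odd
            next_odd += 2
        last_odd_used = next_odd - 2
        # Then spend exactly one negative term.
        total -= 1.0 / next_even
        next_even += 2
        history.append((k, last_odd_used, total))
    return history

for k, last_odd, total in rearranged_partial_sums(5):
    print(k, last_odd, total)
```

After stage $k$ the sum is larger than $k - \frac{1}{2k}$, so the rearranged partial sums are unbounded even though every term of the original series is eventually used.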
Series in which we can rearrange the terms in any way without affecting the “sum”
are called unconditionally convergent.
Definition 6.16 The series $\sum_{j=1}^\infty a_j$ converges unconditionally iff for all bijective functions $\sigma : \mathbb{N} \to \mathbb{N}$ the series $\sum_{i=1}^\infty a_{\sigma(i)}$ converges and $\sum_{i=1}^\infty a_{\sigma(i)} = \sum_{j=1}^\infty a_j$. If a series converges, but not unconditionally, we will say it converges conditionally.
For series of real numbers, Theorem 6.18 below shows that unconditional convergence and absolute convergence are equivalent. This is very helpful, because it is a
lot easier to check whether one series of absolute values converges than to check if all
rearranged series converge to the same limit.
Definition 6.17 Let $\sum_{j=1}^\infty a_j$ be a series and let $B \subseteq \mathbb{N}$. We define the series $\sum_{j \in B} a_j$ to be the series $\sum_{j=1}^\infty b_j$, where $b_j := a_j$ for $j \in B$ and $b_j := 0$ for $j \notin B$.
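Theorem 6.18 The series $\sum_{j=1}^\infty a_j$ converges absolutely iff it converges unconditionally.
Proof. For "$\Rightarrow$," let $\varepsilon > 0$ and let $N \in \mathbb{N}$ be such that for all $n \ge m \ge N$ we have $\sum_{j=m}^n |a_j| < \frac{\varepsilon}{2}$. Then $\sum_{j=N}^\infty |a_j| \le \frac{\varepsilon}{2} < \varepsilon$.
Now let $\sigma : \mathbb{N} \to \mathbb{N}$ be an arbitrary bijection. Then there is an $L \in \mathbb{N}$ so that $\{1, \ldots, N-1\} \subseteq \sigma[\{1, \ldots, L\}]$. Hence, for all $m \ge L$ we obtain
\[ \left| \sum_{i=1}^m a_{\sigma(i)} - \sum_{j=1}^\infty a_j \right| \le \sum_{j=N}^\infty |a_j| < \varepsilon, \]
which means $\sum_{i=1}^\infty a_{\sigma(i)} = \sum_{j=1}^\infty a_j$.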
For "$\Leftarrow$," we prove that if $\sum_{j=1}^\infty a_j$ does not converge absolutely, then it does not converge unconditionally. There is nothing to prove if $\sum_{j=1}^\infty a_j$ does not converge, so we can assume $\sum_{j=1}^\infty a_j$ converges, but not absolutely.
For $j \in \mathbb{N}$, let $a_j^+ := \max\{a_j, 0\}$ and let $a_j^- := -\min\{a_j, 0\}$. We first claim that $\sum_{j=1}^\infty a_j^+ = \sum_{j=1}^\infty a_j^- = \infty$. Suppose for a contradiction the series $\sum_{j=1}^\infty a_j^+$ did converge. Then the series $\sum_{j=1}^\infty a_j^- = \sum_{j=1}^\infty -\left( a_j - a_j^+ \right)$ would converge, so $\sum_{j=1}^\infty |a_j| = \sum_{j=1}^\infty \left( a_j^+ + a_j^- \right)$ would converge, and hence $\sum_{j=1}^\infty a_j$ would converge absolutely, a contradiction.
We now recursively construct a bijection $\sigma : \mathbb{N} \to \mathbb{N}$ so that $\sum_{i=1}^\infty a_{\sigma(i)}$ diverges.
Because $\sum_{j=1}^\infty a_j^+ = \infty$ there are $n_1 \in \mathbb{N}$ and $\sigma(1) < \sigma(2) < \cdots < \sigma(n_1)$ so that for all $i \in \{1, \ldots, n_1\}$ we have $a_{\sigma(i)} > 0$, so that for all $j \le \sigma(n_1)$ with $a_j > 0$ we have $j = \sigma(i)$ for some $i \in \{1, \ldots, n_1\}$ and so that $\sum_{i=1}^{n_1 - 1} a_{\sigma(i)} \le 1 < \sum_{i=1}^{n_1} a_{\sigma(i)}$. Because $\sum_{j=1}^\infty a_j^- = \infty$ there are $m_1 \in \mathbb{N}$ and $\sigma(n_1 + 1) < \sigma(n_1 + 2) < \cdots < \sigma(n_1 + m_1)$ so that for all $i \in \{n_1 + 1, \ldots, n_1 + m_1\}$ we have $a_{\sigma(i)} \le 0$, so that for all $j \le \sigma(n_1 + m_1)$ with $a_j \le 0$ we have $j = \sigma(i)$ for some $i \in \{n_1 + 1, \ldots, n_1 + m_1\}$ and so that $\sum_{i=1}^{n_1 + m_1 - 1} a_{\sigma(i)} > 1 \ge \sum_{i=1}^{n_1 + m_1} a_{\sigma(i)}$.
For the recursive step, let $n_0 := m_0 := 0$ and assume that $n_1 < \cdots < n_k$, $m_1, \ldots, m_k \in \mathbb{N}$ and distinct $\sigma(1), \ldots, \sigma(n_k + m_k) \in \mathbb{N}$ have been chosen as follows. For all $i \in P_k := \bigcup_{p=1}^k \{ n_{p-1} + m_{p-1} + 1, \ldots, n_p \}$ we have $a_{\sigma(i)} > 0$, for all $j \le \sigma(n_k)$ with $a_j > 0$ we have $j = \sigma(i)$ for some $i \in P_k$ and the sums satisfy the inequalities $\sum_{i=1}^{n_k - 1} a_{\sigma(i)} \le k < \sum_{i=1}^{n_k} a_{\sigma(i)}$. Moreover, for all $i \in N_k := \bigcup_{p=1}^k \{ n_p + 1, \ldots, n_p + m_p \}$ we have $a_{\sigma(i)} \le 0$, for all $j \le \sigma(n_k + m_k)$ with $a_j \le 0$ we have $j = \sigma(i)$ for some $i \in N_k$ and the sums satisfy the inequalities $\sum_{i=1}^{n_k + m_k - 1} a_{\sigma(i)} > k \ge \sum_{i=1}^{n_k + m_k} a_{\sigma(i)}$.
Choose natural numbers $\sigma(n_k + m_k + 1) < \cdots < \sigma(n_{k+1})$ greater than $\sigma(n_k)$ so that for all $i \in P_{k+1} := \bigcup_{p=1}^{k+1} \{ n_{p-1} + m_{p-1} + 1, \ldots, n_p \}$ we have $a_{\sigma(i)} > 0$, for all $j \le \sigma(n_{k+1})$ with $a_j > 0$ we have $j = \sigma(i)$ for some $i \in P_{k+1}$ and the inequalities $\sum_{i=1}^{n_{k+1} - 1} a_{\sigma(i)} \le k + 1 < \sum_{i=1}^{n_{k+1}} a_{\sigma(i)}$ hold. (This can be done because $\sum_{j=1}^\infty a_j^+ = \infty$.) Because $k \ge \sum_{i=1}^{n_k + m_k} a_{\sigma(i)}$ we infer $n_{k+1} > n_k + m_k$. Choose the natural numbers $\sigma(n_{k+1} + 1) < \cdots < \sigma(n_{k+1} + m_{k+1})$ greater than $\sigma(n_k + m_k)$ such that for all indices $i \in N_{k+1} := \bigcup_{p=1}^{k+1} \{ n_p + 1, \ldots, n_p + m_p \}$ we have $a_{\sigma(i)} \le 0$, for all $j \le \sigma(n_{k+1} + m_{k+1})$ with $a_j \le 0$ we have $j = \sigma(i)$ for some $i \in N_{k+1}$ and the sums satisfy the inequalities $\sum_{i=1}^{n_{k+1} + m_{k+1} - 1} a_{\sigma(i)} > k + 1 \ge \sum_{i=1}^{n_{k+1} + m_{k+1}} a_{\sigma(i)}$. (This can be done because $\sum_{j=1}^\infty a_j^- = \infty$.) Because $k + 1 < \sum_{i=1}^{n_{k+1}} a_{\sigma(i)}$ we infer $m_{k+1} \ge 1$.
Then $\sigma : \mathbb{N} \to \mathbb{N}$ as constructed is injective by construction, surjective because in every recursive step at least one more positive term and at least one more nonpositive term of $\sum_{j=1}^\infty a_j$ are added to the set $\{ a_{\sigma(1)}, \ldots, a_{\sigma(n_k + m_k)} \}$, and the construction shows that the sequence of partial sums of $\sum_{i=1}^\infty a_{\sigma(i)}$ is unbounded, and hence divergent.
Standard Proof Technique 6.19 In the proof of "$\Rightarrow$" of Theorem 6.18, our estimates only yielded a nonstrict inequality $\sum_{j=N}^\infty |a_j| \le \ldots$. By making the sum less than or equal to $\frac{\varepsilon}{2}$ we made it strictly less than $\varepsilon$. Because $\varepsilon$ can be replaced with $\frac{\varepsilon}{2}$ in any argument, in analysis it is usually not a problem if an estimate only yields a nonstrict inequality $\le \varepsilon$ rather than a strict inequality $< \varepsilon$.
Sometimes we need to partition an infinite sum into infinitely many “chunks” (see
the proof of part 3 of Theorem 8.6). For this type of rearrangement, we use double
series. For brevity’s sake, we limit ourselves to double series of nonnegative terms in
Proposition 6.22.
Definition 6.20 For $i, j \in \mathbb{N}$, we define the ordered pair $(i,j) := \{ i, \{i, j\} \}$ and we define $\mathbb{N} \times \mathbb{N} = \{ (i,j) : i, j \in \mathbb{N} \}$. A (doubly indexed) family of numbers $\{a_{(i,j)}\}_{i,j=1}^\infty$ is a function from $\mathbb{N} \times \mathbb{N}$ to $\mathbb{R}$.
The definition of ordered pairs guarantees that the order in which the numbers are
listed matters (Exercise 6-14). Definition 6.20 also indicates that it is time to investigate
set theory in more detail. We will do so in the next chapter.
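Definition 6.21 Let $\{a_{(i,j)}\}_{i,j=1}^\infty$ be a doubly indexed family of numbers. The double series $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ is called convergent iff for all $i \in \mathbb{N}$ the series $\sum_{j=1}^\infty a_{(i,j)}$ converges and furthermore the series $\sum_{i=1}^\infty \left( \sum_{j=1}^\infty a_{(i,j)} \right)$ also converges. We will also denote the double series as well as its limit by $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{i,j}$ instead of $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$.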
Proposition 6.22 shows that for double series of nonnegative numbers the order of
summation is immaterial.
Proposition 6.22 Let $\{a_{(i,j)}\}_{i,j=1}^\infty$ be a family of nonnegative numbers. Then the double series $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ converges iff for all bijections $\sigma : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ the sum $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{\sigma(i,j)}$ converges. Furthermore, in this case the values are equal.
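Proof. The direction "$\Leftarrow$" is trivial, because we can choose $\sigma(i,j) := (i,j)$. To prove "$\Rightarrow$," let $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ be convergent and let $\sigma : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ be a bijection.
Let $i \in \mathbb{N}$. Then for any $m \in \mathbb{N}$ we have $\sum_{j=1}^m a_{\sigma(i,j)} \le \sum_{k=1}^\infty \sum_{j=1}^\infty a_{(k,j)}$. Hence, by Lemma 6.3 each series $\sum_{j=1}^\infty a_{\sigma(i,j)}$ converges.
For the convergence of the rearranged double series to $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$, let $\varepsilon > 0$ be arbitrary but fixed, and let $m \in \mathbb{N}$. Then for all $i \in \{1, \ldots, m\}$ there is an $N_i$ so that $\sum_{j=N_i+1}^\infty a_{\sigma(i,j)} < \frac{\varepsilon}{m}$. Let $N := \max \{ N_i : i \in \{1, \ldots, m\} \}$. Then
\[ \sum_{i=1}^m \sum_{j=1}^\infty a_{\sigma(i,j)} < \sum_{i=1}^m \sum_{j=1}^N a_{\sigma(i,j)} + \varepsilon \le \sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)} + \varepsilon. \]
Because $\varepsilon$ was arbitrary, for all $m \in \mathbb{N}$ the inequality
\[ \sum_{i=1}^m \sum_{j=1}^\infty a_{\sigma(i,j)} \le \sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)} \]
holds. In particular, this means that the double series $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{\sigma(i,j)}$ converges and $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{\sigma(i,j)} \le \sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$.
To show that the inequality actually is an equation, note that the roles of the double series can be reversed in the above argument to arrive at the reversed inequality (see Exercise 6-24).
Therefore, we conclude that $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{\sigma(i,j)} = \sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$, which completes the proof.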
We will revisit the summation of double series that also have negative terms with
more sophisticated tools in Exercise 14-50.
Exercises
6-13. Prove that $\sum_{j=1}^\infty \frac{1}{2j+1} = \infty$ and $\sum_{j=1}^\infty (-1) \frac{1}{2j} = -\infty$.
6-14. Let $i, j, m, n \in \mathbb{N}$. Prove that $(i,j) = (m,n)$ iff $i = m$ and $j = n$.
6-15. Cauchy Criterion for absolute convergence. Prove that a series $\sum_{j=1}^\infty a_j$ converges absolutely iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge m \ge N$ we have $\sum_{j=m}^n |a_j| < \varepsilon$.
6-16. Comparison Test revisited. Let $k \in \mathbb{N}$ and let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be series with $0 \le a_j \le b_j$ for all $j \ge k$. Prove that if $\sum_{j=1}^\infty b_j$ converges, then $\sum_{j=1}^\infty a_j$ converges, too.
6-17. Prove that a series $\sum_{j=1}^\infty a_j$ converges absolutely iff the sequence
6-18. Prove that the sum of two absolutely convergent series also converges absolutely
6-19. Give an example of an absolutely convergent series $\sum_{j=1}^\infty a_j$ so that $\sum_{j=1}^\infty a_j \ne \sum_{j=1}^\infty |a_j|$.
6-20. Let $\sum_{j=1}^\infty a_j$ be an absolutely convergent series and let $\{b_j\}_{j=1}^\infty$ be a bounded sequence. Prove that $\sum_{j=1}^\infty a_j b_j$ converges absolutely.
6-21. Determine which of the following series converges. If it converges, determine if it converges absolutely.
6-22. Limit Comparison Test for series. Let $\sum_{j=1}^\infty a_j$ and $\sum_{j=1}^\infty b_j$ be series with positive terms. Prove that if $\lim_{j\to\infty} \frac{a_j}{b_j} = c > 0$, then either both series converge or both series diverge.
Hint. For $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $j \ge N$ we have $(c - \varepsilon) b_j \le a_j \le (c + \varepsilon) b_j$.
6-23. Let $\sum_{j=1}^\infty a_j$ be a conditionally convergent series and let $z \in \mathbb{R}$. Prove that there is a bijective function $\sigma : \mathbb{N} \to \mathbb{N}$ such that $\sum_{j=1}^\infty a_{\sigma(j)} = z$.
Hint. Mimic the proof of the "$\Leftarrow$" part of Theorem 6.18. Oscillate about $z$ rather than increasing with $k$. Use that the limit of the terms is zero and that the distance of the partial sum $\sum_{j=1}^n a_{\sigma(j)}$ to $z$ ultimately is always bounded by some $|a_m|$ with $m$ being large.
6-24. Reverse the roles of $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{\sigma(i,j)}$ and $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ as indicated in the proof of Proposition 6.22.
6-25. Let $\{a_{(i,j)}\}_{i,j=1}^\infty$ be a family of real numbers so that the double series $\sum_{i=1}^\infty \sum_{j=1}^\infty \left| a_{(i,j)} \right|$ converges. Prove that $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ converges to a number $L$ and for all bijections $\sigma : \mathbb{N} \times \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ the sum $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{\sigma(i,j)}$ converges to the same number $L$.
Chapter 7
Some Set Theory
Up to this point our development of analysis had minimal need for details of set theory.
However, to define the outer Lebesgue measure of a set (see Definition 8.1), we will
cover the set with an infinite family of intervals and sum the lengths. To be able to
compute the sum as a series, we can use at most as many intervals as there are natural
numbers. Surprisingly enough this is not trivial, because infinite sets can come in
different sizes. This chapter presents the fundamentals on arbitrary families of sets
and on the notion of size for sets. These ideas will be needed immediately for outer
Lebesgue measure and they will be useful throughout our work with measures. All
sophisticated set theory needed in this text is presented in this chapter.
7.1 The Algebra of Sets
Recall from the introduction to Chapter 1 that the terms “set” and “element” remain
undefined. To be able to work with arbitrary families of sets, we need to formally
define them and show some fundamental properties. Because every family of sets must
consist of subsets of another set, we start with the power set of a set.
Definition 7.1 Let $S$ be a set. The power set $\mathcal{P}(S)$ of $S$ is the set of all subsets of $S$.
Definition 7.2 Let $S$ be a set. A family of sets $\mathcal{C}$ is simply a set of subsets of $S$, that is, $\mathcal{C} \subseteq \mathcal{P}(S)$. An indexed family of sets is denoted $\{C_i\}_{i \in I}$, where it is understood that there is a function from the index set $I$ to $\mathcal{P}(S)$ that maps each $i \in I$ to $C_i$.
So while families of sets do not introduce any new set theoretical ideas, we call
them “families of sets” rather than “sets of sets” to indicate that we are now considering
objects whose elements are set theoretically more complicated than just elements that
cannot be decomposed any further. A family of sets can always be turned into an
indexed family of sets by making each set its own index. Conversely, an indexed family
can be turned into a nonindexed family by considering the set $\mathcal{C} := \{ C_i : i \in I \}$, but
this process will collapse sets $C_i = C_j$ with $i \ne j$ into one set in $\mathcal{C}$. In our work, we
may need repeated sets, so most of the families of sets we consider will be indexed.
Figure 16: Venn diagrams of the intersection (a) and the union (b) of three sets.
Nonetheless, proofs for both kinds of families are very similar and we will present both
notations in this section. The most elementary operations for families of sets are unions
and intersections (also see the Venn diagrams in Figure 16).
Definition 7.3 Let $\mathcal{C}$ be a family of sets in $S$. Then we define the union of $\mathcal{C}$ as the set $\bigcup \mathcal{C} := \{ x \in S : (\exists C \in \mathcal{C} : x \in C) \}$. If $\{C_i\}_{i \in I}$ is an indexed family, the union is denoted $\bigcup_{i \in I} C_i := \{ x \in S : (\exists i \in I : x \in C_i) \}$.
Definition 7.4 Let $\mathcal{C}$ be a family of sets in $S$. Then we define the intersection of $\mathcal{C}$ as the set $\bigcap \mathcal{C} := \{ x \in S : (\forall C \in \mathcal{C} : x \in C) \}$. If $\{C_i\}_{i \in I}$ is an indexed family, the intersection is $\bigcap_{i \in I} C_i := \{ x \in S : (\forall i \in I : x \in C_i) \}$.
The relation between complements (see page 1) and unions/intersections is described by DeMorgan’s Laws.
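Theorem 7.5 DeMorgan's Laws. Let $\mathcal{C}$ and $\{C_i\}_{i \in I}$ be families of sets and let $X$ be another set. Then
1. $X \setminus \bigcup_{i \in I} C_i = \bigcap_{i \in I} X \setminus C_i$ and $X \setminus \bigcup \mathcal{C} = \bigcap \{ X \setminus C : C \in \mathcal{C} \}$.
2. $X \setminus \bigcap_{i \in I} C_i = \bigcup_{i \in I} X \setminus C_i$ and $X \setminus \bigcap \mathcal{C} = \bigcup \{ X \setminus C : C \in \mathcal{C} \}$.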
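Proof. Proofs for indexed and nonindexed families are very similar. Hence, we will only present one proof for each part.
For part 1, we will prove the containments $X \setminus \bigcup \mathcal{C} \subseteq \bigcap \{ X \setminus C : C \in \mathcal{C} \}$ and $X \setminus \bigcup \mathcal{C} \supseteq \bigcap \{ X \setminus C : C \in \mathcal{C} \}$, which (see introduction to Chapter 1) shows the two sets are equal.
For the containment relation $X \setminus \bigcup \mathcal{C} \subseteq \bigcap \{ X \setminus C : C \in \mathcal{C} \}$, consider an arbitrary $x \in X \setminus \bigcup \mathcal{C}$. Then $x \notin \bigcup \mathcal{C}$, which means that $x \notin C$ for all $C \in \mathcal{C}$. But then $x \in X \setminus C$ for all $C \in \mathcal{C}$, which means $x \in \bigcap \{ X \setminus C : C \in \mathcal{C} \}$. This proves that $X \setminus \bigcup \mathcal{C}$ is contained in $\bigcap \{ X \setminus C : C \in \mathcal{C} \}$.
For the reversed containment relation $X \setminus \bigcup \mathcal{C} \supseteq \bigcap \{ X \setminus C : C \in \mathcal{C} \}$, consider an arbitrary $x \in \bigcap \{ X \setminus C : C \in \mathcal{C} \}$. Then for all $C \in \mathcal{C}$ we have $x \in X \setminus C$, that is, $x \notin C$. But then $x \notin \bigcup \mathcal{C}$, which means $x \in X \setminus \bigcup \mathcal{C}$. This proves that $X \setminus \bigcup \mathcal{C}$ contains $\bigcap \{ X \setminus C : C \in \mathcal{C} \}$.
Because each set is contained in the respective other set, the sets must be equal. It is also possible to prove part 1 with a chain of equalities, like part 2 below (Exercise 7-1a). The proof of part 1 for indexed families of sets is similar (Exercise 7-1b).
We prove part 2 for indexed families.
\[ X \setminus \bigcap_{i \in I} C_i = \Big\{ x \in X : x \notin \bigcap_{i \in I} C_i \Big\} = \{ x \in X : (\exists i \in I : x \notin C_i) \} = \{ x \in X : (\exists i \in I : x \in X \setminus C_i) \} = \bigcup_{i \in I} X \setminus C_i. \]
It is also possible to prove part 2 with a mutual containment argument as done for part 1 (Exercise 7-1c). The proof of part 2 for (nonindexed) families of sets is similar (Exercise 7-1d).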
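DeMorgan's Laws can be spot-checked on small finite sets. The sketch below is only a sanity check on one example, not a substitute for the proof; the helper names `union` and `intersection` and the particular sets `S`, `X` and `family` are assumptions chosen for illustration. The family is nonempty, so intersecting inside the ambient set `S` gives the usual intersection.

```python
def union(family):
    """Union of a finite family of sets."""
    result = set()
    for member in family:
        result |= member
    return result

def intersection(family, universe):
    """Intersection of a finite family of subsets of `universe`."""
    result = set(universe)
    for member in family:
        result &= member
    return result

S = set(range(12))                      # ambient set containing everything
X = {0, 1, 2, 3, 4, 5, 6, 7}
family = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6, 8}]
complements = [X - C for C in family]   # the sets X \ C

part1 = (X - union(family)) == intersection(complements, S)
part2 = (X - intersection(family, S)) == union(complements)
print(part1, part2)
```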
Standard Proof Technique 7.6 To prove equality of two sets $A$ and $B$ it is common to prove mutual containment, that is, $A \subseteq B$ and $B \subseteq A$. This is similar to proving that two numbers are equal by proving that one number is both greater than or equal to and less than or equal to the other one.
Ordered pairs and products also have formal set theoretical definitions.
Definition 7.7 Let A and B be sets and let $a \in A$ and $b \in B$. Then we define the ordered pair $(a, b)$ to be $(a, b) := \{a, \{a, b\}\}$. If $A_1, \ldots, A_n$ are sets and $a_i \in A_i$ for $i = 1, \ldots, n$, we define $(a_1, \ldots, a_n) := \{(a_1, \ldots, a_{n-1}), \{(a_1, \ldots, a_{n-1}), a_n\}\}$ and call it an ordered n-tuple.
Definition 7.8 Let $A_1, \ldots, A_n$ be sets. Then the product $A_1 \times \cdots \times A_n$ of these sets is defined to be the set of all ordered n-tuples $(a_1, \ldots, a_n)$ so that for all $i = 1, \ldots, n$ we have $a_i \in A_i$. The product is also denoted $\prod_{i=1}^n A_i$. Moreover, for $j \in \{1, \ldots, n\}$ we define the natural projection $\pi_{A_j} : \prod_{i=1}^n A_i \to A_j$ onto $A_j$ by $\pi_{A_j}(a_1, \ldots, a_n) := a_j$. We will also abbreviate the natural projection onto the jth factor as $\pi_j$.
For a visualization of the product of two sets, see Figure 17. The definition of a
product is fundamental for the formal definition of functions and relations. Because
we will not need this level of detail in this text, these definitions have been relegated to
Appendix B.2.
Theorem 7.9 Let $\{C_i\}_{i\in I}$ be a family of sets and let A be a set. Then the equalities $A \times \bigcup_{i\in I} C_i = \bigcup_{i\in I} A \times C_i$ and $A \times \bigcap_{i\in I} C_i = \bigcap_{i\in I} A \times C_i$ hold.
Figure 17: The product of two sets is a "rectangle" with the natural projections $\pi_i$ providing the "coordinates" of an element.
Proof. Left to Exercise 7-2.
The distributive laws express the relationship between unions and intersections.
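Theorem 7.10 Distributive laws. Let $\{A_i\}_{i\in I}$ and $\{B_j\}_{j\in J}$ be indexed families of sets. Then
$$\bigcup_{i\in I} A_i \cap \bigcup_{j\in J} B_j = \bigcup_{(i,j)\in I\times J} A_i \cap B_j \quad\text{and}\quad \bigcap_{i\in I} A_i \cup \bigcap_{j\in J} B_j = \bigcap_{(i,j)\in I\times J} A_i \cup B_j.$$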
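Proof. For the first equation let $x \in \bigcup_{i\in I} A_i \cap \bigcup_{j\in J} B_j$. Then there are an $i \in I$ so that $x \in A_i$ and a $j \in J$ so that $x \in B_j$, which means $x \in A_i \cap B_j \subseteq \bigcup_{(i,j)\in I\times J} A_i \cap B_j$. For the reverse inclusion, let $x \in \bigcup_{(i,j)\in I\times J} A_i \cap B_j$. Then there is an $(i, j) \in I \times J$ so that $x \in A_i \cap B_j \subseteq \bigcup_{i\in I} A_i \cap \bigcup_{j\in J} B_j$.
For the second equation, first note that if $x \in \bigcap_{i\in I} A_i$, then $x \in A_i \subseteq A_i \cup B_j$ for all $(i, j) \in I \times J$, and hence $x \in \bigcap_{(i,j)\in I\times J} A_i \cup B_j$. Similarly, if $x \in \bigcap_{j\in J} B_j$, then $x \in A_i \cup B_j$ for all $(i, j) \in I \times J$. For the reverse inclusion, let $x \in \bigcap_{(i,j)\in I\times J} A_i \cup B_j$. If $x \in \bigcap_{i\in I} A_i$, then $x \in \bigcap_{i\in I} A_i \cup \bigcap_{j\in J} B_j$, and if $x \notin A_{i_0}$ for some $i_0 \in I$, then $x \in B_j$ for all $j \in J$, and hence $x \in \bigcap_{j\in J} B_j \subseteq \bigcap_{i\in I} A_i \cup \bigcap_{j\in J} B_j$.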
Exercises
7-1. Proving DeMorgan's Laws.
(a) Prove part 1 of Theorem 7.5 for families of sets using a chain of equalities.
(b) Prove part 1 of Theorem 7.5 for indexed families of sets.
(c) Prove part 2 of Theorem 7.5 for indexed families of sets using a mutual containment argument.
(d) Prove part 2 of Theorem 7.5 for nonindexed families of sets.
7-2. Prove Theorem 7.9. That is, let $\{C_i\}_{i\in I}$ be a family of sets and let A be a set.
(a) Prove that $A \times \bigcup_{i\in I} C_i = \bigcup_{i\in I} A \times C_i$.
(b) Prove that $A \times \bigcap_{i\in I} C_i = \bigcap_{i\in I} A \times C_i$.
7-3. Let A, B, C be sets. Prove each of the following.
(a) $C \setminus (A \cap B) = (C \setminus A) \cup (C \setminus B)$
(b) $C \setminus (A \cup B) = (C \setminus A) \cap (C \setminus B)$
(c) $C \cup (A \cap B) = (C \cup A) \cap (C \cup B)$
(d) $C \cap (A \cup B) = (C \cap A) \cup (C \cap B)$
(e) $(A \setminus B) \cap C = (A \cap C) \setminus (B \cap C)$
(f) If $A, B \subseteq C$, then $A \setminus B = A \cap (C \setminus B)$.
7-4. Give an example of a family $\mathcal{C}$ of sets with $\bigcap \mathcal{C} = \emptyset$ so that any two sets in the family intersect.
7-5. Let X be a set. An algebra is a set of sets $\mathcal{A} \subseteq \mathcal{P}(X)$ such that $\emptyset \in \mathcal{A}$, if $A \in \mathcal{A}$, then $X \setminus A \in \mathcal{A}$, and if $A_j \in \mathcal{A}$ for all $j = 1, \ldots, n$, then $\bigcup_{j=1}^n A_j \in \mathcal{A}$.
(a) Prove that if $A_j \in \mathcal{A}$ for all $j = 1, \ldots, n$, then $\bigcap_{j=1}^n A_j \in \mathcal{A}$.
(b) Let X be a set. Prove that the power set of X is an algebra.
(c) Let X be a set. Prove that $\mathcal{A} := \{A \subseteq X : A \text{ or } X \setminus A \text{ is finite}\}$ is an algebra.
(d) Prove that an algebra need not contain countable unions of its elements.
7-6. Let $A \subseteq X$ and $B \subseteq Y$ be sets. Prove that
$(X \times Y) \setminus (A \times B) = [(X \setminus A) \times (Y \setminus B)] \cup [(X \setminus A) \times B] \cup [A \times (Y \setminus B)]$.
7-7. Let A, B, C, D be sets. Prove that $(A \times B) \cap (C \times D) = (A \cap C) \times (B \cap D)$.
7-8. Let f be a function whose domain is contained in X and whose range is contained in Y.
(a) Prove that $\pi_X[\{(x, y) \in X \times Y : y = f(x)\}]$ is the domain of f.
(b) Prove that $\pi_Y[\{(x, y) \in X \times Y : y = f(x)\}]$ is the range of f.
7-9. Unions and intersections versus functions. Let X, Y be sets, let $f : X \to Y$ be a function, let $\{X_i\}_{i\in I}$ be a family of subsets of X and let $\{Y_i\}_{i\in I}$ be a family of subsets of Y. Prove each of the following.
(e) For $f(x) := x^2$ there are sets A, B with $f[A \cap B] \neq f[A] \cap f[B]$.
injective .
7.2 Countable Sets
Two sets are considered to be of the same size iff their elements can be matched one-by-one without any leftovers on either side. Recall that bijective functions were defined in Definition 2.24.
Definition 7.11 Two sets A and B are called equivalent iff there is a bijective function $f : A \to B$.
With this language, Definition 2.25 says that a nonempty set is finite iff it is equivalent to a subset $\{1, \ldots, n\}$ of $\mathbb{N}$. In Figure 3 on page 36, only the two sets on the right are equivalent. For analysis, the most important size distinction between infinite sets is whether they are countable or uncountable. Countable sets are, roughly speaking, sets for which it is possible to "count" the elements, where the counting process may stop or it may not.
Definition 7.12 A set C is called countably infinite iff there is a bijective function $f : \mathbb{N} \to C$. A set C is called countable iff C is finite or countably infinite.
Subsets of a set cannot be larger than the set, so it is natural that subsets of countable
sets are countable.
Theorem 7.13 If C is countable and $S \subseteq C$, then S is countable.
Proof. If S is finite, there is nothing to prove. Hence, we can assume that S is infinite. Let $f : \mathbb{N} \to C$ be a bijection and let $n_1 := \min f^{-1}[S]$. For $k \in \mathbb{N}$, we define $n_{k+1}$ recursively. Once $n_1, \ldots, n_k$ are chosen, let $n_{k+1} := \min\left(f^{-1}[S] \setminus \{n_1, \ldots, n_k\}\right)$. Define $g : \mathbb{N} \to S$ by $g(k) := f(n_k)$. Because all $n_k$ are in $f^{-1}[S]$, g maps $\mathbb{N}$ into S. Because no two $n_k$ are equal and f is injective, g is injective.
Finally, suppose for a contradiction that g is not surjective. Let b be the smallest element of $f^{-1}[S \setminus g[\mathbb{N}]]$ and let the number of elements of $f^{-1}[S] \cap \{1, \ldots, b-1\}$ be k. Then $f^{-1}[S] \cap \{1, \ldots, b-1\} = \{n_1, \ldots, n_k\}$ (with $n_0 = 0$ in case $k = 0$) and $b = \min\left(f^{-1}[S] \setminus \{n_1, \ldots, n_k\}\right)$, which means $b = n_{k+1}$, a contradiction.
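The recursive choice of the indices $n_k$ in the proof above amounts to scanning the enumeration of C in order and keeping the elements that lie in S. The following is a minimal sketch under that reading; the function name `enumerate_subset` and the sample sets are illustrative assumptions, not notation from the text.

```python
from itertools import count, islice


def enumerate_subset(f, in_S):
    """Yield g(1), g(2), ... with g(k) = f(n_k).

    Scanning n = 1, 2, 3, ... in increasing order and keeping the hits
    produces exactly n_{k+1} = min(f^{-1}[S] \\ {n_1, ..., n_k}).
    """
    for n in count(1):
        value = f(n)
        if in_S(value):
            yield value


# Hypothetical example: C is the set of even natural numbers, enumerated by
# f(n) = 2n, and S is the subset of multiples of 6.
first_five = list(islice(enumerate_subset(lambda n: 2 * n,
                                          lambda c: c % 6 == 0), 5))
print(first_five)  # [6, 12, 18, 24, 30]
```

The generator never terminates on its own, which matches the assumption in the proof that S is infinite; `islice` is what cuts off the enumeration here.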
Theorem 7.13 says that to prove that a set is countable, it is enough to embed a
copy of it into a countable set. In the language of analysis, it is good enough to find an
"upper bound" that is countable. We will use this idea repeatedly in the remainder of
this section. Interestingly enough, even sets that look uncountable may be countable.
Lemma 7.14 The set $\mathbb{N} \times \mathbb{N}$ is countable.
Proof. The function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ defined by $f(m, n) := 2^m 3^n$ is injective. Thus the set $\mathbb{N} \times \mathbb{N}$ is equivalent to a subset of $\mathbb{N}$, and hence by Theorem 7.13 $\mathbb{N} \times \mathbb{N}$ is countable.
Note how Theorem 7.13 allowed us to avoid the detailed construction of a bijective function between $\mathbb{N}$ and $\mathbb{N} \times \mathbb{N}$. Defining such a function is not trivial. Exercise 7-10 explicitly presents a bijective function between $\mathbb{N} \times \mathbb{N}$ and $\mathbb{N}$.
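Both maps mentioned here can be explored numerically. The sketch below is an illustration only (the function names and the ranges are assumptions): it checks that $2^m 3^n$ takes no value twice on a finite grid, and that the formula from Exercise 7-10 hits each of the numbers $1, \ldots, D(D+1)/2$ exactly once on the triangle $m + n \le D + 1$. Finite checks like these do not replace the proofs.

```python
def prime_power_code(m, n):
    """The injection f(m, n) = 2^m * 3^n from the proof of Lemma 7.14."""
    return 2**m * 3**n


def diagonal_pairing(m, n):
    """The map f(m, n) = (m+n-1)(m+n-2)/2 + n from Exercise 7-10."""
    return (m + n - 1) * (m + n - 2) // 2 + n


N = 30
grid = [(m, n) for m in range(1, N + 1) for n in range(1, N + 1)]
codes = {prime_power_code(m, n) for (m, n) in grid}
injective_on_grid = len(codes) == len(grid)

D = 20
triangle = [(m, n) for m in range(1, D + 1) for n in range(1, D + 2 - m)]
values = sorted(diagonal_pairing(m, n) for (m, n) in triangle)
bijective_on_triangle = values == list(range(1, D * (D + 1) // 2 + 1))

print(injective_on_grid, bijective_on_triangle)
```

The first map is far from surjective (for example, 5 is never a value), which is why the proof needs Theorem 7.13 rather than a direct bijection.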
Definition 7.15 Two sets A and B are called disjoint iff $A \cap B = \emptyset$. A family $\{C_i\}_{i\in I}$ is called pairwise disjoint iff for all $i \neq j$ we have $C_i \cap C_j = \emptyset$.
Theorem 7.16 Countable unions of countable sets are countable.
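Proof. Let $\{C_n\}_{n=1}^a$ with $a \in \mathbb{N} \cup \{\infty\}$ be a countable family of countable sets. For each $C_n$, let $B_n := C_n \setminus \bigcup_{j=1}^{n-1} C_j$. We claim $\bigcup_{n=1}^a B_n = \bigcup_{n=1}^a C_n$. The containment "$\subseteq$" follows from $B_n \subseteq C_n$ for all $n \in \mathbb{N}$. For "$\supseteq$" let $x \in \bigcup_{n=1}^a C_n$. Let $n \in \mathbb{N}$ be the smallest natural number so that $x \in C_n$. Then $x \in B_n$, which proves "$\supseteq$." Moreover, the $B_n$ are pairwise disjoint, because if $m < n$, then $B_m = C_m \setminus \bigcup_{j=1}^{m-1} C_j \subseteq C_m$ and $B_n = C_n \setminus \bigcup_{j=1}^{n-1} C_j \subseteq C_n \setminus C_m$. Now for each n let $B_n = \{b_k^n : k \in I_n\}$, where $I_n$ is $\mathbb{N}$ or a set of the form $\{1, \ldots, m_n\}$. Then $f(n, k) := b_k^n$ is a bijective function between $\{(n, k) \in \mathbb{N} \times \mathbb{N} : n \le a, k \in I_n\} \subseteq \mathbb{N} \times \mathbb{N}$ and $\bigcup_{n=1}^a B_n = \bigcup_{n=1}^a C_n$. Thus $\bigcup_{n=1}^a C_n$ is countable.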
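The passage from the sets $C_n$ to the pairwise disjoint sets $B_n$ can be sketched on a finite family as follows. The function name `disjointify` and the sample data are assumptions made for illustration.

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_n = C_n minus (C_1 union ... union C_{n-1})."""
    seen = set()          # union of the sets processed so far
    result = []
    for c in sets:
        result.append(c - seen)
        seen |= c
    return result


C = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {4, 5, 6}]
B = disjointify(C)
print(B)  # [{1, 2, 3}, {4}, {5}, {6}]
```

The union is unchanged and no element is listed twice, which is what makes the map $f(n, k) := b_k^n$ in the proof injective.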
Maybe the most surprising fact about countability is that the rational numbers are
countable. The reason is that the order in which the rational numbers are counted has
nothing to do with their natural ordering.
Theorem 7.17 The rational numbers Q are countable.
Proof. We have $\{r \in \mathbb{Q} : r > 0\} = \bigcup_{d=1}^\infty \left\{\frac{n}{d} : n \in \mathbb{N}\right\}$, that is, the positive rational numbers are a countable union of countable sets, and hence by Theorem 7.16 they are countable. Similarly, $\{r \in \mathbb{Q} : r < 0\}$ is countable and of course $\{0\}$ is finite. Now $\mathbb{Q} = \{r \in \mathbb{Q} : r > 0\} \cup \{r \in \mathbb{Q} : r < 0\} \cup \{0\}$ is countable by Theorem 7.16.
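To see concretely that the counting order has nothing to do with the natural ordering, one can list the positive rationals by sweeping over the sums $n + d$ and skipping fractions that were already produced. This is a sketch of one possible enumeration, not the argument used in the proof; the generator name `positive_rationals` is an assumption.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """Yield every positive rational exactly once.

    Fractions n/d are visited by increasing s = n + d; values already
    seen in lowest terms (such as 2/2 = 1) are skipped.
    """
    seen = set()
    for s in count(2):
        for d in range(1, s):
            q = Fraction(s - d, d)
            if q not in seen:
                seen.add(q)
                yield q


first_ten = list(islice(positive_rationals(), 10))
print([str(q) for q in first_ten])
```

Every positive rational $n/d$ appears at the latest when the sweep reaches $s = n + d$, so the enumeration is surjective, and the `seen` set makes it injective.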
Exercises
7-10. Prove that the function $f : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ defined by $f(m, n) := \frac{1}{2}(m + n - 1)(m + n - 2) + n$ is bijective. (For a visualization, consider the middle of Figure 18.)
7-11. Prove that the set of integers $\mathbb{Z}$ is countable.
7-12. Prove that the set of integers $\mathbb{Z}$ is countable by constructing a bijective function $f : \mathbb{N} \to \mathbb{Z}$.
Hint. Figure 18(a).
7-13. Prove that the set of dyadic rational numbers is countable.
7-14. Use Theorem 7.16 to prove Lemma 7.14.
7-15. Give a direct proof that the union of TWO countable sets is countable.
7-16. Prove that if $C_1, \ldots, C_n$ are countable, then $C_1 \times C_2 \times \cdots \times C_n$ is countable.
Figure 18: Some standard visualizations for countability arguments. Part (a) shows the construction for a direct proof that the union of two countable sets is countable. Part (b) shows an explicit bijective function between $\mathbb{N}$ and the product of two countable sets. Part (c) shows the idea behind the proof that (0, 1) is not countable.
7-17. Let $\{a_{(i,j)}\}_{i,j=1}^\infty$ be a family of nonnegative numbers. Then the double series $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ converges iff for all bijections $\sigma : \mathbb{N} \to \mathbb{N} \times \mathbb{N}$ the sum $\sum_{i=1}^\infty a_{\sigma(i)}$ converges. Furthermore, in this case the values are equal. Hint. This is similar to the proof of Proposition 6.22.
7.3 Uncountable Sets
Not all sets are countable. In fact, (see Exercise 7-18) there is an infinite hierarchy of
sizes for infinite sets, because any time we form a power set, we obtain a set that is
strictly larger than the set we started with.
Theorem 7.18 If X is a set, then X is not equivalent to its power set $\mathcal{P}(X)$.
Proof. Suppose for a contradiction that $f : X \to \mathcal{P}(X)$ is a bijection. Define $B := \{x \in X : x \notin f(x)\}$. Because f is surjective, there is a $b \in X$ with $B = f(b)$. Now $b \in B$ would imply $b \in B = f(b)$ and by definition of B this would mean $b \notin f(b) = B$. Thus we infer $b \notin B$. But then $b \notin B = f(b)$, which by definition of B forces $b \in B$, a contradiction.
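For a small finite set X the statement can be verified by brute force: for every function $f : X \to \mathcal{P}(X)$, the set B from the proof is not a value of f. The sketch below does this for a three-element set; the helper names `power_set` and `diagonal_set` are illustrative assumptions.

```python
from itertools import combinations, product


def power_set(X):
    """All subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]


def diagonal_set(X, f):
    """B := {x in X : x not in f(x)} for f given as a dict."""
    return frozenset(x for x in X if x not in f[x])


X = {0, 1, 2}
xs = sorted(X)
subsets = power_set(X)

# Try all 8^3 = 512 functions f : X -> P(X).
always_missed = True
for images in product(subsets, repeat=len(xs)):
    f = dict(zip(xs, images))
    if diagonal_set(X, f) in f.values():
        always_missed = False

print(always_missed)  # True
```

For finite X the conclusion also follows from counting, but the check shows that the specific set B is the witness in every case, which is the part of the argument that carries over to infinite sets.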
For analysis, we typically do not need the full hierarchy. Instead we only need to
distinguish sets that are countable from those that are not.
Definition 7.19 A set U is called uncountable iff it is not countable.
The real numbers, which are fundamental for analysis, are not countable.
Theorem 7.20 The interval (0, 1) is uncountable.
Figure 19: Cantor sets are the intersection of a sequence of unions of closed intervals where at each step only the left and right segments of each interval are kept.
Proof. Suppose for a contradiction that (0, 1) was countable. Then there is a sequence $\{x_n\}_{n=1}^\infty$ such that for every $x \in (0, 1)$ there is an $n \in \mathbb{N}$ so that $x = x_n$. For each $n \in \mathbb{N}$, let $\left\{x_n^{(k)}\right\}_{k=1}^\infty$ be the decimal expansion of $x_n$ as in Proposition 6.6. For each $n \in \mathbb{N}$, let $y_n$ be a number in the set $\{1, 2, 3, 4, 5, 6, 7, 8\} \setminus \left\{x_n^{(n)}\right\}$. Then $\{y_n\}_{n=1}^\infty$ is a decimal expansion of a number $y \in (0, 1)$. However, for all $n \in \mathbb{N}$ we have that $y_n \neq x_n^{(n)}$, and hence $y \neq x_n$, contradiction.
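The choice of the digits $y_n$ can be imitated on a finite list of digit strings. The sketch below is only an illustration of the diagonal step: the function name `diagonal_number` and the sample rows are assumptions, and a finite list of course says nothing about countability by itself.

```python
def diagonal_number(expansions):
    """Return '0.y1y2...' where y_n is a digit in 1..8 different from the
    n-th digit of the n-th expansion (digits 0 and 9 are avoided, as in
    the proof)."""
    digits = []
    for n, row in enumerate(expansions):
        digits.append(next(c for c in "12345678" if c != row[n]))
    return "0." + "".join(digits)


# Hypothetical digit strings standing in for the expansions of x_1, ..., x_7.
rows = ["1415926", "7182818", "4142135", "5772156",
        "6931471", "3025850", "6180339"]
y = diagonal_number(rows)

# y differs from the n-th row in the n-th digit, so it is none of the rows.
differs_everywhere = all(y[2 + n] != rows[n][n] for n in range(len(rows)))
print(y, differs_everywhere)
```

Avoiding the digits 0 and 9 sidesteps the ambiguity of expansions such as 0.4999... = 0.5000..., so a difference in one digit really means the numbers are different.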
The remainder of the proof that $\mathbb{R}$ is uncountable is left to Exercise 7-19b. The uncountability of the real numbers shows that, even though in real life we work mostly with rational numbers, there are many more irrational numbers than there are rational numbers.
Theorem 7.21 The set $\mathbb{R} \setminus \mathbb{Q}$ of irrational numbers is uncountable.
Proof. By Exercise 7-19b, the real numbers are uncountable. Now for a contradiction suppose that $\mathbb{R} \setminus \mathbb{Q}$ was countable. Then $\mathbb{R} = (\mathbb{R} \setminus \mathbb{Q}) \cup \mathbb{Q}$ would be countable by Theorem 7.16, contradiction.
We conclude this section by defining Cantor sets. These sets are very useful to
construct counterexamples which show that certain hypotheses in analytical theorems
cannot be dispensed with. Because these counterexamples can be considered a bit
pathological, we defer their construction to the exercises and we will only refer to
Cantor sets when necessary. Cantor sets are constructed from a sequence of unions
of closed intervals so that in each step we remove the middle of each interval. Figure
19 shows the first six stages in the construction of the ternary Cantor set, which is
constructed by successively removing the middle third of the intervals at each stage.
Definition 7.22 Let $[a, b]$ be an interval and let $0 < q < \frac{1}{2}$. Then we define the left part of $[a, b]$ to be the interval $L_q[a, b] := [a, a + q(b - a)]$ and the right part to be the interval $R_q[a, b] := [b - q(b - a), b]$.
For a sequence $Q := \{q_n\}_{n=1}^\infty$ with $0 < q_n < \frac{1}{2}$ for all $n \in \mathbb{N}$ define $C_n^Q$ recursively as follows. Let $C_0^Q := I_{1,0}^Q := [0, 1]$ and once the set $C_n^Q$ is defined as a union of pairwise disjoint closed intervals $\bigcup_{i=1}^{2^n} I_{i,n}^Q$, for $i = 1, \ldots, 2^n$ let $I_{2i-1,n+1}^Q := L_{q_{n+1}}\left[I_{i,n}^Q\right]$, let $I_{2i,n+1}^Q := R_{q_{n+1}}\left[I_{i,n}^Q\right]$ and let $C_{n+1}^Q := \bigcup_{i=1}^{2^{n+1}} I_{i,n+1}^Q$. Then $C^Q := \bigcap_{n=1}^\infty C_n^Q$ is called the Cantor set associated with the sequence Q.
Even though the construction looks like it should only leave the boundary points
of the intervals, Cantor sets are in fact uncountable. The details can be explored in
Exercise 7-25.
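The recursive construction can be carried out exactly with rational arithmetic. The following sketch computes the intervals of a finite stage of the construction; the function names are assumptions, and the example uses the constant sequence $q_n = \frac{1}{3}$ of the ternary Cantor set.

```python
from fractions import Fraction


def left_part(interval, q):
    """L_q[a, b] = [a, a + q(b - a)]."""
    a, b = interval
    return (a, a + q * (b - a))


def right_part(interval, q):
    """R_q[a, b] = [b - q(b - a), b]."""
    a, b = interval
    return (b - q * (b - a), b)


def cantor_stage(qs):
    """Intervals of the n-th stage for n = len(qs), built from q_1, ..., q_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for q in qs:
        refined = []
        for iv in intervals:
            refined.append(left_part(iv, q))
            refined.append(right_part(iv, q))
        intervals = refined
    return intervals


stage2 = cantor_stage([Fraction(1, 3), Fraction(1, 3)])
total_length = sum(b - a for a, b in stage2)
print([(str(a), str(b)) for a, b in stage2], total_length)
```

At stage n there are $2^n$ intervals, and each step multiplies the total length by $2q_{n}$, so for the ternary Cantor set the total length after two steps is $\frac{4}{9}$.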
Exercises
7-18. Construct a sequence $\{P_n\}_{n=1}^\infty$ of infinite sets so that no two sets $P_n$ and $P_{n+1}$ are equivalent, but for each $n \in \mathbb{N}$ there is an injective function $f_n : P_n \to P_{n+1}$.
7-19. Containment and uncountable sets.
(a) Let U, V be sets. Prove that if U is uncountable and $U \subseteq V$, then V is uncountable.
(b) Prove that $\mathbb{R}$ is not a countable set.
7-20. Prove that every uncountable set contains a countably infinite subset.
7-21. Prove that the set of all functions from $\mathbb{N}$ to $\{0, 1\}$ is uncountable.
7-22. Let $\{a_i\}_{i\in I}$ be an uncountable family of positive numbers. Prove that there are an $\varepsilon > 0$ and a countable subfamily $\{a_{i_n}\}_{n=1}^\infty$ so that $a_{i_n} > \varepsilon$ for all $n \in \mathbb{N}$.
7-23. Let $F : [a, b] \to \mathbb{R}$ be a nondecreasing bounded function. Prove that F can have at most countably many discontinuities.
Hint. Suppose the set of discontinuities is uncountable and use Exercise 3-21 to conclude that there must be an $\varepsilon$ so that the set of all x with $\lim_{z\to x^+} F(z) - \lim_{z\to x^-} F(z) > \varepsilon$ is uncountable.
7-24. Prove that for every countable subset $A \subseteq \mathbb{R}$ there is a nondecreasing function $f : \mathbb{R} \to [0, 1]$ that is continuous on $\mathbb{R} \setminus A$ and discontinuous at every $a \in A$.
Hint. With $A = \{a_n : n \in \mathbb{N}\}$ set $f(x) := \sum_{n : a_n < x} \frac{1}{2^n}$.
7-25. Cantor sets.
(a) Prove that for any sequence $Q = \{q_n\}_{n=1}^\infty$ of numbers $q_n \in \left(0, \frac{1}{2}\right)$ there is a bijective function from $C^Q$ to the set of all sequences of zeroes and ones.
Hint. For $x \in C^Q$ and $n \ge 1$, set $a_n(x) := 0$ iff $x \in I_{j,n}^Q = L_{q_n}\left[I_{i,n-1}^Q\right]$ for some i and j, that is, iff we have to turn "left" at the nth stage of the construction to keep x in the interval.
(b) Prove that $C^Q$ is uncountable. Hint. Exercise 7-21.
(c) Prove that the set of endpoints of the intervals $I_{j,n}^Q$ that make up the $C_n^Q$ is countable.
(d) Prove that every $x \in C^Q$ is the limit of a sequence of endpoints of intervals $I_{j,n}^Q$.
Chapter 8
The Riemann Integral II
The Dirichlet function in Example 5.26 shows that functions with too many discontinuities may not be Riemann integrable. This is because for a discontinuity d there is an $\varepsilon > 0$ so that, independent of $\delta$, the numbers $\inf\{f(x) : x \in (d - \delta, d + \delta)\}$ and $\sup\{f(x) : x \in (d - \delta, d + \delta)\}$ will always be at least $\varepsilon$ apart. By Theorem 5.25, too
many such discontinuities will cause the function to not be Riemann integrable. At the
same time, Proposition 5.12 shows that a function can have some discontinuities and
still be Riemann integrable. To determine when a function is Riemann integrable, we
need to determine “how many” discontinuities are acceptable. For a graphical motivation, consider Figure 20 on page 133. Section 8.1 introduces outer Lebesgue measure,
which is the tool to measure “how many” discontinuities a function has. Section 8.2
introduces Lebesgue’s integrability criterion and Section 8.3 shows how this criterion
allows us to easily obtain new results about the Riemann integral. We conclude in
Section 8.4 with improper integrals.
8.1 Outer Lebesgue Measure
Outer Lebesgue measure covers a set with open intervals and assigns the infimum of the
sums of the lengths of these intervals as the measure (“size”) of the set. With respect
to Riemann integrability, we should note that if all discontinuities of our function are
trapped in a union of intervals, then outside of this union of intervals the function is
continuous and Riemann integrability should not be a problem there.
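Definition 8.1 For an open interval $I = (a, b)$ in $\mathbb{R}$, we define $|I| := b - a$. For any set $S \subseteq \mathbb{R}$, we define the outer Lebesgue measure of S to be
$$\lambda(S) := \inf\left\{ \sum_{j=1}^\infty |I_j| : S \subseteq \bigcup_{j=1}^\infty I_j, \text{ each } I_j \text{ is an open interval} \right\},$$
where we set $\lambda(S) = \infty$, if none of the series in the set on the right converge.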
The proof of Proposition 8.5 will show why we use open intervals in the definition
of outer Lebesgue measure. We first turn our attention to sets with outer Lebesgue
measure zero. These sets, and properties that hold on the complement of such a set,
will be of particular importance for the Riemann integral.
Definition 8.2 A set of outer Lebesgue measure zero is called a set of measure zero or a null set. A property $P(x)$ such that $\lambda(\{x \in D : P(x) \text{ is not true}\}) = 0$ is said to hold almost everywhere in D. Almost everywhere is also abbreviated as a.e.
Countable sets are considered "small" in set theory and they are also "small" with respect to outer Lebesgue measure.
Proposition 8.3 Countable subsets of $\mathbb{R}$ have outer Lebesgue measure 0.
Proof. Let $C = \{c_1, c_2, \ldots\}$ be a countable subset of $\mathbb{R}$ and let $\varepsilon > 0$. For $j \in \mathbb{N}$ let $I_j := \left(c_j - \frac{\varepsilon}{2}\frac{1}{2^j}, c_j + \frac{\varepsilon}{2}\frac{1}{2^j}\right)$. Then $C \subseteq \bigcup_{j=1}^\infty I_j$ and $\sum_{j=1}^\infty |I_j| = \sum_{j=1}^\infty \frac{\varepsilon}{2^j} = \varepsilon$. Thus $\lambda(C) = 0$.
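The covering used in this proof can be written down explicitly for the first few points of a countable set. The sketch below is an illustration under assumed names (`cover`, `points`); it lists some rationals in $[0, 1]$, builds the intervals $I_j$, and confirms that their total length stays below $\varepsilon$.

```python
from fractions import Fraction


def cover(points, eps):
    """I_j = (c_j - eps/(2 * 2^j), c_j + eps/(2 * 2^j)) for j = 1, 2, ..."""
    return [(c - eps / (2 * 2**j), c + eps / (2 * 2**j))
            for j, c in enumerate(points, start=1)]


eps = Fraction(1, 100)
# The first few terms of an enumeration of rationals in [0, 1]; repeated
# values are harmless for the covering argument.
points = [Fraction(n, d) for d in range(1, 6) for n in range(0, d + 1)]
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)

print(len(points), total_length < eps)
```

With N points the total length is $\varepsilon(1 - 2^{-N})$, so it never reaches $\varepsilon$ no matter how many points of the enumeration are covered.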
Although the proof of Proposition 8.3 is quite simple, it takes a little to get used to the result. Recall that $\mathbb{Q}$ is countable, which means it is a null set! Exercise 8-3d will show that null sets can be uncountable, too. Of course, not all subsets of $\mathbb{R}$ are null sets. To prove that outer Lebesgue measure provides the "right" measure for intervals, we need the Heine-Borel Theorem. The conclusion of the Heine-Borel Theorem is the inspiration for the topological definition of compactness (see Theorem 16.72). Until we formally introduce compactness in Section 16.5 we will rely on Standard Proof Technique 2.28 as used in the proof of the Heine-Borel Theorem.
Theorem 8.4 Heine-Borel Theorem. Let $[a, b] \subset \mathbb{R}$ and let $\mathcal{I}$ be a family of open intervals with $[a, b] \subseteq \bigcup_{I\in\mathcal{I}} I$. Then there are finitely many intervals $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $[a, b] \subseteq \bigcup_{j=1}^n I_j$.
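Proof. Suppose for a contradiction that
$$C := \left\{ x \in [a, b] : \left(\forall I_1, I_2, \ldots, I_n \in \mathcal{I} : [a, x] \not\subseteq \bigcup_{j=1}^n I_j\right) \right\} \neq \emptyset.$$
Let $c := \inf C$. Because $a \in I$ for some $I \in \mathcal{I}$ we infer $c > a$. Let $J \in \mathcal{I}$ be an open interval with $c \in J$. If $c < b$ there is a $\delta > 0$ with $(c - \delta, c + \delta) \subseteq J \cap [a, b]$. If $c = b$ there is a $\delta > 0$ with $(c - \delta, b] \subseteq J \cap [a, b]$. By definition of c, there are finitely many $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $[a, c - \delta] \subseteq \bigcup_{j=1}^n I_j$. But then if $c < b$ we obtain $[a, c + \delta] \subseteq J \cup \bigcup_{j=1}^n I_j$, contradicting the definition of c. Hence, $c = b$ and we obtain $[a, b] \subseteq J \cup \bigcup_{j=1}^n I_j$, implying $C = \emptyset$, a contradiction.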
Proposition 8.5 Let $a, b \in \mathbb{R}$ with $a < b$. Then $\lambda([a, b]) = b - a$.
Proof. Let $\varepsilon > 0$. Because $[a, b] \subseteq \left(a - \frac{\varepsilon}{4}, b + \frac{\varepsilon}{4}\right) \cup \bigcup_{n=2}^\infty \left(-\frac{\varepsilon}{2 \cdot 2^n}, \frac{\varepsilon}{2 \cdot 2^n}\right)$ we obtain $\lambda([a, b]) \le (b - a) + \varepsilon$. Because $\varepsilon > 0$ was arbitrary, $\lambda([a, b]) \le b - a$.
To show the reverse inequality, let $\{I_j\}_{j=1}^\infty$ be a countable family of open intervals so that $[a, b] \subseteq \bigcup_{j=1}^\infty I_j$. By the Heine-Borel Theorem, there is a finite number of intervals $I_{j_1}, \ldots, I_{j_n}$ so that $[a, b] \subseteq \bigcup_{k=1}^n I_{j_k}$. For $k = 1, \ldots, n$ let $I_{j_k} = (a_k, b_k)$. Without loss of generality assume that no interval $I_{j_k}$ is contained in another. Reorder the intervals so that for $k = 2, \ldots, n$ we have $b_{k-1} \le b_k$. Then for all $k = 2, \ldots, n$ we have $a_{k-1} \le a_k$. Moreover, $b_n > b$, $a_1 < a$ and for all $k = 2, \ldots, n$ we infer $a_k < b_{k-1}$. Hence,
$$\sum_{k=1}^n b_k - a_k > b_1 - a_1 + \sum_{k=2}^n b_k - b_{k-1} = b_1 - a_1 + b_n - b_1 \ge b - a,$$
which implies $\sum_{j=1}^\infty |I_j| > b - a$, and hence $\lambda([a, b]) \ge b - a$.
Aside from its use related to Riemann integration, outer Lebesgue measure also is
the foundation for Lebesgue integration. We conclude this section with some of the
properties of outer Lebesgue measure, which will also be helpful for some exercises.
Theorem 8.6 The properties of outer Lebesgue measure $\lambda$. With $\infty$ defined to be greater than all real numbers and the sum of a divergent series of nonnegative numbers being $\infty$ we have the following.
1. $\lambda(\emptyset) = 0$.
2. If $A \subseteq B$, then $\lambda(A) \le \lambda(B)$.
3. Outer Lebesgue measure is countably subadditive. That is, for all sequences $\{A_n\}_{n=1}^\infty$ of subsets $A_n \subseteq \mathbb{R}$ the inequality $\lambda\left(\bigcup_{n=1}^\infty A_n\right) \le \sum_{n=1}^\infty \lambda(A_n)$ holds.
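Proof. For part 2, let $A \subseteq B$. Then we obtain the following.
$$\lambda(A) = \inf\left\{ \sum_{j=1}^\infty |I_j| : A \subseteq \bigcup_{j=1}^\infty I_j, \text{ each } I_j \text{ is an open interval} \right\} \le \inf\left\{ \sum_{j=1}^\infty |I_j| : B \subseteq \bigcup_{j=1}^\infty I_j, \text{ each } I_j \text{ is an open interval} \right\} = \lambda(B).$$
For part 1, note that for all $n \in \mathbb{N}$ we have $\emptyset \subseteq \left(-\frac{1}{n}, \frac{1}{n}\right)$ and thus $\lambda(\emptyset) \le \frac{2}{n}$, which via part 2 implies $\lambda(\emptyset) = 0$.
For part 3, first note that there is nothing to prove if the right side is infinite. So assume the right side is finite and let $\varepsilon > 0$. For each $n \in \mathbb{N}$, find a countable family $\{I_j^n\}_{j=1}^\infty$ of open intervals so that $A_n \subseteq \bigcup_{j=1}^\infty I_j^n$ and $\sum_{j=1}^\infty |I_j^n| < \lambda(A_n) + \frac{\varepsilon}{2^n}$. By Lemma 7.14 the family $\{I_j^n\}_{j,n=1}^\infty$ is a countable family of open intervals. The union of this family is such that $\bigcup_{n=1}^\infty A_n \subseteq \bigcup_{n=1}^\infty \bigcup_{j=1}^\infty I_j^n = \bigcup_{j,n=1}^\infty I_j^n$. By Proposition 6.22, the convergence behavior and value of a doubly infinite sum of nonnegative numbers do not depend on the order of summation and by Exercise 7-17 it does not matter if we represent the sum as a double sum or a single sum. Thus we can conclude
$$\lambda\left(\bigcup_{n=1}^\infty A_n\right) \le \sum_{j,n=1}^\infty |I_j^n| = \sum_{n=1}^\infty \sum_{j=1}^\infty |I_j^n| \le \sum_{n=1}^\infty \left(\lambda(A_n) + \frac{\varepsilon}{2^n}\right) = \sum_{n=1}^\infty \lambda(A_n) + \varepsilon.$$
Because $\varepsilon$ was arbitrary, this proves part 3.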
Exercises
8-1. Let $\{A_n\}_{n=1}^\infty$ be a countable family of null sets. Prove that $\lambda\left(\bigcup_{n=1}^\infty A_n\right) = 0$.
8-2. Let $a, b \in \mathbb{R}$ with $a < b$. Prove that $\lambda((a, b)) = b - a$.
Hint. For "$\ge$" approximate the open interval "from the inside" with closed intervals.
8-3. Let $Q = \{q_n\}_{n=1}^\infty$ be a sequence of numbers $q_n \in \left(0, \frac{1}{2}\right)$ and let $C^Q$ be the associated Cantor set as in Definition 7.22. We will use the notation of Definition 7.22 throughout this exercise.
j=l
Hint. Prove that for a finite union of pairwise disjoint intervals the outer Lebesgue measure is the sum of the lengths of the intervals. This requires repeated use of the argument in the proof of Proposition 8.5.
(b) Prove that $\left\{\prod_{j=1}^n 2q_j\right\}_{n=1}^\infty$ converges.
(c) Prove that $\lambda\left(C^Q\right) = \lim_{n\to\infty} \prod_{j=1}^n 2q_j$.
Hint. For "$\ge$," first consider a family $\mathcal{I}$ of open intervals so that $C^Q \subseteq \bigcup \mathcal{I}$. Prove that
$$C := \left\{ x \in [0, 1] : \left(\forall I_1, I_2, \ldots, I_n \in \mathcal{I} : C^Q \cap [0, x] \not\subseteq \bigcup_{j=1}^n I_j\right) \right\} = \emptyset,$$
by assuming it is not empty and showing that $\inf C \in C^Q$, which leads to a contradiction similar to the proof of the Heine-Borel Theorem. (It also helps to look ahead to the proof of Lemma 8.11 for a similar argument.) Then show that for any countable family $\{I_j\}_{j=1}^\infty$ with $C^Q \subseteq \bigcup_{j=1}^\infty I_j$ there are intervals $I_{j_1}, \ldots, I_{j_m}$ with $C^Q \subseteq \bigcup_{k=1}^m I_{j_k}$. Conclude that there must be an $n \in \mathbb{N}$ so that $C_n^Q \subseteq \bigcup_{k=1}^m I_{j_k}$.
(d) Prove that for any $q \in \left(0, \frac{1}{2}\right)$ the constant sequence $Q = \{q\}_{n=1}^\infty$ yields a Cantor set $C^Q$ of measure zero.
Note. By Exercise 7-25b in Section 7.3 Cantor sets are uncountable. This means there are uncountable sets of measure zero.
(e) Use Q =
{
2n;;
n=l
~
to prove that there are Cantor sets that are not of measure zero
Hint. Prove that for all $n \in \mathbb{N}$ we have $\prod_{j=1}^n 2q_j \ge 1 - \sum_{j=2}^{n+1} \frac{1}{2^j}$.
(f) Prove that there are Cantor sets whose Lebesgue measure is arbitrarily close to 1.
Hint. For $k \in \mathbb{N}$ fixed, use $n + k$ instead of n in Exercise 8-3e.
8-4. Use the Heine-Borel Theorem and the axioms for $\mathbb{R}$ except for Axiom 1.19 to prove the Bolzano-Weierstrass Theorem.
Hint. We will ultimately do this in a more abstract setting in Theorem 16.72.
8-5. Prove that if $f : [a, b] \to \mathbb{R}$ is continuous and $\lambda(\{x \in [a, b] : f(x) \neq 0\}) = 0$, then $f(x) = 0$ for all $x \in [a, b]$.
8-6. Prove that if $f, g : [a, b] \to \mathbb{R}$ are continuous almost everywhere, then $f + g$ is continuous almost everywhere.
8.2 Lebesgue's Criterion for Riemann Integrability
The oscillation of a function is a quantitative measure of "how discontinuous" the function is. It is the last tool we need to characterize Riemann integrable functions.
Definition 8.7 Let $f : [a, b] \to \mathbb{R}$ be a bounded function and let $x \in [a, b]$. For any interval I let $\omega_f(I) := \sup\{|f(y) - f(z)| : y, z \in I \cap [a, b]\}$ be the oscillation of f over the interval I. Define the oscillation of f at the point $x \in [a, b]$ as the infimum $\omega_f(x) := \inf\{\omega_f(J) : x \in J, J \text{ is an open interval}\}$.
Exercise 8-8 shows that the oscillation measures the height of "jumps" in the function and Exercise 12-23 shows that the oscillation is also a measure of the size of oscillations. Regarding the details of the definition, Exercise 8-10 shows that it is important that we use open intervals in the definition of the oscillation $\omega_f(x)$ at a point.
Theorem 8.8 The bounded function $f : [a, b] \to \mathbb{R}$ is continuous at $x \in [a, b]$ iff $\omega_f(x) = 0$.
Proof. For "$\Rightarrow$," let f be continuous at $x \in [a, b]$ and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $y \in [a, b]$ with $|y - x| < \delta$ we have $|f(y) - f(x)| < \frac{\varepsilon}{2}$. Then for all $y, z \in (x - \delta, x + \delta) \cap [a, b]$ we obtain
$$|f(y) - f(z)| \le |f(y) - f(x)| + |f(x) - f(z)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Hence, $\omega_f(x) \le \omega_f((x - \delta, x + \delta)) \le \varepsilon$ and because $\varepsilon$ was arbitrary we conclude that $\omega_f(x) = 0$.
Conversely, for "$\Leftarrow$," let $x \in [a, b]$ with $\omega_f(x) = 0$ and let $\varepsilon > 0$. Then there is an interval $(c, d)$ with $x \in (c, d)$ such that $\omega_f((c, d)) < \varepsilon$. Let $\delta := \min\{x - c, d - x\}$ unless $x = a$, in which case we set $\delta := d - x$, or $x = b$, in which case we set $\delta := x - c$. Then for all $y \in [a, b]$ with $|x - y| < \delta$ we have $y \in (c, d)$, and hence $|f(x) - f(y)| < \varepsilon$. Thus f is continuous at x.
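The oscillation at a point can be estimated numerically by sampling the function on shrinking open intervals around the point. The sketch below is a rough illustration and not a method from the text: the function names, the sample counts and the radii are assumptions, and sampling can only underestimate the supremum in Definition 8.7.

```python
def oscillation_on_interval(f, lo, hi, a, b, samples=2000):
    """Estimate sup{|f(y) - f(z)|} over (lo, hi) intersected with [a, b]
    as max - min of f over interior sample points."""
    lo, hi = max(lo, a), min(hi, b)
    xs = [lo + (hi - lo) * (k + 0.5) / samples for k in range(samples)]
    values = [f(x) for x in xs]
    return max(values) - min(values)


def oscillation_at_point(f, x, a, b, radii=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Estimate the infimum over open intervals by trying a few radii."""
    return min(oscillation_on_interval(f, x - r, x + r, a, b) for r in radii)


def step(t):
    """A function with a jump of height 1 at 0."""
    return 1.0 if t >= 0 else 0.0


at_jump = oscillation_at_point(step, 0.0, -1.0, 1.0)
away_from_jump = oscillation_at_point(step, 0.5, -1.0, 1.0)
smooth = oscillation_at_point(lambda t: t * t, 0.5, -1.0, 1.0)
print(at_jump, away_from_jump, smooth)
```

In line with Theorem 8.8, the estimate is close to zero where the function is continuous and stays at the jump height where it is not.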
The next two lemmas show that, when it comes to Riemann integrability, small
oscillation means that lower and upper sums can get close to each other, while nonzero
oscillation on a set of positive outer Lebesgue measure prevents Riemann integrability
(also see Figure 20).
Lemma 8.9 Let $f : [a, b] \to \mathbb{R}$ be bounded. If $\omega_f(x) < \varepsilon$ for all $x \in [a, b]$, then there is a partition P of $[a, b]$ so that $U(f, P) - L(f, P) < \varepsilon|b - a|$.
Proof. For every $z \in [a, b]$, there is an open interval $I_z = (z - \delta_z, z + \delta_z)$ so that $\omega_f(I_z) < \varepsilon$. By the Heine-Borel Theorem, there are finitely many $z_1, \ldots, z_m \in [a, b]$ so that $[a, b] \subseteq \bigcup_{j=1}^m \left(z_j - \frac{\delta_{z_j}}{2}, z_j + \frac{\delta_{z_j}}{2}\right)$. Let $P := \{a = x_0 < x_1 < \cdots < x_n = b\}$ be the set comprised of a, b and the endpoints $z_j \pm \frac{\delta_{z_j}}{2}$ that are in $[a, b]$. Then each interval $[x_{i-1}, x_i]$ is contained in an interval $I_{z_j}$ and consequently, with notation as in Definition 5.13, we infer $M_i - m_i < \varepsilon$. But then
$$U(f, P) - L(f, P) = \sum_{i=1}^n M_i \Delta x_i - \sum_{i=1}^n m_i \Delta x_i = \sum_{i=1}^n (M_i - m_i) \Delta x_i < \sum_{i=1}^n \varepsilon \Delta x_i = \varepsilon|b - a|,$$
which finishes the proof.
Figure 20: The area of the boxes indicates the difference between the lower and the upper sum of the function for the given partition. Boxes are tall where the function has large slopes or discontinuities. Comparison of (a) and (b) shows that as the norm of the partition goes to zero, the height of the boxes goes to zero where the function is continuous. Where the function is discontinuous the height remains bounded away from zero (consider the two discontinuities). Thus f can only be Riemann integrable if the discontinuities (unavoidable tall boxes) can be trapped inside a set of intervals whose total length is small. This is the idea for the Lebesgue criterion.
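The picture in Figure 20 can be reproduced in numbers for a simple function with one jump. The sketch below assumes a nondecreasing function, so that on each piece of the partition the supremum is the value at the right endpoint and the infimum is the value at the left endpoint; the function names and the example are illustrative assumptions.

```python
from fractions import Fraction


def upper_minus_lower(f, a, b, n):
    """U(f, P) - L(f, P) for a nondecreasing f and the uniform partition
    of [a, b] into n pieces (here M_i = f(x_i) and m_i = f(x_{i-1}))."""
    xs = [a + (b - a) * Fraction(k, n) for k in range(n + 1)]
    return sum((f(xs[i]) - f(xs[i - 1])) * (xs[i] - xs[i - 1])
               for i in range(1, n + 1))


def jump(x):
    """Nondecreasing step function with a single jump at 1/3."""
    return 1 if x >= Fraction(1, 3) else 0


gaps = [upper_minus_lower(jump, Fraction(0), Fraction(1), n)
        for n in (10, 100, 1000)]
print(gaps)  # [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]
```

The only tall box sits over the piece that contains the jump. Its height stays 1, but its width shrinks with the partition, so the difference between upper and lower sums still goes to zero.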
Lemma 8.10 Let $f : [a, b] \to \mathbb{R}$ be bounded. If $\lambda(\{x \in [a, b] : \omega_f(x) > 0\}) > 0$, then f is not Riemann integrable.
Proof. Because $\{x \in [a, b] : \omega_f(x) > 0\} = \bigcup_{j=1}^\infty \left\{x \in [a, b] : \omega_f(x) > \frac{1}{j}\right\}$, there is an $\varepsilon > 0$ so that $L := \lambda(\{x \in [a, b] : \omega_f(x) > \varepsilon\}) > 0$.
Let $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ be any partition of $[a, b]$. Define the set $D := \{x \in [a, b] \setminus \{x_0, \ldots, x_n\} : \omega_f(x) > \varepsilon\}$. By Theorem 8.6, we obtain
$$L \ge \lambda(D) = \lambda(D) + \sum_{j=0}^n \lambda(\{x_j\}) \ge \lambda\left(D \cup \bigcup_{j=0}^n \{x_j\}\right) \ge \lambda(\{x \in [a, b] : \omega_f(x) > \varepsilon\}) = L,$$
which means $\lambda(D) = L$.
With $i_1, \ldots, i_k \in \{1, \ldots, n\}$ being the indices of the intervals $(x_{i-1}, x_i)$ that intersect D we obtain $D \subseteq \bigcup_{j=1}^k (x_{i_j - 1}, x_{i_j})$ and $\sum_{j=1}^k \Delta x_{i_j} \ge \lambda(D) = L$. But then in each interval $(x_{i_j - 1}, x_{i_j})$ there is an $s_j$ with $\omega_f(s_j) > \varepsilon$, and hence $\omega_f((x_{i_j - 1}, x_{i_j})) > \varepsilon$. For each $j = 1, \ldots, k$ we infer $M_{i_j} - m_{i_j} > \varepsilon$ and thus
$$U(f, P) - L(f, P) = \sum_{i=1}^n (M_i - m_i) \Delta x_i \ge \sum_{j=1}^k (M_{i_j} - m_{i_j}) \Delta x_{i_j} > \varepsilon L > 0.$$
Because P was arbitrary we have shown that for any partition P of $[a, b]$ we have $U(f, P) - L(f, P) > \varepsilon L > 0$, where $\varepsilon$ and L are fixed. By Theorem 5.25 this means that f is not Riemann integrable.
Lemma 8.10 shows that only functions that are continuous almost everywhere have
a chance to be Riemann integrable. Lemma 8.9 shows that if we can “trap” the discontinuities in a small enough set, a function should be Riemann integrable. The only
obstacle left is that outer Lebesgue measure works with countably many open intervals. To obtain a complement that is made up of closed intervals, it would be nice if
we could use finitely many open intervals to cover the set of discontinuities. The next
lemma shows that this is possible.
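Lemma 8.11 Let $f : [a, b] \to \mathbb{R}$ be a bounded function and for each $\rho > 0$ define $B_\rho := \{x \in [a, b] : \omega_f(x) \ge \rho\}$. If $\mathcal{I}$ is a family of open intervals such that $B_\rho \subseteq \bigcup_{I\in\mathcal{I}} I$, then there are finitely many $I_1, I_2, \ldots, I_n \in \mathcal{I}$ so that $B_\rho \subseteq \bigcup_{j=1}^n I_j$.
Proof. Suppose for a contradiction that
$$C := \left\{ x \in [a, b] : \left(\forall I_1, I_2, \ldots, I_n \in \mathcal{I} : B_\rho \cap [a, x] \not\subseteq \bigcup_{j=1}^n I_j\right) \right\} \neq \emptyset.$$
Then $c := \inf C \le b$.
We first claim $c \in B_\rho$. There is a sequence $\{a_n\}_{n=1}^\infty$ of elements of $B_\rho$ so that $a_n < c$ and $\lim_{n\to\infty} a_n = c$. Let $\varepsilon > 0$. Then there is an $n \in \mathbb{N}$ so that $|a_n - c| < \frac{\varepsilon}{2}$. Moreover, because $a_n \in B_\rho$ we have that $\omega_f\left(\left(a_n - \frac{\varepsilon}{2}, a_n + \frac{\varepsilon}{2}\right)\right) \ge \rho$. Because of the containment $\left(a_n - \frac{\varepsilon}{2}, a_n + \frac{\varepsilon}{2}\right) \subseteq (c - \varepsilon, c + \varepsilon)$ we infer $\omega_f((c - \varepsilon, c + \varepsilon)) \ge \rho$. Because for all $l < c < r$ there is an $\varepsilon > 0$ with $(c - \varepsilon, c + \varepsilon) \subseteq (l, r)$ we conclude that $\omega_f(c) \ge \rho$.
Because $c \in B_\rho$, there is an open interval $I \in \mathcal{I}$ that contains c. Clearly, $c \ge a$. But then $x := \max\{a, \inf(I)\} \in [a, b]$ and because $x \notin C$, there are intervals $I_1, \ldots, I_n \in \mathcal{I}$ so that $B_\rho \cap [a, x] \subseteq \bigcup_{j=1}^n I_j$. But then for all $y \in I$ with $y \ge c$ we obtain $B_\rho \cap [a, y] \subseteq I \cup \bigcup_{j=1}^n I_j$. If $c < b$ this means $\inf C \ge \sup(I) > c$ and if $c = b$ this means $C = \emptyset$, a contradiction either way. This proves the result.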
Now we can characterize Riemann integrability as follows.
Theorem 8.12 Lebesgue's criterion for Riemann integrability. The bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable on $[a, b]$ iff f is continuous a.e. on $[a, b]$.
Proof. The part "$\Rightarrow$" is the contrapositive of Lemma 8.10.
For "$\Leftarrow$," let f be continuous a.e. on $[a, b]$ and let $\varepsilon > 0$. Choose $m \in \mathbb{N}$ so that $\frac{b - a}{m} < \frac{\varepsilon}{2}$. Let $X_m := \left\{x \in [a, b] : \omega_f(x) \ge \frac{1}{m}\right\}$. Then $\lambda(X_m) = 0$, and there is a sequence $\{I_j\}_{j=1}^\infty$ of open intervals with $X_m \subseteq \bigcup_{j=1}^\infty I_j$ and $\sum_{j=1}^\infty |I_j| < \frac{\varepsilon}{2(\omega_f([a, b]) + 1)}$. By Lemma 8.11, there are finitely many open intervals $I_{j_1}, \ldots, I_{j_k}$ so that $X_m \subseteq \bigcup_{l=1}^k I_{j_l}$ and without loss of generality we may assume the $I_{j_l}$ are pairwise disjoint. For each $I_{j_l}$ let $C_{j_l} := I_{j_l} \cup \{\sup(I_{j_l}), \inf(I_{j_l})\}$. The set $[a, b] \setminus \bigcup_{l=1}^k I_{j_l}$ is a finite union of closed intervals $J_1, \ldots, J_p$ and singleton sets $\{s_1\}, \ldots, \{s_q\}$. By Lemma 8.9 for each $J_i$, there is a partition $P_i$ so that $U(f, P_i) - L(f, P_i) < \frac{|J_i|}{m}$. Set $P := \bigcup_{i=1}^p P_i \cup \{a, b\}$. Then
$$U(f, P) - L(f, P) \le \sum_{i=1}^p \left(U(f, P_i) - L(f, P_i)\right) + \sum_{l=1}^k \omega_f([a, b]) |I_{j_l}| < \frac{1}{m} \sum_{i=1}^p |J_i| + \omega_f([a, b]) \sum_{l=1}^k |I_{j_l}| \le \frac{b - a}{m} + \omega_f([a, b]) \frac{\varepsilon}{2(\omega_f([a, b]) + 1)} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
By Theorem 5.25 f is Riemann integrable on $[a, b]$.
Lebesgue’s integrability criterion shows some of our earlier examples in a different
light. Clearly, the Lebesgue criterion implies that continuous functions are Riemann
integrable, so it supersedes Theorem 5.20. Similarly, any function with finitely many
discontinuities is Riemann integrable, so Proposition 5.12 also is an easy consequence
of the Lebesgue criterion. Regarding the Dirichlet function we can say the following.
Example 8.13 (Example 5.26 revisited.) The Dirichlet function is not Riemann integrable on $[0,1]$.
In light of Theorem 8.12, the function
$$f(x) = \begin{cases} 1; & \text{for } x \in [0,1] \cap \mathbb{Q}, \\ 0; & \text{for } x \in [0,1] \setminus \mathbb{Q} \end{cases}$$
is not Riemann integrable, because it is discontinuous at every point of $[0,1]$ and by Proposition 8.5 we have $\lambda([0,1]) = 1 > 0$.
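In contrast to the Dirichlet function, a function with a single jump is continuous a.e., and the difference between its upper and lower sums can be computed exactly. The following minimal sketch does this for the indicator function of $[\frac{1}{2}, 1]$ on $[0,1]$ with uniform partitions. The function name `darboux_gap` and the choice of example are ours and are not taken from the text.

```python
from fractions import Fraction

def darboux_gap(n, jump=Fraction(1, 2)):
    """U(f,P) - L(f,P) for f = indicator of [jump, 1] on [0, 1],
    where P is the uniform partition of [0, 1] into n subintervals."""
    gap = Fraction(0)
    for i in range(n):
        left, right = Fraction(i, n), Fraction(i + 1, n)
        sup_f = 1 if right >= jump else 0   # f equals 1 somewhere on [left, right]
        inf_f = 1 if left >= jump else 0    # f equals 1 everywhere on [left, right]
        gap += (sup_f - inf_f) * (right - left)
    return gap

for n in (10, 100, 1000):
    print(n, darboux_gap(n))
```

Only the subinterval that contains the jump contributes to the difference, so the gap shrinks with the mesh of the partition, which is what Riemann's Condition (Theorem 5.25) requires.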
Exercises
8-7. Let $f : [a,b] \to \mathbb{R}$ be bounded and let $I, J$ be intervals with $I \subseteq J$. Prove that $\omega_f(I) \le \omega_f(J)$.
8-8. The oscillation as a measure of "jumps" in the function.
(a) Let $f(x) := \mathbf{1}_{[0,\infty)}(x)$. Prove that $\omega_f(0) = 1$.
(b) Let $f : [a,b] \to \mathbb{R}$ be bounded and let $x \in (a,b)$ be so that $\lim_{z\to x^-} f(z)$ and $\lim_{z\to x^+} f(z)$ both exist. Prove that $\omega_f(x) \ge \left| \lim_{z\to x^+} f(z) - \lim_{z\to x^-} f(z) \right|$.
(c) State when the inequality in Exercise 8-8b actually is an equality and prove your claim.
8-9. Let $f : [a,b] \to \mathbb{R}$ be bounded and let $x \in [a,b]$. Prove that $\omega_f(x) = \lim_{n\to\infty} \omega_f\left(\left(x - \frac{1}{n}, x + \frac{1}{n}\right)\right)$.
8-10. Prove that for $f(x) := \begin{cases} 0; & \text{for } x \ge 0, \\ 1; & \text{for } x < 0, \end{cases}$ we have $\inf\{ \omega_f(J) : 0 \in J, J \text{ is an interval} \} = 0$, even though $f$ is discontinuous at 0. Then explain why this does not contradict Exercise 8-8b.
8-11. Prove that any nondecreasing function $F : [a,b] \to \mathbb{R}$ is Riemann integrable. Hint. Exercise 7-23.
8-12. Let $a < b$. A function $f : [a,b] \to \mathbb{R}$ is said to be of bounded variation iff $\sup\left\{ \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a = a_0 < a_1 < \cdots < a_n = b \right\} < \infty$.
(a) Prove that every function of bounded variation is the difference of two nondecreasing functions. Hint. Use $V_a^x(f) := \sup\left\{ \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a = a_0 < a_1 < \cdots < a_n = x \right\}$.
(b) Prove that every function of bounded variation has at most countably many discontinuities.
Hint. Exercise 1-23.
(c) Prove that every function of bounded variation on $[a,b]$ is Riemann integrable on $[a,b]$.
(d) Prove that if $f$ and $g$ are of bounded variation and $\alpha \in \mathbb{R}$, then $f + g$ and $\alpha f$ are of bounded variation, too.
8-13. A function $f : [a,b] \to \mathbb{R}$ is called piecewise continuous iff there are $a = z_0 < z_1 < \cdots < z_n = b$ so that for all $i = 1, \ldots, n$ the restriction $f|_{(z_{i-1}, z_i)}$ is continuous. Prove that piecewise continuous bounded functions are Riemann integrable.
8-14. Prove that if $A, N \subseteq \mathbb{R}$ and $N$ is a null set, then $\lambda(A \setminus N) = \lambda(A)$.
8-15. Prove that if $C^Q$ is a Cantor set with nonzero measure (see Exercise 8-3e), then $\mathbf{1}_{C^Q}$ is not Riemann integrable. Hint. Prove that $\mathbf{1}_{C^Q}$ is discontinuous at every $x \in C^Q$.
8.3 More Integral Theorems
The most tedious part in any proof involving Riemann integrable functions is to prove
Riemann integrability. Lebesgue's integrability criterion is a powerful tool to address
this issue. Without it, the proofs of Theorems 8.14 and 8.16 would be a lot longer.
Theorem 8.14 Let $f, g : [a,b] \to \mathbb{R}$ be Riemann integrable. Then
1. The product $fg$ is Riemann integrable on $[a,b]$,
2. If there is an $\varepsilon > 0$ so that $|g(x)| \ge \varepsilon$ for all $x \in [a,b]$, then the quotient $\frac{f}{g}$ is Riemann integrable on $[a,b]$,
3. The absolute value $|f|$ is Riemann integrable and the triangular inequality $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$ holds.
Proof. For part 1, note that if $f$ and $g$ are continuous a.e., then the set of discontinuities $\{ x : \omega_{fg}(x) \ne 0 \}$ is contained in the union $\{ x : \omega_f(x) \ne 0 \} \cup \{ x : \omega_g(x) \ne 0 \}$, which has measure zero. Thus $fg$ is continuous a.e., and hence Riemann integrable.
The remaining parts of this proof are left as Exercise 8-16.
It is now easy to see that once f : [a,b] + R is Riemann integrable, then it is also
Riemann integrable over any closed subinterval. Formally, we define the following.
Definition 8.15 Let $f : [a,b] \to \mathbb{R}$ and let $c, d \in [a,b]$ with $c < d$. Then $f$ is called Riemann integrable over $[c,d]$ iff the restriction $f|_{[c,d]}$ is Riemann integrable. In this case we set
$$\int_c^d f(x)\,dx := \int_c^d f|_{[c,d]}(x)\,dx.$$
Theorem 8.16 Let $f : [a,b] \to \mathbb{R}$ and let $m \in (a,b)$. Then $f$ is Riemann integrable over $[a,b]$ iff $f$ is Riemann integrable over $[a,m]$ and over $[m,b]$. In this case, the integrals satisfy the equation
$$\int_a^b f(x)\,dx = \int_a^m f(x)\,dx + \int_m^b f(x)\,dx.$$
Proof. Exercise 8-17.
We can now consider integrals in which the upper bound is an independent variable. This idea provides another connection between derivatives and integrals. For any
continuous function f , definite integrals produce a function G with G' = f.
Theorem 8.17 Fundamental Theorem of Calculus, Derivative Form. Let $f$ be a Riemann integrable function on $[a,b]$. Then the function $G(x) := \int_a^x f(t)\,dt$ is uniformly continuous on $[a,b]$ and if $f$ is continuous at $x \in (a,b)$, then $G$ is differentiable at $x$ with $G'(x) = f(x)$.
Proof. Let $B$ be an upper bound so that $|f(x)| < B$ for all $x \in [a,b]$. To see that $G$ is uniformly continuous on $[a,b]$, let $\varepsilon > 0$. Set $\delta := \frac{\varepsilon}{B}$. Then for all $x, z \in [a,b]$ with $|x - z| < \delta$ we obtain (assuming without loss of generality that $x < z$)
$$|G(z) - G(x)| = \left| \int_x^z f(t)\,dt \right| \le \int_x^z |f(t)|\,dt \le B\,|z - x| < B\,\frac{\varepsilon}{B} = \varepsilon,$$
which means that $G$ is uniformly continuous.
Now let $x \in (a,b)$, let $f$ be continuous at $x$ and let $\varepsilon > 0$. Then there is a $\delta > 0$ such that for all $z \in [a,b]$ with $|z - x| < \delta$ we have $|f(z) - f(x)| < \varepsilon$. Hence, for all $z > x$ with $z \in [a,b]$ and $|z - x| < \delta$ we obtain
$$\left| \frac{G(z) - G(x)}{z - x} - f(x) \right| = \left| \frac{1}{z - x} \int_x^z \left( f(t) - f(x) \right) dt \right| \le \frac{1}{z - x} \int_x^z |f(t) - f(x)|\,dt \le \varepsilon.$$
For $z < x$, the proof is similar, and hence $\lim_{z\to x} \frac{G(z) - G(x)}{z - x} = f(x)$.
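The statement of Theorem 8.17 can be checked numerically. The sketch below approximates $G(x) = \int_a^x f(t)\,dt$ with a midpoint Riemann sum and compares difference quotients of $G$ with $f(x)$. The choice $f = \cos$, the helper name `integral` and the step counts are illustrative assumptions, not part of the text.

```python
import math

def integral(f, a, x, steps=20_000):
    """Midpoint Riemann sum approximating the integral of f over [a, x]."""
    h = (x - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

f = math.cos          # a continuous integrand, chosen only for illustration
a, x = 0.0, 1.0

def G(z):
    """Numerical stand-in for G(z), the integral of f over [a, z]."""
    return integral(f, a, z)

for dz in (1e-1, 1e-2, 1e-3):
    quotient = (G(x + dz) - G(x)) / dz
    print(dz, quotient, abs(quotient - f(x)))
```

As $dz$ shrinks, the printed error between the difference quotient and $f(x)$ shrinks as well, in line with $G'(x) = f(x)$.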
Functions as mentioned in Theorem 8.17 sometimes are also defined with an integral such that a > x . If we went through a partition from right to left rather than left to
right, then all the bases of the rectangles in a Riemann sum would have negative length.
Thus it makes sense to define the following.
Definition 8.18 Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then we define the integral with the reversed bounds to be
$$\int_b^a f(x)\,dx := -\int_a^b f(x)\,dx.$$
Corollary 8.19 Let $f : [a,b] \to \mathbb{R}$ be Riemann integrable and let $x_0 \in [a,b]$. Then the function $G(x) := \int_{x_0}^x f(t)\,dt$ is uniformly continuous on $[a,b]$ and if $f$ is continuous at $x \in (a,b)$, then $\frac{d}{dx}\left( \int_{x_0}^x f(t)\,dt \right) = f(x)$.
Proof. Exercise 8-18.
Exercises
8-16. Proving Theorem 8.14
(a) Prove part 2 of Theorem 8.14.
(b) Prove part 3 of Theorem 8.14.
Hint. Use Lemma 5.6 to prove the inequality.
(c) To see how valuable the Lebesgue criterion is, use Riemann's Condition (Theorem 5.25) to prove that $|f|$ is Riemann integrable over $[a,b]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler.
Hint. Use a partition $P$ with $U(f,P) - L(f,P) < \varepsilon$.
8-17. Proving Theorem 8.16
(a) Prove Theorem 8.16.
Hint. In each direction of the proof use the Lebesgue criterion to establish Riemann integrability. Use Lemma 5.6 for the equations, making sure that the point $m$ is an element of each partition $P_k$.
(b) To see how valuable the Lebesgue criterion is, use Riemann's Condition (Theorem 5.25) to prove that if $f$ is Riemann integrable over $[a,b]$, then $f$ is Riemann integrable over $[a,m]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler.
Hint. Use a partition $P$ with $U(f,P) - L(f,P) < \varepsilon$.
8-18. Prove Corollary 8.19.
8-19. Use Riemann's Condition (Theorem 5.25) to prove that if $f$ and $g$ are Riemann integrable over $[a,b]$, then $fg$ is Riemann integrable over $[a,b]$. Then compare this proof with the proof using the Lebesgue criterion and state which proof is simpler.
8-20. Determine which result from this section that was used in the proof of Theorem 8.17 was not available in Section 5.3. (This prevented us from placing Theorem 8.17 right after the Antiderivative Form of the Fundamental Theorem of Calculus.)
8-21. Mean Value Theorem for the Integral. Let the function $f : [a,b] \to \mathbb{R}$ be continuous and let the function $g : [a,b] \to [0,\infty)$ be nonnegative and Riemann integrable. Prove that there is a $c \in [a,b]$ so that the integral satisfies $\int_a^b f(x)g(x)\,dx = f(c) \int_a^b g(x)\,dx$.
8-22. Let $g : [a,b] \to [0,\infty)$ be Riemann integrable and let $f : [a,b] \to \mathbb{R}$ be nondecreasing. Prove that there is a $c \in [a,b]$ so that $\int_a^b f(x)g(x)\,dx = f(a) \int_a^c g(x)\,dx + f(b) \int_c^b g(x)\,dx$. You may use that nondecreasing functions are Riemann integrable (see Exercise 8-11).
8-23. Integration by Parts. Let $[a,b] \subset (c,d)$ and let $F, g : (c,d) \to \mathbb{R}$ be differentiable functions with derivatives $f$ and $g'$ that are Riemann integrable on $[a,b]$. Prove that the integral satisfies
$$\int_a^b f(x)g(x)\,dx = F(b)g(b) - F(a)g(a) - \int_a^b F(x)g'(x)\,dx.$$
8-24. Integration by Substitution. Let $[a,b] \subset (c,d)$, let $g : (c,d) \to \mathbb{R}$ be differentiable, let its derivative $g'$ be Riemann integrable on $[a,b]$, let $(u,v) \supseteq g[[a,b]]$ and let $F : (u,v) \to \mathbb{R}$ be differentiable with continuous derivative $f$. Prove that
$$\int_a^b f(g(x))g'(x)\,dx = F(g(b)) - F(g(a)).$$
8-25. Let $a > 0$. A function $f : [-a,a] \to \mathbb{R}$ is called even iff $f(x) = f(-x)$ for all $x \in [-a,a]$. A function $f : [-a,a] \to \mathbb{R}$ is called odd iff $f(x) = -f(-x)$ for all $x \in [-a,a]$.
(a) Prove that if $f$ is even and Riemann integrable, then $\int_{-a}^{a} f(x)\,dx = 2 \int_0^a f(x)\,dx$.
(b) Prove that if $f$ is odd and Riemann integrable, then $\int_{-a}^{a} f(x)\,dx = 0$.
(c) Prove that any function $f : [-a,a] \to \mathbb{R}$ is the sum of an even and an odd function.
Hint. $\frac{f(x) + f(-x)}{2}$ is even.
8-26. Compute the derivative. (Should the integrands be unknown, simply note that they are continuous.)
(a)
& 1’
et2 dt
8-27. Let $f : [a,b] \to \mathbb{R}$ be continuous and let $l, u : (c,d) \to [a,b]$ be differentiable. Prove that
$$\frac{d}{dx}\left( \int_{l(x)}^{u(x)} f(t)\,dt \right) = f(u(x))\,u'(x) - f(l(x))\,l'(x).$$
8-28. Construct a function $f : [0,1] \to \mathbb{R}$ so that $|f|$ is Riemann integrable and $f$ is not Riemann integrable.
8-29. A function $f : [a,b] \to \mathbb{R}$ is called absolutely continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all sequences $(a_1, b_1), \ldots, (a_n, b_n)$ of pairwise disjoint open intervals the inequality $\sum_{i=1}^{n} (b_i - a_i) < \delta$ implies $\sum_{i=1}^{n} |f(b_i) - f(a_i)| < \varepsilon$.
(a) Let $f : [a,b] \to \mathbb{R}$ be a Riemann integrable function. Prove that the function $G : [a,b] \to \mathbb{R}$ defined by $G(x) := \int_a^x f(t)\,dt$ is absolutely continuous on $[a,b]$.
(b) Prove that every absolutely continuous function is uniformly continuous.
(c) Prove that $f(x) = \frac{1}{x}$ is continuous, but not absolutely continuous on $(0,1]$.
8-30. Results for Riemann-Stieltjes integrals. Let $g : [a,b] \to \mathbb{R}$ be nondecreasing, and let the functions $f, h : [a,b] \to \mathbb{R}$ be bounded and Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$.
(a) Prove that $|f|$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$ and that the triangular inequality $\left| \int_a^b f\,dg \right| \le \int_a^b |f|\,dg$ holds.
(b) Prove that for all $m \in [a,b]$ the function $f$ is Riemann-Stieltjes integrable with respect to $g$ over $[a,m]$ and over $[m,b]$ and that $\int_a^b f\,dg = \int_a^m f\,dg + \int_m^b f\,dg$.
(c) Prove that the product $fh$ is Riemann-Stieltjes integrable on $[a,b]$ with respect to $g$.
Hint (for all). Use the Riemann Condition for Riemann-Stieltjes integrals (see Exercise 5-27).
8.4 Improper Riemann Integrals
The Riemann integral allows us to compute the “area” under bounded functions defined on closed and bounded intervals. Sometimes we are interested in the area under
functions that are defined on infinite intervals or that are unbounded. These areas can
be approximated with Riemann integrals.
Definition 8.20 Let $a \in \mathbb{R}$ and let $f : [a,\infty) \to \mathbb{R}$ be such that for all $c > a$ the restriction $f|_{[a,c]}$ is Riemann integrable. If the limit $\lim_{t\to\infty} \int_a^t f(x)\,dx$ exists, it is called the improper Riemann integral of $f$ over $[a,\infty)$ and it is denoted $\int_a^{\infty} f(x)\,dx$. Improper Riemann integrals for functions $f : (-\infty, b] \to \mathbb{R}$ are defined similarly (Exercise 8-31).
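Example 8.21 The p-integral test for integrals over infinite intervals. Let $p > 0$ be rational. Then $f(x) = \frac{1}{x^p}$ is improperly Riemann integrable over $[1,\infty)$ iff $p > 1$.
For $p \ne 1$ and $t > 1$ we have $\int_1^t \frac{1}{x^p}\,dx = \frac{t^{1-p} - 1}{1 - p}$. For $p > 1$ this expression converges to $\frac{1}{p-1}$ as $t \to \infty$, while for $p < 1$ it is unbounded.
Finally, we need to show that the improper integral does not exist for $p = 1$. To do this, note that $f(x) = \frac{1}{x}$ is greater than $\frac{1}{m+1}$ on each interval $[m, m+1)$. Therefore, for all $n \in \mathbb{N}$ we infer $f \ge \sum_{k=1}^{n} \frac{1}{k+1} \mathbf{1}_{[k,k+1)}$, and hence
$$\int_1^{n+1} \frac{1}{x}\,dx \ge \sum_{k=1}^{n} \frac{1}{k+1}.$$
The latter sum is unbounded, so $\int_1^{\infty} \frac{1}{x}\,dx$ does not exist.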
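The dichotomy in Example 8.21 can be made visible with a few numbers. The following sketch evaluates the partial integrals $\int_1^t x^{-p}\,dx$ through the closed form $\frac{t^{1-p}-1}{1-p}$ (which presupposes the power rule for rational exponents) and the lower bound $\sum_{k=1}^{n} \frac{1}{k+1}$ used for $p = 1$. The function names and sample exponents are our own choices.

```python
def partial_integral(p, t):
    """Integral of x**(-p) over [1, t] for p != 1, via the antiderivative x**(1-p)/(1-p)."""
    return (t ** (1 - p) - 1) / (1 - p)

def harmonic_lower_bound(n):
    """Sum of 1/(k+1) for k = 1..n, a lower bound for the integral of 1/x over [1, n+1]."""
    return sum(1 / (k + 1) for k in range(1, n + 1))

for t in (10, 10**3, 10**6):
    print(t, partial_integral(2, t), partial_integral(0.5, t), harmonic_lower_bound(t))
```

For $p = 2$ the values settle near $\frac{1}{p-1} = 1$, while for $p = \frac{1}{2}$ and for the lower bound in the case $p = 1$ they keep growing, the latter only very slowly.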
Note that because we have not yet defined logarithms, the last argument in Example
8.21 is unavoidable. Similarly, we had to restrict ourselves to rational powers, because
powers with arbitrary real exponents have not been defined yet. Of course, the p-integral test ultimately holds for real exponents p and we will not restate it once real
powers are defined.
Improper integrals can also be defined for (potentially) unbounded functions.
Definition 8.22 Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : [a,b) \to \mathbb{R}$ be such that for all $c \in (a,b)$ the restriction $f|_{[a,c]}$ is Riemann integrable. If the limit $\lim_{t\to b^-} \int_a^t f(x)\,dx$ exists, it is called the improper Riemann integral of $f$ over $[a,b)$ and it is denoted $\int_a^b f(x)\,dx$. Improper Riemann integrals for functions $f : (a,b] \to \mathbb{R}$ are defined similarly (Exercise 8-32).
For Riemann integrable functions on [ a ,b ] , the improper Riemann integral over
[ a ,b ) agrees with the Riemann integral over [ a ,b ] .
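Proposition 8.23 Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : [a,b] \to \mathbb{R}$ be Riemann integrable. Then the improper Riemann integral of $f$ over $[a,b)$ exists and it is equal to the Riemann integral of $f$ over $[a,b]$.
Proof. Let $B > 0$ be an upper bound of $|f|$ and let $\varepsilon > 0$. Then for $\delta := \frac{\varepsilon}{B}$ and all $t \in [a,b)$ with $|t - b| < \delta$ we obtain
$$\left| \int_a^b f(x)\,dx - \int_a^t f(x)\,dx \right| = \left| \int_t^b f(x)\,dx \right| \le B\,(b - t) < B\,\frac{\varepsilon}{B} = \varepsilon.$$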
Riemann integrable functions are not the only functions that are improperly Riemann integrable. Example 8.24 shows that there are unbounded functions for which
the improper Riemann integral exists.
Example 8.24 The p-integral test for improper integrals over $(0,1]$. Let $p > 0$ be rational. Then $f(x) = \frac{1}{x^p}$ is improperly Riemann integrable over $(0,1]$ iff $p < 1$.
Mimic the proof for Example 8.21. (Exercise 8-33.)
Note that the improper integrals $\int_0^1 \frac{1}{x^p}\,dx$ converge for $p < 1$, while the integrals $\int_1^{\infty} \frac{1}{x^p}\,dx$ converge for $p > 1$. Irrespective of this important difference, for both types of improper integrals similar laws hold and there also is a Comparison Test that is similar to the Comparison Test for series.
Theorem 8.25 Let $f, g : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be improperly Riemann integrable over $[a,b)$ and let $c \in \mathbb{R}$. Then
1. $f + g$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b (f+g)(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
2. $cf$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b cf(x)\,dx = c \int_a^b f(x)\,dx$.
Proof. Similar to the proof of Theorem 6.4. (Exercise 8-34.)
Theorem 8.26 Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Then $f$ is improperly Riemann integrable over $[a,b)$ if and only if for all $c \in [a,b)$ the function $f$ is improperly Riemann integrable over $[c,b)$ and in this case
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
Proof. Exercise 8-35.
Theorem 8.27 Comparison Test for improper integrals. Let $f, g : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be such that $0 \le f \le g$, $f$ is Riemann integrable over every closed interval in $[a,b)$ and $g$ is improperly Riemann integrable over $[a,b)$. Then $f$ is improperly Riemann integrable over $[a,b)$ and $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
Proof. The function $F(t) := \int_a^t f(x)\,dx$ is continuous and nondecreasing on $[a,b)$ and it is bounded by $\int_a^b g(x)\,dx$. The reader will show in Exercise 8-36 that $\lim_{t\to b^-} F(t) = \sup\{ F(t) : t \in [a,b) \} \le \int_a^b g(x)\,dx$ to complete the proof.
Theorem 8.28 Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over every closed interval in $[a,b)$. If $|f|$ is improperly Riemann integrable over $[a,b)$, then $f$ is improperly Riemann integrable over $[a,b)$ and the triangular inequality $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$ holds.
Proof. Exercise 8-37.
Figure 21: In the integral test, the improper integral of a nonincreasing function is related to the series that give the Riemann sums with left and right endpoint evaluations for the partition with step length 1. The integral test says that for an improperly integrable function the series obtained by right endpoint evaluation cannot be infinite (a), while for a function that is not improperly integrable the series obtained by left endpoint evaluation cannot be finite (b).
Finally, we should note that the occurrence of series in Example 8.21 is not an accident. The Integral Test connects the convergence of certain series to the convergence
of improper integrals over infinite intervals.
Theorem 8.29 Integral Test. Let $f : [1,\infty) \to [0,\infty)$ be a bounded nonincreasing nonnegative function. Then the series $\sum_{j=1}^{\infty} f(j)$ converges iff the improper integral $\int_1^{\infty} f(x)\,dx$ converges (also see Figure 21).
Proof. Throughout the proof let $g(x) = \sum_{j=1}^{\infty} f(j)\,\mathbf{1}_{[j,j+1)}(x)$. (For every $x \ge 1$, this sum has at most one nonzero term.)
For “$\Rightarrow$,” let $\sum_{j=1}^{\infty} f(j)$ be convergent. Then $g$ as defined above is improperly Riemann integrable over $[1,\infty)$ with $\int_1^{\infty} g(x)\,dx = \sum_{j=1}^{\infty} f(j) < \infty$. Because $0 \le f \le g$, the Comparison Test for improper integrals implies that $\int_1^{\infty} f(x)\,dx$ converges.
Conversely, for “$\Leftarrow$,” let $\int_1^{\infty} f(x)\,dx$ be convergent. Then the improper integral $\int_2^{\infty} f(x-1)\,dx = \int_1^{\infty} f(x)\,dx$ converges. Because $g(x) \le f(x-1)$ for all $x \ge 2$, by the Comparison Test for improper integrals the integral $\int_2^{\infty} g(x)\,dx$ converges, which means the series $\sum_{j=2}^{\infty} f(j)$ converges.
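The two comparisons in the proof of the Integral Test amount to the inequalities $\sum_{j=2}^{N} f(j) \le \int_1^N f(x)\,dx \le \sum_{j=1}^{N-1} f(j)$ for a nonincreasing $f$. A small sketch for the sample function $f(x) = \frac{1}{x^2}$, whose integral over $[1,N]$ is $1 - \frac{1}{N}$, illustrates them. The function names and the choice of $f$ are assumptions made for this illustration.

```python
def f(x):
    """Sample nonincreasing nonnegative function on [1, infinity)."""
    return 1.0 / x**2

def integral_test_bounds(N):
    """Return (right endpoint sum, exact integral over [1, N], left endpoint sum)."""
    right_sum = sum(f(j) for j in range(2, N + 1))   # heights taken at right endpoints
    exact = 1.0 - 1.0 / N                            # integral of 1/x**2 over [1, N]
    left_sum = sum(f(j) for j in range(1, N))        # heights taken at left endpoints
    return right_sum, exact, left_sum

for N in (10, 100, 1000):
    print(N, integral_test_bounds(N))
```

In every row the integral is squeezed between the two sums, so the series and the improper integral are bounded together or unbounded together.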
The connections and similarities between integrals over infinite intervals and series
need to be considered with caution. For example, Exercise 8-38 shows that there is no
Limit Test for improper integrals over infinite intervals.
Finally, improper integrals can also be defined for functions on the whole real line
and for functions with multiple singularities.
Definition 8.30 Let $a = a_0 < a_1 < a_2 < \cdots < a_{n-1} < a_n = b$ (where $a$ could be $-\infty$ and $b$ could be $\infty$) and let $f : (a,b) \setminus \{ a_1, \ldots, a_{n-1} \} \to \mathbb{R}$ be Riemann integrable over all closed subintervals of its domain. We define the improper Riemann integral of $f$ as follows. Let $r_1 < r_2 < \cdots < r_n$ be so that $a_{j-1} < r_j < a_j$ and define
$$\int_a^b f(x)\,dx := \sum_{j=1}^{n} \left[ \int_{a_{j-1}}^{r_j} f(x)\,dx + \int_{r_j}^{a_j} f(x)\,dx \right] \quad \text{if all the summands exist.}$$
Exercises
8-31. Let $b \in \mathbb{R}$ and let $f : (-\infty, b] \to \mathbb{R}$ be such that for all $a < b$ the restriction $f|_{[a,b]}$ is Riemann integrable. Define the improper Riemann integral $\int_{-\infty}^{b} f(x)\,dx$ of $f$ over $(-\infty, b]$.
8-32. Let $a, b \in \mathbb{R}$ with $a < b$ and let $f : (a,b] \to \mathbb{R}$ be such that for all $c \in (a,b)$ the restriction $f|_{[c,b]}$ is Riemann integrable. Define the improper Riemann integral $\int_a^b f(x)\,dx$ of $f$ over $(a,b]$.
8-33. Prove the claim in Example 8.24.
8-34. Prove Theorem 8.25.
8-35. Prove Theorem 8.26.
8-36. Finish the proof of Theorem 8.27 by proving that $\lim_{t\to b^-} F(t) = \sup\{ F(t) : t \in [a,b) \}$.
8-37. Prove Theorem 8.28. Hint. $0 \le f + |f| \le 2|f|$.
8-38. Construct a function $f : [1,\infty) \to [0,1]$ that is improperly Riemann integrable, but does not converge to zero as $x \to \infty$. Hint. Triangles of height 1 and area $\frac{1}{2^n}$.
8-39. Cauchy Criterion for improper Riemann integrability. Let $f : [a,b) \to \mathbb{R}$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Prove that $f$ is improperly Riemann integrable over $[a,b)$ iff for all $\varepsilon > 0$ there is an $M \in [a,b)$, so that for all $c, d \in (M,b)$ we have $\left| \int_c^d f(x)\,dx \right| < \varepsilon$.
8-40. Limit Comparison Test for improper integrals. Let $f, g : [a,b) \to [0,\infty)$ (where $b$ could be $\infty$) be Riemann integrable over all intervals $[a,c] \subseteq [a,b)$. Prove that if $\lim_{x\to b^-} \frac{f(x)}{g(x)} = K > 0$, then $\int_a^b f(x)\,dx$ converges iff $\int_a^b g(x)\,dx$ converges.
Hint. Close to $b$ we have $g(x)(K - \varepsilon) \le f(x) \le g(x)(K + \varepsilon)$.
A
8-4 1 Prove that the function f : ( a , b] + W is improperly Riemann integrable over ( a , b] iff the function
[
improperly Riemann integrable over a , m ) and that in this case
the integrals are equal.
Chapter 9
The Lebesgue Integral
The geometric idea behind integration is to approximate the area under the graph
of a function with areas that are easier to compute. In the Riemann integral, we partition the x-axis and erect a rectangle over each partition interval to approximate the
area under the graph. However, Lebesgue’s criterion for Riemann integrability shows
that geometric rectangles will not approximate the area well if the function “oscillates”
too much. That is, if the differences between the possible choices for the heights of
the rectangles do not shrink to zero, then we cannot uniquely identify the area. Equivalently, in the Darboux formulation (see Section 5.4) excessive oscillations force the
upper and lower approximations of the area with rectangles to stay a finite distance
apart.
If we change our point of view and partition the y-axis instead of the x-axis, the
problem with different choices for the height goes away (see Figure 22). For a set
S = { x E [ a , b] : yi-1 5 f ( x ) < yi }, all sensible values for the height of a generalized
rectangle with base S are between yi-1 and yi. Because the difference between these
values can be made small, oscillations are not an issue. However, this approach requires
that the bases of our generalized rectangles are no longer intervals. The area of such
a generalized rectangle will be the Lebesgue measure of the base set times the height
of the rectangle. In this fashion, we retain all benefits of the geometric motivation for
integration, while being able to integrate many more functions.¹
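The idea of partitioning the y-axis can be tried out numerically before any theory is developed. The sketch below approximates $\sum_k \frac{k-1}{2^n} \lambda\left(\left\{ x \in [0,1] : \frac{k-1}{2^n} \le f(x) < \frac{k}{2^n} \right\}\right)$ for a nonnegative function $f$ on $[0,1]$. It is only a rough illustration under stated assumptions: the measure of each base set is estimated by counting midpoints of a fine uniform grid, and the names `lebesgue_lower_sum`, `n` and `samples` are hypothetical.

```python
def lebesgue_lower_sum(f, n, samples=100_000):
    """Approximate the sum over k of (k-1)/2**n times the measure of
    {x in [0,1] : (k-1)/2**n <= f(x) < k/2**n} for a nonnegative f.

    Assumption of this sketch: the measure of each base set is estimated
    by the fraction of grid midpoints that fall into it."""
    levels = 2 ** n
    counts = {}
    for i in range(samples):
        x = (i + 0.5) / samples
        band = int(f(x) * levels)            # index k-1 of the band containing f(x)
        counts[band] = counts.get(band, 0) + 1
    return sum((band / levels) * (c / samples) for band, c in counts.items())

for n in (1, 2, 4, 8):
    print(n, lebesgue_lower_sum(lambda x: x * x, n))
```

For $f(x) = x^2$ the printed values increase toward $\frac{1}{3}$ as the partition of the y-axis is refined, because the height chosen on each base set differs from $f$ by less than $\frac{1}{2^n}$.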
This chapter introduces the fundamentals of Lebesgue integration. These fundamentals will be revisited in Chapter 14 when we generalize our work to arbitrary measure spaces. Our presentation is designed to readily translate to the more abstract setting of Chapter 14.
Before we start, it is time to extend our arithmetic from the real numbers to the real numbers with $\infty$ and $-\infty$ included. This is sometimes called the extended real number system. In the extended real number system, every set has a supremum. This will make our definitions of measures and of the Lebesgue integral simpler, because we will not need to explicitly distinguish between bounded and unbounded sets.
¹The Lebesgue integral also remedies a more abstract problem with the Riemann integral. Spaces of Riemann integrable functions are usually not complete (see Exercise 16-15d), while spaces of Lebesgue integrable functions are (see Theorem 16.19). Completeness is such a fundamental abstract property that this may be the main reason why the Lebesgue integral is preferred.
Figure 22: The idea behind the Lebesgue integral is to partition the y-axis instead of the x-axis and to approximate the area with simple functions. This figure shows that this can lead to "scattered" base sets on the x-axis, which we will treat first in this chapter. The proof of Theorem 9.19 uses the partition of the y-axis into intervals with dyadic rational endpoints.
Proposition 9.1 In the extended real number system $[-\infty, \infty] := \mathbb{R} \cup \{ \infty, -\infty \}$, $\infty$ is the greatest element and $-\infty$ is the smallest element. Consequently, every set has an infimum and a supremum.
Proof. Let $A \subseteq [-\infty, \infty]$. The supremum of $\{-\infty\}$ is $-\infty$, and the supremum of $\emptyset$ also is $-\infty$, so we can assume that $A \ne \emptyset, \{-\infty\}$. If $A$ is bounded above by a real number, then $A \cap \mathbb{R} \ne \emptyset$ has a supremum in the real numbers, which is also the supremum of $A$ in $[-\infty, \infty]$. If $A$ is not bounded above by a real number, then $\sup(A) = \infty$. Infima are treated similarly.
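Proposition 9.1 assures that from here on, we can take suprema and infima of sets of numbers fairly indiscriminately, as long as we know how to handle infinite values algebraically. Arithmetic on $[-\infty, \infty]$ is the same as on $\mathbb{R}$ with the additional conventions of Definition 9.2. These conventions are inspired by the corresponding limit laws in Theorems 2.44 and 2.46.
Definition 9.2 Arithmetic involving $\infty$. Let $c \in \mathbb{R}$. Then
$$c - \infty = -\infty, \qquad c + \infty = \infty,$$
$$c \cdot \infty = \begin{cases} \infty; & \text{if } c > 0, \\ -\infty; & \text{if } c < 0, \\ \text{undefined}; & \text{if } c = 0, \end{cases} \qquad c \cdot (-\infty) = \begin{cases} -\infty; & \text{if } c > 0, \\ \infty; & \text{if } c < 0, \\ \text{undefined}; & \text{if } c = 0, \end{cases}$$
$$\frac{c}{\infty} = 0, \qquad \frac{c}{-\infty} = 0,$$
$$\infty \cdot (-\infty) = -\infty, \qquad \infty \cdot \infty = \infty, \qquad (-\infty) \cdot (-\infty) = \infty.$$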
All other attempts at “arithmetic with infinity” lead to what is called indeterminate
forms. For these, the result can be any number, and hence rules of arithmetic cannot be
stated. We will consider indeterminate forms in Section 12.3.
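Floating point arithmetic follows very similar conventions, which makes them easy to experiment with. The sketch below uses Python's `math.inf`. Note one difference that is a property of IEEE 754 floating point and not of Definition 9.2: where the definition leaves a product undefined, floating point arithmetic returns the special value "not a number" instead.

```python
import math

inf = math.inf
c = 3.0   # any positive real number would do

print(c - inf, c + inf)          # -inf inf
print(c * inf, -c * inf)         # inf -inf
print(c / inf, c / -inf)         # 0.0 -0.0
print(inf * -inf, -inf * -inf)   # -inf inf
print(math.isnan(0.0 * inf))     # True: 0 times infinity has no assigned value
print(math.isnan(inf - inf))     # True: an indeterminate form
```

The signed zero `-0.0` in the third line compares equal to `0.0`, so it agrees with the convention $\frac{c}{-\infty} = 0$.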
9.1 Lebesgue Measurable Sets
We will work with generalized rectangles whose bases are no longer intervals. Therefore we need to measure the size of sets that are more complicated than rectangles.
Outer Lebesgue measure is a reliable upper bound for the (one-dimensional) “volume”
of a set. For all examples we have seen, it gives the right “volume.” Hence, we can
(and do) consider it the right way to measure the “outer volume” of a set.
Unfortunately, there are complications. It is possible to split a set T into two sets
so that the outer Lebesgue measures of the two sets add up to more than the outer
Lebesgue measure of T . If we were to involve such pathological sets in a definition of
integration, the integral would not even be a reliable measure of the area of generalized
rectangles. It would make no sense if the total “length” of the base would depend
on how we split the base. The definition of Lebesgue measurable sets is designed to
safeguard exactly against this problem (see Theorem 9.11). Because all our sets are
subsets of R (and later of a measure space M ) , we introduce abbreviated notation for
the complement.
Notation 9.3 When there is one underlying set $X$ that contains all sets that we currently investigate (as is the case in measure theory) and $S \subseteq X$, then we denote the complement of $S$ in $X$ also as $S' := X \setminus S$.
Definition 9.4 A subset $S \subseteq \mathbb{R}$ is called Lebesgue measurable iff for all $T \subseteq \mathbb{R}$ the equality $\lambda(T) = \lambda(S \cap T) + \lambda(S' \cap T)$ holds. We will also call the set $T$ a test set. We denote the set of Lebesgue measurable subsets of $\mathbb{R}$ by $\Sigma_\lambda$.
The existence of non-Lebesgue measurable sets, which would cause the above-mentioned problems, is equivalent to the Axiom of Choice. That is, whether or not non-Lebesgue measurable sets exist depends on what axiomatic system of set theory is used.
In practical terms it means that non-Lebesgue measurable sets do not occur in physical
phenomena. Hence, from an applied point-of-view we need not be too concerned with
nonmeasurable sets and we will not consider them any further in this text. Exercise
9-7 illustrates the problems mentioned above with a simpler measure of “size,” the
Jordan content, for which even simple sets can behave badly. The construction is more
complicated for Lebesgue measure and the interested reader can find such constructions
in [14] and [29]. For the remainder of this section, we explore the properties of the set
of Lebesgue measurable subsets of R.
Definition 9.5 Let $S \subseteq \mathbb{R}$ be a Lebesgue measurable set. Then the outer Lebesgue measure $\lambda(S)$ of $S$ is also called the Lebesgue measure of $S$.
We first note that “half” of the definition of measurability is always satisfied.
Corollary 9.6 For all subsets $S, T \subseteq \mathbb{R}$, we have $\lambda(T) \le \lambda(S \cap T) + \lambda(S' \cap T)$.
Proof. This follows from part 3 of Theorem 8.6 with $A_1 := S \cap T$, $A_2 := S' \cap T$ and $A_n := \emptyset$ for $n \ge 3$.
It is now easy to see that null sets are Lebesgue measurable.
Proposition 9.7 If $\lambda(S) = 0$, then $S$ is Lebesgue measurable.
Proof. Let $S$ be a null set. By part 2 of Theorem 8.6 for all sets $T \subseteq \mathbb{R}$ we obtain $0 \le \lambda(S \cap T) \le \lambda(S) = 0$. Hence, for all subsets $T \subseteq \mathbb{R}$ we have the inequality $\lambda(S \cap T) + \lambda(S' \cap T) \le \lambda(S' \cap T) \le \lambda(T)$, which by Corollary 9.6 is all we need to establish Lebesgue measurability of $S$.
In the following, we establish that certain set theoretical operations preserve measurability. Theorem 9.10 summarizes the most important facts. Although there are
more results, the properties listed in Theorem 9.10 suffice for our purposes. Set systems that satisfy these properties are called σ-algebras (see Definition 14.1). These set
systems are fundamental for measure theory.
Lemma 9.8 If $A$ and $B$ are Lebesgue measurable sets, then the intersection $A \cap B$ is Lebesgue measurable.
Proof. Proofs of Lebesgue measurability typically involve the appropriate rewriting of terms and the use of the right test sets. To show that $A \cap B$ is Lebesgue measurable, let $T \subseteq \mathbb{R}$ be any subset of $\mathbb{R}$. Then
$$\begin{aligned} \lambda((A \cap B) \cap T) + \lambda((A \cap B)' \cap T) &= \lambda(A \cap B \cap T) + \lambda((A' \cup B') \cap T) \\ &= \lambda(A \cap B \cap T) + \lambda(A \cap (A' \cup B') \cap T) + \lambda(A' \cap (A' \cup B') \cap T) \\ &= \lambda(B \cap A \cap T) + \lambda(B' \cap A \cap T) + \lambda(A' \cap T) \\ &= \lambda(A \cap T) + \lambda(A' \cap T) \\ &= \lambda(T). \end{aligned}$$
Because $T$ was an arbitrary subset of $\mathbb{R}$ we have proved that $A \cap B$ is Lebesgue measurable.
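The chain of equalities in the proof of Lemma 9.8 rests on three purely set-theoretic identities: De Morgan's law $(A \cap B)' = A' \cup B'$, the identity $A \cap (A' \cup B') \cap T = B' \cap A \cap T$ and the identity $A' \cap (A' \cup B') \cap T = A' \cap T$. The following sketch spot-checks them on random subsets of a small finite set. The finite set `X` is only a stand-in for the underlying set and all names are our own.

```python
import random

X = frozenset(range(20))   # finite stand-in for the underlying set

def comp(S):
    """Complement of S inside the fixed underlying set X."""
    return X - S

def identities_hold(A, B, T):
    """Check the three set identities used in the chain of equalities."""
    return (
        comp(A & B) == comp(A) | comp(B)                        # De Morgan
        and A & (comp(A) | comp(B)) & T == comp(B) & A & T      # the part inside A
        and comp(A) & (comp(A) | comp(B)) & T == comp(A) & T    # the part inside A'
    )

def random_subset():
    return frozenset(x for x in X if random.random() < 0.5)

random.seed(0)
print(all(identities_hold(random_subset(), random_subset(), random_subset())
          for _ in range(1000)))
```

Such a check proves nothing, of course, but it can catch a misplaced complement before the measurability of $A$ and $B$ is applied to the right test sets.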
The next lemma will be useful in two ways. It is a step toward proving that countable unions of Lebesgue measurable sets are Lebesgue measurable. It also is a step
toward proving that if a Lebesgue measurable set consists of countably many pairwise
disjoint Lebesgue measurable pieces, then the Lebesgue measure of the set is the sum
of the Lebesgue measures of the pieces.
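Lemma 9.9 Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint Lebesgue measurable sets. Then the union $\bigcup_{n=1}^{\infty} A_n$ is Lebesgue measurable and for all $T \subseteq \mathbb{R}$ we have
$$\lambda(T) = \sum_{n=1}^{\infty} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right).$$
Proof. Let $T \subseteq \mathbb{R}$. We first prove by induction that for all $k \in \mathbb{N}$ we have that $\lambda(T) = \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right)$. The base step with $k = 1$ follows from the Lebesgue measurability of $A_1$.
For the induction step $k \to (k+1)$, we can assume that the induction hypothesis $\lambda(T) = \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right)$ is true, and we need to prove that $\lambda(T) = \sum_{n=1}^{k+1} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k+1} A_n \right)' \cap T \right)$. We start the induction step with the induction hypothesis.
$$\begin{aligned} \lambda(T) &= \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right) \\ &= \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( A_{k+1} \cap \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right) + \lambda\left( A_{k+1}' \cap \left( \bigcup_{n=1}^{k} A_n \right)' \cap T \right) \\ &= \sum_{n=1}^{k+1} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{k+1} A_n \right)' \cap T \right), \end{aligned}$$
which finishes the induction step. (The last step uses that $A_{k+1}$ is disjoint from $A_1, \ldots, A_k$.)
Thus for all $k \in \mathbb{N}$ we have $\lambda(T) \ge \sum_{n=1}^{k} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right)$ and letting $k$ go to infinity we obtain the following.
$$\lambda(T) \ge \sum_{n=1}^{\infty} \lambda(A_n \cap T) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right) \ge \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right) \cap T \right) + \lambda\left( \left( \bigcup_{n=1}^{\infty} A_n \right)' \cap T \right) \ge \lambda(T).$$
The above establishes measurability of the union as well as the desired equality.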
Theorem 9.10 The set CA of Lebesgue measurable subsets of
properties.
1. 0
E
2. I f S
R has the following
C)..
E C h , then
S’
E
C,.
u
x
3. @ A n E CAf o r all n E W, then
An
E
Xi,.
n=l
Proof. Parts 1 and 2 are left to the reader as Exercises 9-1a and 9-1b. For part 3, let $A_n \in \Sigma_\lambda$ for all $n \in \mathbb{N}$ and let $T \subseteq \mathbb{R}$. Define $B_1 := A_1$ and then inductively for $n \in \mathbb{N}$ set $B_{n+1} := A_{n+1} \cap \left(\bigcup_{i=1}^{n} B_i\right)' = A_{n+1} \cap \bigcap_{i=1}^{n} B_i'$. An easy induction shows that for all $n \in \mathbb{N}$ the set $B_n$ is Lebesgue measurable (use part 2 and Lemma 9.8), that $B_1, \ldots, B_n$ are pairwise disjoint and that $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$ (see Exercise 9-1c). But then $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$ (see Exercise 9-1d) and the latter set is Lebesgue measurable by Lemma 9.9. ∎
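The disjointification used in part 3 can be tried out on small finite sets. The following Python sketch is only an illustration under that assumption (finite sets of integers stand in for Lebesgue measurable sets, and the name `disjointify` is made up for this example); it builds $B_1 := A_1$ and $B_{n+1} := A_{n+1} \cap (B_1 \cup \cdots \cup B_n)'$.

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_1 = A_1 and
    B_{n+1} = A_{n+1} minus (B_1 ∪ ... ∪ B_n)."""
    seen = set()            # running union B_1 ∪ ... ∪ B_n
    pieces = []
    for a in sets:
        b = set(a) - seen   # A_{n+1} ∩ (B_1 ∪ ... ∪ B_n)'
        pieces.append(b)
        seen |= b
    return pieces


# Hypothetical sample data: overlapping finite sets.
A = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {4, 5}]
B = disjointify(A)
print(B)  # [{1, 2, 3}, {4}, {5}, set()]
```

The pieces are pairwise disjoint and every partial union of the $B_i$ agrees with the corresponding partial union of the $A_i$, which is the content of Exercise 9-1c in this toy setting.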
Now that we have established that countable unions preserve Lebesgue measurability, it is reassuring to note that Lebesgue measure is additive for countable unions
of pairwise disjoint Lebesgue measurable sets. This is exactly what we expect from a
sensible measure. The size of a whole set is the sum of the sizes of its pairwise disjoint
parts.
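Theorem 9.11 Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of pairwise disjoint Lebesgue measurable sets. Then $\bigcup_{n=1}^{\infty} A_n$ is Lebesgue measurable and $\lambda\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \lambda(A_n)$.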
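Proof. The union is Lebesgue measurable by Lemma 9.9 and if we apply Lemma 9.9 to $T := \bigcup_{n=1}^{\infty} A_n$, we obtain
$$\lambda\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \lambda\left(A_n \cap \bigcup_{m=1}^{\infty} A_m\right) + \lambda\left(\left(\bigcup_{n=1}^{\infty} A_n\right)' \cap \bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \lambda(A_n) + \lambda(\emptyset) = \sum_{n=1}^{\infty} \lambda(A_n),$$
which proves the result. ∎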
The next result is a nice application of the countable additivity we just proved and
it will also be needed when we show that sums of Lebesgue integrable functions are
Lebesgue integrable.
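Theorem 9.12 Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of Lebesgue measurable subsets of $\mathbb{R}$ so that $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$. Then $\lambda\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \lambda(A_n)$.
Proof. Set $B_1 := A_1$ and for all $n \in \mathbb{N}$ define $B_{n+1} := A_{n+1} \setminus A_n$. Then for all $N \in \mathbb{N}$ the equality $\bigcup_{n=1}^{N} B_n = \bigcup_{n=1}^{N} A_n = A_N$ holds, which means $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n$. Because the sets $B_n$ are pairwise disjoint and Lebesgue measurable, Theorem 9.11 yields
$$\lambda\left(\bigcup_{n=1}^{\infty} A_n\right) = \lambda\left(\bigcup_{n=1}^{\infty} B_n\right) = \sum_{n=1}^{\infty} \lambda(B_n) = \lim_{N \to \infty} \sum_{n=1}^{N} \lambda(B_n) = \lim_{N \to \infty} \lambda\left(\bigcup_{n=1}^{N} B_n\right) = \lim_{N \to \infty} \lambda(A_N).$$
∎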
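As a small numerical illustration of Theorem 9.12 (a sketch under assumptions, not part of the proof), take the hypothetical nested intervals $A_n := [0, 1 - 1/n]$, whose union is $[0, 1)$. The measures of the differences $B_n$ telescope, so their partial sums equal $\lambda(A_N)$ and approach $\lambda([0,1)) = 1$. The function names are invented for this example and interval lengths are used as measures (Proposition 9.13).

```python
from fractions import Fraction


def measure_A(n):
    """lambda(A_n) for the sample sets A_n = [0, 1 - 1/n]."""
    return 1 - Fraction(1, n)


def measure_B(n):
    """lambda(B_n) with B_1 = A_1 and B_n = A_n minus A_{n-1} for n >= 2."""
    if n == 1:
        return measure_A(1)
    return measure_A(n) - measure_A(n - 1)


N = 1000
partial_sum = sum(measure_B(n) for n in range(1, N + 1))
print(partial_sum == measure_A(N))  # True: the sum telescopes to lambda(A_N)
print(1 - partial_sum)              # 1/1000, the gap to lambda([0, 1)) = 1
```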
The whole idea of Lebesgue measurability is only useful if it indeed allows us
to extend the idea of Riemann integrability. For that to happen, intervals must be
Lebesgue measurable. We have delayed this result to the end of the section, because
now we can use some of the machinery built so far.
Proposition 9.13 Intervals are Lebesgue measurable and the Lebesgue measure of an
interval is its length.
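Proof. Because singletons are null sets and thus Lebesgue measurable, and because countable unions of Lebesgue measurable sets are Lebesgue measurable, intervals are shown to be Lebesgue measurable if we can prove that open intervals of finite length are Lebesgue measurable. So let $A = (a, b)$ be an open interval of finite length and let $T \subseteq \mathbb{R}$. By Corollary 9.6, we only need to prove $\lambda(T) \ge \lambda(A \cap T) + \lambda(A' \cap T)$. The inequality is trivial if $\lambda(T) = \infty$, so we only need to prove the inequality if $\lambda(T) < \infty$. Let $T \subseteq \mathbb{R}$ be a set of finite outer Lebesgue measure, let $\varepsilon > 0$ and let $\{I_j\}_{j=1}^{\infty}$ be a family of open intervals with $T \subseteq \bigcup_{j=1}^{\infty} I_j$ and $\sum_{j=1}^{\infty} |I_j| \le \lambda(T) + \frac{\varepsilon}{2}$. Then $\mathcal{I} := \{ I_j \cap A : j \in \mathbb{N} \}$ is a countable family of open intervals whose union contains $A \cap T$. Moreover,
$$\mathcal{O} := \{ I_j \setminus [a, \infty) : j \in \mathbb{N} \} \cup \{ I_j \setminus (-\infty, b] : j \in \mathbb{N} \} \cup \left\{ \left(a - \frac{\varepsilon}{8}, a + \frac{\varepsilon}{8}\right), \left(b - \frac{\varepsilon}{8}, b + \frac{\varepsilon}{8}\right) \right\}$$
is a countable family of open intervals whose union contains $A' \cap T$. Thus
$$\begin{aligned}
\lambda(A \cap T) + \lambda(A' \cap T) &\le \sum_{j=1}^{\infty} |I_j \cap A| + \sum_{j=1}^{\infty} |I_j \setminus [a, \infty)| + \sum_{j=1}^{\infty} |I_j \setminus (-\infty, b]| + 2 \cdot \frac{\varepsilon}{4} \\
&\overset{(*)}{\le} \sum_{j=1}^{\infty} |I_j| + \frac{\varepsilon}{2} \le \lambda(T) + \varepsilon.
\end{aligned}$$
Because $\varepsilon$ was arbitrary this proves that $\lambda(T) \ge \lambda(A \cap T) + \lambda(A' \cap T)$ and we have proved that $A$ is Lebesgue measurable.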
Regarding the Lebesgue measure of intervals, by Proposition 8.5 for closed intervals in $\mathbb{R}$ we have $\lambda([a, b]) = b - a$. For closed, unbounded intervals of the form $[a, \infty)$, we infer for all $b > a$ that $\lambda([a, \infty)) \ge \lambda([a, b]) = b - a$, and hence $\lambda([a, \infty)) = \infty$. Intervals of the form $(-\infty, b]$ are handled similarly. For open and half-open intervals, note that the singleton sets consisting of the endpoints have measure zero. This means (see Exercise 9-2) that adding or removing these points does not affect the Lebesgue measurability of the set or its Lebesgue measure. ∎
Standard Proof Technique 9.14 Note that the inequality marked with (*) in the proof
of Proposition 9.13 can actually be shown to be an equality. This is not necessary
because we only need the inequality. In complicated estimates, it can happen that an
inequality sign is put between quantities that are actually equal. Usually, this happens
when the equality would not have helped in the proof (as in the example just mentioned)
and when the writer did not want (the reader) to spend extra effort to think about why
the quantities may be equal.
Exercises
9-1. Finish the proof of Theorem 9.10. That is,
(a) Prove that $\emptyset \in \Sigma_\lambda$.
(b) Prove that if $S \in \Sigma_\lambda$, then $S' \in \Sigma_\lambda$.
(c) Perform the induction mentioned in the proof of part 3.
(d) Prove that if $\{A_i\}_{i=1}^{\infty}$ and $\{B_i\}_{i=1}^{\infty}$ are countable families of sets so that for all $n \in \mathbb{N}$ we have $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$, then $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$.
9-2. Let $A$ be a Lebesgue measurable set and let $N$ be a null set. Prove that $A \setminus N$ is Lebesgue measurable and that $\lambda(A \setminus N) = \lambda(A)$.
Hint. Use Theorem 9.10.
9-3. Let $A, B \subseteq \mathbb{R}$ be Lebesgue measurable sets. Prove that $A \setminus B$ is Lebesgue measurable.
9-4. Let $\{A_n\}_{n=1}^{N}$ be a finite sequence of pairwise disjoint Lebesgue measurable sets. Prove that the union $\bigcup_{n=1}^{N} A_n$ is Lebesgue measurable with $\lambda\left(\bigcup_{n=1}^{N} A_n\right) = \sum_{n=1}^{N} \lambda(A_n)$.
9-5. Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of Lebesgue measurable sets. Prove that $\bigcap_{n=1}^{\infty} A_n$ is Lebesgue measurable and that for all $k \in \mathbb{N}$ we have $\lambda\left(\bigcap_{n=1}^{\infty} A_n\right) \le \lambda(A_k)$.
9-6. Let $C^Q$ be a Cantor set. Prove that $C^Q$ is Lebesgue measurable.
Hint. Use Exercise 9-5.
9-7. For all $S \subseteq [0, 1]$, let $J(S) := \inf\left\{ \sum_{j=1}^{n} \ldots \right\}$ … Jordan content of $S$. Prove that $J([0, 1] \cap \mathbb{Q}) = 1$ and $J([0, 1] \setminus \mathbb{Q}) = 1$.
9-8. Let $A \subseteq B \subseteq \mathbb{R}$ and let $A$ (but not necessarily $B$) be Lebesgue measurable. Prove that $\lambda(B) = \lambda(A) + \lambda(B \setminus A)$.
9.2 Lebesgue Measurable Functions
We now introduce the functions for which the Lebesgue integral can be defined. By first
defining what (potentially) integrable functions should look like, we avoid the Riemann
integral’s conceptual complications that are characterized in Lebesgue’s criterion (see
Theorem 8.12). Existence or nonexistence of the Lebesgue integral, defined in Section
9.3, will then merely be a question of whether there is too much area under the graph of
the function. We will revisit the original motivation of partitioning the y-axis after the
proof of Theorem 9.19. To approximate areas, in Lebesgue integration indicator functions take the place of rectangles. Recall that by Definition 5.9 the indicator function of a set $S \subseteq \mathbb{R}$ is
$$1_S(x) := \begin{cases} 1; & \text{for } x \in S, \\ 0; & \text{for } x \notin S. \end{cases}$$
Just as Riemann integrable functions can be approximated a.e. with step functions
(see Exercise 5-26), Lebesgue integrable functions will be approximated with functions
that are constant on measurable sets and which only assume finitely many values.
Definition 9.15 A function $s : \mathbb{R} \to \mathbb{R}$ is called a simple Lebesgue measurable function, or, a simple function, iff there are numbers $a_1, \ldots, a_n \in \mathbb{R}$ and pairwise disjoint Lebesgue measurable sets $A_1, \ldots, A_n \subseteq \mathbb{R}$ so that $s = \sum_{k=1}^{n} a_k 1_{A_k}$.
For functions f that assume more than finitely many values, we consider the positive and negative parts of f separately.
Definition 9.16 For $f : \mathbb{R} \to [-\infty, \infty]$, we define $f^+(x) := \max\{ f(x), 0 \}$ and $f^-(x) := -\min\{ f(x), 0 \}$ for all $x \in \mathbb{R}$.
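The two parts from Definition 9.16 can be checked pointwise. The Python sketch below is only an illustration with floating-point sample values; the helper names `pos_part` and `neg_part` are assumptions of this example. It verifies the identities $f = f^+ - f^-$ and $|f| = f^+ + f^-$, which are used later for integrability.

```python
import math


def pos_part(f):
    """Positive part f^+(x) = max{f(x), 0}."""
    return lambda x: max(f(x), 0.0)


def neg_part(f):
    """Negative part f^-(x) = -min{f(x), 0}, which is nonnegative."""
    return lambda x: -min(f(x), 0.0)


f = math.sin                      # sample function taking both signs
f_plus, f_minus = pos_part(f), neg_part(f)

samples = [k / 4 for k in range(-20, 21)]
for x in samples:
    assert f_plus(x) >= 0 and f_minus(x) >= 0
    assert f_plus(x) - f_minus(x) == f(x)        # f = f^+ - f^-
    assert f_plus(x) + f_minus(x) == abs(f(x))   # |f| = f^+ + f^-
```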
Because we will successively approximate measurable functions from below we
want to speak of sequences of functions.
Definition 9.17 A family $\{f_n\}_{n \in \mathbb{N}}$ of functions will also be called a sequence of functions, denoted $\{f_n\}_{n=1}^{\infty}$.
Definition 9.18 A function $f : \mathbb{R} \to [0, \infty]$ is called Lebesgue measurable iff there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions $s_n : \mathbb{R} \to [0, \infty)$ such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing and $\lim_{n \to \infty} s_n(x) = f(x)$.
A function $f : \mathbb{R} \to [-\infty, \infty]$ is called Lebesgue measurable iff $f^+$ and $f^-$ are both Lebesgue measurable.
The key problem in Riemann integration is that for some functions the approximations from above and below will not “meet.” Definition 9.18 does not simply circumvent this problem by only focusing on approximations from below. Exercise 9-9
shows that a bounded function (only bounded functions are considered in Riemann
integration) is Lebesgue measurable iff it can be approximated from above with simple functions. That is, Definition 9.18 may look biased, but for bounded functions the
concept of Lebesgue measurability could also be defined with approximations from
above. Because we are also interested in unbounded functions, we choose to work
with approximations from below throughout.
Because Lebesgue measurability is a key concept, it is useful to have several equivalent formulations available.
Theorem 9.19 Let $f : \mathbb{R} \to [-\infty, \infty]$ be a function. Then the following are equivalent.
1. The function $f$ is Lebesgue measurable.
2. For all $a \in \mathbb{R}$, the set $\{x \in \mathbb{R} : f(x) > a\}$ is Lebesgue measurable.
3. For all $a \in \mathbb{R}$, the set $\{x \in \mathbb{R} : f(x) \le a\}$ is Lebesgue measurable.
4. For all $a \in \mathbb{R}$, the set $\{x \in \mathbb{R} : f(x) < a\}$ is Lebesgue measurable.
5. For all $a \in \mathbb{R}$, the set $\{x \in \mathbb{R} : f(x) \ge a\}$ is Lebesgue measurable.
Proof. We first prove the result for a function $f : \mathbb{R} \to [0, \infty]$.
For the implication "1⇒2," let $a \in \mathbb{R}$ and let $\{s_n\}_{n=1}^{\infty}$ be a sequence of simple functions such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing and converges to $f(x)$. For all $x \in \mathbb{R}$, if $f(x) > a$, then for some $n \in \mathbb{N}$ the inequality $s_n(x) > a$ holds. Conversely, if for some $n \in \mathbb{N}$ we have that $s_n(x) > a$, then because $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing we must have $f(x) > a$. This means that
$$\{x \in \mathbb{R} : f(x) > a\} = \bigcup_{n=1}^{\infty} \{x \in \mathbb{R} : s_n(x) > a\}.$$
But each set $\{x \in \mathbb{R} : s_n(x) > a\}$ is a union of finitely many Lebesgue measurable sets, which means it is Lebesgue measurable. Therefore, as a countable union of Lebesgue measurable sets, the set $\{x : f(x) > a\}$ is Lebesgue measurable.
For "2⇒3," let $a \in \mathbb{R}$. Then $\{x \in \mathbb{R} : f(x) \le a\} = \mathbb{R} \setminus \{x \in \mathbb{R} : f(x) > a\}$, which is the complement of a Lebesgue measurable set.
For "3⇒4," note that for all real numbers $a$ the set $\{x \in \mathbb{R} : f(x) < a\}$ is equal to the union
$$\{x \in \mathbb{R} : f(x) < a\} = \bigcup_{k=1}^{\infty} \left\{x \in \mathbb{R} : f(x) \le a - \frac{1}{k}\right\}$$
and the latter set is a countable union of Lebesgue measurable sets.
For "4⇒5," let $a \in \mathbb{R}$. Then $\{x \in \mathbb{R} : f(x) \ge a\} = \mathbb{R} \setminus \{x \in \mathbb{R} : f(x) < a\}$, which is the complement of a Lebesgue measurable set.
For "5⇒1," by Exercise 9-10 the set $\{x \in \mathbb{R} : f(x) = \infty\}$ is Lebesgue measurable. For each $n \in \mathbb{N}$, define the simple function
$$s_n := \sum_{k=1}^{n 2^n} \frac{k-1}{2^n} 1_{\left\{x \in \mathbb{R} : \frac{k-1}{2^n} \le f(x) < \frac{k}{2^n}\right\}} + n 1_{\{x \in \mathbb{R} : f(x) = \infty\}}.$$
Then each $s_n$ is nonnegative, because for all $k, n \in \mathbb{N}$ we have $\frac{k-1}{2^n} \ge 0$.
To see that each $\{s_n(x)\}_{n=1}^{\infty}$ is nondecreasing, let $x \in \mathbb{R}$ and let $n \in \mathbb{N}$. The claim is trivial if $f(x) = \infty$, so we can assume $f(x) < \infty$. If $f(x) \ge n$, then $s_n(x) = 0 \le s_{n+1}(x)$. If $f(x) < n$, find $k \in \{1, \ldots, n 2^n\}$ so that $s_n(x) = \frac{k-1}{2^n}$. Then $\frac{k-1}{2^n} \le f(x) < \frac{k}{2^n}$, and hence $s_{n+1}(x) \ge \frac{2(k-1)}{2^{n+1}} = \frac{k-1}{2^n} = s_n(x)$. Finally, each $\{s_n(x)\}_{n=1}^{\infty}$ converges to $f(x)$ because, if $f(x) \in [N-1, N)$, then for all $n \ge N$ the inequality $|f(x) - s_n(x)| < \frac{1}{2^n}$ holds, and if $f(x) = \infty$, then $s_n(x) = n$.
Finally, consider $f : \mathbb{R} \to [-\infty, \infty]$. For "1⇒2" let $f$ be Lebesgue measurable. Then $f^-$ and $f^+$ are Lebesgue measurable. But then for $a \ge 0$ we have that $\{x \in \mathbb{R} : f(x) > a\} = \{x \in \mathbb{R} : f^+(x) > a\}$, which is Lebesgue measurable, and for $a < 0$ we have $\{x \in \mathbb{R} : f(x) > a\} = \{x \in \mathbb{R} : f^-(x) < -a\}$, which is also Lebesgue measurable. Parts "2⇒3," "3⇒4," and "4⇒5" are similar to what was done for nonnegative functions. To prove part "5⇒1," first note that for all $a > 0$ we have $\{x \in \mathbb{R} : f^+(x) \ge a\} = \{x \in \mathbb{R} : f(x) \ge a\}$, which is Lebesgue measurable and for all $a \le 0$ we have $\{x \in \mathbb{R} : f^+(x) \ge a\} = \mathbb{R}$, which is also Lebesgue measurable. Hence, $f^+$ is Lebesgue measurable. Considering the negative part $f^-$, for all $a \ge 0$ we have that $\{x \in \mathbb{R} : f^-(x) \le a\} = \{x \in \mathbb{R} : f(x) \ge -a\}$, which is Lebesgue measurable and for $a < 0$ we have $\{x \in \mathbb{R} : f^-(x) \le a\} = \emptyset$, which is also Lebesgue measurable. Hence, $f^-$ is Lebesgue measurable and because we already proved that $f^+$ is Lebesgue measurable we have proved that $f$ is Lebesgue measurable. ∎
The underlying idea of the proof of part "5⇒1" for nonnegative functions $f$ is to partition the interval $[0, n)$ on the y-axis into intervals of length $\frac{1}{2^n}$. The proof shows that the area under the functions that are used to approximate $f$ should approximate the area under $f$, which means the idea of partitioning the y-axis can lead to a sensible notion of integration.
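The dyadic construction can be evaluated pointwise. The following Python sketch is an illustration only: it assumes $f$ is given as an ordinary Python function with values in $[0, \infty]$, and the name `dyadic_approx` is invented here. It returns $\frac{k-1}{2^n}$ when $\frac{k-1}{2^n} \le f(x) < \frac{k}{2^n}$ with $k \le n 2^n$, the value $n$ when $f(x) = \infty$, and $0$ when $n \le f(x) < \infty$, matching the cases discussed in the proof.

```python
import math


def dyadic_approx(f, n):
    """Pointwise evaluation of the n-th approximating simple function s_n."""
    def s_n(x):
        y = f(x)
        if y == math.inf:
            return float(n)      # s_n = n on {f = infinity}
        if y >= n:
            return 0.0           # none of the n * 2^n level sets contains x
        return math.floor(y * 2**n) / 2**n   # (k-1)/2^n with (k-1)/2^n <= y < k/2^n
    return s_n


f = lambda x: x * x              # sample nonnegative function
x0 = 1.3                         # f(x0) is approximately 1.69
values = [dyadic_approx(f, n)(x0) for n in range(1, 21)]
print(values[:4])                # [0.0, 1.5, 1.625, 1.6875]
```

At this sample point the values are nondecreasing in $n$ and, once $n$ exceeds $f(x_0)$, lie within $2^{-n}$ below $f(x_0)$.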
Once measurable functions are characterized, it is helpful to determine how measurability relates to common algebraic operations.
Theorem 9.20 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions. If $f + g$ is defined everywhere, then $f + g$ is Lebesgue measurable. Similarly, if $f - g$ or $f \cdot g$ is defined everywhere, then it is Lebesgue measurable. Moreover, $f^+$, $f^-$ and $|f|$ are Lebesgue measurable.
Proof. To see that $f + g$ is Lebesgue measurable, let $a \in \mathbb{R}$. We will use Theorem 1.36 to show that the set $\{x \in \mathbb{R} : (f + g)(x) < a\}$ is a countable union of Lebesgue measurable sets. If $(f + g)(x) < a$, then there is an $\varepsilon > 0$ so that $(f + g)(x) + 2\varepsilon < a$. By Theorem 1.36, there are rational numbers $r$ and $s$ so that $f(x) < r < f(x) + \varepsilon$ and $g(x) < s < g(x) + \varepsilon$. This means that $f(x) + g(x) < r + s < (f + g)(x) + 2\varepsilon < a$, which proves the containment "⊆" in the equation below. The containment "⊇" is trivial.
$$\{x \in \mathbb{R} : (f + g)(x) < a\} = \bigcup_{\substack{r, s \in \mathbb{Q} \\ r + s < a}} \left( \{x \in \mathbb{R} : f(x) < r\} \cap \{x \in \mathbb{R} : g(x) < s\} \right)$$
Because the latter set is a countable union of Lebesgue measurable sets, the set $\{x \in \mathbb{R} : (f + g)(x) < a\}$ is Lebesgue measurable. Because $a \in \mathbb{R}$ was arbitrary this means that $f + g$ is a Lebesgue measurable function.
The proofs that $f - g$ and $f \cdot g$ are Lebesgue measurable functions are similar (see Exercise 9-11). The functions $f^+$ and $f^-$ are Lebesgue measurable by Definition 9.18 and the Lebesgue measurability of $|f| = f^+ + f^-$ follows from the Lebesgue measurability of sums of Lebesgue measurable functions. ∎
Exercises
9-9. Prove that a bounded function $f : \mathbb{R} \to [0, \infty)$ is Lebesgue measurable iff there is a sequence $\{s_n\}_{n=1}^{\infty}$ of simple functions $s_n : \mathbb{R} \to \mathbb{R}$ such that for all $x \in \mathbb{R}$ the sequence $\{s_n(x)\}_{n=1}^{\infty}$ is nonincreasing and $\lim_{n \to \infty} s_n(x) = f(x)$.
Hint. Mimic part "5⇒1" of the proof of Theorem 9.19 for nonnegative functions.
9-10. Let $f : \mathbb{R} \to [-\infty, \infty]$ be a Lebesgue measurable function. Use part 4 of Theorem 9.19 to prove that $\{x \in \mathbb{R} : f(x) = \infty\}$ is Lebesgue measurable.
9-11. Finish the proof of Theorem 9.20. That is,
(a) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f - g$ is defined everywhere, then $f - g$ is Lebesgue measurable.
(b) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f \cdot g$ is defined everywhere, then $f \cdot g$ is Lebesgue measurable.
Hint. This one is complicated because of negative signs. Prove the result first for $f, g \ge 0$, then use $f = f^+ - f^-$ and $g = g^+ - g^-$.
9-12. Prove that the sum of two simple functions is again a simple function.
9-13. Prove that $f : \mathbb{R} \to [-\infty, \infty]$ is Lebesgue measurable iff for any two numbers $a < b$ in $\mathbb{R}$ the set $\{x \in \mathbb{R} : f(x) \in [a, b)\}$ is Lebesgue measurable.
9-14. Use Definition 9.18 to prove that if $f, g : \mathbb{R} \to [0, \infty]$ are Lebesgue measurable functions, then $f + g$ is Lebesgue measurable.
9-15. Let $f, h : \mathbb{R} \to [-\infty, \infty]$ and let $f$ be Lebesgue measurable. Prove that if $f = h$ a.e., then $h$ is Lebesgue measurable.
9-16. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f(x) + g(x)$ is defined almost everywhere, then
$$(f + g)(x) := \begin{cases} f(x) + g(x); & \text{if } f(x) + g(x) \text{ is defined}, \\ 0; & \text{otherwise}, \end{cases}$$
is Lebesgue measurable.
Hint. Apply Exercise 9-15 to the right auxiliary functions and then use Theorem 9.20.
(b) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f(x) - g(x)$ is defined almost everywhere, then
$$(f - g)(x) := \begin{cases} f(x) - g(x); & \text{if } f(x) - g(x) \text{ is defined}, \\ 0; & \text{otherwise}, \end{cases}$$
is Lebesgue measurable.
(c) Prove that if $f, g : \mathbb{R} \to [-\infty, \infty]$ are Lebesgue measurable and $f(x) g(x)$ is defined almost everywhere, then
$$(f g)(x) := \begin{cases} f(x) g(x); & \text{if } f(x) g(x) \text{ is defined}, \\ 0; & \text{otherwise}, \end{cases}$$
is Lebesgue measurable.
9-17. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that $\{x \in \mathbb{R} : f(x) = g(x)\}$ is Lebesgue measurable.
(b) Prove that $\{x \in \mathbb{R} : f(x) \le g(x)\}$ is Lebesgue measurable.
(c) Prove that $\{x \in \mathbb{R} : f(x) < g(x)\}$ is Lebesgue measurable.
9-18. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
(a) Prove that $\max(f, g)$ (defined pointwise) is Lebesgue measurable.
(b) Prove that $\min(f, g)$ (defined pointwise) is Lebesgue measurable.
9-19. Let $f : \mathbb{R} \to \mathbb{R}$ be a nondecreasing function. Prove that $f$ is Lebesgue measurable.
9.3 Lebesgue Integration
Independent of whether the base is an interval or a potentially more scattered Lebesgue
measurable set, the area of a “rectangle” should be the measure of the base times the
height. This idea is behind the Lebesgue integral of simple functions.
Definition 9.21 Let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be pairwise disjoint Lebesgue measurable sets, let $a_1, \ldots, a_n \in [0, \infty)$ be nonnegative numbers and let $s = \sum_{k=1}^{n} a_k 1_{A_k}$ be a simple function. We define the Lebesgue integral of $s$ by
$$\int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} \sum_{k=1}^{n} a_k 1_{A_k} \, d\lambda := \sum_{k=1}^{n} a_k \lambda(A_k).$$
By Exercise 9-20a for any given simple function $s$ the value $\sum_{k=1}^{n} a_k \lambda(A_k)$ does not depend on the representation $s = \sum_{k=1}^{n} a_k 1_{A_k}$ that was chosen for $s$. Hence, Definition 9.21 is sensible and we can proceed to more general functions. For a more general function, the Lebesgue integral is defined by approximating the area under the function from below with the area under simple functions.
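For simple functions whose sets $A_k$ are finite unions of intervals, the sum in Definition 9.21 can be computed directly. The Python sketch below makes that restricting assumption (general Lebesgue measurable sets are not representable this way), and its function names are invented for the example. It evaluates $\sum_k a_k \lambda(A_k)$ for two representations of the same simple function and shows that the values agree, as Exercise 9-20a asserts in general.

```python
from fractions import Fraction


def length(intervals):
    """Measure of a finite union of pairwise disjoint intervals given as (l, r) pairs."""
    return sum(Fraction(r) - Fraction(l) for l, r in intervals)


def integral_simple(terms):
    """Sum of a_k * lambda(A_k) for terms (a_k, A_k) with a_k >= 0
    and pairwise disjoint sets A_k."""
    return sum(Fraction(a) * length(A) for a, A in terms)


# The same function, equal to 2 on [0, 3) and 5 on [3, 4), written two ways.
rep1 = [(2, [(0, 1)]), (2, [(1, 3)]), (5, [(3, 4)])]
rep2 = [(2, [(0, 3)]), (5, [(3, 4)])]

print(integral_simple(rep1), integral_simple(rep2))  # 11 11
```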
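Definition 9.22 Let $f : \mathbb{R} \to [0, \infty]$ be a Lebesgue measurable function. We define the Lebesgue integral of $f$ to be
$$\int_{\mathbb{R}} f \, d\lambda := \sup\left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\}$$
and we will call $f$ Lebesgue integrable iff the supremum is finite.
A function $g : \mathbb{R} \to \mathbb{R}$ will be called Lebesgue integrable iff $g^+$ and $g^-$ are both Lebesgue integrable. We set $\int_{\mathbb{R}} g \, d\lambda := \int_{\mathbb{R}} g^+ \, d\lambda - \int_{\mathbb{R}} g^- \, d\lambda$ and call it the Lebesgue integral of $g$.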
Continuing our comparison with the Riemann integral, Exercise 9-21 guarantees that for bounded functions that differ from zero only on a set of finite Lebesgue measure (the Riemann integral is defined for bounded functions on bounded intervals) there also is an approximation from above that will give the value from Definition 9.22. That is, unlike the Riemann/Darboux integral (see Definition 5.27 and Theorem 5.29), for bounded functions that differ from zero on closed and bounded intervals the Lebesgue integral does not have any problems with an upper and a lower approximation not being equal.
As noted after the definition of Lebesgue measurable sets (see Definition 9.4), nonmeasurable sets are quite hard to come by. Similarly, although we will always need to
prove measurability for functions that we want to integrate, nonmeasurable functions
are not expected to arise in practical applications. This means that the only possible
problem in the definition of the Lebesgue integral is the potential for infinite area under
the graph. This is not a problem, because functions whose graphs enclose an infinite
area cannot have a finite integral, independent of what notion of integration is used.
Now that we have a sensible notion of integration that (as it ultimately turns out)
does not have the weaknesses of the Riemann integral, we can establish some theorems
about Lebesgue integrals.
Theorem 9.23 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions.
1. If $0 \le f \le g$ a.e. and $g$ is Lebesgue integrable, then $f$ is also Lebesgue integrable and $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} g \, d\lambda$.
2. $f$ is Lebesgue integrable iff $|f|$ is Lebesgue integrable and in this case the triangular inequality $\left| \int_{\mathbb{R}} f \, d\lambda \right| \le \int_{\mathbb{R}} |f| \, d\lambda$ holds.
3. If $f \ge 0$, then $\int_{\mathbb{R}} f \, d\lambda = 0$ iff $f = 0$ a.e.
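Proof. For part 1, let $N := \{x \in \mathbb{R} : f(x) > g(x)\}$. By hypothesis, $N$ is a null set. Let $s : \mathbb{R} \to [0, \infty)$ be a simple function with $0 \le s \le f$. Then $s 1_{\mathbb{R} \setminus N} \le g$ is a simple function also and $\int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} s 1_{\mathbb{R} \setminus N} \, d\lambda$. Hence,
$$\begin{aligned}
\sup\left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\}
&= \sup\left\{ \int_{\mathbb{R}} s 1_{\mathbb{R} \setminus N} \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\} \\
&\le \sup\left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le g \right\}.
\end{aligned}$$
Because $g$ is Lebesgue integrable, the latter supremum is finite, and hence $f$ is Lebesgue integrable. Because the suprema are equal to the respective Lebesgue integrals, we conclude that $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} g \, d\lambda$.
For part 2, first note that by Theorem 9.20 the function $|f|$ is Lebesgue measurable. The direction "⇐" of the claim follows straight from part 1, because if $|f|$ is Lebesgue integrable, then $0 \le f^+ \le |f|$ and $0 \le f^- \le |f|$ imply that $f^+$ and $f^-$ are both Lebesgue integrable, which means by Definition 9.22 that $f$ is Lebesgue integrable.
For the direction "⇒," let $f : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable. Then $f^+$ and $f^-$ are Lebesgue integrable. To prove that $|f| = f^+ + f^-$ is Lebesgue integrable let $s = \sum_{k=1}^{n} a_k 1_{A_k}$ be a simple function with $0 \le s \le |f|$. Let $P := \{x \in \mathbb{R} : f(x) \ge 0\}$ and $N := \{x \in \mathbb{R} : f(x) < 0\}$. Then $P \cup N = \mathbb{R}$ and $P$ and $N$ are disjoint. Therefore $s^+ := \sum_{k=1}^{n} a_k 1_{A_k \cap P}$ and $s^- := \sum_{k=1}^{n} a_k 1_{A_k \cap N}$ are simple functions with $s = s^+ + s^-$, $0 \le s^+ \le f^+$ and $0 \le s^- \le f^-$. But this means that
$$\int_{\mathbb{R}} s \, d\lambda = \int_{\mathbb{R}} s^+ \, d\lambda + \int_{\mathbb{R}} s^- \, d\lambda \le \int_{\mathbb{R}} f^+ \, d\lambda + \int_{\mathbb{R}} f^- \, d\lambda < \infty$$
and thus $|f|$ is Lebesgue integrable.
Finally, because $f^-, f^+ \le |f|$ we conclude by part 1 that
$$\left| \int_{\mathbb{R}} f \, d\lambda \right| = \left| \int_{\mathbb{R}} f^+ \, d\lambda - \int_{\mathbb{R}} f^- \, d\lambda \right| \le \max\left\{ \int_{\mathbb{R}} f^+ \, d\lambda, \int_{\mathbb{R}} f^- \, d\lambda \right\} \le \int_{\mathbb{R}} |f| \, d\lambda.$$
For part 3, let $f \ge 0$. First, consider the direction "⇐." Because $f = 0$ a.e., we obtain $0 \le f \le 0$ a.e. and by part 1 we conclude that $\int_{\mathbb{R}} f \, d\lambda \le \int_{\mathbb{R}} 0 \, d\lambda = 0$, which means $\int_{\mathbb{R}} f \, d\lambda = 0$.
Conversely, for the direction "⇒" let $\int_{\mathbb{R}} f \, d\lambda = 0$ and suppose for a contradiction that $\lambda(\{x \in \mathbb{R} : f(x) > 0\}) > 0$. Then, because the countable union of null sets is again a null set and $\{x \in \mathbb{R} : f(x) > 0\} = \bigcup_{n=1}^{\infty} \left\{x \in \mathbb{R} : f(x) > \frac{1}{n}\right\}$, there must be an $n \in \mathbb{N}$ so that $\lambda\left(\left\{x \in \mathbb{R} : f(x) > \frac{1}{n}\right\}\right) > 0$. With $A := \left\{x \in \mathbb{R} : f(x) > \frac{1}{n}\right\}$ and $s := \frac{1}{n} 1_A$ we obtain
$$0 < \frac{1}{n} \lambda(A) = \int_{\mathbb{R}} s \, d\lambda \le \int_{\mathbb{R}} f \, d\lambda = 0$$
by part 1, which is a contradiction. Therefore we conclude that $f = 0$ a.e. ∎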
Exercise 9-15 shows that if a function is equal a.e. to a Lebesgue measurable function, then it must be Lebesgue measurable, too. Part 1 of Theorem 9.23 can be used to
show that if two Lebesgue measurable functions are equal a.e., then their Lebesgue integrals must be equal, too (see Exercise 9-26). Basically this means that for integration
null sets are insignificant. The following definition is therefore sensible because independent of how we extend the function f to all of R,either all extensions are Lebesgue
measurable or none of them are (because for all a > 0 the set {x E R : g(x) < a } with
g as in Definition 9.24 below differs from {x E R : f ( x ) exists and f ( x ) < a } at most
by a null set) and either all extensions are Lebesgue integrable with the same Lebesgue
integral or none of them are Lebesgue integrable (by part 1 of Theorem 9.23).
Definition 9.24 If the function $f : \mathbb{R} \to [-\infty, \infty]$ is defined a.e. and the function
$$g(x) := \begin{cases} f(x); & \text{if } f(x) \text{ is defined}, \\ 0; & \text{if } f(x) \text{ is not defined}, \end{cases}$$
is Lebesgue measurable, then we will call $f$ Lebesgue integrable iff $g$ is Lebesgue integrable and we define the Lebesgue integral of $f$ to be $\int_{\mathbb{R}} f \, d\lambda := \int_{\mathbb{R}} g \, d\lambda$.
Theorem 9.25 shows that the Lebesgue integral is well-behaved with respect to the linear operations of multiplying with a real number and addition. Because of Exercise 9-16a and Definition 9.24 and because the set where the sum $f + g$ is undefined is a null set (see Exercise 9-27), we do not need to place any additional hypotheses on the functions in part 2 of Theorem 9.25.
Theorem 9.25 Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable functions and let $a \in \mathbb{R}$. Then the following are true:
1. $af$ is Lebesgue integrable and $\int_{\mathbb{R}} af \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda$.
2. $f + g$ is Lebesgue integrable and $\int_{\mathbb{R}} f + g \, d\lambda = \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda$.
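Proof. For part 1, note that by Theorem 9.20 with $g(x) = a$ the function $af$ is Lebesgue measurable. If $f \ge 0$ and $a \ge 0$, then $af$ is Lebesgue integrable because
$$\begin{aligned}
\int_{\mathbb{R}} af \, d\lambda
&= \sup\left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le af \right\} \\
&= \sup\left\{ \int_{\mathbb{R}} as \, d\lambda : s \text{ is a simple function with } 0 \le as \le af \right\} \\
&= a \sup\left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } 0 \le s \le f \right\} = a \int_{\mathbb{R}} f \, d\lambda.
\end{aligned}$$
But this means that for any Lebesgue integrable function $f$ and any $a \ge 0$ the functions $(af)^+ = af^+$ and $(af)^- = af^-$ are Lebesgue integrable and
$$\int_{\mathbb{R}} af \, d\lambda = \int_{\mathbb{R}} (af)^+ \, d\lambda - \int_{\mathbb{R}} (af)^- \, d\lambda = a \int_{\mathbb{R}} f^+ \, d\lambda - a \int_{\mathbb{R}} f^- \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda.$$
Finally, for any Lebesgue integrable function $f$ and any $a < 0$ the functions $(af)^+ = -af^-$ and $(af)^- = -af^+$ are Lebesgue integrable and
$$\int_{\mathbb{R}} af \, d\lambda = \int_{\mathbb{R}} (af)^+ \, d\lambda - \int_{\mathbb{R}} (af)^- \, d\lambda = -a \int_{\mathbb{R}} f^- \, d\lambda + a \int_{\mathbb{R}} f^+ \, d\lambda = a \int_{\mathbb{R}} f \, d\lambda.$$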
For part 2 first note that by Exercise 9-16a, Definition 9.24 and the preceding discussion, we can assume that $f + g$ is defined and finite everywhere and that $f + g$ is Lebesgue measurable. (Simply set $f$ and $g$ equal to zero where the sum is not defined.) Also note that part 2 is easily proved for simple functions (Exercise 9-20b), so we can use the additivity of the Lebesgue integral for simple functions in the following.
We will first prove the result for nonnegative $f$ and $g$. To see that $f + g$ is Lebesgue integrable, suppose for a contradiction that $f + g$ is not Lebesgue integrable. Then for each $n \in \mathbb{N}$ there is a simple function $s_n$ with $0 \le s_n \le f + g$ and $\int_{\mathbb{R}} s_n \, d\lambda > n$. Let
$$F := \left\{x \in \mathbb{R} : f(x) \ge \frac{f(x) + g(x)}{2}\right\} \quad \text{and} \quad G := \left\{x \in \mathbb{R} : g(x) \ge \frac{f(x) + g(x)}{2}\right\}.$$
For each $n \in \mathbb{N}$ one of the inequalities $\int_{\mathbb{R}} \frac{1}{2} s_n 1_F \, d\lambda > \frac{n}{4}$ or $\int_{\mathbb{R}} \frac{1}{2} s_n 1_G \, d\lambda > \frac{n}{4}$ holds. Without loss of generality, assume that $\int_{\mathbb{R}} \frac{1}{2} s_n 1_F \, d\lambda > \frac{n}{4}$ holds for all $n \in \mathbb{N}$. Because $0 \le \frac{1}{2} s_n 1_F \le \frac{f + g}{2} 1_F \le f$ this implies that $f$ is not Lebesgue integrable, which is a contradiction. Hence, $f + g$ must be Lebesgue integrable.
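Now if $s_1$ is a simple function with $0 \le s_1 \le f$ and $s_2$ is a simple function with $0 \le s_2 \le g$, then $s_1 + s_2$ is a simple function with $0 \le s_1 + s_2 \le f + g$. This implies $\int_{\mathbb{R}} s_1 \, d\lambda + \int_{\mathbb{R}} s_2 \, d\lambda = \int_{\mathbb{R}} s_1 + s_2 \, d\lambda \le \int_{\mathbb{R}} f + g \, d\lambda$ and because $s_1, s_2$ were arbitrary, we obtain $\int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda \le \int_{\mathbb{R}} f + g \, d\lambda$.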
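For the reversed inequality, let $C_n := \left\{x \in \mathbb{R} : \frac{1}{n} \le f(x) + g(x) \le n\right\}$ for each $n \in \mathbb{N}$. We first prove that $\lim_{n \to \infty} \int_{\mathbb{R}} (f + g) 1_{C_n} \, d\lambda = \int_{\mathbb{R}} f + g \, d\lambda$. Let $\varepsilon > 0$ and let $s$ be a simple function so that $0 \le s \le f + g$ and $\int_{\mathbb{R}} s \, d\lambda > \int_{\mathbb{R}} f + g \, d\lambda - \frac{\varepsilon}{2}$. Note that $\bigcup_{n=1}^{\infty} C_n = \{x \in \mathbb{R} : f(x) + g(x) > 0\}$. Hence, by Theorem 9.12 for all measurable subsets $A \subseteq \{x \in \mathbb{R} : f(x) + g(x) > 0\}$ we obtain $\lim_{n \to \infty} \lambda(A \cap C_n) = \lambda(A)$. With $s = \sum_{j=1}^{m} a_j 1_{A_j}$ and all $a_j > 0$ this implies
$$\lim_{n \to \infty} \int_{\mathbb{R}} s 1_{C_n} \, d\lambda = \lim_{n \to \infty} \sum_{j=1}^{m} a_j \lambda(A_j \cap C_n) = \sum_{j=1}^{m} a_j \lambda(A_j) = \int_{\mathbb{R}} s \, d\lambda.$$
Therefore there is an $n \in \mathbb{N}$ so that the inequality $\int_{\mathbb{R}} s 1_{C_n} \, d\lambda > \int_{\mathbb{R}} s \, d\lambda - \frac{\varepsilon}{2}$ holds. But then $0 \le s 1_{C_n} \le (f + g) 1_{C_n}$, and hence
$$\int_{\mathbb{R}} f + g \, d\lambda \ge \int_{\mathbb{R}} (f + g) 1_{C_n} \, d\lambda \ge \int_{\mathbb{R}} s 1_{C_n} \, d\lambda > \int_{\mathbb{R}} s \, d\lambda - \frac{\varepsilon}{2} > \int_{\mathbb{R}} f + g \, d\lambda - \varepsilon.$$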
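Because $f$ and $g$ are both nonnegative, the sequence $\left\{ \int_{\mathbb{R}} (f + g) 1_{C_n} \, d\lambda \right\}_{n=1}^{\infty}$ is nondecreasing and we conclude that $\lim_{n \to \infty} \int_{\mathbb{R}} (f + g) 1_{C_n} \, d\lambda = \int_{\mathbb{R}} f + g \, d\lambda$.
Now let $\varepsilon > 0$ and let $n \in \mathbb{N}$ be so that $\int_{\mathbb{R}} (f + g) 1_{C_n} \, d\lambda > \int_{\mathbb{R}} f + g \, d\lambda - \frac{\varepsilon}{2}$. Let $\nu := \min\left\{ \ldots, \frac{\varepsilon}{4(\lambda(C_n) + 1)} \right\}$. Because $f$ and $g$ are bounded by $n$ on $C_n$, the proof of part "5⇒1" of Theorem 9.19 shows that there are simple functions $s_f$ and $s_g$ so that $0 \le s_f \le f 1_{C_n}$, $0 \le s_g \le g 1_{C_n}$ and for all $x \in \mathbb{R}$ the inequalities $f(x) 1_{C_n}(x) - s_f(x) < \nu$ and $g(x) 1_{C_n}(x) - s_g(x) < \nu$ hold. Therefore,
$$\begin{aligned}
\int_{\mathbb{R}} f + g \, d\lambda - \frac{\varepsilon}{2}
&< \int_{\mathbb{R}} (f + g) 1_{C_n} \, d\lambda \le \int_{\mathbb{R}} s_f + s_g + 2\nu 1_{C_n} \, d\lambda \\
&= \int_{\mathbb{R}} s_f \, d\lambda + \int_{\mathbb{R}} s_g \, d\lambda + 2\nu \lambda(C_n) \le \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda + \frac{\varepsilon}{2}.
\end{aligned}$$
Because $\varepsilon > 0$ was arbitrary, this proves the additivity of the Lebesgue integral for nonnegative functions.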
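For not necessarily nonnegative functions, note that because $|f + g| \le |f| + |g|$, the above and part 2 of Theorem 9.23 show that $f + g$ is Lebesgue integrable when $f$ and $g$ are Lebesgue integrable.
For the equality of the integrals, first notice that if $f_1, f_2, g_1, g_2$ are nonnegative, integrable and satisfy $f_1 - f_2 = g_1 - g_2$, then via
$$\int_{\mathbb{R}} f_1 \, d\lambda + \int_{\mathbb{R}} g_2 \, d\lambda = \int_{\mathbb{R}} f_1 + g_2 \, d\lambda = \int_{\mathbb{R}} f_2 + g_1 \, d\lambda = \int_{\mathbb{R}} f_2 \, d\lambda + \int_{\mathbb{R}} g_1 \, d\lambda$$
we can conclude that $\int_{\mathbb{R}} f_1 \, d\lambda - \int_{\mathbb{R}} f_2 \, d\lambda = \int_{\mathbb{R}} g_1 \, d\lambda - \int_{\mathbb{R}} g_2 \, d\lambda$. Therefore with $(f + g)^+ - (f + g)^- = f + g = (f^+ + g^+) - (f^- + g^-)$ we obtain
$$\begin{aligned}
\int_{\mathbb{R}} f + g \, d\lambda
&= \int_{\mathbb{R}} (f + g)^+ \, d\lambda - \int_{\mathbb{R}} (f + g)^- \, d\lambda
= \int_{\mathbb{R}} f^+ + g^+ \, d\lambda - \int_{\mathbb{R}} f^- + g^- \, d\lambda \\
&= \int_{\mathbb{R}} f^+ \, d\lambda - \int_{\mathbb{R}} f^- \, d\lambda + \int_{\mathbb{R}} g^+ \, d\lambda - \int_{\mathbb{R}} g^- \, d\lambda
= \int_{\mathbb{R}} f \, d\lambda + \int_{\mathbb{R}} g \, d\lambda,
\end{aligned}$$
which completes the proof. ∎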
The approximation of the integral of $f + g$ with integrals of functions $(f + g) 1_{C_n}$, in part 2 of the proof of Theorem 9.25, shows that sequences of functions should be powerful tools. Convergence of sequences of functions is discussed in Chapter 11 and the fundamental limit theorems for (Lebesgue) integration are introduced in Section 14.5. Moreover, Exercise 14-33 gives a more efficient proof of part 2 using limit theorems for integrals.
Exercises
9-20. Integration of simple functions.
(a) Let $s$ be a simple nonnegative function, let $y_1, \ldots, y_m \in (0, \infty)$ be the nonzero values that $s$ assumes, let $a_1, \ldots, a_n \in [0, \infty)$, and let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be pairwise disjoint Lebesgue measurable sets so that $s = \sum_{k=1}^{n} a_k 1_{A_k}$. Prove that $\sum_{k=1}^{n} a_k \lambda(A_k) = \sum_{j=1}^{m} y_j \lambda\left(s^{-1}(y_j)\right)$.
(This proves that the integral in Definition 9.21 does not depend on the representation of $s$.)
Hint. Each of the $a_k$ must be a $y_j$ or zero. Group the first sum so that equal values $a_k$ are contiguous and then prove that the union of the corresponding sets $A_k$ is the inverse image of the appropriate $y_j$.
(b) Prove that if $s_1$ and $s_2$ are simple functions, then $\int_{\mathbb{R}} s_1 + s_2 \, d\lambda = \int_{\mathbb{R}} s_1 \, d\lambda + \int_{\mathbb{R}} s_2 \, d\lambda$.
Hint. Find Lebesgue measurable sets $A_1, \ldots, A_n$ so that $s_1$ and $s_2$ are constant on each $A_j$.
9-21. Let $f : \mathbb{R} \to [0, \infty)$ be bounded and Lebesgue measurable so that $\lambda(\{x \in \mathbb{R} : f(x) > 0\})$ is finite. Prove that $f$ is Lebesgue integrable and
$$\int_{\mathbb{R}} f \, d\lambda = \inf\left\{ \int_{\mathbb{R}} s \, d\lambda : s \text{ is a simple function with } f \le s \right\}.$$
9-22. Let $f : \mathbb{R} \to [0, \infty]$ be a Lebesgue measurable function. Prove that $f$ is Lebesgue integrable iff the supremum $S := \sup\left\{ \int_{\mathbb{R}} \min(f, n) 1_{[-n, n]} \, d\lambda : n \in \mathbb{N} \right\}$ is finite and that in this case $S$ is the Lebesgue integral of $f$.
9-23. Let $f : \mathbb{R} \to [0, \infty]$ be Lebesgue integrable, let $a > 0$ and let $A \subseteq \mathbb{R}$ be Lebesgue measurable and so that $f - a 1_A \ge 0$. Prove that $\int_{\mathbb{R}} f - a 1_A \, d\lambda = \int_{\mathbb{R}} f \, d\lambda - a \lambda(A)$.
9-24. Construct Lebesgue integrable functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ so that $(f + g)^+ \ne f^+ + g^+$ and $(f + g)^- \ne f^- + g^-$.
9-25. Let $a_1, \ldots, a_n \in \mathbb{R}$, let $A_1, \ldots, A_n \subseteq \mathbb{R}$ be (not necessarily pairwise disjoint) Lebesgue measurable sets and let $f = \sum_{k=1}^{n} a_k 1_{A_k}$ be a simple function. Prove that $\int_{\mathbb{R}} f \, d\lambda = \sum_{k=1}^{n} a_k \lambda(A_k)$. Then explain how this result differs from the result in Exercise 9-20a and why we could not have used it to prove Exercise 9-20a.
Hint. For the proof of the equation, you may use Theorem 9.25.
9-26. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue measurable functions so that $f = g$ a.e. Use only Theorem 9.23 to prove that $f$ is Lebesgue integrable iff $g$ is Lebesgue integrable and that in this case we have $\int_{\mathbb{R}} f \, d\lambda = \int_{\mathbb{R}} g \, d\lambda$.
9-27. Let $f : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable. Prove that $\{x \in \mathbb{R} : f(x) = \infty\}$ is a null set.
9-28. Let $f, g : \mathbb{R} \to [-\infty, \infty]$ be Lebesgue integrable functions.
(a) Prove that $\max(f, g)$ (defined pointwise) is Lebesgue integrable.
(b) Prove that $\min(f, g)$ (defined pointwise) is Lebesgue integrable.
(c) Prove that $f - g$ is Lebesgue integrable and $\int_{\mathbb{R}} f - g \, d\lambda = \int_{\mathbb{R}} f \, d\lambda - \int_{\mathbb{R}} g \, d\lambda$.
9-29. Let $C^Q$ be a Cantor set. Prove that $1_{C^Q}$ is Lebesgue integrable.
Hint. Exercise 9-6.
9-30. Prove that the Dirichlet function $f(x) = 1_{\mathbb{Q} \cap [0, 1]}$ is Lebesgue integrable.
9.4 Lebesgue Integrals versus Riemann Integrals
We conclude this chapter by establishing the relationship between the Lebesgue integral and regular as well as improper Riemann integrals. The Lebesgue integral truly
is an extension of the Riemann integral of bounded functions on closed and bounded
intervals. To see this, we first show that Riemann integrable functions are Lebesgue
integrable and that for these functions the Lebesgue integral equals the Riemann integral. Theorem 9.26 also shows how to overcome the small nuisance of Riemann
integrals being defined on sets [ a ,b ] ,while the Lebesgue integral is defined on R.
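Theorem 9.26 If $f$ is Riemann integrable over $[a, b]$, then $f_{\mathbb{R}} : \mathbb{R} \to \mathbb{R}$ defined by
\[ f_{\mathbb{R}}(x) := \begin{cases} f(x); & \text{for } x \in [a, b], \\ 0; & \text{otherwise,} \end{cases} \]
is Lebesgue integrable and the Riemann integral of $f$ is equal to the Lebesgue integral of $f_{\mathbb{R}}$. That is,
\[ \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda = \int_a^b f \, dx. \]
Proof. First, let $f \ge 0$. Let $P = \{a = x_0 < \cdots < x_n = b\}$ be a partition of $[a, b]$. With $m_i$ and $M_i$ as in Definition 5.13, let
\[ s_P^l := \sum_{k=1}^n m_k \mathbf{1}_{[x_{k-1}, x_k)} + m_n \mathbf{1}_{\{x_n\}} \quad \text{and} \quad s_P^u := \sum_{k=1}^n M_k \mathbf{1}_{[x_{k-1}, x_k)} + M_n \mathbf{1}_{\{x_n\}}. \]
Then $0 \le s_P^l \le f_{\mathbb{R}} \le s_P^u$.
For all $n \in \mathbb{N}$, consider the partition $P_n := \left\{ a + k \frac{b-a}{2^n} : k = 0, \ldots, 2^n \right\}$. Then $s_{P_n}^l \le s_{P_{n+1}}^l$ for all $n \in \mathbb{N}$ and by Lebesgue's criterion for Riemann integrability (Theorem 8.12) we infer that $\lim_{n \to \infty} s_{P_n}^l(x) = f(x)$ a.e. on $[a, b]$. Because $f_{\mathbb{R}}$ and the $s_{P_n}^l$ are zero outside $[a, b]$, by Exercise 9-15 this means that $f_{\mathbb{R}}$ is Lebesgue measurable. Because $f_{\mathbb{R}} \le M \mathbf{1}_{[a,b]}$ for some $M > 0$, we conclude that $f_{\mathbb{R}}$ is Lebesgue integrable. Moreover, for all $n \in \mathbb{N}$ we have
\[ L(f, P_n) = \int_{\mathbb{R}} s_{P_n}^l \, d\lambda \le \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda \le \int_{\mathbb{R}} s_{P_n}^u \, d\lambda = U(f, P_n). \]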
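For each $n \in \mathbb{N}$, there is an evaluation set $T_n = \left\{ t_1^{(n)}, \ldots, t_{2^n}^{(n)} \right\}$ so that for all integers $k = 1, \ldots, 2^n$ the inequality $f\left(t_k^{(n)}\right) - m_k < \frac{1}{n(b-a)}$ holds. But then we infer $R(f, P_n, T_n) - L(f, P_n) < \frac{1}{n}$. Because $\lim_{n \to \infty} \|P_n\| = 0$ we obtain
\[ \lim_{n \to \infty} L(f, P_n) = \lim_{n \to \infty} R(f, P_n, T_n) = \int_a^b f(x) \, dx. \]
Similarly, we can prove $\lim_{n \to \infty} U(f, P_n) = \int_a^b f(x) \, dx$. But this means
\[ \int_a^b f(x) \, dx = \lim_{n \to \infty} L(f, P_n) \le \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda \le \lim_{n \to \infty} U(f, P_n) = \int_a^b f(x) \, dx, \]
which establishes the desired equality.
For $f$ being an arbitrary Riemann integrable function on $[a, b]$, note that because $f$ is bounded there is a real number $B$ so that $h := f + B \mathbf{1}_{[a,b]} \ge 0$. Then $h_{\mathbb{R}}$ is Lebesgue integrable, so $f_{\mathbb{R}} = h_{\mathbb{R}} - B \mathbf{1}_{[a,b]}$ is Lebesgue integrable and
\[ \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda = \int_{\mathbb{R}} h_{\mathbb{R}} \, d\lambda - \int_{\mathbb{R}} B \mathbf{1}_{[a,b]} \, d\lambda = \int_a^b h - B \, dx = \int_a^b f \, dx. \]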
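The squeeze between lower and upper sums over dyadic partitions can be observed numerically. The following is a minimal sketch, assuming the test function $f(x) = x^2$ on $[0, 1]$ (not taken from the text) and estimating $m_k$ and $M_k$ by sampling each subinterval; the helper name `dyadic_sums` is hypothetical.

```python
def dyadic_sums(f, a, b, n, samples=64):
    """Approximate L(f, P_n) and U(f, P_n) for the dyadic partition P_n of [a, b]."""
    pieces = 2 ** n
    width = (b - a) / pieces
    lower = upper = 0.0
    for k in range(pieces):
        left = a + k * width
        # Sample the subinterval (endpoints included) to estimate m_k and M_k.
        # For a monotone f such as x^2 on [0, 1] the estimate is exact.
        values = [f(left + width * i / samples) for i in range(samples + 1)]
        lower += min(values) * width
        upper += max(values) * width
    return lower, upper


for n in (2, 4, 8):
    low, up = dyadic_sums(lambda x: x * x, 0.0, 1.0, n)
    print(n, low, up)
```

Both columns approach the common value of the Riemann and Lebesgue integrals, and for this monotone example the gap between them is $2^{-n}$. For a non-monotone $f$ the sampled minima and maxima are only approximations of $m_k$ and $M_k$.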
The Lebesgue integral also incorporates parts of the improper Riemann integral as
the next result shows. For improperly Riemann integrable functions as in Theorem
9.27, we also often say that the improper Riemann integral converges absolutely.
Theorem 9.27 If $f$ is Riemann integrable over every closed subinterval of $[a, b)$ and $|f|$ is improperly Riemann integrable over $[a, b)$ (where $b$ is either a number or infinity), then $f_{\mathbb{R}} : \mathbb{R} \to \mathbb{R}$ defined pointwise by
\[ f_{\mathbb{R}}(x) := \begin{cases} f(x); & \text{for } x \in [a, b), \\ 0; & \text{otherwise,} \end{cases} \]
is Lebesgue integrable and the integrals are equal, that is, $\int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda = \int_a^b f \, dx$.
A similar result holds for functions that are improperly Riemann integrable over an interval $(a, b]$ (where $a$ is either a number or negative infinity) or over a set of the form $(a, b) \setminus \{a_1, \ldots, a_n\}$.
Proof. We will prove the result for $b \in (a, \infty)$ and leave the case $b = \infty$ to Exercise 9-31b. For Lebesgue measurability, first consider the case that $f \ge 0$. For all $n \in \mathbb{N}$, let $P_n := \left\{ a + k \frac{b-a}{2^n} : k = 0, \ldots, 2^n \right\}$, and with $m_i$ and $M_i$ as in Definition 5.13, let
\[ s_n^l := \sum_{k=1}^{2^n - 1} m_k \mathbf{1}_{\left[ a + (k-1)\frac{b-a}{2^n}, \, a + k\frac{b-a}{2^n} \right)}. \]
Then for all $n \in \mathbb{N}$ we have $s_n^l \le s_{n+1}^l \le f_{\mathbb{R}}$ and by Lebesgue's criterion for Riemann integrability (Theorem 8.12) for almost all $x \in [a, b)$ we infer that $\lim_{n \to \infty} s_n^l(x) = f(x)$. Because $f_{\mathbb{R}}$ and the $s_n^l$ are zero outside $[a, b)$, by Exercise 9-15 this means that if $f \ge 0$, then $f_{\mathbb{R}}$ is Lebesgue measurable. Applying this result to $f^+$ and $f^-$ separately implies that $f_{\mathbb{R}}$ is Lebesgue measurable regardless of what sign $f$ takes.
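Now for all simple functions $s$ with $0 \le s \le |f_{\mathbb{R}}|$ and all $n \in \mathbb{N}$ the inequalities
\[ \int_{\mathbb{R}} s \mathbf{1}_{\left[a, b - \frac{1}{n}\right]} \, d\lambda \le \int_a^{b - \frac{1}{n}} |f(x)| \, dx \le \int_a^b |f(x)| \, dx < \infty \]
hold. Because $n \in \mathbb{N}$ was arbitrary we can conclude that for all simple functions $s$ with $0 \le s \le |f_{\mathbb{R}}|$ we have the inequality $\int_{\mathbb{R}} s \, d\lambda \le \int_a^b |f(x)| \, dx < \infty$ (Exercise 9-31a). Hence, $|f_{\mathbb{R}}|$ is Lebesgue integrable, so $f_{\mathbb{R}}$ is Lebesgue integrable and
\[ \int_{\mathbb{R}} |f_{\mathbb{R}}| \, d\lambda \le \int_a^b |f(x)| \, dx < \infty. \]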
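To prove that the two integrals are equal, let $\varepsilon > 0$ and find a $\delta > 0$ so that $\int_{b-\delta}^b |f(x)| \, dx < \frac{\varepsilon}{2}$. Then, because by Theorem 9.26 the Lebesgue integral and the Riemann integral agree over $[a, b - \delta]$, we obtain
\[ \left| \int_{\mathbb{R}} f_{\mathbb{R}} \, d\lambda - \int_a^b f(x) \, dx \right| = \left| \int_{\mathbb{R}} f_{\mathbb{R}} \mathbf{1}_{[b-\delta, b)} \, d\lambda - \int_{b-\delta}^b f(x) \, dx \right| \le 2 \int_{b-\delta}^b |f(x)| \, dx < \varepsilon. \]
Because $\varepsilon$ was arbitrary the two integrals must be equal.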
Note that not every improperly Riemann integrable function is Lebesgue integrable
(Exercise 9-34). Moreover, not every Lebesgue integrable function can be integrated
with a regular or improper Riemann integral (Exercises 9-29, 9-30, 12-248). Hence,
neither integral can formally be replaced by the other. In fact, because cancellations as
in Exercise 9-34 are sometimes desired in Lebesgue integration, improper Lebesgue
integrals can be defined similar to improper Riemann integrals (Exercise 9-35).
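The cancellation phenomenon can be made concrete with the step function $f = \sum_{n=1}^\infty (-1)^{n+1} \frac{1}{n} \mathbf{1}_{[n, n+1)}$ from Exercise 9-34. The sketch below is only a numerical illustration, not a proof: it uses that the integral of $f$ over $[1, N+1)$ is a partial sum of the alternating harmonic series, while the integral of $|f|$ is a partial sum of the harmonic series. The function name `partial_integrals` and the comparison value $\ln 2$ are assumptions introduced for the illustration.

```python
import math


def partial_integrals(N):
    """Integrals of f and |f| over [1, N + 1) for f = sum (-1)^(n+1) (1/n) 1_[n, n+1)."""
    signed = sum((-1) ** (n + 1) / n for n in range(1, N + 1))
    absolute = sum(1 / n for n in range(1, N + 1))
    return signed, absolute


for N in (10, 1000, 100000):
    signed, absolute = partial_integrals(N)
    print(N, signed, absolute, math.log(2))
```

The signed integrals settle down, which reflects convergence of the improper Riemann integral, while the integrals of $|f|$ grow without bound, which is why $f$ is not Lebesgue integrable.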
If we consider the earlier development of the Riemann integral in Chapter 5, it
would be natural to target the Fundamental Theorem of Calculus as the next big result
after the fundamental properties and examples that were presented in this chapter. Unfortunately, although it is quite beautiful, the Fundamental Theorem of Calculus for the
Lebesgue integral has a rather technical proof. Exercise 9-36 is the first lemma for this
result. Further lemmas are presented in Exercises 10-7, 11-20, 14-36 and 14-38 before
the result itself is proved in Exercises 18-6 and 23-8.
Exercises
9-31. Finish the proof of Theorem 9.27.
(a) Prove that if $a, b \in \mathbb{R}$, $a < b$, $M \ge 0$ and $s$ is a nonnegative simple function so that for all $n \in \mathbb{N}$ we have that $\int_{\mathbb{R}} s \mathbf{1}_{\left[a, b - \frac{1}{n}\right]} \, d\lambda \le M$, then $\int_{\mathbb{R}} s \mathbf{1}_{[a, b)} \, d\lambda \le M$.
Hint. Recall that simple functions are bounded.
(b) Prove Theorem 9.27 for $b = \infty$.
9-32. Let $f(x) = \frac{1}{x} \mathbf{1}_{(0,1]}$. Prove that $\sqrt{f}$ is Lebesgue integrable while $f$ is not.
9-33. Let $f(x) = \frac{1}{x} \mathbf{1}_{[1,\infty)}$. Prove that $f^2$ is Lebesgue integrable while $f$ is not.
9-34. Let $f(x) := \sum_{n=1}^\infty (-1)^{n+1} \frac{1}{n} \mathbf{1}_{[n, n+1)}$, where the sum is taken pointwise. Prove that the improper Riemann integral of $f$ over $[1, \infty)$ exists and that $f$ is not Lebesgue integrable.
Hint. Harmonic series and the Alternating Series Test.
9-35. Define the improper Lebesgue integral of a function $f : [a, b) \to \mathbb{R}$, where $b \in \mathbb{R} \cup \{\infty\}$.
9-36. Use the following steps to prove that if $E \subseteq [a, b]$ and $\mathcal{I}$ is a family of closed subintervals of $[a, b]$ with $E \subseteq \bigcup \mathcal{I}$, then there is a finite, pairwise disjoint subfamily $\mathcal{F} \subseteq \mathcal{I}$ so that $\sum_{I \in \mathcal{F}} |I| \ge \frac{1}{6} \lambda(E)$.
(a) For any closed interval $I$, let $I^*$ be the closed interval with the same midpoint and $|I^*| = 5|I|$. Prove that if $I, J \in \mathcal{I}$, $I \cap J \ne \emptyset$ and $|I| < 2|J|$, then $I \subseteq J^*$.
(b) Let $\delta_0 := \sup\{|I| : I \in \mathcal{I}\}$ and let $I_1 \in \mathcal{I}$ be so that $|I_1| > \frac{\delta_0}{2}$. Assume pairwise disjoint $I_1, \ldots, I_n$ are chosen so that, with $A_n := \bigcup_{j=1}^n I_j$, for each $I \in \mathcal{I}$ we have $I \cap A_n = \emptyset$ or $I \subseteq \bigcup_{j=1}^n I_j^*$. Prove that if there is an $I \in \mathcal{I}$ with $I \cap A_n = \emptyset$, we can continue the construction.
Hint. Use $\delta_{n+1} := \sup\{|I| : I \in \mathcal{I}, I \cap A_n = \emptyset\}$.
(c) Prove that, independent of whether the construction in part 9-36b terminates in finitely many steps or not, if $\mathcal{F}$ is the family of intervals constructed in part 9-36b, then $\lambda(E) \le 5 \sum_{I \in \mathcal{F}} |I|$.
(d) Prove that part 9-36c yields the requisite intervals.
Chapter 10
Series of Real Numbers II
The essentials on series were introduced in Chapter 6 to facilitate the “summation
of infinitely many numbers.” This chapter presents further aspects of the theory of
series. Particular attention is given to power series, which are needed to define the
transcendental functions in Chapter 12.
10.1 Limits Superior and Inferior
The terms of a series are usually easier to handle than the partial sums. Hence, it
would be useful to have convergence criteria based on properties of the terms. The
first obstacle for devising such criteria is that sequences obtained from the terms of a
series need not converge. The limits superior and inferior defined in this section can be
computed for any sequence and they describe the sequence’s limiting behavior.
Proposition 10.1 Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers that is bounded above. Then the sequence $\left\{ \sup\{a_j : j \ge n\} \right\}_{n=1}^\infty$ converges to a real number or to $-\infty$.
Proof. The sequence of suprema is nonincreasing. If it is bounded below, then it converges to a real number. If not, then it converges to $-\infty$. ■
Proposition 10.1 and a simple modification (see Exercise 10-1) show that the following definition is sensible.
Definition 10.2 Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers that is bounded above. The limit superior of $\{a_n\}_{n=1}^\infty$ is defined to be $\limsup_{n \to \infty} a_n := \lim_{n \to \infty} \sup\{a_j : j \ge n\}$. For sequences that are not bounded above, we say that the limit superior is $\infty$.
If $\{a_n\}_{n=1}^\infty$ is a sequence of real numbers that is bounded below, then the limit inferior of $\{a_n\}_{n=1}^\infty$ is defined to be $\liminf_{n \to \infty} a_n := \lim_{n \to \infty} \inf\{a_j : j \ge n\}$. For sequences that are not bounded below, we say that the limit inferior is $-\infty$.
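To see how the tail suprema and infima behave, one can compute them for a finite truncation of a sequence. The sketch below assumes the example sequence $a_n = (-1)^n \left(1 + \frac{1}{n}\right)$, which is not taken from the text; its limit superior is $1$ and its limit inferior is $-1$. Because only finitely many terms are stored, the computed values are suprema and infima over truncated tails, which agree with the true ones here only because the extreme values of each tail occur at its first even or odd index.

```python
def tail_sup_inf(terms):
    """Return lists of sup{a_j : j >= n} and inf{a_j : j >= n} over a finite list."""
    sups, infs = [], []
    current_sup, current_inf = float("-inf"), float("inf")
    # Walk backwards so each tail extremum is updated in constant time.
    for value in reversed(terms):
        current_sup = max(current_sup, value)
        current_inf = min(current_inf, value)
        sups.append(current_sup)
        infs.append(current_inf)
    sups.reverse()
    infs.reverse()
    return sups, infs


a = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
sups, infs = tail_sup_inf(a)
print(sups[0], sups[999], infs[0], infs[999])
```

The list of tail suprema is nonincreasing and the list of tail infima is nondecreasing, as in Proposition 10.1.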
The relationship between limit superior and limit inferior is the obvious one.
Proposition 10.3 Let $\{a_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$. Then $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$.
Proof. Clearly, for all $n \in \mathbb{N}$ we have $\inf\{a_j : j \ge n\} \le \sup\{a_j : j \ge n\}$, which implies $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$. ■
Exercise 10-3 shows that precise arithmetic with limits inferior and superior is impossible for most operations. At least for negative signs we have a simple “reversal.”
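Proposition 10.4 For any sequence of real numbers $\{a_n\}_{n=1}^\infty$ we have the equations $\liminf_{n \to \infty} -a_n = -\limsup_{n \to \infty} a_n$ and $\limsup_{n \to \infty} -a_n = -\liminf_{n \to \infty} a_n$.
Proof. Recalling Exercise 1-19 we note that
\[ \liminf_{n \to \infty} -a_n = \lim_{n \to \infty} \inf\{-a_j : j \ge n\} = \lim_{n \to \infty} \left( -\sup\{a_j : j \ge n\} \right) = -\limsup_{n \to \infty} a_n. \]
The other equation is proved similarly. ■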
Finally, we should establish the relationship between the limit (if it exists) and the
limits superior and inferior.
Lemma 10.5 Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers and let $\{b_n\}_{n=1}^\infty$ be a convergent sequence of real numbers so that for all $n \in \mathbb{N}$ the inequality $b_n \le a_n$ holds. Then $\lim_{n \to \infty} b_n \le \liminf_{n \to \infty} a_n$.
Proof. Let $\varepsilon > 0$ and let $L = \lim_{n \to \infty} b_n$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $|b_n - L| < \varepsilon$, and hence $a_n \ge b_n > L - \varepsilon$. This means that for all $n \ge N$ we have $\inf\{a_j : j \ge n\} \ge L - \varepsilon$, and hence $\liminf_{n \to \infty} a_n \ge L - \varepsilon$. Because $\varepsilon > 0$ was arbitrary we conclude $\liminf_{n \to \infty} a_n \ge L$, which was to be proved. ■
Theorem 10.6 A sequence of real numbers $\{a_n\}_{n=1}^\infty$ converges iff its limits superior and inferior are equal and real, that is, $\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n \in \mathbb{R}$. In this case, we have the equalities $\lim_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n$.
Proof. For "$\Rightarrow$," let $\{a_n\}_{n=1}^\infty$ be a convergent sequence. Then by Lemma 10.5 and Exercise 10-4a we obtain $\limsup_{n \to \infty} a_n \le \lim_{n \to \infty} a_n \le \liminf_{n \to \infty} a_n \in \mathbb{R}$. By Proposition 10.3, we also have $\liminf_{n \to \infty} a_n \le \limsup_{n \to \infty} a_n$, which implies the two must be equal to each other and to the limit of the sequence.
For "$\Leftarrow$," let $\limsup_{n \to \infty} a_n = \liminf_{n \to \infty} a_n =: L \in \mathbb{R}$ and let $\varepsilon > 0$. There is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $\inf\{a_j : j \ge n\} > L - \varepsilon$ and $\sup\{a_j : j \ge n\} < L + \varepsilon$. In particular, for all $n \ge N$ we infer $|a_n - L| < \varepsilon$, so $\{a_n\}_{n=1}^\infty$ converges to $L$.
The equation at the end of the theorem is proved in either of the above parts. ■
Exercises
10-1. State and prove a version of Proposition 10.1 that can be used to justify the definition of the limit inferior.
10-2. Limit superior, limit inferior, and subsequences. Let $\{a_n\}_{n=1}^\infty$ be a sequence of real numbers.
(a) Prove that the limit superior of $\{a_n\}_{n=1}^\infty$ is the largest $S \in [-\infty, \infty]$ such that there is a subsequence $\{a_{n_k}\}_{k=1}^\infty$ with $\lim_{k \to \infty} a_{n_k} = S$.
(b) Prove that the limit inferior of $\{a_n\}_{n=1}^\infty$ is the smallest $S \in [-\infty, \infty]$ such that there is a subsequence $\{a_{n_k}\}_{k=1}^\infty$ with $\lim_{k \to \infty} a_{n_k} = S$.
10-3. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be bounded sequences of real numbers.
(a) Prove that $\liminf_{n \to \infty} (a_n + b_n) \ge \liminf_{n \to \infty} a_n + \liminf_{n \to \infty} b_n$.
(b) Give an example that shows that the inequality can be strict.
(c) Give examples that show that the limit inferior of a product of sequences can be greater or smaller than the product of the individual limit inferiors.
10-4. Let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be sequences of real numbers so that $a_n \le b_n$ for all $n \in \mathbb{N}$.
(a) Prove that if $\{b_n\}_{n=1}^\infty$ converges, then $\limsup_{n \to \infty} a_n \le \lim_{n \to \infty} b_n$.
(b) Prove that in general it is possible to have $\limsup_{n \to \infty} a_n > \liminf_{n \to \infty} b_n$.
10-5. Let $\{a_n\}_{n=1}^\infty$ be a bounded sequence of positive numbers so that the sequence of the reciprocals is bounded, too. Prove each of the following.
(a) $\liminf_{n \to \infty} \frac{1}{a_n} = \frac{1}{\limsup_{n \to \infty} a_n}$
(b) $\limsup_{n \to \infty} \frac{1}{a_n} = \frac{1}{\liminf_{n \to \infty} a_n}$
10-6. Let $\{a_n\}_{n=1}^\infty$ be a sequence of positive numbers and let $\{b_n\}_{n=1}^\infty$ be a convergent sequence of positive numbers with nonzero limit. Prove each of the following.
(a) $\limsup_{n \to \infty} a_n b_n = \limsup_{n \to \infty} a_n \lim_{n \to \infty} b_n$.
(b) $\limsup_{n \to \infty} a_n^{b_n} = \left( \limsup_{n \to \infty} a_n \right)^{\lim_{n \to \infty} b_n}$
10-7. Lebesgue's Differentiation Theorem states that for any function $f : [a, b] \to \mathbb{R}$ of bounded variation the derivative $f'(x)$ exists a.e. We will prove this result using the following steps.
(a) For any function $f : (a, b) \to \mathbb{R}$ and $x \in (a, b)$, define
\[ D^+ f(x) := \limsup_{h \to 0^+} \frac{f(x+h) - f(x)}{h}, \qquad D_+ f(x) := \liminf_{h \to 0^+} \frac{f(x+h) - f(x)}{h}, \]
\[ D^- f(x) := \limsup_{h \to 0^-} \frac{f(x+h) - f(x)}{h}, \qquad D_- f(x) := \liminf_{h \to 0^-} \frac{f(x+h) - f(x)}{h}, \]
where "$h \to 0^+$" means that $h$ approaches zero from the right ($h > 0$) and "$h \to 0^-$" means that $h$ approaches zero from the left ($h < 0$). Prove that $f$ is differentiable at $x$ iff $D^+ f(x) = D_+ f(x) = D^- f(x) = D_- f(x)$ and that in this case these values are equal to $f'(x)$.
Note. The four values above are also known as the Dini derivatives of $f$ at $x$.
(b) Let f : [ a , b] + R be of bounded variation, let A := { x E (a. b ) : D - f ( x ) < D + f ( x ) }.
let B := [ x E ( a, b ) : D - f ( x ) > D + f ( x ) }, and let C := [ x E ( a . b ) : D + f ( x ) = co
Prove that f is differentiable at every x E ( a , b ) \ (A U B U C ) .
Note. In particular, this means we have proved Lebesgue’s Theorem if we can prove that A ,
B and C are null sets. The remaining parts of this exercise are devoted to this task.
1
1
1.
(c) Suppose for a contradiction that h ( A ) > 0 and obtain a contradiction as follows.
1
1
i. Prove that h x E A : D- f (x) < 4 - - < q < q - < D + f ( x )
> 0 for some
n
n
q E Q and n E N.
1
1
ii. With q , n as above let E := x E A : D- j ( x ) < 4 - - < q < 4 + - < D+ f (x) ,
n
n
1
let g(x) := f ( x ) - q x , let E := - and let a = xo < x1 < . . . < xn = b be so
n
(1
1)
+
I
I
n
t h a t x /g(xk)-g(xk-l)
1 > Vig-
k=l
I EZ
Eh ( E) .Pr o v et h at f o r al l x
6
1
)I1 > - h ( E ) and use this family to obtain the contradiction
6
E
En(xk-1,xk)
n
VXx-, g > V t g .
k=l
(d) Use $D^- f(x) = -D_-(-f)(x)$ and $D_+ f(x) = -D^+(-f)(x)$ to prove that $\lambda(B) = 0$.
(e) Lead the assumption $\lambda(C) > 0$ to a contradiction by letting $M := \frac{6 V_a^b f}{\lambda(C)}$ and finding for each $x \in C$ an $h_x$ with $|f(x + h_x) - f(x)| > M h_x$. Use Exercise 9-36 as in part 10-7(c)iv and arrive at a contradiction in a similar way.
10.2 The Root Test and the Ratio Test
When it works, the Ratio Test for the convergence of a series is computationally convenient, because it only involves a division. Note that by using the limit superior and
the limit inferior we avoid any issues with convergence of the sequence of quotients.
Theorem 10.7 Ratio Test. Let $\sum_{j=1}^\infty a_j$ be a series with $a_j \ne 0$ for all $j \in \mathbb{N}$.
1. If $\limsup_{j \to \infty} \left| \frac{a_{j+1}}{a_j} \right| < 1$, then $\sum_{j=1}^\infty a_j$ converges absolutely.
2. If $\liminf_{j \to \infty} \left| \frac{a_{j+1}}{a_j} \right| > 1$, then $\sum_{j=1}^\infty a_j$ diverges.
If neither of the above conditions holds, then the series might converge or diverge. We also say that the Ratio Test failed in this situation.
Proof. For part 1, let $q := \frac{1}{2} \left( \limsup_{j \to \infty} \left| \frac{a_{j+1}}{a_j} \right| + 1 \right) < 1$. Then there is a $J \in \mathbb{N}$ such that for all $j \ge J$ we have $|a_{j+1}| < q |a_j|$, and hence $|a_j| \le q^{j-J} |a_J|$. Let $M := \frac{|a_J|}{q^J}$. Then for all $j \ge J$ we obtain $0 \le |a_j| \le q^j \frac{|a_J|}{q^J} = M q^j$. By the Comparison Test (Theorem 6.15 and Exercise 6-16), $\sum_{j=1}^\infty a_j$ converges absolutely.
For part 2, note that if $\liminf_{j \to \infty} \left| \frac{a_{j+1}}{a_j} \right| > 1$, then there is a $J \in \mathbb{N}$ so that for all $j \ge J$ we have $|a_{j+1}| > |a_j|$. Hence, for all $j > J$ we obtain $|a_j| > |a_J| > 0$ and the terms of the series do not converge to zero. Thus the series diverges.
The last statement will be illuminated in Example 10.10. ■
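As a numerical illustration of the Ratio Test, the sketch below assumes the example series $\sum_{j=1}^\infty \frac{j}{2^j}$, which is not taken from the text. The quotients $\left| \frac{a_{j+1}}{a_j} \right| = \frac{j+1}{2j}$ stay close to $\frac{1}{2}$ for large $j$, so the test indicates absolute convergence. The helper name `ratios` is hypothetical.

```python
def ratios(term, start, stop):
    """Quotients |a_{j+1} / a_j| for j = start, ..., stop - 1."""
    return [abs(term(j + 1) / term(j)) for j in range(start, stop)]


def a(j):
    """Terms of the assumed example series sum j / 2^j."""
    return j / 2 ** j


tail = ratios(a, 100, 200)
print(max(tail), min(tail))
print(sum(a(j) for j in range(1, 200)))
```

A finite stretch of quotients can of course only suggest the value of the limit superior; the proof above is what turns the bound on the quotients into a comparison with a geometric series.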
The limit superior and the limit inferior remove any analytic concerns about convergence of the quotient from the Ratio Test. But algebraic concerns remain regarding
the possible division by zero. The Ratio Test cannot be applied directly to any series
for which a subsequence of the terms is zero. The Root Test does not involve divisions.
Thus it can be applied directly to any series.
Theorem 10.8 Root Test. Let $\sum_{j=1}^\infty a_j$ be a series.
1. If $\limsup_{j \to \infty} \sqrt[j]{|a_j|} < 1$, then $\sum_{j=1}^\infty a_j$ converges absolutely.
2. If $\limsup_{j \to \infty} \sqrt[j]{|a_j|} > 1$, then $\sum_{j=1}^\infty a_j$ diverges.
If $\limsup_{j \to \infty} \sqrt[j]{|a_j|} = 1$, then the series might converge or diverge. We also say that the Root Test failed in this situation.
Proof. For part 1, let $q := \frac{1}{2} \left( \limsup_{j \to \infty} \sqrt[j]{|a_j|} + 1 \right) < 1$. There is a $J \in \mathbb{N}$ such that $|a_j| < q^j$ for all $j \ge J$. By the Comparison Test, $\sum_{j=1}^\infty a_j$ converges absolutely.
For part 2, note that if $\limsup_{j \to \infty} \sqrt[j]{|a_j|} > 1$, then for any $n \in \mathbb{N}$ there is a $j > n$ with $|a_j| > 1$. Hence, the terms do not converge to zero, and so the series diverges.
The last statement will be illuminated in Example 10.10. ■
In theoretical investigations, the Root Test usually is preferred over the Ratio Test,
because we do not need to worry about terms that are equal to zero. Exercise 10-8
shows that for any series for which convergence can be proved with the Ratio Test,
convergence can (in theory) also be proved with the Root Test. This is another indication that the Root Test is preferable when developing a theory.
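The Root Test can succeed where the Ratio Test is inconclusive. The following sketch assumes the example terms $a_j = 2^{-j + (-1)^j}$, which are not taken from the text. The quotients $\frac{a_{j+1}}{a_j}$ alternate between $\frac{1}{8}$ and $2$, so the limit superior of the quotients exceeds $1$ while the limit inferior is below $1$, and neither part of the Ratio Test applies. The roots $\sqrt[j]{|a_j|} = 2^{-1 + (-1)^j / j}$, however, approach $\frac{1}{2}$.

```python
def a(j):
    """Terms of the assumed example series: 2^(-j + (-1)^j)."""
    return 2.0 ** (-j + (-1) ** j)


ratio_tail = [a(j + 1) / a(j) for j in range(50, 60)]
root_tail = [a(j) ** (1.0 / j) for j in range(50, 60)]
print(sorted(set(ratio_tail)))
print(min(root_tail), max(root_tail))
```

Since the roots stay near $\frac{1}{2}$, part 1 of the Root Test gives absolute convergence for this series.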
The p-series test below provides examples of convergent and divergent series for
which the Root Test and the Ratio Test both fail.
Theorem 10.9 p-series test. Let $p \in \mathbb{Q}$. The sum $\sum_{j=1}^\infty \frac{1}{j^p}$ converges iff $p > 1$.
Proof. If $p \le 1$, then for all $j \ge 1$ we have $\frac{1}{j^p} \ge \frac{1}{j}$. Because the harmonic series $\sum_{j=1}^\infty \frac{1}{j}$ diverges (Example 6.8), by the Comparison Test $\sum_{j=1}^\infty \frac{1}{j^p}$ diverges for $p \le 1$.
This leaves the case $p > 1$. Note that the sequence of partial sums $S_k = \sum_{j=1}^k \frac{1}{j^p}$ is increasing. Hence, we are done if we can show that it is bounded. To do this we use a chunking argument similar to the proof in Example 6.8. For all $m \in \mathbb{N}$ we have
\[ S_{2^m - 1} = \sum_{i=0}^{m-1} \sum_{j=2^i}^{2^{i+1}-1} \frac{1}{j^p} \le \sum_{i=0}^{m-1} 2^i \frac{1}{(2^i)^p} = \sum_{i=0}^{m-1} \left( \frac{1}{2^{p-1}} \right)^i \le \frac{1}{1 - \frac{1}{2^{p-1}}}. \]
Because the right side of this inequality does not depend on $m$, the $S_k$ are bounded and the series converges for $p > 1$. ■
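One way to carry out the chunking argument is to group the terms with $2^i \le j < 2^{i+1}$; each such group has $2^i$ terms of size at most $\frac{1}{(2^i)^p}$, which leads to the geometric-series bound $\frac{1}{1 - 2^{1-p}}$ for all partial sums. The sketch below checks this bound numerically under that assumption; the function names are hypothetical and the values of $p$ are chosen only for illustration.

```python
def p_series_partial_sum(p, k):
    """S_k = sum_{j=1}^k 1 / j^p."""
    return sum(1.0 / j ** p for j in range(1, k + 1))


def chunk_bound(p):
    """Geometric-series bound 1 / (1 - 2^(1 - p)), valid for p > 1."""
    return 1.0 / (1.0 - 2.0 ** (1.0 - p))


p = 1.5
for m in (5, 10, 15):
    print(m, p_series_partial_sum(p, 2 ** m - 1), chunk_bound(p))
```

The partial sums increase with $m$ but never pass the bound, which does not depend on $m$.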
Example 10.10 The Root Test and the Ratio Test both fail for the harmonic series $\sum_{j=1}^\infty \frac{1}{j}$, which diverges, and for the series $\sum_{j=1}^\infty \frac{1}{j^2}$, which converges. For the Ratio Test, this is a simple computation. For the Root Test, we need to use that $\lim_{j \to \infty} \sqrt[j]{j} = 1$. We postpone the verification to Exercise 10-16c. □
Exercises
10-8. The Root Test versus the Ratio Test. Let $\sum_{j=1}^\infty a_j$ be a series with $a_j \ne 0$ for all $j$
Explain why part 10-8a shows that if the Ratio Test proves that a series converges, then the Root Test also proves that the series converges.
10-9. Construct a convergent series $\sum_{j=1}^\infty a_j$ with
10-10. Prove the p-series test (Theorem 10.9) using the Integral Test (Theorem 8.29).
10-11. Use the Ratio Test or the Root Test to determine which of the following series converge. If a series
converges, determine if it converges absolutely.
10-12. Use any of the convergence tests introduced so far (including those from Chapter 6) to determine
which of the following series converge. If a series converges, determine if it converges absolutely.
10.3 Power Series
With series giving access to “sums with infinitely many terms” it is natural to consider
what happens when we let the sum in the definition of polynomials have infinitely
many terms. The resulting notion of a power series is very versatile. In particular, it
will enable us to introduce the transcendental functions in Chapter 12.
Definition 10.11 For a sequence $\{c_k\}_{k=0}^\infty$, the expression $\sum_{k=0}^\infty c_k (x - a)^k$ is called a power series centered at $a$ (or a power series about $a$) with coefficients $c_k$. The power series is called convergent at $x \in \mathbb{R}$ if and only if $\lim_{N \to \infty} \sum_{k=0}^N c_k (x - a)^k$ exists. Otherwise it is called divergent at $x$.
Theorem 10.12 shows that power series converge on a symmetric open interval $(a - R, a + R)$ about the center point (where $R$ could be infinite), and they diverge outside $[a - R, a + R]$. So only if $R$ is finite are there two points, $a + R$ and $a - R$, for which the convergence behavior is unknown. Note how in Theorem 10.12 $R$ is defined in terms of the Root Test. Exercise 10-13 provides an equally useful conceptual characterization of $R$.
Theorem 10.12 Let $\sum_{k=0}^\infty c_k (x - a)^k$ be a power series. If $\limsup_{k \to \infty} \sqrt[k]{|c_k|} \in (0, \infty)$ let $R := \frac{1}{\limsup_{k \to \infty} \sqrt[k]{|c_k|}}$, if $\limsup_{k \to \infty} \sqrt[k]{|c_k|} = \infty$ let $R := 0$ and if $\limsup_{k \to \infty} \sqrt[k]{|c_k|} = 0$ let $R := \infty$. Then $\sum_{k=0}^\infty c_k (x - a)^k$ converges absolutely for all $x \in \mathbb{R}$ with $|x - a| < R$ and it diverges for all $x \in \mathbb{R}$ with $|x - a| > R$.
Proof. We apply the Root Test. Let the formal quotient $\frac{1}{0}$ be $\infty$. For $x \ne a$, the limit superior $\limsup_{k \to \infty} \sqrt[k]{|c_k (x - a)^k|} = |x - a| \limsup_{k \to \infty} \sqrt[k]{|c_k|}$ is less than 1 iff $|x - a| < \frac{1}{\limsup_{k \to \infty} \sqrt[k]{|c_k|}}$ and it is greater than 1 iff $|x - a| > \frac{1}{\limsup_{k \to \infty} \sqrt[k]{|c_k|}}$. This proves the result. ■
Definition 10.13 Let $\sum_{k=0}^\infty c_k (x - a)^k$ be a power series. Then, with $\frac{1}{0} := \infty$ this once,
\[ R := \frac{1}{\limsup_{k \to \infty} \sqrt[k]{|c_k|}} \]
is called the radius of convergence of $\sum_{k=0}^\infty c_k (x - a)^k$.
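The formula for the radius of convergence can be explored numerically by replacing the limit superior with a supremum over a finite stretch of large indices. This is only a rough sketch under that assumption, since a finite computation cannot determine a limit superior; the function name `radius_estimate` and both coefficient sequences are hypothetical examples.

```python
import math


def radius_estimate(coeff, start, stop):
    """Estimate 1 / limsup |c_k|^(1/k) by the supremum over start <= k < stop."""
    roots = []
    for k in range(start, stop):
        c = abs(coeff(k))
        # |c_k|^(1/k) is computed through logarithms; zero coefficients contribute 0.
        roots.append(math.exp(math.log(c) / k) if c > 0 else 0.0)
    tail_sup = max(roots)
    # Mirror the convention 1/0 := infinity used for the radius of convergence.
    return math.inf if tail_sup == 0 else 1.0 / tail_sup


# Coefficients 1/k, for which the radius of convergence is 1.
print(radius_estimate(lambda k: 1 / k, 1000, 2000))
# Coefficients 3^k for even k and 0 for odd k, for which the radius is 1/3.
print(radius_estimate(lambda k: 3 ** k if k % 2 == 0 else 0, 1000, 2000))
```

The second example shows why the limit superior is used: the sequence of roots alternates between $0$ and $3$, so it has no limit, but its limit superior is $3$.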
Theorem 10.12 shows that power series define functions on intervals. These functions are limits of polynomials of increasing degree. To find out more about the analytic
properties of these functions we need to investigate sequences of functions, which is
done in Chapter 11. We conclude the present section with some remarks on the arithmetic of power series. Sums, differences and constant multiples of power series are
straightforward (see Exercise 10-14).
The multiplication of power series is similar to that of polynomials. We want to
multiply each term of the first series with each term of the second and then collect
terms with equal exponents. The product of a term with exponent i in the first power
series with a term with exponent k - i in the second power series gives a term with
exponent k for the product. We use this observation to first define a product for series.
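Theorem 10.14 The Cauchy Product of two series. Let $\sum_{k=0}^\infty a_k$ and $\sum_{k=0}^\infty b_k$ be absolutely convergent series and for each $k \ge 0$ define $p_k := \sum_{i=0}^k a_i b_{k-i}$. Then $\sum_{k=0}^\infty p_k$ converges absolutely and
\[ \sum_{k=0}^\infty p_k = \left( \sum_{k=0}^\infty a_k \right) \left( \sum_{k=0}^\infty b_k \right). \]
Proof. First note that
\[ \sum_{k=0}^n p_k = \sum_{k=0}^n \sum_{i=0}^k a_i b_{k-i} = \sum_{0 \le j + l \le n} a_j b_l = \sum_{k=0}^n \sum_{i=0}^{n-k} a_k b_i \]
for every $n \in \mathbb{N}$. To see that $\sum_{k=0}^\infty p_k$ converges absolutely, note that for all $n \in \mathbb{N}$ the inequalities
\[ \sum_{k=0}^n |p_k| \le \sum_{k=0}^n \sum_{i=0}^{n-k} |a_k| |b_i| \le \left( \sum_{k=0}^\infty |a_k| \right) \left( \sum_{k=0}^\infty |b_k| \right) \]
hold. Now let $\varepsilon > 0$ and assume without loss of generality that each series has at least one nonzero term. Then there is an even $N \in \mathbb{N}$ so that $\sum_{k=\frac{N}{2}}^\infty |a_k| < \frac{\varepsilon}{3 \sum_{k=0}^\infty |b_k|}$ and $\sum_{k=\frac{N}{2}}^\infty |b_k| < \frac{\varepsilon}{3 \sum_{k=0}^\infty |a_k|}$. Therefore for all $n \ge N$ we obtain
\begin{align*}
\left| \sum_{k=0}^n p_k - \sum_{k=0}^\infty a_k \sum_{k=0}^\infty b_k \right|
&= \left| \sum_{k=0}^n a_k \sum_{i=0}^{n-k} b_i - \sum_{k=0}^\infty a_k \sum_{i=0}^\infty b_i \right| \\
&\le \sum_{k=0}^n |a_k| \sum_{i=n-k+1}^\infty |b_i| + \sum_{k=n+1}^\infty |a_k| \sum_{i=0}^\infty |b_i| \\
&\le \sum_{0 \le k \le \frac{n}{2}} |a_k| \sum_{i=\frac{N}{2}}^\infty |b_i| + \sum_{\frac{n}{2} < k \le n} |a_k| \sum_{i=0}^\infty |b_i| + \sum_{k=n+1}^\infty |a_k| \sum_{i=0}^\infty |b_i| \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\end{align*}
Hence, $\sum_{k=0}^\infty p_k = \sum_{k=0}^\infty a_k \sum_{k=0}^\infty b_k$, which completes the proof. ■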
It can be shown that in Theorem 10.14 it is enough to demand that one series
converges (not necessarily absolutely) and the other converges absolutely. Because the
proof is much more involved, we demanded here that both series converge absolutely.
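The definition $p_k := \sum_{i=0}^k a_i b_{k-i}$ translates directly into a short computation. The sketch below assumes the geometric series with $a_k = b_k = \frac{1}{2^k}$ as an example (each sums to $2$), for which the Cauchy Product has terms $p_k = \frac{k+1}{2^k}$ and sum $4$. Exact rational arithmetic is used so that the comparison of coefficients is not blurred by rounding; the function name `cauchy_product` is hypothetical.

```python
from fractions import Fraction


def cauchy_product(a, b):
    """Terms p_k = sum_{i=0}^k a_i b_{k-i} for k < min(len(a), len(b))."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


# First 60 terms of the geometric series sum (1/2)^k, used for both factors.
a = [Fraction(1, 2 ** k) for k in range(60)]
b = list(a)
p = cauchy_product(a, b)
expected = [Fraction(k + 1, 2 ** k) for k in range(60)]
print(p == expected)
print(float(sum(a)) * float(sum(b)), float(sum(p)))
```

Only finitely many terms are compared, so this illustrates the statement of the theorem for one example rather than verifying it.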
Corollary 10.15 Let $\sum_{k=0}^\infty c_k (x - a)^k$ and $\sum_{k=0}^\infty d_k (x - a)^k$ be power series with radii of convergence $R_c$ and $R_d$, respectively. With $p_k := \sum_{i=0}^k c_i d_{k-i}$ for $k \ge 0$, the radius of convergence of $\sum_{k=0}^\infty p_k (x - a)^k$ is at least $R_p := \min\{R_c, R_d\}$ and for all $x$ with $|x - a| < R_p$ the product of the power series is the power series with coefficients $p_k$, that is,
\[ \sum_{k=0}^\infty p_k (x - a)^k = \left( \sum_{k=0}^\infty c_k (x - a)^k \right) \left( \sum_{k=0}^\infty d_k (x - a)^k \right). \]
Proof. Exercise 10-15. ■
Exercises
10-13. Prove that the radius of convergence of a power series $\sum_{k=0}^\infty a_k (x - a)^k$ is the largest $R \in [0, \infty]$ so that $\sum_{k=0}^\infty a_k (x - a)^k$ converges absolutely for all $x$ with $|x - a| < R$.
Note. This conceptual formulation is often more helpful than the formula in Definition 10.13.
10-14. Let $\sum_{k=0}^\infty c_k (x - a)^k$ and $\sum_{k=0}^\infty d_k (x - a)^k$ be power series with radii of convergence $R_c$ and $R_d$.
(a) Prove that the radius of convergence of $\sum_{k=0}^\infty (c_k + d_k)(x - a)^k$ is at least $\min\{R_c, R_d\}$ and that $\sum_{k=0}^\infty (c_k + d_k)(x - a)^k = \sum_{k=0}^\infty c_k (x - a)^k + \sum_{k=0}^\infty d_k (x - a)^k$ for all $x$ for which both series converge.
Hint. Use the result from Exercise 10-13.
(b) State and prove a result similar to part 10-14a for the difference of the two power series.
(c) Prove that if $b \in \mathbb{R} \setminus \{0\}$, then the radius of convergence of $\sum_{k=0}^\infty b c_k (x - a)^k$ is $R_c$ and for all $x$ for which $\sum_{k=0}^\infty c_k (x - a)^k$ converges, we have $\sum_{k=0}^\infty b c_k (x - a)^k = b \sum_{k=0}^\infty c_k (x - a)^k$.
10-15. Prove Corollary 10.15.
10-16. Consider the power series $\sum_{k=1}^\infty \frac{x^k}{k}$.
(a) Use the Ratio Test to show that the radius of convergence is 1.
(b) Use Theorem 10.12 to show that $\lim_{n \to \infty} \sqrt[n]{n} = 1$.
(c) Prove that the Root Test fails for $\sum_{j=1}^\infty \frac{1}{j}$ and $\sum_{j=1}^\infty \frac{1}{j^2}$.
10-17. Use the formula from Definition 10.13 to compute the radius of convergence of the given power
series. You may use the result from Exercise 10-16b.
Chapter 11
Sequences of Functions
In Section 10.3, we considered power series as series of numbers for fixed $x \in \mathbb{R}$.
Another way to look at them is as sequences of polynomials. This change in point of view is the first step toward function spaces, which are very important in analysis.
This chapter describes ways in which sequences of functions can converge. Formally,
a sequence of functions is a map from the natural numbers into a function space. Until
we encounter function spaces in Example 15.3, a sequence of functions will simply be,
as in Definition 9.17, a sequence whose "values" are functions, not numbers.
11.1 Notions of Convergence
The most obvious way in which a sequence of functions can converge is at every point.
Definition 11.1 Let $S$ be a set and for every $n \in \mathbb{N}$ let $f_n : S \to \mathbb{R}$ be a function. The sequence of functions $\{f_n\}_{n=1}^\infty$ is called pointwise convergent to the function $f : S \to \mathbb{R}$ iff for all $x \in S$ we have $\lim_{n \to \infty} f_n(x) = f(x)$.
Example 11.2 For $n \in \mathbb{N}$, let $f_n(x) := x^{\frac{1}{n}}$. On the set $(0, 1)$ the sequence $\{f_n\}_{n=1}^\infty$ converges pointwise to the constant function $f(x) = 1$.
By Exercise 2-46, for all $x \in (0, 1)$ the sequence $\left\{ x^{\frac{1}{n}} \right\}_{n=1}^\infty$ converges to 1. □
Example 11.3 By Definition 9.18 every real valued Lebesgue measurable function $f$ is the pointwise limit of a sequence of simple functions.
This is because if $\{s_n\}_{n=1}^\infty$ is a sequence of simple functions that converges pointwise to $f^+$ and $\{t_n\}_{n=1}^\infty$ is a sequence of simple functions that converges pointwise to $f^-$, then the sequence $\{s_n - t_n\}_{n=1}^\infty$ converges pointwise to $f = f^+ - f^-$. □
Pointwise convergence means the sequence of functions converges at every point,
but the “speed” of convergence can vary from point to point. Uniform convergence
requires that convergence happens at a uniform minimum speed (see Figure 23).
[Figure 23: the graph of $f$ together with the graphs of $f + \varepsilon$ and $f - \varepsilon$.]
Definition 11.4 Let $S$ be a set and for every $n \in \mathbb{N}$ let $f_n : S \to \mathbb{R}$ be a function. The sequence of functions $\{f_n\}_{n=1}^\infty$ is called uniformly convergent to the function $f : S \to \mathbb{R}$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in S$ we have $|f_n(x) - f(x)| < \varepsilon$.
If a sequence of functions only converges (pointwise or uniformly) on a part $T$ of the common domain $S$, we also say that the sequence converges (pointwise or uniformly) on $T$. It is easy to see that uniform convergence implies pointwise convergence (Exercise 11-1). The converse is false, as the next example shows.
Example 11.5 For $n \in \mathbb{N}$, let $f_n(x) := x^{\frac{1}{n}}$. Then for any $\delta > 0$ on the interval $(\delta, 1)$ the sequence $\{f_n\}_{n=1}^\infty$ converges uniformly to the constant function $f(x) = 1$. However, on the set $(0, 1)$ the sequence $\{f_n\}_{n=1}^\infty$ does not converge uniformly to the constant function $f(x) = 1$.
For the first claim, let $\delta \in (0, 1)$. Then for all points $x \in (\delta, 1)$ we have that $\delta^{\frac{1}{n}} < x^{\frac{1}{n}} < 1$. If $\varepsilon > 0$ is given, there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $1 - \varepsilon < \delta^{\frac{1}{n}} < 1$. Consequently, for all points $x \in (\delta, 1)$ and all $n \ge N$ we infer $1 - \varepsilon < \delta^{\frac{1}{n}} < x^{\frac{1}{n}} < 1$. Therefore on the interval $(\delta, 1)$ the sequence $\{f_n\}_{n=1}^\infty$ converges uniformly to the constant function $f(x) = 1$.
To prove that the sequence $\{f_n\}_{n=1}^\infty$ does not converge uniformly to $f(x) = 1$ on $(0, 1)$, suppose for a contradiction that it does. Then for every $\varepsilon \in (0, 1)$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in (0, 1)$ we have $1 - \varepsilon < x^{\frac{1}{n}} < 1$. But $y := (1 - \varepsilon)^N$ is in $(0, 1)$ and $f_N\left( (1 - \varepsilon)^N \right) = \left( (1 - \varepsilon)^N \right)^{\frac{1}{N}} = 1 - \varepsilon \not> 1 - \varepsilon$, a contradiction. Therefore, on the set $(0, 1)$ the sequence $\{f_n\}_{n=1}^\infty$ does not converge uniformly to the constant function $f(x) = 1$. □
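The difference between the two intervals in Example 11.5 can be seen in a small computation. The sketch below assumes the sample values $\delta = 0.1$ and $\varepsilon = 0.5$, which are chosen only for illustration. On $(\delta, 1)$ the supremum of $\left| x^{\frac{1}{n}} - 1 \right|$ is $1 - \delta^{\frac{1}{n}}$ because $x^{\frac{1}{n}}$ is increasing in $x$, and this tends to zero. On $(0, 1)$ the point $y = (1 - \varepsilon)^n$ from the proof always produces the gap $\varepsilon$, no matter how large $n$ is.

```python
def f(n, x):
    """The function f_n(x) = x^(1/n) from Example 11.5."""
    return x ** (1.0 / n)


delta, eps = 0.1, 0.5
for n in (1, 10, 100, 1000):
    sup_on_delta_interval = 1.0 - f(n, delta)  # supremum of |f_n - 1| on (delta, 1)
    witness = (1.0 - eps) ** n                 # the point y from the proof
    gap_at_witness = 1.0 - f(n, witness)       # equals eps up to rounding
    print(n, sup_on_delta_interval, gap_at_witness)
```

The first column of gaps shrinks with $n$, while the second stays at $\varepsilon$, which is exactly the failure of a uniform minimum speed of convergence on $(0, 1)$.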
Power series are important examples of uniformly convergent series.
Theorem 11.6 Let $\sum_{k=0}^\infty c_k (x - a)^k$ be a power series with radius of convergence $R$. Then $\sum_{k=0}^\infty c_k (x - a)^k$ converges uniformly on any closed subinterval of $(a - R, a + R)$.
Proof. Any closed subinterval of $(a - R, a + R)$ is contained in a symmetric closed subinterval $[a - r, a + r]$ with $0 < r < R$. Thus we are done if we can prove the result for symmetric closed subintervals $[a - r, a + r] \subseteq (a - R, a + R)$. Let $r \in (0, R)$. For all $x \in [a - r, a + r]$ and all $k \in \mathbb{N} \cup \{0\}$, we have that $|c_k (x - a)^k| \le |c_k| r^k$. For each $x \in [a - r, a + r]$, let $P(x) := \sum_{k=0}^\infty c_k (x - a)^k$ be the limit of the power series. By Theorem 10.12, $\sum_{k=0}^\infty c_k ((a + r) - a)^k$ converges absolutely. Thus for any $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have $\sum_{k=n}^\infty |c_k| r^k < \varepsilon$. Therefore for all $x \in [a - r, a + r]$ and all $n \ge N$ we obtain
\[ \left| P(x) - \sum_{k=0}^n c_k (x - a)^k \right| = \left| \sum_{k=n+1}^\infty c_k (x - a)^k \right| \le \sum_{k=n+1}^\infty |c_k| r^k < \varepsilon, \]
and hence the power series converges uniformly on $[a - r, a + r]$. ■
More notions of convergence for sequences of functions are discussed in Section 14.6.
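The key estimate behind uniform convergence of power series is that the error of the $n$-th partial sum on $[a - r, a + r]$ is bounded by the tail $\sum_{k=n+1}^\infty |c_k| r^k$, which does not depend on $x$. The sketch below illustrates this for the geometric power series $\sum_{k=0}^\infty x^k$ (so $a = 0$, $c_k = 1$, $R = 1$), whose limit $\frac{1}{1-x}$ and tail $\frac{r^{n+1}}{1-r}$ are known in closed form. The choice of series, the value $r = 0.5$ and the function names are assumptions made for the illustration.

```python
def partial_sum(x, n):
    """sum_{k=0}^n x^k, a partial sum of the geometric power series."""
    return sum(x ** k for k in range(n + 1))


def max_error_on_grid(r, n, points=201):
    """Largest |1/(1-x) - partial_sum(x, n)| over an equally spaced grid in [-r, r]."""
    grid = [-r + 2.0 * r * i / (points - 1) for i in range(points)]
    return max(abs(1.0 / (1.0 - x) - partial_sum(x, n)) for x in grid)


def tail_bound(r, n):
    """sum_{k=n+1}^infinity r^k = r^(n+1) / (1 - r), the bound independent of x."""
    return r ** (n + 1) / (1.0 - r)


r = 0.5
for n in (5, 10, 20):
    print(n, max_error_on_grid(r, n), tail_bound(r, n))
```

For this series the largest error occurs at the right endpoint $x = r$, where it coincides with the tail bound up to rounding.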
Exercises
11-1. Prove that every uniformly convergent sequence of functions is pointwise convergent.
11-2. In Example 11.5, consider the proof that the $f_n(x) = x^{\frac{1}{n}}$ do not converge uniformly to the constant function $f(x) = 1$ on $(0, 1)$. Could we have used $\varepsilon = \frac{1}{2}$ in this part of the proof? Explain.
11-3. Prove that for each $\delta > 0$ the sequence $\{f_n\}_{n=1}^\infty$ with $f_n(x) = x^n$ converges uniformly on $(0, 1 - \delta)$, but it does not converge uniformly on $(0, 1)$.
11-4. Prove that a bounded function $f : \mathbb{R} \to \mathbb{R}$ is Lebesgue measurable iff it is a uniform limit of simple functions. Hint. Part “5=+1” of Theorem 9.19.
11-5. Give an example of a sequence $\{f_n\}_{n=1}^\infty$ of bounded functions on an interval $I$ that converges pointwise to an unbounded function $f : I \to \mathbb{R}$.
11-6. A sequence of functions $\{f_n\}_{n=1}^\infty$ defined on a set $S$ is called pointwise Cauchy on the set $S$ iff for all $x \in S$ the sequence $\{f_n(x)\}_{n=1}^\infty$ is a Cauchy sequence. Prove that every pointwise Cauchy sequence of functions is pointwise convergent to a function $f : S \to \mathbb{R}$.
11-7. Approximating continuous functions with differentiable functions. Let $[a, b]$ be an interval.
(a) For continuous $f : [a, b] \to \mathbb{R}$ and all $x \in \mathbb{R}$, let
\[ \tilde{f}(x) := \begin{cases} f(a); & \text{if } x < a, \\ f(x); & \text{if } x \in [a, b], \\ f(b); & \text{if } x > b. \end{cases} \]
For $n \in \mathbb{N}$ and all $x \in \mathbb{R}$, let $f_n(x) := n \int_x^{x + \frac{1}{n}} \tilde{f}(z) \, dz$. Prove that $f_n$ is differentiable on $\mathbb{R}$.
(b) Prove that for every continuous function $f$ on $[a, b]$ there is a sequence of functions $\{f_n\}_{n=1}^\infty$ that converges uniformly to $f$ and so that every function in the sequence is continuous on $[a, b]$ and differentiable on $(a, b)$.
11-8. A sequence of functions $\{f_n\}_{n=1}^\infty$ defined on a set $S$ is called uniformly Cauchy on the set $S$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n, m \ge N$ and all $x \in S$ we have $|f_n(x) - f_m(x)| < \varepsilon$. Prove that every uniformly Cauchy sequence of functions is uniformly convergent to a function $f : S \to \mathbb{R}$.
11-9. For each $j \in \mathbb{N}$, let $f_j : [a, b] \to \mathbb{R}$ be so that $a_j := \sup\{|f_j(x)| : x \in [a, b]\} < \infty$. Prove that if $\sum_{j=1}^\infty a_j$ converges, then $\sum_{j=1}^\infty f_j$ converges absolutely at every $x \in [a, b]$ and uniformly on $[a, b]$.
11-10. Let $f : (a, b) \to \mathbb{R}$ be continuous, let $[c, d] \subset (a, b)$ be a closed subinterval and let $N \in \mathbb{N}$ be so that $\frac{1}{N} < b - d$. Prove that $\{g_n\}_{n=N}^\infty$ defined by $g_n(x) := f\left( x + \frac{1}{n} \right)$ converges uniformly to $f$ on $[c, d]$.
Hint. Uniform continuity.
11-11. Let $f : (a, b) \to \mathbb{R}$ be continuously differentiable, let $[c, d] \subset (a, b)$ be a closed subinterval and let $N \in \mathbb{N}$ be so that $\frac{1}{N} < b - d$. Prove that the sequence $\{g_n\}_{n=N}^\infty$ defined by $g_n(x) := \frac{f\left( x + \frac{1}{n} \right) - f(x)}{\frac{1}{n}}$ converges uniformly to $f'$ on $[c, d]$. Hint. Mean Value Theorem.
11.2 Uniform Convergence
If the function $f$ is the (pointwise or uniform) limit of a sequence of functions $\{f_n\}_{n=1}^\infty$,
it is natural to ask which properties of the functions in the sequence are inherited by
the limit f . It turns out that pointwise convergence preserves neither continuity (see
Exercise 11-12) nor Riemann integrability (see Exercise 11-13). Uniform convergence
on the other hand preserves continuity (see Theorem 11.7), as well as Riemann integrability (see Theorem 11.9).
Theorem 11.7 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions on $[a,b]$ that converges uniformly to $f : [a,b] \to \mathbb{R}$ and let all $f_n$ be continuous at $x \in [a,b]$. Then $f$ is continuous at $x$.
Proof. Let $\varepsilon > 0$. Because $\{f_n\}_{n=1}^{\infty}$ converges uniformly to $f$ on $[a,b]$, there is an $N \in \mathbb{N}$ so that for all $z \in [a,b]$ we have $|f_N(z) - f(z)| < \frac{\varepsilon}{3}$. Moreover, because $f_N$ is continuous at $x$, there is a $\delta > 0$ so that for all $z \in [a,b]$ with $|z - x| < \delta$ we have $|f_N(z) - f_N(x)| < \frac{\varepsilon}{3}$. Then for all $z \in [a,b]$ with $|z - x| < \delta$ we conclude
$$|f(z) - f(x)| \le |f(z) - f_N(z)| + |f_N(z) - f_N(x)| + |f_N(x) - f(x)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,$$
and hence $f$ is continuous at $x$. $\blacksquare$
Corollary 11.8 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of functions on $[a,b]$ that converges uniformly to $f : [a,b] \to \mathbb{R}$ and let all $f_n$ be continuous on $[a,b]$. Then $f$ is continuous on $[a,b]$.
Proof. Apply Theorem 11.7 at every point in $[a,b]$. $\blacksquare$
Theorem 11.9 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of Riemann integrable functions on the interval $[a,b]$ that converges uniformly to the function $f : [a,b] \to \mathbb{R}$. Then $f$ is Riemann integrable and
$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx.$$
Proof. For each $n \in \mathbb{N}$, the function $f_n$ is Riemann integrable. Therefore by Theorem 8.12 each set $B_n := \{x \in [a,b] : \omega_{f_n}(x) > 0\}$ is a null set. Hence, by countable subadditivity of outer Lebesgue measure (part 3 of Theorem 8.6), the set $B := \bigcup_{n=1}^{\infty} B_n$ is a null set. Now for every $x \in [a,b] \setminus B$ all functions $f_n$ are continuous at $x$, and by Theorem 11.7 we infer that $f$ is continuous at $x$. Because $B$ is a null set, by Theorem 8.12 $f$ is Riemann integrable.
For the integrals, let $\varepsilon > 0$ and find an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in [a,b]$ we have $|f_n(x) - f(x)| < \frac{\varepsilon}{b-a}$. Then for all $n \ge N$ we obtain
$$\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right| \le \int_a^b |f_n(x) - f(x)|\,dx \le \int_a^b \frac{\varepsilon}{b-a}\,dx = \varepsilon,$$
which establishes the equality. $\blacksquare$
Lebesgue’s integrability criterion for Riemann integrals reveals that being Riemann
integrable is a substantially weaker property than continuity. It would be nice if this
weaker property would be preserved by a weaker notion of convergence, such as pointwise convergence. However, Exercise 11-13 shows that the pointwise limit of Riemann
integrable functions need not be Riemann integrable. This is another situation in which
the Lebesgue integral has an advantage over the Riemann integral. For pointwise limits
of (Lebesgue) integrable functions, please consider Section 14.5.
With continuity and integrability investigated, we turn to differentiability. For sequences of differentiable functions, uniform convergence does not guarantee the differentiability of the limit.
Example 11.10 Uniform limits of differentiable functions need not be differentiable.
For every $n \in \mathbb{N}$, the function $f_n(x) := x^{\frac{2n+2}{2n+1}}$ is differentiable on $(-1,1)$, but the uniform limit of $\{f_n\}_{n=1}^{\infty}$ is $f(x) = |x|$, which is not differentiable (also see Figure 24).
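It is clear that the $f_n$ are all differentiable. To see the uniform convergence to the absolute value function, note that for all $x \in (-1,1)$ and all $n \in \mathbb{N}$ we have
$$|f_n(x) - f(x)| = |x| \cdot \left| |x|^{\frac{1}{2n+1}} - 1 \right|,$$
where $|x|^{\frac{1}{2n+1}} \le 1$ and $|x| < 1$. Moreover, $\lim_{n\to\infty} r^{\frac{1}{2n+1}} = 1$ for every positive real number $r$. Let $\varepsilon > 0$ and let $N \in \mathbb{N}$ be such that for all $n \ge N$ we have $\left| \varepsilon^{\frac{1}{2n+1}} - 1 \right| < \varepsilon$. Then for all $x \in (-1,1) \cap (-\varepsilon, \varepsilon)$ and all $n \ge N$ we obtain
$$|f_n(x) - f(x)| = |x| \cdot \left| |x|^{\frac{1}{2n+1}} - 1 \right| < \varepsilon \cdot 1 = \varepsilon,$$
while for all $x \in (-1,1) \setminus (-\varepsilon, \varepsilon)$ and all $n \ge N$ we obtain
$$|f_n(x) - f(x)| = |x| \cdot \left( 1 - |x|^{\frac{1}{2n+1}} \right) \le 1 \cdot \left( 1 - \varepsilon^{\frac{1}{2n+1}} \right) < \varepsilon.$$
We have proved that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and for all $x \in (-1,1)$ we have $|f_n(x) - f(x)| < \varepsilon$, which means that the sequence $\{f_n\}_{n=1}^{\infty}$ converges uniformly on $(-1,1)$ to $f(x) = |x|$. $\square$

Figure 24: A sequence of differentiable functions that converges uniformly to $|x|$. For an indication how pathological the situation can become, consider that by Exercise 11-7 every continuous function is the uniform limit of differentiable functions, while by Exercise 11-21 there are continuous functions that are not differentiable at any $x \in \mathbb{R}$.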
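The uniform convergence in Example 11.10 can be checked numerically. The sketch below evaluates $f_n(x) = |x|^{(2n+2)/(2n+1)}$, which agrees with $x^{(2n+2)/(2n+1)}$ because the numerator of the exponent is even and the denominator is odd, on an evenly spaced grid in $[-1,1]$ and records the largest distance to $|x|$. The grid size and the helper names are assumptions made for illustration; a grid maximum only approximates the supremum.

```python
def f_n(n, x):
    """f_n(x) = |x|**((2n+2)/(2n+1)), the functions of Example 11.10."""
    exponent = (2 * n + 2) / (2 * n + 1)
    return abs(x) ** exponent


def sup_gap(n, grid_points=20001):
    """Largest value of |f_n(x) - |x|| over an evenly spaced grid in [-1, 1]."""
    worst = 0.0
    for i in range(grid_points):
        x = -1.0 + 2.0 * i / (grid_points - 1)
        worst = max(worst, abs(f_n(n, x) - abs(x)))
    return worst


# The grid maxima shrink as n grows, as uniform convergence requires.
gaps = [sup_gap(n) for n in (1, 10, 100)]
```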
To obtain a differentiable limit function, we require continuous derivatives and we
impose a uniform convergence criterion on the derivatives.
Theorem 11.11 Let $\{f_n\}_{n=1}^{\infty}$ be a sequence of differentiable functions on $(a,b)$ so that for all $n \in \mathbb{N}$ the derivative $f_n'$ is continuous on $(a,b)$. If $\{f_n\}_{n=1}^{\infty}$ converges pointwise to $f : (a,b) \to \mathbb{R}$ and the sequence $\{f_n'\}_{n=1}^{\infty}$ converges uniformly to the function $g : (a,b) \to \mathbb{R}$, then $f$ is differentiable and $f' = g$.
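Proof. By Theorem 11.7, the function $g$ is continuous, because (on every closed subinterval of $(a,b)$) it is the uniform limit of continuous functions. To show that $f' = g$, let $x \in (a,b)$ and let $\varepsilon > 0$. Let $\delta > 0$ be such that for all $y \in (a,b)$ with $|y - x| < \delta$ we have $|g(y) - g(x)| < \frac{\varepsilon}{3}$. Let $z \in (a,b) \setminus \{x\}$ with $|z - x| < \delta$ be fixed. Because $\{f_n'\}_{n=1}^{\infty}$ converges uniformly to $g$, there is an $N_g \in \mathbb{N}$ so that for all $n \ge N_g$ and for all $y \in (a,b)$ we have $|f_n'(y) - g(y)| < \frac{\varepsilon}{3}$. Because $\{f_n\}_{n=1}^{\infty}$ converges pointwise to $f$, there is an $N_f \in \mathbb{N}$ so that for all $n \ge N_f$ the inequality
$$\left| \frac{f(z) - f(x)}{z - x} - \frac{f_n(z) - f_n(x)}{z - x} \right| < \frac{\varepsilon}{3}$$
holds. Let $n \ge \max\{N_f, N_g\}$. By the Mean Value Theorem there is a $c$ between $z$ and $x$ so that $\frac{f_n(z) - f_n(x)}{z - x} = f_n'(c)$. Therefore we conclude
$$\left| \frac{f(z) - f(x)}{z - x} - g(x) \right| \le \left| \frac{f(z) - f(x)}{z - x} - \frac{f_n(z) - f_n(x)}{z - x} \right| + \left| \frac{f_n(z) - f_n(x)}{z - x} - f_n'(c) \right| + |f_n'(c) - g(c)| + |g(c) - g(x)| < \frac{\varepsilon}{3} + 0 + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
Hence, for all $z \in (a,b) \setminus \{x\}$ with $|z - x| < \delta$ we have $\left| \frac{f(z) - f(x)}{z - x} - g(x) \right| < \varepsilon$. Hence, $f$ is differentiable at $x$ and $f'(x) = g(x)$. $\blacksquare$

Corollary 11.12 Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ be a power series with radius of convergence $R = \dfrac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}} > 0$. Then $\sum_{k=0}^{\infty} c_k (x-a)^k$ is differentiable on $(a-R, a+R)$ and the derivative is $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$.

Proof. First, note that (by Exercises 10-6 and 10-16b)
$$\limsup_{k\to\infty} \sqrt[k]{|k c_k|} = \lim_{k\to\infty} \sqrt[k]{k} \cdot \limsup_{k\to\infty} \sqrt[k]{|c_k|} = \limsup_{k\to\infty} \sqrt[k]{|c_k|},$$
which means that the termwise derivative $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$ has the same radius of convergence as $\sum_{k=0}^{\infty} c_k (x-a)^k$. Let $r \in (0,R)$. Then for every $n \in \mathbb{N}$ and all $x \in (a-r, a+r)$ we have
$$\frac{d}{dx} \sum_{k=0}^{n} c_k (x-a)^k = \sum_{k=1}^{n} k c_k (x-a)^{k-1}$$
and the derivative is continuous on $(a-r, a+r)$. Moreover, $\left\{ \sum_{k=0}^{n} c_k (x-a)^k \right\}_{n=1}^{\infty}$ converges pointwise to $\sum_{k=0}^{\infty} c_k (x-a)^k$ on $(a-r, a+r)$ and the sequence of the derivatives $\left\{ \sum_{k=1}^{n} k c_k (x-a)^{k-1} \right\}_{n=1}^{\infty}$ converges uniformly to $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$ on $(a-r, a+r)$. Therefore by Theorem 11.11 the function $\sum_{k=0}^{\infty} c_k (x-a)^k$ is differentiable on the interval $(a-r, a+r)$ and the derivative is $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$. Because $r \in (0,R)$ was arbitrary, $\sum_{k=0}^{\infty} c_k (x-a)^k$ is differentiable at every point $x \in (a-R, a+R)$ and the derivative at $x$ is $\sum_{k=1}^{\infty} k c_k (x-a)^{k-1}$. $\blacksquare$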
In particular, Corollary 11.12 shows that the derivative of a power series is again
a power series. Therefore Corollary 11.12 can be applied to derivatives and higher
derivatives of power series and we obtain the following.
Example 11.13 Any power series $f(x) = \sum_{k=0}^{\infty} c_k (x-a)^k$ with radius of convergence $R > 0$ is infinitely differentiable on the interval $(a-R, a+R)$. Moreover, the $k$th derivative at $x = a$ is $f^{(k)}(a) = k!\, c_k$. $\square$
In light of Example 11.13, it is natural to ask if every infinitely differentiable function is a power series. Lemma 18.8 will show that this is not the case.
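Termwise differentiation acts on the coefficient list of a power series in a purely mechanical way, which the following sketch makes concrete. It stores a truncated series as a list of exact fractions (a representation chosen here for illustration; the function names are hypothetical) and checks two consequences of Corollary 11.12 and Example 11.13: the coefficients $\frac{1}{k!}$ reproduce themselves under differentiation, and the $k$th derivative at the center equals $k!\,c_k$.

```python
from fractions import Fraction
from math import factorial


def differentiate(coeffs):
    """Coefficients of the termwise derivative: [1*c_1, 2*c_2, 3*c_3, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def kth_derivative_at_center(coeffs, k):
    """Differentiate termwise k times; the constant term is the value at x = a."""
    for _ in range(k):
        coeffs = differentiate(coeffs)
    return coeffs[0]


# Truncations of sum x^k / k! and of the geometric series sum x^k.
exp_coeffs = [Fraction(1, factorial(k)) for k in range(10)]
geometric = [Fraction(1)] * 10

derivative_of_exp = differentiate(exp_coeffs)
third_derivative = kth_derivative_at_center(geometric, 3)
```

Because the lists are truncations, differentiating shortens them by one entry; the comparison with the original series is therefore made on the surviving coefficients only.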
Exercises
11-12. Let n
E
1
for 0 5 x 5 1 - -,
W. Prove that f n ( x ) :=
1
for 1 5 x 5 2,
0; f o r O i x i 1 , is not continuous on
defines a continuous function on [O. 21 and that f ( x ) =
1; for 1 5 x 5 2,
1:
converges pointwise to f.
[0, 21. Then prove that
11-13. Let $\{q_j\}_{j=1}^{\infty}$ be an enumeration of all rational numbers in $[0,1]$ and let $n \in \mathbb{N}$. Prove that each function
$$f_n(x) := \begin{cases} 0; & \text{for } x \in [0,1] \setminus \{q_1, \ldots, q_n\}, \\ 1; & \text{for } x \in \{q_1, \ldots, q_n\}, \end{cases}$$
is Riemann integrable on $[0,1]$. Then prove that the sequence $\{f_n\}_{n=1}^{\infty}$ converges pointwise to the Dirichlet function
$$f(x) = \begin{cases} 0; & \text{for } x \in [0,1] \setminus \mathbb{Q}, \\ 1; & \text{for } x \in [0,1] \cap \mathbb{Q}, \end{cases}$$
which is not Riemann integrable on $[0,1]$.
11-14. Prove Theorem 11.11 as follows. Fix $x_0 \in (a,b)$. Prove that $g$ is Riemann integrable from $x_0$ to any $x \in (a,b)$. Then use the Fundamental Theorem of Calculus to prove that for all $x \in (a,b)$ we have $\int_{x_0}^{x} g(t)\,dt = \lim_{n\to\infty} \left( f_n(x) - f_n(x_0) \right)$. From this finding, conclude that $f' = g$.
11-15. Explain why in the proof of Theorem 11.11 the hypothesis that all $f_n'$ are continuous can be replaced with the hypothesis that $g$ is continuous.
11-16. A sequence $\{f_n\}_{n=1}^{\infty}$ of functions $f_n : [a,b] \to \mathbb{R}$ is called equicontinuous at $x$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $n \in \mathbb{N}$ and all $z \in [a,b]$ with $|z - x| < \delta$ we have $|f_n(z) - f_n(x)| < \varepsilon$.
(a) Prove that if $\{f_n\}_{n=1}^{\infty}$ is equicontinuous at $x$ and converges pointwise to $f : [a,b] \to \mathbb{R}$, then $f$ is continuous at $x$.
(b) Prove that if $\{f_n\}_{n=1}^{\infty}$ is a sequence of continuous functions that converges uniformly to the continuous function $f$, then $\{f_n\}_{n=1}^{\infty}$ is equicontinuous at every $x \in [a,b]$.
11-17. Give an example of a sequence of functions $\{f_n\}_{n=1}^{\infty}$ on $[0,1]$ that converges pointwise on $[0,1]$ to a Riemann integrable function $f : [0,1] \to \mathbb{R}$ and $\int_0^1 f(x)\,dx \ne \lim_{n\to\infty} \int_0^1 f_n(x)\,dx$.
11-18. Prove that if $\{f_n\}_{n=1}^{\infty}$ is a sequence of bounded functions on an interval $I$ that converges uniformly to the function $f : I \to \mathbb{R}$, then $f$ is bounded also.
11-19. Let $\sum_{k=0}^{\infty} c_k (x-a)^k$ and $\sum_{k=0}^{\infty} d_k (x-a)^k$ be two power series with positive radius of convergence and let $\varepsilon > 0$ be so that $\sum_{k=0}^{\infty} c_k (x-a)^k = \sum_{k=0}^{\infty} d_k (x-a)^k$ for all $x \in \mathbb{R}$ with $|x - a| < \varepsilon$. Prove that then for all $k \in \mathbb{N}$ we have $c_k = d_k$. Hint. Derivatives at $a$.
11-20. Fubini's Differentiation Theorem states that if $\{f_n\}_{n=1}^{\infty}$ is a sequence of nondecreasing functions on $[a,b]$ so that $f_n \le f_{n+1}$ for all $n \in \mathbb{N}$, $f_{n+1} - f_n$ is nondecreasing for all $n \in \mathbb{N}$ and so that $\{f_n\}_{n=1}^{\infty}$ converges pointwise to $f : [a,b] \to \mathbb{R}$, then $f'(x) = \lim_{n\to\infty} f_n'(x)$ for almost all $x \in [a,b]$.
We will prove this result using the steps below.
(a) Prove that $f$ is nondecreasing.
(b) Prove that there is a set $A \subseteq [a,b]$ so that $\lambda([a,b] \setminus A) = 0$ and all the derivatives involved exist at every $x \in A$. Conclude that we only need to prove the equality for almost all $x \in A$.
Hint. Lebesgue's Differentiation Theorem (Exercise 10-7).
(c) Prove that for all $n \in \mathbb{N}$ and all $x \in A$ we have $f_n'(x) \le f_{n+1}'(x)$.
Hint. Consider that $f_{n+1} = f_n + (f_{n+1} - f_n)$.
(d) Prove that for all $n \in \mathbb{N}$ the function $f - f_n$ is nondecreasing.
(e) Prove that for all $n \in \mathbb{N}$ and all $x \in A$ we have $f_n'(x) \le f'(x)$.
(f) Construct a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ so that $\sum_{k=1}^{\infty} [f(b) - f_{n_k}(b)]$ converges and prove that $\sum_{k=1}^{\infty} [f(x) - f_{n_k}(x)]$ converges for all $x \in [a,b]$.
(g) Apply what we have proved so far to $g(x) := \sum_{k=1}^{\infty} [f(x) - f_{n_k}(x)]$ to prove that the series $\sum_{k=1}^{\infty} [f'(x) - f_{n_k}'(x)]$ converges at almost every $x \in A$ with limit less than or equal to $g'(x)$. Conclude that $\{f'(x) - f_{n_k}'(x)\}_{k=1}^{\infty}$ converges to zero at almost every $x \in A$.
(h) Prove that $\lim_{n\to\infty} f_n'(x) = f'(x)$ at almost every $x \in A$, which establishes the result.
x;
1 1-21. A continuous nowhere differentiable function. Let r (x) :=
and let s ( x ) := t ( x )
c13
+
(t(x-j )
+ t ( x + j ) ).
E A.
For all n
1-x;
0;
E
1
forO<x<-,
2
1
for-<x<l,
2forx # (0, l),
1
(1000"~)
100"
N let f n ( x ) := --s
i=l
m
fn.
and define f :=
n=l
Prove that f is continuous because it is the uniform limit of continuous functions.
1
so that
Prove that for every x E R and n E N there is a z E R with /z- x / =
4 ' 1000"
-f(x) >
and conclude that f is not differentiable at any x E W.
1
I
1
z-x
~
Hint.The nth sawtooth function f n is made up of straight line segments with slopes 1 1 0 " on
1
intervals of length __
2 ' 1000" '
Prove that for every continuous f : [ a .b] + R that is differentiable on ( a , b ) there is a
of continuous functions that are not differentiable at any point in ( a , b ) ,
sequence
. v J l i n ~ l . L ~ ~ ~ ~ ~ ~ . ~ i f fn r. m l i q ' n ,
(fn)F=,
11-22. By Exercise 4-17, if the derivative of the function f : ( a , b ) -+ R is zero for all x E ( a , b ) ,then f is
constant. This exercise shows that if the derivative is only zero almost everywhere, then f need not
be constant even if it is continuous. The function @ below is called Lebesgue's singular function.
{ i]
co
Let Q =
, let C Q be the ternary Cantor set and use the notation of Definition 7 . 2 2 .
n=l
L e t n ~ N . P r o v e t h a t f o r a l l i ~. {. l. . . 2 " ) w w e h a ~ e ~1Z ~ ~ = ~
Let n
E
W. For every x
{ 1 , . . . , 2"}
E
[O. 11 \ 114" let ix
so that x is an upper bound of
E
I2 and
{ 1, . , 2") be the largest number in
x # I s . For x E I f , , let i, := 0.
,
,
Define
Prove that $,, : [0, 11 + [0, 11 is continuous, surjective and @,
[6
Il?]
= [0, 11
i=l
Prove that [$n),"=l
[ I
converges uniformly to a continuous, nondecreasing function $
Prove that @ C Q = [O. 11 and that $ is constant on every subinterval of [O, 11 \ C Q ,which
means tlr' = 0 a.e.
Chapter 12
Transcendental Functions
In this chapter, we use the machinery built so far to define the transcendental functions
and to investigate their properties. It would have been possible to define the transcendental functions earlier, but the proofs would have been harder.
12.1 The Exponential Function
The question “Is there a nonzero function that is equal to its own derivative?” can be
tackled with Corollary 11.12. If there is a power series $\sum_{k=0}^{\infty} c_k x^k$ with this property, then for all $k \in \mathbb{N}$ we infer $k c_k = c_{k-1}$. That is, $c_k = \frac{1}{k} c_{k-1} = \cdots = \frac{1}{k!} c_0$. The most natural nonzero value for $c_0$ is 1. Thus we are interested in the power series $\sum_{k=0}^{\infty} \frac{x^k}{k!}$.
Definition 12.1 For $x \in \mathbb{R}$, we define $\exp(x) := \sum_{k=0}^{\infty} \frac{x^k}{k!}$.
Theorem 12.2 The series that defines $\exp(\cdot)$ has infinite radius of convergence. Therefore $\exp(\cdot)$ is differentiable on $\mathbb{R}$ with $\frac{d}{dx} \exp(x) = \exp(x)$ for all $x \in \mathbb{R}$.
Proof. Because the last
convergence
terms of k ! exceed
, we obtain for the radius of
which proves that the radius of convergence is infinite. The differentiability of $\exp(\cdot)$ on $\mathbb{R}$ follows from Corollary 11.12. To see that $\frac{d}{dx} \exp(x) = \exp(x)$ we also use Corollary 11.12 and obtain the following.
$$\frac{d}{dx} \exp(x) = \sum_{k=1}^{\infty} k \frac{x^{k-1}}{k!} = \sum_{k=1}^{\infty} \frac{x^{k-1}}{(k-1)!} = \sum_{n=0}^{\infty} \frac{x^n}{n!} = \exp(x).$$
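Because the series for $\exp(\cdot)$ converges for every real $x$, its partial sums can be evaluated directly. The sketch below is a minimal illustration: the function name and the default number of terms are assumptions, and the comparison against the library function `math.exp` is only a sanity check, not part of the theory.

```python
import math


def exp_partial_sum(x, terms=40):
    """Sum of x**k / k! for k = 0, ..., terms - 1, built up term by term."""
    total = 0.0
    term = 1.0  # the k = 0 term x**0 / 0!
    for k in range(terms):
        total += term
        term *= x / (k + 1)  # turns x**k / k! into x**(k+1) / (k+1)!
    return total


approx_e = exp_partial_sum(1.0)
approx_small = exp_partial_sum(-2.0)
```

Updating each term from the previous one avoids computing large factorials and large powers separately.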
Aside from being equal to its own derivative, the function exp(.) has interesting
algebraic properties.
Theorem 12.3 For all $x, y \in \mathbb{R}$, we have $\exp(x) \exp(y) = \exp(x + y)$.
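Proof. Let $x, y \in \mathbb{R}$. Then via Theorem 10.14 (or Corollary 10.15) and the Binomial Theorem we obtain
$$\exp(x)\exp(y) = \left( \sum_{j=0}^{\infty} \frac{x^j}{j!} \right) \left( \sum_{k=0}^{\infty} \frac{y^k}{k!} \right) = \sum_{n=0}^{\infty} \sum_{j=0}^{n} \frac{x^j}{j!} \frac{y^{n-j}}{(n-j)!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{j=0}^{n} \binom{n}{j} x^j y^{n-j} = \sum_{n=0}^{\infty} \frac{(x+y)^n}{n!} = \exp(x+y). \qquad \blacksquare$$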
Theorem 12.4 For all rational numbers $r$, we have $\exp(r) = (\exp(1))^r$.
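Proof. With an easy induction it can be proved that for all $n \in \mathbb{N}$ and $x \in \mathbb{R}$ we have $\exp(nx) = (\exp(x))^n$ (Exercise 12-1). It is also easy to see that $\exp(0) = 1$. Moreover, for all $x \in \mathbb{R}$ we have $\exp(x)\exp(-x) = \exp(x - x) = \exp(0) = 1$, which means that $\exp(-x) = \frac{1}{\exp(x)}$. In particular, for all $n \in \mathbb{Z}$ we infer $\exp(nx) = (\exp(x))^n$. Hence, for all rational numbers $\frac{p}{q}$ with $p \in \mathbb{Z}$ and $q \in \mathbb{N}$ we obtain
$$\exp\left( \frac{p}{q} \right) = \left[ \left( \exp\left( \frac{p}{q} \right) \right)^q \right]^{\frac{1}{q}} = \left[ \exp(p) \right]^{\frac{1}{q}} = \left[ (\exp(1))^p \right]^{\frac{1}{q}} = (\exp(1))^{\frac{p}{q}}. \qquad \blacksquare$$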
Theorem 12.4 shows that on Q the function exp(.) is an exponential function. Exponentials with irrational exponents have remained undefined so far. Because exp(.) is
differentiable, it is continuous. The most sensible way to extend an exponential function defined on Q to all of R is so that the resulting function is continuous. Thus the
following definition is reasonable.
Definition 12.5 Euler's number is defined to be $e := \exp(1)$. The function $\exp(\cdot)$ is called the natural exponential function and for all $x \in \mathbb{R}$ we set $e^x := \exp(x)$.
By Theorem 12.4 for $x \in \mathbb{Q}$, the equality $\exp(x) = e^x$ holds with the power on the right side being a power as in Definition 1.50. Next, we want to define powers with real exponents for all positive real bases. To do so, we first need to prove that $e^x$ has an inverse function.
Proposition 12.6 The natural exponential function is a strictly increasing, bijective function from $\mathbb{R}$ to $(0, \infty)$.
Proof. Let $x < y$. By Definition 12.1 for any $z > 0$, we have $e^z > e^0 = 1$. But this means $e^y = e^{y-x+x} = e^{y-x} e^x > e^x$, so the natural exponential function is strictly increasing and, in particular, injective. Moreover, its values are greater than or equal to 1 on $[0, \infty)$.
For $z < 0$ note that $e^z e^{-z} = e^{z-z} = 1$, which means that $e^z = \frac{1}{e^{-z}}$ must be greater than 0. Therefore, $e^x$ maps $\mathbb{R}$ into $(0, \infty)$.
Now note that for $x > 0$ we have $e^x > 1 + x$, and hence $\lim_{x\to\infty} e^x = \infty$. Consequently, $\lim_{x\to-\infty} e^x = \lim_{x\to-\infty} \frac{1}{e^{-x}} = 0$. By Theorem 12.2, the natural exponential function is differentiable, and hence continuous. The above limits and the Intermediate Value Theorem show that the natural exponential function is surjective onto $(0, \infty)$. (Mentally fill in the details.) $\blacksquare$
Definition 12.7 We define the natural logarithm function $\ln : (0, \infty) \to \mathbb{R}$ to be the inverse function of $\exp : \mathbb{R} \to (0, \infty)$.
The natural logarithm allows us to define powers of positive bases with arbitrary
real exponents. With an argument similar to the proof of Theorem 12.4 we can show
that this definition agrees with Definition 1.50 for rational exponents. (Exercise 12-2.)
Definition 12.8 Let $a > 0$ be a positive real number and let $r \in \mathbb{R}$. Then we define the $r$th power of $a$ to be $a^r := \exp(r \ln(a)) = e^{r \ln(a)}$.
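Definition 12.8 translates directly into a two-step computation: take the logarithm of the base, scale it by the exponent, and exponentiate. The sketch below is a minimal illustration that relies on the library functions `math.exp` and `math.log` as stand-ins for $\exp$ and $\ln$; the name `real_power` is hypothetical.

```python
import math


def real_power(a, r):
    """a**r for a > 0 and real r, computed as exp(r * ln(a)) as in Definition 12.8."""
    if a <= 0:
        raise ValueError("the base must be positive")
    return math.exp(r * math.log(a))


root_two = real_power(2.0, 0.5)           # irrational result, rational exponent
nine = real_power(3.0, 2.0)               # agrees with the integer power 3 * 3
two_to_pi = real_power(2.0, math.pi)      # irrational exponent
```

For rational exponents the result agrees, up to rounding, with the elementary power, which is the content of Exercise 12-2.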
Theorem 12.9 For all positive numbers a and b, and all real numbers x and y the
following hold.
Proof. All properties follow directly from corresponding properties of the natural exponential function. The first property is proved as follows.
$$a^x a^y = e^{x \ln(a)} e^{y \ln(a)} = e^{x \ln(a) + y \ln(a)} = e^{(x+y) \ln(a)} = a^{x+y}.$$
Working rows left to right we obtain the following for the second property.
$$(ab)^x = \left( e^{\ln(a)} e^{\ln(b)} \right)^x = \left( e^{\ln(a) + \ln(b)} \right)^x = e^{\ln\left( e^{\ln(a) + \ln(b)} \right) x}$$
$$= e^{(\ln(a) + \ln(b)) x} = e^{\ln(a) x + \ln(b) x} = e^{\ln(a) x} e^{\ln(b) x} = a^x b^x.$$
The third property follows from $a^{x-y} a^{y-x} = 1$ and $a^{x-y} a^y = a^x$. The remaining properties are left as Exercise 12-3. $\blacksquare$
Of course, the properties of the natural logarithm function are also of interest,
Theorem 12.10 The natural logarithm function is differentiable on $(0, \infty)$ with derivative $\frac{d}{dx} \ln(x) = \frac{1}{x}$. Consequently, all power functions $f(x) := x^r$ with $r \in \mathbb{R} \setminus \mathbb{Q}$ are differentiable on $(0, \infty)$ and the Power Rule $\frac{d}{dx} x^r = r x^{r-1}$ holds.
Proof. By Theorem 4.21, the natural logarithm function is differentiable at every $x \in (0, \infty)$ and $\frac{d}{dx} \ln(x) = \frac{1}{e^{\ln(x)}} = \frac{1}{x}$. For the derivative of powers, the Chain Rule implies
$$\frac{d}{dx} x^r = \frac{d}{dx} e^{r \ln(x)} = e^{r \ln(x)}\, r\, \frac{1}{x} = r x^{r-1}. \qquad \blacksquare$$
Because the natural logarithm function is differentiable, it is continuous, which
allows us to extend the limit law for powers to arbitrary exponents.
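Corollary 12.11 Let $\{a_n\}_{n=1}^{\infty}$ be a convergent sequence of nonnegative numbers and let $r \in \mathbb{R}$. Then $\lim_{n\to\infty} a_n^r = \left( \lim_{n\to\infty} a_n \right)^r$ unless $r < 0$ and $\lim_{n\to\infty} a_n = 0$.
Proof. Exercise 12-4b. $\blacksquare$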
Further properties of the natural logarithm function are exhibited in Exercises 12-5
and 12-6.
Exercises
12-1. Prove by induction that for all $n \in \mathbb{N}$ we have $\exp(n) = (\exp(1))^n$ as claimed in the proof of Theorem 12.4.
12-2. Prove that for all $x > 0$ and all $r \in \mathbb{Q}$ the definitions of the power $x^r$ in Definition 1.50 and Definition 12.8 agree.
12-3. Finish the proof of Theorem 12.9. That is, prove the following for all $a, b > 0$ and $x, y \in \mathbb{R}$.
(a) $\left( \frac{a}{b} \right)^x = \frac{a^x}{b^x}$
(b) $(a^x)^y = a^{xy}$
(c) $a^x > 0$
12-4. The limit law for powers
(a) Prove that $\lim_{x\to-\infty} e^x = 0$.
(b) Prove Corollary 12.11. Be careful with sequences that go to zero.
12-5. Prove that the natural logarithm function is a strictly increasing bijective function from $(0, \infty)$ to $\mathbb{R}$ with $\lim_{x\to\infty} \ln(x) = \infty$ and $\lim_{x\to 0^+} \ln(x) = -\infty$.
12-6. Let u , u > 0. Prove each of the following.
(b) ln(e) = 1
(a) In(1) = 0
id) In
(:)
= ln(u) - W
u)
(c) ln(uu) = ln(u)
(e) In ( u L )= u ln(u)
12-7. Limits of nth roots
(a) Prove that if nlirn
an = a > 0, then n lirn
+m
+m
Hint. Exercises 2-16 and 12-6.
g
m=a
+ ln(u)
(b) Prove that lim @ = 1.
n-oc
1
1
12-8 Compute the integral
x3ex2 d x
12-9 Gronwall's Inequality. Let u , u : [ a , b] + [0,00) be continuous functions and let c ? 0 be so
that for all x
E
[ a , b] we have u ( x ) 5 c
+ /'
u ( t ) u ( r ) d r . Prove that for all x E [ a , b] we have
Hint. Divide by the right side, multiply by u ( x ) and integrate.
1
12-10 The function $\Gamma(\alpha) := \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$ defined for $\alpha > 0$ is called the Gamma function.
(a) Prove that the improper integral $\int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$ converges for $\alpha \ge 1$.
(b) Prove that the improper integral also converges for $0 < \alpha < 1$.
(c) Prove that the improper integral diverges for $\alpha \le 0$.
(d) Prove that $\Gamma(1) = 1$.
(e) Prove that for all $\alpha \ge 1$ we have $\Gamma(\alpha + 1) = \alpha \Gamma(\alpha)$.
(f) Prove by induction that for every natural number $n$ we have $\Gamma(n) = (n-1)!$.
For this reason, the Gamma function is also referred to as the generalized factorial function.
12-11. Compute the following parameter dependent indefinite integrals. These integrals are useful for the
integration of rational functions after a partial fraction decomposition.
12.2 Sine and Cosine
Similar to the natural exponential function, the sine and cosine functions are defined
via power series that have the right derivatives at the origin. Of course, this is a bit
of "reverse engineering," because to obtain these derivatives we would need to quote
arguments that rely on the geometric definition of these functions.
Definition 12.12 For $x \in \mathbb{R}$, we define $\sin(x) := \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!}$, which is called the sine function, and $\cos(x) := \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!}$, which is called the cosine function.
Theorem 12.13 The power series that define $\sin(\cdot)$ and $\cos(\cdot)$ have infinite radius of convergence. Therefore, both $\sin(\cdot)$ and $\cos(\cdot)$ are differentiable and moreover $\frac{d}{dx} \sin(x) = \cos(x)$ and $\frac{d}{dx} \cos(x) = -\sin(x)$ for all $x \in \mathbb{R}$.
Proof. We prove the result for $\sin(\cdot)$, leaving $\cos(\cdot)$ to the reader in Exercise 12-12. For the sine function, note that for every $x \in \mathbb{R}$ we have
$$\lim_{k\to\infty} \left| \frac{ \dfrac{(-1)^{k+1} x^{2(k+1)+1}}{(2(k+1)+1)!} }{ \dfrac{(-1)^k x^{2k+1}}{(2k+1)!} } \right| = \lim_{k\to\infty} \left| \frac{x^{2k+3}}{(2k+3)!} \cdot \frac{(2k+1)!}{x^{2k+1}} \right| = \lim_{k\to\infty} \frac{x^2}{(2k+2)(2k+3)} = 0.$$
Hence, by the Ratio Test, the power series converges for all $x \in \mathbb{R}$, so its radius of convergence is infinite. By Corollary 11.12, the sine function is differentiable on $\mathbb{R}$ and
$$\frac{d}{dx} \sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k (2k+1) x^{2k}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = \cos(x).$$
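The power series of Definition 12.12 can be summed numerically in the same way as the exponential series. The sketch below is an illustration only: the function names and the number of terms are assumptions, and the library functions `math.sin` and `math.cos` serve purely as a cross-check. It also evaluates $\sin^2(x) + \cos^2(x)$ from the series values.

```python
import math


def sin_series(x, terms=30):
    """Partial sum of (-1)**k * x**(2k+1) / (2k+1)! for k = 0, ..., terms - 1."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))  # next odd-power term
    return total


def cos_series(x, terms=30):
    """Partial sum of (-1)**k * x**(2k) / (2k)! for k = 0, ..., terms - 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))  # next even-power term
    return total


points = (0.5, 2.0, -3.0)
pythagoras = [sin_series(x) ** 2 + cos_series(x) ** 2 for x in points]
```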
The following identities are useful when working with sine and cosine.
Theorem 12.14 For all $x, y \in \mathbb{R}$ the following identities hold.
1. $\sin^2(x) + \cos^2(x) = 1$ (trigonometric law of Pythagoras)
2. $\sin(x + y) = \sin(x) \cos(y) + \cos(x) \sin(y)$
3. $\cos(x + y) = \cos(x) \cos(y) - \sin(x) \sin(y)$
Proof. To prove the first identity, we proceed as follows.
sin2(x)
+ cos2(x)
(_1)kX2k+’
(2k
=
n=l
2
+ l)!
(- 1);
(- l)k
(
2
j
+
I)!
(2k+
I)!
2j+1+2k+l=2n
( c
( - l ) J (-l)k
(-1)J
(-l)k
-1
n=l
k=O
1
The remaining two identities are left for Exercise 12-13.
The smallest positive zero of the sine function also has a special place in mathematics. Of course, we first must show that the sine function has a positive zero.
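Proposition 12.15 The function $\sin(x)$ is positive on $\left( 0, \sqrt{6} \right)$ and it is negative at 4.
Proof. For all $x \in \left( 0, \sqrt{6} \right)$, we have
$$\sin(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{x^{4k+1}}{(4k+1)!} \left( 1 - \frac{x^2}{(4k+2)(4k+3)} \right) > 0.$$
On the other hand, for $x = 4$ we obtain
$$\sin(4) = \sum_{k=0}^{\infty} (-1)^k \frac{4^{2k+1}}{(2k+1)!} = \sum_{k=0}^{4} (-1)^k \frac{4^{2k+1}}{(2k+1)!} - \sum_{k=2}^{\infty} \frac{4^{4k+3}}{(4k+3)!} \left( 1 - \frac{4^2}{(4k+4)(4k+5)} \right) < 0,$$
where the first term is negative because
$$\sum_{k=0}^{4} (-1)^k \frac{4^{2k+1}}{(2k+1)!} = 4 - \frac{4^3}{1 \cdot 2 \cdot 3} + \frac{4^5}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5} - \frac{4^7}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7} + \frac{4^9}{1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 9}$$
$$= 4 - \frac{118 \cdot 4^3}{1 \cdot 2 \cdot 3 \cdot 5 \cdot 6 \cdot 9} = 4 - 4 \cdot \frac{472}{405} < 0. \qquad \blacksquare$$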
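The arithmetic at the end of the proof of Proposition 12.15 can be verified exactly with rational numbers. The sketch below sums the first five terms of the sine series at $x = 4$ using Python's `fractions.Fraction`; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import factorial

# First five terms (k = 0, ..., 4) of the sine series evaluated at x = 4.
partial = sum(
    Fraction((-1) ** k * 4 ** (2 * k + 1), factorial(2 * k + 1))
    for k in range(5)
)

# The closed form reached in the proof: 4 - 4 * 472/405.
closed_form = 4 - 4 * Fraction(472, 405)
```

Both expressions reduce to the same negative fraction, which confirms that the leading part of the series is negative at $x = 4$.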
Definition 12.16 The smallest positive zero of the function $\sin(x)$ is called $\pi$.
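Since the sine function is continuous, positive at $\sqrt{6}$ and negative at 4, the Intermediate Value Theorem places a zero between these two points, and bisection can approximate it. The sketch below is an assumed illustration rather than part of the text's argument: it uses the library function `math.sin` in place of the power series, and bisection by itself only locates a zero inside the bracket; it does not prove that this zero is the smallest positive one.

```python
import math


def bisect_zero(f, lo, hi, iterations=60):
    """Bisection for a sign change of f, assuming f(lo) > 0 > f(hi)."""
    if f(lo) <= 0 or f(hi) >= 0:
        raise ValueError("expected f(lo) > 0 and f(hi) < 0")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid  # the sign change lies to the right of mid
        else:
            hi = mid  # the sign change lies at or to the left of mid
    return (lo + hi) / 2


approx_pi = bisect_zero(math.sin, math.sqrt(6.0), 4.0)
```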
We conclude by proving that the sine and cosine functions are periodic.
Definition 12.17 Let $p > 0$ be a real number. A function $f : \mathbb{R} \to \mathbb{R}$ is called periodic with period $p$ iff for all $x \in \mathbb{R}$ we have $f(x + p) = f(x)$.
Theorem 12.18 Both the sine and the cosine function have period $2\pi$.
Proof. Note that $\sin(2\pi) = \sin(\pi + \pi) = \sin(\pi)\cos(\pi) + \cos(\pi)\sin(\pi) = 0$. This implies $\cos^2(2\pi) = 1 - \sin^2(2\pi) = 1$ and because
$$\cos(2\pi) = \cos(\pi + \pi) = \cos(\pi)\cos(\pi) - \sin(\pi)\sin(\pi) = \cos^2(\pi) > 0,$$
we infer that $\cos(2\pi) = 1$. Hence, for all $x \in \mathbb{R}$ we obtain
$$\sin(x + 2\pi) = \sin(x)\cos(2\pi) + \cos(x)\sin(2\pi) = \sin(x),$$
$$\cos(x + 2\pi) = \cos(x)\cos(2\pi) - \sin(x)\sin(2\pi) = \cos(x).$$
Exercises
12-12. Prove that the power series that defines the cosine function has infinite radius of convergence and that $\frac{d}{dx} \cos(x) = -\sin(x)$.
12-13, Finish the proof of Theorem 12.14.
(a) Use power series to prove sin(x
+ y) = sin(x) cos(y) + cos(x) sin(y) for all x .
(b) Use the above and the law of Pythagoras to prove cos
(?)
n = O a n d s i n ( ?3)7
J E
(c) Use the power series to prove sin(-n) = - sin(x) and cos(-x) = cos(x) for all x
(d) Prove that sin
(e) Prove that cos
(5
(-2
7r
- x ) = cos(x) for all x
E
B.(Use parts
= sin(x) for all x
E
B.
- .x)
(0,
Prove that cos(x + y),=
R
= 1.
E
8.
12-13a, 12-13b and 12-13c.)
c o s ( ~ ) , c o s i ~ ~sinixi,sin(xi,for
,all x . y E
B
12-14. Prove that $\pi > 3$.
12-15. A function $f : [0, 2\pi] \to \mathbb{R}$ of the form $f(x) = a_0 + \sum_{j=1}^{n} \left( a_j \cos(jx) + b_j \sin(jx) \right)$ is called a trigonometric polynomial.
(a) Prove the following product-to-sum formulas.
i. $\cos(x)\cos(y) = \frac{1}{2} \left[ \cos(x + y) + \cos(x - y) \right]$
ii. $\sin(x)\sin(y) = \frac{1}{2} \left[ \cos(x - y) - \cos(x + y) \right]$
iii. $\sin(x)\cos(y) = \frac{1}{2} \left[ \sin(x + y) + \sin(x - y) \right]$
(b) Prove that the product of two trigonometric polynomials is a trigonometric polynomial.
12- 16. Inverse trigonometric functions.
[
-1.
n n
Prove that the sine function is injective on - - ,
2 2
[ 5,];
n n
that - arcsin(x) = -f o r allx E (--, -1.
2 2
The inverse of the sine function restricted to -
is called the arcsine arcsin(,). Prove
d
d c 2
dx
Prove that the cosine function is injective on [0, n].
The inverse of the cosine function restricted to [0, n]is called the arccosine arccos(.). Prove
1
d
for all x E (0,n).
that - arccos(x) = -dx
qF2
sin(x)
12-17. The tangent function is defined to be tan(x) := cos(x)
(a) Prove that the tangent function is differentiable on its domain and
d
dx
- tan(x)
1
cosZ(x)
=-
n
(- n2 , -).
2
The inverse of the tangent function restricted to (- 5,
);
(b) Prove that the tangent function is injective on
-
is called the arctangent arctan(,).
1
d
Prove that - arctan(x) =
dx
1 +x2'
(d) (Another integral for integration with partial fraction decompositions.) Compute the inte1
gralS
dx.
(c)
~
12-18. (The last integral needed for integration with partial fraction decompositions.)Proceed as follows
to prove that for all natural numbers n > 1 we have
(a) Use integration by parts on
1
1
(x2 + b2)n dx =
1
1
. ( x 2 + b2)" d x
X2
(b) The resulting equation contains an integral
(x2
+ b2)""
(b)
1
d x . Expand the numerator with
i b 2 and cancel what can be canceled.
(c) Solve the equation for
s
(x2 + b2)"fl
dx.
12-19. Compute each of the integrals below
(a)
/
Cxsin(x) dx.
.rex2 sin (x2) dx.
12-20. A representation of n
(a) Prove that
l
b
1 .
sin"(x) dx = - - sin"-'(x)cos(x)
sinnP2(x) dx for all nat-
ural numbers n E N and all real numbers a < b.
Hint. Use sinn(x) = sin"-'(x) ( 1 - cos2(x) ) and use integration by parts for the summand
( sinn-2(x) cos(x) ) cos(x).
Prove by induction that
1'.
2n-1
2n-3
IT
sin 2n (x) dx = __ . -. . . - - for all n
2n
2n-2
2 2
2n
2n - 2
d x = -. -. . . - for all n E
2 n + 1 2n-1
3
Prove that
,7
-
2
= lim
-
n - + ~2n
n
k= 1
~
E
W.
N.
(2k)2
This is called Wallis' Product Formula.
(2k- 1 ) 2 '
s,'
(x)dx ?
sin2"(x) dx ?
1%
sinZnf1(x) dx for all n
E
N,
IT
substitute the above expressions and divide by the expression in front of the -.
2
R is Riemann integrable and has a Riemann integrable derivative, then
that if f : [0, n ] +
12-21.
I
-1"
(x -
+
bzn
f ( k ) = L n f(x) dxf
ler's Summation Formula.
Hint. Note that J';l (x
i)
-
f'(x) =
1
dx.This formula is called Eu-
f'(x)
2
. _
[ f (0) + f (1) ] -
/
1
f ( x ) dx and that the last integral
0
on the right side of Euler's Summation Formula can be turned into a sum of integrals like these by
integrating from one integer to the next and applying the appropriate shift.
n
12-22 An asymptotic expression for n ! . To see the idea for the first step, note that ln(n!) =
ln(k).
k=1
(a) Apply Euler's Summation Formula to f ( x ) = ln(x
{
(b) Prove that / n (x Hint.
1
0
1' i)
(x
X
1
- 1x1) __ dx]
converges.
't
1; + I' k)
x+l
dx =
-
+ 1) to prove that for all n E W we have
n=l
(x -
1)
2 x
(x
i k
n!e"
(c) Let bn := __ Prove that (bn}r=lconverges to a limit b
nnJTi'
1
b
(d) Use Wallis' Product Formula to show that - = lim
E
-
2
1
dx.
R
1
5 = ~.&
n+m b:
n!e"
(e) Prove that lim -n+m nn&
-
This result is called Stirling's Formula. It is often written as n !
read "is asymptotically equal to."
12-23
($1'
where "->,
-.en
fin&=
is
for # O3 Prove that w ~ ( 0=) 2.
for x = 0.
12-24 A differentiable function with bounded, but not Riemann integrable, derivative (see Figure 25)
(a) Prove that for all 8 > 0 there is an xg
(b) Prove that f g ( x ) :=
'OS
(4)
E
;
part 12-24a, is differentiable on R, but
( 0 , 8 )so that 2x cos
(:)
+ sin (:)
=0
for x 5 0,
for
(O' x8), where xg is a positive number as in
'
fi is not continuous at x = 0.
Figure 25: The differentiable function h in Exercise 12-24 oscillates so that the derivative h’ is discontinuous on a Cantor set (marked by blocks).
(c) Prove that g a ( x ) :=
L
f4 (XI;
6
forx 5 -
’ is differentiable on W,g; is discontinuous at
forx z -,
2
5 (0.8).
fs (8 - x ) :
0 and at 6 and
(x
: ga(x)
#0]
(d) For any open interval ( a , b ) , let g ( , , b ) ( x ) := gIb-,l ( x - a ) , Prove that g(,,b) is differentiable, g;u,b) is discontinuous at a and at b and (x E W : g(,,b)(X) # 0) g ( a . b ) .
cx
(e) Let C Q =
n
Cf be a Cantor set of nonzero Lebesgue measure, which exists by Exercise
n=l
8-3e. For each n
E
N let D i , i
tervals so that [O. 13 \ CR =
= 1, . . . , 2n - 1 be a sequence of pairwise disjoint open in-
-1
u
2”
,=I.
.
2“-1
0;;.Let hn :=
gn:, , Prove that h := lim hn (taken
n+m
, = l.
.
pointwise) is differentiable on W and h‘ is discontinuous on C Q
(0 Prove that h‘ is bounded, but not Riemann integrable on [0, I ]
(g) Prove that h‘ is Lebesgue measurable, and hence Lebesgue integrable on [0, I].
Hint. Represent h‘ as a sum of measurable functions and obtain (h’)-’
or intersection of preimages of ( u , 00) under these functions.
[ ( a , x ) ] as a union
12.3 L'Hôpital's Rule
Limits of functions involving transcendental functions can be hard to compute. The algebra either involves power series, or it might even be so complicated that it is virtually impossible. L'Hôpital's Rule is a way to replace the limit of a quotient with the limit of the quotient of the derivatives, which may be easier to compute. The idea behind L'Hôpital's Rule is easily explained with power series. If $f(x) = \sum_{k=0}^{\infty} f_k (x-a)^k$ and $g(x) = \sum_{k=0}^{\infty} g_k (x-a)^k$ are power series with $f(a) = g(a) = 0$ and if $g'(a) \ne 0$, then
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{\sum_{k=1}^{\infty} f_k (x-a)^k}{\sum_{k=1}^{\infty} g_k (x-a)^k} = \lim_{x\to a} \frac{\sum_{k=1}^{\infty} f_k (x-a)^{k-1}}{\sum_{k=1}^{\infty} g_k (x-a)^{k-1}} = \frac{f_1}{g_1} = \frac{f'(a)}{g'(a)}.$$
That is, the limit of the quotient of the functions is the quotient of the derivatives.
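A quick numerical experiment shows the quotient of two functions that both vanish at $a$ approaching the quotient of their derivatives. The choice $f(x) = e^x - 1$, $g(x) = \sin(x)$ and $a = 0$ is an assumed example, not one taken from the text; `math.expm1` is used because it computes $e^x - 1$ accurately for small $x$.

```python
import math


def quotient(x):
    """f(x) / g(x) for the assumed example f(x) = e**x - 1 and g(x) = sin(x)."""
    return math.expm1(x) / math.sin(x)


def derivative_quotient(x):
    """f'(x) / g'(x) = e**x / cos(x) for the same pair of functions."""
    return math.exp(x) / math.cos(x)


# Sample points approaching a = 0 from the right.
samples = [10.0 ** (-k) for k in range(1, 7)]
ratios = [quotient(x) for x in samples]
limit_candidate = derivative_quotient(0.0)
```

The values in `ratios` move toward `limit_candidate`, which equals $f'(0)/g'(0) = 1$ for this pair of functions.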
Because not every function is a power series and because derivatives cannot be defined at infinity, in general we expect the limit of the quotient of the derivatives to be on the right side. To prove $\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}$, it is tempting to apply the Mean Value Theorem to the quotient, using that $f(a) = g(a) = 0$. But arguing
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{ \dfrac{f(x) - f(a)}{x - a} }{ \dfrac{g(x) - g(a)}{x - a} } = \lim_{x\to a} \frac{f'(c_f)}{g'(c_g)}$$
is problematic. If $c_f$ and $c_g$ approach $a$ at different rates, then the limit need not be $\lim_{x\to a} \frac{f'(x)}{g'(x)}$. To get the right limit, we need $c_f = c_g$, which can be achieved with a stronger form of the Mean Value Theorem.
Theorem 12.19 Generalized Mean Value Theorem. Let $f, g$ be functions that are continuous on $[a,b]$ and differentiable on $(a,b)$ and let $g(a) \ne g(b)$. Then there is a number $c \in (a,b)$ such that $\dfrac{f(b) - f(a)}{g(b) - g(a)} = \dfrac{f'(c)}{g'(c)}$.
Proof. Let $h(x) := [f(b) - f(a)]\, g(x) - [g(b) - g(a)]\, f(x)$ and apply Rolle's Theorem. (See Exercise 12-25.)
Now we are ready to prove L'Hôpital's Rule.
Theorem 12.20 L’H8pital’s Rule. Let a E [-a,
co]and let f,g be differentiable
functions dejned on an interval ( z , 00) ( i f a = GO), or (-co,z ) ( i f a = -00), or
(a - 6, a + 6 ) \ { a ) ( i f a E R). Ifthe limits o f f and g satisfy lim f ( x ) = lirn g ( x ) = 0
x+a
or both limits are in {hm}and lim
x+a
f
X+a
f’(x>
exists as a number or is infinite, then
g’(x)
(XI
lim -- lim -. This rule also applies to one-sided limits.
g(x)
x+a
x--ta
g’(x)
Proof. Let $L := \lim_{x\to a} \frac{f'(x)}{g'(x)}$. First consider $a \in (-\infty, \infty]$ and $L \neq -\infty$. We claim
that for all $y_0 < L$ there is an $x_0 < a$ so that for all $x \in (x_0, a)$ we have $\frac{f(x)}{g(x)} > y_0$.
Let $y_0 < L$. Then there is an $x_1 < a$ such that for all $x \in (x_1, a)$ we have $\frac{f'(x)}{g'(x)} > y_0$.
In case $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = 0$, by the Generalized Mean Value Theorem for
all $x \in (x_1, a)$ there is a $c \in (x, a)$ so that
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f'(c)}{g'(c)} > y_0,$$
which means in this case the claim holds with $x_0 := x_1$.
In case $\lim_{x\to a} f(x) = \lim_{x\to a} g(x) = \infty$, we can assume without loss of generality that
$f$ and $g$ are positive on $[x_1, a)$. Let $\varepsilon > 0$ be such that $\frac{y_0}{(1-\varepsilon)^2} < L$. Find an
$x_2 \in (x_1, a)$ so that for all $c \in (x_2, a)$ we have that $\frac{f'(c)}{g'(c)} > \frac{y_0}{(1-\varepsilon)^2}$. Then find
an $x_0 \in (x_2, a)$ so that for all $x \in (x_0, a)$ we have that $\frac{f(x)}{f(x) - f(x_2)} > 1 - \varepsilon$ and
$\frac{g(x) - g(x_2)}{g(x)} > 1 - \varepsilon$. Then for all $x \in (x_0, a)$ by the Generalized Mean Value Theorem
there is a $c \in (x_2, x)$ so that $\frac{f(x) - f(x_2)}{g(x) - g(x_2)} = \frac{f'(c)}{g'(c)} > \frac{y_0}{(1-\varepsilon)^2}$ and hence for
all $x \in (x_0, a)$ we obtain
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(x_2)}{g(x) - g(x_2)} \cdot \frac{f(x)}{f(x) - f(x_2)} \cdot \frac{g(x) - g(x_2)}{g(x)} > \frac{f'(c)}{g'(c)} (1 - \varepsilon)^2 > y_0,$$
which proves the claim in this case.
The other cases for the limits of $f$ and $g$ being infinite are proved similarly, so the
claim is proved. The claim proves the result for $L = \infty$ for left limits at $a \in \mathbb{R}$ and for
limits at infinity.
Similar to the above we can prove that for $L \neq \infty$ for all $y_0 > L$ there is an $x_0 < a$
so that for all $x \in (x_0, a)$ we have $\frac{f(x)}{g(x)} < y_0$. This proves the result for $L = -\infty$ for
left limits at $a \in \mathbb{R}$ and for limits at infinity.
Putting the two results above together, for $L \in \mathbb{R}$ and every $\varepsilon > 0$ there is an
$x_0 < a$ so that for all $x \in (x_0, a)$ we have $L - \varepsilon < \frac{f(x)}{g(x)} < L + \varepsilon$, which proves the
result for $L \in \mathbb{R}$ for left limits at $a \in \mathbb{R}$ and for limits at $\infty$.
Repeating the above process for $a \in [-\infty, \infty)$ to the right of $a$ establishes the
result for $a = -\infty$, for right limits at $a \in \mathbb{R}$ and for two-sided limits at $a \in \mathbb{R}$.
Exercise 12-28 shows that L'Hôpital's Rule is not a one-for-one swap. The limit
$\lim_{x\to a} \frac{f(x)}{g(x)}$ can exist even if $\lim_{x\to a} \frac{f'(x)}{g'(x)}$ fails to exist.
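Aside from the obvious applications to quotients, L'Hôpital's Rule also allows us
to derive a well-known representation for the exponential function.
Theorem 12.21 For all $x \in \mathbb{R}$ we have $e^x = \lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n$.
Proof. For all $x \in \mathbb{R}$, treating $n$ as a real variable $z$, we obtain
$$\lim_{n\to\infty} \ln\left(\left(1 + \frac{x}{n}\right)^n\right) = \lim_{z\to\infty} \frac{\ln\left(1 + \frac{x}{z}\right)}{\frac{1}{z}} = \lim_{z\to\infty} \frac{\frac{1}{1 + x/z}\cdot\left(-\frac{x}{z^2}\right)}{-\frac{1}{z^2}} = \lim_{z\to\infty} \frac{x}{1 + \frac{x}{z}} = x.$$
Because the natural exponential function is continuous, the result follows.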
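The convergence in Theorem 12.21 can be observed numerically. The following is a minimal sketch, not part of the original text; the function name `compound_power` and the sample point $x = 1.5$ are chosen here purely for illustration.

```python
import math


def compound_power(x: float, n: int) -> float:
    """Return (1 + x/n)**n, the n-th term of the sequence in Theorem 12.21."""
    return (1.0 + x / n) ** n


if __name__ == "__main__":
    x = 1.5  # arbitrary sample point, chosen only for illustration
    for n in (10, 1_000, 100_000):
        term = compound_power(x, n)
        print(f"n = {n:>6}: {term:.8f}, error {abs(math.exp(x) - term):.2e}")
```

The printed errors shrink roughly like $1/n$, which suggests that this representation, while elegant, is a slow way to compute $e^x$.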
Exercises
12-25 Prove Theorem 12.19. Remember to prove that Rolle’s Theorem can be applied
12-26 Compute each of the limits below.
(a) lirn
2x4
(g)
+
-t x 2 x
x 2 - 16
- 4x3
x+4
-
276
(b)
lim xln(x)
1
1
lim -- ___
sin(x)
cos(x) - 1
12-27 Prove by induction that for all n E W we have lim x”e-’
X’cs;
12-28. Prove that for f ( x ) = 2x
lirn
x+m
ln(x)
(c) &mm
x+o+
= 0.
+ sin(x), g(x) = 2x - sin(x) and a = co we have
f‘cx,
does not exist.
g
lirn
x+w
f (x)
~
g(x)
= 1, but
(XI
12-29. Let $f : (0, \infty) \to \mathbb{R}$ be differentiable. Does $\lim_{x\to\infty} f'(x) = 0$ imply that $\lim_{x\to\infty} f(x)$ exists? Justify
your answer.
12-30. Is there a differentiable function $f : (0, \infty) \to \mathbb{R}$ with $\lim_{x\to\infty} f'(x) = 0$ so that for every real number
$L$ there is a sequence $\{x_n\}_{n=1}^{\infty}$ that goes to infinity and so that $\lim_{n\to\infty} f(x_n) = L$? Justify your answer.
12-31. Creating summation formulas. This exercise shows one way in which summation formulas for
powers of integers (see Exercise 1-33) can be discovered. Let $f(x) := \sum_{k=1}^{n} e^{kx}$.
(a) Prove that for all $p \in \mathbb{N}$ we have $\sum_{k=1}^{n} k^p = f^{(p)}(0)$.
(b) Prove that for all $x \in \mathbb{R} \setminus \{0\}$ we have $\sum_{k=1}^{n} e^{kx} = \frac{e^x - e^{(n+1)x}}{1 - e^x}$.
(c) Prove that for all $p \in \mathbb{N}$ we have $f^{(p)}(0) = \lim_{x\to 0} \frac{d^p}{dx^p} \frac{e^x - e^{(n+1)x}}{1 - e^x}$.
(d) Use L'Hôpital's Rule to verify that $\sum_{k=1}^{n} k = \frac{n}{2}(n + 1)$ for all $n \in \mathbb{N}$.
(e) Use a computer algebra system to generate a closed formula for $\sum_{k=1}^{n} k^5$.
12-32. Use Cauchy's Limit Theorem (see Exercise 2-51) to give an alternative proof of L'Hôpital's Rule.
Chapter 13
Numerical Methods
Many problems cannot be solved exactly. Therefore it is natural to consider computational approaches to mathematics. Numerical analysis is a wide field. For any
problem that can be solved exactly (under good circumstances), there is at least one
numerical method to provide an approximate solution in case exact methods fail. Usually, a numerical method contains a parameter, call it n , that indicates the computational effort required to obtain the approximation. With enough computational effort,
a numerical method should provide approximations close to the exact solution. More
formally, this means that as n goes to infinity the limit of our approximations should be
the exact solution. But just having approximations that converge to the exact answers
usually is not enough. We want to obtain good approximations with as little computational effort as possible. This means we not only need to assure that, given enough
computational effort, the approximations converge to the correct result. We also must
analyze how fast the approximations converge.
In the language that we have developed, this means that just showing that for every
$\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \geq N$ the $n$th approximation is within $\varepsilon$ of the
exact solution is not enough. We also want our estimates to be sharp enough so that
when, say, $N = 10$ guarantees a desired accuracy, we do not use a larger $N$ and waste
computational effort. So, where in proofs so far we were satisfied with the fact that $N$
exists, in numerical analysis we want to know what $N$ is. Where in proofs so far we
were satisfied that estimates ultimately showed that a certain difference is smaller than
$\varepsilon$, in numerical analysis we want to perform the estimate with an $N$ that is as small as
possible.
For this reason, this chapter will emphasize error analysis. We present numerical
approaches for three typical tasks: The representation of functions in Section 13.1, the
solution of equations in Section 13.2 and the computation of integrals in Section 13.3.
13.1 Approximation with Taylor Polynomials
It seems mundane, but the most fundamental numerical task is the computation of the
values of functions such as exponential and trigonometric functions. The exact values
of these functions can only be computed for certain special input values x. For all other
input values, we need to use approximation techniques. The most fundamental of these
techniques is the approximation with Taylor polynomials. There are several ways to
motivate the use of polynomials. Most importantly, polynomials are easy to compute,
which is a paramount concern in numerical analysis. Moreover, for each of the many
functions defined as power series there is a sequence of polynomials that converges to
it.
Geometrically, we can argue that the tangent line of a differentiable function f at
a point a has the same value and the same first derivative as f at a and, locally, it
approximates f rather well. We have reason to hope that, by increasing the number
of derivatives that agree with the derivatives of the function at a, we can enlarge the
interval on which we have a good approximation for f . To increase the number of
derivatives that agree at a , we need to use polynomials of degree greater than 1.
Theorem 13.1 Let the function $f$ be $n$ times differentiable at $a$. Then the polynomial
$T_n(x) := \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x - a)^j$ is such that the first $n$ derivatives of $T_n$ at $a$ are equal
to the first $n$ derivatives of $f$ at $a$. That is, $T_n^{(k)}(a) = f^{(k)}(a)$ for $k = 0, \ldots, n$.
Proof. Prove by induction that for $1 \leq k \leq n$ the $k$th derivative of $T_n(x)$ is
$$T_n^{(k)}(x) = \sum_{j=k}^{n} \frac{f^{(j)}(a)}{j!}\, j (j - 1) \cdots (j - k + 1)\,(x - a)^{j-k}.$$
(Exercise 13-1.)
Theorem 13.1 motivates the definition of Taylor polynomials and Taylor series.
Definition 13.2 Let the function $f$ be $n$ times differentiable at $a$. Then the polynomial
$T_n(x) := \sum_{j=0}^{n} \frac{f^{(j)}(a)}{j!} (x - a)^j$ is called the $n$th Taylor polynomial of $f$ at $a$. If $f$
is infinitely differentiable at $a$, the series $T(x) := \sum_{j=0}^{\infty} \frac{f^{(j)}(a)}{j!} (x - a)^j$ is called the
Taylor series of $f$ at $a$.
The definitions of the exponential and the trigonometric functions guarantee that
the Taylor polynomials at a = 0 ultimately provide good approximations for these
functions (see Exercise 13-2). However, as mentioned in the introduction, for numerical purposes it is not sufficient to just know that for some degree n the nth Taylor
polynomial of $f$ is close to $f$. We need to know how close $T_n$ is to $f$.
Theorem 13.3 Taylor's Formula. If $f$ is $(n + 1)$ times continuously differentiable on
$(a - R, a + R)$, then for all $x \in \mathbb{R}$ with $|x - a| < R$ we have
$$f(x) = T_n(x) + (x - a)^{n+1} \int_0^1 \frac{(1 - u)^n}{n!}\, f^{(n+1)}\bigl(a + u(x - a)\bigr)\, du.$$
In particular, if $|x - a| < R$ and $M$ is such that for all $x \in \mathbb{R}$ with $|x - a| < R$ we
have $\left|f^{(n+1)}(x)\right| \leq M$, then $|f(x) - T_n(x)| \leq \frac{M}{(n + 1)!} |x - a|^{n+1}$.
Proof. This proof is an induction on $n$. For the base step $n = 0$ note that for all
$x \in (a - R, a + R)$ we have
$$f(x) = f(a) + \int_a^x f'(t)\, dt = T_0(x) + (x - a) \int_0^1 f'\bigl(a + u(x - a)\bigr)\, du.$$
For the induction step, we use integration by parts on the induction hypothesis. Let
$x \in (a - R, a + R)$ and let $f$ be $n + 2$ times continuously differentiable on $(a - R, a + R)$.
Then
$$f(x) = T_n(x) + (x - a)^{n+1} \int_0^1 \frac{(1 - u)^n}{n!}\, f^{(n+1)}\bigl(a + u(x - a)\bigr)\, du$$
$$= T_n(x) + (x - a)^{n+1} \left( \left[ -\frac{(1 - u)^{n+1}}{(n + 1)!}\, f^{(n+1)}\bigl(a + u(x - a)\bigr) \right]_{u=0}^{u=1} + \int_0^1 \frac{(1 - u)^{n+1}}{(n + 1)!}\, f^{(n+2)}\bigl(a + u(x - a)\bigr)\,(x - a)\, du \right).$$
The remainder of the proof is left to Exercise 13-3.
There are two ways to use error estimates like Taylor's formula. In an a posteriori
or after the fact estimate, the polynomial and the interval are given and we estimate
the error. This is straight substitution into the formula (see Exercise 13-4). In an a
priori or before the fact estimate, we have a desired accuracy for an interval and need
to find an n that guarantees this accuracy.
Example 13.4 Determine the degree $n$ so that the $n$th Taylor polynomial of $f(x) = e^x$
about $a = 0$ is within $10^{-4}$ of $e^x$ for all $x \in [-2, 2]$.
To bound the error by the maximum acceptable error of $10^{-4}$ we make the upper
bound of the error from Theorem 13.3 less than the specified acceptable error. This
makes the actual error less than the acceptable error. In the notation of Theorem 13.3,
we have $a = 0$, $|x - a| \leq 2$, and $n$ is to be determined. All derivatives of $e^x$ are again $e^x$
and $e^x$ is increasing. Hence, the upper bound $M$ for the $(n + 1)$st derivative on $[-2, 2]$
is $e^2$. We set $|f(x) - T_n(x)| \leq \frac{e^2}{(n + 1)!}\, 2^{n+1} \overset{!}{<} 10^{-4}$, where the inequality we need
to solve is marked with an exclamation sign.
The above inequality cannot be solved algebraically for $n$. However, $n!$ grows
much faster than $2^n$. Thus by substituting values for $n$ and checking if the inequality is
satisfied we find that $n = 11$ is large enough to guarantee the desired accuracy. For a
visualization of the Taylor polynomials, consider Figure 26.
□
Figure 26: Taylor polynomials are local approximations to the function. The left graph
shows the exponential function and its first (dashed), second (dotted) and third (dash-dotted) Taylor polynomials. Note how the approximation gets better as the degree
of the polynomials increases. The right graph shows the exponential function and its
eleventh Taylor polynomial as demanded in Example 13.4.
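The substitution search described in Example 13.4 can be sketched in a few lines. This is an illustrative sketch rather than the author's procedure; the function names `taylor_error_bound`, `smallest_degree`, and `taylor_exp` are hypothetical, and the bound used is the one from Theorem 13.3 with $M = e^{\text{radius}}$.

```python
import math


def taylor_error_bound(n: int, radius: float = 2.0) -> float:
    """Bound e^radius * radius^(n+1) / (n+1)! for e^x on [-radius, radius] (Theorem 13.3)."""
    return math.exp(radius) * radius ** (n + 1) / math.factorial(n + 1)


def smallest_degree(tolerance: float, radius: float = 2.0) -> int:
    """Return the first n whose error bound drops below the tolerance."""
    n = 0
    while taylor_error_bound(n, radius) >= tolerance:
        n += 1
    return n


def taylor_exp(x: float, n: int) -> float:
    """Evaluate the n-th Taylor polynomial of e^x at a = 0."""
    return sum(x ** j / math.factorial(j) for j in range(n + 1))


if __name__ == "__main__":
    n = smallest_degree(1e-4)
    print("degree:", n)
    print("actual error at x = 2:", abs(math.exp(2.0) - taylor_exp(2.0, n)))
```

Note that the search returns the first degree for which the a priori bound is small enough; the actual error at a given point may already be below the tolerance for a smaller degree.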
Remark 13.5 Not every function can be approximated well with Taylor polynomials.
For example, Lemma 18.8 exhibits a function that is infinitely differentiable, not identical to zero, and yet all its Taylor polynomials at a = 0 are identical to zero.
□
Remark 13.6 Early operating systems, such as the one on the Commodore 64 in the
1980s, used Taylor polynomials of sufficiently high degree to compute many functions.
Nowadays, computational schemes that are faster than Taylor polynomials, but also
more memory intensive, are used to compute functions. The reason is that memory is
not as much of an issue as it was in the early days of computing, while speed remains
a crucial concern.
□
Remark 13.7 Taylor polynomials are also used in physics to obtain low-order approximations of complicated functions $f$. Typically, if $f$ is to be evaluated at $x + \Delta x$, where
$\Delta x$ is small, the exact expression $f(x + \Delta x)$ is replaced with the approximately equal
expression $f(x) + f'(x)\Delta x$. This is feasible because, as Taylor's Formula shows, the
difference is often bounded by $C(\Delta x)^2$, where $C$ is a constant. If $\Delta x$ is small, terms
of the order $(\Delta x)^2$ are usually negligible. The determination what is small and what is
negligible is made based on practical, nonmathematical considerations. A posteriori,
if the approximate formula correctly predicts an experiment, then the approximation
must have been permissible. A priori, one could say that if other effects influence the
quantity given by $f$ by, say, 0.1% (of the underlying base unit), and $\Delta x$ is at most
1%, then $(\Delta x)^2$ is at most 0.01%, so it can be ignored because other effects will have
greater influence. If a first order approximation as indicated does not work, higher-order
Taylor polynomials can be used in more sophisticated models.
□
Functions can also be approximated with trigonometric polynomials. This idea is
motivated by problems as described in Section 21.3. We will present the corresponding
series, called Fourier series, in Section 20.2. The powerful tools available by then allow
for a more efficient presentation than what would be possible now.
Exercises
13-1 Prove Theorem 13.1.
13-2 Prove that if $f(x) = \sum_{k=0}^{\infty} c_k x^k$ is a power series with nonzero radius of convergence, then the Taylor
series of $f$ about $a = 0$ is $\sum_{k=0}^{\infty} c_k x^k$.
13-3 Prove that if $f$ is $(n + 1)$ times continuously differentiable on $(a - R, a + R)$, then for all $x \in \mathbb{R}$
with $|x - a| < R$ we have that $|f(x) - T_n(x)| \leq \frac{M}{(n + 1)!} |x - a|^{n+1}$, where $M$ is such that for
all $|x - a| < R$ we have $\left|f^{(n+1)}(x)\right| < M$.
13-4 Find an upper bound for the error incurred when approximating $f$ on $[l, r]$ with its $n$th degree Taylor
polynomial at $a$.
(a) $f(x) = e^x$, $a = 0$, $n = 10$, $[l, r] = [-5, 5]$
(b) f ( x ) = sin(x), a = 0, n = 7, [ L , r ] =
[22 'T
"1
13-5 Use induction to prove that the given expression is the nth derivative of the given function.
(a) f ( x ) = In 1x1,f ' " ) ( x ) =
(bj f ( x ) = 2', f'"'(xj = 2'
- l)!
(-l)"+'(n
for n ? 1
X"
(ln(2)
)"
+ ne"
1 - a"
f ( x ) = x e a x , f ( " ) ( x ) = anxeaX + -eax
1-a
(c) f ( x ) = x e x , f ( " ) ( x ) = .ex
(d)
13-6 Determine the smallest $n$ so that $|T_n(x) - f(x)| < \varepsilon$ for all $x \in [l, r]$, where $T_n$ is the $n$th Taylor
polynomial of $f$ about $a$.
(aj f ( x ) = e x . a = 0,[ L , r ] = [-lo, 101, E = l o r 5
(b) f ( x ) = cos(x), a = 0, [ l , r ] = [-n,
TI,
(cj f ( x ) =
A,
a = 1, [ l , r ] = [.5. 1.51,
E
E =
= lo-''
(d) f ( x ) = In@), a = 2, [ l , I ] = [ l , 31, E = lo-*
13-7 Prove that for any $a > 0$, the Taylor polynomials of $f(x) = \ln|x|$ at $a$ converge for $x \in (0, 2a)$ and
they diverge for $|x - a| > a$.
13-8 Second Derivative Test. Let $f : (a, b) \to \mathbb{R}$ be twice continuously differentiable and let $x \in (a, b)$
be so that $f'(x) = 0$.
(a) Prove that if $f''(x) > 0$, then there is an $\varepsilon > 0$ so that for all $z \neq x$ with $|z - x| < \varepsilon$ we have
that $f(x) < f(z)$.
(b) Prove that if $f''(x) < 0$, then there is an $\varepsilon > 0$ so that for all $z \neq x$ with $|z - x| < \varepsilon$ we have
that $f(x) > f(z)$.
(c) State and prove a similar result for $f : (a, b) \to \mathbb{R}$ being $n$ times continuously differentiable
with $f'(x) = f''(x) = \cdots = f^{(n-1)}(x) = 0$. (Distinguish even and odd $n$.)
13-9. Efficient evaluation of polynomials. Let $p(x) = \sum_{j=0}^{n} a_j x^j$ be a polynomial.
(a) Prove that $p(x) = a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-1} + x(a_n)) \cdots))$.
(b) Count the number of operations in the evaluation of the sum $\sum_{j=0}^{n} a_j x^j$ and in the evaluation
in part 13-9a to prove that evaluation as in part 13-9a takes fewer operations (and is thus more
efficient) than evaluation of the original sum.
Hint. Evaluating $a_j x^j$ takes $j$ floating point multiplications and floating point multiplications
take much more time than floating point additions.
(c) State an $n$ step recursive procedure that evaluates polynomials as in part 13-9a.
Hint. Start with $H_n := a_n$ and define $H_{n-1}, \ldots, H_1$ in such a way that $H_1 = p(x)$.
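The nested form in Exercise 13-9a is commonly known as Horner's scheme. The following is a minimal sketch of it, assuming a nonempty coefficient list ordered from $a_0$ to $a_n$; the function name `horner` is chosen here for illustration and the indexing of the intermediate values is one possible choice, not necessarily the one intended by the hint.

```python
def horner(coefficients, x):
    """Evaluate a_0 + a_1*x + ... + a_n*x^n from the nested form.

    coefficients = [a_0, a_1, ..., a_n] must be nonempty.
    Uses exactly n multiplications and n additions.
    """
    result = coefficients[-1]  # start with the leading coefficient a_n
    for a in reversed(coefficients[:-1]):
        result = a + x * result  # peel off one level of nesting
    return result


if __name__ == "__main__":
    # p(x) = 1 + 2x + 3x^2 evaluated at x = 2
    print(horner([1, 2, 3], 2))
```

Each pass through the loop performs one multiplication and one addition, so the operation count grows linearly in $n$, whereas evaluating each power $x^j$ separately needs on the order of $n^2$ multiplications.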
13.2 Newton’s Method
Solving equations is a common numerical task. The Intermediate Value Theorem guarantees that for equations $f(x) = 0$ the issue usually is not if we can find solutions, but
rather how fast we can compute them.
Example 13.8 The bisection method. Let $f : [a, b] \to \mathbb{R}$ be continuous with
$f(a) f(b) < 0$. Then by the Intermediate Value Theorem $f$ has a zero in $[a, b]$. To
simplify the presentation, assume without loss of generality that $f$ has a unique zero $z$
in $[a, b]$. We will recursively construct a sequence $\{x_n\}_{n=0}^{\infty}$ that converges to $z$.
Let $x_0 := a$, $x_1 := b$, and $j(1) := 0$. For the recursive construction, let $x_0, \ldots, x_n$
and $j(n) \in \{0, \ldots, n - 1\}$ be so that $f(x_n) f(x_{j(n)}) < 0$ and $|x_n - x_{j(n)}| = \frac{b - a}{2^{n-1}}$.
Define $x_{n+1} := \frac{x_n + x_{j(n)}}{2}$. If $f(x_{n+1}) = 0$, stop the recursion because $x_{n+1}$ is the
desired zero. Otherwise, if $f(x_{n+1}) f(x_{j(n)}) < 0$ set $j(n + 1) := j(n)$, and if
$f(x_{n+1}) f(x_n) < 0$ set $j(n + 1) := n$. In either case, $|x_{n+1} - x_{j(n+1)}| = \frac{b - a}{2^n}$,
and we can continue the recursion.
By the Intermediate Value Theorem and the uniqueness of $z$, for each $n \in \mathbb{N}$ we
obtain $|x_n - z| \leq \frac{b - a}{2^{n-1}}$, and hence $\lim_{n\to\infty} x_n = z$.
If $f$ has multiple zeroes in $[a, b]$ the argument becomes a bit messier, but the sequence will converge to one of the zeroes of $f$.
□
Aside from establishing convergence, the estimate that in the $n$th step we are within
$\frac{b - a}{2^{n-1}}$ of a zero of $f$ tells us how many steps we need to execute to obtain an estimate
of a given quality. For example, if $b - a = 1$ and we want a number that approximates
the first 15 decimal places of the zero $z$, we would need to execute 51 steps of the
bisection method ($2^{-51+1} = 2^{-50} = 1{,}024^{-5} < 10^{-15}$).
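The bisection method of Example 13.8 can be sketched as follows. This is an illustrative variant, not the text's exact bookkeeping: instead of tracking the index $j(n)$, it keeps the current bracketing interval, which carries the same information. The function name `bisect` and its parameters are assumptions made for this sketch.

```python
def bisect(f, a, b, steps):
    """Approximate a zero of a continuous f on [a, b] with f(a) * f(b) < 0.

    After `steps` halvings the bracketing interval has length (b - a) / 2**steps,
    and the midpoint of that interval is returned.
    """
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(steps):
        mid = (a + b) / 2
        fm = f(mid)
        if fm == 0:
            return mid  # exact zero found, stop early
        if fa * fm < 0:
            b, fb = mid, fm  # the sign change is in the left half
        else:
            a, fa = mid, fm  # the sign change is in the right half
    return (a + b) / 2


if __name__ == "__main__":
    # zero of x^2 - 2 in [1, 2], that is, the square root of 2
    print(bisect(lambda x: x * x - 2.0, 1.0, 2.0, 40))
```

Every step gains exactly one binary digit, which is why roughly 50 steps are needed for 15 decimal places on an interval of length 1.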
Figure 27: An illustration of Newton’s method that will be analyzed precisely in Exercise 13-12.
This is a large number of steps to carry out by hand. Nonetheless, to solve a single
equation, a program would find the desired approximation rather quickly. However,
when these methods were first designed, computers had not been invented, so a faster
computation was desirable. For today’s applications, speed remains an issue, because
in large scale computations we may need to solve billions of equations. This means
every step we save would be magnified by a large factor, leading to a more efficient use
of computational resources.
Newton's method reduces the (hard) computation of the zero of a differentiable
function to the (easier) computation of the zero of the tangent line at a point. Say we
are currently at a point $x_n$. Then the tangent line of $f$ at $x_n$ is the unique straight
line that goes through the point $(x_n, f(x_n))$ and has slope $f'(x_n)$. Setting its equation
$y = f(x_n) + f'(x_n)(x - x_n)$ equal to zero and solving for $x$ gives the expression
$x = x_n - \frac{f(x_n)}{f'(x_n)}$ for the zero of the tangent line. Of course, the zero of the tangent
line is not the zero of the function, but we would hope that it is closer to the zero of
$f$ than $x_n$ was. Hence, we set $x_{n+1} := x_n - \frac{f(x_n)}{f'(x_n)}$ and repeat the above with
$x_{n+1}$ in place of $x_n$ (see Figure 27).
Definition 13.9 Let $f$ be a differentiable function and let $x_0$ be a point in its domain. Then Newton's method started at $x_0$ generates a sequence defined recursively
by $x_{n+1} := x_n - \frac{f(x_n)}{f'(x_n)}$ for $n \geq 0$.
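The recursion in Definition 13.9 translates directly into code. The following is a minimal sketch under the assumption that the derivative is supplied by the caller; the function name `newton` and its parameters are chosen here for illustration.

```python
def newton(f, fprime, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        slope = fprime(x)
        if slope == 0:
            # a horizontal tangent line has no zero, so the recursion breaks down
            raise ZeroDivisionError("f'(x_n) = 0: the tangent line has no zero")
        xs.append(x - f(x) / slope)
    return xs


if __name__ == "__main__":
    # f(x) = x^2 - 2 with starting point x_0 = 1; the positive zero is sqrt(2)
    for n, x in enumerate(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, 6)):
        print(n, x)
```

For this example the number of correct digits roughly doubles with each step, which contrasts with the one binary digit per step gained by bisection.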
If the sequence generated by Newton’s method converges, then under mild hypotheses the limit is a solution of the equation f(x) = 0.
Theorem 13.10 Let $f$ be a continuously differentiable function. If the sequence generated by Newton's method converges to a limit $L$ in the domain of $f$ and $f'(L) \neq 0$,
then $f(L) = 0$.
Proof. Because the sequence $\{x_n\}_{n=0}^{\infty}$ converges, we can take limits on both sides
of the recurrence relation. The equation
$$\lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} \left( x_n - \frac{f(x_n)}{f'(x_n)} \right)$$
leads to the equation $L = L - \frac{f(L)}{f'(L)}$, which clearly implies $f(L) = 0$.
∎
The reader will prove in Exercise 13-10 that $L$ is also a zero of $f$ if $f'(L) = 0$.
Another way of looking at the proof of Theorem 13.10 is to say that the limit $L$
is a fixed point of the function $F(x) = x - \frac{f(x)}{f'(x)}$. Fixed points are a standard tool in
numerical and abstract mathematics. We will encounter fixed points again in Theorems
17.64, 17.65, and 22.6.
Theorem 13.10 says that if Newton's method converges, the limit is a zero of the
function. It does not guarantee that the sequence from Newton's method converges.
However, there are mild conditions that guarantee convergence.
Lemma 13.11 Let $q \in [0, 1)$, let $F : (a, b) \to \mathbb{R}$ be a differentiable function so that
$|F'(x)| < q$ for all $x \in (a, b)$ and let $p \in (a, b)$ be so that $F(p) = p$ ($p$ is also called
a fixed point of $F$). Then $p$ is the only fixed point of $F$ in $(a, b)$ and for all $x_0 \in (a, b)$
the sequence generated by the recursive equation $x_{n+1} := F(x_n)$ converges to $p$.
Proof. To prove that there is no $\tilde{p} \in (a, b) \setminus \{p\}$ with $\tilde{p} = F(\tilde{p})$ note that if
$\tilde{p} = F(\tilde{p})$, then by the Mean Value Theorem there would be a $c$ between $\tilde{p}$ and $p$ so
that $|\tilde{p} - p| = |F(\tilde{p}) - F(p)| = |F'(c)|\, |\tilde{p} - p| < |\tilde{p} - p|$, which is a contradiction.
Hence, $p$ is the only fixed point of $F$ in $(a, b)$.
Now let $x_0 \in (a, b)$. Because the powers $q^n$ converge to zero, the result is proved
if we can establish the inequality $|x_n - p| \leq q^n |x_0 - p|$ for all $n \in \mathbb{N}$. The inequality
is proved by induction on $n$.
For the base step $n = 1$ note that by the Mean Value Theorem between $x_0$ and $p$ there
must be a $c$ so that $|x_1 - p| = |F(x_0) - F(p)| = |F'(c)|\, |x_0 - p| \leq q |x_0 - p|$.
For the induction step note that there must be a $c$ between $x_n$ and $p$ so that
$|x_{n+1} - p| = |F(x_n) - F(p)| = |F'(c)|\, |x_n - p| \leq q \left( q^n |x_0 - p| \right) = q^{n+1} |x_0 - p|$.
∎
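The estimate $|x_n - p| \leq q^n |x_0 - p|$ from the proof of Lemma 13.11 can be checked on a concrete example. The sketch below is an illustration only: the choice $F = \cos$ on $(0, 1)$, where $|F'(x)| = |\sin(x)| < \sin(1) < 0.85 =: q$, and the function name `fixed_point_iterates` are assumptions made here, not part of the text.

```python
import math


def fixed_point_iterates(F, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = F(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(F(xs[-1]))
    return xs


if __name__ == "__main__":
    q = 0.85  # upper bound for |sin(x)| on (0, 1), since sin(1) is about 0.8415
    xs = fixed_point_iterates(math.cos, 0.5, 200)
    p = xs[-1]  # numerical stand-in for the fixed point of cos
    for n in (1, 5, 10, 20):
        print(n, abs(xs[n] - p), q ** n * abs(xs[0] - p))
```

In each printed row the actual distance to the fixed point stays below the geometric bound from the lemma.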
Theorem 13.12 Let $f : (a, b) \to \mathbb{R}$ be twice continuously differentiable. Then for
each zero $z$ of $f$ in $(a, b)$ with $f'(z) \neq 0$ there is a $\delta_z > 0$ so that for every starting
point $x_0 \in (z - \delta_z, z + \delta_z)$ the sequence generated by Newton's method converges to $z$.
Proof. We apply Lemma 13.11 to $F(x) := x - \frac{f(x)}{f'(x)}$. First note that
$$F'(x) = 1 - \frac{f'(x) f'(x) - f(x) f''(x)}{\left(f'(x)\right)^2} = \frac{f(x) f''(x)}{\left(f'(x)\right)^2}.$$
In particular this means that $F'(z) = 0$, and hence there is a $\delta_z > 0$ so that
$|F'(x)| < \frac{1}{2} < 1$ for all $x \in (z - \delta_z, z + \delta_z)$. But then by Lemma 13.11 every sequence
generated by Newton's method started at any $x_0 \in (z - \delta_z, z + \delta_z)$ converges
to $z$.
∎
From Theorem 13.12 and its proof we infer that the closer xo is to a zero of f ,
the faster Newton's method will converge. But when Newton's method converges,
the numbers xn will get ever closer to a zero of f . Hence, as Newton's method is
executed, the speed of convergence should accelerate. Theorem 13.14 below makes
this statement more precise.
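Lemma 13.13 Let $f : (a, b) \to \mathbb{R}$ be continuously differentiable and let $\gamma > 0$ be
so that for all $x, z \in (a, b)$ we have that $|f'(z) - f'(x)| \leq \gamma |z - x|$. Then for all
$x, z \in (a, b)$ the inequality $|f(z) - f(x) - f'(x)(z - x)| \leq \frac{\gamma}{2} |z - x|^2$ holds.
Proof. Without loss of generality assume that $x < z$. Then
$$|f(z) - f(x) - f'(x)(z - x)| = \left| \int_x^z f'(t) - f'(x)\, dt \right| \leq \int_x^z |f'(t) - f'(x)|\, dt \leq \int_x^z \gamma (t - x)\, dt = \frac{\gamma}{2} (t - x)^2 \Big|_x^z = \frac{\gamma}{2} (z - x)^2.$$
∎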
Theorem 13.14 Let $f : (a, b) \to \mathbb{R}$ be a continuously differentiable function so that
$f'(x) \neq 0$ for all $x \in (a, b)$. Assume there are $x_0 \in (a, b)$ and $\alpha, \beta, \gamma > 0$ so
that $\left| \frac{f(x_0)}{f'(x_0)} \right| \leq \alpha$, so that for all $x \in (a, b)$ we have $\left| \frac{1}{f'(x)} \right| \leq \beta$, so that for all
$x, z \in (a, b)$ we have $|f'(z) - f'(x)| \leq \gamma |z - x|$, so that $h := \frac{\alpha\beta\gamma}{2} < 1$ and so that
with $r := \frac{\alpha}{1 - h}$ we have $[x_0 - r, x_0 + r] \subseteq (a, b)$. Then
1. Each recursively defined point $x_{n+1} := x_n - \frac{f(x_n)}{f'(x_n)}$ is in $(x_0 - r, x_0 + r)$.
2. The sequence $\{x_n\}_{n=0}^{\infty}$ converges to a point $u \in [x_0 - r, x_0 + r]$ with $f(u) = 0$.
3. For all $n \geq 0$ we have $|u - x_n| \leq \alpha \frac{h^{2^n - 1}}{1 - h^{2^n}}$.
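Proof. We first prove by induction that for all $n \in \mathbb{N}$ the point $x_n$ is well defined,
$|x_n - x_{n-1}| \leq \alpha h^{2^{n-1} - 1}$, and $|x_n - x_0| < r$. For $n = 1$ the above is trivial. For the
induction step $n \to n + 1$ first note that because $|x_n - x_0| < r$ the number $x_{n+1}$ is well
defined. The definition of $x_n$ implies that $f(x_{n-1}) + f'(x_{n-1})(x_n - x_{n-1}) = 0$, which
with Lemma 13.13 implies
$$|x_{n+1} - x_n| = \left| \frac{f(x_n)}{f'(x_n)} \right| \leq \beta\, |f(x_n) - f(x_{n-1}) - f'(x_{n-1})(x_n - x_{n-1})| \leq \frac{\beta\gamma}{2} |x_n - x_{n-1}|^2 \leq \frac{\beta\gamma}{2} \alpha^2 h^{2^n - 2} = \alpha h^{2^n - 1}.$$
But then, using a telescoping sum, we obtain the following.
$$|x_{n+1} - x_0| \leq \sum_{k=1}^{n+1} |x_k - x_{k-1}| \leq \alpha \sum_{k=1}^{n+1} h^{2^{k-1} - 1} < \alpha \sum_{j=0}^{\infty} h^j = \frac{\alpha}{1 - h} = r,$$
which finishes the induction.
To prove that $\{x_n\}_{n=0}^{\infty}$ converges, let $m \geq n$. Then
$$|x_m - x_n| \leq \sum_{k=n+1}^{m} |x_k - x_{k-1}| \leq \alpha \sum_{k=n+1}^{m} h^{2^{k-1} - 1} = \alpha \sum_{j=0}^{m-n-1} h^{2^n - 1} h^{2^n (2^j - 1)} \leq \alpha h^{2^n - 1} \sum_{j=0}^{\infty} \left( h^{2^n} \right)^j = \alpha h^{2^n - 1} \frac{1}{1 - h^{2^n}}.$$
Because $0 < h < 1$ the bound $\alpha h^{2^n - 1} \frac{1}{1 - h^{2^n}}$ goes to zero as $n$ goes to infinity, so
$\{x_n\}_{n=0}^{\infty}$ is a Cauchy sequence. (Mentally fill in the argument.) In particular, $\{x_n\}_{n=0}^{\infty}$
converges to a number $u$, which by Theorem 13.10 must satisfy $f(u) = 0$. Letting $m$
go to infinity in the above estimate also shows that $|u - x_n| \leq \alpha h^{2^n - 1} \frac{1}{1 - h^{2^n}}$, which
finishes the proof.
∎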
Theorem 13.14 shows that once Newton's method is "close enough" to a zero of
a function, it converges quite rapidly. Indeed, near a zero $u$ of a continuously differentiable function $f$, the hypotheses of Theorem 13.14 can be satisfied if $f'(u) \neq 0$,
because we can make $\alpha$ small by starting near $u$. For comparison with the bisection
method, suppose, for argument's sake, $\alpha = \beta = \gamma = 1$. Then $h = \frac{1}{2}$ and
$|u - x_n| \leq \frac{1}{2^{2^n - 1}} \cdot \frac{1}{1 - 2^{-2^n}}$, which means that we need to execute Newton's
method six times to obtain an approximation that gives the first 15 digits behind
the decimal point of u.
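The step counts quoted for the two methods can be reproduced with exact rational arithmetic. The following sketch evaluates the bound from part 3 of Theorem 13.14 with $\alpha = 1$ and $h = \frac{1}{2}$, and the bisection bound from Example 13.8 with $b - a = 1$; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def newton_bound(n, alpha=Fraction(1), h=Fraction(1, 2)):
    """alpha * h^(2^n - 1) / (1 - h^(2^n)), the bound in Theorem 13.14, part 3."""
    return alpha * h ** (2 ** n - 1) / (1 - h ** (2 ** n))


def bisection_bound(n, length=Fraction(1)):
    """(b - a) / 2^(n - 1), the bound from Example 13.8."""
    return length / 2 ** (n - 1)


def first_step_below(bound, tolerance):
    """Return the first n >= 1 with bound(n) < tolerance."""
    n = 1
    while bound(n) >= tolerance:
        n += 1
    return n


if __name__ == "__main__":
    tolerance = Fraction(1, 10 ** 15)
    print("Newton steps:   ", first_step_below(newton_bound, tolerance))
    print("Bisection steps:", first_step_below(bisection_bound, tolerance))
```

Using `Fraction` avoids any rounding in the comparison, so the step counts depend only on the bounds themselves.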
Another nice feature of Theorem 13.14 is that it can be generalized to several variables (see Exercise 17-44).
Finally, note that even though Newton's method is only applicable to differentiable
functions, Exercise 13-13 shows that it can be modified to provide a method that is
applicable to all functions.
Exercises
13-10. Let $f$ be a continuously differentiable function. Prove that if the sequence generated by Newton's
method converges to a limit $L$ in the domain of $f$ and $f'(L) = 0$, then $f(L) = 0$.
13-11. Let $q > 1$, let $F : (a, b) \to \mathbb{R}$ be a differentiable function so that $|F'(x)| > q$ for all $x \in (a, b)$
and let $p \in (a, b)$ be so that $F(p) = p$. Prove that $p$ is the only fixed point of $F$ in $(a, b)$ and that
for all $x_0 \in (a, b) \setminus \{p\}$ the sequence generated by the recursive equation $x_{n+1} := F(x_n)$ terminates
after finitely many steps with a value that is not in $(a, b)$.
13-12. Let $f : (a, b) \to \mathbb{R}$ be twice differentiable.
(a) Let $z \in (a, b)$ be so that $f(z) = 0$, $f'(x) > 0$ for all $x \in (z, b)$ (that is, $f$ is increasing on
$(z, b)$), and so that $f''(x) > 0$ for all $x \in (z, b)$ (that is, $f$ is concave up on $(z, b)$). Prove that
the sequence generated by Newton's method started at any $x_0 \in (z, b)$ converges to $z$.
Hint. Prove that $z \leq x_{n+1} \leq x_n$ for all $n \in \mathbb{N}$.
Note. Figure 27 provides a geometric visualization of the claim in this exercise.
Second note. In particular, it is allowed that $f'(z) = 0$ in this exercise.
(b) Prove that if $f''$ is continuous, then for each $z \in (a, b)$ with $f(z) = f'(z) = 0$ and $f''(z) \neq 0$
there is a $\delta_z > 0$ so that for every starting point $x_0 \in (z - \delta_z, z + \delta_z)$ the sequence generated
by Newton's method converges to $z$.
13-13. Let $f$ be a function, let $x_0, x_1 \in \mathbb{R}$ with $f(x_0) \neq f(x_1)$ and consider the recursively defined
sequence $x_{n+1} := x_n - f(x_n) \frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$.
(a) Show that the recursive formula is obtained by taking the equation of the secant line of $f$
through $(x_{n-1}, f(x_{n-1}))$ and $(x_n, f(x_n))$ and computing its unique zero.
(b) Prove that if the sequence generated with this method converges to $L$ and $\frac{f(z) - f(x)}{z - x}$ is
bounded for $x, z$ near $L$, then $f(L) = 0$.
(c) Prove that if $z$ is a zero of $f$, $z < x_1 < x_0$, $f$ is twice differentiable on $(z - \delta, x_0 + \delta)$ for
some $\delta > 0$, and $f$ is increasing and concave up on $(z - \delta, x_0 + \delta)$ (that is, $f''(x) > 0$ for all
$x \in (z - \delta, x_0 + \delta)$), then the sequence generated by this method converges to $z$.
Hint. Prove that $z \leq x_{n+1} \leq x_n$ for all $n \in \mathbb{N}$.
(d) Prove that in a situation as in part 13-13c the sequence $\{x_n\}_{n=0}^{\infty}$ converges at least as fast to $z$
‘x
iz x,L o x,” for
as the sequence x N
Hint. Prove that 5
generated with Newton’s method and
5
xt =
XO.
all n
13-14. Explain why $x_0 = 1$ is not a useful starting point for finding the zeroes of $f(x) = x^3 - 3x + 4$ with
Newton's method.
13-15. Explain why $x_0 = 1$ is not a useful starting point for using Newton's method to find the zeroes of
the function $f(x) = x^6 - 4x^2 - x + 1$. Sketch a rough graph of $f$ and of the tangent lines used to
compute $x_1$ and $x_2$ to illustrate your point.
13-16. For the function $f(x) = x^3 - 5x - 5$, execute Newton's method started with $x_0 = -2$. Use a
calculator or a computer.
(a) Find $x_1, x_2, x_3, x_4$.
(b) Find the first $n$ such that your computer shows $x_n = x_{n+1}$. Explain why $n$ is so large.
13-17. Apply Newton's method to $f(x) = 6x^4 - 18x^2 - 6x + 1$ with $x_0 = -1$. Explain why the limit is
not the zero of $f$ that is closest to the starting point.
13-18. The limit L computed with Newton’s method does not always give f(L)= 0.
+
Apply Newton’s method with xo = 1. Use a com(a) Let f ( x ) = (lO1oo
los0) x puter and call the apparent limit L .
(b) Compute $f(L)$. Is $f(L) \approx 0$?
(c) Compute the zero of $f$ exactly and compare it to $L$. Is $L$ close to the zero?
(d) Explain why Newton's method cannot produce a better approximation to the actual zero of $f$.
13-19. Square roots.
(a) Use Newton's method and the fact that $\sqrt{a}$ is the solution of the equation $x^2 - a = 0$ to devise
a recursive method to approximate square roots.
Note. This is one algorithm that is used in computers to approximate square roots.
(b) Prove that for any $x_0 > 0$ the sequence generated in part 13-19a converges to $\sqrt{a}$ and for any
$x_0 < 0$ the sequence generated in part 13-19a converges to $-\sqrt{a}$.
13.3 Numerical Integration
We conclude our introduction to numerics with numerical integration. Although the
Darboux and Lebesgue integrals are useful, even essential, for the development of integration theory, the Riemann integral and the Fundamental Theorem of Calculus are the
best tools for numerical considerations. The key to numerical integration is to replace
the function we want to integrate with a function that is close to it and which can easily
be integrated. The simplest geometric idea is to choose points on the function and let
a polynomial go through these points. This section will focus solely on this idea. For
other approaches to numerical integration, consider [28].
Definition 13.15 Let $x_0, \ldots, x_n \in \mathbb{R}$ with $x_i \neq x_k$ for $i \neq k$ and define the polynomial
$L_i(x) := \prod_{k \neq i} \frac{x - x_k}{x_i - x_k}$. Then $L_i$ is called a Lagrange polynomial.
Theorem 13.16 Lagrange's Interpolation Formula. Let $n \in \mathbb{N}$, $x_0, \ldots, x_n \in \mathbb{R}$
with $x_i \neq x_k$ for $i \neq k$, and let $f_0, \ldots, f_n \in \mathbb{R}$. Then the polynomial defined by
$P_n(x) := \sum_{i=0}^{n} f_i L_i(x)$ is the unique polynomial of degree $\leq n$ such
that for all $i = 0, \ldots, n$ we have $P_n(x_i) = f_i$.
Proof. The equation follows easily from the fact that for all $j \in \{0, \ldots, n\}$ we have
$L_i(x_j) = \begin{cases} 0, & \text{if } j \neq i, \\ 1, & \text{if } j = i. \end{cases}$ Regarding uniqueness, suppose that $Q$ was
another polynomial of degree $\leq n$ with $Q(x_j) = f_j$ for all $j = 0, \ldots, n$. Then $P_n - Q$
is a polynomial of degree $\leq n$ with $n + 1$ zeroes. By the Fundamental Theorem of
Algebra (see Exercise 16-74 for a proof), this means that $P_n - Q = 0$ or $P_n = Q$.
∎
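Lagrange's Interpolation Formula can be evaluated directly from Definition 13.15 and Theorem 13.16. The sketch below is a straightforward transcription, with function names chosen here for illustration; it is not tuned for efficiency or numerical stability.

```python
def lagrange_basis(nodes, i, x):
    """L_i(x): the product over k != i of (x - x_k) / (x_i - x_k)."""
    value = 1.0
    for k, xk in enumerate(nodes):
        if k != i:
            value *= (x - xk) / (nodes[i] - xk)
    return value


def interpolate(nodes, values, x):
    """P_n(x): the sum over i of f_i * L_i(x)."""
    return sum(fi * lagrange_basis(nodes, i, x) for i, fi in enumerate(values))


if __name__ == "__main__":
    nodes = [0.0, 1.0, 2.0]
    values = [x * x + 1.0 for x in nodes]  # samples of f(x) = x^2 + 1
    print(interpolate(nodes, values, 1.5))  # f has degree 2, so P_2 reproduces f
```

Because the sampled function is itself a polynomial of degree at most 2, the interpolating polynomial through three nodes agrees with it everywhere, up to rounding.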
With the polynomial $P_n$ going through the points $(x_i, f_i)$, the most natural way to
approximate a function $f$ is to set $f_i = f(x_i)$. We first note that if we start with a
polynomial $f$ of sufficiently low degree, then we simply get $f$ back.
Corollary 13.17 Let $f$ be a polynomial of degree $\leq n$ and let $P_n$ be a polynomial
computed as in Theorem 13.16 with numbers $f_i = f(x_i)$ for distinct numbers $x_i \in \mathbb{R}$,
$i = 0, \dots, n$. Then $P_n(x) = f(x)$ for all $x \in \mathbb{R}$.
Proof. Both $P_n$ and $f$ are polynomials of degree $\leq n$ that go through the points
$(x_i, f(x_i))$. The equality follows from the uniqueness of $P_n$.
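The interpolation formula and the corollary can be checked on a small example. The following is a minimal sketch, not part of the original text: the function names `lagrange_basis` and `interpolate` and the sample polynomial are illustrative assumptions, and exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction

def lagrange_basis(xs, i, x):
    """Value at x of the Lagrange polynomial L_i for the distinct nodes xs."""
    value = Fraction(1)
    for k, xk in enumerate(xs):
        if k != i:
            value *= Fraction(x - xk) / Fraction(xs[i] - xk)
    return value

def interpolate(xs, fs, x):
    """Value at x of P_n(x) = sum_i f_i * L_i(x)."""
    return sum(f * lagrange_basis(xs, i, x) for i, f in enumerate(fs))

# Hypothetical sample data: f has degree 3, so four distinct nodes (n = 3)
# must reproduce f exactly, as the corollary on low-degree polynomials states.
f = lambda x: x**3 - 2 * x + 1
nodes = [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(2)]
values = [f(x) for x in nodes]
print(interpolate(nodes, values, Fraction(3)), f(3))  # both numbers agree
```

The basis property $L_i(x_j) = 0$ for $j \neq i$ and $L_i(x_i) = 1$ is what makes `interpolate` pass through the given points.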
To obtain a convenient integration formula, we need to express the integral of $P_n$
in terms of the $f_i = f(x_i)$. Interestingly enough, as long as the $x_i$ are equidistant, the
coefficients in the formula do not depend on the interval over which we integrate.
Proposition 13.18 Newton-Cotes formulas. Let $n \in \mathbb{N}$ and let $[c, d]$ be an interval. Define $h := \frac{d-c}{n}$, for $i = 0, \dots, n$ let $x_i := c + ih$, let $f_i \in \mathbb{R}$ and
$$a_i := \int_0^n \prod_{k=0,\, k \neq i}^{n} \frac{t - k}{i - k} \, dt.$$
Then
$$\int_c^d P_n(x) \, dx = h \sum_{i=0}^{n} f_i a_i.$$
Proof. The key to the proof is the substitution $t := \frac{x - c}{h}$.
Figure 28: Visualization of the approximation of an integral over two subintervals
with Riemann sums with evaluation at left endpoints (a), trapezoidal sums (b) and
Simpson's Rule (c).
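After the substitution $t = (x - c)/h$ the nodes become $0, 1, \dots, n$, so each coefficient is the integral over $[0, n]$ of the Lagrange polynomial for integer nodes. The sketch below computes these coefficients exactly under that assumption; the function name `newton_cotes_coefficients` is hypothetical and the polynomial arithmetic is done by hand with rational numbers.

```python
from fractions import Fraction

def newton_cotes_coefficients(n):
    """Coefficients a_0..a_n with (integral of P_n over [c,d]) = h * sum(f_i * a_i)."""
    coeffs = []
    for i in range(n + 1):
        poly = [Fraction(1)]                    # coefficients, lowest degree first
        for k in range(n + 1):
            if k == i:
                continue
            scale = Fraction(1, i - k)
            new = [Fraction(0)] * (len(poly) + 1)
            for deg, c in enumerate(poly):      # multiply poly by (t - k) / (i - k)
                new[deg + 1] += c * scale
                new[deg] -= c * k * scale
            poly = new
        # integrate the polynomial term by term from 0 to n
        coeffs.append(sum(c * Fraction(n) ** (deg + 1) / (deg + 1)
                          for deg, c in enumerate(poly)))
    return coeffs

print([str(a) for a in newton_cotes_coefficients(1)])
print([str(a) for a in newton_cotes_coefficients(2)])
```

Because the formula is exact for constants, the coefficients for a given $n$ always add up to $n$, which is a convenient sanity check.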
The most obvious way to approximate an integral is to approximate a function
$f : [a, b] \to \mathbb{R}$ with a polynomial of sufficiently large degree and then use the appropriate Newton-Cotes formula. However, for large $n$ (for example, $n = 8$) some of the $a_i$ become negative
(see Exercise 13-20), which leads to problems with cancellations of digits. Moreover,
the $a_i$ are hard to compute for large $n$. Therefore an interval $[a, b]$ is usually first partitioned into shorter subintervals, the formula from Proposition 13.18 is applied on each
subinterval and then the results are added. This type of integration formula is called a
composite integration formula.
For example, for $n = 1$, we obtain $a_0 = \frac{1}{2}$ and $a_1 = \frac{1}{2}$. If we now partition the
interval $[a, b]$ into $N$ subintervals of length $\Delta x := \frac{b-a}{N}$ and apply Proposition 13.18
on each subinterval we obtain the approximation
$$\int_a^b f(x) \, dx \approx \Delta x \left( \frac{f(a)}{2} + \sum_{k=1}^{N-1} f(a + k \Delta x) + \frac{f(b)}{2} \right),$$
which is called the trapezoidal rule, also shown in Figure 28(b). In the trapezoidal
rule, we need to evaluate the function $N + 1$ times.
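The trapezoidal rule weights the two endpoints with $\frac{1}{2}$ and every interior point with $1$. A minimal sketch follows; the function name `trapezoidal` and the test integrand are illustrative choices, not taken from the text.

```python
import math

def trapezoidal(f, a, b, N):
    """Composite trapezoidal rule with N subintervals, that is, N + 1 evaluations."""
    dx = (b - a) / N
    inner = sum(f(a + k * dx) for k in range(1, N))
    return dx * (f(a) / 2 + inner + f(b) / 2)

# The integral of sin over [0, pi] is exactly 2.
print(trapezoidal(math.sin, 0.0, math.pi, 100))
```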
The effort in numerical integration is proportional to the number of times the function $f$ is evaluated. To compare the performance of two numerical integration formulas, it is thus important that both formulas evaluate the function equally many times.
For this reason, we will construct composite Newton-Cotes formulas so that $f$ is evaluated $N + 1$ times.
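For $n = 2$, note that each application of the Newton-Cotes formula requires two additional evaluations of the function and hence covers two subintervals of length $\Delta x$. Thus for $n = 2$ we must demand that $N$ is even.
With $a_0 = \frac{1}{3}$, $a_1 = \frac{4}{3}$ and $a_2 = \frac{1}{3}$ we obtain the approximation formula
$$\int_a^b f(x) \, dx \approx \frac{\Delta x}{3} \left( f(a) + 4 \sum_{\substack{k=1 \\ k \text{ odd}}}^{N-1} f(a + k \Delta x) + 2 \sum_{\substack{k=2 \\ k \text{ even}}}^{N-2} f(a + k \Delta x) + f(b) \right),$$
which is called Simpson's Rule, also shown in Figure 28(c). In Exercise 13-21, the
reader will state composite integration formulas based on Newton-Cotes formulas for
$n = 3, \dots, 6$.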
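Simpson's Rule alternates the weights $4$ and $2$ on the interior points, with weight $1$ at both endpoints and an overall factor $\Delta x / 3$. The sketch below assumes an even number $N$ of subintervals, as the text demands; the function name `simpson` is an illustrative choice.

```python
import math

def simpson(f, a, b, N):
    """Composite Simpson's Rule with an even number N of subintervals."""
    if N % 2 != 0:
        raise ValueError("N must be even")
    dx = (b - a) / N
    odd = sum(f(a + k * dx) for k in range(1, N, 2))    # weight 4
    even = sum(f(a + k * dx) for k in range(2, N, 2))   # weight 2
    return dx / 3 * (f(a) + 4 * odd + 2 * even + f(b))

# The integral of sin over [0, pi] is exactly 2.
print(simpson(math.sin, 0.0, math.pi, 100))
```

With the same number of function evaluations, this is typically much closer to the exact value than the trapezoidal rule for smooth integrands.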
We now turn to the error analysis once more. Peano’s error representation (Theorem 13.19) shows that a more “abstract” point-of-view can have benefits for concrete
tasks, which makes it a perfect conclusion for Part I of this text and a lead-in for the
more abstract Part II. Consider that the integral as well as the Newton-Cotes formulas
are functions themselves. They map functions on an interval to real numbers. Moreover, the integral as well as the Newton-Cotes formulas are linear. That is, sums and
constant factors can be moved through the integral (see Theorems 5.8 and 9.25) and
also through the Newton-Cotes formulas (easy computation). Linear functions on vector spaces (like vector spaces of integrable functions) will be important in abstract
analysis (see Chapter 17). Because we adopt this more abstract point-of-view, the error representation actually is valid for any numerical approximation of integrals that is
linear, that gives exact results for low-degree polynomials and that can be “moved into
the integral” as described in the hypothesis of Theorem 13.19 (see [28] for examples
beyond the Newton-Cotes formulas).
By Corollary 13.17 the nth Newton-Cotes formula is exact for polynomials up to
degree $n$. Because it only evaluates the function at select points $x_k$, multiplies the
values with numbers, and adds the results, it is linear and it can be moved into the
integral. Hence, Peano’s error representation applies to the nth Newton-Cotes formula.
Although the hypotheses for Peano’s error representation look complicated, the proof
shows that they are just what is needed to get an estimate.
Finally, Peano’s error representation also shows that we need to establish more
abstract results to fully justify more concrete results, like error estimates. In the proof
of Peano’s error representation, we work with double integrals and we reverse their
order. Formally, we have not proved yet that this is possible. Fubini’s Theorem (see
Theorem 14.66) will show that this reversal is indeed allowed. For the specific case
of double Lebesgue integrals, Fubini’s Theorem is stated in Exercise 16-80. Because
Theorem 13.19 is not used to prove these results, we will reverse the order of integration
in the proof of Theorem 13.19, anticipating that this can be justified.
Theorem 13.19 Peano's error representation. Let $n \in \mathbb{N}$, $c < d$ and let $F(\cdot)$ be
an integration formula which is linear, that is, $F(\alpha f + \beta g) = \alpha F(f) + \beta F(g)$ for
all $\alpha, \beta \in \mathbb{R}$ and $f, g : [c, d] \to \mathbb{R}$, which for polynomials $p$ of degree at most
$n$ gives $F(p) = \int_c^d p(x) \, dx$, and which for all continuous functions $g : [c, d] \to \mathbb{R}$ satisfies
$$F\left( \int_c^d g(t) (x - t)^n \mathbf{1}_{x - t \geq 0} \, dt \right) = \int_c^d g(t) F\left( (x - t)^n \mathbf{1}_{x - t \geq 0} \right) dt.$$
For every Riemann integrable function $f : [c, d] \to \mathbb{R}$, let $R(f) := F(f) - \int_c^d f(x) \, dx$ be
the error of the numerical integration and let $K(t) := \frac{1}{n!} R\left( (\cdot - t)^n \mathbf{1}_{\cdot - t \geq 0} \right)$. Then for
every function $f$ that for some $\delta > 0$ is $n + 1$ times continuously differentiable on the
interval $(c - \delta, d + \delta)$ we have
$$R(f) = \int_c^d f^{(n+1)}(t) K(t) \, dt.$$
The function $K$ is also called the Peano kernel.
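Proof. By Taylor's Formula, we obtain the following for all $x \in [c, d]$.
$$\begin{aligned}
f(x) &= T_n(x) + (x - c)^{n+1} \int_0^1 f^{(n+1)}\big(c + u(x - c)\big) \frac{1}{n!} (1 - u)^n \, du \\
&= T_n(x) + \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \geq 0} \, dt,
\end{aligned}$$
where $T_n(x)$ is the $n$th Taylor polynomial of $f$ at $c$. Because the integration formula is
linear, we infer for all functions $g, h$ that
$$\begin{aligned}
R(g + h) &= F(g + h) - \int_c^d g + h \, dx \\
&= F(g) - \int_c^d g \, dx + F(h) - \int_c^d h \, dx \\
&= R(g) + R(h).
\end{aligned}$$
Now because the integration formula gives exact results for polynomials of degree
$\leq n$ we know that $R(T_n) = 0$. Because the integral in Taylor's formula above is a
function of $x$ we obtain the following.
$$\begin{aligned}
R(f) &= R(T_n) + R\left( \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \geq 0} \, dt \right) \\
&= F\left( \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \geq 0} \, dt \right) - \int_c^d \int_c^d f^{(n+1)}(t) \frac{1}{n!} (x - t)^n \mathbf{1}_{x - t \geq 0} \, dt \, dx \\
&= \int_c^d f^{(n+1)}(t) \frac{1}{n!} \left[ F\left( (x - t)^n \mathbf{1}_{x - t \geq 0} \right) - \int_c^d (x - t)^n \mathbf{1}_{x - t \geq 0} \, dx \right] dt \\
&= \int_c^d f^{(n+1)}(t) K(t) \, dt.
\end{aligned}$$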
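Peano's error representation can be checked numerically for the simplest case $n = 1$, the trapezoidal formula on a single interval. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the kernel is computed directly from its definition $K(t) = R\big((\cdot - t)\mathbf{1}_{\cdot - t \geq 0}\big)/1!$, and the integral of $f''K$ is approximated with a fine midpoint sum rather than evaluated exactly.

```python
import math

def trapezoid_formula(g, c, d):
    """Single-interval trapezoidal formula F(g), the Newton-Cotes formula with n = 1."""
    return (d - c) / 2 * (g(c) + g(d))

def peano_kernel(t, c, d):
    """K(t) = F((. - t)_+) - integral of (x - t)_+ over [c, d], for c <= t <= d."""
    g = lambda x: max(x - t, 0.0)
    exact_integral = (d - t) ** 2 / 2
    return trapezoid_formula(g, c, d) - exact_integral

def midpoint_sum(g, c, d, m=20000):
    """Fine midpoint Riemann sum, used here only to approximate the kernel integral."""
    h = (d - c) / m
    return h * sum(g(c + (k + 0.5) * h) for k in range(m))

c, d = 0.0, 1.0
# Error R(f) = F(f) - integral of f for f = exp, whose second derivative is exp.
error = trapezoid_formula(math.exp, c, d) - (math.exp(d) - math.exp(c))
via_kernel = midpoint_sum(lambda t: math.exp(t) * peano_kernel(t, c, d), c, d)
print(error, via_kernel)  # the two numbers agree up to the quadrature error
```

For this formula the kernel works out to $K(t) = (d - t)(t - c)/2$, which is nonnegative on $[c, d]$.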
Although Peano's error representation is very versatile, we need something more
concrete for applications. One key weakness of Peano's error representation is that
it depends in a rather complicated way on the interval of integration. The next result
makes this dependency more manageable for the Newton-Cotes formulas by reducing
the error to a power of half the length of the interval, times a constant, times a value of
an appropriate derivative.
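Theorem 13.20 Let $n \in \mathbb{N}$ and let $f : [c, d] \to \mathbb{R}$ be $n + 1$ times continuously differentiable if $n$ is odd and $n + 2$ times continuously differentiable if $n$ is even. Let $R_n^{[c,d]}(f)$ be
the error when approximating the integral $\int_c^d f(t) \, dt$ with the $n$th Newton-Cotes formula $F_n(f)$, that is, $R_n^{[c,d]}(f) = F_n(f) - \int_c^d f(t) \, dt$. For odd $n$, there is a $\xi \in (c, d)$ so that
$$R_n^{[c,d]}(f) = \left( \frac{d - c}{2} \right)^{n+2} \frac{f^{(n+1)}(\xi)}{(n + 1)!} R_n^{[-1,1]}\left( x^{n+1} \right).$$
For even $n$, there is a $\xi \in (c, d)$ so that
$$R_n^{[c,d]}(f) = \left( \frac{d - c}{2} \right)^{n+3} \frac{f^{(n+2)}(\xi)}{(n + 2)!} R_n^{[-1,1]}\left( x^{n+2} \right).$$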
Proof. For odd $n$, by its definition the $n$th Newton-Cotes formula gives exact results
for polynomials of degree $n$. For even $n$, the $n$th Newton-Cotes formula even gives
exact results for polynomials of degree $n + 1$ (see Exercise 13-22). Because the proofs
for even and odd $n$ are very similar, we will assume throughout that $n$ is fixed and odd.
The idea for the proof is a substitution that changes the domain of the integral from
$[c, d]$ to $[-1, 1]$.
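For all real numbers $c < d$, let $K_n^{[c,d]}$ denote the Peano kernel for the $n$th Newton-Cotes formula for the interval $[c, d]$. This Peano kernel can be reduced to the Peano
kernel $K_n^{[-1,1]}$ on $[-1, 1]$ as follows. A simple substitution (similar to the computation
below, see Exercise 13-23) shows that $K_n^{[c,d]}\left( t + \frac{c+d}{2} \right) = K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}(t)$ for all
$t \in \left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]$. Moreover, for all $x \in [-1, 1]$ we have
$$\begin{aligned}
n! \, K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}\left( \frac{d-c}{2} x \right)
&= F_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}\left( \left( \cdot - \frac{d-c}{2} x \right)^n \mathbf{1}_{\cdot - \frac{d-c}{2} x \geq 0} \right) - \int_{-\frac{d-c}{2}}^{\frac{d-c}{2}} \left( u - \frac{d-c}{2} x \right)^n \mathbf{1}_{u - \frac{d-c}{2} x \geq 0} \, du \\
&= \left( \frac{d-c}{2} \right)^{n+1} \left[ F_n^{[-1,1]}\left( (\cdot - x)^n \mathbf{1}_{\cdot - x \geq 0} \right) - \int_{-1}^{1} (u - x)^n \mathbf{1}_{u - x \geq 0} \, du \right] \\
&= \left( \frac{d-c}{2} \right)^{n+1} n! \, K_n^{[-1,1]}(x).
\end{aligned}$$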
To complete the error estimate, we need to use that the Peano kernels $K_n^{[c,d]}(t)$
for the Newton-Cotes formulas on $[c, d]$ are of constant sign. The proof of this fact is
geometrically obvious for $n = 1$ (see Exercise 13-24b) and still manageable for $n = 2$
(see Exercise 13-24c), but the algebra becomes increasingly tedious. Because we will
focus only on $n = 1$ and $n = 2$ (trapezoidal and Simpson's Rule) in the examples,
the presentation remains (somewhat) self-contained. For a general argument that also
holds for other integration formulas, consider [10], which uses matrix methods to prove
that the Peano kernel does not change its sign.
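Because the Peano kernel for the Newton-Cotes formulas does not change sign, by
the Mean Value Theorem for the Integral as in Exercise 8-21 there is a $\xi \in (c, d)$ so
that
$$\begin{aligned}
R_n^{[c,d]}(f) &= \int_c^d f^{(n+1)}(t) K_n^{[c,d]}(t) \, dt = f^{(n+1)}(\xi) \int_c^d K_n^{[c,d]}(t) \, dt \\
&= f^{(n+1)}(\xi) \int_{-\frac{d-c}{2}}^{\frac{d-c}{2}} K_n^{[c,d]}\left( u + \frac{c+d}{2} \right) du
= f^{(n+1)}(\xi) \int_{-\frac{d-c}{2}}^{\frac{d-c}{2}} K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}(u) \, du \\
&= f^{(n+1)}(\xi) \left( \frac{d-c}{2} \right)^{n+2} \int_{-1}^{1} K_n^{[-1,1]}(x) \, dx
= \left( \frac{d-c}{2} \right)^{n+2} \frac{f^{(n+1)}(\xi)}{(n+1)!} R_n^{[-1,1]}\left( x^{n+1} \right),
\end{aligned}$$
where the last step applies Theorem 13.19 on $[-1, 1]$ to the function $x^{n+1}$, whose $(n+1)$st derivative is the constant $(n+1)!$.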
Note that the proof of Theorem 13.20 (and hence its conclusion) will work for any
linear integration formula which can be moved into the integral, which gives exact
results for polynomials up to degree n and for which the Peano kernel does not change
sign. The reader will verify this for the Midpoint Rule in Exercise 13-25.
With Theorem 13.20 providing a bound for the error for individual Newton-Cotes
formulas, bounds for the error for composite Newton-Cotes formulas are now an easy
consequence.
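Corollary 13.21 Let $n \in \mathbb{N}$, $a < b$, $\delta > 0$ and let $f : (a - \delta, b + \delta) \to \mathbb{R}$
be a function. If $n$ is odd let $f$ be $n + 1$ times continuously differentiable and let
$M := \max \left\{ \left| f^{(n+1)}(x) \right| : x \in [a, b] \right\}$. If $n$ is even let $f$ be $n + 2$ times continuously differentiable and let $M := \max \left\{ \left| f^{(n+2)}(x) \right| : x \in [a, b] \right\}$. Let $C_n^{[a,b]}(f)$ be the absolute
value of the error when approximating the integral $\int_a^b f(t) \, dt$ with the $n$th composite
Newton-Cotes formula with $N = nj$ intervals, that is, $N + 1 = nj + 1$ evaluations,
where $j \in \mathbb{N}$. Then if $n$ is odd we have
$$C_n^{[a,b]}(f) \leq M \frac{(b - a)^{n+2}}{N^{n+1}} \cdot \frac{n^{n+1} \left| R_n^{[-1,1]}\left( x^{n+1} \right) \right|}{(n + 1)! \, 2^{n+2}},$$
and if $n$ is even we have
$$C_n^{[a,b]}(f) \leq M \frac{(b - a)^{n+3}}{N^{n+2}} \cdot \frac{n^{n+2} \left| R_n^{[-1,1]}\left( x^{n+2} \right) \right|}{(n + 2)! \, 2^{n+3}}.$$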
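Proof. If $n$ is odd, by Theorem 13.20 applied on each of the $j = \frac{N}{n}$ subintervals of length $n \frac{b-a}{N}$ we obtain
$$C_n^{[a,b]}(f) \leq j \left( \frac{n(b - a)}{2N} \right)^{n+2} \frac{M}{(n + 1)!} \left| R_n^{[-1,1]}\left( x^{n+1} \right) \right|
= M \frac{(b - a)^{n+2}}{N^{n+1}} \cdot \frac{n^{n+1} \left| R_n^{[-1,1]}\left( x^{n+1} \right) \right|}{(n + 1)! \, 2^{n+2}}$$
and a similar computation gives the result for even $n$.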
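The constant that multiplies $M (b-a)^{m+1} / N^m$ in these bounds, where $m = n + 1$ for odd $n$ and $m = n + 2$ for even $n$, is $n^m \left| R_n^{[-1,1]}(x^m) \right| / (m! \, 2^{m+1})$. The sketch below evaluates it exactly for $n = 1$ and $n = 2$, using the coefficients $a_i$ quoted earlier in this section; the names `COEFFS` and `error_constant` are hypothetical.

```python
from fractions import Fraction
from math import factorial

# Coefficients a_i quoted in the text for n = 1 and n = 2.
COEFFS = {1: [Fraction(1, 2), Fraction(1, 2)],
          2: [Fraction(1, 3), Fraction(4, 3), Fraction(1, 3)]}

def error_constant(n):
    """Constant in the composite error bound: n^m |R(x^m)| / (m! 2^(m+1))."""
    m = n + 1 if n % 2 == 1 else n + 2
    h = Fraction(2, n)                       # step size on [-1, 1]
    formula = h * sum(a * (-1 + i * h) ** m for i, a in enumerate(COEFFS[n]))
    remainder = formula - Fraction(2, m + 1)  # integral of x^m over [-1, 1], m even
    return Fraction(n) ** m * abs(remainder) / (factorial(m) * 2 ** (m + 1))

print(error_constant(1), error_constant(2))
```

The two values are the familiar constants of the trapezoidal rule and of Simpson's Rule.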
Note that in the error bounds in Corollary 13.21 only $M$ depends on the function $f$,
only $(b - a)$ depends on the interval and only $N$ depends on the number of evaluations.
The remainder, although it looks complicated, is merely a constant. Therefore, as
$N \to \infty$ the numerical approximation of an integral of an $n + 1$ times (if $n$ is odd) or
$n + 2$ times (if $n$ is even) continuously differentiable function with the $n$th composite
Newton-Cotes formula will converge to the actual integral. Of course, more concrete
formulas will be better for given fixed $n$. For the trapezoidal rule and Simpson's Rule,
we obtain the error bounds indicated below.
Corollary 13.22 Let $f : (a - \delta, b + \delta) \to \mathbb{R}$ be a function.
1. If $f$ is twice continuously differentiable, we have $C_1^{[a,b]}(f) \leq K \frac{(b - a)^3}{12 N^2}$, where
$K = \max \left\{ |f''(x)| : x \in [a, b] \right\}$. This is the error formula for the trapezoidal
rule with $N$ intervals.
2. If $f$ is four times continuously differentiable, we have $C_2^{[a,b]}(f) \leq C \frac{(b - a)^5}{180 N^4}$,
where $C = \max \left\{ \left| f^{(iv)}(x) \right| : x \in [a, b] \right\}$. This is the error formula for Simpson's Rule with $N$ intervals.
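Proof. For $n = 1$, we obtain
$$R_1^{[-1,1]}\left( x^2 \right) = 2 \left( \frac{1}{2} (-1)^2 + \frac{1}{2} \cdot 1^2 \right) - \int_{-1}^{1} x^2 \, dx = 2 - \frac{2}{3} = \frac{4}{3},$$
so that Corollary 13.21 gives the bound $K \frac{(b - a)^3}{N^2} \cdot \frac{1^2 \cdot \frac{4}{3}}{2! \, 2^3}$,
which proves $C_1^{[a,b]}(f) \leq K \frac{(b - a)^3}{12 N^2}$. The computation for Simpson's Rule is similar
and left to the reader as Exercise 13-26.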
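The two error bounds can be compared with actual errors on an integral whose value is known. The following is a minimal sketch under stated assumptions: the integrand $e^x$ on $[0, 1]$ and $N = 10$ are hypothetical choices, and since every derivative of $e^x$ is $e^x$, both maxima $K$ and $C$ equal $e$ on this interval.

```python
import math

def trapezoidal(f, a, b, N):
    dx = (b - a) / N
    return dx * (f(a) / 2 + sum(f(a + k * dx) for k in range(1, N)) + f(b) / 2)

def simpson(f, a, b, N):
    dx = (b - a) / N
    odd = sum(f(a + k * dx) for k in range(1, N, 2))
    even = sum(f(a + k * dx) for k in range(2, N, 2))
    return dx / 3 * (f(a) + 4 * odd + 2 * even + f(b))

a, b, N = 0.0, 1.0, 10
exact = math.e - 1.0
K = C = math.e                      # max of |f''| and |f''''| on [0, 1]

trap_error = abs(trapezoidal(math.exp, a, b, N) - exact)
simp_error = abs(simpson(math.exp, a, b, N) - exact)
trap_bound = K * (b - a) ** 3 / (12 * N ** 2)
simp_bound = C * (b - a) ** 5 / (180 * N ** 4)
print(trap_error, trap_bound)       # the error stays below the bound
print(simp_error, simp_bound)       # the error stays below the bound
```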
If the function, the interval and $N$ are given, the a posteriori error estimate is simply
a substitution into the formula (see Exercise 13-27). If the function, the interval and
a desired accuracy are given, the a priori estimate is obtained by demanding the error
bound is less than the desired accuracy and solving for N .
Example 13.23 Find the number of intervals needed to estimate $\int_{-1}^{1} e^{-\frac{x^2}{2}} \, dx$ with the
trapezoidal rule so that the error is less than $10^{-4}$.
First, note that with $f(x) = e^{-\frac{x^2}{2}}$ we have the derivatives $f'(x) = -x e^{-\frac{x^2}{2}}$ and
$f''(x) = \left( x^2 - 1 \right) e^{-\frac{x^2}{2}}$. The maximum of $|f''|$ is assumed at $x = 0$, so $K = 1$. Now
$K \frac{(b - a)^3}{12 N^2} < 10^{-4}$ means $\frac{2^3}{12 N^2} < 10^{-4}$ or $N^2 > \frac{2}{3} 10^4$. Hence, for $N$ we obtain
$N > 100 \sqrt{\frac{2}{3}} \approx 81.65$ and so $N = 82$ would work. Note that in the last step, we
always need to round up.
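The a priori estimate amounts to solving $K \frac{(b-a)^3}{12 N^2} < \varepsilon$ for $N$ and rounding up. A minimal sketch follows; the function name `intervals_for_trapezoid` is hypothetical, and the inputs $K = 1$, interval length $2$ and tolerance $10^{-4}$ are the values used in the computation of Example 13.23.

```python
import math

def intervals_for_trapezoid(K, length, tol):
    """Smallest integer N with K * length^3 / (12 * N^2) < tol."""
    threshold = math.sqrt(K * length ** 3 / (12 * tol))
    return math.floor(threshold) + 1     # strict inequality: round up past the threshold

N = intervals_for_trapezoid(K=1.0, length=2.0, tol=1e-4)
print(N, 2.0 ** 3 / (12 * N ** 2))       # N and the resulting error bound
```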
Exercises
13-20. Compute the coefficients $a_i$ in the Newton-Cotes formulas for $n = 7$. Use a computer.
13-21. For each given $n$ compute $a_0, \dots, a_n$ for the Newton-Cotes formula. Then state the corresponding
composite integration formula for $N$ divisible by $n$. Finally, compute the error for the Newton-Cotes
formula and for the composite integration formula. For the computation of the $a_i$, a computer is
recommended.
(a) $n = 3$. This is called the $\frac{3}{8}$-rule.
(b) $n = 4$. This is called Milne's Rule.
(c) $n = 5$. This rule does not have a specific name.
(d) $n = 6$. This is called Weddle's Rule.
13-22. Prove as follows that for even $n$ the $n$th Newton-Cotes formula gives accurate results for polynomials
of degree $n + 1$.
(a) Use the linearity property to prove that if the $n$th Newton-Cotes formula gives the exact integral for one polynomial of degree $n + 1$, then it gives the exact integral for all of them.
(b) Prove that the coefficients $a_i$ of the $n$th Newton-Cotes formula satisfy $a_i = a_{n-i}$ for all
$i \in \{0, \dots, n\}$.
(c) Prove that if $n$ is even, the $n$th Newton-Cotes formula gives the exact integral for the polynomial $p(x) = \left( x - \frac{c+d}{2} \right)^{n+1}$ and conclude that the $n$th Newton-Cotes formula gives the
exact integral for all polynomials of degree $n + 1$.
(d) To illustrate that the result does not work for odd $n$, prove that the first Newton-Cotes formula
does not give the integral of $f(x) = x^2$ on $[-1, 1]$.
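13-23. Prove that for all $t \in \left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]$ we have $K_n^{[c,d]}\left( t + \frac{c+d}{2} \right) = K_n^{\left[ -\frac{d-c}{2}, \frac{d-c}{2} \right]}(t)$.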
13-24. The sign of the Peano kernel for the Newton-Cotes formulas
(a) Explain why proving that the Peano kernel $K_n^{[-1,1]}$ for the $n$th Newton-Cotes formula on
$[-1, 1]$ does not change its sign on $[-1, 1]$ proves that all Peano kernels $K_n^{[c,d]}$ for the $n$th
Newton-Cotes formula on $[c, d]$ do not change signs on $[c, d]$.
(b) Prove that for $n = 1$ the Peano kernel $K_1^{[-1,1]}$ is nonnegative on $[-1, 1]$.
Hint. Direct computation of the integrals, using that the approximating polynomial is a straight
line.
(c) Prove that for $n = 2$ the Peano kernel $K_2^{[-1,1]}$ is nonnegative on $[-1, 1]$.
Hint. This computation is more tedious. Make sure you use $(x - t)^3 \mathbf{1}_{x - t \geq 0}$ and do separate
computations for $t > 0$ and $t \leq 0$.
(d) In a computer algebra system implement a short program that graphs the Peano kernel $K_n^{[-1,1]}$
on $[-1, 1]$ for arbitrary $n$.
Note. While these graphs do not formally prove that the Peano kernel does not change its sign,
the graphs and the implementation are instructive.
13-25. Let $f : [a, b] \to \mathbb{R}$ be twice continuously differentiable. Prove that when approximating $\int_a^b f(x) \, dx$
with the midpoint rule, which uses Riemann sums with evaluation in the middle of the interval to approximate the integral, the error is bounded by $K \frac{(b - a)^3}{24 n^2}$, where $K = \max \left\{ |f''(x)| : a \leq x \leq b \right\}$.
Hint. Use Peano's representation of the error and use the fact that the midpoint rule is accurate for
polynomials of degree $n \leq 1$. Then emulate the rest of the proof for the Newton-Cotes formulas,
including the proof that the Peano kernel does not change its sign.
13-26. Prove part 2 of Corollary 13.22.
13-27. In each part, give an upper bound for the error of the approximation of the given integral with the
given rule and the given number of intervals.
e - g dx,trapezoidal rule, N = 20
(a)
(b)
(c)
(dj
/-:
e - g dx,Simpson’s Rule, N = 20
l5
sin ( x2 ) d x , trapezoidal rule, N = 50
s,’
sin ( x2 ) d x , Simpson’s Rule, N = 50
13-28. Compute the number N of intervals needed to approximate the integral with the indicated rule so
that the error is at most the given v .
(a)
(bj
(cj
l:
e - g dx,Simpson’s Rule, u 5
l2
i4
dx,trapezoidal rule, u i. lo-’
In(xj& d x , Simpson’s Rule, v i. lo-*
13-29. Approximate the integral with the indicated rule so that the error is at most the given u
Hiizt. Use the error bounds in Corollary 13.22 to compute the number of intervals. Then compute the
requisite sum with a computer,
(a)
(bj
(c)
ll
Ll‘
L2
1
1
2/5;;e
-2
2 d x , trapezoidal rule. u 5 lo-*
1
2
Z e - T dx,Simpson’s Rule, v 5
2
z
sin
(e)
1
e
(:)
-2
2 d x , Simpson’s Rule, u 5 lops
dx,trapezoidal rule, u 5
10
(0
1‘
f i e X d x , Simpson’s Rule, v
5
I
13-30. Let the function $f : [a, b] \to \mathbb{R}$ be continuously differentiable. Use Euler's Summation Formula
(see Exercise 12-21) and an appropriate substitution to prove that the approximations of the integral
$\int_a^b f(x) \, dx$ with the trapezoidal rule with $N$ trapezoids converge to $\int_a^b f(x) \, dx$ as $N \to \infty$.
Part II
Analysis in Abstract Spaces
Chapter 14
Integration on Measure Spaces
Throughout this second part of the text, we need to integrate multivariable functions.
Therefore we start our investigation of the more abstract realms of analysis with integration. Recall that the fundamental idea behind Lebesgue integration was the partition
of the range (see Figure 22). Moreover, when we worked with Lebesgue measure, we
were concerned with properties stated in terms of sets. We rarely used the fact that
these sets were subsets of the real numbers. Therefore integration of functions from a
more general space to $[-\infty, \infty]$ should be similar to the theory developed in Chapter
9. Indeed, Sections 14.1-14.4 basically recast and sort the results of Chapter 9 to show
how these ideas generalize to arbitrary measure spaces. In particular, this generalization makes it possible to talk about Lebesgue integration in $\mathbb{R}^d$. Sections 14.5 and 14.6
provide fundamental results on sequences of integrable functions. These results are the
cornerstone for the proofs of many important facts about integrable functions. Finally,
Sections 14.7 and 14.8 show how measures on products (such as $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$) are
related to measures on the factors.
In this chapter, we experience the full power of abstraction for the first time. Once
the abstract core of concrete results for the real numbers is identified, the familiar
results from the real numbers can be established in much more general contexts, sometimes even with the same proof. It is important to realize, however, that this generalization comes at a price. We must very carefully check that we did not use any specific
properties of the real line in the proof of the concrete result. The most important property of the real line that is lost in the general setting is the linear ordering. Proofs and
results that depend on this ordering do not generalize easily. This is why the linear
ordering of the real line was used sparingly in the first part of the text.
14.1 Measure Spaces
To be most widely applicable, abstract integration is defined on sets equipped with a
structure that makes them a “measure space.” In this fashion, we do not need to worry
about details regarding the shape and dimension of the domain. The fundamental idea
for integration in arbitrary spaces is the same as for Lebesgue integration. Partition the
range and approximate the “area” or “volume” under the function with “generalized
rectangles” or “generalized boxes” whose bases are sets for which we can determine
the “measure” (see Figure 29). It was noted after Definition 9.4 that even on the real
line there are sets for which we cannot determine a sensible “measure” using outer
Lebesgue measure. Thus, it is not surprising that on a general set $M$ we need to consider the subset of the power set $\mathcal{P}(M)$ that contains all subsets for which we can
determine the “measure.” The properties that define these subsets are directly inspired
by properties of Lebesgue measurable sets (see Theorem 9.10).
Figure 29: For integration in more complicated spaces than the real line, we retain
the main idea from Lebesgue integration. We partition the range (here the z-axis) and
measure the size of the inverse images $A_k$ of intervals in the range (the z-axis). The
sum of these sizes times the corresponding heights should approximate the
“volume” under the graph of the function.
Definition 14.1 Let $M$ be a set. A subset $\Sigma \subseteq \mathcal{P}(M)$ is called a sigma algebra or
$\sigma$-algebra iff
1. $\emptyset \in \Sigma$,
2. If $S \in \Sigma$, then $S' \in \Sigma$,
3. If $A_n \in \Sigma$ for all $n \in \mathbb{N}$, then $\bigcup_{n=1}^{\infty} A_n \in \Sigma$.
Our most important examples of $\sigma$-algebras so far are the power set itself and the
set of Lebesgue measurable subsets of $\mathbb{R}$.
Example 14.2
1. Let $M$ be a set. Then the power set $\mathcal{P}(M)$ of $M$ is a $\sigma$-algebra.
2. The set $\Sigma_\lambda$ of Lebesgue measurable subsets of $\mathbb{R}$ is a $\sigma$-algebra.
Part 1 is trivial and part 2 is Theorem 9.10.
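On a finite set the three conditions of Definition 14.1 can be tested by brute force, which makes small examples easy to explore. The sketch below is an illustration only: the function names are hypothetical, and it relies on the observation that a family of subsets of a finite set is itself finite, so closure under pairwise unions already gives closure under countable unions.

```python
from itertools import combinations

def power_set(M):
    """All subsets of the finite set M, as a list of sets."""
    M = list(M)
    return [set(c) for r in range(len(M) + 1) for c in combinations(M, r)]

def is_sigma_algebra(M, family):
    """Check Definition 14.1 for a family of subsets of a finite set M."""
    M = frozenset(M)
    family = {frozenset(S) for S in family}
    if frozenset() not in family:                        # part 1: empty set
        return False
    if any(M - S not in family for S in family):         # part 2: complements
        return False
    # part 3: on a finite set, pairwise unions suffice
    return all(A | B in family for A, B in combinations(family, 2))

M = {1, 2, 3}
print(is_sigma_algebra(M, power_set(M)))
print(is_sigma_algebra(M, [set(), {1}, {2, 3}, {1, 2, 3}]))
print(is_sigma_algebra(M, [set(), {1}, {1, 2, 3}]))      # complement of {1} is missing
```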
More examples of $\sigma$-algebras are given in Exercise 14-1 and Theorem 14.25. Because $\sigma$-algebras are newly introduced entities, we need to prove that the properties we
know from Lebesgue measurable sets also hold in this general context.
Proposition 14.3 Let $M$ be a set and let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra. If for every
$n \in \mathbb{N}$ we have $A_n \in \Sigma$, then $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.
Proof. For each $n \in \mathbb{N}$ by part 2 of Definition 14.1, we infer $M \setminus A_n \in \Sigma$. By part 3
of Definition 14.1, we obtain $\bigcup_{n=1}^{\infty} M \setminus A_n \in \Sigma$. By DeMorgan's Laws, this means that
$M \setminus \bigcap_{n=1}^{\infty} A_n \in \Sigma$. Therefore by part 2 of Definition 14.1 $\bigcap_{n=1}^{\infty} A_n \in \Sigma$, which concludes
the proof.
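Proposition 14.4 Let $M$ be a set, let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra, let $N \in \mathbb{N}$ and let
$A_1, \dots, A_N \in \Sigma$. Then $\bigcup_{n=1}^{N} A_n \in \Sigma$.
Proof. For $n \geq N + 1$, let $A_n := \emptyset$. Then $\bigcup_{n=1}^{N} A_n = \bigcup_{n=1}^{\infty} A_n \in \Sigma$.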
Further properties of $\sigma$-algebras are presented in Exercise 14-2. Recall that for the
definition of Lebesgue measurable functions we never referred to the measure itself.
Thus for some purposes it will be sufficient to work with a set and its measurable
subsets. This is the idea behind a measurable space.
Definition 14.5 A measurable space is a pair $(M, \Sigma)$ consisting of a set $M$ and a
$\sigma$-algebra $\Sigma \subseteq \mathcal{P}(M)$. The sets in $\Sigma$ will also be called $\Sigma$-measurable sets.
Finally, a measure is a function that assigns each measurable set its “measure.” The
only conditions for a sensible “measure” function are that the empty set has no volume
and that the volumes of pairwise disjoint sets are added to obtain the volume of their
union.
Definition 14.6 Let $(M, \Sigma)$ be a measurable space. Then $\mu : \Sigma \to [0, \infty]$ is called a
measure iff
1. $\mu(\emptyset) = 0$, and
2. $\mu$ is countably additive, that is, if $\{A_n\}_{n=1}^{\infty}$ is a sequence of pairwise disjoint
sets $A_n \in \Sigma$, then $\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n)$.
The definition of a measure space connects the “measurable sets” with a function
that assigns the measure.
Definition 14.7 A measure space is a triple $(M, \Sigma, \mu)$ consisting of a set $M$, a sigma
algebra $\Sigma \subseteq \mathcal{P}(M)$ and a measure $\mu : \Sigma \to [0, \infty]$.
Example 14.8 With $\lambda$ denoting outer Lebesgue measure and $\Sigma_\lambda$ denoting the sigma
algebra of Lebesgue measurable sets, $(\mathbb{R}, \Sigma_\lambda, \lambda)$ is a measure space.
We have already noted that $\Sigma_\lambda$ is a $\sigma$-algebra. Trivially $\lambda(\emptyset) = 0$, and the countable
additivity of Lebesgue measure is given by Theorem 9.11.
Example 14.9 Let $M$ be a set. For $A \subseteq M$, we define the counting measure $\gamma_M(A)$ to
be the number of elements of $A$ if $A$ is finite and $\infty$ if $A$ is infinite. Then $(M, \mathcal{P}(M), \gamma_M)$
is a measure space.
The reader will prove this in Exercise 14-3. Note that counting measure will allow
us to model absolutely convergent series as integrals in Examples 14.36 and 14.37
below.
In Chapter 9, we defined Lebesgue integration over $\mathbb{R}$, but we never formally defined Lebesgue integration over subsets of $\mathbb{R}$ such as intervals. The formalism of measure spaces allows us to easily fill this gap. Every measurable subset of a measure
space can be equipped with the structure of a measure space.
Example 14.10 Let $(M, \Sigma, \mu)$ be a measure space and let $\Omega \in \Sigma$ be measurable. Let
$\Sigma^\Omega := \{S \in \Sigma : S \subseteq \Omega\}$ and let $\mu_\Omega := \mu|_{\Sigma^\Omega}$. Then $(\Omega, \Sigma^\Omega, \mu_\Omega)$ is a measure
space.
The reader will prove this in Exercise 14-4.
Because measure spaces are newly introduced, we must prove their properties
“from scratch.” Specifically, we cannot use familiar properties of Lebesgue measure
without first proving them for measure spaces. Nonetheless, the next three propositions should be familiar from Lebesgue measure.
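Proposition 14.11 Let $(M, \Sigma, \mu)$ be a measure space and let $A_1, \dots, A_N$ be pairwise
disjoint sets in $\Sigma$. Then $\mu\left( \bigcup_{n=1}^{N} A_n \right) = \sum_{n=1}^{N} \mu(A_n)$.
Proof. By Proposition 14.4, we have that $\bigcup_{n=1}^{N} A_n \in \Sigma$. For $n \geq N + 1$, let $A_n := \emptyset$.
Then countable additivity and $\mu(\emptyset) = 0$ give
$$\mu\left( \bigcup_{n=1}^{N} A_n \right) = \mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{N} \mu(A_n).$$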
Proposition 14.12 Let $(M, \Sigma, \mu)$ be a measure space and let $A, B \in \Sigma$ with $A \subseteq B$.
Then $\mu(A) \leq \mu(B)$.
Proof. By Exercise 14-2b we have $B \setminus A \in \Sigma$. Therefore
$$\mu(A) \leq \mu(A) + \mu(B \setminus A) = \mu\big( A \cup (B \setminus A) \big) = \mu(B).$$
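Finite additivity and monotonicity can be observed directly for counting measure on a small finite set. The sketch below is only an illustration of Propositions 14.11 and 14.12, not a proof; the function name `counting_measure` and the sample sets are hypothetical, and infinite sets are not modeled.

```python
from itertools import combinations

def counting_measure(A):
    """Counting measure of a finite set A (infinite sets are not modeled here)."""
    return len(A)

# Hypothetical pairwise disjoint subsets of {0, ..., 9}.
parts = [{0, 1}, {2}, {5, 7, 9}]
union = set().union(*parts)

# Finite additivity on pairwise disjoint sets.
additive = counting_measure(union) == sum(counting_measure(A) for A in parts)

# Monotonicity, checked on every pair A ⊆ B of subsets of {0, 1, 2, 3}.
base = [0, 1, 2, 3]
subsets = [set(c) for r in range(len(base) + 1) for c in combinations(base, r)]
monotone = all(counting_measure(A) <= counting_measure(B)
               for A in subsets for B in subsets if A <= B)
print(additive, monotone)
```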
Definition 14.13 Let $(M, \Sigma, \mu)$ be a measure space. Then $A \in \Sigma$ is called a set of
measure zero or a null set iff $\mu(A) = 0$. A property is said to hold almost everywhere
in $M$ iff the subset of $M$ for which the property does not hold is of measure zero. Almost
everywhere is also abbreviated as a.e.
Proposition 14.14 Let $(M, \Sigma, \mu)$ be a measure space and for all $n \in \mathbb{N}$ let the set
$A_n \in \Sigma$ be a null set. Then $\bigcup_{n=1}^{\infty} A_n$ also is a null set.
Proof. By Proposition 14.12, subsets of null sets are null sets also. For $n \in \mathbb{N}$,
define $B_n := A_n \setminus \bigcup_{j=1}^{n-1} A_j$. Then each $B_n$ is a null set and $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n$. But
$\mu\left( \bigcup_{n=1}^{\infty} B_n \right) = \sum_{n=1}^{\infty} \mu(B_n) = 0$, which establishes the result.
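The proof replaces the sets $A_n$ by pairwise disjoint sets $B_n$ with the same union, so that countable additivity becomes applicable. For a finite list of finite sets this construction can be sketched as follows; the function name `disjointify` and the sample sets are illustrative assumptions.

```python
def disjointify(sets):
    """Return B_n := A_n minus (A_1 ∪ ... ∪ A_{n-1}) for a finite list of sets A_n."""
    seen = set()
    result = []
    for A in sets:
        result.append(set(A) - seen)   # remove everything covered by earlier sets
        seen |= set(A)
    return result

A = [{1, 2}, {2, 3}, {1, 4}, {5}]
B = disjointify(A)
print(B)
```

The sets in `B` are pairwise disjoint, each is contained in the corresponding set of `A`, and both lists have the same union.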
We conclude this section with a result that shows that the measure of the union of
a nested sequence of sets is equal to the limit of the nondecreasing sequence of the
measures of the sets.
Theorem 14.15 Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of
sets in $\Sigma$ so that $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$. Then $\mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \lim_{n \to \infty} \mu(A_n)$.
Proof. Mimic the proof of Theorem 9.12. (Exercise 14-5.)
Exercises
14-1. Let $M$ be a set. Prove that the set of all subsets $S \subseteq M$ so that $S$ is countable or $M \setminus S$ is countable
is a $\sigma$-algebra.
14-2. Let $M$ be a set and let $\Sigma \subseteq \mathcal{P}(M)$ be a $\sigma$-algebra.
(a) Prove that if $A_1, \dots, A_N \in \Sigma$, then $\bigcap_{n=1}^{N} A_n \in \Sigma$.
(b) Prove that if $A, B \in \Sigma$, then $A \setminus B \in \Sigma$.
14-3. Counting measure. Let $M$ be a set and let $\gamma_M : \mathcal{P}(M) \to [0, \infty]$ be counting measure on $M$.
(a) Prove that $\gamma_M(A) = 0$ iff $A = \emptyset$.
(b) Prove that $\gamma_M$ is a measure.
14-4. Measures on subsets. Let $(M, \Sigma, \mu)$ be a measure space, let $\Omega \in \Sigma$, let $\Sigma^\Omega := \{S \in \Sigma : S \subseteq \Omega\}$
and let $\mu_\Omega := \mu|_{\Sigma^\Omega}$.
(a) Prove that $\Sigma^\Omega$ is a $\sigma$-algebra.
(b) Prove that $\mu_\Omega$ is a measure.
14-5. Prove Theorem 14.15.
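14-6. Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$. Prove the inequality
14-7. Let $M$ be a set. Prove that a subset $\Sigma \subseteq \mathcal{P}(M)$ is a $\sigma$-algebra iff $\emptyset \in \Sigma$; if $S \in \Sigma$, then $S' \in \Sigma$; and
if $A_n \in \Sigma$ for all $n \in \mathbb{N}$, then $\bigcap_{n=1}^{\infty} A_n \in \Sigma$.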
14-8. Let $(M, \Sigma, \mu)$ be a measure space and let $A, B \in \Sigma$. Prove $\mu(A) - \mu(B) = \mu(A \setminus B) - \mu(B \setminus A)$.
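14-9. The measure of nested intersections.
(a) Let $(M, \Sigma, \mu)$ be a measure space and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of sets in $\Sigma$ such that for
all $n \in \mathbb{N}$ we have $A_n \supseteq A_{n+1}$ and such that for some $m \in \mathbb{N}$ we have $\mu(A_m) < \infty$. Prove
$\mu\left( \bigcap_{n=1}^{\infty} A_n \right) = \lim_{n \to \infty} \mu(A_n)$.
(b) Show that the condition $\mu(A_m) < \infty$ for some $m \in \mathbb{N}$ cannot be dropped.
Hint. Let $\gamma_{\mathbb{N}}$ be counting measure on $\mathbb{N}$ and let $A_n := \{i \in \mathbb{N} : i \geq n\}$.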
14-10. Let $(M, \Sigma, \mu)$ be a measure space.
(a) Prove that $\Sigma_\mu := \{A \subseteq M : (\exists E, F \in \Sigma : E \subseteq A \subseteq F, \mu(F \setminus E) = 0)\}$ is a $\sigma$-algebra
that contains $\Sigma$.
(b) Prove that for all $A \in \Sigma_\mu$ and all $E, F \in \Sigma$ with $E \subseteq A \subseteq F$ and $\mu(F \setminus E) = 0$ we have
$\mu(E) = \mu(F)$.
(c) For all $A \in \Sigma_\mu$, let $E, F \in \Sigma$ with $E \subseteq A \subseteq F$ and $\mu(F \setminus E) = 0$ and define $\bar{\mu}(A) := \mu(F)$.
Prove that $\bar{\mu} : \Sigma_\mu \to [0, \infty]$ is a measure. (You also need to show that $\bar{\mu}$ is well-defined.)
(d) Prove that for all $B \in \Sigma$ we have $\bar{\mu}(B) = \mu(B)$.
(e) Prove that the measure space $(M, \Sigma_\mu, \bar{\mu})$ is complete. That is, prove that if $N \in \Sigma_\mu$ is so
that $\bar{\mu}(N) = 0$ and $S \subseteq N$, then $S \in \Sigma_\mu$.
The $\sigma$-algebra $\Sigma_\mu$ is also called the completion of $\Sigma$ and $\bar{\mu}$ is also called the completion of
$\mu$.
14.2 Outer Measures
Although not all subsets of the real numbers are Lebesgue measurable, outer Lebesgue
measure is defined for all subsets of R. The idea of an outer measure can be transplanted to an abstract space. This section shows that outer measures produce a measure
space similar to how outer Lebesgue measure produced a measure space. In particular,
we will obtain a Lebesgue measure on d-dimensional space.
Definition 14.16 Let $M$ be a set. Then $\mu : \mathcal{P}(M) \to [0, \infty]$ is called an outer
measure iff
1. $\mu(\emptyset) = 0$,
2. If $A \subseteq B$, then $\mu(A) \leq \mu(B)$,
3. The function $\mu$ is countably subadditive, that is, for all sequences $\{A_n\}_{n=1}^{\infty}$ of sets
in $M$ we have $\mu\left( \bigcup_{n=1}^{\infty} A_n \right) \leq \sum_{n=1}^{\infty} \mu(A_n)$.
By Theorem 8.6, outer Lebesgue measure on $\mathbb{R}$ is an outer measure. To integrate
in $\mathbb{R}^d$ (and via Example 14.10 on subsets $\Omega \subseteq \mathbb{R}^d$) we need to define an outer measure
on $\mathbb{R}^d$ that is similar to outer Lebesgue measure on $\mathbb{R}$. The definition is very much the
same as for the real line, except that instead of open intervals we use $d$-dimensional
boxes. The thus defined outer measure is also called outer Lebesgue measure and it is
also denoted with $\lambda$.
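Definition 14.17 Let $d \geq 1$ and for $i = 1, \dots, d$ let $a_i < b_i$. Then a set of the form
$B := \prod_{i=1}^{d} (a_i, b_i)$ is called an open box in $\mathbb{R}^d$. We define $|B| := \prod_{i=1}^{d} (b_i - a_i)$. For a set
$S \subseteq \mathbb{R}^d$ we define the outer Lebesgue measure of $S$ to be
$$\lambda(S) := \inf \left\{ \sum_{j=1}^{\infty} |B_j| : S \subseteq \bigcup_{j=1}^{\infty} B_j, \text{ each } B_j \text{ is an open box in } \mathbb{R}^d \right\},$$
where we set $\lambda(S) = \infty$ if none of the series in the set on the right converge.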
It would be nice to show here, similar to Proposition 8.5, that the outer Lebesgue
measure of a box is exactly the volume of the box. We postpone this result to Theorem
16.81, where we can use compactness for an efficient proof.
The argument that outer Lebesgue measure on $\mathbb{R}^d$ defines a measure on its measurable sets is exactly the same as for $\mathbb{R}$. Thus the following results and the definition of
measurable sets can be seen as a recap of the appropriate parts of Section 9.1.
Theorem 14.18 Outer Lebesgue measure on Rd is an outer measure.
Proof. Mimic the proof of Theorem 8.6. (Exercise 14-11.)
The motivation and definition of measurable sets in the abstract setting are the
same as for Lebesgue measure (see Definition 9.4). We must prevent that the sum of
the measures of the pairwise disjoint measurable parts of a set is different from the
measure of the whole measurable set.
Definition 14.19 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. A subset $S \subseteq M$ is called $\mu$-measurable iff for all $T \subseteq M$ we have that
$\mu(T) = \mu(S \cap T) + \mu(S' \cap T)$. The set of $\mu$-measurable subsets of $M$ is denoted $\Sigma_\mu$.
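On a finite set the measurability condition can be tested against every test set $T$, which shows how strongly the collection of $\mu$-measurable sets depends on the outer measure. The sketch below uses two hypothetical outer measures on $M = \{1, 2, 3\}$: a crude one that assigns $1$ to every nonempty set (it is monotone and subadditive), and the number of elements. The function names are illustrative assumptions.

```python
from itertools import combinations

def subsets(M):
    """All subsets of the finite set M, as frozensets."""
    M = list(M)
    return [frozenset(c) for r in range(len(M) + 1) for c in combinations(M, r)]

def measurable_sets(M, mu):
    """Sets S with mu(T) = mu(S ∩ T) + mu(S' ∩ T) for every T ⊆ M."""
    M = frozenset(M)
    everything = subsets(M)
    return [S for S in everything
            if all(mu(T) == mu(S & T) + mu((M - S) & T) for T in everything)]

crude = lambda A: 0 if not A else 1      # hypothetical outer measure
print(measurable_sets({1, 2, 3}, crude))      # only the empty set and M qualify
print(len(measurable_sets({1, 2, 3}, len)))   # every subset qualifies for counting
```

For the crude outer measure, any nonempty proper subset $S$ fails the condition with $T = M$, because both $S$ and its complement receive the value $1$.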
As in Definition 9.5 the Lebesgue measure is obtained by restricting outer Lebesgue
measure to the set $\Sigma_\lambda$ of Lebesgue measurable sets.
Definition 14.20 Let $S \subseteq \mathbb{R}^d$ be a Lebesgue measurable set. Then the outer Lebesgue
measure $\lambda(S)$ of $S$ is also called the Lebesgue measure of $S$.
As before, “half” of the equality for measurability is always satisfied. Moreover,
the proof that outer measures induce measure spaces runs along the same lines as the
proofs of the corresponding results in Section 9.1.
Proposition 14.21 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure.
For all subsets $S, T \subseteq M$, we have $\mu(T) \leq \mu(S \cap T) + \mu(S' \cap T)$.
Proof. Mimic the proof of Corollary 9.6. (Exercise 14-12.)
Proposition 14.22 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure.
If $\mu(S) = 0$, then $S$ is $\mu$-measurable.
Proof. Mimic the proof of Proposition 9.7. (Exercise 14-13.)
Lemma 14.23 Let $M$ be a set and let $\mu : \mathcal{P}(M) \to [0, \infty]$ be an outer measure. If $A$
and $B$ are $\mu$-measurable, then the intersection $A \cap B$ is $\mu$-measurable.
Proof. Mimic the proof of Lemma 9.8. (Exercise 14-14.)
Lemma 14.24 Let $M$ be a set, let $\mu : \mathcal{P}(M) \to [0,\infty]$ be an outer measure and
let $\{A_n\}_{n=1}^\infty$ be a sequence of pairwise disjoint $\mu$-measurable sets.
Then $\bigcup_{n=1}^\infty A_n$ is $\mu$-measurable.
Proof. Mimic the proof of Lemma 9.9. (Exercise 14-15.)
Theorem 14.25 Let $M$ be a set, let $\mu : \mathcal{P}(M) \to [0,\infty]$ be an outer measure and let
$\Sigma_\mu$ be the set of $\mu$-measurable sets. Then $(M, \Sigma_\mu, \mu)$ is a measure space.
Proof. Mimic the proof of Theorem 9.10 to prove that $\Sigma_\mu$ is a $\sigma$-algebra and then
mimic the proof of Theorem 9.11 to prove that $\mu$ is countably additive on $\Sigma_\mu$. (Exercise
14-16.)
In particular, Theorem 14.25 shows that the triple $(\mathbb{R}^d, \Sigma_\lambda, \lambda)$ is a measure space.
Lebesgue measure is the standard measure for integration in d-dimensional space.
Thus, as we proceed to define measurable and integrable functions, we are constructing
a theory that allows us to integrate on Rd and its subsets.
Exercises
14-11. Prove Theorem 14.18.
14-12. Prove Proposition 14.21.
14-13. Prove Proposition 14.22.
14-14. Prove Lemma 14.23.
14-15. Prove Lemma 14.24.
14-16. Prove Theorem 14.25.
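14-17. Let $d \ge 1$ and for $i = 1, \ldots, d$ let the numbers $a_i < b_i$ be dyadic rational numbers. Then a set of
the form $D := \prod_{i=1}^d (a_i, b_i)$ is called a dyadic open box in $\mathbb{R}^d$. Prove that for any set
$S \subseteq \mathbb{R}^d$ we have
\[ \lambda(S) = \inf\left\{ \sum_{j=1}^\infty |D_j| : S \subseteq \bigcup_{j=1}^\infty D_j, \text{ each } D_j \text{ is a dyadic open box in } \mathbb{R}^d \right\}. \]
14-18. Prove that if $A, B \subseteq \mathbb{R}$ and $\lambda(A) = 0$ (where $\lambda$ is Lebesgue measure on $\mathbb{R}$), then $\lambda(A \times B) = 0$
(where $\lambda$ is Lebesgue measure on $\mathbb{R}^2$).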
14-19. Looking at Riemann integrals from a measure theoretic point-of-view. Let $[a,b]$ be an interval. For
a set $S \subseteq [a,b]$ we define
\[ J(S) := \inf\left\{ \sum_{j=1}^n |I_j| : S \subseteq \bigcup_{j=1}^n I_j, \text{ each } I_j \text{ is an open interval} \right\} \]
and call it the Jordan content of $S$. Note that the only difference between the Jordan content and outer
Lebesgue measure is that the sums over which the infimum is taken are finite.
(a) Prove that for any closed interval $[c,d] \subseteq [a,b]$ we have $J([c,d]) = d - c$.
(b) Prove that the Jordan content is not an outer measure.
Hint. $\mathbb{Q}$.
(c) Let
\[ J_i(S) := \sup\left\{ \sum_{j=1}^n |b_j - a_j| : S \supseteq \bigcup_{j=1}^n [a_j, b_j],\ a_1 \le b_1 \le a_2 \le b_2 \le \cdots \le a_n \le b_n \right\}, \]
and call it the inner Jordan content of $S \subseteq [a,b]$. Prove that for any closed interval $[c,d] \subseteq [a,b]$
we have $J_i([c,d]) = d - c$.
(d) Prove that the inner Jordan content is not an outer measure.
(e) Call a set $S \subseteq [a,b]$ Jordan measurable iff $J(S) = J_i(S)$. An algebra of sets satisfies the
first two properties of a $\sigma$-algebra, but it is only closed under finite unions rather than infinite
unions. Prove that the set $\Sigma^J_{[a,b]}$ of Jordan measurable subsets of $[a,b]$ forms an algebra.
(f) Prove that the Jordan content $J : \Sigma^J_{[a,b]} \to [0,\infty)$ is a finitely additive measure. That is,
$J(\emptyset) = 0$ and if $\{A_n\}_{n=1}^N$ is a finite sequence of pairwise disjoint sets $A_n \in \Sigma^J_{[a,b]}$,
then $J\left( \bigcup_{n=1}^N A_n \right) = \sum_{n=1}^N J(A_n)$.
(g) Prove that a set $S \subseteq [a,b]$ is Jordan measurable iff its indicator function $1_S$ is Riemann
integrable.
Note. This exercise shows it is fair to say that the problem with the Riemann integral is that its
associated notion of a content is only finitely additive.
14-20. Let $a < b$ be real numbers and let $g : [a,b] \to \mathbb{R}$ be nondecreasing. For numbers $c, d \in (a,b)$ so
that $c < d$ define $|(c,d)|_g := \lim_{x \to d^-} g(x) - \lim_{x \to c^+} g(x)$, define $|[a,d)|_g := \lim_{x \to d^-} g(x) - g(a)$,
and define $|(c,b]|_g := g(b) - \lim_{x \to c^+} g(x)$. Set $|[a,b]|_g := g(b) - g(a)$. Let $\mathcal{I}$ be the set of all
subintervals of $[a,b]$ that are either open, closed at $a$ and open on the right, closed at $b$ and open on
the left or equal to $[a,b]$. For any set $S \subseteq [a,b]$, we define the outer Lebesgue-Stieltjes measure
of $S$ to be
\[ \lambda_g(S) := \inf\left\{ \sum_{j=1}^\infty |I_j|_g : S \subseteq \bigcup_{j=1}^\infty I_j, \text{ each } I_j \text{ is in } \mathcal{I} \right\}. \]
(a) Prove that the outer Lebesgue-Stieltjes measure really is an outer measure.
(b) Prove that open intervals $(c,d) \subseteq [a,b]$ are $\lambda_g$-measurable.
Hint. Compare with Proposition 9.13.
(c) Prove that for $c < d$ both in $[a,b]$ we have $\lambda_g([c,d]) = \lim_{x \to d^+} g(x) - \lim_{x \to c^-} g(x)$, where
the one-sided limit is understood to be the value of $g$ if $c = a$ or $d = b$.
(d) Let $c \in (a,b)$ and let $g(x) := \begin{cases} 0; & \text{for } x \le c, \\ 1; & \text{for } x > c. \end{cases}$
Prove that $\lambda_g(S) = \begin{cases} 1; & \text{if } c \in S, \\ 0; & \text{if } c \notin S, \end{cases}$ for any set $S \subseteq [a,b]$.
(e) Construct a nondecreasing function $g : [a,b] \to \mathbb{R}$ and an interval $[c,d] \subseteq [a,b]$ so that
$\lambda_g([c,d]) > g(d) - g(c)$.
(f) Prove that the function $f : [a,b] \to \mathbb{R}$ is Riemann-Stieltjes integrable with respect to $g$ iff
$\lambda_g(\{x : f \text{ is discontinuous at } x\}) = 0$.
14.3 Measurable Functions
As we did on the real line, we first define the functions for which there is a chance that
the integral exists and then we define the integral. Indicator functions (see Definition
5.9) are once more our “rectangles” and simple functions are composed of finitely many
disjoint “rectangles.” From here on, algebraic operations on functions will always be
understood to be pointwise, that is, for example, the sum $f + g$ of two functions $f$
and $g$ with the same domain is defined as $(f + g)(x) := f(x) + g(x)$ for all elements
$x$ of the domain.
Definition 14.26 Let $(M, \Sigma)$ be a measurable space. A simple measurable function
is a function $s : M \to \mathbb{R}$ such that there are $n \in \mathbb{N}$, $a_1, \ldots, a_n \in \mathbb{R}$ and pairwise
disjoint sets $A_1, \ldots, A_n \in \Sigma$ so that $s = \sum_{k=1}^n a_k 1_{A_k}$. We will also call these functions
simple functions.
For measurable functions, we consider the positive and negative parts separately.
Definition 14.27 Let $M$ be a set. For $f : M \to [-\infty, \infty]$ we define the positive part
$f^+(x) := \max\{f(x), 0\}$ and the negative part $f^-(x) := -\min\{f(x), 0\}$.
Definition 14.28 Let $(M, \Sigma)$ be a measurable space. The nonnegative function
$f : M \to [0, \infty]$ is called measurable iff there is a sequence $\{s_n\}_{n=1}^\infty$ of simple
functions $s_n : M \to [0, \infty)$ such that for all $x \in M$ the sequence $\{s_n(x)\}_{n=1}^\infty$ is
nondecreasing and $\lim_{n \to \infty} s_n(x) = f(x)$.
A function $f : M \to [-\infty, \infty]$ is called measurable iff $f^+$ and $f^-$ are both
measurable. If it is necessary to distinguish between several $\sigma$-algebras, we will also
call these functions $\Sigma$-measurable.
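One standard way to produce the nondecreasing sequence required in Definition 14.28 is the dyadic truncation $s_n(x) := \min\{n, \lfloor 2^n f(x) \rfloor / 2^n\}$. The sketch below is an illustration under that assumption (it is not necessarily the construction used in the proof the text refers to), and the sample function `f` and the sample points are hypothetical. Each $s_n$ takes only finitely many values, and each set $\{s_n = a\}$ is the preimage of an interval under $f$.

```python
import math

def simple_approx(f, n):
    """Return s_n with s_n(x) = min(n, floor(2^n * f(x)) / 2^n) for f >= 0."""
    def s_n(x):
        y = f(x)
        if y == math.inf:
            return float(n)          # truncate at height n where f is infinite
        return min(float(n), math.floor(y * 2**n) / 2**n)
    return s_n

def f(x):
    """Hypothetical nonnegative sample function."""
    return x * x

xs = [0.0, 0.3, 1.0, 1.7, 2.5]
# values[i][n-1] is s_n(xs[i]) for n = 1, ..., 11
values = [[simple_approx(f, n)(x) for n in range(1, 12)] for x in xs]
```

At each sample point the list of values is nondecreasing in $n$ and, once $n$ exceeds $f(x)$, lies within $2^{-n}$ of $f(x)$.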
Once again, measurable functions have many characterizations.
Theorem 14.29 Let $(M, \Sigma)$ be a measurable space and let $f : M \to [-\infty, \infty]$ be a
function. Then the following are equivalent.
1. $f$ is measurable,
2. For all $a \in \mathbb{R}$, we have $\{x \in M : f(x) > a\} \in \Sigma$,
3. For all $a \in \mathbb{R}$, we have $\{x \in M : f(x) \le a\} \in \Sigma$,
4. For all $a \in \mathbb{R}$, we have $\{x \in M : f(x) < a\} \in \Sigma$,
5. For all $a \in \mathbb{R}$, we have $\{x \in M : f(x) \ge a\} \in \Sigma$.
Proof. Mimic the proof of Theorem 9.19 (Exercise 14-21).
The characterizations of measurable functions can be used to prove that certain
operations preserve measurability.
Theorem 14.30 Let $(M, \Sigma)$ be a measurable space and let $f, g : M \to [-\infty, \infty]$ be
measurable functions. If $f + g$ is defined everywhere, then $f + g$ is measurable. If
$f - g$ or $f \cdot g$ is defined everywhere, then it is measurable. Finally, $f^+$, $f^-$ and $|f|$
are measurable.
Proof. Mimic the proof of Theorem 9.20. (Exercise 14-22.)
Exercises
14-21. Prove Theorem 14.29.
14-22. Prove Theorem 14.30.
14-23. Let $(M, \Sigma)$ be a measurable space. Prove that a bounded function $f : M \to [0, \infty)$ is measurable
iff there is a sequence $\{s_n\}_{n=1}^\infty$ of simple functions $s_n : M \to [0, \infty)$ such that for all $x \in M$ the
sequence $\{s_n(x)\}_{n=1}^\infty$ is nonincreasing and $\lim_{n \to \infty} s_n(x) = f(x)$.
Hint. Mimic part “$5 \Rightarrow 1$” of the proof of Theorem 14.29 for nonnegative functions.
14-24. Let $(M, \Sigma)$ be a measurable space and let $f, g : M \to [-\infty, \infty]$ be
measurable functions.
(a) Prove that $\{x \in M : f(x) = g(x)\} \in \Sigma$.
(b) Prove that $\{x \in M : f(x) \le g(x)\} \in \Sigma$.
(c) Prove that $\{x \in M : f(x) < g(x)\} \in \Sigma$.
14.4 Integration of Measurable Functions
Once measurable functions are defined, the definition of the integral also is similar to
that of the Lebesgue integral.
Definition 14.31 Let $(M, \Sigma, \mu)$ be a measure space, let $n \in \mathbb{N}$, let $A_1, \ldots, A_n \in \Sigma$ be
pairwise disjoint, let $a_1, \ldots, a_n \in [0, \infty)$ and let $s = \sum_{k=1}^n a_k 1_{A_k}$ be a simple function.
Then we define the integral of $s$ to be
\[ \int_M s \, d\mu := \sum_{k=1}^n a_k \mu(A_k). \]
The fact that the integral in Definition 14.31 does not depend on the representation
of s can be proved just as for the Lebesgue integral in Exercise 9-20a.
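The independence of the representation can be observed on a small example. The sketch below assumes the usual formula $\int_M s \, d\mu = \sum_{k=1}^n a_k \mu(A_k)$ for the integral of a simple function; the four-point space and its point masses are hypothetical choices made only for illustration. The same function is written once with two sets and once with four, and both representations give the same integral.

```python
from fractions import Fraction

def integrate_simple(terms, mu):
    """Integral of s = sum of a_k * 1_{A_k}, computed as the sum of a_k * mu(A_k).

    `terms` is a list of (a_k, A_k) pairs with pairwise disjoint sets A_k,
    and `mu` maps a set to its measure.
    """
    return sum(a * mu(A) for a, A in terms)

# Hypothetical point masses on M = {0, 1, 2, 3}.
weights = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(2)}

def mu(A):
    """Measure of A: the sum of the point masses of its elements."""
    return sum((weights[x] for x in A), Fraction(0))

# Two representations of the same simple function (5 on {0,1}, 7 on {2,3}).
rep_coarse = [(Fraction(5), {0, 1}), (Fraction(7), {2, 3})]
rep_fine = [(Fraction(5), {0}), (Fraction(5), {1}),
            (Fraction(7), {2}), (Fraction(7), {3})]

integral_coarse = integrate_simple(rep_coarse, mu)
integral_fine = integrate_simple(rep_fine, mu)
```

Exact rational arithmetic is used so that the two results can be compared with `==` without rounding concerns.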
Definition 14.32 Let $(M, \Sigma, \mu)$ be a measure space. Let $f : M \to [0, \infty]$ be a
measurable function. We define the integral of $f$ to be
\[ \int_M f \, d\mu := \sup\left\{ \int_M s \, d\mu : s \text{ is a simple function with } 0 \le s \le f \right\} \]
and we call $f$ integrable iff the supremum is finite.
A function $g : M \to [-\infty, \infty]$ is called integrable iff $g^+$ and $g^-$ are both integrable.
Its integral is defined to be
\[ \int_M g \, d\mu := \int_M g^+ \, d\mu - \int_M g^- \, d\mu. \]
The integral on measure spaces is very versatile. Examples 14.33 and 14.34 show
that integration on the real line as well as on d-dimensional spaces are special cases.
Examples 14.36 and 14.37 show that absolutely convergent series and absolutely convergent double series are in one-to-one correspondence with integrable functions on
the right measure spaces.
Example 14.33 With $M = \mathbb{R}$, $\Sigma = \Sigma_\lambda$ and $\mu = \lambda$, Definition 14.32 gives the
Lebesgue integral on the real line.
Example 14.34 With $M = \mathbb{R}^d$, $\Sigma = \Sigma_\lambda$ and $\mu = \lambda$, Definition 14.32 gives the
Lebesgue integral in $d$-dimensional space. We will investigate in Section 14.8 how the
Lebesgue integrals in various dimensions are related to each other.
For the mentioned examples on series, we first need to establish that a function is
integrable iff its absolute value is integrable. As was (by now) to be expected, the properties of the abstract integral are proved in exactly the same way as for the Lebesgue
integral.
Theorem 14.35 Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be
measurable functions.
1. If $0 \le f \le g$ a.e. and $g$ is integrable, then so is $f$ and $\int_M f \, d\mu \le \int_M g \, d\mu$.
2. $f$ is integrable iff $|f|$ is integrable and in that case the triangular inequality
$\left| \int_M f \, d\mu \right| \le \int_M |f| \, d\mu$ holds.
3. If $f \ge 0$, then $\int_M f \, d\mu = 0$ iff $f = 0$ a.e.
Proof. Mimic the proof of Theorem 9.23. (Exercise 14-25.)
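Example 14.36 With $M = \mathbb{N}$, $\Sigma = \mathcal{P}(\mathbb{N})$ and $\mu = \gamma_{\mathbb{N}}$ being counting measure on
$\mathbb{N}$, a series $\sum_{j=1}^\infty a_j$ converges absolutely iff the function $f : \mathbb{N} \to [-\infty, \infty]$ defined by
$f(j) = a_j$ is integrable over the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \gamma_{\mathbb{N}})$. Moreover, for every
integrable function $f$ on this measure space we have
\[ \int_{\mathbb{N}} f \, d\gamma_{\mathbb{N}} = \sum_{j=1}^\infty f(j). \]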
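Example 14.37 Let $\{a_{(i,j)}\}_{i,j=1}^\infty$ be a doubly indexed countable family of numbers.
We say the double series $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ converges absolutely iff the double series
$\sum_{i=1}^\infty \sum_{j=1}^\infty |a_{(i,j)}|$ converges.
With $M = \mathbb{N} \times \mathbb{N}$, $\Sigma = \mathcal{P}(\mathbb{N} \times \mathbb{N})$ and $\mu = \gamma_{\mathbb{N} \times \mathbb{N}}$ being counting measure on $\mathbb{N} \times \mathbb{N}$,
a double series $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{(i,j)}$ converges absolutely iff the function $f(i,j) := a_{(i,j)}$ is
integrable on the measure space $(\mathbb{N} \times \mathbb{N}, \mathcal{P}(\mathbb{N} \times \mathbb{N}), \gamma_{\mathbb{N} \times \mathbb{N}})$. For every integrable
function $f$ on this measure space, we have
\[ \int_{\mathbb{N} \times \mathbb{N}} f \, d\gamma_{\mathbb{N} \times \mathbb{N}} = \sum_{i=1}^\infty \sum_{j=1}^\infty f(i,j). \]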
As for Lebesgue measure, we consider null sets to be insignificant. Therefore with
the same motivation as given for Definition 9.24 we define the following.
Definition 14.38 Let $(M, \Sigma, \mu)$ be a measure space. If $f : M \to [-\infty, \infty]$ is defined
a.e., we call it integrable iff
\[ g(x) := \begin{cases} f(x); & \text{if } f(x) \text{ is defined,} \\ 0; & \text{if } f(x) \text{ is not defined,} \end{cases} \]
is integrable.
As long as a function is constructed from other integrable functions, like the sum
in Theorem 14.39, measurability of the set where the function is not defined is not an
issue. However, even if this was a problem and the function is undefined on a subset
of a null set, we could simply switch to the completion of the measure (see Exercise
14-10) and note that subsets of null sets are also insignificant.
Overall, with the same conventions as for the Lebesgue integral we can prove that
the integral over a measure space has the right linearity properties.
Theorem 14.39 Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be
integrable functions. Then the following hold.
1. For all $\alpha \in \mathbb{R}$, the scalar multiple $\alpha f$ is integrable and $\int_M \alpha f \, d\mu = \alpha \int_M f \, d\mu$.
2. The sum $f + g$ is integrable and $\int_M f + g \, d\mu = \int_M f \, d\mu + \int_M g \, d\mu$.
Proof. Mimic the proof of Theorem 9.25. (Exercises 14-26a and 14-26b.)
Exercises
14-25. Prove Theorem 14.35. That is, let $(M, \Sigma, \mu)$ be a measure space and let $f, g$ be measurable functions.
(a) Prove that if $0 \le f \le g$ a.e. and $g$ is integrable, then so is $f$ and $\int_M f \, d\mu \le \int_M g \, d\mu$.
(b) Prove that $f$ is integrable iff $|f|$ is integrable and that in this case $\left| \int_M f \, d\mu \right| \le \int_M |f| \, d\mu$.
(c) Prove that if $f \ge 0$, then $\int_M f \, d\mu = 0$ iff $f = 0$ a.e.
14-26. Let $(M, \Sigma, \mu)$ be a measure space and let $f, g : M \to [-\infty, \infty]$ be integrable functions.
(a) Prove that if $\alpha \in \mathbb{R}$, then the scalar multiple $\alpha f$ is integrable and $\int_M \alpha f \, d\mu = \alpha \int_M f \, d\mu$.
(b) Prove that the sum $f + g$ is integrable and $\int_M f + g \, d\mu = \int_M f \, d\mu + \int_M g \, d\mu$.
Note. Exercise 14-33 gives a more effective proof than mimicking the proof of Theorem 9.25.
(c) Prove that $f - g$ is integrable and $\int_M f - g \, d\mu = \int_M f \, d\mu - \int_M g \, d\mu$.
(d) Prove that $\max\{f, g\}$ (defined pointwise) is integrable.
(e) Prove that $\min\{f, g\}$ (defined pointwise) is integrable.
(f) Give an example that shows that the product of $f$ and $g$ need not be integrable.
14-27. Markov’s inequality. Let $(M, \Sigma, \mu)$ be a measure space and let $f : M \to [0, \infty)$ be integrable.
Prove that for all $c > 0$ we have $\mu(\{x \in M : f(x) > c\}) \le \frac{1}{c} \int_M f \, d\mu$.
14.5 Monotone and Dominated Convergence
Sequences have been a standard tool throughout our investigation of single variable
functions. Sequences play an equally important role in more abstract settings. In Exercise 11-13, we have seen that pointwise convergence of a bounded, monotone sequence of Riemann integrable functions need not produce a Riemann integrable limit
function. The two fundamental results about pointwise convergence in measure theory are the Monotone Convergence Theorem, which says that the pointwise limit of a
nondecreasing sequence of nonnegative integrable functions is integrable if the limit of
the integrals is finite (see Theorem 14.41), and the Dominated Convergence Theorem,
which says that the pointwise limit of a sequence of integrable functions is integrable
provided all functions are below one integrable function that dominates them all (see
Theorem 14.43). Hence, when it comes to pointwise convergence of functions, the
Lebesgue-type integral has more favorable properties than the Riemann integral.
Because every subset $A \in \Sigma$ of a measure space $(M, \Sigma, \mu)$ can be turned into
a measure space, and because by Theorem 14.35 for any integrable function $g$, the
function $g 1_A$ is also integrable, we can define integrals over subsets.
Definition 14.40 Let $(M, \Sigma, \mu)$ be a measure space, let $A \in \Sigma$ and let the function
$f : M \to [-\infty, \infty]$ be integrable. Then we define the integral of $f$ over the subset $A$
as $\int_A f \, d\mu := \int_M f 1_A \, d\mu$. For Lebesgue integrals over intervals, we will also write
\[ \int_a^b f(x) \, d\lambda(x) := \int_a^b f \, d\lambda := \int_{[a,b]} f \, d\lambda. \]
Because the integral is defined as a supremum of integrals of simple functions, it
should not be too surprising that the integral is well-behaved with respect to monotone
sequences of functions.
Theorem 14.41 Monotone Convergence Theorem. Let $(M, \Sigma, \mu)$ be a measure
space and let $\{f_n\}_{n=1}^\infty$ be a sequence of nonnegative measurable functions so that
$\{f_n(x)\}_{n=1}^\infty$ is nondecreasing for almost all $x \in M$ and $\lim_{n \to \infty} f_n(x)$ exists for almost
all $x \in M$. Let $f : M \to [0, \infty]$ be a measurable function so that $f(x) = \lim_{n \to \infty} f_n(x)$
a.e. Then
\[ \int_M f \, d\mu = \lim_{n \to \infty} \int_M f_n \, d\mu. \]
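Proof. First assume that for all $x \in M$ we have $0 \le f_1(x) \le f_2(x) \le \cdots$ and the
limit $\lim_{n \to \infty} f_n(x)$ exists. Because $f_n \le f_{n+1} \le f$ for all $n \in \mathbb{N}$, the integrals $\int_M f_n \, d\mu$
form a nondecreasing sequence that is bounded above by $\int_M f \, d\mu$, so
$\lim_{n \to \infty} \int_M f_n \, d\mu \le \int_M f \, d\mu$. For the
reversed inequality, let $s = \sum_{k=1}^m a_k 1_{A_k}$ be a simple function such that $0 \le s \le f$ and let
$t \in (0, 1)$. For $n \in \mathbb{N}$, define $E_n := \{x \in M : f_n(x) \ge t s(x)\}$. Then $M = \bigcup_{n=1}^\infty E_n$, for
all $n \in \mathbb{N}$ we have the containment $E_n \subseteq E_{n+1}$, and
\[ \lim_{n \to \infty} \sum_{k=1}^m t a_k \mu(A_k \cap E_n) = \sum_{k=1}^m t a_k \mu(A_k) = \int_M t s \, d\mu. \]
Let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have
\[ \int_M t s \, d\mu - \varepsilon \le \sum_{k=1}^m t a_k \mu(A_k \cap E_n) = \int_M t s 1_{E_n} \, d\mu \le \int_M f_n \, d\mu \le \lim_{j \to \infty} \int_M f_j \, d\mu. \]
Because $\varepsilon$ was arbitrary we obtain $\int_M t s \, d\mu \le \lim_{n \to \infty} \int_M f_n \, d\mu$, and because the
number $t \in (0, 1)$ was arbitrary, we can let $t$ approach 1 and obtain
\[ \int_M s \, d\mu = \lim_{t \to 1} \int_M t s \, d\mu \le \lim_{n \to \infty} \int_M f_n \, d\mu. \]
Because $s \le f$ was an arbitrary simple function, we conclude that
\[ \int_M f \, d\mu = \sup\left\{ \int_M s \, d\mu : s \text{ simple}, 0 \le s \le f \right\} \le \lim_{n \to \infty} \int_M f_n \, d\mu. \]
Now assume that $0 \le f_1(x) \le f_2(x) \le \cdots$ and $\lim_{n \to \infty} f_n(x) = f(x)$ for almost
all $x \in M$. Let $N \in \Sigma$ be a null set so that $0 \le f_1(x) \le f_2(x) \le \cdots$ and
$\lim_{n \to \infty} f_n(x) = f(x)$ for all $x \in M \setminus N$. Changing a function on a null set does not
change its integral. Therefore we can replace $f$ with $f 1_{M \setminus N}$ and each $f_n$ with $f_n 1_{M \setminus N}$
and apply what we have proved above to obtain the equation for the integrals.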
We need the next lemma for the proof of the Dominated Convergence Theorem.
Lemma 14.42 Fatou’s Lemma. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^\infty$ be
a sequence of nonnegative measurable functions. Then $\liminf_{n \to \infty} f_n$ (defined pointwise)
is measurable and
\[ \int_M \liminf_{n \to \infty} f_n \, d\mu \le \liminf_{n \to \infty} \int_M f_n \, d\mu. \]
Proof. Because the limit inferior is $\liminf_{n \to \infty} f_n = \lim_{n \to \infty} \inf\{f_j : j \ge n\}$, where the
infimum and the limits are taken pointwise, we first consider the sequence of functions
$p_n := \inf\{f_j : j \ge n\}$.
Let $n \in \mathbb{N}$ and $a \in \mathbb{R}$. Then $\{x \in M : p_n(x) < a\} = \bigcup_{j=n}^\infty \{x \in M : f_j(x) < a\}$.
Because the union on the right side is measurable and $a$ was arbitrary we conclude that
$p_n$ is measurable. Moreover, clearly we have $p_n \le p_{n+1}$.
By Exercise 14-28a, this means that $\liminf_{n \to \infty} f_n = \lim_{n \to \infty} p_n$ is measurable and by
the Monotone Convergence Theorem $\int_M \liminf_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_M p_n \, d\mu$. Now for all
numbers $n \in \mathbb{N}$ we have $p_n \le f_n$, which by Lemma 10.5 means
\[ \int_M \liminf_{n \to \infty} f_n \, d\mu = \lim_{n \to \infty} \int_M p_n \, d\mu \le \liminf_{n \to \infty} \int_M f_n \, d\mu. \]
The inequality in Fatou’s Lemma can be strict (Exercise 14-29). Finally, as long as
all functions’ absolute values are bounded (“dominated”) by one integrable function,
pointwise limits preserve integrability and the limit can be moved out of the integral.
Theorem 14.43 Dominated Convergence Theorem. Let $(M, \Sigma, \mu)$ be a measure
space, let $\{f_n\}_{n=1}^\infty$ be a sequence of measurable functions and let $f$ be a measurable
function such that $f(x) = \lim_{n \to \infty} f_n(x)$ for almost all $x$. Moreover, let $g$ be an integrable
function such that for all $n \in \mathbb{N}$ and almost all $x \in M$ we have $|f_n(x)| \le g(x)$. Then
$f$ is integrable and
\[ \int_M f \, d\mu = \lim_{n \to \infty} \int_M f_n \, d\mu. \]
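Proof. Because $|f(x)| = \lim_{n \to \infty} |f_n(x)| \le g(x)$ for almost all $x \in M$, by part 1 of
Theorem 14.35 $|f|$ is integrable and then by part 2 of Theorem 14.35 $f$ is integrable.
Because changing a function on a null set does not change the integral, we can
assume that $|f_n(x)| \le g(x)$ and $|f(x)| \le g(x)$ for all $x \in M$. Now for all $n \in \mathbb{N}$ we
have $f_n(x) + g(x) \ge 0$ and thus by Fatou’s Lemma we obtain
\[ \int_M f \, d\mu + \int_M g \, d\mu = \int_M f + g \, d\mu = \int_M \liminf_{n \to \infty} (f_n + g) \, d\mu \le \liminf_{n \to \infty} \int_M f_n + g \, d\mu = \liminf_{n \to \infty} \int_M f_n \, d\mu + \int_M g \, d\mu, \]
which means that $\int_M f \, d\mu \le \liminf_{n \to \infty} \int_M f_n \, d\mu$. The same argument applied to the
functions $(-f_n) + g \ge 0$ gives $\int_M -f \, d\mu \le \liminf_{n \to \infty} \int_M -f_n \, d\mu$, that is (recall Proposition 10.4)
$\int_M f \, d\mu \ge \limsup_{n \to \infty} \int_M f_n \, d\mu$. By Proposition 10.3 and Theorem 10.6, we
conclude that $\lim_{n \to \infty} \int_M f_n \, d\mu$ exists and is equal to $\int_M f \, d\mu$.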
The hypothesis that all functions $f_n$ in the Dominated Convergence Theorem are
below an integrable function $g$ cannot be dropped. For example, on $(0, 1)$ equipped
with Lebesgue measure, the sequence of functions $\left\{ \frac{1}{x} 1_{\left[\frac{1}{2n}, \frac{1}{n}\right]} \right\}_{n=1}^\infty$ converges pointwise
to zero, but all integrals are equal to $\ln(2)$. Moreover, Exercise 14-28 shows that
the demand that the pointwise a.e. limit is measurable cannot be dropped in general,
but that it can be dropped for Lebesgue measure.
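The value $\ln(2)$ can be checked by a short computation. The display below assumes that the sequence in question is $f_n(x) := \frac{1}{x} 1_{[1/(2n), 1/n]}(x)$ on $(0,1)$; this specific form is an assumption made for the illustration, and any sequence of the form $\frac{1}{x} 1_{[c_n, 2c_n]}$ with $c_n \to 0$ behaves the same way.

```latex
\int_{(0,1)} f_n \, d\lambda
  = \int_{1/(2n)}^{1/n} \frac{1}{x} \, dx
  = \ln\frac{1}{n} - \ln\frac{1}{2n}
  = \ln 2
  \quad \text{for all } n \in \mathbb{N},
\qquad \text{while} \qquad
\int_{(0,1)} \lim_{n \to \infty} f_n \, d\lambda = \int_{(0,1)} 0 \, d\lambda = 0 .
```

Under the same assumption, no integrable dominating function can exist: the intervals $[1/(2n), 1/n]$ cover $(0,1)$, so any $g$ with $f_n \le g$ for all $n$ satisfies $g(x) \ge \frac{1}{x}$ on $(0,1)$, and $\frac{1}{x}$ is not integrable there.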
Exercises
14-28. Pointwise almost everywhere limits of measurable functions. Let $(M, \Sigma, \mu)$ be a measure space and
let $\{f_n\}_{n=1}^\infty$ be a sequence of measurable functions.
(a) Prove that if $f(x) = \lim_{n \to \infty} f_n(x)$ for all $x \in M$, then $f$ is measurable.
(b) Give an example that shows that the pointwise a.e. limit of a sequence of measurable functions
need not be measurable.
Hint. There is an example with $M = \{0, 1\}$, $\Sigma = \{\emptyset, M\}$ and $\mu = 0$.
(c) Prove that if $(M, \Sigma, \mu)$ is a complete measure space, then $f(x) = \lim_{n \to \infty} f_n(x)$ a.e. implies
that $f$ is measurable.
(d) Prove that $(\mathbb{R}^d, \Sigma_\lambda, \lambda)$ is a complete measure space.
14-29. Give an example that the inequality in Fatou’s Lemma can be strict. Then explain why this is not a
counterexample to the Dominated Convergence Theorem.
14-30. Let $(M, \Sigma, \mu)$ be a measure space and let $g : M \to [0, \infty]$ be integrable. Prove that the function
$\mu_g : \Sigma \to [0, \infty)$ defined by $\mu_g(A) := \int_A g \, d\mu$ is a measure.
14-31. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^\infty$ be a sequence of integrable functions that
converges uniformly to a measurable function $f$.
(a) Prove that if $\mu(M) < \infty$, then $f$ is integrable and $\int_M f \, d\mu = \lim_{n \to \infty} \int_M f_n \, d\mu$.
(b) Give an example that shows this result need not hold for infinite measure spaces.
14-32. Let $(M, \Sigma, \mu)$ be a measure space and let $\{f_n\}_{n=1}^\infty$ be a sequence of integrable functions that
converges pointwise to a measurable function $f$ and for which there is a $B \in \mathbb{R}$ so that for all $x \in M$ we
have $|f_n(x)| \le B$.
(a) Prove that if $\mu(M) < \infty$, then $f$ is integrable and $\int_M f \, d\mu = \lim_{n \to \infty} \int_M f_n \, d\mu$.
(b) Give an example that shows this result need not hold for infinite measure spaces.
14-33. Let $f, g : M \to [-\infty, \infty]$ be integrable functions. Give a more effective proof that $f + g$ is
integrable and $\int_M f + g \, d\mu = \int_M f \, d\mu + \int_M g \, d\mu$ than was given in Theorem 9.25.
Hint. Use the Monotone Convergence Theorem and two nondecreasing sequences of simple functions
that converge to $|f|$ and $|g|$, respectively, to prove that $|f| + |g|$ is integrable. Use a similar idea
to prove the equation for the integrals. Do not use the Dominated Convergence Theorem (why?).
14-34. Explain how the hypotheses of the Dominated Convergence Theorem prevent the occurrence of
examples as in Exercise 11-17 or as mentioned at the end of this section.
14-35. Prove that for every Lebesgue integrable function $f : \mathbb{R} \to \mathbb{R}$ there is a sequence of Riemann
integrable functions $r_n : \mathbb{R} \to \mathbb{R}$ such that $\lim_{n \to \infty} \int_{\mathbb{R}} |f - r_n| \, d\lambda = 0$.
14.6 Convergence in Mean, in Measure, and Almost Everywhere
Theorem 15.50 will show that the “distance” between two functions can be measured
with the integral of the absolute value of the difference. Because convergence with
respect to this distance is fundamental in analysis, we investigate this notion of convergence and some of its consequences here.
Definition 14.44 Let $(M, \Sigma, \mu)$ be a measure space, let $\{f_n\}_{n=1}^\infty$ be a sequence of
measurable functions and let $f : M \to [-\infty, \infty]$ be a measurable function. If
$\lim_{n \to \infty} \int_M |f_n - f| \, d\mu = 0$ then we say that $\{f_n\}_{n=1}^\infty$ converges in mean to $f$.
Convergence in mean is near the heart of the definition of integration.
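Theorem 14.45 Let $(M, \Sigma, \mu)$ be a measure space and let $f : M \to [-\infty, \infty]$ be an
integrable function. Then there is a sequence $\{s_n\}_{n=1}^\infty$ of simple functions that converges
in mean to $f$.
Proof. By definition of the integral, for every $n \in \mathbb{N}$ there are simple functions
$0 \le s_n^- \le f^-$ and $0 \le s_n^+ \le f^+$ so that $\int_M f^- - s_n^- \, d\mu < \frac{1}{n}$ and
$\int_M f^+ - s_n^+ \, d\mu < \frac{1}{n}$. With $s_n := s_n^+ - s_n^-$ we obtain
\[ \int_M |f - s_n| \, d\mu \le \int_M f^+ - s_n^+ \, d\mu + \int_M f^- - s_n^- \, d\mu < \frac{1}{n} + \frac{1}{n} = \frac{2}{n}, \]
which means that $\{s_n\}_{n=1}^\infty$ converges in mean to $f$.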
Unfortunately, convergence in mean does not even imply pointwise a.e. convergence, so the visualization of convergence in mean is a bit complicated.
Example 14.46 On the interval $[0, 1]$, for $n \in \mathbb{N}$ and $k \in \{0, \ldots, 2^n - 1\}$ define the
function $f_{2^n + k} := 1_{\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right]}$. Then the sequence $\{f_m\}_{m=2}^\infty$ converges in mean to 0, but
it does not converge at any point in $[0, 1]$.
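The behavior in Example 14.46 can be observed numerically. The sketch below assumes the indexing $f_{2^n+k} = 1_{[k/2^n, (k+1)/2^n]}$ for $n \ge 1$ and $0 \le k \le 2^n - 1$; the helper names and the sample point $x = 1/3$ are hypothetical choices for illustration. It computes the integrals, which shrink to zero, and the values at one fixed point, which keep returning to both 0 and 1.

```python
from fractions import Fraction

def typewriter(m):
    """Return (left, right) with f_m = 1_[left, right], where m = 2^n + k, m >= 2."""
    n = m.bit_length() - 1          # largest n with 2^n <= m
    k = m - 2**n                    # 0 <= k <= 2^n - 1
    return Fraction(k, 2**n), Fraction(k + 1, 2**n)

def f(m, x):
    """Value of the indicator function f_m at the point x."""
    left, right = typewriter(m)
    return 1 if left <= x <= right else 0

def integral(m):
    """Lebesgue integral of f_m over [0, 1]: the length of the interval."""
    left, right = typewriter(m)
    return right - left

x = Fraction(1, 3)
values_at_x = [f(m, x) for m in range(2, 2**10)]
integrals = [integral(m) for m in range(2, 2**10)]
```

Within every block $2^n \le m < 2^{n+1}$ the intervals sweep across $[0,1]$ once, so each point is hit at least once per block while the integrals are all $2^{-n}$.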
We can say that convergence in mean implies that the measure of any set on which
the difference $|f_n - f|$ is greater than a given number must go to zero.
Proposition 14.47 Let $(M, \Sigma, \mu)$ be a measure space. If the sequence $\{f_n\}_{n=1}^\infty$ of
measurable functions converges in mean to the measurable function $f$, then for all
$\varepsilon > 0$ we have $\lim_{n \to \infty} \mu(\{x \in M : |f_n(x) - f(x)| > \varepsilon\}) = 0$.
Proof. Let $\varepsilon > 0$. Then by Markov’s inequality (see Exercise 14-27)
\[ \mu(\{x \in M : |f_n(x) - f(x)| > \varepsilon\}) \le \frac{1}{\varepsilon} \int_M |f_n - f| \, d\mu \]
and the latter sequence goes to zero.
The condition derived in Proposition 14.47 is called convergence in measure. Convergence
in measure implies at least that a subsequence converges a.e.
Definition 14.48 Let $(M, \Sigma, \mu)$ be a measure space. The sequence $\{f_n\}_{n=1}^\infty$ of measurable
functions converges in measure to the measurable function $f$ iff for every $\varepsilon > 0$
we have $\lim_{n \to \infty} \mu(\{x \in M : |f_n(x) - f(x)| > \varepsilon\}) = 0$.
Proposition 14.49 Let $(M, \Sigma, \mu)$ be a measure space. If the sequence $\{f_n\}_{n=1}^\infty$ of
measurable functions converges in measure to the measurable function $f$, then it has a
subsequence that converges a.e. to $f$.
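Proof. Define $\{n_k\}_{k=1}^\infty$ as follows. Let $n_0 := 1$ and for $k \ge 1$ let $n_k > n_{k-1}$
be so that $\mu\left(\left\{x \in M : |f_{n_k}(x) - f(x)| > \frac{1}{k}\right\}\right) < \frac{1}{2^k}$. Suppose for a contradiction
that $\{f_{n_k}\}_{k=1}^\infty$ does not converge a.e. to $f$. Then there are an $m \in \mathbb{N}$ and an $\varepsilon > 0$
so that $\mu\left(\left\{x \in M : \left(\forall K \in \mathbb{N} : \exists k \ge K : |f_{n_k}(x) - f(x)| > \frac{1}{m}\right)\right\}\right) > \varepsilon$.
Let $K \ge m$ be so that $\sum_{k=K}^\infty \frac{1}{2^k} < \varepsilon$. Then
\[ \left\{x \in M : \left(\forall K \in \mathbb{N} : \exists k \ge K : |f_{n_k}(x) - f(x)| > \frac{1}{m}\right)\right\} \subseteq \bigcup_{k=K}^\infty \left\{x \in M : |f_{n_k}(x) - f(x)| > \frac{1}{k}\right\}, \]
but the measure of the latter set is less than $\varepsilon$, a contradiction.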
Exercises
14-36. Let $(M, \Sigma, \mu)$ be a measure space and let $g : M \to [0, \infty]$ be integrable.
(b) Prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that if $\mu(A) < \delta$, then $\int_A g \, d\mu < \varepsilon$.
Hint. First prove the result for bounded $g$.
(c) Prove that with $M = [a, b]$, $\mu$ being Lebesgue measure $\lambda$ on $[a, b]$ and $\Sigma$ being the set
$\Sigma_\lambda$ of Lebesgue measurable subsets of $[a, b]$, the function $f(x) := \int_a^x g \, d\lambda$ is
continuous.
14-37. Pointwise convergence and convergence in measure.
(a) Give an example that shows that, in general, pointwise convergence does not imply convergence
in measure.
(b) Prove that if $(M, \Sigma, \mu)$ is a measure space with $\mu(M) < \infty$ and $\{f_n\}_{n=1}^\infty$ is a sequence of
measurable functions that converges a.e. to the real valued measurable function $f$, then
$\{f_n\}_{n=1}^\infty$ converges in measure to $f$.
(c) Prove Egoroff’s Theorem, which states that if $(M, \Sigma, \mu)$ is a measure space with
$\mu(M) < \infty$ and $\{f_n\}_{n=1}^\infty$ is a sequence of measurable functions that converges a.e. to the real
valued measurable function $f$, then for each $\varepsilon > 0$ there is a $B \in \Sigma$ so that $\mu(B') < \varepsilon$ and
$\{f_n\}_{n=1}^\infty$ converges uniformly to $f$ on $B$.
14-38. First results on derivatives of integrable functions.
(a) Let $f : [a, b] \to \mathbb{R}$ be a Lebesgue measurable function that is differentiable a.e. Prove that
$f'$ is Lebesgue measurable. You may use the result of Exercise 14-28c.
(b) Let $f : [a, b] \to \mathbb{R}$ be a nondecreasing function. Prove that $f'$ (which exists a.e. by Exercise
10-7) is Lebesgue integrable on $[a, b]$ and $\int_a^b f'(x) \, d\lambda \le f(b) - f(a)$.
Hint. Apply Fatou’s Lemma to $f_n(x) := \frac{f\left(x + \frac{1}{n}\right) - f(x)}{\frac{1}{n}}$ (where we set $f(b + \varepsilon) := f(b)$
for all $\varepsilon > 0$) and use Exercise 7-23, Theorem 9.26 and an adaptation of the proof of the
Derivative Form of the Fundamental Theorem of Calculus for Riemann integrals.
14.7 Product $\sigma$-Algebras
One motivation for introducing measure theory was the desire to define an integral on
multidimensional spaces. Integration with respect to d-dimensional Lebesgue measure
provides such an integral as an abstract entity. Computationally, it is standard practice
to compute higher dimensional integrals as iterated lower dimensional integrals. Sections 14.7 and 14.8 show that this is also possible in measure theory. As a consequence,
we not only progress towards showing that d-dimensional integrals can be computed
as iterated integrals, but we also are able to establish some results on double series (see
Exercise 14-50).
The relevant constructions actually start with the construction of the product space
of two measure spaces. For this product space, we need a $\sigma$-algebra that contains
certain sets. This section will provide the $\sigma$-algebra on the product space and Section
14.8 will provide the product measure.
As before, to safeguard against nonmeasurable sets, we must make sure that our
$\sigma$-algebras do not get too large. The smallest $\sigma$-algebra that contains certain sets is
generated as follows.
Definition 14.50 Let $M$ be a set and let $\mathcal{U} \subseteq \mathcal{P}(M)$. Then the $\sigma$-algebra generated
by $\mathcal{U}$ is defined to be the intersection of all $\sigma$-algebras that contain $\mathcal{U}$.
Proposition 14.51 Let $M$ be a set and let $\mathcal{U} \subseteq \mathcal{P}(M)$. Then the $\sigma$-algebra generated
by $\mathcal{U}$ is a $\sigma$-algebra.
Proof. Exercise 14-39.
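On a finite set, the generated $\sigma$-algebra can also be computed from the inside by repeatedly adding complements and unions until nothing new appears. The sketch below uses that closure procedure rather than the intersection in Definition 14.50; on a finite set the two agree, because the closure is a $\sigma$-algebra that contains the generators and is contained in every $\sigma$-algebra that contains them. The function name and the sample generators are hypothetical.

```python
from itertools import combinations

def generated_sigma_algebra(universe, generators):
    """Close `generators` under complement and pairwise union inside `universe`.

    On a finite set, closure under finite unions is the same as closure
    under countable unions, so the result is the generated sigma-algebra.
    """
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    while True:
        new = {universe - A for A in family}                # complements
        new |= {A | B for A, B in combinations(family, 2)}  # pairwise unions
        if new <= family:                                   # nothing new: closed
            return family
        family |= new

# Generators {1} and {1, 2} on M = {1, 2, 3, 4}.
sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
```

Here the smallest nonempty members are $\{1\}$, $\{2\}$ and $\{3,4\}$, so the result has $2^3 = 8$ members; the points 3 and 4 cannot be separated by the generators.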
Exercise 14-41a shows that the $\sigma$-algebra of Lebesgue measurable sets is generated
by a family of fairly simple sets. To prove the equality of two measures on the $\sigma$-algebra
generated by the set $\mathcal{U}$ we want to concentrate on the sets in $\mathcal{U}$. Unfortunately,
because the definition of a measure involves unions of pairwise disjoint sets, this is not
as straightforward as it might seem. Instead of being equal on $\sigma$-algebras, measures
usually agree on Dynkin systems.
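Definition 14.52 Let $M$ be a set and let $\mathcal{D} \subseteq \mathcal{P}(M)$. Then $\mathcal{D}$ is called a Dynkin
system iff
1. $M \in \mathcal{D}$,
2. If $A, B \in \mathcal{D}$ and $A \supseteq B$, then $A \setminus B \in \mathcal{D}$, and
3. If $\{A_n\}_{n=1}^\infty$ is a sequence in $\mathcal{D}$ with $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$, then $\bigcup_{n=1}^\infty A_n \in \mathcal{D}$.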
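A Dynkin system need not be closed under intersections, which is why it can fail to be a $\sigma$-algebra. The following sketch checks this on a small example: the subsets of even cardinality of a four-point set. The example and the helper names are assumptions made for illustration and are not taken from the text.

```python
from itertools import combinations

def is_dynkin_system(universe, family):
    """Check the conditions of a Dynkin system for a family on a finite set."""
    universe = frozenset(universe)
    family = {frozenset(A) for A in family}
    if universe not in family:
        return False
    # Differences A \ B for members B contained in A must stay in the family.
    if any(B <= A and (A - B) not in family for A in family for B in family):
        return False
    # Increasing unions: on a finite set an increasing sequence is eventually
    # constant, so its union is one of its members and is already in the family.
    return True

def is_intersection_closed(family):
    """Check whether A & B is in the family for all members A, B."""
    family = {frozenset(A) for A in family}
    return all((A & B) in family for A in family for B in family)

M = {1, 2, 3, 4}
# All subsets of M with 0, 2 or 4 elements.
even = [frozenset(c) for r in (0, 2, 4) for c in combinations(sorted(M), r)]

dynkin = is_dynkin_system(M, even)
closed = is_intersection_closed(even)
```

The family passes the Dynkin conditions because a difference of nested even-sized sets has even size, but $\{1,2\} \cap \{2,3\} = \{2\}$ is not a member.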
Definition 14.53 Let $(M, \Sigma, \mu)$ be a measure space. Then the measure $\mu$ is called
finite iff $\mu(M) < \infty$.
Proposition 14.54 Let $(M, \Sigma)$ be a measurable space and let $\mu$ and $\nu$ be finite measures
on $M$ so that $\mu(M) = \nu(M)$. Then $\{S \in \Sigma : \mu(S) = \nu(S)\}$ is a Dynkin system.
Proof. Exercise 14-43a (for part 3 use Theorem 14.15).
As with $\sigma$-algebras, the intersection of all Dynkin systems that contain a given set
of sets $\mathcal{U}$ is again a Dynkin system, called the Dynkin system generated by $\mathcal{U}$ (see
Exercise 14-44). The connection between $\sigma$-algebras and Dynkin systems is made via
$\pi$-systems.
Definition 14.55 Let $M$ be a set and let $\mathcal{U} \subseteq \mathcal{P}(M)$. Then $\mathcal{U}$ is called a $\pi$-system
iff for all $A, B \in \mathcal{U}$ the intersection satisfies $A \cap B \in \mathcal{U}$.
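Theorem 14.56 Monotone Class Theorem or Dynkin’s Lemma. Let $M$ be a set and
let $\mathcal{U}$ be a $\pi$-system that contains $M$. Then the $\sigma$-algebra generated by $\mathcal{U}$ equals the
Dynkin system generated by $\mathcal{U}$.
Proof. Let $\Sigma$ be the $\sigma$-algebra generated by $\mathcal{U}$ and let $\mathcal{D}$ be the Dynkin system
generated by $\mathcal{U}$. Because every $\sigma$-algebra also is a Dynkin system, we infer $\mathcal{D} \subseteq \Sigma$.
For the reversed inclusion, first consider $\mathcal{D}_1 := \{A \in \mathcal{D} : A \cap U \in \mathcal{D} \text{ for all } U \in \mathcal{U}\}$.
Because $\mathcal{U}$ contains $M$ we infer $M \in \mathcal{D}_1$. If $A, B \in \mathcal{D}_1$ and $A \supseteq B$, then for all $U \in \mathcal{U}$
we have $(A \setminus B) \cap U = (A \cap U) \setminus (B \cap U) \in \mathcal{D}$ (see Exercise 7-3e), which means
$A \setminus B \in \mathcal{D}_1$. Moreover, if $\{A_n\}_{n=1}^\infty$ is a sequence of sets in $\mathcal{D}_1$ with $A_n \subseteq A_{n+1}$ for
all $n \in \mathbb{N}$, then for all $U \in \mathcal{U}$ we obtain
$\left( \bigcup_{n=1}^\infty A_n \right) \cap U = \bigcup_{n=1}^\infty (A_n \cap U) \in \mathcal{D}$, which
means that $\bigcup_{n=1}^\infty A_n \in \mathcal{D}_1$. Hence, $\mathcal{D}_1$ is a Dynkin system. Because $\mathcal{U}$ is a $\pi$-system, we
infer $\mathcal{U} \subseteq \mathcal{D}_1$ and therefore $\mathcal{D} \subseteq \mathcal{D}_1$, which implies that $\mathcal{D} = \mathcal{D}_1$.
Now define $\mathcal{D}_2 := \{A \in \mathcal{D} : A \cap D \in \mathcal{D} \text{ for all } D \in \mathcal{D}\}$. Because $\mathcal{D}_1 = \mathcal{D}$ we
infer $\mathcal{U} \subseteq \mathcal{D}_2$. Similar to the above we can prove that $\mathcal{D}_2$ is a Dynkin system, which
means that $\mathcal{D} = \mathcal{D}_2$. We conclude that if $A, B \in \mathcal{D}$, then $A \cap B \in \mathcal{D}$.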
Now we can prove that $\mathcal{D}$ is a $\sigma$-algebra. First, note that because $M \in \mathcal{D}$, for all
sets $S \in \mathcal{D}$ the complement $S' = M \setminus S$ is in $\mathcal{D}$. In particular, we trivially obtain
$\emptyset = M' \in \mathcal{D}$. Now let $\{B_n\}_{n=1}^\infty$ be a sequence of sets in $\mathcal{D}$. For $n \in \mathbb{N}$, define the
set $A_n := \bigcup_{i=1}^n B_i = M \setminus \bigcap_{i=1}^n B_i' \in \mathcal{D}$. Then $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$ and so
$\bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n \in \mathcal{D}$ and $\mathcal{D}$ is a $\sigma$-algebra. This proves $\Sigma \subseteq \mathcal{D}$, hence $\Sigma = \mathcal{D}$.
Figure 30: The “horizontal sections” $S^y$ of a set $S \subseteq X \times Y$ are obtained by intersecting
the set with “horizontal lines” $X \times \{y\}$. The “vertical sections” $S_x$ are obtained by
intersecting the set with “vertical lines” $\{x\} \times Y$.
The Monotone Class Theorem now allows us to prove that under certain circumstances,
two measures on the same $\sigma$-algebra must be equal.
Theorem 14.57 Let $M$ be a set, let $\mathcal{U} \subseteq \mathcal{P}(M)$ be a $\pi$-system, let $\Sigma$ be the $\sigma$-algebra
generated by $\mathcal{U}$ and let $\mu, \nu : \Sigma \to [0, \infty]$ be measures with $\mu|_{\mathcal{U}} = \nu|_{\mathcal{U}}$. If there is a
sequence $\{A_n\}_{n=1}^\infty$ in $\mathcal{U}$ so that all $\mu(A_n) = \nu(A_n)$ are finite, $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$
and $\bigcup_{n=1}^\infty A_n = M$, then $\mu = \nu$.
Proof. First prove that each $\{A \in \Sigma : \mu(A \cap A_n) = \nu(A \cap A_n)\}$ is a Dynkin system.
Then apply Theorem 14.56 to show $\mu(A \cap A_n) = \nu(A \cap A_n)$ for all $A \in \Sigma$ and $n \in \mathbb{N}$.
Finally, show $\mu = \nu$ by using $A_0 := \emptyset$ and $\mu(A) = \sum_{n=1}^\infty \mu(A \cap (A_n \setminus A_{n-1}))$ and
$\nu(A) = \sum_{n=1}^\infty \nu(A \cap (A_n \setminus A_{n-1}))$ for all $A \in \Sigma$. (Exercise 14-45.)
To define a product of two measurable spaces we now simply form all the products
of sets in the respective $\sigma$-algebras and consider the $\sigma$-algebra generated by these products. We also define the intersections of sets with "horizontal lines" or "vertical lines"
and the intersections of graphs of functions with "vertical planes." These intersections
will allow us to reduce integrals on the product of two measure spaces to iterated integrals. To emphasize the visualization as x- and y-coordinates, for the remainder of this
chapter, the first factor space will be $X$ and the second factor space will be $Y$.

Figure 31: The "y-sections" $f^y$ of a function $f : X \times Y \to [-\infty, \infty]$ are obtained
by intersecting the graph of the function with "vertical planes $X \times \{y\} \times [-\infty, \infty]$ in
the X-direction at $y$." The "x-sections" $f_x$ are obtained by intersecting the graph with
"vertical planes $\{x\} \times Y \times [-\infty, \infty]$ in the Y-direction at $x$."
Definition 14.58 Let $(X, \Sigma)$ and $(Y, \Gamma)$ be measurable spaces. A subset $S$ of $X \times Y$ is
called a rectangle with measurable sides iff there are subsets $A \in \Sigma$ and $B \in \Gamma$ such
that $S = A \times B$. The $\sigma$-algebra generated by the set of all rectangles with measurable
sides is called the product $\sigma$-algebra $\Sigma \times \Gamma$ of $\Sigma$ and $\Gamma$.
Let $x \in X$ and $y \in Y$. For all sets $S \in \Sigma \times \Gamma$ in the product $\sigma$-algebra we
define the sets $S_x := \pi_Y[\{ s \in S : \pi_X(s) = x \}]$ and $S^y := \pi_X[\{ s \in S : \pi_Y(s) = y \}]$
(see Figure 30). For all $\Sigma \times \Gamma$-measurable functions $f$, we define $f_x(y) := f(x, y)$ and
$f^y(x) := f(x, y)$ (see Figure 31).
The standard approach to prove results about product $\sigma$-algebras is to establish the
result for rectangles with measurable sides and then to show that the sets for which the
result holds form a $\sigma$-algebra. We start by making sure that the sections $S_x$, $S^y$, $f_x$ and
$f^y$ of measurable sets and functions are measurable.
Proposition 14.59 Let $(X, \Sigma)$ and $(Y, \Gamma)$ be measurable spaces and let $x \in X$ and
$y \in Y$. Then for all sets $S \in \Sigma \times \Gamma$ the set $S_x$ is $\Gamma$-measurable and the set $S^y$
is $\Sigma$-measurable. Moreover, for all $\Sigma \times \Gamma$-measurable functions $f$, the function $f_x$ is
$\Gamma$-measurable and the function $f^y$ is $\Sigma$-measurable.
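Proof. First, we prove that for each $y \in Y$ and $S \in \Sigma \times \Gamma$ we have $S^y \in \Sigma$. To do
this let $y \in Y$ and let $\mathcal{B}^y$ be the set of all $S \in \Sigma \times \Gamma$ so that $S^y \in \Sigma$. For all $A \in \Sigma$
and $B \in \Gamma$ the rectangle with measurable sides $A \times B$ is in $\mathcal{B}^y$, because if $y \in B$ we
have $(A \times B)^y = A \in \Sigma$ and if $y \notin B$ we have $(A \times B)^y = \emptyset \in \Sigma$. To prove that
$\mathcal{B}^y = \Sigma \times \Gamma$ we need to show that $\mathcal{B}^y$ is a $\sigma$-algebra, because then it must contain
$\Sigma \times \Gamma$ (see Exercise 14-42) and must hence be equal to it. To see this, first note that
$\emptyset = \emptyset \times \emptyset \in \mathcal{B}^y$. Now if $S \in \mathcal{B}^y$, then
\[ (S')^y = \pi_X[\{ s \in S' : \pi_Y(s) = y \}] = \pi_X[\{ s \notin S : \pi_Y(s) = y \}] = X \setminus \pi_X[\{ s \in S : \pi_Y(s) = y \}] = (S^y)' \in \Sigma, \]
so $S' \in \mathcal{B}^y$. Finally, if $\{S_n\}_{n=1}^{\infty}$ is a sequence in $\mathcal{B}^y$, then
\[ \Big( \bigcup_{n=1}^{\infty} S_n \Big)^y = \bigcup_{n=1}^{\infty} (S_n)^y \in \Sigma, \]
so $\bigcup_{n=1}^{\infty} S_n \in \mathcal{B}^y$. Thus $\mathcal{B}^y$ is a $\sigma$-algebra that contains the rectangles with measurable
sides and is contained in $\Sigma \times \Gamma$, which means that $\mathcal{B}^y = \Sigma \times \Gamma$. Hence, for all $y \in Y$
and all $S \in \Sigma \times \Gamma$ we have that $S^y \in \Sigma$. Similarly, we prove that for all $x \in X$ and all
$S \in \Sigma \times \Gamma$ we have that $S_x \in \Gamma$.
Now let $f : X \times Y \to [-\infty, \infty]$ be a $\Sigma \times \Gamma$-measurable function. To prove that
for every $y \in Y$, the function $f^y$ is $\Sigma$-measurable, let $y \in Y$ and $a \in \mathbb{R}$. Then
\[ (f^y)^{-1}[(a, \infty]] = \{ x \in X : f(x, y) > a \} = \pi_X[\{ z \in X \times Y : f(z) > a, \pi_Y(z) = y \}], \]
which is the section $\big( f^{-1}[(a, \infty]] \big)^y$ of a set in $\Sigma \times \Gamma$ and hence is in $\Sigma$ by the first part.
Similarly, we prove that for every $x \in X$ the function $f_x$ is $\Gamma$-measurable.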
We conclude this section by showing that the constructions presented here give
reasonable results for Lebesgue measure.
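Proposition 14.60 Let $m, n, d \in \mathbb{N}$ be such that $m + n = d$ and let $\lambda_m$, $\lambda_n$ and $\lambda_d$
denote Lebesgue measure on $\mathbb{R}^m$, $\mathbb{R}^n$ and $\mathbb{R}^d$, respectively. Then $\Sigma_{\lambda_d} \supseteq \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ and
for all $A \in \Sigma_{\lambda_m}$ and $B \in \Sigma_{\lambda_n}$ we have $\lambda_d(A \times B) \le \lambda_m(A) \lambda_n(B)$.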
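Proof. By Exercise 14-41b, $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ is generated by the sets of the form $A \times B$,
where each $A$ and $B$ is either an open box or a null set.
The proof that products of open boxes $A \times B$ (which are of course open boxes
themselves) are $\lambda_d$-measurable is similar to the proof of Proposition 9.13 (see Exercise
14-40). Now let $A \in \Sigma_{\lambda_m}$, let $B \in \Sigma_{\lambda_n}$ and let $\varepsilon > 0$. Find sequences of open
boxes $\{I_j\}_{j=1}^{\infty}$ and $\{K_l\}_{l=1}^{\infty}$ so that the containments $A \subseteq \bigcup_{j=1}^{\infty} I_j$, $B \subseteq \bigcup_{l=1}^{\infty} K_l$ and the
inequality $\sum_{j=1}^{\infty} |I_j| \sum_{l=1}^{\infty} |K_l| < \lambda_m(A) \lambda_n(B) + \varepsilon$ hold. Then
\[ \lambda_d(A \times B) \le \sum_{j,l=1}^{\infty} |I_j \times K_l| = \sum_{j=1}^{\infty} |I_j| \sum_{l=1}^{\infty} |K_l| < \lambda_m(A) \lambda_n(B) + \varepsilon, \]
and for all $A \in \Sigma_{\lambda_m}$ and $B \in \Sigma_{\lambda_n}$ we obtain $\lambda_d(A \times B) \le \lambda_m(A) \lambda_n(B)$.
In particular, this means that for $A \in \Sigma_{\lambda_m}$ and $B \in \Sigma_{\lambda_n}$ such that one of $A$, $B$ is a
null set we have that $\lambda_d(A \times B) = 0$, which means $A \times B$ is $\lambda_d$-measurable. Together
with the $\lambda_d$-measurability of products of open boxes, this means $\Sigma_{\lambda_d} \supseteq \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$,
because $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ is generated by products of open boxes or null sets in $\mathbb{R}^m$ with
open boxes or null sets in $\mathbb{R}^n$.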
Proposition 14.60 has two apparent shortcomings. First, the $\sigma$-algebras are not
shown to be equal to each other. Exercise 14-47 shows that the containment is indeed
strict, so this part of the result cannot be improved. Second, we did not show that
$\lambda_d(A \times B)$ and $\lambda_m(A) \lambda_n(B)$ are equal, even though they should be equal. Recall that
in Proposition 8.5 we needed the Heine-Borel Theorem to prove that the Lebesgue
measure of an interval is its length. To prove that $\lambda_d(A \times B)$ and $\lambda_m(A) \lambda_n(B)$ are
indeed equal, we need a higher dimensional version of the Heine-Borel Theorem. This
version will be provided in Theorems 16.72 and 16.80. Thus we delay the proof that
$\lambda_d(A \times B) = \lambda_m(A) \lambda_n(B)$ to Theorem 16.81. To keep notation simple, unless the
distinction of different dimensions is necessary, we will denote Lebesgue measure in
all dimensions by $\lambda$.
Exercises
14-39. Prove Proposition 14.51.
14-40. Prove that every d-dimensional open box is $\lambda_d$-measurable.
14-41. Generating the Lebesgue measurable sets. For $d \in \mathbb{N}$ let $\mathcal{G}_d$ be the set of all d-dimensional open
boxes and all d-dimensional null sets.
(a) Prove that $\Sigma_{\lambda_d}$, the $\sigma$-algebra of Lebesgue measurable sets in $\mathbb{R}^d$, is generated by $\mathcal{G}_d$.
Hint. $\mathcal{G}_d \subseteq \Sigma_{\lambda_d}$ by Exercise 14-40. Prove that every $S \in \Sigma_{\lambda_d}$ with $\lambda(S) < \infty$ differs
by a null set from a countable intersection of countable unions of open boxes. Then use
$\sigma$-finiteness.
(b) Prove that for $m, n \in \mathbb{N}$ the $\sigma$-algebra generated by $\{ A \times B : A \in \mathcal{G}_m, B \in \mathcal{G}_n \}$ is $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$.
(c) Prove that if $m + n = d$ and $A \in \Sigma_{\lambda_d}$, then there is a $B \in \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ with $\lambda_d(A \setminus B \cup B \setminus A) = 0$.
(d) Prove that if $m + n = d$ and the function $f : \mathbb{R}^d \to [-\infty, \infty]$ is $\Sigma_{\lambda_d}$-measurable, then there
is a $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$-measurable function $g$ with $f = g$ a.e.
14-42. Let $M$ be a set and let $\mathcal{U}$ be a set of subsets of $M$. Prove that the $\sigma$-algebra generated by $\mathcal{U}$ is
contained in all $\sigma$-algebras that contain $\mathcal{U}$.
14-43. On the equality of measures.
(a) Prove Proposition 14.54.
(b) Prove that the $\sigma$-algebra generated by all the singleton subsets of $\mathbb{R}$ is the set
$\Sigma := \{ C \subseteq \mathbb{R} : C \text{ or } \mathbb{R} \setminus C \text{ is countable} \}$.
(c) Construct two finite measures on $\Sigma$ that agree on all singletons, but which are not equal.
Hint. Set $\mu(\{k\}) := \frac{1}{2^k}$ for $k \in \mathbb{N}$ and set it equal to zero for all other singletons.
(d) Construct a measurable space $(X, \Sigma)$ and two finite measures $\mu$ and $\nu$ on $\Sigma$ so that the equality
$\mu(X) = \nu(X)$ holds, but $\{ S \in \Sigma : \mu(S) = \nu(S) \}$ is not a $\sigma$-algebra.
Hint. There is a finite example.
14-44. Prove that if $\{\mathcal{D}_i\}_{i \in I}$ is a family of Dynkin systems on the set $M$, then $\bigcap_{i \in I} \mathcal{D}_i$
is a Dynkin system.
14-45. Prove Theorem 14.57.
14-46. Prove that if $\mu$ is a measure defined on the $\sigma$-algebra $\mathcal{B}$ generated by the finite open intervals such that
$\mu((a, b)) = b - a$ for all finite open intervals, then $\mu$ must be equal to the restriction of Lebesgue
measure to $\mathcal{B}$.
14-47. The containment of the $\sigma$-algebras in Proposition 14.60 is strict. To see this, prove that for $m < d$
every subset of $\mathbb{R}^m$ (considered as the subset $\mathbb{R}^m \times \{0\}^{d-m}$ of $\mathbb{R}^d$) can be a section of a Lebesgue
measurable set in $\mathbb{R}^d$.
14.8 Product Measures and Fubini's Theorem
We are now ready to define a measure on the product $\sigma$-algebra. Because Theorems
14.56 and 14.57 are needed to construct an unambiguous product measure in Theorem
14.62, we introduce the notion of $\sigma$-finiteness.
Definition 14.61 A measure space $(M, \Sigma, \mu)$ is called $\sigma$-finite iff there is a sequence
$\{A_j\}_{j=1}^{\infty}$ of subsets $A_j \subseteq M$ of finite measure so that $M = \bigcup_{j=1}^{\infty} A_j$. We will also sometimes call
the measure $\sigma$-finite.
Clearly, Lebesgue measure is $\sigma$-finite, so we will be able to use the results from
this section for d-dimensional integration.
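One way to see the $\sigma$-finiteness of Lebesgue measure explicitly is to exhaust $\mathbb{R}^d$ by boxes of growing side length. The particular choice of sets below is an illustration and is not taken from the text; only the upper bound on the measure is needed, and it follows by covering each closed box with a slightly larger open box.

```latex
\[
  \mathbb{R}^d = \bigcup_{j=1}^{\infty} [-j, j]^d ,
  \qquad
  \lambda\big( [-j, j]^d \big) \le (2j)^d < \infty
  \quad \text{for all } j \in \mathbb{N}.
\]
```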
Theorem 14.62 Let $(X, \Sigma, \mu)$ and $(Y, \Gamma, \nu)$ be $\sigma$-finite measure spaces. Then there is
a unique measure $\mu \times \nu$ on $\Sigma \times \Gamma$ so that with the convention $0 \cdot \infty = 0$ for all $A \in \Sigma$
and $B \in \Gamma$ we have $(\mu \times \nu)(A \times B) = \mu(A) \nu(B)$.
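Proof. The natural idea for the product measure of a set $S \in \Sigma \times \Gamma$ is to integrate
the $\nu$-measures of the sections $S_x$ over $X$. To do this we first need to prove that for all
$S \in \Sigma \times \Gamma$ the function $x \mapsto \nu(S_x)$ is $\Sigma$-measurable.
Let $\mathcal{H}$ be the set of sets $S \in \Sigma \times \Gamma$ so that the function $x \mapsto \nu(S_x)$ is $\Sigma$-measurable.
We need to prove that $\mathcal{H} = \Sigma \times \Gamma$. For all $A \in \Sigma$ and $B \in \Gamma$ we have that
$\nu((A \times B)_x)$ equals $\nu(B)$ when $x \in A$ and $0$ when $x \notin A$. Therefore, the function
$x \mapsto \nu((A \times B)_x) = \nu(B) 1_A(x)$ is $\Sigma$-measurable for all rectangles with measurable sides $A \times B$.
To prove $\mathcal{H}$ is a $\sigma$-algebra, first note that the rectangles with measurable sides form
a $\pi$-system (see Exercise 7-7) and $X \times Y \in \mathcal{H}$. To be able to apply Theorem 14.56,
we will now prove $\mathcal{H}$ is a Dynkin system when $\nu$ is finite. If $S, F \in \mathcal{H}$ and $S \subseteq F$,
then for $F \setminus S$ we have $\nu((F \setminus S)_x) = \nu(F_x \setminus S_x) = \nu(F_x) - \nu(S_x)$, which implies
that $F \setminus S \in \mathcal{H}$. Now let $\{A_n\}_{n=1}^{\infty}$ be a sequence in $\mathcal{H}$ with $A_n \subseteq A_{n+1}$ for all $n \in \mathbb{N}$.
Then
\[ \nu\Big( \Big( \bigcup_{n=1}^{\infty} A_n \Big)_x \Big) = \nu\Big( \bigcup_{n=1}^{\infty} (A_n)_x \Big) = \lim_{n \to \infty} \nu\big( (A_n)_x \big) \]
by Theorem 14.15, so by
Exercise 14-28a $\mathcal{H}$ is closed under the formation of increasing unions and thus $\mathcal{H}$ is a
Dynkin system. Therefore, by Theorem 14.56 $\mathcal{H} = \Sigma \times \Gamma$. Hence, if $\nu$ is finite, for all
$S \in \Sigma \times \Gamma$ the function $x \mapsto \nu(S_x)$ is $\Sigma$-measurable.
To prove the result for $\sigma$-finite measure spaces $(Y, \Gamma, \nu)$, let $\{F_n\}_{n=1}^{\infty}$ be a sequence
of pairwise disjoint sets in $\Gamma$ with $\nu(F_n) < \infty$ so that $\bigcup_{n=1}^{\infty} F_n = Y$. Define
$\nu_n(S) := \nu(F_n \cap S)$. Then each $\nu_n$ is a finite measure and so each function $x \mapsto \nu_n(S_x)$ is
$\Sigma$-measurable. But then $\nu(S_x) = \sum_{n=1}^{\infty} \nu_n(S_x)$ is also $\Sigma$-measurable (Exercise 14-28a),
which proves that the functions $x \mapsto \nu(S_x)$ are $\Sigma$-measurable.
Now we prove that $(\mu \times \nu)(S) := \int_X \nu(S_x) \, d\mu$ defines a measure on $\Sigma \times \Gamma$.
Clearly, $(\mu \times \nu)(\emptyset) = \int_X \nu(\emptyset_x) \, d\mu = \int_X 0 \, d\mu = 0$. Now let $\{S_n\}_{n=1}^{\infty}$ be a sequence
of pairwise disjoint sets in $\Sigma \times \Gamma$. Then for all $x \in X$ we have that $\{(S_n)_x\}_{n=1}^{\infty}$ is a
sequence of pairwise disjoint sets in $\Gamma$. Therefore
\[ (\mu \times \nu)\Big( \bigcup_{n=1}^{\infty} S_n \Big) = \int_X \nu\Big( \bigcup_{n=1}^{\infty} (S_n)_x \Big) d\mu = \int_X \sum_{n=1}^{\infty} \nu\big( (S_n)_x \big) \, d\mu = \sum_{n=1}^{\infty} \int_X \nu\big( (S_n)_x \big) \, d\mu = \sum_{n=1}^{\infty} (\mu \times \nu)(S_n). \]
Thus $\mu \times \nu$ defines a measure on $\Sigma \times \Gamma$. Moreover, for all $A \in \Sigma$ and $B \in \Gamma$ we
have $(\mu \times \nu)(A \times B) = \int_X \nu((A \times B)_x) \, d\mu = \int_X \nu(B) 1_A \, d\mu = \mu(A) \nu(B)$, where
the overall result is zero if one of $A$ or $B$ has measure zero. Because $\mu$ and $\nu$ are
both $\sigma$-finite, any measure with the above property must be $\sigma$-finite. Thus by Theorem
14.57 $\mu \times \nu$ is the unique measure on $\Sigma \times \Gamma$ so that $(\mu \times \nu)(A \times B) = \mu(A) \nu(B)$ for
all $A \in \Sigma$ and $B \in \Gamma$.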
Definition 14.63 The measure $\mu \times \nu$ in Theorem 14.62 is called the product measure
of $\mu$ and $\nu$.
Theorem 14.62 not only proves the existence of the product measure, its proof also
provides a representation of $\mu \times \nu$. Because the product measure is unique and we
could have switched the roles of the two factors, there are two representations. These
two representations are needed so that we can ultimately switch the order of integration.
Corollary 14.64 Let $(X, \Sigma, \mu)$, $(Y, \Gamma, \nu)$ be $\sigma$-finite measure spaces and let
$S \in \Sigma \times \Gamma$. Then $a(x) := \nu(S_x)$ is $\Sigma$-measurable, $b(y) := \mu(S^y)$ is $\Gamma$-measurable
and the equality $(\mu \times \nu)(S) = \int_X \nu(S_x) \, d\mu = \int_Y \mu(S^y) \, d\nu$ holds.
Proof. Similar to the proof of Theorem 14.62 we can prove that for all $S \in \Sigma \times \Gamma$
the function $y \mapsto \mu(S^y)$ is $\Gamma$-measurable and $(\mu \times \nu)(S) = \int_Y \mu(S^y) \, d\nu$.
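On finite sets, where every subset is measurable and integrals are finite sums, the two representations in Corollary 14.64 can be computed directly. The following is a minimal sketch under that assumption; the point weights `mu` and `nu`, the set `S` and the helper names are hypothetical illustration data, not taken from the text.

```python
from fractions import Fraction


def section_x(S, x):
    """Vertical section S_x = {y : (x, y) in S}."""
    return {b for (a, b) in S if a == x}


def section_y(S, y):
    """Horizontal section S^y = {x : (x, y) in S}."""
    return {a for (a, b) in S if b == y}


def measure(weights, A):
    """Measure of A for a measure given by point weights on a finite set."""
    return sum((weights[a] for a in A), Fraction(0))


# Hypothetical finite measure spaces: point weights on X and on Y.
mu = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(0)}
nu = {1: Fraction(2), 2: Fraction(1, 3)}

S = {("a", 1), ("a", 2), ("b", 2), ("c", 1)}

# Integrate nu(S_x) over X, integrate mu(S^y) over Y, and sum the rectangle weights.
via_x = sum(mu[x] * measure(nu, section_x(S, x)) for x in mu)
via_y = sum(nu[y] * measure(mu, section_y(S, y)) for y in nu)
direct = sum(mu[x] * nu[y] for (x, y) in S)

print(via_x, via_y, direct)
```

All three numbers agree, which is the finite analogue of integrating the measures of the sections in either order.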
We are now ready to prove that integrals with respect to product measures can be
computed as iterated integrals.
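Proposition 14.65 Let $(X, \Sigma, \mu)$ and $(Y, \Gamma, \nu)$ be $\sigma$-finite measure spaces and let the
function $f : X \times Y \to [0, \infty]$ be $\Sigma \times \Gamma$-measurable. Then the function $x \mapsto \int_Y f_x \, d\nu$
is $\Sigma$-measurable and the function $y \mapsto \int_X f^y \, d\mu$ is $\Gamma$-measurable. Moreover,
\[ \int_{X \times Y} f \, d(\mu \times \nu) = \int_X \Big( \int_Y f_x \, d\nu \Big) d\mu = \int_Y \Big( \int_X f^y \, d\mu \Big) d\nu. \]
Proof. We first prove the result for $f = 1_S$ where $S \in \Sigma \times \Gamma$. By Proposition
14.59, for all $y \in Y$ the function $f^y = (1_S)^y = 1_{S^y}$ is $\Sigma$-measurable. Now by Corollary 14.64
\[ \int_Y \Big( \int_X (1_S)^y \, d\mu \Big) d\nu = \int_Y \mu(S^y) \, d\nu = (\mu \times \nu)(S) = \int_{X \times Y} 1_S \, d(\mu \times \nu), \]
and similarly we can prove that
\[ \int_X \Big( \int_Y (1_S)_x \, d\nu \Big) d\mu = \int_X \nu(S_x) \, d\mu = (\mu \times \nu)(S) = \int_{X \times Y} 1_S \, d(\mu \times \nu). \]
By Theorem 14.39 the equalities must hold for all simple $\Sigma \times \Gamma$-measurable functions and by the Monotone Convergence Theorem the equalities hold for all nonnegative $\Sigma \times \Gamma$-measurable functions.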
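Theorem 14.66 Fubini's Theorem. Let $(X, \Sigma, \mu)$ and $(Y, \Gamma, \nu)$ be $\sigma$-finite measure
spaces and let $f : X \times Y \to [-\infty, \infty]$ be $\mu \times \nu$-integrable. Then for $\mu$-almost all
$x \in X$ the function $f_x$ is $\nu$-integrable and for $\nu$-almost all $y \in Y$ the function $f^y$
is $\mu$-integrable. The function
\[ x \mapsto \begin{cases} \int_Y f_x \, d\nu; & \text{if } f_x \text{ is } \nu\text{-integrable,} \\ 0; & \text{otherwise,} \end{cases} \]
is $\mu$-integrable, the analogously defined function of $y$ is $\nu$-integrable, and with these conventions
\[ \int_{X \times Y} f \, d(\mu \times \nu) = \int_X \Big( \int_Y f_x \, d\nu \Big) d\mu = \int_Y \Big( \int_X f^y \, d\mu \Big) d\nu. \]
Proof. By Proposition 14.65 we have that the functions $x \mapsto \int_Y (f^+)_x \, d\nu$ and
$x \mapsto \int_Y (f^-)_x \, d\nu$ are $\mu$-integrable, so $f_x$ is $\nu$-integrable for $\mu$-a.e. $x \in X$. Now
\[ \int_{X \times Y} f \, d(\mu \times \nu) = \int_{X \times Y} f^+ \, d(\mu \times \nu) - \int_{X \times Y} f^- \, d(\mu \times \nu) = \int_X \Big( \int_Y (f^+)_x \, d\nu \Big) d\mu - \int_X \Big( \int_Y (f^-)_x \, d\nu \Big) d\mu = \int_X \Big( \int_Y f_x \, d\nu \Big) d\mu. \]
The other equation and the assertions about the $f^y$ are proved similarly.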
Exercises
14-48. Construct a Lebesgue integrable function $f : [0, 1] \times [0, 1] \to [-\infty, \infty]$ so that not all $f_x$ and $f^y$
are Lebesgue integrable.
14-49. Let $(\mathbb{R}, \Sigma_\lambda, \lambda)$ be the real numbers with Lebesgue measure, let $(\mathbb{R}, \mathcal{P}(\mathbb{R}), \gamma)$ be the real numbers
with counting measure and let $f := 1_{y=x}$ be the indicator function of the set $\{ (x, y) \in \mathbb{R}^2 : y = x \}$.
Prove that
\[ \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} f^y \, d\lambda \Big) d\gamma \ne \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} f_x \, d\gamma \Big) d\lambda. \]
Note. This example shows that the assumption of $\sigma$-finiteness of the measure spaces is needed
throughout this section.
14-50. Let ( N , P ( N ) ,
) be the natural numbers with counting measure
Prove that P(N) x P(W) = P ( N x W), where the first product denotes the product a-algebra
of the two power sets.
Prove that
x
=
mxw
cc
s1s1
Prove that if the double series
Hint. Recall Example 14.37.
a ( i , j )converges absolutely, then we can exchange the
c
CCICCI
Prove that if the double series
a ( i , j )converges absolutely, then for all bijective func-
i=l j = 1
N the sum
c
x
cam
tions u : W x W + N x
ao(i,jj converges to
i=l j=1
i=l
Hint. Dominated Convergence Theorem and Fubini
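14-51. Let $(M, \Sigma, \mu)$ be a $\sigma$-finite measure space, let $f : M \to [0, \infty]$ be $\Sigma$-measurable and let $(\mathbb{R}, \Sigma_\lambda, \lambda)$
be the real numbers with Lebesgue measure.
(a) Prove that $E = \{ (x, y) \in M \times \mathbb{R} : 0 \le y < f(x) \}$ is in $\Sigma \times \Sigma_\lambda$.
(b) Prove that $\int_M f \, d\mu = \int_0^{\infty} \mu\big( \{ x \in M : f(x) > y \} \big) \, dy$.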
Chapter 15
The Abstract Venues for
Analysis
It is now time to introduce the broad spectrum of settings in which analysis is done.
Applications are usually set in multidimensional Euclidean space, so we need to find
a way to do analysis in $\mathbb{R}^d$. Moreover, because the solutions for many modeling processes are functions, we need to consider spaces of functions. To make the theory
applicable in many situations, we analyze the abstract commonalities of the structures
in question, define spaces based on these commonalities and then prove theorems in
these abstract spaces. The beauty and power of the abstraction are founded in the
wide applicability of the results as well as in their relative simplicity. Regarding the
wide applicability, recall, for example, that the initial abstraction in Chapter 14 was
but a simple step from the corresponding notions for functions of one real variable.
Yet it ultimately allowed us to integrate in arbitrary dimensions. Regarding the relative simplicity, note that the notions of convergence for functions presented in Chapter
11 and in Section 14.6 involve a significant amount of detail about functions, which
may be quite complicated objects themselves. The generalizations of d-dimensional
space presented in this chapter will allow us to visualize the functions and their convergence behavior as if they were points and sequences in low dimensional spaces. This
simplification is significant.
The subsequent chapters will provide more details on the metric, normed and inner
product spaces presented in this chapter.
15.1 Abstraction I: Vector Spaces
Vector spaces describe the algebraic properties of multidimensional spaces and of
spaces of functions. These properties guarantee that objects can be added to each other
and multiplied with numbers.
Definition 15.1 A real vector space or a vector space is a triple $(X, +, \cdot)$ of a set $X$
and two binary operations, vector addition $+ : X \times X \to X$ and scalar multiplication
$\cdot : \mathbb{R} \times X \to X$, so that the following hold.
1. Vector addition is associative.
That is, for all $x, y, z \in X$ we have $(x + y) + z = x + (y + z)$.
2. Vector addition is commutative.
That is, for all $x, y \in X$ we have $x + y = y + x$.
3. There is a neutral element $0 \in X$ for addition.
That is, there is an element $0 \in X$ so that for all $x \in X$ we have $x + 0 = x$.
4. For every element $x \in X$, there is an additive inverse element.
That is, for every $x \in X$ there is a $(-x) \in X$ so that $x + (-x) = 0$.
5. Scalar multiplication is (left) distributive over vector addition.
That is, for all $x, y \in X$ and $\alpha \in \mathbb{R}$ we have $\alpha \cdot (x + y) = \alpha \cdot x + \alpha \cdot y$.
6. Scalar multiplication is (right) distributive over scalar addition.
That is, for all $x \in X$ and $\alpha, \beta \in \mathbb{R}$ we have $(\alpha + \beta) \cdot x = \alpha \cdot x + \beta \cdot x$.
7. Scalar multiplication is "associative."
That is, for all $x \in X$ and $\alpha, \beta \in \mathbb{R}$ we have $\alpha \cdot (\beta \cdot x) = (\alpha \beta) \cdot x$.
8. The number $1 \in \mathbb{R}$ is a neutral element for scalar multiplication.
That is, for all $x \in X$ we have $1 \cdot x = x$.
An element of a vector space is also called a vector and a real number is also
called a scalar in this context. We will usually refer to the set X as the vector space,
implicitly assuming addition and multiplication are denoted as usual. As is customary
for multiplications, the dot is usually omitted.
The standard example and visualization for a vector space is d-dimensional space.
Example 15.2 Let $d \in \mathbb{N}$. The set $\mathbb{R}^d := \{ (x_1, \ldots, x_d) : x_i \in \mathbb{R} \}$ with componentwise
addition and scalar multiplication is a vector space.
To indicate how abstraction can simplify arguments, we will not directly prove the
vector space properties for d-dimensional space. Instead, we note that d-dimensional
space can be interpreted as the space of functions $f : \{1, \ldots, d\} \to \mathbb{R}$. The fact that
d-dimensional space is a vector space then follows from Example 15.3 below. In a
way, Example 15.3 is universal and it should be thoroughly understood. All examples
of spaces in this text are subspaces of spaces as in Example 15.3.
Example 15.3 Let $D$ be a set. The set $F(D, \mathbb{R})$ of all functions $f : D \to \mathbb{R}$ with
addition defined pointwise by $(f + g)(x) := f(x) + g(x)$ and scalar multiplication
defined pointwise by $(\alpha \cdot f)(x) := \alpha f(x)$ is a vector space.
All properties follow from the appropriate pointwise properties for real numbers.
For example, addition of functions is commutative, because for all $x \in D$ we have that
$(f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x)$. The neutral element with
respect to addition is the function that is equal to $0 \in \mathbb{R}$ for all $x \in D$.
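The identification of vectors in d-dimensional space with functions on an index set can be made concrete in a few lines. This is a minimal sketch, assuming a finite domain $D$ and functions stored as dictionaries; the helper names `add` and `scale` and the sample values are hypothetical.

```python
def add(f, g):
    """Pointwise sum of two functions D -> R, stored as dicts with the same keys."""
    return {x: f[x] + g[x] for x in f}


def scale(alpha, f):
    """Pointwise scalar multiple of a function D -> R."""
    return {x: alpha * f[x] for x in f}


D = [1, 2, 3]                      # index set {1, ..., d} with d = 3
f = {1: 1.0, 2: -2.0, 3: 0.5}      # the vector (1, -2, 0.5) viewed as a function on D
g = {1: 4.0, 2: 0.0, 3: 1.5}
zero = {x: 0.0 for x in D}         # neutral element: the function that is 0 on all of D

print(add(f, g) == add(g, f))                                    # commutativity
print(add(f, zero) == f)                                         # neutral element
print(add(f, scale(-1, f)) == zero)                              # additive inverse
print(scale(2, add(f, g)) == add(scale(2, f), scale(2, g)))      # distributivity
```

Each check reduces to the corresponding property of real numbers at every point of $D$, which is exactly the argument used in the example.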
By identifying each vector $(x_1, \ldots, x_d) \in \mathbb{R}^d$ with the function that maps each
index $i \in \{1, \ldots, d\}$ to $x_i$, we see that $\mathbb{R}^d$ is the vector space $F(\{1, \ldots, d\}, \mathbb{R})$. We
conclude this section by introducing several important subspaces of spaces $F(D, \mathbb{R})$.
Definition 15.4 A subset $S$ of a vector space $X$ is called a vector subspace or subspace of $X$ iff it is a vector space when equipped with the restrictions of addition and
scalar multiplication to $S$.
The advantage of working with subspaces of a somewhat universal vector space
like $F(D, \mathbb{R})$ is that subspaces are easily proved to be vector spaces.
Lemma 15.5 Let $X$ be a vector space. Then the neutral element $0 \in X$ with respect to
addition is unique. Moreover, for the scalar $0 \in \mathbb{R}$ and all vectors $x \in X$ we have that
$0 \cdot x = 0$.
Proof. Suppose $0, 0' \in X$ are both neutral elements with respect to addition. Then
$0 = 0 + 0' = 0' + 0 = 0'$. Hence, the neutral element with respect to addition is unique.
Now let $x \in X$. Then for all $y \in X$ we have
\[ y + 0x = y + x + 0x - x = y + (1 + 0)x - x = y + x - x = y. \]
Therefore $0x$ is neutral with respect to addition, and hence it is equal to $0 \in X$.
Proposition 15.6 A nonempty subset $S$ of the vector space $X$ is a subspace iff for all
$x, y \in S$ we have $x + y \in S$ and for all $\alpha \in \mathbb{R}$ and $x \in S$ we have $\alpha \cdot x \in S$.
Proof. For "$\Rightarrow$," note that if $S$ is a subspace, then the results of operations with
elements of $S$ must again be in $S$.
Conversely, for "$\Leftarrow$" note that if the sums and scalar multiples of elements of $S$ are
again in $S$, then the properties of the restrictions of the operations to $S$ follow from the
corresponding properties for the overall operations on $X$. The existence of a neutral
element for addition follows from the fact that if $x \in S$, then $0 = 0x \in S$.
Example 15.7 The set $l^\infty := \{ \{x_i\}_{i=1}^{\infty} : x_i \in \mathbb{R}, \sup\{ |x_i| : i \in \mathbb{N} \} < \infty \}$ with addition and scalar multiplication defined termwise is a vector space.
First, note that $l^\infty$ is a subset of $F(\mathbb{N}, \mathbb{R})$. Now if $f, g \in l^\infty$ and $\{ |f(n)| : n \in \mathbb{N} \}$
is bounded by $B_f$ and $\{ |g(n)| : n \in \mathbb{N} \}$ is bounded by $B_g$, then for all $n \in \mathbb{N}$ we have
$|(f + g)(n)| = |f(n) + g(n)| \le |f(n)| + |g(n)| \le B_f + B_g < \infty$, so $f + g \in l^\infty$.
Similarly, if $\alpha \in \mathbb{R}$, then for all $n \in \mathbb{N}$ we have $|(\alpha f)(n)| = |\alpha| |f(n)| \le |\alpha| B_f < \infty$.
Thus $l^\infty$ is a subspace of $F(\mathbb{N}, \mathbb{R})$.
Example 15.8 The set $C^0([a, b], \mathbb{R})$ of continuous functions $f : [a, b] \to \mathbb{R}$ with
pointwise addition and scalar multiplication is a vector space.
Clearly, $C^0([a, b], \mathbb{R})$ is a subset of $F([a, b], \mathbb{R})$. Moreover, part 1 of Theorem
3.27 implies that sums of continuous functions are continuous. Part 3 of Theorem 3.27,
applied with one function being constant, implies that scalar multiples of continuous
functions are continuous.
Notation 15.9 To simplify notation, we write $C^0[a, b]$ instead of $C^0([a, b], \mathbb{R})$.
Example 15.8 shows how switching to more abstract situations is often just a
change in point-of-view. We no longer consider individual objects, but instead we
consider classes of objects. Results about individual objects are then often summarized
in statements about the class. For example, the fact that sums and constant multiples of
continuous functions are continuous is summarized in the statement that the continuous
functions form a vector space.
Example 15.10 For $a < b$ and $k \in \mathbb{N}$, let $C^k(a, b)$ be the set of $k$ times continuously
differentiable functions on $(a, b)$. Then $C^k(a, b)$ with pointwise addition and scalar
multiplication is a vector space. Moreover, $C^\infty(a, b) := \bigcap_{k=1}^{\infty} C^k(a, b)$, the space of
infinitely differentiable functions on $(a, b)$, with pointwise addition and scalar multiplication is a vector space.
This is an induction using Theorem 4.6.
Spaces of “p-integrable functions” are of central importance in analysis.
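Definition 15.11 Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p < \infty$. We define
\[ \mathcal{L}^p(M, \Sigma, \mu) := \Big\{ f \in F(M, \mathbb{R}) : \int_M |f|^p \, d\mu < \infty \Big\}. \]
Example 15.12 Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p < \infty$. Then
$\mathcal{L}^p(M, \Sigma, \mu)$ is a vector space.
Because $\mathcal{L}^p(M, \Sigma, \mu) \subseteq F(M, \mathbb{R})$ we can apply Proposition 15.6.
Let $f, g \in \mathcal{L}^p(M, \Sigma, \mu)$. Then
\[ \int_M |f + g|^p \, d\mu \le \int_M 2^p |f|^p + 2^p |g|^p \, d\mu < \infty, \]
so $f + g \in \mathcal{L}^p(M, \Sigma, \mu)$. Moreover, if $\alpha \in \mathbb{R}$, then
\[ \int_M |\alpha f|^p \, d\mu = |\alpha|^p \int_M |f|^p \, d\mu < \infty, \]
so $\alpha \cdot f \in \mathcal{L}^p(M, \Sigma, \mu)$. Hence, $\mathcal{L}^p(M, \Sigma, \mu)$ is a vector space.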
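The estimate for $|f + g|^p$ in Example 15.12 rests on the pointwise inequality $|a + b|^p \le (2 \max\{|a|, |b|\})^p \le 2^p |a|^p + 2^p |b|^p$ for real numbers $a, b$. The following minimal sketch spot-checks that inequality on random inputs; the function name and the sampling ranges are hypothetical choices, and a numerical check is of course not a proof.

```python
import random


def lp_sum_bound_holds(a, b, p):
    """Check |a + b|^p <= 2^p |a|^p + 2^p |b|^p for real a, b and p >= 1."""
    lhs = abs(a + b) ** p
    rhs = 2 ** p * abs(a) ** p + 2 ** p * abs(b) ** p
    return lhs <= rhs * (1 + 1e-12)  # small relative tolerance for rounding


random.seed(0)
samples = [
    (random.uniform(-10, 10), random.uniform(-10, 10), random.uniform(1, 5))
    for _ in range(1000)
]
all_hold = all(lp_sum_bound_holds(a, b, p) for a, b, p in samples)
print(all_hold)
```

Integrating the pointwise inequality over $M$ gives the bound used in the example.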
For concrete examples, the notation for $\mathcal{L}^p$ with an abstract measure space becomes
a bit cumbersome.
Definition 15.13 When we work with real valued functions on a Lebesgue measurable
subset $D$ of $\mathbb{R}^d$, we will assume that $M = D$, $\Sigma = \Sigma_\lambda$, the $\sigma$-algebra of Lebesgue
measurable subsets of $D$, and $\mu = \lambda$, the Lebesgue measure. In this case, we denote $\mathcal{L}^p(D) := \mathcal{L}^p(M, \Sigma, \mu)$. Also, if $D$ is an interval, we may leave out the outer
parentheses. That is, we may write $\mathcal{L}^p[a, b]$ for $\mathcal{L}^p([a, b])$, and so on.
Aside from the capital $\mathcal{L}^p$ spaces, the lowercase $l^p$-spaces of Definition 15.14 below are sometimes used in analysis.
In this chapter, we will frequently translate results between "little $l^p$" and "big
$\mathcal{L}^p$." This will not be a problem, because the computations are very similar. The only
adjustment usually is that sums are interchanged with integrals and vice versa. As a first
demonstration of the similarities between "little $l^p$" and "big $\mathcal{L}^p$" we leave the proof
that $l^p$ is a vector space to Exercise 15-2a. The two ways in which the analogy between
$l^p$ and $\mathcal{L}^p$ can be used will be presented in Sections 15.4 and 15.6, respectively. In
Section 15.4, we will prove results for $l^2$ and then show how the proof is analogous for
$\mathcal{L}^2$. In Section 15.6, we will prove results for $\mathcal{L}^p$ and show how results for $l^p$ can be
obtained as corollaries.
Exercises
15-1. Prove that if $X$ is a vector space, then for all $x \in X$ we have $(-1)x = -x$.
15-2. Examples of vector spaces.
(a) Prove that for $1 \le p < \infty$ the space $l^p$ is a vector space.
(b) Let $(M, \Sigma)$ be a measurable space. Prove that the set $\mathcal{M}(M, \Sigma)$ of measurable real valued
functions is a vector space.
(c) Let $a < b$ and let $BV[a, b]$ denote the set of all functions of bounded variation on $[a, b]$
(see Exercise 8-12). Prove that $BV[a, b]$ with pointwise addition and scalar multiplication is
a vector space.
15-3. Containments of $\mathcal{L}^p$-spaces.
(a) Prove that $l^1 \subseteq l^2 \subseteq l^\infty$. Then give concrete examples to show the containment is proper.
Hint. Harmonic series.
(b) Prove that if $1 \le p < q$, then $l^p \subseteq l^q$. Give an example to show the containment is proper.
(c) Let $(M, \Sigma, \mu)$ be a measure space with $\mu(M) < \infty$. Prove that if $1 \le p < q$, then
$\mathcal{L}^p(M, \Sigma, \mu) \supseteq \mathcal{L}^q(M, \Sigma, \mu)$. Give an example that shows that the containment is proper.
Hint. Exercise 9-32.
(d) Give an example of a measure space $(M, \Sigma, \mu)$ such that for all $1 \le p < q$ we have that
$\mathcal{L}^p(M, \Sigma, \mu) \not\subseteq \mathcal{L}^q(M, \Sigma, \mu)$ and $\mathcal{L}^p(M, \Sigma, \mu) \not\supseteq \mathcal{L}^q(M, \Sigma, \mu)$.
Hint. By part 15-3b the measure space cannot be discrete and by part 15-3c it cannot be finite.
15.2 Representation of Elements: Bases and Dimension
It is often useful to represent vectors in terms of certain standard vectors. The most
familiar example is the standard coordinate system in d-dimensional space.
Definition 15.15 Let $X$ be a vector space. A subset $S \subseteq X \setminus \{0\}$ is called linearly
independent iff for all finite subsets $\{x_1, \ldots, x_n\} \subseteq S$ and all sets of scalars
$\{\alpha_1, \ldots, \alpha_n\} \subseteq \mathbb{R}$ the equation $\sum_{i=1}^{n} \alpha_i x_i = 0$ implies $\alpha_1 = \alpha_2 = \cdots = \alpha_n = 0$.
A sum $\sum_{i=1}^{n} \alpha_i x_i$ with $\alpha_i \in \mathbb{R}$ and $x_i \in X$ is also called a linear combination of
$x_1, \ldots, x_n$.
Definition 15.16 A linearly independent set $B \subseteq X$ such that for every $x \in X$ there
are a finite subset $\{b_1, \ldots, b_n\} \subseteq B$ and a set of scalars $\{\alpha_1, \ldots, \alpha_n\} \subseteq \mathbb{R}$ so that
$x = \sum_{i=1}^{n} \alpha_i b_i$ is called a base of the vector space $X$.
Obviously, linear independence and bases are finitary notions. They are usually
investigated in linear algebra. For our purposes, they are important to establish the
difference between finite dimensional and infinite dimensional spaces.
Example 15.17 In $\mathbb{R}^d$, let $e_i$ denote the vector such that the $i$th component is 1 and all
other components are zero. Then $\{e_1, \ldots, e_d\}$ is a base of $\mathbb{R}^d$.
To prove linear independence, for each $i = 1, \ldots, d$ let $e_i^{(j)}$ denote the $j$th component
of $e_i$. Then for any $\alpha_1, \ldots, \alpha_d$ the vector equation $\sum_{i=1}^{d} \alpha_i e_i = 0$ leads with
$j = 1, \ldots, d$ to the scalar equations $\sum_{i=1}^{d} \alpha_i e_i^{(j)} = 0$, which for each $j$ simply state that
$\alpha_j = 0$ as was to be proved.
Regarding the representation of elements, for each $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ we have
that $x = (x_1, \ldots, x_d) = \sum_{i=1}^{d} x_i e_i$.
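The two claims of Example 15.17 can be traced in a short computation. This is a minimal sketch with hypothetical helper names `unit_vector` and `linear_combination` and an arbitrarily chosen sample vector; it uses 1-based indices to match the text.

```python
def unit_vector(i, d):
    """Standard unit vector e_i in R^d (1-based index i) as a list of length d."""
    return [1 if j == i else 0 for j in range(1, d + 1)]


def linear_combination(coeffs, vectors):
    """Componentwise sum of coeffs[i] * vectors[i]."""
    d = len(vectors[0])
    return [sum(a * v[j] for a, v in zip(coeffs, vectors)) for j in range(d)]


d = 4
basis = [unit_vector(i, d) for i in range(1, d + 1)]

# Representation: the coefficients of x with respect to e_1, ..., e_d are its components.
x = [3, -1, 0, 7]
print(linear_combination(x, basis) == x)

# Linear independence: the j-th component of sum alpha_i e_i is alpha_j,
# so the combination is the zero vector only if every alpha_j is zero.
alpha = [2, 0, -5, 1]
print(linear_combination(alpha, basis) == alpha)
```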
The nice thing about bases is that the representation of elements is unique.
Proposition 15.18 Let $X$ be a vector space and let $B \subseteq X$ be a base. Then for each
$x \in X$ the $b_1, \ldots, b_n \in B$ and $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ in Definition 15.16 are unique, except
that any vector of $B \setminus \{b_1, \ldots, b_n\}$ can be added to the $b_i$ with a coefficient zero and
that any $b_i$ with $\alpha_i = 0$ can be omitted.
Proof. Suppose for a contradiction that there is an $x \in X$ with two different representations. Let $\{b_1, \ldots, b_n\} \subseteq B$ be the set of base vectors that occur in either
representation. Then there are two distinct n-tuples $(\alpha_1, \ldots, \alpha_n) \ne (\beta_1, \ldots, \beta_n)$ in
$\mathbb{R}^n$ so that $\sum_{i=1}^{n} \alpha_i b_i = x = \sum_{i=1}^{n} \beta_i b_i$. But then $\sum_{i=1}^{n} (\alpha_i - \beta_i) b_i = 0$, and hence by linear
independence of $B$, for all $i \in \{1, \ldots, n\}$ we conclude $\alpha_i - \beta_i = 0$, that is, $\alpha_i = \beta_i$, a
contradiction.
Every vector space is either finite dimensional, that is, it has a finite base, or not.
Finite dimensional spaces are investigated in linear algebra and in analysis. Infinite dimensional spaces are mostly investigated in (functional) analysis.
Proposition 15.19 Let $X$ be a vector space with a finite base $F$. Then every linearly
independent subset $L$ of $X$ has at most as many elements as $F$. Moreover, all bases of
$X$ have as many elements as $F$.
Proof. Let $F = \{f_1, \ldots, f_n\}$ be a finite base of $X$. Suppose for a contradiction
that there is a linearly independent set $L \subseteq X$ that has more elements than $F$. Without
loss of generality we can assume that $L$ is such that $|L \cap F|$ is maximal. That is,
if $\tilde{L}$ is another linearly independent subset of $X$ with more elements than $F$, then
$|\tilde{L} \cap F| \le |L \cap F|$. Now let $b \in L \setminus F$ and consider the linearly independent sets $L \setminus \{b\}$
and $F$. By Exercise 15-4a, there is a subset $H \subseteq F \setminus L$ so that $C := (L \setminus \{b\}) \cup H$ is
a base of $X$. By Exercise 15-4b, $L \setminus \{b\}$ is not a base of $X$, so $H \ne \emptyset$. Hence, $C$ has
at least as many elements as $L$ and in particular it has more elements than $F$. Because
$b \notin F$, we obtain $|[(L \setminus \{b\}) \cup H] \cap F| = |(L \cap F) \cup H| > |L \cap F|$, a contradiction.
Thus no linearly independent subset of $X$ has more elements than $F$. In particular,
if $B$ is another base of $X$, then $|B| \le |F|$. Therefore $B$ is finite and with the same
argument as above, if $L \subseteq X$ is linearly independent, then $|L| \le |B|$. Because $F$ is
linearly independent, we obtain $|F| \le |B|$, and hence $|F| = |B|$.
The fact that if one base is finite then all bases have the same size allows us to
assign one number as the dimension.
Definition 15.20 A vector space $X$ is called finite dimensional iff it has a finite base.
In this case the dimension of $X$ is the number of vectors in a finite base. If $X$ has no
finite base it is called infinite dimensional.
Proposition 15.21 A vector space $X \ne \{0\}$ is infinite dimensional iff it contains an
infinite linearly independent set.
Proof. For "$\Rightarrow$," note that no finite linearly independent subset of $X$ is a base.
Construct a sequence of finite linearly independent subsets $F_n$ of $X$ as follows. Let
$F_1 := \{f_1\}$ with $f_1 \in X \setminus \{0\}$ arbitrary. Once $F_n = \{f_1, \ldots, f_n\}$ is chosen, note that
because $F_n$ is not a base of $X$, there is an $f_{n+1} \in X$ so that there are no $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$
so that $\sum_{j=1}^{n} \alpha_j f_j = f_{n+1}$. This means that $F_{n+1} := F_n \cup \{f_{n+1}\}$ is linearly independent
(Exercise 15-5). But then the infinite set $\bigcup_{n=1}^{\infty} F_n$ is linearly independent in $X$.
For "$\Leftarrow$" let $A \subseteq X$ be an infinite linearly independent subset. Then by Proposition
15.19 $X$ cannot have a finite base.
Our first example of an infinite dimensional space is easy to come by.
Example 15.22 In $l^1$ let $e_i$ be the sequence with $j$th entry
$$e_i^{(j)} := \begin{cases} 0; & \text{for } j \ne i, \\ 1; & \text{for } j = i. \end{cases}$$
Then $\{e_i : i \in \mathbb{N}\}$ is linearly independent, and hence $l^1$ is infinite dimensional. The reader will verify this in Exercise 15-7.
Example 15.22 shows a shortcoming of the definition of a base that is ultimately addressed analytically in Banach and Hilbert Spaces. The set of “unit vectors” in Example
15.22 has everything we want in a base, except that we would need to sum infinitely
many vectors for the representations. To define “infinite sums” we need a notion of
convergence, which will be introduced as soon as we investigate metric spaces. For
now, we continue in our exploration of the spaces of abstract analysis and we conclude
this section with a notation we will use in $l^p$ spaces from here on.
Convention 15.23 When working with sequences $\{a_n\}_{n=1}^{\infty}$ of objects in spaces whose elements are finite or infinite sequences themselves, we will move the sequence index for the elements into the exponent in parentheses. That is, $a_n^{(j)}$ will denote the $j$th element in the sequence $a_n$, which is itself an element of the sequence $\{a_n\}_{n=1}^{\infty}$. The vector $e_i$ with $j$th entry
$$e_i^{(j)} = \begin{cases} 0; & \text{for } j \ne i, \\ 1; & \text{for } j = i, \end{cases}$$
will be called the $i$th standard unit vector. For all $1 \le p \le \infty$, the standard unit vectors are contained in $l^p$.
Exercises
15-4. The span of linearly independent sets. Let $X$ be a vector space and let $W \subseteq X$. Define the span of $W$ to be $\mathrm{span}(W) := \left\{ \sum_{i=1}^{n} a_i w_i : a_i \in \mathbb{R}, w_i \in W \right\}$.
(a) Let $B$ and $F$ be linearly independent sets and let $F$ be finite. Prove that if $\mathrm{span}(B \cup F) = X$, then there is a subset $H$ of $F$ so that $B \cup H$ is a base of $X$.
Hint. Induction on the size of $F$. In the induction step, let $f \in F$ and distinguish the cases $f \in \mathrm{span}(B)$ and $f \notin \mathrm{span}(B)$.
(b) Let $B$ be a linearly independent set and let $b \in B$. Prove that $\mathrm{span}(B) \ne \mathrm{span}(B \setminus \{b\})$.
15-5. Let $F = \{f_1, \ldots, f_n\}$ be linearly independent in the vector space $X$. Prove that if $f_{n+1} \in X$ is so that there are no $a_1, \ldots, a_n \in \mathbb{R}$ so that $\sum_{j=1}^{n} a_j f_j = f_{n+1}$, then $F \cup \{f_{n+1}\}$ is linearly independent.
15-6. Base Exchange Theorem. Prove that if $X$ is a vector space and $\{v_1, \ldots, v_n\}$ and $\{w_1, \ldots, w_n\}$ are bases of $X$, then there is a $j \in \{1, \ldots, n\}$ so that $\{v_1, \ldots, v_{n-1}, w_j\}$ is a base of $X$.
15-7. Prove the claim in Example 15.22.
15-8. Prove that for $1 < p \le \infty$ the space $l^p$ is infinite dimensional.
15-9. Prove that for $1 \le p < \infty$ the space $\mathcal{L}^p[0, 1]$ is infinite dimensional.
15-10. Prove that the space $C^0[0, 1]$ is infinite dimensional.
15.3 Identification of Spaces: Isomorphism
Isomorphisms are structure preserving maps. Basically, isomorphic structures are for
all intents and purposes “the same,” as long as we only care about the operations and
concepts that are preserved by the isomorphism.
Definition 15.24 Let $X, Y$ be vector spaces. Then the function $\Phi : X \to Y$ is called an isomorphism iff $\Phi$ is bijective and for all $u, v \in X$ and $\alpha \in \mathbb{R}$ we have that $\Phi(u + v) = \Phi(u) + \Phi(v)$ and $\Phi(\alpha u) = \alpha \Phi(u)$.
Definition 15.24 and Exercise 15-11, which proves that the inverse of an isomorphism is again an isomorphism, show why isomorphic structures are considered to be
“the same.” Let $\Phi$ be an isomorphism from the vector space $X$ to the vector space $Y$.
Obviously, we have a one-for-one matching between the elements of X and Y . Moreover, as long as we only care about linear operations, it does not matter if we carry out
the operations in X or in Y , because we can always map back and forth between the
spaces and get the corresponding results.
The next result shows that from the point-of-view of linear algebra, all finite dimensional spaces are “the same.” Theorem 16.76 will show that they are also “the same”
from an analytical point-of-view.
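Proposition 15.25 Every $d$-dimensional vector space $X$ is isomorphic to $\mathbb{R}^d$.
Proof. Let $\{b_1, \ldots, b_d\}$ be a base of $X$. Then for each vector $x \in X$ there are unique numbers $x_1, \ldots, x_d \in \mathbb{R}$ so that $x = \sum_{i=1}^{d} x_i b_i$. Define the function $\Phi : X \to \mathbb{R}^d$ by
$$\Phi(x) = \Phi\left(\sum_{i=1}^{d} x_i b_i\right) := \sum_{i=1}^{d} x_i e_i.$$
Because the $x_i$ are unique, $\Phi$ is well-defined.
Moreover, for all $\alpha \in \mathbb{R}$ and all $x, y \in X$ with $x = \sum_{i=1}^{d} x_i b_i$ and $y = \sum_{i=1}^{d} y_i b_i$ we have
$$\Phi(x + y) = \Phi\left(\sum_{i=1}^{d} x_i b_i + \sum_{i=1}^{d} y_i b_i\right) = \Phi\left(\sum_{i=1}^{d} (x_i + y_i) b_i\right) = \sum_{i=1}^{d} (x_i + y_i) e_i = \sum_{i=1}^{d} x_i e_i + \sum_{i=1}^{d} y_i e_i = \Phi(x) + \Phi(y)$$
and
$$\Phi(\alpha x) = \Phi\left(\sum_{i=1}^{d} \alpha x_i b_i\right) = \sum_{i=1}^{d} \alpha x_i e_i = \alpha \sum_{i=1}^{d} x_i e_i = \alpha \Phi(x),$$
which means that $\Phi$ has the linearity properties of an isomorphism. For surjectivity, note that if $y = \sum_{i=1}^{d} y_i e_i$ is in $\mathbb{R}^d$ then $\Phi\left(\sum_{i=1}^{d} y_i b_i\right) = \sum_{i=1}^{d} y_i e_i = y$. Finally, to prove that $\Phi$ is injective, let $x = \sum_{i=1}^{d} x_i b_i$ and $z = \sum_{i=1}^{d} z_i b_i$ be in $X$ with $\Phi(x) = \Phi(z)$. Then
$$0 = \Phi(x) - \Phi(z) = \Phi(x - z) = \Phi\left(\sum_{i=1}^{d} (x_i - z_i) b_i\right) = \sum_{i=1}^{d} (x_i - z_i) e_i,$$
which implies $x_i = z_i$ for all $i$, and hence $x = z$.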
Exercises
15-11. Let $X$ and $Y$ be vector spaces. Prove that if $\Phi : X \to Y$ is an isomorphism, then so is its inverse $\Phi^{-1} : Y \to X$.
15-12. Prove that for any $1 \le p \le \infty$ the space $\mathbb{R}^d$ is isomorphic to the subspace of $l^p$ consisting of sequences $\{x_j\}_{j=1}^{\infty}$ so that for all $j > d$ we have $x_j = 0$.
15-13. Prove that $\Phi : X \to Y$ is an isomorphism iff $\Phi$ is bijective and for all $\alpha \in \mathbb{R}$ and $x, y \in X$ we have that $\Phi(\alpha x + y) = \alpha \Phi(x) + \Phi(y)$.
15.4 Abstraction II: Inner Product Spaces
In three dimensional space there are two ways to multiply vectors. The vector (or cross)
product, inspired by observations of magnetic forces on moving charges and also used
in the definition of the torque, is native to three dimensional space. However, the scalar
(or dot) product, which is inspired by the consideration of work done by a force and
which can be used to measure angles, can be generalized to other vector spaces. The
corresponding notion is called an inner product. Strictly speaking, Definition 15.26
below defines a real inner product space. For complex inner product spaces, consider
Definition 15.81.
Definition 15.26 An inner product space is a pair $(X, \langle \cdot, \cdot \rangle)$ of a vector space $X$ and a function $\langle \cdot, \cdot \rangle : X \times X \to \mathbb{R}$, called the inner product or scalar product on $X$, with the following properties.
1. The inner product is positive definite.
That is, for all $x \in X$ we have $\langle x, x \rangle \in [0, \infty)$ with $\langle x, x \rangle = 0$ iff $x = 0$.
2. The inner product is symmetric.
That is, for all $x, y \in X$ we have $\langle x, y \rangle = \langle y, x \rangle$.
3. The inner product is linear in the first factor.
That is, for all scalars $\alpha$ and all $x, y \in X$ we have $\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$ and for all $x, y, z \in X$ we have $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$.
We will usually call $X$ itself an inner product space. When we do so, we implicitly
assume that there is an inner product on $X$, which will usually be denoted $\langle \cdot, \cdot \rangle$.
Exercise 15-14 shows that the inner product is also linear in the second factor.
Rather than staying with d-dimensional space, in our first example we go directly to an
infinite dimensional inner product space.
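Example 15.27 The set $l^2 := \left\{ \{x_i\}_{i=1}^{\infty} : x_i \in \mathbb{R}, \sum_{i=1}^{\infty} x_i^2 < \infty \right\}$ is a vector subspace of $F(\mathbb{N}, \mathbb{R})$ and with $\left\langle \{x_i\}_{i=1}^{\infty}, \{y_i\}_{i=1}^{\infty} \right\rangle := \sum_{i=1}^{\infty} x_i y_i$ it is an inner product space.
To prove that $l^2$ is a subspace of $F(\mathbb{N}, \mathbb{R})$ first let $\{x_i\}_{i=1}^{\infty} \in l^2$ and $\alpha \in \mathbb{R}$. Then $\sum_{i=1}^{\infty} (\alpha x_i)^2 = \alpha^2 \sum_{i=1}^{\infty} x_i^2 < \infty$, and hence $\{\alpha x_i\}_{i=1}^{\infty} \in l^2$. To prove that $l^2$ is closed under addition, let $\{x_i\}_{i=1}^{\infty}, \{y_i\}_{i=1}^{\infty} \in l^2$ and note that (see Exercise 15-15) for all $x, y \in \mathbb{R}$ the inequality $2|xy| \le x^2 + y^2$ holds. Then
$$\sum_{i=1}^{\infty} (x_i + y_i)^2 \le \sum_{i=1}^{\infty} \left( x_i^2 + 2|x_i y_i| + y_i^2 \right) \le 2 \sum_{i=1}^{\infty} x_i^2 + 2 \sum_{i=1}^{\infty} y_i^2 < \infty,$$
and hence $\{x_i\}_{i=1}^{\infty} + \{y_i\}_{i=1}^{\infty} = \{x_i + y_i\}_{i=1}^{\infty} \in l^2$. Moreover, the series in the proposed inner product satisfy $\sum_{i=1}^{\infty} |x_i y_i| \le \sum_{i=1}^{\infty} (x_i^2 + y_i^2) < \infty$. Therefore for all sequences $\{x_i\}_{i=1}^{\infty}, \{y_i\}_{i=1}^{\infty} \in l^2$ the series $\left\langle \{x_i\}_{i=1}^{\infty}, \{y_i\}_{i=1}^{\infty} \right\rangle = \sum_{i=1}^{\infty} x_i y_i$ converges absolutely, and hence $\langle \cdot, \cdot \rangle$ is defined on all of $l^2 \times l^2$.
The fact that $\langle \cdot, \cdot \rangle$ defines an inner product now follows easily from standard results about series. (Exercise 15-16.)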
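The absolute convergence argument of Example 15.27 can be checked numerically. The following is a minimal sketch, assuming the two sample sequences $x_i = 1/i$ and $y_i = 1/(i+1)$ (chosen here only for illustration, not taken from the text); the helper name `partial_inner_product` is likewise hypothetical. For this choice the series $\sum x_i y_i$ telescopes, so the partial sums are known in closed form.

```python
import math

def partial_inner_product(x, y, n_terms):
    """Partial sum x(1)*y(1) + ... + x(n)*y(n) of the proposed l^2 inner product."""
    return math.fsum(x(i) * y(i) for i in range(1, n_terms + 1))

# Hypothetical sample sequences in l^2 (illustration only).
x = lambda i: 1.0 / i
y = lambda i: 1.0 / (i + 1)

# Termwise bound 2|x_i y_i| <= x_i^2 + y_i^2 (Exercise 15-15), checked on the first 1000 terms.
termwise_ok = all(
    2 * abs(x(i) * y(i)) <= x(i) ** 2 + y(i) ** 2 for i in range(1, 1001)
)

# x_i * y_i = 1/(i(i+1)) = 1/i - 1/(i+1), so the n-th partial sum equals 1 - 1/(n+1).
n = 100_000
inner = partial_inner_product(x, y, n)
closed_form = 1.0 - 1.0 / (n + 1)
```

The partial sums stay below the bound $\sum (x_i^2 + y_i^2)$ and approach 1, which is the value of the inner product for this particular pair of sequences.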
Subspaces of inner product spaces clearly are inner product spaces themselves.
Example 15.28 shows how d-dimensional space can be obtained as a subspace of $l^2$,
which means d-dimensional space can be equipped with an inner product.
Example 15.28 Let $d \in \mathbb{N}$. The set $\mathbb{R}^d := \{(x_1, \ldots, x_d) : x_i \in \mathbb{R}\}$ with termwise addition and scalar multiplication and $\langle (x_1, \ldots, x_d), (y_1, \ldots, y_d) \rangle := \sum_{i=1}^{d} x_i y_i$ is an inner product space.
For $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, define $\iota(x)$ to be the sequence $\{z_n\}_{n=1}^{\infty}$ defined termwise by
$$z_n := \begin{cases} x_n; & \text{for } n \le d, \\ 0; & \text{otherwise.} \end{cases}$$
The image $\iota[\mathbb{R}^d]$ is a subspace of $l^2$, and hence it is an inner product space. Because $\iota : \mathbb{R}^d \to l^2$ is injective, preserves sums and scalar multiples and satisfies $\langle x, y \rangle = \langle \iota(x), \iota(y) \rangle$ for all $x, y \in \mathbb{R}^d$ we conclude that $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^d$ must satisfy the properties of an inner product. Thus $\mathbb{R}^d$ is an inner product space.
In fact, $\mathbb{R}^d$ is, as an inner product space, isomorphic to the subspace $\iota[\mathbb{R}^d] \subseteq l^2$, where the isomorphisms between inner product spaces are defined as follows.
Definition 15.29 Let $(X, \langle \cdot, \cdot \rangle_X)$ and $(Y, \langle \cdot, \cdot \rangle_Y)$ be inner product spaces. Then the function $\Phi : X \to Y$ is called an isomorphism iff it is an isomorphism between the vector spaces $X$ and $Y$ and for all $u, v \in X$ we have $\langle u, v \rangle_X = \langle \Phi(u), \Phi(v) \rangle_Y$.
Many important results in analysis come from the fact that $\mathcal{L}^2$ is almost an inner
product space. There is one problem, which the reader will explain in Exercise 15-18.
Yet Proposition 15.30 shows that most properties hold just fine. The problem with $\mathcal{L}^2$,
and indeed with $\mathcal{L}^p$ in general, will be resolved in Section 15.8.
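Proposition 15.30 Let $(M, \Sigma, \mu)$ be a measure space. Consider the vector space $\mathcal{L}^2(M, \Sigma, \mu)$ of square integrable functions. Then $\langle f, g \rangle := \int_M f(x) g(x) \, d\mu$ is a function from $\mathcal{L}^2(M, \Sigma, \mu) \times \mathcal{L}^2(M, \Sigma, \mu)$ to $\mathbb{R}$ such that
1. For all $f \in \mathcal{L}^2(M, \Sigma, \mu)$, we have $\langle f, f \rangle \in [0, \infty)$. Moreover, $\langle 0, 0 \rangle = 0$.
2. For all $f, g \in \mathcal{L}^2(M, \Sigma, \mu)$, we have $\langle f, g \rangle = \langle g, f \rangle$.
3. For all $\alpha \in \mathbb{R}$ and all $f, g \in \mathcal{L}^2(M, \Sigma, \mu)$, we have $\langle \alpha f, g \rangle = \alpha \langle f, g \rangle$, and for all $f, g, h \in \mathcal{L}^2(M, \Sigma, \mu)$ we have $\langle f + g, h \rangle = \langle f, h \rangle + \langle g, h \rangle$.
Proof. The proof that $\langle f, g \rangle$ exists for all $f, g \in \mathcal{L}^2(M, \Sigma, \mu)$ is similar to the proof for $l^2$ and left to the reader as Exercise 15-17. For the properties of $\langle \cdot, \cdot \rangle$, let $f, g, h \in \mathcal{L}^2(M, \Sigma, \mu)$ and $\alpha \in \mathbb{R}$. Then
$$\langle f, f \rangle = \int_M f^2 \, d\mu \ge 0, \qquad \langle 0, 0 \rangle = \int_M 0 \, d\mu = 0,$$
$$\langle f, g \rangle = \int_M f g \, d\mu = \int_M g f \, d\mu = \langle g, f \rangle,$$
$$\langle \alpha f, g \rangle = \int_M \alpha f g \, d\mu = \alpha \int_M f g \, d\mu = \alpha \langle f, g \rangle,$$
$$\langle f + g, h \rangle = \int_M (f + g) h \, d\mu = \int_M f h \, d\mu + \int_M g h \, d\mu = \langle f, h \rangle + \langle g, h \rangle.$$
Note that on $\mathcal{L}^2[-\pi, \pi)$ we usually work with $\langle f, g \rangle := \frac{1}{\pi} \int_{[-\pi, \pi)} f g \, d\lambda$. The extra factor $\frac{1}{\pi}$ is motivated by Example 15.34 below. Inner product spaces will be investigated in more detail in Chapter 20, with groundwork laid in this chapter, Chapter 16 and Section 17.1.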
Exercises
15-14. Prove that if $X$ is an inner product space, then for all $x, y, z \in X$ and $\alpha \in \mathbb{R}$ we have $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$ and $\langle x, \alpha y \rangle = \alpha \langle x, y \rangle$.
15-15. Prove that for all $x, y \in \mathbb{R}$ we have $2|xy| \le x^2 + y^2$.
15-16. Finish Example 15.27 by proving that $\langle \cdot, \cdot \rangle$ is an inner product on $l^2$.
15-17. Prove that $\langle f, g \rangle$ as in Proposition 15.30 exists for all $f, g \in \mathcal{L}^2(M, \Sigma, \mu)$.
15-18. Explain why $\mathcal{L}^2(M, \Sigma, \mu)$ with $\langle f, g \rangle := \int_M f(x) g(x) \, d\mu$ is not necessarily an inner product space.
Hint. Part 3 of Theorem 14.35.
15-19. Prove that $C^0[-\pi, \pi]$ with $\langle f, g \rangle := \frac{1}{\pi} \int_{-\pi}^{\pi} f g \, d\lambda$ is an inner product space.
15.5 Nicer Representations: Orthonormal Sets
An inner product allows the definition of right angles. This in turn leads to the desire
to represent vectors in terms of “nice” systems in which any two vectors are at right
angles.
Definition 15.31 Let $X$ be an inner product space. Then $x, y \in X$ are called orthogonal iff $\langle x, y \rangle = 0$.
Definition 15.32 Let $X$ be an inner product space. A subset $S \subseteq X$ is called an orthonormal system iff for any two $a, b \in S$ we have
$$\langle a, b \rangle = \begin{cases} 1; & \text{if } a = b, \\ 0; & \text{if } a \ne b. \end{cases}$$
A maximal orthonormal system is an orthonormal system $S$ so that if $x \in X$ is such that $\langle x, s \rangle = 0$ for all $s \in S$, then $x = 0$.
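Example 15.33 The standard unit vectors $e_i$ of Convention 15.23 form a maximal orthonormal system in $l^2$.
For all $i, j \in \mathbb{N}$ we have
$$\langle e_i, e_j \rangle = \sum_{k=1}^{\infty} e_i^{(k)} e_j^{(k)} = \begin{cases} 0; & \text{if } i \ne j, \\ 1; & \text{if } i = j. \end{cases}$$
Hence, the standard unit vectors form an orthonormal system in $l^2$. Now let $a \in l^2$. Then $\langle a, e_i \rangle = \sum_{j=1}^{\infty} a^{(j)} e_i^{(j)} = a^{(i)}$ for all $i \in \mathbb{N}$. Hence, if $a \in l^2$ is so that $\langle a, e_i \rangle = 0$ for all $i \in \mathbb{N}$, then $a = 0$. This means that $\{e_i : i \in \mathbb{N}\}$ is a maximal orthonormal system in $l^2$.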
Example 15.34 For all $m, n \in \mathbb{N}$, we conclude the following from Exercise 12-15a.
$$\int_{-\pi}^{\pi} \cos(mx) \cos(nx) \, dx = \begin{cases} 0; & \text{if } m \ne n, \\ \pi; & \text{if } m = n, \end{cases}$$
$$\int_{-\pi}^{\pi} \sin(mx) \sin(nx) \, dx = \begin{cases} 0; & \text{if } m \ne n, \\ \pi; & \text{if } m = n, \end{cases}$$
$$\int_{-\pi}^{\pi} \sin(mx) \cos(nx) \, dx = 0.$$
This means that $T := \{ \sin(nx), \cos(mx) : n \ge 1, m \ge 1 \} \cup \left\{ \frac{1}{\sqrt{2}} \right\}$ is an orthonormal system in the space of continuous bounded functions on $[-\pi, \pi)$ with the inner product $\langle f, g \rangle := \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) g(x) \, dx$. Even more is true, as this set is a maximal orthonormal system. The proof is quite sophisticated however. We will present it in Theorem 20.12. Note that the extra factor in the inner product assures that for each function in $T$ the inner product with itself is 1. Also note that the same properties hold in $\mathcal{L}^2[-\pi, \pi)$, except that $\mathcal{L}^2[-\pi, \pi)$ is not quite an inner product space.
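The orthonormality relations of Example 15.34 can be illustrated numerically. The sketch below approximates the inner product $\frac{1}{\pi}\int_{-\pi}^{\pi} f(x) g(x)\,dx$ with a midpoint rule; the function name `inner`, the number of nodes, and the particular members of $T$ that are tested are assumptions made for this illustration only.

```python
import math

def inner(f, g, n=2000):
    """Midpoint-rule approximation of (1/pi) * integral of f(x) g(x) over [-pi, pi]."""
    h = 2 * math.pi / n
    nodes = (-math.pi + (k + 0.5) * h for k in range(n))
    return math.fsum(f(x) * g(x) for x in nodes) * h / math.pi

# A few members of T; the choice of frequencies is arbitrary.
system = [
    lambda x: 1 / math.sqrt(2),
    lambda x: math.sin(x),
    lambda x: math.cos(x),
    lambda x: math.sin(2 * x),
    lambda x: math.cos(3 * x),
]

# Gram matrix of the sample; for an orthonormal system it is the identity matrix.
gram = [[inner(f, g) for g in system] for f in system]
size = len(system)
max_error = max(
    abs(gram[i][j] - (1.0 if i == j else 0.0))
    for i in range(size)
    for j in range(size)
)
```

The computed Gram matrix agrees with the identity up to rounding, which is what the three integral formulas together with the factor $\frac{1}{\pi}$ predict. This is a numerical check, not a proof, and it says nothing about maximality.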
Orthonormal systems possess a key property for being a base.
Proposition 15.35 Orthonormal systems are linearly independent.
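Proof. Let $S$ be an orthonormal system in an inner product space $X$ and let $c_1, \ldots, c_n \in S$ and $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ be such that $\sum_{i=1}^{n} \alpha_i c_i = 0$. Then for all indices $j = 1, \ldots, n$ we obtain
$$\alpha_j = \sum_{i=1}^{n} \alpha_i \langle c_i, c_j \rangle = \left\langle \sum_{i=1}^{n} \alpha_i c_i, c_j \right\rangle = \langle 0, c_j \rangle = 0.$$
Therefore $S$ is linearly independent.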
This linear independence means that for finite dimensional spaces we should be
able to find an orthonormal base, that is, a base that also is an orthonormal system.
Definition 15.36 Let $X$ be a vector space and let $W \subseteq X$. Define the span of $W$ to be $\mathrm{span}(W) := \left\{ \sum_{i=1}^{n} a_i w_i : a_i \in \mathbb{R}, w_i \in W \right\}$.
Theorem 15.37 The Gram-Schmidt Orthonormalization Procedure. Every finite dimensional inner product space $X$ has an orthonormal base.
Proof. Let $d$ be the dimension of $X$, let $\{b_1, \ldots, b_d\}$ be a base of $X$ and for all vectors $x \in X$ use the notation $\|x\| := \sqrt{\langle x, x \rangle}$. Set $c_1 := \frac{b_1}{\|b_1\|}$. After $c_1, \ldots, c_{k-1}$ have been defined so that $\{c_1, \ldots, c_{k-1}\}$ is an orthonormal base for $\mathrm{span}(\{b_1, \ldots, b_{k-1}\})$, let $d_k := b_k - \sum_{i=1}^{k-1} \langle b_k, c_i \rangle c_i$. Because $\{b_1, \ldots, b_k\}$ is linearly independent, we infer $d_k \ne 0$. Define $c_k := \frac{d_k}{\|d_k\|}$. Then for all $j < k$ we obtain
$$\langle c_k, c_j \rangle = \frac{1}{\|d_k\|} \left\langle b_k - \sum_{i=1}^{k-1} \langle b_k, c_i \rangle c_i, c_j \right\rangle = \frac{1}{\|d_k\|} \left( \langle b_k, c_j \rangle - \langle b_k, c_j \rangle \right) = 0.$$
Because $\|c_k\| = 1$, the set $\{c_1, \ldots, c_k\}$ is an orthonormal set and because all $c_j$ are linear combinations of $b_1, \ldots, b_k$ it is contained in $\mathrm{span}(\{b_1, \ldots, b_k\})$. Because orthonormal sets are linearly independent, the set $\{c_1, \ldots, c_k\}$ is a base of the span of $b_1, \ldots, b_k$.
After $d - 1$ steps as above we obtain that $\{c_1, \ldots, c_d\}$ is a base of $X$.
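The procedure in the proof of Theorem 15.37 translates directly into code. The following is a minimal sketch for $\mathbb{R}^d$ with the standard inner product, assuming vectors are stored as Python lists; the function names, the sample base, and the dependence tolerance `1e-12` are choices made for this illustration and do not come from the text.

```python
import math

def dot(x, y):
    """Standard inner product on R^d."""
    return math.fsum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product."""
    return math.sqrt(dot(x, x))

def gram_schmidt(base):
    """Return c_1, ..., c_d as in the proof: d_k = b_k - sum_i <b_k, c_i> c_i, c_k = d_k / ||d_k||."""
    ortho = []
    for b in base:
        d = list(b)
        for c in ortho:
            coeff = dot(b, c)  # <b_k, c_i>
            d = [dj - coeff * cj for dj, cj in zip(d, c)]
        length = norm(d)
        if length < 1e-12:  # assumed tolerance for numerical linear dependence
            raise ValueError("input vectors are (numerically) linearly dependent")
        ortho.append([dj / length for dj in d])
    return ortho

# Hypothetical sample base of R^3.
base = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
ortho = gram_schmidt(base)
```

In floating point arithmetic this classical form of the procedure can lose orthogonality for nearly dependent inputs; for small, well-conditioned examples such as the one above it reproduces the construction of the proof step by step.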
The Gram-Schmidt Orthonormalization Procedure can be executed indefinitely if
we start with a countably infinite linearly independent set (see Exercise 15-20). We will
discuss in Section 20.1 how we need to adjust the notion of a base to obtain standard
representations of elements of an infinite dimensional inner product space.
Exercises
15-20. Prove that every infinite dimensional inner product space contains an infinite orthonormal system.
15.6 Abstraction III: Normed Spaces
The definition of distances in vector spaces is a two step process. First, norms can
be viewed as a generalization of absolute values. They still connect to the algebraic
properties of the surrounding space. Second, metrics (discussed in Section 15.7) need
not connect to these properties and can thus also be defined on subsets that are not
vector spaces any more.
Definition 15.38 A normed space is a pair $(X, \|\cdot\|)$ of a vector space $X$ and a function $\|\cdot\| : X \to [0, \infty)$, called the norm on $X$, such that
1. For all $x \in X$, we have $\|x\| = 0$ iff $x = 0$.
2. For all scalars $\alpha$ and all $x \in X$, we have that $\|\alpha x\| = |\alpha| \|x\|$.
3. The triangular inequality holds.
That is, for all $x, y \in X$ we have $\|x + y\| \le \|x\| + \|y\|$.
We will usually call $X$ itself a normed space. When we do so, we implicitly assume that there is a norm on $X$, which will usually be denoted $\|\cdot\|$.
As we introduce more abstract settings, the spaces we are familiar with will be
special cases of the more general spaces. For example, Theorem 15.40 below shows
that every inner product space is a normed space, too. Because the function defined in
Lemma 15.39 ultimately turns out to be a norm, we denote it like a norm right away.
Note that until we establish that it is a norm, we are not using any of the properties of
a norm.
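Lemma 15.39 Cauchy-Schwarz inequality. Let $X$ be an inner product space and for all $x \in X$ let $\|x\| := \sqrt{\langle x, x \rangle}$. Then for all $x, y \in X$ we have $|\langle x, y \rangle| \le \|x\| \|y\|$.
Proof. The inequality is trivial for $x = 0$, so we can assume $x \ne 0$. The function $f(t) := \langle tx + y, tx + y \rangle = t^2 \langle x, x \rangle + 2t \langle x, y \rangle + \langle y, y \rangle$ is a nonnegative quadratic function with absolute minimum at $t = -\frac{\langle x, y \rangle}{\langle x, x \rangle}$. But then
$$0 \le f\left( -\frac{\langle x, y \rangle}{\langle x, x \rangle} \right) = \frac{\langle x, y \rangle^2}{\langle x, x \rangle} - 2 \frac{\langle x, y \rangle^2}{\langle x, x \rangle} + \langle y, y \rangle = \langle y, y \rangle - \frac{\langle x, y \rangle^2}{\langle x, x \rangle},$$
which implies $\langle x, y \rangle^2 \le \|x\|^2 \|y\|^2$, and hence $|\langle x, y \rangle| \le \|x\| \|y\|$.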
Now we can show that the norm notation is justified.
Theorem 15.40 Let $X$ be an inner product space. Then $\|x\| := \sqrt{\langle x, x \rangle}$ defines a norm on $X$. Therefore any inner product space is also a normed space.
Proof. First, note that $\|x\| = 0$ iff $\sqrt{\langle x, x \rangle} = 0$ iff $\langle x, x \rangle = 0$ iff $x = 0$. Moreover, for all $\alpha \in \mathbb{R}$ and $x \in X$ we obtain $\|\alpha x\| = \sqrt{\langle \alpha x, \alpha x \rangle} = \sqrt{\alpha^2 \langle x, x \rangle} = |\alpha| \|x\|$. To prove the triangular inequality, let $x, y \in X$. Via the Cauchy-Schwarz inequality we infer the following.
$$\|x + y\|^2 = \langle x + y, x + y \rangle = \langle x, x \rangle + 2 \langle x, y \rangle + \langle y, y \rangle \le \|x\|^2 + 2 \|x\| \|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,$$
which finishes the proof.
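The Cauchy-Schwarz inequality and the resulting triangular inequality can be spot-checked in $\mathbb{R}^d$ with the standard inner product. This is a sketch under stated assumptions: the random test vectors, the seed, the dimension, and the rounding tolerance are all arbitrary choices for the illustration, and a numerical check of this kind is no substitute for the proofs above.

```python
import math
import random

def inner(x, y):
    """Standard inner product on R^d."""
    return math.fsum(a * b for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product, ||x|| = sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

def check_inequalities(trials=1000, d=5, tol=1e-12):
    """Test |<x,y>| <= ||x|| ||y|| and ||x+y|| <= ||x|| + ||y|| on random vectors."""
    rng = random.Random(0)  # fixed seed so that the run is reproducible
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(d)]
        y = [rng.uniform(-1, 1) for _ in range(d)]
        s = [a + b for a, b in zip(x, y)]
        if abs(inner(x, y)) > norm(x) * norm(y) + tol:
            return False
        if norm(s) > norm(x) + norm(y) + tol:
            return False
    return True

all_hold = check_inequalities()

# Equality in Cauchy-Schwarz occurs when y is a multiple of x.
x = [1.0, 2.0, 2.0]
y = [-3.0 * a for a in x]
equality_gap = norm(x) * norm(y) - abs(inner(x, y))
```

The last lines illustrate that the inequality is sharp: for linearly dependent vectors the gap between the two sides vanishes.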
Definition 15.41 On $\mathbb{R}^d$, the norm $\|x\|_2 := \sqrt{\sum_{i=1}^{d} x_i^2}$ that is induced by the inner product is called the Euclidean norm.
Of course, a more general definition is only useful if it introduces new and interesting objects. The following examples are not inner product spaces.
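Example 15.42 The function $\left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty := \sup \{ |x_n| : n \in \mathbb{N} \}$ is a norm on $l^\infty$.
It is clear that $\left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty \ge 0$ for all sequences $\{x_n\}_{n=1}^{\infty} \in l^\infty$. Moreover, we have $\left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty = 0$ iff $\sup \{ |x_n| : n \in \mathbb{N} \} = 0$, which is the case iff all $x_n$ are zero, which is the case iff $\{x_n\}_{n=1}^{\infty} = \{0\}_{n=1}^{\infty} \in l^\infty$.
For $\alpha \in \mathbb{R}$ and $\{x_n\}_{n=1}^{\infty} \in l^\infty$, note that
$$\left\| \{\alpha x_n\}_{n=1}^{\infty} \right\|_\infty = \sup \{ |\alpha| |x_n| : n \in \mathbb{N} \} = |\alpha| \left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty.$$
Finally, for $\{x_n\}_{n=1}^{\infty}, \{y_n\}_{n=1}^{\infty} \in l^\infty$ note that
$$\left\| \{x_n\}_{n=1}^{\infty} + \{y_n\}_{n=1}^{\infty} \right\|_\infty = \left\| \{x_n + y_n\}_{n=1}^{\infty} \right\|_\infty = \sup \{ |x_n + y_n| : n \in \mathbb{N} \} \le \sup \{ |x_n| + |y_n| : n \in \mathbb{N} \} \le \sup \{ |x_n| : n \in \mathbb{N} \} + \sup \{ |y_n| : n \in \mathbb{N} \} = \left\| \{x_n\}_{n=1}^{\infty} \right\|_\infty + \left\| \{y_n\}_{n=1}^{\infty} \right\|_\infty,$$
where the last inequality is true because for each natural number $m \in \mathbb{N}$ we have that $|x_m| \le \sup \{ |x_n| : n \in \mathbb{N} \}$ and $|y_m| \le \sup \{ |y_n| : n \in \mathbb{N} \}$.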
Example 15.43 Let $D$ be a set. The function $\|f\|_\infty := \sup \{ |f(x)| : x \in D \}$ defines a norm on $B(D, \mathbb{R})$, the space of bounded functions on $D$. The norm $\|\cdot\|_\infty$ is also called the uniform norm.
This is proved in Exercise 15-21.
Example 15.43 gives access to two more commonly used spaces.
Example 15.44 The function $\|(x_1, \ldots, x_d)\|_\infty := \max \{ |x_j| : j = 1, \ldots, d \}$ defines a norm on $\mathbb{R}^d = B(\{1, \ldots, d\}, \mathbb{R})$.
Example 15.45 Because linear subspaces of normed spaces are normed spaces, too, the space $(C^0[a, b], \|\cdot\|_\infty)$ is a normed space.
We can now turn our attention to the $\mathcal{L}^p$ spaces once more.
Definition 15.46 Let $(M, \Sigma, \mu)$ be a measure space and let $p \ge 1$. Then for all $f \in \mathcal{L}^p(M, \Sigma, \mu)$ we define $\|f\|_p := \left( \int_M |f|^p \, d\mu \right)^{\frac{1}{p}}$.
Although $\|\cdot\|_p$ is not quite a norm, we have good reason to use norm notation. Part 1 of Theorem 15.50 will identify the only part of the definition of a norm that is not satisfied by $\|\cdot\|_p$ and we will resolve this problem in Section 15.8. The biggest immediate challenge is to prove the triangular inequality for $\|\cdot\|_p$, which is done in Theorem 15.49. Hölder's inequality (Theorem 15.48) is a lemma leading up to Theorem 15.49, but it is also of independent interest. In $\mathcal{L}^p$, Hölder's inequality often serves as a substitute for the Cauchy-Schwarz inequality, which is only valid in inner product spaces.
Lemma 15.47 Young's inequality. Let $1 < p, q < \infty$ be such that $\frac{1}{p} + \frac{1}{q} = 1$ and let $x, y \in [0, \infty)$. Then $xy \le \frac{x^p}{p} + \frac{y^q}{q}$.
Proof. The inequality is trivial for $x = 0$ or $y = 0$, so we can assume both numbers are positive. We first perform some substitutions that simplify the inequality. With $u = x^p$ and $v = y^q$ the inequality is equivalent to the inequality $u^{\frac{1}{p}} v^{\frac{1}{q}} \le \frac{u}{p} + \frac{v}{q}$ for all $u, v \in (0, \infty)$. With $t := \frac{u}{v}$ the inequality is equivalent to $t^{\frac{1}{p}} \le \frac{t}{p} + \frac{1}{q}$, where we multiply or divide by $v$ to go back and forth. But it is easy to verify with elementary calculus (see Exercise 15-22) that the function $f(t) := \frac{t}{p} + \frac{1}{q} - t^{\frac{1}{p}}$ has an absolute minimum value of 0 on $(0, \infty)$.
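Young's inequality and the auxiliary function $f(t) = \frac{t}{p} + \frac{1}{q} - t^{1/p}$ from its proof can be explored numerically. The sketch below is an illustration under assumptions: the grid of test points, the sample exponents, and the names `young_gap` and `f` are chosen here and are not part of the text.

```python
def conjugate(p):
    """Conjugate exponent q with 1/p + 1/q = 1, for p > 1."""
    return p / (p - 1)

def young_gap(x, y, p):
    """Right side minus left side of Young's inequality: x^p/p + y^q/q - xy."""
    q = conjugate(p)
    return x ** p / p + y ** q / q - x * y

def f(t, p):
    """Auxiliary function from the proof: t/p + 1/q - t^(1/p)."""
    q = conjugate(p)
    return t / p + 1 / q - t ** (1 / p)

grid = [0.1 * k for k in range(0, 51)]   # sample points in [0, 5]
exponents = (1.5, 2.0, 3.0)              # sample values of p

# Young's inequality predicts a nonnegative gap at every grid point.
min_gap = min(young_gap(x, y, p) for p in exponents for x in grid for y in grid)

# The auxiliary function should be nonnegative on (0, infinity) and vanish at t = 1.
min_f = min(f(t, p) for p in exponents for t in grid if t > 0)
f_at_one = [f(1.0, p) for p in exponents]
```

Up to rounding, both minima are zero: the gap vanishes exactly when $x^p = y^q$, which corresponds to $t = 1$ in the substitution used in the proof.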
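Theorem 15.48 Hölder's inequality for integrals. Let $(M, \Sigma, \mu)$ be a measure space, let $1 < p, q < \infty$ satisfy the equality $\frac{1}{p} + \frac{1}{q} = 1$ and let $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g \in \mathcal{L}^q(M, \Sigma, \mu)$. Then $fg \in \mathcal{L}^1(M, \Sigma, \mu)$ and $\|fg\|_1 \le \|f\|_p \|g\|_q$.
Proof. If $\|f\|_p = 0$ or $\|g\|_q = 0$, then $fg = 0$ almost everywhere and the inequality is trivial, so we can assume both are positive. The computation below shows first of all that $\int_M |fg| \, d\mu = \|fg\|_1$ is finite, which establishes that $fg \in \mathcal{L}^1(M, \Sigma, \mu)$. Moreover, the claimed inequality can then be obtained by multiplying by $\|f\|_p \|g\|_q$. By Young's inequality,
$$\int_M \frac{|f|}{\|f\|_p} \frac{|g|}{\|g\|_q} \, d\mu \le \int_M \frac{1}{p} \frac{|f|^p}{\|f\|_p^p} + \frac{1}{q} \frac{|g|^q}{\|g\|_q^q} \, d\mu = \frac{1}{p} + \frac{1}{q} = 1.$$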
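Theorem 15.49 Minkowski's inequality for integrals. Let $(M, \Sigma, \mu)$ be a measure space and let $p \ge 1$. Then for all functions $f, g \in \mathcal{L}^p(M, \Sigma, \mu)$ we have the inequality $\|f + g\|_p \le \|f\|_p + \|g\|_p$.
Proof. For $p = 1$, the inequality is an easy consequence of the triangular inequality for absolute values. So for the remainder we can assume $1 < p < \infty$. Choose $q$ such that $\frac{1}{p} + \frac{1}{q} = 1$. Then $p + q = pq$, and hence $(p - 1)q = pq - q = p$. Thus, because we already know from Example 15.12 that $|f + g|^p$ is integrable, we conclude $|f + g|^{p-1} \in \mathcal{L}^q(M, \Sigma, \mu)$. Now via Hölder's inequality we obtain
$$\|f + g\|_p^p = \int_M |f + g|^p \, d\mu \le \int_M |f| |f + g|^{p-1} \, d\mu + \int_M |g| |f + g|^{p-1} \, d\mu \le \left( \|f\|_p + \|g\|_p \right) \left( \int_M |f + g|^{(p-1)q} \, d\mu \right)^{\frac{1}{q}} = \left( \|f\|_p + \|g\|_p \right) \|f + g\|_p^{\frac{p}{q}}.$$
For $\|f + g\|_p > 0$, dividing by $\|f + g\|_p^{\frac{p}{q}}$ and using $p - \frac{p}{q} = 1$ gives $\|f + g\|_p \le \|f\|_p + \|g\|_p$, which was to be proved.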
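We can now determine how close the $\|\cdot\|_p$ are to being norms. The only difference between the properties established in Theorem 15.50 and the properties of a norm is that $\|f\|_p = 0$ only implies that $f = 0$ almost everywhere, not everywhere. This minor nuisance will be remedied in Section 15.8.
Theorem 15.50 Let $(M, \Sigma, \mu)$ be a measure space and let $p \ge 1$. Then the following hold.
0. For all $f \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|f\|_p \ge 0$.
1. For all $f \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|f\|_p = 0$ iff $f = 0$ almost everywhere.
2. For all $\alpha \in \mathbb{R}$ and all $f \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|\alpha f\|_p = |\alpha| \|f\|_p$.
3. For all $f, g \in \mathcal{L}^p(M, \Sigma, \mu)$, we have $\|f + g\|_p \le \|f\|_p + \|g\|_p$.
Proof. Part 0 is trivial and part 3 is Minkowski's inequality. The remaining parts are left as Exercise 15-23.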
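The functions $\|\cdot\|_p$ can be defined on $l^p$ also and on these spaces they do define a norm. This is the key advantage that $l^p$ has over $\mathcal{L}^p$. It actually is a normed space.
Definition 15.51 For $1 \le p < \infty$ and $\{x_n\}_{n=1}^{\infty} \in l^p$ we define the $p$-norm of $\{x_n\}_{n=1}^{\infty}$ to be $\left\| \{x_n\}_{n=1}^{\infty} \right\|_p := \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{\frac{1}{p}}$.
Theorem 15.52 For $1 \le p < \infty$ the space $(l^p, \|\cdot\|_p)$ is a normed space.
Proof. First note that with $\gamma_{\mathbb{N}}$ denoting counting measure on $\mathbb{N}$ the space $l^p$ is the space $\mathcal{L}^p(\mathbb{N}, \mathcal{P}(\mathbb{N}), \gamma_{\mathbb{N}})$ and that the function $\|\cdot\|_p : l^p \to [0, \infty)$ is exactly the function $\|\cdot\|_p : \mathcal{L}^p(\mathbb{N}, \mathcal{P}(\mathbb{N}), \gamma_{\mathbb{N}}) \to [0, \infty)$ in Theorem 15.50. By Exercise 14-3a the only set of $\gamma_{\mathbb{N}}$-measure zero is the empty set. But this means that $\left\| \{x_n\}_{n=1}^{\infty} \right\|_p = 0$ iff $\{x_n\}_{n=1}^{\infty} = 0$ $\gamma_{\mathbb{N}}$-a.e., which is the case iff $x_n = 0$ for all $n \in \mathbb{N}$. Thus Theorem 15.50 proves that $(l^p, \|\cdot\|_p)$ is a normed space.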
Note how the proof of Theorem 15.52 reveals that the $l^p$-spaces are special cases of the $\mathcal{L}^p$-spaces. Another consequence of Theorem 15.52 is that $\mathbb{R}^d$ equipped with the $l^p$ norm $\|(x_1, \ldots, x_d)\|_p := \left( \sum_{i=1}^{d} |x_i|^p \right)^{\frac{1}{p}}$ is a normed space (Exercise 15-24). That is, one space such as $\mathbb{R}^d$ can be equipped with several norms. For d-dimensional space, this issue will be addressed in Section 16.6.
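To see several norms on the same space side by side, the following sketch evaluates the $p$-norm on $\mathbb{R}^3$ for a few exponents and checks Hölder's and Minkowski's inequalities for finite sums on one pair of vectors. The vectors, the exponent $p = 3$, and the function names are assumptions made for this illustration; the finite-sum forms of the inequalities are the special case of counting measure on $\{1, \ldots, d\}$.

```python
import math

def p_norm(x, p):
    """The p-norm (sum |x_i|^p)^(1/p) on R^d for 1 <= p < infinity."""
    return math.fsum(abs(a) ** p for a in x) ** (1.0 / p)

def sup_norm(x):
    """The maximum norm on R^d."""
    return max(abs(a) for a in x)

# Hypothetical sample vectors.
x = [3.0, -4.0, 0.0]
y = [1.0, 2.0, -2.0]

# One vector, several norms.
norm_1 = p_norm(x, 1)      # |3| + |-4| + |0|
norm_2 = p_norm(x, 2)      # Euclidean norm
norm_inf = sup_norm(x)     # largest absolute entry

# Hoelder: sum |x_i y_i| <= ||x||_p ||y||_q with 1/p + 1/q = 1.
p = 3.0
q = p / (p - 1)
holder_lhs = math.fsum(abs(a * b) for a, b in zip(x, y))
holder_rhs = p_norm(x, p) * p_norm(y, q)

# Minkowski: ||x + y||_p <= ||x||_p + ||y||_p.
minkowski_lhs = p_norm([a + b for a, b in zip(x, y)], p)
minkowski_rhs = p_norm(x, p) + p_norm(y, p)
```

For this vector the three norms take the values 7, 5 and 4, which shows concretely that different norms on $\mathbb{R}^d$ assign different lengths to the same vector.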
Normed spaces will be investigated in more detail in Chapter 17, with groundwork
laid in this chapter and in Chapter 16.
Exercises
15-21. Finishing Example 15.43. Let $B(D, \mathbb{R})$ be the set of bounded functions on the set $D$.
(a) Prove that with pointwise addition and scalar multiplication $B(D, \mathbb{R})$ is a vector space.
(b) Prove that the function $\|f\|_\infty := \sup \{ |f(x)| : x \in D \}$ defines a norm on $B(D, \mathbb{R})$.
15-22. Finish the proof of Young's inequality (Lemma 15.47) by proving that $f(t) := \frac{t}{p} + \frac{1}{q} - t^{\frac{1}{p}}$ has an absolute minimum value of 0 on $(0, \infty)$.
15-23. Finish the proof of Theorem 15.50. That is,
(a) Prove part 1 of Theorem 15.50, and
(b) Prove part 2 of Theorem 15.50.
15-24. For $p \ge 1$ define $\|\cdot\|_p : \mathbb{R}^d \to [0, \infty)$ by $\|(x_1, \ldots, x_d)\|_p := \left( \sum_{i=1}^{d} |x_i|^p \right)^{\frac{1}{p}}$ and prove that $\|\cdot\|_p$ is a norm on $\mathbb{R}^d$.
15-25. Let $a < b$ and let $BV_0[a, b]$ denote the vector space of all functions of bounded variation on $[a, b]$ so that $f(a) = 0$. Prove that
$$\|f\|_{BV} := \sup \left\{ \sum_{i=1}^{n} |f(a_i) - f(a_{i-1})| : a = a_0 < a_1 < \cdots < a_n = b \right\}$$
defines a norm on $BV_0[a, b]$.
15-26. "Concrete" versions of the inequalities in this chapter.
(a) Let $x_1, \ldots, x_d, y_1, \ldots, y_d \in \mathbb{R}$ and let $p, q \in (1, \infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$. Prove the inequalities [the displayed inequalities are not recoverable from the source].
(b) Let $p \in [1, \infty)$. Prove that $\|f\|_p := \left( \int_a^b |f(x)|^p \, dx \right)^{\frac{1}{p}}$ defines a norm on $C^0[a, b]$.
(c) State the Cauchy-Schwarz, Hölder and Minkowski inequalities in integral notation for the norms from part 15-26b.
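15-27. Young's inequality revisited. Let $1 < p, q < \infty$ be such that $\frac{1}{p} + \frac{1}{q} = 1$, let $\varepsilon > 0$ and let $x, y \in [0, \infty)$. Prove that $xy \le \varepsilon \frac{x^p}{p} + \varepsilon^{-\frac{q}{p}} \frac{y^q}{q}$.
Hint. Absorb the $\varepsilon$ into $x$ and $y$.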
15-28. Give a direct proof of Theorem 15.52. The first two parts of the definition of a norm are straightforward. For the triangular inequality, proceed as follows.
(a) First prove Hölder's inequality for series. Let $1 < p, q < \infty$ be such that $\frac{1}{p} + \frac{1}{q} = 1$ and let $\{x_n\}_{n=1}^{\infty} \in l^p$ and $\{y_n\}_{n=1}^{\infty} \in l^q$. Prove that then $\{x_n y_n\}_{n=1}^{\infty} \in l^1$ and we have the inequality $\left\| \{x_n y_n\}_{n=1}^{\infty} \right\|_1 \le \left\| \{x_n\}_{n=1}^{\infty} \right\|_p \left\| \{y_n\}_{n=1}^{\infty} \right\|_q$.
(b) Now prove Minkowski's inequality for series. Prove that for all $\{x_n\}_{n=1}^{\infty}, \{y_n\}_{n=1}^{\infty} \in l^p$ we have $\left\| \{x_n\}_{n=1}^{\infty} + \{y_n\}_{n=1}^{\infty} \right\|_p \le \left\| \{x_n\}_{n=1}^{\infty} \right\|_p + \left\| \{y_n\}_{n=1}^{\infty} \right\|_p$.
Hint. Use the proofs of the corresponding inequalities for integrals as guidance.
15-29. Let $X$ be a normed space. Prove that $\left\| \sum_{i=1}^{n} x_i \right\| \le \sum_{i=1}^{n} \|x_i\|$ for any $n$ elements $x_1, \ldots, x_n \in X$.
15-30. When is a normed space an inner product space?
(a) Parallelogram law. Let $X$ be an inner product space. Prove that for all $x, y \in X$ we have $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$.
(b) Polarization identity. Let $X$ be an inner product space. Prove that for all $x, y \in X$ we have $\|x + y\|^2 - \|x - y\|^2 = 4 \langle x, y \rangle$.
(c) Let $X$ be a normed space in which the parallelogram law holds. Prove that the equation $\langle x, y \rangle := \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right)$ defines an inner product on $X$ with $\|x\| = \sqrt{\langle x, x \rangle}$ for all $x \in X$.
(d) State an equivalent formulation for the statement "$X$ is a normed space and the norm of $X$ is induced by an inner product."
(e) Give an example of functions $f, g \in C^0[a, b]$ that do not satisfy the parallelogram law for $\|\cdot\|_\infty$, thus proving that $(C^0[a, b], \|\cdot\|_\infty)$ is not an inner product space.
15-31. Let $(M, \Sigma, \mu)$ be a measure space with $\mu(M) < \infty$. Prove that if $1 \le p < q < \infty$, then $\mathcal{L}^p(M, \Sigma, \mu) \supseteq \mathcal{L}^q(M, \Sigma, \mu)$ and for all $f \in \mathcal{L}^q(M, \Sigma, \mu)$ we have $\|f\|_p \le \|f\|_q \, \mu(M)^{\frac{1}{p} - \frac{1}{q}}$.
Hint. Hölder's inequality.
15-32. Jensen's inequality.
(a) Prove that for all $x \in l^p$ and all $1 \le p < q < \infty$ we have $\|x\|_p \ge \|x\|_q \ge \|x\|_\infty$.
Hint. The second inequality is simple. For the first inequality consider $\frac{x}{\|x\|_\infty}$.
(b) Prove that Jensen's inequality can fail for continuous functions. That is, find $p < q$, $a < b$ and a continuous function $g : [a, b] \to \mathbb{R}$ so that $\|g\|_p < \|g\|_q$ and $\|g\|_p < \|g\|_\infty$.
Hint. A straight line rising from 0 to 1 on a sufficiently short interval will do.
15-33. The norm $\|\cdot\|_\infty$ as limit of the $\|\cdot\|_p$-norms.
(a) Prove that for all $x \in \mathbb{R}^d$ we have $\lim_{p \to \infty} \|x\|_p = \|x\|_\infty$.
(b) Prove that for all $x \in l^1$ we have $\lim_{p \to \infty} \|x\|_p = \|x\|_\infty$.
(c) Let $f : [a, b] \to \mathbb{R}$ be continuous. Prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $p \ge 1$ we have $(\|f\|_\infty - \varepsilon) \, \delta^{\frac{1}{p}} \le \left( \int_a^b |f(x)|^p \, dx \right)^{\frac{1}{p}} \le \|f\|_\infty (b - a)^{\frac{1}{p}}$.
(d) Prove that for all $f \in C^0[a, b]$ we have $\lim_{p \to \infty} \|f\|_p = \|f\|_\infty$.
15.7 Abstraction IV: Metric Spaces
Norms are used to measure distances in a vector space. Natural phenomena are often
modeled in bounded subsets of d-dimensional space, which means sums and constant
multiples do not necessarily stay in the subset. When there is no linear structure, distances are measured with metrics. The properties of a metric are inspired by the real
life properties of distances. Distances are nonnegative, distinct objects have a nonzero
distance from each other, the distance is independent of whether we go from point A
to point B or vice versa, and detours through a third point cannot provide a shortcut.
Definition 15.53 A metric space is a pair $(X, d)$ of a set $X$ (without additional properties; in particular, $X$ need not be a vector space) and a function $d : X \times X \to [0, \infty)$, called the metric on $X$, with the following properties.
1. For all $x, y \in X$, we have that $d(x, y) = 0$ iff $x = y$.
2. For all $x, y \in X$, we have $d(x, y) = d(y, x)$.
3. For all $x, y, z \in X$, we have $d(x, z) \le d(x, y) + d(y, z)$.
We will usually call $X$ itself a metric space. When we do so, we implicitly assume that there is a metric on $X$, which will usually be denoted $d$.
Normed spaces are metric spaces, too. So once more we have generalized a known
concept.
Proposition 15.54 Let $X$ be a normed space. Then $d(x, y) := \|x - y\|$ defines a metric on $X$.
Proof. Clearly, $d(x, y) = \|x - y\| \ge 0$ for all $x, y \in X$ and $d(x, y) = 0$ iff $\|x - y\| = 0$ iff $x - y = 0$ iff $x = y$. Also $d(x, y) = \|x - y\| = \|y - x\| = d(y, x)$. Finally, $d(x, z) = \|x - z\| \le \|x - y\| + \|y - z\| = d(x, y) + d(y, z)$.
With metric spaces we are no longer tied to the linear structure of a vector space.
In fact, any subset of a metric space is again a metric space.
Proposition 15.55 Let $(X, d)$ be a metric space and let $S \subseteq X$ be any subset of $X$. Let $d_S := d|_{S \times S}$ be the restriction of the metric $d$ to the subset $S$. Then $(S, d_S)$ is a metric space.
Proof. Clearly, any property of $d$ that holds for all elements of $X$ will also hold for all elements of $S$.
Proposition 15.55 provides a wide range of examples of metric spaces. For example, we can now consider intervals on the real line and subsets of $\mathbb{R}^d$ as metric spaces.
Definition 15.56 Let $X$ be a metric space. Then we will automatically consider any subset $S \subseteq X$ to be a metric space, also called a metric subspace, carrying the metric $d_S$ of Proposition 15.55. Said metric will usually also be denoted $d$. It is sometimes called the induced metric or the relative metric.
Example 15.57 Not every metric space is a metric subspace of a normed space. On $\mathbb{R}^2$, for $x \ne y$ let $d(x, y) := \sqrt{x_1^2 + x_2^2} + \sqrt{y_1^2 + y_2^2}$, which is the sum of the lengths of the straight line segments from $x$ to 0 and from 0 to $y$, and for $x = y$ let $d(x, y) := 0$. Then $d$ is a metric on $\mathbb{R}^2$ that is not induced by a norm. This metric models distances in a situation in which all travel must go through a central hub.
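The hub metric of Example 15.57 is easy to experiment with. The sketch below checks symmetry and the triangular inequality on a small sample of points and then shows that the metric is not translation invariant. Every metric induced by a norm satisfies $d(x + v, y + v) = \|x - y\| = d(x, y)$, so the failure of translation invariance is one way to see that no norm induces this metric. The function name `hub_metric`, the sample points, and the tolerance are assumptions for this illustration.

```python
import itertools
import math

def hub_metric(x, y):
    """Distance in R^2 when all travel between distinct points passes through the origin."""
    if x == y:
        return 0.0
    return math.hypot(x[0], x[1]) + math.hypot(y[0], y[1])

# Hypothetical sample points, including the hub itself.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (-1.0, -1.0)]

symmetric_ok = all(
    hub_metric(x, y) == hub_metric(y, x)
    for x, y in itertools.product(points, repeat=2)
)
triangle_ok = all(
    hub_metric(x, z) <= hub_metric(x, y) + hub_metric(y, z) + 1e-12
    for x, y, z in itertools.product(points, repeat=3)
)

# Translating both points by v = (1, 0) changes the distance.
d_before = hub_metric((1.0, 0.0), (2.0, 0.0))   # 1 + 2
d_after = hub_metric((2.0, 0.0), (3.0, 0.0))    # 2 + 3
```

The check on finitely many points only illustrates the metric properties; the general argument is the one indicated in the example.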
For more examples of metric spaces that are not subspaces of normed spaces, consider Exercise 15-34. We conclude this short section by proving the reverse triangular
inequality for metric spaces. We could have proved it in normed spaces first, but with
this approach we obtain it for normed spaces as a corollary.
[Figure 32: diagram of the hierarchy of spaces; labels recoverable from the source include $L^p$, normed spaces, and topological spaces (not considered in this text).]
Figure 32: A hierarchy of structures for analysis. The $L^p$ spaces will be introduced in the next section.
Theorem 15.58 The reverse triangular inequality for metric spaces. Let $X$ be a
metric space. Then for all $x,y,z \in X$ we have $|d(x,y) - d(y,z)| \le d(x,z)$.
Proof. Let $x,y,z \in X$ and without loss of generality assume $d(y,z) \le d(x,y)$.
Then the inequality $d(x,y) \le d(x,z) + d(z,y)$ implies the reverse triangular inequality
$|d(x,y) - d(y,z)| = d(x,y) - d(y,z) \le d(x,z)$.
Corollary 15.59 The reverse triangular inequality for normed spaces. Let $X$ be a
normed space. Then for all $x,y \in X$ we have $\bigl| \|x\| - \|y\| \bigr| \le \|x-y\|$.
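The step from Theorem 15.58 to Corollary 15.59 is not spelled out, so here is one way to write it. This is a sketch: the letters $u, v$ are introduced only to avoid a clash with the $x, y, z$ of the theorem. Apply the theorem to the metric $d(a,b) = \|a-b\|$ with $x := u$, $y := 0$ and $z := v$, and use $\|-v\| = \|v\|$.

```latex
\begin{align*}
\bigl|\, \|u\| - \|v\| \,\bigr|
  &= \bigl|\, \|u - 0\| - \|0 - v\| \,\bigr|
   = \bigl| d(u,0) - d(0,v) \bigr| \\
  &\le d(u,v)
   = \|u - v\| .
\end{align*}
```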
The only spaces more abstract than metric spaces that occur frequently in mathematics are topological spaces. In analysis, spaces usually carry a metric. Therefore
we will not present topological spaces in this text. Metric spaces will be investigated
in more detail in Chapter 16. Figure 32 shows the hierarchy of structures that arise in
analysis.
To avoid notational confusion when working in normed spaces, we will use norm
notation for the metrics on subsets of normed spaces. This is sensible, because all
metrics that we will consider on subsets of normed spaces are induced by a norm.
Exercises
15-34. Examples of metric spaces.
(a) Let $X$ be a set and for $x,y \in X$ let $d(x,y) := \begin{cases} 0; & \text{if } x = y, \\ 1; & \text{if } x \ne y. \end{cases}$ Prove that $d$ is a metric on $X$.
Note. $d$ is called the discrete metric on $X$.
(b) Prove that $d(x,y) := \frac{|x-y|}{1+|x-y|}$ defines a metric on $\mathbb{R}$.
Hint. For the triangular inequality, expand with $\frac{1}{|x-z|}$.
(c) Consider the surface of a sphere in $\mathbb{R}^d$ with the usual distance function. Let the distance
between two points be the length of the shortest path (on the sphere) between these two points.
Explain why this distance function defines a metric between the points on the sphere and why
this metric is not the metric induced by the usual distance function.
15-35. Explain why the metric induced on $\mathbb{R}^2$ by $\|\cdot\|_1$ is called the taxicab metric.
15-36. Let $1 \le p \le \infty$. In $\mathbb{R}^d$ equipped with the metric induced by $\|\cdot\|_p$ compute the distance from the
origin to the point $(1, \ldots, 1)$. In which metric does the cube $[0,1]^d$ have the longest diagonal? In
which metric is the diagonal shortest?
15-37. The following, purportedly true, story illustrates the importance of having examples for an abstract
notion. Warning. The following notion is absolutely useless. We simply debunk it here. The story
itself may well be a "mathematical urban legend." A mathematician once spent a lot of time proving
abstract properties about so-called "anti-metric" spaces. An "anti-metric" $d : X \times X \to [0,\infty)$
is a function so that for all $x,y \in X$ we have $d(x,y) = 0$ iff $x = y$, for all $x,y \in X$ we have that
$d(x,y) = d(y,x)$ and for all $x,y,z \in X$ we have that $d(x,z) \ge d(x,y) + d(y,z)$. So, all that has
changed from metric spaces is that the triangular inequality is reversed. Prove that an anti-metric
space can have at most one point.
The mathematician could have simplified all his proofs by using that these spaces have at most one
point. Spaces with at most one point have lots of properties, but they are not very interesting. So
if you try to "be wise, generalize," make sure that your generalization/modification still has models
(examples) that are useful.
15.8 $L^p$ Spaces
Part 1 of Theorem 15.50 shows that $\mathcal{L}^p$ spaces fail to be metric spaces because it is
possible for distinct objects to have distance zero from each other. When all other
properties of a metric are given, we speak of a semimetric space.
Definition 15.60 A semimetric space is a pair $(X^s, d^s)$ of a set $X^s$ and a function
$d^s : X^s \times X^s \to [0,\infty)$ with the following properties.
1. For all $x \in X^s$, we have $d^s(x,x) = 0$.
2. For all $x,y \in X^s$, we have $d^s(x,y) = d^s(y,x)$.
3. For all $x,y,z \in X^s$, we have $d^s(x,z) \le d^s(x,y) + d^s(y,z)$.
It would be cumbersome to develop a theory of semimetric spaces parallel to that of
metric spaces. It is also more appropriate to work with metric spaces, because practical
observation tells us that two distinct objects cannot occupy the same space at the same
time. To overcome the minor deficiency of a semimetric, objects that have distance
zero from each other are combined to become single points. This process produces a
metric space in natural fashion.
Theorem 15.61 Let $(X^s, d^s)$ be a semimetric space. Then $\sim\ \subseteq X^s \times X^s$ defined by
$x \sim y$ iff $d^s(x,y) = 0$ is an equivalence relation (see Definition C.5 in Appendix C.2).
If we denote the equivalence classes of $\sim$ with $[x]$, then the set $X := \{[x] : x \in X^s\}$
equipped with $d([x],[y]) := d^s(x,y)$ is a metric space.
Proof. It is clear that the relation $\sim$ is reflexive, symmetric and transitive. The
function $d$ is defined for equivalence classes, but it is defined in terms of representatives
of each class. Therefore we must prove that $d$ is well defined. That is, we must show
that the value of $d$ does not depend on which representatives are used. Let $[x],[y] \in X$,
let $x_1, x_2 \in [x]$ and let $y_1, y_2 \in [y]$. Then
$$d^s(x_1,y_1) \le d^s(x_1,x_2) + d^s(x_2,y_2) + d^s(y_2,y_1) = d^s(x_2,y_2)$$
and we prove the reversed inequality similarly. Hence, $d^s(x_1,y_1) = d^s(x_2,y_2)$ and
the definition of $d$ is independent of the representatives chosen from each equivalence
class.
Now $d([x],[y]) = 0$ implies $d^s(x,y) = 0$, that is, $x \sim y$ and thus $[x] = [y]$
(Exercise 15-38a). Conversely, if $[x] = [y]$, then $d([x],[y]) = d^s(x,x) = 0$. Hence,
$d([x],[y]) = 0$ is equivalent to $[x] = [y]$ and the first condition for being a metric is
satisfied. Symmetry and the triangular inequality for $d$ are easily verified (Exercises
15-38b and 15-38c), so $d$ is a metric.
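The construction in Theorem 15.61 can be carried out by hand on a small finite example. The sketch below is illustrative only: the grid `X_s` and the semimetric `d_s`, which ignores the second coordinate, are assumptions chosen for the demonstration and do not come from the text. The code forms the equivalence classes, checks that the value of $d$ does not depend on the representatives, and then checks the metric axioms on the quotient.

```python
from itertools import product

# Assumed toy semimetric space: a 3 x 3 grid of integer points, where the
# semimetric only looks at the first coordinate.
X_s = [(a, b) for a in range(3) for b in range(3)]

def d_s(x, y):
    return abs(x[0] - y[0])

def equivalence_class(x):
    """[x] = all points at semimetric distance zero from x."""
    return frozenset(y for y in X_s if d_s(x, y) == 0)

X = {equivalence_class(x) for x in X_s}

def d(cx, cy):
    """d([x],[y]) := d_s(x,y); we check that all representatives agree."""
    values = {d_s(x, y) for x, y in product(cx, cy)}
    assert len(values) == 1, "d would not be well defined"
    return values.pop()

for cx, cy, cz in product(X, repeat=3):
    assert (d(cx, cy) == 0) == (cx == cy)       # distinct classes are separated
    assert d(cx, cy) == d(cy, cx)               # symmetry
    assert d(cx, cz) <= d(cx, cy) + d(cy, cz)   # triangular inequality

print(len(X_s), len(X))  # 9 points collapse to 3 classes
```

Points with the same first coordinate have distance zero from each other, so they are merged into a single point of the quotient space.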
Theorem 15.61 allows us to define metric spaces of "$p$-integrable functions." Formally these spaces consist of classes of $p$-integrable functions, but the distinction blurs
at times. Metric considerations are taken care of in $L^p(M,\Sigma,\mu)$, while integral equations etc. are proved in $\mathcal{L}^p(M,\Sigma,\mu)$.
Definition 15.62 Let $(M,\Sigma,\mu)$ be a measure space. For $1 \le p < \infty$, we denote by
$L^p(M,\Sigma,\mu)$ the metric space obtained from $\mathcal{L}^p(M,\Sigma,\mu)$ via Theorem 15.61.
Exercises 15-39 and 15-40 show that the $L^p$ spaces actually are normed spaces and
that the $L^2$ spaces are inner product spaces.
We conclude this section by defining a space similar to $l^\infty$ on measure spaces.
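Definition 15.63 Let $(M,\Sigma,\mu)$ be a measure space. We define
$$\mathcal{L}^\infty(M,\Sigma,\mu) := \bigl\{ f \in F(M,\mathbb{R}) : (\exists B \in \mathbb{R} : |f(x)| \le B \ \mu\text{-a.e.}) \bigr\}.$$
For $f \in \mathcal{L}^\infty(M,\Sigma,\mu)$, we define $\|f\|_\infty := \inf \{ B \in \mathbb{R} : |f(x)| \le B \ \mu\text{-a.e.} \}$.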
Note that sometimes the $\mu$-a.e. in the definition of $\mathcal{L}^\infty$ is replaced with "$\mu$-locally
a.e." Exercise 15-45 shows that for $\sigma$-finite measure spaces null sets and locally null
sets are the same. We consider $\sigma$-finite measure spaces in this text, so the author chose
the simpler definition.
Proposition 15.64 Let $(M,\Sigma,\mu)$ be a measure space. Then $\mathcal{L}^\infty(M,\Sigma,\mu)$ equipped
with $d_\infty(f,g) := \|f-g\|_\infty$ is a semimetric space.
Proof. The reader will prove a little more in Exercise 15-41.
Definition 15.65 Let $(M,\Sigma,\mu)$ be a measure space. We denote by $L^\infty(M,\Sigma,\mu)$ the
metric space obtained from $\mathcal{L}^\infty(M,\Sigma,\mu)$ via Theorem 15.61.
Note that by Exercises 15-39 and 15-41 the space $L^\infty(M,\Sigma,\mu)$ is actually a
normed space.
Notation 15.66 If $M = D$ is a Lebesgue measurable subset of $\mathbb{R}^d$, $\Sigma$ is the $\sigma$-algebra
of Lebesgue measurable subsets of $D$ and $\mu$ is Lebesgue measure restricted to $\Sigma$ we
will also write $L^p(D)$ for $L^p(M,\Sigma,\mu)$ and if $D$ is an interval we also write $L^p[a,b]$
for $L^p([a,b])$, and so on.
Notation 15.67 Certain spaces are usually assumed to carry a certain norm or metric.
The space $C^0[a,b]$ is usually assumed to be normed by $\|\cdot\|_\infty$. The spaces $L^p$ are
usually assumed to be normed by $\|\cdot\|_p$. The space $L^\infty$ is usually assumed to be
normed by $\|\cdot\|_\infty$. Unless otherwise stated, we will assume that each of the above
spaces carries the mentioned norm and that any subset of these spaces carries the
metric induced by the mentioned norm. For finite dimensional spaces, Theorem 16.76
will show that although many norms are available, for most purposes any one of them
can be used.
Exercises
15-38. Fill in the remaining details in the proof of Theorem 15.61.
(a) Prove that if $\sim$ is an equivalence relation with equivalence classes denoted by $[\cdot]$ and $x \sim y$,
then $[x] = [y]$.
(b) Prove that $d : X \times X \to [0,\infty)$ as in Theorem 15.61 satisfies $d([x],[y]) = d([y],[x])$
for all $[x],[y] \in X$.
(c) Prove that $d : X \times X \to [0,\infty)$ as in Theorem 15.61 satisfies the triangular inequality.
15-39. A seminormed space is a pair $(X^s, \|\cdot\|^s)$ of a vector space $X^s$ and a function $\|\cdot\|^s : X^s \to [0,\infty)$,
called the seminorm such that the following hold.
• $\|0\|^s = 0$.
• For all real numbers $\alpha$ and all $x \in X^s$ we have $\|\alpha x\|^s = |\alpha| \|x\|^s$.
• For all $x,y \in X^s$ we have $\|x+y\|^s \le \|x\|^s + \|y\|^s$.
(a) Prove that $\sim\ \subseteq X^s \times X^s$ defined by $x \sim y$ iff $\|x-y\|^s = 0$ is an equivalence relation.
(b) Prove that if $[x]$ denotes the equivalence class of $x$ under $\sim$, then $X := \{[x] : x \in X^s\}$ equipped
with $\|[x]\| := \|x\|^s$ is a normed space.
Be careful. You must also prove that $X$ is a vector space.
15-40. A semi-inner product space is a pair $(X^s, \langle\cdot,\cdot\rangle^s)$ consisting of a vector space $X^s$ and a function
$\langle\cdot,\cdot\rangle^s : X^s \times X^s \to \mathbb{R}$, called the semi-inner product such that the following hold.
• $\langle 0,0\rangle^s = 0$.
• For all $x \in X^s$, we have $\langle x,x\rangle^s \ge 0$.
• For all $x,y \in X^s$, we have $\langle x,y\rangle^s = \langle y,x\rangle^s$.
• For all real numbers $\alpha$ and all $x,y \in X^s$, we have $\langle \alpha x,y\rangle^s = \alpha\langle x,y\rangle^s$.
• For all $x,y,z \in X^s$, we have $\langle x+y,z\rangle^s = \langle x,z\rangle^s + \langle y,z\rangle^s$.
(a) Prove that with $\|x\|^s := \sqrt{\langle x,x\rangle^s}$ the pair $(X^s, \|\cdot\|^s)$ is a seminormed space.
(b) Prove that $\sim\ \subseteq X^s \times X^s$ defined by $x \sim y$ iff $\|x-y\|^s = 0$ is an equivalence relation.
(c) Prove that if $[x]$ denotes the equivalence class of $x$ under $\sim$, then the set $X := \{[x] : x \in X^s\}$
equipped with $\langle [x],[y]\rangle := \langle x,y\rangle^s$ is an inner product space.
15-41. Prove that $(\mathcal{L}^\infty(M,\Sigma,\mu), \|\cdot\|_\infty)$ in Proposition 15.64 is a seminormed space.
15-42. Let $(M,\Sigma,\mu)$ be a finite measure space. For any two measurable functions $f,g : M \to \mathbb{R}$, define
$d(f,g) := \int_M \frac{|f-g|}{1+|f-g|} \, d\mu$. Prove that $d$ is a semimetric on the space of measurable functions.
15-43. Hölder's inequality for $p = 1$, $q = \infty$. Let $(M,\Sigma,\mu)$ be a measure space. Prove that for all
functions $f \in L^1(M,\Sigma,\mu)$ and $g \in L^\infty(M,\Sigma,\mu)$ we have $fg \in L^1(M,\Sigma,\mu)$ and the inequality
$\|fg\|_1 \le \|f\|_1 \|g\|_\infty$ holds.
Hint. Use that $g$ is bounded a.e. by $\|g\|_\infty$.
15-44. Let $(M,\Sigma,\mu)$ be a measure space with $\mu(M) < \infty$.
(a) Prove that $L^\infty(M,\Sigma,\mu) \subseteq \bigcap_{p \in [1,\infty)} L^p(M,\Sigma,\mu)$.
(b) Give an example that shows that the containment is proper.
(c) Prove that for all $f \in L^\infty(M,\Sigma,\mu)$ we have $\lim_{p \to \infty} \|f\|_p = \|f\|_\infty$.
15-45. Let $(M,\Sigma,\mu)$ be a measure space. A set $L \in \Sigma$ is called locally $\mu$-null iff for all $S \in \Sigma$ with
$\mu(S) < \infty$ we have $\mu(S \cap L) = 0$.
(a) Prove that every null set is locally $\mu$-null.
(b) Prove that if $(M,\Sigma,\mu)$ is $\sigma$-finite, then every locally $\mu$-null set is a null set.
I
(c) Consider the function @ ( A ):=
sets of W.
co;
‘
if
A’ defined on the Lebesgue measurable subifOEA,
i. Prove that p is a measure.
ii. Prove that Q is locally p-null, but not p-null
15.9 Another Number Field: Complex Numbers
Complex numbers are often used in analysis, typically when “square roots of negative
numbers” are needed. Exercise 15-52 shows that there is a price to be paid, because
we lose the order relation. That in itself is not a problem. This section shows that
the field axioms and an absolute value function remain available. Moreover, Theorem
15.75 will show that the convergence of Cauchy sequences, which is fundamental to
analysis, is also preserved. Consequently, in abstract analysis $\mathbb{R}$ and $\mathbb{C}$ are often used
interchangeably.
Definition 15.68 The complex numbers $\mathbb{C}$ are the set $\mathbb{R} \times \mathbb{R}$ equipped with addition
and multiplication defined as follows. For all complex numbers $(a,b), (c,d) \in \mathbb{C}$,
we set $(a,b) + (c,d) := (a+c, b+d)$ and $(a,b) \cdot (c,d) := (ac - bd, ad + bc)$.
We define $i := (0,1)$ and $1 := (1,0)$ and write complex numbers also in the form
$(a,b) = a \cdot 1 + b \cdot i = a + ib$. For $z = a + ib \in \mathbb{C}$, the number $a$ is also called the
real part of $z$, denoted $\Re(z)$, and the number $b$ is also called the imaginary part of $z$,
denoted $\Im(z)$.
The algebraic properties of $\mathbb{C}$ are summarized in Theorems 15.69 and 15.70.
Theorem 15.69 The complex numbers $\mathbb{C}$ are a field.
Proof. The field axioms are easily verified with $0 = 0 + 0i$ being neutral for
addition, $1 = 1 + 0i$ being neutral for multiplication, $-(a+bi) = (-a) + (-b)i$ being
the additive inverse and $(a+ib)^{-1} = \frac{a}{a^2+b^2} + \frac{-b}{a^2+b^2} i$ being the multiplicative
inverse. (Exercise 15-46.)
The special element $i$ serves as the closest we can get to a "square root of $(-1)$."
Theorem 15.70 $i^2 = -1$.
Proof. $i^2 = (0 + 1i) \cdot (0 + 1i) = (0 \cdot 0 - 1 \cdot 1) + (0 \cdot 1 + 1 \cdot 0) i = -1$.
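Definition 15.68 and Theorems 15.69 and 15.70 can be checked mechanically by treating complex numbers literally as pairs. The following is a small sketch, assuming exact rational components via `fractions.Fraction`; the helper names `add`, `mul` and `inverse` are illustrative and not from the text.

```python
from fractions import Fraction

def add(z, w):
    """(a, b) + (c, d) := (a + c, b + d)."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul(z, w):
    """(a, b) . (c, d) := (ac - bd, ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

def inverse(z):
    """Multiplicative inverse from the proof of Theorem 15.69 (z must be nonzero)."""
    a, b = z
    n = a * a + b * b  # nonzero exactly when z != (0, 0)
    return (a / n, -b / n)

ZERO = (Fraction(0), Fraction(0))
ONE = (Fraction(1), Fraction(0))
I = (Fraction(0), Fraction(1))

assert mul(I, I) == (Fraction(-1), Fraction(0))   # Theorem 15.70: i^2 = -1
z = (Fraction(3), Fraction(-4))
assert mul(z, inverse(z)) == ONE                  # multiplicative inverse
assert add(z, (-z[0], -z[1])) == ZERO             # additive inverse
assert mul(z, ONE) == z and add(z, ZERO) == z     # neutral elements
print(inverse(z))
```

Exact fractions are used so that the equalities hold without rounding; a check on one sample number is of course not a proof of the field axioms.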
Aside from the above algebraic properties, we need to know how to measure distances in the complex numbers. This is done via the absolute value function.
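Definition 15.71 For $z = a + ib \in \mathbb{C}$, the absolute value of $z$ is $|z| := \sqrt{a^2 + b^2}$.
Theorem 15.72 Properties of the absolute value.
0. For all $z \in \mathbb{C}$, we have $|z| \ge 0$.
1. For all $z \in \mathbb{C}$, we have $|z| = 0$ iff $z = 0$.
2. For all $z_1, z_2 \in \mathbb{C}$, we have $|z_1 z_2| = |z_1| |z_2|$.
3. The triangular inequality holds.
That is, for all $z_1, z_2 \in \mathbb{C}$ we have $|z_1 + z_2| \le |z_1| + |z_2|$.
Proof. Exercise 15-47.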
If we switch the sign of the imaginary part of a complex number we obtain the
complex conjugate.
Definition 15.73 For $z = a + ib \in \mathbb{C}$, the complex conjugate of $z$ is $\bar{z} := a - ib$.
Absolute value and complex conjugate are related via a simple equation. This
equation can be used to express multiplicative inverses.
Proposition 15.74 For all $z \in \mathbb{C}$, the equalities $z + \bar{z} = 2\Re(z)$ and $|z|^2 = z\bar{z}$ hold.
Moreover, for all $z \in \mathbb{C} \setminus \{0\}$ the multiplicative inverse is $\frac{1}{z} = \frac{\bar{z}}{|z|^2}$.
Proof. Exercise 15-48.
The definitions of complex valued sequences, convergence in $\mathbb{C}$, and Cauchy
sequences in $\mathbb{C}$ are similar to the corresponding definitions in $\mathbb{R}$ (Exercise 15-49).
Theorem 15.75 Every complex valued Cauchy sequence converges in $\mathbb{C}$.
Proof. Let $\{z_n\}_{n=1}^\infty$ be a Cauchy sequence of complex numbers. For each $n \in \mathbb{N}$,
let $a_n := \Re(z_n)$ and $b_n := \Im(z_n)$ so that $z_n = a_n + ib_n$. Then for all $\varepsilon > 0$ there is
an $N \in \mathbb{N}$ so that for all $m,n \ge N$ we have $|z_n - z_m| < \varepsilon$. Thus for all $m,n \ge N$ we
obtain
$$|a_n - a_m| \le \sqrt{(a_n - a_m)^2 + (b_n - b_m)^2} = |z_n - z_m| < \varepsilon,$$
so $\{a_n\}_{n=1}^\infty$ is a Cauchy sequence in $\mathbb{R}$. Similarly, $\{b_n\}_{n=1}^\infty$ is shown to be a Cauchy
sequence in $\mathbb{R}$. Let $a := \lim_{n\to\infty} a_n$ and $b := \lim_{n\to\infty} b_n$, where the limits are taken in $\mathbb{R}$.
Let $z := a + ib$ and let $\varepsilon > 0$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we have
$|a - a_n| < \frac{\varepsilon}{\sqrt{2}}$ and $|b - b_n| < \frac{\varepsilon}{\sqrt{2}}$. Therefore, for all $n \ge N$ we obtain
$$|z - z_n| = \sqrt{(a - a_n)^2 + (b - b_n)^2} < \sqrt{\frac{\varepsilon^2}{2} + \frac{\varepsilon^2}{2}} = \varepsilon,$$
which means that $z = \lim_{n\to\infty} z_n$ in $\mathbb{C}$.
Throughout this text we will rely on the properties that C has in common with R.
It is a field in which Cauchy sequences converge and its absolute value function has
properties similar to those of the absolute value function for $\mathbb{R}$. The fact that $\mathbb{C}$ contains an
element $i$ such that $i^2 = -1$ is what sometimes requires the explicit use of $\mathbb{C}$ instead
of $\mathbb{R}$. As long as we do not use this fact, we could use $\mathbb{C}$ and $\mathbb{R}$ interchangeably. In
particular, unless otherwise indicated, all theorems on normed real vector spaces also
hold for normed complex vector spaces. Moreover, because no linear structure is used
in metric spaces, all results for metric spaces hold in real spaces as well as in complex
spaces. In the remainder of this section, we highlight the main adjustments that need
to be made to examples and definitions when using C instead of R as the underlying
field.
All results for series, except for the alternating series test (obviously), and all results for power series can be translated verbatim to results for series and power series
of complex numbers. The definitions of the exponential and trigonometric functions
retain their familiar form.
Definition 15.76 For all $z \in \mathbb{C}$ we define
1. The complex exponential function $e^z := \sum_{k=0}^\infty \frac{z^k}{k!}$.
2. The complex sine function $\sin(z) := \sum_{k=0}^\infty \frac{(-1)^k z^{2k+1}}{(2k+1)!}$.
3. The complex cosine function $\cos(z) := \sum_{k=0}^\infty \frac{(-1)^k z^{2k}}{(2k)!}$.
These functions have the same properties as were proved for their real counterparts
in Sections 12.1 and 12.2. Moreover, they are related via the Euler identities.
Theorem 15.77 Euler identities. For all $z \in \mathbb{C}$ we have $e^{iz} = \cos(z) + i \sin(z)$.
Equivalently, for all $z \in \mathbb{C}$ we have $\sin(z) = \frac{e^{iz} - e^{-iz}}{2i}$ and $\cos(z) = \frac{e^{iz} + e^{-iz}}{2}$.
Proof. Exercise 15-50.
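The Euler identity can be checked numerically straight from the power series in Definition 15.76. The following is a rough sketch under stated assumptions: the number of terms and the test point `z` are arbitrary choices, and agreement of partial sums up to rounding is only a plausibility check, not a substitute for the proof requested in Exercise 15-50.

```python
import cmath
import math

def exp_series(z, terms=60):
    """Partial sum of sum_{k>=0} z^k / k!."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

def sin_series(z, terms=30):
    """Partial sum of sum_{k>=0} (-1)^k z^(2k+1) / (2k+1)!."""
    total, term = 0j, z + 0j
    for k in range(terms):
        total += term
        term *= -z * z / ((2 * k + 2) * (2 * k + 3))
    return total

def cos_series(z, terms=30):
    """Partial sum of sum_{k>=0} (-1)^k z^(2k) / (2k)!."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= -z * z / ((2 * k + 1) * (2 * k + 2))
    return total

z = 0.7 - 1.3j  # arbitrary complex test point
lhs = exp_series(1j * z)
rhs = cos_series(z) + 1j * sin_series(z)
print(abs(lhs - rhs))

assert abs(lhs - rhs) < 1e-12
assert abs(exp_series(z) - cmath.exp(z)) < 1e-12    # agrees with the library exponential
assert abs(exp_series(1j * math.pi) + 1) < 1e-12    # e^{i pi} = -1
```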
To see how the all-important function spaces can be translated into complex vector
spaces, we need to define integrability for complex valued functions. A complex vector space is of course defined similarly to a real vector space, the only difference being
that the scalars are taken from the field $\mathbb{C}$ of complex numbers rather than the field $\mathbb{R}$
of real numbers.
Definition 15.78 Let $(M,\Sigma,\mu)$ be a measure space. Then a function $f : M \to \mathbb{C}$
will be called measurable iff its real part defined by $\Re(f)(x) := \Re(f(x))$ and its
imaginary part defined by $\Im(f)(x) := \Im(f(x))$ are both measurable. It will be called
integrable iff both real and imaginary parts are integrable and the integral will be
$$\int_M f \, d\mu := \int_M \Re(f) \, d\mu + i \int_M \Im(f) \, d\mu.$$
One of the main reasons why little changes for the integration of complex valued
functions is that a complex valued function is integrable iff its absolute value is integrable.
Proposition 15.79 Let $(M,\Sigma,\mu)$ be a measure space. Then a complex valued function
$f : M \to \mathbb{C}$ is integrable iff the absolute value $|f| : M \to \mathbb{R}$ (of course taken
pointwise) is integrable.
Proof. For "$\Leftarrow$," note that $|\Re(f)| \le |f|$ and $|\Im(f)| \le |f|$, and apply part 2 of
Theorem 14.35.
For "$\Rightarrow$," note that $|f| = \sqrt{\Re(f)^2 + \Im(f)^2} \le \sqrt{2} \max\{ |\Re(f)|, |\Im(f)| \}$, where
the maximum is taken pointwise, and apply part 1 of Theorem 14.35.
Definition 15.80 If $D$ is a set, then $F(D,\mathbb{C})$ is the space of all functions from $D$ to $\mathbb{C}$.
Spaces $l^p$, $\mathcal{L}^p(M,\Sigma,\mu)$ and $L^p(M,\Sigma,\mu)$ are defined similarly to their counterparts
for real valued functions and they have similar properties.
The similar properties mentioned above include being a normed space or an inner product space. A complex normed space is defined similarly to a real normed
space, with the only difference being that we use a complex vector space instead of
a real vector space. In particular, the norm on a complex normed space still maps into
$[0,\infty) \subseteq \mathbb{R}$. The most significant adjustment when going from real vector spaces to
complex vector spaces is in the definition of an inner product space.
Definition 15.81 A complex inner product space is a pair $(X, \langle\cdot,\cdot\rangle)$ of a complex
vector space $X$ and a function $\langle\cdot,\cdot\rangle : X \times X \to \mathbb{C}$, called the inner product on $X$,
with the following properties.
1. The inner product is positive definite.
That is, for all $x \in X$ we have $\langle x,x\rangle \in [0,\infty)$ with $\langle x,x\rangle = 0$ iff $x = 0$.
2. For all $x,y \in X$, we have $\langle x,y\rangle = \overline{\langle y,x\rangle}$.
3. The inner product is linear in the first factor.
That is, for all scalars $\alpha$ and all $x,y \in X$ we have $\langle \alpha x,y\rangle = \alpha \langle x,y\rangle$ and for all
$x,y,z \in X$ we have $\langle x+y,z\rangle = \langle x,z\rangle + \langle y,z\rangle$.
We will usually call $X$ itself an inner product space. When we do so, we implicitly
assume that there is an inner product on $X$, which will usually be denoted $\langle\cdot,\cdot\rangle$.
The adjustment in part 2 of Definition 15.81 guarantees that the Cauchy-Schwarz
inequality still holds and that complex inner product spaces are also complex normed
spaces (see Exercise 15-51). To make $L^2(M,\Sigma,\mu)$ into a complex inner product space,
we use the product $\langle f,g\rangle := \int_M f \bar{g} \, d\mu$ on $L^2(M,\Sigma,\mu)$ and then identify functions
whose distance from each other is zero as was done in Theorem 15.61.
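The effect of the conjugate in part 2 of Definition 15.81 is easy to see on $\mathbb{C}^n$. The sketch below assumes the standard product $\langle x,y\rangle = \sum_k x_k \overline{y_k}$ on $\mathbb{C}^2$ as a finite-dimensional stand-in for the integral $\int_M f \bar{g} \, d\mu$; the vectors and the scalar are arbitrary sample values, not taken from the text.

```python
def inner(x, y):
    """Standard inner product on C^n: sum of x_k * conj(y_k)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

# Arbitrary sample vectors and scalar.
x = (1 + 2j, 3 - 1j)
y = (2 - 1j, 1j)
alpha = 2 - 3j

# Part 1: <x, x> is a nonnegative real number.
assert inner(x, x).imag == 0 and inner(x, x).real > 0

# Part 2: <x, y> is the complex conjugate of <y, x>.
assert inner(x, y) == inner(y, x).conjugate()

# Part 3: linearity in the first factor.
ax = tuple(alpha * a for a in x)
assert abs(inner(ax, y) - alpha * inner(x, y)) < 1e-12

# Consequence (Exercise 15-51a): scalars leave the second factor conjugated.
ay = tuple(alpha * b for b in y)
assert abs(inner(x, ay) - alpha.conjugate() * inner(x, y)) < 1e-12

print(inner(x, y), inner(x, x))
```

Without the conjugate, the sum $\sum_k x_k x_k$ need not be real, so positive definiteness in part 1 would fail.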
Throughout the rest of this text, unless we explicitly demand the space to be a real
or complex space, it is usually safe to pretend the underlying field is R. But since we
will only use field axioms, absolute values and convergence of Cauchy sequences in
our proofs, the results will hold for complex vector spaces as well as for real vector
spaces.
Exercises
15-46. Prove that $\mathbb{C}$ is a field. That is, prove each of the following.
(a) Prove that for all $x,y,z \in \mathbb{C}$ we have $(x+y)+z = x+(y+z)$.
(b) Prove that for all $x,y \in \mathbb{C}$ we have $x+y = y+x$.
(c) Prove that with $0 = 0 + 0i$ for all $x \in \mathbb{C}$ we have $x + 0 = x$.
(d) Prove that for every element $x = a + ib \in \mathbb{C}$ the element $(-x) = (-a) + i(-b)$ is so that
$x + (-x) = 0$.
(e) Prove that for all $x,y,z \in \mathbb{C}$ we have $(x \cdot y) \cdot z = x \cdot (y \cdot z)$.
(f) Prove that for all $x,y \in \mathbb{C}$ we have $x \cdot y = y \cdot x$.
(g) Prove that the element $1 := 1 + i0$ is so that for all $x \in \mathbb{C}$ we have $1 \cdot x = x$.
(h) Prove that for every element $x = a + ib \in \mathbb{C} \setminus \{0\}$ the element $x^{-1} := \frac{a}{a^2+b^2} + i \frac{-b}{a^2+b^2}$ is so
that $x \cdot x^{-1} = 1$.
(i) Prove that for all $\alpha, x, y \in \mathbb{C}$ we have $\alpha \cdot (x+y) = \alpha \cdot x + \alpha \cdot y$.
15-47. Prove Theorem 15.72. That is, prove each of the following.
(a) Prove that for all $z \in \mathbb{C}$ we have $|z| \ge 0$.
(b) Prove that for all $z \in \mathbb{C}$ we have $|z| = 0$ iff $z = 0$.
(c) Prove that for all $z_1, z_2 \in \mathbb{C}$ we have $|z_1 z_2| = |z_1| |z_2|$.
(d) Prove that for all $z_1, z_2 \in \mathbb{C}$ we have $|z_1 + z_2| \le |z_1| + |z_2|$.
Hint. For the triangular inequality, use the Cauchy-Schwarz inequality for $\mathbb{R}^2$.
15-48. Prove Proposition 15.74.
15-49. Define each of the following for complex numbers.
(a) Sequences of complex numbers.
(b) Convergence of sequences of complex numbers.
(c) Cauchy sequences of complex numbers.
15-50. Prove the Euler identities.
(a) First, use the power series to prove that $e^{iz} = \cos(z) + i \sin(z)$ for all $z \in \mathbb{C}$.
(b) Use part 15-50a and the appropriate version of Exercise 12-13c to prove that for all $z \in \mathbb{C}$ we
have $\sin(z) = \frac{e^{iz} - e^{-iz}}{2i}$ and $\cos(z) = \frac{e^{iz} + e^{-iz}}{2}$.
(c) Prove that the two identities in part 15-50b imply the identity in part 15-50a.
15-51. Let $X$ be a complex inner product space with inner product $\langle\cdot,\cdot\rangle$.
(a) Prove that for all $x,y,z \in X$ and all $\alpha \in \mathbb{C}$ we have $\langle x, y+z\rangle = \langle x,y\rangle + \langle x,z\rangle$ and
$\langle x, \alpha y\rangle = \bar{\alpha} \langle x,y\rangle$.
(b) Prove that $|\Re(\langle x,y\rangle)| \le \sqrt{\langle x,x\rangle \langle y,y\rangle}$ holds for all $x,y \in X$.
Hint. Use the proof of Lemma 15.39 as guidance and realize that
$f(t) = \langle tx+y, tx+y\rangle = t^2 \langle x,x\rangle + 2t \Re(\langle x,y\rangle) + \langle y,y\rangle$.
(c) Prove that every complex inner product space is a complex normed space.
Hint. Adjust the proof of Theorem 15.40 appropriately.
(d) Prove the Cauchy-Schwarz inequality. That is, prove that $|\langle x,y\rangle| \le \sqrt{\langle x,x\rangle \langle y,y\rangle}$ holds
for all $x,y \in X$.
15-52. Prove that $\mathbb{C}$ cannot be ordered.
Hint. Suppose there was a subset $\mathbb{C}^+$ of $\mathbb{C}$ with properties as in Axiom 1.6 for the real numbers.
Prove that it must contain both $1$ and $-1$, which is not possible. Use that it must contain $i$ or $-i$.
15-53. Representation of complex numbers and $n$th roots.
(a) Prove that for every complex number $z = x + iy$ there are real numbers $r \ge 0$ and $\theta \in [0, 2\pi]$
so that $z = r e^{i\theta}$.
Hint. Exercise 15-50a.
(b) Prove that for every complex number $z \ne 0$ and every $n \in \mathbb{N}$ there are $n$ distinct complex
numbers $w_1, \ldots, w_n$ with $w_k^n = z$.
Chapter 16
The Topology of Metric Spaces
In Chapter 15, we started with d-dimensional space as our underlying motivation and
visualization. By systematically stripping away specific properties, we obtained the
successively more general structures of inner product spaces, normed spaces and metric
spaces. We will now journey back from metric spaces in this chapter, to normed spaces
in Chapter 17, and to inner product spaces in Chapter 20. Because we now move from
more general to more specific structures, the results we prove here and in Chapter 17
will also apply to structures discussed in later chapters.
16.1 Convergence of Sequences
A metric provides a distance function on a set, which is enough to discuss convergence,
the central notion of analysis. We start by defining convergence of sequences and by
investigating examples and properties of convergent sequences. Note that convergence
is defined exactly as for sequences of real numbers in Definition 2.2. A sequence in $X$
is of course just a function $f : \mathbb{N} \to X$, denoted as before by $\{x_n\}_{n=1}^\infty$.
Definition 16.1 Let $\{x_n\}_{n=1}^\infty$ be a sequence in the metric space $X$. Then $L \in X$ is
called limit of $\{x_n\}_{n=1}^\infty$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ we
have $d(x_n, L) < \varepsilon$. A sequence that has a limit will be called convergent, a sequence
that does not have a limit will be called divergent.
As in the real numbers, limits are unique and the proof is similar.
Proposition 16.2 Let $\{x_n\}_{n=1}^\infty$ be a sequence in the metric space $X$. If both $L$ and $M$
are limits of $\{x_n\}_{n=1}^\infty$, then $L = M$.
Proof. We mimic the proof of Proposition 2.4. Let $\{x_n\}_{n=1}^\infty$ be a sequence in $X$ and
let $L$ and $M$ be limits of $\{x_n\}_{n=1}^\infty$. We need to prove that $L = M$.
Let $\varepsilon > 0$ be arbitrary but fixed. Then there is an $N_1 \in \mathbb{N}$ such that for all $n \ge N_1$
we have $d(x_n, L) < \frac{\varepsilon}{2}$. There also is an $N_2 \in \mathbb{N}$ such that for all $n \ge N_2$ we have
$d(x_n, M) < \frac{\varepsilon}{2}$. Let $N := \max\{N_1, N_2\}$. Then for all $n \ge N$ we infer $d(x_n, L) < \frac{\varepsilon}{2}$
and $d(x_n, M) < \frac{\varepsilon}{2}$. Hence, with $n = N$ we obtain
$$d(L, M) \le d(L, x_N) + d(x_N, M) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Because $\varepsilon > 0$ was arbitrary, by Theorem 1.37 we conclude that $d(L, M) = 0$, and
hence $L = M$.
The similarity to the proof of Proposition 2.4 is obvious. It seems the main change
is that we work with $d(x_n, L)$ instead of $|a_n - L|$. It would be naive to hope that all
proofs translate this easily from the real line to metric spaces. But, especially for results
early in this chapter, the proof of the corresponding result for the real line will provide
more than adequate guidance.
Definition 16.3 Because the limit is unique, we speak of the limit of a sequence. The
notation $\lim_{n\to\infty} x_n = L$ will indicate that the limit of $\{x_n\}_{n=1}^\infty$ exists and is equal to $L$.
Although we are proving results for metric spaces in this chapter, if a metric $d(\cdot,\cdot)$
is induced by a norm $\|\cdot\|$ we will write $\|x - y\|$ instead of $d(x,y)$. We can do this
because every normed space is a metric space, which means that all results proved here
are also valid for normed spaces (and inner product spaces, too).
As a first example, we consider the convergence of sequences in d-dimensional
space. It turns out that sequences in $\mathbb{R}^d$ converge iff they converge componentwise. In
Theorem 16.4 we prove this for the metric induced by the uniform norm $\|\cdot\|_\infty$. Section
16.6 and specifically Theorem 16.78 will show that the same is true for all norms on
$\mathbb{R}^d$. Until Section 16.6 we will consider only the uniform norm on $\mathbb{R}^d$. For the notation
for the component sequences, recall Convention 15.23.
Theorem 16.4 Consider $\mathbb{R}^d$ with the metric induced by the uniform norm $\|\cdot\|_\infty$. Then
a sequence $\{x_n\}_{n=1}^\infty$ converges to $L$ in $(\mathbb{R}^d, \|\cdot\|_\infty)$ iff for all $k = 1, \ldots, d$ the $k$th
component sequence $\{x_n^{(k)}\}_{n=1}^\infty$ converges in $\mathbb{R}$ to the $k$th component $L^{(k)}$ of $L$.
Proof. We have $\lim_{n\to\infty} x_n = L$ iff $\lim_{n\to\infty} \|x_n - L\|_\infty = 0$. For all $k \in \{1, \ldots, d\}$, we
infer $|x_n^{(k)} - L^{(k)}| \le \|x_n - L\|_\infty$, so convergence of the vectors implies convergence
of the components in $\mathbb{R}$. (Mentally fill in the argument.)
Conversely, assume that for all $k \in \{1, \ldots, d\}$ we have $\lim_{n\to\infty} |x_n^{(k)} - L^{(k)}| = 0$. Let
$\varepsilon > 0$. For each $k \in \{1, \ldots, d\}$ there is an $N_k$ such that for all $n \ge N_k$ we have
$|x_n^{(k)} - L^{(k)}| < \varepsilon$. Let $N := \max\{N_k : k = 1, \ldots, d\}$. Then for all $n \ge N$ we obtain
$|x_n^{(k)} - L^{(k)}| < \varepsilon$ for all $k \in \{1, \ldots, d\}$, and hence $\|x_n - L\|_\infty < \varepsilon$.
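The choice $N := \max\{N_k\}$ in the second half of the proof can be traced on a concrete sequence. The following is a sketch under stated assumptions: the sequence `x(n)` in $\mathbb{R}^3$, its limit `L` and the tolerance `eps` are made-up illustrative values, and exact fractions are used so that the strict inequalities are not disturbed by rounding.

```python
from fractions import Fraction

def uniform_distance(x, y):
    """Metric induced by the uniform norm: max_k |x^(k) - y^(k)|."""
    return max(abs(a - b) for a, b in zip(x, y))

def x(n):
    """Hypothetical example sequence in R^3 with limit L = (0, 1, 2)."""
    return (Fraction(1, n), 1 + Fraction(1, n * n), 2 - Fraction(1, 2 * n))

L = (Fraction(0), Fraction(1), Fraction(2))

def first_index_below(eps, distance, max_n=10**6):
    """Smallest n <= max_n with distance(n) < eps.

    For this example the distances decrease in n, so every later index
    also satisfies the inequality.
    """
    for n in range(1, max_n + 1):
        if distance(n) < eps:
            return n
    raise ValueError("no index found")

eps = Fraction(1, 1000)
# One threshold N_k per component, as in the proof.
N_k = [first_index_below(eps, lambda n, k=k: abs(x(n)[k] - L[k])) for k in range(3)]
# Threshold for the vectors in the uniform metric.
N = first_index_below(eps, lambda n: uniform_distance(x(n), L))
print(N_k, N)

assert N == max(N_k)
```

The slowest component dictates the threshold for the vector sequence, which is exactly why the proof takes a maximum over the finitely many $N_k$.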
Definition 16.7 A metric space $X$ is called bounded iff there is a point $c \in X$ and an
$M \in \mathbb{R}$ so that for all $x \in X$ we have $d(x,c) \le M$.
We will frequently work with subsets of metric spaces. Because it would be inefficient to define properties for metric spaces and also for subsets of metric spaces we
recall that any subset of a metric space is a metric space in its own right.
Definition 16.8 Any property of metric spaces can also be viewed as a property of
the subsets of metric spaces. Formally, we say that a subset $B$ of a metric space $X$ has
property $P$ iff $B$ as a metric space with the induced metric $d|_{B \times B}$ has property $P$.
Proposition 16.9 Let $\{x_n\}_{n=1}^\infty$ be a convergent sequence in the metric space $X$. Then
$\{x_n : n \in \mathbb{N}\}$ is bounded.
Proof. Adapt the proof of Proposition 2.34. (See Exercise 16-2.)
Subsequences are defined just as for sequences of real numbers.
Definition 16.10 (Compare with Definition 2.39.) Let $X$ be a metric space, let $\{x_n\}_{n=1}^\infty$
be a sequence of elements of $X$ and let $\{n_k\}_{k=1}^\infty$ be a sequence of natural numbers so
that $n_k < n_{k+1}$ for all $k \in \mathbb{N}$. Then $\{x_{n_k}\}_{k=1}^\infty$ is called a subsequence of $\{x_n\}_{n=1}^\infty$.
As for real numbers, subsequences of convergent sequences have the same limit.
Proposition 16.11 Let $X$ be a metric space and let $\{x_n\}_{n=1}^\infty$ be a convergent sequence
with limit $L$. Then every subsequence $\{x_{n_k}\}_{k=1}^\infty$ also converges to $L$.
Proof. Mimic the proof of Proposition 2.40. (See Exercise 16-3.)
Exercises
16-1. Componentwise convergence is not equivalent to convergence.
(a) Prove that the sequence
$$f_n(x) := \begin{cases} 0; & \text{for } 0 \le x \le 1 - \frac{1}{n}, \\ 2n\left(x - \left(1 - \frac{1}{n}\right)\right); & \text{for } 1 - \frac{1}{n} \le x \le 1 - \frac{1}{2n}, \\ 2n(1-x); & \text{for } 1 - \frac{1}{2n} < x \le 1, \end{cases}$$
converges pointwise to $f(x) = 0$, but it does not converge to $f$ in $(C^0[0,1], \|\cdot\|_\infty)$. In
particular you need to prove that the $f_n$ are continuous.
(b) Prove that convergence in $l^\infty$ implies componentwise convergence.
Hint. Mimic the appropriate part of the proof of Theorem 16.4.
(c) Construct a sequence $\{x_n\}_{n=1}^\infty$ in $l^\infty$ so that for all $i \in \mathbb{N}$ the component sequence $\{x_n^{(i)}\}_{n=1}^\infty$
converges, but $\{x_n\}_{n=1}^\infty$ does not converge in $l^\infty$.
(d) Explain why the proof of Theorem 16.4 cannot be generalized to apply to $l^\infty$. That is, determine which part of the proof fails for $l^\infty$.
16-2. Prove Proposition 16.9.
16-3. Prove Proposition 16.11.
16.2. Completeness
29 1
16-4. Let X be a metric space and let { x n ] r z 1 and (yn]r=l be sequences with lirn d ( x n , yn) = 0. Prove
that if ( x n ) E 1converges then so does
{yn]r=l
n+o3
and lirn xn = lirn yn.
n+m
n+w
16-5. Let X be a subset of Wd with the metric induced by the Euclidean norm /I . 112 on Wd.Prove that a sequence ( x n ) Z 1converges to L in X iff for all k = I,. . . , d the kfh component sequence
converges in R to the kth component L ( k ) of L .
Hint.For the ‘‘+”direction, the difference in each component is best chosen to be
-!-
%a
16-6. Let ( M , C , /L) be a finite measure space and let M ( M ,6 ,p) be the metric space obtained as in
Theorem 15.61 from the measurable functions with the semimetric d(f, g ) =
from Exercise 15-42. Prove that the sequence
convergesin measure to g.
[fn]r=l
[ [fn]
converges to [ g ]
s,
E
1 Y;l”,l
dp
M ( M ,6,/A) iff
16-7. Let X be a metric space. Prove that the following are equivalent
(a) X is bounded,
(b) For all c
E
X there is an M ,
(c) There is an M
E
E
B so that for all x E
R so that for all x , y
E
X we have d ( x , c) 5 M c ,
X we have d ( x , y ) 5 M .
16-8. Let X be a metric space, let x E X and let (X,];P=~be a sequence in X such that every subsequence
converges to x.
has a subsequence that converges to x . Prove that
(X,]F=~
16-9. Because every normed space is also a metric space, we have also defined convergence in normed spaces. So let $X$ be a normed space, let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences in $X$ and let $\{c_n\}_{n=1}^\infty$ be a sequence in $\mathbb{R}$.
(a) Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$, then $\lim_{n\to\infty} (x_n + y_n) = x + y$.
(b) Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$, then $\lim_{n\to\infty} (x_n - y_n) = x - y$.
(c) Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} c_n = c$, then $\lim_{n\to\infty} c_n x_n = cx$.
Hint. Mimic the proofs of the appropriate parts of Theorem 2.14.
16-10. Because every inner product space is also a metric space, we have also defined convergence in inner product spaces. So let $X$ be an inner product space and let $\{x_n\}_{n=1}^\infty$ and $\{y_n\}_{n=1}^\infty$ be sequences in $X$. Prove that if $\lim_{n\to\infty} x_n = x$ and $\lim_{n\to\infty} y_n = y$, then $\lim_{n\to\infty} \langle x_n, y_n \rangle = \langle x, y \rangle$.
16-11. Let $X$ be a metric space and let $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ be convergent sequences in $X$ with limits $a$ and $b$, respectively. Prove that $\lim_{n\to\infty} d(a_n, b_n) = d(a, b)$.
16.2 Completeness
By Theorem 2.27, every convergent sequence of real numbers is a Cauchy sequence.
The definition of Cauchy sequences easily translates to metric spaces and the implication “convergence implies Cauchy” still holds.
Definition 16.12 Let $\{x_n\}_{n=1}^\infty$ be a sequence in the metric space $X$. Then $\{x_n\}_{n=1}^\infty$ is called a Cauchy sequence iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $d(x_m, x_n) < \varepsilon$.
Theorem 16.13 If the sequence $\{x_n\}_{n=1}^\infty$ in the metric space $X$ converges, then it is a Cauchy sequence.
Proof. Mimic part of the proof of Theorem 2.27. (See Exercise 16-12.) ∎
Conversely, the fact that Cauchy sequences converge is one of the fundamental
properties of the real numbers. Recall that by Exercise 2-25 this property is equivalent
to the Completeness Axiom (Axiom 1.19). Hence, without this property analysis on
the real line would be all but impossible. Convergence of Cauchy sequences is equally
important in metric spaces. For example, in numerical considerations, the limit usually
is not known. Therefore, to prove convergence of a numerical scheme it is common
to prove that the scheme produces a Cauchy sequence and then use convergence of
Cauchy sequences to conclude that the scheme will produce a limit (for an example,
see the proof of Theorem 13.14). On the theoretical side, convergence of Cauchy sequences is crucial for the proofs of such fundamental results as the differentiability
of the inversion operator (Theorem 17.32), Banach’s Fixed Point Theorem (Theorem
17.64), the Implicit Function Theorem (Theorem 17.65), Riesz’ Representation Theorem (Theorem 20.26), Picard and Lindelöf’s Existence and Uniqueness Theorem (Theorem 22.6), and the Lax-Milgram Lemma (Lemma 23.4). Hence, we will always give
special attention to spaces in which Cauchy sequences converge. Note that in a normed
space we automatically assume the metric is induced by the norm.
Definition 16.14 A metric space X is called complete iff all Cauchy sequences in X
converge. A complete normed space is called a Banach space and a complete inner
product space is called a Hilbert space.
The real numbers are the simplest example of a complete metric space. Unsurprisingly, the spaces that are most commonly used in analysis are complete.
Theorem 16.15 $(\mathbb{R}^d, \|\cdot\|_\infty)$ is complete, that is, it is a Banach space.
Proof. We need to prove that every Cauchy sequence in Rd converges. A typical
completeness proof starts with a Cauchy sequence, constructs an element that should
be the limit and then proves that the element is indeed the limit.
Let $\{x_n\}_{n=1}^\infty$ be a Cauchy sequence in $\mathbb{R}^d$ and let $\varepsilon > 0$. There is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have $\|x_m - x_n\|_\infty < \varepsilon$. But then for all $j = 1, \dots, d$ and all $m, n \ge N$ we obtain $|x_m^{(j)} - x_n^{(j)}| \le \|x_m - x_n\|_\infty < \varepsilon$, which means that each component sequence $\{x_n^{(j)}\}_{n=1}^\infty$ is a Cauchy sequence. Because $\mathbb{R}$ is complete, each component sequence $\{x_n^{(j)}\}_{n=1}^\infty$ has a limit $L^{(j)}$. Let $L = (L^{(1)}, \dots, L^{(d)})$. By Theorem 16.4 $\{x_n\}_{n=1}^\infty$ converges to $L$. Because $\{x_n\}_{n=1}^\infty$ was arbitrary, every Cauchy sequence in $\mathbb{R}^d$ converges and the space is complete. ∎
Completeness of $(C^0[a, b], \|\cdot\|_\infty)$ will be proved in Exercise 16-13.
To prove that the LP spaces are Banach spaces, it is helpful to first analyze convergence of series in normed spaces. Series, convergence of series and absolute convergence are defined just like for real numbers. Lemma 16.18 characterizes Banach
spaces in terms of convergent series.
Definition 16.16 (Compare with Definition 6.1.) Let $X$ be a normed space and let $\{a_j\}_{j=1}^\infty$ be a sequence in $X$. The partial sums of the series $\sum_{j=1}^\infty a_j$ are defined to be $s_n := \sum_{j=1}^n a_j$. The series is said to converge iff the sequence of partial sums converges and it is said to diverge otherwise. For a convergent series, the limit is denoted $\sum_{j=1}^\infty a_j$.
Definition 16.17 (Compare with Definition 6.12.) Let $X$ be a normed space. Then a series $\sum_{j=1}^\infty a_j$ in $X$ converges absolutely iff the series $\sum_{j=1}^\infty \|a_j\|$ converges in $\mathbb{R}$.
In normed spaces, absolutely convergent series do not automatically converge. In
fact, they converge if and only if the space is a Banach space.
Lemma 16.18 A normed space X is a Banach space iff every absolutely convergent
series in X also converges in X .
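Proof. For “⇒,” mimic the proof of Proposition 6.13. (Exercise 16-14.)
For “⇐,” let $X$ be a normed space in which every absolutely convergent series converges and let $\{a_n\}_{n=1}^\infty$ be a Cauchy sequence. Set $a_0 := 0$, $N_0 := 0$ and inductively for each $\varepsilon_k := \frac{1}{2^k}$ find an $N_k > N_{k-1}$ so that for all $n, m \ge N_k$ the inequality $\|a_n - a_m\| \le \frac{1}{2^k}$ holds. For all $k \in \mathbb{N}$, let $d_k := a_{N_k} - a_{N_{k-1}}$. By construction of the $N_k$ we have $\|d_j\| \le \frac{1}{2^{j-1}}$ for all $j \ge 2$, so $\sum_{j=1}^\infty \|d_j\| \le \|d_1\| + \sum_{j=2}^\infty \frac{1}{2^{j-1}} < \infty$. Thus the series $\sum_{j=1}^\infty d_j$ converges absolutely, and hence, by hypothesis, it converges to a limit $L \in X$. But for all $k \in \mathbb{N}$ we have $\sum_{j=1}^k d_j = a_{N_k}$, so $\{a_{N_k}\}_{k=1}^\infty$ is a convergent subsequence of $\{a_n\}_{n=1}^\infty$.
To prove that $L := \lim_{k\to\infty} a_{N_k}$ is the limit of $\{a_n\}_{n=1}^\infty$, let $\varepsilon > 0$. Then there is a $K \in \mathbb{N}$ so that for all $k \ge K$ we have $\|a_{N_k} - L\| < \frac{\varepsilon}{2}$ and there is an $M \in \mathbb{N}$ so that for all $n, m \ge M$ we have $\|a_n - a_m\| < \frac{\varepsilon}{2}$. But then we can find a $k \ge K$ so that $N_k \ge M$. Then for all $n \ge M$ we obtain the inequality
$$\|a_n - L\| \le \|a_n - a_{N_k}\| + \|a_{N_k} - L\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Hence, $\{a_n\}_{n=1}^\infty$ converges to $L$ and $X$ is a Banach space. ∎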
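The subsequence construction in the proof of Lemma 16.18 can be traced on a concrete example. The sketch below is only an illustration under stated assumptions: the sequence $a_n = 1 - \frac{1}{n}$ in $\mathbb{R}$ and the indices $N_k = 2^k$ are hypothetical choices, not taken from the text. It checks with exact rational arithmetic that the increments $d_k$ telescope to $a_{N_k}$ and that they satisfy the bound that makes $\sum d_j$ absolutely convergent.

```python
from fractions import Fraction

def a(n):
    """Sample Cauchy sequence a_n = 1 - 1/n, with the convention a_0 := 0."""
    return Fraction(0) if n == 0 else 1 - Fraction(1, n)

def N(k):
    """Indices N_0 := 0 and N_k := 2**k; for n, m >= N_k we get |a_n - a_m| <= 1/2**k."""
    return 0 if k == 0 else 2 ** k

def d(k):
    """Increment d_k := a_{N_k} - a_{N_{k-1}} for k >= 1."""
    return a(N(k)) - a(N(k - 1))

K = 12
partial = Fraction(0)
telescopes = True
for k in range(1, K + 1):
    partial += d(k)
    # The k-th partial sum of the d_j must equal a_{N_k}.
    telescopes = telescopes and (partial == a(N(k)))

# For j >= 2 the increments are bounded by 1/2**(j-1), a summable bound.
tail_bounds_hold = all(abs(d(j)) <= Fraction(1, 2 ** (j - 1)) for j in range(2, K + 1))
```

Exact fractions are used so that the telescoping identity can be tested with equality rather than with a floating-point tolerance.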
Now we are ready to tackle LP spaces.
Theorem 16.19 Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p \le \infty$. Then $L^p(M, \Sigma, \mu)$ is complete, that is, it is a Banach space.
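Proof. By Lemma 16.18 we need to prove that every absolutely convergent series converges in $L^p$. So let $\sum_{j=1}^\infty [f_j]$ be a series in $L^p(M, \Sigma, \mu)$ with $\sum_{j=1}^\infty \|[f_j]\|_p < \infty$. For each $j \in \mathbb{N}$, the function $f_j$ will be a fixed representative of $[f_j]$.
First, consider the case $p = \infty$. For all natural numbers $j \in \mathbb{N}$, we define the set $N_j := \{x \in M : |f_j(x)| > \|f_j\|_\infty\}$. Then $N := \bigcup_{j=1}^\infty N_j$ is a $\mu$-null set and $\sum_{j=1}^\infty f_j$ converges on $M \setminus N$. Define
$$f(x) := \begin{cases} \sum_{j=1}^\infty f_j(x); & \text{for } x \in M \setminus N, \\ 0; & \text{for } x \in N. \end{cases}$$
Because $\bigcup_{j=1}^\infty N_j$ is $\mu$-null we infer
$$\lim_{n\to\infty} \Big\| f - \sum_{j=1}^n f_j \Big\|_\infty \le \lim_{n\to\infty} \sum_{j=n+1}^\infty \|[f_j]\|_\infty = 0,$$
and hence by Lemma 16.18 $L^\infty(M, \Sigma, \mu)$ is complete.
This leaves the case $p \in [1, \infty)$. For all $x \in M$ set $g(x) := \Big( \sum_{j=1}^\infty |f_j(x)| \Big)^p$. By Minkowski’s Inequality, for each $n \in \mathbb{N}$ we obtain
$$\Big( \int_M \Big( \sum_{j=1}^n |f_j| \Big)^p d\mu \Big)^{1/p} \le \sum_{j=1}^n \|f_j\|_p \le \sum_{j=1}^\infty \|f_j\|_p < \infty.$$
Thus by the Monotone Convergence Theorem
$$\int_M g \, d\mu = \lim_{n\to\infty} \int_M \Big( \sum_{j=1}^n |f_j| \Big)^p d\mu \le \Big( \sum_{j=1}^\infty \|f_j\|_p \Big)^p < \infty$$
and hence $g$ is integrable. In particular, $g$ is finite $\mu$-a.e., which means that for almost every $x \in M$ the series $\sum_{j=1}^\infty f_j(x)$ converges absolutely. Define the function
$$f(x) := \begin{cases} \sum_{j=1}^\infty f_j(x); & \text{if } g(x) < \infty, \\ 0; & \text{otherwise.} \end{cases}$$
Then the function $f$ is measurable and it satisfies $|f|^p \le g$, so $f \in \mathcal{L}^p(M, \Sigma, \mu)$. Because $\lim_{n\to\infty} \Big| f(x) - \sum_{j=1}^n f_j(x) \Big|^p = 0$ $\mu$-a.e. and $\Big| f(x) - \sum_{j=1}^n f_j(x) \Big|^p \le \Big( \sum_{j=1}^\infty |f_j(x)| \Big)^p = g(x)$ $\mu$-a.e., by the Dominated Convergence Theorem we conclude $\lim_{n\to\infty} \Big\| f - \sum_{j=1}^n f_j \Big\|_p = 0$. Therefore the series $\sum_{j=1}^\infty [f_j]$ converges to $[f]$. By Lemma 16.18, this means that $L^p(M, \Sigma, \mu)$ is complete. ∎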
The most natural example of a metric space that is not complete is the set of rational numbers $\mathbb{Q}$. To see why $\mathbb{Q}$ is not complete, recall that by Proposition 1.49 $\sqrt{2}$ is not a rational number. Use Theorem 1.36 to construct a sequence $\{q_n\}_{n=1}^\infty$ of rational numbers that (in $\mathbb{R}$) converges to $\sqrt{2}$ (for a concrete example, consider Exercise 2-17). Then, because $\{q_n\}_{n=1}^\infty$ converges in $\mathbb{R}$, $\{q_n\}_{n=1}^\infty$ is a Cauchy sequence. But $\{q_n\}_{n=1}^\infty$ does not have a limit in $\mathbb{Q}$, because if $L \in \mathbb{Q} \subseteq \mathbb{R}$ was a limit of $\{q_n\}_{n=1}^\infty$, then by the uniqueness of limits in $\mathbb{R}$ we would obtain $L = \sqrt{2} \notin \mathbb{Q}$, a contradiction.
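A rational sequence of this kind can be generated explicitly. The following sketch uses the Babylonian iteration $q_{n+1} = \frac{1}{2}\big(q_n + \frac{2}{q_n}\big)$ as an assumed example; it is a standard choice and not necessarily the sequence of Exercise 2-17. Every iterate is an exact fraction, the gaps between consecutive iterates shrink, and no iterate squares to 2.

```python
from fractions import Fraction

def babylonian(q0, steps):
    """Return the rational iterates q_{n+1} = (q_n + 2/q_n) / 2 starting at q0."""
    qs = [Fraction(q0)]
    for _ in range(steps):
        q = qs[-1]
        qs.append((q + 2 / q) / 2)
    return qs

qs = babylonian(1, 6)

# Distances between consecutive terms: they shrink rapidly, as expected
# for a sequence that converges in R (and hence is Cauchy).
gaps = [abs(qs[n + 1] - qs[n]) for n in range(len(qs) - 1)]

# Distance of q_n^2 from 2: positive for every n, since sqrt(2) is irrational.
errors = [abs(q * q - 2) for q in qs]
```

The list `errors` decreases toward 0 but never reaches it, which is the computational shadow of a limit that "just is not there" in $\mathbb{Q}$.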
The above argument shows the typical way in which metric spaces fail to be complete. Incomplete spaces contain sequences that appear to converge to an element that
"just is not there." Theorem 16.89 will show that this is indeed the only way in which
a space can fail to be complete.
Further examples of spaces that are not complete are given in Exercise 16-15. In
particular, Exercise 16-15d shows that if we use the method that produced $L^1$ with the
Riemann integral rather than the Lebesgue integral, then we do not obtain a complete
space. This is the reason why for theoretical considerations the Lebesgue integral is
preferred over the Riemann integral.
Exercises
16-12. Prove Theorem 16.13.
16-13. Prove that $C^0[a, b]$ with the $\|\cdot\|_\infty$ norm is a Banach space.
Hint. First, prove that if $\{f_n\}_{n=1}^\infty$ is a Cauchy sequence, then every sequence $\{f_n(x)\}_{n=1}^\infty$ is a Cauchy sequence. Then prove that the pointwise limit $f$ actually is a uniform limit. Conclude that $f$ is continuous (Corollary 11.8). Then prove that $\|f_n - f\|_\infty$ goes to zero.
16-14. Prove the “⇒” part of Lemma 16.18.
16-15. Metric spaces that are not complete.
(a) Prove that the interval $(0, 1)$ is not a complete metric space.
(b) Prove that $C^1(-1, 1)$ with the uniform norm $\|\cdot\|_\infty$ is not a Banach space.
Hint. Example 11.10.
(c) Prove that $C^0[0, 1]$ with the norm $\|f\|_1 := \int_0^1 |f(x)| \, dx$ is not a Banach space.
(d) Let $(R^1[0, 1], \|\cdot\|_1)$ be the space of Riemann integrable functions on $[0, 1]$ with functions $f, g$ with $\int_0^1 |f(x) - g(x)| \, dx = 0$ identified into one object like in $L^1[0, 1]$ and with the norm $\|f\|_1 := \int_0^1 |f(x)| \, dx$. Prove that this space is not a Banach space.
Hint. Let $C^Q$ be a Cantor set with positive measure (see Exercise 8-3e) and let $\{C_n^Q\}_{n=1}^\infty$ be the sets used to construct $C^Q$ as in Definition 7.22. Prove that $\{[\mathbf{1}_{C_n^Q}]\}_{n=1}^\infty$ is a Cauchy sequence in $L^1[0, 1]$ that converges to $[\mathbf{1}_{C^Q}]$ (which is in $L^1[0, 1]$ by Exercise 9-29) in the $L^1$ norm. Use the fact that $R^1[0, 1]$ is a subspace of $L^1[0, 1]$ to conclude that $\{[\mathbf{1}_{C_n^Q}]\}_{n=1}^\infty$ is a Cauchy sequence in $R^1[0, 1]$. Then use that $\mathbf{1}_{C^Q}$ is not Riemann integrable (Exercise 8-15) to prove that $\{[\mathbf{1}_{C_n^Q}]\}_{n=1}^\infty$ cannot have a limit in $R^1[0, 1]$. In the last step some care is needed because formally the objects we are working with are equivalence classes.
(e) Let $1 < p < \infty$. Let $(R^p[0, 1], \|\cdot\|_p)$ be the Riemann integrable functions on $[0, 1]$ with $f, g$ so that $\int_0^1 |f(x) - g(x)|^p \, dx = 0$ identified into one object like in $L^p[0, 1]$ and with norm $\|f\|_p := \Big( \int_0^1 |f(x)|^p \, dx \Big)^{1/p}$. Prove that this space is not a Banach space.
(f) Prove that $l^1$ with the norm $\|\{x_n\}_{n=1}^\infty\|_\infty := \sup\{ |x_n| : n \in \mathbb{N} \}$ is not a Banach space.
16-16. Let $X$ be a metric space and let $\{x_n\}_{n=1}^\infty$ be a Cauchy sequence. Prove that if $\{x_n\}_{n=1}^\infty$ has a convergent subsequence, then it converges.
16-17. Let $a < b$. Prove that on the space $B := \{ f \in C^1(a, b) : \|f\|_\infty < \infty, \|f'\|_\infty < \infty \}$ the function $\|f\|_1 := \|f\|_\infty + \|f'\|_\infty$ defines a norm and that $(B, \|\cdot\|_1)$ is a Banach space.
16-18. Give two proofs that for $1 \le p \le \infty$ the space $l^p$ is a Banach space.
(a) Prove it by applying Theorem 16.19.
(b) Give a direct proof.
Hint. Mimic the proof of Theorem 16.19.
16-19. Unconditional convergence of series in Banach spaces.
(a) Define what it means when a series converges unconditionally in a normed space.
(b) Prove that in Banach spaces absolute convergence implies unconditional convergence.
Hint. Mimic the “⇒” part of the proof of Theorem 6.18.
(c) In Banach spaces, unconditional convergence does not imply absolute convergence. Prove that the series $\sum_{k=1}^\infty \frac{1}{k} e_k$ converges unconditionally, but not absolutely, in $l^\infty$.
16.3 Continuous Functions
Once convergence has been defined, it is possible to talk about continuity. After all,
continuous functions are exactly those functions that preserve convergent sequences.
We will define continuity directly, that is, not via limits of functions. The technicalities
that would make a definition via limits of functions tedious are discussed at the end of
this section.
Definition 16.20 $\varepsilon$-$\delta$ formulation of continuity. Let $X, Y$ be metric spaces and let $f : X \to Y$. Then $f$ is called continuous at $x \in X$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in X$ with $d_X(z, x) < \delta$ we have $d_Y(f(z), f(x)) < \varepsilon$. A function that is continuous at every $x \in X$ is simply called continuous on $X$.
Notation 16.21 When the distance between two objects is computed, the space in
which the objects reside determines which metric needs to be used. Therefore, unlike in Definition 16.20, we typically do not index metrics and norms with the space
on which they act. Metrics and norms will only be indexed when we are working with
several metrics or norms on the same space (for examples, see Section 16.6) or when a
particular norm is needed for a proof (Section 18.4 is a good example).
The first examples are (of course) all continuous functions $f : I \to \mathbb{R}$, where $I$ is an interval. Next note that the natural projections on $\mathbb{R}^d$ are continuous.
Example 16.22 For $k \in \{1, \dots, d\}$, define the natural projection $\pi_k : \mathbb{R}^d \to \mathbb{R}$ on the $k$th coordinate by $\pi_k(x_1, \dots, x_d) := x_k$. Then $\pi_k$ is continuous on $\mathbb{R}^d$ when $\mathbb{R}^d$ is equipped with the $\|\cdot\|_\infty$ norm.
Let $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ and let $\varepsilon > 0$. Then, with $\delta := \varepsilon$, for all elements $z \in \mathbb{R}^d$ with $\|z - x\|_\infty < \delta$ we obtain $|\pi_k(z) - \pi_k(x)| = |z_k - x_k| \le \|z - x\|_\infty < \varepsilon$, and hence $\pi_k$ is continuous at $x$. □
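The key inequality of Example 16.22, $|\pi_k(z) - \pi_k(x)| \le \|z - x\|_\infty$, is easy to check on sample data. The sketch below is an illustration only; the helper names `sup_norm` and `projection`, the dimension and the sampling range are hypothetical choices.

```python
import random

def sup_norm(v):
    """Uniform norm of a vector given as a list of floats."""
    return max(abs(c) for c in v)

def projection(k, v):
    """pi_k(v): the k-th coordinate, with k counted from 1 as in the text."""
    return v[k - 1]

random.seed(0)
d = 4
ok = True
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(d)]
    z = [random.uniform(-5, 5) for _ in range(d)]
    dist = sup_norm([zi - xi for zi, xi in zip(z, x)])
    for k in range(1, d + 1):
        # Each coordinate difference is bounded by the sup-norm distance,
        # so delta := epsilon works in the definition of continuity.
        ok = ok and abs(projection(k, z) - projection(k, x)) <= dist
```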
To obtain more examples, we need to establish some properties of continuous functions. To make some of these results easier to prove, we first connect continuity to
sequences.
Theorem 16.23 Sequence formulation of continuity. Let $X, Y$ be metric spaces and let $x \in X$. Then $f : X \to Y$ is continuous at $x \in X$ iff for all sequences $\{z_n\}_{n=1}^\infty$ in $X$ with $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = f(x)$.
Proof. Adapt the proof of Theorem 3.25 (Exercise 16-20). ∎
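The sequence formulation is often the quickest way to see that a function is not continuous at a point: one sequence $z_n \to x$ with $f(z_n) \not\to f(x)$ suffices. The sketch below illustrates this with a hypothetical step function on $\mathbb{R}$ and the sequence $z_n = \frac{1}{n}$; both are assumptions chosen for illustration.

```python
def step(x):
    """Hypothetical test function: 0 for x <= 0 and 1 for x > 0."""
    return 1.0 if x > 0 else 0.0

# z_n = 1/n converges to 0 ...
zs = [1.0 / n for n in range(1, 10001)]
# ... but f(z_n) = 1 for every n, while f(0) = 0.
images = [step(z) for z in zs]

distance_to_limit = abs(zs[-1] - 0.0)             # small: z_n is close to 0
distance_of_images = abs(images[-1] - step(0.0))  # stays 1: f(z_n) is not close to f(0)
```

A finite computation cannot prove a limit statement, but it shows which quantity stays bounded away from zero.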
Continuity is compatible with algebraic operations, if they are available, and it is
preserved by compositions.
Theorem 16.24 Let $X$ be a metric space, let $Y$ be a normed space and let the functions $f, g : X \to Y$ be continuous. Then $f + g$ and $f - g$ are continuous.
Proof. Adjust the appropriate parts of the proof of Theorem 3.27. This can be done via Theorem 16.23 and Exercises 16-9a and 16-9b or with a direct $\varepsilon$-$\delta$ argument that mimics the appropriate part of the proof of Theorem 2.14. (Exercise 16-21.) ∎
Theorem 16.25 Let $X$ be a metric space, let $Y$ be a normed space, and let the functions $f : X \to Y$ and $\alpha : X \to \mathbb{R}$ be continuous. Then $\alpha f : X \to Y$ is continuous.
Proof. Adjust the appropriate part of the proof of Theorem 3.27. This can be done via Theorem 16.23 and Exercise 16-9c or with a direct $\varepsilon$-$\delta$ argument that mimics the appropriate part of the proof of Theorem 2.14. (Exercise 16-22.) ∎
Theorem 16.26 Let $X$ be a metric space, let $Y$ be an inner product space, and let $f, g : X \to Y$ be continuous. Then $\langle f, g \rangle : X \to \mathbb{R}$ is continuous.
Proof. Adjust the appropriate part of the proof of Theorem 3.27. This can be done via Theorem 16.23 and Exercise 16-10 or with a direct $\varepsilon$-$\delta$ argument that mimics the appropriate part of the proof of Theorem 2.14. (Exercise 16-23.) ∎
Theorem 16.27 Let $X, Y, Z$ be metric spaces and let $g : X \to Y$ and $f : Y \to Z$ be continuous functions. Then $f \circ g : X \to Z$ is continuous.
Proof. Mimic the proof of Theorem 3.30. (See Exercise 16-24.) ∎
The above shows that functions on $\mathbb{R}^d$, like $f(x, y) := xy + \cos(x^2 - y^2)$ (see Exercise 16-25), which combine the components using known continuous operations and functions, are continuous. Our definition of continuity also gives access to more complicated situations. For example, it is well known that integration is a “continuous operation.” The following example makes this observation more precise.
Example 16.28 Let $(M, \Sigma, \mu)$ be a measure space, let $1 \le p \le \infty$, let $q$ be such that $\frac{1}{p} + \frac{1}{q} = 1$ and let $[g] \in L^q(M, \Sigma, \mu)$. Then $I_{[g]}([f]) := \int_M f g \, d\mu$ defines a continuous function $I_{[g]} : L^p(M, \Sigma, \mu) \to \mathbb{R}$.
Because the elements of $L^p(M, \Sigma, \mu)$ and $L^q(M, \Sigma, \mu)$ are equivalence classes and $I_{[g]}$ is defined in terms of representatives of the equivalence classes, we must first prove that the functions $I_{[g]}$ are well defined. For functions $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g \in \mathcal{L}^q(M, \Sigma, \mu)$ set $I_g(f) := \int_M f g \, d\mu$. By Hölder’s inequality (Theorem 15.48 and Exercise 15-43), the integral in $I_g(f)$ exists for all functions $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g \in \mathcal{L}^q(M, \Sigma, \mu)$ and the inequality $|I_g(f)| \le \|f\|_p \|g\|_q$ holds. In particular this means if $f_1, f_2 \in [f]$ and $g_1, g_2 \in [g]$, then
$$|I_{g_1}(f_1) - I_{g_2}(f_2)| \le |I_{g_1}(f_1) - I_{g_1}(f_2)| + |I_{g_1}(f_2) - I_{g_2}(f_2)| = |I_{g_1}(f_1 - f_2)| + |I_{g_1 - g_2}(f_2)| \le \|f_1 - f_2\|_p \|g_1\|_q + \|f_2\|_p \|g_1 - g_2\|_q = 0.$$
That is, the value of $I_g(f)$ does not change if different representatives of the equivalence class of $[f] \in L^p(M, \Sigma, \mu)$ and $[g] \in L^q(M, \Sigma, \mu)$ are chosen. Now if the functions $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $g_1, g_2 \in \mathcal{L}^q(M, \Sigma, \mu)$ satisfy $[g_1] = [g_2]$, then $|I_{g_1}(f) - I_{g_2}(f)| = 0$. Therefore, for $f \in \mathcal{L}^p(M, \Sigma, \mu)$ and $[g] \in L^q(M, \Sigma, \mu)$ setting $J_{[g]}(f) := I_g(f)$ defines a unique function on $\mathcal{L}^p(M, \Sigma, \mu)$. Moreover, if the functions $f_1, f_2 \in \mathcal{L}^p(M, \Sigma, \mu)$ satisfy $[f_1] = [f_2]$ and $[g] \in L^q(M, \Sigma, \mu)$, then $|J_{[g]}(f_1) - J_{[g]}(f_2)| = |I_g(f_1) - I_g(f_2)| = 0$. Therefore $I_{[g]}([f]) := J_{[g]}(f)$ defines a well-defined function on $L^p(M, \Sigma, \mu)$.
Now let $[g] \in L^q(M, \Sigma, \mu)$. To see that $I_{[g]}$ is continuous, first note that by Hölder’s inequality we infer $|I_{[g]}([f])| \le \|[f]\|_p \|[g]\|_q$ for all $[f] \in L^p(M, \Sigma, \mu)$ and $[g] \in L^q(M, \Sigma, \mu)$. Let $[f_1] \in L^p(M, \Sigma, \mu)$ and let $\varepsilon > 0$. Then for all elements $[f_2] \in L^p(M, \Sigma, \mu)$ with $\|[f_1] - [f_2]\|_p < \delta := \frac{\varepsilon}{\|[g]\|_q + 1}$ we obtain
$$|I_{[g]}([f_1]) - I_{[g]}([f_2])| = |I_{[g]}([f_1 - f_2])| \le \|[f_1 - f_2]\|_p \|[g]\|_q < \varepsilon.$$
Hence, $I_{[g]}$ is continuous. □
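The estimate in Example 16.28 can be tried out in the simplest measure space, a finite set with counting measure, where the integral becomes a finite sum and $L^p$ becomes $\mathbb{R}^n$ with the $p$-norm. The sketch below makes that simplifying assumption; the vectors, the exponents $p = 3$, $q = \frac{3}{2}$ and the helper names are hypothetical. It checks that the choice $\delta := \frac{\varepsilon}{\|g\|_q + 1}$ forces $|I_g(f_1) - I_g(f_2)| < \varepsilon$.

```python
def p_norm(v, p):
    """l^p norm of a finite list for 1 <= p < infinity."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def functional(g, f):
    """I_g(f) = sum_i f_i * g_i, the integral of f*g for counting measure."""
    return sum(fi * gi for fi, gi in zip(f, g))

p, q = 3.0, 1.5                      # conjugate exponents: 1/3 + 1/1.5 = 1
g = [1.0, -2.0, 0.5, 4.0]
f1 = [0.3, 1.2, -0.7, 2.0]
eps = 1e-2
delta = eps / (p_norm(g, q) + 1.0)   # the choice of delta from the example

# Perturb f1 by a vector whose l^p norm is 0.9 * delta, which is below delta.
direction = [1.0, 1.0, -1.0, 1.0]
scale = 0.9 * delta / p_norm(direction, p)
f2 = [a + scale * b for a, b in zip(f1, direction)]

diff = [a - b for a, b in zip(f1, f2)]
lhs = abs(functional(g, f1) - functional(g, f2))   # |I_g(f1) - I_g(f2)|
hoelder_bound = p_norm(diff, p) * p_norm(g, q)     # ||f1 - f2||_p * ||g||_q
```

The "+ 1" in the denominator of `delta` is what keeps the definition meaningful when the norm of `g` is zero.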
Note that each $I_{[g]}$ is actually a pretty complicated function. It maps equivalence
classes of functions to numbers. For such complicated constructions, it is important to
establish some type of mental hierarchy so that the functions that are elements of the
space are distinguished from the functions that act on the space itself. The best way
to achieve this is to practice with the complicated settings. (Consider, for example,
Exercise 16-26, where functions map sequences to numbers.) When working with
such functions, the following notations can also help establish a mental hierarchy.
Definition 16.29 A continuous function into the real numbers is also referred to as a
functional. A continuous function between metric spaces is also called an operator.
Note that in Example 16.28 we have proved a bit more than just continuity. We have established the inequality $|I_{[g]}([f_1]) - I_{[g]}([f_2])| \le \|[g]\|_q \|[f_1 - f_2]\|_p$ for all $[f_1], [f_2] \in L^p(M, \Sigma, \mu)$, where $\|[g]\|_q$ actually is a constant for the function $I_{[g]}$. Functions that satisfy such a condition are called Lipschitz continuous.
Definition 16.30 Let $X, Y$ be metric spaces. A function $f : X \to Y$ is called Lipschitz continuous iff there is an $L > 0$, called the Lipschitz constant of $f$, so that for all $x, y \in X$ we have $d(f(x), f(y)) \le L \, d(x, y)$.
It is easy to see that Lipschitz continuous functions are continuous.
Proposition 16.31 Let $X, Y$ be metric spaces and let $f : X \to Y$ be a Lipschitz continuous function. Then $f$ is continuous.
Proof. Exercise 16-30a. ∎
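The reason behind Proposition 16.31 is that a Lipschitz constant $L$ turns every $\varepsilon$ into a working $\delta := \frac{\varepsilon}{L}$, and the same $\delta$ works at every point. The sketch below illustrates this for the sine function on $\mathbb{R}$, which satisfies $|\sin x - \sin y| \le |x - y|$; the function, the constant and the sample points are assumptions chosen for illustration, and the code is not a proof of the exercise.

```python
import math

def delta_for(eps, lipschitz_constant):
    """Delta that works at every point when d(f(x), f(y)) <= L * d(x, y)."""
    return eps / lipschitz_constant

f, L = math.sin, 1.0          # |sin x - sin y| <= |x - y|, so L = 1 is a Lipschitz constant
eps = 1e-3
delta = delta_for(eps, L)

ok = True
for i in range(-50, 51):
    x = 0.1 * i
    for z in (x - 0.9 * delta, x + 0.9 * delta):   # points with |z - x| < delta
        ok = ok and abs(f(z) - f(x)) < eps
```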
16.3.1 The Trouble with Limits of Functions
The familiar Definition 3.23 from calculus was not used to define continuity in metric
spaces because a technical problem allows us to only define limits at accumulation
points of the space.
Definition 16.32 Let $X$ be a metric space. Then $x \in X$ is called an accumulation point of $X$ iff for all $\delta > 0$ there is a $z \in X \setminus \{x\}$ such that $d(z, x) < \delta$.
Definition 16.33 $\varepsilon$-$\delta$ formulation of the limit of a function. Let $X, Y$ be metric spaces, let $f : X \setminus \{x\} \to Y$ and let $x$ be an accumulation point of $X$. Then $L \in Y$ is called the limit of $f$ at $x \in X$ iff for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in X \setminus \{x\}$ with $d(z, x) < \delta$ we have $d(f(z), L) < \varepsilon$.
We require that the point $x$ is an accumulation point, because otherwise the limit of a function would not be unique. Indeed, if $x$ is not an accumulation point (that is, if $x$ is an isolated point), then for some $\delta > 0$ there is no $z \in X \setminus \{x\}$ with $d(z, x) < \delta$. Because there are no such points, it is vacuously true that for all such points $z$ and any $\varepsilon > 0$ and $L \in Y$ we have $d(f(z), L) < \varepsilon$, no matter what $L$ is. So at an isolated point $x$ any element of $Y$ would be a limit of $f$, which is a rather pathological situation. To avoid this pathology, we demand that limits are only taken at accumulation points.
Note that for functions $f : I \setminus \{x\} \to \mathbb{R}$, where $I$ is an open interval and $x \in I$, the
definition of the limit via Definition 16.33 is consistent with Definition 3.1, because
x clearly is an accumulation point of I \ { x } and because we can apply Theorem 3.7.
Exercises 16-28 and 16-29 show that Definition 16.33 removes the formal problems
we had identified for Definition 3.1.
Regarding continuity, it is reassuring to note that there is no need to worry about
isolated points. The definition of continuity states that as the inputs get close to x, the
outputs must get close to f ( x ) . If the inputs cannot get close to x , then the outputs
vacuously get close to f ( x ) . This is still a silly notion, but there is no logical problem
and usually no one worries about continuity at an isolated point. The definition implies
that if a function is defined at an isolated point, then it is continuous there. So be it.
Aside from the problem with isolated points, limits work very much like for functions on intervals ( a , b).
Theorem 16.34 Sequence formulation of the limit of a function. Let $X, Y$ be metric spaces, let $x$ be an accumulation point of $X$ and let $L \in Y$. Then $\lim_{z\to x} f(z) = L$ iff for all sequences $\{z_n\}_{n=1}^\infty$ in $X \setminus \{x\}$ with $\lim_{n\to\infty} z_n = x$ we have $\lim_{n\to\infty} f(z_n) = L$.
Proof. Mimic the proof of Theorem 3.7. (Exercise 16-27.) ∎
Exercises
16-20. Prove Theorem 16.23.
16-21. Prove Theorem 16.24.
16-22. Prove Theorem 16.25.
16-23. Prove Theorem 16.26.
16-24. Prove Theorem 16.27.
16-25. Prove that $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) := xy + \cos(x^2 - y^2)$ is continuous.
Hint. Break up the way the function is computed into elementary steps that are continuous.
16-26. Let $1 \le p \le \infty$. For $k \in \mathbb{N}$, define the projection on the $k$th component $\pi_k : l^p \to \mathbb{R}$ by $\pi_k\big( \{a^{(j)}\}_{j=1}^\infty \big) := a^{(k)}$. Prove that for all $p$ and $k$ the function $\pi_k : l^p \to \mathbb{R}$ is continuous.
16-27. Prove Theorem 16.34.
16-28. Exercise 3-8 revisited.
(a) Prove that $\lim_{z\to 0} \sqrt{z} = 0$. In particular, this means that with Definition 16.33 there are no formal problems with the limit of the square root function at 0.
(b) Explain why, despite the result in Exercise 16-28a, Definition 16.33 does not supersede the definition of one-sided limits.
Hint. Step functions.
16-29. Prove that the function from Exercise 3-32 does not have a limit at x = 0.
16-30. Let X , Y be metric spaces and let f : X + Y be a function.
(a) Prove that if $f$ is Lipschitz continuous, then $f$ is continuous.
(b) Define uniform continuity for functions $g : X \to Y$.
(c) Prove that if $f$ is Lipschitz continuous, then $f$ is uniformly continuous.
16-31. Let $X$ be a metric space and let $z \in X$. Prove that $d(\cdot, z) : X \to \mathbb{R}$ is continuous.
16-32. Let $X$ be a metric space and let $f, g : X \to \mathbb{R}$ be continuous functions. Prove that if $g(x) \ne 0$ for all $x \in X$, then $\frac{f}{g}$ is continuous on $X$.
Hint. Use the proof of Theorem 2.14 as guidance.
16-33. Paths and continuity.
(a) Let $C$ be a subset of a normed space $X$, let $x \in C$ be such that there is an $\varepsilon > 0$ so that $\{ z \in X : \|z - x\| < \varepsilon \} \subseteq C$ and let $Y$ be a metric space. Prove that $f : C \to Y$ is continuous at $x$ iff for all continuous functions $p : [0, 1] \to X$ so that $p(0) = x$, the composition $f \circ p : [0, 1] \to Y$ is continuous at 0.
(b) Prove that $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) :=$
{ 09.
for $(x, y) \ne (0, 0)$,
for x = y = 0,
is not continuous at the origin.
16-34. A polynomial $p : \mathbb{R}^d \to \mathbb{R}$ is a finite sum of finite products of the variables. That is, there are an $n \in \mathbb{N}$, $c_1, \dots, c_n \in \mathbb{R}$ and $(a_1^i, \dots, a_d^i) \in (\mathbb{N} \cup \{0\})^d$ so that for all $(x_1, \dots, x_d) \in \mathbb{R}^d$ we have $p(x_1, \dots, x_d) = \sum_{i=1}^n c_i x_1^{a_1^i} \cdots x_d^{a_d^i}$. Prove that every polynomial $p : \mathbb{R}^d \to \mathbb{R}$ is continuous.
Hint. Induction. First, for $n = 1$ on the sum of the $a$s to take care of the products, then on $n$.
16-35. Let $X$ and $Y$ be metric spaces. A sequence of functions $\{f_n\}_{n=1}^\infty$ from $X$ to $Y$ is called uniformly convergent to the function $f : X \to Y$ iff for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $n \ge N$ and all $x \in X$ we have $d(f_n(x), f(x)) < \varepsilon$.
Let $\{f_n\}_{n=1}^\infty$ be a sequence of functions $f_n : X \to Y$ that converges uniformly to $f : X \to Y$ and let all $f_n$ be continuous at $c \in X$. Prove that $f$ is continuous at $c$.
Hint. Mimic the proof of Theorem 11.7.
16-36. For natural numbers $j \ge 2$, let
$$g_{\frac{1}{j+1}, \frac{1}{j-1}}(x) := \begin{cases} j(j+1)\left( x - \frac{1}{j+1} \right); & \text{for } \frac{1}{j+1} \le x \le \frac{1}{j}, \\ j(j-1)\left( \frac{1}{j-1} - x \right); & \text{for } \frac{1}{j} \le x \le \frac{1}{j-1}, \\ 0; & \text{otherwise.} \end{cases}$$
Prove that the function $f : \mathbb{R} \to l^\infty$ defined by $f := \sum_{j=2}^\infty g_{\frac{1}{j+1}, \frac{1}{j-1}} e_j$ is continuous on $\mathbb{R} \setminus \{0\}$, bounded, and for any sequence $x_n \to 0$ with $x_n > 0$ the limit $\lim_{n\to\infty} f(x_n)$ does not exist.
Note. This function shows that in infinite dimensional spaces functions can have discontinuities that are neither removable, nor jumps, nor infinite, nor by oscillation.
16.4 Open and Closed Sets
Results, such as the Intermediate Value Theorem (Theorem 3.34) or the fact that continuous functions assume absolute maxima and minima (Theorem 3.44), refer specifically
to the ordering of the real numbers. This ordering is not available in metric spaces. To
obtain more general versions of these theorems, we must first establish properties that
will take the place of the ordering of the real numbers. Connectedness (see Definition
16.90), which is needed to prove a version of the Intermediate Value Theorem, is defined purely in topological terms. Compactness, which allows us to prove a version of
Theorem 3.44 for metric spaces, can be defined in terms of sequences, but it is more
commonly defined in topological terms (see Theorem 16.72). Thus, before we continue, we introduce the requisite topological ideas. In a nutshell, topology revolves
around open and closed sets. Figure 34 on page 304 summarizes the main properties
of these sets.
With a metric envisioned as measuring distances, a ball must be the set of points
within a certain distance of a center point.
Definition 16.35 Let $X$ be a metric space, let $x \in X$ and let $\varepsilon > 0$. Then the open ball of radius $\varepsilon$ about $x$ is defined to be $B_\varepsilon(x) := \{ p \in X : d(p, x) < \varepsilon \}$.
An open set contains for each of its points a small open ball around that point. So,
just like in an open interval, at every point in an open set we have a certain amount of
room to go in any direction (also see Figure 34(a)).
Figure 33: Visualization of the proof of Proposition 16.37.
Definition 16.36 Let $X$ be a metric space. A subset $O \subseteq X$ is called open iff for all $x \in O$ there is an $\varepsilon_x > 0$ such that $B_{\varepsilon_x}(x) \subseteq O$.
Open intervals $(a, b)$ on the real line are open sets as in Definition 16.36, because for each point $x \in (a, b)$ with $\varepsilon := \min\{b - x, x - a\}$ we obtain the containment $B_\varepsilon(x) = (x - \varepsilon, x + \varepsilon) \subseteq (a, b)$. Thus the nomenclature is consistent. In general, open balls are the prototypical examples of open sets. Although it is convenient to envision these balls as round entities, we should bear in mind that they need not be round at all. Balls with respect to the uniform norm $\|\cdot\|_\infty$ on $\mathbb{R}^d$ are cubes (also see Figure 37 on page 317).
Proposition 16.37 Let $X$ be a metric space, let $z \in X$ and let $\varepsilon > 0$. Then the open ball $B_\varepsilon(z)$ is open.
Proof. To prove that a set is open, we prove that it contains a small ball around each of its points.
Let $x \in B_\varepsilon(z)$ be arbitrary. Set $\varepsilon_x := \varepsilon - d(x, z) > 0$. Then for all $y \in B_{\varepsilon_x}(x)$ we have that $d(y, z) \le d(y, x) + d(x, z) < \varepsilon - d(x, z) + d(x, z) = \varepsilon$. Hence, $B_{\varepsilon_x}(x) \subseteq B_\varepsilon(z)$ (see Figure 33) and since $x$ was arbitrary, $B_\varepsilon(z)$ is open. ∎
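The radius $\varepsilon_x := \varepsilon - d(x, z)$ from the proof of Proposition 16.37 can be tested on sample points. The sketch below is an illustration under assumptions: it works in $\mathbb{R}^2$ with the metric induced by the uniform norm (so the balls are squares), with center $z = (0, 0)$ and radius $\varepsilon = 1$, and it samples rather than proves.

```python
import random

def dist(p, q):
    """Metric induced by the uniform norm on R^2 (balls are squares)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

random.seed(1)
z, eps = (0.0, 0.0), 1.0
contained = True
for _ in range(500):
    # Pick a point x in the open ball B_eps(z).
    x = (random.uniform(-0.99, 0.99), random.uniform(-0.99, 0.99))
    eps_x = eps - dist(x, z)                  # radius used in the proof
    # Pick a point y in the smaller ball B_{eps_x}(x).
    y = (x[0] + random.uniform(-0.99, 0.99) * eps_x,
         x[1] + random.uniform(-0.99, 0.99) * eps_x)
    # The triangular inequality forces y to lie in B_eps(z).
    contained = contained and dist(y, z) < eps
```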
We can summarize the most important properties of open sets as follows.
Theorem 16.38 Let X be a metric space. Then the following results hold:
1. Both $\emptyset$ and $X$ are open subsets of $X$.
2. If $O_1, \dots, O_n$ are open subsets of $X$, then $\bigcap_{k=1}^n O_k$ is open.
3. If $\mathcal{O}$ is a family of open subsets of $X$ then $\bigcup \mathcal{O}$ is open.
Proof. Part 1 is trivial.
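For part 2, let $O_1, \dots, O_n$ be open subsets of $X$, and let $x \in \bigcap_{k=1}^n O_k$ be arbitrary. Then for each integer $k \in \{1, \dots, n\}$ there is an $\varepsilon_k > 0$ such that $B_{\varepsilon_k}(x) \subseteq O_k$. Let $\varepsilon := \min\{\varepsilon_k : k = 1, \dots, n\}$. Then for all integers $k \in \{1, \dots, n\}$ the containments $B_\varepsilon(x) \subseteq B_{\varepsilon_k}(x) \subseteq O_k$ hold. Hence, $B_\varepsilon(x) \subseteq \bigcap_{k=1}^n O_k$. Since $x \in \bigcap_{k=1}^n O_k$ was arbitrary, the intersection $\bigcap_{k=1}^n O_k$ is open.
For part 3, let $\mathcal{O}$ be a family of open subsets of $X$ and let $x \in \bigcup \mathcal{O}$ be arbitrary. Then there is an $O \in \mathcal{O}$ such that $x \in O$ and there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq O$. But then $B_\varepsilon(x) \subseteq O \subseteq \bigcup \mathcal{O}$. Since $x \in \bigcup \mathcal{O}$ was arbitrary, $\bigcup \mathcal{O}$ is open. ∎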
Example 16.39 Arbitrary intersections of open sets need not be open.
For any real numbers $a < b$, we have that $[a, b] = \bigcap_{n=1}^\infty \left( a - \frac{1}{n}, b + \frac{1}{n} \right)$ is not an open subset of $\mathbb{R}$. □
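The contrast between finite and infinite intersections comes down to whether the minimum of the radii is positive. The sketch below illustrates this on the real line, assuming the family of open intervals $\left( a - \frac{1}{n}, b + \frac{1}{n} \right)$ with $a = 0$, $b = 1$ (a standard family whose intersection is $[a, b]$, taken here as an assumption) and the point $x = b$. The helper names are hypothetical.

```python
from fractions import Fraction

def interval(n, a, b):
    """Open interval (a - 1/n, b + 1/n) as a pair of endpoints."""
    return (a - Fraction(1, n), b + Fraction(1, n))

def radius_inside(x, iv):
    """Largest eps with (x - eps, x + eps) contained in the open interval iv."""
    lo, hi = iv
    return min(x - lo, hi - x)

a, b = Fraction(0), Fraction(1)
x = b  # a point of [a, b] that lies in every interval of the family

# Finite intersection: the minimum of finitely many positive radii is positive.
finite_radii = [radius_inside(x, interval(n, a, b)) for n in range(1, 11)]
eps_finite = min(finite_radii)

# Whole family: the radii 1/n have infimum 0, so no single eps > 0 works.
def point_outside(eps):
    """Return (n, flag): flag is True if x + eps/2 is outside (a - 1/n, b + 1/n)."""
    n = int(2 / eps) + 1                 # guarantees 1/n < eps/2
    lo, hi = interval(n, a, b)
    return n, not (lo < x + eps / 2 < hi)
```

For every candidate radius `eps`, `point_outside(eps)` names an interval of the family that already excludes a point of the ball around `x`, which is why the full intersection contains no ball around `b`.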
Once open sets are defined we can briefly summarize the fundamental concepts of
topology. A topology is a family $\mathcal{T}$ of subsets of a set $X$ such that the three conditions
in Theorem 16.38 hold. The pair $(X, \mathcal{T})$ is called a topological space and the sets
$O \in \mathcal{T}$ are called open sets. Any idea that can be stated purely in terms of open sets is
actually a topological idea. In this text, we will focus on metric spaces and only touch
upon the topological notions and vocabulary as necessary. As an example consider how
continuous functions would be defined in terms of open sets.
Theorem 16.40 Topological formulation of continuity. Let $X$ and $Y$ be metric spaces and let $f : X \to Y$ be a function. Then $f$ is continuous on $X$ iff for all open subsets $V \subseteq Y$ the inverse image $f^{-1}[V] \subseteq X$ is open.
Proof. For “⇒,” let $f$ be continuous and let $V \subseteq Y$ be open. Let $x \in f^{-1}[V]$. Then there is an $\varepsilon > 0$ so that $B_\varepsilon(f(x)) \subseteq V$. Because $f$ is continuous, there is a $\delta > 0$ such that for all $z \in X$ with $d(z, x) < \delta$ we have $d(f(z), f(x)) < \varepsilon$. Therefore, for all $z \in B_\delta(x)$ we obtain $f(z) \in B_\varepsilon(f(x)) \subseteq V$, and hence $B_\delta(x) \subseteq f^{-1}[V]$. Because $x \in f^{-1}[V]$ was arbitrary, $f^{-1}[V]$ is open.
For “⇐,” let the function $f$ be such that for all open subsets $V \subseteq Y$ the inverse image $f^{-1}[V] \subseteq X$ is open, let $x \in X$ and let $\varepsilon > 0$. Then $f^{-1}[B_\varepsilon(f(x))]$ is open. Hence, there is a $\delta > 0$ with $B_\delta(x) \subseteq f^{-1}[B_\varepsilon(f(x))]$. But this means for any $z \in X$ with $d(z, x) < \delta$ the inequality $d(f(z), f(x)) < \varepsilon$ holds, and hence $f$ is continuous at $x$. Because $x \in X$ was arbitrary, $f$ is continuous. ∎
We will often be concerned with subsets of metric spaces that contain an open
set around a given point x. To abbreviate the wording and because open sets contain
a little “padding” around each of their points (see Figure 34(a)), we call such sets
neighborhoods.
Figure 34: Open sets contain a certain amount of “padding” around each of their points
(a), while closed sets contain all their limit points (b).
Definition 16.41 Let $X$ be a metric space and let $x \in X$. Then $N \subseteq X$ is called a neighborhood of $x$ iff $x \in N$ and there is an open set $O \subseteq X$ so that $x \in O \subseteq N$. If $S \subseteq X$ and $N$ is a neighborhood of each $s \in S$, then $N$ is also called a neighborhood of $S$.
The complements of open sets are called closed sets.
Definition 16.42 Let $X$ be a metric space. A subset $C \subseteq X$ is called closed iff its complement $X \setminus C$ is open.
Closed intervals on the real line are also closed sets as in Definition 16.42. Once
again, the nomenclature is consistent.
Example 16.43 Although it is tempting to hope that open and closed are mutually
exclusive and exhaustive properties, sets can actually be open, closed, both, or neither.
1. The interval $[0, 1]$ is a closed subset of the real line.
2. In any metric space, the sets $\emptyset$ and $X$ are both open and closed.
3. The interval $[0, 1)$ as a subset of the real line is neither open nor closed. □
Although properties of closed sets can in theory be obtained through complementation, closed sets are interesting in their own right, because closed sets “keep their
limits” (also see Figure 34(b)).
Definition 16.44 Let $X$ be a metric space and let $A \subseteq X$. Then $x \in X$ is called a limit point of $A$ iff there is a sequence $\{a_n\}_{n=1}^\infty$ with $a_n \in A$ for all $n \in \mathbb{N}$ so that $x = \lim_{n\to\infty} a_n$.
Limit points are different from accumulation points in that the sequence that converges to a limit point $x$ can assume $x$ as a value (and even be constant with value $x$), which is forbidden for accumulation points.
Theorem 16.45 Let $X$ be a metric space. Then $C \subseteq X$ is closed iff for all convergent sequences $\{z_n\}_{n=1}^\infty$ with $z_n \in C$ for all $n \in \mathbb{N}$ we have $\lim_{n\to\infty} z_n \in C$.
Proof. For "$\Rightarrow$," let $C \subseteq X$ be closed, let $\{z_n\}_{n=1}^\infty$ be a convergent sequence with $z_n \in C$ for all $n \in \mathbb{N}$ and let $x := \lim_{n\to\infty} z_n$. For a contradiction, suppose that $x \notin C$. Because $X \setminus C$ is open, there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq X \setminus C$. But then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the point $z_n$ is in $B_\varepsilon(x) \subseteq X \setminus C$ and in $C$, a contradiction.
For "$\Leftarrow$," suppose for a contradiction that $X \setminus C$ is not open. Then there is an $x \in X \setminus C$ so that for every $n \in \mathbb{N}$ there is a point $z_n \in B_{1/n}(x) \cap C$. But then $\{z_n\}_{n=1}^\infty$ is a sequence of points in $C$ that converges to a point $x \notin C$, contradiction. ∎
Remark 16.46 Definition 16.36 and Theorem 16.45 provide descriptive characterizations of open and closed sets, respectively. Note (also see Figure 34) that sequences in open sets need not take their limit in the open set (consider the sequence $\{1/(n+1)\}_{n=1}^\infty$ in the interval $(0, 1)$) and that closed sets need not contain a small ball around each of their points (consider the point $0$ in $[0, 1]$). □
Because completeness is important, we should note that closed subsets of complete
spaces will again be complete.
Corollary 16.47 Let $X$ be a complete metric space and let $C \subseteq X$ be closed. Then $C$ with the induced metric is a complete metric space.
Proof. Exercise 16-37. ∎
16.4.1 The Interior and The Closure
For each subset A of a metric space, there are a largest open set contained in A and a
smallest closed set that contains A . These sets are called the interior and the closure,
respectively.
Definition 16.48 Let $X$ be a metric space and let $A \subseteq X$. The interior $A^\circ$ of $A$ is the set of all points $x \in A$ so that a small ball around the point is also in $A$, that is, $A^\circ := \{ x \in A : (\exists \varepsilon > 0 : B_\varepsilon(x) \subseteq A) \}$. The points in $A^\circ$ are also called the interior points of $A$.
Proposition 16.49 Let $X$ be a metric space and let $A \subseteq X$. Then $A^\circ$ is an open set that contains all open subsets of $A$. Moreover, $A^\circ = \bigcup \{ O \subseteq A : O \text{ is open} \}$.
Proof. To see that $A^\circ$ is open, let $x \in A^\circ$. Then $x \in A$ and there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq A$. By Proposition 16.37, for every $z \in B_\varepsilon(x)$ there is an $\varepsilon_z > 0$ so that $B_{\varepsilon_z}(z) \subseteq B_\varepsilon(x) \subseteq A$. But then $B_\varepsilon(x) \subseteq A^\circ$, which proves that $A^\circ$ is open.
Now let $U \subseteq A$ be an open subset of $A$. Then for all $x \in U$ there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq U \subseteq A$, which means $x \in A^\circ$. Hence, $U \subseteq A^\circ$ for all open subsets $U$ of $A$.
Finally, to prove the equation, note that the set on the right is an open subset of $A$, which means (by what we just proved) that the set on the right is contained in $A^\circ$. Conversely, $A^\circ$ is an open set contained in $A$, so $A^\circ$ is contained in the union on the right, thus establishing the equation. ∎
Example 16.50
1. The interior of an open ball $B_r(x)$ obviously is the ball itself, $(B_r(x))^\circ = B_r(x)$.
2. The interior of the rational numbers as a subset of the real numbers is $\mathbb{Q}^\circ = \emptyset$, because every ball around a rational number contains an irrational number. □
Because every singleton set $\{x\}$ is contained in all open balls of radius $r > 0$ about $x$, it is easy to see that there is no smallest open set that contains a given set.
Definition 16.51 Let $X$ be a metric space and let $A \subseteq X$. The closure $A^-$ (or $\overline{A}$) of $A$ is defined to be the set of all limit points of $A$, that is,
$\overline{A} := A^- := \{ x \in X : \exists \{a_n\}_{n=1}^\infty : \lim_{n\to\infty} a_n = x \text{ and } \forall n \in \mathbb{N} : a_n \in A \}$.
Proposition 16.52 Let $X$ be a metric space and let $A \subseteq X$. Then $A^-$ is closed and it is contained in all closed supersets of $A$. Moreover, $A^- = \bigcap \{ C \supseteq A : C \text{ is closed} \}$.
Proof. To prove that $A^-$ is closed, let $x \in X$ and let $\{x_n\}_{n=1}^\infty$ be a sequence with $x_n \in A^-$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} x_n = x$. Then for each $n \in \mathbb{N}$ there is an $a_n \in A$ with $d(x_n, a_n) < \frac{1}{n}$. But then $\lim_{n\to\infty} a_n = \lim_{n\to\infty} x_n = x$ and $x \in A^-$. Hence, $A^-$ is closed.
Now let $C \supseteq A$ be a closed superset of $A$ and let $x \in A^-$. Then there is a sequence $\{a_n\}_{n=1}^\infty$ so that $a_n \in A$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} a_n = x$. But then, because $C$ is closed and contains $A$, we conclude $x = \lim_{n\to\infty} a_n \in C$, and hence $A^- \subseteq C$.
Finally, to prove the equation, note that the set on the right is a closed superset of $A$ (use Exercise 16-41c), so $A^-$ must be contained in the set on the right. Conversely, because $A^-$ is a closed superset of $A$ it must contain the intersection on the right, which establishes the equality. ∎
Example 16.53
1. The closure of an open ball is $\overline{B_r(x)} = \{ p \in X : d(p, x) \le r \}$ (Exercise 16-48a).
2. The closure of the rational numbers in the real numbers is $\overline{\mathbb{Q}} = \mathbb{R}$, because every real number is the limit of a sequence of rational numbers. □
Because every open interval is the union of all its closed subintervals, there is no
largest closed set that is contained in a given set. The set that is geometrically between
the interior of a set and the interior of the set's complement is called the boundary.
Definition 16.54 Let $X$ be a metric space and let $A \subseteq X$. The boundary $\delta A$ of $A$ is defined to be $\delta A := A^- \setminus A^\circ$.
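The union formula for the interior (Proposition 16.49), the intersection formula for the closure (Proposition 16.52), and the definition of the boundary can be tried out on a small example. The sketch below assumes a hypothetical finite topological space given by an explicit list of open sets; it is not a metric space from the text and only illustrates the set-theoretic formulas, not the sequence-based definitions.

```python
def interior(A, opens):
    """Union of all open sets contained in A (formula of Proposition 16.49)."""
    result = frozenset()
    for O in opens:
        if O <= A:
            result |= O
    return result

def closure(A, opens, X):
    """Intersection of all closed supersets of A (formula of Proposition 16.52)."""
    result = X
    for O in opens:
        C = X - O              # closed sets are complements of open sets
        if A <= C:
            result &= C
    return result

def boundary(A, opens, X):
    """Closure minus interior (Definition 16.54)."""
    return closure(A, opens, X) - interior(A, opens)

# Hypothetical example data.
X = frozenset({1, 2, 3})
opens = {frozenset(), frozenset({1}), frozenset({1, 2}), X}
A = frozenset({1, 3})

print(sorted(interior(A, opens)))      # [1]
print(sorted(closure(A, opens, X)))    # [1, 2, 3]
print(sorted(boundary(A, opens, X)))   # [2, 3]
```

Here the only closed superset of `A` is `X` itself, so the closure is everything, while the largest open subset of `A` is `{1}`.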
Figure 35: Visualization of Proposition 16.56. Relatively open sets U are intersections
of the subset with open sets V in the space (a). In particular, as subsets of the space
itself, they need not be open (b).
Example 16.55
1. The boundary of an open ball is $\delta B_r(x) = \{ p \in X : d(p, x) = r \}$ (Exercise 16-48b).
2. The boundary of the rational numbers in the real numbers is $\delta\mathbb{Q} = \mathbb{R}$.
Further properties of the interior, the closure, and the boundary of a set will be investigated in Exercises 16-43 through 16-47.
16.4.2 Relatively Open Sets
When subspaces S of a metric space X are investigated, subsets of S can be open in the
subspace S, as well as in the space X itself. However, it is important to realize that a
set $U \subseteq S$ can be open in $S$ and still it may not be open in $X$. For example, the interval
$[0, 1)$ is an open subset of the metric space $[0, 2]$. Indeed, for every $x \in [0, 1)$ there
is a small $\varepsilon > 0$ so that the ball in $[0, 2]$ around $x$ of radius $\varepsilon$ is contained in $[0, 1)$.
Proposition 16.56 describes the relation between open sets in the space X and open
sets in a subset S. Open sets in a subset are also called relatively open.
Proposition 16.56 Let $X$ be a metric space and let $S \subseteq X$ be a subset. Then $U \subseteq S$ is open with respect to the induced metric on $S$ iff there is a set $V \subseteq X$ that is open with respect to the metric on $X$ and such that $U = V \cap S$. (Also see Figure 35.)
Proof. To prove "$\Rightarrow$," for each element $u \in U$ find $\varepsilon_u > 0$ so that the containment $B^S_{\varepsilon_u}(u) := \{ x \in S : d(x, u) < \varepsilon_u \} \subseteq U$ holds. Let
$V := \bigcup_{u \in U} B^X_{\varepsilon_u}(u) = \bigcup_{u \in U} \{ x \in X : d(x, u) < \varepsilon_u \}$.
Then $V$ is open and we claim that $U = S \cap V$. Clearly, $U \subseteq S \cap V$. Now let $v \in S \cap V$. Then there is a $u \in U$ so that $v \in B^X_{\varepsilon_u}(u)$. Because $v \in S$ we infer $v \in B^S_{\varepsilon_u}(u) \subseteq U$, and hence $S \cap V \subseteq U$.
The part "$\Leftarrow$" is left as Exercise 16-38. ∎
A similar result is proved for closed sets in Exercise 16-49.
Exercises
16-37. Prove Corollary 16.47.
16-38. Finish the proof of Proposition 16.56 by proving the "$\Leftarrow$" part.
16-39. Let $X$ be a metric space. Prove that for all $x \in X$ the set $X \setminus \{x\}$ is open.
16-40. Let $X$ be a metric space. Prove that $U \subseteq X$ is open iff $U$ is a union of open balls $B_\varepsilon(x)$.
16-41. Closed sets. Let $X$ be a metric space.
(a) Prove that both $\emptyset$ and $X$ are closed subsets of $X$.
(b) Prove that if $C_1, \ldots, C_n$ are closed subsets of $X$, then $\bigcup_{k=1}^n C_k$ is closed.
(c) Prove that if $\mathcal{C}$ is a family of closed subsets of $X$, then $\bigcap \mathcal{C}$ is closed.
(d) Give an example of an infinite union of closed sets that is not closed.
16-42. Let $C \subseteq \mathbb{R}^d$ be so that for all $n \in \mathbb{N}$ the set $C \cap \prod_{i=1}^d [-n, n]$ is closed. Prove that $C$ is closed.
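16-43. The interior of a set. Let $X$ be a metric space.
(a) Prove that $A^{\circ\circ} = A^\circ$ for all $A \subseteq X$.
(b) Prove that if $A_1, \ldots, A_n \subseteq X$, then $\left( \bigcap_{j=1}^n A_j \right)^\circ = \bigcap_{j=1}^n A_j^\circ$.
(c) Prove that if $A_j \subseteq X$ for all $j \in \mathbb{N}$, then $\left( \bigcap_{j=1}^\infty A_j \right)^\circ \subseteq \bigcap_{j=1}^\infty A_j^\circ$ and give an example that shows that the containment can be proper.
(d) Prove that $U \subseteq X$ is open iff $U^\circ = U$.
16-44. The closure of a set. Let $X$ be a metric space.
(a) Prove that $A^{--} = A^-$ for all $A \subseteq X$.
(b) Prove that if $A_1, \ldots, A_n \subseteq X$, then $\overline{\bigcup_{j=1}^n A_j} = \bigcup_{j=1}^n \overline{A_j}$.
(c) Prove that if $A_j \subseteq X$ for all $j \in \mathbb{N}$, then $\overline{\bigcup_{j=1}^\infty A_j} \supseteq \bigcup_{j=1}^\infty \overline{A_j}$ and give an example that shows that the containment can be proper.
(d) Prove that $C \subseteq X$ is closed iff $C^- = C$.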
16-45. The boundary of a set. Let $X$ be a metric space and let $A \subseteq X$.
(a) Prove that the boundary $\delta A$ of $A$ is closed.
(b) Find a set $B$ in a metric space $X$ so that $B$, $\delta B$ and $\delta(\delta B)$ are three distinct sets.
(c) Prove that if A is closed, then (X \ A ) U Ao = X.
(d) Prove that for any set $A$ we have $\delta(\delta A) = \delta(\delta(\delta A))$.
16-46. Closure, interior and the boundary. Let $X$ be a metric space and let $A \subseteq X$.
(a) Prove that $X \setminus (A^\circ) = (X \setminus A)^-$.
(b) Prove that $X \setminus (A^-) = (X \setminus A)^\circ$.
(c) Prove that $\delta A = A^- \cap (X \setminus A)^-$.
(d) Prove that $A^{-\circ-} \subseteq A^-$ and show that the containment can be proper.
(e) Prove that $A^{\circ-\circ} \supseteq A^\circ$ and show that the containment can be proper.
16-47. Let $X$ be a normed space and let $Y \subseteq X$ be a normed subspace. Prove that $Y^-$ also is a normed subspace of $X$.
16-48. Let $X$ be a metric space, let $x \in X$ and let $r > 0$.
(a) Prove that $\overline{B_r(x)} = \{ p \in X : d(p, x) \le r \}$.
(b) Prove that $\delta B_r(x) = \{ p \in X : d(p, x) = r \}$.
16-49. Let $X$ be a metric space and let $S \subseteq X$ be a subset. Prove that $C \subseteq S$ is closed with respect to the induced metric on $S$ iff there is a set $D \subseteq X$ that is closed with respect to the metric on $X$ and such that $C = D \cap S$.
Hint. Use Theorem 16.45. For "$\Rightarrow$," let $D$ be the set of all limits of sequences in $C$.
16-50. Let $X$, $Y$ be metric spaces and let $f : X \to Y$. Prove that $f$ is continuous at $x \in X$ iff for all open subsets $V \subseteq Y$ that contain $f(x)$ the inverse image $f^{-1}[V] \subseteq X$ contains an open ball around $x$.
16-51. Let $x$ be a point in a metric space $X$. Can a neighborhood of $x$ be closed? Explain.
16-52. Let $X$ be a normed space and let $\Omega \subseteq X$ be an open subset. Prove that every $x \in \Omega$ is an accumulation point of $\Omega$.
16-53. Let $X$, $Y$ be metric spaces and let $f : X \to Y$ be a function. Define the oscillation of $f$ over the open set $U \subseteq X$ as $\omega_f(U) := \sup \{ d(f(y), f(z)) : y, z \in U \}$. Define the oscillation of $f$ at $x \in X$ as $\omega_f(x) := \inf \{ \omega_f(U) : x \in U, U \text{ open} \}$.
(a) Prove that $f$ is continuous at $x$ iff $\omega_f(x) = 0$.
(b) Prove that for all $\rho \ge 0$, the set $\{ x \in X : \omega_f(x) \ge \rho \}$ is closed.
(c) Prove that for all $\rho \ge 0$, the set $\{ x \in X : \omega_f(x) < \rho \}$ is open.
16-54. Prove that Cantor sets are closed.
16.5 Compactness
The Bolzano-Weierstrass Theorem plays a key role in the proofs of several important
results for functions of a single variable. For example, it is used to prove that a continuous function $f : [a, b] \to \mathbb{R}$ always assumes an absolute maximum value (see
Theorem 3.44), as well as to show that continuous functions $f : [a, b] \to \mathbb{R}$ are uniformly continuous (see Lemma 5.19). It therefore is natural to investigate spaces that
satisfy the conclusion of the Bolzano-Weierstrass Theorem.
Definition 16.57 Bolzano-Weierstrass formulation of compactness. A metric space $X$ is called compact iff every sequence $\{x_n\}_{n=1}^\infty$ of elements in $X$ has a convergent subsequence.
Compactness is usually formulated in topological terms. We will investigate this
formulation, which is reminiscent of the Heine-Borel Theorem (see Theorem 8.4), in
Theorem 16.72. In metric spaces, the Bolzano-Weierstrass formulation of compactness is equivalent to the topological formulation, but we need to be careful. In general
topological spaces, the Bolzano-Weierstrass formulation of compactness is called sequential compactness. It is a consequence of (topological) compactness, but it is not
equivalent to it.
Closed and bounded subsets of finite dimensional spaces are the prototypical examples of compact sets (also see Theorem 16.80).
Example 16.58 Let $r > 0$ and let $C_r(0) := \{ x \in \mathbb{R}^d : \|x\|_\infty \le r \}$. Then $C_r(0)$ is compact when equipped with the metric induced by the $\|\cdot\|_\infty$-norm.
A typical compactness proof with the Bolzano-Weierstrass formulation will take a sequence in the space and produce a convergent subsequence.
Let $\{x_n\}_{n=1}^\infty$ be a sequence in $C_r(0)$. Then each component sequence $\{x_n^{(i)}\}_{n=1}^\infty$ is bounded. In particular, $\{x_n^{(1)}\}_{n=1}^\infty$ has a convergent subsequence $\{x_{n_k^1}^{(1)}\}_{k=1}^\infty$. Now let $1 \le j < d$ and assume $\{n_m^j\}_{m=1}^\infty$ is a strictly increasing sequence of integers such that for $1 \le i \le j$ the subsequences $\{x_{n_m^j}^{(i)}\}_{m=1}^\infty$ converge. Then $\{x_{n_m^j}^{(j+1)}\}_{m=1}^\infty$ has a convergent subsequence. That is, there is a strictly increasing sequence of integers $\{n_l^{j+1}\}_{l=1}^\infty$ such that for all $1 \le i \le j+1$ the subsequences $\{x_{n_l^{j+1}}^{(i)}\}_{l=1}^\infty$ converge.
Inductively we conclude that there is a subsequence $\{x_{n_k}\}_{k=1}^\infty$ such that for $1 \le i \le d$ the subsequences $\{x_{n_k}^{(i)}\}_{k=1}^\infty$ converge. Call each limit $x^{(i)}$ and let $x := \sum_{i=1}^d x^{(i)} e_i$. Then by Theorem 16.4 $\{x_{n_k}\}_{k=1}^\infty$ is a convergent subsequence of $\{x_n\}_{n=1}^\infty$ with limit $x$. Moreover, $x \in C_r(0)$, because $C_r(0)$ is closed. Since $\{x_n\}_{n=1}^\infty$ was arbitrary this means that $C_r(0)$ is compact. □
Compact subspaces of Rd will be investigated in detail in Section 16.6. As Example
16.58 indicates, compact spaces are usually subsets of larger metric spaces.
Definition 16.59 Let $X$ be a metric space. A subset $C \subseteq X$ is called compact iff $C$ with the induced metric is a compact metric space. Equivalently, $C$ is compact iff every sequence in $C$ has a convergent subsequence whose limit is in $C$.
Compact subsets of metric spaces are closed and bounded.
Proposition 16.60 Let X be a metric space and let C be a compact subset of X . Then
C is closed and bounded.
Proof. Let $C \subseteq X$ be compact. To prove that $C$ is closed, suppose for a contradiction that $C$ is not closed. Then there is a sequence $\{x_n\}_{n=1}^\infty$ of elements of $C$ that converges in $X$ to a limit $x \notin C$. But then by Proposition 16.11 all subsequences of $\{x_n\}_{n=1}^\infty$ converge (in $X$) to $x$, and hence no subsequence of $\{x_n\}_{n=1}^\infty$ has a limit in $C$, a contradiction.
To prove that $C$ is bounded, suppose for a contradiction that $C$ is not bounded. Let $x_1 \in C$. Once $x_1, \ldots, x_n \in C$ have been chosen, we can find an $x_{n+1} \in C$ so that $d(x_{n+1}, x_n) \ge (n+1) + \max \{ d(x_k, x_n) : k = 1, \ldots, n-1 \}$. But then the inequality $d(x_{n+1}, x_k) \ge d(x_{n+1}, x_n) - d(x_n, x_k) \ge n+1$ holds for all $k = 1, \ldots, n$, and hence all subsequences of the inductively constructed sequence $\{x_n\}_{n=1}^\infty$ are unbounded. Now by Proposition 16.9 no subsequence of $\{x_n\}_{n=1}^\infty$ has a limit in $C$ (or even in $X$), a contradiction. ∎
Exercise 16-55 shows that the converse of Proposition 16.60 does not hold in general. Next, we note that the closed subsets of a compact space inherit the compactness.
Proposition 16.61 Let $X$ be a compact metric space and let $C \subseteq X$ be closed. Then $C$ is compact.
Proof. Let $\{a_n\}_{n=1}^\infty$ be a sequence of elements of $C$. Because $C \subseteq X$ and $X$ is compact, there is a subsequence $\{a_{n_k}\}_{k=1}^\infty$ that converges in $X$ to a limit $L$. But then by Theorem 16.45 $L \in C$, and hence $C$ is compact. ∎
The general version of Theorem 3.44 is now the following.
Theorem 16.62 Let $X$ be a compact metric space, let $Y$ be a metric space and let $f : X \to Y$ be continuous and surjective. Then $Y$ is compact.
Proof. Let $\{y_n\}_{n=1}^\infty$ be a sequence in $Y$. For each $n \in \mathbb{N}$ let $x_n \in X$ be such that $f(x_n) = y_n$. Because $X$ is compact, there is a subsequence $\{x_{n_k}\}_{k=1}^\infty$ of $\{x_n\}_{n=1}^\infty$ and an $x \in X$ such that $\lim_{k\to\infty} x_{n_k} = x$. But then $y := f(x) \in Y$, $\{y_{n_k}\}_{k=1}^\infty$ is a subsequence of $\{y_n\}_{n=1}^\infty$ and $\lim_{k\to\infty} y_{n_k} = \lim_{k\to\infty} f(x_{n_k}) = f(x) = y$. Therefore $Y$ is compact. ∎
Corollary 16.63 (Compare with Theorem 3.44.) Let $X$ be a compact metric space and let $f : X \to \mathbb{R}$ be continuous. Then $f$ assumes its absolute maximum on $X$. That is, there is an $x \in X$ so that $f(x) \ge f(z)$ for all $z \in X$.
Proof. The function $f$ is surjective onto $f[X] \subseteq \mathbb{R}$. By Theorem 16.62 this means $f[X]$ is compact and by Proposition 16.60 this means that $f[X]$ is closed and bounded. Then $M := \sup(f[X])$ is an element of $f[X]$. Therefore there is an $x \in X$ so that $f(x) = M$, and $M$ is greater than or equal to all other values $f$ assumes. ∎
It is also easy to see that the inverses of continuous functions on compact metric
spaces are continuous.
Theorem 16.64 (Compare with Theorem 3.38.) Let $X$ be a compact metric space, let $Y$ be a metric space and let the function $f : X \to Y$ be continuous and injective. Then the inverse function $f^{-1} : f[X] \to X$ is continuous, too.
Proof (sketch). Let $\{y_n\}_{n=1}^\infty$ be a sequence in $f[X]$ that converges to $y \in f[X]$. Prove that every subsequence of $\{f^{-1}(y_n)\}_{n=1}^\infty$ has a subsequence that converges to $f^{-1}(y)$ and use Exercise 16-8.
The full proof is left to the reader as Exercise 16-56. ∎
An important metric property of compact spaces is that they are complete.
Theorem 16.65 Let X be a compact metric space. Then X is complete.
Proof. Exercise 16-57.
Another important consequence of compactness is that continuous functions are
uniformly continuous on compact metric spaces.
Definition 16.66 Let $X$, $Y$ be metric spaces. Then the function $f : X \to Y$ is called uniformly continuous iff for every $\varepsilon > 0$ there is a $\delta > 0$ such that for all $u, v \in X$ with $d(u, v) < \delta$ we have that $d(f(u), f(v)) < \varepsilon$.
Lemma 16.67 Let $X$ be a compact metric space, let $Y$ be a metric space and let the function $f : X \to Y$ be continuous. Then $f$ is uniformly continuous.
Proof. Mimic the proof of Lemma 5.19. (Exercise 16-58.) ∎
Compactness typically is not formulated in terms of the Bolzano-Weierstrass Theorem, but in terms similar to the Heine-Borel Theorem (see Theorem 8.4). For metric spaces, both formulations are equivalent. Because the Heine-Borel formulation is exclusively in terms of open sets it is the topological (and thus more general) description of compactness. Recall that the Heine-Borel Theorem said that each open cover of a closed and bounded interval has a finite subcover.
Definition 16.68 A cover of a metric space $X$ is a family $\mathcal{C}$ of sets such that $X \subseteq \bigcup \mathcal{C}$. An open cover $\mathcal{C}$ is a cover such that all sets in $\mathcal{C}$ are open.
For a subset $S$ of a metric space $X$, it is usually more natural to cover $S$ with sets that are open in $X$, even if this means that the sets are not contained in $S$. Hence, if $S \subseteq X$, we will also call a family $\mathcal{C}$ an open cover of $S$ iff all sets in $\mathcal{C}$ are open (in $X$) and $S \subseteq \bigcup \mathcal{C}$.
For a visualization of open covers, consider Figure 36.
Example 16.69 Open covers.
1. $\{ B_n(0) : n \in \mathbb{N} \}$ is an open cover of $\mathbb{R}^d$.
2. { B- (0) : $x \in B_1(0)$ } is an open cover of $B_1(0)$.
3. For any $a, b \in (0, 1)$, the set $\{ (\frac{1}{n}, 1 - \frac{1}{n}) : n \in \mathbb{N} \} \cup \{ [0, a), (b, 1] \}$ is an open cover of the interval $[0, 1]$. Moreover, the set $\{ (\frac{1}{n}, 1 - \frac{1}{n}) : n \in \mathbb{N} \} \cup \{ (-1, a), (b, 2) \}$ is an open cover of the interval $[0, 1]$ if we consider $[0, 1]$ as a subspace of $\mathbb{R}$.
Figure 36: Because compact sets are usually subsets of other spaces, open covers are typically visualized with sets that are open in the surrounding space (a) rather than with relatively open sets (b).
We have encountered the proof technique of finding finite subcovers of open covers in Proposition 8.5, Exercise 8-3c and Lemma 8.11. Definition 14.17 of the d-dimensional outer Lebesgue measure also relies on open covers and the remarks after
the proof of Proposition 14.60 suggest that we need a version of the Heine-Borel Theorem to prove that d-dimensional Lebesgue measure is a product of lower dimensional
Lebesgue measures. Indeed, in this text, the main use of finite subcovers of open covers is in connecting topology with measure theory. Theorem 16.72 below shows that
compactness provides an abstract version of the Heine-Borel Theorem.
Lemma 16.70 Let $X$ be a compact metric space. Then for every $\varepsilon > 0$ there are $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$.
Proof. Let $\varepsilon > 0$. Let $x_1 \in X$ be arbitrary. If $X \subseteq B_\varepsilon(x_1)$, stop. Otherwise continue as follows. If $x_1, \ldots, x_{n-1} \in X$ are so that $X \not\subseteq \bigcup_{j=1}^{n-1} B_\varepsilon(x_j)$, choose $x_n \notin \bigcup_{j=1}^{n-1} B_\varepsilon(x_j)$. If $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$, stop, otherwise continue. This process cannot continue indefinitely, because if it did, $\{x_n\}_{n=1}^\infty$ would be a sequence such that for any distinct $m, n \in \mathbb{N}$ we have $d(x_m, x_n) \ge \varepsilon$. This would mean that $\{x_n\}_{n=1}^\infty$ has no convergent subsequence, a contradiction to the compactness of $X$. The $x_1, \ldots, x_n$ for which the construction stops are as desired. ∎
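The selection process in the proof of Lemma 16.70 is effectively a greedy algorithm, and it can be run on a finite sample of points. The sketch below is only an illustration under assumptions that are not in the text: the sample is a hypothetical grid in the unit square, distances are Euclidean (via `math.dist`), and the function name `greedy_eps_net` is made up for this example.

```python
import math

def greedy_eps_net(points, eps):
    """Pick centers one at a time; a point becomes a new center only if it
    lies outside every eps-ball around the centers chosen so far."""
    centers = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

# Hypothetical sample: an 11 x 11 grid in the unit square.
points = [(i / 10, j / 10) for i in range(11) for j in range(11)]
eps = 0.35
centers = greedy_eps_net(points, eps)

# Every sample point lies in some eps-ball around a center ...
covered = all(any(math.dist(p, c) < eps for c in centers) for p in points)
# ... and distinct centers are at least eps apart, as in the proof.
separated = all(math.dist(c1, c2) >= eps
                for i, c1 in enumerate(centers) for c2 in centers[i + 1:])
print(len(centers), covered, separated)
```

On a finite sample the process trivially stops; the content of the lemma is that compactness forces it to stop in general, because the centers are pairwise at least $\varepsilon$ apart.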
Definition 16.71 Let $X$ be a metric space and let $\mathcal{O}$ be an open cover. A finite subset $\{O_1, \ldots, O_n\} \subseteq \mathcal{O}$ so that $X \subseteq \bigcup_{j=1}^n O_j$ is also called a finite subcover.
Theorem 16.72 Heine-Borel formulation of compactness. A metric space $X$ is compact iff every open cover $\mathcal{O}$ of $X$ has a finite subcover.
Proof. For "$\Leftarrow$," we will prove the contrapositive. So assume that $X$ is not compact. Let $\{x_n\}_{n=1}^\infty$ be a sequence that does not have a convergent subsequence. Then (Exercise 16-59) for every $x \in X$ there is an $\varepsilon_x > 0$ so that $\{ n \in \mathbb{N} : x_n \in B_{\varepsilon_x}(x) \}$ is finite. But then $\mathcal{C} = \{ B_{\varepsilon_x}(x) : x \in X \}$ is an open cover of $X$ that cannot have a finite subcover.
For "$\Rightarrow$," let $X$ be compact and let $\mathcal{O}$ be an open cover. We first prove that there is an $\varepsilon > 0$ so that for every $x \in X$ there is an $O \in \mathcal{O}$ so that $B_\varepsilon(x) \subseteq O$. For a contradiction, suppose that this is not the case. Then for each $n \in \mathbb{N}$ there is an $x_n \in X$ such that $B_{1/n}(x_n)$ is not contained in any set $O \in \mathcal{O}$. Because $X$ is compact, $\{x_n\}_{n=1}^\infty$ has a convergent subsequence $\{x_{n_k}\}_{k=1}^\infty$. Let $x := \lim_{k\to\infty} x_{n_k}$. Then there is an $O \in \mathcal{O}$ so that $x \in O$. Moreover, there is an $\varepsilon > 0$ so that $B_\varepsilon(x) \subseteq O$. Now let $k \in \mathbb{N}$ be such that $\frac{1}{n_k} < \frac{\varepsilon}{2}$ and such that $d(x_{n_k}, x) < \frac{\varepsilon}{2}$. Then for all $y \in B_{1/n_k}(x_{n_k})$ we have that $d(y, x) \le d(y, x_{n_k}) + d(x_{n_k}, x) < \frac{1}{n_k} + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$. Consequently, the containments $B_{1/n_k}(x_{n_k}) \subseteq B_\varepsilon(x) \subseteq O$ provide the desired contradiction.
Now let $\varepsilon > 0$ be so that for every $x \in X$ there is an $O \in \mathcal{O}$ so that $B_\varepsilon(x) \subseteq O$. By Lemma 16.70, there are finitely many $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$. For each $j = 1, \ldots, n$, let $O_j \in \mathcal{O}$ be such that $B_\varepsilon(x_j) \subseteq O_j$. Then $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j) \subseteq \bigcup_{j=1}^n O_j$ and $\{O_j\}_{j=1}^n$ is the desired finite subcover of $\mathcal{O}$. ∎
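For covers of a closed interval by open intervals, a finite subcover can be extracted by a simple left-to-right sweep. The sketch below is an illustration only: it works on a finite list of candidate intervals (here a truncation of the family from Example 16.69 with the assumed values a = 0.1 and b = 0.9), the function name `finite_subcover` is hypothetical, and the greedy sweep is not the argument used in the proof of Theorem 16.72.

```python
def finite_subcover(intervals, lo, hi):
    """Greedily pick open intervals (l, r) from a finite list so that their
    union contains the closed interval [lo, hi]. Returns the chosen
    intervals, or None if the listed intervals do not cover [lo, hi]."""
    chosen = []
    p = lo                        # leftmost point not yet known to be covered
    while True:
        # among the intervals that contain p, take the one reaching farthest right
        candidates = [(l, r) for (l, r) in intervals if l < p < r]
        if not candidates:
            return None
        best = max(candidates, key=lambda iv: iv[1])
        chosen.append(best)
        if best[1] > hi:
            return chosen
        p = best[1]               # an open interval does not cover its right endpoint

# Assumed parameters a = 0.1, b = 0.9; the infinite family is truncated at n = 50.
a, b = 0.1, 0.9
family = [(-1.0, a), (b, 2.0)] + [(1 / n, 1 - 1 / n) for n in range(3, 51)]
cover = finite_subcover(family, 0.0, 1.0)
print(cover)
```

Each chosen interval contains the right endpoint of the previous one, so consecutive intervals overlap and no point of the closed interval is skipped.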
Typically, when we invoke compactness we obtain a finite subcover of a cover with
sets that are open in a surrounding space. It is not necessary to explicitly prove that a
subset S of a metric space is compact iff every open cover 0 (with sets that are open in
the surrounding space) has a finite subcover. The translation is easily made by finding a
finite subcover $\{ O_1 \cap S, \ldots, O_n \cap S \}$ of the corresponding cover $\widetilde{\mathcal{O}} := \{ O \cap S : O \in \mathcal{O} \}$
with relatively open sets and then going back to the sets $\{ O_1, \ldots, O_n \}$ that are open in
$X$ (also see Figure 36).
Exercises
16-55. Prove that $B_1 := \{ x \in l^\infty : \|x\|_\infty \le 1 \}$ is a closed and bounded subset of $l^\infty$ that is not compact.
16-56. Prove Theorem 16.64.
16-57. Prove Theorem 16.65.
Hint. Exercise 16-16.
16-58. Prove Lemma 16.67.
16-59. Prove that if $X$ is a metric space, $\{x_n\}_{n=1}^\infty$ is a sequence and $x \in X$ is such that for all $\varepsilon > 0$ the set $\{ n \in \mathbb{N} : x_n \in B_\varepsilon(x) \}$ is infinite, then $\{x_n\}_{n=1}^\infty$ has a convergent subsequence.
16-60. Let $X$ be a metric space. Prove that if $C \subseteq X$ is not closed, then there exists an open cover without a finite subcover.
16-61. Let $X$ be a metric space. Prove that if $C \subseteq X$ is not bounded, then there exists an open cover without a finite subcover.
16-62. Let $X$ be a metric space, let $K \subseteq X$ be compact, and let $O \subseteq X$ be open such that $K \subseteq O$. Prove that there is an $\varepsilon > 0$ such that for all $x \in K$ we have $B_\varepsilon(x) \subseteq O$.
16-63. Let $X$ be a metric space such that all closed and bounded subsets are compact and let $\{a_n\}_{n=1}^\infty$ be a sequence. Prove that $\{a_n\}_{n=1}^\infty$ diverges if and only if there is a subsequence $\{a_{n_k}\}_{k=1}^\infty$ that is unbounded, or there are two subsequences $\{a_{n_k}\}_{k=1}^\infty$ and $\{a_{i_m}\}_{m=1}^\infty$ such that $\lim_{m\to\infty} a_{i_m}$ and $\lim_{k\to\infty} a_{n_k}$ both exist, but are not equal.
16-64. More on Lemma 16.70.
(a) Prove that the conclusion of Lemma 16.70 is not equivalent to compactness by showing that the open interval $(0, 1)$ satisfies the conclusion of Lemma 16.70.
(b) Prove that a metric space $X$ is compact iff
i. $X$ is complete and
ii. For every $\varepsilon > 0$, there are $x_1, \ldots, x_n \in X$ so that $X \subseteq \bigcup_{j=1}^n B_\varepsilon(x_j)$.
Hint. We only need to prove "$\Leftarrow$." For this direction, let $\{y_j\}_{j=1}^\infty$ be a sequence in $X$. Construct a Cauchy sequence of points $\{z_k\}_{k=1}^\infty$ in $X$ so that $\{ j \in \mathbb{N} : y_j \in B_{1/k}(z_k) \}$ is infinite. You will need to take a subsequence of a subsequence when constructing $z_{k+1}$ after obtaining $z_1, \ldots, z_k$.
16-65. Continuous functions need not be bounded on noncompact closed and bounded sets. For each
n
E
N define the function
f : lm
+ B to be f ( x ) := n 1 - - x(“) -
E L ( e n ) around the nth unit vector and zero on Zx \
I0
arctan : B +
(-:,
and arctan(-x) :=
:j.
2
uB h
1o:
(
1Y)
on the ball
( e n ) . Prove that f is continuous on
n=l
lx and unbounded on B1(0) G 1%
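16-66. The tangent function can be inverted on the interval $\left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$. Its inverse is the arctangent function $\arctan : \mathbb{R} \to \left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$. Extend $\arctan(\cdot)$ to the interval $[-\infty, \infty]$ by defining $\arctan(\infty) := \frac{\pi}{2}$ and $\arctan(-\infty) := -\frac{\pi}{2}$. For $x, y \in [-\infty, \infty]$ define $d_C(x, y) := |\arctan(x) - \arctan(y)|$.
(a) Prove that $([-\infty, \infty], d_C)$ is a compact metric space.
(b) Prove that if $\{x_n\}_{n=1}^\infty$ is a sequence of real numbers and $x \in \mathbb{R}$, then $\lim_{n\to\infty} d_C(x_n, x) = 0$ iff $\lim_{n\to\infty} |x_n - x| = 0$.
(c) Prove that if $\{x_n\}_{n=1}^\infty$ is a sequence of real numbers, then we have $\lim_{n\to\infty} d_C(x_n, \infty) = 0$ iff $\lim_{n\to\infty} x_n = \infty$ in the sense of Definition 2.42.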
16-67. Prove that if $f : [c, d] \times [a, b] \to \mathbb{R}$ is continuous, then the function $g : [a, b] \to \mathbb{R}$ defined by $g(t) := \int_c^d f(x, t)\, dx$ is continuous.
Hint. Use the uniform continuity of $f$.
16-68. Lemmas for the Stone-Weierstrass Theorem.
(a) Dini's Theorem. Let $\{f_n\}_{n=1}^\infty$ be a nondecreasing sequence of continuous real-valued functions on $[a, b]$ that converges pointwise to the continuous function $f : [a, b] \to \mathbb{R}$. Prove that $\{f_n\}_{n=1}^\infty$ converges uniformly to $f$.
Hint. Cover the interval with open intervals $(x - \delta_x, x + \delta_x)$ on which $f$ is close to a function $f_{n_x}$. Then take a finite subcover.
(b) Prove that the sequence $\{P_n\}_{n=0}^\infty$ of polynomials defined recursively for all $x \in [0, 1]$ by $P_0(x) := 0$ and $P_{n+1}(x) := P_n(x) + \frac{1}{2}\left( x - P_n^2(x) \right)$ converges uniformly on $[0, 1]$ to the function $f(x) = \sqrt{x}$.
Hint. Same approach as in Exercise 2-42.
(c) Prove that for any $M > 0$ there is a sequence of polynomials that converges uniformly to $f(x) = \sqrt{x}$ on $[0, M]$.
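The recursion in Exercise 16-68(b) is easy to observe numerically before proving anything. The following sketch evaluates $P_n$ pointwise on a sample grid and reports the largest observed error; the grid, the chosen values of $n$, and the function name `sqrt_poly_value` are assumptions made for illustration, and a numerical check is of course no substitute for the uniform convergence proof the exercise asks for.

```python
import math

def sqrt_poly_value(x, n):
    """Return P_n(x) for the recursion P_0 = 0,
    P_{k+1}(x) = P_k(x) + (x - P_k(x)**2) / 2, evaluated pointwise."""
    p = 0.0
    for _ in range(n):
        p = p + 0.5 * (x - p * p)
    return p

def sup_error(n, grid):
    """Largest error |sqrt(x) - P_n(x)| over the sample grid."""
    return max(abs(math.sqrt(x) - sqrt_poly_value(x, n)) for x in grid)

grid = [i / 1000 for i in range(1001)]   # sample points in [0, 1]
for n in (1, 10, 100, 1000):
    print(n, sup_error(n, grid))
```

The printed errors shrink as n grows, with the slowest convergence near x = 0, where the square root is steepest.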
16.6 The Normed Topology of Rd
So far, on d-dimensional space we have only worked with the uniform norm $\|\cdot\|_\infty$.
This norm is easy to work with, but it does not measure the usual Euclidean distance.
Therefore we will now investigate how norms in d-dimensional space relate to each
other. It turns out that all norms on finite dimensional spaces are equivalent and that
they induce the same notion of convergence. This means there is no loss of generality
in working with the uniform norm on a finite dimensional space.
Lemma 16.73 Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$. Then for all $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ the inequality $\|x\| \le \|x\|_\infty \sum_{i=1}^d \|e_i\|$ holds, where $e_i$ denotes the $i$th unit vector in $\mathbb{R}^d$.
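The inequality in Lemma 16.73 follows from the triangle inequality and homogeneity of the norm. The chain below is a sketch of one standard way to see this, written out because no proof text is legible at this point; it is not presented as the author's own wording.

```latex
\[
\|x\|
  = \Bigl\| \sum_{i=1}^{d} x_i e_i \Bigr\|
  \le \sum_{i=1}^{d} |x_i| \, \|e_i\|
  \le \Bigl( \max_{1 \le i \le d} |x_i| \Bigr) \sum_{i=1}^{d} \|e_i\|
  = \|x\|_\infty \sum_{i=1}^{d} \|e_i\| .
\]
```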
Theorem 16.74 If both $\|\cdot\|_1$ and $\|\cdot\|_2$ are norms on $\mathbb{R}^d$, then there are real numbers $c, C > 0$ such that for all $x \in \mathbb{R}^d$ the inequalities $c\|x\|_1 \le \|x\|_2 \le C\|x\|_1$ hold.
Proof. Let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$. By Lemma 16.73, for all points $x, y \in \mathbb{R}^d$ we infer $\bigl| \|x\| - \|y\| \bigr| \le \|x - y\| \le \|x - y\|_\infty \sum_{i=1}^d \|e_i\|$. Thus $\|\cdot\|$ is continuous with respect to $\|\cdot\|_\infty$. By Example 16.58, Proposition 16.61, and Corollary 16.63 we conclude that the norm $\|\cdot\|$ assumes an absolute minimum and an absolute maximum on the compact set $B := \{ y \in \mathbb{R}^d : \|y\|_\infty = 1 \}$. Moreover, the absolute minimum cannot be zero, because $\|x\| = 0$ implies $x = 0$ and $0 \notin B$.
The result will be proved if we can show that for any norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on $\mathbb{R}^d$ there is a $C > 0$ such that for all $x \in \mathbb{R}^d \setminus \{0\}$ we have that $\|x\|_2 \le C\|x\|_1$. Let $M := \max\{ \|y\|_2 : y \in B \}$ and $m := \min\{ \|y\|_1 : y \in B \}$, and let $x \in \mathbb{R}^d \setminus \{0\}$.
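The displayed estimate that concludes this argument is not legible at this point. One standard way to finish, given here as a hedged sketch rather than as the author's exact computation, is to rescale $x$ to the point $y := x / \|x\|_\infty \in B$ and use the definitions of $M$ and $m$:

```latex
\[
\|x\|_2
  = \|x\|_\infty \, \|y\|_2
  \le M \, \|x\|_\infty
  = \frac{M}{m} \, m \, \|x\|_\infty
  \le \frac{M}{m} \, \|x\|_\infty \, \|y\|_1
  = \frac{M}{m} \, \|x\|_1 ,
\qquad y := \frac{x}{\|x\|_\infty} \in B .
\]
```

Under this reading, $C := M/m$ works (recall that $m > 0$), and exchanging the roles of the two norms yields the constant $c$.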
Figure 37: Geometrically, Theorem 16.76 says that on a finite dimensional space inside
any ball with respect to one norm we can find a ball with respect to any other norm and
with the same center. Moreover, this smaller ball contains a ball with respect to the
first norm with the same center. The figure shows this nesting for balls with respect to
the three most common norms on $\mathbb{R}^2$, the uniform norm $\|\cdot\|_\infty$ (dashed), the Euclidean
norm $\|\cdot\|_2$ (solid), and the taxicab norm $\|\cdot\|_1$ (dotted).
Norms that satisfy the conclusion of Theorem 16.74 are also called equivalent.
Theorem 16.76 below shows that any two norms on a finite dimensional vector space
are equivalent. Figure 37 provides a visualization of equivalence for norms.
Definition 16.75 Let $X$ be a vector space and let $\|\cdot\|_1$ and $\|\cdot\|_2$ be norms on $X$. Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are called equivalent iff there are real numbers $c, C > 0$ such that for all $x \in X$ we have $c\|x\|_1 \le \|x\|_2 \le C\|x\|_1$.
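For the three norms shown in Figure 37, explicit equivalence constants are well known: $\|x\|_\infty \le \|x\|_2 \le \sqrt{d}\,\|x\|_\infty$ and $\|x\|_\infty \le \|x\|_1 \le d\,\|x\|_\infty$ on $\mathbb{R}^d$. The sketch below samples random vectors and checks these inequalities; the dimension, sample size, seed, function names, and the small floating-point tolerance are all assumptions of this illustration, and random sampling only supports the inequalities, it does not prove them.

```python
import random

def norm_1(x):
    """Taxicab norm: sum of absolute values of the components."""
    return sum(abs(t) for t in x)

def norm_2(x):
    """Euclidean norm."""
    return sum(t * t for t in x) ** 0.5

def norm_inf(x):
    """Uniform norm: largest absolute value of a component."""
    return max(abs(t) for t in x)

def check_equivalence(d, trials, seed=0):
    """Check the standard equivalence constants on randomly sampled vectors."""
    rng = random.Random(seed)
    tol = 1 + 1e-12              # slack for floating-point rounding
    for _ in range(trials):
        x = [rng.uniform(-1, 1) for _ in range(d)]
        ok = (norm_inf(x) <= norm_2(x) * tol
              and norm_2(x) <= d ** 0.5 * norm_inf(x) * tol
              and norm_inf(x) <= norm_1(x) * tol
              and norm_1(x) <= d * norm_inf(x) * tol)
        if not ok:
            return False
    return True

print(check_equivalence(d=5, trials=10_000))
```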
Theorem 16.76 Let $X$ be a finite dimensional vector space. Then all norms on $X$ are equivalent.
Proof. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two norms on $X$. Let $\{b_1, \ldots, b_d\}$ be a base of $X$ and let $\Phi : X \to \mathbb{R}^d$ be the isomorphism that maps each $b_i$ to $e_i$. For $k = 1, 2$, define
II d
II
/I d
I/
But then for all $x := \sum_{i=1}^d x^{(i)} b_i \in X$ we infer
Equivalent norms induce the same notion of convergence.
Proposition 16.77 Let $X$ be a vector space, let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two equivalent norms on $X$, let $\{x_n\}_{n=1}^\infty$ be a sequence in $X$ and let $x \in X$. Then $\lim_{n\to\infty} x_n = x$ in $(X, \|\cdot\|_1)$ iff $\lim_{n\to\infty} x_n = x$ in $(X, \|\cdot\|_2)$.
Proof. Exercise 16-69. ∎
Because equivalent norms induce the same notion of convergence, sequences in
d-dimensional space converge iff their component sequences converge.
Theorem 16.78 Let $X$ be a finite dimensional normed space and let $\{b_1, \ldots, b_d\}$ be a base of $X$. For each element $x$ in $X$, let $x^{(1)}, \ldots, x^{(d)}$ be the components such that $x = \sum_{i=1}^d x^{(i)} b_i$. Then a sequence $\{x_n\}_{n=1}^\infty$ converges to $L$ in $X$ iff all component sequences $\{x_n^{(i)}\}_{n=1}^\infty$ converge to $L^{(i)}$.
Proof. Use Theorems 16.4, 16.76, and Proposition 16.77. (Exercise 16-70.) ∎
In particular, we obtain that all finite dimensional normed spaces are complete.
Theorem 16.79 Let $X$ be a finite dimensional normed space. Then $X$ is complete.
Proof. Exercise 16-71. ∎
Moreover, in finite dimensional spaces compactness is equivalent to being closed
and bounded.
Theorem 16.80 A subset $C$ of a finite dimensional normed space $X$ is compact iff it is
closed and bounded.
Proof. The direction "$\Rightarrow$" follows from Proposition 16.60.
For "$\Leftarrow$," let $\{b_1, \ldots, b_d\}$ be a base of $X$ and for each $x \in X$ let $x^{(1)}, \ldots, x^{(d)}$
be the components so that $x = \sum_{i=1}^d x^{(i)} b_i$. Let $C \subseteq X$ be closed and bounded and let
$\{x_n\}_{n=1}^\infty$ be a sequence in $C$. Because the component sequence $\{x_n^{(1)}\}_{n=1}^\infty$ is bounded
in $\mathbb{R}$ there is a convergent subsequence $\{x_{n_j^1}^{(1)}\}_{j=1}^\infty$ with limit $x^{(1)}$. Now suppose $\{n_j^i\}_{j=1}^\infty$
has been chosen so that for $m = 1, \ldots, i$ the sequence $\{x_{n_j^i}^{(m)}\}_{j=1}^\infty$ has a limit $x^{(m)}$.
Then we can choose a subsequence $\{n_j^{i+1}\}_{j=1}^\infty$ of $\{n_j^i\}_{j=1}^\infty$ so that $\{x_{n_j^{i+1}}^{(i+1)}\}_{j=1}^\infty$ has
a limit $x^{(i+1)}$. But then for $m = 1, \ldots, i+1$ the sequence $\{x_{n_j^{i+1}}^{(m)}\}_{j=1}^\infty$ has a limit $x^{(m)}$.
Continue this selection process up to $i = d$. By Theorem 16.78 the subsequence
$\{x_{n_j^d}\}_{j=1}^\infty$ converges to $x := \sum_{i=1}^d x^{(i)} b_i$, and because $C$ is closed, $x \in C$.
Theorem 16.80 and the Heine-Borel formulation of compactness allow us to prove
that if $m, n, d \in \mathbb{N}$ with $m + n = d$, then for all sets $S \in \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ the $d$-dimensional
Lebesgue measure $\lambda_d(S)$ is equal to the product measure $\lambda_m \times \lambda_n(S)$ of $m$-dimensional
and $n$-dimensional Lebesgue measure. In particular, this completes the investigation
started in Proposition 14.60.
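Theorem 16.81 Let $d \in \mathbb{N}$ and for $i = 1, \ldots, d$ let $J_i$ be an interval of finite length.
Then $\lambda_d\left(\prod_{i=1}^d J_i\right) = \prod_{i=1}^d |J_i|$. Moreover, if the numbers $m, n \in \mathbb{N}$ satisfy $m + n = d$,
then $\lambda_d|_{\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}} = \lambda_m \times \lambda_n$, that is, the restriction of the Lebesgue measure on $\mathbb{R}^d$ to
$\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$ is equal to the product of the Lebesgue measures on $\mathbb{R}^m$ and $\mathbb{R}^n$.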
Proof. The inequality $\lambda_d\left(\prod_{i=1}^d J_i\right) \le \prod_{i=1}^d |J_i|$ follows directly from the definition
of outer Lebesgue measure. To prove the reversed inequality, we proceed as follows.
For $i = 1, \ldots, d$, let $a_i$ be the left endpoint of $J_i$ and let $b_i$ be the right endpoint.
Let $K := \prod_{i=1}^d [a_i, b_i]$. It is easy to prove (see Exercise 16-72) the equality
$\lambda_d\left(\prod_{i=1}^d J_i\right) = \lambda_d(K)$. Now let $\varepsilon > 0$ and let $\{D_j\}_{j=1}^\infty$ be a sequence of dyadic
open boxes so that $K \subseteq \bigcup_{j=1}^\infty D_j$ and $\lambda_d(K) + \varepsilon > \sum_{j=1}^\infty |D_j|$. (By Exercise 14-17, such
a sequence exists.) Because $K$ is compact, we can assume without loss of generality
that there is an $N \in \mathbb{N}$ so that $K \subseteq \bigcup_{j=1}^N D_j$. For each $j \in \{1, \ldots, N\}$, let
$D_j = \prod_{i=1}^d (a_i^j, b_i^j)$. Let $M$ be the largest integer so that $2^M$ is the denominator of
any of the completely simplified dyadic rational numbers $a_i^j$ and $b_i^j$. Let $C_M$ be the set
of all cubes of the form $\prod_{i=1}^d \left(\frac{c_i}{2^M}, \frac{c_i + 1}{2^M}\right)$, where the $c_i$ are integers. Then for each $D_j$
the equality $|D_j| = \sum_{E \in C_M,\, E \cap D_j \ne \emptyset} |E|$ holds (see Exercise 16-73). For $i = 1, \ldots, d$,
let $l_i$ be the largest integer so that $\frac{l_i}{2^M} < a_i$ and let $r_i$ be the smallest integer so that
$\frac{r_i}{2^M} > b_i$. Then for every $E \in C_M$ that is contained in $Q := \prod_{i=1}^d \left(\frac{l_i}{2^M}, \frac{r_i}{2^M}\right)$ there is a
$j \in \{1, \ldots, N\}$ so that $E \subseteq D_j$. Therefore
$\prod_{i=1}^d |J_i| \le |Q| = \sum_{E \in C_M,\, E \subseteq Q} |E| \le \sum_{j=1}^N |D_j| < \lambda_d(K) + \varepsilon$.
Because $\varepsilon$ was arbitrary we conclude $\lambda_d\left(\prod_{i=1}^d J_i\right) \ge \prod_{i=1}^d |J_i|$, and hence the two
sides are equal.
In particular, this means that if $A$ is an $m$-dimensional open box and $B$ is an $n$-dimensional
open box, then $\lambda_d(A \times B) = \lambda_m(A)\lambda_n(B)$. By Proposition 14.60, this
equation also holds when one of $A$ or $B$ is a null set and the other is an arbitrary
Lebesgue measurable set. But then, because the ($m$- and $n$-dimensional) Lebesgue
measurable sets are the $\sigma$-algebra generated by the open boxes and the null sets, the
above and Theorem 14.57 prove that $\lambda_d|_{\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}} = \lambda_m \times \lambda_n$.
Exercises
16-69. Prove Proposition 16.77.
16-70. Prove Theorem 16.78.
16-71. Prove Theorem 16.79.
16-72. Let $d \in \mathbb{N}$ and for $i = 1, \ldots, d$ let $J_i$ be a, not necessarily closed, interval of finite length with left
endpoint $a_i$ and right endpoint $b_i$. Prove that $\lambda_d\left(\prod_{i=1}^d J_i\right) = \lambda_d\left(\prod_{i=1}^d [a_i, b_i]\right)$.
16-73. Let $M \in \mathbb{N}$ and let $C_M$ be the set of all cubes of the form $\prod_{i=1}^d \left(\frac{c_i}{2^M}, \frac{c_i + 1}{2^M}\right)$, where the $c_i$
are integers. Prove that for each dyadic open box of the form $D = \prod_{i=1}^d \left(\frac{a_i}{2^M}, \frac{b_i}{2^M}\right)$ the equality
$|D| = \sum_{E \in C_M,\, E \cap D \ne \emptyset} |E|$ holds.
16-74. The Fundamental Theorem of Algebra. Let $P : \mathbb{C} \to \mathbb{C}$ defined by $P(z) := \sum_{k=0}^n a_k z^k$ be a
nonconstant complex polynomial.
(a) Prove that $P$ is continuous.
(b) Prove that $|P| : \mathbb{C} \to [0, \infty)$ assumes an absolute minimum in $\mathbb{C}$.
Hint. Recall that $\mathbb{C}$ is (as a metric space) isomorphic to $\mathbb{R}^2$. Prove that for any $M > 0$ there is
an $r > 0$ so that for all $|z| > r$ the inequality $|P(z)| > M$ holds.
(c) Now prove the Fundamental Theorem of Algebra. That is, prove that there must be a $z \in \mathbb{C}$
so that $P(z) = 0$.
Hint. Suppose there is no such $z$, let the absolute minimum of $|P|$ be assumed at $z_0$ and
consider $Q(z) := \frac{1}{P(z_0)} P(z + z_0)$. Then $Q(z) = 1 + b_m z^m + \sum_{j=m+1}^n b_j z^j$ for some $m \in \mathbb{N}$.
Apply the triangular inequality and find a $z$ with $|Q(z)| < 1$.
(d) Prove that there are, not necessarily distinct, $z_1, \ldots, z_n \in \mathbb{C}$ so that $P(z) = a_n \prod_{j=1}^n (z - z_j)$
for all $z \in \mathbb{C}$.
16-75. Partial Fraction Decompositions.
(a) Let $P$ be a polynomial with real coefficients. Prove that if $z \in \mathbb{C}$ is so that $P(z) = 0$, then
$P(\bar{z}) = 0$.
(b) Use the Fundamental Theorem of Algebra to prove that each polynomial with real coefficients
can be written as a product of the leading coefficient, linear factors $(x - c)$ and irreducible
quadratic factors $\left((x - a)^2 + b^2\right)$, where all constants $a$, $b$, and $c$ are real.
(c) Prove that every rational function with real coefficients can be written as the sum of a polynomial and a linear combination of horizontally shifted rational functions as in Exercises 12-11,
12-17d, and 12-18.
Hint. Induction on the degree of the denominator.
(d) Explain why (at least in principle) it is possible to find a symbolic antiderivative for every
rational function with real coefficients.
16-76. Prove that on $\ell^2$ the norms $\|\cdot\|_2$ and $\|\cdot\|_\infty$ are not equivalent.
16-77. Prove that in a finite dimensional normed space a series converges unconditionally iff it converges
absolutely.
16-78. Let $X$ be an infinite dimensional inner product space. Prove that $\{ x \in X : \|x\| \le 1 \}$ is closed and
bounded, but not compact.
Hint. Apply the Gram-Schmidt Orthonormalization Procedure to a sequence $\{b_n\}_{n=1}^\infty$ of linearly
independent vectors in $X$ to obtain a countable orthonormal system in $\{ x \in X : \|x\| \le 1 \}$.
16-79. Proceed as follows to prove that a normed space $X$ is finite dimensional iff compactness is equivalent
to being closed and bounded.
(a) Briefly explain why we only need to prove "$\Leftarrow$."
For "$\Leftarrow$" we prove the contrapositive. So for the remainder, let $X$ be an infinite dimensional
normed space and let $\{b_n\}_{n=1}^\infty$ be a sequence in $X$ so that any finite subset is linearly independent.
Even though we do not necessarily have an inner product in $X$, we can adapt the
idea from Exercise 16-78.
Let $A \subseteq X$ be a nonempty subset. For all $x \in X$, define the distance from $x$ to $A$ as
$\operatorname{dist}(x, A) := \inf \{ d(x, a) : a \in A \}$. (We will investigate this function in Section 16.9.)
For $u_1, \ldots, u_n \in X$ and $r \ge 0$ let $B_r^{\operatorname{span}(u_1, \ldots, u_n)} := \{ x \in \operatorname{span}(u_1, \ldots, u_n) : \|x\| \le r \}$.
Prove that for any element w
f!
span(u1,. . . , u n ) there is an a ( w ) E B~pan(vl""'L'n)
so that
1 w - a ( w ) /I = dist ( w , B1span(u1, ....u,)
Prove that
/ a(w) j/
}.
) z 0.
< 2l/wl/
1
4
1
Prove that if llwll < -,then dist ( w - a ( w ) , B~pan(u13""u")
) = w - a(w)
Construct a sequence $\{u_n\}_{n=1}^\infty$ in $X$ so that $\|u_n\| = 1$ for all $n \in \mathbb{N}$ and so that for all distinct
$i, j \in \mathbb{N}$ we have $\|u_i - u_j\| \ge 1$.
Finish the proof of the contrapositive of "$\Leftarrow$."
16-80. Fubini's Theorem revisited. Let $\lambda_2$ be Lebesgue measure on $\mathbb{R}^2$ and let $\lambda_x$ and $\lambda_y$ denote Lebesgue
measure on the $x$- and $y$-axes, respectively.
(a) Prove that if the function $f : \mathbb{R}^2 \to [-\infty, \infty]$ is Lebesgue integrable and $\lambda_x \times \lambda_y$-measurable,
then for almost all elements $x \in \mathbb{R}$ the function $f_x(y) := f(x, y)$ is Lebesgue integrable, for
almost all elements $y \in \mathbb{R}$ the function $f^y(x) := f(x, y)$ is Lebesgue integrable, the function
x H
f x dh?; if fx is Lebesgue
is Lebesgue integrable and the function
0:
otherwise,
{h
(c) State and prove a result similar to the result in part 16-80a for the Lebesgue integral on R3,
representing it as an iteration of three single variable Lebesgue integrals.
(d) Compute the following integrals:
16.7 Dense Subspaces
Recall that the integral of a nonnegative measurable function is defined as a supremum
of integrals of simple functions. This means that for every integrable function there
should be simple functions arbitrarily "close" to it. The concept of a dense subset
expresses this idea in precise terms.
Definition 16.82 Let $X$ be a metric space. A set $S \subseteq X$ is called dense in $X$ iff for
every $\varepsilon > 0$ and every $x \in X$ there is an $s \in S$ so that $d(x, s) < \varepsilon$.
So a subset S of a metric space X is dense iff every neighborhood of every point of
X contains a point in S. In terms of approximating elements, we can say the following.
Proposition 16.83 Let $X$ be a metric space. Then $S \subseteq X$ is dense in $X$ iff for all $x \in X$
there is a sequence $\{s_n\}_{n=1}^\infty$ of elements in $S$ so that $\lim_{n\to\infty} s_n = x$.
Proof. Use Standard Proof Technique 3.8. (Exercise 16-81.)
The simplest example of a dense subset are the rational numbers as a subset of the
real numbers.
Theorem 16.84 Q is dense in R.
Proof. Use Theorem 1.36. (Exercise 16-82.)
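The definition of density can be made concrete for $\mathbb{Q} \subseteq \mathbb{R}$. The sketch below is an illustration only, not the proof requested in Exercise 16-82: given $x$ and $\varepsilon > 0$ it picks an integer $n > 1/\varepsilon$ and returns $s = \lfloor xn \rfloor / n$, so that $0 \le x - s < 1/n < \varepsilon$. The function name `rational_within` is hypothetical, and the input $x$ is a floating point number, which is itself rational, so the example only mimics the real-number statement.

```python
from fractions import Fraction
import math


def rational_within(x, eps):
    """Return a Fraction s with 0 <= x - s < eps, for a float x and eps > 0."""
    x_exact = Fraction(x)      # exact value of the float x
    eps_exact = Fraction(eps)  # exact value of the float eps
    n = math.floor(1 / eps_exact) + 1   # integer n > 1/eps, hence 1/n < eps
    return Fraction(math.floor(x_exact * n), n)
```

For example, `rational_within(math.pi, 1e-6)` returns a fraction that differs from the stored value of `math.pi` by less than $10^{-6}$.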
Once we take care of the usual problem of equality almost everywhere, the above
mentioned simple functions can be considered "dense in $L^p$."
Theorem 16.85 Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p < \infty$. Then the set
$S := \{ [s] : s \in F(M, \mathbb{R}) \text{ is simple} \}$ is dense in $L^p(M, \Sigma, \mu)$.
Proof. First, consider a nonnegative function $g \in \mathcal{L}^p(M, \Sigma, \mu)$. From the proof of
Theorem 14.29 (see proof of Theorem 9.19), we infer that there is a sequence $\{s_n\}_{n=1}^\infty$
of nonnegative simple functions that converges pointwise to $g$ with $0 \le s_n \le g$ for
all $n \in \mathbb{N}$. Hence, the sequence $\{|s_n - g|^p\}_{n=1}^\infty$ converges pointwise to zero and it is
bounded by the integrable function $g^p$. Thus by the Dominated Convergence Theorem we
obtain $\lim_{n\to\infty} \int_M |s_n - g|^p \, d\mu = 0$, that is, $\lim_{n\to\infty} \|[s_n] - [g]\|_p = 0$.
Now let $f \in \mathcal{L}^p(M, \Sigma, \mu)$. Let $\{s_n\}_{n=1}^\infty$ be a sequence of simple functions so
that $\lim_{n\to\infty} \|s_n - f^+\|_p = 0$ and let $\{t_n\}_{n=1}^\infty$ be a sequence of simple functions so that
$\lim_{n\to\infty} \|t_n - f^-\|_p = 0$. Then $\{s_n - t_n\}_{n=1}^\infty$ is a sequence of simple functions and we
conclude $0 \le \lim_{n\to\infty} \|[s_n - t_n] - [f]\|_p \le \lim_{n\to\infty} \left( \|s_n - f^+\|_p + \|t_n - f^-\|_p \right) = 0$.
By Proposition 16.83, $S$ is dense in $L^p(M, \Sigma, \mu)$.
Although simple functions can be defined for arbitrary measure spaces, when additional structure is available it would be desirable to have dense subsets of functions
with properties related to that structure. The next result shows that the continuous functions are "dense in $L^p[a,b]$." For some $L^p$ spaces, we will find an even nicer dense
subspace in Theorem 18.12.
Theorem 16.86 Let $a < b$, let $C[a,b] := \{ [f] : f \in C^0[a,b] \}$ and let $1 \le p < \infty$.
Then $C[a,b]$ is dense in $L^p[a,b]$.
Figure 38: Illustration of the approximation of indicator functions in the proof of Theorem 16.86. First (a) cover the set $A$ with open intervals so that the measure of the union
of the intervals is close to $\lambda(A)$. Then (b) discard all but finitely many intervals but do
not decrease the measure of the union too much. Then (c) approximate the indicator
function of each interval with a continuous function so that the integrals remain close.
Proof. By Theorems 5.20 and 9.26, $C[a,b]$ is a subset of $L^p[a,b]$. We are done if
we can show that for any $\varepsilon > 0$ and $f \in \mathcal{L}^p(\mathbb{R})$ there is a continuous $g \in \mathcal{L}^p(\mathbb{R})$ so that
$\|f - g\|_p < \varepsilon$. The result for $L^p[a,b]$ will follow because $\mathcal{L}^p[a,b]$ is embedded in
$\mathcal{L}^p(\mathbb{R})$ by setting each function equal to zero outside $[a,b]$ and because the restriction
of a continuous function on $\mathbb{R}$ to $[a,b]$ is continuous on $[a,b]$.
First, let $(l, r)$ be an open interval on the real line and let $\varepsilon > 0$. Define
[...]
Then each $h_{(l,r),\varepsilon}$ is continuous on $\mathbb{R}$ and the following inequalities hold.
[...]
We now prove that for every measurable set $A$ and every $\varepsilon > 0$ there is a continuous
function $g_{A,\varepsilon}$ so that $\|1_A - g_{A,\varepsilon}\|_p < \varepsilon$. For the idea, consider Figure 38. Let $\{I_j\}_{j=1}^\infty$
u
00
be a sequence of open intervals so that A G
n
E
N be such that
[
j=n+l
j=l
1
I I j 1)
’
<
:.
00
IZj 1
Zj and
< h(A)
+ (:)’
. Let
j=1
Then the function gA,E :=
2
j=1
h I l , ; is con-
tinuous on W and
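Now we prove that for every simple function $s$ and every $\varepsilon > 0$ there is a continuous
function $g_{s,\varepsilon}$ so that $\|g_{s,\varepsilon} - s\|_p < \varepsilon$. Let $s = \sum_{j=1}^n a_j 1_{A_j}$ be a simple function on $\mathbb{R}$
and let $\varepsilon > 0$. Then $g_{s,\varepsilon} := \sum_{j=1}^n a_j \, g_{A_j, \frac{\varepsilon}{n(|a_j|+1)}}$ is continuous on $\mathbb{R}$ and
[...]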
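Now finally let $f \in \mathcal{L}^p(\mathbb{R})$ and let $\varepsilon > 0$. By Theorem 16.85 there is a simple
function $s$ so that $\|f - s\|_p < \frac{\varepsilon}{2}$. Moreover, $g_{s,\frac{\varepsilon}{2}}$ is continuous, and
$\|f - g_{s,\frac{\varepsilon}{2}}\|_p \le \|f - s\|_p + \|s - g_{s,\frac{\varepsilon}{2}}\|_p < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$.
Hence, for every $[f] \in L^p(\mathbb{R})$ and every $\varepsilon > 0$ there is a $g \in C^0(\mathbb{R})$ so that
$\|[f] - [g]\|_p < \varepsilon$, which proves that $C[a,b]$ is dense in $L^p[a,b]$.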
It is worth noting that $S$ as well as $C[a,b]$ are actually linear subspaces of the
normed spaces $L^p(M, \Sigma, \mu)$ and $L^p[a,b]$, respectively. In finite dimensional spaces,
proper linear subspaces cannot be dense in the whole space. In infinite dimensional
spaces there can be many dense linear subspaces comprised of "nice" elements.
For integrable functions, it is standard practice to prove results for a dense subset
of functions with nice properties and then use a limit argument to get the result for all
functions. The proof of Theorem 18.37 is a prime example of this approach. Theorem
16.87 gives a first impression of how an equality on a dense subset translates to an equality
on the whole space.
Theorem 16.87 Let $X, Y$ be metric spaces, let $D \subseteq X$ be dense and let the functions
$f, g : X \to Y$ be continuous with $f|_D = g|_D$. Then $f = g$.
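Proof. Let $x \in X \setminus D$ and let $\{d_n\}_{n=1}^\infty$ be a sequence in $D$ so that $\lim_{n\to\infty} d_n = x$.
Then $f(x) = f\left(\lim_{n\to\infty} d_n\right) = \lim_{n\to\infty} f(d_n) = \lim_{n\to\infty} g(d_n) = g\left(\lim_{n\to\infty} d_n\right) = g(x)$. Because
$f(x) = g(x)$ for all $x \in D$ this proves $f = g$.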
Because completeness is such a useful analytical property, we conclude this section
by proving that every metric space can be viewed as a dense subspace of a complete
metric space.
Definition 16.88 Let $X, Y$ be metric spaces. A function $f : X \to Y$ is called an
isometry iff for all $x, x' \in X$ we have $d(f(x), f(x')) = d(x, x')$. If there is an
isometry $f : X \to Y$, we will also say that $X$ can be isometrically embedded into $Y$.
Theorem 16.89 Every metric space $X$ can be isometrically embedded as a dense subspace into a complete metric space $C(X)$.
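Proof. For this proof, let $d_X$ denote the metric on $X$. Define
$\mathcal{C}(X) := \left\{ \{x^{(i)}\}_{i=1}^\infty : \{x^{(i)}\}_{i=1}^\infty \text{ is a Cauchy sequence in } X \right\}$.
Let $\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$. We will first show $\{d_X(x^{(i)}, y^{(i)})\}_{i=1}^\infty$ is a
Cauchy sequence. To do this let $i, j \in \mathbb{N}$. Assume without loss of generality that
$d_X(x^{(i)}, y^{(i)}) \ge d_X(x^{(j)}, y^{(j)})$. Then
$\left| d_X(x^{(i)}, y^{(i)}) - d_X(x^{(j)}, y^{(j)}) \right| \le d_X(x^{(i)}, x^{(j)}) + d_X(x^{(j)}, y^{(j)}) + d_X(y^{(j)}, y^{(i)}) - d_X(x^{(j)}, y^{(j)}) = d_X(x^{(i)}, x^{(j)}) + d_X(y^{(i)}, y^{(j)})$,
and the right side is small for large $i, j$ because both sequences are Cauchy sequences.
Thus for any $\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$ the sequence $\{d_X(x^{(i)}, y^{(i)})\}_{i=1}^\infty$ is a
Cauchy sequence, and hence it has a limit. For $\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$, define
$d^S\left(\{x^{(i)}\}_{i=1}^\infty, \{y^{(i)}\}_{i=1}^\infty\right) := \lim_{i\to\infty} d_X(x^{(i)}, y^{(i)})$.
We claim that $d^S$ is a semimetric on the set $\mathcal{C}(X)$. Clearly, for all $x, y \in \mathcal{C}(X)$
we have $d^S(x, y) \ge 0$, $d^S(x, x) = 0$ and $d^S(x, y) = d^S(y, x)$. Now consider three
elements $x = \{x^{(i)}\}_{i=1}^\infty$, $y = \{y^{(i)}\}_{i=1}^\infty$, $z = \{z^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$. Then
$d^S(x, z) = \lim_{i\to\infty} d_X(x^{(i)}, z^{(i)}) \le \lim_{i\to\infty} \left( d_X(x^{(i)}, y^{(i)}) + d_X(y^{(i)}, z^{(i)}) \right) = d^S(x, y) + d^S(y, z)$.
Let $\sim$ be the equivalence relation on $\mathcal{C}(X)$ as in Theorem 15.61. Let $C(X)$ be the
set and let $d$ be the metric on $C(X)$ obtained from $(\mathcal{C}(X), d^S)$ via Theorem 15.61. As
in Theorem 15.61 we will denote the elements of $C(X)$ by $[x]$, where $x \in \mathcal{C}(X)$.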
We claim for every $\{x^{(i)}\}_{i=1}^\infty \in \mathcal{C}(X)$ and every $n \in \mathbb{N}$ there is an equivalent
$\{y^{(i)}\}_{i=1}^\infty$ so that for all $i, j \in \mathbb{N}$ we have $d_X(y^{(i)}, y^{(j)}) < \frac{1}{n}$. Let $n \in \mathbb{N}$.
There is an $m \in \mathbb{N}$ so that for $i, j \ge m$ we have that $d_X(x^{(i)}, x^{(j)}) < \frac{1}{n}$. The sequence
$\{y^{(i)}\}_{i=1}^\infty := \{x^{(m+i)}\}_{i=1}^\infty$ is a Cauchy sequence and clearly for all $i, j \in \mathbb{N}$ we have
$d_X(y^{(i)}, y^{(j)}) < \frac{1}{n}$. Moreover, because $\{x^{(i)}\}_{i=1}^\infty$ is a Cauchy sequence, we infer
$\lim_{i\to\infty} d_X(y^{(i)}, x^{(i)}) = \lim_{i\to\infty} d_X(x^{(m+i)}, x^{(i)}) = 0$, and hence $\{y^{(i)}\}_{i=1}^\infty \sim \{x^{(i)}\}_{i=1}^\infty$.
To prove that $C(X)$ is complete, let $\{[x_n]\}_{n=1}^\infty$ be a Cauchy sequence in $C(X)$. By
the above, for each $n \in \mathbb{N}$ we can assume without loss of generality that the sequence
$x_n = \{x_n^{(i)}\}_{i=1}^\infty$ is such that for all $i, j \in \mathbb{N}$ we have $d_X(x_n^{(i)}, x_n^{(j)}) < \frac{1}{n}$.
Define $x := \{x_n^{(1)}\}_{n=1}^\infty$. We claim that $x$ is a Cauchy sequence. Let $\varepsilon > 0$. Then
there is an $N \in \mathbb{N}$ so that $\frac{1}{N} < \frac{\varepsilon}{3}$ and for all $m, n \ge N$ we have $d([x_m], [x_n]) < \frac{\varepsilon}{3}$.
Let $n, m \ge N$ be fixed. Because $\lim_{i\to\infty} d_X(x_m^{(i)}, x_n^{(i)}) = d([x_m], [x_n]) < \frac{\varepsilon}{3}$, there is a
$k \in \mathbb{N}$ so that $d_X(x_m^{(k)}, x_n^{(k)}) < \frac{\varepsilon}{3}$. Hence, for all $m, n \ge N$ we obtain
$d_X(x_m^{(1)}, x_n^{(1)}) \le d_X(x_m^{(1)}, x_m^{(k)}) + d_X(x_m^{(k)}, x_n^{(k)}) + d_X(x_n^{(k)}, x_n^{(1)}) < \frac{1}{m} + \frac{\varepsilon}{3} + \frac{1}{n} < \varepsilon$.
Because $m, n \ge N$ were arbitrary, $x$ is a Cauchy sequence.
We now claim that $\lim_{n\to\infty} [x_n] = [x]$. Let $\varepsilon > 0$. With $N \in \mathbb{N}$ as above, so that $\frac{1}{N} < \frac{\varepsilon}{3}$, we
obtain that for all $n, i \ge N$ there is a $k \ge N$ with $d_X(x_n^{(k)}, x_i^{(k)}) < \frac{\varepsilon}{3}$, and hence
$d_X(x_n^{(i)}, x_i^{(1)}) \le d_X(x_n^{(i)}, x_n^{(k)}) + d_X(x_n^{(k)}, x_i^{(k)}) + d_X(x_i^{(k)}, x_i^{(1)}) < \frac{1}{n} + \frac{\varepsilon}{3} + \frac{1}{i} < \varepsilon$.
Therefore $d([x_n], [x]) = \lim_{i\to\infty} d_X(x_n^{(i)}, x_i^{(1)}) < \varepsilon$ and we conclude, as was claimed,
that $\lim_{n\to\infty} d([x_n], [x]) = 0$.
Clearly, the map $f : X \to C(X)$ defined by $f(x) := \left[\{x\}_{i=1}^\infty\right]$ is well-defined
and for all $x, y \in X$ we have $d_X(x, y) = d(f(x), f(y))$. This means that $X$ can be
isometrically embedded into $C(X)$.
To prove $f[X]$ is dense in $C(X)$, let $[x] \in C(X)$. Fix a representative $\{x^{(i)}\}_{i=1}^\infty$ of
$[x]$. Then the sequence $\{f(x^{(i)})\}_{i=1}^\infty$ converges to $[x]$ in $C(X)$ (Exercise 16-83).
A complete superspace Y of a metric space X so that X is dense in Y is also called
the completion of the space. Any two completions are isometrically isomorphic (see
Exercise 16-84d), so we can really speak of the completion. Thus Theorem 16.84 says
that the real numbers are the completion of the rational numbers. Indeed, one way
to construct the real numbers from the rational numbers in set theory is to apply the
construction in the proof of Theorem 16.89 to Q (see Exercise 16-93).
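A concrete Cauchy sequence of rational numbers may help to picture the elements of the completion of $\mathbb{Q}$. The sketch below is an illustration under stated assumptions, not part of the construction in the text: it uses the Newton iteration $x_{k+1} = \frac{1}{2}\left(x_k + \frac{2}{x_k}\right)$ with exact rational arithmetic, and the function name `newton_sqrt2` is hypothetical. All terms are rational, the squares of the terms approach $2$, and no term squares to exactly $2$, so the sequence has no limit in $\mathbb{Q}$ although it represents a point of the completion.

```python
from fractions import Fraction


def newton_sqrt2(steps):
    """Return the rational Newton iterates 1, 3/2, 17/12, ... for x^2 = 2."""
    x = Fraction(1)
    terms = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2   # exact arithmetic, the result is again a Fraction
        terms.append(x)
    return terms
```

Printing `newton_sqrt2(4)` shows how quickly the numerators and denominators grow while the distances between consecutive terms shrink.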
When considering function spaces, Theorem 16.86 says that $L^p[a,b]$ is the completion of $C[a,b]$ with respect to the $\|\cdot\|_p$ norm. Because continuous functions are
Riemann integrable, $L^p[a,b]$ is also the completion of the set of (equivalence classes
of) Riemann integrable functions with respect to the $\|\cdot\|_p$ norm. It is possible to introduce the $L^p$ spaces as these completions, but by explicitly defining the integrals the
elements of the completion are better understood.
Finally, note that Exercises 16-94 and 16-95 show that the completions of normed
and inner product spaces are Banach and Hilbert spaces, respectively.
Exercises
16-81. Prove Proposition 16.83.
16-82. Prove Theorem 16.84.
16-83. Finish the proof of Theorem 16.89 as follows. Let $[x] \in C(X)$. Fix a representative $\{x^{(i)}\}_{i=1}^\infty$ of
$[x]$. Prove that the sequence $\{f(x^{(i)})\}_{i=1}^\infty$ converges to $[x]$ in $C(X)$.
16-84. Extensions of continuous functions.
(a) Let $X$ be a metric space, $Y$ a complete metric space, $D \subseteq X$ dense and let $f : D \to Y$
be uniformly continuous. Prove that $f$ can be extended to a unique continuous function
$F : X \to Y$ so that $F|_D = f$.
(b) Prove that there are continuous functions $f : (0, 1] \to \mathbb{R}$ that do not have a continuous
extension $F : [0, 1] \to \mathbb{R}$ as in Exercise 16-84a.
(c) Give an example that shows that the space $Y$ in Exercise 16-84a must be complete.
(d) Prove that if $X, Y$ are metric spaces, $D_X \subseteq X$ is dense in $X$, $D_Y \subseteq Y$ is dense in $Y$ and
$f : D_X \to D_Y$ is an isometry, then there is an isometry between $X$ and $Y$.
16-85. Examples of dense subspaces.
(a) Let $(M, \Sigma, \mu)$ be a measure space. Prove that the space of "simple functions"
$S := \{ [s] : s \in F(M, \mathbb{R}) \text{ is simple} \}$ is dense in $L^\infty(M, \Sigma, \mu)$.
(b) Prove that for $1 \le p < \infty$, $C^1 := \{ [f] : f \in C^1(a, b) \}$ is dense in $L^p(a, b)$.
Hint. You can defer most of the proof to corresponding parts of the proof of Theorem 16.86.
Only the start needs to be modified.
16-86. Prove that $C[a,b]$ is not dense in $L^\infty[a,b]$.
16-87. Dense subspaces of $(C^0(X, \mathbb{R}), \|\cdot\|_\infty)$. Let $X$ be a compact metric space and let $C^0(X, \mathbb{R})$ be
the space of continuous functions from $X$ to $\mathbb{R}$ with the uniform norm $\|\cdot\|_\infty$. A vector subspace $L$
of $C^0(X, \mathbb{R})$ is called a sublattice of $C^0(X, \mathbb{R})$ iff for all $f, g \in L$ we have that $\max\{f, g\} \in L$ and
$\min\{f, g\} \in L$. A subset $S$ of $C^0(X, \mathbb{R})$ is called point-separating iff for all distinct $x, y \in X$ there is a
$g \in S$ with $g(x) \ne g(y)$. A vector subspace $A$ of $C^0(X, \mathbb{R})$ is called a subalgebra of $C^0(X, \mathbb{R})$ iff
for all $f, g \in A$ we have that $fg \in A$.
(a) Prove that if $L$ is a point-separating sublattice of $C^0(X, \mathbb{R})$ that contains the constant functions, then for all $\varepsilon > 0$, all functions $f \in C^0(X, \mathbb{R})$ and all $x \in X$ there is a $g \in L$ so that
$g(x) = f(x)$ and for all $y \in X$ we have $g(y) \le f(y) + \varepsilon$.
Hint. Set $h_x(z) := f(x)$ and for $y \in X \setminus \{x\}$ find $g \in L$ with $g(y) \ne g(x)$ and set
$h_y(z) := f(x) + (f(y) - f(x)) \frac{g(z) - g(x)}{g(y) - g(x)}$. Cover $X$ with open sets $I_y$ on which the inequality
$h_y(z) \le f(z) + \varepsilon$ holds for all $z \in I_y$. Then use a finite open subcover and minima.
(b) Prove that if $L$ is a point-separating sublattice of $C^0(X, \mathbb{R})$ that contains the constant functions, then $L$ is dense in $C^0(X, \mathbb{R})$.
Hint. Let $f \in C^0(X, \mathbb{R})$. For each $x \in X$ find an open subset $I_x \ni x$ and a function $g_x \in L$
so that $g_x(y) \le f(y) + \varepsilon$ for all $y \in X$ and so that $|g_x(z) - f(z)| < \varepsilon$ for all $z \in I_x$. Then
use a finite open subcover and maxima.
(c) Prove that if $L$ is a subspace of $C^0(X, \mathbb{R})$ so that for all $f \in L$ we have that $|f| \in L$, then $L$
is a sublattice of $C^0(X, \mathbb{R})$.
(d) Let $A$ be a subalgebra of $C^0(X, \mathbb{R})$ and let $P : \mathbb{R} \to \mathbb{R}$ be a polynomial. Prove that for all
$f \in A$ the function $P \circ f$ is also in $A$.
(e) Let $A$ be a subalgebra of the space $C^0(X, \mathbb{R})$. Prove that the subspace $\overline{A}$ of $C^0(X, \mathbb{R})$ is a
sublattice.
Hint. First prove that $\overline{A}$ is a subalgebra. Then prove that for each $f \in \overline{A}$ the function
$|f| = \sqrt{f^2}$ is also in $\overline{A}$. To prove that the square root can be taken, use Exercise 16-68c.
(f) Stone-Weierstrass Theorem. Let $X$ be compact. Prove that if $A$ is a point-separating subalgebra of $C^0(X, \mathbb{R})$ that contains the constant functions, then $A$ is dense in $C^0(X, \mathbb{R})$.
(g) Prove that the subspace of all polynomials on $[a, b]$ is dense in $C^0[a, b]$.
(h) Let $\varepsilon > 0$. Prove that the subspace of all trigonometric polynomials on $[-\pi, \pi - \varepsilon]$ is dense
in $C^0[-\pi, \pi - \varepsilon]$.
Hint. Exercise 12-15b.
(i) Why are the trigonometric polynomials not dense in $C^0[-\pi, \pi]$?
(j) Let $C := [-\pi, \pi)$ with the metric $d(x, y) := \| (\cos(x), \sin(x)) - (\cos(y), \sin(y)) \|_2$.
i. Prove that $C$ is isometrically isomorphic to $\{ z \in \mathbb{R}^2 : \|z\|_2 = 1 \}$.
ii. Prove that $C$ is compact.
iii. Prove that the trigonometric polynomials are dense in $(C^0(C, \mathbb{R}), \|\cdot\|_\infty)$.
16-88. Density of polynomials does not imply that all Taylor series converge.
(a) Let $X \subseteq Y \subseteq Z$ be metric spaces so that $X$ is dense in $Z$. Prove that $X$ is dense in $Y$.
(b) Prove that the set of polynomials is dense in $(C^\infty[a, b], \|\cdot\|_\infty)$. Derivatives on the boundary
are understood to be one-sided. You may use Exercise 16-87g.
(c) Explain why part 16-88b does not imply that every function in the space $(C^\infty[-1, 1], \|\cdot\|_\infty)$
is the limit of its Taylor series about zero.
(Lemma 18.8 will ultimately provide an example of a function whose Taylor series does not
converge to the function.)
16-89. Use Egoroff's Theorem (see Exercise 14-37c) and Theorem 16.86 to prove that for every function
$f \in L^p[a, b]$ and every $\varepsilon > 0$ there is a $B \subseteq [a, b]$ with $\lambda(B) > (b - a) - \varepsilon$ and a continuous
function $g : [a, b] \to \mathbb{R}$ so that $g|_B = f|_B$.
16-90. A metric space with a countable dense subset is called separable.
(a) Prove that every open subset in a separable metric space is a countable union of open balls.
(b) Prove that a compact metric space is separable.
Hint. Use Lemma 16.70 for $\varepsilon_k := \frac{1}{k}$ and all $k \in \mathbb{N}$.
(c) Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Prove that $\Omega$ is separable.
Hint. $\mathbb{Q}^d$ is countable.
(d) Give an example of a complete, separable metric space that is not compact.
(e) Prove that if $X, Y$ are metric spaces, $X$ is separable and $f : X \to Y$ is continuous and
surjective, then $Y$ is separable.
16-91. Let $X$ be a metric space and let $D \subseteq X$. Prove that $D$ is dense in $X$ iff $\overline{D} = X$.
16-92. Let $X$ be a metric space. List all closed dense subspaces of $X$.
16-93. Constructing the real numbers from the rational numbers. Prove that the metric space obtained when
applying the construction in the proof of Theorem 16.89 to the rational numbers is an ordered field.
Hint. Define the operations $+$ and $\cdot$ for sequences termwise and prove that they induce well-defined
operations on the equivalence classes. Then prove that these operations satisfy the field axioms.
Define $\mathbb{R}^+$ as the set of equivalence classes of sequences for which there is a positive rational number
$r$ so that the terms are eventually greater than $r$. Then prove that Axiom 1.6 is satisfied.
The above, together with Theorem 16.89, shows that the space is a complete, ordered field. The
Completeness Axiom as stated in this text can be proved using convergence of Cauchy sequences as
in Exercise 2-25.
16-94. Prove that the completion of a normed space is a Banach space.
16-95. Prove that the completion of an inner product space is a Hilbert space.
16.8 Connectedness
Intuitively, a metric space should be connected iff it is possible to get from any place
to any other place by going along an unbroken path. Unfortunately, this would define
connectedness in terms of itself, because the only interpretation of “unbroken” is “connected.” Using continuous paths to connect points also leads to some problems, which
will be explained in Example 16.96. Therefore, we define connectedness by what it
means to be disconnected. Sensibly, being disconnected should mean that the space
can be split into two nonempty pieces that are separate from each other. Two disjoint
open sets can be considered to be separate entities, because each element in each set
has nonzero distance to the respective other set.
Definition 16.90 A metric space $X$ is called disconnected iff there are two disjoint
nonempty open sets $U, V \subseteq X$ such that $U \cup V = X$. A metric space $X$ is called
connected iff no such disjoint nonempty open sets exist.
Note that if X is disconnected, then the sets U and V are closed as well as open
(Exercise 16-96). Such sets are sometimes called clopen.
Figure 39: A disconnected subset $D$ of a metric space can be separated by two open
sets $U$ and $V$ into at least two nonempty components ($D_1$ and $D_2 \cup D_3$ in this figure).
As with any topological notion, subsets of metric spaces are disconnected iff they
are disconnected as metric spaces. Figure 39 gives the idea and Exercise 16-97 investigates the properties of sets that disconnect a subset of a metric space. Most importantly,
subspaces can only be disconnected by open subsets of the surrounding space.
The most natural example of a connected set is an interval. It turns out that intervals
are the only connected subsets of the real numbers.
Theorem 16.91 A subset of the real line is connected iff it is an interval.
Proof. For "$\Rightarrow$," let $S$ be a connected subset of $\mathbb{R}$ and suppose for a contradiction
that it is not an interval. Then there are $l, r \in S$ with $l < r$ such that there is an
$m \in \mathbb{R} \setminus S$ with $l < m < r$. But then $U := S \cap (-\infty, m)$ and $V := S \cap (m, \infty)$ are both
open in $S$, nonempty and disjoint. This implies that $S$ is disconnected, contradiction.
For "$\Leftarrow$," let $I$ be an interval. Suppose for a contradiction that $I$ is not connected
and let $U, V \subseteq I$ be disjoint nonempty open subsets of $I$ so that $U \cup V = I$. Let $c \in I$.
Then $c \in U$ or $c \in V$. Without loss of generality assume that $c \in U$. Then there is
an element $v \in V$ so that $v > c$ or $v < c$. Without loss of generality assume $v > c$.
Consider the set $W := \{ x \in V : x > c \}$, which is nonempty and bounded below by $c$.
Let $b := \inf W$. Then because $c \notin V$ and $U$ is open we infer $c < b$.
We claim that $b \notin W$. Indeed, otherwise $b \in V$ and then there is an $\varepsilon > 0$ so that
$I \cap (b - \varepsilon, b + \varepsilon) \subseteq V$. Because $c < b$ and $c \notin V$ we obtain $c \le b - \varepsilon$. Because $I$ is
an interval, we infer $(b - \varepsilon, b] \subseteq I$, and then $(b - \varepsilon, b] \subseteq W$, contradicting the choice
of $b$.
Thus $b \notin W$. Because $v > b > c$ and $I$ is an interval we obtain $b \in I$, and hence
$b \in U$. But then there is an $\varepsilon > 0$ so that $I \cap (b - \varepsilon, b + \varepsilon) \subseteq U$. In particular, this
means that $\inf(W) \ge b + \varepsilon$, contradicting the choice of $b$.
Continuous functions satisfy an abstract version of the Intermediate Value Theorem, with connected spaces taking the place of the intervals on the real line.
Theorem 16.92 Let $X, Y$ be metric spaces and let $f : X \to Y$ be continuous. If $X$ is
connected, then the image $f[X]$ is a connected subspace of $Y$.
Proof. Suppose for a contradiction that $f[X]$ is not connected. Then there are disjoint nonempty sets $U, V \subseteq f[X]$ that are open in $f[X]$ and satisfy $U \cup V = f[X]$.
But then $f^{-1}[U]$ and $f^{-1}[V]$ are disjoint nonempty open subsets of $X$ so that the
equality $f^{-1}[U] \cup f^{-1}[V] = X$ holds, contradicting the assumption that $X$ is connected. Thus $f[X]$ must be connected.
The Intermediate Value Theorem is more readily recognized in the following result.
Corollary 16.93 Intermediate Value Theorem. Let $X$ be a connected metric space
and let $f : X \to \mathbb{R}$ be continuous. Then for all $a, b \in X$ with $f(a) \ne f(b)$ and all $t$
between $f(a)$ and $f(b)$ there is a $c \in X$ with $f(c) = t$.
Proof. Exercise 16-98.
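For the special case in which $X$ is an interval $[a, b]$ of the real line, the point $c$ from the Intermediate Value Theorem can be approximated by bisection. The following sketch is an illustration only: the function name `bisect_value`, the tolerance, and the iteration cap are assumptions, and the abstract corollary does not supply such an algorithm for general connected spaces.

```python
def bisect_value(f, a, b, t, tol=1e-12, max_iter=200):
    """Approximate c between a and b with f(c) = t for a continuous f.

    Requires that t lies between f(a) and f(b).
    """
    fa, fb = f(a), f(b)
    if not (min(fa, fb) <= t <= max(fa, fb)):
        raise ValueError("t must lie between f(a) and f(b)")
    # Arrange the endpoints so that f(lo) <= t <= f(hi).
    lo, hi = (a, b) if fa <= fb else (b, a)
    for _ in range(max_iter):
        if abs(hi - lo) <= tol:
            break
        mid = (lo + hi) / 2
        if f(mid) <= t:
            lo = mid   # keeps f(lo) <= t
        else:
            hi = mid   # keeps t < f(hi)
    return (lo + hi) / 2
```

For example, `bisect_value(lambda x: x**3 - x, 1.0, 2.0, 3.0)` approximates a point $c \in [1, 2]$ with $c^3 - c = 3$.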
By using images of intervals in Theorem 16.92 we obtain the idea of connectedness
that was mentioned in the introduction.
Definition 16.94 A metric space $X$ is called pathwise connected iff for any two points
$x, y \in X$ there is a continuous function $f : [0, 1] \to X$ such that $f(0) = x$ and
$f(1) = y$.
Some examples of pathwise connected sets are given in Exercise 16-99. The next
result shows that these are also examples of connected sets.
Theorem 16.95 Let X be a pathwise connected metric space. Then X is connected.
Proof. Suppose for a contradiction that $X$ is not connected. Then there are two
nonempty disjoint open subsets $U$ and $V$ of $X$ so that $U \cup V = X$. Let $u \in U$ and
$v \in V$ and let $f : [0, 1] \to X$ be continuous with $f(0) = u$ and $f(1) = v$. Then
$f^{-1}[U]$ and $f^{-1}[V]$ are disjoint nonempty open subsets of $[0, 1]$ whose union is $[0, 1]$, which is impossible
because $[0, 1]$ is connected.
Exercise 16-100 will show that connectedness and pathwise connectedness are
equivalent for open subsets of normed spaces. However, in general pathwise connectedness is strictly stronger than connectedness.
Example 16.96 The space $X := \{ (x, 0) : x \le 0 \} \cup \left\{ \left(x, \sin\left(\frac{1}{x}\right)\right) : x > 0 \right\}$ is connected,
but it is not pathwise connected.
The space $X$ is not pathwise connected, because there is no continuous function
$f : [0, 1] \to X$ such that $f(0) = \left(\frac{1}{\pi}, 0\right)$ and $f(1) = (0, 0)$. On the other hand,
both $\{ (x, 0) : x \le 0 \}$ and $\left\{ \left(x, \sin\left(\frac{1}{x}\right)\right) : x > 0 \right\}$ are pathwise connected, and hence
connected. Thus, if $X$ was disconnected, there would be open sets $U, V \subseteq \mathbb{R}^2$ so that
$\{ (x, 0) : x \le 0 \} \subseteq U$ and $\left\{ \left(x, \sin\left(\frac{1}{x}\right)\right) : x > 0 \right\} \subseteq V$. But $(0, 0) \in U$ implies that
$U$ intersects $\left\{ \left(x, \sin\left(\frac{1}{x}\right)\right) : x > 0 \right\}$, so this is not possible.
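The obstruction to a continuous path in the example can be seen numerically. The sketch below is an illustration, not a proof, and it assumes the curve is the graph of $\sin\left(\frac{1}{x}\right)$; the function name `peaks` is hypothetical. It lists points on the curve with $\frac{1}{x_k} = \frac{\pi}{2} + 2\pi k$: their first coordinates tend to $0$ while their second coordinates stay at $1$, so points of the curve with arbitrarily small $x > 0$ keep a distance of at least $1$ from $(0, 0)$.

```python
import math


def peaks(count):
    """Return points (x_k, sin(1/x_k)) with 1/x_k = pi/2 + 2*pi*k for k = 0, ..., count - 1."""
    points = []
    for k in range(count):
        x = 1 / (math.pi / 2 + 2 * math.pi * k)
        points.append((x, math.sin(1 / x)))
    return points
```

Calling `peaks(50)` produces first coordinates that decrease toward zero while every second coordinate equals $1$ up to rounding.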
Exercises
16-96. Prove that a metric space $X$ is disconnected iff there are disjoint nonempty closed sets $C, D \subseteq X$ so that $C \cup D = X$.
16-97. Disconnected subspaces. Let $X$ be a metric space and let $S \subseteq X$.
(a) Prove that $S$ is disconnected iff there are disjoint nonempty subsets $U, V \subseteq X$ that are open in $X$ such that $S \subseteq U \cup V$.
(b) Give an example that shows that a subspace $S$ can be disconnected and there are no disjoint nonempty subsets $C, D \subseteq X$ that are closed in $X$ such that $S \subseteq C \cup D$.
16-98. Prove Corollary 16.93.
16-99. Let $X$ be a normed space.
(a) Prove that for any two points $a, b \in X$ the function $f(t) := a + t(b - a)$ is continuous.
(b) Prove that $X$ is pathwise connected.
(c) Prove that for all $x \in X$ and all $\varepsilon > 0$ the ball $B_\varepsilon(x)$ is pathwise connected.
16-100. Let $X$ be a normed space and let $\Omega \subseteq X$ be open. Prove that $\Omega$ is connected iff it is pathwise connected.
16-101. Let $X, Y$ be metric spaces and let $f : X \to Y$ be continuous. Prove that if $X$ is pathwise connected, then $f[X]$ is a pathwise connected subspace of $Y$.
16-102. Unit "circles."
(a) Prove that $\{ (x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1 \}$ is connected.
Hint. $c(t) = \cos(t) e_1 + \sin(t) e_2$.
(b) Let $1 \le p < \infty$. Prove that $\{ (x,y) \in \mathbb{R}^2 : \|(x,y)\|_p = 1 \}$ is connected.
(c) Prove that $\{ (x,y) \in \mathbb{R}^2 : \|(x,y)\|_\infty = 1 \}$ is connected.
16-103. Connected components. Let $X$ be a metric space. A subset $C \subseteq X$ is called a component of $X$ iff $C$ is connected and there is no proper superset $D \supsetneq C$ that is connected.
(a) Prove that if $A, B \subseteq X$ are connected and $A \cap B \ne \emptyset$, then $A \cup B$ is connected.
(b) Prove that if $C_1, C_2$ are components of $X$ then either $C_1 = C_2$ or $C_1 \cap C_2 = \emptyset$.
(c) Prove that every $x \in X$ is contained in a component of $X$.
(d) Prove that every open subset of $\mathbb{R}$ is a countable union of pairwise disjoint open intervals.
16.9 Locally Compact Spaces
The goal of this section is to construct (families of) continuous functions that are
equal to one on a specified set and equal to zero on another set. To construct such
functions, we introduce the distance function.
Definition 16.97 Let $X$ be a metric space and let $A \subseteq X$ be nonempty. For all $x \in X$, we define $\operatorname{dist}(x, A) := \inf \{ d(x,a) : a \in A \}$ and call it the distance from $x$ to $A$.
Lemma 16.98 Let $X$ be a metric space and let $A \subseteq X$ be nonempty. Then the function $\operatorname{dist}(\cdot, A)$ is Lipschitz continuous.
Proof. Let $x, y \in X$, let $\varepsilon > 0$ and let $a \in A$ be so that $d(y,a) \le \operatorname{dist}(y, A) + \varepsilon$. Then $\operatorname{dist}(x, A) \le d(x,a) \le d(x,y) + d(y,a) \le d(x,y) + \operatorname{dist}(y, A) + \varepsilon$, and hence $\operatorname{dist}(x, A) - \operatorname{dist}(y, A) \le d(x,y) + \varepsilon$. Because $\varepsilon > 0$ was arbitrary this means that $\operatorname{dist}(x, A) - \operatorname{dist}(y, A) \le d(x,y)$. We can prove $\operatorname{dist}(y, A) - \operatorname{dist}(x, A) \le d(x,y)$ in similar fashion. Hence, $|\operatorname{dist}(x, A) - \operatorname{dist}(y, A)| \le d(x,y)$ and $\operatorname{dist}(\cdot, A)$ is Lipschitz continuous with Lipschitz constant 1. ∎
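As a numerical illustration of Lemma 16.98 (a sketch only, not part of the proof), the following Python snippet checks the Lipschitz estimate on the real line with $d(x,y) = |x - y|$. The finite set `A` and the random sample points are hypothetical choices made for this example.

```python
import random

def dist(x, A):
    """Distance from the point x to the finite nonempty set A in the real line."""
    return min(abs(x - a) for a in A)

A = [-2.0, 0.5, 3.0]  # hypothetical sample set, chosen only for illustration
random.seed(0)
pairs = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]

# Lemma 16.98 predicts |dist(x, A) - dist(y, A)| <= d(x, y) for every pair,
# so the largest violation should be (up to rounding) at most zero.
worst = max(abs(dist(x, A) - dist(y, A)) - abs(x - y) for x, y in pairs)
print(worst <= 1e-12)
```

The small tolerance only absorbs floating point rounding; the inequality itself holds exactly.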
Lemma 16.99 below says that for any closed set C that is contained in an open set
U the distance function allows us to slip another open set V between C and U so that
the closure of V is also between C and U . For an illustration of Lemma 16.99, see
Figure 40 on page 336.
Lemma 16.99 Let $X$ be a metric space, let $C \subseteq X$ be closed and let $U \subseteq X$ be open so that $C \subseteq U$. Then there is a continuous function $f : X \to [0,1]$ so that $f|_C = 1$ and $f|_{X \setminus U} = 0$. Moreover, there is an open set $V$ so that $C \subseteq V \subseteq \overline{V} \subseteq U$.
Proof. For each $x \in X$, let
$$f(x) := \frac{\operatorname{dist}(x, X \setminus U)}{\operatorname{dist}(x, X \setminus U) + \operatorname{dist}(x, C)}.$$
Because $C$ and $X \setminus U$ are disjoint closed sets, the denominator is greater than zero for all $x \in X$ (see Exercise 16-104b). Thus $f$ is continuous on $X$. Moreover (see Exercise 16-105), $f|_C = 1$ and $f|_{X \setminus U} = 0$.
To prove the claim about the sets, let $V := f^{-1}\left[ \left( \frac{1}{2}, 1 \right] \right]$. Because $\left( \frac{1}{2}, 1 \right]$ is open in $[0,1]$ and $f$ is continuous, $V$ is an open set in $X$ and because $f|_C = 1$ it contains $C$. Moreover, because $f$ is continuous, for all $x \in \overline{V}$ we infer $f(x) \ge \frac{1}{2}$, which means that $\overline{V} \subseteq U$. ∎
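The construction in the proof of Lemma 16.99 can be made concrete in a simple case. The sketch below assumes $X = \mathbb{R}$ with the usual metric, $C = [-1,1]$ and $U = (-2,2)$; these sets and the function names are illustrative choices, not taken from the text.

```python
def dist_to_C(x):
    """dist(x, C) for the closed set C = [-1, 1] in the real line."""
    return max(0.0, abs(x) - 1.0)

def dist_to_complement_of_U(x):
    """dist(x, X minus U) for the open set U = (-2, 2) in the real line."""
    return max(0.0, 2.0 - abs(x))

def f(x):
    """The quotient from the proof of Lemma 16.99 for this choice of C and U."""
    num = dist_to_complement_of_U(x)
    # The denominator is positive because C and the complement of U are
    # disjoint closed sets, so both distances cannot vanish at the same point.
    return num / (num + dist_to_C(x))

for x in [0.0, 1.0, 1.25, 1.5, 2.0, 3.0]:
    print(x, f(x))
```

In this example $f > \frac{1}{2}$ holds exactly for $|x| < 1.5$, so the set $V$ from the proof is the interval $(-1.5, 1.5)$, whose closure $[-1.5, 1.5]$ lies between $C$ and $U$.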
For compact sets C, we would like to separate C from its neighborhood U with
an open set V whose closure is compact. While this is not possible in general (see
Exercise 16-108), it is possible in spaces with sufficiently many compact subsets. Local
compactness guarantees that locally there are enough compact subsets by demanding
that every point has a compact neighborhood.
Definition 16.100 A metric space $X$ is called locally compact iff every $x \in X$ has a compact neighborhood.
Proposition 16.101 A metric space $X$ is locally compact iff for every $x \in X$ there is an $\varepsilon > 0$ so that $\overline{B_\varepsilon(x)}$ is compact.
Proof. For "$\Rightarrow$," let $X$ be locally compact and let $x \in X$. Then $x$ has a compact neighborhood $N$. Let $\varepsilon > 0$ be so that $\overline{B_\varepsilon(x)} \subseteq N$. Then by Proposition 16.61 the set $\overline{B_\varepsilon(x)}$ is compact.
Conversely, let $X$ be so that for every $x \in X$ there is an $\varepsilon > 0$ so that $\overline{B_\varepsilon(x)}$ is compact. Then for every $x \in X$ the neighborhood $N := \overline{B_\varepsilon(x)}$ is compact. ∎
It is easy to infer from Proposition 16.101 that all open subsets of $\mathbb{R}^d$ and all closed subsets of $\mathbb{R}^d$ are locally compact. In particular, we obtain that surfaces like the unit sphere $\{ x \in \mathbb{R}^d : \|x\|_2 = 1 \}$ are locally compact. More generally, we can say the following.
Definition 16.102 Let $X, Y$ be metric spaces. Then $f : X \to Y$ is called a homeomorphism iff $f$ is continuous and bijective and its inverse is continuous, too.
Example 16.103 Any metric space for which each point has a neighborhood that is homeomorphic to an open set in $\mathbb{R}^d$ is locally compact. If the dimension $d$ does not depend on the point, we call the space a manifold. Manifolds are discussed in detail in Chapter 19. Surfaces such as the unit sphere are manifolds. For much of the following, solids and surfaces in $\mathbb{R}^d$ are a good visualization and motivation. □
In locally compact spaces, between any compact set C and any open neighborhood
U of C we can slip a compact neighborhood of C (also see Figure 40(a)).
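Lemma 16.104 Let $X$ be a locally compact metric space, let $C \subseteq X$ be compact and let $U \subseteq X$ be open with $C \subseteq U$. Then $C$ has a neighborhood $V$ so that $\overline{V}$ is compact and contained in $U$.
Proof. For each $c \in C$, there is an $\varepsilon_c > 0$ so that $\overline{B_{\varepsilon_c}(c)}$ is compact and contained in $U$. Because $C \subseteq \bigcup_{c \in C} B_{\varepsilon_c}(c)$ and $C$ is compact, there are $c_1, \ldots, c_n \in C$ so that $C \subseteq \bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)$. But then $V := \bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)$ is a neighborhood of $C$ so that by Exercises 16-44b and 16-110 the closure
$$\overline{V} = \overline{\bigcup_{j=1}^{n} B_{\varepsilon_{c_j}}(c_j)} = \bigcup_{j=1}^{n} \overline{B_{\varepsilon_{c_j}}(c_j)}$$
is compact. Moreover, because all $\overline{B_{\varepsilon_{c_j}}(c_j)}$ are contained in $U$, the union is contained in $U$, too. ∎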
Standard Proof Technique 16.105 The argument in the proof of Lemma 16.104 is a typical application of the Heine-Borel formulation of compactness. We start with an open cover and because the notion we want to preserve is only preserved by finite unions, we use a finite subcover. □
Local compactness only applies near individual points. As it turns out, for connected spaces we can turn this local idea into a property that allows us to use compactness in a more global fashion.
Definition 16.106 A metric space $X$ is called $\sigma$-compact iff $X$ is the union of countably many compact sets.
Clearly, closed subsets of $\mathbb{R}^d$ are $\sigma$-compact, because their intersections with the closed balls $\overline{B_n(0)}$ are compact. More specifically we can say the following.
Theorem 16.107 Let $X$ be a connected, locally compact metric space. Then $X$ is $\sigma$-compact.
Figure 40: Part (a) illustrates Lemmas 16.99 and 16.104, which say that between any closed (compact) set $C$ and any open set $U$ surrounding $C$ there is an open set $V$ so that $V$ and $\overline{V}$ are "between" $C$ and $U$. Part (b) shows a $\sigma$-compact space and the idea for the proof of Theorem 16.112. The concentric circles depict a compact exhaustion (the sets $K_n$ in the proof of Theorem 16.112). The shells $S_n$ and their neighborhoods $U_n$ are set up so that only finitely many of the $U_n$ can intersect.
Proof. For each $x \in X$, let $r_x := \sup \left\{ r > 0 : \overline{B_r(x)} \text{ is compact} \right\}$. Because $X$ is locally compact, each $r_x$ is greater than zero. If $x \in X$ is so that $r_x$ is infinity, then for all $n \in \mathbb{N}$ the set $\overline{B_n(x)}$ is compact, and hence $X = \bigcup_{n=1}^{\infty} \overline{B_n(x)}$ is $\sigma$-compact. This leaves the case in which each $r_x$ is finite.
In this case, we first prove that the function $x \mapsto r_x$ is continuous. Let $x, z \in X$ be so that $d(x,z) < r_x$. Then for all $r \in (d(x,z), r_x)$ we infer $\overline{B_{r - d(x,z)}(z)} \subseteq \overline{B_r(x)}$ and the ball on the right is compact. Hence, $r_z \ge r_x - d(x,z)$, that is, $r_x - r_z \le d(x,z)$. Reversing the roles of $x$ and $z$ we can also prove $r_z - r_x \le d(x,z)$, which means $|r_x - r_z| \le d(x,z)$ for all $z$ with $d(x,z) < r_x$. Hence, $x \mapsto r_x$ is continuous at each $x \in X$.
Now for each compact subset $C$ of $X$ we define $N(C) := \bigcup_{c \in C} \overline{B_{r_c/2}(c)}$. We claim that the set $N(C)$ is compact. Let $\{x_n\}_{n=1}^{\infty}$ be a sequence in $N(C)$. For each $x_n$ there is a $c_n \in C$ so that $x_n \in \overline{B_{r_{c_n}/2}(c_n)}$. Because $C$ is compact, $\{c_n\}_{n=1}^{\infty}$ has a convergent subsequence with limit $c \in C$. Without loss of generality we can assume that $\{c_n\}_{n=1}^{\infty}$ itself converges to $c$. Then there is an $N \in \mathbb{N}$ so that for all $n \ge N$ the inequalities $d(c_n, c) < \frac{r_c}{4}$ and $|r_{c_n} - r_c| < \frac{r_c}{4}$ hold. But then for all $n \ge N$ we have $\frac{r_{c_n}}{2} < \frac{5 r_c}{8}$, and hence
$$d(x_n, c) \le d(x_n, c_n) + d(c_n, c) < \frac{r_{c_n}}{2} + \frac{r_c}{4} < \frac{7 r_c}{8},$$
so $x_n \in B_{7r_c/8}(c)$. Because $\overline{B_{7r_c/8}(c)}$ is compact, $\{x_n\}_{n=1}^{\infty}$ has a convergent subsequence. This proves that $N(C)$ is compact.
Now let $x \in X$ be arbitrary and let $C_1 := \{x\}$. Recursively define $C_{n+1} := N(C_n)$ for $n \in \mathbb{N}$. Let $H := \bigcup_{n=1}^{\infty} C_n$. Then because each $C_n$ is compact, $H$ is $\sigma$-compact. Moreover, because each $C_{n+1}$ is a neighborhood of every element of $C_n$, $H$ is open. To see that $H$ is closed, let $x \in \overline{H}$. Then there is an $h \in H$ so that $d(h, x) < \frac{r_x}{4}$ and $|r_h - r_x| < \frac{r_x}{4}$. But then $\frac{r_h}{2} > \frac{3 r_x}{8} > \frac{r_x}{4}$, which means that if $n \in \mathbb{N}$ is so that $h \in C_n$, then $x \in N(C_n)$. Hence, $H = \overline{H}$ is closed and so $X \setminus H$ is open. Because $X$ is connected, this means that $X = H$ and since $H$ is $\sigma$-compact the result is proved. ∎
We conclude this section by proving that locally compact spaces have a partition of
unity (see Definition 16.110 below) with certain properties.
Definition 16.108 The cover $\mathcal{O}$ of the metric space $X$ is called locally finite iff each $p \in X$ has a neighborhood that intersects only finitely many elements of $\mathcal{O}$.
Definition 16.109 Let $X$ be a metric space and let $f : X \to \mathbb{R}$. Then the support of $f$ is defined to be $\operatorname{supp}(f) := \overline{\{ x \in X : f(x) \ne 0 \}}$.
Definition 16.110 A family $\{\varphi_j\}_{j \in J}$ of continuous functions on a metric space $X$ is called a partition of unity iff
1. The collection $\{ \{ x \in X : \varphi_j(x) \ne 0 \} \}_{j \in J}$ is a locally finite cover of $X$.
2. For all $x \in X$ we have that $\sum_{j \in J} \varphi_j(x) = 1$. (By part 1 for each $x \in X$ this sum has only finitely many nonzero terms.)
If $\mathcal{O}$ is an open cover of $X$ and for each $j \in J$ the containment $\operatorname{supp}(\varphi_j) \subseteq U$ holds for some $U \in \mathcal{O}$, then the partition of unity is called subordinate to $\mathcal{O}$.
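A simple instance of Definition 16.110 on $X = \mathbb{R}$ may help. The sketch below uses "tent" functions $\varphi_j(x) := \max\{0, 1 - |x - j|\}$ for $j \in \mathbb{Z}$; this family is an illustrative assumption and is not taken from the text. Each $\varphi_j$ is nonzero exactly on $(j-1, j+1)$, these intervals form a locally finite cover of $\mathbb{R}$, and the values at any point sum to one.

```python
import math

def phi(j, x):
    """Tent function centered at the integer j; nonzero exactly on (j - 1, j + 1)."""
    return max(0.0, 1.0 - abs(x - j))

def nonzero_indices(x):
    """Indices j in a window around x for which phi(j, x) is nonzero."""
    k = math.floor(x)
    return [j for j in range(k - 2, k + 4) if phi(j, x) > 0.0]

def partition_sum(x):
    # Outside the window used in nonzero_indices every phi(j, x) is zero,
    # so this finite sum equals the full sum over all integers j.
    return sum(phi(j, x) for j in nonzero_indices(x))

for x in [-3.7, 0.0, 0.25, 2.5, 10.9]:
    print(x, nonzero_indices(x), partition_sum(x))
```

Because $\operatorname{supp}(\varphi_j) = [j-1, j+1]$, this partition of unity is subordinate, for example, to the open cover of $\mathbb{R}$ by the intervals $(j - 1.5, j + 1.5)$.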
The importance of partitions of unity will become clear in Section 19.5. Until then,
consider the following. Many surfaces in Rd cannot be parametrized with just one
function that has an open domain. (Open domains are needed, because differentiable
functions typically have open domains, see Chapter 17.) For example, a parametrization of the unit sphere, say, with spherical coordinates, will always either hit a few points twice or it will miss at least a "seam." This is because a parametrization must be a homeomorphism and the unit sphere is compact, which means it cannot be homeomorphic to an open subset of a Euclidean space. However, roughly speaking, a function is integrated over the sphere by integrating its composition with the right parametrization. Thus, it is problematic to double count points or miss points. Either case would distort the integral. This problem does not arise for functions that are zero except on some small open set. For such functions, we could simply use a parametrization for which the missed seam does not intersect the support of the function. A partition of unity $\{\varphi_j\}_{j \in J}$ allows us to represent arbitrary functions $f$ as sums $\sum_{j \in J} \varphi_j f$ of functions $\varphi_j f$ whose supports are contained in "small" open sets. We can then integrate these functions separately and the overall sum will be the integral of $f$. There is still a tremendous amount of detail left to be considered (think about independence of the parametrization), and this is why we will later need partitions of unity subordinate to open covers that have further nice properties.
Because not every cover is locally finite, we need the notion of a refinement.
Definition 16.111 Let $\mathcal{O}$ be a cover of the metric space $X$. A cover $\tilde{\mathcal{O}}$ is called a refinement of $\mathcal{O}$ iff for all $U \in \tilde{\mathcal{O}}$ there is a $V \in \mathcal{O}$ so that $U \subseteq V$.
Theorem 16.112 Let $\mathcal{O}$ be an open cover of the locally compact, $\sigma$-compact metric space $X$. Then $\mathcal{O}$ has a countable locally finite open refinement. Moreover, the closures of the sets in the refinement are compact.
Proof. There is nothing to prove if $X$ is compact. If $X$ is not compact, let $\{C_j\}_{j=1}^{\infty}$ be a sequence of compact sets with $X = \bigcup_{j=1}^{\infty} C_j$. Let $K_1 := C_1$ and once $K_1, \ldots, K_n$ are defined, let $K_{n+1}$ be a compact neighborhood of $K_n \cup C_n$ so that $K_n \cup C_n \subseteq K_{n+1}^{\circ}$. Then $X = \bigcup_{n=1}^{\infty} K_n$ and for all $n \in \mathbb{N}$ we have the containment $K_n \subseteq K_{n+1}^{\circ}$.
Let $K_{-1} := K_0 := \emptyset$. For all $n \in \mathbb{N}$ let $S_n := K_n \setminus K_{n-1}^{\circ}$ and $U_n := K_{n+1}^{\circ} \setminus K_{n-2}$. Then $X = \bigcup_{j=1}^{\infty} S_j$ and for all $n \in \mathbb{N}$ the set $S_n$ is compact, $U_n$ is open and $S_n \subseteq U_n$. For $|n - m| > 1$, we have $S_n \cap S_m = \emptyset$ and for all $|n - m| > 2$ we have $U_n \cap U_m = \emptyset$ (also see the right part of Figure 40). For each $x \in S_n$, let $N_x$ be an open neighborhood of $x$ that is contained in $U_n$ and in some $O \in \mathcal{O}$. Then $S_n \subseteq \bigcup_{x \in S_n} N_x$ and because $S_n$ is compact, there are $x_1^{(n)}, \ldots, x_{k_n}^{(n)} \in S_n$ so that $S_n \subseteq \bigcup_{j=1}^{k_n} N_{x_j^{(n)}}$.
We define $\tilde{\mathcal{O}} := \left\{ N_{x_j^{(n)}} : n \in \mathbb{N}, j = 1, \ldots, k_n \right\}$. Clearly, $\tilde{\mathcal{O}}$ is countable. Because $X = \bigcup_{j=1}^{\infty} S_j$ and $S_n \subseteq \bigcup_{j=1}^{k_n} N_{x_j^{(n)}}$ for all $n \in \mathbb{N}$ we conclude that $\tilde{\mathcal{O}}$ is a cover of $X$. All $N_{x_j^{(n)}}$ are open and contained in an $O \in \mathcal{O}$, so $\tilde{\mathcal{O}}$ is an open refinement of $\mathcal{O}$. Finally, to see that $\tilde{\mathcal{O}}$ is locally finite, let $x \in X$. Then there is a $k \in \mathbb{N}$ so that $x \notin U_m$, unless $m \in \{k, k+1, k+2\}$. Any $N_{x_j^{(n)}}$ that intersects $U_k \cup U_{k+1} \cup U_{k+2}$ must be contained in one of $U_{k-2}, \ldots, U_{k+4}$. These sets contain finitely many $N_{x_j^{(n)}}$ each. Therefore $x$ can be in at most finitely many $N_{x_j^{(n)}}$, and hence $\tilde{\mathcal{O}}$ is locally finite. Finally, because each $N_x$ is contained in a $U_n \subseteq K_{n+1}$, the closure of each $N_x$ is compact. ∎
As it turns out, any locally finite open cover of a $\sigma$-compact space is countable.
Proposition 16.113 Let $\mathcal{O}$ be a locally finite open cover of the $\sigma$-compact metric space $X$. Then $\mathcal{O}$ is countable.
Proof. Exercise 16-109. ∎
To construct a partition of unity subordinate to a locally finite open cover we need
to find a way to define functions that are supported inside the open sets so that the sets
where the functions are not zero cover the whole metric space. To do that, we need to
be able to shrink every set in the open cover a little bit while making sure that we still
cover the whole space.
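Theorem 16.114 Shrinking Lemma. Let $\mathcal{O}$ be a locally finite open cover of the locally compact, $\sigma$-compact metric space $X$. Then for each $U \in \mathcal{O}$ there is an open set $V_U$ so that $\overline{V_U} \subseteq U$ and so that $\tilde{\mathcal{O}} := \{ V_U : U \in \mathcal{O} \}$ is a locally finite open cover of $X$.
Proof. By Proposition 16.113, $\mathcal{O}$ is at most countable, so let $\{ U_n : n \in \mathbb{N} \} := \mathcal{O}$. (For finite covers $\mathcal{O}$, the construction below terminates in finitely many steps with the desired cover $\tilde{\mathcal{O}}$.) The set $C_1 := U_1 \setminus \bigcup_{n=2}^{\infty} U_n = X \setminus \bigcup_{n=2}^{\infty} U_n$ is closed and contained in $U_1$. By Lemma 16.99, there is an open set $V_1$ so that $C_1 \subseteq V_1 \subseteq \overline{V_1} \subseteq U_1$. Then $\mathcal{O}_1 := \{V_1\} \cup \{ U_n : n > 1 \}$ is an open cover of $X$ and $\overline{V_1} \subseteq U_1$.
Once an open cover $\mathcal{O}_k = \{ V_1, \ldots, V_k \} \cup \{ U_n : n > k \}$ has been constructed so that $\overline{V_j} \subseteq U_j$ holds for $j = 1, \ldots, k$, let
$$C_{k+1} := U_{k+1} \setminus \left( \bigcup_{j=1}^{k} V_j \cup \bigcup_{n=k+2}^{\infty} U_n \right).$$
Then $C_{k+1}$ is closed and contained in $U_{k+1}$. By Lemma 16.99, there is an open set $V_{k+1}$ so that $C_{k+1} \subseteq V_{k+1} \subseteq \overline{V_{k+1}} \subseteq U_{k+1}$. Let $\mathcal{O}_{k+1} := \{ V_1, \ldots, V_k, V_{k+1} \} \cup \{ U_n : n > k+1 \}$. Then $\mathcal{O}_{k+1}$ is an open cover of $X$ and $\overline{V_j} \subseteq U_j$ for all $j = 1, \ldots, k+1$.
Now consider $\tilde{\mathcal{O}} := \{ V_j : j \in \mathbb{N} \}$. Because $\mathcal{O}$ is locally finite, for each $x \in X$ there is an $N \in \mathbb{N}$ so that $x \notin U_n$ for all $n \ge N$. But then, because $\mathcal{O}_N$ was an open cover of $X$, there must be a $j \le N$ so that $x \in V_j$. Hence, $\tilde{\mathcal{O}}$ is a cover of $X$. By construction, for all $j \in \mathbb{N}$ the set $V_j$ is open and satisfies $\overline{V_j} \subseteq U_j$. Because $\mathcal{O}$ is locally finite, $\tilde{\mathcal{O}}$ must also be locally finite and thus the result is proved. ∎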
Theorem 16.115 Let $\mathcal{O}$ be an open cover of the locally compact, $\sigma$-compact metric space $X$. Then there is a partition of unity subordinate to $\mathcal{O}$.
Proof. Let $\mathcal{U}$ be a locally finite open refinement of $\mathcal{O}$ as guaranteed by Theorem 16.112. Let $\tilde{\mathcal{U}}$ be a locally finite open cover so that for every $U \in \mathcal{U}$ there is a $V_U \in \tilde{\mathcal{U}}$ so that $\overline{V_U} \subseteq U$, as guaranteed by the Shrinking Lemma. For each $U \in \mathcal{U}$, let $W_U$ be an open set so that $\overline{V_U} \subseteq W_U \subseteq \overline{W_U} \subseteq U$ as provided by Lemma 16.99 and let $\psi_U : X \to [0,1]$ be a continuous function so that $\psi_U|_{\overline{V_U}} = 1$ and $\psi_U|_{X \setminus W_U} = 0$ as provided by Lemma 16.99. Because $\mathcal{U}$ is locally finite, each $x \in X$ has a neighborhood $V$ so that for all $v \in V$ the equality $\psi_U(v) = 0$ holds for all but finitely many $U \in \mathcal{U}$. Because $\tilde{\mathcal{U}}$ is a cover of $X$, for at least one $U \in \mathcal{U}$ we have $\psi_U(x) \ne 0$. Hence, for all $x \in X$ the sum $\psi(x) := \sum_{U \in \mathcal{U}} \psi_U(x)$ is a positive real number. Moreover, for each $x \in X$ on a neighborhood $V$ of $x$ the function $\psi$ is the sum of finitely many continuous functions. Hence, $\psi$ is continuous on this neighborhood of $x$ and so $\psi$ is continuous at $x$. Because $x$ was arbitrary, $\psi$ is continuous on $X$.
For all $U \in \mathcal{U}$ define $\varphi_U := \frac{\psi_U}{\psi}$. For each $U \in \mathcal{U}$, we have $\{ x \in X : \varphi_U(x) \ne 0 \} = \{ x \in X : \psi_U(x) \ne 0 \}$. Then $\{ \{ x \in X : \varphi_U(x) \ne 0 \} \}_{U \in \mathcal{U}}$ is locally finite. Moreover, for all $x \in X$ we have
$$\sum_{U \in \mathcal{U}} \varphi_U(x) = \sum_{U \in \mathcal{U}} \frac{\psi_U(x)}{\psi(x)} = \frac{\psi(x)}{\psi(x)} = 1.$$
Hence, $\{\varphi_U\}_{U \in \mathcal{U}}$ is a partition of unity. For each $U$, we have $\operatorname{supp}(\varphi_U) \subseteq U \subseteq O$ for some $O \in \mathcal{O}$, so the partition of unity $\{\varphi_U\}_{U \in \mathcal{U}}$ is subordinate to $\mathcal{O}$. ∎
Exercises
16-104. Let $X$ be a metric space and let $A \subseteq X$ be a nonempty subset.
(a) Prove that $\operatorname{dist}(x, A) = 0$ iff there is a sequence $\{a_n\}_{n=1}^{\infty}$ in $A$ with $\lim_{n \to \infty} a_n = x$.
(b) Let $C \subseteq X$ be closed and nonempty. Prove that $\operatorname{dist}(x, C) = 0$ iff $x \in C$.
16-105. Let $a, b$ be distinct nonnegative numbers that are not both zero. Prove that we have $\frac{a}{a+b} \in [0,1]$, that $\frac{a}{a+b} = 0$ iff $a = 0$, and that $\frac{a}{a+b} = 1$ iff $b = 0$.
16-106. Let $X$ be a metric space and let $C \subseteq X$ be a nonempty compact subset. Prove that for all $x \in X$ there is a $c_x \in C$ so that $d(x, c_x) = \operatorname{dist}(x, C)$.
16-107. Let $X$ be a metric space. For any two nonempty subsets $A, B \subseteq X$, define the distance from $A$ to $B$ as $\operatorname{dist}(A, B) := \inf \{ \operatorname{dist}(a, B) : a \in A \}$.
(a) Prove that for all nonempty subsets $A, B \subseteq X$ we have $\operatorname{dist}(A, B) = \operatorname{dist}(B, A)$.
(b) Give an example of two closed, disjoint, nonempty sets $A, B$ such that $\operatorname{dist}(A, B) = 0$.
(c) Prove that the function in Lemma 16.99 need not be uniformly continuous, and hence in particular it need not be Lipschitz continuous.
(d) Prove that if $B$ and $C$ are not empty and $C$ is compact, then there is a $c \in C$ so that $\operatorname{dist}(C, B) = \operatorname{dist}(c, B)$.
16-108. Give an example of a metric space in which Lemma 16.104 fails. That is, give an example of a metric space in which no neighborhood of a compact set is compact.
16-109. Prove Proposition 16.113.
16-110. Let $X$ be a metric space and let $C_1, \ldots, C_n$ be compact subsets of $X$. Prove that $\bigcup_{j=1}^{n} C_j$ is compact.
16-111. Give an example of a bijective continuous linear function whose inverse is not continuous.
Chapter 17
Differentiation in Normed Spaces
To discuss differentiation, we need algebraic operations and a metric. Normed spaces
have the algebraic structure of a vector space and a metric induced by the norm (see
Proposition 15.54). The vector space structure is typically discussed in linear algebra. To keep the text self-contained, we introduce the requisite concepts and ideas in
this chapter. Moreover, we freely use metric concepts discussed in Chapter 16. The
presentation will be coordinate-free and valid for arbitrary (including infinite) dimensions. Although derivatives are mainly used in finite dimensional spaces and computed
through partial derivatives along coordinate axes, this abstraction does not make the
proofs more complicated. Instead, the omission of coordinates allows us to focus on
the conceptual core of differentiation. The relevant results for finite dimensional spaces
are given as consequences of the general theory.
To understand differentiation in multidimensional spaces, we must first adjust our
expectation of what a derivative should be. The derivative cannot be a number or a slope,
because it is not clear in which direction this slope would go. Instead, differentiation is defined similarly to Theorem 4.5, which says that the derivative at $x$ determines
the unique straight line through $(x, f(x))$ for which the difference (at $z$) between the
function and the line goes to zero faster than $|z - x|$. Geometrically, (hyper)planes
will take the place of lines. Linear functions are the analytical tool used to define
(hyper)planes. We start our investigation with linear functions in Sections 17.1 and
17.2. Derivatives and partial derivatives are introduced in Sections 17.3 and 17.5. In
between, Section 17.4 introduces the Mean Value Theorem, which is crucial for using
derivatives to estimate differences. Section 17.6 introduces tensors, which are needed
to represent higher derivatives in Section 17.7. We conclude in Section 17.8 with the
Implicit Function Theorem, which provides important examples of manifolds.
17.1 Continuous Linear Functions
The definition of a linear function is entirely algebraic. It simply states that the function
is compatible with the vector space operations.
Definition 17.1 Let $X, Y$ be vector spaces. A function $L : X \to Y$ is called linear iff for all $x_1, x_2 \in X$ we have $L[x_1 + x_2] = L[x_1] + L[x_2]$ and for all $\alpha \in \mathbb{R}$ and $x \in X$ we have $L[\alpha x] = \alpha L[x]$. A linear function is also sometimes called a linear operator.
Notation 17.2 Derivatives will be functions that map points to linear functions. To distinguish the various evaluations, throughout the text we will enclose the argument of a
linear function in square brackets rather than round parentheses. To avoid confusion
with the square brackets which indicate that the elements of $L^p$ spaces are equivalence
classes, we will henceforth omit the brackets around the elements of $L^p$ spaces, as
is customary in analysis. □
Differentiation and integration both lead to examples of linear functions.
Example 17.3
1. Let $(M, \Sigma, \mu)$ be a measure space, let $1 \le p, q \le \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$ and let $g \in L^q(M, \Sigma, \mu)$. By Example 16.28, $I_g[f] := \int_M f g \, d\mu$ defines a continuous function $I_g : L^p(M, \Sigma, \mu) \to \mathbb{R}$ and it is easy to see via Theorem 14.39 that $I_g$ is linear.
2. Let $1 \le p < \infty$ and let $X$ be the subspace of functions $f \in C^1(a,b) \cap L^p(a,b)$ so that $f' \in L^p(a,b)$, too. When it is not mentioned explicitly that elements of $L^p(a,b)$ are equivalence classes, it is customary to use intersections like $C^1(a,b) \cap L^p(a,b)$ rather than the more complicated notation from Theorem 16.86. We claim that the function $D : X \to L^p(a,b)$ defined by $D[f] := f'$ is linear, but not continuous.
If two functions in $X$ are equal almost everywhere, then they must be equal (see Exercise 8-5). That is, if $f, g \in X \subseteq L^p(a,b)$ and $f = g$ a.e., then $f = g$, which implies that $D$ is well-defined. (Just because we do not mention that the elements of $L^p$ are equivalence classes does not mean we do not need to pay attention to this fact when we define a function.) Linearity of $D$ follows easily from the corresponding linearity properties of the derivative (see Theorem 4.6).
To see that $D$ is not continuous, we will only consider the case $a = 0$ and $b = 1$ here. The general case is left to the reader in Exercise 17-1. Because
$$\lim_{n \to \infty} \left\| x^n \right\|_p = \lim_{n \to \infty} \left( \frac{1}{np + 1} \right)^{1/p} = 0 \quad \text{and} \quad \lim_{n \to \infty} \left\| D\left[ x^n \right] \right\|_p = \lim_{n \to \infty} \left( \frac{n^p}{np - p + 1} \right)^{1/p} = \begin{cases} \infty; & \text{if } p > 1, \\ 1; & \text{if } p = 1, \end{cases}$$
the function $D$ is not continuous at 0. □
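The failure of continuity in part 2 of Example 17.3 can be observed numerically. The sketch below assumes the test functions $f_n(x) := x^n$ on $(0,1)$ and uses the closed forms $\|f_n\|_p = (np+1)^{-1/p}$ and $\|f_n'\|_p = \left( n^p / (np - p + 1) \right)^{1/p}$, which follow from integrating powers of $x$; the function names are hypothetical.

```python
def norm_p_of_power(n, p):
    """L^p(0,1) norm of x -> x**n, from the integral of x**(n*p) over (0, 1)."""
    return (1.0 / (n * p + 1)) ** (1.0 / p)

def norm_p_of_derivative(n, p):
    """L^p(0,1) norm of the derivative x -> n * x**(n - 1)."""
    return (n ** p / (n * p - p + 1)) ** (1.0 / p)

for p in (1, 2):
    for n in (1, 10, 100, 1000):
        # The functions tend to 0 in L^p, but their derivatives do not.
        print(p, n, norm_p_of_power(n, p), norm_p_of_derivative(n, p))
```

For $p = 2$ the norms of the functions shrink while the norms of the derivatives grow without bound; for $p = 1$ the derivatives keep norm 1. In both cases $D[f_n]$ does not tend to $0 = D[0]$.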
It is a good rule of thumb that integration usually defines continuous linear functions and differentiation usually defines discontinuous linear functions. We have to be
careful, though. Exercise 17-10 shows that integration need not always define a continuous linear function and Exercise 17-13 shows that differentiation can be continuous.
The multiplication of a matrix with column vectors is another fundamental example
of a linear function. We will investigate the connection between the “abstract” concept
of continuous linear functions and the more “concrete” idea of matrix multiplication in
Section 17.2. Before then, we need to investigate continuity for linear functions.
In Example 17.3, we have only proved that D is not continuous at the origin.
Loosely speaking, for linear functions the behavior at the origin is duplicated at every point. In particular, Exercise 17-2 shows that $D$ is discontinuous at every point
of its domain $X \subseteq L^p(a,b)$. As a positive result, Theorem 17.4 below shows that for linear functions
continuity at the origin is equivalent to continuity everywhere. Note how the proof uses
linearity to reduce every situation to a configuration near the origin.
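Theorem 17.4 Let $X, Y$ be normed spaces and let $L : X \to Y$ be a linear function. The following are equivalent:
1. $L$ is continuous on $X$.
2. $L$ is continuous at 0.
3. $L$ is bounded on $\overline{B_1(0)} \subseteq X$.
4. There is a $c \in \mathbb{R}$ so that for all $x \in X$ we have $\|L[x]\| \le c \|x\|$.
Proof. The implication "1$\Rightarrow$2" is trivial. For "2$\Rightarrow$3," let $\varepsilon > 0$. Because $L$ is continuous at 0, there is a $\delta > 0$ so that for all $x \in X$ with $\|x\| = \|x - 0\| \le \delta$ we have $\|L[x]\| < \varepsilon$. Therefore for all points $x \in X$ with $\|x\| = \|x - 0\| \le 1$ we obtain
$$\|L[x]\| = \left\| L\left[ \frac{1}{\delta} \delta x \right] \right\| = \frac{1}{\delta} \left\| L[\delta x] \right\| < \frac{\varepsilon}{\delta}.$$
Hence, $L$ is bounded on $\overline{B_1(0)}$.
For "3$\Rightarrow$4," let $c \in \mathbb{R}$ be such that for all $x \in \overline{B_1(0)}$ we have $\|L[x]\| \le c$. Then for all $x \in X \setminus \{0\}$ we infer
$$\|L[x]\| = \left\| \, \|x\| \, L\left[ \frac{x}{\|x\|} \right] \right\| = \|x\| \left\| L\left[ \frac{x}{\|x\|} \right] \right\| \le c \|x\|,$$
and for $x = 0$ the inequality is trivial.
For "4$\Rightarrow$1," let $c > 0$ be such that for all $x \in X$ we have $\|L[x]\| \le c \|x\|$. Let $\varepsilon > 0$ and set $\delta := \frac{\varepsilon}{c}$. Then for all $x, y \in X$ with $\|x - y\| < \delta$ we obtain
$$\|L[x] - L[y]\| = \|L[x - y]\| \le c \|x - y\| < c \, \frac{\varepsilon}{c} = \varepsilon.$$
∎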
In particular, Theorem 17.4 shows that every continuous linear function is Lipschitz
continuous.
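Part 4 of Theorem 17.4 and the resulting Lipschitz estimate can be checked numerically in a small case. The sketch below assumes $X = Y = \mathbb{R}^2$ with the maximum norm and a linear function given by a hypothetical $2 \times 2$ matrix `A`; for the maximum norm, the largest absolute row sum is a valid constant $c$.

```python
import random

A = [[2.0, -1.0],
     [0.5,  3.0]]  # hypothetical matrix defining L[x] = A x

def apply_map(A, x):
    """Evaluate the linear function L[x] = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm_inf(x):
    """Maximum norm on R^2."""
    return max(abs(xi) for xi in x)

# For the maximum norm, c can be taken as the largest absolute row sum of A.
c = max(sum(abs(a) for a in row) for row in A)

random.seed(1)
points = [[random.uniform(-10, 10), random.uniform(-10, 10)] for _ in range(1000)]

# Part 4 of Theorem 17.4: ||L[x]|| <= c ||x|| (tolerance absorbs rounding).
bounded = all(norm_inf(apply_map(A, x)) <= c * norm_inf(x) + 1e-9 for x in points)

# Lipschitz estimate: ||L[x] - L[y]|| = ||L[x - y]|| <= c ||x - y||.
lipschitz = all(
    norm_inf([u - v for u, v in zip(apply_map(A, x), apply_map(A, y))])
    <= c * norm_inf([s - t for s, t in zip(x, y)]) + 1e-9
    for x, y in zip(points, points[1:])
)
print(c, bounded, lipschitz)
```

In this example the bound is attained at $x = (1, 1)$, where $\|L[x]\|_\infty = 3.5 = c \|x\|_\infty$, so no smaller constant works for this norm.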
Definition 17.5 Because of part 3 of Theorem 17.4, continuous linear functions are
also called bounded linear functions.
It is now easy to show that any linear function on a finite dimensional normed space
is continuous.
Corollary 17.6 Let $X$ be a finite dimensional normed space, let $Y$ be a normed space
and let $L : X \to Y$ be a linear function. Then $L$ is continuous.
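Proof. Let $\{u_1, \ldots, u_d\}$ be a base of $X$. Then for all $x \in X$ there are unique coefficients $c_1, \ldots, c_d$ so that $x = \sum_{i=1}^{d} c_i u_i$. Denote the norm on $X$ by $\|\cdot\|$ and note that $\|x\|_\infty := \max \{ |c_i| : i = 1, \ldots, d \}$ is another norm on $X$. By Theorem 16.76, $\|\cdot\|_\infty$ and the original norm $\|\cdot\|$ of $X$ are equivalent. Let $r > 0$ be such that for all $x \in X$ we have $\|x\|_\infty \le r \|x\|$ and let $c := r \sum_{i=1}^{d} \|L[u_i]\|$. Then for all $x = \sum_{i=1}^{d} c_i u_i \in X$ we obtain
$$\|L[x]\| = \left\| \sum_{i=1}^{d} c_i L[u_i] \right\| \le \sum_{i=1}^{d} |c_i| \, \|L[u_i]\| \le \|x\|_\infty \sum_{i=1}^{d} \|L[u_i]\| \le r \|x\| \sum_{i=1}^{d} \|L[u_i]\| = c \|x\|,$$
which means that $L$ is continuous. ∎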
Linear functions can be added pointwise and they can be multiplied with real numbers. Hence, the linear functions from one normed space to another form a vector
space. Part 4 of Theorem 17.4 enables us to define a norm on this space.
Definition 17.7 Let $X, Y$ be normed spaces. We define $\mathcal{L}(X, Y)$ to be the set of all continuous linear functions from $X$ to $Y$.
Theorem 17.8 Let $X, Y$ be normed spaces. Then, with pointwise addition and scalar multiplication, $\mathcal{L}(X, Y)$ is a vector space. Moreover, the function
$$\|L\| := \min \{ c \ge 0 : (\forall x \in X : \|L[x]\| \le c \|x\|) \}$$
defines a norm on $\mathcal{L}(X, Y)$.
Proof. The proof that $\mathcal{L}(X, Y)$ is a vector space is left to Exercise 17-3a. To prove that $\mathcal{L}(X, Y)$ is a normed space, we start by defining for all $L \in \mathcal{L}(X, Y)$ the quantity $\|L\| := \inf \{ c \ge 0 : (\forall x \in X : \|L[x]\| \le c \|x\|) \}$. This is necessary, because we do not know a priori that the infimum is assumed, as is implicitly claimed in the definition of the norm on $\mathcal{L}(X, Y)$ in the statement of the theorem.
To prove that the $\|L\|$ defined above is a norm, first note that if $L : X \to Y$ is linear and bounded, then $\|L\| = 0$ iff $\inf \{ c \ge 0 : (\forall x \in X : \|L[x]\| \le c \|x\|) \} = 0$ iff for all $x \in X$ we have $L[x] = 0$ iff $L = 0$. Moreover, if $L \in \mathcal{L}(X, Y)$ and $\alpha \in \mathbb{R}$, then
$$\|\alpha L\| = \inf \{ c \ge 0 : (\forall x \in X : \|\alpha L[x]\| \le c \|x\|) \} = \inf \{ c \ge 0 : (\forall x \in X : |\alpha| \, \|L[x]\| \le c \|x\|) \} = |\alpha| \, \|L\|.$$
For the triangular inequality, let $L, M \in \mathcal{L}(X, Y)$. Then
$$\begin{aligned} \|L + M\| &= \inf \{ c \ge 0 : (\forall x \in X : \|L[x] + M[x]\| \le c \|x\|) \} \\ &\le \inf \{ c \ge 0 : (\forall x \in X : \|L[x]\| + \|M[x]\| \le c \|x\|) \} \\ &\le \inf \{ a \ge 0 : (\forall x \in X : \|L[x]\| \le a \|x\|) \} + \inf \{ b \ge 0 : (\forall x \in X : \|M[x]\| \le b \|x\|) \} \\ &= \|L\| + \|M\|. \end{aligned}$$
By Exercise 17-3b, we have $\|L[x]\| \le \|L\| \, \|x\|$ for all $x \in X$. In particular, the infimum that is used to define the norm is actually a minimum, as claimed. ∎
Unless otherwise indicated, throughout this text the norm on any space of continuous linear functions will be assumed to be the norm from Theorem 17.8.
Definition 17.9 The norm from Theorem 17.8 is also called the operator norm of the
continuous linear function L.
Another way to represent the norm of a continuous linear function is the following.
Proposition 17.10 Let $X, Y$ be normed spaces and let $L : X \to Y$ be a continuous linear function. Then $\|L\| = \sup \left\{ \|L[x]\| : x \in \overline{B_1(0)} \right\}$.
Proof. Exercise 17-4. ∎
The completeness of the spaces $\mathcal{L}(X, Y)$ solely depends on the completeness of the image space $Y$.
Theorem 17.11 Let $X$ be a normed space and let $Y$ be a Banach space. Then $\mathcal{L}(X, Y)$ is a Banach space.
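Proof. Let $\{L_n\}_{n=1}^{\infty}$ be a Cauchy sequence in $\mathcal{L}(X, Y)$. We first claim that for all $x \in X$ the sequence $\{L_n[x]\}_{n=1}^{\infty}$ is a Cauchy sequence. To prove this claim, let $x \in X$ and let $\varepsilon > 0$. Find an $N \in \mathbb{N}$ so that for all $m, n \ge N$ we have the inequality $\|L_m - L_n\| < \frac{\varepsilon}{\|x\| + 1}$. Then for all $m, n \ge N$ we infer
$$\|L_m[x] - L_n[x]\| = \|(L_m - L_n)[x]\| \le \|L_m - L_n\| \, \|x\| < \frac{\varepsilon \|x\|}{\|x\| + 1} < \varepsilon,$$
and $\{L_n[x]\}_{n=1}^{\infty}$ is a Cauchy sequence.
Because $Y$ is complete, for all $x \in X$ we can define $L[x] := \lim_{n \to \infty} L_n[x]$. It is easy to prove that $L$ is linear (see Exercise 17-5). Because the reverse triangular inequality $\big| \|L_m\| - \|L_n\| \big| \le \|L_m - L_n\|$ holds for all $m, n \in \mathbb{N}$ we obtain that $\{\|L_n\|\}_{n=1}^{\infty}$ is a Cauchy sequence, and hence $\lim_{n \to \infty} \|L_n\|$ exists. Now for all $x \in X$ the inequality $\|L[x]\| = \lim_{n \to \infty} \|L_n[x]\| \le \left( \lim_{n \to \infty} \|L_n\| \right) \|x\|$ holds, and hence $L \in \mathcal{L}(X, Y)$. To prove that $L$ is the limit of $\{L_n\}_{n=1}^{\infty}$ in $\mathcal{L}(X, Y)$ let $\varepsilon > 0$. Let $N \in \mathbb{N}$ be such that for all $m, n \ge N$ we have $\|L_m - L_n\| < \varepsilon$. Then for all $n \ge N$ we obtain
$$\|L - L_n\| = \sup \left\{ \|L[x] - L_n[x]\| : x \in \overline{B_1(0)} \right\} = \sup \left\{ \lim_{m \to \infty} \|L_m[x] - L_n[x]\| : x \in \overline{B_1(0)} \right\} \le \sup \left\{ \limsup_{m \to \infty} \|L_m - L_n\| \, \|x\| : x \in \overline{B_1(0)} \right\} \le \varepsilon.$$
Thus for all $n \ge N$ we have $\|L - L_n\| \le \varepsilon$ and $\{L_n\}_{n=1}^{\infty}$ converges to $L$. ∎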
For our investigation of derivatives, it is important to realize that there is a simple
bijective correspondence between the elements of a normed space Y and the linear
functions from R to Y .
Theorem 17.12 Let $Y$ be a normed space. For $f \in Y$ define $L_f : \mathbb{R} \to Y$ by $L_f[x] := x f$. Then the function $I[f] := L_f$ defines an isometric isomorphism from $Y$ to $\mathcal{L}(\mathbb{R}, Y)$.
Proof. Clearly, for all $f \in Y$ the function $L_f$ is linear and $\|L_f\| = \|f\|$. Linearity of $I$ is trivial. Because $\|L_f\| = \|f\|$, $I$ is an isometry, and hence it is injective. To prove that $I$ is surjective, let $L \in \mathcal{L}(\mathbb{R}, Y)$. Set $f := L[1]$. Then for all $x \in \mathbb{R}$ we have $L[x] = L[x \cdot 1] = x L[1] = x f$, which means that $L = L_f$ and $I$ is surjective. ∎
We conclude this section by noting that for continuous linear functions we could
always assume that the domain is a Banach space. Recall that by Theorem 16.89 and
Exercise 16-94 every normed space can be densely embedded into a Banach space.
Theorem 17.13 shows that any linear function L : X -+ Y can be extended to a
unique continuous linear function from the completion of X to the completion of Y .
This means that as long as we work with continuous linear functions, which we do
exclusively in this chapter, it would be no loss of generality to assume (as is often
done) that domain and range are Banach spaces.
Theorem 17.13 Let $X$ be a normed space, let $Y$ be a Banach space, let $D \subseteq X$ be a
dense linear subspace of $X$ and let $L : D \to Y$ be a continuous linear function. Then
there is a unique continuous linear function $M : X \to Y$ such that $M|_D = L$.
Proof. Exercise 17-6.
∎
Exercises
17-1. Let $1 \le p < \infty$, let $a < b$ and let $X := \{ f \in C^1(a, b) \cap L^p(a, b) : f' \in L^p(a, b) \}$. Prove that
$D : X \to L^p(a, b)$ defined by $D[f] := [f']$ is not continuous.
17-2. Let $X, Y$ be normed spaces and let $L : X \to Y$ be linear. Prove that if $L$ is discontinuous at the
origin, then it is discontinuous at every $x \in X$.
17-3. Finish the proof of Theorem 17.8. Let $X, Y$ be normed spaces.
(a) Prove that $\mathcal{L}(X, Y)$ with pointwise addition and scalar multiplication is a vector space.
(b) Let $L : X \to Y$ be a continuous linear function. Prove that $\|L[x]\| \le \|L\| \, \|x\|$ for all $x \in X$.
17-4. Prove Proposition 17.10.
17-5. Finish the proof of Theorem 17.11 by proving that the function $L[x] := \lim_{n\to\infty} L_n[x]$ defined in the
proof is linear.
17-6. Prove Theorem 17.13.
17-7. Let $X, Y$ be vector spaces and let $\mathcal{B}$ be a base of $X$.
(a) Let $L, M : X \to Y$ be linear. Prove that if $L(b) = M(b)$ for all $b \in \mathcal{B}$, then $L = M$.
(b) Prove that if $f : \mathcal{B} \to Y$ is a function defined on the base $\mathcal{B}$, then there is a unique linear
function $L_f : X \to Y$ so that $L_f|_{\mathcal{B}} = f$.
17-8. Let $1 \le p \le \infty$, let $q$ be so that $\frac{1}{p} + \frac{1}{q} = 1$ and let $\{a_j\}_{j=1}^{\infty} \in l^q$.
Prove that the function $S_{\{a_j\}_{j=1}^{\infty}}\bigl(\{x_j\}_{j=1}^{\infty}\bigr) := \sum_{j=1}^{\infty} a_j x_j$ is a continuous linear operator from $l^p$ to $\mathbb{R}$.
17-9. Let $X, Y$ be normed spaces and let $L : X \to Y$ be linear. Prove that if $L$ is continuous at some
$x \in X$, then $L$ is continuous on $X$.
17-10. Integration need not always define a continuous linear function. Let $X$ be the space of all continuous
functions $f : \mathbb{R} \to \mathbb{R}$ so that $\{ x : f(x) \ne 0 \}$ is bounded.
(a) Prove that $X$ is a normed subspace of $(C(\mathbb{R}, \mathbb{R}), \|\cdot\|_\infty)$.
(b) Prove that $L : X \to \mathbb{R}$ defined by $L[f] := \int_{-\infty}^{\infty} f(x)\, dx$ is a linear function on $X$.
(c) Prove that $L$ is not bounded on $B_1(0)$.
(d) Let $f \in X$. Find a sequence $\{f_n\}_{n=1}^{\infty}$ of elements of $X$ that converges to $f$ and such that
$\{L[f_n]\}_{n=1}^{\infty}$ does not converge to $L[f]$.
(e) Prove that $X$ is not a Banach space.
(f) Prove that $X$ is not dense in $(C(\mathbb{R}, \mathbb{R}), \|\cdot\|_\infty)$.
17-11. Let $X$ be the space of polynomials of order at most 3 on the interval $(0, 1)$. Prove that $D[p] := p'$ is
a continuous linear function from $X$ to $X$.
17-12. Let $X$, $Y$ and $Z$ be normed spaces and let $K : X \to Y$ and $L : Y \to Z$ be continuous linear
functions.
(a) Prove that $\|L \circ K\| \le \|L\| \, \|K\|$.
(b) Prove that the inequality in part 17-12a can be strict.
17-13. Prove that if $C^1(a, b)$ is equipped with the norm $\|f\| := \|f\|_\infty + \|f'\|_\infty$ (see Exercise 16-17), then
$f \mapsto f'$ is a continuous mapping from $C^1(a, b)$ to $C^0(a, b)$.
17.2 Matrix Representation of Linear Functions
The coordinate-free introduction in Section 17.1 provides a concise description of linear functions. Without specifics about a coordinate system, an abstract notion can
usually be investigated more easily, because there are fewer details to keep track of.
On the other hand, coordinates bridge the gap between abstract notions and concrete
applications. Therefore, the connection between a concept and its coordinatized version should be investigated very carefully. This section shows that coordinatization of
linear functions is done by carefully reinterpreting some natural coefficients.
A coordinate system in a vector space ultimately is nothing but a base, because any
vector can be expressed as a unique linear combination of the base vectors (see Proposition 15.18). The coefficients in the base representation of the vector can be viewed
as the coordinates. In this section, we investigate how a linear function between finite
dimensional spaces maps the coordinates of its input vectors to the coordinates of its
output vectors. Because all finite dimensional spaces are isomorphic to some $\mathbb{R}^d$ (see
Proposition 15.25) we will work with spaces $\mathbb{R}^m$, $\mathbb{R}^n$ etc. throughout this section.
The choice of a base in domain and range leads to the connection between linear
functions and matrices.
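Proposition 17.14 Let $m, n \in \mathbb{N}$, let $L : \mathbb{R}^n \to \mathbb{R}^m$ be linear, let $\{v_1, \dots, v_n\}$ be a
base of $\mathbb{R}^n$ and let $\{w_1, \dots, w_m\}$ be a base of $\mathbb{R}^m$. For all $j = 1, \dots, n$, let $a_{ij}$ with
$i = 1, \dots, m$ be such that $L[v_j] = \sum_{i=1}^{m} a_{ij} w_i$. Because $\{v_1, \dots, v_n\}$ is a base of $\mathbb{R}^n$, for
all $x \in \mathbb{R}^n$, there are unique coefficients $c_1, \dots, c_n$ so that $x = \sum_{j=1}^{n} c_j v_j$. The image of
$x$ under $L$ is
$$L[x] = \sum_{i=1}^{m} \Bigl( \sum_{j=1}^{n} a_{ij} c_j \Bigr) w_i.$$
Proof. Exercise 17-14.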
Proposition 17.14 shows that, once we fix bases in $\mathbb{R}^n$ and $\mathbb{R}^m$, for each linear
function $L : \mathbb{R}^n \to \mathbb{R}^m$ there is a unique rectangular array of real numbers $a_{ij}$, with
indices $i = 1, \dots, m$, $j = 1, \dots, n$ that can be used to represent $L$. Such rectangular
arrays of numbers are called matrices. The set of matrices with the natural addition and
scalar multiplication is a vector space.
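The coefficients $a_{ij}$ can be computed mechanically: column $j$ of the array holds the coordinates of the image of the $j$-th base vector of the domain with respect to the base of the range. The following Python sketch illustrates this for one small case. The map `L`, the two bases, and the helper names `solve` and `matrix_of` are assumptions made for this illustration and do not come from the text; exact rational arithmetic is used so that the final comparison is not affected by rounding.

```python
from fractions import Fraction

def solve(columns, target):
    """Return coefficients c with sum_i c[i] * columns[i] == target (Gauss-Jordan)."""
    m = len(target)
    # Augmented matrix: the base vectors are the columns, the target is the last column.
    rows = [[Fraction(columns[i][r]) for i in range(m)] + [Fraction(target[r])]
            for r in range(m)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [entry / rows[col][col] for entry in rows[col]]
        for r in range(m):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[r][m] for r in range(m)]

def matrix_of(L, v_base, w_base):
    """Column j holds the coordinates of L[v_j] with respect to w_base."""
    cols = [solve(w_base, L(v)) for v in v_base]
    m, n = len(w_base), len(v_base)
    return [[cols[j][i] for j in range(n)] for i in range(m)]

# Hypothetical example: L : R^3 -> R^2, L(x, y, z) = (x + 2y, 3z - y).
def L(x):
    return (x[0] + 2 * x[1], 3 * x[2] - x[1])

v_base = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]   # a base of R^3
w_base = [(1, 1), (1, -1)]                   # a base of R^2
A = matrix_of(L, v_base, w_base)

# For x = sum_j c_j v_j, linearity gives L[x] = sum_i (sum_j a_ij c_j) w_i.
c = [2, -1, 4]
x = tuple(sum(c[j] * v_base[j][k] for j in range(3)) for k in range(3))
coords = [sum(A[i][j] * c[j] for j in range(3)) for i in range(2)]
image = tuple(sum(coords[i] * w_base[i][k] for i in range(2)) for k in range(2))
assert image == L(x)
```

Changing either base changes the array `A` but not the map `L`, which is the dependence on the choice of bases discussed in this section.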
Definition 17.15 Let $m, n \in \mathbb{N}$. A real $m \times n$-matrix with $m$ rows and $n$ columns is
a function $A : \{1, \dots, m\} \times \{1, \dots, n\} \to \mathbb{R}$, denoted $A = (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$. The
index $i$ is called the row index and the index $j$ is called the column index. We define
$M(m \times n, \mathbb{R})$ to be the set of all real $m \times n$-matrices.
Proposition 17.16 Let $m, n \in \mathbb{N}$, let $(a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$, $(b_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$ be real
$m \times n$ matrices and let $c \in \mathbb{R}$. With addition defined by
$$(a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}} + (b_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}} := (a_{ij} + b_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$$
and with scalar multiplication defined by
$$c \, (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}} := (c \, a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$$
the set of matrices $M(m \times n, \mathbb{R})$ is a vector space.
Proof. Exercise 17-15.
The coefficients from Proposition 17.14 immediately lead to an isomorphism between $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ and $M(m \times n, \mathbb{R})$, where both are considered as vector spaces. Note
that the specific isomorphism will depend on which bases we choose in $\mathbb{R}^n$ and $\mathbb{R}^m$.
Theorem 17.17 Let $m, n \in \mathbb{N}$, and let $\{v_1, \dots, v_n\}$ and $\{w_1, \dots, w_m\}$ be bases of
$\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. For each $L \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$, let $A(L) = (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1, \dots, n}}$
be the matrix with coefficients $a_{ij}$ provided by Proposition 17.14. Then the function
$A : \mathcal{L}(\mathbb{R}^n, \mathbb{R}^m) \to M(m \times n, \mathbb{R})$ thus defined is a vector space isomorphism.
Proof. Exercise 17-16.
Similar to Theorem 17.12, for $m = n = 1$ Theorem 17.17 shows that linear functions from $\mathbb{R}$ to $\mathbb{R}$ are in bijective correspondence with numbers (considered as "$1 \times 1$
matrices").
Composition of functions can be used to define a multiplication on $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$.
Exercise 17-17 shows that this multiplication is compatible with addition and scalar
multiplication. For $M(m \times n, \mathbb{R})$, we can define multiplication of matrices as follows.
Definition 17.18 Matrix multiplication. Let $m, n, p \in \mathbb{N}$. For the real matrices
$A = (a_{jk})_{\substack{j = 1, \dots, m \\ k = 1, \dots, n}} \in M(m \times n, \mathbb{R})$ and $B = (b_{ij})_{\substack{i = 1, \dots, p \\ j = 1, \dots, m}} \in M(p \times m, \mathbb{R})$, we
define the product
$$BA := \Bigl( \sum_{j=1}^{m} b_{ij} a_{jk} \Bigr)_{\substack{i = 1, \dots, p \\ k = 1, \dots, n}} \in M(p \times n, \mathbb{R}).$$
Theorems 17.20 and 17.21 below show the connection between matrix multiplication and the evaluation and composition of linear functions. For the remainder of this
chapter, the base used in any space $\mathbb{R}^d$ is the standard base $\{e_1, \dots, e_d\}$.
Proposition 17.19 Let $m \in \mathbb{N}$. The function $V_m\bigl( (a_{ij})_{\substack{i = 1, \dots, m \\ j = 1}} \bigr) := \sum_{i=1}^{m} a_{i1} e_i$ is an
isomorphism from $M(m \times 1, \mathbb{R})$ to $\mathbb{R}^m$.
Proof. Exercise 17-18.
∎
[Figure 41 (diagram): top row $\mathbb{R}^n \xrightarrow{L} \mathbb{R}^m \xrightarrow{H} \mathbb{R}^p$; bottom row $M(n \times 1, \mathbb{R}) \xrightarrow{A = A(L)} M(m \times 1, \mathbb{R}) \xrightarrow{B = A(H)} M(p \times 1, \mathbb{R})$; the rows are connected by the isomorphisms $V_n$, $V_m$, $V_p$.]
Figure 41: The connection between linear functions and their matrix representations.
Note that instead of representing linear functions between spaces $\mathbb{R}^d$ with the standard
basis, we could have also represented linear functions between arbitrary finite dimensional vector spaces using any base in each space.
The isomorphisms Vm are the key to representing evaluations and compositions of
linear functions as matrix multiplications and vice versa. It is customary to drop the
second index for elements of $M(m \times 1, \mathbb{R})$ and we will do so in the following. The representation of elements of $\mathbb{R}^m$ as in Proposition 17.19 is also called the representation
with column vectors. The corresponding representation with 1 x m matrices is called
the representation with row vectors.
Theorem 17.20 Matrix multiplication and evaluation of linear functions. Let
$m, n \in \mathbb{N}$, let $L : \mathbb{R}^n \to \mathbb{R}^m$ be a linear function and let $A := A(L) \in M(m \times n, \mathbb{R})$ be
the matrix obtained from Theorem 17.17, using the standard bases in $\mathbb{R}^n$ and $\mathbb{R}^m$. Then
for all $v \in \mathbb{R}^n$ we have $L[v] = V_m\bigl[ A V_n^{-1}[v] \bigr]$, where $A$ and $V_n^{-1}[v]$ are multiplied as
matrices.
Proof. Exercise 17-19.
∎
Exercise 17-20 shows that matrix multiplication is associative, which means we can
write a product of three or more matrices without parentheses indicating which pairs
of matrices are multiplied first.
Theorem 17.21 Matrix multiplication and composition of linear functions. Let
$m, n, p \in \mathbb{N}$, let $L : \mathbb{R}^n \to \mathbb{R}^m$ and $H : \mathbb{R}^m \to \mathbb{R}^p$ be linear functions and let
$A := A(L) \in M(m \times n, \mathbb{R})$ and $B := A(H) \in M(p \times m, \mathbb{R})$ be the matrices obtained from Theorem 17.17, using the standard bases in $\mathbb{R}^n$, $\mathbb{R}^m$, and $\mathbb{R}^p$. Then for all
$v \in \mathbb{R}^n$ we have $H \circ L[v] = V_p\bigl[ B A V_n^{-1}[v] \bigr]$, where $B$, $A$ and $V_n^{-1}[v]$ are multiplied
as matrices.
Proof. Exercise 17-21.
∎
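The statement that composition corresponds to the matrix product can be checked on a small example. The sketch below implements the product entry by entry as a sum over the shared index and compares $H[L[v]]$ with the product matrix applied to $v$. The matrices `A`, `B`, the vector `v`, and the helper names `mat_mul` and `apply` are assumptions chosen for illustration; vectors are stored as plain lists rather than as $m \times 1$ matrices.

```python
def mat_mul(B, A):
    """Product BA of a p x m matrix B and an m x n matrix A (lists of rows)."""
    p, m, n = len(B), len(A), len(A[0])
    assert all(len(row) == m for row in B), "inner dimensions must agree"
    # Entry (i, k) of BA is the sum over j of b_ij * a_jk.
    return [[sum(B[i][j] * A[j][k] for j in range(m)) for k in range(n)]
            for i in range(p)]

def apply(A, v):
    """Evaluate the linear function represented by A at v (standard bases)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical matrices: A represents L : R^3 -> R^2, B represents H : R^2 -> R^2.
A = [[1, 2, 0],
     [0, -1, 3]]
B = [[2, 1],
     [0, 4]]
v = [1, 2, 3]

BA = mat_mul(B, A)
composed = apply(B, apply(A, v))   # H[L[v]]
via_product = apply(BA, v)         # (BA) v
assert composed == via_product
```

Note that the order of the factors mirrors the order of composition: the matrix of the function applied first stands on the right.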
The connection between linear functions and the matrices that represent them is
also expressed in Figure 41. Diagrams as in this figure are often called commutative
diagrams, because the order in which the arrows are followed can be interchanged and
the arrows representing isomorphisms can be reversed.
To further familiarize ourselves with the correspondence between matrix multiplication and composition of linear functions, let us cast the well-known Gauss-Jordan
algorithm from linear algebra in the language of linear functions. We will focus the
result on bijective linear functions because we need this reinterpretation for Lemma
18.35 in the proof of the Multidimensional Substitution Formula.
Definition 17.22 Elementary row operation functions. Let $d \in \mathbb{N}$ and let every vector
$x \in \mathbb{R}^d$ be represented as $x = \sum_{i=1}^{d} x_i e_i =: (x_1, \dots, x_d)$.
1. $D : \mathbb{R}^d \to \mathbb{R}^d$ is called a diagonal operator iff there are $c_1, \dots, c_d \in \mathbb{R}$ so that
$D(x_1, \dots, x_d) = (c_1 x_1, \dots, c_d x_d)$.
2. $A : \mathbb{R}^d \to \mathbb{R}^d$ is called a row addition operator iff there are a number $a \in \mathbb{R}$
and distinct indices $i, j \in \{1, \dots, d\}$ so that for all $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ we
have $A(x_1, \dots, x_d) = (x_1, \dots, x_{j-1}, x_j + a x_i, x_{j+1}, \dots, x_d)$.
3. $T : \mathbb{R}^d \to \mathbb{R}^d$ is called a row transposition operator iff there are indices
$i, j \in \{1, \dots, d\}$ with $i < j$ such that for all $x = (x_1, \dots, x_d) \in \mathbb{R}^d$ we have
$T(x_1, \dots, x_d) = (x_1, \dots, x_{i-1}, x_j, x_{i+1}, \dots, x_{j-1}, x_i, x_{j+1}, \dots, x_d)$.
Theorem 17.23 Gauss-Jordan Algorithm. Let $d \in \mathbb{N}$. Every bijective linear function
$L : \mathbb{R}^d \to \mathbb{R}^d$ is a composition of one diagonal operator so that all $c_j \ne 0$ with row
addition and row transposition operators.
Proof. We will provide an outline here and leave the full proof to Exercise 17-22.
The proof is an induction on the dimension $d$. The base step $d = 1$ is trivial. For the
induction step, let $A^0 := A(L) \in M(d \times d, \mathbb{R})$ be the matrix obtained from Theorem
17.17 using the standard base in $\mathbb{R}^d$. Because $L$ is bijective, there is a coefficient $a^0_{i1}$
that is not equal to zero (explain). This means the transposition of rows 1 and $i$ produces
a matrix $A^1$ with $a^1_{11} \ne 0$. Execute $d - 1$ row additions to produce a matrix $A^2$ with
$a^2_{11} \ne 0$ and $a^2_{i1} = 0$ for $i = 2, \dots, d$. Now consider the matrix $B$ obtained from
$A^2$ by erasing the first row and the first column. This matrix is the image $B = A(L')$
of a bijective linear function $L' : \mathbb{R}^{d-1} \to \mathbb{R}^{d-1}$ (explain). Therefore, by induction
hypothesis (explain) there is a sequence of row additions and transpositions that turns
$B$ into a diagonal matrix $C$. The corresponding row additions and transpositions turn
$A^2$ into a matrix $A^3$ whose only nonzero entries are in the first row and on the diagonal
(explain). Moreover, all of the entries on the diagonal are not zero (explain). Perform
the appropriate row additions to obtain a matrix $A^4$ whose only nonzero entries are on
the diagonal (explain).
For the above constructed sequence of row transpositions and additions, let the operators $M_1, \dots, M_n$ be the corresponding row transposition and row addition operators
in the order in which the operations were performed. Then $N := M_n \circ \cdots \circ M_1 \circ L$ is
a linear function so that $A(N) = A^4$ (explain; a commutative diagram may help). That
is, $N$ is a diagonal operator and $L = M_1^{-1} \circ \cdots \circ M_n^{-1} \circ N$. Row transposition operators
are their own inverses and the inverse of a row addition operator is another row addition
operator (explain by stating the inverse). Thus we have proved the theorem.
∎
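The three kinds of operators from Definition 17.22 are easy to model as functions on tuples, which makes the statements about inverses used at the end of the proof concrete. The sketch below is only an illustration: the helper names `diagonal`, `row_addition`, `row_transposition`, the 0-based indices, and the sample vector are assumptions, and the final composition is one hypothetical example of a bijective map built from the three operator types, not the output of the algorithm.

```python
def diagonal(c):
    """Diagonal operator D(x_1, ..., x_d) = (c_1 x_1, ..., c_d x_d)."""
    return lambda x: tuple(ci * xi for ci, xi in zip(c, x))

def row_addition(i, j, a):
    """Row addition operator: add a * x_i to the j-th component (0-based, i != j)."""
    def op(x):
        y = list(x)
        y[j] = x[j] + a * x[i]
        return tuple(y)
    return op

def row_transposition(i, j):
    """Row transposition operator: swap the i-th and j-th components (0-based)."""
    def op(x):
        y = list(x)
        y[i], y[j] = y[j], y[i]
        return tuple(y)
    return op

x = (3, -1, 4)

T = row_transposition(0, 2)
assert T(T(x)) == x                      # a transposition undoes itself

A = row_addition(0, 1, 5)                # second component gets 5 times the first added
A_inv = row_addition(0, 1, -5)           # same indices, with a replaced by -a
assert A_inv(A(x)) == x and A(A_inv(x)) == x

# A hypothetical bijective map written as a composition of the three operator types.
D = diagonal((2, 1, -3))
L = lambda v: A(T(D(v)))
assert L(x) == (-12, -61, 6)
```

Because every factor is invertible and its inverse is again of the same type (a diagonal operator with all $c_j \ne 0$ has the diagonal operator with entries $1/c_j$ as inverse), any such composition is bijective; the theorem asserts the converse.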
Note how the whole proof of Theorem 17.23 depends on the fluent translation between matrix operations and their interpretations as compositions of linear operators.
Exercise 17-24 shows another application of this translation by assuring that the solution x of a uniquely solvable system of equations Ax = b depends continuously on the
coefficients of A and b.
Exercises
17-14. Prove Proposition 17.14.
17-15. Prove Proposition 17.16
17-16. Prove Theorem 17.17.
17-17. Let $V, X, Y, Z$ be vector spaces, let $L : Y \to Z$, $M, N : X \to Y$ and $K : V \to X$ be linear and let
$c \in \mathbb{R}$. Prove each of the following.
(a) $L \circ (M + N) = L \circ M + L \circ N$
(b) $(M + N) \circ K = M \circ K + N \circ K$
(c) $(cL) \circ M = c(L \circ M) = L \circ (cM)$
17-18. Prove Proposition 17.19
17-19. Prove Theorem 17.20.
17-20. Prove that matrix multiplication is associative. That is, prove that if $A \in M(m \times n, \mathbb{R})$,
$B \in M(p \times m, \mathbb{R})$, and $C \in M(q \times p, \mathbb{R})$, then $C(BA) = (CB)A$.
17-21. Prove Theorem 17.21.
17-22. Prove Theorem 17.23
17-23. Interpret the $m \times n$ matrix $A$ with entries $a_{ij}$ as a linear function from $(\mathbb{R}^n, \|\cdot\|_2)$ to $(\mathbb{R}^m, \|\cdot\|_2)$.
Prove that the operator norm of $A$ satisfies $\|A\| \le \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}$.
17-24. The continuous dependence of solutions $x$ of uniquely solvable systems of linear equations $Ax = b$
on the entries of the coefficient matrix $A$ and the right side $b$.
(a) Let $X, Y$ be normed spaces and let $\mathcal{H}(X, Y)$ be the set of all invertible continuous linear
functions from $X$ to $Y$ with continuous inverse ("linear homeomorphisms"). Prove that the
function $J : \mathcal{H}(X, Y) \to \mathcal{H}(Y, X)$, which maps each $A \in \mathcal{H}(X, Y)$ to $A^{-1} \in \mathcal{H}(Y, X)$ is
continuous on $\mathcal{H}(X, Y)$.
Hint. Prove that for all $A, B \in \mathcal{H}(X, Y)$ that are close enough together, the inequality
$$\|A^{-1} - B^{-1}\| \le \frac{\|A^{-1}\|^2}{1 - \|A^{-1}\| \, \|B - A\|} \, \|B - A\|$$
holds and prove that this implies that $J$ is continuous at $A$.
(b) Prove that if $A$ is an invertible $d \times d$ matrix and $b \in \mathbb{R}^d$, then $x = J(A)b$ is the unique
solution of the system of equations $Ax = b$.
(c) Prove that the map $S : \mathcal{H}(\mathbb{R}^d, \mathbb{R}^d) \times \mathbb{R}^d \to \mathbb{R}^d$ defined by $S(A, b) := J(A)b$ is continuous.
(d) Suppose an industrial process allows the measurement of the coefficient matrix A and of
the right hand side b of a linear system of equations Ax = b. Moreover, suppose that the
process requires the computation of the solution x . Explain why (unavoidable) sufficiently
small measurement errors are not likely to have a large effect on the computed solution x .
[Figure 42 has two panels: a 3d view and a side view.]
Figure 42: Differentiation is approximation with linear functions. The graph of a linear function from R2 to R is a plane. The figure shows that the difference between
the graph of a differentiable function and the graph of an appropriately shifted linear
function fits into arbitrarily small “cones.” In the side view, the plane is seen “edge-on”
and we only indicate the sides of the cone with dotted lines.
17.3 Differentiability
Although it may be geometrically intuitive, defining the derivative of a multivariable
function in terms of partial derivatives can become a notational nightmare. Just consider the indices and notations in the matrix representation of a linear function in Section 17.2 and imagine them as part of a more complex definition. To circumvent this
level of detail, which might obscure the forest for the trees, we introduce derivatives
with a coordinate-free definition. Aside from simpler notation, we avoid the pathology
presented in Exercises 17-60 and 17-61 and we gain conceptual insights into a theory
that is not bound to finite dimensional spaces. Indeed, all results in this section hold
for infinite dimensional spaces, too. Interestingly enough, restriction to finite dimensional spaces would not simplify the proofs. This is similar to Sections 14.2-14.4,
where proofs originally designed for functions of one real variable translated verbatim
to the setting of measure spaces. In this chapter, we state the proofs in the abstract
setting and provide results for d-dimensional space as corollaries. By staying with
this level of generality, we will ultimately produce some rather elegant proofs of important results. For examples, consider the proof of Leibniz’ Rule in Exercise 17-58,
as well as the ends of Sections 17.7 and 17.8. In each case, important results for the
familiar d-dimensional setting are obtained as corollaries of coordinate-free results on
differentiation.
The key to differentiation in higher dimensional spaces lies in Figure 9 and its
analytical formulation in Theorem 4.5. A function should be differentiable iff it can
be approximated very closely by a shifted linear function. As in Figure 9, the idea
in Definition 17.24 below is that near the point where the derivative is taken, for any
multidimensional analogue of a cone, both the function and the approximating linear
entity should ultimately be in the same “cone” (see Figure 42). Similar to the single
variable setting, the natural domains for differentiable functions are open sets.
Definition 17.24 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$ be
a function and let $x \in \Omega$. Then $f$ is called differentiable at $x$ iff there is a continuous
linear function $L : X \to Y$ so that for all $\varepsilon > 0$ there is a $\delta > 0$ such that for all $z \in X$
with $\|z - x\| < \delta$ we have
$$\|f(z) - f(x) - L[z - x]\| \le \varepsilon \|z - x\|.$$
In this case, we set $Df(x) := L$ and call it the derivative of $f$ at $x$.
By Exercise 17-25b, the derivative is unique, so we can speak of the derivative.
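For a concrete map between finite dimensional spaces, the defining inequality can be explored numerically: the quotient of the approximation error by the distance to the point should become arbitrarily small as the distance shrinks. The sketch below does this for one hypothetical map from $\mathbb{R}^2$ to $\mathbb{R}^2$ with the Euclidean norm. The map `f`, the hand-computed matrix in `Df`, and the sample directions are assumptions for illustration; a numerical experiment of this kind suggests differentiability but does not prove it.

```python
import math

def f(p):
    """Hypothetical example map f : R^2 -> R^2."""
    x, y = p
    return (x * x * y, math.sin(x) + y)

def Df(p):
    """Candidate derivative at p: the matrix of partial derivatives, computed by hand."""
    x, y = p
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def error_ratio(p, h):
    """||f(p + h) - f(p) - Df(p)[h]|| / ||h||, the quantity that must drop below epsilon."""
    z = (p[0] + h[0], p[1] + h[1])
    J = Df(p)
    lin = [J[0][0] * h[0] + J[0][1] * h[1], J[1][0] * h[0] + J[1][1] * h[1]]
    diff = [f(z)[k] - f(p)[k] - lin[k] for k in range(2)]
    return norm(diff) / norm(h)

p = (1.0, 2.0)
ratios = [error_ratio(p, (s, -s)) for s in (1e-1, 1e-2, 1e-3)]
# The ratios shrink as the step shrinks, as the definition requires.
assert ratios[0] > ratios[1] > ratios[2]
assert ratios[2] < 1e-2
```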
Notation 17.25 The argument in round parentheses behind a derivative Df will denote
the point at which the derivative is taken and the argument in square brackets will
denote the place where the derivative (remember that it is a linear function) is evaluated.
That is, for $f : \Omega \to Y$, $Df(x)[a]$ will denote the derivative of $f$ taken at $x \in \Omega$ and
evaluated at $a \in X$.
The function $A[z] := f(x) + Df(x)[z - x]$ is also often called the linear approximation of $f$ at $x$. The name should be clear. The function $A$ is "linear" (affine linear
to be precise, but that distinction is not always made) and it approximates the function
f (see Figure 42). Exercise 17-25 investigates the quality of the approximation and it
also gives a geometric interpretation for functions into the real numbers.
Theorem 17.12 expresses the connection between linear functions $L : \mathbb{R} \to Y$ and
vectors and Theorem 17.17 expresses (among other things) the connection between
linear functions $L : \mathbb{R} \to \mathbb{R}$ and numbers. These connections are the reason why
derivatives of functions defined on intervals ( a , b ) are usually considered to be numbers
or (tangent) vectors. The formal justification is the following.
Proposition 17.26 Let $Y$ be a normed space and let $a < b$ be real numbers. The
function $f : (a, b) \to Y$ is differentiable at $x$ in the sense of Definition 17.24 iff the
limit $f'(x) := \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$ exists. In this case, we call $f'(x)$ the velocity
vector¹ and we have $Df(x)[a] = f'(x)a$ for all $a \in \mathbb{R}$.
Proof. Mimic the proof of Theorem 4.5. (Exercise 17-26.)
Continuous linear functions are the simplest example of differentiable functions
that do not follow the pattern of Proposition 17.26.
Proposition 17.27 Let $X, Y$ be normed spaces and let $L : X \to Y$ be continuous and
linear. Then $L$ is differentiable at every $x \in X$ with $DL(x)[a] = L[a]$.
Proof. Exercise 17-27.
¹The name comes from the fact that if $f : (a, b) \to \mathbb{R}^3$ gives the position of a particle at time $t$, then
$f'(t)$ is the velocity of the particle.
Note that the derivative of a continuous linear function actually is a constant function (whose value at every point is $L[\cdot]$)! This is similar to the derivative of $f(x) = cx$
being $f'(x) = c$. More examples of differentiable functions will be encountered
throughout this chapter. For the rest of this section, we will work with derivatives
in their full generality. First note that differentiability still implies continuity.
Theorem 17.28 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be an open set, and let the
function $f : \Omega \to Y$ be differentiable at $x \in \Omega$. Then $f$ is continuous at $x$.
Proof. Exercise 17-28.
We conclude this section with differentiation rules.
Theorem 17.29 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let the functions
$f, g : \Omega \to Y$ be differentiable at $x \in \Omega$ and let $a \in \mathbb{R}$. Then the sum $f + g$ is
differentiable at $x$ with $D(f + g)(x) = Df(x) + Dg(x)$ and the scalar multiple $af$ is
differentiable at $x$ with $D(af)(x) = a\,Df(x)$.
Proof. When the derivative is given, differentiability is proved directly with the
definition. Consider $af$. For given $\varepsilon > 0$, find $\delta > 0$ so that for all $z \in \Omega$ with
$\|z - x\| < \delta$ we have $\|f(z) - f(x) - Df(x)[z - x]\| \le \frac{\varepsilon}{|a| + 1} \|z - x\|$. Then for all
$z \in \Omega$ with $\|z - x\| < \delta$ we infer
$$\|af(z) - af(x) - a\,Df(x)[z - x]\| = |a| \, \|f(z) - f(x) - Df(x)[z - x]\| \le |a| \frac{\varepsilon}{|a| + 1} \|z - x\| \le \varepsilon \|z - x\|,$$
which means that $af$ is differentiable at $x$ and the derivative is $a\,Df(x)$.
The claim about the sum is proved similarly (Exercise 17-29).
The Chain Rule retains its familiar form from calculus, except that the multiplication is replaced by composition. This is natural, because the composition of two linear
functions from R to R corresponds to the multiplication of the numbers that represent
the functions (see Theorem 17.21).
Theorem 17.30 Chain Rule. Let $X, Y, Z$ be normed spaces, let $\Omega_1 \subseteq X$ and $\Omega_2 \subseteq Y$
be open, let $g : \Omega_1 \to \Omega_2$ be differentiable at $x$ and let $f : \Omega_2 \to Z$ be differentiable at
$g(x)$. Then $f \circ g$ is differentiable at $x$ with derivative $D(f \circ g)(x) = Df(g(x)) \circ Dg(x)$.
Proof. Let $\varepsilon > 0$. Find $\delta_1 > 0$ so that for all $y \in Y$ with $\|y - g(x)\| < \delta_1$ we have
$= \varepsilon \|z - x\|$,
which proves the Chain Rule.
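In finite dimensions, where derivatives are represented by matrices, the Chain Rule says that the matrix of $D(f \circ g)(x)$ is the product of the matrices of $Df(g(x))$ and $Dg(x)$. The following sketch compares that product with a central-difference approximation of the derivative of the composition. The maps `f` and `g`, their hand-computed derivative matrices, the step size, and the tolerance are all assumptions chosen for this illustration.

```python
import math

def g(p):                      # hypothetical inner map g : R^2 -> R^2
    x, y = p
    return (x * y, x + y * y)

def f(q):                      # hypothetical outer map f : R^2 -> R^2
    u, v = q
    return (math.exp(u) * v, u - v)

def Dg(p):                     # matrix of Dg(p), computed by hand
    x, y = p
    return [[y, x], [1.0, 2 * y]]

def Df(q):                     # matrix of Df(q), computed by hand
    u, v = q
    return [[math.exp(u) * v, math.exp(u)], [1.0, -1.0]]

def mat_mul(B, A):
    return [[sum(B[i][j] * A[j][k] for j in range(len(A))) for k in range(len(A[0]))]
            for i in range(len(B))]

def numeric_jacobian(F, p, h=1e-6):
    """Central-difference approximation of the derivative matrix of F at p."""
    cols = []
    for k in range(len(p)):
        plus = list(p)
        plus[k] += h
        minus = list(p)
        minus[k] -= h
        Fp, Fm = F(plus), F(minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(Fp, Fm)])
    return [[cols[k][i] for k in range(len(p))] for i in range(len(cols[0]))]

x = (0.5, -1.0)
chain = mat_mul(Df(g(x)), Dg(x))                 # Df(g(x)) composed with Dg(x)
direct = numeric_jacobian(lambda p: f(g(p)), x)  # D(f o g)(x), approximated
max_err = max(abs(chain[i][k] - direct[i][k]) for i in range(2) for k in range(2))
assert max_err < 1e-6
```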
Similar to the Chain Rule, the rule for the differentiation of inverse functions retains
its overall form. The reciprocal in Theorem 4.21 is replaced with the inversion of the
derivative. This is once again natural, because if two linear functions from R to R are
inverses of each other, then the numbers that represent the functions are reciprocals of
each other. Unlike in Theorem 4.21 we must demand that the image of the domain
of f is open and that the inverse is continuous. This is because the argument at the
beginning of the proof of Theorem 4.21 is not easily translated to the setting of normed
spaces. Corollary 17.66 will show that the translation is possible, but it requires the
Implicit Function Theorem.
Theorem 17.31 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$, $\Xi \subseteq Y$ be open and let
$f : \Omega \to \Xi$ be a continuous bijective function with continuous inverse. If $f$ is differentiable at $x_0$ and $Df(x_0)$ is continuous, bijective and linear with continuous inverse,
then $f^{-1}$ is differentiable at $y_0 := f(x_0)$ and $D\bigl(f^{-1}\bigr)(y_0) = \bigl(Df(f^{-1}(y_0))\bigr)^{-1}$.
Proof. First note that there is a $\delta_1 > 0$ so that for all $x \in X$ with $\|x - x_0\| < \delta_1$ we
have
$$\|f(x) - f(x_0) - Df(x_0)[x - x_0]\| \le \frac{1}{2 \, \|(Df(x_0))^{-1}\|} \|x - x_0\|.$$
(Why is the denominator not zero?) Hence, for all $x \in X$ with $\|x - x_0\| < \delta_1$ the following
In particular, for all elements $x \in X$ with $\|x - x_0\| < \delta_1$ we have the inequality
$\|x - x_0\| \le 2 \, \|(Df(x_0))^{-1}\| \, \|f(x) - f(x_0)\|$. Now let $\varepsilon > 0$. Find $\delta_2 \in (0, \delta_1)$ so
that for all $x \in X$ with $\|x - x_0\| < \delta_2$ we have
$= \varepsilon \|y - y_0\|$.
∎
The derivative of the inverse of a function must not be confused with the derivative
of the function that maps an invertible linear function to its inverse, which is considered
in the following. (Recall that in Exercise 17-24a we have already proved that inversion
is a continuous operation. Theorem 17.32 provides another proof of this fact.) Note that
for the theorem to make sense, we first must also prove that the linear homeomorphisms
(invertible continuous linear functions with continuous inverse) form an open subset of
the space C(X,Y ) .
Theorem 17.32 Let $X$ be a Banach space, let $Y$ be a normed space and let $\mathcal{H}(X, Y)$
be the set of linear homeomorphisms from $X$ to $Y$. Then $\mathcal{H}(X, Y)$ is an open subset
of $\mathcal{L}(X, Y)$ and $J(A) := A^{-1}$ is a differentiable function from $\mathcal{H}(X, Y)$ to $\mathcal{H}(Y, X)$
whose derivative at $A \in \mathcal{H}(X, Y)$ is $DJ(A)[F] = -A^{-1} \circ F \circ A^{-1}$.
Proof. For $K \in \mathcal{L}(X, X)$ with $\|K\| < 1$, the series $\sum_{j=0}^{\infty} (-1)^j K^j$ converges absolutely. Because $X$ is a Banach space, the series converges. Let $I : X \to X$ denote the
identity. It can be verified directly that $(I + K)^{-1} = \sum_{j=0}^{\infty} (-1)^j K^j$. In particular, this
means that if $\|K\| < 1$, then $I + K$ is invertible with continuous inverse and its norm
is bounded by $\|(I + K)^{-1}\| \le \frac{1}{1 - \|K\|}$.
Now let $A \in \mathcal{H}(X, Y)$ and consider $F \in \mathcal{L}(X, Y)$ with $\|F\| < \frac{1}{\|A^{-1}\|}$. Then
$\|A^{-1} F\| < 1$, which means that $A$ and $I + A^{-1} F$ are invertible with continuous inverse. Hence, $A + F$ is invertible with continuous inverse. Thus for all $A \in \mathcal{H}(X, Y)$ we infer that $B_{1/\|A^{-1}\|}(A) \subseteq \mathcal{H}(X, Y)$ and
$\mathcal{H}(X, Y)$ is open in $\mathcal{L}(X, Y)$.
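To prove the differentiability of $J$ at $A \in \mathcal{H}(X, Y)$ let $\varepsilon > 0$ and find a positive
$\delta < \min\Bigl\{ \frac{\varepsilon}{2 \|A^{-1}\|^3}, \frac{1}{2 \|A^{-1}\|} \Bigr\}$ so that for all $F \in \mathcal{L}(X, Y)$ the inequality $\|F\| < \delta$
implies that $I + A^{-1} F$ is invertible with continuous inverse and
$$\Bigl\| \sum_{j=0}^{\infty} (-1)^j \bigl(A^{-1} F\bigr)^j \Bigr\| < 2.$$
Then for all functions $F \in \mathcal{L}(X, Y)$ with $\|F\| < \delta$ we conclude that $A + F$ is invertible
with continuous inverse and, because $(A + F)^{-1} = \bigl(I + A^{-1} F\bigr)^{-1} A^{-1}$,
$$\bigl\| J(A + F) - J(A) - \bigl(-A^{-1} F A^{-1}\bigr) \bigr\| = \Bigl\| \sum_{j=2}^{\infty} (-1)^j \bigl(A^{-1} F\bigr)^j A^{-1} \Bigr\| \le \|A^{-1} F\|^2 \, \Bigl\| \sum_{j=0}^{\infty} (-1)^j \bigl(A^{-1} F\bigr)^j \Bigr\| \, \|A^{-1}\| \le 2 \|A^{-1}\|^3 \|F\|^2 \le \varepsilon \|F\|,$$
which proves the result.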
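For matrices, the derivative formula $DJ(A)[F] = -A^{-1} \circ F \circ A^{-1}$ can be tested numerically by comparing $J(A + sF) - J(A)$ with $DJ(A)[sF]$ for shrinking $s$. The sketch below does this for $2 \times 2$ matrices. The matrices `A` and `F`, the helper names, and the use of the Frobenius norm (which is equivalent to the operator norm in finite dimensions) are assumptions made for this illustration.

```python
def inv2(M):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul(P, Q):
    return [[sum(P[i][j] * Q[j][k] for j in range(2)) for k in range(2)] for i in range(2)]

def add(P, Q, s=1.0):
    """Return P + s * Q."""
    return [[P[i][k] + s * Q[i][k] for k in range(2)] for i in range(2)]

def frob(P):
    """Frobenius norm, used here in place of the operator norm."""
    return sum(P[i][k] ** 2 for i in range(2) for k in range(2)) ** 0.5

A = [[2.0, 1.0], [0.0, 3.0]]      # hypothetical invertible matrix
F = [[0.3, -0.2], [0.5, 0.1]]     # hypothetical direction
A_inv = inv2(A)
DJ = [[-t for t in row] for row in mul(mul(A_inv, F), A_inv)]   # -A^{-1} F A^{-1}

def remainder(s):
    """||J(A + sF) - J(A) - DJ(A)[sF]|| / ||sF|| in the Frobenius norm."""
    diff = add(add(inv2(add(A, F, s)), A_inv, -1.0), DJ, -s)
    return frob(diff) / frob([[s * t for t in row] for row in F])

r1, r2, r3 = remainder(1e-1), remainder(1e-2), remainder(1e-3)
# The relative remainder shrinks with the size of the perturbation.
assert r1 > r2 > r3 and r3 < 1e-3
```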
Exercises
17-25. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and let $f, g : \Omega \to Y$ be functions. We will say
that $g$ is tangential to $f$ at $x \in \Omega$ iff $\lim_{z \to x} \frac{\|f(z) - g(z)\|}{\|z - x\|} = 0$. (Note that we can work with limits,
because by Exercise 16-52 every point of $\Omega$ is an accumulation point.)
(a) Prove that the function $f : \Omega \to Y$ is differentiable at $x \in \Omega$ with derivative $L$ iff the function
$g(z) := f(x) + L[z - x]$ is tangential to $f$ at $x$.
(b) Prove that if two continuous linear functions $L_1, L_2 : X \to Y$ are so that both shifted functions $f(x) + L_j[\,\cdot - x]$ are tangential to $f$ at $x$, then $L_1 = L_2$.
(c) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be differentiable at the point $(x_0, y_0) \in \mathbb{R}^2$. Explain why the function
$P(x, y) := f(x_0, y_0) + Df(x_0, y_0)[(x - x_0, y - y_0)]$ is also called the tangent plane of $f$
at $(x_0, y_0)$.
Hint. Represent $Df(x_0, y_0)$ as a matrix.
(d) Let $X$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $x \in \Omega$.
Define the tangent hyperplane of $f : \Omega \to \mathbb{R}$ at $x$.
17-26. Prove Proposition 17.26.
17-27. Prove Proposition 17.27
17-28. Prove Theorem 17.28.
Hint. Add and subtract $Df(x)[z - x]$ inside $\|f(z) - f(x)\|$.
17-29. Finish the proof of Theorem 17.29. That is, let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let
$f, g : \Omega \to Y$ be differentiable at $x \in \Omega$ and prove that $f + g$ is differentiable at $x$ with derivative
$D(f + g)(x) = Df(x) + Dg(x)$.
17-30. Explain why we must use "less than or equal" in Definition 17.24 while we can use the strict inequality in Theorem 4.5.
17-31. Let $\Omega \subseteq \mathbb{R}^n$ be open and let $x \in \Omega$. Explain why a function $f : \Omega \to \mathbb{R}^m$ is differentiable at $x \in \Omega$
iff there is a matrix $A \in M(m \times n, \mathbb{R})$ so that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in \Omega$
with $\|z - x\| < \delta$ we have $\|f(z) - f(x) - A(z - x)\| \le \varepsilon \|z - x\|$, where $A(z - x)$ is the matrix
product of $A$ with the representation of $z - x$ as a column vector with respect to the standard base.
17-32. Let $(M, \Sigma, \mu)$ be a measure space. Define $I : L^2(M, \Sigma, \mu) \to \mathbb{R}$ by $I(f) := \int_M f^2 \, d\mu$. Prove
that $I$ is differentiable with $DI(f)[u] = \int_M 2fu \, d\mu$.
17-33. Use Theorem 17.32 to prove that $\frac{d}{dx}\Bigl(\frac{1}{x}\Bigr) = -\frac{1}{x^2}$.
17-34. Compare the proofs of Theorems 4.10 and 17.30. Decide which is simpler and explain your decision.
17-35. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to \mathbb{R}$ be differentiable at $a \in \Omega$, let $y \in Y$
and let $(fy)(x) := f(x)y$ for all $x \in \Omega$. Prove that $D(fy)(a)[\cdot] = Df(a)[\cdot]\,y$.
17-36. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$ be continuous at the point
$x \in \Omega$. Prove that if there is a linear function $L : X \to Y$ (we do not assume $L$ is continuous) so that for all $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have
$\|f(z) - f(x) - L[z - x]\| \le \varepsilon \|z - x\|$, then $f$ is differentiable at $x$.
17-37. Let $(X, \|\cdot\|_X)$, $(Y, \|\cdot\|_Y)$ be normed spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$, let $x \in \Omega$, let
$\|\cdot\|'_X$ be a norm on $X$ that is equivalent to $\|\cdot\|_X$ and let $\|\cdot\|'_Y$ be a norm on $Y$ that is equivalent to $\|\cdot\|_Y$.
Prove that $f$ considered as a function from a subset of $(X, \|\cdot\|_X)$ to $(Y, \|\cdot\|_Y)$ is differentiable
at $x$ iff $f$ considered as a function from a subset of $(X, \|\cdot\|'_X)$ to $(Y, \|\cdot\|'_Y)$ is differentiable at $x$.
17-38. Let $X$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $x \in \Omega$. Prove
that if there is a $\delta > 0$ so that for all $z \in \Omega$ with $\|z - x\| < \delta$ we have that $f(z) \le f(x)$ (that is,
there is a local maximum at $x$), then $Df(x) = 0$.
Hint. Suppose the contrary and use an $a \in X$ so that $Df(x)[a] > 0$.
17-39. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open and for all $n \in \mathbb{N}$ let $f_n : \Omega \to Y$ be differentiable
on $\Omega$ with continuous derivative $Df_n : \Omega \to \mathcal{L}(X, Y)$. Prove that if $\{f_n\}_{n=1}^{\infty}$ converges pointwise to
a function $f : \Omega \to Y$ and $\{Df_n\}_{n=1}^{\infty}$ converges uniformly to a function $T : \Omega \to \mathcal{L}(X, Y)$, then
$f$ is differentiable and $Df = T$.
Hint. First define pointwise and uniform convergence.
17.4 The Mean Value Theorem
The Mean Value Theorem (Theorem 4.18) cannot be translated directly to higher dimensional settings. Exercise 17-40 shows that for functions $f : [a, b] \to X$, where $X$
is a normed space, the derivative (velocity vector) need not be parallel to f ( b ) - f ( a )
at any t E ( a , b ) . However, the Mean Value Theorem is mainly used to bound the
difference f ( b ) - f ( a ) by the product of b - a with the supremum of the derivative.
Such a result can be proved in a more general setting and we will call the general result
“Mean Value Theorem,” too. A natural idea for the proof is to first use the Fundamental Theorem of Calculus for an appropriately defined integral and to then use the
triangular inequality (Exercises 17-41 and 17-42). This approach ultimately requires
the continuity of the derivative or a technical integrability condition. Conditions of this
kind can be avoided by working with compactness instead.
Theorem 17.33 Mean Value Theorem. Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be
open, let $f : \Omega \to Y$ be differentiable and let $a, b \in \Omega$ be distinct points so that for
all $t \in [0, 1]$ we have $a + t(b - a) \in \Omega$. Then
$$\|f(b) - f(a)\| \le \sup\bigl\{ \|Df(a + t(b - a))\| : t \in [0, 1] \bigr\} \, \|b - a\|.$$
Proof. Consider $g(t) := f(a + t(b - a))$, defined on an interval $(-\nu, 1 + \nu)$ for some $\nu > 0$. By the Chain Rule, we obtain $g'(t) = \frac{d}{dt} g(t) = Df(a + t(b - a))[b - a]$. Let $\varepsilon > 0$. For every $t \in [0, 1]$, there is a $\delta_t > 0$ so that for all $z \in X$ with $\|z\| < \delta_t \|b - a\|$ we have
\[ \|f(a + t(b - a) + z) - f(a + t(b - a)) - Df(a + t(b - a))[z]\| \le \frac{\varepsilon}{\|b - a\|}\,\|z\|. \]
Therefore, with $c := \sup\{\|Df(a + t(b - a))\| : t \in [0, 1]\}$, for all $t, x \in [0, 1]$ with $|x - t| < \delta_t$ we infer
\begin{align*}
\|g(x) - g(t)\| &\le \|g(x) - g(t) - g'(t)(x - t)\| + \|g'(t)(x - t)\| \\
&= \|f(a + x(b - a)) - f(a + t(b - a)) - Df(a + t(b - a))[(x - t)(b - a)]\| \\
&\qquad + \|Df(a + t(b - a))[(x - t)(b - a)]\| \\
&\le \frac{\varepsilon}{\|b - a\|}\,\|(x - t)(b - a)\| + \|Df(a + t(b - a))\|\,\|b - a\|\,|x - t| \\
&\le (c\|b - a\| + \varepsilon)\,|x - t|.
\end{align*}
Now $\{(t - \delta_t, t + \delta_t) : t \in [0, 1]\}$ is an open cover of the compact set $[0, 1]$, which means there is a finite subcover $\{(t_i - \delta_{t_i}, t_i + \delta_{t_i}) : i = 1, \dots, n\}$. Without loss of generality, we can assume that $t_1 < t_2 < \dots < t_n$, $t_1 - \delta_{t_1} < 0$, $1 < t_n + \delta_{t_n}$, and $t_{i+1} - \delta_{t_{i+1}} < t_i + \delta_{t_i}$ for $i = 1, \dots, n - 1$. Set $x_0 := 0$, $x_n := 1$ and for $i = 1, \dots, n - 1$ let $x_i \in (t_{i+1} - \delta_{t_{i+1}}, t_i + \delta_{t_i}) \cap (t_i, t_{i+1})$. Then, using a telescoping sum,
\begin{align*}
\|f(b) - f(a)\| &= \|g(1) - g(0)\| \\
&\le \sum_{i=1}^{n} \|g(x_i) - g(t_i)\| + \sum_{i=1}^{n} \|g(t_i) - g(x_{i-1})\| \\
&\le \sum_{i=1}^{n} (c\|b - a\| + \varepsilon)(x_i - t_i) + \sum_{i=1}^{n} (c\|b - a\| + \varepsilon)(t_i - x_{i-1}) \\
&= c\|b - a\| + \varepsilon.
\end{align*}
Because $\varepsilon > 0$ was arbitrary, the theorem is proved.
The requirement that there is a straight line connection between the points a and
b feels limiting, but it cannot be dropped. In particular, mere connectedness of the
domain is not enough. (See Exercise 17-62.)
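The inequality of Theorem 17.33 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the map `f`, the endpoints and the grid size are our own choices, and the Frobenius norm is used as a convenient upper bound for the operator norm of the Jacobian with respect to the Euclidean norm.

```python
import math

def f(p):
    """Sample map f : R^2 -> R^2 (an illustrative choice, not from the text)."""
    x, y = p
    return (math.sin(x) * y, x * x + math.cos(y))

def jacobian(p):
    """Exact Jacobian matrix of the sample map at p, as a list of rows."""
    x, y = p
    return [[math.cos(x) * y, math.sin(x)],
            [2.0 * x, -math.sin(y)]]

def frobenius(m):
    """Frobenius norm; it dominates the Euclidean operator norm."""
    return math.sqrt(sum(e * e for row in m for e in row))

def euclid(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

def mvt_bound(a, b, samples=1000):
    """Grid approximation of sup ||Df(a + t(b - a))|| * ||b - a|| over t in [0, 1]."""
    d = [bi - ai for ai, bi in zip(a, b)]
    sup = max(
        frobenius(jacobian([ai + (k / samples) * di for ai, di in zip(a, d)]))
        for k in range(samples + 1)
    )
    return sup * euclid(d)

a, b = (0.0, 0.0), (1.0, 2.0)
lhs = euclid([fb - fa for fa, fb in zip(f(a), f(b))])  # ||f(b) - f(a)||
rhs = mvt_bound(a, b)                                  # right side of the theorem
print(lhs, "<=", rhs)
```

Because the supremum is only sampled on a grid, this is a plausibility check and not a proof.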
Exercises
17-40. There is no direct translation of the Mean Value Theorem to vector valued functions. Let the function $f : [0, \pi] \to \mathbb{R}^3$ be defined by $f(t) := (\cos(t), \sin(t), t^2(\pi - t))$. Prove that there is no $c \in (0, \pi)$ so that $f(\pi) - f(0) = Df(c)[\pi - 0]$.
17-41. The Riemann integral for Banach space valued functions. Let $X$ be a Banach space and let $f : [a, b] \to X$ be a function.
(a) Define what it means for $f$ to be Riemann integrable.
(b) Prove that if $f$ is continuous, then $f$ is Riemann integrable.
Hint. Prove that for a sufficiently fine partition $P$ any two Riemann sums $R(f, P, T_1)$ and $R(f, P, T_2)$ are close to each other. Then prove that for any refinement $Q$ the Riemann sums for $P$ and for $Q$ are close to each other. Prove that for $P_n$ being the equidistant partition with $n$ intervals, the Riemann sums converge. Then prove that all Riemann sums converge.
(c) Prove the Antiderivative Form of the Fundamental Theorem of Calculus. That is, prove that if $f : [a, b] \to X$ is differentiable and $f'$ is integrable, then
\[ \int_a^b f'(x)\,dx = f(b) - f(a). \]
Hint. Cover $[a, b]$ with intervals $(x - \delta_x, x + \delta_x)$ so that for all $z \in [a, b]$ with $|z - x| < \delta_x$ we have $\|f(z) - f(x) - f'(x)(z - x)\| \le \frac{\varepsilon}{b - a}\,|z - x|$. Take a finite subcover and construct a partition.
(d) Triangular inequality. Prove that $\left\| \int_a^b f(x)\,dx \right\| \le \int_a^b \|f(x)\|\,dx$ if both integrals exist.
(e) Prove the Derivative Form of the Fundamental Theorem of Calculus. That is, prove that if $f : [a, b] \to X$ is continuous, then $\frac{d}{dt} \int_a^t f(x)\,dx = f(t)$ for all $t \in [a, b]$.
17-42. Use Exercises 17-41c and 17-41d to prove that if $X, Y$ are normed spaces, $\Omega \subseteq X$ is an open subset, $f : \Omega \to Y$ is differentiable, $Df$ is continuous, and $a, b \in \Omega$ are so that for all $t \in [0, 1]$ we have $a + t(b - a) \in \Omega$, then $\|f(b) - f(a)\| \le \sup\{\|Df(a + t(b - a))\| : t \in [0, 1]\}\,\|b - a\|$. Then explain why we cannot avoid the continuity hypothesis in this approach.
17-43. Use Exercises 17-41c and 17-41d to prove that if $X, Y$ are normed spaces, $\Omega \subseteq X$ is an open subset, $f : \Omega \to Y$ is differentiable, $Df$ is continuous, and $a, b \in \Omega$ and $\gamma > 0$ are so that for all $t \in [0, 1]$ we have $a + t(b - a) \in \Omega$ and $\|Df(a + t(b - a)) - Df(a)\| \le \gamma t \|b - a\|$, then the inequality
\[ \|f(b) - f(a) - Df(a)[b - a]\| \le \frac{\gamma}{2}\,\|b - a\|^2 \]
holds.
Hint. Mimic the proof of Lemma 13.13 and use Exercise 17-41c.
17-44. Newton's method in several variables. Let $X$ be a Banach space, let $\Omega \subseteq X$ be open and let $f : \Omega \to X$ be a continuously differentiable function so that $Df(x) \in \mathcal{H}(X, X)$ for all $x \in \Omega$. Prove that if there are $x_0 \in \Omega$ and $\alpha, \beta, \gamma > 0$ so that $\|Df(x_0)^{-1}[f(x_0)]\| \le \alpha$, so that for all $x \in \Omega$ we have $\|Df(x)^{-1}\| \le \beta$, so that for all $x, z \in \Omega$ we have $\|Df(z) - Df(x)\| \le \gamma \|z - x\|$, so that $h := \frac{\alpha\beta\gamma}{2} < 1$ and so that with $r := \frac{\alpha}{1 - h}$ we have $B_r(x_0) \subseteq \Omega$, then
(a) Each recursively defined point $x_{n+1} := x_n - Df(x_n)^{-1}[f(x_n)]$ is in $B_r(x_0)$.
(b) The sequence $\{x_n\}_{n=0}^{\infty}$ converges to a point $u \in B_r(x_0)$ with $f(u) = 0$.
(c) For all $n \ge 0$ we have $\|u - x_n\| \le \alpha\,\frac{h^{2^n - 1}}{1 - h^{2^n}}$.
Hint. Mimic the proof of Theorem 13.14.
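The recursion $x_{n+1} := x_n - Df(x_n)^{-1}[f(x_n)]$ from Exercise 17-44 is easy to try out in $\mathbb{R}^2$. The following is a minimal sketch, not a verification of the hypotheses of the exercise: the function name `newton_2d`, the sample system and the starting point are assumptions made for illustration, and the linear system is solved with Cramer's rule.

```python
def newton_2d(f, df, x0, steps=20, tol=1e-12):
    """Iterate x_{n+1} = x_n - Df(x_n)^{-1}[f(x_n)] for a map f : R^2 -> R^2.

    f(x, y) returns a pair, df(x, y) returns the Jacobian as ((a, b), (c, d)).
    """
    x, y = x0
    for _ in range(steps):
        fx, fy = f(x, y)
        (a, b), (c, d) = df(x, y)
        det = a * d - b * c
        if det == 0.0:
            raise ZeroDivisionError("Df is not invertible at the current point")
        # Solve Df(x_n)[step] = f(x_n) by Cramer's rule.
        sx = (d * fx - b * fy) / det
        sy = (a * fy - c * fx) / det
        x, y = x - sx, y - sy
        if abs(sx) + abs(sy) < tol:
            break
    return x, y

# Sample system (illustrative): x^2 + y^2 = 4 and x = y, with root (sqrt 2, sqrt 2).
sample_f = lambda x, y: (x * x + y * y - 4.0, x - y)
sample_df = lambda x, y: ((2.0 * x, 2.0 * y), (1.0, -1.0))
root = newton_2d(sample_f, sample_df, (1.0, 2.0))
print(root)
```

Starting from $(1, 2)$ the first step lands on $(1.5, 1.5)$, after which the iterates stay on the diagonal and converge rapidly.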
17.5 How Partial Derivatives Fit In
Partial derivatives are usually defined in the direction of a coordinate axis. Rather than tying our presentation to $\mathbb{R}^d$ and its coordinate axes, we consider a product $\prod_{i=1}^{d} X_i$ of normed spaces.
Proposition 17.34 For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. Then $\prod_{i=1}^{d} X_i$ with componentwise addition and scalar multiplication is a vector space and $\|(x_1, \dots, x_d)\| := \left\|\left(\|x_1\|_1, \dots, \|x_d\|_d\right)\right\|_{\mathbb{R}^d}$ defines a norm on it.
Proof. Exercise 17-45.
Definition 17.35 For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. The norm $\|(x_1, \dots, x_d)\| := \left\|\left(\|x_1\|_1, \dots, \|x_d\|_d\right)\right\|_{\mathbb{R}^d}$ is called a product norm. From now on, we will assume that any product $\prod_{i=1}^{d} X_i$ of normed spaces is equipped with a product norm and we will call it a product space. Because all norms on $\mathbb{R}^d$ are equivalent, all product norms are equivalent. Hence, unless otherwise stated, we will use $\max\{\|x_1\|_1, \dots, \|x_d\|_d\}$ as the product norm.
Exercise 17-46 shows that all product norms are equivalent, so we are free to interchange specific norms in $\mathbb{R}^d$ in the definition of the product norm we use. The definition of product norms implies that the product of Banach spaces is again a Banach space and that the natural projections are continuous.
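To make the construction of product norms concrete, the sketch below builds two product norms on $\mathbb{R}^2 \times \mathbb{R}$ from the same factor norms and two different norms on $\mathbb{R}^2$. All names (`product_norm`, `pn_max`, `pn_sum`) and the sample point are our own illustrative choices. With $d$ factors, the two norms satisfy $\|x\|_{\max} \le \|x\|_{\mathrm{sum}} \le d\,\|x\|_{\max}$, which is the kind of estimate behind the equivalence of product norms.

```python
import math

def product_norm(factor_norms, outer_norm):
    """Build (x_1, ..., x_d) -> outer_norm((||x_1||_1, ..., ||x_d||_d))."""
    def norm(xs):
        return outer_norm([n(x) for n, x in zip(factor_norms, xs)])
    return norm

euclid = lambda v: math.sqrt(sum(c * c for c in v))   # norm on the factor R^2
max_norm = lambda v: max(abs(c) for c in v)           # a norm on R^d
sum_norm = lambda v: sum(abs(c) for c in v)           # another norm on R^d

factors = [euclid, abs]                 # factor spaces R^2 and R
pn_max = product_norm(factors, max_norm)
pn_sum = product_norm(factors, sum_norm)

point = ((3.0, 4.0), -2.0)              # an element of R^2 x R
print(pn_max(point), pn_sum(point))     # the factor norms are 5.0 and 2.0
```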
Proposition 17.36 For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space. The product $\prod_{i=1}^{d} X_i$ is a Banach space iff all factor spaces $(X_i, \|\cdot\|_i)$ are Banach spaces.
Proof. Exercise 17-47.
Figure 43: For partial derivatives, we restrict our attention to an appropriately translated
subspace and take the derivative in that subspace.
Proposition 17.37 For $i = 1, \dots, d$ let $(X_i, \|\cdot\|_i)$ be a normed space and let $\prod_{i=1}^{d} X_i$ be the product space. Then the natural projections $\pi_j : \prod_{i=1}^{d} X_i \to X_j$ defined by $\pi_j(x_1, \dots, x_d) := x_j$ are continuous.
Proof. Exercise 17-48.
Exercise 17-49 now shows that not every norm on a product of vector spaces is a
product norm. Proposition 17.38 assures that as we consider partial derivatives with
respect to a factor space, the domains of the requisite functions are open.
Proposition 17.38 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $\Omega \subseteq X$ be open, and let $a = (a_1, \dots, a_d) \in \Omega$. Then for all $j \in \{1, \dots, d\}$ the set
\[ \Omega_j := \{x_j \in X_j : (a_1, \dots, a_{j-1}, x_j, a_{j+1}, \dots, a_d) \in \Omega\} \]
is open in $(X_j, \|\cdot\|_j)$.
Proof. Exercise 17-50.
Definition 17.39 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open, let $a = (a_1, \dots, a_d) \in \Omega$, and let $j \in \{1, \dots, d\}$. For $f : \Omega \to Y$, define $f_j^a : \Omega_j \to Y$ by $f_j^a(x_j) := f(a_1, \dots, a_{j-1}, x_j, a_{j+1}, \dots, a_d)$. If $f_j^a$ is differentiable at $a_j$, then the derivative of $f_j^a$ at $a_j$ is denoted $D_j f(a)$ and it is called the partial derivative of $f$ with respect to $X_j$ at $a$ or the partial derivative of $f$ with respect to the $j$th variable at $a$. For a visualization, consider Figure 43.
It is easy to see that if $f$ is differentiable at $a \in \prod_{i=1}^{d} X_i$, then the restriction of its derivative to $X_j$ is the partial derivative with respect to $X_j$.
Proposition 17.40 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$. If $f$ is differentiable at $a \in \Omega$, then for all integers $j \in \{1, \dots, d\}$ the partial derivative at $a$ with respect to $X_j$ exists and is equal to $D_j f(a) = Df(a)|_{\{0\} \times \dots \times \{0\} \times X_j \times \{0\} \times \dots \times \{0\}}$. Moreover, for all $u \in X$ we have
\[ Df(a)[u_1, \dots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j]. \]
Proof. Exercise 17-51.
Unfortunately, the existence of partial derivatives is not sufficient for differentiability (in fact, not even for continuity) as Exercises 17-60 and 17-61 show. The pathology
exhibited in these exercises is why we build the theory around derivatives as in Definition 17.24 instead of partial derivatives. Nonetheless, it is possible to construct the
derivative from partial derivatives, as long as the partial derivatives are continuous.
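The failure described above can be observed numerically. The sketch below assumes the classical example $f(x, y) = \frac{xy}{x^2 + y^2}$ for $(x, y) \ne (0, 0)$ and $f(0, 0) = 0$; the helper names are our own. Both coordinate difference quotients at the origin vanish, yet $f$ equals $\frac{1}{2}$ along the diagonal, so $f$ is not continuous at the origin.

```python
def f(x, y):
    """Classical example: xy / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_at_origin(axis, h):
    """Difference quotient of f at (0, 0) along coordinate axis 0 or 1."""
    if axis == 0:
        return (f(h, 0.0) - f(0.0, 0.0)) / h
    return (f(0.0, h) - f(0.0, 0.0)) / h

# Difference quotients along both axes for step sizes 10^-1, ..., 10^-7.
quotients = [partial_at_origin(axis, 10.0 ** -k) for axis in (0, 1) for k in range(1, 8)]
# Values of f on the diagonal approaching the origin.
diagonal = [f(10.0 ** -k, 10.0 ** -k) for k in range(1, 8)]
print(quotients)
print(diagonal)
```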
Theorem 17.41 Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$. If for every $j \in \{1, \dots, d\}$ the function $f$ is differentiable with respect to the $j$th variable at every $x \in \Omega$ and the function $x \mapsto D_j f(x)$ is continuous at $a \in \Omega$, then $f$ is differentiable at $a$ and the derivative of $f$ at $a$ is
\[ Df(a)[u_1, \dots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j]. \]
Proof. Recall that we said we would use $\|x\| := \max\{\|x_i\|_i : i = 1, \dots, d\}$ as the product norm. Let $\varepsilon > 0$. Find $\delta > 0$ so that for all $j = 1, \dots, d$ and all $x \in B_\delta(a)$ we have $\|D_j f(x) - D_j f(a)\| < \frac{\varepsilon}{d}$. Then for all $(z_1, \dots, z_d) \in B_\delta(a_1, \dots, a_d)$ we obtain the following.
[Displayed estimate lost in extraction.]
The above proves that $Df(a)[u_1, \dots, u_d] = \sum_{j=1}^{d} D_j f(a)[u_j]$.
With the general results established, we can now turn to the familiar partial derivatives on $\mathbb{R}^n$. Because $\mathbb{R}^n = \prod_{i=1}^{n} \mathbb{R}$ is the prototypical product space, all results proved so far apply to $\mathbb{R}^n$. Therefore we can concentrate on translating the abstract notions to the more concrete setting of $n$-dimensional space. Partial derivatives in $\mathbb{R}^n$ are typically defined in the direction of a coordinate vector.
Definition 17.42 Let the set $\Omega \subseteq \mathbb{R}^n$ be open, let $f : \Omega \to \mathbb{R}^m$ be a function and let $a \in \Omega$. Then the partial derivative of $f$ with respect to $x_j$ at $a$ is defined to be
\[ \frac{\partial f}{\partial x_j}(a) := \lim_{h \to 0} \frac{f(a + h e_j) - f(a)}{h} \]
if this limit exists.
The connection between these partial derivatives and the derivative is an exercise
in reinterpreting abstract concepts as matrices.
Theorem 17.43 Let $\Omega \subseteq \mathbb{R}^n$ be an open set, let $f : \Omega \to \mathbb{R}^m$ be differentiable at $a \in \Omega$, for $i = 1, \dots, m$ let $f_i := \pi_i \circ f$ and let $u \in \mathbb{R}^n$ be so that $u = \sum_{j=1}^{n} u_j e_j$. Then
\[ Df(a)[u] = \sum_{i=1}^{m} e_i \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(a)\,u_j. \]
That is, the matrix $\left(\frac{\partial f_i}{\partial x_j}(a)\right)_{i = 1, \dots, m;\ j = 1, \dots, n}$ represents $Df(a)$ with respect to the standard bases in $\mathbb{R}^n$ and $\mathbb{R}^m$. This matrix is also called the Jacobian matrix of $f$ at $a$.
Conversely, if $f : \Omega \to \mathbb{R}^m$ is so that for all $i = 1, \dots, m$ and $j = 1, \dots, n$ and all $x \in \Omega$ the partial derivative $\frac{\partial f_i}{\partial x_j}(x)$ exists and if all these partial derivatives are continuous at $a$, then $f$ is differentiable at $a$.
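Proof. For $j = 1, \dots, n$, let $\mathbb{R}_j := \mathbb{R}$ so that $\mathbb{R}^n = \prod_{j=1}^{n} \mathbb{R}_j$. (In this fashion, we can distinguish the partial derivatives in the coordinate directions.) For $g : \Omega \to \mathbb{R}$ differentiable at $a$, we infer by Proposition 17.40 that $Dg(a)[u] = \sum_{j=1}^{n} D_{\mathbb{R}_j} g(a)[u_j]$. By Proposition 17.26 for all $j = 1, \dots, n$ we obtain $D_{\mathbb{R}_j} g(a)[u_j] = \frac{\partial g}{\partial x_j}(a)\,u_j$, and hence $Dg(a)[u] = \sum_{j=1}^{n} \frac{\partial g}{\partial x_j}(a)\,u_j$. Now for $f : \Omega \to \mathbb{R}^m$ by Exercise 17-35 we conclude that
\[ Df(a)[u] = \sum_{i=1}^{m} e_i\,Df_i(a)[u] = \sum_{i=1}^{m} e_i \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}(a)\,u_j. \]
The last statement follows from Theorem 17.41.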
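The matrix representation in Theorem 17.43 can be checked numerically. The sketch below is an illustration under our own assumptions: the map `f`, the point `a`, the direction `u` and the step sizes are arbitrary choices, and the partial derivatives are approximated by central differences rather than computed exactly. It compares the Jacobian matrix applied to $u$ with a difference quotient of $f$ in the direction $u$.

```python
import math

def f(x):
    """Sample map f : R^2 -> R^3 (illustrative, not from the text)."""
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def jacobian_fd(func, a, h=1e-6):
    """Central-difference approximation of the matrix (d f_i / d x_j)(a)."""
    m, n = len(func(a)), len(a)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += h
        minus[j] -= h
        fp, fm = func(plus), func(minus)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2.0 * h)
    return jac

def mat_vec(jac, u):
    """Apply the matrix jac to the vector u."""
    return [sum(row[j] * u[j] for j in range(len(u))) for row in jac]

a, u = [1.0, 2.0], [0.5, -1.0]
J = jacobian_fd(f, a)

# Difference quotient of f at a in the direction u, approximating Df(a)[u].
t = 1e-6
forward = f([ai + t * ui for ai, ui in zip(a, u)])
backward = f([ai - t * ui for ai, ui in zip(a, u)])
directional = [(p - q) / (2.0 * t) for p, q in zip(forward, backward)]

print(mat_vec(J, u))
print(directional)
```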
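For $m = 1$, the Jacobian matrix of a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is given special attention because of its physical meaning.
Definition 17.44 Let $\Omega \subseteq \mathbb{R}^n$ be open and let $f : \Omega \to \mathbb{R}$ be so that all partial derivatives of $f$ exist at $a \in \Omega$. Then we define the gradient of $f$ at $a$ to be
\[ \operatorname{grad}(f)(a) := \nabla f(a) := \begin{pmatrix} \frac{\partial f}{\partial x_1}(a) \\ \vdots \\ \frac{\partial f}{\partial x_n}(a) \end{pmatrix}. \]
With $\nabla := \begin{pmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{pmatrix}$ this looks like a formal multiplication of $f$ with the "vector" $\nabla$. The (purely formal) "vector" $\nabla$ is also called the nabla operator.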
The physical meaning of the gradient is easily explained with what we have derived so far. Let $f : \Omega \to \mathbb{R}$ be differentiable at $a$. Then by Theorem 17.43 for all $u \in \mathbb{R}^n$ we obtain $Df(a)[u] = \langle \nabla f(a), u \rangle$. Moreover, the derivative describes the local behavior of $f$ near $a$. In particular, for any vector $u$ with $\|u\| = 1$ we infer $\lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \langle \nabla f(a), u \rangle$ (Exercise 17-52). That is, the inner product $\langle \nabla f(a), u \rangle$ gives the derivative of $f$ in the direction $u$. This inner product is largest when $u = \frac{\nabla f(a)}{\|\nabla f(a)\|}$, so the gradient vector gives the direction of steepest ascent with its norm being the slope in that direction. Similarly, $-\nabla f(a)$ gives the direction of steepest descent with the negative of its norm being the slope in that direction. Physical systems usually strive for equilibrium. Whenever there is an imbalance in a physical quantity described by $f$ in a homogeneous medium, there will be a flow in the direction of $-\nabla f$ that tries to restore equilibrium. For more on the physical ideas, consider Section 21.2.
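The claim that the directional derivative $\langle \nabla f(a), u \rangle$ is largest in the direction of the gradient can be illustrated by a brute-force search over unit vectors in the plane. This is a sketch under our own assumptions: the sample function $f(x, y) = x^2 + 3xy$, the point `a` and the angular grid are illustrative choices.

```python
import math

def grad_f(x, y):
    """Gradient of the sample function f(x, y) = x^2 + 3xy (illustrative)."""
    return (2.0 * x + 3.0 * y, 3.0 * x)

def directional(a, angle):
    """Inner product <grad f(a), u> for the unit vector u = (cos(angle), sin(angle))."""
    gx, gy = grad_f(*a)
    return gx * math.cos(angle) + gy * math.sin(angle)

a = (1.0, 1.0)
gx, gy = grad_f(*a)
grad_norm = math.hypot(gx, gy)

# Search a grid of 3600 directions for the largest directional derivative.
angles = [2.0 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=lambda t: directional(a, t))

print(best, math.atan2(gy, gx))            # best angle vs. angle of the gradient
print(directional(a, best), grad_norm)     # best slope vs. norm of the gradient
```

Up to the resolution of the grid, the best direction agrees with the direction of the gradient and the best slope agrees with its norm.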
Exercises
17-45. Prove Proposition 17.34 as follows.
(a) For $i = 1, \dots, d$, let $X_i$ be a vector space. Prove that $\prod_{i=1}^{d} X_i$ with componentwise addition and scalar multiplication is a vector space.
(b) For $i = 1, \dots, d$, let $(X_i, \|\cdot\|_i)$ be a normed space and let $\|\cdot\|_{\mathbb{R}^d}$ be a norm on $\mathbb{R}^d$. Prove that $\|(x_1, \dots, x_d)\| := \left\|\left(\|x_1\|_1, \dots, \|x_d\|_d\right)\right\|_{\mathbb{R}^d}$ defines a norm on $\prod_{i=1}^{d} X_i$.
17-46. Prove that any two product norms are equivalent.
Note. This and Exercise 17-37 justify the free interchange of one product norm for another.
17-47. Prove Proposition 17.36.
17-48. Prove Proposition 17.37.
17-49. Not every norm on a product is a product norm.
(a) Let $X_1, \dots, X_d$ be vector spaces and let $\|\cdot\|_X$ be a norm on $\prod_{j=1}^{d} X_j$.
i. Prove that for all $j = 1, \dots, d$ the function $\|x_j\|_j := \|(0, \dots, 0, x_j, 0, \dots, 0)\|_X$ is a norm on $X_j$.
ii. Prove that if for each $i \in \{1, \dots, d\}$ we pick a fixed $x_i \in X_i \setminus \{0\}$, then the function $\|(\alpha_1, \dots, \alpha_d)\| := \|(\alpha_1 x_1, \dots, \alpha_d x_d)\|_X$ defines a norm on $\mathbb{R}^d$.
(b) Let $(X, \|\cdot\|)$ be a normed space, let $S \subseteq X$ be a dense subspace (like the simple functions in $L^p$, see Theorem 16.85) and let $F$ be a subspace so that $S \cap F = \{0\}$ (with $S$ being the set of simple functions in $L^p$, $F$ could be the set of scalar multiples of a function $f \notin S$). Define $Y := S \times F$ and let $\|(s, f)\|_{np} := \|s + f\|$.
i. Prove that $\|\cdot\|_{np}$ is a norm on $S \times F$.
ii. Prove that the natural projection $\pi_S : S \times F \to S$ is not continuous.
Hint. Let $s_n \to f$ and consider $s_n - f$.
iii. Conclude that $\|\cdot\|_{np}$ is not a product norm on $S \times F$.
17-50. Prove Proposition 17.38.
17-51. Prove Proposition 17.40.
17-52. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable at $a \in \Omega$. Prove that for any vector $u$ with $\|u\| = 1$ we have $\lim_{t \to 0} \frac{f(a + tu) - f(a)}{t} = \langle \nabla f(a), u \rangle$.
17-53. Let $X = \prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq X$ be open and let $f : \Omega \to Y$ be a function. Prove that if $f$ is differentiable on $\Omega$ and $Df$ is continuous, then all partial derivatives $D_j f$ exist for all $a \in \Omega$ and the functions $D_j f$ are continuous on $\Omega$.
17-54. Let $X$ be a vector space and let $E, F \subseteq X$ be vector subspaces of $X$. Then $X$ is called the direct sum of $E$ and $F$, denoted $E \oplus F$, iff $E \cap F = \{0\}$ and for all $x \in X$ there are $e \in E$ and $f \in F$ so that $x = e + f$.
Prove that if $X$ is the direct sum of $E$ and $F$, then $X = E \oplus F$ is isomorphic to $E \times F$.
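17-55. Prove the Multivariable Chain Rule. That is, prove that if $f(x_1, \dots, x_n) : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function of $n$ variables and the components are differentiable functions $x_j(t_1, \dots, t_m)$ of $m$ variables, then
\[ \frac{\partial f}{\partial t_i} = \sum_{j=1}^{n} \frac{\partial f}{\partial x_j}\frac{\partial x_j}{\partial t_i} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t_i} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t_i} + \dots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial t_i}. \]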
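17-56. Coordinate transformations for differential operators. Let $(x, y)$ be rectangular coordinates on $\mathbb{R}^2$, let $r = \sqrt{x^2 + y^2}$ and let
\[ \theta := \begin{cases} \arctan\left(\frac{y}{x}\right); & \text{if } x > 0, \\ \arctan\left(\frac{y}{x}\right) + \pi; & \text{if } x < 0, \end{cases} \]
be polar coordinates on $\mathbb{R}^2$.
(a) Prove that if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable, then $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r}\cos(\theta) - \frac{\partial f}{\partial \theta}\frac{\sin(\theta)}{r}$.
Hint. Use the Chain Rule as stated in Exercise 17-55.
(b) Prove that if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable and $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial \theta}$ are differentiable, then
\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial r^2}\cos^2(\theta) - 2\,\frac{\partial^2 f}{\partial r\,\partial \theta}\frac{\sin(\theta)\cos(\theta)}{r} + 2\,\frac{\partial f}{\partial \theta}\frac{\sin(\theta)\cos(\theta)}{r^2} + \frac{\partial f}{\partial r}\frac{\sin^2(\theta)}{r} + \frac{\partial^2 f}{\partial \theta^2}\frac{\sin^2(\theta)}{r^2}. \]
The second partial derivatives simply denote partial derivatives of partial derivatives.
(c) Let $f$ be a function whose partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial r}$ and $\frac{\partial f}{\partial \theta}$ are differentiable. Derive an expression for $\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}$ in polar coordinates.
Hint. Derive a formula similar to that in part 17-56b for $\frac{\partial^2 f}{\partial y^2}$ and add the two.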
17-57. Let $W$ be a normed space, let $\prod_{i=1}^{d} X_i$ be a product space, let $\Omega \subseteq W$ be open and for $i = 1, \dots, d$ let $f_i : \Omega \to X_i$ be differentiable at $x \in \Omega$. Prove that $f := (f_1, \dots, f_d)$ is differentiable at $x$ with $Df(x)[\cdot] = \left(Df_1(x)[\cdot], \dots, Df_d(x)[\cdot]\right)$.
17-58. Leibniz' Rule. Let $a, b : (c, d) \to (l, u)$ be differentiable and let $g : (l, u) \times (c, d) \to \mathbb{R}$ be differentiable with respect to the second variable. Let $F \in L^1(l, u)$ be so that all $g(\cdot, t)$, $\frac{g(\cdot, t + h) - g(\cdot, t)}{h}$ and $\frac{\partial}{\partial t} g(\cdot, t)$ are bounded by $F(\cdot)$ and let all $g(\cdot, t)$ be continuous. Use the steps outlined below to prove that
\[ \frac{d}{dt} \int_{a(t)}^{b(t)} g(x, t)\,dx = g(b(t), t)\,b'(t) - g(a(t), t)\,a'(t) + \int_{a(t)}^{b(t)} \frac{\partial}{\partial t} g(x, t)\,dx. \]
(a) Prove that $v : (c, d) \to (l, u) \times (l, u) \times L^1(l, u)$ defined by $v(t) := \left(a(t), b(t), g(\cdot, t)\right)$ is differentiable.
Hint. Use the result of Exercise 17-57. Use Proposition 17.26 and the Dominated Convergence Theorem for the differentiability of the third component.
(b) Prove that $s : (l, u) \times (l, u) \times C(l, u) \to \mathbb{R}$ defined by $s(a, b, h) := \int_a^b h(x)\,dx$ is differentiable, where $C(l, u)$ is a normed subspace of $L^1(l, u)$.
Hint. Theorem 17.41. Use the linearity of the integral operator $h \mapsto \int_a^b h(x)\,dx$ on $L^1(l, u)$ for the partial derivative with respect to the third component.
(c) Prove Leibniz' Rule using the Chain Rule.
17-59. Let $X$ be a normed space, let $D \subseteq X$ be a dense subspace, let $Y$ be a Banach space, let $\Omega \subseteq D$ be open in $D$ and let $f : \Omega \to Y$ be continuously differentiable with bounded uniformly continuous derivative. Prove that there is an open set $\widetilde{\Omega} \subseteq X$ and a unique continuously differentiable function $e : \widetilde{\Omega} \to Y$ so that $\widetilde{\Omega} \cap D = \Omega$ and so that $e|_{\Omega} = f$.
Hint. Use the Mean Value Theorem (Theorem 17.33).
17-60. Consider the function $f(x, y) = \begin{cases} \frac{xy}{x^2 + y^2}; & \text{for } (x, y) \ne (0, 0), \\ 0; & \text{for } (x, y) = (0, 0). \end{cases}$
(a) Prove that $\frac{\partial f}{\partial x}(0, 0) = \frac{\partial f}{\partial y}(0, 0) = 0$.
(b) Prove that $f$ is not continuous at $(0, 0)$.
17-61. Consider the function $f(x, y) = \begin{cases} \frac{x^2 y}{x^2 + y^2}; & \text{for } (x, y) \ne (0, 0), \\ 0; & \text{for } (x, y) = (0, 0). \end{cases}$
(a) Prove that $f$ is continuous at $(0, 0)$.
(b) Let $X = \operatorname{span}(u) \subset \mathbb{R}^2$ be an arbitrary one dimensional linear subspace of $\mathbb{R}^2$ with $\|u\| = 1$. Prove that $D_X f(0, 0) := \lim_{t \to 0} \frac{f(tu) - f(0, 0)}{t}$ exists.
(c) Prove that $f$ is not differentiable at $(0, 0)$.
Hint. If $f$ was differentiable at $(0, 0)$, what would the derivatives in part 17-61b be equal to? Use Theorem 17.43.
17-62. An unbounded function with bounded derivative and bounded, connected two-dimensional
TI
( 8 , A r ) E R2 : 8 > -, A r
f ( Q ,A r ) :=
((
4
+ A r ) cos(8),
(f+
and define f : A + R2 by
E
A r ) sin(R)).
(a) Prove that S := f [ A ]is open, connected and contained in B2(0. 0).
(b) Sketch S and state the geometric meaning of 8 and A r .
(c) Prove that f is injective.
(d) For (x, y ) E S, define 8 ( x , y ) to be the first component of f-'(x, y ) . Prove that the function
(x,y ) F+- B(x, y) is differentiable at every point of S by showing that on every open disk
contained in S it differs from arctan
(e) For (x, y ) E S , define r ( x , y ) := J x 2
(0For (x.y)
+ y 2 and prove that r is differentiable on S .
E S define B ( x . y ) := In ( 8 ( x ,y )
). Prove that B
is differentiable with bounded
derivative.
Hint. Prove that the absolute values of the partial derivatives of 8 are equal to one of the
1x1 I cos(Q(x, Y))I or I ~ I - I s i w x , Y))I and use that the radius
expressions r ( x ,Y )
r(x,Y )
1
(g) Prove that
lim
( x . 41+ (0.0)
B ( x , y ) = 00.
17.6 Multilinear Functions (Tensors)
If $f : \Omega \to Y$ is differentiable at every $x \in \Omega$, then we can also try to differentiate the derivative $Df : \Omega \to \mathcal{L}(X, Y)$. If the thus computed second derivative exists, it would be a function that maps points $x \in \Omega$ to linear functions $D^2 f(x)[\cdot] \in \mathcal{L}(X, \mathcal{L}(X, Y))$ and these linear functions map points $u \in X$ to linear functions $D^2 f(x)[u][\cdot]$. It turns out that such functions are linear in both square bracketed arguments (see Proposition 17.53 below). To simplify notation, higher derivatives are usually identified with functions that have several arguments and are linear in each one of them.
Definition 17.45 Let $X = \prod_{i=1}^{k} X_i$ be a product space and let $Y$ be a normed space. Then $T : \prod_{i=1}^{k} X_i \to Y$ is called multilinear or $k$-linear or a $k$-tensor iff for all indices $j \in \{1, \dots, k\}$, all $x, y \in X_j$ and all $\alpha, \beta \in \mathbb{R}$ we have
\[ T[x_1, \dots, x_{j-1}, \alpha x + \beta y, x_{j+1}, \dots, x_k] = \alpha T[x_1, \dots, x_{j-1}, x, x_{j+1}, \dots, x_k] + \beta T[x_1, \dots, x_{j-1}, y, x_{j+1}, \dots, x_k]. \]
2-linear functions are also called bilinear.
As for linear functions, we enclose the argument of multilinear functions in square brackets instead of round parentheses. This is because as multilinear functions are identified with higher derivatives we will evaluate higher derivatives at a point $x$ (enclosed in round parentheses) for an argument $[t_1, \dots, t_m]$, which will be distinguished by being enclosed in square brackets.
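Example 17.46 Examples of bilinear functions.
1. The function $m : \mathbb{R}^2 \to \mathbb{R}$ defined by $m[x, y] := xy$ is bilinear.
2. Let $X$ be a real vector space and let $\langle \cdot, \cdot \rangle$ be an inner product on $X$. Then $\langle \cdot, \cdot \rangle$ is bilinear. An inner product on a complex vector space $X$ is not bilinear, because for $x, y \in X$ and $\alpha \in \mathbb{C}$ we have $\langle x, \alpha y \rangle = \overline{\alpha} \langle x, y \rangle$. Complex inner products are also called sesquilinear.
3. The cross product
\[ \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} \times \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} := \begin{pmatrix} y_1 z_2 - z_1 y_2 \\ z_1 x_2 - x_1 z_2 \\ x_1 y_2 - y_1 x_2 \end{pmatrix} \]
is a bilinear function from $\mathbb{R}^3 \times \mathbb{R}^3$ to $\mathbb{R}^3$.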
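The second and third examples can be spot-checked numerically. The sketch below is a check on sample data only, not a proof: the vectors and scalars are arbitrary choices of ours, and the complex inner product is taken to be conjugate-linear in the second argument, as in the identity $\langle x, \alpha y \rangle = \overline{\alpha} \langle x, y \rangle$.

```python
def cross(p, q):
    """Cross product on R^3."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def combo(alpha, p, beta, q):
    """Linear combination alpha * p + beta * q of two vectors."""
    return tuple(alpha * pi + beta * qi for pi, qi in zip(p, q))

# Linearity of the cross product in its first argument.
x, y, z = (1.0, 2.0, 3.0), (-1.0, 0.5, 4.0), (2.0, -3.0, 1.0)
alpha, beta = 2.0, -3.0
lhs = cross(combo(alpha, x, beta, y), z)
rhs = combo(alpha, cross(x, z), beta, cross(y, z))

def inner(p, q):
    """Complex inner product, conjugate-linear in the second argument."""
    return sum(pi * qi.conjugate() for pi, qi in zip(p, q))

# A complex inner product is not bilinear: the scalar comes out conjugated.
p, q, a = (1 + 1j, 2 + 0j), (3 + 0j, 1j), 1j
scaled = inner(p, tuple(a * qi for qi in q))
print(lhs, rhs)
print(scaled, a.conjugate() * inner(p, q), a * inner(p, q))
```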
Continuity of multilinear functions is characterized similarly to continuity of linear functions.
Theorem 17.47 Let $X = \prod_{i=1}^{k} X_i$ be a product space, let $Y$ be a normed space and let $T : \prod_{i=1}^{k} X_i \to Y$ be $k$-linear. Then the following are equivalent:
1. $T$ is continuous on $X$.
2. $T$ is continuous at $0 \in X$.
3. $T$ is bounded on $\overline{B_1(0)} \subset X$.
4. There is a $c \in \mathbb{R}$ so that for all elements $(x_1, \dots, x_k) \in X = \prod_{i=1}^{k} X_i$ we have that $\|T[x_1, \dots, x_k]\| \le c\,\|x_1\| \cdots \|x_k\|$.
Proof. Mimic the proof of Theorem 17.4 (Exercise 17-63).
Corollary 17.48 Let $X = \prod_{i=1}^{k} X_i$ be a finite dimensional product space, let $Y$ be a normed space and let $T : \prod_{i=1}^{k} X_i \to Y$ be $k$-linear. Then $T$ is continuous.
Proof. Mimic the proof of Corollary 17.6 (Exercise 17-64).
Similar to continuous linear functions, continuous multilinear functions form a
normed space.
Definition 17.49 Let $X_1, \dots, X_k, Y$ be normed spaces. Define $\mathcal{T}^k(X_1, \dots, X_k, Y)$ to be the set of all continuous $k$-linear functions from the product space $\prod_{i=1}^{k} X_i$ to $Y$. If $X_1 = \dots = X_k = X$, we also write $\mathcal{T}^k(X, Y)$ instead of $\mathcal{T}^k(X_1, \dots, X_k, Y)$.
Theorem 17.50 Let $X_1, \dots, X_k, Y$ be normed spaces. Then, with pointwise addition and scalar multiplication, $\mathcal{T}^k(X_1, \dots, X_k, Y)$ is a vector space and the function
\[ \|T\| := \sup\{\|T[x_1, \dots, x_k]\| : \|x_i\| \le 1 \text{ for } i = 1, \dots, k\} \]
is a norm on $\mathcal{T}^k(X_1, \dots, X_k, Y)$ so that $\|T[x_1, \dots, x_k]\| \le \|T\|\,\|x_1\| \cdots \|x_k\|$ for all $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$. Moreover, if $Y$ is a Banach space, then so is $\mathcal{T}^k(X_1, \dots, X_k, Y)$.
Proof. Mimic the proofs of Theorems 17.8 and 17.11 (Exercise 17-65).
Definition 17.51 Similar to the operator norm of a continuous linear function, we will call the norm from Theorem 17.50 the tensor norm of the continuous $k$-tensor $T$.
Continuing with similarities to continuous linear functions, multilinear functions
are differentiable iff they are continuous.
Theorem 17.52 Let $X_1, \dots, X_k$ and $Y$ be normed spaces and consider the function $T \in \mathcal{T}^k(X_1, \dots, X_k, Y)$. Then $T$ is differentiable and for each $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$ the derivative is
\[ DT(x_1, \dots, x_k)[u_1, \dots, u_k] = \sum_{j=1}^{k} T[x_1, \dots, x_{j-1}, u_j, x_{j+1}, \dots, x_k]. \]
Proof. The case $k = 1$ is Proposition 17.27. Hence, we will assume $k \ge 2$ throughout. We use a telescoping sum that is similar to the one in the proof of Theorem 17.41. Let $(x_1, \dots, x_k) \in \prod_{i=1}^{k} X_i$. Then for all elements $(z_1, \dots, z_k) \in \prod_{i=1}^{k} X_i$ so that $\|(z_1, \dots, z_k)\| \le 2\|(x_1, \dots, x_k)\| + 1 =: M$ we obtain the following.
[Displayed estimate lost in extraction; it ends by naming a constant $C$.]
Now for any $\varepsilon > 0$ we can choose $\delta := \frac{\varepsilon}{C + 1}$ to make the difference smaller than $\varepsilon \|z - x\|$. Hence, $T$ is differentiable with the indicated derivative.
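The derivative formula of Theorem 17.52 can be compared with a difference quotient for a concrete tensor. The sketch below uses the 3-linear map $m[x_1, x_2, x_3] = x_1 x_2 x_3$ on $\mathbb{R} \times \mathbb{R} \times \mathbb{R}$; the helper names, the point `x`, the direction `u` and the step size are our own illustrative choices.

```python
def m(x):
    """The 3-linear map m[x1, x2, x3] = x1 * x2 * x3 on R x R x R."""
    return x[0] * x[1] * x[2]

def derivative_formula(T, x, u):
    """Sum over j of T[x_1, ..., x_{j-1}, u_j, x_{j+1}, ..., x_k]."""
    total = 0.0
    for j in range(len(x)):
        args = list(x)
        args[j] = u[j]      # replace the j-th argument by u_j
        total += T(args)
    return total

def difference_quotient(T, x, u, t=1e-6):
    """Central difference (T(x + t u) - T(x - t u)) / (2 t)."""
    plus = [xi + t * ui for xi, ui in zip(x, u)]
    minus = [xi - t * ui for xi, ui in zip(x, u)]
    return (T(plus) - T(minus)) / (2.0 * t)

x, u = [2.0, -1.0, 3.0], [0.5, 4.0, 1.0]
exact = derivative_formula(m, x, u)     # -1.5 + 24.0 - 2.0
approx = difference_quotient(m, x, u)
print(exact, approx)
```

For this tensor the formula is the familiar product rule for three factors.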
In particular, Theorem 17.52 says that the derivative of a k-tensor is a sum of
( k - 1)-tensors. We conclude this section with the result that allows us to identify
higher derivatives with continuous k-linear functions.
Proposition 17.53 The spaces $\mathcal{L}(X, \mathcal{T}^k(X, Y))$ and $\mathcal{T}^{k+1}(X, Y)$ are isomorphic via the map that sends the function $D : X \to \mathcal{T}^k(X, Y)$ in $\mathcal{L}(X, \mathcal{T}^k(X, Y))$ to the multilinear function in $\mathcal{T}^{k+1}(X, Y)$ that maps $(u_0, u_1, \dots, u_k)$ to $D[u_0][u_1, \dots, u_k]$.
Proof. Exercise 17-66.
Starting with Exercise 17-69 below the exercises emphasize an important idea that
was already used in Exercise 17-58. If we can write a complicated function as the
appropriate composition of simpler functions, taking the derivative becomes a comparatively easy task of combining the Chain Rule and Exercise 17-57. This is one of the
advantages of the coordinate free approach to differentiation.
Exercises
17-63. Prove Theorem 17.47.
17-64. Prove Corollary 17.48.
17-65. Prove Theorem 17.50.
17-66. Prove Proposition 17.53.
17-67. More examples of $k$-linear maps.
(a) Prove that $m : \mathbb{R}^k \to \mathbb{R}$ defined by $m[x_1, \dots, x_k] := x_1 \cdots x_k$ is continuous and $k$-linear.
(b) Let $(M, \Sigma, \mu)$ be a measure space and let $1 \le p, q \le \infty$ with $\frac{1}{p} + \frac{1}{q} = 1$. Prove that $I : L^p(M, \Sigma, \mu) \times L^q(M, \Sigma, \mu) \to \mathbb{R}$ defined by $I[f, g] := \int_M fg\,d\mu$ is a continuous bilinear map.
(c) Let $X, Y, Z$ be normed spaces and let $\circ : \mathcal{L}(Y, Z) \times \mathcal{L}(X, Y) \to \mathcal{L}(X, Z)$ be defined by $\circ[L, M] := L \circ M$. Prove that $\circ$ is a continuous bilinear map.
17-68. Prove that for every 2-tensor $T : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ there is a $d \times d$-matrix $A$ so that $T[v, w] = v^T A w$, where $v, w$ are column vectors with respect to the standard base and $v^T$ is the transpose of $v$, that is, a row vector.
17-69. Prove each of the following as a consequence of Theorem 17.52 and Exercise 17-57.
(a) Let $f, g : (a, b) \to \mathbb{R}$ be differentiable. Prove that $(f \cdot g)' = f'g + fg'$.
(b) Let $Y$ be an inner product space and let $f, g : (a, b) \to Y$ be differentiable. Prove that $\langle f, g \rangle' = \langle f', g \rangle + \langle f, g' \rangle$.
(c) Let $\Omega \subseteq \mathbb{R}^3$ be open, let $f, g : \Omega \times \Omega \to \mathbb{R}^3$ be differentiable and let $\times$ denote the cross product on $\mathbb{R}^3$. Prove that $(f \times g)' = f' \times g + f \times g'$.
(d) Let $W, X, Y, Z$ be normed spaces, let $\Omega \subseteq W$ be open and let $L : \Omega \to \mathcal{L}(Y, Z)$ and $M : \Omega \to \mathcal{L}(X, Y)$ be differentiable. Prove that $D(L \circ M) = DL \circ M + L \circ DM$.
Hint. Exercise 17-67c.
(e) Explain why all the above product rules "look the same."
17-70. Derive a product rule for products of $k$ functions $f_1, \dots, f_k : (a, b) \to \mathbb{R}$.
17-71. Let $GL(n \times n, \mathbb{R})$ be the set of invertible $n \times n$ matrices and let $S : GL(n \times n, \mathbb{R}) \times \mathbb{R}^n \to \mathbb{R}^n$ be the function that maps the pair $(A, b)$ of an invertible matrix $A$ and a "right hand side vector" $b$ to the solution $x$ of the system of equations $Ax = b$.
(a) Prove that $S$ is differentiable.
Hint. Theorem 17.32.
(b) Compute the derivative of $S$ at an arbitrary $(A, b)$.
17.7 Higher Derivatives
Now we are ready to investigate higher order derivatives. The underlying definition is
the obvious one.
Definition 17.54 Let $X, Y$ be normed spaces and let $\Omega \subseteq X$ be open. The function $f : \Omega \to Y$ is called $k$ times differentiable at $x$ iff $f$ is $(k - 1)$ times differentiable on $\Omega$ and its $(k - 1)$st derivative $D^{k-1} f$ is differentiable at $x$. The $k$th derivative of $f$ at $x$ is denoted $D^k f(x)$. The function $f$ is called $k$ times differentiable on $\Omega$ iff $f$ is $k$ times differentiable at every $x \in \Omega$. It is called $k$ times continuously differentiable on $\Omega$ iff it is $k$ times differentiable on $\Omega$ and $D^k f$ is continuous on $\Omega$. Finally, the function $f$ is called infinitely differentiable on $\Omega$ iff for all $k \in \mathbb{N}$ $f$ is $k$ times differentiable on $\Omega$.
Appropriate application of Proposition 17.53 allows us to identify kth derivatives
with k-tensors. This identification is common in analysis and we will use it throughout
this text. That is, Dk f (x) will denote the k-tensor that is associated with the kth
derivative of f at x. The next result shows how higher derivatives of higher derivatives
are related.
Proposition 17.55 Let $X, Y$ be normed spaces, let $m, n \in \mathbb{N}$, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$, and let $x \in \Omega$. Then $f$ is $m + n$ times differentiable at $x$ iff $f$ is $m$ times differentiable on $\Omega$ and $D^m f$ is $n$ times differentiable at $x \in \Omega$. Moreover, $D^{m+n} f(x) = D^n(D^m f)(x)$ and for $(t_1, \dots, t_{m+n}) \in X^{m+n}$ we have the identity
\[ D^{m+n} f(x)[t_1, \dots, t_{m+n}] = D^n(D^m f)(x)[t_1, \dots, t_n][t_{n+1}, \dots, t_{m+n}]. \]
Proof. Let $m \in \mathbb{N}$ be arbitrary. The proof is an induction on $n$, with the definition being the base case $n = 1$. For the representations of the tensors, note that for $(t_1, \dots, t_{m+1}) \in X^{m+1}$ the derivative $D(D^m f)(x)[t_1][t_2, \dots, t_{m+1}]$ is by Proposition 17.53 identified with the value $D^{m+1} f(x)[t_1, \dots, t_{m+1}]$, where $D^{m+1} f(x)$ is the corresponding $(m + 1)$-linear map.
For the induction step, note that $f$ is $m + n + 1$ times differentiable at $x$ iff $f$ is $m + 1$ times differentiable on $\Omega$ and $D^{m+1} f$ is $n$ times differentiable at $x \in \Omega$. This is the case iff $f$ is $m$ times differentiable on $\Omega$, $D^m f$ is differentiable on $\Omega$ and $D^{m+1} f$ is $n$ times differentiable at $x \in \Omega$, which by induction hypothesis is the case iff $f$ is $m$ times differentiable on $\Omega$ and $D^m f$ is $n + 1$ times differentiable at $x \in \Omega$. For the representations of the tensors note that for all $(t_1, \dots, t_{m+n+1}) \in X^{m+n+1}$ the following hold.
\begin{align*}
D^{n+1}(D^m f)(x)[t_1, \dots, t_{n+1}][t_{n+2}, \dots, t_{m+n+1}]
&= D\left(D^n(D^m f)\right)(x)[t_1][t_2, \dots, t_{n+1}][t_{n+2}, \dots, t_{m+n+1}] \\
&= D\left(D^{n+m} f\right)(x)[t_1][t_2, \dots, t_{m+n+1}] \\
&= D^{n+m+1} f(x)[t_1, \dots, t_{m+n+1}].
\end{align*}
For $k$th derivatives (or, more accurately, for the tensors associated with them) the order of the arguments does not matter. The key to this insight is to prove the result for second derivatives.
Theorem 17.56 Hermann Amandus Schwarz' Theorem. Let $X$ and $Y$ be normed spaces, let $\Omega \subseteq X$ be open, and let $f : \Omega \to Y$ be twice differentiable at $x \in \Omega$. Then for all $(s, t) \in X^2$ we have $D^2 f(x)[s, t] = D^2 f(x)[t, s]$.
Proof. The main idea is that the sum $f(x + s + t) - f(x + t) - f(x + s) + f(x)$ should be close to $D^2 f(x)[s, t]$ and $D^2 f(x)[t, s]$. To understand where this expression comes from, recall that for single variable functions the difference quotient $\frac{f(x + s) - f(x)}{s}$ is close to $f'(x)$ for small enough $s$. Hence, the difference quotient
\[ \frac{\dfrac{f(x + s + t) - f(x + t)}{s} - \dfrac{f(x + s) - f(x)}{s}}{t} \]
should be close to $f''(x)$, and so $f(x + s + t) - f(x + t) - f(x + s) + f(x)$ should be close to the product $f''(x)st$, which for general $f$ would be $D^2 f(x)[s, t]$ or $D^2 f(x)[t, s]$.
Let E > 0 and let SSt > 0 be so that for all s , t E X with llsll < SSt and ljtll < Ssr
&
we have x s t E L
2 and Df (x s t ) - D f ( x >- D 2 f ( x > [ s tll! I
- 11s
t 11,
+ +
!I
+
+ +
where D’J is interpreted as a linear map into C ( X , Y ) . Then for all s, t
11s /I < 6,, and lltll < 6,,we obtain
‘i f
Gt
5
4
E
+
X with
+ + r ) - J (-u + t ) - f ( x + $1 + f ( x ) - D ’ f ( X ) [ S , fill
5
/lf(x+s+t) - f ( x + t ) -
(Df(X+S)[fI
- Df(X)[fI)
- (f(x+s) - f(x))/l
But this implies the following for all s , t E X
\ {O}.
lpf(x)[s>
tl - D 2 f ( x ) [ t SIII
,
Therefore, for $A[s, t] := D^2 f(x)[s, t]$ and $B[s, t] := D^2 f(x)[t, s]$ the tensor norm of the difference is $\|A - B\| \le 4\varepsilon$. Because $\varepsilon > 0$ was arbitrary this implies $\|A - B\| = 0$, and hence $D^2 f(x)[s, t] = D^2 f(x)[t, s]$ for all $s, t \in X$.
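The idea behind the proof can be checked numerically in a simple case. The following is a minimal sketch, assuming the hypothetical test function $f(x, y) = x^3 y + x y^2$ on $\mathbb{R}^2$ (not taken from the text): the second difference $f(p+s+t) - f(p+t) - f(p+s) + f(p)$ is compared with the bilinear form built from the exact second partial derivatives, which is symmetric in $s$ and $t$.

```python
def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return x**3 * y + x * y**2

def second_difference(p, s, t):
    """Return f(p+s+t) - f(p+t) - f(p+s) + f(p) for p, s, t in R^2."""
    px, py = p
    sx, sy = s
    tx, ty = t
    return (f(px + sx + tx, py + sy + ty) - f(px + tx, py + ty)
            - f(px + sx, py + sy) + f(px, py))

def hessian(p):
    """Exact second partial derivatives of f at p (computed by hand)."""
    x, y = p
    return [[6 * x * y, 3 * x**2 + 2 * y],
            [3 * x**2 + 2 * y, 2 * x]]

def bilinear(A, s, t):
    """Evaluate the bilinear form s^T A t."""
    return sum(s[i] * A[i][j] * t[j] for i in range(2) for j in range(2))

p = (1.0, 2.0)
s = (1e-3, 2e-3)
t = (-2e-3, 1e-3)
d2_st = bilinear(hessian(p), s, t)
d2_ts = bilinear(hessian(p), t, s)
sd = second_difference(p, s, t)
print(sd, d2_st, d2_ts)
```

For small $s$ and $t$ the second difference and both bilinear values agree up to terms of third order, which is the approximation the proof makes precise.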
Definition 17.57 A bijective function $\sigma : \{1, \ldots, k\} \to \{1, \ldots, k\}$ is also called a permutation. A $k$-tensor $T : \prod_{i=1}^{k} X \to Y$ is called symmetric iff for all $k$-tuples $(x_1, \ldots, x_k) \in \prod_{i=1}^{k} X$ and all permutations $\sigma : \{1, \ldots, k\} \to \{1, \ldots, k\}$ the equality $T[x_1, \ldots, x_k] = T[x_{\sigma(1)}, \ldots, x_{\sigma(k)}]$ holds.
Corollary 17.58 Let $X, Y$ be normed spaces, let $\Omega \subseteq X$ be open, let $x \in \Omega$, and let $f : \Omega \to Y$ be $k$ times differentiable at $x$. Then $D^k f(x)$ is symmetric.
Proof. The proof is an induction on $k$ with $k = 1$ being trivial and $k = 2$ proved in Theorem 17.56.
For the induction step, assume that $k > 2$ and that the result is proved for all $j < k$. Let $(t_1, \ldots, t_k) \in X^k$ and let $\sigma : \{1, \ldots, k\} \to \{1, \ldots, k\}$ be a permutation. If $\sigma(1) \ne 1$ let $\tau : \{2, \ldots, k\} \to \{2, \ldots, k\}$ be a permutation with $\tau(2) = \sigma(1)$. If $\sigma(1) = 1$ let $\tau(i) = i$ for all $i \in \{2, \ldots, k\}$ and skip the middle three lines in the computation below.
$$\begin{aligned}
D^k f(x)[t_1, \ldots, t_k]
&= D(D^{k-1} f)(x)[t_1][t_2, \ldots, t_k] \\
&= D(D^{k-1} f)(x)[t_1][t_{\tau(2)}, \ldots, t_{\tau(k)}] \\
&= D^2(D^{k-2} f)(x)[t_1, t_{\tau(2)}][t_{\tau(3)}, \ldots, t_{\tau(k)}] \\
&= D^2(D^{k-2} f)(x)[t_{\tau(2)}, t_1][t_{\tau(3)}, \ldots, t_{\tau(k)}] \\
&= D(D^{k-1} f)(x)[t_{\sigma(1)}][t_1, t_{\tau(3)}, \ldots, t_{\tau(k)}] \\
&= D(D^{k-1} f)(x)[t_{\sigma(1)}][t_{\sigma(2)}, \ldots, t_{\sigma(k)}] \\
&= D^k f(x)[t_{\sigma(1)}, \ldots, t_{\sigma(k)}],
\end{aligned}$$
where in the second to last step we use $\{1, \tau(3), \ldots, \tau(k)\} = \{\sigma(2), \ldots, \sigma(k)\}$ and apply an appropriate permutation.
Clairaut's Theorem says that the order in which partial derivatives are taken is not
important as long as the function is twice differentiable. It is usually stated for second
partial derivatives of functions from Rd to R. The corollary below is easily seen to
imply Clairaut's Theorem (see Corollary 17.60). Note, however (Exercise 17-73), that
mixed partial derivatives can exist and not be equal.
Corollary 17.59 Let $\prod_{i=1}^{d} X_i$ be a product space, let $Y$ be a normed space, let $\Omega \subseteq \prod_{i=1}^{d} X_i$ be open, let $x \in \Omega$, and let $f : \Omega \to Y$ be $k$ times differentiable at $x$. Then for all indices $i_1, \ldots, i_k \in \{1, \ldots, d\}$ the partial derivative $D_{i_1} \cdots D_{i_k} f$ exists and for all permutations $\sigma : \{1, \ldots, k\} \to \{1, \ldots, k\}$ and all $(x_{i_1}, \ldots, x_{i_k})$ we have
$$D_{i_1} \cdots D_{i_k} f(x_{i_1}, \ldots, x_{i_k}) = D_{\sigma(i_1)} \cdots D_{\sigma(i_k)} f(x_{\sigma(i_1)}, \ldots, x_{\sigma(i_k)}).$$
Proof. For $j \in \{1, \ldots, d\}$, define $e[X_j] := \{(0, \ldots, 0, x_j, 0, \ldots, 0) : x_j \in X_j\}$, where the $x_j$ is in the $j$th component of each vector. First we prove by induction on $k$ that $D_{i_1} \cdots D_{i_k} f = D^k f\big|_{\prod_{l=1}^{k} e[X_{i_l}]}$. The case $k = 1$ follows from $D_i f = Df\big|_{e[X_i]}$ (see Proposition 17.40) wherever $f$ is differentiable.
For the induction step, assume the result has been proved for all j < k . Then
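Now Corollary 17.58 implies
$$\begin{aligned}
D_{i_1} \cdots D_{i_k} f(x_{i_1}, \ldots, x_{i_k})
&= D^k f\big|_{\prod_{l=1}^{k} e[X_{i_l}]}(x_{i_1}, \ldots, x_{i_k}) \\
&= D^k f\big|_{\prod_{l=1}^{k} e[X_{\sigma(i_l)}]}(x_{\sigma(i_1)}, \ldots, x_{\sigma(i_k)}) \\
&= D_{\sigma(i_1)} \cdots D_{\sigma(i_k)} f(x_{\sigma(i_1)}, \ldots, x_{\sigma(i_k)}),
\end{aligned}$$
which finishes the proof.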
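Corollary 17.60 Clairaut's Theorem. Let $\Omega \subseteq \mathbb{R}^d$ be an open set and let the function $f : \Omega \to \mathbb{R}$ be twice differentiable at $x \in \Omega$. Then for all $i_1, i_2 \in \{1, \ldots, d\}$ we have
$$\frac{\partial^2 f}{\partial x_{i_1} \partial x_{i_2}}(x) = \frac{\partial^2 f}{\partial x_{i_2} \partial x_{i_1}}(x).$$
If $f$ is $k$ times differentiable at $x$, then for all $i_1, \ldots, i_k \in \{1, \ldots, d\}$ and all permutations $\sigma$ of $\{1, \ldots, k\}$ we have
$$\frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(x) = \frac{\partial^k f}{\partial x_{\sigma(i_1)} \cdots \partial x_{\sigma(i_k)}}(x).$$
Proof. Easy consequence of $\frac{\partial f}{\partial x_i}(x) \cdot t = D_i f(x)[t]$ and Corollary 17.59.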
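The remark that mixed partial derivatives can exist without being equal can be made concrete numerically. The sketch below assumes the classical counterexample $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ with $f(0, 0) = 0$; this function is a standard illustration and is an assumption here, not a quotation from the text. The partial derivatives are approximated by central differences, with an inner step much smaller than the outer step.

```python
def f(x, y):
    # Classical counterexample (an assumption, not quoted from the text).
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, h=1e-7):
    """Central difference approximation of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-7):
    """Central difference approximation of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3  # outer step, much larger than the inner step h

# Differentiate df/dx in the y-direction at the origin ...
d_dy_of_fx = (fx(0.0, k) - fx(0.0, -k)) / (2 * k)
# ... and df/dy in the x-direction at the origin.
d_dx_of_fy = (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)

print(d_dy_of_fx, d_dx_of_fy)
```

The two approximations are close to $-1$ and $+1$, respectively. This does not contradict Clairaut's Theorem, because this $f$ is not twice differentiable at the origin.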
Now that we have made the connection to partial derivatives, we can also give an
explicit formula for the kth derivative in terms of partial derivatives that is similar to
Theorem 17.43.
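Corollary 17.61 Let $\Omega \subseteq \mathbb{R}^d$ be open, let $x \in \Omega$ and let $f : \Omega \to \mathbb{R}$ be $k$ times differentiable at $x$. Then
$$D^k f(x)[h_1, \ldots, h_k] = \sum_{i_1, \ldots, i_k = 1}^{d} \frac{\partial^k f}{\partial x_{i_1} \cdots \partial x_{i_k}}(x)\, \pi_{i_1}[h_1] \cdots \pi_{i_k}[h_k].$$
Moreover, for $h_1 = h_2 = \cdots = h_k = c = (c_1, \ldots, c_d)$ we have
$$D^k f(x)[c, \ldots, c] = \sum_{j_1 + \cdots + j_d = k} \frac{k!}{j_1! \cdots j_d!}\, \frac{\partial^k f}{\partial x_1^{j_1} \cdots \partial x_d^{j_d}}(x)\, c_1^{j_1} \cdots c_d^{j_d}.$$
Proof. Use that $h_j = \sum_{i_j = 1}^{d} \pi_{i_j}[h_j]\, e_{i_j}$ for all $j = 1, \ldots, k$. For the second part, note that because the $k$th partial derivatives do not depend on the order in which the partial derivatives are taken, we can sort the first sum by how often each $k$th partial derivative occurs.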
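The sorting argument in the proof can be illustrated with a small computation. The following sketch assumes the hypothetical example $f(x, y) = x^2 y$ with $k = 3$, whose only nonzero third partial derivatives are those taken twice in $x$ and once in $y$ (each equal to 2). It compares the sum over all index triples with the sum in which equal partial derivatives are grouped and counted with multinomial coefficients.

```python
from itertools import product
from math import factorial

def third_partial(indices):
    """Third partial derivatives of the hypothetical example f(x, y) = x**2 * y.

    indices is a tuple of three coordinate indices (0 for x, 1 for y).
    Only derivatives taken twice in x and once in y are nonzero (value 2).
    """
    return 2.0 if sorted(indices) == [0, 0, 1] else 0.0

def unsorted_sum(c):
    """Sum over all index triples (i1, i2, i3), without grouping."""
    return sum(third_partial(idx) * c[idx[0]] * c[idx[1]] * c[idx[2]]
               for idx in product(range(2), repeat=3))

def sorted_sum(c):
    """Group equal partial derivatives; each group occurs 3!/(j1! j2!) times."""
    total = 0.0
    for j1 in range(4):
        j2 = 3 - j1
        count = factorial(3) // (factorial(j1) * factorial(j2))
        representative = (0,) * j1 + (1,) * j2
        total += count * third_partial(representative) * c[0]**j1 * c[1]**j2
    return total

c = (0.5, -1.5)
print(unsorted_sum(c), sorted_sum(c))
```

Both sums equal $6 c_1^2 c_2$, which is also the third derivative of $t \mapsto f(tc)$ for this particular $f$.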
We conclude this section with a proof that k-fold differentiability is preserved by
compositions.
Proposition 17.62 Chain Rule. Let $X, Y, Z$ be normed spaces, let $\Omega_1 \subseteq X$, $\Omega_2 \subseteq Y$ be open, let $g : \Omega_1 \to \Omega_2$ be $k$ times differentiable at $x \in \Omega_1$ and let $f : \Omega_2 \to Z$ be $k$ times differentiable at $g(x) \in \Omega_2$. Then $f \circ g$ is $k$ times differentiable at $x$.
Proof. This proof is an induction on $k$, with the base step $k = 1$ being the Chain Rule (Theorem 17.30).
For the induction step $(k - 1) \to k$, let $f$ and $g$ be $k$ times differentiable. Recall that by Theorem 17.30 we have $D(f \circ g)(x) = Df(g(x)) \circ Dg(x)$. By Proposition 17.55, $Dg(\cdot)$ is $k - 1$ times differentiable at $x$ and by induction hypothesis $Df(g(\cdot))$ is $k - 1$ times differentiable at $x$. Therefore by an easy generalization of Exercise 17-57 the function $x \mapsto (Df(g(x)), Dg(x))$ is $k - 1$ times differentiable. Moreover, the function $(L, M) \mapsto L \circ M$ from $\mathcal{L}(Y, Z) \times \mathcal{L}(X, Y)$ to $\mathcal{L}(X, Z)$ is continuous and bilinear, and hence $k - 1$ times differentiable by Exercise 17-74. But then by induction hypothesis, the composition $x \mapsto (Df(g(x)), Dg(x)) \mapsto Df(g(x)) \circ Dg(x)$ is $k - 1$ times differentiable, which completes the proof.
Exercises
17-72. Finish the proof of Theorem 17.56 by proving that for all $\varepsilon > 0$ we can find a $\delta_{st} > 0$ so that for all $s, t \in X$ with $\|s\| < \delta_{st}$ and $\|t\| < \delta_{st}$ we have
17-73. Even when mixed partial derivatives exist, they need not be equal. Consider the function
(a) Prove that both $\frac{\partial^2 f}{\partial x \partial y}(0, 0)$ and $\frac{\partial^2 f}{\partial y \partial x}(0, 0)$ exist and are not equal.
(b) Prove that neither $\frac{\partial f}{\partial x}$ nor $\frac{\partial f}{\partial y}$ is differentiable at $(0, 0)$.
(c) Prove that $f$ is differentiable at every $(x, y) \in \mathbb{R}^2$.
(d) Prove that $Df$ is not differentiable at $(0, 0)$.
17-74. Prove that every continuous $k$-tensor $T$ is infinitely differentiable with $D^j T = 0$ for $j > k$.
17-75. Let $\Omega \subseteq \mathbb{R}^d$ be open, let $f : \Omega \to \mathbb{R}$ be a twice differentiable function and let $x \in \Omega$.
(a) Prove that $A := \left( D^2 f(x)[e_i, e_j] \right)_{i = 1, \ldots, d;\ j = 1, \ldots, d}$ is such that $D^2 f(x)[v, w] = v^T A w$, where $v, w$ are column vectors with respect to the standard base and $v^T$ is the transpose of $v$, that is, a row vector.
(b) Prove that $A$ is a symmetric matrix, that is, for all $i, j \in \{1, \ldots, d\}$ we have $a_{ij} = a_{ji}$.
17-76. For each $f : \mathbb{R}^2 \to \mathbb{R}$, compute the second derivative. Use the representation of Exercise 17-75.
(a) f ( x , y) = ye"
(b) $f(x, y) = x^3 + 3xy + y^2$
17-77. Taylor's Formula. Let $X, Y$ be Banach spaces, let $\Omega \subseteq X$ be open, let $f : \Omega \to Y$ be $k + 1$ times continuously differentiable on $\Omega$, and let $x \in \Omega$, $z \in X$ be so that for all $t \in [0, 1]$ we have $x + tz \in \Omega$.
Hint. Consider the function $t \mapsto f(x + tz)$. The Riemann integral for continuous Banach space valued functions is defined in Exercise 17-41. Use Theorem 13.3 as guidance.
(b) For $\Omega \subseteq \mathbb{R}^d$ and $Y = \mathbb{R}$ state Taylor's formula from part 17-77a in terms of partial derivatives. Then decide which of the two formulas is easier to use computationally and which formula is easier to read.
Hint. Corollary 17.61.
(c) Let
o ( i ):=
s,'
+
Dk+'f(x
k
The function T k ( z ) :=
i=o
+ u z ) d u [ i, .
,,
. i ] . Prove that lim O ( z ) = o
~
llzIl+O
llzllk
1
-t D ' f ( x ) [ z ,. . . , i ] is called the kth Taylor polynomial o f f at x
17-78. For each function $f : \mathbb{R}^2 \to \mathbb{R}$ below, compute the second Taylor polynomial at $x = 0$. Use the
representation of Exercise 17-75 for the second derivative.
(b) f ( x , y ) = x 2
(a) f ( x , y ) = ex?.*
+ y2 (Is the result a surprise?)
17-79. Let $X$ be a normed space. A bilinear function $T : X \times X \to \mathbb{R}$ is called positive definite iff for all $x \in X \setminus \{0\}$ we have $T[x, x] > 0$.
(a) Second Derivative Test. Let $\Omega \subseteq X$ be open, let $f : \Omega \to \mathbb{R}$ be twice continuously differentiable and let $u \in \Omega$ be so that $Df(u) = 0$.
i. Prove that if $D^2 f(u)$ is positive definite, then there is an $\varepsilon > 0$ so that for all points $v \in X \setminus \{u\}$ with $\|v - u\| < \varepsilon$ we have $f(u) < f(v)$.
ii. Prove that if $-D^2 f(u)$ is positive definite, then there is an $\varepsilon > 0$ so that for all points $v \in X \setminus \{u\}$ with $\|v - u\| < \varepsilon$ we have $f(u) > f(v)$.
iii. Prove that if there are $x_1, x_2 \in X$ so that $D^2 f(u)[x_1, x_1] > 0$ and $D^2 f(u)[x_2, x_2] < 0$, then for every $\varepsilon > 0$ there are elements $v, w \in X \setminus \{u\}$ with $\|u - v\| < \varepsilon$, $\|u - w\| < \varepsilon$, $f(u) < f(v)$, and $f(u) > f(w)$.
Hint. Use Exercise 17-77.
(b) Prove that if $X = \mathbb{R}^2$, then the second derivative $D^2 f(u)$ is positive definite iff $\frac{\partial^2 f}{\partial x^2}(u) > 0$ and $\frac{\partial^2 f}{\partial x^2}(u) \frac{\partial^2 f}{\partial y^2}(u) - \left( \frac{\partial^2 f}{\partial x \partial y}(u) \right)^2 > 0$.
Hint. Use Exercise 17-75.
(c) Prove that if $X = \mathbb{R}^2$ and $\frac{\partial^2 f}{\partial x^2}(u) \frac{\partial^2 f}{\partial y^2}(u) - \left( \frac{\partial^2 f}{\partial x \partial y}(u) \right)^2 < 0$, then there are $x_1, x_2 \in X$ so that $D^2 f(u)[x_1, x_1] > 0$ and $D^2 f(u)[x_2, x_2] < 0$.
(d) State and prove a result similar to part 17-79a for a $k$ times continuously differentiable function $f : \Omega \to \mathbb{R}$ so that $Df(x) = 0, \ldots, D^{k-1} f(x) = 0$. (Distinguish even and odd $k$.)
17-80. A characterization of symmetry.
(a) Prove that a continuous $k$-linear function $T : \prod_{i=1}^{k} X \to Y$ is symmetric iff for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $k$-tuples $(x_1, \ldots, x_k) \in \prod_{i=1}^{k} X$ that satisfy $\|(x_1, \ldots, x_k)\| < \delta$ and all permutations $\sigma : \{1, \ldots, k\} \to \{1, \ldots, k\}$ we have
$$\left\| T(x_1, \ldots, x_k) - T(x_{\sigma(1)}, \ldots, x_{\sigma(k)}) \right\| \le \varepsilon \left( \sum_{i=1}^{k} \|x_i\| \right)^{k}.$$
(b) Consider the continuous bilinear map $T : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ defined by $T(x, y) := \pi_1(x) \pi_2(y)$. Prove that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $(x, y) \in \mathbb{R}^2 \times \mathbb{R}^2$ with $\|(x, y)\| < \delta$ we have $|T(x, y) - T(y, x)| \le \varepsilon \left( \|x\| + \|y\| \right)$, but $T$ is not symmetric.
17-81. Let $X$ be a Banach space, let $Y$ be a normed space and let $\mathcal{H}(X, Y)$ be the set of linear homeomorphisms from $X$ to $Y$. Prove that the map $J(A) := A^{-1}$ is an infinitely differentiable function from $\mathcal{H}(X, Y)$ to $\mathcal{H}(Y, X)$.
17.8 The Implicit Function Theorem
To investigate the solution sets of equations f (x,y ) = 0 in more detail, it is often
helpful to represent y as a function g(x). The Implicit Function Theorem says that
under mild hypotheses this is possible and g has the same differentiability properties
as $f$. The first step toward the Implicit Function Theorem is a result about fixed points.
Definition 17.63 Let $S$ be a set and let $f : S \to S$ be a function. Then $p \in S$ is called a fixed point of $f$ iff $f(p) = p$.
Fixed points are important throughout applied mathematics because many equations can be rewritten as fixed point equations. (Recall Newton’s Method from Section
13.2.) Under certain conditions, fixed points must exist. That is, if a fixed point equation from a concrete application satisfies the right abstract condition, then it must have
a solution. Banach’s Fixed Point Theorem provides one such condition.
Theorem 17.64 Banach's Fixed Point Theorem. Let $X$ be a complete metric space, let $0 \le q < 1$ and let $f : X \to X$ be a function so that for all points $x, y \in X$ we have $d(f(x), f(y)) \le q\, d(x, y)$. Then $f$ has a unique fixed point $p \in X$.
Proof. Let $x \in X$ and consider the sequence $\{f^n(x)\}_{n=1}^{\infty}$. An easy induction proves that for all $n \in \mathbb{N}$ we have $d(f^n(x), f^{n+1}(x)) \le q^n d(x, f(x))$. Hence, for all $n, m \in \mathbb{N}$ we infer $d(f^n(x), f^m(x)) \le \sum_{k=n}^{m-1} q^k d(x, f(x))$. Therefore, $\{f^n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence. Let $p := \lim_{n \to \infty} f^n(x)$ and let $\varepsilon > 0$. Then there is an $n \in \mathbb{N}$ so that $d(p, f^n(x)) < \frac{\varepsilon}{3}$ and $d(f^n(x), f^{n+1}(x)) < \frac{\varepsilon}{3}$, which implies
$$\begin{aligned}
d(p, f(p)) &\le d(p, f^n(x)) + d(f^n(x), f^{n+1}(x)) + d(f^{n+1}(x), f(p)) \\
&\le d(p, f^n(x)) + d(f^n(x), f^{n+1}(x)) + q\, d(f^n(x), p) \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + q \frac{\varepsilon}{3} < \varepsilon.
\end{aligned}$$
Because $\varepsilon > 0$ was arbitrary we have proved that $p = f(p)$.
Regarding uniqueness, suppose that $\tilde{p}$ is another fixed point of $f$. Then we have $d(p, \tilde{p}) = d(f(p), f(\tilde{p})) \le q\, d(p, \tilde{p})$, which implies $d(p, \tilde{p}) = 0$, and hence we conclude $p = \tilde{p}$.
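The proof is constructive: the fixed point is the limit of the iterates $f^n(x)$ for any starting point $x$. The following is a minimal sketch of that iteration, assuming the example $f = \cos$ on the complete metric space $[0, 1]$, where $|\cos'| \le \sin(1) < 1$ gives a contraction constant. The helper name `fixed_point` and the stopping tolerance are choices made for this illustration.

```python
import math

def fixed_point(f, x, tol=1e-12, max_iter=10_000):
    """Iterate x, f(x), f(f(x)), ... until successive iterates are within tol."""
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and is a contraction there.
p_from_one = fixed_point(math.cos, 1.0)
p_from_zero = fixed_point(math.cos, 0.0)
print(p_from_one, p_from_zero)
```

Both starting points lead to the same limit, in line with the uniqueness part of the theorem.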
Now we are ready to state and prove the Implicit Function Theorem. Note that the
function h in the proof is similar to applying Newton's Method in the Y-coordinate.
Theorem 17.65 Implicit Function Theorem. Let $X$, $Y$ and $Z$ be Banach spaces, let $\Omega \subseteq X \times Y$ be an open set, let $(x_0, y_0) \in \Omega$, and let $f : \Omega \to Z$ be continuously differentiable so that $f(x_0, y_0) = 0$ and so that $D_Y f(x_0, y_0) : Y \to Z$ is a linear homeomorphism. Then there is an open neighborhood $N \subseteq X$ of $x_0$ such that there is a unique continuously differentiable function $g : N \to Y$ so that $g(x_0) = y_0$ and for all $x \in N$ we have $(x, g(x)) \in \Omega$ and $f(x, g(x)) = 0$. The derivative of $g$ is $Dg(x) = -\left( D_Y f(x, g(x)) \right)^{-1} \circ D_X f(x, g(x))$. Moreover, if $f$ is $k$ times continuously differentiable, then so is $g$.
Proof. Let $L_0 := D_Y f(x_0, y_0)$, let $h(x, y) := y - L_0^{-1}[f(x, y)]$ and let $\delta > 0$ be so that for all $(x, y) \in X \times Y$ with $\|(x, y) - (x_0, y_0)\| \le \delta$ we have that $(x, y) \in \Omega$,
1
D y f ( x , y ) is invertible and ~ / DfY(x,y ) - Dr f (xo, y o ) / / <
I
~
is bounded on Bs((x0, yo)). Then for all points (x, yi)
(x. y l ) - (xo,yo) 5 6 ( i = 1, 2 ) the following holds.
1
Ilh(x, Y 2 ) - h(x3 Y l ) l l
E
X x Y with
$$\le \frac{1}{2} \|y_2 - y_1\|.$$
Let $N := \pi_X[B_\delta((x_0, y_0))] = B_\delta(x_0)$ (equality holds because so far we are using the norm $\max\{\|\cdot\|_X, \|\cdot\|_Y\}$ on $X \times Y$). By Theorem 17.64 for each point $x \in N$ the function $h(x, \cdot) : \overline{B_\delta(y_0)} \to \overline{B_\delta(y_0)}$ has a unique fixed point $g(x)$. Because we have $g(x) = h(x, g(x)) = g(x) - L_0^{-1}[f(x, g(x))]$ iff $f(x, g(x)) = 0$, a function $g : N \to Y$ with $f(x, g(x)) = 0$ exists on $N$ and by Theorem 17.64 it is unique.
To prove that $g$ is continuous, note that by the proof of Theorem 17.64 the functions $g_n$, where $g_0(x) := y_0$ and $g_n(x) := h(x, g_{n-1}(x))$ for $n \ge 1$, form a uniform Cauchy sequence. Thus $g$ is the uniform limit of the functions $g_n$. Because the $g_n$ are continuous, so is $g$ (easy adaptation of the proof of Theorem 11.7, see Exercise 17-82).
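The construction of $g$ as the limit of the functions $g_n$ can be carried out numerically. The sketch below assumes the hypothetical example $f(x, y) = x^2 + y^2 - 1$ with $(x_0, y_0) = (0, 1)$, so that $L_0 = D_Y f(x_0, y_0)$ is multiplication by 2 and the implicit function is known in closed form, $g(x) = \sqrt{1 - x^2}$. The tolerance and iteration cap are choices made for this illustration.

```python
import math

def f(x, y):
    # Hypothetical example: the unit circle, with (x0, y0) = (0, 1).
    return x * x + y * y - 1.0

X0, Y0 = 0.0, 1.0
L0 = 2.0 * Y0  # partial derivative of f with respect to y at (x0, y0)

def h(x, y):
    """h(x, y) = y - L0^{-1}[f(x, y)], as in the proof."""
    return y - f(x, y) / L0

def g(x, tol=1e-14, max_iter=1000):
    """Fixed point of y -> h(x, y), starting from g_0(x) = y0."""
    y = Y0
    for _ in range(max_iter):
        nxt = h(x, y)
        if abs(nxt - y) < tol:
            return nxt
        y = nxt
    raise RuntimeError("iteration did not converge; x may be too far from x0")

for x in (0.0, 0.1, 0.3):
    print(x, g(x), math.sqrt(1.0 - x * x))
```

For $x$ close to $x_0$ the map $y \mapsto h(x, y)$ is a contraction near $y_0$, so the iteration converges quickly; far from $x_0$ no such guarantee is available.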
To prove that $g$ is differentiable, fix $x \in N$. To simplify the following argument, we will continue the proof with the product norm $\|(x, y)\| := \|x\| + \|y\|$ on $X \times Y$. Note that by continuity of $g$ and Proposition 17.40 for every $x \in N$ and every $\varepsilon > 0$ with $\varepsilon < \frac{1}{2 \left\| \left( D_Y f(x, g(x)) \right)^{-1} \right\|}$ there is a $\delta > 0$ so that for all $z \in N$ with $\|z - x\| < \delta$ we have
Via (*) and the reverse triangular inequality, for all $z$ with $\|z - x\| < \delta$ we infer
which implies that $\|g(z) - g(x)\| \le a(z) \|z - x\|$. Because $a(z)$ is bounded on $B_\delta(x)$, substituting this into the equation (*) proves that $g$ is differentiable with the derivative being as claimed.
To prove that $g$ is $k$ times continuously differentiable if $f$ is, we proceed by induction. The base step was just proved. For the induction step $k \to (k + 1)$, note that both $-\left( D_Y f(x, g(x)) \right)^{-1}$ and $D_X f(x, g(x))$ are compositions of $k$ times continuously differentiable functions, and hence $k$ times continuously differentiable themselves (for the inversion, Exercise 17-81 is needed). But then, because the composition operator is continuous and bilinear from $\mathcal{L}(Z, Y) \times \mathcal{L}(X, Z)$ to $\mathcal{L}(X, Y)$ the derivative $Dg(x) = -\left( D_Y f(x, g(x)) \right)^{-1} \circ D_X f(x, g(x))$ is $k$ times continuously differentiable and therefore $g$ is $k + 1$ times continuously differentiable.
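The derivative formula $Dg(x) = -\left( D_Y f(x, g(x)) \right)^{-1} \circ D_X f(x, g(x))$ can be checked numerically in one dimension. The following sketch assumes the hypothetical example $f(x, y) = y^3 + y - x$, for which $D_Y f = 3y^2 + 1$ is always invertible. Here $g$ is computed with Newton's method in the $y$-coordinate, which is a substitute for the fixed point iteration used in the proof, and the formula is compared with a central difference quotient of $g$.

```python
def f(x, y):
    # Hypothetical example: y**3 + y - x = 0 defines y = g(x) for every real x.
    return y**3 + y - x

def dfdx(x, y):
    """Partial derivative of f with respect to x."""
    return -1.0

def dfdy(x, y):
    """Partial derivative of f with respect to y (never zero)."""
    return 3.0 * y * y + 1.0

def g(x, tol=1e-14, max_iter=100):
    """Solve f(x, y) = 0 for y with Newton's method in the y-coordinate."""
    y = 0.0
    for _ in range(max_iter):
        step = f(x, y) / dfdy(x, y)
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")

def dg_formula(x):
    """Derivative of g from the Implicit Function Theorem."""
    y = g(x)
    return -dfdx(x, y) / dfdy(x, y)

def dg_numeric(x, eps=1e-6):
    """Central difference quotient of g, for comparison only."""
    return (g(x + eps) - g(x - eps)) / (2.0 * eps)

print(g(2.0), dg_formula(2.0), dg_numeric(2.0))
```

At $x = 2$ the solution is $g(2) = 1$, and the formula gives $Dg(2) = 1/4$.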
The Implicit Function Theorem allows us to show that in Theorem 17.31 the hypotheses that $f$ is bijective, $f^{-1}$ is continuous and the range is open can be dropped.
Corollary 17.66 Let $X, Y$ be Banach spaces, let $V \subseteq X$ be open, let $f : V \to Y$ be continuously differentiable and let $x_0 \in V$ be so that $Df(x_0) : X \to Y$ is a linear homeomorphism. Then there exists an open neighborhood $U$ of $x_0$ so that $f|_U$ is a homeomorphism from $U$ to an open neighborhood of $f(x_0)$. Moreover, if $f$ is $k$ times continuously differentiable in $U$, then the inverse $f^{-1}$ is $k$ times continuously differentiable in $f[U]$.
Proof. Define $F : V \times Y \to Y$ by $F(x, y) := f(x) - y$ and let $y_0 := f(x_0)$. By the Implicit Function Theorem, because $D_X F(x_0, y_0) = Df(x_0)$ there is a neighborhood $N$ of $y_0$ and a $k$ times continuously differentiable function $g$ from $N$ to $X$ so that $F(g(y), y) = 0$ for all $y \in N$. Now $g$ is the inverse of $f$ and the result follows.
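The proof obtains the local inverse by solving $F(x, y) = f(x) - y = 0$ for $x$. A minimal numerical sketch of this idea follows, assuming the hypothetical example $f(x) = x + \tfrac{1}{2} \sin x$ with $x_0 = 0$, so that $Df(x_0) = 3/2$. The iteration $x \mapsto x - Df(x_0)^{-1}[f(x) - y]$ mirrors the function $h$ from the proof of the Implicit Function Theorem with the roles of the variables exchanged.

```python
import math

def f(x):
    # Hypothetical example with f'(x) = 1 + 0.5*cos(x) > 0 everywhere.
    return x + 0.5 * math.sin(x)

X0 = 0.0
DF_X0 = 1.5  # f'(x0) = 1 + 0.5*cos(0)

def local_inverse(y, tol=1e-14, max_iter=1000):
    """Solve F(x, y) = f(x) - y = 0 for x by a fixed point iteration."""
    x = X0
    for _ in range(max_iter):
        nxt = x - (f(x) - y) / DF_X0
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge")

for y in (-0.5, 0.0, 0.3, 1.0):
    x = local_inverse(y)
    print(y, x, f(x))
```

For this particular $f$ the iteration map has derivative $(1 - \cos x)/3 \in [0, 2/3]$, so it is a contraction on all of $\mathbb{R}$; in general only a neighborhood of $x_0$ can be expected to work.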
The Implicit Function Theorem also is fundamental for the theory of manifolds,
because it allows us to express sets of the form { ( x , y ) : f ( x , y ) = 0) as manifolds,
or, as Corollaries 17.67 and 17.69 show, as embedded manifolds. We will revisit the
results below when we prove Theorem 19.7.
Corollary 17.67 Let $X, Y, Z$ be Banach spaces, let $U \subseteq X$ and $V \subseteq Y$ be open, let $f : U \times V \to Z$ be $k$ times continuously differentiable and let $(x_0, y_0) \in U \times V$ be so that $D_Y f(x_0, y_0)$ is a linear homeomorphism. Then there are open neighborhoods $U_0$ of $x_0$ and $W_0$ of $f(x_0, y_0)$ and a unique $k$ times continuously differentiable function $g : U_0 \times W_0 \to Y$ so that for all $(x, w) \in U_0 \times W_0$ we have $f(x, g(x, w)) = w$.
Proof. Apply the Implicit Function Theorem to $H(x, y, w) := w - f(x, y)$ (Exercise 17-83).
Definition 17.68 The rank of a matrix is the largest number of linearly independent
column vectors in the matrix.
Corollary 17.69 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}^{d-m}$ be a $k$ times continuously differentiable function such that the matrix $\left( \frac{\partial f_i}{\partial x_j} \right)_{i = 1, \ldots, d-m;\ j = 1, \ldots, d}$ has rank $d - m$. If $a \in \Omega$ is such that $f(a) = 0$, then there is an open set $G \subseteq \mathbb{R}^d$ and a $k$ times continuously differentiable function $g : G \to \Omega$ with $k$ times continuously differentiable inverse so that $g[G]$ is open, $a \in g[G] \subseteq \Omega$ and the equation $f \circ g(u_1, \ldots, u_d) = (u_{m+1}, \ldots, u_d)$ holds for all $(u_1, \ldots, u_d) \in G$.
Proof. Let $j_1, \ldots, j_{d-m}$ be the indices of the columns so that the corresponding column vectors of $\left( \frac{\partial f_i}{\partial x_j}(a) \right)_{i = 1, \ldots, d-m;\ j = 1, \ldots, d}$ are linearly independent. Represent $\mathbb{R}^d$ as $\mathrm{span}\{e_k : k \notin \{j_1, \ldots, j_{d-m}\}\} \times \mathrm{span}\{e_{j_i} : i = 1, \ldots, d-m\}$ (this permutes the components appropriately) and apply Corollary 17.67. The details are left to the reader as Exercise 17-84.
Exercises
17-82. Let $X, Y$ be Banach spaces, let $\Omega \subseteq X$ and let $\{f_n\}_{n=1}^{\infty}$ be a sequence of continuous functions $f_n : \Omega \to Y$ so that for all $\varepsilon > 0$ there is an $N \in \mathbb{N}$ so that for all $m, n \ge N$ and all $x \in \Omega$ we have $\|f_n(x) - f_m(x)\| < \varepsilon$. Prove that then there is a continuous function $f : \Omega \to Y$ so that $\lim_{n \to \infty} f_n(x) = f(x)$ for all $x \in \Omega$.
17-83. Prove Corollary 17.67.
17-84. Prove Corollary 17.69.
17-85. Explain why in the proof of the Implicit Function Theorem we must prove that g is continuous before
we can prove that it is differentiable.
17-86. Lagrange multipliers.
(a) Let $X, Y$ be finite dimensional normed spaces with $\dim(X) = n \ge m = \dim(Y)$, let $\Omega \subseteq X$ be open, let $f : \Omega \to \mathbb{R}$ be continuously differentiable, let $g : \Omega \to Y$ be continuously differentiable, and let $x \in \Omega$ be so that $g(x) = 0$, the rank of $Dg(x)$ is $m$, and there is an $\varepsilon > 0$ so that for all $z \in \Omega$ with $g(z) = 0$ and $\|z - x\| < \varepsilon$ we have that $f(x) \ge f(z)$. Prove that there is a continuous linear function $\varphi \in \mathcal{L}(Y, \mathbb{R})$ so that $Df(x) + \varphi \circ Dg(x) = 0$.
Hint. Let $X_1 := \{a \in X : Dg(x)[a] = 0\}$ and represent $X$ as a product $X = X_1 \times X_2$ with $x = (x_1, x_2)$. Find a neighborhood $U \times V$ of $x$ so that there is a continuously differentiable $\alpha : U \to V$ so that $g(x_1, \alpha(x_1)) = 0$ for all $x_1 \in U$. Then consider $H(x_1) := f(x_1, \alpha(x_1))$.
(b) Prove that if $X = \mathbb{R}^n$ and $Y = \mathbb{R}^m$, then there are $\lambda_1, \ldots, \lambda_m \in \mathbb{R}$ so that we have
$$\mathrm{grad}(f)(x) + \sum_{i=1}^{m} \lambda_i\, \mathrm{grad}(\pi_i \circ g)(x) = 0.$$
17-87. Let $X$ be a complete metric space and let $f : X \to X$ be a function so that there is a sequence $\{b_n\}_{n=1}^{\infty}$ of nonnegative numbers such that $\sum_{n=1}^{\infty} b_n$ converges and for all $n \in \mathbb{N}$ and all $x, y \in X$ we have $d(f^n(x), f^n(y)) \le b_n d(x, y)$. Prove that $f$ has a unique fixed point $p$ in $X$.
17-88. Let $\|\cdot\|_2$ be the Euclidean norm on $\mathbb{R}^3$. Find a map $f : \mathbb{R}^3 \to \mathbb{R}^3$ that has no fixed points and so that for all $u, v \in \mathbb{R}^3$ we have $\|f(u) - f(v)\|_2 = \|u - v\|_2$.
Chapter 18
Measure, Topology, and
Differentiation
Continuity, differentiation, and integration are the three main topics in analysis. Part I
of the text has shown that the combination of these concepts can lead to powerful new
insights, such as the Fundamental Theorem of Calculus or the Lebesgue criterion for
Riemann integrability. For abstract spaces, we have seen in Chapter 14 how integration
leads to measure theory, in Chapter 16 how the investigation of limits and continuity
leads to topological concepts, and in Chapter 17 how differentiation is approximation
with continuous linear functions. In this chapter, we investigate the connections between measure, topology, and differentiation in d-dimensional space. Section 18.1
characterizes Lebesgue measurable sets topologically by approximating them from the
inside with closed sets and from the outside with open sets. Similarly, Section 18.2
shows that p-integrable functions can be approximated with infinitely differentiable
functions. After a brief excursion into tensor algebra in Section 18.3 (placed there
to keep the presentation self-contained), the chapter concludes with the proof of the
Multidimensional Substitution Formula in Section 18.4.
18.1 Lebesgue Measurable Sets in Rd
This section shows how Lebesgue measurable sets in Rd can be characterized almost
exclusively with the topological ideas of openness and closedness. We start by proving
that the most fundamental subsets of Rd are Lebesgue measurable.
Theorem 18.1 Open and closed subsets of Rd are Lebesgue measurable.
Proof. By Proposition 14.60, for any $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ and any $\varepsilon > 0$ the open cube $C_\varepsilon(x) := \prod_{i=1}^{d} (x_i - \varepsilon, x_i + \varepsilon)$ is Lebesgue measurable. Moreover, $C_\varepsilon(x)$ is the open ball of radius $\varepsilon$ around $x$ in the uniform norm $\|\cdot\|_\infty$.
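Let $O \subseteq \mathbb{R}^d$ be an open set. Then $O_{\mathbb{Q}} := O \cap \mathbb{Q}^d$ is a countable dense subset of $O$. For all $x \in O_{\mathbb{Q}}$, let $\varepsilon_x := \sup\{\varepsilon < 1 : C_\varepsilon(x) \subseteq O\}$. Then clearly $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) \subseteq O$.
To prove the reversed inclusion, let $y \in O$. There are an $\varepsilon \in (0, 1)$ so that $C_\varepsilon(y) \subseteq O$ and an $x \in O_{\mathbb{Q}}$ so that $\|x - y\|_\infty < \frac{\varepsilon}{2}$. But then $y \in C_{\varepsilon/2}(x) \subseteq C_\varepsilon(y) \subseteq O$, which means $y \in \bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x)$. Because $y$ was arbitrary this proves that $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) \supseteq O$, and hence $\bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x) = O$. Because $O_{\mathbb{Q}}$ is countable and every $C_{\varepsilon_x}(x)$ is Lebesgue measurable we conclude that $O$ is Lebesgue measurable because it is a countable union of Lebesgue measurable sets. Therefore, the open subsets of $\mathbb{R}^d$ are Lebesgue measurable.
Now let $C \subseteq \mathbb{R}^d$ be closed. Then $O = \mathbb{R}^d \setminus C$ is open and thus Lebesgue measurable. But then $C$ is the complement of a Lebesgue measurable set, and hence it is Lebesgue measurable, too.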
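The covering argument can be traced in a one dimensional example. The sketch below assumes the open set $O = (0, 1) \subseteq \mathbb{R}$ (a hypothetical choice) and uses exact rational arithmetic. For a given point $y \in O$ it picks a rational center $x$ with $|x - y| < \varepsilon/2$, as in the proof, and checks that $y$ lies in the cube (here: interval) $C_{\varepsilon_x}(x)$. The helper names are illustrative.

```python
from fractions import Fraction

A, B = Fraction(0), Fraction(1)  # the open set O = (A, B), a hypothetical example

def eps_x(x):
    """sup{eps < 1 : (x - eps, x + eps) is contained in O}, for x in O."""
    return min(x - A, B - x, Fraction(1))

def covering_center(y):
    """Return a rational x with |x - y| < eps/2, where C_eps(y) lies in O."""
    eps = min(y - A, B - y, Fraction(1, 2))
    n = 1
    while Fraction(1, n) >= eps / 2:  # make the grid spacing 1/n smaller than eps/2
        n *= 2
    x = Fraction(round(y * n), n)     # nearest grid point, so |x - y| <= 1/(2n)
    return x, eps

def covered(y):
    """Check that y belongs to the interval of radius eps_x(x) around x."""
    x, eps = covering_center(y)
    return A < x < B and abs(x - y) < eps / 2 and abs(x - y) < eps_x(x)

samples = [Fraction(v) for v in (0.001, 0.25, 0.5, 0.999)]
print([covered(y) for y in samples])
```

Every sampled point is covered, matching the inclusion $O \subseteq \bigcup_{x \in O_{\mathbb{Q}}} C_{\varepsilon_x}(x)$ established in the proof.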
Corollary 18.2 Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable and let $f : S \to \mathbb{R}$ be continuous. Then $f$ is Lebesgue measurable.
Proof. Let $a \in \mathbb{R}$. Then the set $(a, \infty)$ is open in $\mathbb{R}$. Because $f$ is continuous, $f^{-1}[(a, \infty)]$ is relatively open. That is, there is an open set $O \subseteq \mathbb{R}^d$ so that $f^{-1}[(a, \infty)] = S \cap O$, which by Theorem 18.1 is Lebesgue measurable. Hence, $f$ is Lebesgue measurable.
Lebesgue measurable sets can now be characterized as the subsets of Rd that can
be approximated by open sets from the outside and by closed sets from the inside so
that the measure of the difference can be made arbitrarily small. We will first prove
this result for sets of finite measure and then for sets of arbitrary measure.
Theorem 18.3 A subset $S \subseteq \mathbb{R}^d$ with $\lambda(S) < \infty$ is Lebesgue measurable iff for every $\varepsilon > 0$ there are a compact set $C$ and an open set $O$ such that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$.
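Proof. For "$\Rightarrow$," let $S$ be Lebesgue measurable with $\lambda(S) < \infty$ and let $\varepsilon > 0$. Then there is a family of open boxes $\{B_j\}_{j=1}^{\infty}$ with $S \subseteq \bigcup_{j=1}^{\infty} B_j$ and $\sum_{j=1}^{\infty} |B_j| < \lambda(S) + \frac{\varepsilon}{2}$. Let $O := \bigcup_{j=1}^{\infty} B_j$. Then $O$ is open, $S \subseteq O$, and $\lambda(O) \le \sum_{j=1}^{\infty} |B_j| < \lambda(S) + \frac{\varepsilon}{2}$.
For any $n \in \mathbb{N}$ let $K_n := \prod_{i=1}^{d} [-n, n]$. By Theorem 14.15, we obtain $\lambda(S) = \lim_{n \to \infty} \lambda(S \cap K_n)$. Hence, there is an $N \in \mathbb{N}$ so that $K := K_N$ satisfies $\lambda(S \cap K) \ge \lambda(S) - \frac{\varepsilon}{4}$. The set $K \setminus S$ is Lebesgue measurable and there is a family of open boxes $\{A_j\}_{j=1}^{\infty}$ so that $K \setminus S \subseteq \bigcup_{j=1}^{\infty} A_j$ and $\sum_{j=1}^{\infty} |A_j| < \lambda(K \setminus S) + \frac{\varepsilon}{4}$. But then $C := K \setminus \bigcup_{j=1}^{\infty} A_j$ is closed and bounded, and hence by Theorem 16.80 it is compact. Moreover, the containment $C = K \setminus \bigcup_{j=1}^{\infty} A_j \subseteq K \setminus (K \setminus S) \subseteq S$ holds. Now
$$\lambda(C) \ge \lambda(K) - \sum_{j=1}^{\infty} |A_j| > \lambda(K) - \lambda(K \setminus S) - \frac{\varepsilon}{4} = \lambda(S \cap K) - \frac{\varepsilon}{4} \ge \left( \lambda(S) - \frac{\varepsilon}{4} \right) - \frac{\varepsilon}{4} = \lambda(S) - \frac{\varepsilon}{2}.$$
Now $\lambda(O \setminus C) = \lambda(O) - \lambda(C) < \lambda(S) + \frac{\varepsilon}{2} - \left( \lambda(S) - \frac{\varepsilon}{2} \right) = \varepsilon$, which proves the direction "$\Rightarrow$."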
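Conversely, for "$\Leftarrow$" let $S \subseteq \mathbb{R}^d$ be such that $\lambda(S) < \infty$ and for every $\varepsilon > 0$ there are a compact set $C$ and an open set $O$ such that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$. Let $T \subseteq \mathbb{R}^d$ and let $\varepsilon > 0$. Then with $O$ and $C$ as described we infer
$$\begin{aligned}
\lambda(T) &= \lambda(O \cap T) + \lambda(O' \cap T) \\
&> \lambda(O \cap T) + \lambda(O' \cap T) + \lambda(O \setminus C) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(O' \cap T) + \lambda((O \setminus C) \cap T) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(O' \cap C' \cap T) + \lambda((O \cap C') \cap T) - \varepsilon \\
&\ge \lambda(O \cap T) + \lambda(C' \cap T) - \varepsilon \\
&\ge \lambda(S \cap T) + \lambda(S' \cap T) - \varepsilon.
\end{aligned}$$
Because $\varepsilon$ was arbitrary we obtain $\lambda(T) \ge \lambda(S \cap T) + \lambda(S' \cap T)$, and because $T$ was arbitrary, this means that $S$ is Lebesgue measurable.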
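The outer approximation step, covering $S$ by open boxes whose total volume exceeds $\lambda(S)$ by less than $\varepsilon/2$, is easy to carry out for a countable set, where $\lambda(S) = 0$. The following sketch assumes the hypothetical example of the rationals in $[0, 1]$, truncated to denominators at most 12 so that the computation is finite; the $j$th point is covered by an open interval of length $\varepsilon / 2^{j+1}$.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_den):
    """Distinct rationals p/q in [0, 1] with q <= max_den, in a fixed order."""
    seen = []
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.append(r)
    return seen

def open_cover(points, eps):
    """Cover the j-th point (j = 1, 2, ...) by an open interval of length eps / 2**(j + 1)."""
    cover = []
    for j, r in enumerate(points, start=1):
        half = Fraction(eps) / 2 ** (j + 2)
        cover.append((r - half, r + half))
    return cover

eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)
cover = open_cover(points, eps)
total_length = sum(b - a for a, b in cover)
print(len(points), float(total_length))
```

The total length stays below $\varepsilon/2$ no matter how many points are enumerated, because the lengths form a geometric series. For such a set the empty set can serve as the compact set $C$ in Theorem 18.3.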
Theorem 18.3 indicates that an “inner volume” or “inner Lebesgue measure” for
a set S could be defined as the supremum of the (outer) Lebesgue measures of all
compact sets contained in S (also see Exercise 18-3). A set would then be measurable
iff the inner and outer volume are equal. Measurability in this sense turns out to be
the same as Lebesgue measurability. However, this idea is quite complicated and it
requires some topological structure, while Definition 14.19 of measurability can be
used on arbitrary sets. This is why, even though it intuitively feels like a good idea, we
do not work with “inner measures.”
We can characterize Lebesgue measurable sets in Rd as follows.
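Theorem 18.4 Let $S \subseteq \mathbb{R}^d$. Then the following are equivalent.
1. $S$ is Lebesgue measurable.
2. For every $\varepsilon > 0$, there are a closed set $C$ and an open set $O$ so that $C \subseteq S \subseteq O$ and $\lambda(O \setminus C) < \varepsilon$.
3. There is a sequence $\{O_n\}_{n=1}^{\infty}$ of open sets and a sequence $\{C_n\}_{n=1}^{\infty}$ of closed sets with $O_n \supseteq O_{n+1}$ and $C_n \subseteq C_{n+1}$ for all $n \in \mathbb{N}$ so that $\bigcup_{n=1}^{\infty} C_n \subseteq S \subseteq \bigcap_{n=1}^{\infty} O_n$ and so that $\lambda\left( \bigcap_{n=1}^{\infty} O_n \setminus \bigcup_{n=1}^{\infty} C_n \right) = 0$.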
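Proof. To prove "$1 \Rightarrow 2$," for any natural number $n \in \mathbb{N}$ let $K_n := \prod_{i=1}^{d} [-n, n]$ and let $K_{-1} := K_0 := \emptyset$. For each $n \in \mathbb{N}$ the set $S \cap (K_n \setminus K_{n-1})$ is Lebesgue measurable with finite Lebesgue measure, so by Theorem 18.3 there are a closed set $C_n$ and an open set $O_n$ with $C_n \subseteq S \cap (K_n \setminus K_{n-1}) \subseteq O_n \subseteq K_{n+1}^{\circ} \setminus K_{n-1}$ and so that $\lambda(O_n \setminus C_n) < \frac{\varepsilon}{2 \cdot 2^n}$. Then the set $O := \bigcup_{n=1}^{\infty} O_n$ is open and by Exercise 16-42, the set $C := \bigcup_{n=1}^{\infty} C_n$ is closed.
Finally, with $O_0 := \emptyset$ we obtain the following.
$$\lambda(O \setminus C) = \sum_{n=1}^{\infty} \lambda\big( (O \setminus C) \cap (K_n \setminus K_{n-1}) \big)$$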
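For "$2 \Rightarrow 3$," for each natural number $k \in \mathbb{N}$ let $\widetilde{O}_k$ be open and let $\widetilde{C}_k$ be closed with $\widetilde{C}_k \subseteq S \subseteq \widetilde{O}_k$ and $\lambda(\widetilde{O}_k \setminus \widetilde{C}_k) < \frac{1}{k}$. Then for each $n \in \mathbb{N}$, the set $O_n := \bigcap_{k=1}^{n} \widetilde{O}_k$ is open, the set $C_n := \bigcup_{k=1}^{n} \widetilde{C}_k$ is closed, $O_n \supseteq O_{n+1}$, $C_n \subseteq C_{n+1}$ and $C_n \subseteq S \subseteq O_n$. Therefore, $\bigcup_{n=1}^{\infty} C_n \subseteq S \subseteq \bigcap_{n=1}^{\infty} O_n$. Moreover, for all $k \in \mathbb{N}$ we infer
$$\lambda\left( \bigcap_{n=1}^{\infty} O_n \setminus \bigcup_{n=1}^{\infty} C_n \right) \le \lambda\left( \widetilde{O}_k \setminus \widetilde{C}_k \right) < \frac{1}{k},$$
and because $k \in \mathbb{N}$ is arbitrary we conclude $\lambda\left( \bigcap_{n=1}^{\infty} O_n \setminus \bigcup_{n=1}^{\infty} C_n \right) = 0$.
The part "$3 \Rightarrow 1$" follows from the fact that open sets, closed sets and null sets are Lebesgue measurable.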
Lebesgue measure is also often considered on Lebesgue measurable subsets of $\mathbb{R}^d$ (see Example 14.10). Let $A \subseteq \mathbb{R}^d$ be Lebesgue measurable. Because the intersection of Lebesgue measurable sets is again Lebesgue measurable, Theorem 18.4 also holds for Lebesgue measurable subsets of $A$. If we want all sets involved to be subsets of $A$, we need to replace the demand that the $O_n$ are open with the demand that the $O_n$ are open in $A$ and we need to replace the demand that the $C_n$ are closed with the demand that the $C_n$ are closed in $A$. If $A$ itself is open, this is not necessary.
Theorem 18.4 also shows that the $\sigma$-algebra generated by the open subsets of $\mathbb{R}^d$ is interesting in itself. It is investigated further in Exercise 18-7.
Exercises
18-1. Explain why the hypothesis $\lambda(S) < \infty$ in Theorem 18.3 cannot be dropped.
18-2. Prove that every open subset of $\mathbb{R}^d$ is a countable union of compact sets.
18-3. Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable.
(a) Prove that $\lambda(S) = \inf\{\lambda(O) : S \subseteq O, O \text{ is open}\}$.
(b) Prove that $\lambda(S) = \sup\{\lambda(C) : C \subseteq S, C \text{ is compact}\}$.
(c) Prove that $\lambda(S) = \sup\{\lambda(C) : C \subseteq S, C \text{ is closed}\}$.
18-4. Prove that $S \subseteq \mathbb{R}^d$ with $\lambda(S) < \infty$ is Lebesgue measurable iff for every $\varepsilon > 0$ there is a compact set $C \subseteq S$ with $\lambda(C) > \lambda(S) - \varepsilon$.
18-5. Lebesgue measurable functions. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$.
(a) Prove that $f$ is Lebesgue measurable iff for all open sets $O \subseteq \mathbb{R}$ the set $f^{-1}[O]$ is Lebesgue measurable.
(b) Prove that $f$ is Lebesgue measurable iff for all closed sets $C \subseteq \mathbb{R}$ the set $f^{-1}[C]$ is Lebesgue measurable.
(c) Prove that $f$ is Lebesgue measurable iff for all compact sets $K \subseteq \mathbb{R}$ the set $f^{-1}[K]$ is Lebesgue measurable.
18-6. The Derivative Form of the Fundamental Theorem of Calculus for the Lebesgue integral states the following. If $h : [a, b] \to \mathbb{R}$ is Lebesgue integrable, then the function $H(x) := \int_a^x h(t)\, d\lambda(t)$ is differentiable a.e. with derivative $\frac{d}{dx} \int_a^x h(t)\, d\lambda(t) = h(x)$ for almost all $x \in [a, b]$. In this exercise, we will prove the result with the steps given below. (Recall that by Exercise 10-7, $H$ is differentiable a.e. if $h$ is nonnegative.)
(a) Let $a_1 < b_1 < a_2 < b_2 < \cdots < a_n < b_n$ and let $A := \bigcup_{j=1}^{n} (a_j, b_j)$. Prove the result for the indicator function $\mathbf{1}_A$.
(b) Use Fubini's Theorem on limits of nondecreasing functions (see Exercise 11-20) and Exercise 16-103d to prove the result for indicator functions of open subsets of $[a, b]$.
(c) Prove the result for indicator functions of closed subsets of $[a, b]$.
(d) Use Fubini's Theorem on limits of nondecreasing functions (see Exercise 11-20) and part 3 of Theorem 18.4 to prove the result for indicator functions of Lebesgue measurable subsets of $[a, b]$.
(e) Prove the result for simple Lebesgue measurable functions on $[a, b]$.
(f) Use Fubini's Theorem on limits of nondecreasing functions (see Exercise 11-20) to prove the result for nonnegative Lebesgue integrable functions on $[a, b]$.
(g) Prove the result for all Lebesgue integrable functions on $[a, b]$.
18-7. Borel Sets. Let $X$ be a metric space. The $\sigma$-algebra generated by the open sets in $X$ is called the
$\sigma$-algebra of Borel sets of $X$, denoted $\mathcal{B}(X)$. A function $f : X \to \mathbb{R}$ is called Borel measurable iff
it is measurable as a function on the measurable space $(X, \mathcal{B}(X))$.
(a) Prove that if $X$ is a Lebesgue measurable subset of $\mathbb{R}^d$ considered as a metric subspace of $\mathbb{R}^d$,
then every Borel set of $X$ is also Lebesgue measurable.
(b) Let $X \subseteq \mathbb{R}^d$ be Lebesgue measurable. Prove that for every Lebesgue measurable set $S \subseteq X$
there are Borel sets $F, G \in \mathcal{B}(X)$ so that $F \subseteq S \subseteq G$ and $\lambda(G \setminus F) = 0$.
(c) Let $X \subseteq \mathbb{R}^d$ be Lebesgue measurable and let $f : X \to \mathbb{R}$ be Lebesgue measurable. Prove that
there is a Borel measurable function $f_B : X \to \mathbb{R}$ so that $\lambda(\{x \in X : f(x) \neq f_B(x)\}) = 0$.
Note. This result is the reason why the focus of measure theory can be restricted to Borel
measurable functions when necessary.
(d) Prove that if $m + n = d$ and $X \in \Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$, then every Borel measurable function
$f : X \to \mathbb{R}$ is also $\Sigma_{\lambda_m} \times \Sigma_{\lambda_n}$-measurable.
(e) A measure $\mu : \mathcal{B}(\mathbb{R}^d) \to [0, \infty]$ is called a Borel measure. Let the norm on $\mathbb{R}^d$ be the
uniform norm. Prove that two Borel measures $\mu$ and $\nu$ are equal iff for all $x \in \mathbb{Q}^d$ and all
$r \in (0,1) \cap \mathbb{Q}$ we have that $\mu(B_r(x))$ is finite and $\mu(B_r(x)) = \nu(B_r(x))$.
(f) Let the norm on $\mathbb{R}^d$ be the uniform norm. Prove that there are two distinct measures
$\mu, \nu : \mathcal{B}(\mathbb{R}^d) \to [0, \infty]$ so that $\mu(B_r(x)) = \nu(B_r(x))$ holds for all $x \in \mathbb{Q}^d$ and all
$r \in (0,1) \cap \mathbb{Q}$.
18-8. The $\sigma$-algebra of subsets of $[a,b]$ generated by the open sets in $[a,b]$ (as a metric space in its own
right) is also called the $\sigma$-algebra of Borel sets (of $[a,b]$). Let $g : [a,b] \to \mathbb{R}$ be nondecreasing and
let $\lambda_g$ be the Lebesgue-Stieltjes measure defined in Exercise 14-20.
(a) Prove that all Borel sets of $[a,b]$ are $\lambda_g$-measurable.
(b) Prove that $\lambda_g$ is a finite measure on the Borel sets of $[a,b]$.
(c) Let $f : [a,b] \to \mathbb{R}$ be a function. Then $f$ is called Lebesgue-Stieltjes integrable with
respect to $g$ iff $f$ is $\lambda_g$-integrable.
Prove that if $f$ is continuous, then $f$ is Lebesgue-Stieltjes integrable with respect to $g$ and
$\int_a^b f\,d\lambda_g = \int_a^b f\,dg$, where the right side is the Riemann-Stieltjes integral of $f$, which
exists by Exercise 5-19b.
18.2 $C^\infty$ and Approximation of Integrable Functions
Theorem 16.86 makes a first connection between topological and measure theoretical notions by showing that, loosely speaking, the continuous functions are dense in
$L^p[a,b]$ for $1 \le p < \infty$. In this section, we show that even more is true. The set of
infinitely often differentiable functions is dense in $L^p$, too. Because differentiability
is usually defined on open sets, we work with open domains $\Omega \subseteq \mathbb{R}^d$ throughout this
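section. Recall that by Notation 15.66, with terminology as in Example 14.10, we have
$\mathcal{L}^p(\Omega) = \mathcal{L}^p(\Omega, \Sigma_\lambda^\Omega, \lambda|_{\Sigma_\lambda^\Omega})$
and $L^p(\Omega) = L^p(\Omega, \Sigma_\lambda^\Omega, \lambda|_{\Sigma_\lambda^\Omega})$.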
Definition 18.5 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $k \ge 0$. We define $C^k(\Omega)$ to be the set
of all $k$ times continuously differentiable functions $f : \Omega \to \mathbb{R}$. Moreover, we define
$C^\infty(\Omega) := \bigcap_{k=1}^\infty C^k(\Omega)$.
Note that infinite differentiability is compatible with the usual algebraic operations.
Theorem 18.6 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$ and let $f, g \in C^\infty(\Omega)$. Then
the functions $f + g$, $f - g$, $fg$ are in $C^\infty(\Omega)$. Moreover, $\frac{f}{g} \in C^\infty(\Omega \setminus \{x : g(x) = 0\})$.
Proof. We first show that for the first three operations, the result is a consequence
of Proposition 17.62. If $f, g \in C^\infty(\Omega)$, then $x \mapsto (f(x), g(x))$ is infinitely differentiable. Moreover, any bilinear function on $\mathbb{R}^2$ is continuous, and hence (use Theorem
17.52 and note that the derivative is the sum of two linear functions) infinitely differentiable. Therefore, by Proposition 17.62 for all $k \in \mathbb{N}$ we infer $f + g \in C^k(\Omega)$, because
it is the composition of the $k$ times differentiable functions $x \mapsto (f(x), g(x))$ and
$(y, z) \mapsto y + z$. Because $k$ was arbitrary, $f + g \in C^\infty(\Omega)$. Differences and products
are treated similarly.
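For the quotient, note that the function $(y, z) \mapsto \left(y, \frac{1}{z}\right)$ is infinitely differentiable
on $\mathbb{R} \times (\mathbb{R} \setminus \{0\})$ and that the quotient is the composition of the functions
$x \mapsto (f(x), g(x))$, $(y, z) \mapsto \left(y, \frac{1}{z}\right)$ and $(y, z) \mapsto yz$. ∎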
We will prove more than that infinitely differentiable functions are dense in $L^p(\Omega)$.
We will show that there is a dense subset consisting of infinitely differentiable functions
that are zero outside a compact set.
Definition 18.7 Let $\Omega \subseteq \mathbb{R}^d$ be open. A function $f \in C^\infty(\Omega)$ is called a test function
iff $\operatorname{supp}(f)$ is compact. The set of all test functions on $\Omega$ is denoted $C_0^\infty(\Omega)$.
The terminology "test function" comes from the role these functions play in the
investigation of partial differential equations. One concrete example is the definition
of weak derivatives (see Definition 23.11). With $C_0^\infty(\Omega)$ defined, our next step is
to approximate the indicator function of an interval with a function in $C_0^\infty(\mathbb{R})$. The
construction is visualized in Figure 44.
Lemma 18.8 The $C^\infty$ connector. The function
$$c(x) := \begin{cases} 0; & \text{for } x \le 0, \\ e^{-\frac{1}{x}}; & \text{for } x > 0, \end{cases}$$
is infinitely differentiable on $\mathbb{R}$.
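Proof. It is clear that $c \in C^\infty(\mathbb{R} \setminus \{0\})$. It remains to be proved that $c$ is infinitely
differentiable at $x = 0$. For all $x < 0$, we have $c^{(n)}(x) = 0$. Therefore, if they
exist, all derivatives of $c$ at $x = 0$ must be zero. For all $n \in \mathbb{N} \cup \{0\}$, we have
$\lim_{z \to 0^-} \frac{c^{(n)}(z) - 0}{z - 0} = 0$. If we can show that the right-sided limit also is 0, then we have
proved that $c \in C^\infty(\mathbb{R})$ with
$$c^{(n)}(x) := \begin{cases} 0; & \text{for } x \in (-\infty, 0], \\ \frac{d^n}{dx^n} e^{-\frac{1}{x}}; & \text{for } x \in (0, \infty), \end{cases}$$
for all $n \in \mathbb{N}$.
(Formally, this would require an induction on $n$, but it is not hard to mentally fill in.)
An easy induction shows that for every $n \in \mathbb{N} \cup \{0\}$ there is a polynomial $p_n$ so
that $\frac{d^n}{dx^n} e^{-\frac{1}{x}} = p_n\left(\frac{1}{x}\right) e^{-\frac{1}{x}}$ for all $x > 0$ (see Exercise 18-9). For all $j \in \mathbb{N}$ by
Exercise 12-27, we obtain
$$\lim_{x \to 0^+} \frac{e^{-\frac{1}{x}}}{x^j} = \lim_{x \to 0^+} \left(\frac{1}{x}\right)^j e^{-\frac{1}{x}} = \lim_{u \to \infty} u^j e^{-u} = 0,$$
which means that for every polynomial $p(x) = \sum_{j=0}^k a_j x^j$ we can conclude
$\lim_{x \to 0^+} p\left(\frac{1}{x}\right) e^{-\frac{1}{x}} = 0$.
But then for all $n \in \mathbb{N} \cup \{0\}$ we obtain
$$\lim_{z \to 0^+} \frac{\frac{d^n}{dz^n} e^{-\frac{1}{z}} - 0}{z - 0} = \lim_{z \to 0^+} \frac{p_n\left(\frac{1}{z}\right) e^{-\frac{1}{z}} - 0}{z - 0} = \lim_{z \to 0^+} \frac{1}{z}\, p_n\left(\frac{1}{z}\right) e^{-\frac{1}{z}} = 0.$$
∎
Figure 44: Constructing a $C^\infty$ “indicator function” for intervals.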
Note that the function in Lemma 18.8 also shows that not every $C^\infty$ function can
be represented as a power series. This is because $c^{(n)}(0) = 0$ for all $n \in \mathbb{N}$, but $c \neq 0$.
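The decay used in the proof of Lemma 18.8 can be observed numerically. The following is a minimal Python sketch that is not part of the original text; the names `c` and `ratio` are chosen here for illustration only. It evaluates $c(x)/x^j$ for small $x > 0$, the quantity that the proof shows tends to 0 as $x \to 0^+$.

```python
import math

def c(x: float) -> float:
    """Connector of Lemma 18.8: 0 for x <= 0 and exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def ratio(x: float, j: int) -> float:
    """c(x) / x**j for x > 0; the proof shows this tends to 0 as x -> 0+."""
    return c(x) / x**j

# Sample the ratio for j = 5 along x = 10^-1, 10^-2, 10^-3.
# For the smallest x, exp(-1/x) underflows to 0.0 in floating point.
samples = [ratio(10.0 ** (-m), 5) for m in (1, 2, 3)]
```

The samples shrink very quickly even though $x^{-5}$ grows, which reflects the fact that the exponential factor dominates every power of $\frac{1}{x}$.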
Lemma 18.9 The $C^\infty$ jump function. Let $\delta > 0$. Then $j_\delta(x) := \frac{c(x)}{c(\delta - x) + c(x)}$ is
infinitely differentiable on $\mathbb{R}$. Moreover, it takes values in $[0, 1]$, it is identical to zero
on $(-\infty, 0]$, it is identical to 1 on $[\delta, \infty)$ and $j_\delta[(0, \delta)] \subseteq (0, 1)$.
Proof. Because $c \ge 0$ and for each $x \in \mathbb{R}$ at least one of $c(x)$ and $c(\delta - x)$ is not
zero, Proposition 17.62 and Theorem 18.6 imply that $j_\delta \in C^\infty(\mathbb{R})$. Because $c \ge 0$ for
all $x \in \mathbb{R}$ we obtain $c(x) \le c(\delta - x) + c(x)$, which implies $0 \le \frac{c(x)}{c(\delta - x) + c(x)} \le 1$
for all $x \in \mathbb{R}$ and the inequalities are strict for $x \in (0, \delta)$. Finally, because $c(x) = 0$
for $x \le 0$ we conclude $j_\delta(x) = 0$ for $x \le 0$ and because $c(\delta - x) = 0$ for $x \ge \delta$ we
conclude $j_\delta(x) = 1$ for $x \ge \delta$. ∎
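The properties claimed in Lemma 18.9 are easy to check on sample points. The sketch below is an illustration only and not from the original text; the names `c` and `jump` are assumptions made here. It implements $j_\delta$ directly from the formula in the lemma.

```python
import math

def c(x: float) -> float:
    """Connector of Lemma 18.8: 0 for x <= 0 and exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def jump(x: float, delta: float) -> float:
    """Jump function j_delta(x) = c(x) / (c(delta - x) + c(x)) of Lemma 18.9.

    Mathematically the denominator is never zero for delta > 0. In floating
    point both terms can underflow to 0.0 when delta is very small (roughly
    below 0.003), so this sketch is only meant for moderate delta.
    """
    return c(x) / (c(delta - x) + c(x))

values = [jump(x, 1.0) for x in (-1.0, 0.0, 0.25, 0.5, 1.0, 2.0)]
```

At the midpoint $x = \delta/2$ numerator and both summands of the denominator coincide, so the value there is exactly $\frac{1}{2}$.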
Lemma 18.10 The $C^\infty$ interval indicator. Let $a < b$ and $\delta > 0$ be real numbers. The
function $1_{(a,b),\delta}(x) := j_\delta(x - a)\, j_\delta(b - x)$ is infinitely differentiable on $\mathbb{R}$. Moreover,
it takes values in $[0, 1]$, it is identical to zero on $\mathbb{R} \setminus (a, b)$ and it is identical to 1 on
$[a + \delta, b - \delta]$.
Proof. Exercise 18-10. ∎
$C^\infty$ interval indicator functions are the key ingredient to constructing similar “indicator functions” for closed sets. These functions are then used to show that the compactly supported infinitely differentiable functions are dense in $L^p(\Omega)$.
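Lemma 18.11 Let $C \subseteq \mathbb{R}^d$ be closed and let $U \subseteq \mathbb{R}^d$ be open so that $C \subseteq U$.
Then there is a function $1_{C,U} \in C^\infty(\mathbb{R}^d)$ that takes values in $[0, 1]$ and satisfies
$1_{C,U}|_C = 1$ and $\operatorname{supp}(1_{C,U}) \subseteq U$.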
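Proof. First consider the case that $C$ is compact. Throughout, we will use the
uniform norm $\|\cdot\|_\infty$. So, “balls” around points will actually be cubes. For each $x \in C$,
let $\varepsilon_x > 0$ be so that $B_{\varepsilon_x}(x) \subseteq U$. Then $\{B_{\varepsilon_x/2}(x)\}_{x \in C}$ is an open cover of $C$. Let
$x_1, \dots, x_n$ be so that $C \subseteq \bigcup_{j=1}^n B_{\varepsilon_{x_j}/2}(x_j)$. For each $x_j$, let $x_j^{(i)}$ denote the $i$th component
of the representation of $x_j$ with respect to the standard base. For $j = 1, \dots, n$ let
$h_j$ be a product of $C^\infty$ interval indicator functions (see Lemma 18.10), one for each coordinate
interval of the cube around $x_j$, chosen so that $h_j = 1$ on $B_{\varepsilon_{x_j}/2}(x_j)$ and $\operatorname{supp}(h_j) \subseteq U$.
Then $h_j \in C^\infty(\mathbb{R}^d)$, because it is a product of infinitely differentiable functions. Hence, for $h := \sum_{j=1}^n h_j$ we
infer $\operatorname{supp}(h) \subseteq U$, $h \in C^\infty(\mathbb{R}^d)$ and for all $x \in C$ there is a $j \in \{1, \dots, n\}$ so that
$x \in B_{\varepsilon_{x_j}/2}(x_j)$, which means $h(x) \ge h_j(x) = 1$. With $j_1$ being a $C^\infty$ jump function
as in Lemma 18.9, define $1_{C,U} := j_1 \circ h$. By Proposition 17.62, this function is in
$C^\infty(\mathbb{R}^d)$. By the above, we have $1_{C,U}|_C = 1$ and $\operatorname{supp}(1_{C,U}) \subseteq U$, so the result is
established for compact $C$.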
Now consider the case that $C$ is closed, but not necessarily compact. For each
$n \in \mathbb{N}$, let $C_n := C \cap \left(\overline{B_n(0)} \setminus B_{n-1}(0)\right)$. Then each $C_n$ is compact. For each $x \in C_n$,
let $\varepsilon_x \in (0, 1)$ be so that $B_{\varepsilon_x}(x) \subseteq U$ and let $U_n := \bigcup_{x \in C_n} B_{\varepsilon_x}(x)$. Then for each $n \in \mathbb{N}$,
the set $U_n$ is open and contains the compact set $C_n$. Let the functions $1_{C_n,U_n}$ be as
constructed above and let $g := \sum_{n=1}^\infty 1_{C_n,U_n}$. For each $x \in \mathbb{R}^d$, there is a neighborhood
$V$ of $x$ so that at most four terms in this series are not equal to zero on $V$. Thus $g$ is in
$C^\infty(\mathbb{R}^d)$. Moreover, $g(x) \ge 1$ for all $x \in C$, and $\operatorname{supp}(g) \subseteq U$. Now $1_{C,U} := j_1 \circ g$
is as desired. ∎
Note that Lemma 18.11 shows in particular that $C_0^\infty(\Omega)$ is not empty, which is not
necessarily trivial. The next result shows that even more is true.
Theorem 18.12 Let $1 \le p < \infty$ and let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$. Then
$C_0^\infty(\Omega)$, or more formally $T := \{[f] : f \in C_0^\infty(\Omega)\}$, is dense in $L^p(\Omega)$.
Proof. The proof runs along the same lines as the proof of Theorem 16.86. The
main change is that we need to approximate the indicator functions of open boxes
contained in $\Omega$ (rather than those of intervals) with functions in $C_0^\infty(\Omega)$ (rather than
with continuous functions). Let $B = \prod_{i=1}^d (l_i, r_i)$ be a box that is contained in $\Omega$ and let
$\varepsilon > 0$. By Theorem 18.3, there is a compact set $C \subseteq B$ so that $\lambda(B \setminus C) < \varepsilon^p$. Let
$1_{C,B}$ be as in Lemma 18.11. Then $\operatorname{supp}(1_{C,B}) \subseteq B \subseteq \Omega$, $1_{C,B} \in C_0^\infty(\Omega)$ and
$$\|1_B - 1_{C,B}\|_p = \left( \int_\Omega |1_B - 1_{C,B}|^p \, d\lambda \right)^{\frac{1}{p}} \le \lambda(B \setminus C)^{\frac{1}{p}} < \varepsilon.$$
We conclude that in any $L^p$-neighborhood of an indicator function of a box in $\Omega$
we can find a function in $C_0^\infty(\Omega)$.
Now that we can approximate indicator functions of boxes with $C_0^\infty(\Omega)$-functions,
the proof proceeds just like the proof of Theorem 16.86. To approximate indicator
functions of sets, cover the set $A$ with open boxes $B_j \subseteq \Omega$ such that the sum of the
volumes of the boxes $B_j$ is close to that of the set. Truncate the sum to obtain a finite
number of boxes $B_j$ so that the sum of the volumes of the finitely many boxes is close
to the total sum of volumes. Approximate the indicator functions of these finitely many
boxes with $C_0^\infty(\Omega)$ functions as indicated above to prove that indicator functions of sets
can be approximated with $C_0^\infty(\Omega)$ functions. The details are to be given in Exercise
18-11a. Next, just like the proof of Theorem 16.86, approximate simple functions with
functions in $C_0^\infty(\Omega)$ (Exercise 18-11b). Finally, just like the proof of Theorem 16.86,
apply Theorem 16.85 (Exercise 18-11c). ∎
Figure 45: Illustration how the indicator function of an interval can be approximated
in $L^p$ with $C^\infty$ functions. The area bounded by the difference shrinks to zero, which
allows the approximation (in the $L^p$ sense) of the discontinuities with infinitely smooth
functions.
Geometrically, Theorem 18.12 says that even highly discontinuous functions can be
approximated arbitrarily well with infinitely smooth functions in $L^p$. Figure 45 shows
such an approximation for the indicator function of an interval. This visualization
illustrates the crucial part of the proof of Theorem 18.12 and it also shows that the
result is not counterintuitive. The infinitely smooth approximations do not in any way
“fix” the jumps at the discontinuities.
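The approximation of an interval's indicator function can also be observed numerically. The following Python sketch is an illustration under stated assumptions, not the construction used in the proof: it takes the $C^\infty$ interval indicator $1_{(a,b),\delta}$ of Lemma 18.10 and estimates $\|1_{(a,b)} - 1_{(a,b),\delta}\|_p$ with a midpoint rule. The function names and the number of sample points are choices made here. Because the difference is bounded by 1 and vanishes outside $(a, a+\delta) \cup (b-\delta, b)$, the norm is at most $(2\delta)^{1/p}$.

```python
import math

def c(x: float) -> float:
    """Connector of Lemma 18.8."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def jump(x: float, delta: float) -> float:
    """Jump function of Lemma 18.9 (intended for moderate delta only)."""
    return c(x) / (c(delta - x) + c(x))

def smooth_indicator(x: float, a: float, b: float, delta: float) -> float:
    """C-infinity interval indicator of Lemma 18.10."""
    return jump(x - a, delta) * jump(b - x, delta)

def lp_distance(a: float, b: float, delta: float, p: float, n: int = 20000) -> float:
    """Midpoint-rule estimate of the L^p distance between the indicator of
    (a, b) and its smooth version; both functions vanish outside (a, b)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += abs(1.0 - smooth_indicator(x, a, b, delta)) ** p * h
    return total ** (1.0 / p)

deltas = (0.5, 0.1, 0.02)
distances = [lp_distance(0.0, 1.0, d, 2.0) for d in deltas]
```

The estimated distances decrease with $\delta$, while the sup-norm distance stays equal to 1, which matches the remark that the smooth approximations do not remove the jumps.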
Exercises
18-9. Prove by induction that for every $n \in \mathbb{N}$ there is a polynomial $p_n$ so that for all $x > 0$ we have
$\frac{d^n}{dx^n} e^{-\frac{1}{x}} = p_n\left(\frac{1}{x}\right) e^{-\frac{1}{x}}$.
18-10. Prove Lemma 18.10.
18-11. Finish the proof of Theorem 18.12.
(a) Prove that for every measurable subset $A$ of $\Omega$ and every $\varepsilon > 0$ there is a test function
$g_{A,\varepsilon} \in C_0^\infty(\Omega)$ so that $\|1_A - g_{A,\varepsilon}\|_p < \varepsilon$.
(b) Prove that for every simple function $s \in L^p(\Omega)$ and every $\varepsilon > 0$ there is a test function
$g_{s,\varepsilon} \in C_0^\infty(\Omega)$ so that $\|s - g_{s,\varepsilon}\|_p < \varepsilon$.
(c) Apply Theorem 16.85 to complete the proof.
18-12. Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Prove that for every function $f \in L^1(\Omega)$ there is a sequence of functions
in $C_0^\infty(\Omega)$ that converges a.e. to $f$.
18-13. Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Prove that if $f \in L^1(\Omega)$ is so that $\int_\Omega f\varphi\,d\lambda = 0$ for all $\varphi \in C_0^\infty(\Omega)$,
then $f = 0$ a.e.
18-14. A half-open box in $\mathbb{R}^d$ is a set $H$ for which there are real numbers $a_i < b_i$, $i = 1, \dots, d$ so that
$H = \prod_{i=1}^d [a_i, b_i)$.
Let $O \subseteq \mathbb{R}^d$ be open and let $K \subseteq O$ be compact. Prove that for every $\varepsilon > 0$ there is a finite family
$\{H_j\}_{j=1}^n$ of pairwise disjoint half-open cubes of side length less than $\varepsilon$ so that $K \subseteq \bigcup_{j=1}^n H_j \subseteq O$.
Hint. Work with the $\|\cdot\|_\infty$ norm, let $\delta := \frac{1}{2} \min\left\{\varepsilon, \operatorname{dist}\left(K, \mathbb{R}^d \setminus O\right)\right\}$ (need to prove $\delta > 0$) and
use cubes of the form $\prod_{i=1}^d [l_i \delta, (l_i + 1)\delta)$.
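18-15. Prove that the function $g : \mathbb{R}^d \to \mathbb{R}$ defined by
$$g(x) := \begin{cases} e^{-\frac{1}{1 - \|x\|_2^2}}; & \text{for } \|x\|_2 < 1, \\ 0; & \text{for } \|x\|_2 \ge 1, \end{cases}$$
is a $C_0^\infty$ function.
Hint. Chain Rule and Lemma 18.8.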
18-16. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f \in L^p(\Omega)$. With $f$ extended to all of $\mathbb{R}^d$ by setting it equal to zero
outside of $\Omega$, prove that $\lim_{z \to 0} \int_{\mathbb{R}^d} |f(x) - f(x + z)|^p \, d\lambda(x) = 0$.
Hints. First prove the result for $C_0^\infty$ functions. Then prove the result in general by approximating $f$
with $C_0^\infty$ functions.
18-17. Let $\Omega \subseteq \mathbb{R}^d$ be open, let $f \in L^p(\Omega)$, let $\varphi \in C_0^\infty(\Omega)$ and for all $x \in \mathbb{R}^d$ define the function $f_\varphi$ by
$f_\varphi(x) := \int f(z)\varphi(x - z)\,d\lambda(z)$. (The integrals exist because $L^p(\operatorname{supp}(\varphi)) \subseteq L^1(\operatorname{supp}(\varphi))$.)
(a) Prove that $f_\varphi$ is continuous on $\mathbb{R}^d$.
Hint. Dominated Convergence Theorem.
(b) Prove that at every $x \in \mathbb{R}^d$ all first partial derivatives $\frac{\partial f_\varphi}{\partial x_i}$ of $f_\varphi$ exist and are equal to
$\int f(z) \frac{\partial \varphi}{\partial x_i}(x - z)\,d\lambda(z)$.
Hint. Dominated Convergence Theorem, Mean Value Theorem for functions of one variable
and boundedness of the first partial derivatives of $\varphi$.
(c) Prove that all first partial derivatives $\frac{\partial f_\varphi}{\partial x_i}$ of $f_\varphi$ are continuous at every $x \in \mathbb{R}^d$.
Hint. Apply parts 18-17a and 18-17b.
(d) Prove that $f_\varphi \in C^\infty(\mathbb{R}^d)$.
Hint. Apply part 18-17c and use induction.
Note. The operation that produces $f_\varphi$ from $f$ and $\varphi$ is also called the convolution of $f$ and $\varphi$.
18-18. Let $\Omega \subseteq \mathbb{R}^d$ be open and let $p \in [1, \infty)$.
(a) Prove that $L^p(\Omega)$ has a countable dense subset consisting of simple functions.
Hint. Boxes with dyadic rational bounds.
(b) Prove that $L^p(\Omega)$ has a countable dense subset consisting of $C_0^\infty$ functions.
18-19. Prove that for every Riemann integrable function $f : [a,b] \to \mathbb{R}$ on $[a,b]$ there is a sequence
$\{g_n\}_{n=1}^\infty$ of infinitely differentiable functions on $[a,b]$ such that $g_n(a) = g_n(b) = 0$ for all $n \in \mathbb{N}$,
$\lim_{n \to \infty} \int_a^b |f - g_n|\,dx = 0$ and for all $n \in \mathbb{N}$ we have $|g_n| \le |f|$.
Hint. First find $a = x_0 < x_1 < \cdots < x_n = b$ and a step function $s(x) = \sum_{k=1}^n a_k 1_{[x_{k-1}, x_k)}$ so that
$|s| \le |f|$ and $\|s - f\|_1$ is small. Then use $C^\infty$ interval indicator functions.
Figure 46: Geometric arguments that give the area of a parallelogram and the volume
of a parallelepiped. (Formally, the cross product on the left is obtained by making $a$
and $b$ vectors in $\mathbb{R}^3$ whose third component is zero.)
18.3 Tensor Algebra and Determinants
To prove the Multidimensional Substitution Formula we need a volume function for
$n$-dimensional parallelepipeds spanned by vectors $a_1, \dots, a_n$. Moreover, for integration on manifolds we need functions that allow us to compute the lower dimensional
volume of lower dimensional parallelepipeds in higher dimensional spaces. For example, we will be interested in the area (two dimensional volume) of a parallelogram (two
dimensional parallelepiped) in three dimensional space.
We start by analyzing some formulas from geometry (also see Figure 46). It can be
proved that the area of the parallelogram in $\mathbb{R}^2$ spanned by the vectors
$a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ and $b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$ is
$A(a, b) = |a_1 b_2 - b_1 a_2|$. Similarly, the volume of the three dimensional
parallelepiped spanned by the vectors $a, b, c$ is $V(a, b, c) = |\langle a, b \times c \rangle|$. Note that if
we drop the absolute values, the (oriented) area is bilinear and the (oriented) volume
is 3-linear (see Definition 17.45) in the input vectors. Throughout, we will consider
oriented volumes, that is, volumes that can be positive or negative. This is not much
of a loss, because if we really want a nonnegative number, we simply take absolute
values.
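The two geometric formulas, with the absolute values dropped, can be written out directly. The following Python sketch is an illustration only; the function names are chosen here and do not come from the text. It computes the oriented area $a_1 b_2 - b_1 a_2$ and the oriented volume $\langle a, b \times c \rangle$ and checks on examples that swapping two vectors changes the sign and that the area is additive in its first argument.

```python
def oriented_area(a, b):
    """Oriented area a1*b2 - b1*a2 of the parallelogram spanned by a, b in R^2."""
    return a[0] * b[1] - b[0] * a[1]

def cross(b, c):
    """Cross product of two vectors in R^3."""
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])

def oriented_volume(a, b, c):
    """Oriented volume <a, b x c> of the parallelepiped spanned by a, b, c."""
    return sum(x * y for x, y in zip(a, cross(b, c)))

a, b = (2, 0), (1, 3)
area_ab = oriented_area(a, b)          # 2*3 - 1*0
area_ba = oriented_area(b, a)          # sign changes when the vectors are swapped
unit_volume = oriented_volume((1, 0, 0), (0, 1, 0), (0, 0, 1))
```

Taking `abs` of either result recovers the nonnegative area or volume.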
Because the two dimensional volume formula is incorporated in the three dimensional volume formula, the above suggests that we should try to construct a general
tensor formalism that produces the volume function for parallelepipeds in arbitrary
dimensions.
We start by representing k-tensors on finite dimensional vector spaces. The representation in Theorem 18.16 below is already implicit in Corollary 17.61.
Definition 18.13 Let $V$ be a vector space. The set of real valued $k$-tensors on $V$ is
denoted by $\mathcal{T}^k(V)$. For a finite dimensional vector space $V$, the dual space is defined
to be $V^* := \mathcal{T}^1(V) = \mathcal{L}(V, \mathbb{R})$.
With pointwise addition and scalar multiplication $\mathcal{T}^k(V)$ is a vector space (see
Exercise 18-20). The (higher dimensional) volume of a box is the product of the lower
dimensional volumes of the projections: for a rectangle, area is the product of the
lengths of the sides (length is one dimensional volume), and for a three dimensional
box, the volume is the area of the base times the height. It is thus not surprising that
tensors can be multiplied in the same simplistic fashion.
Definition 18.14 Let $V$ be a vector space and $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$. We define
$S \otimes T[v_1, \dots, v_k, v_{k+1}, \dots, v_{k+l}] := S[v_1, \dots, v_k]\, T[v_{k+1}, \dots, v_{k+l}]$ and call it the
tensor product of $S$ and $T$. Clearly, $S \otimes T \in \mathcal{T}^{k+l}(V)$.
The tensor product is not commutative, but it is associative (see Exercise 18-21).
This means that while we need to be careful with the order of the factors, tensor products with more than two factors can be written without parentheses. Tensor products
allow us to represent tensors in terms of a very natural base.
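Definition 18.14 translates almost literally into code if a $k$-tensor is modeled as a function of $k$ vectors. The sketch below is an illustration under that modeling assumption; the names `tensor_product`, `pi1` and `pi2` are hypothetical and not from the text. It also exhibits two 1-tensors on $\mathbb{R}^2$ whose tensor products in the two possible orders differ.

```python
def tensor_product(S, k, T):
    """Tensor product of a k-tensor S and an l-tensor T, both given as
    functions of vectors: (S (x) T)[v_1..v_{k+l}] = S[v_1..v_k] * T[v_{k+1}..v_{k+l}]."""
    def product(*vs):
        return S(*vs[:k]) * T(*vs[k:])
    return product

# Coordinate functionals on R^2 (1-tensors).
pi1 = lambda v: v[0]
pi2 = lambda v: v[1]

a, b = (1, 0), (0, 1)
left = tensor_product(pi1, 1, pi2)(a, b)    # a_1 * b_2
right = tensor_product(pi2, 1, pi1)(a, b)   # a_2 * b_1
```

Because `left` and `right` differ for this choice of vectors, the order of the factors matters.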
Theorem 18.15 Let $V$ be a finite dimensional vector space with base $\{v_1, \dots, v_d\}$.
Then the maps $\varphi_i : V \to \mathbb{R}$ defined by $\varphi_i\left[\sum_{j=1}^d a_j v_j\right] := a_i$ form a base of the dual
space of $V$, called the dual base of $\{v_1, \dots, v_d\}$.
Proof. To see that $\{\varphi_1, \dots, \varphi_d\}$ is linearly independent, let $a_1, \dots, a_d \in \mathbb{R}$ be so
that $\sum_{i=1}^d a_i \varphi_i = 0$. Then for all $j \in \{1, \dots, d\}$ we infer $0 = \sum_{i=1}^d a_i \varphi_i[v_j] = a_j$, and
hence $\{\varphi_1, \dots, \varphi_d\}$ is linearly independent.
To see that $\{\varphi_1, \dots, \varphi_d\}$ is a base of $V^*$, let $\psi \in V^*$. Then for all $v = \sum_{i=1}^d a_i v_i$
in $V$ we have $\psi[v] = \sum_{i=1}^d a_i \psi[v_i] = \sum_{i=1}^d \psi[v_i] \varphi_i[v]$, which means that $\psi$ is a linear
combination of the $\varphi_i$. Hence, $\{\varphi_1, \dots, \varphi_d\}$ is a base of $V^*$. ∎
Theorem 18.16 Let $V$ be a finite dimensional vector space with base $\{v_1, \dots, v_d\}$
and let $\{\varphi_1, \dots, \varphi_d\}$ be the associated dual base as in Theorem 18.15. Then the set
$B := \{\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k} : i_1, \dots, i_k \in \{1, \dots, d\}\}$ is a base for $\mathcal{T}^k(V)$.
Proof. The proof is similar to the proof of Theorem 18.15 (Exercise 18-22). ∎
Both the area of a parallelogram as well as the volume of a three dimensional parallelepiped have multiple summands. Hence, the volume function for parallelepipeds
cannot just be a simple tensor product. A key property of both geometric formulas is
that if we switch any two vectors, the sign of the result changes. This observation leads
us to alternating tensors.
Definition 18.17 Let $V$ be a vector space. A $k$-tensor $\omega \in \mathcal{T}^k(V)$ is called alternating
iff for all $1 \le i < j \le k$ and all $v_1, \dots, v_k \in V$ we have
$$\omega[v_1, \dots, v_{i-1}, v_i, v_{i+1}, \dots, v_{j-1}, v_j, v_{j+1}, \dots, v_k] = -\omega[v_1, \dots, v_{i-1}, v_j, v_{i+1}, \dots, v_{j-1}, v_i, v_{j+1}, \dots, v_k].$$
The set of alternating $k$-tensors on $V$ is denoted $\Lambda^k(V)$.
Exercise 18-23 shows that the set of alternating $k$-tensors forms a vector space.
Because they have only one argument, linear functions $\varphi : V \to \mathbb{R}$ are alternating
1-tensors. (The universal quantification in the definition of an “alternating 1-tensor”
is over the empty set, so it is vacuously true.) For any tensor, we can define a corresponding alternating tensor. The idea is to sum all terms that can be obtained by
permuting the entries in such a way that a transposition of two vectors will switch a
positive summand with a negative summand.
Definition 18.18 For $k \in \mathbb{N}$ we let $S_k$ be the set of all permutations on $\{1, \dots, k\}$.
A transposition is a permutation $\tau$ that fixes all but two elements of the set. The
sign $\operatorname{sgn}(\sigma)$ of a permutation $\sigma \in S_k$ is 1 if $\sigma$ is a composition of an even number
of transpositions and it is $-1$ if $\sigma$ is a composition of an odd number of transpositions. (Exercise 18-24 shows that the sign is well-defined.) If $T \in \mathcal{T}^k(V)$ we define
$$\operatorname{Alt}(T)[v_1, \dots, v_k] := \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma)\, T[v_{\sigma(1)}, \dots, v_{\sigma(k)}].$$
Theorem 18.19 shows that $\operatorname{Alt}(T)$ is an alternating tensor, thus explaining the notation. The proof also shows why we need the coefficient $\frac{1}{k!}$. Without it, $\operatorname{Alt}(\cdot)$ applied
to an alternating tensor would multiply the alternating tensor with a factor that is not
equal to 1.
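The alternation operator can be sketched in code by modeling a $k$-tensor as a function of $k$ vectors and summing over all permutations with the coefficient $\frac{1}{k!}$. This is an illustration only; the names `sign` and `alt` are chosen here, and the sign is computed from the parity of the number of inversions, which is a standard equivalent of counting transpositions. Exact rational arithmetic is used so that the checks are not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices,
    computed from the parity of its number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def alt(T, k):
    """Alt(T) for a k-tensor T given as a function of k vectors."""
    def alt_T(*vs):
        total = sum(sign(p) * T(*(vs[i] for i in p))
                    for p in permutations(range(k)))
        return Fraction(total, factorial(k))
    return alt_T

# A 2-tensor on R^2 that is not alternating: T[a, b] = a_1 * b_2.
T = lambda a, b: a[0] * b[1]
A = alt(T, 2)

a, b = (1, 2), (3, 4)
value = A(a, b)          # (a_1 b_2 - b_1 a_2) / 2
swapped = A(b, a)        # sign changes when the arguments are swapped
again = alt(A, 2)(a, b)  # Alt applied to an alternating tensor returns it
```

The last line illustrates the role of the coefficient: without the division by $k!$, applying `alt` to `A` would return twice its value.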
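Theorem 18.19 Let $V$ be a vector space and let $k \in \mathbb{N}$. For all $T \in \mathcal{T}^k(V)$ we have
$\operatorname{Alt}(T) \in \Lambda^k(V)$ and for all $\omega \in \Lambda^k(V)$ we have $\operatorname{Alt}(\omega) = \omega$.
Proof. First note that a tensor $\omega \in \mathcal{T}^k(V)$ is alternating iff for all transpositions
$\tau \in S_k$ and all $v_1, \dots, v_k \in V$ we have $\omega[v_1, \dots, v_k] = -\omega[v_{\tau(1)}, \dots, v_{\tau(k)}]$. Moreover, note that for all transpositions $\tau \in S_k$ we have $\tau \circ \tau = \operatorname{id}_{\{1, \dots, k\}}$.
Now let $T \in \mathcal{T}^k(V)$ and let $v_1, \dots, v_k \in V$. Then for all transpositions $\tau \in S_k$ we
obtain the following.
$$\operatorname{Alt}(T)[v_{\tau(1)}, \dots, v_{\tau(k)}] = \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma)\, T[v_{\tau(\sigma(1))}, \dots, v_{\tau(\sigma(k))}] = -\frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\tau \circ \sigma)\, T[v_{\tau \circ \sigma(1)}, \dots, v_{\tau \circ \sigma(k)}] = -\operatorname{Alt}(T)[v_1, \dots, v_k].$$
Here the last step uses that $\tau \circ \sigma$ runs through all of $S_k$ as $\sigma$ does.
Therefore $\operatorname{Alt}(T)$ is alternating.
For the second part, note that if $\omega \in \Lambda^k(V)$ and $\sigma \in S_k$, then for all $v_1, \dots, v_k \in V$
we have $\omega[v_1, \dots, v_k] = \operatorname{sgn}(\sigma)\, \omega[v_{\sigma(1)}, \dots, v_{\sigma(k)}]$ (Exercise 18-25). Now
$$\operatorname{Alt}(\omega)[v_1, \dots, v_k] = \frac{1}{k!} \sum_{\sigma \in S_k} \operatorname{sgn}(\sigma)\, \omega[v_{\sigma(1)}, \dots, v_{\sigma(k)}] = \frac{1}{k!} \sum_{\sigma \in S_k} \omega[v_1, \dots, v_k] = \omega[v_1, \dots, v_k].$$
∎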
It is also easy to see that $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is linear (see Exercise 18-26).
With $\operatorname{Alt}(\cdot)$ we can now multiply alternating tensors in a way that produces a new
alternating tensor.
Definition 18.20 Let $V$ be a vector space. For $\omega \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$ define the
wedge product $\omega \wedge \eta$ to be $\omega \wedge \eta := \frac{(k + l)!}{k!\, l!} \operatorname{Alt}(\omega \otimes \eta)$.
Example 18.21 The wedge product is the key to obtaining volume functions. Let
$\{e_1, \dots, e_d\}$ be the standard base in $\mathbb{R}^d$ and let $\{\pi_1, \dots, \pi_d\}$ be the corresponding dual
base.
1. For all $a, b \in \mathbb{R}^d$, we have $\pi_1 \wedge \pi_2(a, b) = \pi_1(a)\pi_2(b) - \pi_1(b)\pi_2(a)$. That
is, the wedge product of $\pi_1$ and $\pi_2$ maps any two vectors $a, b \in \mathbb{R}^d$ to the
oriented area of the parallelogram spanned by the vectors made up of the first
two components of $a$ and $b$. More generally, the wedge product of $\pi_i$ and $\pi_j$
maps $a, b \in \mathbb{R}^d$ to the area of the parallelogram spanned by the vectors made up
of the $i$th and $j$th components of $a$ and $b$.
2. For all $a, b, c \in \mathbb{R}^d$, we have
$$(\pi_1 \wedge (\pi_2 \wedge \pi_3))(a, b, c) = a_1 b_2 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_1 b_3 c_2 - a_2 b_1 c_3 - a_3 b_2 c_1.$$
(The lengthy componentwise proof can be produced in Exercise 18-27.)
That is, the wedge product of $\pi_1$, $\pi_2$, and $\pi_3$ maps three vectors $a, b, c \in \mathbb{R}^d$
to the oriented volume of the parallelepiped spanned by the vectors made up of
the first three components of $a$, $b$ and $c$. Similarly, the wedge product of $\pi_i$, $\pi_j$
and $\pi_k$ maps any three vectors $a, b, c \in \mathbb{R}^d$ to the volume of the parallelepiped
spanned by the vectors made up of the $i$th, $j$th and $k$th components of $a$, $b$ and $c$.
The above indicates that the wedge product of $k$ vectors of the dual base of the
standard base $e_1, \dots, e_d$ should map any $k$-tuple of vectors to the lower dimensional
volume of the projection of the parallelepiped spanned by the vectors into the right
lower dimensional subspace (see Exercise 18-28). Moreover, part 2 indicates that the
wedge product of “projection volume functions” gives a higher dimensional volume
function. Therefore the wedge product formalism should be the version of the familiar
“base times height” formula for boxes that holds for parallelepipeds.
□
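The two computations in Example 18.21 can be reproduced with a small amount of code. The following Python sketch is an illustration only: tensors are modeled as functions of vectors, and the names `sign`, `alt`, `wedge` and `pi` are assumptions made here rather than notation from the text. It builds the wedge product from Definition 18.20 and evaluates $\pi_1 \wedge \pi_2$ and $\pi_1 \wedge (\pi_2 \wedge \pi_3)$ on sample vectors in $\mathbb{R}^3$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation (tuple of 0-based indices) via inversion parity."""
    n = len(perm)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def alt(T, k):
    """Alt(T) for a k-tensor T given as a function of k vectors."""
    def alt_T(*vs):
        total = sum(sign(p) * T(*(vs[i] for i in p))
                    for p in permutations(range(k)))
        return Fraction(total, factorial(k))
    return alt_T

def wedge(w, k, e, l):
    """Wedge product of a k-tensor w and an l-tensor e (Definition 18.20)."""
    coeff = Fraction(factorial(k + l), factorial(k) * factorial(l))
    alt_part = alt(lambda *vs: w(*vs[:k]) * e(*vs[k:]), k + l)
    return lambda *vs: coeff * alt_part(*vs)

def pi(i):
    """i-th element of the dual base of the standard base (1-based index)."""
    return lambda v: v[i - 1]

a, b, c = (1, 2, 3), (0, 1, 4), (5, 6, 0)
area_12 = wedge(pi(1), 1, pi(2), 1)(a, b)
vol_123 = wedge(pi(1), 1, wedge(pi(2), 1, pi(3), 1), 2)(a, b, c)

# Six-term expansion of the oriented volume for comparison.
expansion = (a[0] * b[1] * c[2] + a[1] * b[2] * c[0] + a[2] * b[0] * c[1]
             - a[0] * b[2] * c[1] - a[1] * b[0] * c[2] - a[2] * b[1] * c[0])
```

For these vectors `vol_123` agrees with `expansion`, and swapping two of the arguments flips the sign of the result.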
With the wedge product accepted as the right formalism to compute volumes of
parallelepipeds we need to investigate its properties. Bilinearity of the wedge product
is established in Exercise 18-29. Moreover, the wedge product allows a base representation of alternating tensors on finite dimensional spaces similar to Theorem 18.16. For
this base representation, we first need to establish associativity and the behavior when
two factors are transposed. The first two parts of the following lemma are motivated
by the proof of the wedge product’s associativity.
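Lemma 18.22 Let $V$ be a vector space.
1. If $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $\operatorname{Alt}(S) = 0$, then
$\operatorname{Alt}(S \otimes T) = \operatorname{Alt}(T \otimes S) = 0$.
2. If $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $U \in \mathcal{T}^m(V)$, then
$\operatorname{Alt}(\operatorname{Alt}(S \otimes T) \otimes U) = \operatorname{Alt}(S \otimes T \otimes U) = \operatorname{Alt}(S \otimes \operatorname{Alt}(T \otimes U))$.
3. If $\omega \in \Lambda^k(V)$, $\eta \in \Lambda^l(V)$ and $\theta \in \Lambda^m(V)$, then
$$(\omega \wedge \eta) \wedge \theta = \frac{(k + l + m)!}{k!\, l!\, m!} \operatorname{Alt}(\omega \otimes \eta \otimes \theta) = \omega \wedge (\eta \wedge \theta).$$
In particular this means that the wedge product is associative.
4. The wedge product is “anti-commutative.” If $\omega \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$, then
$\omega \wedge \eta = (-1)^{kl}\, \eta \wedge \omega$.
5. If $k$ is odd and $\varphi \in \Lambda^k(V)$, then $\varphi \wedge \varphi = 0$.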
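Proof. For part 1, we simply compute $\operatorname{Alt}(S \otimes T)$ and represent the permutations
$\sigma \in S_{k+l}$ as the composition of three permutations. The first sorts the elements of
$\{1, \dots, k + l\}$ into a set of size $k$ and a set of size $l$, so that in each set the elements
are listed in increasing order and the sets are listed as two consecutive blocks. The
remaining two permutations then act on these sets. Grouping the summands of
$\operatorname{Alt}(S \otimes T)[v_1, \dots, v_{k+l}]$ accordingly, each group contains a factor of the form
$\operatorname{Alt}(S)[\cdots] = 0$, so that $\operatorname{Alt}(S \otimes T)[v_1, \dots, v_{k+l}] = 0$.
Part 2 is a straightforward application of the linearity of $\operatorname{Alt}(\cdot)$, the multilinearity
of tensor products and part 1.
$$\operatorname{Alt}(\operatorname{Alt}(S \otimes T) \otimes U) - \operatorname{Alt}(S \otimes T \otimes U) = \operatorname{Alt}(\operatorname{Alt}(S \otimes T) \otimes U - (S \otimes T) \otimes U) = \operatorname{Alt}([\operatorname{Alt}(S \otimes T) - (S \otimes T)] \otimes U) = 0,$$
where part 1 applies in the last step because $\operatorname{Alt}(\operatorname{Alt}(S \otimes T) - (S \otimes T)) = 0$ by Theorem 18.19,
and the other equality is proved similarly. Part 3 now follows from part 2.
$$(\omega \wedge \eta) \wedge \theta = \left[\frac{(k + l)!}{k!\, l!} \operatorname{Alt}(\omega \otimes \eta)\right] \wedge \theta = \frac{(k + l + m)!}{(k + l)!\, m!} \operatorname{Alt}\left(\frac{(k + l)!}{k!\, l!} \operatorname{Alt}(\omega \otimes \eta) \otimes \theta\right) = \frac{(k + l + m)!}{k!\, l!\, m!} \operatorname{Alt}(\omega \otimes \eta \otimes \theta),$$
and the other equality is proved similarly.
For part 4, represent $\omega \wedge \eta$ and $\eta \wedge \omega$ similar to the representation for part 1. Then
note that the permutation that transposes the “blocks” $\{1, \dots, k\}$ and
$\{k + 1, \dots, k + l\}$ can be represented as a composition of $kl$ transpositions by going
left to right in $\{k + 1, \dots, k + l\}$, transposing each element $k$ times with its current
predecessor. This establishes the claimed equality.
Part 5 immediately follows from part 4. ∎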
With associativity of the wedge product established, we no longer need to include
parentheses in the wedge multiplication of three or more alternating tensors.
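Theorem 18.23 Let $V$ be a finite dimensional vector space with base $\{v_1, \dots, v_d\}$,
and let $\{\varphi_1, \dots, \varphi_d\}$ be the associated dual base as in Theorem 18.15. Then the set
$\mathcal{A} := \{\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} : 1 \le i_1 < \cdots < i_k \le d\}$ is a base for $\Lambda^k(V)$.
Proof. By Theorem 18.16, every $\varphi \in \mathcal{T}^k(V)$ is a linear combination of tensor
products $\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k}$. The function $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is surjective by Theorem 18.19 and by Exercise 18-26 it is linear. Therefore every $\omega \in \Lambda^k(V)$ is a linear
combination of tensors $\operatorname{Alt}(\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k})$. By associativity of the wedge product,
$\operatorname{Alt}(\varphi_{i_1} \otimes \cdots \otimes \varphi_{i_k})$ is a multiple of $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$. If any two indices $i_j$ are equal, then
by part 5 of Lemma 18.22 $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} = 0$. Moreover, for any permutation $\sigma \in S_k$ by
part 4 of Lemma 18.22 we infer that $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} \in \{\pm \varphi_{\sigma(i_1)} \wedge \cdots \wedge \varphi_{\sigma(i_k)}\}$. This
means that every $\omega \in \Lambda^k(V)$ is a linear combination of tensors $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$ with
$i_1 < i_2 < \cdots < i_k$.
To see that these tensors are linearly independent, let the numbers $a_{i_1 \dots i_k}$
be so that $\sum_{1 \le i_1 < \cdots < i_k \le d} a_{i_1 \dots i_k}\, \varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k} = 0$. For fixed $1 \le j_1 < \cdots < j_k \le d$ note that
$$0 = \sum_{1 \le i_1 < \cdots < i_k \le d} a_{i_1 \dots i_k}\, \varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}[v_{j_1}, \dots, v_{j_k}] = a_{j_1 \dots j_k},$$
which proves that the tensors $\varphi_{i_1} \wedge \cdots \wedge \varphi_{i_k}$ with $i_1 < i_2 < \cdots < i_k$ are linearly
independent. ∎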
In particular, Theorem 18.23 shows that $\Lambda^d(V)$ is one dimensional. Therefore all
alternating d-tensors on d-dimensional space are multiples of one tensor. Geometrically this means that on d-dimensional space there is only one volume function for
d-dimensional parallelepipeds, which is the only sensible possibility.
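The base in Theorem 18.23 is indexed by the strictly increasing index tuples $1 \le i_1 < \cdots < i_k \le d$, so its size is the binomial coefficient $\binom{d}{k}$. The short Python sketch below, with the hypothetical helper name `wedge_basis_indices`, lists these tuples and confirms on examples that there is exactly one of them when $k = d$.

```python
from itertools import combinations
from math import comb

def wedge_basis_indices(d, k):
    """All index tuples (i_1 < ... < i_k) with entries in {1, ..., d}; each
    tuple labels one base element phi_{i_1} ^ ... ^ phi_{i_k}."""
    return list(combinations(range(1, d + 1), k))

indices_4_2 = wedge_basis_indices(4, 2)
indices_3_3 = wedge_basis_indices(3, 3)
```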
Definition 18.24 The unique alternating $d$-tensor $\omega$ on $\mathbb{R}^d$ with $\omega(e_1, \dots, e_d) = 1$ is
called the $d$-dimensional determinant. It is denoted $\det(v_1, \dots, v_d) := \omega(v_1, \dots, v_d)$.
If $A = (a_{ij})_{i,j = 1, \dots, d}$ is a $d \times d$-matrix, we define the determinant of the matrix as
$$\det(A) := \det\left( \begin{pmatrix} a_{11} \\ \vdots \\ a_{d1} \end{pmatrix}, \dots, \begin{pmatrix} a_{1d} \\ \vdots \\ a_{dd} \end{pmatrix} \right).$$
The connection between determinants and volumes, which retroactively validates
our geometric motivation, is made in Exercise 18-48. To be able to prove the results
needed to get there we conclude this section with a few key properties of determinants.
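Proposition 18.25 Let $V$ be a vector space with base $\{v_1, \dots, v_d\}$ and let $\omega \in \Lambda^d(V)$.
If $w_j = \sum_{i=1}^d a_{ij} v_i$, then $\omega(w_1, \dots, w_d) = \det\left((a_{ij})_{i,j = 1, \dots, d}\right) \omega(v_1, \dots, v_d)$.
Proof. Because $\{v_1, \dots, v_d\}$ is linearly independent, $\omega(v_1, \dots, v_d) \neq 0$ (see Exercise 18-30) and so for $A = (a_{ij})_{i,j = 1, \dots, d}$ we can define
$$\eta(A) := \frac{\omega\left(\sum_{i=1}^d a_{i1} v_i, \dots, \sum_{i=1}^d a_{id} v_i\right)}{\omega(v_1, \dots, v_d)}.$$
It is easy to see that $\eta$ is $d$-linear in the columns of $A$, that the transposition of
any two columns of $A$ changes the sign of $\eta(A)$ and that $\eta(I) = 1$ when $I$ is the
identity matrix. This means that if we represent each vector $w = \sum_{i=1}^d w_i e_i$ in $\mathbb{R}^d$ with
the column vector $\begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix}$, then $\eta \in \Lambda^d(\mathbb{R}^d)$ and $\eta(e_1, \dots, e_d) = 1$. But this means
that $\eta$ is the determinant on $\mathbb{R}^d$, that is,
$$\frac{\omega\left(\sum_{i=1}^d a_{i1} v_i, \dots, \sum_{i=1}^d a_{id} v_i\right)}{\omega(v_1, \dots, v_d)} = \det(A).$$
The result now follows from applying the above to vectors $w_j = \sum_{i=1}^d a_{ij} v_i$. ∎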
;=I
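Proposition 18.26 Let $V$ be a $d$-dimensional vector space and let $\omega \in \Lambda^d(V)$. If
$\{w_1, \dots, w_d\}$ is not linearly independent, then $\omega(w_1, \dots, w_d) = 0$.
Proof. Without loss of generality assume that there are $a_1, \dots, a_{d-1} \in \mathbb{R}$ so that
$w_d = \sum_{i=1}^{d-1} a_i w_i$. Then
$$\omega(w_1, \dots, w_d) = \sum_{i=1}^{d-1} a_i\, \omega(w_1, \dots, w_{d-1}, w_i) = 0,$$
because an alternating tensor is zero whenever two of its arguments are equal. ∎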
Corollary 18.27 Let $A$, $B$ be real $d \times d$-matrices. Then $\det(AB) = \det(A) \det(B)$.
Proof. If the columns of $A$ are linearly independent, we prove the result similar
to the proof of Proposition 18.25. The function $\eta(B) := \frac{\det(AB)}{\det(A)}$ is linear in each
column of $B$, transposing two columns of $B$ transposes two columns of $AB$ and thus it
changes the sign of $\eta$, and $\eta(I) = 1$, where $I$ is the identity matrix. Hence, we infer
$\eta(B) = \det(B)$ in this case. If the columns of $A$ are not linearly independent, then both
sides of the equation are zero. ∎
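Corollary 18.27 can be checked on concrete matrices. The sketch below is an illustration only: it computes determinants with the permutation-sum (Leibniz) formula, which is a standard consequence of multilinearity and the alternating property rather than the definition used in the text, and the helper names are chosen here. Integer entries keep the comparison exact.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation (tuple of 0-based indices) via inversion parity."""
    n = len(perm)
    inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det(A):
    """Determinant of a square matrix (list of rows) via the permutation sum
    sum over sigma of sgn(sigma) * a[sigma(1)][1] * ... * a[sigma(d)][d]."""
    d = len(A)
    total = 0
    for p in permutations(range(d)):
        term = sign(p)
        for j in range(d):
            term *= A[p[j]][j]
        total += term
    return total

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    d = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(d)) for j in range(d)]
            for i in range(d)]

A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]
lhs = det(matmul(A, B))
rhs = det(A) * det(B)
```

The permutation sum has $d!$ terms, so this is only practical for small $d$.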
When arguing in general, we usually work with linear functions, not matrices.
Therefore, it is sensible to define the determinant of a linear function.
Definition 18.28 Let $L : \mathbb{R}^d \to \mathbb{R}^d$ be a linear function and let $A$ be its matrix representation with respect to the standard base. Then we define the determinant of $L$ as
det(L) := det(A).
Exercise 18-31 shows that the determinant of a linear function L does not depend
on the specific base chosen, as long as we use the same base to represent arguments and
images. Finally, because Exercise 18-48 shows that the determinant defines volumes in
$\mathbb{R}^d$, we want to define volumes on more general spaces. As long as the space carries an
inner product, this is sensible. With an inner product we can define orthonormal bases
and cubes whose sides are defined by the vectors of an orthonormal base should have
volume 1. To make a volume definition with an alternating d-tensor possible, we must
assure that these tensors are constant on the orthonormal bases. The next result shows
that this is almost the case.
Proposition 18.29 Let $V$ be a $d$-dimensional inner product space and let $\omega \in \Lambda^d(V)$.
Then for any two orthonormal bases $\{v_1, \dots, v_d\}$ and $\{w_1, \dots, w_d\}$ of $V$ the equation
$|\omega(v_1, \dots, v_d)| = |\omega(w_1, \dots, w_d)|$ holds.
Proof. For $i, j = 1, \dots, d$ let $a_{ij} \in \mathbb{R}$ be so that $w_j = \sum_{i=1}^d a_{ij} v_i$. Then for all
$j, k \in \{1, \dots, d\}$ we obtain
$$\delta_{jk} = \langle w_j, w_k \rangle = \left\langle \sum_{i=1}^d a_{ij} v_i, \sum_{m=1}^d a_{mk} v_m \right\rangle = \sum_{i=1}^d a_{ij} a_{ik},$$
where $\delta_{jk}$ is 1 for $j = k$ and 0 otherwise. Hence, if $A$
is the matrix with entries $a_{ij}$, then $A^T A$, and therefore also the product $A A^T$ of $A$ and its transpose, is the identity
$I$. By Corollary 18.27 and Exercise 18-34 we infer that the determinant of $A$ is 1
or $-1$ because $(\det(A))^2 = \det(A) \det\left(A^T\right) = \det\left(A A^T\right) = \det(I) = 1$. Therefore
by Proposition 18.25 we conclude
$$|\omega(w_1, \dots, w_d)| = |\det(A)\, \omega(v_1, \dots, v_d)| = |\omega(v_1, \dots, v_d)|.$$
∎
With $d$-forms evaluating to the same absolute value when applied to orthonormal
bases, it makes sense to pick one of the two $d$-forms that assign values of $\pm 1$ to orthonormal bases and to designate it as the volume element. We will see in Section 19.5
how to choose one of these two to obtain a volume element for very general surfaces.
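The key step in the proof of Proposition 18.29, namely that a change between orthonormal bases has determinant $1$ or $-1$, can be illustrated with exact arithmetic. This is a sketch under stated assumptions: the matrices `rotation` and `swapped` (built from the rational point $(3/5, 4/5)$ on the unit circle) and all helper names are hypothetical examples, and `det` is computed with the summation formula from Exercise 18-32.

```python
from fractions import Fraction
from itertools import permutations


def det(m):
    """Summation formula over all permutations; exact for Fraction or int entries."""
    d = len(m)
    total = Fraction(0)
    for p in permutations(range(d)):
        inversions = sum(1 for i in range(d) for j in range(i + 1, d) if p[i] > p[j])
        term = Fraction(-1 if inversions % 2 else 1)
        for i in range(d):
            term *= m[i][p[i]]
        total += term
    return total


def transpose(m):
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    d = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)] for i in range(d)]


def is_identity(m):
    d = len(m)
    return all(m[i][j] == (1 if i == j else 0) for i in range(d) for j in range(d))


c, s = Fraction(3, 5), Fraction(4, 5)            # c**2 + s**2 == 1
# Columns form an orthonormal basis of R^3 (a rotation in the first two coordinates).
rotation = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
# Swapping the first two basis vectors gives an orthonormal basis of opposite orientation.
swapped = [[-s, c, 0], [c, s, 0], [0, 0, 1]]

for name, a in (("rotation", rotation), ("swapped", swapped)):
    print(name, is_identity(matmul(a, transpose(a))), det(a))
```

Both matrices satisfy $A A^T = I$, and their determinants differ only in sign, which is why an alternating $d$-tensor has the same absolute value on both bases.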
Exercises
18-20. Let $V$ be a vector space. Prove that $\mathcal{T}^k(V)$ with pointwise addition and scalar multiplication is a vector space. That is, prove that
(a) If $\omega, \eta \in \mathcal{T}^k(V)$, then $\nu(v_1, \ldots, v_k) := \omega(v_1, \ldots, v_k) + \eta(v_1, \ldots, v_k)$ is a $k$-tensor on $V$.
(b) If $\omega \in \mathcal{T}^k(V)$ and $c \in \mathbb{R}$, then $\nu(v_1, \ldots, v_k) := c\,\omega(v_1, \ldots, v_k)$ is a $k$-tensor on $V$.
18-21. Properties of the tensor product. Let $V$ be a vector space.
(a) Prove that for all tensors $S \in \mathcal{T}^k(V)$, $T \in \mathcal{T}^l(V)$ and $U \in \mathcal{T}^m(V)$ we have $(S \otimes T) \otimes U = S \otimes (T \otimes U)$.
(b) Prove that there are tensors $S$ and $T$ on $\mathbb{R}^2$ so that $S \otimes T \neq T \otimes S$.
18-22. Prove Theorem 18.16.
18-23. Let $V$ be a vector space and let $k \in \mathbb{N}$. Prove that $\Lambda^k(V)$ is a vector space.
18-24. Let $k \in \mathbb{N}$. Proceed as follows to prove that the sign of a permutation is well-defined.
(a) Prove that every $\sigma \in S_k$ can be represented as a composition of transpositions.
Hint. Induction on $k$. In the induction step, transpose $\sigma(k)$ back to $k$.
(b) Prove that if $\sigma$ can be represented as the composition of $n$ transpositions and as the composition of $m > n$ transpositions, then $m - n$ is even.
Hint. Again, induction on $k$.
(c) Prove that the function $\operatorname{sgn} : S_k \to \{1, -1\}$ is well-defined.
18-25. Let $V$ be a vector space. Prove that if $\omega \in \Lambda^k(V)$ and $\sigma \in S_k$, then for all $v_1, \ldots, v_k \in V$ we have
$\omega(v_1, \ldots, v_k) = \operatorname{sgn}(\sigma)\,\omega(v_{\sigma(1)}, \ldots, v_{\sigma(k)})$.
Hint. Induction on the number of transpositions that need to be composed to get $\sigma$.
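18-26. Prove that $\operatorname{Alt} : \mathcal{T}^k(V) \to \Lambda^k(V)$ is linear, that is, for all tensors $S, T \in \mathcal{T}^k(V)$ and all scalars $\alpha \in \mathbb{R}$ we have $\operatorname{Alt}(S + T) = \operatorname{Alt}(S) + \operatorname{Alt}(T)$ and $\operatorname{Alt}(\alpha T) = \alpha \operatorname{Alt}(T)$.
18-27. Prove the formula in part 2 of Example 18.21.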
18-28. The wedge product $\pi_{i_1} \wedge \cdots \wedge \pi_{i_k}$ gives the $k$-dimensional volume of the projection of a parallelepiped into $\operatorname{span}\{e_{i_1}, \ldots, e_{i_k}\}$.
(a) Let $a_1, \ldots, a_k \in \mathbb{R}^d$ be zero in all but the first $k$ components. Prove that the wedge product $\pi_1 \wedge \cdots \wedge \pi_k (a_1, \ldots, a_k)$ gives the $k$-dimensional volume of the parallelepiped spanned by $a_1, \ldots, a_k$.
Hint. Prove that $\pi_1 \wedge \cdots \wedge \pi_k$, restricted to $\{x \in \mathbb{R}^d : x_{k+1} = \cdots = x_d = 0\}^k$ and composed with the isomorphism between $(\mathbb{R}^k)^k$ and $\{x \in \mathbb{R}^d : x_{k+1} = \cdots = x_d = 0\}^k$, is equal to the $k$-dimensional determinant and use that the $k$-dimensional determinant is the volume function for parallelepipeds spanned by $k$ vectors in $k$-dimensional space. (This fact will be established in Exercise 18-48 without using this exercise.)
(b) Let $a_1, \ldots, a_k \in \mathbb{R}^d$. Prove that the wedge product $\pi_{i_1} \wedge \cdots \wedge \pi_{i_k} (a_1, \ldots, a_k)$ gives the $k$-dimensional volume of the projection of the parallelepiped spanned by $a_1, \ldots, a_k$ into $\operatorname{span}(e_{i_1}, \ldots, e_{i_k})$.
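18-29. Let $V$ be a vector space. Prove that the wedge product is bilinear, that is, prove the following.
(a) Prove that if $\omega_1, \omega_2 \in \Lambda^k(V)$ and $\eta \in \Lambda^l(V)$, then $(\omega_1 + \omega_2) \wedge \eta = \omega_1 \wedge \eta + \omega_2 \wedge \eta$.
(b) Prove that if $\omega \in \Lambda^k(V)$ and $\eta_1, \eta_2 \in \Lambda^l(V)$, then $\omega \wedge (\eta_1 + \eta_2) = \omega \wedge \eta_1 + \omega \wedge \eta_2$.
(c) Prove that if $\omega \in \Lambda^k(V)$, $\eta \in \Lambda^l(V)$ and $c \in \mathbb{R}$, then $(c\omega) \wedge \eta = c(\omega \wedge \eta) = \omega \wedge (c\eta)$.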
18-30. Let $V$ be a $d$-dimensional vector space, let the set $\{v_1, \ldots, v_d\} \subseteq V$ be linearly independent and let $\omega \in \Lambda^d(V) \setminus \{0\}$. Prove that $\omega(v_1, \ldots, v_d) \neq 0$.
Hint. It suffices to prove the result for $\mathbb{R}^d$. Suppose for a contradiction that $\omega(v_1, \ldots, v_d) = 0$ and use linear combinations to derive that $\omega(e_1, \ldots, e_d) = 0$.
18-31. The determinant of a linear function is independent of the base with respect to which it is represented. Let $L : \mathbb{R}^d \to \mathbb{R}^d$ be a linear function and let $B$ be its matrix representation with respect to the base $\{v_1, \ldots, v_d\}$. Prove that $\det(B) = \det(L)$.
18-32. Summation formula for the determinant. Let $d \in \mathbb{N}$. Prove that the determinant of any real $d \times d$ matrix $A = (a_{ij})_{i,j = 1, \ldots, d}$ equals
$$\det(A) = \sum_{\sigma \in S_d} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{d\sigma(d)}.$$
Hint. Prove that the determinant is the image under $\operatorname{Alt}(\cdot)$ of the tensor that multiplies $d!$ with the diagonal entries.
18-33. Row Expansion of the Determinant. Let $A \in M(d \times d, \mathbb{R})$. We claim that for any number $i \in \{1, \ldots, d\}$ we have
$$\det(A) = \sum_{j=1}^d (-1)^{i+j} a_{ij} \det(A_{ij}),$$
where $A_{ij} \in M(d-1 \times d-1, \mathbb{R})$ is obtained from $A$ by erasing the $i$th row and the $j$th column.
(a) Prove the claim using Exercise 18-32.
(b) Prove the claim by proving that the formula defines an alternating $d$-tensor that is equal to 1 for the identity matrix.
18-34. The transpose of a matrix $A = (a_{ij})_{i,j = 1, \ldots, d}$ is $A^T := (a_{ji})_{i,j = 1, \ldots, d}$. Prove that for all real $d \times d$ matrices $A$ we have $\det(A) = \det(A^T)$.
Hint. Exercise 18-32.
18-35. Determinants of certain linear functions.
(a) Let $c_1, \ldots, c_d \in \mathbb{R}$ and let $D : \mathbb{R}^d \to \mathbb{R}^d$ be the diagonal operator $D(x_1, \ldots, x_d) := (c_1 x_1, \ldots, c_d x_d)$. Prove that $\det(D) = c_1 \cdots c_d$.
(b) Let $a \in \mathbb{R}$, let $i, j \in \{1, \ldots, d\}$ be distinct and let $A : \mathbb{R}^d \to \mathbb{R}^d$ be the row addition operator $A(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_i + a x_j, x_{i+1}, \ldots, x_d)$. Prove that $\det(A) = 1$.
(c) Let $i, j \in \{1, \ldots, d\}$ with $i < j$ and let $T : \mathbb{R}^d \to \mathbb{R}^d$ be the row transposition operator $T(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_d)$. Prove that $\det(T) = -1$.
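The formulas in Exercises 18-32 through 18-35 can be compared numerically before attempting the proofs. The sketch below is only an illustration and proves nothing: the function names, the test matrix `M`, and the three elementary matrices are hypothetical, and indices are 0-based, so the sign $(-1)^{i+j}$ is unchanged from the 1-based formula.

```python
from itertools import permutations


def det_summation(m):
    """Summation formula: sum over permutations of sgn(sigma) * product of entries."""
    d = len(m)
    total = 0
    for p in permutations(range(d)):
        inversions = sum(1 for i in range(d) for j in range(i + 1, d) if p[i] > p[j])
        term = -1 if inversions % 2 else 1
        for i in range(d):
            term *= m[i][p[i]]
        total += term
    return total


def minor(m, i, j):
    """Matrix obtained from m by erasing row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det_row_expansion(m, i=0):
    """Expansion along row i; the minors are expanded along their first row."""
    d = len(m)
    if d == 1:
        return m[0][0]
    return sum((-1) ** (i + j) * m[i][j] * det_row_expansion(minor(m, i, j)) for j in range(d))


def transpose(m):
    return [list(column) for column in zip(*m)]


M = [[2, 0, 1, 3], [1, 4, 0, 2], [0, 1, 5, 1], [3, 2, 1, 0]]
diagonal = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]           # diagonal operator
row_addition = [[1, 0, 7], [0, 1, 0], [0, 0, 1]]       # adds 7 * x_3 to x_1
row_transposition = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]  # swaps x_1 and x_3

print([det_row_expansion(M, i) == det_summation(M) for i in range(4)])
print(det_summation(transpose(M)) == det_summation(M))
print(det_summation(diagonal), det_summation(row_addition), det_summation(row_transposition))
```

The summation formula costs on the order of $d!$ operations, so it is suitable only for small matrices like these.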
18.4 Multidimensional Substitution
The d-dimensional substitution formula (see Theorem 18.37) is a surprisingly deep
result. Its proof draws on topology, differential calculus and measure theory as well as
on linear algebra. However, the underlying idea is fairly simple. Consider the image
of a small cube under a differentiable function g. If the cube is sufficiently small, then
on the cube the function g is approximately equal to its derivative L . The image of the
cube under L is a parallelepiped. This means that the g-images of our most elementary
volume elements (cubes) are approximately parallelepipeds.
It can be shown geometrically that the absolute value of the determinant of a 3 x 3
matrix is the volume of the parallelepiped spanned by the column vectors. So, in three
dimensions, the volume of the image of the cube spanned by the standard unit vectors
under a linear function L is the absolute value of the determinant of L . Because translations do not affect the volume (see Lemma 18.31) and linear factors in the columns can
be factored out of the determinant, we expect that the absolute value of the determinant
of a linear function L is exactly the factor by which the volume of a cube is multiplied
to obtain the volume of its image under L. Lemma 18.35 shows that this is the case for
arbitrary sets in arbitrary finite dimensions.
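In two dimensions the heuristic above can be checked directly. The following sketch assumes nothing beyond the shoelace formula for polygon areas; the matrices `L` and `R`, the helper names, and the sample squares are hypothetical choices. It computes the area of the image of a square under a linear map and compares it with $|\det| \cdot \text{area}$, including a translated and rescaled square.

```python
from fractions import Fraction


def apply(matrix, point):
    """Apply a 2 x 2 matrix, given as ((a, b), (c, d)), to a point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def polygon_area(vertices):
    """Shoelace formula: unsigned area of a simple polygon."""
    twice_signed = sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1])
    )
    return abs(twice_signed) / 2


def image_area_of_square(matrix, corner, side):
    """Area of the image of the square with the given lower left corner and side length."""
    x, y = corner
    square = [(x, y), (x + side, y), (x + side, y + side), (x, y + side)]
    return polygon_area([apply(matrix, p) for p in square])


def det2(matrix):
    (a, b), (c, d) = matrix
    return a * d - b * c


L = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(3)))  # det = 5
R = ((Fraction(0), Fraction(1)), (Fraction(1), Fraction(0)))  # det = -1, reverses orientation

print(image_area_of_square(L, (Fraction(0), Fraction(0)), Fraction(1)), abs(det2(L)))
print(image_area_of_square(L, (Fraction(7), Fraction(-2)), Fraction(1, 2)))
print(image_area_of_square(R, (Fraction(0), Fraction(0)), Fraction(1)), abs(det2(R)))
```

The map `R` has negative determinant, which is why the absolute value appears in the volume factor.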
Lemma 18.36 then shows that for continuously differentiable functions the absolute
value of the determinant of the derivative is the local volume distortion factor and
Theorem 18.37 pulls everything together to establish the Multidimensional Substitution
Formula. Throughout this section we work with the uniform norm $\|\cdot\|_\infty$ on $\mathbb{R}^d$ because
the “balls” in this norm are cubes.
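The idea of a local volume distortion factor can be made tangible numerically. The sketch below is an assumption-laden illustration, not the method of Lemma 18.36: it uses the hypothetical example map $g(x, y) = (x^2 - y^2, 2xy)$, which is injective on the half plane $x > 0$, approximates the area of $g[B]$ for a small square $B$ by a fine polygon through the image of the boundary, and divides by $|\det(Dg(x))| \lambda(B)$. The ratio should approach 1 as the radius shrinks.

```python
def g(x, y):
    """Hypothetical example map (complex squaring in real coordinates)."""
    return (x * x - y * y, 2.0 * x * y)


def abs_det_dg(x, y):
    """|det Dg(x, y)| for the map above, where Dg = [[2x, -2y], [2y, 2x]]."""
    return 4.0 * (x * x + y * y)


def polygon_area(vertices):
    """Shoelace formula: unsigned area of a simple polygon."""
    twice_signed = sum(
        x0 * y1 - x1 * y0
        for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1])
    )
    return abs(twice_signed) / 2.0


def image_area(center, radius, points_per_side=400):
    """Approximate area of g[B], B the square with this center and (uniform norm) radius."""
    cx, cy = center
    corners = [
        (cx - radius, cy - radius), (cx + radius, cy - radius),
        (cx + radius, cy + radius), (cx - radius, cy + radius),
    ]
    boundary = []
    for (x0, y0), (x1, y1) in zip(corners, corners[1:] + corners[:1]):
        for k in range(points_per_side):
            t = k / points_per_side
            boundary.append(g(x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return polygon_area(boundary)


def distortion_ratio(center, radius):
    """Area of g[B] divided by |det Dg(center)| times the area of B."""
    area_of_square = (2.0 * radius) ** 2
    return image_area(center, radius) / (abs_det_dg(*center) * area_of_square)


for radius in (0.1, 0.01):
    print(radius, distortion_ratio((1.0, 0.5), radius))
```

For this particular map the ratio can also be computed by hand as $1 + \frac{2h^2}{3(x_0^2 + y_0^2)}$ for a square of radius $h$ centered at $(x_0, y_0)$, which matches the trend printed by the loop.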
Lemma 18.30 Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^d$ be open subsets of $\mathbb{R}^d$ and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable function. If $S \subseteq \Omega_1$ is Lebesgue measurable, then $g[S]$ is Lebesgue measurable, too.
Proof. We first prove that $g$ maps null sets to null sets. Let $N \subseteq \Omega_1$ be a null set. For $n \in \mathbb{N}$, let $K_n := \left\{ x \in \Omega_1 : \operatorname{dist}(x, \mathbb{R}^d \setminus \Omega_1) \ge \frac{1}{n} \text{ and } \|x\|_\infty \le n \right\}$. Then all $K_n$ are closed and bounded and $\Omega_1 = \bigcup_{n=1}^\infty K_n$. It is enough to prove that each $g[N \cap K_n]$ is a null set. Because $K_{2n}$ is compact and $Dg$ is continuous, we can find an upper bound $L > 0$ for $\|Dg\|$ on $K_{2n}$, with all distances measured with the $\|\cdot\|_\infty$-norm on $\mathbb{R}^d$. Let $\varepsilon > 0$ and let $\{\widetilde{B}_j\}_{j=1}^\infty$ be a family of open boxes such that $N \cap K_n \subseteq \bigcup_{j=1}^\infty \widetilde{B}_j$ and $\sum_{j=1}^\infty |\widetilde{B}_j| < \frac{\varepsilon}{2L^d}$. From these boxes, construct a family of open cubes $\{B_j\}_{j=1}^\infty$ so that $N \cap K_n \subseteq \bigcup_{j=1}^\infty B_j$, $\sum_{j=1}^\infty |B_j| < \frac{\varepsilon}{L^d}$ and the diameter of each $B_j$ is at most $\frac{1}{2n}$ (see Exercise 18-36a). Because we can discard any cubes that do not intersect $K_n$ we can assume that all $B_j$ intersect $K_n$. Then $\bigcup_{j=1}^\infty B_j \subseteq K_{2n}$. By Theorem 17.33, this means that $g$ is Lipschitz continuous on each $B_j$ and the Lipschitz constant is $L$ when distances are measured with the $\|\cdot\|_\infty$-norm on $\mathbb{R}^d$. Hence, each $g[B_j]$ is contained in a cube $C_j$ with $\lambda(C_j) \le L^d \lambda(B_j)$ (see Exercise 18-36b). Therefore, $g[N \cap K_n] \subseteq \bigcup_{j=1}^\infty C_j$ and
$$\lambda(g[N \cap K_n]) \le \sum_{j=1}^\infty \lambda(C_j) \le \sum_{j=1}^\infty L^d \lambda(B_j) < L^d \frac{\varepsilon}{L^d} = \varepsilon.$$
Because $\varepsilon$ was arbitrary, we obtain $\lambda(g[N \cap K_n]) = 0$ and we have proved that $g$ maps null sets to null sets.
Now let $S \subseteq \Omega_1$ be Lebesgue measurable. Then by Theorem 18.4 there is a sequence $\{F_i\}_{i=1}^\infty$ of closed sets and a sequence $\{G_i\}_{i=1}^\infty$ of open sets so that for all $i \in \mathbb{N}$ we have $F_i \subseteq F_{i+1}$ and $G_i \supseteq G_{i+1}$ and so that with $F := \bigcup_{i=1}^\infty F_i$ and $G := \bigcap_{i=1}^\infty G_i$ we have $F \subseteq S \subseteq G \subseteq \Omega_1$ and $\lambda(G \setminus F) = 0$. Clearly, $g[F] \subseteq g[S] \subseteq g[G] \subseteq \Omega_2$ and by the above $\lambda(g[G] \setminus g[F]) \le \lambda(g[G \setminus F]) = 0$.
For $i \in \mathbb{N}$, set $\widetilde{F}_i := F_i \cap K_i$. Then each $\widetilde{F}_i$ is closed and bounded, and hence compact, and
$$\bigcup_{i=1}^\infty \widetilde{F}_i = \bigcup_{i=1}^\infty (F_i \cap K_i) = \bigcup_{i=1}^\infty F_i \cap \bigcup_{i=1}^\infty K_i = \bigcup_{i=1}^\infty F_i \cap \Omega_1 = \bigcup_{i=1}^\infty F_i.$$
By Theorem 16.62, all $g[\widetilde{F}_i]$ are compact, and hence closed, so $g[F]$ is the union of a sequence of closed sets $g[\widetilde{F}_i]$ with $g[\widetilde{F}_i] \subseteq g[\widetilde{F}_{i+1}]$ for all $i \in \mathbb{N}$. This means that $g[F]$ is Lebesgue measurable. Because $\lambda(g[S] \setminus g[F]) \le \lambda(g[G] \setminus g[F]) = 0$ and null sets are Lebesgue measurable, this means that $g[S] \setminus g[F]$ is Lebesgue measurable, and hence $g[S]$ is Lebesgue measurable.
Exercise 18-54 shows that continuity alone is not enough to assure that the images of null sets are null sets. Unsurprisingly, translations do not affect the Lebesgue
measure of a set, as the next result shows.
Lemma 18.31 The effect of a translation on Lebesgue measure. For $S \subseteq \mathbb{R}^d$ and $x \in \mathbb{R}^d$, define $x + S := \{x + s \in \mathbb{R}^d : s \in S\}$. If $S$ is Lebesgue measurable, then $x + S$ is Lebesgue measurable and $\lambda(x + S) = \lambda(S)$.
Proof. Exercise 18-37.
The key to establishing Lemma 18.35 is to prove it for sufficiently basic linear
functions from which arbitrary linear functions can be built. The first step are linear
functions that stretch or shrink objects parallel to the coordinate axes.
Lemma 18.32 The effect of a diagonal operator on Lebesgue measure. Let the set $S \subseteq \mathbb{R}^d$ be Lebesgue measurable, let $c_1, \ldots, c_d \in \mathbb{R} \setminus \{0\}$ and let the function $D : \mathbb{R}^d \to \mathbb{R}^d$ be defined by $D(x_1, \ldots, x_d) := (c_1 x_1, \ldots, c_d x_d)$. Then $D[S]$ is Lebesgue measurable with $\lambda(D[S]) = |c_1 \cdots c_d| \lambda(S) = |\det(D)| \lambda(S)$.
Proof. The Lebesgue measurability of $D[S]$ follows from Lemma 18.30. Now let $\varepsilon > 0$. To compute $\lambda(D[S])$, let $\{B_j\}_{j=1}^\infty$ be a family of open boxes so that $S \subseteq \bigcup_{j=1}^\infty B_j$ and so that $\sum_{j=1}^\infty |B_j| < \lambda(S) + \frac{\varepsilon}{|c_1 \cdots c_d| + 1}$. Then $\{D[B_j]\}_{j=1}^\infty$ is a family of open boxes so that $D[S] \subseteq \bigcup_{j=1}^\infty D[B_j]$ and for each $j \in \mathbb{N}$ we have $|D[B_j]| = |c_1 \cdots c_d| |B_j|$ (Exercise 18-38). This means that
$$\lambda(D[S]) \le \sum_{j=1}^\infty |D[B_j]| = |c_1 \cdots c_d| \sum_{j=1}^\infty |B_j| < |c_1 \cdots c_d| \lambda(S) + \varepsilon.$$
Because $\varepsilon > 0$ was arbitrary we infer $\lambda(D[S]) \le |c_1 \cdots c_d| \lambda(S)$. A similar result holds for $D^{-1}(x_1, \ldots, x_d) = \left( \frac{1}{c_1} x_1, \ldots, \frac{1}{c_d} x_d \right)$. For all Lebesgue measurable sets $A$ the set $D^{-1}[A]$ is Lebesgue measurable and $\lambda(D^{-1}[A]) \le \frac{1}{|c_1 \cdots c_d|} \lambda(A)$, so $\lambda(S) = \lambda(D^{-1}[D[S]]) \le \frac{1}{|c_1 \cdots c_d|} \lambda(D[S])$, that is, $\lambda(D[S]) \ge |c_1 \cdots c_d| \lambda(S)$, which establishes $\lambda(D[S]) = |c_1 \cdots c_d| \lambda(S)$. Finally, it was shown in Exercise 18-35a that $\det(D) = c_1 \cdots c_d$.
Figure 47: Visualization of the proof of Lemma 18.33. Maps as in Lemma 18.33 only affect two coordinates, $x_i$ and $x_j$. In these coordinates, they enforce a uniform “shear” on the space. This shear turns rectangles into parallelograms without affecting the area. These parallelograms can be covered with rectangles whose sides are parallel to the coordinate axes.
The next step are functions that shear objects in one coordinate direction. Figure
47 shows the geometric idea for the proof of Lemma 18.33.
Lemma 18.33 The effect of row addition on Lebesgue measure. Let $S \subseteq \mathbb{R}^d$ be a Lebesgue measurable set, let $a \in \mathbb{R}$, let $i, j \in \{1, \ldots, d\}$ be distinct and define $A : \mathbb{R}^d \to \mathbb{R}^d$ by $A(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_i + a x_j, x_{i+1}, \ldots, x_d)$. Then $A[S]$ is Lebesgue measurable with $\lambda(A[S]) = \lambda(S) = |\det(A)| \lambda(S)$.
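Proof. By Lemma 18.30, $A[S]$ is Lebesgue measurable. Without loss of generality assume $i < j$ and $a > 0$. Let $\varepsilon > 0$. To compute $\lambda(A[S])$, let $\{B_k\}_{k=1}^\infty$ be a family of open boxes so that $S \subseteq \bigcup_{k=1}^\infty B_k$ and $\sum_{k=1}^\infty |B_k| < \lambda(S) + \frac{\varepsilon}{2}$. For each $k$, let $\prod_{l=1}^d (a_l^k, b_l^k) := B_k$, let $n_k \in \mathbb{N}$ be so that $\frac{a (b_j^k - a_j^k)^2}{n_k} \prod_{l \ne i,j} (b_l^k - a_l^k) < \frac{\varepsilon}{2^{k+1}}$ and let $\Delta_k := \frac{b_j^k - a_j^k}{n_k}$. Then for every $x = (x_1, \ldots, x_d) \in B_k$ there is an $m \in \{1, \ldots, n_k\}$ so that $x_j \in [a_j^k + (m-1)\Delta_k, a_j^k + m\Delta_k]$. This means that the $i$th component of $A(x)$ satisfies
$$x_i + a x_j \in \left[ x_i + a\left(a_j^k + (m-1)\Delta_k\right), x_i + a\left(a_j^k + m\Delta_k\right) \right].$$
Hence,
$$A[B_k] \subseteq \bigcup_{m=1}^{n_k} \prod_{l=1}^d I_l^{k,m},$$
where $I_l^{k,m} := (a_l^k, b_l^k)$ for $l \ne i, j$, $I_j^{k,m} := [a_j^k + (m-1)\Delta_k, a_j^k + m\Delta_k]$ and $I_i^{k,m} := \left( a_i^k + a(a_j^k + (m-1)\Delta_k), b_i^k + a(a_j^k + m\Delta_k) \right)$ (see Figure 47), and therefore
$$\lambda(A[B_k]) \le \sum_{m=1}^{n_k} (b_i^k - a_i^k + a\Delta_k) \cdot \Delta_k \cdot \prod_{l \ne i,j} (b_l^k - a_l^k).$$
We obtain
$$\lambda(A[S]) \le \sum_{k=1}^\infty \lambda(A[B_k]) \le \sum_{k=1}^\infty \left( |B_k| + \frac{a (b_j^k - a_j^k)^2}{n_k} \prod_{l \ne i,j} (b_l^k - a_l^k) \right) < \lambda(S) + \frac{\varepsilon}{2} + \sum_{k=1}^\infty \frac{\varepsilon}{2^{k+1}} = \lambda(S) + \varepsilon.$$
Because $\varepsilon > 0$ was arbitrary we infer $\lambda(A[S]) \le \lambda(S)$. We can obtain the same inequality for $A^{-1}(x_1, \ldots, x_d) = (x_1, \ldots, x_{i-1}, x_i - a x_j, x_{i+1}, \ldots, x_d)$. But this means $\lambda(S) = \lambda(A^{-1}[A[S]]) \le \lambda(A[S])$, and hence $\lambda(S) = \lambda(A[S])$. Finally, it was shown in Exercise 18-35b that $\det(A) = 1$.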
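The covering argument in the proof of Lemma 18.33 can be traced in two dimensions. The sketch below takes $i = 1$, $j = 2$, so the map is $(x, y) \mapsto (x + a y, y)$. The names `shear_cover`, `total_area`, `covers`, and the sample box are hypothetical. It builds the $n$ axis-parallel rectangles from the proof, checks that they cover images of sample points, and shows that the total area exceeds the area of the box by exactly $a (b_2 - a_2)^2 / n$.

```python
from fractions import Fraction


def shear_cover(box, a, n):
    """Rectangles covering the image of `box` under (x, y) -> (x + a*y, y).

    box = ((x_lo, x_hi), (y_lo, y_hi)); assumes a > 0 and n >= 1 horizontal strips.
    """
    (x_lo, x_hi), (y_lo, y_hi) = box
    delta = (y_hi - y_lo) / n
    cover = []
    for m in range(1, n + 1):
        strip_lo = y_lo + (m - 1) * delta
        strip_hi = y_lo + m * delta
        # Over this strip the sheared x-coordinate lies between these two bounds.
        cover.append(((x_lo + a * strip_lo, x_hi + a * strip_hi), (strip_lo, strip_hi)))
    return cover


def total_area(rectangles):
    return sum((x_hi - x_lo) * (y_hi - y_lo) for (x_lo, x_hi), (y_lo, y_hi) in rectangles)


def covers(rectangles, point):
    x, y = point
    return any(
        x_lo <= x <= x_hi and y_lo <= y <= y_hi
        for (x_lo, x_hi), (y_lo, y_hi) in rectangles
    )


box = ((Fraction(0), Fraction(2)), (Fraction(1), Fraction(4)))  # area 6
a = Fraction(3, 2)

for n in (3, 30, 300):
    print(n, total_area(shear_cover(box, a, n)))
```

As $n$ grows the printed areas decrease to 6, the area of the original box, which is the inequality $\lambda(A[S]) \le \lambda(S)$ in miniature.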
Lemma 18.34 The effect of row transposition on Lebesgue measure. Let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable, let $i, j \in \{1, \ldots, d\}$ with $i < j$ and let $T : \mathbb{R}^d \to \mathbb{R}^d$ be $T(x_1, \ldots, x_d) := (x_1, \ldots, x_{i-1}, x_j, x_{i+1}, \ldots, x_{j-1}, x_i, x_{j+1}, \ldots, x_d)$. Then $T[S]$ is Lebesgue measurable with $\lambda(T[S]) = \lambda(S) = |\det(T)| \lambda(S)$.
Proof. Exercise 18-39.
Lemma 18.35 The effect of a bijective linear operator on Lebesgue measure. Let the function $L : \mathbb{R}^d \to \mathbb{R}^d$ be linear and bijective and let $S \subseteq \mathbb{R}^d$ be Lebesgue measurable. Then $L[S]$ is Lebesgue measurable and $\lambda(L[S]) = |\det(L)| \lambda(S)$.
Proof. The Gauss-Jordan algorithm from Theorem 17.23 shows that there is a diagonal operator $D$ with nonzero diagonal entries and a sequence of row transposition and row addition operators $A_1, \ldots, A_n$ so that $L = A_n A_{n-1} \cdots A_1 D$. Because the determinant of a composition is the product of the determinants of the factors (see Corollary 18.27) we infer that $|\det(L)| = \left( \prod_{i=1}^n |\det(A_i)| \right) |\det(D)|$, and hence, by Lemmas 18.32, 18.33 and 18.34,
$$\lambda(L[S]) = \lambda(A_n A_{n-1} \cdots A_1 D[S]) = |\det(A_n)| \lambda(A_{n-1} \cdots A_1 D[S]) = \cdots = \left( \prod_{i=1}^n |\det(A_i)| \right) |\det(D)| \lambda(S) = |\det(L)| \lambda(S).$$
Lemma 18.35 confirms our initial geometric idea for arbitrary dimensions. If a parallelepiped is spanned by vectors $v_1, \ldots, v_d$, then it is the image of the unit cube under the linear map whose matrix representation is the matrix $A$ with columns $v_1, \ldots, v_d$. Lemma 18.35 shows that the volume of this parallelepiped is the absolute value of the determinant of $A$. (Details are left to Exercise 18-48.)
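The factorization idea behind Lemma 18.35 also gives a practical way to compute the volume factor. The following is a minimal sketch, not the algorithm of Theorem 17.23 verbatim: the function name `det_by_elimination` and the sample matrices are hypothetical. It reduces a matrix to diagonal form using only row transpositions (determinant $-1$) and row additions (determinant $1$), so the determinant is the signed product of the remaining diagonal entries.

```python
from fractions import Fraction


def det_by_elimination(matrix):
    """Determinant via row transpositions and row additions, using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    d = len(m)
    sign = 1
    for col in range(d):
        pivot = next((r for r in range(col, d) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # the map is not bijective, so the determinant is zero
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]  # row transposition: factor -1
            sign = -sign
        for r in range(d):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                # Row addition: adding a multiple of one row to another has determinant 1.
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    product = Fraction(sign)
    for i in range(d):
        product *= m[i][i]  # determinant of the remaining diagonal operator
    return product


examples = {
    "needs a transposition": [[0, 2, 1], [1, 1, 0], [3, 0, 1]],
    "shear-like": [[2, 1], [1, 3]],
    "not bijective": [[1, 2], [2, 4]],
}
for name, matrix in examples.items():
    determinant = det_by_elimination(matrix)
    print(name, determinant, "volume factor:", abs(determinant))
```

Only the absolute value of the result enters the measure formula, so the bookkeeping of transposition signs matters for $\det(L)$ but not for $\lambda(L[S])$.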
For the remaining results, we will work with half-open cubes, which are cubes of the form $\prod_{i=1}^d [a_i, b_i)$ with all $b_i - a_i$ being equal. The radius of a half-open cube is half its side length.
Lemma 18.36 The effect of a diffeomorphism on Lebesgue measure. Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^d$, let $K \subseteq \Omega_1$ be compact and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function with $\det(Dg(x)) \ne 0$ for all $x \in \Omega_1$. Then for every $\varepsilon > 0$ there is a $\delta > 0$ so that for every half-open cube $B \subseteq K$ with center point $x$ and with radius less than $\delta$ we have $\lambda(g[B]) - |\det(Dg(x))| \lambda(B) < \varepsilon \lambda(B)$.
Proof. Because $K$ is compact, the derivative $Dg$ is uniformly continuous on $K$ and $\|Dg(x)^{-1}\|$ is uniformly bounded on $K$ by an $M > 0$. Moreover, we can find a $\nu > 0$ such that $(1 + \nu)^d - 1 < \frac{\varepsilon}{\max\{|\det Dg(x)| : x \in K\}}$. There is a $\delta > 0$ so that for all $x, y \in K$ with $\|y - x\|_\infty < \delta$ we have $\|Dg(y) - Dg(x)\| < \frac{\nu}{M}$. Now let $B$ be a half-open cube of radius less than $\delta$, contained in $K$ and centered at $x$. Then for all $z \in B$ the following holds.
$$\|g(z) - (g(x) + Dg(x)[z - x])\| \cdots$$
Now we are ready to prove the Multidimensional Substitution Formula. Because we work with individual functions, we state the result for functions $f$ in $\mathcal{L}^1$. It should be clear that a similar result holds for $L^1$.
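Theorem 18.37 Multidimensional Substitution Formula. Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^d$ be open subsets of $\mathbb{R}^d$ and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function so that for all $x \in \Omega_1$ we have $\det(Dg(x)) \ne 0$. Then for all $f \in \mathcal{L}^1(\Omega_2)$ we have
$$\int_{\Omega_2} f \, d\lambda = \int_{\Omega_1} (f \circ g)\, |\det(Dg)| \, d\lambda.$$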
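Proof. By Corollary 17.66, $g^{-1}$ is also continuously differentiable. The measurability of $f \circ g$ now follows from Lemma 18.30 applied to $g^{-1}$. We will first prove the identity for all nonnegative $f \in C_0^\infty(\Omega_2)$. Let $f \in C_0^\infty(\Omega_2)$. Then $g^{-1}[\operatorname{supp}(f)]$ is compact. For every $x \in g^{-1}[\operatorname{supp}(f)]$ find $\varepsilon_x > 0$ so that $\overline{B_{\varepsilon_x}(x)} \subseteq \Omega_1$. Then $\{ B_{\varepsilon_x/3}(x) : x \in g^{-1}[\operatorname{supp}(f)] \}$ is an open cover of the set $g^{-1}[\operatorname{supp}(f)]$, and hence there are $x_1, \ldots, x_k$ so that $g^{-1}[\operatorname{supp}(f)] \subseteq \bigcup_{i=1}^k B_{\varepsilon_{x_i}/3}(x_i)$. Then $K := \bigcup_{i=1}^k \overline{B_{\varepsilon_{x_i}}(x_i)}$ is a compact subset of $\Omega_1$. Moreover, if we define $\delta_{\mathrm{supp}} := \min\left\{ \frac{1}{3}\varepsilon_{x_i} : i = 1, \ldots, k \right\}$, then every half-open cube of radius less than $\delta_{\mathrm{supp}}$ that intersects $g^{-1}[\operatorname{supp}(f)]$ is contained in $K$. (Recall that we are working with $\|\cdot\|_\infty$.)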
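Let $\varepsilon > 0$. The function $f$ is uniformly continuous on the compact set $g[K]$. Therefore there is a $\delta_f > 0$ so that for all $x, y \in g[K]$ with $\|y - x\|_\infty < \delta_f$ we have $|f(y) - f(x)| < \frac{\varepsilon}{3\lambda(g[K])}$. The functions $(f \circ g)|\det(Dg)|$ and $g$ are uniformly continuous on $K$. Therefore there is a $\delta_{f \circ g} > 0$ so that for all points $x, y \in K$ with $\|y - x\|_\infty < \delta_{f \circ g}$ we have $\big| f(g(x))|\det(Dg(x))| - f(g(y))|\det(Dg(y))| \big| < \frac{\varepsilon}{3\lambda(K)}$ and $\|g(y) - g(x)\|_\infty < \delta_f$. Let $M := \max\{|f(y)| : y \in \Omega_2\} + 1$. By Lemma 18.36, there is a $\delta \in (0, \min\{\delta_{f \circ g}, \delta_{\mathrm{supp}}\})$ so that for every half-open cube $B \subseteq K$ centered at $x$ with radius less than $\delta$ we have $\lambda(g[B]) - |\det(Dg(x))| \lambda(B) < \frac{\varepsilon}{3M\lambda(K)} \lambda(B)$.
Let $B_1, \ldots, B_n$ be all half-open cubes of the form $\prod_{i=1}^d [l_i \delta, (l_i + 1)\delta)$ that intersect $g^{-1}[\operatorname{supp}(f)]$. Then $g^{-1}[\operatorname{supp}(f)] \subseteq \bigcup_{i=1}^n B_i$ and $\operatorname{supp}(f) \subseteq \bigcup_{i=1}^n g[B_i]$. Because $\delta < \delta_{\mathrm{supp}}$, all $B_i$ are contained in $K$. Let $x_i$ be the center point of $B_i$. Then for all points $y \in B_i$ we have $\big| f(g(y))|\det(Dg(y))| - f(g(x_i))|\det(Dg(x_i))| \big| < \frac{\varepsilon}{3\lambda(K)}$, $\|g(y) - g(x_i)\|_\infty < \delta_f$ and for all $z \in g[B_i]$ we have $|f(z) - f(g(x_i))| < \frac{\varepsilon}{3\lambda(g[K])}$. Therefore
$$\int_{\Omega_2} f \, d\lambda = \sum_{i=1}^n \int_{g[B_i]} f \, d\lambda \le \sum_{i=1}^n \left( f(g(x_i)) + \frac{\varepsilon}{3\lambda(g[K])} \right) \lambda(g[B_i])$$
$$\le \sum_{i=1}^n f(g(x_i)) \left( |\det(Dg(x_i))| + \frac{\varepsilon}{3M\lambda(K)} \right) \lambda(B_i) + \frac{\varepsilon}{3}$$
$$\le \sum_{i=1}^n \int_{B_i} \left( (f \circ g)|\det(Dg)| + \frac{\varepsilon}{3\lambda(K)} \right) d\lambda + \frac{2\varepsilon}{3} \le \int_{\Omega_1} (f \circ g)|\det(Dg)| \, d\lambda + \varepsilon.$$
Because $\varepsilon$ was arbitrary the inequality $\int_{\Omega_2} f \, d\lambda \le \int_{\Omega_1} (f \circ g)|\det(Dg)| \, d\lambda$ holds for all nonnegative $f \in C_0^\infty(\Omega_2)$. The same argument with the roles of domain and range reversed proves the reversed inequality as follows. By Theorem 17.31, we have that $Dg^{-1} = ((Dg) \circ g^{-1})^{-1}$, so that
$$(\det(Dg) \circ g^{-1}) \det(Dg^{-1}) = \det\left[ ((Dg) \circ g^{-1}) ((Dg) \circ g^{-1})^{-1} \right] = 1,$$
and hence
$$\int_{\Omega_1} (f \circ g)|\det(Dg)| \, d\lambda \le \int_{\Omega_2} \left( (f \circ g)|\det(Dg)| \right) \circ g^{-1} \, |\det(Dg^{-1})| \, d\lambda = \int_{\Omega_2} f \, d\lambda,$$
which establishes the result for all nonnegative $f \in C_0^\infty(\Omega_2)$.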
Now let $f$ be a nonnegative simple function on $\Omega_2$ whose support is contained in the interior of a compact set $K \subseteq \Omega_2$ and for which there is a $\delta > 0$ so that $\widetilde{K} := \bigcup_{x \in K} \overline{B_\delta(x)}$ is compact and contained in $\Omega_2$. By Theorem 18.12, there is a sequence $\{f_n\}_{n=1}^\infty$ in $C_0^\infty(\Omega_2)$ so that $\lim_{n \to \infty} \int_{\Omega_2} |f - f_n| \, d\lambda = 0$. By Lemma 18.11, we can assume without loss of generality that $\operatorname{supp}(f_n) \subseteq \widetilde{K}$ for all natural numbers $n \in \mathbb{N}$. Let $U := \max\{f(z) : z \in \Omega_2\}$. Because we can subtract the function $h_n(x) := (f_n(x) - U)\, j_1(f_n(x) - U)$ (see Lemma 18.9 for the definition of $j_1$) from $f_n$ without affecting the convergence, we can assume without loss of generality that the $f_n$ are uniformly bounded by a multiple of $\mathbf{1}_{\widetilde{K}}$. Clearly, $\mathbf{1}_{\widetilde{K}}$ is integrable. By Proposition 14.49 we can assume without loss of generality that $\{f_n\}_{n=1}^\infty$ also converges pointwise a.e. to $f$. Because $g^{-1}$ maps null sets to null sets (see proof of Lemma 18.30), $\{(f_n \circ g)|\det(Dg)|\}_{n=1}^\infty$ converges pointwise a.e. to $(f \circ g)|\det(Dg)|$. Moreover, all these functions are bounded by a multiple of $(\mathbf{1}_{\widetilde{K}} \circ g)|\det(Dg)|$, which is integrable. Thus the result follows for $f$ via the Dominated Convergence Theorem.
For nonnegative functions $f \in \mathcal{L}^1(\Omega_2)$, the result now follows via the Monotone Convergence Theorem (Exercise 18-40a). Finally, for arbitrary functions $f \in \mathcal{L}^1(\Omega_2)$ the result follows by considering $f^+$ and $f^-$ separately (Exercise 18-40b).
We conclude this section by assigning the function $g$ and the volume distortion factor $|\det(Dg)|$ their commonly used names.
Definition 18.38 Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^d$ be open subsets of $\mathbb{R}^d$ and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function so that for all $x \in \Omega_1$ we have $\det(Dg(x)) \ne 0$. Then $g$ is also called a coordinate transformation and the determinant $\det(Dg(x))$ is also called the Jacobian of $g$.
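A coordinate transformation and its Jacobian can be tried out numerically on a disk. The sketch below is an illustration under assumptions: it uses polar coordinates $g(r, \theta) = (r\cos\theta, r\sin\theta)$ with Jacobian $r$, the hypothetical integrand $f(x, y) = e^{-(x^2 + y^2)/2}$, and simple midpoint sums. The names `integral_via_polar`, `integral_cartesian` and `closed_form` are introduced here only for the example. The closed form $2\pi(1 - e^{-a^2/2})$ for the disk of radius $a$ follows from integrating $r e^{-r^2/2}$.

```python
import math


def f(x, y):
    return math.exp(-(x * x + y * y) / 2.0)


def polar(r, theta):
    """Polar coordinates as a coordinate transformation."""
    return (r * math.cos(theta), r * math.sin(theta))


def polar_jacobian(r, theta):
    """det D(polar)(r, theta) = r."""
    return r


def integral_via_polar(radius, n_r=400, n_theta=16):
    """Midpoint sum of (f o g) * |det Dg| over (0, radius) x (0, 2*pi)."""
    dr = radius / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            x, y = polar(r, theta)
            total += f(x, y) * abs(polar_jacobian(r, theta)) * dr * dtheta
    return total


def integral_cartesian(radius, n=400):
    """Cruder midpoint sum of f over grid cells whose centers lie in the disk."""
    h = 2.0 * radius / n
    total = 0.0
    for i in range(n):
        x = -radius + (i + 0.5) * h
        for j in range(n):
            y = -radius + (j + 0.5) * h
            if x * x + y * y < radius * radius:
                total += f(x, y) * h * h
    return total


def closed_form(radius):
    return 2.0 * math.pi * (1.0 - math.exp(-radius * radius / 2.0))


print(integral_via_polar(2.0), integral_cartesian(2.0), closed_form(2.0))
```

The polar sum converges much faster here because the transformed integrand does not depend on $\theta$ and the domain becomes a rectangle.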
Exercises
18-36. Finishing the proof of Lemma 18.30.
(a) Let $B = \prod_{i=1}^d (a_i, b_i)$ be an open box and let $\varepsilon, \delta > 0$. Prove that there is a finite set of open cubes $B_1, \ldots, B_n$ so that $B \subseteq \bigcup_{j=1}^n B_j$, $\lambda\left( \bigcup_{j=1}^n B_j \right) < \lambda(B) + \varepsilon$ and the diameter of each $B_j$ (with respect to $\|\cdot\|_\infty$) is at most $\delta$.
Hint. Define $\nu := \min\{\ldots\}$ and let the sets $B_1, \ldots, B_n$ be all the cubes of the form $\prod_{i=1}^d (l_i \nu, (l_i + 1)\nu)$ that intersect $B$. Show that the measure of the union satisfies $\lambda\left( \bigcup_{j=1}^n B_j \right) < \lambda(B) + \frac{\varepsilon}{2}$. Then slightly enlarge each cube to get the result.
(b) Let $B = \prod_{i=1}^d (a_i, b_i)$ be an open cube, let $f : B \to \mathbb{R}^d$ be a Lipschitz continuous function and let $L$ be its Lipschitz constant when all distances are measured with respect to $\|\cdot\|_\infty$. Prove that with $(c_1, \ldots, c_d) := f\left( \frac{a_1 + b_1}{2}, \ldots, \frac{a_d + b_d}{2} \right)$ and $r := \frac{b_1 - a_1}{2}$ (remember that all $b_i - a_i$ are equal), we have $f[B] \subseteq \prod_{i=1}^d (c_i - Lr, c_i + Lr)$.
417
n
d
(c) Explain why it is nor possible to prove that if B =
( a i , bi) is an open box and f : B -+
Rd
i=l
is Lipschitz continuous with Lipschitz constant L when all distances are measured with respect
to rhe Euclidean norm
!&?..%)and
I/
2
Hinr. Rotations.
18-37. Prove Lemma 18.31.
Hint. Use the proof of Lemma 18.32 as guidance.
18-38. Let $D$ be as in Lemma 18.32 and let $B = \prod_{i=1}^d [a_i, b_i]$ be a $d$-dimensional box. Prove that $D[B]$ is a box with $|D[B]| = |c_1 \cdots c_d| |B|$.
18-39. Prove Lemma 18.34.
Hint. Use the proof of Lemma 18.32 as guidance.
18-40. Finishing the proof of the Multivariable Substitution Theorem.
(a) Prove Theorem 18.37 for nonnegative $f \in \mathcal{L}^1(\Omega_2)$, assuming that it holds for nonnegative simple functions $f \in \mathcal{L}^1(\Omega_2)$ whose support is contained in the interior of a compact set $K \subseteq \Omega_2$ for which there is a $\delta > 0$ so that $\widetilde{K} := \bigcup_{x \in K} \overline{B_\delta(x)}$ is compact and contained in $\Omega_2$.
(b) Prove Theorem 18.37 for arbitrary $f \in \mathcal{L}^1(\Omega_2)$, assuming that it holds for nonnegative functions $f \in \mathcal{L}^1(\Omega_2)$.
18-41. Let $\Omega_1 \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$. Prove that if $f : \Omega_1 \to \mathbb{R}^d$ is Lipschitz continuous, then there is a constant $L > 0$ so that for all $S \subseteq \Omega_1$ we have $\lambda(f[S]) \le L\lambda(S)$.
18-42. Compute the Jacobians of the following coordinate transformations $g$.
(a) Polar Coordinates. The coordinate transformation $s_2 : (0, \infty) \times (0, 2\pi) \to \mathbb{R}^2$ is defined by $s_2(r, \theta) = (r\cos(\theta), r\sin(\theta))$.
(b) Cylindrical Coordinates. The coordinate transformation $c : (0, \infty) \times (0, 2\pi) \times \mathbb{R} \to \mathbb{R}^3$ is defined by $c(r, \theta, z) = (r\cos(\theta), r\sin(\theta), z)$.
(c) Spherical Coordinates. The coordinate transformation $s_3 : (0, \infty) \times (0, 2\pi) \times (0, \pi) \to \mathbb{R}^3$ is defined by $s_3(\rho, \theta, \varphi) = (\rho\cos(\theta)\sin(\varphi), \rho\sin(\theta)\sin(\varphi), \rho\cos(\varphi))$.
18-43. Compute the following integrals where $dx$, $dy$, $dz$ denote the Lebesgue integrals along the $x$-, $y$- and $z$-axes, respectively. Use the coordinate transforms from Exercise 18-42 as appropriate.
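18-44. The integral of $N_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
(a) Compute the integral of $f(x, y) = e^{-\frac{x^2 + y^2}{2}}$ over a disk of radius $a$ centered at the origin. Then take the limit of the integrals as $a \to \infty$.
(b) Compute $\int_{-\infty}^\infty e^{-\frac{x^2}{2}} \, d\lambda(x)$ by proving
$$\left( \int_{-\infty}^\infty e^{-\frac{x^2}{2}} \, d\lambda(x) \right)^2 = \int_{\mathbb{R}^2} e^{-\frac{x^2 + y^2}{2}} \, d\lambda(x, y)$$
and then computing the latter integral.
Hint. Proposition 14.65 and part 18-44a.
(c) Prove that $\int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, d\lambda(x) = 1$.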
18-45. The volume of $d$-dimensional balls. We first need to define $d$-dimensional Spherical Coordinates. Two and three dimensional spherical coordinates are defined in Exercises 18-42a and 18-42c. For $d > 3$ with $s_{d-1} : (0, \infty) \times (0, 2\pi) \times (0, \pi)^{d-3} \to \mathbb{R}^{d-1}$ denoting $d - 1$ dimensional spherical coordinates, let
$$s_d(\rho, \theta, \varphi_1, \ldots, \varphi_{d-2}) := \big( \pi_1(s_{d-1}(\rho, \theta, \varphi_1, \ldots, \varphi_{d-3})) \sin(\varphi_{d-2}), \ldots, \pi_{d-1}(s_{d-1}(\rho, \theta, \varphi_1, \ldots, \varphi_{d-3})) \sin(\varphi_{d-2}), \rho\cos(\varphi_{d-2}) \big).$$
Let $J_d$ be the Jacobian of $s_d$. Prove that $|J_d| = \rho\, (\sin(\varphi_{d-2}))^{d-2} |J_{d-1}|$ for $d \ge 3$.
Hint. The last row of the matrix (which contains the derivatives of the last coordinate) has $\cos(\varphi_{d-2})$ as its first entry and $-\rho\sin(\varphi_{d-2})$ as its last entry. Expand the determinant with respect to the last row (see Exercise 18-33).
Prove that the volume of the $d$-dimensional Euclidean ball of radius $r$ about the origin is
Prove the area formula for circles in $\mathbb{R}^2$.
Prove the volume formula for balls in $\mathbb{R}^3$.
Hint. It is not necessary to compute the integrals
18-46. Let $(M, \Sigma, \mu)$ be a $\sigma$-finite measure space, let $f : M \to [0, \infty]$ be $\Sigma$-measurable and let $(\mathbb{R}, \Sigma_\lambda, \lambda)$ be the real numbers with Lebesgue measure.
(a) Prove that $\int_M f^p \, d\mu = \int_0^\infty p\, t^{p-1} \mu(\{x \in M : f(x) > t\}) \, dt$ for all $1 \le p < \infty$. (Measurability is not an issue because of Exercise 14-51a.)
(b) Prove that if $g : (0, \infty) \to (0, \infty)$ is differentiable and $g'(x) > 0$ for all $x$, then $\int_M g \circ f \, d\mu = \int_0^\infty g'(t)\, \mu(\{x \in M : f(x) > t\}) \, dt$.
18-47. Let $f, g \in \mathcal{L}^1(\mathbb{R}^d)$.
(a) Prove that $C(x, t) := f(x - t) g(t)$ is Lebesgue measurable.
(b) Prove that for almost all $x \in \mathbb{R}^d$ the function $c_x(t) := f(x - t) g(t)$ is Lebesgue integrable and that the function
$$f * g(x) := \begin{cases} \int_{\mathbb{R}^d} f(x - t) g(t) \, dt; & \text{if } c_x \text{ is Lebesgue integrable,} \\ 0; & \text{otherwise,} \end{cases}$$
is Lebesgue integrable and $\|f * g\|_1 \le \|f\|_1 \|g\|_1$. The function $f * g$ is also called the convolution of $f$ and $g$.
(c) Prove that if $h \in \mathcal{L}^1(\mathbb{R}^d)$, then $k(x) := h(-x)$ also is Lebesgue integrable.
(d) Prove that $f * g = g * f$.
(e) Prove that if $g$ is bounded, then $f * g$ is continuous.
Hint. First prove the result for continuous $f$, then use that the continuous functions are dense in $L^1(\mathbb{R}^d)$.
18-48. Prove that the determinant $\det(v_1, \ldots, v_d)$ is the oriented $d$-dimensional volume of the parallelepiped spanned by the column vectors $v_1, \ldots, v_d$.
Hint. Map a box to the parallelepiped and use the Multivariable Substitution Formula.
18-49. Prove that the hypothesis that L is bijective can be dropped from Lemma 18.35.
Hint. Prove that if the linear function L is not bijective, then the Lebesgue measure of every image
of a Lebesgue measurable set is 0.
18-50. Multidimensional Substitution with weaker hypotheses. Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^d$ be open subsets of $\mathbb{R}^d$ and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable function so that for almost all $x \in \Omega_1$ we have $\det(Dg(x)) \ne 0$ and so that $\{x \in \Omega_1 : (\exists z \in \Omega_1 : f(x) = f(z))\}$ is a null set. Then for all
Hint. The set $\{x \in \Omega_1 : \det(Dg(x)) = 0\}$ is closed in $\Omega_1$. Prove that if $K$ is compact, then the set $\{x \in K : (\exists z \in K : f(x) = f(z))\}$ is closed, too. Apply Theorem 18.37 to the appropriate subset of $\Omega_1$ to first prove the result for compactly supported functions.
18-51. On the injectivity hypothesis of Theorem 18.37. Consider the function $g : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $g(x, y) = (x^2 - y^2, 2xy)$ (this function interprets $(x, y)$ as a complex number and squares it).
(a) Prove that for all $(x, y) \in \mathbb{R}^2$ the function $g$ is differentiable at $(x, y)$ and that $\det(Dg(x, y)) = 4x^2 + 4y^2$.
(b) Prove that for all $(x, y) \in \mathbb{R}^2$ we have $\|g(x, y)\|_2 = x^2 + y^2$.
(c) Prove that for all $(a, b) \ne (0, 0)$ we have $g(\ldots) = (a, b)$.
(d) Prove that $g[B_1(0, 0)] = B_1(0, 0)$.
(e) Prove that for the function $f(x, y) = 1$ we have $\int_{B_1(0,0)} f \, d\lambda = \pi$, but the transformed integral is $\int_{B_1(0,0)} f \circ g\, |\det(Dg)| \, d\lambda = 2\pi$. Then explain why this result does not contradict Theorem 18.37.
Hint. You may use the result of Exercise 18-42.
18-52. The effect of a diffeomorphism on Lebesgue measure. Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^d$, let $K \subseteq \Omega_1$ be compact and let $g : \Omega_1 \to \Omega_2$ be a continuously differentiable bijective function with $\det(Dg(x)) \ne 0$ for all $x \in \Omega_1$. Then for every $\varepsilon > 0$ there is a $\delta > 0$ so that for every box $B \subseteq K$ with center point $x$ and with diameter less than $\delta$ we have $\big| \lambda(g[B]) - |\det(Dg(x))| \lambda(B) \big| < \varepsilon \lambda(B)$.
Hint. The difference between this result and Lemma 18.36 are the absolute value signs and that we can work with boxes rather than cubes. Use Theorem 18.37 with $f = 1$.
18-53. Line integrals and surface integrals are independent of the parametrization.
(a) Let $m \le d$, let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^m$ be open and let $r_1 : \Omega_1 \to \mathbb{R}^d$ and $r_2 : \Omega_2 \to \mathbb{R}^d$ be injective and continuously differentiable so that $r_1[\Omega_1] = r_2[\Omega_2]$ and so that all derivatives $Dr_i(x)$ are injective. Prove that $r_2^{-1} \circ r_1$ is a differentiable function from $\Omega_1$ to $\Omega_2$.
Hint. Let $x \in \Omega_1$. Find $v_{m+1}, \ldots, v_d$ so that $\frac{\partial}{\partial x_1} r_1(x), \ldots, \frac{\partial}{\partial x_m} r_1(x), v_{m+1}, \ldots, v_d$ is a base of $\mathbb{R}^d$ and on $\Omega_1 \times \mathbb{R}^{d-m}$ set $R_1(z_1, \ldots, z_d) := r_1(z_1, \ldots, z_m) + \sum_{j=m+1}^d z_j v_j$. Use Corollary 17.66 to prove that $R_1$ is differentiable with differentiable inverse on a neighborhood of $(x_1, \ldots, x_m, 0, \ldots, 0)$. Define a similar function $R_2$ for $r_2$.
(b) Let $\delta > 0$ and let $r_1 : (a_1 - \delta, b_1 + \delta) \to \mathbb{R}^d$ and $r_2 : (a_2 - \delta, b_2 + \delta) \to \mathbb{R}^d$ be continuously differentiable functions with $r_1[[a_1, b_1]] = r_2[[a_2, b_2]]$, $r_1(a_1) = r_2(a_2)$, $r_1(b_1) = r_2(b_2)$ and so that all derivatives of the $r_j$ are not zero. Let $\Omega \subseteq \mathbb{R}^d$ be an open set that contains $r_1[[a_1, b_1]]$ and let $F : \Omega \to \mathbb{R}^d$ be continuously differentiable. Prove that
Hint. $r_1 = r_2 \circ r_2^{-1} \circ r_1$.
(c) Let $\Omega_1$ and $\Omega_2$ be open subsets of $\mathbb{R}^2$ and let $r_1 : \Omega_1 \to \mathbb{R}^3$ and $r_2 : \Omega_2 \to \mathbb{R}^3$ be continuously differentiable with $r_1[\Omega_1] = r_2[\Omega_2]$ so that for all $(x, y) \in \Omega_1$ we have $\frac{\partial}{\partial x} r_1(x, y) \times \frac{\partial}{\partial y} r_1(x, y) \ne 0$, for all $(u, v) \in \Omega_2$ we have $\frac{\partial}{\partial u} r_2(u, v) \times \frac{\partial}{\partial v} r_2(u, v) \ne 0$, and so that there are points $(x, y) \in \Omega_1$ and $(u, v) \in \Omega_2$ so that $r_1(x, y) = r_2(u, v)$ and $\frac{\partial}{\partial x} r_1(x, y) \times \frac{\partial}{\partial y} r_1(x, y) = \lambda \frac{\partial}{\partial u} r_2(u, v) \times \frac{\partial}{\partial v} r_2(u, v)$ for some $\lambda > 0$. Let $\Omega \subseteq \mathbb{R}^3$ be an open set that contains $r_1[\Omega_1]$ and let $F : \Omega \to \mathbb{R}^3$ be continuously differentiable. Prove that
Hint. $r_1 = r_2 \circ r_2^{-1} \circ r_1$. Then work out componentwise that the effect on the cross product is a multiplication with the determinant of the appropriate $2 \times 2$ matrix.
18-54. Continuous images of null sets can have nonzero measure. For this exercise, let the function $\psi : [0, 1] \to [0, 1]$ be Lebesgue's singular function from Exercise 11-22.
(a) Prove that if $g : [a, b] \to \mathbb{R}$ is a nondecreasing function, $h(x) = g(x) + x$ and $A$ is a set with $\lambda(g[A]) > 0$, then $\lambda(h[A]) > 0$.
(b) Prove that $F(x) := \psi(x) + x$ (where $\psi$ is Lebesgue's singular function from Exercise 11-22) defines a continuous bijective function from $[0, 1]$ to $[0, 2]$ so that $F^{-1}$ also is continuous and $\lambda(F[C^Q]) > 0$.
(c) Prove that for all $d \in \mathbb{N}$ there is a null set $N$ in $\mathbb{R}^d$ and a continuous bijective function $G$ on an open subset of $\mathbb{R}^d$ that contains $N$ so that $\lambda(G[N]) > 0$.
18-55. Prove that if $f : [a, b] \to \mathbb{R}$ is absolutely continuous and nondecreasing and $N \subseteq [a, b]$ is a null set, then $f[N]$ also is a null set.
18-56. Uniform limits of continuous functions with additional properties need not have these additional properties.
(a) Use Exercises 18-54 and 18-55 to prove that if $\{f_n\}_{n=1}^{\infty}$ is a uniformly convergent sequence of absolutely continuous functions on $[a, b]$, then the limit need not be absolutely continuous.
(b) Prove that if $\{f_n\}_{n=1}^{\infty}$ is a uniformly convergent sequence of Lipschitz continuous functions on $[a, b]$, then the limit need not be Lipschitz continuous.
Chapter 19
Introduction to Differential
Geometry
In applications, it is often necessary to describe surfaces, like the d-dimensional spheres $\{p \in \mathbb{R}^d : \|p\|_2 = r\}$ of radius $r$, or, more generally, lower dimensional subsets $S$ of a higher dimensional space. To describe such subsets or surfaces, we use parametrizations, that is, bijective, continuously differentiable functions $g : \Omega \to S$ with injective derivatives $Dg(x)$, where $\Omega$ is an open subset of $\mathbb{R}^m$ for an appropriate $m$. Unfortunately, for many surfaces such an overall parametrization cannot be defined. For example, for the spheres of radius 1 it can be shown (Exercise 19-12) that a parametrization $g$ as mentioned must have a continuous inverse function $g^{-1} : S \to \Omega$. But $S$ is compact and $\Omega$ is not compact, which is impossible by Theorem 16.62. On the other hand, the Implicit Function Theorem guarantees that for many surfaces that are solution sets of equations $f(x, y) = 0$, where both $x$ and $y$ can be in Banach spaces, we can locally find parametrizations. This is the idea behind the definition of a manifold.
This chapter introduces manifolds in Section 19.1 and their tangent spaces and differentiable functions in Section 19.2. Sections 19.3, 19.4, and 19.5 build the integration theory on manifolds and the chapter culminates in Section 19.6 with Stokes’ Theorem.
19.1 Manifolds
The fundamental idea behind a manifold is that it “locally looks like m-dimensional
space.” This reflects our intuition that differentiable surfaces, despite being curved,
locally look like two-dimensional space.
Figure 48: Manifolds (a) are spaces that locally look like $\mathbb{R}^m$. The atlas of a $C^\infty$-manifold is a collection of homeomorphisms $\{x_i\}_{i \in I}$ into the appropriate $\mathbb{R}^m$ so that the compositions $x_i \circ x_j^{-1}$ are diffeomorphisms wherever they are defined. Embedded manifolds (b) are subspaces of $\mathbb{R}^d$ so that each point of the manifold has a neighborhood in $\mathbb{R}^d$ that can be transformed so that the image of the intersection of the manifold with the neighborhood lies in the hyperplane in which the $(m+1)$st through $d$th coordinates are zero.
Definition 19.1 Let $m \in \mathbb{N}$. A metric space $M$ is called an m-dimensional (topological) manifold iff for each $p \in M$ there is an open set $O \subseteq M$ so that $p \in O$ and $O$ is homeomorphic to $\mathbb{R}^m$. (See Figure 48.)
Equivalently, in the definition of a manifold we could demand that each $O$ is homeomorphic to an open subset of $\mathbb{R}^m$ (see Exercise 19-1). For further investigations, we also need differentiability properties. Because $M$ is just a metric space, these properties are introduced through differentiability properties of compositions of the homeomorphisms from the definition.
Definition 19.2 Let $U, V \subseteq \mathbb{R}^d$ be open. A bijective, infinitely differentiable function $h : U \to V$ with infinitely differentiable inverse is called a diffeomorphism.
Definition 19.3 Let $m \in \mathbb{N}$ and let $M$ be an m-dimensional manifold. A family $\{x_i\}_{i \in I}$ is called an atlas for $M$ iff each $x_i$ is a homeomorphism from an open subset $O_i$ of $M$ to an open subset of $\mathbb{R}^m$, for each $p \in M$ there is an $i \in I$ so that $p \in O_i$ and for all $i, j \in I$ the composition $x_i \circ x_j^{-1} : x_j[O_i \cap O_j] \to x_i[O_i \cap O_j]$ is a diffeomorphism. The functions $x_i$ are also called charts or coordinate systems. (See Figure 48.)
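As a hedged illustration of Definition 19.3, one can check the atlas conditions for the unit circle with two stereographic projections as charts. The example and the names $x_N$, $x_S$, $O_N$, $O_S$ are assumptions made here for illustration and are not taken from the text.

```latex
% Assumed example: M = S^1 = \{(a,b) \in \mathbb{R}^2 : a^2 + b^2 = 1\}, so m = 1.
% Charts (stereographic projections from the two poles):
\[
  O_N := S^1 \setminus \{(0,1)\}, \quad x_N(a,b) := \frac{a}{1-b},
  \qquad
  O_S := S^1 \setminus \{(0,-1)\}, \quad x_S(a,b) := \frac{a}{1+b}.
\]
% Inverse of x_N, valid for all t \in \mathbb{R}:
\[
  x_N^{-1}(t) = \left( \frac{2t}{t^2+1}, \; \frac{t^2-1}{t^2+1} \right).
\]
% Overlap: O_N \cap O_S is the circle without both poles and
% x_N[O_N \cap O_S] = x_S[O_N \cap O_S] = \mathbb{R} \setminus \{0\}.
% Transition map on the overlap:
\[
  x_S \circ x_N^{-1}(t)
  = \frac{2t/(t^2+1)}{1 + (t^2-1)/(t^2+1)}
  = \frac{2t}{2t^2}
  = \frac{1}{t},
  \qquad t \neq 0 .
\]
% t \mapsto 1/t is infinitely differentiable on \mathbb{R} \setminus \{0\}
% and is its own inverse, so it is a diffeomorphism.
```

Since $O_N \cup O_S = S^1$ and the only nontrivial transition map is $t \mapsto 1/t$, the family $\{x_N, x_S\}$ satisfies the three conditions of the definition in this example.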
The inverse $x^{-1}$ of a coordinate system $x : U \to \mathbb{R}^m$ can be interpreted as a parametrization of the subset $U$ of $M$.
Definition 19.4 A pair $(M, \{x_i\}_{i \in I})$ of an m-dimensional manifold $M$ and an atlas $\{x_i\}_{i \in I}$ is also called an m-dimensional $C^\infty$-manifold. As for spaces, we typically refer to $C^\infty$-manifolds through the set $M$, implicitly assuming that an atlas is given.
Of course, every $C^\infty$-manifold is a manifold. A $C^k$-diffeomorphism is a bijective, k times continuously differentiable function with k times continuously differentiable inverse. By using $C^k$-diffeomorphisms instead of diffeomorphisms, it is possible to define $C^k$-manifolds. Working with $C^k$-manifolds would require us to keep track of detailed differentiability conditions. Hence, throughout this chapter we will work with $C^\infty$-manifolds and we will simply refer to them as manifolds. Results similar to the ones derived in this chapter also hold for $C^k$-manifolds.
Trivially $(\mathbb{R}^d, \{\mathrm{id}_{\mathbb{R}^d}\})$ is a manifold. Whenever we work with $\mathbb{R}^d$ as a manifold we will assume that the atlas is $\{\mathrm{id}_{\mathbb{R}^d}\}$. Our first nontrivial examples of manifolds are subsets of d-dimensional space. These "embedded manifolds" will be of particular interest throughout. They arise frequently in applications and we will use them as examples and for motivation of abstract definitions.
Definition 19.5 Let $m, d \in \mathbb{N}$ be so that $m \le d$. A set $M \subseteq \mathbb{R}^d$ is called an m-dimensional embedded manifold iff for every $p \in M$ there is an open neighborhood $U \subseteq \mathbb{R}^d$ of $p$, an open set $V \subseteq \mathbb{R}^d$ and a diffeomorphism $h : U \to V$ for which $h[U \cap M] = \{v \in V : v_{m+1} = \cdots = v_d = 0\}$. (See Figure 48.)
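A minimal worked instance of Definition 19.5, offered as a sketch: the graph of a smooth function in the plane. The function $\varphi$ and the specific choice of $h$ are assumptions introduced here for illustration.

```latex
% Assumed example: \varphi : \mathbb{R} \to \mathbb{R} infinitely differentiable,
% M := \{(t, \varphi(t)) : t \in \mathbb{R}\} \subseteq \mathbb{R}^2, so d = 2 and m = 1.
% For every p \in M take U := V := \mathbb{R}^2 and
\[
  h(a,b) := \bigl(a, \; b - \varphi(a)\bigr),
  \qquad
  h^{-1}(v_1, v_2) = \bigl(v_1, \; v_2 + \varphi(v_1)\bigr).
\]
% Both h and h^{-1} are infinitely differentiable, so h is a diffeomorphism.
% A point (a,b) lies on M iff b = \varphi(a), that is, iff the second
% component of h(a,b) is zero. Hence
\[
  h[U \cap M] = \{ v \in V : v_2 = 0 \}.
\]
```

Under these assumptions the graph is a 1-dimensional embedded manifold in $\mathbb{R}^2$, with one global $h$ serving every point.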
Proposition 19.6 Every m-dimensional embedded manifold M is an m-dimensional
manifold.
Proof. For each $p \in M$, let $U_p$ be an open subset of $\mathbb{R}^d$ that contains $p$ and for which there is a diffeomorphism $h_p : U_p \to V_p$ with $V_p \subseteq \mathbb{R}^d$ open and so that $h_p[U_p \cap M] = \{v \in V_p : v_{m+1} = \cdots = v_d = 0\}$. Let $\pi_{\mathbb{R}^m} : \mathbb{R}^d \to \mathbb{R}^m$ be the projection onto the first $m$ coordinates. For all $p \in M$ let $x_p := \pi_{\mathbb{R}^m} \circ h_p|_{U_p \cap M}$. Clearly, each $x_p$ is a homeomorphism from the set $O_p := U_p \cap M$ to the subset $\pi_{\mathbb{R}^m}[\{v \in V_p : v_{m+1} = \cdots = v_d = 0\}]$ of $\mathbb{R}^m$. For all $p, q \in M$, the composition $h_q \circ h_p^{-1} : h_p[U_p \cap U_q] \to h_q[U_p \cap U_q]$ is a diffeomorphism. Moreover, the sets $x_p[O_p \cap O_q] = \pi_{\mathbb{R}^m} \circ h_p[U_p \cap U_q \cap M]$ and $x_q[O_p \cap O_q] = \pi_{\mathbb{R}^m} \circ h_q[U_p \cap U_q \cap M]$ are projections of the intersections of open subsets with the subspace $\mathbb{R}^m \times \{0\}^{d-m}$ of $\mathbb{R}^d$, which means they are open subsets of $\mathbb{R}^m$. With $e_{\mathbb{R}^m} : \mathbb{R}^m \to \mathbb{R}^d$ being the natural embedding that maps $\mathbb{R}^m$ to $\mathbb{R}^m \times \{0\}^{d-m}$, the composition $x_q \circ x_p^{-1} = \pi_{\mathbb{R}^m} \circ h_q \circ h_p^{-1} \circ e_{\mathbb{R}^m}|_{\pi_{\mathbb{R}^m}[h_p[U_p \cap U_q \cap M]]}$ is a diffeomorphism. Therefore $\{x_p\}_{p \in M}$ is an atlas, and hence $M$ is an m-dimensional manifold.
Whenever we work with an embedded manifold, we will assume that its atlas was
generated as in the proof of Proposition 19.6. Embedded manifolds arise naturally
when solving equations.
Theorem 19.7 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}^{d-m}$ be an infinitely differentiable function so that for all $p \in \Omega$ with $f(p) = 0$ the matrix $Df(p)$ has rank $d - m$. Then $f^{-1}(0)$ is an m-dimensional embedded manifold in $\mathbb{R}^d$.
Proof. Let $p \in \Omega$ be so that $f(p) = 0$. Apply Corollary 17.69 to obtain an open set $G \subseteq \mathbb{R}^d$ and a diffeomorphism $g : G \to g[G]$ so that $p \in g[G] \subseteq \Omega$ and $f \circ g(v_1, \ldots, v_d) = (v_{m+1}, \ldots, v_d)$ for all $(v_1, \ldots, v_d) \in G$. Then for all $(v_1, \ldots, v_d) \in g^{-1}[f^{-1}(0) \cap g[G]]$ we obtain $f \circ g(v_1, \ldots, v_d) = 0$, which means $v_{m+1} = \cdots = v_d = 0$. With $U := g[G]$, $V = G$ and $h = g^{-1}$ we see that, because $p$ was arbitrary, $f^{-1}(0)$ is an m-dimensional embedded manifold.
Proposition 19.6 and Theorem 19.7 provide a multitude of concrete examples.
Example 19.8 Examples of (embedded) manifolds.
1. Trivially, every open subset $\Omega$ of $\mathbb{R}^d$ is a d-dimensional manifold with atlas $\{i_\Omega\}$, where $i_\Omega(p) = p$ for all $p \in \Omega$. This observation is trivial, but it will help with integration over embedded manifolds.
2. Let $r > 0$. Because $f : \mathbb{R}^d \to \mathbb{R}$ defined by $f(x_1, \ldots, x_d) := r^2 - \sum_{j=1}^{d} x_j^2$ is infinitely differentiable and $Df(p) \ne 0$ whenever $f(p) = 0$, every sphere $\{p \in \mathbb{R}^d : f(p) = 0\}$ centered around the origin is a (d - 1)-dimensional manifold.
3. A function $a : \mathbb{R}^d \to \mathbb{R}$ is called affine linear iff there is a nonzero linear function $L : \mathbb{R}^d \to \mathbb{R}$ and an $r \in \mathbb{R}$ so that $a(x) = r + L[x]$ for all $x \in \mathbb{R}^d$. Because affine linear functions are infinitely differentiable, the intersection of any hyperplane $\{p \in \mathbb{R}^d : a(p) = 0\}$ with an open set is a (d - 1)-dimensional manifold.
4. More generally, the level surfaces $f(x) = k$ of any infinitely differentiable function $f : \mathbb{R}^d \to \mathbb{R}$ with nonzero derivatives $Df(x)$ are (d - 1)-dimensional manifolds. Under mild hypotheses on how the surfaces intersect, for $n < d$ the intersection of $n$ level surfaces $f_1(x) = k_1, \ldots, f_n(x) = k_n$ is a (d - n)-dimensional manifold, because we can use $F(x) := (f_1(x) - k_1, \ldots, f_n(x) - k_n)$ and apply Proposition 19.6 and Theorem 19.7.
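To make the last item concrete, here is a hedged sketch of the rank check for an intersection of two level surfaces in $\mathbb{R}^3$. The particular functions $f_1$, $f_2$ are chosen here for illustration only and do not come from the text.

```latex
% Assumed example in \mathbb{R}^3 (d = 3, n = 2):
% f_1(x,y,z) := x^2 + y^2 + z^2 with k_1 = 1 (unit sphere),
% f_2(x,y,z) := z               with k_2 = 0 (horizontal plane).
\[
  F(x,y,z) := \bigl( x^2 + y^2 + z^2 - 1, \; z \bigr),
  \qquad
  DF(x,y,z) =
  \begin{pmatrix}
    2x & 2y & 2z \\
    0  & 0  & 1
  \end{pmatrix}.
\]
% On F^{-1}(0) we have z = 0 and x^2 + y^2 = 1, so (x,y) \neq (0,0) and
\[
  DF(x,y,0) =
  \begin{pmatrix}
    2x & 2y & 0 \\
    0  & 0  & 1
  \end{pmatrix}
  \quad \text{has rank } 2 = d - m \text{ with } m = 1 .
\]
% Theorem 19.7 then gives that F^{-1}(0), the unit circle in the plane z = 0,
% is a 1-dimensional embedded manifold in \mathbb{R}^3.
```

The "mild hypothesis" in this instance is exactly that the two rows of $DF$ are linearly independent at every point of the intersection.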
Exercise 19-2 provides further examples. We can also obtain examples of manifolds by considering subspaces.
Definition 19.9 Let $M$ be a manifold and let $U \subseteq M$ be an open subset of $M$. Then $U$ is called an open submanifold of $M$.
Exercise 19-3 shows that open submanifolds are indeed manifolds themselves.
Some interesting sets cannot be described as manifolds. For example, the closed ball $\overline{B_1(0)} \subset \mathbb{R}^d$ (with respect to the Euclidean norm) is not a manifold, because none of the boundary points has a neighborhood that is isomorphic to $\mathbb{R}^d$. The observation that these points have neighborhoods that are isomorphic to a half-space leads to the idea of a manifold with boundary.
Definition 19.10 Let $m \in \mathbb{N}$. A metric space $M$ is called an m-dimensional (topological) manifold with boundary iff for each point $p \in M$ there is an open neighborhood $O \subseteq M$ of $p$ that is homeomorphic to Euclidean space $\mathbb{R}^m$ or to the upper half space $H^m := \{(x_1, \ldots, x_m) \in \mathbb{R}^m : x_m \ge 0\}$. Points $p \in M$ that do not have neighborhoods isomorphic to $\mathbb{R}^m$ are also called boundary points and the set of all these points is called the boundary $\partial M$ of $M$.
To define atlases for manifolds with boundary, we define the following.
Definition 19.11 Let $\Omega \subseteq \mathbb{R}^d$ be an open subset of $\mathbb{R}^d$ and let $B \subseteq \mathbb{R}^d$ be so that $\Omega \subseteq B \subseteq \overline{\Omega}$. Then $C^k(B)$ denotes the set of restrictions to $B$ of functions that are k times differentiable on a neighborhood of $B$. Similarly, $C^\infty(B)$ denotes the set of restrictions to $B$ of functions that are infinitely differentiable on a neighborhood of $B$.
Because the ranges of the coordinate systems of manifolds with boundary are sets as in Definition 19.11, atlases for manifolds with boundary, $C^k$-manifolds with boundary and $C^\infty$-manifolds with boundary are defined similar to atlases for manifolds, $C^k$-manifolds, and $C^\infty$-manifolds, respectively. We will also refer to $C^\infty$-manifolds with boundary simply as manifolds with boundary. Exercise 19-4 shows that the boundary of a manifold with boundary is a manifold.
As for manifolds, subsets of Rd are of particular interest.
Definition 19.12 Let $m, d \in \mathbb{N}$ be so that $m \le d$. A set $M \subseteq \mathbb{R}^d$ is called an m-dimensional embedded manifold with boundary iff for every $p \in M$ there is an open set $U \subseteq \mathbb{R}^d$ containing $p$ and an open set $V \subseteq \mathbb{R}^d$ such that either
1. There is a diffeomorphism $h : U \to V$ so that $h(U \cap M) = \{v \in V : v_{m+1} = \cdots = v_d = 0\}$, or
2. There is a diffeomorphism $h : U \to V$ so that the mth component of $h(p)$ is zero and $h(U \cap M) = \{v \in V : v_m \ge 0, v_{m+1} = \cdots = v_d = 0\}$.
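As a sketch of how the two alternatives in Definition 19.12 play out, consider the region on or above the graph of a smooth function in the plane. The function $\varphi$ and the maps $h$ below are assumptions introduced for illustration.

```latex
% Assumed example: \varphi : \mathbb{R} \to \mathbb{R} infinitely differentiable,
% M := \{(a,b) \in \mathbb{R}^2 : b \ge \varphi(a)\}, so d = m = 2.
%
% Case 1: p = (a,b) with b > \varphi(a).
% The set \{b > \varphi(a)\} is open, so there is an open ball U around p
% with U \subseteq M. With V := U and h := \mathrm{id}_U we get
\[
  h(U \cap M) = U = \{ v \in V \},
\]
% which is condition 1, because for m = d there are no coordinates
% v_{m+1}, \ldots, v_d to set to zero.
%
% Case 2: p = (a, \varphi(a)).
% With U := V := \mathbb{R}^2 and
\[
  h(a,b) := \bigl(a, \; b - \varphi(a)\bigr),
  \qquad
  h^{-1}(v_1,v_2) = \bigl(v_1, \; v_2 + \varphi(v_1)\bigr),
\]
% the second (that is, the m-th) component of h(p) is zero and
\[
  h(U \cap M) = \{ v \in V : v_2 \ge 0 \},
\]
% which is condition 2.
```

In this example the points that need the second alternative are exactly the points on the graph of $\varphi$.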
The reader will prove in Exercise 19-5 that every embedded manifold with boundary is a manifold with boundary.
Corners cannot be described with differentiable functions. Therefore, some interesting sets, such as the cube $[0, 1]^d$, do not have a satisfactory description as embedded manifolds with boundary (see Exercise 19-13b). To include these sets, we define manifolds with corners similar to manifolds with boundary.
Definition 19.13 Let $m \in \mathbb{N}$. A metric space $M$ is called an m-dimensional (topological) manifold with corners iff each $p \in M$ has an open neighborhood $O \subseteq M$ that is homeomorphic to a subspace $C_k := \{(x_1, \ldots, x_m) \in \mathbb{R}^m : x_k \ge 0, \ldots, x_m \ge 0\}$ with $k \in \{1, \ldots, m\}$ or to $\mathbb{R}^m$. Points $p \in M$ that do not have neighborhoods isomorphic to $\mathbb{R}^m$ are also called boundary points and the set of all these points is called the boundary $\partial M$ of $M$.
Atlases for manifolds with corners, $C^k$-manifolds with corners and $C^\infty$-manifolds with corners are defined similar to atlases for manifolds with boundary, $C^k$-manifolds with boundary, and $C^\infty$-manifolds with boundary, respectively. We will also refer to $C^\infty$-manifolds with corners simply as manifolds with corners. For a $C^\infty$-manifold with corners, we will say that a point $p \in M$ is contained in a corner iff there is a homeomorphism $x$ from a neighborhood of $p$ to a space $C_k$ with $k < m$ and $x(p) = 0$. Embedded manifolds with corners are touched upon in Exercise 19-6. We should also note that, formally, every manifold with boundary is also a manifold with corners. We will typically not unify the two concepts, even when proving results valid for manifolds with corners, because “manifold with corners” should explicitly indicate the presence of corners, while a manifold with boundary is assumed to be smooth.
Exercises
19-1. Let $M$ be a metric space, let $p \in M$, let $O \subseteq M$ be an open neighborhood of $p$, let $V \subseteq \mathbb{R}^m$ be open and let $x : O \to V$ be a homeomorphism. Prove that there is an open neighborhood $U$ of $p$ and a homeomorphism $y : U \to \mathbb{R}^m$.
Hint. Consider the restriction of $x$ to the inverse image of a small ball around $x(p)$ and then map that ball diffeomorphically to $\mathbb{R}^m$.
19-2. More examples of manifolds
(a) Let $a, b, c \in (0, \infty)$. Prove that the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ in $\mathbb{R}^3$ is a 2-dimensional manifold. Also construct an atlas.
(b) Let $\Omega \subseteq \mathbb{R}^2$ and let $f : \Omega \to \mathbb{R}$ be infinitely differentiable with $Df(x) \ne 0$ for all $x \in \Omega$. Prove that every level curve $f(x, y) = k$ is a one-dimensional manifold.
(c) Prove that $SL(n, \mathbb{R}) := \{A \in M(n \times n, \mathbb{R}) : \det(A) = 1\}$ is an $n^2 - 1$ dimensional manifold.
19-3. Prove that if M is a manifold and U is an open submanifold, then U is also a manifold.
19-4. Prove that if $M$ is a manifold with boundary, then $\partial M$ is a manifold.
19-5. Let $M \subseteq \mathbb{R}^d$ be an embedded manifold with boundary.
(a) Prove that $M$ is a manifold with boundary.
(b) Prove that the two conditions in the definition of an embedded manifold with boundary are mutually exclusive. That is, every point of $M$ satisfies either 1 or 2, but not both.
Hint. Assume a point satisfies both and use Corollary 17.66.
(c) Prove that the points of $M$ that satisfy condition 2 are the boundary points of $M$.
(d) Prove that if $M$ is the closure of an open subset of $\mathbb{R}^d$, then $\partial M = \delta M$, that is, the boundary of $M$ as a manifold with boundary equals its topological boundary.
(e) Prove that $\partial M$ is an embedded manifold.
19-6. Define embedded manifolds with corners and prove results similar to those in Exercise 19-5.
19-7. Prove that if $M$ is a manifold with corners, then $\partial M$ is a union of manifolds with corners.
19-8. Prove that every connected manifold is pathwise connected.
19-9. Prove that every manifold is locally compact and that every connected manifold is $\sigma$-compact.
19-10. Let $\Omega_1, \Omega_2 \subseteq \mathbb{R}^m$ be open sets. Prove that if $g : \Omega_1 \to \Omega_2$ is a $C^1$-diffeomorphism, then $\det(Dg(x)) \ne 0$ for all $x \in \Omega_1$.
19-11. Prove that a set $M \subseteq \mathbb{R}^d$ is an m-dimensional embedded manifold in $\mathbb{R}^d$ iff for each $p \in M$ there are an open neighborhood $G \subseteq \mathbb{R}^d$, an open set $N \subseteq \mathbb{R}^m$ and an injective infinitely differentiable function $f : N \to G$ so that $f[N] = M \cap G$, $Df(z)$ has rank $m$ for all $z \in N$ and $f^{-1} : f[N] \to N$ is continuous.
Hint. For “$\Leftarrow$,” let $x \in N$ be so that $f(x) = p$ and let the vectors $v_{m+1}, \ldots, v_d \in \mathbb{R}^d$ be so that the set $\left\{ \frac{\partial}{\partial x_1} f(x), \ldots, \frac{\partial}{\partial x_m} f(x), v_{m+1}, \ldots, v_d \right\}$ is a base of $\mathbb{R}^d$. Then apply Corollary 17.66 to the function $F(z_1, \ldots, z_d) = f(z_1, \ldots, z_m) + \sum_{j=m+1}^{d} z_j v_j$.
19-12. Let $\Omega \subseteq \mathbb{R}^2$ be open. Prove that no function $f : \Omega \to \{p \in \mathbb{R}^2 : \|p\|_2 = 1\}$ can be bijective, continuously differentiable and so that all $Df(x)$ are injective.
Hint. Use an idea similar to that for Exercise 19-11 to prove that $f^{-1}$ is continuous.
19-13. Corners cannot be described with embedded manifolds. Let $d \ge 2$.
(a) Prove that the graph $\{(t, |t|, 0, \ldots, 0) : t \in (-1, 1)\}$ of the absolute value function is not an embedded manifold in $\mathbb{R}^d$.
Hint. Use Exercise 19-11.
(b) Prove that the cube $[0, 1]^d$ is not an embedded manifold with boundary in $\mathbb{R}^d$.
Hint. Use Exercise 19-5e and arguments similar to part 19-13a.
19-14. Let $M$ be a manifold with corners.
(a) Prove that if $p \in M$ is in a corner isomorphic to the origin in $C_k$, then there are continuous functions $c_k, \ldots, c_m : [0, 1] \to M$ so that each $c_i[[0, 1]]$ is contained in a corner and the intersection of any two distinct sets $c_i[[0, 1]]$ is $\{p\}$.
(b) Prove that the set $C_M := \{p \in M : p \text{ is in a corner}\}$ is closed.
(c) Prove that if $M$ is compact, then there are finitely many continuous functions $c_1, \ldots, c_n$ so that $C_M = \bigcup_{i=1}^{n} c_i[[0, 1]]$.
19-15. Prove that if $\mathcal{A}$ is an atlas of the manifold $M$, then the set $\bigcup \{\mathcal{A}' : \mathcal{A} \subseteq \mathcal{A}' \text{ and } \mathcal{A}' \text{ is an atlas of } M\}$ is also an atlas of $M$.
Note. This atlas is called the maximal atlas of $M$.
19.2 Tangent Spaces and Differentiable Functions
The definition of differentiable functions on manifolds (see Definition 19.14 below)
is motivated by the fact that compositions of differentiable functions are again differentiable. However, rather than using differentiability of the factors to obtain differentiability of the composition, differentiability of the composition is used to define
differentiability of the middle factor. Exercise 19-16a shows that this idea is consistent
with the idea of differentiability according to Definition 17.24.
Definition 19.14 Let $M, N$ be manifolds and let $f : M \to N$ be a function. Then $f$ is called differentiable iff for all $x$ in the atlas of $M$ and all $y$ in the atlas of $N$ the composition $y \circ f \circ x^{-1}$ is differentiable.
Note that Definition 19.14 does not depend on the atlases used for $M$ and $N$, as long as there is a containment relation between the atlases for each manifold. This is because if $x_1$ and $x_2$ are coordinate systems of $M$ with overlapping domains, then the composition $x_1 \circ x_2^{-1}$ is differentiable and similar for coordinate systems $y_1, y_2$ for $N$. Therefore differentiability of $y_1 \circ f \circ x_1^{-1}$ implies differentiability of $y_2 \circ f \circ x_2^{-1}$ on a subset of its domain and the domain of $y_2 \circ f \circ x_2^{-1}$ can be pieced together from overlaps with domains of similar compositions using functions $x_1$ and $y_1$ from smaller atlases of $M$ and $N$, respectively. The details are left to Exercise 19-16b. Proposition 19.15 and Exercise 19-17 show that differentiable functions on manifolds behave as we would expect differentiable functions to behave when it comes to domain restrictions and compositions.
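A small worked check of Definition 19.14 may help here. The manifold, its charts and the function below are assumptions chosen for illustration (the unit circle with its two stereographic charts and the height function); they are not taken from the text.

```latex
% Assumed setting: M = S^1 = \{(a,b) : a^2 + b^2 = 1\} with charts
% x_N(a,b) = a/(1-b) on S^1 \setminus \{(0,1)\} and
% x_S(a,b) = a/(1+b) on S^1 \setminus \{(0,-1)\};
% N = \mathbb{R} with atlas \{\mathrm{id}_{\mathbb{R}}\}; f(a,b) := b.
%
% Inverse charts:
\[
  x_N^{-1}(t) = \left( \frac{2t}{t^2+1}, \; \frac{t^2-1}{t^2+1} \right),
  \qquad
  x_S^{-1}(s) = \left( \frac{2s}{s^2+1}, \; \frac{1-s^2}{s^2+1} \right).
\]
% Compositions y \circ f \circ x^{-1} with y = \mathrm{id}_{\mathbb{R}}:
\[
  f \circ x_N^{-1}(t) = \frac{t^2-1}{t^2+1},
  \qquad
  f \circ x_S^{-1}(s) = \frac{1-s^2}{s^2+1}.
\]
% Both are quotients of polynomials with denominators that never vanish,
% hence infinitely differentiable on \mathbb{R}.
```

Since every chart composition is differentiable, the height function is differentiable on the circle in the sense of the definition, under the stated choice of atlases.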
Proposition 19.15 Let $M, N$ be manifolds, let $U \subseteq M$ be an open submanifold and let $f : M \to N$ be a differentiable function. Then $f|_U : U \to N$ also is differentiable.
Proof. Exercise 19-18.
Before we show that there are indeed many examples of differentiable functions,
we introduce higher orders of differentiability. Throughout, unless otherwise stated,
the dimension of a manifold M will be m and the dimension of a manifold N will be n.
Definition 19.16 Let $M, N$ be manifolds, let $k \in \mathbb{N}$ and let $U \subseteq M$ be open. Then $f : M \to N$ is a $C^k$ function on $U$ iff for all coordinate systems $x : V \to \mathbb{R}^m$ and $y : W \to \mathbb{R}^n$ the composition $y \circ f \circ x^{-1}|_{x[U \cap V \cap f^{-1}[W]]}$ is a $C^k$ function. A function that is $C^k$ for all $k \in \mathbb{N}$ is called $C^\infty$. A bijective $C^k$ function whose inverse also is $C^k$ is called a $C^k$-diffeomorphism.
Proposition 19.17 Let $M$ be a manifold. A function $f : M \to \mathbb{R}$ is $C^\infty$ iff for each $p \in M$ there is a neighborhood $U_p$ of $p$ so that $f$ is $C^\infty$ on $U_p$.
Proof. Exercise 19-20.
The next result is a translation of Lemma 18.11 to manifolds.
Theorem 19.18 Let $M$ be a manifold, let $C \subseteq M$ be compact and let $U \subseteq M$ be open so that $C \subseteq U$. Then there is a $C^\infty$ function $f : M \to [0, 1]$ so that $f|_C = 1$ and $\mathrm{supp}(f) \subseteq U$.
Proof. Let $p \in C$ and let $x : V \to \mathbb{R}^m$ be a coordinate system around $p$. Then there is an $\varepsilon_p > 0$ so that $B_{\varepsilon_p}(p) \subseteq U \cap V$. Because $x^{-1}$ is continuous, the image
UPQd :=
[ I
[T- 1
x B q ( p ) is open and CRd := x B 9 ( p ) n C is closed. With lclWd,uRd
as
o x(q) for all q E U n V and let f p ( q ) := 0
in Lemma 18.11 let f p ( q ) := lcRd,uRd
for all other 4 E M . It is easy to see that supp(fp) C B 3 ( p ) C U .
3
To prove that $f_p$ is $C^\infty$, let $q \in M$. If $f_p$ is identical to zero in a neighborhood of $q$, then clearly $f_p$ is $C^\infty$ in that neighborhood. If $f_p$ is not identical to zero in any
neighborhood of q , then q E supp( f p ) C B q ( p ) . Let y be any coordinate system
around 4 and let
cause
E
E
3
E 0, 2 be so that B,(q) is contained in the domain of y . Be-
&P
-= and 4
?
(
E
supp(f p ) , B,(4) is also contained in the domain of x. Thus
is Coo. Because 4
E
M was
u
arbitrary, f p is C f f i .
Because C C
B y ( p ) and C is compact, there are points P I , . . . , pn E C so
u
PEC
n
that C C
j=1
n
f p j is Coo,glc 1 1 and supp(g) C: U . Hence,
B&pj
- ( p j ) . Now g :=
’
j=1
f := j1 o g (see Lemma 18.9 for the definition of j , ) is as desired.
rn
Figure 49: If $M$ is an embedded manifold and $h$ is as in Definition 19.5, then the derivative of $h^{-1}$ maps the horizontal space into which $h[U \cap M]$ is embedded to a space that is tangential to $M$.
Although Definition 19.14 defines differentiability on a manifold, it is not satisfactory by itself. After all, differentiable functions have a derivative and Definition 19.14 does not tell us what the derivative is or what it should be. The problem is that a manifold does not have any linear structure. Without linear structure there is no way to shift a linear function $L[\cdot - x]$ so that $f(x) + L[\cdot - x]$ is locally a good approximation of $f$, as we did in Definition 17.24 (also see Exercise 17-25). In fact, we cannot even define linear functions, which means on a manifold we must start from scratch.
For embedded manifolds, there is a notion of differentiability in the surrounding
space. Hence, we will use embedded manifolds as guidance. Throughout, we will
make sure that our newly defined notions are consistent with what we should expect
for embedded manifolds.
So consider an embedded manifold $M \subseteq \mathbb{R}^d$ and let $h : U \to V$ be as in Definition 19.5. Then for all $p \in U \cap M$ the function $h^{-1}$ is differentiable at $h(p)$ and near $p$ the affine function $h^{-1}(h(p)) + Dh^{-1}(h(p))[\cdot - h(p)]$ is tangential to $h^{-1}$ (see Figure 49). With $\mathbb{R}^m \times \{0\}^{d-m} := \{v \in \mathbb{R}^d : v_{m+1} = \cdots = v_d = 0\}$ there is an open set $V$ in $\mathbb{R}^d$ so that $h$ maps $U \cap M$ to the set $V \cap (\mathbb{R}^m \times \{0\}^{d-m})$. It follows directly from the definition of differentiability that for every $\varepsilon > 0$ there is a $\delta > 0$ so that for all $z \in V \cap (\mathbb{R}^m \times \{0\}^{d-m})$ with $\|z - h(p)\| < \delta$ we have $\|h^{-1}(z) - h^{-1}(h(p)) - Dh^{-1}(h(p))[z - h(p)]\| \le \varepsilon \|z - h(p)\|$. Therefore, the function $p + Dh^{-1}(h(p))[\cdot - h(p)]|_{\mathbb{R}^m \times \{0\}^{d-m}}$ is tangential to $h^{-1}|_{V \cap (\mathbb{R}^m \times \{0\}^{d-m})}$. Geometrically, this means that the set of points $p + Dh^{-1}(h(p))[\mathbb{R}^m \times \{0\}^{d-m}]$ is the tangent space of $M$ at $p$ (see Figure 49). It can be shown via the Chain Rule that this tangent space does not depend on the choice of $h$ (see Exercise 19-19). Moreover, if $M$ and $N$ are embedded manifolds and $f$ is a differentiable function on a neighborhood of $M$ so that $f[M] \subseteq N$, then the derivative of $f$ at $p$ maps the above defined tangent space of $M$ at $p$ to the similarly defined tangent space of $N$ at $f(p)$.
This means we can reinterpret derivatives as functions that map the right tangent
spaces to each other. But for an abstract manifold there is no surrounding space in
which to define tangent spaces on which the derivative of a function f : M -+ N could
operate. Thus we first need to create tangent spaces. There are numerous ways in which
tangent spaces can be defined in differential geometry and they are all equivalent in a
certain sense. We choose a simple definition here and reinterpret it when this becomes
conducive to the investigation in Section 19.4.
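For embedded manifolds that arise as solution sets, the tangent space described above can be computed without knowing $h$ explicitly. The following is a sketch under the assumption that $M = f^{-1}(0)$ as in Theorem 19.7 and that $h = g^{-1}$ with $g$ as in the proof of that theorem; the identification with the kernel of $Df(p)$ is a standard fact that is added here and not stated in the text.

```latex
% Assumption: f \circ h^{-1}(v_1,\ldots,v_d) = (v_{m+1},\ldots,v_d) near h(p),
% as in the proof of Theorem 19.7 with h = g^{-1}.
% Chain Rule at h(p):
\[
  Df(p) \, Dh^{-1}(h(p)) [v] = (v_{m+1}, \ldots, v_d)
  \qquad \text{for all } v \in \mathbb{R}^d .
\]
% For v \in \mathbb{R}^m \times \{0\}^{d-m} the right side is zero, so
\[
  Dh^{-1}(h(p)) \bigl[ \mathbb{R}^m \times \{0\}^{d-m} \bigr]
  \subseteq \ker Df(p).
\]
% Dh^{-1}(h(p)) is invertible, so the left side has dimension m;
% Df(p) has rank d-m, so \ker Df(p) has dimension m as well. Hence equality.
%
% Example (sphere of radius r): f(x) = r^2 - \sum_{j=1}^d x_j^2 gives
\[
  Df(p)[v] = -2 \langle p, v \rangle ,
  \qquad
  p + \ker Df(p) = p + \{ v \in \mathbb{R}^d : \langle p, v \rangle = 0 \}.
\]
```

In the sphere example this recovers the familiar picture: the tangent space at $p$ is the hyperplane through $p$ that is perpendicular to the radius.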
If $x$ is a coordinate system for the manifold $M$, then for every element $p$ in the domain of $x$ the space $\mathbb{R}^m$ is the tangent space (of $\mathbb{R}^m$ itself) at $x(p)$. This is because the only space that could be tangential to an open set is the surrounding space itself. Proposition 19.19 shows that if $x$ and $y$ are coordinate systems of $M$ and $p$ is in the domains of $x$ and $y$, then the tangent vectors in the tangent spaces at points $x(p)$ and $y(p)$ can be identified in such a way that they are useable as tangent vectors for $M$ at $p$. Proposition 19.20 shows that this idea is consistent with the idea of a tangent space for embedded manifolds as described above.
The remaining results in this section will involve a lot of work with equivalence
relations. Because every point of a manifold can be in the domain of several coordinate
systems, at each step we must assure that our definitions do not depend on the specific
coordinate system we use. This level of detail would be hard to work with on a regular
basis. It would be similar to recalling that formally every real number is an equivalence
class of Cauchy sequences of rational numbers (see remarks after the proof of Theorem 16.89) whenever we work with real numbers. Clearly, this would be overkill.
Therefore, we will establish that tangent spaces and the functions that take the place of
derivatives behave as they should and we will subsequently use these properties, with
the details of the definitions only to be used when this is unavoidable.
Proposition 19.19 Let $M$ be an m-dimensional manifold and let $p \in M$. For all $v, w \in \mathbb{R}^m$ and for all coordinate systems $x, y$ so that $p$ is in the domains of $x$ and $y$, define the relation $\sim_p$ by $(x, v) \sim_p (y, w)$ iff $w = D(y \circ x^{-1})(x(p))[v]$. Then the relation $\sim_p$ is an equivalence relation. Moreover, if we denote the equivalence classes by $[x, v]_p$ and the set of equivalence classes by $M_p$, then the binary operations $[x, v]_p + [x, w]_p := [x, v + w]_p$ and $\alpha [x, v]_p := [x, \alpha v]_p$, where $\alpha \in \mathbb{R}$, are well-defined and with these operations $M_p$ is a vector space that is isomorphic to $\mathbb{R}^m$.
Proof. Left to the reader as Exercise 19-21. The proof that $\sim_p$ is well-defined relies on the formula for the derivative of the inverse function for symmetry and on the Chain Rule for transitivity.
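The two ingredients named in the proof can be written out as follows. This is only a sketch of the relation check for coordinate systems $x, y, z$ around $p$ and vectors $u, v, w \in \mathbb{R}^m$; the vector space part of the exercise is not covered.

```latex
% Reflexivity: x \circ x^{-1} is the identity, so D(x \circ x^{-1})(x(p))[v] = v.
%
% Symmetry: (x \circ y^{-1}) \circ (y \circ x^{-1}) is the identity near x(p),
% so by the Chain Rule
\[
  D(x \circ y^{-1})(y(p)) \, D(y \circ x^{-1})(x(p)) = \mathrm{id}_{\mathbb{R}^m}.
\]
% Hence w = D(y \circ x^{-1})(x(p))[v] implies
\[
  D(x \circ y^{-1})(y(p))[w] = v,
  \qquad \text{that is,} \qquad (y,w) \sim_p (x,v).
\]
%
% Transitivity: if (x,u) \sim_p (y,v) and (y,v) \sim_p (z,w), then
\begin{align*}
  w &= D(z \circ y^{-1})(y(p))[v] \\
    &= D(z \circ y^{-1})(y(p)) \, D(y \circ x^{-1})(x(p))[u] \\
    &= D\bigl( (z \circ y^{-1}) \circ (y \circ x^{-1}) \bigr)(x(p))[u]
     = D(z \circ x^{-1})(x(p))[u],
\end{align*}
% so (x,u) \sim_p (z,w).
```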
Proposition 19.20 Let $M \subseteq \mathbb{R}^d$ be an m-dimensional embedded manifold, let $p \in M$ and let $x : U \to \mathbb{R}^m$ be a coordinate system around $p$ as constructed in Proposition 19.6. Then $x^{-1}$ is differentiable in the sense of Definition 17.24 and the "embedded tangent space" $M_p^{\mathrm{emb}} := \{p\} \times Dx^{-1}(x(p))[\mathbb{R}^m]$ is isomorphic to $M_p$ via the isomorphism $F((p, v)) := [x, (Dx^{-1}(x(p)))^{-1}[v]]_p$.
Proof. Exercise 19-22.
Proposition 19.19 introduces an object $M_p$ that could serve as a tangent space for $M$ at $p$ and Proposition 19.20 shows that for embedded manifolds there is a natural isomorphism between $M_p$ and the space that we would expect to be the tangent space. Thus we call $M_p$ the tangent space of $M$ at $p$.
Definition 19.21 The space $M_p$ of Proposition 19.19 will be called the tangent space of $M$ at $p$. The set $TM := \bigcup_{p \in M} M_p$ is called the tangent bundle of $M$.
Simplistically speaking, at every p E M Definition 19.21 merely tacks a tangent
space on to the manifold. However, even if this was all, this approach is consistent with
an important physical motivation. Vectors in physics have magnitude and direction,
just like vectors in mathematics. But vectors in physics also have a point of action. For
example, consider a car that moves in a straight line at constant speed. The magnitude
and direction of the force that the car exerts on the particles in front of it stay the same,
independent of whether these particles are air or whether they constitute another car.
Obviously, the effect is different in either situation. If the force acts on a set of air
molecules, the car travels regularly. If it acts on another car, the car crashes. To give
vectors in mathematics a point of action, we need to incorporate the point of action into
the definition of the vector. This is done in the definition of T M .
For $(\mathbb{R}^d, \{\mathrm{id}_{\mathbb{R}^d}\})$, note that $\mathbb{R}^d_p$ is just $\{[\mathrm{id}_{\mathbb{R}^d}, v]_p : v \in \mathbb{R}^d\}$, which is consistent with the idea that our vectors now have a point of action $p$. Similarly, for an open set $\Omega \subseteq \mathbb{R}^d$ considered as a manifold $(\Omega, \{i_\Omega\})$ we obtain $\Omega_p = \{[i_\Omega, v]_p : v \in \mathbb{R}^d\}$ for the tangent space. Although these realizations are trivial, they will be very useful when we consider integration over embedded manifolds.
Now that tangent spaces are defined, we can tackle the definition of a "derivative" on $M$. Consider embedded manifolds $M$ and $N$ and a differentiable function $f$ defined on a neighborhood of $M$ so that $f[M] \subseteq N$. Then the Chain Rule implies $Df(p)[Dh^{-1}(h(p))[\mathbb{R}^m \times \{0\}^{d-m}]] = D(f \circ h^{-1})(h(p))[\mathbb{R}^m \times \{0\}^{d-m}]$, which is contained in the tangent space of $N$ at $f(p)$, and it is equal to the tangent space if we assume that $f$ is a diffeomorphism. Therefore derivatives on manifolds should map the tangent space at a point into the tangent space at the image point (see Figure 50). Proposition 19.22 shows that it is possible to define such a mapping on the tangent vectors and Proposition 19.24 shows that this map is consistent with what we expect it to do for embedded manifolds.
Proposition 19.22 Let $M, N$ be manifolds, let the function $f : M \to N$ be differentiable, let $p \in M$, let $x$ be a coordinate system around $p$ and let $y$ be a coordinate system around $f(p)$. Then $f_{*p}([x, v]_p) := [y, D(y \circ f \circ x^{-1})(x(p))[v]]_{f(p)}$ defines a linear function $f_{*p} : M_p \to N_{f(p)}$.
Proof. We first need to prove that $f_{*p}$ is well-defined. Let $(x_1, v) \sim_p (x_2, w)$ and let $y_1$ and $y_2$ be coordinate systems about $f(p)$. Then
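One way the well-definedness check can be carried out, using only the Chain Rule and the definition of the relation in Proposition 19.19, is sketched below. It is a reconstruction under those definitions and is not claimed to match the book's own display.

```latex
% Given: (x_1, v) \sim_p (x_2, w), that is, w = D(x_2 \circ x_1^{-1})(x_1(p))[v].
% Goal: the two candidate images are related at f(p), that is,
%   D(y_2 \circ f \circ x_2^{-1})(x_2(p))[w]
%     = D(y_2 \circ y_1^{-1})(y_1(f(p))) \bigl[ D(y_1 \circ f \circ x_1^{-1})(x_1(p))[v] \bigr].
\begin{align*}
  D(y_2 \circ f \circ x_2^{-1})(x_2(p))[w]
  &= D(y_2 \circ f \circ x_2^{-1})(x_2(p)) \, D(x_2 \circ x_1^{-1})(x_1(p))[v] \\
  &= D(y_2 \circ f \circ x_1^{-1})(x_1(p))[v] \\
  &= D\bigl( (y_2 \circ y_1^{-1}) \circ (y_1 \circ f \circ x_1^{-1}) \bigr)(x_1(p))[v] \\
  &= D(y_2 \circ y_1^{-1})(y_1(f(p))) \bigl[ D(y_1 \circ f \circ x_1^{-1})(x_1(p))[v] \bigr].
\end{align*}
% Therefore
\[
  \bigl( y_1, \, D(y_1 \circ f \circ x_1^{-1})(x_1(p))[v] \bigr)
  \sim_{f(p)}
  \bigl( y_2, \, D(y_2 \circ f \circ x_2^{-1})(x_2(p))[w] \bigr).
\]
```

Linearity then follows because, for fixed $x$ and $y$, the map $v \mapsto D(y \circ f \circ x^{-1})(x(p))[v]$ is linear and the operations on $M_p$ and $N_{f(p)}$ are defined through representatives.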
Figure 50: The tangent space $M_p$ of a manifold $M$ at a point $p$ (see Definition 19.21) can be viewed as a tangential plane attached at $p$. For embedded manifolds, $M_p$ can be considered to be the image of $\mathbb{R}^m$ under the derivative of the right parametrization (see Proposition 19.20). For a differentiable function $f : M \to N$ the map $f_*$ (see Proposition 19.22) plays the role of the derivative. If $M$ and $N$ are both embedded manifolds and $f$ is a differentiable function from a neighborhood of $M$ to a neighborhood of $N$, then $f_*$ combines the function and the derivative (see Proposition 19.24).
Definition 19.23 We denote the function from $TM$ to $TN$ whose restriction to each $M_p$ is $f_{*p}$ by $f_*$. Moreover, unless necessary, we will not explicitly mention $p$ and thus denote $f_{*p}$ by $f_*$, too.
Proposition 19.24 Let $M, N$ be embedded manifolds, let $p \in M$ and let $f$ be a differentiable function from a neighborhood of $M$ to a neighborhood of $N$. Then for all $(p, v) \in M_p^{\mathrm{emb}}$ the function $f_*^{\mathrm{emb}}((p, v)) := (f(p), Df(p)[v])$ satisfies the equation $f_* = F_N \circ f_*^{\mathrm{emb}} \circ F_M^{-1}$, where $F_M$ and $F_N$ are the isomorphisms from Proposition 19.20 for $M$ and $N$, respectively.
Proof. Let $x$ be the coordinate system around $p$ that is used to construct $F_M$ and let $y$ be the coordinate system around $f(p)$ that is used to construct $F_N$. Then
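A possible way to carry out this computation is sketched here. It assumes, in addition to the stated hypotheses, that $f[M] \subseteq N$, and it writes $u := (Dx^{-1}(x(p)))^{-1}[v]$ so that $F_M((p, v)) = [x, u]_p$. It is a reconstruction and may differ from the book's display.

```latex
% Because f \circ x^{-1} = y^{-1} \circ (y \circ f \circ x^{-1}) near x(p),
% the Chain Rule gives
\[
  Df(p) \, Dx^{-1}(x(p))
  = Dy^{-1}\bigl( y(f(p)) \bigr) \, D(y \circ f \circ x^{-1})(x(p)).
\]
% With v = Dx^{-1}(x(p))[u]:
\begin{align*}
  F_N \circ f_*^{\mathrm{emb}} \bigl( (p,v) \bigr)
  &= F_N \bigl( ( f(p), \, Df(p)[v] ) \bigr) \\
  &= \Bigl[ y, \; \bigl( Dy^{-1}(y(f(p))) \bigr)^{-1}
       \bigl[ Df(p) \, Dx^{-1}(x(p))[u] \bigr] \Bigr]_{f(p)} \\
  &= \bigl[ y, \; D(y \circ f \circ x^{-1})(x(p))[u] \bigr]_{f(p)} \\
  &= f_* \bigl( [x,u]_p \bigr)
   = f_* \circ F_M \bigl( (p,v) \bigr).
\end{align*}
% Composing with F_M^{-1} on the right yields
% f_* = F_N \circ f_*^{\mathrm{emb}} \circ F_M^{-1}.
```

Here $(Dy^{-1}(y(f(p))))^{-1}$ is understood as the inverse on the image of the injective linear map $Dy^{-1}(y(f(p)))$, as in Proposition 19.20.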
As noted in the beginning, the details of chasing these compositions through the equivalence classes are too cumbersome to do on a regular basis. To avoid this level of detail, we accept that $TM$ really defines the tangent spaces and that $f_*$ is the "derivative" of a function $f : M \to N$, and we subsequently use the fundamental properties of these entities rather than their definitions.
For manifolds with boundary and manifolds with corners, we can also define tangent spaces. Because differentiable functions on sets that are not closed are restrictions of differentiable functions on larger sets, the derivatives $D\left(y \circ x^{-1}\right)$ are also defined for boundary points. This means we can say the following.
Definition 19.25 If $M$ is a manifold with boundary or a manifold with corners, the tangent bundle $TM$ is defined in the same way as for manifolds.
For points on the boundary of the manifold, there are two kinds of tangent vectors.
Definition 19.26 Let $M$ be a manifold with boundary and let $p \in \partial M$. A tangent vector $[x, v]_p$ will be called outward pointing iff $v_m < 0$ and it will be called inward pointing iff $v_m > 0$. For $p$ being in the boundary of a manifold with corners the vector $[x, v]_p$ will be called outward pointing iff $x(p) + v \notin C_k^m$. The vector $[x, v]_p$ will be called inward pointing iff $x(p) + v \in \left(C_k^m\right)^\circ$.
With tangent spaces defined, it is now easy to define (tangential) vector fields.
Definition 19.27 Let $M$ be a manifold. A function $F : M \to TM$ so that $F(p) \in M_p$ for all $p \in M$ is called a vector field on $M$.
Finally, note that even though we assumed throughout that our manifolds were $C^\infty$ manifolds, everything in this section can be defined and proved for $C^1$ manifolds.
Exercises
19-16. Consistency of Definition 19.14 with the original definition of differentiability and with itself.
(a) Let $\Omega_1, \Omega_1' \subseteq \mathbb{R}^m$ and $\Omega_2, \Omega_2' \subseteq \mathbb{R}^n$ be open sets, let $f : \Omega_1 \to \Omega_2$ be a function and let $\varphi : \Omega_1 \to \Omega_1'$ and $\psi : \Omega_2 \to \Omega_2'$ be differentiable bijective functions with differentiable inverse. Prove that $f$ is differentiable iff $\psi \circ f \circ \varphi^{-1}$ is differentiable.
(b) Let $M$ be a manifold with atlas $\mathcal{A}_M$, let $N$ be a manifold with atlas $\mathcal{A}_N$ and let $f : M \to N$ be so that for all $x_A \in \mathcal{A}_M$ and all $y_A \in \mathcal{A}_N$ the composition $y_A \circ f \circ x_A^{-1}$ is differentiable. Let $\tilde{\mathcal{A}}_M \supseteq \mathcal{A}_M$ and $\tilde{\mathcal{A}}_N \supseteq \mathcal{A}_N$ be atlases of $M$ and $N$, let $x : U \to \mathbb{R}^m$ be in $\tilde{\mathcal{A}}_M$ and let $y : V \to \mathbb{R}^n$ be in $\tilde{\mathcal{A}}_N$ so that the composition $y \circ f \circ x^{-1}$ has nonempty domain. Prove that $y \circ f \circ x^{-1}$ is differentiable.
19-17. Let $M, N, O$ be manifolds and let $f : M \to N$ and $g : N \to O$ be differentiable. Prove that $g \circ f$ is differentiable and that $(g \circ f)_* = g_* \circ f_*$.
19-18. Prove Proposition 19.15.
19-19. Prove that if $M$ is an embedded manifold and $h : U \to V$ and $k : \tilde{U} \to \tilde{V}$ are as in Definition 19.5, then $Dh^{-1}(h(p))\left[\mathbb{R}^m \times \{0\}^{d-m}\right] = Dk^{-1}(k(p))\left[\mathbb{R}^m \times \{0\}^{d-m}\right]$.
19-20. Prove Proposition 19.17.
19-21. Prove Proposition 19.19. That is, prove that the relation $\sim_p$ is an equivalence relation and that the vector addition and scalar multiplication are well-defined.
19-22. Prove Proposition 19.20.
19-23. Let $M$ be a manifold. Prove that if $\mathrm{id} : M \to M$ is the identity, then $\mathrm{id}_* : TM \to TM$ also is the identity.
19-24. Let $M, N$ be manifolds and let $U \subseteq M$ be an open submanifold of $M$.
(a) Prove that the tangent bundle $TU = \bigcup_{p \in U} U_p$ of $U$ is equal to $\bigcup_{p \in U} M_p$.
(b) Prove that $(f|_U)_* = f_*|_{TU}$ for all differentiable functions $f : M \to N$.
19-25. Let $M$ be an $m$-dimensional manifold. Prove that $TM$ is a $2m$-dimensional manifold.
19-26. Prove that if $M \subseteq \mathbb{R}^d$ is an embedded $C^\infty$ manifold and $\Omega$ is an open neighborhood of $M$, then for every $f \in C^k(\Omega)$ the restriction $f|_M$ is $C^k$ on $M$.
19-27. Let $M$ be a connected manifold, let $C \subseteq M$ be closed (but not necessarily compact) and let $U \subseteq M$ be open so that $C \subseteq U$. Prove that there is a $C^\infty$ function $f : M \to [0, 1]$ so that $f|_C = 1$ and $\mathrm{supp}(f) \subseteq U$.
19.3 Differential Forms, Integrals Over the Unit Cube
As we start our investigation of integration on manifolds, we first note the following
three shortcomings of the tools we currently have available.
First, although vector fields as in Definition 19.27 are important, they have a mortal
flaw for applications in fluid mechanics and field theory. Any vector field that is defined
as a map from the manifold into the tangent bundle is necessarily tangential to the
manifold. In fluid mechanics and field theory, manifolds are test surfaces and vector
fields usually go through these test surfaces or at least they have a component that
causes some "transfer" through the surface. Obviously such vector fields cannot be
modeled with the tangent bundle.
Second, a typical application of the interplay between fields and surfaces is the computation of how much "matter" a vector field transfers through a test surface. To compute this quantity, we need to integrate the field over the manifold. In $\mathbb{R}^d$, integrals usually are computed with Fubini's Theorem and we do not give much thought to the fact that the coordinate directions are provided by the standard base. On a manifold there is no standard base and each neighborhood of a point has infinitely many parametrizations via coordinate systems. Thus we cannot just pick one coordinate system and use the standard base in its image space.
Third, lower dimensional objects, like two dimensional surfaces in three dimensional space, typically have measure zero in the higher dimensional space. Therefore, we need a notion of integration that gives nonzero integrals, even if our manifold resides in a higher dimensional space.
The above only reemphasizes that the requisite definitions will require a lot of attention to detail. Differential forms will allow us to model vector fields that are not
necessarily parallel to the manifold. The integral of such a form over a manifold will
be pieced together from simpler integrals. The simplest such integral is the integral of a
differential form over a cube, which is presented in this section. In Section 19.4, cubes in $\mathbb{R}^m$ are lifted to the manifold as $k$-cubes and the differential forms will be $k$-forms on the manifold. Finally, Section 19.5 defines the integral of a form over the whole manifold. Some definitions will be repeated in this presentation, but the insight gained by first working out the details in a simple setting will be well worth it. (Compare with the double coverage of Lebesgue integration in Chapters 9 and 14.) For starters, recall that $\Lambda^k(V)$ denotes the space of alternating $k$-tensors on $V$ (Definition 18.17).
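Definition 19.28 Let $\Omega$ be a subset of $\mathbb{R}^d$. A function $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k\left(\mathbb{R}^d_p\right)$ so that $\omega(p) \in \Lambda^k\left(\mathbb{R}^d_p\right)$ for all $p \in \Omega$ is called a $k$-form on $\Omega$ or simply a differential form. We formally set $\Lambda^0\left(\mathbb{R}^d_p\right) := \mathbb{R}$, so that a 0-form simply is a function.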
Because it is natural to identify each $[\mathrm{id}_{\mathbb{R}^d}, v]_p \in \mathbb{R}^d_p$ with $v$, we will denote tangent vectors to $\mathbb{R}^d$ by single letters in this section. Because alternating $k$-tensors are associated with $k$-dimensional volumes (see Example 18.21) we introduce the following notation for the dual base.
Definition 19.29 Let $\{e_1, \ldots, e_d\}$ be the standard base for $\mathbb{R}^d_p$. We will denote the dual base $\{\pi_1, \ldots, \pi_d\}$ as $\{dx_1, \ldots, dx_d\}$ where $dx_i(e_j) = \pi_i(e_j)$.
The notation is similar to that for integrals because forms are connected to integrals of vector fields as follows. A differentiable function $r : [a, b] \to \mathbb{R}^3$ with $r'(t) \neq 0$ for all $t \in [a, b]$ can be interpreted as describing the position $r(t)$ of a traveling particle at time $t$. If $\Omega$ contains $r\big[[a, b]\big]$ and the vector field $F : \Omega \to T\Omega$ is so that $F(x, y, z)$ describes a force that is acting on a particle at the point $(x, y, z)$, then the work that is done as the particle travels from $r(a)$ to $r(b)$ is the line integral $W = \int_a^b \big\langle F(r(t)), r'(t) \big\rangle \, dt$, where we assume that each $F(r(t))$ was projected back from $\mathbb{R}^3_{r(t)}$ to $\mathbb{R}^3$ via $[\mathrm{id}, v]_{r(t)} \mapsto v$. By Exercise 18-53b, the value of this integral depends only on the geometric shape of $r\big[[a, b]\big]$ and the direction of travel, but not on the speed at which the particle travels from $r(a)$ to $r(b)$ along this path. With $F = (P, Q, R)$ and $r = (r_1, r_2, r_3)$ we can write the integral as
$$W = \int_a^b P(r(t))\, r_1'(t) + Q(r(t))\, r_2'(t) + R(r(t))\, r_3'(t) \, dt.$$
Each $r_i'(t)$ can be obtained from $r'(t)$ by applying the form $dx_i$. The integral is thus also abbreviated as $W = \int_a^b P \, dx_1 + \int_a^b Q \, dx_2 + \int_a^b R \, dx_3$, where $dx_i$ stands for $dx_i\big(r'(t)\big)\, dt$. The form $\omega = P \, dx_1 + Q \, dx_2 + R \, dx_3$ can be interpreted as the differential amount of work that is done as the particle moves a differential step from its current position $r(t)$ to $r(t) + r'(t)\, dt = r(t) + (dx_1, dx_2, dx_3)^T$. The integration formalism that we will define in Section 19.5 will indeed locally reduce the integral of the form $\omega$ to the above line integral.
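The claim that the work integral does not depend on the speed of travel can be checked numerically. The following is a minimal sketch, not part of the text's formalism: the force field $(P, Q, R) = (-y, x, z)$, the helix, the reparametrization $t = \pi(s + s^2)$ and the helper name `work` are all assumptions chosen only for illustration.

```python
import math

def work(F, r, dr, a, b, n=20000):
    """Midpoint-rule approximation of W = int_a^b <F(r(t)), r'(t)> dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        P, Q, R = F(*r(t))
        d1, d2, d3 = dr(t)
        # omega = P dx1 + Q dx2 + R dx3 applied to r'(t): dx_i picks out r_i'(t)
        total += (P * d1 + Q * d2 + R * d3) * h
    return total

# Hypothetical force field (P, Q, R) = (-y, x, z), chosen only for illustration
def F(x, y, z):
    return (-y, x, z)

# One turn of a helix, parametrized by t in [0, 2*pi]
def r1(t):
    return (math.cos(t), math.sin(t), t)

def dr1(t):
    return (-math.sin(t), math.cos(t), 1.0)

# The same path, traversed at a different speed: t = pi * (s + s**2), s in [0, 1]
def r2(s):
    return r1(math.pi * (s + s * s))

def dr2(s):
    # chain rule: r2'(s) = r1'(t(s)) * t'(s) with t'(s) = pi * (1 + 2 s) > 0
    speed = math.pi * (1 + 2 * s)
    return tuple(speed * c for c in dr1(math.pi * (s + s * s)))

W1 = work(F, r1, dr1, 0.0, 2 * math.pi)
W2 = work(F, r2, dr2, 0.0, 1.0)
```

For this particular field and path the integrand in the first parametrization is $1 + t$, so both values should be close to $2\pi + 2\pi^2$.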
Similarly, an injective differentiable function $r : [a, b] \times [c, d] \to \mathbb{R}^3$ so that $\left(\frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y}\right)(x, y) \neq 0$ for all $(x, y) \in [a, b] \times [c, d]$ can be interpreted as a surface in $\mathbb{R}^3$. If we interpret $F$ as a flow field, then (with the right hypotheses) the throughput of $F$ through the surface defined by $r$ is the surface integral
$$\int_a^b \int_c^d \left\langle F(r(x, y)), \left(\frac{\partial r}{\partial x} \times \frac{\partial r}{\partial y}\right)(x, y) \right\rangle \, dy \, dx.$$
Exercise 18-53c shows that this integral also only depends on the geometric shape of the surface as long as the parametrizations respect the orientation of the surface. (Details on the rather subtle notion of orientation will be addressed later.) With the wedge product from Definition 18.20 the factors behind the components of $F$ can be obtained by applying the forms $dx_2 \wedge dx_3$, $dx_3 \wedge dx_1$ and $dx_1 \wedge dx_2$ to the vectors $\frac{\partial r}{\partial x}$ and $\frac{\partial r}{\partial y}$. Thus $\omega = P \, dx_2 \wedge dx_3 + Q \, dx_3 \wedge dx_1 + R \, dx_1 \wedge dx_2$ can be interpreted as the differential throughput of the vector field $F$ through a parallelogram spanned by the differential vectors $\frac{\partial r}{\partial x}\, dx$ and $\frac{\partial r}{\partial y}\, dy$ attached at $r(x, y)$ and located on the surface defined by $r$ (also see Figure 51 on page 442).
Finally, for a scalar function $f : [a, b] \times [c, d] \times [l, h] \to \mathbb{R}$ the integral is $\int_l^h \int_c^d \int_a^b f(x, y, z) \, dx \, dy \, dz$. This time there is no extra factor involved. The form $f(p) \, dx \wedge dy \wedge dz$ can be interpreted as the differential contribution of $f(p)$ to the overall integral. To stay consistent with the above, we could say that the box is parametrized by the identity. The scaling factor, which is 1, is obtained by applying $dx \wedge dy \wedge dz$ to the partial derivatives of the identity with respect to $x$, $y$, and $z$.
The above indicates that differential forms should allow us to describe line integrals of vector fields, surface integrals of vector fields over surfaces with rectangular
parameter domain, and integrals of scalar functions over boxes with one formalism.
Moreover, in this formalism the forms carry almost all the information needed for the
integral. The components of the field as well as the directions in which we integrate
are part of the form. Only the parameter domain is not given and it can be supplied by
a coordinate system. Therefore, this approach should be the right idea and, after taking
care of the details, we will indeed see that forms on manifolds can be used to model
vector fields on surfaces.
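Definition 19.30 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k\left(\mathbb{R}^d_p\right)$ be a $k$-form on $\Omega$. By Theorem 18.23, at every $p \in \Omega$ there is a base representation of the form $\omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le d} \omega_{i_1, \ldots, i_k}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$. We will call $\omega$ differentiable at $p$ iff each of the $\omega_{i_1, \ldots, i_k}$ is differentiable at $p$. If $\omega$ is differentiable at each $p \in \Omega$ we will call $\omega$ differentiable on $\Omega$.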
Recall that by Theorem 17.43 for differentiable functions $f : \mathbb{R}^d \to \mathbb{R}$ the derivative $Df$ is $Df = \frac{\partial f}{\partial x_1} \, dx_1 + \cdots + \frac{\partial f}{\partial x_d} \, dx_d$ and it is only a small abuse of notation to set $df := Df$. We define the differential of a $k$-form similar to the $(k+1)$st derivative of regular functions (see Corollary 17.61), except that we work with wedge products instead of tensor products. In this fashion, we will retain the connection to differential contributions to integrals.
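Definition 19.31 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $f : \Omega \to \mathbb{R}$ be differentiable. We define $df := \frac{\partial f}{\partial x_1} \, dx_1 + \cdots + \frac{\partial f}{\partial x_d} \, dx_d$. If $\omega : \Omega \to \bigcup_{p \in \Omega} \Lambda^k\left(\mathbb{R}^d_p\right)$ is differentiable and $\omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le d} \omega_{i_1, \ldots, i_k}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$, then we define the $(k+1)$-form $d\omega$ by
$$d\omega(p) := \sum_{1 \le i_1 < \cdots < i_k \le d} d\omega_{i_1, \ldots, i_k}(p) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}$$
and call it the differential of $\omega$.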
Definition 19.31 allows us to make some connections that may be familiar from
multivariable calculus or physics.
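Example 19.32
If $\omega = P \, dx_1 + Q \, dx_2 + R \, dx_3$, then
$$d\omega = \left(\frac{\partial R}{\partial x_2} - \frac{\partial Q}{\partial x_3}\right) dx_2 \wedge dx_3 + \left(\frac{\partial P}{\partial x_3} - \frac{\partial R}{\partial x_1}\right) dx_3 \wedge dx_1 + \left(\frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2}\right) dx_1 \wedge dx_2.$$
If $\omega = P \, dx_2 \wedge dx_3 + Q \, dx_3 \wedge dx_1 + R \, dx_1 \wedge dx_2$, then
$$d\omega = \left(\frac{\partial P}{\partial x_1} + \frac{\partial Q}{\partial x_2} + \frac{\partial R}{\partial x_3}\right) dx_1 \wedge dx_2 \wedge dx_3.$$
Both equalities are a direct application of the definition of the differential and of $dx_j \wedge dx_j = 0$ (Exercise 19-28).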
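The bookkeeping in Example 19.32 (prepend $dx_j$, sort the indices, track the sign) can be sketched in code. This is an illustrative sketch only: forms are stored as dictionaries from increasing 0-based index tuples to coefficient functions, partial derivatives are approximated by central differences, and the coefficient functions `P`, `Q`, `R` as well as the helper names are assumptions, not notation from the text.

```python
def partial(g, j, p, h=1e-5):
    """Central-difference approximation of the j-th partial derivative of g at p."""
    hi = list(p)
    lo = list(p)
    hi[j] += h
    lo[j] -= h
    return (g(*hi) - g(*lo)) / (2 * h)

def sort_sign(indices):
    """Return (sign, sorted tuple) for dx_{i1} ^ ... ^ dx_{ik}; sign 0 if an index repeats."""
    idx = list(indices)
    if len(set(idx)) < len(idx):
        return 0, ()
    sign = 1
    # bubble sort, flipping the sign for every transposition of neighbors
    for a in range(len(idx)):
        for b in range(len(idx) - 1 - a):
            if idx[b] > idx[b + 1]:
                idx[b], idx[b + 1] = idx[b + 1], idx[b]
                sign = -sign
    return sign, tuple(idx)

def d(form, p, dim=3):
    """Coefficients of (d omega)(p) for omega = {increasing index tuple: coefficient function}."""
    result = {}
    for idx, coeff in form.items():
        for j in range(dim):
            sign, key = sort_sign((j,) + idx)
            if sign:
                result[key] = result.get(key, 0.0) + sign * partial(coeff, j, p)
    return result

# Hypothetical coefficient functions, chosen only for illustration
def P(x, y, z): return x * y * z ** 2
def Q(x, y, z): return x ** 2 * y
def R(x, y, z): return x * y + z

p = (1.0, 2.0, 3.0)

# omega = P dx1 + Q dx2 + R dx3 (index 0 stands for dx1, and so on)
one_form = {(0,): P, (1,): Q, (2,): R}
curl_part = d(one_form, p)

# eta = P dx2^dx3 + Q dx3^dx1 + R dx1^dx2, where dx3^dx1 = -dx1^dx3
two_form = {(1, 2): P, (0, 2): lambda x, y, z: -Q(x, y, z), (0, 1): R}
div_part = d(two_form, p)
```

At the chosen point, `curl_part[(1, 2)]` approximates $\frac{\partial R}{\partial x_2} - \frac{\partial Q}{\partial x_3}$ and `div_part[(0, 1, 2)]` approximates $\frac{\partial P}{\partial x_1} + \frac{\partial Q}{\partial x_2} + \frac{\partial R}{\partial x_3}$; the key `(0, 2)` carries the coefficient of $dx_1 \wedge dx_3$, which is the negative of the $dx_3 \wedge dx_1$ coefficient.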
The connection to vector fields defined as functions from a subset of R3 to R3 is
now quite simple.
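Definition 19.33 Let $\Omega \subseteq \mathbb{R}^d$ be open and let $F(p) = \big[\mathrm{id}, (F_1(p), \ldots, F_d(p))^T\big]_p$ be a vector field on $\Omega$. We define the divergence of $F$ as $\mathrm{div}(F) := \sum_{j=1}^d \frac{\partial F_j}{\partial x_j}$. The divergence is also denoted by $\nabla \cdot F$, because the sum is the formal scalar product of the column vector in $F$ with the nabla operator.
Definition 19.34 Let $\Omega \subseteq \mathbb{R}^3$ be open and let $F = \big[\mathrm{id}, (F_1, F_2, F_3)^T\big]$ be a vector field on $\Omega$. We define
$$\mathrm{curl}(F) := \left[\mathrm{id}, \left(\frac{\partial F_3}{\partial x_2} - \frac{\partial F_2}{\partial x_3},\; \frac{\partial F_1}{\partial x_3} - \frac{\partial F_3}{\partial x_1},\; \frac{\partial F_2}{\partial x_1} - \frac{\partial F_1}{\partial x_2}\right)^T\right]$$
and call it the curl of $F$. The curl is also denoted by $\nabla \times F$, because the components can be obtained as the formal cross product of the column vector in $F$ with the nabla operator.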
Example 19.32 shows that the differential of the appropriate form encodes the divergence and the curl of a vector field together with the appropriate volume or area element that is needed to integrate the divergence or the curl. Also recall that the gradient of a differentiable function $f : \Omega \to \mathbb{R}$ is $\mathrm{grad}(f) = \sum_{j=1}^d \frac{\partial f}{\partial x_j} e_j$. Hence, the differential of a 0-form (a function) encodes the gradient together with the differential length element that is needed for a line integral. We now turn to the properties of differential forms.
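Theorem 19.35 Let $\Omega \subseteq \mathbb{R}^d$ be open, let $\omega$ be a differentiable $k$-form and let $\eta$ be a differentiable $l$-form on $\Omega$. Then the following hold.
1. If $l = k$, then $d(\omega + \eta) = d\omega + d\eta$.
2. $d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^k \omega \wedge d\eta$.
3. If $\omega$ is twice differentiable, then $d(d\omega) = 0$.
Proof. Part 1 is trivial. Moreover, part 1 shows that we only need to consider forms $\omega = f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ and $\eta = g \, dx_{j_1} \wedge \cdots \wedge dx_{j_l}$ for parts 2 and 3. For such forms, we obtain the following for part 2.
$$\begin{aligned}
d(\omega \wedge \eta) &= d\big[(f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}) \wedge (g \, dx_{j_1} \wedge \cdots \wedge dx_{j_l})\big] \\
&= d\big[(fg) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l}\big] \\
&= d(fg) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l} \\
&= \big((df) g + f (dg)\big) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l} \\
&= (df) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge g \, dx_{j_1} \wedge \cdots \wedge dx_{j_l} + (-1)^k f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k} \wedge dg \wedge dx_{j_1} \wedge \cdots \wedge dx_{j_l} \\
&= (d\omega) \wedge \eta + (-1)^k \omega \wedge d\eta.
\end{aligned}$$
Finally, part 3 is proved as follows with $D_\alpha$ denoting the $\alpha$th partial derivative.
$$\begin{aligned}
d(d\omega)(p) &= d\left(\sum_{\alpha=1}^d D_\alpha f(p) \, dx_\alpha \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}\right) \\
&= \sum_{\alpha=1}^d d\big(D_\alpha f(p)\big) \wedge dx_\alpha \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \\
&= \sum_{\alpha=1}^d \sum_{\beta=1}^d D_\beta D_\alpha f(p) \, dx_\beta \wedge dx_\alpha \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} \\
&= 0,
\end{aligned}$$
where the last step uses that the mixed partial derivatives are symmetric, $D_\beta D_\alpha f = D_\alpha D_\beta f$, while $dx_\beta \wedge dx_\alpha = -dx_\alpha \wedge dx_\beta$.
The above completes the proof.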
Exercise 19-31 shows how Theorem 19.35 includes certain equations from vector analysis as special cases. We are now ready to define the integral of a $k$-form over the cube $[0, 1]^k$.
Definition 19.36 If $\omega$ is a $k$-form on $[0, 1]^k$, then (by Theorem 18.23) there is a unique function $f$ so that $\omega = f \, dx_1 \wedge \cdots \wedge dx_k$ and we define
$$\int_{[0,1]^k} \omega := \int_{[0,1]^k} f \, d\lambda.$$
In the following, we will often work with $k$-tuples from which a certain element, say, the $l$th, has been removed. To simplify notation, we define the following.
Definition 19.37 If $a_1 \ldots a_k$ is a sequence of $k$ mathematical symbols denoting elements of a set and we want to investigate the sequence from which the $l$th entry has been removed, we write $a_1 \ldots \widehat{a_l} \ldots a_k$ instead of $a_1 \ldots a_{l-1} a_{l+1} \ldots a_k$.
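Stokes' Theorem now says that the integral of the differential of a $(k-1)$-form (which is a $k$-form) over the cube $[0, 1]^k$ can be expressed as a sum of integrals of the $(k-1)$-form over the faces of the cube. We first state the result in its raw form and subsequently reduce the rather complicated right side to something more manageable.
Theorem 19.38 Stokes' Theorem for $[0, 1]^k$. Let $k \in \mathbb{N}$ and let $\omega$ be a continuously differentiable $(k-1)$-form that is defined on a neighborhood of $[0, 1]^k$. Then
$$\int_{[0,1]^k} d\omega = \sum_{\alpha=1}^k (-1)^{\alpha-1} \left[ \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{\alpha}, \ldots, k}(x_1, \ldots, x_{\alpha-1}, 1, x_{\alpha+1}, \ldots, x_k) \, d\lambda - \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{\alpha}, \ldots, k}(x_1, \ldots, x_{\alpha-1}, 0, x_{\alpha+1}, \ldots, x_k) \, d\lambda \right],$$
where for $k = 1$ the integral on the right (in this case there is only one "integral") simply evaluates the functions at the given point.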
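Proof. With $\omega(p) = \sum_{1 \le i_1 < \cdots < i_{k-1} \le k} \omega_{i_1, \ldots, i_{k-1}}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_{k-1}}$, for each subset $\{i_1 < \cdots < i_{k-1}\} \subseteq \{1, \ldots, k\}$ there is a unique index $\alpha \in \{1, \ldots, k\}$ so that the equality $\{i_1 < \cdots < i_{k-1}\} = \{1, \ldots, k\} \setminus \{\alpha\}$ holds. Hence,
$$d\omega = \sum_{\alpha=1}^k \frac{\partial \omega_{1, \ldots, \hat{\alpha}, \ldots, k}}{\partial x_\alpha} \, dx_\alpha \wedge dx_1 \wedge \cdots \wedge \widehat{dx_\alpha} \wedge \cdots \wedge dx_k = \sum_{\alpha=1}^k (-1)^{\alpha-1} \frac{\partial \omega_{1, \ldots, \hat{\alpha}, \ldots, k}}{\partial x_\alpha} \, dx_1 \wedge \cdots \wedge dx_k.$$
Via Fubini's Theorem and the Fundamental Theorem of Calculus we obtain
$$\int_{[0,1]^k} d\omega = \sum_{\alpha=1}^k (-1)^{\alpha-1} \int_{[0,1]^{k-1}} \left[ \omega_{1, \ldots, \hat{\alpha}, \ldots, k}(x_1, \ldots, x_{\alpha-1}, 1, x_{\alpha+1}, \ldots, x_k) - \omega_{1, \ldots, \hat{\alpha}, \ldots, k}(x_1, \ldots, x_{\alpha-1}, 0, x_{\alpha+1}, \ldots, x_k) \right] d\lambda.$$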
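Geometrically, the sum of integrals on the right side of the equation in Theorem 19.38 runs over all the faces of the cube $[0, 1]^k$. We thus would like to abbreviate it as the integral of the form $\omega$ over the surface of the cube. To incorporate the negative signs we have picked up, we attach the negative signs to the pieces of the surface. This is similar to how we incorporated the integration variables into forms. Moreover, in anticipation of the next section, we define our cubes and boundary pieces as functions.
Definition 19.39 We define $I^k$ as the identity function from $[0, 1]^k$ to $\mathbb{R}^k$. The $(i, 0)$-face of $I^k$ is $I^k_{(i,0)}(x) := (x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_k)$, and the $(i, 1)$-face of $I^k$ is $I^k_{(i,1)}(x) := (x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_k)$, where $I^k_{(i,0)}$ and $I^k_{(i,1)}$ are functions from $[0, 1]^{k-1}$ to $\mathbb{R}^k$. The (formal) sum $\partial I^k = \sum_{i=1}^k (-1)^i \left( I^k_{(i,0)} - I^k_{(i,1)} \right)$ is called the boundary of $I^k$. For a $(k-1)$-form $\omega(p) = \sum_{\alpha=1}^k \omega_{1, \ldots, \hat{\alpha}, \ldots, k}(p) \, dx_1 \wedge \cdots \wedge \widehat{dx_\alpha} \wedge \cdots \wedge dx_k$ we define
$$\int_{\partial I^k} \omega := \sum_{\alpha=1}^k (-1)^\alpha \left[ \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{\alpha}, \ldots, k}(x_1, \ldots, x_{\alpha-1}, 0, x_{\alpha+1}, \ldots, x_k) \, d\lambda - \int_{[0,1]^{k-1}} \omega_{1, \ldots, \hat{\alpha}, \ldots, k}(x_1, \ldots, x_{\alpha-1}, 1, x_{\alpha+1}, \ldots, x_k) \, d\lambda \right].$$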
Clearly, the integral of $\omega$ over $\partial I^k$ is defined so that the inconvenient sum from Theorem 19.38 is pushed into the formal sum that defines the domain of integration. More importantly, this formalism has a solid physical intuition behind it. Let us interpret $\omega$ as a vector field, say a flow of a fluid, that is decomposed into components so that the $dx_1 \wedge \cdots \wedge \widehat{dx_\alpha} \wedge \cdots \wedge dx_k$-component is perpendicular to any plane of the form $x_\alpha = c$, where $c \in \mathbb{R}$ is a constant (also see Figure 51). Components that are parallel to the surface do not contribute to the surface integral, because flow parallel to a membrane does not transport material through it. On the other hand, components that are perpendicular to the surface contribute with their magnitude, because a flow of velocity $\vec{v}$ that is perpendicular to a small surface $S$ transports per unit time a material volume of $\|\vec{v}\| A(S)$ through the surface, where $A(S)$ is the area of the surface $S$. Therefore, to compute the net flux of $\omega$ across the surface of $[0, 1]^k$, for each $\alpha \in \{1, \ldots, k\}$ we only need the integrals of $\omega_{1, \ldots, \hat{\alpha}, \ldots, k}$ over the parts of the surface of the cube for which $x_\alpha = 0$ or $x_\alpha = 1$. The opposite signs of $\omega_{1, \ldots, \hat{\alpha}, \ldots, k}$ for $x_\alpha = 0$ and $x_\alpha = 1$ are needed to assure that the integral counts net transfer of material into or out of the cube, because a flow direction that enters the cube at $x_\alpha = 0$ is exiting the cube at $x_\alpha = 1$.
Figure 51: Visualization of the 2-form $Q \, dx \wedge dz$ as a vector field parallel to the $y$-axis. If $Q \, dx \wedge dz$ is interpreted as a flow field, then there is no transfer of material through the bottom, the top, the front or the back face of the cube. Moreover, if we assume that $Q$ is a constant, then the transfer through the left face is $Q \cdot (\text{area of face})$ into the cube and the transfer through the right face is $Q \cdot (\text{area of face})$ out of the cube.
Corollary 19.40 With Definition 19.39 the equation in Stokes' Theorem for $[0, 1]^k$ becomes the much more readable equation $\int_{[0,1]^k} d\omega = \int_{\partial I^k} \omega$.
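For $k = 2$ the equation of Corollary 19.40 can be checked numerically. The sketch below is only an illustration under stated assumptions: the 1-form $\omega = P \, dx_1 + Q \, dx_2$ uses hypothetical coefficients $P = x_1 x_2^2$ and $Q = x_1^3 + x_2$, the differential $d\omega = \left(\frac{\partial Q}{\partial x_1} - \frac{\partial P}{\partial x_2}\right) dx_1 \wedge dx_2$ is entered by hand, and the integrals are approximated with the midpoint rule.

```python
def midpoint_1d(g, n=400):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

def midpoint_2d(g, n=400):
    """Midpoint-rule approximation of the integral of g over [0, 1]^2."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

# Hypothetical coefficients of omega = P dx1 + Q dx2
def P(x1, x2): return x1 * x2 ** 2
def Q(x1, x2): return x1 ** 3 + x2

# d(omega) = (dQ/dx1 - dP/dx2) dx1 ^ dx2, with the derivatives computed by hand
def d_omega(x1, x2): return 3 * x1 ** 2 - 2 * x1 * x2

lhs = midpoint_2d(d_omega)

# coeff[alpha] is the coefficient of the term that omits dx_alpha
coeff = {1: Q, 2: P}

def face(alpha, value, t):
    """Point on the (alpha, value)-face of the unit square with free parameter t."""
    return (value, t) if alpha == 1 else (t, value)

rhs = 0.0
for alpha in (1, 2):
    at0 = midpoint_1d(lambda t: coeff[alpha](*face(alpha, 0.0, t)))
    at1 = midpoint_1d(lambda t: coeff[alpha](*face(alpha, 1.0, t)))
    # sign (-1)^alpha attached to the pair of faces x_alpha = 0 and x_alpha = 1
    rhs += (-1) ** alpha * (at0 - at1)
```

For these coefficients both sides can also be integrated by hand and equal $\frac{1}{2}$, so `lhs` and `rhs` should agree up to the quadrature error.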
Exercises
19-28. Prove the claims in Example 19.32.
19-29. Prove that if $f$ and $g$ are 0-forms, then $d(fg) = (df) g + f (dg)$.
19-30. Let $\Omega \subseteq \mathbb{R}^d$ and let $\omega$ be a $k$-form on $\Omega$. Prove that for all $i_1 < \cdots < i_k$ the function $\omega_{i_1, \ldots, i_k}$ from Definition 19.30 is $\omega[e_{i_1}, \ldots, e_{i_k}]$.
19-31. Let $\Omega \subseteq \mathbb{R}^3$ be open. Use Theorem 19.35 to prove the following:
(a) If $F : \Omega \to \mathbb{R}^3$ is a twice differentiable vector field, then $\mathrm{div}\big(\mathrm{curl}(F)\big) = 0$.
(b) If $f : \Omega \to \mathbb{R}$ is a twice differentiable function, then $\mathrm{curl}\big(\mathrm{grad}(f)\big) = 0$.
(c) Why is it not possible to use Theorem 19.35 to prove the wrong equality $\mathrm{grad}\big(\mathrm{div}(F)\big) = 0$?
19-32. Prove that the function $f$ in Definition 19.36 is $f(p) = \omega(p)[e_1, \ldots, e_k]$.
19-33. Divergence Theorem for $[0, 1]^3$.
(a) Prove that if $F$ is a differentiable vector field defined on a neighborhood of the unit cube $[0, 1]^3$, then
Then explain why the integral on the right gives the net outflow from the unit cube.
(b) Integrate the vector field F ( x , J , z ) =
(1: 1
y2
over the surface of the unit cube $[0, 1]^3$.
19.4 k-Forms and Integrals Over k-Chains
To define integration on manifolds, first note that a manifold locally looks like its tangent space. Hence, locally, volumes in the manifold are essentially volumes in the
tangent space. Because volumes are computed via k-tensors we define a k-form on a
manifold as a function into the spaces of k-tensors on the tangent spaces.
Definition 19.41 A function $\omega : M \to \bigcup_{p \in M} \Lambda^k(M_p)$ such that $\omega(p) \in \Lambda^k(M_p)$ for each $p \in M$ is called a $k$-form on $M$ or a differential form.
Let $x : U \to \mathbb{R}^m$ be a coordinate system and let $p \in U$. To obtain a base in $\Lambda^k(M_p)$, let $\{[x, e_1]_p, \ldots, [x, e_m]_p\}$ be the "standard base" in $M_p$ and denote its dual base in $M_p^*$ by $\{dx_1, \ldots, dx_m\}$. Then the $k$-form $\omega$ on $M$ can be written as $\omega(p) = \sum_{1 \le i_1 < \cdots < i_k \le m} \omega^x_{i_1, \ldots, i_k}(p) \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$, where the functions $\omega^x_{i_1, \ldots, i_k}$ are defined on $U$. Formally, the $\omega^x_{i_1, \ldots, i_k}$ depend on the choice of the coordinate system $x$.
Similar to Exercise 19-16b it can be proved that if the $\omega^x_{i_1, \ldots, i_k}(p)$ in the representation with respect to the coordinate system $x$ are differentiable at $p$, then for any other coordinate system $y$ the $\omega^y_{i_1, \ldots, i_k}(p)$ in the representation with respect to $y$ are also differentiable at $p$. Therefore, although the specific representation of the form depends on the coordinate system, the differentiability properties of the form are independent of the coordinate system chosen to represent it.
Definition 19.42 A $k$-form $\omega : M \to \bigcup_{p \in M} \Lambda^k(M_p)$ will be called differentiable iff for every $p \in M$ there is a coordinate system $x : U \to \mathbb{R}^m$ so that $p \in U$, the form equals $\omega = \sum_{1 \le i_1 < \cdots < i_k \le m} \omega^x_{i_1, \ldots, i_k} \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}$ on $U$ and the $\omega^x_{i_1, \ldots, i_k}$ are differentiable at $p$. Similarly, we define $C^j$-forms for $j \in \mathbb{N}$ and $C^\infty$-forms. Throughout, unless otherwise stated, $k$-forms will be assumed to be $C^\infty$. We will also drop the superscript on the functions $\omega_{i_1, \ldots, i_k}$ in the coordinate representation of a form.
Section 19.3 has shown that the differential of a form is useful in connecting integrals over solids with integrals over their boundary. The differential of a form on a manifold is defined similar to the differential of a form on $\mathbb{R}^m$. For the definition, we first need the differential of a function. To make the notation bearable, we introduce new notation for tangent vectors.
Let $M$ be a manifold, let $p \in M$, let $x, y$ be coordinate systems of $M$ whose domains contain $p$, let $f : M \to \mathbb{R}$ be differentiable, and let $[x, v]_p, [y, w]_p \in M_p$ be so that $[x, v]_p = [y, w]_p$. Then
$$D\left(f \circ y^{-1}\right)(y(p))[w] = D\left(f \circ y^{-1}\right)(y(p))\Big[D\left(y \circ x^{-1}\right)(x(p))[v]\Big] = D\left(f \circ x^{-1}\right)(x(p))[v].$$
Hence, for a manifold $M$, a point $p \in M$, a differentiable function $f : M \to \mathbb{R}$ and a tangent vector $[x, v]_p \in M_p$ the quantity $df(p)\big[[x, v]_p\big] := D\left(f \circ x^{-1}\right)(x(p))[v]$ does not depend on the representative $(x, v)$. With $v = (v_1, \ldots, v_m)$ we obtain
$$df(p)\big[[x, v]_p\big] = \sum_{j=1}^m v_j \frac{\partial \left(f \circ x^{-1}\right)}{\partial x_j}(x(p)).$$
Because of the above we also denote the tangent vector $[x, e_j]_p$ as $\frac{\partial}{\partial x_j}\Big|_p$ and we also write $\frac{\partial f}{\partial x_j}(p) := \frac{\partial}{\partial x_j}\Big|_p f := \frac{\partial \left(f \circ x^{-1}\right)}{\partial x_j}(x(p))$. This notation expresses the fact that every tangent vector can be interpreted as a geometric object, but also as a differential operator. As was already hinted in the previous section, it is quite common that objects on manifolds have several interpretations. For practical purposes, it is sensible to remember that $\frac{\partial f}{\partial x_j}$ really is a partial derivative, namely, $\frac{\partial \left(f \circ x^{-1}\right)}{\partial x_j}(x(p))$, but that the composition with the parametrization $x^{-1}$ is omitted in the notation.
Now we can define the differential.
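Definition 19.43 Let $f : M \to \mathbb{R}$ be differentiable and let $x : U \to \mathbb{R}^m$ be a coordinate system. Then for all $p \in U$ and $v = \sum_{j=1}^m v_j e_j$ we define
$$df(p)\big[[x, v]_p\big] := \sum_{j=1}^m v_j \frac{\partial f}{\partial x_j}(p), \quad \text{that is,} \quad df = \sum_{j=1}^m \frac{\partial f}{\partial x_j} \, dx_j.$$
If $\omega$ is a $k$-form on the manifold $M$ and $x : U \to \mathbb{R}^m$ is a coordinate system on $M$, then for all $p \in U$ we define the $(k+1)$-form $d\omega$ by
$$d\omega(p) := \sum_{1 \le i_1 < \cdots < i_k \le m} d\omega_{i_1, \ldots, i_k}(p) \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k}$$
and call it the differential of $\omega$.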
To make sure everything is in order, we must prove that this definition is independent of the coordinate system used to define the differential. For 0-forms, this was
already done before we defined the differential. Because the brute force computation
for k-forms is quite tedious, we first establish some properties of d and then show that
all operators with the same properties as d must be equal to d . This also establishes the
properties of the differential that we will need later on.
Because the differential is defined locally, it has the same properties as the differential from Definition 19.31.
Theorem 19.44 Let $\omega$ be a $k$-form and let $\eta$ be an $l$-form on the manifold $M$. Then the following hold:
1. If $k = l$, then $d(\omega + \eta) = d\omega + d\eta$.
2. $d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^k \omega \wedge d\eta$.
3. $d(d\omega) = 0$.
Proof. Mimic the proof of Theorem 19.35 (Exercise 19-34).
Theorem 19.45 Let $M$ be a manifold, let $x : U \to \mathbb{R}^m$ be a coordinate system and let $d'$ be a map that takes $k$-forms on $U$ to $(k+1)$-forms on $U$ so that for all infinitely differentiable $f$ on $U$ the equation $d'f = \sum_{j=1}^m \frac{\partial f}{\partial x_j}(p) \, dx_j = df$ holds and so that the properties of Theorem 19.44 hold for $d'$. Then $d' = d$.
Proof. By additivity of $d$ and $d'$ (see part 1 of Theorem 19.44) it is enough to show that $d(f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}) = d'(f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k})$ for all $i_1 < \cdots < i_k$ and $f$ infinitely differentiable on $U$. With $f = x_i$ we have $d'x_i = \sum_{j=1}^m \frac{\partial x_i}{\partial x_j}(p) \, dx_j = dx_i$, but we should recall that the "partial derivatives" are not quite ordinary partial derivatives. Therefore, although it is simple, the reader should verify that the partials really vanish for $i \neq j$ and that the $i$th partial is 1 (Exercise 19-35).
We now prove by induction on $k$ that $d'(dx_{i_1} \wedge \cdots \wedge dx_{i_k}) = 0$. For $k = 1$, this follows from $d'(dx_i) = d'(d'x_i) = 0$. For the step $k \to (k+1)$, we note
$$\begin{aligned}
d'\left(dx_{i_1} \wedge \cdots \wedge dx_{i_{k+1}}\right) &= d'\left(d'x_{i_1} \wedge \left(d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}}\right)\right) \\
&= d'\left(d'x_{i_1}\right) \wedge \left(d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}}\right) - d'x_{i_1} \wedge d'\left(d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}}\right) \\
&= d'\left(d'x_{i_1}\right) \wedge \left(d'x_{i_2} \wedge \cdots \wedge d'x_{i_{k+1}}\right) - d'x_{i_1} \wedge d'\left(dx_{i_2} \wedge \cdots \wedge dx_{i_{k+1}}\right) \\
&= 0,
\end{aligned}$$
where the last step is justified by $d'(d'x_i) = 0$ and by the induction hypothesis.
Therefore, with $f$ being a 0-form, we conclude the following for all $k$.
$$\begin{aligned}
d'\left(f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}\right) &= d'f \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} + (-1)^0 f \wedge d'\left(dx_{i_1} \wedge \cdots \wedge dx_{i_k}\right) \\
&= df \wedge dx_{i_1} \wedge \cdots \wedge dx_{i_k} + 0 \\
&= d\left(f \, dx_{i_1} \wedge \cdots \wedge dx_{i_k}\right),
\end{aligned}$$
which establishes the result.
Theorem 19.45 shows that Definition 19.43 does not depend on the coordinate system we choose. Indeed, on the domain of each coordinate system, any differential
d defined as in Definition 19.43 with a specific coordinate system has the properties
in Theorem 19.44. By Theorem 19.45, any two such operators must be equal on the
intersection of the domains of the coordinate systems used in their definitions.
With forms and their differentials defined on manifolds, we can translate some
results from the unit cube to manifolds. We first turn to the parametrizations of the
domain of integration.
Definition 19.46 Let $M$ be a manifold, let $k \in \mathbb{N}$ and let $x$ be a coordinate system whose range contains $[0, 1]^k \times \{0\}^{m-k}$. A singular $k$-cube $c : [0, 1]^k \to M$ in $M$ is the composition of the restriction of the function $x^{-1}$ to $[0, 1]^k \times \{0\}^{m-k}$ with the natural bijection from $[0, 1]^k$ to $[0, 1]^k \times \{0\}^{m-k}$. For $k = 0$, we let a singular 0-cube be the function that maps 0 to $x^{-1}(0)$.
A singular 0-cube is a point, a singular 1-cube is a curve and a singular 2-cube is a surface with square parameter domain. The standard $k$-cube is the identity function $I^k(x) = x$ on $[0, 1]^k$ (as we already noted in Definition 19.39). Section 19.3 indicated the importance of chains, so we define them similar to Definition 19.39.
Definition 19.47 Let $M$ be a manifold and let $S$ be the set of all singular $k$-cubes in $M$. Define the set $C$ of all $k$-chains in $M$ to be the set of all functions $f : S \to \mathbb{Z}$ such that $f(c) = 0$ for all but finitely many $c$. The elements of $C$ are usually written as $\sum a_c c$, where the $a_c$ are integers and the sum has purely formal character: $f = \sum_{c \in S,\, f(c) \neq 0} f(c) \, c$.
The definition of the boundary of a singular k-cube is motivated by Theorem 19.38
and the discussion thereafter.
Definition 19.48 For a singular $k$-cube $c$ in the manifold $M$, we define $c_{(i,0)} := c \circ I^k_{(i,0)}$, $c_{(i,1)} := c \circ I^k_{(i,1)}$, and we set the boundary of $c$ equal to $\partial c = \sum_{i=1}^k (-1)^i \left( c_{(i,0)} - c_{(i,1)} \right)$. For a $k$-chain $\sum_{j=1}^n a_j c_j$, we define $\partial \left( \sum_{j=1}^n a_j c_j \right) := \sum_{j=1}^n a_j \, \partial c_j$.
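The formal sums in Definitions 19.47 and 19.48 translate directly into a data structure: a chain is a finitely supported integer-valued function on cubes, that is, a dictionary. The sketch below is an illustration under assumptions of its own: it only treats faces of the standard cube, encodes a cube as a tuple whose entries are either fixed at 0 or 1 or marked `FREE`, and the names `face` and `boundary` are hypothetical helpers. It also checks that the boundary of a boundary vanishes, a standard fact that is not stated in this section.

```python
from collections import defaultdict

FREE = None  # marks a coordinate that still varies over [0, 1]

def face(cube, i, a):
    """c_(i,a) = c o I^k_(i,a): fix the i-th free coordinate (1-based) at a."""
    free_positions = [pos for pos, entry in enumerate(cube) if entry is FREE]
    pos = free_positions[i - 1]
    return cube[:pos] + (a,) + cube[pos + 1:]

def boundary(chain):
    """Boundary of a chain given as a dictionary {cube: integer coefficient}."""
    result = defaultdict(int)
    for cube, coeff in chain.items():
        k = sum(1 for entry in cube if entry is FREE)
        for i in range(1, k + 1):
            sign = (-1) ** i
            # boundary of c is the sum over i of (-1)^i (c_(i,0) - c_(i,1))
            result[face(cube, i, 0)] += coeff * sign
            result[face(cube, i, 1)] -= coeff * sign
    # keep only cubes with nonzero coefficient
    return {cube: c for cube, c in result.items() if c != 0}

I3 = {(FREE, FREE, FREE): 1}   # the standard 3-cube as a chain
dI3 = boundary(I3)             # six faces with signs
ddI3 = boundary(dI3)           # every edge appears twice with opposite signs
```

Using a dictionary keeps the "all but finitely many coefficients are zero" condition automatic, and extending `boundary` linearly over the entries is exactly the second half of Definition 19.48.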
To integrate a form, we want to compose it with a singular $k$-chain and integrate the resulting form. That means we must first devise a way to compose forms with $k$-chains. For a linear map, it is easy to define a corresponding map from the $k$-tensors on the range to the $k$-tensors on the domain as follows.
Definition 19.49 Let $V, W$ be vector spaces. For $T \in \mathcal{T}^k(W)$, $f : V \to W$ linear and $v_1, \ldots, v_k \in V$ define $f^*T(v_1, \ldots, v_k) = T\big(f(v_1), \ldots, f(v_k)\big)$.
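Definition 19.49 is short enough to transcribe literally. The following sketch assumes a hypothetical linear map on $\mathbb{R}^2$ given by the matrix $A$ below and uses the determinant as the alternating 2-tensor; the names `pullback`, `det2` and `f_star_det` are illustrative only.

```python
def pullback(T, f):
    """f^*T(v_1, ..., v_k) := T(f(v_1), ..., f(v_k)) for a linear map f."""
    return lambda *vectors: T(*(f(v) for v in vectors))

def det2(u, v):
    """The alternating 2-tensor det on R^2."""
    return u[0] * v[1] - u[1] * v[0]

# Hypothetical linear map f(v) = A v with A = [[2, 1], [0, 3]]
A = ((2.0, 1.0), (0.0, 3.0))

def f(v):
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

f_star_det = pullback(det2, f)

e1, e2 = (1.0, 0.0), (0.0, 1.0)
value_on_base = f_star_det(e1, e2)                # det(A) * det(e1, e2)
value_general = f_star_det((1.0, 2.0), (3.0, -1.0))
```

In this example the pulled-back tensor is the determinant scaled by $\det(A) = 6$, which is the same scaling factor that shows up in the substitution formula for integrals.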
For $f : M \to N$ differentiable, we know that $f_{*p}$ is a linear map from $M_p$ to $N_{f(p)}$.
Definition 19.50 Let $M, N$ be manifolds, let $f : M \to N$ be differentiable and let $\omega$ be a $k$-form on $N$. Similar to Definition 19.49, for each $p \in M$ we define the $k$-form $f^*\omega$ on $M$ by $(f^*\omega)(p)[v_1, \ldots, v_k] := \omega(f(p))\big[f_{*p}[v_1], \ldots, f_{*p}[v_k]\big]$. In coordinates, this definition reads as follows.
For a 0-form $g$, we define $f^*g := g \circ f$.
Note the similarity between Definition 19.50 and the Multidimensional Substitution Formula. If the $\omega_{i_1, \ldots, i_k}$ are the functions and $dy_{i_1} \wedge \cdots \wedge dy_{i_k}$ are the volume elements, then the right side consists of the composition of the function with the diffeomorphism $f$ multiplied with the transformed volume element. To make working with the coordinate representations easier we first state a simple lemma.
Lemma 19.51 Let $M, N$ be manifolds, let $x : U \to \mathbb{R}^m$ be a coordinate system, let $p \in U$, let $f : M \to N$ be differentiable and let $y : V \to \mathbb{R}^n$ be a coordinate system with $f(p) \in V$. Then $f_{*p}\left(\frac{\partial}{\partial x_j}\Big|_p\right) = \sum_{i=1}^n \frac{\partial (y_i \circ f)}{\partial x_j}(p) \, \frac{\partial}{\partial y_i}\Big|_{f(p)}$.
Proof. Represent $D\left(y \circ f \circ x^{-1}\right)$ componentwise (Exercise 19-37).
Now we can show that the differential commutes with f * , which is the key to
Stokes' Theorem for k-chains.
Theorem 19.52 Let $M, N$ be manifolds, let $f : M \to N$ be differentiable and let $\omega$ be a $k$-form on $N$. Then the equality $f^*(d\omega) = d(f^*\omega)$ holds.
Proof. It is enough to prove the result for forms $\omega = g \, dy_{i_1} \wedge \cdots \wedge dy_{i_k}$. For these forms, we proceed by induction on $k$. For $k = 0$, let $x$ be a coordinate system around $p$ and let $y$ be a coordinate system around $f(p)$. It is enough to prove the claimed equality for the base vectors $\frac{\partial}{\partial x_j}\Big|_p$, $j = 1, \ldots, m$. Via Lemma 19.51 we obtain the following for all $j \in \{1, \ldots, m\}$.
For the induction step, we argue as follows.
Now the following definition is sensible because open subsets of Rk are manifolds
themselves. To assure everything is defined everywhere, we should assume that the
function $c$ in the singular $k$-cube is actually defined on a neighborhood of $[0,1]^k$, but a
look at Definition 19.46 reveals that this is not a problem. Conscientious readers may
want to mentally substitute $x^{-1}$ instead of $c$ in the definition below.
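Definition 19.53 Let $k \in \mathbb{N}$, let $c$ be a singular $k$-cube in $M$ and let $\omega$ be a $k$-form on
$M$. We define $\int_c \omega := \int_{[0,1]^k} c^*\omega$. For $k = 0$, we define $\int_c \omega := \omega(c(0))$.
For a $k$-chain $c = \sum_{i=1}^n a_i c_i$ in $M$, we define
$$\int_c \omega := \sum_{i=1}^n a_i \int_{c_i} \omega.$$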
Theorem 19.54 Stokes' Theorem for chains. If $\omega$ is a $(k-1)$-form on a manifold $M$
and $c$ is a $k$-chain in $M$, then
$$\int_c d\omega = \int_{\partial c} \omega.$$
Proof. Recall that by Theorem 19.38 the result holds for cubes $I^k$. (This is most
easily seen with notation as in Corollary 19.40.) The idea for the proof therefore is to
use Theorem 19.38 and we use it with the notation of Corollary 19.40.
Unfortunately, the integral of a form over a $k$-chain may depend on the coordinate
system we chose to define the chain. Ultimately, we want an integral that only depends
on the set $x^{-1}\big[[0,1]^k\big]$. Thus we need to investigate how integrals over $k$-cubes behave
under a change of coordinates. The next result shows that a reparametrization $f$ of the
set $x^{-1}\big[[0,1]^k\big]$ should not affect the value of the integral, because the reparametrized
form $f^*(\omega)$ will be multiplied by the factor that we recognize from the Multivariable
Substitution Formula.
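Theorem 19.55 Let $M$ and $N$ be $m$-dimensional manifolds, let $f : M \to N$ be differentiable, let $x : U \to \mathbb{R}^m$ be a coordinate system of $M$ and let $y : V \to \mathbb{R}^m$ be a coordinate
system of $N$ with $f[U] \subseteq V$. Then for every differentiable function $g : N \to \mathbb{R}$ we
have
$$f^*(g\, dy_1 \wedge \cdots \wedge dy_m) = (g \circ f)\, \det\left(\frac{\partial (y_i \circ f)}{\partial x_j}\right)_{i,j=1,\ldots,m} dx_1 \wedge \cdots \wedge dx_m.$$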
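Proof. Let $p \in U$. To prove the claimed equation at $p$, we only need to consider the action of the functions on $\left(\frac{\partial}{\partial x_1}\big|_p, \ldots, \frac{\partial}{\partial x_m}\big|_p\right)$ (see Exercise 19-36). After
rewriting $f_{*p}\left(\frac{\partial}{\partial x_j}\big|_p\right)$ as $\sum_{i=1}^m \frac{\partial (y_i \circ f)}{\partial x_j}(p)\, \frac{\partial}{\partial y_i}\big|_{f(p)}$ (see Lemma 19.51) we obtain
the following.
$$f^*(g\, dy_1 \wedge \cdots \wedge dy_m)(p)\left[\frac{\partial}{\partial x_1}\bigg|_p, \ldots, \frac{\partial}{\partial x_m}\bigg|_p\right] = g(f(p))\, dy_1 \wedge \cdots \wedge dy_m\left[f_{*p}\frac{\partial}{\partial x_1}\bigg|_p, \ldots, f_{*p}\frac{\partial}{\partial x_m}\bigg|_p\right]$$
$$= g(f(p)) \det\left(\frac{\partial (y_i \circ f)}{\partial x_j}(p)\right) dy_1 \wedge \cdots \wedge dy_m\left[\frac{\partial}{\partial y_1}\bigg|_{f(p)}, \ldots, \frac{\partial}{\partial y_m}\bigg|_{f(p)}\right]$$
$$= (g \circ f)(p) \det\left(\frac{\partial (y_i \circ f)}{\partial x_j}(p)\right) dx_1 \wedge \cdots \wedge dx_m\left[\frac{\partial}{\partial x_1}\bigg|_p, \ldots, \frac{\partial}{\partial x_m}\bigg|_p\right],$$
where the last step holds because both $m$-forms evaluate to 1 at the given vectors. ∎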
Now we can prove that the integrals of certain m-forms over m-cubes are independent of the parametrization.
Theorem 19.56 Let $c_1, c_2 : [0,1]^m \to M$ be two $m$-cubes in the $m$-dimensional manifold $M$ so that $\det\left(D\left(c_1^{-1} \circ c_2\right)\right) > 0$ at every point in the domain of $c_1^{-1} \circ c_2$ and let
$\omega$ be an $m$-form that vanishes outside $c_1\big[[0,1]^m\big] \cap c_2\big[[0,1]^m\big]$. Then
$$\int_{c_1} \omega = \int_{c_2} \omega.$$
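Proof. Let $x, y$ be coordinate systems on $M$ so that the cubes are $c_1 = x^{-1}\big|_{[0,1]^m}$
and $c_2 = y^{-1}\big|_{[0,1]^m}$. Then with $\omega = h\, dx_1 \wedge \cdots \wedge dx_m$ and $\{e_1, \ldots, e_m\}$ the standard
base for $\mathbb{R}^m$ we obtain
$$\int_{c_1} \omega = \int_{c_1^{-1} \circ c_2 [[0,1]^m]} h \circ c_1 \, d\lambda = \int_{[0,1]^m} \left(h \circ c_1 \circ \left(c_1^{-1} \circ c_2\right)\right) \det\left(D\left(c_1^{-1} \circ c_2\right)\right) d\lambda = \int_{c_2} \omega.$$
∎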
We conclude by noting that everything we have done in this section also works for
manifolds with boundaries and for manifolds with corners. The only difference is that
the k-cubes could actually touch the boundary in these settings. Hence, we will freely
use the results established here for these entities, too.
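The parametrization independence stated in Theorem 19.56 can be illustrated numerically. The sketch below is not part of the text's argument: it assumes $M = \mathbb{R}^2$, the form $h\, dx \wedge dy$ with the hypothetical choice $h(x,y) = xy$, and three hand-picked parametrizations of the unit square (the names `c1`, `c2`, `c3` and their Jacobian determinants are illustrative assumptions). The pullback is integrated with a simple midpoint rule.

```python
def integrate_form(h, c, jac_det, n=200):
    """Midpoint-rule approximation of the integral of c^*(h dx ^ dy) over [0,1]^2.

    h       : coefficient function of the 2-form h dx ^ dy
    c       : parametrization (s, t) -> (x, y)
    jac_det : determinant of the derivative of c, supplied analytically
    """
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * step
        for j in range(n):
            t = (j + 0.5) * step
            x, y = c(s, t)
            total += h(x, y) * jac_det(s, t)
    return total * step * step


def h(x, y):
    return x * y


# Identity parametrization of the unit square.
def c1(s, t):
    return (s, t)

def det1(s, t):
    return 1.0

# Orientation-preserving reparametrization: positive determinant everywhere.
def c2(s, t):
    return ((s + s * s) / 2.0, (t + t ** 3) / 2.0)

def det2(s, t):
    return (0.5 + s) * (0.5 + 1.5 * t * t)

# Orientation-reversing reparametrization: determinant -1.
def c3(s, t):
    return (1.0 - s, t)

def det3(s, t):
    return -1.0


I1 = integrate_form(h, c1, det1)
I2 = integrate_form(h, c2, det2)
I3 = integrate_form(h, c3, det3)
print(I1, I2, I3)
```

The first two values agree up to discretization error, while the orientation-reversing parametrization changes the sign, which is why the determinant condition in Theorem 19.56 cannot be dropped.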
Exercises
19-34. Prove Theorem 19.44.
19-35. Let $M$ be a manifold, let $x : U \to \mathbb{R}^m$ be a coordinate system and let $x_i := \pi_i \circ x$ be the $i$th
component of $x$. Prove that with the new definition of $\frac{\partial}{\partial x_j}$ we have, as we would expect, that
$\frac{\partial x_i}{\partial x_j} = 0$ for $i \neq j$ and $\frac{\partial x_i}{\partial x_i} = 1$.
19-36. Let $\omega : M \to \bigcup_{p \in M} \Lambda^k(M_p)$ be a $k$-form on $M$. Prove that each function $\omega^x_{i_1,\ldots,i_k}$ from Definition
19.42 satisfies $\omega^x_{i_1,\ldots,i_k}(p) = \omega(p)\big[[x, e_{i_1}]_p, \ldots, [x, e_{i_k}]_p\big]$.
19-37. Prove Lemma 19.51.
19-38. Let $M, N, O$ be manifolds and let $g : M \to N$ and $f : N \to O$ be differentiable. Prove that
$g^* f^* = (f \circ g)^*$.
19-39. The boundary of the boundary of a singular $k$-cube.
(a) Let $k \in \mathbb{N}$ be so that $k \geq 2$, let $1 \leq i \leq j \leq k-1$ and let $\alpha, \beta \in \{0, 1\}$. Prove that
$\left(I^k_{(i,\alpha)}\right)_{(j,\beta)} = \left(I^k_{(j+1,\beta)}\right)_{(i,\alpha)}$.
Hint. Straight substitution of a point $z \in [0,1]^{k-2}$ into both sides.
(b) Prove that if $c$ is a $k$-chain in the manifold $M$, then $\partial(\partial c) = 0$.
Hint. Represent the boundary operator as $\partial c = \sum_{i=1}^k \sum_{\alpha = 0,1} (-1)^{i+\alpha} c_{(i,\alpha)}$ and use the translation of part 19-39a to singular $k$-cubes to prove that the summands of $\partial(\partial c)$ cancel in pairs.
19.5 Integration on Manifolds
Theorem 19.56 indicates that integration over m-cubes should make it possible to define integration “locally” on a manifold. To obtain a “global” notion of integration, we
need to piece the local integrals together. With a partition of unity as in Section 16.9
we can reduce any globally defined function to a sum of “locally supported” functions.
However, we need to assure that our transition from local to global integrals does not
have any unintended consequences. Theorem 19.56 rules out the danger of local inconsistencies. Globally, one of the ideas behind the integral of a form (a vector field) over
a manifold is to measure the throughput of a flow (modeled by the form/vector field)
through a surface (modeled by the manifold). This idea of throughput is only sensible
if the surface has a definite front and back, or, if it is closed, a definite inside and outside. Locally, it is always possible to define a front and a back, but an awkward global
situation can cause problems. Consider the Möbius strip depicted in Figure 52. If we
start as indicated in the figure, traveling on what we perceive as the “front” of the strip,
then after one full traversal of the strip we find ourselves on what we perceive as the
back of the strip, plus our directions of “up” and “down” have been reversed. Therefore it is not possible to globally define a positive direction of throughput through the
Möbius strip. To avoid such problems, we must limit our attention to those manifolds
for which we cannot lose our orientation like that. The first step is to recall that, because for $d$-dimensional vector spaces $V$ the space $\Lambda^d(V)$ is one dimensional, we can
represent the set of bases of $V$ as a union of two disjoint subsets.
Definition 19.57 Let $V$ be a $d$-dimensional real vector space and let $\omega \in \Lambda^d(V) \setminus \{0\}$.
The two sets of bases $P := \{B : B = \{v_1, \ldots, v_d\}$ is a base of $V$, $\omega(v_1, \ldots, v_d) > 0\}$
and $N := \{B : B = \{v_1, \ldots, v_d\}$ is a base of $V$, $\omega(v_1, \ldots, v_d) < 0\}$ are called orientations of $V$ and the orientation to which a base $\{v_1, \ldots, v_d\}$ belongs is denoted
$[v_1, \ldots, v_d]$.
For example, two orthonormal bases in $\mathbb{R}^2$ (with the order of the vectors assumed to
be fixed) are in the same orientation iff each can be obtained from the other by rotating
both base vectors by the same amount. This simple visualization shows the problem
with the Möbius strip. If we choose an orthonormal base at our starting point (see
Figure 52) and carry it with us in a way that tries to maintain the orientation (one such
way is indicated), then, no matter what we do, after one full traversal of the strip, our
base will be in the other orientation of $\mathbb{R}^2$. In Figure 52, this is easy to see because if
we rotate the base labeled “problem” back onto our original base, we notice that the
base vectors have changed roles. The originally horizontal vector is now vertical and
vice versa. Such an interchange cannot be achieved with rotations alone. In terms of
the form $\omega$, the value of $\omega$ on the pair of vectors has changed sign, which means the
bases are in opposite orientations.
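The sign criterion of Definition 19.57 is easy to try out in $\mathbb{R}^2$. The following sketch assumes the concrete nonzero form $\omega = dx_1 \wedge dx_2$, that is, the determinant; the helper names `det2`, `rotate` and `same_orientation` are hypothetical. It checks that rotating both base vectors keeps a base in its orientation, whereas interchanging the two vectors moves it to the other orientation.

```python
import math


def det2(u, v):
    """Value of the 2-form dx1 ^ dx2 on the ordered pair (u, v)."""
    return u[0] * v[1] - u[1] * v[0]


def rotate(u, angle):
    """Rotate the vector u counterclockwise by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * u[0] - s * u[1], s * u[0] + c * u[1])


def same_orientation(base_a, base_b):
    """Two ordered bases lie in the same orientation iff the form has the same sign on both."""
    return det2(*base_a) * det2(*base_b) > 0


e = ((1.0, 0.0), (0.0, 1.0))                     # standard base
rotated = tuple(rotate(u, 0.7) for u in e)       # both vectors rotated by the same amount
swapped = (e[1], e[0])                           # base vectors have changed roles

print(same_orientation(e, rotated))  # True
print(same_orientation(e, swapped))  # False
```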
For a general manifold, we cannot refer to surrounding space, but the tangent spaces
allow us to model the idea described above. Simply speaking, we demand that in each
domain of a chart x the orientation of the tangent spaces is given by the images of a
fixed orientation of $\mathbb{R}^m$ under the parametrization $x^{-1}$ of the manifold.
Figure 52: It is impossible to orient a Möbius strip (b). Because of the half-twist in
the strip, any orientation that is carried around the strip in a continuous fashion will
not arrive as the same orientation at the starting point. On the other hand, spheres are
orientable as indicated by the consistent orientation in (a). (We need to be careful
interpreting this figure. There is a difficult theorem in topology that says that on a
sphere there is no continuous vector field without zeroes. Thus the bases indicated
cannot be interpolated by vector fields. However, any two bases can be mapped to each
other by translating tangentially along the sphere and then rotating and this process
also works if we carry a base in any fashion around the sphere until we are back at the
starting point.)
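The loss of orientation on the Möbius strip can also be observed numerically. The sketch below uses one common parametrization of the strip in $\mathbb{R}^3$ (an assumption; the text does not fix a parametrization) and computes the unit normal $r_u \times r_v / \|r_u \times r_v\|$ along the center line with central differences. The function names are hypothetical. After one full traversal the point is the same, but the continuously transported normal has turned into its negative.

```python
import math


def mobius(u, v):
    """A standard parametrization of the Moebius strip, u in [0, 2*pi], v in [-1/2, 1/2]."""
    w = 1.0 + v * math.cos(u / 2.0)
    return (w * math.cos(u), w * math.sin(u), v * math.sin(u / 2.0))


def partial(f, u, v, which, h=1e-6):
    """Central-difference approximation of the partial derivative of f in u (0) or v (1)."""
    if which == 0:
        a, b = f(u + h, v), f(u - h, v)
    else:
        a, b = f(u, v + h), f(u, v - h)
    return tuple((p - q) / (2.0 * h) for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit_normal(u, v=0.0):
    """Unit normal r_u x r_v / |r_u x r_v| at the point mobius(u, v)."""
    n = cross(partial(mobius, u, v, 0), partial(mobius, u, v, 1))
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


start_point = mobius(0.0, 0.0)
end_point = mobius(2.0 * math.pi, 0.0)
start_normal = unit_normal(0.0)
end_normal = unit_normal(2.0 * math.pi)

# Same point on the strip, but the transported normal has flipped.
dot = sum(a * b for a, b in zip(start_normal, end_normal))
print(start_point, end_point)
print(dot)  # close to -1
```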
Definition 19.58 A choice of an orientation $\mu_p$ for every tangent space $M_p$ of the $m$-dimensional manifold $M$ is called consistent iff for every chart $x : U \to \mathbb{R}^m$ and
all $a, b \in U$ we have $\Big[x_*^{-1}\big[[\mathrm{id}_{x[U]}, e_1]_{x(a)}\big], \ldots, x_*^{-1}\big[[\mathrm{id}_{x[U]}, e_m]_{x(a)}\big]\Big] = \mu_a$ if and only
if $\Big[x_*^{-1}\big[[\mathrm{id}_{x[U]}, e_1]_{x(b)}\big], \ldots, x_*^{-1}\big[[\mathrm{id}_{x[U]}, e_m]_{x(b)}\big]\Big] = \mu_b$. Manifolds with a consistent
orientation are called orientable and the function $\mu$ is called an orientation. A manifold together with an orientation is called an oriented manifold. Charts $x : U \to \mathbb{R}^m$
such that for all $a \in U$ we have $\Big[x_*^{-1}\big[[\mathrm{id}_{x[U]}, e_1]_{x(a)}\big], \ldots, x_*^{-1}\big[[\mathrm{id}_{x[U]}, e_m]_{x(a)}\big]\Big] = \mu_a$
are called orientation preserving.
Note that because k-cubes are defined in terms of coordinate systems, Definition
19.58 also defines orientation preserving $k$-cubes. For orientation preserving $m$-cubes
$c_1$ and $c_2$, the determinant $\det\left(D\left(c_1^{-1} \circ c_2\right)\right)$ is always positive. Therefore Theorem
19.56 shows that the integral of an $m$-form over an orientation preserving $m$-cube only
depends on the range $c\big[[0,1]^m\big]$ of the $m$-cube, not on the specific $m$-cube itself. That
is, the integral over $c\big[[0,1]^m\big]$ does not depend on the parametrization, as long as we
only use orientation preserving parametrizations ($m$-cubes).
Definition 19.59 Let $c : [0,1]^m \to M$ be an orientation-preserving $m$-cube in the
$m$-dimensional oriented manifold $M$ and let $\omega$ be an $m$-form that vanishes outside
$c\big[[0,1]^m\big]$. Then we define
$$\int_{c[[0,1]^m]} \omega := \int_c \omega.$$
Orientations and forms are defined similarly for manifolds with boundary and for
manifolds with corners.
Now that we have taken care of the orientability issue, the definition of the integral
of a form over a manifold is straightforward. To avoid formal problems, we stay with
connected manifolds. This is not a problem, because we will mostly be interested in
manifolds that are the disjoint union of at most finitely many connected manifolds.
To break up the integral, we need infinitely differentiable partitions of unity that are
supported inside singular m-cubes. The following results show that such partitions
exist.
Definition 19.60 A partition of unity of the manifold $M$ is called a $C^\infty$ partition of
unity iff all functions in the partition are $C^\infty$ functions.
Theorem 19.61 Let $\mathcal{O}$ be an open cover of the connected manifold $M$. Then there is a
$C^\infty$ partition of unity subordinate to $\mathcal{O}$.
Proof. This proof is the same as for Theorem 16.115, except that we choose the functions
to be $C^\infty$ instead of just continuous. By Theorem 19.18, this is possible. ∎
Theorem 19.62 Let $M$ be an $m$-dimensional oriented manifold. Then there is an open
cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular
$m$-cube $c_U$ with $U \subseteq c_U\big[[0,1]^m\big]$.
Proof. We provide a slightly more elaborate proof than necessary so that it is easy
to see how to generalize the result to manifolds with boundary or corners.
For each $p \in M$, let $x : V \to \mathbb{R}^m$ be an orientation-preserving coordinate system
around $p$ that maps $p$ to the origin. Let $C_p \subseteq x[V]$ be a compact cube in $\mathbb{R}^m$ so that $x(p)$
is in the relative interior (with respect to $x[V]$) of $C_p$. Then with $i_{C_p} : [0,1]^m \to C_p$
being the natural bijection between the cubes, the function $c_p := x^{-1} \circ i_{C_p}$ is an
orientation preserving singular $m$-cube so that the $M$-interior of $c_p\big[[0,1]^m\big]$ contains
$p$. Now let $W_p \subseteq C_p$ be a relatively open subset of $x[V]$ that contains $x(p)$. Then
$U_p := x^{-1}[W_p]$ is open, $p \in U_p \subseteq c_p\big[[0,1]^m\big]$ and $\mathcal{O} := \{U_p\}_{p \in M}$ covers $M$. ∎
Versions of Theorems 19.61 and 19.62 can also be proved as follows for manifolds
with boundary and manifolds with corners. For the $C^\infty$ partition of unity, we just need
to modify Theorem 19.18 appropriately for the boundary points. For the covers, the
relatively open sets in the proof of Theorem 19.62 will not be open in $\mathbb{R}^m$, because for
boundary points of manifolds with boundary one face of the cube $C_p$ will be contained
in the boundary of the space $\mathbb{H}^m$. For boundary points of manifolds with corners,
several faces could be contained in the boundary of the range of the coordinate system.
With all machinery in place it is now easy to see (Exercise 19-41) that the definition
below really defines only one number that does not depend on the parametrizations
or the choice of partition of unity. The same definition also gives the integral over
manifolds with boundary and manifolds with corners.
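Definition 19.63 Let $M$ be an $m$-dimensional oriented manifold and let $\omega$ be a compactly supported $m$-form on $M$. Let $\Phi$ be a partition of unity subordinate to an open
cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an orientation-preserving singular
$m$-cube $c_U$ with $U \subseteq c_U\big[[0,1]^m\big]$. Then we define
$$\int_M \omega := \sum_{\varphi \in \Phi} \int_M \varphi\, \omega,$$
where each integral on the right side is an integral as in Definition 19.59.
It is particularly noteworthy that because $\omega$ is compactly supported, the sum in
Definition 19.63 actually is finite (see Exercise 19-40).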
To visualize these ideas, consider integration over embedded manifolds in $\mathbb{R}^d$. For
an embedded manifold $M$, we can consider every tangent space $M_p$ to be a subspace
of the tangent space $\mathbb{R}^d_p$ of $\mathbb{R}^d$ at $p$. The identification is done as follows. Let $p \in M$,
let $U \subseteq \mathbb{R}^d$ be a neighborhood of $p$ in $\mathbb{R}^d$, let $V \subseteq \mathbb{R}^d$ be open and let $h : U \to V$
be a diffeomorphism as in Definition 19.5. As in the proof of Proposition 19.6 let
$x := \pi_{\mathbb{R}^m} \circ h|_{U \cap M}$ be the coordinate system around $p$ obtained from $h$. The function
$e : M_p \to \mathbb{R}^d_p$ defined by $e[x, v]_p := [h, v]_p$ is the desired isomorphism (see Exercise
19-42). The definition $\big\langle [\mathrm{id}_{\mathbb{R}^d}, u]_p, [\mathrm{id}_{\mathbb{R}^d}, w]_p \big\rangle_p := \langle u, w \rangle$, where $\langle \cdot, \cdot \rangle$ is the usual
inner product in $\mathbb{R}^d$, produces a natural inner product on $\mathbb{R}^d_p$. (This inner product on
$\mathbb{R}^d_p$ is well-defined, because the definition states that we will use the second component
of one unique representative of the tangent vector.) Hence, via the above embedding,
the tangent spaces $M_p$ of embedded manifolds also carry an inner product. Recalling
Proposition 18.29 we can define the following.
Definition 19.64 Let $M$ be an $m$-dimensional embedded oriented manifold in $\mathbb{R}^d$.
Then for each $p \in M$ we define the volume element $\omega_V(p)$ to be the unique $m$-form
so that for all orthonormal bases $\{v_1, \ldots, v_m\}$ in $\mu_p$, we have $\omega_V(p)[v_1, \ldots, v_m] = 1$.
Example 19.65 Volume elements encode the integral over open subsets of $\mathbb{R}^d$, as well
as the integrals over curves and surfaces.
1. Let $\Omega \subseteq \mathbb{R}^d$ be an open set considered as an oriented embedded manifold with
$\mu_p = [e_1, \ldots, e_d]$ for all $p \in \Omega$. Then with $x : \Omega \to \mathbb{R}^d$ denoting the natural
embedding, the volume element $\omega_V$ can be represented as the wedge product
$\omega_V = dx_1 \wedge \cdots \wedge dx_d$. In particular, this means that for all $f \in C_0^\infty(\Omega)$ the
integral of $f\omega_V$ from Definition 19.63 coincides with the Lebesgue integral of
$f$ over $\Omega$. We will also denote this integral by $\int_\Omega f\, dV$.
2. Let $M$ be a $(d-1)$-dimensional embedded oriented manifold and let $x$ be an
orientation-preserving coordinate system around $p$, constructed from a diffeomorphism $h : U \to V$ as in Definition 19.5. We define the unit normal vector
to be the unique unit vector $n(p)$ in $\mathbb{R}^d_p$ that is orthogonal to $h^{-1}_*(e_1), \ldots, h^{-1}_*(e_{d-1})$ and for which $\big(n(p), h^{-1}_*(e_1), \ldots, h^{-1}_*(e_{d-1})\big)$
is a positively oriented base of $\mathbb{R}^d_p$. Because the hyperplane determined by
$Dh^{-1}(h(p))[e_1], \ldots, Dh^{-1}(h(p))[e_{d-1}]$ (see Exercise 19-19) does not depend
on the diffeomorphism $h$, the vector $n(p)$ does not depend on the coordinate system. Moreover, the functions $n_j : M \to \mathbb{R}$ that map each $p \in M$ to the $j$th coordinate of the unit normal vector are differentiable. To see this, in a neighborhood
of $p$, let $\tilde{n}$ be the solution vector of the system of equations $\langle \tilde{n}, Dh^{-1}[e_i] \rangle = 0$,
$i = 1, \ldots, d-1$ and $\tilde{n}_j = 1$ for some $j$ so that $\frac{\partial h^{-1}_j}{\partial x_d} \neq 0$ in the $h$-image of
the neighborhood of $p$. Then $\tilde{n}$ is parallel or antiparallel to $n$. By Exercise 17-71 and the differentiability of the coefficients of the system, the components $\tilde{n}_j$
are differentiable. The components of $n$ are either the components of $\frac{\tilde{n}}{\|\tilde{n}\|}$ or
of $-\frac{\tilde{n}}{\|\tilde{n}\|}$, so the $n_j$ are differentiable, too.
The volume element of $M$ is $\omega_V(p)[v_1, \ldots, v_{d-1}] = \det(n(p), v_1, \ldots, v_{d-1})$,
where in the determinant we use the coordinate representation of the vectors with
respect to the base $\{[\mathrm{id}, e_1]_p, \ldots, [\mathrm{id}, e_d]_p\}$. Specifically for the case $d = 3$, if
$r$ is an orientation preserving singular 2-cube and $f : M \to \mathbb{R}$ is differentiable,
then
$$\int_{r[[0,1]^2]} f\, \omega_V = \int_{[0,1]^2} f(r(s,t)) \left\| \frac{\partial r}{\partial s} \times \frac{\partial r}{\partial t} \right\| d\lambda(s,t),$$
which is the parametric formula (from multivariable calculus) for the scalar surface integral of a function $f$ over a parametric surface parametrized by $r$. For the
integral of a vector field $F : \Omega \to \mathbb{R}^d$ defined on a neighborhood $\Omega$ of $M$, we
define the surface integral to be $\int_M F \cdot dS := \int_M \langle F, n \rangle\, \omega_V$. With the above,
we obtain the following in case $d = 3$.
$$\int_{r[[0,1]^2]} \langle F, n \rangle\, \omega_V = \int_{[0,1]^2} \left\langle F(r(s,t)), \frac{\partial r}{\partial s} \times \frac{\partial r}{\partial t} \right\rangle d\lambda(s,t),$$
which is the parametric formula (from multivariable calculus) for the surface
integral of a vector field $F$ over a parametric surface parametrized by $r$ (also see
Exercise 18-53c).
3. Let $M$ be a one dimensional embedded oriented manifold in $\mathbb{R}^d$, let $x : U \to \mathbb{R}$
be an orientation preserving coordinate system and let $a := x^{-1}$. Then (Exercise
19-43a) the volume element can be written as $\omega_V(t)[v] = \left\langle v, \frac{a'(t)}{\|a'(t)\|} \right\rangle$. The line
integral for scalar functions is defined in the obvious way. The line integral for
vector fields is defined by $\int_M F \cdot d\vec{r} := \int_M \left\langle F, \frac{a'}{\|a'\|} \right\rangle \omega_V$. As for the surface
integral, these formulas reduce to formulas familiar from multivariable calculus
when specific parametrizations are used (see Exercise 19-43).
The above examples show that the integral over manifolds encodes the integrals of
scalar functions, as well as of vector fields over solids, hypersurfaces, and curves with
one formalism. Moreover, this formalism shows that the integrals are independent of
the parametrizations chosen for the objects in question, which is a big advantage for
theoretical considerations. The computation of numerical values of these integrals still
uses parametrizations and proceeds along the same lines as in calculus.
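To make the last remark concrete, the following sketch evaluates the two parametric surface formulas from calculus with a midpoint rule. Everything specific in it is an assumption chosen for illustration: the surface is the unit sphere in $\mathbb{R}^3$ with the usual spherical parametrization over $[0,1]^2$ (which degenerates at the poles, a set of measure zero, so it is not literally a singular 2-cube in the sense of the text), the scalar function is $f = 1$ and the vector field is $F(x,y,z) = (x,y,z)$. Partial derivatives are approximated by central differences.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def partials(r, s, t, h=1e-6):
    """Central-difference approximations of the partial derivatives r_s and r_t."""
    r_s = tuple((p - q) / (2.0 * h) for p, q in zip(r(s + h, t), r(s - h, t)))
    r_t = tuple((p - q) / (2.0 * h) for p, q in zip(r(s, t + h), r(s, t - h)))
    return r_s, r_t


def scalar_surface_integral(f, r, n=100):
    """Midpoint rule for the integral of f(r(s,t)) * |r_s x r_t| over [0,1]^2."""
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        for j in range(n):
            t = (j + 0.5) / n
            normal = cross(*partials(r, s, t))
            total += f(*r(s, t)) * math.sqrt(dot(normal, normal))
    return total / n ** 2


def flux_integral(F, r, n=100):
    """Midpoint rule for the integral of <F(r(s,t)), r_s x r_t> over [0,1]^2."""
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        for j in range(n):
            t = (j + 0.5) / n
            total += dot(F(*r(s, t)), cross(*partials(r, s, t)))
    return total / n ** 2


def sphere(s, t):
    """Unit sphere; r_s x r_t points outward for this choice of parameters."""
    theta, phi = math.pi * s, 2.0 * math.pi * t
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


area = scalar_surface_integral(lambda x, y, z: 1.0, sphere)
flux = flux_integral(lambda x, y, z: (x, y, z), sphere)
print(area, flux, 4.0 * math.pi)
```

Both numbers approximate $4\pi$: the first is the surface area of the sphere, the second the outward flux of the radial field, for which $\langle F, n \rangle = 1$ on the sphere.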
Finally, note that the definitions presented here actually work in more general settings. The $\sigma$-algebra of Borel sets on $M$ is the $\sigma$-algebra generated by the open subsets
of $M$. The integral can actually be defined for $m$-forms $g\omega_V$ for which the function $g$
is Borel measurable. Hence, if we use the measure $\nu$ that assigns to each Borel subset
the integral of its indicator function, we can define the integral like we did on measure
spaces. In particular, this means that we can talk about $L^p$ spaces for which the domain
is a manifold.
Regarding the domain of the integral we note that the manifold need not be $C^\infty$;
$C^1$ is sufficient and boundaries and corners are permissible.
Exercises
19-40. Prove that in any sum as in Definition 19.63 only finitely many summands are not zero.
Hint. Use that the sets $\{p : \varphi(p) \neq 0\}$ form a locally finite open cover of $M$.
19-41. Let $M$ be an $m$-dimensional oriented manifold, let $\omega$ be a compactly supported $m$-form on $M$, let
$\Phi$ be a partition of unity subordinate to an open cover $\mathcal{O}$ of $M$ so that for each $U \in \mathcal{O}$ there is an
orientation-preserving singular $m$-cube $c_U$ with $U \subseteq c_U\big[[0,1]^m\big]$ and let $\Psi$ be a partition of unity
subordinate to an open cover $\mathcal{Q}$ of $M$ so that for each $V \in \mathcal{Q}$ there is an orientation-preserving
singular $m$-cube $c_V$ with $V \subseteq c_V\big[[0,1]^m\big]$. Prove that
$$\sum_{\varphi \in \Phi} \int_M \varphi\, \omega = \sum_{\psi \in \Psi} \int_M \psi\, \omega.$$
Hints. Use Theorem 19.56 to prove that the integrals of the $\varphi\psi\omega$ do not depend on whether we
choose the cube $c_U$ or $c_V$. Then prove that $\int_M \varphi\, \omega = \sum_{\psi \in \Psi} \int_M \varphi\psi\, \omega$. To switch the order of summations use that by Exercise 19-40 only finitely many summands are not zero.
19-42. Let $M$ be an $m$-dimensional embedded manifold, let $p \in M$, let $U \subseteq \mathbb{R}^d$ be a neighborhood of $p$ in
$\mathbb{R}^d$, let $V \subseteq \mathbb{R}^d$ be open and let $h : U \to V$ be a diffeomorphism as in Definition 19.5. As in the
proof of Proposition 19.6 let $x := \pi_{\mathbb{R}^m} \circ h|_{U \cap M}$ be the coordinate system around $p$ obtained from
$h$. Prove that $e[x, v]_p := [h, v]_p$ defines an isomorphism from $M_p$ to $\mathbb{R}^m_p$. You must also prove that
$e$ is well-defined.
19-43. Let $M$ be a one dimensional embedded oriented manifold and let $r : [0,1] \to M$ be an orientation
preserving singular 1-cube.
(a) Let $x : U \to \mathbb{R}$ be an orientation preserving coordinate system and let $a := x^{-1}$. Prove that
the volume element can be written as $\omega_V(t)[v] = \left\langle v, \frac{a'(t)}{\|a'(t)\|} \right\rangle$.
(b) Prove that $\int_r f\, \omega_V = \int_0^1 f(r(t))\, \|r'(t)\|\, d\lambda(t)$ for all differentiable $f : M \to \mathbb{R}$.
(c) Prove that $\int_r \left\langle F, \frac{a'}{\|a'\|} \right\rangle \omega_V = \int_0^1 \langle F(r(t)), r'(t) \rangle\, d\lambda(t)$ for all vector fields $F : \Omega \to \mathbb{R}^3$
defined on a neighborhood of $M$.
Figure 53: Visualization of the proof of Stokes’ Theorem for embedded manifolds. The
parametrization $x^{-1}$ maps the outward normal direction of the cube in its domain to
the outward normal direction of the $k$-cube that touches the boundary in the manifold.
19.6 Stokes’ Theorem
Because the tangent space $T\partial M$ of an oriented manifold with boundary or with corners
is contained in $TM$, we can obtain an orientation for $\partial M$ from the orientation of $M$.
Definition 19.66 Let $M$ be an oriented $m$-dimensional manifold with boundary or corners and let $\mu$ be its orientation. For every $p \in \partial M$ that is not contained in a corner,
we let $[v_1, \ldots, v_{m-1}] \in (\partial\mu)_p$ iff $[w, v_1, \ldots, v_{m-1}] \in \mu_p$ for all outward pointing
vectors $w$. The orientation $\partial\mu$ is called the induced orientation or the positive orientation on the boundary. For $d$-dimensional embedded manifolds in $d$-dimensional
space, it is also called the outward orientation, because the associated normal vector
literally points outward. We will always assume that the boundary carries the induced
orientation.
Formally, for a manifold with corners, the integral over the boundary is a sum of
integrals over the pieces of the boundary that are manifolds with corners themselves.
To ease notation, it will be understood that the integral over the boundary of a manifold
with corners is such a sum.
Theorem 19.67 Stokes’ Theorem. Let $M$ be an oriented $m$-dimensional manifold
with boundary or corners, let $\omega$ be a compactly supported $(m-1)$-form on $M$ and let
$\partial M$ carry the induced orientation. Then
$$\int_M d\omega = \int_{\partial M} \omega.$$
Proof. We first consider forms that are supported in the $M$-interior of an orientation-preserving $m$-cube. The result is trivial if $\omega$ is supported in the interior of an $m$-cube $c$
so that $c\big[[0,1]^m\big] \subseteq M \setminus \partial M$ because then $\int_M d\omega = \int_c d\omega = \int_{\partial c} \omega = 0 = \int_{\partial M} \omega$.
For a form $\omega$ that is supported in the $M$-interior of an orientation preserving $m$-cube
$c$ in $M$ that intersects the boundary but not the corners, note that $\partial M \cap c\big[[0,1]^m\big] \subseteq \partial c$
and that $\omega$ is zero on the parts of the boundary of $c$ that do not intersect $\partial M$. Let
$p \in \partial M \cap c\big[[0,1]^m\big]$ and let $x$ be an orientation preserving coordinate system around
$p$. (For the next statement, let a negative sign indicate the opposite orientation.)
Then $x(p) \in \partial\mathbb{H}^m$ and $\mu_{x(p)} = [e_1, \ldots, e_m] = (-1)^m[-e_m, e_1, \ldots, e_{m-1}]$, where
$-e_m$ is the outward unit normal vector. This means that the induced orientation on
$\partial M \cap c\big[[0,1]^m\big]$ is $(-1)^m$ times the usual orientation of $\mathbb{R}^{m-1}$. We can choose the
orientation-preserving singular $m$-cube $c$ so that $\partial M \cap c\big[[0,1]^m\big] = c_{(m,0)}\big[[0,1]^{m-1}\big]$.
Note that by the above $c_{(m,0)} : [0,1]^{m-1} \to \partial M$ is orientation preserving for even $m$
and orientation reversing for odd $m$. Hence, $\int_{c_{(m,0)}} \omega = (-1)^m \int_{\partial M} \omega$. But in $\partial c$ the
coefficient of $c_{(m,0)}$ is $(-1)^m$. (The left side of Figure 53 gives a visual idea of what
we do here.) Hence, we obtain
$$\int_M d\omega = \int_c d\omega = \int_{\partial c} \omega = (-1)^m \int_{c_{(m,0)}} \omega = (-1)^m (-1)^m \int_{\partial M} \omega = \int_{\partial M} \omega.$$
Finally, when $c$ intersects the corners, a similar argument for the faces of $c$ that are
contained in the boundary gives the same result. (Exercise 19-44. The right side of
Figure 53 gives a visual idea of what to do.)
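When we integrate a general form, each form $\varphi\omega$ will be supported in the interior
of an $m$-cube as indicated above. Thus we should be able to prove the general result
by summation. To move the functions $\varphi$ from the partition of unity into the differential
$d$, we will use the equality $0 = 0 \wedge \omega = d(1) \wedge \omega = d\Big(\sum_{\varphi \in \Phi} \varphi\Big) \wedge \omega = \sum_{\varphi \in \Phi} (d\varphi) \wedge \omega$.
By part 2 of Theorem 19.44 with $\varphi$ being a 0-form, we obtain $d(\varphi\omega) = (d\varphi) \wedge \omega + \varphi\, d\omega$.
Hence, because the restrictions of the $\varphi$ to the boundary are a partition of unity with
the requisite properties for defining the integral on the boundary, we conclude
$$\int_M d\omega = \sum_{\varphi \in \Phi} \int_M \varphi\, d\omega = \sum_{\varphi \in \Phi} \int_M \big((d\varphi) \wedge \omega + \varphi\, d\omega\big) = \sum_{\varphi \in \Phi} \int_M d(\varphi\omega) = \sum_{\varphi \in \Phi} \int_{\partial M} \varphi\, \omega = \int_{\partial M} \omega.$$
∎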
With the general result established, we can now present Stokes' Theorem in the
forms that are familiar from calculus (also see Figure 54). To prove a Divergence
Theorem in Rd, we first need to consider the volume element of (d - 1)-dimensional
embedded manifolds.
Theorem 19.68 Let $M$ be an embedded oriented $(d-1)$-dimensional manifold in $\mathbb{R}^d$
and let $n$ be its unit normal vector. Then the volume element can be represented
as
$$\omega_V(p) = \sum_{j=1}^d (-1)^{1+j} n_j(p)\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d.$$
Moreover, the equality $n_j\, \omega_V = (-1)^{1+j}\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d$ holds.
Proof. Let $v_1, \ldots, v_{d-1} \in M_p$. For the representation of $\omega_V$, by Exercises 18-33 and
18-34 via expansion with respect to the first column, we obtain the following, where
$\tilde{\pi}_j$ denotes the projection that erases the $j$th component of each vector.
$$\omega_V(p)[v_1, \ldots, v_{d-1}] = \det(n(p), v_1, \ldots, v_{d-1}) = \sum_{j=1}^d (-1)^{1+j} n_j(p) \det\big(\tilde{\pi}_j(v_1), \ldots, \tilde{\pi}_j(v_{d-1})\big)$$
$$= \sum_{j=1}^d (-1)^{1+j} n_j(p)\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d\, [v_1, \ldots, v_{d-1}].$$
For $n_j\omega_V$, we obtain with $n$ being the unit normal vector,
$$n_j(p)\, \omega_V(p)[v_1, \ldots, v_{d-1}] = n_j(p) \det(n(p), v_1, \ldots, v_{d-1}) = \det\big((n_j(p)\, n(p) - e_j) + e_j, v_1, \ldots, v_{d-1}\big)$$
$$= \det(e_j, v_1, \ldots, v_{d-1}) = (-1)^{1+j} \det\big(\tilde{\pi}_j(v_1), \ldots, \tilde{\pi}_j(v_{d-1})\big) = (-1)^{1+j}\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d\, [v_1, \ldots, v_{d-1}].$$
∎
Now we can prove the Divergence Theorem. For the remainder of this section, we
adopt notation that is seen in physics and the sciences with vectors indicated by arrows
on top of the letter and with integrals over closed surfaces and curves denoted with
a circle in the integral sign(s).
Theorem 19.69 Gauss' Theorem, also known as the Divergence Theorem. If the
set $E \subseteq \mathbb{R}^d$ is an embedded oriented connected compact $d$-dimensional manifold with
boundary or corners, if $S = \partial E$ with positive (outward) orientation and if $\vec{F}$ is a
vector field with continuous partial derivatives on an open region that contains $E$,
then
$$\oint_S \vec{F} \cdot d\vec{S} = \int_S \langle \vec{F}, \vec{n} \rangle\, \omega_V^S = \int_E \operatorname{div}(\vec{F})\, \omega_V^E = \int_E \operatorname{div}(\vec{F})\, dV$$
(also see Figure 54(a)).
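Proof. This is a consequence of Stokes' Theorem once we note the following (the second equality uses Theorem 19.68):
$$d\big(\langle \vec{F}, \vec{n} \rangle\, \omega_V^S\big) = d\Big(\sum_{j=1}^d F_j n_j\, \omega_V^S\Big) = d\Big(\sum_{j=1}^d (-1)^{1+j} F_j\, dx_1 \wedge \cdots \wedge \widehat{dx_j} \wedge \cdots \wedge dx_d\Big) = \sum_{j=1}^d \frac{\partial F_j}{\partial x_j}\, dx_1 \wedge \cdots \wedge dx_d = \operatorname{div}(\vec{F})\, \omega_V^E.$$
∎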
Figure 54: The Divergence Theorem (a) says that the integral of a vector field $\vec{F}$ over
a closed surface $S$ equals the integral of the field’s divergence $\operatorname{div}\vec{F}$ over the enclosed
solid $E$. Stokes’ Theorem (b) says that the line integral of a vector field $\vec{F}$ along a
closed curve $C$ equals the surface integral of $\operatorname{curl}\vec{F}$ over any surface $S$ bounded by the
curve.
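The Divergence Theorem can be checked numerically on a simple manifold with corners. The sketch below assumes the unit cube $E = [0,1]^3$ and a hypothetical vector field $F(x,y,z) = (x^2 y, z\sin y, x z^2)$ whose divergence is supplied by hand; both choices are illustrative and not taken from the text. The volume integral of the divergence and the outward flux through the six faces are both approximated with a midpoint rule.

```python
import math


def F(x, y, z):
    """Hypothetical smooth vector field used only for this illustration."""
    return (x * x * y, math.sin(y) * z, x * z * z)


def div_F(x, y, z):
    """Divergence of F, computed by hand: d/dx(x^2 y) + d/dy(z sin y) + d/dz(x z^2)."""
    return 2.0 * x * y + math.cos(y) * z + 2.0 * x * z


def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]


def volume_integral(g, n=40):
    """Midpoint rule for the integral of g over the unit cube [0,1]^3."""
    pts = midpoints(n)
    return sum(g(x, y, z) for x in pts for y in pts for z in pts) / n ** 3


def outward_flux(field, n=40):
    """Midpoint rule for the flux of the field through the six faces of the unit cube."""
    pts = midpoints(n)
    total = 0.0
    for axis in range(3):
        # On the face where coordinate `axis` equals `side`, the outward unit
        # normal is sign * e_axis, so <F, n> = sign * F_axis.
        for side, sign in ((0.0, -1.0), (1.0, 1.0)):
            for a in pts:
                for b in pts:
                    p = [a, b]
                    p.insert(axis, side)
                    total += sign * field(*p)[axis]
    return total / n ** 2


lhs = outward_flux(F)
rhs = volume_integral(div_F)
exact = 1.0 + 0.5 * math.sin(1.0)
print(lhs, rhs, exact)
```

For this field both sides equal $1 + \tfrac{1}{2}\sin 1$, which the two approximations reproduce up to discretization error.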
The result that is typically called “Stokes’ Theorem” is also a special case of Theorem 19.67. Note that integrals over two and three dimensional objects are often denoted
with two and three integral signs, respectively.
Theorem 19.70 Stokes’ Theorem for compact surfaces in $\mathbb{R}^3$. If $S$ is an embedded
oriented connected compact two dimensional manifold with boundary or corners, if
$C = \partial S$ with positive orientation and if $\vec{F}$ is a vector field with continuous partial
derivatives in an open region of $\mathbb{R}^3$ that contains $S$, then (also see Figure 54(b))
$$\oint_C \vec{F} \cdot d\vec{r} = \iint_S \operatorname{curl}(\vec{F}) \cdot d\vec{S}.$$
Proof. Exercise 19-45.
We conclude with an important result for line integrals.
Theorem 19.71 Fundamental Theorem for Line Integrals. Let the curve $C$ be
parametrized by the continuously differentiable function $\vec{r}(t)$, $a \leq t \leq b$. If $f$ is
differentiable and $\nabla f$ is continuous on $C$, then
$$\int_C \nabla f \cdot d\vec{r} = f(\vec{r}(b)) - f(\vec{r}(a)).$$
Proof. This can be proved with manifolds or directly. (Exercise 19-46.)
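A quick numerical check of Theorem 19.71 is sketched below. The function $f(x,y,z) = xy + \sin z$, its gradient (computed by hand) and the helix $\vec{r}(t) = (\cos t, \sin t, t)$ on $[0, 2]$ are assumptions chosen only for illustration. The line integral $\int_a^b \langle \nabla f(\vec{r}(t)), \vec{r}\,'(t) \rangle\, dt$ is approximated with a midpoint rule and compared with $f(\vec{r}(b)) - f(\vec{r}(a))$.

```python
import math


def f(x, y, z):
    return x * y + math.sin(z)


def grad_f(x, y, z):
    """Gradient of f, computed by hand."""
    return (y, x, math.cos(z))


def r(t):
    """A helix, used as a sample continuously differentiable curve."""
    return (math.cos(t), math.sin(t), t)


def r_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)


def line_integral(field, curve, velocity, a, b, n=2000):
    """Midpoint rule for the integral of <field(curve(t)), velocity(t)> over [a, b]."""
    step = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * step
        total += sum(p * q for p, q in zip(field(*curve(t)), velocity(t)))
    return total * step


a, b = 0.0, 2.0
lhs = line_integral(grad_f, r, r_prime, a, b)
rhs = f(*r(b)) - f(*r(a))
print(lhs, rhs)
```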
Exercises
19-44. Let $M$ be an $m$-dimensional manifold with corners, let $x : M \to C_k$ be a coordinate system and let
$c = x^{-1}\big|_{[0,1]^m} : [0,1]^m \to M$ be an orientation-preserving $m$-cube. Prove that for all $(m-1)$-forms $\omega$
that are supported in the $M$-interior of $c\big[[0,1]^m\big]$ the equality $\int_M d\omega = \int_{\partial M} \omega$ holds.
Hint. $\mu_{x(p)} = [e_1, \ldots, e_m] = (-1)^j [-e_j, e_1, \ldots, \widehat{e_j}, \ldots, e_m]$.
19-45. Prove Theorem 19.70.
Hint. Prove that $\left\langle \vec{F}, \frac{a'}{\|a'\|} \right\rangle \omega_V = P\, dx_1 + Q\, dx_2 + R\, dx_3$. Then use Theorem 19.68 for the double
integral.
19-46. Prove Theorem 19.71.
19-47. Some integrals.
(a) Compute the integral of the vector field $(x, y , z ) =
{ (x, y , 0 ) E R3
: x*
+ y 2 = 1 1.
(b) Compute the integral of the vector field @ ( x ,y , z ) =
to (4, - 3 , 2 ) .
(11
y
over the line segment from (0, 0, 1)
YZ'
(c) Compute the integral of the vector field e ( x , y , z ) = ( x 2 z 2
+
cylinder
{ (x, y , z ) E R3 : x2 + y 2 5
1. -1 5
19-48. A subset $\Omega \subseteq \mathbb{R}^m$ is called convex iff for all $x, y \in \Omega$ the line segment $\{x + t(y - x) : t \in [0,1]\}$
is contained in $\Omega$.
(a) Prove that if $\Omega \subseteq \mathbb{R}^3$ is open and convex and $\vec{F} : \Omega \to \mathbb{R}^3$ satisfies $\operatorname{curl}\vec{F} = 0$ on $\Omega$, then
there is a differentiable function $\varphi : \Omega \to \mathbb{R}$ so that $\vec{F} = \nabla\varphi$.
Hint. Let $x \in \Omega$ be fixed and define $\varphi(z)$ to be the line integral $\varphi(z) := \int_{[x,z]} \vec{F} \cdot d\vec{r}$, where
the integral is over the line segment from $x$ to $z$.
(b) A connected open subset $\Omega \subseteq \mathbb{R}^3$ is called simply connected iff every closed curve, that is
parametrized by a continuous function $\vec{r} : [a, b] \to \mathbb{R}^3$ for which there is a $c \in [a, b]$ so that
$\vec{r}|_{[a,c]}$ and $\vec{r}|_{[c,b]}$ are continuously differentiable, is the boundary of a compact embedded
two dimensional manifold with corners that is contained in $\Omega$.
Explain why the result from part 19-48a also holds for simply connected sets $\Omega$.
19-49. Green's Theorem. Let $D$ be a two dimensional embedded connected compact oriented manifold
with boundary or corners, let $C = \partial D$ be the boundary curve with positive orientation. Let $D$ be the
region in the plane bounded by $C$. Prove that if $P$ and $Q$ have continuous partial derivatives on an
open set that contains $D$, then
$$\oint_C \begin{pmatrix} P \\ Q \end{pmatrix} \cdot d\vec{r} = \int_D \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) d\lambda.$$
Chapter 20
Hilbert Spaces
In addition to the topological structure of a metric space and the linear structure of a
normed space, in an inner product space we can measure angles and in particular we
can define orthogonality. This additional structure allows us to derive results that are
not easily accessible otherwise. The properties of orthonormal bases investigated in
Section 20.1 will allow us to establish the L2-convergence of Fourier series in Section
20.2 and we conclude in Section 20.3 with Riesz’ Representation Theorem for linear
functionals on Hilbert spaces.
As noted in Section 15.9, the inner products of real and complex inner product
spaces have slightly different properties. To avoid stating all results for real and for
complex spaces, in Sections 20.1 and 20.3 we will assume that our inner product spaces
are complex. The proofs will also work for real inner product spaces, because for a real
number, the real part and the complex conjugate are equal to the number itself.
20.1 Orthonormal Bases
Because an inner product allows us to define orthogonality, we are interested in representing the elements of an inner product space as a sum of orthonormal vectors, similar
to the base representation of vectors in d-dimensional space. This section presents
the general results and Section 20.2 shows the consequences for the representation of
functions with trigonometric polynomials.
We first need to make sure that in a representation with an orthonormal system
there are not too many nonzero coefficients.
Proposition 20.1 Bessel’s inequality. Let S be an orthonormal system in the inner
product space H and let $x \in H$. Then $\{ s \in S : \langle x, s \rangle \neq 0 \}$ is countable and
\[ \sum_{s \in S} |\langle x, s \rangle|^2 \leq \|x\|^2. \]
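Proof. First let $C \subseteq S$ be finite. Then
\[ 0 \leq \left\| x - \sum_{c \in C} \langle x, c \rangle c \right\|^2 = \|x\|^2 - 2 \Re \left( \sum_{c \in C} \overline{\langle x, c \rangle} \langle x, c \rangle \right) + \sum_{c \in C} |\langle x, c \rangle|^2 = \|x\|^2 - \sum_{c \in C} |\langle x, c \rangle|^2, \]
which means $\sum_{c \in C} |\langle x, c \rangle|^2 \leq \|x\|^2$. Now suppose for a contradiction that the set
$\{ s \in S : \langle x, s \rangle \neq 0 \}$ is not countable. Then there are an $\varepsilon > 0$ and a set $B \subseteq S$
so that B is at least countably infinite and for all $b \in B$ the inequality $|\langle x, b \rangle| > \varepsilon$
holds. Let $N \in \mathbb{N}$ be greater than $\frac{\|x\|^2}{\varepsilon^2}$ and let $B_N \subseteq B$ be an N-element subset of
B. Then
\[ \sum_{b \in B_N} |\langle x, b \rangle|^2 > N \varepsilon^2 > \frac{\|x\|^2}{\varepsilon^2} \varepsilon^2 = \|x\|^2, \]
a contradiction. Therefore the set
$\{ s \in S : \langle x, s \rangle \neq 0 \}$ must be countable. The inequality follows from the inequality for
finite subsets of S proved above.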
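The finite case of Bessel's inequality can be checked numerically. The following is a minimal sketch in $\mathbb{C}^3$, assuming the inner product is linear in the first factor and conjugate linear in the second; the vector x, the two-element system S and the helper names inner and norm_sq are illustrative choices, not taken from the text.

```python
# Sketch: Bessel's inequality for a two-element orthonormal system in C^3.
# All names and sample values here are illustrative assumptions.

def inner(x, y):
    """Inner product on C^d, linear in the first factor, conjugate linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm_sq(x):
    """Squared norm induced by the inner product."""
    return inner(x, x).real

r = 2 ** -0.5
# An orthonormal system that is not an orthonormal base of C^3.
S = [
    (r, r * 1j, 0j),
    (r, -r * 1j, 0j),
]
x = (1 + 2j, 3 - 1j, 2 + 0j)

# Fourier coefficients <x, s> and the left side of Bessel's inequality.
coefficients = [inner(x, s) for s in S]
bessel_sum = sum(abs(c) ** 2 for c in coefficients)

print(bessel_sum, "<=", norm_sq(x))
```

Because S spans only the first two coordinates, the sum of the squared coefficients recovers the squared norm of the part of x in that plane and falls short of $\|x\|^2$ by the squared third coordinate.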
Bessel’s inequality shows that the sum $\sum_{s \in S} \langle x, s \rangle s$, which, under the right circumstances, should represent the element x, must converge in a Hilbert space (see Exercise
20-1). Thus we can define orthonormal bases.
Definition 20.2 An orthonormal system S in an inner product space H is called an
orthonormal base iff for all $x \in H$ the series $\sum_{s \in S} \langle x, s \rangle s$ converges to x. The numbers
$\langle x, s \rangle$ are also called the Fourier coefficients of x with respect to S.
The term “Fourier coefficients” is usually associated with the expansion of functions in terms of trigonometric functions. Section 20.2 will show that the results here
generalize the original Fourier expansions.
Theorems 20.3 and 20.4 give several criteria for an orthonormal system to be an
orthonormal base. Note that we will freely use the continuity of the inner product in
both factors, which is guaranteed by the Cauchy-Schwarz inequality.
Theorem 20.3 Parseval’s identity. An orthonormal system S in an inner product
space H is an orthonormal base iff for all $x \in H$ we have $\|x\|^2 = \sum_{s \in S} |\langle x, s \rangle|^2$.
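Proof. For “$\Rightarrow$,” note that if S is an orthonormal base, then
\[ \|x\|^2 = \langle x, x \rangle = \left\langle \sum_{s \in S} \langle x, s \rangle s, x \right\rangle = \sum_{s \in S} \langle x, s \rangle \langle s, x \rangle = \sum_{s \in S} |\langle x, s \rangle|^2. \]
For “$\Leftarrow$,” let $\|x\|^2 = \sum_{s \in S} |\langle x, s \rangle|^2$ for all $x \in H$. By Proposition 20.1 the set
$S_x := \{ s \in S : \langle x, s \rangle \neq 0 \}$ is countable. Let $(s_j)_{j=1}^{\infty}$ be an enumeration of $S_x$,
let $\varepsilon > 0$ and let $N \in \mathbb{N}$ be so that for all $n \geq N$ the inequality
$\sum_{j=n+1}^{\infty} |\langle x, s_j \rangle|^2 < \varepsilon^2$
holds. Then for all $n \geq N$ we obtain the following:
\[ \left\| x - \sum_{j=1}^{n} \langle x, s_j \rangle s_j \right\|^2 = \|x\|^2 - \sum_{j=1}^{n} |\langle x, s_j \rangle|^2 = \sum_{j=n+1}^{\infty} |\langle x, s_j \rangle|^2 < \varepsilon^2. \]
Hence, $\left\| x - \sum_{j=1}^{n} \langle x, s_j \rangle s_j \right\| < \varepsilon$ for all $n \geq N$ and the series converges to x. ∎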
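The identity behind the second half of the proof, $\| x - \sum_{j=1}^{n} \langle x, s_j \rangle s_j \|^2 = \|x\|^2 - \sum_{j=1}^{n} |\langle x, s_j \rangle|^2$, can be observed in a finite dimensional example. The sketch below assumes the space $\mathbb{R}^3$ with the standard inner product; the base, the vector x and the helper names are hypothetical choices made for illustration.

```python
# Sketch: the residual of a partial Fourier sum equals the tail of Parseval's sum.
# The base, the vector x and all helper names are illustrative assumptions.

def inner(x, y):
    """Inner product, linear in the first factor, conjugate linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def partial_sum(x, system, n):
    """Return the sum of <x, s_j> s_j over the first n elements of the system."""
    total = [0j] * len(x)
    for s in system[:n]:
        c = inner(x, s)
        total = [t + c * s_k for t, s_k in zip(total, s)]
    return total

def residual_sq(x, system, n):
    """Squared distance between x and its n-th partial Fourier sum."""
    p = partial_sum(x, system, n)
    diff = [a - b for a, b in zip(x, p)]
    return inner(diff, diff).real

r = 2 ** -0.5
base = [(r, r, 0.0), (r, -r, 0.0), (0.0, 0.0, 1.0)]  # orthonormal base of R^3
x = (3.0, 1.0, 2.0)

x_norm_sq = inner(x, x).real
coeff_sq = [abs(inner(x, s)) ** 2 for s in base]
residuals = [residual_sq(x, base, n) for n in range(len(base) + 1)]
tails = [x_norm_sq - sum(coeff_sq[:n]) for n in range(len(base) + 1)]

for n, (res, tail) in enumerate(zip(residuals, tails)):
    print(n, res, tail)
```

The two columns agree up to rounding, and both reach zero once all elements of the base are used, which is Parseval's identity for this x.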
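Theorem 20.4 Let S be an orthonormal system in the Hilbert space H. Then the following are equivalent.
1. S is an orthonormal base.
2. S is maximal.
3. span(S) is dense in H.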
Proof. “$1 \Rightarrow 3$” follows directly from the definition of orthonormal bases.
For “$3 \Rightarrow 2$,” let span(S) be dense in H. For a contradiction, suppose S is not
maximal. Then there is a $b \in H$ so that $\|b\| = 1$ and $\langle b, s \rangle = 0$ for all $s \in S$.
But then $\langle b, c \rangle = 0$ for all $c \in \mathrm{span}(S)$, and for all $c \in \mathrm{span}(S)$ we conclude that
$\|b - c\|^2 = \|b\|^2 - 2 \Re(\langle b, c \rangle) + \|c\|^2 = \|b\|^2 + \|c\|^2 \geq 1$. This is a contradiction to
span(S) being dense in H.
For “$2 \Rightarrow 1$,” let S be a maximal orthonormal system in H. Suppose for a contradiction that there is an $x \in H$ so that $x \neq \sum_{s \in S} \langle x, s \rangle s$. Let $b := x - \sum_{s \in S} \langle x, s \rangle s \neq 0$. Then
for all $t \in S$ we infer
\[ \langle b, t \rangle = \langle x, t \rangle - \sum_{s \in S} \langle x, s \rangle \langle s, t \rangle = \langle x, t \rangle - \langle x, t \rangle = 0, \]
which contradicts the maximality of S.
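The step “$2 \Rightarrow 1$” can be made concrete in a small example. The sketch below assumes $\mathbb{R}^3$ with the standard inner product and a two-element orthonormal system S that is not maximal; the remainder b is then nonzero and orthogonal to every element of S. The system, the vector x and the helper names are hypothetical.

```python
# Sketch: for a non-maximal orthonormal system, b := x - sum <x, s> s is a
# nonzero vector orthogonal to the system. Names and values are illustrative.
import math

def inner(x, y):
    """Inner product, linear in the first factor, conjugate linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def remainder(x, system):
    """Return b := x - sum over s in system of <x, s> s."""
    b = list(x)
    for s in system:
        c = inner(x, s)
        b = [b_k - c * s_k for b_k, s_k in zip(b, s)]
    return b

r = 2 ** -0.5
S = [(r, r, 0.0), (r, -r, 0.0)]  # orthonormal, but not maximal in R^3
x = (3.0, 1.0, 2.0)

b = remainder(x, S)
b_norm = math.sqrt(inner(b, b).real)
unit_b = [b_k / b_norm for b_k in b]

# Each entry should vanish up to rounding: unit_b is orthogonal to all of S.
orthogonality = [abs(inner(unit_b, s)) for s in S]
print(unit_b, orthogonality)
```

Adding unit_b to S gives a strictly larger orthonormal system, which is exactly what maximality rules out.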
A metric space is called separable iff it has a countable dense subset. Separable
Hilbert spaces are important in quantum mechanics. The next result shows that, similar to Proposition 15.25 and Theorem 16.76 for finite dimensional spaces, all infinite
dimensional separable Hilbert spaces are “the same.”
Theorem 20.5 Every infinite dimensional separable Hilbert space H is isomorphic to
the space $l^2$.
Proof. We will first prove that H has a countable orthonormal base. To do this let
$C = \{ c_n : n \in \mathbb{N} \} \subseteq H$ be a countable dense subset with $c_1 \neq 0$. Construct the subset
$B \subseteq C$ recursively from C as follows. Let $c_1 \in B$. For all integers $n \geq 2$, let $c_n \in B$
iff $c_n \notin \mathrm{span}(\{ c_1, \ldots, c_{n-1} \})$. Then for all $n \in \mathbb{N}$ the set $\{ c_k : k \leq n, c_k \in B \}$ is linearly independent and $\mathrm{span}(\{ c_k : k \leq n, c_k \in B \}) = \mathrm{span}(\{ c_1, \ldots, c_n \})$. Hence, B is
linearly independent and span(B) = span(C). The set B must be infinite, because otherwise, H has a finite dimensional dense subspace and is thus itself finite dimensional
(Exercise 20-2). Now let $\{ b_n : n \in \mathbb{N} \}$ be an enumeration of B and apply the Gram-Schmidt Orthonormalization Procedure indefinitely to obtain the orthonormal system
$S = \{ s_n : n \in \mathbb{N} \}$ with span(S) = span(B) = span(C). Then S is an orthonormal
system whose span is dense in H, which means S is an orthonormal base of H.
Because S is a countable orthonormal base, every element $x \in H$ has a unique
representation $x = \sum_{j=1}^{\infty} \langle x, s_j \rangle s_j$. The map $I : H \to l^2$ with $I(x) := \sum_{j=1}^{\infty} \langle x, s_j \rangle e_j$ is
the desired isomorphism (details left to Exercise 20-4a).
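The construction in the first part of the proof (discard every $c_n$ that lies in the span of its predecessors, then orthonormalize what remains) can be sketched for a finite list of vectors. The code below is an illustration under stated assumptions: it works in $\mathbb{R}^3$, merges the selection step and the Gram-Schmidt step into one loop, and decides membership in the span with a numerical tolerance tol, which is an artifact of floating point arithmetic and not part of the proof. The function name orthonormalize and the sample list C are hypothetical.

```python
# Sketch: select a linearly independent subfamily and orthonormalize it.
# The tolerance, the function name and the sample vectors are assumptions.
import math

def inner(x, y):
    """Inner product, linear in the first factor, conjugate linear in the second."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt that skips vectors in the span of their predecessors.

    A vector c is skipped when its remainder, after subtracting the projections
    onto the system built so far, has length at most tol.
    """
    system = []
    for c in vectors:
        b = list(c)
        for s in system:
            coeff = inner(c, s)
            b = [b_k - coeff * s_k for b_k, s_k in zip(b, s)]
        length = math.sqrt(inner(b, b).real)
        if length > tol:
            system.append([b_k / length for b_k in b])
    return system

# The second and fifth vectors lie in the span of their predecessors.
C = [
    (1.0, 1.0, 0.0),
    (2.0, 2.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 3.0),
    (1.0, 2.0, 3.0),
]
S = orthonormalize(C)
print(len(S))
for s in S:
    print(s)
```

In the infinite dimensional setting of the theorem the loop never terminates; the sketch only shows how each new element $s_n$ is obtained from the vectors selected so far.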
Exercises
20-1. Prove that if H is a Hilbert space, $S \subseteq H$ is an orthonormal system and $x \in H$, then
$\sum_{s \in S} \langle x, s \rangle s$
converges.
20-2. Prove that if the Hilbert space H has a finite dimensional dense subspace F , then H is finite dimensional.
Hint. Find an orthonormal base for F .
20-3. A characterization of equality. Let H be an inner product space, let $D \subseteq H$ be dense and let
$u, f \in H$. Prove that u = f iff for all $x \in D$ we have $\langle u, x \rangle = \langle f, x \rangle$.
20-4. Finishing the proof of Theorem 20.5
(a) Prove that the function I defined at the end of the proof of Theorem 20.5 is well-defined,
linear, bijective, and continuous and that its inverse is continuous, too.
(b) Explain why we need that H is a Hilbert space in Theorem 20.5.
20-5. Let H be a Hilbert space and let S be an orthonormal base of H .
(a) Prove that for all $x, y \in H$ we have x = y iff $\langle x, s \rangle = \langle y, s \rangle$ for all $s \in S$.
(b) Let Y be a Banach space. Prove that if $L, M : H \to Y$ are continuous linear functions and
L(s) = M(s) for all $s \in S$, then L = M.
20-6. Prove that every orthonormal system in a separable inner product space H is at most countable.
Hint. The distance between any two distinct elements in an orthonormal system is $\sqrt{2}$. Use that if C
is dense, then for each element u in an uncountable orthonormal system, there must be an element
$c_u \in C$ that is closer than $\frac{\sqrt{2}}{2}$ to u.
20.2 Fourier Series
The representation of elements of inner product spaces is motivated by the corresponding representation in $\mathbb{R}^d$ as well as by the representation of functions via trigonometric
polynomials. This representation is important, because it (and similar representations)
arises naturally when solving partial differential equations (see Section 21.3). We will
now use the tools we have introduced for inner product spaces to investigate the convergence of Fourier series of functions on $[-\pi, \pi)$. In this section, we work with the real
normed spaces $L^p[-\pi, \pi)$ =