Tom M. Apostol
CALCULUS
VOLUME I
One-Variable Calculus, with an
Introduction to Linear Algebra
SECOND EDITION
John Wiley & Sons, Inc.
New York · Santa Barbara · London · Sydney · Toronto
CONSULTING EDITOR
George Springer, Indiana University
XEROX® is a trademark of Xerox Corporation.
Second Edition Copyright © 1967 by John Wiley & Sons, Inc.
First Edition copyright © 1961 by Xerox Corporation.
All rights reserved. Permission in writing must be obtained
from the publisher before any part of this publication may
be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopy, recording,
or any information storage or retrieval system.
ISBN 0 471 00005 1
Library of Congress Catalog Card Number: 67-14605
Printed in the United States of America.
1 0 9 8 7 6 5 4 3 2
TO
Jane and Stephen
PREFACE
Excerpts from the Preface to the First Edition
There seems to be no general agreement as to what should constitute a first course in
calculus and analytic geometry. Some people insist that the only way to really understand
calculus is to start off with a thorough treatment of the real-number system and develop
the subject step by step in a logical and rigorous fashion. Others argue that calculus is
primarily a tool for engineers and physicists; they believe the course should stress applications of the calculus by appeal to intuition and by extensive drill on problems which develop
manipulative skills. There is much that is sound in both these points of view. Calculus is
a deductive science and a branch of pure mathematics. At the same time, it is very important to remember that calculus has strong roots in physical problems and that it derives
much of its power and beauty from the variety of its applications. It is possible to combine
a strong theoretical development with sound training in technique; this book represents
an attempt to strike a sensible balance between the two. While treating the calculus as a
deductive science, the book does not neglect applications to physical problems. Proofs of
all the important theorems are presented as an essential part of the growth of mathematical
ideas; the proofs are often preceded by a geometric or intuitive discussion to give the
student some insight into why they take a particular form. Although these intuitive discussions will satisfy readers who are not interested in detailed proofs, the complete proofs
are also included for those who prefer a more rigorous presentation.
The approach in this book has been suggested by the historical and philosophical development of calculus and analytic geometry. For example, integration is treated before
differentiation. Although to some this may seem unusual, it is historically correct and
pedagogically sound. Moreover, it is the best way to make meaningful the true connection
between the integral and the derivative.
The concept of the integral is defined first for step functions.
Since the integral of a step
function is merely a finite sum, integration theory in this case is extremely simple. As the
student learns the properties of the integral for step functions, he gains experience in the
use of the summation notation and at the same time becomes familiar with the notation
for integrals. This sets the stage so that the transition from step functions to more general
functions seems easy and natural.
Preface to the Second Edition
The second edition differs from the first in many respects. Linear algebra has been
incorporated, the mean-value theorems and routine applications of calculus are introduced
at an earlier stage, and many new and easier exercises have been added. A glance at the
table of contents reveals that the book has been divided into smaller chapters, each centering
on an important concept. Several sections have been rewritten and reorganized to provide
better motivation and to improve the flow of ideas.
As in the first edition, a historical introduction precedes each important new concept,
tracing its development from an early intuitive physical notion to its precise mathematical
formulation. The student is told something of the struggles of the past and of the triumphs
of the men who contributed most to the subject. Thus the student becomes an active
participant in the evolution of ideas rather than a passive observer of results.
The second edition, like the first, is divided into two volumes. The first two thirds of
Volume I deals with the calculus of functions of one variable, including infinite series and
an introduction to differential equations. The last third of Volume I introduces linear
algebra with applications to geometry and analysis. Much of this material leans heavily
on the calculus for examples that illustrate the general theory. It provides a natural
blending of algebra and analysis and helps pave the way for the transition from one-variable calculus to multivariable calculus, discussed in Volume II. Further development
of linear algebra will occur as needed in the second edition of Volume II.
Once again I acknowledge with pleasure my debt to Professors H. F. Bohnenblust,
A. Erdélyi, F. B. Fuller, K. Hoffman, G. Springer, and H. S. Zuckerman. Their influence
on the first edition continued into the second. In preparing the second edition, I received
additional help from Professor Basil Gordon, who suggested many improvements. Thanks
are also due George Springer and William P. Ziemer, who read the final draft. The staff
of the Blaisdell Publishing Company has, as always, been helpful; I appreciate their sympathetic consideration of my wishes concerning format and typography.
Finally, it gives me special pleasure to express my gratitude to my wife for the many ways
she has contributed during the preparation of both editions. In grateful acknowledgment
I happily dedicate this book to her.
T. M. A.
Pasadena,
California
September 16, 1966
CONTENTS
I. INTRODUCTION
Part 1. Historical Introduction
I 1.1 The two basic concepts of calculus
I 1.2 Historical background
I 1.3 The method of exhaustion for the area of a parabolic segment
*I 1.4 Exercises
I 1.5 A critical analysis of Archimedes’ method
I 1.6 The approach to calculus to be used in this book
Part 2. Some Basic Concepts of the Theory of Sets
I 2.1 Introduction to set theory
I 2.2 Notations for designating sets
I 2.3 Subsets
I 2.4 Unions, intersections, complements
I 2.5 Exercises
Part 3. A Set of Axioms for the Real-Number System
I 3.1 Introduction
I 3.2 The field axioms
*I 3.3 Exercises
I 3.4 The order axioms
*I 3.5 Exercises
I 3.6 Integers and rational numbers
I 3.7 Geometric interpretation of real numbers as points on a line
I 3.8 Upper bound of a set, maximum element, least upper bound (supremum)
I 3.9 The least-upper-bound axiom (completeness axiom)
I 3.10 The Archimedean property of the real-number system
I 3.11 Fundamental properties of the supremum and infimum
*I 3.12 Exercises
*I 3.13 Existence of square roots of nonnegative real numbers
*I 3.14 Roots of higher order. Rational powers
*I 3.15 Representation of real numbers by decimals
Part 4. Mathematical Induction, Summation Notation,
and Related Topics
I 4.1 An example of a proof by mathematical induction
I 4.2 The principle of mathematical induction
*I 4.3 The well-ordering principle
I 4.4 Exercises
*I 4.5 Proof of the well-ordering principle
I 4.6 The summation notation
I 4.7 Exercises
I 4.8 Absolute values and the triangle inequality
I 4.9 Exercises
*I 4.10 Miscellaneous exercises involving induction
1. THE CONCEPTS OF INTEGRAL CALCULUS
1.1 The basic ideas of Cartesian geometry
1.2 Functions. Informal description and examples
*1.3 Functions. Formal definition as a set of ordered pairs
1.4 More examples of real functions
1.5 Exercises
1.6 The concept of area as a set function
1.7 Exercises
1.8 Intervals and ordinate sets
1.9 Partitions and step functions
1.10 Sum and product of step functions
1.11 Exercises
1.12 The definition of the integral for step functions
1.13 Properties of the integral of a step function
1.14 Other notations for integrals
1.15 Exercises
1.16 The integral of more general functions
1.17 Upper and lower integrals
1.18 The area of an ordinate set expressed as an integral
1.19 Informal remarks on the theory and technique of integration
1.20 Monotonic and piecewise monotonic functions. Definitions and examples
1.21 Integrability of bounded monotonic functions
1.22 Calculation of the integral of a bounded monotonic function
1.23 Calculation of the integral $\int_0^b x^p \, dx$ when p is a positive integer
1.24 The basic properties of the integral
1.25 Integration of polynomials
1.26 Exercises
1.27 Proofs of the basic properties of the integral
2. SOME APPLICATIONS OF INTEGRATION
2.1 Introduction
2.2 The area of a region between two graphs expressed as an integral
2.3 Worked examples
2.4 Exercises
2.5 The trigonometric functions
2.6 Integration formulas for the sine and cosine
2.7 A geometric description of the sine and cosine functions
2.8 Exercises
2.9 Polar coordinates
2.10 The integral for area in polar coordinates
2.11 Exercises
2.12 Application of integration to the calculation of volume
2.13 Exercises
2.14 Application of integration to the concept of work
2.15 Exercises
2.16 Average value of a function
2.17 Exercises
2.18 The integral as a function of the upper limit. Indefinite integrals
2.19 Exercises
3. CONTINUOUS FUNCTIONS
3.1 Informal description of continuity
3.2 The definition of the limit of a function
3.3 The definition of continuity of a function
3.4 The basic limit theorems. More examples of continuous functions
3.5 Proofs of the basic limit theorems
3.6 Exercises
3.7 Composite functions and continuity
3.8 Exercises
3.9 Bolzano’s theorem for continuous functions
3.10 The intermediate-value theorem for continuous functions
3.11 Exercises
3.12 The process of inversion
3.13 Properties of functions preserved by inversion
3.14 Inverses of piecewise monotonic functions
3.15 Exercises
3.16 The extreme-value theorem for continuous functions
3.17 The small-span theorem for continuous functions (uniform continuity)
3.18 The integrability theorem for continuous functions
3.19 Mean-value theorems for integrals of continuous functions
3.20 Exercises
4. DIFFERENTIAL CALCULUS
4.1 Historical introduction
4.2 A problem involving velocity
4.3 The derivative of a function
4.4 Examples of derivatives
4.5 The algebra of derivatives
4.6 Exercises
4.7 Geometric interpretation of the derivative as a slope
4.8 Other notations for derivatives
4.9 Exercises
4.10 The chain rule for differentiating composite functions
4.11 Applications of the chain rule. Related rates and implicit differentiation
4.12 Exercises
4.13 Applications of differentiation to extreme values of functions
4.14 The mean-value theorem for derivatives
4.15 Exercises
4.16 Applications of the mean-value theorem to geometric properties of functions
4.17 Second-derivative test for extrema
4.18 Curve sketching
4.19 Exercises
4.20 Worked examples of extremum problems
4.21 Exercises
*4.22 Partial derivatives
*4.23 Exercises
5. THE RELATION BETWEEN INTEGRATION
AND DIFFERENTIATION
5.1 The derivative of an indefinite integral. The first fundamental theorem of calculus
5.2 The zero-derivative theorem
5.3 Primitive functions and the second fundamental theorem of calculus
5.4 Properties of a function deduced from properties of its derivative
5.5 Exercises
5.6 The Leibniz notation for primitives
5.7 Integration by substitution
5.8 Exercises
5.9 Integration by parts
5.10 Exercises
*5.11 Miscellaneous review exercises
6. THE LOGARITHM, THE EXPONENTIAL, AND THE
INVERSE TRIGONOMETRIC FUNCTIONS
6.1 Introduction
6.2 Motivation for the definition of the natural logarithm as an integral
6.3 The definition of the logarithm. Basic properties
6.4 The graph of the natural logarithm
6.5 Consequences of the functional equation $L(ab) = L(a) + L(b)$
6.6 Logarithms referred to any positive base $b \neq 1$
6.7 Differentiation and integration formulas involving logarithms
6.8 Logarithmic differentiation
6.9 Exercises
6.10 Polynomial approximations to the logarithm
6.11 Exercises
6.12 The exponential function
6.13 Exponentials expressed as powers of e
6.14 The definition of $e^x$ for arbitrary real x
6.15 The definition of $a^x$ for $a > 0$ and x real
6.16 Differentiation and integration formulas involving exponentials
6.17 Exercises
6.18 The hyperbolic functions
6.19 Exercises
6.20 Derivatives of inverse functions
6.21 Inverses of the trigonometric functions
6.22 Exercises
6.23 Integration by partial fractions
6.24 Integrals which can be transformed into integrals of rational functions
6.25 Exercises
6.26 Miscellaneous review exercises
7. POLYNOMIAL APPROXIMATIONS TO FUNCTIONS
7.1 Introduction
7.2 The Taylor polynomials generated by a function
7.3 Calculus of Taylor polynomials
7.4 Exercises
7.5 Taylor’s formula with remainder
7.6 Estimates for the error in Taylor’s formula
*7.7 Other forms of the remainder in Taylor’s formula
7.8 Exercises
7.9 Further remarks on the error in Taylor’s formula. The o-notation
7.10 Applications to indeterminate forms
7.11 Exercises
7.12 L’Hôpital’s rule for the indeterminate form $0/0$
7.13 Exercises
7.14 The symbols $+\infty$ and $-\infty$. Extension of L’Hôpital’s rule
7.15 Infinite limits
7.16 The behavior of $\log x$ and $e^x$ for large x
7.17 Exercises
8. INTRODUCTION TO DIFFERENTIAL EQUATIONS
8.1 Introduction
8.2 Terminology and notation
8.3 A first-order differential equation for the exponential function
8.4 First-order linear differential equations
8.5 Exercises
8.6 Some physical problems leading to first-order linear differential equations
8.7 Exercises
8.8 Linear equations of second order with constant coefficients
8.9 Existence of solutions of the equation $y'' + by = 0$
8.10 Reduction of the general equation to the special case $y'' + by = 0$
8.11 Uniqueness theorem for the equation $y'' + by = 0$
8.12 Complete solution of the equation $y'' + by = 0$
8.13 Complete solution of the equation $y'' + ay' + by = 0$
8.14 Exercises
8.15 Nonhomogeneous linear equations of second order with constant coefficients
8.16 Special methods for determining a particular solution of the nonhomogeneous equation $y'' + ay' + by = R$
8.17 Exercises
8.18 Examples of physical problems leading to linear second-order equations with
constant coefficients
8.19 Exercises
8.20 Remarks concerning nonlinear differential equations
8.21 Integral curves and direction fields
8.22 Exercises
8.23 First-order separable equations
8.24 Exercises
8.25 Homogeneous first-order equations
8.26 Exercises
8.27 Some geometrical and physical problems leading to first-order equations
8.28 Miscellaneous review exercises
9. COMPLEX NUMBERS
9.1 Historical introduction
9.2 Definitions and field properties
9.3 The complex numbers as an extension of the real numbers
9.4 The imaginary unit i
9.5 Geometric interpretation. Modulus and argument
9.6 Exercises
9.7 Complex exponentials
9.8 Complex-valued functions
9.9 Examples of differentiation and integration formulas
9.10 Exercises
10. SEQUENCES, INFINITE SERIES,
IMPROPER INTEGRALS
10.1 Zeno’s paradox
10.2 Sequences
10.3 Monotonic sequences of real numbers
10.4 Exercises
10.5 Infinite series
10.6 The linearity property of convergent series
10.7 Telescoping series
10.8 The geometric series
10.9 Exercises
*10.10 Exercises on decimal expansions
10.11 Tests for convergence
10.12 Comparison tests for series of nonnegative terms
10.13 The integral test
10.14 Exercises
10.15 The root test and the ratio test for series of nonnegative terms
10.16 Exercises
10.17 Alternating series
10.18 Conditional and absolute convergence
10.19 The convergence tests of Dirichlet and Abel
10.20 Exercises
*10.21 Rearrangements of series
10.22 Miscellaneous review exercises
10.23 Improper integrals
10.24 Exercises
11. SEQUENCES AND SERIES OF FUNCTIONS
11.1 Pointwise convergence of sequences of functions
11.2 Uniform convergence of sequences of functions
11.3 Uniform convergence and continuity
11.4 Uniform convergence and integration
11.5 A sufficient condition for uniform convergence
11.6 Power series. Circle of convergence
11.7 Exercises
11.8 Properties of functions represented by real power series
11.9 The Taylor’s series generated by a function
11.10 A sufficient condition for convergence of a Taylor’s series
11.11 Power-series expansions for the exponential and trigonometric functions
*11.12 Bernstein’s theorem
11.13 Exercises
11.14 Power series and differential equations
11.15 The binomial series
11.16 Exercises
12. VECTOR ALGEBRA
12.1 Historical introduction
12.2 The vector space of n-tuples of real numbers
12.3 Geometric interpretation for $n \le 3$
12.4 Exercises
12.5 The dot product
12.6 Length or norm of a vector
12.7 Orthogonality of vectors
12.8 Exercises
12.9 Projections. Angle between vectors in n-space
12.10 The unit coordinate vectors
12.11 Exercises
12.12 The linear span of a finite set of vectors
12.13 Linear independence
12.14 Bases
12.15 Exercises
12.16 The vector space $V_n(\mathbf{C})$ of n-tuples of complex numbers
12.17 Exercises
13. APPLICATIONS OF VECTOR ALGEBRA
TO ANALYTIC GEOMETRY
13.1 Introduction
13.2 Lines in n-space
13.3 Some simple properties of straight lines
13.4 Lines and vector-valued functions
13.5 Exercises
13.6 Planes in Euclidean n-space
13.7 Planes and vector-valued functions
13.8 Exercises
13.9 The cross product
13.10 The cross product expressed as a determinant
13.11 Exercises
13.12 The scalar triple product
13.13 Cramer’s rule for solving a system of three linear equations
13.14 Exercises
13.15 Normal vectors to planes
13.16 Linear Cartesian equations for planes
13.17 Exercises
13.18 The conic sections
13.19 Eccentricity of conic sections
13.20 Polar equations for conic sections
13.21 Exercises
13.22 Conic sections symmetric about the origin
13.23 Cartesian equations for the conic sections
13.24 Exercises
13.25 Miscellaneous exercises on conic sections
14. CALCULUS OF VECTOR-VALUED FUNCTIONS
14.1 Vector-valued functions of a real variable
14.2 Algebraic operations. Components
14.3 Limits, derivatives, and integrals
14.4 Exercises
14.5 Applications to curves. Tangency
14.6 Applications to curvilinear motion. Velocity, speed, and acceleration
14.7 Exercises
14.8 The unit tangent, the principal normal, and the osculating plane of a curve
14.9 Exercises
14.10 The definition of arc length
14.11 Additivity of arc length
14.12 The arc-length function
14.13 Exercises
14.14 Curvature of a curve
14.15 Exercises
14.16 Velocity and acceleration in polar coordinates
14.17 Plane motion with radial acceleration
14.18 Cylindrical coordinates
14.19 Exercises
14.20 Applications to planetary motion
14.21 Miscellaneous review exercises
15. LINEAR SPACES
15.1 Introduction
15.2 The definition of a linear space
15.3 Examples of linear spaces
15.4 Elementary consequences of the axioms
15.5 Exercises
15.6 Subspaces of a linear space
15.7 Dependent and independent sets in a linear space
15.8 Bases and dimension
15.9 Exercises
15.10 Inner products, Euclidean spaces, norms
15.11 Orthogonality in a Euclidean space
15.12 Exercises
15.13 Construction of orthogonal sets. The Gram-Schmidt process
15.14 Orthogonal complements. Projections
15.15 Best approximation of elements in a Euclidean space by elements in a finite-dimensional subspace
15.16 Exercises
16. LINEAR TRANSFORMATIONS AND MATRICES
16.1 Linear transformations
16.2 Null space and range
16.3 Nullity and rank
16.4 Exercises
16.5 Algebraic operations on linear transformations
16.6 Inverses
16.7 One-to-one linear transformations
16.8 Exercises
16.9 Linear transformations with prescribed values
16.10 Matrix representations of linear transformations
16.11 Construction of a matrix representation in diagonal form
16.12 Exercises
16.13 Linear spaces of matrices
16.14 Isomorphism between linear transformations and matrices
16.15 Multiplication of matrices
16.16 Exercises
16.17 Systems of linear equations
16.18 Computation techniques
16.19 Inverses of square matrices
16.20 Exercises
16.21 Miscellaneous exercises on matrices
Answers to exercises
Index
Calculus
INTRODUCTION
Part 1. Historical Introduction
I 1.1 The two basic concepts of calculus
The remarkable progress that has been made in science and technology during the last
century is due in large part to the development of mathematics. That branch of mathematics
known as integral and differential calculus serves as a natural and powerful tool for attacking
a variety of problems that arise in physics, astronomy, engineering, chemistry, geology,
biology, and other fields including, rather recently, some of the social sciences.
To give the reader an idea of the many different types of problems that can be treated by
the methods of calculus, we list here a few sample questions selected from the exercises that
occur in later chapters of this book.
With what speed should a rocket be fired upward so that it never returns to earth? What
is the radius of the smallest circular disk that can cover every isosceles triangle of a given
perimeter L? What volume of material is removed from a solid sphere of radius 2r if a hole
of radius r is drilled through the center? If a strain of bacteria grows at a rate proportional
to the amount present and if the population doubles in one hour, by how much will it
increase at the end of two hours? If a ten-pound force stretches an elastic spring one inch,
how much work is required to stretch the spring one foot?
These examples, chosen from various fields, illustrate some of the technical questions that
can be answered by more or less routine applications of calculus.
Calculus is more than a technical tool; it is a collection of fascinating and exciting ideas
that have interested thinking men for centuries. These ideas have to do with speed, area,
volume, rate of growth, continuity, tangent line, and other concepts from a variety of fields.
Calculus forces us to stop and think carefully about the meanings of these concepts. Another
remarkable feature of the subject is its unifying power. Most of these ideas can be formulated so that they revolve around two rather specialized problems of a geometric nature. We
turn now to a brief description of these problems.
Consider a curve C which lies above a horizontal base line such as that shown in Figure
1.1. We assume this curve has the property that every vertical line intersects it once at most.
The shaded portion of the figure consists of those points which lie below the curve C, above
the horizontal base, and between two parallel vertical segments joining C to the base. The
first fundamental problem of calculus is this: To assign a number which measures the area
of this shaded region.
Consider next a line drawn tangent to the curve, as shown in Figure 1.1. The second
fundamental problem may be stated as follows: To assign a number which measures the
steepness of this line.
FIGURE 1.1
Basically, calculus has to do with the precise formulation and solution of these two
special problems. It enables us to define the concepts of area and tangent line and to calculate the area of a given region or the steepness of a given tangent line. Integral calculus
deals with the problem of area and will be discussed in Chapter 1. Differential calculus deals
with the problem of tangents and will be introduced in Chapter 4.
The study of calculus requires a certain mathematical background. The present chapter
deals with this background material and is divided into four parts: Part 1 provides historical
perspective; Part 2 discusses some notation and terminology from the mathematics of sets;
Part 3 deals with the real-number system; Part 4 treats mathematical induction and the
summation notation. If the reader is acquainted with these topics, he can proceed directly
to the development of integral calculus in Chapter 1. If not, he should become familiar
with the material in the unstarred sections of this Introduction before proceeding to
Chapter 1.
I 1.2 Historical background
The birth of integral calculus occurred more than 2000 years ago when the Greeks
attempted to determine areas by a process which they called the method of exhaustion. The
essential ideas of this method are very simple and can be described briefly as follows: Given
a region whose area is to be determined, we inscribe in it a polygonal region which approximates the given region and whose area we can easily compute. Then we choose another
polygonal region which gives a better approximation, and we continue the process, taking
polygons with more and more sides in an attempt to exhaust the given region. The method
is illustrated for a semicircular region in Figure 1.2. It was used successfully by Archimedes
(287-212 B.C.) to find exact formulas for the area of a circle and a few other special figures.
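The process of inscribing polygons with more and more sides can be imitated numerically. The following Python sketch is a modern illustration only, not Archimedes' procedure: it assumes a full circle of radius 1 rather than the semicircular region of Figure 1.2, it relies on the library sine function (which presupposes the very theory being motivated), and the name `inscribed_polygon_area` is hypothetical. It computes the area of the inscribed regular polygon as the number of sides is repeatedly doubled.

```python
import math


def inscribed_polygon_area(n_sides: int, radius: float = 1.0) -> float:
    """Area of a regular polygon with n_sides vertices on a circle of the given radius."""
    # The polygon splits into n_sides isosceles triangles, each with two sides
    # equal to the radius and apex angle 2*pi/n_sides at the center.
    return 0.5 * n_sides * radius**2 * math.sin(2 * math.pi / n_sides)


# Double the number of sides repeatedly, starting from a hexagon (an assumed choice).
sides = [6 * 2**j for j in range(6)]  # 6, 12, 24, 48, 96, 192
areas = [inscribed_polygon_area(n) for n in sides]

for n, area in zip(sides, areas):
    print(f"{n:4d} sides: area = {area:.6f}, shortfall from pi = {math.pi - area:.2e}")
```

Each polygon lies inside the circle, so every computed area stays below the area of the disk, and the shortfall shrinks as the number of sides grows.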
The development of the method of exhaustion beyond the point to which Archimedes
carried it had to wait nearly eighteen centuries until the use of algebraic symbols and
techniques became a standard part of mathematics. The elementary algebra that is familiar
to most high-school students today was completely unknown in Archimedes’ time, and it
would have been next to impossible to extend his method to any general class of regions
without some convenient way of expressing rather lengthy calculations in a compact and
simplified form.
A slow but revolutionary change in the development of mathematical notations began
in the 16th century A.D. The cumbersome system of Roman numerals was gradually displaced by the Hindu-Arabic characters used today, the symbols + and - were introduced
for the first time, and the advantages of the decimal notation began to be recognized.
During this same period, the brilliant successes of the Italian mathematicians Tartaglia,
Cardano, and Ferrari in finding algebraic solutions of cubic and quartic equations stimulated a great deal of activity in mathematics and encouraged the growth and acceptance of a
new and superior algebraic language. With the widespread introduction of well-chosen
algebraic symbols, interest was revived in the ancient method of exhaustion and a large
number of fragmentary results were discovered in the 16th century by such pioneers as
Cavalieri, Torricelli, Roberval, Fermat, Pascal, and Wallis.
FIGURE 1.2 The method of exhaustion applied to a semicircular region.
Gradually the method of exhaustion was transformed into the subject now called integral
calculus, a new and powerful discipline with a large variety of applications, not only to
geometrical problems concerned with areas and volumes but also to problems in other
sciences. This branch of mathematics, which retained some of the original features of the
method of exhaustion, received its biggest impetus in the 17th century, largely due to the
efforts of Isaac Newton (1642-1727) and Gottfried Leibniz (1646-1716), and its development continued well into the 19th century before the subject was put on a firm mathematical
basis by such men as Augustin-Louis Cauchy (1789-1857) and Bernhard Riemann (1826-1866). Further refinements and extensions of the theory are still being carried out in
contemporary mathematics.
I 1.3 The method of exhaustion for the area of a parabolic segment
Before we proceed to a systematic treatment of integral calculus, it will be instructive
to apply the method of exhaustion directly to one of the special figures treated by Archimedes himself. The region in question is shown in Figure 1.3 and can be described as
follows: If we choose an arbitrary point on the base of this figure and denote its distance
from 0 by $x$, then the vertical distance from this point to the curve is $x^2$. In particular, if
the length of the base itself is $b$, the altitude of the figure is $b^2$. The vertical distance from
$x$ to the curve is called the “ordinate” at $x$. The curve itself is an example of what is known
as a parabola. The region bounded by it and the two line segments is called a parabolic
segment.
FIGURE 1.3 A parabolic segment.
FIGURE 1.4 Approximation from below and approximation from above.
This figure may be enclosed in a rectangle of base $b$ and altitude $b^2$, as shown in Figure 1.3.
Examination of the figure suggests that the area of the parabolic segment is less than half
the area of the rectangle. Archimedes made the surprising discovery that the area of the
parabolic segment is exactly one-third that of the rectangle; that is to say, $A = b^3/3$, where
A denotes the area of the parabolic segment. We shall show presently how to arrive at this
result.
It should be pointed out that the parabolic segment in Figure 1.3 is not shown exactly as
Archimedes drew it and the details that follow are not exactly the same as those used by him.
FIGURE 1.5 Calculation of the area of a parabolic segment.
Nevertheless, the essential ideas are those of Archimedes; what is presented here is the
method of exhaustion in modern notation.
The method is simply this: We slice the figure into a number of strips and obtain two
approximations to the region, one from below and one from above, by using two sets of
rectangles as illustrated in Figure 1.4. (We use rectangles rather than arbitrary polygons to
simplify the computations.) The area of the parabolic segment is larger than the total area
of the inner rectangles but smaller than that of the outer rectangles.
If each strip is further subdivided to obtain a new approximation with a larger number
of strips, the total area of the inner rectangles increases, whereas the total area of the outer
rectangles decreases. Archimedes realized that an approximation to the area within any
desired degree of accuracy could be obtained by simply taking enough strips.
Let us carry out the actual computations that are required in this case. For the sake of
simplicity, we subdivide the base into n equal parts, each of length b/n (see Figure 1.5). The
points of subdivision correspond to the following values of x:
$$0, \quad \frac{b}{n}, \quad \frac{2b}{n}, \quad \frac{3b}{n}, \quad \ldots, \quad \frac{(n-1)b}{n}, \quad \frac{nb}{n} = b.$$
A typical point of subdivision corresponds to $x = kb/n$, where k takes the successive values
$k = 0, 1, 2, 3, \ldots, n$. At each point $kb/n$ we construct the outer rectangle of altitude $(kb/n)^2$
as illustrated in Figure 1.5. The area of this rectangle is the product of its base and altitude
and is equal to
$$\frac{b}{n} \left( \frac{kb}{n} \right)^2 = \frac{b^3}{n^3} k^2.$$
Let us denote by $S_n$ the sum of the areas of all the outer rectangles. Then since the kth
rectangle has area $(b^3/n^3)k^2$, we obtain the formula
(1.1)
$$S_n = \frac{b^3}{n^3}\,(1^2 + 2^2 + 3^2 + \cdots + n^2).$$
In the same way we obtain a formula for the sum $s_n$ of all the inner rectangles:
(1.2)
$$s_n = \frac{b^3}{n^3}\,[1^2 + 2^2 + 3^2 + \cdots + (n-1)^2].$$
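The two sums in (1.1) and (1.2) are easy to evaluate for particular values of n. The following is a minimal sketch, not part of the original text; the function names outer_sum and inner_sum are our own, and exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction


def outer_sum(b, n):
    """S_n from (1.1): (b^3/n^3) * (1^2 + 2^2 + ... + n^2)."""
    return Fraction(b) ** 3 / n ** 3 * sum(k * k for k in range(1, n + 1))


def inner_sum(b, n):
    """s_n from (1.2): (b^3/n^3) * (1^2 + 2^2 + ... + (n-1)^2)."""
    return Fraction(b) ** 3 / n ** 3 * sum(k * k for k in range(1, n))


# With b = 1 the two sums close in on each other as n grows.
for n in (1, 10, 100, 1000):
    print(n, float(inner_sum(1, n)), float(outer_sum(1, n)))
```

For b = 1 the printed inner sums increase and the outer sums decrease as n grows, which is the behavior described above for finer and finer subdivisions.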
This brings us to a very important stage in the calculation. Notice that the factor multiplying $b^3/n^3$ in Equation (1.1) is the sum of the squares of the first n integers:
$$1^2 + 2^2 + \cdots + n^2.$$
[The corresponding factor in Equation (1.2) is similar except that the sum has only n - 1
terms.] For a large value of n, the computation of this sum by direct addition of its terms is
tedious and inconvenient.
Fortunately there is an interesting identity which makes it possible
to evaluate this sum in a simpler way, namely,
(1.3)
$$1^2 + 2^2 + \cdots + n^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}.$$
This identity is valid for every integer $n \ge 1$ and can be proved as follows: Start with the
formula $(k + 1)^3 = k^3 + 3k^2 + 3k + 1$ and rewrite it in the form
$$3k^2 + 3k + 1 = (k + 1)^3 - k^3.$$
Taking $k = 1, 2, \dots, n - 1$, we get $n - 1$ formulas
$$3 \cdot 1^2 + 3 \cdot 1 + 1 = 2^3 - 1^3$$
$$3 \cdot 2^2 + 3 \cdot 2 + 1 = 3^3 - 2^3$$
$$\vdots$$
$$3(n - 1)^2 + 3(n - 1) + 1 = n^3 - (n - 1)^3.$$
When we add these formulas, all the terms on the right cancel
except two and we obtain
$$3[1^2 + 2^2 + \cdots + (n - 1)^2] + 3[1 + 2 + \cdots + (n - 1)] + (n - 1) = n^3 - 1^3.$$
The second sum on the left is the sum of terms in an arithmetic progression and it simplifies
to $\tfrac{1}{2}n(n - 1)$. Therefore this last equation gives us
(1.4)
$$1^2 + 2^2 + \cdots + (n - 1)^2 = \frac{n^3}{3} - \frac{n^2}{2} + \frac{n}{6}.$$
Adding $n^2$ to both members, we obtain (1.3).
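A numerical spot check is no substitute for the proof just given, but it is a useful guard against slips of algebra. The sketch below is our own illustration (the helper names are not from the text); it compares the direct sum with the closed form in (1.3) and confirms that the right-hand sides of the n - 1 formulas telescope to $n^3 - 1^3$.

```python
from fractions import Fraction


def sum_of_squares(n):
    """Direct addition: 1^2 + 2^2 + ... + n^2."""
    return sum(k * k for k in range(1, n + 1))


def closed_form(n):
    """Right-hand member of (1.3): n^3/3 + n^2/2 + n/6, computed exactly."""
    n = Fraction(n)
    return n ** 3 / 3 + n ** 2 / 2 + n / 6


def telescoped(n):
    """Sum of (k+1)^3 - k^3 for k = 1, ..., n-1; every term cancels except two."""
    return sum((k + 1) ** 3 - k ** 3 for k in range(1, n))


for n in range(1, 201):
    assert sum_of_squares(n) == closed_form(n)
    assert telescoped(n) == n ** 3 - 1
```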
For our purposes, we do not need the exact expressions given in the right-hand members
of (1.3) and (1.4). All we need are the two inequalities
(1.5)
$$1^2 + 2^2 + \cdots + (n - 1)^2 < \frac{n^3}{3} < 1^2 + 2^2 + \cdots + n^2,$$
which are valid for every integer $n \ge 1$. These inequalities can be deduced easily as consequences
of (1.3) and (1.4), or they can be proved directly by induction. (A proof by
induction is given in Section I 4.1.)
If we multiply both inequalities in (1.5) by $b^3/n^3$ and make use of (1.1) and (1.2) we obtain
(1.6)
$$s_n < \frac{b^3}{3} < S_n$$
for every n. The inequalities in (1.6) tell us that $b^3/3$ is a number which lies between $s_n$ and
$S_n$ for every n. We will now prove that $b^3/3$ is the only number which has this property. In
other words, we assert that if A is any number which satisfies the inequalities
(1.7)
$$s_n < A < S_n$$
for every positive integer n, then $A = b^3/3$. It is because of this fact that Archimedes
concluded that the area of the parabolic segment is $b^3/3$.
To prove that $A = b^3/3$, we use the inequalities in (1.5) once more. Adding $n^2$ to both
sides of the leftmost inequality in (1.5), we obtain
$$1^2 + 2^2 + \cdots + n^2 < \frac{n^3}{3} + n^2.$$
Multiplying this by $b^3/n^3$ and using (1.1), we find
(1.8)
$$S_n < \frac{b^3}{3} + \frac{b^3}{n}.$$
Similarly, by subtracting $n^2$ from both sides of the rightmost inequality in (1.5) and multiplying by $b^3/n^3$, we are led to the inequality
(1.9)
$$\frac{b^3}{3} - \frac{b^3}{n} < s_n.$$
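Therefore, any number A satisfying (1.7) must also satisfy
(1.10)
$$\frac{b^3}{3} - \frac{b^3}{n} < A < \frac{b^3}{3} + \frac{b^3}{n}$$
for every integer $n \ge 1$. Now there are only three possibilities:
$$A > \frac{b^3}{3}, \qquad A < \frac{b^3}{3}, \qquad A = \frac{b^3}{3}.$$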
If we show that each of the first two leads to a contradiction, then we must have $A = b^3/3$,
since, in the manner of Sherlock Holmes, this exhausts all the possibilities.
Suppose the inequality $A > b^3/3$ were true. From the second inequality in (1.10) we
obtain
(1.11)
$$A - \frac{b^3}{3} < \frac{b^3}{n}$$
for every integer $n \ge 1$. Since $A - b^3/3$ is positive, we may divide both sides of (1.11) by
$A - b^3/3$ and then multiply by n to obtain the equivalent statement
$$n < \frac{b^3}{A - b^3/3}$$
for every n. But this inequality is obviously false when $n \ge b^3/(A - b^3/3)$. Hence the
inequality $A > b^3/3$ leads to a contradiction. By a similar argument, we can show that the
inequality $A < b^3/3$ also leads to a contradiction, and therefore we must have $A = b^3/3$,
as asserted.
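The contradiction argument can be watched in action. In the sketch below (our own illustration; bounds and first_violation are hypothetical helper names), a candidate A slightly larger than $b^3/3$ is tested against $s_n < A < S_n$ for successive n. The argument above guarantees a failure no later than $n = b^3/(A - b^3/3)$; the search simply reports the first n at which it actually occurs.

```python
from fractions import Fraction


def bounds(b, n):
    """Return the pair (s_n, S_n) for the ordinate x^2 on the base from 0 to b."""
    scale = Fraction(b) ** 3 / n ** 3
    squares = sum(k * k for k in range(1, n + 1))
    return scale * (squares - n * n), scale * squares


def first_violation(b, A, limit=1000):
    """Smallest n <= limit for which s_n < A < S_n fails, or None if none is found."""
    for n in range(1, limit + 1):
        s, S = bounds(b, n)
        if not (s < A < S):
            return n
    return None


b = 1
A = Fraction(1, 3) + Fraction(1, 100)      # a candidate that is too large
guaranteed = Fraction(b) ** 3 / (A - Fraction(b) ** 3 / 3)
print(first_violation(b, A), "<=", guaranteed)
```

The gap between the two sums is exactly $S_n - s_n = b^3/n$, which is why no number other than $b^3/3$ can stay between them for every n.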
*I 1.4 Exercises
1. (a) Modify the region in Figure 1.3 by assuming that the ordinate at each x is $2x^2$ instead of
$x^2$. Draw the new figure. Check through the principal steps in the foregoing section and
find what effect this has on the calculation of the area. Do the same if the ordinate at each x is
(b) $3x^2$, (c) $ax^2$, (d) $2x^2 + 1$, (e) $ax^2 + c$.
2. Modify the region in Figure 1.3 by assuming that the ordinate at each x is $x^3$ instead of $x^2$.
Draw the new figure.
(a) Use a construction similar to that illustrated in Figure 1.5 and show that the outer and inner
sums $S_n$ and $s_n$ are given by
$$S_n = \frac{b^4}{n^4}\,(1^3 + 2^3 + \cdots + n^3), \qquad s_n = \frac{b^4}{n^4}\,[1^3 + 2^3 + \cdots + (n - 1)^3].$$
(b) Use the inequalities (which can be proved by mathematical induction; see Section I 4.2)
(1.12)
$$1^3 + 2^3 + \cdots + (n - 1)^3 < \frac{n^4}{4} < 1^3 + 2^3 + \cdots + n^3$$
to show that $s_n < b^4/4 < S_n$ for every n, and prove that $b^4/4$ is the only number which lies
between $s_n$ and $S_n$ for every n.
(c) What number takes the place of $b^4/4$ if the ordinate at each x is $ax^3 + c$?
3. The inequalities (1.5) and (1.12) are special cases of the more general inequalities
(1.13)
$$1^k + 2^k + \cdots + (n - 1)^k < \frac{n^{k+1}}{k + 1} < 1^k + 2^k + \cdots + n^k$$
that are valid for every integer $n \ge 1$ and every integer $k \ge 1$. Assume the validity of (1.13)
and generalize the results of Exercise 2.
I 1.5 A critical analysis of Archimedes’ method
From calculations similar to those in Section I 1.3, Archimedes concluded that the area
of the parabolic segment in question is $b^3/3$. This fact was generally accepted as a mathematical theorem for nearly 2000 years before it was realized that one must re-examine
the result from a more critical point of view. To understand why anyone would question
the validity of Archimedes’ conclusion, it is necessary to know something about the important
changes that have taken place in the recent history of mathematics.
Every branch of knowledge is a collection of ideas described by means of words and
symbols, and one cannot understand these ideas unless one knows the exact meanings of
the words and symbols that are used. Certain branches of knowledge, known as deductive
systems, are different from others in that a number of “undefined” concepts are chosen
in advance and all other concepts in the system are defined in terms of these. Certain
statements about these undefined concepts are taken as axioms or postulates and other
statements that can be deduced from the axioms are called theorems. The most familiar
example of a deductive system is the Euclidean theory of elementary geometry that has
been studied by well-educated men since the time of the ancient Greeks.
The spirit of early Greek mathematics, with its emphasis on the theoretical and postulational approach to geometry as presented in Euclid’s Elements, dominated the thinking
of mathematicians until the time of the Renaissance. A new and vigorous phase in the
development of mathematics began with the advent of algebra in the 16th Century, and
the next 300 years witnessed a flood of important discoveries. Conspicuously absent from
this period was the logically precise reasoning of the deductive method with its use of
axioms, definitions, and theorems. Instead, the pioneers in the 16th, 17th, and 18th centuries resorted to a curious blend of deductive reasoning combined with intuition, pure
guesswork, and mysticism, and it is not surprising to find that some of their work was
later shown to be incorrect. However, a surprisingly large number of important discoveries
emerged from this era, and a great deal of the work has survived the test of history, a
tribute to the unusual skill and ingenuity of these pioneers.
As the flood of new discoveries began to recede, a new and more critical period emerged.
Little by little, mathematicians felt forced to return to the classical ideals of the deductive
method in an attempt to put the new mathematics on a firm foundation. This phase of the
development, which began early in the 19th Century and has continued to the present day,
has resulted in a degree of logical purity and abstraction that has surpassed all the traditions
of Greek science. At the same time, it has brought about a clearer understanding of the
foundations of not only calculus but of all of mathematics.
There are many ways to develop calculus as a deductive system. One possible approach
is to take the real numbers as the undefined objects. Some of the rules governing the
operations on real numbers may then be taken as axioms. One such set of axioms is listed
in Part 3 of this Introduction. New concepts, such as integral, limit, continuity, derivative,
must then be defined in terms of real numbers. Properties of these concepts are then
deduced as theorems that follow from the axioms.
Looked at as part of the deductive system of calculus, Archimedes’ result about the area
of a parabolic segment cannot be accepted as a theorem until a satisfactory definition of
area is given first. It is not clear whether Archimedes had ever formulated a precise definition of what he meant by area. He seems to have taken it for granted that every region has an
area associated with it. On this assumption he then set out to calculate areas of particular
regions. In his calculations he made use of certain facts about area that cannot be proved
until we know what is meant by area. For instance, he assumed that if one region lies inside
another, the area of the smaller region cannot exceed that of the larger region. Also, if a
region is decomposed into two or more parts, the sum of the areas of the individual parts is
equal to the area of the whole region. All these are properties we would like area to possess,
and we shall insist that any definition of area should imply these properties. It is quite
possible that Archimedes himself may have taken area to be an undefined concept and then
used the properties we just mentioned as axioms about area.
Today we consider the work of Archimedes as being important not so much because it
helps us to compute areas of particular figures, but rather because it suggests a reasonable
way to define the concept of area for more or less arbitrary figures. As it turns out, the
method of Archimedes suggests a way to define a much more general concept known as the
integral. The integral, in turn, is used to compute not only area but also quantities such as
arc length, volume, work and others.
If we look ahead and make use of the terminology of integral calculus, the result of the
calculation carried out in Section I 1.3 for the parabolic segment is often stated as follows:
“The integral of $x^2$ from 0 to b is $b^3/3$.”
It is written symbolically as
$$\int_0^b x^2\,dx = \frac{b^3}{3}.$$
The symbol $\int$ (an elongated S) is called an integral sign, and it was introduced by Leibniz
in 1675. The process which produces the number $b^3/3$ is called integration. The numbers
0 and b which are attached to the integral sign are referred to as the limits of integration.
The symbol $\int_0^b x^2\,dx$ must be regarded as a whole. Its definition will treat it as such, just
as the dictionary describes the word “lapidate” without reference to “lap,” “id,” or “ate.”
Leibniz’ symbol for the integral was readily accepted by many early mathematicians
because they liked to think of integration as a kind of “summation process” which enabled
them to add together infinitely many “infinitesimally small quantities.” For example, the
area of the parabolic segment was conceived of as a sum of infinitely many infinitesimally
thin rectangles of height $x^2$ and base $dx$. The integral sign represented the process of adding
the areas of all these thin rectangles. This kind of thinking is suggestive and often very
helpful, but it is not easy to assign a precise meaning to the idea of an “infinitesimally small
quantity.” Today the integral is defined in terms of the notion of real number without
using ideas like “infinitesimals.” This definition is given in Chapter 1.
I 1.6 The approach to calculus to be used in this book
A thorough and complete treatment of either integral or differential calculus depends
ultimately on a careful study of the real number system. This study in itself, when carried
out in full, is an interesting but somewhat lengthy program that requires a small volume
for its complete exposition. The approach in this book is to begin with the real numbers
as undefined objects and simply to list a number of fundamental properties of real numbers
which we shall take as axioms. These axioms and some of the simplest theorems that can
be deduced from them are discussed in Part 3 of this chapter.
Most of the properties of real numbers discussed here are probably familiar to the reader
from his study of elementary algebra. However, there are a few properties of real numbers
that do not ordinarily come into consideration in elementary algebra but which play an
important role in the calculus. These properties stem from the so-called least-upper-bound
axiom (also known as the completeness or continuity axiom) which is dealt with here in some
detail. The reader may wish to study Part 3 before proceeding with the main body of the
text, or he may postpone reading this material until later when he reaches those parts of the
theory that make use of least-upper-bound properties. Material in the text that depends on
the least-upper-bound axiom will be clearly indicated.
To develop calculus as a complete, formal mathematical theory, it would be necessary
to state, in addition to the axioms for the real number system, a list of the various “methods
of proof” which would be permitted for the purpose of deducing theorems from the axioms.
Every statement in the theory would then have to be justified either as an “established law”
(that is, an axiom, a definition, or a previously proved theorem) or as the result of applying
one of the acceptable methods of proof to an established law. A program of this sort would
be extremely long and tedious and would add very little to a beginner’s understanding of
the subject. Fortunately, it is not necessary to proceed in this fashion in order to get a good
understanding and a good working knowledge of calculus. In this book the subject is
introduced in an informal way, and ample use is made of geometric intuition whenever it is
convenient to do so. At the same time, the discussion proceeds in a manner that is consistent with modern standards of precision and clarity of thought. All the important
theorems of the subject are explicitly stated and rigorously proved.
To avoid interrupting the principal flow of ideas, some of the proofs appear in separate
starred sections. For the same reason, some of the chapters are accompanied by supplementary material in which certain important topics related to calculus are dealt with in
detail. Some of these are also starred to indicate that they may be omitted or postponed
without disrupting the continuity of the presentation. The extent to which the starred
sections are taken up or not will depend partly on the reader’s background and skill and
partly on the depth of his interests. A person who is interested primarily in the basic
techniques may skip the starred sections. Those who wish a more thorough course in
calculus, including theory as well as technique, should read some of the starred sections.
Part 2. Some Basic Concepts of the Theory of Sets
I 2.1 Introduction to set theory
In discussing any branch of mathematics, be it analysis, algebra, or geometry, it is helpful
to use the notation and terminology of set theory. This subject, which was developed by
Boole and Cantor† in the latter part of the 19th Century, has had a profound influence on the
development of mathematics in the 20th Century. It has unified many seemingly disconnected ideas and has helped to reduce many mathematical concepts to their logical foundations in an elegant and systematic way. A thorough treatment of the theory of sets would
require a lengthy discussion which we regard as outside the scope of this book. Fortunately,
the basic notions are few in number, and it is possible to develop a working knowledge of the
methods and ideas of set theory through an informal discussion. Actually, we shall discuss
not so much a new theory as an agreement about the precise terminology that we wish to
apply to more or less familiar ideas.
In mathematics, the word “set” is used to represent a collection of objects viewed as a
single entity. The collections called to mind by such nouns as “flock,” “tribe,” “crowd,”
“team,” and “electorate” are all examples of sets. The individual objects in the collection
are called elements or members of the set, and they are said to belong to or to be contained in
the set. The set, in turn, is said to contain or be composed of its elements.
† George Boole (1815-1864) was an English mathematician and logician. His book, An Investigation of the
Laws of Thought, published in 1854, marked the creation of the first workable system of symbolic logic.
Georg F. L. P. Cantor (1845-1918) and his school created the modern theory of sets during the period
1874-1895.
We shall be interested primarily in sets of mathematical objects: sets of numbers, sets of
curves, sets of geometric figures, and so on. In many applications it is convenient to deal
with sets in which nothing special is assumed about the nature of the individual objects in
the collection. These are called abstract sets. Abstract set theory has been developed to deal
with such collections of arbitrary objects, and from this generality the theory derives its power.
I 2.2 Notations for designating sets
Sets usually are denoted by capital letters: A, B, C, ..., X, Y, Z; elements are designated
by lower-case letters: a, b, c, ..., x, y, z. We use the special notation
$$x \in S$$
to mean that “x is an element of S” or “x belongs to S.” If x does not belong to S, we write
$x \notin S$. When convenient, we shall designate sets by displaying the elements in braces; for
example, the set of positive even integers less than 10 is denoted by the symbol $\{2, 4, 6, 8\}$
whereas the set of all positive even integers is displayed as $\{2, 4, 6, \dots\}$, the three dots
taking the place of “and so on.” The dots are used only when the meaning of “and so on”
is clear. The method of listing the members of a set within braces is sometimes referred to as
the roster notation.
The first basic concept that relates one set to another is equality of sets:
DEFINITION OF SET EQUALITY. Two sets A and B are said to be equal (or identical) if
they consist of exactly the same elements, in which case we write $A = B$. If one of the sets
contains an element not in the other, we say the sets are unequal and we write $A \ne B$.
EXAMPLE 1. According to this definition, the two sets $\{2, 4, 6, 8\}$ and $\{2, 8, 6, 4\}$ are
equal since they both consist of the four integers 2, 4, 6, and 8. Thus, when we use the roster
notation to describe a set, the order in which the elements appear is irrelevant.
EXAMPLE 2. The sets $\{2, 4, 6, 8\}$ and $\{2, 2, 4, 4, 6, 8\}$ are equal even though, in the second
set, each of the elements 2 and 4 is listed twice. Both sets contain the four elements 2, 4, 6, 8
and no others; therefore, the definition requires that we call these sets equal. This example
shows that we do not insist that the objects listed in the roster notation be distinct. A similar
example is the set of letters in the word Mississippi, which is equal to the set $\{M, i, s, p\}$,
consisting of the four distinct letters M, i, s, and p.
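Both examples can be reproduced with a programming language that has a built-in set type. The sketch below uses Python, which is our choice and not something the text assumes; its sets ignore both the order of listing and repeated entries, in agreement with the definition of set equality.

```python
# Example 1: the order in which the elements are listed is irrelevant.
a = {2, 4, 6, 8}
b = {2, 8, 6, 4}

# Example 2: listing an element twice does not change the set.
c = {2, 2, 4, 4, 6, 8}

# The set of letters in the word Mississippi.
letters = set("Mississippi")

assert a == b
assert a == c and len(c) == 4
assert letters == {"M", "i", "s", "p"}
```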
I 2.3 Subsets
From a given set S we may form new sets, called subsets of S. For example, the set
consisting of those positive integers less than 10 which are divisible by 4 (the set $\{4, 8\}$) is a
subset of the set of all even integers less than 10. In general, we have the following definition.
DEFINITION OF A SUBSET. A set A is said to be a subset of a set B, and we write
$$A \subseteq B,$$
whenever every element of A also belongs to B. We also say that A is contained in B or that B
contains A. The relation $\subseteq$ is referred to as set inclusion.
The statement $A \subseteq B$ does not rule out the possibility that $B \subseteq A$. In fact, we may have
both $A \subseteq B$ and $B \subseteq A$, but this happens only if A and B have the same elements. In
other words,
$$A = B \quad \text{if and only if} \quad A \subseteq B \text{ and } B \subseteq A.$$
This theorem is an immediate consequence of the foregoing definitions of equality and
inclusion. If $A \subseteq B$ but $A \ne B$, then we say that A is a proper subset of B; we indicate this
by writing $A \subset B$.
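Inclusion and proper inclusion have direct counterparts in Python, where `<=` tests for a subset and `<` for a proper subset. In the sketch below, which is our own illustration, the larger set is restricted to the positive even integers less than 10 so that it is finite, and equal_by_inclusion is a hypothetical helper that restates the theorem above.

```python
positive_evens_below_10 = {2, 4, 6, 8}   # finite stand-in for the larger set
divisible_by_4 = {4, 8}


def equal_by_inclusion(A, B):
    """A = B if and only if A is a subset of B and B is a subset of A."""
    return A <= B and B <= A


assert divisible_by_4 <= positive_evens_below_10    # subset
assert divisible_by_4 < positive_evens_below_10     # proper subset
assert positive_evens_below_10 <= positive_evens_below_10        # every set includes itself
assert not (positive_evens_below_10 < positive_evens_below_10)   # but never properly
assert equal_by_inclusion({1, 2}, {2, 1})
```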
In all our applications of set theory, we have a fixed set S given in advance, and we are
concerned only with subsets of this given set. The underlying set S may vary from one
application to another; it will be referred to as the universal set of each particular discourse.
The notation
$$\{x \mid x \in S \text{ and } x \text{ satisfies } P\}$$
will designate the set of all elements x in S which satisfy the property P. When the universal
set to which we are referring is understood, we omit the reference to S and write simply
$\{x \mid x \text{ satisfies } P\}$. This is read “the set of all x such that x satisfies P.” Sets designated in
this way are said to be described by a defining property. For example, the set of all positive
real numbers could be designated as $\{x \mid x > 0\}$; the universal set S in this case is understood
to be the set of all real numbers. Similarly, the set of all even positive integers $\{2, 4, 6, \dots\}$
can be designated as $\{x \mid x \text{ is a positive even integer}\}$. Of course, the letter x is a dummy and
may be replaced by any other convenient symbol. Thus, we may write
$$\{x \mid x > 0\} = \{y \mid y > 0\} = \{t \mid t > 0\}$$
and so on.
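The defining-property notation corresponds closely to a set comprehension in Python. The sketch below is only an illustration: the universal set S is assumed to be the integers from -10 to 10, a finite stand-in chosen so that the sets can actually be listed.

```python
S = range(-10, 11)   # assumed finite universal set for this illustration

positives = {x for x in S if x > 0}                      # {x | x in S and x > 0}
same_set = {y for y in S if y > 0}                       # the letter is a dummy
positive_evens = {t for t in S if t > 0 and t % 2 == 0}

assert positives == same_set
assert positive_evens == {2, 4, 6, 8, 10}
```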
It is possible for a set to contain no elements whatever. This set is called the empty set
or the void set, and will be denoted by the symbol $\emptyset$. We will consider $\emptyset$ to be a subset of
every set. Some people find it helpful to think of a set as analogous to a container (such as a
bag or a box) containing certain objects, its elements. The empty set is then analogous to an
empty container.
To avoid logical difficulties, we must distinguish between the element x and the set $\{x\}$
whose only element is x. (A box with a hat in it is conceptually distinct from the hat itself.)
In particular, the empty set $\emptyset$ is not the same as the set $\{\emptyset\}$.
In fact, the empty set $\emptyset$ contains
no elements, whereas the set $\{\emptyset\}$ has one element, $\emptyset$. (A box which contains an empty box
is not empty.) Sets consisting of exactly one element are sometimes called one-element
sets.
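The distinction between $\emptyset$ and $\{\emptyset\}$ can be checked mechanically. The sketch below uses Python's frozenset, an immutable set that may itself be an element of another set; this is our choice of illustration, and the variable names are hypothetical.

```python
empty = frozenset()                        # the empty set
box_with_empty_box = frozenset({empty})    # the set whose only element is the empty set

assert len(empty) == 0
assert len(box_with_empty_box) == 1
assert empty != box_with_empty_box
assert empty in box_with_empty_box         # the empty set is an element of it
assert empty <= box_with_empty_box         # and, like every set, it has the empty set as a subset
assert empty <= frozenset({1, 2, 3})
```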
Diagrams often help us visualize relations between sets. For example, we may think of a
set S as a region in the plane and each of its elements as a point. Subsets of S may then be
thought of as collections of points within S. For example, in Figure 1.6(b) the shaded portion
is a subset of A and also a subset of B. Visual aids of this type, called Venn diagrams, are
useful for testing the validity of theorems in set theory or for suggesting methods to prove
them. Of course, the proofs themselves must rely only on the definitions of the concepts and
not on the diagrams.
I 2.4 Unions, intersections, complements
From two given sets A and B, we can form a new set called the union of A and B. This
new set is denoted by the symbol
$A \cup B$ (read: “A union B”)
and is defined as the set of those elements which are in A, in B, or in both. That is to say,
$A \cup B$ is the set of all elements which belong to at least one of the sets A, B. An example is
illustrated in Figure 1.6(a), where the shaded portion represents $A \cup B$.
FIGURE 1.6 Unions and intersections: (a) $A \cup B$; (b) $A \cap B$; (c) $A \cap B = \emptyset$.
Similarly, the intersection of A and B, denoted by
$A \cap B$ (read: “A intersection B”),
is defined as the set of those elements common to both A and B. This is illustrated by the
shaded portion of Figure 1.6(b). In Figure 1.6(c), the two sets A and B have no elements in
common; in this case, their intersection is the empty set $\emptyset$. Two sets A and B are said to be
disjoint if $A \cap B = \emptyset$.
If A and B are sets, the difference $A - B$ (also called the complement of B relative to A) is
defined to be the set of all elements of A which are not in B. Thus, by definition,
$$A - B = \{x \mid x \in A \text{ and } x \notin B\}.$$
In Figure 1.6(b) the unshaded portion of A represents $A - B$; the unshaded portion of B
represents $B - A$.
The operations of union and intersection have many formal similarities to (as well as
differences from) ordinary addition and multiplication of real numbers. For example,
since there is no question of order involved in the definitions of union and intersection, it
follows that $A \cup B = B \cup A$ and that $A \cap B = B \cap A$. That is to say, union and intersection are commutative operations. The definitions are also phrased in such a way that the
operations are associative:
$$(A \cup B) \cup C = A \cup (B \cup C) \quad \text{and} \quad (A \cap B) \cap C = A \cap (B \cap C).$$
These and other theorems related to the “algebra of sets” are listed as Exercises in Section
I 2.5. One of the best ways for the reader to become familiar with the terminology and
notations introduced above is to carry out the proofs of each of these laws. A sample of the
type of argument that is needed appears immediately after the Exercises.
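Before carrying out the proofs, it can help to see the operations at work on small concrete sets. In the sketch below the three sets are arbitrary choices of ours; Python writes union as `|`, intersection as `&`, and difference as `-`. Agreement on one example is of course not a proof of the laws.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6}

assert A | B == {1, 2, 3, 4, 5}      # union
assert A & B == {3, 4}               # intersection
assert A - B == {1, 2}               # elements of A which are not in B
assert B - A == {5}

# Commutative and associative laws, checked on this one example only.
assert A | B == B | A and A & B == B & A
assert (A | B) | C == A | (B | C)
assert (A & B) & C == A & (B & C)

# Disjoint sets have an empty intersection.
assert A.isdisjoint({7, 8}) and A & {7, 8} == set()
```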
The operations of union and intersection can be extended to finite or infinite collections
of sets as follows: Let $\mathcal{F}$ be a nonempty class† of sets. The union of all the sets in $\mathcal{F}$ is
defined as the set of those elements which belong to at least one of the sets in $\mathcal{F}$ and is
denoted by the symbol
$$\bigcup_{A \in \mathcal{F}} A.$$
If $\mathcal{F}$ is a finite collection of sets, say $\mathcal{F} = \{A_1, A_2, \dots, A_n\}$, we write
$$\bigcup_{A \in \mathcal{F}} A = \bigcup_{k=1}^{n} A_k = A_1 \cup A_2 \cup \cdots \cup A_n.$$
Similarly, the intersection of all the sets in $\mathcal{F}$ is defined to be the set of those elements
which belong to every one of the sets in $\mathcal{F}$; it is denoted by the symbol
$$\bigcap_{A \in \mathcal{F}} A.$$
For finite collections (as above), we write
$$\bigcap_{A \in \mathcal{F}} A = \bigcap_{k=1}^{n} A_k = A_1 \cap A_2 \cap \cdots \cap A_n.$$
Unions and intersections have been defined in such a way that the associative laws for
these operations are automatically satisfied. Hence, there is no ambiguity when we write
$A_1 \cup A_2 \cup \cdots \cup A_n$ or $A_1 \cap A_2 \cap \cdots \cap A_n$.
† To help simplify the language, we call a collection of sets a class. Capital script letters $\mathcal{A}, \mathcal{B}, \mathcal{C}, \dots$ are
used to denote classes. The usual terminology and notation of set theory applies, of course, to classes. Thus,
for example, $A \in \mathcal{F}$ means that A is one of the sets in the class $\mathcal{F}$, and $\mathcal{A} \subseteq \mathcal{B}$ means that every set in $\mathcal{A}$
is also in $\mathcal{B}$, and so forth.
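For a finite class the union and intersection can be computed either all at once or by combining the sets two at a time; the associative laws are what make the two routes agree. The sketch below is our own illustration, with the class represented as a Python list named F holding three arbitrarily chosen sets.

```python
from functools import reduce

# A finite, nonempty class of sets (arbitrary example values).
F = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

# Elements belonging to at least one set of the class.
union_all = set().union(*F)

# Elements belonging to every set of the class (the class must be nonempty).
intersection_all = set.intersection(*F)

# Combining two sets at a time gives the same results.
assert reduce(lambda X, Y: X | Y, F) == union_all
assert reduce(lambda X, Y: X & Y, F) == intersection_all
```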
I 2.5 Exercises
1. Use the roster notation to designate the following sets of real numbers.
$A = \{x \mid x^2 - 1 = 0\}$.
$B = \{x \mid (x - 1)^2 = 0\}$.
$C = \{x \mid x + 8 = 9\}$.
$D = \{x \mid x^3 - 2x^2 + x = 2\}$.
$E = \{x \mid (x + 8)^2 = 9^2\}$.
$F = \{x \mid (x^2 + 16x)^2 = 17^2\}$.
2. For the sets in Exercise 1, note that $B \subseteq A$. List all the inclusion relations $\subseteq$ that hold among
the sets A, B, C, D, E, F.
3. Let $A = \{1\}$, $B = \{1, 2\}$. Discuss the validity of the following statements (prove the ones that
are true and explain why the others are not true).
(a) $A \subset B$.
(b) $A \subseteq B$.
(c) $A \in B$.
(d) $1 \in A$.
(e) $1 \subseteq A$.
(f) $1 \subset B$.
4. Solve Exercise 3 if $A = \{1\}$ and $B = \{\{1\}, 1\}$.
5. Given the set $S = \{1, 2, 3, 4\}$. Display all subsets of S. There are 16 altogether, counting
$\emptyset$ and S.
6. Given the following four sets
$A = \{1, 2\}$, $B = \{\{1\}, \{2\}\}$, $C = \{\{1\}, \{1, 2\}\}$, $D = \{\{1\}, \{2\}, \{1, 2\}\}$,
discuss the validity of the following statements (prove the ones that are true and explain why
the others are not true).
(a) $A = B$.
(b) $A \subseteq B$.
(c) $A \subset C$.
(d) $A \in C$.
(e) $A \subset D$.
(f) $B \subset C$.
(g) $B \subset D$.
(h) $B \in D$.
(i) $A \in D$.
7. Prove the following properties of set equality.
(a) $\{a, a\} = \{a\}$.
(b) $\{a, b\} = \{b, a\}$.
(c) $\{a\} = \{b, c\}$ if and only if $a = b = c$.
Prove the set relations in Exercises 8 through 19. (Sample proofs are given at the end of this
section.)
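8. Commutative laws: $A \cup B = B \cup A$, $A \cap B = B \cap A$.
9. Associative laws: $A \cup (B \cup C) = (A \cup B) \cup C$, $A \cap (B \cap C) = (A \cap B) \cap C$.
10. Distributive laws: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.
11. $A \cup A = A$, $A \cap A = A$.
12. $A \subseteq A \cup B$, $A \cap B \subseteq A$.
13. $A \cup \emptyset = A$, $A \cap \emptyset = \emptyset$.
14. $A \cup (A \cap B) = A$, $A \cap (A \cup B) = A$.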
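15. If $A \subseteq C$ and $B \subseteq C$, then $A \cup B \subseteq C$.
16. If $C \subseteq A$ and $C \subseteq B$, then $C \subseteq A \cap B$.
17. (a) If $A \subset B$ and $B \subset C$, prove that $A \subset C$.
(b) If $A \subseteq B$ and $B \subseteq C$, prove that $A \subseteq C$.
(c) What can you conclude if $A \subset B$ and $B \subseteq C$?
(d) If $x \in A$ and $A \subseteq B$, is it necessarily true that $x \in B$?
(e) If $x \in A$ and $A \in B$, is it necessarily true that $x \in B$?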
18. $A - (B \cap C) = (A - B) \cup (A - C)$.
19. Let $\mathcal{F}$ be a class of sets. Then
$$B - \bigcup_{A \in \mathcal{F}} A = \bigcap_{A \in \mathcal{F}} (B - A) \quad \text{and} \quad B - \bigcap_{A \in \mathcal{F}} A = \bigcup_{A \in \mathcal{F}} (B - A).$$
20. (a) Prove that one of the following two formulas is always right and the other one is sometimes
wrong:
(i) $A - (B - C) = (A - B) \cup C$,
(ii) $A - (B \cup C) = (A - B) - C$.
(b) State an additional necessary and sufficient condition for the formula which is sometimes
incorrect to be always right.
Proof of the commutative law $A \cup B = B \cup A$. Let $X = A \cup B$, $Y = B \cup A$. To
prove that $X = Y$ we prove that $X \subseteq Y$ and $Y \subseteq X$. Suppose that $x \in X$. Then x is
in at least one of A or B. Hence, x is in at least one of B or A; so $x \in Y$. Thus, every
element of X is also in Y, so $X \subseteq Y$. Similarly, we find that $Y \subseteq X$, so $X = Y$.
Proof of $A \cap B \subseteq A$. If $x \in A \cap B$, then x is in both A and B. In particular, $x \in A$.
Thus, every element of $A \cap B$ is also in A; therefore, $A \cap B \subseteq A$.
Part 3. A Set of Axioms for the Real-Number System
I 3.1 Introduction
There are many ways to introduce the real-number system. One popular method is to
begin with the positive integers 1, 2, 3, ... and use them as building blocks to construct a
more comprehensive system having the properties desired. Briefly, the idea of this method
is to take the positive integers as undefined concepts, state some axioms concerning
them, and then use the positive integers to build a larger system consisting of the positive
rational numbers (quotients of positive integers). The positive rational numbers, in turn,
may then be used as a basis for constructing the positive irrational numbers (real numbers
like $\sqrt{2}$ and $\pi$ that are not rational). The final step is the introduction of the negative real
numbers and zero. The most difficult part of the whole process is the transition from the
rational numbers to the irrational numbers.
Although the need for irrational numbers was apparent to the ancient Greeks from
their study of geometry, satisfactory methods for constructing irrational numbers from
rational numbers were not introduced until late in the 19th Century. At that time, three
different theories were outlined by Karl Weierstrass (1815-1897), Georg Cantor (1845-1918), and Richard Dedekind (1831-1916). In 1889, the Italian mathematician Giuseppe
Peano (1858-1932) listed five axioms for the positive integers that could be used as the
starting point of the whole construction. A detailed account of this construction, beginning
with the Peano postulates and using the method of Dedekind to introduce irrational
numbers, may be found in a book by E. Landau, Foundations of Analysis (New York,
Chelsea Publishing Co., 1951).
The point of view we shall adopt here is nonconstructive. We shall start rather far out
in the process, taking the real numbers themselves as undefined objects satisfying a number
of properties that we use as axioms. That is to say, we shall assume there exists a set R of
objects, called real numbers, which satisfy the 10 axioms listed in the next few sections. All
the properties of real numbers can be deduced from the axioms in the list. When the real
numbers are defined by a constructive process, the properties we list as axioms must be
proved as theorems.
In the axioms that appear below, lower-case letters a, b, c, ..., x, y, z represent arbitrary
real numbers unless something is said to the contrary. The axioms fall in a natural way into
three groups which we refer to as the field axioms, the order axioms, and the least-upper-bound axiom (also called the axiom of continuity or the completeness axiom).
I 3.2 The field axioms
Along with the set R of real numbers we assume the existence of two operations called
addition and multiplication, such that for every pair of real numbers x and y we can form the
sum of x and y, which is another real number denoted by $x + y$, and the product of x and y,
denoted by $xy$ or by $x \cdot y$. It is assumed that the sum $x + y$ and the product $xy$ are uniquely
determined by x and y. In other words, given x and y, there is exactly one real number
$x + y$ and exactly one real number $xy$. We attach no special meanings to the symbols
$+$ and $\cdot$ other than those contained in the axioms.
AXIOM 1. COMMUTATIVE LAWS. x + y = y + x, xy = yx.
AXIOM 2. ASSOCIATIVE LAWS. x + (y + z) = (x + y) + z, x(yz) = (xy)z.
AXIOM 3. DISTRIBUTIVE LAW. x(y + z) = xy + xz.
AXIOM 4. EXISTENCE OF IDENTITY ELEMENTS. There exist two distinct real numbers, which
we denote by 0 and 1, such that for every real x we have x + 0 = x and 1 · x = x.
AXIOM 5. EXISTENCE OF NEGATIVES. For every real number x there is a real number y
such that x + y = 0.
AXIOM 6. EXISTENCE OF RECIPROCALS. For every real number x ≠ 0 there is a real
number y such that xy = 1.
Note: The numbers 0 and 1 in Axioms 5 and 6 are those of Axiom 4.
From the above axioms we can deduce all the usual laws of elementary algebra. The
most important of these laws are collected here as a list of theorems. In all these theorems
the symbols a, b, c, d represent arbitrary real numbers.
THEOREM 1.1. CANCELLATION LAW FOR ADDITION. If a + b = a + c, then b = c. (In
particular, this shows that the number 0 of Axiom 4 is unique.)
THEOREM 1.2. POSSIBILITY OF SUBTRACTION. Given a and b, there is exactly one x such
that a + x = b. This x is denoted by b - a. In particular, 0 - a is written simply -a and
is called the negative of a.
THEOREM 1.3. b - a = b + (-a).
THEOREM 1.4. -(-a) = a.
THEOREM 1.5. a(b - c) = ab - ac.
THEOREM 1.6. 0 · a = a · 0 = 0.
THEOREM 1.7. CANCELLATION LAW FOR MULTIPLICATION. If ab = ac and a ≠ 0, then
b = c. (In particular, this shows that the number 1 of Axiom 4 is unique.)
THEOREM 1.8. POSSIBILITY OF DIVISION. Given a and b with a ≠ 0, there is exactly one x
such that ax = b. This x is denoted by b/a (also written as a built-up fraction with b over a)
and is called the quotient of b and a. In particular, 1/a is also written a^(-1) and is called
the reciprocal of a.
THEOREM 1.9. If a ≠ 0, then b/a = b · a^(-1).
THEOREM 1.10. If a ≠ 0, then (a^(-1))^(-1) = a.
THEOREM 1.11. If ab = 0, then a = 0 or b = 0.
THEOREM 1.12. (-a)b = -(ab) and (-a)(-b) = ab.
THEOREM 1.13. (a/b) + (c/d) = (ad + bc)/(bd) if b ≠ 0 and d ≠ 0.
THEOREM 1.14. (a/b)(c/d) = (ac)/(bd) if b ≠ 0 and d ≠ 0.
THEOREM 1.15. (a/b)/(c/d) = (ad)/(bc) if b ≠ 0, c ≠ 0, and d ≠ 0.
To illustrate how these statements may be obtained as consequences of the axioms, we
shall present proofs of Theorems 1.1 through 1.4. Those readers who are interested may
find it instructive to carry out proofs of the remaining theorems.
Proof of 1.1. Given a + b = a + c. By Axiom 5, there is a number y such that y + a = 0.
Since sums are uniquely determined, we have y + (a + b) = y + (a + c). Using the
associative law, we obtain (y + a) + b = (y + a) + c or 0 + b = 0 + c. But by Axiom 4
we have 0 + b = b and 0 + c = c, so that b = c. Notice that this theorem shows that there
is only one real number having the property of 0 in Axiom 4. In fact, if 0 and 0' both have
this property, then 0 + 0' = 0 and 0 + 0 = 0. Hence 0 + 0' = 0 + 0 and, by the cancellation law, 0 = 0'.
Proof of 1.2. Given a and b, choose y so that a + y = 0 and let x = y + b. Then
a + x = a + (y + b) = (a + y) + b = 0 + b = b. Therefore there is at least one x
such that a + x = b. But by Theorem 1.1 there is at most one such x. Hence there is
exactly one.
Proof of 1.3. Let x = b - a and let y = b + (-a). We wish to prove that x = y.
Now x + a = b (by the definition of b - a) and
y + a = [b + (-a)] + a = b + [(-a) + a] = b + 0 = b.
Therefore x + a = y + a and hence, by Theorem 1.1, x = y.
Proof of 1.4. We have a + (-a) = 0 by the definition of -a. But this equation tells us
that a is the negative of -a. That is, a = -(-a), as asserted.
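The rational numbers satisfy the field axioms, so identities such as Theorems 1.13 through 1.15 can be spot-checked with exact rational arithmetic. The following is only an informal sanity check on a few sample values, not a proof; the helper name check_fraction_laws and the sample values are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction
from itertools import product


def check_fraction_laws(samples):
    """Spot-check Theorems 1.13-1.15 on nonzero sample values (exact arithmetic)."""
    for a, b, c, d in product(samples, repeat=4):
        # Theorem 1.13: (a/b) + (c/d) = (ad + bc)/(bd), with b != 0 and d != 0
        assert a / b + c / d == (a * d + b * c) / (b * d)
        # Theorem 1.14: (a/b)(c/d) = (ac)/(bd)
        assert (a / b) * (c / d) == (a * c) / (b * d)
        # Theorem 1.15: (a/b)/(c/d) = (ad)/(bc), additionally with c != 0
        assert (a / b) / (c / d) == (a * d) / (b * c)
    return True


# All samples are nonzero, so every quotient above is defined.
samples = [Fraction(-3), Fraction(1, 2), Fraction(5, 7), Fraction(4)]
print(check_fraction_laws(samples))
```

A passing check says nothing about values outside the sample; the theorems themselves must still be derived from Axioms 1 through 6.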
*I 3.3 Exercises
1. Prove Theorems 1.5 through 1.15, using Axioms 1 through 6 and Theorems 1.1 through 1.4.
In Exercises 2 through 10, prove the given statements or establish the given equations. You
may use Axioms 1 through 6 and Theorems 1.1 through 1.15.
2. -0 = 0.
3. 1^(-1) = 1.
4. Zero has no reciprocal.
5. -(a + b) = -a - b.
6. -(a - b) = -a + b.
7. (a - b) + (b - c) = a - c.
8. If a ≠ 0 and b ≠ 0, then (ab)^(-1) = a^(-1) b^(-1).
9. -(a/b) = (-a)/b = a/(-b) if b ≠ 0.
10. (a/b) - (c/d) = (ad - bc)/(bd) if b ≠ 0 and d ≠ 0.
I 3.4 The order axioms
This group of axioms has to do with a concept which establishes an ordering among the
real numbers. This ordering enables us to make statements about one real number being
larger or smaller than another. We choose to introduce the order properties as a set of
axioms about a new undefined concept called positiveness and then to define terms like
less than and greater than in terms of positiveness.
We shall assume that there exists a certain subset R+ ⊂ R, called the set of positive
numbers, which satisfies the following three order axioms:
AXIOM 7. If x and y are in R+, so are x + y and xy.
AXIOM 8. For every real x ≠ 0, either x ∈ R+ or -x ∈ R+, but not both.
AXIOM 9. 0 ∉ R+.
Now we can define the symbols <, >, ≤, and ≥, called, respectively, less than, greater
than, less than or equal to, and greater than or equal to, as follows:
x < y means that y - x is positive;
y > x means that x < y;
x ≤ y means that either x < y or x = y;
y ≥ x means that x ≤ y.
Thus, we have x > 0 if and only if x is positive. If x < 0, we say that x is negative; if
x ≥ 0, we say that x is nonnegative. A pair of simultaneous inequalities such as x < y,
y < z is usually written more briefly as x < y < z; similar interpretations are given to the
compound inequalities x ≤ y < z, x < y ≤ z, and x ≤ y ≤ z.
From the order axioms we can derive all the usual rules for calculating with inequalities.
The most important of these are listed here as theorems.
THEOREM 1.16. TRICHOTOMY LAW. For arbitrary real numbers a and b, exactly one of
the three relations a < b, b < a, a = b holds.
THEOREM 1.17. TRANSITIVE LAW. If a < b and b < c, then a < c.
THEOREM 1.18. If a < b, then a + c < b + c.
THEOREM 1.19. If a < b and c > 0, then ac < bc.
THEOREM 1.20. If a ≠ 0, then a^2 > 0.
THEOREM 1.21. 1 > 0.
THEOREM 1.22. If a < b and c < 0, then ac > bc.
THEOREM 1.23. If a < b, then -a > -b. In particular, if a < 0, then -a > 0.
THEOREM 1.24. If ab > 0, then both a and b are positive or both are negative.
THEOREM 1.25. If a < c and b < d, then a + b < c + d.
Again, we shall prove only a few of these theorems as samples to indicate how the proofs
may be carried out. Proofs of the others are left as exercises.
Proof of 1.16. Let x = b - a. If x = 0, then b - a = a - b = 0, and hence, by Axiom
9, we cannot have a > b or b > a. If x ≠ 0, Axiom 8 tells us that either x > 0 or x < 0,
but not both; that is, either a < b or b < a, but not both. Therefore, exactly one of the
three relations, a = b, a < b, b < a, holds.
Proof of 1.17. If a < b and b < c, then b - a > 0 and c - b > 0. By Axiom 7 we may
add to obtain (b - a) + (c - b) > 0. That is, c - a > 0, and hence a < c.
Proof of 1.18. Let x = a + c, y = b + c. Then y - x = b - a. But b - a > 0 since
a < b. Hence y - x > 0, and this means that x < y.
Proof of 1.19. If a < b, then b - a > 0. If c > 0, then by Axiom 7 we may multiply
c by (b - a) to obtain (b - a)c > 0. But (b - a)c = bc - ac. Hence bc - ac > 0, and
this means that ac < bc, as asserted.
Proof of 1.20. If a > 0, then a · a > 0 by Axiom 7. If a < 0, then -a > 0, and hence
(-a) · (-a) > 0 by Axiom 7. In either case we have a^2 > 0.
Proof of 1.21. Apply Theorem 1.20 with a = 1.
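Since the rational numbers satisfy Axioms 1 through 9, the rules for inequalities can be spot-checked on rationals. The sketch below tests Theorems 1.19, 1.22, and 1.23 on a small grid of fractions; it is an illustration under the assumption that Python's Fraction ordering models the order relation on Q, and the name check_order_rules is hypothetical.

```python
from fractions import Fraction
from itertools import product


def check_order_rules(values):
    """Spot-check Theorems 1.19, 1.22, and 1.23 on all triples from `values`."""
    for a, b, c in product(values, repeat=3):
        if a < b:
            if c > 0:
                assert a * c < b * c   # Theorem 1.19: a positive factor preserves <
            if c < 0:
                assert a * c > b * c   # Theorem 1.22: a negative factor reverses <
            assert -a > -b             # Theorem 1.23: negation reverses <
    return True


# Thirds between -2 and 2, including 0 and negative values.
values = [Fraction(n, 3) for n in range(-6, 7)]
print(check_order_rules(values))
```

Note that the case c = 0 is deliberately skipped: then ac = bc, so neither strict inequality holds.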
*I 3.5 Exercises
1. Prove Theorems 1.22 through 1.25, using the earlier theorems and Axioms 1 through 9.
In Exercises 2 through 10, prove the given statements or establish the given inequalities. You
may use Axioms 1 through 9 and Theorems 1.1 through 1.25.
2. There is no real number x such that x^2 + 1 = 0.
3. The sum of two negative numbers is negative.
4. If a > 0, then 1/a > 0; if a < 0, then 1/a < 0.
5. If 0 < a < b, then 0 < b^(-1) < a^(-1).
6. If a ≤ b and b ≤ c, then a ≤ c.
7. If a ≤ b and b ≤ c, and a = c, then b = c.
8. For all real a and b we have a^2 + b^2 ≥ 0. If a and b are not both 0, then a^2 + b^2 > 0.
9. There is no real number a such that x ≤ a for all real x.
10. If x has the property that 0 ≤ x < h for every positive real number h, then x = 0.
I 3.6 Integers and rational numbers
There exist certain subsets of R which are distinguished because they have special properties not shared by all real numbers. In this section we shall discuss two such subsets, the
integers and the rational numbers.
To introduce the positive integers we begin with the number 1, whose existence is guaranteed by Axiom 4. The number 1 + 1 is denoted by 2, the number 2 + 1 by 3, and so on.
The numbers 1, 2, 3, . . . , obtained in this way by repeated addition of 1 are all positive,
and they are called the positive integers. Strictly speaking, this description of the positive
integers is not entirely complete because we have not explained in detail what we mean by
the expressions "and so on," or "repeated addition of 1." Although the intuitive meaning
of these expressions may seem clear, in a careful treatment of the real-number system it is
necessary to give a more precise definition of the positive integers. There are many ways
to do this. One convenient method is to introduce first the notion of an inductive set.
DEFINITION OF AN INDUCTIVE SET. A set of real numbers is called an inductive set if it has
the following two properties:
(a) The number 1 is in the set.
(b) For every x in the set, the number x + 1 is also in the set.
For example, R is an inductive set. So is the set R+. Now we shall define the positive
integers to be those real numbers which belong to every inductive set.
DEFINITION OF POSITIVE INTEGERS. A real number is called a positive integer if it belongs
to every inductive set.
Let P denote the set of all positive integers. Then P is itself an inductive set because (a)
it contains 1, and (b) it contains x + 1 whenever it contains x. Since the members of P
belong to every inductive set, we refer to P as the smallest inductive set. This property of
the set P forms the logical basis for a type of reasoning that mathematicians call proof by
induction, a detailed discussion of which is given in Part 4 of this Introduction.
The negatives of the positive integers are called the negative integers. The positive integers,
together with the negative integers and 0 (zero), form a set Z which we call simply the
set of integers.
In a thorough treatment of the real-number system, it would be necessary at this stage to
prove certain theorems about integers. For example, the sum, difference, or product of two
integers is an integer, but the quotient of two integers need not be an integer. However, we
shall not enter into the details of such proofs.
Quotients of integers a/b (where b # 0) are called rational numbers. The set of rational
numbers, denoted by Q, contains Z as a subset. The reader should realize that all the field
axioms and the order axioms are satisfied by Q. For this reason, we say that the set of
rational numbers is an ordered field.
Real numbers that are not in Q are called irrational.
I 3.7 Geometric interpretation of real numbers as points on a line
The reader is undoubtedly familiar with the geometric representation of real numbers
by means of points on a straight line. A point is selected to represent 0 and another, to the
right of 0, to represent 1, as illustrated in Figure 1.7. This choice determines the scale.
If one adopts an appropriate set of axioms for Euclidean geometry, then each real number
corresponds to exactly one point on this line and, conversely, each point on the line corresponds to one and only one real number. For this reason the line is often called the real line
or the real axis, and it is customary to use the words real number and point interchangeably.
Thus we often speak of the point x rather than the point corresponding to the real number x.
The ordering relation among the real numbers has a simple geometric interpretation.
If x < y, the point x lies to the left of the point y, as shown in Figure 1.7. Positive numbers
lie to the right of 0 and negative numbers to the left of 0. If a < b, a point x satisfies the
inequalities a < x < b if and only if x is between a and b.
This device for representing real numbers geometrically is a very worthwhile aid that
helps us to discover and understand better certain properties of real numbers. However,
the reader should realize that all properties of real numbers that are to be accepted as
theorems must be deducible from the axioms without any reference to geometry. This
does not mean that one should not make use of geometry in studying properties of real
numbers. On the contrary, the geometry often suggests the method of proof of a particular
theorem, and sometimes a geometric argument is more illuminating than a purely analytic
proof (one depending entirely on the axioms for the real numbers). In this book, geometric
arguments are used to a large extent to help motivate or clarify a particular discussion.
Nevertheless, the proofs of all the important theorems are presented in analytic form.
FIGURE 1.7 Real numbers represented geometrically on a line.
I 3.8 Upper bound of a set, maximum element, least upper bound (supremum)
The nine axioms listed above contain all the properties of real numbers usually discussed
in elementary algebra. There is another axiom of fundamental importance in calculus that
is ordinarily not discussed in elementary algebra courses. This axiom (or some property
equivalent to it) is used to establish the existence of irrational numbers.
Irrational numbers arise in elementary algebra when we try to solve certain quadratic
equations. For example, it is desirable to have a real number x such that x^2 = 2. From the
nine axioms above, we cannot prove that such an x exists in R, because these nine axioms
are also satisfied by Q, and there is no rational number x whose square is 2. (A proof of this
statement is outlined in Exercise 11 of Section I 3.12.) Axiom 10 allows us to introduce
irrational numbers in the real-number system, and it gives the real-number system a property
of continuity that is a keystone in the logical structure of calculus.
Before we describe Axiom 10, it is convenient to introduce some more terminology and
notation. Suppose S is a nonempty set of real numbers and suppose there is a number B
such that
x ≤ B
for every x in S. Then S is said to be bounded above by B. The number B is called an upper
bound for S. We say an upper bound because every number greater than B will also be an
upper bound. If an upper bound B is also a member of S, then B is called the largest
member or the maximum element of S. There can be at most one such B. If it exists, we
write
B = max S.
Thus, B = max S if B ∈ S and x ≤ B for all x in S. A set with no upper bound is said to be
unbounded above.
The following examples serve to illustrate the meaning of these terms.
EXAMPLE 1. Let S be the set of all positive real numbers. This set is unbounded above.
It has no upper bounds and it has no maximum element.
EXAMPLE 2. Let S be the set of all real x satisfying 0 ≤ x ≤ 1. This set is bounded
above by 1. In fact, 1 is its maximum element.
EXAMPLE 3. Let T be the set of all real x satisfying 0 ≤ x < 1. This is like the set in
Example 2 except that the point 1 is not included. This set is bounded above by 1 but it has
no maximum element.
Some sets, like the one in Example 3, are bounded above but have no maximum element.
For these sets there is a concept which takes the place of the maximum element. This is
called the least upper bound of the set and it is defined as follows:
DEFINITION OF LEAST UPPER BOUND. A number B is called a least upper bound of a
nonempty set S if B has the following two properties:
(a) B is an upper bound for S.
(b) No number less than B is an upper bound for S.
If S has a maximum element, this maximum is also a least upper bound for S. But if S
does not have a maximum element, it may still have a least upper bound. In Example 3
above, the number 1 is a least upper bound for T although T has no maximum element.
(See Figure 1.8.)
FIGURE 1.8 Upper bounds, maximum element, supremum.
(a) S has a largest member: max S = 1.
(b) T has no largest member, but it has a least upper bound: sup T = 1.
THEOREM 1.26. Two different numbers cannot be least upper bounds for the same set.
Proof. Suppose that B and C are two least upper bounds for a set S. Property (b)
implies that C ≥ B since B is a least upper bound; similarly, B ≥ C since C is a least upper
bound. Hence, we have B = C.
This theorem tells us that if there is a least upper bound for a set S, there is only one and
we may speak of the least upper bound.
It is common practice to refer to the least upper bound of a set by the more concise term
supremum, abbreviated sup. We shall adopt this convention and write
B = sup S
to express the fact that B is the least upper bound, or supremum, of S.
I 3.9 The least-upper-bound axiom (completeness axiom)
Now we are ready to state the least-upper-bound axiom for the real-number system.
AXIOM 10. Every nonempty set S of real numbers which is bounded above has a supremum;
that is, there is a real number B such that B = sup S.
We emphasize once more that the supremum of S need not be a member of S. In fact,
sup S belongs to S if and only if S has a maximum element, in which case max S = sup S.
Definitions of the terms lower bound, bounded below, smallest member (or minimum
element) may be similarly formulated. The reader should formulate these for himself. If
S has a minimum element, we denote it by min S.
A number L is called a greatest lower bound (or infimum) of S if (a) L is a lower bound for
S, and (b) no number greater than L is a lower bound for S. The infimum of S, when it
exists, is uniquely determined and we denote it by inf S. If S has a minimum element, then
min S = inf S.
Using Axiom 10, we can prove the following.
THEOREM 1.27. Every nonempty set S that is bounded below has a greatest lower bound;
that is, there is a real number L such that L = inf S.
Proof. Let -S denote the set of negatives of numbers in S. Then -S is nonempty and
bounded above. Axiom 10 tells us that there is a number B which is a supremum for -S.
It is easy to verify that -B = inf S.
Let us refer once more to the examples in the foregoing section. In Example 1, the set of
all positive real numbers, the number 0 is the infimum of S. This set has no minimum
element. In Examples 2 and 3, the number 0 is the minimum element.
In all these examples it was easy to decide whether or not the set S was bounded above
or below, and it was also easy to determine the numbers sup S and inf S. The next example
shows that it may be difficult to determine whether upper or lower bounds exist.
EXAMPLE 4. Let S be the set of all numbers of the form (1 + 1/n)^n, where n = 1, 2, 3, . . . .
For example, taking n = 1, 2, and 3, we find that the numbers 2, 9/4, and 64/27 are in S.
Every number in the set is greater than 1, so the set is bounded below and hence has an
infimum. With a little effort we can show that 2 is the smallest element of S so inf S =
min S = 2. The set S is also bounded above, although this fact is not as easy to prove.
(Try it!) Once we know that S is bounded above, Axiom 10 tells us that there is a number
which is the supremum of S. In this case it is not easy to determine the value of sup S from
the description of S. In a later chapter we will learn that sup S is an irrational number
approximately equal to 2.718. It is an important number in calculus called the Euler
number e.
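The first few members of the set S in Example 4 can be computed exactly. The sketch below, which uses Python's fractions module, evaluates (1 + 1/n)^n for n = 1, 2, 3 and then gathers numerical evidence (not a proof) that the first 200 terms increase and stay below 3. The function name term and the cutoff 200 are assumptions chosen for illustration.

```python
from fractions import Fraction


def term(n):
    """Return (1 + 1/n)^n as an exact fraction."""
    return (1 + Fraction(1, n)) ** n


# Exact values for n = 1, 2, 3: these are 2, 9/4, and 64/27.
first_three = [term(n) for n in (1, 2, 3)]
print(first_three)

# Numerical evidence only: the first 200 terms increase and stay below 3.
terms = [term(n) for n in range(1, 201)]
increasing = all(s < t for s, t in zip(terms, terms[1:]))
print(increasing, max(terms) < 3)
```

Checking finitely many terms cannot establish that S is bounded above; that still requires an argument valid for every n.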
I 3.10 The Archimedean property of the real-number system
This section contains a number of important properties of the real-number system which
are consequences of the least-upper-bound axiom.
THEOREM 1.28. The set P of positive integers 1, 2, 3, . . . is unbounded above.
Proof. Assume P is bounded above. We shall show that this leads to a contradiction.
Since P is nonempty, Axiom 10 tells us that P has a least upper bound, say b. The number
b - 1, being less than b, cannot be an upper bound for P. Hence, there is at least one
positive integer n such that n > b - 1. For this n we have n + 1 > b. Since n + 1 is in
P, this contradicts the fact that b is an upper bound for P.
As corollaries of Theorem 1.28, we immediately obtain the following consequences:
THEOREM 1.29. For every real x there exists a positive integer n such that n > x.
Proof. If this were not so, some x would be an upper bound for P, contradicting
Theorem 1.28.
THEOREM 1.30. If x > 0 and if y is an arbitrary real number, there exists a positive integer
n such that nx > y.
Proof. Apply Theorem 1.29 with x replaced by y/x.
The property described in Theorem 1.30 is called the Archimedean property of the real-number system. Geometrically it means that any line segment, no matter how long, may
be covered by a finite number of line segments of a given positive length, no matter how
small. In other words, a small ruler used often enough can measure arbitrarily large
distances. Archimedes realized that this was a fundamental property of the straight line
and stated it explicitly as one of the axioms of geometry. In the 19th and 20th centuries,
non-Archimedean geometries have been constructed in which this axiom is rejected.
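For rational inputs, an integer n of the kind promised by Theorem 1.30 can be written down explicitly. The sketch below is one possible choice, assuming exact rational arithmetic; the function name archimedean_n is hypothetical, and the formula floor(y/x) + 1 is just one convenient witness among infinitely many.

```python
from fractions import Fraction
from math import floor


def archimedean_n(x, y):
    """Return a positive integer n with n * x > y, assuming x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    # floor(y/x) + 1 exceeds y/x; max(..., 1) keeps n positive when y <= 0.
    return max(floor(Fraction(y) / Fraction(x)) + 1, 1)


# A ruler of length 1/1000 laid end to end n times exceeds a distance of 12.
n = archimedean_n(Fraction(1, 1000), 12)
print(n, n * Fraction(1, 1000) > 12)
```

The theorem itself concerns all real x and y and rests on Axiom 10; the code only illustrates the statement for numbers a computer can represent exactly.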
From the Archimedean property, we can prove the following theorem, which will be
useful in our discussion of integral calculus.
THEOREM 1.31. If three real numbers a, x, and y satisfy the inequalities
(1.14)    a ≤ x ≤ a + y/n
for every integer n ≥ 1, then x = a.
Proof. If x > a, Theorem 1.30 tells us that there is a positive integer n satisfying
n(x - a) > y, contradicting (1.14). Hence we cannot have x > a, so we must have x = a.
I 3.11 Fundamental properties of the supremum and infimum
This section discusses three fundamental properties of the supremum and infimum that
we shall use in our development of calculus. The first property states that any set of numbers
with a supremum contains points arbitrarily close to its supremum; similarly, a set with an
infimum contains points arbitrarily close to its infimum.
THEOREM 1.32. Let h be a given positive number and let S be a set of real numbers.
(a) If S has a supremum, then for some x in S we have
x > sup S - h.
(b) If S has an infimum, then for some x in S we have
x < inf S + h.
Proof of (a). If we had x ≤ sup S - h for all x in S, then sup S - h would be an upper
bound for S smaller than its least upper bound. Therefore we must have x > sup S - h
for at least one x in S. This proves (a). The proof of (b) is similar.
THEOREM 1.33. ADDITIVE PROPERTY. Given nonempty subsets A and B of R, let C denote
the set
C = {a + b | a ∈ A, b ∈ B}.
(a) If each of A and B has a supremum, then C has a supremum, and
sup C = sup A + sup B.
(b) If each of A and B has an infimum, then C has an infimum, and
inf C = inf A + inf B.
Proof. Assume each of A and B has a supremum. If c ∈ C, then c = a + b, where
a ∈ A and b ∈ B. Therefore c ≤ sup A + sup B; so sup A + sup B is an upper bound for C.
This shows that C has a supremum and that
sup C ≤ sup A + sup B.
Now let n be any positive integer. By Theorem 1.32 (with h = 1/n) there is an a in A and a
b in B such that
a > sup A - 1/n,    b > sup B - 1/n.
Adding these inequalities, we obtain
a + b > sup A + sup B - 2/n,
or
sup A + sup B < a + b + 2/n ≤ sup C + 2/n,
since a + b ≤ sup C. Therefore we have shown that
sup C ≤ sup A + sup B < sup C + 2/n
for every integer n ≥ 1. By Theorem 1.31, we must have sup C = sup A + sup B. This
proves (a), and the proof of (b) is similar.
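For finite nonempty sets the supremum is simply the maximum and the infimum is the minimum, so the additive property can be illustrated directly. The sketch below builds C from two small sets of fractions; the helper name sum_set and the particular sets A and B are assumptions made for the example.

```python
from fractions import Fraction


def sum_set(A, B):
    """Return C = {a + b : a in A, b in B} for finite sets A and B."""
    return {a + b for a in A for b in B}


A = {Fraction(1, 2), Fraction(3, 4), Fraction(-2)}
B = {Fraction(5), Fraction(-1, 3)}
C = sum_set(A, B)

# For finite nonempty sets, sup is max and inf is min.
print(max(C) == max(A) + max(B), min(C) == min(A) + min(B))
```

The finite case is much easier than the theorem, which also covers sets such as Example 3 whose supremum is not attained.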
THEOREM 1.34. Given two nonempty subsets S and T of R such that
s ≤ t
for every s in S and every t in T. Then S has a supremum, and T has an infimum, and they
satisfy the inequality
sup S ≤ inf T.
Proof. Each t in T is an upper bound for S. Therefore S has a supremum which satisfies
the inequality sup S ≤ t for all t in T. Hence sup S is a lower bound for T, so T has an
infimum which cannot be less than sup S. In other words, we have sup S ≤ inf T, as
asserted.
*I 3.12 Exercises
1. If x and y are arbitrary real numbers with x < y, prove that there is at least one real z satisfying
x < z < y.
2. If x is an arbitrary real number, prove that there are integers m and n such that m < x < n.
3. If x > 0, prove that there is a positive integer n such that 1/n < x.
4. If x is an arbitrary real number, prove that there is exactly one integer n which satisfies the
inequalities n ≤ x < n + 1. This n is called the greatest integer in x and is denoted by [x].
For example, [5] = 5, [5/2] = 2, [-8/3] = -3.
5. If x is an arbitrary real number, prove that there is exactly one integer n which satisfies
x < n ≤ x + 1.
6. If x and y are arbitrary real numbers, x < y, prove that there exists at least one rational number r satisfying x < r < y, and hence infinitely many. This property is often described by
saying that the rational numbers are dense in the real-number system.
7. If x is rational, x ≠ 0, and y irrational, prove that x + y, x - y, xy, x/y, and y/x are all
irrational.
8. Is the sum or product of two irrational numbers always irrational?
9. If x and y are arbitrary real numbers, x < y, prove that there exists at least one irrational
number z satisfying x < z < y, and hence infinitely many.
10. An integer n is called even if n = 2m for some integer m, and odd if n + 1 is even. Prove the
following statements:
(a) An integer cannot be both even and odd.
(b) Every integer is either even or odd.
(c) The sum or product of two even integers is even. What can you say about the sum or
product of two odd integers?
(d) If n^2 is even, so is n. If a^2 = 2b^2, where a and b are integers, then both a and b are even.
(e) Every rational number can be expressed in the form a/b, where a and b are integers, at
least one of which is odd.
11. Prove that there is no rational number whose square is 2.
[Hint: Argue by contradiction. Assume (a/b)^2 = 2, where a and b are integers, at least
one of which is odd. Use parts of Exercise 10 to deduce a contradiction.]
12. The Archimedean property of the real-number system was deduced as a consequence of the
least-upper-bound axiom. Prove that the set of rational numbers satisfies the Archimedean
property but not the least-upper-bound property. This shows that the Archimedean property does not imply the least-upper-bound axiom.
*I 3.13 Existence of square roots of nonnegative real numbers
It was pointed out earlier that the equation x^2 = 2 has no solutions among the rational
numbers. With the help of Axiom 10, we can prove that the equation x^2 = a has a solution
among the real numbers if a ≥ 0. Each such x is called a square root of a.
First, let us see what we can say about square roots without using Axiom 10. Negative
numbers cannot have square roots because if x^2 = a, then a, being a square, must be
nonnegative (by Theorem 1.20). Moreover, if a = 0, then x = 0 is the only square root
(by Theorem 1.11). Suppose, then, that a > 0. If x^2 = a, then x ≠ 0 and (-x)^2 = a,
so both x and its negative are square roots. In other words, if a has a square root, then it
has two square roots, one positive and one negative. Also, it has at most two because
if x^2 = a and y^2 = a, then x^2 = y^2 and (x - y)(x + y) = 0, and so, by Theorem 1.11,
either x = y or x = -y. Thus, if a has a square root, it has exactly two.
The existence of at least one square root can be deduced from an important theorem
in calculus known as the intermediate-value theorem for continuous functions, but it
may be instructive to see how the existence of a square root can be proved directly from
Axiom 10.
THEOREM 1.35. Every nonnegative real number a has a unique nonnegative square root.
Note: If a ≥ 0, we denote its nonnegative square root by a^(1/2) or by √a. If a > 0,
the negative square root is -a^(1/2) or -√a.
Proof. If a = 0, then 0 is the only square root. Assume, then, that a > 0. Let S be
the set of all positive x such that x^2 ≤ a. Since (1 + a)^2 > a, the number 1 + a is an
upper bound for S. Also, S is nonempty because the number a/(1 + a) is in S; in fact,
a^2 ≤ a(1 + a)^2 and hence a^2/(1 + a)^2 ≤ a. By Axiom 10, S has a least upper bound
which we shall call b. Note that b ≥ a/(1 + a) so b > 0. There are only three possibilities:
b^2 > a, b^2 < a, or b^2 = a.
Suppose b^2 > a and let c = b - (b^2 - a)/(2b) = (1/2)(b + a/b). Then 0 < c < b and
c^2 = b^2 - (b^2 - a) + (b^2 - a)^2/(4b^2) = a + (b^2 - a)^2/(4b^2) > a. Therefore c^2 > x^2
for each x in S, and hence c > x for each x in S. This means that c is an upper bound for
S. Since c < b, we have a contradiction because b was the least upper bound for S.
Therefore the inequality b^2 > a is impossible.
Suppose b^2 < a. Since b > 0, we may choose a positive number c such that c < b and
such that c < (a - b^2)/(3b). Then we have
(b + c)^2 = b^2 + c(2b + c) < b^2 + 3bc < b^2 + (a - b^2) = a.
Therefore b + c is in S. Since b + c > b, this contradicts the fact that b is an upper
bound for S. Therefore the inequality b^2 < a is impossible, and the only remaining
alternative is b^2 = a.
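The first case of the proof contains a usable computation: whenever b > 0 and b^2 > a, the number c = (1/2)(b + a/b) is a smaller upper bound for S. The sketch below repeats that step with exact fractions for a = 2, starting from the upper bound 1 + a used in the proof. Repeating the step is an addition for illustration (it coincides with the classical Babylonian averaging method) and is not part of the proof; the name improve is hypothetical.

```python
from fractions import Fraction


def improve(a, b):
    """Given a > 0 and b > 0 with b*b > a, return c = (b + a/b)/2 as in the proof."""
    return (b + a / b) / 2


a = Fraction(2)
b = 1 + a            # (1 + a)^2 > a, so 1 + a is an upper bound for S
for _ in range(5):
    c = improve(a, b)
    # The proof's claims: 0 < c < b and c^2 > a, so c is a smaller upper bound.
    assert 0 < c < b and c * c > a
    b = c

print(float(b))
```

Each pass produces a rational upper bound for S that is strictly smaller than the previous one; none of them equals sup S, since no rational number has square 2.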
*I 3.14 Roots of higher order. Rational powers
The least-upper-bound axiom can also be used to show the existence of roots of higher
order. For example, if n is a positive odd integer, then for each real x there is exactly
one real y such that y^n = x. This y is called the nth root of x and is denoted by
(1.15)    y = x^(1/n)  or  y = ⁿ√x.
When n is even, the situation is slightly different. In this case, if x is negative, there is no
real y such that y^n = x because y^n ≥ 0 for all real y. However, if x is positive, it can be
shown that there is one and only one positive y such that y^n = x. This y is called the positive
nth root of x and is denoted by the symbols in (1.15). Since n is even, (-y)^n = y^n and hence
each x > 0 has two real nth roots, y and -y. However, the symbols x^(1/n) and ⁿ√x are
reserved for the positive nth root. We do not discuss the proofs of these statements here
because they will be deduced later as consequences of the intermediate-value theorem for
continuous functions (see Section 3.10).
If r is a positive rational number, say r = m/n, where m and n are positive integers, we
define x^r to be (x^m)^(1/n), the nth root of x^m, whenever this exists. If x ≠ 0, we define x^(-r) =
1/x^r whenever x^r is defined. From these definitions, it is easy to verify that the usual laws
of exponents are valid for rational exponents: x^r · x^s = x^(r+s), (x^r)^s = x^(rs), and (xy)^r = x^r y^r.
*I 3.15 Representation of real numbers by decimals
A real number of the form
(1.16)    r = a_0 + a_1/10 + a_2/10^2 + · · · + a_n/10^n,
where a_0 is a nonnegative integer and a_1, a_2, . . . , a_n are integers satisfying 0 ≤ a_i ≤ 9, is
usually written more briefly as follows:
r = a_0.a_1a_2 · · · a_n.
This is said to be a finite decimal representation of r. For example,
1/2 = 5/10 = 0.5,    1/50 = 2/10^2 = 0.02,    29/4 = 7 + 2/10 + 5/10^2 = 7.25.
Real numbers like these are necessarily rational and, in fact, they all have the form r = a/10^n,
where a is an integer. However, not all rational numbers can be expressed with finite
decimal representations. For example, if 1/3 could be so expressed, then we would have
1/3 = a/10^n or 3a = 10^n for some integer a. But this is impossible since 3 is not a factor of any
power of 10.
Nevertheless, we can approximate an arbitrary real number x > 0 to any desired degree
of accuracy by a sum of the form (1.16) if we take n large enough. The reason for this may
be seen by the following geometric argument: If x is not an integer, then x lies between two
consecutive integers, say $a_0 < x < a_0 + 1$. The segment joining $a_0$ and $a_0 + 1$ may be
subdivided into ten equal parts. If x is not one of the subdivision points, then x must lie
between two consecutive subdivision points. This gives us a pair of inequalities of the form
$a_0 + \frac{a_1}{10} < x < a_0 + \frac{a_1 + 1}{10}$,
where $a_1$ is an integer ($0 \le a_1 \le 9$). Next we divide the segment joining $a_0 + a_1/10$ and
$a_0 + (a_1 + 1)/10$ into ten equal parts (each of length $10^{-2}$) and continue the process. If
after a finite number of steps a subdivision point coincides with x, then x is a number of the
form (1.16). Otherwise the process continues indefinitely, and it generates an infinite set of
integers $a_1, a_2, a_3, \ldots$. In this case, we say that x has the infinite decimal representation
$x = a_0.a_1a_2a_3 \cdots$.
At the nth stage, x satisfies the inequalities
$a_0 + \frac{a_1}{10} + \cdots + \frac{a_n}{10^n} < x < a_0 + \frac{a_1}{10} + \cdots + \frac{a_n + 1}{10^n}$.
This gives us two approximations to x, one from above and one from below, by finite
decimals that differ by $10^{-n}$. Therefore we can achieve any desired degree of accuracy in
our approximations by taking n large enough.
When $x = \frac{1}{3}$, it is easy to verify that $a_0 = 0$ and $a_n = 3$ for all $n \ge 1$, and hence the
corresponding infinite decimal expansion is
$\frac{1}{3} = 0.333 \cdots$.
Every irrational number has an infinite decimal representation. For example, when $x = \sqrt{2}$
we may calculate by trial and error as many digits in the expansion as we wish. Thus, $\sqrt{2}$
lies between 1.4 and 1.5, because $(1.4)^2 < 2 < (1.5)^2$. Similarly, by squaring and comparing
with 2, we find the following further approximations:
$1.41 < \sqrt{2} < 1.42$,
$1.414 < \sqrt{2} < 1.415$,
$1.4142 < \sqrt{2} < 1.4143$.
Note that the foregoing process generates a succession of intervals of lengths $10^{-1}$, $10^{-2}$,
$10^{-3}$, ..., each contained in the preceding and each containing the point x. This is an
example of what is known as a sequence of nested intervals, a concept that is sometimes used
as a basis for constructing the irrational numbers from the rational numbers.
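The trial-and-error search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name nested_sqrt_intervals is hypothetical, and exact rational arithmetic (the standard fractions module) is used so that squaring and comparing introduce no rounding error.

```python
from fractions import Fraction


def nested_sqrt_intervals(value, stages):
    """Return intervals (lo, hi) of lengths 10**-1, ..., 10**-stages,
    each containing the positive square root of `value`."""
    lo = Fraction(0)
    # Integer part: the largest integer whose square does not exceed value.
    while (lo + 1) ** 2 <= value:
        lo += 1
    intervals = []
    for n in range(1, stages + 1):
        step = Fraction(1, 10 ** n)
        # Trial and error: move right one subdivision at a time
        # for as long as the square stays at or below value.
        while (lo + step) ** 2 <= value:
            lo += step
        intervals.append((lo, lo + step))
    return intervals


for lo, hi in nested_sqrt_intervals(2, 4):
    print(float(lo), float(hi))
```

Each interval is contained in the preceding one, and for value = 2 the four intervals agree with the bounds 1.4, 1.41, 1.414, 1.4142 listed above.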
Since we shall do very little with decimals in this book, we shall not develop their properties
in any further detail except to mention how decimal expansions may be defined
analytically with the help of the least-upper-bound axiom.
If x is a given positive real number, let $a_0$ denote the largest integer $\le x$. Having chosen
$a_0$, we let $a_1$ denote the largest integer such that
$a_0 + \frac{a_1}{10} \le x$.
More generally, having chosen $a_0, a_1, \ldots, a_{n-1}$, we let $a_n$ denote the largest integer such
that
(1.17)  $a_0 + \frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_n}{10^n} \le x$.
Let S denote the set of all numbers
(1.18)  $a_0 + \frac{a_1}{10} + \frac{a_2}{10^2} + \cdots + \frac{a_n}{10^n}$
obtained in this way for n = 0, 1, 2, .... Then S is nonempty and bounded above, and
it is easy to verify that x is actually the least upper bound of S. The integers $a_0, a_1, a_2, \ldots$
so obtained may be used to define a decimal expansion of x if we write
$x = a_0.a_1a_2a_3 \cdots$
to mean that the nth digit $a_n$ is the largest integer satisfying (1.17). For example, if $x = \frac{1}{8}$,
we find $a_0 = 0$, $a_1 = 1$, $a_2 = 2$, $a_3 = 5$, and $a_n = 0$ for all $n \ge 4$. Therefore we may write
$\frac{1}{8} = 0.125000 \cdots$.
If in (1.17) we replace the inequality sign $\le$ by $<$, we obtain a slightly different definition
of decimal expansions. The least upper bound of all numbers of the form (1.18) is again x,
although the integers $a_0, a_1, a_2, \ldots$ need not be the same as those which satisfy (1.17). For
example, if this second definition is applied to $x = \frac{1}{8}$, we find $a_0 = 0$, $a_1 = 1$, $a_2 = 2$,
$a_3 = 4$, and $a_n = 9$ for all $n \ge 4$. This leads to the infinite decimal representation
$\frac{1}{8} = 0.124999 \cdots$.
The fact that a real number might have two different decimal representations is merely a
reflection of the fact that two different sets of real numbers can have the same supremum.
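The two definitions can be compared with a short Python sketch. The function name decimal_digits and its strict flag are assumptions made for this illustration; the input is taken to be a positive rational number so that the arithmetic is exact.

```python
from fractions import Fraction


def decimal_digits(x, count, strict=False):
    """Return the digits a_0, a_1, ..., a_(count-1) of a positive rational x.

    strict=False: a_n is the largest integer keeping the partial sum <= x.
    strict=True:  a_n is the largest integer keeping the partial sum <  x.
    """
    x = Fraction(x)
    digits = []
    partial = Fraction(0)
    for n in range(count):
        place = Fraction(1, 10 ** n)
        # Number of units of 10**-n that still fit between partial and x.
        remaining = (x - partial) / place
        a = remaining.numerator // remaining.denominator  # floor
        if strict and a == remaining:
            a -= 1  # keep the partial sum strictly below x
        digits.append(a)
        partial += a * place
    return digits


print(decimal_digits(Fraction(1, 8), 6))               # [0, 1, 2, 5, 0, 0]
print(decimal_digits(Fraction(1, 8), 6, strict=True))  # [0, 1, 2, 4, 9, 9]
```

In both cases the partial sums increase toward 1/8, which is the sense in which the two digit sequences share the same least upper bound.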
Part 4. Mathematical Induction, Summation Notation, and Related Topics
I 4.1 An example of a proof by mathematical induction
There is no largest integer because when we add 1 to an integer k, we obtain k + 1,
which is larger than k. Nevertheless, starting with the number 1, we can reach any positive
integer whatever in a finite number of steps, passing successively from k to k + 1 at each
step. This is the basis for a type of reasoning that mathematicians call proof by induction.
We shall illustrate the use of this method by proving the pair of inequalities used in Section
I 1.3 in the computation of the area of a parabolic segment, namely
(1.19)  $1^2 + 2^2 + \cdots + (n - 1)^2 < \frac{n^3}{3} < 1^2 + 2^2 + \cdots + n^2$.
Consider the leftmost inequality first, and let us refer to this formula as A(n) (an assertion
involving n). It is easy to verify this assertion directly for the first few values of n. Thus,
for example, when n takes the values 1, 2, and 3, the assertion becomes
A(1): $0 < \frac{1^3}{3}$,   A(2): $1^2 < \frac{2^3}{3}$,   A(3): $1^2 + 2^2 < \frac{3^3}{3}$,
provided we agree to interpret the sum on the left as 0 when n = 1.
Our object is to prove that A(n) is true for every positive integer n. The procedure is as
follows: Assume the assertion has been proved for a particular value of n, say for n = k.
That is, assume we have proved
A(k): $1^2 + 2^2 + \cdots + (k - 1)^2 < \frac{k^3}{3}$
for a fixed $k \ge 1$. Now using this, we shall deduce the corresponding result for k + 1:
A(k + 1): $1^2 + 2^2 + \cdots + k^2 < \frac{(k + 1)^3}{3}$.
Start with A(k) and add $k^2$ to both sides. This gives the inequality
$1^2 + 2^2 + \cdots + k^2 < \frac{k^3}{3} + k^2$.
To obtain A(k + 1) as a consequence of this, it suffices to show that
$\frac{k^3}{3} + k^2 < \frac{(k + 1)^3}{3}$.
But this follows at once from the equation
$\frac{(k + 1)^3}{3} = \frac{k^3 + 3k^2 + 3k + 1}{3} = \frac{k^3}{3} + k^2 + k + \frac{1}{3}$.
Therefore we have shown that A(k + 1) follows from A(k). Now, since A(1) has been
verified directly, we conclude that A(2) is also true. Knowing that A(2) is true, we conclude
that A(3) is true, and so on. Since every integer can be reached in this way, A(n) is true for
all positive integers n. This proves the leftmost inequality in (1.19). The rightmost inequality
can be proved in the same way.
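A numerical spot check is no substitute for the proof, but it can catch a misread formula. The following Python sketch, in which the function name check_inequality_1_19 is hypothetical, tests both inequalities in (1.19) with exact rational arithmetic.

```python
from fractions import Fraction


def check_inequality_1_19(n):
    """Return True when 1^2 + ... + (n-1)^2 < n^3/3 < 1^2 + ... + n^2."""
    lower = sum(k * k for k in range(1, n))  # empty sum, hence 0, when n = 1
    upper = lower + n * n
    middle = Fraction(n ** 3, 3)
    return lower < middle < upper


# Check the first few hundred cases; every one should hold.
print(all(check_inequality_1_19(n) for n in range(1, 300)))
```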
I 4.2 The principle of mathematical induction
The reader should make certain that he understands the pattern of the foregoing proof.
First we proved the assertion A(n) for n = 1. Next we showed that if the assertion is true
for a particular integer, then it is also true for the next integer. From this, we concluded
that the assertion is true for all positive integers.
The idea of induction may be illustrated in many nonmathematical ways. For example,
imagine a row of toy soldiers, numbered consecutively, and suppose they are so arranged
that if any one of them falls, say the one labeled k, it will knock over the next one, labeled
k + 1. Then anyone can visualize what would happen if soldier number 1 were toppled
backward. It is also clear that if a later soldier were knocked over first, say the one labeled
$n_1$, then all soldiers behind him would fall. This illustrates a slight generalization of the
method of induction which can be described in the following way.
Method of proof by induction. Let A(n) be an assertion involving an integer n. We
conclude that A(n) is true for every $n \ge n_1$ if we can perform the following two steps:
(a) Prove that $A(n_1)$ is true.
(b) Let k be an arbitrary but fixed integer $\ge n_1$. Assume that A(k) is true and prove that
A(k + 1) is also true.
In actual practice $n_1$ is usually 1. The logical justification for this method of proof is the
following theorem about real numbers.
THEOREM 1.36. PRINCIPLE OF MATHEMATICAL INDUCTION. Let S be a set of positive
integers which has the following two properties:
(a) The number 1 is in the set S.
(b) If an integer k is in S, then so is k + 1.
Then every positive integer is in the set S.
Proof. Properties (a) and (b) tell us that S is an inductive set. But the positive integers
were defined to be exactly those real numbers which belong to every inductive set. (See
Section I 3.6.) Therefore S contains every positive integer.
Whenever we carry out a proof of an assertion A(n) for all $n \ge 1$ by mathematical induction,
we are applying Theorem 1.36 to the set S of all the integers for which the assertion is
true. If we want to prove that A(n) is true only for $n \ge n_1$, we apply Theorem 1.36 to the
set of n for which $A(n + n_1 - 1)$ is true.
*I 4.3 The well-ordering principle
There is another important property of the positive integers, called the well-ordering
principle, that is also used as a basis for proofs by induction. It can be stated as follows.
THEOREM 1.37. WELL-ORDERING PRINCIPLE. Every nonempty set of positive integers
contains a smallest member.
Note that the well-ordering principle refers to sets of positive integers. The theorem is
not true for arbitrary sets of integers. For example, the set of all integers has no smallest
member.
The well-ordering principle can be deduced from the principle of induction. This is
demonstrated in Section I 4.5. We conclude this section with an example showing how the
well-ordering principle can be used to prove theorems about positive integers.
Let A(n) denote the following assertion:
A(n): $1^2 + 2^2 + \cdots + n^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}$.
Again, we note that A(1) is true, since
$1^2 = \frac{1}{3} + \frac{1}{2} + \frac{1}{6}$.
Now there are only two possibilities. We have either
(i) A(n) is true for every positive integer n, or
(ii) there is at least one positive integer n for which A(n) is false.
We shall prove that alternative (ii) leads to a contradiction. Assume (ii) holds. Then by
the well-ordering principle, there must be a smallest positive integer, say k, for which
A(k) is false. (We apply the well-ordering principle to the set of all positive integers n for
which A(n) is false. Statement (ii) says that this set is nonempty.) This k must be greater
than 1, because we have verified that A(1) is true. Also, the assertion must be true for
k - 1, since k was the smallest integer for which A(k) is false; therefore we may write
A(k - 1): $1^2 + 2^2 + \cdots + (k - 1)^2 = \frac{(k - 1)^3}{3} + \frac{(k - 1)^2}{2} + \frac{k - 1}{6}$.
Adding $k^2$ to both sides and simplifying the right-hand side, we find
$1^2 + 2^2 + \cdots + k^2 = \frac{k^3}{3} + \frac{k^2}{2} + \frac{k}{6}$.
But this equation states that A(k) is true; therefore we have a contradiction, because k is
an integer for which A(k) is false. In other words, statement (ii) leads to a contradiction.
Therefore (i) holds, and this proves that the identity in question is valid for all values of
$n \ge 1$. An immediate consequence of this identity is the rightmost inequality in (1.19).
A proof like this which makes use of the well-ordering principle is also referred to as
a proof by induction. Of course, the proof could also be put in the more usual form in
which we verify A(1) and then pass from A(k) to A(k + 1).
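The idea of looking for a smallest integer at which an assertion fails can be mimicked by a finite search. The Python sketch below is an illustration only; the names closed_form and smallest_counterexample are hypothetical, and a search up to a fixed limit proves nothing about larger n.

```python
from fractions import Fraction


def closed_form(n):
    """Right-hand side of A(n): n^3/3 + n^2/2 + n/6, computed exactly."""
    return Fraction(n ** 3, 3) + Fraction(n ** 2, 2) + Fraction(n, 6)


def smallest_counterexample(limit):
    """Return the smallest n in 1..limit for which A(n) fails, or None."""
    total = 0
    for n in range(1, limit + 1):
        total += n * n  # running value of 1^2 + 2^2 + ... + n^2
        if total != closed_form(n):
            return n
    return None


print(smallest_counterexample(1000))  # None: no failure up to 1000
```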
I 4.4 Exercises
1. Prove the following formulas by induction:
(a) $1 + 2 + 3 + \cdots + n = n(n + 1)/2$.
(b) $1 + 3 + 5 + \cdots + (2n - 1) = n^2$.
(c) $1^3 + 2^3 + 3^3 + \cdots + n^3 = (1 + 2 + 3 + \cdots + n)^2$.
(d) $1^3 + 2^3 + \cdots + (n - 1)^3 < n^4/4 < 1^3 + 2^3 + \cdots + n^3$.
2. Note that
$1 = 1$,
$1 - 4 = -(1 + 2)$,
$1 - 4 + 9 = 1 + 2 + 3$,
$1 - 4 + 9 - 16 = -(1 + 2 + 3 + 4)$.
Guess the general law suggested and prove it by induction.
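3. Note that
$1 + \frac{1}{2} = 2 - \frac{1}{2}$,
$1 + \frac{1}{2} + \frac{1}{4} = 2 - \frac{1}{4}$,
$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} = 2 - \frac{1}{8}$.
Guess the general law suggested and prove it by induction.
4. Note that
$1 - \frac{1}{2} = \frac{1}{2}$,
$(1 - \frac{1}{2})(1 - \frac{1}{3}) = \frac{1}{3}$,
$(1 - \frac{1}{2})(1 - \frac{1}{3})(1 - \frac{1}{4}) = \frac{1}{4}$.
Guess the general law suggested and prove it by induction.
5. Guess a general law which simplifies the product
$(1 - \frac{1}{4})(1 - \frac{1}{9})(1 - \frac{1}{16}) \cdots (1 - \frac{1}{n^2})$
and prove it by induction.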
6. Let A(n) denote the statement: $1 + 2 + \cdots + n = \frac{1}{8}(2n + 1)^2$.
(a) Prove that if A(k) is true for an integer k, then A(k + 1) is also true.
(b) Criticize the statement: "By induction it follows that A(n) is true for all n."
(c) Amend A(n) by changing the equality to an inequality that is true for all positive integers n.
7. Let $n_1$ be the smallest positive integer n for which the inequality $(1 + x)^n > 1 + nx + nx^2$ is
true for all x > 0. Compute $n_1$, and prove that the inequality is true for all integers $n \ge n_1$.
8. Given positive real numbers $a_1, a_2, a_3, \ldots$, such that $a_n \le c a_{n-1}$ for all $n \ge 2$, where c is a
fixed positive number, use induction to prove that $a_n \le a_1 c^{n-1}$ for all $n \ge 1$.
9. Prove the following statement by induction: If a line of unit length is given, then a line of
length $\sqrt{n}$ can be constructed with straightedge and compass for each positive integer n.
10. Let b denote a fixed positive integer. Prove the following statement by induction: For every
integer $n \ge 0$, there exist nonnegative integers q and r such that
$n = qb + r$,   $0 \le r < b$.
11. Let n and d denote integers. We say that d is a divisor of n if n = cd for some integer c. An
integer n is called a prime if n > 1 and if the only positive divisors of n are 1 and n. Prove, by
induction, that every integer n > 1 is either a prime or a product of primes.
12. Describe the fallacy in the following "proof" by induction:
Statement. Given any collection of n blonde girls. If at least one of the girls has blue eyes,
then all n of them have blue eyes.
"Proof." The statement is obviously true when n = 1. The step from k to k + 1 can
be illustrated by going from n = 3 to n = 4. Assume, therefore, that the statement is true
when n = 3 and let $G_1, G_2, G_3, G_4$ be four blonde girls, at least one of which, say $G_1$, has blue
eyes. Taking $G_1$, $G_2$, and $G_3$ together and using the fact that the statement is true when n = 3,
we find that $G_2$ and $G_3$ also have blue eyes. Repeating the process with $G_1$, $G_2$, and $G_4$, we find
that $G_4$ has blue eyes. Thus all four have blue eyes. A similar argument allows us to make
the step from k to k + 1 in general.
Corollary. All blonde girls have blue eyes.
Proof. Since there exists at least one blonde girl with blue eyes, we can apply the foregoing
result to the collection consisting of all blonde girls.
Note: This example is from G. Polya, who suggests that the reader may want to test the
validity of the statement by experiment.
*I 4.5 Proof of the well-ordering principle
In this section we deduce the well-ordering principle from the principle of induction.
Let T be a nonempty collection of positive integers. We want to prove that T has a
smallest member, that is, that there is a positive integer $t_0$ in T such that $t_0 \le t$ for all t in T.
Suppose T has no smallest member. We shall show that this leads to a contradiction.
The integer 1 cannot be in T (otherwise it would be the smallest member of T). Let S
denote the collection of all positive integers n such that n < t for all t in T. Now 1 is in S
because 1 < t for all t in T. Next, let k be a positive integer in S. Then k < t for all t in T.
We shall prove that k + 1 is also in S. If this were not so, then for some $t_1$ in T we would
have $t_1 \le k + 1$. Since T has no smallest member, there is an integer $t_2$ in T such that
$t_2 < t_1$, and hence $t_2 < k + 1$. But this means that $t_2 \le k$, contradicting the fact that
k < t for all t in T. Therefore k + 1 is in S. By the induction principle, S contains all
positive integers. Since T is nonempty, there is a positive integer t in T. But this t must also
be in S (since S contains all positive integers). It follows from the definition of S that t < t,
which is a contradiction. Therefore, the assumption that T has no smallest member leads
to a contradiction. It follows that T must have a smallest member, and in turn this proves
that the well-ordering principle is a consequence of the principle of induction.
I 4.6 The summation notation
In the calculations for the area of the parabolic segment, we encountered the sum
(1.20)  $1^2 + 2^2 + 3^2 + \cdots + n^2$.
Note that a typical term in this sum is of the form $k^2$, and we get all the terms by letting k
run through the values 1, 2, 3, ..., n. There is a very useful and convenient notation which
enables us to write sums like this in a more compact form. This is called the summation
notation and it makes use of the Greek letter sigma, $\Sigma$. Using summation notation, we can
write the sum in (1.20) as follows:
$\sum_{k=1}^{n} k^2$.
This symbol is read: "The sum of $k^2$ for k running from 1 to n." The numbers appearing
under and above the sigma tell us the range of values taken by k. The letter k itself is
referred to as the index of summation. Of course, it is not important that we use the letter
k; any other convenient letter may take its place. For example, instead of $\sum_{k=1}^{n} k^2$ we could
write $\sum_{i=1}^{n} i^2$, $\sum_{j=1}^{n} j^2$, $\sum_{m=1}^{n} m^2$, etc., all of which are considered as alternative notations for
the same thing. The letters i, j, k, m, etc. that are used in this way are called dummy indices.
It would not be a good idea to use the letter n for the dummy index in this particular example
because n is already being used for the number of terms.
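Readers who program may recognize the dummy index as a loop variable. The Python sketch below is an illustration with hypothetical function names; it shows that renaming the index leaves the value of the sum unchanged.

```python
def sum_of_squares_k(n):
    # The loop variable k plays the role of the index of summation.
    return sum(k ** 2 for k in range(1, n + 1))


def sum_of_squares_i(n):
    # Renaming the dummy index from k to i does not change the sum.
    return sum(i ** 2 for i in range(1, n + 1))


print(sum_of_squares_k(4), sum_of_squares_i(4))  # 30 30
```

Using n itself as the loop variable would hide the upper limit n, which is the programming counterpart of the warning above.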
More generally, when we want to form the sum of several real numbers, say $a_1, a_2, \ldots, a_n$,
we denote such a sum by the symbol
(1.21)  $a_1 + a_2 + \cdots + a_n$
which, using summation notation, can be written as follows:
(1.22)  $\sum_{k=1}^{n} a_k$.
For example, we have
$\sum_{k=1}^{4} a_k = a_1 + a_2 + a_3 + a_4$,
$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$.
Sometimes it is convenient to begin summations from 0 or from some value of the index
beyond 1. For example, we have
$\sum_{i=0}^{4} x_i = x_0 + x_1 + x_2 + x_3 + x_4$,
$\sum_{n=2}^{5} n^3 = 2^3 + 3^3 + 4^3 + 5^3$.
Other uses of the summation notation are illustrated below:
$\sum_{m=0}^{4} x^{m+1} = x + x^2 + x^3 + x^4 + x^5$,
$\sum_{j=1}^{6} 2^{j-1} = 1 + 2 + 2^2 + 2^3 + 2^4 + 2^5$.
To emphasize once more that the choice of dummy index is unimportant, we note that the
last sum may also be written in each of the following forms:
$\sum_{q=1}^{6} 2^{q-1} = \sum_{r=0}^{5} 2^r = \sum_{n=0}^{5} 2^{5-n} = \sum_{k=1}^{6} 2^{6-k}$.
Note: From a strictly logical standpoint, the symbols in (1.21) and (1.22) do not appear
among the primitive symbols for the real-number system. In a more careful treatment, we
could define these new symbols in terms of the primitive undefined symbols of our system.
This may be done by a process known as definition by induction which, like proof by induction,
consists of two parts:
(a) We define
$\sum_{k=1}^{1} a_k = a_1$.
(b) Assuming that we have defined $\sum_{k=1}^{n} a_k$ for a fixed $n \ge 1$, we further define
$\sum_{k=1}^{n+1} a_k = \left( \sum_{k=1}^{n} a_k \right) + a_{n+1}$.
To illustrate, we may take n = 1 in (b) and use (a) to obtain
$\sum_{k=1}^{2} a_k = \sum_{k=1}^{1} a_k + a_2 = a_1 + a_2$.
Now, having defined $\sum_{k=1}^{2} a_k$, we can use (b) again with n = 2 to obtain
$\sum_{k=1}^{3} a_k = \sum_{k=1}^{2} a_k + a_3 = (a_1 + a_2) + a_3$.
By the associative law for addition (Axiom 2), the sum $(a_1 + a_2) + a_3$ is the same as
$a_1 + (a_2 + a_3)$, and therefore there is no danger of confusion if we drop the parentheses
and simply write $a_1 + a_2 + a_3$ for $\sum_{k=1}^{3} a_k$. Similarly, we have
$\sum_{k=1}^{4} a_k = \sum_{k=1}^{3} a_k + a_4 = (a_1 + a_2 + a_3) + a_4$.
In this case we can prove that the sum $(a_1 + a_2 + a_3) + a_4$ is the same as $(a_1 + a_2) +
(a_3 + a_4)$ or $a_1 + (a_2 + a_3 + a_4)$, and therefore the parentheses can be dropped again
without danger of ambiguity, and we agree to write
$\sum_{k=1}^{4} a_k = a_1 + a_2 + a_3 + a_4$.
Continuing in this way, we find that (a) and (b) together give us a complete definition of
the symbol in (1.22). The notation in (1.21) is considered to be merely an alternative way of
writing (1.22). It is justified by a general associative law for addition which we shall not
attempt to state or to prove here.
The reader should notice that definition by induction and proof by induction involve the
same underlying idea. A definition by induction is also called a recursive definition.
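Parts (a) and (b) translate almost word for word into a recursive function. The Python sketch below is one possible rendering; the name recursive_sum is hypothetical, and the terms $a_1, \ldots, a_n$ are assumed to be supplied as a nonempty list.

```python
def recursive_sum(terms):
    """Sum a_1 + ... + a_n following parts (a) and (b) of the definition."""
    if len(terms) == 0:
        raise ValueError("the definition starts at n = 1; supply at least one term")
    if len(terms) == 1:
        # Part (a): the sum of a single term is that term.
        return terms[0]
    # Part (b): the sum of n + 1 terms is (the sum of the first n) + a_(n+1).
    return recursive_sum(terms[:-1]) + terms[-1]


print(recursive_sum([1, 2, 3, 4]))  # evaluates ((1 + 2) + 3) + 4 = 10
```

The recursion groups the terms from the left, exactly as in the parenthesized expressions of the note.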
I 4.7 Exercises
1. Find the numerical values of the following sums:
(a) $\sum_{k=1}^{4} k$,
(b) $\sum_{n=2}^{5} 2^{n-2}$,
(c) $\sum_{r=0}^{3} 2^{2r+1}$,
(d) $\sum_{n=1}^{4} n^n$,
(e) $\sum_{i=0}^{5} (2i + 1)$,
(f) $\sum_{k=1}^{5} \frac{1}{k(k + 1)}$.
2. Establish the following properties of the summation notation:
(a) $\sum_{k=1}^{n} (a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k$   (additive property).
(b) $\sum_{k=1}^{n} (c a_k) = c \sum_{k=1}^{n} a_k$   (homogeneous property).
(c) $\sum_{k=1}^{n} (a_k - a_{k-1}) = a_n - a_0$   (telescoping property).
Use the properties in Exercise 2 whenever possible to derive the formulas in Exercises 3
through 8.
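3. $\sum_{k=1}^{n} 1 = n$. (This means $\sum_{k=1}^{n} a_k$, where each $a_k = 1$.)
4. $\sum_{k=1}^{n} (2k - 1) = n^2$.   [Hint: $2k - 1 = k^2 - (k - 1)^2$.]
5. $\sum_{k=1}^{n} k = \frac{n^2}{2} + \frac{n}{2}$.   [Hint: Use Exercises 3 and 4.]
6. $\sum_{k=1}^{n} k^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6}$.   [Hint: $k^3 - (k - 1)^3 = 3k^2 - 3k + 1$.]
7. $\sum_{k=1}^{n} k^3 = \frac{n^4}{4} + \frac{n^3}{2} + \frac{n^2}{4}$.
8. (a) $\sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}$ if $x \ne 1$. Note: $x^0$ is defined to be 1.
[Hint: Apply Exercise 2 to $(1 - x) \sum_{k=0}^{n} x^k$.]
(b) What is the sum equal to when x = 1?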
9. Prove, by induction, that the sum $\sum_{k=1}^{2n} (-1)^k (2k + 1)$ is proportional to n, and find the
constant of proportionality.
10. (a) Give a reasonable definition of the symbol $\sum_{k=m}^{m+n} a_k$.
(b) Prove, by induction, that for $n \ge 1$ we have
$\sum_{k=n+1}^{2n} \frac{1}{k} = \sum_{m=1}^{2n} \frac{(-1)^{m+1}}{m}$.
11. Determine whether each of the following statements is true or false. In each case give a
reason for your decision.
(a) $\sum_{n=0}^{100} n^4 = \sum_{n=1}^{100} n^4$.
(b) $\sum_{j=0}^{100} 2 = 200$.
(c) $\sum_{k=0}^{100} (2 + k) = 2 + \sum_{k=0}^{100} k$.
(d) $\sum_{i=1}^{100} (i + 1)^2 = \sum_{i=0}^{99} i^2$.
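12. Guess and prove a general rule which simplifies the sum
$\sum_{k=1}^{n} \frac{1}{k(k + 1)}$.
13. Prove that $2(\sqrt{n + 1} - \sqrt{n}) < \frac{1}{\sqrt{n}} < 2(\sqrt{n} - \sqrt{n - 1})$ if $n \ge 1$. Then use this to prove
that
$2\sqrt{m} - 2 < \sum_{n=1}^{m} \frac{1}{\sqrt{n}} < 2\sqrt{m} - 1$
if $m \ge 2$. In particular, when $m = 10^6$, the sum lies between 1998 and 1999.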
I 4.8 Absolute values and the triangle inequality
Calculations with inequalities arise quite frequently in calculus. They are of particular
importance in dealing with the notion of absolute value. If x is a real number, the absolute
value of x is a nonnegative real number denoted by $|x|$ and defined as follows:
$|x| = x$ if $x \ge 0$,   $|x| = -x$ if $x \le 0$.
Note that $-|x| \le x \le |x|$. When real numbers are represented geometrically on a real axis,
the number $|x|$ is called the distance of x from 0. If a > 0 and if a point x lies between -a
and a, then $|x|$ is nearer to 0 than a is. The analytic statement of this fact is given by the
following theorem.
THEOREM 1.38. If $a \ge 0$, then $|x| \le a$ if and only if $-a \le x \le a$.
Proof. There are two statements to prove: first, that the inequality $|x| \le a$ implies the
two inequalities $-a \le x \le a$ and, conversely, that $-a \le x \le a$ implies $|x| \le a$.
Suppose $|x| \le a$. Then we also have $-a \le -|x|$. But either $x = |x|$ or $x = -|x|$ and
hence $-a \le -|x| \le x \le |x| \le a$. This proves the first statement.
To prove the converse, assume $-a \le x \le a$. Then if $x \ge 0$, we have $|x| = x \le a$,
whereas if $x \le 0$, we have $|x| = -x \le a$. In either case we have $|x| \le a$, and this completes
the proof.
Figure 1.9 illustrates the geometrical significance of this theorem.
FIGURE 1.9 Geometrical significance of Theorem 1.38.
As a consequence of Theorem 1.38, it is easy to derive an important inequality which
states that the absolute value of a sum of two real numbers cannot exceed the sum of their
absolute values.
THEOREM 1.39. For arbitrary real numbers x and y, we have
$|x + y| \le |x| + |y|$.
Note: This property is called the triangle inequality, because when it is generalized to
vectors it states that the length of any side of a triangle is less than or equal to the sum of
the lengths of the other two sides.
Proof. Adding the inequalities $-|x| \le x \le |x|$ and $-|y| \le y \le |y|$, we obtain
$-(|x| + |y|) \le x + y \le |x| + |y|$,
and hence, by Theorem 1.38, we conclude that $|x + y| \le |x| + |y|$.
If we take x = a - c and y = c - b, then x + y = a - b and the triangle inequality
becomes
$|a - b| \le |a - c| + |b - c|$.
This form of the triangle inequality is often used in practice.
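Using mathematical induction, we may extend the triangle inequality as follows:
THEOREM 1.40. For arbitrary real numbers $a_1, a_2, \ldots, a_n$, we have
$\left| \sum_{k=1}^{n} a_k \right| \le \sum_{k=1}^{n} |a_k|$.
Proof. When n = 1 the inequality is trivial, and when n = 2 it is the triangle inequality.
Assume, then, that it is true for n real numbers. Then for n + 1 real numbers $a_1, a_2, \ldots,
a_{n+1}$, we have
$\left| \sum_{k=1}^{n+1} a_k \right| = \left| \sum_{k=1}^{n} a_k + a_{n+1} \right| \le \left| \sum_{k=1}^{n} a_k \right| + |a_{n+1}| \le \sum_{k=1}^{n} |a_k| + |a_{n+1}| = \sum_{k=1}^{n+1} |a_k|$.
Hence the theorem is true for n + 1 numbers if it is true for n. By induction, it is true for
every positive integer n.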
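The two sides of the extended triangle inequality are easy to compare for a particular list of numbers. The Python sketch below is only an illustration, and the function name triangle_sides is hypothetical.

```python
def triangle_sides(numbers):
    """Return (|a_1 + ... + a_n|, |a_1| + ... + |a_n|) for a list of reals."""
    left = abs(sum(numbers))
    right = sum(abs(a) for a in numbers)
    return left, right


left, right = triangle_sides([3, -5, 2, -1])
print(left, right, left <= right)  # 1 11 True
```

The two sides coincide when all the numbers have the same sign, for example for [1, 2, 3].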
The next theorem describes an important inequality that we shall use later in connection
with our study of vector algebra.
THEOREM 1.41. THE CAUCHY-SCHWARZ INEQUALITY. If $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ are
arbitrary real numbers, we have
(1.23)  $\left( \sum_{k=1}^{n} a_k b_k \right)^2 \le \left( \sum_{k=1}^{n} a_k^2 \right) \left( \sum_{k=1}^{n} b_k^2 \right)$.
The equality sign holds if and only if there is a real number x such that $a_k x + b_k = 0$ for each
k = 1, 2, ..., n.
Proof. We have $\sum_{k=1}^{n} (a_k x + b_k)^2 \ge 0$ for every real x because a sum of squares can
never be negative. This may be written in the form
(1.24)  $Ax^2 + 2Bx + C \ge 0$,
where
$A = \sum_{k=1}^{n} a_k^2$,   $B = \sum_{k=1}^{n} a_k b_k$,   $C = \sum_{k=1}^{n} b_k^2$.
We wish to prove that $B^2 \le AC$. If A = 0, then each $a_k = 0$, so B = 0 and the result is
trivial. If $A \ne 0$, we may complete the square and write
$Ax^2 + 2Bx + C = A \left( x + \frac{B}{A} \right)^2 + \frac{AC - B^2}{A}$.
The right side has its smallest value when x = -B/A. Putting x = -B/A in (1.24), we
obtain $B^2 \le AC$. This proves (1.23). The reader should verify that the equality sign holds
if and only if there is an x such that $a_k x + b_k = 0$ for each k.
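The quantities A, B, and C in the proof can be computed directly for specific numbers. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the inputs are taken to be integers so that the minimum value $(AC - B^2)/A$ can be held exactly as a fraction.

```python
from fractions import Fraction


def cauchy_schwarz_terms(a, b):
    """Return (A, B, C) with A = sum a_k^2, B = sum a_k b_k, C = sum b_k^2."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    A = sum(x * x for x in a)
    B = sum(x * y for x, y in zip(a, b))
    C = sum(y * y for y in b)
    return A, B, C


def quadratic_minimum(a, b):
    """Smallest value of A x^2 + 2 B x + C, attained at x = -B/A (needs A != 0)."""
    A, B, C = cauchy_schwarz_terms(a, b)
    return Fraction(A * C - B * B, A)


A, B, C = cauchy_schwarz_terms([1, 2, 3], [4, 5, 6])
print(A, B, C, B * B <= A * C)            # 14 32 77 True
print(quadratic_minimum([1, 2], [-2, -4]))  # 0, since a_k * 2 + b_k = 0 for each k
```

A minimum of 0 corresponds to the case of equality in (1.23); a positive minimum corresponds to strict inequality.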
I 4.9 Exercises
1. Prove each of the following properties of absolute values.
(a) $|x| = 0$ if and only if x = 0.
(b) $|-x| = |x|$.
(c) $|x - y| = |y - x|$.
(d) $|x|^2 = x^2$.
(e) $|x| = \sqrt{x^2}$.
(f) $|xy| = |x| |y|$.
(g) $|x/y| = |x|/|y|$ if $y \ne 0$.
(h) $|x - y| \le |x| + |y|$.
(i) $|x| - |y| \le |x - y|$.
(j) $\big| |x| - |y| \big| \le |x - y|$.
2. Each inequality $(a_i)$, listed below, is equivalent to exactly one inequality $(b_j)$. For example,
$|x| < 3$ if and only if $-3 < x < 3$, and hence $(a_1)$ is equivalent to $(b_2)$. Determine all equivalent
pairs.
$(a_1)$ $|x| < 3$.                      $(b_1)$ $4 < x < 6$.
$(a_2)$ $|x - 1| < 3$.                  $(b_2)$ $-3 < x < 3$.
$(a_3)$ $|3 - 2x| < 1$.                 $(b_3)$ $x > 3$ or $x < -1$.
$(a_4)$ $|1 + 2x| \le 1$.               $(b_4)$ $x > 2$.
$(a_5)$ $|x - 1| > 2$.                  $(b_5)$ $-2 < x < 4$.
$(a_6)$ $|x + 2| \ge 5$.                $(b_6)$ $-\sqrt{3} \le x \le -1$ or $1 \le x \le \sqrt{3}$.
$(a_7)$ $|5 - x^{-1}| < 1$.             $(b_7)$ $1 < x < 2$.
$(a_8)$ $|x - 5| < |x + 1|$.            $(b_8)$ $x \le -7$ or $x \ge 3$.
$(a_9)$ $|x^2 - 2| \le 1$.              $(b_9)$ $\frac{1}{6} < x < \frac{1}{4}$.
$(a_{10})$ $x < x^2 - 12 < 4x$.         $(b_{10})$ $-1 \le x \le 0$.
3. Determine whether each of the following is true or false. In each case give a reason for your
decision.
(a) $x < 5$ implies $|x| < 5$.
(b) $|x - 5| < 2$ implies $3 < x < 7$.
(c) $|1 + 3x| \le 1$ implies $x \ge -\frac{2}{3}$.
(d) There is no real x for which $|x - 1| = |x - 2|$.
(e) For every x > 0 there is a y > 0 such that $|2x + y| = 5$.
4. Show that the equality sign holds in the Cauchy-Schwarz inequality if and only if there is a real
number x such that $a_k x + b_k = 0$ for every k = 1, 2, ..., n.
*I 4.10 Miscellaneous exercises involving induction
In this section we assemble a number of miscellaneous facts whose proofs are good exercises in
the use of mathematical induction. Some of these exercises may serve as a basis for supplementary
classroom discussion.
Factorials and binomial coefficients. The symbol n! (read "n factorial") may be defined by
induction as follows: 0! = 1, n! = (n - 1)! n if $n \ge 1$. Note that $n! = 1 \cdot 2 \cdot 3 \cdots n$.
If $0 \le k \le n$, the binomial coefficient $\binom{n}{k}$ is defined as follows:
$\binom{n}{k} = \frac{n!}{k! \, (n - k)!}$.
Note: Sometimes ${}_nC_k$ is written for $\binom{n}{k}$. These numbers appear as coefficients
in the binomial theorem. (See Exercise 4 below.)
1. Compute the values of the following binomial coefficients :
(0 (0).
(a> (3,
(b) Ci),
(4 0,
(4 Ci’>,
(4 (3,
2. (a) Show that $\binom{n}{k} = \binom{n}{n-k}$.
(b) Find n, given that $\binom{n}{10} = \binom{n}{7}$.
(c) Find k, given that $\binom{14}{k} = \binom{14}{k-4}$.
(d) Is there a k such that $\binom{12}{k} = \binom{12}{k-3}$?
3. Prove that $\binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k}$. This is called the law of Pascal's triangle and it provides a
rapid way of computing binomial coefficients successively. Pascal's triangle is illustrated here
for $n \le 6$.
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
4. Use induction to prove the binomial theorem
$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$.
Then use the theorem to derive the formulas
$\sum_{k=0}^{n} \binom{n}{k} = 2^n$   and   $\sum_{k=0}^{n} (-1)^k \binom{n}{k} = 0$, if n > 0.
The product notation. The product of n real numbers $a_1, a_2, \ldots, a_n$ is denoted by the symbol
$\prod_{k=1}^{n} a_k$, which may be defined by induction. The symbol $a_1 a_2 \cdots a_n$ is an alternative notation for
this product. Note that
$n! = \prod_{k=1}^{n} k$.
5. Give a definition by induction for the product $\prod_{k=1}^{n} a_k$.
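Prove the following properties of products by induction:
6. $\prod_{k=1}^{n} (a_k b_k) = \left( \prod_{k=1}^{n} a_k \right) \left( \prod_{k=1}^{n} b_k \right)$   (multiplicative property).
An important special case is the relation $\prod_{k=1}^{n} (c a_k) = c^n \prod_{k=1}^{n} a_k$.
7. $\prod_{k=1}^{n} \frac{a_k}{a_{k-1}} = \frac{a_n}{a_0}$ if each $a_k \ne 0$   (telescoping property).
8. If $x \ne 1$, show that
$\prod_{k=1}^{n} \left( 1 + x^{2^{k-1}} \right) = \frac{1 - x^{2^n}}{1 - x}$.
What is the value of the product when x = 1?
9. If $a_k < b_k$ for each k = 1, 2, ..., n, it is easy to prove by induction that $\sum_{k=1}^{n} a_k < \sum_{k=1}^{n} b_k$.
Discuss the corresponding inequality for products:
$\prod_{k=1}^{n} a_k < \prod_{k=1}^{n} b_k$.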
Some special inequalities
10. If x > 1, prove by induction that $x^n > x$ for every integer $n \ge 2$. If 0 < x < 1, prove that
$x^n < x$ for every integer $n \ge 2$.
11. Determine all positive integers n for which $2^n < n!$.
12. (a) Use the binomial theorem to prove that for n a positive integer we have
$\left( 1 + \frac{1}{n} \right)^n = 1 + \sum_{k=1}^{n} \left\{ \frac{1}{k!} \prod_{r=0}^{k-1} \left( 1 - \frac{r}{n} \right) \right\}$.
(b) If n > 1, use part (a) and Exercise 11 to deduce the inequalities
$2 < \left( 1 + \frac{1}{n} \right)^n < 1 + \sum_{k=1}^{n} \frac{1}{k!} < 3$.
13. (a) Let p be a positive integer. Prove that
$b^p - a^p = (b - a)(b^{p-1} + b^{p-2}a + b^{p-3}a^2 + \cdots + ba^{p-2} + a^{p-1})$.
[Hint: Use the telescoping property for sums.]
(b) Let p and n denote positive integers. Use part (a) to show that
$n^p < \frac{(n + 1)^{p+1} - n^{p+1}}{p + 1} < (n + 1)^p$.
(c) Use induction to prove that
$\sum_{k=1}^{n-1} k^p < \frac{n^{p+1}}{p + 1} < \sum_{k=1}^{n} k^p$.
Part (b) will assist in making the inductive step from n to n + 1.
14. Let $a_1, \ldots, a_n$ be n real numbers, all having the same sign and all greater than -1. Use
induction to prove that
$(1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge 1 + a_1 + a_2 + \cdots + a_n$.
In particular, when $a_1 = a_2 = \cdots = a_n = x$, where x > -1, this yields
(1.25)  $(1 + x)^n \ge 1 + nx$   (Bernoulli's inequality).
Show that when n > 1 the equality sign holds in (1.25) only for x = 0.
15. If $n \ge 2$, prove that $n!/n^n \le (\frac{1}{2})^k$, where k is the greatest integer $\le n/2$.
16. The numbers 1, 2, 3, 5, 8, 13, 21, ..., in which each term after the second is the sum of its
two predecessors, are called Fibonacci numbers. They may be defined by induction as follows:
$a_1 = 1$,   $a_2 = 2$,   $a_{n+1} = a_n + a_{n-1}$ if $n \ge 2$.
Prove that
$a_n < \left( \frac{1 + \sqrt{5}}{2} \right)^n$
for every $n \ge 1$.
Inequalities relating different types of averages. Let $x_1, x_2, \ldots, x_n$ be n positive real numbers.
If p is a nonzero integer, the pth-power mean $M_p$ of the n numbers is defined as follows:
$M_p = \left( \frac{x_1^p + \cdots + x_n^p}{n} \right)^{1/p}$.
The number $M_1$ is also called the arithmetic mean, $M_2$ the root mean square, and $M_{-1}$ the
harmonic mean.
17. If p > 0, prove that $M_p < M_{2p}$ when $x_1, x_2, \ldots, x_n$ are not all equal.
[Hint: Apply the Cauchy-Schwarz inequality with $a_k = x_k^p$ and $b_k = 1$.]
18. Use the result of Exercise 17 to prove that
$a^4 + b^4 + c^4 \ge \frac{64}{3}$
if $a^2 + b^2 + c^2 = 8$ and a > 0, b > 0, c > 0.
19. Let $a_1, \ldots, a_n$ be n positive real numbers whose product is equal to 1. Prove that $a_1 + \cdots +
a_n \ge n$ and that the equality sign holds only if every $a_k = 1$.
[Hint: Consider two cases: (a) All $a_k = 1$; (b) not all $a_k = 1$. Use induction. In case
(b) notice that if $a_1 a_2 \cdots a_{n+1} = 1$, then at least one factor, say $a_1$, exceeds 1 and at least
one factor, say $a_{n+1}$, is less than 1. Let $b_1 = a_1 a_{n+1}$ and apply the induction hypothesis to
the product $b_1 a_2 \cdots a_n$, using the fact that $(a_1 - 1)(a_{n+1} - 1) < 0$.]
20. The geometric mean G of n positive real numbers $x_1, \ldots, x_n$ is defined by the formula
$G = (x_1 x_2 \cdots x_n)^{1/n}$.
(a) Let $M_p$ denote the pth power mean. Prove that $G \le M_1$ and that $G = M_1$ only when
$x_1 = x_2 = \cdots = x_n$.
(b) Let p and q be integers, q < 0 < p. From part (a) deduce that $M_q < G < M_p$ when $x_1,
x_2, \ldots, x_n$ are not all equal.
21. Use the result of Exercise 20 to prove the following statement: If a, b, and c are positive real
numbers such that abc = 8, then $a + b + c \ge 6$ and $ab + ac + bc \ge 12$.
22. If $x_1, \ldots, x_n$ are positive numbers and if $y_k = 1/x_k$, prove that
$\left( \sum_{k=1}^{n} x_k \right) \left( \sum_{k=1}^{n} y_k \right) \ge n^2$.
23. If a, b, and c are positive and if a + b + c = 1, prove that $(1 - a)(1 - b)(1 - c) \ge 8abc$.
1
THE CONCEPTS OF INTEGRAL CALCULUS
In this chapter we present the definition of the integral and some of its basic properties.
To understand the definition one must have some acquaintance with the function concept;
the next few sections are devoted to an explanation of this and related ideas.
1.1 The basic ideas of Cartesian geometry
As mentioned earlier, one of the applications of the integral is the calculation of area.
Ordinarily we do not talk about area by itself. Instead, we talk about the area of something.
This means that we have certain objects (polygonal regions, circular regions, parabolic
segments, etc.) whose areas we wish to measure. If we hope to arrive at a treatment of area
that will enable us to deal with many different kinds of objects, we must first find an effective
way to describe these objects.
The most primitive way of doing this is by drawing figures, as was done by the ancient
Greeks. A much better way was suggested by René Descartes (1596-1650), who introduced
the subject of analytic geometry (also known as Cartesian geometry). Descartes' idea was
to represent geometric points by numbers. The procedure for points in a plane is this:
Two perpendicular reference lines (called coordinate axes) are chosen, one horizontal
(called the "x-axis"), the other vertical (the "y-axis"). Their point of intersection, denoted
by O, is called the origin. On the x-axis a convenient point is chosen to the right of O and
its distance from O is called the unit distance. Vertical distances along the y-axis are usually
measured with the same unit distance, although sometimes it is convenient to use a different
scale on the y-axis. Now each point in the plane (sometimes called the xy-plane) is assigned
a pair of numbers, called its coordinates. These numbers tell us how to locate the point.
Figure 1.1 illustrates some examples. The point with coordinates (3, 2) lies three units to
the right of the y-axis and two units above the x-axis. The number 3 is called the x-coordinate
of the point, 2 its y-coordinate. Points to the left of the y-axis have a negative x-coordinate;
those below the x-axis have a negative y-coordinate. The x-coordinate of a point is sometimes
called its abscissa and the y-coordinate is called its ordinate.
When we Write a pair of numbers such as (a, b) to represent a point, we agree that the
abscissa or x-coordinate, a, is written first. For this reason, the pair (a, b) is often referred
to as an orderedpair. It is clear that two ordered pairs (a, b) and (c, d) represent the same
point if and only if we have a == c and b = d. Points (a, b) with both a and b positive
are said to lie in thejrst quadran,r; those with a < 0 and b > 0 are in the second quadrant;
those with a < 0 and b < 0 are in the third quadrant; and those with a > 0 and b < 0
are in the fourth quadrant. Figure 1.1 shows one point in each quadrant.
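The quadrant rule just stated can be written out as a short Python sketch. The function name `quadrant` and the choice to return `None` for points on a coordinate axis are illustrative assumptions; the text itself assigns no quadrant to such points.

```python
def quadrant(a, b):
    """Return the quadrant (1 to 4) containing the point (a, b).

    Returns None when the point lies on a coordinate axis, since the
    rule in the text only covers points with both coordinates nonzero.
    """
    if a > 0 and b > 0:
        return 1
    if a < 0 and b > 0:
        return 2
    if a < 0 and b < 0:
        return 3
    if a > 0 and b < 0:
        return 4
    return None


# Three of the points mentioned in connection with Figure 1.1.
for point in [(3, 2), (-2, 1), (4, -3)]:
    print(point, "lies in quadrant", quadrant(*point))
```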
The procedure for points in space is similar. We take three mutually perpendicular
lines in space intersecting at a point (the origin). These lines determine three mutually
perpendicular planes, and each point in space can be completely described by specifying, with
appropriate regard for signs, its distances from these planes. We shall discuss three-dimensional Cartesian geometry in more detail later on; for the present we confine our attention
to plane analytic geometry.
FIGURE 1.1
FIGURE 1.2 The circle represented by the Cartesian equation $x^2 + y^2 = r^2$.
A geometric figure, such as a curve in the plane, is a collection of points satisfying one
or more special conditions. By translating these conditions into expressions involving the
coordinates x and y, we obtain one or more equations which characterize the figure in
question. For example, consider a circle of radius r with its center at the origin, as shown
in Figure 1.2. Let P be an arbitrary point on this circle, and suppose P has coordinates
(x, y). Then the line segment OP is the hypotenuse of a right triangle whose legs have
lengths $|x|$ and $|y|$ and hence, by the theorem of Pythagoras,
$$x^2 + y^2 = r^2.$$
This equation, called a Cartesian equation of the circle, is satisfied by all points (x, y) on
the circle and by no others, so the equation completely characterizes the circle. This
example illustrates how analytic geometry is used to reduce geometrical statements about
points to analytical statements about real numbers.
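The reduction of a geometric statement ("P lies on the circle") to a statement about numbers can be illustrated with a small Python sketch. The function name `on_circle` and the tolerance parameter `tol` are assumptions introduced here to cope with floating-point rounding; they are not part of the text.

```python
import math


def on_circle(x, y, r, tol=1e-9):
    """Return True when (x, y) satisfies x^2 + y^2 = r^2 up to rounding error."""
    return math.isclose(x * x + y * y, r * r, rel_tol=tol, abs_tol=tol)


print(on_circle(3, 4, 5))   # (3, 4) is tested against the circle of radius 5
print(on_circle(1, 1, 1))   # (1, 1) is tested against the unit circle
```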
Throughout their historical development, calculus and analytic geometry have been
intimately intertwined. New discoveries in one subject led to improvements in the other.
The development of calculus and analytic geometry in this book is similar to the historical
development, in that the two subjects are treated together. However, our primary purpose
is to discuss calculus. Concepts from analytic geometry that are required for this purpose
will be discussed as needed. Actually, only a few very elementary concepts of plane analytic
geometry are required to understand the rudiments of calculus. A deeper study of analytic
geometry is needed to extend the scope and applications of calculus, and this study will be
carried out in later chapters using vector methods as well as the methods of calculus.
Until then, all that is required from analytic geometry is a little familiarity with drawing
graphs of functions.
1.2 Functions. Informal description and examples
Various fields of human endeavor have to do with relationships that exist between one
collection of objects and another. Graphs, charts, curves, tables, formulas, and Gallup polls
are familiar to everyone who reads the newspapers. These are merely devices for describing
special relations in a quantitative fashion. Mathematicians refer to certain types of these
relations as functions. In this section, we give an informal description of the function
concept. A formal definition is given in Section 1.3.
EXAMPLE 1. The force F necessary to stretch a steel spring a distance x beyond its natural
length is proportional to x. That is, F = cx, where c is a number independent of x called
the spring constant. This formula, discovered by Robert Hooke in the mid-17th century, is
called Hooke’s law, and it is said to express the force as a function of the displacement.
EXAMPLE 2. The volume of a cube is a function of its edge-length. If the edges have
length x, the volume V is given by the formula $V = x^3$.
EXAMPLE 3. A prime is any integer n > 1 that cannot be expressed in the form n = ab,
where a and b are positive integers, both less than n. The first few primes are 2, 3, 5, 7, 11,
13, 17, 19. For a given real number x > 0, it is possible to count the number of primes less
than or equal to x. This number is said to be a function of x even though no simple algebraic
formula is known for computing it (without counting) when x is known.
The word “function” was introduced into mathematics by Leibniz, who used the term
primarily to refer to certain kinds of mathematical formulas. It was later realized that
Leibniz’s idea of function was much too limited in its scope, and the meaning of the word
has since undergone many stages of generalization. Today, the meaning of function is
essentially this: Given two sets, say X and Y, a function is a correspondence which associates
with each element of X one and only one element of Y. The set X is called the domain of the
function. Those elements of Y associated with the elements in X form a set called the range
of the function. (This may be all of Y, but it need not be.)
Letters of the English and Greek alphabets are often used to denote functions. The
particular letters f, g, h, F, G, H, and $\varphi$ are frequently used for this purpose. If f is a given
function and if x is an object of its domain, the notation f(x) is used to designate that object
in the range which is associated to x by the function f, and it is called the value of f at x
or the image of x under f. The symbol f(x) is read as “f of x.”
The function idea may be illustrated schematically in many ways. For example, in
Figure 1.3(a) the collections X and Y are thought of as sets of points and an arrow is used
to suggest a “pairing” of a typical point x in X with the image point f(x) in Y. Another
scheme is shown in Figure 1.3(b). Here the function f is imagined to be like a machine into
which objects of the collection X are fed and objects of Y are produced. When an object x
is fed into the machine, the output is the object f(x).
FIGURE 1.3 Schematic representations of the function idea.
Although the function idea places no restriction on the nature of the objects in the domain
X and in the range Y, in elementary calculus we are primarily interested in functions whose
domain and range are sets of real numbers. Such functions are called real-valued functions
of a real variable, or, more briefly, real functions, and they may be illustrated geometrically
by a graph in the xy-plane. We plot the domain X on the x-axis, and above each point x in
X we plot the point (x, y), where y = f(x). The totality of such points (x, y) is called the
graph of the function.
Now we consider some more examples of real functions.
EXAMPLE 4. The identity function. Suppose that f(x) = x for all real x. This function
is often called the identity function. Its domain is the real line, that is, the set of all real
numbers. Here x = y for each point (x, y) on the graph of f. The graph is a straight line
making equal angles with the coordinate axes (see Figure 1.4). The range of f is the set of
all real numbers.
FIGURE 1.4 Graph of the identity function $f(x) = x$.
FIGURE 1.5 Absolute-value function $\varphi(x) = |x|$.
EXAMPLE 5. The absolute-value function. Consider the function which assigns to each
real number x the nonnegative number $|x|$. A portion of its graph is shown in Figure 1.5.
Denoting this function by $\varphi$, we have $\varphi(x) = |x|$ for all real x. For example, $\varphi(0) = 0$,
$\varphi(2) = 2$, $\varphi(-3) = 3$. We list here some properties of absolute values expressed in function
notation.
(a) $\varphi(-x) = \varphi(x)$.
(b) $\varphi(x^2) = x^2$.
(c) $\varphi(x + y) \le \varphi(x) + \varphi(y)$ (the triangle inequality).
(d) $\varphi[\varphi(x)] = \varphi(x)$.
(e) $\varphi(x) = \sqrt{x^2}$.
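The listed properties of the absolute-value function can be spot-checked numerically. The following Python sketch is only a check on a handful of sample values, not a proof; the names `phi` and `check_properties` and the sample list are assumptions made for illustration.

```python
import math


def phi(x):
    """The absolute-value function phi(x) = |x|."""
    return abs(x)


def check_properties(samples):
    """Spot-check properties (a) through (e) on the given sample values."""
    for x in samples:
        assert phi(-x) == phi(x)                         # (a)
        assert phi(x * x) == x * x                       # (b)
        assert phi(phi(x)) == phi(x)                     # (d)
        assert math.isclose(phi(x), math.sqrt(x * x))    # (e)
        for y in samples:
            assert phi(x + y) <= phi(x) + phi(y)         # (c) triangle inequality
    return True


print(check_properties([-3.0, -0.5, 0.0, 2.0, 7.25]))
```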
FIGURE 1.6 The prime-number function.
FIGURE 1.7 The factorial function.
EXAMPLE 6. The prime-number function. For any x > 0, let $\pi(x)$ be the number of primes
less than or equal to x. The domain of $\pi$ is the set of positive real numbers. Its range is the
set of nonnegative integers {0, 1, 2, ...}. A portion of the graph of $\pi$ is shown in Figure 1.6.
(Different scales are used on the x- and y-axes.) As x increases, the function value $\pi(x)$
remains constant until x reaches a prime, at which point the function value jumps by 1.
Therefore the graph of $\pi$ consists of horizontal line segments. This is an example of a class
of functions called step functions; they play a fundamental role in the theory of the integral.
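Although no simple algebraic formula is available, the prime-number function can be computed by direct counting. The Python sketch below does exactly that using trial division; the names `is_prime` and `prime_count` are assumptions, and trial division is chosen here only for simplicity.

```python
import math


def is_prime(n):
    """Return True if the integer n is a prime (n > 1 with no proper factorization)."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def prime_count(x):
    """Number of primes less than or equal to the real number x > 0."""
    return sum(1 for n in range(2, math.floor(x) + 1) if is_prime(n))


# The value stays constant between primes and jumps by 1 at each prime.
for x in [1, 2, 2.5, 3, 10, 10.9, 11]:
    print(x, prime_count(x))
```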
EXAMPLE 7. The factorial function. For every positive integer n, we define f(n) to be
$n! = 1 \cdot 2 \cdots n$. In this example, the domain of f is the set of positive integers. The
function values increase so rapidly that it is more convenient to display this function in
tabular form rather than as a graph. Figure 1.7 shows a table listing the pairs (n, n!) for
n = 1, 2, ..., 10.
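A table of the pairs (n, n!) like the one described for Figure 1.7 can be regenerated with a few lines of Python. This is a minimal sketch; the names `factorial` and `table` are illustrative, and the layout of the printed table is not claimed to match the figure.

```python
def factorial(n):
    """Return n! = 1 * 2 * ... * n for a positive integer n."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


# The pairs (n, n!) for n = 1, 2, ..., 10.
table = [(n, factorial(n)) for n in range(1, 11)]
for n, value in table:
    print(f"{n:2d}  {value}")
```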
The reader should note two features that all the above examples have in common.
(1) For each x in the domain X, there is one and only one image y that is paired with that
particular x.
(2) Each function generates a set of pairs (x, y), where x is a typical element of the
domain X, and y is the unique element of Y that goes with x.
In most of the above examples, we displayed the pairs (x, y) geometrically as points on a
graph. In Example 7 we displayed them as entries in a table. In each case, to know the
function is to know, in one way or another, all the pairs (x, y) that it generates. This simple
observation is the motivation behind the formal definition of the function concept that is
given in the next section.
*1.3 Functions. Formal definition as a set of ordered pairs
In the informal discussion of the foregoing section, a function was described as a correspondence which associates with each object in a set X one and only one object in a set Y.
The words “correspondence” and “associates with” may not convey exactly the same
meaning to all people, so we shall reformulate the whole idea in a different way, basing it on
the set concept. First we require the notion of an ordered pair of objects.
In the definition of set equality, no mention is made of the order in which elements
appear. Thus, the sets {2, 5} and {5, 2} are equal because they consist of exactly the same
elements. Sometimes the order is important. For example, in plane analytic geometry the
coordinates (x, y) of a point represent an ordered pair of numbers. The point with coordinates (2, 5) is not the same as the point with coordinates (5, 2), although the sets {2, 5}
and {5, 2} are equal. In the same way, if we have a pair of objects a and b (not necessarily
distinct) and if we wish to distinguish one of the objects, say a, as the first member and the
other, b, as the second, we enclose the objects in parentheses, (a, b). We refer to this as an
ordered pair. We say that two ordered pairs (a, b) and (c, d) are equal if and only if their
first members are equal and their second members are equal. That is to say, we have
$$(a, b) = (c, d) \quad \text{if and only if} \quad a = c \text{ and } b = d.$$
Now we may state the formal definition of function.
DEFINITION OF FUNCTION. A function f is a set of ordered pairs (x, y) no two of which
have the same first member.
If f is a function, the set of all elements x that occur as first members of pairs (x, y) in f
is called the domain of f. The set of second members y is called the range of f, or the set of
values of f.
Intuitively, a function can be thought of as a table consisting of two columns. Each
entry in the table is an ordered pair (x, y); the column of x’s is the domain of f, and the
column of y’s, the range. If two entries (x, y) and (x, z) appear in the table with the same
x-value, then for the table to be a function it is necessary that y = z. In other words, a
function cannot take two different values at a given point x. Therefore, for every x in the
domain of f there is exactly one y such that $(x, y) \in f$. Since this y is uniquely determined
once x is known, we can introduce a special symbol for it. It is customary to write
$$y = f(x)$$
instead of $(x, y) \in f$ to indicate that the pair (x, y) is in the set f.
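The formal definition lends itself to a direct computational reading: a function is a set of pairs, and the defining condition can be tested by scanning the pairs. The Python sketch below is one such reading for finite sets of pairs only; the helper names `is_function`, `domain`, `range_of`, and `value` are assumptions introduced for illustration.

```python
def is_function(pairs):
    """Return True if no two distinct pairs have the same first member."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


def domain(pairs):
    """The set of first members of the pairs."""
    return {x for x, _ in pairs}


def range_of(pairs):
    """The set of second members of the pairs."""
    return {y for _, y in pairs}


def value(pairs, x):
    """Return the unique y with (x, y) in the function, i.e. y = f(x)."""
    for first, second in pairs:
        if first == x:
            return second
    raise KeyError(f"{x!r} is not in the domain")


squares = {(x, x * x) for x in range(-2, 3)}
print(is_function(squares), sorted(domain(squares)), sorted(range_of(squares)))
print(is_function({(1, 2), (1, 3)}))   # two pairs share the first member 1
```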
As an alternative to describing a function f by specifying explicitly the pairs it contains,
it is usually preferable to describe the domain of f, and then, for each x in the domain, to
describe how the function value f(x) is obtained. In this connection, we have the following
theorem whose proof is left as an exercise for the reader.
THEOREM 1.1. Two functions f and g are equal if and only if
(a) f and g have the same domain, and
(b) f(x) = g(x) for every x in the domain of f.
It is important to realize that the objects x and f(x) which appear in the ordered pairs
(x, f(x)) of a function need not be numbers but may be arbitrary objects of any kind.
Occasionally we shall use this degree of generality, but for the most part we shall be interested
in real functions, that is, functions whose domain and range are subsets of the real line.
Some of the functions that arise in calculus are described in the next few examples.
1.4 More examples of real functions
1. Constant functions. A function whose range consists of a single number is called a
constant function. An example is shown in Figure 1.8, where f (x) = 3 for every real
x. The graph is a horizontal line cutting the y-axis at the point (0, 3).
FIGURE 1.8 A constant function $f(x) = 3$.
FIGURE 1.9 A linear function $g(x) = 2x - 1$.
FIGURE 1.10 A quadratic polynomial $f(x) = x^2$.
2. Linear functions. A function g defined for all real x by a formula of the form
g(x) = ax + b
is called a linear function because its graph is a straight line. The number b is called
the y-intercept of the line; it is the y-coordinate of the point (0, b) where the line cuts
the y-axis. The number a is called the slope of the line. One example, g(x) = x, is
shown in Figure 1.4. Another, g(x) = 2x - 1, is shown in Figure 1.9.
3. The power functions. For a fixed positive integer n, let f be defined by the equation
$f(x) = x^n$ for all real x. When n = 1, this is the identity function, shown in Figure 1.4.
For n = 2, the graph is a parabola, part of which is shown in Figure 1.10. For n = 3,
the graph is a cubic curve and has the appearance of that in Figure 1.11 (p. 56).
4. Polynomial functions. A polynomial function P is one defined for all real x by an
equation of the form
$$P(x) = c_0 + c_1 x + \cdots + c_n x^n = \sum_{k=0}^{n} c_k x^k.$$
The numbers $c_0, c_1, \ldots, c_n$ are called the coefficients of the polynomial, and the
nonnegative integer n is called its degree (if $c_n \ne 0$). Polynomial functions include the constant functions and the power functions as special cases. Polynomials of degree 2, 3, and 4 are
called quadratic, cubic, and quartic polynomials, respectively. Figure 1.12 shows a
portion of the graph of a quartic polynomial P given by $P(x) = \tfrac{1}{2}x^4 - 2x^2$.
5. The circle. Suppose we return to the Cartesian equation of a circle, $x^2 + y^2 = r^2$, and
solve this equation for y in terms of x. There are two solutions given by
$$y = \sqrt{r^2 - x^2} \qquad \text{and} \qquad y = -\sqrt{r^2 - x^2}.$$
(We remind the reader that if a > 0, the symbol $\sqrt{a}$ denotes the positive square root
of a. The negative square root is $-\sqrt{a}$.) There was a time when mathematicians would
say that y is a double-valued function of x given by $y = \pm\sqrt{r^2 - x^2}$. However, the
more modern point of view does not admit “double-valuedness” as a property of
functions. The definition of function requires that for each x in the domain, there
corresponds one and only one y in the range. Geometrically, this means that vertical
lines which intersect the graph do so at exactly one point. Therefore to make this
example fit the theory, we say that the two solutions for y define two functions, say
f and g, where
$$f(x) = \sqrt{r^2 - x^2} \qquad \text{and} \qquad g(x) = -\sqrt{r^2 - x^2}$$
for each x satisfying $-r \le x \le r$. Each of these functions has for its domain the
interval extending from -r to r. If $|x| > r$, there is no real y such that $x^2 + y^2 = r^2$,
and we say that the functions f and g are not defined for such x. Since f(x) is the nonnegative square root of $r^2 - x^2$, the graph of f is the upper semicircle shown in Figure
1.13. The function values of g are $\le 0$, and hence the graph of g is the lower semicircle
shown in Figure 1.13.
6. Sums, products, and quotients of functions. Let f and g be two real functions having
the same domain D. We can construct new functions from f and g by adding, multiplying, or dividing the function values. The function u defined by the equation
$$u(x) = f(x) + g(x) \quad \text{if} \quad x \in D$$
is called the sum of f and g and is denoted by f + g. Similarly, the product $v = f \cdot g$
and the quotient w = f/g are the functions defined by the respective formulas
$$v(x) = f(x)g(x) \quad \text{if } x \in D, \qquad w(x) = f(x)/g(x) \quad \text{if } x \in D \text{ and } g(x) \ne 0.$$
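Several of the constructions in this list can be tried out in a few lines of Python. The sketch below evaluates a polynomial from its coefficients, builds the two semicircle functions for a given radius, and forms sums, products, and quotients of functions. All names (`poly_eval`, `upper_semicircle`, `fn_sum`, and so on) are assumptions introduced here, and the polynomial is evaluated by Horner's rule, a standard technique that the text itself does not discuss.

```python
import math


def poly_eval(coeffs, x):
    """Evaluate c_0 + c_1 x + ... + c_n x^n by Horner's rule; coeffs[k] is c_k."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def upper_semicircle(r):
    """Return f with f(x) = sqrt(r^2 - x^2), defined only for -r <= x <= r."""
    def f(x):
        if abs(x) > r:
            raise ValueError("not defined for |x| > r")
        return math.sqrt(r * r - x * x)
    return f


def lower_semicircle(r):
    """Return g with g(x) = -sqrt(r^2 - x^2) on the same domain."""
    f = upper_semicircle(r)
    return lambda x: -f(x)


def fn_sum(f, g):
    """The function u = f + g."""
    return lambda x: f(x) + g(x)


def fn_product(f, g):
    """The function v = f . g."""
    return lambda x: f(x) * g(x)


def fn_quotient(f, g):
    """The function w = f/g, defined only where g(x) is nonzero."""
    def w(x):
        if g(x) == 0:
            raise ValueError("quotient not defined where g(x) = 0")
        return f(x) / g(x)
    return w


quartic = [0, 0, -2, 0, 0.5]            # P(x) = (1/2)x^4 - 2x^2
f, g = upper_semicircle(5), lower_semicircle(5)
print(poly_eval(quartic, 2))             # value of P at x = 2
print(f(3), g(3))                        # upper and lower semicircle at x = 3
print(fn_sum(f, g)(3))                   # (f + g)(3)
```

Returning closures keeps each new function tied to the functions it was built from, which mirrors the way u, v, and w are defined pointwise on the common domain D.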
FIGURE 1.11 A cubic polynomial: $P(x) = x^3$.
FIGURE 1.12 A quartic polynomial: $P(x) = \tfrac{1}{2}x^4 - 2x^2$.
FIGURE 1.13 Graphs of two functions: $f(x) = \sqrt{r^2 - x^2}$, $g(x) = -\sqrt{r^2 - x^2}$.
The next set of exercises is intended to give the reader some familiarity with the use of
the function notation.
1.5 Exercises
1. Let $f(x) = x + 1$ for all real x. Compute the following: $f(2)$, $f(-2)$, $-f(2)$, $f(\tfrac{1}{2})$, $1/f(2)$,
$f(a + b)$, $f(a) + f(b)$, $f(a)f(b)$.
2. Let $f(x) = 1 + x$ and let $g(x) = 1 - x$ for all real x. Compute the following: $f(2) + g(2)$,
$f(2) - g(2)$, $f(2)g(2)$, $f(2)/g(2)$, $f[g(2)]$, $g[f(2)]$, $f(a) + g(-a)$, $f(t)g(-t)$.
3. Let $\varphi(x) = |x - 3| + |x - 1|$ for all real x. Compute the following: $\varphi(0)$, $\varphi(1)$, $\varphi(2)$, $\varphi(3)$,
$\varphi(-1)$, $\varphi(-2)$. Find all t for which $\varphi(t + 2) = \varphi(t)$.
4. Let $f(x) = x^2$ for all real x. Verify each of the following formulas. In each case describe the
set of real x, y, t, etc., for which the given formula is valid.
(a) $f(-x) = f(x)$.
(b) $f(y) - f(x) = (y - x)(y + x)$.
(c) $f(x + h) - f(x) = 2xh + h^2$.
(d) $f(2y) = 4f(y)$.
(e) $f(t^2) = f(t)^2$.
(f) $\sqrt{f(a)} = |a|$.
5. Let $g(x) = \sqrt{4 - x^2}$ for $|x| \le 2$. Verify each of the following formulas and tell for which
values of x, y, s, and t the given formula is valid.
(a) $g(-x) = g(x)$.
(b) $g(2y) = 2\sqrt{1 - y^2}$.
(c) $g(1/t) = \sqrt{4t^2 - 1}\,/\,|t|$.
(d) $g(a - 2) = \sqrt{4a - a^2}$.
(e) $g(s/2) = \tfrac{1}{2}\sqrt{16 - s^2}$.
(f) $\dfrac{1}{2 + g(x)} = \dfrac{2 - g(x)}{x^2}$.
6. Let f be defined as follows: $f(x) = 1$ for $0 \le x \le 1$; $f(x) = 2$ for $1 < x \le 2$. The function
is not defined if x < 0 or if x > 2.
(a) Draw the graph of f.
(b) Let g(x) = f(2x). Describe the domain of g and draw its graph.
(c) Let h(x) = f(x - 2). Describe the domain of h and draw its graph.
(d) Let k(x) = f(2x) + f(x - 2). Describe the domain of k and draw its graph.
7. The graphs of the two polynomials $g(x) = x$ and $f(x) = x^3$ intersect at three points. Draw
enough of their graphs to show how they intersect.
8. The graphs of the two quadratic polynomials $f(x) = x^2 - 2$ and $g(x) = 2x^2 + 4x + 1$ intersect at two points. Draw the portions of the two graphs between the points of intersection.
9. This exercise develops some fundamental properties of polynomials. Let $f(x) = \sum_{k=0}^{n} c_k x^k$ be
a polynomial of degree n. Prove each of the following:
(a) If $n \ge 1$ and $f(0) = 0$, then $f(x) = x g(x)$, where g is a polynomial of degree n - 1.
(b) For each real a, the function p given by $p(x) = f(x + a)$ is a polynomial of degree n.
(c) If $n \ge 1$ and $f(a) = 0$ for some real a, then $f(x) = (x - a)h(x)$, where h is a polynomial of
degree n - 1. [Hint: Consider $p(x) = f(x + a)$.]
(d) If $f(x) = 0$ for n + 1 distinct real values of x, then every coefficient $c_k$ is zero and $f(x) = 0$
for all real x.
(e) Let $g(x) = \sum_{k=0}^{m} b_k x^k$ be a polynomial of degree m, where $m \ge n$. If $g(x) = f(x)$ for m + 1
distinct real values of x, then m = n, $b_k = c_k$ for each k, and $g(x) = f(x)$ for all real x.
10. In each case, find all polynomials p of degree $\le 2$ which satisfy the given conditions.
(a) $p(0) = p(1) = p(2) = 1$.
(b) $p(0) = p(1) = 1$, $p(2) = 2$.
(c) $p(0) = p(1) = 1$.
(d) $p(0) = p(1)$.
11. In each case, find all polynomials p of degree $\le 2$ which satisfy the given conditions for all
real x.
(a) $p(x) = p(1 - x)$.
(b) $p(x) = p(1 + x)$.
(c) $p(2x) = 2p(x)$.
(d) $p(3x) = p(x + 3)$.
12. Show that the following are polynomials by converting them to the form $\sum_{k=0}^{m} a_k x^k$ for a
suitable m. In each case n is a positive integer.
(a) $(1 + x)^{2n}$.
(b) $\dfrac{1 - x^{n+1}}{1 - x}$, $x \ne 1$.
(c) $\prod_{k=0}^{n} (1 + x^{2^k})$.
1.6 The concept of area as a set function
When a mathematician attempts to develop a general theory encompassing many different
concepts, he tries to isolate common properties which seem to be basic to each of the
particular applications he has in mind. He then uses these properties as fundamental
building blocks of his theory. Euclid used this process when he developed elementary
geometry as a deductive system based on a set of axioms. We used the same process in our
axiomatic treatment of the real number system, and we shall use it once more in our discussion of area.
When we assign an area to a plane region, we associate a number with a set S in the plane.
From a purely mathematical viewpoint, this means that we have a function a (an area
function) which assigns a real number a(S) (the area of S) to each set S in some given
collection of sets. A function of this kind, whose domain is a collection of sets and whose
function values are real numbers, is called a set function. The basic problem is this: Given a
plane set S, what area a(S) shall we assign to S?
Our approach to this problem is to start with a number of properties we feel area should
have and take these as axioms for area. Any set function which satisfies these axioms will
be called an area function. To make certain we are not discussing an empty theory, it is
necessary to show that an area function actually exists. We shall not attempt to do this here.
Instead, we assume the existence of an area function and deduce further properties from the
axioms. An elementary construction of an area function may be found in Chapters 14 and
22 of Edwin E. Moise, Elementary Geometry From An Advanced Standpoint, Addison-Wesley Publishing Co., 1963.
Before we state the axioms for area, we will make a few remarks about the collection of
sets in the plane to which an area can be assigned. These sets will be called measurable
sets; the collection of all measurable sets will be denoted by $\mathcal{M}$. The axioms contain enough
information about the sets in $\mathcal{M}$ to enable us to prove that all geometric figures arising in
the usual applications of calculus are in $\mathcal{M}$ and that their areas can be calculated by integration.
One of the axioms (Axiom 5) states that every rectangle is measurable and that its area
is the product of the lengths of its edges. The term “rectangle” as used here refers to any
set congruent† to a set of the form
$$\{(x, y) \mid 0 \le x \le h,\ 0 \le y \le k\},$$
where $h \ge 0$ and $k \ge 0$. The numbers h and k are called the lengths of the edges of the
rectangle. We consider a line segment or a point to be a special case of a rectangle by
allowing h or k (or both) to be zero.
FIGURE 1.14 A step region.
FIGURE 1.15 An ordinate set enclosed by two step regions: (a) ordinate set; (b) inner step region; (c) outer step region.
From rectangles we can build up more complicated sets. The set shown in Figure 1.14
is the union of a finite collection of adjacent rectangles with their bases resting on the x-axis
and is called a step region. The axioms imply that each step region is measurable and that
its area is the sum of the areas of the rectangular pieces.
The region Q shown in Figure 1.15(a) is an example of an ordinate set. Its upper boundary
is the graph of a nonnegative function. Axiom 6 will enable us to prove that many ordinate
sets are measurable and that their areas can be calculated by approximating such sets by
inner and outer step regions, as shown in Figure 1.15(b) and (c).
We turn now to the axioms themselves.
AXIOMATIC DEFINITION OF AREA. We assume there exists a class $\mathcal{M}$ of measurable sets
in the plane and a set function a, whose domain is $\mathcal{M}$, with the following properties:
1. Nonnegative property. For each set S in $\mathcal{M}$, we have $a(S) \ge 0$.
2. Additive property. If S and T are in $\mathcal{M}$, then $S \cup T$ and $S \cap T$ are in $\mathcal{M}$, and we have
$$a(S \cup T) = a(S) + a(T) - a(S \cap T).$$
3. Difference property. If S and T are in $\mathcal{M}$ with $S \subseteq T$, then T - S is in $\mathcal{M}$, and we have
$$a(T - S) = a(T) - a(S).$$
4. Invariance under congruence. If a set S is in $\mathcal{M}$ and if T is congruent to S, then T is also
in $\mathcal{M}$ and we have a(S) = a(T).
5. Choice of scale. Every rectangle R is in $\mathcal{M}$. If the edges of R have lengths h and k,
then a(R) = hk.
6. Exhaustion property. Let Q be a set that can be enclosed between two step regions
S and T, so that
$$S \subseteq Q \subseteq T. \qquad (1.1)$$
If there is one and only one number c which satisfies the inequalities
$$a(S) \le c \le a(T)$$
for all step regions S and T satisfying (1.1), then Q is measurable and a(Q) = c.
† Congruence is used here in the same sense as in elementary Euclidean geometry. Two sets are said to be
congruent if their points can be put in one-to-one correspondence in such a way that distances are preserved.
That is, if two points p and q in one set correspond to p’ and q’ in the other, the distance from p to q must
be equal to the distance from p’ to q’; this must be true for all choices of p and q.
Axiom 1 simply states that the area of a plane measurable set is either a positive number
or zero. Axiom 2 tells us that when a set is formed from two pieces (which may overlap),
the area of the union is the sum of the areas of the two parts minus the area of their intersection. In particular, if the intersection has zero area, the area of the whole is the sum of
the areas of the two parts.
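The additive property can be illustrated with two rectangles whose sides are parallel to the coordinate axes, since their intersection is again such a rectangle (possibly degenerate) and Axiom 5 gives every area involved. The Python sketch below is only an illustration under that restriction; the tuple layout `(x0, y0, x1, y1)` and the function names are assumptions.

```python
def rect_area(rect):
    """Area hk of an axis-parallel rectangle given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)


def rect_intersection(r, s):
    """Intersection of two axis-parallel rectangles, or None if they are disjoint."""
    x0, y0 = max(r[0], s[0]), max(r[1], s[1])
    x1, y1 = min(r[2], s[2]), min(r[3], s[3])
    if x0 > x1 or y0 > y1:
        return None
    return (x0, y0, x1, y1)   # may be a segment or a point, which has area 0


def union_area(r, s):
    """Area of the union by the additive property: a(S) + a(T) - a(S n T)."""
    common = rect_intersection(r, s)
    overlap = rect_area(common) if common is not None else 0
    return rect_area(r) + rect_area(s) - overlap


print(union_area((0, 0, 2, 2), (1, 1, 3, 3)))   # overlapping squares
print(union_area((0, 0, 1, 1), (1, 0, 2, 1)))   # squares sharing only an edge
```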
If we remove a measurable set S from a larger measurable set T, Axiom 3 states that the
remaining part, T - S, is measurable and its area is obtained by subtraction, a(T - S) =
a(T) - a(S). In particular, this axiom implies that the empty set $\emptyset$ is measurable and has
zero area. Since $a(T - S) \ge 0$, Axiom 3 also implies the monotone property:
$$a(S) \le a(T), \quad \text{for sets } S \text{ and } T \text{ in } \mathcal{M} \text{ with } S \subseteq T.$$
In other words, a set which is part of another cannot have a larger area.
Axiom 4 assigns equal areas to sets having the same size and shape. The first four
axioms would be trivially satisfied if we assigned the number 0 as the area of every set in
$\mathcal{M}$. Axiom 5 assigns a nonzero area to some rectangles and thereby excludes this trivial
case. Finally, Axiom 6 incorporates the Greek method of exhaustion; it enables us to
extend the class of measurable sets from step regions to more general regions.
Axiom 5 assigns zero area to each line segment. Repeated use of the additive property
shows that every step region is measurable and that its area is the sum of the areas of the
rectangular pieces. Further elementary consequences of the axioms are discussed in the
next set of exercises.
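The statement that the area of a step region is the sum of the areas of its rectangular pieces translates into a one-line sum. In the Python sketch below a step region is represented, purely as an assumption for illustration, by a list of `(h, k)` pairs giving the base length and height of each adjacent rectangle.

```python
def rectangle_area(h, k):
    """Area of a rectangle with edge lengths h >= 0 and k >= 0 (Axiom 5)."""
    if h < 0 or k < 0:
        raise ValueError("edge lengths must be nonnegative")
    return h * k


def step_region_area(rectangles):
    """Sum of the areas of the adjacent rectangles making up a step region.

    Adjacent rectangles meet only along line segments, which have zero
    area, so the additive property reduces to a plain sum.
    """
    return sum(rectangle_area(h, k) for h, k in rectangles)


print(step_region_area([(1, 2), (2, 3), (1, 1)]))
```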
1.7 Exercises
The properties of area in this set of exercises are to be deduced from the axioms for area stated
in the foregoing section.
1. Prove that each of the following sets is measurable and has zero area: (a) A set consisting of a
single point. (b) A set consisting of a finite number of points in a plane. (c) The union of a
finite collection of line segments in a plane.
2. Every right triangular region is measurable because it can be obtained as the intersection of
two rectangles. Prove that every triangular region is measurable and that its area is one half
the product of its base and altitude.
3. Prove that every trapezoid and every parallelogram is measurable and derive the usual formulas
for their areas.
4. A point (x, y) in the plane is called a lattice point if both coordinates x and y are integers. Let
P be a polygon whose vertices are lattice points. The area of P is $I + \tfrac{1}{2}B - 1$, where I denotes
the number of lattice points inside the polygon and B denotes the number on the boundary.
(a) Prove that the formula is valid for rectangles with sides parallel to the coordinate axes.
(b) Prove that the formula is valid for right triangles and parallelograms.
(c) Use induction on the number of edges to construct a proof for general polygons.
5. Prove that a triangle whose vertices are lattice points cannot be equilateral.
[Hint: Assume there is such a triangle and compute its area in two ways, using
Exercises 2 and 4.]
6. Let A = {1, 2, 3, 4, 5}, and let $\mathcal{M}$ denote the class of all subsets of A. (There are 32 altogether,
counting A itself and the empty set $\emptyset$.) For each set S in $\mathcal{M}$, let n(S) denote the number of
distinct elements in S. If S = {1, 2, 3, 4} and T = {3, 4, 5}, compute $n(S \cup T)$, $n(S \cap T)$,
n(S - T), and n(T - S). Prove that the set function n satisfies the first three axioms for area.
1.8 Intervals and ordinate sets
In the theory of integration we are concerned primarily with real functions whose domains
are intervals on the x-axis. Sometimes it is important to distinguish between intervals
which include their endpoints and those which do not. This distinction is made by introducing
the following definitions.
FIGURE 1.16 Examples of intervals: closed ($a \le x \le b$), open ($a < x < b$), half-open ($a < x \le b$), half-open ($a \le x < b$).
If a < b, we denote by [a, b] the set of all x satisfying the inequalities $a \le x \le b$ and
refer to this set as the closed interval from a to b. The corresponding open interval, written
(a, b), is the set of all x satisfying a < x < b. The closed interval [a, b] includes the endpoints a and b, whereas the open interval does not. (See Figure 1.16.) The open interval
(a, b) is also called the interior of [a, b]. Half-open intervals (a, b] and [a, b), which include
just one endpoint, are defined by the inequalities $a < x \le b$ and $a \le x < b$, respectively.
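The four kinds of interval differ only in which endpoints they contain, and this can be captured in a small Python sketch. The class name `Interval` and the flags `closed_left` and `closed_right` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An interval from a to b; the flags say which endpoints are included."""
    a: float
    b: float
    closed_left: bool = True
    closed_right: bool = True

    def __contains__(self, x):
        left_ok = self.a <= x if self.closed_left else self.a < x
        right_ok = x <= self.b if self.closed_right else x < self.b
        return left_ok and right_ok


closed = Interval(0, 1)                                        # [0, 1]
open_ = Interval(0, 1, closed_left=False, closed_right=False)  # (0, 1)
half_open = Interval(0, 1, closed_left=False)                  # (0, 1]

print(0 in closed, 0 in open_, 1 in half_open, 0 in half_open)
```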
Let f be a nonnegative function whose domain is a closed interval [a, b]. The portion
of the plane between the graph of f and the x-axis is called the ordinate set of f. More
precisely, the ordinate set of f is the collection of all points (x, y) satisfying the inequalities
$$a \le x \le b, \qquad 0 \le y \le f(x).$$
In each of the examples shown in Figure 1.17 the shaded portion represents the ordinate
set of the corresponding function.
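Reading the ordinate set of f as the set of points (x, y) with a ≤ x ≤ b and 0 ≤ y ≤ f(x), that is, the points between the graph of f and the x-axis, membership can be tested directly. The function name `in_ordinate_set` in this Python sketch is an assumption.

```python
def in_ordinate_set(f, a, b, point):
    """Return True if point = (x, y) satisfies a <= x <= b and 0 <= y <= f(x)."""
    x, y = point
    # f is only evaluated when x lies in [a, b], the domain of f.
    return a <= x <= b and 0 <= y <= f(x)


def square(x):
    return x * x


print(in_ordinate_set(square, 0, 2, (1, 0.5)))   # a point under the graph
print(in_ordinate_set(square, 0, 2, (1, 1.5)))   # a point above the graph
```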
FIGURE 1.17 Examples of ordinate sets.
Ordinate sets are the geometric objects whose areas we want to compute by means of the
integral calculus. We shall define the concept of integral first for step functions and then
use the integral of a step function to formulate the definition of integral for more general
functions. Integration theory for step functions is extremely simple and leads in a natural
way to the corresponding theory for more general functions. To start this program, it is
necessary to have an analytic definition of a step function. This may be given most simply
in terms of the concept of a partition, to which we turn now.
1.9 Partitions and step functions
Suppose we decompose a given closed interval [a, b] into n subintervals by inserting
n - 1 points of subdivision, say $x_1, x_2, \ldots, x_{n-1}$, subject only to the restriction
(1.2)
$$a < x_1 < x_2 < \cdots < x_{n-1} < b.$$
It is convenient to denote the point a itself by $x_0$ and the point b by $x_n$. A collection of
points satisfying (1.2) is called a partition P of [a, b], and we use the symbol
$$P = \{x_0, x_1, \ldots, x_n\}$$
to designate this partition. The partition P determines n closed subintervals
$$[x_0, x_1], \; [x_1, x_2], \; \ldots, \; [x_{n-1}, x_n].$$
A typical closed subinterval is $[x_{k-1}, x_k]$, and it is referred to as the kth closed subinterval
of P; an example is shown in Figure 1.18. The corresponding open interval $(x_{k-1}, x_k)$ is
called the kth open subinterval of P.
Now we are ready to formulate an analytic definition of a step function.
FIGURE 1.18 An example of a partition of [a, b]: the points $a = x_0, x_1, x_2, \ldots, x_{k-1}, x_k, \ldots, x_{n-1}, x_n = b$, with the kth subinterval $[x_{k-1}, x_k]$ marked.
DEFINITION OF A STEP FUNCTION. A function s, whose domain is a closed interval [a, b],
is called a step function if there is a partition $P = \{x_0, x_1, \ldots, x_n\}$ of [a, b] such that s is
constant on each open subinterval of P. That is to say, for each k = 1, 2, . . . , n, there is
a real number $s_k$ such that
$$s(x) = s_k \quad \text{if} \quad x_{k-1} < x < x_k.$$
Step functions are sometimes called piecewise constant functions.
Note: At each of the endpoints $x_{k-1}$ and $x_k$ the function must have some well-defined
value, but this need not be the same as $s_k$.
EXAMPLE. A familiar example of a step function is the “postage function,” whose graph
is shown in Figure 1.19. Assume that the charge for first-class mail for parcels weighing
up to 20 pounds is 5 cents for every ounce or fraction thereof. The graph shows the number
of 5-cent stamps required for mail weighing up to 4 ounces. In this case the line segments
on the graph are half-open intervals containing their right endpoints. The domain of the
function is the interval [0, 320].
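As a rough illustration of the definition, a step function can be stored as its partition, its constant values on the open subintervals, and separately chosen values at the subdivision points. The sketch below does this for the postage function restricted to [0, 4]. The helper name `step_value`, the data layout, and the convention s(0) = 0 are assumptions made for this example only; they are not part of the text.

```python
from bisect import bisect_left

def step_value(partition, values, at_points, x):
    """Evaluate a step function at x.

    partition : increasing list [x_0, ..., x_n]
    values    : [s_1, ..., s_n]; values[k-1] is the constant value on (x_{k-1}, x_k)
    at_points : dict giving the (arbitrary) value at each subdivision point
    """
    if not partition[0] <= x <= partition[-1]:
        raise ValueError("x lies outside [a, b]")
    i = bisect_left(partition, x)      # smallest index i with partition[i] >= x
    if partition[i] == x:              # x is a subdivision point
        return at_points[x]
    return values[i - 1]               # x lies in the open interval (x_{i-1}, x_i)

# Postage function on [0, 4]: k stamps for weights in the half-open interval (k-1, k].
partition = [0, 1, 2, 3, 4]
values = [1, 2, 3, 4]
at_points = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}   # s(0) = 0 is an assumed convention

print(step_value(partition, values, at_points, 2.5))  # 3
print(step_value(partition, values, at_points, 3))    # 3
```

Keeping the values at the subdivision points in a separate table mirrors the Note above: they are part of the function but play no role in deciding whether it is a step function.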
From a given partition P of [a, b], we can always form a new partition P' by adjoining
more subdivision points to those already in P. Such a partition P' is called a refinement
of P and is said to be finer than P. For example, P = {0, 1, 2, 3, 4} is a partition of the
interval [0, 4]. If we adjoin the points 3/4, $\sqrt{2}$, and 7/2, we obtain a new partition P' of
[0, 4], namely, $P' = \{0, 3/4, 1, \sqrt{2}, 2, 3, 7/2, 4\}$, which is a refinement of P. (See Figure
1.20.) If a step function is constant on the open subintervals of P, then it is also constant
on the open subintervals of every refinement P'.
FIGURE 1.19 The postage function.
FIGURE 1.20 A partition P of [0, 4] and a refinement P'.
1.10 Sum and product of step functions
New step functions may be formed from given step functions by adding corresponding
function values. For example, suppose s and t are step functions, both defined on the
same interval [a, b]. Let $P_1$ and $P_2$ be partitions of [a, b] such that s is constant on the open
subintervals of $P_1$ and t is constant on the open subintervals of $P_2$. Let u = s + t be the
function defined by the equation
$$u(x) = s(x) + t(x) \quad \text{if} \quad a \le x \le b.$$
FIGURE 1.21 The sum of two step functions (panels include the graph of t and the graph of s + t).
To show that u is actually a step function, we must exhibit a partition P such that u is
constant on the open subintervals of P. For the new partition P, we take all the points of
$P_1$ along with all the points of $P_2$. This partition, the union of $P_1$ and $P_2$, is called the
common refinement of $P_1$ and $P_2$. Since both s and t are constant on the open subintervals
of the common refinement, the same is true of u. An example is illustrated in Figure 1.21.
The partition $P_1$ is $\{a, x_1, b\}$, the partition $P_2$ is $\{a, x_1', b\}$, and the common refinement is
$\{a, x_1', x_1, b\}$.
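The common-refinement argument can be sketched in a few lines of Python. The function names `common_refinement` and `values_on`, and the two sample step functions on [0, 3], are hypothetical choices for illustration; values at the subdivision points are ignored here because they do not affect constancy on the open subintervals.

```python
from bisect import bisect_right

def common_refinement(p1, p2):
    """Union of two partitions of the same interval, in increasing order."""
    return sorted(set(p1) | set(p2))

def values_on(partition, values, refined):
    """Constant values of a step function on the open subintervals of a refinement."""
    out = []
    for left, right in zip(refined, refined[1:]):
        mid = (left + right) / 2            # a point strictly inside (left, right)
        k = bisect_right(partition, mid)    # mid lies in (x_{k-1}, x_k)
        out.append(values[k - 1])
    return out

# s is constant on the open subintervals of p1, t on those of p2 (both on [0, 3]).
p1, s_values = [0, 2, 3], [1, 4]
p2, t_values = [0, 1, 3], [2, 5]

p = common_refinement(p1, p2)
u_values = [sv + tv for sv, tv in zip(values_on(p1, s_values, p),
                                      values_on(p2, t_values, p))]
print(p)         # [0, 1, 2, 3]
print(u_values)  # [3, 6, 9]
```

Replacing the `+` in the list comprehension by `*` gives the corresponding values of the product of the two step functions.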
Similarly, the product $v = s \cdot t$ of two step functions is another step function. An
important special case occurs when one of the factors, say t, is constant throughout [a, b].
If t(x) = c for each x in [a, b], then each function value v(x) is obtained by multiplying the
step function s(x) by the constant c.
1.11 Exercises
In this set of exercises, [x] denotes the greatest integer $\le x$.
1. Let f(x) = [x] and let g(x) = [2x] for all real x. In each case, draw the graph of the function
h defined over the interval [-1, 2] by the formula given.
(a) $h(x) = f(x) + g(x)$.
(b) $h(x) = f(x) + g(x/2)$.
(c) $h(x) = f(x)g(x)$.
(d) $h(x) = \tfrac{1}{4} f(2x) g(x/2)$.
2. In each case, f is a function defined over the interval [-2, 2] by the formula given. Draw the
graph of f. If f is a step function, find a partition P of [-2, 2] such that f is constant on the
open subintervals of P.
(a) $f(x) = x + [x]$.
(b) $f(x) = x - [x]$.
(c) $f(x) = [-x]$.
(d) $f(x) = 2[x]$.
(e) $f(x) = [x + \tfrac{1}{2}]$.
(f) $f(x) = [x] + [x + \tfrac{1}{2}]$.
3. In each case, sketch the graph of the function f defined by the formula given.
(a) $f(x) = [\sqrt{x}]$ for $0 \le x \le 10$.
(b) $f(x) = [x^2]$ for $0 \le x \le 3$.
(c) $f(x) = \sqrt{[x]}$ for $0 \le x \le 10$.
(d) $f(x) = [x]^2$ for $0 \le x \le 3$.
4. Prove that the greatest-integer function has the properties indicated.
(a) $[x + n] = [x] + n$ for every integer n.
(b) $[-x] = -[x]$ if x is an integer, and $[-x] = -[x] - 1$ otherwise.
(c) $[x + y] = [x] + [y]$ or $[x] + [y] + 1$.
(d) $[2x] = [x] + [x + \tfrac{1}{2}]$.
(e) $[3x] = [x] + [x + \tfrac{1}{3}] + [x + \tfrac{2}{3}]$.
Optional exercises.
5. The formulas in Exercises 4(d) and 4(e) suggest a generalization for [nx]. State and prove
such a generalization.
6. Recall that a lattice point (x, y) in the plane is one whose coordinates are integers. Let f be a
nonnegative function whose domain is the interval [a, b], where a and b are integers, a < b.
Let S denote the set of points (x, y) satisfying $a \le x \le b$, $0 < y \le f(x)$. Prove that the number
of lattice points in S is equal to the sum
$$\sum_{n=a}^{b} [f(n)].$$
7. If a and b are positive integers with no common factor, we have the formula
$$\sum_{n=1}^{b-1} \left[\frac{na}{b}\right] = \frac{(a - 1)(b - 1)}{2}.$$
When b = 1, the sum on the left is understood to be 0.
(a) Derive this result by a geometric argument, counting lattice points in a right triangle.
(b) Derive the result analytically as follows: By changing the index of summation, note that
$\sum_{n=1}^{b-1} [na/b] = \sum_{n=1}^{b-1} [a(b - n)/b]$. Now apply Exercises 4(a) and (b) to the bracket on the
right.
8. Let S be a set of points on the real line. The characteristic function of S is, by definition, the
function $\chi_S$ such that $\chi_S(x) = 1$ for every x in S, and $\chi_S(x) = 0$ for those x not in S. Let f be
a step function which takes the constant value $c_k$ on the kth open subinterval $I_k$ of some partition
of an interval [a, b]. Prove that for each x in the union $I_1 \cup I_2 \cup \cdots \cup I_n$ we have
$$f(x) = \sum_{k=1}^{n} c_k \chi_{I_k}(x).$$
This property is described by saying that every step function is a linear combination of characteristic functions
of intervals.
1.12 The definition of the integral for step functions
In this section we introduce the integral for step functions. The definition is constructed
so that the integral of a nonnegative step function is equal to the area of its ordinate set.
Let s be a step function defined on [a, b], and let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of
[a, b] such that s is constant on the open subintervals of P. Denote by $s_k$ the constant value
that s takes in the kth open subinterval, so that
$$s(x) = s_k \quad \text{if} \quad x_{k-1} < x < x_k, \qquad k = 1, 2, \ldots, n.$$
DEFINITION OF THE INTEGRAL OF STEP FUNCTIONS. The integral of s from a to b, denoted
by the symbol $\int_a^b s(x)\,dx$, is defined by the following formula:
(1.3)
$$\int_a^b s(x)\,dx = \sum_{k=1}^{n} s_k \cdot (x_k - x_{k-1}).$$
That is to say, to compute the integral, we multiply each constant value $s_k$ by the length of
the kth subinterval, and then we add together all these products.
Note that the values of s at the subdivision points are immaterial since they do not appear
on the right-hand side of (1.3). In particular, if s is constant on the open interval (a, b), say
s(x) = c if a < x < b, then we have
$$\int_a^b s(x)\,dx = c \sum_{k=1}^{n} (x_k - x_{k-1}) = c(b - a),$$
regardless of the values s(a) and s(b). If c > 0 and if s(x) = c for all x in the closed interval
[a, b], the ordinate set of s is a rectangle of base b - a and altitude c; the integral of s is
c(b - a), the area of this rectangle. Changing the value of s at one or both endpoints a or b
changes the ordinate set but does not alter the integral of s or the area of its ordinate set.
For example, the two ordinate sets shown in Figure 1.22 have equal areas.
FIGURE 1.22 Changes in function values at two points do not alter area of ordinate set.
FIGURE 1.23 The ordinate set of a step function.
The ordinate set of any nonnegative step function s consists of a finite number of rectangles, one for each interval of constancy;
the ordinate set may also contain or lack certain
vertical line segments, depending on how s is defined at the subdivision points. The integral
of s is equal to the sum of the areas of the individual rectangles, regardless of the values s
takes at the subdivision points. This is consistent with the fact that the vertical segments
have zero area and make no contribution to the area of the ordinate set. In Figure 1.23,
the step function s takes the constant values 2, 1, and 9/4 in the open intervals (1, 2), (2, 5),
and (5, 6), respectively. Its integral is equal to
$$\int_1^6 s(x)\,dx = 2 \cdot (2 - 1) + 1 \cdot (5 - 2) + \tfrac{9}{4} \cdot (6 - 5) = \tfrac{29}{4}.$$
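Formula (1.3) translates almost word for word into code. The following minimal sketch uses exact rational arithmetic; the name `step_integral` and the sample step function (values 3, 1, 1/2 on the open subintervals of the partition {0, 1, 3, 4}) are made up for this illustration and do not come from the text.

```python
from fractions import Fraction

def step_integral(partition, values):
    """Formula (1.3): the sum of s_k * (x_k - x_{k-1}) over all subintervals."""
    return sum(s_k * (x_k - x_prev)
               for s_k, x_prev, x_k in zip(values, partition, partition[1:]))

partition = [0, 1, 3, 4]
values = [3, 1, Fraction(1, 2)]
print(step_integral(partition, values))  # 11/2
```

The values of the function at the subdivision points are not even an input to `step_integral`, which is one way to see that they cannot influence the integral.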
It should be noted that the formula for the integral in (1.3) is independent of the choice of
the partition P as long as s is constant on the open subintervals of P. For example, suppose
we change from P to a finer partition P' by inserting exactly one new subdivision point t,
where $x_0 < t < x_1$. Then the first term on the right of (1.3) is replaced by the two terms
$s_1 \cdot (t - x_0)$ and $s_1 \cdot (x_1 - t)$, and the rest of the terms are unchanged. Since
$$s_1 \cdot (t - x_0) + s_1 \cdot (x_1 - t) = s_1 \cdot (x_1 - x_0),$$
the value of the entire sum is unchanged. We can proceed from P to any finer partition P'
by inserting the new subdivision points one at a time. At each stage, the sum in (1.3)
remains unchanged, so the integral is the same for all refinements of P.
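The one-point-at-a-time argument can be checked numerically. In the sketch below, `insert_point` is a hypothetical helper that refines a partition by a single new point and repeats the value of the subinterval being split; the sample data are invented for the example.

```python
from bisect import bisect_left
from fractions import Fraction

def step_integral(partition, values):
    """Formula (1.3): the sum of s_k * (x_k - x_{k-1})."""
    return sum(s_k * (x_k - x_prev)
               for s_k, x_prev, x_k in zip(values, partition, partition[1:]))

def insert_point(partition, values, t):
    """Refine by one new subdivision point t lying strictly inside an open subinterval."""
    i = bisect_left(partition, t)          # t lies in (partition[i-1], partition[i])
    if not 0 < i < len(partition) or partition[i] == t:
        raise ValueError("t must lie strictly inside an open subinterval")
    # The value on the split subinterval is repeated on both new pieces.
    return partition[:i] + [t] + partition[i:], values[:i] + values[i - 1:]

partition, values = [0, 2, 3], [4, 1]
before = step_integral(partition, values)

# Insert the new points one at a time, as in the argument above.
for t in (Fraction(1, 2), Fraction(5, 2)):
    partition, values = insert_point(partition, values, t)
    assert step_integral(partition, values) == before

print(partition, values, before)
```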
1.13 Properties of the integral of a step function
In this section we describe a number of fundamental properties satisfied by the integral
of a step function. Most of these properties seem obvious when they are interpreted
geometrically, and some of them may even seem trivial. All these properties carry over
to integrals of more general functions, and it will be a simple matter to prove them in the
general case once we have established them for step functions. The properties are listed
below as theorems, and in each case a geometric interpretation for nonnegative step functions
is given in terms of areas. Analytic proofs of the theorems are outlined in Section 1.15.
FIGURE 1.24 Illustrating the additive property of the integral.
The first property states that the integral of a sum of two step functions is equal to the
sum of the integrals. This is known as the additive property and it is illustrated in Figure
1.24.
THEOREM 1.2. ADDITIVE PROPERTY.
$$\int_a^b [s(x) + t(x)]\,dx = \int_a^b s(x)\,dx + \int_a^b t(x)\,dx.$$
The next property, illustrated in Figure 1.25, is called the homogeneous property. It
states that if all the function values are multiplied by a constant c, then the integral is also
multiplied by c.
THEOREM 1.3. HOMOGENEOUS PROPERTY. For every real number c, we have
$$\int_a^b c \cdot s(x)\,dx = c \int_a^b s(x)\,dx.$$
FIGURE 1.25 Illustrating the homogeneous property of the integral (with c = 2).
These two theorems can be combined into one formula known as the linearity property.
THEOREM 1.4. LINEARITY PROPERTY. For every real $c_1$ and $c_2$, we have
$$\int_a^b [c_1 s(x) + c_2 t(x)]\,dx = c_1 \int_a^b s(x)\,dx + c_2 \int_a^b t(x)\,dx.$$
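A quick numerical check of the linearity property is sketched below. It assumes that s and t have already been expressed on a common partition of [0, 4]; the particular values and constants are invented for the example, and `step_integral` is an illustrative helper rather than notation from the text.

```python
from fractions import Fraction

def step_integral(partition, values):
    """Sum of s_k * (x_k - x_{k-1}) over the subintervals of the partition."""
    return sum(s_k * (x_k - x_prev)
               for s_k, x_prev, x_k in zip(values, partition, partition[1:]))

# s and t are assumed to be constant on the open subintervals of the same partition.
partition = [0, 1, 3, 4]
s_values = [2, -1, 5]
t_values = [Fraction(1, 2), 4, 0]
c1, c2 = 3, -2

combo = [c1 * sv + c2 * tv for sv, tv in zip(s_values, t_values)]
lhs = step_integral(partition, combo)
rhs = c1 * step_integral(partition, s_values) + c2 * step_integral(partition, t_values)
print(lhs, rhs)  # -2 -2
```

Taking c1 = c2 = 1 recovers the additive property, and taking c2 = 0 recovers the homogeneous property.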
Next, we have a comparison theorem which tells us that if one step function has larger
values than another throughout [a, b], its integral over this interval is also larger.
THEOREM 1.5. COMPARISON THEOREM. If s(x) < t(x) for every x in [a, b], then
$$\int_a^b s(x)\,dx < \int_a^b t(x)\,dx.$$
Interpreted geometrically, this theorem reflects the monotone property of area. If the
ordinate set of a nonnegative step function lies inside another, the area of the smaller region
is less than that of the larger.
The foregoing properties all refer to step functions defined on a common interval. The
integral has further important properties that relate integrals over different intervals.
Among these we have the following.
THEOREM 1.6. ADDITIVITY WITH RESPECT TO THE INTERVAL OF INTEGRATION.
$$\int_a^c s(x)\,dx + \int_c^b s(x)\,dx = \int_a^b s(x)\,dx \quad \text{if} \quad a < c < b.$$
This theorem reflects the additive property of area, illustrated in Figure 1.26. If an ordinate
set is decomposed into two ordinate sets, the sum of the areas of the two parts is equal to
the area of the whole.
FIGURE 1.26 Additivity with respect to the interval of integration.
FIGURE 1.27 Illustrating invariance of the integral under translation: t(x) = s(x - c).
The next theorem may be described as invariance under translation. If the ordinate set
of a step function s is “shifted” by an amount c, the resulting ordinate set is that of another
step function t related to s by the equation t(x) = s(x - c). If s is defined on [a, b], then
t is defined on [a + c, b + c], and their ordinate sets, being congruent, have equal areas.
This property is expressed analytically as follows:
THEOREM 1.7. INVARIANCE UNDER TRANSLATION.
$$\int_a^b s(x)\,dx = \int_{a+c}^{b+c} s(x - c)\,dx \quad \text{for every real } c.$$
Its geometric meaning is illustrated in Figure 1.27 for c > 0. When c < 0, the ordinate
set is shifted to the left.
The homogeneous property (Theorem 1.3) explains what happens to an integral under a
change of scale on the y-axis. The following theorem deals with a change of scale on the
x-axis. If s is a step function defined on an interval [a, b] and if we distort the scale in the
horizontal direction by multiplying all x-coordinates by a factor k > 0, then the new graph
is that of another step function t defined on the interval [ka, kb] and related to s by the
equation
$$t(x) = s\!\left(\frac{x}{k}\right) \quad \text{if} \quad ka \le x \le kb.$$
An example with k = 2 is shown in Figure 1.28 and it suggests that the distorted figure has
an area twice that of the original figure. More generally, distortion by a positive factor k
has the effect of multiplying the integral by k. Expressed analytically, this property assumes
the following form:
FIGURE 1.28 Change of scale on the x-axis: t(x) = s(x/2).
THEOREM 1.8. EXPANSION OR CONTRACTION OF THE INTERVAL OF INTEGRATION.
$$\int_{ka}^{kb} s\!\left(\frac{x}{k}\right) dx = k \int_a^b s(x)\,dx \quad \text{for every } k > 0.$$
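For a step function stored as a partition together with its constant values, translation and change of scale act only on the partition points. The sketch below illustrates Theorems 1.7 and 1.8 under that representation; the helper names `translate` and `stretch` and the sample data are assumptions made for this example.

```python
from fractions import Fraction

def step_integral(partition, values):
    """Sum of s_k * (x_k - x_{k-1}) over the subintervals of the partition."""
    return sum(s_k * (x_k - x_prev)
               for s_k, x_prev, x_k in zip(values, partition, partition[1:]))

def translate(partition, values, c):
    """The step function t(x) = s(x - c), defined on [a + c, b + c]."""
    return [x + c for x in partition], list(values)

def stretch(partition, values, k):
    """The step function t(x) = s(x / k), defined on [ka, kb], for k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [k * x for x in partition], list(values)

partition, values = [1, 2, 5], [3, 1]
base = step_integral(partition, values)                      # 3*1 + 1*3 = 6
shifted = step_integral(*translate(partition, values, -4))   # shift to the left by 4
k = Fraction(5, 2)
stretched = step_integral(*stretch(partition, values, k))

print(base, shifted, stretched == k * base)  # 6 6 True
```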
Until now, when we have used the symbol $\int_a^b$, it has been understood that the lower limit
a was less than the upper limit b. It is convenient to extend our ideas somewhat and consider
integrals with a lower limit larger than the upper limit. This is done by defining
(1.4)
$$\int_b^a s(x)\,dx = -\int_a^b s(x)\,dx \quad \text{if} \quad a < b.$$
We also define
$$\int_a^a s(x)\,dx = 0,$$
a definition that is suggested by putting a = b in (1.4). These conventions allow us to conclude that Theorem 1.6 is valid not only when c is between a and b but for any arrangement
of the points a, b, c. Theorem 1.6 is sometimes written in the form
$$\int_a^c s(x)\,dx + \int_c^b s(x)\,dx + \int_b^a s(x)\,dx = 0.$$
Similarly, we can extend the range of validity of Theorem 1.8 and allow the constant k to
be negative. In particular, when k = -1, Theorem 1.8 and Equation (1.4) give us
$$\int_a^b s(x)\,dx = \int_{-b}^{-a} s(-x)\,dx.$$
FIGURE 1.29 Illustrating the reflection property of the integral.
We shall refer to this as the reflection property of the integral, since the graph of the function
t given by t(x) = s(-x) is obtained from that of s by reflection through the y-axis. An
example is shown in Figure 1.29.
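In the partition-and-values representation of a step function, reflection through the y-axis negates the partition points and reverses their order. The short sketch below checks the reflection property on one invented example; `reflect` is a hypothetical helper name.

```python
def step_integral(partition, values):
    """Sum of s_k * (x_k - x_{k-1}) over the subintervals of the partition."""
    return sum(s_k * (x_k - x_prev)
               for s_k, x_prev, x_k in zip(values, partition, partition[1:]))

def reflect(partition, values):
    """The step function t(x) = s(-x), defined on [-b, -a]."""
    return [-x for x in reversed(partition)], list(reversed(values))

partition, values = [1, 2, 5], [3, 1]
r_partition, r_values = reflect(partition, values)

print(r_partition, r_values)   # [-5, -2, -1] [1, 3]
print(step_integral(partition, values), step_integral(r_partition, r_values))  # 6 6
```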
1.14 Other notations for integrals
The letter x that appears in the symbol $\int_a^b s(x)\,dx$ plays no essential role in the definition
of the integral. Any other letter would serve equally well. The letters t, u, v, z are frequently
used for this purpose, and it is agreed that instead of $\int_a^b s(x)\,dx$ we may write $\int_a^b s(t)\,dt$,
$\int_a^b s(u)\,du$, etc., all these being considered as alternative notations for the same thing. The
symbols x, t, u, etc. that are used in this way are called “dummy variables.” They are
analogous to dummy indices used in the summation notation.
There is a tendency among some authors of calculus textbooks to omit the dummy
variable and the d-symbol altogether and to write simply $\int_a^b s$ for the integral. One good
reason for using this abbreviated symbol is that it suggests more strongly that the integral
depends only on the function s and on the interval [a, b]. Also, certain formulas appear
simpler in this notation. For example, the additive property becomes $\int_a^b (s + t) = \int_a^b s + \int_a^b t$.
On the other hand, it becomes awkward to write formulas like Theorems 1.7 and
1.8 in the abbreviated notation. More important than this, we shall find later that the
original Leibniz notation has certain practical advantages. The symbol dx, which appears
to be rather superfluous at this stage, turns out to be an extremely useful computational
device in connection with many routine calculations with integrals.
1.15 Exercises
1. Compute the value of each of the following integrals. You may use the theorems of Section
1.13 whenever it is convenient to do so. The notation [x] denotes the greatest integer $\le x$.
(a) $\int_{-1}^{3} [x]\,dx$.
(b) $\int_{-1}^{3} [x + \tfrac{1}{2}]\,dx$.
(c) $\int_{-1}^{3} ([x] + [x + \tfrac{1}{2}])\,dx$.
(d) $\int_{-1}^{3} 2[x]\,dx$.
(e) $\int_{-1}^{3} [2x]\,dx$.
(f) $\int_{-1}^{3} [-x]\,dx$.
2. Give an example of a step function s, defined on the closed interval [0, 5], which has the
following properties: $\int_0^2 s(x)\,dx = 5$, $\int_0^5 s(x)\,dx = 2$.
3. Show that $\int_a^b [x]\,dx + \int_a^b [-x]\,dx = a - b$.
4. (a) If n is a positive integer, prove that $\int_0^n [t]\,dt = n(n - 1)/2$.
(b) If $f(x) = \int_0^x [t]\,dt$ for $x \ge 0$, draw the graph of f over the interval [0, 4].
5. (a) Prove that $\int_0^2 [t^2]\,dt = 5 - \sqrt{2} - \sqrt{3}$.
(b) Compute $\int_{-3}^{3} [t^2]\,dt$.
6. (a) If n is a positive integer, prove that $\int_0^n [t]^2\,dt = n(n - 1)(2n - 1)/6$.
(b) If $f(x) = \int_0^x [t]^2\,dt$ for $x \ge 0$, draw the graph of f over the interval [0, 3].
(c) Find all x > 0 for which $\int_0^x [t]^2\,dt = 2(x - 1)$.
7. (a) Compute $\int_0^9 [\sqrt{t}]\,dt$.
(b) If n is a positive integer, prove that $\int_0^{n^2} [\sqrt{t}]\,dt = n(n - 1)(4n + 1)/6$.
8. Show that the translation property (Theorem 1.7) may be expressed in the equivalent form
$$\int_{a+c}^{b+c} f(x)\,dx = \int_a^b f(x + c)\,dx.$$
9. Show that the following property is equivalent to Theorem 1.8:
$$\int_{ka}^{kb} f(x)\,dx = k \int_a^b f(kx)\,dx.$$
10. Given a positive integer p. A step function s is defined on the interval [0, p] as follows:
$s(x) = (-1)^n n$ if x lies in the interval $n \le x < n + 1$, where n = 0, 1, 2, . . . , p - 1; s(p) = 0.
Let $f(p) = \int_0^p s(x)\,dx$.
(a) Calculate f(3), f(4), and f(f(3)).
(b) For what value (or values) of p is |f(p)| = 7?
11. If, instead of defining integrals of step functions by using formula (1.3), we used the definition
$$\int_a^b s(x)\,dx = \sum_{k=1}^{n} s_k^3 \cdot (x_k - x_{k-1}),$$
a new and different theory of integration would result. Which of the following properties would
remain valid in this new theory?
(a) $\int_a^b s + \int_b^c s = \int_a^c s$.
(b) $\int_a^b (s + t) = \int_a^b s + \int_a^b t$.
(c) $\int_a^b c \cdot s = c \int_a^b s$.
(d) $\int_{a+c}^{b+c} s(x)\,dx = \int_a^b s(x + c)\,dx$.
(e) If s(x) < t(x) for each x in [a, b], then $\int_a^b s < \int_a^b t$.
12. Solve Exercise 11 if we use the definition
$$\int_a^b s(x)\,dx = \sum_{k=1}^{n} s_k \cdot (x_k^2 - x_{k-1}^2).$$
Analytic proofs of the properties of the integral given in Section 1.13 are requested in the
following exercises. The proofs of Theorems 1.3 and 1.8 are worked out here as samples.
Hints are given for the others.
Proof of Theorem 1.3: $\int_a^b c \cdot s(x)\,dx = c \int_a^b s(x)\,dx$ for every real c.
Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of [a, b] such that s is constant on the open subintervals
of P. Assume $s(x) = s_k$ if $x_{k-1} < x < x_k$ (k = 1, 2, . . . , n). Then $c \cdot s(x) = c \cdot s_k$ if $x_{k-1} < x < x_k$,
and hence by the definition of an integral we have
$$\int_a^b c \cdot s(x)\,dx = \sum_{k=1}^{n} c \cdot s_k \cdot (x_k - x_{k-1}) = c \sum_{k=1}^{n} s_k \cdot (x_k - x_{k-1}) = c \int_a^b s(x)\,dx.$$
Proof of Theorem 1.8: $\int_{ka}^{kb} s(x/k)\,dx = k \int_a^b s(x)\,dx$ if k > 0.
Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of the interval [a, b] such that s is constant on the
open subintervals of P. Assume that $s(x) = s_i$ if $x_{i-1} < x < x_i$. Let $t(x) = s(x/k)$ if $ka \le x \le kb$.
Then $t(x) = s_i$ if x lies in the open interval $(kx_{i-1}, kx_i)$; hence $P' = \{kx_0, kx_1, \ldots, kx_n\}$
is a partition of [ka, kb] and t is constant on the open subintervals of P'. Therefore t is
a step function whose integral is
$$\int_{ka}^{kb} t(x)\,dx = \sum_{i=1}^{n} s_i \cdot (kx_i - kx_{i-1}) = k \sum_{i=1}^{n} s_i \cdot (x_i - x_{i-1}) = k \int_a^b s(x)\,dx.$$
13. Prove Theorem 1.2 (the additive property).
[Hint: Use the additive property for sums: $\sum_{k=1}^{n} (a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k$.]
14. Prove Theorem 1.4 (the linearity property).
[Hint: Use the additive property and the homogeneous property.]
15. Prove Theorem 1.5 (the comparison theorem).
[Hint: Use the corresponding property for sums: $\sum_{k=1}^{n} a_k < \sum_{k=1}^{n} b_k$ if $a_k < b_k$ for
k = 1, 2, . . . , n.]
16. Prove Theorem 1.6 (additivity with respect to the interval).
[Hint: If $P_1$ is a partition of [a, c] and $P_2$ a partition of [c, b], then the points of $P_1$ along
with those of $P_2$ form a partition of [a, b].]
17. Prove Theorem 1.7 (invariance under translation).
[Hint: If $P = \{x_0, x_1, \ldots, x_n\}$ is a partition of [a, b], then $P' = \{x_0 + c, x_1 + c, \ldots, x_n + c\}$
is a partition of [a + c, b + c].]
1.16 The integral of more general functions
The integral $\int_a^b s(x)\,dx$ has been defined when s is a step function. In this section we shall
formulate a definition of $\int_a^b f(x)\,dx$ that will apply to more general functions f. The
definition will be constructed so that the resulting integral has all the properties listed in
Section 1.13.
FIGURE 1.30 Approximating a function f from above and below by step functions.
The approach will be patterned somewhat after the method of Archimedes, which was
explained above in Section I 1.3. The idea is simply this: We begin by approximating the
function f from below and from above by step functions, as suggested in Figure 1.30.
That is, we choose an arbitrary step function, say s, whose graph lies below that of f, and an
arbitrary step function, say t, whose graph lies above that of f. Next, we consider the
collection of all the numbers $\int_a^b s(x)\,dx$ and $\int_a^b t(x)\,dx$ obtained by choosing s and t in all
possible ways. In general, we have
$$\int_a^b s(x)\,dx < \int_a^b t(x)\,dx$$
because of the comparison theorem. If the integral of f is to obey the comparison theorem,
then it must be a number which falls between $\int_a^b s(x)\,dx$ and $\int_a^b t(x)\,dx$ for every pair of
approximating functions s and t. If there is only one number which has this property,
we define the integral of f to be this number.
There is only one thing that can cause trouble in this procedure, and it occurs in the very
first step. Unfortunately, it is not possible to approximate every function from above
and from below by step functions. For example, the function f given by the equations
$$f(x) = \frac{1}{x} \quad \text{if } x \ne 0, \qquad f(0) = 0,$$
is defined for all real x, but on any interval [a, b] containing the origin we cannot surround
f by step functions. This is due to the fact that f has arbitrarily large values near the origin
or, as we say, f is unbounded in every neighborhood of the origin (see Figure 1.31). Therefore, we shall first restrict ourselves to those functions that are bounded on [a, b], that is, to
those functions f for which there exists a number M > 0 such that
(1.5)
$$-M \le f(x) \le M$$
for every x in [a, b]. Geometrically, the graph of such a function lies between the graphs
of two constant step functions s and t having the values -M and +M, respectively. (See
Figure 1.32.) In a case like this, we say that f is bounded by M. The two inequalities in
(1.5) can also be written as
$$|f(x)| \le M.$$
FIGURE 1.31 An unbounded function.
FIGURE 1.32 A bounded function.
With this point taken care of, we can proceed to carry out the plan described above and
to formulate the definition of the integral.
DEFINITION OF THE INTEGRAL OF A BOUNDED FUNCTION. Let f be a function defined and
bounded on [a, b]. Let s and t denote arbitrary step functions defined on [a, b] such that
(1.6)
$$s(x) \le f(x) \le t(x)$$
for every x in [a, b]. If there is one and only one number I such that
(1.7)
$$\int_a^b s(x)\,dx \le I \le \int_a^b t(x)\,dx$$
for every pair of step functions s and t satisfying (1.6), then this number I is called the
integral of f from a to b, and is denoted by the symbol $\int_a^b f(x)\,dx$ or by $\int_a^b f$. When such
an I exists, the function f is said to be integrable on [a, b].
If a < b, we define $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$, provided f is integrable on [a, b]. We
also define $\int_a^a f(x)\,dx = 0$. If f is integrable on [a, b], we say that the integral $\int_a^b f(x)\,dx$
exists. The function f is called the integrand, the numbers a and b are called the limits of
integration, and the interval [a, b] the interval of integration.
1.17 Upper and lower integrals
Assume f is bounded on [a, b]. If s and t are step functions satisfying (1.6), we say s is
below f, and t is above f, and we write $s \le f \le t$.
Let S denote the set of all numbers $\int_a^b s(x)\,dx$ obtained as s runs through all step functions
below f, and let T be the set of all numbers $\int_a^b t(x)\,dx$ obtained as t runs through all step
functions above f. That is, let
$$S = \left\{ \int_a^b s(x)\,dx \;\Big|\; s \le f \right\}, \qquad T = \left\{ \int_a^b t(x)\,dx \;\Big|\; f \le t \right\}.$$
Both sets S and T are nonempty since f is bounded. Also, $\int_a^b s(x)\,dx \le \int_a^b t(x)\,dx$ if $s \le f \le t$,
so every number in S is less than every number in T. Therefore, by Theorem I.34, S has
a supremum, and T has an infimum, and they satisfy the inequalities
$$\int_a^b s(x)\,dx \le \sup S \le \inf T \le \int_a^b t(x)\,dx$$
for all s and t satisfying $s \le f \le t$. This shows that both numbers sup S and inf T satisfy
(1.7). Therefore, f is integrable on [a, b] if and only if sup S = inf T, in which case we have
$$\int_a^b f(x)\,dx = \sup S = \inf T.$$
The number sup S is called the lower integral of f and is denoted by $\underline{I}(f)$. The number
inf T is called the upper integral of f and is denoted by $\bar{I}(f)$. Thus, we have
$$\underline{I}(f) = \sup \left\{ \int_a^b s(x)\,dx \;\Big|\; s \le f \right\}, \qquad \bar{I}(f) = \inf \left\{ \int_a^b t(x)\,dx \;\Big|\; f \le t \right\}.$$
The foregoing argument proves the following theorem.
THEOREM 1.9. Every function f which is bounded on [a, b] has a lower integral $\underline{I}(f)$ and
an upper integral $\bar{I}(f)$ satisfying the inequalities
$$\int_a^b s(x)\,dx \le \underline{I}(f) \le \bar{I}(f) \le \int_a^b t(x)\,dx$$
for all step functions s and t with $s \le f \le t$. The function f is integrable on [a, b] if and only
if its upper and lower integrals are equal, in which case we have
$$\int_a^b f(x)\,dx = \underline{I}(f) = \bar{I}(f).$$
1.18 The area of an ordinate set expressed as an integral
The concept of area was introduced axiomatically in Section 1.6 as a set function having
certain properties. From these properties we proved that the area of the ordinate set of a
nonnegative step function is equal to the integral of the function. Now we show that the
same is true for any integrable nonnegative function. We recall that the ordinate set of a
nonnegative function f over an interval [a, b] is the set of all points (x, y) satisfying the
inequalities $0 \le y \le f(x)$, $a \le x \le b$.
THEOREM 1.10. Let f be a nonnegative function, integrable on an interval [a, b], and let
Q denote the ordinate set of f over [a, b]. Then Q is measurable and its area is equal to the
integral $\int_a^b f(x)\,dx$.
Proof. Let S and T be two step regions satisfying $S \subseteq Q \subseteq T$. Then there are two step
functions s and t satisfying $s \le f \le t$ on [a, b], such that
$$a(S) = \int_a^b s(x)\,dx \quad \text{and} \quad a(T) = \int_a^b t(x)\,dx.$$
Since f is integrable on [a, b], the number $I = \int_a^b f(x)\,dx$ is the only number satisfying the
inequalities
$$\int_a^b s(x)\,dx \le I \le \int_a^b t(x)\,dx$$
for all step functions s and t with $s \le f \le t$. Therefore this is also the only number satisfying
$a(S) \le I \le a(T)$ for all step regions S and T with $S \subseteq Q \subseteq T$. By the exhaustion property,
this proves that Q is measurable and that a(Q) = I.
Let Q denote the ordinate set of Theorem 1.10, and let Q' denote the set that remains if
we remove from Q those points on the graph of f. That is, let
$$Q' = \{(x, y) \mid a \le x \le b, \; 0 \le y < f(x)\}.$$
The argument used to prove Theorem 1.10 also shows that Q' is measurable and that
a(Q') = a(Q). Therefore, by the difference property of area, the set Q - Q' is measurable
and
$$a(Q - Q') = a(Q) - a(Q') = 0.$$
In other words, we have proved the following theorem.
THEOREM 1.11. Let f be a nonnegative function, integrable on an interval [a, b]. Then
the graph of f, that is, the set
$$\{(x, y) \mid a \le x \le b, \; y = f(x)\},$$
is measurable and has area equal to 0.
1.19 Informal remarks on the theory and technique of integration
Two fundamental questions arise at this stage: (1) Which bounded functions are integrable?
(2) Given that a function f is integrable, how do we compute the integral of f?
The first question comes under the heading “Theory of Integration” and the second under
the heading “Technique of Integration.” A complete answer to question (1) lies beyond the
scope of an introductory course and will not be given in this book. Instead, we shall give
partial answers which require only elementary ideas.
First we introduce an important class of functions known as monotonic functions. In
the following section we define these functions and give a number of examples. Then we
prove that all bounded monotonic functions are integrable. Fortunately, most of the
functions that occur in practice are monotonic or sums of monotonic functions, so the
results of this miniature theory of integration are quite comprehensive.
The discussion of “Technique of Integration” begins in Section 1.23, where we calculate
the integral $\int_0^b x^p\,dx$, when p is a positive integer. Then we develop general properties of the
integral, such as linearity and additivity, and show how these properties help us to extend
our knowledge of integrals of specific functions.
1.20 Monotonic and piecewise monotonic functions. Definitions and examples
A function f is said to be increasing on a set S if $f(x) \le f(y)$ for every pair of points x
and y in S with x < y. If the strict inequality f(x) < f(y) holds for all x < y in S, the
function is said to be strictly increasing on S. Similarly, f is called decreasing on S if
$f(x) \ge f(y)$ for all x < y in S. If f(x) > f(y) for all x < y in S, then f is called strictly
decreasing on S. A function is called monotonic on S if it is increasing on S or if it is decreasing on S. The term strictly monotonic means that f is strictly increasing on S or strictly
decreasing on S. Ordinarily, the set S under consideration is either an open interval or a
closed interval. Examples are shown in Figure 1.33.
FIGURE 1.33 Monotonic functions (panels: increasing, strictly increasing, strictly decreasing).
FIGURE 1.34 A piecewise monotonic function.
A function f is said to be piecewise monotonic on an interval if its graph consists of a
finite number of monotonic pieces. That is to say, f is piecewise monotonic on [a, b] if
there is a partition P of [a, b] such that f is monotonic on each of the open subintervals of
P. In particular, step functions are piecewise monotonic, as are all the examples shown in
Figures 1.33 and 1.34.
EXAMPLE 1. The power functions. If p is a positive integer, we have the inequality
$$x^p < y^p \quad \text{if} \quad 0 \le x < y,$$
which is easily proved by mathematical induction. This shows that the power function f,
defined for all real x by the equation $f(x) = x^p$, is strictly increasing on the nonnegative
real axis. It is also strictly monotonic on the negative real axis (it is decreasing if p is even
and increasing if p is odd). Therefore, f is piecewise monotonic on every finite interval.
EXAMPLE 2. The square-root function. Let $f(x) = \sqrt{x}$ for $x \ge 0$. This function is strictly
increasing on the nonnegative real axis. In fact, if $0 \le x < y$, we have
$$\sqrt{y} - \sqrt{x} = \frac{y - x}{\sqrt{y} + \sqrt{x}};$$
hence, $\sqrt{y} - \sqrt{x} > 0$.
EXAMPLE 3. The graph of the function g defined by the equation
$$g(x) = \sqrt{r^2 - x^2} \quad \text{if} \quad -r \le x \le r$$
is a semicircle of radius r. This function is strictly increasing on the interval $-r \le x \le 0$
and strictly decreasing on the interval $0 \le x \le r$. Hence, g is piecewise monotonic on
[-r, r].
1.21 Integrability of bounded monotonic functions
The importance of monotonic functions in integration theory is due to the following
theorem.
THEOREM 1.12. If f is monotonic on a closed interval [a, b], then f is integrable on [a, b].
Proof. We shall prove the theorem for increasing functions. The proof for decreasing
functions is analogous. Assume f is increasing and let $\underline{I}(f)$ and $\bar{I}(f)$ denote its lower and
upper integrals, respectively. We shall prove that $\underline{I}(f) = \bar{I}(f)$.
Let n be a positive integer and construct two special approximating step functions $s_n$ and
$t_n$ as follows: Let $P = \{x_0, x_1, \ldots, x_n\}$ be a partition of [a, b] into n equal subintervals, that
is, subintervals $[x_{k-1}, x_k]$ with $x_k - x_{k-1} = (b - a)/n$ for each k. Now define $s_n$ and $t_n$ by
the formulas
$$s_n(x) = f(x_{k-1}), \qquad t_n(x) = f(x_k) \qquad \text{if } x_{k-1} < x < x_k .$$
At the subdivision points, define $s_n$ and $t_n$ so as to preserve the relations $s_n(x) \le f(x) \le t_n(x)$
throughout [a, b]. An example is shown in Figure 1.35(a). For this choice of step
functions, we have
$$\int_a^b t_n - \int_a^b s_n = \sum_{k=1}^{n} f(x_k)(x_k - x_{k-1}) - \sum_{k=1}^{n} f(x_{k-1})(x_k - x_{k-1}) = \frac{b-a}{n}\sum_{k=1}^{n}\bigl[f(x_k) - f(x_{k-1})\bigr] = \frac{(b-a)[f(b) - f(a)]}{n},$$
where the last equation is a consequence of the telescoping property of finite sums. This last
relation has a simple geometric interpretation. The difference $\int_a^b t_n - \int_a^b s_n$ is equal to the
sum of the areas of the shaded rectangles in Figure 1.35(a). By sliding these rectangles to
the right so that they rest on a common base as in Figure 1.35(b), we see that they fill out a
rectangle of base (b - a)/n and altitude f(b) - f(a); the sum of the areas is therefore
C/n, where C = (b - a)[f(b) - f(a)].
FIGURE 1.35 Proof of integrability of an increasing function.
Now we rewrite the foregoing relation in the form
(1.8) $$\int_a^b t_n - \int_a^b s_n = \frac{C}{n} .$$
The lower and upper integrals of f satisfy the inequalities
$$\int_a^b s_n \le \underline{I}(f) \le \int_a^b t_n \quad \text{and} \quad \int_a^b s_n \le \bar{I}(f) \le \int_a^b t_n .$$
Multiplying the first set of inequalities by (-1) and adding the result to the second set, we
obtain
$$\bar{I}(f) - \underline{I}(f) \le \int_a^b t_n - \int_a^b s_n .$$
Using (1.8) and the relation $\underline{I}(f) \le \bar{I}(f)$, we obtain
$$0 \le \bar{I}(f) - \underline{I}(f) \le \frac{C}{n}$$
for every integer $n \ge 1$. Therefore, by Theorem I.31, we must have $\underline{I}(f) = \bar{I}(f)$. This
proves that f is integrable on [a, b].
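The key relation in this proof, that the integrals of the two step functions differ by exactly C/n, can be checked by direct computation. The following Python sketch is only an illustration under stated assumptions: the helper name `step_sums` and the test function `cube` are introduced here and do not come from the text. Exact rational arithmetic is used so that the equality holds without rounding.

```python
from fractions import Fraction

def step_sums(f, a, b, n):
    """Integrals of the step functions s_n and t_n for an increasing f on [a, b]."""
    a, b = Fraction(a), Fraction(b)
    h = (b - a) / n                          # common length of the n subintervals
    xs = [a + k * h for k in range(n + 1)]   # x_0, x_1, ..., x_n
    lower = h * sum(f(x) for x in xs[:-1])   # s_n takes the value f(x_{k-1})
    upper = h * sum(f(x) for x in xs[1:])    # t_n takes the value f(x_k)
    return lower, upper

def cube(x):
    # An increasing function, chosen only as an example.
    return x ** 3

a, b = 0, 2
C = (Fraction(b) - Fraction(a)) * (cube(Fraction(b)) - cube(Fraction(a)))
for n in (1, 10, 100):
    lo, hi = step_sums(cube, a, b, n)
    assert hi - lo == C / n                  # the relation (1.8)
```

As n grows, the two sums squeeze together at the rate C/n, which is the quantitative content of the proof.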
1.22 Calculation of the integral of a bounded monotonic function
The proof of Theorem 1.12 not only shows that the integral of a bounded increasing
function exists, but it also suggests a method for computing the value of the integral. This
is described by the following theorem.
THEOREM 1.13. Assume f is increasing on a closed interval [a, b]. Let $x_k = a + k(b - a)/n$
for k = 0, 1, ..., n. If I is any number which satisfies the inequalities
(1.9) $$\frac{b-a}{n}\sum_{k=0}^{n-1} f(x_k) \le I \le \frac{b-a}{n}\sum_{k=1}^{n} f(x_k)$$
for every integer $n \ge 1$, then $I = \int_a^b f(x)\,dx$.
Proof. Let $s_n$ and $t_n$ be the special approximating step functions obtained by subdivision
of the interval [a, b] into n equal parts, as described in the proof of Theorem 1.12. Then,
inequalities (1.9) state that
$$\int_a^b s_n \le I \le \int_a^b t_n$$
for every $n \ge 1$. But the integral $\int_a^b f(x)\,dx$ satisfies the same inequalities as I. Using
Equation (1.8) we see that
$$0 \le \left| I - \int_a^b f(x)\,dx \right| \le \frac{C}{n}$$
for every integer $n \ge 1$. Therefore, by Theorem I.31, we have $I = \int_a^b f(x)\,dx$, as asserted.
An analogous argument gives a proof of the corresponding theorem for decreasing
functions.
THEOREM 1.14. Assume f is decreasing on [a, b]. Let $x_k = a + k(b - a)/n$ for k =
0, 1, ..., n. If I is any number which satisfies the inequalities
$$\frac{b-a}{n}\sum_{k=1}^{n} f(x_k) \le I \le \frac{b-a}{n}\sum_{k=0}^{n-1} f(x_k)$$
for every integer $n \ge 1$, then $I = \int_a^b f(x)\,dx$.
1.23 Calculation of the integral $\int_0^b x^p\,dx$ when p is a positive integer
To illustrate the use of Theorem 1.13 we shall calculate the integral $\int_0^b x^p\,dx$ where
b > 0 and p is any positive integer. The integral exists because the integrand is bounded
and increasing on [0, b].
THEOREM 1.15. If p is a positive integer and b > 0, we have
$$\int_0^b x^p\,dx = \frac{b^{p+1}}{p+1} .$$
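Proof. We begin with the inequalities
$$\sum_{k=1}^{n-1} k^p < \frac{n^{p+1}}{p+1} < \sum_{k=1}^{n} k^p ,$$
valid for every integer $n \ge 1$ and every integer $p \ge 1$. These inequalities may be easily
proved by mathematical induction. (A proof is outlined in Exercise 13 of Section I 4.10.)
Multiplication of these inequalities by $b^{p+1}/n^{p+1}$ gives us
$$\frac{b}{n}\sum_{k=1}^{n-1}\left(\frac{kb}{n}\right)^p < \frac{b^{p+1}}{p+1} < \frac{b}{n}\sum_{k=1}^{n}\left(\frac{kb}{n}\right)^p .$$
If we let $f(x) = x^p$ and $x_k = kb/n$, for k = 0, 1, 2, ..., n, these inequalities become
$$\frac{b}{n}\sum_{k=0}^{n-1} f(x_k) < \frac{b^{p+1}}{p+1} < \frac{b}{n}\sum_{k=1}^{n} f(x_k)$$
(the term with k = 0 contributes nothing to the sum on the left, since $f(x_0) = 0$).
Therefore, the inequalities (1.9) of Theorem 1.13 are satisfied with $f(x) = x^p$, a = 0, and
$I = b^{p+1}/(p + 1)$. It follows that $\int_0^b x^p\,dx = b^{p+1}/(p + 1)$.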
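The inequalities between sums of p-th powers that open this proof are stated without proof here, so a quick computational spot-check may be reassuring. The sketch below is an illustration only; the function name `power_sum_bounds` is an assumption introduced for this example. It tests a finite range of p and n and is of course no substitute for the induction argument.

```python
from fractions import Fraction

def power_sum_bounds(p, n):
    """Return the three quantities compared in the proof of Theorem 1.15."""
    lower = sum(k ** p for k in range(1, n))        # 1^p + ... + (n-1)^p
    middle = Fraction(n ** (p + 1), p + 1)          # n^(p+1) / (p+1), kept exact
    upper = sum(k ** p for k in range(1, n + 1))    # 1^p + ... + n^p
    return lower, middle, upper

# Check the strict inequalities for a finite range of exponents and sizes.
for p in range(1, 6):
    for n in range(1, 50):
        lower, middle, upper = power_sum_bounds(p, n)
        assert lower < middle < upper
```

For n = 1 the left-hand sum is empty, so the check reduces to 0 < 1/(p + 1) < 1.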
1.24 The basic properties of the integral
From the definition of the integral, it is possible to deduce the following properties.
Proofs are given in Section 1.27.
THEOREM 1.16. LINEARITY WITH RESPECT TO THE INTEGRAND. If both f and g are integrable
on [a, b], so is $c_1 f + c_2 g$ for every pair of constants $c_1$ and $c_2$. Furthermore, we have
$$\int_a^b [c_1 f(x) + c_2 g(x)]\,dx = c_1\int_a^b f(x)\,dx + c_2\int_a^b g(x)\,dx .$$
Note: By use of mathematical induction, the linearity property can be generalized as
follows: If $f_1, \ldots, f_n$ are integrable on [a, b], then so is $c_1 f_1 + \cdots + c_n f_n$ for all real
$c_1, \ldots, c_n$, and
$$\int_a^b \sum_{k=1}^{n} c_k f_k(x)\,dx = \sum_{k=1}^{n} c_k \int_a^b f_k(x)\,dx .$$
THEOREM 1.17. ADDITIVITY WITH RESPECT TO THE INTERVAL OF INTEGRATION. If two
of the following three integrals exist, the third also exists, and we have
$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx .$$
Note: In particular, if f is monotonic on [a, b] and also on [b, c], then both integrals
$\int_a^b f$ and $\int_b^c f$ exist, so $\int_a^c f$ also exists and is equal to the sum of the other two integrals.
THEOREM 1.18. INVARIANCE UNDER TRANSLATION. If f is integrable on [a, b], then for
every real c we have
$$\int_a^b f(x)\,dx = \int_{a+c}^{b+c} f(x - c)\,dx .$$
THEOREM 1.19. EXPANSION OR CONTRACTION OF THE INTERVAL OF INTEGRATION. If f is
integrable on [a, b], then for every real $k \ne 0$ we have
$$\int_a^b f(x)\,dx = \frac{1}{k}\int_{ka}^{kb} f\!\left(\frac{x}{k}\right) dx .$$
Note: In both Theorems 1.18 and 1.19, the existence of one of the integrals implies the
existence of the other. When k = -1, Theorem 1.19 is called the reflection property.
THEOREM 1.20. COMPARISON THEOREM. If both f and g are integrable on [a, b] and if
$g(x) \le f(x)$ for every x in [a, b], then we have
$$\int_a^b g(x)\,dx \le \int_a^b f(x)\,dx .$$
An important special case of Theorem 1.20 occurs when g(x) = 0 for every x. In this
case, the theorem states that if $f(x) \ge 0$ everywhere on [a, b], then $\int_a^b f(x)\,dx \ge 0$. In
other words, a nonnegative function has a nonnegative integral. It can also be shown
that if we have the strict inequality g(x) < f(x) for all x in [a, b], then the same strict
inequality holds for the integrals, but the proof is not easy to give at this stage.
In Chapter 5 we shall discuss various methods for calculating the value of an integral
without the necessity of using the definition in each case. These methods, however, are
applicable to only a relatively small number of functions, and for most integrable functions
the actual numerical value of the integral can only be estimated. This is usually done by
approximating the integrand above and below by step functions or by other simple functions
whose integrals can be evaluated exactly. Then the comparison theorem is used to obtain
corresponding approximations for the integral of the function in question. This idea will
be explored more fully in Chapter 7.
1.25 Integration of polynomials
In Section 1.23 we established the integration formula
(1.10) $$\int_0^b x^p\,dx = \frac{b^{p+1}}{p+1}$$
for b > 0 and p any positive integer. The formula is also valid if b = 0, since both members
are zero. We can use Theorem 1.19 to show that (1.10) also holds for negative b. We simply
take k = -1 in Theorem 1.19 to obtain
$$\int_0^{-b} x^p\,dx = -\int_0^{b} (-x)^p\,dx = (-1)^{p+1}\int_0^{b} x^p\,dx = \frac{(-b)^{p+1}}{p+1} ,$$
which shows that (1.10) holds for negative b. The additive property
$\int_a^b x^p\,dx = \int_0^b x^p\,dx - \int_0^a x^p\,dx$ now leads to the more general formula
$$\int_a^b x^p\,dx = \frac{b^{p+1} - a^{p+1}}{p+1} ,$$
valid for all real a and b, and any integer $p \ge 0$.
Sometimes the special symbol
$$P(x)\Big|_a^b$$
is used to designate the difference P(b) - P(a). Thus the foregoing formula may also be
written as follows:
$$\int_a^b x^p\,dx = \frac{x^{p+1}}{p+1}\bigg|_a^b = \frac{b^{p+1} - a^{p+1}}{p+1} .$$
This formula, along with the linearity property, enables us to integrate every polynomial.
For example, to compute the integral $\int_1^3 (x^2 - 3x + 5)\,dx$, we find the integral of each term
and then add the results. Thus, we have
$$\int_1^3 (x^2 - 3x + 5)\,dx = \int_1^3 x^2\,dx - 3\int_1^3 x\,dx + 5\int_1^3 dx = \frac{x^3}{3}\bigg|_1^3 - 3\,\frac{x^2}{2}\bigg|_1^3 + 5x\Big|_1^3$$
$$= \frac{3^3 - 1^3}{3} - 3\,\frac{3^2 - 1^2}{2} + 5\,\frac{3^1 - 1^1}{1} = \frac{26}{3} - 12 + 10 = \frac{20}{3} .$$
More generally, to compute the integral of any polynomial we integrate term by term:
$$\int_a^b \sum_{k=0}^{n} c_k x^k\,dx = \sum_{k=0}^{n} c_k \int_a^b x^k\,dx = \sum_{k=0}^{n} c_k\,\frac{b^{k+1} - a^{k+1}}{k+1} .$$
We can also integrate more complicated functions formed by piecing together various
polynomials. For example, consider the integral $\int_0^1 |x(2x - 1)|\,dx$. Because of the absolute-value
signs, the integrand is not a polynomial. However, by considering the sign of
x(2x - 1), we can split the interval [0, 1] into two subintervals, in each of which the integrand
is a polynomial. As x varies from 0 to 1, the product x(2x - 1) changes sign at the
point x = 1/2; it is negative if 0 < x < 1/2 and positive if 1/2 < x < 1. Therefore, we use the
additive property to write
$$\int_0^1 |x(2x - 1)|\,dx = -\int_0^{1/2} x(2x - 1)\,dx + \int_{1/2}^{1} x(2x - 1)\,dx$$
$$= \int_0^{1/2} (x - 2x^2)\,dx + \int_{1/2}^{1} (2x^2 - x)\,dx = \left(\tfrac{1}{8} - \tfrac{1}{12}\right) + \left(\tfrac{7}{12} - \tfrac{3}{8}\right) = \tfrac{1}{4} .$$
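The term-by-term rule of this section is easy to mechanize. The following Python sketch is an illustration only: the function name `integrate_polynomial` and the coefficient-list convention (`coeffs[k]` is the coefficient of x^k) are assumptions made here, not notation from the text. It uses exact rational arithmetic to recompute the two worked integrals of this section, whose values are 20/3 and 1/4.

```python
from fractions import Fraction

def integrate_polynomial(coeffs, a, b):
    """Integrate sum(coeffs[k] * x**k) from a to b, one term at a time."""
    a, b = Fraction(a), Fraction(b)
    return sum(
        Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
        for k, c in enumerate(coeffs)
    )

# Integral of x^2 - 3x + 5 over [1, 3].
first = integrate_polynomial([5, -3, 1], 1, 3)

# Integral of |x(2x - 1)| over [0, 1]: split at x = 1/2, where the sign changes.
half = Fraction(1, 2)
product = [0, -1, 2]  # x(2x - 1) = 2x^2 - x
second = (-integrate_polynomial(product, 0, half)
          + integrate_polynomial(product, half, 1))

print(first, second)  # prints: 20/3 1/4
```

The split point for the absolute value has to be supplied by hand, exactly as in the text; the sketch does not locate sign changes on its own.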
1.26 Exercises
Compute each of the following integrals.
1. s 3x2dx *
0
2. I’ x2 dx.
11. s f”(8t3+6t2-2t+5)dt.
3. s 2 4x3 dx.
13. i ~,(x + 1)2dx.
4 . 2 s 4x3dx.
14. I ,‘(x + l)2dx.
5. s ’ 5t4
0
15. s 2 (x - 1)(3x - 1) dx.
12. s 4, (u - l)(u - 2) du.
-3
0
-2
dt.
0
6 . ’ s 5t4dt.
16. I ; I(x - 1)(3x - l)] dx.
7. s ; (5x4 - 4x3) dx.
17. s 3 (2x - 5)3 dx.
8. s Il (5x4 - 4x3) dx.
18. s3 (x2 - 3)3 dx.
2, (t2 + 1) dr.
19. I 0 x2(x - 5)4 dx.
-1
9.
s
0
-3
2 (3x2 - 4x -t 2) dx.
i
21. Find all values of c for which
10.
(a) $\int_0^c x(1 - x)\,dx = 0$,
20.
1; (x + 4)‘O dx.
i
[Hint: Theorem 1 .18.]
(b) $\int_0^c |x(1 - x)|\,dx = 0$.
22. Compute each of the following integrals. Draw the graph off in each case.
(4 jff0 dx
(b) j; f@> dx
i f O<x<l,
where f(x) = 12- x
if 1 5 x < 2.
1
if 0 < x 5 c,
ix
where f(x) =
1 -x
C
i f c<x<l;
l - c
c is a fixed real number, 0 < c < 1.
23. Find a quadratic polynomial P for which P(0) = P(1) = 0 and $\int_0^1 P(x)\,dx = 1$.
24. Find a cubic polynomial P for which P(0) = P(-2) = 0, P(1) = 15, and $3\int_{-2}^{0} P(x)\,dx = 4$.
Optional exercises
25. Let f be a function whose domain contains -x whenever it contains x. We say that f is an
even function if f(-x) = f(x) and an odd function if f(-x) = -f(x) for all x in the domain
of f. If f is integrable on [0, b], prove that
(a) $\int_{-b}^{b} f(x)\,dx = 2\int_0^b f(x)\,dx$ if f is even;
(b) $\int_{-b}^{b} f(x)\,dx = 0$ if f is odd.
26. Use Theorems 1.18 and 1.19 to derive the formula
$$\int_a^b f(x)\,dx = (b - a)\int_0^1 f[a + (b - a)x]\,dx .$$
27. Theorems 1.18 and 1.19 suggest a common generalization for the integral $\int_a^b f(Ax + B)\,dx$.
Guess the formula suggested and prove it with the help of Theorems 1.18 and 1.19. Discuss
also the case A = 0.
28. Use Theorems 1.18 and 1.19 to derive the formula
$$\int_a^b f(c - x)\,dx = \int_{c-b}^{c-a} f(x)\,dx .$$
1.27 Proofs of the basic properties of the integral
This section contains proofs of the basic properties of the integral listed in Theorems
1.16 through 1.20 in Section 1.24. We make repeated use of the fact that every function f
which is bounded on an interval [a, b] has a lower integral $\underline{I}(f)$ and an upper integral $\bar{I}(f)$
given by
$$\underline{I}(f) = \sup\left\{\int_a^b s \mid s \le f\right\}, \qquad \bar{I}(f) = \inf\left\{\int_a^b t \mid f \le t\right\},$$
where s and t denote arbitrary step functions below and above f, respectively. We know,
by Theorem 1.9, that f is integrable if and only if $\underline{I}(f) = \bar{I}(f)$, in which case the value of the
integral of f is the common value of the upper and lower integrals.
Proof of the Linearity Property (Theorem 1.16). We decompose the linearity property into
two parts:
(A) $\int_a^b (f + g) = \int_a^b f + \int_a^b g$,
(B) $\int_a^b cf = c\int_a^b f$.
To prove (A), let $I(f) = \int_a^b f$ and let $I(g) = \int_a^b g$. We shall prove that
$\underline{I}(f + g) = \bar{I}(f + g) = I(f) + I(g)$.
Let $s_1$ and $s_2$ denote arbitrary step functions below f and g, respectively. Since f and g
are integrable, we have
$$I(f) = \sup\left\{\int_a^b s_1 \mid s_1 \le f\right\}, \qquad I(g) = \sup\left\{\int_a^b s_2 \mid s_2 \le g\right\}.$$
By the additive property of the supremum (Theorem I.33), we also have
(1.11) $$I(f) + I(g) = \sup\left\{\int_a^b s_1 + \int_a^b s_2 \mid s_1 \le f,\ s_2 \le g\right\}.$$
But if $s_1 \le f$ and $s_2 \le g$, then the sum $s = s_1 + s_2$ is a step function below f + g, and we
have
$$\int_a^b s_1 + \int_a^b s_2 = \int_a^b s \le \underline{I}(f + g) .$$
Therefore, the number $\underline{I}(f + g)$ is an upper bound for the set appearing on the right of
(1.11). This upper bound cannot be less than the least upper bound of the set, so we have
(1.12) $$I(f) + I(g) \le \underline{I}(f + g) .$$
Similarly, if we use the relations
$$I(f) = \inf\left\{\int_a^b t_1 \mid f \le t_1\right\}, \qquad I(g) = \inf\left\{\int_a^b t_2 \mid g \le t_2\right\},$$
where $t_1$ and $t_2$ denote arbitrary step functions above f and g, respectively, we obtain the
inequality
(1.13) $$\bar{I}(f + g) \le I(f) + I(g) .$$
Inequalities (1.12) and (1.13) together show that $\underline{I}(f + g) = \bar{I}(f + g) = I(f) + I(g)$. Therefore
f + g is integrable and relation (A) holds.
Relation (B) is trivial if c = 0. If c > 0, we note that every step function $s_1$ below cf is of
the form $s_1 = cs$, where s is a step function below f. Similarly, every step function $t_1$ above
cf is of the form $t_1 = ct$, where t is a step function above f. Therefore we have
$$\underline{I}(cf) = \sup\left\{\int_a^b s_1 \mid s_1 \le cf\right\} = \sup\left\{c\int_a^b s \mid s \le f\right\} = cI(f)$$
and
$$\bar{I}(cf) = \inf\left\{\int_a^b t_1 \mid cf \le t_1\right\} = \inf\left\{c\int_a^b t \mid f \le t\right\} = cI(f) .$$
Therefore $\underline{I}(cf) = \bar{I}(cf) = cI(f)$. Here we have used the following properties of the
supremum and infimum:
(1.14) $$\sup\{cx \mid x \in A\} = c\,\sup\{x \mid x \in A\}, \qquad \inf\{cx \mid x \in A\} = c\,\inf\{x \mid x \in A\},$$
which hold if c > 0. This proves (B) if c > 0.
If c < 0, the proof of (B) is basically the same, except that every step function $s_1$ below cf
is of the form $s_1 = ct$, where t is a step function above f, and every step function $t_1$ above
cf is of the form $t_1 = cs$, where s is a step function below f. Also, instead of (1.14) we use
the relations
$$\sup\{cx \mid x \in A\} = c\,\inf\{x \mid x \in A\}, \qquad \inf\{cx \mid x \in A\} = c\,\sup\{x \mid x \in A\},$$
which hold if c < 0. We now have
$$\underline{I}(cf) = \sup\left\{\int_a^b s_1 \mid s_1 \le cf\right\} = \sup\left\{c\int_a^b t \mid f \le t\right\} = c\,\inf\left\{\int_a^b t \mid f \le t\right\} = cI(f) .$$
Similarly, we find $\bar{I}(cf) = cI(f)$. Therefore (B) holds for all real c.
Proof of Additivity with Respect to the Interval of Integration (Theorem 1.17). Suppose
that a < b < c, and assume that the two integrals $\int_a^b f$ and $\int_b^c f$ exist. Let $\underline{I}(f)$ and $\bar{I}(f)$ denote
the lower and upper integrals of f over the interval [a, c]. We shall prove that
(1.15) $$\underline{I}(f) = \bar{I}(f) = \int_a^b f + \int_b^c f .$$
If s is any step function below f on [a, c], we have
$$\int_a^c s = \int_a^b s + \int_b^c s .$$
Conversely, if $s_1$ and $s_2$ are step functions below f on [a, b] and on [b, c], respectively, then
the function s which is equal to $s_1$ on [a, b) and equal to $s_2$ on [b, c] is a step function below
f on [a, c] for which we have
$$\int_a^c s = \int_a^b s_1 + \int_b^c s_2 .$$
Therefore, by the additive property of the supremum (Theorem I.33) we have
$$\underline{I}(f) = \sup\left\{\int_a^c s \mid s \le f\right\} = \sup\left\{\int_a^b s_1 \mid s_1 \le f\right\} + \sup\left\{\int_b^c s_2 \mid s_2 \le f\right\} = \int_a^b f + \int_b^c f .$$
Similarly, we find
$$\bar{I}(f) = \int_a^b f + \int_b^c f ,$$
which proves (1.15) when a < b < c. The proof is similar for any other arrangement of
the points a, b, c.
Proof of the Translation Property (Theorem 1.18). Let g be the function defined on the
interval [a + c, b + c] by the equation g(x) = f(x - c). Let $\underline{I}(g)$ and $\bar{I}(g)$ denote the lower
and upper integrals of g on the interval [a + c, b + c]. We shall prove that
(1.16) $$\underline{I}(g) = \bar{I}(g) = \int_a^b f(x)\,dx .$$
Let s be any step function below g on the interval [a + c, b + c]. Then the function $s_1$
defined on [a, b] by the equation $s_1(x) = s(x + c)$ is a step function below f on [a, b].
Moreover, every step function $s_1$ below f on [a, b] has this form for some s below g. Also,
by the translation property for integrals of step functions, we have
$$\int_{a+c}^{b+c} s(x)\,dx = \int_a^b s(x + c)\,dx = \int_a^b s_1(x)\,dx .$$
Therefore we have
$$\underline{I}(g) = \sup\left\{\int_{a+c}^{b+c} s \mid s \le g\right\} = \sup\left\{\int_a^b s_1 \mid s_1 \le f\right\} = \int_a^b f(x)\,dx .$$
Similarly, we find $\bar{I}(g) = \int_a^b f(x)\,dx$, which proves (1.16).
Proof of the Expansion Property (Theorem 1.19). Assume k > 0 and define g on the
interval [ka, kb] by the equation g(x) = f(x/k). Let $\underline{I}(g)$ and $\bar{I}(g)$ denote the lower and
upper integrals of g on [ka, kb]. We shall prove that
(1.17) $$\underline{I}(g) = \bar{I}(g) = k\int_a^b f(x)\,dx .$$
Let s be any step function below g on [ka, kb]. Then the function $s_1$ defined on [a, b] by
the equation $s_1(x) = s(kx)$ is a step function below f on [a, b]. Moreover, every step
function $s_1$ below f on [a, b] has this form. Also, by the expansion property for integrals
of step functions, we have
$$\int_{ka}^{kb} s(x)\,dx = k\int_a^b s(kx)\,dx = k\int_a^b s_1(x)\,dx .$$
Therefore we have
$$\underline{I}(g) = \sup\left\{\int_{ka}^{kb} s \mid s \le g\right\} = \sup\left\{k\int_a^b s_1 \mid s_1 \le f\right\} = k\int_a^b f(x)\,dx .$$
Similarly, we find $\bar{I}(g) = k\int_a^b f(x)\,dx$, which proves (1.17) if k > 0. The same type of proof
can be used if k < 0.
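Proof of the Comparison Theorem (Theorem 1.20). Assume $g \le f$ on the interval [a, b].
Let s be any step function below g, and let t be any step function above f. Then we have
$\int_a^b s \le \int_a^b t$, and hence Theorem I.34 gives us
$$\int_a^b g = \sup\left\{\int_a^b s \mid s \le g\right\} \le \inf\left\{\int_a^b t \mid f \le t\right\} = \int_a^b f .$$
This proves that $\int_a^b g \le \int_a^b f$, as required.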
2
SOME APPLICATIONS OF INTEGRATION
2.1 Introduction
In Section 1.18 we expressed the area of the ordinate set of a nonnegative function as an
integral. In this chapter we will show that areas of more general regions can also be
expressed as integrals. We will also discuss further applications of the integral to concepts
such as volume, work, and averages. Then, at the end of the chapter, we will study
properties of functions defined by integrals.
2.2 The area of a region between two graphs expressed as an integral
If two functions f and g are related by the inequality $f(x) \le g(x)$ for all x in an interval
[a, b], we write $f \le g$ on [a, b]. Figure 2.1 shows two examples. If $f \le g$ on [a, b], the set
S consisting of all points (x, y) satisfying the inequalities
$$f(x) \le y \le g(x), \qquad a \le x \le b,$$
is called the region between the graphs of f and g. The following theorem tells us how to
express the area of S as an integral.
FIGURE 2.1 The area of a region between two graphs expressed as an integral: $a(S) = \int_a^b [g(x) - f(x)]\,dx$.
THEOREM 2.1. Assume f and g are integrable and satisfy $f \le g$ on [a, b]. Then the region
S between their graphs is measurable and its area a(S) is given by the integral
(2.1) $$a(S) = \int_a^b [g(x) - f(x)]\,dx .$$
Proof. Assume first that f and g are nonnegative, as shown in Figure 2.1(a). Let F and
G denote the following sets:
$$F = \{(x, y) \mid a \le x \le b,\ 0 \le y < f(x)\}, \qquad G = \{(x, y) \mid a \le x \le b,\ 0 \le y \le g(x)\} .$$
That is, G is the ordinate set of g, and F is the ordinate set of f, minus the graph of f. The
region S between the graphs of f and g is the difference S = G - F. By Theorems 1.10 and
1.11, both F and G are measurable. Since $F \subseteq G$, the difference S = G - F is also
measurable, and we have
$$a(S) = a(G) - a(F) = \int_a^b g(x)\,dx - \int_a^b f(x)\,dx = \int_a^b [g(x) - f(x)]\,dx .$$
This proves (2.1) when f and g are nonnegative.
Now consider the general case where $f \le g$ on [a, b], but f and g are not necessarily
nonnegative. An example is shown in Figure 2.1(b). We can reduce this to the previous
case by sliding the region upward until it lies above the x-axis. That is, we choose a positive
number c large enough to ensure that $0 \le f(x) + c \le g(x) + c$ for all x in [a, b]. By what
we have already proved, the new region T between the graphs of f + c and g + c is
measurable, and its area is given by the integral
$$a(T) = \int_a^b [(g(x) + c) - (f(x) + c)]\,dx = \int_a^b [g(x) - f(x)]\,dx .$$
But T is congruent to S; so S is also measurable and we have
$$a(S) = a(T) = \int_a^b [g(x) - f(x)]\,dx .$$
This completes the proof.
2.3 Worked examples
EXAMPLE 1. Compute the area of the region S between the graphs of f and g over the
interval [0, 2] if f(x) = x(x - 2) and g(x) = x/2.
Solution. The two graphs are shown in Figure 2.2. The shaded portion represents S.
Since $f \le g$ over the interval [0, 2], we use Theorem 2.1 to write
$$a(S) = \int_0^2 [g(x) - f(x)]\,dx = \int_0^2 \left(\tfrac{5}{2}x - x^2\right)dx = \frac{5}{2}\cdot\frac{2^2}{2} - \frac{2^3}{3} = \frac{7}{3} .$$
FIGURE 2.2 Example 1.
FIGURE 2.3 Example 2.
EXAMPLE 2. Compute the area of the region S between the graphs of f and g over the
interval [-1, 2] if f(x) = x and g(x) = x^3/4.
Solution. The region S is shown in Figure 2.3. Here we do not have $f \le g$ throughout
the interval [-1, 2]. However, we do have $f \le g$ over the subinterval [-1, 0] and $g \le f$
over the subinterval [0, 2]. Applying Theorem 2.1 to each subinterval, we have
$$a(S) = \int_{-1}^{0} [g(x) - f(x)]\,dx + \int_0^2 [f(x) - g(x)]\,dx = \int_{-1}^{0}\left(\frac{x^3}{4} - x\right)dx + \int_0^2\left(x - \frac{x^3}{4}\right)dx = \frac{7}{16} + 1 = \frac{23}{16} .$$
In examples like this one, where the interval [a, b] can be broken up into a finite number
of subintervals such that either $f \le g$ or $g \le f$ in each subinterval, formula (2.1) of Theorem
2.1 becomes
$$a(S) = \int_a^b |g(x) - f(x)|\,dx .$$
EXAMPLE 3. Area of a circular disk. A circular disk of radius r is the set of all points
inside or on the boundary of a circle of radius r. Such a disk is congruent to the region
between the graphs of the two functions f and g defined on the interval [-r, r] by the
formulas
$$g(x) = \sqrt{r^2 - x^2} \quad \text{and} \quad f(x) = -\sqrt{r^2 - x^2} .$$
Each function is bounded and piecewise monotonic so each is integrable on [-r, r].
Theorem 2.1 tells us that the region between their graphs is measurable and that its area is
$\int_{-r}^{r} [g(x) - f(x)]\,dx$. Let A(r) denote the area of the disk. We will prove that
$$A(r) = r^2 A(1) .$$
That is, the area of a disk of radius r is $r^2$ times the area of a unit disk (a disk of radius 1).
Since g(x) - f(x) = 2g(x), Theorem 2.1 gives us
$$A(r) = \int_{-r}^{r} 2g(x)\,dx = 2\int_{-r}^{r} \sqrt{r^2 - x^2}\,dx .$$
In particular, when r = 1, we have the formula
$$A(1) = 2\int_{-1}^{1} \sqrt{1 - x^2}\,dx .$$
Now we change the scale on the x-axis, using Theorem 1.19 with k = 1/r, to obtain
$$A(r) = 2\int_{-r}^{r} g(x)\,dx = 2r\int_{-1}^{1} g(rx)\,dx = 2r\int_{-1}^{1} \sqrt{r^2 - (rx)^2}\,dx = 2r^2\int_{-1}^{1} \sqrt{1 - x^2}\,dx = r^2 A(1) .$$
This proves that $A(r) = r^2 A(1)$, as asserted.
DEFINITION. We define the number $\pi$ to be the area of a unit disk.
The formula just proved states that $A(r) = \pi r^2$.
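Since π is defined here as an area, the monotone bounds of Theorem 1.14 already give a way to estimate it numerically. The sketch below rests on two assumptions that go beyond this example: the symmetry step A(1) = 4 ∫ from 0 to 1 of sqrt(1 - x^2) dx (the integrand is even; compare Exercise 25 of Section 1.26), and the use of floating-point arithmetic, so the bounds are approximate. The function name `pi_bounds` is introduced only for this illustration.

```python
import math

def pi_bounds(n):
    """Lower and upper estimates of 4 * integral of sqrt(1 - x^2) over [0, 1]."""
    h = 1.0 / n

    def f(x):
        # max(...) guards against a tiny negative argument caused by rounding.
        return math.sqrt(max(0.0, 1.0 - x * x))

    # f is decreasing on [0, 1], so right endpoints give the lower sum
    # and left endpoints give the upper sum (Theorem 1.14).
    lower = h * sum(f(k * h) for k in range(1, n + 1))
    upper = h * sum(f(k * h) for k in range(0, n))
    return 4 * lower, 4 * upper

for n in (10, 100, 1000):
    lo, hi = pi_bounds(n)
    print(n, lo, hi)
```

The gap between the two estimates is 4/n, so this method converges slowly; it is meant to illustrate the definition, not to compute π efficiently.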
The foregoing example illustrates the behavior of area under expansion or contraction
of plane regions. Suppose S is a given set of points in the plane and consider a new set of
points obtained by multiplying the coordinates of each point of S by a constant factor
k > 0. We denote this set by kS and say that it is similar to S. The process which produces
kS from S is called a similarity transformation. Each point is moved along a straight line
which passes through the origin to k times its original distance from the origin. If k > 1,
the transformation is also called a stretching or an expansion (from the origin) and, if
0 < k < 1, it is called a shrinking or a contraction (toward the origin).
For example, if S is the region bounded by a unit circle with center at the origin, then
kS is a concentric circular region of radius k. In Example 3 we showed that for circular
regions, the area of kS is $k^2$ times the area of S. Now we prove that this property of area
holds for any ordinate set.
EXAMPLE 4. Behavior of the area of an ordinate set under a similarity transformation.
Let f be nonnegative and integrable on [a, b] and let S be its ordinate set. An example is
shown in Figure 2.4(a). If we apply a similarity transformation with a positive factor k,
then kS is the ordinate set of a new function, say g, over the interval [ka, kb]. [See Figure
2.4(b).] A point (x, y) is on the graph of g if and only if the point (x/k, y/k) is on the graph
of f. Hence y/k = f(x/k), so y = kf(x/k). In other words, the new function g is related to
f by the formula
$$g(x) = k f(x/k)$$
for each x in [ka, kb]. Therefore, the area of kS is given by
$$a(kS) = \int_{ka}^{kb} g(x)\,dx = k\int_{ka}^{kb} f(x/k)\,dx = k^2\int_a^b f(x)\,dx ,$$
where in the last step we used the expansion property for integrals (Theorem 1.19). Since
$\int_a^b f(x)\,dx = a(S)$, this proves that $a(kS) = k^2 a(S)$. In other words, the area of kS is $k^2$ times
that of S.
FIGURE 2.4 The area of kS is $k^2$ times that of S.
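The scaling law of Example 4 can be checked on a concrete ordinate set. The sketch below is an illustration under stated assumptions: it takes f(x) = x^2, for which g(x) = k f(x/k) = x^2 / k, and the helper names `integral_of_square` and `scaled_area` are introduced here only for this purpose. Exact rational arithmetic makes the comparison an equality.

```python
from fractions import Fraction

def integral_of_square(a, b):
    """Exact integral of x^2 over [a, b], namely (b^3 - a^3) / 3."""
    a, b = Fraction(a), Fraction(b)
    return (b ** 3 - a ** 3) / 3

def scaled_area(k, a, b):
    """Area of kS when S is the ordinate set of f(x) = x^2 over [a, b].

    Here g(x) = k * f(x / k) = x^2 / k on the interval [ka, kb].
    """
    k = Fraction(k)
    return integral_of_square(k * a, k * b) / k

a, b = 1, 2
for k in (Fraction(1, 2), 3, 10):
    # a(kS) should equal k^2 * a(S)
    assert scaled_area(k, a, b) == Fraction(k) ** 2 * integral_of_square(a, b)
```

One factor of k comes from stretching the interval and the other from stretching the heights, which is why the area scales with the square of k.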
EXAMPLE 5. Calculation of the integral $\int_0^a x^{1/2}\,dx$. The integral for area is a two-edged
sword. Although we ordinarily use the integral to calculate areas, sometimes we can use
our knowledge of area to calculate integrals. We illustrate by computing the value of the
integral $\int_0^a x^{1/2}\,dx$, where a > 0. (The integral exists since the integrand is increasing and
bounded on [0, a].)
Figure 2.5 shows the graph of the function f given by $f(x) = x^{1/2}$ over the interval [0, a].
Its ordinate set S has an area given by
$$a(S) = \int_0^a x^{1/2}\,dx .$$
Now we compute this area another way. We simply observe that in Figure 2.5 the region
S and the shaded region T together fill out a rectangle of base a and altitude $a^{1/2}$. Therefore,
$a(S) + a(T) = a^{3/2}$, so we have
$$a(S) = a^{3/2} - a(T) .$$
But T is the ordinate set of a function g defined over the interval $[0, a^{1/2}]$ on the y-axis by the
equation $g(y) = y^2$. Thus, we have
$$a(T) = \int_0^{a^{1/2}} g(y)\,dy = \int_0^{a^{1/2}} y^2\,dy = \tfrac{1}{3}a^{3/2} ,$$
so $a(S) = a^{3/2} - \tfrac{1}{3}a^{3/2} = \tfrac{2}{3}a^{3/2}$. This proves that
$$\int_0^a x^{1/2}\,dx = \tfrac{2}{3}a^{3/2} .$$
FIGURE 2.5 Calculation of the integral $\int_0^a x^{1/2}\,dx$.
More generally, if a > 0 and b > 0, we may use the additive property of the integral to
obtain the formula
$$\int_a^b x^{1/2}\,dx = \tfrac{2}{3}\left(b^{3/2} - a^{3/2}\right) .$$
The foregoing argument can also be used to compute the integral $\int_a^b x^{1/n}\,dx$, if n is a
positive integer. We state the result as a theorem.
THEOREM 2.2. For a > 0, b > 0 and n a positive integer, we have
(2.2) $$\int_a^b x^{1/n}\,dx = \frac{b^{1+1/n} - a^{1+1/n}}{1 + 1/n} .$$
The proof is so similar to that in Example 5 that we leave the details to the reader.
2.4 Exercises
In Exercises 1 through 14, compute the area of the region S between the graphs of f and g over
the interval [a, b] specified in each case. Make a sketch of the two graphs and indicate S by shading.
1. $f(x) = 4 - x^2$, $g(x) = 0$, $a = -2$, $b = 2$.
2. $f(x) = 4 - x^2$, $g(x) = 8 - 2x^2$, $a = -2$, $b = 2$.
3. $f(x) = x^3 + x^2$, $g(x) = x^3 + 1$, $a = -1$, $b = 1$.
4. $f(x) = x - x^2$, $g(x) = -x$, $a = 0$, $b = 2$.
5. $f(x) = x^{1/3}$, $g(x) = x^{1/2}$, $a = 0$, $b = 1$.
6. $f(x) = x^{1/3}$, $g(x) = x^{1/2}$, $a = 1$, $b = 2$.
7. $f(x) = x^{1/3}$, $g(x) = x^{1/2}$, $a = 0$, $b = 2$.
8. $f(x) = x^{1/2}$, $g(x) = x^2$, $a = 0$, $b = 2$.
9. $f(x) = x^2$, $g(x) = x + 1$, $a = -1$, $b = (1 + \sqrt{5})/2$.
10. $f(x) = x(x^2 - 1)$, $g(x) = x$, $a = -1$, $b = \sqrt{2}$.
11. $f(x) = |x|$, $g(x) = x^2 - 1$, $a = -1$, $b = 1$.
12. $f(x) = |x - 1|$, $g(x) = x^2 - 2x$, $a = 0$, $b = 2$.
13. $f(x) = 2|x|$, $g(x) = 1 - 3x^3$, $a = -\sqrt{3}/3$, $b = 1/3$.
14. $f(x) = |x| + |x - 1|$, $g(x) = 0$, $a = -1$, $b = 2$.
15. The graphs of $f(x) = x^2$ and $g(x) = cx^3$, where c > 0, intersect at the points (0, 0) and
$(1/c, 1/c^2)$. Find c so that the region which lies between these graphs and over the interval
[0, 1/c] has area 2/3.
16. Let $f(x) = x - x^2$, $g(x) = ax$. Determine a so that the region above the graph of g and below
the graph of f has area 9/2.
17. We have defined $\pi$ to be the area of a unit circular disk. In Example 3 of Section 2.3, we
proved that $\pi = 2\int_{-1}^{1} \sqrt{1 - x^2}\,dx$. Use properties of the integral to compute the following
in terms of $\pi$:
(a) j-:s$=dx;
(b) j-;2/mdx;
(c) sz2 (x - 3)dGdx.
18. Calculate the areas of regular dodecagons (twelve-sided polygons) inscribed and circumscribed
about a unit circular disk and thereby deduce the inequalities $3 < \pi < 12(2 - \sqrt{3})$.
19. Let C denote the unit circle, whose Cartesian equation is $x^2 + y^2 = 1$. Let E be the set of
points obtained by multiplying the x-coordinate of each point (x, y) on C by a constant factor
a > 0 and the y-coordinate by a constant factor b > 0. The set E is called an ellipse. (When
a = b, the ellipse is another circle.)
(a) Show that each point (x, y) on E satisfies the Cartesian equation $(x/a)^2 + (y/b)^2 = 1$.
(b) Use properties of the integral to prove that the region enclosed by this ellipse is measurable
and that its area is $\pi ab$.
20. Exercise 19 is a generalization of Example 3 of Section 2.3. State and prove a corresponding
generalization of Example 4 of Section 2.3.
21. Use an argument similar to that in Example 5 of Section 2.3 to prove Theorem 2.2.
2.5 The trigonometric functions
Before we introduce further applications of integration, we will digress briefly to discuss
the trigonometric functions. We assume that the reader has some knowledge of the
properties of the six trigonometric functions, sine, cosine, tangent, cotangent, secant, and
cosecant; and their inverses, arc sine, arc cosine, arc tangent, etc. These functions are
discussed in elementary trigonometry courses in connection with various problems involving
the sides and angles of triangles.
The trigonometric functions are important in calculus, not so much because of their
relation to the sides and angles of a triangle, but rather because of the properties they
possess as functions. The six trigonometric functions have in common an important
property known as periodicity.
A function f is said to be periodic with period $p \ne 0$ if its domain contains x + p whenever
it contains x and if f(x + p) = f(x) for every x in the domain of f. The sine and cosine
functions are periodic with period $2\pi$, where $\pi$ is the area of a unit circular disk. Many
problems in physics and engineering deal with periodic phenomena (such as vibrations,
planetary and wave motion) and the sine and cosine functions form the basis for the
mathematical analysis of such problems.
The sine and cosine functions can be introduced in many different ways. For example,
there are geometric definitions which relate the sine and cosine functions to angles, and
there are analytic definitions which introduce these functions without any reference whatever
to geometry. All these methods are equivalent, in the sense that they all lead to the same
functions.
Ordinarily, when we work with the sine and cosine we are not concerned so much with
their definitions as we are with the properties that can be deduced from the definitions.
Some of these properties, which are of importance in calculus, are listed below. As usual,
we denote the values of the sine and cosine functions at x by sin x, cos x, respectively.
FUNDAMENTAL PROPERTIES OF THE SINE AND COSINE.
1. Domain of definition. The sine and cosine functions are defined everywhere on the real
line.
2. Special values. We have $\cos 0 = \sin \tfrac{1}{2}\pi = 1$, $\cos \pi = -1$.
3. Cosine of a difference. For all x and y, we have
(2.3) $$\cos(y - x) = \cos y \cos x + \sin y \sin x .$$
4. Fundamental inequalities. For $0 < x < \tfrac{1}{2}\pi$, we have
(2.4) $$0 < \cos x < \frac{\sin x}{x} < \frac{1}{\cos x} .$$
From these four properties we can deduce all the properties of the sine and cosine that
are of importance in calculus. This suggests that we might introduce the trigonometric
functions axiomatically. That is, we could take properties 1 through 4 as axioms about the
sine and cosine and deduce all further properties as theorems. To make certain we are not
discussing an empty theory, it is necessary to show that there are functions satisfying the
above properties. We shall by-pass this problem for the moment. First we assume that
functions exist which satisfy these fundamental properties and show how further properties
can then be deduced. Then, in Section 2.7, we indicate a geometric method of defining the
sine and cosine so as to obtain functions with the desired properties. In Chapter 11 we also
outline an analytic method for defining the sine and cosine.
THEOREM 2.3. If two functions sin and cos satisfy properties 1 through 4, then they also
satisfy the following properties:
(a) Pythagorean identity. $\sin^2 x + \cos^2 x = 1$ for all x.
(b) Special values. $\sin 0 = \cos \tfrac{1}{2}\pi = \sin \pi = 0$.
(c) Even and odd properties. The cosine is an even function and the sine is an odd function.
That is, for all x we have
$$\cos(-x) = \cos x, \qquad \sin(-x) = -\sin x .$$
(d) Co-relations. For all x, we have
$$\sin(\tfrac{1}{2}\pi + x) = \cos x, \qquad \cos(\tfrac{1}{2}\pi + x) = -\sin x .$$
(e) Periodicity. For all x, we have $\sin(x + 2\pi) = \sin x$, $\cos(x + 2\pi) = \cos x$.
(f) Addition formulas. For all x and y, we have
$$\cos(x + y) = \cos x \cos y - \sin x \sin y ,$$
$$\sin(x + y) = \sin x \cos y + \cos x \sin y .$$
(g) Difference formulas. For all a and b, we have
$$\sin a - \sin b = 2 \sin\frac{a - b}{2} \cos\frac{a + b}{2} ,$$
$$\cos a - \cos b = -2 \sin\frac{a - b}{2} \sin\frac{a + b}{2} .$$
(h) Monotonicity. In the interval $[0, \tfrac{1}{2}\pi]$, the sine is strictly increasing and the cosine is
strictly decreasing.
Proof Part (a) follows at once if we take x = y in (2.3) and use the relation COS 0 = 1.
Property (b) follows from (a) by taking x = 0, x = fin, x = 7r and using the relation
sin &T = 1. The even property of the cosine also follows from (2.3) by taking y = 0. Next
we deduce the formula
(2.5)
cas (&r - x) = sin x ,
by taking y = $T in (2.3). From this and (2.3), we find that the sine is odd, since
sin(-x)=cos(S+x) =cos[il- ( f - x ) ]
= COS 7r COS - - x +sin7rsin
1
(277
- - x
(2?T
= - s i n x .
1
This proves (c). TO prove (d), we again use (2.5), first with x replaced by &T + x and then
with x replaced by -x. Repeated use of (d) then gives us the periodicity relations (e).
To prove the addition formula for the cosine, we simply replace $x$ by $-x$ in (2.3) and use
the even and odd properties. Then we use part (d) and the addition formula for the cosine
to obtain
$$\sin(x + y) = -\cos\left(x + y + \frac{\pi}{2}\right)
= -\cos x \cos\left(y + \frac{\pi}{2}\right) + \sin x \sin\left(y + \frac{\pi}{2}\right)
= \cos x \sin y + \sin x \cos y.$$
This proves (f). To deduce the difference formulas (g), we first replace $y$ by $-y$ in the
addition formula for $\sin(x + y)$ to obtain
$$\sin(x - y) = \sin x \cos y - \cos x \sin y.$$
Subtracting this from the formula for $\sin(x + y)$ and doing the same for the cosine function,
we get
$$\sin(x + y) - \sin(x - y) = 2 \sin y \cos x,$$
$$\cos(x + y) - \cos(x - y) = -2 \sin y \sin x.$$
Taking $x = (a + b)/2$, $y = (a - b)/2$, we find that these become the difference formulas
in (g).
Properties (a) through (g) were deduced from properties 1 through 3 alone. Property 4
is used to prove (h). The inequalities (2.4) show that $\cos x$ and $\sin x$ are positive if
$0 < x < \tfrac{1}{2}\pi$. Now, if $0 < b < a < \tfrac{1}{2}\pi$, the numbers $(a + b)/2$ and $(a - b)/2$ are in the
interval $(0, \tfrac{1}{2}\pi)$, and the difference formulas (g) show that $\sin a > \sin b$ and $\cos a < \cos b$.
This completes the proof of Theorem 2.3.
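The difference formulas (g) and the monotonicity statement (h) can be spot-checked numerically. The following is a minimal sketch, assuming Python's built-in `math.sin` and `math.cos` stand in for functions satisfying properties 1 through 4; the helper names are illustrative and not part of the text. A numerical check is not a proof, only a sanity test of the formulas as stated.

```python
import math


def difference_formula_gaps(a: float, b: float) -> tuple[float, float]:
    """Left side minus right side for the two difference formulas in part (g)."""
    half_diff = (a - b) / 2
    half_sum = (a + b) / 2
    sine_gap = (math.sin(a) - math.sin(b)) - 2 * math.sin(half_diff) * math.cos(half_sum)
    cosine_gap = (math.cos(a) - math.cos(b)) + 2 * math.sin(half_diff) * math.sin(half_sum)
    return sine_gap, cosine_gap


def monotone_on_first_quadrant(samples: int = 1000) -> bool:
    """Check part (h) on an evenly spaced grid in [0, pi/2]."""
    grid = [(math.pi / 2) * k / samples for k in range(samples + 1)]
    sine_increasing = all(math.sin(u) < math.sin(v) for u, v in zip(grid, grid[1:]))
    cosine_decreasing = all(math.cos(u) > math.cos(v) for u, v in zip(grid, grid[1:]))
    return sine_increasing and cosine_decreasing


if __name__ == "__main__":
    # Both gaps should be zero up to floating-point rounding.
    print(difference_formula_gaps(2.0, 0.5))
    print(monotone_on_first_quadrant())
```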
Further properties of the sine and cosine functions are discussed in the next set of
exercises (page 104). We mention, in particular, two formulas that are used frequently in
calculus. These are called the double-angle or duplication formulas. We have
$$\sin 2x = 2 \sin x \cos x,$$
$$\cos 2x = \cos^2 x - \sin^2 x = 1 - 2 \sin^2 x.$$
These are, of course, merely special cases of the addition formulas obtained by taking
$y = x$. The second formula for $\cos 2x$ follows from the first by use of the Pythagorean
identity. The Pythagorean identity also shows that $|\cos x| \le 1$ and $|\sin x| \le 1$ for all $x$.
2.6 Integration formulas for the sine and cosine
The monotonicity properties in part (h) of Theorem 2.3, along with the co-relations and
the periodicity properties, show that the sine and cosine functions are piecewise monotonic
on every interval. Therefore, by repeated use of Theorem 1.12, we see that the sine and
cosine are integrable on every finite interval. Now we shall calculate their integrals by
applying Theorem 1.14. This calculation makes use of a pair of inequalities which we state
as a separate theorem.
THEOREM 2.4. If $0 < a \le \tfrac{1}{2}\pi$ and $n \ge 1$, we have
$$\frac{a}{n} \sum_{k=1}^{n} \cos\frac{ka}{n} < \sin a < \frac{a}{n} \sum_{k=0}^{n-1} \cos\frac{ka}{n}. \tag{2.6}$$
Proof. The inequalities in (2.6) will be deduced from the trigonometric identity
$$2 \sin \tfrac{1}{2}x \sum_{k=1}^{n} \cos kx = \sin\left(n + \tfrac{1}{2}\right)x - \sin \tfrac{1}{2}x, \tag{2.7}$$
which is valid for $n \ge 1$ and all real $x$. To prove (2.7), we use one of the difference formulas
(g) of Theorem 2.3 to write
$$2 \sin \tfrac{1}{2}x \cos kx = \sin\left(k + \tfrac{1}{2}\right)x - \sin\left(k - \tfrac{1}{2}\right)x.$$
Taking $k = 1, 2, \ldots, n$ and adding these equations, we find that the sum on the right
telescopes and we obtain (2.7).
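If $\tfrac{1}{2}x$ is not an integer multiple of $\pi$ we can divide both members of (2.7) by $2 \sin \tfrac{1}{2}x$ to
obtain
$$\sum_{k=1}^{n} \cos kx = \frac{\sin\left(n + \tfrac{1}{2}\right)x - \sin \tfrac{1}{2}x}{2 \sin \tfrac{1}{2}x}.$$
Replacing $n$ by $n - 1$ and adding 1 to both members we also obtain
$$\sum_{k=0}^{n-1} \cos kx = \frac{\sin\left(n - \tfrac{1}{2}\right)x + \sin \tfrac{1}{2}x}{2 \sin \tfrac{1}{2}x}.$$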
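Both these formulas are valid if $x \ne 2m\pi$, where $m$ is an integer. Taking $x = a/n$, where
$0 < a \le \tfrac{1}{2}\pi$, we find that the pair of inequalities in (2.6) is equivalent to the pair
$$\frac{a}{n} \cdot \frac{\sin\left(n + \tfrac{1}{2}\right)\frac{a}{n} - \sin\frac{a}{2n}}{2 \sin\frac{a}{2n}}
< \sin a <
\frac{a}{n} \cdot \frac{\sin\left(n - \tfrac{1}{2}\right)\frac{a}{n} + \sin\frac{a}{2n}}{2 \sin\frac{a}{2n}}.$$
This pair, in turn, is equivalent to the pair
$$\sin\left(n + \tfrac{1}{2}\right)\frac{a}{n} - \sin\frac{a}{2n}
< \frac{\sin\frac{a}{2n}}{\frac{a}{2n}} \sin a <
\sin\left(n - \tfrac{1}{2}\right)\frac{a}{n} + \sin\frac{a}{2n}. \tag{2.8}$$
Therefore, proving (2.6) is equivalent to proving (2.8). We shall prove that we have
$$\sin (2n + 1)\theta - \sin\theta < \frac{\sin\theta}{\theta} \sin 2n\theta < \sin (2n - 1)\theta + \sin\theta \tag{2.9}$$
for $0 < 2n\theta \le \tfrac{1}{2}\pi$. When $\theta = a/(2n)$ this reduces to (2.8).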
To prove the leftmost inequality in (2.9), we use the addition formula for the sine to
write
$$\sin (2n + 1)\theta = \sin 2n\theta \cos\theta + \cos 2n\theta \sin\theta < \sin 2n\theta \, \frac{\sin\theta}{\theta} + \sin\theta, \tag{2.10}$$
where we have also used the inequalities
$$\cos\theta < \frac{\sin\theta}{\theta}, \qquad 0 \le \cos 2n\theta \le 1, \qquad \sin\theta > 0, \qquad \sin 2n\theta > 0,$$
all of which are valid since $0 < 2n\theta \le \tfrac{1}{2}\pi$. Inequality (2.10) is equivalent to the leftmost
inequality in (2.9).
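To prove the rightmost inequality in (2.9), we again use the addition formula for the sine
and write
$$\sin (2n - 1)\theta = \sin 2n\theta \cos\theta - \cos 2n\theta \sin\theta.$$
Adding $\sin\theta$ to both members, we obtain
$$\sin (2n - 1)\theta + \sin\theta = \sin 2n\theta \left(\cos\theta + \sin\theta \, \frac{1 - \cos 2n\theta}{\sin 2n\theta}\right). \tag{2.11}$$
But since we have
$$\frac{1 - \cos 2n\theta}{\sin 2n\theta} = \frac{2 \sin^2 n\theta}{2 \sin n\theta \cos n\theta} = \frac{\sin n\theta}{\cos n\theta},$$
the right member of (2.11) is equal to
$$\sin 2n\theta \left(\cos\theta + \sin\theta \, \frac{\sin n\theta}{\cos n\theta}\right)
= \sin 2n\theta \, \frac{\cos\theta \cos n\theta + \sin\theta \sin n\theta}{\cos n\theta}
= \sin 2n\theta \, \frac{\cos (n - 1)\theta}{\cos n\theta}.$$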
Therefore, to complete the proof of (2.9), we need only show that
$$\frac{\cos (n - 1)\theta}{\cos n\theta} > \frac{\sin\theta}{\theta}. \tag{2.12}$$
But we have
$$\cos n\theta = \cos (n - 1)\theta \cos\theta - \sin (n - 1)\theta \sin\theta
\le \cos (n - 1)\theta \cos\theta < \cos (n - 1)\theta \, \frac{\theta}{\sin\theta},$$
where we have again used the fundamental inequality $\cos\theta < \theta/(\sin\theta)$. Since $\cos n\theta > 0$,
this last relation implies (2.12), so the proof of Theorem 2.4 is complete.
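The two sums in (2.6) are step-function integrals of the cosine over $n$ equal subintervals of $[0, a]$, so they can be evaluated directly. The sketch below, assuming Python's `math` functions as stand-ins for sin and cos and using illustrative helper names, computes both members of (2.6) and the gap in identity (2.7). It only samples a few values and is not a substitute for the proof.

```python
import math


def cosine_sum_bounds(a: float, n: int) -> tuple[float, float]:
    """Return the left and right members of (2.6) for the given a and n."""
    step = a / n
    left = step * sum(math.cos(k * step) for k in range(1, n + 1))   # k = 1, ..., n
    right = step * sum(math.cos(k * step) for k in range(0, n))      # k = 0, ..., n - 1
    return left, right


def telescoping_gap(x: float, n: int) -> float:
    """Left side minus right side of identity (2.7)."""
    left = 2 * math.sin(x / 2) * sum(math.cos(k * x) for k in range(1, n + 1))
    right = math.sin((n + 0.5) * x) - math.sin(x / 2)
    return left - right


if __name__ == "__main__":
    for n in (1, 10, 1000):
        low, high = cosine_sum_bounds(1.0, n)
        # sin(1) should lie strictly between the two printed bounds.
        print(n, low, math.sin(1.0), high)
```

As $n$ grows the two bounds differ by $(a/n)(1 - \cos a)$, which is why they squeeze $\sin a$.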
THEOREM 2.5. If two functions sin and cos satisfy the fundamental properties 1 through 4,
then for every real $a$ we have
$$\int_0^a \cos x \, dx = \sin a, \tag{2.13}$$
$$\int_0^a \sin x \, dx = 1 - \cos a. \tag{2.14}$$
Proof. First we prove (2.13), and then we use (2.13) to deduce (2.14). Assume that
$0 < a \le \tfrac{1}{2}\pi$. Since the cosine is decreasing on $[0, a]$, we can apply Theorem 1.14 in
conjunction with the inequalities of Theorem 2.4 to obtain (2.13). The formula also holds
trivially for $a = 0$, since both members are zero. The general properties of the integral can
now be used to extend its validity to all real $a$.
For example, if $-\tfrac{1}{2}\pi \le a \le 0$, then $0 \le -a \le \tfrac{1}{2}\pi$, and the reflection property gives us
$$\int_0^a \cos x \, dx = -\int_0^{-a} \cos(-x) \, dx = -\int_0^{-a} \cos x \, dx = -\sin(-a) = \sin a.$$
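Thus (2.13) is valid in the interval $[-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi]$. Now suppose that $\tfrac{1}{2}\pi < a \le \tfrac{3}{2}\pi$. Then
$-\tfrac{1}{2}\pi < a - \pi \le \tfrac{1}{2}\pi$, so we have
$$\int_0^a \cos x \, dx = \int_0^{\pi/2} \cos x \, dx + \int_{\pi/2}^{a} \cos x \, dx
= \sin \tfrac{1}{2}\pi + \int_{-\pi/2}^{a - \pi} \cos(x + \pi) \, dx$$
$$= 1 - \int_{-\pi/2}^{a - \pi} \cos x \, dx = 1 - \sin(a - \pi) + \sin\left(-\tfrac{1}{2}\pi\right) = \sin a.$$
Thus (2.13) holds for all $a$ in the interval $[-\tfrac{1}{2}\pi, \tfrac{3}{2}\pi]$. But this interval has length $2\pi$, so
formula (2.13) holds for all $a$ since both members are periodic in $a$ with period $2\pi$.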
Now we use (2.13) to deduce (2.14). First we prove that (2.14) holds when $a = \pi/2$.
Applying, in succession, the translation property, the co-relation $\sin(x + \tfrac{1}{2}\pi) = \cos x$,
and the reflection property, we find
$$\int_0^{\pi/2} \sin x \, dx = \int_{-\pi/2}^{0} \sin\left(x + \frac{\pi}{2}\right) dx = \int_{-\pi/2}^{0} \cos x \, dx = \int_0^{\pi/2} \cos(-x) \, dx.$$
Using the relation $\cos(-x) = \cos x$ and Equation (2.13), we obtain
$$\int_0^{\pi/2} \sin x \, dx = 1.$$
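Now, for any real $a$, we may write
$$\int_0^a \sin x \, dx = \int_0^{\pi/2} \sin x \, dx + \int_{\pi/2}^{a} \sin x \, dx
= 1 + \int_0^{a - \pi/2} \sin\left(x + \frac{\pi}{2}\right) dx$$
$$= 1 + \int_0^{a - \pi/2} \cos x \, dx = 1 + \sin\left(a - \frac{\pi}{2}\right) = 1 - \cos a.$$
This shows that Equation (2.13) implies (2.14).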
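Formulas (2.13) and (2.14) can be compared with a direct Riemann-sum approximation. The following is a minimal sketch, assuming a midpoint rule with a fixed number of subintervals (the helper `midpoint_integral` is illustrative and is not the method of Theorem 1.14) and using Python's `math.sin` and `math.cos` as the integrands.

```python
import math
from typing import Callable


def midpoint_integral(f: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Midpoint Riemann sum for the integral of f from a to b.

    When b < a the step is negative, so the result is minus the integral
    from b to a, matching the usual convention for oriented integrals.
    """
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


if __name__ == "__main__":
    for a in (-4.0, 0.5, 7.0):
        # Each pair of printed values should agree to several decimal places.
        print(midpoint_integral(math.cos, 0.0, a), math.sin(a))
        print(midpoint_integral(math.sin, 0.0, a), 1 - math.cos(a))
```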
EXAMPLE 1. Using (2.13) and (2.14) in conjunction with the additive property
$$\int_a^b f(x) \, dx = \int_0^b f(x) \, dx - \int_0^a f(x) \, dx,$$
we get the more general integration formulas
$$\int_a^b \cos x \, dx = \sin b - \sin a$$
and
$$\int_a^b \sin x \, dx = (1 - \cos b) - (1 - \cos a) = -(\cos b - \cos a).$$
If again we use the special symbol $f(x) \big|_a^b$ to denote the difference $f(b) - f(a)$, we can write
these integration formulas in the form
$$\int_a^b \cos x \, dx = \sin x \, \Big|_a^b \qquad \text{and} \qquad \int_a^b \sin x \, dx = -\cos x \, \Big|_a^b.$$
EXAMPLE 2. Using the results of Example 1 and the expansion property
$$\int_a^b f(x) \, dx = \frac{1}{c} \int_{ca}^{cb} f(x/c) \, dx,$$
we obtain the following formulas, valid for $c \ne 0$:
$$\int_a^b \cos cx \, dx = \frac{1}{c} \int_{ca}^{cb} \cos x \, dx = \frac{1}{c} (\sin cb - \sin ca),$$
and
$$\int_a^b \sin cx \, dx = \frac{1}{c} \int_{ca}^{cb} \sin x \, dx = -\frac{1}{c} (\cos cb - \cos ca).$$
EXAMPLE 3. The identity $\cos 2x = 1 - 2 \sin^2 x$ implies $\sin^2 x = \tfrac{1}{2}(1 - \cos 2x)$ so, from
Example 2, we obtain
$$\int_0^a \sin^2 x \, dx = \frac{1}{2} \int_0^a (1 - \cos 2x) \, dx = \frac{a}{2} - \frac{1}{4} \sin 2a.$$
Since $\sin^2 x + \cos^2 x = 1$, we also find
$$\int_0^a \cos^2 x \, dx = \int_0^a (1 - \sin^2 x) \, dx = a - \int_0^a \sin^2 x \, dx = \frac{a}{2} + \frac{1}{4} \sin 2a.$$
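The closed forms in Examples 2 and 3 can be checked against a numerical approximation. The sketch below assumes a midpoint rule (the helper names are illustrative) and compares it with the formulas for the integral of $\cos cx$ over $[a, b]$ and of $\sin^2 x$ and $\cos^2 x$ over $[0, a]$.

```python
import math
from typing import Callable


def midpoint_integral(f: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Midpoint Riemann sum for the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def integral_cos_cx(a: float, b: float, c: float) -> float:
    """Closed form from Example 2; requires c != 0."""
    return (math.sin(c * b) - math.sin(c * a)) / c


def integral_sin_squared(a: float) -> float:
    """Closed form from Example 3 for the integral of sin^2 x over [0, a]."""
    return a / 2 - math.sin(2 * a) / 4


def integral_cos_squared(a: float) -> float:
    """Closed form from Example 3 for the integral of cos^2 x over [0, a]."""
    return a / 2 + math.sin(2 * a) / 4


if __name__ == "__main__":
    approx = midpoint_integral(lambda x: math.sin(x) ** 2, 0.0, 2.0)
    print(approx, integral_sin_squared(2.0))  # the two values should nearly agree
```

Note that the two closed forms in Example 3 add up to $a$, as the Pythagorean identity requires.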
2.7 A geometric description of the sine and cosine functions
In this section we indicate a geometric method for defining the sine and cosine functions,
and we give a geometric interpretation of the fundamental properties listed in Section 2.5.
Consider a circle of radius $r$ with its center at the origin. Denote the point $(r, 0)$ by $A$,
and let $P$ be any other point on the circle. The two line segments $OA$ and $OP$ determine a
geometric configuration called an angle which we denote by the symbol $\angle AOP$. An example
is shown in Figure 2.6. We wish to assign to this angle a nonnegative real number $x$ which
can be used as a measurement of its size. The most common way of doing this is to take a
circle of radius 1 and let $x$ be the length of the circular arc $AP$, traced counterclockwise
from $A$ to $P$, and to say that the measure of $\angle AOP$ is $x$ radians. From a logical point of
view, this is unsatisfactory at the present stage because we have not yet discussed the
concept of arc length. Arc length will be discussed later in Chapter 14. Since the concept
of area has already been discussed, we prefer to use the area of the circular sector $AOP$
rather than the length of the arc $AP$ as a measure of the size of $\angle AOP$. It is understood
that the sector $AOP$ is the smaller portion of the circular disk when $P$ is above the real axis,
and the larger portion when $P$ is below the real axis.
FIGURE 2.6 An angle $\angle AOP$ consisting of $x$ radians, where $x = (\text{twice the area of the sector})/r^2$.
FIGURE 2.7 Geometric description of $\sin x$ and $\cos x$.
Later, when arc length is discussed, we shall find that the length of arc $AP$ is exactly
twice the area of sector $AOP$. Therefore, to get the same scale of measurement for angles
by both methods, we shall use twice the area of the sector $AOP$ as a measure of the angle
$\angle AOP$. However, to obtain a "dimensionless" measure of angles, that is, a measure
independent of the unit of distance in our coordinate system, we shall define the measure
of $\angle AOP$ to be twice the area of sector $AOP$ divided by the square of the radius. This ratio
does not change if we expand or contract the circle, and therefore there is no loss in
generality in restricting our considerations to a unit circle. The unit of measure so obtained
is called the radian. Thus, we say the measure of an angle $\angle AOP$ is $x$ radians if $x/2$ is the
area of the sector $AOP$ cut from a unit circular disk.
We have already introduced the symbol $\pi$ to denote the area of a unit circular disk. When
$P = (-1, 0)$, the sector $AOP$ is a semicircular disk of area $\tfrac{1}{2}\pi$, so it subtends an angle of $\pi$
radians. The entire disk is a sector consisting of $2\pi$ radians. If $P$ is initially at $(1, 0)$ and if
$P$ moves once around the circle in a counterclockwise direction, the area of sector $AOP$
increases from 0 to $\pi$, taking every value in the interval $[0, \pi]$ exactly once. This property,
which is geometrically plausible, can be proved by expressing the area as an integral, but
we shall not discuss the proof.
The next step is to define the sine and cosine of an angle. Actually, we prefer to speak
of the sine and cosine of a number rather than of an angle, so that the sine and cosine will
be functions defined on the real line. We proceed as follows: Choose a number $x$ satisfying
$0 < x < 2\pi$ and let $P$ be the point on the unit circle such that the area of sector $AOP$ is
equal to $x/2$. Let $(a, b)$ denote the coordinates of $P$. An example is shown in Figure 2.7.
The numbers $a$ and $b$ are completely determined by $x$. We define the sine and cosine of $x$
as follows:
$$\cos x = a, \qquad \sin x = b.$$
In other words, $\cos x$ is the abscissa of $P$ and $\sin x$ is its ordinate.
For example, when $x = \pi$, we have $P = (-1, 0)$ so that $\cos \pi = -1$ and $\sin \pi = 0$.
Similarly, when $x = \tfrac{1}{2}\pi$ we have $P = (0, 1)$ and hence $\cos \tfrac{1}{2}\pi = 0$ and $\sin \tfrac{1}{2}\pi = 1$. This
procedure describes the sine and cosine as functions defined in the open interval $(0, 2\pi)$.
We extend the definitions to the whole real axis by means of the following equations:
$$\sin 0 = 0, \qquad \cos 0 = 1, \qquad \sin(x + 2\pi) = \sin x, \qquad \cos(x + 2\pi) = \cos x.$$
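The sector-area definition can be turned into a small computation. The sketch below is only an illustration under stated assumptions: it treats the case $0 < x \le \pi$, so that $P = (a, b)$ lies on the upper unit semicircle, writes the sector area as the triangle term $\tfrac{1}{2}ab$ plus the area under the arc between $a$ and 1, approximates that area with a midpoint rule, and finds $a$ by bisection. The function names are hypothetical, and `math.cos` and `math.sin` appear only in the comparison.

```python
import math


def sector_area(a: float, n: int = 4000) -> float:
    """Approximate area of sector AOP for P = (a, sqrt(1 - a^2)), -1 <= a <= 1.

    The sector is the (signed) triangle with vertices O, (a, 0), P plus the
    region under the arc y = sqrt(1 - t^2) for a <= t <= 1.
    """
    b = math.sqrt(max(0.0, 1.0 - a * a))
    h = (1.0 - a) / n
    under_arc = h * sum(
        math.sqrt(max(0.0, 1.0 - (a + (k + 0.5) * h) ** 2)) for k in range(n)
    )
    return 0.5 * a * b + under_arc


def cos_sin_from_area(x: float) -> tuple[float, float]:
    """For 0 < x <= pi, return the coordinates (a, b) of P with sector area x/2."""
    lo, hi = -1.0, 1.0  # the sector area decreases from pi/2 at a = -1 to 0 at a = 1
    for _ in range(30):
        mid = (lo + hi) / 2
        if sector_area(mid) > x / 2:
            lo = mid  # sector too large: move P toward A, that is, increase a
        else:
            hi = mid
    a = (lo + hi) / 2
    return a, math.sqrt(max(0.0, 1.0 - a * a))


if __name__ == "__main__":
    a, b = cos_sin_from_area(1.0)
    print(a, math.cos(1.0))  # the pair should agree to a few decimal places
    print(b, math.sin(1.0))
```

Values of $x$ in $(\pi, 2\pi)$ would need the lower semicircle, which this sketch leaves out for brevity.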
The other four trigonometric functions are now defined in terms of the sine and cosine by
the usual formulas,
$$\tan x = \frac{\sin x}{\cos x}, \qquad \cot x = \frac{\cos x}{\sin x}, \qquad \sec x = \frac{1}{\cos x}, \qquad \csc x = \frac{1}{\sin x}.$$
These functions are defined for all real $x$ except for certain isolated points where the
denominators may be zero. They all satisfy the periodicity property $f(x + 2\pi) = f(x)$.
The tangent and cotangent have the smaller period $\pi$.
Now we give geometric arguments to indicate how these definitions lead to the fundamental
properties listed in Section 2.5. Properties 1 and 2 have already been taken care of
by the way we have defined the sine and cosine. The Pythagorean identity becomes evident
when we refer to Figure 2.7. The line segment $OP$ is the hypotenuse of a right triangle whose
legs have lengths $|\cos x|$ and $|\sin x|$. Hence the Pythagorean theorem for right triangles
implies the identity $\cos^2 x + \sin^2 x = 1$.
Now we use the Pythagorean theorem for right triangles again to give a geometric proof
of formula (2.3) for $\cos(y - x)$. Refer to the two right triangles $PAQ$ and $PBQ$ shown in
Figure 2.8. In triangle $PAQ$, the length of side $AQ$ is $|\sin y - \sin x|$, the absolute value of
the difference of the ordinates of $Q$ and $P$. Similarly, $AP$ has length $|\cos x - \cos y|$. If $d$
denotes the length of the hypotenuse $PQ$, we have, by the Pythagorean theorem,
$$d^2 = (\sin y - \sin x)^2 + (\cos x - \cos y)^2.$$
On the other hand, in right triangle $PBQ$ the leg $BP$ has length $|1 - \cos(y - x)|$ and the
leg $BQ$ has length $|\sin(y - x)|$. Therefore, the Pythagorean theorem gives us
$$d^2 = [1 - \cos(y - x)]^2 + \sin^2(y - x).$$
Equating the two expressions for $d^2$ and solving for $\cos(y - x)$, we obtain the desired
formula (2.3) for $\cos(y - x)$.
Finally, geometric proofs of the fundamental inequalities in property 4 may be given by
referring to Figure 2.9. We simply compare the area of sector $OAP$ with that of triangles
$OQP$ and $OAB$. Because of the way we have defined angular measure, the area of sector
$OAP$ is $\tfrac{1}{2}x$. Triangle $OAB$ has base 1 and altitude $h$, say. By similar triangles, we find
$h/1 = (\sin x)/(\cos x)$, so the area of triangle $OAB$ is $\tfrac{1}{2}h = \tfrac{1}{2}(\sin x)/(\cos x)$. Therefore,
comparison of areas gives us the inequalities
$$\frac{1}{2} \sin x \cos x < \frac{1}{2} x < \frac{1}{2} \frac{\sin x}{\cos x}.$$
FIGURE 2.8 Geometric proof of the formula for $\cos(y - x)$. The point $Q$ has coordinates $(\cos y, \sin y)$.
FIGURE 2.9 Geometric proof of the inequalities $0 < \cos x < \dfrac{\sin x}{x} < \dfrac{1}{\cos x}$. The altitude is $h = \dfrac{\sin x}{\cos x}$.
Dividing by $\tfrac{1}{2} \sin x$ and taking reciprocals, we obtain the fundamental inequalities (2.4).
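The area comparison behind property 4 is easy to tabulate. The following sketch, assuming Python's `math` functions and using illustrative helper names, returns the three areas (triangle $OQP$, sector $OAP$, triangle $OAB$) for a given $x$ in $(0, \tfrac{1}{2}\pi)$ and tests the resulting inequalities (2.4).

```python
import math


def area_comparison(x: float) -> tuple[float, float, float]:
    """Areas of triangle OQP, sector OAP, and triangle OAB for 0 < x < pi/2."""
    inner_triangle = 0.5 * math.sin(x) * math.cos(x)
    sector = 0.5 * x
    outer_triangle = 0.5 * math.sin(x) / math.cos(x)
    return inner_triangle, sector, outer_triangle


def fundamental_inequalities_hold(x: float) -> bool:
    """Check 0 < cos x < (sin x)/x < 1/cos x at a single point x."""
    return 0 < math.cos(x) < math.sin(x) / x < 1 / math.cos(x)


if __name__ == "__main__":
    for x in (0.1, 0.8, 1.5):
        print(x, area_comparison(x), fundamental_inequalities_hold(x))
```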
We remind the reader once more that the discussion of this section is intended to provide
a geometric interpretation of the sine and cosine and their fundamental properties. An
analytic treatment of these functions, making no use of geometry, will be described in
Section 11.11.
Extensive tables of values of the sine, cosine, tangent, and cotangent appear in most
mathematical handbooks. The graphs of the six trigonometric functions are shown in
Figure 2.10 (page 107) as they appear over one complete period-interval. The rest of the
graph in each case is obtained by appealing to periodicity.
2.8 Exercises
In this set of exercises, you may use the properties of the sine and cosine listed in Sections 2.5
through 2.7.
1. (a) Prove that $\sin n\pi = 0$ for every integer $n$ and that these are the only values of $x$ for which
$\sin x = 0$.
(b) Find all real $x$ such that $\cos x = 0$.
2. Find all real $x$ such that (a) $\sin x = 1$; (b) $\cos x = 1$; (c) $\sin x = -1$; (d) $\cos x = -1$.
3. Prove that $\sin(x + \pi) = -\sin x$ and $\cos(x + \pi) = -\cos x$ for all $x$.
4. Prove that $\sin 3x = 3 \sin x - 4 \sin^3 x$ and $\cos 3x = \cos x - 4 \sin^2 x \cos x$ for all real $x$.
Prove also that $\cos 3x = 4 \cos^3 x - 3 \cos x$.
5. (a) Prove that $\sin \tfrac{1}{6}\pi = \tfrac{1}{2}$, $\cos \tfrac{1}{6}\pi = \tfrac{1}{2}\sqrt{3}$. [Hint: Use Exercise 4.]
(b) Prove that $\sin \tfrac{1}{3}\pi = \tfrac{1}{2}\sqrt{3}$, $\cos \tfrac{1}{3}\pi = \tfrac{1}{2}$.
(c) Prove that $\sin \tfrac{1}{4}\pi = \cos \tfrac{1}{4}\pi = \tfrac{1}{2}\sqrt{2}$.
6. Prove that $\tan(x - y) = (\tan x - \tan y)/(1 + \tan x \tan y)$ for all $x$ and $y$ with $\tan x \tan y \ne -1$.
Obtain corresponding formulas for $\tan(x + y)$ and $\cot(x + y)$.
7. Find numbers $A$ and $B$ such that $3 \sin(x + \tfrac{1}{3}\pi) = A \sin x + B \cos x$ for all $x$.
8. Prove that if $C$ and $\alpha$ are given real numbers, there exist real numbers $A$ and $B$ such that
$C \sin(x + \alpha) = A \sin x + B \cos x$ for all $x$.
9. Prove that if $A$ and $B$ are given real numbers, there exist numbers $C$ and $\alpha$, with $C \ge 0$, such
that the formula of Exercise 8 holds.
10. Determine $C$ and $\alpha$, with $C > 0$, such that $C \sin(x + \alpha) = -2 \sin x - 2 \cos x$ for all $x$.
11. Prove that if $A$ and $B$ are given real numbers, there exist numbers $C$ and $\alpha$, with $C \ge 0$, such
that $C \cos(x + \alpha) = A \sin x + B \cos x$. Determine $C$ and $\alpha$ if $A = B = 1$.
12. Find all real $x$ such that $\sin x = \cos x$.
13. Find all real $x$ such that $\sin x - \cos x = 1$.
14. Prove that the following identities hold for all $x$ and $y$.
(a) $2 \cos x \cos y = \cos(x - y) + \cos(x + y)$.
(b) $2 \sin x \sin y = \cos(x - y) - \cos(x + y)$.
(c) $2 \sin x \cos y = \sin(x - y) + \sin(x + y)$.
15. If $h \ne 0$, prove that the following identities hold for all $x$:
$$\frac{\sin(x + h) - \sin x}{h} = \frac{\sin(h/2)}{h/2} \cos\left(x + \frac{h}{2}\right),$$
$$\frac{\cos(x + h) - \cos x}{h} = -\frac{\sin(h/2)}{h/2} \sin\left(x + \frac{h}{2}\right).$$
These formulas are used in differential calculus.
16. Prove or disprove each of the following statements.
(a) For all $x \ne 0$, we have $\sin 2x \ne 2 \sin x$.
(b) For every $x$, there is a $y$ such that $\cos(x + y) = \cos x + \cos y$.
(c) There is an $x$ such that $\sin(x + y) = \sin x + \sin y$ for all $y$.
(d) There is a $y \ne 0$ such that $\int_0^y \sin x \, dx = \sin y$.
17. Calculate the integral $\int_a^b \sin x \, dx$ for each of the following values of $a$ and $b$. In each case
interpret your result geometrically in terms of areas.
(a) $a = 0$, $b = \pi/6$.
(b) $a = 0$, $b = \pi/4$.
(c) $a = 0$, $b = \pi/3$.
(d) $a = 0$, $b = \pi/2$.
(e) $a = 0$, $b = \pi$.
(f) $a = 0$, $b = 2\pi$.
(g) $a = -1$, $b = 1$.
(h) $a = -\pi/6$, $b = \pi/4$.
Evaluate the integrals in Exercises 18 through 27.
18. $\int_0^{\pi} (x + \sin x) \, dx$.
19. $\int_0^{\pi/2} (x^2 + \cos x) \, dx$.
20. $\int_0^{\pi/2} (\sin x - \cos x) \, dx$.
21. $\int_0^{\pi/2} |\sin x - \cos x| \, dx$.
22. $\int_0^{\pi} (\tfrac{1}{2} + \cos t) \, dt$.
23. $\int_0^{\pi} |\tfrac{1}{2} + \cos t| \, dt$.
24. $\int_{-\pi}^{x} |\tfrac{1}{2} + \cos t| \, dt$, if $0 \le x \le \pi$.
25. $\int_x^{x^2} (t^2 + \sin t) \, dt$.
26. $\int_0^{\pi/2} \sin 2x \, dx$.
27. $\int_0^{\pi/3} \cos \tfrac{1}{2}x \, dx$.
28. Prove the following integration formulas, valid for $b \ne 0$:
$$\int_0^x \cos(a + bt) \, dt = \frac{1}{b} [\sin(a + bx) - \sin a],$$
$$\int_0^x \sin(a + bt) \, dt = -\frac{1}{b} [\cos(a + bx) - \cos a].$$
29. (a) Use the identity $\sin 3t = 3 \sin t - 4 \sin^3 t$ to deduce the integration formula
$$\int_0^x \sin^3 t \, dt = \tfrac{2}{3} - \tfrac{1}{3}(2 + \sin^2 x) \cos x.$$
(b) Derive the identity $\cos 3t = 4 \cos^3 t - 3 \cos t$ and use it to prove that
$$\int_0^x \cos^3 t \, dt = \tfrac{1}{3}(2 + \cos^2 x) \sin x.$$
30. If a function $f$ is periodic with period $p > 0$ and integrable on $[0, p]$, prove that
$\int_0^p f(x) \, dx = \int_a^{a+p} f(x) \, dx$ for all $a$.
31. (a) Prove that $\int_0^{2\pi} \sin nx \, dx = \int_0^{2\pi} \cos nx \, dx = 0$ for all integers $n \ne 0$.
(b) Use part (a) and the addition formulas for the sine and cosine to establish the following
formulas, valid for integers $m$ and $n$, $m^2 \ne n^2$:
$$\int_0^{2\pi} \sin nx \cos mx \, dx = \int_0^{2\pi} \sin nx \sin mx \, dx = \int_0^{2\pi} \cos nx \cos mx \, dx = 0,$$
$$\int_0^{2\pi} \sin^2 nx \, dx = \int_0^{2\pi} \cos^2 nx \, dx = \pi, \qquad \text{if } n \ne 0.$$
These formulas are known as the orthogonality relations for the sine and cosine.
32. Use the identity
$$2 \sin\frac{x}{2} \cos kx = \sin (2k + 1)\frac{x}{2} - \sin (2k - 1)\frac{x}{2}$$
and the telescoping property of finite sums to prove that if $x \ne 2m\pi$ ($m$ an integer), we have
$$\sum_{k=1}^{n} \cos kx = \frac{\sin \tfrac{1}{2}nx \, \cos \tfrac{1}{2}(n + 1)x}{\sin \tfrac{1}{2}x}.$$
33. If $x \ne 2m\pi$ ($m$ an integer), prove that
$$\sum_{k=1}^{n} \sin kx = \frac{\sin \tfrac{1}{2}nx \, \sin \tfrac{1}{2}(n + 1)x}{\sin \tfrac{1}{2}x}.$$
34. Refer to Figure 2.7. By comparing the area of triangle $OAP$ with that of the circular sector
$OAP$, prove that $\sin x < x$ if $0 < x < \tfrac{1}{2}\pi$. Then use the fact that $\sin(-x) = -\sin x$ to prove
that $|\sin x| < |x|$ if $0 < |x| < \tfrac{1}{2}\pi$.
FIGURE 2.10 Graphs of the trigonometric functions as they appear over one period-interval.
2.9 Polar coordinates
Up to now we have located points in the plane with rectangular coordinates. We can
also locate them with polar coordinates. This is done as follows. Let $P$ be a point distinct
from the origin. Suppose the line segment joining $P$ to the origin has length $r > 0$ and
makes an angle of $\theta$ radians with the positive $x$-axis. An example is shown in Figure 2.11.
The two numbers $r$ and $\theta$ are called polar coordinates of $P$. They are related to the
rectangular coordinates $(x, y)$ by the equations
$$x = r \cos\theta, \qquad y = r \sin\theta. \tag{2.15}$$
FIGURE 2.11 Polar coordinates.
FIGURE 2.12 A figure-eight curve with polar equation $r = \sqrt{|\sin\theta|}$.
The positive number $r$ is called the radial distance of $P$, and $\theta$ is called a polar angle. We
say a polar angle rather than the polar angle because if $\theta$ satisfies (2.15), so does $\theta + 2n\pi$
for any integer $n$. We agree to call all pairs of real numbers $(r, \theta)$ polar coordinates of $P$ if
they satisfy (2.15) with $r > 0$. Thus, a given point has more than one pair of polar
coordinates. The radial distance $r$ is uniquely determined, $r = \sqrt{x^2 + y^2}$, but the polar
angle $\theta$ is determined only up to integer multiples of $2\pi$.
When $P$ is the origin, the equations in (2.15) are satisfied with $r = 0$ and any $\theta$. For this
reason we assign to the origin the radial distance $r = 0$, and we agree that any real $\theta$ may
be used as a polar angle.
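The relations in (2.15) and the non-uniqueness of the polar angle can be illustrated in a few lines. This is a minimal sketch, assuming the standard-library functions `math.hypot` and `math.atan2`; the function names `to_rectangular` and `to_polar` are illustrative. `math.atan2` picks one particular polar angle, the one in $(-\pi, \pi]$; every other polar angle differs from it by an integer multiple of $2\pi$.

```python
import math


def to_rectangular(r: float, theta: float) -> tuple[float, float]:
    """Apply the equations (2.15): x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)


def to_polar(x: float, y: float) -> tuple[float, float]:
    """Return the radial distance and one polar angle of the point (x, y).

    For the origin the radial distance is 0 and any angle is acceptable;
    atan2(0, 0) happens to return 0.0.
    """
    r = math.hypot(x, y)      # sqrt(x^2 + y^2)
    theta = math.atan2(y, x)  # the polar angle lying in (-pi, pi]
    return r, theta


if __name__ == "__main__":
    r, theta = to_polar(-1.5, 0.75)
    print(r, theta)
    # Adding 2*pi to the angle describes the same point.
    print(to_rectangular(r, theta), to_rectangular(r, theta + 2 * math.pi))
```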
Let $f$ be a nonnegative function defined on an interval $[a, b]$. The set of all points with
polar coordinates $(r, \theta)$ satisfying $r = f(\theta)$ is called the graph of $f$ in polar coordinates.
The equation $r = f(\theta)$ is called a polar equation of this graph. For some curves, polar
equations may be simpler and more convenient to use than Cartesian equations. For
example, the circle with Cartesian equation $x^2 + y^2 = 4$ has the simpler polar equation
$r = 2$. The equations in (2.15) show how to convert from rectangular to polar coordinates.
EXAMPLE. Figure 2.12 shows a curve in the shape of a figure eight whose Cartesian
equation is $(x^2 + y^2)^3 = y^2$. Using (2.15), we find $x^2 + y^2 = r^2$, so the polar coordinates of
the points on this curve satisfy the equation $r^6 = r^2 \sin^2\theta$, or $r^2 = |\sin\theta|$, $r = \sqrt{|\sin\theta|}$.
It is not difficult to sketch this curve from the polar equation. For example, in the interval
$0 \le \theta \le \pi/2$, $\sin\theta$ increases from 0 to 1, so $r$ also increases from 0 to 1. Plotting a few
values which are easy to calculate, for example, those corresponding to $\theta = \pi/6$, $\pi/4$, and
$\pi/3$, we quickly sketch the portion of the curve in the first quadrant. The rest of the curve
is obtained by appealing to symmetry in the Cartesian equation, or to the symmetry and
periodicity of $|\sin\theta|$. It would be a more difficult task to sketch this curve from its
Cartesian equation alone.
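The passage from the polar equation back to the Cartesian equation can be verified point by point. The sketch below, with illustrative function names, generates points of the figure-eight curve from $r = \sqrt{|\sin\theta|}$ via (2.15) and evaluates the residual $(x^2 + y^2)^3 - y^2$, which should vanish up to rounding.

```python
import math


def figure_eight_point(theta: float) -> tuple[float, float]:
    """Rectangular coordinates of the point with polar angle theta on r = sqrt(|sin(theta)|)."""
    r = math.sqrt(abs(math.sin(theta)))
    return r * math.cos(theta), r * math.sin(theta)


def cartesian_residual(x: float, y: float) -> float:
    """Value of (x^2 + y^2)^3 - y^2; zero exactly when (x, y) lies on the curve."""
    return (x * x + y * y) ** 3 - y * y


if __name__ == "__main__":
    for theta in (math.pi / 6, math.pi / 4, math.pi / 3):
        x, y = figure_eight_point(theta)
        print(theta, (x, y), cartesian_residual(x, y))
```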
2.10 The integral for area in polar coordinates
Let $f$ be a nonnegative function defined on an interval $[a, b]$, where $0 \le b - a \le 2\pi$.
The set of all points with polar coordinates $(r, \theta)$ satisfying the inequalities
$$0 \le r \le f(\theta), \qquad a \le \theta \le b,$$
is called the radial set of $f$ over $[a, b]$. The shaded region shown in Figure 2.13 is an example.
If $f$ is constant on $[a, b]$, its radial set is a circular sector subtending an angle of $b - a$
radians. Figure 2.14 shows the radial set $S$ of a step function $s$. Over each of the $n$ open
subintervals $(\theta_{k-1}, \theta_k)$ of $[a, b]$ in which $s$ is constant, say $s(\theta) = s_k$, the graph of $s$ in polar
coordinates is a circular arc of radius $s_k$, and its radial set is a circular sector subtending an
angle of $\theta_k - \theta_{k-1}$ radians. Because of the way we have defined angular measure, the area
of this sector is $\tfrac{1}{2}(\theta_k - \theta_{k-1}) s_k^2$. Since $b - a \le 2\pi$, none of these sectors overlap so, by
additivity, the area of the radial set of $s$ over the full interval $[a, b]$ is given by
$$a(S) = \frac{1}{2} \sum_{k=1}^{n} s_k^2 (\theta_k - \theta_{k-1}) = \frac{1}{2} \int_a^b s^2(\theta) \, d\theta,$$
where $s^2(\theta)$ means the square of $s(\theta)$. Thus, for step functions, the area of the radial set has
been expressed as an integral. Now we prove that this integral formula holds more
generally.
FIGURE 2.13 The radial set of $f$ over an interval $[a, b]$.
FIGURE 2.14 The radial set of a step function $s$ is a union of circular sectors. Its area is $\tfrac{1}{2} \int_a^b s^2(\theta) \, d\theta$.
THEOREM 2.6. Let $R$ denote the radial set of a nonnegative function $f$ over an interval
$[a, b]$, where $0 \le b - a \le 2\pi$, and assume that $R$ is measurable. If $f^2$ is integrable on $[a, b]$
the area of $R$ is given by the integral
$$a(R) = \frac{1}{2} \int_a^b f^2(\theta) \, d\theta.$$
Proof. Choose two step functions $s$ and $t$ satisfying
$$0 \le s(\theta) \le f(\theta) \le t(\theta)$$
for all $\theta$ in $[a, b]$, and let $S$ and $T$ denote their radial sets, respectively. Since $s \le f \le t$ on
$[a, b]$, the radial sets are related by the inclusion relations $S \subseteq R \subseteq T$. Hence, by the
monotone property of area, we have $a(S) \le a(R) \le a(T)$. But $S$ and $T$ are radial sets of
step functions, so $a(S) = \tfrac{1}{2} \int_a^b s^2(\theta) \, d\theta$ and $a(T) = \tfrac{1}{2} \int_a^b t^2(\theta) \, d\theta$. Therefore we have the
inequalities
$$\int_a^b s^2(\theta) \, d\theta \le 2a(R) \le \int_a^b t^2(\theta) \, d\theta,$$
for all step functions $s$ and $t$ satisfying $s \le f \le t$ on $[a, b]$. But $s^2$ and $t^2$ are arbitrary step
functions satisfying $s^2 \le f^2 \le t^2$ on $[a, b]$; hence, since $f^2$ is integrable, we must have
$2a(R) = \int_a^b f^2(\theta) \, d\theta$. This proves the theorem.
Note: It can be proved that the measurability of $R$ is a consequence of the hypothesis
that $f^2$ is integrable, but we shall not discuss the proof.
EXAMPLE. To calculate the area of the radial set $R$ enclosed by the figure-eight curve
shown in Figure 2.12, we calculate the area of the portion in the first quadrant and multiply
by four. For this curve, we have $f^2(\theta) = |\sin\theta|$ and, since $\sin\theta \ge 0$ for $0 \le \theta \le \pi/2$, we
find
$$a(R) = 4 \int_0^{\pi/2} \frac{1}{2} f^2(\theta) \, d\theta = 2 \int_0^{\pi/2} \sin\theta \, d\theta = 2\left(\cos 0 - \cos\frac{\pi}{2}\right) = 2.$$
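The sector-sum idea behind Theorem 2.6 can be tried numerically. The sketch below is an approximation under stated assumptions: it replaces $f$ on each of $n$ equal subintervals by its value at the midpoint (a step function that is in general neither a lower nor an upper bound for $f$) and adds the areas $\tfrac{1}{2} s_k^2 (\theta_k - \theta_{k-1})$ of the resulting circular sectors. The function name `radial_set_area` is illustrative.

```python
import math
from typing import Callable


def radial_set_area(f: Callable[[float], float], a: float, b: float, n: int = 10_000) -> float:
    """Sum of n sector areas (1/2) * s_k^2 * width, with s_k = f at the k-th midpoint."""
    width = (b - a) / n
    return sum(0.5 * f(a + (k + 0.5) * width) ** 2 * width for k in range(n))


def figure_eight(theta: float) -> float:
    """The polar function r = sqrt(|sin(theta)|) of Figure 2.12."""
    return math.sqrt(abs(math.sin(theta)))


if __name__ == "__main__":
    # Whole curve over [0, 2*pi]; the example in the text gives the area 2.
    print(radial_set_area(figure_eight, 0.0, 2 * math.pi))
    # One quadrant, as in the text; four times this value should also be close to 2.
    print(4 * radial_set_area(figure_eight, 0.0, math.pi / 2))
```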
2.11 Exercises
In each of Exercises 1 through 4, show that the set of points whose rectangular coordinates
$(x, y)$ satisfy the given Cartesian equation is equal to the set of all points whose polar coordinates
$(r, \theta)$ satisfy the corresponding polar equation.
1. $(x - 1)^2 + y^2 = 1$; $r = 2 \cos\theta$, $\cos\theta > 0$.
2. $x^2 + y^2 - x = \sqrt{x^2 + y^2}$; $r = 1 + \cos\theta$.
3. $(x^2 + y^2)^2 = x^2 - y^2$, $y^2 \le x^2$; $r = \sqrt{\cos 2\theta}$, $\cos 2\theta \ge 0$.
4. $(x^2 + y^2)^2 = |x^2 - y^2|$; $r = \sqrt{|\cos 2\theta|}$.
In each of Exercises 5 through 15, sketch the graph of $f$ in polar coordinates and compute the
area of the radial set of $f$ over the interval specified. You may assume each set is measurable.
5. Spiral of Archimedes: $f(\theta) = \theta$, $0 \le \theta \le 2\pi$.
6. Circle tangent to y-axis: $f(\theta) = 2 \cos\theta$, $-\pi/2 \le \theta \le \pi/2$.
7. Two circles tangent to y-axis: $f(\theta) = 2 |\cos\theta|$, $0 \le \theta \le 2\pi$.
8. Circle tangent to x-axis: $f(\theta) = 4 \sin\theta$, $0 \le \theta \le \pi$.
9. Two circles tangent to x-axis: $f(\theta) = 4 |\sin\theta|$, $0 \le \theta \le 2\pi$.
10. Rose petal: $f(\theta) = \sin 2\theta$, $0 \le \theta \le \pi/2$.
11. Four-leaved rose: $f(\theta) = |\sin 2\theta|$, $0 \le \theta \le 2\pi$.
12. Lazy eight: $f(\theta) = \sqrt{|\cos\theta|}$, $0 \le \theta \le 2\pi$.
13. Four-leaf clover: $f(\theta) = \sqrt{|\cos 2\theta|}$, $0 \le \theta \le 2\pi$.
14. Cardioid: $f(\theta) = 1 + \cos\theta$, $0 \le \theta \le 2\pi$.
15. Limaçon: $f(\theta) = 2 + \cos\theta$, $0 \le \theta \le 2\pi$.
2.12 Application of integration to the calculation of volume
In Section 1.6 we introduced the concept of area as a set function satisfying certain
properties which we took as axioms for area. Then, in Sections 1.18 and 2.2, we showed
that the areas of many regions could be calculated by integration. The same approach can
be used to discuss the concept of volume.
We assume there exist certain sets $S$ of points in three-dimensional space, which we call
measurable sets, and a set function $v$, called a volume function, which assigns to each
measurable set $S$ a number $v(S)$, called the volume of $S$. We use the symbol $\mathscr{A}$ to denote
the class of all measurable sets in three-dimensional space, and we call each set $S$ in $\mathscr{A}$ a
solid.
As in the case of area, we list a number of properties we would like volume to have and
take these as axioms for volume. The choice of axioms enables us to prove that the volumes
of many solids can be computed by integration.
The first three axioms, like those for area, describe the nonnegative, additive, and
difference properties. Instead of an axiom of invariance under congruence, we use a
different type of axiom, called Cavalieri's principle. This assigns equal volumes to congruent
solids and also to certain solids which, though not congruent, have equal cross-sectional
areas cut by planes perpendicular to a given line. More precisely, suppose $S$ is a given solid
and $L$ a given line. If a plane $F$ is perpendicular to $L$, the intersection $F \cap S$ is called a
cross-section perpendicular to $L$. If every cross-section perpendicular to $L$ is a measurable
set in its own plane, we call $S$ a Cavalieri solid. Cavalieri's principle assigns equal volumes
to two Cavalieri solids, $S$ and $T$, if $a(S \cap F) = a(T \cap F)$ for every plane $F$ perpendicular
to the given line $L$.
Cavalieri's principle can be illustrated intuitively as follows. Imagine a Cavalieri solid
as being a stack of thin sheets of material, like a deck of cards, each sheet being perpendicular
to a given line $L$. If we slide each sheet in its own plane we can change the shape of the solid
but not its volume.
The next axiom states that the volume of a rectangular parallelepiped is the product of
the lengths of its edges. A rectangular parallelepiped is any set congruent to a set of the form
$$\{(x, y, z) \mid 0 \le x \le a, \quad 0 \le y \le b, \quad 0 \le z \le c\}. \tag{2.16}$$
We shall use the shorter term "box" rather than "rectangular parallelepiped." The nonnegative
numbers $a$, $b$, $c$ in (2.16) are called the lengths of the edges of the box.
Finally, we include an axiom which states that every convex set is measurable. A set is
called convex if, for every pair of points $P$ and $Q$ in the set, the line segment joining $P$ and
$Q$ is also in the set. This axiom, along with the additive and difference properties, ensures
that all the elementary solids that occur in the usual applications of calculus are measurable.
The axioms for volume can now be stated as follows.
AXIOMATIC DEFINITION OF VOLUME. We assume there exists a class $\mathscr{A}$ of solids and a
set function $v$, whose domain is $\mathscr{A}$, with the following properties:
1. Nonnegative property. For each set $S$ in $\mathscr{A}$ we have $v(S) \ge 0$.
2. Additive property. If $S$ and $T$ are in $\mathscr{A}$, then $S \cup T$ and $S \cap T$ are in $\mathscr{A}$, and we have
$v(S \cup T) = v(S) + v(T) - v(S \cap T)$.
3. Difference property. If $S$ and $T$ are in $\mathscr{A}$ with $S \subseteq T$, then $T - S$ is in $\mathscr{A}$, and we
have $v(T - S) = v(T) - v(S)$.
4. Cavalieri's principle. If $S$ and $T$ are two Cavalieri solids in $\mathscr{A}$ with $a(S \cap F) \le
a(T \cap F)$ for every plane $F$ perpendicular to a given line, then $v(S) \le v(T)$.
5. Choice of scale. Every box $B$ is in $\mathscr{A}$. If the edges of $B$ have lengths $a$, $b$, and $c$, then
$v(B) = abc$.
6. Every convex set is in $\mathscr{A}$.
Axiom 3 shows that the empty set $\emptyset$ is in $\mathscr{A}$ and has zero volume. Since $v(T - S) \ge 0$,
Axiom 3 also implies the following monotone property:
$$v(S) \le v(T), \qquad \text{for sets } S \text{ and } T \text{ in } \mathscr{A} \text{ with } S \subseteq T.$$
The monotone property, in turn, shows that every bounded plane set $S$ in $\mathscr{A}$ has zero
volume. A plane set is called bounded if it is a subset of some square in the plane. If we
consider a box $B$ of altitude $c$ having this square as its base, then $S \subseteq B$ so that we have
$v(S) \le v(B) = a^2 c$, where $a$ is the length of each edge of the square base. If we had $v(S) > 0$,
we could choose $c$ so that $c < v(S)/a^2$, contradicting the inequality $v(S) \le a^2 c$. This shows
that $v(S)$ cannot be positive, so $v(S) = 0$, as asserted.
Note that Cavalieri's principle has been stated in the form of inequalities. If $a(S \cap F) =
a(T \cap F)$ for every plane $F$ perpendicular to a given line, we may apply Axiom 4 twice to
deduce $v(S) \le v(T)$ and $v(T) \le v(S)$, and hence we have $v(T) = v(S)$.
Next we show that the volume of a right cylindrical solid is equal to the area of its base
multiplied by its altitude. By a right cylindrical solid we mean a set congruent to a set $S$
of the form
$$S = \{(x, y, z) \mid (x, y) \in B, \quad a \le z \le b\},$$
where $B$ is a bounded plane measurable set. The areas of the cross sections of $S$ perpendicular
to the $z$-axis determine a cross-sectional area function $a_S$ which takes the constant
value $a(B)$ on the interval $a \le z \le b$, and the value 0 outside $[a, b]$.
Now let $T$ be a box with cross-sectional area function $a_T$ equal to $a_S$. Axiom 5 tells us
that $v(T) = a(B)(b - a)$, where $a(B)$ is the area of the base of $T$, and $b - a$ is its altitude.
Cavalieri's principle states that $v(S) = v(T)$, so the volume of $S$ is the area of its base,
$a(B)$, multiplied by its altitude, $b - a$. Note that the product $a(B)(b - a)$ is the integral
of the function $a_S$ over the interval $[a, b]$. In other words, the volume of a right cylindrical
solid is equal to the integral of its cross-sectional area function,
$$v(S) = \int_a^b a_S(z) \, dz.$$
We can extend this formula to more general Cavalieri solids. Let $R$ be a Cavalieri solid
with measurable cross-sections perpendicular to a given line $L$. Introduce a coordinate
axis along $L$ (call it the $u$-axis), and let $a_R(u)$ be the area of the cross section cut by a plane
perpendicular to $L$ at the point $u$. The volume of $R$ can be computed by the following
theorem.
THEOREM 2.7. Let $R$ be a Cavalieri solid in $\mathscr{A}$ with a cross-sectional area function $a_R$ which
is integrable on an interval $[a, b]$ and zero outside $[a, b]$. Then the volume of $R$ is equal to
the integral of the cross-sectional area:
$$v(R) = \int_a^b a_R(u)\,du.$$
Proof. Choose step functions $s$ and $t$ such that $s \le a_R \le t$ on $[a, b]$ and define $s$ and $t$
to be zero outside $[a, b]$. For each subinterval of $[a, b]$ on which $s$ is constant, we can
imagine a cylindrical solid (for example, a right circular cylinder) constructed so that its
cross-sectional area on this subinterval has the same constant value as $s$. The union of these
cylinders over all intervals of constancy of $s$ is a solid $S$ whose volume $v(S)$ is, by additivity,
equal to the integral $\int_a^b s(u)\,du$. Similarly, there is a solid $T$, a union of cylinders, whose
volume $v(T) = \int_a^b t(u)\,du$. But $a_S(u) = s(u) \le a_R(u) \le t(u) = a_T(u)$ for all $u$ in $[a, b]$, so
Cavalieri’s principle implies that $v(S) \le v(R) \le v(T)$. In other words, $v(R)$ satisfies the
inequalities
$$\int_a^b s(u)\,du \le v(R) \le \int_a^b t(u)\,du$$
for all step functions $s$ and $t$ satisfying $s \le a_R \le t$ on $[a, b]$. Since $a_R$ is integrable on $[a, b]$,
it follows that $v(R) = \int_a^b a_R(u)\,du$.
EXAMPLE. Volume of a solid of revolution. Let $f$ be a function which is nonnegative and
integrable on an interval $[a, b]$. If the ordinate set of this function is revolved about the
$x$-axis, it sweeps out a solid of revolution. Each cross section cut by a plane perpendicular
to the $x$-axis is a circular disk. The area of the circular disk cut at the point $x$ is $\pi f^2(x)$,
where $f^2(x)$ means the square of $f(x)$. Therefore, by Theorem 2.7, the volume of the solid
(if the solid is in $\mathscr{A}$) is equal to the integral $\int_a^b \pi f^2(x)\,dx$, if the integral exists. In particular,
if $f(x) = \sqrt{r^2 - x^2}$ for $-r \le x \le r$, the ordinate set of $f$ is a semicircular disk of radius $r$
and the solid swept out is a sphere of radius $r$. The sphere is convex. Its volume is equal to
$$\int_{-r}^{r} \pi f^2(x)\,dx = \pi \int_{-r}^{r} (r^2 - x^2)\,dx = 2\pi \int_0^r (r^2 - x^2)\,dx = \tfrac{4}{3}\pi r^3.$$
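The sphere computation can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `midpoint_integral` and `volume_of_revolution`, the midpoint rule, and the radius value are all illustrative assumptions.

```python
import math

def midpoint_integral(func, a, b, n=10_000):
    """Approximate the integral of func over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def volume_of_revolution(f, a, b, n=10_000):
    """Integrate the cross-sectional area pi * f(x)^2 over [a, b] (Theorem 2.7)."""
    return midpoint_integral(lambda x: math.pi * f(x) ** 2, a, b, n)

r = 2.0  # hypothetical radius chosen for the check

def semicircle(x):
    # max(..., 0.0) guards against tiny negative values from rounding near x = -r or x = r
    return math.sqrt(max(r * r - x * x, 0.0))

approx = volume_of_revolution(semicircle, -r, r)
exact = 4.0 / 3.0 * math.pi * r ** 3
print(approx, exact)  # the two values should agree to several decimal places
```

The approximation only illustrates the formula; it does not replace the exact evaluation of the integral given above.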
More generally, suppose we have two nonnegative functions $f$ and $g$ which are integrable
on an interval $[a, b]$ and satisfy $f \le g$ on $[a, b]$. When the region between their graphs is
rotated about the $x$-axis, it sweeps out a solid of revolution such that each cross section cut
by a plane perpendicular to the $x$-axis at the point $x$ is an annulus (a region bounded by two
concentric circles) with area $\pi g^2(x) - \pi f^2(x)$. Therefore, if $g^2 - f^2$ is integrable, the volume
of such a solid (if the solid is in $\mathscr{A}$) is given by the integral
$$\int_a^b \pi\,[g^2(x) - f^2(x)]\,dx.$$
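As an illustration of the annulus formula, here is a small sketch under stated assumptions: the functions $f(x) = x$ and $g(x) = 1$ on $[0, 1]$ are hypothetical choices (a cone removed from a cylinder), and `washer_volume` is an illustrative name, not something defined in the text.

```python
import math

def midpoint_integral(func, a, b, n=10_000):
    """Approximate the integral of func over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def washer_volume(f, g, a, b, n=10_000):
    """Volume swept out by the region between f and g (with 0 <= f <= g) about the x-axis."""
    return midpoint_integral(lambda x: math.pi * (g(x) ** 2 - f(x) ** 2), a, b, n)

# Hypothetical example: inner radius f(x) = x, outer radius g(x) = 1 on [0, 1].
approx = washer_volume(lambda x: x, lambda x: 1.0, 0.0, 1.0)
exact = math.pi * (1.0 - 1.0 / 3.0)  # cylinder volume minus cone volume
print(approx, exact)
```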
2.13 Exercises
1. Use integration to compute the volume of a right circular cone generated by revolving the
ordinate set of a linear function $f(x) = cx$ over the interval $0 \le x \le b$. Show that the result
is one-third the area of the base times the altitude of the cone.
In each of Exercises 2 through 7, compute the volume of the solid generated by revolving the
ordinate set of the function $f$ over the interval indicated. Sketch each of the ordinate sets.
2. $f(x) = \sqrt{x}$, $0 \le x \le 1$.
3. $f(x) = x^{1/4}$, $0 \le x \le 1$.
4. $f(x) = x^2$, $-1 \le x \le 2$.
5. $f(x) = \sin x$, $0 \le x \le \pi$.
6. $f(x) = \cos x$, $0 \le x \le \pi/2$.
7. $f(x) = \sin x + \cos x$, $0 \le x \le \pi$.
In each of Exercises 8 through 11, sketch the region between the graphs of $f$ and $g$ and compute
the volume of the solid obtained by rotating this region about the $x$-axis.
8. $f(x) = \sqrt{x}$, $g(x) = 1$, $0 \le x \le 1$.
9. $f(x) = \sqrt{x}$, $g(x) = x^2$, $0 \le x \le 1$.
10. $f(x) = \sin x$, $g(x) = \cos x$, $0 \le x \le \pi/4$.
11. $f(x) = \sqrt{4 - x^2}$, $g(x) = 1$, $0 \le x \le \sqrt{3}$.
12. Sketch the graphs of $f(x) = \sqrt{x}$ and $g(x) = x/2$ over the interval $[0, 2]$. Find a number $t$,
$1 < t < 2$, so that when the region between the graphs of $f$ and $g$ over the interval $[0, t]$ is
rotated about the $x$-axis, it sweeps out a solid of revolution whose volume is equal to $\pi t^3/3$.
13. What volume of material is removed from a solid sphere of radius $2r$ by drilling a hole of radius
$r$ through the center?
14. A napkin-ring is formed by drilling a cylindrical hole symmetrically through the center of a
solid sphere. If the length of the hole is $2h$, prove that the volume of the napkin-ring is $\pi a h^3$,
where $a$ is a rational number.
15. A solid has a circular base of radius 2. Each cross section cut by a plane perpendicular to a
fixed diameter is an equilateral triangle. Compute the volume of the solid.
16. The cross sections of a solid are squares perpendicular to the $x$-axis with their centers on the
axis. If the square cut off at $x$ has edge $2x^2$, find the volume of the solid between $x = 0$ and
$x = a$. Make a sketch.
17. Find the volume of a solid whose cross section, made by a plane perpendicular to the $x$-axis,
has the area $ax^2 + bx + c$ for each $x$ in the interval $0 \le x \le h$. Express the volume in terms
of the areas $B_1$, $M$, and $B_2$ of the cross sections corresponding to $x = 0$, $x = h/2$, and $x = h$,
respectively. The resulting formula is known as the prismoid formula.
18. Make a sketch of the region in the $xy$-plane consisting of all points $(x, y)$ satisfying the
simultaneous inequalities $0 \le x \le 2$, $\tfrac{1}{4}x^2 \le y \le 1$. Compute the volume of the solid obtained by
rotating this region about (a) the $x$-axis; (b) the $y$-axis; (c) the vertical line passing through
$(2, 0)$; (d) the horizontal line passing through $(0, 1)$.
2.14 Application of integration to the concept of work
Thus far our applications of integration have been to area and volume, concepts from
geometry. Now we discuss an application to work, a concept from physics.
Work is a measure of the energy expended by a force in moving a particle from one point
to another. In this section we consider only the simplest case, linear motion. That is, we
assume that the motion takes place along a line (which we take as the x-axis) from one
point, say x = a, to another point, x = b, and we also assume that the force acts along this
line. We permit either a < b or b < a. We assume further that the force acting on the
particle is a function of the position. If the particle is at $x$, we denote by $f(x)$ the force acting
on it, where $f(x) > 0$ if the force acts in the direction of the positive $x$-axis, and $f(x) < 0$ if
the force acts in the opposite direction. When the force is constant, say $f(x) = c$ for all
$x$ between $a$ and $b$, we define the work done by $f$ to be the number $c \cdot (b - a)$, force times
displacement. The work may be positive or negative.
If force is measured in pounds and distance in feet, we measure work in foot-pounds;
if force is in dynes and distance in centimeters (the cgs system), work is measured in
dyne-centimeters. One dyne-centimeter of work is called an erg. If force is in newtons and
distance in meters (the mks system), work is in newton-meters. One newton-meter of work
is called a joule. One newton is $10^5$ dynes, and one joule is $10^7$ ergs.
EXAMPLE. A stone weighing 3 pounds (lb) is thrown upward along a straight line, rising
to a height of 15 feet (ft) and returning to the ground. We take the $x$-axis pointing up along
the line of motion. The constant force of gravity acts downward, so $f(x) = -3$ lb for each
$x$, $0 \le x \le 15$. The work done by gravity in moving the stone from, say, $x = 6$ ft to
$x = 15$ ft is $-3 \cdot (15 - 6) = -27$ foot-pounds (ft-lb). When the same stone falls from
$x = 15$ ft to $x = 6$ ft, the work done by gravity is $-3 \cdot (6 - 15) = 27$ ft-lb.
Now suppose the force is not necessarily constant but is a given function of position defined
on the interval joining $a$ and $b$. How do we define the work done by $f$ in moving a
particle from $a$ to $b$? We proceed much as we did for area and volume. We state some
properties of work which are dictated by physical requirements. Then we prove that for
any definition of work which has these properties, the work done by an integrable force
function $f$ is equal to the integral $\int_a^b f(x)\,dx$.
FUNDAMENTAL PROPERTIES OF WORK. Let $W_a^b(f)$ denote the work done by a force function
$f$ in moving a particle from $a$ to $b$. Then work has the following properties:
1. Additive property. If $a < c < b$, then $W_a^b(f) = W_a^c(f) + W_c^b(f)$.
2. Monotone property. If $f \le g$ on $[a, b]$, then $W_a^b(f) \le W_a^b(g)$. That is, a greater force
does greater work.
3. Elementary formula. If $f$ is constant, say $f(x) = c$ for all $x$ in the open interval $(a, b)$,
then $W_a^b(f) = c \cdot (b - a)$.
The additive property can be extended by induction to any finite number of intervals.
That is, if $a = x_0 < x_1 < \cdots < x_n = b$, we have
$$W_a^b(f) = \sum_{k=1}^{n} W_k,$$
where $W_k$ is the work done by $f$ from $x_{k-1}$ to $x_k$. In particular, if the force is a step function
$s$ which takes a constant value $s_k$ on the open interval $(x_{k-1}, x_k)$, property 3 states that
$W_k = s_k \cdot (x_k - x_{k-1})$, so we have
$$W_a^b(s) = \sum_{k=1}^{n} s_k \cdot (x_k - x_{k-1}) = \int_a^b s(x)\,dx.$$
Thus, for step functions, work has been expressed as an integral. Now it is an easy matter
to prove that this holds true more generally.
THEOREM 2.8. Suppose work has been defined for a class of force functions $f$ in such a
way that it satisfies properties 1, 2, and 3. Then the work done by an integrable force function
$f$ in moving a particle from $a$ to $b$ is equal to the integral of $f$,
$$W_a^b(f) = \int_a^b f(x)\,dx.$$
Proof. Let $s$ and $t$ be two step functions satisfying $s \le f \le t$ on $[a, b]$. The monotone
property of work states that $W_a^b(s) \le W_a^b(f) \le W_a^b(t)$. But $W_a^b(s) = \int_a^b s(x)\,dx$ and
$W_a^b(t) = \int_a^b t(x)\,dx$, so the number $W_a^b(f)$ satisfies the inequalities
$$\int_a^b s(x)\,dx \le W_a^b(f) \le \int_a^b t(x)\,dx$$
for all step functions $s$ and $t$ satisfying $s \le f \le t$ on $[a, b]$. Since $f$ is integrable on $[a, b]$,
it follows that $W_a^b(f) = \int_a^b f(x)\,dx$.
Note: Many authors simply define work to be the integral of the force function.
The foregoing discussion serves as motivation for this definition.
EXAMPLE. Work required to stretch a spring. Assume that the force $f(x)$ needed to
stretch a steel spring a distance $x$ beyond its natural length is proportional to $x$ (Hooke’s
law). We place the $x$-axis along the axis of the spring. If the stretching force acts in the
positive direction of the axis, we have $f(x) = cx$, where the spring constant $c$ is positive.
(The value of $c$ can be determined if we know the force $f(x)$ for a particular value of $x \ne 0$.)
The work required to stretch the spring a distance $a$ is $\int_0^a f(x)\,dx = \int_0^a cx\,dx = ca^2/2$, a
number proportional to the square of the displacement.
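The squeezing argument in the proof of Theorem 2.8 can be seen numerically for the spring. The sketch below is illustrative only: the spring constant `c`, the stretch `a`, and the helper `step_bounds` are assumed values and names, and the step functions are built from left and right endpoint values of the increasing force $f(x) = cx$.

```python
def step_bounds(f, a, b, n):
    """Integrals of a lower and an upper step function for an increasing f on [a, b]."""
    h = (b - a) / n
    lower = sum(f(a + k * h) * h for k in range(n))        # s takes left-endpoint values
    upper = sum(f(a + (k + 1) * h) * h for k in range(n))  # t takes right-endpoint values
    return lower, upper

c, a = 3.0, 2.0  # hypothetical spring constant and stretch distance

def force(x):
    return c * x

lower, upper = step_bounds(force, 0.0, a, 1000)
exact = c * a ** 2 / 2  # the value c a^2 / 2 from the example
print(lower, exact, upper)  # lower <= exact <= upper, and the gap shrinks as n grows
```

By the monotone property, any admissible definition of work must assign to this force a value between the two step-function integrals, which is why the integral is the only possible answer.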
A discussion of work for motion along curves other than straight lines is carried out in
Volume II with the aid of line integrals.
2.15 Exercises
In Exercises 1 and 2 assume the force on the spring obeys Hooke’s law.
1. If a ten-pound force stretches an elastic spring one inch, how much work is done in stretching
the spring one foot?
2. A spring has a natural length of 1 meter (m). A force of 100 newtons compresses it to 0.9 m.
How many joules of work are required to compress it to half its natural length? What is the
length of the spring when 20 joules of work have been expended?
3. A particle is moved along the $x$-axis by a propelling force $f(x) = 3x^2 + 4x$ newtons. Calculate
how many joules of work are done by the force to move the particle (a) from $x = 0$ to $x = 7$ m;
(b) from $x = 2$ m to $x = 7$ m.
4. A particle is to be moved along the $x$-axis by a quadratic propelling force $f(x) = ax^2 + bx$
dynes. Calculate $a$ and $b$ so that 900 ergs of work are required to move the particle 10 centimeters
(cm) from the origin, if the force is 65 dynes when $x = 5$ cm.
5. A cable 50 feet in length and weighing 4 pounds per foot (lb/ft) hangs from a windlass. Calculate
the work done in winding up 25 ft of the cable. Neglect all forces except gravity.
6. Solve Exercise 5 if a 50 pound weight is attached to the end of the cable.
7. A weight of 150 pounds is attached at one end of a long flexible chain weighing 2 lb/ft. The
weight is initially suspended with 10 feet of chain over the edge of a building 100 feet in height.
Neglect all forces except gravity and calculate the amount of work done by the force of gravity
when the load is lowered to a position 10 feet above the ground.
8. In Exercise 7, suppose that the chain is only 60 feet long and that the load and chain are allowed
to drop to the ground, starting from the same initial position as before. Calculate the amount
of work done by the force of gravity when the weight reaches the ground.
9. Let $V(q)$ denote the voltage required to place a charge $q$ on the plates of a condenser. The work
required to charge a condenser from $q = a$ to $q = b$ is defined to be the integral $\int_a^b V(q)\,dq$.
If the voltage is proportional to the charge, prove that the work done to place a charge $Q$ on
an uncharged condenser is $\tfrac{1}{2} Q V(Q)$.
2.16 Average value of a function
In scientific work it is often necessary to make several measurements under similar
conditions and then compute an average or mean for the purpose of summarizing the data.
There are many useful types of averages, the most common being the arithmetic mean. If
$a_1, a_2, \ldots, a_n$ are $n$ real numbers, their arithmetic mean $\bar{a}$ is defined by the equation
(2.17)  $$\bar{a} = \frac{1}{n} \sum_{k=1}^{n} a_k.$$
If the numbers $a_k$ are the values of a function $f$ at $n$ distinct points, say $a_k = f(x_k)$, then the
number
$$\frac{1}{n} \sum_{k=1}^{n} f(x_k)$$
is the arithmetic mean of the function values $f(x_1), \ldots, f(x_n)$. We can extend this concept
to compute an average value not only for a finite number of values of $f(x)$ but for all values
of $f(x)$ where $x$ runs through an interval. The following definition serves this purpose.
DEFINITION OF AVERAGE VALUE OF A FUNCTION ON AN INTERVAL. If $f$ is integrable on
an interval $[a, b]$, we define $A(f)$, the average value of $f$ on $[a, b]$, by the formula
(2.18)  $$A(f) = \frac{1}{b - a} \int_a^b f(x)\,dx.$$
When $f$ is nonnegative, this formula has a simple geometric interpretation. Written in
the form $(b - a)A(f) = \int_a^b f(x)\,dx$, it states that the rectangle of altitude $A(f)$ and base
$[a, b]$ has an area equal to that of the ordinate set of $f$ over $[a, b]$.
Now we can show that formula (2.18) is actually an extension of the concept of the
arithmetic mean. Let $f$ be a step function which is constant on $n$ equal subintervals of
$[a, b]$. Specifically, let $x_k = a + k(b - a)/n$ for $k = 0, 1, 2, \ldots, n$, and suppose that
$f(x) = f(x_k)$ if $x_{k-1} < x < x_k$. Then $x_k - x_{k-1} = (b - a)/n$, so we have
$$A(f) = \frac{1}{b - a} \int_a^b f(x)\,dx = \frac{1}{b - a} \sum_{k=1}^{n} f(x_k)\,\frac{b - a}{n} = \frac{1}{n} \sum_{k=1}^{n} f(x_k).$$
Thus, for step functions, the average $A(f)$ is the same as the arithmetic mean of the values
$f(x_1), \ldots, f(x_n)$ taken on the intervals of constancy.
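The agreement between formula (2.18) and the arithmetic mean for step functions can be checked with a short computation. This is a sketch under assumptions: the list `values`, the interval, and the helper names are made up for illustration, and the integral is approximated by a midpoint sum whose subintervals line up with the intervals of constancy.

```python
def average_value(f, a, b, n=4000):
    """Approximate A(f) = (1 / (b - a)) * integral of f over [a, b] by a midpoint sum."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h / (b - a)

values = [2.0, 5.0, -1.0, 4.0]  # hypothetical constant values on 4 equal subintervals
a, b = 0.0, 2.0
pieces = len(values)

def step(x):
    """Step function taking values[k] on the k-th of the equal subintervals of [a, b]."""
    k = min(int((x - a) / ((b - a) / pieces)), pieces - 1)
    return values[k]

avg = average_value(step, a, b)
mean = sum(values) / pieces
print(avg, mean)  # both should be 2.5 up to rounding
```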
Weighted arithmetic means are often used in place of the ordinary arithmetic mean in
(2.17). If $w_1, w_2, \ldots, w_n$ are $n$ nonnegative numbers (called weights), not all zero, the
weighted arithmetic mean $\bar{a}$ of $a_1, a_2, \ldots, a_n$ is defined by the formula
$$\bar{a} = \frac{\sum_{k=1}^{n} w_k a_k}{\sum_{k=1}^{n} w_k}.$$
When the weights are all equal, this reduces to the ordinary arithmetic mean. The extension
of this concept to integrable functions is given by the formula
(2.19)  $$A(f) = \frac{\int_a^b w(x) f(x)\,dx}{\int_a^b w(x)\,dx},$$
where $w$ is a nonnegative weight function with $\int_a^b w(x)\,dx \ne 0$.
Weighted averages are widely used in physics and engineering, as well as in mathematics.
For example, consider a straight rod of length $a$ made of a material of varying density.
Place the rod along the positive $x$-axis with one end at the origin 0, and let $m(x)$ denote the
mass of a portion of the rod of length $x$, measured from 0. If $m(x) = \int_0^x \rho(t)\,dt$ for some
integrable function $\rho$ ($\rho$ is the Greek letter rho), then $\rho$ is called the mass density of the rod.
A uniform rod is one whose mass density is constant. The integral $\int_0^a x\rho(x)\,dx$ is called the
first moment of the rod about 0, and the center of mass is the point whose $x$-coordinate is
$$\bar{x} = \frac{\int_0^a x\rho(x)\,dx}{\int_0^a \rho(x)\,dx}.$$
This is an example of a weighted average. We are averaging the distance function $f(x) = x$
with the mass density $\rho$ as weight function.
The integral $\int_0^a x^2\rho(x)\,dx$ is called the second moment, or moment of inertia, of the rod
about 0, and the positive number $r$ given by the formula
$$r^2 = \frac{\int_0^a x^2\rho(x)\,dx}{\int_0^a \rho(x)\,dx}$$
is called the radius of gyration of the rod. In this case, the function being averaged is the
square of the distance function, $f(x) = x^2$, with the mass density $\rho$ as the weight function.
Weighted averages like these also occur in the mathematical theory of probability where
the concepts of expectation and variance play the same role as center of mass and moment
of inertia.
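Both the center of mass and the radius of gyration are instances of the weighted average (2.19), which the following sketch makes concrete. The density $\rho(x) = 1 + x$, the rod length 2, and the helper names are hypothetical choices for illustration; for this density the exact values work out to $\bar{x} = 7/6$ and $r^2 = 5/3$.

```python
import math

def midpoint_integral(func, a, b, n=10_000):
    """Approximate the integral of func over [a, b] with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def weighted_average(f, w, a, b):
    """Weighted average of f with weight function w on [a, b], as in formula (2.19)."""
    return midpoint_integral(lambda x: w(x) * f(x), a, b) / midpoint_integral(w, a, b)

length = 2.0  # hypothetical rod length

def density(x):
    return 1.0 + x  # hypothetical mass density rho(x) = 1 + x

center_of_mass = weighted_average(lambda x: x, density, 0.0, length)
r_squared = weighted_average(lambda x: x * x, density, 0.0, length)
radius_of_gyration = math.sqrt(r_squared)
print(center_of_mass, r_squared, radius_of_gyration)
```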
2.17 Exercises
In Exercises 1 through 10, compute the average $A(f)$ for the given function $f$ over the specified
interval.
1. $f(x) = x^2$, $a \le x \le b$.
2. $f(x) = x^2 + x^3$, $0 \le x \le 1$.
3. $f(x) = x^{1/2}$, $0 \le x \le 4$.
4. $f(x) = x^{1/3}$, $1 \le x \le 8$.
5. $f(x) = \sin x$, $0 \le x \le \pi/2$.
6. $f(x) = \cos x$, $-\pi/2 \le x \le \pi/2$.
7. $f(x) = \sin 2x$, $0 \le x \le \pi/2$.
8. $f(x) = \sin x \cos x$, $0 \le x \le \pi/4$.
9. $f(x) = \sin^2 x$, $0 \le x \le \pi/2$.
10. $f(x) = \cos^2 x$, $0 \le x \le \pi$.
11. (a) If $f(x) = x^2$ for $0 \le x \le a$, find a number $c$ satisfying $0 < c < a$ such that $f(c)$ is equal to
the average of $f$ in $[0, a]$.
(b) Solve part (a) if $f(x) = x^n$, where $n$ is any positive integer.
12. Let $f(x) = x^2$ for $0 \le x \le 1$. The average value of $f$ on $[0, 1]$ is $\tfrac{1}{3}$. Find a nonnegative weight
function $w$ such that the weighted average of $f$ on $[0, 1]$, as defined by Equation (2.19), is
(a) $\tfrac{1}{2}$; (b) $\tfrac{3}{5}$; (c) $\tfrac{2}{3}$.
13. Let $A(f)$ denote the average of $f$ over an interval $[a, b]$. Prove that the average has the following
properties:
(a) Additive property: $A(f + g) = A(f) + A(g)$.
(b) Homogeneous property: $A(cf) = cA(f)$ if $c$ is any real number.
(c) Monotone property: $A(f) \le A(g)$ if $f \le g$ on $[a, b]$.
14. Which of the properties in Exercise 13 are valid for weighted averages as defined by Equation
(2.19)?
15. Let $A_a^b(f)$ denote the average of $f$ on an interval $[a, b]$.
(a) If $a < c < b$, prove that there is a number $t$ satisfying $0 < t < 1$ such that $A_a^b(f) =
tA_a^c(f) + (1 - t)A_c^b(f)$. Thus, $A_a^b(f)$ is a weighted arithmetic mean of $A_a^c(f)$ and $A_c^b(f)$.
(b) Prove that the result of part (a) also holds for weighted averages as defined by Equation
(2.19).
Each of Exercises 16 through 21 refers to a rod of length $L$ placed on the $x$-axis with one end at
the origin. For the mass density $\rho$ as described in each case, calculate (a) the center of mass of the
rod, (b) the moment of inertia about the origin, and (c) the radius of gyration.
16. $\rho(x) = 1$ for $0 \le x \le L$.
17. $\rho(x) = 1$ for $0 \le x \le L/2$, $\rho(x) = 2$ for $L/2 < x \le L$.
18. $\rho(x) = x$ for $0 \le x \le L$.
19. $\rho(x) = x$ for $0 \le x \le L/2$, $\rho(x) = L/2$ for $L/2 \le x \le L$.
20. $\rho(x) = x^2$ for $0 \le x \le L$.
21. $\rho(x) = x^2$ for $0 \le x \le L/2$, $\rho(x) = L^2/4$ for $L/2 \le x \le L$.
22. Determine a mass density $\rho$ so that the center of mass of a rod of length $L$ will be at a distance
$L/4$ from one end of the rod.
23. In an electrical circuit, the voltage $e(t)$ at time $t$ is given by the formula $e(t) = 3 \sin 2t$.
Calculate the following: (a) the average voltage over the time interval $[0, \pi/2]$; (b) the
root-mean-square of the voltage; that is, the square root of the average of the function $e^2$ in the interval
$[0, \pi/2]$.
24. In an electrical circuit, the voltage $e(t)$ and the current $i(t)$ at time $t$ are given by the formulas
$e(t) = 160 \sin t$, $i(t) = 2 \sin (t - \pi/6)$. The average power is defined to be
$$\frac{1}{T} \int_0^T e(t)\,i(t)\,dt,$$
where $T$ is the period of both the voltage and the current. Determine $T$ and calculate the
average power.
2.18 The integral as a function of the upper limit. Indefinite integrals
In this section we assume that $f$ is a function such that the integral $\int_a^x f(t)\,dt$ exists for each
$x$ in an interval $[a, b]$. We shall keep $a$ and $f$ fixed and study this integral as a function of $x$.
We denote the value of the integral by $A(x)$, so that we have
(2.20)  $$A(x) = \int_a^x f(t)\,dt \quad \text{if } a \le x \le b.$$
An equation like this enables us to construct a new function $A$ from a given function $f$, the
value of $A$ at each point in $[a, b]$ being determined by Equation (2.20). The function $A$ is
sometimes referred to as an indefinite integral of $f$, and it is said to be obtained from $f$ by
integration. We say an indefinite integral rather than the indefinite integral because $A$ also
depends on the lower limit $a$. Different values of $a$ will lead to different functions $A$. If we
use a different lower limit, say $c$, and define another indefinite integral $F$ by the equation
$$F(x) = \int_c^x f(t)\,dt,$$
then the additive property tells us that
$$A(x) - F(x) = \int_a^x f(t)\,dt - \int_c^x f(t)\,dt = \int_a^c f(t)\,dt,$$
and hence the difference $A(x) - F(x)$ is independent of $x$. Therefore any two indefinite
integrals of the same function differ only by a constant (the constant depends on the choice
of $a$ and $c$).
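A quick numerical experiment illustrates that two indefinite integrals of the same function differ by a constant. This is a sketch with assumed ingredients: the integrand $\cos t$, the lower limits 0 and 1, and the helper `indefinite_integral` are illustrative choices, and the integrals are approximated by midpoint sums.

```python
import math

def midpoint_integral(func, a, b, n=10_000):
    """Approximate the integral of func from a to b (b < a gives the negated value)."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def indefinite_integral(f, lower):
    """Return the function x -> integral of f from `lower` to x."""
    return lambda x: midpoint_integral(f, lower, x)

A = indefinite_integral(math.cos, 0.0)  # lower limit a = 0
F = indefinite_integral(math.cos, 1.0)  # lower limit c = 1

# The difference A(x) - F(x) should equal the integral of cos from 0 to 1, i.e. sin 1,
# regardless of x.
differences = [A(x) - F(x) for x in (-1.0, 0.5, 2.0, 3.0)]
print(differences, math.sin(1.0))
```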
When an indefinite integral of $f$ is known, the value of an integral such as $\int_a^b f(t)\,dt$ may
be evaluated by a simple subtraction. For example, if $n$ is a nonnegative integer, we have
the formula of Theorem 1.15,
$$\int_0^x t^n\,dt = \frac{x^{n+1}}{n+1},$$
and the additive property implies that
$$\int_a^b t^n\,dt = \int_0^b t^n\,dt - \int_0^a t^n\,dt = \frac{b^{n+1} - a^{n+1}}{n+1}.$$
In general, if $F(x) = \int_c^x f(t)\,dt$, then we have
(2.21)  $$\int_a^b f(t)\,dt = \int_c^b f(t)\,dt - \int_c^a f(t)\,dt = F(b) - F(a).$$
A different choice of $c$ merely changes $F(x)$ by a constant; this does not alter the difference
$F(b) - F(a)$, because the constant cancels out in the subtraction.
If we use the special symbol $F(x)\big|_a^b$ to denote the difference $F(b) - F(a)$, Equation (2.21)
may be written as
$$\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a).$$
There is, of course, a very simple geometric relationship between a function f and its
indefinite integrals. An example is illustrated in Figure 2.15(a), where f is a nonnegative
function and the number A(x) is equal to the area of the shaded region under the graph of
f from a to x. If f assumes both positive and negative values, as in Figure 2.15(b), the
integral A(x) gives the sum of the areas of the regions above the x-axis minus the sum of
the areas below the x-axis.
Many of the functions that occur in various branches of science arise exactly in this way,
as indefinite integrals of other functions.
This is one of the reasons that a large part of
calculus is devoted to the study of indefinite integrals.
Sometimes a knowledge of a special property of $f$ implies a corresponding special property
of the indefinite integral. For example, if $f$ is nonnegative on $[a, b]$, then the indefinite
integral $A$ is increasing, since we have
$$A(y) - A(x) = \int_a^y f(t)\,dt - \int_a^x f(t)\,dt = \int_x^y f(t)\,dt \ge 0,$$
whenever $a \le x \le y \le b$. Interpreted geometrically, this means that the area under the
graph of a nonnegative function from $a$ to $x$ cannot decrease as $x$ increases.
FIGURE 2.15 Indefinite integral interpreted geometrically in terms of area.
FIGURE 2.16 Geometric interpretation of convexity and concavity. (a) A convex function. (b) A concave function.
Now we discuss another property which is not immediately evident geometrically.
Suppose $f$ is increasing on $[a, b]$. We can prove that the indefinite integral $A$ has a property
known as convexity. Its graph bends upward, as illustrated in Figure 2.16(a); that is, the
chord joining any two points on the graph always lies above the graph. An analytic
definition of convexity may be given as follows.
DEFINITION OF A CONVEX FUNCTION. A function $g$ is said to be convex on an interval
$[a, b]$ if, for all $x$ and $y$ in $[a, b]$ and for every $\alpha$ satisfying $0 < \alpha < 1$, we have
(2.22)  $$g(z) \le \alpha g(y) + (1 - \alpha) g(x), \quad \text{where } z = \alpha y + (1 - \alpha)x.$$
We say $g$ is concave on $[a, b]$ if the reverse inequality holds,
$$g(z) \ge \alpha g(y) + (1 - \alpha) g(x), \quad \text{where } z = \alpha y + (1 - \alpha)x.$$
These inequalities have a simple geometric interpretation. The point $z = \alpha y + (1 - \alpha)x$
satisfies $z - x = \alpha(y - x)$. If $x < y$, this point divides the interval $[x, y]$ into two
subintervals, $[x, z]$ and $[z, y]$, the length of $[x, z]$ being $\alpha$ times that of $[x, y]$. As $\alpha$ runs from 0
to 1, the point $\alpha g(y) + (1 - \alpha)g(x)$ traces out the line segment joining the points $(x, g(x))$
and $(y, g(y))$ on the graph of $g$. Inequality (2.22) states that the graph of $g$ never goes above
this line segment. Figure 2.16(a) shows an example with $\alpha = \tfrac{1}{2}$. For a concave function,
the graph never goes below the line segment, as illustrated by the example in Figure 2.16(b).
THEOREM 2.9. Let $A(x) = \int_a^x f(t)\,dt$. Then $A$ is convex on every interval where $f$ is
increasing, and concave on every interval where $f$ is decreasing.
Proof. Assume $f$ is increasing on $[a, b]$, choose $x < y$, and let $z = \alpha y + (1 - \alpha)x$. We
are to prove that $A(z) \le \alpha A(y) + (1 - \alpha)A(x)$. Since $A(z) = \alpha A(z) + (1 - \alpha)A(z)$, this
is the same as proving that $\alpha A(z) + (1 - \alpha)A(z) \le \alpha A(y) + (1 - \alpha)A(x)$, or that
$$(1 - \alpha)[A(z) - A(x)] \le \alpha[A(y) - A(z)].$$
Since we have $A(z) - A(x) = \int_x^z f(t)\,dt$ and $A(y) - A(z) = \int_z^y f(t)\,dt$, we are to prove that
(2.23)  $$(1 - \alpha) \int_x^z f(t)\,dt \le \alpha \int_z^y f(t)\,dt.$$
But $f$ is increasing, so we have the inequalities
$$f(t) \le f(z) \ \text{ if } x \le t \le z, \quad \text{and} \quad f(z) \le f(t) \ \text{ if } z \le t \le y.$$
Integrating these inequalities we find
$$\int_x^z f(t)\,dt \le f(z)(z - x), \quad \text{and} \quad f(z)(y - z) \le \int_z^y f(t)\,dt.$$
But $(1 - \alpha)(z - x) = \alpha(y - z)$, so these inequalities give us
$$(1 - \alpha) \int_x^z f(t)\,dt \le (1 - \alpha) f(z)(z - x) = \alpha f(z)(y - z) \le \alpha \int_z^y f(t)\,dt,$$
which proves (2.23). This proves that $A$ is convex when $f$ is increasing. When $f$ is decreasing,
we may apply the result just proved to $-f$.
EXAMPLE. The cosine function decreases in the interval $[0, \pi]$. Since $\sin x = \int_0^x \cos t\,dt$,
the graph of the sine function is concave in the interval $[0, \pi]$. In the interval $[\pi, 2\pi]$, the
cosine increases and the sine function is convex.
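The example can be probed numerically by testing the defining inequalities at sample points. The sketch below is only a spot check, not a proof: the function names, the sample endpoints, and the tolerance are assumptions made for illustration, and the sine function stands in for the indefinite integral $\int_0^x \cos t\,dt$.

```python
import math

def chord_gap(g, x, y, alpha):
    """Return chord value minus graph value at z = alpha*y + (1 - alpha)*x."""
    z = alpha * y + (1 - alpha) * x
    return alpha * g(y) + (1 - alpha) * g(x) - g(z)

def looks_convex(g, x, y, steps=50, tol=1e-12):
    """True if g(z) <= alpha*g(y) + (1 - alpha)*g(x) at every sampled alpha in (0, 1)."""
    return all(chord_gap(g, x, y, i / steps) >= -tol for i in range(1, steps))

def looks_concave(g, x, y, steps=50, tol=1e-12):
    """True if the reverse inequality holds at every sampled alpha in (0, 1)."""
    return all(chord_gap(g, x, y, i / steps) <= tol for i in range(1, steps))

print(looks_concave(math.sin, 0.2, 3.0))  # sample points inside [0, pi]
print(looks_convex(math.sin, 3.5, 6.0))   # sample points inside [pi, 2*pi]
```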
Figure 2.17 illustrates further properties of indefinite integrals. The graph on the left is
that of the greatest-integer function, $f(x) = [x]$; the graph on the right is that of the
indefinite integral $A(x) = \int_0^x [t]\,dt$. On those intervals where $f$ is constant, the function $A$
is linear. We describe this by saying that the integral of a step function is piecewise linear.
FIGURE 2.17 The indefinite integral of a step function is piecewise linear.
Observe also that the graph of $f$ is made up of disconnected line segments. There are
points on the graph of $f$ where a small change in $x$ produces a sudden jump in the value of
the function. Note, however, that the corresponding indefinite integral does not exhibit
this behavior. A small change in $x$ produces only a small change in $A(x)$. That is why the
graph of $A$ is not disconnected. This illustrates a general property of indefinite integrals
known as continuity. In the next chapter we shall discuss the concept of continuity in
detail and prove that the indefinite integral is always a continuous function.
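For $x \ge 0$ the indefinite integral of the greatest-integer function has a simple closed form, which makes the contrast between the jumping integrand and the unbroken integral easy to see. The formula used below, $A(x) = \tfrac{1}{2}n(n-1) + n(x - n)$ with $n = [x]$, is derived here by adding the areas of the unit-width steps and is not quoted from the text; the function name is likewise an illustrative assumption.

```python
import math

def integral_of_floor(x):
    """A(x) = integral of [t] from 0 to x, for x >= 0, using n = [x]."""
    if x < 0:
        raise ValueError("this sketch only covers x >= 0")
    n = math.floor(x)
    # Full steps contribute 0 + 1 + ... + (n - 1); the last partial step has height n.
    return n * (n - 1) / 2 + n * (x - n)

# The integrand jumps at x = 3, but the integral changes only slightly there.
just_below, at_jump = 3 - 1e-9, 3.0
print(math.floor(just_below), math.floor(at_jump))
print(integral_of_floor(just_below), integral_of_floor(at_jump))
```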
2.19 Exercises
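Evaluate the integrals in Exercises 1 through 16.
1. $\int_0^x (1 + t + t^2)\,dt$.
2. $\int_0^{2y} (1 + t + t^2)\,dt$.
3. $\int_{-1}^{2x} (1 + t + t^2)\,dt$.
4. $\int_1^{1-x} (1 - 2t + 3t^2)\,dt$.
5. $\int_{-2}^{x} t^2(t^2 + 1)\,dt$.
6. $\int_x^{x^2} (t^2 + 1)^2\,dt$.
7. $\int_1^x (t^{1/2} + 1)\,dt$, $x > 0$.
8. $\int_x^{x^2} (t^{1/2} + t^{1/4})\,dt$, $x > 0$.
9. $\int_{-\pi}^{x} \cos t\,dt$.
10. $\int_0^{x^2} (\tfrac{1}{2} + \cos t)\,dt$.
11. $\int_x^{x^2} (\tfrac{1}{2} - \sin t)\,dt$.
12. $\int_0^x (u^2 + \sin 3u)\,du$.
13. $\int_x^{x^2} (v^2 + \sin 3v)\,dv$.
14. $\int_0^y (\sin^2 x + x)\,dx$.
15. $\int_0^x (\sin 2w + \cos \tfrac{w}{2})\,dw$.
16. $\int_{-\pi}^{x} (\tfrac{1}{2} + \cos t)^2\,dt$.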
17. Find all real values of $x$ such that
$$\int_0^x (t^3 - t)\,dt = \tfrac{1}{3} \int_{\sqrt{2}}^{x} (t - t^3)\,dt.$$
Draw a suitable figure and interpret the equation geometrically.
18. Let $f(x) = x - [x] - \tfrac{1}{2}$ if $x$ is not an integer, and let $f(x) = 0$ if $x$ is an integer. (As usual,
$[x]$ denotes the greatest integer $\le x$.) Define a new function $P$ as follows:
$$P(x) = \int_0^x f(t)\,dt \quad \text{for every real } x.$$
(a) Draw the graph of $f$ over the interval $[-3, 3]$ and prove that $f$ is periodic with period 1:
$f(x + 1) = f(x)$ for all $x$.
(b) Prove that $P(x) = \tfrac{1}{2}(x^2 - x)$, if $0 \le x \le 1$ and that $P$ is periodic with period 1.
(c) Express $P(x)$ in terms of $[x]$.
(d) Determine a constant $c$ such that $\int_0^1 (P(t) + c)\,dt = 0$.
(e) For the constant $c$ of part (d), let $Q(x) = \int_0^x (P(t) + c)\,dt$. Prove that $Q$ is periodic with
period 1 and that
$$Q(x) = \frac{x^3}{6} - \frac{x^2}{4} + \frac{x}{12} \quad \text{if } 0 \le x \le 1.$$
19. Given an odd function $f$, defined everywhere, periodic with period 2, and integrable on every
interval. Let $g(x) = \int_0^x f(t)\,dt$.
(a) Prove that $g(2n) = 0$ for every integer $n$.
(b) Prove that $g$ is even and periodic with period 2.
20. Given an even function $f$, defined everywhere, periodic with period 2, and integrable on every
interval. Let $g(x) = \int_0^x f(t)\,dt$, and let $A = g(1)$.
(a) Prove that $g$ is odd and that $g(x + 2) - g(x) = g(2)$.
(b) Compute $g(2)$ and $g(5)$ in terms of $A$.
(c) For what value of $A$ will $g$ be periodic with period 2?
21. Given two functions $f$ and $g$, integrable on every interval and having the following properties:
$f$ is odd, $g$ is even, $f(5) = 7$, $f(0) = 0$, $g(x) = f(x + 5)$, $f(x) = \int_0^x g(t)\,dt$ for all $x$. Prove
that (a) $f(x - 5) = -g(x)$ for all $x$; (b) $\int_0^5 f(t)\,dt = 7$; (c) $\int_0^x f(t)\,dt = g(0) - g(x)$.
3
CONTINUOUS FUNCTIONS
3.1 Informal description of continuity
This chapter deals with the concept of continuity, one of the most important and also
one of the most fascinating ideas in all of mathematics. Before we give a precise technical
definition of continuity, we shall briefly discuss the concept in an informal and intuitive
way to give the reader a feeling for its meaning.
Roughly speaking, the situation is this: Suppose a function $f$ has the value $f(p)$ at a
certain point $p$. Then $f$ is said to be continuous at $p$ if at every nearby point $x$ the function
value $f(x)$ is close to $f(p)$. Another way of putting it is as follows: If we let $x$ move toward
$p$, we want the corresponding function values $f(x)$ to become arbitrarily close to $f(p)$,
regardless of the manner in which $x$ approaches $p$. We do not want sudden jumps in the
values of a continuous function, as in the examples in Figure 3.1.
FIGURE 3.1 Illustrating two kinds of discontinuities. (a) A jump discontinuity at each integer. (b) An infinite discontinuity at 0.
Figure 3.1(a) shows the graph of the function $f$ defined by the equation $f(x) = x - [x]$,
where $[x]$ denotes the greatest integer $\le x$. At each integer we have what is known as a
jump discontinuity. For example, $f(2) = 0$, but as $x$ approaches 2 from the left, $f(x)$
approaches the value 1, which is not equal to $f(2)$. Therefore we have a discontinuity at 2.
Note that $f(x)$ does approach $f(2)$ if we let $x$ approach 2 from the right, but this by itself
is not enough to establish continuity at 2. In a case like this, the function is called continuous
from the right at 2 and discontinuous from the left at 2. Continuity at a point requires both
continuity from the left and from the right.
In the early development of calculus almost all functions that were dealt with were
continuous and there was no real need at that time for a penetrating look into the exact
meaning of continuity. It was not until late in the 18th Century that discontinuous functions
began appearing in connection with various kinds of physical problems. In particular, the
work of J. B. J. Fourier (1758-1830) on the theory of heat forced mathematicians of the
early 19th Century to examine more carefully the exact meaning of such concepts as function
and continuity. Although the meaning of the word “continuous” seems intuitively clear
to most people, it is not obvious how a good definition of this idea should be formulated.
One popular dictionary explains continuity as follows :
Continuity: Quality or state of being continuous.
Continuous: Having continuity of parts.
Trying to learn the meaning of continuity from these two statements alone is like trying to
learn Chinese with only a Chinese dictionary. A satisfactory mathematical definition of
continuity, expressed entirely in terms of properties of the real-number system, was first
formulated in 1821 by the French mathematician, Augustin-Louis Cauchy (1789-1857).
His definition, which is still used today, is most easily explained in terms of the limit concept
to which we turn now.
3.2 The definition of the limit of a function
Let $f$ be a function defined in some open interval containing a point $p$, although we do
not insist that $f$ be defined at the point $p$ itself. Let $A$ be a real number. The equation
$$\lim_{x \to p} f(x) = A$$
is read: “The limit of $f(x)$, as $x$ approaches $p$, is equal to $A$,” or “$f(x)$ approaches $A$ as $x$
approaches $p$.” It is also written without the limit symbol, as follows:
$$f(x) \to A \quad \text{as} \quad x \to p.$$
This symbolism is intended to convey the idea that we can make $f(x)$ as close to $A$ as we
please, provided we choose $x$ sufficiently close to $p$.
Our first task is to explain the meaning of these symbols entirely in terms of real numbers.
We shall do this in two stages. First we introduce the concept of a neighborhood of a point,
then we define limits in terms of neighborhoods.
DEFINITION OF NEIGHBORHOOD OF A POINT. Any open interval containing a point p as
its midpoint is called a neighborhood of p.
Notation. We denote neighborhoods by N(p), $N_1(p)$, $N_2(p)$, etc. Since a neighborhood
N(p) is an open interval symmetric about p, it consists of all real x satisfying p - r < x <
p + r for some r > 0. The positive number r is called the radius of the neighborhood. We
designate N(p) by N(p; r) if we wish to specify its radius. The inequalities p - r < x <
p + r are equivalent to -r < x - p < r, and to |x - p| < r. Thus, N(p; r) consists of
all points x whose distance from p is less than r.
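The two descriptions of N(p; r) given above can be compared side by side in a short sketch. The function names `in_neighborhood` and `in_open_interval` are illustrative choices, not notation from the text, and the sample points are chosen to be exactly representable in floating point.

```python
def in_neighborhood(x: float, p: float, r: float) -> bool:
    """Return True when x lies in N(p; r), that is, when |x - p| < r."""
    if r <= 0:
        raise ValueError("the radius r must be positive")
    return abs(x - p) < r


def in_open_interval(x: float, p: float, r: float) -> bool:
    """The same membership test written as p - r < x < p + r."""
    return p - r < x < p + r


# N(2; 0.5) is the open interval (1.5, 2.5); its endpoints are excluded.
samples = [1.5, 1.75, 2.0, 2.25, 2.5, 3.0]
memberships = [in_neighborhood(x, 2.0, 0.5) for x in samples]
agree = all(
    in_neighborhood(x, 2.0, 0.5) == in_open_interval(x, 2.0, 0.5) for x in samples
)
```

Both tests agree on every sample, which reflects the equivalence of p - r < x < p + r and |x - p| < r.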
In the next definition, we assume that A is a real number and that f is a function defined
on some neighborhood of a point p (except possibly at p). The function f may also be
defined at p but this is irrelevant in the definition.
DEFINITION OF LIMIT OF A FUNCTION. The symbolism
$$\lim_{x \to p} f(x) = A \qquad [\text{or } f(x) \to A \text{ as } x \to p]$$
means that for every neighborhood $N_1(A)$ there is some neighborhood $N_2(p)$ such that
$$f(x) \in N_1(A) \quad \text{whenever } x \in N_2(p) \text{ and } x \ne p. \tag{3.1}$$
The first thing to note about this definition is that it involves two neighborhoods, $N_1(A)$
and $N_2(p)$. The neighborhood $N_1(A)$ is specified first; it tells us how close we wish f(x) to
be to the limit A. The second neighborhood, $N_2(p)$, tells us how close x should be to p so
that f(x) will be within the first neighborhood $N_1(A)$. The essential part of the definition
is that, for every $N_1(A)$, no matter how small, there is some neighborhood $N_2(p)$ to satisfy
(3.1). In general, the neighborhood $N_2(p)$ will depend on the choice of $N_1(A)$. A neighborhood
$N_2(p)$ that works for one particular $N_1(A)$ will also work, of course, for every larger
$N_1(A)$, but it may not be suitable for any smaller $N_1(A)$.
FIGURE 3.2 Here $\lim_{x \to p} f(x) = A$, but there is no assertion about f at p.
FIGURE 3.3 Here f is defined at p and $\lim_{x \to p} f(x) = f(p)$, hence f is continuous at p.
The definition of limit may be illustrated geometrically as in Figure 3.2. A neighborhood
$N_1(A)$ is shown on the y-axis. A neighborhood $N_2(p)$ corresponding to $N_1(A)$ is shown on
the x-axis. The shaded rectangle consists of all points (x, y) for which $x \in N_2(p)$ and
$y \in N_1(A)$. The definition of limit asserts that the entire graph of f above the interval $N_2(p)$
lies within this rectangle, except possibly for the point on the graph above p itself.
The definition of limit can also be formulated in terms of the radii of the neighborhoods
$N_1(A)$ and $N_2(p)$. It is customary to denote the radius of $N_1(A)$ by ε (the Greek letter epsilon)
and the radius of $N_2(p)$ by δ (the Greek letter delta). The statement $f(x) \in N_1(A)$ is equivalent
to the inequality |f(x) - A| < ε, and the statement $x \in N_2(p)$, x ≠ p, is equivalent to the
inequalities 0 < |x - p| < δ. Therefore, the definition of limit can also be expressed as
follows:
The symbol $\lim_{x \to p} f(x) = A$ means that for every ε > 0, there is a δ > 0 such that
$$|f(x) - A| < \varepsilon \quad \text{whenever } 0 < |x - p| < \delta. \tag{3.2}$$
We note that the three statements,
$$\lim_{x \to p} f(x) = A, \qquad \lim_{x \to p} (f(x) - A) = 0, \qquad \lim_{x \to p} |f(x) - A| = 0,$$
are all equivalent. This equivalence becomes apparent as soon as we write each of these
statements in the ε, δ-terminology (3.2).
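The ε, δ formulation (3.2) can be explored numerically. The sketch below is only a sampling check under stated assumptions: the function f(x) = 3x + 1, the point p = 2, and the rule δ = ε/3 are illustrative choices that do not come from the text, and testing finitely many points can refute a proposed δ but can never prove that a limit exists.

```python
def delta_seems_to_work(f, p, A, eps, delta, n=1000):
    """Sample points with 0 < |x - p| < delta and test |f(x) - A| < eps at each one."""
    for k in range(1, n + 1):
        h = delta * k / (n + 1)        # 0 < h < delta
        for x in (p - h, p + h):       # both sides of p, never p itself
            if not abs(f(x) - A) < eps:
                return False
    return True


def f(x):
    return 3 * x + 1


# Claim under test: lim_{x -> 2} (3x + 1) = 7, using delta = eps / 3.
checks = [delta_seems_to_work(f, 2.0, 7.0, eps, eps / 3) for eps in (1.0, 0.1, 0.001)]

# A delta that is too large for the given eps is rejected by the same test.
too_big = delta_seems_to_work(f, 2.0, 7.0, 0.1, 1.0)
```

Here |f(x) - 7| = 3|x - 2|, so δ = ε/3 passes for each ε tried, while δ = 1 fails for ε = 0.1. This illustrates that δ generally depends on ε.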
In dealing with limits as x → p, we sometimes find it convenient to denote the difference
x - p by a new symbol, say h, and to let h → 0. This simply amounts to a change in
notation, because, as can be easily verified, the following two statements are equivalent:
$$\lim_{x \to p} f(x) = A, \qquad \lim_{h \to 0} f(p + h) = A.$$
EXAMPLE 1. Limit of a constant function. Let f(x) = c for all x. It is easy to prove
that for every p, we have $\lim_{x \to p} f(x) = c$. In fact, given any neighborhood $N_1(c)$, relation
(3.1) is trivially satisfied for any choice of $N_2(p)$ because f(x) = c for all x and $c \in N_1(c)$ for
all neighborhoods $N_1(c)$. In limit notation, we write
$$\lim_{x \to p} c = c.$$
EXAMPLE 2. Limit of the identity function. Here f(x) = x for all x. We can easily prove
that $\lim_{x \to p} f(x) = p$. Choose any neighborhood $N_1(p)$ and take $N_2(p) = N_1(p)$. Then
relation (3.1) is trivially satisfied. In limit notation, we write
$$\lim_{x \to p} x = p.$$
“One-sided” limits may be defined in a similar way. For example, if f(x) → A as x → p
through values greater than p, we say that A is the right-hand limit of f at p, and we indicate
this by writing
$$\lim_{x \to p+} f(x) = A.$$
In neighborhood terminology this means that for every neighborhood $N_1(A)$, there is some
neighborhood $N_2(p)$ such that
$$f(x) \in N_1(A) \quad \text{whenever } x \in N_2(p) \text{ and } x > p. \tag{3.3}$$
Left-hand limits, denoted by writing x → p-, are similarly defined by restricting x to
values less than p.
If f has a limit A at p, then it also has a right-hand limit and a left-hand limit at p, both
of these being equal to A. But a function can have a right-hand limit at p different from the
left-hand limit, as indicated in the next example.
EXAMPLE 3. Let f(x) = [x] for all x, and let p be any integer. For x near p, x < p, we
have f(x) = p - 1, and for x near p, x > p, we have f(x) = p. Therefore we see that
$$\lim_{x \to p-} f(x) = p - 1 \qquad \text{and} \qquad \lim_{x \to p+} f(x) = p.$$
In an example like this one, where the right- and left-hand limits are unequal, the limit of
f at p does not exist.
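The one-sided behavior in Example 3 can be seen by evaluating the greatest-integer function just to the left and just to the right of an integer. This is a small sketch that uses Python's `math.floor` as a stand-in for [x]; the choice p = 3 and the step sizes are illustrative.

```python
import math


def greatest_integer(x: float) -> int:
    """[x], the greatest integer less than or equal to x."""
    return math.floor(x)


p = 3
steps = [0.1, 0.01, 0.001]
left_values = [greatest_integer(p - h) for h in steps]    # x < p, x near p
right_values = [greatest_integer(p + h) for h in steps]   # x > p, x near p
```

The values from the left stay at p - 1 and those from the right stay at p, matching the two one-sided limits stated in the example.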
EXAMPLE 4. Let f(x) = 1/x² if x ≠ 0, and let f(0) = 0. The graph of f near zero is
shown in Figure 3.1(b). In this example, f takes arbitrarily large values near 0 so it has no
right-hand limit and no left-hand limit at 0. To prove rigorously that there is no real number
A such that $\lim_{x \to 0+} f(x) = A$, we may argue as follows: Suppose there were such an A,
say A ≥ 0. Choose a neighborhood $N_1(A)$ of length 1. In the interval 0 < x < 1/(A + 2),
we have f(x) = 1/x² > (A + 2)² > A + 2, so f(x) cannot lie in the neighborhood $N_1(A)$.
Thus, every neighborhood N(0) contains points x > 0 for which f(x) is outside $N_1(A)$, so
(3.3) is violated for this choice of $N_1(A)$. Hence f has no right-hand limit at 0.
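The key inequality in this argument, f(x) > A + 2 throughout 0 < x < 1/(A + 2), can be spot-checked for a few candidate values of A. The helper name `escapes` and the sampled values of A are illustrative assumptions; the sampling supports the inequality but does not replace the proof.

```python
def f(x: float) -> float:
    """f(x) = 1/x^2 for x != 0, with f(0) = 0 as in Example 4."""
    return 1.0 / (x * x) if x != 0 else 0.0


def escapes(A: float, n: int = 1000) -> bool:
    """Check f(x) > A + 2 at n sample points of the interval 0 < x < 1/(A + 2)."""
    bound = 1.0 / (A + 2)
    return all(f(bound * k / (n + 1)) > A + 2 for k in range(1, n + 1))


results = [escapes(A) for A in (0.0, 1.0, 10.0, 1000.0)]
```

Since a neighborhood of A of length 1 contains only values below A + 1/2, every sampled f(x) lies outside it.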
EXAMPLE 5. Let f(x) = 1 if x ≠ 0, and let f(0) = 0. This function takes the constant
value 1 everywhere except at 0, where it has the value 0. Both the right- and left-hand
limits are 1 at every point p, so the limit of f(x), as x approaches p, exists and equals 1.
Note that the limit of f is 1 at the point 0, even though f(0) = 0.
3.3 The definition of continuity of a function
In the definition of limit we made no assertion about the behavior of f at the point p
itself. Statement (3.1) refers to those x ≠ p which lie in $N_2(p)$, so it is not necessary that
f be defined at p. Moreover, even if f is defined at p, its value there need not be equal to
the limit A. However, if it happens that f is defined at p and if it also happens that f(p) = A,
then we say the function f is continuous at p. In other words, we have the following
definition.
DEFINITION OF CONTINUITY OF A FUNCTION AT A POINT. A function f is said to be continuous
at a point p if
(a) f is defined at p, and
(b) $\lim_{x \to p} f(x) = f(p)$.
This definition can also be formulated in terms of neighborhoods. A function f is
continuous at p if for every neighborhood $N_1[f(p)]$ there is a neighborhood $N_2(p)$ such that
$$f(x) \in N_1[f(p)] \quad \text{whenever } x \in N_2(p). \tag{3.4}$$
Since f(p) always belongs to $N_1[f(p)]$, we do not need the condition x ≠ p in (3.4). In
the ε, δ-terminology, where we specify the radii of the neighborhoods, the definition of
continuity can be restated as follows:
A function f is continuous at p if for every ε > 0 there is a δ > 0 such that
$$|f(x) - f(p)| < \varepsilon \quad \text{whenever } |x - p| < \delta.$$
The definition of continuity is illustrated geometrically in Figure 3.3. This is like Figure
3.2 except that the limiting value, A, is equal to the function value f(p) so the entire graph
of f above $N_2(p)$ lies in the shaded rectangle.
EXAMPLE 1. Constant functions are continuous everywhere. If f(x) = c for all x, then
$$\lim_{x \to p} f(x) = \lim_{x \to p} c = c = f(p)$$
for every p, so f is continuous everywhere.
EXAMPLE 2. The identity function is continuous everywhere. If f(x) = x for all x, we have
$$\lim_{x \to p} f(x) = \lim_{x \to p} x = p = f(p)$$
for every p, so the identity function is continuous everywhere.
EXAMPLE 3. Let f(x) = [x] for all x. This function is continuous at every point p which
is not an integer. At the integers it is discontinuous, since the limit of f does not exist, the
right- and left-hand limits being unequal. A discontinuity of this type, where the right- and
left-hand limits exist but are unequal, is called a jump discontinuity. However, since the
right-hand limit equals f(p) at each integer p, we say that f is continuous from the right at p.
EXAMPLE 4. The function f for which f(x) = 1/x² for x ≠ 0, f(0) = 0, is discontinuous
at 0. [See Figure 3.1(b).] We say there is an infinite discontinuity at 0 because the function
takes arbitrarily large values near 0.
EXAMPLE 5. Let f(x) = 1 for x ≠ 0, f(0) = 0. This function is continuous everywhere
except at 0. It is discontinuous at 0 because f(0) is not equal to the limit of f(x) as x → 0.
In this example, the discontinuity could be removed by redefining the function at 0 to have
the value 1 instead of 0. For this reason, a discontinuity of this type is called a removable
discontinuity. Note that jump discontinuities, such as those possessed by the greatest-integer
function, cannot be removed by simply changing the value of f at one point.
3.4 The basic limit theorems. More examples of continuous functions
Calculations with limits may often be simplified by the use of the following theorem
which provides basic rules for operating with limits.
THEOREM 3.1. Let f and g be functions such that
$$\lim_{x \to p} f(x) = A, \qquad \lim_{x \to p} g(x) = B.$$
Then we have
(i) $\lim_{x \to p} [f(x) + g(x)] = A + B$,
(ii) $\lim_{x \to p} [f(x) - g(x)] = A - B$,
(iii) $\lim_{x \to p} f(x) \cdot g(x) = A \cdot B$,
(iv) $\lim_{x \to p} f(x)/g(x) = A/B$ if B ≠ 0.
Note: An important special case of (iii) occurs when f is constant, say f(x) = A for
all x. In this case, (iii) is written as $\lim_{x \to p} A \cdot g(x) = A \cdot B$.
The proof of Theorem 3.1 is not difficult but it is somewhat lengthy so we have placed
it in a separate section (Section 3.5). We discuss here some simple consequences of the
theorem.
First we note that the statements in the theorem may be written in a slightly different
form. For example, (i) can be written as follows:
$$\lim_{x \to p} [f(x) + g(x)] = \lim_{x \to p} f(x) + \lim_{x \to p} g(x).$$
It tells us that the limit of a sum is the sum of the limits.
It is customary to denote by f + g, f - g, f · g, and f/g the functions whose values at
each x under consideration are
$$f(x) + g(x), \qquad f(x) - g(x), \qquad f(x) \cdot g(x), \qquad \text{and} \qquad f(x)/g(x),$$
respectively. These functions are called the sum, difference, product, and quotient of f and
g. Of course, the quotient f/g is defined only at those points for which g(x) ≠ 0. The
following corollary to Theorem 3.1 is stated in this terminology and notation and is
concerned with continuous functions.
THEOREM 3.2. Let f and g be continuous at a point p. Then the sum f + g, the difference
f - g, and the product f · g are also continuous at p. The same is true of the quotient f/g if
g(p) ≠ 0.
Proof. Since f and g are continuous at p, we have $\lim_{x \to p} f(x) = f(p)$ and $\lim_{x \to p} g(x) = g(p)$.
Therefore we may apply the limit formulas in Theorem 3.1 with A = f(p) and
B = g(p) to deduce Theorem 3.2.
We have already seen that the identity function and constant functions are continuous
everywhere. Using these examples and Theorem 3.2, we may construct many more examples
of continuous functions.
EXAMPLE 1. Continuity of polynomials. If we take f(x) = g(x) = x, the result on continuity
of products proves the continuity at each point for the function whose value at each
x is x². By mathematical induction, it follows that for every real c and every positive integer
n, the function f for which $f(x) = cx^n$ is continuous for all x. Since the sum of two continuous
functions is itself continuous, by induction it follows that the same is true for the
sum of any finite number of continuous functions. Therefore every polynomial
$p(x) = \sum_{k=0}^{n} c_k x^k$ is continuous at all points.
EXAMPLE 2. Continuity of rational functions. The quotient of two polynomials is called a
rational function. If r is a rational function, then we have
$$r(x) = \frac{p(x)}{q(x)},$$
where p and q are polynomials. The function r is defined for all real x for which q(x) ≠ 0.
Since quotients of continuous functions are continuous, we see that every rational function
is continuous wherever it is defined. A simple example is r(x) = 1/x if x ≠ 0. This function
is continuous everywhere except at x = 0, where it fails to be defined.
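As a concrete illustration of the two examples above, the following sketch evaluates a polynomial and a rational function exactly with Python's `fractions.Fraction` and watches r(x) approach r(p) as x approaches p. The particular function r(x) = (x² - 4)/(x - 1), the point p = 3, and the helper names `poly` and `rational` are assumptions made for the illustration only.

```python
from fractions import Fraction


def poly(coeffs, x):
    """Evaluate c_0 + c_1 x + ... + c_n x^n by Horner's rule, coeffs = [c_0, ..., c_n]."""
    total = Fraction(0)
    for c in reversed(coeffs):
        total = total * x + c
    return total


def rational(p_coeffs, q_coeffs, x):
    """Evaluate r(x) = p(x)/q(x); r is undefined where q(x) = 0."""
    denom = poly(q_coeffs, x)
    if denom == 0:
        raise ZeroDivisionError("r is not defined where q(x) = 0")
    return poly(p_coeffs, x) / denom


# r(x) = (x^2 - 4)/(x - 1), examined near p = 3, where q(3) = 2 is nonzero.
p_coeffs, q_coeffs = [-4, 0, 1], [-1, 1]
target = rational(p_coeffs, q_coeffs, Fraction(3))
gaps = [
    abs(rational(p_coeffs, q_coeffs, Fraction(3) + Fraction(1, 10**k)) - target)
    for k in (1, 2, 3)
]
```

The gaps |r(3 + h) - r(3)| shrink as h shrinks, which is what continuity at p = 3 predicts. At x = 1 the denominator vanishes and the sketch raises an error, mirroring the fact that r is not defined there.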
The next theorem shows that if a function g is squeezed between two other functions
which have equal limits as x → p, then g also has this limit as x → p.
THEOREM 3.3. SQUEEZING PRINCIPLE. Suppose that f(x) ≤ g(x) ≤ h(x) for all x ≠ p
in some neighborhood N(p). Suppose also that
$$\lim_{x \to p} f(x) = \lim_{x \to p} h(x) = a.$$
Then we also have $\lim_{x \to p} g(x) = a$.
Proof. Let G(x) = g(x) - f(x), and H(x) = h(x) - f(x). The inequalities f ≤ g ≤ h
imply 0 ≤ g - f ≤ h - f, or
$$0 \le G(x) \le H(x)$$
for all x ≠ p in N(p). To prove the theorem, it suffices to show that G(x) → 0 as x → p,
given that H(x) → 0 as x → p.
Let $N_1(0)$ be any neighborhood of 0. Since H(x) → 0 as x → p, there is a neighborhood
$N_2(p)$ such that
$$H(x) \in N_1(0) \quad \text{whenever } x \in N_2(p) \text{ and } x \ne p.$$
We can assume that $N_2(p) \subseteq N(p)$. Then the inequality 0 ≤ G ≤ H states that G(x) is no
further from 0 than H(x) if x is in $N_2(p)$, x ≠ p. Therefore $G(x) \in N_1(0)$ for such x, and
hence G(x) → 0 as x → p. This proves the theorem. The same proof is valid if all the
limits are one-sided limits.
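A numerical picture of the squeezing principle may help. In the sketch below the squeezed function g(x) = x² sin(1/x) and the bounds -x² and x² are illustrative choices that do not appear in the text; the code only samples points near 0 and is not a proof.

```python
import math


def g(x: float) -> float:
    """The squeezed function, defined for x != 0."""
    return x * x * math.sin(1.0 / x)


def lower(x: float) -> float:
    return -x * x


def upper(x: float) -> float:
    return x * x


# Points on both sides of p = 0, approaching 0 but never equal to it.
points = [s * 10.0 ** (-k) for k in range(1, 7) for s in (-1, 1)]
squeezed = all(lower(x) <= g(x) <= upper(x) for x in points)

# The gap between the two bounds shrinks to 0, forcing g(x) toward 0 as well.
widths = [upper(10.0 ** (-k)) - lower(10.0 ** (-k)) for k in range(1, 7)]
```

Because both bounds tend to 0 as x → 0, the principle gives g(x) → 0 even though sin(1/x) itself oscillates without settling.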
The squeezing principle is useful in practice because it is often possible to find squeezing
functions f and h which are easier to deal with than g. We shall use the result now to prove
that every indefinite integral is a continuous function.
THEOREM 3.4. CONTINUITY OF INDEFINITE INTEGRALS. Assume f is integrable on [a, x]
for every x in [a, b], and let
$$A(x) = \int_a^x f(t)\,dt.$$
Then the indefinite integral A is continuous at each point of [a, b]. (At each endpoint we have
one-sided continuity.)
Proof. Choose p in [a, b]. We are to prove that A(x) → A(p) as x → p. We have
$$A(x) - A(p) = \int_p^x f(t)\,dt. \tag{3.5}$$
Now we estimate the size of this integral. Since f is bounded on [a, b], there is a constant
M > 0 such that -M ≤ f(t) ≤ M for all t in [a, b]. If x > p, we integrate these inequalities
over the interval [p, x] to obtain
$$-M(x - p) \le A(x) - A(p) \le M(x - p).$$
If x < p, we obtain the same inequalities with x - p replaced by p - x. Therefore, in
either case we can let x → p and apply the squeezing principle to find that A(x) → A(p).
This proves the theorem. If p is an endpoint of [a, b], we must let x → p from inside the
interval, so the limits are one-sided.
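The estimate |A(x) - A(p)| ≤ M|x - p| from the proof can be checked on an integrand that is itself discontinuous. The sketch assumes the integrand is the greatest-integer function [t] on [0, 3], so that M = 3 serves as a bound and the indefinite integral has a simple closed form; these choices are illustrative and not taken from the text.

```python
import math


def A(x: float) -> float:
    """Integral of the greatest-integer function [t] from 0 to x, for x >= 0."""
    n = math.floor(x)
    # Full unit steps contribute 0 + 1 + ... + (n - 1); the last partial step adds n * (x - n).
    return n * (n - 1) / 2 + n * (x - n)


M = 3.0   # |[t]| <= 3 for every t in [0, 3]
p = 2.0   # a point where the integrand has a jump
hs = [0.5, 0.1, 0.01, 0.001]
bound_holds = all(
    abs(A(p + s * h) - A(p)) <= M * h for h in hs for s in (-1, 1)
)
```

Even though [t] jumps at t = 2, the differences A(x) - A(2) are trapped between -M|x - 2| and M|x - 2|, so A is continuous there, as the theorem asserts.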
EXAMPLE 3. Continuity of the sine and cosine. Since the sine function is an indefinite
integral, $\sin x = \int_0^x \cos t\,dt$, the foregoing theorem tells us that the sine is continuous
everywhere. Similarly, the cosine is everywhere continuous since $\cos x = 1 - \int_0^x \sin t\,dt$.
The continuity of these functions can also be deduced without making use of the fact that
they are indefinite integrals. An alternate proof is outlined in Exercise 26 of Section 3.6.
EXAMPLE 4. In this example we prove an important limit formula,
$$\lim_{x \to 0} \frac{\sin x}{x} = 1, \tag{3.6}$$
that is needed later in our discussion of differential calculus. Since the denominator of the
quotient (sin x)/x approaches 0 as x → 0, we cannot apply the quotient theorem on limits
to deduce (3.6). Instead, we use the squeezing principle. From Section 2.5 we have the
fundamental inequalities
$$0 < \cos x < \frac{\sin x}{x} < \frac{1}{\cos x},$$
valid for $0 < x < \tfrac{1}{2}\pi$. They are also valid for $-\tfrac{1}{2}\pi < x < 0$ since cos (-x) = cos x and
sin (-x) = -sin x, and hence they hold for all x ≠ 0 in the neighborhood $N(0; \tfrac{1}{2}\pi)$. When
x → 0, we find cos x → 1 since the cosine is continuous at 0, and hence 1/(cos x) → 1.
Therefore, by the squeezing principle, we deduce (3.6). If we define f(x) = (sin x)/x for
x ≠ 0, f(0) = 1, then f is continuous everywhere. Its graph is shown in Figure 3.4.
FIGURE 3.4 f(x) = (sin x)/x if x ≠ 0, f(0) = 1. This function is continuous everywhere.
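The inequalities used to obtain (3.6) can be spot-checked in floating point. The sample points below are illustrative choices with 0 < |x| < π/2; for much smaller x, rounding would make cos x and (sin x)/x indistinguishable from 1, so the sketch deliberately stops at x = 0.001.

```python
import math

values = (1.0, 0.5, 0.1, 0.01, 0.001)
xs = [s * v for v in values for s in (1, -1)]

# 0 < cos x < (sin x)/x < 1/cos x at each sample point, on both sides of 0.
inequalities_hold = all(
    0 < math.cos(x) < math.sin(x) / x < 1 / math.cos(x) for x in xs
)

# The squeezed quotient moves toward 1 as x moves toward 0.
ratios = [math.sin(v) / v for v in values]
```

Both bounds tend to 1, and the sampled values of (sin x)/x increase toward 1 between them, in line with the squeezing argument.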
EXAMPLE 5. Continuity of f when $f(x) = x^r$ for x > 0, where r is a positive rational number.
From Theorem 2.2 we have the integration formula
$$\int_0^x t^{1/n}\,dt = \frac{x^{1+1/n}}{1 + 1/n},$$
valid for all x > 0 and every integer n ≥ 1. Using Theorems 3.4 and 3.1, we find that the
function A given by $A(x) = x^{1+1/n}$ is continuous at all points p > 0. Now let
$g(x) = x^{1/n} = A(x)/x$ for x > 0. Since g is a quotient of two continuous functions it, too, is
continuous at all points p > 0. More generally, if $f(x) = x^{m/n}$, where m is a positive
integer, then f is a product of continuous functions and hence is continuous at all points
p > 0. This establishes the continuity of the rth-power function, $f(x) = x^r$, when r is any
positive rational number, at all points p > 0. At p = 0 we have right-hand continuity.
The continuity of the rth-power function for rational r can also be deduced without
using integrals. An alternate proof is given in Section 3.13.
3.5 Proofs of the basic limit theorems
In this section we prove Theorem 3.1 which describes the basic rules for dealing with
limits of sums, products, and quotients. The principal algebraic tools used in the proof
are two properties of absolute values that were mentioned earlier in Sections I 4.8 and I 4.9.
They are (1) the triangle inequality, which states that |a + b| ≤ |a| + |b| for all real a and
b, and (2) the equation |ab| = |a| |b|, which states that the absolute value of a product is
the product of absolute values.
Proofs of (i) and (ii). Since the two statements
$$\lim_{x \to p} f(x) = A \qquad \text{and} \qquad \lim_{x \to p} [f(x) - A] = 0$$
are equivalent, and since we have
$$f(x) + g(x) - (A + B) = [f(x) - A] + [g(x) - B],$$
it suffices to prove part (i) of the theorem when the limits A and B are both zero.
Suppose, then, that f(x) → 0 and g(x) → 0 as x → p. We shall prove that f(x) + g(x) → 0
as x → p. This means we must show that for every ε > 0 there is a δ > 0 such that
$$|f(x) + g(x)| < \varepsilon \quad \text{whenever } 0 < |x - p| < \delta. \tag{3.7}$$
Let ε be given. Since f(x) → 0 as x → p, there is a $\delta_1 > 0$ such that
$$|f(x)| < \frac{\varepsilon}{2} \quad \text{whenever } 0 < |x - p| < \delta_1. \tag{3.8}$$
Similarly, since g(x) → 0 as x → p, there is a $\delta_2 > 0$ such that
$$|g(x)| < \frac{\varepsilon}{2} \quad \text{whenever } 0 < |x - p| < \delta_2. \tag{3.9}$$
If we let δ denote the smaller of the two numbers $\delta_1$ and $\delta_2$, then both inequalities (3.8) and
(3.9) are valid if 0 < |x - p| < δ and hence, by the triangle inequality, we find that
$$|f(x) + g(x)| \le |f(x)| + |g(x)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
This proves (3.7) which, in turn, proves (i). The proof of (ii) is entirely similar, except that
in the last step we use the inequality |f(x) - g(x)| ≤ |f(x)| + |g(x)|.
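The construction of δ in this proof (apply the hypotheses with ε/2 and take the smaller radius) can be written as a small function. In the sketch below, `delta_for_sum`, the sample functions f(x) = 2(x - 1) and g(x) = 3(x - 1), and their individual δ rules are hypothetical illustrations, not part of the text; the final check samples points and does not constitute a proof.

```python
def delta_for_sum(delta_f, delta_g):
    """Given rules eps -> delta for f and g (both tending to 0), build one for f + g."""
    def delta(eps):
        # Use (3.8) and (3.9) with eps/2, then keep the smaller of the two radii.
        return min(delta_f(eps / 2), delta_g(eps / 2))
    return delta


# Hypothetical example: f(x) = 2(x - 1) and g(x) = 3(x - 1) both tend to 0 as x -> 1.
p = 1.0
f = lambda x: 2 * (x - p)   # |f(x)| < eps whenever |x - p| < eps / 2
g = lambda x: 3 * (x - p)   # |g(x)| < eps whenever |x - p| < eps / 3
delta = delta_for_sum(lambda eps: eps / 2, lambda eps: eps / 3)


def check(eps, n=500):
    """Sample 0 < |x - p| < delta(eps) and test |f(x) + g(x)| < eps."""
    d = delta(eps)
    hs = [d * k / (n + 1) for k in range(1, n + 1)]
    return all(abs(f(p + s * h) + g(p + s * h)) < eps for h in hs for s in (-1, 1))
```

For this pair the rule gives δ = ε/6, and |f(x) + g(x)| = 5|x - 1| < 5ε/6 < ε, consistent with (3.7).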
Proof of (iii). Suppose that we have proved part (iii) for the special case in which one
of the limits is 0. Then the general case follows easily from this special case. In fact, all
we need to do is write
$$f(x)g(x) - AB = f(x)[g(x) - B] + B[f(x) - A].$$
The special case implies that each term on the right approaches 0 as x → p and, by property
(i), the sum of the two terms also approaches 0. Therefore, it remains to prove (iii) in the
special case where one of the limits, say B, is 0.
Suppose, then, that f(x) → A and g(x) → 0 as x → p. We wish to prove that f(x)g(x) → 0
as x → p. To do this we must show that if a positive ε is given, there is a δ > 0 such that
$$|f(x)g(x)| < \varepsilon \quad \text{whenever } 0 < |x - p| < \delta. \tag{3.10}$$
Since f(x) → A as x → p, there is a $\delta_1$ such that
$$|f(x) - A| < 1 \quad \text{whenever } 0 < |x - p| < \delta_1. \tag{3.11}$$
For such x, we have |f(x)| = |f(x) - A + A| ≤ |f(x) - A| + |A| < 1 + |A|, and hence
$$|f(x)g(x)| = |f(x)|\,|g(x)| < (1 + |A|)\,|g(x)|. \tag{3.12}$$
Since g(x) → 0 as x → p, for every ε > 0 there is a $\delta_2$ such that
$$|g(x)| < \frac{\varepsilon}{1 + |A|} \quad \text{whenever } 0 < |x - p| < \delta_2. \tag{3.13}$$
Therefore, if we let δ be the smaller of the two numbers $\delta_1$ and $\delta_2$, then both inequalities
(3.12) and (3.13) are valid whenever 0 < |x - p| < δ, and for such x we deduce (3.10).
This completes the proof of (iii).
Proof of (iv). Since the quotient f(x)/g(x) is the product of f(x)/B with B/g(x), it suffices
to prove that B/g(x) → 1 as x → p and then appeal to (iii). Let h(x) = g(x)/B. Then
h(x) → 1 as x → p, and we wish to prove that 1/h(x) → 1 as x → p.
Let ε > 0 be given. We must show that there is a δ > 0 such that
$$\left|\frac{1}{h(x)} - 1\right| < \varepsilon \quad \text{whenever } 0 < |x - p| < \delta. \tag{3.14}$$
The difference to be estimated may be written as follows:
$$\left|\frac{1}{h(x)} - 1\right| = \frac{|h(x) - 1|}{|h(x)|}. \tag{3.15}$$
Since h(x) → 1 as x → p, we can choose a δ > 0 such that both inequalities
$$|h(x) - 1| < \frac{\varepsilon}{2} \qquad \text{and} \qquad |h(x) - 1| < \frac{1}{2} \tag{3.16}$$
are satisfied whenever 0 < |x - p| < δ. The second of these inequalities implies $h(x) > \tfrac{1}{2}$
so 1/|h(x)| = 1/h(x) < 2 for such x. Using this in (3.15) along with the first inequality in
(3.16), we obtain (3.14). This completes the proof of (iv).
3.6 Exercises
In Exercises 1 through 10, compute the limits and explain which limit theorems you are using
in each case.
1. $\lim_{x \to 2} \frac{1}{x^2}$.
2. $\lim_{x \to 0} \frac{25x^3 + 2}{75x^7 - 2}$.
3. $\lim_{x \to 2} \frac{x^2 - 4}{x - 2}$.
4. $\lim_{x \to 1} \frac{2x^2 - 3x + 1}{x - 1}$.
5. $\lim_{h \to 0} \frac{(t + h)^2 - t^2}{h}$.
6. $\lim_{x \to 0} \frac{x^2 - a^2}{x^2 + 2ax + a^2}$, a ≠ 0.
7. $\lim_{a \to 0} \frac{x^2 - a^2}{x^2 + 2ax + a^2}$, x ≠ 0.
8. $\lim_{x \to a} \frac{x^2 - a^2}{x^2 + 2ax + a^2}$, a ≠ 0.
9. $\lim_{t \to 0} \tan t$.
10. $\lim_{t \to 0} (\sin 2t + t^2 \cos 5t)$.
11. $\lim_{x \to 0+} \frac{|x|}{x}$.
12. $\lim_{x \to 0-} \frac{|x|}{x}$.
13. $\lim_{x \to 0+} \frac{\sqrt{x^2}}{x}$.
14. $\lim_{x \to 0-} \frac{\sqrt{x^2}}{x}$.
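Use the relation $\lim_{x \to 0} (\sin x)/x = 1$ to establish the limit formulas in Exercises 15 through 20.
15. $\lim_{x \to 0} \frac{\sin 2x}{x} = 2$.
16. $\lim_{x \to 0} \frac{\tan 2x}{\sin x} = 2$.
17. $\lim_{x \to 0} \frac{\sin 5x}{\sin x} = 5$.
18. $\lim_{x \to 0} \frac{\sin 5x - \sin 3x}{x} = 2$.
19. $\lim_{x \to a} \frac{\sin x - \sin a}{x - a} = \cos a$.
20. $\lim_{x \to 0} \frac{1 - \cos x}{x^2} = \frac{1}{2}$.
21. Show that $\lim_{x \to 0} \frac{1 - \sqrt{1 - x^2}}{x^2} = \frac{1}{2}$. [Hint: $(1 - \sqrt{u})(1 + \sqrt{u}) = 1 - u$.]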
22. A function f is defined as follows:
$$f(x) = \begin{cases} \sin x & \text{if } x \le c, \\ ax + b & \text{if } x > c, \end{cases}$$
where a, b, c are constants. If b and c are given, find all values of a (if any exist) for which f
is continuous at the point x = c.
23. Solve Exercise 22 if f is defined as follows:
24. At what points are the tangent and cotangent functions continuous?
25. Let f(x) = (tan x)/x if x ≠ 0. Sketch the graph of f over the half-open intervals $[-\tfrac{1}{4}\pi, 0)$
and $(0, \tfrac{1}{4}\pi]$. What happens to f(x) as x → 0? Can you define f(0) so that f becomes continuous
at 0?
26. This exercise outlines an alternate proof of the continuity of the sine and cosine functions.
(a) The inequality |sin x| < |x|, valid for $0 < |x| < \tfrac{1}{2}\pi$, was proved in Exercise 34 of Section
2.8. Use this inequality to prove that the sine function is continuous at 0.
(b) Use part (a) and the identity cos 2x = 1 - 2 sin² x to prove that the cosine is continuous
at 0.
(c) Use the addition formulas for sin (x + h) and COS (x + h) to prove that the sine and cosine
are continuous at any real x.
27. Figure 3.5 shows a portion of the graph of the function f defined as follows:
$$f(x) = \sin \frac{1}{x} \quad \text{if } x \ne 0.$$
For x = 1/(nπ), where n is an integer, we have sin (1/x) = sin (nπ) = 0. Between two such
points, the function values rise to +1 and drop back to 0 or else drop to -1 and rise back to 0.
Therefore, between any such point and the origin, the curve has an infinite number of oscillations.
This suggests that the function values do not approach any fixed value as x → 0. Prove
that there is no real number A such that f(x) → A as x → 0. This shows that it is not possible
to define f(0) in such a way that f becomes continuous at 0.
[Hint: Assume such an A exists and obtain a contradiction.]
FIGURE 3.5 f(x) = sin (1/x) if x ≠ 0. This function is discontinuous at 0 no matter
how f(0) is defined.
28. For x ≠ 0, let f(x) = [1/x], where [t] denotes the greatest integer ≤ t. Sketch the graph of
f over the intervals [-2, -1/5] and [1/5, 2]. What happens to f(x) as x → 0 through positive
values? through negative values? Can you define f(0) so that f becomes continuous at 0?
29. Same as Exercise 28, when $f(x) = (-1)^{[1/x]}$ for x ≠ 0.
30. Same as Exercise 28, when $f(x) = x(-1)^{[1/x]}$ for x ≠ 0.
31. Give an example of a function that is continuous at one point of an interval and discontinuous
at all other points of the interval, or prove that there is no such function.
32. Let f(x) = x sin (1/x) if x ≠ 0. Define f(0) so that f will be continuous at 0.
33. Let f be a function such that |f(u) - f(v)| ≤ |u - v| for all u and v in an interval [a, b].
(a) Prove that f is continuous at each point of [a, b].
(b) Assume that f is integrable on [a, b]. Prove that
$$\left| \int_a^b f(x)\,dx - (b - a) f(a) \right| \le \frac{(b - a)^2}{2}.$$
(c) More generally, prove that for any c in [a, b], we have
$$\left| \int_a^b f(x)\,dx - (b - a) f(c) \right| \le \frac{(b - a)^2}{2}.$$
3.7 Composite functions and continuity
We can create new functions from given ones by addition, subtraction, multiplication,
and division. In this section we learn a new way to construct functions by an operation
known as composition. We illustrate with an example.
Let f(x) = sin (x²). To compute f(x), we first square x and then take the sine of x².
Thus, f(x) is obtained by combining two other functions, the squaring function and the
sine function. If we let v(x) = x² and u(x) = sin x, we can express f(x) in terms of u and v
by writing
$$f(x) = u[v(x)].$$
We say that f is the composition of u and v (in that order). If we compose u and v in the
opposite order, we obtain a different result, v[u(x)] = (sin x)². That is, to compute v[u(x)],
we take the sine of x first and then square sin x.
Now we can carry out this process more generally. Let u and v be any two given functions.
The composite or composition of u and v (in that order) is defined to be the function f for
which
$$f(x) = u[v(x)] \qquad (\text{read as “u of v of x”}).$$
That is, to evaluate f at x we first compute v(x) and then evaluate u at the point v(x). Of
course, this presupposes that it makes sense to evaluate u at v(x), and therefore f will be
defined only at those points x for which v(x) is in the domain of u.
For example, if $u(x) = \sqrt{x}$ and v(x) = 1 - x², then the composite f is given by
$f(x) = \sqrt{1 - x^2}$. Note that v is defined for all real x, whereas u is defined only for x ≥ 0.
Therefore the composite f is defined only for those x satisfying 1 - x² ≥ 0.
Formally, f(x) is obtained by substituting v(x) for x in the expression u(x). For this
reason, the function f is sometimes denoted by the symbol f = u(v) (read as “u of v”).
Another notation that we shall use to denote composition is f = u ∘ v (read as “u circle
v”). This resembles the notation for the product u · v. In fact, we shall see in a moment
that the operation of composition has some of the properties possessed by multiplication.
The composite of three or more functions may be found by composing them two at a
time. Thus, the function f given by
$$f(x) = \cos [\sin (x^2)]$$
is a composition, f = u ∘ (v ∘ w), where
$$u(x) = \cos x, \qquad v(x) = \sin x, \qquad \text{and} \qquad w(x) = x^2.$$
Notice that the same f can be obtained by composing u and v first and then composing u ∘ v
with w, thus: f = (u ∘ v) ∘ w. This illustrates the associative law for composition which
states that
$$u \circ (v \circ w) = (u \circ v) \circ w \tag{3.17}$$
for all functions u, v, w, provided it makes sense to form all the composites in question.
The reader will find that the proof of (3.17) is a straightforward exercise.
It should be noted that the commutative law, u ∘ v = v ∘ u, does not always hold for
composition. For example, if u(x) = sin x and v(x) = x², the composite f = u ∘ v is given
by f(x) = sin x² [which means sin (x²)], whereas the composition g = v ∘ u is given by
g(x) = sin² x [which means (sin x)²].
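Composition, the associative law (3.17), and the failure of the commutative law can all be tried out directly. The helper `compose` below is an illustrative name introduced for this sketch; the functions u, v, w are the ones used in the discussion above.

```python
import math


def compose(u, v):
    """Return the composite u o v, that is, the function x -> u(v(x))."""
    return lambda x: u(v(x))


u = math.cos
v = math.sin
w = lambda x: x * x

left = compose(u, compose(v, w))      # u o (v o w)
right = compose(compose(u, v), w)     # (u o v) o w

sin_of_square = compose(math.sin, w)  # x -> sin(x^2)
square_of_sin = compose(w, math.sin)  # x -> (sin x)^2

sample_points = [0.0, 0.5, 1.0, 2.0]
associative_on_samples = all(left(x) == right(x) for x in sample_points)
```

Both groupings perform the same three evaluations in the same order, so they agree at every sample point, whereas sin(x²) and (sin x)² already differ at x = 2.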
Now we shall prove a theorem which tells us that the property of continuity is preserved
under the operation of composition. More precisely, we have the following.
THEOREM 3.5. Assume v is continuous at p and that u is continuous at q, where q = v(p).
Then the composite function f = u ∘ v is continuous at p.
Proof. Since u is continuous at q, for every neighborhood $N_1[u(q)]$ there is a neighborhood
$N_2(q)$ such that
$$u(y) \in N_1[u(q)] \quad \text{whenever } y \in N_2(q). \tag{3.18}$$
But q = v(p) and v is continuous at p, so for the neighborhood $N_2(q)$ there is another
neighborhood $N_3(p)$ such that
$$v(x) \in N_2(q) \quad \text{whenever } x \in N_3(p). \tag{3.19}$$
If we let y = v(x) and combine (3.18) with (3.19), we find that for every neighborhood
$N_1(u[v(p)])$ there is a neighborhood $N_3(p)$ such that
$$u[v(x)] \in N_1(u[v(p)]) \quad \text{whenever } x \in N_3(p),$$
or, in other words, since f(x) = u[v(x)],
$$f(x) \in N_1[f(p)] \quad \text{whenever } x \in N_3(p).$$
This means that f is continuous at p, as asserted.
EXAMPLE 1. Let f(x) = sin x². This is the composition of two functions continuous
everywhere so f is continuous everywhere.
EXAMPLE 2. Let $f(x) = \sqrt{1 - x^2} = u[v(x)]$, where $u(x) = \sqrt{x}$, v(x) = 1 - x². The
function v is continuous everywhere but u is continuous only for points x ≥ 0. Hence f is
continuous at those points x for which v(x) ≥ 0, that is at all points satisfying x² ≤ 1.
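The domain restriction in Example 2 can be made explicit in code. This is a minimal sketch: raising `ValueError` outside the domain is an illustrative design choice, and `in_domain` is a helper name introduced here.

```python
import math


def f(x: float) -> float:
    """f = u o v with u(y) = sqrt(y) and v(x) = 1 - x^2, defined only where v(x) >= 0."""
    y = 1 - x * x
    if y < 0:
        raise ValueError("x is outside the domain of the composite")
    return math.sqrt(y)


def in_domain(x: float) -> bool:
    """The composite is defined exactly when x^2 <= 1."""
    return x * x <= 1
```

For instance, f(0.6) is close to 0.8 and f(1.0) is 0, while f(1.5) is rejected because v(1.5) = -1.25 lies outside the domain of the square root.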
3.8 Exercises
In Exercises 1 through 10, the functions f and g are defined by the formulas given. Unless otherwise
noted, the domains of f and g consist of all real numbers. Let h(x) = f[g(x)] whenever g(x)
lies in the domain of f. In each case, describe the domain of h and give one or more formulas for
determining h(x).
1. f(x) = x² - 2x, g(x) = x + 1.
2. f(x) = x + 1, g(x) = x² - 2x.
3. $f(x) = \sqrt{x}$ if x ≥ 0, g(x) = x².
4. $f(x) = \sqrt{x}$ if x ≥ 0, g(x) = -x².
5. f(x) = x², $g(x) = \sqrt{x}$ if x ≥ 0.
6. f(x) = -x², $g(x) = \sqrt{x}$ if x ≥ 0.
7. f(x) = sin x, $g(x) = \sqrt{x}$ if x ≥ 0.
8. $f(x) = \sqrt{x}$ if x ≥ 0, g(x) = sin x.
9. $f(x) = \sqrt{x}$ if x > 0, $g(x) = x + \sqrt{x}$ if x > 0.
10. $f(x) = \sqrt{x + \sqrt{x}}$ if x > 0, $g(x) = x + \sqrt{x}$ if x > 0.
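Calculate the limits in Exercises 11 through 20 and explain which limit theorems you are using
in each case.
11. $\lim_{x \to -2} \frac{x^3 + 8}{x^2 - 4}$.
12. $\lim_{x \to 4} \sqrt{1 + \sqrt{x}}$.
13. $\lim_{t \to 0} \frac{\sin (\tan t)}{\sin t}$.
14. $\lim_{x \to \pi/2} \frac{\sin (\cos x)}{\cos x}$.
16. $\lim_{x \to 1} \frac{\sin (x^2 - 1)}{x - 1}$.
17. $\lim_{x \to 0} x \sin \frac{1}{x}$.
18. $\lim_{x \to 0} \frac{1 - \cos 2x}{x^2}$.
19. $\lim_{x \to 0} \frac{\sqrt{1 + x} - \sqrt{1 - x}}{x}$.
20. $\lim_{x \to 0} \frac{1 - \sqrt{1 - 4x^2}}{x^2}$.
21. Let f and g be two functions defined as follows:
$$f(x) = \frac{x + |x|}{2} \quad \text{for all } x, \qquad g(x) = \begin{cases} x & \text{for } x < 0, \\ x^2 & \text{for } x \ge 0. \end{cases}$$
Find a formula (or formulas) for computing the composite function h(x) = f[g(x)]. For
what values of x is h continuous?
22. Solve Exercise 21 when f and g are defined as follows:
$$f(x) = \begin{cases} 1 & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1, \end{cases} \qquad g(x) = \begin{cases} 2 - x^2 & \text{if } |x| \le 2, \\ 2 & \text{if } |x| > 2. \end{cases}$$
23. Solve Exercise 21 when h(x) = g[f(x)].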
3.9 Bolzano’s theorem for continuous functions
In the rest of this chapter we shall discuss certain special properties of continuous functions
that are used quite frequently. Most of these properties appear obvious when interpreted
geometrically; consequently many people are inclined to accept them as self-evident.
However, it is important to realize that these statements are no more self-evident than the
definition of continuity itself, and therefore they require proof if they are to be used with
any degree of generality. The proofs of most of these properties make use of the
least-upper-bound axiom for the real number system.
Bernard Bolzano (1781-1848), a Catholic priest who made many important contributions
to mathematics in the first half of the 19th Century, was one of the first to recognize that
many “obvious” statements about continuous functions require proof. His observations
concerning continuity were published posthumously in 1850 in an important book, Paradoxien des Unendlichen. One of his results, now known as the theorem of Bolzano, is
illustrated in Figure 3.6, where the graph of a continuous function f is shown. The graph
lies below the x-axis at x = a and above the axis at x = b. Bolzano’s theorem asserts that
the curve must cross the axis somewhere between a and b. This property, first published
by Bolzano in 1817, may be stated formally as follows.
THEOREM 3.6. BOLZANO’S THEOREM. Let $f$ be continuous at each point of a closed interval $[a, b]$ and assume that $f(a)$ and $f(b)$ have opposite signs. Then there is at least one $c$ in the open interval $(a, b)$ such that $f(c) = 0$.
We shall base our proof of Bolzano’s theorem on the following property of continuous
functions which we state here as a separate theorem.
THEOREM 3.7. SIGN-PRESERVING PROPERTY OF CONTINUOUS FUNCTIONS. Let $f$ be continuous at $c$ and suppose that $f(c) \ne 0$. Then there is an interval $(c - \delta, c + \delta)$ about $c$ in which $f$ has the same sign as $f(c)$.
Proof of Theorem 3.7. Suppose $f(c) > 0$. By continuity, for every $\varepsilon > 0$ there is a $\delta > 0$ such that
$$f(c) - \varepsilon < f(x) < f(c) + \varepsilon \quad \text{whenever } c - \delta < x < c + \delta. \tag{3.20}$$
If we take the $\delta$ corresponding to $\varepsilon = f(c)/2$ (this $\varepsilon$ is positive), then (3.20) becomes
$$\tfrac{1}{2} f(c) < f(x) < \tfrac{3}{2} f(c) \quad \text{whenever } c - \delta < x < c + \delta.$$
(See Figure 3.7.) Therefore $f(x) > 0$ in this interval, and hence $f(x)$ and $f(c)$ have the same sign. If $f(c) < 0$, we take the $\delta$ corresponding to $\varepsilon = -\tfrac{1}{2} f(c)$ and arrive at the same conclusion.
FIGURE 3.6 Illustrating Bolzano’s theorem.
FIGURE 3.7 Here $f(x) > 0$ for $x$ near $c$ because $f(c) > 0$.
Note: If there is one-sided continuity at $c$, then there is a corresponding one-sided interval $[c, c + \delta)$ or $(c - \delta, c]$ in which $f$ has the same sign as $f(c)$.
Proof of Bolzano’s theorem. To be specific, assume $f(a) < 0$ and $f(b) > 0$, as shown in Figure 3.6. There may be many values of $x$ between $a$ and $b$ for which $f(x) = 0$. Our problem is to find one. We shall do this by finding the largest $x$ for which $f(x) = 0$. For this purpose we let $S$ denote the set of all those points $x$ in the interval $[a, b]$ for which $f(x) \le 0$. There is at least one point in $S$ because $f(a) < 0$. Therefore $S$ is a nonempty set. Also, $S$ is bounded above since all of $S$ lies within $[a, b]$, so $S$ has a supremum. Let $c = \sup S$. We shall prove that $f(c) = 0$.
There are only three possibilities: $f(c) > 0$, $f(c) < 0$, and $f(c) = 0$. If $f(c) > 0$, there is an interval $(c - \delta, c + \delta)$, or $(c - \delta, c]$ if $c = b$, in which $f$ is positive. Therefore no points of $S$ can lie to the right of $c - \delta$, and hence $c - \delta$ is an upper bound for the set $S$. But $c - \delta < c$, and $c$ is the least upper bound of $S$. Therefore the inequality $f(c) > 0$ is impossible. If $f(c) < 0$, there is an interval $(c - \delta, c + \delta)$, or $[c, c + \delta)$ if $c = a$, in which $f$ is negative. Hence $f(x) < 0$ for some $x > c$, contradicting the fact that $c$ is an upper bound for $S$. Therefore $f(c) < 0$ is also impossible, and the only remaining possibility is $f(c) = 0$. Also, $a < c < b$ because $f(a) < 0$ and $f(b) > 0$. This proves Bolzano’s theorem.
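Bolzano’s theorem guarantees that a zero exists but does not by itself compute one. The following Python sketch uses repeated bisection, a different technique from the supremum argument in the proof above, to approximate a zero numerically. The function name `bisect_root`, the tolerance, and the sample polynomial `p` are illustrative assumptions, not part of the text.

```python
def bisect_root(f, a, b, tol=1e-12, max_iter=200):
    """Approximate a zero of a continuous f on [a, b].

    Assumes f(a) and f(b) have opposite signs (the hypothesis of
    Bolzano's theorem).
    """
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa > 0) == (fb > 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2
        fc = f(c)
        if fc == 0 or (b - a) / 2 < tol:
            return c
        # Keep the half-interval whose endpoint values still differ in sign.
        if (fc > 0) == (fa > 0):
            a, fa = c, fc
        else:
            b = c
    return (a + b) / 2


# Hypothetical sample function: negative at x = 2, positive at x = 3.
def p(x):
    return x**3 - 2 * x - 5


root = bisect_root(p, 2.0, 3.0)
print(root, p(root))
```

Each step halves an interval on which the sign change persists, so the hypothesis of the theorem is maintained throughout and the returned point lies within `tol` of a zero.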
3.10 The intermediate-value theorem for continuous functions
An immediate consequence of Bolzano’s theorem is the intermediate-value theorem for continuous functions, illustrated in Figure 3.8.
THEOREM 3.8. Let $f$ be continuous at each point of a closed interval $[a, b]$. Choose two arbitrary points $x_1 < x_2$ in $[a, b]$ such that $f(x_1) \ne f(x_2)$. Then $f$ takes on every value between $f(x_1)$ and $f(x_2)$ somewhere in the interval $(x_1, x_2)$.
Proof. Suppose $f(x_1) < f(x_2)$ and let $k$ be any value between $f(x_1)$ and $f(x_2)$. Let $g$ be the function defined on $[x_1, x_2]$ as follows:
$$g(x) = f(x) - k.$$
Then $g$ is continuous at each point of $[x_1, x_2]$, and we have
$$g(x_1) = f(x_1) - k < 0, \qquad g(x_2) = f(x_2) - k > 0.$$
Applying Bolzano’s theorem to $g$, we have $g(c) = 0$ for some $c$ between $x_1$ and $x_2$. But this means $f(c) = k$, and the proof is complete.
FIGURE 3.8 Illustrating the intermediate-value theorem.
FIGURE 3.9 An example for which Bolzano’s theorem is not applicable.
Note: In both Bolzano’s theorem and the intermediate-value theorem, it is assumed that $f$ is continuous at each point of $[a, b]$, including the endpoints $a$ and $b$. To understand why continuity at both endpoints is necessary, we refer to the curve in Figure 3.9. Here $f$ is continuous everywhere in $[a, b]$ except at $a$. Although $f(a)$ is negative and $f(b)$ is positive, there is no $x$ in $[a, b]$ for which $f(x) = 0$.
We conclude this section with an application of the intermediate-value theorem in which we prove that every positive real number has a positive nth root, a fact mentioned earlier in Section I 3.14. We state this as a formal theorem.
THEOREM 3.9. If $n$ is a positive integer and if $a > 0$, then there is exactly one positive $b$ such that $b^n = a$.
Proof. Choose $c > 1$ such that $0 < a < c$, and consider the function $f$ defined on the interval $[0, c]$ by the equation $f(x) = x^n$. This function is continuous on $[0, c]$, and at the endpoints we have $f(0) = 0$, $f(c) = c^n$. Since $0 < a < c \le c^n$, the given number $a$ lies between the function values $f(0)$ and $f(c)$. Therefore, by the intermediate-value theorem, we have $f(x) = a$ for some $x$ in $(0, c)$, say for $x = b$. This proves the existence of at least one positive $b$ such that $b^n = a$. There cannot be more than one such $b$ because $f$ is strictly increasing on $[0, c]$. This completes the proof.
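The existence argument above can be mirrored numerically. The sketch below is a hedged illustration, not the text’s method: it picks a $c$ with $c > 1$ and $c > a$ as in the proof, then bisects $[0, c]$ using the fact that $x^n$ is strictly increasing there. The name `nth_root` and the tolerance values are assumptions made for this example.

```python
def nth_root(a, n, tol=1e-12, max_iter=200):
    """Approximate the positive b with b**n == a by bisecting x**n on [0, c]."""
    if a <= 0 or n < 1:
        raise ValueError("need a > 0 and a positive integer n")
    c = max(a, 1.0) + 1.0  # guarantees c > 1 and 0 < a < c <= c**n
    lo, hi = 0.0, c        # invariant: lo**n <= a <= hi**n
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        # x**n is strictly increasing for x >= 0, so one comparison
        # tells us which half contains the root.
        if mid**n < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(nth_root(2, 2), nth_root(27, 3))
```

Because the function is strictly increasing, the bracket always contains the single root, which matches the uniqueness part of the theorem.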
3.11 Exercises
1. Let $f$ be a polynomial of degree $n$, say $f(x) = \sum_{k=0}^{n} c_k x^k$, such that the first and last coefficients $c_0$ and $c_n$ have opposite signs. Prove that $f(x) = 0$ for at least one positive $x$.
2. A real number $x_1$, such that $f(x_1) = 0$, is said to be a real root of the equation $f(x) = 0$. We say that a real root of an equation has been isolated if we exhibit an interval $[a, b]$ containing this root and no others. With the aid of Bolzano’s theorem, isolate the real roots of each of the following equations (each has four real roots).
(a) $3x^4 - 2x^3 - 36x^2 + 36x - 8 = 0$.
(b) $2x^4 - 14x^2 + 14x - 1 = 0$.
(c) $x^4 + 4x^3 + x^2 - 6x + 2 = 0$.
3. If $n$ is an odd positive integer and $a < 0$, prove that there is exactly one negative $b$ such that $b^n = a$.
4. Let $f(x) = \tan x$. Although $f(\pi/4) = 1$ and $f(3\pi/4) = -1$, there is no $x$ in the interval $[\pi/4, 3\pi/4]$ such that $f(x) = 0$. Explain why this does not contradict Bolzano’s theorem.
5. Given a real-valued function $f$ which is continuous on the closed interval $[0, 1]$. Assume that $0 \le f(x) \le 1$ for each $x$ in $[0, 1]$. Prove that there is at least one point $c$ in $[0, 1]$ for which $f(c) = c$. Such a point is called a fixed point of $f$. The result of this exercise is a special case of Brouwer’s fixed-point theorem. [Hint: Apply Bolzano’s theorem to $g(x) = f(x) - x$.]
6. Given a real-valued function $f$ which is continuous on the closed interval $[a, b]$. Assume that $f(a) \le a$ and that $f(b) \ge b$. Prove that $f$ has a fixed point in $[a, b]$. (See Exercise 5.)
3.12 The process of inversion
This section describes another important method that is often used to construct new functions from given ones. Before we describe the method in detail, we will illustrate it with a simple example.
Consider the function $f$ defined on the interval $[0, 2]$ by the equation $f(x) = 2x + 1$. The range of $f$ is the interval $[1, 5]$. Each point $x$ in $[0, 2]$ is carried by $f$ onto exactly one point $y$ in $[1, 5]$, namely
$$y = 2x + 1. \tag{3.21}$$
Conversely, for every $y$ in $[1, 5]$, there is exactly one $x$ in $[0, 2]$ for which $y = f(x)$. To find this $x$, we solve Equation (3.21) to obtain
$$x = \tfrac{1}{2}(y - 1).$$
This equation defines $x$ as a function of $y$. If we denote this function by $g$, we have
$$g(y) = \tfrac{1}{2}(y - 1)$$
for each $y$ in $[1, 5]$. The function $g$ is called the inverse of $f$. Note that $g[f(x)] = x$ for each $x$ in $[0, 2]$, and that $f[g(y)] = y$ for each $y$ in $[1, 5]$.
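The two round-trip identities in this example are easy to check numerically. The following Python sketch evaluates them on a few sample points; the sample lists are arbitrary choices made for illustration.

```python
def f(x):
    """f(x) = 2x + 1 on the interval [0, 2]."""
    return 2 * x + 1


def g(y):
    """Inverse of f: g(y) = (y - 1) / 2 on the interval [1, 5]."""
    return (y - 1) / 2


xs = [0.0, 0.5, 1.0, 1.5, 2.0]   # sample points in [0, 2]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]   # sample points in [1, 5]

round_trip_x = [g(f(x)) for x in xs]  # should reproduce xs
round_trip_y = [f(g(y)) for y in ys]  # should reproduce ys
print(round_trip_x, round_trip_y)
```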
Consider now a more general function $f$ with domain $A$ and range $B$. For each $x$ in $A$, there is exactly one $y$ in $B$ such that $y = f(x)$. For each $y$ in $B$, there is at least one $x$ in $A$ such that $f(x) = y$. Suppose that there is exactly one such $x$. Then we can define a new function $g$ on $B$ as follows:
$$g(y) = x \quad \text{means} \quad y = f(x).$$
In other words, the value of $g$ at each point $y$ in $B$ is that unique $x$ in $A$ such that $f(x) = y$. This new function $g$ is called the inverse of $f$. The process by which $g$ is obtained from $f$ is called inversion. Note that $g[f(x)] = x$ for all $x$ in $A$, and that $f[g(y)] = y$ for all $y$ in $B$.
The process of inversion can be applied to any function $f$ having the property that for each $y$ in the range of $f$, there is exactly one $x$ in the domain of $f$ such that $f(x) = y$. In particular, a function that is continuous and strictly monotonic on an interval $[a, b]$ has this property. An example is shown in Figure 3.10. Let $c = f(a)$, $d = f(b)$. The intermediate-value theorem for continuous functions tells us that in the interval $[a, b]$, $f$ takes on every value between $c$ and $d$. Moreover, $f$ cannot take on the same value twice because $f(x_1) \ne f(x_2)$ whenever $x_1 \ne x_2$. Therefore, every continuous strictly monotonic function has an inverse.
The relation between a function $f$ and its inverse $g$ can also be simply explained in the ordered-pair formulation of the function concept. In Section 1.3 we described a function $f$ as a set of ordered pairs $(x, y)$ no two of which have the same first element. The inverse function $g$ is formed by taking the pairs $(x, y)$ in $f$ and interchanging the elements $x$ and $y$. That is, $(y, x) \in g$ if and only if $(x, y) \in f$. If $f$ is strictly monotonic, then no two pairs in $f$ have the same second element, and hence no two pairs of $g$ have the same first element. Thus $g$ is, indeed, a function.
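For a function with finitely many pairs, the ordered-pair description of inversion can be sketched directly. In the Python example below, the helper name `invert_pairs` and the sample pairs are hypothetical; the function refuses to build an inverse when two pairs share a second element, which is exactly the situation that strict monotonicity rules out.

```python
def invert_pairs(pairs):
    """Swap each (x, y) to (y, x); fail if two pairs share the same y."""
    inverse = {}
    for x, y in pairs:
        if y in inverse and inverse[y] != x:
            raise ValueError("two pairs share a second element; no inverse")
        inverse[y] = x
    return inverse


# Hypothetical sample: a few pairs of f(x) = 2x + 1.
f_pairs = [(0, 1), (1, 3), (2, 5)]
g_pairs = invert_pairs(f_pairs)
print(g_pairs)
```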
EXAMPLE. The nth-root function. If $n$ is a positive integer, let $f(x) = x^n$ for $x \ge 0$. Then $f$ is strictly increasing on every interval $[a, b]$ with $0 \le a < b$. The inverse function $g$ is the nth-root function, defined for $y \ge 0$ by the equation
$$g(y) = y^{1/n}.$$
3.13 Properties of functions preserved by inversion
Many properties possessed by the function $f$ are transmitted to the inverse $g$. Figure 3.11 illustrates the relationship between their graphs. One can be obtained from the other merely by reflection through the line $y = x$, because a point $(u, v)$ lies on the graph of $f$ if and only if the point $(v, u)$ lies on the graph of $g$.
FIGURE 3.10 A continuous, strictly increasing function.
FIGURE 3.11 Illustrating the process of inversion. The point $(u, v)$ with $v = f(u)$ lies on the graph of $f$; the point $(v, u)$ with $u = g(v)$ lies on the graph of $g$.
The properties of monotonicity and continuity possessed by f are transmitted to the
inverse function g, as described by the following theorem.
THEOREM 3.10. Assume $f$ is strictly increasing and continuous on an interval $[a, b]$. Let $c = f(a)$ and $d = f(b)$ and let $g$ be the inverse of $f$. That is, for each $y$ in $[c, d]$, let $g(y)$ be that $x$ in $[a, b]$ such that $y = f(x)$. Then
(a) $g$ is strictly increasing on $[c, d]$;
(b) $g$ is continuous on $[c, d]$.
Proof. Choose $y_1 < y_2$ in $[c, d]$ and let $x_1 = g(y_1)$, $x_2 = g(y_2)$. Then $y_1 = f(x_1)$ and $y_2 = f(x_2)$. Since $f$ is strictly increasing, the relation $y_1 < y_2$ implies $x_1 < x_2$, which, in turn, implies $g$ is strictly increasing on $[c, d]$. This proves part (a).
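Now we prove (b). The proof is illustrated in Figure 3.12. Choose a point $y_0$ in the open interval $(c, d)$. To prove $g$ is continuous at $y_0$, we must show that for every $\varepsilon > 0$ there is a $\delta > 0$ such that
$$g(y_0) - \varepsilon < g(y) < g(y_0) + \varepsilon \quad \text{whenever } y_0 - \delta < y < y_0 + \delta. \tag{3.22}$$
Let $x_0 = g(y_0)$, so that $f(x_0) = y_0$. Suppose $\varepsilon$ is given. (There is no loss in generality if we consider only those $\varepsilon$ small enough so that both $x_0 - \varepsilon$ and $x_0 + \varepsilon$ are in $[a, b]$.) Let $\delta$ be the smaller of the two numbers
$$f(x_0) - f(x_0 - \varepsilon) \quad \text{and} \quad f(x_0 + \varepsilon) - f(x_0).$$
It is easy to check that this $\delta$ works in (3.22): if $y_0 - \delta < y < y_0 + \delta$, then $f(x_0 - \varepsilon) < y < f(x_0 + \varepsilon)$, and since $g$ is strictly increasing this gives $x_0 - \varepsilon < g(y) < x_0 + \varepsilon$. A slight modification of the argument proves that $g$ is continuous from the right at $c$, and continuous from the left at $d$.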
There is a corresponding theorem for decreasing functions. That is, the inverse of a strictly decreasing continuous function $f$ is strictly decreasing and continuous. This follows by applying Theorem 3.10 to $-f$.
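The choice of $\delta$ in the proof of part (b) can be tried on a concrete case. The sketch below is an illustration under stated assumptions: it takes $f(x) = x^3$ with inverse $g(y) = y^{1/3}$, and the point $x_0 = 1$ with $\varepsilon = 0.1$, all of which are choices made for this example. It computes $\delta$ as the smaller of the two differences and checks the inequality of the proof on a few sample values of $y$.

```python
def f(x):
    return x**3


def g(y):
    """Inverse of f for y >= 0 (cube root)."""
    return y ** (1.0 / 3.0)


def delta_for(x0, eps):
    """The smaller of f(x0) - f(x0 - eps) and f(x0 + eps) - f(x0)."""
    return min(f(x0) - f(x0 - eps), f(x0 + eps) - f(x0))


x0, eps = 1.0, 0.1
y0 = f(x0)
delta = delta_for(x0, eps)

# Sample points strictly inside (y0 - delta, y0 + delta).
ys = [y0 + delta * t for t in (-0.999, -0.5, 0.0, 0.5, 0.999)]
ok = all(abs(g(y) - x0) < eps for y in ys)
print(delta, ok)
```

A finite sample cannot prove continuity; it only shows the construction behaving as the argument predicts at the tested points.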
FIGURE 3.12 Proof of the continuity of the inverse function. ($\delta$ is the smaller of the two distances marked on the $y$-axis.)
EXAMPLE. Continuity of the nth-root function. The nth-root function $g$, defined for $y \ge 0$ by the equation $g(y) = y^{1/n}$, is strictly increasing and continuous on every interval $[c, d]$ with $0 \le c < d$, since it is the inverse of a strictly increasing continuous function. This gives an alternate proof of the continuity of the nth-root function, independent of the theory of integration. Since the product of continuous functions is continuous, we again deduce the continuity of the rth-power function, $h(y) = y^r$, where $r = m/n$ is a positive rational number and $y \ge 0$.
3.14 Inverses of piecewise monotonic functions
Suppose we try to apply the process of inversion to a function that is not monotonic on $[a, b]$. For example, suppose that $f(x) = x^2$ on an interval of the form $[-c, c]$ on the x-axis. Each point $x$ in this interval is carried by $f$ into exactly one point $y$ in the interval $[0, c^2]$, namely,
$$y = x^2. \tag{3.23}$$
We can solve Equation (3.23) for $x$ in terms of $y$, but there are two values of $x$ corresponding to each $y$ in $(0, c^2]$, namely,
$$x = \sqrt{y} \quad \text{and} \quad x = -\sqrt{y}.$$
As we have mentioned once before, there was a time when mathematicians would have said that the inverse $g$ in this case is a double-valued function defined by
$$g(y) = \pm\sqrt{y}.$$
But since the more modern point of view does not admit double-valuedness as a property of functions, in a case like this we say that the process of inversion gives rise to two new functions, say $g_1$ and $g_2$, where
$$g_1(y) = \sqrt{y} \quad \text{and} \quad g_2(y) = -\sqrt{y} \quad \text{for each } y \text{ in } [0, c^2]. \tag{3.24}$$
To fit this in with the notion of inverse as explained above, we can look upon the equation $y = x^2$ as defining not one function $f$ but two functions $f_1$ and $f_2$, say, where
$$f_1(x) = x^2 \ \text{ if } 0 \le x \le c \quad \text{and} \quad f_2(x) = x^2 \ \text{ if } -c \le x \le 0.$$
These may be considered as distinct functions because they have different domains. Each function is monotonic on its domain and each has an inverse, the inverse of $f_1$ being $g_1$ and the inverse of $f_2$ being $g_2$, where $g_1$ and $g_2$ are given by (3.24).
This illustrates how the process of inversion can be applied to piecewise monotonic functions. We simply consider such a function as a union of monotonic functions and invert each monotonic piece.
We shall make extensive use of the process of inversion in Chapter 6.
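The splitting of $y = x^2$ into two monotonic pieces, each with its own inverse, can be sketched as follows. The sample points are assumptions chosen so that the squares and square roots are exact in floating-point arithmetic.

```python
import math


def f(x):
    return x * x


def g1(y):
    """Inverse of f restricted to [0, c]."""
    return math.sqrt(y)


def g2(y):
    """Inverse of f restricted to [-c, 0]."""
    return -math.sqrt(y)


right = [0.0, 0.5, 1.0, 1.5, 2.0]   # points in [0, c] with c = 2
left = [-x for x in right]          # points in [-c, 0]

right_ok = all(g1(f(x)) == x for x in right)  # g1 undoes f on [0, c]
left_ok = all(g2(f(x)) == x for x in left)    # g2 undoes f on [-c, 0]
print(right_ok, left_ok)
```

Applying `g1` to values that came from the left piece would return the reflected point rather than the original, which is why each piece needs its own inverse.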
3.15 Exercises
In each of Exercises 1 through 5, show that $f$ is strictly monotonic on the whole real axis. Let $g$ denote the inverse of $f$. Describe the domain of $g$ in each case. Write $y = f(x)$ and solve for $x$ in terms of $y$; thus find a formula (or formulas) for computing $g(y)$ for each $y$ in the domain of $g$.
1. $f(x) = x + 1$.
2. $f(x) = 2x + 5$.
3. $f(x) = 1 - x$.
4. $f(x) = x^3$.
5. $f(x) = \begin{cases} x & \text{if } x < 1, \\ x^2 & \text{if } 1 \le x \le 4, \\ 8\sqrt{x} & \text{if } x > 4. \end{cases}$
Mean values. Let $f$ be continuous and strictly monotonic on the positive real axis and let $g$ denote the inverse of $f$. If $a_1 < a_2 < \cdots < a_n$ are $n$ given positive real numbers, we define their mean value (or average) with respect to $f$ to be the number $M_f$ defined as follows:
$$M_f = g\left(\frac{1}{n} \sum_{i=1}^{n} f(a_i)\right).$$
In particular, when $f(x) = x^p$ for $p \ne 0$, $M_f$ is called the pth power mean. (See also Section I 4.10.) The exercises which follow deal with properties of mean values.
6. Show that $f(M_f) = (1/n) \sum_{i=1}^{n} f(a_i)$. In other words, the value of $f$ at the average $M_f$ is the arithmetic mean of the function values $f(a_1), \ldots, f(a_n)$.
7. Show that $a_1 < M_f < a_n$. In other words, the average of $a_1, \ldots, a_n$ lies between the largest and smallest of the $a_i$.
8. If $h(x) = a f(x) + b$, where $a \ne 0$, show that $M_h = M_f$. This shows that different functions may lead to the same average. Interpret this theorem geometrically by comparing the graphs of $h$ and $f$.
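The definition of the mean value with respect to $f$ translates directly into code. The sketch below assumes the formula $M_f = g\big((1/n)\sum f(a_i)\big)$ implied by Exercise 6; the names `mean_value` and `power_mean` and the sample data are illustrative choices, not part of the text.

```python
def mean_value(values, f, g):
    """M_f = g((1/n) * sum of f(a_i)), where g is the inverse of f."""
    return g(sum(f(a) for a in values) / len(values))


def power_mean(values, p):
    """pth power mean: f(x) = x**p with inverse g(y) = y**(1/p), p != 0."""
    if p == 0:
        raise ValueError("p must be nonzero")
    return mean_value(values, lambda x: x**p, lambda y: y ** (1.0 / p))


data = [1.0, 2.0, 4.0]               # positive sample numbers
arithmetic = power_mean(data, 1)     # p = 1
harmonic = power_mean(data, -1)      # p = -1
quadratic = power_mean(data, 2)      # p = 2
print(harmonic, arithmetic, quadratic)
```

For these data each of the three means lies strictly between the smallest and largest entries, in line with the claim of Exercise 7.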
3.16 The extreme-value theorem for continuous functions
Let $f$ be a real-valued function defined on a set $S$ of real numbers. The function $f$ is said to have an absolute maximum on the set $S$ if there is at least one point $c$ in $S$ such that
$$f(x) \le f(c) \quad \text{for all } x \text{ in } S.$$
The number $f(c)$ is called the absolute maximum value of $f$ on $S$. We say that $f$ has an absolute minimum on $S$ if there is a point $d$ in $S$ such that
$$f(x) \ge f(d) \quad \text{for all } x \text{ in } S.$$
FIGURE 3.13 Maximum and minimum values of functions. (a) $f(x) = \sin x$, $0 \le x \le \pi$: an absolute maximum and an absolute minimum exist. (b) $f(x) = 1/x$ if $0 < x \le 2$, $f(0) = 1$: no absolute maximum exists.
These concepts are illustrated in Figure 3.13. In Figure 3.13(a), $S$ is the closed interval $[0, \pi]$ and $f(x) = \sin x$. The absolute minimum, which occurs at both endpoints of the interval, is 0. The absolute maximum is $f(\tfrac{1}{2}\pi) = 1$.
In Figure 3.13(b), $S$ is the closed interval $[0, 2]$ and $f(x) = 1/x$ if $x > 0$, $f(0) = 1$. In this example, $f$ has an absolute minimum at $x = 2$, but it has no absolute maximum. It fails to have a maximum because of a discontinuity at a point of $S$.
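The behaviour in Figure 3.13(b) can be observed numerically. The sketch below is only an illustration: sampling finitely many points cannot prove that no maximum exists, but it shows the values of $f$ growing as $x$ approaches 0 from the right. The grid size and the sample points are arbitrary choices for this example.

```python
def f(x):
    """f(x) = 1/x for 0 < x <= 2, with f(0) = 1."""
    return 1.0 if x == 0 else 1.0 / x


# Values of f at x = 0.1, 0.01, ..., 0.000001 keep increasing.
near_zero = [f(10.0 ** -k) for k in range(1, 7)]

# On a grid over [0, 2] the smallest sampled value occurs at x = 2.
grid = [2.0 * i / 1000 for i in range(1001)]
smallest = min(f(x) for x in grid)
print(near_zero, smallest)
```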
We wish to prove that if $S$ is a closed interval and if $f$ is continuous everywhere on $S$, then $f$ has both an absolute maximum and an absolute minimum on $S$. This result, known as the extreme-value theorem for continuous functions, will be deduced as a simple consequence of the following theorem.
THEOREM 3.11. BOUNDEDNESS THEOREM FOR CONTINUOUS FUNCTIONS. Let $f$ be continuous on a closed interval $[a, b]$. Then $f$ is bounded on $[a, b]$. That is, there is a number $C \ge 0$ such that $|f(x)| \le C$ for all $x$ in $[a, b]$.
Proof. We argue by contradiction, using a technique called the method of successive bisection. Assume that $f$ is unbounded (not bounded) on $[a, b]$. Let $c$ be the midpoint of $[a, b]$. Since $f$ is unbounded on $[a, b]$ it is unbounded on at least one of the subintervals $[a, c]$ or $[c, b]$. Let $[a_1, b_1]$ be that half of $[a, b]$ in which $f$ is unbounded. If $f$ is unbounded in both halves, let $[a_1, b_1]$ be the left half, $[a, c]$. Now continue the bisection process repeatedly, denoting by $[a_{n+1}, b_{n+1}]$ that half of $[a_n, b_n]$ in which $f$ is unbounded, with the understanding that we choose the left half if $f$ is unbounded in both halves. Since the length of each interval is half that of its predecessor, we note that the length of $[a_n, b_n]$ is $(b - a)/2^n$.
Let $A$ denote the set of leftmost endpoints $a, a_1, a_2, \ldots$, so constructed, and let $\alpha$ be the supremum of $A$. Then $\alpha$ lies in $[a, b]$. By continuity of $f$ at $\alpha$, there is an interval of the form $(\alpha - \delta, \alpha + \delta)$ in which
$$|f(x) - f(\alpha)| < 1. \tag{3.25}$$
If $\alpha = a$ this interval has the form $[a, a + \delta)$, and if $\alpha = b$ it has the form $(b - \delta, b]$. Inequality (3.25) implies
$$|f(x)| < 1 + |f(\alpha)|,$$
so $f$ is bounded by $1 + |f(\alpha)|$ in this interval. However, the interval $[a_n, b_n]$ lies inside $(\alpha - \delta, \alpha + \delta)$ when $n$ is so large that $(b - a)/2^n < \delta$. Therefore $f$ is also bounded in $[a_n, b_n]$, contradicting the fact that $f$ is unbounded on $[a_n, b_n]$. This contradiction completes the proof.
If $f$ is bounded on $[a, b]$, then the set of all function values $f(x)$ is bounded above and below. Therefore, this set has a supremum and an infimum which we denote by $\sup f$ and $\inf f$, respectively. That is, we write
$$\sup f = \sup \{ f(x) \mid a \le x \le b \}, \qquad \inf f = \inf \{ f(x) \mid a \le x \le b \}.$$
For any bounded function we have $\inf f \le f(x) \le \sup f$ for all $x$ in $[a, b]$. Now we prove that a continuous function takes on both values $\inf f$ and $\sup f$ somewhere in $[a, b]$.
THEOREM 3.12. EXTREME-VALUE THEOREM FOR CONTINUOUS FUNCTIONS. Assume $f$ is continuous on a closed interval $[a, b]$. Then there exist points $c$ and $d$ in $[a, b]$ such that
$$f(c) = \sup f \quad \text{and} \quad f(d) = \inf f.$$
Proof. It suffices to prove that $f$ attains its supremum in $[a, b]$. The result for the infimum then follows as a consequence because the infimum of $f$ is the negative of the supremum of $-f$.
Let $M = \sup f$. We shall assume that there is no $x$ in $[a, b]$ for which $f(x) = M$ and obtain a contradiction. Let $g(x) = M - f(x)$. Then $g(x) > 0$ for all $x$ in $[a, b]$ so the reciprocal $1/g$ is continuous on $[a, b]$. By Theorem 3.11, $1/g$ is bounded on $[a, b]$, say $1/g(x) < C$ for all $x$ in $[a, b]$, where $C > 0$. This implies $M - f(x) > 1/C$, so that $f(x) < M - 1/C$ for all $x$ in $[a, b]$. This contradicts the fact that $M$ is the least upper bound of $f$ on $[a, b]$. Hence, $f(x) = M$ for at least one $x$ in $[a, b]$.
Note: This theorem shows that if $f$ is continuous on $[a, b]$, then $\sup f$ is its absolute maximum, and $\inf f$ its absolute minimum. Hence, by the intermediate-value theorem, the range of $f$ is the closed interval $[\inf f, \sup f]$.
3.17 The small-span theorem for continuous functions (uniform continuity)
Let $f$ be real-valued and continuous on a closed interval $[a, b]$ and let $M(f)$ and $m(f)$ denote, respectively, the maximum and minimum values of $f$ on $[a, b]$. We shall call the difference
$$M(f) - m(f)$$
the span of $f$ in the interval $[a, b]$. Some authors use the term oscillation instead of span. However, oscillation has the disadvantage of suggesting undulating or wavelike functions. Older texts use the word saltus, which is Latin for leap. The word “span” seems more suggestive of what is being measured here. We note that the span of $f$ in any subinterval of $[a, b]$ cannot exceed the span of $f$ in $[a, b]$.
We shall prove next that the interval $[a, b]$ can be partitioned so that the span of $f$ in each subinterval is arbitrarily small. More precisely, we have the following theorem which we call the small-span theorem for continuous functions. It is usually referred to in the literature as the theorem on uniform continuity.
THEOREM 3.13. Let $f$ be continuous on a closed interval $[a, b]$. Then, for every $\varepsilon > 0$ there is a partition of $[a, b]$ into a finite number of subintervals such that the span of $f$ in every subinterval is less than $\varepsilon$.
Proof. We argue by contradiction, using the method of successive bisections. Assume the theorem is false. That is, assume that for some $\varepsilon$, say for $\varepsilon = \varepsilon_0$, the interval $[a, b]$ cannot be partitioned into a finite number of subintervals in each of which the span of $f$ is less than $\varepsilon_0$. Let $c$ be the midpoint of $[a, b]$. Then for the same $\varepsilon_0$, the theorem is false in at least one of the two subintervals $[a, c]$ or $[c, b]$. (If the theorem were true in both intervals $[a, c]$ and $[c, b]$, it would also be true in the full interval $[a, b]$.) Let $[a_1, b_1]$ be that half of $[a, b]$ in which the theorem is false for $\varepsilon_0$. If it is false in both halves, let $[a_1, b_1]$ be the left half, $[a, c]$. Now continue the bisection process repeatedly, denoting by $[a_{n+1}, b_{n+1}]$ that half of $[a_n, b_n]$ in which the theorem is false for $\varepsilon_0$, with the understanding that we choose the left half if the theorem is false in both halves of $[a_n, b_n]$. Note that the span of $f$ in each subinterval $[a_n, b_n]$ so constructed is at least $\varepsilon_0$.
Let $A$ denote the collection of leftmost endpoints $a, a_1, a_2, \ldots$, so constructed, and let $\alpha$ be the least upper bound of $A$. Then $\alpha$ lies in $[a, b]$. By continuity of $f$ at $\alpha$, there is an interval $(\alpha - \delta, \alpha + \delta)$ in which the span of $f$ is less than $\varepsilon_0$. (If $\alpha = a$, this interval is $[a, a + \delta)$, and if $\alpha = b$, it is $(b - \delta, b]$.) However, the interval $[a_n, b_n]$ lies inside $(\alpha - \delta, \alpha + \delta)$ when $n$ is so large that $(b - a)/2^n < \delta$, so the span of $f$ in $[a_n, b_n]$ is also less than $\varepsilon_0$, contradicting the fact that the span of $f$ is at least $\varepsilon_0$ in $[a_n, b_n]$. This contradiction completes the proof of Theorem 3.13.
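The proof above is by contradiction, but the bisection idea also suggests a direct way to build such a partition numerically. The following Python sketch is an illustration under explicit assumptions: the span on each piece is only estimated by sampling (so it is exact for monotonic pieces but merely approximate in general), and the names `span`, `small_span_partition`, and the `max_depth` safeguard are hypothetical.

```python
def span(f, lo, hi, samples=200):
    """Estimate max f - min f on [lo, hi] by sampling (an approximation)."""
    vals = [f(lo + (hi - lo) * i / samples) for i in range(samples + 1)]
    return max(vals) - min(vals)


def small_span_partition(f, a, b, eps, max_depth=40):
    """Bisect [a, b] until the sampled span on every piece is below eps."""
    def right_endpoints(lo, hi, depth):
        if span(f, lo, hi) < eps or depth == max_depth:
            return [hi]
        mid = (lo + hi) / 2
        return (right_endpoints(lo, mid, depth + 1)
                + right_endpoints(mid, hi, depth + 1))

    return [a] + right_endpoints(a, b, 0)


# Hypothetical example: f(x) = x**2 on [0, 2] with eps = 0.5.
def square(x):
    return x * x


points = small_span_partition(square, 0.0, 2.0, 0.5)
print(points)
```

The pieces come out shorter where the function is steeper, which is what one expects when the same bound on the span must hold on every subinterval.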
3.18 The integrability theorem for continuous functions
The small-span theorem can be used to prove that a function which is continuous on $[a, b]$ is also integrable on $[a, b]$.
THEOREM 3.14. INTEGRABILITY OF CONTINUOUS FUNCTIONS. If a function $f$ is continuous at each point of a closed interval $[a, b]$, then $f$ is integrable on $[a, b]$.
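Proof. Theorem 3.11 shows that $f$ is bounded on $[a, b]$, so $f$ has an upper integral, $\bar{I}(f)$, and a lower integral, $\underline{I}(f)$. We shall prove that $\underline{I}(f) = \bar{I}(f)$.
Choose an integer $N \ge 1$ and let $\varepsilon = 1/N$. By the small-span theorem, for this choice of $\varepsilon$ there is a partition $P = \{x_0, x_1, \ldots, x_n\}$ of $[a, b]$ into $n$ subintervals such that the span of $f$ in every subinterval is less than $\varepsilon$. Denote by $M_k(f)$ and $m_k(f)$, respectively, the absolute maximum and minimum values of $f$ in the kth subinterval $[x_{k-1}, x_k]$. Then we have
$$M_k(f) - m_k(f) < \varepsilon$$
for each $k = 1, 2, \ldots, n$. Now let $s_n$ and $t_n$ be two step functions defined on $[a, b]$ as follows:
$$s_n(x) = m_k(f) \ \text{ if } x_{k-1} < x \le x_k, \qquad s_n(a) = m_1(f),$$
$$t_n(x) = M_k(f) \ \text{ if } x_{k-1} \le x < x_k, \qquad t_n(b) = M_n(f).$$
Then we have $s_n(x) \le f(x) \le t_n(x)$ for all $x$ in $[a, b]$. Also, we have
$$\int_a^b s_n = \sum_{k=1}^{n} m_k(f)(x_k - x_{k-1}) \quad \text{and} \quad \int_a^b t_n = \sum_{k=1}^{n} M_k(f)(x_k - x_{k-1}).$$
The difference of these two integrals is
$$\int_a^b t_n - \int_a^b s_n = \sum_{k=1}^{n} [M_k(f) - m_k(f)](x_k - x_{k-1}) < \varepsilon \sum_{k=1}^{n} (x_k - x_{k-1}) = \varepsilon (b - a).$$
Since $\varepsilon = 1/N$, this inequality can be written in the form
$$\int_a^b t_n - \int_a^b s_n < \frac{b - a}{N}. \tag{3.26}$$
On the other hand, the upper and lower integrals of $f$ satisfy the inequalities
$$\int_a^b s_n \le \underline{I}(f) \le \int_a^b t_n \quad \text{and} \quad \int_a^b s_n \le \bar{I}(f) \le \int_a^b t_n.$$
Multiplying the first set of inequalities by $(-1)$ and adding the result to the second set, we obtain
$$\bar{I}(f) - \underline{I}(f) \le \int_a^b t_n - \int_a^b s_n.$$
Using (3.26) and the relation $\underline{I}(f) \le \bar{I}(f)$, we have
$$0 \le \bar{I}(f) - \underline{I}(f) < \frac{b - a}{N}$$
for every integer $N \ge 1$. Therefore, by Theorem I.31, we must have $\underline{I}(f) = \bar{I}(f)$. This proves that $f$ is integrable on $[a, b]$.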
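The squeeze between the integrals of the two step functions can be watched numerically. The sketch below makes simplifying assumptions that are not in the text: it uses equal subintervals rather than a partition produced by the small-span theorem, and it takes an increasing function so that the minimum and maximum on each subinterval occur at the left and right endpoints. The name `lower_upper_sums` and the choice $f(x) = x^2$ on $[0, 1]$ are illustrative.

```python
def lower_upper_sums(f, a, b, n):
    """Integrals of the lower and upper step functions for an increasing f.

    For increasing f, the minimum on [x_{k-1}, x_k] is f(x_{k-1}) and the
    maximum is f(x_k).
    """
    h = (b - a) / n
    xs = [a + k * h for k in range(n + 1)]
    lower = sum(f(xs[k - 1]) * h for k in range(1, n + 1))
    upper = sum(f(xs[k]) * h for k in range(1, n + 1))
    return lower, upper


def square(x):
    return x * x


for n in (10, 100, 1000):
    lower, upper = lower_upper_sums(square, 0.0, 1.0, n)
    print(n, lower, upper, upper - lower)
```

For this example the gap between the two sums is $(f(1) - f(0))/n = 1/n$, so it can be made smaller than any prescribed positive number, which is the mechanism the proof relies on.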
3.19 Mean-value theorems for integrals of continuous functions
In Section 2.16 we defined the average value $A(f)$ of a function $f$ over an interval $[a, b]$ to be the quotient $\int_a^b f(x)\,dx/(b - a)$. When $f$ is continuous, we can prove that this average value is equal to the value of $f$ at some point in $[a, b]$.
THEOREM 3.15. MEAN-VALUE THEOREM FOR INTEGRALS. If $f$ is continuous on $[a, b]$, then for some $c$ in $[a, b]$ we have
$$\int_a^b f(x)\,dx = f(c)(b - a).$$
Proof. Let $m$ and $M$ denote, respectively, the minimum and maximum values of $f$ on $[a, b]$. Then $m \le f(x) \le M$ for all $x$ in $[a, b]$. Integrating these inequalities and dividing by $b - a$, we find $m \le A(f) \le M$, where $A(f) = \int_a^b f(x)\,dx/(b - a)$. But now the intermediate-value theorem tells us that $A(f) = f(c)$ for some $c$ in $[a, b]$. This completes the proof.
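As a numerical illustration of Theorem 3.15, the sketch below approximates the average value of a function and then locates a point $c$ where the function equals that average. It rests on assumptions that go beyond the theorem: the integral is approximated by the midpoint rule, and the function is taken to be increasing so that a simple bisection finds $c$. The names `average_value` and `point_with_average` and the example $f(x) = x^2$ on $[0, 3]$ are hypothetical.

```python
def average_value(f, a, b, n=10000):
    """Midpoint-rule approximation to the average of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h / (b - a)


def point_with_average(f, a, b, tol=1e-10):
    """Find c in [a, b] with f(c) equal to the average of f.

    Assumes f is continuous and increasing, so f(a) <= A(f) <= f(b).
    """
    avg = average_value(f, a, b)
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < avg:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, avg


c, avg = point_with_average(lambda x: x * x, 0.0, 3.0)
print(c, avg)
```

Here the exact average is $9/3 = 3$, attained at $c = \sqrt{3}$, and the computed values should agree with these up to the accuracy of the approximations used.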
There is a corresponding result for weighted mean values.
THEOREM 3.16. WEIGHTED MEAN-VALUE THEOREM FOR INTEGRALS. Assume $f$ and $g$ are continuous on $[a, b]$. If $g$ never changes sign in $[a, b]$ then, for some $c$ in $[a, b]$, we have
$$\int_a^b f(x) g(x)\,dx = f(c) \int_a^b g(x)\,dx. \tag{3.27}$$
Proof. Since $g$ never changes sign in $[a, b]$, $g$ is always nonnegative or always nonpositive on $[a, b]$. Let us assume that $g$ is nonnegative on $[a, b]$. Then we may argue as in the proof of Theorem 3.15, except that we integrate the inequalities $m g(x) \le f(x) g(x) \le M g(x)$ to obtain
$$m \int_a^b g(x)\,dx \le \int_a^b f(x) g(x)\,dx \le M \int_a^b g(x)\,dx. \tag{3.28}$$
If $\int_a^b g(x)\,dx = 0$, this inequality shows that $\int_a^b f(x) g(x)\,dx = 0$. In this case, Equation (3.27) holds trivially for any choice of $c$ since both members are zero. Otherwise, the integral of $g$ is positive, and we may divide by this integral in (3.28) and apply the intermediate-value theorem as before to complete the proof. If $g$ is nonpositive, we apply the same argument to $-g$.
The weighted mean-value theorem sometimes leads to a useful estimate for the integral
of a product of two functions, especially if the integral of one of the factors is easy to
compute. Examples are given in the next set of exercises.
3.20 Exercises
1. Use Theorem 3.16 to establish the following inequalities:
2. Note that $\sqrt{1 - x^2} = (1 - x^2)/\sqrt{1 - x^2}$ and use Theorem 3.16 to obtain the inequalities
3. Use the identity $1 + x^6 = (1 + x^2)(1 - x^2 + x^4)$ and Theorem 3.16 to prove that for $a > 0$, we have
Take $a = 1/10$ and calculate the value of the integral rounded off to six decimal places.
4. One of the following two statements is incorrect. Explain why it is wrong.
(a) The integral $\int_{2\pi}^{4\pi} (\sin t)/t\,dt > 0$ because $\int_{2\pi}^{3\pi} (\sin t)/t\,dt > \int_{3\pi}^{4\pi} |\sin t|/t\,dt$.
(b) The integral $\int_{2\pi}^{4\pi} (\sin t)/t\,dt = 0$ because, by Theorem 3.16, for some $c$ between $2\pi$ and $4\pi$ we have
$$\int_{2\pi}^{4\pi} \frac{\sin t}{t}\,dt = \frac{1}{c} \int_{2\pi}^{4\pi} \sin t\,dt = \frac{\cos(2\pi) - \cos(4\pi)}{c} = 0.$$
5. If $n$ is a positive integer, use Theorem 3.16 to show that
$$\int_{\sqrt{n\pi}}^{\sqrt{(n+1)\pi}} \sin(t^2)\,dt = \frac{(-1)^n}{c}, \quad \text{where } \sqrt{n\pi} \le c \le \sqrt{(n+1)\pi}.$$
6. Assume $f$ is continuous on $[a, b]$. If $\int_a^b f(x)\,dx = 0$, prove that $f(c) = 0$ for at least one $c$ in $[a, b]$.
7. Assume that $f$ is integrable and nonnegative on $[a, b]$. If $\int_a^b f(x)\,dx = 0$, prove that $f(x) = 0$ at each point of continuity of $f$. [Hint: If $f(c) > 0$ at a point of continuity $c$, there is an interval about $c$ in which $f(x) > \tfrac{1}{2} f(c)$.]
8. Assume $f$ is continuous on $[a, b]$. Assume also that $\int_a^b f(x) g(x)\,dx = 0$ for every function $g$ that is continuous on $[a, b]$. Prove that $f(x) = 0$ for all $x$ in $[a, b]$.
4 DIFFERENTIAL CALCULUS
4.1 Historical introduction
Newton and Leibniz, quite independently of one another, were largely responsible for
developing the ideas of integral calculus to the point where hitherto insurmountable problems
could be solved by more or less routine methods. The successful accomplishments of these
men were primarily due to the fact that they were able to fuse together the integral calculus
with the second main branch of calculus, differential calculus.
The central idea of differential calculus is the notion of derivative. Like the integral, the derivative originated from a problem in geometry: the problem of finding the tangent line at a point of a curve. Unlike the integral, however, the derivative evolved very late in the history of mathematics. The concept was not formulated until early in the 17th Century when the French mathematician Pierre de Fermat attempted to determine the maxima and minima of certain special functions.
Fermat’s idea, basically very simple, can be understood if we refer to the curve in Figure 4.1. It is assumed that at each of its points this curve has a definite direction that can be described by a tangent line. Some of these tangents are indicated by broken lines in the figure. Fermat noticed that at certain points where the curve has a maximum or minimum, such as those shown in the figure with abscissae $x_0$ and $x_1$, the tangent line must be horizontal. Thus the problem of locating such extreme values is seen to depend on the solution of another problem, that of locating the horizontal tangents.
FIGURE 4.1 The curve has horizontal tangents above the points $x_0$ and $x_1$.
This raises the more general question of determining the direction of the tangent line
at an arbitrary point of the curve. It was the attempt to solve this general problem that
led Fermat to discover some of the rudimentary ideas underlying the notion of derivative.
At first sight there seems to be no connection whatever between the problem of finding
the area of a region lying under a curve and the problem of finding the tangent line at
a point of a curve. The first person to realize that these two seemingly remote ideas are,
in fact, rather intimately related appears to have been Newton’s teacher, Isaac Barrow
(1630-1677). However, Newton and Leibniz were the first to understand the real importance of this relation and they exploited it to the fullest, thus inaugurating an unprecedented era in the development of mathematics.
Although the derivative was originally formulated to study the problem of tangents, it was soon found that it also provides a way to calculate velocity and, more generally, the rate of change of a function.
In the next section we shall consider a special problem involving the calculation of a velocity. The solution of this problem contains all the essential features of the derivative concept and may help to motivate the general definition of derivative which is given in Section 4.3.
4.2 A problem involving velocity
Suppose a projectile is fired straight up from the ground with initial velocity of 144 feet per second. Neglect friction, and assume the projectile is influenced only by gravity so that it moves up and back along a straight line. Let $f(t)$ denote the height in feet that the projectile attains $t$ seconds after firing. If the force of gravity were not acting on it, the projectile would continue to move upward with a constant velocity, traveling a distance of 144 feet every second, and at time $t$ we would have $f(t) = 144t$. In actual practice, gravity causes the projectile to slow down until its velocity decreases to zero and then it drops back to earth. Physical experiments suggest that as long as the projectile is aloft, its height $f(t)$ is given by the formula
$$f(t) = 144t - 16t^2. \tag{4.1}$$
The term $-16t^2$ is due to the influence of gravity. Note that $f(t) = 0$ when $t = 0$ and when $t = 9$. This means that the projectile returns to earth after 9 seconds and it is to be understood that formula (4.1) is valid only for $0 \le t \le 9$.
The problem we wish to consider is this: TO determine the velocity of the projectile at
each instant of its motion. Before we cari understand this problem, we must decide on
what is meant by the velocity at each instant. TO do this, we introduce first the notion
of average velocity during a time interval, say from time t to time t + h. This is defined
to be the quotient
\[ \frac{\text{change in distance during time interval}}{\text{length of time interval}} = \frac{f(t + h) - f(t)}{h}. \]
This quotient, called a difference quotient, is a number which may be calculated whenever
both t and t + h are in the interval [0,9]. The number h may be positive or negative,
but not zero. We shall keep t fixed and see what happens to the difference quotient as
we take values of h with smaller and smaller absolute value.
For example, consider the instant t = 2. The distance traveled after 2 seconds is
f(2) = 288 - 64 = 224.
At time t = 2 + h, the distance covered is
\[ f(2 + h) = 144(2 + h) - 16(2 + h)^2 = 224 + 80h - 16h^2. \]
Therefore the average velocity in the interval from t = 2 to t = 2 + h is
\[ \frac{f(2 + h) - f(2)}{h} = \frac{80h - 16h^2}{h} = 80 - 16h. \]
As we take values of h with smaller and smaller absolute value, this average velocity gets
closer and closer to 80. For example, if h = 0.1, we get an average velocity of 78.4; when
h = 0.001, we get 79.984; when h = 0.00001, we obtain the value 79.99984; and when
h = -0.00001, we obtain 80.00016. The important thing is that we can make the average
velocity as close to 80 as we please by taking |h| sufficiently small. In other words, the
average velocity approaches 80 as a limit when h approaches zero. It seems natural to call
this limiting value the instantaneous velocity at time t = 2.
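The numerical values quoted above can be reproduced with a short computation. The following Python sketch is an illustration added here and is not part of the original development; the names f and average_velocity are our own choices, and floating-point arithmetic introduces small rounding errors in the last digits.

```python
def f(t):
    """Height in feet t seconds after firing, as in formula (4.1)."""
    return 144 * t - 16 * t**2


def average_velocity(t, h):
    """Difference quotient [f(t + h) - f(t)] / h, defined only for h != 0."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(t + h) - f(t)) / h


if __name__ == "__main__":
    # Average velocities near t = 2 for the increments used in the text.
    for h in (0.1, 0.001, 0.00001, -0.00001):
        print(f"h = {h:>9}: average velocity = {average_velocity(2, h):.5f}")
```

The printed values cluster around 80 as |h| shrinks, in agreement with the algebraic form 80 - 16h.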
The same kind of calculation can be carried out for any other instant. The average
velocity for an arbitrary time interval from t to t + h is given by the quotient
\[ \frac{f(t + h) - f(t)}{h} = \frac{[144(t + h) - 16(t + h)^2] - [144t - 16t^2]}{h} = 144 - 32t - 16h. \]
When h approaches zero, the expression on the right approaches 144 - 32t as a limit,
and this limit is defined to be the instantaneous velocity at time t. If we denote the instantaneous velocity by v(t), we may write
\[ v(t) = 144 - 32t. \tag{4.2} \]
The formula in (4.1) for the distance f(t) defines a function f which tells us how high
the projectile is at each instant of its motion. We may refer to f as the position function.
Its domain is the closed interval [0, 9] and its graph is shown in Figure 4.2(a). [The scale
on the vertical axis is distorted in both Figures 4.2(a) and (b).] The formula in (4.2) for
the velocity v(t) defines a new function v which tells us how fast the projectile is moving
at each instant of its motion. This is called the velocity function, and its graph is shown in
Figure 4.2(b). As t increases from 0 to 9, v(t) decreases steadily from v(0) = 144 to v(9) =
-144. To find the time t for which v(t) = 0, we solve the equation 144 = 32t to obtain
t = 9/2. Therefore, at the midpoint of the motion the influence of gravity reduces the
velocity to zero, and the projectile is momentarily at rest. The height at this instant
is f(9/2) = 324. When t > 9/2, the velocity is negative, indicating that the height is
decreasing.
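The values mentioned in this paragraph are easy to check by direct evaluation. The sketch below is an added illustration (the names t_rest and max_height are ours, not the text's); it simply evaluates formulas (4.1) and (4.2).

```python
def f(t):
    """Position function (4.1): height in feet at time t, for 0 <= t <= 9."""
    return 144 * t - 16 * t**2


def v(t):
    """Velocity function (4.2) in feet per second."""
    return 144 - 32 * t


# Solve v(t) = 0, that is, 144 = 32 t.
t_rest = 144 / 32
max_height = f(t_rest)

print(t_rest, max_height)  # time at which the projectile is at rest, and its height
print(v(0), v(9))          # velocities at firing and on return to earth
```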
The limit process by which v(t) is obtained from the difference quotient is written symbolically as follows:
\[ v(t) = \lim_{h \to 0} \frac{f(t + h) - f(t)}{h}. \tag{4.3} \]
This equation is used to define velocity not only for this particular example but, more
generally, for any particle moving along a straight line, provided the position function f
is such that the difference quotient tends to a definite limit as h approaches zero.
FIGURE 4.2 (a) Graph of the position function f(t) = 144t - 16t^2. (b) Graph of the velocity function v(t) = 144 - 32t.
4.3 The derivative of a function
The example described in the foregoing section points the way to the introduction of
the concept of derivative. We begin with a function f defined at least on some open
interval (a, b) on the x-axis. Then we choose a fixed point x in this interval and introduce
the difference quotient
\[ \frac{f(x + h) - f(x)}{h}, \]
where the number h, which may be positive or negative (but not zero), is such that x + h
also lies in (a, b). The numerator of this quotient measures the change in the function
when x changes from x to x + h. The quotient itself is referred to as the average rate of
change of f in the interval joining x to x + h.
Now we let h approach zero and see what happens to this quotient. If the quotient
approaches some definite value as a limit (which implies that the limit is the same whether
h approaches zero through positive values or through negative values), then this limit is
called the derivative of f at x and is denoted by the symbol f'(x) (read as "f prime of x").
Thus, the formal definition of f'(x) may be stated as follows:
DEFINITION OF DERIVATIVE. The derivative f'(x) is defined by the equation
\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}, \tag{4.4} \]
provided the limit exists. The number f'(x) is also called the rate of change of f at x.
By comparing (4.4) with (4.3), we see that the concept of instantaneous velocity is
merely an example of the concept of derivative. The velocity v(t) is equal to the derivative
f'(t), where f is the function which measures position. This is often described by saying
that velocity is the rate of change of position with respect to time. In the example worked
out in Section 4.2, the position function f is described by the equation
\[ f(t) = 144t - 16t^2, \]
and its derivative f’ is a new function (velocity) given by
f’(t) = 144 - 32t.
In general, the limit process which produces f'(x) from f(x) gives us a way of obtaining
a new function f' from a given function f. The process is called differentiation, and f' is
called the first derivative of f. If f', in turn, is defined on an open interval, we can try to
compute its first derivative, denoted by f'' and called the second derivative of f. Similarly,
the nth derivative of f, denoted by $f^{(n)}$, is defined to be the first derivative of $f^{(n-1)}$. We
make the convention that $f^{(0)} = f$, that is, the zeroth derivative is the function itself.
For rectilinear motion, the first derivative of velocity (second derivative of position) is
called acceleration. For example, to compute the acceleration in the example of Section
4.2, we can use Equation (4.2) to form the difference quotient
\[ \frac{v(t + h) - v(t)}{h} = \frac{[144 - 32(t + h)] - [144 - 32t]}{h} = \frac{-32h}{h} = -32. \]
Since this quotient has the constant value -32 for each h ≠ 0, its limit as h → 0 is also
-32. Thus, the acceleration in this problem is constant and equal to -32. This result
tells us that the velocity is decreasing at the rate of 32 feet per second every second. In 9
seconds the total decrease in velocity is 9 · 32 = 288 feet per second. This agrees with the
fact that during the 9 seconds of motion the velocity changes from v(0) = 144 to
v(9) = - 144.
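Because the difference quotient of v is constant, it can be checked exactly rather than approximately. The following sketch is an added illustration; it uses Python's Fraction type so that no rounding occurs, and the name velocity_quotient is our own.

```python
from fractions import Fraction


def v(t):
    """Velocity function (4.2)."""
    return 144 - 32 * t


def velocity_quotient(t, h):
    """Difference quotient [v(t + h) - v(t)] / h for h != 0."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (v(t + h) - v(t)) / h


# Exact rational arithmetic: the quotient equals -32 for every nonzero h.
samples = [velocity_quotient(Fraction(2), Fraction(1, 10**k)) for k in range(1, 6)]
print(samples)

# Total change in velocity over the 9 seconds of motion.
print(v(9) - v(0))
```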
4.4 Examples of derivatives
EXAMPLE 1. Derivative of a constant function. Suppose f is a constant function, say
f(x) = c for all x. The difference quotient is
\[ \frac{f(x + h) - f(x)}{h} = \frac{c - c}{h} = 0. \]
Since the quotient is 0 for all h ≠ 0, its limit, f'(x), is also 0 for every x. In other words, a
constant function has a zero derivative everywhere.
EXAMPLE 2. Derivative of a linear function. Suppose f is a linear function, say f(x) =
mx + b for all real x. If h ≠ 0, we have
\[ \frac{f(x + h) - f(x)}{h} = \frac{m(x + h) + b - (mx + b)}{h} = \frac{mh}{h} = m. \]
Since the difference quotient does not change when h approaches 0, we conclude that
\[ f'(x) = m \quad \text{for every } x. \]
Thus, the derivative of a linear function is a constant function.
EXAMPLE 3. Derivative of a positive integer power function. Consider next the case
$f(x) = x^n$, where n is a positive integer. The difference quotient becomes
\[ \frac{f(x + h) - f(x)}{h} = \frac{(x + h)^n - x^n}{h}. \]
To study this quotient as h approaches 0, we can proceed in two ways, either by factoring
the numerator as a difference of two nth powers or by using the binomial theorem to
expand $(x + h)^n$. We shall carry out the details by the first method and leave the other
method as an exercise for the reader. (See Exercise 39 in Section 4.6.)
From elementary algebra we have the identity†
\[ a^n - b^n = (a - b) \sum_{k=0}^{n-1} a^k b^{n-1-k}. \]
If we take a = x + h and b = x and divide both sides by h, this identity becomes
\[ \frac{(x + h)^n - x^n}{h} = \sum_{k=0}^{n-1} (x + h)^k x^{n-1-k}. \]
† This identity is an immediate consequence of the telescoping property of finite sums. In fact, if we multiply
each term of the sum by (a - b), we find
\[ (a - b) \sum_{k=0}^{n-1} a^k b^{n-1-k} = \sum_{k=0}^{n-1} \left( a^{k+1} b^{n-(k+1)} - a^k b^{n-k} \right) = a^n - b^n. \]
There are n terms in the sum. As h approaches 0, $(x + h)^k$ approaches $x^k$, the kth term
approaches $x^k x^{n-1-k} = x^{n-1}$, and therefore the sum of all n terms approaches $n x^{n-1}$.
From this it follows that $f'(x) = n x^{n-1}$ for every x.
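The factoring argument of Example 3 can be checked with exact rational arithmetic. The sketch below is an added illustration, with function names of our own choosing; it compares the difference quotient of x^n with the n-term sum obtained by factoring, and evaluates that sum at h = 0.

```python
from fractions import Fraction


def power_quotient(x, h, n):
    """Difference quotient [(x + h)^n - x^n] / h for h != 0."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return ((x + h) ** n - x**n) / h


def factored_sum(x, h, n):
    """Sum over k = 0, ..., n-1 of (x + h)^k x^(n-1-k)."""
    return sum((x + h) ** k * x ** (n - 1 - k) for k in range(n))


x, n = Fraction(3, 2), 5
for h in (Fraction(1, 10), Fraction(1, 1000)):
    # The two expressions agree exactly for every nonzero h.
    print(h, power_quotient(x, h, n) == factored_sum(x, h, n))

# Putting h = 0 in the sum gives n identical terms x^(n-1).
print(factored_sum(x, Fraction(0), n) == n * x ** (n - 1))
```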
EXAMPLE 4. Derivative of the sine function. Let s(x) = sin x. The difference quotient
in question is
\[ \frac{s(x + h) - s(x)}{h} = \frac{\sin(x + h) - \sin x}{h}. \]
To transform this into a form that makes it possible to calculate the limit as h → 0, we use
the trigonometric identity
\[ \sin y - \sin x = 2 \sin\frac{y - x}{2} \cos\frac{y + x}{2} \]
with y = x + h. This leads to the formula
\[ \frac{\sin(x + h) - \sin x}{h} = \frac{\sin(h/2)}{h/2} \cos\left(x + \frac{h}{2}\right). \]
As h → 0, the factor $\cos(x + \tfrac{1}{2}h) \to \cos x$ because of the continuity of the cosine. Also,
the limit formula
\[ \lim_{x \to 0} \frac{\sin x}{x} = 1, \]
established earlier in Section 3.4, shows that
\[ \frac{\sin(h/2)}{h/2} \to 1 \quad \text{as } h \to 0. \tag{4.5} \]
Therefore the difference quotient has the limit cos x as h → 0. In other words, s'(x) =
cos x for every x; the derivative of the sine function is the cosine function.
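A numerical experiment makes the two steps of Example 4 concrete. The following sketch is an added illustration using Python's standard math module; the function names are ours. It compares the raw difference quotient of the sine with the factored form, and then compares the factored form with cos x for a small h.

```python
import math


def sine_quotient(x, h):
    """Difference quotient [sin(x + h) - sin x] / h for h != 0."""
    return (math.sin(x + h) - math.sin(x)) / h


def sine_quotient_factored(x, h):
    """The same quotient written as [sin(h/2) / (h/2)] * cos(x + h/2)."""
    half = h / 2
    return (math.sin(half) / half) * math.cos(x + half)


x = 1.0
for h in (0.1, 0.001, 0.00001):
    print(h, sine_quotient(x, h), sine_quotient_factored(x, h), math.cos(x))
```

Both columns of quotients agree up to rounding and move toward cos 1 as h decreases. This is evidence, not a proof; the proof is the limit argument given above.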
EXAMPLE 5. The derivative of the cosine function. Let c(x) = cos x. We shall prove that
c'(x) = -sin x; that is, the derivative of the cosine function is minus the sine function.
We start with the identity
\[ \cos y - \cos x = -2 \sin\frac{y - x}{2} \sin\frac{y + x}{2} \]
and take y = x + h. This leads to the formula
\[ \frac{\cos(x + h) - \cos x}{h} = -\frac{\sin(h/2)}{h/2} \sin\left(x + \frac{h}{2}\right). \]
Continuity of the sine shows that $\sin(x + \tfrac{1}{2}h) \to \sin x$ as h → 0; from (4.5), we obtain
c'(x) = -sin x.
EXAMPLE 6. Derivative of the nth-root function. If n is a positive integer, let $f(x) = x^{1/n}$
for x > 0. The difference quotient for f is
\[ \frac{f(x + h) - f(x)}{h} = \frac{(x + h)^{1/n} - x^{1/n}}{h}. \]
Let $u = (x + h)^{1/n}$ and let $v = x^{1/n}$. Then we have $u^n = x + h$ and $v^n = x$, so
$h = u^n - v^n$, and the difference quotient becomes
\[ \frac{f(x + h) - f(x)}{h} = \frac{u - v}{u^n - v^n} = \frac{1}{u^{n-1} + u^{n-2}v + \cdots + uv^{n-2} + v^{n-1}}. \]
The continuity of the nth-root function shows that u → v as h → 0. Therefore each term
in the denominator on the right has the limit $v^{n-1}$ as h → 0. There are n terms altogether,
so the difference quotient has the limit $v^{1-n}/n$. Since $v = x^{1/n}$, this proves that
\[ f'(x) = \frac{1}{n} x^{1/n - 1}. \]
EXAMPLE 7. Continuity of functions having derivatives. If a function f has a derivative at
a point x, then it is also continuous at x. To prove this, we use the identity
\[ f(x + h) = f(x) + h \left( \frac{f(x + h) - f(x)}{h} \right), \]
which is valid for h ≠ 0. If we let h → 0, the difference quotient on the right approaches
f'(x) and, since this quotient is multiplied by a factor which tends to 0, the second term on
the right approaches 0 · f'(x) = 0. This shows that f(x + h) → f(x) as h → 0, and hence
that f is continuous at x.
This example provides a new way of showing that functions are continuous. Every
time we establish the existence of a derivative f’(x), we also establish, at the same time,
the continuity of f at x. It should be noted, however, that the converse is not true. Continuity at x does not necessarily mean that the derivative f'(x) exists. For example, when
f(x) = |x|, the point x = 0 is a point of continuity of f [since f(x) → 0 as x → 0] but there
is no derivative at 0. (See Figure 4.3.) The difference quotient [f(0 + h) - f(0)]/h is
equal to |h|/h. This has the value +1 if h > 0 and -1 if h < 0, and hence does not tend
to a limit as h → 0.
FIGURE 4.3 The function is continuous at 0 but f'(0) does not exist.
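The one-sided behavior of the difference quotient of |x| at 0 can be observed directly. The sketch below is an added illustration; abs_quotient is a name of our own.

```python
def abs_quotient(h):
    """Difference quotient [f(0 + h) - f(0)] / h for f(x) = |x|, with h != 0."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (abs(0 + h) - abs(0)) / h


# Positive increments give +1, negative increments give -1,
# so the quotient cannot approach a single limit as h -> 0.
right = [abs_quotient(10.0**-k) for k in range(1, 8)]
left = [abs_quotient(-(10.0**-k)) for k in range(1, 8)]
print(right)
print(left)
```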
4.5 The algebra of derivatives
Just as the limit theorems of Section 3.4 tell us how to compute limits of the sum, difference, product, and quotient of two functions, so the next theorem provides us with a
corresponding set of rules for computing derivatives.
THEOREM 4.1. Let f and g be two functions defined on a common interval. At each point
where f and g have a derivative, the same is true of the sum f + g, the difference f - g,
the product f · g, and the quotient f/g. (For f/g we need the extra proviso that g is not zero at
the point in question.) The derivatives of these functions are given by the following formulas:
(i) (f + g)' = f' + g',
(ii) (f - g)' = f' - g',
(iii) (f · g)' = f · g' + g · f',
(iv) $\left( \frac{f}{g} \right)' = \frac{g \cdot f' - f \cdot g'}{g^2}$ at points x where g(x) ≠ 0.
We shall prove this theorem in a moment, but first we want to mention some of its
consequences. A special case of (iii) occurs when one of the two functions is constant,
say g(x) = c for all x under consideration. In this case, (iii) becomes (c · f)' = c · f'. In
other words, the derivative of a constant times f is the constant times the derivative of f.
Combining this with the fact that the derivative of a sum is the sum of derivatives [property
(i)], we find that for every pair of constants $c_1$ and $c_2$ we have
\[ (c_1 f + c_2 g)' = c_1 f' + c_2 g'. \]
This is called the linearity property of the derivative, and it is analogous to the linearity
property of the integral. Using mathematical induction, we can extend the linearity
property to arbitrary finite sums as follows:
\[ \left( \sum_{i=1}^{n} c_i f_i \right)' = \sum_{i=1}^{n} c_i f_i', \]
where $c_1, \ldots, c_n$ are constants and $f_1, \ldots, f_n$ are functions with derivatives $f_1', \ldots, f_n'$.
Every derivative formula can be written in two ways, either as an equality between two
functions or as an equality involving numbers. The properties of Theorem 4.1, as written
above, are equations involving functions. For example, property (i) states that the derivative of the function f + g is the sum of the two functions f' and g'. When these functions
are evaluated at a point x, we obtain formulas involving numbers. Thus formula (i)
implies
\[ (f + g)'(x) = f'(x) + g'(x). \]
We proceed now to the proof of Theorem 4.1.
Proof of (i). Let x be a point where both derivatives f'(x) and g'(x) exist. The difference
quotient for f + g is
\[ \frac{[f(x + h) + g(x + h)] - [f(x) + g(x)]}{h} = \frac{f(x + h) - f(x)}{h} + \frac{g(x + h) - g(x)}{h}. \]
When h → 0 the first quotient on the right approaches f'(x), the second approaches g'(x),
and hence the sum approaches f'(x) + g'(x). This proves (i), and the proof of (ii) is similar.
Proof of (iii). The difference quotient for the product f · g is
\[ \frac{f(x + h)g(x + h) - f(x)g(x)}{h}. \tag{4.6} \]
To study this quotient as h → 0, we add and subtract in the numerator a term which enables
us to write (4.6) as a sum of two terms involving difference quotients of f and g. Adding
and subtracting g(x)f(x + h), we see that (4.6) becomes
\[ \frac{f(x + h)g(x + h) - f(x)g(x)}{h} = g(x) \frac{f(x + h) - f(x)}{h} + f(x + h) \frac{g(x + h) - g(x)}{h}. \]
When h → 0 the first term on the right approaches g(x)f'(x). Since f is continuous at x,
we have f(x + h) → f(x), so the second term approaches f(x)g'(x). This proves (iii).
Proof of (iv). A special case of (iv) occurs when f(x) = 1 for all x. In this case f'(x) = 0
for all x and (iv) reduces to the formula
\[ \left( \frac{1}{g} \right)' = -\frac{g'}{g^2}, \tag{4.7} \]
provided g(x) ≠ 0. We can deduce the general formula (iv) from this special case by
writing f/g as a product and using (iii), since
\[ \left( \frac{f}{g} \right)' = \left( f \cdot \frac{1}{g} \right)' = \frac{1}{g} \cdot f' + f \cdot \left( \frac{1}{g} \right)' = \frac{f'}{g} - \frac{f \cdot g'}{g^2} = \frac{g \cdot f' - f \cdot g'}{g^2}. \]
Therefore it remains to prove (4.7). The difference quotient for 1/g is
\[ \frac{[1/g(x + h)] - [1/g(x)]}{h} = -\frac{g(x + h) - g(x)}{h} \cdot \frac{1}{g(x)} \cdot \frac{1}{g(x + h)}. \tag{4.8} \]
When h → 0, the first quotient on the right approaches g'(x) and the third factor approaches
1/g(x). The continuity of g at x is required since we are using the fact that g(x + h) →
g(x) as h → 0. Hence the quotient in (4.8) approaches $-g'(x)/g(x)^2$, and this proves (4.7).
Note: In order to write (4.8) we need to know that g(x + h) ≠ 0 for all sufficiently small
h. This follows from Theorem 3.7.
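Formulas (iii) and (iv) can be compared numerically with difference quotients. The sketch below is an added illustration, not a proof: it takes f(x) = x^3 and g(x) = sin x, whose derivatives were found in Section 4.4, and an increment h = 10^-6 chosen arbitrarily by us. All function names are our own.

```python
import math


def difference_quotient(func, x, h=1e-6):
    """One-sided difference quotient [func(x + h) - func(x)] / h."""
    return (func(x + h) - func(x)) / h


def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2  # Example 3 of Section 4.4


def g(x):
    return math.sin(x)


def g_prime(x):
    return math.cos(x)  # Example 4 of Section 4.4


def product_rule(x):
    """Right-hand side of (iii): f g' + g f'."""
    return f(x) * g_prime(x) + g(x) * f_prime(x)


def quotient_rule(x):
    """Right-hand side of (iv): (g f' - f g') / g^2, valid where g(x) != 0."""
    return (g(x) * f_prime(x) - f(x) * g_prime(x)) / g(x) ** 2


x = 1.0
print(product_rule(x), difference_quotient(lambda t: f(t) * g(t), x))
print(quotient_rule(x), difference_quotient(lambda t: f(t) / g(t), x))
```

In each printed pair the two numbers agree to several decimal places; the small discrepancy comes from using a fixed nonzero h.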
Theorem 4.1, when used in conjunction with the examples worked out in Section 4.4,
enables us to derive new examples of differentiation formulas.
EXAMPLE 1. Polynomials. In Example 3 of Section 4.4 we showed that if $f(x) = x^n$,
where n is a positive integer, then $f'(x) = n x^{n-1}$. The reader may find it instructive to
rederive this result as a consequence of the special case n = 1, using mathematical induction
in conjunction with the formula for differentiating a product.
Using this result along with the linearity property, we can differentiate any polynomial
by computing the derivative of each term and adding the derivatives. Thus, if
\[ f(x) = \sum_{k=0}^{n} c_k x^k, \]
then, by differentiating term by term, we obtain
\[ f'(x) = \sum_{k=1}^{n} k c_k x^{k-1}. \]
Note that the derivative of a polynomial of degree n is a new polynomial of degree n - 1.
For example, if $f(x) = 2x^3 + 5x^2 - 7x + 8$, then $f'(x) = 6x^2 + 10x - 7$.
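Term-by-term differentiation of a polynomial is a purely mechanical operation on its coefficients. The sketch below is an added illustration; it assumes a polynomial is stored as a list of coefficients in order of increasing degree, a representation chosen by us for convenience.

```python
def differentiate_polynomial(coefficients):
    """Differentiate c0 + c1 x + ... + cn x^n, given as [c0, c1, ..., cn].

    The term ck x^k contributes k ck x^(k-1); the constant term contributes 0.
    The result is returned in the same increasing-degree order.
    """
    if len(coefficients) <= 1:
        return [0]  # a constant polynomial has derivative zero
    return [k * c for k, c in enumerate(coefficients)][1:]


# f(x) = 2x^3 + 5x^2 - 7x + 8 is stored as [8, -7, 5, 2].
print(differentiate_polynomial([8, -7, 5, 2]))
```

For the example in the text the result is the coefficient list of 6x^2 + 10x - 7, again in increasing order of degree.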
EXAMPLE 2. Rational functions. If r is the quotient of two polynomials, say r(x) = p(x)/q(x),
then the derivative r'(x) may be computed by the quotient formula (iv) in
Theorem 4.1. The derivative r'(x) exists at every x for which the denominator q(x) ≠ 0.
Note that the function r' so defined is itself a rational function. In particular, when
$r(x) = 1/x^m$, where m is a positive integer and x ≠ 0, we find
\[ r'(x) = \frac{x^m \cdot 0 - m x^{m-1}}{x^{2m}} = \frac{-m}{x^{m+1}}. \]
If this is written in the form $r'(x) = -m x^{-m-1}$, it provides an extension from positive
exponents to negative exponents of the formula for differentiating nth powers.
EXAMPLE 3. Rational powers. Let $f(x) = x^r$ for x > 0, where r is a rational number.
We have already proved the differentiation formula
\[ f'(x) = r x^{r-1} \tag{4.9} \]
for r = 1/n, where n is a positive integer. Now we extend it to all rational powers. The
formula for differentiating a product shows that Equation (4.9) is also valid for r = 2/n
and, by induction, for r = m/n, where m is any positive integer. (The induction argument
refers to m.) Therefore Equation (4.9) is valid for all positive rational r. The formula
for differentiating a quotient now shows that (4.9) is also valid for negative rational r.
Thus, if $f(x) = x^{2/3}$, we have $f'(x) = \frac{2}{3} x^{-1/3}$. If $f(x) = x^{-1/2}$, then $f'(x) = -\frac{1}{2} x^{-3/2}$. In
each case, we require x > 0.
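The first step of the induction in Example 3, the case r = 2/n, can be written out explicitly. The display below is a worked step added for illustration; the auxiliary name g is ours. It uses only the product formula (iii) of Theorem 4.1 and formula (4.9) for r = 1/n.

```latex
% Write g(x) = x^{1/n}, so that x^{2/n} = g(x) g(x) and g'(x) = (1/n) x^{1/n - 1}.
\begin{align*}
\left( x^{2/n} \right)'
  &= (g \cdot g)'(x) = g(x) g'(x) + g(x) g'(x) = 2 g(x) g'(x) \\
  &= 2 x^{1/n} \cdot \frac{1}{n} x^{1/n - 1}
   = \frac{2}{n} x^{2/n - 1}.
\end{align*}
```

This is formula (4.9) with r = 2/n. The general step from m/n to (m + 1)/n proceeds in the same way, with one factor equal to x^{m/n} and the other equal to x^{1/n}.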
4.6 Exercises
1. If $f(x) = 2 + x - x^2$, compute f'(0), f'(1/2), f'(1), f'(-10).
2. If $f(x) = \frac{1}{3}x^3 + \frac{1}{2}x^2 - 2x$, find all x for which (a) f'(x) = 0; (b) f'(x) = -2; (c) f'(x) = 10.
In Exercises 3 through 12, obtain a formula for f'(x) if f(x) is described as indicated.
3. $f(x) = x^2 + 3x + 2$.
S.f(x) =$
4. $f(x) = x^4 + \sin x$.
5. $f(x) = x^4 \sin x$.
9. $f(x) = \frac{1}{2 + \cos x}$.
10. $f(x) = \frac{x^2 + 3x + 2}{x^4 + x^2 + 1}$.
6.
j-(x> = --&,
x # 1.
x # -1.
11. J’(x) =;I”x.
7. j-(x> = &y + x5 COS x.
13. Assume that the height f(t) of a projectile, t seconds after being fired directly upward from the
ground with an initial velocity of $v_0$ ft/sec, is given by the formula
\[ f(t) = v_0 t - 16t^2. \]
(a) Use the method described in Section 4.2 to show that the average velocity of the projectile
during a time interval from t to t + h is $v_0 - 32t - 16h$ ft/sec, and that the instantaneous
velocity at time t is $v_0 - 32t$ ft/sec.
(b) Compute (in terms of $v_0$) the time required for the velocity to drop to zero.
(c) What is the velocity on return to earth?
(d) What must the initial velocity be for the projectile to return to earth after 1 sec? after
10 sec? after T sec?
(e) Show that the projectile moves with constant acceleration.
(f) Give an example of another formula for the height which will lead to a constant acceleration of -20 ft/sec/sec.
14. What is the rate of change of the volume of a cube with respect to the length of each edge?
15. (a) The area of a circle of radius r is $\pi r^2$ and its circumference is $2\pi r$. Show that the rate of
change of the area with respect to the radius is equal to the circumference.
(b) The volume of a sphere of radius r is $4\pi r^3/3$ and its surface area is $4\pi r^2$. Show that the
rate of change of the volume with respect to the radius is equal to the surface area.
In Exercises 16 through 23, obtain a formula for f'(x) if f(x) is defined as indicated.
16. ,f(x) = 1/x,
x > 0.
1
17. f(x) = ~
1+4’
x > 0.
18. f(X) = 2’2,
19. $f(x) = x^{-3/2}$,
x > 0.
x > 0.
20. $f(x) = x^{1/2} + x^{1/3} + x^{1/4}$, x > 0.
21. $f(x) = x^{-1/2} + x^{-1/3} + x^{-1/4}$, x > 0.
4
22. fC-4 = I+x >
x > 0.
23. f(x) = -?- >
l-t&
x > 0.
24. Let $f_1, \ldots, f_n$ be n functions having derivatives $f_1', \ldots, f_n'$. Develop a rule for differentiating
the product $g = f_1 \cdots f_n$ and prove it by mathematical induction. Show that for those points
x, where none of the function values $f_1(x), \ldots, f_n(x)$ are zero, we have
\[ \frac{g'(x)}{g(x)} = \frac{f_1'(x)}{f_1(x)} + \cdots + \frac{f_n'(x)}{f_n(x)}. \]
25. Verify the entries in the following short table of derivatives. It is understood that the formulas
hold for those x for which f(x) is defined.
f(x)      f'(x)
tan x     sec^2 x
cot x     -csc^2 x
sec x     tan x sec x
csc x     -cot x csc x
In Exercises 26 through 35, compute the derivative f’(x). It is understood that each formula
holds for those x for which f(x) is defined.
26. f(x) = tan x sec x.
27. f(x) = x tan x.
32. $f(x) = \frac{1}{x + \sin x}$.
28. f(x) = ; + -$ + f
30. $f(x) = \frac{1 + x - x^2}{1 - x + x^2}$.
35. $f(x) = \frac{ax^2 + bx + c}{\sin x + \cos x}$.
36. If f(x) = (ax + b) sin x + (cx + d) cos x, determine values of the constants a, b, c, d such
that f'(x) = x cos x.
37. If $g(x) = (ax^2 + bx + c) \sin x + (dx^2 + ex + f) \cos x$, determine values of the constants
a, b, c, d, e, f such that $g'(x) = x^2 \sin x$.
38. Given the formula
\[ 1 + x + x^2 + \cdots + x^n = \frac{x^{n+1} - 1}{x - 1} \]
(valid if x ≠ 1), determine, by differentiation, formulas for the following sums:
(a) $1 + 2x + 3x^2 + \cdots + n x^{n-1}$,
(b) $1^2 x + 2^2 x^2 + 3^2 x^3 + \cdots + n^2 x^n$.
39. Let $f(x) = x^n$, where n is a positive integer. Use the binomial theorem to expand $(x + h)^n$
and derive the formula
\[ \frac{f(x + h) - f(x)}{h} = n x^{n-1} + \frac{n(n - 1)}{2} x^{n-2} h + \cdots + n x h^{n-2} + h^{n-1}. \]
Express the sum on the right in summation notation. Let h → 0 and deduce that $f'(x) = n x^{n-1}$.
State which limit theorems you are using. (This result was derived in another way in Example
3 of Section 4.4.)
4.7 Geometric interpretation of the derivative as a slope
The procedure used to define the derivative has a geometric interpretation which leads in
a natural way to the idea of a tangent line to a curve. A portion of the graph of a function
f is shown in Figure 4.4. Two of its points P and Q are shown with respective coordinates
(x, f(x)) and (x + h, f(x + h)). Consider the right triangle with hypotenuse PQ; its
altitude, f(x + h) - f(x), represents the difference of the ordinates of the two points Q
and P. Therefore, the difference quotient
\[ \frac{f(x + h) - f(x)}{h} \tag{4.10} \]
represents the trigonometric tangent of the angle α that PQ makes with the horizontal.
The real number tan α is called the slope of the line through P and Q and it provides a
way of measuring the "steepness" of this line. For example, if f is a linear function, say
f(x) = mx + b, the difference quotient (4.10) has the value m, so m is the slope of the
line.
FIGURE 4.4 Geometric interpretation of the difference quotient as the tangent of an angle.
FIGURE 4.5 Lines of various slopes (m indicates the slope).
Some examples of lines of various slopes are shown in Figure 4.5. For a horizontal line,
α = 0 and the slope, tan α, is also 0. If α lies between 0 and ½π, the line is rising as we move
from left to right and the slope is positive. If α lies between ½π and π, the line is falling as
we move from left to right and the slope is negative. A line for which α = ¼π has slope 1.
As α increases from 0 to ½π, tan α increases without bound, and the corresponding lines
of slope tan α approach a vertical position. Since tan ½π is not defined, we say that vertical
lines have no slope.
Suppose now that f has a derivative at x. This means that the difference quotient
approaches a certain limit f'(x) as h approaches 0. When this is interpreted geometrically
it tells us that, as h gets nearer to 0, the point P remains fixed, Q moves along the curve
toward P, and the line through PQ changes its direction in such a way that its slope
approaches the number f'(x) as a limit. For this reason it seems natural to define the slope
of the curve at P to be the number f'(x). The line through P having this slope is called the
tangent line at P.
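The passage from chords to the tangent line can be followed in a concrete case. The sketch below is an added illustration for the curve y = x^3 at the point P = (1, 1), using the derivative 3x^2 from Example 3 of Section 4.4; the function names, and the choice of returning the line as a (slope, intercept) pair, are our own.

```python
from fractions import Fraction


def f(x):
    return x**3


def f_prime(x):
    return 3 * x**2  # Example 3 of Section 4.4 with n = 3


def secant_slope(x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h)), h != 0."""
    if h == 0:
        raise ValueError("h must be nonzero")
    return (f(x + h) - f(x)) / h


def tangent_line(x0):
    """Return (slope, intercept) of the tangent line y = slope * x + intercept
    to the graph of f at the point (x0, f(x0))."""
    slope = f_prime(x0)
    intercept = f(x0) - slope * x0
    return slope, intercept


# Chord slopes through P = (1, 1), computed exactly, for shrinking h.
for k in range(1, 5):
    h = Fraction(1, 10**k)
    print(h, secant_slope(Fraction(1), h))

print(tangent_line(1))
```

The chord slopes equal 3 + 3h + h^2 and therefore approach the slope of the tangent line as h shrinks.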
Note: The concept of a line tangent to a circle (and to a few other special curves) was
considered by the ancient Greeks. They defined a tangent line to a circle as a line having
one of its points on the circle and all its other points outside the circle. From this definition, many properties of tangent lines to circles can be derived. For example, we can prove
that the tangent at any point is perpendicular to the radius at that point. However, the
Greek definition of tangent line is not easily extended to more general curves. The method
described above, where the tangent line is defined in terms of a derivative, has proved to
be far more satisfactory. Using this definition, we can prove that for a circle the tangent
line has all the properties ascribed to it by the Greek geometers. Concepts such as perpendicularity and parallelism can be explained rather simply in analytic terms making use
of slopes of lines. For example, from the trigonometric identity
\[ \tan(\alpha - \beta) = \frac{\tan\alpha - \tan\beta}{1 + \tan\alpha \tan\beta}, \]
it follows that two nonvertical lines with the same slope are parallel. Also, from the
identity
\[ \cot(\alpha - \beta) = \frac{1 + \tan\alpha \tan\beta}{\tan\alpha - \tan\beta}, \]
we find that two nonvertical lines with slopes having product -1 are perpendicular.
The algebraic sign of the derivative of a function gives us useful information about the
behavior of its graph. For example, if x is a point in an open interval where the derivative
is positive, then the graph is rising in the immediate vicinity of x as we move from left to
right. This occurs at $x_3$ in Figure 4.6. A negative derivative in an interval means the
graph is falling, as shown at $x_1$, while a zero derivative at a point means a horizontal tangent
line. At a maximum or minimum, such as those shown at $x_2$, $x_5$, and $x_6$, the slope must be
zero. Fermat was the first to notice that points like $x_2$, $x_5$, and $x_6$, where f has a maximum
or minimum, must occur among the roots of the equation f'(x) = 0. It is important to
realize that f'(x) may also be zero at points where there is no maximum or minimum, such
as above the point $x_4$. Note that this particular tangent line crosses the graph. This is an
example of a situation not covered by the Greek definition of tangency.
FIGURE 4.6 Geometric significance of the sign of the derivative.
The foregoing remarks concerning the significance of the algebraic sign of the derivative
may seem quite obvious when we interpret them geometrically. Analytic proofs of these
statements, based on general properties of derivatives, will be given in Section 4.16.
4.8 Other notations for derivatives
Notation has played an extremely important role in the development of mathematics.
Some mathematical symbols, such as $x^n$ or n!, are merely abbreviations that compress long
statements or formulas into a short space. Others, like the integration symbol $\int_a^b f(x)\,dx$,
not only remind us of the process being represented but also help us in carrying out
computations.
Sometimes several different notations are used for the same idea, preference for one
or another being dependent on the circumstances that surround the use of the symbols.
This is especially true in differential calculus where many different notations are used for
derivatives. The derivative of a function f has been denoted in our previous discussions
by f', a notation introduced by J. L. Lagrange (1736-1813) late in the 18th century. This
emphasizes the fact that f' is a new function obtained from f by differentiation, its value
at x being denoted by f'(x). Each point (x, y) on the graph of f has its coordinates x and
y related by the equation y = f(x), and the symbol y' is also used to represent the derivative
f'(x). Similarly, $y'', \ldots, y^{(n)}$ represent the higher derivatives $f''(x), \ldots, f^{(n)}(x)$. For
example, if y = sin x, then y' = cos x, y'' = -sin x, etc. Lagrange's notation is not too
far removed from that used by Newton who wrote $\dot{y}$ and $\ddot{y}$, instead of y' and y''. Newton's
dots are still used by some authors, especially to denote velocity and acceleration.
Another symbol was introduced in 1800 by L. Arbogast (1759-1803) who denoted the
derivative of f by Df, a symbol that has widespread use today. The symbol D is called a
differentiation operator, and it helps to suggest that Df is a new function obtained from f
by the operation of differentiation. Higher derivatives $f'', f''', \ldots, f^{(n)}$ are written $D^2 f$,
$D^3 f, \ldots, D^n f$, respectively, the values of these derivatives at x being written $D^2 f(x)$,
$D^3 f(x), \ldots, D^n f(x)$. Thus, we have D sin x = cos x and $D^2 \sin x = D \cos x = -\sin x$.
The rule for differentiating a sum of two functions becomes, in the D-notation, D(f + g) =
Df + Dg. Evaluation of the derivatives at x leads to the formula [D(f + g)](x) =
Df(x) + Dg(x) which is also written in the form D[f(x) + g(x)] = Df(x) + Dg(x). The
reader may easily formulate the product and quotient rules in the D-notation.
Among the early pioneers of mathematical analysis, Leibniz, more than anyone else,
understood the importance of well-chosen symbols. He experimented at great length and
carried on extensive correspondence with other mathematicians, debating the merits or
drawbacks of various notations. The tremendous impact that calculus has had on the
development of modern mathematics is due in part to its well-developed and highly
suggestive symbols, many of them originated by Leibniz.
Leibniz developed a notation for derivatives quite different from those mentioned above.
Using y for f(x), he wrote the difference quotient
\[ \frac{f(x + h) - f(x)}{h} \]
in the form
\[ \frac{\Delta y}{\Delta x}, \]
where Δx (read as "delta x") was written for h, and Δy for f(x + h) - f(x). The symbol
Δ is called a difference operator. For the limit of the difference quotient, that is, for the
derivative f'(x), Leibniz wrote dy/dx. In this notation, the definition of derivative becomes
\[ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}. \]
Not only was Leibniz’s notation different, but his way of thinking about derivatives was
different. He thought of the limit dy/dx as a quotient of “infinitesimal” quantities dy and
dx called “differentials,” and he referred to the derivative dy/dx as a “differential quotient.”
Leibniz imagined infinitesimals as entirely new types of numbers which, although not zero,
were smaller than every positive real number.
Even though Leibniz was not able to give a satisfactory definition of infinitesimals, he
and his followers used them freely in their development of calculus. Consequently, many
people found calculus somewhat mysterious and began to question the validity of the
methods. The work of Cauchy and others in the 19th Century gradually led to the replacement of infinitesimals by the classical theory of limits. Nevertheless, many people have
found it helpful to try to think as Leibniz did in terms of infinitesimals. This kind of
thinking has intuitive appeal and often leads quickly to results that can be proved correct
by more conventional means.
Recently Abraham Robinson has shown that the real number system can be extended
to incorporate infinitesimals as envisaged by Leibniz. A discussion of this extension and its
impact on many branches of mathematics is given in Robinson's book, Non-standard
Analysis, North-Holland Publishing Company, Amsterdam, 1966.
Although some of Leibniz's ideas fell into temporary disrepute, the same cannot be said
of his notations. The symbol dy/dx for the derivative has the obvious advantage that it
summarizes the whole process of forming the difference quotient and passing to the limit.
Later we shall find the further advantage that certain formulas become easier to remember
and to work with when derivatives are written in the Leibniz notation.
4.9 Exercises
1. Let $f(x) = \frac{1}{3}x^3 - 2x^2 + 3x + 1$ for all x. Find the points on the graph of f at which the
tangent line is horizontal.
2. Let $f(x) = \frac{2}{3}x^3 + \frac{1}{2}x^2 - x - 1$ for all x. Find the points on the graph of f at which the slope
is: (a) 0; (b) -1; (c) 5.
3. Letf(x) = x + sin x for a11 x. Find a11 points x for which the graph offat (x,f(x)) has slope
zero.
4. Letf(x) = x2 + ax + b for a11 x. Find values of a and b such that the line y = 2x is tangent
to the graph off’at the point (2, 4).
5. Find values of the constants a, b, and c for which the graphs of the two polynomialsf(x) =
x2 + ax + b and g(x) = x3 - c Will intersect at the point (1, 2) and have the same tangent
line at that point.
6. Consider the graph of the function f’ defined by the equation f(x) = x2 + ax + b, where a
and b are constants.
(a) Find the slope of the chord joining the points on the graph for which x = x1 and x = x2.
(b) Find, in terms of x1 and x2 , a11 values of x for which the tangent line at (x,f(x)) has the
same slope as the chord in part (a).
Show that the line y = -x is tangent to the curve given by the equation y = x3 - 6x2 + 8x.
Find the point of tangency. Does this tangent line intersect the curve anywhere else?
Make a sketch of the graph of the cubic polynomialf(x) = x - x3 over the closed interval
-2 < x I 2. Find constants m and b such that the line y = mx + b Will be tangent to the
graph off at the point ( - l,O). A second line through (- 1,0) is also tangent to the graph off
at a point (a, c). Determine the coordinates a and c.
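9. A function f is defined as follows:
$$f(x) = \begin{cases} x^2 & \text{if } x \le c, \\ ax + b & \text{if } x > c, \end{cases} \qquad (a, b, c \text{ constants}).$$
Find values of a and b (in terms of c) such that f'(c) exists.
10. Solve Exercise 9 when f is defined as follows:
$$f(x) = \begin{cases} \dfrac{1}{|x|} & \text{if } |x| > c, \\ a + bx^2 & \text{if } |x| \le c. \end{cases}$$
11. Solve Exercise 9 when f is defined as follows:
$$f(x) = \begin{cases} \sin x & \text{if } x \le c, \\ ax + b & \text{if } x > c. \end{cases}$$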
12. If $f(x) = (1 - \sqrt{x})/(1 + \sqrt{x})$ for x > 0, find formulas for $Df(x)$, $D^2f(x)$, and $D^3f(x)$.
13. There is a polynomial $P(x) = ax^3 + bx^2 + cx + d$ such that P(0) = P(1) = -2, P'(0) = -1,
and P''(0) = 10. Compute a, b, c, d.
14. Two functions f and g have first and second derivatives at 0 and satisfy the relations
$$f(0) = 2/g(0), \qquad f'(0) = 2g'(0) = 4g(0), \qquad g''(0) = 5f''(0) = 6f(0) = 3.$$
(a) Let h(x) = f(x)/g(x), and compute h'(0).
(b) Let k(x) = f(x)g(x) sin x, and compute k'(0).
(c) Compute the limit of g'(x)/f'(x) as $x \to 0$.
15. Given that the derivative f'(a) exists, state which of the following statements are true and
which are false. Give a reason for your decision in each case.
(a) $f'(a) = \lim_{h \to a} \dfrac{f(h) - f(a)}{h - a}$.
(b) $f'(a) = \lim_{h \to 0} \dfrac{f(a) - f(a - h)}{h}$.
(c) $f'(a) = \lim_{t \to 0} \dfrac{f(a + 2t) - f(a)}{t}$.
(d) $f'(a) = \lim_{t \to 0} \dfrac{f(a + 2t) - f(a + t)}{2t}$.
16. Suppose that instead of the usual definition of the derivative $Df(x)$, we define a new kind of
derivative, $D^*f(x)$, by the formula
$$D^*f(x) = \lim_{h \to 0} \frac{f^2(x + h) - f^2(x)}{h},$$
where $f^2(x)$ means $[f(x)]^2$.
(a) Derive formulas for computing the derivative $D^*$ of a sum, difference, product, and
quotient.
(b) Express $D^*f(x)$ in terms of $Df(x)$.
(c) For what functions does $D^*f = Df$?
4.10 The chain rule for differentiating composite functions
With the differentiation formulas developed thus far, we can find derivatives of functions
for which f(x) is a finite sum of products or quotients of constant multiples of sin x,
cos x, and $x^r$ (r rational). As yet, however, we have not learned to deal with something
like $f(x) = \sin(x^2)$ without going back to the definition of derivative. In this section we
shall present a theorem, called the chain rule, that enables us to differentiate composite
functions such as $f(x) = \sin(x^2)$. This increases substantially the number of functions
that we can differentiate.
We recall that if u and v are functions such that the domain of u includes the range of v,
we can define the composite function $f = u \circ v$ by the equation
$$f(x) = u[v(x)].$$
The chain rule tells us how to express the derivative of f in terms of the derivatives u' and v'.
THEOREM 4.2. CHAIN RULE. Let f be the composition of two functions u and v, say
$f = u \circ v$. Suppose that both derivatives v'(x) and u'(y) exist, where y = v(x). Then the
derivative f'(x) also exists and is given by the formula
(4.11)    $f'(x) = u'(y) \cdot v'(x)$.
In other words, to compute the derivative of $u \circ v$ at x, we first compute the derivative of
u at the point y, where y = v(x), and multiply this by v'(x).
Before we discuss the proof of (4.11), we shall mention some alternative ways of expressing
the chain rule formula. If we write (4.11) entirely in terms of x, we obtain the formula
$$f'(x) = u'[v(x)] \cdot v'(x).$$
Expressed as an equation involving functions rather than numbers, the chain rule assumes
the following form:
$$(u \circ v)' = (u' \circ v) \cdot v'.$$
In the u(v)-notation, let us write u(v)' for the derivative of the composite function u(v) and
u'(v) for the composition $u' \circ v$. Then the last formula becomes
$$u(v)' = u'(v) \cdot v'.$$
Proof of Theorem 4.2. We turn now to the proof of (4.11). We assume that v has a
derivative at x and that u has a derivative at v(x), and we wish to prove that f has a derivative
at x given by the product $u'[v(x)] \cdot v'(x)$. The difference quotient for f is
(4.12)    $\frac{f(x + h) - f(x)}{h} = \frac{u[v(x + h)] - u[v(x)]}{h}$.
It is helpful at this stage to introduce some new notation. Let y = v(x) and let k =
v(x + h) - v(x). (It is important to realize that k depends on h.) Then we have
v(x + h) = y + k and (4.12) becomes
(4.13)    $\frac{f(x + h) - f(x)}{h} = \frac{u(y + k) - u(y)}{h}$.
The right-hand side of (4.13) resembles the difference quotient whose limit defines u'(y)
except that h appears in the denominator instead of k. If $k \ne 0$, it is easy to complete the
proof. We simply multiply numerator and denominator by k, and the right-hand side of
(4.13) becomes
(4.14)    $\frac{u(y + k) - u(y)}{k} \cdot \frac{k}{h} = \frac{u(y + k) - u(y)}{k} \cdot \frac{v(x + h) - v(x)}{h}$.
When $h \to 0$, the last quotient on the right tends to v'(x). Also, $k \to 0$ as $h \to 0$ because
k = v(x + h) - v(x) and v is continuous at x. Therefore the first quotient on the right
of (4.14) approaches u'(y) as $h \to 0$, and this leads at once to (4.11).
Although the foregoing argument seems to be the most natural way to proceed, it is not
completely general. Since k = v(x + h) - v(x), it may happen that k = 0 for infinitely
many values of h as $h \to 0$, in which case the passage from (4.13) to (4.14) is not valid.
To overcome this difficulty, a slight modification of the proof is needed.
Let us return to Equation (4.13) and express the quotient on the right in a form that
does not involve k in the denominator. For this purpose we introduce the difference
between the derivative u'(y) and the difference quotient whose limit is u'(y). That is, we
define a new function g as follows:
(4.15)    $g(t) = \frac{u(y + t) - u(y)}{t} - u'(y)$    if $t \ne 0$.
This equation defines g(t) only if $t \ne 0$. Multiplying by t and rearranging terms, we may
write (4.15) in the following form:
(4.16)    $u(y + t) - u(y) = t[g(t) + u'(y)]$.
Although (4.16) has been derived under the hypothesis that $t \ne 0$, it also holds for t = 0,
provided we assign some definite value to g(0). Since $g(t) \to 0$ as $t \to 0$, we shall define g(0)
to be 0. This will ensure the continuity of g at 0. If, now, we replace t in (4.16) by k, where
k = v(x + h) - v(x), and substitute the right-hand side of (4.16) in (4.13), we obtain
(4.17)    $\frac{f(x + h) - f(x)}{h} = \frac{k}{h}[g(k) + u'(y)]$,
a formula that is valid even if k = 0. When $h \to 0$ the quotient $k/h \to v'(x)$ and $g(k) \to 0$,
so the right-hand side of (4.17) approaches the limit $u'(y) \cdot v'(x)$. This completes the proof
of the chain rule.
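As an informal numerical illustration (not part of the proof), the chain rule formula can be compared with difference quotients for the composite function sin (x^2), taking v(x) = x^2 and u(y) = sin y. The following Python sketch is only a check under these assumptions; the function names are chosen for this sketch and do not come from the text.

```python
import math

def v(x):
    # Inner function v(x) = x^2
    return x * x

def u(y):
    # Outer function u(y) = sin y
    return math.sin(y)

def f(x):
    # Composite function f = u o v, that is, f(x) = sin(x^2)
    return u(v(x))

def chain_rule_derivative(x):
    # u'(y) * v'(x) with y = v(x); here u'(y) = cos y and v'(x) = 2x
    return math.cos(v(x)) * 2.0 * x

def difference_quotient(func, x, h):
    # The quotient [func(x + h) - func(x)] / h whose limit defines the derivative
    return (func(x + h) - func(x)) / h

x0 = 1.3
for h in (1e-2, 1e-4, 1e-6):
    # The quotients should approach the chain rule value as h shrinks
    print(h, difference_quotient(f, x0, h), chain_rule_derivative(x0))
```

The printed quotients approach the value given by (4.11) as h decreases, which is what the theorem predicts; agreement at a few points is of course no substitute for the proof.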
4.11 Applications of the chain rule. Related rates and implicit differentiation
The chain rule is an excellent example to illustrate the usefulness of the Leibniz notation
for derivatives. In fact, if we write (4.11) in the Leibniz notation, it assumes the appearance
of a trivial algebraic identity. First we introduce new symbols, say
$$y = v(x) \quad \text{and} \quad z = u(y).$$
Then we write dy/dx for the derivative v'(x), and dz/dy for u'(y). The formation of the
composite function is indicated by writing
$$z = u(y) = u[v(x)] = f(x),$$
and dz/dx is written for the derivative f'(x). The chain rule, as expressed in Equation
(4.11), now becomes
(4.18)    $\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}$.
The strong suggestive power of this formula is obvious. It is especially attractive to people
who use calculus in physical problems. For example, suppose the foregoing symbol z
represents a physical quantity measured in terms of other physical quantities x and y.
The equation z = f(x) tells us how to find z if x is given, and the equation z = u(y) tells
us how to find z if y is given. The relation between x and y is expressed by the equation
y = v(x). The chain rule, as expressed in (4.18), tells us that the rate of change of z with
respect to x is equal to the product of the rate of change of z with respect to y and the rate
of change of y with respect to x. The following example illustrates how the chain rule may
be used in a special physical problem.
EXAMPLE 1. Suppose a gas is pumped into a spherical balloon at a constant rate of 50
cubic centimeters per second. Assume that the gas pressure remains constant and that the
balloon always has a spherical shape. How fast is the radius of the balloon increasing
when the radius is 5 centimeters?
Solution. Let r denote the radius and V the volume of the balloon at time t. We are
given dV/dt, the rate of change of volume with respect to time, and we want to determine
dr/dt, the rate of change of the radius with respect to time, at the instant when r = 5. The
chain rule provides the connection between the given data and the unknown. It states that
(4.19)    $\frac{dV}{dt} = \frac{dV}{dr}\,\frac{dr}{dt}$.
To compute dV/dr, we use the formula $V = 4\pi r^3/3$ which expresses the volume of the sphere
in terms of its radius. Differentiation gives us $dV/dr = 4\pi r^2$, and hence (4.19) becomes
$$\frac{dV}{dt} = 4\pi r^2\,\frac{dr}{dt}.$$
Substituting dV/dt = 50 and r = 5, we obtain $dr/dt = 1/(2\pi)$. That is to say, the radius is
increasing at a rate of $1/(2\pi)$ centimeters per second at the instant when r = 5.
The foregoing example is called a problem in related rates. Note that it was not necessary
to express r as a function of t in order to determine the derivative dr/dt. It is this fact that
makes the chain rule especially useful in related-rate problems.
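The arithmetic of Example 1 can be retraced with a short Python sketch. It solves (4.19) for dr/dt and, as a cross-check, also differentiates r numerically after expressing r in terms of V, a step the example shows to be unnecessary. The function names and the choice of starting the clock at the instant r = 5 are assumptions made for this sketch.

```python
import math

def dV_dr(r):
    # Derivative of V = 4*pi*r^3/3 with respect to r
    return 4.0 * math.pi * r ** 2

def radius_rate(dV_dt, r):
    # Solve dV/dt = (dV/dr)(dr/dt) for dr/dt
    return dV_dt / dV_dr(r)

def radius_from_volume(V):
    # Invert V = 4*pi*r^3/3 (used only for the cross-check)
    return (3.0 * V / (4.0 * math.pi)) ** (1.0 / 3.0)

rate = radius_rate(50.0, 5.0)

# Cross-check: V(t) = V0 + 50 t, where V0 is the volume when r = 5
V0 = 4.0 * math.pi * 5.0 ** 3 / 3.0
h = 1e-4
numeric = (radius_from_volume(V0 + 50.0 * h) - radius_from_volume(V0 - 50.0 * h)) / (2.0 * h)

print(rate, 1.0 / (2.0 * math.pi), numeric)
```

All three printed numbers should agree closely with 1/(2π) centimeters per second.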
The next two examples show how the chain rule may be used to obtain new differentiation
formulas.
EXAMPLE 2. Given $f(x) = \sin(x^2)$, compute f'(x).
Solution. The function f is a composition, f(x) = u[v(x)], where $v(x) = x^2$ and u(x) =
sin x. To use the chain rule, we need to determine $u'[v(x)] = u'(x^2)$. Since u'(x) = cos x,
we have $u'(x^2) = \cos(x^2)$, and hence (4.11) gives us
$$f'(x) = \cos(x^2) \cdot v'(x) = \cos(x^2) \cdot 2x.$$
We may also solve the problem using the Leibniz notation. If we write $y = x^2$ and z = f(x),
then z = sin y and dz/dx = f'(x). The chain rule yields
$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} = (\cos y)(2x) = \cos(x^2) \cdot 2x,$$
which agrees with the foregoing result for f'(x).
EXAMPLE 3. If $f(x) = [v(x)]^n$, where n is a positive integer, compute f'(x) in terms of
v(x) and v'(x).
Solution. The function f is a composition, f(x) = u[v(x)], where $u(x) = x^n$. Since
$u'(x) = nx^{n-1}$, we have $u'[v(x)] = n[v(x)]^{n-1}$, and the chain rule yields
$$f'(x) = n[v(x)]^{n-1}v'(x).$$
If we omit the reference to x and write this as an equality involving functions, we obtain
the important formula
$$(v^n)' = nv^{n-1}v'$$
which tells us how to differentiate the nth power of v when v' exists. The formula is also
valid for rational powers if $v^n$ and $v^{n-1}$ are defined. To solve the problem in the Leibniz
notation, we write y = v(x) and z = f(x). Then $z = y^n$, dz/dx = f'(x), and the chain rule
gives us
$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} = ny^{n-1}v'(x) = n[v(x)]^{n-1}v'(x),$$
which agrees with the first solution.
EXAMPLE 4. The equation $x^2 + y^2 = r^2$ represents a circle of radius r and center at the
origin. If we solve this equation for y in terms of x, we obtain two solutions which serve
to define two functions f and g given on the interval [-r, r] by the formulas
$$f(x) = \sqrt{r^2 - x^2} \quad \text{and} \quad g(x) = -\sqrt{r^2 - x^2}.$$
(The graph of f is the upper semicircle and the graph of g the lower semicircle.) We may
compute the derivatives of f and g by the chain rule. For f we use the result of Example 3
with $v(x) = r^2 - x^2$ and $n = \tfrac{1}{2}$ to obtain
(4.20)    $f'(x) = \tfrac{1}{2}(r^2 - x^2)^{-1/2}(-2x) = \frac{-x}{\sqrt{r^2 - x^2}} = \frac{-x}{f(x)}$
whenever $f(x) \ne 0$. The same method, applied to g, gives us
(4.21)    $g'(x) = -\frac{-x}{\sqrt{r^2 - x^2}} = \frac{-x}{g(x)}$
whenever $g(x) \ne 0$. Notice that if we let y stand for either f(x) or g(x), then both formulas
(4.20) and (4.21) can be combined into one, namely,
(4.22)    $y' = \frac{-x}{y}$    if $y \ne 0$.
Another useful application of the chain rule has to do with a technique known as implicit
differentiation. We shall explain the method and illustrate its advantages by rederiving
the result of Example 4 in a simpler way.
EXAMPLE 5. Implicit differentiation. Formula (4.22) may be derived directly from the
equation $x^2 + y^2 = r^2$ without the necessity of solving for y. We remember that y is a
function of x [either y = f(x) or y = g(x)]. Assuming that y' exists, we differentiate both
sides of the equation $x^2 + y^2 = r^2$ to obtain
(4.23)    $2x + 2yy' = 0$.
(The term 2yy' comes from differentiating $y^2$ as explained in Example 3.) When Equation
(4.23) is solved for y' it yields (4.22).
The equation $x^2 + y^2 = r^2$ is said to define y implicitly as a function of x (it actually
defines two functions), and the process by which (4.23) is obtained from this equation is
called implicit differentiation. The end result is valid for either of the two functions f and g
so defined. Notice that at a point (x, y) on the circle with $x \ne 0$ and $y \ne 0$, the tangent
line has a slope -x/y, whereas the radius from the center to (x, y) has the slope y/x. The
product of the two slopes is -1, so the tangent is perpendicular to the radius.
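The result of Examples 4 and 5 lends itself to a quick numerical check. The Python sketch below compares the implicit formula y' = -x/y with a centered difference quotient on each semicircle, and confirms that the tangent slope times the radius slope is -1 at a sample point. The radius 5, the sample point x = 3, and all function names are choices made for this sketch.

```python
import math

def upper(x, r):
    # f(x) = sqrt(r^2 - x^2), the upper semicircle
    return math.sqrt(r * r - x * x)

def lower(x, r):
    # g(x) = -sqrt(r^2 - x^2), the lower semicircle
    return -math.sqrt(r * r - x * x)

def slope_implicit(x, y):
    # y' = -x / y, obtained by solving 2x + 2y y' = 0; requires y != 0
    if y == 0:
        raise ValueError("the formula y' = -x/y requires y != 0")
    return -x / y

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x)
    return (func(x + h) - func(x - h)) / (2.0 * h)

r, x0 = 5.0, 3.0
for branch in (upper, lower):
    y0 = branch(x0, r)
    numeric = central_difference(lambda t: branch(t, r), x0)
    print(branch.__name__, slope_implicit(x0, y0), numeric)

# Tangent slope times radius slope at (3, 4)
print(slope_implicit(x0, upper(x0, r)) * (upper(x0, r) / x0))
```

A single formula serves both branches, which is the practical advantage of implicit differentiation noted in the text.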
4.12 Exercises
In Exercises 1 through 14, determine the derivative f'(x). In each case it is understood that x is
restricted to those values for which the formula for f(x) is meaningful.
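1. $f(x) = \cos 2x - 2 \sin x$.
2. $f(x) = \sqrt{1 + x^2}$.
3. $f(x) = (2 - x^2) \cos x^2 + 2x \sin x^3$.
4. $f(x) = \sin(\cos^2 x) \cdot \cos(\sin^2 x)$.
5. $f(x) = \sin^n x \cdot \cos nx$.
6. $f(x) = \sin[\sin(\sin x)]$.
7. $f(x) = \dfrac{\sin^2 x}{\sin x^2}$.
8. $f(x) = \tan \dfrac{x}{2} - \cot \dfrac{x}{2}$.
9. $f(x) = \sec^2 x + \csc^2 x$.
10. $f(x) = x\sqrt{1 + x^2}$.
11. $f(x) = \dfrac{x}{\sqrt{4 - x^2}}$.
12. $f(x) = \left(\dfrac{1 + x^3}{1 - x^3}\right)^{1/3}$.
13. $f(x) = \dfrac{1}{\sqrt{1 + x^2}\,\bigl(x + \sqrt{1 + x^2}\bigr)}$.
14. $f(x) = \sqrt{x + \sqrt{x + \sqrt{x}}}$.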
15. Compute f'(x) if $f(x) = (1 + x)(2 + x^2)^{1/2}(3 + x^3)^{1/3}$, $x^3 \ne -3$.
16. Let $f(x) = \dfrac{1}{1 + 1/x}$ if $x \ne 0$, and let $g(x) = \dfrac{1}{1 + 1/f(x)}$. Compute f'(x) and g'(x).
17. The following table of values was computed for a pair of functions f and g and their derivatives f' and g'. Construct a corresponding table for the two composite functions h and k
given by h(x) = f[g(x)], k(x) = g[f(x)].
x      f(x)   f'(x)   g(x)   g'(x)
0      1      5       2      -5
1      3      -2      0      1
2      0      2       3      1
3      2      4       1      -6
18. A function f and its first two derivatives are tabulated as shown. Let $g(x) = xf(x^2)$ and make
a table of g and its first two derivatives for x = 0, 1, 2.
x      f(x)   f'(x)   f''(x)
0      0      1       2
1      1      1       1
2      3      2       1
4      6      3       0
19. Determine the derivative g'(x) in terms of f'(x) if:
(a) $g(x) = f(x^2)$;
(b) $g(x) = f(\sin^2 x) + f(\cos^2 x)$;
(c) $g(x) = f[f(x)]$;
(d) $g(x) = f\{f[f(x)]\}$.
Related rates and implicit differentiation.
20. Each edge of a cube is expanding at the rate of 1 centimeter (cm) per second. How fast is the
volume changing when the length of each edge is (a) 5 cm? (b) 10 cm? (c) x cm?
21. An airplane flies in level flight at constant velocity, eight miles above the ground. (In this
exercise assume the earth is flat.) The flight path passes directly over a point P on the ground.
The distance from the plane to P is decreasing at the rate of 4 miles per minute at the instant
when this distance is 10 miles. Compute the velocity of the plane in miles per hour.
22. A baseball diamond is a 90-foot square. A ball is batted along the third-base line at a constant
speed of 100 feet per second. How fast is its distance from first base changing when (a) it is
halfway to third base? (b) it reaches third base?
23. A boat sails parallel to a straight beach at a constant speed of 12 miles per hour, staying 4
miles offshore. How fast is it approaching a lighthouse on the shoreline at the instant it is
exactly 5 miles from the lighthouse?
24. A reservoir has the shape of a right-circular cone. The altitude is 10 feet, and the radius of the
base is 4 feet. Water is poured into the reservoir at a constant rate of 5 cubic feet per minute.
How fast is the water level rising when the depth of the water is 5 feet if (a) the vertex of the
cone is up? (b) the vertex of the cone is down?
25. A water tank has the shape of a right-circular cone with its vertex down. Its altitude is 10 feet
and the radius of the base is 15 feet. Water leaks out of the bottom at a constant rate of 1
cubic foot per second. Water is poured into the tank at a constant rate of c cubic feet per
second. Compute c so that the water level will be rising at the rate of 4 feet per second at the
instant when the water is 2 feet deep.
26. Water flows into a hemispherical tank of radius 10 feet (flat side up). At any instant, let h
denote the depth of the water, measured from the bottom, r the radius of the surface of the
water, and V the volume of the water in the tank. Compute dV/dh at the instant when h = 5
feet. If the water flows in at a constant rate of $5\sqrt{3}$ cubic feet per second, compute dr/dt,
the rate at which r is changing, at the instant t when h = 5 feet.
27. A variable right triangle ABC in the xy-plane has its right angle at vertex B, a fixed vertex
A at the origin, and the third vertex C restricted to lie on the parabola $y = 1 + \tfrac{7}{36}x^2$. The
point B starts at the point (0, 1) at time t = 0 and moves upward along the y-axis at a constant
velocity of 2 cm/sec. How fast is the area of the triangle increasing when t = 7/2 sec?
28. The radius of a right-circular cylinder increases at a constant rate. Its altitude is a linear
function of the radius and increases three times as fast as the radius. When the radius is 1
foot the altitude is 6 feet. When the radius is 6 feet, the volume is increasing at a rate of 1
cubic foot per second. When the radius is 36 feet, the volume is increasing at a rate of n cubic
feet per second, where n is an integer. Compute n.
29. A particle is constrained to move along a parabola whose equation is $y = x^2$. (a) At what
point on the curve are the abscissa and the ordinate changing at the same rate? (b) Find this
rate if the motion is such that at time t we have x = sin t and $y = \sin^2 t$.
30. The equation $x^3 + y^3 = 1$ defines y as one or more functions of x. (a) Assuming the derivative
y' exists, and without attempting to solve for y, show that y' satisfies the equation $x^2 + y^2y' = 0$.
(b) Assuming the second derivative y'' exists, show that $y'' = -2xy^{-5}$ whenever $y \ne 0$.
31. If 0 < x < 5, the equation $x^{1/2} + y^{1/2} = 5$ defines y as a function of x. Without solving for y,
show that the derivative y' has a fixed sign. (You may assume the existence of y'.)
32. The equation $3x^2 + 4y^2 = 12$ defines y implicitly as two functions of x if |x| < 2. Assuming
the second derivative y'' exists, show that it satisfies the equation $4y^3y'' = -9$.
33. The equation $x \sin xy + 2x^2 = 0$ defines y implicitly as a function of x. Assuming the derivative y' exists, show that it satisfies the equation $y'x^2 \cos xy + xy \cos xy + \sin xy + 4x = 0$.
34. If $y = x^r$, where r is a rational number, say r = m/n, then $y^n = x^m$. Assuming the existence
of the derivative y', derive the formula $y' = rx^{r-1}$ using implicit differentiation and the corresponding formula for integer exponents.
4.13 Applications of differentiation to extreme values of functions
Differentiation can be used to help locate maxima and minima of functions. Actually,
there are two different uses of the word “maximum” in calculus, and they are distinguished
by the two prefixes absolute and relative. The concept of absolute maximum was introduced
in Chapter 3. We recall that a real-valued function f is said to have an absolute maximum
on a set S if there is at least one point c in S such that
$$f(x) \le f(c) \quad \text{for all } x \text{ in } S.$$
The concept of relative maximum is defined as follows.
DEFINITION OF RELATIVE MAXIMUM. A function f, defined on a set S, is said to have a
relative maximum at a point c in S if there is some open interval I containing c such that
$$f(x) \le f(c) \quad \text{for all } x \text{ which lie in } I \cap S.$$
The concept of relative minimum is similarly defined by reversing the inequality.
In other words, a relative maximum at c is an absolute maximum in some neighborhood
of c, although this need not be an absolute maximum on the whole of S. Examples are
shown in Figure 4.7. Of course, every absolute maximum is, in particular, a relative
maximum.
FIGURE 4.7 Extrema of functions. Left panel: f(x) = sin x, $0 \le x \le \pi$. Right panel: $f(x) = x(1 - x)^2$, $-\tfrac{1}{2} \le x \le 2$. Absolute and relative maxima and minima are marked on the graphs.
DEFINITION OF EXTREMUM. A number which is either a relative maximum or a relative
minimum of a function f is called an extreme value or an extremum of f.
The next theorem, which is illustrated in Figure 4.7, relates extrema of a function to
horizontal tangents of its graph.
THEOREM 4.3. VANISHING OF THE DERIVATIVE AT AN INTERIOR EXTREMUM. Let f be
defined on an open interval I, and assume that f has a relative maximum or a relative minimum
at an interior point c of I. If the derivative f'(c) exists, then f'(c) = 0.
Proof. Define a function Q on I as follows:
$$Q(x) = \frac{f(x) - f(c)}{x - c} \quad \text{if } x \ne c, \qquad Q(c) = f'(c).$$
Since f'(c) exists, $Q(x) \to Q(c)$ as $x \to c$, so Q is continuous at c. We wish to prove that
Q(c) = 0. We shall do this by showing that each of the inequalities Q(c) > 0 and Q(c) < 0
leads to a contradiction.
Assume Q(c) > 0. By the sign-preserving property of continuous functions, there is an
interval about c in which Q(x) is positive. Therefore the numerator of the quotient Q(x)
has the same sign as the denominator for all $x \ne c$ in this interval. In other words,
f(x) > f(c) when x > c, and f(x) < f(c) when x < c. This contradicts the assumption
that f has an extremum at c. Hence, the inequality Q(c) > 0 is impossible. A similar
argument shows that we cannot have Q(c) < 0. Therefore Q(c) = 0, as asserted. Since
Q(c) = f’(c), this proves the theorem.
It is important to realize that a zero derivative at c does not imply an extremum at c.
For example, let $f(x) = x^3$. The graph of f is shown in Figure 4.8. Here $f'(x) = 3x^2$, so
f'(0) = 0. However, this function is increasing in every interval containing 0, so there is
no extremum at 0. This example shows that a zero derivative at c is not sufficient for an
extremum at c.
FIGURE 4.8 Here f'(0) equals 0 but there is no extremum at 0.
FIGURE 4.9 f(x) = |x|. There is an extremum at 0, but f'(0) does not exist.
Another example, f(x) = |x|, shows that a zero derivative does not always occur at an
extremum. Here there is a relative minimum at 0, as shown in Figure 4.9, but at the point
0 itself the graph has a sharp corner and there is no derivative. Theorem 4.3 assumes that
the derivative f'(c) exists at the extremum. In other words, Theorem 4.3 tells us that, in
the absence of sharp corners, the derivative must necessarily vanish at an extremum if this
extremum occurs in the interior of an interval.
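The two examples above can be examined numerically. The Python sketch below computes left-hand and right-hand difference quotients at 0 for $x^3$ and for |x|; the helper names and the step size are choices made for this sketch. For $x^3$ both quotients are small and of the same sign, consistent with f'(0) = 0, while the function values on either side of 0 show there is no extremum. For |x| the two quotients are -1 and +1, so no derivative exists at 0.

```python
def cube(x):
    # f(x) = x^3: zero derivative at 0, but no extremum there
    return x ** 3

def abs_value(x):
    # f(x) = |x|: relative minimum at 0, but no derivative there
    return abs(x)

def one_sided_quotients(func, c, h):
    # Difference quotients taken from the left and from the right of c
    right = (func(c + h) - func(c)) / h
    left = (func(c - h) - func(c)) / (-h)
    return left, right

h = 1e-3
print("x^3:", one_sided_quotients(cube, 0.0, h))       # both close to 0
print("|x|:", one_sided_quotients(abs_value, 0.0, h))  # -1 from the left, +1 from the right

# x^3 takes smaller values to the left of 0 and larger values to the right
print(cube(-h) < cube(0.0) < cube(h))
```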
In a later section we shall describe a test for extrema which is comprehensive enough to
include both the examples in Figure 4.7 and also the example in Figure 4.9. This test,
which is described in Theorem 4.8, tells us that an extremum always occurs at a point
where the derivative changes its sign. Although this fact may seem geometrically evident,
a proof is not easy to give with the materials developed thus far. We shall deduce this
result as a consequence of the mean-value theorem for derivatives which we discuss next.
4.14 The mean-value theorem for derivatives
The mean-value theorem for derivatives holds a position of importance in calculus
because many properties of functions can easily be deduced from it. Before we state the
mean-value theorem, we will examine one of its special cases from which the more general
theorem will be deduced. This special case was discovered in 1690 by Michel Rolle
(1652-1719), a French mathematician.
THEOREM 4.4. ROLLE’S THEOREM. Let f be a function which is continuous everywhere
on a closed interval [a, b] and has a derivative at each point of the open interval (a, b). Also,
assume that
$$f(a) = f(b).$$
Then there is at least one point c in the open interval (a, b) such that f'(c) = 0.
The geometric significance of Rolle’s theorem is illustrated in Figure 4.10. The theorem
simply asserts that the curve shown must have a horizontal tangent somewhere between
a and b.
FIGURE 4.10 Geometric interpretation of Rolle’s theorem.
FIGURE 4.11 Geometric significance of the mean-value theorem. Panels (a) and (b).
Proof. We assume that $f'(x) \ne 0$ for every x in the open interval (a, b), and we arrive
at a contradiction as follows: By the extreme-value theorem for continuous functions, f
must take on its absolute maximum M and its absolute minimum m somewhere in the
closed interval [a, b]. Theorem 4.3 tells us that neither extreme value can be taken at any
interior point (otherwise the derivative would vanish there). Hence, both extreme values
are taken on at the endpoints a and b. But since f(a) = f(b), this means that m = M, and
hence f is constant on [a, b]. This contradicts the fact that $f'(x) \ne 0$ for all x in (a, b). It
follows that f'(c) = 0 for at least one c satisfying a < c < b, which proves the theorem.
We can use Rolle’s theorem to prove the mean-value theorem. Before we state the
mean-value theorem, it may be helpful to examine its geometric significance. Each of the
curves shown in Figure 4.11 is the graph of a continuous function f with a tangent line
above each point of the open interval (a, b). At the point (c, f(c)) shown in Figure 4.11(a),
the tangent line is parallel to the chord AB. In Figure 4.11(b), there are two points where
the tangent line is parallel to the chord AB. The mean-value theorem guarantees that
there will be at least one point with this property.
To translate this geometric property into an analytic statement, we need only observe
that parallelism of two lines means equality of their slopes. Since the slope of the chord
AB is the quotient [f(b) - f(a)]/(b - a) and since the slope of the tangent line at c is the
derivative f'(c), the above assertion states that
(4.24)    $\frac{f(b) - f(a)}{b - a} = f'(c)$
for some c in the open interval (a, b).
To exhibit strong intuitive evidence for the truth of (4.24) we may think of f(t) as the
distance traveled by a moving particle at time t. Then the quotient on the left of (4.24)
represents the mean or average speed in the time interval [a, b], and the derivative f'(t)
represents the instantaneous speed at time t. The equation asserts that there must be
some moment when the instantaneous speed is equal to the average speed. For example,
if the average speed during an automobile trip is 45 mph, then the speedometer must
register 45 mph at least once during the trip.
The mean-value theorem may be stated formally as follows.
THEOREM 4.5. MEAN-VALUE THEOREM FOR DERIVATIVES. Assume that f is continuous
everywhere on a closed interval [a, b] and has a derivative at each point of the open interval
(a, b). Then there is at least one interior point c of (a, b) for which
(4.25)    $f(b) - f(a) = f'(c)(b - a)$.
Proof. To apply Rolle’s theorem we need a function which has equal values at the
endpoints a and b. To construct such a function, we modify f as follows. Let
$$h(x) = f(x)(b - a) - x[f(b) - f(a)].$$
Then h(a) = h(b) = bf(a) - af(b). Also, h is continuous on [a, b] and has a derivative
in the open interval (a, b). Applying Rolle’s theorem to h, we find that h'(c) = 0 for some
c in (a, b). But
$$h'(x) = f'(x)(b - a) - [f(b) - f(a)].$$
When x = c, this gives us Equation (4.25).
Notice that the theorem makes no assertion about the exact location of the one or more
“mean values” c, except to say that they all lie somewhere between a and b. For some
functions the position of the mean values may be specified exactly, but in most cases it is
very difficult to make an accurate determination of these points. Nevertheless, the real
usefulness of the theorem lies in the fact that many conclusions can be drawn from the
knowledge of the mere existence of at least one mean value.
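For a specific function, a mean value c can be approximated numerically. The following Python sketch uses bisection on f'(x) minus the slope of the chord. This is a method chosen for the sketch, not one described in the text, and it assumes that this difference has opposite signs at the two endpoints, which the theorem itself does not guarantee. The sample function $f(x) = x^3$ on [0, 2] is likewise an assumption; for it, $3c^2 = 4$ gives $c = 2/\sqrt{3}$ exactly.

```python
import math

def mean_value_point(f, fprime, a, b, tol=1e-12):
    # Approximate a point c in (a, b) with f'(c) = [f(b) - f(a)] / (b - a).
    # Assumes f'(x) - slope changes sign between a and b (not true in general).
    slope = (f(b) - f(a)) / (b - a)

    def phi(x):
        return fprime(x) - slope

    lo, hi = a, b
    if phi(lo) * phi(hi) >= 0:
        raise ValueError("bisection needs a sign change of f' - slope on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if phi(mid) == 0:
            return mid
        if phi(lo) * phi(mid) < 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

def cube(x):
    return x ** 3

def cube_prime(x):
    return 3.0 * x ** 2

c = mean_value_point(cube, cube_prime, 0.0, 2.0)
print(c, 2.0 / math.sqrt(3.0))
```

When the sign condition fails, the sketch raises an error rather than guessing; a mean value still exists by Theorem 4.5, but this simple search is not able to locate it.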
Note: It is important to realize that the conclusion of the mean-value theorem may fail
to hold if there is any point between a and b where the derivative does not exist. For example, the function f defined by the equation f(x) = |x| is continuous everywhere on the
real axis and has a derivative everywhere except at 0. Let A = (-1, f(-1)) and let B =
(2, f(2)). The slope of the chord joining A and B is
$$\frac{f(2) - f(-1)}{2 - (-1)} = \frac{2 - 1}{3} = \frac{1}{3},$$
but the derivative is nowhere equal to $\tfrac{1}{3}$.
The following extension of the mean-value theorem is often useful.
THEOREM 4.6. CAUCHY’S MEAN-VALUE FORMULA. Let f and g be two functions continuous on a closed interval [a, b] and having derivatives in the open interval (a, b). Then, for
some c in (a, b), we have
$$f'(c)[g(b) - g(a)] = g'(c)[f(b) - f(a)].$$
Proof. The proof is similar to that of Theorem 4.5. We let
$$h(x) = f(x)[g(b) - g(a)] - g(x)[f(b) - f(a)].$$
Then h(a) = h(b) = f(a)g(b) - g(a)f(b). Applying Rolle’s theorem to h, we find that
h'(c) = 0 for some c in (a, b). Computing h'(c) from the formula defining h, we obtain
Cauchy’s mean-value formula. Theorem 4.5 is the special case obtained by taking g(x) = x.
4.15 Exercises
1. Show that on the graph of any quadratic polynomial the chord joining the points for which
x = a and x = b is parallel to the tangent line at the midpoint x = (a + b)/2.
2. Use Rolle’s theorem to prove that, regardless of the value of b, there is at most one point x
in the interval $-1 \le x \le 1$ for which $x^3 - 3x + b = 0$.
3. Define a function f as follows:
$$f(x) = \frac{3 - x^2}{2} \quad \text{if } x \le 1, \qquad f(x) = \frac{1}{x} \quad \text{if } x \ge 1.$$
(a) Sketch the graph of f for x in the interval $0 \le x \le 2$.
(b) Show that f satisfies the conditions of the mean-value theorem over the interval [0, 2]
and determine all the mean values provided by the theorem.
4. Let $f(x) = 1 - x^{2/3}$. Show that f(1) = f(-1) = 0, but that f'(x) is never zero in the interval
[-1, 1]. Explain how this is possible, in view of Rolle’s theorem.
5. Show that $x^2 = x \sin x + \cos x$ for exactly two real values of x.
6. Show that the mean-value formula can be expressed in the form
$$f(x + h) = f(x) + hf'(x + \theta h) \quad \text{where } 0 < \theta < 1.$$
Determine $\theta$ in terms of x and h when (a) $f(x) = x^2$; (b) $f(x) = x^3$. Keep x fixed, $x \ne 0$, and
find the limit of $\theta$ in each case as $h \to 0$.
7. Let f be a polynomial. A real number $\alpha$ is said to be a zero of f of multiplicity m if $f(x) =
(x - \alpha)^m g(x)$, where $g(\alpha) \ne 0$.
(a) If f has r zeros in an interval [a, b], prove that f' has at least r - 1 zeros, and in general,
the kth derivative $f^{(k)}$ has at least r - k zeros in [a, b]. (The zeros are to be counted as often
as their multiplicity indicates.)
(b) If the kth derivative $f^{(k)}$ has exactly r zeros in [a, b], what can you conclude about the
number of zeros of f in [a, b]?
8. Use the mean-value theorem to deduce the following inequalities:
(a) $|\sin x - \sin y| \le |x - y|$.
(b) $ny^{n-1}(x - y) \le x^n - y^n \le nx^{n-1}(x - y)$ if $0 < y \le x$, n = 1, 2, 3, . . . .
9. A function f, continuous on [a, b], has a second derivative f'' everywhere on the open interval
(a, b). The line segment joining (a, f(a)) and (b, f(b)) intersects the graph of f at a third point
(c, f(c)), where a < c < b. Prove that f''(t) = 0 for at least one point t in (a, b).
10. This exercise outlines a proof of the intermediate-value theorem for derivatives. Assume f
has a derivative everywhere on an open interval I. Choose a < b in I. Then f' takes on every value
between f'(a) and f'(b) somewhere in (a, b).
(a) Define a new function g on [a, b] as follows:
$$g(x) = \frac{f(x) - f(a)}{x - a} \quad \text{if } x \ne a, \qquad g(a) = f'(a).$$
Prove that g takes on every value between f'(a) and g(b) in the open interval (a, b). Use the
mean-value theorem for derivatives to show that f' takes on every value between f'(a) and g(b)
in the open interval (a, b).
(b) Define a new function h on [a, b] as follows:
$$h(x) = \frac{f(x) - f(b)}{x - b} \quad \text{if } x \ne b, \qquad h(b) = f'(b).$$
By an argument similar to that in part (a), show that f' takes on every value between f'(b)
and h(a) in (a, b). Since h(a) = g(b), this proves the intermediate-value theorem for derivatives.
4.16 Applications of the mean-value theorem to geometric properties of functions
The mean-value theorem may be used to deduce properties of a function from a
knowledge of the algebraic sign of its derivative. This is illustrated by the following
theorem.
THEOREM 4.7. Let f be a function which is continuous on a closed interval [a, b] and assume f
has a derivative at each point of the open interval (a, b). Then we have:
(a) If $f'(x) > 0$ for every x in (a, b), f is strictly increasing on [a, b];
(b) If $f'(x) < 0$ for every x in (a, b), f is strictly decreasing on [a, b];
(c) If $f'(x) = 0$ for every x in (a, b), f is constant throughout [a, b].
Proof. To prove (a) we must show that $f(x) < f(y)$ whenever $a \le x < y \le b$. Therefore, suppose x < y and apply the mean-value theorem to the closed subinterval [x, y].
We obtain
(4.26) $f(y) - f(x) = f'(c)(y - x)$, where $x < c < y$.
Since both $f'(c)$ and $y - x$ are positive, so is $f(y) - f(x)$, and this means $f(x) < f(y)$, as
asserted. This proves (a), and the proof of (b) is similar. To prove (c), we use Equation
(4.26) with x = a. Since $f'(c) = 0$, we have $f(y) = f(a)$ for every y in [a, b], so f is constant
on [a, b].
We can use Theorem 4.7 to prove that an extremum occurs whenever the derivative
changes sign.
THEOREM 4.8. Assume f is continuous on a closed interval [a, b] and assume that the
derivative $f'$ exists everywhere in the open interval (a, b), except possibly at a point c.
(a) If $f'(x)$ is positive for all x < c and negative for all x > c, then f has a relative
maximum at c.
(b) If, on the other hand, $f'(x)$ is negative for all x < c and positive for all x > c, then f
has a relative minimum at c.
Proof. In case (a), Theorem 4.7(a) tells us that f is strictly increasing on [a, c] and
strictly decreasing on [c, b]. Hence $f(x) < f(c)$ for all $x \ne c$ in (a, b), so f has a relative
maximum at c. This proves (a) and the proof of (b) is entirely analogous. The two cases
are illustrated in Figure 4.12.
FIGURE 4.12 An extremum occurs when the derivative changes sign: (a) relative maximum at c; (b) relative minimum at c.
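The sign-change criterion of Theorem 4.8 can be tried out numerically. The following is only a rough sketch: it samples $f'$ at one point on each side of c, which suggests a sign change but does not prove one. The helper name `classify_by_sign_change`, the step `delta`, and the sample function $f(x) = x^3 - 3x$ are illustrative assumptions, not part of the text.

```python
def classify_by_sign_change(fprime, c, delta=1e-3):
    """Guess the nature of the point c from the sign of f' just left and right of c.

    delta is an illustrative step size; sampling two points is a heuristic,
    not a substitute for checking the sign of f' on whole intervals.
    """
    left = fprime(c - delta)
    right = fprime(c + delta)
    if left > 0 and right < 0:
        return "relative maximum"
    if left < 0 and right > 0:
        return "relative minimum"
    return "no conclusion"


# Hypothetical example: f(x) = x^3 - 3x has f'(x) = 3x^2 - 3, zero at x = 1 and x = -1.
def fprime(x):
    return 3 * x**2 - 3


print(classify_by_sign_change(fprime, 1.0))   # f' goes from negative to positive
print(classify_by_sign_change(fprime, -1.0))  # f' goes from positive to negative
```

For $f(x) = x^3$ the derivative $3x^2$ does not change sign at 0, and the sketch reports no conclusion there, in agreement with the theorem.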
4.17 Second-derivative test for extrema
If a function f is continuous on a closed interval [a, b], the extreme-value theorem tells
us that it has an absolute maximum and an absolute minimum somewhere in [a, b]. If f
has a derivative at each interior point, then the only places where extrema can occur are:
(1) at the endpoints a and b;
(2) at those interior points x where $f'(x) = 0$.
Points of type (2) are often called critical points of f. To decide whether there is a maximum
or a minimum (or neither) at a critical point c, we need more information about f. Usually
the behavior of f at a critical point can be determined from the algebraic sign of the
derivative near c. The next theorem shows that a study of the sign of the second derivative
near c can also be helpful.
THEOREM 4.9. SECOND-DERIVATIVE TEST FOR AN EXTREMUM AT A CRITICAL POINT. Let
c be a critical point of f in an open interval (a, b); that is, assume a < c < b and $f'(c) = 0$.
Assume also that the second derivative $f''$ exists in (a, b). Then we have the following:
(a) If $f''$ is negative in (a, b), f has a relative maximum at c.
(b) If $f''$ is positive in (a, b), f has a relative minimum at c.
The two cases are illustrated in Figure 4.12.
Proof. Consider case (a), $f'' < 0$ in (a, b). By Theorem 4.7 (applied to $f'$), the function
$f'$ is strictly decreasing in (a, b). But $f'(c) = 0$, so $f'$ changes its sign from positive to
negative at c, as shown in Figure 4.12(a). Hence, by Theorem 4.8, f has a relative maximum
at c. The proof in case (b) is entirely analogous.
If $f''$ is continuous at c, and if $f''(c) \ne 0$, there will be a neighborhood of c in which $f''$
has the same sign as $f''(c)$. Therefore, if $f'(c) = 0$, the function f has a relative maximum
at c if $f''(c)$ is negative, and a relative minimum if $f''(c)$ is positive. This test suffices for
many examples that occur in practice.
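The pointwise form of the test (the sign of $f''(c)$ at a critical point c, assuming $f''$ is continuous at c) is easy to mechanize. The sketch below is a minimal illustration; the function name `second_derivative_test` and the sample functions are assumptions introduced here, and the caller is trusted to supply a genuine critical point.

```python
def second_derivative_test(f2, c):
    """Apply the second-derivative test at a critical point c.

    f2 is the second derivative of f; c is assumed to satisfy f'(c) = 0
    (this is not checked here).
    """
    value = f2(c)
    if value < 0:
        return "relative maximum"
    if value > 0:
        return "relative minimum"
    return "inconclusive"


# Hypothetical example: f(x) = x^3 - 3x, with f'(x) = 3x^2 - 3 and f''(x) = 6x.
print(second_derivative_test(lambda x: 6 * x, 1.0))
print(second_derivative_test(lambda x: 6 * x, -1.0))

# For f(x) = x^4 we have f''(0) = 0, so the test gives no information at 0.
print(second_derivative_test(lambda x: 12 * x**2, 0.0))
```

When $f''(c) = 0$ the test is silent, and one falls back on the sign of $f'$ near c as in Theorem 4.8.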
The sign of the second derivative also governs the convexity or the concavity of f. The
next theorem shows that the function is convex in intervals where $f''$ is positive, as illustrated
by Figure 4.12(b). In Figure 4.12(a), f is concave because $f''$ is negative. It suffices to
discuss only the convex case, because if f is convex, then -f is concave.
THEOREM 4.10. DERIVATIVE TEST FOR CONVEXITY. Assume f is continuous on [a, b] and
has a derivative in the open interval (a, b). If $f'$ is increasing on (a, b), then f is convex on
[a, b]. In particular, f is convex if $f''$ exists and is nonnegative in (a, b).
Proof. Take x < y in [a, b] and let $z = \alpha y + (1 - \alpha)x$, where $0 < \alpha < 1$. We wish
to prove that $f(z) \le \alpha f(y) + (1 - \alpha) f(x)$. Since $f(z) = \alpha f(z) + (1 - \alpha) f(z)$, this is the
same as proving that
$(1 - \alpha)[f(z) - f(x)] \le \alpha[f(y) - f(z)]$.
By the mean-value theorem (applied twice), there exist points c and d satisfying x < c < z
and z < d < y such that
$f(z) - f(x) = f'(c)(z - x)$, and $f(y) - f(z) = f'(d)(y - z)$.
Since $f'$ is increasing, we have $f'(c) \le f'(d)$. Also, we have $(1 - \alpha)(z - x) = \alpha(y - z)$, so
we may write
$(1 - \alpha)[f(z) - f(x)] = (1 - \alpha) f'(c)(z - x) \le \alpha f'(d)(y - z) = \alpha[f(y) - f(z)]$,
which proves the required inequality for convexity.
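The convexity inequality $f(z) \le \alpha f(y) + (1 - \alpha) f(x)$ can be spot-checked on a grid of sample points. This is a sketch under stated assumptions: the helper `is_convex_on_samples`, the grid sizes, and the small tolerance are all introduced here for illustration, and passing the check on finitely many points is evidence, not a proof.

```python
import math


def is_convex_on_samples(f, a, b, n=25):
    """Check f(z) <= alpha*f(y) + (1 - alpha)*f(x) on a finite grid in [a, b].

    Returns False as soon as one sampled triple violates the inequality.
    The tolerance 1e-12 only absorbs floating-point rounding.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    alphas = [j / 10 for j in range(1, 10)]
    for x in xs:
        for y in xs:
            if x < y:
                for alpha in alphas:
                    z = alpha * y + (1 - alpha) * x
                    if f(z) > alpha * f(y) + (1 - alpha) * f(x) + 1e-12:
                        return False
    return True


# exp has a positive second derivative everywhere, so it should pass.
print(is_convex_on_samples(math.exp, -2.0, 2.0))
# sin has a negative second derivative on (0, pi), so it should fail.
print(is_convex_on_samples(math.sin, 0.0, math.pi))
```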
4.18 Curve sketching
The information gathered in the theorems of the last few sections is often useful in curve
sketching. In drawing the graph of a function f, we should first determine the domain of f
[the set of x for which f(x) is defined] and, if it is easy to do so, we should find the range
of f (the set of values taken on by f). A knowledge of the domain and range gives us an
idea of the extent of the curve y = f(x), since it specifies a portion of the xy-plane in which
the entire curve must lie. Then it is a good idea to try to locate those points (if any) where
the curve crosses the coordinate axes. These are called intercepts of the graph. The
y-intercept is simply the point (0, f(0)), assuming 0 is in the domain of f, and the x-intercepts
are those points (x, 0) for which f(x) = 0. Computing the x-intercepts may be extremely
difficult in practice, and we may have to be content with approximate values only.
We should also try to determine intervals in which f is monotonic by examining the sign
of $f'$, and to determine intervals of convexity and concavity by studying the sign of $f''$.
Special attention should be paid to those points where the graph has horizontal tangents.
EXAMPLE 1. The graph of y = f(x), where $f(x) = x + 1/x$ for $x \ne 0$.
In this case, there are no intercepts on either axis. The first two derivatives are given by
the formulas
$f'(x) = 1 - 1/x^2$, $f''(x) = 2/x^3$.
FIGURE 4.13 Graph of $f(x) = x + 1/x$.
FIGURE 4.14 Graph of $f(x) = 1/(x^2 + 1)$.
The first derivative is positive if $x^2 > 1$, negative if $x^2 < 1$, and zero if $x^2 = 1$. Hence
there is a relative minimum at x = 1 and a relative maximum at x = -1. For x > 0,
the second derivative is positive so the first derivative is strictly increasing. For x < 0, the
second derivative is negative, and therefore the first derivative is strictly decreasing. For
x near 0, the term x is small compared to 1/x, and the curve behaves like the curve y = 1/x.
(See Figure 4.13.) On the other hand, for very large x (positive or negative), the term 1/x
is small compared to x, and the curve behaves very much like the line y = x. In this
example, the function is odd, f(-x) = -f(x), so the graph is symmetric with respect to
the origin.
In the foregoing example, the line y = x is an asymptote of the curve. In general, a
nonvertical line with equation y = mx + b is called an asymptote of the graph of y = f(x)
if the difference f(x) - (mx + b) tends to 0 as x takes arbitrarily large positive values or
arbitrarily large negative values. A vertical line, x = a, is called a vertical asymptote if
|f(x)| takes arbitrarily large values as $x \to a$ from the right or from the left. In the foregoing
example, the y-axis is a vertical asymptote.
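Both kinds of asymptote in the example $f(x) = x + 1/x$ can be observed by tabulating a few values. The sketch below is purely illustrative; the sample points are arbitrary choices made here.

```python
def f(x):
    return x + 1 / x


# Oblique asymptote y = x: the gap f(x) - x equals 1/x and shrinks as |x| grows.
for x in (10.0, 100.0, 1000.0, -1000.0):
    print(x, f(x) - x)

# Vertical asymptote x = 0: |f(x)| grows without bound as x approaches 0 from the right.
for x in (0.1, 0.01, 0.001):
    print(x, abs(f(x)))
```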
EXAMPLE 2. The graph of y = f(x), where $f(x) = 1/(x^2 + 1)$.
This is an even function, positive for all x, and has the x-axis as a horizontal asymptote.
The first derivative is given by
$f'(x) = \dfrac{-2x}{(x^2 + 1)^2}$,
so $f'(x) < 0$ if x > 0, $f'(x) > 0$ if x < 0, and $f'(x) = 0$ when x = 0. Therefore the
function increases over the negative axis, decreases over the positive axis, and has a relative
maximum at x = 0. Differentiating once more, we find that
$f''(x) = \dfrac{(x^2 + 1)^2(-2) - (-2x) \cdot 2(x^2 + 1)(2x)}{(x^2 + 1)^4} = \dfrac{2(3x^2 - 1)}{(x^2 + 1)^3}$.
Thus $f''(x) > 0$ if $3x^2 > 1$, and $f''(x) < 0$ if $3x^2 < 1$. Hence, the first derivative increases
when $x^2 > \frac{1}{3}$ and decreases when $x^2 < \frac{1}{3}$. This information suffices to draw the curve in
Figure 4.14. The two points on the graph corresponding to $x^2 = \frac{1}{3}$, where the second
derivative changes its sign, are called points of inflection.
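As a sanity check on Example 2, the closed form $f''(x) = 2(3x^2 - 1)/(x^2 + 1)^3$ can be compared with a second difference of f, and its sign can be sampled on either side of $x = 1/\sqrt{3}$. This is a sketch only; the helper `second_difference` and the step size h are assumptions introduced here.

```python
import math


def f(x):
    return 1 / (x**2 + 1)


def f2(x):
    """Closed form of the second derivative from Example 2."""
    return 2 * (3 * x**2 - 1) / (x**2 + 1) ** 3


def second_difference(g, x, h=1e-3):
    """Central second difference, a numerical stand-in for g''(x)."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / h**2


x0 = 1 / math.sqrt(3)          # where 3x^2 = 1
print(f2(x0 - 0.01) < 0)       # concave just inside
print(f2(x0 + 0.01) > 0)       # convex just outside
print(abs(second_difference(f, 0.5) - f2(0.5)))  # small discrepancy
```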
4.19 Exercises
In the following exercises, (a) find all points x such that $f'(x) = 0$; (b) examine the sign of $f'$
and determine those intervals in which f is monotonic; (c) examine the sign of $f''$ and determine
those intervals in which $f'$ is monotonic; (d) make a sketch of the graph of f. In each case, the
function is defined for all x for which the given formula for f(x) is meaningful.
1. $f(x) = x^2 - 3x + 2$.
2. $f(x) = x^3 - 4x$.
3. $f(x) = (x - 1)^2(x + 2)$.
4. $f(x) = x^3 - 6x^2 + 9x + 5$.
5. $f(x) = 2 + (x - 1)^4$.
6. $f(x) = 1/x^2$.
7. $f(x) = x + 1/x^2$.
8. $f(x) = \dfrac{1}{(x - 1)(x - 3)}$.
9. $f(x) = x/(1 + x^2)$.
10. $f(x) = (x^2 - 4)/(x^2 - 9)$.
11. $f(x) = \sin^2 x$.
12. $f(x) = x - \sin x$.
13. $f(x) = x + \cos x$.
14. $f(x) = \frac{1}{6}x^2 + \frac{1}{12}\cos 2x$.
4.20 Worked examples of extremum problems
Many extremum problems in both pure and applied mathematics can be attacked
systematically with the use of differential calculus. As a matter of fact, the rudiments of
differential calculus were first developed when Fermat tried to find general methods for
determining maxima and minima. We shall solve a few examples in this section and give
the reader an opportunity to solve others in the next set of exercises.
First we formulate two simple principles which can be used to solve many extremum
problems.
EXAMPLE 1. Constant-sum, maximum-product principle. Given a positive number S.
Prove that among all choices of positive numbers x and y with x + y = S, the product xy
is largest when $x = y = \frac{1}{2}S$.
Proof. If x + y = S, then y = S - x and the product xy is equal to $x(S - x) = xS - x^2$.
Let $f(x) = xS - x^2$. This quadratic polynomial has first derivative $f'(x) = S - 2x$
which is positive for $x < \frac{1}{2}S$ and negative for $x > \frac{1}{2}S$. Hence the maximum of
xy occurs when $x = \frac{1}{2}S$, $y = S - x = \frac{1}{2}S$. This can also be proved without the use of
calculus. We simply write $f(x) = \frac{1}{4}S^2 - (x - \frac{1}{2}S)^2$ and note that f(x) is largest when
$x = \frac{1}{2}S$.
EXAMPLE 2. Constant-product, minimum-sum principle. Given a positive number P.
Prove that among all choices of positive numbers x and y with xy = P, the sum x + y is
smallest when $x = y = \sqrt{P}$.
Proof. We must determine the minimum of the function $f(x) = x + P/x$ for x > 0.
The first derivative is $f'(x) = 1 - P/x^2$. This is negative for $x^2 < P$ and positive for
$x^2 > P$, so f(x) has its minimum at $x = \sqrt{P}$. Hence, the sum x + y is smallest when
$x = y = \sqrt{P}$.
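The two principles of Examples 1 and 2 can be illustrated by a crude search over sample points. The sketch below is not a proof; the values S = 10 and P = 16, the grids, and the variable names are arbitrary choices made here for illustration.

```python
# Constant sum S: maximize the product x*(S - x) over a grid of positive x.
S = 10.0
candidates = [S * i / 1000 for i in range(1, 1000)]
best_x = max(candidates, key=lambda x: x * (S - x))
print(best_x, best_x * (S - best_x))       # expected near x = S/2

# Constant product P: minimize the sum x + P/x over a grid of positive x.
P = 16.0
candidates = [i / 100 for i in range(1, 1001)]
best_sum_x = min(candidates, key=lambda x: x + P / x)
print(best_sum_x, best_sum_x + P / best_sum_x)  # expected near x = sqrt(P)
```

In both searches the best grid point agrees with the value predicted by the calculus argument, $\frac{1}{2}S$ in the first case and $\sqrt{P}$ in the second.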
EXAMPLE 3. Among all rectangles of given perimeter, the square has the largest area.
Proof. We use the result of Example 1. Let x and y denote the sides of a general
rectangle. If the perimeter is fixed, then x + y is constant, so the area xy has its largest
value when x = y. Hence, the maximizing rectangle is a square.
EXAMPLE 4. The geometric mean of two positive numbers does
not exceed their arithmetic mean. That is, $\sqrt{ab} \le \frac{1}{2}(a + b)$.
Proof. Given a > 0, b > 0, let P = ab. Among all positive x and y with xy = P, the
sum x + y is smallest when $x = y = \sqrt{P}$. In other words, if xy = P, then
$x + y \ge \sqrt{P} + \sqrt{P} = 2\sqrt{P}$. In particular, $a + b \ge 2\sqrt{P} = 2\sqrt{ab}$, so $\sqrt{ab} \le \frac{1}{2}(a + b)$. Equality
occurs if and only if a = b.
EXAMPLE 5. A block of weight W is to be moved along a flat table by a force inclined
at an angle $\theta$ with the line of motion, where $0 \le \theta \le \frac{1}{2}\pi$, as shown in Figure 4.15. Assume
the motion is resisted by a frictional force which is proportional to the normal force with
which the block presses perpendicularly against the surface of the table. Find the angle $\theta$
for which the propelling force needed to overcome friction will be as small as possible.
Solution. Let $F(\theta)$ denote the propelling force. It has an upward vertical component
$F(\theta) \sin \theta$, so the net normal force pressing against the table is $N = W - F(\theta) \sin \theta$. The
frictional force is $\mu N$, where $\mu$ (the Greek letter mu) is a constant called the coefficient of
friction. The horizontal component of the propelling force is $F(\theta) \cos \theta$. When this is
equated to the frictional force, we get $F(\theta) \cos \theta = \mu[W - F(\theta) \sin \theta]$ from which we find
$F(\theta) = \dfrac{\mu W}{\cos \theta + \mu \sin \theta}$.
To minimize $F(\theta)$, we maximize the denominator $g(\theta) = \cos \theta + \mu \sin \theta$ in the interval
$0 \le \theta \le \frac{1}{2}\pi$. At the endpoints, we have $g(0) = 1$ and $g(\frac{1}{2}\pi) = \mu$. In the interior of the
interval, we have
$g'(\theta) = -\sin \theta + \mu \cos \theta$,
so g has a critical point at $\theta = \alpha$, where $\sin \alpha = \mu \cos \alpha$. This gives
$g(\alpha) = \cos \alpha + \mu^2 \cos \alpha = (1 + \mu^2) \cos \alpha$. We can express $\cos \alpha$ in terms of $\mu$.
Since $\mu^2 \cos^2 \alpha = \sin^2 \alpha = 1 - \cos^2 \alpha$, we find $(1 + \mu^2) \cos^2 \alpha = 1$, so
$\cos \alpha = 1/\sqrt{1 + \mu^2}$. Thus $g(\alpha) = \sqrt{1 + \mu^2}$.
FIGURE 4.15 Example 5. The propelling force $F(\theta)$ has horizontal component $F(\theta) \cos \theta$; the normal force is $N = W - F(\theta) \sin \theta$.
FIGURE 4.16 Example 6.
Since $g(\alpha)$ exceeds $g(0)$ and $g(\frac{1}{2}\pi)$, the maximum of g occurs at the critical point. Hence the
minimum force required is
$F(\alpha) = \dfrac{\mu W}{g(\alpha)} = \dfrac{\mu W}{\sqrt{1 + \mu^2}}$.
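The conclusion of Example 5 can be compared with a brute-force search over angles. In this sketch the values of $\mu$ and W, the grid of 1001 angles, and the function name `propelling_force` are illustrative assumptions; the formula $F(\theta) = \mu W/(\cos \theta + \mu \sin \theta)$ and the critical angle $\alpha$ with $\tan \alpha = \mu$ come from the solution above.

```python
import math


def propelling_force(theta, mu, weight):
    """F(theta) = mu*W / (cos(theta) + mu*sin(theta)), as derived in Example 5."""
    return mu * weight / (math.cos(theta) + mu * math.sin(theta))


mu, weight = 0.5, 100.0                      # illustrative values only
alpha = math.atan(mu)                        # critical angle: sin(alpha) = mu*cos(alpha)
closed_form = mu * weight / math.sqrt(1 + mu**2)

# Search a uniform grid on [0, pi/2] for the angle giving the smallest force.
thetas = [i * (math.pi / 2) / 1000 for i in range(1001)]
best_theta = min(thetas, key=lambda t: propelling_force(t, mu, weight))

print(alpha, best_theta)                                 # should nearly agree
print(closed_form, propelling_force(alpha, mu, weight))  # should nearly agree
```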
EXAMPLE 6. Find the shortest distance from a given point (0, b) on the y-axis to the
parabola $x^2 = 4y$. (The number b may have any real value.)
Solution. The parabola is shown in Figure 4.16. The quantity to be minimized is the
distance d, where
$d = \sqrt{x^2 + (y - b)^2}$,
subject to the restriction $x^2 = 4y$. It is clear from the figure that when b is negative the
minimum distance is |b|. As the point (0, b) moves upward along the positive y-axis,
the minimum is b until the point reaches a certain special position, above which the
minimum is $< b$. The exact location of this special position will now be determined.
First of all, we observe that the point (x, y) that minimizes d also minimizes $d^2$. (This
observation enables us to avoid differentiation of square roots.) At this stage, we may
express $d^2$ in terms of x alone or else in terms of y alone. We shall express $d^2$ in terms of
y and leave it as an exercise for the reader to carry out the calculations when $d^2$ is expressed
in terms of x.
Therefore the function f to be minimized is given by the formula
$f(y) = d^2 = 4y + (y - b)^2$.
Although f(y) is defined for all real y, the nature of the problem requires that we seek the
minimum only among those $y \ge 0$. The derivative, given by $f'(y) = 4 + 2(y - b)$, is zero
only when y = b - 2. When b < 2, this leads to a negative critical point y which is
excluded by the restriction $y \ge 0$. In other words, if b < 2, the minimum does not occur
at a critical point. In fact, when b < 2, we see that $f'(y) > 0$ when $y \ge 0$, and hence
f is strictly increasing for $y \ge 0$. Therefore the absolute minimum occurs at the endpoint
y = 0. The corresponding minimum d is $\sqrt{b^2} = |b|$.
If $b \ge 2$, there is a legitimate critical point at y = b - 2. Since $f''(y) = 2$ for all y,
the derivative $f'$ is increasing, and hence the absolute minimum of f occurs at this critical
point. The minimum d is $\sqrt{4(b - 2) + 4} = 2\sqrt{b - 1}$. Thus we have shown that the
minimum distance is |b| if b < 2 and is $2\sqrt{b - 1}$ if $b \ge 2$. (The value b = 2 is the special
value referred to above.)
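The piecewise answer of Example 6 can be checked against a direct search over points of the parabola. This is a rough sketch: the helper names, the search range `y_max`, and the grid size are assumptions made here, and the search minimizes $d^2 = 4y + (y - b)^2$ over $y \ge 0$ exactly as in the solution.

```python
import math


def min_distance_formula(b):
    """Minimum distance from (0, b) to the parabola x^2 = 4y, per Example 6."""
    return abs(b) if b < 2 else 2 * math.sqrt(b - 1)


def min_distance_numeric(b, y_max=20.0, n=50000):
    """Approximate the same minimum by sampling y in [0, y_max] on a uniform grid."""
    best_d2 = min(
        4 * y + (y - b) ** 2
        for y in (y_max * i / n for i in range(n + 1))
    )
    return math.sqrt(best_d2)


for b in (-3.0, 0.0, 1.0, 2.0, 5.0, 10.0):
    print(b, min_distance_formula(b), min_distance_numeric(b))
```

The two columns should agree to several decimal places, with the switch between the two formulas occurring at b = 2.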
4.21 Exercises
1. Prove that among all rectangles of a given area, the square has the smallest perimeter.
2. A farmer has L feet of fencing to enclose a rectangular pasture adjacent to a long stone wall.
What dimensions give the maximum area of the pasture?
3. A farmer wishes to enclose a rectangular pasture of area A adjacent to a long stone wall. What
dimensions require the least amount of fencing?
4. Given S > 0. Prove that among all positive numbers x and y with x + y = S, the sum
$x^2 + y^2$ is smallest when x = y.
5. Given R > 0. Prove that among all positive numbers x and y with $x^2 + y^2 = R$, the sum
x + y is largest when x = y.
6. Each edge of a square has length L. Prove that among all squares inscribed in the given
square, the one of minimum area has edges of length $\frac{1}{2}L\sqrt{2}$.
7. Each edge of a square has length L. Find the size of the square of largest area that can be
circumscribed about the given square.
8. Prove that among all rectangles that can be inscribed in a given circle, the square has the
largest area.
9. Prove that among all rectangles of a given area, the square has the smallest circumscribed
circle.
10. Given a sphere of radius R. Find the radius r and altitude h of the right circular cylinder with
largest lateral surface area $2\pi rh$ that can be inscribed in the sphere.
11. Among all right circular cylinders of given lateral surface area, prove that the smallest circumscribed sphere has radius $\sqrt{2}$ times that of the cylinder.
12. Given a right circular cone with radius R and altitude H. Find the radius and altitude of the
right circular cylinder of largest lateral surface area that can be inscribed in the cone.
13. Find the dimensions of the right circular cylinder of maximum volume that can be inscribed in
a right circular cone of radius R and altitude H.
14. Given a sphere of radius R. Compute, in terms of R, the radius r and the altitude h of the
right circular cone of maximum volume that can be inscribed in this sphere.
15. Find the rectangle of largest area that can be inscribed in a semicircle, the lower base being on
the diameter.
16. Find the trapezoid of largest area that can be inscribed in a semicircle, the lower base being on
the diameter.
17. An open box is made from a rectangular piece of material by removing equal squares at each
corner and turning up the sides. Find the dimensions of the box of largest volume that can
be made in this manner if the material has sides (a) 10 and 10; (b) 12 and 18.
18. If a and b are the legs of a right triangle whose hypotenuse is 1, find the largest value of 2a + b.
19. A truck is to be driven 300 miles on a freeway at a constant speed of x miles per hour. Speed
laws require $30 \le x \le 60$. Assume that fuel costs 30 cents per gallon and is consumed at the
rate of $2 + x^2/600$ gallons per hour. If the driver's wages are D dollars per hour and if he
obeys all speed laws, find the most economical speed and the cost of the trip if (a) D = 0,
(b) D = 1, (c) D = 2, (d) D = 3, (e) D = 4.
20. A cylinder is obtained by revolving a rectangle about the x-axis, the base of the rectangle
lying on the x-axis and the entire rectangle lying in the region between the curve $y = x/(x^2 + 1)$
and the x-axis. Find the maximum possible volume of the cylinder.
21. The lower right-hand corner of a page is folded over so as to reach the leftmost edge. (See
Figure 4.17.) If the width of the page is six inches, find the minimum length of the crease.
What angle will this minimal crease make with the rightmost edge of the page? Assume the
page is long enough to prevent the crease reaching the top of the page.
FIGURE 4.17 Exercise 21.
FIGURE 4.18 Exercise 22.
22. (a) An isosceles triangle is inscribed in a circle of radius r as shown in Figure 4.18. If the
angle $2\alpha$ at the apex is restricted to lie between 0 and $\frac{1}{2}\pi$, find the largest value and the smallest
value of the perimeter of the triangle. Give full details of your reasoning.
(b) What is the radius of the smallest circular disk large enough to cover every isosceles
triangle of a given perimeter L? Give full details of your reasoning.
23. A window is to be made in the form of a rectangle surmounted by a semicircle with diameter
equal to the base of the rectangle. The rectangular portion is to be of clear glass, and the
semicircular portion is to be of a colored glass admitting only half as much light per square
foot as the clear glass. The total perimeter of the window frame is to be a fixed length P. Find,
in terms of P, the dimensions of the window which will admit the most light.
24. A log 12 feet long has the shape of a frustum of a right circular cone with diameters 4 feet and
(4 + h) feet at its ends, where $h \ge 0$. Determine, as a function of h, the volume of the largest
right circular cylinder that can be cut from the log, if its axis coincides with that of the log.
25. Given n real numbers $a_1, \ldots, a_n$. Prove that the sum $\sum_{k=1}^{n} (x - a_k)^2$ is smallest when x is
the arithmetic mean of $a_1, \ldots, a_n$.
26. If x > 0, let $f(x) = 5x^2 + Ax^{-5}$, where A is a positive constant. Find the smallest A such that
$f(x) \ge 24$ for all x > 0.
27. For each real t, let $f(x) = -\frac{1}{3}x^3 + t^2 x$, and let m(t) denote the minimum of f(x) over the
interval $0 \le x \le 1$. Determine the value of m(t) for each t in the interval $-1 \le t \le 1$.
Remember that for some values of t the minimum of f(x) may occur at the endpoints of the
interval $0 \le x \le 1$.
28. A number x is known to lie in an interval $a \le x \le b$, where a > 0. We wish to approximate
x by another number t in [a, b] so that the relative error, $|t - x|/x$, will be as small as possible.
Let M(t) denote the maximum value of $|t - x|/x$ as x varies from a to b. (a) Prove that this
maximum occurs at one of the endpoints x = a or x = b. (b) Prove that M(t) is smallest when
t is the harmonic mean of a and b, that is, when $1/t = \frac{1}{2}(1/a + 1/b)$.
*4.22 Partial derivatives
This section explains the concept of partial derivative and introduces the reader to some
notation and terminology. We shall not make use of the results of this section anywhere
else in Volume I, so this material may be omitted or postponed without loss in continuity.
In Chapter 1, a function was defined to be a correspondence which associates with each
object in a set X one and only one object in another set Y; the set X is referred to as the
domain of the function. Up to now, we have dealt with functions having a domain consisting
of points on the x-axis. Such functions are usually called functions of one real variable. It
is not difficult to extend many of the ideas of calculus to functions of two or more real
variables.
By a real-valued function of two real variables we mean one whose domain X is a set of
points in the xy-plane. If f denotes such a function, its value at a point (x, y) is a real
number, written f(x, y). It is easy to imagine how such a function might arise in a physical
problem. For example, suppose a flat metal plate in the shape of a circular disk of radius
4 centimeters is placed on the xy-plane, with the center of the disk at the origin and with
the disk heated in such a way that its temperature at each point (x, y) is $16 - x^2 - y^2$
degrees centigrade. If we denote the temperature at (x, y) by f(x, y), then f is a function
of two variables defined by the equation
(4.27) $f(x, y) = 16 - x^2 - y^2$.
The domain of this function is the set of all points (x, y) whose distance from the origin
does not exceed 4. The theorem of Pythagoras tells us that all points (x, y) at a distance r
from the origin satisfy the equation
(4.28) $x^2 + y^2 = r^2$.
Therefore the domain in this case consists of all points (x, y) which satisfy the inequality
$x^2 + y^2 \le 16$. Note that on the circle described by (4.28), the temperature is
$f(x, y) = 16 - r^2$. That is, the function f is constant on each circle with center at the origin. (See
Figure 4.19.)
We shall describe two useful methods for obtaining a geometric picture of a function of
two variables. One is by means of a surface in space. To construct this surface, we introduce
a third coordinate axis (called the z-axis); it passes through the origin and is perpendicular
to the xy-plane. Above each point (x, y) we plot the point (x, y, z) whose z-coordinate is
obtained from the equation z = f(x, y).
FIGURE 4.19 The temperature is constant on each circle with center at the origin.
FIGURE 4.20 The surface represented by the equation $z = 16 - x^2 - y^2$.
The surface for the example described above is shown in Figure 4.20. If we placed a
thermometer at a point (x, y) on the plate, the top of the mercury column would just touch
the surface at the point (x, y, z) where z = f(x, y) provided, of course, that unit distances
on the z-axis are properly chosen.
A different kind of picture of a function of two variables can be drawn entirely in the
xy-plane. This is the method of contour lines that is used by map makers to represent a
three-dimensional landscape by a two-dimensional drawing. We imagine that the surface
described above has been cut by various horizontal planes (parallel to the xy-plane). They
intersect the surface at those points (x, y, z) whose elevation z is constant. By projecting
these points on the xy-plane, we get a family of contour lines or level curves. Each level
curve consists of those and only those points (x, y) whose coordinates satisfy the equation
f(x, y) = c, where c is the constant elevation for that particular curve. In the example
mentioned above, the level curves are concentric circles, and they represent curves of
constant temperature, or isothermals, as might be drawn on a weather map. Another
example of a surface and its level curves is shown in Figure 4.21. The equation in this case
is z = xy. The "saddle-shaped" surface is known as a hyperbolic paraboloid.
FIGURE 4.21 (a) A surface whose equation is z = xy. (b) The corresponding level curves xy = constant.
Contour lines on topographic maps are often shown for every 100 ft of elevation. When
they are close together, the elevation is changing rapidly as we move from one contour to
the next; this happens in the vicinity of a steep mountain. When the contour lines are far
apart the elevation is changing slowly. We can get a general idea of the steepness of a
landscape by considering the spacing of its level curves. However, to get precise information
concerning the rate of change of the elevation, we must describe the surface in terms of a
function to which we can apply the ideas of differential calculus.
FIGURE 4.22 The curve of intersection of a surface z = f(x, y) and a plane $y = y_0$.
The rate at which the elevation is changing at a point $(x_0, y_0)$ depends on the direction
in which we move away from this point. For the sake of simplicity, we shall consider at
this time just the two special directions, parallel to the x- and y-axes. Suppose we examine
a surface described by an equation of the form z = f(x, y); let us cut this surface with a
plane perpendicular to the y-axis, as shown in Figure 4.22. Such a plane consists of all
points (x, y, z) in space for which the y-coordinate is constant, say $y = y_0$. (The equation
$y = y_0$ is called an equation of this plane.) The intersection of this plane with the surface
is a plane curve, all points of which satisfy the equation $z = f(x, y_0)$. On this curve the
elevation $f(x, y_0)$ is a function of x alone.
Suppose now we move from a point $(x_0, y_0)$ to a point $(x_0 + h, y_0)$. The corresponding
change in elevation is $f(x_0 + h, y_0) - f(x_0, y_0)$. This suggests that we form the difference
quotient
(4.29) $\dfrac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}$
and let $h \to 0$. If this quotient approaches a definite limit as $h \to 0$, we call this limit the
partial derivative of f with respect to x at $(x_0, y_0)$. There are various symbols that are used
to denote partial derivatives, some of the most common ones being
$\dfrac{\partial f(x_0, y_0)}{\partial x}$, $f'_x(x_0, y_0)$, $f_x(x_0, y_0)$, $f_1(x_0, y_0)$, $D_1 f(x_0, y_0)$.
The subscript 1 in the last two notations refers to the fact that only the first coordinate is
allowed to change when we form the difference quotient in (4.29). Thus we have
$f_1(x_0, y_0) = \lim_{h \to 0} \dfrac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}$.
Similarly, we define the partial derivative with respect to y at $(x_0, y_0)$ by the equation
$f_2(x_0, y_0) = \lim_{k \to 0} \dfrac{f(x_0, y_0 + k) - f(x_0, y_0)}{k}$,
alternative notations being
$\dfrac{\partial f(x_0, y_0)}{\partial y}$, $f'_y(x_0, y_0)$, $f_y(x_0, y_0)$, $D_2 f(x_0, y_0)$.
If we write z = f(x, y), then $\partial z/\partial x$ and $\partial z/\partial y$ are also used to denote partial derivatives.
Partial differentiation is not a new concept. If we introduce another function g of one
variable, defined by the equation
$g(x) = f(x, y_0)$,
then the ordinary derivative $g'(x_0)$ is exactly the same as the partial derivative $f_1(x_0, y_0)$.
Geometrically, the partial derivative $f_1(x, y_0)$ represents the slope of the tangent line at a
typical point of the curve shown in Figure 4.22. In the same way, when x is constant, say
$x = x_0$, the equation $z = f(x_0, y)$ describes the curve of intersection of the surface with
the plane whose equation is $x = x_0$. The partial derivative $f_2(x_0, y)$ gives the slope of the
line tangent to this curve. From these remarks we see that to compute the partial derivative
of f(x, y) with respect to x, we can treat y as though it were constant and use the ordinary
rules of differential calculus. Thus, for example, if $f(x, y) = 16 - x^2 - y^2$, we get
$f_1(x, y) = -2x$. Similarly, if we hold x fixed, we find $f_2(x, y) = -2y$.
Another example is the function given by
(4.30) $f(x, y) = x \sin y + y^2 \cos xy$.
Its partial derivatives are
$f_1(x, y) = \sin y - y^3 \sin xy$,
$f_2(x, y) = x \cos y - xy^2 \sin xy + 2y \cos xy$.
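The partial derivatives of the function in (4.30) can be compared with difference quotients. The sketch below uses a symmetric (central) quotient rather than the one-sided quotient of (4.29), because it is more accurate numerically; that substitution, the helper names, the step sizes, and the sample point are all assumptions made here.

```python
import math


def f(x, y):
    """The function of (4.30)."""
    return x * math.sin(y) + y**2 * math.cos(x * y)


def f1(x, y):
    """Partial derivative with respect to x, as stated in the text."""
    return math.sin(y) - y**3 * math.sin(x * y)


def f2(x, y):
    """Partial derivative with respect to y, as stated in the text."""
    return x * math.cos(y) - x * y**2 * math.sin(x * y) + 2 * y * math.cos(x * y)


def partial_x(g, x, y, h=1e-6):
    """Central difference quotient in x, holding y fixed."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def partial_y(g, x, y, k=1e-6):
    """Central difference quotient in y, holding x fixed."""
    return (g(x, y + k) - g(x, y - k)) / (2 * k)


x0, y0 = 0.7, 1.3  # arbitrary sample point
print(f1(x0, y0), partial_x(f, x0, y0))
print(f2(x0, y0), partial_y(f, x0, y0))
```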
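Partial differentiation is a process which produces new functions $f_1 = \partial f/\partial x$ and
$f_2 = \partial f/\partial y$ from a given function f. Since $f_1$ and $f_2$ are also functions of two variables, we
can consider their partial derivatives. These are called second-order partial derivatives of
f, denoted as follows:
$f_{1,1} = \dfrac{\partial^2 f}{\partial x^2}$, $f_{1,2} = \dfrac{\partial^2 f}{\partial y\,\partial x}$, $f_{2,1} = \dfrac{\partial^2 f}{\partial x\,\partial y}$, $f_{2,2} = \dfrac{\partial^2 f}{\partial y^2}$.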
Notice that $f_{1,2}$ means $(f_1)_2$, the partial derivative of $f_1$ with respect to y.
In the $\partial$-notation, we indicate the order of derivatives by writing
$\dfrac{\partial^2 f}{\partial y\,\partial x} = \dfrac{\partial}{\partial y}\left(\dfrac{\partial f}{\partial x}\right)$.
This does not always yield the same result as the other mixed partial derivative,
$\dfrac{\partial^2 f}{\partial x\,\partial y} = \dfrac{\partial}{\partial x}\left(\dfrac{\partial f}{\partial y}\right)$.
However, equality of the two mixed partial derivatives does hold under certain conditions
that are usually satisfied by most functions that occur in practice. We shall discuss these
conditions further in Volume II.
Referring to the example in (4.27), we find that its second-order partial derivatives are
given by the following formulas:
$f_{1,1}(x, y) = -2$, $f_{1,2}(x, y) = f_{2,1}(x, y) = 0$, $f_{2,2}(x, y) = -2$.
For the example in (4.30), we obtain
$f_{1,1}(x, y) = -y^4 \cos xy$,
$f_{1,2}(x, y) = \cos y - xy^3 \cos xy - 3y^2 \sin xy$,
$f_{2,1}(x, y) = \cos y - xy^3 \cos xy - y^2 \sin xy - 2y^2 \sin xy = f_{1,2}(x, y)$,
$f_{2,2}(x, y) = -x \sin y - x^2 y^2 \cos xy - 2xy \sin xy - 2xy \sin xy + 2 \cos xy$
$= -x \sin y - x^2 y^2 \cos xy - 4xy \sin xy + 2 \cos xy$.
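The second-order formulas for (4.30), including the equality of the two mixed partial derivatives, can be checked by differencing the first-order partials. As before, this is only a sketch: the central difference helpers `d_dx` and `d_dy`, the step size, and the sample point are assumptions introduced here.

```python
import math


def f1(x, y):
    return math.sin(y) - y**3 * math.sin(x * y)


def f2(x, y):
    return x * math.cos(y) - x * y**2 * math.sin(x * y) + 2 * y * math.cos(x * y)


def f11(x, y):
    return -(y**4) * math.cos(x * y)


def f12(x, y):
    return math.cos(y) - x * y**3 * math.cos(x * y) - 3 * y**2 * math.sin(x * y)


def f22(x, y):
    return (-x * math.sin(y) - x**2 * y**2 * math.cos(x * y)
            - 4 * x * y * math.sin(x * y) + 2 * math.cos(x * y))


def d_dx(g, x, y, h=1e-6):
    """Central difference quotient in x."""
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def d_dy(g, x, y, k=1e-6):
    """Central difference quotient in y."""
    return (g(x, y + k) - g(x, y - k)) / (2 * k)


x0, y0 = 0.7, 1.3  # arbitrary sample point
print(abs(d_dx(f1, x0, y0) - f11(x0, y0)))  # f_{1,1}
print(abs(d_dy(f1, x0, y0) - f12(x0, y0)))  # f_{1,2}: differentiate f_1 in y
print(abs(d_dx(f2, x0, y0) - f12(x0, y0)))  # f_{2,1}: should match f_{1,2}
print(abs(d_dy(f2, x0, y0) - f22(x0, y0)))  # f_{2,2}
```

All four printed discrepancies should be tiny, which is consistent with $f_{2,1} = f_{1,2}$ for this function.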
A more detailed study of partial derivatives will be undertaken in Volume II.
*4.23 Exercises
In Exercises 1 through 8, compute all first- and second-order partial derivatives. In each case
verify that the mixed partial derivatives $f_{1,2}(x, y)$ and $f_{2,1}(x, y)$ are equal.
1. $f(x, y) = x^4 + y^4 - 4x^2 y^2$.
2. $f(x, y) = x \sin (x + y)$.
3. $f(x, y) = xy + \dfrac{x}{y}$ $(y \ne 0)$.
4. $f(x, y) = \sqrt{x^2 + y^2}$.
5. $f(x, y) = \sin (x^2 y^3)$.
6. $f(x, y) = \sin [\cos (2x - 3y)]$.
7. $f(x, y) = \dfrac{x + y}{x - y}$ $(x \ne y)$.
8. $f(x, y) = \dfrac{x}{\sqrt{x^2 + y^2}}$ $((x, y) \ne (0, 0))$.
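9. Show that $x(\partial z/\partial x) + y(\partial z/\partial y) = 2z$ if (a) $z = (x - 2y)^2$, (b) $z = (x^4 + y^4)^{1/2}$.
10. If $f(x, y) = xy/(x^2 + y^2)^2$ for $(x, y) \ne (0, 0)$, show that
$\dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} = 0$.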
THE RELATION BETWEEN INTEGRATION
AND DIFFERENTIATION
5.1 The derivative of an indefinite integral. The first fundamental theorem of calculus
We come now to the remarkable connection that exists between integration and
differentiation. The relationship between these two processes is somewhat analogous to
that which holds between “squaring” and “taking the square root.” If we square a positive
number and then take the positive square root of the result, we get the original number
back again. Similarly, if we operate on a continuous function f by integration, we get a
new function (an indefinite integral of f) which, when differentiated, leads back to the
original function f. For example, if $f(x) = x^2$, then an indefinite integral A of f may be
defined by the equation
$$A(x) = \int_c^x f(t)\,dt = \int_c^x t^2\,dt = \frac{x^3}{3} - \frac{c^3}{3},$$
where c is a constant. Differentiating, we find $A'(x) = x^2 = f(x)$. This example illustrates
a general result, called the first fundamental theorem of calculus, which may be stated as
follows:
THEOREM 5.1. FIRST FUNDAMENTAL THEOREM OF CALCULUS. Let f be a function that is
integrable on [a, x] for each x in [a, b]. Let c be such that $a \le c \le b$ and define a new
function A as follows:
$$A(x) = \int_c^x f(t)\,dt \qquad \text{if } a \le x \le b.$$
Then the derivative A'(x) exists at each point x in the open interval (a, b) where f is continuous,
and for such x we have
$$A'(x) = f(x). \tag{5.1}$$
First we give a geometric argument which suggests why the theorem ought to be true;
then we give an analytic proof.
Geometric motivation. Figure 5.1 shows the graph of a function f over an interval [a, b].
In the figure, h is positive and
$$\int_x^{x+h} f(t)\,dt = \int_c^{x+h} f(t)\,dt - \int_c^x f(t)\,dt = A(x + h) - A(x).$$
The example shown is continuous throughout the interval [x, x + h]. Therefore, by the
mean-value theorem for integrals, we have
$$A(x + h) - A(x) = h f(z), \qquad \text{where } x \le z \le x + h.$$
Hence we have
$$\frac{A(x + h) - A(x)}{h} = f(z), \tag{5.2}$$
and, since $x \le z \le x + h$, we find that $f(z) \to f(x)$ as $h \to 0$ through positive values. A
similar argument is valid if $h \to 0$ through negative values. Therefore, A'(x) exists and is
equal to f(x).
[FIGURE 5.1 Geometric motivation for the first fundamental theorem of calculus.]
This argument assumes that the function f is continuous in some neighborhood of the
point x. However, the hypothesis of the theorem refers only to continuity of f at a single
point x. Therefore, we use a different method to prove the theorem under this weaker
hypothesis.
Analytic Proof. Let x be a point of continuity of f, keep x fixed, and form the quotient
$$\frac{A(x + h) - A(x)}{h}.$$
To prove the theorem we must show that this quotient approaches the limit f(x) as $h \to 0$.
The numerator is
$$A(x + h) - A(x) = \int_c^{x+h} f(t)\,dt - \int_c^x f(t)\,dt = \int_x^{x+h} f(t)\,dt.$$
If we write $f(t) = f(x) + [f(t) - f(x)]$ in the last integral, we obtain
$$A(x + h) - A(x) = \int_x^{x+h} f(x)\,dt + \int_x^{x+h} [f(t) - f(x)]\,dt = h f(x) + \int_x^{x+h} [f(t) - f(x)]\,dt,$$
from which we find
$$\frac{A(x + h) - A(x)}{h} = f(x) + \frac{1}{h}\int_x^{x+h} [f(t) - f(x)]\,dt. \tag{5.3}$$
Therefore, to complete the proof of (5.1), all we need to do is show that
$$\lim_{h \to 0} \frac{1}{h}\int_x^{x+h} [f(t) - f(x)]\,dt = 0.$$
It is this part of the proof that makes use of the continuity of f at x.
Let us denote the second term on the right of (5.3) by G(h). We are to prove that
$G(h) \to 0$ as $h \to 0$. Using the definition of limit, we must show that for every $\epsilon > 0$ there
is a $\delta > 0$ such that
$$|G(h)| < \epsilon \qquad \text{whenever } 0 < |h| < \delta. \tag{5.4}$$
Continuity of f at x tells us that, if $\epsilon$ is given, there is a positive $\delta$ such that
$$|f(t) - f(x)| < \tfrac{1}{2}\epsilon \tag{5.5}$$
whenever
$$x - \delta < t < x + \delta. \tag{5.6}$$
If we choose h so that $0 < h < \delta$, then every t in the interval [x, x + h] satisfies (5.6) and
hence (5.5) holds for every such t. Using the property $\left|\int_x^{x+h} g(t)\,dt\right| \le \int_x^{x+h} |g(t)|\,dt$ with
$g(t) = f(t) - f(x)$, we see that the inequality in (5.5) leads to the relation
$$\left|\int_x^{x+h} [f(t) - f(x)]\,dt\right| \le \int_x^{x+h} |f(t) - f(x)|\,dt \le \int_x^{x+h} \tfrac{1}{2}\epsilon\,dt = \tfrac{1}{2}h\epsilon < h\epsilon.$$
If we divide by h, we see that (5.4) holds for $0 < h < \delta$. If $h < 0$, a similar argument
proves that (5.4) holds whenever $0 < |h| < \delta$, and this completes the proof.
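The statement A'(x) = f(x) can be illustrated numerically for the example f(t) = t^2 used at the start of this section. The sketch below is only an illustration under stated assumptions: the indefinite integral is approximated with a composite Simpson rule (a helper introduced here, not a method from the text), and c = 1 is an arbitrary choice of lower limit.

```python
def simpson(func, a, b, n=200):
    # Composite Simpson rule with an even number n of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def f(t):
    return t * t  # the example f(t) = t^2

def A(x, c=1.0):
    # Indefinite integral A(x) = integral of f from c to x (c = 1 is an arbitrary choice)
    return simpson(f, c, x)

def difference_quotient(x, h):
    # The quotient [A(x + h) - A(x)] / h formed in the analytic proof
    return (A(x + h) - A(x)) / h

x = 2.0
for h in (1e-1, 1e-2, 1e-3):
    # The gap between the quotient and f(x) should shrink as h decreases
    print(h, difference_quotient(x, h) - f(x))
```

For this particular f the gap equals xh + h^2/3, so it shrinks roughly in proportion to h.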
5.2 The zero-derivative theorem
If a function f is constant on an open interval (a, b), its derivative is zero everywhere on
(a, b). We proved this fact earlier as an immediate consequence of the definition of
derivative. We also proved, as part (c) of Theorem 4.7, the converse of this statement
which we restate here as a separate theorem.
THEOREM 5.2. ZERO-DERIVATIVE THEOREM. If f'(x) = 0 for each x in an open interval
I, then f is constant on I.
This theorem, when used in combination with the first fundamental theorem of calculus,
leads to the second fundamental theorem which is described in the next section.
5.3 Primitive functions and the second fundamental theorem of calculus
DEFINITION OF PRIMITIVE FUNCTION. A function P is called a primitive (or an antiderivative)
of a function f on an open interval I if the derivative of P is f, that is, if P'(x) = f(x) for all
x in I.
For example, the sine function is a primitive of the cosine on every interval because the
derivative of the sine is the cosine. We speak of a primitive, rather than the primitive,
because if P is a primitive of f then so is P + k for every constant k. Conversely, any two
primitives P and Q of the same function f can differ only by a constant because their
difference P - Q has the derivative
$$P'(x) - Q'(x) = f(x) - f(x) = 0$$
for every x in I and hence, by Theorem 5.2, P - Q is constant on I.
The first fundamental theorem of calculus tells us that we can always construct a primitive
of a continuous function by integration. When we combine this with the fact that two
primitives of the same function can differ only by a constant, we obtain the second
fundamental theorem of calculus.
THEOREM 5.3. SECOND FUNDAMENTAL THEOREM OF CALCULUS. Assume f is continuous
on an open interval I, and let P be any primitive of f on I. Then, for each c and each x in I,
we have
$$P(x) = P(c) + \int_c^x f(t)\,dt. \tag{5.7}$$
Proof. Let $A(x) = \int_c^x f(t)\,dt$. Since f is continuous at each x in I, the first fundamental
theorem tells us that A'(x) = f(x) for all x in I. In other words, A is a primitive of f on I.
Since two primitives of f can differ only by a constant, we must have A(x) - P(x) = k
for some constant k. When x = c, this formula implies -P(c) = k, since A(c) = 0.
Therefore, A(x) - P(x) = -P(c), from which we obtain (5.7).
Theorem 5.3 tells us how to find every primitive P of a continuous function f. We simply
integrate f from a fixed point c to an arbitrary point x and add the constant P(c) to get P(x).
But the real power of the theorem becomes apparent when we write Equation (5.7) in the
following form:
$$\int_c^x f(t)\,dt = P(x) - P(c). \tag{5.8}$$
In this form it tells us that we can compute the value of an integral by a mere subtraction
if we know a primitive P. The problem of evaluating an integral is transferred to another
problem, that of finding a primitive P of f. In actual practice, the second problem is a
great deal easier to deal with than the first. Every differentiation formula, when read in
reverse, gives us an example of a primitive of some function f and this, in turn, leads to an
integration formula for this function.
From the differentiation formulas worked out thus far we can derive the following
integration formulas as consequences of the second fundamental theorem.
EXAMPLE 1. Integration of rational powers. The integration formula
$$\int_a^b x^n\,dx = \frac{b^{n+1} - a^{n+1}}{n + 1} \qquad (n = 0, 1, 2, \ldots) \tag{5.9}$$
was proved in Section 1.23 directly from the definition of the integral. The result may be
rederived and generalized to rational exponents by using the second fundamental theorem.
First of all, we observe that the function P defined by the equation
$$P(x) = \frac{x^{n+1}}{n + 1} \tag{5.10}$$
has the derivative $P'(x) = x^n$ if n is any nonnegative integer. Since this is valid for all real
x, we may use (5.8) to write
$$\int_a^b x^n\,dx = P(b) - P(a) = \frac{b^{n+1} - a^{n+1}}{n + 1}$$
for all intervals [a, b]. This formula, proved for all integers $n \ge 0$, also holds for all negative
integers except n = -1, which is excluded because n + 1 appears in the denominator. To
prove (5.9) for negative n, it suffices to show that (5.10) implies $P'(x) = x^n$ when n is negative
and $\ne -1$, a fact which is easily verified by differentiating P as a rational function. Of
course, when n is negative, neither P(x) nor P'(x) is defined for x = 0, and when we use
(5.9) for negative n, it is important to exclude those intervals [a, b] that contain the point
x = 0.
The results of Example 3 in Section 4.5 enable us to extend (5.9) to all rational exponents
(except -1), provided the integrand is defined everywhere on the interval [a, b] under
consideration. For example, if 0 < a < b and $n = -\tfrac{1}{2}$, we find
$$\int_a^b \frac{dx}{\sqrt{x}} = \int_a^b x^{-1/2}\,dx = \left.\frac{x^{1/2}}{1/2}\right|_a^b = 2(\sqrt{b} - \sqrt{a}).$$
This result was proved earlier, using the area axioms. The present proof makes no use of
these axioms.
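Formula (5.9) with a rational exponent can be compared against a direct numerical approximation of the integral. The following is a minimal sketch, assuming a composite Simpson rule as the numerical integrator and the interval [1, 4] as a sample; both are illustrative choices rather than anything prescribed by the text.

```python
import math

def simpson(func, a, b, n=200):
    # Composite Simpson rule with an even number n of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def power_integral(a, b, exponent):
    # Formula (5.9), used here for a rational exponent different from -1
    return (b ** (exponent + 1) - a ** (exponent + 1)) / (exponent + 1)

a, b = 1.0, 4.0          # sample interval with 0 < a < b
exponent = -0.5
numeric = simpson(lambda x: x ** exponent, a, b)
closed = power_integral(a, b, exponent)
print(numeric, closed, 2 * (math.sqrt(b) - math.sqrt(a)))
```

The closed form requires only two evaluations of the primitive, which is the practical content of Equation (5.8).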
In the next chapter we shall define a general power function f such that $f(x) = x^c$ for
every real exponent c. We shall find that this function has the derivative $f'(x) = cx^{c-1}$ and
the primitive $P(x) = x^{c+1}/(c + 1)$ if $c \ne -1$. This will enable us to extend (5.9) to all real
exponents except -1.
Note that we cannot get P'(x) = 1/x by differentiation of any function of the form
$P(x) = x^n$. Nevertheless, there exists a function P whose derivative is P'(x) = 1/x. To
exhibit such a function all we need to do is write a suitable indefinite integral; for example,
$$P(x) = \int_1^x \frac{1}{t}\,dt \qquad \text{if } x > 0.$$
This integral exists because the integrand is monotonic. The function so defined is called
the logarithm (more specifically, the natural logarithm). Its properties are developed
systematically in Chapter 6.
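The integral defining this function can be approximated directly and compared with a library logarithm. This is a rough sketch: the Simpson helper and the name `L` are introduced here for illustration, and `math.log` is assumed to be the natural logarithm of the Python standard library.

```python
import math

def simpson(func, a, b, n=2000):
    # Composite Simpson rule with an even number n of subintervals;
    # when b < a the step is negative and the result changes sign accordingly
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def L(x):
    # Approximates the integral of 1/t from 1 to x; only meaningful for x > 0
    if x <= 0:
        raise ValueError("the integral is considered only for x > 0")
    return simpson(lambda t: 1.0 / t, 1.0, x)

for x in (0.5, 2.0, 10.0):
    print(x, L(x), math.log(x))
```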
EXAMPLE 2. Integration of the sine and cosine. Since the derivative of the sine is the
cosine and the derivative of the cosine is minus the sine, the second fundamental theorem
also gives us the following formulas:
$$\int_a^b \cos x\,dx = \sin x\,\Big|_a^b = \sin b - \sin a,$$
$$\int_a^b \sin x\,dx = (-\cos x)\,\Big|_a^b = \cos a - \cos b.$$
These formulas were also proved in Chapter 2 directly from the definition of the integral.
Further examples of integration formulas can be obtained from Examples 1 and 2 by
taking finite sums of terms of the form $Ax^n$, $B \sin x$, $C \cos x$, where A, B, C are constants.
5.4 Properties of a function deduced from properties of its derivative
If a function f has a continuous derivative f' on an open interval I, the second fundamental
theorem states that
$$f(x) = f(c) + \int_c^x f'(t)\,dt \tag{5.11}$$
for every choice of points x and c in I. This formula, which expresses f in terms of its
derivative f', enables us to deduce properties of a function from properties of its derivative.
Although the following properties have already been discussed in Chapter 4, it may be of
interest to see how they can also be deduced as simple consequences of Equation (5.11).
Suppose f' is continuous and nonnegative on I. If x > c, then $\int_c^x f'(t)\,dt \ge 0$, and hence
$f(x) \ge f(c)$. In other words, if the derivative is continuous and nonnegative on I, the
function is increasing on I.
In Theorem 2.9 we proved that the indefinite integral of an increasing function is convex.
Therefore, if f' is continuous and increasing on I, Equation (5.11) shows that f is convex on
I. Similarly, f is concave on those intervals where f' is continuous and decreasing.
5.5 Exercises
In each of Exercises 1 through 10, find a primitive of f; that is, find a function P such that
P'(x) = f(x) and use the second fundamental theorem to evaluate $\int_a^b f(x)\,dx$.
1. $f(x) = 5x^3$.
2. $f(x) = 4x^4 - 12x$.
3. $f(x) = (x + 1)(x^3 - 2)$.
4. $f(x) = \dfrac{x^4 + x - 3}{x^3}$,  $x \ne 0$.
5. $f(x) = (1 + \sqrt{x})^2$,  $x > 0$.
6. $f(x) = \sqrt{2x} + \sqrt{\tfrac{1}{2}x}$,  $x > 0$.
7. $f(x) = \dfrac{2x^2 - 6x + 7}{2\sqrt{x}}$,  $x > 0$.
8. $f(x) = 2x^{1/3} - x^{-1/3}$,  $x > 0$.
9. $f(x) = 3 \sin x + 2x^5$.
10. $f(x) = x^{4/3} - 5 \cos x$.
11. Prove that there is no polynomial f whose derivative is given by the formula f'(x) = 1/x.
12. Show that $\int_0^x |t|\,dt = \tfrac{1}{2}x|x|$ for all real x.
13. Show that
$$\int_0^x (t + |t|)^2\,dt = \frac{2x^2}{3}(x + |x|) \qquad \text{for all real } x.$$
14. A function f is continuous everywhere and satisfies the equation
$$\int_0^x f(t)\,dt = -\tfrac{1}{2} + x^2 + x \sin 2x + \tfrac{1}{2} \cos 2x$$
for all x. Compute $f(\tfrac{1}{4}\pi)$ and $f'(\tfrac{1}{4}\pi)$.
15. Find a function f and a value of the constant c such that
$$\int_c^x f(t)\,dt = \cos x - \tfrac{1}{2} \qquad \text{for all real } x.$$
16. Find a function f and a value of the constant c such that
$$\int_c^x t f(t)\,dt = \sin x - x \cos x - \tfrac{1}{2}x^2 \qquad \text{for all real } x.$$
17. There is a function f, defined and continuous for all real x, which satisfies an equation of the
form
$$\int_0^x f(t)\,dt = \int_x^1 t^2 f(t)\,dt + \frac{x^{16}}{8} + \frac{x^{18}}{9} + c,$$
where c is a constant. Find an explicit formula for f(x) and find the value of the constant c.
18. A function f is defined for all real x by the formula
$$f(x) = 3 + \int_0^x \frac{1 + \sin t}{2 + t^2}\,dt.$$
Without attempting to evaluate this integral, find a quadratic polynomial $p(x) = a + bx + cx^2$
such that p(0) = f(0), p'(0) = f'(0), and p''(0) = f''(0).
19. Given a function g, continuous everywhere, such that g(1) = 5 and $\int_0^1 g(t)\,dt = 2$.
Let $f(x) = \tfrac{1}{2}\int_0^x (x - t)^2 g(t)\,dt$. Prove that
$$f'(x) = x\int_0^x g(t)\,dt - \int_0^x t\,g(t)\,dt,$$
then compute f''(1) and f'''(1).
20. Without attempting to evaluate the following indefinite integrals, find the derivative f'(x) in
each case if f(x) is equal to
(a) $\int_0^x (1 + t^2)^{-3}\,dt$,
(b) $\int_0^{x^2} (1 + t^2)^{-3}\,dt$,
(c) $\int_{x^3}^{x^2} (1 + t^2)^{-3}\,dt$.
21. Without attempting to evaluate the integral, compute f'(x) if f is defined by the formula
$$f(x) = \int_{x^3}^{x^2} \frac{t^6}{1 + t^4}\,dt.$$
22. In each case, compute f(2) if f is continuous and satisfies the given formula for all $x \ge 0$.
(a) $\int_0^x f(t)\,dt = x^2(1 + x)$.
(b) $\int_0^{x^2} f(t)\,dt = x^2(1 + x)$.
(c) $\int_0^{f(x)} t^2\,dt = x^2(1 + x)$.
(d) $\int_0^{x^2(1+x)} f(t)\,dt = x$.
23. The base of a solid is the ordinate set of a nonnegative function f over the interval [0, a]. All
cross sections perpendicular to this interval are squares. The volume of the solid is
$$a^3 - 2a \cos a + (2 - a^2) \sin a$$
for every $a \ge 0$. Assume f is continuous on [0, a] and calculate f(a).
24. A mechanism propels a particle along a straight line. It is designed SO that the displacement
of the particle at time t from an initial point 0 on the line is given by the formula f(t) = &t2 +
2t sin t. The mechanism works perfectly until time t = 7~ when an unexpected malfunction
occurs. From then on the particle moves with constant velocity (the velocity it acquires at
time t = r). Compute the following: (a) its velocity at time t = n; (b) its acceleration at
time t = 3~; (c) its acceleration at time t = $5~; (d) its displacement from 0 at time t = 3~.
(e) Find a time t > r when the particle returns to the initial point 0, or else prove that it never
returns to 0.
25. A particle moves along a straight line. Its position at time t is f(t). When $0 \le t \le 1$, the
position is given by the integral
$$f(t) = \int_0^t \frac{1 + 2 \sin \pi x \cos \pi x}{1 + x^2}\,dx.$$
(Do not attempt to evaluate this integral.) For $t \ge 1$, the particle moves with constant
acceleration (the acceleration it acquires at time t = 1). Compute the following: (a) its acceleration at time t = 2; (b) its velocity when t = 1; (c) its velocity when t > 1; (d) the difference
f(t) - f(1) when t > 1.
26. In each case, find a function f with a continuous second derivative f'' which satisfies all the
given conditions or else explain why such an example cannot exist.
(a) f''(x) > 0 for every x,  f'(0) = 1,  f'(1) = 0.
(b) f''(x) > 0 for every x,  f'(0) = 1,  f'(1) = 3.
(c) f''(x) > 0 for every x,  f'(0) = 1,  $f(x) \le 100$ for all x > 0.
(d) f''(x) > 0 for every x,  f'(0) = 1,  $f(x) \le 100$ for all x < 0.
27. A particle moves along a straight line, its position at time t being f(t). It starts with an initial
velocity f'(0) = 0 and has a continuous acceleration $f''(t) \ge 6$ for all t in the interval $0 \le t \le 1$.
Prove that the velocity $f'(t) \ge 3$ for all t in some interval [a, b], where $0 \le a < b \le 1$, with
$b - a = \tfrac{1}{2}$.
28. Given a function f such that the integral $A(x) = \int_a^x f(t)\,dt$ exists for each x in an interval [a, b].
Let c be a point in the open interval (a, b). Consider the following ten statements about this
f and this A:
(a) f is continuous at c.            (α) A is continuous at c.
(b) f is discontinuous at c.         (β) A is discontinuous at c.
(c) f is increasing on (a, b).       (γ) A is convex on (a, b).
(d) f'(c) exists.                    (δ) A'(c) exists.
(e) f' is continuous at c.           (ε) A' is continuous at c.
In a table like the one shown here, mark T in the appropriate square if the statement labeled
with a Latin letter always implies the statement labeled with a Greek letter. Leave the other
squares blank. For example, if (a) implies (α), mark T in the upper left-hand corner square, etc.

        α     β     γ     δ     ε
  a
  b
  c
  d
  e
5.6 The Leibniz notation for primitives
We return now to a further study of the relationship between integration and differentiation. First we discuss some notation introduced by Leibniz.
We have defined a primitive P of a function f to be any function for which P'(x) = f(x).
If f is continuous on an interval, one primitive is given by a formula of the form
$$P(x) = \int_c^x f(t)\,dt,$$
and all other primitives can differ from this one only by a constant. Leibniz used the
symbol $\int f(x)\,dx$ to denote a general primitive of f. In this notation, an equation like
$$\int f(x)\,dx = P(x) + C \tag{5.12}$$
is considered to be merely an alternative way of writing P'(x) = f(x). For example, since
the derivative of the sine is the cosine, we may write
$$\int \cos x\,dx = \sin x + C. \tag{5.13}$$
Similarly, since the derivative of $x^{n+1}/(n + 1)$ is $x^n$, we may write
$$\int x^n\,dx = \frac{x^{n+1}}{n + 1} + C, \tag{5.14}$$
for any rational power $n \ne -1$. The symbol C represents an arbitrary constant so each
of Equations (5.13) and (5.14) is really a statement about a whole set of functions.
Despite similarity in appearance, the symbol $\int f(x)\,dx$ is conceptually distinct from
the integration symbol $\int_a^b f(x)\,dx$. The symbols originate from two entirely different
processes, differentiation and integration. Since, however, the two processes are related
by the fundamental theorems of calculus, there are corresponding relationships between
the two symbols.
The first fundamental theorem states that any indefinite integral of f is also a primitive
of f. Therefore we may replace P(x) in Equation (5.12) by $\int_c^x f(t)\,dt$ for some lower limit
c and write (5.12) as follows:
$$\int f(x)\,dx = \int_c^x f(t)\,dt + C. \tag{5.15}$$
This means that we can think of the symbol $\int f(x)\,dx$ as representing some indefinite
integral of f, plus a constant.
The second fundamental theorem tells us that for any primitive P of f and for any constant
C, we have
$$\int_a^b f(x)\,dx = [P(x) + C]\,\Big|_a^b.$$
If we replace P(x) + C by $\int f(x)\,dx$, this formula may be written in the form
$$\int_a^b f(x)\,dx = \int f(x)\,dx\,\Big|_a^b. \tag{5.16}$$
The two formulas in (5.15) and (5.16) may be thought of as symbolic expressions of the
first and second fundamental theorems of calculus.
Because of long historical usage, many calculus textbooks refer to the symbol $\int f(x)\,dx$
as an “indefinite integral” rather than as a primitive or an antiderivative. This is justified,
in part, by Equation (5.15), which tells us that the symbol $\int f(x)\,dx$ is, apart from an
additive constant C, an indefinite integral of f. For the same reason, many handbooks of
mathematical tables contain extensive lists of formulas labeled “tables of indefinite
integrals” which, in reality, are tables of primitives. To distinguish the symbol $\int f(x)\,dx$
from $\int_a^b f(x)\,dx$, the latter is called a definite integral. Since the second fundamental theorem
reduces the problem of integration to that of finding a primitive, the term “technique of
integration” is used to refer to any systematic method for finding primitives. This terminology
is widely used in the mathematical literature, and it will be adopted also in this
book. Thus, for example, when one is asked to “integrate” $\int f(x)\,dx$, it is to be understood
that what is wanted is the most general primitive of f.
There are three principal techniques that are used to construct tables of indefinite
integrals, and they should be learned by anyone who desires a good working knowledge
of calculus. They are (1) integration by substitution (to be described in the next section),
a method based on the chain rule; (2) integration by parts, a method based on the formula
for differentiating a product (to be described in Section 5.9); and (3) integration by partial
fractions, an algebraic technique which is discussed at the end of Chapter 6. These
techniques not only explain how tables of indefinite integrals are constructed, but also
they tell us how certain formulas are converted to the basic forms listed in the tables.
5.7 Integration by substitution
Let Q be a composition of two functions P and g, say Q(x) = P[g(x)] for all x in some
interval I. If we know the derivative of P, say P'(x) = f(x), the chain rule tells us that the
derivative of Q is given by the formula Q'(x) = P'[g(x)]g'(x). Since P' = f, this states
that Q'(x) = f[g(x)]g'(x). In other words,
$$P'(x) = f(x) \qquad \text{implies} \qquad Q'(x) = f[g(x)]g'(x). \tag{5.17}$$
In Leibniz notation, this statement can be written as follows: If we have the integration
formula
$$\int f(x)\,dx = P(x) + C, \tag{5.18}$$
then we also have the more general formula
$$\int f[g(x)]g'(x)\,dx = P[g(x)] + C. \tag{5.19}$$
For example, if $f(x) = \cos x$, then (5.18) holds with $P(x) = \sin x$, so (5.19) becomes
$$\int \cos g(x) \cdot g'(x)\,dx = \sin g(x) + C. \tag{5.20}$$
In particular, if $g(x) = x^3$, this gives us
$$\int \cos x^3 \cdot 3x^2\,dx = \sin x^3 + C,$$
a result that is easily verified directly since the derivative of $\sin x^3$ is $3x^2 \cos x^3$.
Now we notice that the general formula in (5.19) is related to (5.18) by a simple mechanical
process. Suppose we replace g(x) everywhere in (5.19) by a new symbol u and replace g'(x)
by du/dx, the Leibniz notation for derivatives. Then (5.19) becomes
$$\int f(u)\,\frac{du}{dx}\,dx = P(u) + C.$$
At this stage the temptation is strong to replace the combination $\frac{du}{dx}\,dx$ by du. If we do
this, the last formula becomes
$$\int f(u)\,du = P(u) + C. \tag{5.21}$$
Notice that this has exactly the same form as (5.18), except that the symbol u appears
everywhere instead of x. In other words, every integration formula such as (5.18) can be
made to yield a more general integration formula if we simply substitute symbols. We
replace x in (5.18) by a new symbol u to obtain (5.21), and then we think of u as representing
a new function of x, say u = g(x). Then we replace the symbol du by the combination
g'(x) dx, and Equation (5.21) reduces to the general formula in (5.19).
For example, if we replace x by u in the formula $\int \cos x\,dx = \sin x + C$, we obtain
$$\int \cos u\,du = \sin u + C.$$
In this latter formula, u may be replaced by g(x) and du by g'(x) dx, and a correct integration
formula, (5.20), results.
When this mechanical process is used in reverse, it becomes the method of integration by
substitution. The object of the method is to transform an integral with a complicated
integrand, such as $\int 3x^2 \cos x^3\,dx$, into a more familiar integral, such as $\int \cos u\,du$. The
method is applicable whenever the original integral can be written in the form
$$\int f[g(x)]g'(x)\,dx,$$
since the substitution
$$u = g(x), \qquad du = g'(x)\,dx,$$
transforms this to $\int f(u)\,du$. If we succeed in carrying out the integration indicated by
$\int f(u)\,du$, we obtain a primitive, say P(u), and then the original integral may be evaluated
by replacing u by g(x) in the formula for P(u).
The reader should realize that we have attached no meanings to the symbols dx and du
by themselves. They are used as purely formal devices to help us perform the mathematical
operations in a mechanical way. Each time we use the process, we are really applying the
statement (5.17).
Success in this method depends on one's ability to determine at the outset which part of
the integrand should be replaced by the symbol u, and this ability comes from a lot of
experience in working out specific examples. The following examples illustrate how the
method is carried out in actual practice.
EXAMPLE 1. Integrate $\int x^3 \cos x^4\,dx$.
Solution. Let us keep in mind that we are trying to write $x^3 \cos x^4$ in the form f[g(x)]g'(x)
with a suitable choice of f and g. Since $\cos x^4$ is a composition, this suggests that we take
$f(x) = \cos x$ and $g(x) = x^4$ so that $\cos x^4$ becomes f[g(x)]. This choice of g gives $g'(x) = 4x^3$,
and hence $f[g(x)]g'(x) = (\cos x^4)(4x^3)$. The extra factor 4 is easily taken care of
by multiplying and dividing the integrand by 4. Thus we have
$$x^3 \cos x^4 = \tfrac{1}{4}(\cos x^4)(4x^3) = \tfrac{1}{4} f[g(x)]g'(x).$$
Now, we make the substitution $u = g(x) = x^4$, $du = g'(x)\,dx = 4x^3\,dx$, and obtain
$$\int x^3 \cos x^4\,dx = \tfrac{1}{4}\int f(u)\,du = \tfrac{1}{4}\int \cos u\,du = \tfrac{1}{4} \sin u + C.$$
Replacing u by $x^4$ in the end result, we obtain the formula
$$\int x^3 \cos x^4\,dx = \tfrac{1}{4} \sin x^4 + C,$$
which can be verified directly by differentiation.
After a little practice one can perform some of the above steps mentally, and the entire
calculation can be given more briefly as follows: Let $u = x^4$. Then $du = 4x^3\,dx$, and we
obtain
$$\int x^3 \cos x^4\,dx = \tfrac{1}{4}\int (\cos x^4)(4x^3\,dx) = \tfrac{1}{4}\int \cos u\,du = \tfrac{1}{4} \sin u + C = \tfrac{1}{4} \sin x^4 + C.$$
Notice that the method works in this example because the factor $x^3$ has an exponent one
less than the power of x which appears in $\cos x^4$.
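The result of Example 1 can be sanity-checked by comparing a numerical approximation of a definite integral with the difference of the primitive at the endpoints. This sketch assumes a composite Simpson rule and the interval [0, 1]; both are illustrative choices introduced here.

```python
import math

def simpson(func, a, b, n=400):
    # Composite Simpson rule with an even number n of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def integrand(x):
    return x ** 3 * math.cos(x ** 4)

def primitive(x):
    # Primitive obtained by the substitution u = x^4
    return 0.25 * math.sin(x ** 4)

numeric = simpson(integrand, 0.0, 1.0)
by_substitution = primitive(1.0) - primitive(0.0)
print(numeric, by_substitution)
```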
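EXAMPLE 2. Integrate $\int \cos^2 x \sin x\,dx$.
Solution. Let $u = \cos x$. Then $du = -\sin x\,dx$, and we get
$$\int \cos^2 x \sin x\,dx = -\int (\cos x)^2(-\sin x\,dx) = -\int u^2\,du = -\frac{u^3}{3} + C = -\frac{\cos^3 x}{3} + C.$$
Again, the final result is easily verified by differentiation.
EXAMPLE 3. Integrate $\int \dfrac{\sin \sqrt{x}}{\sqrt{x}}\,dx$.
Solution. Let $u = \sqrt{x} = x^{1/2}$. Then $du = \tfrac{1}{2}x^{-1/2}\,dx$, or $dx/\sqrt{x} = 2\,du$. Hence
$$\int \frac{\sin \sqrt{x}}{\sqrt{x}}\,dx = 2\int \sin u\,du = -2 \cos u + C = -2 \cos \sqrt{x} + C.$$
EXAMPLE 4. Integrate $\int \dfrac{x\,dx}{\sqrt{1 + x^2}}$.
Solution. Let $u = 1 + x^2$. Then $du = 2x\,dx$ so $x\,dx = \tfrac{1}{2}\,du$, and we obtain
$$\int \frac{x\,dx}{\sqrt{1 + x^2}} = \tfrac{1}{2}\int u^{-1/2}\,du = u^{1/2} + C = \sqrt{1 + x^2} + C.$$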
The method of substitution is, of course, also applicable to definite integrals. For example,
to evaluate the definite integral $\int_0^{\pi/2} \cos^2 x \sin x\,dx$, we first determine the indefinite integral,
as explained in Example 2, and then we use the second fundamental theorem to write
$$\int_0^{\pi/2} \cos^2 x \sin x\,dx = -\tfrac{1}{3}\cos^3 x\,\Big|_0^{\pi/2} = -\tfrac{1}{3}\left(\cos^3 \tfrac{\pi}{2} - \cos^3 0\right) = \tfrac{1}{3}.$$
Sometimes it is desirable to apply the second fundamental theorem directly to the integral
expressed in terms of u. This may be done by using new limits of integration. We shall
illustrate how this is carried out in a particular example, and then we shall justify the
process with a general theorem.
EXAMPLE 5. Evaluate $\int_2^3 \dfrac{(x + 1)\,dx}{\sqrt{x^2 + 2x + 3}}$.
Solution. Let $u = x^2 + 2x + 3$. Then $du = (2x + 2)\,dx$, so that
$$\frac{(x + 1)\,dx}{\sqrt{x^2 + 2x + 3}} = \frac{1}{2}\,\frac{du}{\sqrt{u}}.$$
Now we obtain new limits of integration by noting that u = 11 when x = 2, and that
u = 18 when x = 3. Then we write
$$\int_2^3 \frac{(x + 1)\,dx}{\sqrt{x^2 + 2x + 3}} = \frac{1}{2}\int_{11}^{18} u^{-1/2}\,du = \sqrt{u}\,\Big|_{11}^{18} = \sqrt{18} - \sqrt{11}.$$
The same result is arrived at by expressing everything in terms of x. Thus we have
$$\int_2^3 \frac{(x + 1)\,dx}{\sqrt{x^2 + 2x + 3}} = \sqrt{x^2 + 2x + 3}\,\Big|_2^3 = \sqrt{18} - \sqrt{11}.$$
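The change of limits in Example 5 can be checked by approximating the integral once in the variable x over [2, 3] and once in the variable u over [11, 18]. This is a small sketch under the assumption that a composite Simpson rule is an adequate integrator for these smooth integrands; the function names are illustrative.

```python
import math

def simpson(func, a, b, n=200):
    # Composite Simpson rule with an even number n of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def original(x):
    # Integrand in terms of x
    return (x + 1) / math.sqrt(x * x + 2 * x + 3)

def transformed(u):
    # Integrand after the substitution u = x^2 + 2x + 3
    return 0.5 / math.sqrt(u)

in_x = simpson(original, 2.0, 3.0)       # limits for x
in_u = simpson(transformed, 11.0, 18.0)  # new limits u(2) = 11, u(3) = 18
closed = math.sqrt(18) - math.sqrt(11)
print(in_x, in_u, closed)
```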
Now we prove a general theorem which justifies the process used in Example 5.
THEOREM 5.4. SUBSTITUTION THEOREM FOR INTEGRALS. Assume g has a continuous
derivative g' on an open interval I. Let J be the set of values taken by g on I and assume that
f is continuous on J. Then for each x and c in I, we have
$$\int_{g(c)}^{g(x)} f(u)\,du = \int_c^x f[g(t)]g'(t)\,dt. \tag{5.22}$$
Proof. Let a = g(c) and define two new functions P and Q as follows:
$$P(x) = \int_a^x f(u)\,du \quad \text{if } x \in J, \qquad Q(x) = \int_c^x f[g(t)]g'(t)\,dt \quad \text{if } x \in I.$$
Since P and Q are indefinite integrals of continuous functions, they have derivatives given
by the formulas
$$P'(x) = f(x), \qquad Q'(x) = f[g(x)]g'(x).$$
Now let R denote the composite function, R(x) = P[g(x)]. Using the chain rule, we find
$$R'(x) = P'[g(x)]g'(x) = f[g(x)]g'(x) = Q'(x).$$
Applying the second fundamental theorem twice, we obtain
$$\int_{g(c)}^{g(x)} f(u)\,du = \int_{g(c)}^{g(x)} P'(u)\,du = P[g(x)] - P[g(c)] = R(x) - R(c),$$
and
$$\int_c^x f[g(t)]g'(t)\,dt = \int_c^x Q'(t)\,dt = \int_c^x R'(t)\,dt = R(x) - R(c).$$
This shows that the two integrals in (5.22) are equal.
5.8 Exercises
In Exercises 1 through 20, evaluate the integrals by the method of substitution.
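1. $\int \sqrt{2x + 1}\,dx$.
2. $\int x\sqrt{1 + 3x}\,dx$.
3. $\int x^2\sqrt{x + 1}\,dx$.
4. $\int_{-2/3}^{1/3} \dfrac{x\,dx}{\sqrt{2 - 3x}}$.
5. $\int \dfrac{(x + 1)\,dx}{(x^2 + 2x + 2)^3}$.
6. $\int \sin^3 x\,dx$.
7. $\int z(z - 1)^{1/3}\,dz$.
8. $\int \dfrac{\cos x\,dx}{\sin^3 x}$.
9. $\int_0^{\pi/4} \cos 2x\,\sqrt{4 - \sin 2x}\,dx$.
10. $\int \dfrac{\sin x\,dx}{(3 + \cos x)^2}$.
11. $\int \dfrac{\sin x\,dx}{\sqrt{\cos^3 x}}$.
12. $\int_3^8 \dfrac{\sin \sqrt{x + 1}\,dx}{\sqrt{x + 1}}$.
13. $\int x^{n-1} \sin x^n\,dx$,  $n \ne 0$.
14. $\int \dfrac{x^5\,dx}{\sqrt{1 - x^6}}$.
15. $\int t(1 + t)^{1/4}\,dt$.
16. $\int (x^2 + 1)^{-3/2}\,dx$.
17. $\int x^2(8x^3 + 27)^{2/3}\,dx$.
18. $\int \dfrac{(\sin x + \cos x)\,dx}{(\sin x - \cos x)^{1/3}}$.
19. $\int \dfrac{x\,dx}{\sqrt{1 + x^2 + \sqrt{(1 + x^2)^3}}}$.
20. $\int \dfrac{(x^2 + 1 - 2x)^{1/5}\,dx}{1 - x}$.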
21. Deduce the formulas in Theorems 1.18 and 1.19 by the method of substitution.
22. Let
$$F(x, a) = \int_0^x \frac{t^p}{(t^2 + a^2)^q}\,dt,$$
where a > 0, and p and q are positive integers. Show that $F(x, a) = a^{p+1-2q}F(x/a, 1)$.
23. Show that
$$\int_x^1 \frac{dt}{1 + t^2} = \int_1^{1/x} \frac{dt}{1 + t^2} \qquad \text{if } x > 0.$$
24. If m and n are positive integers, show that
$$\int_0^1 x^m(1 - x)^n\,dx = \int_0^1 x^n(1 - x)^m\,dx.$$
25. If m is a positive integer, show that
$$\int_0^{\pi/2} \cos^m x \sin^m x\,dx = 2^{-m}\int_0^{\pi/2} \cos^m x\,dx.$$
26. (a) Show that
$$\int_0^{\pi} x f(\sin x)\,dx = \frac{\pi}{2}\int_0^{\pi} f(\sin x)\,dx. \qquad [\text{Hint: } u = \pi - x.]$$
(b) Use part (a) to deduce the formula
$$\int_0^{\pi} \frac{x \sin x}{1 + \cos^2 x}\,dx = \pi\int_0^1 \frac{dx}{1 + x^2}.$$
27. Show that $\int_0^1 (1 - x^2)^{n-1/2}\,dx = \int_0^{\pi/2} \cos^{2n} u\,du$ if n is a positive integer. [Hint: x = sin u.]
The integral on the right can be evaluated by the method of integration by parts, to be discussed
in the next section.
5.9 Integration by parts
We proved in Chapter 4 that the derivative of a product of two functions f and g is given
by the formula
$$h'(x) = f(x)g'(x) + f'(x)g(x),$$
where h(x) = f(x)g(x). When this is translated into the Leibniz notation for primitives, it
becomes $\int f(x)g'(x)\,dx + \int f'(x)g(x)\,dx = f(x)g(x) + C$, usually written as follows:
$$\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx + C. \tag{5.23}$$
This equation, known as the formula for integration by parts, provides us with a new
integration technique.
To evaluate an integral, say $\int k(x)\,dx$, using (5.23), we try to find two functions f and g
such that k(x) can be written in the form f(x)g'(x). If we can do this, then (5.23) tells us
that we have
$$\int k(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx + C,$$
and the difficulty has been transferred to the evaluation of $\int g(x)f'(x)\,dx$. If f and g are
properly chosen, this last integral may be easier to evaluate than the original one. Sometimes
two or more applications of (5.23) will lead to an integral that is easily evaluated
or that may be found in a table. The examples worked out below have been chosen to
illustrate the advantages of this method. For definite integrals, (5.23) leads to the formula
$$\int_a^b f(x)g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)\,dx.$$
If we introduce the substitutions u = f(x), v = g(x), du = f'(x) dx, and dv = g'(x) dx,
the formula for integration by parts assumes an abbreviated form that many people find
easier to remember, namely
$$\int u\,dv = uv - \int v\,du + C. \tag{5.24}$$
EXAMPLE
1. Integrate J x
COS
UV
- .r vdu + C.
x dx.
Solution. We choose f(x) = x and g’(x) =
and g(x) = sin x, SO (5.23) becomes
sx
(5.25)
COS
COS
x. This means that we have f’(x) = 1
x dx = x sin x - s sin x dx + C = x sin x +
COS
x + C .
Note that in this case the second integral is one we have already calculated.
TO carry out the same calculation in the abbreviated notation of (5.24) we Write
u = x,
du = dx,
s
x COS x dx = UV -
s
dv =
COS
x dx ,
v = j COS x dx = sin x ,
v du = x sin x - s sin x dx + C = x sin x +
COS
x + C .
Had we chosen u = COS x and du = x dx, we would have obtained du = -sin x dx,
v = $x2, and (5.24) would have given us
s
x
COS
x dx = ‘x2
2
COS
x - 12 x2(-sin x) dx + C = 3x”
s
COS
x+ $
s
x2 sin x dx + C .
Integration by parts
219
Since the last integral is one which we have not yet calculated, this choice of u and u is not
as useful as the first choice. Notice, however, that we cari salve this last equation for
J x2 sin x dx and use (5.25) to obtain
s
EXAMPLE
x2 sin x dx = 2x sin x + 2
2. Integrate J x2
COS
COS
x - x2
COS
x+C.
x dx.
Solution. Let $u = x^2$ and dv = cos x dx. Then du = 2x dx and $v = \int \cos x\,dx = \sin x$,
so we have
$$\int x^2 \cos x\,dx = \int u\,dv = uv - \int v\,du + C = x^2 \sin x - 2 \int x \sin x\,dx + C. \tag{5.26}$$
The last integral can be evaluated by applying integration by parts once more. Since it is
similar to Example 1, we simply state the result:
$$\int x \sin x\,dx = -x \cos x + \sin x + C.$$
Substituting in (5.26) and consolidating the two arbitrary constants into one, we obtain
$$\int x^2 \cos x\,dx = x^2 \sin x + 2x \cos x - 2 \sin x + C.$$
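The result of Example 2 can be sanity-checked numerically. The following is a minimal Python sketch, not part of the text: the helper names `simpson`, `antiderivative`, and `check_example_2`, the sample interval [0, 1.5], and the choice of composite Simpson's rule are all illustrative assumptions. It compares a numerical value of the definite integral of x² cos x with the difference of the stated antiderivative at the endpoints.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def antiderivative(x):
    # Right-hand side of the final formula of Example 2, taking C = 0.
    return x**2 * math.sin(x) + 2 * x * math.cos(x) - 2 * math.sin(x)


def check_example_2(a=0.0, b=1.5):
    """Return (numerical integral, antiderivative difference) for x^2 cos x on [a, b]."""
    numeric = simpson(lambda x: x**2 * math.cos(x), a, b)
    exact = antiderivative(b) - antiderivative(a)
    return numeric, exact


if __name__ == "__main__":
    numeric, exact = check_example_2()
    # The two values should agree up to quadrature and rounding error.
    print(numeric, exact)
```

Agreement of the two numbers does not prove the formula, but a sign error in either integration by parts would show up immediately.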
EXAMPLE 3. The method sometimes fails because it leads back to the original integral.
For example, let us try to integrate $\int x^{-1}\,dx$ by parts. If we let u = x and $dv = x^{-2}\,dx$,
then $\int x^{-1}\,dx = \int u\,dv$. For this choice of u and v, we have du = dx and $v = -x^{-1}$, so
(5.24) gives us
$$\int x^{-1}\,dx = \int u\,dv = uv - \int v\,du + C = -1 + \int x^{-1}\,dx + C, \tag{5.27}$$
and we are back where we started. Moreover, the situation does not improve if we try
$u = x^n$ and $dv = x^{-n-1}\,dx$.
This example is often used to illustrate the importance of paying attention to the arbitrary
constant C. If formula (5.27) is written without C, it leads to the equation $\int x^{-1}\,dx = -1 + \int x^{-1}\,dx$,
which is sometimes used to give a fallacious proof that 0 = -1.
As an application of the method of integration by parts, we obtain another version of
the weighted mean-value theorem for integrals (Theorem 3.16).
THEOREM 5.5. SECOND MEAN-VALUE THEOREM FOR INTEGRALS. Assume g is continuous on
[a, b], and assume f has a derivative which is continuous and never changes sign in [a, b].
Then, for some c in [a, b], we have
$$\int_a^b f(x)g(x)\,dx = f(a) \int_a^c g(x)\,dx + f(b) \int_c^b g(x)\,dx. \tag{5.28}$$
Proof. Let $G(x) = \int_a^x g(t)\,dt$. Since g is continuous, we have G'(x) = g(x). Therefore,
integration by parts gives us
$$\int_a^b f(x)g(x)\,dx = \int_a^b f(x)G'(x)\,dx = f(b)G(b) - \int_a^b f'(x)G(x)\,dx, \tag{5.29}$$
since G(a) = 0. By the weighted mean-value theorem, we have
$$\int_a^b f'(x)G(x)\,dx = G(c) \int_a^b f'(x)\,dx = G(c)[f(b) - f(a)]$$
for some c in [a, b]. Therefore, (5.29) becomes
$$\int_a^b f(x)g(x)\,dx = f(b)G(b) - G(c)[f(b) - f(a)] = f(a)G(c) + f(b)[G(b) - G(c)].$$
This proves (5.28) since $G(c) = \int_a^c g(x)\,dx$ and $G(b) - G(c) = \int_c^b g(x)\,dx$.
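Theorem 5.5 only asserts that a suitable c exists. For a concrete pair of functions one can look for it numerically, as in the following hedged Python sketch. The names `simpson` and `find_c`, the sample functions f(x) = 1 + x² and g(x) = cos x on [0, 1], and the use of bisection are assumptions made for illustration; bisection is applicable here only because the auxiliary function h changes sign between the endpoints for this sample, which the theorem itself does not guarantee in general.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def find_c(f, g, a, b, tol=1e-10):
    """Bisection for a point c in [a, b] satisfying (5.28).

    Assumes h(c) = f(a) G(c) + f(b) [G(b) - G(c)] - I changes sign between
    c = a and c = b; this holds for the sample below, not for every f and g.
    """
    G = lambda x: simpson(g, a, x)            # G(x) = integral of g from a to x
    I = simpson(lambda x: f(x) * g(x), a, b)  # left-hand side of (5.28)
    Gb = G(b)
    h = lambda c: f(a) * G(c) + f(b) * (Gb - G(c)) - I
    lo, hi = a, b
    if h(lo) * h(hi) > 0:
        raise ValueError("no sign change on [a, b]; bisection does not apply")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def f_sample(x):
    # f' = 2x is continuous and never changes sign on [0, 1].
    return 1 + x * x


def g_sample(x):
    return math.cos(x)


if __name__ == "__main__":
    print(find_c(f_sample, g_sample, 0.0, 1.0))
```

For this sample, (5.28) reduces to sin c = 2 sin 1 - 2 cos 1, so the returned value can be checked against that closed form.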
5.10 Exercises
Use integration by parts to evaluate the integrals in Exercises 1 through 6.
1. $\int x \sin x\,dx$.
2. $\int x^2 \sin x\,dx$.
3. $\int x^3 \cos x\,dx$.
4. $\int x^3 \sin x\,dx$.
5. $\int \sin x \cos x\,dx$.
6. $\int x \sin x \cos x\,dx$.
7. Use integration by parts to deduce the formula
$$\int \sin^2 x\,dx = -\sin x \cos x + \int \cos^2 x\,dx.$$
In the second integral, write $\cos^2 x = 1 - \sin^2 x$ and thereby deduce the formula
$$\int \sin^2 x\,dx = \tfrac{1}{2}x - \tfrac{1}{4} \sin 2x.$$
8. Use integration by parts to deduce the formula
$$\int \sin^n x\,dx = -\sin^{n-1} x \cos x + (n - 1) \int \sin^{n-2} x \cos^2 x\,dx.$$
In the second integral, write $\cos^2 x = 1 - \sin^2 x$ and thereby deduce the recursion formula
$$\int \sin^n x\,dx = -\frac{\sin^{n-1} x \cos x}{n} + \frac{n - 1}{n} \int \sin^{n-2} x\,dx.$$
9. Use the results of Exercises 7 and 8 to show that
(a) $\int_0^{\pi/2} \sin^2 x\,dx = \frac{\pi}{4}$.
(b) $\int_0^{\pi/2} \sin^4 x\,dx = \frac{3}{4} \int_0^{\pi/2} \sin^2 x\,dx = \frac{3\pi}{16}$.
(c) $\int_0^{\pi/2} \sin^6 x\,dx = \frac{5}{6} \int_0^{\pi/2} \sin^4 x\,dx = \frac{5\pi}{32}$.
10. Use the results of Exercises 7 and 8 to derive the following formulas.
(a) $\int \sin^3 x\,dx = -\tfrac{3}{4} \cos x + \tfrac{1}{12} \cos 3x$.
(b) $\int \sin^4 x\,dx = \tfrac{3}{8}x - \tfrac{1}{4} \sin 2x + \tfrac{1}{32} \sin 4x$.
(c) $\int \sin^5 x\,dx = -\tfrac{5}{8} \cos x + \tfrac{5}{48} \cos 3x - \tfrac{1}{80} \cos 5x$.
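11. Use integration by parts and the results of Exercises 7 and 10 to deduce the following formulas.
(a) $\int x \sin^2 x\,dx = \tfrac{1}{4}x^2 - \tfrac{1}{4}x \sin 2x - \tfrac{1}{8} \cos 2x$.
(b) $\int x \sin^3 x\,dx = \tfrac{3}{4} \sin x - \tfrac{1}{36} \sin 3x - \tfrac{3}{4}x \cos x + \tfrac{1}{12}x \cos 3x$.
(c) $\int x^2 \sin^2 x\,dx = \tfrac{1}{6}x^3 + \left(\tfrac{1}{8} - \tfrac{1}{4}x^2\right) \sin 2x - \tfrac{1}{4}x \cos 2x$.
12. Use integration by parts to derive the recursion formula
$$\int \cos^n x\,dx = \frac{\cos^{n-1} x \sin x}{n} + \frac{n - 1}{n} \int \cos^{n-2} x\,dx.$$
13. Use the result of Exercise 12 to obtain the following formulas.
(a) $\int \cos^2 x\,dx = \tfrac{1}{2}x + \tfrac{1}{4} \sin 2x$.
(b) $\int \cos^3 x\,dx = \tfrac{3}{4} \sin x + \tfrac{1}{12} \sin 3x$.
(c) $\int \cos^4 x\,dx = \tfrac{3}{8}x + \tfrac{1}{4} \sin 2x + \tfrac{1}{32} \sin 4x$.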
14. Use integration by parts to show that
$$\int \sqrt{1 - x^2}\,dx = x\sqrt{1 - x^2} + \int \frac{x^2}{\sqrt{1 - x^2}}\,dx.$$
Write $x^2 = x^2 - 1 + 1$ in the second integral and deduce the formula
$$\int \sqrt{1 - x^2}\,dx = \tfrac{1}{2}x\sqrt{1 - x^2} + \tfrac{1}{2} \int \frac{1}{\sqrt{1 - x^2}}\,dx.$$
15. (a) Use integration by parts to derive the formula
$$\int (a^2 - x^2)^n\,dx = \frac{x(a^2 - x^2)^n}{2n + 1} + \frac{2a^2 n}{2n + 1} \int (a^2 - x^2)^{n-1}\,dx + C.$$
(b) Use part (a) to evaluate $\int_0^a (a^2 - x^2)^{5/2}\,dx$.
16. (a) If $I_n(x) = \int_0^x t^n(t^2 + a^2)^{-1/2}\,dt$, use integration by parts to show that
$$n I_n(x) = x^{n-1}\sqrt{x^2 + a^2} - (n - 1)a^2 I_{n-2}(x) \quad\text{if } n \ge 2.$$
(b) Use part (a) to show that $\int_0^2 x^5(x^2 + 5)^{-1/2}\,dx = 168/5 - 40\sqrt{5}/3$.
17. Evaluate the integral $\int_{-1}^{3} t^3(4 + t^3)^{-1/2}\,dt$, given that $\int_{-1}^{3} (4 + t^3)^{1/2}\,dt = 11.35$. Leave the
answer in terms of $\sqrt{3}$ and $\sqrt{31}$.
18. Use integration by parts to derive the formula
$$\int \frac{\sin^{n+1} x}{\cos^{m+1} x}\,dx = \frac{1}{m} \frac{\sin^n x}{\cos^m x} - \frac{n}{m} \int \frac{\sin^{n-1} x}{\cos^{m-1} x}\,dx.$$
Apply the formula to integrate $\int \tan^2 x\,dx$ and $\int \tan^4 x\,dx$.
19. Use integration by parts to derive the formula
$$\int \frac{\cos^{m+1} x}{\sin^{n+1} x}\,dx = -\frac{1}{n} \frac{\cos^m x}{\sin^n x} - \frac{m}{n} \int \frac{\cos^{m-1} x}{\sin^{n-1} x}\,dx.$$
Apply the formula to integrate $\int \cot^2 x\,dx$ and $\int \cot^4 x\,dx$.
20. (a) Find an integer n such that $n \int_0^1 x f''(2x)\,dx = \int_0^2 t f''(t)\,dt$.
(b) Compute $\int_0^1 x f''(2x)\,dx$, given that f(0) = 1, f(2) = 3, and f'(2) = 5.
21. (a) If $\phi''$ is continuous and nonzero on [a, b], and if there is a constant m > 0 such that
$\phi'(t) \ge m$ for all t in [a, b], use Theorem 5.5 to prove that
$$\left| \int_a^b \sin \phi(t)\,dt \right| \le \frac{4}{m}.$$
[Hint: Multiply and divide the integrand by $\phi'(t)$.]
(b) If a > 0, show that $\left| \int_a^x \sin (t^2)\,dt \right| \le 2/a$ for all x > a.
*5.11 Miscellaneous review exercises
1. Let f be a polynomial with f(0) = 1 and let $g(x) = x^n f(x)$. Compute $g(0), g'(0), \ldots, g^{(n)}(0)$.
2. Find a polynomial P of degree $\le 5$ with P(0) = 1, P(1) = 2, P'(0) = P''(0) = P'(1) = P''(1) = 0.
3. If f(x) = cos x and g(x) = sin x, prove that
$$f^{(n)}(x) = \cos \left(x + \tfrac{1}{2}n\pi\right) \quad\text{and}\quad g^{(n)}(x) = \sin \left(x + \tfrac{1}{2}n\pi\right).$$
4. If h(x) = f(x)g(x), prove that the nth derivative of h is given by the formula
$$h^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k} f^{(k)}(x)\,g^{(n-k)}(x),$$
where $\binom{n}{k}$ denotes the binomial coefficient. This is called Leibniz's formula.
5. Given two functions f and g whose derivatives f' and g' satisfy the equations
$$f'(x) = g(x), \qquad g'(x) = -f(x), \qquad f(0) = 0, \qquad g(0) = 1, \tag{5.30}$$
for every x in some open interval J containing 0. For example, these equations are satisfied
when f(x) = sin x and g(x) = cos x.
(a) Prove that $f^2(x) + g^2(x) = 1$ for every x in J.
(b) Let F and G be another pair of functions satisfying (5.30). Prove that F(x) = f(x) and
G(x) = g(x) for every x in J. [Hint: Consider $h(x) = [F(x) - f(x)]^2 + [G(x) - g(x)]^2$.]
(c) What more can you say about functions f and g satisfying (5.30)?
6. A function f, defined for all positive real numbers, satisfies the equation $f(x^2) = x^3$ for every
x > 0. Determine f'(4).
7. A function g, defined for all positive real numbers, satisfies the following two conditions:
g(1) = 1 and $g'(x^2) = x^3$ for all x > 0. Compute g(4).
8. Show that
$$\int_0^x \frac{\sin t}{t + 1}\,dt \ge 0 \quad\text{for all } x \ge 0.$$
9. Let $C_1$ and $C_2$ be two curves passing through the origin as indicated in Figure 5.2. A curve
C is said to "bisect in area" the region between $C_1$ and $C_2$ if, for each point P of C, the two
shaded regions A and B shown in the figure have equal areas. Determine the upper curve $C_2$,
given that the bisecting curve C has the equation $y = x^2$ and that the lower curve $C_1$ has the
equation $y = \tfrac{1}{2}x^2$.
FIGURE 5.2 Exercise 9.
10. A function f is defined for all x as follows:
$$f(x) = \begin{cases} x^2 & \text{if } x \text{ is rational}, \\ 0 & \text{if } x \text{ is irrational}. \end{cases}$$
Let Q(h) = f(h)/h if $h \ne 0$. (a) Prove that $Q(h) \to 0$ as $h \to 0$. (b) Prove that f has a derivative
at 0, and compute f'(0).
In Exercises 11 through 20, evaluate the given integrals. Try to simplify the calculations by
using the method of substitution and/or integration by parts whenever possible.
11. $\int (2 + 3x) \sin 5x\,dx$.
12. $\int x\sqrt{1 + x^2}\,dx$.
13. $\int_{-2}^{1} x(x^2 - 1)^9\,dx$.
14. $\int_0^1 \frac{2x + 3}{(6x + 7)^3}\,dx$.
15. $\int x^4(1 + x^5)^5\,dx$.
16. $\int_0^1 x^4(1 - x)^{20}\,dx$.
17. $\int_1^2 x^{-2} \sin \frac{1}{x}\,dx$.
18. $\int \sin \sqrt[4]{x - 1}\,dx$.
19. $\int x \sin x^2 \cos x^2\,dx$.
20. $\int \sqrt{1 + 3\cos^2 x}\,\sin 2x\,dx$.
21. Show that the value of the integral $\int_0^2 375x^5(x^2 + 1)^{-4}\,dx$ is $2^n$ for some integer n.
22. Determine a pair of numbers a and b for which $\int_0^1 (ax + b)(x^2 + 3x + 2)^{-2}\,dx = 3/2$.
23. Let $I_n = \int_0^1 (1 - x^2)^n\,dx$. Show that $(2n + 1)I_n = 2n\,I_{n-1}$, then use this relation to compute
$I_2$, $I_3$, $I_4$, and $I_5$.
24. Let $F(m, n) = \int_0^x t^m(1 + t)^n\,dt$, m > 0, n > 0. Show that
$$(m + 1)F(m, n) + nF(m + 1, n - 1) = x^{m+1}(1 + x)^n.$$
Use this to evaluate F(10, 2).
25. Let $f(n) = \int_0^{\pi/4} \tan^n x\,dx$ where $n \ge 1$. Show that
(a) f(n + 1) < f(n).
(b) $f(n) + f(n - 2) = \frac{1}{n - 1}$ if n > 2.
(c) $\frac{1}{n + 1} < 2f(n) < \frac{1}{n - 1}$ if n > 2.
26. Compute f(0), given that $f(\pi) = 2$ and that $\int_0^\pi [f(x) + f''(x)] \sin x\,dx = 5$.
27. Let A denote the value of the integral
$$\int_0^\pi \frac{\cos x}{(x + 2)^2}\,dx.$$
Compute the following integral in terms of A:
$$\int_0^{\pi/2} \frac{\sin x \cos x}{x + 1}\,dx.$$
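The formulas in Exercises 28 through 33 appear in integral tables. Verify each of these formulas
by any method.
28. $\int \frac{\sqrt{ax + b}}{x}\,dx = 2\sqrt{ax + b} + b \int \frac{dx}{x\sqrt{ax + b}} + C$.
29. $\int x^n \sqrt{ax + b}\,dx = \frac{2}{a(2n + 3)} \left( x^n(ax + b)^{3/2} - nb \int x^{n-1}\sqrt{ax + b}\,dx \right) + C \quad (n \ne -\tfrac{3}{2})$.
30. $\int \frac{x^m}{\sqrt{ax + b}}\,dx = \frac{2}{(2m + 1)a} \left( x^m\sqrt{ax + b} - mb \int \frac{x^{m-1}}{\sqrt{ax + b}}\,dx \right) + C \quad (m \ne -\tfrac{1}{2})$.
31. $\int \frac{dx}{x^n\sqrt{ax + b}} = -\frac{\sqrt{ax + b}}{(n - 1)b\,x^{n-1}} - \frac{(2n - 3)a}{(2n - 2)b} \int \frac{dx}{x^{n-1}\sqrt{ax + b}} + C \quad (n \ne 1)$.
32. $\int \frac{\cos^m x}{\sin^n x}\,dx = \frac{\cos^{m-1} x}{(m - n) \sin^{n-1} x} + \frac{m - 1}{m - n} \int \frac{\cos^{m-2} x}{\sin^n x}\,dx + C \quad (m \ne n)$.
33. $\int \frac{\cos^m x}{\sin^n x}\,dx = -\frac{\cos^{m+1} x}{(n - 1) \sin^{n-1} x} - \frac{m - n + 2}{n - 1} \int \frac{\cos^m x}{\sin^{n-2} x}\,dx + C \quad (n \ne 1)$.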
34. (a) Find a polynomial P(x) such that P’(x) - 3P(x) = 4 - 5x + 3x2. Prove that there is
only one solution.
(b) If Q(x) is a given polynomial, prove that there is one and only one polynomial P(x) such
that P’(x) - 3P(x) = Q(x).
35. A sequence of polynomials (called the Bernoulli polynomials) is defined inductively as follows:
$$P_0(x) = 1; \qquad P_n'(x) = nP_{n-1}(x) \quad\text{and}\quad \int_0^1 P_n(x)\,dx = 0 \quad\text{if } n \ge 1.$$
(a) Determine explicit formulas for $P_1(x), P_2(x), \ldots, P_5(x)$.
(b) Prove, by induction, that $P_n(x)$ is a polynomial in x of degree n, the term of highest degree
being $x^n$.
(c) Prove that $P_n(0) = P_n(1)$ if $n \ge 2$.
(d) Prove that $P_n(x + 1) - P_n(x) = nx^{n-1}$ if $n \ge 1$.
(e) Prove that for $n \ge 2$ we have
$$\sum_{r=1}^{k-1} r^n = \int_0^k P_n(x)\,dx = \frac{P_{n+1}(k) - P_{n+1}(0)}{n + 1}.$$
(f) Prove that $P_n(1 - x) = (-1)^n P_n(x)$ if $n \ge 1$.
(g) Prove that $P_{2n+1}(0) = 0$ and $P_{2n-1}(\tfrac{1}{2}) = 0$ if $n \ge 1$.
36. Assume that $|f''(x)| \le m$ for each x in the interval [0, a], and assume that f takes on its largest
value at an interior point of this interval. Show that $|f'(0)| + |f'(a)| \le am$. You may assume
that f'' is continuous in [0, a].
6
THE LOGARITHM, THE EXPONENTIAL,
AND THE INVERSE TRIGONOMETRIC FUNCTIONS
6.1 Introduction
Whenever man focuses his attention on quantitative relationships, he is either studying
the properties of a known function or trying to discover the properties of an unknown
function. The function concept is so broad and so general that it is not surprising to find
an endless variety of functions occurring in nature. What is surprising is that a few rather
special functions govern so many totally different kinds of natural phenomena. We shall
study some of these functions in this chapter: first of all, the logarithm and its inverse
(the exponential function) and secondly, the inverses of the trigonometric functions. Anyone who studies mathematics, either as an abstract discipline or as a tool for some other
scientific field, will find that a good working knowledge of these functions and their properties is indispensable.
The reader probably has had occasion to work with logarithms to the base 10 in an
elementary algebra or trigonometry course. The definition usually given in elementary
algebra is this: If x > 0, the logarithm of x to the base 10, denoted by $\log_{10} x$, is that
real number u such that $10^u = x$. If $x = 10^u$ and $y = 10^v$, the law of exponents yields
$xy = 10^{u+v}$. In terms of logarithms, this becomes
$$\log_{10} (xy) = \log_{10} x + \log_{10} y. \tag{6.1}$$
It is this fundamental property that makes logarithms particularly adaptable to computations involving multiplication. The number 10 is useful as a base because real numbers
are commonly written in the decimal system, and certain important numbers like 0.01,
0.1, 1, 10, 100, 1000, . . . have for their logarithms the integers -2, -1, 0, 1, 2, 3, . . . ,
respectively.
It is not necessary to restrict ourselves to base 10. Any other positive base $b \ne 1$ would
serve equally well. Thus
$$u = \log_b x \quad\text{means}\quad x = b^u, \tag{6.2}$$
and the fundamental property in (6.1) becomes
$$\log_b (xy) = \log_b x + \log_b y. \tag{6.3}$$
If we examine the definition in (6.2) from a critical point of view, we find that it suffers
from several logical gaps. First of all, to understand (6.2) we must know what is meant
by $b^u$. This is easy to define when u is an integer or a rational number (the quotient of two
integers), but it is not a trivial matter to define $b^u$ when u is irrational. For example, how
should we define $10^{\sqrt{2}}$? Even if we manage to obtain a satisfactory definition for $b^u$,
there are further difficulties to overcome before we can use (6.2) as a good definition of
logarithms. It must be shown that for every x > 0, there actually exists a number u such
that $x = b^u$. Also, the law of exponents, $b^u b^v = b^{u+v}$, must be established for all real
exponents u and v in order to derive (6.3) from (6.2).
It is possible to overcome these difficulties and arrive at a satisfactory definition of
logarithms by this method, but the process is long and tedious. Fortunately, however,
the study of logarithms can proceed in an entirely different way which is much simpler
and which illustrates the power and elegance of the methods of calculus. The idea is to
introduce logarithms first, and then use logarithms to define $b^u$.
6.2 Motivation for the definition of the natural logarithm as an integral
The logarithm is an example of a mathematical concept that can be defined in many
different ways. When a mathematician tries to formulate a definition of a concept, such
as the logarithm, he usually has in mind a number of properties he wants this concept
to have. By examining these properties, he is often led to a simple formula or process
that might serve as a definition from which all the desired properties spring forth as logical
deductions. We shall illustrate how this procedure may be used to arrive at the definition
of the logarithm which is given in the next section.
One of the properties we want logarithms to have is that the logarithm of a product
should be the sum of the logarithms of the individual factors. Let us consider this property
by itself and see where it leads us. If we think of the logarithm as a function f, then we
want this function to have the property expressed by the formula
$$f(xy) = f(x) + f(y) \tag{6.4}$$
whenever x, y, and xy are in the domain of f.
An equation like (6.4), which expresses a relationship between the values of a function
at two or more points, is called a functional equation. Many mathematical problems can
be reduced to solving a functional equation, a solution being any function which satisfies
the equation. Ordinarily an equation of this sort has many different solutions, and it is
usually very difficult to find them all. It is easier to seek only those solutions which have
some additional property such as continuity or differentiability. For the most part, these
are the only solutions we are interested in anyway. We shall adopt this point of view and
determine all differentiable solutions of (6.4). But first let us try to deduce what information
we can from (6.4) alone, without any further restrictions on f.
One solution of (6.4) is the function that is zero everywhere on the real axis. In fact,
this is the only solution of (6.4) that is defined for all real numbers. To prove this, let f
be any function that satisfies (6.4). If 0 is in the domain of f, then we may put y = 0 in
(6.4) to obtain f(0) = f(x) + f(0), and this implies that f(x) = 0 for every x in the domain
of f. In other words, if 0 is in the domain of f, then f must be identically zero. Therefore,
a solution of (6.4) that is not identically zero cannot be defined at 0.
If f is a solution of (6.4) and if the domain of f includes 1, we may put x = y = 1 in
(6.4) to obtain f(1) = 2f(1), and this implies
$$f(1) = 0.$$
If both 1 and -1 are in the domain of f, we may take x = -1 and y = -1 to deduce
that f(1) = 2f(-1); hence f(-1) = 0. If now x, -x, 1, and -1 are in the domain of f,
we may put y = -1 in (6.4) to deduce f(-x) = f(-1) + f(x) and, since f(-1) = 0,
we find
$$f(-x) = f(x).$$
In other words, any solution of (6.4) is necessarily an even function.
Suppose, now, we assume that f has a derivative f'(x) at each $x \ne 0$. If we hold y fixed
in (6.4) and differentiate with respect to x (using the chain rule on the left), we find
$$y f'(xy) = f'(x).$$
When x = 1, this gives us y f'(y) = f'(1), and hence we have
$$f'(y) = \frac{f'(1)}{y} \quad\text{for each } y \ne 0.$$
From this equation we see that the derivative f' is monotonic and hence integrable on
every closed interval not containing the origin. Also, f' is continuous on every such interval,
and we may apply the second fundamental theorem of calculus to write
$$f(x) - f(c) = \int_c^x f'(t)\,dt = f'(1) \int_c^x \frac{1}{t}\,dt.$$
If x > 0, this equation holds for any positive c, and if x < 0, it holds for any negative c.
Since f(1) = 0, the choice c = 1 gives us
$$f(x) = f'(1) \int_1^x \frac{1}{t}\,dt \quad\text{if } x > 0.$$
If x is negative then -x is positive and, since f(x) = f(-x), we find
$$f(x) = f'(1) \int_1^{-x} \frac{1}{t}\,dt \quad\text{if } x < 0.$$
These two formulas for f(x) may be combined into one formula that is valid for both
positive and negative x, namely,
$$f(x) = f'(1) \int_1^{|x|} \frac{1}{t}\,dt \quad\text{if } x \ne 0. \tag{6.5}$$
Therefore we have shown that if there is a solution of (6.4) which has a derivative at each
point $x \ne 0$, then this solution must necessarily be given by the integral formula in (6.5).
If f'(1) = 0, then (6.5) implies that f(x) = 0 for all $x \ne 0$, and this solution agrees with
the solution that is identically zero. Therefore, if f is not identically zero, we must have
$f'(1) \ne 0$, in which case we can divide both sides of (6.5) by f'(1) to obtain
$$g(x) = \int_1^{|x|} \frac{1}{t}\,dt \quad\text{if } x \ne 0, \tag{6.6}$$
where g(x) = f(x)/f'(1). The function g is also a solution of (6.4), since cf is a solution
whenever f is. This proves that if (6.4) has a solution that is not identically zero and if
this solution has a derivative everywhere except at the origin, then the function g given
by (6.6) is also a solution, and all solutions may be obtained from this one by multiplying
g by a suitable constant.
It should be emphasized that this argument does not prove that the function g in (6.6)
actually is a solution, because we derived (6.6) on the assumption that there is at least one
solution that is not identically zero. Formula (6.6) suggests a way to construct such a
solution. We simply operate in reverse. That is, we use the integral in (6.6) to define a
function g, and then we verify directly that this function actually satisfies (6.4). This
suggests that we should define the logarithm to be the function g given by (6.6). If we
did so, this function would have the property that g(-x) = g(x) or, in other words,
distinct numbers would have the same logarithm. For some of the things we want to do
later, it is preferable to define the logarithm in such a way that no two distinct numbers
have the same logarithm. This latter property may be achieved by defining the logarithm
only for positive numbers. Therefore we use the following definition.
6.3 The definition of the logarithm. Basic properties
DEFINITION. If x is a positive real number, we define the natural logarithm of x, denoted
temporarily by L(x), to be the integral
$$L(x) = \int_1^x \frac{1}{t}\,dt. \tag{6.7}$$
When x > 1, L(x) may be interpreted geometrically as the area of the shaded region
shown in Figure 6.1.
THEOREM 6.1. The logarithm function has the following properties:
(a) L(1) = 0.
(b) L'(x) = 1/x for every x > 0.
(c) L(ab) = L(a) + L(b) for every a > 0, b > 0.
Proof. Part (a) follows at once from the definition. To prove (b), we simply note that
L is an indefinite integral of a continuous function and apply the first fundamental theorem
of calculus. Property (c) follows from the additive property of the integral. We write
$$L(ab) = \int_1^{ab} \frac{dt}{t} = \int_1^a \frac{dt}{t} + \int_a^{ab} \frac{dt}{t} = L(a) + \int_a^{ab} \frac{dt}{t}.$$
In the last integral we make the substitution u = t/a, du = dt/a, and we find that the
integral reduces to L(b), thus proving (c).
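The integral definition (6.7) can be tried out directly. The following Python sketch is an illustration only: the names `simpson` and `L`, the number of subintervals, and the use of composite Simpson's rule to approximate the integral are assumptions, not part of the text. It evaluates L(x) from the integral and lets one observe properties (a) and (c) of Theorem 6.1 numerically.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def L(x, n=2000):
    """Approximate the natural logarithm via (6.7): the integral of 1/t from 1 to x."""
    if x <= 0:
        raise ValueError("L(x) is defined only for x > 0")
    # For 0 < x < 1 the step h is negative, so the integral comes out negative.
    return simpson(lambda t: 1.0 / t, 1.0, x, n)


if __name__ == "__main__":
    print(L(1.0))                    # property (a)
    print(L(2.0) + L(3.0), L(6.0))   # property (c) with a = 2, b = 3
    print(L(6.0), math.log(6.0))     # comparison with the library logarithm
```

The comparison with `math.log` is only a cross-check; the point of the section is that the integral itself serves as the definition.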
FIGURE 6.1 Interpretation of the logarithm as an area.
FIGURE 6.2 The graph of the natural logarithm.
6.4 The graph of the natural logarithm
The graph of the logarithm function has the general shape shown in Figure 6.2. Many
properties of this curve can be discovered without undue calculation simply by referring
to the properties in Theorem 6.1. For example, from (b) we see that L has a positive
derivative everywhere so it is strictly increasing on every interval. Since L(1) = 0, the
graph lies above the x-axis if x > 1 and below the axis if 0 < x < 1. The curve has slope
1 when x = 1. For x > 1, the slope gradually decreases toward zero as x increases
indefinitely. For small values of x, the slope is large and, moreover, it increases without
bound as x decreases toward zero. The second derivative is $L''(x) = -1/x^2$ which is
negative for all x, so L is a concave function.
6.5 Consequences of the functional equation L(ab) = L(a) + L(b)
Since the graph of the logarithm tends to level off as x increases indefinitely, it might
be suspected that the values of L have an upper bound. Actually, the function is unbounded
above; that is, for every positive number M (no matter how large) there exist values of x
such that
$$L(x) > M. \tag{6.8}$$
We can deduce this from the functional equation. When a = b, we get $L(a^2) = 2L(a)$.
Using the functional equation once more with $b = a^2$, we obtain $L(a^3) = 3L(a)$. By
induction we find the general formula
$$L(a^n) = nL(a)$$
for every integer $n \ge 1$. When a = 2, this becomes $L(2^n) = nL(2)$, and hence we have
$$L(2^n) > M \quad\text{when } n > \frac{M}{L(2)}. \tag{6.9}$$
This proves the assertion in (6.8). Taking b = 1/a in the functional equation, we find
L(1/a) = -L(a). In particular, when $a = 2^n$, where n is chosen as in (6.9), we have
$$L\left(\frac{1}{2^n}\right) = -L(2^n) < -M,$$
which shows that there is also no lower bound to the function values.
Finally we observe that the graph crosses every horizontal line exactly once. That is,
given an arbitrary real number b (positive, negative, or zero), there is one and only one
a > 0 such that
$$L(a) = b. \tag{6.10}$$
To prove this we can argue as follows: If b > 0, choose any integer n > b/L(2). Then
$L(2^n) > b$ because of (6.9). Now examine the function L on the closed interval $[1, 2^n]$.
Its value at the left endpoint is L(1) = 0, and its value at the right endpoint is $L(2^n)$.
Since $0 < b < L(2^n)$, the intermediate-value theorem for continuous functions (Theorem
3.8 in Section 3.10) guarantees the existence of at least one a such that L(a) = b. There
cannot be another value a' such that L(a') = b because this would mean L(a) = L(a')
for $a \ne a'$, thus contradicting the increasing property of the logarithm. Therefore the
assertion in (6.10) has been proved for b > 0. The proof for negative b follows from this
if we use the equation L(1/a) = -L(a), and for b = 0 the only solution is a = 1 because L is
strictly increasing and L(1) = 0. In other words, we have proved the following.
THEOREM 6.2. For every real number b there is exactly one positive real number a whose
logarithm, L(a), is equal to b.
In particular, there is exactly one number whose natural logarithm is equal to 1. This
number, like $\pi$, occurs repeatedly in so many mathematical formulas that it was inevitable
that a special symbol would be adopted for it. Leonhard Euler (1707-1783) seems to have
been the first to recognize the importance of this number, and he modestly denoted it
by e, a notation which soon became standard.
DEFINITION. We denote by e that number for which
$$L(e) = 1. \tag{6.11}$$
In Chapter 7 we shall obtain explicit formulas that enable us to calculate the decimal
expansion of e to any desired degree of accuracy. Its value, correct to ten decimal places,
is 2.7182818285. In Chapter 7 we also prove that e is irrational.
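The proof of Theorem 6.2 is constructive enough to be turned into a rough computation. The Python sketch below is only an illustration under stated assumptions: the names `simpson`, `L`, and `solve_L_equals`, the bracketing intervals, and the use of bisection together with Simpson's rule are choices made here, and this is not the method of Chapter 7. Since L is continuous and strictly increasing, bisection on an interval [lo, hi] with L(lo) ≤ b ≤ L(hi) closes in on the unique a with L(a) = b; taking b = 1 approximates e.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def L(x):
    """Approximate the natural logarithm as the integral of 1/t from 1 to x (x > 0)."""
    return simpson(lambda t: 1.0 / t, 1.0, x)


def solve_L_equals(b, lo, hi, tol=1e-10):
    """Bisection for the unique a in [lo, hi] with L(a) = b.

    Requires 0 < lo < hi and L(lo) <= b <= L(hi); relies on L being increasing.
    """
    if not (L(lo) <= b <= L(hi)):
        raise ValueError("b is not bracketed by L(lo) and L(hi)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if L(mid) < b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # L(2) < 1 < L(4) = 2 L(2), so e lies between 2 and 4.
    print(solve_L_equals(1.0, 2.0, 4.0))
```

The bracket [2, 4] uses the relation L(4) = 2L(2) from the functional equation; any other bracket with L(lo) ≤ 1 ≤ L(hi) would serve equally well.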
Natural logarithms are also called Napierian logarithms, in honor of their inventor,
John Napier (1550-1617). It is common practice to use the symbols ln x or log x instead
of L(x) to denote the logarithm of x.
6.6 Logarithms referred to any positive base $b \ne 1$
The work of Section 6.2 tells us that the most general f which is differentiable on the
positive real axis and which satisfies the functional equation f(xy) = f(x) + f(y) is given
by the formula
$$f(x) = c \log x, \tag{6.12}$$
where c is a constant. For each c, we could call this f(x) the logarithm of x associated with
c although, of course, its value would not be necessarily the same as the natural logarithm
of x. When c = 0, f is identically zero, so this case is uninteresting. If $c \ne 0$, we may
indicate in another way the dependence of f on c by introducing the concept of a base
for logarithms.
From (6.12) we see that when $c \ne 0$, there exists a unique real number b > 0 such that
f(b) = 1. This b is related to c by the equation c log b = 1; hence $b \ne 1$, c = 1/log b,
and (6.12) becomes
$$f(x) = \frac{\log x}{\log b}.$$
For this choice of c we say that f(x) is the logarithm of x to the base b and we write $\log_b x$
for f(x).
DEFINITION. If b > 0, $b \ne 1$, and if x > 0, the logarithm of x to the base b is the number
$$\log_b x = \frac{\log x}{\log b},$$
where the logarithms on the right are natural logarithms.
Note that $\log_b b = 1$. Also, when b = e, we have $\log_e x = \log x$, so natural logarithms
are those with base e. Since logarithms to base e are used so frequently in mathematics,
the word logarithm almost invariably means natural logarithm. Later, in Section 6.15,
we shall define $b^u$ in such a way that the equation $b^u = x$ will mean exactly the same as the
equation $u = \log_b x$.
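The definition of the logarithm to the base b translates directly into code. The following Python sketch is illustrative: the function name `log_base` and the sample values are assumptions, and `math.log` with one argument is used as the natural logarithm that the text writes as log x.

```python
import math


def log_base(x, b):
    """Logarithm of x to the base b, defined as log x / log b (natural logarithms)."""
    if x <= 0:
        raise ValueError("x must be positive")
    if b <= 0 or b == 1:
        raise ValueError("the base b must be positive and different from 1")
    return math.log(x) / math.log(b)


if __name__ == "__main__":
    print(log_base(1000.0, 10.0))   # close to 3, since 10^3 = 1000
    print(log_base(8.0, 0.5))       # close to -3: for b < 1 the factor 1/log b is negative
    print(log_base(5.0, math.e), math.log(5.0))   # base e gives the natural logarithm
```

The second sample shows the sign change discussed for bases b < 1, where 1/log b is negative.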
Since logarithms to the base b are obtained from natural logarithms by multiplying by
the constant 1/log b, the graph of the equation $y = \log_b x$ may be obtained from that of
the equation y = log x by simply multiplying all ordinates by the same factor. When
b > 1, this factor is positive, and, when b < 1, it is negative. Examples with b > 1 are
shown in Figure 6.3(a). When b < 1, we note that 1/b > 1 and log b = -log (1/b), so
the graph of $y = \log_b x$ may be obtained from that of $y = \log_{1/b} x$ by reflection through
the x-axis. Examples are shown in Figure 6.3(b).
FIGURE 6.3 The graph of $y = \log_b x$ for various values of b: (a) b > 1; (b) 0 < b < 1.
6.7 Differentiation and integration formulas involving logarithms
Since the derivative of the logarithm is given by the formula D log x = 1/x for x > 0,
we have the integration formula
$$\int \frac{1}{x}\,dx = \log x + C.$$
More generally, if u = f(x), where f has a continuous derivative, we have
$$\int \frac{du}{u} = \log u + C \quad\text{or}\quad \int \frac{f'(x)}{f(x)}\,dx = \log f(x) + C. \tag{6.13}$$
Some care must be exercised when using (6.13) because the logarithm is not defined for
negative numbers. Therefore, the integration formulas in (6.13) are valid only if u, or
f(x), is positive.
Fortunately it is easy to extend the range of validity of these formulas to accommodate
functions that are negative or positive (but nonzero). We simply introduce a new function
$L_0$, defined for all real $x \ne 0$ by the equation
$$L_0(x) = \log |x| = \int_1^{|x|} \frac{1}{t}\,dt, \tag{6.14}$$
a definition suggested by Equation (6.6) of Section 6.2. The graph of $L_0$ is symmetric
about the y-axis, as shown in Figure 6.4. The portion to the right of the y-axis is exactly
the same as the logarithmic curve of Figure 6.2.
Since $\log |xy| = \log (|x|\,|y|) = \log |x| + \log |y|$, the function $L_0$ also satisfies the basic
functional equation in (6.4). That is, we have
$$L_0(xy) = L_0(x) + L_0(y)$$
for all real x and y except 0. For x > 0, we have $L_0'(x) = 1/x$ since $L_0(x)$ is the same as
log x for positive x. This derivative formula also holds for x < 0 because, in this case,
$L_0(x) = L(-x)$, and hence $L_0'(x) = -L'(-x) = -1/(-x) = 1/x$. Therefore we have
$$L_0'(x) = \frac{1}{x} \quad\text{for all real } x \ne 0. \tag{6.15}$$
FIGURE 6.4 The graph of the function $L_0$.
Hence, if we use $L_0$ instead of L in the foregoing integration formulas, we can extend
their scope to include functions which assume negative values as well as positive values.
For example, (6.13) can be generalized as follows:
$$\int \frac{du}{u} = \log |u| + C, \qquad \int \frac{f'(x)}{f(x)}\,dx = \log |f(x)| + C. \tag{6.16}$$
Of course, when we use (6.16) along with the second fundamental theorem of calculus to
evaluate a definite integral, we must avoid intervals that include points where u or
f(x) might be zero.
EXAMPLE 1. Integrate $\int \tan x\,dx$.
Solution. The integral has the form $-\int du/u$, where u = cos x, du = -sin x dx. Therefore we have
$$\int \tan x\,dx = -\int \frac{du}{u} = -\log |u| + C = -\log |\cos x| + C,$$
a formula which is valid on any interval in which $\cos x \ne 0$.
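The role of the absolute value in Example 1 shows up clearly on an interval where cos x is negative. The Python sketch below is an illustration under stated assumptions: the names `simpson` and `check_tan_integral`, the sample interval [2, 3] (which lies between π/2 and 3π/2, so cos x < 0 and tan x is continuous there), and the use of Simpson's rule are choices made here. It compares a numerical value of the definite integral of tan x with the value given by -log |cos x|.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n subintervals (n is made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def check_tan_integral(a=2.0, b=3.0):
    """Return (numerical integral of tan on [a, b], value from -log|cos x|).

    Assumes cos x has no zero in [a, b].
    """
    numeric = simpson(math.tan, a, b)
    from_formula = -math.log(abs(math.cos(b))) + math.log(abs(math.cos(a)))
    return numeric, from_formula


if __name__ == "__main__":
    print(check_tan_integral())
    # Without the absolute value, math.log(math.cos(2.0)) would raise ValueError,
    # because cos 2 is negative.
```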
The next two examples illustrate the use of integration by parts.
EXAMPLE 2. Integrate $\int \log x\,dx$.
Solution. Let u = log x, dv = dx. Then du = dx/x, v = x, and we obtain
$$\int \log x\,dx = \int u\,dv = uv - \int v\,du = x \log x - \int x \cdot \frac{1}{x}\,dx = x \log x - x + C.$$
EXAMPLE 3. Integrate $\int \sin (\log x)\,dx$.
Solution. Let u = sin (log x), v = x. Then du = cos (log x)(1/x) dx, and we find
$$\int \sin (\log x)\,dx = \int u\,dv = uv - \int v\,du = x \sin (\log x) - \int \cos (\log x)\,dx.$$
In the last integral we use integration by parts once more to get
$$\int \cos (\log x)\,dx = x \cos (\log x) + \int \sin (\log x)\,dx.$$
Combining this with the foregoing equation, we find that
$$\int \sin (\log x)\,dx = \tfrac{1}{2}x \sin (\log x) - \tfrac{1}{2}x \cos (\log x) + C,$$
and
$$\int \cos (\log x)\,dx = \tfrac{1}{2}x \sin (\log x) + \tfrac{1}{2}x \cos (\log x) + C.$$
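One way to check the two antiderivatives obtained in Example 3 is to differentiate them numerically. The Python sketch below is illustrative only: the names `F`, `H`, `central_difference`, and `max_error`, the step size, and the sample points are assumptions. It compares central-difference derivatives of the stated antiderivatives (with C = 0) against sin (log x) and cos (log x).

```python
import math


def F(x):
    # Antiderivative of sin(log x) from Example 3, with C = 0 (x > 0).
    return 0.5 * x * math.sin(math.log(x)) - 0.5 * x * math.cos(math.log(x))


def H(x):
    # Antiderivative of cos(log x) from Example 3, with C = 0 (x > 0).
    return 0.5 * x * math.sin(math.log(x)) + 0.5 * x * math.cos(math.log(x))


def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


def max_error(points=(0.5, 1.0, 2.0, 10.0)):
    """Largest discrepancy between the numerical derivatives and the integrands."""
    worst = 0.0
    for x in points:
        worst = max(
            worst,
            abs(central_difference(F, x) - math.sin(math.log(x))),
            abs(central_difference(H, x) - math.cos(math.log(x))),
        )
    return worst


if __name__ == "__main__":
    print(max_error())  # should be very small
```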
6.8 Logarithmic differentiation
We shall describe now a technique known as logarithmic differentiation which is often
a great help in computing derivatives. The method was developed in 1697 by Johann
Bernoulli (1667-1748), and all it amounts to is a simple application of the chain rule.
Suppose we form the composition of $L_0$ with any differentiable function f; say we let
$$g(x) = L_0[f(x)] = \log |f(x)|$$
for those x such that $f(x) \ne 0$. The chain rule, used in conjunction with (6.15), yields the
formula
$$g'(x) = L_0'[f(x)] \cdot f'(x) = \frac{f'(x)}{f(x)}. \tag{6.17}$$
If the derivative g'(x) can be found in some other way, then we may use (6.17) to obtain
f'(x) by simply multiplying g'(x) by f(x). The process is useful in practice because in
many cases g'(x) is easier to compute than f'(x) itself. In particular, this is true when f is
a product or quotient of several simpler functions.
The following example is typical.
EXAMPLE. Compute f'(x) if $f(x) = x^2 \cos x\,(1 + x^4)^{-7}$.
Solution. We take the logarithm of the absolute value of f(x) and then we differentiate.
Let
$$g(x) = \log |f(x)| = \log x^2 + \log |\cos x| + \log (1 + x^4)^{-7} = 2 \log |x| + \log |\cos x| - 7 \log (1 + x^4).$$
Differentiation yields
$$g'(x) = \frac{f'(x)}{f(x)} = \frac{2}{x} - \frac{\sin x}{\cos x} - \frac{28x^3}{1 + x^4}.$$
Multiplying by f(x), we obtain
$$f'(x) = \frac{2x \cos x}{(1 + x^4)^7} - \frac{x^2 \sin x}{(1 + x^4)^7} - \frac{28x^5 \cos x}{(1 + x^4)^8}.$$
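The derivative found by logarithmic differentiation can be compared with a numerical derivative. The Python sketch below is an illustration, not part of the text: the function names `f`, `f_prime`, `f_prime_via_log`, and `central_difference`, the step size, and the sample points are assumptions. It evaluates the final formula for f'(x), the product f(x) g'(x) from (6.17), and a central-difference approximation.

```python
import math


def f(x):
    # The function of the example: x^2 cos x (1 + x^4)^(-7).
    return x**2 * math.cos(x) * (1 + x**4) ** -7


def f_prime(x):
    # The final formula of the example.
    d = 1 + x**4
    return (2 * x * math.cos(x) / d**7
            - x**2 * math.sin(x) / d**7
            - 28 * x**5 * math.cos(x) / d**8)


def f_prime_via_log(x):
    # f'(x) = f(x) g'(x) as in (6.17); valid only where f(x) != 0.
    g_prime = 2 / x - math.sin(x) / math.cos(x) - 28 * x**3 / (1 + x**4)
    return f(x) * g_prime


def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    for x in (0.3, 0.7, 1.2):
        print(x, f_prime(x), f_prime_via_log(x), central_difference(f, x))
```

The sample points avoid x = 0 and the zeros of cos x, where g(x) = log |f(x)| is not defined even though the final formula for f'(x) still makes sense.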
6.9 Exercises
1. (a) Find all c such that $\log x = c + \int_e^x t^{-1}\,dt$ for all x > 0.
(b) Let f(x) = log [(1 + x)/(1 - x)] if x > 0. If a and b are given numbers, with $ab \ne -1$,
find all x such that f(x) = f(a) + f(b).
2. In each case, find a real x satisfying the given equation.
(a) log (1 + x) = log (1 - x).
(b) log (1 + x) = 1 + log (1 - x).
(c) 2 log x = x log 2, $x \ne 2$.
(d) $\log (\sqrt{x} + \sqrt{x + 1}) = 1$.
3. Let f(x) = (log x)/x if x > 0. Describe the intervals in which f is increasing, decreasing,
convex, and concave. Sketch the graph of f.
In Exercises 4 through 15, find the derivative f'(x). In each case, the function f is assumed to be
defined for all real x for which the given formula for f(x) is meaningful.
4. $f(x) = \log (1 + x^2)$.
5. $f(x) = \log \sqrt{1 + x^2}$.
6. $f(x) = \log \sqrt{4 - x^2}$.
7. f(x) = log (log x).
8. $f(x) = \log (x^2 \log x)$.
9. $f(x) = \tfrac{1}{4} \log \frac{x^2 - 1}{x^2 + 1}$.
10. $f(x) = (x + \sqrt{1 + x^2})^n$.
11. $f(x) = \sqrt{x + 1} - \log (1 + \sqrt{x + 1})$.
12. $f(x) = x \log (x + \sqrt{1 + x^2}) - \sqrt{1 + x^2}$.
14. f(x) = x[sin (log x) - cos (log x)].
15. $f(x) = \log_x e$.
In Exercises 16 through 26, evaluate the integrals.
17. $\int \log^2 x\,dx$.
18. $\int x \log x\,dx$.
19. $\int x \log^2 x\,dx$.
21. $\int \cot x\,dx$.
22. $\int x^n \log (ax)\,dx$.
23. $\int x^2 \log^2 x\,dx$.
24. $\int \frac{dx}{x \log x}$.
25. $\int_0^{1 - e^{-2}} \frac{\log (1 - t)}{1 - t}\,dt$.
26. $\int \frac{\log x}{x\sqrt{1 + \log x}}\,dx$.
27. Derive the recursion formula
$$\int x^m \log^n x\,dx = \frac{x^{m+1} \log^n x}{m + 1} - \frac{n}{m + 1} \int x^m \log^{n-1} x\,dx$$
and use it to integrate $\int x^3 \log^3 x\,dx$.
28. (a) If $x > 0$, let $f(x) = x - 1 - \log x$, $g(x) = \log x - 1 + 1/x$. Examine the signs of $f'$ and $g'$ to prove that the inequalities
$$1 - \frac{1}{x} < \log x < x - 1$$
are valid for $x > 0$, $x \ne 1$. When $x = 1$, they become equalities.
(b) Sketch graphs of the functions $A$ and $B$ defined by the equations $A(x) = x - 1$ and $B(x) = 1 - 1/x$ for $x > 0$, and interpret geometrically the inequalities in part (a).
29. Prove the limit relation
$$\lim_{x \to 0} \frac{\log(1+x)}{x} = 1$$
by the following two methods: (a) using the definition of the derivative $L'(1)$; (b) using the result of Exercise 28.
30. If $a > 0$, use the functional equation for the logarithm to prove that $\log(a^r) = r\log a$ for every rational number $r$.
31. Let $P = \{a_0, a_1, a_2, \ldots, a_n\}$ be any partition of the interval $[1, x]$, where $x > 1$.
(a) Integrate suitable step functions that are constant on the open subintervals of $P$ to derive the following inequalities:
$$\sum_{k=1}^{n} \frac{a_k - a_{k-1}}{a_k} < \log x < \sum_{k=1}^{n} \frac{a_k - a_{k-1}}{a_{k-1}}.$$
(b) Interpret the inequalities of part (a) geometrically in terms of areas.
(c) Specialize the partition to show that for every integer $n > 1$,
$$\sum_{k=2}^{n} \frac{1}{k} < \log n < \sum_{k=1}^{n-1} \frac{1}{k}.$$
32. Prove the following formulas for changing from one logarithmic base to another:
(a) $\log_b x = \log_b a \, \log_a x$;
(b) $\log_b x = \dfrac{\log_a x}{\log_a b}$.
33. Given that $\log_e 10 = 2.302585$, correct to six decimal places, compute $\log_{10} e$ using one of the formulas in Exercise 32. How many correct decimal places can you be certain of in the result of your calculation? Note: A table, correct to six decimal places, gives $\log_{10} e = 0.434294$.
238
The logarithm, the exponential, and the inverse trigonometric functions
34. A function $f$, continuous on the positive real axis, has the property that for all choices of $x > 0$ and $y > 0$, the integral
$$\int_x^{xy} f(t)\,dt$$
is independent of $x$ (and therefore depends only on $y$). If $f(2) = 2$, compute the value of the integral $A(x) = \int_1^x f(t)\,dt$ for all $x > 0$.
35. A function $f$, continuous on the positive real axis, has the property that
$$\int_1^{xy} f(t)\,dt = y\int_1^{x} f(t)\,dt + x\int_1^{y} f(t)\,dt$$
for all $x > 0$ and all $y > 0$. If $f(1) = 3$, compute $f(x)$ for each $x > 0$.
36. The base of a solid is the ordinate set of a function $f$ which is continuous over the interval $[1, a]$. All cross sections perpendicular to the interval $[1, a]$ are squares. The volume of the solid is $\frac{1}{3}a^3\log^2 a - \frac{2}{9}a^3\log a + \frac{2}{27}a^3 - \frac{2}{27}$ for every $a \ge 1$. Compute $f(a)$.
6.10 Polynomial approximations to the logarithm
In this section we will show that the logarithm function can be approximated by certain polynomials which can be used to compute logarithms to any desired degree of accuracy.
To simplify the resulting formulas, we first replace $x$ by $1 - x$ in the integral defining the logarithm to obtain
$$\log(1-x) = \int_1^{1-x} \frac{dt}{t},$$
which is valid if $x < 1$. The change of variable $t = 1 - u$ converts this to the form
$$-\log(1-x) = \int_0^x \frac{du}{1-u},$$
valid for $x < 1$.
Now we approximate the integrand $1/(1-u)$ by polynomials which we then integrate to obtain corresponding approximations for the logarithm. To illustrate the method, we begin with a simple linear approximation to the integrand.
From the algebraic identity $1 - u^2 = (1-u)(1+u)$, we obtain the formula
$$\frac{1}{1-u} = 1 + u + \frac{u^2}{1-u}, \tag{6.18}$$
valid for any real $u \ne 1$. Integrating this from 0 to $x$, where $x < 1$, we have
$$-\log(1-x) = x + \frac{x^2}{2} + \int_0^x \frac{u^2}{1-u}\,du. \tag{6.19}$$
The graph of the quadratic polynomial $P(x) = x + \frac{1}{2}x^2$ which appears on the right of (6.19) is shown in Figure 6.5 along with the curve $y = -\log(1-x)$. Note that for $x$ near zero the polynomial $P(x)$ is a good approximation to $-\log(1-x)$. In the next theorem we use a polynomial of degree $n - 1$ to approximate $1/(1-u)$, and thereby obtain a polynomial of degree $n$ which approximates $\log(1-x)$.
FIGURE 6.5 A quadratic polynomial approximation to the curve $y = -\log(1-x)$.
THEOREM 6.3. Let $P_n$ denote the polynomial of degree $n$ given by
$$P_n(x) = x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots + \frac{x^n}{n} = \sum_{k=1}^{n} \frac{x^k}{k}.$$
Then, for every $x < 1$ and every $n \ge 1$, we have
$$-\log(1-x) = P_n(x) + \int_0^x \frac{u^n}{1-u}\,du. \tag{6.20}$$
Proof. From the algebraic identity
$$1 - u^n = (1-u)(1 + u + u^2 + \cdots + u^{n-1}),$$
we obtain the formula
$$\frac{1}{1-u} = 1 + u + u^2 + \cdots + u^{n-1} + \frac{u^n}{1-u},$$
which is valid for $u \ne 1$. Integrating this from 0 to $x$, where $x < 1$, we obtain (6.20).
We can rewrite (6.20) in the form
$$-\log(1-x) = P_n(x) + E_n(x), \tag{6.21}$$
where $E_n(x)$ is the integral
$$E_n(x) = \int_0^x \frac{u^n}{1-u}\,du.$$
The quantity $E_n(x)$ represents the error made when we approximate $-\log(1-x)$ by the polynomial $P_n(x)$. To use (6.21) in computations, we need to know whether the error is positive or negative and how large it can be. The next theorem tells us that for small positive $x$ the error $E_n(x)$ is positive, but for negative $x$ the error has the same sign as $(-1)^{n+1}$, where $n$ is the degree of the approximating polynomial. The theorem also gives useful upper and lower bounds for the error.
THEOREM 6.4. If $0 < x < 1$, we have the inequalities
$$\frac{x^{n+1}}{n+1} < E_n(x) < \frac{1}{1-x}\,\frac{x^{n+1}}{n+1}. \tag{6.22}$$
If $x < 0$, the error $E_n(x)$ has the same sign as $(-1)^{n+1}$, and we have
$$0 < (-1)^{n+1}E_n(x) \le \frac{|x|^{n+1}}{n+1}. \tag{6.23}$$
Proof. Assume that $0 < x < 1$. In the integral defining $E_n(x)$ we have $0 \le u \le x$, so $1 - x \le 1 - u \le 1$, and hence the integrand satisfies the inequalities
$$u^n \le \frac{u^n}{1-u} \le \frac{u^n}{1-x},$$
with strict inequality for $0 < u < x$. Integrating these inequalities, we obtain (6.22).
To prove (6.23), assume $x < 0$ and let $t = -x = |x|$. Then $t > 0$ and we have
$$E_n(x) = E_n(-t) = \int_0^{-t} \frac{u^n}{1-u}\,du = -\int_0^{t} \frac{(-v)^n}{1+v}\,dv = (-1)^{n+1}\int_0^{t} \frac{v^n}{1+v}\,dv.$$
This shows that $E_n(x)$ has the same sign as $(-1)^{n+1}$. Also, we have
$$(-1)^{n+1}E_n(x) = \int_0^{t} \frac{v^n}{1+v}\,dv < \int_0^{t} v^n\,dv = \frac{t^{n+1}}{n+1} = \frac{|x|^{n+1}}{n+1},$$
which completes the proof of (6.23).
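The error bounds (6.22) and (6.23) can be checked numerically. The sketch below is only an illustration: it computes $E_n(x)$ as the difference $-\log(1-x) - P_n(x)$ rather than as an integral, and the function names `P` and `E` are our own labels.

```python
import math

def P(n, x):
    """P_n(x) = x + x^2/2 + ... + x^n/n."""
    return sum(x**k / k for k in range(1, n + 1))

def E(n, x):
    """Error E_n(x) = -log(1 - x) - P_n(x), defined for x < 1."""
    return -math.log1p(-x) - P(n, x)

# Case 0 < x < 1: inequality (6.22).
n, x = 5, 0.5
lower = x ** (n + 1) / (n + 1)
upper = lower / (1 - x)
print(lower, E(n, x), upper)

# Case x < 0: inequality (6.23).
n, x = 4, -0.5
print((-1) ** (n + 1) * E(n, x), abs(x) ** (n + 1) / (n + 1))
```

For very small $|x|$ the subtraction loses floating-point accuracy, so this check is meaningful only for moderate values of $x$.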
The next theorem gives a formula which is admirably suited for computations of logarithms.
THEOREM 6.5. If $0 < x < 1$ and if $m \ge 1$, we have
$$\log\frac{1+x}{1-x} = 2\left(x + \frac{x^3}{3} + \cdots + \frac{x^{2m-1}}{2m-1}\right) + R_m(x),$$
where the error term, $R_m(x)$, satisfies the inequalities
$$\frac{x^{2m+1}}{2m+1} < R_m(x) \le \frac{2-x}{1-x}\,\frac{x^{2m+1}}{2m+1}. \tag{6.24}$$
Proof. Equation (6.21) is valid for any real $x < 1$. If we replace $x$ by $-x$ in (6.21), keeping $x > -1$, we obtain the formula
$$-\log(1+x) = P_n(-x) + E_n(-x). \tag{6.25}$$
If $-1 < x < 1$, both (6.21) and (6.25) are valid. Subtracting (6.21) from (6.25), we find
$$\log\frac{1+x}{1-x} = P_n(x) - P_n(-x) + E_n(x) - E_n(-x). \tag{6.26}$$
In the difference $P_n(x) - P_n(-x)$, the even powers of $x$ cancel and the odd powers double up. Therefore, if $n$ is even, say $n = 2m$, we have
$$P_{2m}(x) - P_{2m}(-x) = 2\left(x + \frac{x^3}{3} + \cdots + \frac{x^{2m-1}}{2m-1}\right),$$
and Equation (6.26) becomes
$$\log\frac{1+x}{1-x} = 2\left(x + \frac{x^3}{3} + \cdots + \frac{x^{2m-1}}{2m-1}\right) + R_m(x),$$
where $R_m(x) = E_{2m}(x) - E_{2m}(-x)$. This formula is valid if $x$ lies in the open interval $-1 < x < 1$. Now we restrict $x$ to the interval $0 < x < 1$. Then the estimates of Theorem 6.4 give us
$$\frac{x^{2m+1}}{2m+1} < E_{2m}(x) < \frac{1}{1-x}\,\frac{x^{2m+1}}{2m+1} \qquad\text{and}\qquad 0 < -E_{2m}(-x) \le \frac{x^{2m+1}}{2m+1}.$$
Adding these, we obtain the inequalities in (6.24), since $1 + 1/(1-x) = (2-x)/(1-x)$.
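EXAMPLE. Taking $m = 2$ and $x = \frac{1}{3}$, we have $(1+x)/(1-x) = 2$, and we obtain the formula
$$\log 2 = 2\left(\frac{1}{3} + \frac{1}{81}\right) + R_2\!\left(\tfrac{1}{3}\right),$$
where, by (6.24),
$$\frac{1}{5}\left(\frac{1}{3}\right)^5 < R_2\!\left(\tfrac{1}{3}\right) \le \frac{5}{2}\cdot\frac{1}{5}\left(\frac{1}{3}\right)^5 = \frac{1}{2}\left(\frac{1}{3}\right)^5.$$
This gives us the inequalities $0.6921 < \log 2 < 0.6935$ with very little calculation.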
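The computation in this example can be reproduced with exact rational arithmetic. The following is a minimal sketch, assuming only the statement of Theorem 6.5; the function name `log_ratio_bounds` is hypothetical.

```python
import math
from fractions import Fraction

def log_ratio_bounds(x, m):
    """Lower and upper bounds for log((1+x)/(1-x)) from Theorem 6.5.

    Requires 0 < x < 1 and m >= 1. Exact rational arithmetic is used so
    that rounding does not blur the inequalities.
    """
    x = Fraction(x)
    partial = 2 * sum(x ** (2 * k - 1) / (2 * k - 1) for k in range(1, m + 1))
    term = x ** (2 * m + 1) / (2 * m + 1)
    lower = partial + term                        # left side of (6.24)
    upper = partial + (2 - x) / (1 - x) * term    # right side of (6.24)
    return lower, upper

lower, upper = log_ratio_bounds(Fraction(1, 3), 2)
print(float(lower), math.log(2), float(upper))
```

Increasing `m`, or choosing a smaller `x` together with a known logarithm, narrows the interval; this is the idea behind the exercises that follow.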
6.11 Exercises
1. Use Theorem 6.5 with $x = \frac{1}{3}$ and $m = 5$ to calculate approximations to $\log 2$. Retain nine decimals in your calculations and obtain the inequalities $0.6931460 < \log 2 < 0.6931476$.
2. If $x = \frac{1}{5}$, then $(1+x)/(1-x) = \frac{3}{2}$. Thus, Theorem 6.5 enables us to compute $\log 3$ in terms of $\log 2$. Take $x = \frac{1}{5}$ and $m = 5$ in Theorem 6.5 and use the results of Exercise 1 to obtain the inequalities $1.098611 < \log 3 < 1.098617$.
Note: Since $\log 2 < \log e < \log 3$, it follows that $2 < e < 3$.
3. Use Theorem 6.5 with $x = \frac{1}{9}$ to calculate $\log 5$ in terms of $\log 2$. Choose the degree of the approximating polynomial high enough to obtain the inequalities $1.609435 < \log 5 < 1.609438$.
4. Use Theorem 6.5 with $x = \frac{1}{6}$ to calculate $\log 7$ in terms of $\log 5$. Choose the degree of the approximating polynomial high enough to obtain the inequalities $1.945907 < \log 7 < 1.945911$.
5. Use the results of Exercises 1 through 4 to calculate a short table listing $\log n$ for $n = 2, 3, \ldots, 10$. Tabulate each entry with as many correct decimal places as you can be certain of from the inequalities in Exercises 1 through 4.
6.12 The exponential function
Theorem 6.2 shows that for every real $x$ there is one and only one $y$ such that $L(y) = x$. Therefore we can use the process of inversion to define $y$ as a function of $x$. The resulting inverse function is called the exponential function, or the antilogarithm, and is denoted by $E$.
DEFINITION. For any real $x$, we define $E(x)$ to be that number $y$ whose logarithm is $x$. That is, $y = E(x)$ means that $L(y) = x$.
The domain of $E$ is the entire real axis; its range is the set of positive real numbers. The graph of $E$, which is shown in Figure 6.6, is obtained from the graph of the logarithm by reflection through the line $y = x$.
FIGURE 6.6 The graph of the exponential function is obtained from that of the logarithm by reflection through the line $y = x$.
Since $L$ and $E$ are inverses of each other, we have
$$L[E(x)] = x \quad\text{for all } x \qquad\text{and}\qquad E[L(y)] = y \quad\text{for all } y > 0.$$
Each property of the logarithm can be translated into a property of the exponential. For example, since the logarithm is strictly increasing and continuous on the positive real axis, it follows from Theorem 3.10 that the exponential is strictly increasing and continuous on the entire real axis. The counterpart of Theorem 6.1 is given by the following theorem.
THEOREM 6.6. The exponential function has the following properties:
(a) $E(0) = 1$, $E(1) = e$.
(b) $E'(x) = E(x)$ for every $x$.
(c) $E(a + b) = E(a)E(b)$ for all $a$ and $b$.
Proof. Part (a) follows from the equations $L(1) = 0$ and $L(e) = 1$. Next we prove (c), the functional equation for the exponential. Assume that $a$ and $b$ are given and let
$$x = E(a), \qquad y = E(b), \qquad c = L(xy).$$
Then we have
$$L(x) = a, \qquad L(y) = b, \qquad E(c) = xy.$$
But $c = L(xy) = L(x) + L(y) = a + b$. That is, $c = a + b$. Hence, $E(c) = E(a + b)$. On the other hand, $E(c) = xy = E(a)E(b)$, so $E(a + b) = E(a)E(b)$, which proves (c).
Now we use the functional equation to help us prove (b). The difference quotient for the derivative $E'(x)$ is
$$\frac{E(x+h) - E(x)}{h} = \frac{E(x)E(h) - E(x)}{h} = E(x)\,\frac{E(h) - 1}{h}.$$
Therefore, to prove (b) we must show that
$$\lim_{h \to 0} \frac{E(h) - 1}{h} = 1. \tag{6.27}$$
We shall express the quotient in (6.27) in terms of the logarithm. Let $k = E(h) - 1$. Then $k + 1 = E(h)$ so $L(k + 1) = h$ and the quotient is equal to
$$\frac{E(h) - 1}{h} = \frac{k}{L(k+1)}. \tag{6.28}$$
Now as $h \to 0$, $E(h) \to 1$ because the exponential function is continuous at 0 and $E(0) = 1$. Since $k = E(h) - 1$, we have $k \to 0$ as $h \to 0$. But
$$\frac{L(k+1)}{k} = \frac{L(k+1) - L(1)}{k} \to L'(1) = 1 \quad\text{as } k \to 0.$$
In view of (6.28), this proves (6.27) which, in turn, proves (b).
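The limit (6.27) and the substitution (6.28) can be observed numerically. In this small sketch the library functions `math.exp` and `math.log` stand in for $E$ and $L$; that identification is an assumption made for illustration, not part of the proof.

```python
import math

def quotient(h):
    """(E(h) - 1) / h, the quotient in (6.27)."""
    return (math.exp(h) - 1) / h

def quotient_via_log(h):
    """k / L(k + 1) with k = E(h) - 1, the right side of (6.28)."""
    k = math.exp(h) - 1
    return k / math.log(k + 1)

for h in (0.1, 0.01, 0.001, -0.001):
    print(h, quotient(h), quotient_via_log(h))
```

Both columns approach 1 as $h \to 0$, from above for positive $h$ and from below for negative $h$.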
6.13 Exponentials expressed as powers of e
The functional equation $E(a + b) = E(a)E(b)$ has many interesting consequences. For example, we can use it to prove that
$$E(r) = e^r \tag{6.29}$$
for every rational number $r$.
First we take $b = -a$ in the functional equation to get
$$E(a)E(-a) = E(0) = 1,$$
and hence $E(-a) = 1/E(a)$ for every real $a$. Taking $b = a$, $b = 2a$, $\ldots$, $b = na$ in the functional equation we obtain, successively, $E(2a) = E(a)^2$, $E(3a) = E(a)^3$, and, in general, we have
$$E(na) = E(a)^n \tag{6.30}$$
for every positive integer $n$. In particular, when $a = 1$, we obtain
$$E(n) = e^n,$$
whereas for $a = 1/n$, we obtain $E(1) = E(1/n)^n$. Since $E(1/n) > 0$, this implies
$$E\!\left(\frac{1}{n}\right) = e^{1/n}. \tag{6.31}$$
Therefore, if we put $a = 1/m$ in (6.30) and use (6.31), we find
$$E\!\left(\frac{n}{m}\right) = E\!\left(\frac{1}{m}\right)^{n} = e^{n/m}$$
for all positive integers $m$ and $n$. In other words, we have proved (6.29) for every positive rational number $r$. Since $E(-r) = 1/E(r) = e^{-r}$, it also holds for all negative rational $r$.
6.14 The definition of $e^x$ for arbitrary real x
In the foregoing section we proved that $e^x = E(x)$ when $x$ is any rational number. Now we shall define $e^x$ for irrational $x$ by writing
$$e^x = E(x) \quad\text{for every real } x. \tag{6.32}$$
One justification for this definition is that we can use it to prove that the law of exponents
$$e^a e^b = e^{a+b} \tag{6.33}$$
is valid for all real exponents $a$ and $b$. When we use the definition in (6.32), the proof of (6.33) is a triviality because (6.33) is nothing but a restatement of the functional equation.
The notation $e^x$ for $E(x)$ is the one that is commonly used for the exponential. Occasionally $\exp(x)$ is written instead of $e^x$, especially when complicated formulas appear in the exponent. We shall continue to use $E(x)$ from time to time in this chapter, but later we shall switch to $e^x$.
We have defined the exponential function so that the two equations
$$y = e^x \qquad\text{and}\qquad x = \log y$$
mean exactly the same thing. In the next section we shall define more general powers so that the two equations $y = a^x$ and $x = \log_a y$ will be equivalent.
6.15 The definition of $a^x$ for a > 0 and x real
Now that we have defined $e^x$ for arbitrary real $x$, there is absolutely no difficulty in formulating a definition of $a^x$ for every $a > 0$. One way to proceed is to let $a^x$ denote that number $y$ such that $\log_a y = x$. But this does not work for $a = 1$, since logarithms to the base 1 have not been defined. Another way is to define $a^x$ by the formula
$$a^x = e^{x \log a}. \tag{6.34}$$
The second method is preferable because, first of all, it is meaningful for all positive $a$ (including $a = 1$) and, secondly, it makes it easy to prove the following properties of exponentials:
$\log a^x = x \log a$.
$(ab)^x = a^x b^x$.
$a^x a^y = a^{x+y}$.
$(a^x)^y = (a^y)^x = a^{xy}$.
If $a \ne 1$, then $y = a^x$ if and only if $x = \log_a y$.
The proofs of these properties are left as exercises for the reader.
Just as the graph of the exponential function was obtained from that of the logarithm by reflection through the line $y = x$, so the graph of $y = a^x$ can be obtained from that of $y = \log_a x$ by reflection through the same line; examples are shown in Figure 6.7. The curves in Figure 6.7 were obtained by reflection of those in Figure 6.3. The graph corresponding to $a = 1$ is, of course, the horizontal line $y = 1$.
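Definition (6.34) translates directly into a few lines of code. The sketch below is illustrative only: `power` is a hypothetical helper, and `math.exp` and `math.log` play the roles of the exponential and the natural logarithm.

```python
import math

def power(a, x):
    """a^x defined as exp(x * log a), meaningful for every a > 0."""
    if a <= 0:
        raise ValueError("the base a must be positive")
    return math.exp(x * math.log(a))

a, b, x, y = 2.0, 3.0, 1.3, 2.5
print(power(a * b, x), power(a, x) * power(b, x))    # (ab)^x = a^x b^x
print(power(a, x) * power(a, y), power(a, x + y))    # a^x a^y = a^(x+y)
print(power(power(a, x), y), power(a, x * y))        # (a^x)^y = a^(xy)
print(power(1.0, 3.7))                               # base 1 gives the constant 1
```

Each pair of printed values agrees up to rounding error, which mirrors the properties listed above without replacing their proofs.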
6.16 Differentiation and integration formulas involving exponentials
One of the most remarkable properties of the exponential function is the formula
$$E'(x) = E(x), \tag{6.35}$$
which tells us that this function is its own derivative. If we use this along with the chain rule, we can obtain differentiation formulas for exponential functions with any positive base $a$.
Suppose $f(x) = a^x$, where $a > 0$. By the definition of $a^x$, we may write
$$f(x) = e^{x \log a} = E(x \log a);$$
hence, by the chain rule, we find
$$f'(x) = E'(x \log a) \cdot \log a = E(x \log a) \cdot \log a = a^x \log a. \tag{6.36}$$
In other words, differentiation of $a^x$ simply multiplies $a^x$ by the constant factor $\log a$, this factor being 1 when $a = e$.
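Formula (6.36) is easy to test against a difference quotient. This is a rough numerical sketch; the names `exp_base`, `exp_base_derivative`, and `central_difference` are ours.

```python
import math

def exp_base(a, x):
    """a^x computed as exp(x log a), for a > 0."""
    return math.exp(x * math.log(a))

def exp_base_derivative(a, x):
    """Derivative of a^x according to (6.36): a^x log a."""
    return exp_base(a, x) * math.log(a)

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.3
for a in (0.5, 2.0, math.e, 10.0):
    numeric = central_difference(lambda t: exp_base(a, t), x)
    print(a, exp_base_derivative(a, x), numeric)
```

For $0 < a < 1$ the factor $\log a$ is negative, so the derivative is negative, in agreement with the decreasing graphs of $y = a^x$ for such bases.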
FIGURE 6.7 The graph of $y = a^x$ for various values of $a$: (a) $a > 1$; (b) $0 < a < 1$.
Of course, these differentiation formulas automatically lead to corresponding integration formulas. For example, (6.35) yields the result
$$\int e^x\,dx = e^x + C, \tag{6.37}$$
whereas (6.36) gives us the more general formula
$$\int a^x\,dx = \frac{a^x}{\log a} + C \qquad (a > 0,\ a \ne 1). \tag{6.38}$$
These may be generalized further by the method of substitution. We simply replace $x$ everywhere in (6.37) and (6.38) by $u$ to obtain
$$\int e^u\,du = e^u + C, \qquad \int a^u\,du = \frac{a^u}{\log a} + C \qquad (a > 0,\ a \ne 1), \tag{6.39}$$
where $u$ now represents any function with a continuous derivative. If we write $u = f(x)$, and $du = f'(x)\,dx$, the formulas in (6.39) become
$$\int e^{f(x)} f'(x)\,dx = e^{f(x)} + C, \qquad \int a^{f(x)} f'(x)\,dx = \frac{a^{f(x)}}{\log a} + C,$$
the second of these being valid for $a > 0$, $a \ne 1$.
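EXAMPLE 1. Integrate $\int x^2 e^{x^3}\,dx$.
Solution. Let $u = x^3$. Then $du = 3x^2\,dx$, and we obtain
$$\int x^2 e^{x^3}\,dx = \frac{1}{3}\int e^{x^3}(3x^2\,dx) = \frac{1}{3}\int e^u\,du = \frac{1}{3}e^u + C = \frac{1}{3}e^{x^3} + C.$$
EXAMPLE 2. Integrate $\int \dfrac{2^{\sqrt{x}}}{\sqrt{x}}\,dx$.
Solution. Let $u = \sqrt{x} = x^{1/2}$. Then $du = \frac{1}{2}x^{-1/2}\,dx = \frac{1}{2}\,dx/\sqrt{x}$. Hence we have
$$\int \frac{2^{\sqrt{x}}}{\sqrt{x}}\,dx = 2\int 2^{\sqrt{x}}\left(\frac{1}{2}\frac{dx}{\sqrt{x}}\right) = 2\int 2^u\,du = 2\,\frac{2^u}{\log 2} + C = \frac{2^{1+\sqrt{x}}}{\log 2} + C.$$
EXAMPLE 3. Integrate $\int \cos x\, e^{2\sin x}\,dx$.
Solution. Let $u = 2\sin x$. Then $du = 2\cos x\,dx$, and hence we obtain
$$\int \cos x\, e^{2\sin x}\,dx = \frac{1}{2}\int e^{2\sin x}(2\cos x\,dx) = \frac{1}{2}\int e^u\,du = \frac{1}{2}e^u + C = \frac{1}{2}e^{2\sin x} + C.$$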
EXAMPLE 4. Integrate $\int e^x \sin x\,dx$.
Solution. Let $u = e^x$, $dv = \sin x\,dx$. Then $du = e^x\,dx$, $v = -\cos x$, and we find
$$\int e^x \sin x\,dx = \int u\,dv = uv - \int v\,du = -e^x\cos x + \int e^x \cos x\,dx + C. \tag{6.40}$$
The integral $\int e^x \cos x\,dx$ is treated in the same way. We let $u = e^x$, $dv = \cos x\,dx$, $du = e^x\,dx$, $v = \sin x$, and we obtain
$$\int e^x \cos x\,dx = e^x \sin x - \int e^x \sin x\,dx + C. \tag{6.41}$$
Substituting this in (6.40), we may solve for $\int e^x \sin x\,dx$ and consolidate the arbitrary constants to obtain
$$\int e^x \sin x\,dx = \frac{e^x}{2}(\sin x - \cos x) + C.$$
Notice that we can use this in (6.41) to obtain also
$$\int e^x \cos x\,dx = \frac{e^x}{2}(\cos x + \sin x) + C.$$
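One way to gain confidence in the two results of Example 4 is to differentiate the proposed antiderivatives numerically and compare with the integrands. The following sketch does this; the names `F_sin`, `F_cos`, and `central_difference` are illustrative.

```python
import math

def F_sin(x):
    """Proposed antiderivative of e^x sin x: (e^x / 2)(sin x - cos x)."""
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))

def F_cos(x):
    """Proposed antiderivative of e^x cos x: (e^x / 2)(cos x + sin x)."""
    return 0.5 * math.exp(x) * (math.cos(x) + math.sin(x))

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-1.0, 0.5, 2.0):
    print(central_difference(F_sin, x), math.exp(x) * math.sin(x))
    print(central_difference(F_cos, x), math.exp(x) * math.cos(x))
```

A numerical agreement of this kind does not prove the formulas, but it catches sign errors of the sort that integration by parts tends to produce.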
EXAMPLE 5. Integrate $\int \dfrac{dx}{1 + e^x}$.
Solution. One way to treat this example is to rewrite the integrand as follows:
$$\frac{1}{1 + e^x} = \frac{e^{-x}}{e^{-x} + 1}.$$
Now put $u = e^{-x} + 1$. Then $du = -e^{-x}\,dx$, and we get
$$\int \frac{e^{-x}\,dx}{e^{-x} + 1} = -\int \frac{-e^{-x}\,dx}{e^{-x} + 1} = -\int \frac{du}{u} = -\log|u| + C = -\log(1 + e^{-x}) + C.$$
The result can be written in other ways if we manipulate the logarithm. For instance,
$$-\log(1 + e^{-x}) = \log\frac{1}{1 + e^{-x}} = \log\frac{e^x}{e^x + 1} = \log(e^x) - \log(e^x + 1) = x - \log(1 + e^x).$$
Another way to treat this same example is to write
$$\frac{1}{1 + e^x} = 1 - \frac{e^x}{1 + e^x}.$$
Then we have
$$\int \frac{dx}{1 + e^x} = x - \int \frac{e^x}{1 + e^x}\,dx = x - \int \frac{du}{u},$$
where $u = 1 + e^x$. Thus we find
$$\int \frac{dx}{1 + e^x} = x - \log(1 + e^x) + C,$$
which is one of the forms obtained above.
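The two forms of the answer in Example 5, $-\log(1 + e^{-x})$ and $x - \log(1 + e^x)$, can be compared numerically, and either one can be differentiated to recover the integrand. A minimal sketch, with hypothetical function names:

```python
import math

def form_one(x):
    """First form of the antiderivative: -log(1 + e^(-x))."""
    return -math.log(1 + math.exp(-x))

def form_two(x):
    """Second form of the antiderivative: x - log(1 + e^x)."""
    return x - math.log(1 + math.exp(x))

def integrand(x):
    return 1 / (1 + math.exp(x))

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-3.0, 0.0, 2.5):
    print(form_one(x), form_two(x), central_difference(form_one, x), integrand(x))
```

Here the two forms coincide exactly as functions, so no additive constant separates them; in general two correct antiderivatives may differ by a constant.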
6.17 Exercises
In Exercises 1 through 12, find the derivative $f'(x)$. In each case the function $f$ is assumed to be defined for all real $x$ for which the given formula for $f(x)$ is meaningful.
1. f(x) = e3z-1.
7. f(x) = 2”’ [which means 2(z2)].
2. f(x) = e4*‘.
8. f(x) = esin z.
3. f(x) = eë5*.
9. f(x) = ecosz %.
10.
f(x) = elOaz.
5. ;y; f $
. x
11. f(x) = eex [which means e(e5)].
6. f(x) = 2”.
12. f(x) = eeez [which means exp (e(e”))].
Evaluate the indefinite integrals in Exercises 13 through 18.
13. $\int x e^x\,dx$.
14. $\int x e^{-x}\,dx$.
16. $\int x^2 e^{-2x}\,dx$.
17. $\int e^{\sqrt{x}}\,dx$.
18. $\int x^3 e^{-x^2}\,dx$.
19. Determine all constants $a$ and $b$ such that $e^x = b + \int_a^x e^t\,dt$.
20. Let $A = \int e^{ax}\cos bx\,dx$ and $B = \int e^{ax}\sin bx\,dx$, where $a$ and $b$ are constants, not both zero. Use integration by parts to show that
$$aA - bB = e^{ax}\cos bx + C_1, \qquad aB + bA = e^{ax}\sin bx + C_2,$$
where $C_1$ and $C_2$ are arbitrary constants. Solve for $A$ and $B$ to deduce the following integration formulas:
$$\int e^{ax}\cos bx\,dx = \frac{e^{ax}(a\cos bx + b\sin bx)}{a^2 + b^2} + C,$$
$$\int e^{ax}\sin bx\,dx = \frac{e^{ax}(a\sin bx - b\cos bx)}{a^2 + b^2} + C.$$
In Exercises 21 through 34, find the derivative $f'(x)$. In each case, the function $f$ is assumed to be defined for all real $x$ for which the given formula for $f(x)$ is meaningful. Logarithmic differentiation may simplify the work in some cases.
21. f(x) = x”.
28. f(x) = (log x)~.
22. f(x) = (1 + x)(1 + ezz).
29. f(x) = xlOgz.
mg 4
30. f(x) = xlog *
24. f(x) = xa’ + a”’ + aaz.
25. f(x) = log [log (log X)l.
26. f(x) = log (e” + dm).
31. f(x) = (sin x)cos 2 + (COS x)sin x.
32. f(x) = xl’%.
x2(3 - x)1/3
33’ fcX) = (1 _ x)(3 + x)2/3'
27. f(x) = x”‘.
34. f(x) = fi (x - a$<.
i=l
35. Let $f(x) = x^r$, where $x > 0$ and $r$ is any real number. The formula $f'(x) = rx^{r-1}$ was proved earlier for rational $r$.
(a) Show that this formula also holds for arbitrary real $r$. [Hint: Write $x^r = e^{r\log x}$.]
(b) Discuss under what conditions the result of part (a) applies for $x \le 0$.
36. Use the definition $a^x = e^{x\log a}$ to derive the following properties of general exponentials:
(a) $\log a^x = x\log a$.
(b) $(ab)^x = a^x b^x$.
(c) $a^x a^y = a^{x+y}$.
(d) $(a^x)^y = (a^y)^x = a^{xy}$.
(e) If $a \ne 1$, then $y = a^x$ if and only if $x = \log_a y$.
37. Let $f(x) = \frac{1}{2}(a^x + a^{-x})$ if $a > 0$. Show that
$$f(x + y) + f(x - y) = 2f(x)f(y).$$
38. Let $f(x) = e^{cx}$, where $c$ is a constant. Show that $f'(0) = c$, and use this to deduce the following limit relation:
$$\lim_{x \to 0} \frac{e^{cx} - 1}{x} = c.$$
39. Let $f$ be a function defined everywhere on the real axis, with a derivative $f'$ which satisfies the equation
$$f'(x) = cf(x) \quad\text{for every } x,$$
where $c$ is a constant. Prove that there is a constant $K$ such that $f(x) = Ke^{cx}$ for every $x$. [Hint: Let $g(x) = f(x)e^{-cx}$ and consider $g'(x)$.]
40. Let $f$ be a function defined everywhere on the real axis. Suppose also that $f$ satisfies the functional equation
$$\text{(i)}\qquad f(x + y) = f(x)f(y) \quad\text{for all } x \text{ and } y.$$
(a) Using only the functional equation, prove that $f(0)$ is either 0 or 1. Also, prove that if $f(0) \ne 0$ then $f(x) \ne 0$ for all $x$.
Assume, in addition to (i), that $f'(x)$ exists for all $x$, and prove the following statements:
(b) $f'(x)f(y) = f'(y)f(x)$ for all $x$ and $y$.
(c) There is a constant $c$ such that $f'(x) = cf(x)$ for all $x$.
(d) $f(x) = e^{cx}$ if $f(0) \ne 0$. [Hint: See Exercise 39.]
41. (a) Let $f(x) = e^x - 1 - x$ for all $x$. Prove that $f'(x) \ge 0$ if $x \ge 0$ and $f'(x) \le 0$ if $x \le 0$. Use this fact to deduce the inequalities
$$e^x > 1 + x, \qquad e^{-x} > 1 - x,$$
valid for all $x > 0$. (When $x = 0$, these become equalities.)
Integrate these inequalities to derive the following further inequalities, all valid for $x > 0$:
(b) $e^x > 1 + x + \dfrac{x^2}{2!}$, $\qquad e^{-x} < 1 - x + \dfrac{x^2}{2!}$.
(c) $e^x > 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!}$, $\qquad e^{-x} > 1 - x + \dfrac{x^2}{2!} - \dfrac{x^3}{3!}$.
(d) Guess the generalization suggested and prove your result.
42. If n is a positive integer and if x > 0, show that
and that
i f x<n.
By choosing a suitable value of n, deduce that 2.5 < e < 2.99.
43. Let $f(x, y) = x^y$ where $x > 0$. Show that
$$\frac{\partial f}{\partial x} = yx^{y-1} \qquad\text{and}\qquad \frac{\partial f}{\partial y} = x^y\log x.$$
6.18 The hyperbolic functions
Certain combinations of exponential functions occur quite frequently in analysis, and it is worth while to give these combinations special names and to study them as examples of new functions. These combinations, called the hyperbolic sine (sinh), the hyperbolic cosine (cosh), the hyperbolic tangent (tanh), etc., are defined as follows:
$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad \cosh x = \frac{e^x + e^{-x}}{2}, \qquad \tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}},$$
$$\operatorname{csch} x = \frac{1}{\sinh x}, \qquad \operatorname{sech} x = \frac{1}{\cosh x}, \qquad \coth x = \frac{1}{\tanh x}.$$
FIGURE 6.8 Graphs of hyperbolic functions: $y = \sinh x$, $y = \cosh x$, $y = \tanh x$.
The prefix "hyperbolic" is due to the fact that these functions are related geometrically to a hyperbola in much the same way as the trigonometric functions are related to a circle. This relation will be discussed in more detail in Chapter 14 when we study the hyperbola. The graphs of sinh, cosh, and tanh are shown in Figure 6.8.
The hyperbolic functions possess many properties that resemble those of the trigonometric functions. Some of these are listed as exercises in the following section.
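The definitions of sinh, cosh, and tanh in terms of exponentials can be written out directly and compared with library implementations. This sketch assumes only the defining formulas; the checks at the end are numerical illustrations of identities, not proofs.

```python
import math

def sinh(x):
    """Hyperbolic sine: (e^x - e^(-x)) / 2."""
    return (math.exp(x) - math.exp(-x)) / 2

def cosh(x):
    """Hyperbolic cosine: (e^x + e^(-x)) / 2."""
    return (math.exp(x) + math.exp(-x)) / 2

def tanh(x):
    """Hyperbolic tangent: sinh x / cosh x."""
    return sinh(x) / cosh(x)

x, y = 1.2, -0.4
print(cosh(x) ** 2 - sinh(x) ** 2)                             # close to 1
print(sinh(x + y), sinh(x) * cosh(y) + cosh(x) * sinh(y))      # addition formula
print(tanh(x), math.tanh(x))                                   # agrees with the library
```

For large $|x|$ the direct formulas can overflow or lose accuracy, which is one reason libraries implement these functions separately.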
6.19 Exercises
Derive the properties of the hyperbolic functions listed in Exercises 1 through 15 and compare them, whenever possible, with the corresponding properties of the trigonometric functions.
1. $\cosh^2 x - \sinh^2 x = 1$.
2. $\sinh(-x) = -\sinh x$.
3. $\cosh(-x) = \cosh x$.
4. $\tanh(-x) = -\tanh x$.
5. $\sinh(x + y) = \sinh x\cosh y + \cosh x\sinh y$.
6. $\cosh(x + y) = \cosh x\cosh y + \sinh x\sinh y$.
7. $\sinh 2x = 2\sinh x\cosh x$.
8. $\cosh 2x = \cosh^2 x + \sinh^2 x$.
9. $\cosh x + \sinh x = e^x$.
10. $\cosh x - \sinh x = e^{-x}$.
11. $(\cosh x + \sinh x)^n = \cosh nx + \sinh nx$ ($n$ an integer).
12. $2\sinh^2 \frac{1}{2}x = \cosh x - 1$.
13. $2\cosh^2 \frac{1}{2}x = \cosh x + 1$.
14. $\tanh^2 x + \operatorname{sech}^2 x = 1$.
15. $\coth^2 x - \operatorname{csch}^2 x = 1$.
16. Find cosh x if sinh x = 9.
17. Find sinh x if cosh x = 4 and x > 0.
18. Find sinh x and cosh x if tanh x = A.
19. Find cosh (x + y) if sinh x = 3 and sinh y = 2.
20. Find tanh 2x if tanh x = 2.
In Exercises 21 through 26, prove the differentiation formulas.
21. $D\sinh x = \cosh x$.
22. $D\cosh x = \sinh x$.
23. $D\tanh x = \operatorname{sech}^2 x$.
24. $D\coth x = -\operatorname{csch}^2 x$.
25. $D\operatorname{sech} x = -\operatorname{sech} x\tanh x$.
26. $D\operatorname{csch} x = -\operatorname{csch} x\coth x$.
6.20 Derivatives of inverse functions
We have applied the process of inversion to construct the exponential function from the logarithm. In the next section, we shall invert the trigonometric functions. It is convenient at this point to discuss a general theorem which shows that the process of inversion transmits differentiability from a function to its inverse.
THEOREM 6.7. Assume $f$ is strictly increasing and continuous on an interval $[a, b]$, and let $g$ be the inverse of $f$. If the derivative $f'(x)$ exists and is nonzero at a point $x$ in $(a, b)$, then the derivative $g'(y)$ also exists and is nonzero at the corresponding point $y$, where $y = f(x)$. Moreover, the two derivatives are reciprocals of each other; that is, we have
$$g'(y) = \frac{1}{f'(x)}. \tag{6.42}$$
Note: If we use the Leibniz notation and write $y$ for $f(x)$, $dy/dx$ for $f'(x)$, $x$ for $g(y)$, and $dx/dy$ for $g'(y)$, then Equation (6.42) becomes
$$\frac{dx}{dy} = \frac{1}{dy/dx},$$
which has the appearance of a trivial algebraic identity.
Proof. Assume $x$ is a point in $(a, b)$ where $f'(x)$ exists and is nonzero, and let $y = f(x)$. We shall show that the difference quotient
$$\frac{g(y + k) - g(y)}{k}$$
approaches the limit $1/f'(x)$ as $k \to 0$.
Let $h = g(y + k) - g(y)$. Since $x = g(y)$, this implies $h = g(y + k) - x$ or $x + h = g(y + k)$. Therefore $y + k = f(x + h)$, and hence $k = f(x + h) - f(x)$. Note that $h \ne 0$ if $k \ne 0$ because $g$ is strictly increasing. Therefore, if $k \ne 0$, the difference quotient in question is
$$\frac{g(y + k) - g(y)}{k} = \frac{h}{f(x + h) - f(x)} = \frac{1}{[f(x + h) - f(x)]/h}. \tag{6.43}$$
As $k \to 0$, the difference $g(y + k) - g(y) \to 0$ because of the continuity of $g$ at $y$ [property (b) of Theorem 3.10]. This means that $h \to 0$ as $k \to 0$. But we know that the difference quotient in the denominator on the extreme right of (6.43) approaches $f'(x)$ as $h \to 0$ [since $f'(x)$ exists]. Therefore, when $k \to 0$, the quotient on the extreme left of (6.43) approaches the limit $1/f'(x)$. This proves Theorem 6.7.
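Theorem 6.7 can be illustrated numerically with a function whose inverse has no simple closed form. In the sketch below the choice $f(x) = x^3 + x$ is our own example (it is strictly increasing and continuous), and the inverse $g$ is computed by bisection on an assumed bracket $[-10, 10]$.

```python
def f(x):
    """An illustrative strictly increasing function."""
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def g(y, lo=-10.0, hi=10.0, iterations=200):
    """Inverse of f on [lo, hi], found by bisection (f is increasing)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 1.5
y = f(x)
k = 1e-5
difference_quotient = (g(y + k) - g(y - k)) / (2 * k)
print(difference_quotient, 1 / f_prime(x))
```

The difference quotient of $g$ at $y = f(x)$ comes out close to $1/f'(x)$, as (6.42) predicts.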
6.21 Inverses of the trigonometric functions
The process of inversion may be applied to the trigonometric functions. Suppose we begin with the sine function. To determine a unique inverse, we must consider the sine over some interval where it is monotonic. There are, of course, many such intervals, for example $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$, $[\frac{1}{2}\pi, \frac{3}{2}\pi]$, $[-\frac{3}{2}\pi, -\frac{1}{2}\pi]$, etc., and it really does not matter which one of these we choose. It is customary to select $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$ and define a new function $f$ as follows:
$$f(x) = \sin x \quad\text{if}\quad -\frac{\pi}{2} \le x \le \frac{\pi}{2}.$$
FIGURE 6.9 $y = \sin x$.
FIGURE 6.10 $y = \arcsin x$.
The function $f$ so defined is strictly increasing and it assumes every value between $-1$ and $+1$ exactly once on the interval $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$. (See Figure 6.9.) Hence there is a uniquely determined function $g$ defined on $[-1, 1]$ which assigns to each number $y$ in $[-1, 1]$ that number $x$ in $[-\frac{1}{2}\pi, \frac{1}{2}\pi]$ for which $y = \sin x$. This function $g$ is called the inverse sine or arc sine, and its value at $y$ is denoted by $\arcsin y$, or by $\sin^{-1} y$. Thus,
$$u = \arcsin v \quad\text{means}\quad v = \sin u \quad\text{and}\quad -\frac{\pi}{2} \le u \le \frac{\pi}{2}.$$
The graph of the arc sine is shown in Figure 6.10. Note that the arc sine is not defined outside the interval $[-1, 1]$.
The derivative of the arc sine can be obtained from formula (6.42) of Section 6.20. In this case we have $f'(x) = \cos x$ and this is nonzero in the open interval $(-\frac{1}{2}\pi, \frac{1}{2}\pi)$. Therefore formula (6.42) yields
$$g'(y) = \frac{1}{f'(x)} = \frac{1}{\cos x} = \frac{1}{\sqrt{1 - \sin^2 x}} = \frac{1}{\sqrt{1 - y^2}} \quad\text{if}\quad -1 < y < 1.$$
With a change in notation we can write this result as follows:
$$D\arcsin x = \frac{1}{\sqrt{1 - x^2}} \quad\text{if}\quad -1 < x < 1. \tag{6.44}$$
Of course, this now gives us a new integration formula,
$$\int_0^x \frac{dt}{\sqrt{1 - t^2}} = \arcsin x, \tag{6.45}$$
which is valid for $-1 < x < 1$.
Note: This formula may be used as the starting point for a completely analytic theory of the trigonometric functions, without any reference to geometry. Briefly, the idea is to begin with the arc sine function, defining it by the integral in (6.45), just as we defined the logarithm as an integral. Next, the sine function is defined as the inverse of the arc sine, and the cosine as the derivative of the sine. Many details are required to carry out this program completely and we shall not attempt to describe them here. An alternative method for introducing the trigonometric functions analytically will be mentioned in Chapter 11.
In the Leibniz notation for indefinite integrals we may write formula (6.45) in the form
$$\int \frac{dx}{\sqrt{1 - x^2}} = \arcsin x + C. \tag{6.46}$$
Integration by parts yields the following further integration formula:
$$\int \arcsin x\,dx = x\arcsin x - \int \frac{x\,dx}{\sqrt{1 - x^2}} = x\arcsin x + \sqrt{1 - x^2} + C.$$
The cosine and tangent are inverted in a similar fashion. For the cosine it is customary to choose the interval $[0, \pi]$ in which to perform the inversion. (See Figure 6.11.) The resulting inverse function, called the arc cosine, is defined as follows:
$$u = \arccos v \quad\text{means}\quad v = \cos u \quad\text{and}\quad 0 \le u \le \pi.$$
The graph of the arc cosine function is shown in Figure 6.12.
FIGURE 6.11 $y = \cos x$.
FIGURE 6.12 $y = \arccos x$.
To invert the tangent we choose the open interval $(-\frac{1}{2}\pi, \frac{1}{2}\pi)$ (see Figure 6.13) and we define the arc tangent as follows:
$$u = \arctan v \quad\text{means}\quad v = \tan u \quad\text{and}\quad -\frac{\pi}{2} < u < \frac{\pi}{2}.$$
Figure 6.14 shows a portion of the graph of the arc tangent function.
The argument used to derive (6.44) can also be applied to the arc cosine and arc tangent functions, and it yields the following differentiation formulas:
$$D\arccos x = \frac{-1}{\sqrt{1 - x^2}}, \tag{6.47}$$
valid for $-1 < x < 1$, and
$$D\arctan x = \frac{1}{1 + x^2}, \tag{6.48}$$
valid for all real $x$.
FIGURE 6.13 $y = \tan x$.
FIGURE 6.14 $y = \arctan x$.
When (6.47) is translated into an integration formula it becomes
$$\int_0^x \frac{dt}{\sqrt{1 - t^2}} = -(\arccos x - \arccos 0) = \frac{\pi}{2} - \arccos x \tag{6.49}$$
if $-1 < x < 1$. By comparing (6.49) with (6.45), we deduce the relation $\frac{1}{2}\pi - \arccos x = \arcsin x$. (This may also be deduced from the familiar identity $\sin(\frac{1}{2}\pi - y) = \cos y$ if we write $y = \arccos x$.) In the Leibniz notation for indefinite integrals, we may write (6.49) as follows:
$$\int \frac{dx}{\sqrt{1 - x^2}} = -\arccos x + C. \tag{6.50}$$
Similarly, from (6.48) we obtain
$$\int_0^x \frac{dt}{1 + t^2} = \arctan x \qquad\text{or}\qquad \int \frac{dx}{1 + x^2} = \arctan x + C. \tag{6.51}$$
Using integration by parts in conjunction with (6.50) and (6.51), we can derive the following further integration formulas:
$$\int \arccos x\,dx = x\arccos x + \int \frac{x\,dx}{\sqrt{1 - x^2}} = x\arccos x - \sqrt{1 - x^2} + C,$$
$$\int \arctan x\,dx = x\arctan x - \int \frac{x\,dx}{1 + x^2} = x\arctan x - \frac{1}{2}\log(1 + x^2) + C.$$
The inverses of the cotangent, secant, and cosecant can be defined by means of the following formulas:
$$\operatorname{arccot} x = \frac{\pi}{2} - \arctan x \quad\text{for all real } x, \tag{6.52}$$
$$\operatorname{arcsec} x = \arccos\frac{1}{x} \quad\text{when } |x| \ge 1, \tag{6.53}$$
$$\operatorname{arccsc} x = \arcsin\frac{1}{x} \quad\text{when } |x| \ge 1. \tag{6.54}$$
Differentiation and integration formulas for these functions are listed in the following exercises.
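Formulas (6.52) through (6.54) give a direct way to compute the three remaining inverse functions from the ones available in a standard library. The following is a minimal sketch; the function names are ours, and `math.atan`, `math.acos`, and `math.asin` are assumed to use the same principal ranges as the text.

```python
import math

def arccot(x):
    """arccot x = pi/2 - arctan x, for all real x (6.52)."""
    return math.pi / 2 - math.atan(x)

def arcsec(x):
    """arcsec x = arccos(1/x), for |x| >= 1 (6.53)."""
    if abs(x) < 1:
        raise ValueError("arcsec is defined only for |x| >= 1")
    return math.acos(1 / x)

def arccsc(x):
    """arccsc x = arcsin(1/x), for |x| >= 1 (6.54)."""
    if abs(x) < 1:
        raise ValueError("arccsc is defined only for |x| >= 1")
    return math.asin(1 / x)

print(arccot(1.0), math.pi / 4)
print(arcsec(2.0), math.pi / 3)
print(arccsc(2.0), math.pi / 6)
print(math.asin(0.3) + math.acos(0.3), math.pi / 2)   # arcsin x + arccos x = pi/2
```

The last line illustrates the relation $\frac{1}{2}\pi - \arccos x = \arcsin x$ at a single point.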
6.22 Exercises
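Derive the differentiation formulas in Exercises 1 through 5.
1. $D\arccos x = \dfrac{-1}{\sqrt{1 - x^2}}$ if $-1 < x < 1$.
2. $D\arctan x = \dfrac{1}{1 + x^2}$ for all real $x$.
3. $D\operatorname{arccot} x = \dfrac{-1}{1 + x^2}$ for all real $x$.
4. $D\operatorname{arcsec} x = \dfrac{1}{|x|\sqrt{x^2 - 1}}$ if $|x| > 1$.
5. $D\operatorname{arccsc} x = \dfrac{-1}{|x|\sqrt{x^2 - 1}}$ if $|x| > 1$.
Derive the integration formulas in Exercises 6 through 10.
6. $\int \operatorname{arccot} x\,dx = x\operatorname{arccot} x + \frac{1}{2}\log(1 + x^2) + C$.
7. $\int \operatorname{arcsec} x\,dx = x\operatorname{arcsec} x - \dfrac{x}{|x|}\log\left|x + \sqrt{x^2 - 1}\right| + C$.
8. $\int \operatorname{arccsc} x\,dx = x\operatorname{arccsc} x + \dfrac{x}{|x|}\log\left|x + \sqrt{x^2 - 1}\right| + C$.
9. $\int (\arcsin x)^2\,dx = x(\arcsin x)^2 - 2x + 2\sqrt{1 - x^2}\arcsin x + C$.
10. $\int \dfrac{\arcsin x}{x^2}\,dx = \log\left|\dfrac{1 - \sqrt{1 - x^2}}{x}\right| - \dfrac{\arcsin x}{x} + C$.
11. (a) Show that $D\left(\operatorname{arccot} x - \arctan\dfrac{1}{x}\right) = 0$ for all $x \ne 0$.
(b) Prove that there is no constant $C$ such that $\operatorname{arccot} x - \arctan(1/x) = C$ for all $x \ne 0$. Explain why this does not contradict the zero-derivative theorem (Theorem 5.2).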
In Exercises 12 through 25, find the derivative $f'(x)$. In each case the function $f$ is assumed to be defined for all real $x$ for which the given formula for $f(x)$ is meaningful.
12. f(x) = arcsin 5 .
13. f(x) = arccos
19. f(x) = arctan
1 -x
-.
fi
20. f(x) = arctan
(tan2 x).
(x + dGZ).
14. f(x) = arccos !
21. f(x) = arcsin (sin x - cas x).
15. f(x) = arcsin (sin x).
22. f(x) = arccos
1/1_xz.
23. f(x) = arctan
1 +x
l-x.
X’
16. f(x) = fi - arctan
1/x.
17. f(x) = arctan x + 4 arctan
(x3).
1 -x2
18. f(x) = arcsin - .
1 +x2
24. f(x) = [arccos
(x2)lp2.
25. f(x) = log (arccos
-+) .
26. Show that $dy/dx = (x + y)/(x - y)$ if $\arctan(y/x) = \log\sqrt{x^2 + y^2}$.
27. Compute $d^2y/dx^2$ if $y = (\arcsin x)/\sqrt{1 - x^2}$ for $|x| < 1$.
28. Let $f(x) = \arctan x - x + \frac{1}{3}x^3$. Examine the sign of $f'$ to prove that
$$x - \frac{x^3}{3} < \arctan x \quad\text{if } x > 0.$$
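In Exercises 29 through 47, evaluate the indefinite integrals.
29. $\int \dfrac{dx}{\sqrt{a^2 - x^2}}$, a ≠ 0.
30. $\int \dfrac{dx}{\sqrt{1 - 2x - x^2}}$.
31. $\int \dfrac{dx}{a^2 + x^2}$, a ≠ 0.
32. $\int \dfrac{dx}{a + bx^2}$ (ab ≠ 0).
33. $\int \dfrac{dx}{x^2 - x + 2}$.
34. $\int x \arctan x\,dx$.
35. $\int x^2 \arccos x\,dx$.
36. $\int x(\arctan x)^2\,dx$.
37. $\int \arctan\sqrt{x}\,dx$.
38. $\int \dfrac{\arctan\sqrt{x}}{\sqrt{x}\,(1 + x)}\,dx$.
39. $\int \sqrt{1 - x^2}\,dx$. [Hint: x = sin u.]
40. $\int \dfrac{x\,e^{\arctan x}}{(1 + x^2)^{3/2}}\,dx$.
41. $\int \dfrac{e^{\arctan x}}{(1 + x^2)^{3/2}}\,dx$.
42. $\int \dfrac{x^2}{(1 + x^2)^2}\,dx$.
43. $\int \dfrac{e^x}{1 + e^{2x}}\,dx$.
44. $\int \dfrac{\operatorname{arccot} e^x}{e^x}\,dx$.
45. $\int \sqrt{\dfrac{a + x}{a - x}}\,dx$, a > 0.
46. $\int \sqrt{(x - a)(b - x)}\,dx$, b ≠ a.
47. $\int \dfrac{dx}{\sqrt{(x - a)(b - x)}}$, b ≠ a. [Hint: $x - a = (b - a)\sin^2 u$.]
6.23 Integration by partial fractions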
We recall that a quotient of two polynomials is called a rational function. Differentiation of a rational function leads to a new rational function which may be obtained by
the quotient rule for derivatives. On the other hand, integration of a rational function
may lead to functions that are not rational. For example, we have
$$\int \frac{dx}{x} = \log|x| + C \qquad\text{and}\qquad \int \frac{dx}{1 + x^2} = \arctan x + C.$$
We shall describe a method for computing the integral of any rational function, and we
shall find that the result can always be expressed in terms of polynomials, rational functions,
inverse tangents, and logarithms.
The basic idea of the method is to decompose a given rational function into a sum of
simpler fractions (called partial fractions) that can be integrated by the techniques discussed
earlier. We shall describe the general procedure by means of a number of simple examples
that illustrate all the essential features of the method.
EXAMPLE 1. In this example we begin with two simple fractions, 1/(x - 1) and 1/(x + 3),
which we know how to integrate, and see what happens when we form a linear combination
of these fractions. For example, if we take twice the first fraction plus three times the
second, we obtain
$$\frac{2}{x - 1} + \frac{3}{x + 3} = \frac{2(x + 3) + 3(x - 1)}{(x - 1)(x + 3)} = \frac{5x + 3}{x^2 + 2x - 3}.$$
If, now, we read this formula from right to left, it tells us that the rational function r given
by $r(x) = (5x + 3)/(x^2 + 2x - 3)$ has been expressed as a linear combination of 1/(x - 1)
and 1/(x + 3). Therefore, we may evaluate the integral of r by writing
$$\int \frac{5x + 3}{x^2 + 2x - 3}\,dx = 2\int \frac{dx}{x - 1} + 3\int \frac{dx}{x + 3} = 2\log|x - 1| + 3\log|x + 3| + C.$$
EXAMPLE 2. The foregoing example suggests a procedure for dealing with integrals of
the form $\int (ax + b)/(x^2 + 2x - 3)\,dx$. For example, to evaluate $\int (2x + 5)/(x^2 + 2x - 3)\,dx$,
we try to express the integrand as a linear combination of 1/(x - 1) and 1/(x + 3) by writing
$$\frac{2x + 5}{x^2 + 2x - 3} = \frac{A}{x - 1} + \frac{B}{x + 3} \tag{6.55}$$
with constants A and B to be determined. If we can choose A and B so that Equation (6.55)
is an identity, then the integral of the fraction on the left is equal to the sum of the integrals
of the simpler fractions on the right. To find A and B, we multiply both sides of (6.55) by
(x - 1)(x + 3) to remove the fractions. This gives us
$$A(x + 3) + B(x - 1) = 2x + 5. \tag{6.56}$$
At this stage there are two methods commonly used to find A and B. One method is to
equate coefficients of like powers of x in (6.56). This leads to the equations A + B = 2
and 3A - B = 5. Solving this pair of simultaneous equations, we obtain A = 7/4 and
B = 1/4. The other method involves the substitution of two values of x in (6.56) and leads
to another pair of equations for A and B. In this particular case, the presence of the factors
x - 1 and x + 3 suggests that we use the values x = 1 and x = -3. When we put x = 1
in (6.56), the coefficient of B vanishes, and we find 4A = 7, or A = 7/4. Similarly, we can
make the coefficient of A vanish by putting x = -3. This gives us -4B = -1, or B = 1/4.
In any event, we have found values of A and B to satisfy (6.55), so we have
$$\int \frac{2x + 5}{x^2 + 2x - 3}\,dx = \frac{7}{4}\int \frac{dx}{x - 1} + \frac{1}{4}\int \frac{dx}{x + 3} = \frac{7}{4}\log|x - 1| + \frac{1}{4}\log|x + 3| + C.$$
It is clear that the method described in Example 2 also applies to integrals of the form
$\int f(x)/g(x)\,dx$ in which f is a linear polynomial and g is a quadratic polynomial that can be
factored into distinct linear factors with real coefficients, say $g(x) = (x - x_1)(x - x_2)$. In
this case the quotient f(x)/g(x) can be expressed as a linear combination of $1/(x - x_1)$ and
$1/(x - x_2)$, and integration of f(x)/g(x) leads to a corresponding combination of the
logarithmic terms $\log|x - x_1|$ and $\log|x - x_2|$.
The foregoing examples involve rational functions f/g in which the degree of the
numerator is less than that of the denominator. A rational function with this property
is said to be a proper rational function. If f/g is improper, that is, if the degree of f is not
less than that of g, then we can express f/g as the sum of a polynomial and a proper rational
function. In fact, we simply divide f by g to obtain
$$\frac{f(x)}{g(x)} = Q(x) + \frac{R(x)}{g(x)},$$
where Q and R are polynomials (called the quotient and remainder, respectively) such that
the remainder has degree less than that of g. For example,
$$\frac{x^3 + 3x}{x^2 - 2x - 3} = x + 2 + \frac{10x + 6}{x^2 - 2x - 3}.$$
Therefore, in the study of integration technique, there is no loss in generality if we restrict
ourselves to proper rational functions, and from now on we consider $\int f(x)/g(x)\,dx$, where
f has degree less than that of g.
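The division step can be sketched in a few lines of Python. This is only an illustration, not part of the text: the helper name `poly_divmod` and the convention of listing coefficients from the highest power down are assumptions made here. Exact rational arithmetic from the standard `fractions` module avoids rounding.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Divide polynomial f by g (coefficients listed from highest degree down).

    Returns (Q, R) with f = Q*g + R and deg R < deg g.
    """
    g = [Fraction(c) for c in g]
    remainder = [Fraction(c) for c in f]
    quotient = []
    while len(remainder) >= len(g):
        factor = remainder[0] / g[0]
        quotient.append(factor)
        # Subtract factor * g, aligned with the current leading term.
        for i in range(len(g)):
            remainder[i] -= factor * g[i]
        remainder.pop(0)  # the leading coefficient is now zero
    return quotient, remainder

# (x^3 + 3x) divided by (x^2 - 2x - 3), the example from the text
Q, R = poly_divmod([1, 0, 3, 0], [1, -2, -3])
print(Q, R)
```

Here `Q` holds the coefficients of x + 2 and `R` those of 10x + 6, in agreement with the displayed example.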
A general theorem in algebra states that every proper rational function can be expressed
as a finite sum of fractions of the forms
$$\frac{A}{(x + a)^k} \qquad\text{and}\qquad \frac{Bx + C}{(x^2 + bx + c)^m},$$
where k and m are positive integers and A, B, C, a, b, c are constants with $b^2 - 4c < 0$.
The condition $b^2 - 4c < 0$ means that the quadratic polynomial $x^2 + bx + c$ cannot be
factored into linear factors with real coefficients or, what amounts to the same thing, the
quadratic equation $x^2 + bx + c = 0$ has no real roots. Such a quadratic factor is said to
be irreducible. When a rational function has been so expressed, we say that it has been
decomposed into partial fractions. Therefore the problem of integrating this rational
function reduces to that of integrating its partial fractions. These may be easily dealt with
by the techniques described in the examples which follow.
We shall not bother to prove that partial-fraction decompositions always exist. Instead,
we shall show (by means of examples) how to obtain the partial fractions in specific
problems. In each case that arises the partial-fraction decomposition can be verified
directly.
It is convenient to separate the discussion into cases depending on the way in which the
denominator of the quotient f(x)/g(x) can be factored.
CASE 1. The denominator is a product of distinct linear factors. Suppose that g(x) splits
into n distinct linear factors, say
$$g(x) = (x - x_1)(x - x_2) \cdots (x - x_n).$$
Now notice that a linear combination of the form
$$\frac{A_1}{x - x_1} + \cdots + \frac{A_n}{x - x_n}$$
may be expressed as a single fraction with the common denominator g(x), and the numerator
of this fraction will be a polynomial of degree < n involving the A's. Therefore, if we can
find A's to make this numerator equal to f(x), we shall have the decomposition
$$\frac{f(x)}{g(x)} = \frac{A_1}{x - x_1} + \cdots + \frac{A_n}{x - x_n},$$
and the integral of f(x)/g(x) will be equal to $\sum_{i=1}^{n} A_i \log|x - x_i|$. In the next example, we
work out a case with n = 3.
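EXAMPLE 3. Integrate $\int \dfrac{2x^2 + 5x - 1}{x^3 + x^2 - 2x}\,dx$.
Solution. Since $x^3 + x^2 - 2x = x(x - 1)(x + 2)$, the denominator is a product of
distinct linear factors, and we try to find $A_1$, $A_2$, and $A_3$ such that
$$\frac{2x^2 + 5x - 1}{x^3 + x^2 - 2x} = \frac{A_1}{x} + \frac{A_2}{x - 1} + \frac{A_3}{x + 2}.$$
Clearing the fractions, we obtain
$$2x^2 + 5x - 1 = A_1(x - 1)(x + 2) + A_2\,x(x + 2) + A_3\,x(x - 1).$$
When x = 0, we find $-2A_1 = -1$, so $A_1 = \tfrac{1}{2}$. When x = 1, we obtain $3A_2 = 6$, $A_2 = 2$,
and when x = -2, we find $6A_3 = -3$, or $A_3 = -\tfrac{1}{2}$. Therefore we have
$$\int \frac{2x^2 + 5x - 1}{x^3 + x^2 - 2x}\,dx = \frac{1}{2}\int \frac{dx}{x} + 2\int \frac{dx}{x - 1} - \frac{1}{2}\int \frac{dx}{x + 2} = \frac{1}{2}\log|x| + 2\log|x - 1| - \frac{1}{2}\log|x + 2| + C.$$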
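The substitution method used in Examples 2 and 3 amounts to evaluating the cleared equation at each root, which makes every term but one vanish. A minimal sketch follows, assuming a monic denominator with distinct roots and a numerator of lower degree; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def eval_poly(coeffs, x):
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    result = Fraction(0)
    for c in coeffs:
        result = result * x + c
    return result

def distinct_root_coefficients(f, roots):
    """Return A_i with f(x)/prod(x - x_i) = sum of A_i/(x - x_i), roots distinct."""
    coefficients = []
    for i, xi in enumerate(roots):
        xi = Fraction(xi)
        # Putting x = x_i in the cleared equation leaves f(x_i) = A_i * prod_{j != i}(x_i - x_j).
        denom = Fraction(1)
        for j, xj in enumerate(roots):
            if j != i:
                denom *= xi - xj
        coefficients.append(eval_poly(f, xi) / denom)
    return coefficients

example2 = distinct_root_coefficients([2, 5], [1, -3])         # (2x + 5)/((x - 1)(x + 3))
example3 = distinct_root_coefficients([2, 5, -1], [0, 1, -2])  # (2x^2 + 5x - 1)/(x(x - 1)(x + 2))
print(example2, example3)
```

The two calls reproduce the constants 7/4, 1/4 of Example 2 and 1/2, 2, -1/2 of Example 3. Repeated or quadratic factors need the additional terms described in the cases that follow and are not handled by this sketch.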
CASE 2. The denominator is a product of linear factors, some of which are repeated. We
illustrate this case with an example.
EXAMPLE 4. Integrate $\int \dfrac{x^2 + 2x + 3}{(x - 1)(x + 1)^2}\,dx$.
Solution. Here we try to find $A_1$, $A_2$, $A_3$ so that
$$\frac{x^2 + 2x + 3}{(x - 1)(x + 1)^2} = \frac{A_1}{x - 1} + \frac{A_2}{x + 1} + \frac{A_3}{(x + 1)^2}. \tag{6.57}$$
We need both $A_2/(x + 1)$ and $A_3/(x + 1)^2$ as well as $A_1/(x - 1)$ in order to get a polynomial
of degree two in the numerator and to have as many constants as equations when we try
to determine the A's. Clearing the fractions, we obtain
$$x^2 + 2x + 3 = A_1(x + 1)^2 + A_2(x - 1)(x + 1) + A_3(x - 1). \tag{6.58}$$
Substituting x = 1, we find $4A_1 = 6$, so $A_1 = \tfrac{3}{2}$. When x = -1, we obtain $-2A_3 = 2$
and $A_3 = -1$. We need one more equation to determine $A_2$. Since there are no other
choices of x that will make any factor vanish, we choose a convenient x that will help to
simplify the calculations. For example, the choice x = 0 leads to the equation
$3 = A_1 - A_2 - A_3$, from which we find $A_2 = -\tfrac{1}{2}$. An alternative method is to differentiate both
sides of (6.58) and then substitute a convenient x. Differentiation of (6.58) leads to the
equation
$$2x + 2 = 2A_1(x + 1) + A_2(x - 1) + A_2(x + 1) + A_3,$$
and, if we put x = -1, we find $0 = -2A_2 + A_3$, so $A_2 = \tfrac{1}{2}A_3 = -\tfrac{1}{2}$, as before. Therefore
we have found A's to satisfy (6.57), so we have
$$\int \frac{x^2 + 2x + 3}{(x - 1)(x + 1)^2}\,dx = \frac{3}{2}\int \frac{dx}{x - 1} - \frac{1}{2}\int \frac{dx}{x + 1} - \int \frac{dx}{(x + 1)^2} = \frac{3}{2}\log|x - 1| - \frac{1}{2}\log|x + 1| + \frac{1}{x + 1} + C.$$
If, on the left of (6.57), the factor $(x + 1)^3$ had appeared instead of $(x + 1)^2$, we would
have added an extra term $A_4/(x + 1)^3$ on the right. More generally, if a linear factor
x + a appears p times in the denominator, then for this factor we must allow for a sum
of p terms, namely
$$\sum_{k=1}^{p} \frac{A_k}{(x + a)^k}, \tag{6.59}$$
where the A's are constants. A sum of this type is to be used for each repeated linear factor.
CASE 3. The denominator contains irreducible quadratic factors, none of which are
repeated.
EXAMPLE 5. Integrate $\int \dfrac{3x^2 + 2x - 2}{x^3 - 1}\,dx$.
Solution. The denominator can be split as the product $x^3 - 1 = (x - 1)(x^2 + x + 1)$,
where $x^2 + x + 1$ is irreducible, and we try a decomposition of the form
$$\frac{3x^2 + 2x - 2}{x^3 - 1} = \frac{A}{x - 1} + \frac{Bx + C}{x^2 + x + 1}.$$
In the fraction with denominator $x^2 + x + 1$, we have used a linear polynomial Bx + C
in the numerator in order to have as many constants as equations when we solve for A, B,
C. Clearing the fractions and solving for A, B, and C, we find A = 1, B = 2, and C = 3.
Therefore we have
$$\int \frac{3x^2 + 2x - 2}{x^3 - 1}\,dx = \int \frac{dx}{x - 1} + \int \frac{2x + 3}{x^2 + x + 1}\,dx.$$
The first integral on the right is log |x - 1|. To evaluate the second integral, we write
$$\int \frac{2x + 3}{x^2 + x + 1}\,dx = \int \frac{2x + 1}{x^2 + x + 1}\,dx + 2\int \frac{dx}{x^2 + x + 1} = \log(x^2 + x + 1) + 2\int \frac{dx}{(x + \tfrac{1}{2})^2 + \tfrac{3}{4}}.$$
If we let $u = x + \tfrac{1}{2}$ and $\alpha = \tfrac{1}{2}\sqrt{3}$, the last integral is
$$2\int \frac{du}{u^2 + \alpha^2} = \frac{2}{\alpha}\arctan\frac{u}{\alpha} = \frac{4}{3}\sqrt{3}\,\arctan\frac{2x + 1}{\sqrt{3}}.$$
Therefore, we have
$$\int \frac{3x^2 + 2x - 2}{x^3 - 1}\,dx = \log|x - 1| + \log(x^2 + x + 1) + \frac{4}{3}\sqrt{3}\,\arctan\frac{2x + 1}{\sqrt{3}} + C.$$
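A result of this kind can be sanity-checked numerically by differentiating the proposed antiderivative and comparing with the integrand. The sketch below does this for Example 5 with a symmetric difference quotient; the sample points and the step size are arbitrary choices made for this illustration, and the check is a plausibility test rather than a proof.

```python
import math

def antiderivative(x):
    """Antiderivative found in Example 5, with the constant C omitted."""
    return (math.log(abs(x - 1)) + math.log(x * x + x + 1)
            + (4 / 3) * math.sqrt(3) * math.atan((2 * x + 1) / math.sqrt(3)))

def integrand(x):
    return (3 * x * x + 2 * x - 2) / (x ** 3 - 1)

def central_difference(func, x, h=1e-5):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Sample points on both sides of the pole at x = 1.
max_error = max(abs(central_difference(antiderivative, x) - integrand(x))
                for x in (-2.0, 0.0, 0.5, 2.0, 5.0))
print(max_error)
```

The printed discrepancy should be tiny compared with the size of the integrand at these points.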
CASE 4. The denominator contains irreducible quadratic factors, some of which are
repeated. Here the situation is analogous to Case 2. In the partial-fraction decomposition
of f(x)/g(x) we allow, first of all, a sum of the form (6.59) for each linear factor, as already
described. In addition, if an irreducible quadratic factor $x^2 + bx + c$ is repeated m times,
we allow a sum of m terms, namely
$$\sum_{k=1}^{m} \frac{B_k x + C_k}{(x^2 + bx + c)^k},$$
where each numerator is linear.
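EXAMPLE 6. Integrate $\int \dfrac{x^4 - x^3 + 2x^2 - x + 2}{(x - 1)(x^2 + 2)^2}\,dx$.
Solution. We write
$$\frac{x^4 - x^3 + 2x^2 - x + 2}{(x - 1)(x^2 + 2)^2} = \frac{A}{x - 1} + \frac{Bx + C}{x^2 + 2} + \frac{Dx + E}{(x^2 + 2)^2}.$$
Clearing the fractions and solving for A, B, C, D, and E, we find that
$$A = \tfrac{1}{3}, \qquad B = \tfrac{2}{3}, \qquad C = -\tfrac{1}{3}, \qquad D = -1, \qquad E = 0.$$
Therefore, we have
$$\int \frac{x^4 - x^3 + 2x^2 - x + 2}{(x - 1)(x^2 + 2)^2}\,dx = \frac{1}{3}\int \frac{dx}{x - 1} + \frac{1}{3}\int \frac{2x - 1}{x^2 + 2}\,dx - \int \frac{x\,dx}{(x^2 + 2)^2}$$
$$= \frac{1}{3}\log|x - 1| + \frac{1}{3}\log(x^2 + 2) - \frac{\sqrt{2}}{6}\arctan\frac{x}{\sqrt{2}} + \frac{1}{2}\,\frac{1}{x^2 + 2} + C.$$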
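The remark that each decomposition "can be verified directly" can be illustrated with exact rational arithmetic. The sketch below compares both sides of the decompositions in Examples 4, 5, and 6 at several rational points, using the constants A_1 = 3/2, A_2 = -1/2, A_3 = -1 (Example 4), A = 1, B = 2, C = 3 (Example 5), and A = 1/3, B = 2/3, C = -1/3, D = -1, E = 0 (Example 6). The function names and sample points are assumptions of this illustration. After clearing fractions each identity is a polynomial equation of degree at most 4, so agreement at seven distinct points is already conclusive.

```python
from fractions import Fraction as F

def gap_example4(x):
    lhs = (x**2 + 2*x + 3) / ((x - 1) * (x + 1)**2)
    rhs = F(3, 2) / (x - 1) - F(1, 2) / (x + 1) - 1 / (x + 1)**2
    return lhs - rhs

def gap_example5(x):
    lhs = (3*x**2 + 2*x - 2) / (x**3 - 1)
    rhs = 1 / (x - 1) + (2*x + 3) / (x**2 + x + 1)
    return lhs - rhs

def gap_example6(x):
    lhs = (x**4 - x**3 + 2*x**2 - x + 2) / ((x - 1) * (x**2 + 2)**2)
    rhs = (F(1, 3) / (x - 1) + (F(2, 3)*x - F(1, 3)) / (x**2 + 2)
           - x / (x**2 + 2)**2)
    return lhs - rhs

# Rational sample points that avoid the poles x = 1 and x = -1.
samples = [F(2), F(3), F(-2), F(1, 2), F(-5, 3), F(7, 4), F(10)]
all_gaps = [gap(x) for gap in (gap_example4, gap_example5, gap_example6)
            for x in samples]
print(all(g == 0 for g in all_gaps))
```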
The foregoing examples are typical of what happens in general. The problem of integrating a proper rational function reduces to that of calculating integrals of the forms
$$\int \frac{dx}{(x + a)^n}, \qquad \int \frac{x\,dx}{(x^2 + bx + c)^m}, \qquad\text{and}\qquad \int \frac{dx}{(x^2 + bx + c)^m}.$$
The first integral is log |x + a| if n = 1 and $(x + a)^{1-n}/(1 - n)$ if n > 1. To treat the other
two, we express the quadratic as a sum of two squares by writing
$$x^2 + bx + c = \left(x + \frac{b}{2}\right)^2 + \left(c - \frac{b^2}{4}\right) = u^2 + \alpha^2,$$
where $u = x + b/2$ and $\alpha = \tfrac{1}{2}\sqrt{4c - b^2}$. (This is possible because $4c - b^2 > 0$.) The
substitution u = x + b/2 reduces the problem to that of computing
$$\int \frac{u\,du}{(u^2 + \alpha^2)^m} \qquad\text{and}\qquad \int \frac{du}{(u^2 + \alpha^2)^m}. \tag{6.60}$$
The first of these is $\tfrac{1}{2}\log(u^2 + \alpha^2)$ if m = 1, and $\tfrac{1}{2}(u^2 + \alpha^2)^{1-m}/(1 - m)$ if m > 1. When
m = 1, the second integral in (6.60) is evaluated by the formula
$$\int \frac{du}{u^2 + \alpha^2} = \frac{1}{\alpha}\arctan\frac{u}{\alpha} + C.$$
The case m > 1 may be reduced to the case m = 1 by repeated application of the recursion
formula
$$\int \frac{du}{(u^2 + \alpha^2)^m} = \frac{1}{2\alpha^2(m - 1)}\,\frac{u}{(u^2 + \alpha^2)^{m-1}} + \frac{2m - 3}{2\alpha^2(m - 1)}\int \frac{du}{(u^2 + \alpha^2)^{m-1}},$$
which is obtained by integration by parts. This discussion shows that every rational
function may be integrated in terms of polynomials, rational functions, inverse tangents,
and logarithms.
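The recursion formula lends itself to a short recursive routine. The following sketch (the function names, the value of α, and the test points are assumptions of this illustration) builds an antiderivative of $1/(u^2 + \alpha^2)^m$ by descending to the case m = 1 and then checks it against the integrand with a difference quotient.

```python
import math

def integral_power(u, alpha, m):
    """Antiderivative of 1/(u^2 + alpha^2)^m built from the recursion formula (C = 0)."""
    if m == 1:
        return math.atan(u / alpha) / alpha
    front = u / (2 * alpha**2 * (m - 1) * (u * u + alpha**2) ** (m - 1))
    factor = (2 * m - 3) / (2 * alpha**2 * (m - 1))
    return front + factor * integral_power(u, alpha, m - 1)

def derivative_error(u, alpha, m, h=1e-5):
    """Difference between a numerical derivative of the result and the integrand."""
    approx = (integral_power(u + h, alpha, m) - integral_power(u - h, alpha, m)) / (2 * h)
    return abs(approx - 1 / (u * u + alpha**2) ** m)

worst = max(derivative_error(u, 1.5, m)
            for m in (1, 2, 3, 4) for u in (-2.0, 0.3, 1.0))
print(worst)
```

Each call with exponent m makes m - 1 recursive calls, mirroring the "repeated application" described in the text.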
6.24 Integrals which can be transformed into integrals of rational functions
A function of two variables defined by an equation of the form
$$P(x, y) = \sum_{m=0}^{p} \sum_{n=0}^{q} a_{m,n}\,x^m y^n$$
is called a polynomial in two variables. The quotient of two such polynomials is called a
rational function of two variables. Integrals of the form $\int R(\sin x, \cos x)\,dx$, where R is a
rational function of two variables, may be reduced by the substitution $u = \tan\tfrac{1}{2}x$ to
integrals of the form $\int r(u)\,du$ where r is a rational function of one variable. The latter
integral may be evaluated by the techniques just described. We illustrate the method with
a particular example.
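EXAMPLE 1. Integrate $\int \dfrac{dx}{\sin x + \cos x}$.
Solution. The substitution $u = \tan\tfrac{1}{2}x$ gives us
$$x = 2\arctan u, \qquad dx = \frac{2}{1 + u^2}\,du,$$
$$\sin x = 2\sin\frac{x}{2}\cos\frac{x}{2} = \frac{2\tan\tfrac{1}{2}x}{\sec^2\tfrac{1}{2}x} = \frac{2u}{1 + u^2},$$
$$\cos x = 2\cos^2\frac{x}{2} - 1 = \frac{2}{\sec^2\tfrac{1}{2}x} - 1 = \frac{2}{1 + u^2} - 1 = \frac{1 - u^2}{1 + u^2},$$
and
$$\sin x + \cos x = \frac{2u + 1 - u^2}{1 + u^2}.$$
Therefore, we have
$$\int \frac{dx}{\sin x + \cos x} = -2\int \frac{du}{u^2 - 2u - 1} = -2\int \frac{du}{(u - a)(u - b)},$$
where $a = 1 + \sqrt{2}$ and $b = 1 - \sqrt{2}$. The method of partial fractions leads to
$$\int \frac{du}{(u - a)(u - b)} = \frac{1}{a - b}\int \left(\frac{1}{u - a} - \frac{1}{u - b}\right) du$$
and, since $a - b = 2\sqrt{2}$, we obtain
$$\int \frac{dx}{\sin x + \cos x} = \frac{\sqrt{2}}{2}\log\left|\frac{u - b}{u - a}\right| + C = \frac{\sqrt{2}}{2}\log\left|\frac{\tan\tfrac{1}{2}x - 1 + \sqrt{2}}{\tan\tfrac{1}{2}x - 1 - \sqrt{2}}\right| + C. \tag{6.61}$$
The final answer may be simplified somewhat by using suitable trigonometric identities.
First we note that $\sqrt{2} - 1 = \tan\tfrac{1}{8}\pi$, so the numerator of the last fraction in (6.61) is
$\tan\tfrac{1}{2}x + \tan\tfrac{1}{8}\pi$. In the denominator we write
$$\tan\frac{x}{2} - 1 - \sqrt{2} = (\sqrt{2} + 1)\left[(\sqrt{2} - 1)\tan\frac{x}{2} - 1\right] = -(\sqrt{2} + 1)\left(1 - \tan\frac{x}{2}\tan\frac{\pi}{8}\right).$$
Taking logarithms as indicated in (6.61), we may combine the term $-\tfrac{1}{2}\sqrt{2}\,\log(\sqrt{2} + 1)$
with the arbitrary constant and rewrite (6.61) as follows:
$$\int \frac{dx}{\sin x + \cos x} = \frac{\sqrt{2}}{2}\log\left|\tan\left(\frac{x}{2} + \frac{\pi}{8}\right)\right| + C.$$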
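Both the half-angle formulas and the final answer of Example 1 can be checked numerically. The sketch below is an illustration only: the helper names, the tolerance, and the interval [0, 1] are choices made here. It tests the identities for sin x and cos x in terms of u = tan(x/2) and compares a composite Simpson approximation of the integral with the closed form.

```python
import math

def half_angle_identities_hold(x, tol=1e-12):
    """Check sin x = 2u/(1 + u^2) and cos x = (1 - u^2)/(1 + u^2) for u = tan(x/2)."""
    u = math.tan(x / 2)
    sin_ok = abs(math.sin(x) - 2 * u / (1 + u * u)) < tol
    cos_ok = abs(math.cos(x) - (1 - u * u) / (1 + u * u)) < tol
    return sin_ok and cos_ok

def closed_form(x):
    """Simplified antiderivative from Example 1, constant omitted."""
    return (math.sqrt(2) / 2) * math.log(abs(math.tan(x / 2 + math.pi / 8)))

def simpson(func, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

numeric = simpson(lambda x: 1 / (math.sin(x) + math.cos(x)), 0.0, 1.0)
exact = closed_form(1.0) - closed_form(0.0)
print(numeric, exact)
```

The interval [0, 1] is chosen so that sin x + cos x stays positive and the integrand has no singularity.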
In an earlier section we derived the integration formula
$$\int \frac{dx}{\sqrt{1 - x^2}} = \arcsin x$$
as a consequence of the formula for differentiating arcsin x. The presence of arcsin x
suggests that we could also evaluate this integral by the trigonometric substitution
t = arcsin x. We then have
$$x = \sin t, \qquad dx = \cos t\,dt, \qquad \sqrt{1 - x^2} = \sqrt{1 - \sin^2 t} = \cos t,$$
and we find that
$$\int \frac{dx}{\sqrt{1 - x^2}} = \int \frac{\cos t\,dt}{\cos t} = \int dt = t = \arcsin x.$$
This is always a good substitution to try if the integrand involves $\sqrt{1 - x^2}$. More
generally, any integral of the form $\int R(x, \sqrt{a^2 - x^2})\,dx$, where R is a rational function of
two variables, can be transformed by the substitution
$$x = a\sin t, \qquad dx = a\cos t\,dt,$$
into an integral of the form $\int R(a\sin t, a\cos t)\,a\cos t\,dt$. This, in turn, can always be
integrated by one of the methods described above.
EXAMPLE 2. Integrate $\int \dfrac{x\,dx}{4 - x^2 + \sqrt{4 - x^2}}$.
Solution. We let x = 2 sin t, dx = 2 cos t dt, $\sqrt{4 - x^2} = 2\cos t$, and we find that
$$\int \frac{x\,dx}{4 - x^2 + \sqrt{4 - x^2}} = \int \frac{4\sin t\cos t\,dt}{4\cos^2 t + 2\cos t} = \int \frac{\sin t\,dt}{\cos t + \tfrac{1}{2}}$$
$$= -\log\left|\tfrac{1}{2} + \cos t\right| + C = -\log\left(1 + \sqrt{4 - x^2}\right) + C.$$
The same method works for integrals of the form
$$\int R\left(x, \sqrt{a^2 - (cx + d)^2}\right) dx;$$
we use the trigonometric substitution cx + d = a sin t.
We can deal similarly with integrals of the form
$$\int R\left(x, \sqrt{a^2 + (cx + d)^2}\right) dx$$
by the substitution cx + d = a tan t, $c\,dx = a\sec^2 t\,dt$. For integrals of the form
$$\int R\left(x, \sqrt{(cx + d)^2 - a^2}\right) dx,$$
we use the substitution cx + d = a sec t, c dx = a sec t tan t dt. In either case, the new
integrand becomes a rational function of sin t and cos t.
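6.25 Exercises
Evaluate the following integrals:
1. $\int \dfrac{2x + 3}{(x - 2)(x + 5)}\,dx$.
2. $\int \dfrac{x\,dx}{(x + 1)(x + 2)(x + 3)}$.
3. $\int \dfrac{x\,dx}{x^3 - 3x + 2}$.
4. $\int \dfrac{x^4 + 2x - 6}{x^3 + x^2 - 2x}\,dx$.
5. $\int \dfrac{8x^3 + 7}{(x + 1)(2x + 1)^3}\,dx$.
6. $\int \dfrac{4x^2 + x + 1}{x^3 - 1}\,dx$.
7. $\int \dfrac{x^4\,dx}{x^4 + 5x^2 + 4}$.
8. $\int \dfrac{x + 2}{x^2 + x}\,dx$.
9. $\int \dfrac{dx}{x(x^2 + 1)^2}$.
10. $\int \dfrac{dx}{(x + 1)(x + 2)^2(x + 3)^3}$.
11. $\int \dfrac{x\,dx}{(x + 1)^2}$.
12. $\int \dfrac{dx}{x^3 - x}$.
13. $\int \dfrac{x^2\,dx}{x^2 + x - 6}$.
14. $\int \dfrac{(x + 2)\,dx}{x^2 - 4x + 4}$.
15. $\int \dfrac{dx}{(x^2 - 4x + 4)(x^2 - 4x + 5)}$.
16. $\int \dfrac{(x - 3)\,dx}{x^3 + 3x^2 + 2x}$.
17. $\int \dfrac{dx}{(x^2 - 1)^2}$.
18. $\int \dfrac{x + 1}{x^3 - 1}\,dx$.
19. $\int \dfrac{x^4 + 1}{x(x^2 + 1)^2}\,dx$.
20. $\int \dfrac{dx}{x^4 - 2x^3}$.
21. $\int \dfrac{1 - x^3}{x(x^2 + 1)}\,dx$.
22. $\int \dfrac{dx}{x^4 - 1}$.
23. $\int \dfrac{dx}{x^4 + 1}$.
24. $\int \dfrac{x^2\,dx}{(x^2 + 2x + 2)^2}$.
25. $\int \dfrac{4x^5 - 1}{(x^5 + x + 1)^2}\,dx$.
26. $\int \dfrac{dx}{2\sin x - \cos x + 5}$.
27. $\int \dfrac{dx}{1 + a\cos x}$ (0 < a < 1).
28. $\int \dfrac{dx}{1 + a\cos x}$ (a > 1).
29. $\int \dfrac{\sin^2 x}{1 + \sin^2 x}\,dx$.
30. $\int \dfrac{dx}{a^2\sin^2 x + b^2\cos^2 x}$ (ab ≠ 0).
31. $\int \dfrac{dx}{(a\sin x + b\cos x)^2}$ (a ≠ 0).
32. $\int_0^{\pi/2} \dfrac{\sin x\,dx}{1 + \cos x + \sin x}$.
33. $\int \sqrt{3 - x^2}\,dx$.
34. $\int \dfrac{x}{\sqrt{3 - x^2}}\,dx$.
35. $\int \dfrac{\sqrt{3 - x^2}}{x}\,dx$.
36. $\int \dfrac{\sqrt{x^2 + x}}{x}\,dx$.
37. $\int \sqrt{x^2 + 5}\,dx$.
38. $\int \dfrac{x}{\sqrt{x^2 + x + 1}}\,dx$.
39. $\int \dfrac{dx}{\sqrt{x^2 + x}}$.
40. $\int \dfrac{\sqrt{2 - x - x^2}}{x^2}\,dx$.
[Hint: In Exercise 40, multiply numerator and denominator by $\sqrt{2 - x - x^2}$.]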
6.26 Miscellaneous review exercises
1. Let $f(x) = \int_1^x (\log t)/(t + 1)\,dt$ if x > 0. Compute f(x) + f(1/x). As a check, you should
obtain $f(2) + f(\tfrac{1}{2}) = \tfrac{1}{2}\log^2 2$.
2. Find a function f, continuous for all x (and not everywhere zero), such that
$$f^2(x) = \int_0^x f(t)\,\frac{\sin t}{2 + \cos t}\,dt.$$
3. Try to evaluate $\int e^x/x\,dx$ by using integration by parts.
4. Integrate $\int_0^{\pi/2} \log(e^{\cos x})\,dx$.
5. A function f is defined by the equation
$$f(x) = \sqrt{\frac{4x + 2}{x(x + 1)(x + 2)}} \quad\text{if } x > 0.$$
(a) Find the slope of the graph of f at the point for which x = 1.
(b) The region under the graph and above the interval [1, 4] is rotated about the x-axis, thus
generating a solid of revolution. Write an integral for the volume of this solid. Compute this
integral and show that its value is $\pi \log(25/8)$.
6. A function F is defined by the following indefinite integral:
$$F(x) = \int_1^x \frac{e^t}{t}\,dt \quad\text{if } x > 0.$$
(a) For what values of x is it true that log x ≤ F(x)?
(b) Prove that $\int_1^x e^t/(t + a)\,dt = e^{-a}[F(x + a) - F(1 + a)]$.
(c) In a similar way, express the following integrals in terms of F:
$$\int_1^x \frac{e^{at}}{t}\,dt, \qquad \int_1^x \frac{e^t}{t^2}\,dt, \qquad \int_1^x e^{1/t}\,dt.$$
7. In each case, give an example of a continuous function f satisfying the conditions stated for all
real x, or else explain why there is no such function:
(a) $\int_0^x f(t)\,dt = e^x$.
(b) $\int_0^{x^2} f(t)\,dt = 1 - 2^{x^2}$. [$2^{x^2}$ means $2^{(x^2)}$.]
(c) $\int_0^x f(t)\,dt = f^2(x) - 1$.
8. If f(x + y) = f(x)f(y) for all x and y and if f(x) = 1 + xg(x), where g(x) → 1 as x → 0,
prove that (a) f'(x) exists for every x, and (b) $f(x) = e^x$.
9. Given a function g which has a derivative g'(x) for every real x and which satisfies the following
equations:
$$g'(0) = 2 \qquad\text{and}\qquad g(x + y) = e^y g(x) + e^x g(y) \quad\text{for all } x \text{ and } y.$$
(a) Show that $g(2x) = 2e^x g(x)$ and find a similar formula for g(3x).
(b) Generalize (a) by finding a formula relating g(nx) to g(x), valid for every positive integer
n. Prove your result by induction.
(c) Show that g(0) = 0 and find the limit of g(h)/h as h → 0.
(d) There is a constant C such that $g'(x) = g(x) + Ce^x$ for all x. Prove this statement and
find the value of C. [Hint: Use the definition of the derivative g'(x).]
10. A periodic function with period a satisfies f(x + a) = f(x) for all x in its domain. What can
you conclude about a function which has a derivative everywhere and satisfies an equation of
the form
$$f(x + a) = b\,f(x)$$
for all x, where a and b are positive constants?
11. Use logarithmic differentiation to derive the formulas for differentiation of products and
quotients from the corresponding formulas for sums and differences.
12. Let $A = \int_0^1 e^t/(t + 1)\,dt$. Express the values of the following integrals in terms of A:
(a) $\int_{a-1}^{a} \dfrac{e^{-t}}{t - a - 1}\,dt$.
(b) $\int_0^1 \dfrac{t\,e^{t^2}}{t^2 + 1}\,dt$.
(c) $\int_0^1 \dfrac{e^t}{(t + 1)^2}\,dt$.
(d) $\int_0^1 e^t \log(1 + t)\,dt$.
13. Let $p(x) = c_0 + c_1 x + c_2 x^2$ and let $f(x) = e^x p(x)$.
(a) Show that $f^{(n)}(0)$, the nth derivative of f at 0, is $c_0 + nc_1 + n(n - 1)c_2$.
(b) Solve the problem when p is a polynomial of degree 3.
(c) Generalize to a polynomial of degree m.
14. Let f(x) = x sin ax. Show that $f^{(2n)}(x) = (-1)^n(a^{2n}x\sin ax - 2na^{2n-1}\cos ax)$.
15. Prove that
$$\sum_{k=0}^{n} (-1)^k \binom{n}{k} \frac{1}{k + m + 1} = \sum_{k=0}^{m} (-1)^k \binom{m}{k} \frac{1}{k + n + 1}.$$
[Hint: $1/(k + m + 1) = \int_0^1 t^{k+m}\,dt$.]
16. Let $F(x) = \int_0^x f(t)\,dt$. Determine a formula (or formulas) for computing F(x) for all real x
if f is defined as follows:
(a) $f(t) = (t + |t|)^2$.
(b) $f(t) = 1 - t^2$ if |t| ≤ 1, and $f(t) = 1 - |t|$ if |t| > 1.
(c) $f(t) = e^{-|t|}$.
(d) f(t) = the maximum of 1 and $t^2$.
17. A solid of revolution is generated by rotating the graph of a continuous function f around
the interval [0, a] on the x-axis. If, for every a > 0, the volume is $a^2 + a$, find the function f.
18. Let $f(x) = e^{-2x}$ for all x. Denote by S(t) the ordinate set of f over the interval [0, t], where
t > 0. Let A(t) be the area of S(t), V(t) the volume of the solid obtained by rotating S(t)
about the x-axis, and W(t) the volume of the solid obtained by rotating S(t) about the y-axis.
Compute the following: (a) A(t); (b) V(t); (c) W(t); (d) lim,,, V(t)/A(t).
19. Let c be the number such that sinh c = 3/4. (Do not attempt to compute c.) In each case
find all those x (if any exist) satisfying the given equation. Express your answers in terms of
log 2 and log 3.
(a) $\log(e^x + \sqrt{e^{2x} + 1}) = c$.
(b) $\log(e^x - \sqrt{e^{2x} - 1}) = c$.
20. Determine whether each of the following statements is true or false. Prove each true statement.
(a) $2^{\log 5} = 5^{\log 2}$.
(b) $\log_2 5 = \dfrac{\log_3 5}{\log_2 3}$.
(c) $\sum_{k=1}^{n} k^{-1/2} < 2\sqrt{n}$ for every n ≥ 1.
(d) 1 + sinh x ≤ cosh x for every x.
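In Exercises 21 through 24, establish each inequality by examining the sign of the derivative of
an appropriate function.
21. $\dfrac{2}{\pi}x < \sin x < x$ if $0 < x < \dfrac{\pi}{2}$.
22. $\dfrac{1}{x + \tfrac{1}{2}} < \log\left(1 + \dfrac{1}{x}\right) < \dfrac{1}{x}$ if x > 0.
23. $x - \dfrac{x^3}{6} < \sin x < x$ if x > 0.
24. $(x^b + y^b)^{1/b} < (x^a + y^a)^{1/a}$ if x > 0, y > 0, and 0 < a < b.
25. Show that
(a) $\int_0^x e^{-t}\,t\,dt = e^{-x}(e^x - 1 - x)$.
(b) $\int_0^x e^{-t}\,t^2\,dt = 2!\,e^{-x}\left(e^x - 1 - x - \dfrac{x^2}{2!}\right)$.
(c) $\int_0^x e^{-t}\,t^3\,dt = 3!\,e^{-x}\left(e^x - 1 - x - \dfrac{x^2}{2!} - \dfrac{x^3}{3!}\right)$.
(d) Guess the generalization suggested and prove it by induction.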
26. If a, b, $a_1$, $b_1$ are given, with ab ≠ 0, show that there exist constants A, B, C such that
$$\int \frac{a_1\sin x + b_1\cos x}{a\sin x + b\cos x}\,dx = Ax + B\log|a\sin x + b\cos x| + C.$$
[Hint: Show that A and B exist such that
$$a_1\sin x + b_1\cos x = A(a\sin x + b\cos x) + B(a\cos x - b\sin x).]$$
27. In each case, find a function f satisfying the given conditions.
(a) $f'(x^2) = 1/x$ for x > 0, f(1) = 1.
(b) $f'(\sin^2 x) = \cos^2 x$ for all x, f(1) = 1.
(c) $f'(\sin x) = \cos^2 x$ for all x, f(1) = 1.
(d) $f'(\log x) = 1$ for 0 < x ≤ 1 and $f'(\log x) = x$ for x > 1, f(0) = 0.
28. A function, called the integral logarithm and denoted by Li, is defined as follows:
$$\operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \quad\text{if } x \ge 2.$$
This function occurs in analytic number theory where it is proved that Li(x) is a very good
approximation to the number of primes ≤ x. Derive the following properties of Li(x):
(a) $\operatorname{Li}(x) = \dfrac{x}{\log x} + \int_2^x \dfrac{dt}{\log^2 t} - \dfrac{2}{\log 2}$.
(b) $\operatorname{Li}(x) = \dfrac{x}{\log x} + \sum_{k=1}^{n-1} \dfrac{k!\,x}{\log^{k+1} x} + n!\int_2^x \dfrac{dt}{\log^{n+1} t} + C_n$,
where $C_n$ is a constant (depending on n). Find this constant.
(c) Show that there is a constant b such that $\int_b^{\log x} e^t/t\,dt = \operatorname{Li}(x)$ and find the value of b.
(d) Express $\int_c^x e^{2t}/(t - 1)\,dt$ in terms of the integral logarithm, where $c = 1 + \tfrac{1}{2}\log 2$.
(e) Let $f(x) = e^4\operatorname{Li}(e^{2x-4}) - e^2\operatorname{Li}(e^{2x-2})$ if x > 3. Show that
$$f'(x) = \frac{e^{2x}}{x^2 - 3x + 2}.$$
29. Let f(x) = log |x| if x < 0. Show that f has an inverse, and denote this inverse by g. What
is the domain of g? Find a formula for computing g(y) for each y in the domain of g. Sketch
the graph of g.
30. Let $f(x) = \int_0^x (1 + t^3)^{-1/2}\,dt$ if x ≥ 0. (Do not attempt to evaluate this integral.)
(a) Show that f is strictly increasing on the nonnegative real axis.
(b) Let g denote the inverse of f. Show that the second derivative of g is proportional to $g^2$
[that is, $g''(y) = cg^2(y)$ for each y in the domain of g] and find the constant of proportionality.
POLYNOMIAL APPROXIMATIONS TO FUNCTIONS
7.1 Introduction
Polynomials are among the simplest functions that occur in analysis. They are pleasant
to work with in numerical computations because their values may be found by performing
a finite number of multiplications and additions. In Chapter 6 we showed that the logarithm
function can be approximated by polynomials that enable us to compute logarithms to any
desired degree of accuracy. In this chapter we will show that many other functions, such
as the exponential and trigonometric functions, can also be approximated by polynomials.
If the difference between a function and its polynomial approximation is sufficiently small,
then we can, for practical purposes, compute with the polynomial in place of the original
function.
There are many ways to approximate a given function f by polynomials, depending on
what use is to be made of the approximation. In this chapter we shall be interested in
obtaining a polynomial which agrees with f and some of its derivatives at a given point.
We begin our discussion with a simple example.
Suppose f is the exponential function, $f(x) = e^x$. At the point x = 0, the function f and
all its derivatives have the value 1. The linear polynomial
$$g(x) = 1 + x$$
also has g(0) = 1 and g'(0) = 1, so it agrees with f and its first derivative at 0. Geometrically,
this means the graph of g is the tangent line of f at the point (0, 1), as shown in Figure 7.1.
If we approximate f by a quadratic polynomial Q which agrees with f and its first two
derivatives at 0, we might expect a better approximation to f than the linear function g, at
least near the point (0, 1). The polynomial
$$Q(x) = 1 + x + \tfrac{1}{2}x^2$$
has Q(0) = Q'(0) = 1 and Q''(0) = f''(0) = 1. Figure 7.1 shows that the graph of Q
approximates the curve $y = e^x$ more closely than the line y = 1 + x near the point (0, 1).
We can improve further the accuracy of the approximation by using polynomials which
agree with f in the third and higher derivatives as well. It is easy to verify that the polynomial
$$P(x) = \sum_{k=0}^{n} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} \tag{7.1}$$
agrees with the exponential function and its first n derivatives at the point x = 0. Of
course, before we can use such polynomials to compute approximate values for the
exponential function, we need some information about the error made in the approximation.
Rather than discuss this particular example in more detail, we turn now to the general
theory.
FIGURE 7.1 Polynomial approximations to the curve $y = e^x$ near (0, 1).
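As a rough numerical illustration of how the polynomials in (7.1) approach the exponential, the sketch below evaluates the polynomial at x = 1 for a few values of n and records the distance from e. The function name `exp_taylor` and the chosen degrees are assumptions of this example; nothing here replaces the error analysis that the general theory provides.

```python
import math

def exp_taylor(x, n):
    """Evaluate P(x) = sum_{k=0}^{n} x^k / k! from (7.1)."""
    term, total = 1.0, 1.0
    for k in range(1, n + 1):
        term *= x / k          # turns x^(k-1)/(k-1)! into x^k/k!
        total += term
    return total

errors = [abs(math.exp(1.0) - exp_taylor(1.0, n)) for n in (1, 2, 4, 8)]
print(errors)
```

The errors shrink as n grows, which is the behavior suggested by Figure 7.1 for the linear and quadratic cases.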
7.2 The Taylor polynomials generated by a function
Suppose f has derivatives up to order n at the point x = 0, where n ≥ 1, and let us
try to find a polynomial P which agrees with f and its first n derivatives at 0. There are n + 1
conditions to be satisfied, namely
$$P(0) = f(0), \qquad P'(0) = f'(0), \qquad \ldots, \qquad P^{(n)}(0) = f^{(n)}(0), \tag{7.2}$$
so we try a polynomial of degree n, say
$$P(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n, \tag{7.3}$$
with n + 1 coefficients to be determined. We shall use the conditions in (7.2) to determine
these coefficients in succession.
First, we put x = 0 in (7.3) and we find $P(0) = c_0$, so $c_0 = f(0)$. Next, we differentiate
both sides of (7.3) and then substitute x = 0 once more to find $P'(0) = c_1$; hence $c_1 = f'(0)$.
If we differentiate (7.3) again and put x = 0, we find that $P''(0) = 2c_2$, so $c_2 = f''(0)/2$.
After differentiating k times, we find that $P^{(k)}(0) = k!\,c_k$, and this gives us the formula
$$c_k = \frac{f^{(k)}(0)}{k!} \tag{7.4}$$
for k = 0, 1, 2, ..., n. [When k = 0, we interpret $f^{(0)}(0)$ to mean f(0).] This argument
proves that if a polynomial of degree ≤ n exists which satisfies (7.2), then its coefficients
are necessarily given by (7.4). (The degree of P will be equal to n if and only if $f^{(n)}(0) \ne 0$.)
Conversely, it is easy to verify that the polynomial P with coefficients given by (7.4) satisfies
(7.2), and therefore we have the following theorem.
THEOREM 7.1. Let f be a function with derivatives of order n at the point x = 0. Then
there exists one and only one polynomial P of degree ≤ n which satisfies the n + 1 conditions
$$P(0) = f(0), \qquad P'(0) = f'(0), \qquad \ldots, \qquad P^{(n)}(0) = f^{(n)}(0).$$
This polynomial is given by the formula
$$P(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!}\,x^k.$$
In the same way, we may show that there is one and only one polynomial of degree ≤ n
which agrees with f and its first n derivatives at a point x = a. In fact, instead of (7.3), we
may write P in powers of x - a and proceed as before. If we evaluate the derivatives at a
in place of 0, we are led to the polynomial
$$P(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x - a)^k. \tag{7.5}$$
This is the one and only polynomial of degree ≤ n which satisfies the conditions
$$P(a) = f(a), \qquad P'(a) = f'(a), \qquad \ldots, \qquad P^{(n)}(a) = f^{(n)}(a),$$
and it is referred to as a Taylor polynomial in honor of the English mathematician Brook
Taylor (1685-1731). More precisely, we say that the polynomial in (7.5) is the Taylor
polynomial of degree n generated by f at the point a.
It is convenient to have a notation that indicates the dependence of the Taylor polynomial
P on f and n. We shall indicate this dependence by writing $P = T_n f$ or $P = T_n(f)$. The
symbol $T_n$ is called the Taylor operator of degree n. When this operator is applied to a
function f, it produces a new function $T_n f$, the Taylor polynomial of degree n. The value
of this function at x is denoted by $T_n f(x)$ or by $T_n[f(x)]$. If we also wish to indicate the
dependence on a, we write $T_n f(x; a)$ instead of $T_n f(x)$.
EXAMPLE 1. When f is the exponential function, $f(x) = E(x) = e^x$, we have $E^{(k)}(x) = e^x$
for all k, so $E^{(k)}(0) = e^0 = 1$, and the Taylor polynomial of degree n generated by E at 0
is given by the formula
$$T_n E(x) = T_n(e^x) = \sum_{k=0}^{n} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!}.$$
If we want a polynomial which agrees with E and its derivatives at the point a = 1, we
have $E^{(k)}(1) = e$ for all k, so (7.5) gives us
$$T_n E(x; 1) = \sum_{k=0}^{n} \frac{e}{k!}\,(x - 1)^k.$$
EXAMPLE 2. When f(x) = sin x, we have f'(x) = cos x, f''(x) = -sin x, f'''(x) = -cos x,
$f^{(4)}(x) = \sin x$, etc., so $f^{(2n+1)}(0) = (-1)^n$ and $f^{(2n)}(0) = 0$. Thus only odd powers of x
appear in the Taylor polynomials generated by the sine function at 0. The Taylor polynomial
of degree 2n + 1 has the form
$$T_{2n+1}(\sin x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^n \frac{x^{2n+1}}{(2n + 1)!}.$$
EXAMPLE 3. Arguing as in Example 2, we find that the Taylor polynomials generated
by the cosine function at 0 contain only even powers of x. The polynomial of degree 2n
is given by
$$T_{2n}(\cos x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n \frac{x^{2n}}{(2n)!}.$$
Note that each Taylor polynomial $T_{2n}(\cos x)$ is the derivative of the Taylor polynomial
$T_{2n+1}(\sin x)$. This is due to the fact that the cosine itself is the derivative of the sine. In
the next section we learn that certain relations which hold between functions are transmitted
to their Taylor polynomials.
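The coefficient formula $c_k = f^{(k)}(a)/k!$ and the relation between the sine and cosine polynomials can be checked with exact fractions. In this sketch the helper names are hypothetical, and the derivative values of sin and cos at 0 are entered by hand from their four-step cycles.

```python
from fractions import Fraction
from math import factorial

def taylor_coefficients(derivatives_at_a):
    """Given [f(a), f'(a), ..., f^(n)(a)], return the coefficients c_k = f^(k)(a)/k!."""
    return [Fraction(d, factorial(k)) for k, d in enumerate(derivatives_at_a)]

def differentiate(coeffs):
    """Coefficients of the derivative of sum c_k (x - a)^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

# Derivatives of sin at 0 cycle through 0, 1, 0, -1; those of cos through 1, 0, -1, 0.
sin_derivs = [(0, 1, 0, -1)[k % 4] for k in range(8)]   # orders 0..7
cos_derivs = [(1, 0, -1, 0)[k % 4] for k in range(7)]   # orders 0..6

t7_sin = taylor_coefficients(sin_derivs)   # coefficients of T_7(sin x)
t6_cos = taylor_coefficients(cos_derivs)   # coefficients of T_6(cos x)
print(differentiate(t7_sin) == t6_cos)
```

Differentiating the degree-7 sine polynomial term by term reproduces the degree-6 cosine polynomial, the case n = 3 of the remark above.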
7.3 Calculus of Taylor polynomials
If a function f has derivatives of order n at a point a, we can always form its Taylor
polynomial $T_n f$ by the formula
$$T_n f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x - a)^k.$$
Sometimes the calculation of the derivatives $f^{(k)}(a)$ may become lengthy, so it is desirable
to have alternate methods for determining Taylor polynomials. The next theorem describes
properties of the Taylor operator that often enable us to obtain new Taylor polynomials
from given ones. In this theorem it is understood that all Taylor polynomials are generated
at a common point a.
7.2. The Taylor operator T, has the following properties:
(a) Linearity property. If c1 and c2 are constants, then
THEOREM
L(c,f + c,g) = c,T,(f) + cd”&) .
(b) DifSerentiation
property. The derivative of a Taylor polynomial off is a Taylor
poljnomial off ‘; in fact, we have
(Lf)’ = Tn-df’) .
(c) Integration property. An indejnite integral of a Taylor polynomial off is a Taylor
polynomial of an indejînite integral off. A 4ore p recisely, if g(x) = ja f(t) dt, then we
have
Tn+&) = j-u TJ-(0 dt .
Proof. Each statement (a), (b), or (c), is an equation involving two polynomials of the
same degree. TO prove each statement we simply observe that the polynomial which
appears on the left has the same value and the same derivatives at the point a as the one
which appears on the right. Then we invoke the uniqueness property of Theorem 7.1.
Note that differentiation of a polynomial lowers its degree, whereas integration increases
its degree.
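The differentiation and integration properties can be checked by machine on a small case. The following Python sketch is an illustration only, under these assumptions: polynomials are stored as lists of exact coefficients [c_0, c_1, ..., c_n], the point of expansion is a = 0, and the helper names (`taylor_sin`, `taylor_cos`, `differentiate`, `integrate`) are hypothetical names chosen for this example.

```python
from fractions import Fraction
from math import factorial


def taylor_sin(n):
    """Coefficients [c_0, ..., c_n] of T_n(sin x) at 0."""
    # The derivatives of sin at 0 repeat with period 4: 0, 1, 0, -1.
    cycle = [0, 1, 0, -1]
    return [Fraction(cycle[k % 4], factorial(k)) for k in range(n + 1)]


def taylor_cos(n):
    """Coefficients [c_0, ..., c_n] of T_n(cos x) at 0."""
    # The derivatives of cos at 0 repeat with period 4: 1, 0, -1, 0.
    cycle = [1, 0, -1, 0]
    return [Fraction(cycle[k % 4], factorial(k)) for k in range(n + 1)]


def differentiate(coeffs):
    """Coefficient list of the derivative of a polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def integrate(coeffs):
    """Coefficient list of the integral from 0 to x of a polynomial."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]


# Property (b): (T_7 sin)' = T_6(sin') = T_6(cos).
derivative_matches = differentiate(taylor_sin(7)) == taylor_cos(6)

# Property (c): with g(x) = integral of cos from 0 to x = sin x,
# T_7 g equals the integral of T_6(cos).
integral_matches = integrate(taylor_cos(6)) == taylor_sin(7)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than within a floating-point tolerance.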
The next theorem tells us what happens when we replace x by cx in a Taylor polynomial.
THEOREM 7.3. SUBSTITUTION PROPERTY. Let $g(x) = f(cx)$, where c is a constant. Then
we have
$$T_n g(x; a) = T_n f(cx; ca) .$$
In particular, when a = 0, we have $T_n g(x) = T_n f(cx)$.
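Proof. Since $g(x) = f(cx)$, the chain rule gives us
$$g'(x) = c f'(cx) , \quad g''(x) = c^2 f''(cx) , \quad \ldots , \quad g^{(k)}(x) = c^k f^{(k)}(cx) .$$
Hence we obtain
$$T_n g(x; a) = \sum_{k=0}^{n} \frac{g^{(k)}(a)}{k!} (x - a)^k = \sum_{k=0}^{n} \frac{f^{(k)}(ca)}{k!} (cx - ca)^k = T_n f(cx; ca) .$$
EXAMPLES.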
Replacing x by -x in the Taylor polynomial for $e^x$, we find that
$$T_n(e^{-x}) = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \cdots + (-1)^n \frac{x^n}{n!} .$$
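Since $\cosh x = \frac{1}{2} e^x + \frac{1}{2} e^{-x}$, we may use the linearity property to obtain
$$T_{2n}(\cosh x) = \tfrac{1}{2} T_{2n}(e^x) + \tfrac{1}{2} T_{2n}(e^{-x}) = 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \cdots + \frac{x^{2n}}{(2n)!} .$$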
The differentiation property gives us
$$T_{2n-1}(\sinh x) = x + \frac{x^3}{3!} + \frac{x^5}{5!} + \cdots + \frac{x^{2n-1}}{(2n - 1)!} .$$
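The three steps used in these examples (substitution of -x, linearity, differentiation) can be traced on coefficient lists. The Python sketch below is illustrative only; it assumes a = 0, stores a polynomial as its list of exact coefficients, and uses hypothetical helper names (`taylor_exp`, `substitute`, `combine`) that are not taken from the text.

```python
from fractions import Fraction
from math import factorial


def taylor_exp(n):
    """Coefficients [c_0, ..., c_n] of T_n(e^x) at 0."""
    return [Fraction(1, factorial(k)) for k in range(n + 1)]


def substitute(coeffs, c):
    """Coefficients of p(c*x) from those of p(x) (Theorem 7.3 with a = 0)."""
    return [coef * c**k for k, coef in enumerate(coeffs)]


def combine(c1, p, c2, q):
    """Coefficients of c1*p + c2*q (linearity property)."""
    return [c1 * a + c2 * b for a, b in zip(p, q)]


n = 6
exp_pos = taylor_exp(n)                # T_6(e^x)
exp_neg = substitute(exp_pos, -1)      # T_6(e^{-x})
half = Fraction(1, 2)
cosh_poly = combine(half, exp_pos, half, exp_neg)          # T_6(cosh x)
sinh_poly = [k * c for k, c in enumerate(cosh_poly)][1:]   # T_5(sinh x) by differentiation
```

Only even powers survive in `cosh_poly` and only odd powers in `sinh_poly`, in agreement with the displayed formulas.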
The next theorem is also useful in simplifying calculations of Taylor polynomials.
THEOREM 7.4. Let $P_n$ be a polynomial of degree $n \ge 1$. Let f and g be two functions
with derivatives of order n at 0 and assume that
$$f(x) = P_n(x) + x^n g(x) , \tag{7.6}$$
where $g(x) \to 0$ as $x \to 0$. Then $P_n$ is the Taylor polynomial generated by f at 0.
Proof. Let $h(x) = f(x) - P_n(x) = x^n g(x)$. By differentiating the product $x^n g(x)$
repeatedly, we see that h and its first n derivatives are 0 at x = 0. Therefore, f agrees with
$P_n$ and its first n derivatives at 0, so $P_n = T_n f$, as asserted.
EXAMPLES. From the algebraic identity
$$\frac{1}{1 - x} = 1 + x + x^2 + \cdots + x^n + \frac{x^{n+1}}{1 - x} , \tag{7.7}$$
valid for all $x \neq 1$, we see that (7.6) is satisfied with $f(x) = 1/(1 - x)$, $P_n(x) = 1 + x + \cdots + x^n$, and $g(x) = x/(1 - x)$. Since $g(x) \to 0$ as $x \to 0$, Theorem 7.4 tells us that
$$T_n\left(\frac{1}{1 - x}\right) = 1 + x + x^2 + \cdots + x^n .$$
Integration of this relation gives us the further Taylor polynomial
$$T_{n+1}[-\log (1 - x)] = x + \frac{x^2}{2} + \frac{x^3}{3} + \cdots + \frac{x^{n+1}}{n + 1} .$$
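In (7.7) we may replace x by $-x^2$ to get
$$\frac{1}{1 + x^2} = 1 - x^2 + x^4 - \cdots + (-1)^n x^{2n} - (-1)^n \frac{x^{2n+2}}{1 + x^2} .$$
Applying Theorem 7.4 once more, we find that
$$T_{2n}\left(\frac{1}{1 + x^2}\right) = \sum_{k=0}^{n} (-1)^k x^{2k} .$$
Integration of this relation leads to the formula
$$T_{2n+1}(\arctan x) = \sum_{k=0}^{n} (-1)^k \frac{x^{2k+1}}{2k + 1} .$$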
7.4 Exercises
1. Draw graphs of the Taylor polynomials $T_3(\sin x) = x - x^3/3!$ and $T_5(\sin x) = x - x^3/3! + x^5/5!$. Pay careful attention to the points where the curves cross the x-axis. Compare these
graphs with that of $f(x) = \sin x$.
2. Do the same as in Exercise 1 for the Taylor polynomials $T_2(\cos x)$, $T_4(\cos x)$, and $f(x) = \cos x$.
In Exercises 3 through 10, obtain the Taylor polynomials $T_n f(x)$ as indicated. In each case, it
is understood that f(x) is defined for all x for which f(x) is meaningful. Theorems 7.2, 7.3, and
7.4 will help simplify the computations in many cases.
3. $T_n(a^x) = \sum_{k=0}^{n} \frac{(\log a)^k}{k!} x^k$.
4. $T_n\left(\frac{1}{1 + x}\right) = \sum_{k=0}^{n} (-1)^k x^k$.
5. $T_{2n+1}\left(\frac{x}{1 - x^2}\right) = \sum_{k=0}^{n} x^{2k+1}$.
6. $T_n[\log (1 + x)] = \sum_{k=1}^{n} \frac{(-1)^{k+1} x^k}{k}$.
9. $T_n[(1 + x)^{\alpha}] = \sum_{k=0}^{n} \binom{\alpha}{k} x^k$, where $\binom{\alpha}{k} = \frac{\alpha(\alpha - 1) \cdots (\alpha - k + 1)}{k!}$.
10. $T_{2n}(\sin^2 x) = \sum_{k=1}^{n} (-1)^{k+1} \frac{2^{2k-1}}{(2k)!} x^{2k}$. [Hint: $\cos 2x = 1 - 2 \sin^2 x$.]
7.5 Taylor’s formula with remainder
We turn now to a discussion of the error in the approximation of a function f by its
Taylor polynomial $T_n f$ at a point a. The error is defined to be the difference $E_n(x) = f(x) - T_n f(x)$. Thus, if f has a derivative of order n at a, we may write
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + E_n(x) . \tag{7.8}$$
This is known as Taylor’s formula with remainder $E_n(x)$; it is useful whenever we can
estimate the size of $E_n(x)$. We shall express the error as an integral and then estimate the
size of the integral. To illustrate the principal ideas, we consider first the error arising
from a linear approximation.
THEOREM 7.5. Assume f has a continuous second derivative $f''$ in some neighborhood of a.
Then, for every x in this neighborhood, we have
$$f(x) = f(a) + f'(a)(x - a) + E_1(x) ,$$
where
$$E_1(x) = \int_a^x (x - t) f''(t) \, dt .$$
Proof. From the definition of the error we may write
$$E_1(x) = f(x) - f(a) - f'(a)(x - a) = \int_a^x f'(t) \, dt - f'(a) \int_a^x dt = \int_a^x [f'(t) - f'(a)] \, dt .$$
The last integral may be written as $\int_a^x u \, dv$, where $u = f'(t) - f'(a)$, and $v = t - x$. Now
$du/dt = f''(t)$ and $dv/dt = 1$, so the formula for integration by parts gives us
$$E_1(x) = \int_a^x u \, dv = uv \Big|_a^x - \int_a^x (t - x) f''(t) \, dt = \int_a^x (x - t) f''(t) \, dt ,$$
since u = 0 when t = a, and v = 0 when t = x. This proves the theorem.
The corresponding result for a polynomial approximation of degree n is given by the
following.
THEOREM 7.6. Assume f has a continuous derivative of order n + 1 in some interval
containing a. Then, for every x in this interval, we have the Taylor formula
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + E_n(x) ,$$
where
$$E_n(x) = \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t) \, dt .$$
Proof. The theorem is proved by induction on n. We have already proved it for n = 1.
Now we assume it is true for some n and prove it for n + 1. We write Taylor’s formula
(7.8) with n + 1 and with n and subtract to get
$$E_{n+1}(x) = E_n(x) - \frac{f^{(n+1)}(a)}{(n + 1)!} (x - a)^{n+1} .$$
Now we use the integral for $E_n(x)$ and note that $(x - a)^{n+1}/(n + 1) = \int_a^x (x - t)^n \, dt$ to
obtain
$$E_{n+1}(x) = \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t) \, dt - \frac{f^{(n+1)}(a)}{n!} \int_a^x (x - t)^n \, dt = \frac{1}{n!} \int_a^x (x - t)^n [f^{(n+1)}(t) - f^{(n+1)}(a)] \, dt .$$
The last integral may be written in the form $\int_a^x u \, dv$, where $u = f^{(n+1)}(t) - f^{(n+1)}(a)$ and $v = -(x - t)^{n+1}/(n + 1)$. Integrating by parts and noting that u = 0 when t = a, and that
v = 0 when t = x, we find that
$$E_{n+1}(x) = \frac{1}{n!} \int_a^x u \, dv = -\frac{1}{n!} \int_a^x v \, du = \frac{1}{(n + 1)!} \int_a^x (x - t)^{n+1} f^{(n+2)}(t) \, dt .$$
This completes the inductive step from n to n + 1, so the theorem is true for all $n \ge 1$.
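The integral form of the remainder in Theorem 7.6 can be compared numerically with the error computed directly from its definition. The Python sketch below is an illustration under stated assumptions: it takes f(x) = e^x and a = 0 (so that every derivative of f is again e^x), approximates the integral by the composite Simpson rule, and uses hypothetical function names chosen for this example.

```python
import math


def taylor_error_exp(n, x):
    """E_n(x) = e^x - T_n(e^x) at a = 0, computed from the definition."""
    return math.exp(x) - sum(x**k / math.factorial(k) for k in range(n + 1))


def integral_remainder_exp(n, x, steps=2000):
    """(1/n!) * integral from 0 to x of (x - t)^n e^t dt, by Simpson's rule.

    `steps` must be even. A negative x gives a negative step size, so the
    signed integral from 0 to x is still obtained.
    """
    h = x / steps

    def integrand(t):
        return (x - t)**n * math.exp(t)

    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * integrand(i * h)
    return total * h / 3 / math.factorial(n)


# The two quantities should agree up to the quadrature and rounding errors.
gap = abs(taylor_error_exp(3, 1.0) - integral_remainder_exp(3, 1.0))
```

The agreement is only numerical evidence for one function; the proof above is what establishes the formula in general.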
7.6 Estimates for the error in Taylor’s formula
Since the error $E_n(x)$ in Taylor’s formula has been expressed as an integral involving
the (n + 1)st derivative of f, we need some further information about $f^{(n+1)}$ before we can
estimate the size of $E_n(x)$. If upper and lower bounds for $f^{(n+1)}$ are known, we can deduce
corresponding upper and lower bounds for $E_n(x)$, as described in the next theorem.
THEOREM 7.7. If the (n + 1)st derivative of f satisfies the inequalities
$$m \le f^{(n+1)}(t) \le M \tag{7.9}$$
for all t in some interval containing a, then for every x in this interval we have the following
estimates:
$$m \frac{(x - a)^{n+1}}{(n + 1)!} \le E_n(x) \le M \frac{(x - a)^{n+1}}{(n + 1)!} \quad \text{if } x > a , \tag{7.10}$$
and
$$m \frac{(a - x)^{n+1}}{(n + 1)!} \le (-1)^{n+1} E_n(x) \le M \frac{(a - x)^{n+1}}{(n + 1)!} \quad \text{if } x < a . \tag{7.11}$$
Proof. Assume first that x > a. Then the integral for $E_n(x)$ is extended over the interval
[a, x]. For each t in this interval we have $(x - t)^n \ge 0$, so the inequalities in (7.9) give us
$$m \frac{(x - t)^n}{n!} \le \frac{(x - t)^n}{n!} f^{(n+1)}(t) \le M \frac{(x - t)^n}{n!} .$$
Integrating from a to x, we find that
$$\frac{m}{n!} \int_a^x (x - t)^n \, dt \le E_n(x) \le \frac{M}{n!} \int_a^x (x - t)^n \, dt . \tag{7.12}$$
The substitution u = x - t, du = -dt gives us
$$\int_a^x (x - t)^n \, dt = \int_0^{x-a} u^n \, du = \frac{(x - a)^{n+1}}{n + 1} ,$$
so (7.12) reduces to (7.10).
If x < a, the integration takes place over the interval [x, a]. For each t in this interval
we have $t \ge x$, so $(-1)^n (x - t)^n = (t - x)^n \ge 0$. Therefore, we may multiply the
inequalities (7.9) by the nonnegative factor $(-1)^n (x - t)^n / n!$ and integrate from x to a to
obtain (7.11).
EXAMPLE 1. If $f(x) = e^x$ and a = 0, we have the formula
$$e^x = \sum_{k=0}^{n} \frac{x^k}{k!} + E_n(x) .$$
Since $f^{(n+1)}(x) = e^x$, the derivative $f^{(n+1)}$ is monotonic increasing on every interval, and
therefore satisfies the inequalities $e^b \le f^{(n+1)}(t) \le e^c$ on every interval of the form [b, c].
In such an interval, the inequalities for $E_n(x)$ of Theorem 7.7 are satisfied with $m = e^b$ and
$M = e^c$. In particular, when b = 0, we have
$$\frac{x^{n+1}}{(n + 1)!} \le E_n(x) \le e^c \frac{x^{n+1}}{(n + 1)!} \quad \text{if } 0 < x \le c .$$
We can use these estimates to calculate the Euler number e. We take b = 0, c = 1,
x = 1, and use the inequality e < 3 to obtain
$$e = \sum_{k=0}^{n} \frac{1}{k!} + E_n(1) , \quad \text{where } \frac{1}{(n + 1)!} \le E_n(1) < \frac{3}{(n + 1)!} . \tag{7.13}$$
This enables us to compute e to any desired degree of accuracy. For example, if we want
the value of e correct to seven decimal places, we choose an n so that $3/(n + 1)! < \frac{1}{2} 10^{-8}$.
We shall see presently that n = 12 suffices. A table of values of 1/n! may be computed
rather quickly because 1/n! may be obtained from 1/(n - 1)! by simply dividing by n. The
following table for $3 \le n \le 12$ contains these numbers rounded off to nine decimals.
The “round-off error” in each case is indicated by a plus or minus sign which tells whether
the correct value exceeds or is less than the recorded value. (In any case, this error is less
than one-half unit in the last decimal place.)
 n     1/n!
 3     0.166 666 667  -
 4     0.041 666 667  -
 5     0.008 333 333  +
 6     0.001 388 889  -
 7     0.000 198 413  -
 8     0.000 024 802  -
 9     0.000 002 756  -
10     0.000 000 276  -
11     0.000 000 025  +
12     0.000 000 002  +
The terms corresponding to n = 0, 1, 2 have sum 5/2. Adding this to the sum of the entries
in the table (for $n \le 12$) we obtain a total of 2.718281830. If we take into account the
roundoff errors, the actual value of this sum may be less than this by as much as 7/2 of a unit
in the last decimal place (due to the seven minus signs) or may exceed this by as much as
3/2 of a unit in the last place (due to the three plus signs). Call the sum s. Then all we can
assert by this calculation is the inequality 2.718281826 < s < 2.718281832. Now the
estimates for the error $E_{12}(1)$ give us $0.000000000 \le E_{12}(1) < 0.000000001$. Since $e = s + E_{12}(1)$, this calculation leads to the following inequalities for e:
2.718281826 < e < 2.718281833.
This tells us that the value of e, correct to seven decimals, is e = 2.7182818, or that the
value of e, rounded off to eight decimals, is e = 2.71828183.
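The hand computation above can be repeated with exact rational arithmetic, which avoids the bookkeeping of round-off signs. The following Python sketch is an illustration only: the function name `e_bounds` is hypothetical, and the bounds it returns are exactly those of (7.13), namely s + 1/(n+1)! and s + 3/(n+1)!, where s is the partial sum of the series.

```python
import math
from fractions import Fraction


def e_bounds(n):
    """Rational lower and upper bounds for e obtained from (7.13)."""
    partial_sum = sum(Fraction(1, math.factorial(k)) for k in range(n + 1))
    lower = partial_sum + Fraction(1, math.factorial(n + 1))
    upper = partial_sum + Fraction(3, math.factorial(n + 1))
    return lower, upper


lower, upper = e_bounds(12)

# Both bounds share the same first seven decimals, so those digits of e are certain.
first_seven_lower = int(lower * 10**7)
first_seven_upper = int(upper * 10**7)
```

Because `Fraction` arithmetic is exact, the only inequality relied upon is e < 3, just as in the text.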
EXAMPLE 2. Irrationality of e. We can use the foregoing estimates for the error $E_n(1)$
to prove that e is irrational. First we rewrite the inequalities in (7.13) as follows:
$$\frac{1}{(n + 1)!} \le e - \sum_{k=0}^{n} \frac{1}{k!} < \frac{3}{(n + 1)!} .$$
Multiplying through by n!, we obtain
$$\frac{1}{n + 1} \le n! \, e - \sum_{k=0}^{n} \frac{n!}{k!} < \frac{3}{n + 1} \le \frac{3}{4} \tag{7.14}$$
if $n \ge 3$. For every n, the sum on k is an integer. If e were rational, we could choose n so
large that n! e would also be an integer. But then (7.14) would tell us that the difference
of these two integers is a positive number not exceeding 3/4, which is impossible. Therefore
e cannot be rational.
Polynomial approximations often enable us to obtain approximate numerical values for
integrals that cannot be evaluated directly in terms of elementary functions. A famous
example is the integral
$$f(x) = \int_0^x e^{-t^2} \, dt$$
which occurs in probability theory and in many physical problems. It is known that the
function f so defined is not an elementary function. That is to say, f cannot be obtained
from polynomials, exponentials, logarithms, trigonometric or inverse trigonometric
functions in a finite number of steps by using the operations of addition, subtraction,
multiplication, division, or composition. Other examples which occur rather frequently
in both theory and practice are the integrals
$$\int_0^x \frac{\sin t}{t} \, dt , \qquad \int_0^x \sin (t^2) \, dt , \qquad \int_0^x \sqrt{1 - k^2 \sin^2 t} \, dt .$$
(In the first of these, it is understood that the quotient (sin t)/t is to be replaced by 1 when
t = 0. In the third integral, k is a constant, 0 < k < 1.) We conclude this section with
an example which illustrates how Taylor’s formula may be used to obtain an accurate
estimate of the integral $\int_0^{1/2} e^{-t^2} \, dt$.
EXAMPLE 3. The Taylor formula for $e^x$ with n = 4 gives us
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + E_4(x) . \tag{7.15}$$
Suppose now that $x \le 0$. In any interval of the form [-c, 0] we have $e^{-c} \le e^x \le 1$, so we
may use the inequalities (7.11) of Theorem 7.7 with $m = e^{-c}$ and M = 1 to write
$$0 < (-1)^5 E_4(x) \le \frac{(-x)^5}{5!} \quad \text{if } x < 0 .$$
In other words, if x < 0, then $E_4(x)$ is negative and $\ge x^5/5!$. Replacing x by $-t^2$ in (7.15),
we have
$$e^{-t^2} = 1 - t^2 + \frac{t^4}{2!} - \frac{t^6}{3!} + \frac{t^8}{4!} + E_4(-t^2) , \tag{7.16}$$
where $-t^{10}/5! \le E_4(-t^2) < 0$. If $0 \le t \le \frac{1}{2}$, we find that $t^{10}/5! \le (\frac{1}{2})^{10}/5! < 0.000\,009$.
Thus, if we integrate (7.16) from 0 to $\frac{1}{2}$, we obtain
$$\int_0^{1/2} e^{-t^2} \, dt = \frac{1}{2} - \frac{1}{3 \cdot 2^3} + \frac{1}{5 \cdot 2^5 \cdot 2!} - \frac{1}{7 \cdot 2^7 \cdot 3!} + \frac{1}{9 \cdot 2^9 \cdot 4!} - \theta ,$$
where $0 < \theta < 0.000\,0045$. Rounding off to four decimals, we find $\int_0^{1/2} e^{-t^2} \, dt = 0.4613$.
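The estimate in Example 3 can be reproduced and compared with an independent value. The Python sketch below is illustrative only. It sums the five integrated terms exactly, and for comparison it uses the identity that the integral of e^{-t^2} from 0 to x equals (sqrt(pi)/2) erf(x), evaluated with the standard-library function `math.erf`; that comparison is an addition for checking purposes and is not part of the text's argument. The bound `error_bound` is the exact integral of t^10/5! over [0, 1/2], which is slightly sharper than the 0.0000045 used above.

```python
import math
from fractions import Fraction


def integral_estimate():
    """Sum of the integrals over [0, 1/2] of (-1)^k t^(2k) / k! for k = 0..4."""
    half = Fraction(1, 2)
    total = Fraction(0)
    for k in range(5):
        # integral of t^(2k) from 0 to 1/2 is (1/2)^(2k+1) / (2k+1)
        total += Fraction((-1)**k, math.factorial(k) * (2 * k + 1)) * half**(2 * k + 1)
    return total


approx = integral_estimate()
error_bound = Fraction(1, 2)**11 / (11 * math.factorial(5))

# Independent reference value via the error function (assumption: used only as a check).
reference = math.sqrt(math.pi) / 2 * math.erf(0.5)
theta = float(approx) - reference
```

The quantity `theta` plays the role of the correction in the text: it is positive and smaller than the bound.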
*7.7 Other forms of the remainder in Taylor’s formula
We have expressed the error in Taylor’s formula as an integral,
$$E_n(x) = \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t) \, dt .$$
It can also be expressed in many other forms. Since the factor $(x - t)^n$ in the integrand
never changes sign in the interval of integration, and since $f^{(n+1)}$ is continuous on this
interval, the weighted mean-value theorem for integrals (Theorem 3.16) gives us
$$\int_a^x (x - t)^n f^{(n+1)}(t) \, dt = f^{(n+1)}(c) \int_a^x (x - t)^n \, dt = f^{(n+1)}(c) \frac{(x - a)^{n+1}}{n + 1} ,$$
where c lies in the closed interval joining a and x. Therefore, the error can be written as
$$E_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!} (x - a)^{n+1} .$$
This is called Lagrange’s form of the remainder. It resembles the earlier terms in Taylor’s
formula, except that the derivative $f^{(n+1)}(c)$ is evaluated at some unknown point c rather
than at a. The point c depends on x and on n, as well as on f.
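For a specific function the unknown point c in Lagrange's form can sometimes be solved for explicitly. The Python sketch below is an illustration under stated assumptions: it takes f(x) = e^x and a = 0, so that Lagrange's form reads E_n(x) = e^c x^{n+1}/(n+1)!, and solves this equation for c. The function name `lagrange_point_exp` is hypothetical.

```python
import math


def lagrange_point_exp(n, x):
    """Solve E_n(x) = e^c * x^(n+1) / (n+1)! for c, where f = exp and a = 0."""
    error = math.exp(x) - sum(x**k / math.factorial(k) for k in range(n + 1))
    # error and x^(n+1) have the same sign, so the ratio below is positive.
    return math.log(error * math.factorial(n + 1) / x**(n + 1))


c_right = lagrange_point_exp(3, 1.0)    # expected to lie between 0 and 1
c_left = lagrange_point_exp(2, -1.0)    # expected to lie between -1 and 0
```

In both cases the computed c falls between a = 0 and x, and it changes when n or x changes, as the remark above indicates.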
Using a different type of argument, we can drop the continuity requirement on $f^{(n+1)}$
and derive Lagrange’s formula and other forms of the remainder under a weaker hypothesis.
Suppose that $f^{(n+1)}$ exists in some open interval (h, k) containing the point a, and assume
that $f^{(n)}$ is continuous in the closed interval [h, k]. Choose any $x \neq a$ in [h, k]. For
simplicity, say x > a. Keep x fixed and define a new function F on the interval [a, x] as
follows:
$$F(t) = f(t) + \sum_{k=1}^{n} \frac{f^{(k)}(t)}{k!} (x - t)^k .$$
Note that $F(x) = f(x)$ and $F(a) = T_n f(x; a)$, so $F(x) - F(a) = E_n(x)$. The function F is
continuous in the closed interval [a, x] and has a derivative in the open interval (a, x). If
we compute $F'(t)$, keeping in mind that each term of the sum defining F(t) is a product, we
find that all terms cancel except one, and we are left with the equation
$$F'(t) = \frac{(x - t)^n}{n!} f^{(n+1)}(t) .$$
Now let G be any function that is continuous on [a, x] and differentiable on (a, x). Then
we can apply Cauchy’s mean-value formula (Theorem 4.6) to write
$$G'(c)[F(x) - F(a)] = F'(c)[G(x) - G(a)] ,$$
for some c in the open interval (a, x). If $G'$ is nonzero in (a, x), this gives the following
formula for the error $E_n(x)$:
$$E_n(x) = \frac{F'(c)}{G'(c)} [G(x) - G(a)] .$$
We can express the error in various forms by different choices of G. For example, taking
$G(t) = (x - t)^{n+1}$, we obtain Lagrange’s form,
$$E_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!} (x - a)^{n+1} , \quad \text{where } a < c < x .$$
Taking $G(t) = x - t$, we obtain another formula, called Cauchy’s form of the remainder,
$$E_n(x) = \frac{f^{(n+1)}(c)}{n!} (x - c)^n (x - a) , \quad \text{where } a < c < x .$$
If $G(t) = (x - t)^p$, where $p \ge 1$, we obtain the formula
$$E_n(x) = \frac{f^{(n+1)}(c)}{n! \, p} (x - c)^{n+1-p} (x - a)^p , \quad \text{where } a < c < x .$$
7.8 Exercises
Examples of Taylor’s formula with remainder are given in Exercises 1, 2, and 3. In each case
prove that the error satisfies the given inequalities.
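1. $\sin x = \sum_{k=1}^{n} \frac{(-1)^{k-1} x^{2k-1}}{(2k - 1)!} + E_{2n}(x)$, $\quad |E_{2n}(x)| \le \frac{|x|^{2n+1}}{(2n + 1)!}$.
2. $\cos x = \sum_{k=0}^{n} \frac{(-1)^k x^{2k}}{(2k)!} + E_{2n+1}(x)$, $\quad |E_{2n+1}(x)| \le \frac{|x|^{2n+2}}{(2n + 2)!}$.
3. $\arctan x = \sum_{k=0}^{n-1} \frac{(-1)^k x^{2k+1}}{2k + 1} + E_{2n}(x)$, $\quad |E_{2n}(x)| \le \frac{x^{2n+1}}{2n + 1}$ if $0 \le x \le 1$.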
4. (a) Obtain the number $r = \sqrt{15} - 3$ as an approximation to the nonzero root of the equation
$x^2 = \sin x$ by using the cubic Taylor polynomial approximation to sin x.
(b) Show that the approximation in part (a) satisfies the inequality
$$|\sin r - r^2| < \frac{1}{200} ,$$
given that $\sqrt{15} - 3 < 0.9$. Is the difference $(\sin r - r^2)$ positive or negative? Give full
details of your reasoning.
5. (a) Use the cubic Taylor polynomial approximation to arctan x to obtain the number $r = (\sqrt{21} - 3)/2$ as an approximation to the nonzero root of the equation $\arctan x = x^2$.
(b) Given that $\sqrt{21} < 4.6$ and that $2^{16} = 65536$, prove that the approximation in part (a)
satisfies the inequality
$$|r^2 - \arctan r| < \frac{7}{100} .$$
Is the difference $(r^2 - \arctan r)$ positive or negative? Give full details of your reasoning.
6. Prove that $\int_0^1 \frac{1 + x^{30}}{1 + x^{60}} \, dx = 1 + \frac{c}{31}$, where 0 < c < 1.
7. Prove that $0.493948 < \int_0^{1/2} \frac{1}{1 + x^4} \, dx < 0.493958$.
8. (a) If $0 \le x \le \frac{1}{2}$, show that $\sin x = x - x^3/3! + r(x)$, where $|r(x)| < (\frac{1}{2})^5/5!$.
(b) Use the estimate in part (a) to find an approximate value for the integral $\int_0^{\sqrt{2}/2} \sin (x^2) \, dx$.
Make sure you give an estimate for the error.
9. Use the first three nonzero terms of Taylor’s formula for sin x to find an approximate value
for the integral $\int_0^1 (\sin x)/x \, dx$ and give an estimate for the error. [It is to be understood that
the quotient (sin x)/x is equal to 1 when x = 0.]
10. This exercise outlines a method for computing $\pi$, using Taylor’s formula for arctan x given in
Exercise 3. It is based on the fact that $\pi$ is nearly 3.2, so $\frac{1}{4}\pi$ is nearly 0.8 or $\frac{4}{5}$, and this is nearly
$4 \arctan \frac{1}{5}$. Let $\alpha = \arctan \frac{1}{5}$, $\beta = 4\alpha - \frac{1}{4}\pi$.
(a) Use the identity $\tan (A + B) = (\tan A + \tan B)/(1 - \tan A \tan B)$ with $A = B = \alpha$ and
then again with $A = B = 2\alpha$ to get $\tan 2\alpha = \frac{5}{12}$ and $\tan 4\alpha = \frac{120}{119}$. Then use the identity
once more with $A = 4\alpha$, $B = -\frac{1}{4}\pi$ to obtain $\tan \beta = \frac{1}{239}$. This yields the following
remarkable identity discovered in 1706 by John Machin (1680-1751):
$$\pi = 16 \arctan \tfrac{1}{5} - 4 \arctan \tfrac{1}{239} .$$
(b) Use the Taylor polynomial $T_{11}(\arctan x)$ with $x = \frac{1}{5}$ to show that
$$3.158328934 < 16 \arctan \tfrac{1}{5} < 3.158328972 .$$
(c) Use the Taylor polynomial $T_3(\arctan x)$ with $x = \frac{1}{239}$ to show that
$$-0.016736309 < -4 \arctan \tfrac{1}{239} < -0.016736300 .$$
(d) Use parts (a), (b) and (c) to show that the value of $\pi$, correct to seven decimals, is
3.1415926.
7.9 Further remarks on the error in Taylor’s formula. The o-notation
If f has a continuous (n + 1)st derivative in some interval containing a point a, we may
write Taylor’s formula in the form
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + E_n(x) . \tag{7.17}$$
Suppose we restrict x to lie in some closed interval [a - c, a + c] about a, in which $f^{(n+1)}$
is continuous. Then $f^{(n+1)}$ is bounded on this interval and hence satisfies an inequality of
the form
$$|f^{(n+1)}(t)| \le M ,$$
where M > 0. Hence, by Theorem 7.7, we have the error estimate
$$|E_n(x)| \le M \frac{|x - a|^{n+1}}{(n + 1)!}$$
for each x in [a - c, a + c]. If we keep $x \neq a$ and divide this inequality by $|x - a|^n$, we
find that
$$0 \le \left| \frac{E_n(x)}{(x - a)^n} \right| \le \frac{M}{(n + 1)!} |x - a| .$$
If now we let $x \to a$, we see that $E_n(x)/(x - a)^n \to 0$. We describe this by saying that the
error $E_n(x)$ is of smaller order than $(x - a)^n$ as $x \to a$.
In other words, under the conditions stated, f(x) may be approximated near a by a
polynomial in (x - a) of degree n, and the error in this approximation is of smaller order
than $(x - a)^n$ as $x \to a$.
A special notation, introduced in 1909 by E. Landau,† is particularly appropriate when
used in connection with Taylor’s formula. This is called the o-notation (the little-oh
notation) and it is defined as follows.
DEFINITION. Assume $g(x) \neq 0$ for all $x \neq a$ in some interval containing a. The notation
$$f(x) = o(g(x)) \quad \text{as } x \to a$$
means that
$$\lim_{x \to a} \frac{f(x)}{g(x)} = 0 .$$
The symbol f(x) = o(g(x)) is read “f(x) is little-oh of g(x),” or “f(x) is of smaller order
than g(x),” and it is intended to convey the idea that for x near a, f(x) is small compared
with g(x).
† Edmund Landau (1877-1938) was a famous German mathematician who made many important contributions to mathematics. He is best known for his lucid books in analysis and in the theory of numbers.
EXAMPLE 1. $f(x) = o(1)$ as $x \to a$ means that $f(x) \to 0$ as $x \to a$.
EXAMPLE 2. $f(x) = o(x)$ as $x \to 0$ means that $\frac{f(x)}{x} \to 0$ as $x \to 0$.
An equation of the form $f(x) = h(x) + o(g(x))$ is understood to mean that $f(x) - h(x) = o(g(x))$ or, in other words, $[f(x) - h(x)]/g(x) \to 0$ as $x \to a$.
EXAMPLE 3. We have $\sin x = x + o(x)$ because
$$\frac{\sin x - x}{x} = \frac{\sin x}{x} - 1 \to 0 \quad \text{as } x \to 0 .$$
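The meaning of the o-symbol can be made concrete by tabulating the relevant quotient for a sequence of points approaching 0. The Python sketch below is only a numerical illustration (it proves nothing); the variable names are hypothetical. It evaluates (sin x - x)/x, which should shrink toward 0 since sin x - x = o(x), and for contrast (sin x)/x, which tends to 1, so sin x itself is not o(x).

```python
import math

# Sample points x = 10^-1, 10^-2, ..., 10^-5 approaching 0.
xs = [10.0**(-k) for k in range(1, 6)]

# |sin x - x| / |x| should tend to 0: this is what sin x = x + o(x) asserts.
small_ratios = [abs((math.sin(x) - x) / x) for x in xs]

# sin x / x tends to 1, not 0, so sin x is NOT o(x) as x -> 0.
plain_ratios = [math.sin(x) / x for x in xs]
```

Each entry of `small_ratios` is smaller than the one before it, while the entries of `plain_ratios` settle near 1.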
The foregoing remarks concerning the error in Taylor’s formula can now be expressed
in the o-notation. We may write
$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + o((x - a)^n) \quad \text{as } x \to a ,$$
whenever the derivative $f^{(n+1)}$ is continuous in some closed interval containing the point a.
This expresses, in a brief way, the fact that the error term is small compared to $(x - a)^n$
when x is near a. In particular, from the discussion of earlier sections, we have the following
examples of Taylor’s formula expressed in the o-notation:
$$\frac{1}{1 - x} = 1 + x + x^2 + \cdots + x^n + o(x^n) \quad \text{as } x \to 0 .$$
$$\log (1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^{n-1} \frac{x^n}{n} + o(x^n) \quad \text{as } x \to 0 .$$
$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + o(x^n) \quad \text{as } x \to 0 .$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^{n-1} \frac{x^{2n-1}}{(2n - 1)!} + o(x^{2n}) \quad \text{as } x \to 0 .$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n \frac{x^{2n}}{(2n)!} + o(x^{2n+1}) \quad \text{as } x \to 0 .$$
$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots + (-1)^{n-1} \frac{x^{2n-1}}{2n - 1} + o(x^{2n}) \quad \text{as } x \to 0 .$$
In calculations involving Taylor approximations, it often becomes necessary to combine
several terms involving the o-symbol. A few simple rules for manipulating o-symbols are
discussed in the next theorem. These cover most situations that arise in practice.
THEOREM 7.8. ALGEBRA OF o-SYMBOLS. As $x \to a$, we have the following:
(a) $o(g(x)) \pm o(g(x)) = o(g(x))$.
(b) $o(c\,g(x)) = o(g(x))$ if $c \neq 0$.
(c) $f(x) \cdot o(g(x)) = o(f(x) g(x))$.
(d) $o(o(g(x))) = o(g(x))$.
(e) $\frac{1}{1 + g(x)} = 1 - g(x) + o(g(x))$ if $g(x) \to 0$ as $x \to a$.
Proof. The statement in part (a) is understood to mean that if $f_1(x) = o(g(x))$ and if
$f_2(x) = o(g(x))$, then $f_1(x) \pm f_2(x) = o(g(x))$. But since we have
$$\frac{f_1(x) \pm f_2(x)}{g(x)} = \frac{f_1(x)}{g(x)} \pm \frac{f_2(x)}{g(x)} ,$$
each term on the right tends to 0 as $x \to a$, so part (a) is proved. The statements in (b),
(c), and (d) are proved in a similar way.
To prove (e), we use the algebraic identity
$$\frac{1}{1 + u} = 1 - u + u \frac{u}{1 + u}$$
with u replaced by g(x) and then note that $\frac{g(x)}{1 + g(x)} \to 0$ as $x \to a$.
EXAMPLE 1. Prove that $\tan x = x + \frac{1}{3} x^3 + o(x^3)$ as $x \to 0$.
Solution. We use the Taylor approximations for the sine and cosine. From part (e) of
Theorem 7.8, with $g(x) = -\frac{1}{2} x^2 + o(x^3)$, we have
$$\frac{1}{\cos x} = \frac{1}{1 - \frac{1}{2} x^2 + o(x^3)} = 1 + \frac{1}{2} x^2 + o(x^2) \quad \text{as } x \to 0 .$$
Therefore, we have
$$\tan x = \frac{\sin x}{\cos x} = \left( x - \frac{1}{6} x^3 + o(x^4) \right) \left( 1 + \frac{1}{2} x^2 + o(x^2) \right) = x + \frac{1}{3} x^3 + o(x^3) .$$
EXAMPLE 2. Prove that $(1 + x)^{1/x} = e \cdot \left( 1 - \frac{x}{2} + \frac{11 x^2}{24} + o(x^2) \right)$ as $x \to 0$.
Solution. Since $(1 + x)^{1/x} = e^{(1/x) \log (1 + x)}$, we begin with a polynomial approximation
to log (1 + x). Taking a cubic approximation, we have
$$\log (1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} + o(x^3) , \qquad \frac{\log (1 + x)}{x} = 1 - \frac{x}{2} + \frac{x^2}{3} + o(x^2) ,$$
and so we obtain
$$(1 + x)^{1/x} = \exp \left( 1 - \frac{x}{2} + \frac{x^2}{3} + o(x^2) \right) = e \cdot e^u , \tag{7.18}$$
where $u = -x/2 + x^2/3 + o(x^2)$. But as $u \to 0$, we have $e^u = 1 + u + \frac{1}{2} u^2 + o(u^2)$, so we
obtain
$$e^u = 1 - \frac{x}{2} + \frac{x^2}{3} + o(x^2) + \frac{1}{2} \left( -\frac{x}{2} + \frac{x^2}{3} + o(x^2) \right)^2 + o(x^2) = 1 - \frac{x}{2} + \frac{11 x^2}{24} + o(x^2) .$$
When we use this in Equation (7.18), we obtain the desired formula.
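The expansion of Example 2 asserts that the difference between (1 + x)^{1/x} and e(1 - x/2 + 11x^2/24) is o(x^2). A numerical spot check is sketched below in Python; it is an illustration only, the function name `scaled_error` is hypothetical, and `math.log1p` is used merely to evaluate log(1 + x) accurately for small x.

```python
import math


def scaled_error(x):
    """[(1 + x)^(1/x) - e(1 - x/2 + 11 x^2 / 24)] / x^2, which should tend to 0."""
    exact = math.exp(math.log1p(x) / x)
    approx = math.e * (1 - x / 2 + 11 * x**2 / 24)
    return (exact - approx) / x**2


# Evaluate the quotient at points approaching 0 from the right.
errors = [abs(scaled_error(x)) for x in (0.1, 0.01, 0.001)]
```

The entries of `errors` decrease as x decreases, which is consistent with the o(x^2) claim.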
7.10 Applications to indeterminate forms
We have already illustrated how polynomial approximations are used in the computation
of function values. They can also be used as an aid in the calculation of limits. We illustrate
with some examples.
EXAMPLE 1. If a and b are positive numbers, determine the limit
$$\lim_{x \to 0} \frac{a^x - b^x}{x} .$$
Solution. We cannot solve this problem by computing the limit of the numerator and
denominator separately, because the denominator tends to 0 and the quotient theorem on
limits is not applicable. The numerator in this case also tends to 0 and the quotient is said
to assume the “indeterminate form 0/0” as $x \to 0$. Taylor’s formula and the o-notation
often enable us to calculate the limit of an indeterminate form like this one very simply.
The idea is to approximate the numerator $a^x - b^x$ by a polynomial in x, then divide by x
and let $x \to 0$. We could apply Taylor’s formula directly to $f(x) = a^x - b^x$ but, since
$a^x = e^{x \log a}$ and $b^x = e^{x \log b}$, it is simpler in this case to use the polynomial approximations
already derived for the exponential function. If we begin with the linear approximation
$$e^t = 1 + t + o(t) \quad \text{as } t \to 0$$
and replace t by x log a and x log b, respectively, we find
$$a^x = 1 + x \log a + o(x) \quad \text{and} \quad b^x = 1 + x \log b + o(x) \quad \text{as } x \to 0 .$$
Here we have used the fact that o(x log a) = o(x) and o(x log b) = o(x). If now we subtract
and note that o(x) - o(x) = o(x), we find $a^x - b^x = x(\log a - \log b) + o(x)$. Dividing
by x and using the relation o(x)/x = o(1), we obtain
$$\frac{a^x - b^x}{x} = \log \frac{a}{b} + o(1) \to \log \frac{a}{b} \quad \text{as } x \to 0 .$$
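The limit found in Example 1 can be spot-checked numerically. The Python sketch below is an illustration only: the values a = 3 and b = 2 are arbitrary choices made for this example, and the function name `quotient` is hypothetical.

```python
import math


def quotient(a, b, x):
    """The difference quotient (a^x - b^x) / x for x != 0."""
    return (a**x - b**x) / x


a, b = 3.0, 2.0                 # arbitrary positive numbers chosen for the check
limit = math.log(a / b)         # value predicted by the o-notation argument

# Distance between the quotient and log(a/b) at x = 10^-1, ..., 10^-6.
gaps = [abs(quotient(a, b, 10.0**(-k)) - limit) for k in range(1, 7)]
```

The gaps shrink steadily as x approaches 0, in agreement with the o(1) term in the final formula.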
EXAMPLE 2. Prove that $\lim_{x \to 0} \frac{1}{x} \left( \cot x - \frac{1}{x} \right) = -\frac{1}{3}$.
Solution. We use Example 1 of Section 7.9, and Theorem 7.8(e) to write
$$\cot x = \frac{1}{\tan x} = \frac{1}{x + \frac{1}{3} x^3 + o(x^3)} = \frac{1}{x} \cdot \frac{1}{1 + \frac{1}{3} x^2 + o(x^2)} = \frac{1}{x} \left( 1 - \frac{1}{3} x^2 + o(x^2) \right) = \frac{1}{x} - \frac{1}{3} x + o(x) .$$
Hence, we have
$$\frac{1}{x} \left( \cot x - \frac{1}{x} \right) = -\frac{1}{3} + o(1) \to -\frac{1}{3} \quad \text{as } x \to 0 .$$
EXAMPLE 3. Prove that $\lim_{x \to 0} \frac{\log (1 + ax)}{x} = a$ for every real a.
Solution. If a = 0, the result holds trivially. If $a \neq 0$, we use the linear approximation
log (1 + x) = x + o(x). Replacing x by ax, we obtain log (1 + ax) = ax + o(ax) =
ax + o(x). Dividing by x and letting $x \to 0$, we obtain the limit a.
EXAMPLE 4. Prove that for every real a, we have
$$\lim_{x \to 0} (1 + ax)^{1/x} = e^a . \tag{7.19}$$
Solution. We simply note that $(1 + ax)^{1/x} = e^{(1/x) \log (1 + ax)}$ and use the result of Example
3 along with the continuity of the exponential function.
Replacing ax by y in (7.19), we find another important limit relation:
$$\lim_{y \to 0} (1 + y)^{a/y} = e^a .$$
Sometimes these limit relations are taken as the starting point for the theory of the
exponential function.
7.11 Exercises
1. Find a quadratic polynomial P(x) such that $2^x = P(x) + o(x^2)$ as $x \to 0$.
2. Find a cubic polynomial P(x) such that $x \cos x = P(x) + o((x - 1)^3)$ as $x \to 1$.
3. Find the polynomial P(x) of smallest degree such that $\sin (x - x^2) = P(x) + o(x^6)$ as $x \to 0$.
4. Find constants a, b, c such that $\log x = a + b(x - 1) + c(x - 1)^2 + o((x - 1)^2)$ as $x \to 1$.
5. Recall that $\cos x = 1 - \frac{1}{2} x^2 + o(x^3)$ as $x \to 0$. Use this to prove that $x^{-2}(1 - \cos x) \to \frac{1}{2}$
as $x \to 0$. In a similar way, find the limit of $x^{-4}(1 - \cos 2x - 2x^2)$ as $x \to 0$.
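Evaluate the limits in Exercises 6 through 29.
6. $\lim_{x \to 0} \frac{\sin ax}{\sin bx}$.
7. $\lim_{x \to 0} \frac{\tan 2x}{\sin 3x}$.
8. $\lim_{x \to 0} \frac{\sin x - x}{x^3}$.
9. $\lim_{x \to 0} \frac{\log (1 + x)}{e^{2x} - 1}$.
10. $\lim_{x \to 0} \frac{1 - \cos^2 x}{x \tan x}$.
11. $\lim_{x \to 0} \frac{\sin x}{\arctan x}$.
12. $\lim_{x \to 0} \frac{a^x - 1}{b^x - 1}$, $b \neq 1$.
13. $\lim_{x \to 1} \frac{\log x}{x^2 + x - 2}$.
14. $\lim_{x \to 0} \frac{1 - \cos x^2}{x^2 \sin x^2}$.
15. $\lim_{x \to 0} \frac{x(e^x + 1) - 2(e^x - 1)}{x^3}$.
16. $\lim_{x \to 0} \frac{\log (1 + x) - x}{1 - \cos x}$.
17. $\lim_{x \to \frac{1}{2}\pi} \frac{\cos x}{x - \frac{1}{2}\pi}$.
18. $\lim_{x \to 1} \frac{[\sin (\pi/2x)](\log x)}{(x^3 + 5)(x - 1)}$.
19. $\lim_{x \to 0} \frac{\cosh x - \cos x}{x^2}$.
20. $\lim_{x \to 0} \frac{3 \tan 4x - 12 \tan x}{3 \sin 4x - 12 \sin x}$.
21. $\lim_{x \to 0} \frac{a^x - a^{\sin x}}{x^3}$.
22. $\lim_{x \to 0} \frac{\cos (\sin x) - \cos x}{x^4}$.
23. $\lim_{x \to 1} x^{1/(1 - x)}$.
24. $\lim_{x \to 0} (x + e^{2x})^{1/x}$.
25. $\lim_{x \to 0} \frac{(1 + x)^{1/x} - e}{x}$.
26. $\lim_{x \to 0} \left( \frac{(1 + x)^{1/x}}{e} \right)^{1/x}$.
27. $\lim_{x \to 0} \left( \frac{\arcsin x}{x} \right)^{1/x^2}$.
28. $\lim_{x \to 0} \left( \frac{1}{x} - \frac{1}{e^x - 1} \right)$.
29. $\lim_{x \to 1} \left( \frac{1}{\log x} - \frac{1}{x - 1} \right)$.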
30. For what value of the constant a will $x^{-2}(e^{ax} - e^x - x)$ tend to a finite limit as $x \to 0$? What
is the value of this limit?
31. Given two functions f and g with derivatives in some interval containing 0, where g is positive.
Assume also $f(x) = o(g(x))$ as $x \to 0$. Prove or disprove each of the following statements:
(a) $\int_0^x f(t) \, dt = o\left( \int_0^x g(t) \, dt \right)$ as $x \to 0$,
(b) $f'(x) = o(g'(x))$ as $x \to 0$.
32. (a) If $g(x) = o(1)$ as $x \to 0$, prove that
$$\frac{1}{1 + g(x)} = 1 - g(x) + g^2(x) + o(g^2(x)) \quad \text{as } x \to 0 .$$
(b) Use part (a) to prove that $\tan x = x + \frac{x^3}{3} + \frac{2x^5}{15} + o(x^5)$ as $x \to 0$.
33. A function f has a continuous third derivative everywhere and satisfies the relation
$$\lim_{x \to 0} \left( 1 + x + \frac{f(x)}{x} \right)^{1/x} = e^3 .$$
[Hint: If $\lim_{x \to 0} g(x) = A$, then $g(x) = A + o(1)$ as $x \to 0$.]
7.12 L’Hôpital’s rule for the indeterminate form 0/0
In many examples in the foregoing sections we have calculated the limit of a quotient
f(x)/g(x) in which both the numerator f(x) and the denominator g(x) approached 0. In
examples like these, the quotient f(x)/g(x) is said to assume the “indeterminate form 0/0.”
One way to attack problems on indeterminate forms is to obtain polynomial approximations to f(x) and g(x) as we did in treating the above examples. Sometimes the work can
be shortened by use of a differentiation technique known as L’Hôpital’s rule.† The basic
idea of the method is to study the quotient of derivatives $f'(x)/g'(x)$ and thereby to try to
deduce information about f(x)/g(x).
Before stating L’Hôpital’s rule, we show why the quotient of derivatives $f'(x)/g'(x)$ bears
a relation to the quotient f(x)/g(x). Suppose f and g are two functions with f(a) = g(a) = 0.
Then, for $x \neq a$, we have
$$\frac{f(x)}{g(x)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{\dfrac{f(x) - f(a)}{x - a}}{\dfrac{g(x) - g(a)}{x - a}} .$$
If the derivatives $f'(a)$ and $g'(a)$ exist, and if $g'(a) \neq 0$, then as $x \to a$ the quotient on the
right approaches $f'(a)/g'(a)$ and hence $f(x)/g(x) \to f'(a)/g'(a)$.
EXAMPLE. Compute $\lim_{x \to 0} \frac{1 - e^{2x}}{x}$.
Solution. Here $f(x) = 1 - e^{2x}$ and $g(x) = x$, so $f'(x) = -2e^{2x}$, $g'(x) = 1$. Hence we
have $f'(0)/g'(0) = -2$, so the limit in question is -2.
In L’Hôpital’s rule, no assumptions are made about f, g or their derivatives at the point
x = a. Instead, we assume that f(x) and g(x) approach 0 as $x \to a$ and that the quotient
$f'(x)/g'(x)$ tends to a finite limit as $x \to a$. L’Hôpital’s rule then tells us that f(x)/g(x) tends
to the same limit. More precisely, we have the following.
THEOREM 7.9. L’HÔPITAL’S RULE FOR 0/0. Assume f and g have derivatives $f'(x)$ and
$g'(x)$ at each point x of an open interval (a, b), and suppose that
$$\lim_{x \to a+} f(x) = 0 \quad \text{and} \quad \lim_{x \to a+} g(x) = 0 . \tag{7.20}$$
Assume also that $g'(x) \neq 0$ for each x in (a, b). If the limit
$$\lim_{x \to a+} \frac{f'(x)}{g'(x)} \tag{7.21}$$
exists and has the value L, say, then the limit
$$\lim_{x \to a+} \frac{f(x)}{g(x)} \tag{7.22}$$
also exists and has the value L.
† In 1696, Guillaume François Antoine de L’Hôpital (1661-1704) wrote the first textbook on differential
calculus. This work appeared in many editions and played a significant role in the popularization of the
subject. Much of the content of the book, including the method known as “L’Hôpital’s rule,” was based
on the earlier work of Johann Bernoulli, one of L’Hôpital’s teachers.
Note that the limits in (7.20), (7.21), and (7.22) are “right-handed.” There is, of course,
a similar theorem in which the hypotheses are satisfied in some open interval of the form
(b, a) and all the limits are “left-handed.” Also, by combining the two “one-sided”
theorems, there follows a “two-sided” result of the same kind in which $x \to a$ in an
unrestricted fashion.
Before we discuss the proof of Theorem 7.9, we shall illustrate the use of this theorem
in a number of examples.
EXAMPLE 1. We shall use L’Hôpital’s rule to obtain the familiar formula
$$\lim_{x \to 0} \frac{\sin x}{x} = 1 . \tag{7.23}$$
Here $f(x) = \sin x$ and $g(x) = x$. The quotient of derivatives is $f'(x)/g'(x) = (\cos x)/1$ and
this tends to 1 as $x \to 0$. By Theorem 7.9 the limit in (7.23) also exists and equals 1.
EXAMPLE 2. To determine the limit
$$\lim_{x \to 0} \frac{x - \tan x}{x - \sin x}$$
by L’Hôpital’s rule, we let $f(x) = x - \tan x$, $g(x) = x - \sin x$, and we find that
$$\frac{f'(x)}{g'(x)} = \frac{1 - \sec^2 x}{1 - \cos x} . \tag{7.24}$$
Although this, too, assumes the form 0/0 as $x \to 0$, we may remove the indeterminacy at
this stage by algebraic means. If we write
$$1 - \sec^2 x = 1 - \frac{1}{\cos^2 x} = \frac{\cos^2 x - 1}{\cos^2 x} = -\frac{(1 + \cos x)(1 - \cos x)}{\cos^2 x} ,$$
the quotient in (7.24) becomes
$$\frac{f'(x)}{g'(x)} = -\frac{1 + \cos x}{\cos^2 x} ,$$
and this approaches -2 as $x \to 0$. Notice that the indeterminacy disappeared when we
canceled the common factor $1 - \cos x$. Canceling common factors usually tends to
simplify the work in problems of this kind.
When the quotient of derivatives $f'(x)/g'(x)$ also assumes the indeterminate form 0/0,
we may try L’Hôpital’s rule again. In the next example, the indeterminacy is removed
after two applications of the rule.
EXAMPLE 3. For any real number c, we have
$$\lim_{x \to 1} \frac{x^c - cx + c - 1}{(x - 1)^2} = \lim_{x \to 1} \frac{c x^{c-1} - c}{2(x - 1)} = \lim_{x \to 1} \frac{c(c - 1) x^{c-2}}{2} = \frac{c(c - 1)}{2} .$$
In this sequence of equations it is understood that the existence of each limit implies that
of the preceding and also their equality.
The next example shows that L’Hôpital’s rule is not infallible.
EXAMPLE 4. Letf(x) = e-lix if x # 0, and let g(x) = x. The quotientf(x)/g(x)
assumes
the indeterminate form O/O a s x --f O+, and one application of L’Hôpital’s rule leads to
the quotient
f’(x)
-= Ulx”>e1
g'(x)
l/z
e-l/x
=-
X2
This, too, is indeterminate as x + O+, and if we differentiate numerator and denominator we
obtain (I/x2)e-1/z/(2x) = e-l/“/(2x3). After n steps we are led to the quotient e-lix/(n! xn+l),
SO the indeterminacy never disappears by this method.
EXAMPLE 5. When using L’Hôpital’s rule repeatedly, some care is needed to make certain that the quotient under consideration actually assumes an indeterminate form. A common type of error is illustrated by the following calculation:
$$\lim_{x\to 1}\frac{3x^2 - 2x - 1}{x^2 - x} = \lim_{x\to 1}\frac{6x - 2}{2x - 1} = \lim_{x\to 1}\frac{6}{2} = 3.$$
The first step is correct but the second is not. The quotient $(6x - 2)/(2x - 1)$ is not indeterminate as $x \to 1$. The correct limit, 4, is obtained by substituting 1 for x in $(6x - 2)/(2x - 1)$.
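The error in Example 5 can also be seen numerically. The following is a minimal Python sketch, not part of the original text; the names `quotient` and `after_one_step` are illustrative choices. It samples the original quotient on both sides of x = 1 and compares the values with the number obtained by substituting x = 1 after a single application of the rule.

```python
def quotient(x):
    """The quotient (3x^2 - 2x - 1)/(x^2 - x) of Example 5 (x != 0, x != 1)."""
    return (3 * x**2 - 2 * x - 1) / (x**2 - x)


def after_one_step(x):
    """Quotient of derivatives (6x - 2)/(2x - 1), which is continuous at x = 1."""
    return (6 * x - 2) / (2 * x - 1)


# Sample points approaching 1 from the right and from the left.
offsets = (1e-2, 1e-4, 1e-6)
samples = [1 + h for h in offsets] + [1 - h for h in offsets]
values = [quotient(x) for x in samples]

# One application of the rule followed by direct substitution.
limit_by_substitution = after_one_step(1.0)

for x, v in zip(samples, values):
    print(f"x = {x:.6f}   quotient = {v:.6f}")
print("value after one step, at x = 1:", limit_by_substitution)
```

The sampled values cluster around 4, in agreement with the substitution step and not with the value 3 produced by the faulty second application.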
EXAMPLE 6. Sometimes the work can be shortened by a change of variable. For example, we could apply L’Hôpital’s rule directly to calculate the limit
$$\lim_{x\to 0+}\frac{\sqrt{x}}{1 - e^{2\sqrt{x}}},$$
but we may avoid differentiation of square roots by writing $t = \sqrt{x}$ and noting that
$$\lim_{x\to 0+}\frac{\sqrt{x}}{1 - e^{2\sqrt{x}}} = \lim_{t\to 0+}\frac{t}{1 - e^{2t}} = \lim_{t\to 0+}\frac{1}{-2e^{2t}} = -\frac{1}{2}.$$
We turn now to the proof of Theorem 7.9.
Proof. We make use of Cauchy’s mean-value formula (Theorem 4.6 of Section 4.14) applied to a closed interval having a as its left endpoint. Since the functions f and g may not be defined at a, we introduce two new functions that are defined there. Let
$$F(x) = f(x) \ \text{ if } x \neq a, \qquad F(a) = 0,$$
$$G(x) = g(x) \ \text{ if } x \neq a, \qquad G(a) = 0.$$
Both F and G are continuous at a. In fact, if $a < x < b$, both functions F and G are continuous on the closed interval $[a, x]$ and have derivatives everywhere in the open interval $(a, x)$. Therefore Cauchy’s formula is applicable to the interval $[a, x]$ and we obtain
$$[F(x) - F(a)]\,G'(c) = [G(x) - G(a)]\,F'(c),$$
where c is some point satisfying $a < c < x$. Since $F(a) = G(a) = 0$, this becomes
$$f(x)g'(c) = g(x)f'(c).$$
Now $g'(c) \neq 0$ [since, by hypothesis, $g'$ is never zero in $(a, b)$] and also $g(x) \neq 0$. In fact, if we had $g(x) = 0$ then we would have $G(x) = G(a) = 0$ and, by Rolle’s theorem, there would be a point $x_1$ between a and x where $G'(x_1) = 0$, contradicting the hypothesis that $g'$ is never zero in $(a, b)$. Therefore we may divide by $g'(c)$ and $g(x)$ to obtain
$$\frac{f(x)}{g(x)} = \frac{f'(c)}{g'(c)}.$$
As $x \to a$, the point $c \to a$ (since $a < c < x$) and the quotient on the right approaches L [by (7.21)]. Hence, $f(x)/g(x)$ also approaches L and the theorem is proved.
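7.13 Exercises
Evaluate the limits in Exercises 1 through 12.
1. $\displaystyle\lim_{x\to 2}\frac{3x^2 + 2x - 16}{x^2 - x - 2}$.
2. $\displaystyle\lim_{x\to 3}\frac{x^2 - 4x + 3}{2x^2 - 13x + 21}$.
3. $\displaystyle\lim_{x\to 0}\frac{\sinh x - \sin x}{x^3}$.
4. $\displaystyle\lim_{x\to 0}\frac{(2 - x)e^x - x - 2}{x^3}$.
5. $\displaystyle\lim_{x\to 0}\frac{\log(\cos ax)}{\log(\cos bx)}$.
6. $\displaystyle\lim_{x\to 0+}\frac{x - \sin x}{(x \sin x)^{3/2}}$.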
7. $\displaystyle\lim_{x\to a+}\frac{\sqrt{x} - \sqrt{a} + \sqrt{x - a}}{\sqrt{x^2 - a^2}}$.
8. $\displaystyle\lim_{x\to 1+}\frac{x^x - x}{1 - x + \log x}$.
9. $\displaystyle\lim_{x\to 0}\frac{\arcsin 2x - 2\arcsin x}{x^3}$.
10. $\displaystyle\lim_{x\to 0}\frac{x \cot x - 1}{x^2}$.
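11. $\displaystyle\lim_{x\to 1}\frac{\sum_{k=1}^{n} x^k - n}{x - 1}$.
12. $\displaystyle\lim_{x\to 0+}\frac{1}{x\sqrt{x}}\left(a \arctan\frac{\sqrt{x}}{a} - b \arctan\frac{\sqrt{x}}{b}\right)$.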
13. Determine the limit of the quotient
$$\frac{(\sin 4x)(\sin 3x)}{x \sin 2x}$$
as $x \to 0$ and also as $x \to \tfrac{1}{2}\pi$.
14. For what values of the constants a and b is
$$\lim_{x\to 0}\,(x^{-3}\sin 3x + ax^{-2} + b) = 0?$$
15. Find constants a and b such that
$$\lim_{x\to 0}\frac{1}{bx - \sin x}\int_0^x \frac{t^2\,dt}{\sqrt{a + t}} = 1.$$
16. A circular arc of radius 1 subtends an angle of x radians, $0 < x < \tfrac{1}{2}\pi$, as shown in Figure 7.2. The point C is the intersection of the two tangent lines at A and B. Let T(x) be the area of triangle ABC and let S(x) be the area of the shaded region. Compute the following: (a) T(x); (b) S(x); (c) the limit of T(x)/S(x) as $x \to 0+$.
FIGURE 7.2 Exercise 16.
17. The current I(t) flowing in a certain electrical circuit at time t is given by
$$I(t) = \frac{E}{R}\left(1 - e^{-Rt/L}\right),$$
where E, R, and L are positive numbers. Determine the limiting value of I(t) as $R \to 0+$.
18. A weight hangs by a spring and is caused to vibrate by a sinusoidal force. Its displacement f(t) at time t is given by an equation of the form
$$f(t) = \frac{A}{c^2 - k^2}\,(\sin kt - \sin ct),$$
where A, c, and k are positive constants, with $c \neq k$. Determine the limiting value of the displacement as $c \to k$.
7.14 The symbols $+\infty$ and $-\infty$. Extension of L’Hôpital’s rule
L’Hôpital’s rule may be extended in several ways. First of all, we may wish to consider the quotient $f(x)/g(x)$ as x increases without bound. It is convenient to have a short descriptive symbolism to express the fact that we are allowing x to increase indefinitely. For this purpose, mathematicians use the special symbol $+\infty$, called “plus infinity.” Although we shall not attach any meaning to the symbol $+\infty$ by itself, we shall give precise definitions of various statements involving this symbol.
One of these statements is written as follows:
$$\lim_{x\to+\infty} f(x) = A,$$
and is read “The limit of f(x), as x tends to plus infinity, is A.” The idea we are trying to express here is that the function values f(x) can be made arbitrarily close to the real number A by taking x large enough. To make this statement mathematically precise, we must explain what is meant by “arbitrarily close” and by “large enough.” This is done by means of the following definition:
DEFINITION. The symbolism
$$\lim_{x\to+\infty} f(x) = A$$
means that for every number $\epsilon > 0$, there is another number $M > 0$ (which may depend on $\epsilon$) such that
$$|f(x) - A| < \epsilon \quad\text{whenever } x > M.$$
Calculations involving limits as $x \to +\infty$ may be reduced to a more familiar case. We simply replace x by 1/t (that is, let $t = 1/x$) and note that $t \to 0$ through positive values as $x \to +\infty$. More precisely, we introduce a new function F, where
$$F(t) = f\!\left(\frac{1}{t}\right) \quad\text{if } t \neq 0, \tag{7.25}$$
and simply observe that the two statements
$$\lim_{x\to+\infty} f(x) = A \qquad\text{and}\qquad \lim_{t\to 0+} F(t) = A$$
mean exactly the same thing. The proof of this equivalence requires only the definitions of the two limit symbols and is left as an exercise.
When we are interested in the behavior of f(x) for large negative x, we introduce the symbol $-\infty$ (“minus infinity”) and write
$$\lim_{x\to-\infty} f(x) = A$$
to mean: For every $\epsilon > 0$, there is an $M > 0$ such that
$$|f(x) - A| < \epsilon \quad\text{whenever } x < -M.$$
If F is defined by (7.25), it is easy to verify that the two statements
$$\lim_{x\to-\infty} f(x) = A \qquad\text{and}\qquad \lim_{t\to 0-} F(t) = A$$
are equivalent.
In view of the above remarks, it is not surprising to find that all the usual rules for calculating with limits (as stated in Theorem 3.1 of Section 3.4) also apply to limits as $x \to \pm\infty$. The same is true of L’Hôpital’s rule, which may be extended as follows:
THEOREM 7.10. Assume that f and g have derivatives $f'(x)$ and $g'(x)$ for all x greater than a certain fixed $M > 0$. Suppose that
$$\lim_{x\to+\infty} f(x) = 0 \qquad\text{and}\qquad \lim_{x\to+\infty} g(x) = 0,$$
and that $g'(x) \neq 0$ for $x > M$. If $f'(x)/g'(x)$ tends to a limit as $x \to +\infty$, then $f(x)/g(x)$ also tends to a limit and the two limits are equal. In other words,
$$\lim_{x\to+\infty}\frac{f'(x)}{g'(x)} = L \qquad\text{implies}\qquad \lim_{x\to+\infty}\frac{f(x)}{g(x)} = L. \tag{7.26}$$
Proof. Let $F(t) = f(1/t)$ and $G(t) = g(1/t)$. Then $f(x)/g(x) = F(t)/G(t)$ if $t = 1/x$, and $t \to 0+$ as $x \to +\infty$. Since $F(t)/G(t)$ assumes the indeterminate form 0/0 as $t \to 0+$, we examine the quotient of derivatives $F'(t)/G'(t)$. By the chain rule, we have
$$F'(t) = -\frac{1}{t^2}\,f'\!\left(\frac{1}{t}\right) \qquad\text{and}\qquad G'(t) = -\frac{1}{t^2}\,g'\!\left(\frac{1}{t}\right).$$
Also, $G'(t) \neq 0$ if $0 < t < 1/M$. When $x = 1/t$ and $x > M$, we have $F'(t)/G'(t) = f'(x)/g'(x)$ since the common factor $-1/t^2$ cancels. Therefore, if $f'(x)/g'(x) \to L$ as $x \to +\infty$, then $F'(t)/G'(t) \to L$ as $t \to 0+$ and hence, by Theorem 7.9, $F(t)/G(t) \to L$. Since $F(t)/G(t) = f(x)/g(x)$ this proves (7.26).
There is, of course, a result analogous to Theorem 7.10 in which we consider limits as $x \to -\infty$.
7.15 Infinite limits
In the foregoing section we used the notation $x \to +\infty$ to convey the idea that x takes on arbitrarily large positive values. We also write
$$\lim_{x\to a} f(x) = +\infty \tag{7.27}$$
or, alternatively,
$$f(x) \to +\infty \quad\text{as } x \to a \tag{7.28}$$
to indicate that f(x) takes arbitrarily large values as x approaches a. The precise meaning of these symbols is given in the following definition.
DEFINITION. The symbolism in (7.27) or in (7.28) means that to every positive number M (no matter how large), there corresponds another positive number $\delta$ (which may depend on M) such that
$$f(x) > M \quad\text{whenever } 0 < |x - a| < \delta.$$
If $f(x) > M$ whenever $0 < x - a < \delta$, we write
$$\lim_{x\to a+} f(x) = +\infty,$$
and we say that f(x) tends to plus infinity as x approaches a from the right. If $f(x) > M$ whenever $0 < a - x < \delta$, we write
$$\lim_{x\to a-} f(x) = +\infty,$$
and we say that f(x) tends to plus infinity as x approaches a from the left.
The symbols
$$\lim_{x\to a} f(x) = -\infty, \qquad \lim_{x\to a+} f(x) = -\infty, \qquad\text{and}\qquad \lim_{x\to a-} f(x) = -\infty$$
are similarly defined, the only difference being that we replace $f(x) > M$ by $f(x) < -M$. Examples are shown in Figure 7.3.
FIGURE 7.3 Infinite limits. (Panels: $\lim_{x\to a-} f(x) = -\infty$ with $\lim_{x\to a+} f(x) = +\infty$; and $\lim_{x\to a} f(x) = +\infty$.)
It is also convenient to extend the definitions of these symbols further to cover the cases when $x \to \pm\infty$. Thus, for example, we write
$$\lim_{x\to+\infty} f(x) = +\infty$$
if, for every positive number M, there exists another positive number X such that $f(x) > M$ whenever $x > X$.
The reader should have no difficulty in formulating similar definitions for the symbols
$$\lim_{x\to-\infty} f(x) = +\infty, \qquad \lim_{x\to+\infty} f(x) = -\infty, \qquad\text{and}\qquad \lim_{x\to-\infty} f(x) = -\infty.$$
EXAMPLES. In Chapter 6 we proved that the logarithm function is increasing and unbounded on the positive real axis. We may express this fact briefly by writing
$$\lim_{x\to+\infty} \log x = +\infty. \tag{7.29}$$
We also proved in Chapter 6 that $\log x < 0$ when $0 < x < 1$ and that the logarithm has no lower bound in the interval (0, 1). Therefore, we may also write $\lim_{x\to 0+} \log x = -\infty$.
From the relation that holds between the logarithm and the exponential function it is easy to prove that
$$\lim_{x\to+\infty} e^x = +\infty \qquad\text{and}\qquad \lim_{x\to-\infty} e^x = 0 \quad \Bigl(\text{or } \lim_{x\to+\infty} e^{-x} = 0\Bigr). \tag{7.30}$$
Using these results it is not difficult to show that for $\alpha > 0$ we have
$$\lim_{x\to+\infty} x^{\alpha} = +\infty \qquad\text{and}\qquad \lim_{x\to+\infty}\frac{1}{x^{\alpha}} = 0.$$
The idea is to write $x^{\alpha} = e^{\alpha \log x}$ and use (7.30) together with (7.29). The formulas in (7.30) also give us the relations
$$\lim_{x\to 0-} e^{-1/x} = +\infty \qquad\text{and}\qquad \lim_{x\to 0+} e^{-1/x} = 0.$$
The proofs of these statements make good exercises for testing a reader’s understanding of limit symbols involving $\pm\infty$.
7.16 The behavior of $\log x$ and $e^x$ for large x
Infinite limits lead to new types of indeterminate forms. For example, we may have a quotient $f(x)/g(x)$ where both $f(x) \to +\infty$ and $g(x) \to +\infty$ as $x \to a$ (or as $x \to \pm\infty$). In this case, we say that the quotient $f(x)/g(x)$ assumes the indeterminate form $\infty/\infty$. There are various extensions of L’Hôpital’s rule that often help to determine the behavior of a quotient when it assumes the indeterminate form $\infty/\infty$. However, we shall not discuss these extensions because most examples that occur in practice can be treated by use of the following theorem which describes the behavior of the logarithm and the exponential for large values of x.
THEOREM 7.11. If $a > 0$ and $b > 0$, we have
$$\lim_{x\to+\infty}\frac{(\log x)^b}{x^a} = 0 \tag{7.31}$$
and
$$\lim_{x\to+\infty}\frac{x^b}{e^{ax}} = 0. \tag{7.32}$$
Proof. We prove (7.31) first and then use it to derive (7.32). A simple proof of (7.31) may be given directly from the definition of the logarithm as an integral. If $c > 0$ and $t \geq 1$, we have $t^{-1} \leq t^{c-1}$. Hence, if $x > 1$, we may write
$$0 < \log x = \int_1^x \frac{1}{t}\,dt \leq \int_1^x t^{c-1}\,dt = \frac{x^c - 1}{c} < \frac{x^c}{c}.$$
Therefore, we have
$$0 < \frac{(\log x)^b}{x^a} < \frac{x^{bc-a}}{c^b} \quad\text{for every } c > 0.$$
If we choose $c = \tfrac{1}{2}a/b$, then $x^{bc-a} = x^{-a/2}$ which tends to 0 as $x \to +\infty$. This proves (7.31). To prove (7.32), we make the change of variable $t = e^x$. Then $x = \log t$, and hence $x^b/e^{ax} = (\log t)^b/t^a$. But $t \to +\infty$ as $x \to +\infty$, so (7.32) follows from (7.31).
With a natural extension of the o-notation, we can write the limit relations just proved in the form
$$(\log x)^b = o(x^a) \quad\text{as } x \to +\infty,$$
and
$$x^b = o(e^{ax}) \quad\text{as } x \to +\infty.$$
In other words, no matter how large b may be and no matter how small a may be (as long as both are positive), $(\log x)^b$ tends to infinity more slowly than $x^a$. Also, $x^b$ tends to infinity more slowly than $e^{ax}$.
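For small a and large b the decay promised by Theorem 7.11 sets in only for very large arguments, so a direct numerical evaluation overflows quickly. The following Python sketch, which is an illustration and not part of the original text, works with logarithms instead. The function name `log_of_quotient` and the sample exponents are assumptions chosen for the demonstration. Writing $u = \log x$ turns $(\log x)^b/x^a$ into $u^b/e^{au}$, so the same function illustrates both (7.31) and (7.32).

```python
import math


def log_of_quotient(u, a, b):
    """Natural logarithm of u**b / exp(a*u) for u > 0.

    With u = log x this is also the logarithm of (log x)**b / x**a,
    which mirrors the change of variable used in the proof of (7.32).
    """
    return b * math.log(u) - a * u


# Illustrative exponents: a is small and b is large.
a, b = 0.1, 10.0
for u in (10.0, 100.0, 1000.0, 10000.0):
    print(f"u = {u:8.0f}   log of quotient = {log_of_quotient(u, a, b):10.2f}")
```

With these exponents the logarithm of the quotient is still positive at u = 100 but strongly negative at u = 10000, so the quotient first grows and only later tends to 0. This is consistent with the theorem, which describes the behavior as the argument tends to $+\infty$ and says nothing about moderate values.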
EXAMPLE 1. In Example 4 of Section 7.12 we showed that the behavior of $e^{-1/x}/x$ for x near 0 could not be decided by any number of applications of L’Hôpital’s rule for 0/0. However, if we write $t = 1/x$, this quotient becomes $t/e^t$ and it assumes the indeterminate form $\infty/\infty$ as $t \to +\infty$. Theorem 7.11 tells us that
$$\lim_{t\to+\infty}\frac{t}{e^t} = 0.$$
Therefore, $e^{-1/x}/x \to 0$ as $x \to 0+$ or, in other words, $e^{-1/x} = o(x)$ as $x \to 0+$.
There are other indeterminate forms besides 0/0 and $\infty/\infty$. Some of these, denoted by the symbols $0 \cdot \infty$, $0^0$, and $\infty^0$, are illustrated by the examples given below. In examples like these, algebraic manipulation often enables us to reduce the problem to an indeterminate form of the type 0/0 or $\infty/\infty$ which may be handled by L’Hôpital’s rule, by polynomial approximation, or by Theorem 7.11.
EXAMPLE 2. ($0 \cdot \infty$). Prove that $\lim_{x\to 0+} x^{\alpha} \log x = 0$ for each fixed $\alpha > 0$.
Solution. Writing $t = 1/x$, we find that $x^{\alpha} \log x = -(\log t)/t^{\alpha}$ and, by (7.31), this tends to 0 as $t \to +\infty$.
EXAMPLE 3. ($0^0$). Show that $\lim_{x\to 0+} x^x = 1$.
Solution. Since $x^x = e^{x \log x}$, by continuity of the exponential function we have
$$\lim_{x\to 0+} x^x = \exp\Bigl(\lim_{x\to 0+} x \log x\Bigr),$$
if the last limit exists. But by Example 2 we know that $x \log x \to 0$ as $x \to 0+$, and hence $x^x \to e^0 = 1$.
EXAMPLE 4. ($\infty^0$). Show that $\lim_{x\to+\infty} x^{1/x} = 1$.
Solution. Put $t = 1/x$ and use the result of Example 3.
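The limits in Examples 3 and 4 can be checked numerically by evaluating each power through the exponential of its logarithm, which is the same rewriting used in the solutions. This is a minimal Python sketch for illustration only; the helper names are hypothetical and do not appear in the original text.

```python
import math


def x_to_the_x(x):
    """Evaluate x**x as exp(x log x) for x > 0, as in Example 3."""
    return math.exp(x * math.log(x))


def x_to_the_one_over_x(x):
    """Evaluate x**(1/x) as exp((log x)/x) for x > 0, as in Example 4."""
    return math.exp(math.log(x) / x)


# x -> 0+ for Example 3, and x -> +infinity for Example 4.
for x in (1e-2, 1e-4, 1e-8):
    print(f"x = {x:.0e}   x^x = {x_to_the_x(x):.10f}")
for x in (1e2, 1e4, 1e8):
    print(f"x = {x:.0e}   x^(1/x) = {x_to_the_one_over_x(x):.10f}")
```

Both sequences of values approach 1, although the convergence is slow because the exponents $x \log x$ and $(\log x)/x$ tend to 0 only logarithmically more slowly than x or 1/x.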
In Section 7.10 we proved the limit relations
$$\lim_{x\to 0}\,(1 + x)^{1/x} = e \qquad\text{and}\qquad \lim_{x\to 0}\,(1 + ax)^{1/x} = e^a. \tag{7.33}$$
Each of these is an indeterminate form of the type $1^{\infty}$. We may replace x by 1/x in these formulas and obtain, respectively,
$$\lim_{x\to+\infty}\left(1 + \frac{1}{x}\right)^{x} = e \qquad\text{and}\qquad \lim_{x\to+\infty}\left(1 + \frac{a}{x}\right)^{x} = e^a,$$
both of which are valid for all real a.
The relations (7.33) and those in Examples 2, 3, and 4 are all of the type $f(x)^{g(x)}$. These are usually dealt with by writing
$$f(x)^{g(x)} = e^{g(x) \log f(x)}$$
and then treating the exponent $g(x) \log f(x)$ by one of the methods discussed earlier.
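7.17 Exercises
Evaluate the limits in Exercises 1 through 25. The letters a and b denote positive constants.
1. $\displaystyle\lim_{x\to 0}\frac{e^{-1/x^2}}{x^{1000}}$.
2. $\displaystyle\lim_{x\to+\infty}\frac{\sin(1/x)}{\arctan(1/x)}$.
3. $\displaystyle\lim_{x\to\frac{1}{2}\pi}\frac{\tan 3x}{\tan x}$.
4. $\displaystyle\lim_{x\to+\infty}\frac{\log(a + be^x)}{\sqrt{a + bx^2}}$.
6. $\displaystyle\lim_{x\to\pi}\frac{\log|\sin x|}{\log|\sin 2x|}$.
7. $\displaystyle\lim_{x\to\frac{1}{2}-}\frac{\log(1 - 2x)}{\tan \pi x}$.
8. $\displaystyle\lim_{x\to+\infty}\frac{\cosh(x + 1)}{e^x}$.
9. $\displaystyle\lim_{x\to+\infty}\frac{a^x}{x^b}$, $a > 1$.
10. $\displaystyle\lim_{x\to\frac{1}{2}\pi}\frac{\tan x - 5}{\sec x + 4}$.
12. $\displaystyle\lim_{x\to+\infty} x^{1/4}\sin(1/\sqrt{x})$.
13. $\displaystyle\lim_{x\to+\infty}\bigl(x^2 - \sqrt{x^4 - x^2 + 1}\bigr)$.
14. $\displaystyle\lim_{x\to 0+}$ [expression illegible in source].
15. $\displaystyle\lim_{x\to 1-}\,(\log x)\log(1 - x)$.
16. $\displaystyle\lim_{x\to 0+} x^{(x^x - 1)}$.
17. $\displaystyle\lim_{x\to 0+}\bigl[x^{(x^x)} - 1\bigr]$.
18. $\displaystyle\lim_{x\to 0-}\,(1 - 2^x)^{\sin x}$.
19. $\displaystyle\lim_{x\to 0+} x^{1/\log x}$.
20. $\displaystyle\lim_{x\to 0+}\,(\cot x)^{\sin x}$.
21. $\displaystyle\lim_{x\to\frac{1}{4}\pi}\,(\tan x)^{\tan 2x}$.
22. $\displaystyle\lim_{x\to 0+}\left(\log\frac{1}{x}\right)^{x}$.
23. $\displaystyle\lim_{x\to 0+} x^{e/(1 + \log x)}$.
24. $\displaystyle\lim_{x\to 1}\,(2 - x)^{\tan(\pi x/2)}$.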
1
25. lim
r-0 log (x + d1) 26. Find c
r-1
SO that
27. Prove that $(1 + x)^c = 1 + cx + o(x)$ as $x \to 0$. Use this to compute the limit of
$$\{(x^4 + x^2)^{1/2} - x^2\}$$
as $x \to +\infty$.
28. For a certain value of c, the limit
$$\lim_{x\to+\infty}\,\{(x^5 + 7x^4 + 2)^c - x\}$$
is finite and nonzero. Determine this c and compute the value of the limit.
29. Let $g(x) = xe^{x^2}$ and let $f(x) = \int_1^x g(t)(t + 1/t)\,dt$. Compute the limit of $f''(x)/g''(x)$ as $x \to +\infty$.
30. Let $g(x) = x^c e^{2x}$ and let $f(x) = \int_0^x e^{2t}(3t^2 + 1)^{1/2}\,dt$. For a certain value of c, the limit of $f'(x)/g'(x)$ as $x \to +\infty$ is finite and nonzero. Determine c and compute the value of the limit.
31. Let $f(x) = e^{-1/x^2}$ if $x \neq 0$, and let $f(0) = 0$.
(a) Prove that for every $m > 0$, $f(x)/x^m \to 0$ as $x \to 0$.
(b) Prove that for $x \neq 0$ the nth derivative of f has the form $f^{(n)}(x) = f(x)P(1/x)$, where P(t) is a polynomial in t.
(c) Prove that $f^{(n)}(0) = 0$ for all $n \geq 1$. This shows that every Taylor polynomial generated by f at 0 is the zero polynomial.
32. An amount of P dollars is deposited in a bank which pays interest at a rate r per year, compounded m times a year. (For example, r = 0.06 when the annual rate is 6%.) (a) Prove that the total amount of principal plus interest at the end of n years is $P(1 + r/m)^{mn}$. If r and n are kept fixed, this amount approaches the limit $Pe^{rn}$ as $m \to +\infty$. This motivates the following definition: We say that money grows at an annual rate r when compounded continuously if the amount f(t) after t years is $f(0)e^{rt}$, where t is any nonnegative real number. Approximately how long does it take for a bank account to double in value if it receives interest at an annual rate of 6% compounded (b) continuously? (c) four times a year?
8
INTRODUCTION TO DIFFERENTIAL EQUATIONS
8.1 Introduction
A large variety of scientific problems arise in which one tries to determine something
from its rate of change. For example, we could try to compute the position of a moving
particle from a knowledge of its velocity or acceleration. Or a radioactive substance may
be disintegrating at a known rate and we may be required to determine the amount of
material present after a given time. In examples like these, we are trying to determine an
unknown function from prescribed information expressed in the form of an equation
involving at least one of the derivatives of the unknown function. These equations are
called differential equations, and their study forms one of the most challenging branches
of mathematics.
Differential equations are classified
under two main headings: ordinary and partial,
depending on whether the unknown is a function of just one variable or of two or more
variables. A simple example of an ordinary differential equation is the relation
$$f'(x) = f(x) \tag{8.1}$$
which is satisfied, in particular, by the exponential function, $f(x) = e^x$. We shall see presently that every solution of (8.1) must be of the form $f(x) = Ce^x$, where C may be any constant.
On the other hand, an equation like
$$\frac{\partial^2 f(x, y)}{\partial x^2} + \frac{\partial^2 f(x, y)}{\partial y^2} = 0$$
is an example of a partial differential equation. This particular one, called Laplace’s equation, appears in the theory of electricity and magnetism, fluid mechanics, and elsewhere. It has many different kinds of solutions, among which are $f(x, y) = x + 2y$, $f(x, y) = e^x \cos y$, and $f(x, y) = \log(x^2 + y^2)$.
The study of differential equations is one part of mathematics that, perhaps more than
any other, has been directly inspired by mechanics, astronomy, and mathematical physics.
Its history began in the 17th century when Newton, Leibniz, and the Bernoullis solved some simple differential equations arising from problems in geometry and mechanics. These early discoveries, beginning about 1690, gradually led to the development of a now-classic “bag of tricks” for solving certain special kinds of differential equations. Although these special tricks are applicable in relatively few cases, they do enable us to solve many differential equations that arise in mechanics and geometry, so their study is of practical importance. Some of these special methods and some of the problems which they help us solve are discussed near the end of this chapter.
Experience has shown that it is difficult to obtain mathematical theories of much
generality about solutions of differential equations, except for a few types. Among these
are the so-called linear differential equations which occur in a great variety of scientific
problems. The simplest types of linear differential equations and some of their applications
are also discussed in this introductory chapter. A more thorough study of linear equations
is carried out in Volume II.
8.2 Terminology and notation
When we work with a differential equation such as (8.1), it is customary to write y in place of f(x) and $y'$ in place of $f'(x)$, the higher derivatives being denoted by $y''$, $y'''$, etc. Of course, other letters such as u, v, z, etc. are also used instead of y. By the order of an equation is meant the order of the highest derivative which appears. For example, (8.1) is a first-order equation which may be written as $y' = y$. The differential equation
$y' = x^3 y + \sin(xy'')$ is one of second order.
In this chapter we shall begin our study with first-order equations which can be solved for $y'$ and written as follows:
$$y' = f(x, y), \tag{8.2}$$
where the expression f(x, y) on the right has various special forms. A differentiable function $y = Y(x)$ will be called a solution of (8.2) on an interval I if the function Y and its derivative $Y'$ satisfy the relation
$$Y'(x) = f[x, Y(x)]$$
for every x in I. The simplest case occurs when f(x, y) is independent of y. In this case, (8.2) becomes
$$y' = Q(x), \tag{8.3}$$
say, where Q is assumed to be a given function defined on some interval I. To solve the differential equation (8.3) means to find a primitive of Q. The second fundamental theorem of calculus tells us how to do it when Q is continuous on an open interval I. We simply integrate Q and add any constant. Thus, every solution of (8.3) is included in the formula
$$y = \int Q(x)\,dx + C, \tag{8.4}$$
where C is any constant (usually called an arbitrary constant of integration). The differential equation (8.3) has infinitely many solutions, one for each value of C.
If it is not possible to evaluate the integral in (8.4) in terms of familiar functions, such as polynomials, rational functions, trigonometric and inverse trigonometric functions, logarithms, and exponentials, still we consider the differential equation as having been solved if the solution can be expressed in terms of integrals of known functions. In actual practice, there are various methods for obtaining approximate evaluations of integrals which lead to useful information about the solution. Automatic high-speed computing machines are often designed with this kind of problem in mind.
EXAMPLE. Linear motion determined from the velocity. Suppose a particle moves along a straight line in such a way that its velocity at time t is $2 \sin t$. Determine its position at time t.
Solution. If Y(t) denotes the position at time t measured from some starting point, then the derivative $Y'(t)$ represents the velocity at time t. We are given that
$$Y'(t) = 2 \sin t.$$
Integrating, we find that
$$Y(t) = 2\int \sin t\,dt + C = -2\cos t + C.$$
This is all we can deduce about Y(t) from a knowledge of the velocity alone; some other piece of information is needed to fix the position function. We can determine C if we know the value of Y at some particular instant. For example, if $Y(0) = 0$, then $C = 2$ and the position function is $Y(t) = 2 - 2\cos t$. But if $Y(0) = 2$, then $C = 4$ and the position function is $Y(t) = 4 - 2\cos t$.
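The role of the constant C in this example can be illustrated with a short Python sketch. It is not part of the original text, and the function names are hypothetical. The sketch evaluates $Y(t) = -2\cos t + C$, recovers C from a prescribed value of Y(0), and estimates the velocity by a central difference to confirm that it agrees with $2 \sin t$ whatever value C has.

```python
import math


def position(t, C):
    """Position Y(t) = -2 cos t + C obtained by integrating the velocity 2 sin t."""
    return -2.0 * math.cos(t) + C


def constant_from_initial_value(y0):
    """Solve Y(0) = y0 for C; since cos 0 = 1 this gives C = y0 + 2."""
    return y0 + 2.0 * math.cos(0.0)


def velocity_estimate(t, C, h=1e-6):
    """Central-difference estimate of Y'(t); an approximation, not an exact derivative."""
    return (position(t + h, C) - position(t - h, C)) / (2.0 * h)


for y0 in (0.0, 2.0):
    C = constant_from_initial_value(y0)
    print(f"Y(0) = {y0}:  C = {C},  Y'(1) is about {velocity_estimate(1.0, C):.6f}")
print("2 sin 1 =", 2.0 * math.sin(1.0))
```

Both choices of the initial value give the same estimated velocity, which reflects the fact that the differential equation alone cannot distinguish between them.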
In some respects the example just solved is typical of what happens in general. Somewhere in the process of solving a first-order differential equation, an integration is required to remove the derivative $y'$ and in this step an arbitrary constant C appears. The way in which the arbitrary constant C enters into the solution will depend on the nature of the given differential equation. It may appear as an additive constant, as in Equation (8.4), but it is more likely to appear in some other way. For example, when we solve the equation $y' = y$ in Section 8.3, we shall find that every solution has the form $y = Ce^x$.
In many problems it is necessary to select from the collection of all solutions one having a prescribed value at some point. The prescribed value is called an initial condition, and the problem of determining such a solution is called an initial-value problem. This terminology originated in mechanics where, as in the above example, the prescribed value represents the displacement at some initial time.
We shall begin our study of differential equations with an important special case.
8.3 A first-order differential equation for the exponential function
The exponential function is equal to its own derivative, and the same is true of any
constant multiple of the exponential. It is easy to show that these are the only functions
that satisfy this property on the whole real axis.
THEOREM 8.1. If C is a given real number, there is one and only one function f which satisfies the differential equation
$$f'(x) = f(x)$$
for all real x and which also satisfies the initial condition $f(0) = C$. This function is given by the formula
$$f(x) = Ce^x.$$
Proof. It is easy to verify that the function $f(x) = Ce^x$ satisfies both the given differential equation and the given initial condition. Now we must show that this is the only solution.
Let $y = g(x)$ be any solution of this initial-value problem:
$$g'(x) = g(x) \quad\text{for all } x, \qquad g(0) = C.$$
We wish to show that $g(x) = Ce^x$ or that $g(x)e^{-x} = C$. We consider the function $h(x) = g(x)e^{-x}$ and show that its derivative is always zero. The derivative of h is given by
$$h'(x) = g'(x)e^{-x} - g(x)e^{-x} = e^{-x}[g'(x) - g(x)] = 0.$$
Hence, by the zero-derivative theorem, h is constant. But $g(0) = C$ so $h(0) = g(0)e^0 = C$. Hence, we have $h(x) = C$ for all x which means that $g(x) = Ce^x$, as required.
Theorem 8.1 is an example of an existence-uniqueness theorem. It tells us that the given initial-value problem has a solution (existence) and that it has only one solution (uniqueness). The object of much of the research in the theory of differential equations is to discover existence and uniqueness theorems for wide classes of equations.
We discuss next an important type which includes both the differential equation y’ = Q(x)
and the equation y’ = y as special cases.
8.4 First-order linear differential equations
A differential equation of the form
$$y' + P(x)y = Q(x), \tag{8.5}$$
where P and Q are given functions, is called a first-order linear differential equation. The terms involving the unknown function y and its derivative $y'$ appear as a linear combination of y and $y'$. The functions P and Q are assumed to be continuous on some open interval I. We seek all solutions y defined on I.
First we consider the special case in which the right member, Q(x), is identically zero. The equation
$$y' + P(x)y = 0 \tag{8.6}$$
is called the homogeneous or reduced equation corresponding to (8.5). We will show how to solve the homogeneous equation and then use the result to help us solve the nonhomogeneous equation (8.5).
If y is nonzero on I, Equation (8.6) is equivalent to the equation
$$\frac{y'}{y} = -P(x). \tag{8.7}$$
That is, every nonzero y which satisfies (8.6) also satisfies (8.7) and vice versa. Now suppose y is a positive function satisfying (8.7). Since the quotient $y'/y$ is the derivative of $\log y$, Equation (8.7) becomes $D \log y = -P(x)$, from which we find $\log y = -\int P(x)\,dx + C$, so we have
$$y = e^{-A(x)}, \quad\text{where } A(x) = \int P(x)\,dx - C. \tag{8.8}$$
In other words, if there is a positive solution of (8.6), it must necessarily have the form (8.8) for some C. But now it is easy to verify that every function in (8.8) is a solution of the homogeneous equation (8.6). In fact, we have
$$y' = -e^{-A(x)}A'(x) = -P(x)e^{-A(x)} = -P(x)y.$$
Thus, we have found all positive solutions of (8.6). But now it is easy to describe all solutions. We state the result as an existence-uniqueness theorem.
THEOREM 8.2. Assume P is continuous on an open interval I. Choose any point a in I and let b be any real number. Then there is one and only one function $y = f(x)$ which satisfies the initial-value problem
$$y' + P(x)y = 0, \quad\text{with } f(a) = b, \tag{8.9}$$
on the interval I. This function is given by the formula
$$f(x) = be^{-A(x)}, \quad\text{where } A(x) = \int_a^x P(t)\,dt. \tag{8.10}$$
Proof. Let f be defined by (8.10). Then $A(a) = 0$ so $f(a) = be^0 = b$. Differentiation shows that f satisfies the differential equation in (8.9) so f is a solution of the initial-value problem. Now we must show that it is the only solution.
Let g be an arbitrary solution. We wish to show that $g(x) = be^{-A(x)}$ or that $g(x)e^{A(x)} = b$. Therefore it is natural to introduce $h(x) = g(x)e^{A(x)}$. The derivative of h is given by
$$h'(x) = g'(x)e^{A(x)} + g(x)e^{A(x)}A'(x) = e^{A(x)}[g'(x) + P(x)g(x)]. \tag{8.11}$$
Now since g satisfies the differential equation in (8.9), we have $g'(x) + P(x)g(x) = 0$ everywhere on I, so $h'(x) = 0$ for all x in I. This means that h is constant on I. Hence, we have $h(x) = h(a) = g(a)e^{A(a)} = g(a) = b$. In other words, $g(x)e^{A(x)} = b$, so $g(x) = be^{-A(x)}$, which shows that $g = f$. This completes the proof.
The last part of the foregoing proof suggests a method for solving the nonhomogeneous differential equation in (8.5). Suppose that g is any function satisfying (8.5) and let $h(x) = g(x)e^{A(x)}$ where, as above, $A(x) = \int_a^x P(t)\,dt$. Then Equation (8.11) is again valid, but since g satisfies (8.5), the formula for $h'(x)$ gives us
$$h'(x) = e^{A(x)}Q(x).$$
Now we may invoke the second fundamental theorem to write
$$h(x) = h(a) + \int_a^x e^{A(t)}Q(t)\,dt.$$
Hence, since $h(a) = g(a)$, every solution g of (8.5) has the form
$$g(x) = e^{-A(x)}h(x) = g(a)e^{-A(x)} + e^{-A(x)}\int_a^x Q(t)e^{A(t)}\,dt. \tag{8.12}$$
Conversely, by direct differentiation of (8.12), it is easy to verify that each such g is a solution of (8.5), so we have found all solutions. We state the result as follows.
THEOREM 8.3. Assume P and Q are continuous on an open interval I. Choose any point a in I and let b be any real number. Then there is one and only one function $y = f(x)$ which satisfies the initial-value problem
$$y' + P(x)y = Q(x), \quad\text{with } f(a) = b,$$
on the interval I. This function is given by the formula
$$f(x) = be^{-A(x)} + e^{-A(x)}\int_a^x Q(t)e^{A(t)}\,dt,$$
where $A(x) = \int_a^x P(t)\,dt$.
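When the integrals in Theorem 8.3 cannot be evaluated in closed form, the formula can still be approximated numerically. The following Python sketch is an illustration under stated assumptions and is not the author's method: it uses composite Simpson quadrature with a fixed number of subintervals, and the names `simpson` and `solve_linear_ivp` are hypothetical. The functions P and Q are assumed continuous on the interval between a and x.

```python
import math


def simpson(func, lo, hi, n=200):
    """Composite Simpson approximation of the integral of func from lo to hi.

    n must be even. The result is signed, so hi < lo is allowed.
    """
    if lo == hi:
        return 0.0
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def solve_linear_ivp(P, Q, a, b, x, n=200):
    """Approximate the solution of y' + P(x) y = Q(x), y(a) = b, at the point x.

    Implements f(x) = b e^{-A(x)} + e^{-A(x)} * integral_a^x Q(t) e^{A(t)} dt
    with A(s) = integral_a^s P(t) dt, both integrals computed by Simpson's rule.
    """
    def A(s):
        return simpson(P, a, s, n)

    integral = simpson(lambda t: Q(t) * math.exp(A(t)), a, x, n)
    return math.exp(-A(x)) * (b + integral)


# Usage: y' + y = 1 with y(0) = 3 has the exact solution y = 1 + 2 e^{-x}.
approx = solve_linear_ivp(lambda t: 1.0, lambda t: 1.0, a=0.0, b=3.0, x=2.0)
print(approx, 1.0 + 2.0 * math.exp(-2.0))
```

The nested quadrature costs on the order of n squared evaluations of P, which is acceptable for a sketch. A practical implementation would more likely accumulate A(t) along the grid or use a library integrator.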
Up to now the word “interval” has meant a bounded interval of the form (a, b), [a, b], [a, b), or (a, b], with $a < b$. It is convenient to consider also unbounded intervals. They are denoted by the symbols $(a, +\infty)$, $(-\infty, a)$, $[a, +\infty)$ and $(-\infty, a]$, and they are defined as follows:
$$(a, +\infty) = \{x \mid x > a\}, \qquad (-\infty, a) = \{x \mid x < a\},$$
$$[a, +\infty) = \{x \mid x \geq a\}, \qquad (-\infty, a] = \{x \mid x \leq a\}.$$
In addition, it is convenient to refer to the collection of all real numbers as the interval $(-\infty, +\infty)$. Thus, when we discuss a differential equation or its solution over an interval I, it will be understood that I is one of the nine types just described.
EXAMPLE. Find all solutions of the first-order differential equation $xy' + (1 - x)y = e^{2x}$ on the interval $(0, +\infty)$.
Solution. First we transform the equation to the form $y' + P(x)y = Q(x)$ by dividing through by x. This gives us
$$y' + \left(\frac{1}{x} - 1\right)y = \frac{e^{2x}}{x},$$
so $P(x) = 1/x - 1$ and $Q(x) = e^{2x}/x$. Since P and Q are continuous on the interval $(0, +\infty)$, there is a unique solution $y = f(x)$ satisfying any given initial condition of the form $f(a) = b$. We shall express all solutions in terms of the initial value at the point $a = 1$. In other words, given any real number b, we will determine all solutions for which $f(1) = b$.
First we compute
$$A(x) = \int_1^x P(t)\,dt = \int_1^x \left(\frac{1}{t} - 1\right)dt = \log x - (x - 1).$$
Hence we have $e^{-A(x)} = e^{x - 1 - \log x} = e^{x-1}/x$, and $e^{A(t)} = te^{1-t}$, so Theorem 8.3 tells us that the solution is given by the formula
$$f(x) = b\,\frac{e^{x-1}}{x} + \frac{e^{x-1}}{x}\int_1^x \frac{e^{2t}}{t}\,te^{1-t}\,dt = b\,\frac{e^{x-1}}{x} + \frac{e^x}{x}\int_1^x e^t\,dt$$
$$= b\,\frac{e^{x-1}}{x} + \frac{e^x}{x}(e^x - e) = b\,\frac{e^{x-1}}{x} + \frac{e^{2x}}{x} - \frac{e^{x+1}}{x}.$$
We can also write this in the form
$$f(x) = \frac{e^{2x} + Ce^x}{x},$$
where $C = be^{-1} - e$. This gives all solutions on the interval $(0, +\infty)$.
It may be of interest to study the behavior of the solutions as $x \to 0$. If we approximate the exponential by its linear Taylor polynomial, we find that $e^{2x} = 1 + 2x + o(x)$ and $e^x = 1 + x + o(x)$ as $x \to 0$, so we have
$$f(x) = \frac{(1 + C) + (2 + C)x + o(x)}{x} = \frac{1 + C}{x} + (2 + C) + o(1).$$
Therefore, only the solution with $C = -1$ tends to a finite limit as $x \to 0$, this limit being 1.
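The closed form obtained in this example can be spot-checked numerically. The Python sketch below is an illustration only, with hypothetical function names. It substitutes $f(x) = (e^{2x} + Ce^x)/x$ into the equation $xy' + (1 - x)y = e^{2x}$ using a central-difference estimate of the derivative, verifies the relation $C = be^{-1} - e$ against the initial condition $f(1) = b$, and evaluates the solution with $C = -1$ near 0.

```python
import math


def f(x, C):
    """General solution f(x) = (e^{2x} + C e^x)/x on (0, +infinity)."""
    return (math.exp(2.0 * x) + C * math.exp(x)) / x


def residual(x, C, h=1e-6):
    """x f'(x) + (1 - x) f(x) - e^{2x}, with f' estimated by a central difference."""
    fprime = (f(x + h, C) - f(x - h, C)) / (2.0 * h)
    return x * fprime + (1.0 - x) * f(x, C) - math.exp(2.0 * x)


def C_from_initial_value(b):
    """Constant C = b/e - e corresponding to the initial condition f(1) = b."""
    return b / math.e - math.e


b = 3.0  # illustrative initial value
C = C_from_initial_value(b)
print("f(1) =", f(1.0, C))
print("residual at x = 0.5:", residual(0.5, C))
print("f near 0 with C = -1:", f(1e-6, -1.0))
```

The residual is zero up to the error of the finite-difference estimate, and the value near 0 for $C = -1$ is close to the limit 1 found above.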
8.5 Exercises
In each of Exercises 1 through 5, solve the initial-value problem on the specified interval.
1. $y' - 3y = e^{2x}$ on $(-\infty, +\infty)$, with $y = 0$ when $x = 0$.
2. $xy' - 2y = x^5$ on $(0, +\infty)$, with $y = 1$ when $x = 1$.
3. $y' + y \tan x = \sin 2x$ on $(-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$, with $y = 2$ when $x = 0$.
4. $y' + xy = x^3$ on $(-\infty, +\infty)$, with $y = 0$ when $x = 0$.
5. $\dfrac{dx}{dt} + x = e^{2t}$ on $(-\infty, +\infty)$, with $x = 1$ when $t = 0$.
6. Find all solutions of $y' \sin x + y \cos x = 1$ on the interval $(0, \pi)$. Prove that exactly one of these solutions has a finite limit as $x \to 0$, and another has a finite limit as $x \to \pi$.
7. Find all solutions of $x(x + 1)y' + y = x(x + 1)^2 e^{-x^2}$ on the interval $(-1, 0)$. Prove that all solutions approach 0 as $x \to -1$, but that only one of them has a finite limit as $x \to 0$.
8. Find all solutions of $y' + y \cot x = 2 \cos x$ on the interval $(0, \pi)$. Prove that exactly one of these is also a solution on $(-\infty, +\infty)$.
9. Find all solutions of $(x - 2)(x - 3)y' + 2y = (x - 1)(x - 2)$ on each of the following intervals: (a) $(-\infty, 2)$; (b) $(2, 3)$; (c) $(3, +\infty)$. Prove that all solutions tend to a finite limit as $x \to 2$, but that none has a finite limit as $x \to 3$.
10. Let $s(x) = (\sin x)/x$ if $x \neq 0$, and let $s(0) = 1$. Define $T(x) = \int_0^x s(t)\,dt$. Prove that the function $f(x) = xT(x)$ satisfies the differential equation $xy' - y = x \sin x$ on the interval $(-\infty, +\infty)$ and find all solutions on this interval. Prove that the differential equation has no solution satisfying the initial condition $f(0) = 1$, and explain why this does not contradict Theorem 8.3.
11. Prove that there is exactly one function f, continuous on the positive real axis, such that
$$f(x) = 1 + \frac{1}{x}\int_1^x f(t)\,dt$$
for all $x > 0$ and find this function.
12. The function f defined by the equation
$$f(x) = xe^{(1 - x^2)/2} - xe^{-x^2/2}\int_1^x t^{-2}e^{t^2/2}\,dt$$
for $x > 0$ has the properties that (i) it is continuous on the positive real axis, and (ii) it satisfies the equation
$$f(x) = 1 - x\int_1^x f(t)\,dt$$
for all $x > 0$. Find all functions with these two properties.
The Bernoulli equation. A differential equation of the form y' + P(x)y = Q(x)y^n, where n is
not 0 or 1, is called a Bernoulli equation. This equation is nonlinear because of the presence of y^n.
The next exercise shows that it can always be transformed into a linear first-order equation for a
new unknown function v, where v = y^k, k = 1 - n.
13. Let k be a nonzero constant. Assume P and Q are continuous on an interval I. If a ∈ I and
if b is any real number, let v = g(x) be the unique solution of the initial-value problem
v' + kP(x)v = kQ(x) on I, with g(a) = b. If n ≠ 1 and k = 1 - n, prove that a function
y = f(x), which is never zero on I, is a solution of the initial-value problem
y' + P(x)y = Q(x)y^n on I, with f(a)^k = b
if and only if the kth power of f is equal to g on I.
In each of Exercises 14 through 17, solve the initial-value problem on the specified interval.
14. y' - 4y = 2e^x y^{1/2} on (-∞, +∞), with y = 2 when x = 0.
15. y' - y = -y^2 (x^2 + x + 1) on (-∞, +∞), with y = 1 when x = 0.
16. xy' - 2y = 4x^3 y^{1/2} on (-∞, +∞), with y = 0 when x = 1.
17. xy' + y = y^2 x^2 log x on (0, +∞), with y = 1/2 when x = 1.
18. 2xyy' + (1 + x)y^2 = e^x on (0, +∞), with (a) y = √e when x = 1; (b) y = -√e when x = 1;
(c) a finite limit as x → 0.
19. An equation of the form y' + P(x)y + Q(x)y^2 = R(x) is called a Riccati equation. (There
is no known method for solving the general Riccati equation.) Prove that if u is a known
solution of this equation, then there are further solutions of the form y = u + 1/v, where v
satisfies a first-order linear equation.
20. The Riccati equation y' + y + y^2 = 2 has two constant solutions. Start with each of these
and use Exercise 19 to find further solutions as follows: (a) If -2 ≤ b < 1, find a solution on
(-∞, +∞) for which y = b when x = 0. (b) If b ≥ 1 or b < -2, find a solution on the interval
(-∞, +∞) for which y = b when x = 0.
8.6 Some physical problems leading to first-order linear differential equations
In this section we will discuss various physical problems that can be formulated mathematically as differential equations. In each case, the differential equation represents an
idealized simplification of the physical problem and is called a mathematical model of
the problem. The differential equation occurs as a translation of some physical law, such
as Newton's second law of motion, a "conservation" law, etc. Our purpose here is not to
justify the choice of the mathematical model but rather to deduce logical consequences
from it. Each model is only an approximation to reality, and its justification properly
belongs to the science from which the problem emanates. If intuition or experimental
evidence agrees with the results deduced mathematically, then we feel that the model is a
useful one. If not, we try to find a more suitable model.
EXAMPLE 1. Radioactive decay. Although various radioactive elements show marked
differences in their rates of decay, they all seem to share a common property: the rate at
which a given substance decomposes at any instant is proportional to the amount present
at that instant. If we denote by y = f(t) the amount present at time t, the derivative y' =
f'(t) represents the rate of change of y at time t, and the "law of decay" states that
y' = -ky,
where k is a positive constant (called the decay constant) whose actual value depends on
the particular element that is decomposing. The minus sign comes in because y decreases
as t increases, and hence y' is always negative. The differential equation y' = -ky is the
mathematical model used for problems concerning radioactive decay. Every solution
y = f(t) of this differential equation has the form
(8.13) f(t) = f(0)e^{-kt}.
Therefore, to determine the amount present at time t, we need to know the initial amount
f(0) and the value of the decay constant k.
It is interesting to see what information can be deduced from (8.13), without knowing the
exact value of f(0) or of k. First we observe that there is no finite time t at which f(t) will
be zero because the exponential e^{-kt} never vanishes. Therefore, it is not useful to study
the "total lifetime" of a radioactive substance. However, it is possible to determine the
time required for any particular fraction of a sample to decay. The fraction 1/2 is usually
chosen for convenience and the time T at which f(T)/f(0) = 1/2 is called the half-life of the
substance. This can be determined by solving the equation e^{-kT} = 1/2 for T. Taking
logarithms, we get -kT = -log 2 or T = (log 2)/k. This equation relates the half-life
to the decay constant. Since we have
\frac{f(t + T)}{f(t)} = \frac{f(0)e^{-k(t+T)}}{f(0)e^{-kt}} = e^{-kT} = \frac{1}{2},
we see that the half-life is the same for every sample of a given material. Figure 8.1 illustrates
the general shape of a radioactive decay curve.
FIGURE 8.1 Radioactive decay with half-life T.
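The relation T = (log 2)/k can be checked numerically. The following Python sketch is only an illustration: the function names and the sample decay constant k = 0.1 are assumptions made here, not data from the text.

```python
import math

def amount(t, f0, k):
    """Amount present at time t under the decay law y' = -k*y, as in (8.13)."""
    return f0 * math.exp(-k * t)

def half_life(k):
    """Half-life T = (log 2)/k for a decay constant k > 0."""
    return math.log(2) / k

# Hypothetical decay constant, chosen only for illustration.
k = 0.1
T = half_life(k)

# The ratio f(t + T)/f(t) should be 1/2 for every t and every initial amount.
for f0 in (1.0, 50.0):
    for t in (0.0, 3.0, 12.5):
        ratio = amount(t + T, f0, k) / amount(t, f0, k)
        print(f0, t, round(ratio, 12))
```

The loop varies both the initial amount and the starting time, which is the numerical counterpart of the statement that the half-life does not depend on the sample.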
EXAMPLE 2. Falling body in a resisting medium. A body of mass m is dropped from
rest from a great height in the earth’s atmosphere. Assume that it falls in a straight line
and that the only forces acting on it are the earth’s gravitational attraction (mg, where g is
the acceleration due to gravity, assumed to be constant) and a resisting force (due to air
resistance) which is proportional to its velocity. It is required to discuss the resulting
motion.
Let s = f(t) denote the distance the body has fallen at time t and let v = s' = f'(t) denote
its velocity. The assumption that it falls from rest means that f'(0) = 0.
There are two forces acting on the body, a downward force mg (due to its weight) and
an upward force -kv (due to air resistance), where k is some positive constant. Newton's
second law states that the net sum of the forces acting on the body at any instant is equal
to the product of its mass m and its acceleration. If we denote the acceleration at time t
by a, then a = v' = s'' and Newton's law gives us the equation
ma = mg - kv.
This can be considered as a second-order differential equation for the displacement s or
as a first-order equation for the velocity v. As a first-order equation for v, it is linear and
can be written in the form
v' + \frac{k}{m} v = g.
This equation is the mathematical model of the problem. Since v = 0 when t = 0, the
unique solution of the differential equation is given by the formula
(8.14) v = e^{-kt/m} \int_0^t g e^{ku/m} \, du = \frac{mg}{k} (1 - e^{-kt/m}).
Note that v → mg/k as t → +∞. If we differentiate Equation (8.14) we find that the
acceleration at every instant is a = ge^{-kt/m}. Note that a → 0 as t → +∞. Interpreted
physically, this means that the air resistance tends to balance out the force of gravity.
Since v = s', Equation (8.14) is itself a differential equation for the displacement s, and
it may be integrated directly to give
s = \frac{mg}{k} t + \frac{g m^2}{k^2} e^{-kt/m} + C.
Since s = 0 when t = 0, we find that C = -gm^2/k^2 and the equation of motion becomes
s = \frac{mg}{k} t + \frac{g m^2}{k^2} (e^{-kt/m} - 1).
If the initial velocity is v_0 when t = 0, formula (8.14) for the velocity at time t must be
replaced by
v = \frac{mg}{k} (1 - e^{-kt/m}) + v_0 e^{-kt/m}.
It is interesting to note that for every initial velocity (positive, negative, or zero), the limiting
velocity, as t increases without bound, is mg/k, a number independent of v_0. The reader
should convince himself, on physical grounds, that this seems reasonable.
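The formulas of this example are easy to explore numerically. The Python sketch below is a hedged illustration: the function names and the sample data (mass 6, resistance constant 0.75, g = 32) are assumptions chosen here for convenience. It evaluates the velocity for several initial velocities and compares the result with the limiting value mg/k.

```python
import math

def velocity(t, m, k, g=32.0, v0=0.0):
    """Velocity at time t for m*v' = m*g - k*v with v(0) = v0."""
    decay = math.exp(-k * t / m)
    return (m * g / k) * (1.0 - decay) + v0 * decay

def distance(t, m, k, g=32.0):
    """Distance fallen at time t when the body starts from rest."""
    return (m * g / k) * t + (g * m**2 / k**2) * (math.exp(-k * t / m) - 1.0)

# Hypothetical data, for illustration only.
m, k, g = 6.0, 0.75, 32.0
terminal = m * g / k  # limiting velocity mg/k

# Whatever the initial velocity, v(t) approaches mg/k for large t.
for v0 in (-20.0, 0.0, 500.0):
    print(v0, velocity(200.0, m, k, g, v0), terminal)
```

Since s' = v, a difference quotient of `distance` should agree closely with `velocity` at the same instant, which gives a simple consistency check on the two formulas.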
EXAMPLE 3. A cooling problem. The rate at which a body changes temperature is proportional to the difference between its temperature and that of the surrounding medium.
(This is called Newton's law of cooling.) If y = f(t) is the (unknown) temperature of the
body at time t and if M(t) denotes the (known) temperature of the surrounding medium,
Newton's law leads to the differential equation
(8.15) y' = -k[y - M(t)] or y' + ky = kM(t),
where k is a positive constant. This first-order linear equation is the mathematical model
we use for cooling problems. The unique solution of the equation satisfying the initial
condition f(a) = b is given by the formula
(8.16) f(t) = b e^{-k(t-a)} + e^{-kt} \int_a^t k M(u) e^{ku} \, du.
Consider now a specific problem in which a body cools from 200° to 100° in 40 minutes
while immersed in a medium whose temperature is kept constant, say M(t) = 10°. If we
measure t in minutes and f(t) in degrees, we have f(0) = 200 and Equation (8.16) gives us
(8.17) f(t) = 200e^{-kt} + 10k e^{-kt} \int_0^t e^{ku} \, du = 200e^{-kt} + 10(1 - e^{-kt}) = 10 + 190e^{-kt}.
We can compute k from the information that f(40) = 100. Putting t = 40 in (8.17), we
find 90 = 190e^{-40k}, so -40k = log (90/190), k = (1/40)(log 19 - log 9).
Next, let us compute the time required for this same material to cool from 200° to
100° if the temperature of the medium is kept at 5°. Then Equation (8.16) is valid with the
same constant k but with M(u) = 5. Instead of (8.17), we get the formula
f(t) = 5 + 195e^{-kt}.
To find the time t for which f(t) = 100, we get 95 = 195e^{-kt}, so -kt = log (95/195) =
log (19/39), and hence
t = \frac{1}{k} (\log 39 - \log 19) = 40 \frac{\log 39 - \log 19}{\log 19 - \log 9}.
From a four-place table of natural logarithms, we find log 39 = 3.6636, log 19 = 2.9444,
and log 9 = 2.1972 so, with slide-rule accuracy, we get t = 40(0.719)/(0.747) = 38.5
minutes.
The differential equation in (8.15) tells us that the rate of cooling decreases considerably
as the temperature of the body begins to approach the temperature of the medium. To
illustrate, let us find the time required to cool the same substance from 100° to 10° with
the medium kept at 5°. The calculation leads to log (5/95) = -kt, or
t = \frac{1}{k} \log 19 = 40 \frac{\log 19}{\log 19 - \log 9} = \frac{40(2.944)}{0.747} = 158 minutes.
Note that the temperature drop from 100° to 10° takes more than four times as long as the
change from 200° to 100°.
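The arithmetic of this example can be reproduced without tables. The following Python sketch is an illustration only; the function names are assumptions introduced here, and the formulas hold only for a medium kept at constant temperature, as in the example.

```python
import math

def cooling_constant(t_obs, temp_start, temp_end, medium):
    """Solve temp_end = medium + (temp_start - medium)*exp(-k*t_obs) for k."""
    return math.log((temp_start - medium) / (temp_end - medium)) / t_obs

def cooling_time(k, temp_start, temp_end, medium):
    """Time needed to cool from temp_start to temp_end in a medium at constant temperature."""
    return math.log((temp_start - medium) / (temp_end - medium)) / k

# Data from the example: 200 to 100 degrees in 40 minutes, medium at 10 degrees.
k = cooling_constant(40.0, 200.0, 100.0, 10.0)

t1 = cooling_time(k, 200.0, 100.0, 5.0)  # 200 to 100 degrees, medium at 5 degrees
t2 = cooling_time(k, 100.0, 10.0, 5.0)   # 100 to 10 degrees, medium at 5 degrees
print(round(k, 5), round(t1, 1), round(t2, 1))
```

Both times agree with the slide-rule values in the text to the accuracy quoted there, and the second is indeed more than four times the first.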
EXAMPLE 4. A dilution problem. A tank contains 100 gallons of brine whose concentration
is 2.5 pounds of salt per gallon. Brine containing 2 pounds of salt per gallon runs into the
tank at a rate of 5 gallons per minute and the mixture (kept uniform by stirring) runs out
at the same rate. Find the amount of salt in the tank at every instant.
Let y = f(t) denote the number of pounds of salt in the tank at time t minutes after
mixing begins. There are two factors which cause y to change, the incoming brine which
brings salt in at a rate of 10 pounds per minute and the outgoing mixture which removes salt
at a rate of 5(y/100) pounds per minute. (The fraction y/100 represents the concentration
at time t.) Hence the differential equation is
y' = 10 - \frac{1}{20} y or y' + \frac{1}{20} y = 10.
This linear equation is the mathematical model for our problem. Since y = 250 when
t = 0, the unique solution is given by the formula
(8.18) y = 250e^{-t/20} + e^{-t/20} \int_0^t 10e^{u/20} \, du = 200 + 50e^{-t/20}.
This equation shows that y > 200 for all t and that y → 200 as t increases without bound.
Hence, the minimum salt content is 200 pounds. (This could also have been guessed from
the statement of the problem.) Equation (8.18) can be solved for t in terms of y to yield
t = 20 \log \frac{50}{y - 200}.
This enables us to find the time at which the salt content will be a given amount y, provided
that 200 < y < 250.
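Formula (8.18) and its inverse can be written as a pair of functions. The Python sketch below is illustrative; the function names are assumptions introduced here. As a small extension of the range stated in the text, the inverse also accepts y = 250, which corresponds to t = 0.

```python
import math

def salt(t):
    """Pounds of salt in the tank t minutes after mixing begins, formula (8.18)."""
    return 200.0 + 50.0 * math.exp(-t / 20.0)

def time_for_salt(y):
    """Time at which the salt content equals y pounds (requires 200 < y <= 250)."""
    if not 200.0 < y <= 250.0:
        raise ValueError("salt content must satisfy 200 < y <= 250")
    return 20.0 * math.log(50.0 / (y - 200.0))

# Round trip: the time recovered from salt(t) should be t again.
for t in (0.0, 13.0, 60.0):
    print(t, salt(t), time_for_salt(salt(t)))
```

The range check reflects the fact that the salt content never reaches 200 pounds and never exceeds its initial value of 250 pounds.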
EXAMPLE 5. Electric circuits. Figure 8.2(a) shows an electric circuit which
has an electromotive force, a resistor, and an inductor connected in series. The electromotive force produces a voltage which causes an electric current to flow in the circuit.
If the reader is not familiar with electric circuits, he should not be concerned. For our
purposes, all we need to know about the circuit is that the voltage, denoted by V(t),
and the current, denoted by I(t), are functions of time t related by a differential equation
of the form
(8.19) LI'(t) + RI(t) = V(t).
Here L and R are assumed to be positive constants. They are called, respectively, the
inductance and resistance of the circuit. The differential equation is a mathematical formulation of a conservation law known as Kirchhoff's voltage law, and it serves as a mathematical model for the circuit.
Those readers unfamiliar with circuits may find it helpful to think of the current as being
analogous to water flowing in a pipe. The electromotive force (usually a battery or a
generator) is analogous to a pump which causes the water to flow; the resistor is analogous
to friction in the pipe, which tends to oppose the flow; and the inductance is a stabilizing
influence which tends to oppose sudden changes in the current due to sudden changes in
the voltage.
The usual type of question concerning such circuits is this: If a given voltage V(t) is
impressed on the circuit, what is the resulting current I(t)? Since we are dealing with a
first-order linear differential equation, the solution is a routine matter. If I(0) denotes the
initial current at time t = 0, the equation has the solution
I(t) = I(0)e^{-Rt/L} + e^{-Rt/L} \int_0^t \frac{V(x)}{L} e^{Rx/L} \, dx.
An important special case occurs when the impressed voltage is constant, say V(t) = E
for all t. In this case, the integration is easy to perform and we are led to the formula
I(t) = \frac{E}{R} + \left( I(0) - \frac{E}{R} \right) e^{-Rt/L}.
This shows that the nature of the solution depends on the relation between the initial
current I(0) and the quotient E/R. If I(0) = E/R, the exponential term is not present and
the current is constant, I(t) = E/R. If I(0) > E/R, the coefficient of the exponential term
is positive and the current decreases to the limiting value E/R as t → +∞. If I(0) < E/R,
the current increases to the limiting value E/R. The constant E/R is called the steady-state
current, and the exponential term [I(0) - E/R]e^{-Rt/L} is called the transient current. Examples are illustrated in Figure 8.2(b).
FIGURE 8.2 (a) Diagram for a simple series circuit. (b) The current resulting from
a constant impressed voltage E.
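The three cases just described can be seen numerically. The Python sketch below is an illustration under assumed data: the function name and the values E = 12, R = 4, L = 2 are hypothetical. It splits the current into the steady-state part E/R and the transient part.

```python
import math

def current(t, i0, E, R, L):
    """Current at time t for L*I' + R*I = E (constant voltage) with I(0) = i0."""
    steady = E / R                 # steady-state current
    transient = (i0 - steady) * math.exp(-R * t / L)
    return steady + transient

# Hypothetical circuit data, for illustration only.
E, R, L = 12.0, 4.0, 2.0

# Initial current above, equal to, and below the steady-state value E/R = 3.
for i0 in (5.0, 3.0, 0.0):
    print(i0, [round(current(t, i0, E, R, L), 4) for t in (0.0, 0.5, 1.0, 5.0)])
```

In each row the values move monotonically toward E/R, or stay there when I(0) = E/R.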
The foregoing examples illustrate the unifying power and practical utility of differential
equations. They show how several different types of physical problems may lead to
exactly the same type of differential equation.
The differential equation in (8.19) is of special interest because it suggests the possibility
of attacking a wide variety of physical problems by electrical means. For example, suppose
a physical problem leads to a differential equation of the form
y' + ay = Q,
where a is a positive constant and Q is a known function. We can try to construct an
electric circuit with inductance L and resistance R in the ratio R/L = a and then try to
impress a voltage LQ on the circuit. We would then have an electric circuit with exactly the
same mathematical model as the physical problem. Thus, we can hope to get numerical
data about the solution of the physical problem by making measurements of current in
the electric circuit. This idea has been used in practice and has led to the development of
the analog computer.
8.7 Exercises
In the following exercises, use an appropriate first-order differential equation as a mathematical
model of the problem.
1. The half-life for radium is approximately 1600 years. Find what percentage of a given quantity
of radium disintegrates in 100 years.
2. If a strain of bacteria grows at a rate proportional to the amount present and if the population
doubles in one hour, by how much will it increase at the end of two hours?
3. Denote by y = f(t) the amount of a substance present at time t. Assume it disintegrates at a
rate proportional to the amount present. If n is a positive integer, the number T for which
f(T) = f(0)/n is called the 1/nth life of the substance.
(a) Prove that the 1/nth life is the same for every sample of a given material, and compute T
in terms of n and the decay constant k.
(b) If a and b are given, prove that f can be expressed in the form
f(t) = f(a)^{w(t)} f(b)^{1 - w(t)}
and determine w(t). This shows that the amount present at time t is a weighted geometric
mean of the amounts present at two instants t = a and t = b.
4. A man wearing a parachute jumps from a great height. The combined weight of man and parachute is 192 pounds. Let v(t) denote his speed (in feet per second) at time t seconds after
falling. During the first 10 seconds, before the parachute opens, assume the air resistance is
(3/4)v(t) pounds. Thereafter, while the parachute is open, assume the resistance is 12v(t) pounds.
Assume the acceleration of gravity is 32 ft/sec^2 and find explicit formulas for the speed v(t)
at time t. (You may use the approximation e^{-5/4} = 37/128 in your calculations.)
5. Refer to Example 2 of Section 8.6. Use the chain rule to write
\frac{dv}{dt} = \frac{ds}{dt} \frac{dv}{ds} = v \frac{dv}{ds}
and thus show that the differential equation in the example can be expressed as follows:
\frac{ds}{dv} = \frac{bv}{c - v},
where b = m/k and c = gm/k. Integrate this equation to express s in terms of v. Check your
result with the formulas for v and s derived in the example.
6. Modify Example 2 of Section 8.6 by assuming the air resistance is proportional to v^2. Show
that the differential equation can be put in each of the following forms:
\frac{ds}{dv} = \frac{m}{k} \frac{v}{c^2 - v^2}; \frac{dt}{dv} = \frac{m}{k} \frac{1}{c^2 - v^2},
where c = \sqrt{mg/k}. Integrate each of these and obtain the following formulas for v:
v^2 = \frac{mg}{k} (1 - e^{-2ks/m}); v = c \frac{e^{bt} - e^{-bt}}{e^{bt} + e^{-bt}} = c \tanh bt,
where b = \sqrt{kg/m}. Determine the limiting value of v as t → +∞.
7. A body in a room at 60° cools from 200° to 120° in half an hour.
(a) Show that its temperature after t minutes is 60 + 140e^{-kt}, where k = (log 7 - log 3)/30.
(b) Show that the time t required to reach a temperature of T degrees is given by the formula
t = [log 140 - log (T - 60)]/k, where 60 < T ≤ 200.
(c) Find the time at which the temperature is 90°.
(d) Find a formula for the temperature of the body at time t if the room temperature is not
kept constant but falls at a rate of 1° each ten minutes. Assume the room temperature is 60°
when the body temperature is 200°.
8. A thermometer has been stored in a room whose temperature is 75°. Five minutes after being
taken outdoors it reads 65°. After another five minutes, it reads 60°. Compute the outdoor
temperature.
9. In a tank are 100 gallons of brine containing 50 pounds of dissolved salt. Water runs into the
tank at the rate of 3 gallons per minute, and the concentration is kept uniform by stirring.
How much salt is in the tank at the end of one hour if the mixture runs out at a rate of 2 gallons
per minute?
10. Refer to Exercise 9. Suppose the bottom of the tank is covered with a mixture of salt and insoluble material. Assume that the salt dissolves at a rate proportional to the difference between
the concentration of the solution and that of a saturated solution (3 pounds of salt per gallon),
and that if the water were fresh 1 pound of salt would dissolve per minute. How much salt
will be in solution at the end of one hour?
11. Consider an electric circuit like that in Example 5 of Section 8.6. Assume the electromotive
force is an alternating current generator which produces a voltage V(t) = E sin ωt, where E
and ω are positive constants (ω is the Greek letter omega). If I(0) = 0, prove that the current
has the form
I(t) = \frac{E}{\sqrt{R^2 + \omega^2 L^2}} \sin(\omega t - \alpha) + \frac{E \omega L}{R^2 + \omega^2 L^2} e^{-Rt/L},
where α depends only on ω, L, and R. Show that α = 0 when L = 0.
12. Refer to Example 5 of Section 8.6. Assume the impressed voltage is a step function defined as
follows: E(t) = E if a < t < b, where a > 0; E(t) = 0 for all other t. If I(0) = 0 prove that
the current is given by the following formulas: I(t) = 0 if t < a;
I(t) = \frac{E}{R} (1 - e^{-R(t-a)/L}) if a < t < b;
I(t) = \frac{E}{R} e^{-Rt/L} (e^{Rb/L} - e^{Ra/L}) if t > b.
Make a sketch indicating the nature of the graph of I.
Population growth. In a study of the growth of a population (whether human, animal, or bacterial), the function which counts the number x of individuals present at time t is necessarily a step
function taking on only integer values. Therefore the true rate of growth dx/dt is zero (when t lies
in an open interval where x is constant), or else the derivative dx/dt does not exist (when x jumps
from one integer to another). Nevertheless, useful information can often be obtained if we assume
that the population x is a continuous function of t with a continuous derivative dx/dt at each
instant. We then postulate various "laws of growth" for the population, depending on the factors
in the environment which may stimulate or hinder growth.
For example, if environment has little or no effect, it seems reasonable to assume that the rate
of growth is proportional to the amount present. The simplest kind of growth law takes the form
(8.20) \frac{dx}{dt} = kx,
where k is a constant that depends on the particular kind of population. Conditions may develop
which cause the factor k to change with time, and the growth law (8.20) can be generalized as
follows:
(8.21) \frac{dx}{dt} = k(t)x.
If, for some reason, the population cannot exceed a certain maximum M (for example, because
the food supply may be exhausted), we may reasonably suppose that the rate of growth is jointly
proportional to both x and M - x. Thus we have a second type of growth law:
(8.22) \frac{dx}{dt} = kx(M - x),
where, as in (8.21), k may be constant or, more generally, k may change with time. Technological
improvements may tend to increase or decrease the value of M slowly, and hence we can generalize
(8.22) even further by allowing M to change with time.
13. Express x as a function of t for each of the "growth laws" in (8.20) and (8.22) (with k and M
both constant). Show that the result for (8.22) can be expressed as follows:
(8.23) x = \frac{M}{1 + e^{-\alpha(t - t_1)}},
where α is a constant and t_1 is the time at which x = M/2.
14. Assume the growth law in formula (8.23) of Exercise 13, and suppose a census is taken
at three equally spaced times t_1, t_2, t_3, the resulting numbers being x_1, x_2, x_3. Show
that this suffices to determine M and that, in fact, we have
(8.24) M = x_2 \frac{x_3(x_2 - x_1) - x_1(x_3 - x_2)}{x_2^2 - x_1 x_3}.
15. Derive a formula that generalizes (8.23) of Exercise 13 for the growth law (8.22) when k is
not necessarily constant. Express the result in terms of the time t_0 for which x = M/2.
16. The Census Bureau reported the following population figures (in millions) for the United
States at ten-year intervals from 1790 to 1950: 3.9, 5.3, 7.2, 9.6, 12.9, 17, 23, 31, 39, 50, 63, 76,
92, 108, 122, 135, 150.
(a) Use Equation (8.24) to determine a value of M on the basis of the census figures for 1790,
1850, and 1910.
(b) Same as (a) for the years 1910, 1930, 1950.
(c) On the basis of your calculations in (a) and (b), would you be inclined to accept or reject
the growth law (8.23) for the population of the United States?
17. (a) Plot a graph of log x as a function of t, where x denotes the population figures quoted
in Exercise 16. Use this graph to show that the growth law (8.20) was very nearly satisfied from
1790 to 1910. Determine a reasonable average value of k for this period.
(b) Determine a reasonable average value of k for the period from 1920 to 1950, assume that
the growth law (8.20) will hold for this k, and predict the United States population for the
years 2000 and 2050.
18. The presence of toxins in a certain medium destroys a strain of bacteria at a rate jointly proportional to the number of bacteria present and to the amount of toxin. If there were no
toxins present, the bacteria would grow at a rate proportional to the amount present. Let x
denote the number of living bacteria present at time t. Assume that the amount of toxin is
increasing at a constant rate and that the production of toxin begins at time t = 0. Set up a
differential equation for x. Solve the differential equation. One of the curves shown in Figure
8.3 best represents the general behavior of x as a function of t. State your choice and explain
your reasoning.
FIGURE 8.3 Exercise 18.
8.8 Linear equations of second order with constant coefficients
A differential equation of the form
y'' + P_1(x)y' + P_2(x)y = R(x)
is said to be a linear equation of second order. The functions P_1 and P_2 which multiply the
unknown function y and its derivative y' are called the coefficients of the equation.
For first-order linear equations, we proved an existence-uniqueness theorem and determined all solutions by an explicit formula. Although there is a corresponding existence-uniqueness theorem for the general second-order linear equation, there is no explicit
formula which gives all solutions, except in some special cases. A study of the general
linear equation of second order is undertaken in Volume II. Here we treat only the case
in which the coefficients P_1 and P_2 are constants. When the right-hand member R(x) is
identically zero, the equation is said to be homogeneous.
The homogeneous linear equation with constant coefficients was the first differential
equation of a general type to be completely solved. A solution was first published by Euler
in 1743. Apart from its historical interest, this equation arises in a great variety of applied
problems, so its study is of practical importance. Moreover, we can give explicit formulas
for all the solutions.
Consider a homogeneous linear equation with constant coefficients which we write as
follows:
y'' + ay' + by = 0.
We seek solutions on the entire real axis (-∞, +∞). One solution is the constant function
y = 0. This is called the trivial solution. We are interested in finding nontrivial solutions,
and we begin our study with some special cases for which nontrivial solutions can be found
by inspection. In all these cases, the coefficient of y' is zero, and the equation has the form
y'' + by = 0. We shall find that solving this special equation is tantamount to solving the
general case.
8.9 Existence of solutions of the equation y'' + by = 0
EXAMPLE 1. The equation y'' = 0. Here both coefficients a and b are zero, and we can
easily determine all solutions. Assume y is any function satisfying y'' = 0 on (-∞, +∞).
Then its derivative y' is constant, say y' = c_1. Integrating this relation, we find that y
necessarily has the form
y = c_1 x + c_2,
where c_1 and c_2 are constants. Conversely, for any choice of constants c_1 and c_2, the linear
polynomial y = c_1 x + c_2 satisfies y'' = 0, so we have found all solutions in this case.
Next we assume that b ≠ 0 and treat separately the cases b < 0 and b > 0.
EXAMPLE 2. The equation y'' + by = 0, where b < 0. Since b < 0, we can write b = -k^2,
where k > 0, and the differential equation takes the form
y'' = k^2 y.
One obvious solution is y = e^{kx}, and another is y = e^{-kx}. From these we can obtain
further solutions by constructing linear combinations of the form
y = c_1 e^{kx} + c_2 e^{-kx},
where c_1 and c_2 are arbitrary constants. It will be shown presently, in Theorem 8.6, that
all solutions are included in this formula.
EXAMPLE 3. The equation y'' + by = 0, where b > 0. Here we can write b = k^2, where
k > 0, and the differential equation takes the form
y'' = -k^2 y.
Again we obtain some solutions by inspection. One solution is y = cos kx, and another
is y = sin kx. From these we get further solutions by forming linear combinations,
y = c_1 cos kx + c_2 sin kx,
where c_1 and c_2 are arbitrary constants. Theorem 8.6 will show that this formula includes
all solutions.
8.10 Reduction of the general equation to the special case y'' + by = 0
The problem of solving a second-order linear equation with constant coefficients can
be reduced to that of solving the special cases just discussed. There is a method for doing
this that also applies to more general equations. The idea is to consider three functions
y, u, and v such that y = uv. Differentiation gives us y' = uv' + u'v, and y'' = uv'' +
2u'v' + u''v. Now we express the combination y'' + ay' + by in terms of u and v. We
have
(8.25) y'' + ay' + by = uv'' + 2u'v' + u''v + a(uv' + u'v) + buv
= (v'' + av' + bv)u + (2v' + av)u' + vu''.
Next we choose v to make the coefficient of u' zero. This requires that v' = -av/2, so we
may choose v = e^{-ax/2}. For this v we have v'' = -av'/2 = a^2 v/4, and the coefficient of
u in (8.25) becomes
v'' + av' + bv = \frac{a^2 v}{4} - \frac{a^2 v}{2} + bv = \frac{4b - a^2}{4} v.
Thus, Equation (8.25) reduces to
y'' + ay' + by = \left( u'' + \frac{4b - a^2}{4} u \right) v.
Since v = e^{-ax/2}, the function v is never zero, so y satisfies the differential equation y'' +
ay' + by = 0 if and only if u satisfies u'' + (1/4)(4b - a^2)u = 0. Thus, we have proved the
following theorem.
THEOREM 8.4. Let y and u be two functions such that y = ue^{-ax/2}. Then, on the interval
(-∞, +∞), y satisfies the differential equation y'' + ay' + by = 0 if and only if u satisfies
the differential equation
u'' + \frac{4b - a^2}{4} u = 0.
This theorem reduces the study of the equation y'' + ay' + by = 0 to the special case
y'' + by = 0. We have exhibited nontrivial solutions of this equation but, except for the
case b = 0, we have not yet shown that we have found all solutions.
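Theorem 8.4 can be tested numerically on a particular equation. The Python sketch below is an illustration only: the coefficients a = 2, b = 5 are hypothetical, and the derivatives are approximated by central differences rather than computed exactly. Here (4b - a^2)/4 = 4, so u = cos 2x solves the reduced equation and y = u e^{-ax/2} should solve the original one.

```python
import math

def residual(y, a, b, x, h=1e-4):
    """Central-difference estimate of y'' + a*y' + b*y at the point x."""
    d1 = (y(x + h) - y(x - h)) / (2.0 * h)
    d2 = (y(x + h) - 2.0 * y(x) + y(x - h)) / (h * h)
    return d2 + a * d1 + b * y(x)

# Hypothetical coefficients, for illustration only.
a, b = 2.0, 5.0
k = math.sqrt((4.0 * b - a * a) / 4.0)  # (4b - a^2)/4 = 4, so k = 2

def u(x):
    """A solution of the reduced equation u'' + k^2 u = 0."""
    return math.cos(k * x)

def y(x):
    """The corresponding function y = u * e^{-ax/2}."""
    return u(x) * math.exp(-a * x / 2.0)

# The residual of y is zero up to discretization error; that of u alone is not.
for x in (-1.0, 0.0, 0.7, 2.0):
    print(x, residual(y, a, b, x), residual(u, a, b, x))
```

The second column stays near zero while the third does not, which shows that the exponential factor is what converts a solution of the special equation into a solution of the general one.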
8.11 Uniqueness theorem for the equation y'' + by = 0
The problem of determining all solutions of the equation y'' + by = 0 can be solved
with the help of the following uniqueness theorem.
THEOREM 8.5. Assume two functions f and g satisfy the differential equation y'' + by = 0
on (-∞, +∞). Assume also that f and g satisfy the initial conditions
f(0) = g(0), f'(0) = g'(0).
Then f(x) = g(x) for all x.
Proof. Let h(x) = f(x) - g(x). We wish to prove that h(x) = 0 for all x. We shall
do this by expressing h in terms of its Taylor polynomial approximations.
First we note that h is also a solution of the differential equation y'' + by = 0 and satisfies
the initial conditions h(0) = 0, h'(0) = 0. Now every function y satisfying the differential
equation has derivatives of every order on (-∞, +∞) and they can be computed by
repeated differentiation of the differential equation. For example, since y'' = -by, we
have y''' = -by', and y^{(4)} = -by'' = b^2 y. By induction we find that the derivatives of
even order are given by
y^{(2n)} = (-1)^n b^n y,
while those of odd order are y^{(2n-1)} = (-1)^{n-1} b^{n-1} y'. Since h(0) and h'(0) are both 0, it
follows that all derivatives h^{(n)}(0) are zero. Therefore, each Taylor polynomial generated
by h at 0 has all its coefficients zero.
Now we apply Taylor's formula with remainder (Theorem 7.6), using a polynomial
approximation of odd degree 2n - 1, and we find that
h(x) = E_{2n-1}(x),
where E_{2n-1}(x) is the error term in Taylor's formula. To complete the proof, we show that
the error can be made arbitrarily small by taking n large enough.
We use Theorem 7.7 to estimate the size of the error term. For this we need estimates
for the size of the derivative h^{(2n)}. Consider any finite closed interval [-c, c], where c > 0.
Since h is continuous on this interval, it is bounded there, say |h(x)| ≤ M on [-c, c].
Since h^{(2n)}(x) = (-1)^n b^n h(x), we have the estimate |h^{(2n)}(x)| ≤ M|b|^n on [-c, c]. Theorem
7.7 gives us |E_{2n-1}(x)| ≤ M|b|^n x^{2n}/(2n)! so, on the interval [-c, c], we have the estimate
(8.26) 0 \le |h(x)| \le \frac{M |b|^n x^{2n}}{(2n)!} \le \frac{M |b|^n c^{2n}}{(2n)!} = \frac{M A^{2n}}{(2n)!},
where A = |b|^{1/2} c. Now we show that A^m/m! tends to 0 as m → +∞. This is obvious if
0 ≤ A ≤ 1. If A > 1, we may write
\frac{A^m}{m!} = \frac{A}{1} \cdot \frac{A}{2} \cdots \frac{A}{k} \cdot \frac{A}{k+1} \cdots \frac{A}{m} \le \frac{A^k}{k!} \left( \frac{A}{k+1} \right)^{m-k},
where k < m. If we choose k to be the greatest integer ≤ A, then A < k + 1 and the last
factor tends to 0 as m → +∞. Hence A^m/m! tends to 0 as m → ∞, so inequality (8.26)
shows that h(x) = 0 for every x in [-c, c]. But, since c is arbitrary, it follows that h(x) = 0
for all real x. This completes the proof.
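The key step of the proof, that A^m/m! tends to 0 even when A > 1, can be observed numerically. The Python sketch below is illustrative; the function names and the sample value A = 5.5 are assumptions made here. It compares A^m/m! with the bound (A^k/k!)(A/(k+1))^{m-k} used in the proof, where k is the greatest integer ≤ A.

```python
import math

def ratio(A, m):
    """Return A**m / m!, computed as a running product to avoid overflow."""
    value = 1.0
    for j in range(1, m + 1):
        value *= A / j
    return value

def bound(A, m):
    """Upper bound (A**k / k!) * (A/(k+1))**(m-k), with k the greatest integer <= A."""
    k = math.floor(A)
    return ratio(A, k) * (A / (k + 1)) ** (m - k)

# Hypothetical value A > 1, for illustration only; m must exceed k = 5 here.
A = 5.5
for m in (10, 20, 40, 60):
    print(m, ratio(A, m), bound(A, m))
```

Both columns decrease to 0, the bound geometrically and the exact quotient much faster, which is all that inequality (8.26) requires.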
Note: Theorem 8.5 tells us that two solutions of the differential equation y'' + by = 0
which have the same value and the same derivative at 0 must agree everywhere. The choice
of the point 0 is not essential. The same argument shows that the theorem is also true if 0
is replaced by an arbitrary point c. In the foregoing proof, we simply use Taylor polynomial approximations at c instead of at 0.
8.12 Complete solution of the equation y” + by = 0
The uniqueness theorem enables us to characterize all solutions of the differential
equation $y'' + by = 0$.
THEOREM 8.6. Given a real number $b$, define two functions $u_1$ and $u_2$ on $(-\infty, +\infty)$ as
follows:
(a) If $b = 0$, let $u_1(x) = 1$, $u_2(x) = x$.
(b) If $b < 0$, write $b = -k^2$ and define $u_1(x) = e^{kx}$, $u_2(x) = e^{-kx}$.
(c) If $b > 0$, write $b = k^2$ and define $u_1(x) = \cos kx$, $u_2(x) = \sin kx$.
Then every solution of the differential equation $y'' + by = 0$ on $(-\infty, +\infty)$ has the form
\[ y = c_1 u_1(x) + c_2 u_2(x), \tag{8.27} \]
where $c_1$ and $c_2$ are constants.
Proof. We proved in Section 8.9 that for each choice of constants $c_1$ and $c_2$ the function
$y$ given in (8.27) is a solution of the equation $y'' + by = 0$. Now we show that all solutions
have this form. The case $b = 0$ was settled in Section 8.9, so we may assume that $b \ne 0$.
The idea of the proof is this: Let $y = f(x)$ be any solution of $y'' + by = 0$. If we can
show that constants $c_1$ and $c_2$ exist satisfying the pair of equations
\[ c_1 u_1(0) + c_2 u_2(0) = f(0), \qquad c_1 u_1'(0) + c_2 u_2'(0) = f'(0), \tag{8.28} \]
then both $f$ and $c_1 u_1 + c_2 u_2$ are solutions of the differential equation $y'' + by = 0$ having
the same value and the same derivative at 0. By the uniqueness theorem, it follows that
$f = c_1 u_1 + c_2 u_2$.
In case (b), we have $u_1(x) = e^{kx}$, $u_2(x) = e^{-kx}$, so $u_1(0) = u_2(0) = 1$ and $u_1'(0) = k$,
$u_2'(0) = -k$. Thus the equations in (8.28) become $c_1 + c_2 = f(0)$ and $c_1 - c_2 = f'(0)/k$.
They have the solution $c_1 = \frac{1}{2}f(0) + \frac{1}{2}f'(0)/k$, $c_2 = \frac{1}{2}f(0) - \frac{1}{2}f'(0)/k$.
In case (c), we have $u_1(x) = \cos kx$, $u_2(x) = \sin kx$, so $u_1(0) = 1$, $u_2(0) = 0$, $u_1'(0) = 0$,
$u_2'(0) = k$, and the solutions are $c_1 = f(0)$ and $c_2 = f'(0)/k$. Since $c_1$ and $c_2$ always exist
to satisfy (8.28), the proof is complete.
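The formulas for $c_1$ and $c_2$ obtained in the proof can be turned into a small routine that builds the solution of $y'' + by = 0$ with prescribed values of $y(0)$ and $y'(0)$. This is only a sketch under the assumptions of Theorem 8.6; the names `constants_from_initial_values` and `solution` are hypothetical helpers introduced for illustration.

```python
import math

def constants_from_initial_values(b, f0, fp0):
    """Return (c1, c2, u1, u2) so that c1*u1 + c2*u2 has value f0 and slope fp0 at 0."""
    if b == 0:
        # Case (a): u1 = 1, u2 = x, so c1 = f(0) and c2 = f'(0).
        return f0, fp0, (lambda x: 1.0), (lambda x: x)
    k = math.sqrt(abs(b))
    if b < 0:
        # Case (b): u1 = e^{kx}, u2 = e^{-kx}.
        c1 = 0.5 * f0 + 0.5 * fp0 / k
        c2 = 0.5 * f0 - 0.5 * fp0 / k
        return c1, c2, (lambda x: math.exp(k * x)), (lambda x: math.exp(-k * x))
    # Case (c): u1 = cos kx, u2 = sin kx.
    return f0, fp0 / k, (lambda x: math.cos(k * x)), (lambda x: math.sin(k * x))

def solution(b, f0, fp0):
    """Return the function y = c1*u1 + c2*u2 determined by the initial values."""
    c1, c2, u1, u2 = constants_from_initial_values(b, f0, fp0)
    return lambda x: c1 * u1(x) + c2 * u2(x)
```

For example, `solution(-4, 1, 2)` gives $c_1 = 1$, $c_2 = 0$, that is, $y = e^{2x}$, and `solution(9, 0, 3)` gives $y = \sin 3x$.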
8.13 Complete solution of the equation y'' + ay' + by = 0
Theorem 8.4 tells us that $y$ satisfies the differential equation $y'' + ay' + by = 0$ if
and only if $u$ satisfies $u'' + \frac{1}{4}(4b - a^2)u = 0$, where $y = e^{-ax/2}u$. From Theorem 8.6 we
know that the nature of each solution $u$ depends on the algebraic sign of the coefficient of
$u$, that is, on the algebraic sign of $4b - a^2$ or, alternatively, of $a^2 - 4b$.
We call the number $a^2 - 4b$ the discriminant of the differential equation $y'' + ay' + by = 0$
and denote it by $d$. When we combine the results of Theorems 8.4 and 8.6 we obtain the following.
THEOREM 8.7. Let $d = a^2 - 4b$ be the discriminant of the linear differential equation
$y'' + ay' + by = 0$. Then every solution of this equation on $(-\infty, +\infty)$ has the form
\[ y = e^{-ax/2}[c_1 u_1(x) + c_2 u_2(x)], \tag{8.29} \]
where $c_1$ and $c_2$ are constants, and the functions $u_1$ and $u_2$ are determined according to the
algebraic sign of the discriminant as follows:
(a) If $d = 0$, then $u_1(x) = 1$ and $u_2(x) = x$.
(b) If $d > 0$, then $u_1(x) = e^{kx}$ and $u_2(x) = e^{-kx}$, where $k = \frac{1}{2}\sqrt{d}$.
(c) If $d < 0$, then $u_1(x) = \cos kx$ and $u_2(x) = \sin kx$, where $k = \frac{1}{2}\sqrt{-d}$.
Note: In case (b), where the discriminant $d$ is positive, the solution $y$ in (8.29) is a linear
combination of two exponential functions,
\[ y = e^{-ax/2}(c_1 e^{kx} + c_2 e^{-kx}) = c_1 e^{r_1 x} + c_2 e^{r_2 x}, \]
where
\[ r_1 = -\frac{a}{2} + k = \frac{-a + \sqrt{d}}{2}, \qquad r_2 = -\frac{a}{2} - k = \frac{-a - \sqrt{d}}{2}. \]
The two numbers $r_1$ and $r_2$ have sum $r_1 + r_2 = -a$ and product $r_1 r_2 = \frac{1}{4}(a^2 - d) = b$.
Therefore, they are the roots of the quadratic equation
\[ r^2 + ar + b = 0. \]
This is called the characteristic equation associated with the differential equation
\[ y'' + ay' + by = 0. \]
The number $d = a^2 - 4b$ is also called the discriminant of this quadratic equation; its
algebraic sign determines the nature of the roots. If $d \ge 0$, the quadratic equation has real
roots given by $(-a \pm \sqrt{d})/2$. If $d < 0$, the quadratic equation has no real roots but it
does have complex roots $r_1$ and $r_2$. The definition of the exponential function can be
extended so that $e^{r_1 x}$ and $e^{r_2 x}$ are meaningful when $r_1$ and $r_2$ are complex numbers. This
extension, described in Chapter 9, is made in such a way that the linear combination in
(8.29) can also be written as a linear combination of $e^{r_1 x}$ and $e^{r_2 x}$ when $r_1$ and $r_2$ are
complex.
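The relation between the discriminant and the roots of the characteristic equation $r^2 + ar + b = 0$ can be illustrated with a short computation. This is a sketch only; the function name `characteristic_roots` is a hypothetical helper, and complex roots are returned as Python `complex` values.

```python
import cmath
import math

def characteristic_roots(a, b):
    """Return (d, (r1, r2)) for r**2 + a*r + b = 0, where d = a**2 - 4*b."""
    d = a * a - 4.0 * b
    if d >= 0:
        s = math.sqrt(d)   # real roots (-a + sqrt(d))/2 and (-a - sqrt(d))/2
    else:
        s = cmath.sqrt(d)  # purely imaginary square root, giving complex roots
    return d, ((-a + s) / 2.0, (-a - s) / 2.0)
```

For $y'' + 2y' - 3y = 0$ this gives $d = 16$ and roots $1$ and $-3$; for $y'' - 2y' + 5y = 0$ it gives $d = -16$ and roots $1 \pm 2i$. In both cases the roots have sum $-a$ and product $b$.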
We conclude this section with some miscellaneous remarks. Since all the solutions of
the differential equation $y'' + ay' + by = 0$ are contained in formula (8.29), the linear
combination on the right is often called the general solution of the differential equation.
Any solution obtained by specializing the constants $c_1$ and $c_2$ is called a particular solution.
For example, taking $c_1 = 1$, $c_2 = 0$, and then $c_1 = 0$, $c_2 = 1$, we obtain the two particular
solutions
\[ v_1 = e^{-ax/2}u_1(x), \qquad v_2 = e^{-ax/2}u_2(x). \]
These two solutions are of special importance because linear combinations of them give
us all solutions. Any pair of solutions with this property is called a basis for the set of
all solutions.
A differential equation always has more than one basis. For example, the equation
$y'' = 9y$ has the basis $v_1 = e^{3x}$, $v_2 = e^{-3x}$. But it also has the basis $w_1 = \cosh 3x$,
$w_2 = \sinh 3x$. In fact, since $e^{3x} = w_1 + w_2$ and $e^{-3x} = w_1 - w_2$, every linear combination of $e^{3x}$
and $e^{-3x}$ is also a linear combination of $w_1$ and $w_2$. Hence, the pair $w_1$, $w_2$ is another basis.
It can be shown that any pair of solutions $v_1$ and $v_2$ of a differential equation
$y'' + ay' + by = 0$ will be a basis if the ratio $v_2/v_1$ is not constant. Although we shall not need
this fact, we mention it here because it is important in the theory of second-order linear
equations with nonconstant coefficients. A proof is outlined in Exercise 23 of Section 8.14.
8.14 Exercises
Find all solutions of the following differential equations on $(-\infty, +\infty)$.
1. $y'' - 4y = 0$.
2. $y'' + 4y = 0$.
3. $y'' - 4y' = 0$.
4. $y'' + 4y' = 0$.
5. $y'' - 2y' + 3y = 0$.
6. $y'' + 2y' - 3y = 0$.
7. $y'' - 2y' + 2y = 0$.
8. $y'' - 2y' + 5y = 0$.
9. $y'' + 2y' + y = 0$.
10. $y'' - 2y' + y = 0$.
In Exercises 11 through 14, find the particular solution satisfying the given initial conditions.
11. $2y'' + 3y' = 0$, with $y = 1$ and $y' = 1$ when $x = 0$.
12. $y'' + 25y = 0$, with $y = -1$ and $y' = 0$ when $x = 3$.
13. $y'' - 4y' - y = 0$, with $y = 2$ and $y' = -1$ when $x = 1$.
14. $y'' + 4y' + 5y = 0$, with $y = 2$ and $y' = y''$ when $x = 0$.
15. The graph of a solution $u$ of the differential equation $y'' - 4y' + 29y = 0$ intersects the graph
of a solution $v$ of the equation $y'' + 4y' + 13y = 0$ at the origin. The two curves have equal
slopes at the origin. Determine $u$ and $v$ if $u'(\pi/2) = 1$.
16. The graph of a solution $u$ of the differential equation $y'' - 3y' - 4y = 0$ intersects the graph
of a solution $v$ of the equation $y'' + 4y' - 5y = 0$ at the origin. Determine $u$ and $v$ if the two
curves have equal slopes at the origin and if
\[ \lim_{x \to +\infty} \frac{[v(x)]^4}{u(x)} = \frac{5}{6}. \]
17. Find all values of the constant $k$ such that the differential equation $y'' + ky = 0$ has a
nontrivial solution $y = f_k(x)$ for which $f_k(0) = f_k(1) = 0$. For each permissible value of $k$,
determine the corresponding solution $y = f_k(x)$. Consider both positive and negative values of $k$.
18. If (a, b) is a given point in the plane and if m is a given real number, prove that the differential
equation y” + k2y = 0 has exactly one solution whose graph passes through (a, b) and has the
slope m there. Discuss also the case k = 0.
19. (a) Let $(a_1, b_1)$ and $(a_2, b_2)$ be two points in the plane such that $a_1 - a_2 \ne n\pi$, where $n$ is an
integer. Prove that there is exactly one solution of the differential equation $y'' + y = 0$ whose
graph passes through these two points.
(b) Is the statement in part (a) ever true if $a_1 - a_2$ is a multiple of $\pi$?
(c) Generalize the result in part (a) for the equation $y'' + k^2 y = 0$. Discuss also the case $k = 0$.
20. In each case, find a linear differential equation of second order satisfied by $u_1$ and $u_2$.
(a) $u_1(x) = e^x$, $u_2(x) = e^{-x}$.
(b) $u_1(x) = e^{2x}$, $u_2(x) = xe^{2x}$.
(c) $u_1(x) = e^{-x/2}\cos x$, $u_2(x) = e^{-x/2}\sin x$.
(d) $u_1(x) = \sin(2x + 1)$, $u_2(x) = \sin(2x + 2)$.
(e) $u_1(x) = \cosh x$, $u_2(x) = \sinh x$.
The Wronskian. Given two functions $u_1$ and $u_2$, the function $W$ defined by
$W(x) = u_1(x)u_2'(x) - u_2(x)u_1'(x)$ is called their Wronskian, after J. M. H. Wronski (1778-1853).
The following exercises are concerned with properties of the Wronskian.
21. (a) If the Wronskian $W(x)$ of $u_1$ and $u_2$ is zero for all $x$ in an open interval $I$, prove that the
quotient $u_2/u_1$ is constant on $I$. In other words, if $u_2/u_1$ is not constant on $I$, then $W(c) \ne 0$
for at least one $c$ in $I$.
(b) Prove that the derivative of the Wronskian is $W' = u_1 u_2'' - u_2 u_1''$.
22. Let $W$ be the Wronskian of two solutions $u_1$, $u_2$ of the differential equation $y'' + ay' + by = 0$,
where $a$ and $b$ are constants.
(a) Prove that $W$ satisfies the first-order equation $W' + aW = 0$ and hence $W(x) = W(0)e^{-ax}$.
This formula shows that if $W(0) \ne 0$, then $W(x) \ne 0$ for all $x$.
(b) Assume $u_1$ is not identically zero. Prove that $W(0) = 0$ if and only if $u_2/u_1$ is constant.
23. Let $v_1$ and $v_2$ be any two solutions of the differential equation $y'' + ay' + by = 0$ such that
$v_2/v_1$ is not constant.
(a) Let $y = f(x)$ be any solution of the differential equation. Use properties of the Wronskian
to prove that constants $c_1$ and $c_2$ exist such that
\[ c_1 v_1(0) + c_2 v_2(0) = f(0), \qquad c_1 v_1'(0) + c_2 v_2'(0) = f'(0). \]
(b) Prove that every solution has the form $y = c_1 v_1 + c_2 v_2$. In other words, $v_1$ and $v_2$ form
a basis for the set of all solutions.
8.15 Nonhomogeneous linear equations of second order with constant coefficients
We turn now to a discussion of nonhomogeneous equations of the form
\[ y'' + ay' + by = R, \tag{8.30} \]
where the coefficients $a$ and $b$ are constants but the right-hand member $R$ is any function
continuous on $(-\infty, +\infty)$. The discussion may be simplified by the use of operator
notation. For any function $f$ with derivatives $f'$ and $f''$, we may define an operator $L$
which transforms $f$ into another function $L(f)$ defined by the equation
\[ L(f) = f'' + af' + bf. \]
In operator notation, the differential equation (8.30) is written in the simpler form
\[ L(y) = R. \]
It is easy to verify that $L(y_1 + y_2) = L(y_1) + L(y_2)$, and that $L(cy) = cL(y)$ for every
constant $c$. Therefore, for every pair of constants $c_1$ and $c_2$, we have
\[ L(c_1 y_1 + c_2 y_2) = c_1 L(y_1) + c_2 L(y_2). \]
This is called the linearity property of the operator $L$.
Now suppose $y_1$ and $y_2$ are any two solutions of the equation $L(y) = R$. Since
$L(y_1) = L(y_2) = R$, linearity gives us
\[ L(y_2 - y_1) = L(y_2) - L(y_1) = R - R = 0, \]
so $y_2 - y_1$ is a solution of the homogeneous equation $L(y) = 0$. Therefore, we must have
$y_2 - y_1 = c_1 v_1 + c_2 v_2$, where $c_1 v_1 + c_2 v_2$ is the general solution of the homogeneous
equation, or
\[ y_2 = c_1 v_1 + c_2 v_2 + y_1. \]
This equation must be satisfied by every pair of solutions $y_1$ and $y_2$ of the nonhomogeneous
equation $L(y) = R$. Therefore, if we can determine one particular solution $y_1$ of the
nonhomogeneous equation, all solutions are contained in the formula
\[ y = c_1 v_1 + c_2 v_2 + y_1, \tag{8.31} \]
where $c_1$ and $c_2$ are arbitrary constants. Each such $y$ is clearly a solution of $L(y) = R$
because $L(c_1 v_1 + c_2 v_2 + y_1) = L(c_1 v_1 + c_2 v_2) + L(y_1) = 0 + R = R$.
Since all solutions of $L(y) = R$ are found in (8.31), the linear combination
$c_1 v_1 + c_2 v_2 + y_1$ is called the general solution of (8.30). Thus, we have proved the
following theorem.
THEOREM 8.8. If $y_1$ is a particular solution of the nonhomogeneous equation $L(y) = R$,
the general solution is obtained by adding to $y_1$ the general solution of the corresponding
homogeneous equation $L(y) = 0$.
Theorem 8.7 tells us how to find the general solution of the homogeneous equation
$L(y) = 0$. It has the form $y = c_1 v_1 + c_2 v_2$, where
\[ v_1(x) = e^{-ax/2}u_1(x), \qquad v_2(x) = e^{-ax/2}u_2(x), \tag{8.32} \]
the functions $u_1$ and $u_2$ being determined by the discriminant of the equation, as described
in Theorem 8.7. Now we show that $v_1$ and $v_2$ can be used to construct a particular solution
$y_1$ of the nonhomogeneous equation $L(y) = R$.
The construction involves a function $W$ defined by the equation
\[ W(x) = v_1(x)v_2'(x) - v_2(x)v_1'(x). \]
This is called the Wronskian of $v_1$ and $v_2$; some of its properties are described in Exercises
21 and 22 of Section 8.14. We shall need the property that $W(x)$ is never zero. This can be
proved by the methods outlined in the exercises or it can be verified directly for the particular
functions $v_1$ and $v_2$ given in (8.32).
THEOREM 8.9. Let $v_1$ and $v_2$ be the solutions of the equation $L(y) = 0$ given by (8.32),
where $L(y) = y'' + ay' + by$. Let $W$ denote the Wronskian of $v_1$ and $v_2$. Then the
nonhomogeneous equation $L(y) = R$ has a particular solution $y_1$ given by the formula
\[ y_1(x) = t_1(x)v_1(x) + t_2(x)v_2(x), \]
where
\[ t_1(x) = -\int v_2(x)\frac{R(x)}{W(x)}\,dx, \qquad t_2(x) = \int v_1(x)\frac{R(x)}{W(x)}\,dx. \tag{8.33} \]
Proof. Let us try to find functions $t_1$ and $t_2$ such that the combination $y_1 = t_1 v_1 + t_2 v_2$
will satisfy the equation $L(y_1) = R$. We have
\[ y_1' = t_1 v_1' + t_2 v_2' + (t_1' v_1 + t_2' v_2), \]
\[ y_1'' = t_1 v_1'' + t_2 v_2'' + (t_1' v_1' + t_2' v_2') + (t_1' v_1 + t_2' v_2)'. \]
When we form the linear combination $L(y_1) = y_1'' + ay_1' + by_1$, the terms involving $t_1$
and $t_2$ drop out because of the relations $L(v_1) = L(v_2) = 0$. The remaining terms give us
the relation
\[ L(y_1) = (t_1' v_1' + t_2' v_2') + (t_1' v_1 + t_2' v_2)' + a(t_1' v_1 + t_2' v_2). \]
We want to choose $t_1$ and $t_2$ so that $L(y_1) = R$. We can satisfy this equation if we choose
$t_1$ and $t_2$ so that
\[ t_1' v_1 + t_2' v_2 = 0 \quad \text{and} \quad t_1' v_1' + t_2' v_2' = R. \]
This is a pair of algebraic equations for $t_1'$ and $t_2'$. The determinant of the system is the
Wronskian of $v_1$ and $v_2$. Since this is never zero, the system has a solution given by
\[ t_1' = -v_2 R/W \quad \text{and} \quad t_2' = v_1 R/W. \]
Integrating these relations, we obtain Equation (8.33), thus completing the proof.
The method by which we obtained the solution $y_1$ is sometimes called variation of parameters.
It was first used by Johann Bernoulli in 1697 to solve linear equations of first order,
and then by Lagrange in 1774 to solve linear equations of second order.
Note: Since the functions $t_1$ and $t_2$ in Theorem 8.9 are expressed as indefinite integrals,
each of them is determined only to within an additive constant. If we add a constant $c_1$
to $t_1$ and a constant $c_2$ to $t_2$ we change the function $y_1$ to a new function
$y_2 = y_1 + c_1 v_1 + c_2 v_2$. By linearity, we have
\[ L(y_2) = L(y_1) + L(c_1 v_1 + c_2 v_2) = L(y_1), \]
so the new function $y_2$ is also a particular solution of the nonhomogeneous equation.
EXAMPLE 1. Find the general solution of the equation $y'' + y = \tan x$ on $(-\pi/2, \pi/2)$.
Solution. The functions $v_1$ and $v_2$ of Equation (8.32) are given by
\[ v_1(x) = \cos x, \qquad v_2(x) = \sin x. \]
Their Wronskian is $W(x) = v_1(x)v_2'(x) - v_2(x)v_1'(x) = \cos^2 x + \sin^2 x = 1$. Therefore
Equation (8.33) gives us
\[ t_1(x) = -\int \sin x \tan x\,dx = \sin x - \log|\sec x + \tan x|, \]
and
\[ t_2(x) = \int \cos x \tan x\,dx = \int \sin x\,dx = -\cos x. \]
Thus, a particular solution of the nonhomogeneous equation is
\[ y_1 = t_1(x)v_1(x) + t_2(x)v_2(x) = \sin x \cos x - \cos x \log|\sec x + \tan x| - \sin x \cos x = -\cos x \log|\sec x + \tan x|. \]
By Theorem 8.8, its general solution is
\[ y = c_1 \cos x + c_2 \sin x - \cos x \log|\sec x + \tan x|. \]
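The particular solution found in Example 1 can be checked numerically by substituting it into $y'' + y = \tan x$, with the second derivative approximated by a central difference. This is a rough sanity check rather than a proof; the step size `h`, the sample points, and the helper names `y1` and `residual` are assumptions made for the illustration.

```python
import math

def y1(x):
    """Particular solution from Example 1: -cos x * log|sec x + tan x|."""
    return -math.cos(x) * math.log(abs(1.0 / math.cos(x) + math.tan(x)))

def residual(y, x, h=1e-4):
    """Approximate y''(x) + y(x) - tan x using a central second difference."""
    second = (y(x + h) - 2.0 * y(x) + y(x - h)) / (h * h)
    return second + y(x) - math.tan(x)

# Sample points inside (-pi/2, pi/2), where tan x is continuous.
worst = max(abs(residual(y1, x)) for x in (-1.2, -0.5, 0.0, 0.3, 1.0, 1.2))
```

The value of `worst` is small (limited by the finite-difference error), which is consistent with $y_1$ satisfying the equation on the whole interval.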
Although Theorem 8.9 provides a general method for determining a particular solution
of L(y) = R, special methods are available that are often easier to apply when the function
R has certain special forms. In the next section we describe a method that works when R
is a polynomial or a polynomial times an exponential.
8.16 Special methods for determining a particular solution of the nonhomogeneous equation y'' + ay' + by = R
CASE 1. The right-hand member $R$ is a polynomial of degree $n$. If $b \ne 0$, we can always
find a polynomial of degree $n$ that satisfies the equation. We try a polynomial of the form
\[ y_1(x) = \sum_{k=0}^{n} a_k x^k \]
with undetermined coefficients. Substituting in the differential equation $L(y) = R$ and
equating coefficients of like powers of $x$, we may determine $a_n, a_{n-1}, \ldots, a_1, a_0$ in succession.
The method is illustrated by the following example.
EXAMPLE 1. Find the general solution of the equation $y'' + y = x^3$.
Solution. The general solution of the homogeneous equation $y'' + y = 0$ is given by
$y = c_1 \cos x + c_2 \sin x$. To this we must add one particular solution of the nonhomogeneous
equation. Since the right member is a cubic polynomial and since the coefficient of $y$ is
nonzero, we try to find a particular solution of the form $y_1(x) = Ax^3 + Bx^2 + Cx + D$.
Differentiating twice, we find that $y_1''(x) = 6Ax + 2B$. The differential equation leads to
the relation
\[ (6Ax + 2B) + (Ax^3 + Bx^2 + Cx + D) = x^3. \]
Equating coefficients of like powers of $x$, we obtain $A = 1$, $B = 0$, $C = -6$, and $D = 0$,
so a particular solution is $y_1(x) = x^3 - 6x$. Thus, the general solution is
\[ y = c_1 \cos x + c_2 \sin x + x^3 - 6x. \]
It may be of interest to compare this method with variation of parameters. Equation
(8.33) gives us
\[ t_1(x) = -\int x^3 \sin x\,dx = -(3x^2 - 6)\sin x + (x^3 - 6x)\cos x \]
and
\[ t_2(x) = \int x^3 \cos x\,dx = (3x^2 - 6)\cos x + (x^3 - 6x)\sin x. \]
When we form the combination $t_1 v_1 + t_2 v_2$, we find the particular solution $y_1(x) = x^3 - 6x$,
as before. In this case, the use of variation of parameters required the evaluation of the
integrals $\int x^3 \sin x\,dx$ and $\int x^3 \cos x\,dx$. With the method of undetermined coefficients, no
integration is required.
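The procedure of Case 1 amounts to solving a triangular system from the highest power downward: comparing coefficients of $x^j$ in $y'' + ay' + by = R$ gives $(j+2)(j+1)a_{j+2} + a(j+1)a_{j+1} + b\,a_j = R_j$. The sketch below implements this recurrence under the assumption $b \ne 0$; the function name `polynomial_particular_solution` and the list-of-coefficients representation are illustrative choices, not notation from the text.

```python
def polynomial_particular_solution(a, b, rhs):
    """Coefficients [c0, ..., cn] of a polynomial solving y'' + a*y' + b*y = R.

    rhs[j] is the coefficient of x**j in the polynomial R; this sketch requires b != 0.
    """
    if b == 0:
        raise ValueError("this sketch assumes b != 0")
    n = len(rhs) - 1
    coeffs = [0.0] * (n + 3)  # two extra zero slots stand for c_{n+1} and c_{n+2}
    for j in range(n, -1, -1):
        # Coefficient of x**j: (j+2)(j+1) c_{j+2} + a (j+1) c_{j+1} + b c_j = rhs[j].
        coeffs[j] = (rhs[j]
                     - a * (j + 1) * coeffs[j + 1]
                     - (j + 2) * (j + 1) * coeffs[j + 2]) / b
    return coeffs[: n + 1]
```

For the equation of Example 1, `polynomial_particular_solution(0, 1, [0, 0, 0, 1])` returns the coefficients of $x^3 - 6x$, listed from the constant term upward.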
If the coefficient $b$ is zero, the equation $y'' + ay' = R$ cannot be satisfied by a polynomial
of degree $n$, but it can be satisfied by a polynomial of degree $n + 1$ if $a \ne 0$. If both $a$ and
$b$ are zero, the equation becomes $y'' = R$; its general solution is a polynomial of degree
$n + 2$ obtained by two successive integrations.
CASE 2. The right-hand member has the form $R(x) = p(x)e^{mx}$, where $p$ is a polynomial
of degree $n$, and $m$ is constant.
In this case the change of variable $y = u(x)e^{mx}$ transforms the differential equation
$y'' + ay' + by = R$ to a new equation,
\[ u'' + (2m + a)u' + (m^2 + am + b)u = p. \]
This is the type discussed in Case 1 so it always has a polynomial solution $u_1$. Hence, the
original equation has a particular solution of the form $y_1 = u_1(x)e^{mx}$, where $u_1$ is a
polynomial. If $m^2 + am + b \ne 0$, the degree of $u_1$ is the same as the degree of $p$. If
$m^2 + am + b = 0$ but $2m + a \ne 0$, the degree of $u_1$ is one greater than that of $p$. If both
$m^2 + am + b = 0$ and $2m + a = 0$, the degree of $u_1$ is two greater than the degree of $p$.
EXAMPLE 2. Find a particular solution of the equation $y'' + y = xe^{3x}$.
Solution. The change of variable $y = ue^{3x}$ leads to the new equation
$u'' + 6u' + 10u = x$. Trying $u_1(x) = Ax + B$, we find the particular solution $u_1(x) = (5x - 3)/50$, so
a particular solution of the original equation is $y_1 = e^{3x}(5x - 3)/50$.
The method of undetermined coefficients can also be used if $R$ has the form
$R(x) = p(x)e^{mx}\cos \alpha x$, or $R(x) = p(x)e^{mx}\sin \alpha x$, where $p$ is a polynomial and $m$ and $\alpha$ are constants.
In either case, there is always a particular solution of the form
$y_1(x) = e^{mx}[q(x)\cos \alpha x + r(x)\sin \alpha x]$, where $q$ and $r$ are polynomials.
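The reduction used in Case 2 and the result of Example 2 can be checked with a few lines of code. The sketch below computes the coefficients of the transformed equation for $u$ and tests the particular solution $y_1 = e^{3x}(5x - 3)/50$ by a finite-difference substitution; the helper names `reduced_coefficients`, `y1`, and `residual`, as well as the step size, are assumptions made for illustration.

```python
import math

def reduced_coefficients(a, b, m):
    """Coefficients (2m + a, m^2 + am + b) of u' and u after substituting y = u e^{mx}."""
    return 2.0 * m + a, m * m + a * m + b

def y1(x):
    """Particular solution of y'' + y = x e^{3x} found in Example 2."""
    return math.exp(3.0 * x) * (5.0 * x - 3.0) / 50.0

def residual(x, h=1e-4):
    """Approximate y1''(x) + y1(x) - x e^{3x} with a central second difference."""
    second = (y1(x + h) - 2.0 * y1(x) + y1(x - h)) / (h * h)
    return second + y1(x) - x * math.exp(3.0 * x)
```

With $a = 0$, $b = 1$, $m = 3$, `reduced_coefficients` returns $(6, 10)$, matching the equation $u'' + 6u' + 10u = x$, and `residual` is close to zero at sample points.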
8.17 Exercises
Find the general solution of each of the differential equations in Exercises 1 through 17. If the
solution is not valid over the entire real axis, describe an interval over which it is valid.
1. $y'' - y = x$.
2. $y'' - y' = x^2$.
3. $y'' + y' = x^2 + 2x$.
4. $y'' - 2y' + 3y = x^3$.
5. $y'' - 5y' + 4y = x^2 - 2x + 1$.
6. $y'' + y' - 6y = 2x^3 + 5x^2 - 7x + 2$.
7. $y'' - 4y = e^{2x}$.
8. $y'' + 4y = e^{-2x}$.
9. $y'' + y' - 2y = e^x$.
10. $y'' + y' - 2y = e^{2x}$.
11. $y'' + y' - 2y = e^x + e^{2x}$.
12. $y'' - 2y' + y = x + 2xe^x$.
13. $y'' + 2y' + y = e^{-x}/x^2$.
14. $y'' + y = \cot^2 x$.
15. $y'' - y = 2/(1 + e^x)$.
16. $y'' + y' - 2y = e^x/(1 + e^x)$.
17. $y'' + 6y' + 9y = f(x)$, where $f(x) = 1$ for $1 < x < 2$, and $f(x) = 0$ for all other $x$.
18. If $k$ is a nonzero constant, prove that the equation $y'' - k^2 y = R(x)$ has a particular solution
$y_1$ given by
\[ y_1 = \frac{1}{k}\int_0^x R(t)\sinh k(x - t)\,dt. \]
Find the general solution of the equation $y'' - 9y = e^{3x}$.
19. If $k$ is a nonzero constant, prove that the equation $y'' + k^2 y = R(x)$ has a particular solution
$y_1$ given by
\[ y_1 = \frac{1}{k}\int_0^x R(t)\sin k(x - t)\,dt. \]
Find the general solution of the equation $y'' + 9y = \sin 3x$.
In each of Exercises 20 through 25, determine the general solution.
20. $y'' + y = \sin x$.
21. $y'' + y = \cos x$.
22. $y'' + 4y = 3x\cos x$.
23. $y'' + 4y = 3x\sin x$.
24. $y'' - 3y' = 2e^{2x}\sin x$.
25. $y'' + y = e^{2x}\cos 3x$.
8.18 Examples of physical problems leading to linear second-order equations with constant coefficients
EXAMPLE 1. Simple harmonic motion. Suppose a particle is constrained to move in a
straight line with its acceleration directed toward a fixed point of the line and proportional
to the displacement from that point. If we take the origin as the fixed point and let $y$ be
the displacement at time $x$, then the acceleration $y''$ must be negative when $y$ is positive,
and positive when $y$ is negative. Therefore we can write $y'' = -k^2 y$, or
\[ y'' + k^2 y = 0, \]
where $k^2$ is a positive constant. This is called the differential equation of simple harmonic
motion. It is often used as the mathematical model for the motion of a point on a vibrating
mechanism such as a plucked string or a vibrating tuning fork. The same equation arises
in electric circuit theory where it is called the equation of the harmonic oscillator.
Theorem 8.6 tells us that all solutions have the form
\[ y = A\sin kx + B\cos kx, \tag{8.34} \]
where $A$ and $B$ are arbitrary constants. We can express the solutions in terms of the sine
or cosine alone. For example, we can introduce new constants $C$ and $\alpha$, where
\[ C = \sqrt{A^2 + B^2} \quad \text{and} \quad \alpha = \arctan\frac{B}{A}; \]
then we have (see Figure 8.4) $A = C\cos\alpha$, $B = C\sin\alpha$, and Equation (8.34) becomes
\[ y = C\cos\alpha\sin kx + C\sin\alpha\cos kx = C\sin(kx + \alpha). \]
When the solution is written in this way, the constants $C$ and $\alpha$ have a simple geometric
interpretation (see Figure 8.5). The extreme values of $y$, which occur when $\sin(kx + \alpha) = \pm 1$,
are $\pm C$. When $x = 0$, the initial displacement is $C\sin\alpha$. As $x$ increases, the particle
oscillates between the extreme values $+C$ and $-C$ with period $2\pi/k$. The angle $kx + \alpha$
is called the phase angle and $\alpha$ itself is called the initial value of the phase angle.
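The passage from the form $A\sin kx + B\cos kx$ to the form $C\sin(kx + \alpha)$ can be sketched in code. One deliberate departure from the text: the sketch uses `math.atan2(B, A)` in place of $\arctan(B/A)$, an assumption made so that $A = C\cos\alpha$ and $B = C\sin\alpha$ also hold when $A \le 0$. The function name `amplitude_phase` is a hypothetical helper.

```python
import math

def amplitude_phase(A, B):
    """Return (C, alpha) with A = C cos(alpha) and B = C sin(alpha)."""
    C = math.hypot(A, B)      # C = sqrt(A^2 + B^2)
    alpha = math.atan2(B, A)  # equals arctan(B/A) when A > 0
    return C, alpha

# Illustrative check that both forms describe the same motion.
C, alpha = amplitude_phase(3.0, 4.0)
k, x = 2.0, 0.7
lhs = 3.0 * math.sin(k * x) + 4.0 * math.cos(k * x)
rhs = C * math.sin(k * x + alpha)
```

Here `C` is 5, and `lhs` and `rhs` agree up to rounding error.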
FIGURE 8.4
FIGURE 8.5 Simple harmonic motion.
EXAMPLE 2. Damped vibrations. If a particle undergoing simple harmonic motion is
suddenly subjected to an external force proportional to its velocity, the new motion satisfies
a differential equation of the form
\[ y'' + 2cy' + k^2 y = 0, \]
where $c$ and $k^2$ are constants, $c \ne 0$, $k > 0$. If $c > 0$, we will show that all solutions tend
to zero as $x \to +\infty$. In this case, the differential equation is said to be stable. The external
force causes damping of the motion. If $c < 0$, we will show that some solutions have
arbitrarily large absolute values as $x \to +\infty$. In this case, the equation is said to be
unstable.
Since the discriminant of the equation is $d = (2c)^2 - 4k^2 = 4(c^2 - k^2)$, the nature of
the solutions is determined by the relative sizes of $c^2$ and $k^2$. The three cases $d = 0$, $d > 0$,
and $d < 0$ may be analyzed as follows:
(a) Zero discriminant: $c^2 = k^2$. In this case, all solutions have the form
\[ y = e^{-cx}(A + Bx). \]
If $c > 0$, all solutions tend to 0 as $x \to +\infty$. This case is referred to as critical damping.
If $B \ne 0$, each solution will change sign exactly once because of the linear factor $A + Bx$.
An example is shown in Figure 8.6(a). If $c < 0$, each nontrivial solution tends to $+\infty$ or
to $-\infty$ as $x \to +\infty$.
(b) Positive discriminant: $c^2 > k^2$. By Theorem 8.7 all solutions have the form
\[ y = e^{-cx}(Ae^{hx} + Be^{-hx}) = Ae^{(h-c)x} + Be^{-(h+c)x}, \]
where $h = \frac{1}{2}\sqrt{d} = \sqrt{c^2 - k^2}$. Since $h^2 = c^2 - k^2$, we have $h^2 - c^2 < 0$ so $(h - c)(h + c) < 0$.
Therefore, the numbers $h - c$ and $h + c$ have opposite signs. If $c > 0$, then $h + c$ is
positive so $h - c$ is negative, and hence both exponentials $e^{(h-c)x}$ and $e^{-(h+c)x}$ tend to zero
as $x \to +\infty$. In this case, referred to as overcritical damping, all solutions tend to 0 for
large $x$. An example is shown in Figure 8.6(a). Each solution can change sign at most
once.
If $c < 0$, then $h - c$ is positive but $h + c$ is negative. Thus, both exponentials $e^{(h-c)x}$
and $e^{-(h+c)x}$ tend to $+\infty$ for large $x$, so again there are solutions with arbitrarily large
absolute values.
(c) Negative discriminant: $c^2 < k^2$. In this case, all solutions have the form
\[ y = Ce^{-cx}\sin(hx + \alpha), \]
where $h = \frac{1}{2}\sqrt{-d} = \sqrt{k^2 - c^2}$. If $c > 0$, every nontrivial solution oscillates, but the
amplitude of the oscillation decreases to 0 as $x \to +\infty$. This case is called undercritical
damping and is illustrated in Figure 8.6(b). If $c < 0$, all nontrivial solutions take arbitrarily
large positive and negative values as $x \to +\infty$.
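The three cases (a), (b), (c) for the stable equation ($c > 0$) can be summarized by a small classifier based on the sign of $d = 4(c^2 - k^2)$. This is a minimal sketch; the function name `classify_damping` and the use of `math.isclose` to detect the borderline case in floating-point arithmetic are assumptions made for illustration.

```python
import math

def classify_damping(c, k):
    """Classify y'' + 2c y' + k^2 y = 0 for c > 0, k > 0 by the sign of 4(c^2 - k^2)."""
    if c <= 0 or k <= 0:
        raise ValueError("this sketch covers only the stable case c > 0, k > 0")
    if math.isclose(c * c, k * k, rel_tol=1e-12):
        return "critical"        # d = 0: y = e^{-cx}(A + Bx)
    if c * c > k * k:
        return "overcritical"    # d > 0: sum of two decaying exponentials
    return "undercritical"       # d < 0: y = C e^{-cx} sin(hx + alpha)
```

For instance, `classify_damping(2.0, 1.0)` returns `"overcritical"` and `classify_damping(1.0, 2.0)` returns `"undercritical"`.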
FIGURE 8.6 Damped vibrations occurring as solutions of $y'' + 2cy' + k^2 y = 0$, with
$c > 0$, and discriminant $4(c^2 - k^2)$. (a) Discriminant 0 or positive (curve labeled critical
damping). (b) Discriminant negative (curve labeled undercritical damping).
EXAMPLE 3. Electric circuits. If we insert a capacitor in the electric circuit of Example 5
in Section 8.6, the differential equation which serves as a model for this circuit is given by
\[ LI'(t) + RI(t) + \frac{1}{C}\int I(t)\,dt = V(t), \]
where $C$ is a positive constant called the capacitance. Differentiation of this equation gives
a second-order linear equation of the form
\[ LI''(t) + RI'(t) + \frac{1}{C}I(t) = V'(t). \]
If the impressed voltage $V(t)$ is constant, the right member is zero and the equation takes
the form
\[ I''(t) + \frac{R}{L}I'(t) + \frac{1}{LC}I(t) = 0. \]
This is the same type of equation analyzed in Example 2 except that $2c$ is replaced by $R/L$,
and $k^2$ is replaced by $1/(LC)$. In this case, the coefficient $c$ is positive so the equation is
always stable. In other words, the current $I(t)$ always tends to 0 as $t \to +\infty$. The
terminology of Example 2 is also used here. The current is said to be critically damped
when the discriminant is zero ($CR^2 = 4L$), overcritically damped when the discriminant
is positive ($CR^2 > 4L$), and undercritically damped when the discriminant is negative
($CR^2 < 4L$).
EXAMPLE 4. Motion of a rocket with variable mass. A rocket is propelled by burning
fuel in a combustion chamber, allowing the products of combustion to be expelled backward.
Assume the rocket starts from rest and moves vertically upward along a straight line.
Designate the altitude of the rocket at time $t$ by $r(t)$, the mass of the rocket (including fuel)
by $m(t)$, and the velocity of the exhaust matter, relative to the rocket, by $c(t)$. In the absence
of external forces, the equation
\[ m(t)r''(t) = m'(t)c(t) \tag{8.35} \]
is used as a mathematical model for discussing the motion. The left member, $m(t)r''(t)$, is
the product of the mass of the rocket and its acceleration. The right member, $m'(t)c(t)$, is
the accelerating force on the rocket caused by the thrust developed by the rocket engine.
In the examples to be considered here, $m(t)$ and $c(t)$ are known or can be prescribed in
terms of $r(t)$ or its derivative $r'(t)$ (the velocity of the rocket). Equation (8.35) then becomes
a second-order differential equation for the position function $r$.
If external forces are also present, such as gravitational attraction, then, instead of
(8.35), we use the equation
\[ m(t)r''(t) = m'(t)c(t) + F(t), \tag{8.36} \]
where $F(t)$ represents the sum of all external forces acting on the rocket at time $t$.
Before we consider a specific example, we will give an argument which may serve to
motivate Equation (8.35). For this purpose we consider first a rocket that fires its
exhaust matter intermittently, like bullets from a gun. Specifically, we consider a time
interval $[t, t + h]$, where $h$ is a small positive number; we assume that some exhaust
matter is expelled at time $t$, and that no further exhaust matter is expelled in the half-open
interval $(t, t + h]$. On the basis of this assumption, we obtain a formula whose limit, as
$h \to 0$, is Equation (8.35).
Just before the exhaust material is expelled at time $t$, the rocket has mass $m(t)$ and
velocity $v(t)$. At the end of the time interval $[t, t + h]$, the rocket has mass $m(t + h)$ and
velocity $v(t + h)$. The mass of the expelled matter is $m(t) - m(t + h)$, and its velocity
during the interval is $v(t) + c(t)$, since $c(t)$ is the velocity of the exhaust relative to the
rocket. Just before the exhaust material is expelled at time $t$, the rocket is a system with
momentum $m(t)v(t)$. At time $t + h$, this system consists of two parts, a rocket with
momentum $m(t + h)v(t + h)$ and exhaust matter with momentum $[m(t) - m(t + h)][v(t) + c(t)]$.
The law of conservation of momentum states that the momentum of the new system
must be equal to that of the old. Therefore, we have
\[ m(t)v(t) = m(t + h)v(t + h) + [m(t) - m(t + h)][v(t) + c(t)], \]
from which we obtain
\[ m(t + h)[v(t + h) - v(t)] = [m(t + h) - m(t)]c(t). \]
Dividing by $h$ and letting $h \to 0$, we find that
\[ m(t)v'(t) = m'(t)c(t), \]
which is equivalent to Equation (8.35).
Consider a special case in which the rocket starts from rest with an initial weight of
$w$ pounds (including $b$ pounds of fuel) and moves vertically upward along a straight line.
Assume the fuel is consumed at a constant rate of $k$ pounds per second and that the products
of combustion are discharged directly backward with a constant speed of $c$ feet per second
relative to the rocket. Assume the only external force acting on the rocket is the earth's
gravitational attraction. We want to know how high the rocket will travel before all its
fuel is consumed.
Since all the fuel is consumed when $kt = b$, we restrict $t$ to the interval $0 \le t \le b/k$.
The only external force acting on the rocket is $-m(t)g$, and the velocity $c(t) = -c$, so Equation
(8.36) becomes
\[ m(t)r''(t) = -m'(t)c - m(t)g. \]
The weight of the rocket at time $t$ is $w - kt$, and its mass $m(t)$ is $(w - kt)/g$; hence we have
$m'(t) = -k/g$ and the foregoing equation becomes
\[ r''(t) = -\frac{m'(t)}{m(t)}c - g = \frac{kc}{w - kt} - g. \]
Integrating, and using the initial condition $r'(0) = 0$, we find
\[ r'(t) = -c\log\frac{w - kt}{w} - gt. \]
Integrating again and using the initial condition $r(0) = 0$, we obtain the relation
\[ r(t) = \frac{c(w - kt)}{k}\log\frac{w - kt}{w} - \frac{1}{2}gt^2 + ct. \]
All the fuel is consumed when $t = b/k$. At that instant the altitude is
\[ r\left(\frac{b}{k}\right) = \frac{c(w - b)}{k}\log\frac{w - b}{w} - \frac{1}{2}\frac{gb^2}{k^2} + \frac{cb}{k}. \tag{8.37} \]
This formula is valid if $b < w$. For some rockets, the weight of the carrier is negligible
compared to the weight of the fuel, and it is of interest to consider the limiting case $b = w$.
We cannot put $b = w$ in (8.37) because of the presence of the term $\log (w - b)/w$. However,
if we let $b \to w$, the first term in (8.37) is an indeterminate form with limit 0. Therefore,
when $b \to w$, the limiting value of the right member of (8.37) is
\[ \lim_{b \to w} r\left(\frac{b}{k}\right) = -\frac{1}{2}\frac{gw^2}{k^2} + \frac{cw}{k} = -\frac{1}{2}gT^2 + cT, \]
where $T = w/k$ is the time required for the entire weight $w$ to be consumed.
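The altitude formula for the rocket and its limiting value as $b \to w$ can be evaluated numerically. The sketch below assumes $g = 32$ feet per second per second and uses made-up sample values for $w$, $k$, and $c$; the function names `altitude`, `burnout_altitude`, and `limiting_altitude` are hypothetical helpers, not notation from the text.

```python
import math

G = 32.0  # assumed gravitational acceleration in ft/sec^2

def altitude(t, w, k, c, g=G):
    """r(t) = c(w - kt)/k * log((w - kt)/w) - g t^2 / 2 + c t, for 0 <= t < w/k."""
    if not 0 <= t < w / k:
        raise ValueError("t must satisfy 0 <= t < w/k")
    remaining = w - k * t  # weight of the rocket at time t
    return c * remaining / k * math.log(remaining / w) - 0.5 * g * t * t + c * t

def burnout_altitude(w, b, k, c, g=G):
    """Altitude (8.37) at the instant t = b/k when all b pounds of fuel are consumed."""
    return altitude(b / k, w, k, c, g)

def limiting_altitude(w, k, c, g=G):
    """Limit of (8.37) as b -> w: -g T^2 / 2 + c T with T = w/k."""
    T = w / k
    return -0.5 * g * T * T + c * T

# Illustrative numbers only: w = 1000 lb, k = 10 lb/sec, c = 5000 ft/sec.
near_limit = burnout_altitude(1000.0, 1000.0 * (1.0 - 1e-9), 10.0, 5000.0)
limit = limiting_altitude(1000.0, 10.0, 5000.0)
```

With these sample values, `near_limit` differs from `limit` by only a small fraction of a foot, in line with the statement that the first term of (8.37) tends to 0 as $b \to w$.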
8.19 Exercises
In Exercises 1 through 5, a particle is assumed to be moving in simple harmonie motion, according to the equation y = C sin (kx + a). The velocity of the particle is defined to be the derivative
y’. Thefrequency of the motion is the reciprocal of the period. (Period = 2n/k; frequency = k/2=.)
The frequency represents the number of cycles completed in unit time, provided k > 0.
1. Find the amplitude C if the frequency is l/n and if the initial values of y and y’ (when x = 0)
are 2 and 4, respectively.
2. Find the velocity when y is zero, given that the amplitude is 7 and the frequency is 10.
3. Show that the equation of motion cari also be written as follows:
y = A COS (mx + 8) .
Find equations that relate the constants A, m, /?, and C, k, u.
4. Find the equation of motion given that y = 3 and y’ = 0 when x = 0 and that the period is +.
5. Find the amplitude of the motion if the period is 2 n and the velocity is &v, when y = y,, .
6. A particle undergoes simple harmonie motion. Initially its displacement is 1, its velocity is 2
and its acceleration is - 12. Compute its displacement and acceleration when the velocity is 1/8.
7. For a certain positive number k, the differential equation of simple harmonie motion y” +
k2y = 0 has solutions of the form y = f(x) with f(0) = f(3) = 0 and f(x) < 0 for a11 x in
the open interval 0 < x < 3. Compute k and find a11 solutions.
8. The current I(t) at time t flowing in an electric circuit obeys the differential equation I''(t) +
I(t) = G(t), where G is a step function given by G(t) = 1 if $0 \le t \le 2\pi$, G(t) = 0 for all other t.
Determine the solution which satisfies the initial conditions I(0) = 0, I'(0) = 1.
9. The current I(t) at time t flowing in an electric circuit obeys the differential equation
    $I''(t) + R\,I'(t) + I(t) = \sin \omega t,$
where $R$ and $\omega$ are positive constants. The solution can be expressed in the form $I(t) =
F(t) + A \sin(\omega t + \alpha)$, where $F(t) \to 0$ as $t \to +\infty$, and $A$ and $\alpha$ are constants depending on
$R$ and $\omega$, with $A > 0$. If there is a value of $\omega$ which makes $A$ as large as possible, then $\omega/(2\pi)$
is called a resonance frequency of the circuit.
(a) Find all resonance frequencies when R = 1.
(b) Find those values of R for which the circuit will have a resonance frequency.
10. A spaceship is returning to earth. Assume that the only external force acting on it is the
action of gravity, and that it falls along a straight line toward the center of the earth. The
effect of gravity is partly overcome by firing a rocket directly downward. The rocket fuel is
consumed at a constant rate of k pounds per second and the exhaust material has a constant
speed of c feet per second relative to the rocket. Find a formula for the distance the spaceship
falls in time t if it starts from rest at time t = 0 with an initial weight of w pounds.
11. A rocket of initial weight w pounds starts from rest in free space (no external forces) and
moves along a straight line. The fuel is consumed at a constant rate of k pounds per second
and the products of combustion are discharged directly backward at a constant speed of c
feet per second relative to the rocket. Find the distance traveled at time t.
12. Solve Exercise 11 if the initial speed of the rocket is v0 and if the products of combustion are
fired at such a speed that the discharged material remains at rest in space.
8.20 Remarks concerning nonlinear differential equations
Since second-order linear differential equations with constant coefficients occur in such
a wide variety of scientific problems, it is indeed fortunate that we have systematic methods
for solving these equations. Many nonlinear equations also arise naturally from both
physical and geometrical problems, but there is no comprehensive theory comparable to
that for linear equations. In the introduction to this chapter we mentioned a classic “bag
of tricks” that has been developed for treating many special cases of nonlinear equations.
We conclude this chapter with a discussion of some of these tricks and some of the problems
they help to solve. We shall consider only first-order equations which can be solved for
the derivative y' and expressed in the form
(8.38)    y' = f(x, y).
We recall that a solution of (8.38) on an interval I is any function, say y = Y(x), which
is differentiable on I and satisfies the relation Y'(x) = f[x, Y(x)] for all x in I. In the linear
case, we proved an existence-uniqueness theorem which tells us that one and only one
solution exists satisfying a prescribed initial condition. Moreover, we have an explicit
formula for determining this solution.
This is not typical of the general case. A nonlinear equation may have no solution
satisfying a given initial condition, or it may have more than one. For example, the equation
$(y')^2 - xy' + y + 1 = 0$ has no solution with y = 0 when x = 0, since this would require
that $(y')^2 = -1$ when x = 0. On the other hand, the equation $y' = 3y^{2/3}$ has two distinct
solutions, $Y_1(x) = 0$ and $Y_2(x) = x^3$, satisfying the initial condition y = 0 when x = 0.
Thus, the study of nonlinear equations is more difficult because of the possible nonexistence or nonuniqueness of solutions. Also, even when solutions exist, it may not be
possible to determine them explicitly in terms of familiar functions.
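The nonuniqueness claim can be checked numerically. The following is a minimal sketch, assuming the example equation reads y' = 3y^(2/3) with candidate solutions Y1(x) = 0 and Y2(x) = x^3; the helper names `rhs`, `candidates`, and `max_residual` are illustrative and not part of the text.

```python
def rhs(y):
    """Right member 3*y**(2/3), computed as 3*(|y|**(1/3))**2 so it stays real for y < 0."""
    return 3.0 * (abs(y) ** (1.0 / 3.0)) ** 2

# Each candidate is a pair (Y, Y') with the derivative written out by hand.
candidates = {
    "Y1": (lambda x: 0.0, lambda x: 0.0),
    "Y2": (lambda x: x ** 3, lambda x: 3.0 * x ** 2),
}

def max_residual(Y, dY, points):
    """Largest value of |Y'(x) - 3*Y(x)**(2/3)| over the sample points."""
    return max(abs(dY(x) - rhs(Y(x))) for x in points)

sample = [i / 10.0 for i in range(-20, 21)]
residuals = {name: max_residual(Y, dY, sample) for name, (Y, dY) in candidates.items()}

for name, (Y, _) in candidates.items():
    print(name, "y(0) =", Y(0.0), "max residual =", residuals[name])
```

Both residuals vanish up to rounding, and both functions take the value 0 at x = 0, so the initial condition alone does not single out one solution.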
Sometimes we can eliminate the derivative y' from the differential equation and arrive at a relation of the form
    F(x, y) = 0
satisfied by some, or perhaps all, solutions. If this equation can be solved for y in terms
of x, we get an explicit formula for the solution. More often than not, however, the
equation is too complicated to solve for y. For example, in a later section we shall study
the differential equation
    $y' = \frac{y - x}{y + x},$
and we shall find that every solution necessarily satisfies the relation
(8.39)    $\frac{1}{2}\log(x^2 + y^2) + \arctan\frac{y}{x} + C = 0$
for some constant C. It would be hopeless to try to solve this equation for y in terms of x.
In a case like this, we say that the relation (8.39) is an implicit formula for the solutions. It
is common practice to say that the differential equation has been “solved” or “integrated”
when we arrive at an implicit formula such as F(x, y) = 0 in which no derivatives of the
unknown function appear. Sometimes this formula reveals useful information about the
solutions. On the other hand, the reader should realize that such an implicit relation may
be less helpful than the differential equation itself for studying properties of the solutions.
In the next section we show how qualitative information about the solutions can often
be obtained directly from the differential equation without a knowledge of explicit or
implicit formulas for the solutions.
8.21 Integral curves and direction fields
Consider a differential equation of first order, say y’ = f(x, y), and suppose some of the
solutions satisfy an implicit relation of the form
(8.40)    F(x, y, C) = 0,
where C denotes a constant. If we introduce a rectangular coordinate system and plot all
the points (x, y) whose coordinates satisfy (8.40) for a particular C, we obtain a curve called
an integral curve of the differential equation. Different values of C usually give different
integral curves, but all of them share a common geometric property. The differential
equation y' = f(x, y) relates the slope y' at each point (x, y) of the curve to the coordinates
x and y. As C takes on all its values, the collection of integral curves obtained is called a
one-parameter family of curves.
For example, when the differential equation is y' = 3, integration gives us y = 3x + C,
and the integral curves form a family of straight lines, all having slope 3. The arbitrary
constant C represents the y-intercept of these lines.
If the differential equation is y' = x, integration yields $y = \frac{1}{2}x^2 + C$, and the integral
curves form a family of parabolas as shown in Figure 8.7. Again, the constant C tells us
where the various curves cross the y-axis. Figure 8.8 illustrates the family of exponential
curves, $y = Ce^x$, which are integral curves of the differential equation y' = y. Once more,
C represents the y-intercept. In this case, C is also equal to the slope of the curve at the
point where it crosses the y-axis.
FIGURE 8.7 Integral curves of the differential equation y' = x.
FIGURE 8.8 Integral curves of the differential equation y' = y.
A family of nonparallel straight lines is shown in Figure 8.9. These are integral curves
of the differential equation
(8.41)    $y = x\,\frac{dy}{dx} - \frac{1}{4}\left(\frac{dy}{dx}\right)^2,$
and a one-parameter family of solutions is given by
(8.42)    $y = Cx - \frac{1}{4}C^2.$
FIGURE 8.9 Integral curves of the differential equation $y = x\,\frac{dy}{dx} - \frac{1}{4}\left(\frac{dy}{dx}\right)^2$.
FIGURE 8.10 A solution of Equation (8.41) that is not a member of the family in Equation (8.42).
This family is one which possesses an envelope, that is, a curve having the property that
at each of its points it is tangent to one of the members of the family.† The envelope here
is $y = x^2$ and its graph is indicated by the dotted curve in Figure 8.9. The envelope of a
family of integral curves is itself an integral curve because the slope and coordinates at a
point of the envelope are the same as those of one of the integral curves of the family. In
this example, it is easy to verify directly that $y = x^2$ is a solution of (8.41). Note that this
particular solution is not a member of the family in (8.42). Further solutions, not members
of the family, may be obtained by piecing together members of the family with portions
of the envelope. An example is shown in Figure 8.10. The tangent line at A comes from
taking C = -2 in (8.42) and the tangent at B comes from C = 1/2. The resulting solution,
y = f(x), is given as follows:
    $f(x) = \begin{cases} -2x - 1 & \text{if } x \le -1, \\ x^2 & \text{if } -1 \le x \le \frac{1}{4}, \\ \frac{1}{2}x - \frac{1}{16} & \text{if } x \ge \frac{1}{4}. \end{cases}$
† And conversely, each member of the family is tangent to the envelope.
This function has a derivative and satisfies the differential equation in (8.41) for every
real x. It is clear that an infinite number of similar examples could be constructed in the
same way. This example shows that it may not be easy to exhibit all possible solutions of
a differential equation.
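The pieced-together solution can be tested directly. The sketch below assumes that the differential equation is y = x y' - (1/4)(y')^2, that the line family is y = Cx - C^2/4 with envelope y = x^2, and that the two tangent lines used are those with C = -2 (touching at x = -1) and C = 1/2 (touching at x = 1/4). The function names are illustrative only.

```python
def f(x):
    """Tangent line C = -2, then the envelope y = x**2, then tangent line C = 1/2."""
    if x <= -1.0:
        return -2.0 * x - 1.0
    if x <= 0.25:
        return x * x
    return 0.5 * x - 1.0 / 16.0

def df(x):
    """Derivative of f; the one-sided slopes agree at x = -1 and x = 1/4."""
    if x <= -1.0:
        return -2.0
    if x <= 0.25:
        return 2.0 * x
    return 0.5

def clairaut_residual(x):
    """y - (x*y' - y'**2/4), which is zero when the equation is satisfied."""
    p = df(x)
    return f(x) - (x * p - 0.25 * p * p)

points = [i / 8.0 for i in range(-24, 25)]   # covers all three pieces and both joins
worst = max(abs(clairaut_residual(x)) for x in points)
print("largest residual:", worst)
```

Choosing other pairs of tangent lines in the same way gives further solutions, which is the sense in which infinitely many such examples exist.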
Sometimes it is possible to find a first-order differential equation satisfied by all members
of a one-parameter family of curves. We illustrate with two examples.
EXAMPLE 1. Find a first-order differential equation satisfied by all circles with center
at the origin.
Solution. A circle with center at the origin and radius C satisfies the equation
$x^2 + y^2 = C^2$. As C varies over all positive numbers, we obtain every circle with center
at the origin. To find a first-order differential equation having these circles as integral
curves, we simply differentiate the Cartesian equation to obtain 2x + 2yy' = 0. Thus,
each circle satisfies the differential equation y' = -x/y.
EXAMPLE 2. Find a first-order differential equation for the family of all circles passing
through the origin and having their centers on the x-axis.
Solution. If the center of a circle is at (C, 0) and if it passes through the origin, the
theorem of Pythagoras tells us that each point (x, y) on the circle satisfies the Cartesian
equation $(x - C)^2 + y^2 = C^2$, which can be written as
(8.43)    $x^2 + y^2 - 2Cx = 0.$
To find a differential equation having these circles as integral curves, we differentiate (8.43)
to obtain 2x + 2yy' - 2C = 0, or
(8.44)    x + yy' = C.
Since this equation contains C, it is satisfied only by that circle in (8.43) corresponding to
the same C. To obtain one differential equation satisfied by all the curves in (8.43), we
must eliminate C. We could differentiate (8.44) to obtain $1 + yy'' + (y')^2 = 0$. This is a
second-order differential equation satisfied by all the curves in (8.43). We can obtain a
first-order equation by eliminating C algebraically from (8.43) and (8.44). Substituting
x + yy' for C in (8.43), we obtain $x^2 + y^2 - 2x(x + yy') = 0$, a first-order equation which
can be solved for y' and written as $y' = (y^2 - x^2)/(2xy)$.
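As a numerical check of Example 2, the sketch below compares the slope of each circle, obtained from x + yy' = C, with the right member (y^2 - x^2)/(2xy) of the first-order equation in which C has been eliminated. The parametrization by an angle theta and the function names are assumptions made only for this illustration.

```python
import math

def point_on_circle(C, theta):
    """Point of the circle (x - C)**2 + y**2 = C**2 at parameter angle theta."""
    return C + C * math.cos(theta), C * math.sin(theta)

def slope_from_circle(C, x, y):
    """Slope y' obtained from x + y*y' = C, valid where y != 0."""
    return (C - x) / y

def slope_from_ode(x, y):
    """Right member of the eliminated equation y' = (y**2 - x**2)/(2*x*y)."""
    return (y * y - x * x) / (2.0 * x * y)

worst = 0.0
for C in (-3.0, -1.0, 0.5, 2.0):
    for theta in (0.3, 1.0, 2.0, 4.0, 5.5):   # avoids y = 0 and the origin
        x, y = point_on_circle(C, theta)
        worst = max(worst, abs(slope_from_circle(C, x, y) - slope_from_ode(x, y)))
print("largest slope mismatch:", worst)
```

The second function never refers to C, which is the point of the elimination step: one equation serves every circle of the family.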
Figure 8.11 illustrates what is called a direction field of a differential equation. This is
simply a collection of short line segments drawn tangent to the various integral curves.
The particular example shown in Figure 8.11 is a direction field of the equation y' = y.
A direction field can be constructed without solving the differential equation. Choose
a point, say (a, b), and compute the number f(a, b) obtained by substituting in the right-hand
side of the differential equation y' = f(x, y). If there is an integral curve through this point,
its slope there must be equal to f(a, b). Therefore, if we draw a short line segment through
(a, b) having this slope, it will be part of a direction field of the differential equation. By
drawing several of these line segments, we can get a fair idea of the general behavior of the
integral curves. Sometimes such qualitative information about the solution may be all
that is needed. Notice that different points (0, b) on the y-axis yield different integral
curves. This gives us a geometric reason for expecting an arbitrary constant to appear
when we integrate a first-order equation.
FIGURE 8.11 A direction field for the differential equation y' = y.
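The construction just described translates almost word for word into a short program. This is a sketch only: the function `direction_field`, the grid, and the segment half-length are hypothetical choices, and the segments are returned as endpoint pairs rather than drawn.

```python
import math

def direction_field(f, xs, ys, half_length=0.1):
    """For each grid point (a, b) return a short segment through it with slope f(a, b)."""
    segments = []
    for a in xs:
        for b in ys:
            m = f(a, b)                              # slope prescribed by y' = f(x, y)
            dx = half_length / math.sqrt(1.0 + m * m)
            dy = m * dx
            segments.append(((a - dx, b - dy), (a + dx, b + dy)))
    return segments

def segment_slope(segment):
    (x0, y0), (x1, y1) = segment
    return (y1 - y0) / (x1 - x0)

# Direction field of y' = y on a small grid, as in Figure 8.11.
grid = [-1.0, 0.0, 1.0]
field = direction_field(lambda x, y: y, grid, grid)

for seg in field[:3]:
    print(seg, "slope", round(segment_slope(seg), 6))
```

No solution of the equation is used anywhere; only the values f(a, b) are needed. Passing the endpoint pairs to any plotting routine would give a picture like Figure 8.11.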
8.22 Exercises
In Exercises 1 through 12, find a first-order differential equation having the given family of
curves as integral curves.
1. 2x + 3y = C.
2. $y = Ce^{-2x}$.
3. $x^2 - y^2 = C$.
4. xy = C.
5. $y^2 = Cx$.
6. $x^2 + y^2 + 2Cy = 1$.
7. $y = C(x - 1)e^x$.
8. $y^4(x + 2) = C(x - 2)$.
9. y = C cos x.
10. arctan y + arcsin x = C.
11. All circles through the points (1, 0) and (-1, 0).
12. All circles through the points (1, 1) and (-1, -1).
In the construction of a direction field of a differential equation, sometimes the work may be
speeded considerably if we first locate those points at which the slope y’ has a constant value C.
For each C, these points lie on a curve called an isocline.
13. Plot the isoclines corresponding to the constant slopes 1/2, 1, 3/2, and 2 for the differential equation
$y' = x^2 + y^2$. With the aid of the isoclines, construct a direction field for the equation and try
to determine the shape of the integral curve passing through the origin.
14. Show that the isoclines of the differential equation y' = x + y form a one-parameter family
of straight lines. Plot the isoclines corresponding to the constant slopes 0, ±1/2, ±1, ±3/2, ±2.
With the aid of the isoclines, construct a direction field and sketch the integral curve passing
through the origin. One of the integral curves is also an isocline; find this curve.
15. Plot a number of isoclines and construct a direction field for the equation
    $y = x\,\frac{dy}{dx} + \left(\frac{dy}{dx}\right)^2.$
If you draw the direction field carefully, you should be able to determine a one-parameter
family of solutions of this equation from the appearance of the direction field.
8.23 First-order separable equations
A first-order differential equation of the form y' = f(x, y) in which the right member
f(x, y) splits into a product of two factors, one depending on x alone and the other depending
on y alone, is said to be a separable equation. Examples are $y' = x^3$, y' = y, y' = sin y log x,
y' = x/tan y, etc. Thus each separable equation can be expressed in the form
    y' = Q(x)R(y),
where Q and R are given functions. When R(y) ≠ 0, we can divide by R(y) and rewrite
this differential equation in the form
    A(y)y' = Q(x),
where A(y) = 1/R(y). The next theorem tells us how to find an implicit formula satisfied
by every solution of such an equation.
THEOREM 8.10. Let y = Y(x) be any solution of the separable differential equation
(8.45)    A(y)y' = Q(x)
such that Y' is continuous on an open interval I. Assume that both Q and the composite
function $A \circ Y$ are continuous on I. Let G be any primitive of A, that is, any function such
that G' = A. Then the solution Y satisfies the implicit formula
(8.46)    $G(y) = \int Q(x)\,dx + C$
for some constant C. Conversely, if y satisfies (8.46) then y is a solution of (8.45).
Proof. Since Y is a solution of (8.45), we must have
(8.47)    A[Y(x)]Y'(x) = Q(x)
for each x in I. Since G' = A, this equation becomes
    G'[Y(x)]Y'(x) = Q(x).
But, by the chain rule, the left member is the derivative of the composite function $G \circ Y$.
Therefore $G \circ Y$ is a primitive of Q, which means that
(8.48)    $G[Y(x)] = \int Q(x)\,dx + C$
for some constant C. This is the relation (8.46). Conversely, if y = Y(x) satisfies (8.46),
differentiation gives us (8.47), which shows that Y is a solution of the differential equation
(8.45).
Note: The implicit formula (8.46) can also be expressed in terms of A. From (8.47)
we have
    $\int A[Y(x)]\,Y'(x)\,dx = \int Q(x)\,dx + C.$
If we make the substitution y = Y(x), dy = Y'(x) dx in the integral on the left, the
equation becomes
(8.49)    $\int A(y)\,dy = \int Q(x)\,dx + C.$
Since the indefinite integral $\int A(y)\,dy$ represents any primitive of A, Equation (8.49) is
an alternative way of writing (8.46).
In practice, formula (8.49) is obtained directly from (8.45) by a mechanical process. In
the differential equation (8.45) we write dy/dx for the derivative y' and then treat dy/dx as
a fraction to obtain the relation A(y) dy = Q(x) dx. Now we simply attach integral signs
to both sides of this equation and add the constant C to obtain (8.49). The justification for
this mechanical process is provided by Theorem 8.10. This process is another example
illustrating the effectiveness of the Leibniz notation.
EXAMPLE. The nonlinear equation $xy' + y = y^2$ is separable since it can be written in
the form
(8.50)    $\frac{y'}{y(y - 1)} = \frac{1}{x},$
provided that y(y - 1) ≠ 0 and x ≠ 0. Now the two constant functions y = 0 and y = 1
are clearly solutions of $xy' + y = y^2$. The remaining solutions, if any exist, satisfy (8.50)
and, hence, by Theorem 8.10 they also satisfy
    $\int \frac{dy}{y(y - 1)} = \int \frac{dx}{x} + K$
for some constant K. Since the integrand on the left is 1/(y - 1) - 1/y, when we integrate,
we find that
    $\log|y - 1| - \log|y| = \log|x| + K.$
This gives us $|(y - 1)/y| = |x|e^K$ or (y - 1)/y = Cx for some constant C. Solving for y,
we obtain the explicit formula
(8.51)    $y = \frac{1}{1 - Cx}.$
Theorem 8.10 tells us that for any choice of C this y is a solution; therefore, in this example
we have determined all solutions: the constant functions y = 0 and y = 1 and all the
functions defined by (8.51). Note that the choice C = 0 gives the constant solution y = 1.
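The explicit formula can be substituted back into the original equation as a check. The sketch below assumes (8.51) reads y = 1/(1 - Cx) and evaluates the residual xy' + y - y^2 at a few sample points where 1 - Cx is not zero; the function names are illustrative.

```python
def y(x, C):
    """Explicit solution y = 1/(1 - C*x)."""
    return 1.0 / (1.0 - C * x)

def dy(x, C):
    """Its derivative, C/(1 - C*x)**2."""
    return C / (1.0 - C * x) ** 2

def residual(x, C):
    """x*y' + y - y**2, which should vanish for every admissible x and C."""
    return x * dy(x, C) + y(x, C) - y(x, C) ** 2

samples = [(x, C) for C in (-2.0, 0.0, 0.5, 3.0) for x in (-1.0, 0.1, 1.0)]
worst = max(abs(residual(x, C)) for x, C in samples)
print("largest residual:", worst)
print("C = 0 gives y =", y(0.7, 0.0))
```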
8.24 Exercises
In Exercises 1 through 12, assume solutions exist and find an implicit formula satisfied by the
solutions.
1. $y' = x^3/y^2$.
2. tan x cos y = -y' tan y.
3. $(x + 1)y' + y^2 = 0$.
4. y' = (y - 1)(y - 2).
5. $y\sqrt{1 - x^2}\,y' = x$.
6. (x - 1)y' = xy.
7. $(1 - x^2)^{1/2} y' + 1 + y^2 = 0$.
8. $xy(1 + x^2)y' - (1 + y^2) = 0$.
9. $(x^2 - 4)y' = y$.
10. $xyy' = 1 + x^2 + y^2 + x^2 y^2$.
11. $yy' = e^{x + 2y} \sin x$.
12. x dx + y dy = xy(x dy - y dx).
In Exercises 13 through 16, find functions f, continuous on the whole real axis, which satisfy the
conditions given. When it is easy to enumerate all of them, do so; in any case, find as many as
you can.
13. $f(x) = 2 + \int_1^x f(t)\,dt$.
14. f(x)f'(x) = 5x, f(0) = 1.
15. $f'(x) + 2xe^{f(x)} = 0$, f(0) = 0.
16. $f^2(x) + [f'(x)]^2 = 1$. Note: f(x) = -1 is one solution.
17. A nonnegative function f, continuous on the whole real axis, has the property that its ordinate
set over an arbitrary interval has an area proportional to the length of the interval. Find f.
18. Solve Exercise 17 if the area is proportional to the difference of the function values at the endpoints of the interval.
19. Solve Exercise 18 when “difference” is replaced by “sum.”
20. Solve Exercise 18 when “difference” is replaced by “product.”
8.25 Homogeneous first-order equations
We consider now a special kind of first-order equation,
(8.52)    y' = f(x, y),
in which the right-hand side has a special property known as homogeneity. This means that
(8.53)    f(tx, ty) = f(x, y)
for all x, y, and all t ≠ 0. In other words, replacement of x by tx and y by ty has no effect
on the value of f(x, y). Equations of the form (8.52) which have this property are called
homogeneous (sometimes called homogeneous of degree zero). Examples are the following:
    $y' = \frac{y - x}{y + x}, \qquad y' = \left(\frac{x^2 + y^2}{xy}\right)^3, \qquad y' = \frac{x}{y}\sin\frac{x^2 + y^2}{x^2 - y^2}, \qquad y' = \log x - \log y.$
If we use (8.53) with t = 1/x, the differential equation in (8.52) becomes
(8.54)    $y' = f\!\left(1, \frac{y}{x}\right).$
The appearance of the quotient y/x on the right suggests that we introduce a new unknown
function v where v = y/x. Then y = vx, y' = v'x + v, and this substitution transforms
(8.54) into
    $v'x + v = f(1, v) \qquad\text{or}\qquad x\,\frac{dv}{dx} = f(1, v) - v.$
This last equation is a first-order separable equation for v. We may use Theorem 8.10 to
obtain an implicit formula for v and then replace v by y/x to obtain an implicit formula
for y.
EXAMPLE. Solve the differential equation y' = (y - x)/(y + x).
Solution. We rewrite the equation as follows:
    $y' = \frac{y/x - 1}{y/x + 1}.$
The substitution v = y/x transforms this into
    $x\,\frac{dv}{dx} = \frac{v - 1}{v + 1} - v = -\frac{1 + v^2}{v + 1}.$
Applying Theorem 8.10, we get
    $\int \frac{v}{1 + v^2}\,dv + \int \frac{1}{1 + v^2}\,dv = -\int \frac{dx}{x} + C.$
Integration yields
    $\frac{1}{2}\log(1 + v^2) + \arctan v = -\log|x| + C.$
Replacing v by y/x, we have
    $\frac{1}{2}\log(x^2 + y^2) - \frac{1}{2}\log x^2 + \arctan\frac{y}{x} = -\log|x| + C,$
and since $\log x^2 = 2\log|x|$, this simplifies to
    $\frac{1}{2}\log(x^2 + y^2) + \arctan\frac{y}{x} = C.$
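Because the result is only an implicit formula, one way to test it is to integrate the equation numerically and watch whether (1/2) log(x^2 + y^2) + arctan(y/x) stays constant along the computed solution. The sketch below uses a classical fourth-order Runge-Kutta step, which is not a method from the text, together with an arbitrarily chosen starting point (1, 1) and interval [1, 1.5] on which x > 0 and x + y > 0.

```python
import math

def f(x, y):
    """Right member of y' = (y - x)/(y + x)."""
    return (y - x) / (y + x)

def invariant(x, y):
    """Left member of the implicit formula; it should be constant along a solution."""
    return 0.5 * math.log(x * x + y * y) + math.atan(y / x)

def rk4(f, x0, y0, x1, steps):
    """Integrate y' = f(x, y) from x0 to x1 with the classical Runge-Kutta scheme."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2.0, y + h * k1 / 2.0)
        k3 = f(x + h / 2.0, y + h * k2 / 2.0)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return x, y

x_end, y_end = rk4(f, 1.0, 1.0, 1.5, 1000)
drift = invariant(x_end, y_end) - invariant(1.0, 1.0)
print("change in the implicit expression:", drift)
```

The drift is at the level of the integration error, which is consistent with the implicit formula even though no explicit expression for y is available.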
There are some interesting geometric properties possessed by the solutions of a homogeneous equation y' = f(x, y). First of all, it is easy to show that straight lines through the
origin are isoclines of the equation. We recall that an isocline of y' = f(x, y) is a curve
along which the slope y' is constant. This property is illustrated in Figure 8.12 which
shows a direction field of the differential equation y' = -2y/x. The isocline corresponding
to slope c has the equation -2y/x = c, or $y = -\frac{1}{2}cx$, and is therefore a line of slope $-\frac{1}{2}c$
through the origin. To prove the property in general, consider a line of slope m through
the origin. Then y = mx for all (x, y) on this line; in particular, the point (1, m) is on the
line. Suppose now, for the sake of simplicity, that there is an integral curve through each
point of the line y = mx. The slope of the integral curve through a point (a, b) on this
line is f(a, b) = f(a, ma). If a ≠ 0, we may use the homogeneity property in (8.53) to
write f(a, ma) = f(1, m). In other words, if (a, b) ≠ (0, 0), the integral curve through
(a, b) has the same slope as the integral curve through (1, m). Therefore the line y = mx
is an isocline, as asserted. (It can also be shown that these are the only isoclines of a
homogeneous equation.)
FIGURE 8.12 A direction field for the differential equation y' = -2y/x. The isoclines are straight lines through the origin.
This property of the isoclines suggests a property of the integral curves known as
invariance under similarity transformations. We recall that a similarity transformation
carries a set S into a new set kS obtained by multiplying the coordinates of each point
of S by a constant factor k > 0. Every line through the origin remains fixed under a
similarity transformation. Therefore, the isoclines of a homogeneous equation do not
change under a similarity transformation; hence the appearance of the direction field
does not change either. This suggests that similarity transformations carry integral curves
into integral curves. To prove this analytically, let us assume that S is an integral curve
described by an explicit formula of the form
(8.55)    y = F(x).
To say that S is an integral curve of y' = f(x, y) means that we have
(8.56)    F'(x) = f(x, F(x))
for all x under consideration. Now choose any point (x, y) on kS. Then the point (x/k, y/k)
lies on S and hence its coordinates satisfy (8.55), so we have y/k = F(x/k) or y = kF(x/k).
In other words, the curve kS is described by the equation y = G(x), where G(x) = kF(x/k).
Note that the derivative of G is given by
    $G'(x) = kF'\!\left(\frac{x}{k}\right)\cdot\frac{1}{k} = F'\!\left(\frac{x}{k}\right).$
To prove that kS is an integral curve of y' = f(x, y) it will suffice to show that G'(x) =
f(x, G(x)) or, what is the same thing, that
(8.57)    $F'\!\left(\frac{x}{k}\right) = f\!\left(x, kF\!\left(\frac{x}{k}\right)\right).$
But if we replace x by x/k in Equation (8.56) and then use the homogeneity property with
t = k, we obtain
    $F'\!\left(\frac{x}{k}\right) = f\!\left(\frac{x}{k}, F\!\left(\frac{x}{k}\right)\right) = f\!\left(x, kF\!\left(\frac{x}{k}\right)\right),$
and this proves (8.57). In other words, we have shown that kS is an integral curve whenever
S is. A simple example in which this geometric property is quite obvious is the homogeneous
equation y' = -x/y whose integral curves form a one-parameter family of concentric
circles given by the equation $x^2 + y^2 = C$.
It can also be shown that if the integral curves of a first-order equation y' = f(x, y) are
invariant under similarity transformations, then the differential equation is necessarily
homogeneous.
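The invariance argument can be illustrated on the homogeneous equation y' = -2y/x of Figure 8.12. The sketch below assumes the particular integral curve F(x) = 1/x^2, chosen here only for illustration, forms G(x) = kF(x/k) for several factors k > 0, and checks G'(x) = f(x, G(x)) with a central difference quotient.

```python
def f(x, y):
    """Right member of the homogeneous equation y' = -2*y/x."""
    return -2.0 * y / x

def F(x):
    """One integral curve S: F'(x) = -2/x**3 = f(x, F(x))."""
    return 1.0 / (x * x)

def scaled(F, k):
    """The curve kS, described by G(x) = k*F(x/k)."""
    return lambda x: k * F(x / k)

def derivative(g, x, h=1e-5):
    """Central difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

def residual(g, x):
    """g'(x) - f(x, g(x)); approximately zero when g is an integral curve."""
    return derivative(g, x) - f(x, g(x))

worst = 0.0
for k in (0.5, 2.0, 3.0):
    G = scaled(F, k)
    for x in (1.0, 2.0, 3.0):
        worst = max(worst, abs(residual(G, x)))
print("largest residual over the scaled curves:", worst)
```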
8.26 Exercises
1. Show that the substitution y = x/v transforms a homogeneous equation y' = f(x, y) into a
first-order equation for v which is separable. Sometimes this substitution leads to integrals
that are easier to evaluate than those obtained by the substitution y = xv discussed in the text.
Integrate the differential equations in Exercises 2 through 11.
2. y' = T.
3. $y' = 1 + \frac{y}{x}$.
4. $y' = \frac{x^2 + 2y^2}{xy}$.
5. $(2y^2 - x^2)y' + 3xy = 0$.
6. $xy' = y - \sqrt{x^2 + y^2}$.
7. $x^2 y' + xy + 2y^2 = 0$.
8. $y^2 + (x^2 - xy + y^2)y' = 0$.
9. $y' = \frac{y(x^2 + xy + y^2)}{x(x^2 + 3xy + y^2)}$.
10. $y' = \frac{y}{x} + \sin\frac{y}{x}$.
11. x(y + 4x)y' + y(x + 4y) = 0.
8.27 Some geometrical and physical problems leading to first-order equations
We discuss next some examples of geometrical and physical problems that lead to
first-order differential equations that are either separable or homogeneous.
Orthogonal trajectories. Two curves are said to intersect orthogonally at a point if their
tangent lines are perpendicular at that point. A curve which intersects every member of a
family of curves orthogonally is called an orthogonal trajectory for the family. Figure 8.13
shows some examples. Problems involving orthogonal trajectories are of importance in
both pure and applied mathematics. For example, in the theory of fluid flow, two orthogonal
families of curves are called the equipotential lines and the stream lines, respectively. In the
theory of heat, they are known as isothermal lines and lines of flow.
Suppose a given family of curves satisfies a first-order differential equation, say
(8.58)    y' = f(x, y).
The number f(x, y) is the slope of an integral curve passing through (x, y). The slope of
each orthogonal trajectory through this point is the negative reciprocal -1/f(x, y), so the
orthogonal trajectories satisfy the differential equation
(8.59)    $y' = -\frac{1}{f(x, y)}.$
If (8.58) is separable, then (8.59) is also separable. If (8.58) is homogeneous, then (8.59) is
also homogeneous.
EXAMPLE 1. Find the orthogonal trajectories of the family of all circles through the origin
with their centers on the x-axis.
Solution. In Example 2 of Section 8.21 we found that this family is given by the
Cartesian equation $x^2 + y^2 - 2Cx = 0$ and that it satisfies the differential equation
$y' = (y^2 - x^2)/(2xy)$. Replacing the right member by its negative reciprocal, we find that
the orthogonal trajectories satisfy the differential equation
    $y' = \frac{2xy}{x^2 - y^2}.$
This homogeneous equation may be integrated by the substitution y = vx, and it leads to
the family of integral curves
    $x^2 + y^2 - 2Cy = 0.$
This is a family of circles passing through the origin and having their centers on the y-axis.
Examples are shown in Figure 8.13.
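Orthogonality of the two families can be confirmed at an actual point of intersection. In the sketch below the circle centered at (C, 0) and the circle centered at (0, D) meet at the origin and at one further point; the closed form used for that point and the slope formulas come from differentiating the two Cartesian equations and are assumptions of this illustration, as are the function names.

```python
def intersection(C, D):
    """Point other than the origin on both x^2 + y^2 - 2Cx = 0 and x^2 + y^2 - 2Dy = 0."""
    s = C * C + D * D
    return 2.0 * C * D * D / s, 2.0 * C * C * D / s

def slope_first(C, x, y):
    """Slope of the circle centered at (C, 0), from x + y*y' = C."""
    return (C - x) / y

def slope_second(D, x, y):
    """Slope of the circle centered at (0, D), from x + (y - D)*y' = 0."""
    return x / (D - y)

products = []
for C, D in ((1.0, 2.0), (3.0, 1.0), (-2.0, 0.5), (1.5, -4.0)):   # |C| != |D| keeps both slopes finite
    x, y = intersection(C, D)
    products.append(slope_first(C, x, y) * slope_second(D, x, y))
print("products of slopes:", products)
```

A product of slopes equal to -1 at each intersection is exactly the perpendicularity of the tangent lines required of orthogonal trajectories.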
Pursuit problems. A point Q is constrained to move along a prescribed plane curve $C_1$.
Another point P in the same plane “pursues” the point Q. That is, P moves in such a
manner that its direction of motion is always toward Q. The point P thereby traces out
another curve $C_2$ called a curve of pursuit. An example is shown in Figure 8.14 where $C_1$ is
the y-axis. In a typical problem of pursuit we seek to determine the curve $C_2$ when the
curve $C_1$ is known and some additional piece of information is given concerning P and Q,
for example, a relation between their positions or their velocities.
FIGURE 8.13 Orthogonal circles.
FIGURE 8.14 The tractrix as a curve of pursuit. The distance from P to Q is constant.
When we say that the direction of motion of P is always toward Q, we mean that the
tangent line of $C_2$ through P passes through Q. Therefore, if we denote by (x, y) the
rectangular coordinates of P at a given instant, and by (X, Y) those of Q at the same
instant, we must have
(8.60)    $y' = \frac{Y - y}{X - x}.$
The additional piece of information usually enables us to consider X and Y as known
functions of x and y, in which case Equation (8.60) becomes a first-order differential
equation for y. Now we consider a specific example in which this equation is separable.
EXAMPLE 2. A point Q moves on a straight line $C_1$, and a point P pursues Q in such a
way that the distance from P to Q has a constant value k > 0. If P is initially not on $C_1$,
find the curve of pursuit.
Solution. We take $C_1$ to be the y-axis and place P initially at the point (k, 0). Since
the distance from P to Q is k, we must have $(X - x)^2 + (Y - y)^2 = k^2$. But X = 0 on
$C_1$, so we have $Y - y = \sqrt{k^2 - x^2}$, and the differential equation (8.60) becomes
    $y' = \frac{\sqrt{k^2 - x^2}}{-x}.$
Integrating this equation with the help of the substitution x = k cos t and using the fact
that y = 0 when x = k, we obtain the relation
    $y = k \log\frac{k + \sqrt{k^2 - x^2}}{x} - \sqrt{k^2 - x^2}.$
The curve of pursuit in this example is called a tractrix; it is shown in Figure 8.14.
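The tractrix formula can be checked against the differential equation it came from. This sketch assumes the solution reads y = k log((k + sqrt(k^2 - x^2))/x) - sqrt(k^2 - x^2) and compares a central difference quotient of it with -sqrt(k^2 - x^2)/x at a few points of 0 < x < k. The value k = 2 and the sample points are arbitrary choices.

```python
import math

def tractrix(x, k):
    """Curve of pursuit for 0 < x <= k, with y = 0 at x = k."""
    s = math.sqrt(k * k - x * x)
    return k * math.log((k + s) / x) - s

def slope_from_ode(x, k):
    """Right member of y' = sqrt(k**2 - x**2)/(-x)."""
    return -math.sqrt(k * k - x * x) / x

def derivative(g, x, h=1e-6):
    """Central difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

k = 2.0
worst = max(
    abs(derivative(lambda t: tractrix(t, k), x) - slope_from_ode(x, k))
    for x in (0.5, 1.0, 1.5)
)
print("initial value y(k):", tractrix(k, k))
print("largest slope mismatch:", worst)
```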
Flow of fluid through an orifice. Suppose we are given a tank (not necessarily cylindrical)
containing a fluid. The fluid flows from the tank through a sharp-edged orifice. If there
were no friction (and hence no loss of energy) the speed of the jet would be equal to $\sqrt{2gy}$
feet per second, where y denotes the height (in feet) of the surface above the orifice.† (See
Figure 8.15.) If $A_0$ denotes the area (in square feet) of the orifice, then $A_0\sqrt{2gy}$ represents
the number of cubic feet per second of fluid flowing from the orifice. Because of friction,
the jet stream contracts somewhat and the actual rate of discharge is more nearly $cA_0\sqrt{2gy}$,
where c is an experimentally determined number called the discharge coefficient. For
ordinary sharp-edged orifices, the approximate value of c is 0.60. Using this and taking
g = 32, we find that the speed of the jet is $4.8\sqrt{y}$ feet per second, and therefore the rate of
discharge of volume is $4.8A_0\sqrt{y}$ cubic feet per second.
Let V(y) denote the volume of the fluid in the tank when the height of the fluid is y. If
the cross-sectional area of the tank at the height u is A(u), then we have $V(y) = \int_0^y A(u)\,du$,
from which we obtain dV/dy = A(y). The argument in the foregoing paragraph tells us
that the rate of change of volume with respect to time is $dV/dt = -4.8A_0\sqrt{y}$ cubic feet per
second, the minus sign coming in because the volume is decreasing. By the chain rule we
have
    $\frac{dV}{dt} = \frac{dV}{dy}\,\frac{dy}{dt} = A(y)\,\frac{dy}{dt}.$
Combining this with the equation $dV/dt = -4.8A_0\sqrt{y}$, we obtain the differential equation
    $A(y)\,\frac{dy}{dt} = -4.8A_0\sqrt{y}.$
† If a particle of mass m falls freely through a distance y and reaches a speed v, its kinetic energy $\frac{1}{2}mv^2$
must be equal to the potential energy mgy (the work done in lifting it up a distance y). Solving for v, we
get $v = \sqrt{2gy}$.
This separable differential equation is used as the mathematical model for problems
concerning fluid flow through an orifice. The height y of the surface is related to the time
t by an equation of the form
(8.61)    $\int \frac{A(y)}{\sqrt{y}}\,dy = -4.8A_0 \int dt + C.$
FIGURE 8.15 Flow of fluid through an orifice.
EXAMPLE 3. Consider a specific case in which the cross-sectional area of the tank is
constant, say A(y) = A for all y, and suppose the level of the fluid is lowered from 10 feet
to 9 feet in 10 minutes (600 seconds). These data can be combined with Equation (8.61)
to give us
    $-\int_{10}^{9} \frac{dy}{\sqrt{y}} = k \int_0^{600} dt,$
where $k = 4.8A_0/A$. Using this, we can determine k and we find that
    $2\sqrt{10} - 2\sqrt{9} = 600k \qquad\text{or}\qquad k = \frac{\sqrt{10} - 3}{300}.$
Now we can compute the time required for the level to fall from one given value to any
other. For example, if at time $t_1$ the level is 7 feet and at time $t_2$ it is 1 foot ($t_1$, $t_2$ measured
in minutes, say), then we must have
    $-\int_7^1 \frac{dy}{\sqrt{y}} = k \int_{60t_1}^{60t_2} dt,$
which yields
    $t_2 - t_1 = \frac{2(\sqrt{7} - 1)}{60k} = \frac{10(\sqrt{7} - 1)}{\sqrt{10} - 3} = \frac{10(\sqrt{7} - 1)(\sqrt{10} + 3)}{10 - 9} = (10)(1.646)(6.162) \approx 101.4 \text{ min}.$
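The arithmetic of Example 3 is easy to reproduce. The sketch below assumes a constant cross-section, so that integrating dy/sqrt(y) between two levels gives 2(sqrt(y_high) - sqrt(y_low)) = k * (elapsed seconds); the helper names `fit_k` and `minutes_to_fall` are hypothetical.

```python
import math

def fit_k(y_high, y_low, seconds):
    """Constant k = 4.8*A0/A determined from one observed drop in level."""
    return 2.0 * (math.sqrt(y_high) - math.sqrt(y_low)) / seconds

def minutes_to_fall(y_high, y_low, k):
    """Time in minutes for the level to fall from y_high to y_low feet."""
    seconds = 2.0 * (math.sqrt(y_high) - math.sqrt(y_low)) / k
    return seconds / 60.0

k = fit_k(10.0, 9.0, 600.0)             # 10 ft to 9 ft in 600 seconds
minutes = minutes_to_fall(7.0, 1.0, k)  # 7 ft down to 1 ft

print("k =", k)
print("minutes from 7 ft to 1 ft:", round(minutes, 1))
```

The same two functions answer any similar question for this tank, since k is the only quantity that depends on the orifice and the cross-section.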
8.28 Miscellaneous review exercises
In each of Exercises 1 through 10 find the orthogonal trajectories of the given family of curves.
1. 2x + 3y = C.
2. xy = C.
3. $x^2 + y^2 + 2Cy = 1$.
4. $y^2 = Cx$.
5. $x^2 y = C$.
6. $y = Ce^{-2x}$.
7. $x^2 - y^2 = C$.
8. y = C cos x.
9. All circles through the points (1, 0) and (-1, 0).
10. All circles through the points (1, 1) and (-1, -1).
11. A point Q moves upward along the positive y-axis. A point P, initially at (1, 0), pursues Q
in such a way that its distance from the y-axis is 1/2 the distance of Q from the origin. Find a
Cartesian equation for the path of pursuit.
12. Solve Exercise 11 when the fraction 1/2 is replaced by an arbitrary positive number k.
13. A curve with Cartesian equation y = f(x) passes through the origin. Lines drawn parallel
to the coordinate axes through an arbitrary point of the curve form a rectangle with two sides
on the axes. The curve divides every such rectangle into two regions A and B, one of which
has an area equal to n times the other. Find the function f.
14. Solve Exercise 13 if the two regions A and B have the property that, when rotated about the
x-axis, they sweep out solids one of which has a volume n times that of the other.
15. The graph of a nonnegative differentiable function f passes through the origin and through
the point $(1, 2/\pi)$. If, for every x > 0, the ordinate set of f above the interval [0, x] sweeps
out a solid of volume $x^2 f(x)$ when rotated about the x-axis, find the function f.
16. A nonnegative differentiable function f is defined on the closed interval [0, 1] with f(1) = 0.
For each a, 0 < a < 1, the line x = a cuts the ordinate set of f into two regions having areas
A and B, respectively, A being the area of the leftmost region. If A - B = 2f(a) + 3a + b,
where b is a constant independent of a, find the function f and the constant b.
17. The graph of a function f passes through the two points $P_0 = (0, 1)$ and $P_1 = (1, 0)$. For every
point P = (x, y) on the graph, the curve lies above the chord $P_0P$, and the area A(x) of the
region between the curve and the chord $P_0P$ is equal to $x^3$. Determine the function f.
18. A tank with vertical sides has a square cross-section of area 4 square feet. Water is leaving the
tank through an orifice of area 5/3 square inches. If the water level is initially 2 feet above
the orifice, find the time required for the level to drop 1 foot.
19. Refer to the preceding problem. If water also flows into the tank at the rate of 100 cubic inches
per second, show that the water level approaches the value $(25/24)^2$ feet above the orifice,
regardless of the initial water level.
20. A tank has the shape of a right circular cone with its vertex up. Find the time required to
empty a liquid from the tank through an orifice in its base. Express your result in terms of the
dimensions of the cone and the area $A_0$ of the orifice.
21. The equation $xy'' - y' + (1 - x)y = 0$ possesses a solution of the form $y = e^{mx}$, where m
is constant. Determine this solution explicitly.
22. Solve the differential equation $(x + y^3) + 6xy^2 y' = 0$ by making a suitable change of variable
which converts it into a linear equation.
23. Solve the differential equation $(1 + y^2 e^{2x})y' + y = 0$ by introducing a change of variable of
the form $y = ue^{mx}$, where m is constant and u is a new unknown function.
24. (a) Given a function f which satisfies the relations
$$ 2f'(x) = f\left(\frac{1}{x}\right) \quad\text{if } x > 0, \qquad f(1) = 2 , $$
let y = f(x) and show that y satisfies a differential equation of the form
$$ x^2 y'' + axy' + by = 0 , $$
where a and b are constants. Determine a and b.
(b) Find a solution of the form $f(x) = Cx^n$.
25. (a) Let u be a nonzero solution of the second-order equation
$$ y'' + P(x)y' + Q(x)y = 0 . $$
Show that the substitution y = uv converts the equation
$$ y'' + P(x)y' + Q(x)y = R(x) $$
into a first-order linear equation for v'.
(b) Obtain a nonzero solution of the equation $y'' - 4y' + x^2(y' - 4y) = 0$ by inspection
and use the method of part (a) to find a solution of
$$ y'' - 4y' + x^2(y' - 4y) = 2xe^{-x^3/3} $$
such that y = 0 and y' = 4 when x = 0.
26. Scientists at the Ajax Atomics Works isolated one gram of a new radioactive element called
Deteriorum. It was found to decay at a rate proportional to the square of the amount present.
After one year, 1/2 gram remained.
(a) Set up and solve the differential equation for the mass of Deteriorum remaining at time t.
(b) Evaluate the decay constant in units of $\text{gm}^{-1}\,\text{yr}^{-1}$.
27. In the preceding problem, suppose the word square were replaced by square root, the other
data remaining the same. Show that in this case the substance would decay entirely within
a finite time, and find this time.
28. At the beginning of the Gold Rush, the population of Coyote Gulch, Arizona was 365. From
then on, the population would have grown by a factor of e each year, except for the high rate
of "accidental" death, amounting to one victim per day among every 100 citizens. By solving
an appropriate differential equation determine, as functions of time, (a) the actual population of
Coyote Gulch t years from the day the Gold Rush began, and (b) the cumulative number of
fatalities.
29. With what speed should a rocket be fired upward so that it never returns to earth? (Neglect
all forces except the earth's gravitational attraction.)
30. Let y = f(x) be that solution of the differential equation
$$ y' = \frac{2y^2 + x}{3y^2 + 5} $$
which satisfies the initial condition f(0) = 0. (Do not attempt to solve this differential equation.)
(a) The differential equation shows that f'(0) = 0. Discuss whether f has a relative maximum
or minimum or neither at 0.
(b) Notice that $f'(x) \ge 0$ for each $x \ge 0$ and that $f'(x) \ge \frac{2}{3}$ for each $x \ge \frac{10}{3}$. Exhibit
two positive numbers a and b such that f(x) > ax - b for each $x \ge \frac{10}{3}$.
(c) Show that $x/y^2 \to 0$ as $x \to +\infty$. Give full details of your reasoning.
(d) Show that y/x tends to a finite limit as $x \to +\infty$ and determine this limit.
31. Given a function f which satisfies the differential equation
$$ xf''(x) + 3x[f'(x)]^2 = 1 - e^{-x} $$
for all real x. (Do not attempt to solve this differential equation.)
(a) If f has an extremum at a point $c \ne 0$, show that this extremum is a minimum.
(b) If f has an extremum at 0, is it a maximum or a minimum? Justify your conclusion.
(c) If f(0) = f'(0) = 0, find the smallest constant A such that $f(x) \le Ax^2$ for all $x \ge 0$.
9
COMPLEX NUMBERS
9.1 Historical introduction
The quadratic equation $x^2 + 1 = 0$ has no solution in the real-number system because
there is no real number whose square is -1. New types of numbers, called complex numbers,
have been introduced to provide solutions to such equations. In this brief chapter we
discuss complex numbers and show that they are important in solving algebraic equations
and that they have an impact on differential and integral calculus.
As early as the 16th Century, a symbol $\sqrt{-1}$ was introduced to provide solutions of the
quadratic equation $x^2 + 1 = 0$. This symbol, later denoted by the letter i, was regarded
as a fictitious or imaginary number which could be manipulated algebraically like an
ordinary real number, except that its square was -1. Thus, for example, the quadratic
polynomial $x^2 + 1$ was factored by writing $x^2 + 1 = x^2 - i^2 = (x - i)(x + i)$, and the
solutions of the equation $x^2 + 1 = 0$ were exhibited as $x = \pm i$, without any concern
regarding the meaning or validity of such formulas. Expressions such as 2 + 3i were
called complex numbers, and they were used in a purely formal way for nearly 300 years
before they were described in a manner that would be considered satisfactory by present-day
standards.
Early in the 19th Century, Karl Friedrich Gauss (1777-1855) and William Rowan
Hamilton (1805-1865) independently and almost simultaneously proposed the idea of
defining complex numbers as ordered pairs (a, b) of real numbers endowed with certain
special properties. This idea is widely accepted today and is described in the next section.
9.2 Definitions and field properties
DEFINITION. If a and b are real numbers, the pair (a, b) is called a complex number,
provided that equality, addition, and multiplication of pairs are defined as follows:
(a) Equality: (a, b) = (c, d) means a = c and b = d.
(b) Sum: (a, b) + (c, d) = (a + c, b + d).
(c) Product: (a, b)(c, d) = (ac - bd, ad + bc).
The definition of equality tells us that the pair (a, b) is to be regarded as an ordered pair.
Thus, the complex number (2, 3) is not equal to the complex number (3, 2). The numbers
a and b are called components of (a, b). The first component, a, is also called the real part
of the complex number; the second component, b, is called the imaginary part.
Note that the symbol $i = \sqrt{-1}$ does not appear anywhere in this definition. Presently
we shall introduce i as a particular complex number which has all the algebraic properties
ascribed to the fictitious symbol $\sqrt{-1}$ by the early mathematicians. However, before we
do this, we will discuss the basic properties of the operations just defined.
THEOREM 9.1. The operations of addition and multiplication of complex numbers satisfy
the commutative, associative and distributive laws. That is, if x, y, and z are arbitrary complex
numbers, we have the following.
Commutative laws: x + y = y + x, xy = yx.
Associative laws: x + (y + z) = (x + y) + z, x(yz) = (xy)z.
Distributive law: x(y + z) = xy + xz.
Proof. All these laws are easily verified directly from the definition of sum and product.
For example, to prove the associative law for multiplication, we write $x = (x_1, x_2)$,
$y = (y_1, y_2)$, $z = (z_1, z_2)$ and note that
$$ \begin{aligned}
x(yz) &= (x_1, x_2)(y_1 z_1 - y_2 z_2, \; y_1 z_2 + y_2 z_1) \\
&= (x_1(y_1 z_1 - y_2 z_2) - x_2(y_1 z_2 + y_2 z_1), \; x_1(y_1 z_2 + y_2 z_1) + x_2(y_1 z_1 - y_2 z_2)) \\
&= ((x_1 y_1 - x_2 y_2) z_1 - (x_1 y_2 + x_2 y_1) z_2, \; (x_1 y_2 + x_2 y_1) z_1 + (x_1 y_1 - x_2 y_2) z_2) \\
&= (x_1 y_1 - x_2 y_2, \; x_1 y_2 + x_2 y_1)(z_1, z_2) = (xy)z .
\end{aligned} $$
The commutative and distributive laws may be similarly proved.
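The laws of Theorem 9.1 can also be spot-checked by machine. The sketch below is only an illustration, not a proof: the function names `add` and `mul` are hypothetical, pairs are modeled as Python tuples, and exact rational components are used so that equality of pairs is tested without rounding error.

```python
from fractions import Fraction
from itertools import product

# Complex numbers modeled as ordered pairs (a, b); add and mul are
# hypothetical names implementing the sum and product of the definition.

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

# A small grid of sample pairs with exact rational components.
samples = [(Fraction(p), Fraction(q)) for p, q in product([-2, 0, 1, 3], repeat=2)]

# Test the commutative, associative, and distributive laws on every triple.
laws_hold = all(
    add(x, y) == add(y, x)
    and mul(x, y) == mul(y, x)
    and add(x, add(y, z)) == add(add(x, y), z)
    and mul(x, mul(y, z)) == mul(mul(x, y), z)
    and mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
    for x, y, z in product(samples, repeat=3)
)
print(laws_hold)
```

A finite check of this kind can only catch a mistake in the formulas; the general statement still rests on the algebraic verification given in the proof.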
Theorem 9.1 shows that the set of all complex numbers satisfies the first three field
axioms for the real number system, as given in Section I 3.2. Now we will show that
Axioms 4, 5, and 6 are also satisfied.
Since (0, 0) + (a, b) = (a, b) for all complex numbers (a, b), the complex number (0, 0)
is an identity element for addition. It is called the zero complex number. Similarly, the
complex number (1, 0) is an identity for multiplication because
$$ (a, b)(1, 0) = (a, b) $$
for all (a, b). Thus, Axiom 4 is satisfied with (0, 0) as the identity for addition and (1, 0)
as the identity for multiplication.
To verify Axiom 5, we simply note that (-a, -b) + (a, b) = (0, 0), so (-a, -b) is the
negative of (a, b). We write -(a, b) for (-a, -b).
Finally, we show that each nonzero complex number has a reciprocal relative to the
identity element (1, 0). That is, if $(a, b) \ne (0, 0)$, there is a complex number (c, d) such that
$$ (a, b)(c, d) = (1, 0) . $$
In fact, this equation is equivalent to the pair of equations
$$ ac - bd = 1 , \qquad ad + bc = 0 , $$
which has the unique solution
$$ c = \frac{a}{a^2 + b^2} , \qquad d = \frac{-b}{a^2 + b^2} . \tag{9.1} $$
The condition $(a, b) \ne (0, 0)$ ensures that $a^2 + b^2 \ne 0$, so the reciprocal is well defined.
We write $(a, b)^{-1}$ or 1/(a, b) for the reciprocal of (a, b). Thus, we have
$$ \frac{1}{(a, b)} = \left( \frac{a}{a^2 + b^2} , \; \frac{-b}{a^2 + b^2} \right) \quad\text{if } (a, b) \ne (0, 0) . \tag{9.2} $$
The foregoing discussion shows that the set of all complex numbers satisfies the six
field axioms for the real-number system. Therefore, all the laws of algebra deducible from
the field axioms also hold for complex numbers. In particular, Theorems I.1 through I.15
of Section I 3.2 are all valid for complex numbers as well as for real numbers. Theorem
I.8 tells us that quotients of complex numbers exist. That is, if (a, b) and (c, d) are two
complex numbers with $(a, b) \ne (0, 0)$, then there is exactly one complex number (x, y)
such that (a, b)(x, y) = (c, d). In fact, we have $(x, y) = (c, d)(a, b)^{-1}$.
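Formula (9.2) and the description of quotients translate directly into a short computation. The following Python sketch is illustrative only; `mul`, `reciprocal`, and `divide` are hypothetical names, and exact rational components are assumed so that the identities can be tested with equality.

```python
from fractions import Fraction

# Ordered-pair arithmetic; all function names here are hypothetical.

def mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

def reciprocal(x):
    """Reciprocal of a nonzero pair (a, b), following formula (9.2)."""
    a, b = x
    n = a * a + b * b          # a^2 + b^2 is nonzero unless (a, b) = (0, 0)
    if n == 0:
        raise ZeroDivisionError("(0, 0) has no reciprocal")
    return (a / n, -b / n)

def divide(num, den):
    """The unique pair q with den * q = num, computed as num * den^(-1)."""
    return mul(num, reciprocal(den))

z = (Fraction(3), Fraction(4))
w = (Fraction(1), Fraction(-2))
q = divide(w, z)
print(reciprocal(z), q)
```

Multiplying z by the computed quotient q returns w, which is the defining property of the quotient stated above.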
9.3 The complex numbers as an extension of the real numbers
Let C denote the set of all complex numbers. Consider the subset $C_0$ of C consisting of
all complex numbers of the form (a, 0), that is, all complex numbers with zero imaginary
part. The sum or product of two members of $C_0$ is again in $C_0$. In fact, we have
$$ (a, 0) + (b, 0) = (a + b, 0) \quad\text{and}\quad (a, 0)(b, 0) = (ab, 0) . \tag{9.3} $$
This shows that we can add or multiply two numbers in $C_0$ by adding or multiplying the
real parts alone. Or, in other words, with respect to addition and multiplication, the
numbers in $C_0$ act exactly as though they were real numbers. The same is true for
subtraction and division, since -(a, 0) = (-a, 0) and $(b, 0)^{-1} = (b^{-1}, 0)$ if $b \ne 0$. For this
reason, we ordinarily make no distinction between the real number x and the complex
number (x, 0) whose real part is x; we agree to identify x and (x, 0), and we write x = (x, 0).
In particular, we write 0 = (0, 0), 1 = (1, 0), -1 = (-1, 0), and so on. Thus, we can
think of the complex number system as an extension of the real number system.
The relation between $C_0$ and the real-number system can be described in a slightly
different way. Let R denote the set of all real numbers, and let f denote the function which
maps each real number x onto the complex number (x, 0). That is, if $x \in R$, let
$$ f(x) = (x, 0) . $$
The function f so defined has domain R and range $C_0$, and it maps distinct elements of R
onto distinct elements of $C_0$. Because of these properties, f is said to establish a one-to-one
correspondence between R and $C_0$. The operations of addition and multiplication are
preserved under this correspondence. That is, we have
$$ f(a + b) = f(a) + f(b) \quad\text{and}\quad f(ab) = f(a)f(b) , $$
these equations being merely a restatement of (9.3). Since R satisfies the six field axioms,
the same is true of $C_0$. The two fields R and $C_0$ are said to be isomorphic; the function f
which relates them as described above is called an isomorphism. As far as the algebraic
operations of addition and multiplication are concerned, we make no distinction between
isomorphic fields. That is why we identify the real number x with the complex number
(x, 0). The complex-number system C is called an extension of the real-number system R
because it contains a subset $C_0$ which is isomorphic to R.
The field $C_0$ can also be ordered in such a way that the three order axioms of Section I 3.4
are satisfied. In fact, we simply define (x, 0) to be positive if and only if x > 0. It is trivial
to verify that Axioms 7, 8, and 9 are satisfied, so $C_0$ is an ordered field. The isomorphism
f described above also preserves order since it maps the positive elements of R onto the
positive elements of $C_0$.
9.4 The imaginary unit i
Complex numbers have some algebraic properties not possessed by real numbers. For
example, the quadratic equation $x^2 + 1 = 0$, which has no solution among the real
numbers, can now be solved with the use of complex numbers. In fact, the complex
number (0, 1) is a solution, since we have
$$ (0, 1)^2 = (0, 1)(0, 1) = (0 \cdot 0 - 1 \cdot 1, \; 0 \cdot 1 + 1 \cdot 0) = (-1, 0) = -1 . $$
The complex number (0, 1) is denoted by i and is called the imaginary unit. It has the
property that its square is -1, $i^2 = -1$. The reader can easily verify that $(-i)^2 = -1$,
so x = -i is another solution of the equation $x^2 + 1 = 0$.
Now we can relate the ordered-pair idea with the notation used by the early mathematicians. First we note that the definition of multiplication of complex numbers gives
us (b, 0)(0, 1) = (0, b), and hence we have
$$ (a, b) = (a, 0) + (0, b) = (a, 0) + (b, 0)(0, 1) . $$
Therefore, if we write a = (a, 0), b = (b, 0), and i = (0, 1), we get (a, b) = a + bi. In
other words, we have proved the following.
THEOREM 9.2. Every complex number (a, b) can be expressed in the form (a, b) = a + bi.
The advantage of this notation is that it aids us in algebraic manipulations of formulas
involving addition and multiplication. For example, if we multiply a + bi by c + di,
using the distributive and associative laws, and replace $i^2$ by -1, we find that
$$ (a + bi)(c + di) = ac - bd + (ad + bc)i , $$
which, of course, is in agreement with the definition of multiplication. Similarly, to
compute the reciprocal of a nonzero complex number a + bi, we may write
$$ \frac{1}{a + bi} = \frac{a - bi}{(a + bi)(a - bi)} = \frac{a - bi}{a^2 + b^2} = \frac{a}{a^2 + b^2} - \frac{bi}{a^2 + b^2} . $$
This formula is in agreement with that given in (9.2).
By the introduction of complex numbers, we have gained much more than the ability
to solve the simple quadratic equation $x^2 + 1 = 0$. Consider, for example, the quadratic
equation $ax^2 + bx + c = 0$, where a, b, c are real and $a \ne 0$. By completing the square,
we may write this equation in the form
$$ \left( x + \frac{b}{2a} \right)^2 + \frac{4ac - b^2}{4a^2} = 0 . $$
If $4ac - b^2 \le 0$, the equation has the real roots $(-b \pm \sqrt{b^2 - 4ac})/(2a)$. If $4ac - b^2 > 0$,
the left member is positive for every real x and the equation has no real roots. In this case,
however, there are two complex roots, given by the formulas
$$ r_1 = -\frac{b}{2a} + i\,\frac{\sqrt{4ac - b^2}}{2a} \quad\text{and}\quad r_2 = -\frac{b}{2a} - i\,\frac{\sqrt{4ac - b^2}}{2a} . \tag{9.4} $$
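The case distinction on the sign of $4ac - b^2$ can be written out as a small routine. This is a sketch under stated assumptions: the function name `quadratic_roots` is hypothetical, the coefficients are taken to be real floating-point numbers, and both roots are returned as Python `complex` values even when they are real.

```python
import math

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 for real a, b, c with a != 0.

    Follows the discussion around (9.4): real roots when 4ac - b^2 <= 0,
    a pair of complex roots when 4ac - b^2 > 0.
    """
    if a == 0:
        raise ValueError("a must be nonzero")
    d = 4 * a * c - b * b
    if d <= 0:
        s = math.sqrt(-d)                      # sqrt(b^2 - 4ac)
        return (complex((-b + s) / (2 * a), 0.0),
                complex((-b - s) / (2 * a), 0.0))
    s = math.sqrt(d)                           # sqrt(4ac - b^2)
    return (complex(-b / (2 * a), s / (2 * a)),
            complex(-b / (2 * a), -s / (2 * a)))

# x^2 + 2x + 5 = 0 has 4ac - b^2 = 16 > 0, so the roots are complex.
r1, r2 = quadratic_roots(1.0, 2.0, 5.0)
print(r1, r2)
```

For the sample equation the routine returns the pair -1 + 2i and -1 - 2i, and substituting either value back into $x^2 + 2x + 5$ gives zero.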
In 1799, Gauss proved that every polynomial equation of the form
$$ a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = 0 , $$
where $a_0, a_1, \ldots, a_n$ are arbitrary real numbers, with $a_n \ne 0$, has a solution among the
complex numbers if $n \ge 1$. Moreover, even if the coefficients $a_0, a_1, \ldots, a_n$ are complex,
a solution exists in the complex-number system. This fact is known as the fundamental
theorem of algebra.† It shows that there is no need to construct numbers more general
than complex numbers to solve polynomial equations with complex coefficients.
9.5 Geometric interpretation. Modulus and argument
Since a complex number (x, y) is an ordered pair of real numbers, it may be represented
geometrically by a point in the plane, or by an arrow or geometric vector from the origin
to the point (x, y), as shown in Figure 9.1. In this context, the xy-plane is often referred
to as the complex plane. The x-axis is called the real axis; the y-axis is the imaginary axis.
It is customary to use the words complex number and point interchangeably. Thus, we
refer to the point z rather than the point corresponding to the complex number z.
The operations of addition and subtraction of complex numbers have a simple geometric
interpretation. If two complex numbers $z_1$ and $z_2$ are represented by arrows from the
origin to $z_1$ and $z_2$, respectively, then the sum $z_1 + z_2$ is determined by the parallelogram
law. The arrow from the origin to $z_1 + z_2$ is a diagonal of the parallelogram determined
by 0, $z_1$, and $z_2$, as illustrated by the example in Figure 9.2. The other diagonal is related
to the difference of $z_1$ and $z_2$. The arrow from $z_1$ to $z_2$ is parallel to and equal in length to
the arrow from 0 to $z_2 - z_1$; the arrow in the opposite direction, from $z_2$ to $z_1$, is related
in the same way to $z_1 - z_2$.
† A proof of the fundamental theorem of algebra can be found in almost any book on the theory of functions
of a complex variable. For example, see K. Knopp, Theory of Functions, Dover Publications, New York,
1945, or E. Hille, Analytic Function Theory, Vol. 1, Blaisdell Publishing Co., 1959. A more elementary
proof is given in O. Schreier and E. Sperner, Introduction to Modern Algebra and Matrix Theory, Chelsea
Publishing Company, New York, 1951.
If $(x, y) \ne (0, 0)$, we can express x and y in polar coordinates,
$$ x = r \cos \theta , \qquad y = r \sin \theta , $$
and we obtain
$$ x + iy = r(\cos \theta + i \sin \theta) . \tag{9.5} $$
The positive number r, which represents the distance of (x, y) from the origin, is called
the modulus or absolute value of x + iy and is denoted by |x + iy|. Thus, we have
$$ |x + iy| = \sqrt{x^2 + y^2} . $$
FIGURE 9.1 Geometric representation of the complex number x + iy.
FIGURE 9.2 Addition and subtraction of complex numbers represented geometrically by the parallelogram law.
The polar angle $\theta$ is called an argument of x + iy. We say an argument rather than the
argument because for a given point (x, y) the angle $\theta$ is determined only up to multiples
of $2\pi$. Sometimes it is desirable to assign a unique argument to a complex number. This
may be done by restricting $\theta$ to lie in a half-open interval of length $2\pi$. The intervals
$[0, 2\pi)$ and $(-\pi, \pi]$ are commonly used for this purpose. We shall use the interval $(-\pi, \pi]$
and refer to the corresponding $\theta$ as the principal argument of x + iy; we denote this $\theta$ by
arg (x + iy). Thus, if $x + iy \ne 0$ and r = |x + iy|, we define arg (x + iy) to be the
unique real $\theta$ satisfying the conditions
$$ x = r \cos \theta , \qquad y = r \sin \theta , \qquad -\pi < \theta \le \pi . $$
For the zero complex number, we assign the modulus 0 and agree that any real $\theta$ may be
used as an argument.
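The modulus and principal argument just defined can be computed with the standard two-argument arctangent. The sketch below is an illustration; the names `modulus` and `principal_argument` are hypothetical. It assumes the usual floating-point behavior of `math.atan2`, which returns values in $[-\pi, \pi]$ and can return $-\pi$ when y is a negative zero, so that one value is mapped to $\pi$ to respect the interval $(-\pi, \pi]$.

```python
import math

def modulus(x, y):
    """The modulus |x + iy| = sqrt(x^2 + y^2)."""
    return math.hypot(x, y)

def principal_argument(x, y):
    """The principal argument of x + iy, taken in the interval (-pi, pi]."""
    if x == 0 and y == 0:
        raise ValueError("every real theta is an argument of the zero complex number")
    theta = math.atan2(y, x)
    # atan2 may return -pi for a negative-zero y with x < 0; use pi instead
    # so that the result always lies in the half-open interval (-pi, pi].
    return math.pi if theta == -math.pi else theta

r = modulus(-1.0, 1.0)
theta = principal_argument(-1.0, 1.0)
# Recover x and y from the polar representation (9.5).
print(r * math.cos(theta), r * math.sin(theta))
```

Recomputing $r\cos\theta$ and $r\sin\theta$ reproduces the original components up to rounding, which is the content of (9.5).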
Since the absolute value of a complex number z is simply the length of a line segment, it
is not surprising to find that it has the usual properties of absolute values of real numbers.
For example, we have
$$ |z| > 0 \quad\text{if } z \ne 0 , \qquad\text{and}\qquad |z_1 - z_2| = |z_2 - z_1| . $$
Geometrically, the absolute value $|z_1 - z_2|$ represents the distance between the points $z_1$
and $z_2$ in the complex plane. The triangle inequality
$$ |z_1 + z_2| \le |z_1| + |z_2| $$
is also valid. In addition, we have the following formulas for absolute values of products
and quotients of complex numbers:
$$ |z_1 z_2| = |z_1| \, |z_2| \tag{9.6} $$
and
$$ \left| \frac{z_1}{z_2} \right| = \frac{|z_1|}{|z_2|} \quad\text{if } z_2 \ne 0 . $$
If we write $z_1 = a + bi$ and $z_2 = c + di$, we obtain (9.6) at once from the identity
$$ (ac - bd)^2 + (bc + ad)^2 = (a^2 + b^2)(c^2 + d^2) . $$
The formula for $|z_1/z_2|$ follows from (9.6) if we write $z_1$ as a product,
$$ z_1 = z_2 \cdot \frac{z_1}{z_2} . $$
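If z = x + iy, the complex conjugate of z is the complex number $\bar{z} = x - iy$. Geometrically, $\bar{z}$ represents the reflection of z through the real axis. The definition of conjugate
implies that
$$ \overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2 , \qquad \overline{z_1 z_2} = \bar{z}_1 \bar{z}_2 , \qquad \overline{z_1 / z_2} = \bar{z}_1 / \bar{z}_2 , \qquad z\bar{z} = |z|^2 . $$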
The verification of these properties is left as an exercise for the reader.
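Before carrying out the algebraic verification, the reader may find it helpful to test the formulas on a numerical example. The Python sketch below uses the built-in `complex` type; the sample values `z1` and `z2`, the helper `close`, and its tolerance are arbitrary choices made for this illustration, and a numerical check of this kind does not replace the proof.

```python
# Numerical spot check of (9.6), the triangle inequality, and the
# conjugate rules, using Python's built-in complex numbers.
z1, z2 = complex(3, -4), complex(-1, 2)

def close(u, v, tol=1e-12):
    """True when u and v agree to within tol (hypothetical helper)."""
    return abs(u - v) <= tol

checks = {
    "modulus of a product (9.6)": close(abs(z1 * z2), abs(z1) * abs(z2)),
    "modulus of a quotient": close(abs(z1 / z2), abs(z1) / abs(z2)),
    "triangle inequality": abs(z1 + z2) <= abs(z1) + abs(z2),
    "conjugate of a sum": close((z1 + z2).conjugate(), z1.conjugate() + z2.conjugate()),
    "conjugate of a product": close((z1 * z2).conjugate(), z1.conjugate() * z2.conjugate()),
    "conjugate of a quotient": close((z1 / z2).conjugate(), z1.conjugate() / z2.conjugate()),
    "z times its conjugate": close(z1 * z1.conjugate(), abs(z1) ** 2),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```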
If a quadratic equation with real coefficients has no real roots, its complex roots, given
by (9.4), are conjugates. Conversely, if $r_1$ and $r_2$ are complex conjugates, say $r_1 = \alpha + i\beta$
and $r_2 = \alpha - i\beta$, where $\alpha$ and $\beta$ are real, then $r_1$ and $r_2$ are roots of a quadratic equation
with real coefficients. In fact, we have
$$ r_1 + r_2 = 2\alpha \quad\text{and}\quad r_1 r_2 = \alpha^2 + \beta^2 , $$
so
$$ (x - r_1)(x - r_2) = x^2 - (r_1 + r_2)x + r_1 r_2 , $$
and the quadratic equation in question is
$$ x^2 - 2\alpha x + \alpha^2 + \beta^2 = 0 . $$
9.6 Exercises
1. Express the following complex numbers in the form a + bi.
(a) $(1 + i)^2$.
(b) 1/i.
(c) 1/(1 + i).
(d) (2 + 3i)(3 - 4i).
(e) (1 + i)/(1 - 2i).
(f) $i^5 + i^{16}$.
(g) $1 + i + i^2 + i^3$.
(h) $\frac{1}{2}(1 + i)(1 + i^{-8})$.
2. Compute the absolute values of the following complex numbers.
(a) 1 + i.
(b) 3 + 4i.
(c) (1 + i)/(1 - i).
(d) $1 + i + i^2$.
(e) $i^7 + i^{10}$.
(f) 2(1 - i) + 3(2 + i).
3. Compute the modulus and principal argument of each of the following complex numbers.
(a) 2i.
(b) -3i.
(c) -1.
(d) 1.
(e) $-3 + \sqrt{3}\, i$.
(f) $(1 + i)/\sqrt{2}$.
(g) $(-1 + i)^3$.
(h) $(-1 - i)^3$.
(i) 1/(1 + i).
(j) $1/(1 + i)^2$.
4. In each case, determine all real numbers x and y which satisfy the given relation.
(a) x + iy = x - iy.
(b) x + iy = |x + iy|.
(c) |x + iy| = |x - iy|.
(d) $(x + iy)^2 = (x - iy)^2$.
(e) $\frac{x + iy}{x - iy} = x - iy$.
(f) $\sum_{k=0}^{100} i^k = x + iy$.
5. Make a sketch showing the set of all z in the complex plane which satisfy each of the following
conditions.
(a) |z| < 1.
(b) $z + \bar{z} = 1$.
(c) $z - \bar{z} = i$.
(d) |z - 1| = |z + 1|.
(e) |z - i| = |z + i|.
(f) $z + \bar{z} = |z|^2$.
6. Let f be a polynomial with real coefficients.
(a) Show that $\overline{f(z)} = f(\bar{z})$ for every complex z.
(b) Use part (a) to deduce that the nonreal zeros of f (if any exist) must occur in pairs of conjugate complex numbers.
7. Prove that an ordering relation cannot be introduced in the complex number system so that
all three order axioms of Section I 3.4 are satisfied.
[Hint: Assume that such an ordering can be introduced and try to decide whether the
imaginary unit i is positive or negative.]
8. Define the following "pseudo-ordering" among the complex numbers. If z = x + iy, we say
that z is positive if and only if x > 0. Which of the order axioms of Section I 3.4 are satisfied
with this definition of positive?
9. Solve Exercise 8 if the pseudo-ordering is defined as follows: We say that z is positive if and
only if |z| > 0.
10. Solve Exercise 8 if the pseudo-ordering is defined as follows: If z = x + iy, we say that z is
positive if and only if x > y.
11. Make a sketch showing the set of all complex z which satisfy each of the following conditions.
(a) |2z + 3| < 1.
(b) |z + 1| < |z - 1|.
(c) $|z - i| \le |z + i|$.
(d) $|z| \le |2z + 1|$.
12. Let w = (az + b)/(cz + d), where a, b, c, and d are real. Prove that
$$ w - \bar{w} = (ad - bc)(z - \bar{z})/|cz + d|^2 . $$
If ad - bc > 0, prove that the imaginary parts of z and w have the same sign.
9.7 Complex exponentials
We wish now to extend the definition of $e^x$ so that it becomes meaningful when x is
replaced by any complex number z. We wish this extension to be such that the law of
exponents, $e^a e^b = e^{a+b}$, will be valid for all complex a and b. And, of course, we want $e^z$
to agree with the usual exponential when z is real. There are several equivalent ways to
carry out this extension. Before we state the definition of $e^z$ that we have chosen, we shall
give a heuristic discussion which will serve as motivation for this definition.
If we write z = x + iy, then, if the law of exponents is to be valid for complex numbers,
we must have
$$ e^z = e^{x + iy} = e^x e^{iy} . $$
Since $e^x$ has already been defined when x is real, our task is to arrive at a reasonable
definition for $e^{iy}$ when y is real. Now, if $e^{iy}$ is to be a complex number, we may write
$$ e^{iy} = A(y) + iB(y) , \tag{9.7} $$
where A and B are real-valued functions to be determined. Let us differentiate both sides
of Equation (9.7), assuming A and B are differentiable, and treating the complex number
i as though it were a real number. Then we get
$$ ie^{iy} = A'(y) + iB'(y) . \tag{9.8} $$
Differentiating once more, we find that
$$ -e^{iy} = A''(y) + iB''(y) . $$
Comparison of this equation with (9.7) shows that A and B must satisfy the equations
$$ A''(y) = -A(y) \quad\text{and}\quad B''(y) = -B(y) . $$
In other words, each of the functions A and B is a solution of the differential equation
$f'' + f = 0$. From the work of Chapter 8, we know that this equation has exactly one
solution with specified initial values f(0) and f'(0). If we put y = 0 in (9.7) and (9.8) and
use the fact that $e^0 = 1$, we find that A and B have the initial values
$$ A(0) = 1 , \quad A'(0) = 0 , \qquad\text{and}\qquad B(0) = 0 , \quad B'(0) = 1 . $$
By the uniqueness theorem for second-order differential equations with constant coefficients,
we must have
$$ A(y) = \cos y \quad\text{and}\quad B(y) = \sin y . $$
In other words, if $e^{iy}$ is to be a complex number with the properties just described, then
we must have $e^{iy} = \cos y + i \sin y$. This discussion serves to motivate the following
definition.
DEFINITION. If z = x + iy, we define $e^z$ to be the complex number given by the equation
$$ e^z = e^x(\cos y + i \sin y) . \tag{9.9} $$
Note that $e^z = e^x$ when y = 0; hence this exponential agrees with the usual exponential
when z is real. Now we shall use this definition to deduce the law of exponents.
THEOREM 9.3. If a and b are complex numbers, we have
$$ e^a e^b = e^{a+b} . \tag{9.10} $$
Proof. Writing a = x + iy and b = u + iv, we have
$$ e^a = e^x(\cos y + i \sin y) , \qquad e^b = e^u(\cos v + i \sin v) , $$
so
$$ e^a e^b = e^x e^u [\cos y \cos v - \sin y \sin v + i(\cos y \sin v + \sin y \cos v)] . $$
Now we use the addition formulas for cos (y + v) and sin (y + v) and the law of exponents
for real exponentials, and we see that the foregoing equation becomes
$$ e^a e^b = e^{x+u}[\cos (y + v) + i \sin (y + v)] . \tag{9.11} $$
Since a + b = (x + u) + i(y + v), the right member of (9.11) is $e^{a+b}$. This proves (9.10).
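Definition (9.9) and the law of exponents (9.10) lend themselves to a quick numerical experiment. In the sketch below the function name `cexp` is hypothetical; it implements (9.9) literally from the real exponential, cosine, and sine, and the library routine `cmath.exp` is used only as an independent comparison. The sample exponents are arbitrary.

```python
import cmath
import math

def cexp(z):
    """e^z computed literally from definition (9.9): e^x (cos y + i sin y)."""
    z = complex(z)
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

a = complex(0.5, 2.0)
b = complex(-1.25, 0.75)

# Law of exponents (9.10): the two sides should agree up to rounding error.
law_error = abs(cexp(a) * cexp(b) - cexp(a + b))
# Agreement with the library's complex exponential.
library_error = abs(cexp(a) - cmath.exp(a))
print(law_error, library_error)
```

Both printed quantities are of the size of floating-point rounding error, as the theorem leads one to expect.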
THEOREM 9.4. Every complex number $z \ne 0$ can be expressed in the form
$$ z = re^{i\theta} , \tag{9.12} $$
where r = |z| and $\theta = \arg (z) + 2n\pi$, n being any integer. This representation is called the
polar form of z.
Proof. If z = x + iy, the polar-coordinate representation (9.5) gives us
$$ z = r(\cos \theta + i \sin \theta) , $$
where r = |z| and $\theta = \arg (z) + 2n\pi$, n being any integer. But if we take x = 0 and $y = \theta$
in (9.9), we obtain the formula
$$ e^{i\theta} = \cos \theta + i \sin \theta , $$
which proves (9.12).
The representation of complex numbers in the polar form (9.12) is especially useful in
connection with multiplication and division of complex numbers. For example, if $z_1 = r_1 e^{i\theta}$
and $z_2 = r_2 e^{i\phi}$, we have
$$ z_1 z_2 = r_1 e^{i\theta} r_2 e^{i\phi} = r_1 r_2 e^{i(\theta + \phi)} . \tag{9.13} $$
Therefore the product of the moduli, $r_1 r_2$, is the modulus of the product $z_1 z_2$, in agreement
with Equation (9.6), and the sum of the arguments, $\theta + \phi$, is an admissible argument for
the product $z_1 z_2$.
When $z = re^{i\theta}$, repeated application of (9.13) gives us the formula
$$ z^n = r^n e^{in\theta} = r^n(\cos n\theta + i \sin n\theta) , $$
valid for any nonnegative integer n. This formula is also valid for negative integers n if
we define $z^{-m}$ to be $(z^{-1})^m$ when m is a positive integer.
Similarly, we have
$$ \frac{z_1}{z_2} = \frac{r_1 e^{i\theta}}{r_2 e^{i\phi}} = \frac{r_1}{r_2} e^{i(\theta - \phi)} , $$
so the modulus of $z_1/z_2$ is $r_1/r_2$ and the difference $\theta - \phi$ is an admissible argument for $z_1/z_2$.
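The rules for products, quotients, and powers in polar form can be tried out on a concrete pair of numbers. The sketch below is illustrative; `to_polar` and `from_polar` are hypothetical helper names, `cmath.phase` is used to obtain an argument of each number, and the sample values are arbitrary.

```python
import cmath
import math

def to_polar(z):
    """Return (r, theta) with z = r e^{i theta}; theta comes from cmath.phase."""
    return abs(z), cmath.phase(z)

def from_polar(r, theta):
    """Return r (cos theta + i sin theta)."""
    return r * complex(math.cos(theta), math.sin(theta))

z1, z2 = complex(1, 1), complex(0, 2)
(r1, th1), (r2, th2) = to_polar(z1), to_polar(z2)

prod = from_polar(r1 * r2, th1 + th2)     # moduli multiply, arguments add (9.13)
quot = from_polar(r1 / r2, th1 - th2)     # moduli divide, arguments subtract
power = from_polar(r1 ** 8, 8 * th1)      # z^n = r^n (cos n theta + i sin n theta)
print(prod, quot, power)
```

Up to rounding, the three results agree with the values obtained by ordinary complex arithmetic, namely $z_1 z_2 = -2 + 2i$, $z_1/z_2 = \frac{1}{2} - \frac{1}{2}i$, and $z_1^8 = 16$.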
9.8 Complex-valued functions
A function f whose values are complex numbers is called a complex-valued function.
If the domain of f is a set of real numbers, f is called a complex-valued function of a real
variable. If the domain is a set of complex numbers, f is called a complex-valued function
of a complex variable, or more simply, a function of a complex variable. An example is
the exponential function, defined by the equation
$$ f(z) = e^z $$
for all complex z. Most of the familiar elementary functions of calculus, such as the
exponential, the logarithm, and the trigonometric functions, can be extended to become
functions of a complex variable. (See Exercises 9 and 10 in Section 9.10.) In this more
general framework many new properties and interrelationships are often revealed. For
example, the complex exponential function is periodic. In fact, if z = x + iy and if n
is any integer, we have
$$ e^{z + 2n\pi i} = e^x[\cos (y + 2n\pi) + i \sin (y + 2n\pi)] = e^x(\cos y + i \sin y) = e^z . $$
Thus we see that $f(z + 2n\pi i) = f(z)$, so f has the period $2\pi i$. This property of the exponential function is revealed only when we study the exponential as a function of a complex
variable.
The first systematic treatment of the differential and integral calculus of functions of
a complex variable was given by Cauchy early in the 19th Century. Since then the theory
has developed into one of the most important and interesting branches of mathematics.
It has become an indispensable tool for physicists and engineers and has connections in
nearly every branch of pure mathematics. A discussion of this theory will not be given
here. We shall discuss only the rudiments of the calculus of complex-valued functions of a
real variable.
Suppose f is a complex-valued function defined on some interval I of real numbers. For
each x in I, the function value f(x) is a complex number, so we can write
$$ f(x) = u(x) + iv(x) , $$
where u(x) and v(x) are real. This equation determines two real-valued functions u and v
called, respectively, the real and imaginary parts of f; we write the equation more briefly
as f = u + iv. Concepts such as continuity, differentiation, and integration of f may be
defined in terms of the corresponding concepts for u and v, as described in the following
definition.
DEFINITION. If f = u + iv, we say f is continuous at a point if both u and v are continuous at that point. The derivative of f is defined by the equation
$$ f'(x) = u'(x) + iv'(x) $$
whenever both derivatives u'(x) and v'(x) exist. Similarly, we define the integral of f by the
equation
$$ \int_a^b f(x)\,dx = \int_a^b u(x)\,dx + i \int_a^b v(x)\,dx $$
whenever both integrals on the right exist.
In view of this definition, it is not surprising to find that many of the theorems of differential and integral calculus are also valid for complex-valued functions. For example, the
rules for differentiating sums, products, and quotients (Theorem 4.1) are valid for complex
functions. The first and second fundamental theorems of calculus (Theorems 5.1 and 5.3)
as well as the zero-derivative theorem (Theorem 5.2) also hold for complex functions. To
illustrate the ease with which these theorems can be proved, we consider the zero-derivative
theorem:
If f'(x) = 0 for all x on an open interval I, then f is constant on I.
Proof. Write f = u + iv. Since f' = u' + iv', the statement f' = 0 on I means that
both u' and v' are zero on I. Hence, by Theorem 5.2, both u and v are constant on I.
Therefore f is constant on I.
9.9 Examples of differentiation and integration formulas
In this section we discuss an important example of a complex-valued function of a real
variable, namely the function f defined for all real x by the equation
$$ f(x) = e^{tx} , $$
where t is a fixed complex number. When t is real, the derivative of this function is given
by the formula $f'(x) = te^{tx}$. Now we prove that this formula is also valid for complex t.
THEOREM 9.5. If $f(x) = e^{tx}$ for all real x and a fixed complex t, then $f'(x) = te^{tx}$.
Proof. Write $t = \alpha + i\beta$, where $\alpha$ and $\beta$ are real. From the definition of the complex
exponential, we have
$f(x) = e^{tx} = e^{\alpha x + i\beta x} = e^{\alpha x}\cos\beta x + ie^{\alpha x}\sin\beta x$.
Therefore, the real and imaginary parts of f are given by
(9.14) $u(x) = e^{\alpha x}\cos\beta x$ and $v(x) = e^{\alpha x}\sin\beta x$.
These functions are differentiable for all x and their derivatives are given by the formulas
$u'(x) = \alpha e^{\alpha x}\cos\beta x - \beta e^{\alpha x}\sin\beta x$, $v'(x) = \alpha e^{\alpha x}\sin\beta x + \beta e^{\alpha x}\cos\beta x$.
Since $f'(x) = u'(x) + iv'(x)$, we have
$f'(x) = \alpha e^{\alpha x}(\cos\beta x + i\sin\beta x) + i\beta e^{\alpha x}(\cos\beta x + i\sin\beta x)$
$= (\alpha + i\beta)e^{(\alpha + i\beta)x}$
$= te^{tx}$.
This completes the proof.
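The formula of Theorem 9.5 can be checked numerically. The following Python sketch is only an illustration, not part of the proof: the values of t and x and the step size h are arbitrary choices made here, and the derivative is approximated by a symmetric difference quotient.

```python
import cmath

def f(x, t):
    """Return f(x) = e^{tx} for real x and complex t."""
    return cmath.exp(t * x)

def central_difference(x, t, h=1e-6):
    """Approximate f'(x) by the symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h, t) - f(x - h, t)) / (2 * h)

t = complex(0.5, 2.0)   # illustrative choice: alpha = 0.5, beta = 2.0
x = 1.3                 # illustrative real point
approx = central_difference(x, t)
exact = t * f(x, t)     # the value t e^{tx} asserted by Theorem 9.5
error = abs(approx - exact)
print(error)            # small compared with abs(exact)
```

The difference quotient is taken separately in the real and imaginary parts, which mirrors the definition f’ = u’ + iv’ used in the proof.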
Theorem 9.5 has some interesting consequences. For example, if we adopt the Leibniz
notation for indefinite integrals, we can restate Theorem 9.5 in the form
(9.15) $\int e^{tx}\,dx = \frac{e^{tx}}{t}$
when $t \neq 0$. If we let $t = \alpha + i\beta$ and equate the real and imaginary parts of Equation
(9.15), we obtain the integration formulas
$\int e^{\alpha x}\cos\beta x\,dx = \frac{e^{\alpha x}(\alpha\cos\beta x + \beta\sin\beta x)}{\alpha^2 + \beta^2}$
and
$\int e^{\alpha x}\sin\beta x\,dx = \frac{e^{\alpha x}(\alpha\sin\beta x - \beta\cos\beta x)}{\alpha^2 + \beta^2}$,
which are valid if $\alpha$ and $\beta$ are not both zero.
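As a sanity check on the two integration formulas, one can compare them with a numerical quadrature over a fixed interval. The sketch below uses a hand-written composite Simpson rule; the interval [0, 1] and the values alpha = 0.5, beta = 2.0 are assumptions chosen only for illustration.

```python
import math

def antiderivative_cos(x, alpha, beta):
    """Right-hand side of the cosine formula derived from (9.15)."""
    return (math.exp(alpha * x)
            * (alpha * math.cos(beta * x) + beta * math.sin(beta * x))
            / (alpha**2 + beta**2))

def antiderivative_sin(x, alpha, beta):
    """Right-hand side of the sine formula derived from (9.15)."""
    return (math.exp(alpha * x)
            * (alpha * math.sin(beta * x) - beta * math.cos(beta * x))
            / (alpha**2 + beta**2))

def simpson(g, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

alpha, beta = 0.5, 2.0   # illustrative values, not both zero
cos_numeric = simpson(lambda x: math.exp(alpha * x) * math.cos(beta * x), 0.0, 1.0)
sin_numeric = simpson(lambda x: math.exp(alpha * x) * math.sin(beta * x), 0.0, 1.0)
cos_formula = antiderivative_cos(1.0, alpha, beta) - antiderivative_cos(0.0, alpha, beta)
sin_formula = antiderivative_sin(1.0, alpha, beta) - antiderivative_sin(0.0, alpha, beta)
print(abs(cos_numeric - cos_formula), abs(sin_numeric - sin_formula))
```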
Another consequence
of Theorem 9.5 is the connection between complex exponentials
and second-order linear differential equations with constant coefficients.
THEOREM 9.6. Consider the differential equation
(9.16) $y'' + ay' + by = 0$,
where a and b are real constants. The real and imaginary parts of the function f defined on
$(-\infty, +\infty)$ by the equation $f(x) = e^{tx}$ are solutions of the differential equation (9.16) if
and only if t is a root of the characteristic equation
$t^2 + at + b = 0$.
Proof. Let $L(y) = y'' + ay' + by$. Since $f'(x) = te^{tx}$, we also have $f''(x) = t^2e^{tx}$, so
$L(f) = e^{tx}(t^2 + at + b)$. But $e^{tx}$ is never zero since $e^{tx}e^{-tx} = e^0 = 1$. Hence, L(f) = 0
if and only if $t^2 + at + b = 0$. But if we write f = u + iv, we find L(f) = L(u) + iL(v),
and hence L(f) = 0 if and only if both L(u) = 0 and L(v) = 0. This completes the proof.
Note: If $t = \alpha + i\beta$, the real and imaginary parts of f are given by (9.14). If the
characteristic equation has two distinct roots, real or complex, the linear combination
$y = c_1u(x) + c_2v(x)$
is the general solution of the differential equation. This agrees with the results proved
in Theorem 8.7.
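Theorem 9.6 can be illustrated numerically for one concrete equation. In the sketch below the coefficients a = 2 and b = 5 are assumed for illustration (the characteristic roots are then -1 ± 2i), and the operator L is approximated by finite differences, so the residuals are only expected to be close to zero, not exactly zero.

```python
import cmath
import math

a, b = 2.0, 5.0                          # illustrative real coefficients
t = (-a + cmath.sqrt(a * a - 4 * b)) / 2 # one root of t^2 + a t + b = 0
alpha, beta = t.real, t.imag             # here alpha = -1, beta = 2

def u(x):
    """Real part of e^{tx}, as in (9.14)."""
    return math.exp(alpha * x) * math.cos(beta * x)

def v(x):
    """Imaginary part of e^{tx}, as in (9.14)."""
    return math.exp(alpha * x) * math.sin(beta * x)

def L(y, x, h=1e-4):
    """Finite-difference approximation to y'' + a y' + b y at the point x."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return d2 + a * d1 + b * y(x)

residual_u = L(u, 0.7)
residual_v = L(v, 0.7)
print(t, residual_u, residual_v)
```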
Further examples of complex functions are discussed in the next set of exercises.
9.10 Exercises
1. Express each of the following complex numbers in the form a + bi.
(a) $e^{\pi i/2}$.
(b) $2e^{-\pi i/2}$.
(c) $3e^{\pi i}$.
(d) $-e^{-\pi i}$.
(e) $i + e^{2\pi i}$.
(f) $e^{\pi i/4}$.
(g) $e^{\pi i/4} - e^{-\pi i/4}$.
(h) $\frac{1 - e^{\pi i/2}}{1 + e^{\pi i/2}}$.
2. In each case, find all real x and y that satisfy the given relation.
(a) $x + iy = xe^{iy}$.
(b) $x + iy = ye^{ix}$.
(c) $e^{x+iy} = -1$.
(d) $\frac{1 + i}{1 - i} = xe^{iy}$.
3. (a) Prove that $e^z \neq 0$ for all complex z.
(b) Find all complex z for which $e^z = 1$.
4. (a) If $\theta$ is real, show that
$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}$ and $\sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}$.
(b) Use the formulas in (a) to deduce the identities
$\cos^2\theta = \frac{1}{2}(1 + \cos 2\theta)$, $\sin^2\theta = \frac{1}{2}(1 - \cos 2\theta)$.
5. (a) Prove DeMoivre’s theorem,
$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$,
valid for every real $\theta$ and every positive integer n.
(b) Take n = 3 in part (a) and deduce the trigonometric identities
$\sin 3\theta = 3\cos^2\theta\sin\theta - \sin^3\theta$, $\cos 3\theta = \cos^3\theta - 3\cos\theta\sin^2\theta$.
6. Prove that every trigonometric sum of the form
$S_n(x) = \frac{1}{2}a_0 + \sum_{k=1}^{n}(a_k\cos kx + b_k\sin kx)$
can be expressed as a sum of complex exponentials,
$S_n(x) = \sum_{k=-n}^{n} c_ke^{ikx}$,
where $c_k = \frac{1}{2}(a_k - ib_k)$ for k = 1, 2, ..., n. Determine corresponding formulas for $c_{-k}$.
7. (a) If m and n are integers, prove that
$\int_0^{2\pi} e^{inx}e^{-imx}\,dx = \begin{cases} 0 & \text{if } m \neq n, \\ 2\pi & \text{if } m = n. \end{cases}$
(b) Use part (a) to deduce the orthogonality relations for the sine and cosine (m and n are
integers, $m^2 \neq n^2$):
$\int_0^{2\pi}\sin nx\cos mx\,dx = \int_0^{2\pi}\sin nx\sin mx\,dx = \int_0^{2\pi}\cos nx\cos mx\,dx = 0$,
$\int_0^{2\pi}\sin^2 nx\,dx = \int_0^{2\pi}\cos^2 nx\,dx = \pi$ if $n \neq 0$.
8. Given a complex number $z \neq 0$. Write $z = re^{i\theta}$, where $\theta = \arg(z)$. Let $z_1 = Re^{i\alpha}$, where
$R = r^{1/n}$ and $\alpha = \theta/n$, and let $\epsilon = e^{2\pi i/n}$, where n is a positive integer.
(a) Show that $z_1^n = z$; that is, $z_1$ is an nth root of z.
(b) Show that z has exactly n distinct nth roots,
$z_1, \epsilon z_1, \epsilon^2z_1, \ldots, \epsilon^{n-1}z_1$,
and that they are equally spaced on a circle of radius R.
(c) Determine the three cube roots of i.
(d) Determine the four fourth roots of i.
(e) Determine the four fourth roots of -i.
9. The definitions of the sine and cosine functions can be extended to the complex plane as
follows:
$\cos z = \frac{e^{iz} + e^{-iz}}{2}$, $\sin z = \frac{e^{iz} - e^{-iz}}{2i}$.
When z is real, these formulas agree with the ordinary sine and cosine functions. (See Exercise
4.) Use these formulas to deduce the following properties of complex sines and cosines. Here
u, v, and z denote complex numbers, with z = x + iy.
(a) sin (u + v) = sin u cos v + cos u sin v.
(b) cos (u + v) = cos u cos v - sin u sin v.
(c) $\sin^2 z + \cos^2 z = 1$.
(d) cos (iy) = cosh y, sin (iy) = i sinh y.
(e) cos z = cos x cosh y - i sin x sinh y.
(f) sin z = sin x cosh y + i cos x sinh y.
10. If z is a nonzero complex number, we define
Log z, the complex logarithm of z, by the equation
$\operatorname{Log} z = \log|z| + i\arg(z)$.
When z is real and positive, this formula agrees with the ordinary logarithm. Use this formula
to deduce the following properties of complex logarithms.
(a) $\operatorname{Log}(-1) = \pi i$, $\operatorname{Log}(i) = \pi i/2$.
(b) $\operatorname{Log}(z_1z_2) = \operatorname{Log} z_1 + \operatorname{Log} z_2 + 2n\pi i$, where n is an integer.
(c) $\operatorname{Log}(z_1/z_2) = \operatorname{Log} z_1 - \operatorname{Log} z_2 + 2n\pi i$, where n is an integer.
(d) $e^{\operatorname{Log} z} = z$.
11. If w and z are complex numbers, $z \neq 0$, we define $z^w$ by the equation
$z^w = e^{w\operatorname{Log} z}$,
where Log z is defined as in Exercise 10.
(a) Compute $1^i$, $i^i$, and $(-1)^i$.
(b) Prove that $z^az^b = z^{a+b}$ if a, b, and z are complex, $z \neq 0$.
(c) Note that the equation
(9.17) $(z_1z_2)^w = z_1^wz_2^w$
is violated when $z_1 = z_2 = -1$ and w = i. What conditions on $z_1$ and $z_2$ are necessary for
Equation (9.17) to hold for all complex w?
In Exercises 12 through 15, L denotes the linear operator defined by $L(y) = y'' + ay' + by$,
where a and b are real constants.
12. Prove that if R is a complex-valued function, say R(x) = P(x) + iQ(x), then a complex-valued
function f(x) = u(x) + iv(x) satisfies the differential equation L(y) = R(x) on an interval I
if and only if u and v satisfy the equations L(u) = P(x) and L(v) = Q(x) on I.
13. If A is complex and $\omega$ is real, prove that the differential equation $L(y) = Ae^{i\omega x}$ has a complex-valued solution of the form $y = Be^{i\omega x}$, provided that either $b \neq \omega^2$ or $a\omega \neq 0$. Express the
complex number B in terms of a, b, A, and $\omega$.
14. Assume c is real and $b \neq \omega^2$. Use the results of Exercise 13 to prove that the differential
equation $L(y) = c\cos\omega x$ has a particular solution of the form $y = A\cos(\omega x - \alpha)$, where
$A = \frac{c}{\sqrt{(b - \omega^2)^2 + a^2\omega^2}}$ and $\tan\alpha = \frac{a\omega}{b - \omega^2}$.
15. Assume c is real and $b \neq \omega^2$. Prove that the differential equation $L(y) = c\sin\omega x$ has a particular solution of the form $y = A\sin(\omega x + \alpha)$ and express A and $\alpha$ in terms of a, b, c, and $\omega$.
10
SEQUENCES, INFINITE SERIES,
IMPROPER INTEGRALS
10.1 Zeno’s paradox
The principal subject matter of this chapter had its beginning nearly 2400 years ago
when the Greek philosopher Zeno of Elea (495-435 B.C.) precipitated a crisis in ancient
mathematics by setting forth a number of ingenious paradoxes. One of these, often called
the racecourse paradox, may be described as follows:
A runner can never reach the end of a racecourse because he must cover half of any
distance before he covers the whole. That is to say, having covered the first half he
still has the second half before him. When half of this is covered, one-fourth yet
remains. When half of this one-fourth is covered, there remains one-eighth, and so
on, ad infinitum.
Zeno was referring, of course, to an idealized situation in which the runner is to be
thought of as a particle or point moving from one end of a line segment to the other. We
can formulate the paradox in another way. Assume that the runner starts at the point
marked 1 in Figure 10.1 and runs toward the goal marked 0. The positions labeled 1/2, 1/4,
1/8, etc., indicate the fraction of the course yet to be covered when these points are reached.
These fractions, each of which is half the previous one, subdivide the whole course into an
endless number of smaller portions. A positive amount of time is required to cover each
portion separately, and the time required for the whole course is the sum total of all these
amounts. To say that the runner can never reach the goal is to say that he never arrives
there in a finite length of time; or, in other words, that the sum of an endless number of
positive time intervals cannot possibly be finite.
This assertion was rejected 2000 years after Zeno’s time when the theory of infinite
series was created. In the 17th and 18th centuries, mathematicians began to realize that it
is possible to extend the ideas of ordinary addition from finite collections of numbers to
infinite collections so that sometimes infinitely many positive numbers have a finite “sum.”
To see how this extension might come about and to get an idea of some of the difficulties
that might be encountered in making the extension, let us analyze Zeno’s paradox in more
detail.
Suppose the aforementioned runner travels at a constant speed and suppose it takes him
T minutes to cover the first half of the course. The next quarter of the course will take
T/2 minutes, the next eighth will take T/4 minutes, and, in general, the portion from
$1/2^n$ to $1/2^{n+1}$ will take $T/2^n$ minutes. The “sum” of all these time intervals may be indicated symbolically by writing the following expression:
(10.1) $T + \frac{T}{2} + \frac{T}{4} + \cdots + \frac{T}{2^n} + \cdots$.
This is an example of what is known as an infinite series, and the problem here is to decide
whether there is some reasonable way to assign a number which may be called the sum of
this series.
Our physical experience tells us that a runner who travels at a constant speed should
reach his goal in twice the time it takes for him to reach the halfway point. Since it takes
T minutes to cover half the course, it should require 2T minutes for the whole course.
This line of reasoning strongly suggests that we should assign the “sum” 2T to the series
in (10.1), and it leads us to expect that the equation
(10.2) $T + \frac{T}{2} + \frac{T}{4} + \cdots + \frac{T}{2^n} + \cdots = 2T$
should be “true” in some sense.
FIGURE 10.1 The racecourse paradox.
The theory of infinite series tells us exactly how to interpret this equation. The idea is
this: First we add a finite number of the terms, say the first n, and denote their sum by $s_n$.
Thus we have
(10.3) $s_n = T + \frac{T}{2} + \frac{T}{4} + \cdots + \frac{T}{2^{n-1}}$.
This is called the nth partial sum of the series. Now we study the behavior of $s_n$ as n takes
larger and larger values. In particular, we try to determine whether the partial sums $s_n$
approach a finite limit as n increases without bound.
In this example it is easy to see that 2T is the limiting value of the partial sums. In
fact, if we calculate a few of these partial sums, we find that
$s_1 = T$, $s_2 = T + \frac{T}{2} = \frac{3}{2}T$, $s_3 = T + \frac{T}{2} + \frac{T}{4} = \frac{7}{4}T$, $s_4 = T + \frac{T}{2} + \frac{T}{4} + \frac{T}{8} = \frac{15}{8}T$.
Now, observe that these results may be expressed as follows:
$s_1 = (2 - 1)T$, $s_2 = (2 - \frac{1}{2})T$, $s_3 = (2 - \frac{1}{4})T$, $s_4 = (2 - \frac{1}{8})T$.
This leads us to conjecture the following general formula:
(10.4) $s_n = \left(2 - \frac{1}{2^{n-1}}\right)T$ for all positive integers n.
Formula (10.4) is easily verified by induction. Since $1/2^{n-1} \to 0$ as n increases indefinitely,
this shows that $s_n \to 2T$. Therefore, Equation (10.2) is “true” if we interpret it to mean that
2T is the limit of the partial sums $s_n$. This limit process seems to invalidate the assertion
that the sum of an infinite number of time intervals can never be finite.
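The closed form (10.4) is easy to confirm for the first few values of n by exact rational arithmetic. The following Python sketch assumes T = 1 for convenience; the function names are illustrative only.

```python
from fractions import Fraction

def partial_sum(n, T=Fraction(1)):
    """s_n = T + T/2 + T/4 + ... + T/2^(n-1), computed term by term."""
    return sum(T / 2**k for k in range(n))

def closed_form(n, T=Fraction(1)):
    """The conjectured formula (10.4): s_n = (2 - 1/2^(n-1)) T."""
    return (2 - Fraction(1, 2**(n - 1))) * T

for n in range(1, 6):
    print(n, partial_sum(n), closed_form(n))

# The gap 2T - s_n equals T / 2^(n-1), which shrinks to 0 as n grows.
gap_at_30 = 2 - partial_sum(30)
```

Such a check is of course no substitute for the induction argument; it only shows the formula agreeing with direct addition for the values tried.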
Now we shall give an argument which lends considerable support to Zeno’s point of
view. Suppose we make a small but important change in the foregoing analysis of the
racecourse paradox. Instead of assuming that the speed of the runner is constant, let us
suppose that his speed gradually decreases in such a way that he requires T minutes to
go from 1 to 1/2, T/2 minutes to go from 1/2 to 1/4, T/3 minutes to go from 1/4 to 1/8,
and, in general, T/n minutes to go from $1/2^{n-1}$ to $1/2^n$. The “total time” for the course
may now be represented by the following infinite series:
(10.5) $T + \frac{T}{2} + \frac{T}{3} + \cdots + \frac{T}{n} + \cdots$.
In this case, our physical experience does not suggest any natural or obvious “sum” to
assign to this series, and hence we must rely entirely on mathematical analysis to deal with
this example.
Let us proceed as before and introduce the partial sums $s_n$. That is, let
(10.6) $s_n = T + \frac{T}{2} + \frac{T}{3} + \cdots + \frac{T}{n}$.
Our object is to decide what happens to $s_n$ for larger and larger values of n. These partial
sums are not as easy to study as those in (10.3) because there is no simple formula analogous
to (10.4) for simplifying the expression on the right of (10.6). Nevertheless, it is easy to
obtain an estimate for the size of $s_n$ if we compare the partial sum with an appropriate
integral.
Figure 10.2 shows the graph of the function f(x) = 1/x for x > 0. (The scale is distorted
along the y-axis.) The rectangles shown there have a total area equal to the sum
$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$.
The area of the shaded region is $\int_1^{n+1} x^{-1}\,dx = \log(n + 1)$. Since this area cannot exceed
the sum of the areas of the rectangles, we have the inequality
(10.8) $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \geq \log(n + 1)$.
Multiplying both sides by T, we obtain $s_n \geq T\log(n + 1)$. In other words, if the runner’s
speed decreases in the manner described above, the time required to reach the point $1/2^n$
is at least T log (n + 1) minutes. Since log (n + 1) increases without bound as n increases,
we must agree with Zeno and conclude that the runner cannot reach his goal in any finite
time.
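The inequality (10.8) can be observed numerically. The sketch below compares the sum 1 + 1/2 + ... + 1/n with log (n + 1) for a few sample values of n (the sample values are arbitrary); floating-point addition is used, so the sums are approximations.

```python
import math

def harmonic(n):
    """Return 1 + 1/2 + ... + 1/n as a floating-point number."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (1, 10, 100, 1000, 10000):
    # Each row shows n, the partial sum, and the lower bound log(n + 1).
    print(n, harmonic(n), math.log(n + 1))
```

The printed sums keep growing, though slowly, which is consistent with the conclusion that the partial sums have no finite limit.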
FIGURE 10.2 Geometric meaning of the inequality $1 + 1/2 + \cdots + 1/n \geq \log(n + 1)$.
The general theory of infinite series makes a distinction between series like (10.1) whose
partial sums tend to a finite limit, and those like (10.5) whose partial sums have no finite
limit. The former are called convergent, the latter divergent. Early investigators in the
field paid little or no attention to questions of convergence or divergence. They treated
infinite series as though they were ordinary finite sums, subject to the usual laws of algebra,
not realizing that these laws cannot be universally extended to infinite series. Therefore,
it is not surprising that some of the results they obtained were later shown to be incorrect.
Fortunately, many of the early pioneers possessed unusual intuition and skill which
prevented them from arriving at too many false conclusions, even though they could not
justify all their methods. Foremost among these men was Leonard Euler who discovered
one beautiful formula after another and at the same time used infinite series as a unifying
idea to bring together many branches of mathematics, hitherto unrelated. The great
quantity of Euler’s work that has survived the test of history is a tribute to his remarkable
instinct for what is mathematically correct.
The widespread use of infinite series began late in the 17th Century, nearly fifty years
before Euler was born, and coincided with the early development of the integral calculus.
Nicholas Mercator (1620-1687) and William Brouncker (1620-1684) discovered an infinite
series for the logarithm in 1668 while attempting to calculate the area of a hyperbolic
segment. Shortly thereafter, Newton discovered the binomial series. This discovery proved
to be a landmark in the history of mathematics. A special case of the binomial series is
the now-familiar binomial theorem which states that
$(1 + x)^n = \sum_{k=0}^{n}\binom{n}{k}x^k$,
where x is an arbitrary real number, n is a nonnegative integer, and $\binom{n}{k}$ is the binomial
coefficient. Newton found that this formula could be extended from integer values of
the exponent n to arbitrary real values of n by replacing the finite sum on the right by a
suitable infinite series, although he gave no proof of this fact. Actually, a careful treatment
of the binomial series raises some rather delicate questions of convergence that could not
have been answered in Newton’s time.
Shortly after Euler’s death in 1783, the flood of new discoveries began to recede and the
formal period in the history of series came to a close. A new and more critical period
began in 1812 when Gauss published a celebrated memoir which contained, for the first
time in history, a thorough and rigorous treatment of the convergence of a particular
infinite series. A few years later Cauchy introduced an analytic definition of the limit
concept in his treatise Cours d’analyse algébrique (published in 1821) and laid the foundations of the modern theory of convergence and divergence. The rudiments of that theory
are discussed in the sections that follow.
10.2 Sequences
In everyday usage of the English language, the words “sequence” and “series” are
synonyms, and they are used to suggest a succession of things or events arranged in some
order. In mathematics these words have special technical meanings. The word “sequence”
is employed as in the common use of the term to convey the idea of a set of things arranged
in order, but the word “series” is used in a somewhat different sense. The concept of a
sequence will be discussed in this section, and series will be defined in Section 10.5.
If for every positive integer n there is associated a real or complex number $a_n$, then the
ordered set
$a_1, a_2, a_3, \ldots, a_n, \ldots$
is said to define an infinite sequence. The important thing here is that each member of
the set has been labeled with an integer so that we may speak of the first term $a_1$, the second
term $a_2$, and, in general, the nth term $a_n$. Each term $a_n$ has a successor $a_{n+1}$ and hence
there is no “last” term.
The most common examples of sequences can be constructed if we give some rule or
formula for describing the nth term. Thus, for example, the formula $a_n = 1/n$ defines a
sequence whose first five terms are
1, 1/2, 1/3, 1/4, 1/5.
Sometimes two or more formulas may be employed as, for example,
$a_{2n-1} = 1$, $a_{2n} = 2n^2$,
the first few terms in this case being
1, 2, 1, 8, 1, 18, 1, 32, 1.
Another common way to define a sequence is by a set of instructions which explains how
to carry on after a given start. Thus we may have
$a_1 = a_2 = 1$, $a_{n+1} = a_n + a_{n-1}$ for $n \geq 2$.
This particular rule is known as a recursion formula, and it defines a famous sequence
whose terms are called the Fibonacci† numbers. The first few terms are
1, 1, 2, 3, 5, 8, 13, 21, 34.
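The three ways of specifying a sequence described above (a single formula, two formulas, and a recursion) translate directly into short programs. The following Python sketch is illustrative; the function names are chosen here and do not come from the text.

```python
from fractions import Fraction

def reciprocal(n):
    """a_n = 1/n, given by a single formula."""
    return Fraction(1, n)

def two_formulas(m):
    """a_{2n-1} = 1 and a_{2n} = 2n^2, written in terms of the index m."""
    return 1 if m % 2 == 1 else 2 * (m // 2) ** 2

def fibonacci(count):
    """First `count` terms of a_1 = a_2 = 1, a_{n+1} = a_n + a_{n-1}."""
    terms = [1, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

first_reciprocals = [reciprocal(n) for n in range(1, 6)]
first_mixed = [two_formulas(m) for m in range(1, 10)]
first_fibonacci = fibonacci(9)
print(first_reciprocals, first_mixed, first_fibonacci)
```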
In any sequence the essential thing is that there be some function f defined on the positive
integers such that f(n) is the nth term of the sequence for each n = 1, 2, 3, . . . . In fact,
this is probably the most convenient way to state a technical definition of sequence.
DEFINITION. A function f whose domain is the set of all positive integers 1, 2, 3, ... is
called an infinite sequence. The function value f(n) is called the nth term of the sequence.
The range of the function (that is, the set of function values) is usually displayed by writing
the terms in order, thus:
f(1), f(2), f(3), ..., f(n), ....
For brevity, the notation {f(n)} is used to denote the sequence whose nth term is f(n).
Very often the dependence on n is denoted by using subscripts, and we write $a_n$, $s_n$, $x_n$, $u_n$,
or something similar instead of f(n). Unless otherwise specified, all sequences in this
chapter are assumed to have real or complex terms.
The main question we are concerned with here is to decide whether or not the terms
f(n) tend to a finite limit as n increases indefinitely. To treat this problem, we must extend
the limit concept to sequences. This is done as follows.
DEFINITION. A sequence {f(n)} is said to have a limit L if, for every positive number $\epsilon$,
there is another positive number N (which may depend on $\epsilon$) such that
$|f(n) - L| < \epsilon$ for all $n \geq N$.
In this case, we say the sequence {f(n)} converges to L and we write
$\lim_{n\to\infty} f(n) = L$, or $f(n) \to L$ as $n \to \infty$.
A sequence which does not converge is called divergent.
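To see how N depends on $\epsilon$ in a concrete case, consider the sequence f(n) = 1/n with limit L = 0. The Python sketch below (an illustration with assumed names, using exact fractions) computes the smallest admissible N and spot-checks the defining inequality on a finite range of n.

```python
from fractions import Fraction
import math

def f(n):
    """The sequence f(n) = 1/n, which has limit 0."""
    return Fraction(1, n)

def find_N(eps):
    """Smallest positive integer N with 1/N < eps; then |f(n) - 0| < eps for all n >= N."""
    return math.floor(1 / eps) + 1

eps = Fraction(1, 100)
N = find_N(eps)
# Only finitely many n can be tested; the general claim follows because 1/n decreases.
holds = all(abs(f(n) - 0) < eps for n in range(N, N + 500))
print(N, holds)
```

A smaller $\epsilon$ forces a larger N, which is the dependence mentioned in the definition.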
In this definition the function values f(n) and the limit L may be real or complex numbers.
If f and L are complex, we may decompose them into their real and imaginary parts, say
f = u + iv and L = a + ib. Then we have f(n) - L = u(n) - a + i[v(n) - b]. The
inequalities
$|u(n) - a| \leq |f(n) - L|$ and $|v(n) - b| \leq |f(n) - L|$
show that the relation $f(n) \to L$ implies $u(n) \to a$ and $v(n) \to b$ as $n \to \infty$. Conversely, the
inequality
$|f(n) - L| \leq |u(n) - a| + |v(n) - b|$
shows that the two relations $u(n) \to a$ and $v(n) \to b$ imply $f(n) \to L$ as $n \to \infty$. In other
words, a complex-valued sequence f converges if and only if both the real part u and the
imaginary part v converge separately, in which case we have
$\lim_{n\to\infty} f(n) = \lim_{n\to\infty} u(n) + i\lim_{n\to\infty} v(n)$.
† Fibonacci, also known as Leonardo of Pisa (circa 1175-1250), encountered this sequence in a problem
concerning the offspring of rabbits.
It is clear that any function defined for all positive real x may be used to construct a
sequence by restricting x to take only integer values. This explains the strong analogy
between the definition just given and the one in Section 7.14 for more general functions.
The analogy carries over to infinite limits as well, and we leave it for the reader to define
the symbols
$\lim_{n\to\infty} f(n) = +\infty$ and $\lim_{n\to\infty} f(n) = -\infty$
as was done in Section 7.15 when f is real-valued. If f is complex, we write $f(n) \to \infty$ as
$n \to \infty$ if $|f(n)| \to +\infty$.
The phrase “convergent sequence” is used only for a sequence whose limit is finite. A
sequence with an infinite limit is said to diverge. There are, of course, divergent sequences
that do not have infinite limits. Examples are defined by the following formulas:
$f(n) = (-1)^n$, $f(n) = \sin\frac{n\pi}{2}$, $f(n) = (-1)^n\left(1 + \frac{1}{n}\right)$, $f(n) = e^{\pi in/2}$.
The basic rules for dealing with limits of sums, products, etc., also hold for limits of
convergent sequences. The reader should have no difficulty in formulating these theorems
for himself. Their proofs are somewhat similar to those given in Section 3.5.
The convergence or divergence of many sequences may be determined by using properties
of familiar functions that are defined for all positive x. We mention a few important
examples of real-valued sequences whose limits may be found directly or by using some of
the results derived in Chapter 7.
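(10.9) $\lim_{n\to\infty}\frac{1}{n^{\alpha}} = 0$ if $\alpha > 0$.
(10.10) $\lim_{n\to\infty} x^n = 0$ if $|x| < 1$.
(10.11) $\lim_{n\to\infty}\frac{(\log n)^a}{n^b} = 0$ for all a > 0, b > 0.
(10.12) $\lim_{n\to\infty} n^{1/n} = 1$.
(10.13) $\lim_{n\to\infty}\left(1 + \frac{a}{n}\right)^n = e^a$ for all real a.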
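A rough numerical impression of the limits (10.9) through (10.13) can be obtained by evaluating each expression at one large index. In the sketch below every parameter value (the exponents, the base x = 0.9, and the indices) is an assumption made for illustration, and floating-point evaluation gives approximations only.

```python
import math

n = 10**6
v9 = 1 / n**0.5                    # (10.9) with alpha = 1/2
v10 = 0.9**1000                    # (10.10) with x = 0.9 at index 1000
big = 10.0**12
v11 = math.log(big)**2 / big**0.5  # (10.11) with a = 2, b = 1/2 at index 10^12
v12 = n**(1 / n)                   # (10.12)
v13 = (1 + 2 / n)**n               # (10.13) with a = 2
print(v9, v10, v11, v12, v13, math.exp(2))
```

The value for (10.11) is small only at a very large index, which reflects how slowly the logarithm is dominated by a power of n.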
10.3 Monotonic sequences of real numbers
A sequence {f(n)} is said to be increasing if
$f(n) \leq f(n + 1)$ for all $n \geq 1$.
We indicate this briefly by writing $f(n)\nearrow$. If, on the other hand, we have
$f(n) \geq f(n + 1)$ for all $n \geq 1$,
we call the sequence decreasing and write $f(n)\searrow$. A sequence is called monotonic if it is
increasing or if it is decreasing.
Monotonic sequences are pleasant to work with because their convergence or divergence
is particularly easy to determine. In fact, we have the following simple criterion.
THEOREM 10.1. A monotonic sequence converges if and only if it is bounded.
Note: A sequence {f(n)} is called bounded if there exists a positive number M such that
$|f(n)| \leq M$ for all n. A sequence that is not bounded is called unbounded.
Proof. It is clear that an unbounded sequence cannot converge. Therefore, all we need
to prove is that a bounded monotonic sequence must converge.
Assume $f(n)\nearrow$ and let L denote the least upper bound of the set of function values.
(Since the sequence is bounded, it has a least upper bound by Axiom 10 of the real-number
system.) Then $f(n) \leq L$ for all n, and we shall prove that the sequence converges to L.
Choose any positive number $\epsilon$. Since $L - \epsilon$ cannot be an upper bound for all numbers
f(n), we must have $L - \epsilon < f(N)$ for some N. (This N may depend on $\epsilon$.) If $n \geq N$,
we have $f(N) \leq f(n)$ since $f(n)\nearrow$. Hence, we have $L - \epsilon < f(n) \leq L$ for all $n \geq N$, as
illustrated in Figure 10.3. From these inequalities we find that
$0 \leq L - f(n) < \epsilon$ for all $n \geq N$
and this means that the sequence converges to L, as asserted.
If $f(n)\searrow$, the proof is similar, the limit in this case being the greatest lower bound of the
set of function values.
FIGURE 10.3 A bounded increasing sequence converges to its least upper bound.
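The argument in the proof can be traced on a concrete bounded increasing sequence. The sketch below assumes the example f(n) = n/(n + 1), whose least upper bound is 1; it searches for the first index N with $L - \epsilon < f(N)$ and then checks the resulting inequality on a finite range of later indices.

```python
from fractions import Fraction

def f(n):
    """An increasing sequence bounded above: f(n) = n / (n + 1)."""
    return Fraction(n, n + 1)

L = Fraction(1)   # least upper bound of the values n / (n + 1)

def first_index_within(eps):
    """First N with L - eps < f(N); it exists because L - eps is not an upper bound."""
    N = 1
    while not (L - eps < f(N)):
        N += 1
    return N

eps = Fraction(1, 100)
N = first_index_within(eps)
# Monotonicity gives f(N) <= f(n) <= L for n >= N, hence 0 <= L - f(n) < eps.
stays_close = all(0 <= L - f(n) < eps for n in range(N, N + 300))
print(N, stays_close)
```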
10.4 Exercises
In Exercises 1 through 22, a sequence {f(n)} is defined by the formula given. In each case, (a)
determine whether the sequence converges or diverges, and (b) find the limit of each convergent
sequence. In some cases it may be helpful to replace the integer n by an arbitrary positive real x
and to study the resulting function of x by the methods of Chapter 7. You may use formulas (10.9)
through (10.13) listed at the end of Section 10.2.
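1. $f(n) = \frac{n}{n+1} - \frac{n+1}{n}$.
2. $f(n) = \frac{n^2}{n+1} - \frac{n^2+1}{n}$.
3. $f(n) = \cos\frac{n\pi}{2}$.
4. $f(n) = \frac{n^2 + 3n - 2}{5n^2}$.
6. $f(n) = 1 + (-1)^n$.
7. $f(n) = \frac{1 + (-1)^n}{n}$.
8. $f(n) = \frac{(-1)^n}{n} + \frac{1 + (-1)^n}{2}$.
9. $f(n) = 2^{1/n}$.
10. $f(n) = n^{(-1)^n}$.
11. $f(n) = \frac{n^{2/3}\sin(n!)}{n+1}$.
12. $f(n) = \frac{3^n + (-2)^n}{3^{n+1} + (-2)^{n+1}}$.
13. $f(n) = \sqrt{n+1} - \sqrt{n}$.
14. $f(n) = na^n$, where |a| < 1.
15. $f(n) = \frac{\log_a n}{n}$, a > 1.
16. $f(n) = \frac{100{,}000n}{1 + n^2}$.
18. $f(n) = 1 + \frac{n}{n+1}\cos\frac{n\pi}{2}$.
20. $f(n) = e^{-\pi in/2}$.
21. $f(n) = \frac{1}{n}e^{-\pi in/2}$.
22. $f(n) = ne^{-\pi in/2}$.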
Each of the sequences $\{a_n\}$ in Exercises 23 through 28 is convergent. Therefore, for every preassigned $\epsilon > 0$, there exists an integer N (depending on $\epsilon$) such that $|a_n - L| < \epsilon$ if $n \geq N$, where
$L = \lim_{n\to\infty} a_n$. In each case, determine a value of N that is suitable for each of the following
values of $\epsilon$: $\epsilon$ = 1, 0.1, 0.01, 0.001, 0.0001.
29. Prove that a sequence cannot converge to two different limits.
30. Assume $\lim_{n\to\infty} a_n = 0$. Use the definition of limit to prove that $\lim_{n\to\infty} a_n^2 = 0$.
31. If $\lim_{n\to\infty} a_n = A$ and $\lim_{n\to\infty} b_n = B$, use the definition of limit to prove that we have
$\lim_{n\to\infty}(a_n + b_n) = A + B$, and $\lim_{n\to\infty}(ca_n) = cA$, where c is a constant.
32. From the results of Exercises 30 and 31, prove that if $\lim_{n\to\infty} a_n = A$ then $\lim_{n\to\infty} a_n^2 = A^2$.
Then use the identity $2a_nb_n = (a_n + b_n)^2 - a_n^2 - b_n^2$ to prove that $\lim_{n\to\infty}(a_nb_n) = AB$ if
$\lim_{n\to\infty} a_n = A$ and $\lim_{n\to\infty} b_n = B$.
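33. If $\alpha$ is a real number and n a nonnegative integer, the binomial coefficient $\binom{\alpha}{n}$ is defined by
the equation
$\binom{\alpha}{n} = \frac{\alpha(\alpha - 1)(\alpha - 2)\cdots(\alpha - n + 1)}{n!}$.
(a) When $\alpha = -\frac{1}{2}$, show that
$\binom{\alpha}{1} = -\frac{1}{2}$, $\binom{\alpha}{2} = \frac{3}{8}$, $\binom{\alpha}{3} = -\frac{5}{16}$, $\binom{\alpha}{4} = \frac{35}{128}$, $\binom{\alpha}{5} = -\frac{63}{256}$.
(b) Let $a_n = (-1)^n\binom{-1/2}{n}$. Prove that $a_n > 0$ and that $a_{n+1} < a_n$.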
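34. Let f be a real-valued function that is monotonic increasing and bounded on the interval
[0, 1]. Define two sequences $\{s_n\}$ and $\{t_n\}$ as follows:
$s_n = \frac{1}{n}\sum_{k=0}^{n-1} f\left(\frac{k}{n}\right)$, $t_n = \frac{1}{n}\sum_{k=1}^{n} f\left(\frac{k}{n}\right)$.
(a) Prove that $s_n \leq \int_0^1 f(x)\,dx \leq t_n$ and that $0 \leq \int_0^1 f(x)\,dx - s_n \leq \frac{f(1) - f(0)}{n}$.
(b) Prove that both sequences $\{s_n\}$ and $\{t_n\}$ converge to the limit $\int_0^1 f(x)\,dx$.
(c) State and prove a corresponding result for the interval [a, b].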
35. Use Exercise 34 to establish the following limit relations:
12
1
(a) lim n+ocs n
(d) lim
k=l & = 1% (1 + ti).
12-00 c
(b) lim cñfk = log2.
12-m kzl
10.5 Infinite series
From a given sequence of real or complex numbers, we can always generate a new
sequence by adding together successive terms. Thus, if the given sequence has the terms
$a_1, a_2, \ldots, a_n, \ldots$,
we may form, in succession, the “partial sums”
$s_1 = a_1$, $s_2 = a_1 + a_2$, $s_3 = a_1 + a_2 + a_3$,
and so on, the partial sum $s_n$ of the first n terms being defined as follows:
(10.14) $s_n = a_1 + a_2 + \cdots + a_n = \sum_{k=1}^{n} a_k$.
The sequence $\{s_n\}$ of partial sums is called an infinite series, or simply a series, and is also
denoted by the following symbols:
(10.15) $a_1 + a_2 + a_3 + \cdots$, $a_1 + a_2 + \cdots + a_n + \cdots$, $\sum_{k=1}^{\infty} a_k$.
For example, the series $\sum_{k=1}^{\infty} 1/k$ represents the sequence $\{s_n\}$ for which
$s_n = \sum_{k=1}^{n}\frac{1}{k}$.
The symbols in (10.15) are intended to remind us that the sequence of partial sums $\{s_n\}$
is obtained from the sequence $\{a_n\}$ by addition of successive terms.
If there is a real or complex number S such that
$\lim_{n\to\infty} s_n = S$,
we say that the series $\sum_{k=1}^{\infty} a_k$ is convergent and has the sum S, in which case we write
$\sum_{k=1}^{\infty} a_k = S$.
If $\{s_n\}$ diverges, we say that the series $\sum_{k=1}^{\infty} a_k$ diverges and has no sum.
EXAMPLE 1. THE HARMONIC SERIES. In the discussion of Zeno’s paradox, we showed that
the partial sums $s_n$ of the series $\sum_{k=1}^{\infty} 1/k$ satisfy the inequality
$s_n = \sum_{k=1}^{n}\frac{1}{k} \geq \log(n + 1)$.
Since $\log(n + 1) \to \infty$ as $n \to \infty$, the same is true of $s_n$, and hence the series $\sum_{k=1}^{\infty} 1/k$
diverges. This series is called the harmonic series.
EXAMPLE 2. In the discussion of Zeno’s paradox, we also encountered the partial sums
of the series $1 + \frac{1}{2} + \frac{1}{4} + \cdots$, given by the formula
$\sum_{k=1}^{n}\frac{1}{2^{k-1}} = 2 - \frac{1}{2^{n-1}}$,
which is easily proved by induction. As $n \to \infty$, these partial sums approach the limit 2,
and hence the series converges and has sum 2. We may indicate this by writing
(10.16) $1 + \frac{1}{2} + \frac{1}{4} + \cdots = 2$.
The reader should realize that the word “sum” is used here in a very special sense. The
sum of a convergent series is not obtained by ordinary addition but rather as the limit
of the sequence of partial sums. Also, the reader should note that for a convergent series,
the symbol $\sum_{k=1}^{\infty} a_k$ is used to denote both the series and its sum, even though the two are
conceptually distinct. The sum represents a number and it is not capable of being convergent or divergent. Once the distinction between a series and its sum has been realized,
the use of one symbol to represent both should cause no confusion.
As in the case of finite summation notation, the letter k used in the symbol $\sum_{k=1}^{\infty} a_k$ is a
“dummy index” and may be replaced by any other convenient symbol. The letters n, m,
and r are commonly used for this purpose. Sometimes it is desirable to start the summation
from k = 0 or from k = 2 or from some other value of k. Thus, for example, the series
in (10.16) could be written as $\sum_{k=0}^{\infty} 1/2^k$. In general, if $p \geq 0$, we define the symbol $\sum_{k=p}^{\infty} a_k$
to mean the same as $\sum_{k=1}^{\infty} b_k$, where $b_k = a_{p+k-1}$. Thus $b_1 = a_p$, $b_2 = a_{p+1}$, etc. When there
is no danger of confusion or when the starting point is unimportant, we write $\sum a_k$ instead
of $\sum_{k=p}^{\infty} a_k$.
It is easy to prove that the two series $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=p}^{\infty} a_k$ both converge or both diverge.
Suppose we let $s_n = a_1 + \cdots + a_n$ and $t_n = a_p + a_{p+1} + \cdots + a_{p+n-1}$. If p = 0, we
have $t_{n+1} = a_0 + s_n$, so if $s_n \to S$ as $n \to \infty$, then $t_n \to a_0 + S$ and, conversely, if $t_n \to T$
as $n \to \infty$, then $s_n \to T - a_0$. Therefore, both series converge or both diverge when p = 0.
The same holds true if $p \geq 1$. For p = 1, we have $s_n = t_n$, and for p > 1, we have
$t_n = s_{n+p-1} - s_{p-1}$, and again it follows that the sequences $\{s_n\}$ and $\{t_n\}$ both converge
or both diverge. This is often described by saying that a finite number of terms may be
omitted or added at the beginning of a series without affecting its convergence or divergence.
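The relation $t_n = s_{n+p-1} - s_{p-1}$ for p > 1 can be checked exactly on a specific series. The sketch below assumes the terms $a_k = 1/2^{k-1}$ of the series in (10.16) and the starting index p = 4; both choices are for illustration only.

```python
from fractions import Fraction

def a(k):
    """Terms of the series in (10.16): a_k = 1 / 2^(k-1) for k >= 1."""
    return Fraction(1, 2**(k - 1))

def s(n):
    """Partial sum a_1 + ... + a_n (with s(0) = 0)."""
    return sum(a(k) for k in range(1, n + 1))

def t(n, p):
    """Partial sum a_p + a_{p+1} + ... + a_{p+n-1} of the series started at index p."""
    return sum(a(k) for k in range(p, p + n))

p = 4
identity_holds = all(t(n, p) == s(n + p - 1) - s(p - 1) for n in range(1, 31))
# Since s_n -> 2, the series started at k = p should have sum 2 - s(p - 1).
expected_tail_sum = 2 - s(p - 1)
print(identity_holds, expected_tail_sum, t(60, p))
```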
10.6 The linearity property of convergent series
Ordinary finite sums have the following important properties:
(10.17) $\sum_{k=1}^{n}(a_k + b_k) = \sum_{k=1}^{n} a_k + \sum_{k=1}^{n} b_k$ (additive property)
and
(10.18) $\sum_{k=1}^{n}(ca_k) = c\sum_{k=1}^{n} a_k$ (homogeneous property).
The next theorem provides a natural extension of these properties to convergent infinite
series and thereby justifies many algebraic manipulations in which convergent series are
treated as though they were finite sums. Both additivity and homogeneity may be combined into one property called linearity which may be described as follows :
THEOREM 10.2. Let $\sum a_n$ and $\sum b_n$ be convergent infinite series of complex terms and
let $\alpha$ and $\beta$ be complex constants. Then the series $\sum(\alpha a_n + \beta b_n)$ also converges, and its sum
is given by the equation
(10.19) $\sum_{n=1}^{\infty}(\alpha a_n + \beta b_n) = \alpha\sum_{n=1}^{\infty} a_n + \beta\sum_{n=1}^{\infty} b_n$.
Proof. Using (10.17) and (10.18), we may write
(10.20) $\sum_{k=1}^{n}(\alpha a_k + \beta b_k) = \alpha\sum_{k=1}^{n} a_k + \beta\sum_{k=1}^{n} b_k$.
When $n \to \infty$, the first term on the right of (10.20) tends to $\alpha\sum_{k=1}^{\infty} a_k$ and the second term
tends to $\beta\sum_{k=1}^{\infty} b_k$. Therefore the left-hand side tends to their sum, and this proves that
the series $\sum(\alpha a_k + \beta b_k)$ converges to the sum indicated by (10.19).
Theorem 10.2 has an interesting corollary which is often used to establish the divergence
of a series.
THEOREM 10.3. If $\sum a_n$ converges and if $\sum b_n$ diverges, then $\sum(a_n + b_n)$ diverges.
Proof. Since $b_n = (a_n + b_n) - a_n$, and since $\sum a_n$ converges, Theorem 10.2 tells us that
convergence of $\sum(a_n + b_n)$ implies convergence of $\sum b_n$. Therefore, $\sum(a_n + b_n)$ cannot
converge if $\sum b_n$ diverges.
EXAMPLE. The series $\sum(1/k + 1/2^k)$ diverges because $\sum 1/k$ diverges and $\sum 1/2^k$ converges.
If $\sum a_n$ and $\sum b_n$ are both divergent, the series $\sum(a_n + b_n)$ may or may not converge. For
example, when $a_n = b_n = 1$ for all n, then $\sum(a_n + b_n)$ diverges. But when $a_n = 1$ and
$b_n = -1$ for all n, then $\sum(a_n + b_n)$ converges.
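The linearity property can be illustrated with two series whose sums are known: $\sum 1/2^{n-1} = 2$ and $\sum 1/(n(n+1)) = 1$. In the sketch below the constants 3 and -2 are arbitrary illustrative choices (real rationals are used so that the arithmetic is exact, although the theorem allows complex constants).

```python
from fractions import Fraction

alpha, beta = Fraction(3), Fraction(-2)   # illustrative constants

def a(n):
    """Terms of a series with sum 2."""
    return Fraction(1, 2**(n - 1))

def b(n):
    """Terms of a series with sum 1."""
    return Fraction(1, n * (n + 1))

def partial(term, n):
    """n-th partial sum of the series with the given term function."""
    return sum(term(k) for k in range(1, n + 1))

n = 200
lhs = partial(lambda k: alpha * a(k) + beta * b(k), n)   # left side of (10.20)
rhs = alpha * partial(a, n) + beta * partial(b, n)       # right side of (10.20)
limit = alpha * 2 + beta * 1                             # value predicted by (10.19)
print(lhs == rhs, float(lhs), limit)
```

The finite identity holds exactly for every n, while the partial sums approach the predicted value 4 only in the limit.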
10.7 Telescoping series
Another important property of finite sums is the telescoping property which states that
(10.21) $\sum_{k=1}^{n}(b_k - b_{k+1}) = b_1 - b_{n+1}$.
When we try to extend this property to infinite series we are led to consider those series
$\sum a_n$ for which each term $a_n$ may be expressed as a difference of the form
(10.22) $a_n = b_n - b_{n+1}$.
These series are known as telescoping series and their behavior is characterized by the
following theorem.
THEOREM 10.4. Let $\{a_n\}$ and $\{b_n\}$ be two sequences of complex numbers such that
(10.23) $a_n = b_n - b_{n+1}$ for $n = 1, 2, 3, \ldots$.
Then the series $\sum a_n$ converges if and only if the sequence $\{b_n\}$ converges, in which case we have
(10.24) $\sum_{n=1}^{\infty} a_n = b_1 - L$, where $L = \lim_{n \to \infty} b_n$.
Proof. Let $s_n$ denote the $n$th partial sum of $\sum a_n$. Then we have
$s_n = \sum_{k=1}^{n} a_k = \sum_{k=1}^{n} (b_k - b_{k+1}) = b_1 - b_{n+1},$
because of (10.21). Therefore, both sequences $\{s_n\}$ and $\{b_n\}$ converge or both diverge. Moreover, if $b_n \to L$ as $n \to \infty$, then $s_n \to b_1 - L$, and this proves (10.24).
Note: Every series is telescoping because we can always satisfy (10.22) if we first choose $b_1$ to be arbitrary and then choose $b_{n+1} = b_1 - s_n$ for $n \ge 1$, where $s_n = a_1 + \cdots + a_n$.
EXAMPLE 1. Let $a_n = 1/(n^2 + n)$. Then we have
$a_n = \frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1},$
and hence (10.23) holds with $b_n = 1/n$. Since $b_1 = 1$ and $L = 0$, we obtain
$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = 1.$
EXAMPLE 2. If $x$ is not a negative integer, we have the decomposition
$\frac{1}{(n+x)(n+x+1)(n+x+2)} = \frac{1}{2}\left(\frac{1}{(n+x)(n+x+1)} - \frac{1}{(n+x+1)(n+x+2)}\right)$
for each integer $n \ge 1$. Therefore, by the telescoping property, the following series converges and has the sum indicated:
$\sum_{n=1}^{\infty} \frac{1}{(n+x)(n+x+1)(n+x+2)} = \frac{1}{2(x+1)(x+2)}.$
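The telescoping identity behind Examples 1 and 2 can be checked with exact rational arithmetic. The following Python sketch is only an illustration: the function names and the choice $x = 1/2$ in Example 2 are our own assumptions, not part of the text.

```python
from fractions import Fraction

def telescoping_partial_sum(b, n):
    """Add the terms a_k = b(k) - b(k+1) for k = 1..n, one by one."""
    return sum(b(k) - b(k + 1) for k in range(1, n + 1))

def b_example1(n):
    # Example 1: a_n = 1/(n(n+1)) = b_n - b_{n+1} with b_n = 1/n.
    return Fraction(1, n)

def b_example2(n, x=Fraction(1, 2)):
    # Example 2 with the arbitrary (assumed) choice x = 1/2:
    # b_n = (1/2) / ((n + x)(n + x + 1)).
    return Fraction(1, 2) / ((n + x) * (n + x + 1))

for n in (1, 10, 100):
    s_n = telescoping_partial_sum(b_example1, n)
    # Property (10.21): the partial sum collapses to b_1 - b_{n+1}.
    assert s_n == b_example1(1) - b_example1(n + 1)
    print(n, s_n, float(s_n))
```

The partial sums of Example 1 equal $1 - 1/(n+1)$ exactly, so they approach 1 from below, in agreement with (10.24).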
EXAMPLE 3. Since $\log [n/(n+1)] = \log n - \log (n+1)$, and since $\log n \to \infty$ as $n \to \infty$, the series $\sum \log [n/(n+1)]$ diverges.
Note: Telescoping series illustrate an important difference between finite sums and infinite series. If we write (10.21) in extended form, it becomes
$(b_1 - b_2) + (b_2 - b_3) + \cdots + (b_n - b_{n+1}) = b_1 - b_{n+1},$
which can be verified by merely removing parentheses and canceling. Suppose now we perform the same operations on the infinite series
$(b_1 - b_2) + (b_2 - b_3) + (b_3 - b_4) + \cdots.$
We leave $b_1$, cancel $b_2$, cancel $b_3$, and so on. For each $n > 1$, at some stage we cancel $b_n$. Thus every $b_n$ cancels with the exception of $b_1$. This leads us to the conclusion that the sum of the series is $b_1$. Because of Theorem 10.4, this conclusion is false unless $\lim_{n \to \infty} b_n = 0$. This shows that parentheses cannot always be removed in an infinite series as they can in a finite sum. (See also Exercise 24 in Section 10.9.)
10.8 The geometric series
The telescoping property of finite sums may be used to study a very important example
known as the geometric series. This series is generated by successive addition of the terms
in a geometric progression and has the form $\sum x^n$, where the $n$th term $x^n$ is the $n$th power of a fixed real or complex number $x$. It is convenient to start this series with $n = 0$, with the understanding that the initial term, $x^0$, is equal to 1.
Let $s_n$ denote the $n$th partial sum of this series, so that
$s_n = 1 + x + x^2 + \cdots + x^{n-1}.$
If $x = 1$, each term on the right is 1 and $s_n = n$. In this case, the series diverges since $s_n \to \infty$ as $n \to \infty$. If $x \ne 1$, we may simplify the sum for $s_n$ by writing
$(1 - x)s_n = (1 - x)\sum_{k=0}^{n-1} x^k = \sum_{k=0}^{n-1} (x^k - x^{k+1}) = 1 - x^n,$
since the last sum telescopes. Dividing by $1 - x$, we obtain the formula
$s_n = \frac{1 - x^n}{1 - x} = \frac{1}{1 - x} - \frac{x^n}{1 - x}$ if $x \ne 1$.
This shows that the behavior of $s_n$ for large $n$ depends entirely on the behavior of $x^n$. When $|x| < 1$, then $x^n \to 0$ as $n \to \infty$, and the series converges to the sum $1/(1 - x)$.
Since $s_{n+1} - s_n = x^n$, convergence of $\{s_n\}$ implies $x^n \to 0$ as $n \to \infty$. Therefore, if $|x| \ge 1$ the sequence $\{s_n\}$ diverges since $x^n$ does not tend to 0 in this case. Thus we have proved the following theorem.
THEOREM 10.5. If $x$ is complex, with $|x| < 1$, the geometric series $\sum_{n=0}^{\infty} x^n$ converges and has sum $1/(1 - x)$. That is to say, we have
(10.25) $1 + x + x^2 + \cdots + x^n + \cdots = \frac{1}{1 - x}$ if $|x| < 1$.
If $|x| \ge 1$, the series diverges.
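As a numerical illustration of Theorem 10.5, the Python sketch below adds the terms of the geometric series one at a time and compares the result with the closed form for $s_n$ and with the limit $1/(1 - x)$. The function names and the sample values of $x$ are assumptions chosen for the illustration.

```python
def geometric_partial_sum(x, n):
    """s_n = 1 + x + x^2 + ... + x^(n-1), added term by term."""
    total, power = 0, 1
    for _ in range(n):
        total += power
        power *= x
    return total

def closed_form(x, n):
    """The formula s_n = (1 - x^n) / (1 - x), valid only for x != 1."""
    return (1 - x**n) / (1 - x)

# Real and complex sample points with |x| < 1 (assumed test values).
for x in (0.5, -0.9, 0.3 + 0.4j):
    s = geometric_partial_sum(x, 200)
    print(x, s, 1 / (1 - x))
```

For $x = 1$ the closed form is undefined, and the term-by-term sum simply returns $n$, which is the divergent case described in the text.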
The geometric series, with $|x| < 1$, is one of those rare examples whose sum we are able to determine by finding first a simple formula for its partial sums. (A special case with $x = \frac{1}{2}$ was encountered in Section 10.1 in connection with Zeno's paradox.) The real importance of this series lies in the fact that it may be used as a starting point for determining the sums of a large number of other interesting series. For example, if we assume $|x| < 1$ and replace $x$ by $x^2$ in (10.25), we obtain the formula
(10.26) $1 + x^2 + x^4 + \cdots + x^{2n} + \cdots = \frac{1}{1 - x^2}$ if $|x| < 1$.
Notice that this series contains those terms of (10.25) with even exponents. To find the sum of the odd powers alone, we need only multiply both sides of (10.26) by $x$ to obtain
(10.27) $x + x^3 + x^5 + \cdots + x^{2n+1} + \cdots = \frac{x}{1 - x^2}$ if $|x| < 1$.
If we replace $x$ by $-x$ in (10.25), we find that
(10.28) $1 - x + x^2 - x^3 + \cdots + (-1)^n x^n + \cdots = \frac{1}{1 + x}$ if $|x| < 1$.
Replacing $x$ by $x^2$ in (10.28), we find that
(10.29) $1 - x^2 + x^4 - x^6 + \cdots + (-1)^n x^{2n} + \cdots = \frac{1}{1 + x^2}$ if $|x| < 1$.
Multiplying both sides of (10.29) by $x$, we obtain
(10.30) $x - x^3 + x^5 - x^7 + \cdots + (-1)^n x^{2n+1} + \cdots = \frac{x}{1 + x^2}$ if $|x| < 1$.
If we replace $x$ by $2x$ in (10.26), we find that
$1 + 4x^2 + 16x^4 + \cdots + 4^n x^{2n} + \cdots = \frac{1}{1 - 4x^2},$
which is valid if $|2x| < 1$ or, what is the same thing, if $|x| < \frac{1}{2}$. It is clear that many other examples may be constructed by similar means.
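Each of the formulas (10.26) through (10.30) can be spot-checked numerically. The Python sketch below sums the first 200 terms of each series at the assumed sample point $x = 0.3$ and prints the result next to the closed form; the helper `series_sum` and the dictionary `checks` are our own names.

```python
def series_sum(term, n_terms=200):
    """Sum term(0) + term(1) + ... + term(n_terms - 1)."""
    return sum(term(n) for n in range(n_terms))

x = 0.3  # assumed sample point; |x| < 1/2 so every formula below applies

# Each entry pairs the general term of a series with its claimed sum.
checks = {
    "(10.26)": (lambda n: x**(2 * n), 1 / (1 - x**2)),
    "(10.27)": (lambda n: x**(2 * n + 1), x / (1 - x**2)),
    "(10.28)": (lambda n: (-1)**n * x**n, 1 / (1 + x)),
    "(10.29)": (lambda n: (-1)**n * x**(2 * n), 1 / (1 + x**2)),
    "(10.30)": (lambda n: (-1)**n * x**(2 * n + 1), x / (1 + x**2)),
    "x -> 2x in (10.26)": (lambda n: 4**n * x**(2 * n), 1 / (1 - 4 * x**2)),
}

for label, (term, expected) in checks.items():
    print(label, series_sum(term), expected)
```

A numerical agreement of this kind is evidence, not proof; the proofs are the substitutions carried out in the text.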
All these series have the special form
$\sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + \cdots$
and are known as power series. The numbers $a_0, a_1, a_2, \ldots$, which may be real or complex, are called coefficients of the power series. The geometric series is an example with all coefficients equal to 1. If $x$ and all the coefficients are real, the series is called a real power series. We shall find later, when we discuss the general theory of real power series, that it is permissible to differentiate and to integrate both sides of each of the Equations (10.25) through (10.30), treating the left-hand members as though they were ordinary finite sums. These operations lead to many remarkable new formulas. For example, differentiation of (10.25) gives us
(10.31) $1 + 2x + 3x^2 + \cdots + n x^{n-1} + \cdots = \frac{1}{(1 - x)^2}$ if $|x| < 1$,
whereas integration of (10.28) yields the interesting formula
(10.32) $x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + \frac{(-1)^n x^{n+1}}{n+1} + \cdots = \log (1 + x),$
which expresses the logarithm as a power series. This is the discovery of Mercator and Brouncker (1668) that we mentioned earlier. Although each of the Equations (10.25) through (10.31) is valid for $x$ in the open interval $-1 < x < +1$, it turns out that the logarithmic series in (10.32) is valid at the endpoint $x = +1$ as well.
Another important example, which may be obtained by integration of (10.29), is the following power-series expansion for the inverse tangent, discovered in 1671 by James Gregory (1638-1675):
(10.33) $x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots + \frac{(-1)^n x^{2n+1}}{2n+1} + \cdots = \arctan x.$
Gregory's series converges for each complex $x$ with $|x| < 1$ and also for $x = \pm 1$. When $x$ is real, the series agrees with the inverse tangent function introduced in Chapter 6. The series can be used to extend the definition of the arctangent function from real values of $x$ to complex $x$ with $|x| < 1$.
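The logarithmic series (10.32) and Gregory's series (10.33) can be compared with the library functions `math.log` and `math.atan`. The Python sketch below is an illustration only; the function names and the sample points are assumptions.

```python
import math

def log_series(x, n_terms):
    """Partial sum of (10.32): x - x^2/2 + x^3/3 - ..."""
    return sum((-1)**n * x**(n + 1) / (n + 1) for n in range(n_terms))

def arctan_series(x, n_terms):
    """Partial sum of (10.33): x - x^3/3 + x^5/5 - ..."""
    return sum((-1)**n * x**(2 * n + 1) / (2 * n + 1) for n in range(n_terms))

# Inside the interval of convergence the agreement is rapid.
print(log_series(0.5, 60), math.log(1.5))
print(arctan_series(0.5, 40), math.atan(0.5))

# At the endpoint x = 1 both series still converge, but very slowly.
print(log_series(1.0, 1000), math.log(2))
print(4 * arctan_series(1.0, 1000), math.pi)
```

At $x = 1$ a thousand terms give only about three correct decimal places, which shows how much slower the convergence is at the endpoint than in the interior of the interval.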
Many of the other elementary functions of calculus, such as the sine, cosine, and exponential, may also be represented by power series. This is not too surprising, in view of Taylor's formula which tells us that any function may be approximated by a Taylor polynomial in $x$ of degree $\le n$ if it has derivatives of order $n + 1$ in some neighborhood of the origin. In the examples given above, the partial sums of the power series are precisely the Taylor polynomials. When a function $f$ has derivatives of every order in a neighborhood of the origin, then for every positive integer $n$ Taylor's formula leads to an equation of the form
(10.34) $f(x) = \sum_{k=0}^{n} a_k x^k + E_n(x),$
where the finite sum $\sum_{k=0}^{n} a_k x^k$ is a Taylor polynomial of degree $\le n$ and $E_n(x)$ is the error for this approximation. If, now, we keep $x$ fixed and let $n$ increase without bound in (10.34), the Taylor polynomials give rise to a power series, namely $\sum_{k=0}^{\infty} a_k x^k$, where each coefficient $a_k$ is determined as follows:
$a_k = \frac{f^{(k)}(0)}{k!}.$
If, for some $x$, the error $E_n(x)$ tends to 0 as $n \to \infty$, then for this $x$ we may let $n \to \infty$ in (10.34) to obtain
$f(x) = \lim_{n \to \infty} \sum_{k=0}^{n} a_k x^k + \lim_{n \to \infty} E_n(x) = \sum_{k=0}^{\infty} a_k x^k.$
In other words, the power series in question converges to $f(x)$. If $x$ is a point for which $E_n(x)$ does not tend to 0 as $n \to \infty$, then the partial sums will not approach $f(x)$. Conditions on $f$ for guaranteeing that $E_n(x) \to 0$ will be discussed later in Section 11.10.
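To see the error term $E_n(x)$ of (10.34) shrink in a concrete case, we may take $f(x) = e^x$, for which every derivative at the origin equals 1 and hence $a_k = 1/k!$. This choice of $f$, the function names, and the sample point $x = 2$ are assumptions made for the sketch below.

```python
import math

def taylor_poly_exp(x, n):
    """Taylor polynomial of e^x at 0: the sum of x^k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

def taylor_error(x, n):
    """E_n(x) = f(x) - (Taylor polynomial of degree <= n), with f = exp."""
    return math.exp(x) - taylor_poly_exp(x, n)

# With x fixed, the error decreases toward 0 as n grows.
for n in (2, 5, 10, 20, 30):
    print(n, taylor_error(2.0, n))
```

For small $n$ the error is sizable (more than 2 when $n = 2$), whereas for $n = 30$ it is at the level of floating-point rounding.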
To lay a better foundation for the general theory of power series, we turn next to certain
general questions related to the convergence and divergence of arbitrary series. We shall
return to the subject of power series in Chapter 11.
10.9 Exercises
Each of the series in Exercises 1 through 10 is a telescoping series, or a geometric series, or some
related series whose partial sums may be simplified. In each case, prove that the series converges
and has the sum indicated.
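1. $\sum_{n=1}^{\infty} \frac{1}{(2n-1)(2n+1)} = \frac{1}{2}.$
5. $\sum_{n=1}^{\infty} \frac{\sqrt{n+1} - \sqrt{n}}{\sqrt{n^2+n}} = 1.$
6. $\sum_{n=1}^{\infty} \frac{n}{(n+1)(n+2)(n+3)} = \frac{1}{4}.$
7. $\sum_{n=1}^{\infty} \frac{2n+1}{n^2(n+1)^2} = 1.$
10. $\sum_{n=2}^{\infty} \frac{\log [(1 + 1/n)^n (1 + n)]}{(\log n^n)[\log (n+1)^{n+1}]} = \log_2 \sqrt{e}.$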
Power series for $\log (1 + x)$ and $\arctan x$ were obtained in Section 10.8 by performing various operations on the geometric series. In a similar manner, without attempting to justify the steps, obtain the formulas in Exercises 11 through 19. They are all valid at least for $|x| < 1$. (The theoretical justification is provided in Section 11.8.)
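11. $\sum_{n=1}^{\infty} n x^n = \frac{x}{(1-x)^2}.$
12. $\sum_{n=1}^{\infty} n^2 x^n = \frac{x^2 + x}{(1-x)^3}.$
13. $\sum_{n=1}^{\infty} n^3 x^n = \frac{x^3 + 4x^2 + x}{(1-x)^4}.$
14. $\sum_{n=1}^{\infty} n^4 x^n = \frac{x^4 + 11x^3 + 11x^2 + x}{(1-x)^5}.$
15. $\sum_{n=1}^{\infty} \frac{x^n}{n} = \log \frac{1}{1-x}.$
16. $\sum_{n=1}^{\infty} \frac{x^{2n-1}}{2n-1} = \frac{1}{2} \log \frac{1+x}{1-x}.$
17. $\sum_{n=0}^{\infty} (n+1) x^n = \frac{1}{(1-x)^2}.$
18. $\sum_{n=0}^{\infty} \frac{(n+1)(n+2)}{2!} x^n = \frac{1}{(1-x)^3}.$
19. $\sum_{n=0}^{\infty} \frac{(n+1)(n+2)(n+3)}{3!} x^n = \frac{1}{(1-x)^4}.$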
20. The results of Exercises 11 through 14 suggest that there exists a general formula of the form
$\sum_{n=1}^{\infty} n^k x^n = \frac{P_k(x)}{(1-x)^{k+1}},$
where $P_k(x)$ is a polynomial of degree $k$, the term of lowest degree being $x$ and that of highest degree being $x^k$. Prove this by induction, without attempting to justify the formal manipulations with the series.
21. The results of Exercises 17 through 19 suggest the more general formula
$\sum_{n=0}^{\infty} \binom{n+k}{k} x^n = \frac{1}{(1-x)^{k+1}}$, where $\binom{n+k}{k} = \frac{(n+1)(n+2) \cdots (n+k)}{k!}.$
Prove this by induction, without attempting to justify the formal manipulations with the series.
22. Given that $\sum_{n=0}^{\infty} x^n/n! = e^x$ for all $x$, find the sums of the following series, assuming it is permissible to operate on infinite series as though they were finite sums.
(c) -$ (n - 1;; + 1).
(b) 7%.
ïz?
n=2
23. (a) Given that $\sum_{n=0}^{\infty} x^n/n! = e^x$ for all $x$, show that
$\sum_{n=1}^{\infty} \frac{n^2 x^n}{n!} = (x^2 + x)e^x,$
assuming it is permissible to operate on these series as though they were finite sums.
(b) The sum of the series $\sum_{n=1}^{\infty} n^3/n!$ is $ke$, where $k$ is a positive integer. Find the value of $k$. Do not attempt to justify formal manipulations.
24. Two series $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are called identical if $a_n = b_n$ for each $n \ge 1$. For example, the series
$0 + 0 + 0 + \cdots$ and $(1 - 1) + (1 - 1) + (1 - 1) + \cdots$
are identical, but the series
$1 + 1 + 1 + \cdots$ and $1 + 0 + 1 + 0 + 1 + 0 + \cdots$
are not identical. Determine whether or not the series are identical in each of the following pairs:
(a) $1 - 1 + 1 - 1 + \cdots$ and $(2 - 1) - (3 - 2) + (4 - 3) - (5 - 4) + \cdots$.
(b) $1 - 1 + 1 - 1 + \cdots$ and $(1 - 1) + (1 - 1) + (1 - 1) + (1 - 1) + \cdots$.
(c) $1 - 1 + 1 - 1 + \cdots$ and $1 + (-1 + 1) + (-1 + 1) + (-1 + 1) + \cdots$.
(d) $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$ and $1 + (1 - \frac{1}{2}) + (\frac{1}{2} - \frac{1}{4}) + (\frac{1}{4} - \frac{1}{8}) + \cdots$.
25. (a) Use (10.26) to prove that
$1 + 0 + x^2 + 0 + x^4 + \cdots = \frac{1}{1 - x^2}$ if $|x| < 1$.
Note that, according to the definition given in Exercise 24, this series is not identical to the one in (10.26) if $x \ne 0$.
(b) Apply Theorem 10.2 to the result in part (a) and to (10.25) to deduce (10.27).
(c) Show that Theorem 10.2 when applied directly to (10.25) and (10.26) does not yield (10.27). Instead, it yields the formula $\sum_{n=1}^{\infty} (x^n - x^{2n}) = x/(1 - x^2)$, valid for $|x| < 1$.
*10.10 Exercises on decimal expansions
Decimal representations of real numbers were introduced in Section I 3.15. It was shown there that every positive real $x$ has a decimal representation of the form
$x = a_0.a_1 a_2 a_3 \ldots,$
where $0 \le a_k \le 9$ for each $k \ge 1$. The number $x$ is related to the digits $a_0, a_1, a_2, \ldots$ by the inequalities
(10.35) $a_0 + \frac{a_1}{10} + \cdots + \frac{a_n}{10^n} \le x < a_0 + \frac{a_1}{10} + \cdots + \frac{a_{n-1}}{10^{n-1}} + \frac{a_n + 1}{10^n}.$
If we let $s_n = \sum_{k=0}^{n} a_k/10^k$, and if we subtract $s_n$ from each member of (10.35), we obtain
$0 \le x - s_n < 10^{-n}.$
This shows that $s_n \to x$ as $n \to \infty$, and hence $x$ is given by the convergent series
(10.36) $x = \sum_{k=0}^{\infty} \frac{a_k}{10^k}.$
Each of the infinite decimal expansions in Exercises 1 through 5 is understood to be repeated indefinitely as suggested. In each case, express the decimal as an infinite series, find the sum of the series, and thereby express $x$ as a quotient of two integers.
1. $x = 0.4444\ldots$.
2. $x = 0.51515151\ldots$.
3. $x = 2.02020202\ldots$.
4. $x = 0.123123123123\ldots$.
5. $x = 0.142857142857142857142857\ldots$.
6. Prove that every repeating decimal represents a rational number.
7. If a number has a decimal expansion which ends in zeros, such as $\frac{1}{8} = 0.1250000\ldots$, then this number can also be written as a decimal which ends in nines if we decrease the last nonzero digit by one unit. For example, $\frac{1}{8} = 0.1249999\ldots$. Use infinite series to prove this statement.
The decimal representation in (10.36) may be generalized by replacing the integer 10 by any other integer $b > 1$. If $x > 0$, let $a_0$ denote the greatest integer in $x$; assuming that $a_0, a_1, \ldots, a_{n-1}$ have been defined, let $a_n$ denote the largest integer such that
$\sum_{k=0}^{n} \frac{a_k}{b^k} \le x.$
The following exercises refer to the sequence of integers $a_0, a_1, a_2, \ldots$ so obtained.
8. Show that $0 \le a_k \le b - 1$ for each $k \ge 1$.
9. Describe a geometric method for obtaining the numbers $a_0, a_1, a_2, \ldots$.
10. Show that the series $\sum_{k=0}^{\infty} a_k/b^k$ converges and has sum $x$. This provides a decimal expansion of $x$ in the scale of $b$. Important special cases, other than $b = 10$, are the binary scale, $b = 2$, and the duodecimal scale, $b = 12$.
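The recursive definition of the integers $a_0, a_1, a_2, \ldots$ translates directly into a short procedure. The Python sketch below uses exact rational arithmetic; the function name `digits_in_scale` and its parameters are our own, and the largest admissible $a_k$ is computed as the greatest integer in $(x - s_{k-1}) b^k$, where $s_{k-1}$ denotes the partial sum already formed.

```python
from fractions import Fraction
from math import floor

def digits_in_scale(x, b, n):
    """Return [a_0, a_1, ..., a_n] for a rational x > 0 in the scale of b.

    a_0 is the greatest integer in x; each later a_k is the largest
    integer for which the partial sum of a_j / b^j (j <= k) stays <= x.
    """
    x = Fraction(x)
    digits = [floor(x)]
    partial = Fraction(digits[0])
    for k in range(1, n + 1):
        # Largest integer a_k with partial + a_k / b^k <= x.
        a_k = floor((x - partial) * b**k)
        digits.append(a_k)
        partial += Fraction(a_k, b**k)
    return digits

print(digits_in_scale(Fraction(1, 8), 10, 4))   # decimal scale
print(digits_in_scale(Fraction(1, 3), 2, 6))    # binary scale
print(digits_in_scale(Fraction(22, 7), 12, 5))  # duodecimal scale
```

Note that this procedure always produces the expansion of $\frac{1}{8}$ that ends in zeros, never the one that ends in nines.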
10.11 Tests for convergence
In theory, the convergence or divergence of a particular series $\sum a_n$ is decided by examining its partial sums $s_n$ to see whether or not they tend to a finite limit as $n \to \infty$. In some special cases, such as the geometric series, the sums defining $s_n$ may be simplified to the point where it becomes a simple matter to determine their behavior for large $n$. However, in the majority of cases there is no nice formula for simplifying $s_n$ and the convergence or divergence may be rather difficult to establish in a straightforward manner. Early investigators in the subject, notably Cauchy and his contemporaries, realized this difficulty and they developed a number of "convergence tests" that by-passed the need for an explicit knowledge of the partial sums. A few of the simplest and most useful of these tests will be discussed in this chapter, but first we want to make some general remarks about the nature of these tests.
Convergence tests may be broadly classified into three categories: (i) sufficient conditions; (ii) necessary conditions; (iii) necessary and sufficient conditions. A test of type (i) may be expressed symbolically as follows:
"If C is satisfied, then $\sum a_n$ converges,"
where C stands for the condition in question. Tests of type (ii) have the form
"If $\sum a_n$ converges, then C is satisfied,"
whereas those of type (iii) may be written thus:
"$\sum a_n$ converges if and only if C is satisfied."
We shall see presently that there are tests of type (ii) that are not of type (i) (and vice versa). Beginners often use such tests incorrectly by failing to realize the difference between a necessary condition and a sufficient condition. Therefore the reader should make an effort to keep this distinction in mind when using a particular test in practice.
The simplest of all convergence tests gives a necessary condition for convergence and may be stated as follows.
THEOREM 10.6. If the series $\sum a_n$ converges, then its $n$th term tends to 0; that is,
(10.37) $\lim_{n \to \infty} a_n = 0.$
Proof. Let $s_n = a_1 + a_2 + \cdots + a_n$. Then $a_n = s_n - s_{n-1}$. As $n \to \infty$, both $s_n$ and $s_{n-1}$ tend to the same limit and hence $a_n \to 0$. This proves the theorem.
This is an example of a test of type (ii) which is not of type (i). Condition (10.37) is not sufficient for convergence. For example, when $a_n = 1/n$, the condition $a_n \to 0$ is satisfied but the series $\sum 1/n$ diverges. The real usefulness of this test is that it gives us a sufficient condition for divergence. That is, if the terms $a_n$ of a series $\sum a_n$ do not tend to zero, then the series must diverge. This statement is logically equivalent to Theorem 10.6.
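The fact that condition (10.37) is necessary but not sufficient can be observed numerically with the harmonic series. In the Python sketch below (the function names are assumptions), the terms $1/n$ shrink toward 0 while the partial sums keep growing, although slowly.

```python
def partial_sum(term, n):
    """s_n = term(1) + term(2) + ... + term(n)."""
    return sum(term(k) for k in range(1, n + 1))

def harmonic_term(n):
    return 1 / n

# The nth term tends to 0, yet the partial sums increase without bound.
for n in (10, 1000, 100000):
    print(n, harmonic_term(n), partial_sum(harmonic_term, n))
```

A finite computation cannot establish divergence, of course; it only illustrates that small terms do not by themselves keep the partial sums bounded.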
10.12 Comparison tests for series of nonnegative terms
In this section we shall be concerned with series having nonnegative terms, that is, series of the form $\sum a_n$, where each $a_n \ge 0$. Since the partial sums of such series are monotonic increasing, we may use Theorem 10.1 to obtain the following necessary and sufficient condition for convergence.
THEOREM 10.7. Assume that $a_n \ge 0$ for each $n \ge 1$. Then the series $\sum a_n$ converges if and only if the sequence of its partial sums is bounded above.
If the partial sums are bounded above by a number $M$, say, then the sum of the series cannot exceed $M$.
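EXAMPLE 1. Theorem 10.7 may be used to establish the convergence of the series $\sum_{n=1}^{\infty} 1/n!$. We estimate the partial sums from above by using the inequality
$\frac{1}{k!} \le \frac{1}{2^{k-1}},$
which is obviously true for all $k \ge 1$ since $k!$ consists of $k - 1$ factors, each $\ge 2$. Therefore we have
$\sum_{k=1}^{n} \frac{1}{k!} \le \sum_{k=1}^{n} \frac{1}{2^{k-1}} = \sum_{k=0}^{n-1} \left(\frac{1}{2}\right)^k < \sum_{k=0}^{\infty} \left(\frac{1}{2}\right)^k = 2,$
the last series being a geometric series. The series $\sum_{n=1}^{\infty} 1/n!$ is therefore convergent and has a sum $\le 2$. We shall see later that the sum of this series is $e - 1$, where $e$ is the Euler number.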
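The estimate in Example 1 can be followed numerically. The Python sketch below (function names assumed) computes the partial sums of $\sum 1/k!$ together with the dominating geometric partial sums, and compares the former with $e - 1$.

```python
import math

def factorial_partial_sum(n):
    """Sum of 1/k! for k = 1..n."""
    return sum(1 / math.factorial(k) for k in range(1, n + 1))

def geometric_bound(n):
    """Sum of 1/2^(k-1) for k = 1..n, which never exceeds 2."""
    return sum(1 / 2**(k - 1) for k in range(1, n + 1))

for n in (1, 2, 5, 10, 25):
    print(n, factorial_partial_sum(n), geometric_bound(n))

print(math.e - 1)
```

The partial sums stay below the geometric bound and settle near 1.718, which is consistent with the value $e - 1$ quoted in the text.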
The convergence of the foregoing example was established by comparing the terms of
the given series with those of a series known to converge. This idea may be pursued further
to yield a number of tests known as comparison tests.
THEOREM 10.8. COMPARISON TEST. Assume $a_n \ge 0$ and $b_n \ge 0$ for all $n \ge 1$. If there exists a positive constant $c$ such that
(10.38) $a_n \le c b_n$
for all $n$, then convergence of $\sum b_n$ implies convergence of $\sum a_n$.
Note: The conclusion may also be formulated as follows: "Divergence of $\sum a_n$ implies divergence of $\sum b_n$." This statement is logically equivalent to Theorem 10.8. When the inequality (10.38) is satisfied, we say that the series $\sum b_n$ dominates the series $\sum a_n$.
Proof. Let $s_n = a_1 + \cdots + a_n$, $t_n = b_1 + \cdots + b_n$. Then (10.38) implies $s_n \le c t_n$. If $\sum b_n$ converges, its partial sums are bounded, say by $M$. Then $s_n \le cM$, and hence $\sum a_n$ is also convergent since its partial sums are bounded by $cM$. This completes the proof.
Omitting a finite number of terms at the beginning of a series does not affect its convergence or divergence. Therefore Theorem 10.8 still holds true if the inequality (10.38) is valid only for all $n \ge N$ for some $N$.
THEOREM 10.9. LIMIT COMPARISON TEST. Assume that $a_n > 0$ and $b_n > 0$ for all $n \ge 1$, and suppose that
(10.39) $\lim_{n \to \infty} \frac{a_n}{b_n} = 1.$
Then $\sum a_n$ converges if and only if $\sum b_n$ converges.
Proof. There exists an $N$ such that $n \ge N$ implies $\frac{1}{2} < a_n/b_n < \frac{3}{2}$. Therefore $b_n < 2a_n$ and $a_n < \frac{3}{2} b_n$ for all $n \ge N$, and the theorem follows by applying Theorem 10.8 twice.
Note that Theorem 10.9 also holds if $\lim_{n \to \infty} a_n/b_n = c$, provided that $c > 0$, because we then have $\lim_{n \to \infty} a_n/(c b_n) = 1$ and we may compare $\sum a_n$ with $\sum (c b_n)$. However, if $\lim_{n \to \infty} a_n/b_n = 0$, we conclude only that convergence of $\sum b_n$ implies convergence of $\sum a_n$.
DEFINITION. Two sequences $\{a_n\}$ and $\{b_n\}$ of complex numbers are said to be asymptotically equal if
$\lim_{n \to \infty} \frac{a_n}{b_n} = 1.$
This relation is often indicated symbolically by writing
(10.40) $a_n \sim b_n$ as $n \to \infty$.
The notation $a_n \sim b_n$ is read "$a_n$ is asymptotically equal to $b_n$," and it is intended to suggest that $a_n$ and $b_n$ behave in essentially the same way for large $n$. Using this terminology, we may state the limit comparison test in the following manner.
THEOREM 10.10. Two series $\sum a_n$ and $\sum b_n$ with terms that are positive and asymptotically equal converge together or they diverge together.
EXAMPLE 2. THE RIEMANN ZETA-FUNCTION. In Example 1 of Section 10.7, we proved that $\sum 1/(n^2 + n)$ is a convergent telescoping series. If we use this as a comparison series, it follows that $\sum 1/n^2$ is convergent, since $1/n^2 \sim 1/(n^2 + n)$ as $n \to \infty$. Also, $\sum 1/n^2$ dominates $\sum 1/n^s$ for $s \ge 2$, and therefore $\sum 1/n^s$ converges for every real $s \ge 2$. We shall prove in the next section that this series also converges for every $s > 1$. Its sum, denoted by $\zeta(s)$ ($\zeta$ is the Greek letter zeta), defines an important function in analysis known as the Riemann zeta-function:
$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$ if $s > 1$.
Euler discovered many beautiful formulas involving $\zeta(s)$. In particular, he found that $\zeta(2) = \pi^2/6$, a result which is not easy to derive at this stage.
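The comparison in Example 2 and Euler's value $\zeta(2) = \pi^2/6$ can both be illustrated numerically. In the Python sketch below the function names are assumptions; `ratio(n)` is the quotient of the terms $1/n^2$ and $1/(n^2 + n)$, which simplifies to $1 + 1/n$.

```python
import math

def zeta_partial(s, n_terms):
    """Partial sum 1/1^s + 1/2^s + ... + 1/n_terms^s."""
    return sum(1 / k**s for k in range(1, n_terms + 1))

def ratio(n):
    """(1/n^2) / (1/(n^2 + n)) = 1 + 1/n, which tends to 1."""
    return (n**2 + n) / n**2

# The terms are asymptotically equal ...
print([ratio(n) for n in (10, 1000, 10**6)])

# ... and the partial sums of 1/n^2 creep up toward pi^2/6.
for n in (10, 1000, 10000):
    print(n, zeta_partial(2, n), math.pi**2 / 6)
```

The gap between the partial sum and $\pi^2/6$ after $n$ terms is roughly $1/n$, so the convergence is slow.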
EXAMPLE 3. Since $\sum 1/n$ diverges, every series having positive terms asymptotically equal to $1/n$ must also diverge. For example, this is true of the two series
$\sum_{n=1}^{\infty} \frac{1}{\sqrt{n(n+10)}}$ and $\sum_{n=1}^{\infty} \sin \frac{1}{n}.$
The relation $\sin (1/n) \sim 1/n$ follows from the fact that $(\sin x)/x \to 1$ as $x \to 0$.
10.13 The integral test
To use comparison tests effectively, we must have at our disposal some examples of series of known behavior. The geometric series and the zeta-function are useful for this purpose. New examples can be obtained very simply by applying the integral test, first proved by Cauchy in 1837.
FIGURE 10.4 Proof of the integral test.
THEOREM 10.11. INTEGRAL TEST. Let $f$ be a positive decreasing function, defined for all real $x \ge 1$. For each $n \ge 1$, let
$s_n = \sum_{k=1}^{n} f(k)$ and $t_n = \int_1^n f(x)\,dx.$
Then both sequences $\{s_n\}$ and $\{t_n\}$ converge or both diverge.
Proof. By comparing $f$ with appropriate step functions as suggested in Figure 10.4, we obtain the inequalities
$\sum_{k=2}^{n} f(k) \le \int_1^n f(x)\,dx \le \sum_{k=1}^{n-1} f(k),$
or $s_n - f(1) \le t_n \le s_{n-1}$. Since both sequences $\{s_n\}$ and $\{t_n\}$ are monotonic increasing, these inequalities show that both are bounded above or both are unbounded. Therefore, both sequences converge or both diverge, as asserted.
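The inequalities $s_n - f(1) \le t_n \le s_{n-1}$ used in the proof can be checked for a specific function. The Python sketch below takes $f(x) = 1/x$ as an assumed example, for which $t_n = \log n$; the function names are our own.

```python
import math

def f(x):
    return 1 / x            # positive and decreasing for x >= 1

def s(n):
    """s_n = f(1) + f(2) + ... + f(n)."""
    return sum(f(k) for k in range(1, n + 1))

def t(n):
    """t_n = integral of 1/x from 1 to n, which equals log n."""
    return math.log(n)

for n in (2, 10, 100, 1000):
    lower, upper = s(n) - f(1), s(n - 1)
    assert lower <= t(n) <= upper
    print(n, lower, t(n), upper)
```

Since $\log n$ is unbounded, the inequalities force the partial sums of the harmonic series to be unbounded as well.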
EXAMPLE 1. The integral test enables us to prove that
$\sum_{n=1}^{\infty} \frac{1}{n^s}$ converges if and only if $s > 1$.
Taking $f(x) = x^{-s}$, we have
$t_n = \int_1^n x^{-s}\,dx = \frac{n^{1-s} - 1}{1 - s}$ if $s \ne 1$, and $t_n = \log n$ if $s = 1$.
When $s > 1$ the term $n^{1-s} \to 0$ as $n \to \infty$ and hence $\{t_n\}$ converges. By the integral test, this implies convergence of the series for $s > 1$.
When $s \le 1$, then $t_n \to \infty$ and the series diverges. The special case $s = 1$ (the harmonic series) was discussed earlier in Section 10.5. Its divergence was known to Leibniz.
EXAMPLE 2. The same method may be used to prove that
$\sum_{n=2}^{\infty} \frac{1}{n (\log n)^s}$ converges if and only if $s > 1$.
(We start the sum with $n = 2$ to avoid $n$ for which $\log n$ may be zero.) The corresponding integral in this case is
$t_n = \int_2^n \frac{dx}{x (\log x)^s} = \frac{(\log n)^{1-s} - (\log 2)^{1-s}}{1 - s}$ if $s \ne 1$, and $t_n = \log (\log n) - \log (\log 2)$ if $s = 1$.
Thus $\{t_n\}$ converges if and only if $s > 1$, and hence, by the integral test, the same holds true for the series in question.
10.14 Exercises
Test the following series for convergence or divergence. In each case, give a reason for your
decision.
on
Isin nxj
n
A
yF-‘C
” zl (4n - 3)(4n - 1) .
?X=l
5. c
c
c
m 2/2n-llog(4n+l).
2.
n(n + 1)
n=1
m n + l
L
3.
-yY’
n=1
4.5;.
n=1
L
c
m
6.
2
c
n=1
m
7.
+(-un* (,
2n
c
n=1
cc
log n
8.
c
n=2
qi=z *
c
9.
m
c
n=l
1
~. k
dn(n + 1) ”
14.
m
10.
1 +&i
c
n=l (n + 1j3 - 1 *6
m n Cos2 (n743)
c
n=1
m
2n
.
1
15’ C
n log n (log log n)S .
n=3
m
11.
1
n=2 (log ds *
16. 2 neën’.
c
12.
n=1
m
O2 I%l
17.
c 10% 9 bnl < 10. c
n=1
m
1
13.
c 1000n + 1 * ’k
18.
?I=l
1ln 6 dx
cs 0 1+x2 *
n=1
m
n+1
CI n
?a=1
e-dl dx.
19. Assume $f$ is a nonnegative increasing function defined for all $x \ge 1$. Use the method suggested by the proof of the integral test to show that
$\sum_{k=1}^{n-1} f(k) \le \int_1^n f(x)\,dx \le \sum_{k=2}^{n} f(k).$
Take $f(x) = \log x$ and deduce the inequalities
(10.41) $e\,n^n e^{-n} < n! < e\,n^{n+1} e^{-n}.$
These give a rough estimate of the order of magnitude of $n!$. From (10.41), we may write
$\frac{e^{1/n}}{e} < \frac{(n!)^{1/n}}{n} < \frac{e^{1/n} n^{1/n}}{e}.$
Letting $n \to \infty$, we find that
$\frac{(n!)^{1/n}}{n} \to \frac{1}{e}$ or $(n!)^{1/n} \sim \frac{n}{e}$ as $n \to \infty$.
10.15 The root test and the ratio test for series of nonnegative terms
Using the geometric series $\sum x^n$ as a comparison series, Cauchy developed two useful tests known as the root test and the ratio test.
If $\sum a_n$ is a series whose terms (from some point on) satisfy an inequality of the form
(10.42) $0 \le a_n \le x^n$, where $0 < x < 1$,
a direct application of the comparison test (Theorem 10.8) tells us that $\sum a_n$ converges. The inequalities in (10.42) are equivalent to
(10.43) $0 \le a_n^{1/n} \le x$;
hence the name root test.
If the sequence $\{a_n^{1/n}\}$ is convergent, the test may be restated in a somewhat more useful form that makes no reference to the number $x$.
THEOREM 10.12. ROOT TEST. Let $\sum a_n$ be a series of nonnegative terms such that
$a_n^{1/n} \to R$ as $n \to \infty$.
(a) If $R < 1$, the series converges.
(b) If $R > 1$, the series diverges.
(c) If $R = 1$, the test is inconclusive.
Proof. Assume $R < 1$ and choose $x$ so that $R < x < 1$. Then (10.43) must be satisfied for all $n \ge N$ for some $N$. Hence, $\sum a_n$ converges by the comparison test. This proves (a).
To prove (b), we observe that $R > 1$ implies $a_n > 1$ for infinitely many values of $n$ and hence $a_n$ cannot tend to 0. Therefore, by Theorem 10.6, $\sum a_n$ diverges. This proves (b).
To prove (c), consider the two examples in which $a_n = 1/n$ and $a_n = 1/n^2$. In both cases $R = 1$ since $n^{1/n} \to 1$ as $n \to \infty$ [see Equation (10.12) of Section 10.2], but $\sum 1/n$ diverges whereas $\sum 1/n^2$ converges.
EXAMPLE 1. The root test makes it easy to determine the convergence of the series $\sum_{n=2}^{\infty} (\log n)^{-n}$ since
$a_n^{1/n} = \frac{1}{\log n} \to 0$ as $n \to \infty$.
EXAMPLE 2. Applying the root test to $\sum [n/(n+1)]^{n^2}$, we find that
$a_n^{1/n} = \left(\frac{n}{n+1}\right)^n = \frac{1}{(1 + 1/n)^n} \to \frac{1}{e}$ as $n \to \infty$,
by Equation (10.13) of Section 10.2. Since $1/e < 1$, the series converges.
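The limits in Examples 1 and 2 can be approximated numerically. Because the terms $a_n$ themselves become too small for floating-point arithmetic, the Python sketch below works with $\log a_n$ and computes $a_n^{1/n} = \exp((\log a_n)/n)$; this device and the function names are our own assumptions.

```python
import math

def nth_root_of_term(log_term, n):
    """Return a_n^(1/n) from log(a_n), avoiding underflow for tiny a_n."""
    return math.exp(log_term(n) / n)

def log_a_example1(n):
    # Example 1: a_n = (log n)^(-n), defined for n >= 2.
    return -n * math.log(math.log(n))

def log_a_example2(n):
    # Example 2: a_n = (n/(n+1))^(n^2).
    return n**2 * math.log(n / (n + 1))

for n in (10, 1000, 10**6):
    print(n, nth_root_of_term(log_a_example1, n),
          nth_root_of_term(log_a_example2, n))

print(1 / math.e)
```

The first column of roots decreases toward 0 (very slowly, like $1/\log n$), while the second approaches $1/e \approx 0.3679$; both limits are less than 1, as the root test requires for convergence.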
A slightly different use of the comparison test yields the ratio test.
THEOREM 10.13. RATIO TEST. Let $\sum a_n$ be a series of positive terms such that
$\frac{a_{n+1}}{a_n} \to L$ as $n \to \infty$.
(a) If $L < 1$, the series converges.
(b) If $L > 1$, the series diverges.
(c) If $L = 1$, the test is inconclusive.
Proof. Assume $L < 1$ and choose $x$ so that $L < x < 1$. Then there must be an $N$ such that $a_{n+1}/a_n < x$ for all $n \ge N$. This implies
$\frac{a_{n+1}}{x^{n+1}} < \frac{a_n}{x^n}$ for all $n \ge N$.
In other words, the sequence $\{a_n/x^n\}$ is decreasing for $n \ge N$. In particular, when $n \ge N$, we must have $a_n/x^n \le a_N/x^N$, or, in other words,
$a_n \le c x^n$, where $c = \frac{a_N}{x^N}$.
Therefore $\sum a_n$ is dominated by the convergent series $\sum x^n$. This proves (a).
To prove (b), we simply observe that $L > 1$ implies $a_{n+1} > a_n$ for all $n \ge N$ for some $N$, and hence $a_n$ cannot approach 0.
Finally, (c) is proved by using the same examples as in Theorem 10.12.
Warning. If the test ratio $a_{n+1}/a_n$ is always less than 1, it does not necessarily follow that the limit $L$ will be less than 1. For example, the harmonic series, which diverges, has test ratio $n/(n+1)$ which is always less than 1 but the limit $L$ equals 1. On the other hand, for divergence it is sufficient that the test ratio be greater than 1 for all sufficiently large $n$ because for such $n$ we have $a_{n+1} > a_n$ and $a_n$ cannot approach 0.
EXAMPLE 3. We may establish the convergence of the series $\sum n!/n^n$ by the ratio test. The ratio of consecutive terms is
$\frac{a_{n+1}}{a_n} = \frac{(n+1)!}{(n+1)^{n+1}} \cdot \frac{n^n}{n!} = \left(\frac{n}{n+1}\right)^n = \frac{1}{(1 + 1/n)^n} \to \frac{1}{e}$ as $n \to \infty$,
by formula (10.13) of Section 10.2. Since $1/e < 1$, the series converges. In particular, this implies that the general term of the series tends to 0; that is,
(10.44) $\frac{n!}{n^n} \to 0$ as $n \to \infty$.
This is often described by saying that $n^n$ "grows faster" than $n!$ for large $n$. Also, with a natural extension of the o-notation, we can write (10.44) as follows: $n! = o(n^n)$ as $n \to \infty$.
Note: The relation (10.44) may also be proved directly by writing
$\frac{n!}{n^n} = \frac{1}{n} \cdot \frac{2}{n} \cdots \frac{k}{n} \cdot \frac{k+1}{n} \cdots \frac{n}{n},$
where $k = n/2$ if $n$ is even, and $k = (n - 1)/2$ if $n$ is odd. If $n \ge 2$, the product of the first $k$ factors on the right does not exceed $(\frac{1}{2})^k$, and each of the remaining factors does not exceed 1. Since $(\frac{1}{2})^k \to 0$ as $n \to \infty$, this proves (10.44). Relation (10.44) also follows from (10.41).
The reader should realize that both the root test and the ratio test are, in reality, special cases of the comparison test. In both tests when we have case (a), convergence is deduced from the fact that the series in question can be dominated by a suitable geometric series $\sum x^n$. The usefulness of these tests in practice is that a knowledge of a particular comparison series $\sum x^n$ is not explicitly required. Further convergence tests may be deduced by using the comparison test in other ways. Two important examples known as Raabe's test and Gauss' test are described in Exercises 16 and 17 of Section 10.16. These are often helpful when the ratio test fails.
10.16 Exercises
Test the following series for convergence or divergence and give a reason for your decision in
each case.
1.
cc
* (n!)2
-.
c (24 !
n=l
8.
m (n!)2
2.&.
9. 2 evn2.
n=1
n=1
3.
Y (nl’n - 1)“.
2
n=1
c
m 2%!
7.
TL=l
10. $$i - eën’).
n=1
4. z3;.
’ 11.
c n!
F*
c
n=1
c n!
6.
pi’
c
?I=l
n=1
5.
12.
c c (1000)n
-.
c
n!
n=1
m nn+lln
c
n=l (n + l/nY *
m n”[fi + (-l)“]”
c
3n
n=1
m
m
1
7
.
~
14.
r” /sin nxl,
r > 0.
c
c
n=2
nV ’
n=1
15. Let $\{a_n\}$ and $\{b_n\}$ be two sequences with $a_n > 0$ and $b_n > 0$ for all $n \ge N$, and let $c_n = b_n - b_{n+1} a_{n+1}/a_n$. Prove that:
(a) If there is a positive constant $r$ such that $c_n \ge r > 0$ for all $n \ge N$, then $\sum a_n$ converges.
13.
(log
[Hint: Show that $\sum_{k=N}^{n} a_k \le a_N b_N / r$.]
(b) If $c_n \le 0$ for $n \ge N$ and if $\sum 1/b_n$ diverges, then $\sum a_n$ diverges.
[Hint: Show that $\sum a_n$ dominates $\sum 1/b_n$.]
16. Let $\sum a_n$ be a series of positive terms. Prove Raabe's test: If there is an $r > 0$ and an $N \ge 1$ such that
$\frac{a_{n+1}}{a_n} \le 1 - \frac{1}{n} - \frac{r}{n}$ for all $n \ge N$,
then $\sum a_n$ converges. The series $\sum a_n$ diverges if
$\frac{a_{n+1}}{a_n} \ge 1 - \frac{1}{n}$ for all $n \ge N$.
[Hint: Use Exercise 15 with $b_{n+1} = n$.]
17. Let $\sum a_n$ be a series of positive terms. Prove Gauss' test: If there is an $N \ge 1$, an $s > 1$, and an $M > 0$ such that
$\frac{a_{n+1}}{a_n} = 1 - \frac{A}{n} + \frac{f(n)}{n^s}$ for $n \ge N$,
where $|f(n)| \le M$ for all $n$, then $\sum a_n$ converges if $A > 1$ and diverges if $A \le 1$.
[Hint: If $A \ne 1$, use Exercise 16. If $A = 1$, use Exercise 15 with $b_{n+1} = n \log n$.]
18. Use Gauss' test (in Exercise 17) to prove that the series
$\sum_{n=1}^{\infty} \left(\frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2 \cdot 4 \cdot 6 \cdots (2n)}\right)^k$
converges if $k > 2$ and diverges if $k \le 2$. For this example the ratio test fails.
10.17 Alternating series
Up to now we have been concerned largely with series of nonnegative terms. We wish
to turn our attention next to series whose terms may be positive or negative. The simplest
examples occur when the terms alternate in sign. These are called alternating series and
they have the form
(10.45) $\sum_{n=1}^{\infty} (-1)^{n-1} a_n = a_1 - a_2 + a_3 - a_4 + \cdots + (-1)^{n-1} a_n + \cdots,$
where each $a_n > 0$.
Examples of alternating series were known to many early investigators. We have already
mentioned the logarithmic series
$\log (1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^{n-1} \frac{x^n}{n} + \cdots.$
As we shall prove later on, this series converges and has the sum $\log (1 + x)$ whenever $-1 < x \le 1$. For positive $x$, it is an alternating series. In particular, when $x = 1$ we obtain the formula
(10.46) $\log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{(-1)^{n-1}}{n} + \cdots,$
which tells us that the alternating harmonic series has the sum $\log 2$. This result is of special interest in view of the fact that the harmonic series $\sum 1/n$ diverges.
Closely related to (10.46) is the interesting formula
(10.47) $$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots + \frac{(-1)^{n-1}}{2n - 1} + \cdots,$$
discovered by James Gregory in 1671. Leibniz rediscovered this result in 1673 while computing the area of a unit circular disk.
Both series in (10.46) and in (10.47) are alternating series of the form (10.45) in which the sequence $\{a_n\}$ decreases monotonically to zero. Leibniz noticed, in 1705, that this simple property of the $a_n$ implies the convergence of any alternating series.
THEOREM 10.14. LEIBNIZ'S RULE. If $\{a_n\}$ is a monotonic decreasing sequence with limit 0, then the alternating series $\sum_{n=1}^{\infty} (-1)^{n-1} a_n$ converges. If $S$ denotes its sum and $s_n$ its $n$th partial sum, we also have the inequalities
(10.48) $$0 < (-1)^n (S - s_n) < a_{n+1} \quad \text{for each } n \ge 1.$$
The inequalities in (10.48) provide a useful way to estimate the error in approximating the sum $S$ by any partial sum $s_n$. The first inequality tells us that the error, $S - s_n$, has the sign $(-1)^n$, which is the same as the sign of the first neglected term, $(-1)^n a_{n+1}$. The second inequality states that the absolute value of this error is less than that of the first neglected term.
[FIGURE 10.5 Proof of Leibniz's rule for alternating series. Labels on the number line: $s_n$, $n$ even; $S$; $s_n$, $n$ odd.]
Proof. The idea of the proof of Leibniz's rule is quite simple and is illustrated in Figure 10.5. The partial sums $s_{2n}$ (consisting of an even number of terms) form an increasing sequence because $s_{2n+2} - s_{2n} = a_{2n+1} - a_{2n+2} > 0$. Similarly, the partial sums $s_{2n-1}$ form a decreasing sequence. Both sequences are bounded below by $s_2$ and above by $s_1$. Therefore, each sequence $\{s_{2n}\}$ and $\{s_{2n-1}\}$, being monotonic and bounded, converges to a limit, say $s_{2n} \to S'$ and $s_{2n-1} \to S''$. But $S' = S''$ because
$$S' - S'' = \lim_{n\to\infty} s_{2n} - \lim_{n\to\infty} s_{2n-1} = \lim_{n\to\infty} (s_{2n} - s_{2n-1}) = \lim_{n\to\infty} (-a_{2n}) = 0.$$
If we denote this common limit by $S$, it is clear that the series converges and has sum $S$.
To derive the inequalities in (10.48) we argue as follows: Since $s_{2n} \nearrow$ and $s_{2n-1} \searrow$, we have
$$s_{2n} < s_{2n+2} \le S \quad\text{and}\quad S \le s_{2n+1} < s_{2n-1} \quad \text{for all } n \ge 1.$$
Therefore we have the inequalities
$$0 < S - s_{2n} \le s_{2n+1} - s_{2n} = a_{2n+1} \quad\text{and}\quad 0 < s_{2n-1} - S \le s_{2n-1} - s_{2n} = a_{2n},$$
which, taken together, yield (10.48). This completes the proof.
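The error estimate (10.48) can be checked numerically. The following Python sketch does this for the alternating harmonic series, taking its sum to be $\log 2$ as stated in (10.46). The function names `partial_sum` and `leibniz_bounds_hold` are illustrative choices, not notation from the text, and the check is a floating-point experiment rather than a proof.

```python
import math

def partial_sum(n):
    """s_n = sum_{k=1}^{n} (-1)^(k-1) / k for the alternating harmonic series."""
    return sum((-1) ** (k - 1) / k for k in range(1, n + 1))

S = math.log(2)  # sum of the series, as stated in (10.46)

def leibniz_bounds_hold(n):
    """Check 0 < (-1)^n (S - s_n) < a_{n+1}, where a_k = 1/k."""
    signed_error = (-1) ** n * (S - partial_sum(n))
    return 0 < signed_error < 1 / (n + 1)

for n in (1, 2, 5, 10, 100):
    print(n, partial_sum(n), leibniz_bounds_hold(n))
```

For each $n$ tried, the signed error should be positive and smaller than the first neglected term $1/(n+1)$.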
EXAMPLE 1. Since $1/n \searrow$ and $1/n \to 0$ as $n \to \infty$, the convergence of the alternating harmonic series $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$ is an immediate consequence of Leibniz's rule. The sum of this series is computed below in Example 4.
EXAMPLE 2. The alternating series $\sum (-1)^n (\log n)/n$ converges. To prove this using Leibniz's rule, we must show that $(\log n)/n \to 0$ as $n \to \infty$ and that $(\log n)/n \searrow$. The first statement follows from Equation (10.11) of Section 10.2. To prove the second statement, we note that the function $f$ for which
$$f(x) = \frac{\log x}{x} \quad \text{when } x > 0$$
has the derivative $f'(x) = (1 - \log x)/x^2$. When $x > e$, this is negative and $f$ is monotonic decreasing. In particular, $f(n + 1) < f(n)$ for $n \ge 3$.
EXAMPLE 3. An important limit relation may be derived as a consequence of Leibniz's rule. Let
$$a_1 = 1, \quad a_2 = \int_1^2 \frac{dx}{x}, \quad a_3 = \frac{1}{2}, \quad a_4 = \int_2^3 \frac{dx}{x}, \quad \ldots,$$
where, in general,
$$a_{2n-1} = \frac{1}{n} \quad\text{and}\quad a_{2n} = \int_n^{n+1} \frac{dx}{x} \quad \text{for } n = 1, 2, 3, \ldots.$$
It is easy to verify that $a_n \to 0$ as $n \to \infty$ and that $a_n \searrow$. Hence the series $\sum (-1)^{n-1} a_n$ converges. Denote its sum by $C$ and its $n$th partial sum by $s_n$. The $(2n - 1)$st partial sum may be expressed as follows:
$$s_{2n-1} = 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \int_1^n \frac{dx}{x} = 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n.$$
Since $s_{2n-1} \to C$ as $n \to \infty$, we obtain the following limit formula:
(10.49) $$\lim_{n\to\infty} \left( 1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n \right) = C.$$
The number $C$ defined by this limit is called Euler's constant (sometimes denoted by $\gamma$). Like $\pi$ and $e$, this number appears in many analytic formulas. Its value, correct to ten decimals, is 0.5772156649. An interesting problem, unsolved to this time, is to decide whether Euler's constant is rational or irrational.
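The limit in (10.49) can be watched numerically. The Python sketch below evaluates $1 + \frac{1}{2} + \cdots + \frac{1}{n} - \log n$ for a few values of $n$. The function name `harmonic_minus_log` is a hypothetical label chosen for this illustration, and the reference value 0.5772156649 is the one quoted in the text.

```python
import math

def harmonic_minus_log(n):
    """Return 1 + 1/2 + ... + 1/n - log n, the quantity s_{2n-1} of Example 3."""
    return sum(1 / k for k in range(1, n + 1)) - math.log(n)

EULER_C = 0.5772156649  # value quoted in the text, correct to ten decimals

for n in (10, 1000, 100000):
    value = harmonic_minus_log(n)
    print(n, value, value - EULER_C)
```

The printed differences should be positive and shrink as $n$ grows, in keeping with the fact that the odd-indexed partial sums $s_{2n-1}$ decrease to $C$.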
Relation (10.49) can also be expressed as follows:
(10.50) $$\sum_{k=1}^{n} \frac{1}{k} = \log n + C + o(1) \quad \text{as } n \to \infty.$$
From this it follows that the ratio $(1 + \frac{1}{2} + \cdots + 1/n)/\log n \to 1$ as $n \to \infty$, so the partial sums of the harmonic series are asymptotically equal to $\log n$. That is, we have
$$\sum_{k=1}^{n} \frac{1}{k} \sim \log n \quad \text{as } n \to \infty.$$
The relation (10.50) not only explains why the harmonic series diverges, but it also gives us some concrete idea of the rate of growth of its partial sums. In the next example we use this relation to prove that the alternating harmonic series has the sum $\log 2$.
EXAMPLE 4. Let $s_m = \sum_{k=1}^{m} (-1)^{k-1}/k$. We know that $s_m$ tends to a limit as $m \to \infty$, and we shall prove now that this limit is $\log 2$. When $m$ is even, say $m = 2n$, we may separate the positive and negative terms to obtain
$$s_{2n} = \sum_{k=1}^{n} \frac{1}{2k-1} - \sum_{k=1}^{n} \frac{1}{2k} = \sum_{k=1}^{2n} \frac{1}{k} - 2 \sum_{k=1}^{n} \frac{1}{2k} = \sum_{k=1}^{2n} \frac{1}{k} - \sum_{k=1}^{n} \frac{1}{k}.$$
Applying (10.50) to each sum on the extreme right, we obtain
$$s_{2n} = (\log 2n + C + o(1)) - (\log n + C + o(1)) = \log 2 + o(1),$$
so $s_{2n} \to \log 2$ as $n \to \infty$. This proves that the sum of the alternating harmonic series is $\log 2$.
10.18 Conditional and absolute convergence
Although the alternating harmonic series $\sum (-1)^{n-1}/n$ is convergent, the series obtained by replacing each term by its absolute value is divergent. This shows that, in general, convergence of $\sum a_n$ does not imply convergence of $\sum |a_n|$. In the other direction, we have the following theorem.
THEOREM 10.15. Assume $\sum |a_n|$ converges. Then $\sum a_n$ also converges, and we have
(10.51) $$\left| \sum_{n=1}^{\infty} a_n \right| \le \sum_{n=1}^{\infty} |a_n|.$$
Proof. Assume first that the terms $a_n$ are real. Let $b_n = a_n + |a_n|$. We shall prove that $\sum b_n$ converges. It then follows (by Theorem 10.2) that $\sum a_n$ converges because $a_n = b_n - |a_n|$.
Since $b_n$ is either 0 or $2|a_n|$, we have $0 \le b_n \le 2|a_n|$, and hence $\sum |a_n|$ dominates $\sum b_n$. Therefore $\sum b_n$ converges and, as already mentioned, this implies convergence of $\sum a_n$.
Now suppose the terms $a_n$ are complex, say $a_n = u_n + i v_n$, where $u_n$ and $v_n$ are real. Since $|u_n| \le |a_n|$, convergence of $\sum |a_n|$ implies convergence of $\sum |u_n|$ and this, in turn, implies convergence of $\sum u_n$, since the $u_n$ are real. Similarly, $\sum v_n$ converges. By linearity, the series $\sum (u_n + i v_n)$ converges.
To prove (10.51), we note that $\left| \sum_{k=1}^{n} a_k \right| \le \sum_{k=1}^{n} |a_k|$, and then we let $n \to \infty$.
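DEFINITION. A series $\sum a_n$ is called absolutely convergent if $\sum |a_n|$ converges. It is called conditionally convergent if $\sum a_n$ converges but $\sum |a_n|$ diverges.
If $\sum a_n$ and $\sum b_n$ are absolutely convergent, then so is the series $\sum (\alpha a_n + \beta b_n)$ for every choice of $\alpha$ and $\beta$. This follows at once from the inequalities
$$\sum_{n=1}^{M} |\alpha a_n + \beta b_n| \le |\alpha| \sum_{n=1}^{M} |a_n| + |\beta| \sum_{n=1}^{M} |b_n| \le |\alpha| \sum_{n=1}^{\infty} |a_n| + |\beta| \sum_{n=1}^{\infty} |b_n|,$$
which show that the partial sums of $\sum |\alpha a_n + \beta b_n|$ are bounded.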
10.19 The convergence tests of Dirichlet and Abel
The convergence tests of the earlier sections that were developed for series of nonnegative
terms may also be used to test absolute convergence of a series with arbitrary complex
terms. In this section we discuss two tests that are often useful for determining convergence
when the series might not converge absolutely. Both tests make use of an algebraic identity
known as the Abel partial summation formula, named in honor of the Norwegian mathematician Niels Henrik Abel (1802-1829). Abel’s formula is analogous to the formula for
integration by parts and may be described as follows.
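THEOREM 10.16. ABEL'S PARTIAL SUMMATION FORMULA. Let $\{a_n\}$ and $\{b_n\}$ be two sequences of complex numbers, and let
$$A_n = \sum_{k=1}^{n} a_k.$$
Then we have the identity
(10.52) $$\sum_{k=1}^{n} a_k b_k = A_n b_{n+1} + \sum_{k=1}^{n} A_k (b_k - b_{k+1}).$$
Proof. If we define $A_0 = 0$, then $a_k = A_k - A_{k-1}$ for each $k = 1, 2, \ldots, n$, so we have
$$\sum_{k=1}^{n} a_k b_k = \sum_{k=1}^{n} (A_k - A_{k-1}) b_k = \sum_{k=1}^{n} A_k b_k - \sum_{k=1}^{n} A_k b_{k+1} + A_n b_{n+1},$$
which gives us (10.52).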
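Abel's partial summation formula is a finite algebraic identity, so it can be tested directly on sample data. The Python sketch below computes both sides, namely $\sum_{k=1}^{n} a_k b_k$ and $A_n b_{n+1} + \sum_{k=1}^{n} A_k (b_k - b_{k+1})$, for short lists of complex numbers. The function name `abel_sides` and the sample values are assumptions made for this illustration.

```python
def abel_sides(a, b):
    """Return both sides of Abel's partial summation identity.

    a holds a_1, ..., a_n and b holds b_1, ..., b_{n+1} (0-based lists).
    """
    n = len(a)
    if len(b) != n + 1:
        raise ValueError("b must have exactly one more term than a")

    # Partial sums: A[k] stores A_{k+1} = a_1 + ... + a_{k+1}.
    A = []
    running = 0
    for term in a:
        running += term
        A.append(running)

    lhs = sum(a[k] * b[k] for k in range(n))
    rhs = A[n - 1] * b[n] + sum(A[k] * (b[k] - b[k + 1]) for k in range(n))
    return lhs, rhs

# Arbitrary sample data (not from the text).
a = [1 + 2j, -3j, 2, -1 + 1j]
b = [5, 3 - 1j, 2, 1j, 4]
print(abel_sides(a, b))
```

The two returned values should agree up to rounding, whatever sequences are supplied.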
If we let $n \to \infty$ in (10.52), we see that the series $\sum a_k b_k$ converges if both the series $\sum A_k (b_k - b_{k+1})$ and the sequence $\{A_n b_{n+1}\}$ converge. The next two tests give sufficient conditions for these to converge.
THEOREM 10.17. DIRICHLET'S TEST. Let $\sum a_n$ be a series of complex terms whose partial sums form a bounded sequence. Let $\{b_n\}$ be a decreasing sequence which converges to 0. Then the series $\sum a_n b_n$ converges.
Proof. Using the notation of Theorem 10.16, there is an $M > 0$ such that $|A_n| \le M$ for all $n$. Therefore $A_n b_{n+1} \to 0$ as $n \to \infty$. To establish convergence of $\sum a_n b_n$, we need only show that the series $\sum A_k (b_k - b_{k+1})$ is convergent. Since $b_n \searrow$, we have the inequality
$$|A_k (b_k - b_{k+1})| \le M (b_k - b_{k+1}).$$
But the series $\sum (b_k - b_{k+1})$ is a convergent telescoping series which dominates $\sum |A_k (b_k - b_{k+1})|$. This implies absolute convergence and hence convergence of $\sum A_k (b_k - b_{k+1})$.
THEOREM 10.18. ABEL'S TEST. Let $\sum a_n$ be a convergent series of complex terms and let $\{b_n\}$ be a monotonic convergent sequence of real terms. Then the series $\sum a_n b_n$ converges.
Proof. Again we use the notation of Theorem 10.16. Convergence of $\sum a_n$ implies convergence of the sequence $\{A_n\}$ and hence of the sequence $\{A_n b_{n+1}\}$. Also, $\{A_n\}$ is a bounded sequence. The rest of the proof is similar to that of Dirichlet's test.
To use Dirichlet's test effectively, we need some examples of series having bounded partial sums. Of course, every convergent series has this property. An important example of a divergent series with bounded partial sums is the geometric series $\sum x^n$, where $x$ is a complex number with $|x| = 1$ but $x \ne 1$. The next theorem gives an upper bound for the partial sums of this series. When $|x| = 1$, we may write $x = e^{2i\theta}$, where $\theta$ is real, and we have the following.
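THEOREM 10.19. For every real $\theta$ not an integer multiple of $\pi$, we have the identity
(10.53) $$\sum_{k=1}^{n} e^{2ik\theta} = \frac{\sin n\theta}{\sin \theta}\, e^{i(n+1)\theta},$$
from which we obtain the estimate
(10.54) $$\left| \sum_{k=1}^{n} e^{2ik\theta} \right| \le \frac{1}{|\sin \theta|}.$$
Proof. If $x \ne 1$, the partial sums of the geometric series are given by
$$\sum_{k=1}^{n} x^k = x\, \frac{x^n - 1}{x - 1}.$$
Writing $x = e^{2i\theta}$ in this formula, where $\theta$ is real but not an integer multiple of $\pi$, we find
$$\sum_{k=1}^{n} e^{2ik\theta} = e^{2i\theta}\, \frac{e^{2in\theta} - 1}{e^{2i\theta} - 1} = \frac{e^{in\theta} - e^{-in\theta}}{e^{i\theta} - e^{-i\theta}}\, e^{i(n+1)\theta} = \frac{\sin n\theta}{\sin \theta}\, e^{i(n+1)\theta}.$$
This proves (10.53). To deduce (10.54), we simply note that $|\sin n\theta| \le 1$ and $|e^{i(n+1)\theta}| = 1$.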
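Both the identity (10.53) and the bound (10.54) can be spot-checked with complex arithmetic. The Python sketch below compares the directly computed sum $\sum_{k=1}^{n} e^{2ik\theta}$ with the closed form $(\sin n\theta / \sin \theta)\, e^{i(n+1)\theta}$. The function names and the sample angles are illustrative assumptions; the angles are chosen so that $\theta$ is not a multiple of $\pi$.

```python
import cmath
import math

def geometric_sum(n, theta):
    """Direct evaluation of sum_{k=1}^{n} exp(2 i k theta)."""
    return sum(cmath.exp(2j * k * theta) for k in range(1, n + 1))

def closed_form(n, theta):
    """Right-hand side of (10.53); requires sin(theta) != 0."""
    return math.sin(n * theta) / math.sin(theta) * cmath.exp(1j * (n + 1) * theta)

for theta in (0.3, 1.0, 2.5):
    for n in (1, 5, 40):
        direct = geometric_sum(n, theta)
        gap = abs(direct - closed_form(n, theta))
        bound = 1 / abs(math.sin(theta))
        print(theta, n, gap, abs(direct) <= bound + 1e-12)
```

The gap between the two evaluations should be at rounding level, and the modulus of the partial sum should never exceed $1/|\sin \theta|$, no matter how large $n$ is.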
EXAMPLES. Assume $\{b_n\}$ is any decreasing sequence of real numbers with limit 0. Taking $a_n = x^n$ in Dirichlet's test, where $x$ is complex, $|x| = 1$, $x \ne 1$, we find that the series
(10.55) $$\sum_{n=1}^{\infty} b_n x^n$$
converges. Note that Leibniz's rule for alternating series is merely the special case in which $x = -1$. If we write $x = e^{i\theta}$, where $\theta$ is real but not an integer multiple of $2\pi$, and consider the real and imaginary parts of (10.55), we deduce that the two trigonometric series
$$\sum_{n=1}^{\infty} b_n \cos n\theta \quad\text{and}\quad \sum_{n=1}^{\infty} b_n \sin n\theta$$
converge. In particular, when $b_n = n^{-\alpha}$, where $\alpha > 0$, we find the following series converge:
$$\sum_{n=1}^{\infty} \frac{e^{in\theta}}{n^{\alpha}}, \quad \sum_{n=1}^{\infty} \frac{\cos n\theta}{n^{\alpha}}, \quad \sum_{n=1}^{\infty} \frac{\sin n\theta}{n^{\alpha}}.$$
When $\alpha > 1$, they converge absolutely since they are dominated by $\sum n^{-\alpha}$.
10.20 Exercises
In Exercises 1 through 32, determine convergence or divergence of the given series. In case of
convergence, determine whether the series
converges absolutely or conditionally.
c
m (-I)n+1
1.
n=l
42
.
4
2. $(-l)+$
PL=1
5. c
m (-l)n(n-l)/Z
7Z=l
2n
*
6. -f(4)” (SF.
?l=l
m
7.
cc
(-1Y
c vi + (-1P.
n=2
15.
c
TL=l
sin (log n).
17. 2(-l)” (1 - nsink) .
22.zsin(nn+&j.
Tl=l
cc
1
2 3 . n=l
c 41 + 1/2 + . . . + l/n) ’
18. $(-l)n (1 -cos$.
1L=l
m
19.
c
n=l
1
( - 1)” arctan 2n + 1 .
24. z(-1)” [e - (1 +~-r] .
m
C-1)”
2 5 . 12==2
c (n + (- 1)“)s ’
03
20. z(-l)n (a - arctan(logn)j
2 6 . 2-w
n=; 1
21.~1~~(1+~)~
27. f$ a,,
n=l
m
28.
a,,
c
where a, =
where a, =
?l=l
CO
l/n
i 1/?22
l/n2
-l/n
if n is a square,
otherwise.
if n is odd,
if n is even.
m
312
1
sin - *
n1
a
31.
m sin (l/n)
3 0 . ~
c
n
.
32.
29.
ci
?L=l
n=l
Yl=l
1 - n sin .!
n1 ’
Tkl
In Exercises 33 through 46, describe the set of all complex $z$ for which the series converges.
m
m (z - 1)”
33.
nnzn .
40.
c
c
n=O (n .
7L=l
m (-1)nZan
3 4 . ~.
c
n!
41.
?l=l
m (- lyyz - 1)”
c
n
n=l
‘n
35.
Zn
c +.
n=O
42.
m (2z + 3y
c
n=l
n log (n + 1) ’
CO
36.
37.
38.
Zn
c F’
n=l
m (-l)n
c
&=*
cc
2n + 1
-1og-.
c
ne16
n
CO
39.
46.
1
c (1 + Iz12Y .
n=l
In Exercises 47 and 48, determine the set of real $x$ for which the given series converges.
47. $\sum_{n=1}^{\infty} (-1)^n \dfrac{2^n \sin^{2n} x}{n}$.
48. $\sum_{n=1}^{\infty} \dfrac{2^n \sin^{n} x}{n^2}$.
In Exercises 49 through 52, the series are assumed to have real terms.
49. If $a_n > 0$ and $\sum a_n$ converges, prove that $\sum 1/a_n$ diverges.
50. If $\sum |a_n|$ converges, prove that $\sum a_n^2$ converges. Give a counterexample in which $\sum a_n^2$ converges but $\sum |a_n|$ diverges.
51. Given a convergent series $\sum a_n$, where each $a_n \ge 0$. Prove that $\sum \sqrt{a_n}\, n^{-p}$ converges if $p > \frac{1}{2}$. Give a counterexample for $p = \frac{1}{2}$.
52. Prove or disprove the following statements:
(a) If $\sum a_n$ converges absolutely, then so does $\sum a_n^2/(1 + a_n^2)$.
(b) If $\sum a_n$ converges absolutely, and if no $a_n = -1$, then $\sum a_n/(1 + a_n)$ converges absolutely.
*10.21 Rearrangements of series
The order of the terms in a finite sum can be rearranged without affecting the value of the sum. In 1833 Cauchy made the surprising discovery that this is not always true for infinite series. For example, consider the alternating harmonic series
(10.56) $$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + - \cdots = \log 2.$$
The convergence of this series to the sum $\log 2$ was shown in Section 10.17. If we rearrange the terms of this series, taking alternately two positive terms followed by one negative term, we get a new series which can be designated as follows:
(10.57) $$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} + + - \cdots.$$
Each term which occurs in the alternating harmonic series occurs exactly once in this rearrangement, and vice versa. But we can easily prove that this new series has a sum greater than $\log 2$. We proceed as follows:
Let $t_n$ denote the $n$th partial sum of (10.57). If $n$ is a multiple of 3, say $n = 3m$, the partial sum $t_{3m}$ contains $2m$ positive terms and $m$ negative terms and is given by
$$t_{3m} = \sum_{k=1}^{2m} \frac{1}{2k-1} - \sum_{k=1}^{m} \frac{1}{2k} = \left( \sum_{k=1}^{4m} \frac{1}{k} - \sum_{k=1}^{2m} \frac{1}{2k} \right) - \sum_{k=1}^{m} \frac{1}{2k} = \sum_{k=1}^{4m} \frac{1}{k} - \frac{1}{2} \sum_{k=1}^{2m} \frac{1}{k} - \frac{1}{2} \sum_{k=1}^{m} \frac{1}{k}.$$
In each of the last three sums, we use the asymptotic relation
$$\sum_{k=1}^{n} \frac{1}{k} = \log n + C + o(1) \quad \text{as } n \to \infty,$$
to obtain
$$t_{3m} = (\log 4m + C + o(1)) - \tfrac{1}{2} (\log 2m + C + o(1)) - \tfrac{1}{2} (\log m + C + o(1)) = \tfrac{3}{2} \log 2 + o(1).$$
Thus $t_{3m} \to \frac{3}{2} \log 2$ as $m \to \infty$. But $t_{3m+1} = t_{3m} + 1/(4m + 1)$ and $t_{3m-1} = t_{3m} + 1/(2m)$, so $t_{3m+1}$ and $t_{3m-1}$ have the same limit as $t_{3m}$ when $m \to \infty$. Therefore, every partial sum $t_n$ has the limit $\frac{3}{2} \log 2$ as $n \to \infty$, so the sum of the series in (10.57) is $\frac{3}{2} \log 2$.
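The value $\frac{3}{2} \log 2$ can be observed numerically by summing the rearranged series (10.57) in blocks of three terms. In the Python sketch below, the $j$th block consists of the two positive terms $1/(4j-3)$ and $1/(4j-1)$ followed by the negative term $-1/(2j)$, so that $m$ blocks give $t_{3m}$. The function name `rearranged_partial_sum` is a hypothetical label for this illustration.

```python
import math

def rearranged_partial_sum(m):
    """t_{3m}: the first m blocks (two positive terms, one negative) of (10.57)."""
    total = 0.0
    for j in range(1, m + 1):
        total += 1 / (4 * j - 3) + 1 / (4 * j - 1) - 1 / (2 * j)
    return total

target = 1.5 * math.log(2)
for m in (10, 1000, 100000):
    t = rearranged_partial_sum(m)
    print(m, t, t - target)
```

The differences from $\frac{3}{2} \log 2$ should shrink toward zero as $m$ grows, while the ordinary alternating harmonic series keeps the sum $\log 2$.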
The foregoing example shows that rearrangement of the terms of a convergent series may alter its sum. We shall prove next that this can happen only if the given series is conditionally convergent. That is, rearrangement of an absolutely convergent series does not alter its sum. Before we prove this, we will explain more precisely what is meant by a rearrangement.
DEFINITION. Let $P = \{1, 2, 3, \ldots\}$ denote the set of positive integers. Let $f$ be a function whose domain is $P$ and whose range is $P$, and assume $f$ has the following property:
$$m \ne n \quad\text{implies}\quad f(m) \ne f(n).$$
Such a function $f$ is called a permutation of $P$, or a one-to-one mapping of $P$ onto itself. If $\sum a_n$ and $\sum b_n$ are two series such that for every $n \ge 1$ we have
$$b_n = a_{f(n)}$$
for some permutation $f$, then the series $\sum b_n$ is said to be a rearrangement of $\sum a_n$.
EXAMPLE. If $\sum a_n$ denotes the alternating harmonic series in (10.56) and if $\sum b_n$ denotes the series in (10.57), we have $b_n = a_{f(n)}$, where $f$ is the permutation defined by the formulas
$$f(3n + 1) = 4n + 1, \quad f(3n + 2) = 4n + 3, \quad f(3n + 3) = 2n + 2.$$
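The permutation in this example is easy to tabulate. The Python sketch below implements $f$ from the three formulas (with $n = 0, 1, 2, \ldots$ in each formula) and lists the first few values of $f(n)$ and of $b_n = a_{f(n)}$. The helper names `f` and `a` mirror the text's notation; the use of `divmod` to split $n$ into its residue class modulo 3 is an implementation choice made for this illustration.

```python
def f(n):
    """Permutation of the positive integers from the example (n >= 1)."""
    q, r = divmod(n - 1, 3)      # n = 3q + r + 1 with r in {0, 1, 2}
    if r == 0:
        return 4 * q + 1         # f(3q + 1) = 4q + 1
    if r == 1:
        return 4 * q + 3         # f(3q + 2) = 4q + 3
    return 2 * q + 2             # f(3q + 3) = 2q + 2

def a(n):
    """n-th term of the alternating harmonic series (10.56)."""
    return (-1) ** (n - 1) / n

print([f(n) for n in range(1, 10)])
print([a(f(n)) for n in range(1, 10)])
```

The second list should reproduce the leading terms of (10.57): two positive terms with odd denominators followed by one negative term with an even denominator.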
THEOREM 10.20. Let $\sum a_n$ be an absolutely convergent series having sum $S$. Then every rearrangement of $\sum a_n$ also converges absolutely and has sum $S$.
Proof. Let $\sum b_n$ be a rearrangement, say $b_n = a_{f(n)}$. First we note that $\sum b_n$ converges absolutely because $\sum |b_n|$ is a series of nonnegative terms whose partial sums are bounded above by $\sum |a_n|$.
To prove that $\sum b_n$ also has sum $S$, we introduce
$$B_n = \sum_{k=1}^{n} b_k, \quad A_n = \sum_{k=1}^{n} a_k, \quad A_n^* = \sum_{k=1}^{n} |a_k|, \quad\text{and}\quad S^* = \sum_{k=1}^{\infty} |a_k|.$$
Now $A_n \to S$ and $A_n^* \to S^*$ as $n \to \infty$. Therefore, given any $\epsilon > 0$, there is an $N$ such that
$$|A_N - S| < \frac{\epsilon}{2} \quad\text{and}\quad |A_N^* - S^*| < \frac{\epsilon}{2}.$$
For this $N$ we can choose $M$ so that
$$\{1, 2, \ldots, N\} \subseteq \{f(1), f(2), \ldots, f(M)\}.$$
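This is possible because the range of $f$ includes all the positive integers. If $n \ge M$, we have
(10.58) $$|B_n - S| = |B_n - A_N + A_N - S| \le |B_n - A_N| + |A_N - S| \le |B_n - A_N| + \frac{\epsilon}{2}.$$
But we also have
$$B_n - A_N = \sum_{k=1}^{n} b_k - \sum_{k=1}^{N} a_k = \sum_{k=1}^{n} a_{f(k)} - \sum_{k=1}^{N} a_k.$$
The terms $a_1, \ldots, a_N$ cancel in the subtraction, so we have
$$|B_n - A_N| \le |a_{N+1}| + |a_{N+2}| + \cdots = |A_N^* - S^*| < \frac{\epsilon}{2}.$$
Combining this with (10.58), we see that $|B_n - S| < \epsilon$ for all $n \ge M$, which means that $B_n \to S$ as $n \to \infty$. This proves that the rearranged series $\sum b_n$ has sum $S$.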
The hypothesis of absolute convergence in Theorem 10.20 is essential. Riemann discovered that a conditionally convergent series of real terms can always be rearranged to give a series which converges to any preassigned sum. Riemann's argument is based on a special property of conditionally convergent series of real terms. Such a series $\sum a_n$ has infinitely many positive terms and infinitely many negative terms. Consider the two new series $\sum a_n^+$ and $\sum a_n^-$ obtained by taking the positive terms alone and the negative terms alone. More specifically, define $a_n^+$ and $a_n^-$ as follows:
(10.59) $$a_n^+ = \frac{a_n + |a_n|}{2}, \quad a_n^- = \frac{a_n - |a_n|}{2}.$$
If $a_n$ is positive, then $a_n^+ = a_n$ and $a_n^- = 0$; if $a_n$ is negative, then $a_n^- = a_n$ and $a_n^+ = 0$. The two new series $\sum a_n^+$ and $\sum a_n^-$ are related to the given series $\sum a_n$ as follows.
THEOREM 10.21. Given a series $\sum a_n$ of real terms, define $a_n^+$ and $a_n^-$ by (10.59).
(a) If $\sum a_n$ is conditionally convergent, both $\sum a_n^+$ and $\sum a_n^-$ diverge.
(b) If $\sum a_n$ is absolutely convergent, both $\sum a_n^+$ and $\sum a_n^-$ converge, and we have
(10.60) $$\sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} a_n^+ + \sum_{n=1}^{\infty} a_n^-.$$
Proof. To prove part (a), we note that $\sum \frac{1}{2} a_n$ converges and $\sum \frac{1}{2} |a_n|$ diverges. Therefore, by the linearity property (Theorem 10.3), $\sum a_n^+$ diverges and $\sum a_n^-$ diverges. To prove part (b), we note that both $\sum \frac{1}{2} a_n$ and $\sum \frac{1}{2} |a_n|$ converge, so by the linearity property (Theorem 10.2) both $\sum a_n^+$ and $\sum a_n^-$ converge. Since $a_n = a_n^+ + a_n^-$, we also obtain (10.60).
Now we can easily prove Riemann's rearrangement theorem.
THEOREM 10.22. Let $\sum a_n$ be a conditionally convergent series of real terms, and let $S$ be a given real number. Then there is a rearrangement $\sum b_n$ of $\sum a_n$ which converges to the sum $S$.
Proof. Define $a_n^+$ and $a_n^-$ as indicated in (10.59). Both series $\sum a_n^+$ and $\sum a_n^-$ diverge since $\sum a_n$ is conditionally convergent. We rearrange $\sum a_n$ as follows:
Take, in order, just enough positive terms $a_n^+$ so that their sum exceeds $S$. If $p_1$ positive terms are required, we have
$$\sum_{n=1}^{p_1} a_n^+ > S \quad\text{but}\quad \sum_{n=1}^{q} a_n^+ \le S \quad\text{if } q < p_1.$$
This is always possible since the partial sums of $\sum a_n^+$ tend to $+\infty$. To this sum we add just enough negative terms $a_n^-$, say $n_1$ negative terms, so that the resulting sum is less than $S$. This is possible since the partial sums of $\sum a_n^-$ tend to $-\infty$. Thus, we have
$$\sum_{n=1}^{p_1} a_n^+ + \sum_{n=1}^{n_1} a_n^- < S \quad\text{but}\quad \sum_{n=1}^{p_1} a_n^+ + \sum_{n=1}^{m} a_n^- \ge S \quad\text{if } m < n_1.$$
Now we repeat the process, adding just enough new positive terms to make the sum exceed $S$, and then just enough new negative terms to make the sum less than $S$. Continuing in this way, we obtain a rearrangement $\sum b_n$. Each partial sum of $\sum b_n$ differs from $S$ by at most one term $a_n^+$ or $a_n^-$. But $a_n \to 0$ as $n \to \infty$ since $\sum a_n$ converges, so the partial sums of $\sum b_n$ tend to $S$. This proves that the rearranged series $\sum b_n$ converges and has sum $S$, as asserted.
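The construction in this proof is effectively an algorithm, and it can be tried on the alternating harmonic series. The Python sketch below is a simplified variant under stated assumptions: it works only with the nonzero terms (the positive terms $1, \frac{1}{3}, \frac{1}{5}, \ldots$ and the negative terms $-\frac{1}{2}, -\frac{1}{4}, \ldots$), and at each step it takes the next unused positive term when the running sum is at most the target and the next unused negative term otherwise. The function name `riemann_rearrangement` and the sample targets are hypothetical choices for this illustration.

```python
def riemann_rearrangement(target, num_terms):
    """Greedily rearrange sum (-1)^(n-1)/n so its partial sums approach target.

    Returns the list of terms in their new order and the final partial sum.
    """
    next_odd = 1     # denominator of the next unused positive term
    next_even = 2    # denominator of the next unused negative term
    total = 0.0
    terms = []
    for _ in range(num_terms):
        if total <= target:
            term = 1 / next_odd       # sum too small: take a positive term
            next_odd += 2
        else:
            term = -1 / next_even     # sum too large: take a negative term
            next_even += 2
        total += term
        terms.append(term)
    return terms, total

for target in (1.0, -0.5, 3.0):
    terms, total = riemann_rearrangement(target, 200000)
    print(target, total, terms[:6])
```

Every term of the original series is used at most once and in its original relative order within the positive and negative groups, so the output is the beginning of a rearrangement. The final partial sums should lie close to the chosen targets, since each crossing of the target overshoots by at most the size of the term just added.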
10.22 Miscellaneous review exercises
1. (a) Let $a_n = \sqrt{n + 1} - \sqrt{n}$. Compute $\lim_{n\to\infty} a_n$.
(b) Let $a_n = (n + 1)^c - n^c$, where $c$ is real. Determine those $c$ for which the sequence $\{a_n\}$ converges and those for which it diverges. In case of convergence, compute the limit of the sequence. Remember that $c$ can be positive, negative, or zero.
2. (a) If $0 < x < 1$, prove that $(1 + x^n)^{1/n}$ approaches a limit as $n \to \infty$ and compute this limit.
(b) Given $a > 0$, $b > 0$, compute $\lim_{n\to\infty} (a^n + b^n)^{1/n}$.
3. A sequence $\{a_n\}$ is defined recursively in terms of $a_1$ and $a_2$ by the formula
$$a_{n+1} = \frac{a_n + a_{n-1}}{2} \quad \text{for } n \ge 2.$$
(a) Assuming that $\{a_n\}$ converges, compute the limit of the sequence in terms of $a_1$ and $a_2$. The result is a weighted arithmetic mean of $a_1$ and $a_2$.
(b) Prove that for every choice of $a_1$ and $a_2$ the sequence $\{a_n\}$ converges. You may assume that $a_1 < a_2$. [Hint: Consider $\{a_{2n}\}$ and $\{a_{2n+1}\}$ separately.]
4. A sequence $\{x_n\}$ is defined by the following recursion formula:
$$x_1 = 1, \quad x_{n+1} = \sqrt{1 + x_n}.$$
Prove that the sequence converges and find its limit.
5. A sequence $\{x_n\}$ is defined by the following recursion formula:
$$x_0 = 1, \quad x_1 = 1, \quad \frac{1}{x_{n+2}} = \frac{1}{x_{n+1}} + \frac{1}{x_n}.$$
Prove that the sequence converges and find its limit.
6. Let $\{a_n\}$ and $\{b_n\}$ be two sequences such that for each $n$ we have
$$e^{a_n} = a_n + e^{b_n}.$$
(a) Show that $a_n > 0$ implies $b_n > 0$.
(b) If $a_n > 0$ for all $n$ and if $\sum a_n$ converges, show that $\sum (b_n/a_n)$ converges.
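In Exercises 7 through 11, test the given series for convergence.
8. $\sum_{n=1}^{\infty} n^s \left( \sqrt{n + 1} - 2\sqrt{n} + \sqrt{n - 1} \right)$.
9. $\sum_{n=2}^{\infty} \dfrac{1}{(\log n)^{\log n}}$.
10. $\sum_{n=1}^{\infty} \dfrac{1}{n^{1 + 1/n}}$.
11. $\sum_{n=1}^{\infty} a_n$, where $a_n = 1/n$ if $n$ is odd, $a_n = 1/n^2$ if $n$ is even.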
12. Show that the infinite series
converges for a > 2 and diverges for a = 2.
13. Given $a_n > 0$ for each $n$. For each of the following statements, give a proof or exhibit a counterexample.
(a) If $\sum_{n=1}^{\infty} a_n$ diverges, then $\sum_{n=1}^{\infty} a_n^2$ diverges.
(b) If $\sum_{n=1}^{\infty} a_n^2$ converges, then $\sum_{n=1}^{\infty} a_n/n$ converges.
14. Find all real $c$ for which the series $\sum_{n=1}^{\infty} (n!)^c/(3n)!$ converges.
15. Find all integers $a \ge 1$ for which the series $\sum_{n=1}^{\infty} (n!)^3/(an)!$ converges.
16. Let $n_1 < n_2 < n_3 < \cdots$ denote those positive integers that do not involve the digit 0 in their decimal representations. Thus $n_1 = 1$, $n_2 = 2$, ..., $n_9 = 9$, $n_{10} = 11$, ..., $n_{18} = 19$, $n_{19} = 21$, etc. Show that the series of reciprocals $\sum_{k=1}^{\infty} 1/n_k$ converges and has a sum less than 90.
[Hint: Dominate the series by $9 \sum_{n=0}^{\infty} (9/10)^n$.]
17. If $a$ is an arbitrary real number, let $s_n(a) = 1^a + 2^a + \cdots + n^a$. Determine the following limit:
$$\lim_{n\to\infty} \frac{s_n(a + 1)}{n\, s_n(a)}.$$
(Consider both positive and negative $a$, as well as $a = 0$.)
18. (a) If $p$ and $q$ are fixed integers, $p \ge q \ge 1$, show that
$$\lim_{n\to\infty} \sum_{k=qn}^{pn} \frac{1}{k} = \log \frac{p}{q}.$$
(b) The following series is a rearrangement of the alternating harmonic series in which there appear, alternately, three positive terms followed by two negative terms:
$$1 + \frac{1}{3} + \frac{1}{5} - \frac{1}{2} - \frac{1}{4} + \frac{1}{7} + \frac{1}{9} + \frac{1}{11} - \frac{1}{6} - \frac{1}{8} + + + - - \cdots.$$
Show that the series converges and has sum $\log 2 + \frac{1}{2} \log \frac{3}{2}$.
[Hint: Consider the partial sum $s_{5n}$ and use part (a).]
(c) Rearrange the alternating harmonic series, writing alternately $p$ positive terms followed by $q$ negative terms. Then use part (a) to show that this rearranged series converges and has sum $\log 2 + \frac{1}{2} \log (p/q)$.
10.23 Improper integrals
The concept of an integral $\int_a^b f(x)\,dx$ was introduced in Chapter 1 under the restriction that the function $f$ is defined and bounded on a finite interval $[a, b]$. The scope of integration theory may be extended by relaxing these restrictions.
To begin with, we may study the behavior of $\int_a^b f(x)\,dx$ as $b \to +\infty$. This leads to the notion of an infinite integral (also called an improper integral of the first kind) denoted by the symbol $\int_a^{\infty} f(x)\,dx$. Another extension is obtained if we keep the interval $[a, b]$ finite and allow $f$ to become unbounded at one or more points. The new integrals so obtained (by a suitable limit process) are called improper integrals of the second kind. To distinguish the integrals of Chapter 1 from improper integrals, the former are often called "proper" integrals.
Many important functions in analysis appear as improper integrals of one kind or another, and a detailed study of such functions is ordinarily undertaken in courses in advanced calculus. We shall be concerned here only with the most elementary aspects of the theory. In fact, we shall merely state some definitions and theorems and give some examples.
It will be evident presently that the definitions pertaining to improper integrals bear a strong resemblance to those for infinite series. Therefore it is not surprising that many of the elementary theorems on series have direct analogs for improper integrals.
If the proper integral $\int_a^b f(x)\,dx$ exists for every $b \ge a$, we may define a new function $I$ as follows:
$$I(b) = \int_a^b f(x)\,dx \quad \text{for each } b \ge a.$$
The function $I$ defined in this way is called an infinite integral, or an improper integral of the first kind, and it is denoted by the symbol $\int_a^{\infty} f(x)\,dx$. The integral is said to converge if the limit
(10.61) $$\lim_{b\to+\infty} I(b) = \lim_{b\to+\infty} \int_a^b f(x)\,dx$$
exists and is finite. Otherwise, the integral $\int_a^{\infty} f(x)\,dx$ is said to diverge. If the limit in (10.61) exists and equals $A$, the number $A$ is called the value of the integral, and we write
$$\int_a^{\infty} f(x)\,dx = A.$$
These definitions are similar to those given for infinite series. The function values $I(b)$ play the role of the "partial sums" and may be referred to as "partial integrals." Note that the symbol $\int_a^{\infty} f(x)\,dx$ is used both for the integral and for the value of the integral when the integral converges. (Compare with the remarks near the end of Section 10.5.)
EXAMPLE 1. The improper integral $\int_1^{\infty} x^{-s}\,dx$ converges if $s > 1$ and diverges if $s \le 1$. To prove this, we note that
$$I(b) = \int_1^b x^{-s}\,dx = \begin{cases} \dfrac{b^{1-s} - 1}{1 - s} & \text{if } s \ne 1, \\ \log b & \text{if } s = 1. \end{cases}$$
Therefore $I(b)$ tends to a finite limit if and only if $s > 1$, in which case the limit is
$$\int_1^{\infty} x^{-s}\,dx = \frac{1}{s - 1}.$$
The behavior of this integral is analogous to that of the series for the zeta-function, $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$.
EXAMPLE 2. The integral $\int_0^{\infty} \sin x\,dx$ diverges because
$$I(b) = \int_0^b \sin x\,dx = 1 - \cos b,$$
and this does not tend to a limit as $b \to +\infty$.
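The "partial integrals" $I(b)$ of Examples 1 and 2 can be tabulated from their closed forms to see the three kinds of behavior side by side. In the Python sketch below, the function names `partial_integral_power` and `partial_integral_sine` are hypothetical labels, and the sample exponents and upper limits are arbitrary choices for this illustration.

```python
import math

def partial_integral_power(b, s):
    """I(b) = integral of x^(-s) over [1, b], from the closed form in Example 1."""
    if s == 1:
        return math.log(b)
    return (b ** (1 - s) - 1) / (1 - s)

def partial_integral_sine(b):
    """I(b) = integral of sin x over [0, b] = 1 - cos b, as in Example 2."""
    return 1 - math.cos(b)

for b in (10.0, 1e3, 1e6):
    print(b,
          partial_integral_power(b, 2),     # approaches 1/(s - 1) = 1
          partial_integral_power(b, 1),     # grows like log b
          partial_integral_power(b, 0.5),   # grows like 2 sqrt(b)
          partial_integral_sine(b))         # keeps oscillating in [0, 2]
```

For $s = 2$ the values settle near 1, for $s \le 1$ they grow without bound, and for the sine integral they oscillate between 0 and 2 without approaching a limit.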
Infinite integrals of the form $\int_{-\infty}^{b} f(x)\,dx$ are similarly defined. Also, if $\int_{-\infty}^{c} f(x)\,dx$ and $\int_c^{\infty} f(x)\,dx$ are both convergent for some $c$, we say that the integral $\int_{-\infty}^{\infty} f(x)\,dx$ is convergent, and its value is defined to be the sum
(10.62) $$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_c^{\infty} f(x)\,dx.$$
(It is easy to show that the choice of $c$ is unimportant.) The integral $\int_{-\infty}^{\infty} f(x)\,dx$ is said to diverge if at least one of the integrals on the right of (10.62) is divergent.
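EXAMPLE 3. The integral $\int_{-\infty}^{\infty} e^{-a|x|}\,dx$ converges if $a > 0$, for if $b > 0$, we have
$$\int_0^b e^{-a|x|}\,dx = \int_0^b e^{-ax}\,dx = \frac{e^{-ab} - 1}{-a} \to \frac{1}{a} \quad \text{as } b \to \infty.$$
Hence $\int_0^{\infty} e^{-a|x|}\,dx$ converges and has the value $1/a$. Also, if $b > 0$, we have
$$\int_{-b}^{0} e^{-a|x|}\,dx = \int_{-b}^{0} e^{ax}\,dx = -\int_b^0 e^{-at}\,dt = \int_0^b e^{-at}\,dt.$$
Therefore $\int_{-\infty}^{0} e^{-a|x|}\,dx$ also converges and has the value $1/a$. Hence we have $\int_{-\infty}^{\infty} e^{-a|x|}\,dx = 2/a$. Note, however, that the integral $\int_{-\infty}^{\infty} e^{-ax}\,dx$ diverges because $\int_{-\infty}^{0} e^{-ax}\,dx$ diverges.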
As in the case of series, we have various convergence tests for improper integrals. The
simplest of these refers to a positive integrand.
THEOREM 10.23. Assume that the proper integral $\int_a^b f(x)\,dx$ exists for each $b \ge a$ and suppose that $f(x) \ge 0$ for all $x \ge a$. Then $\int_a^{\infty} f(x)\,dx$ converges if and only if there is a constant $M > 0$ such that
$$\int_a^b f(x)\,dx \le M \quad \text{for every } b \ge a.$$
This theorem forms the basis for the following comparison tests.
THEOREM 10.24. Assume the proper integral $\int_a^b f(x)\,dx$ exists for each $b \ge a$ and suppose that $0 \le f(x) \le g(x)$ for all $x \ge a$, where $\int_a^{\infty} g(x)\,dx$ converges. Then $\int_a^{\infty} f(x)\,dx$ also converges and
$$\int_a^{\infty} f(x)\,dx \le \int_a^{\infty} g(x)\,dx.$$
Note: The integral $\int_a^{\infty} g(x)\,dx$ is said to dominate the integral $\int_a^{\infty} f(x)\,dx$.
THEOREM 10.25. LIMIT COMPARISON TEST. Assume both proper integrals $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$ exist for each $b \ge a$, where $f(x) \ge 0$ and $g(x) > 0$ for all $x \ge a$. If
(10.63) $$\lim_{x\to+\infty} \frac{f(x)}{g(x)} = c, \quad \text{where } c \ne 0,$$
then both integrals $\int_a^{\infty} f(x)\,dx$ and $\int_a^{\infty} g(x)\,dx$ converge or both diverge.
Note: If the limit in (10.63) is 0, we can conclude only that convergence of $\int_a^{\infty} g(x)\,dx$ implies convergence of $\int_a^{\infty} f(x)\,dx$.
The proofs of Theorems 10.23 through 10.25 are similar to those of the corresponding results for series and are left as exercises.
EXAMPLE 4. For each real $s$, the integral $\int_1^{\infty} e^{-x} x^s\,dx$ converges. This is seen by comparison with $\int_1^{\infty} x^{-2}\,dx$ since $e^{-x} x^s / x^{-2} \to 0$ as $x \to +\infty$.
Improper integrals of the second kind may be introduced as follows: Suppose $f$ is defined on the half-open interval $(a, b]$, and assume that the integral $\int_x^b f(t)\,dt$ exists for each $x$ satisfying $a < x \le b$. Define a new function $I$ as follows:
$$I(x) = \int_x^b f(t)\,dt \quad \text{if } a < x \le b.$$
The function $I$ so defined is called an improper integral of the second kind and is denoted by the symbol $\int_{a+}^{b} f(t)\,dt$. The integral is said to converge if the limit
(10.64) $$\lim_{x\to a+} I(x) = \lim_{x\to a+} \int_x^b f(t)\,dt$$
exists and is finite. Otherwise, the integral $\int_{a+}^{b} f(t)\,dt$ is said to diverge. If the limit in (10.64) exists and equals $A$, the number $A$ is called the value of the integral, and we write
$$\int_{a+}^{b} f(t)\,dt = A.$$
EXAMPLE 5. Let $f(t) = t^{-s}$ if $t > 0$. If $b > 0$ and $x > 0$, we have
$$I(x) = \int_x^b t^{-s}\,dt = \begin{cases} \dfrac{b^{1-s} - x^{1-s}}{1 - s} & \text{if } s \ne 1, \\ \log b - \log x & \text{if } s = 1. \end{cases}$$
When $x \to 0+$, $I(x)$ tends to a finite limit if and only if $s < 1$. Hence the integral $\int_{0+}^{b} t^{-s}\,dt$ converges if $s < 1$ and diverges if $s \ge 1$.
This example may be dealt with in another way. If we introduce the substitution $t = 1/u$, $dt = -u^{-2}\,du$, we obtain
$$\int_x^b t^{-s}\,dt = \int_{1/b}^{1/x} u^{s-2}\,du.$$
When $x \to 0+$, $1/x \to +\infty$ and hence $\int_{0+}^{b} t^{-s}\,dt = \int_{1/b}^{\infty} u^{s-2}\,du$, provided the last integral converges. By Example 1, this converges if and only if $s - 2 < -1$, which means $s < 1$.
The foregoing example illustrates a remarkable geometric fact. Consider the function defined by the equation $f(x) = x^{-3/4}$ if $0 < x \le 1$. The integral $\int_{0+}^{1} f(x)\,dx$ converges, but the integral $\int_{0+}^{1} \pi f^2(x)\,dx$ diverges. Geometrically, this means that the ordinate set of $f$ has a finite area, but the solid obtained by rotating this ordinate set about the $x$-axis has an infinite volume.
Improper integrals of the form $\int_a^{b-} f(t)\,dt$ are defined in a similar fashion. If the two integrals $\int_{a+}^{c} f(t)\,dt$ and $\int_c^{b-} f(t)\,dt$ both converge, we write
$$\int_{a+}^{b-} f(t)\,dt = \int_{a+}^{c} f(t)\,dt + \int_c^{b-} f(t)\,dt.$$
Note: Some authors write $\int_a^b$ where we have written $\int_{a+}^{b-}$.
The definition can be extended (in an obvious way) to cover the case of any finite number of summands. For example, if $f$ is undefined at two points $c < d$ interior to an interval $[a, b]$, we say the improper integral $\int_a^b f(t)\,dt$ converges and has the value $\int_a^{c-} f(t)\,dt + \int_{c+}^{d-} f(t)\,dt + \int_{d+}^{b} f(t)\,dt$, provided that each of these integrals converges. Furthermore, we can consider "mixed" combinations such as $\int_{a+}^{b} f(t)\,dt + \int_b^{\infty} f(t)\,dt$ which we write as $\int_{a+}^{\infty} f(t)\,dt$, or mixed combinations of the form $\int_a^{c-} f(t)\,dt + \int_{c+}^{b} f(t)\,dt + \int_b^{\infty} f(t)\,dt$ which we write simply as $\int_a^{\infty} f(t)\,dt$.
EXAMPLE 6. The gamma function. If $s > 0$ the integral $\int_{0+}^{\infty} e^{-t} t^{s-1}\,dt$ converges. This must be interpreted as a sum, say
(10.65) $$\int_{0+}^{1} e^{-t} t^{s-1}\,dt + \int_1^{\infty} e^{-t} t^{s-1}\,dt.$$
The second integral converges for all real $s$, by Example 4. To test the first integral we put $t = 1/u$ and note that
$$\int_x^1 e^{-t} t^{s-1}\,dt = \int_1^{1/x} e^{-1/u} u^{-s-1}\,du.$$
But $\int_1^{\infty} e^{-1/u} u^{-s-1}\,du$ converges for $s > 0$ by comparison with $\int_1^{\infty} u^{-s-1}\,du$. Therefore the integral $\int_{0+}^{1} e^{-t} t^{s-1}\,dt$ converges for $s > 0$. When $s > 0$, the sum in (10.65) is denoted by $\Gamma(s)$. The function $\Gamma$ so defined is called the gamma function, first introduced by Euler in 1729. It has the interesting property that $\Gamma(n + 1) = n!$ when $n$ is any integer $\ge 0$. (See Exercise 19 of Section 10.24 for an outline of the proof.)
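The relation $\Gamma(n + 1) = n!$ can be illustrated by approximating the integral numerically. The Python sketch below uses a plain midpoint rule on $(0, T]$ with $T = 60$. Both the cutoff $T$ and the number of subintervals are assumptions chosen for this illustration, and the function name `gamma_midpoint` is hypothetical. The sketch is meant for $s \ge 1$, where the integrand is bounded near 0; for $0 < s < 1$ the integrand is unbounded at 0 and a crude rule of this kind would be inaccurate.

```python
import math

def gamma_midpoint(s, upper=60.0, steps=100_000):
    """Midpoint-rule estimate of the integral of e^(-t) t^(s-1) over (0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h            # midpoint of the i-th subinterval, never 0
        total += math.exp(-t) * t ** (s - 1)
    return total * h

for n in range(6):
    print(n, gamma_midpoint(n + 1), math.factorial(n), math.gamma(n + 1))
```

Each estimate should agree closely with $n!$ and with the standard library's `math.gamma(n + 1)`. The part of the integral beyond $T = 60$ is negligible for these small values of $s$ because of the factor $e^{-t}$.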
The convergence tests given in Theorems 10.23 through 10.25 have straightforward
analogs for improper integrals of the second kind. The reader should have no difficulty in
formulating these tests for himself.
10.24 Exercises
In each
4.
of Exercises 1 through 10, test the improper integral for convergence.
om--& dx.
s
10.
s
c
dx
2 z(logx)s
*
11. For a certain real C the integral
converges. Determine C and evaluate the integral.
12. For a certain real C, the integral
converges. Determine C and evaluate the integral.
13. For a certain real $C$, the integral
$$\int_0^{\infty} \left( \frac{1}{\sqrt{1 + 2x^2}} - \frac{C}{x + 1} \right) dx$$
converges. Determine $C$ and evaluate the integral.
14. Find the values of $a$ and $b$ such that
$$\int_1^{\infty} \left( \frac{2x^2 + bx + a}{x(2x + a)} - 1 \right) dx = 1.$$
15. For what values of the constants $a$ and $b$ will the following limit exist and be equal to 1?
$$\lim_{p\to+\infty} \int_{-p}^{p} \frac{x^3 + ax^2 + bx}{x^2 + x + 1}\,dx.$$
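16. (a) Prove that
$$\lim_{h\to 0+} \left( \int_{-1}^{-h} \frac{dx}{x} + \int_h^1 \frac{dx}{x} \right) = 0 \quad\text{and that}\quad \lim_{h\to+\infty} \int_{-h}^{h} \sin x\,dx = 0.$$
(b) Do the following improper integrals converge or diverge?
$$\int_{-1}^{1} \frac{dx}{x}; \quad \int_{-\infty}^{\infty} \sin x\,dx.$$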
17. (a) Prove that the integral $\int_{0+}^{1} (\sin x)/x\,dx$ converges.
(b) Prove that $\lim_{x\to 0+} x \int_x^1 (\cos t)/t^2\,dt = 1$.
(c) Does the integral $\int_{0+}^{1} (\cos t)/t^2\,dt$ converge or diverge?
18. (a) If $f$ is monotonic decreasing for all $x \ge 1$ and if $f(x) \to 0$ as $x \to +\infty$, prove that the integral $\int_1^{\infty} f(x)\,dx$ and the series $\sum f(n)$ both converge or both diverge.
[Hint: Recall the proof of the integral test.]
(b) Give an example of a nonmonotonic $f$ for which the series $\sum f(n)$ converges and the integral $\int_1^{\infty} f(x)\,dx$ diverges.
19. Let $\Gamma(s) = \int_{0+}^{\infty} t^{s-1} e^{-t}\,dt$, if $s > 0$. (The gamma function.) Use integration by parts to show that $\Gamma(s + 1) = s\Gamma(s)$. Then use induction to prove that $\Gamma(n + 1) = n!$ if $n$ is a positive integer.
Each of Exercises 20 through 25 contains a statement, not necessarily true, about a function f
defined for all x ≥ 1. In each of these exercises, n denotes a positive integer, and $I_n$ denotes the
integral $\int_{1}^{n} f(x)\,dx$, which is always assumed to exist. For each statement either give a proof or
provide a counterexample.
20. If f is monotonic decreasing and if $\lim_{n \to \infty} I_n$ exists, then the integral $\int_{1}^{\infty} f(x)\,dx$ converges.
21. If $\lim_{x \to \infty} f(x) = 0$ and $\lim_{n \to \infty} I_n = A$, then $\int_{1}^{\infty} f(x)\,dx$ converges and has the value A.
22. If the sequence $\{I_n\}$ converges, then the integral $\int_{1}^{\infty} f(x)\,dx$ converges.
23. If f is positive and if $\lim_{n \to \infty} I_n = A$, then $\int_{1}^{\infty} f(x)\,dx$ converges and has the value A.
24. Assume f'(x) exists for each x ≥ 1 and suppose there is a constant M > 0 such that |f'(x)| ≤ M
for all x ≥ 1. If $\lim_{n \to \infty} I_n = A$, then the integral $\int_{1}^{\infty} f(x)\,dx$ converges and has the value A.
25. If $\int_{1}^{\infty} f(x)\,dx$ converges, then $\lim_{x \to \infty} f(x) = 0$.
11
SEQUENCES AND SERIES OF FUNCTIONS
11.1 Pointwise convergence of sequences of functions
In Chapter 10 we discussed sequences whose terms were real or complex numbers. Now
we wish to consider sequences $\{f_n\}$ whose terms are real- or complex-valued functions
having a common domain on the real line or in the complex plane. For each x in the
domain, we can form another sequence $\{f_n(x)\}$ of numbers whose terms are the corresponding function values. Let S denote the set of points x for which this sequence converges.
The function f defined on S by the equation
$f(x) = \lim_{n \to \infty} f_n(x)$   if x ∈ S,
is called the limit function of the sequence $\{f_n\}$, and we say that the sequence $\{f_n\}$ converges
pointwise to f on the set S.
The study of such sequences is concerned primarily with the following type of question:
If each term of a sequence $\{f_n\}$ has a certain property, such as continuity, differentiability,
or integrability, to what extent is this property transferred to the limit function? For
example, if each function $f_n$ is continuous at a point x, is the limit function f also continuous
at x? The following example shows that, in general, it is not.
EXAMPLE 1. A sequence of continuous functions with a discontinuous limit function. Let
$f_n(x) = x^n$ if 0 ≤ x ≤ 1. The graphs of a few terms are shown in Figure 11.1. The sequence
$\{f_n\}$ converges pointwise on the closed interval [0, 1], and its limit function f is given by
the formula
$f(x) = \lim_{n \to \infty} x^n = \begin{cases} 0 & \text{if } 0 \le x < 1, \\ 1 & \text{if } x = 1. \end{cases}$
Note that the limit function f is discontinuous at 1, although each term of the sequence is
continuous in the entire interval [0, 1].
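A small numerical sketch may make Example 1 concrete. It samples $f_n(x) = x^n$ on a grid (the grid size and the helper names `f_n`, `limit_f`, `sup_gap` are illustrative assumptions) and shows that the values at a fixed x < 1 shrink to 0 while the largest sampled gap between $f_n$ and the limit function stays close to 1.

```python
def f_n(n, x):
    """The n-th term of the sequence in Example 1."""
    return x ** n

def limit_f(x):
    """The pointwise limit on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

def sup_gap(n, grid_size=10001):
    """Largest sampled value of |f_n(x) - f(x)| on an evenly spaced grid in [0, 1]."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f_n(n, x) - limit_f(x)) for x in xs)

for n in (1, 10, 100):
    # Value at the fixed point x = 0.5, then the largest sampled gap.
    print(n, f_n(n, 0.5), sup_gap(n))
```

A grid can only sample the interval, so the printed gap is a lower estimate of the true supremum, which equals 1 for every n.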
FIGURE 11.1 A sequence of continuous functions with a discontinuous limit function.
FIGURE 11.2 A sequence of functions for which $f_n \to 0$ on the interval [0, 1] but $\int_0^1 f_n \to \frac{1}{2}$ as n → ∞.
EXAMPLE 2. A sequence for which $\lim_{n \to \infty} \int_0^1 f_n(x)\,dx \ne \int_0^1 \lim_{n \to \infty} f_n(x)\,dx$. Let
$f_n(x) = nx(1 - x^2)^n$ for 0 ≤ x ≤ 1. In this example, the sequence $\{f_n\}$ converges pointwise to a
limit function f which is 0 everywhere in the closed interval [0, 1]. A few terms of the
sequence are shown in Figure 11.2. The integral of $f_n$ over the interval [0, 1] is given by
$\int_0^1 f_n(x)\,dx = n \int_0^1 x(1 - x^2)^n\,dx = -\frac{n}{2} \left. \frac{(1 - x^2)^{n+1}}{n + 1} \right|_0^1 = \frac{n}{2(n + 1)} .$
Therefore we have $\lim_{n \to \infty} \int_0^1 f_n(x)\,dx = \frac{1}{2}$, but $\int_0^1 \lim_{n \to \infty} f_n(x)\,dx = 0$. In other words,
the limit of the integrals is not equal to the integral of the limit. This example shows that
the two operations of "limit" and "integration" cannot always be interchanged. (See also
Exercises 17 and 18 in Section 11.7.)
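The computation in Example 2 can be checked numerically. The sketch below is an illustration only: it uses a midpoint rule (the step count is an arbitrary assumption) to approximate the integral of $f_n$ over [0, 1] and prints it next to the closed form n/(2(n + 1)), together with the value of $f_n$ at a fixed interior point.

```python
def f_n(n, x):
    """The n-th term of the sequence in Example 2."""
    return n * x * (1.0 - x * x) ** n

def integral_f_n(n, steps=20000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    h = 1.0 / steps
    return h * sum(f_n(n, (i + 0.5) * h) for i in range(steps))

for n in (1, 10, 100, 1000):
    # Columns: n, numerical integral, closed form n/(2(n+1)), value at x = 0.5.
    print(n, integral_f_n(n), n / (2 * (n + 1)), f_n(n, 0.5))
```

The integrals approach 1/2 while the values at any fixed x in [0, 1] approach 0, which is the failure of interchange described above.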
George G. Stokes (1819-1903), Phillip L. v. Seidel (1821-1896), and Karl Weierstrass
were the first to realize that some extra condition is needed to justify interchanging these
operations. In 1848, Stokes and Seidel (independently and almost simultaneously) introduced a concept now known as uniform convergence and showed that for a uniformly
convergent sequence the operations of limit and integration could be interchanged.
Weierstrass later showed that the concept is of great importance in advanced analysis. We
shall introduce the concept in the next section and show its relation to continuity and to
integration.
11.2 Uniform convergence of sequences of functions
Let $\{f_n\}$ be a sequence which converges pointwise on a set S to a limit function f. By the
definition of limit, this means that for each x in S and for each ε > 0 there is an integer N,
which depends on both x and ε, such that $|f_n(x) - f(x)| < \epsilon$ whenever n ≥ N. If the same
N serves equally well for all points x in S, then the convergence is said to be uniform on S.
That is, we have the following.
DEFINITION. A sequence of functions $\{f_n\}$ is said to converge uniformly to f on a set S
if for every ε > 0 there is an N (depending only on ε) such that n ≥ N implies
$|f_n(x) - f(x)| < \epsilon$   for all x in S.
We denote this symbolically by writing
$f_n \to f$ uniformly on S.
FIGURE 11.3 Geometric meaning of uniform convergence. If n ≥ N, the entire graph
of each $f_n$ lies within a distance ε from the graph of the limit function f.
When the functions $f_n$ are real-valued, there is a simple geometric interpretation of
uniform convergence. The inequality $|f_n(x) - f(x)| < \epsilon$ is equivalent to the pair of
inequalities
$f(x) - \epsilon < f_n(x) < f(x) + \epsilon .$
If these hold for all n ≥ N and every x in S, then the entire graph of $f_n$ above S lies within
a band of height 2ε situated symmetrically about the graph of f, as indicated in Figure 11.3.
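As an illustration of the definition, consider the sequence $f_n(x) = (\sin nx)/n$, which is not taken from this section but is a convenient assumed example: since |sin| ≤ 1, the distance from $f_n$ to the limit function 0 is at most 1/n for every x at once, so a single N works for all x. The sketch below samples that distance on a grid; the helper names are hypothetical.

```python
import math

def sup_distance(n, grid_size=2001):
    """Largest sampled value of |f_n(x) - 0| on [0, 2*pi], where f_n(x) = sin(n x)/n."""
    xs = [2.0 * math.pi * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(math.sin(n * x) / n) for x in xs)

def first_index_within(eps):
    """Smallest N with 1/N < eps; for n >= N every graph lies in the band of height 2*eps."""
    return math.floor(1.0 / eps) + 1

for n in (1, 5, 50):
    # The sampled distance never exceeds 1/n, independently of x.
    print(n, sup_distance(n), 1.0 / n)
print(first_index_within(0.25))
```

The point of the contrast with $x^n$ on [0, 1] is that here the bound 1/n does not depend on x.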
11.3 Uniform convergence and continuity
Now we prove that uniform convergence transmits continuity from the individual terms
of the sequence $\{f_n\}$ to the limit function f.
THEOREM 11.1. Assume $f_n \to f$ uniformly on an interval S. If each function $f_n$ is continuous at a point p in S, then the limit function f is also continuous at p.
Proof. We will show that for every ε > 0 there is a neighborhood N(p) such that
$|f(x) - f(p)| < \epsilon$ whenever x ∈ N(p) ∩ S. If ε > 0 is given, there is an integer N such
that n ≥ N implies
$|f_n(x) - f(x)| < \frac{\epsilon}{3}$   for all x in S.
Since $f_N$ is continuous at p, there is a neighborhood N(p) such that
$|f_N(x) - f_N(p)| < \frac{\epsilon}{3}$   for all x in N(p) ∩ S.
Therefore, for all x in N(p) ∩ S, we have
$|f(x) - f(p)| = |f(x) - f_N(x) + f_N(x) - f_N(p) + f_N(p) - f(p)|$
$\le |f(x) - f_N(x)| + |f_N(x) - f_N(p)| + |f_N(p) - f(p)| .$
Since each term on the right is < ε/3, we find $|f(x) - f(p)| < \epsilon$, which completes the proof.
The foregoing theorem has an important application to infinite series of functions. If
the function values $f_n(x)$ are partial sums of other functions, say
$f_n(x) = \sum_{k=1}^{n} u_k(x) ,$
and if $f_n \to f$ pointwise on S, then we have
$f(x) = \lim_{n \to \infty} f_n(x) = \sum_{k=1}^{\infty} u_k(x)$
for each x in S. In this case, the series $\sum u_k$ is said to converge pointwise to the sum function f.
If $f_n \to f$ uniformly on S, we say the series $\sum u_k$ converges uniformly to f. If each term
$u_k$ is continuous at a point p in S, then each partial sum $f_n$ is also continuous at p so, from
Theorem 11.1, we obtain the following corollary.
THEOREM 11.2. If a series of functions $\sum u_k$ converges uniformly to a sum function f on
a set S, and if each term $u_k$ is continuous at a point p in S, then the sum f is also continuous
at p.
Note: We can also express this result symbolically by writing
$\lim_{x \to p} \sum_{k=1}^{\infty} u_k(x) = \sum_{k=1}^{\infty} \lim_{x \to p} u_k(x) .$
We describe this by saying that for a uniformly convergent series we may interchange
the limit symbol with the summation symbol, or that we can pass to the limit term by
term.
11.4 Uniform convergence and integration
The next theorem shows that uniform convergence allows us to interchange the integration
symbol with the limit symbol.
THEOREM 11.3. Assume $f_n \to f$ uniformly on an interval [a, b], and assume that each
function $f_n$ is continuous on [a, b]. Define a new sequence $\{g_n\}$ by the equation
$g_n(x) = \int_a^x f_n(t)\,dt$   if x ∈ [a, b],
and let
$g(x) = \int_a^x f(t)\,dt .$
Then $g_n \to g$ uniformly on [a, b]. In symbols, we have
$\lim_{n \to \infty} \int_a^x f_n(t)\,dt = \int_a^x \lim_{n \to \infty} f_n(t)\,dt .$
Proof. The proof is very simple. Given ε > 0, there is an integer N such that n ≥ N
implies
$|f_n(t) - f(t)| < \frac{\epsilon}{b - a}$   for all t in [a, b].
Hence, if x ∈ [a, b] and if n ≥ N, we have
$|g_n(x) - g(x)| = \left| \int_a^x (f_n(t) - f(t))\,dt \right| \le \int_a^b |f_n(t) - f(t)|\,dt < \int_a^b \frac{\epsilon}{b - a}\,dt = \epsilon ,$
so $g_n \to g$ uniformly on [a, b].
Again, as a corollary, we have a corresponding result for infinite series.
THEOREM 11.4. Assume that a series of functions $\sum u_k$ converges uniformly to a sum
function f on an interval [a, b], where each $u_k$ is continuous on [a, b]. If x ∈ [a, b], define
$g_n(x) = \sum_{k=1}^{n} \int_a^x u_k(t)\,dt$
and
$g(x) = \int_a^x f(t)\,dt .$
Then $g_n \to g$ uniformly on [a, b]. In other words, we have
$\lim_{n \to \infty} \sum_{k=1}^{n} \int_a^x u_k(t)\,dt = \int_a^x \lim_{n \to \infty} \sum_{k=1}^{n} u_k(t)\,dt$
or
$\sum_{k=1}^{\infty} \int_a^x u_k(t)\,dt = \int_a^x \sum_{k=1}^{\infty} u_k(t)\,dt .$
Proof. Apply Theorem 11.3 to the sequence of partial sums $\{f_n\}$ given by
$f_n(t) = \sum_{k=1}^{n} u_k(t) ,$
and note that $\int_a^x f_n(t)\,dt = \sum_{k=1}^{n} \int_a^x u_k(t)\,dt$.
Theorem 11.4 is often described by saying that a uniformly convergent series may be
integrated term by term.
11.5 A sufficient condition for uniform convergence
Weierstrass developed a useful test for showing that certain series are uniformly convergent. The test is applicable whenever the given series can be dominated by a convergent
series of positive constants.
THEOREM 11.5. THE WEIERSTRASS M-TEST. Given a series of functions $\sum u_n$ which converges pointwise to a function f on a set S. If there is a convergent series of positive constants
$\sum M_n$ such that
$0 \le |u_n(x)| \le M_n$   for every n ≥ 1 and every x in S,
then the series $\sum u_n$ converges uniformly on S.
Proof. The comparison test shows that the series $\sum u_n(x)$ converges absolutely for each
x in S. For each x in S, we have
$\left| f(x) - \sum_{k=1}^{n} u_k(x) \right| = \left| \sum_{k=n+1}^{\infty} u_k(x) \right| \le \sum_{k=n+1}^{\infty} |u_k(x)| \le \sum_{k=n+1}^{\infty} M_k .$
Since the series $\sum M_k$ converges, for every ε > 0 there is an integer N such that n ≥ N
implies
$\sum_{k=n+1}^{\infty} M_k < \epsilon .$
This shows that
$\left| f(x) - \sum_{k=1}^{n} u_k(x) \right| < \epsilon$
for all n ≥ N and every x in S. Therefore, the series $\sum u_n$ converges uniformly to f on S.
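The key estimate in the proof can be observed numerically. The sketch below takes the series $\sum (\sin kx)/k^2$ with the constants $M_k = 1/k^2$ as an assumed example, and compares the gap between two partial sums with the corresponding block of the dominating series, sampled on a grid of x values. The function names and the grid are illustrative.

```python
import math

def partial_sum(x, n):
    """n-th partial sum of the series sum sin(k x)/k^2."""
    return sum(math.sin(k * x) / (k * k) for k in range(1, n + 1))

def tail_of_constants(n, m):
    """Sum of the dominating constants M_k = 1/k^2 for n < k <= m."""
    return sum(1.0 / (k * k) for k in range(n + 1, m + 1))

def worst_gap(n, m, grid_size=201):
    """Largest sampled |S_m(x) - S_n(x)| for x on a grid in [-10, 10]."""
    xs = [-10.0 + 20.0 * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(partial_sum(x, m) - partial_sum(x, n)) for x in xs)

# The gap between partial sums is bounded by a quantity that does not depend on x.
print(worst_gap(20, 400), tail_of_constants(20, 400))
```

Because the bound on the right is independent of x, the same N serves every point of the set, which is exactly what uniform convergence requires.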
Term-by-term differentiation of an arbitrary series of functions is even less promising
than term-by-term integration. For example, the series $\sum_{n=1}^{\infty} (\sin nx)/n^2$ converges for all
real x because it is dominated by $\sum 1/n^2$. Moreover, the convergence is uniform on the
whole real axis. However, the series obtained by differentiating term by term is $\sum (\cos nx)/n$,
and this diverges when x = 0. This example shows that term-by-term differentiation may
destroy convergence, even though the original series is uniformly convergent. Therefore,
the problem of justifying the interchange of the operations of differentiation and summation
is, in general, more serious than in the case of integration. We mention this example so the
reader may realize that familiar manipulations with finite sums do not always carry over
to infinite series, even if the series involved are uniformly convergent. We turn next to
special series of functions, known as power series, which can be manipulated in many
respects as though they were finite sums.
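The divergence at x = 0 is easy to see numerically: there the differentiated series $\sum (\cos nx)/n$ reduces to the harmonic series, whose partial sums exceed log n. The following sketch (with a hypothetical helper name) prints a few partial sums next to log n.

```python
import math

def differentiated_partial_sum(x, n):
    """n-th partial sum of sum cos(k x)/k, the term-by-term derivative of sum sin(k x)/k^2."""
    return sum(math.cos(k * x) / k for k in range(1, n + 1))

for n in (10, 100, 1000, 10000):
    # At x = 0 these are harmonic numbers, which keep growing with n.
    print(n, differentiated_partial_sum(0.0, n), math.log(n))
```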
11.6 Power series. Circle of convergence
An infinite series of the form
$\sum_{n=0}^{\infty} a_n (z - a)^n = a_0 + a_1(z - a) + \cdots + a_n(z - a)^n + \cdots$
is called a power series in z − a. The numbers z, a, and the coefficients $a_n$ are complex.
With each power series there is associated a circle, called the circle of convergence, such
that the series converges absolutely for every z interior to this circle, and diverges for every
z outside this circle. The center of the circle is at a and its radius r is called the radius of
convergence. (See Figure 11.4.) In extreme cases, the circle may shrink to the single point
a, in which case r = 0, or it may consist of the entire complex plane, in which case we say
that r = +∞. The existence of the circle of convergence is shown in Theorem 11.7.
FIGURE 11.4 The circle of convergence of a power series.
The behavior of the series at the boundary points of the circle cannot be predicted in
advance. Examples show that there may be convergence at none, some, or all the boundary
points.
For many power series that occur in practice, the radius of convergence can be determined
by using either the ratio test or the root test, as in the following examples.
EXAMPLE 1. To find the radius of convergence of the power series $\sum z^n/n!$, we apply
the ratio test. If z ≠ 0, the ratio of consecutive terms has absolute value
$\left| \frac{z^{n+1}}{(n + 1)!} \cdot \frac{n!}{z^n} \right| = \frac{|z|}{n + 1} .$
Since this ratio tends to 0 as n → ∞, we conclude that the series converges absolutely for
all complex z ≠ 0. It also converges for z = 0, so the radius of convergence is +∞.
Since the general term of a convergent series must tend to 0, the result of the foregoing
example proves that
$\lim_{n \to \infty} \frac{z^n}{n!} = 0$
for every complex z. That is, n! "grows faster" than the nth power of any fixed complex
number z as n → ∞.
EXAMPLE 2. To test the series $\sum n^2 3^n z^n$, we use the root test. We have
$(n^2 3^n |z|^n)^{1/n} = 3|z|\,n^{2/n} \to 3|z|$   as n → ∞,
since $n^{2/n} = (n^{1/n})^2$ and $n^{1/n} \to 1$ as n → ∞. Therefore, the series converges absolutely if
|z| < 1/3 and diverges if |z| > 1/3. The radius of convergence is 1/3. This particular power
series diverges at every boundary point because, if |z| = 1/3, the general term has absolute
value $n^2$.
EXAMPLE 3. For each of the series $\sum z^n/n$ and $\sum z^n/n^2$, the ratio test tells us that the
radius of convergence is 1. The first series diverges at the boundary point z = 1 but
converges at all other boundary points (see Section 10.19). The second series converges
at every boundary point since it is dominated by $\sum 1/n^2$.
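The ratio-test computations in Examples 1 through 3 can be mirrored with exact arithmetic. The sketch below evaluates $|a_n / a_{n+1}|$ for the coefficients of each series; when this quotient has a limit, that limit is the radius of convergence. The dictionary of examples and the helper name `ratio_estimate` are illustrative, and Example 2 is treated here with the ratio rather than the root test used in the text.

```python
from fractions import Fraction
from math import factorial

def ratio_estimate(coeff, n):
    """|a_n / a_(n+1)|, which tends to the radius of convergence when the limit exists."""
    return abs(coeff(n) / coeff(n + 1))

# Coefficient functions a_n for the series of Examples 1, 2 and 3.
examples = {
    "z^n / n!": lambda n: Fraction(1, factorial(n)),
    "n^2 3^n z^n": lambda n: Fraction(n * n * 3 ** n),
    "z^n / n": lambda n: Fraction(1, n),
    "z^n / n^2": lambda n: Fraction(1, n * n),
}

for name, coeff in examples.items():
    print(name, [float(ratio_estimate(coeff, n)) for n in (10, 100, 1000)])
```

For $\sum z^n/n!$ the quotient is n + 1 and grows without bound, matching r = +∞; for the other series it settles near 1/3, 1 and 1.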
We conclude this section with a proof that every power series has a circle of convergence.
The proof is based on the following theorem.
THEOREM 11.6. Assume the power series $\sum a_n z^n$ converges for a particular z ≠ 0, say
for $z = z_1$. Then we have:
(a) The series converges absolutely for every z with $|z| < |z_1|$.
(b) The series converges uniformly on every circular disk with center at 0 and radius
$R < |z_1|$.
Proof. Since $\sum a_n z_1^n$ converges, its general term tends to 0 as n → ∞. In particular,
$|a_n z_1^n| < 1$ from some point on, say for n ≥ N. Let S be a circular disk of radius R, where
$0 < R < |z_1|$. If z ∈ S and n ≥ N, we have |z| ≤ R and
$|a_n z^n| = |a_n z_1^n| \left| \frac{z}{z_1} \right|^n < \left| \frac{R}{z_1} \right|^n = t^n ,$   where $t = \frac{R}{|z_1|}$.
Since 0 < t < 1, the series $\sum a_n z^n$ is dominated by the convergent geometric series $\sum t^n$.
By Weierstrass' M-test, the series $\sum a_n z^n$ converges uniformly on S. This proves (b). The
argument also shows that the series $\sum a_n z^n$ converges absolutely for each z in S. But since
each z with $|z| < |z_1|$ lies in some circular disk S with radius $R < |z_1|$, this also proves
part (a).
THEOREM 11.7. EXISTENCE OF A CIRCLE OF CONVERGENCE. Assume that the power series
$\sum a_n z^n$ converges for at least one z ≠ 0, say for $z = z_1$, and that it diverges for at least one
z, say for $z = z_2$. Then there exists a positive real number r such that the series converges
absolutely if |z| < r and diverges if |z| > r.
Proof. Let A denote the set of all positive numbers |z| for which the power series
$\sum a_n z^n$ converges. The set A is not empty since, by hypothesis, it contains $|z_1|$. Also, no
number in A can exceed $|z_2|$ (because of Theorem 11.6). Hence, $|z_2|$ is an upper bound
for A. Since A is a nonempty set of positive numbers that is bounded above, it has a least
upper bound which we denote by r. It is clear that r > 0 since $r \ge |z_1|$. By the definition of
r, no number in A can exceed r. Therefore, the series diverges if |z| > r. But it is easy
to prove that the series converges absolutely if |z| < r. If |z| < r, there is a positive number
x in A such that |z| < x < r. By Theorem 11.6, the series $\sum a_n z^n$ converges absolutely.
This completes the proof.
There is, of course, a corresponding theorem for power series in z − a which may be
deduced from the case just treated by introducing the change of variable Z = z − a. The
circle of convergence has its center at a, as shown in Figure 11.4.
11.7 Exercises
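In Exercises 1 through 16, determine the radius of convergence r of the given power series. In
Exercises 1 through 10, test for convergence at the boundary points if r is finite.
1. $\sum_{n=0}^{\infty} \frac{z^n}{2^n}$.
2. $\sum_{n=0}^{\infty} \frac{z^n}{(n + 1)2^n}$.
3. $\sum_{n=0}^{\infty} \frac{(z + 3)^n}{(n + 1)2^n}$.
4. $\sum_{n=1}^{\infty} \frac{(-1)^n 2^{2n} z^{2n}}{2n}$.
5. $\sum_{n=1}^{\infty} [1 - (-2)^n] z^n$.
6. $\sum_{n=1}^{\infty} \frac{n!\,z^n}{n^n}$.
7. $\sum_{n=0}^{\infty} \frac{(-1)^n (z + 1)^n}{n^2 + 1}$.
8. $\sum_{n=0}^{\infty} a^{n^2} z^n$,   0 < a < 1.
9. $\sum_{n=1}^{\infty} \frac{(n!)^2}{(2n)!} z^n$.
10. $\sum_{n=1}^{\infty} \frac{3^{\sqrt{n}} z^n}{n}$.
11. $\sum_{n=1}^{\infty} \left( \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2 \cdot 4 \cdot 6 \cdots (2n)} \right)^3 z^n$.
12. $\sum_{n=1}^{\infty} \left( 1 + \frac{1}{n} \right)^{n^2} z^n$.
13. $\sum_{n=0}^{\infty} (\sin an) z^n$,   a > 0.
14. $\sum_{n=0}^{\infty} (\sinh an) z^n$,   a > 0.
15. $\sum_{n=1}^{\infty} \frac{z^n}{a^n + b^n}$,   a > 0, b > 0.
16. $\sum_{n=1}^{\infty} \left( \frac{a^n}{n} + \frac{b^n}{n^2} \right) z^n$,   a > 0, b > 0.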
17. If $f_n(x) = nxe^{-nx^2}$ for n = 1, 2, ... and x real, show that
$\lim_{n \to \infty} \int_0^1 f_n(x)\,dx \ne \int_0^1 \lim_{n \to \infty} f_n(x)\,dx .$
This example shows that the operations of integration and limit cannot always be interchanged.
18. Let $f_n(x) = (\sin nx)/n$, and for each fixed real x let $f(x) = \lim_{n \to \infty} f_n(x)$. Show that
$\lim_{n \to \infty} f_n'(0) \ne f'(0) .$
This example shows that the operations of differentiation and limit cannot always be interchanged.
19. Show that the series $\sum_{n=1}^{\infty} (\sin nx)/n^2$ converges for every real x, and denote its sum by f(x).
Prove that f is continuous on [0, π], and use Theorem 11.4 to prove that
$\int_0^{\pi} f(x)\,dx = 2 \sum_{n=1}^{\infty} \frac{1}{(2n - 1)^3} .$
20. It is known that
$\sum_{n=1}^{\infty} \frac{\cos nx}{n^2} = \frac{x^2}{4} - \frac{\pi x}{2} + \frac{\pi^2}{6}$   if 0 ≤ x ≤ 2π.
Use this formula and Theorem 11.4 to deduce the following formulas:
(a) $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$;
11.8 Properties of functions represented by real power series
In this section we restrict ourselves to real power series, that is, series of the form
$\sum a_n (z - a)^n$ in which z, a, and the coefficients $a_n$ are all real numbers. We also write x in
place of z. The interior of the circle of convergence intersects the real axis along an interval
(a − r, a + r) symmetrically located about a; we refer to this as the interval of convergence
of the real power series $\sum a_n (x - a)^n$. The number r is called the radius of convergence.
(See Figure 11.5.)
FIGURE 11.5 The interval of convergence of a real power series.
Each real power series defines a sum function whose value at each x in the interval of
convergence is given by
$f(x) = \sum_{n=0}^{\infty} a_n (x - a)^n .$
The series is said to represent the function f in the interval of convergence, and it is called
the power-series expansion of f about a.
There are two basic problems about power-series expansions that concern us here:
(1) Given the series, to find properties of the sum function f.
(2) Given a function f, to find whether or not it may be represented by a power series.
It turns out that only rather special functions possess power-series expansions. Nevertheless,
the class of such functions includes most examples that arise in practice, and hence their
study is of great importance. We turn now to a discussion of question (1).
Theorem 11.6 tells us that the power series converges absolutely for each x in the open
interval (a - r, a + r), and that it converges uniformly on every closed subinterval
[a - R, a + R], where 0 < R < r. Since each term of the power series is continuous on
the whole real axis, it follows from Theorem 11.2 that the sum function f is continuous
on every closed subinterval [a - R, a + R], and hence on the open interval (a - r, a + r).
Also, Theorem 11.4 tells us that we can integrate the power series term by term on every
closed subinterval [a - R, a + R]. These properties of functions represented by power
series are stated formally in the following theorem.
THEOREM 11.8. Assume a function f is represented by the power series
(11.1)    $f(x) = \sum_{n=0}^{\infty} a_n (x - a)^n$
in an open interval (a − r, a + r). Then f is continuous on this interval, and its integral over
any closed subinterval may be computed by integrating the series term by term. In particular,
for every x in (a − r, a + r), we have
$\int_a^x f(t)\,dt = \sum_{n=0}^{\infty} a_n \int_a^x (t - a)^n\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n + 1} (x - a)^{n+1} .$
Theorem 11.8 also shows that the radius of convergence of the integrated series is at
least as large as that of the original series. We will prove presently that both series have
exactly the same radius of convergence. First we show that a power series may be
differentiated term by term within its interval of convergence.
THEOREM 11.9. Let f be represented by the power series (11.1) in the interval of convergence (a − r, a + r). Then we have:
(a) The differentiated series $\sum_{n=1}^{\infty} n a_n (x - a)^{n-1}$ also has radius of convergence r.
(b) The derivative f'(x) exists for each x in the interval of convergence and is given by
$f'(x) = \sum_{n=1}^{\infty} n a_n (x - a)^{n-1} .$
Proof. For simplicity, in the proof we assume that a = 0. First we prove that the
differentiated series converges absolutely in the interval (−r, r). Choose any positive x
such that 0 < x < r, and let h be a small positive number such that 0 < x < x + h < r.
Then the series for f(x) and for f(x + h) are each absolutely convergent. Hence, we may
write
(11.2)    $\frac{f(x + h) - f(x)}{h} = \sum_{n=0}^{\infty} a_n \frac{(x + h)^n - x^n}{h} .$
The series on the right is absolutely convergent since it is a linear combination of absolutely
convergent series. Now we apply the mean-value theorem to write
$(x + h)^n - x^n = h n c_n^{n-1} ,$
where $x < c_n < x + h$. Hence, the series in (11.2) is identical to the series
(11.3)    $\sum_{n=1}^{\infty} n a_n c_n^{n-1}$
which must be absolutely convergent, since that in Equation (11.2) is. The series (11.3)
is no longer a power series, but it dominates the power series $\sum n a_n x^{n-1}$, so this latter series
must be absolutely convergent for this x. This proves that the radius of convergence of
the differentiated series $\sum n a_n x^{n-1}$ is at least as large as r. On the other hand, the radius of
convergence of the differentiated series cannot exceed r because the differentiated series
dominates the original series $\sum a_n x^n$. This proves part (a).
To prove part (b), let g be the sum function of the differentiated series,
$g(x) = \sum_{n=1}^{\infty} n a_n x^{n-1} .$
Applying Theorem 11.8 to g, we may integrate term by term in the interval of convergence
to obtain
$\int_0^x g(t)\,dt = \sum_{n=1}^{\infty} a_n x^n = f(x) - a_0 .$
Since g is continuous, the first fundamental theorem of calculus tells us that f'(x) exists
and equals g(x) for each x in the interval of convergence. This proves (b).
Note: Since every power series $\sum a_n (x - a)^n$ can be obtained by differentiating its
integrated series, $\sum a_n (x - a)^{n+1}/(n + 1)$, Theorem 11.9 tells us that both these series
have the same radius of convergence.
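Term-by-term differentiation can be tried on the geometric series $\sum x^n = 1/(1 - x)$, whose derivative is $1/(1 - x)^2$ for |x| < 1. The sketch below is an assumed illustration, not an example from the text: it compares a long partial sum of the differentiated series $\sum n x^{n-1}$ with that closed form.

```python
def geometric_partial_sum(x, n):
    """Partial sum of sum x^k for k = 0..n."""
    return sum(x ** k for k in range(n + 1))

def derivative_series_partial_sum(x, n):
    """Partial sum of the term-by-term derivative, sum k x^(k-1) for k = 1..n."""
    return sum(k * x ** (k - 1) for k in range(1, n + 1))

for x in (-0.5, 0.25, 0.5):
    # Columns: x, differentiated series, closed-form derivative 1/(1 - x)^2.
    print(x, derivative_series_partial_sum(x, 200), 1.0 / (1.0 - x) ** 2)
```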
Theorems 11.8 and 11.9 justify the formal manipulations of Section 10.8 where we
obtained various power-series expansions using term-by-term differentiation and integration
of the geometric series. In particular, these theorems establish the validity of the expansions
$\log (1 + x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{n+1}}{n + 1}$   and   $\arctan x = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2n + 1}$
whenever x is in the open interval −1 < x < 1.
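Both expansions are easy to test numerically inside the interval −1 < x < 1. The following sketch sums a fixed number of terms (the count is an arbitrary assumption) and compares the result with the library functions.

```python
import math

def log1p_series(x, terms):
    """Partial sum of sum (-1)^n x^(n+1)/(n+1), the expansion of log(1 + x)."""
    return sum((-1) ** n * x ** (n + 1) / (n + 1) for n in range(terms))

def arctan_series(x, terms):
    """Partial sum of sum (-1)^n x^(2n+1)/(2n+1), the expansion of arctan x."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

for x in (-0.5, 0.1, 0.5):
    print(x, log1p_series(x, 60), math.log(1.0 + x), arctan_series(x, 60), math.atan(x))
```

Close to the endpoints ±1 many more terms are needed, since the geometric decay of the terms becomes very slow there.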
As a further consequence of Theorem 11.9, we conclude that the sum function of a power
series has derivatives of every order and they may be obtained by repeated term-by-term
differentiation of the power series. If $f(x) = \sum a_n (x - a)^n$ and if we differentiate this
formula k times and then put x = a in the result, we find that $f^{(k)}(a) = k!\,a_k$, so the kth
coefficient $a_k$ is given by the formula
$a_k = \frac{f^{(k)}(a)}{k!}$   for k = 1, 2, 3, ....
This formula also holds for k = 0 if we interpret $f^{(0)}(a)$ to mean f(a). Thus, the power-series expansion of f has the form
(11.4)    $f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x - a)^k .$
This property can be formulated as a uniqueness theorem for power-series expansions.
THEOREM 11.10. If two power series $\sum a_n (x - a)^n$ and $\sum b_n (x - a)^n$ have the same sum
function f in some neighborhood of the point a, then the two series are equal term by term; in
fact, we have $a_n = b_n = f^{(n)}(a)/n!$ for each n ≥ 0.
Equation (11.4) also shows that the partial sums of a power series are simply the Taylor
polynomials of the sum function at a. In other words, if a function f is representable by a
power series in an interval (a − r, a + r), then the sequence of Taylor polynomials
$\{T_n f(x; a)\}$ generated by f at a converges pointwise in this interval to the sum function f.
Moreover, the convergence is uniform in every closed subinterval of the interval of
convergence.
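The relation $f^{(k)}(a) = k!\,a_k$ can be illustrated on a finite coefficient list, treating the list as the coefficients of powers of (x − a). The sketch below differentiates term by term k times and reads off the constant term, which is the value of the kth derivative at x = a. The coefficient values and function names are arbitrary illustrative choices.

```python
from math import factorial

def differentiate(coeffs):
    """Term-by-term derivative of sum coeffs[n] * (x - a)^n, returned as a coefficient list."""
    return [n * c for n, c in enumerate(coeffs)][1:]

def kth_derivative_at_center(coeffs, k):
    """Value at x = a of the k-th derivative, i.e. the constant term after k differentiations."""
    for _ in range(k):
        coeffs = differentiate(coeffs)
    return coeffs[0] if coeffs else 0

coeffs = [3, -1, 4, 1, -5, 9]  # a_0, ..., a_5 of a sample polynomial
for k in range(len(coeffs)):
    # The two printed numbers agree: f^(k)(a) and k! * a_k.
    print(k, kth_derivative_at_center(coeffs, k), factorial(k) * coeffs[k])
```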
11.9 The Taylor's series generated by a function
We turn now to the second problem raised at the beginning of the foregoing section.
That is, given a function f, to find whether or not it has a power series expansion in some
open interval about a point a.
We know from what was just proved that such a function must necessarily have derivatives
of every order in some open interval about a and that the coefficients of its power-series
expansion are given by Equation (11.4). Suppose, then, that we start with a function f
having derivatives of every order in an open interval about a. We call such a function
infinitely differentiable in this interval. Then we can certainly form the power series
(11.5)    $\sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x - a)^k .$
This is called the Taylor's series generated by f at a. We now ask two questions: Does this
series converge for any x other than x = a? If so, is its sum equal to f(x)? Surprisingly
enough, the answer to both questions is, in general, "no." The series may or may not
converge for x ≠ a and, if it does converge, its sum may or may not be f(x). An example
where the series converges to a sum different from f(x) is given in Exercise 24 in Section
11.13.
A necessary and sufficient condition for answering both questions in the affirmative can
be given by using Taylor's formula with remainder, which provides a finite expansion of
the form
(11.6)    $f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k + E_n(x) .$
The finite sum is the Taylor polynomial of degree n generated by f at a, and $E_n(x)$ is the
error made in approximating f by its Taylor polynomial. If we let n → ∞ in (11.6), we
see that the power series (11.5) will converge to f(x) if and only if the error term tends to 0.
In the next section we discuss a useful sufficient condition for the error term to tend to 0.
11.10 A sufficient condition for convergence of a Taylor's series
In Theorem 7.6 we proved that the error term in Taylor's formula could be expressed
as an integral,
(11.7)    $E_n(x) = \frac{1}{n!} \int_a^x (x - t)^n f^{(n+1)}(t)\,dt$
in any interval about a in which $f^{(n+1)}$ is continuous. Therefore, if f is infinitely differentiable,
we always have this representation of the error so the Taylor's series converges to f(x) if
and only if this integral tends to 0 as n → ∞.
The integral can be put into a slightly more useful form by a change of variable. We
write
t = x + (a − x)u,   dt = −(x − a) du,
and note that u varies from 1 to 0 as t varies from a to x. Therefore, the integral in (11.7)
becomes
(11.8)    $E_n(x) = \frac{(x - a)^{n+1}}{n!} \int_0^1 u^n f^{(n+1)}[x + (a - x)u]\,du .$
This form of the error enables us to give the following sufficient condition for convergence
of a Taylor's series.
THEOREM 11.11. Assume f is infinitely differentiable in an open interval I = (a − r, a + r),
and assume that there is a positive constant A such that
(11.9)    $|f^{(n)}(x)| \le A^n$   for n = 1, 2, 3, ...,
and every x in I. Then the Taylor's series generated by f at a converges to f(x) for each x in I.
Proof. Using the inequality (11.9) in the integral formula (11.8), we obtain the estimate
$0 \le |E_n(x)| \le \frac{|x - a|^{n+1}}{n!} A^{n+1} \int_0^1 u^n\,du = \frac{|x - a|^{n+1} A^{n+1}}{(n + 1)!} = \frac{B^{n+1}}{(n + 1)!} ,$
where B = A|x − a|. But for every B, $B^n/n!$ tends to 0 as n → ∞, so $E_n(x) \to 0$ for each
x in I.
11.11 Power-series expansions for the exponential and trigonometric functions
The sine and cosine functions and all their derivatives are bounded by 1 over the entire
real axis. Therefore, inequality (11.9) holds with A = 1 if f(x) = sin x or if f(x) = cos x,
and we have the power-series expansions
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots + (-1)^{n-1} \frac{x^{2n-1}}{(2n - 1)!} + \cdots ,$
$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots + (-1)^n \frac{x^{2n}}{(2n)!} + \cdots ,$
valid for every real x. For the exponential function, $f(x) = e^x$, we have $f^{(n)}(x) = e^x$ for all
x, so in any finite interval (−r, r) we have $e^x \le e^r$. Therefore, (11.9) is satisfied with
$A = e^r$. Since r is arbitrary, this shows that the following power-series expansion is valid
for all real x:
$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots .$
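The estimate from the proof of Theorem 11.11 can be watched at work on the sine series. With a = 0 and A = 1 the error of the Taylor polynomial of degree n is at most $|x|^{n+1}/(n + 1)!$. The sketch below (helper names are illustrative) prints the actual error next to this bound at x = 2.

```python
import math

def taylor_sin(x, n):
    """Taylor polynomial of degree <= n generated by the sine at 0."""
    total = 0.0
    for k in range(1, n + 1, 2):  # only odd powers appear
        total += (-1) ** ((k - 1) // 2) * x ** k / math.factorial(k)
    return total

def error_bound(x, n, A=1.0):
    """B^(n+1)/(n+1)! with B = A*|x - a| and a = 0, as in the proof of Theorem 11.11."""
    return (A * abs(x)) ** (n + 1) / math.factorial(n + 1)

for n in (1, 3, 5, 7, 9, 11):
    # Columns: degree, actual error at x = 2, bound on the error.
    print(n, abs(math.sin(2.0) - taylor_sin(2.0, n)), error_bound(2.0, n))
```

The bound is not sharp, but it tends to 0 for every fixed x, which is all the theorem needs.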
The foregoing power-series expansions for the sine and cosine can be used as the starting
point for a completely analytic treatment of the trigonometric functions. If we use these
series as definitions of the sine and cosine, it is possible to derive all the familiar algebraic
and analytic properties of the trigonometric functions from these series alone. For example,
the series immediately give us the formulas
sin 0 = 0,   cos 0 = 1,   sin (−x) = −sin x,   cos (−x) = cos x,
D sin x = cos x,   D cos x = −sin x.
The addition formulas may be derived by the following simple device: Let u and v be
new functions defined by the equations
u(x) = sin (x + a) − sin x cos a − cos x sin a,
v(x) = cos (x + a) − cos x cos a + sin x sin a,
where a is a fixed real number, and let $f(x) = [u(x)]^2 + [v(x)]^2$. Then it is easy to verify
that u'(x) = v(x) and v'(x) = −u(x), and so f'(x) = 0 for all x. Therefore, f is a constant
and, since f(0) = 0, we must have f(x) = 0 for all x. This implies u(x) = v(x) = 0 for
all x or, in other words,
sin (x + a) = sin x cos a + cos x sin a,
cos (x + a) = cos x cos a − sin x sin a.
The number π may be introduced as the smallest positive x such that sin x = 0 (such an
x can be shown to exist) and then it can be shown that the sine and cosine are periodic
with period 2π, that sin (½π) = 1, and that cos (½π) = 0. The details, which we shall not
present here, may be found in the book Theory and Application of Infinite Series by
K. Knopp (New York: Hafner, 1951).
*11.12 Bernstein's theorem
Theorem 11.11 shows that the Taylor's series of a function f converges if the nth derivative
$f^{(n)}$ grows no faster than the nth power of some positive number. Another sufficient
condition for convergence was formulated by the Russian mathematician Sergei N.
Bernstein (1880- ).
THEOREM 11.12. BERNSTEIN'S THEOREM. Assume f and all its derivatives are nonnegative
on a closed interval [0, r]. That is, assume that
f(x) ≥ 0   and   $f^{(n)}(x) \ge 0$
for each x in [0, r] and each n = 1, 2, 3, .... Then, if 0 ≤ x < r, the Taylor's series
$\sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k$
converges to f(x).
Proof. The result holds trivially for x = 0, so we assume that 0 < x < r. We use
Taylor's formula with remainder to write
(11.10)    $f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k + E_n(x) .$
We will prove that the error term satisfies the inequalities
(11.11)    $0 \le E_n(x) \le \left( \frac{x}{r} \right)^{n+1} f(r) .$
This, in turn, shows that $E_n(x) \to 0$ as n → ∞ since the quotient $(x/r)^{n+1} \to 0$ when
0 < x < r.
To prove (11.11), we use the integral form of the error as given in Equation (11.8) with
a = 0:
$E_n(x) = \frac{x^{n+1}}{n!} \int_0^1 u^n f^{(n+1)}(x - xu)\,du .$
This formula is valid for each x in the closed interval [0, r]. If x ≠ 0, let
$F_n(x) = \frac{E_n(x)}{x^{n+1}} = \frac{1}{n!} \int_0^1 u^n f^{(n+1)}(x - xu)\,du .$
The function $f^{(n+1)}$ is monotonic increasing in the interval [0, r] since its derivative is
nonnegative. Therefore, we have
$f^{(n+1)}(x - xu) = f^{(n+1)}[x(1 - u)] \le f^{(n+1)}[r(1 - u)]$
if 0 ≤ u ≤ 1, which implies that $F_n(x) \le F_n(r)$ if 0 < x ≤ r. In other words, we have
$E_n(x)/x^{n+1} \le E_n(r)/r^{n+1}$ or
(11.12)    $E_n(x) \le \left( \frac{x}{r} \right)^{n+1} E_n(r) .$
Setting x = r in Equation (11.10), we see that $E_n(r) \le f(r)$ because each term in the sum
is nonnegative. Using this in (11.12), we obtain (11.11) which, in turn, completes the proof.
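The inequalities (11.11) can be checked on a function that satisfies the hypotheses of Bernstein's theorem. The exponential function is used below as an assumed example: it and all its derivatives are nonnegative on any interval [0, r]. The values r = 2 and x = 1.5 and the helper names are arbitrary illustrative choices.

```python
import math

def taylor_error_exp(x, n):
    """E_n(x) = e^x - sum_{k<=n} x^k/k!, the Taylor error for f(x) = e^x at 0."""
    return math.exp(x) - sum(x ** k / math.factorial(k) for k in range(n + 1))

def bernstein_bound(x, r, n):
    """Right-hand side of (11.11) for f = exp: (x/r)^(n+1) * f(r)."""
    return (x / r) ** (n + 1) * math.exp(r)

r, x = 2.0, 1.5
for n in (0, 5, 10, 20):
    # Columns: degree, actual error E_n(x), Bernstein bound.
    print(n, taylor_error_exp(x, n), bernstein_bound(x, r, n))
```

The bound decays like a geometric sequence in x/r, which is much slower than the true error for this function but still forces the error to 0.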
11.13 Exercises
For each of the power series in Exercises 1 through 10 determine the set of all real x for which
the series converges and compute the sum of the series. The power-series expansions given earlier
in the text may be used whenever it is convenient to do so.
1. $\sum_{n=0}^{\infty} (-1)^n x^{2n}$.
2. $\sum_{n=0}^{\infty} \frac{x^n}{3^{n+1}}$.
3. $\sum_{n=0}^{\infty} n x^n$.
4. $\sum_{n=0}^{\infty} (-1)^n n x^n$.
5. $\sum_{n=0}^{\infty} (-2)^n \frac{n+2}{n+1}\, x^n$.
6. $\sum_{n=1}^{\infty} \frac{2^n x^n}{n}$.
7. $\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} \left(\frac{x}{2}\right)^{2n}$.
8. $\sum_{n=0}^{\infty} \frac{(-1)^n x^{3n}}{n!}$.
9. $\sum_{n=0}^{\infty} \frac{x^n}{(n+3)!}$.
10. $\sum_{n=0}^{\infty} \frac{(x-1)^n}{(n+2)!}$.
Each of the functions in Exercises 11 through 21 has a power-series representation in powers of x.
Assume the existence of the expansion, verify that the coefficients have the form given, and show
that the series converges for the values of x indicated. The expansions given earlier in the text may
be used whenever it is convenient to do so.
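11. $a^x = \sum_{n=0}^{\infty} \frac{(\log a)^n}{n!}\, x^n$,  a > 0  (all x).  [Hint: $a^x = e^{x \log a}$.]
12. $\sinh x = \sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n+1)!}$  (all x).
13. $\sin^2 x = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{2^{2n-1}}{(2n)!}\, x^{2n}$  (all x).  [Hint: $\cos 2x = 1 - 2\sin^2 x$.]
14. $\frac{1}{2-x} = \sum_{n=0}^{\infty} \frac{x^n}{2^{n+1}}$  (|x| < 2).
15. $e^{-x^2} = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{n!}$  (all x).
16. $\sin^3 x = \frac{3}{4} \sum_{n=1}^{\infty} (-1)^{n+1} \frac{3^{2n} - 1}{(2n+1)!}\, x^{2n+1}$  (all x).
17. $\log \sqrt{\frac{1+x}{1-x}} = \sum_{n=0}^{\infty} \frac{x^{2n+1}}{2n+1}$  (|x| < 1).
18. $\frac{x}{1 + x - 2x^2} = \frac{1}{3} \sum_{n=1}^{\infty} [1 - (-2)^n]\, x^n$  (|x| < 1/2).
[Hint: $3x/(1 + x - 2x^2) = 1/(1 - x) - 1/(1 + 2x)$.]
19. $\frac{12 - 5x}{6 - 5x - x^2} = \sum_{n=0}^{\infty} \left(1 + \frac{(-1)^n}{6^n}\right) x^n$  (|x| < 1).
20. $\frac{1}{x^2 + x + 1} = \frac{2}{\sqrt{3}} \sum_{n=0}^{\infty} \sin\frac{2\pi(n+1)}{3}\; x^n$  (|x| < 1).
[Hint: $x^3 - 1 = (x - 1)(x^2 + x + 1)$.]
21. $\frac{x}{(1 - x)(1 - x^2)} = \frac{1}{2} \sum_{n=1}^{\infty} \left(n + \frac{1 - (-1)^n}{2}\right) x^n$  (|x| < 1).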
22. Determine the coefficient $a_{98}$ in the power-series expansion $\sin(2x + \tfrac{1}{4}\pi) = \sum_{n=0}^{\infty} a_n x^n$.
23. Let $f(x) = (2 + x^2)^{5/2}$. Determine the coefficients $a_0, a_1, \ldots, a_4$ in the Taylor's series
generated by f at 0.
24. Let $f(x) = e^{-1/x^2}$ if $x \ne 0$, and let f(0) = 0.
(a) Show that f has derivatives of every order everywhere on the real axis.
(b) Show that $f^{(n)}(0) = 0$ for all $n \ge 1$. This example shows that the Taylor's series generated
by f about the point 0 converges everywhere on the real axis, but that it represents f only at
the origin.
11.14 Power series and differential equations
Power series sometimes enable us to obtain solutions of differential equations when
other methods fail. A systematic discussion of the use of power series in the theory of
linear second-order differential equations is given in Volume II. Here we illustrate with
an example some of the ideas and techniques involved.
Consider the second-order differential equation
(11.13)    $(1 - x^2)\, y'' = -2y$.
Assume there exists a solution, say y = f(x), which may be represented by a power-series
expansion in some neighborhood of the origin, say
(11.14)    $y = \sum_{n=0}^{\infty} a_n x^n$.
The first thing we do is determine the coefficients $a_0, a_1, a_2, \ldots$.
One way to proceed is this: Differentiating (11.14) twice, we obtain
$$y'' = \sum_{n=2}^{\infty} n(n - 1)\, a_n x^{n-2}.$$
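Multiplying by $1 - x^2$, we find that
(11.15)
$$(1 - x^2)\, y'' = \sum_{n=2}^{\infty} n(n - 1)\, a_n x^{n-2} - \sum_{n=2}^{\infty} n(n - 1)\, a_n x^n$$
$$= \sum_{n=0}^{\infty} (n + 2)(n + 1)\, a_{n+2} x^n - \sum_{n=0}^{\infty} n(n - 1)\, a_n x^n$$
$$= \sum_{n=0}^{\infty} \left[(n + 2)(n + 1)\, a_{n+2} - n(n - 1)\, a_n\right] x^n.$$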
Substituting each of the series (11.14) and (11.15) in the differential equation, we obtain
an equation involving two power series, valid in some neighborhood of the origin. By
the uniqueness theorem, these power series must be equal term by term. Therefore we
may equate coefficients of $x^n$ and obtain the relation
$$(n + 2)(n + 1)\, a_{n+2} - n(n - 1)\, a_n = -2a_n$$
or, what amounts to the same thing,
$$a_{n+2} = \frac{n^2 - n - 2}{(n + 2)(n + 1)}\, a_n = \frac{n - 2}{n + 2}\, a_n.$$
This relation enables us to determine $a_2, a_4, a_6, \ldots$ successively in terms of $a_0$. Similarly,
we can compute $a_3, a_5, a_7, \ldots$ in terms of $a_1$. For the coefficients with even subscripts,
we find that
$$a_2 = -a_0, \qquad a_4 = 0 \cdot a_2 = 0, \qquad a_6 = a_8 = a_{10} = \cdots = 0.$$
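The odd coefficients are
$$a_3 = \frac{1 - 2}{1 + 2}\, a_1 = \frac{-1}{3}\, a_1, \qquad a_5 = \frac{3 - 2}{3 + 2}\, a_3 = \frac{1}{5} \cdot \frac{(-1)}{3}\, a_1,$$
$$a_7 = \frac{5 - 2}{5 + 2}\, a_5 = \frac{3}{7} \cdot \frac{1}{5} \cdot \frac{(-1)}{3}\, a_1 = \frac{-1}{7 \cdot 5}\, a_1$$
and, in general,
$$a_{2n+1} = \frac{2n - 3}{2n + 1}\, a_{2n-1} = \frac{2n - 3}{2n + 1} \cdot \frac{2n - 5}{2n - 1} \cdot \frac{2n - 7}{2n - 3} \cdots \frac{3}{7} \cdot \frac{1}{5} \cdot \frac{(-1)}{3}\, a_1.$$
When the common factors are canceled, this simplifies to
$$a_{2n+1} = \frac{-1}{(2n + 1)(2n - 1)}\, a_1.$$
Therefore, the series for y can be written as follows:
$$y = a_0 (1 - x^2) - a_1 \sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n + 1)(2n - 1)}.$$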
The ratio test may be used to verify the convergence of this series for |x| < 1. The work
just carried out shows that the series actually satisfies the differential equation in (11.13),
where $a_0$ and $a_1$ may be thought of as arbitrary constants. The reader should note that
in this particular example the polynomial which multiplies $a_0$ is itself a solution of (11.13),
and the series which multiplies $a_1$ is another solution.
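The recursion for the coefficients lends itself to a quick machine check. The Python sketch
below generates the coefficients from $a_{n+2} = \frac{n-2}{n+2} a_n$ with exact rational arithmetic,
compares the odd ones with the closed form $-a_1/((2n+1)(2n-1))$, and confirms that each
coefficient of $(1 - x^2)y'' + 2y$ vanishes. The function names and the starting values
$a_0 = a_1 = 1$ are assumptions chosen only for illustration.

```python
from fractions import Fraction

def coefficients(a0, a1, count):
    """Return [a_0, ..., a_{count-1}] generated by a_{n+2} = (n - 2)/(n + 2) * a_n."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(count - 2):
        a.append(Fraction(n - 2, n + 2) * a[n])
    return a[:count]

def residual(a, terms):
    """Coefficients of x^0, ..., x^(terms-1) in the series for (1 - x^2) y'' + 2y."""
    return [(n + 2) * (n + 1) * a[n + 2] - n * (n - 1) * a[n] + 2 * a[n]
            for n in range(terms)]

a = coefficients(1, 1, 12)        # illustrative choice a_0 = a_1 = 1
print(a[:8])                      # even coefficients beyond a_2 are zero
print(residual(a, 10))            # every entry should be zero
```

Exact fractions are used so that the comparison with the closed form is an equality rather
than an approximate agreement.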
The procedure just described is called the method of undetermined coefficients. Another
way to find these coefficients is to use the formula
$$a_n = \frac{f^{(n)}(0)}{n!} \qquad \text{if } y = f(x).$$
Sometimes the higher derivatives of y at the origin can be computed directly from the
differential equation. For example, setting x = 0 in (11.13) we immediately obtain
$$f''(0) = -2f(0) = -2a_0,$$
and hence we have
$$a_2 = \frac{f''(0)}{2!} = -a_0.$$
To find the higher derivatives, we differentiate the differential equation to obtain
(11.16)    $(1 - x^2)\, y''' - 2x y'' = -2y'$.
Putting x = 0, we see that $f'''(0) = -2f'(0) = -2a_1$, and hence $a_3 = f'''(0)/3! = -a_1/3$.
Differentiation of (11.16) leads to the equation
$$(1 - x^2)\, y^{(4)} - 4x y''' = 0.$$
When x = 0, this yields $f^{(4)}(0) = 0$, and hence $a_4 = 0$. Repeating the process once more,
we find
$$(1 - x^2)\, y^{(5)} - 6x y^{(4)} - 4y''' = 0, \qquad f^{(5)}(0) = 4f'''(0) = -8a_1, \qquad a_5 = \frac{f^{(5)}(0)}{5!} = -\frac{a_1}{15}.$$
It is clear that the process may be continued as long as desired.
11.15 The binomial series
We can also use our knowledge of differential equations to determine the sums of certain
power series. For example, we shall use the existence-uniqueness theorem for first-order
linear differential equations to prove that the binomial series expansion
(11.17)    $(1 + x)^{\alpha} = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n$
is valid in the interval |x| < 1. Here the exponent $\alpha$ is an arbitrary real number and $\binom{\alpha}{n}$
denotes the binomial coefficient defined by
(11.18)    $\binom{\alpha}{n} = \frac{\alpha(\alpha - 1) \cdots (\alpha - n + 1)}{n!}$.
When $\alpha$ is a nonnegative integer, all but a finite number of the coefficients $\binom{\alpha}{n}$ are zero, and
the series reduces to a polynomial of degree $\alpha$, giving us the familiar binomial theorem.
To prove (11.17) for an arbitrary real $\alpha$, we first use the ratio test to find that the series
converges absolutely in the open interval -1 < x < 1. Then we define a function f by
means of the equation
(11.19)    $f(x) = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n$    if |x| < 1.
We then show that f is a solution of the linear differential equation
(11.20)    $y' - \frac{\alpha}{x + 1}\, y = 0$
and satisfies the initial condition f(0) = 1. Theorem 8.3 tells us that in any interval not
containing the point x = -1 there is only one solution of this differential equation with
y = 1 when x = 0. Since $y = (1 + x)^{\alpha}$ is such a solution, it follows that $f(x) = (1 + x)^{\alpha}$
if -1 < x < 1.
Therefore, to prove (11.17) we need only show that f satisfies the differential equation
(11.20). For this purpose, we require the following property of the binomial coefficients:
$$(n + 1) \binom{\alpha}{n + 1} = (\alpha - n) \binom{\alpha}{n}.$$
This property, which is an immediate consequence of the definition in (11.18), holds for
every real $\alpha$ and every integer $n \ge 0$. It can also be expressed in the form
(11.21)    $(n + 1) \binom{\alpha}{n + 1} + n \binom{\alpha}{n} = \alpha \binom{\alpha}{n}$.
Differentiation of (11.19) gives us
$$f'(x) = \sum_{n=1}^{\infty} n \binom{\alpha}{n} x^{n-1} = \sum_{n=0}^{\infty} (n + 1) \binom{\alpha}{n + 1} x^n,$$
from which we find that
$$(1 + x) f'(x) = \sum_{n=0}^{\infty} \left\{ (n + 1) \binom{\alpha}{n + 1} + n \binom{\alpha}{n} \right\} x^n = \alpha \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n = \alpha f(x),$$
because of (11.21). This shows that f satisfies the differential equation (11.20) and this, in
turn, proves (11.17).
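A short numerical experiment makes the binomial series concrete. The Python sketch below
computes the generalized binomial coefficient of (11.18) by a running product, checks the
identity (11.21) for a few values of n, and compares a partial sum of the series with
$(1 + x)^{\alpha}$. The function names, the exponent $\alpha = 1/2$, the point x = 0.3, and the
number of terms are assumptions made only for this illustration.

```python
def binom(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1)/n!, as in (11.18)."""
    result = 1.0
    for k in range(n):
        result *= (alpha - k) / (k + 1)
    return result

def binomial_series(alpha, x, terms):
    """Partial sum of the binomial series: sum of binom(alpha, n) x^n for n < terms."""
    return sum(binom(alpha, n) * x**n for n in range(terms))

alpha, x = 0.5, 0.3                      # illustrative values with |x| < 1
approx = binomial_series(alpha, x, 40)
exact = (1 + x) ** alpha
print(approx, exact)

# Identity (11.21): (n+1) C(alpha, n+1) + n C(alpha, n) = alpha C(alpha, n)
for n in range(5):
    print(n, (n + 1) * binom(alpha, n + 1) + n * binom(alpha, n), alpha * binom(alpha, n))
```

When the exponent is a nonnegative integer the running product hits a zero factor, so the
coefficients vanish from that point on and the series reduces to a polynomial.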
11.16 Exercises
1. The differential equation $(1 - x^2)\, y'' - 2x y' + 6y = 0$ has a power-series solution
$f(x) = \sum_{n=0}^{\infty} a_n x^n$ with f(0) = 1 and f'(0) = 0. Use the method of undetermined coefficients to obtain
a recursion formula relating $a_{n+2}$ to $a_n$. Determine $a_n$ explicitly for each n and find the sum
of the series.
2. Do the same as in Exercise 1 for the differential equation $(1 - x^2)\, y'' - 2x y' + 12y = 0$ and
the initial conditions f(0) = 0, f'(0) = 2.
In each of Exercises 3 through 9, the power series is used to define the function f. Determine
the interval of convergence in each case and show that f satisfies the differential equation indicated,
where y = f(x). In Exercises 6 through 9, solve the differential equation and thereby obtain the
sum of the series.
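3. $f(x) = \sum_{n=0}^{\infty} \frac{x^{4n}}{(4n)!}$;    $\frac{d^4 y}{dx^4} = y$.
4. $f(x) = \sum_{n=0}^{\infty} \frac{x^n}{(n!)^2}$;    $x y'' + y' - y = 0$.
5. $f(x) = 1 + \sum_{n=1}^{\infty} \frac{1 \cdot 4 \cdot 7 \cdots (3n - 2)}{(3n)!}\, x^{3n}$;    $y'' = x^a y + b$.  (Find a and b.)
6. $f(x) = \sum_{n=0}^{\infty} \frac{x^{2n}}{n!}$;    $y' = 2xy$.
7. $f(x) = \sum_{n=2}^{\infty} \frac{x^n}{n!}$;    $y' = x + y$.
8. $f(x) = \sum_{n=0}^{\infty} \frac{(-1)^n (2x)^{2n}}{(2n)!}$;    $y'' + 4y = 0$.
9. $f(x) = x + \sum_{n=0}^{\infty} \frac{(3x)^{2n+1}}{(2n + 1)!}$;    $y'' = 9(y - x)$.
10. The functions $J_0$ and $J_1$ defined by the series
$$J_0(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(n!)^2\, 2^{2n}}, \qquad J_1(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{n!\,(n + 1)!\, 2^{2n+1}}$$
are called Bessel functions of the first kind of orders zero and one, respectively. These functions
arise in many problems in pure and applied mathematics. Show (a) both series converge
for all real x; (b) $J_0'(x) = -J_1(x)$; (c) $j_0(x) = j_1'(x)$, where $j_0(x) = x J_0(x)$ and $j_1(x) = x J_1(x)$.
11. The differential equation
$$x^2 y'' + x y' + (x^2 - n^2)\, y = 0$$
is called Bessel's equation. Show that $J_0$ and $J_1$ (as defined in Exercise 10) are solutions when
n = 0 and 1, respectively.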
In each of Exercises 12, 13, and 14, assume the given differential equation has a power-series
solution and find the first four nonzero terms.
12. $y' = x^2 + y^2$, with y = 1 when x = 0.
13. $y' = 1 + x y^2$, with y = 0 when x = 0.
14. $y' = x + y^2$, with y = 0 when x = 0.
In Exercises 15, 16, and 17, assume the given differential equation has a power-series solution of
the form $y = \sum a_n x^n$, and determine the nth coefficient $a_n$.
15. $y' = \alpha y$.
16. $y'' = x y$.
17. $y'' + x y' + y = 0$.
18. Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$, where $a_0 = 1$ and the remaining coefficients are determined by the
identity
$$e^{-2x} = \sum_{n=0}^{\infty} \{2a_n + (n + 1)\, a_{n+1}\}\, x^n.$$
Compute $a_1$, $a_2$, $a_3$, and find the sum of the series for f(x).
19. Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$, where the coefficients $a_n$ are determined by the relation
$$\cos x = \sum_{n=0}^{\infty} a_n (n + 2)\, x^n.$$
Compute $a_5$, $a_6$, and $f(\pi)$.
20. (a) Show that the first six terms of the binomial series for $(1 - x)^{-1/2}$ are:
$$1 + \frac{1}{2} x + \frac{3}{8} x^2 + \frac{5}{16} x^3 + \frac{35}{128} x^4 + \frac{63}{256} x^5.$$
(b) Let $a_n$ denote the nth term of this series when x = 1/50, and let $r_n$ denote the remainder
after n terms; that is, for $n \ge 0$ let
$$r_n = a_{n+1} + a_{n+2} + a_{n+3} + \cdots.$$
Show that $0 < r_n < a_n/49$. [Hint: Show that $a_{n+1} < a_n/50$, and dominate
$r_n$ by a suitable geometric series.]
(c) Verify the identity
$$\sqrt{2} = \frac{7}{5} \left(1 - \frac{1}{50}\right)^{-1/2}$$
and use it to compute the first ten correct decimals of $\sqrt{2}$.
[Hint: Use parts (a) and (b), retain twelve decimals during the calculations,
and take into account round-off errors.]
21. (a) Show that
$$\sqrt{3} = \frac{1732}{1000} \left(1 - \frac{176}{3\,000\,000}\right)^{-1/2}.$$
(b) Proceed as suggested in Exercise 20 and compute the first fifteen correct decimals of $\sqrt{3}$.
22. Integrate the binomial series for $(1 - x^2)^{-1/2}$ and thereby obtain the power-series expansion
$$\arcsin x = x + \sum_{n=1}^{\infty} \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2 \cdot 4 \cdot 6 \cdots (2n)} \cdot \frac{x^{2n+1}}{2n + 1} \qquad (|x| < 1).$$
12
VECTOR ALGEBRA
12.1 Historical introduction
In the foregoing chapters we have presented many of the basic concepts of calculus
and have illustrated their use in solving a few relatively simple geometrical and physical
problems. Further applications of the calculus require a deeper knowledge of analytic
geometry than has been presented so far, and therefore we turn our attention to a more
detailed investigation of some fundamental geometric ideas.
As we have pointed out earlier in this book, calculus and analytic geometry were
intimately related throughout their historical development. Every new discovery in one
subject led to an improvement in the other. The problem of drawing tangents to curves
resulted in the discovery of the derivative; that of area led to the integral; and partial
derivatives were introduced to investigate curved surfaces in space. Along with these
accomplishments came other parallel developments in mechanics and mathematical
physics. In 1788 Lagrange published his masterpiece Mécanique analytique (Analytical
Mechanics) which showed the great flexibility and tremendous power attained by using
analytical methods in the study of mechanics. Later on, in the 19th Century, the Irish
mathematician William Rowan Hamilton (1805-1865) introduced his Theory of Quaternions,
a new method and a new point of view that contributed much to the understanding of both
algebra and physics. The best features of quaternion analysis and Cartesian geometry were
later united, largely through the efforts of J. W. Gibbs (1839-1903) and O. Heaviside
(1850-1925), and a new subject called vector algebra sprang into being. It was soon realized
that vectors are the ideal tools for the exposition and simplification of many important
ideas in geometry and physics. In this chapter we propose to discuss the elements of vector
algebra. Applications to analytic geometry are given in Chapter 13. In Chapter 14 vector
algebra is combined with the methods of calculus, and applications are given to both
geometry and mechanics.
There are essentially three different ways to introduce vector algebra: geometrically,
analytically, and axiomatically. In the geometric approach, vectors are represented by
directed line segments, or arrows. Algebraic operations on vectors, such as addition,
subtraction, and multiplication by real numbers, are defined and studied by geometric
methods.
In the analytic approach, vectors and vector operations are described entirely in terms
of numbers, called components. Properties of the vector operations are then deduced from
corresponding properties of numbers. The analytic description of vectors arises naturally
from the geometric description as soon as a coordinate system is introduced.
In the axiomatic approach, no attempt is made to describe the nature of a vector or of
the algebraic operations on vectors. Instead, vectors and vector operations are thought
of as undefined concepts of which we know nothing except that they satisfy a certain set of
axioms. Such an algebraic system, with appropriate axioms, is called a linear space or a
linear vector space. Examples of linear spaces occur in all branches of mathematics, and
we will study many of them in Chapter 15. The algebra of directed line segments and the
algebra of vectors described by components are merely two examples of linear spaces.
The study of vector algebra from the axiomatic point of view is perhaps the most
mathematically satisfactory approach to use since it furnishes a description of vectors that
is free of coordinate systems and free of any particular geometric representation. This
study is carried out in detail in Chapter 15. In this chapter we base our treatment on the
analytic approach, and we also use directed line segments to interpret many of the results
geometrically. When possible, we give proofs by coordinate-free methods. Thus, this
chapter serves to provide familiarity with important concrete examples of vector spaces,
and it also motivates the more abstract approach in Chapter 15.
12.2 The vector space of n-tuples of real numbers
The idea of using a number to locate a point on a line was known to the ancient Greeks.
In 1637 Descartes extended this idea, using a pair of numbers $(a_1, a_2)$ to locate a point in
the plane, and a triple of numbers $(a_1, a_2, a_3)$ to locate a point in space. The 19th Century
mathematicians A. Cayley (1821-1895) and H. G. Grassmann (1809-1877) realized that
there is no need to stop with three numbers. One can just as well consider a quadruple of
numbers $(a_1, a_2, a_3, a_4)$ or, more generally, an n-tuple of real numbers
$$(a_1, a_2, \ldots, a_n)$$
for any integer $n \ge 1$. Such an n-tuple is called an n-dimensional point or an n-dimensional
vector, the individual numbers $a_1, a_2, \ldots, a_n$ being referred to as coordinates or components
of the vector. The collection of all n-dimensional vectors is called the vector space of
n-tuples, or simply n-space. We denote this space by $V_n$.
The reader may well ask at this stage why we are interested in spaces of dimension
greater than three. One answer is that many problems which involve a large number of
simultaneous equations are more easily analyzed by introducing vectors in a suitable
n-space and replacing a11 these equations by a single vector equation. Another advantage
is that we are able to deal in one stroke with many properties common to 1-space, 2-space,
3-space, etc., that is, properties independent of the dimensionality of the space. This
is in keeping with the spirit of modern mathematics which favors the development of
comprehensive methods for attacking problems on a wide front.
Unfortunately, the geometric pictures which are a great help in motivating and illustrating
vector concepts when n = 1,2, and 3 are not available when n > 3 ; therefore, the study
of vector algebra in higher-dimensional spaces must proceed entirely by analytic means.
In this chapter we shall usually denote vectors by capital letters A, B, C, . . . , and
components by the corresponding small letters a, b, c, . . . . Thus, we write
$$A = (a_1, a_2, \ldots, a_n).$$
To convert $V_n$ into an algebraic system, we introduce equality of vectors and two vector
operations called addition and multiplication by scalars. The word "scalar" is used here as
a synonym for "real number."
DEFINITION. Two vectors A and B in $V_n$ are called equal whenever they agree in their
respective components. That is, if $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$, the vector
equation A = B means exactly the same as the n scalar equations
$$a_1 = b_1, \quad a_2 = b_2, \quad \ldots, \quad a_n = b_n.$$
The sum A + B is defined to be the vector obtained by adding corresponding components:
$$A + B = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n).$$
If c is a scalar, we define cA or Ac to be the vector obtained by multiplying each component
of A by c:
$$cA = (ca_1, ca_2, \ldots, ca_n).$$
From this definition it is easy to verify the following properties of these operations.
THEOREM 12.1. Vector addition is commutative,
A + B = B + A,
and associative,
A + (B + C) = (A + B) + C.
Multiplication by scalars is associative,
c(dA) = (cd)A,
and satisfies the two distributive laws
c(A + B) = cA + cB    and    (c + d)A = cA + dA.
Proofs of these properties follow quickly from the definition and are left as exercises for
the reader.
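The two operations are easy to model on a computer, which gives a convenient way to test
the laws of Theorem 12.1 on particular vectors. The Python sketch below represents a vector
in $V_n$ as a tuple of numbers; the function names `add` and `scale`, and the sample vectors and
scalars, are assumptions made only for this illustration. A check on examples is of course
not a proof.

```python
def add(A, B):
    """Sum of two n-tuples: add corresponding components."""
    if len(A) != len(B):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(A, B))

def scale(c, A):
    """Product cA: multiply each component of A by the scalar c."""
    return tuple(c * a for a in A)

A, B, C = (1, 3, 6), (4, -3, 3), (2, 1, 5)   # sample vectors in V_3
c, d = 2, -3                                  # sample scalars

print(add(A, B) == add(B, A))                                   # commutative law
print(add(A, add(B, C)) == add(add(A, B), C))                   # associative law
print(scale(c, scale(d, A)) == scale(c * d, A))                 # c(dA) = (cd)A
print(scale(c, add(A, B)) == add(scale(c, A), scale(c, B)))     # c(A + B) = cA + cB
print(scale(c + d, A) == add(scale(c, A), scale(d, A)))         # (c + d)A = cA + dA
```

Integer components are used so that the comparisons are exact.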
The vector with all components 0 is called the zero vector and is denoted by O. It has
the property that A + O = A for every vector A; in other words, O is an identity element
for vector addition. The vector (-1)A is also denoted by -A and is called the negative
of A. We also write A - B for A + (-B) and call this the difference of A and B. The
equation (A + B) - B = A shows that subtraction is the inverse of addition. Note that
0A = O and that 1A = A.
The reader may have noticed the similarity between vectors in 2-space and complex
numbers. Both are defined as ordered pairs of real numbers and both are added in exactly
the same way. Thus, as far as addition is concerned, complex numbers and two-dimensional
vectors are algebraically indistinguishable. They differ only when we introduce multiplication.
Multiplication of complex numbers gives the complex-number system the field properties
also possessed by the real numbers. It can be shown (although the proof is difficult) that
except for n = 1 and 2, it is not possible to introduce multiplication in $V_n$ so as to satisfy
all the field properties. However, special products can be introduced in $V_n$ which do not
satisfy all the field properties. For example, in Section 12.5 we shall discuss the dot product
of two vectors in $V_n$. The result of this multiplication is a scalar, not a vector. Another
product, called the cross product, is discussed in Section 13.9. This multiplication is
applicable only in the space $V_3$. The result is always a vector, but the cross product is
not commutative.
12.3 Geometric interpretation for $n \le 3$
Although the foregoing definitions are completely divorced from geometry, vectors and
vector operations have an interesting geometric interpretation for spaces of dimension
three or less. We shall draw pictures in 2-space to illustrate these concepts and ask the
reader to produce the corresponding visualizations for himself in 3-space and in 1-space.
FIGURE 12.1 The geometric vector $\overrightarrow{AB}$ from A (initial point) to B (terminal point).
FIGURE 12.2 $\overrightarrow{AB}$ and $\overrightarrow{CD}$ are equivalent because B - A = D - C.
A pair of points A and B is called a geometric vector if one of the points, say A, is called
the initial point and the other, B, the terminal point, or tip. We visualize a geometric vector
as an arrow from A to B, as shown in Figure 12.1, and denote it by the symbol $\overrightarrow{AB}$.
Geometric vectors are especially convenient for representing certain physical quantities
such as force, displacement, velocity, and acceleration, which possess both magnitude and
direction. The length of the arrow is a measure of the magnitude and the arrowhead
indicates the required direction.
Suppose we introduce a coordinate system with origin O. Figure 12.2 shows two geometric
vectors $\overrightarrow{AB}$ and $\overrightarrow{CD}$ with B - A = D - C. In terms of components, this means
that we have
$$b_1 - a_1 = d_1 - c_1 \quad\text{and}\quad b_2 - a_2 = d_2 - c_2.$$
By comparison of the congruent triangles in Figure 12.2, we see that the two arrows
representing $\overrightarrow{AB}$ and $\overrightarrow{CD}$ have equal lengths, are parallel, and point in the same direction.
We call such geometric vectors equivalent. That is, we say $\overrightarrow{AB}$ is equivalent to $\overrightarrow{CD}$ whenever
(12.1)    B - A = D - C.
Note that the four points A, B, C, D are vertices of a parallelogram. (See Figure 12.3.)
Equation (12.1) can also be written in the form A + D = B + C which tells us that
opposite vertices of the parallelogram have the same sum. In particular, if one of the vertices,
say A, is the origin O, as in Figure 12.4, the geometric vector from O to the opposite vertex
D corresponds to the vector sum D = B + C. This is described by saying that vector
addition corresponds geometrically to addition of geometric vectors by the parallelogram
law. The importance of vectors in physics stems from the remarkable fact that many
physical quantities (such as force, velocity, and acceleration) combine by the parallelogram
law.
FIGURE 12.3 Opposite vertices of a parallelogram have the same sum: A + D = B + C.
FIGURE 12.4 Vector addition interpreted geometrically by the parallelogram law.
For simplicity in notation, we shall use the same symbol to denote a point in $V_n$ (when
$n \le 3$) and the geometric vector from the origin to this point. Thus, we write A instead of
$\overrightarrow{OA}$, B instead of $\overrightarrow{OB}$, and so on. Sometimes we also write A in place of any geometric
vector equivalent to $\overrightarrow{OA}$. For example, Figure 12.5 illustrates the geometric meaning of
vector subtraction. Two geometric vectors are labeled as B - A, but these geometric vectors
are equivalent. They have the same length and the same direction.
Figure 12.6 illustrates the geometric meaning of multiplication by scalars. If B = cA,
the geometric vector B has length |c| times the length of A; it points in the same direction
as A if c is positive, and in the opposite direction if c is negative.
FIGURE 12.5 Geometric meaning of subtraction of vectors.
FIGURE 12.6 Multiplication of vectors by scalars.
The geometric interpretation of vectors in $V_n$ for $n \le 3$ suggests a way to define
parallelism in a general n-space.
DEFINITION. Two vectors A and B in $V_n$ are said to have the same direction if B = cA
for some positive scalar c, and the opposite direction if B = cA for some negative c. They are
called parallel if B = cA for some nonzero c.
Note that this definition makes every vector have the same direction as itself, a property
which we surely want. Note also that this definition ascribes the following properties to
the zero vector: The zero vector is the only vector having the same direction as its negative
and therefore the only vector having the opposite direction to itself. The zero vector is the
only vector parallel to the zero vector.
12.4 Exercises
1. Let A = (1, 3, 6), B = (4, -3, 3), and C = (2, 1, 5) be three vectors in $V_3$. Determine the
components of each of the following vectors: (a) A + B; (b) A - B; (c) A + B - C; (d)
7A - 2B - 3C; (e) 2A + B - 3C.
2. Draw the geometric vectors from the origin to the points A = (2, 1) and B = (1, 3). On the
same figure, draw the geometric vector from the origin to the point C = A + tB for each of the
following values of t: t = 1/3; t = 1/2; t = 3/4; t = 1; t = 2; t = -1; t = -2.
3. Solve Exercise 2 if C = tA + B.
4. Let A = (2, 1), B = (1, 3), and C = xA + yB, where x and y are scalars.
(a) Draw the geometric vector from the origin to C for each of the following pairs of values of
x and y: x = y = 1/2; x = 1/4, y = 3/4; x = 1/3, y = 2/3; x = 2, y = -1; x = 3, y = -2;
x = -1/2, y = 3/2; x = -1, y = 2.
(b) What do you think is the set of points C obtained as x and y run through all real numbers
such that x + y = 1? (Just make a guess and show the locus on the figure. No proof is
required.)
(c) Make a guess for the set of all points C obtained as x and y range independently over the
intervals $0 \le x \le 1$, $0 \le y \le 1$, and make a sketch of this set.
(d) What do you think is the set of all C obtained if x ranges through the interval $0 \le x \le 1$
and y ranges through all real numbers?
(e) What do you think is the set if x and y both range over all real numbers?
5. Let A = (2, 1) and B = (1, 3). Show that every vector $C = (c_1, c_2)$ in $V_2$ can be expressed in
the form C = xA + yB. Express x and y in terms of $c_1$ and $c_2$.
6. Let A = (1, 1, 1), B = (0, 1, 1), and C = (1, 1, 0) be three vectors in $V_3$ and let D = xA +
yB + zC, where x, y, z are scalars.
(a) Determine the components of D.
(b) If D = O, prove that x = y = z = 0.
(c) Find x, y, z such that D = (1, 2, 3).
7. Let A = (1, 1, 1), B = (0, 1, 1), and C = (2, 1, 1) be three vectors in $V_3$, and let D = xA +
yB + zC, where x, y, and z are scalars.
(a) Determine the components of D.
(b) Find x, y, and z, not all zero, such that D = O.
(c) Prove that no choice of x, y, z makes D = (1, 2, 3).
8. Let A = (1, 1, 1, 0), B = (0, 1, 1, 1), C = (1, 1, 0, 0) be three vectors in $V_4$, and let D =
xA + yB + zC, where x, y, and z are scalars.
(a) Determine the components of D.
(b) If D = O, prove that x = y = z = 0.
(c) Find x, y, and z such that D = (1, 5, 3, 4).
(d) Prove that no choice of x, y, z makes D = (1, 2, 3, 4).
9. In $V_n$, prove that two vectors parallel to the same vector are parallel to each other.
10. Given four nonzero vectors A, B, C, D in $V_n$ such that C = A + B and A is parallel to D.
Prove that C is parallel to D if and only if B is parallel to D.
11. (a) Prove, for vectors in $V_n$, the properties of addition and multiplication by scalars given in
Theorem 12.1.
(b) By drawing geometric vectors in the plane, illustrate the geometric meaning of the two
distributive laws (c + d)A = cA + dA and c(A + B) = cA + cB.
12. If a quadrilateral OABC in $V_2$ is a parallelogram having A and C as opposite vertices, prove
that $A + \tfrac{1}{2}(C - A) = \tfrac{1}{2}B$. What geometrical theorem about parallelograms can you deduce
from this equation?
12.5 The dot product
We introduce now a new kind of multiplication called the dot product or scalar product
of two vectors in $V_n$.
DEFINITION. If $A = (a_1, \ldots, a_n)$ and $B = (b_1, \ldots, b_n)$ are two vectors in $V_n$, their dot
product is denoted by $A \cdot B$ and is defined by the equation
$$A \cdot B = \sum_{k=1}^{n} a_k b_k.$$
Thus, to compute $A \cdot B$ we multiply corresponding components of A and B and then
add all the products.
This multiplication has the following algebraic properties.
THEOREM 12.2. For all vectors A, B, C in $V_n$ and all scalars c, we have the following
properties:
(a) $A \cdot B = B \cdot A$    (commutative law),
(b) $A \cdot (B + C) = A \cdot B + A \cdot C$    (distributive law),
(c) $c(A \cdot B) = (cA) \cdot B = A \cdot (cB)$    (homogeneity),
(d) $A \cdot A > 0$  if  $A \ne O$    (positivity),
(e) $A \cdot A = 0$  if  $A = O$.
Proof. The first three properties are easy consequences of the definition and are left
as exercises. To prove the last two, we use the relation $A \cdot A = \sum a_k^2$. Since each term is
nonnegative, the sum is nonnegative. Moreover, the sum is zero if and only if each term
in the sum is zero and this can happen only if A = O.
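The definition translates directly into a one-line computation. The Python sketch below
implements the dot product for tuples and spot-checks properties (a) through (e) of
Theorem 12.2 on sample vectors; the function name `dot` and the particular vectors and
scalar are assumptions made only for this illustration.

```python
def dot(A, B):
    """Dot product: multiply corresponding components and add the products."""
    if len(A) != len(B):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(A, B))

A, B, C = (1, 3, 6), (4, -3, 3), (2, 1, 5)   # sample vectors in V_3
c = 2                                         # sample scalar
B_plus_C = tuple(b + k for b, k in zip(B, C))
cA = tuple(c * a for a in A)
zero = (0, 0, 0)

print(dot(A, B) == dot(B, A))                        # (a) commutative law
print(dot(A, B_plus_C) == dot(A, B) + dot(A, C))     # (b) distributive law
print(c * dot(A, B) == dot(cA, B))                   # (c) homogeneity
print(dot(A, A) > 0, dot(zero, zero) == 0)           # (d) positivity and (e)
```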
The dot product has an interesting geometric interpretation which will be described in
Section 12.9. Before we discuss this, however, we mention an important inequality
concerning dot products that is fundamental in vector algebra.
THEOREM 12.3. THE CAUCHY-SCHWARZ INEQUALITY. If A and B are vectors in $V_n$, we
have
(12.2)    $(A \cdot B)^2 \le (A \cdot A)(B \cdot B)$.
Moreover, the equality sign holds if and only if one of the vectors is a scalar multiple of the
other.
Proof. Expressing each member of (12.2) in terms of components, we obtain
$$\left(\sum_{k=1}^{n} a_k b_k\right)^2 \le \left(\sum_{k=1}^{n} a_k^2\right) \left(\sum_{k=1}^{n} b_k^2\right),$$
which is the inequality proved earlier in Theorem I.41.
We shall present another proof of (12.2) that makes no use of components. Such a proof
is of interest because it shows that the Cauchy-Schwarz inequality is a consequence of the
five properties of the dot product listed in Theorem 12.2 and does not depend on the
particular definition that was used to deduce these properties.
To carry out this proof, we notice first that (12.2) holds trivially if either A or B is the
zero vector. Therefore, we may assume that both A and B are nonzero. Let C be the vector
$$C = xA - yB, \qquad \text{where } x = B \cdot B \text{ and } y = A \cdot B.$$
Properties (d) and (e) imply that $C \cdot C \ge 0$. When we translate this in terms of x and y,
it will yield (12.2). To express $C \cdot C$ in terms of x and y, we use properties (a), (b) and (c)
to obtain
$$C \cdot C = (xA - yB) \cdot (xA - yB) = x^2 (A \cdot A) - 2xy (A \cdot B) + y^2 (B \cdot B).$$
Using the definitions of x and y and the inequality $C \cdot C \ge 0$, we get
$$(B \cdot B)^2 (A \cdot A) - 2 (A \cdot B)^2 (B \cdot B) + (A \cdot B)^2 (B \cdot B) \ge 0.$$
Property (d) implies $B \cdot B > 0$ since $B \ne O$, so we may divide by $(B \cdot B)$ to obtain
$$(B \cdot B)(A \cdot A) - (A \cdot B)^2 \ge 0,$$
which is (12.2). This proof also shows that the equality sign holds in (12.2) if and only
if C = O. But C = O if and only if xA = yB. This equation holds, in turn, if and only if
one of the vectors is a scalar multiple of the other.
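The component-free proof can be followed step by step on a numerical example. The Python
sketch below forms the auxiliary vector C = xA - yB with $x = B \cdot B$ and $y = A \cdot B$, and checks
that $C \cdot C$ equals $(B \cdot B)$ times the difference $(A \cdot A)(B \cdot B) - (A \cdot B)^2$, which is the
algebra behind the proof. The function names and sample vectors are assumptions made only
for this illustration.

```python
def dot(A, B):
    """Dot product of two n-tuples of equal length."""
    return sum(a * b for a, b in zip(A, B))

def cauchy_schwarz_gap(A, B):
    """(A.A)(B.B) - (A.B)^2, which is nonnegative by (12.2)."""
    return dot(A, A) * dot(B, B) - dot(A, B) ** 2

def proof_vector(A, B):
    """The vector C = xA - yB with x = B.B and y = A.B used in the proof."""
    x, y = dot(B, B), dot(A, B)
    return tuple(x * a - y * b for a, b in zip(A, B))

A, B = (1, 3, 6), (4, -3, 3)          # sample vectors, neither a multiple of the other
C = proof_vector(A, B)
print(dot(C, C), dot(B, B) * cauchy_schwarz_gap(A, B))   # the two numbers agree

M = tuple(3 * a for a in A)           # M = 3A, a scalar multiple of A
print(cauchy_schwarz_gap(A, M))       # equality case: the gap is zero
print(proof_vector(A, M))             # and the auxiliary vector is the zero vector
```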
The Cauchy-Schwarz inequality has important applications to the properties of the
length or norm of a vector, a concept which we discuss next.
12.6 Length or norm of a vector
Figure 12.7 shows the geometric vector from the origin to a point $A = (a_1, a_2)$ in the
plane. From the theorem of Pythagoras, we find that the length of A is given by the
formula
$$\text{length of } A = \sqrt{a_1^2 + a_2^2}.$$
FIGURE 12.7 In $V_2$, the length of A is $\sqrt{a_1^2 + a_2^2}$.
FIGURE 12.8 In $V_3$, the length of A is $\sqrt{a_1^2 + a_2^2 + a_3^2}$.
A corresponding picture in 3-space is shown in Figure 12.8. Applying the theorem of
Pythagoras twice, we find that the length of a geometric vector A in 3-space is given by
$$\text{length of } A = \sqrt{a_1^2 + a_2^2 + a_3^2}.$$
Note that in either case the length of A is given by $(A \cdot A)^{1/2}$, the square root of the dot
product of A with itself. This formula suggests a way to introduce the concept of length
in n-space.
DEFINITION. If A is a vector in $V_n$, its length or norm is denoted by $\|A\|$ and is defined by
the equation
$$\|A\| = (A \cdot A)^{1/2}.$$
The fundamental properties of the dot product lead to corresponding properties of norms.
THEOREM 12.4. If A is a vector in $V_n$ and if c is a scalar, we have the following properties:
(a) $\|A\| > 0$  if  $A \ne O$    (positivity),
(b) $\|A\| = 0$  if  $A = O$,
(c) $\|cA\| = |c|\, \|A\|$    (homogeneity).
Proof. Properties (a) and (b) follow at once from properties (d) and (e) of Theorem
12.2. To prove (c), we use the homogeneity property of dot products to obtain
$$\|cA\| = (cA \cdot cA)^{1/2} = (c^2 A \cdot A)^{1/2} = (c^2)^{1/2} (A \cdot A)^{1/2} = |c|\, \|A\|.$$
The Cauchy-Schwarz inequality can also be expressed in terms of norms. It states that
(12.3)    $(A \cdot B)^2 \le \|A\|^2 \|B\|^2$.
Taking the positive square root of each member, we can also write the Cauchy-Schwarz
inequality in the equivalent form
(12.4)    $|A \cdot B| \le \|A\|\, \|B\|$.
Now we shall use the Cauchy-Schwarz inequality to deduce the triangle inequality.
THEOREM 12.5. TRIANGLE INEQUALITY. If A and B are vectors in $V_n$, we have
$$\|A + B\| \le \|A\| + \|B\|.$$
Moreover, the equality sign holds if and only if A = O, or B = O, or B = cA for some
c > 0.
Proof. To avoid square roots, we write the triangle inequality in the equivalent form
(12.5)    $\|A + B\|^2 \le (\|A\| + \|B\|)^2$.
The left member of (12.5) is
$$\|A + B\|^2 = (A + B) \cdot (A + B) = A \cdot A + 2A \cdot B + B \cdot B = \|A\|^2 + 2A \cdot B + \|B\|^2,$$
whereas the right member is
$$(\|A\| + \|B\|)^2 = \|A\|^2 + 2\|A\|\, \|B\| + \|B\|^2.$$
Comparing these two formulas, we see that (12.5) holds if and only if we have
(12.6)    $A \cdot B \le \|A\|\, \|B\|$.
But $A \cdot B \le |A \cdot B|$ so (12.6) follows from the Cauchy-Schwarz inequality, as expressed in
(12.4). This proves that the triangle inequality is a consequence of the Cauchy-Schwarz
inequality.
The converse statement is also true. That is, if the triangle inequality holds then (12.6)
also holds for A and for -A, from which we obtain (12.3). If equality holds in (12.5), then
$A \cdot B = \|A\|\, \|B\|$, so B = cA for some scalar c. Hence $A \cdot B = c\|A\|^2$ and $\|A\|\, \|B\| =
|c|\, \|A\|^2$. If $A \ne O$ this implies $c = |c| \ge 0$. If $B \ne O$ then B = cA with c > 0.
The triangle inequality is illustrated geometrically in Figure 12.9. It states that the
length of one side of a triangle does not exceed the sum of the lengths of the other two
sides.
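A small numerical experiment makes the equality condition of Theorem 12.5 concrete. The Python sketch below is illustrative only; `triangle_gap` is a name introduced here for the difference $\|A\| + \|B\| - \|A + B\|$, which the theorem says is never negative.

```python
import math

def norm(v):
    """Euclidean length of a vector given as a sequence of reals."""
    return math.sqrt(sum(x * x for x in v))

def add(a, b):
    """Componentwise sum of two vectors of equal length."""
    return tuple(x + y for x, y in zip(a, b))

def triangle_gap(a, b):
    """Return ||a|| + ||b|| - ||a + b||, which Theorem 12.5 says is >= 0."""
    return norm(a) + norm(b) - norm(add(a, b))

A = (1, 2, 2)
print(triangle_gap(A, (3, 0, -4)) > 0)     # True: strict inequality
print(triangle_gap(A, (2, 4, 4)))          # 0.0: B = 2A, the equality case
print(triangle_gap(A, (-1, -2, -2)) > 0)   # True: B = cA with c < 0
```

The last line shows why the theorem requires c > 0: a negative multiple of A points the opposite way, and the inequality is strict.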
12.7 Orthogonality of vectors
In the course of the proof of the triangle inequality (Theorem 12.5), we obtained the
formula
$$\|A + B\|^2 = \|A\|^2 + \|B\|^2 + 2A \cdot B, \tag{12.7}$$
which is valid for any two vectors A and B in $V_n$. Figure 12.10 shows two perpendicular
geometric vectors in the plane. They determine a right triangle whose legs have lengths
$\|A\|$ and $\|B\|$ and whose hypotenuse has length $\|A + B\|$. The theorem of Pythagoras
states that
$$\|A + B\|^2 = \|A\|^2 + \|B\|^2.$$
Comparing this with (12.7), we see that $A \cdot B = 0$. In other words, the dot product of two
perpendicular vectors in the plane is zero. This property motivates the definition of perpendicularity of vectors in $V_n$.
FIGURE 12.9 Geometric meaning of the triangle inequality: $\|A + B\| \le \|A\| + \|B\|$.
FIGURE 12.10 Two perpendicular vectors satisfy the Pythagorean identity: $\|A + B\|^2 = \|A\|^2 + \|B\|^2$.
DEFINITION. Two vectors A and B in $V_n$ are called perpendicular or orthogonal if $A \cdot B = 0$.
Equation (12.7) shows that two vectors A and B in $V_n$ are orthogonal if and only if
$\|A + B\|^2 = \|A\|^2 + \|B\|^2$. This is called the Pythagorean identity in $V_n$.
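The equivalence between orthogonality and the Pythagorean identity can be illustrated on small examples. The Python sketch below is not part of the text; the function names are chosen here, and integer components are assumed so that the comparisons are exact.

```python
def dot(a, b):
    """Dot product of two vectors with the same number of components."""
    return sum(x * y for x, y in zip(a, b))

def is_orthogonal(a, b):
    """True when a . b = 0 (exact for integer components)."""
    return dot(a, b) == 0

def satisfies_pythagoras(a, b):
    """True when ||a + b||^2 = ||a||^2 + ||b||^2, using squared norms only."""
    s = tuple(x + y for x, y in zip(a, b))
    return dot(s, s) == dot(a, a) + dot(b, b)

A, B, C = (1, 2), (-2, 1), (1, 1)
print(is_orthogonal(A, B), satisfies_pythagoras(A, B))  # True True
print(is_orthogonal(A, C), satisfies_pythagoras(A, C))  # False False
```

Working with squared norms avoids square roots, just as in the proof of the triangle inequality.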
12.8 Exercises
1. Let A = (1, 2, 3, 4), B = (-1, 2, -3, 0), and C = (0, 1, 0, 1) be three vectors in $V_4$. Compute
each of the following dot products:
(a) $A \cdot B$; (b) $B \cdot C$; (c) $A \cdot C$; (d) $A \cdot (B + C)$; (e) $(A - B) \cdot C$.
2. Given three vectors A = (2, 4, -7), B = (2, 6, 3), and C = (3, 4, -5). In each of the following
there is only one way to insert parentheses to obtain a meaningful expression. Insert parentheses and perform the indicated operations.
(a) $A \cdot BC$; (b) $A \cdot B + C$; (c) $A + B \cdot C$; (d) $AB \cdot C$; (e) $A/B \cdot C$.
3. Prove or disprove the following statement about vectors in $V_n$: If $A \cdot B = A \cdot C$ and $A \neq 0$,
then B = C.
4. Prove or disprove the following statement about vectors in $V_n$: If $A \cdot B = 0$ for every B, then
A = 0.
5. If A = (2, 1, -1) and B = (1, -1, 2), find a nonzero vector C in $V_3$ such that $A \cdot C = B \cdot C = 0$.
6. If A = (1, -2, 3) and B = (3, 1, 2), find scalars x and y such that C = xA + yB is a nonzero
vector with $C \cdot B = 0$.
7. If A = (2, -1, 2) and B = (1, 2, -2), find two vectors C and D in $V_3$ satisfying all the following conditions: A = C + D, $B \cdot D = 0$, C parallel to B.
8. If A = (1, 2, 3, 4, 5) and B = (1, 1/2, 1/3, 1/4, 1/5), find two vectors C and D in $V_5$ satisfying all the
following conditions: B = C + 2D, $D \cdot A = 0$, C parallel to A.
9. Let A = (2, -1, 5), B = (-1, -2, 3), and C = (1, -1, 1) be three vectors in $V_3$. Calculate
the norm of each of the following vectors:
(a) A + B; (b) A - B; (c) A + B - C; (d) A - B + C.
10. In each case, find a vector B in $V_2$ such that $B \cdot A = 0$ and $\|B\| = \|A\|$ if:
(a) A = (1, 1); (b) A = (1, -1); (c) A = (2, -3); (d) A = (a, b).
11. Let A = (1, -2, 3) and B = (3, 1, 2) be two vectors in $V_3$. In each case, find a vector C of
length 1 parallel to:
(a) A + B; (b) A - B; (c) A + 2B; (d) A - 2B; (e) 2A - B.
12. Let A = (4, 1, -3), B = (1, 2, 2), C = (1, 2, -2), D = (2, 1, 2), and E = (2, -2, -1) be
vectors in $V_3$. Determine all orthogonal pairs.
13. Find all vectors in $V_2$ that are orthogonal to A and have the same length as A if:
(a) A = (1, 2); (b) A = (1, -2); (c) A = (2, -1); (d) A = (-2, 1).
14. If A = (2, -1, 1) and B = (3, -4, -4), find a point C in 3-space such that A, B, and C are
the vertices of a right triangle.
15. If A = (1, -1, 2) and B = (2, 1, -1), find a nonzero vector C in $V_3$ orthogonal to A and B.
16. Let A = (1, 2) and B = (3, 4) be two vectors in $V_2$. Find vectors P and Q in $V_2$ such that
A = P + Q, P is parallel to B, and Q is orthogonal to B.
17. Solve Exercise 16 if the vectors are in $V_4$, with A = (1, 2, 3, 4) and B = (1, 1, 1, 1).
18. Given vectors A = (2, -1, 1), B = (1, 2, -1), and C = (1, 1, -2) in $V_3$. Find every vector
D of the form xB + yC which is orthogonal to A and has length 1.
19. Prove that for two vectors A and B in $V_n$ we have the identity
$$\|A + B\|^2 - \|A - B\|^2 = 4A \cdot B,$$
and hence $A \cdot B = 0$ if and only if $\|A + B\| = \|A - B\|$. When this is interpreted geometrically in $V_2$, it states that the diagonals of a parallelogram are of equal length if and only if
the parallelogram is a rectangle.
20. Prove that for any two vectors A and B in $V_n$ we have
$$\|A + B\|^2 + \|A - B\|^2 = 2\|A\|^2 + 2\|B\|^2.$$
What geometric theorem about the sides and diagonals of a parallelogram can you deduce
from this identity?
21. The following theorem in geometry suggests a vector identity involving three vectors A, B,
and C. Guess the identity and prove that it holds for vectors in $V_n$. This provides a proof of the
theorem by vector methods.
"The sum of the squares of the sides of any quadrilateral exceeds the sum of the squares of
the diagonals by four times the square of the length of the line segment which connects the
midpoints of the diagonals."
22. A vector A in $V_n$ has length 6. A vector B in $V_n$ has the property that for every pair of scalars
x and y the vectors xA + yB and 4yA - 9xB are orthogonal. Compute the length of B and
the length of 2A + 3B.
23. Given two vectors A = (1, 2, 3, 4, 5) and B = (1, 1/2, 1/3, 1/4, 1/5) in $V_5$. Find two vectors C and D
satisfying the following three conditions: C is parallel to A, D is orthogonal to A, and B =
C + D.
24. Given two nonperpendicular vectors A and B in $V_n$, prove that there exist vectors C and D
in $V_n$ satisfying the three conditions in Exercise 23 and express C and D in terms of A and B.
25. Prove or disprove each of the following statements concerning vectors in $V_n$:
(a) If A is orthogonal to B, then $\|A + xB\| \ge \|A\|$ for all real x.
(b) If $\|A + xB\| \ge \|A\|$ for all real x, then A is orthogonal to B.
12.9 Projections. Angle between vectors in n-space
The dot product of two vectors in $V_n$ has an interesting geometric interpretation. Figure
12.11(a) shows two nonzero geometric vectors A and B making an angle $\theta$ with each other.
In this example, we have $0 < \theta < \frac{1}{2}\pi$. Figure 12.11(b) shows the same vector A and two
perpendicular vectors whose sum is A. One of these, tB, is a scalar multiple of B which we
call the projection of A along B. In this example, t is positive since $0 < \theta < \frac{1}{2}\pi$.
FIGURE 12.11 The vector tB is the projection of A along B.
We can use dot products to express t in terms of A and B. First we write tB + C = A
and then take the dot product of each member with B to obtain
$$tB \cdot B + C \cdot B = A \cdot B.$$
But $C \cdot B = 0$, because C was drawn perpendicular to B. Therefore $tB \cdot B = A \cdot B$, so
we have
$$t = \frac{A \cdot B}{B \cdot B} = \frac{A \cdot B}{\|B\|^2}. \tag{12.8}$$
On the other hand, the scalar t bears a simple relation to the angle $\theta$. From Figure 12.11(b),
we see that
$$\cos\theta = \frac{\|tB\|}{\|A\|} = \frac{t\,\|B\|}{\|A\|}.$$
Using (12.8) in this formula, we find that
$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} \tag{12.9}$$
or
$$A \cdot B = \|A\|\,\|B\| \cos\theta.$$
In other words, the dot product of two nonzero vectors A and B in $V_2$ is equal to the product of three numbers: the length of A, the length of B, and the cosine of the angle between
A and B.
Equation (12.9) suggests a way to define the concept of angle in $V_n$. The Cauchy-Schwarz
inequality, as expressed in (12.4), shows that the quotient on the right of (12.9) has absolute
value $\le 1$ for any two nonzero vectors in $V_n$. In other words, we have
$$-1 \le \frac{A \cdot B}{\|A\|\,\|B\|} \le 1.$$
Therefore, there is exactly one real $\theta$ in the interval $0 \le \theta \le \pi$ such that (12.9) holds. We
define the angle between A and B to be this $\theta$. The foregoing discussion is summarized in
the following definition.
DEFINITION. Let A and B be two vectors in $V_n$, with $B \neq 0$. The vector tB, where
$$t = \frac{A \cdot B}{B \cdot B},$$
is called the projection of A along B. If both A and B are nonzero, the angle $\theta$ between A
and B is defined by the equation
$$\theta = \arccos \frac{A \cdot B}{\|A\|\,\|B\|}.$$
Note: The arc cosine function restricts $\theta$ to the interval $0 \le \theta \le \pi$. Note also that
$\theta = \frac{1}{2}\pi$ when $A \cdot B = 0$.
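The two formulas in the definition translate directly into a short computation. The following Python sketch is illustrative and makes some choices not in the text: the names `projection` and `angle` are introduced here, vectors are tuples of real numbers, and the cosine is clamped to the interval [-1, 1] to guard against floating-point rounding.

```python
import math

def dot(a, b):
    """Dot product of two vectors with the same number of components."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a, the square root of a . a."""
    return math.sqrt(dot(a, a))

def projection(a, b):
    """Projection tB of a along b, with t = (a . b)/(b . b); requires b != 0."""
    bb = dot(b, b)
    if bb == 0:
        raise ValueError("b must be a nonzero vector")
    t = dot(a, b) / bb
    return tuple(t * y for y in b)

def angle(a, b):
    """Angle in [0, pi] between two nonzero vectors a and b."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        raise ValueError("both vectors must be nonzero")
    cos_theta = dot(a, b) / (na * nb)
    # Clamp to [-1, 1]: rounding may push the quotient slightly outside.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

A, B = (2, 1, 2), (1, 1, 0)
P = projection(A, B)
C = tuple(x - p for x, p in zip(A, P))   # the part of A perpendicular to B

print(P)                                        # (1.5, 1.5, 0.0)
print(dot(C, B))                                # 0.0
print(math.isclose(angle(A, B), math.pi / 4))   # True
```

Here A = P + C with P parallel to B and C orthogonal to B, which is the decomposition drawn in Figure 12.11(b).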
12.10 The unit coordinate vectors
In Chapter 9 we learned that every complex number (a, b) can be expressed in the form
a + bi, where i denotes the complex number (0, 1). Similarly, every vector (a, b) in $V_2$
can be expressed in the form
$$(a, b) = a(1, 0) + b(0, 1).$$
The two vectors (1, 0) and (0, 1) which multiply the components a and b are called unit
coordinate vectors. We now introduce the corresponding concept in $V_n$.
DEFINITION. In $V_n$, the n vectors $E_1 = (1, 0, \dots, 0)$, $E_2 = (0, 1, 0, \dots, 0)$, ..., $E_n =
(0, 0, \dots, 0, 1)$ are called the unit coordinate vectors. It is understood that the kth component
of $E_k$ is 1 and all other components are 0.
The name "unit vector" comes from the fact that each vector $E_k$ has length 1. Note that
these vectors are mutually orthogonal, that is, the dot product of any two distinct vectors
is zero,
$$E_k \cdot E_j = 0 \quad \text{if } k \neq j.$$
THEOREM 12.6. Every vector $X = (x_1, \dots, x_n)$ in $V_n$ can be expressed in the form
$$X = x_1E_1 + \cdots + x_nE_n = \sum_{k=1}^{n} x_kE_k.$$
Moreover, this representation is unique. That is, if
$$X = \sum_{k=1}^{n} x_kE_k \quad \text{and} \quad X = \sum_{k=1}^{n} y_kE_k,$$
then $x_k = y_k$ for each k = 1, 2, ..., n.
Proof. The first statement follows immediately from the definition of addition and
multiplication by scalars. The uniqueness property follows from the definition of vector
equality.
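Theorem 12.6 can be illustrated by building the unit coordinate vectors explicitly and recombining them. In the Python sketch below, `unit_vector` and `linear_combination` are names chosen here for illustration; the index k is 1-based to match the text.

```python
def unit_vector(k, n):
    """E_k in V_n: 1 in position k (counting from 1), 0 elsewhere."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return tuple(1 if i == k else 0 for i in range(1, n + 1))

def linear_combination(coeffs, vectors):
    """Return c_1*A_1 + ... + c_m*A_m for vectors of equal length."""
    n = len(vectors[0])
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n))

X = (7, -2, 5)
E = [unit_vector(k, 3) for k in (1, 2, 3)]

print(E)                               # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(linear_combination(X, E) == X)   # True: X = x_1 E_1 + x_2 E_2 + x_3 E_3
```

The components of X serve as the coefficients, and no other choice of coefficients gives X, since the kth component of the sum is exactly the kth coefficient.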
A sum of the type $\sum c_iA_i$ is called a linear combination of the vectors $A_1, \dots, A_n$.
Theorem 12.6 tells us that every vector in $V_n$ can be expressed as a linear combination of
the unit coordinate vectors. We describe this by saying that the unit coordinate vectors
$E_1, \dots, E_n$ span the space $V_n$. We also say they span $V_n$ uniquely because each representation of a vector as a linear combination of $E_1, \dots, E_n$ is unique. Some collections of
vectors other than $E_1, \dots, E_n$ also span $V_n$ uniquely, and in Section 12.12 we turn to the
study of such collections.
In $V_2$ the unit coordinate vectors $E_1$ and $E_2$ are often denoted, respectively, by the
symbols i and j in bold-face italic type. In $V_3$ the symbols i, j, and k are also used in place
of $E_1$, $E_2$, $E_3$. Sometimes a bar or arrow is placed over the symbol, for example, $\bar{i}$ or $\vec{i}$.
The geometric meaning of Theorem 12.6 is illustrated in Figure 12.12 for n = 3.
When vectors are expressed as linear combinations of the unit coordinate vectors,
algebraic manipulations involving vectors can be performed by treating the sums $\sum x_kE_k$
according to the usual rules of algebra. The various components can be recognized at any
stage in the calculation by collecting the coefficients of the unit coordinate vectors. For
example, to add two vectors, say $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_n)$, we write
$$A = \sum_{k=1}^{n} a_kE_k, \qquad B = \sum_{k=1}^{n} b_kE_k,$$
and apply the linearity property of finite sums to obtain
$$A + B = \sum_{k=1}^{n} a_kE_k + \sum_{k=1}^{n} b_kE_k = \sum_{k=1}^{n} (a_k + b_k)E_k.$$
The coefficient of $E_k$ on the right is the kth component of the sum A + B.
FIGURE 12.12 A vector A in $V_3$ expressed as a linear combination of i, j, k: $A = a_1 i + a_2 j + a_3 k$.
12.11 Exercises
1. Determine the projection of A along B if A = (1, 2, 3) and B = (1, 2, 2).
2. Determine the projection of A along B if A = (4, 3, 2, 1) and B = (1, 1, 1, 1).
3. (a) Let A = (6, 3, -2), and let a, b, c denote the angles between A and the unit coordinate
vectors i, j, k, respectively. Compute cos a, cos b, and cos c. These are called the direction
cosines of A.
(b) Find all vectors in $V_3$ of length 1 parallel to A.
4. Prove that the angle between the two vectors A = (1, 2, 1) and B = (2, 1, -1) is twice that
between C = (1, 4, 1) and D = (2, 5, 5).
5. Use vector methods to determine the cosines of the angles of the triangle in 3-space whose
vertices are at the points (2, -1, 1), (1, -3, -5), and (3, -4, -4).
6. Three vectors A, B, C in $V_3$ satisfy all the following properties:
$$\|A\| = \|C\| = 5, \qquad \|B\| = 1, \qquad \|A - B + C\| = \|A + B + C\|.$$
If the angle between A and B is $\pi/8$, find the angle between B and C.
7. Given three nonzero vectors A, B, C in $V_n$. Assume that the angle between A and C is equal to
the angle between B and C. Prove that C is orthogonal to the vector $\|B\|A - \|A\|B$.
8. Let $\theta$ denote the angle between the following two vectors in $V_n$: A = (1, 1, ..., 1) and B =
(1, 2, ..., n). Find the limiting value of $\theta$ as $n \to \infty$.
9. Solve Exercise 8 if A = (2, 4, 6, ..., 2n) and B = (1, 3, 5, ..., 2n - 1).
10. Given vectors $A = (\cos\theta, -\sin\theta)$ and $B = (\sin\theta, \cos\theta)$ in $V_2$.
(a) Prove that A and B are orthogonal vectors of length 1. Make a sketch showing A and B
when $\theta = \pi/6$.
(b) Find all vectors (x, y) in $V_2$ such that (x, y) = xA + yB. Be sure to consider all possible
values of $\theta$.
11. Use vector methods to prove that the diagonals of a rhombus are perpendicular.
12. By forming the dot product of the two vectors (cos a, sin a) and (cos b, sin b), deduce the
trigonometric identity cos (a - b) = cos a cos b + sin a sin b.
13. If $\theta$ is the angle between two nonzero vectors A and B in $V_n$, prove that
$$\|A - B\|^2 = \|A\|^2 + \|B\|^2 - 2\|A\|\,\|B\| \cos\theta.$$
When interpreted geometrically in $V_2$, this is the law of cosines of trigonometry.
14. Suppose that instead of defining the dot product of two vectors $A = (a_1, \dots, a_n)$ and $B =
(b_1, \dots, b_n)$ by the formula $A \cdot B = \sum_{k=1}^{n} a_kb_k$, we used the following definition:
$$A \cdot B = \sum_{k=1}^{n} |a_kb_k|.$$
Which of the properties of Theorem 12.2 are valid with this definition? Is the Cauchy-Schwarz
inequality valid with this definition?
15. Suppose that in $V_2$ we define the dot product of two vectors $A = (a_1, a_2)$ and $B = (b_1, b_2)$ by
the formula
$$A \cdot B = 2a_1b_1 + a_2b_2 + a_1b_2 + a_2b_1.$$
Prove that all the properties of Theorem 12.2 are valid with this definition of dot product. Is
the Cauchy-Schwarz inequality still valid?
16. Solve Exercise 15 if the dot product of two vectors $A = (a_1, a_2, a_3)$ and $B = (b_1, b_2, b_3)$ in $V_3$
is defined by the formula $A \cdot B = 2a_1b_1 + a_2b_2 + a_3b_3 + a_1b_3 + a_3b_1$.
17. Suppose that instead of defining the norm of a vector $A = (a_1, \dots, a_n)$ by the formula
$(A \cdot A)^{1/2}$, we used the following definition:
$$\|A\| = \sum_{k=1}^{n} |a_k|.$$
(a) Prove that this definition of norm satisfies all the properties in Theorems 12.4 and 12.5.
(b) Use this definition in $V_2$ and describe on a figure the set of all points (x, y) of norm 1.
(c) Which of the properties of Theorems 12.4 and 12.5 would hold if we used the definition
18. Suppose that the norm of a vector $A = (a_1, \dots, a_n)$ were defined by the formula
$$\|A\| = \max_{1 \le k \le n} |a_k|,$$
where the symbol on the right means the maximum of the n numbers $|a_1|, |a_2|, \dots, |a_n|$.
(a) Which of the properties of Theorems 12.4 and 12.5 are valid with this definition?
(b) Use this definition of norm in $V_2$ and describe on a figure the set of all points (x, y) of
norm 1.
19. If $A = (a_1, \dots, a_n)$ is a vector in $V_n$, define two norms as follows:
$$\|A\|_1 = \sum_{k=1}^{n} |a_k| \quad \text{and} \quad \|A\|_2 = \max_{1 \le k \le n} |a_k|.$$
Prove that $\|A\|_2 \le \|A\| \le \|A\|_1$. Interpret this inequality geometrically in the plane.
20. If A and B are two points in n-space, the distance from A to B is denoted by d(A, B) and is
defined by the equation $d(A, B) = \|A - B\|$. Prove that distance has the following properties:
(a) d(A, B) = d(B, A).
(b) d(A, B) = 0 if and only if A = B.
(c) $d(A, B) \le d(A, C) + d(C, B)$.
12.12 The linear span of a finite set of vectors
Let $S = \{A_1, \dots, A_k\}$ be a nonempty set consisting of k vectors in $V_n$, where k, the
number of vectors, may be less than, equal to, or greater than n, the dimension of the space.
If a vector X in $V_n$ can be represented as a linear combination of $A_1, \dots, A_k$, say
$$X = \sum_{i=1}^{k} c_iA_i,$$
then the set S is said to span the vector X.
DEFINITION. The set of all vectors spanned by S is called the linear span of S and is denoted
by L(S).
In other words, the linear span of S is simply the set of all possible linear combinations
of vectors in S. Note that linear combinations of vectors in L(S) are again in L(S). We
say that S spans the whole space $V_n$ if $L(S) = V_n$.
EXAMPLE 1. Let $S = \{A_1\}$. Then L(S) consists of all scalar multiples of $A_1$.
EXAMPLE 2. Every set $S = \{A_1, \dots, A_k\}$ spans the zero vector since $0 = 0A_1 + \cdots + 0A_k$.
This representation, in which all the coefficients $c_1, \dots, c_k$ are zero, is called the trivial
representation of the zero vector. However, there may be nontrivial linear combinations
that represent 0. For example, suppose one of the vectors in S is a scalar multiple of
another, say $A_1 = 2A_2$. Then we have many nontrivial representations of 0, for example
$$0 = 2tA_2 - tA_1 + 0A_3 + \cdots + 0A_k,$$
where t is any nonzero scalar.
We are especially interested in sets S that span vectors in exactly one way.
DEFINITION. A set $S = \{A_1, \dots, A_k\}$ of vectors in $V_n$ is said to span X uniquely if S spans
X and if
$$X = \sum_{i=1}^{k} c_iA_i \quad \text{and} \quad X = \sum_{i=1}^{k} d_iA_i \quad \text{implies} \quad c_i = d_i \text{ for all } i. \tag{12.10}$$
In the two sums appearing in (12.10), it is understood that the vectors $A_1, \dots, A_k$ are
written in the same order. It is also understood that the implication (12.10) is to hold for a
fixed but arbitrary ordering of the vectors $A_1, \dots, A_k$.
THEOREM 12.7. A set S spans every vector in L(S) uniquely if and only if S spans the
zero vector uniquely.
Proof. If S spans every vector in L(S) uniquely, then it certainly spans 0 uniquely. To
prove the converse, assume S spans 0 uniquely and choose any vector X in L(S). Suppose
S spans X in two ways, say
$$X = \sum_{i=1}^{k} c_iA_i \quad \text{and} \quad X = \sum_{i=1}^{k} d_iA_i.$$
By subtraction, we find that $0 = \sum_{i=1}^{k} (c_i - d_i)A_i$. But since S spans 0 uniquely, we must
have $c_i - d_i = 0$ for all i, so S spans X uniquely.
12.13 Linear independence
Theorem 12.7 demonstrates the importance of sets that span the zero vector uniquely.
Such sets are distinguished with a special name.
DEFINITION. A set $S = \{A_1, \dots, A_k\}$ which spans the zero vector uniquely is said to be
a linearly independent set of vectors. Otherwise, S is called linearly dependent.
In other words, independence means that S spans 0 with only the trivial representation:
$$\sum_{i=1}^{k} c_iA_i = 0 \quad \text{implies all } c_i = 0.$$
Dependence means that S spans 0 in some nontrivial way. That is, for some choice of
scalars $c_1, \dots, c_k$, we have
$$\sum_{i=1}^{k} c_iA_i = 0 \quad \text{but not all } c_i \text{ are zero}.$$
Although dependence and independence are properties of sets of vectors, it is common
practice to also apply these terms to the vectors themselves. For example, the vectors in
a linearly independent set are often called linearly independent vectors. We also agree to
call the empty set linearly independent.
The following examples may serve to give further insight into the meaning of dependence
and independence.
EXAMPLE 1. If a subset T of a set S is dependent, then S itself is dependent, because
if T spans 0 nontrivially, then so does S. This is logically equivalent to the statement that
every subset of an independent set is independent.
EXAMPLE 2. The n unit coordinate vectors $E_1, \dots, E_n$ in $V_n$ span 0 uniquely so they
are linearly independent.
EXAMPLE 3. Any set containing the zero vector is dependent. For example, if $A_1 = 0$,
we have the nontrivial representation $0 = 1A_1 + 0A_2 + \cdots + 0A_k$.
EXAMPLE 4. The set $S = \{i, j, i + j\}$ of vectors in $V_2$ is linearly dependent because we
have the nontrivial representation of the zero vector
$$0 = i + j + (-1)(i + j).$$
In this example the subset $T = \{i, j\}$ is linearly independent. The third vector, i + j, is
in the linear span of T. The next theorem shows that if we adjoin to i and j any vector in the
linear span of T, we get a dependent set.
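The examples above can be checked mechanically. The Python sketch below is an illustration under assumptions not made in the text: it decides independence by row reduction (Gaussian elimination) with exact rational arithmetic, a technique the chapter has not introduced, and the names `rank` and `is_independent` are chosen here. A set of m vectors is independent exactly when elimination leaves m nonzero rows.

```python
from fractions import Fraction

def rank(vectors):
    """Number of nonzero rows left after Gaussian elimination on the vectors."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # index of the next pivot row
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_independent(vectors):
    """True when only the trivial combination of the vectors gives 0."""
    return rank(vectors) == len(vectors)

i, j = (1, 0), (0, 1)
print(is_independent([i, j]))            # True  (Example 2 with n = 2)
print(is_independent([(0, 0), i]))       # False (Example 3)
print(is_independent([i, j, (1, 1)]))    # False (Example 4)
print(is_independent([]))                # True  (the empty set, by convention)
```

Exact fractions are used so that the test for a zero pivot is not affected by rounding.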
THEOREM 12.8. Let $S = \{A_1, \dots, A_k\}$ be a linearly independent set of k vectors in $V_n$,
and let L(S) be the linear span of S. Then, every set of k + 1 vectors in L(S) is linearly
dependent.
Proof. The proof is by induction on k, the number of vectors in S. First suppose k = 1.
Then, by hypothesis, S consists of one vector, say $A_1$, where $A_1 \neq 0$ since S is independent.
Now take any two distinct vectors $B_1$ and $B_2$ in L(S). Then each is a scalar multiple of $A_1$,
say $B_1 = c_1A_1$ and $B_2 = c_2A_1$, where $c_1$ and $c_2$ are not both zero. Multiplying $B_1$ by $c_2$ and
$B_2$ by $c_1$ and subtracting, we find that
$$c_2B_1 - c_1B_2 = 0.$$
This is a nontrivial representation of 0 so $B_1$ and $B_2$ are dependent. This proves the
theorem when k = 1.
Now we assume that the theorem is true for k - 1 and prove that it is also true for k.
Take any set of k + 1 vectors in L(S), say $T = \{B_1, B_2, \dots, B_{k+1}\}$. We wish to prove that
T is linearly dependent. Since each $B_i$ is in L(S), we may write
$$B_i = \sum_{j=1}^{k} a_{ij}A_j \tag{12.11}$$
for each i = 1, 2, ..., k + 1. We examine all the scalars $a_{i1}$ that multiply $A_1$ and split
the proof into two cases according to whether all these scalars are 0 or not.
CASE 1. $a_{i1} = 0$ for every i = 1, 2, ..., k + 1. In this case the sum in (12.11) does not
involve $A_1$ so each $B_i$ in T is in the linear span of the set $S' = \{A_2, \dots, A_k\}$. But S' is
linearly independent and consists of k - 1 vectors. By the induction hypothesis, the
theorem is true for k - 1 so the set T is dependent. This proves the theorem in Case 1.
CASE 2. Not all the scalars $a_{i1}$ are zero. Let us assume that $a_{11} \neq 0$. (If necessary, we
can renumber the B's to achieve this.) Taking i = 1 in Equation (12.11) and multiplying
both members by $c_i$, where $c_i = a_{i1}/a_{11}$, we get
$$c_iB_1 = a_{i1}A_1 + \sum_{j=2}^{k} c_ia_{1j}A_j.$$
From this we subtract Equation (12.11) to get
$$c_iB_1 - B_i = \sum_{j=2}^{k} (c_ia_{1j} - a_{ij})A_j,$$
for i = 2, ..., k + 1. This equation expresses each of the k vectors $c_iB_1 - B_i$ as a linear
combination of k - 1 linearly independent vectors $A_2, \dots, A_k$. By the induction hypothesis, the k vectors $c_iB_1 - B_i$ must be dependent. Hence, for some choice of scalars
$t_2, \dots, t_{k+1}$, not all zero, we have
$$\sum_{i=2}^{k+1} t_i(c_iB_1 - B_i) = 0,$$
from which we find
$$\Bigl(\sum_{i=2}^{k+1} t_ic_i\Bigr)B_1 - \sum_{i=2}^{k+1} t_iB_i = 0.$$
But this is a nontrivial linear combination of $B_1, \dots, B_{k+1}$ which represents the zero vector,
so the vectors $B_1, \dots, B_{k+1}$ must be dependent. This completes the proof.
We show next that the concept of orthogonality is intimately related to linear independence.
DEFINITION. A set $S = \{A_1, \dots, A_k\}$ of vectors in $V_n$ is called an orthogonal set if
$A_i \cdot A_j = 0$ whenever $i \neq j$. In other words, any two distinct vectors in an orthogonal set
are perpendicular.
THEOREM 12.9. Any orthogonal set $S = \{A_1, \dots, A_k\}$ of nonzero vectors in $V_n$ is linearly
independent. Moreover, if S spans a vector X, say
$$X = \sum_{i=1}^{k} c_iA_i, \tag{12.12}$$
then the scalar multipliers $c_1, \dots, c_k$ are given by the formulas
$$c_j = \frac{X \cdot A_j}{A_j \cdot A_j} \quad \text{for } j = 1, 2, \dots, k. \tag{12.13}$$
Proof. First we prove that S is linearly independent. Assume that $\sum_{i=1}^{k} c_iA_i = 0$.
Taking the dot product of each member with $A_1$ and using the fact that $A_1 \cdot A_i = 0$ for
each $i \neq 1$, we find $c_1(A_1 \cdot A_1) = 0$. But $A_1 \cdot A_1 \neq 0$ since $A_1 \neq 0$, so $c_1 = 0$. Repeating
this argument with $A_1$ replaced by $A_j$, we find that each $c_j = 0$. Therefore S spans 0
uniquely so S is linearly independent.
Now suppose that S spans X as in Equation (12.12). Taking the dot product of X with
$A_j$ as above, we find that $c_j(A_j \cdot A_j) = X \cdot A_j$ from which we obtain (12.13).
If all the vectors $A_1, \dots, A_k$ in Theorem 12.9 have norm 1, the formula for the multipliers
simplifies to
$$c_j = X \cdot A_j.$$
An orthogonal set of vectors $\{A_1, \dots, A_k\}$, each of which has norm 1, is called an orthonormal set. The unit coordinate vectors $E_1, \dots, E_n$ are an example of an orthonormal set.
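The coefficient formula of Theorem 12.9, $c_j = (X \cdot A_j)/(A_j \cdot A_j)$, is easy to try on a small orthogonal set. The Python sketch below is illustrative only: the name `orthogonal_coefficients` is chosen here, and the vectors are assumed to have integer components so that exact fractions can be used.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors with the same number of components."""
    return sum(x * y for x, y in zip(a, b))

def orthogonal_coefficients(x, vectors):
    """Coefficients c_j = (X . A_j)/(A_j . A_j) of Theorem 12.9.

    Assumes the A_j are nonzero, mutually orthogonal, have integer
    components, and that x lies in their linear span.
    """
    return [Fraction(dot(x, a), dot(a, a)) for a in vectors]

S = [(1, 1, 0), (1, -1, 0), (0, 0, 2)]   # an orthogonal set in V_3
X = (3, 1, 4)

coeffs = orthogonal_coefficients(X, S)
rebuilt = tuple(sum(c * a[i] for c, a in zip(coeffs, S)) for i in range(len(X)))

print(coeffs == [2, 1, 2])   # True
print(rebuilt == X)          # True: X = 2*A_1 + 1*A_2 + 2*A_3
```

No system of equations has to be solved: orthogonality isolates each coefficient with a single dot product.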
12.14 Bases
It is natural to study sets of vectors that span every vector in $V_n$ uniquely. Such sets are
called bases for $V_n$.
DEFINITION. A set $S = \{A_1, \dots, A_k\}$ of vectors in $V_n$ is called a basis for $V_n$ if S spans
every vector in $V_n$ uniquely. If, in addition, S is orthogonal, then S is called an orthogonal
basis.
Thus, a basis is a linearly independent set which spans the whole space $V_n$. The set of
unit coordinate vectors is an example of a basis. This particular basis is also an orthogonal
basis. Now we prove that every basis contains the same number of elements.
THEOREM 12.10. In a given vector space $V_n$, bases have the following properties:
(a) Every basis contains exactly n vectors.
(b) Any set of linearly independent vectors is a subset of some basis.
(c) Any set of n linearly independent vectors is a basis.
Proof. The unit coordinate vectors $E_1, \dots, E_n$ form one basis for $V_n$. If we prove that
any two bases contain the same number of vectors we obtain (a).
Let S and T be two bases, where S has k vectors and T has r vectors. If r > k, then T
contains at least k + 1 vectors in L(S), since $L(S) = V_n$. Therefore, because of Theorem
12.8, T must be linearly dependent, contradicting the assumption that T is a basis. This
means we cannot have r > k, so we must have $r \le k$. Applying the same argument with
S and T interchanged, we find that $k \le r$. Hence, k = r so part (a) is proved.
To prove (b), let $S = \{A_1, \dots, A_k\}$ be any linearly independent set of vectors in $V_n$.
If $L(S) = V_n$, then S is a basis. If not, then there is some vector X in $V_n$ which is not in
L(S). Adjoin this vector to S and consider the new set $S' = \{A_1, \dots, A_k, X\}$. If this set
were dependent, there would be scalars $c_1, \dots, c_{k+1}$, not all zero, such that
$$\sum_{i=1}^{k} c_iA_i + c_{k+1}X = 0.$$
But $c_{k+1} \neq 0$ since $A_1, \dots, A_k$ are independent. Hence, we could solve this equation for
X and find that $X \in L(S)$, contradicting the fact that X is not in L(S). Therefore, the set
S' is linearly independent but contains k + 1 vectors. If $L(S') = V_n$, then S' is a basis
and, since S is a subset of S', part (b) is proved. If S' is not a basis, we may argue with S'
as we did with S, getting a new set S'' which contains k + 2 vectors and is linearly independent. If S'' is a basis, then part (b) is proved. If not, we repeat the process. We must
arrive at a basis in a finite number of steps, otherwise we would eventually obtain an independent set with n + 1 vectors, contradicting Theorem 12.8. Therefore part (b) is proved.
Finally, we use (a) and (b) to prove (c). Let S be any linearly independent set consisting
of n vectors. By part (b), S is a subset of some basis, say B. But by (a) the basis B has
exactly n elements, so S = B.
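The proof of part (b) is constructive: keep adjoining a vector outside the current span until n independent vectors are obtained. The Python sketch below follows that idea under two assumptions made here for illustration: candidates are taken only from the unit coordinate vectors (enough, because if every $E_k$ lies in L(S) then $L(S) = V_n$), and independence is tested by row reduction with exact fractions. The names `rank` and `extend_to_basis` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Number of nonzero rows left after Gaussian elimination on the vectors."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # index of the next pivot row
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(vectors, n):
    """Extend a linearly independent list of vectors in V_n to a basis of V_n."""
    basis = [tuple(v) for v in vectors]
    if rank(basis) != len(basis):
        raise ValueError("the given vectors must be linearly independent")
    for k in range(n):
        if len(basis) == n:
            break
        e = tuple(1 if i == k else 0 for i in range(n))
        # Adjoin e only if it is not already in the span of the current set.
        if rank(basis + [e]) == len(basis) + 1:
            basis.append(e)
    return basis

print(extend_to_basis([(1, 1, 0)], 3))   # [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
```

In the example, (0, 1, 0) is skipped because it equals (1, 1, 0) - (1, 0, 0) and so already lies in the span of the vectors chosen before it.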
12.15 Exercises
1. Let i and j denote the unit coordinate vectors in $V_2$. In each case find scalars x and y such that
x(i - j) + y(i + j) is equal to
(a) i; (b) j; (c) 3i - 5j; (d) 7i + 5j.
2. If A = (1, 2), B = (2, -4), and C = (2, -3) are three vectors in $V_2$, find scalars x and y such
that C = xA + yB. How many such pairs x, y are there?
3. If A = (2, -1, 1), B = (1, 2, -1), and C = (2, -11, 7) are three vectors in $V_3$, find scalars
x and y such that C = xA + yB.
4. Prove that Exercise 3 has no solution if C is replaced by the vector (2, 11, 7).
5. Let A and B be two nonzero vectors in $V_n$.
(a) If A and B are parallel, prove that A and B are linearly dependent.
(b) If A and B are not parallel, prove that A and B are linearly independent.
6. If (a, b) and (c, d) are two vectors in $V_2$, prove that they are linearly independent if and only
if $ad - bc \neq 0$.
7. Find all real t for which the two vectors (1 + t, 1 - t) and (1 - t, 1 + t) in $V_2$ are linearly
independent.
8. Let i, j, k be the unit coordinate vectors in $V_3$. Prove that the four vectors i, j, k, i + j + k
are linearly dependent, but that any three of them are linearly independent.
9. Let i and j be the unit coordinate vectors in $V_2$ and let S = {i, i + j}.
(a) Prove that S is linearly independent.
(b) Prove that j is in the linear span of S.
(c) Express 3i - 4j as a linear combination of i and i + j.
(d) Prove that $L(S) = V_2$.
10. Consider the three vectors A = i, B = i + j, and C = i + j + 3k in $V_3$.
(a) Prove that the set {A, B, C} is linearly independent.
(b) Express each of j and k as a linear combination of A, B, and C.
(c) Express 2i - 3j + 5k as a linear combination of A, B, and C.
(d) Prove that {A, B, C} is a basis for $V_3$.
11. Let A = (1, 2), B = (2, -4), C = (2, -3), and D = (1, -2) be four vectors in $V_2$. Display
all nonempty subsets of {A, B, C, D} which are linearly independent.
12. Let A = (1, 1, 1, 0), B = (0, 1, 1, 1) and C = (1, 1, 0, 0) be three vectors in $V_4$.
(a) Determine whether A, B, C are linearly dependent or independent.
(b) Exhibit a nonzero vector D such that A, B, C, D are dependent.
(c) Exhibit a vector E such that A, B, C, E are independent.
(d) Having chosen E in part (c), express the vector X = (1, 2, 3, 4) as a linear combination of
A, B, C, E.
13. (a) Prove that the following three vectors in $V_3$ are linearly independent: $(\sqrt{3}, 1, 0)$, $(1, \sqrt{3}, 1)$,
$(0, 1, \sqrt{3})$.
(b) Prove that the following three are dependent: $(\sqrt{2}, 1, 0)$, $(1, \sqrt{2}, 1)$, $(0, 1, \sqrt{2})$.
(c) Find all real t for which the following three vectors in $V_3$ are dependent: (t, 1, 0), (1, t, 1),
(0, 1, t).
14. Consider the following sets of vectors in $V_4$. In each case, find a linearly independent subset
containing as many vectors as possible.
(4 {Cl,& l,O>, (1, 1, 1, l>, (0, 1, 0, 11, CLO, -1,O)l.
(b) {(1, 1, 1, 1), (1, -1, 1, 1), (1, -1, -1, 1), (1, -1, -1, -1)}.
(c) {(1, 1, 1, 1), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)}.
15. Given three linearly independent vectors A, B, C in $V_n$. Prove or disprove each of the following statements.
(a) A + B, B + C, A + C are linearly independent.
(b) A - B, B + C, A + C are linearly independent.
16. (a) Prove that a set S of three vectors in $V_3$ is a basis for $V_3$ if and only if its linear span L(S)
contains the three unit coordinate vectors i, j, and k.
(b) State and prove a generalization of part (a) for $V_n$.
17. Find two bases for $V_3$ containing the two vectors (0, 1, 1) and (1, 1, 1).
18. Find two bases for $V_4$ having only the two vectors (0, 1, 1, 1) and (1, 1, 1, 1) in common.
19. Consider the following sets of vectors in $V_3$:
S = {(1, 1, 1), (0, 1, 2), (1, 0, -1)},
T = {(2, 1, 0), (2, 0, -2)},
U = {(1, 2, 3), (1, 3, 5)}.
(a) Prove that $L(T) \subseteq L(S)$.
(b) Determine all inclusion relations that hold among the sets L(S), L(T), and L(U).
20. Let A and B denote two finite subsets of vectors in a vector space $V_n$, and let L(A) and L(B)
denote their linear spans. Prove each of the following statements.
(a) If $A \subseteq B$, then $L(A) \subseteq L(B)$.
(b) $L(A \cap B) \subseteq L(A) \cap L(B)$.
(c) Give an example in which $L(A \cap B) \neq L(A) \cap L(B)$.
12.16 The vector space $V_n(C)$ of n-tuples of complex numbers
In Section 12.2 the vector space $V_n$ was defined to be the collection of all n-tuples of
real numbers. Equality, vector addition, and multiplication by scalars were defined in
terms of the components as follows: If $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_n)$, then
A = B means $a_i = b_i$ for each i = 1, 2, ..., n,
$A + B = (a_1 + b_1, \dots, a_n + b_n)$,
$cA = (ca_1, \dots, ca_n)$.
If all the scalars $a_i$, $b_i$ and c in these relations are replaced by complex numbers, the new
algebraic system so obtained is called complex vector space and is denoted by $V_n(C)$.
Here C is used to remind us that the scalars are complex.
Since complex numbers satisfy the same field properties as real numbers, all theorems
about real vector space $V_n$ that use only the field properties of the real numbers are also
valid for $V_n(C)$, provided all the scalars are allowed to be complex. In particular, those
theorems in this chapter that involve only vector addition and multiplication by scalars
are also valid for $V_n(C)$.
This extension is not made simply for the sake of generalization. Complex vector spaces
arise naturally in the theory of linear differential equations and in modern quantum
mechanics, so their study is of considerable importance. Fortunately, many of the theorems
about real vector space $V_n$ carry over without change to $V_n(C)$. Some small changes have
to be made, however, in those theorems that involve dot products. In proving that the dot
product $A \cdot A$ of a nonzero vector with itself is positive, we used the fact that a sum of
squares of real numbers is positive. Since a sum of squares of complex numbers can be
negative, we must modify the definition of $A \cdot B$ if we wish to retain the positivity property.
For $V_n(C)$, we use the following definition of dot product.
DEFINITION. If A = (a_1, ..., a_n) and B = (b_1, ..., b_n) are two vectors in V_n(C), we
define their dot product A · B by the formula
A · B = \sum_{k=1}^{n} a_k \bar{b}_k,
where \bar{b}_k is the complex conjugate of b_k.
Note that this definition agrees with the one given earlier for V_n because \bar{b}_k = b_k when
b_k is real. The fundamental properties of the dot product, corresponding to those in
Theorem 12.2, now take the following form.
THEOREM 12.11. For all vectors A, B, C in V_n(C) and all complex scalars c, we have
(a) A · B = \overline{B · A},
(b) A · (B + C) = A · B + A · C,
(c) c(A · B) = (cA) · B = A · (\bar{c}B),
(d) A · A > 0 if A ≠ O,
(e) A · A = 0 if A = O.
All these properties are easy consequences of the definition and their proofs are left as
exercises. The reader should note that conjugation takes place in property (a) when the
order of the factors is reversed. Also, conjugation of the scalar multiplier occurs in property (c) when the scalar c is moved from one side of the dot to the other.
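The conjugation rules in properties (a) and (c) can be checked numerically. The following is a minimal Python sketch, not part of the text: the helper names cdot and scale and the sample vectors are chosen here purely for illustration, and Python's built-in complex type stands in for the scalars of V_n(C).

```python
def cdot(a, b):
    """Dot product in V_n(C): sum of a_k times the conjugate of b_k."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def scale(c, a):
    """Multiply every component of the vector a by the scalar c."""
    return tuple(c * x for x in a)


# Illustrative vectors in V_2(C) and an illustrative complex scalar.
A = (1, 2j)
B = (1j, 1)
c = 2 + 3j

# Property (a): reversing the order of the factors conjugates the result.
assert cdot(A, B) == cdot(B, A).conjugate()
# Property (c): a scalar moved onto the second factor must be conjugated.
assert c * cdot(A, B) == cdot(scale(c, A), B) == cdot(A, scale(c.conjugate(), B))
# Property (d): A . A is a positive real number when A is nonzero.
assert cdot(A, A) == 5
```

The components here are small integers, so the floating-point comparisons are exact; for general data a tolerance-based comparison would be the safer choice.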
The Cauchy-Schwarz inequality now takes the form
(12.14) |A · B|^2 ≤ (A · A)(B · B).
The proof is similar to that given for Theorem 12.3. We consider the vector C = xA - yB,
where x = B · B and y = A · B, and compute C · C. The inequality C · C ≥ 0 leads to
(12.14). Details are left as an exercise for the reader.
Since the dot product of a vector with itself is nonnegative, we can introduce the norm
of a vector in V_n(C) by the usual formula,
||A|| = (A · A)^{1/2}.
The fundamental properties of norms, as stated in Theorem 12.4, are also valid without
change for V_n(C). The triangle inequality, ||A + B|| ≤ ||A|| + ||B||, also holds in V_n(C).
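As a quick sanity check of the Cauchy-Schwarz inequality (12.14) and the triangle inequality, here is a small Python sketch. It proves nothing; it only evaluates both sides for one pair of sample vectors. The function names cdot and norm and the vectors are assumptions made for this illustration.

```python
import math


def cdot(a, b):
    """Dot product in V_n(C): sum of a_k times the conjugate of b_k."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def norm(a):
    """Norm ||A|| = (A . A)^(1/2); A . A is real and nonnegative."""
    return math.sqrt(cdot(a, a).real)


# Illustrative vectors in V_3(C).
A = (1 + 2j, 3j, -1)
B = (2, 1 - 1j, 1j)
S = tuple(x + y for x, y in zip(A, B))  # the sum A + B

# Cauchy-Schwarz inequality (12.14): |A . B|^2 <= (A . A)(B . B).
assert abs(cdot(A, B)) ** 2 <= cdot(A, A).real * cdot(B, B).real
# Triangle inequality: ||A + B|| <= ||A|| + ||B||.
assert norm(S) <= norm(A) + norm(B)
```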
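Orthogonality of vectors in V_n(C) is defined by the relation A · B = 0. As in the real
case, two orthogonal vectors A and B in V_n(C) satisfy the Pythagorean identity,
||A + B||^2 = ||A||^2 + ||B||^2. In V_n(C), however, the identity by itself implies only that
A · B + \overline{A · B} = 0, that is, that the real part of A · B is zero.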
The concepts of linear span, linear independence, linear dependence, and basis are defined
for V_n(C) exactly as in the real case. Theorems 12.7 through 12.10 and their proofs are all
valid without change for V_n(C).
12.17 Exercises
1. Let A = (1, i), B = (i, -i), and C = (2i, 1) be three vectors in V_2(C). Compute each of the
following dot products:
(a) A · B; (b) B · A; (c) (iA) · B; (d) A · (iB); (e) (iA) · (iB);
(f) B · C; (g) A · C; (h) (B + C) · A; (i) (A - C) · B; (j) (A - iB) · (A + iB).
2. If A = (2, 1, -i) and B = (i, -1, 2i), find a nonzero vector C in V_3(C) orthogonal to both A
and B.
3. Prove that for any two vectors A and B in V_n(C), we have the identity
||A + B||^2 = ||A||^2 + ||B||^2 + A · B + \overline{A · B}.
4. Prove that for any two vectors A and B in V_n(C), we have the identity
||A + B||^2 - ||A - B||^2 = 2(A · B + \overline{A · B}).
5. Prove that for any two vectors A and B in V_n(C), we have the identity
||A + B||^2 + ||A - B||^2 = 2 ||A||^2 + 2 ||B||^2.
6. (a) Prove that for any two vectors A and B in V_n(C), the sum A · B + \overline{A · B} is real.
(b) If A and B are nonzero vectors in V_n(C), prove that
-2 ≤ (A · B + \overline{A · B}) / (||A|| ||B||) ≤ 2.
7. We define the angle θ between two nonzero vectors A and B in V_n(C) by the equation
θ = arccos [ (1/2)(A · B + \overline{A · B}) / (||A|| ||B||) ].
The inequality in Exercise 6 shows that there is always a unique angle θ in the closed interval
0 ≤ θ ≤ π satisfying this equation. Prove that we have
||A - B||^2 = ||A||^2 + ||B||^2 - 2 ||A|| ||B|| cos θ.
8. Use the definition in Exercise 7 to compute the angle between the following two vectors in
V_5(C): A = (1, 0, i, i, i), and B = (i, i, i, 0, i).
9. (a) Prove that the following three vectors form a basis for V_3(C): A = (1, 0, 0), B = (0, i, 0),
C = (1, 1, i).
(b) Express the vector (5, 2 - i, 2i) as a linear combination of A, B, C.
10. Prove that the basis of unit coordinate vectors E_1, ..., E_n in V_n is also a basis for V_n(C).
13
APPLICATIONS OF VECTOR ALGEBRA
TO ANALYTIC GEOMETRY
13.1 Introduction
This chapter discusses applications of vector algebra to the study of lines, planes, and
conic sections. In Chapter 14 vector algebra is combined with the methods of calculus, and
further applications are given to the study of curves and to some problems in mechanics.
The study of geometry as a deductive system, as conceived by Euclid around 300 B.C.,
begins with a set of axioms or postulates which describe properties of points and lines.
The concepts “point” and “line” are taken as primitive notions and remain undefined.
Other concepts are defined in terms of points and lines, and theorems are systematically
deduced from the axioms. Euclid listed ten axioms from which he attempted to deduce all
his theorems. It has since been shown that these axioms are not adequate for the theory.
For example, in the proof of his very first theorem Euclid made a tacit assumption concerning the intersection of two circles that is not covered by his axioms. Since then other lists
of axioms have been formulated that do give all of Euclid's theorems. The most famous
of these is a list given by the German mathematician David Hilbert (1862-1943) in his now
classic Grundlagen der Geometrie, published in 1899. (An English translation exists:
The Foundations of Geometry, Open Court Publishing Co., 1947.) This work, which went
through seven German editions in Hilbert's lifetime, is said to have inaugurated the abstract
mathematics of the twentieth century.
Hilbert starts his treatment of plane geometry with five undefined concepts: point, line,
on (a relation holding between a point and a line), between (a relation between a point and a
pair of points), and congruence (a relation between pairs of points). He then gives fifteen
axioms from which he develops all of plane Euclidean geometry. His treatment of solid
geometry is based on twenty-one axioms involving six undefined concepts.
The approach in analytic geometry is somewhat different. We define concepts such as
point, line, on, between, etc., but we do so in terms of real numbers, which are left undefined. The resulting mathematical structure is called an analytic model of Euclidean
geometry. In this model, properties of real numbers are used to deduce Hilbert's axioms.
We shall not attempt to describe all of Hilbert's axioms. Instead, we shall merely indicate
how the primitive concepts may be defined in terms of numbers and give a few proofs to
illustrate the methods of analytic geometry.
13.2 Lines in n-space
In this section we use real numbers to define the concepts of point, line, and on. The
definitions are formulated to fit our intuitive ideas about three-dimensional Euclidean
geometry, but they are meaningful in n-space for any n ≥ 1.
A point is simply a vector in V_n, that is, an ordered n-tuple of real numbers; we shall use
the words "point" and "vector" interchangeably. The vector space V_n is called an analytic
model of n-dimensional Euclidean space or simply Euclidean n-space. To define "line," we
employ the algebraic operations of addition and multiplication by scalars in V_n.
DEFINITION. Let P be a given point and A a given nonzero vector. The set of all points
of the form P + tA, where t runs through all real numbers, is called a line through P parallel
to A. We denote this line by L(P; A) and write
L(P; A) = {P + tA | t real}
or, more briefly,
L(P; A) = {P + tA}.
A point Q is said to be on the line L(P; A) if Q ∈ L(P; A).
In the symbol L(P; A), the point P which is written first is on the line since it corresponds
to t = 0. The second point, A, is called a direction vector for the line. The line L(O; A)
through the origin O is the linear span of A; it consists of all scalar multiples of A. The
line through P parallel to A is obtained by adding P to each vector in the linear span of A.
Figure 13.1 shows the geometric interpretation of this definition in V_3. Each point P + tA
can be visualized as the tip of a geometric vector drawn from the origin. As t varies over
all the real numbers, the corresponding point P + tA traces out a line through P parallel
to the vector A. Figure 13.1 shows points corresponding to a few values of t on both lines
L(P; A) and L(O; A).
FIGURE 13.1 The line L(P; A) through P parallel to A and its geometric relation to
the line L(O; A) through O parallel to A.
13.3 Some simple properties of straight lines
First we show that the direction vector A which occurs in the definition of L(P; A) can
be replaced by any vector parallel to A. (We recall that two vectors A and B are called
parallel if A = cB for some nonzero scalar c.)
THEOREM 13.1. Two lines L(P; A) and L(P; B) through the same point P are equal if
and only if the direction vectors A and B are parallel.
Proof. Assume first that L(P; A) = L(P; B). Take a point on L(P; A) other than P,
for example, P + A. This point is also on L(P; B) so P + A = P + cB for some scalar c.
Hence, we have A = cB and c ≠ 0 since A ≠ O. Therefore, A and B are parallel.
Now we prove the converse. Assume A and B are parallel, say A = cB for some c ≠ 0.
If Q is on L(P; A), then we have Q = P + tA = P + t(cB) = P + (ct)B, so Q is on
L(P; B). Therefore L(P; A) ⊆ L(P; B). Similarly, L(P; B) ⊆ L(P; A), so L(P; A) = L(P; B).
Next we show that the point P which occurs in the definition of L(P; A) can be replaced
by any other point Q on the same line.
THEOREM 13.2. Two lines L(P; A) and L(Q; A) with the same direction vector A are
equal if and only if Q is on L(P; A).
Proof. Assume L(P; A) = L(Q; A). Since Q is on L(Q; A), Q is also on L(P; A).
To prove the converse, assume that Q is on L(P; A), say Q = P + cA. We wish to prove
that L(P; A) = L(Q; A). If X ∈ L(P; A), then X = P + tA for some t. But P = Q - cA,
so X = Q - cA + tA = Q + (t - c)A, and hence X is also on L(Q; A). Therefore
L(P; A) ⊆ L(Q; A). Similarly, we find L(Q; A) ⊆ L(P; A), so the two lines are equal.
One of Euclid's famous postulates is the parallel postulate which is logically equivalent
to the statement that "through a given point there exists one and only one line parallel to a
given line." We shall deduce this property as an easy consequence of Theorem 13.1.
First we need to define parallelism of lines.
DEFINITION. Two lines L(P; A) and L(Q; B) are called parallel if their direction vectors
A and B are parallel.
THEOREM 13.3. Given a line L and a point Q not on L, then there is one and only one
line L' containing Q and parallel to L.
Proof. Suppose the given line has direction vector A. Consider the line L' = L(Q; A).
This line contains Q and is parallel to L. Theorem 13.1 tells us that this is the only line
with these two properties.
Note: For a long time mathematicians suspected that the parallel postulate could
be deduced from the other Euclidean postulates, but all attempts to prove this resulted
in failure. Then in the early 19th century the mathematicians Karl F. Gauss (1777-1855),
J. Bolyai (1802-1860), and N. I. Lobatchevski (1793-1856) became convinced that the
parallel postulate could not be derived from the others and proceeded to develop non-Euclidean geometries, that is to say, geometries in which the parallel postulate does not
hold. The work of these men inspired other mathematicians and scientists to enlarge
their points of view about "accepted truths" and to challenge other axioms that had been
considered sacred for centuries.
It is also easy to deduce the following property of lines which Euclid stated as an axiom.
THEOREM 13.4. Two distinct points determine a line. That is, if P ≠ Q, there is one
and only one line containing both P and Q. It can be described as the set {P + t(Q - P)}.
Proof. Let L be the line through P parallel to Q - P, that is, let
L = L(P; Q - P) = {P + t(Q - P)}.
This line contains both P and Q (take t = 0 to get P and t = 1 to get Q). Now let L' be
any line containing both P and Q. We shall prove that L' = L. Since L' contains P, we
have L' = L(P; A) for some A ≠ O. But L' also contains Q so P + cA = Q for some c.
Hence we have Q - P = cA, where c ≠ 0 since Q ≠ P. Therefore Q - P is parallel to A
so, by Theorem 13.1, we have L' = L(P; A) = L(P; Q - P) = L.
EXAMPLE. Theorem 13.4 gives us an easy way to test if a point Q is on a given line
L(P; A). It tells us that Q is on L(P; A) if and only if Q - P is parallel to A. For example,
consider the line L(P; A), where P = (1, 2, 3) and A = (2, -1, 5). To test if the point
Q = (1, 1, 4) is on this line, we examine Q - P = (0, -1, 1). Since Q - P is not a scalar
multiple of A, the point (1, 1, 4) is not on this line. On the other hand, if Q = (5, 0, 13),
we find that Q - P = (4, -2, 10) = 2A, so this Q is on the line.
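The test in this example can be carried out mechanically. The sketch below is one possible Python rendering, with the function name on_line chosen here for illustration. It uses exact rational arithmetic (fractions.Fraction) so that the comparison Q - P = tA is not disturbed by rounding, and it also accepts the case Q = P, which corresponds to t = 0.

```python
from fractions import Fraction


def on_line(q, p, a):
    """Return the parameter t with q = p + t*a, or None if q is not on L(p; a)."""
    if not any(a):
        raise ValueError("direction vector must be nonzero")
    d = [Fraction(qi) - Fraction(pi) for qi, pi in zip(q, p)]  # Q - P
    # Solve for t using the first nonzero component of the direction vector.
    k = next(i for i, ai in enumerate(a) if ai != 0)
    t = d[k] / Fraction(a[k])
    # Q is on the line only if the same t works in every component.
    if all(di == t * ai for di, ai in zip(d, a)):
        return t
    return None


P, A = (1, 2, 3), (2, -1, 5)
print(on_line((1, 1, 4), P, A))   # None: Q - P = (0, -1, 1) is not a multiple of A
print(on_line((5, 0, 13), P, A))  # 2: Q - P = (4, -2, 10) = 2A
```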
Linear dependence of two vectors in V_n can be expressed in geometric language.
THEOREM 13.5. Two vectors A and B in V_n are linearly dependent if and only if they lie
on the same line through the origin.
Proof. If either A or B is zero, the result holds trivially. If both are nonzero, then A
and B are dependent if and only if B = tA for some scalar t. But B = tA if and only if B
lies on the line through the origin parallel to A.
13.4 Lines and vector-valued functions
The concept of a line can be related to the function concept. The correspondence which
associates to each real t the vector P + tA on the line L(P; A) is an example of a function
whose domain is the set of real numbers and whose range is the line L(P; A). If we denote
the function by the symbol X, then the function value X(t) at t is given by the equation
(13.1) X(t) = P + tA.
We call this a vector-valued function of a real variable.
The function point of view is important because, as we shall see in Chapter 14, it provides
a natural method for describing more general space curves as well.
The scalar t in Equation (13.1) is often called a parameter, and Equation (13.1) is called a
vector parametric equation or, simply, a vector equation of the line. Occasionally it is convenient to think of the line as the track of a moving particle, in which case the parameter t
is referred to as time and the vector X(t) is called the position vector.
Note that two points X(a) and X(b) on a given line L(P; A) are equal if and only if we have
P + aA = P + bA, or (a - b)A = O. Since A ≠ O, this last relation holds if and only if
a = b. Thus, distinct values of the parameter t lead to distinct points on the line.
Now consider three distinct points on a given line, say X(a), X(b), and X(c), where a < b.
We say that X(c) is between X(a) and X(b) if c is between a and b, that is, if a < c < b.
Congruence can be defined in terms of norms. A pair of points P, Q is called congruent
to another pair P', Q' if ||P - Q|| = ||P' - Q'||. The norm ||P - Q|| is also called the
distance between P and Q.
This completes the definitions of the concepts of point, line, on, between, and congruence
in our analytic model of Euclidean n-space. We conclude this section with some further
remarks concerning parametric equations for lines in 3-space.
If a line passes through two distinct points P and Q, we can use Q - P for the direction
vector A in Equation (13.1); the vector equation of the line then becomes
X(t) = P + t(Q - P) or X(t) = tQ + (1 - t)P.
Vector equations can also be expressed in terms of components. For example, if we
write P = (p, q, r), A = (a, b, c), and X(t) = (x, y, z), Equation (13.1) is equivalent to the
three scalar equations
(13.2) x = p + ta, y = q + tb, z = r + tc.
These are called scalar parametric equations or simply parametric equations for the line;
they are useful in computations involving components. The vector equation is simpler
and more natural for studying general properties of lines.
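The vector equation and its component form translate directly into code. The following Python sketch (the names line_point and two_point and the sample points are illustrative assumptions) evaluates X(t) = P + tA component by component and checks that it agrees with the two-point form X(t) = tQ + (1 - t)P when A = Q - P.

```python
def line_point(p, a, t):
    """Point X(t) = P + tA, computed component by component as in (13.2)."""
    return tuple(pi + t * ai for pi, ai in zip(p, a))


def two_point(p, q, t):
    """Point X(t) = tQ + (1 - t)P on the line through P and Q."""
    return tuple(t * qi + (1 - t) * pi for pi, qi in zip(p, q))


P, Q = (1, 2, 3), (5, 0, 13)
A = tuple(qi - pi for pi, qi in zip(P, Q))  # direction vector Q - P

for t in (0, 1, 2, -1):
    # Both forms of the vector equation describe the same point.
    assert line_point(P, A, t) == two_point(P, Q, t)

print(line_point(P, A, 0))  # (1, 2, 3), the point P
print(line_point(P, A, 1))  # (5, 0, 13), the point Q
```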
If all the vectors are in 2-space, only the first two parametric equations in (13.2) are
needed. In this case, we can eliminate t from the two parametric equations to obtain the
relation
(13.3) b(x - p) - a(y - q) = 0,
which is called a Cartesian equation for the line. If a ≠ 0, this can be written in the point-slope form
y - q = (b/a)(x - p).
The point (p, q) is on the line; the number b/a is the slope of the line.
The Cartesian equation (13.3) can also be written in terms of dot products. If we let
N = (b, -a), X = (x, y), and P = (p, q), Equation (13.3) becomes
(X - P) · N = 0 or X · N = P · N.
The vector N is perpendicular to the direction vector A since N · A = ba - ab = 0; the
vector N is called a normal vector to the line. The line consists of all points X satisfying
the relation (X - P) · N = 0.
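The normal-vector form of a line in 2-space can be illustrated with a few lines of Python. This is only a sketch under the conventions stated above: A = (a, b) is the direction vector, N = (b, -a) the normal, and the helper names and sample data are chosen here for illustration.

```python
def dot(u, v):
    """Ordinary dot product of two real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def satisfies_normal_form(x, p, a):
    """True if X satisfies (X - P) . N = 0 with N = (b, -a) for A = (a, b)."""
    n = (a[1], -a[0])
    return dot((x[0] - p[0], x[1] - p[1]), n) == 0


P, A = (1, 2), (3, 4)
# Every point P + tA satisfies the equation ...
assert all(satisfies_normal_form((1 + 3 * t, 2 + 4 * t), P, A) for t in range(-5, 6))
# ... while a point off the line does not.
assert not satisfies_normal_form((0, 0), P, A)
```

Integer coordinates keep the comparison with zero exact; with floating-point data a small tolerance would be needed instead.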
The geometric meaning of this relation is shown in Figure 13.2. The points P and X are
on the line and the normal vector N is orthogonal to X - P. The figure suggests that among
all points X on the line, the smallest length ||X|| occurs when X is the projection of P along
N. We now give an algebraic proof of this fact.
FIGURE 13.2 A line in the xy-plane through P with normal vector N. Each point X
on the line satisfies (X - P) · N = 0.
THEOREM 13.6. Let L be the line in V_2 consisting of all points X satisfying
X · N = P · N,
where P is on the line and N is a nonzero vector normal to the line. Let
d = |P · N| / ||N||.
Then every X on L has length ||X|| ≥ d. Moreover, ||X|| = d if and only if X is the projection of P along N:
X = tN, where t = (P · N)/(N · N).
Proof. If X ∈ L, we have X · N = P · N. By the Cauchy-Schwarz inequality, we have
|P · N| = |X · N| ≤ ||X|| ||N||,
which implies ||X|| ≥ |P · N|/||N|| = d. The equality sign holds if and only if X = tN
for some scalar t, in which case P · N = X · N = tN · N, so t = (P · N)/(N · N). This completes the proof.
In the same way we can prove that if Q is a given point in V_2 not on the line L, then for
all X on L the smallest value of ||X - Q|| is |(P - Q) · N|/||N||, and this occurs when
X - Q is the projection of P - Q along the normal vector N. The number
|(P - Q) · N| / ||N||
is called the distance from the point Q to the line L. The reader should illustrate these concepts on a figure similar to that in Figure 13.2.
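The distance formula and the projection in Theorem 13.6 can be tried out numerically. The Python sketch below assumes the line is given as L(P; A) in V_2 with normal N = (b, -a) for A = (a, b); the function names and the sample line are illustrative choices, not notation from the text.

```python
import math


def dot(u, v):
    """Ordinary dot product of two real vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def normal(a):
    """A normal vector N = (b, -a) to the direction vector A = (a, b)."""
    return (a[1], -a[0])


def distance_to_line(q, p, a):
    """Distance from Q to the line L(P; A) in V_2: |(P - Q) . N| / ||N||."""
    n = normal(a)
    diff = (p[0] - q[0], p[1] - q[1])
    return abs(dot(diff, n)) / math.sqrt(dot(n, n))


def closest_to_origin(p, a):
    """Point X = tN with t = (P . N)/(N . N), the point of L(P; A) of least length."""
    n = normal(a)
    t = dot(p, n) / dot(n, n)
    return (t * n[0], t * n[1])


P, A = (0, 5), (4, 3)  # line through (0, 5) with slope 3/4
d = distance_to_line((0, 0), P, A)
X = closest_to_origin(P, A)
print(d)  # 4.0
# The projection X lies on the line and has length d (up to rounding).
assert math.isclose(dot(X, normal(A)), dot(P, normal(A)))
assert math.isclose(math.hypot(*X), d)
```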
13.5 Exercises
1. A line L in V_2 contains the two points P = (-3, 1) and Q = (1, 1). Determine which of the
following points are on L. (a) (0, 0); (b) (0, 1); (c) (1, 2); (d) (2, 1); (e) (-2, 1).
2. Solve Exercise 1 if P = (2, -1) and Q = (-4, 2).
3. A line L in V_3 contains the point P = (-3, 1, 1) and is parallel to the vector (1, -2, 3).
Determine which of the following points are on L. (a) (0, 0, 0); (b) (2, -1, 4); (c) (-2, -1, 4);
(d) (-4, 3, -2); (e) (2, -9, 16).
4. A line L contains the two points P = (-3, 1, 1) and Q = (1, 2, 7). Determine which of the
following points are on L. (a) (-7, 0, 5); (b) (-7, 0, -5); (c) (-11, 1, 11); (d) (-11, -1, 11);
(e) (-1, 3/2, 4); (f) (-5/3, 4/3, 3); (g) (-1, 3/2, -4).
5. In each case, determine if all three points P, Q, R lie on a line.
(a) P = (2, 1, 1), Q = (4, 1, -1), R = (3, -1, 1).
(b) P = (2, 2, 3), Q = (-2, 3, 1), R = (-6, 4, 1).
(c) P = (2, 1, 1), Q = (-2, 3, 1), R = (5, -1, 1).
6. Among the following eight points, the three points A, B, and C lie on a line. Determine all
subsets of three or more points which lie on a line: A = (2, 1, 1), B = (6, -1, 1), C =
(-6, 5, 1), D = (-2, 3, 1), E = (1, 1, 1), F = (-4, 4, 1), G = (-13, 9, 1), H = (14, -6, 1).
7. A line through the point P = (1, 1, 1) is parallel to the vector A = (1, 2, 3). Another line
through Q = (2, 1, 0) is parallel to the vector B = (3, 8, 13). Prove that the two lines intersect
and determine the point of intersection.
8. (a) Prove that two lines L(P; A) and L(Q; B) in V_n intersect if and only if P - Q is in the
linear span of A and B.
(b) Determine whether or not the following two lines in V_3 intersect:
L = {(1, 1, -1) + t(-2, 1, 3)},
L' = {(3, -4, 1) + t(-1, 5, 2)}.
9. Let X(t) = P + tA be an arbitrary point on the line L(P; A), where P = (1, 2, 3) and A =
(1, -2, 2), and let Q = (3, 3, 1).
(a) Compute ||Q - X(t)||^2, the square of the distance between Q and X(t).
(b) Prove that there is exactly one point X(t_0) for which the distance ||Q - X(t)|| is a minimum,
and compute this minimum distance.
(c) Prove that Q - X(t_0) is orthogonal to A.
10. Let Q be a point not on the line L(P; A) in V_n.
(a) Let f(t) = ||Q - X(t)||^2, where X(t) = P + tA. Prove that f(t) is a quadratic polynomial
in t and that this polynomial takes on its minimum value at exactly one t, say at t = t_0.
(b) Prove that Q - X(t_0) is orthogonal to A.
11. Given two parallel lines L(P; A) and L(Q; A) in V_n. Prove that either L(P; A) = L(Q; A)
or the intersection L(P; A) ∩ L(Q; A) is empty.
12. Given two lines L(P; A) and L(Q; B) in V_n which are not parallel. Prove that the intersection
is either empty or consists of exactly one point.
13.6 Planes in Euclidean n-space
A line in n-space was defined to be a set of the form {P + tA} obtained by adding to a
given point P all vectors in the linear span of a nonzero vector A. A plane is defined in a
similar fashion except that we add to P all vectors in the linear span of two linearly independent vectors A and B. To make certain that V_n contains two linearly independent
vectors, we assume at the outset that n ≥ 2. Most of our applications will be concerned
with the case n = 3.
FIGURE 13.3 The plane through P spanned by A and B, and its geometric relation
to the plane through O spanned by A and B.
DEFINITION. A set M of points in V_n is called a plane if there is a point P and two linearly
independent vectors A and B such that
M = {P + sA + tB | s, t real}.
We shall denote the set more briefly by writing M = {P + sA + tB}. Each point of M
is said to be on the plane. In particular, taking s = t = 0, we see that P is on the plane. The
set {P + sA + tB} is also called the plane through P spanned by A and B. When P is the
origin, the plane is simply the linear span of A and B. Figure 13.3 shows a plane in V_3
through the origin spanned by A and B and also a plane through a nonzero point P spanned
by the same two vectors.
Now we shall deduce some properties of planes analogous to the properties of lines given
in Theorems 13.1 through 13.4. The first of these shows that the vectors A and B in the
definition of the plane {P + sA + tB} can be replaced by any other pair which has the
same linear span.
THEOREM 13.7. Two planes M = {P + sA + tB} and M' = {P + sC + tD} through
the same point P are equal if and only if the linear span of A and B is equal to the linear
span of C and D.
Proof. If the linear span of A and B is equal to that of C and D, then it is clear that
M = M'. Conversely, assume that M = M'. Plane M contains both P + A and P + B.
Since both these points are also on M', each of A and B must be in the linear span of C
and D. Similarly, each of C and D is in the linear span of A and B. Therefore the linear
span of A and B is equal to that of C and D.
The next theorem shows that the point P which occurs in the definition of the plane
{P + sA + tB} can be replaced by any other point Q on the same plane.
THEOREM 13.8. Two planes M = {P + sA + tB} and M' = {Q + sA + tB} spanned by
the same vectors A and B are equal if and only if Q is on M.
Proof. If M = M', then Q is certainly on M. To prove the converse, assume Q is on
M, say Q = P + aA + bB. Take any point X in M. Then X = P + sA + tB for some
scalars s and t. But P = Q - aA - bB, so X = Q + (s - a)A + (t - b)B. Therefore
X is in M', so M ⊆ M'. Similarly, we find that M' ⊆ M, so the two planes are equal.
Euclid's parallel postulate (Theorem 13.3) has an analog for planes. Before we state this
theorem we need to define parallelism of two planes. The definition is suggested by the
geometric representation in Figure 13.3.
DEFINITION. Two planes M = {P + sA + tB} and M' = {Q + sC + tD} are said to
be parallel if the linear span of A and B is equal to the linear span of C and D. We also say
that a vector X is parallel to the plane M if X is in the linear span of A and B.
THEOREM 13.9. Given a plane M and a point Q not on M, there is one and only one plane
M' which contains Q and is parallel to M.
Proof. Let M = {P + sA + tB} and consider the plane M' = {Q + sA + tB}. This
plane contains Q and is spanned by the same vectors A and B which span M. Therefore
M' is parallel to M. If M'' is another plane through Q parallel to M, then
M'' = {Q + sC + tD}
where the linear span of C and D is equal to that of A and B. By Theorem 13.7, we must
have M'' = M'. Therefore M' is the only plane through Q which is parallel to M.
Theorem 13.4 tells us that two distinct points determine a line. The next theorem shows
that three distinct points determine a plane, provided that the three points are not collinear.
THEOREM 13.10. If P, Q, and R are three points not on the same line, then there is one
and only one plane M containing these three points. It can be described as the set
(13.4) M = {P + s(Q - P) + t(R - P)}.
Proof. We assume first that one of the points, say P, is the origin. Then Q and R are
not on the same line through the origin so they are linearly independent. Therefore, they
span a plane through the origin, say the plane
M' = {sQ + tR}.
This plane contains all three points O, Q, and R.
Now we prove that M' is the only plane which contains all three points O, Q, and R.
Any other plane through the origin has the form
M'' = {sA + tB},
where A and B are linearly independent. If M'' contains Q and R, we have
(13.5) Q = aA + bB, R = cA + dB,
for some scalars a, b, c, d. Hence, every linear combination of Q and R is also a linear
combination of A and B, so M' ⊆ M''.
To prove that M'' ⊆ M', it suffices to prove that each of A and B is a linear combination
of Q and R. Multiplying the first equation in (13.5) by d and the second by b and subtracting, we eliminate B and get
(ad - bc)A = dQ - bR.
Now ad - bc cannot be zero, otherwise Q and R would be dependent. Therefore we can
divide by ad - bc and express A as a linear combination of Q and R. Similarly, we can
express B as a linear combination of Q and R, so we have M'' ⊆ M'. This proves the
theorem when one of the three points P, Q, R is the origin.
To prove the theorem in the general case, let M be the set in (13.4), and let C = Q - P,
D = R - P. First we show that C and D are linearly independent. If not we would have
D = tC for some scalar t, giving us R - P = t(Q - P), or R = P + t(Q - P), contradicting the fact that P, Q, R are not on the same line. Therefore the set M is a plane
through P spanned by the linearly independent pair C and D. This plane contains all three
points P, Q, and R (take s = 1, t = 0 to get Q, and s = 0, t = 1 to get R). Now we must
prove that this is the only plane containing P, Q, and R.
Let M' be any plane containing P, Q, and R. Since M' is a plane containing P, we have
M' = {P + sA + tB}
for some linearly independent pair A and B. Let M_0' = {sA + tB} be the plane through the
origin spanned by the same pair A and B. Clearly, M' contains a vector X if and only if
M_0' contains X - P. Since M' contains Q and R, the plane M_0' contains C = Q - P and
D = R - P. But we have just shown that there is one and only one plane containing O,
C, and D since C and D are linearly independent. Therefore M_0' = {sC + tD}, so M' =
{P + sC + tD} = M. This completes the proof.
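The description (13.4) of the plane through three points is easy to evaluate. The Python sketch below builds the function X(s, t) = P + s(Q - P) + t(R - P) from three given points; the name plane_through and the sample points are illustrative, and the code does not check that P, Q, R are non-collinear, which the theorem assumes.

```python
def plane_through(p, q, r):
    """Return a function X(s, t) = P + s(Q - P) + t(R - P), as in (13.4)."""
    c = tuple(qi - pi for pi, qi in zip(p, q))  # C = Q - P
    d = tuple(ri - pi for pi, ri in zip(p, r))  # D = R - P

    def x(s, t):
        return tuple(pi + s * ci + t * di for pi, ci, di in zip(p, c, d))

    return x


P, Q, R = (1, 0, 0), (0, 1, 0), (0, 0, 1)
X = plane_through(P, Q, R)
print(X(0, 0))  # (1, 0, 0), the point P
print(X(1, 0))  # (0, 1, 0), the point Q
print(X(0, 1))  # (0, 0, 1), the point R
```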
In Theorem 13.5 we proved that two vectors in V_n are linearly dependent if and only if
they lie on a line through the origin. The next theorem is the corresponding result for three
vectors.
THEOREM 13.11. Three vectors A, B, C in V_n are linearly dependent if and only if they
lie on the same plane through the origin.
Proof. Assume A, B, C are dependent. Then we can express one of the vectors as a
linear combination of the other two, say C = sA + tB. If A and B are independent, they
span a plane through the origin and C is on this plane. If A and B are dependent, then
A, B, and C lie on a line through the origin, and hence they lie on any plane through the
origin which contains all three points A, B, and C.
To prove the converse, assume that A, B, C lie on the same plane through the origin, say
the plane M. If A and B are dependent, then A, B, and C are dependent, and there is
nothing more to prove. If A and B are independent, they span a plane M' through the
origin. By Theorem 13.10, there is one and only one plane through O containing A and B.
Therefore M' = M. Since C is on this plane, we must have C = sA + tB, so A, B, and
C are dependent.
13.7 Planes and vector-valued functions
The correspondence which associates to each pair of real numbers s and t the vector
P + sA + tB on the plane M = {P + sA + tB} is another example of a vector-valued
function. In this case, the domain of the function is the set of all pairs of real numbers
(s, t) and its range is the plane M. If we denote the function by X and the function values
by X(s, t), then for each pair (s, t) we have
(13.6) X(s, t) = P + sA + tB.
We call X a vector-valued function of two real variables. The scalars s and t are called
parameters, and the equation (13.6) is called a parametric or vector equation of the plane.
This is analogous to the representation of a line by a vector-valued function of one real
variable. The presence of two parameters in Equation (13.6) gives the plane a two-dimensional quality. When each vector is in V_3 and is expressed in terms of its components,
say
P = (p_1, p_2, p_3), A = (a_1, a_2, a_3), B = (b_1, b_2, b_3), and X(s, t) = (x, y, z),
the vector equation (13.6) can be replaced by three scalar equations,
x = p_1 + sa_1 + tb_1, y = p_2 + sa_2 + tb_2, z = p_3 + sa_3 + tb_3.
The parameters s and t can always be eliminated from these three equations to give one
linear equation of the form ax + by + cz = d, called a Cartesian equation of the plane.
We illustrate with an example.
EXAMPLE. Let M = {P + sA + tB}, where P = (1, 2, 3), A = (1, 2, 1), and B =
(1, -4, -1). The corresponding vector equation is
X(s, t) = (1, 2, 3) + s(1, 2, 1) + t(1, -4, -1).
From this we obtain the three scalar parametric equations
x = 1 + s + t, y = 2 + 2s - 4t, z = 3 + s - t.
To obtain a Cartesian equation, we rewrite the first and third equations in the form x - 1 =
s + t, z - 3 = s - t. Adding and then subtracting these equations, we find that 2s =
x + z - 4, 2t = x - z + 2. Substituting in the equation for y, we are led to the Cartesian
equation x + y - 3z = -6. We shall return to a further study of linear Cartesian equations in Section 13.16.
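The elimination in this example can be double-checked by generating points of M from the parametric equations and substituting them into the Cartesian equation. The short Python sketch below does this for a grid of integer parameter values; the function name plane_point is an illustrative choice.

```python
def plane_point(p, a, b, s, t):
    """Point X(s, t) = P + sA + tB, computed component by component."""
    return tuple(pi + s * ai + t * bi for pi, ai, bi in zip(p, a, b))


P, A, B = (1, 2, 3), (1, 2, 1), (1, -4, -1)

for s in range(-3, 4):
    for t in range(-3, 4):
        x, y, z = plane_point(P, A, B, s, t)
        # Every point of M satisfies the Cartesian equation x + y - 3z = -6.
        assert x + y - 3 * z == -6
        # The parameters are recovered by the elimination step in the example.
        assert 2 * s == x + z - 4 and 2 * t == x - z + 2
```

A check of this kind confirms only that M is contained in the set of solutions of the Cartesian equation; the converse inclusion follows from the elimination argument itself.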
13.8 Exercises
1. Let M = {P + sA + tB}, where P = (1, 2, -3), A = (3, 2, 1), and B = (1, 0, 4). Determine
which of the following points are on M.
(a) (1, 2, 0); (b) (1, 2, 1); (c) (6, 4, 6); (d) (6, 6, 6); (e) (6, 6, -5).
2. The three points P = (1, 1, -1), Q = (3, 3, 2), and R = (3, -1, -2) determine a plane M.
Determine which of the following points are on M.
(a) (2, 2, 1/2); (b) (4, 0, -4); (c) (-3, 1, -3); (d) (3, 1, 3); (e) (0, 0, 0).
3. Determine scalar parametric equations for each of the following planes.
(a) The plane through (1, 2, 1) spanned by the vectors (0, 1,O) and (1, 1, 4).
(b) The plane through (1,2, l), (0, 1, 0), and (1, 1,4).
4. A plane M has scalar parametric equations
x=l+s-2t,
y=2+s+4t,
2 = 2s + t.
(a) Determine which of the following points are on M: (0, 0, 0), (1, 2, 0), (2, -3, -3).
(b) Find vectors P, A, and B such that M = {P + SA + tB}.
5. Let M be the plane determined by three points P, Q, R not on the same line.
(a) If p, q, r are three scalars such that p + q + r = 1, prove that pP + qQ + rR is on M.
(b) Prove that every point on M has the formpP + qQ + rR, wherep + q + r = 1.
6. Determine a linear Cartesian equation of the form ax + by + cz = d for each of the following
planes.
(a) The plane through (2, 3, 1) spanned by (3, 2, 1) and (-1, -2, -3).
(b) The plane through (2, 3, 1), (-2, -1, -3), and (4, 3, -1).
(c) The plane through (2, 3, 1) parallel to the plane through the origin spanned by (2, 0, -2)
and (1, 1, 1).
7. A plane M has the Cartesian equation 3x - 5y + z = 9.
(a) Determine which of the following points are on M: (0, -2, -1), (-1, -2, 2), (3, 1, -5).
(b) Find vectors P, A, and B such that M = {P + sA + tB}.
8. Consider the two planes M = {P + sA + tB} and M′ = {Q + sC + tD}, where P = (1, 1, 1),
A = (2, -1, 3), B = (-1, 0, 2), Q = (2, 3, 1), C = (1, 2, 3), and D = (3, 2, 1). Find two
distinct points on the intersection M ∩ M′.
9. Given a plane M = {P + sA + tB}, where P = (2, 3, 1), A = (1, 2, 3), and B = (3, 2, 1), and
another plane M′ with Cartesian equation x - 2y + z = 0.
(a) Determine whether M and M′ are parallel.
(b) Find two points on the intersection M′ ∩ M″ if M″ has the Cartesian equation
x + 2y + z = 0.
10. Let L be the line through (1, 1, 1) parallel to the vector (2, -1, 3), and let M be the plane
through (1, 1, -2) spanned by the vectors (2, 1, 3) and (0, 1, 1). Prove that there is one and
only one point on the intersection L ∩ M and determine this point.
11. A line with direction vector X is said to be parallel to a plane M if X is parallel to M. Let
L be the line through (1, 1, 1) parallel to the vector (2, -1, 3). Determine whether L is parallel
to each of the following planes.
(a) The plane through (1, 1, -2) spanned by (2, 1, 3) and (2, 1, 1).
(b) The plane through (1, 1, -2), (3, 5, 2), and (2, 4, -1).
(c) The plane with Cartesian equation x + 2y + 3z = -3.
12. Two distinct points P and Q lie on a plane M. Prove that every point on the line through P
and Q also lies on M.
13. Given the line L through (1, 2, 3) parallel to the vector (1, 1, 1), and given a point (2, 3, 5)
which is not on L. Find a Cartesian equation for the plane M through (2, 3, 5) which contains
every point on L.
14. Given a line L and a point P not on L. Prove that there is one and only one plane through
P which contains every point on L.
13.9 The cross product
In many applications of vector algebra to problems in geometry and mechanics it is
helpful to have an easy method for constructing a vector perpendicular to each of two
given vectors A and B. This is accomplished by means of the cross product A x B (read
“A cross B”) which is defined as follows:
DEFINITION. Let $A = (a_1, a_2, a_3)$ and $B = (b_1, b_2, b_3)$ be two vectors in $V_3$. Their cross
product A × B (in that order) is defined to be the vector
$$A \times B = (a_2 b_3 - a_3 b_2,\; a_3 b_1 - a_1 b_3,\; a_1 b_2 - a_2 b_1).$$
The following properties are easily deduced from this definition.
THEOREM 13.12. For all vectors A, B, C in $V_3$ and for all real c we have:
(a) $A \times B = -(B \times A)$ (skew symmetry),
(b) $A \times (B + C) = (A \times B) + (A \times C)$ (distributive law),
(c) $c(A \times B) = (cA) \times B$,
(d) $A \cdot (A \times B) = 0$ (orthogonality to A),
(e) $B \cdot (A \times B) = 0$ (orthogonality to B),
(f) $\|A \times B\|^2 = \|A\|^2 \|B\|^2 - (A \cdot B)^2$ (Lagrange’s identity),
(g) $A \times B = 0$ if and only if A and B are linearly dependent.
Proof. Parts (a), (b), and (c) follow quickly from the definition and are left as exercises
for the reader. To prove (d), we note that
$$A \cdot (A \times B) = a_1(a_2 b_3 - a_3 b_2) + a_2(a_3 b_1 - a_1 b_3) + a_3(a_1 b_2 - a_2 b_1) = 0.$$
Part (e) follows in the same way, or it can be deduced from (a) and (d). To prove (f), we
write
$$\|A \times B\|^2 = (a_2 b_3 - a_3 b_2)^2 + (a_3 b_1 - a_1 b_3)^2 + (a_1 b_2 - a_2 b_1)^2$$
and
$$\|A\|^2 \|B\|^2 - (A \cdot B)^2 = (a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1 b_1 + a_2 b_2 + a_3 b_3)^2$$
and then verify by brute force that the two right-hand members are identical.
Property (f) shows that A × B = 0 if and only if $(A \cdot B)^2 = \|A\|^2 \|B\|^2$. By the Cauchy-Schwarz
inequality (Theorem 12.3), this happens if and only if one of the vectors is a scalar
multiple of the other. In other words, A × B = 0 if and only if A and B are linearly
dependent, which proves (g).
EXAMPLES. Both (a) and (g) show that A × A = 0. From the definition of cross product
we find that
$$i \times j = k, \qquad j \times k = i, \qquad k \times i = j.$$
The cross product is not associative. For example, we have
$$i \times (i \times j) = i \times k = -j$$
but
$$(i \times i) \times j = 0 \times j = 0.$$
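The component definition of the cross product translates directly into code. The sketch below is an illustration, not part of the text: the function names `cross` and `dot` and the sample vectors (1, 2, 3) and (4, 5, 6) are assumptions chosen for the check. It reproduces the products of the unit coordinate vectors, the failure of associativity, and properties (d), (e), and (f) of Theorem 13.12 on one integer example.

```python
def cross(a, b):
    """Cross product of two 3-vectors, following the component definition."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Products of the unit coordinate vectors.
unit_products_ok = cross(i, j) == k and cross(j, k) == i and cross(k, i) == j

# Non-associativity: i x (i x j) differs from (i x i) x j.
left = cross(i, cross(i, j))    # equals -j
right = cross(cross(i, i), j)   # equals the zero vector

# Orthogonality and Lagrange's identity on one sample pair.
A, B = (1, 2, 3), (4, 5, 6)
N = cross(A, B)
orthogonal = dot(A, N) == 0 and dot(B, N) == 0
lagrange_holds = dot(N, N) == dot(A, A) * dot(B, B) - dot(A, B) ** 2

print(unit_products_ok, left, right, orthogonal, lagrange_holds)
```

Integer inputs keep every comparison exact, so no floating-point tolerance is needed here.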
The next theorem describes two more fundamental properties of the cross product.
THEOREM 13.13. Let A and B be linearly independent vectors in $V_3$. Then we have the
following:
(a) The vectors A, B, A × B are linearly independent.
(b) Every vector N in $V_3$ orthogonal to both A and B is a scalar multiple of A × B.
Proof. Let C = A × B. Then C ≠ 0 since A and B are linearly independent. Given
scalars a, b, c such that aA + bB + cC = 0, we take the dot product of each member with
C and use the relations A · C = B · C = 0 to find c = 0. This gives aA + bB = 0, so
a = b = 0 since A and B are independent. This proves (a).
Let N be any vector orthogonal to both A and B, and let C = A × B. We shall prove
that
$$(N \cdot C)^2 = (N \cdot N)(C \cdot C).$$
Then from the Cauchy-Schwarz inequality (Theorem 12.3) it follows that N is a scalar
multiple of C.
Since A, B, and C are linearly independent, we know, by Theorem 12.10(c), that they
span $V_3$. In particular, they span N, so we can write
$$N = aA + bB + cC$$
for some scalars a, b, c. This gives us
$$N \cdot N = N \cdot (aA + bB + cC) = c\,N \cdot C$$
since N · A = N · B = 0. Also, since C · A = C · B = 0, we have
$$C \cdot N = C \cdot (aA + bB + cC) = c\,C \cdot C.$$
Therefore, $(N \cdot N)(C \cdot C) = (c\,N \cdot C)(C \cdot C) = (N \cdot C)(c\,C \cdot C) = (N \cdot C)^2$, which completes
the proof.
Theorem 13.12 helps us visualize the cross product geometrically. From properties (d)
and (e), we know that A x B is perpendicular to both A and B. When the vector A x B is
represented geometrically by an arrow, the direction of the arrow depends on the relative
positions of the three unit coordinate vectors. If i, j, and k are arranged as shown in Figure
13.4(a), they are said to form a right-handed coordinate system. In this case, the direction of
A × B is determined by the “right-hand rule.” That is to say, when A is rotated into B
in such a way that the fingers of the right hand point in the direction of rotation, then the
thumb indicates the direction of A × B (assuming, for the sake of the discussion, that the
thumb is perpendicular to the other fingers). In a left-handed coordinate system, as shown
in Figure 13.4(b), the direction of A × B is reversed and may be determined by a corresponding left-hand rule.
FIGURE 13.4 Illustrating the relative positions of A, B, and A × B. (a) A right-handed coordinate system. (b) A left-handed coordinate system.
The length of A × B has an interesting geometric interpretation. If A and B are nonzero
vectors making an angle θ with each other, where 0 ≤ θ ≤ π, we may write
$A \cdot B = \|A\| \|B\| \cos\theta$ in property (f) of Theorem 13.12 to obtain
$$\|A \times B\|^2 = \|A\|^2 \|B\|^2 (1 - \cos^2\theta) = \|A\|^2 \|B\|^2 \sin^2\theta,$$
from which we find
$$\|A \times B\| = \|A\| \|B\| \sin\theta.$$
Since $\|B\| \sin\theta$ is the altitude of the parallelogram determined by A and B (see Figure 13.5),
we see that the length of A × B is equal to the area of this parallelogram.
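As a numerical illustration of the area interpretation, the following sketch computes the area of one parallelogram in two ways: as the length of the cross product and as the product of the two lengths and sin θ. The vectors (3, 0, 0) and (1, 2, 0) are an assumed example with base 3 and altitude 2, and the helper names are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

A = (3.0, 0.0, 0.0)   # base of length 3 along the x-axis
B = (1.0, 2.0, 0.0)   # altitude 2 above that base

area_from_cross = norm(cross(A, B))

# Angle between A and B, recovered from the dot product.
theta = math.acos(dot(A, B) / (norm(A) * norm(B)))
area_from_angle = norm(A) * norm(B) * math.sin(theta)

print(area_from_cross, area_from_angle)
```

Both values agree with base times altitude up to floating-point rounding.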
FIGURE 13.5 The length of A × B is the area of the parallelogram determined by A and B.
13.10 The cross product expressed as a determinant
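The formula which defines the cross product can be put in a more compact form with the
aid of determinants. If a, b, c, d are four numbers, the difference ad - bc is often denoted
by the symbol
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix}$$
and is called a determinant (of order two). The numbers a, b, c, d are called its elements,
and they are said to be arranged in two horizontal rows, a, b and c, d, and in two vertical
columns, a, c and b, d. Note that an interchange of two rows or of two columns only changes
the sign of the determinant. For example, since ad - bc = -(bc - ad), we have
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = -\begin{vmatrix} b & a \\ d & c \end{vmatrix}.$$
If we express each of the components of the cross product as a determinant of order two,
the formula defining A × B becomes
$$A \times B = \left( \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix},\; \begin{vmatrix} a_3 & a_1 \\ b_3 & b_1 \end{vmatrix},\; \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \right).$$
This can also be expressed in terms of the unit coordinate vectors i, j, k as follows:
$$A \times B = \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} i + \begin{vmatrix} a_3 & a_1 \\ b_3 & b_1 \end{vmatrix} j + \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} k. \tag{13.7}$$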
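Determinants of order three are written with three rows and three columns and they may
be defined in terms of second-order determinants by the formula
$$\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}. \tag{13.8}$$
This is said to be an “expansion” of the determinant along its first row. Note that the
determinant on the right that multiplies $a_1$ may be obtained from that on the left by deleting
the row and column in which $a_1$ appears. The other two determinants on the right are
obtained similarly.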
Determinants of order greater than three are discussed in Volume II. Our only purpose
in introducing determinants of order two and three at this stage is to have a useful device
for writing certain formulas in a compact form that makes them easier to remember.
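Determinants are meaningful if the elements in the first row are vectors. For example,
if we write the determinant
$$\begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}$$
and “expand” this according to the rule prescribed in (13.8), we find that the result is equal
to the right member of (13.7). In other words, we may write the definition of the cross
product A × B in the following compact form:
$$A \times B = \begin{vmatrix} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}.$$
For example, to compute the cross product of A = 2i - 8j + 3k and B = 4j + 3k, we
write
$$A \times B = \begin{vmatrix} i & j & k \\ 2 & -8 & 3 \\ 0 & 4 & 3 \end{vmatrix} = \begin{vmatrix} -8 & 3 \\ 4 & 3 \end{vmatrix} i - \begin{vmatrix} 2 & 3 \\ 0 & 3 \end{vmatrix} j + \begin{vmatrix} 2 & -8 \\ 0 & 4 \end{vmatrix} k = -36i - 6j + 8k.$$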
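The cross product of A = 2i - 8j + 3k and B = 4j + 3k can also be computed mechanically by expanding the symbolic determinant along its first row. The sketch below does this with a hypothetical helper `det2` for second-order determinants; the function names are illustrative and the result is returned as the triple of coefficients of i, j, k.

```python
def det2(a, b, c, d):
    """Second-order determinant with rows (a, b) and (c, d): ad - bc."""
    return a * d - b * c

def cross_by_expansion(A, B):
    """Expand the determinant with rows (i, j, k), A, B along its first row."""
    a1, a2, a3 = A
    b1, b2, b3 = B
    coeff_i = det2(a2, a3, b2, b3)     # delete the row and column containing i
    coeff_j = -det2(a1, a3, b1, b3)    # the middle term carries a minus sign
    coeff_k = det2(a1, a2, b1, b2)     # delete the row and column containing k
    return (coeff_i, coeff_j, coeff_k)

A = (2, -8, 3)   # 2i - 8j + 3k
B = (0, 4, 3)    # 4j + 3k
result = cross_by_expansion(A, B)
print(result)
```

The three coefficients correspond to the vector -36i - 6j + 8k, and each is orthogonal-consistent with the component definition of A × B.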
13.11 Exercises
1. Let A = -i + 2k, B = 2i + j - k, C = i + 2j + 2k. Compute each of the following
vectors in terms of i, j, k:
(a) A × B; (b) B × C; (c) C × A;
(d) A × (C × A); (e) (A × B) × C; (f) A × (B × C);
(g) (A × C) × B; (h) (A + B) × (A - C); (i) (A × B) × (A × C).
2. In each case find a vector of length 1 in $V_3$ orthogonal to both A and B:
(a) A = i + j + k, B = 2i + 3j - k;
(b) A = 2i - 3j + 4k, B = -i + 5j + 7k;
(c) A = i - 2j + 3k, B = -3i + 2j - k.
3. In each case use the cross product to compute the area of the triangle with vertices A, B, C:
(a) A = (0, 2, 2), B = (2, 0, -1), C = (3, 4, 0);
(b) A = (-2, 3, 1), B = (1, -3, 4), C = (1, 2, 1);
(c) A = (0, 0, 0), B = (0, 1, 1), C = (1, 0, 1).
4. If A = 2i + 5j + 3k, B = 2i + 7j + 4k, and C = 3i + 3j + 6k, express the cross product
(A - C) × (B - A) in terms of i, j, k.
5. Prove that $\|A \times B\| = \|A\| \|B\|$ if and only if A and B are orthogonal.
6. Given two linearly independent vectors A and B in $V_3$. Let C = (B × A) - B.
(a) Prove that A is orthogonal to B + C.
(b) Prove that the angle θ between B and C satisfies ½π < θ < π.
(c) If $\|B\| = 1$ and $\|B \times A\| = 2$, compute the length of C.
7. Let A and B be two orthogonal vectors in $V_3$, each having length 1.
(a) Prove that A, B, A × B is an orthonormal basis for $V_3$.
(b) Let C = (A × B) × A. Prove that $\|C\| = 1$.
(c) Draw a figure showing the geometric relation between A, B, and A × B, and use this
figure to obtain the relations
(A × B) × A = B,    (A × B) × B = -A.
(d) Prove the relations in part (c) algebraically.
8. (a) If A × B = 0 and A · B = 0, then at least one of A or B is zero. Prove this statement
and give its geometric interpretation.
(b) Given A ≠ 0. If A × B = A × C and A · B = A · C, prove that B = C.
9. Let A = 2i - j + 2k and C = 3i + 4j - k.
(a) Find a vector B such that A × B = C. Is there more than one solution?
(b) Find a vector B such that A × B = C and A · B = 1. Is there more than one solution?
10. Given a nonzero vector A and a vector C orthogonal to A, both vectors in $V_3$. Prove that there
is exactly one vector B such that A × B = C and A · B = 1.
11. Three vertices of a parallelogram are at the points A = (1, 0, 1), B = (-1, 1, 1), C =
(2, -1, 2).
(a) Find all possible points D which can be the fourth vertex of the parallelogram.
(b) Compute the area of triangle ABC.
12. Given two nonparallel vectors A and B in $V_3$ with A · B = 2, $\|A\| = 1$, $\|B\| = 4$. Let C =
2(A × B) - 3B. Compute A · (B + C), $\|C\|$, and the cosine of the angle θ between B and C.
13. Given two linearly independent vectors A and B in $V_3$. Determine whether each of the following statements is true or false.
(a) A + B, A - B, A × B are linearly independent.
(b) A + B, A + (A × B), B + (A × B) are linearly independent.
(c) A, B, (A + B) × (A - B) are linearly independent.
14. (a) Prove that three vectors A, B, C in $V_3$ lie on a line if and only if (B - A) × (C - A) = 0.
(b) If A ≠ B, prove that the line through A and B consists of the set of all vectors P such
that (P - A) × (P - B) = 0.
15. Given two orthogonal vectors A, B in $V_3$, each of length 1. Let P be a vector satisfying the
equation P × B = A - P. Prove each of the following statements.
(a) P is orthogonal to B and has length ½√2.
(b) P, B, P × B form a basis for $V_3$.
(c) (P × B) × B = -P.
(d) P = ½A - ½(A × B).
13.12 The scalar triple product
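The dot and cross products can be combined to form the scalar triple product A · B × C,
which can only mean A · (B × C). Since this is a dot product of two vectors, its value is a
scalar. We can compute this scalar by means of determinants. Write $A = (a_1, a_2, a_3)$,
$B = (b_1, b_2, b_3)$, $C = (c_1, c_2, c_3)$ and express B × C according to Equation (13.7). Forming
the dot product with A, we obtain
$$A \cdot B \times C = a_1 \begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2 \begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3 \begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}.$$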
Thus, A · B × C is equal to the determinant whose rows are the components of the factors
A, B, and C.
In Theorem 13.12 we found that two vectors A and B are linearly dependent if and only
if their cross product A × B is the zero vector. The next theorem gives a corresponding
criterion for linear dependence of three vectors.
THEOREM 13.14. Three vectors A, B, C in $V_3$ are linearly dependent if and only if
$$A \cdot B \times C = 0.$$
Proof. Assume first that A, B, and C are dependent. If B and C are dependent, then
B × C = 0, and hence A · B × C = 0. Suppose, then, that B and C are independent.
Since all three are dependent, there exist scalars a, b, c, not all zero, such that aA + bB +
cC = 0. We must have a ≠ 0 in this relation, otherwise B and C would be dependent.
Therefore, we can divide by a and express A as a linear combination of B and C, say A =
tB + sC. Taking the dot product of each member with B × C, we find
$$A \cdot (B \times C) = t\,B \cdot B \times C + s\,C \cdot B \times C = 0,$$
since each of B and C is orthogonal to B × C. Therefore dependence of A, B, and C
implies A · B × C = 0.
To prove the converse, assume that A · B × C = 0. If B and C are dependent, then so
are A, B, and C, and there is nothing more to prove. Assume then, that B and C are linearly
independent. Then, by Theorem 13.13, the three vectors B, C, and B × C are linearly
independent. Hence, they span A so we can write
$$A = aB + bC + c(B \times C)$$
for some scalars a, b, c. Taking the dot product of each member with B × C and using the
fact that A · (B × C) = 0, we find c = 0, so A = aB + bC. This proves that A, B, and C
are linearly dependent.
EXAMPLE. To determine whether the three vectors (2, 3, -1), (3, -7, 5) and (1, -5, 2)
are dependent, we form their scalar triple product, expressing it as the determinant
$$\begin{vmatrix} 2 & 3 & -1 \\ 3 & -7 & 5 \\ 1 & -5 & 2 \end{vmatrix} = 2(-14 + 25) - 3(6 - 5) - 1(-15 + 7) = 27.$$
Since the scalar triple product is nonzero, the vectors are linearly independent.
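The dependence test of Theorem 13.14 is easy to automate. In the sketch below, `triple` computes A · (B × C) from the component definitions and `are_dependent` applies the theorem; both names are illustrative assumptions. Exact integer arithmetic is used, so the comparison with zero is reliable for integer inputs only.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triple(a, b, c):
    """Scalar triple product a . (b x c)."""
    return dot(a, cross(b, c))

def are_dependent(a, b, c):
    """Theorem 13.14: dependent if and only if the scalar triple product is zero."""
    return triple(a, b, c) == 0

A, B, C = (2, 3, -1), (3, -7, 5), (1, -5, 2)
value = triple(A, B, C)
print(value, are_dependent(A, B, C))

# A deliberately dependent triple: the third vector is the sum of the first two.
print(are_dependent(A, B, (5, -4, 4)))
```

With floating-point components one would compare against a small tolerance instead of testing for exact equality with zero.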
The scalar triple product has an interesting geometric interpretation. Figure 13.6 shows
a parallelepiped determined by three geometric vectors A, B, C not in the same plane. Its
altitude is $\|C\| \cos\phi$, where φ is the angle between A × B and C. In this figure, cos φ
is positive because 0 ≤ φ < ½π. The area of the parallelogram which forms the base is
$\|A \times B\|$, and this is also the area of each cross section parallel to the base. Integrating
the cross-sectional area from 0 to $\|C\| \cos\phi$, we find that the volume of the parallelepiped
is $\|A \times B\| (\|C\| \cos\phi)$, the area of the base times the altitude. But we have
$$\|A \times B\| (\|C\| \cos\phi) = (A \times B) \cdot C.$$
In other words, the scalar triple product A × B · C is equal to the volume of the parallelepiped determined by A, B, C. When ½π < φ ≤ π, cos φ is negative and the product
A × B · C is the negative of the volume. If A, B, C are on a plane through the origin, they
are linearly dependent and their scalar triple product is zero. In this case, the parallelepiped
degenerates and has zero volume.
FIGURE 13.6 Geometric interpretation of the scalar triple product as the volume of a parallelepiped. (Altitude = ‖C‖ cos φ; volume = A × B · C.)
This geometric interpretation of the scalar triple product suggests certain algebraic
properties of this product. For example, a cyclic permutation of the vectors A, B, C
leaves the scalar triple product unchanged. By this we mean that
$$A \times B \cdot C = B \times C \cdot A = C \times A \cdot B. \tag{13.9}$$
An algebraic proof of this property is outlined in Exercise 7 of Section 13.14. This property
implies that the dot and cross are interchangeable in a scalar triple product. In fact, the
commutativity of the dot product implies (B × C) · A = A · (B × C) and when this is
combined with the first equation in (13.9), we find that
$$A \times B \cdot C = A \cdot B \times C. \tag{13.10}$$
The scalar triple product A · B × C is often denoted by the symbol [ABC] without indicating the dot or cross. Because of Equation (13.10), there is no ambiguity in this notation: the product depends only on the order of the factors A, B, C and not on the positions of
the dot and cross.
13.13 Cramer’s rule for solving a system of three linear equations
The scalar triple product may be used to solve a system of three simultaneous linear
equations in three unknowns x, y, z. Suppose the system is written in the form
$$\begin{aligned} a_1 x + b_1 y + c_1 z &= d_1, \\ a_2 x + b_2 y + c_2 z &= d_2, \\ a_3 x + b_3 y + c_3 z &= d_3. \end{aligned} \tag{13.11}$$
Let A be the vector with components $a_1, a_2, a_3$ and define B, C, and D similarly. Then the
three equations in (13.11) are equivalent to the single vector equation
$$xA + yB + zC = D. \tag{13.12}$$
If we dot multiply both sides of this equation with B × C, writing [ABC] for A · B × C,
we find that
$$x[ABC] + y[BBC] + z[CBC] = [DBC].$$
Since [BBC] = [CBC] = 0, the coefficients of y and z drop out and we obtain
$$x = \frac{[DBC]}{[ABC]} \quad \text{if } [ABC] \neq 0. \tag{13.13}$$
A similar argument yields analogous formulas for y and z. Thus we have
$$y = \frac{[ADC]}{[ABC]} \quad \text{and} \quad z = \frac{[ABD]}{[ABC]} \quad \text{if } [ABC] \neq 0. \tag{13.14}$$
The condition [ABC] # 0 means that the three vectors A, B, C are linearly independent.
In this case, (13.12) shows that every vector D in 3-space is spanned by A, B, C and the
multipliers x, y, z are uniquely determined by the formulas in (13.13) and (13.14). When
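the scalar triple products that occur in these formulas are written as determinants, the
result is known as Cramer’s rule for solving the system (13.11):
$$x = \frac{\begin{vmatrix} d_1 & b_1 & c_1 \\ d_2 & b_2 & c_2 \\ d_3 & b_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}, \qquad y = \frac{\begin{vmatrix} a_1 & d_1 & c_1 \\ a_2 & d_2 & c_2 \\ a_3 & d_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}, \qquad z = \frac{\begin{vmatrix} a_1 & b_1 & d_1 \\ a_2 & b_2 & d_2 \\ a_3 & b_3 & d_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}.$$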
If [ABC] = 0, then A, B, C lie on a plane through the origin and the system has no
solution unless D lies in the same plane. In this latter case, it is easy to show that there are
infinitely many solutions of the system. In fact, the vectors A, B, C are linearly dependent
so there exist scalars u, v, w not all zero such that uA + vB + wC = 0. If the triple (x, y, z)
satisfies (13.12) then so does the triple (x + tu, y + tv, z + tw) for all real t, since we have
$$(x + tu)A + (y + tv)B + (z + tw)C = (xA + yB + zC) + t(uA + vB + wC) = xA + yB + zC.$$
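Formulas (13.13) and (13.14) can be sketched in a few lines of code. The function `cramer_solve` below is a hypothetical helper that takes the column vectors A, B, C, D of the vector equation xA + yB + zC = D and returns (x, y, z) as exact fractions, or `None` when [ABC] = 0. The sample system x + y + z = 6, 2x - y + z = 3, x + 2y - z = 2 is an assumption chosen for illustration and does not come from the text.

```python
from fractions import Fraction

def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def triple(a, b, c):
    """Scalar triple product [abc] = a . (b x c)."""
    return sum(x * y for x, y in zip(a, cross(b, c)))

def cramer_solve(A, B, C, D):
    """Solve xA + yB + zC = D via (13.13) and (13.14); None if [ABC] = 0."""
    denom = triple(A, B, C)
    if denom == 0:
        return None  # no solution or infinitely many; not handled in this sketch
    x = Fraction(triple(D, B, C), denom)
    y = Fraction(triple(A, D, C), denom)
    z = Fraction(triple(A, B, D), denom)
    return (x, y, z)

# Columns of coefficients of x, y, z and the right-hand side (integer entries).
A = (1, 2, 1)
B = (1, -1, 2)
C = (1, 1, -1)
D = (6, 3, 2)
solution = cramer_solve(A, B, C, D)
print(solution)
```

The sketch assumes integer coefficients so that `Fraction` can be built directly from the triple products; for larger systems elimination methods are generally preferred.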
13.14 Exercises
1. Compute the scalar triple product A · B × C in each case.
(a) A = (3, 0, 0), B = (0, 4, 0), C = (0, 0, 8).
(b) A = (2, 3, -1), B = (3, -7, 5), C = (1, -5, 2).
(c) A = (2, 1, 3), B = (-3, 0, 6), C = (4, 5, -1).
2. Find all real t for which the three vectors (1, t, 1), (t, 1, 0), (0, 1, t) are linearly dependent.
3. Compute the volume of the parallelepiped determined by the vectors i + j, j + k, k + i.
4. Prove that A × B = A · (B × i) i + A · (B × j) j + A · (B × k) k.
5. Prove that i × (A × i) + j × (A × j) + k × (A × k) = 2A.
6. (a) Find all vectors ai + bj + ck which satisfy the relation
(ai + bj + ck) · k × (6i + 3j + 4k) = 3.
(b) Find that vector ai + bj + ck of shortest length which satisfies the relation in (a).
7. Use algebraic properties of the dot and cross products to derive the following properties of
the scalar triple product.
(a) (A + B) · (A + B) × C = 0.
(b) A · B × C = -B · A × C. This shows that switching the first two vectors reverses the
sign. [Hint: Use part (a) and distributive laws.]
(c) A · B × C = -A · C × B. This shows that switching the second and third vectors
reverses the sign. [Hint: Use skew-symmetry.]
(d) A · B × C = -C · B × A. This shows that switching the first and third vectors reverses
the sign. [Hint: Use (b) and (c).]
Equating the right members of (b), (c), and (d), we find that
A · B × C = B · C × A = C · A × B,
which shows that a cyclic permutation of A, B, C leaves their scalar triple product unchanged.
9. This exercise outlines a proof of the vector identity
$$A \times (B \times C) = (C \cdot A)B - (B \cdot A)C, \tag{13.15}$$
sometimes referred to as the “cab minus bac” formula. Let $B = (b_1, b_2, b_3)$, $C = (c_1, c_2, c_3)$
and prove that
$$i \times (B \times C) = c_1 B - b_1 C.$$
This proves (13.15) in the special case A = i. Prove corresponding formulas for A = j and
A = k, and then combine them to obtain (13.15).
10. Use the “cab minus bac” formula of Exercise 9 to derive the following vector identities.
(a) (A × B) × (C × D) = (A × B · D)C - (A × B · C)D.
(b) A × (B × C) + B × (C × A) + C × (A × B) = 0.
(c) A × (B × C) = (A × B) × C if and only if B × (C × A) = 0.
(d) (A × B) · (C × D) = (B · D)(A · C) - (B · C)(A · D).
11. Four vectors A, B, C, D in $V_3$ satisfy the relations A × C · B = 5, A × D · B = 3, C + D =
i + 2j + k, C - D = i - k. Compute (A × B) × (C × D) in terms of i, j, k.
12. Prove that (A × B) · (B × C) × (C × A) = (A · B × C)².
13. Prove or disprove the formula A × [A × (A × B)] · C = -‖A‖ A · B × C.
14. (a) Prove that the volume of the tetrahedron whose vertices are A, B, C, D is
$$\tfrac{1}{6}\,|(B - A) \cdot (C - A) \times (D - A)|.$$
(b) Compute this volume when A = (1, 1, 1), B = (0, 0, 2), C = (0, 3, 0), and D = (4, 0, 0).
15. (a) If B ≠ C, prove that the perpendicular distance from A to the line through B and C is
$$\|(A - B) \times (C - B)\| / \|B - C\|.$$
(b) Compute this distance when A = (1, -2, -5), B = (-1, 1, 1), and C = (4, 5, 1).
16. Heron’s formula for computing the area S of a triangle whose sides have lengths a, b, c states
that $S = \sqrt{s(s - a)(s - b)(s - c)}$, where s = (a + b + c)/2. This exercise outlines a
vectorial proof of this formula.
Assume the triangle has vertices at O, A, and B, with ‖A‖ = a, ‖B‖ = b, ‖B - A‖ = c.
(a) Combine the two identities
$$\|A \times B\|^2 = \|A\|^2 \|B\|^2 - (A \cdot B)^2, \qquad -2A \cdot B = \|A - B\|^2 - \|A\|^2 - \|B\|^2$$
to obtain the formula
$$4S^2 = a^2 b^2 - \tfrac{1}{4}(c^2 - a^2 - b^2)^2 = \tfrac{1}{4}(2ab - c^2 + a^2 + b^2)(2ab + c^2 - a^2 - b^2).$$
(b) Rewrite the formula in part (a) to obtain
$$S^2 = \tfrac{1}{16}(a + b + c)(a + b - c)(c - a + b)(c + a - b),$$
and thereby deduce Heron’s formula.
Use Cramer’s rule to solve the system of equations in each of Exercises 17, 18, and 19.
17. x + 2y + 3z = 5, 2x - y + 4z = 11, -y + z = 3.
18. x + y + 2z = 4, 3x - y - z = 2, 2x + 5y + 3z = 3.
19. x + y = 5, x + z = 2, y + z = 5.
20. If P = (1, 1, 1) and A = (2, 1, -l), prove that each point (x, y, z) on the line {P + tA}
satisfies the system of linear equations x - y + z = 1, x + y + 3z = 5, 3x + y + 7z = 11.
13.15 Normal vectors to planes
A plane was defined in Section 13.6 as a set of the form {P + sA + tB}, where A and B
are linearly independent vectors. Now we show that planes in $V_3$ can be described in an
entirely different way, using the concept of a normal vector.
DEFINITION. Let M = {P + sA + tB} be the plane through P spanned by A and B. A
vector N in $V_3$ is said to be perpendicular to M if N is perpendicular to both A and B. If, in
addition, N is nonzero, then N is called a normal vector to the plane.
Note: If N · A = N · B = 0, then N · (sA + tB) = 0, so a vector perpendicular to
both A and B is perpendicular to every vector in the linear span of A and B. Also, if
N is normal to a plane, so is tN for every real t ≠ 0.
THEOREM 13.15. Given a plane M = {P + sA + tB} through P spanned by A and B.
Let N = A × B. Then we have the following:
(a) N is a normal vector to M.
(b) M is the set of all X in $V_3$ satisfying the equation
$$(X - P) \cdot N = 0. \tag{13.16}$$
Proof. Since M is a plane, A and B are linearly independent, so A × B ≠ 0. This
proves (a) since A × B is orthogonal to both A and B.
To prove (b), let M′ be the set of all X in $V_3$ satisfying Equation (13.16). If X ∈ M, then
X - P is in the linear span of A and B, so X - P is orthogonal to N. Therefore X ∈ M′
which proves that M ⊆ M′. Conversely, suppose X ∈ M′. Then X satisfies (13.16). Since
A, B, N are linearly independent (Theorem 13.13), they span every vector in $V_3$ so, in
particular, we have
$$X - P = sA + tB + uN$$
for some scalars s, t, u. Taking the dot product of each member with N, we find u = 0,
so X - P = sA + tB. This shows that X ∈ M. Hence, M′ ⊆ M, which completes the
proof of (b).
The geometric meaning of Theorem 13.15 is shown in Figure 13.7. The points P and X
are on the plane and the normal vector N is orthogonal to X - P. This figure suggests the
following theorem.
THEOREM 13.16. Given a plane M through a point P, and given a nonzero vector N normal
to M, let
$$d = \frac{|P \cdot N|}{\|N\|}. \tag{13.17}$$
Then every X on M has length ‖X‖ ≥ d. Moreover, we have ‖X‖ = d if and only if X is the
projection of P along N:
$$X = tN, \quad \text{where } t = \frac{P \cdot N}{N \cdot N}.$$
Proof. The proof follows from the Cauchy-Schwarz inequality in exactly the same
way as we proved Theorem 13.6, the corresponding result for lines in $V_n$.
By the same argument we find that if Q is a point not on M, then among all points X
on M the smallest length ‖X - Q‖ occurs when X - Q is the projection of P - Q along
N. This minimum length is |(P - Q) · N|/‖N‖ and is called the distance from Q to the
plane. The number d in (13.17) is the distance from the origin to the plane.
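The distance formula and the nearest point can be sketched as follows. The functions `distance_to_plane` and `nearest_point` are hypothetical helpers, and the plane through P = (3, 0, 0) with normal N = (2, 6, 3) and the point Q = (1, 1, 1) are assumed sample data. The nearest point is computed as X = Q + tN with t = (P - Q) · N / (N · N), which is the statement that X - Q is the projection of P - Q along N.

```python
import math
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def distance_to_plane(Q, P, N):
    """Distance |(P - Q) . N| / ||N|| from Q to the plane through P with normal N."""
    diff = tuple(p - q for p, q in zip(P, Q))
    return abs(dot(diff, N)) / math.sqrt(dot(N, N))

def nearest_point(Q, P, N):
    """Point X on the plane closest to Q (integer inputs assumed): X = Q + tN."""
    diff = tuple(p - q for p, q in zip(P, Q))
    t = Fraction(dot(diff, N), dot(N, N))
    return tuple(q + t * n for q, n in zip(Q, N))

P = (3, 0, 0)
N = (2, 6, 3)
Q = (1, 1, 1)

d = distance_to_plane(Q, P, N)
X = nearest_point(Q, P, N)
print(d, X)

# X lies on the plane: (X - P) . N = 0.
on_plane = dot(tuple(x - p for x, p in zip(X, P)), N) == 0
print(on_plane)
```

Taking Q to be the origin reduces `distance_to_plane` to the number d of (13.17).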
13.16 Linear Cartesian equations for planes
The results of Theorems 13.15 and 13.16 can also be expressed in terms of components.
If we write N = (a, b, c), $P = (x_1, y_1, z_1)$, and X = (x, y, z), Equation (13.16) becomes
$$a(x - x_1) + b(y - y_1) + c(z - z_1) = 0. \tag{13.18}$$
This is called a Cartesian equation for the plane, and it is satisfied by those and only those
points (x, y, z) which lie on the plane. The set of points satisfying (13.18) is not altered if
we multiply each of a, b, c by a nonzero scalar t. This simply amounts to a different choice
of normal vector in (13.16).
We may transpose the terms not involving x, y, and z, and write (13.18) in the form
$$ax + by + cz = d_1, \tag{13.19}$$
where $d_1 = ax_1 + by_1 + cz_1$. An equation of this type is said to be linear in x, y, and z.
We have just shown that every point (x, y, z) on a plane satisfies a linear Cartesian equation
(13.19) in which not a11 three of a, b, c are zero. Conversely, every linear equation with this
property represents a plane. (The reader may verify this as an exercise.)
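Combining Theorem 13.15 with the component form gives a direct recipe for passing from a parametric description {P + sA + tB} to a linear Cartesian equation: take N = A × B as normal and set the right-hand side equal to N · P. The sketch below applies this to the plane through (1, 2, 3) spanned by (1, 2, 1) and (1, -4, -1) treated earlier in the chapter; the function name `cartesian_coefficients` is an illustrative assumption.

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cartesian_coefficients(P, A, B):
    """Return (a, b, c, d1) with ax + by + cz = d1 for the plane {P + sA + tB}."""
    N = cross(A, B)          # normal vector, nonzero when A and B are independent
    if N == (0, 0, 0):
        raise ValueError("A and B are linearly dependent; they do not span a plane")
    return (*N, dot(N, P))

a, b, c, d1 = cartesian_coefficients((1, 2, 3), (1, 2, 1), (1, -4, -1))
print(a, b, c, d1)
```

The coefficients obtained are a nonzero scalar multiple of those in x + y - 3z = -6, which describes the same plane.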
The number $d_1$ in Equation (13.19) bears a simple relation to the distance d of the plane
from the origin. Since $d_1 = P \cdot N$, we have $|d_1| = |P \cdot N| = d\|N\|$. In particular $|d_1| = d$
if the normal N has length 1. The plane passes through the origin if and only if $d_1 = 0$.
FIGURE 13.7 A plane through P and X with normal vector N.
FIGURE 13.8 A plane with intercepts 3, 1, 2.
EXAMPLE. The Cartesian equation 2x + 6y + 3z = 6 represents a plane with normal
vector N = 2i + 6j + 3k. We rewrite the Cartesian equation in the form
$$\frac{x}{3} + \frac{y}{1} + \frac{z}{2} = 1,$$
from which it is apparent that the plane intersects the coordinate axes at the points (3, 0, 0),
(0, 1, 0), and (0, 0, 2). The numbers 3, 1, 2 are called, respectively, the x-, y-, and z-intercepts of the plane. A knowledge of the intercepts makes it possible to sketch the plane
quickly. A portion of the plane is shown in Figure 13.8. Its distance d from the origin is
d = 6/‖N‖ = 6/7.
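The intercepts and the distance from the origin in this example can be recomputed from the coefficients alone. In the sketch below, `intercepts` and `distance_from_origin` are hypothetical helper names; `intercepts` assumes that a, b, c are all nonzero integers, which holds for 2x + 6y + 3z = 6.

```python
import math
from fractions import Fraction

def intercepts(a, b, c, d1):
    """x-, y-, z-intercepts of ax + by + cz = d1 (all coefficients assumed nonzero)."""
    return tuple(Fraction(d1, coef) for coef in (a, b, c))

def distance_from_origin(a, b, c, d1):
    """Distance |d1| / ||N|| from the origin to the plane, with N = (a, b, c)."""
    return abs(d1) / math.sqrt(a * a + b * b + c * c)

xi, yi, zi = intercepts(2, 6, 3, 6)
d = distance_from_origin(2, 6, 3, 6)
print(xi, yi, zi, d)
```

If a coefficient is zero, the plane is parallel to the corresponding axis and has no intercept on it, a case this sketch deliberately leaves out.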
Two parallel planes will have a common normal N. If N = (a, b, c), the Cartesian equations of two parallel planes can be written as follows:
$$ax + by + cz = d_1, \qquad ax + by + cz = d_2,$$
the only difference being in the right-hand members. The number $|d_1 - d_2| / \|N\|$ is called
the perpendicular distance between the two planes, a definition suggested by Theorem 13.16.
Two planes are called perpendicular if a normal of one is perpendicular to a normal of the
other. More generally, if the normals of two planes make an angle θ with each other, then
we say that θ is an angle between the two planes.
13.17 Exercises
1. Given vectors A = 2i + 3j - 4k and B = j + k.
(a) Find a nonzero vector N perpendicular to both A and B.
(b) Give a Cartesian equation for the plane through the origin spanned by A and B.
(c) Give a Cartesian equation for the plane through (1, 2, 3) spanned by A and B.
2. A plane has Cartesian equation x + 2y - 2z + 7 = 0. Find the following:
(a) a normal vector of unit length;
(b) the intercepts of the plane;
(c) the distance of the plane from the origin;
(d) the point Q on the plane nearest the origin.
3. Find a Cartesian equation of the plane which passes through (1, 2, -3) and is parallel to the
plane given by 3x - y + 2z = 4. What is the distance between the two planes?
4. Four planes have Cartesian equations x + 2y - 2z = 5, 3x - 6y + 3z = 2, 2x + y + 2z =
-1, and x - 2y + z = 7.
(a) Show that two of them are parallel and the other two are perpendicular.
(b) Find the distance between the two parallel planes.
5. The three points (1, 1, -l), (3, 3,2), and (3, -1, -2) determine a plane. Find (a) a vector
normal to the plane; (b) a Cartesian equation for the plane; (c) the distance of the plane
from the origin.
6. Find a Cartesian equation for the plane determined by (1, 2, 3), (2, 3, 4), and (-1, 7, -2).
7. Determine an angle between the planes with Cartesian equations x + y = 1 and y + z = 2.
8. A line parallel to a nonzero vector N is said to be perpendicular to a plane M if N is normal
to M. Find a Cartesian equation for the plane through (2, 3, -7), given that the line through
(1,2, 3) and (2,4, 12) is perpendicular to this plane.
9. Find a vector parametric equation for the line which contains the point (2, 1, -3) and is
perpendicular to the plane given by 4x - 3y + z = 5.
10. A point moves in space in such a way that at time t its position is given by the vector X(t) =
(1 - t)i + (2 - 3t)j + (2t - 1)k.
(a) Prove that the point moves along a line. (Call it L.)
(b) Find a vector N parallel to L.
(c) At what time does the point strike the plane given by 2x + 3y + 2z + 1 = 0?
(d) Find a Cartesian equation for that plane parallel to the one in part (c) which contains
the point X(3).
(e) Find a Cartesian equation for that plane perpendicular to L which contains the point X(2).
11. Find a Cartesian equation for the plane through (1, 1, 1) if a normal vector N makes angles
in, in, &v, with i, j, k, respectively.
12. Compute the volume of the tetrahedron whose vertices are at the origin and at the points
where the coordinate axes intersect the plane given by x + 2y + 3z = 6.
13. Find a vector A of length 1 perpendicular to i + 2j - 3k and parallel to the plane with
Cartesian equation x - y + 5z = 1.
14. Find a Cartesian equation of the plane which is parallel to both vectors i + j and j + k and
intersects the x-axis at (2,0, 0).
15. Find all points which lie on the intersection of the three planes given by 3x + y + z = 5,
3x + y + 5z = 7, x - y + 3z = 3.
16. Prove that three planes whose normals are linearly independent intersect in one and only
one point.
17. A line with direction vector A is said to be parallel to a plane M if A is parallel to M. A line
containing (1, 2, 3) is parallel to each of the planes given by x + 2y + 3z = 4, 2x + 3y +
4z = 5. Find a vector parametric equation for this line.
18. Given a line L not parallel to a plane M, prove that the intersection L ∩ M contains exactly
one point.
19. (a) Prove that the distance from the point $(x_0, y_0, z_0)$ to the plane with Cartesian equation
ax + by + cz + d = 0 is
$$\frac{|ax_0 + by_0 + cz_0 + d|}{(a^2 + b^2 + c^2)^{1/2}}.$$
(b) Find the point P on the plane given by 5x - 14y + 2z + 9 = 0 which is nearest to the
point Q = (-2, 15, -7).
20. Find a Cartesian equation for the plane parallel to the plane given by 2x - y + 22 + 4 = 0
if the point (3,2, -1) is equidistant from both planes.
21. (a) If three points A, B, C determine a plane, prove that the distance from a point Q to this
plane is $|(Q - A)\cdot(B - A)\times(C - A)| \,/\, \|(B - A)\times(C - A)\|$.
(b) Compute this distance if Q = (1, 0, 0), A = (0, 1, 1), B = (1, -1, 1), and C = (2, 3, 4).
22. Prove that if two planes M and M′ are not parallel, their intersection M ∩ M′ is a line.
23. Find a Cartesian equation for the plane which is parallel to j and which passes through the
intersection of the planes described by the equations x + 2y + 3z = 4 and 2x + y + z = 2.
24. Find a Cartesian equation for the plane parallel to the vector 3i - j + 2k if it contains every
point on the line of intersection of the planes with equations x + y = 3 and 2y + 3z = 4.
13.18 The conic sections
A moving line G which intersects a fixed line A at a given point P, making a constant
angle θ with A, where $0 < \theta < \tfrac{1}{2}\pi$, generates a surface in 3-space called a right circular
cone. The line G is called a generator of the cone, A is its axis, and P its vertex. Each of the
cones shown in Figure 13.9 has a vertical axis. The upper and lower portions of the cone
meeting at the vertex are called nappes of the cone. The curves obtained by slicing the
cone with a plane not passing through the vertex are called conic sections, or simply conics.
If the cutting plane is parallel to a line of the cone through the vertex, the conic is called a
parabola. Otherwise the intersection is called an ellipse or a hyperbola, according as the
plane cuts just one or both nappes. (See Figure 13.9.) The hyperbola consists of two
“branches,” one on each nappe.
FIGURE 13.9 The conic sections.
Many important discoveries in both pure and applied mathematics have been related
to the conic sections. Apollonius' treatment of conics as early as the 3rd century B.C. was
one of the most profound achievements of classical Greek geometry. Nearly 2000 years
later, Galileo discovered that a projectile fired horizontally from the top of a tower falls
to earth along a parabolic path (if air resistance is neglected and if the motion takes place
above a part of the earth that can be regarded as a flat plane). One of the turning points in
the history of astronomy occurred around 1600 when Kepler suggested that all planets
move in elliptical orbits. Some 80 years later, Newton was able to demonstrate that an
elliptical planetary path implies an inverse-square law of gravitational attraction. This led
Newton to formulate his famous theory of universal gravitation which has often been
referred to as the greatest scientific discovery ever made. Conic sections appear not only as
orbits of planets and satellites but also as trajectories of elementary atomic particles. They
are used in the design of lenses and mirrors, and in architecture. These examples and many
others show that the importance of the conic sections can hardly be overestimated.
There are other equivalent definitions of the conic sections. One of these refers to special
points known as foci (singular: focus). An ellipse may be defined as the set of all points in a
plane the sum of whose distances $d_1$ and $d_2$ from two fixed points $F_1$ and $F_2$ (the foci) is
constant. (See Figure 13.10.) If the foci coincide, the ellipse reduces to a circle. A hyperbola
is the set of all points for which the difference $|d_1 - d_2|$ is constant. A parabola is the
set of all points in a plane for which the distance to a fixed point F (called the focus) is
equal to the distance to a given line (called the directrix).
FIGURE 13.10 Focal definitions of the conic sections.
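The focal definition of the ellipse can be checked numerically. The following Python sketch is only an illustration: the parametrization x = a cos t, y = b sin t, the semi-axes a and b, and the function name are assumptions introduced here and do not appear in the text at this point. It places the foci at (c, 0) and (-c, 0) with c = sqrt(a^2 - b^2) and computes the sum d1 + d2 at several sample points.

```python
import math

def focal_distance_sums(a, b, samples=8):
    """Return d1 + d2 for sample points of the ellipse x = a cos t, y = b sin t.

    Assumes a >= b > 0 (hypothetical semi-axes), so that the foci lie on the
    x-axis at (c, 0) and (-c, 0) with c = sqrt(a^2 - b^2).
    """
    c = math.sqrt(a * a - b * b)
    sums = []
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x, y = a * math.cos(t), b * math.sin(t)
        d1 = math.hypot(x - c, y)  # distance from (x, y) to the focus (c, 0)
        d2 = math.hypot(x + c, y)  # distance from (x, y) to the focus (-c, 0)
        sums.append(d1 + d2)
    return sums

sums = focal_distance_sums(5.0, 3.0)
# Spread of d1 + d2 over the sample points; it should vanish up to rounding.
print(max(sums) - min(sums))
```

With a = b the two foci coincide at the origin and every sum equals twice the radius, which matches the remark that the ellipse then reduces to a circle.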
There is a very simple and elegant argument which shows that the focal property of an
ellipse is a consequence of its definition as a section of a cone. This proof, which we may
refer to as the “ice-cream-cone proof,” was discovered in 1822 by a Belgian mathematician,
G. P. Dandelin (1794-1847), and makes use of the two spheres $S_1$ and $S_2$ which are drawn
so as to be tangent to the cutting plane and the cone, as illustrated in Figure 13.11. These
spheres touch the cone along two parallel circles $C_1$ and $C_2$. We shall prove that the points
$F_1$ and $F_2$, where the spheres contact the plane, can serve as foci of the ellipse.
FIGURE 13.11 The ice-cream-cone proof.
Let P be an arbitrary point of the ellipse. The problem is to prove that $\|\overrightarrow{PF_1}\| + \|\overrightarrow{PF_2}\|$
is constant, that is, independent of the choice of P. For this purpose, draw that line on the
cone from the vertex O to P and let $A_1$ and $A_2$ be its intersections with the circles $C_1$
and $C_2$, respectively. Then $\overrightarrow{PF_1}$ and $\overrightarrow{PA_1}$ are two tangents to $S_1$ from P, and hence
$\|\overrightarrow{PF_1}\| = \|\overrightarrow{PA_1}\|$. Similarly $\|\overrightarrow{PF_2}\| = \|\overrightarrow{PA_2}\|$, and therefore we have
$\|\overrightarrow{PF_1}\| + \|\overrightarrow{PF_2}\| = \|\overrightarrow{PA_1}\| + \|\overrightarrow{PA_2}\|$.
But $\|\overrightarrow{PA_1}\| + \|\overrightarrow{PA_2}\| = \|\overrightarrow{A_1A_2}\|$, which is the distance between the parallel circles $C_1$ and
$C_2$ measured along the surface of the cone. This proves that $F_1$ and $F_2$ can serve as foci of
the ellipse, as asserted.
Modifications of this proof work also for the hyperbola and the parabola. In the case
of the hyperbola, the proof employs one sphere in each portion of the cone. For the
parabola one sphere tangent to the cutting plane at the focus F is used. This sphere touches
the cone along a circle which lies in a plane whose intersection with the cutting plane is the
directrix of the parabola. With these hints the reader should be able to show that the focal
properties of the hyperbola and parabola may be deduced from their definitions as sections
of a cone.
13.19 Eccentricity of conic sections
Another characteristic property of conic sections involves a concept called eccentricity.
A conic section can be defined as a curve traced out by a point moving in a plane in such
a way that the ratio of its distances from a fixed point and a fixed line is constant. This
constant ratio is called the eccentricity of the curve and is denoted by e. (This should not be
confused with the Euler number e.) The curve is an ellipse if 0 < e < 1, a parabola if
e = 1, and a hyperbola if e > 1. The fixed point is called a focus and the fixed line a
directrix.
We shall adopt this definition as the basis for our study of the conic sections since it
permits a simultaneous treatment of all three types of conics and lends itself to the use of
vector methods. In this discussion it is understood that all points and lines are in the same
plane.
DEFINITION. Given a line L, a point F not on L, and a positive number e. Let d(X, L)
denote the distance from a point X to L. The set of all X satisfying the relation
(13.20)    $\|X - F\| = e\, d(X, L)$
is called a conic section with eccentricity e. The conic is called an ellipse if e < 1, a parabola
if e = 1, and a hyperbola if e > 1.
If N is a vector normal to L and if P is any point on L, the distance d(X, L) from any
point X to L is given by the formula
$d(X, L) = \dfrac{|(X - P)\cdot N|}{\|N\|}$.
When N has length 1, this simplifies to $d(X, L) = |(X - P)\cdot N|$, and the basic equation
(13.20) for the conic sections becomes
(13.21)    $\|X - F\| = e\,|(X - P)\cdot N|$.
The line L separates the plane into two parts which we shall arbitrarily label as “positive”
and “negative” according to the choice of N. If $(X - P)\cdot N > 0$, we say that X is in the
positive half-plane, and if $(X - P)\cdot N < 0$, we say that X is in the negative half-plane.
On the line L itself we have $(X - P)\cdot N = 0$. In Figure 13.12 the choice of the normal
vector N dictates that points to the right of L are in the positive half-plane and those to the
left are in the negative half-plane.
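The distance formula and the half-plane convention can be sketched in a few lines of Python. This is a minimal illustration, not part of the text: points and vectors are represented as (x, y) tuples, and the function names signed_distance and half_plane are hypothetical. The sign of (X - P) . N decides the half-plane, and its absolute value divided by the length of N is d(X, L).

```python
import math

def signed_distance(X, P, N):
    """Return (X - P) . N / ||N|| for the line L through P with normal N.

    The absolute value of the result is the distance d(X, L).
    """
    dot = (X[0] - P[0]) * N[0] + (X[1] - P[1]) * N[1]
    return dot / math.hypot(N[0], N[1])

def half_plane(X, P, N):
    """Classify X as lying in the positive or negative half-plane, or on L."""
    s = signed_distance(X, P, N)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "on L"

# Illustrative data: L is the vertical line x = 2, with normal N = i.
P, N = (2.0, 0.0), (1.0, 0.0)
print(signed_distance((5.0, 1.0), P, N), half_plane((5.0, 1.0), P, N))
print(signed_distance((0.0, 0.0), P, N), half_plane((0.0, 0.0), P, N))
```

Replacing N by -N interchanges the two labels, which is why the text calls the labeling arbitrary.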
Now we place the focus F in the negative half-plane, as indicated in Figure 13.12, and
choose P to be that point on L nearest to F. Then P - F = dN, where $|d| = \|P - F\|$ is
the distance from the focus to the directrix. Since F is in the negative half-plane, we have
$(F - P)\cdot N = -d < 0$, so d is positive. Replacing P by F + dN in (13.21), we obtain the
following theorem, which is illustrated in Figure 13.12.
FIGURE 13.12 A conic section with eccentricity e is the set of all X satisfying
$\|X - F\| = e\,|(X - F)\cdot N - d|$.
THEOREM 13.17. Let C be a conic section with eccentricity e, focus F, and directrix L
at a distance d from F. If N is a unit normal to L and if F is in the negative half-plane
determined by N, then C consists of all points X satisfying the equation
(13.22)    $\|X - F\| = e\,|(X - F)\cdot N - d|$.
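The substitution that turns (13.21) into (13.22) takes only one line. The following worked step is a sketch of that calculation; it uses nothing beyond P = F + dN and the hypothesis that N has length 1.

```latex
\begin{align*}
(X - P)\cdot N &= (X - F - dN)\cdot N \\
               &= (X - F)\cdot N - d\,(N\cdot N) \\
               &= (X - F)\cdot N - d \qquad \text{since } N\cdot N = \|N\|^2 = 1 .
\end{align*}
```

Taking absolute values and multiplying by e gives the right-hand member of (13.22).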
13.20 Polar equations for conic sections
The equation in Theorem 13.17 can be simplified if we place the focus in a special
position. For example, if the focus is at the origin the equation becomes
(13.23)    $\|X\| = e\,|X\cdot N - d|$.
This form is especially useful if we wish to express X in terms of polar coordinates. Take
the directrix L to be vertical, as shown in Figure 13.13, and let N = i. If X has polar
coordinates r and θ, we have $\|X\| = r$, $X\cdot N = r\cos\theta$, and Equation (13.23) becomes
(13.24)    $r = e\,|r\cos\theta - d|$.
If X lies to the left of the directrix, we have $r\cos\theta < d$, so $|r\cos\theta - d| = d - r\cos\theta$
and (13.24) becomes $r = e(d - r\cos\theta)$, or, solving for r, we obtain
(13.25)    $r = \dfrac{ed}{e\cos\theta + 1}$.
If X lies to the right of the directrix, we have $r\cos\theta > d$, so (13.24) becomes
$r = e(r\cos\theta - d)$,
giving us
(13.26)    $r = \dfrac{ed}{e\cos\theta - 1}$.
Since r > 0, this last equation implies e > 1. In other words, there are points to the right
of the directrix only for the hyperbola. Thus, we have proved the following theorem which
is illustrated in Figure 13.13.
FIGURE 13.13 Conic sections with polar equation $r = e\,|r\cos\theta - d|$. The focus F
is at the origin and lies to the left of the directrix. (a) $r\cos\theta < d$ on the ellipse, parabola,
and left branch of the hyperbola. (b) $r\cos\theta > d$ on the right branch of the hyperbola.
THEOREM 13.18. Let C be a conic section with eccentricity e, with a focus F at the origin,
and with a vertical directrix L at a distance d to the right of F. If 0 < e ≤ 1, the conic C is
an ellipse or a parabola; every point on C lies to the left of L and satisfies the polar equation
(13.27)    $r = \dfrac{ed}{e\cos\theta + 1}$.
If e > 1, the curve is a hyperbola with a branch on each side of L. Points on the left branch
satisfy (13.27) and points on the right branch satisfy
(13.28)    $r = \dfrac{ed}{e\cos\theta - 1}$.
Polar equations corresponding to other positions of the directrix are discussed in the
next set of exercises.
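The polar equations of Theorem 13.18 can be checked against the defining relation ||X|| = e |X . N - d| with N = i. The Python sketch below is an illustration under stated assumptions: the function names polar_radius and conic_residual and the sample values of e, d, and θ are hypothetical and chosen only for the check. The first function returns None wherever the formula would not give a positive r, since such angles correspond to no point of that branch.

```python
import math

def polar_radius(e, d, theta, right_branch=False):
    """Radius r = ed / (e cos(theta) + 1), or ed / (e cos(theta) - 1) for the
    right branch of a hyperbola. Returns None if the denominator is not
    positive, because r must be positive."""
    if right_branch:
        denom = e * math.cos(theta) - 1.0
    else:
        denom = e * math.cos(theta) + 1.0
    if denom <= 0.0:
        return None
    return e * d / denom

def conic_residual(e, d, r, theta):
    """Return ||X|| - e |X . N - d| for X with polar coordinates (r, theta),
    taking N = i, so that X . N = r cos(theta)."""
    x = r * math.cos(theta)
    return r - e * abs(x - d)

# Ellipse, parabola, and left branch of a hyperbola at one sample angle.
for e in (0.5, 1.0, 2.0):
    r = polar_radius(e, 3.0, 1.0)
    print(e, r, conic_residual(e, 3.0, r, 1.0))  # residual near zero

# Right branch of the hyperbola; an ellipse has no such points.
r = polar_radius(2.0, 3.0, 0.3, right_branch=True)
print(r, conic_residual(2.0, 3.0, r, 0.3))
print(polar_radius(0.5, 3.0, 0.3, right_branch=True))  # None
```

Returning None instead of a negative radius mirrors the remark in the text that r > 0 forces e > 1 on the right of the directrix.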
13.21 Exercises
1. Prove that Equation (13.22) in Theorem 13.17 must be replaced by
$\|X - F\| = e\,|(X - F)\cdot N + d|$
if F is in the positive half-plane determined by N.
2. Let C be a conic section with eccentricity e, with a focus at the origin, and with a vertical
directrix L at a distance d to the left of F.
(a) Prove that if C is an ellipse or parabola, every point of C lies to the right of L and satisfies
the polar equation
$r = \dfrac{ed}{1 - e\cos\theta}$.
(b) Prove that if C is a hyperbola, points on the right branch satisfy the equation in part (a)
and points on the left branch satisfy $r = -ed/(1 + e\cos\theta)$. Note that $1 + e\cos\theta$ is always
negative in this case.
3. If a conic section has a horizontal directrix at a distance d above a focus at the origin, prove
that its points satisfy the polar equations obtained from those in Theorem 13.18 by replacing
cos θ by sin θ. What are the corresponding polar equations if the directrix is horizontal and
lies below the focus?
Each of Exercises 4 through 9 gives a polar equation for a conic section with a focus F at the
origin and a vertical directrix lying to the right of F. In each case, determine the eccentricity e
and the distance d from the focus to the directrix. Make a sketch showing the relation of the curve
to its focus and directrix.
4. $r = \dfrac{2}{1 + \cos\theta}$.
5. $r = \dfrac{3}{1 + \tfrac{1}{2}\cos\theta}$.
7. $r = \dfrac{1}{-3 + \cos\theta}$.
In each of Exercises 10 through 12, a conic section of eccentricity e has a focus at the origin and
a directrix with the given Cartesian equation. In each case, compute the distance d from the focus
to the directrix and determine a polar equation for the conic section. For a hyperbola, give a
polar equation for each branch. Make a sketch showing the relation of the curve to its focus
and directrix.
10. e = 4; directrix: 3x + 4y = 25.
11. e = 1; directrix: 4x + 3y = 25.
12. e = 2; directrix: x + y = 1.
13. A comet moves in a parabolic orbit with the sun at the focus. When the comet is $10^8$ miles
from the sun, a vector from the focus to the comet makes an angle of π/3 with a unit vector
N from the focus perpendicular to the directrix, the focus being in the negative half-plane
determined by N.
(a) Find a polar equation for the orbit, taking the origin at the focus, and compute the
smallest distance from the comet to the sun.
(b) Solve part (a) if the focus is in the positive half-plane determined by N.
13.22 Conic sections symmetric about the origin
A set of points is said to be symmetric about the origin if -X is in the set whenever X is
in the set. We show next that the focus of an ellipse or hyperbola can always be placed so
the conic section will be symmetric about the origin. To do this we rewrite the basic
equation (13.22) as follows:
(13.29)    $\|X - F\| = e\,|(X - F)\cdot N - d| = e\,|X\cdot N - F\cdot N - d| = |eX\cdot N - a|$,
where $a = ed + eF\cdot N$. Squaring both members, we obtain
(13.30)
llXl12 - 2F. X + [IFIl = e2(X*N)2 - 2eaX