Uploaded by Claudia Rodriguez Gadea

calculus theory

Analysis of functions and their graphical
representation
Michael Stich
November 17, 2021
7 Analysis of functions and their graphical representation
In this chapter we apply the methods we have studied in the previous chapters to
analyse functions and in particular to perform graphical representations.
7.1 Monotonicity and curvature of a function
Theorem 1 (graphical characterization of differentiable functions I):
Let f be a function continuous in [a, b] and differentiable in (a, b).
(i) If f'(x) > 0 for all x ∈ (a, b), then f is strictly monotonically increasing in (a, b).
(ii) If f'(x) < 0 for all x ∈ (a, b), then f is strictly monotonically decreasing in (a, b).
(iii) If f'(x) = 0 for all x ∈ (a, b), then f is constant in (a, b).
Comment:
If we allow for the ≥ sign in (i) and the ≤ sign in (ii), we have to drop the word
“strictly”. In some contexts, it may be useful to include (or drop) the equal sign from
the definition for practical reasons. Often, and in particular when it is clear from the
context, the word monotonically is suppressed.
Theorem 2 (second derivative):
Let f be differentiable with f'(x) = g(x). If the derivative of g exists (according to
the definition of differentiability), then g' represents the second derivative of f and we
write
$$ g'(x) = (f'(x))' = f''(x) = \frac{d}{dx}\frac{df}{dx}(x) = \frac{d^2 f}{dx^2}(x) = \frac{d^2 f(x)}{dx^2}. $$
Comment:
We already interpreted the derivative as “rate of change”, e.g., we interpreted the velocity of a moving body as limit of the difference quotient of the change of position
in an interval of time. The second derivative is therefore the “rate of change” of the
first derivative: the change of velocity in an interval of time represents the acceleration.
In this way, higher derivatives can be introduced:
Definition 1 (higher derivatives):
Let f be n-times differentiable. Then,
$$ f^{(n)}(x) = \frac{d^n f}{dx^n}(x) = \frac{d^n f(x)}{dx^n} $$
represents the n-th derivative of f.
Higher derivatives have several applications, e.g., when L’Hôpital’s rule has to be applied repeatedly because an indeterminate form persists, or in the analysis of curves. For the latter, we need additional
definitions:
Definition 2 (critical point):
(i) Let f be a non-constant function, differentiable at c with f'(c) = 0. Then, c is a
critical point of f.
(ii) Let f be a continuous function with domain D. If f is not differentiable at
x = c ∈ D, c is a critical point of f.
Example 1: Find the critical points of f(x) = x^3 − 9x^2 + 24x − 10.
f is a polynomial and as such differentiable in R. Its critical points are the solutions
of f'(c) = 0: f'(x) = 3x^2 − 18x + 24 = 3(x^2 − 6x + 8) = 3(x − 2)(x − 4) = 0, whose
solutions are c = 2 and c = 4.
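The calculation in Example 1 can be reproduced numerically. The following is a minimal sketch, not part of the original notes: the helper names `poly_derivative` and `real_roots_quadratic` are hypothetical, and the approach (differentiate the coefficient list, then apply the quadratic formula) only works as written for cubic polynomials.

```python
import math

def poly_derivative(coeffs):
    """Derivative of a polynomial given by its coefficients, highest degree first."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def real_roots_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (a != 0), in increasing order."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    if disc == 0:
        return [-b / (2 * a)]
    r = math.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

f = [1, -9, 24, -10]                          # f(x) = x^3 - 9x^2 + 24x - 10
df = poly_derivative(f)                       # f'(x) = 3x^2 - 18x + 24
critical_points = real_roots_quadratic(*df)   # solutions of f'(c) = 0
print(df, critical_points)
```

The printed critical points agree with the factorization 3(x − 2)(x − 4) = 0 used in the example.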
Example 2: Find the critical points of f(x) = |x|.
We have seen this function earlier. Specifically, f'(x) = 1 if x > 0 and f'(x) = −1 if
x < 0. There is no critical point according to Def. 2(i), but one according to Def. 2(ii)
since f is not differentiable at x = 0: |x| has a critical point at x = 0.
Theorem 3 (extrema: criterion of the first derivative):
Let f be a differentiable function (possibly except at x = c) and c a critical
point of f. If f'(x) changes its sign at c, then f has a local extremum at c. In particular:
(i) If f'(x) changes from negative to positive at c, there is a local minimum at (c, f(c)).
(ii) If f'(x) changes from positive to negative at c, there is a local maximum at (c, f(c)).
Comments:
(1) This theorem can be understood as providing the additional hypotheses such that
the converse of the theorem of local extrema (last chapter) is true. Recall that f'(c) = 0
alone does not imply that there is a local extremum at c.
(2) Sometimes, we refer to extrema (or other points of a function) by specifying the value
of x only. Nevertheless, we know that in a graph, an extremum (or any other point on
the curve) is characterized by a pair of numbers, c and f(c). There are contexts
where it is important to state both numbers.
Example 3: f(x) = |x|.
We know from Ex. 2 that f has a critical point at x = 0 and that f is differentiable
in R \ {0}. We know that f'(x) < 0 for x < 0 and f'(x) > 0 for x > 0. Following
Theorem 3, f(x) = |x| has a local minimum at x = 0.
Example 4: f(x) = x^3.
The function f is non-constant and differentiable in R. The derivative of f is f'(x) =
3x^2. Then, the condition f'(c) = 0 is satisfied for c = 0, which is a critical point of f. Nevertheless, f'(x) = 3x^2 > 0 for x < 0 and also for x > 0, i.e., f'(x) does not change sign at
the critical point and there is no local extremum according to Theorem 3 (but compare
with the comment on the theorem of local extrema in the last chapter).
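The sign-change criterion of Theorem 3 can be sketched in a few lines. This is only a numerical heuristic under a stated assumption: it samples f' at c − h and c + h (the step h is a hypothetical choice) and assumes that f' has no other sign change within distance h of c. The function name is illustrative.

```python
def first_derivative_test(df, c, h=1e-3):
    """Classify the critical point c by the sign of f' just left and right of c."""
    left, right = df(c - h), df(c + h)
    if left < 0 < right:
        return "local minimum"
    if left > 0 > right:
        return "local maximum"
    return "no extremum detected"

# Example 1: f'(x) = 3x^2 - 18x + 24 with critical points 2 and 4
df_cubic = lambda x: 3 * x**2 - 18 * x + 24
print(first_derivative_test(df_cubic, 2))   # f' goes from + to -
print(first_derivative_test(df_cubic, 4))   # f' goes from - to +

# Example 3: f(x) = |x|, f' = -1 for x < 0 and +1 for x > 0
df_abs = lambda x: 1 if x > 0 else -1
print(first_derivative_test(df_abs, 0))

# Example 4: f(x) = x^3, f'(x) = 3x^2 does not change sign at 0
print(first_derivative_test(lambda x: 3 * x**2, 0))
```

Note that the derivative is never evaluated at c itself, which is why the same function also handles the non-differentiable point of |x|.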
In the following theorem, we introduce fundamental notions about the curvature of a
function (convex/concave) and characterize inflexion points that separate areas of convex and concave curvatures. These concepts rely on the signs of the second derivative,
in a similar way as the signs of the first derivative indicate the growth of a function
(increase/decrease) and establish extrema as points that separate increasing from decreasing functions.
Theorem 4 (graphical characterization of differentiable functions II):
Let f be a function whose second derivative f'' exists in (a, b).
(i) If f''(x) > 0 for x ∈ (a, b), then f is convex in (a, b).
(ii) If f''(x) < 0 for x ∈ (a, b), then f is concave in (a, b).
(iii) If f''(c) = 0 for c ∈ (a, b) and f''(x) changes its sign at c, then f has an inflexion
point at c.
Example 5: We consider the function of Ex. 4: f(x) = x^3. Its second derivative is
f''(x) = 6x. According to Theorem 4(iii), at c = 0 the second derivative is zero, and
furthermore f''(x) < 0 for x < 0 and f''(x) > 0 for x > 0, i.e., f'' changes its sign and
at x = 0 we find an inflexion point. According to (i) and (ii), the function changes
from concave to convex.
Comments:
(1) A convex curve grows faster than its tangent at a given point, and
f'(x) is strictly increasing.
(2) A concave curve grows more slowly than its tangent at a given point, and
f'(x) is strictly decreasing.
(3) In a slight abuse of notation, the curvature of a graph is also referred to as its
“convexity” (although the shape may be concave).
(4) Another example: If f represents the daytime (the duration for which the sun is visible above the
horizon), then f' represents the change of daytime. After the summer solstice, nights
are getting longer (f'(x) < 0) until the winter solstice (i.e., from June to December
in the Northern hemisphere). Nevertheless, the rate at which the daylight shrinks
grows only from the summer solstice to the autumn equinox (f' becomes more negative, so f''(x) < 0), and later
decreases from the autumn equinox to the winter solstice (f''(x) > 0). The solstices
represent the local extrema and the equinoxes the inflexion points.
Theorem 5 (extrema: criterion of the second derivative):
Let c be a critical point of f. If f''(c) exists, then:
(i) If f''(c) > 0, then there is a local minimum at x = c;
(ii) If f''(c) < 0, then there is a local maximum at x = c;
(iii) If f''(c) = 0, the criterion is not conclusive about a possible extremum (there may
be a minimum, maximum or neither).
Example 6: Let f(x) = x^2. We calculate the first derivative and require it to vanish:
f'(c) = 0, implying that 2c = 0. Therefore, there is a critical point at x = 0. We can
use Theorem 3 to establish that there is a local minimum at x = 0, but we can also
use Theorem 5: f''(x) = (2x)' = 2. For all x, 2 > 0, and specifically f''(0) > 0 and
there is a local minimum at x = 0.
Example 7: Let us consider the function from Ex. 4 and 5: f(x) = x^3. The second
derivative is f''(x) = 6x. According to Theorem 5(iii), at x = 0 (location of the critical
point) the second derivative vanishes and the criterion is not conclusive. Actually, we
already established in Ex. 5 that at x = 0 there is an inflexion point and there is no
contradiction.
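A small sketch of the criterion of Theorem 5 makes the inconclusive case (iii) concrete. The function name is hypothetical, and the cases x^4 and −x^4 are extra illustrations that do not appear in the text above: all three functions x^3, x^4, −x^4 have f''(0) = 0, yet they show an inflexion point, a minimum, and a maximum, respectively.

```python
def second_derivative_test(d2f_at_c):
    """Classify a critical point c from the value of f''(c) (Theorem 5)."""
    if d2f_at_c > 0:
        return "local minimum"
    if d2f_at_c < 0:
        return "local maximum"
    return "inconclusive"

# Value of f''(0) for several functions with a critical point at x = 0
cases = {
    "x^2": 2,    # f''(x) = 2         -> minimum (Example 6)
    "x^3": 0,    # f''(x) = 6x        -> inconclusive; actually an inflexion point
    "x^4": 0,    # f''(x) = 12x^2     -> inconclusive; actually a minimum
    "-x^4": 0,   # f''(x) = -12x^2    -> inconclusive; actually a maximum
}
for name, value in cases.items():
    print(name, second_derivative_test(value))
```

In the inconclusive cases, the sign-change criterion of Theorem 3 still decides the question.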
7.2 Asymptotes and global extrema of a function
Asymptotic behavior of functions was already considered in the chapter on limits. In
particular, when $\lim_{x\to c} f(x) = \infty$ or $\lim_{x\to c} f(x) = -\infty$ (or the same for lateral limits, for that matter), the curve of the function shows a vertical asymptote.
Example 8: $f(x) = \frac{1}{x-1}$, which shows $\lim_{x\to 1^+} f(x) = \infty$ and $\lim_{x\to 1^-} f(x) = -\infty$ and
which leads to a vertical asymptote at x = 1.
Also, limits as x → ±∞ were considered. For example, $\lim_{x\to\infty} e^x = \infty$, so the function is
not approaching a finite limit value. However, $\lim_{x\to\infty} e^{-x} = 0$, which represents a finite
limit value L = 0. Consequently, the curve of $e^{-x}$ approaches the horizontal asymptote
y = 0 as x → ∞. Of course, horizontal asymptotes can also be found for x → −∞,
depending on the function.
Global (or absolute) extrema are those points of the graph of the function where the
function takes its global maximum and minimum values. The candidates for global
maxima and minima are (a) the critical points and (b) the boundary points if the interval
is closed. Recall that continuous functions are guaranteed to take their maximum and
minimum values only on closed and bounded intervals. In general, a function
can therefore have both a global maximum and a global minimum, just one of the two,
or neither. For example, the function f : R → R, f(x) = x has neither local nor global
extrema in its domain.
Global extrema may be degenerate (e.g., in the interval [−2, 2], f(x) = x^2 has two
global maxima at x = ±2, with degenerate value f(±2) = 4). If the domain is
composed of multiple intervals or of half-open interval(s), all closed boundary points
have to be considered.
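The candidate comparison described above can be sketched as follows for a closed interval [a, b]. The function name `global_extrema` is hypothetical, the critical points are assumed to be known already, and the exact equality comparison of function values is only reliable for integer (or otherwise exact) arithmetic as used here.

```python
def global_extrema(f, critical_points, a, b):
    """Compare f at the boundaries of [a, b] and at the interior critical points."""
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    values = {x: f(x) for x in candidates}
    fmin, fmax = min(values.values()), max(values.values())
    argmin = sorted(x for x, v in values.items() if v == fmin)
    argmax = sorted(x for x, v in values.items() if v == fmax)
    return (fmin, argmin), (fmax, argmax)

# f(x) = x^2 on [-2, 2]: degenerate global maximum at x = -2 and x = 2
print(global_extrema(lambda x: x * x, [0], -2, 2))

# Cubic of Example 1 on [0, 5] (interval chosen for illustration)
cubic = lambda x: x**3 - 9 * x**2 + 24 * x - 10
print(global_extrema(cubic, [2, 4], 0, 5))
```

For the cubic, the local maximum at x = 2 and the boundary point x = 5 share the same value, so the global maximum is degenerate as well.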
7.3 Graphical representation of a function
We have now at our disposal the most fundamental tools for sketching functions graphically:
1. Domain. We know that the domain is typically an interval being a subset of R,
but a function domain can also be the union of various intervals, and can exclude
isolated points.
2. Symmetries. Functions may show a fundamental symmetry like f(x) = f(−x)
(even function, e.g., x^2, cos(x)), f(x) = −f(−x) (odd function, e.g., x^3, sin(x)),
or f(x) = f(x + c), c ≠ 0 (periodic function, e.g., sin(x) with c = 2π).
3. Intersection points with axes: The condition x = 0 gives the intersection(s) with
the ordinate, the condition f (x) = 0 the intersection(s) with the abscissa, also
called roots or zeros.
4. Points of non-continuity. We know how to detect removable, jump,
and essential (including infinite) discontinuities and how they are represented.
5. Points of non-differentiability (for continuous functions). For example, f(x) = |x|
is not differentiable at x = 0, with the graph showing a corner there, since
the lateral limits of the difference quotient do not coincide.
6. Monotonicity and local extrema. This is evaluated with the help of the first derivative
(increasing for f'(x) > 0, decreasing for f'(x) < 0, and an extremum for f'(x)
changing sign). The critical points are the points of non-differentiability and
those which fulfill f'(x) = 0.
7. Curvature and inflexion points. This is evaluated with the help of the second
derivative (convex for f''(x) > 0, concave for f''(x) < 0, and an inflexion point
for f''(x) changing sign).
8. Asymptotes. Horizontal asymptotes with value L correspond to
$\lim_{x\to\pm\infty} f(x) = L$ (only if the domain of the function extends to ±∞) and vertical
asymptotes at x = c correspond to $\lim_{x\to c} f(x) = \pm\infty$.
9. Global extrema. Evaluate f(x) at the critical points and the closed boundaries
and compare the values.
Examples are provided in the problem sheet.
Taylor Polynomial
Michael Stich
November 13, 2021
8 Taylor polynomial
In this chapter we study a very useful method to approximate functions, to obtain
limits and to prove inequalities (among other uses): the Taylor polynomial.
8.1 Polynomial approximation of functions
Theorem 1 (Weierstrass’ approximation theorem):
Let f be a continuous function on [a, b]. Then, ∀ε > 0, ∃ a polynomial P(x) such that
|f(x) − P(x)| < ε ∀x ∈ [a, b].
This is a remarkable statement: it means that an arbitrary continuous function can be
approximated to arbitrary precision by a polynomial! This is visualized in Fig. 1.
Figure 1: Illustration of Weierstrass’ approximation theorem: For any ε > 0, there is
a polynomial P(x) that stays uniformly within ε of f(x) on an interval [a, b].
When approximating an arbitrary function by a polynomial, it is useful to state its analytical properties. For that, the following definition comes in handy:
Definition 1 (differentiability classes):
A function f is of differentiability class C^k if the derivatives f', f'', ..., f^(k) exist and
are continuous.
Comments:
(1) Since differentiability implies continuity, continuity is automatically satisfied for all derivatives
except for f^(k).
(2) If a function has derivatives of any order, it is called infinitely (often) differentiable,
smooth, or of class C^∞.
(3) Depending on whether the function is defined on an open or closed interval, there
are slightly different meanings, in particular implying lateral differentiability on the
endpoints of a closed interval.
Theorem 2 (polynomials are smooth):
Let P be a polynomial of degree n. Then, P is infinitely (often) differentiable in R.
Since differentiability includes continuity, it is not necessary to state that P is continuous (although it is true). We already knew that a polynomial is differentiable. What
is new here is the generalization to higher derivatives. As we know, the derivative of a
polynomial of degree n yields a polynomial of degree n − 1. Therefore, differentiating
a polynomial (of degree n) n times produces a constant. Then, any further derivatives
(and there is nothing preventing us from taking higher derivatives) become zero. We
shall use degree and order of a polynomial synonymously in this chapter.
8.2 Taylor’s theorem
The preceding section showed that polynomials are a very powerful tool for approximating
functions in general. The theorem that we study in this section is not designed to
prove Theorem 1 from above and applies only to functions of differentiability class C^n.
However, it provides a very useful tool to represent sufficiently often differentiable functions by polynomials, as it not only provides the approximation,
but also a way to evaluate the quality of the approximation.
Theorem 3 (Taylor’s theorem):
Let f be of class C^n on the interval [a, b] and suppose that the (n + 1)-th derivative of
f exists on (a, b). Let x_0 ∈ [a, b]. Then, ∀x ∈ (a, b), ∃c between x and x_0 such that
$$ f(x) = P_{n,x_0}(x) + R_{n,x_0}(x), $$
with $P_{n,x_0}(x)$ being the Taylor polynomial and $R_{n,x_0}(x)$ the remainder, given by
$$ P_{n,x_0}(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n, $$
$$ R_{n,x_0}(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x - x_0)^{n+1}. $$
Comments:
(1) This is also called the Taylor expansion of f about (around) x0 .
(2) The specific value x_0 is the point about which we approximate the function and is
typically given. If x_0 = 0, the polynomial is also called the Maclaurin polynomial.
(3) The dependence on x resides solely in the polynomial terms (x − x_0)^k since f(x_0),
f'(x_0), f''(x_0), ... are all scalar values. To approximate (replace) a function f by its
Taylor polynomial it is necessary to compute the aforementioned derivatives.
(4) The quality of the approximation (or the goodness of the fit) can be assessed by
calculating the remainder term (if possible) or by giving an upper bound to it.
(5) The quality of the approximation typically increases if (a) the degree of the Taylor
polynomial increases and (b) x approaches x_0. Obviously, for x = x_0, the Taylor
polynomial is simply f(x_0), showing that Taylor’s theorem is only useful for x ≠ x_0,
and hence an appropriate choice of x_0 is important (not too far from the values of x
where f should be evaluated).
(6) This formulation of $R_{n,x_0}(x)$ is called the Lagrange remainder. There are other
variants, namely the Cauchy form and the integral form (not shown here). It is important to note that in general there is no way to know what specific value c takes.
(7) The Taylor polynomial of degree 1, $P_{1,x_0}(x) = f(x_0) + f'(x_0)(x - x_0)$, is identical
to the equation of the tangent at the point (x_0, f(x_0)).
(8) There are nice animations of Taylor polynomial approximations at:
https://en.wikipedia.org/wiki/Taylor’s_theorem
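The Taylor polynomial of Theorem 3 can be evaluated directly once the derivatives at x_0 are known. The sketch below is an illustration under assumptions: the function name `taylor_polynomial` is hypothetical, the derivative values must be supplied by hand, and e^x about x_0 = 0 is chosen as the test function because all of its derivatives at 0 equal 1.

```python
import math

def taylor_polynomial(derivs_at_x0, x0, x):
    """Evaluate sum_k f^(k)(x0)/k! * (x - x0)^k.

    derivs_at_x0 = [f(x0), f'(x0), f''(x0), ...]; its length minus one is the degree n.
    """
    return sum(d / math.factorial(k) * (x - x0) ** k
               for k, d in enumerate(derivs_at_x0))

# e^x about x0 = 0, evaluated at x = 1: the error shrinks as the degree n grows
for n in (1, 2, 4, 8):
    approx = taylor_polynomial([1.0] * (n + 1), 0.0, 1.0)
    print(n, approx, abs(math.e - approx))
```

This illustrates comment (5): for fixed x, raising the degree typically improves the approximation.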
8.3 Applications of Taylor’s theorem
The fundamental application of Taylor’s theorem is to approximate (replace) a possibly complicated function by a polynomial. Polynomials are C^∞ functions and very
“well-behaved”, as it is easy to compute any of their derivatives, obtain their graphs, etc. The
remainder term can be used to check the quality of the approximation. Among other
uses, Taylor’s theorem can be used to evaluate limits and to prove inequalities.
Example 1:
(a) Approximate f (x) = sin(x) by its Taylor polynomial of degree 3 about x0 = 0.
(b) Then give an upper bound to the absolute error in the interval [0, 1]. (c) Compare
with the actual absolute error at x = 0.5.
(a) We apply Taylor’s theorem as indicated:
$$ \begin{aligned}
f(x) &= f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f^{(3)}(x_0)}{3!}(x - x_0)^3 + \frac{f^{(4)}(c)}{4!}(x - x_0)^4 \\
     &= f(0) + f'(0)x + \frac{f''(0)}{2}x^2 + \frac{f^{(3)}(0)}{6}x^3 + \frac{f^{(4)}(c)}{24}x^4.
\end{aligned} $$
We have to compute derivatives up to 4th order:

          x_0 or c      x_0 = 0
f         sin(x_0)      0
f'        cos(x_0)      1
f''       −sin(x_0)     0
f^(3)     −cos(x_0)     −1
f^(4)     sin(c)

With this we obtain for the Taylor polynomial:
$$ P_{3,0}(x) = x - \frac{1}{6}x^3, $$
and, overall:
$$ f(x) = x - \frac{1}{6}x^3 + \frac{\sin(c)}{24}x^4, $$
where c is between x_0 = 0 and x.
(b) In principle, x can be any number, but we know that sin(x) is bounded between −1
and 1 while the polynomial is unbounded (it tends to −∞ for x → ∞), so the approximation breaks
down for large x. This example limits x to the interval [0, 1] and we use the remainder term to
determine an upper bound to the absolute error in that interval:
$$ \max(|R_{3,0}(x)|) = \max\left|\frac{\sin(c)}{24}x^4\right|, $$
where the maximum is taken over all values x ∈ [0, 1]. The absolute value is taken
because we are interested in the absolute error of the approximation (and do not care
whether the difference $f(x) - P_{n,x_0}(x)$ is positive or negative).
In the interval [0, 1], x^4 takes its maximum at x = 1. The same holds for the sine
function (c is between 0 and 1), and therefore
$$ \max(|R_{3,0}(x)|) \le \frac{\sin(1)}{24} \approx 0.0351 \text{ (4 d.p.)}. $$
Result: On [0, 1], sin(x) can be approximated by x − (1/6)x^3 with an absolute
error of at most 0.0351 (to a precision of 4 decimal places).
A comment may be due here: It would be possible to obtain a weaker (larger) upper
bound by using that |sin x| ≤ 1 for all x ∈ R:
$$ \max(|R_{3,0}(x)|) \le \frac{1}{24} \approx 0.0417 \text{ (4 d.p.)}. $$
For example, if the interval given was [0, b], with b ∈ R, we would have to use |sin x| ≤ 1
and keep the term b^4:
$$ \max(|R_{3,0}(x)|) \le \frac{1}{24}b^4. $$
This result tells us that the approximation $f(x) \approx P_{3,0}(x)$ becomes bad if b ≫ 1.
(c) We have to compare f(0.5) and $P_{3,0}(0.5)$:
$$ f(0.5) = \sin(0.5) = 0.479426 \text{ (6 d.p.)}, $$
$$ P_{3,0}(0.5) = 0.5 - \frac{0.5^3}{6} = \frac{23}{48} = 0.479167 \text{ (6 d.p.)}. $$
The actual absolute error is
$$ E(0.5) = |0.479426 - 0.479167| = 0.000259 = 2.59 \times 10^{-4}. $$
This number is smaller than the upper bound 0.0351 obtained in (b), as expected.
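The numbers of Example 1 can be checked with a short script. This is a sketch for illustration: the grid of 1001 sample points on [0, 1] is an arbitrary choice, so `worst` only approximates the true maximum error from below.

```python
import math

def p3(x):
    """Taylor polynomial of degree 3 of sin(x) about x0 = 0."""
    return x - x**3 / 6

bound = math.sin(1) / 24                         # Lagrange bound on [0, 1]
xs = [i / 1000 for i in range(1001)]             # sample points in [0, 1]
worst = max(abs(math.sin(x) - p3(x)) for x in xs)
err_half = abs(math.sin(0.5) - p3(0.5))          # actual error at x = 0.5

print(f"bound = {bound:.4f}")
print(f"largest sampled error = {worst:.6f}")
print(f"error at 0.5 = {err_half:.6f}")
```

The largest sampled error occurs at x = 1 and stays well below the bound, which shows that the Lagrange estimate is safe but not sharp.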
Example 2:
Calculate the Taylor polynomial of degree 3 of f(x) = x^3 about x_0 = 2 together with
the remainder term.
We write
$$ \begin{aligned}
f(x) &= f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f^{(3)}(x_0)}{3!}(x - x_0)^3 + \frac{f^{(4)}(c)}{4!}(x - x_0)^4 \\
     &= f(2) + f'(2)(x - 2) + \frac{f''(2)}{2}(x - 2)^2 + \frac{f^{(3)}(2)}{6}(x - 2)^3 + \frac{f^{(4)}(c)}{24}(x - 2)^4.
\end{aligned} $$
We have to compute derivatives up to 4th order:

          x_0 or c      x_0 = 2
f         x_0^3         8
f'        3x_0^2        12
f''       6x_0          12
f^(3)     6             6
f^(4)     0

With this we obtain:
$$ \begin{aligned}
P_{3,2}(x) &= 8 + 12(x - 2) + \frac{12}{2}(x - 2)^2 + \frac{6}{6}(x - 2)^3 \\
           &= 8 + 12(x - 2) + 6(x - 2)^2 + (x - 2)^3,
\end{aligned} $$
$$ R_{3,2}(x) = \frac{0}{24}(x - 2)^4 = 0. $$
The remainder term is zero, and consequently we have $f(x) = x^3 = P_{3,2}(x)$. This can
be easily checked by expanding the terms of $P_{3,2}$. Furthermore, if one chooses x_0 = 0
in this example, one directly calculates $P_{3,0}(x) = x^3$. This reflects a general result: a
polynomial of degree n is identical to its Taylor polynomial of degree n, whatever the point x_0.
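The expansion of the Taylor polynomial of Example 2 into powers of x can be checked mechanically. The sketch below uses plain coefficient lists (lowest degree first); the helper names are hypothetical and no external library is assumed.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists, lowest degree first."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def expand_taylor(coeffs, x0):
    """Expand sum_k coeffs[k] * (x - x0)^k into powers of x."""
    result, power = [0], [1]              # power holds (x - x0)^k
    for c in coeffs:
        result = poly_add(result, [c * t for t in power])
        power = poly_mul(power, [-x0, 1])
    return result

# 8 + 12(x - 2) + 6(x - 2)^2 + (x - 2)^3 expanded in powers of x
print(expand_taylor([8, 12, 6, 1], 2))
```

The result is the coefficient list of x^3, with all lower-order coefficients cancelling.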
Example 3:
Calculate the following limit using the Taylor expansions of the involved functions:
$$ \lim_{x\to 0} \frac{\cos x - e^x + x}{x^2}. $$
Substitution of x = 0 yields an indeterminate form 0/0. Since the denominator is x^2, we
develop the functions cos x and e^x up to that order, using x_0 = 0 since this is the point
at which the limit is taken.
$$ \begin{aligned}
\cos x &= \cos(0) + (\cos x)'(0) \cdot (x - 0) + \frac{(\cos x)''(0)}{2}(x - 0)^2 + \text{h.o.t.} \\
       &= 1 - \frac{1}{2}x^2 + \text{h.o.t.},
\end{aligned} $$
$$ \begin{aligned}
e^x &= e^0 + (e^x)'(0) \cdot (x - 0) + \frac{(e^x)''(0)}{2}(x - 0)^2 + \text{h.o.t.} \\
    &= 1 + x + \frac{1}{2}x^2 + \text{h.o.t.},
\end{aligned} $$
where h.o.t. stands for higher order terms. Now, we proceed
$$ \lim_{x\to 0} \frac{\cos x - e^x + x}{x^2} = \lim_{x\to 0} \frac{1 - \frac{1}{2}x^2 - 1 - x - \frac{1}{2}x^2 + x + \text{h.o.t.}}{x^2} = \lim_{x\to 0} \frac{-x^2 + \text{h.o.t.}}{x^2}. $$
This simplifies to
$$ \lim_{x\to 0} \frac{\cos x - e^x + x}{x^2} = -1 + \lim_{x\to 0} \frac{\text{h.o.t.}}{x^2}, $$
but as we know, the higher order terms contain only contributions proportional to x^3 and higher powers,
implying that the latter limit is zero and we have as result
$$ \lim_{x\to 0} \frac{\cos x - e^x + x}{x^2} = -1. $$
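A quick numerical check of the limit in Example 3 is sketched below. It is not a proof: it only evaluates the quotient at a few small positive values of x (chosen arbitrarily), and for much smaller x floating-point cancellation in the numerator would start to dominate.

```python
import math

def q(x):
    """The quotient (cos x - e^x + x) / x^2 from Example 3."""
    return (math.cos(x) - math.exp(x) + x) / x**2

for x in (0.1, 0.01, 0.001):
    print(x, q(x))
```

The values approach −1 roughly like −1 − x/6, consistent with the x^3 term of e^x being the leading higher order term.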
Example 4:
We approximate f(x) = e^x in the interval [0, 1] by a Taylor polynomial. What
degree must the polynomial have for the absolute error to be smaller than 0.05?
We select x_0 = 0 and the Taylor polynomial of e^x becomes
$$ P_{n,0}(x) = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots + \frac{1}{n!}x^n, $$
but what we really need in order to answer the question is the remainder term
$$ R_{n,0}(x) = \frac{e^c}{(n+1)!}x^{n+1}, $$
where c is between 0 and x, and we have to solve for n
$$ \max(|R_{n,0}(x)|) = \max\left|\frac{e^c}{(n+1)!}x^{n+1}\right| < 0.05. $$
Since x is limited to [0, 1] and the exponential function is strictly increasing, the maximum
value of e^c in the interval [0, 1] is e^1. The power function x^{n+1} also is strictly increasing
in the interval [0, 1] and takes its maximum value at x = 1. It is therefore sufficient to require
$$ \frac{e}{(n+1)!}1^{n+1} < 0.05, \qquad \frac{1}{(n+1)!} < \frac{0.05}{e}, \qquad (n+1)! > \frac{e}{0.05} \approx 54.4. $$
Since 4! = 24 and 5! = 120, we need n + 1 ≥ 5 and conclude that we have to approximate e^x at least by
$P_{4,0}(x)$ in order for the absolute error in [0, 1] to be smaller than 0.05.
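The search for the degree in Example 4 can be automated. The sketch below loops over n until the bound e/(n + 1)! drops below the tolerance; the function name `smallest_degree` is hypothetical, and the bound used is the one derived above for e^x on [0, 1] with x_0 = 0.

```python
import math

def smallest_degree(tol):
    """Smallest n with e / (n + 1)! < tol (remainder bound for e^x on [0, 1])."""
    n = 0
    while math.e / math.factorial(n + 1) >= tol:
        n += 1
    return n

n = smallest_degree(0.05)
p_n_at_1 = sum(1 / math.factorial(k) for k in range(n + 1))   # P_{n,0}(1)
print(n, math.e / math.factorial(n + 1), abs(math.e - p_n_at_1))
```

The second printed number is the guaranteed bound for the selected degree and the third is the actual error at x = 1, which is smaller than the bound.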
For more examples, see the exercise sheet.
Integral calculus
Michael Stich
November 13, 2021
9 Integration of functions
Integral calculus has a tight relationship with differential calculus. Its essence is represented by the fundamental theorem of calculus that describes integration as an inverse
process to differentiation. Nevertheless, to understand better that relationship, we
consider first the integration of functions in its own right. The historical motivation
and fundamental application of integration is to calculate volumes and areas limited
by curves and functions, and the length of curves. Generalizations of the methods
involved lead to the calculation of work, velocity, moments of inertia etc. and other
applications in science and engineering.
9.1 Definite integral of a function
The specific question that we will try to answer here is how to calculate the area between
the curve of a function f(x) and the abscissa (the x-axis) in an interval [a, b]. To
ensure that the area is finite, we assume that the function and the interval are bounded.
A closed and bounded interval is called compact.
We define the partition P_n of the interval [a, b] in n + 1 points:
$$ P_n = \{x_0, x_1, x_2, \ldots, x_n\} \quad \text{with} \quad a = x_0 < x_1 < x_2 < \cdots < x_n = b. $$
It is not necessary to assume that the points are equidistant, i.e., that the n subintervals
with lengths $x_i - x_{i-1} = \Delta x_i$ (with i = 1, 2, ..., n) are all of the same length. Now,
we define a set T_n of n intermediate points, with
$$ T_n = \{x_1^*, x_2^*, \ldots, x_n^*\} \quad \text{with} \quad x_{i-1} \le x_i^* \le x_i, \quad i = 1, 2, \ldots, n, $$
and define the Riemann sum as
$$ S_n = \sum_{i=1}^{n} f(x_i^*)(x_i - x_{i-1}). $$
Every term of the sum is the product of the value of the function with an interval of x
and hence represents the area of a rectangle with sides $f(x_i^*)$ and $x_i - x_{i-1}$ (compare
with Fig. 1).
Figure 1: Construction of a Riemann sum for a function. The t_i represent the x_i^* of the text.
Source: https://upload.wikimedia.org/wikipedia/commons/8/85/Riemannsumme.svg (public
domain).
In general, the Riemann sum does not correspond exactly to the area between the
curve and the abscissa, but represents an approximation: by construction, the Riemann sum depends not only on f but also on the specific partition and the choice
of the intermediate points, both of which in turn depend on n: $S_n = S_n(f, P_n, T_n)$.
However, the area below the curve must have a unique value and for that reason we
eliminate the dependence on P_n, T_n and n by taking the limit n → ∞, requiring that the
result depend on neither the partition nor the intermediate points. This gives the
following definition:
Definition 1 (Riemann integral):
Let f be a function defined on the compact interval [a, b] and P_n a partition such that
the maximum length of the subintervals tends to zero when n → ∞. If for all T_n the limit
$$ \lim_{n\to\infty} \left[ \sum_{i=1}^{n} f(x_i^*)(x_i - x_{i-1}) \right] = I $$
exists, then
$$ I = \int_a^b f(x)\,dx $$
is the Riemann integral of f on [a, b].
If the function f is bounded in [a, b] and has at most a finite number of discontinuities
in [a, b], the Riemann integral exists.
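The Riemann sum for an arbitrary tagged partition can be written down almost literally. The sketch below is an illustration under assumptions: the function names are hypothetical, the non-equidistant partition x_i = a + (b − a)(i/n)^2 is an arbitrary choice whose longest subinterval still shrinks to zero, and f(x) = x^2 on [0, 1] (with exact integral 1/3) serves as test function.

```python
def riemann_sum(f, partition, tags):
    """S_n = sum_i f(x*_i) (x_i - x_{i-1}) for a tagged partition."""
    total = 0.0
    for lo, hi, t in zip(partition, partition[1:], tags):
        if not lo <= t <= hi:
            raise ValueError("each intermediate point must lie in its subinterval")
        total += f(t) * (hi - lo)
    return total

def quadratic_partition(a, b, n):
    """Non-equidistant partition x_i = a + (b - a) * (i/n)^2 with n subintervals."""
    return [a + (b - a) * (i / n) ** 2 for i in range(n + 1)]

f = lambda x: x * x
for n in (10, 100, 1000):
    p = quadratic_partition(0.0, 1.0, n)
    left = riemann_sum(f, p, p[:-1])     # intermediate points = left endpoints
    right = riemann_sum(f, p, p[1:])     # intermediate points = right endpoints
    print(n, left, right)
```

Both choices of intermediate points approach 1/3 as n grows, in line with the requirement that the limit must not depend on T_n.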
Comments:
(1) The Riemann integral is a definite integral, i.e., represents a real and specific number, a scalar. Later, we introduce integral functions called antiderivatives or primitives.
(2) We know (chapter on continuity) that a continuous function on a compact interval
is bounded and fulfills the hypotheses of Definition 1. More on this in Theorem 2.
(3) In many texts, the hypothesis is used that the function f is continuous, but this
is not necessary since it is possible to prove that there are discontinuous functions for
which a Riemann integral exists. Furthermore, it is possible to generalize the integral
concept to unbounded functions and unbounded intervals (improper integrals).
(4) Other formulations of the Riemann integral use the definition of lower and upper
sums:
$$ L_P(f) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}), \quad m_i \text{ being the minimum of } f \text{ in the subinterval } i, $$
$$ U_P(f) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}), \quad M_i \text{ being the maximum of } f \text{ in the subinterval } i. $$
If the minimum of all upper sums (choosing among all possible partitions) coincides
with the maximum of all lower sums (choosing among all possible partitions), we have
(also assuming that the maximum length of all subintervals tends to zero)
$$ \min_P U_P(f) = \max_P L_P(f) = \int_a^b f(x)\,dx. $$
This method (also known as the Darboux integral) is illustrated in Fig. 2.
Figure 2: Lower and upper sums for a function. The area below the
curve is bounded below by L_P(f) and above by U_P(f).
Source: https://upload.wikimedia.org/wikipedia/commons/5/59/Darboux.svg (public domain).
(5) If the Riemann integral exists, then we can say that the function is Riemann integrable. There exist other integrability concepts (e.g., Lebesgue), but unless stated
otherwise, integrability refers to Def. 1 (or, below, to the existence of an antiderivative).
(6) From the definition it becomes clear that we need to ensure a correct use of limits
in a quite involved form (partitions, intermediate points). It is possible to make a
comparison of the terms:
$$ \lim_{n\to\infty} \left[ \sum_{i=1}^{n} f(x_i^*)\Delta x_i \right] = \int_a^b f(x)\,dx. $$
The integral symbol $\int_a^b$ represents an infinite sum that covers the whole interval. The
function part $f(x_i^*)$ is represented by f(x), and the size $\Delta x_i$ of the interval i is represented by dx, the differential, standing for an infinitesimal increment, in complete
analogy with dx in differential calculus (Leibniz notation).
(7) By convention, the areas are measured by positive numbers. To ensure that the
integral returns a positive number, we ask for f (x) > 0 for all x ∈ [a, b] without loss
of generality. Below, we consider the cases when these conditions are not met.
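The lower and upper sums of comment (4) are easy to compute when f is increasing, because the minimum on each subinterval then sits at its left endpoint and the maximum at its right endpoint. The sketch below relies on exactly that assumption; the function name and the choice f(x) = e^x on [0, 1] (exact integral e − 1) are illustrative.

```python
import math

def darboux_sums_increasing(f, a, b, n):
    """Lower and upper sums on an equidistant partition, assuming f increases on [a, b]."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    lower = sum(f(x) for x in xs[:-1]) * dx    # minimum at each left endpoint
    upper = sum(f(x) for x in xs[1:]) * dx     # maximum at each right endpoint
    return lower, upper

exact = math.e - 1
for n in (10, 100, 1000):
    lower, upper = darboux_sums_increasing(math.exp, 0.0, 1.0, n)
    print(n, lower, exact, upper, upper - lower)
```

The gap between upper and lower sum equals (f(b) − f(a))(b − a)/n for an increasing function, so it shrinks to zero as the partition is refined.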
Example 1:
Calculate the area below the function f(x) = kx, k > 0, in the interval [a, b] (with
0 < a ≤ b) using a Riemann integral.
For k > 0 and 0 < a ≤ b, the area between the curve of the function and the
abscissa lies in the first quadrant and represents a right trapezoid limited by y = 0,
y = kx, x = a and x = b. Since f(x) > 0 between a and b, the integral $I = \int_a^b kx\,dx$
indeed corresponds to this area.
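We define a partition of [a, b] in n + 1 points, choosing an equidistant partition with
∆x = (b − a)/n and setting:
$$ x_0 = a,\quad x_1 = a + \Delta x,\quad x_2 = a + 2\Delta x,\quad \ldots,\quad x_n = a + n\Delta x = b, $$
with the “intermediate” points
$$ x_1^* = a,\quad x_2^* = a + \Delta x,\quad x_3^* = a + 2\Delta x,\quad \ldots,\quad x_n^* = a + (n-1)\Delta x, $$
i.e., the intermediate points of each subinterval actually coincide with the lower limit
of each subinterval (permitted according to the definition). The Riemann sum is
$$ \begin{aligned}
S_n &= \sum_{i=1}^{n} f(x_i^*)\Delta x_i = \sum_{i=1}^{n} f(x_i^*)\Delta x = \frac{b-a}{n} \sum_{i=1}^{n} f(x_i^*) \\
    &= \frac{b-a}{n} \left[ ka + k(a + \Delta x) + k(a + 2\Delta x) + \cdots + k(a + (n-1)\Delta x) \right],
\end{aligned} $$
with n terms in the sum. Then:
$$ \begin{aligned}
S_n &= \frac{b-a}{n} \cdot k \left[ a + (a + \Delta x) + (a + 2\Delta x) + \cdots + (a + (n-1)\Delta x) \right] \\
    &= \frac{b-a}{n} \cdot k \left[ na + \Delta x (1 + 2 + \cdots + (n-1)) \right].
\end{aligned} $$
We calculate the sum of the arithmetic progression 1 + 2 + · · · + (n − 1) = n(n − 1)/2,
replace ∆x, and obtain
$$ \begin{aligned}
S_n &= \frac{b-a}{n} \cdot k \left[ na + \frac{b-a}{n} \cdot \frac{n(n-1)}{2} \right] = \frac{b-a}{n} \cdot k \cdot n \left[ a + \frac{b-a}{n} \cdot \frac{n-1}{2} \right] \\
    &= k(b-a) \left[ a + \frac{b-a}{2} \cdot \frac{n-1}{n} \right].
\end{aligned} $$
Now, we take the limit n → ∞:
$$ I = \lim_{n\to\infty} S_n = k(b-a) \left[ a + \frac{b-a}{2} \right] = (b-a)\,\frac{ka + kb}{2}, $$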
coinciding with the area that can be obtained with elementary geometry (the distance
b − a between the parallel sides of the trapezoid times their average length).
We see that calculating integrals using the Riemann integral can be a tedious process
even for relatively simple functions. Below, we see a more efficient method.
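The result of Example 1 can be cross-checked numerically. The sketch below computes the left-endpoint Riemann sum directly, compares it with the closed form k(b − a)[a + (b − a)/2 · (n − 1)/n] derived above, and with the trapezoid area; the function names and the values k = 2, a = 1, b = 3 are arbitrary illustrative choices.

```python
def left_riemann_sum(k, a, b, n):
    """Riemann sum of f(x) = k*x on [a, b] with n equal subintervals and left endpoints."""
    dx = (b - a) / n
    return sum(k * (a + i * dx) * dx for i in range(n))

def closed_form(k, a, b, n):
    """S_n = k (b - a) [a + (b - a)/2 * (n - 1)/n], as derived in Example 1."""
    return k * (b - a) * (a + (b - a) / 2 * (n - 1) / n)

def trapezoid_area(k, a, b):
    """Limit value I = (b - a) (k a + k b) / 2."""
    return (b - a) * (k * a + k * b) / 2

k, a, b = 2.0, 1.0, 3.0
for n in (10, 100, 1000):
    print(n, left_riemann_sum(k, a, b, n), closed_form(k, a, b, n), trapezoid_area(k, a, b))
```

For these values the sum equals 8 − 4/n, so it approaches the area 8 from below, as expected for left endpoints of an increasing function.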
9.2 Basic properties of the integral
Theorem 1 (immediate properties):
Let f be a Riemann integrable function on an interval [a, b]. Then:
(i) inversion of the interval: $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$.
(ii) zero interval: $\int_c^c f(x)\,dx = 0$ for all c ∈ [a, b].
(iii) independence of the integral on the variable of integration:
$I = \int_a^b f(x)\,dx = \int_a^b f(t)\,dt$.
Comment:
Part (iii) emphasizes that the argument of the function (the integrand ) over which is
integrated has to coincide with the variable of integration as identified through the differential (here, dx or dt). The result is a scalar number that does not depend on x (or t).
In Definition 1, it was stated that a bounded function with at most a finite number of
discontinuities is integrable. Here, we specify other types of integrable functions.
Theorem 2 (integrable functions):
(i) If f is continuous in [a, b] compact, f is integrable in [a, b].
(ii) If f is monotonic in [a, b] compact, f is integrable in [a, b].
Comments:
(1) A continuous function on a compact interval is bounded and has no discontinuities
and hence is integrable according to Definition 1.
(2) Assuming appropriate hypotheses about the intervals, we can establish that a
continuous function is always integrable, but continuity is not necessary since there are non-continuous functions which are integrable. Recalling that all differentiable functions are
continuous, we find that the class of integrable functions is larger than the class of
continuous functions and the latter one larger than the class of differentiable functions.
Theorem 3 (basic properties):
Let f and g be integrable functions in [a, b] compact. Then,
(i) multiplication with a scalar: $\int_a^b c \cdot f(x)\,dx = c \int_a^b f(x)\,dx$, with c ∈ R.
(ii) property of the sum: $\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$.
(iii) additivity with respect to the interval: $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx$ (if f
is integrable in [a, b] and [b, c]).
(iv) monotonicity of the integral: If f(x) ≤ g(x) for all x ∈ [a, b], then
$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$.
(v) integrability of the absolute value: If f(x) is integrable in the interval [a, b] with
a < b, |f(x)| is also integrable, with $\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$.
(vi) integrability of the product and the quotient: f(x) · g(x) and f(x)/g(x) are integrable, the latter if there exists a k > 0 such that |g(x)| ≥ k for all x ∈ [a, b].
9.3 The fundamental theorem of calculus
As in many contexts of mathematics, formulating certain properties and results requires first defining appropriate concepts. To relate integral calculus with
differential calculus we need to move from the definite integral (a number) to the integral function (in analogy to the derivative at a point and the derivative as a function).
First, we need to define for a function a certain other function whose derivative is the
original function:
Definition 2 (antiderivative):
A function $f : [a, b] \to \mathbb{R}$ has an antiderivative (or primitive) $\varphi$ if $\varphi$ is differentiable
with $\frac{d}{dx}\varphi(x) = f(x)$ for all $x \in [a, b]$.
This definition does not provide any method to calculate the antiderivative for a given
function f, nor does it give a criterion on f that ensures the existence of φ. It is merely a
definition of the relation between f and φ. To proceed, we need to generalize the
definite integral and define the integral function.
Theorem 4 (integral function):
If the function $f$ is integrable on $[a, b]$ compact, then the function $\varphi_a(x) = \int_a^x f(t)\,dt$,
with $x \le b$, exists and is continuous in $[a, b]$.
First, note that we have defined an integral as a function of the upper limit of the
interval. The independent variable of φ is x, the upper limit of the interval. For that
reason, the variable of integration has to be denoted by another letter, here t. It is
an error to use the same letter in the same equation for both concepts.
We observe that the function $\varphi_a(x)$ is continuous according to Theorem 4. So while
the original function need not be continuous, its integral is! However, we could
wish for more: if the integral function were differentiable, we could take its derivative
and compare it to the original function f. It is the fundamental theorem of calculus
that establishes precisely this relationship. We will see that the integrability of f is not
sufficient for this; we have to require f to be continuous.
Theorem 5 (fundamental theorem of calculus, FTC):
If the function $f$ is continuous in $[a, b]$ compact, then the function $\varphi_a(x) = \int_a^x f(t)\,dt$,
with $x \le b$, exists and is differentiable in $[a, b]$, and its derivative is $\frac{d}{dx}\varphi_a(x) = f(x)$.
Comments:
(1) This implies that φa (x) is an antiderivative of f (x), according to Definition 2.
(2) The derivative can be written as
\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x), \]
showing that integration and differentiation are inverse processes.
Theorem 6 (set of antiderivatives):
If the function $f$ is continuous in $[a, b]$ compact, then $f$ has a set of antiderivatives in
$[a, b]$ with $\varphi(x) = \int_a^x f(t)\,dt + C$, with $C \in \mathbb{R}$.
Proof:
φ(x) is an antiderivative according to Def. 2 if it is differentiable and its derivative is
identical to the function f. We calculate
\[ \frac{d}{dx}\varphi(x) = \frac{d}{dx}\left( \int_a^x f(t)\,dt + C \right) = \frac{d}{dx}\int_a^x f(t)\,dt + \frac{d}{dx} C = \frac{d}{dx}\int_a^x f(t)\,dt + 0 = f(x). \]
The meaning of Theorem 6 is that we can add a constant to any antiderivative and it
remains an antiderivative. This leads us to the concept of the indefinite integral:
Definition 3 (indefinite integral):
The set of all antiderivatives of a function f is called the indefinite integral and we write:
\[ \int f(x)\,dx = \varphi(x) + C, \quad C \in \mathbb{R}, \quad \text{if and only if} \quad \varphi'(x) = f(x). \]
Comment:
The indefinite integral relates two functions with the same independent variable x and
in absence of any interval limit the integration variable is also x.
How can we use the antiderivatives to calculate definite integrals? The following theorem provides a specific rule.
Theorem 7 (Barrow’s rule):
Let the function f be continuous in [a, b] compact and φ any antiderivative of f (in
this interval), then
\[ I = \int_a^b f(x)\,dx = \varphi(b) - \varphi(a) = \varphi(x)\Big|_{x=a}^{x=b} = [\varphi(x)]_a^b. \]
With this rule, it is possible to calculate the definite integral not via a process of sums
and limits as in Riemann's integral, but as a subtraction of values of an auxiliary
function evaluated at the interval boundaries. That auxiliary function is precisely the
antiderivative. In the next section we study how to calculate the antiderivative.
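As a rough numerical illustration of Barrow's rule, the following Python sketch compares a Riemann-type sum with the difference φ(b) − φ(a). The integrand cos x, the interval [0, 2] and the helper names `riemann_sum` and `barrow` are illustrative assumptions, not part of the text above.

```python
import math

def riemann_sum(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def barrow(phi, a, b):
    """Barrow's rule: evaluate an antiderivative phi at the interval boundaries."""
    return phi(b) - phi(a)

# Assumed example: f(x) = cos x, with antiderivative phi(x) = sin x.
a, b = 0.0, 2.0
by_sums = riemann_sum(math.cos, a, b)   # process of sums (finite n, so approximate)
by_barrow = barrow(math.sin, a, b)      # subtraction of two values of phi

print(by_sums, by_barrow)
```

The two numbers agree up to the discretization error of the sum, while the second one needs only two function evaluations.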
9.4
Integration techniques
To solve integration problems (both definite and indefinite integrals), it is common to
find the antiderivative first. For this task there is a range of strategies:
• Invert the rules of differentiation (immediate and almost immediate integrals)
• Simplify the function using Theorems 1 and 3
• Partial integration (see below)
• Integration through substitution (see below)
• Rule of logarithmic integration (see below)
• Use selected methods for special function types, exemplified below for rational
functions and trigonometric functions
If necessary, then use Barrow's rule to determine the definite integral.
Recall: Differentiation is a craft, integration is an art!
9.4.1
Partial integration
Let us recall the product rule of differentiation:
Let $f$ and $g$ be differentiable functions. Then, $(f(x)g(x))' = f'(x)g(x) + f(x)g'(x)$. A
way of interpreting this rule is to identify $f(x)g(x)$ as the antiderivative of $f'(x)g(x) + f(x)g'(x)$. Then, we can integrate the equation and we obtain
\[ \int \left( f'(x)g(x) + f(x)g'(x) \right) dx = f(x)g(x) + C, \]
\[ \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx = f(x)g(x) + C. \]
Recasting the terms we obtain as rule for partial integration the following theorem.
Theorem 8 (partial integration):
Let f and g be differentiable functions (with their derivatives being continuous) over
[a, b] compact. Then,
\[ \int f'(x)g(x)\,dx = f(x)g(x) - \int f(x)g'(x)\,dx + C. \]
As always in the context of the indefinite integral, we have to carry a constant of integration: any constant may be added to an antiderivative since the derivative of any of
the antiderivatives yields the same function.
For definite integrals, we obtain
\[ \int_a^b f'(x)g(x)\,dx = [f(x)g(x)]_a^b - \int_a^b f(x)g'(x)\,dx. \]
The rule replaces one integral (on the left-hand side) by two terms, one is the product of functions evaluated in the limits of the interval and another integral (on the
right-hand side), still unsolved, justifying the name of partial integration. This rule is
used if the integral on the right-hand side is easier to determine than the one on the
left-hand side. By construction, this rule can be interpreted as the inversion of the
product rule of differentiation.
Example 2: Calculate the following indefinite integrals using partial integration:
(a) $F(x) = \int x e^x\,dx$.
We know how to integrate polynomials and exponential functions, but we do not know
the integral of this product. We identify $x$ with $g(x)$ and $e^x$ with $f'(x)$. It is easy to
integrate $f'(x)$ and we obtain $f(x) = e^x$, at the same time as $g'(x) = 1$. Therefore,
\[ \int x e^x\,dx = x e^x - \int 1 \cdot e^x\,dx + C = x e^x - e^x + C = e^x (x - 1) + C. \]
(b) $F(x) = \int x^2 \sin x\,dx$.
We identify $x^2$ with $g(x)$ and $\sin x$ with $f'(x)$. It is easy to integrate $f'(x)$ and we
obtain $f(x) = -\cos x$, at the same time as $g'(x) = 2x$. Therefore,
\[ \int x^2 \sin x\,dx = x^2 (-\cos x) + \int (2x) \cdot \cos x\,dx + C. \]
The degree of the polynomial part of the integrand has been lowered and we have to apply
partial integration again to this integral, identifying $2x$ with $g(x)$ ($g'(x) = 2$) and $\cos x$
with $f'(x)$ ($f(x) = \sin x$):
\[ F(x) = -x^2 \cos x + 2x \sin x - 2 \int \sin x\,dx + C. \]
In principle, the term $2x \sin x$ provides a new constant of integration, but we can
absorb the two constants of integration in one (the sum of two arbitrary numbers is
just another arbitrary number). We integrate again and obtain the final result:
\[ F(x) = -x^2 \cos x + 2x \sin x - 2(-\cos x) + C = (2 - x^2) \cos x + 2x \sin x + C. \]
(c) $F(x) = \int \ln x\,dx$.
Although the integrand is not a product, we can interpret it as such by introducing a
factor 1, identifying $f'(x) = 1$ ($f(x) = x$) and $g(x) = \ln x$ ($g'(x) = \frac{1}{x}$). We obtain:
\[ \int \ln x\,dx = x \cdot \ln x - \int x \cdot \frac{1}{x}\,dx + C = x \cdot \ln x - \int 1\,dx + C = x \cdot \ln x - x + C, \]
where we have absorbed the constants of integration in one.
(d) $F(x) = \int \frac{1}{x}\,dx$.
We know that $(\ln x)' = \frac{1}{x}$ (for $x > 0$) and therefore we can solve directly
\[ \int \frac{1}{x}\,dx = \ln x + C, \]
with the constant of integration. Nevertheless, we try to apply partial integration,
identifying $f'(x) = 1$ ($f(x) = x$) and $g(x) = \frac{1}{x}$ ($g'(x) = -\frac{1}{x^2}$). We obtain
\[ \int \frac{1}{x}\,dx = x \cdot \frac{1}{x} - \int x \cdot \frac{-1}{x^2}\,dx + C = 1 + \int x \cdot \frac{1}{x^2}\,dx + C = 1 + \int \frac{1}{x}\,dx + C. \]
On the right-hand side we reproduce the initial integral, plus $1 + C$. This result shows
that partial integration in this example does not lead us to the primitive. But it tells us
how to interpret the result and the constant of integration. With $C$ being an arbitrary
constant, $C + 1$ also is, and we can rename it as $C$ and write
\[ \int \frac{1}{x}\,dx = \int \frac{1}{x}\,dx + C. \]
This is not an incorrect result because both $\int \frac{1}{x}\,dx$ and $\int \frac{1}{x}\,dx + C$ are antiderivatives
of $\frac{1}{x}$ and we have to recall that, in spite of the result, we have integrated once and
therefore there must be a constant of integration. If we calculate a definite integral in
this example (with $0 < a < b$), no constant of integration appears:
\[ \int_a^b \frac{1}{x}\,dx = \left[ x \cdot \frac{1}{x} \right]_a^b - \int_a^b x \cdot \frac{-1}{x^2}\,dx = (1 - 1) + \int_a^b x \cdot \frac{1}{x^2}\,dx = \int_a^b \frac{1}{x}\,dx. \]
The strategy to introduce a factor 1 which is identified with f 0 (x) is valid and can be
useful to solve other integrals.
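The results of Example 2(a)-(c) can be checked numerically by differentiating each proposed antiderivative and comparing with the integrand. The sketch below uses a central difference quotient; the helper names, the step size and the sample points are assumptions chosen for illustration.

```python
import math

def derivative(F, x, h=1e-5):
    """Central difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

# Pairs (integrand f, antiderivative F) taken from Example 2(a)-(c), with C = 0.
examples = {
    "x e^x": (lambda x: x * math.exp(x),
              lambda x: math.exp(x) * (x - 1.0)),
    "x^2 sin x": (lambda x: x**2 * math.sin(x),
                  lambda x: (2.0 - x**2) * math.cos(x) + 2.0 * x * math.sin(x)),
    "ln x": (math.log,
             lambda x: x * math.log(x) - x),
}

def max_error(f, F, points=(0.5, 1.0, 2.0, 3.0)):
    """Largest deviation |F'(x) - f(x)| over a few positive sample points."""
    return max(abs(derivative(F, x) - f(x)) for x in points)

errors = {name: max_error(f, F) for name, (f, F) in examples.items()}
for name, err in errors.items():
    print(name, err)
```

Small deviations (of the size of the finite-difference error) are consistent with F'(x) = f(x); such a check does not replace the derivation, but it catches sign errors quickly.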
The mnemonic rule ALPES is a way to memorize the order in which the functions
are normally assigned: Arcsine (and other inverse trigonometric functions), Logarithm,
Polynomial, Exponential, Sine (and other trigonometric functions): the type of function that appears first is the one assigned to $g(x)$, the second one to $f'(x)$. This rule
works because the functions that are easier to integrate are in the lower part of the list.
Applying partial integration, one tends to encounter one of the following cases:
• the rule solves the integral directly (Ex. 2(c));
• the rule simplifies the integral, but the process has to be repeated (e.g. to lower
the order of the polynomial until a constant is reached) to solve the integral (Ex.
2(b));
• the rule does not give a useful result (Ex. 2(d));
• the rule produces a term which is identical to the original integral, but with
inverted sign, that is: F (x) = h(x) − F (x) + C, where F (x) represents the
integral to solve and h(x) a non-constant function. Then, 2F (x) = h(x) + C and
we solve F (x) = h(x)/2 + C.
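The last case is not covered by Example 2. A short worked instance, chosen here only for illustration, is $F(x) = \int e^x \cos x\,dx$. Applying Theorem 8 twice, each time with $f'(x) = e^x$:

\[ F(x) = e^x \cos x + \int e^x \sin x\,dx + C, \qquad \int e^x \sin x\,dx = e^x \sin x - \int e^x \cos x\,dx + C, \]

so that

\[ F(x) = e^x (\cos x + \sin x) - F(x) + C \quad\Longrightarrow\quad F(x) = \frac{1}{2}\, e^x (\cos x + \sin x) + C, \]

which corresponds to $h(x) = e^x(\cos x + \sin x)$ in the notation above (the constants of integration have been absorbed in one).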
9.4.2
Integration through substitution
This method is also known as integration via change of variables and can be interpreted
as the inversion of the chain rule of differentiation.
Theorem 9 (integration through substitution): Let f be a continuous function in
an interval J and g(x) a differentiable function with continuous derivative in a compact
interval [a, b] and image J. Then,
\[ \int_a^b f(g(x)) \cdot g'(x)\,dx = \int_{g(a)}^{g(b)} f(t)\,dt. \]
Comments:
(1) t = g(x) represents the change of variable from x to t.
(2) The typical use of Theorem 9 is to replace the integral on the left-hand side (representing the product of two functions) by the integral on the right-hand side and
then to solve the right-hand side by other means (e.g., as an immediate integral).
For that, it is important to recognize that one of the factors ($g'(x)$) is the derivative
of the argument of the other factor ($f(g(x))$) (see example below). Nevertheless, Theorem 9 works in both ways, i.e., also from right to left if appropriate.
(3) While performing a change of variables, the variable of integration and the limits
of integration also change. The change of variable of integration is visible in the differential dt: it would be wrong to integrate the right side over x.
(4) For an interval [a, b], we know that a < b (for a = b, the integral would be zero), but
this does not imply that after the change of variable g(a) < g(b) is fulfilled. Indeed, the
simple change $t = g(x) = -x$ implies that $g(b) < g(a)$ and in the integral $\int_{g(a)}^{g(b)} f(t)\,dt$
the lower limit is larger than the upper limit. As usual, we put the lower value as lower
limit and use (compare with Theorem 1(i)):
\[ \int_a^b f(g(x)) \cdot g'(x)\,dx = \begin{cases} \int_{g(a)}^{g(b)} f(t)\,dt & \text{if } g(a) < g(b), \\ -\int_{g(b)}^{g(a)} f(t)\,dt & \text{if } g(b) < g(a). \end{cases} \]
(5) For indefinite integrals, the rule is formulated as
\[ \int f(g(x)) \cdot g'(x)\,dx = \int f(t)\,dt, \]
without any constant of integration because only one integral has been replaced by
another one. Solving the integral $\int f(t)\,dt$, the result is a function that depends on
t, while a result depending on x is expected. To complete the calculation, we have to
resubstitute the result using
\[ x = g^{-1}(t), \]
assuming that the inverse function $g^{-1}$ of the change of variable is well-defined.
(6) While changing the variable of integration, the differential is changed:
\[ t = g(x), \qquad \frac{dt}{dx} = g'(x), \qquad dt = g'(x)\,dx. \]
While implementing a change of variable $t = g(x)$ in the integral $\int f(t)\,dt$ one has to
replace $f(t)$ by $f(g(x))$ and $dt$ by $g'(x)\,dx$ as the theorem states.
(7) The integration by substitution can be interpreted as the inversion of the chain rule
for differentiation since the derivative of a composed function is given by $(f(g(x)))' = f'(g(x)) \cdot g'(x)$.
Example 3: Calculate the following integrals using a change of variable:
(a) $I = \int_0^2 x \cos(x^2 + 1)\,dx$.
We observe that the derivative of the argument of the cosine is 2x which is (disregarding
a constant factor) the other factor of the function. We substitute
\[ t = g(x) = x^2 + 1 \]
and calculate $dt/dx = g'(x) = 2x$ and therefore $x\,dx = dt/2$. We change the intervals:
\[ x = 0 \;\widehat{=}\; t = 1, \qquad x = 2 \;\widehat{=}\; t = 5. \]
We calculate
\[ I = \int_0^2 x \cos(x^2 + 1)\,dx = \int_1^5 \cos t \cdot \left( \tfrac{1}{2}\,dt \right) = \frac{1}{2} \int_1^5 \cos t\,dt = \frac{1}{2} [\sin t]_1^5 = \frac{1}{2} (\sin 5 - \sin 1). \]
(b) $F(x) = \int x \cos(x^2 + 1)\,dx$.
We use the same change of variable, but there are no interval limits to be changed.
But a constant of integration has to be included and t must be replaced by x after
integration. In many cases it is helpful to manipulate the expression to recognize $g(x)$
and $g'(x)$ to prepare the substitution:
\[ F(x) = \int x \cos(x^2 + 1)\,dx = \frac{1}{2} \int \cos(x^2 + 1) \cdot (2x)\,dx = \frac{1}{2} \int \cos t\,dt = \frac{1}{2} \sin t + C = \frac{1}{2} \sin(x^2 + 1) + C. \]
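Example 3(a) lends itself to a quick numerical cross-check: the integral over x on [0, 2], the substituted integral over t on [1, 5], and the closed form should all coincide. The following Python sketch does this with a simple midpoint rule; the helper name `midpoint` and the number of subintervals are assumptions.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Left-hand side of Theorem 9: integrate f(g(x)) * g'(x) / 2 over x in [0, 2].
lhs = midpoint(lambda x: x * math.cos(x * x + 1.0), 0.0, 2.0)

# Right-hand side after t = x^2 + 1: integrate cos(t) / 2 over t in [g(0), g(2)] = [1, 5].
rhs = midpoint(lambda t: 0.5 * math.cos(t), 1.0, 5.0)

# Closed form obtained in the example.
closed_form = 0.5 * (math.sin(5.0) - math.sin(1.0))

print(lhs, rhs, closed_form)
```

Note that both the integrand and the limits change under the substitution; integrating cos(t)/2 over [0, 2] instead of [1, 5] would give a different number.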
(c) $I = \int_0^r \sqrt{r^2 - x^2}\,dx$, with $r \in \mathbb{R}^+$.
In this example we perform the change of variable in the other direction (from one
term to two). The reason is that the function describes one part of a circle of radius r
and this suggests a change of variable to angle t. We substitute and calculate:
\[ x = g(t) = r \sin t, \qquad \frac{dx}{dt} = g'(t) = r \cos t, \qquad dx = r \cos t\,dt, \]
\[ x = 0 \;\widehat{=}\; t = 0, \qquad x = r \;\widehat{=}\; t = \frac{\pi}{2}, \]
where $x = g(t)$ (names of the variables are arbitrary).
\[ I = \int_0^r \sqrt{r^2 - x^2}\,dx = \int_0^{\pi/2} \sqrt{r^2 - r^2 \sin^2 t} \cdot (r \cos t)\,dt, \]
where the root represents $f(g(t))$ and $r \cos t$ the part $g'(t)$. The justification is in the
next step in which we use the trigonometric property $\sin^2 u + \cos^2 u = 1$:
\[ I = r^2 \int_0^{\pi/2} \sqrt{1 - \sin^2 t} \cdot \cos t\,dt = r^2 \int_0^{\pi/2} \cos^2 t\,dt. \]
In the next step, we can perform a partial integration with $f'(t) = \cos t$ and $g(t) = \cos t$
or use the trigonometric property $2 \cos^2 u = 1 + \cos(2u)$:
\[ I = r^2 \int_0^{\pi/2} \left( \frac{1}{2} + \frac{1}{2} \cos(2t) \right) dt = \frac{r^2}{2} \int_0^{\pi/2} dt + \frac{r^2}{2} \int_0^{\pi/2} \cos(2t)\,dt \]
\[ = \frac{r^2}{2} [t]_0^{\pi/2} + \frac{r^2}{2} \int_0^{\pi/2} \cos(2t)\,dt = \frac{\pi r^2}{4} + \frac{r^2}{2} \int_0^{\pi/2} \cos(2t)\,dt. \]
Now, we replace
\[ t = h(p) = p/2, \qquad \frac{dt}{dp} = h'(p) = 1/2, \qquad dt = dp/2, \]
\[ t = 0 \;\widehat{=}\; p = 0, \qquad t = \frac{\pi}{2} \;\widehat{=}\; p = \pi, \]
and obtain
\[ I = \frac{\pi r^2}{4} + \frac{r^2}{2} \int_0^{\pi} \frac{1}{2} \cos(p)\,dp = \frac{\pi r^2}{4} + \frac{r^2}{4} [\sin p]_0^{\pi} = \frac{\pi r^2}{4} + \frac{r^2}{4} (0 - 0) = \frac{\pi r^2}{4}. \]
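(d) $F(x) = \int \sqrt{r^2 - x^2}\,dx$, with $r \in \mathbb{R}^+$ and $|x| \le r$.
We use part of (c), to be specific the two changes of variable, and we write:
\[ F(x) = \frac{r^2}{2}\, t(x) + \frac{r^2}{4} \sin p(x) + C, \]
where $t(x)$ and $p(x)$ indicate that we have to substitute t and p by x. With $p = 2t$:
\[ F(x) = \frac{r^2}{2} \left( t(x) + \frac{1}{2} \sin(2t(x)) \right) + C, \]
using the trigonometric identity $\sin(2u) = 2 \sin u \cos u$:
\[ F(x) = \frac{r^2}{2} \left( t(x) + \sin t(x) \cdot \cos t(x) \right) + C. \]
Since $x = r \sin t$, we have $t = \arcsin(\frac{x}{r})$ and
\[ F(x) = \frac{r^2}{2} \left( \arcsin\!\left(\frac{x}{r}\right) + \sin\!\left(\arcsin\!\left(\frac{x}{r}\right)\right) \cdot \cos\!\left(\arcsin\!\left(\frac{x}{r}\right)\right) \right) + C = \frac{r^2}{2} \left( \arcsin\!\left(\frac{x}{r}\right) + \frac{x}{r} \cdot \cos\!\left(\arcsin\!\left(\frac{x}{r}\right)\right) \right) + C. \]
Now, we apply another trigonometric identity, $\cos(\arcsin(u)) = \sqrt{1 - u^2}$,
\[ F(x) = \frac{r^2}{2} \left( \arcsin\!\left(\frac{x}{r}\right) + \frac{x}{r} \cdot \sqrt{1 - \frac{x^2}{r^2}} \right) + C, \]
representing the final result. One can confirm with Barrow's rule that $F(r) - F(0) = \frac{\pi r^2}{4}$, the result of (c).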
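The closing remark of Example 3(d) can be confirmed numerically. The sketch below evaluates the antiderivative F from (d) (with C = 0) and compares Barrow's rule with a midpoint-rule approximation of the integral. The radius r = 2 and the helper names are assumptions made for this illustration.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

r = 2.0  # assumed radius, any r > 0 would do

def integrand(x):
    return math.sqrt(r * r - x * x)

def F(x):
    """Antiderivative from Example 3(d) with C = 0."""
    u = x / r
    return 0.5 * r * r * (math.asin(u) + u * math.sqrt(1.0 - u * u))

quarter_disc = F(r) - F(0.0)                    # Barrow's rule on [0, r]
quarter_numeric = midpoint(integrand, 0.0, r)   # direct numerical integration
partial_barrow = F(1.0) - F(0.0)                # Barrow's rule on [0, 1]
partial_numeric = midpoint(integrand, 0.0, 1.0)

print(quarter_disc, math.pi * r * r / 4.0, quarter_numeric)
print(partial_barrow, partial_numeric)
```

The value on [0, r] is a quarter of the area of a disc of radius r, which is the geometric reason for the result of (c).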
9.4.3
Rule of logarithmic integration
Theorem 10 (rule of logarithmic integration): Let $g$ be a function with $g(x) \neq 0$
and differentiable with $g'(x)$ continuous. Then,
\[ \int \frac{g'(x)}{g(x)}\,dx = \ln |g(x)| + C. \]
Comment:
This rule can be interpreted as the inversion of the chain rule of the derivative of the
logarithm of a positive function, $\frac{d}{dx} (\ln |g(x)|) = g'(x)/g(x)$.
Example 4: Solve the following integrals:
(a) $\int \frac{1}{x \ln x}\,dx$.
If we fix $g(x) = \ln x$, we know that $g'(x) = 1/x$ and hence
\[ \int \frac{1}{x \ln x}\,dx = \int \frac{\frac{1}{x}}{\ln x}\,dx = \int \frac{g'(x)}{g(x)}\,dx = \ln |g(x)| + C = \ln |\ln x| + C. \]
(b) $\int \frac{dx}{\sqrt{1 - x^2}\, \arcsin x}$.
To apply the rule, the integrand need not contain logarithmic functions. We recognize the derivative
of the arcsine, $1/\sqrt{1 - x^2}$. Therefore, we choose $g(x) = \arcsin x$ and
$g'(x) = 1/\sqrt{1 - x^2}$. Therefore,
\[ \int \frac{dx}{\sqrt{1 - x^2}\, \arcsin x} = \int \frac{g'(x)}{g(x)}\,dx = \ln |g(x)| + C = \ln |\arcsin x| + C. \]
9.4.4
Integration of rational functions
A rational function R(x) is the quotient of two polynomials P (x) and Q(x):
\[ R(x) = \frac{P(x)}{Q(x)} = \frac{a_u x^u + a_{u-1} x^{u-1} + \cdots + a_0}{b_v x^v + b_{v-1} x^{v-1} + \cdots + b_0}, \]
where u and v are the degrees of the two polynomials, respectively. Before we show
how to determine integrals of the kind
\[ \int R(x)\,dx = \int \frac{P(x)}{Q(x)}\,dx, \]
we anticipate that it is always possible to give a solution and that it will be composed
of a linear combination of rational functions, logarithms of linear functions, logarithms
of quadratic functions, and arctangents of linear functions.
The first case to consider is when u ≥ v. Then, we perform a division of polynomials
(compare with Ruffini’s rule) so that we obtain
\[ R(x) = H(x) + \frac{P_1(x)}{Q(x)}. \]
Now, $H(x)$ stands for a polynomial that we integrate with the standard rule of integration of polynomials and we only need to add the integral of $\frac{P_1(x)}{Q(x)}$, where the degree
of $P_1(x)$ now is strictly smaller than the degree of $Q(x)$.
The rational function R(x) admits a partial fraction decomposition and for that we use
that any polynomial (here, Q(x)) can be factorized using its roots:
\[ Q(x) = c\,(x - r_1)^{n_1} (x - r_2)^{n_2} \cdots (x - r_s)^{n_s}\, [(x - p_1)^2 + q_1^2]^{m_1} [(x - p_2)^2 + q_2^2]^{m_2} \cdots [(x - p_t)^2 + q_t^2]^{m_t}, \]
where $r_1, r_2, \dots, r_s$ are s different real roots of $Q(x)$ (with multiplicities $n_1, n_2, \dots, n_s$)
and $p_1 \pm i q_1, p_2 \pm i q_2, \dots$ are t different pairs of complex conjugate roots of $Q(x)$ (with multiplicities
$m_1, m_2, \dots, m_t$) and c is a constant. Simple roots with multiplicity 1 are included in
this notation.
For each root (corresponding to a factor of Q(x)) we can use the following table to
determine the term(s) that have to appear in the partial fraction decomposition (PFD).
Root factor of Q(x)          Term in PFD
$(x - r)^n$                  $\sum_{i=1}^{n} \frac{K_i}{(x - r)^i}$
$[(x - p)^2 + q^2]^m$        $\sum_{i=1}^{m} \frac{M_i x + N_i}{[(x - p)^2 + q^2]^i}$
This includes simple roots (with multiplicity 1).
In this way, the partial fraction decomposition of a rational function is then schematically given by a summation of terms,
\[ \frac{P_1(x)}{Q(x)} = \frac{1}{c} \left[ \sum_{k=1}^{s} \sum_{i=1}^{n_k} \frac{K_{i,k}}{(x - r_k)^i} + \sum_{k=1}^{t} \sum_{i=1}^{m_k} \frac{M_{i,k}\, x + N_{i,k}}{[(x - p_k)^2 + q_k^2]^i} \right], \]
where the first group of sums refer to the real roots and the second group to the complex roots. The unknown coefficients K, M, N in the numerators of the right-hand side
are determined via comparison with the left-hand side when both sides are multiplied
with Q(x) and terms of same order are compared. Note that the left-hand side is
then P1 (x). This task and the partial fraction decomposition in general are part of
elementary algebra and not related to integration.
We emphasize that there are only 4 types of fractions that appear in the decomposition
(real or complex roots, with or without multiplicity). Since the integral of a sum is
the sum of the integrals, we only need to state the solutions for these 4 types of
integrals (where K, M, N represent real numbers determined in the partial fraction
decomposition):
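\[ \int \frac{K}{x - r}\,dx = K \ln |x - r| + C, \]
\[ \int \frac{K}{(x - r)^n}\,dx = \frac{-K}{(n - 1)(x - r)^{n-1}} + C \quad \text{with } n > 1, \]
\[ \int \frac{M x + N}{(x - p)^2 + q^2}\,dx = \frac{M}{2} \ln[(x - p)^2 + q^2] + \frac{M p + N}{q} \arctan\!\left( \frac{x - p}{q} \right) + C, \]
\[ \int \frac{M x + N}{[(x - p)^2 + q^2]^m}\,dx = \frac{M}{2(1 - m)[(x - p)^2 + q^2]^{m-1}} + (M p + N) \cdot I_m(x), \quad m > 1, \]
where $I_m(x) = \int \frac{dx}{[(x - p)^2 + q^2]^m}$ is determined recursively with
\[ I_m(x) = \frac{x - p}{2(m - 1) q^2 [(x - p)^2 + q^2]^{m-1}} + \frac{2m - 3}{2(m - 1) q^2}\, I_{m-1}(x), \]
\[ I_1(x) = \frac{1}{q} \arctan\!\left( \frac{x - p}{q} \right) + C. \]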
Examples are provided in the work sheet.
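To make the procedure concrete, here is a small Python sketch for one assumed rational function (not taken from the work sheet): R(x) = (x + 3)/((x − 1)(x + 1)), which has two simple real roots. The coefficients are found by comparing orders after multiplying with Q(x), and the first integral type from the list above is then applied to each term. All names in the code are illustrative.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def rational(x):
    """Assumed example R(x) = (x + 3) / ((x - 1)(x + 1))."""
    return (x + 3.0) / ((x - 1.0) * (x + 1.0))

# Ansatz: R(x) = K1/(x - 1) + K2/(x + 1).  Multiplying with Q(x):
#   x + 3 = K1*(x + 1) + K2*(x - 1)
#   order x^1:  K1 + K2 = 1
#   order x^0:  K1 - K2 = 3
K1 = (1.0 + 3.0) / 2.0   # = 2
K2 = (1.0 - 3.0) / 2.0   # = -1

def decomposed(x):
    return K1 / (x - 1.0) + K2 / (x + 1.0)

def antiderivative(x):
    """Sum of K ln|x - r| terms (first integral type), with C = 0."""
    return K1 * math.log(abs(x - 1.0)) + K2 * math.log(abs(x + 1.0))

# Definite integral on [2, 3] (no root of Q inside), via Barrow's rule and numerically.
exact = antiderivative(3.0) - antiderivative(2.0)
numeric = midpoint(rational, 2.0, 3.0)
print(K1, K2, exact, numeric)
```

For repeated or complex roots the same comparison of orders yields a larger linear system for the coefficients K, M, N.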
9.4.5
Integration of trigonometric functions
We have already used a change of variable involving trigonometric functions in Example
(3c,d). Here, we give a table of suitable substitutions for functions that involve either
trigonometric functions or functions to which trigonometric identities can be applied.
The following table is not exhaustive and does not give a single substitution that fits
all cases. A lot of practice and trial-and-error is needed to solve integrals. The variable
names are arbitrary.
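If f(x) contains          try                        with
$\sqrt{a^2 - x^2}$        $x = a \sin\theta$         $\frac{dx}{d\theta} = a \cos\theta$
                          or $x = a \tanh u$         $\frac{dx}{du} = a\,\mathrm{sech}^2 u$
$\sqrt{a^2 + x^2}$        $x = a \sinh u$            $\frac{dx}{du} = a \cosh u$
                          or $x = a \tan\theta$      $\frac{dx}{d\theta} = a \sec^2\theta$
$\sqrt{x^2 - a^2}$        $x = a \cosh u$            $\frac{dx}{du} = a \sinh u$
                          or $x = a \sec\theta$      $\frac{dx}{d\theta} = a \sec\theta \tan\theta$
Circular functions        $s = \sin x$               $\frac{ds}{dx} = \cos x$
                          or $c = \cos x$            $\frac{dc}{dx} = -\sin x$
                          or $t = \tan(\frac{x}{2})$ $\frac{dx}{dt} = \frac{2}{1 + t^2}$, $\sin x = \frac{2t}{1 + t^2}$, $\cos x = \frac{1 - t^2}{1 + t^2}$
Hyperbolic functions      $u = e^x$                  $\frac{du}{dx} = e^x$
                          or $s = \sinh x$           $\frac{ds}{dx} = \cosh x$
                          or $c = \cosh x$           $\frac{dc}{dx} = \sinh x$
                          or $t = \tanh(\frac{x}{2})$ $\frac{dt}{dx} = \frac{1}{2}\,\mathrm{sech}^2(\frac{x}{2})$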
Of particular interest is the substitution $t = \tan(\frac{x}{2})$ because it allows us to replace all
simple sine and cosine functions with rational functions.
Examples are provided in the work sheet.
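As an illustration of the substitution t = tan(x/2), consider the assumed example integral of 1/(1 + cos x) over [0, π/2] (not taken from the work sheet). With cos x = (1 − t²)/(1 + t²) and dx/dt = 2/(1 + t²) the integrand becomes the constant 1, and the limits become t = 0 and t = tan(π/4) = 1, so the value should be 1. The Python sketch below checks both sides numerically; helper names are illustrative.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integrand_x(x):
    """Original integrand in the variable x."""
    return 1.0 / (1.0 + math.cos(x))

def integrand_t(t):
    """Integrand after t = tan(x/2): replace cos x and multiply by dx/dt."""
    cos_x = (1.0 - t * t) / (1.0 + t * t)
    dx_dt = 2.0 / (1.0 + t * t)
    return dx_dt / (1.0 + cos_x)   # simplifies to 1 analytically

in_x = midpoint(integrand_x, 0.0, math.pi / 2.0)
in_t = midpoint(integrand_t, math.tan(0.0), math.tan(math.pi / 4.0))
print(in_x, in_t)
```

In general the transformed integrand is a rational function of t, which can then be treated with the partial fraction techniques of the previous subsection.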
9.5
Improper integrals
We recall that the Riemann integral was defined for a bounded function f over a compact
(bounded and closed) interval [a, b].
The definite integral $\int_a^b f(x)\,dx$ is improper if at least one of the following conditions
is met: (a) The interval is not bounded. (b) The function f is not bounded in the interval.
9.5.1
Integrals in unbounded intervals (improper integrals of the first kind)
Integrals in unbounded intervals (also known as improper integrals of the first kind) arise
if either the lower bound of the interval in question is −∞ or the upper bound
of the interval is +∞, or both. We therefore distinguish the following cases:
(a) Let f be bounded and integrable in [M, b], with b ∈ R and M ≤ b. The improper
integral is defined as
\[ I = \int_{-\infty}^{b} f(x)\,dx = \lim_{M \to -\infty} \int_M^b f(x)\,dx. \]
(b) Let f be bounded and integrable in [a, N], with a ∈ R and a ≤ N. The improper
integral is defined as
\[ I = \int_a^{\infty} f(x)\,dx = \lim_{N \to \infty} \int_a^N f(x)\,dx. \]
(c) Let f be bounded and integrable in [M, N], with M ≤ N. The improper integral is
defined as
\[ I = \int_{-\infty}^{\infty} f(x)\,dx = \lim_{M \to -\infty} \int_M^c f(x)\,dx + \lim_{N \to \infty} \int_c^N f(x)\,dx, \]
with c ∈ R and M < c < N.
Obviously, the naming of the quantities M, N is arbitrary. We have defined the improper integral as the limit of a proper definite integral. There are three different
outcomes of the limit process:
(i) If the limit does not exist but is also not infinite (in case (c), in at least one limit),
the improper integral does not exist.
(ii) If the limit does not exist and is +∞ or −∞ (in case (c), in at least one limit and if
the other limit is not covered by (i)), the improper integral is divergent with I = +∞
or I = −∞.
(iii) If the limit exists and is finite (in case (c), in both limits), the improper integral
is convergent and its value is I.
Example 5:
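Type (a)
\[ \int_{-\infty}^{0} e^x\,dx = \lim_{M \to -\infty} \int_M^0 e^x\,dx = \lim_{M \to -\infty} [e^x]_M^0 = \lim_{M \to -\infty} (1 - e^M) = 1 \quad \text{convergent.} \]
Type (b)
\[ \int_1^{\infty} \frac{1}{x}\,dx = \lim_{N \to \infty} \int_1^N \frac{1}{x}\,dx = \lim_{N \to \infty} [\ln x]_1^N = \lim_{N \to \infty} \ln N = \infty \quad \text{divergent.} \]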
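Type (c)
\[ I = \int_{-\infty}^{\infty} x e^{-x^2}\,dx. \]
First, let us calculate the primitive using a change of variable $t = x^2$:
\[ \int x e^{-x^2}\,dx = \frac{1}{2} \int 2x e^{-x^2}\,dx = \frac{1}{2} \int e^{-t}\,dt = -\frac{1}{2} e^{-t} + C = -\frac{1}{2} e^{-x^2} + C. \]
Therefore,
\[ \int_0^N x e^{-x^2}\,dx = \left[ -\frac{1}{2} e^{-x^2} \right]_0^N = -\frac{1}{2} e^{-N^2} + \frac{1}{2} = \frac{1}{2} (1 - e^{-N^2}), \]
and consequently
\[ \int_0^{\infty} x e^{-x^2}\,dx = \lim_{N \to \infty} \frac{1}{2} (1 - e^{-N^2}) = \frac{1}{2}. \]
Similarly, for the other part of the improper integral:
\[ \int_M^0 x e^{-x^2}\,dx = \left[ -\frac{1}{2} e^{-x^2} \right]_M^0 = -\frac{1}{2} + \frac{1}{2} e^{-M^2} = -\frac{1}{2} (1 - e^{-M^2}), \]
and consequently
\[ \int_{-\infty}^{0} x e^{-x^2}\,dx = \lim_{M \to -\infty} \left( -\frac{1}{2} (1 - e^{-M^2}) \right) = -\frac{1}{2}. \]
In this case we find
\[ \int_{-\infty}^{\infty} x e^{-x^2}\,dx = -\frac{1}{2} + \frac{1}{2} = 0 \]
and the improper integral is convergent with I = 0. In this example, we put c = 0
because $e^0 = 1$ and the calculation simplifies, but any other choice of c would also
work. Also, the function $x e^{-x^2}$ is actually an odd function and we could have used
$\int_{-\infty}^{0} x e^{-x^2}\,dx = -\int_0^{\infty} x e^{-x^2}\,dx$.
Another valid solution for this example is:
\[ I = \int_{-\infty}^{\infty} x e^{-x^2}\,dx = \lim_{M \to -\infty,\, N \to \infty} \left[ -\frac{1}{2} e^{-x^2} \right]_M^N = \lim_{M \to -\infty,\, N \to \infty} \left( -\frac{1}{2} e^{-N^2} + \frac{1}{2} e^{-M^2} \right), \]
and from there
\[ I = \lim_{M \to -\infty} \left( \frac{1}{2} e^{-M^2} \right) + \lim_{N \to \infty} \left( -\frac{1}{2} e^{-N^2} \right) = 0, \]
where we have used Barrow's rule and that the individual limits exist. In a further
simplification, we could have put $\int_{-\infty}^{\infty} x e^{-x^2}\,dx = \lim_{K \to \infty} \int_{-K}^{K} x e^{-x^2}\,dx$ (this symmetric limit is legitimate here because both one-sided limits exist).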
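The limit processes of Example 5 can be observed numerically by evaluating the proper integrals for a few increasingly large (finite) limits. The following Python sketch does this with a midpoint rule; the chosen truncation values and helper names are assumptions, and finite truncations can only suggest, not prove, convergence or divergence.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Type (a): integral of e^x over [M, 0] for M -> -infinity; approaches 1.
type_a = [midpoint(math.exp, M, 0.0) for M in (-5.0, -10.0, -20.0)]

# Type (b): integral of 1/x over [1, N] for N -> infinity; grows like ln N.
type_b = [midpoint(lambda x: 1.0 / x, 1.0, N) for N in (10.0, 100.0, 1000.0)]

# Type (c): x * exp(-x^2), split at c = 0 and truncated at -6 and 6.
def g(x):
    return x * math.exp(-x * x)

left_part = midpoint(g, -6.0, 0.0)    # approaches -1/2
right_part = midpoint(g, 0.0, 6.0)    # approaches +1/2

print(type_a)
print(type_b)
print(left_part, right_part, left_part + right_part)
```

The values for type (a) settle near 1, those for type (b) keep growing, and the two parts of type (c) cancel.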
9.5.2
Integrals of unbounded functions (improper integrals of the second
kind)
Improper integrals of the second kind can be found if the function tends to +∞ or −∞
at a finite value of x (which is either at the interval boundary or at an inner point).
We therefore distinguish the following cases:
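(a) Let f be integrable in [a, b) and $\lim_{x \to b^-} |f(x)| = \infty$. The improper integral is defined
as
\[ I = \int_a^b f(x)\,dx = \lim_{\varepsilon \to 0^+} \int_a^{b - \varepsilon} f(x)\,dx. \]
(b) Let f be integrable in (a, b] and $\lim_{x \to a^+} |f(x)| = \infty$. The improper integral is defined
as
\[ I = \int_a^b f(x)\,dx = \lim_{\varepsilon \to 0^+} \int_{a + \varepsilon}^{b} f(x)\,dx. \]
(c) Let f be integrable in (a, b), $\lim_{x \to a^+} |f(x)| = \infty$ and $\lim_{x \to b^-} |f(x)| = \infty$. The improper
integral is defined as
\[ I = \int_a^b f(x)\,dx = \lim_{\varepsilon \to 0^+} \int_{a + \varepsilon}^{c} f(x)\,dx + \lim_{\varepsilon \to 0^+} \int_c^{b - \varepsilon} f(x)\,dx, \]
where a < c < b.
(d) Let f be integrable in [a, c) and (c, b] and $\lim_{x \to c^\pm} |f(x)| = \infty$. The improper integral
is defined as
\[ I = \int_a^b f(x)\,dx = \lim_{\varepsilon \to 0^+} \int_a^{c - \varepsilon} f(x)\,dx + \lim_{\varepsilon \to 0^+} \int_{c + \varepsilon}^{b} f(x)\,dx. \]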
We have defined the improper integral as the limit of a proper definite integral. There
are three different outcomes of the limit process:
(i) If the limit does not exist but is also not infinite (for (c) and (d), in at least one
limit), the improper integral does not exist.
(ii) If the limit does not exist and is +∞ or −∞ (for (c) and (d), in at least one limit
and if the other limit is not covered by (i)), the improper integral is divergent with
I = +∞ or I = −∞.
(iii) If the limit exists and is finite (for (c) and (d), in both limits), the improper integral is convergent and its value is I.
Example 6:
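Type (a)
\[ I = \int_0^1 \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{\varepsilon \to 0^+} \int_0^{1 - \varepsilon} \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{\varepsilon \to 0^+} [\arcsin x]_0^{1 - \varepsilon}, \]
and therefore
\[ I = \lim_{\varepsilon \to 0^+} (\arcsin(1 - \varepsilon) - \arcsin 0) = \arcsin 1 = \frac{\pi}{2} \quad \text{convergent.} \]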
We also note that the function is not defined at x = 1; by contrast, the integral of the same
integrand over a closed subinterval of (−1, 1) would be a standard definite integral.
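Type (c)
\[ I = \int_{-1}^{1} \frac{1}{\sqrt{1 - x^2}}\,dx = \lim_{\varepsilon \to 0^+} \int_{-1 + \varepsilon}^{c} \frac{1}{\sqrt{1 - x^2}}\,dx + \lim_{\varepsilon \to 0^+} \int_c^{1 - \varepsilon} \frac{1}{\sqrt{1 - x^2}}\,dx. \]
We have to fix c and appropriately choose c = 0 because it is in the center of the
interval. So we have
\[ I = \lim_{\varepsilon \to 0^+} \int_0^{1 - \varepsilon} \frac{1}{\sqrt{1 - x^2}}\,dx + \lim_{\varepsilon \to 0^+} \int_{-1 + \varepsilon}^{0} \frac{1}{\sqrt{1 - x^2}}\,dx. \]
The first term we have computed above, as well as the antiderivative, and we obtain
\[ I = \frac{\pi}{2} + \lim_{\varepsilon \to 0^+} [\arcsin x]_{-1 + \varepsilon}^{0} = \frac{\pi}{2} + \lim_{\varepsilon \to 0^+} (0 - \arcsin(\varepsilon - 1)) = \frac{\pi}{2} - \arcsin(-1) = \frac{\pi}{2} + \frac{\pi}{2} = \pi. \]
In this example, we could have used also the even symmetry of the function to obtain
the result. A separate example for type (b) is not necessary.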
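Type (d)
\[ I = \int_a^b \frac{1}{x - c}\,dx \quad \text{with} \quad a < c < b. \]
The integrand is not defined for x = c and $\lim_{x \to c^\pm} \frac{1}{x - c} = \pm\infty$ and we write
\[ I = \int_a^b f(x)\,dx = \lim_{\varepsilon \to 0^+} \int_a^{c - \varepsilon} f(x)\,dx + \lim_{\varepsilon \to 0^+} \int_{c + \varepsilon}^{b} f(x)\,dx. \]
We have to calculate both terms separately:
\[ \lim_{\varepsilon \to 0^+} \int_a^{c - \varepsilon} f(x)\,dx = \lim_{\varepsilon \to 0^+} [\ln |x - c|]_a^{c - \varepsilon} = \lim_{\varepsilon \to 0^+} (\ln |c - \varepsilon - c| - \ln |a - c|) = \lim_{\varepsilon \to 0^+} \ln \frac{\varepsilon}{|a - c|} = -\infty \quad \text{divergent,} \]
\[ \lim_{\varepsilon \to 0^+} \int_{c + \varepsilon}^{b} f(x)\,dx = \lim_{\varepsilon \to 0^+} [\ln |x - c|]_{c + \varepsilon}^{b} = \lim_{\varepsilon \to 0^+} (\ln |b - c| - \ln |c + \varepsilon - c|) = \lim_{\varepsilon \to 0^+} \ln \frac{|b - c|}{\varepsilon} = \infty \quad \text{divergent.} \]
While both improper integrals diverge, and so does the overall integral, we may recognize
that the curve of the function has an odd symmetry with respect to x = c and that
the divergent integrals may cancel out if the limit process is combined. This is indeed
the case for this example:
\[ \mathrm{P.V.} \int_a^b \frac{1}{x - c}\,dx = \lim_{\varepsilon \to 0^+} \left( \int_a^{c - \varepsilon} f(x)\,dx + \int_{c + \varepsilon}^{b} f(x)\,dx \right) = \lim_{\varepsilon \to 0^+} \left( \ln \frac{|c - \varepsilon - c|}{|a - c|} + \ln \frac{|b - c|}{|c + \varepsilon - c|} \right) = \ln \frac{|b - c|}{|a - c|}. \]
This is called the principal value of the integral.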
It is possible to combine improper integrals of the first and second kind. In that case,
limits have to be taken separately.
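The difference between the two separate limits and the combined (principal value) limit in Example 6, type (d), can be made visible numerically. In the Python sketch below, the values a = 0, c = 1, b = 3 and the helper names are assumptions for illustration; the expected principal value is then ln(|b − c|/|a − c|) = ln 2.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, c, b = 0.0, 1.0, 3.0   # assumed values with a < c < b

def f(x):
    return 1.0 / (x - c)

def one_sided_parts(eps):
    """Proper integrals on [a, c - eps] and [c + eps, b] for the same eps."""
    left = midpoint(f, a, c - eps)
    right = midpoint(f, c + eps, b)
    return left, right

results = {eps: one_sided_parts(eps) for eps in (1e-1, 1e-2, 1e-3)}
for eps, (left, right) in results.items():
    # left tends to -infinity, right to +infinity, but their sum stays near ln 2
    print(eps, left, right, left + right)
```

The cancellation relies on using the same ε on both sides of c; with independent limits, as the definition of case (d) requires, the integral does not converge.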
9.6
Applications of integrals
The applications of integral calculus are very diverse and are present in many fields of
science and engineering. We limit ourselves here to two general and simple tasks:
determining areas and lengths of curves.
9.6.1
The area between two curves
The Riemann integral introduces the definite integral precisely as a representation
of the area A between the curve of a bounded function and the abscissa in an interval [a, b]. For the integral to really represent the area, we assume that $f(x) > 0$ for
all $x \in [a, b]$. In that case, $\int_a^b f(x)\,dx > 0$ and it corresponds directly to the area (if
the calculation contains units of length, the integral contains units of length squared).
We have to check what happens if f(x) is negative or if it changes its sign in the
interval [a, b]. Let $I = \int_a^b f(x)\,dx$. Then
\[ f(x) \ge 0 \ \forall x \in [a, b]: \qquad A = I = \int_a^b f(x)\,dx, \]
\[ f(x) \le 0 \ \forall x \in [a, b]: \qquad A = -I = \int_a^b (-f(x))\,dx, \]
i.e., the integral of a negative function is negative, but its absolute value corresponds
to the area.
If f changes its sign at points c1 , c2 , . . . , cn (that can be identified solving f (ci ) = 0 for
ci ), then it is possible to determine the areas of each subinterval [a, c1 ], [c1 , c2 ], . . . , [cn , b]
according to the rules above and sum them up, or use the absolute value:
\[ A = \int_a^b |f(x)|\,dx. \]
We can generalize the concept to the area between two curves given by f (x) and g(x):
\[ f(x) - g(x) \ge 0 \ \forall x \in [a, b]: \qquad A = I = \int_a^b (f(x) - g(x))\,dx, \]
\[ f(x) - g(x) \le 0 \ \forall x \in [a, b]: \qquad A = -I = \int_a^b (g(x) - f(x))\,dx. \]
If f (x) − g(x) changes its sign in points c1 , c2 , . . . , cn (that can be found solving f (ci ) =
g(ci )), then it is possible to determine the areas of each subinterval [a, c1 ], [c1 , c2 ], . . . , [cn , b]
according to the rules above and sum them up, or use the absolute value:
\[ A = \int_a^b |f(x) - g(x)|\,dx. \]
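A small Python sketch may help to see the difference between the signed integral I and the area A when f − g changes sign. The curves f(x) = x and g(x) = x² on [0, 2] (which cross at x = 1) and all helper names are assumptions chosen for this illustration.

```python
def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the definite integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return x

def g(x):
    return x * x

a, b = 0.0, 2.0
c1 = 1.0   # sign change of f - g inside [a, b], from f(c1) = g(c1)

# Signed integral: positive and negative contributions partly cancel.
signed = midpoint(lambda x: f(x) - g(x), a, b)

# Area via subintervals: f >= g on [0, 1], g >= f on [1, 2].
area_pieces = (midpoint(lambda x: f(x) - g(x), a, c1)
               + midpoint(lambda x: g(x) - f(x), c1, b))

# Area via the absolute value.
area_abs = midpoint(lambda x: abs(f(x) - g(x)), a, b)

print(signed, area_pieces, area_abs)
```

For this choice the exact values are I = −2/3 and A = 1/6 + 5/6 = 1, so the signed integral is not the area.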
9.6.2
The arclength
A curve in the cartesian plane can be represented in various ways. We consider two
types, the explicit form and the parametric form.
If a curve is given by a function f (x) explicitly, we identify its position by sets of pairs
(x, y) = (x, f (x)) and assume that f (x) is differentiable and its derivative continuous
in the compact interval [a, b]. Then, the arclength of the curve between two points
(a, f (a)) and (b, f (b)) is given by
\[ l = \int_a^b \sqrt{1 + (f'(x))^2}\,dx. \]
If a curve is given in parametric form, we identify its position by sets of pairs (x, y) =
(x(t), y(t)), where both the position on the abscissa and the ordinate depend on the
parameter t. We assume that both x(t) and y(t) are differentiable and their derivatives
continuous in the compact interval $[t_1, t_2]$. Then, the arclength of the curve between
two points $(x(t_1), y(t_1))$ and $(x(t_2), y(t_2))$ is given by
\[ l = \int_{t_1}^{t_2} \sqrt{(x'(t))^2 + (y'(t))^2}\,dt. \]
Example 7: Calculate the arclength of the following curves:
(a) Let $f(x) = \frac{4}{3} x$ between x = 0 and x = 3.
The curve is a straight line with slope 4/3 passing through the origin, limited by (0, 0)
and (3, 4). Easily, we obtain $f'(x) = \frac{4}{3}$. We use the formula for the explicit function
and determine
\[ l = \int_0^3 \sqrt{1 + (f'(x))^2}\,dx = \int_0^3 \sqrt{1 + \frac{16}{9}}\,dx = \int_0^3 \sqrt{\frac{25}{9}}\,dx = \int_0^3 \frac{5}{3}\,dx = \frac{5}{3} [x]_0^3 = 5. \]
(b) Let x(t) = rt − r sin t and y(t) = r − r cos t between t = 0 and t = 2π.
The curve represents the arc of a cycloid. In t = 0, (x, y) = (0, 0) and in t = 2π,
(x, y) = (2πr, 0).
We calculate:
x0 (t) = r(1 − cos t),
y 0 (t) = r sin t,
and therefore
Z
l =
2π
Z
p
(r(1 − cos t))2 + (r sin t)2 dt = r
2π
q
(1 − cos t)2 + sin2 t dt
0
0
Z
= r
2π
Z
p
2
2
1 + cos t − 2 cos t + sin t dt = r
0
2π
√
2 − 2 cos t dt.
0
Now, we use the trigonometric identity $2 \sin^2 u = 1 - \cos(2u)$ with $u = t/2$:
$$l = r \int_0^{2\pi} \sqrt{4 \sin^2 \frac{t}{2}}\, dt = 2r \int_0^{2\pi} \sin \frac{t}{2}\, dt = 2r \int_0^{\pi} \sin p \cdot 2\, dp,$$
where we do not need to take the absolute value of $\sin(t/2)$ because in the relevant interval the sine is not negative, and where we perform the change of variable $t/2 = p$. We close the calculation:
$$l = 4r \int_0^{\pi} \sin p\, dp = 4r[-\cos p]_0^{\pi} = -4r[\cos p]_0^{\pi} = -4r(-1 - 1) = 8r.$$
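As a numerical cross-check of Example 7, the following minimal sketch approximates both arclength integrals with a midpoint rule. The function names, the number of subintervals and the choice r = 1 are our own assumptions and not part of the text.

```python
import math

def arclength_explicit(df, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of sqrt(1 + f'(x)^2) over [a, b]."""
    h = (b - a) / n
    return h * sum(math.sqrt(1.0 + df(a + (i + 0.5) * h) ** 2) for i in range(n))

def arclength_parametric(dx, dy, t1, t2, n=10_000):
    """Midpoint-rule approximation of the integral of sqrt(x'(t)^2 + y'(t)^2) over [t1, t2]."""
    h = (t2 - t1) / n
    return h * sum(
        math.hypot(dx(t1 + (i + 0.5) * h), dy(t1 + (i + 0.5) * h)) for i in range(n)
    )

r = 1.0  # assumed radius for the cycloid of part (b)

# (a) straight line f(x) = 4x/3 on [0, 3]; its derivative is the constant 4/3
line_length = arclength_explicit(lambda x: 4.0 / 3.0, 0.0, 3.0)

# (b) one arc of the cycloid, t in [0, 2*pi]
cycloid_length = arclength_parametric(
    lambda t: r * (1.0 - math.cos(t)),
    lambda t: r * math.sin(t),
    0.0,
    2.0 * math.pi,
)

print(line_length, cycloid_length)
```

Both values should lie close to the exact results 5 and 8r obtained above.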
Sequences and series
Michael Stich
November 13, 2021
11
Numerical sequences and series
In this section, we will define finite and infinite sequences of numbers and finite and
infinite series of numbers. This part of calculus is relatively independent of the study
of functions that we have considered in some detail earlier, in particular when compared to continuity, differentiability and integrability of functions. However, there is
one pivotal point that methodologically connects these areas: the definition
of limits, a topic that we have already used extensively.
11.1
Numerical sequences: definition and convergence
Let us start with a few examples:
{1, 2, 3, 4, . . . },
{−1, 0, 1, −1, 0, 1, −1, 0, 1, −1, 0, 1},
{1, 1, 2, 3, 5, 8, 13, 21, 34, . . . }
are all numerical sequences. The dots indicate that the sequence proceeds according to the rule given by the previous terms, and it is therefore clear that the first
example indicates the sequence of natural numbers, which is an infinite sequence.
The order of the elements of the sequence is important: The sequence that starts as
{1, 3, 2, 4, 5, 7, 6, 8, 9, 11, 10, 12, . . . } also yields the same set of numbers, but in a different order and the sequence is different.
The second sequence is a finite sequence created with a specific rule, but we will
focus our interest on infinite sequences (in fact, many books do not consider finite sequences). Unless stated otherwise, all sequences from now on will be infinite sequences.
Finally, the third sequence is created by the rule $a_k = a_{k-1} + a_{k-2}$. This introduces
several important properties of sequences: first, an infinite sequence is generally given
by
$$\{a_0, a_1, a_2, \dots\} = \{a_k\}_{k=0}^{\infty} = \{a_k\} = (a_k) = a_k,$$
where $a_k$ is called the general term and where we show alternative notations in increasing degree of simplicity (the specific choice depends on the context). Obviously,
the symbols indicating the general term (here a) and the index (here k) are arbitrary.
Second, the rule $a_k = a_{k-1} + a_{k-2}$ uses the values $a_{k-1}$ and $a_{k-2}$, which have to be
known beforehand. But to determine $a_{k-2}$, we need $a_{k-3}$ and $a_{k-4}$, and so on. Such a
rule is called a recurrence relation or recursion: at least one previous value is needed
to determine the next one. Therefore, the statement of the rule is not enough: we need
to know the starting value(s) of the sequence, in this case two: $a_0 = 1$ and $a_1 = 1$.
The specific sequence defined in this way is called the Fibonacci sequence. Different starting values generally create different sequences with the same recurrence
relation.
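The role of the starting values can be illustrated with a short sketch. The helper name `recurrence_sequence` and the second pair of starting values (2, 1) are hypothetical choices made only for this illustration.

```python
def recurrence_sequence(a0, a1, count):
    """Return the first `count` terms of a_k = a_{k-1} + a_{k-2} with given a_0, a_1."""
    terms = [a0, a1][:count]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms

# Starting values from the text: a_0 = 1, a_1 = 1 (Fibonacci sequence)
fibonacci = recurrence_sequence(1, 1, 9)

# Same recurrence relation, different (assumed) starting values
other = recurrence_sequence(2, 1, 9)

print(fibonacci)
print(other)
```

The first list reproduces the terms 1, 1, 2, 3, 5, 8, 13, 21, 34 quoted above, while the second list differs from the third term on.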
Third, the sequence typically starts with the element indexed with 0, but we may also
use initializations with the first index being 1.
For the specific infinite sequences considered above, the k-th element becomes larger
and larger and – due to the unboundedness of natural numbers – we observe
$$\lim_{k \to \infty} a_k = \infty.$$
Let us now consider the sequences
$$\left\{1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \dots\right\} \quad \text{and} \quad \left\{1, \tfrac{1}{4}, \tfrac{1}{9}, \tfrac{1}{16}, \dots\right\}.$$
It is important to recognize the rule with which a sequence is created, and for these
sequences it is quite simple to establish
$$a_k = \frac{1}{k+1}, \quad k = 0, 1, 2, \dots$$
or simpler
$$a_k = \frac{1}{k}, \quad k = 1, 2, 3, \dots$$
as general term for the first sequence and
$$a_k = \frac{1}{k^2}, \quad k = 1, 2, 3, \dots$$
as general term for the second sequence. For these examples, we see that
$$\lim_{k \to \infty} a_k = 0.$$
Note that $a_k \neq 0$ for all $k$, so we observe, as in the chapter on limits, that the limit
value need not be part of the sequence. We define:
Definition 1 (limit point of a sequence):
Let $(a_k)$, $k = 0, 1, 2, \dots$ define an infinite sequence. Then, $a$ is the limit (or limit point)
of the sequence $(a_k)$, i.e.,
$$\lim_{k \to \infty} a_k = a,$$
if and only if for all $\varepsilon > 0$ there exists a number $N$ such that
$$|a_k - a| < \varepsilon \quad \text{for all } k > N.$$
Example:
Prove, using the definition, that the limit point of $(a_k)$, $k = 0, 1, \dots$, with $a_k = \frac{k+4}{k+1}$ is 1.
We proceed by using
$$|a_k - 1| < \varepsilon \quad \text{for all } k > N.$$
This converts into
$$\left|\frac{k+4}{k+1} - 1\right| = \frac{3}{k+1} < \varepsilon \quad \text{or} \quad k > \frac{3}{\varepsilon} - 1.$$
Essentially, the right-hand side of the last inequality defines the value of $N$. Of course,
the important range of $\varepsilon$ is when it is a small value, but the definition only assumes
that it is positive. Therefore, we set $N = \max(0, \lceil 3/\varepsilon - 1 \rceil)$. In this way, $N$ is a
non-negative integer that is, no matter how small $\varepsilon$ is, large enough to ensure
that all sequence elements with $k > N$ are within the interval $(1 - \varepsilon, 1 + \varepsilon)$, which
proves that the limit is 1.
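The choice of N in this example can be checked for a concrete tolerance. The sketch below uses exact rational arithmetic; the function names `a` and `threshold` and the sample value of the tolerance (1/100) are assumptions made for illustration.

```python
import math
from fractions import Fraction

def a(k):
    """General term a_k = (k + 4) / (k + 1) as an exact fraction."""
    return Fraction(k + 4, k + 1)

def threshold(eps):
    """N = max(0, ceil(3/eps - 1)), the value derived in the example."""
    return max(0, math.ceil(3 / eps - 1))

eps = Fraction(1, 100)   # assumed sample tolerance
N = threshold(eps)

# Every element with k > N lies within eps of the limit 1 (checked on a finite range).
inside = all(abs(a(k) - 1) < eps for k in range(N + 1, N + 1000))
print(N, inside)
```

For this tolerance the element with k = N sits exactly at distance 1/100 from the limit, so the strict inequality only holds from k = N + 1 onwards, in agreement with the definition.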
Comments:
(1) This definition is analogous to the definition of a limit of a function as x → ∞,
with the difference that k can only take integer values and that (ak ) is a sequence, not
a function (see also Theorem 1).
(2) If the limit exists (and this implies that a is finite), then we say that the sequence
(ak ) converges to a.
(3) If the limit does not exist, the sequence can either diverge to ±∞ or it neither
converges nor diverges to ±∞, like for the infinite sequence 1, −1, 1, −1, 1, −1, . . . .
(4) There is an ambiguous use of notation in the literature: Depending on the context,
divergence may either refer to all sequences that do not converge, or only to those that
tend to ±∞.
Theorem 1 (limit of a sequence and function):
If $\lim_{x \to \infty} f(x)$ exists, then the sequence given by $a_k = f(k)$ converges to the same limit:
$$\lim_{k \to \infty} a_k = \lim_{x \to \infty} f(x).$$
Comment: This theorem specifies what has been said in Comment 1 of Definition 1.
Example:
Find the limit of the sequence given by
$$\left\{\frac{2^2 - 2}{2^2}, \frac{3^2 - 2}{3^2}, \frac{4^2 - 2}{4^2}, \frac{5^2 - 2}{5^2}, \dots\right\}.$$
This sequence has a general term
$$a_k = \frac{k^2 - 2}{k^2} = 1 - \frac{2}{k^2}.$$
We apply Theorem 1 using $f(x) = 1 - \frac{2}{x^2}$ (so we simply replace $k$ by $x$) and write
$$\lim_{k \to \infty} a_k = \lim_{x \to \infty} f(x) = \lim_{x \to \infty} \left(1 - \frac{2}{x^2}\right) = 1 - \lim_{x \to \infty} \frac{2}{x^2} = 1 - 0 = 1.$$
In order to find some more specific criteria on the convergence of sequences, it is useful
to enlarge our repertoire of definitions.
Definition 2 (accumulation point of a sequence):
$a$ is an accumulation point of a sequence $(a_k)$ if in any $\varepsilon$-neighbourhood of $a$ there is
an infinite number of sequence elements.
Comments:
(1) This definition slightly generalizes Definition 1 (it is less strict on the hypotheses).
The concept $\varepsilon$-neighbourhood refers to the sequence elements that fulfil $|a_k - a| < \varepsilon$.
For example, the infinite sequence 1, −1, 1, −1, 1, −1, . . . has no limit point, but it has
two accumulation points, at −1 and 1, since there is an infinite number of elements of
the sequence in any neighbourhood of 1, and also an infinite number of elements of the
sequence in any neighbourhood of −1. We can extract two infinite subsequences (or
partial sequences) from 1, −1, 1, −1, 1, −1, . . . , namely, −1, −1, −1, . . . and 1, 1, 1, . . .
(in what follows, we always assume that subsequences of an infinite sequence are also
infinite).
(2) Every limit point is also an accumulation point. Therefore, if a sequence has a limit
point, it has only one accumulation point, which coincides with the limit point. The
converse is not true: not all accumulation points are limit points. Remember: there
exists at most one limit point.
(3) If there is an accumulation point, there are an infinite number of sequence elements
around it. This does not mean that almost all elements are in that neighbourhood
(which would mean that there is at most a finite number of elements outside it),
because there may still be an infinite number of sequence elements outside that neighbourhood. Take, for example, the sequence $1, \tfrac{1}{2}, 2, \tfrac{1}{3}, 3, \tfrac{1}{4}, 4, \dots$. There is a partial
sequence $\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \dots$ which converges to 0 (and there is an infinite number of sequence
elements in any $\varepsilon$-neighbourhood of 0), but there is also another partial sequence
$1, 2, 3, 4, \dots$ which diverges (to ∞) and that sequence also has an infinite number of
elements.
Why are accumulation points useful? Because of the following theorem:
Theorem 2 (Bolzano-Weierstrass):
Every bounded sequence $(a_k)$ has at least one accumulation point.
Comments:
(1) Boundedness of a sequence is defined analogously to that of a function: there must
exist a finite value M such that all sequence elements are smaller than M , and a finite
value m such that all sequence elements are larger than m.
(2) While this theorem does not guarantee the existence of a limit point (only an accumulation point), it has been shown to be very important in the development of other
results for sequences and series – and calculus in general. This is because in many
instances it is relatively easy to check for boundedness, and because it is a relatively
weak hypothesis.
(3) An alternative formulation is that every bounded sequence has at least one convergent subsequence.
Theorem 3 (convergent sequence):
A sequence $(a_k)$ is convergent if and only if it is bounded and its largest accumulation
point is identical to its smallest accumulation point.
Comment:
Therefore, for a bounded sequence, we may look for suitable subsequences and investigate
them separately before drawing a conclusion about convergence.
Theorem 4 (sandwich theorem for sequences):
Suppose that $(a_k)$, $(b_k)$ and $(c_k)$ are sequences with $\lim_{k \to \infty} b_k = \lim_{k \to \infty} c_k = L$ and that
there exists a number $M$ such that $b_k \le a_k \le c_k$ if $k > M$. Then, $\lim_{k \to \infty} a_k = L$.
Theorem 4 is analogous to the sandwich theorem for the limit of functions. Before we
consider some examples, we give some properties of convergent sequences.
Theorem 5 (properties of convergent sequences):
Suppose that $(a_k)$ and $(b_k)$ are convergent sequences with limit points $a$ and $b$. Then,
(i) $(a_k \pm b_k)$ has limit point $a \pm b$;
(ii) $(a_k b_k)$ has limit point $ab$;
(iii) $(a_k / b_k)$ has limit point $a/b$ (for $b_k \neq 0$, $b \neq 0$);
(iv) $(c\, a_k)$ has limit point $ca$ (for all $c \in \mathbb{R}$).
Example: Show that if $\lim_{k \to \infty} |a_k| = 0$, then $\lim_{k \to \infty} a_k = 0$.
We know that $-|a_k| \le a_k \le |a_k|$. By hypothesis of this example, $\lim_{k \to \infty} |a_k| = 0$ and also
$\lim_{k \to \infty} (-|a_k|) = -\lim_{k \to \infty} |a_k| = 0$ (Theorem 5(iv) with $c = -1$). Therefore, we have two
sequences that converge to the same limit point and can use the sandwich theorem.
We conclude that also $\lim_{k \to \infty} a_k = 0$.
Example: Show that $\lim_{k \to \infty} \frac{R^k}{k!} = 0$ for $R \in \mathbb{R}$.
For $R = 0$, the limit is zero trivially. We now assume that $R > 0$ and define $M$ as the
non-negative integer such that
$$M \le R < M + 1.$$
Since we are later interested in the limit $k \to \infty$, it is reasonable to consider the case
where $k > M$ and express the term $\frac{R^k}{k!}$ as product of its $k$ factors:
$$\frac{R^k}{k!} = \left(\frac{R}{1}\, \frac{R}{2} \cdots \frac{R}{M}\right) \cdot \left(\frac{R}{M+1}\, \frac{R}{M+2} \cdots \frac{R}{k}\right).$$
From this factorisation we draw several conclusions. First, for all fixed $R$ and $M$,
the first parenthesis represents a finite number of factors and can be represented by a
constant $C$ (with $C = 1$ if $M = 0$). Second, for $k$ finite, the second parenthesis represents a finite number of
factors, each of which is smaller than unity, and we have
$$\frac{R}{M+1}\, \frac{R}{M+2} \cdots \frac{R}{k} \le \frac{R}{k}$$
(with equality only for $k = M + 1$), and we get overall
$$0 < \frac{R^k}{k!} \le C\, \frac{R}{k}.$$
The limit of 0 is trivially zero, and $\lim_{k \to \infty} C \frac{R}{k} = 0$, and therefore using the sandwich
theorem we find $\lim_{k \to \infty} \frac{R^k}{k!} = 0$.
For $R < 0$, the limit is also zero as, following the previous example, $\lim_{k \to \infty} \left|\frac{R^k}{k!}\right| = \lim_{k \to \infty} \frac{|R|^k}{k!} = 0$.
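The sandwich bound used in this example can be verified exactly for a sample value of R. In the sketch below, the value R = 7/2 and the names `term`, `M` and `C` follow the notation of the example but are otherwise our own choices.

```python
import math
from fractions import Fraction

R = Fraction(7, 2)               # assumed sample value with R > 0
M = math.floor(R)                # integer with M <= R < M + 1
C = R ** M / math.factorial(M)   # product of the first M factors R/1 ... R/M

def term(k):
    """Exact value of R^k / k!."""
    return R ** k / math.factorial(k)

# Check 0 < R^k/k! <= C * R/k for a range of k > M.
bound_holds = all(0 < term(k) <= C * R / k for k in range(M + 1, 60))
print(bound_holds, float(term(50)))
```

The exact fractions avoid rounding issues at k = M + 1, where the bound is attained with equality.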
Theorem 6 (function of a convergent sequence):
If $f(x)$ is a continuous function and $\lim_{k \to \infty} a_k = a$, then:
$$\lim_{k \to \infty} f(a_k) = f\left(\lim_{k \to \infty} a_k\right) = f(a).$$
This theorem goes beyond Theorem 1 because now we assume that f is continuous.
The theorem expresses the fact that if ak is a convergent sequence, the limit point of
the new sequence f (ak ) is identical to the function evaluated at the limit point.
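Example: Determine $\lim_{k \to \infty} f(a_k)$ for $f(x) = e^x$ and $a_k = \frac{3k}{k+1}$.
First, we calculate
$$\lim_{k \to \infty} a_k = \lim_{k \to \infty} \frac{3k}{k+1} = \lim_{k \to \infty} \frac{3}{1 + k^{-1}} = 3.$$
Therefore, we have
$$\lim_{k \to \infty} f(a_k) = \lim_{k \to \infty} e^{\frac{3k}{k+1}} = f\left(\lim_{k \to \infty} \frac{3k}{k+1}\right) = f(3) = e^3.$$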
Theorem 7 (convergent sequence is bounded):
If $(a_k)$ converges, then $(a_k)$ is bounded.
Comment:
The converse is not true: if a sequence is bounded, it is not necessarily convergent, as
the example 1, −1, 1, −1, 1, −1, . . . shows. However, we have the following property.
Theorem 8 (monotonic sequence):
If $(a_k)$ is strictly monotonic and bounded, $(a_k)$ converges.
Comments:
(1) Again, monotonicity is defined in complete analogy to the monotonicity of functions. It can be tested via the first derivative after a transfer to functions or by
evaluating the sign of $a_{k+1} - a_k$.
(2) For the conclusion to be valid, a strictly increasing sequence has to be bounded above
and a strictly decreasing sequence has to be bounded below.
Example: Show that $a_k = \sqrt{k+1} - \sqrt{k}$ is strictly decreasing and bounded below.
Does $\lim_{k \to \infty} a_k$ exist?
The function $f(x) = \sqrt{x+1} - \sqrt{x}$ is strictly decreasing because its derivative is negative:
$$f'(x) = \frac{1}{2\sqrt{x+1}} - \frac{1}{2\sqrt{x}} < 0 \quad \text{if } x > 0.$$
Therefore, $a_k = f(k)$ is strictly decreasing. Furthermore, $a_k > 0$ and the sequence is
bounded below (with lower bound 0). Theorem 8 ensures that $\lim_{k \to \infty} a_k = a$ exists and
that $a \ge 0$. Actually, we can show that $a = 0$ using Theorem 1:
$$\lim_{x \to \infty} f(x) = \lim_{x \to \infty} \left(\sqrt{x+1} - \sqrt{x}\right) = \lim_{x \to \infty} \frac{1}{\sqrt{x+1} + \sqrt{x}} = 0.$$
In this section, we have identified five strategies for checking the convergence of a sequence: (a) use Definition 1 (often a tedious process); (b) investigate
the limit of the associated function (this needs all the tools from limits of functions); (c)
check for boundedness and find the accumulation point(s) of suitable subsequences (if
appropriate); (d) check whether the sequence is a sum/product/fraction of known convergent sequences or whether the sandwich theorem is applicable (this requires knowing
other convergent sequences); (e) check for boundedness and monotonicity.
11.2
Numerical series: definition and convergence
We have in fact used series already; in this section, however, we will study them in
their own right.
In Chapter 8, we considered Taylor’s Theorem. There, we approximated a function
(with appropriate properties) by a polynomial of degree n and a remainder term.
However, we could also try to let n → ∞ and therefore disregard the remainder term.
In this way, we create a power series, for example for sin(x) with development about
x0 = 0:
$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \frac{x^{11}}{11!} + \cdots.$$
But we want to focus on numerical series, and we set, for example, x = 1:
$$\sin 1 = 1 - \frac{1}{3!} + \frac{1}{5!} - \frac{1}{7!} + \frac{1}{9!} - \frac{1}{11!} + \cdots.$$
Let us now define partial sums $S_N$ simply as sums of all terms up to the $N$-th one.
In our example:
$$\begin{aligned}
S_1 &= 1 \\
S_2 &= 1 - \frac{1}{3!} \approx 0.833 \\
S_3 &= 1 - \frac{1}{3!} + \frac{1}{5!} \approx 0.841667 \\
S_4 &= 1 - \frac{1}{3!} + \frac{1}{5!} - \frac{1}{7!} \approx 0.841468 \\
S_5 &= 1 - \frac{1}{3!} + \frac{1}{5!} - \frac{1}{7!} + \frac{1}{9!} \approx 0.8414710097 \\
S_N &= \sum_{k=0}^{N-1} \frac{(-1)^k}{(2k+1)!}
\end{aligned}$$
Comparing the values of $S_1$ to $S_5$ to the value of $\sin(1) \approx 0.8414709848$ (precise to 10
decimal places), we see that the agreement improves the more terms the partial sum
contains, and that $S_5$ agrees with $\sin(1)$ to 7 d.p. This suggests that the partial sums
converge to $\sin(1)$ and, indeed,
$$\sin 1 = \lim_{N \to \infty} S_N = \lim_{N \to \infty} \sum_{k=0}^{N-1} \frac{(-1)^k}{(2k+1)!}.$$
Actually, this is not surprising since the series was constructed from a Taylor polynomial and of course we expect that the right-hand side should converge to the finite
value corresponding to sin(1). In this way, we interpret the sum of an infinite series as
the limit of the sequence of partial sums.
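The partial sums can be reproduced with a few lines of code. This is a minimal sketch in which the index k runs from 0 to N − 1, so that the first partial sum equals 1; the function name `partial_sum` is an assumption.

```python
import math

def partial_sum(N):
    """Sum of the first N terms of 1 - 1/3! + 1/5! - 1/7! + ..."""
    return sum((-1) ** k / math.factorial(2 * k + 1) for k in range(N))

for N in range(1, 7):
    print(N, f"{partial_sum(N):.10f}")
print("sin(1) =", f"{math.sin(1.0):.10f}")
```

Each additional term reduces the distance to sin(1), and the size of the first omitted term gives an indication of the remaining error.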
Definition 3 (convergent infinite series):
An infinite series $\sum_{k=1}^{\infty} a_k$ converges to the sum $S$ if the sequence of its partial sums
$S_N = \sum_{k=1}^{N} a_k$ converges to $S$:
$$\lim_{N \to \infty} S_N = S,$$
and we write $S = \sum_{k=1}^{\infty} a_k$.
Comments:
(1) A sum may start from any finite index $m$: $S = \sum_{k=m}^{\infty} a_k$.
(2) If the limit exists, we say that the series converges. If the limit is ±∞, the series
diverges to ±∞.
(3) This definition enables us to use the methods of sequences to study the convergence
of series.
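Example: Determine $S = \sum_{n=1}^{\infty} \frac{1}{n(n+1)}$.
First, we express $\frac{1}{n(n+1)}$ using the partial fraction decomposition:
$$\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}.$$
Now, we write down the first partial sums, to obtain the general term of the sequence
of partial sums:
$$\begin{aligned}
S_1 &= \frac{1}{1 \cdot 2} = \frac{1}{1} - \frac{1}{2} \\
S_2 &= \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} = \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} = \frac{1}{1} - \frac{1}{3} \\
S_3 &= \frac{1}{1 \cdot 2} + \frac{1}{2 \cdot 3} + \frac{1}{3 \cdot 4} = \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} = \frac{1}{1} - \frac{1}{4}.
\end{aligned}$$
We observe that the inner terms cancel each other (series with this property are
called telescoping series) and we are left with
$$S_n = 1 - \frac{1}{n+1}.$$
At this point, we are able to perform the limit of the partial sums as
$$\lim_{n \to \infty} S_n = \lim_{n \to \infty} \left(1 - \frac{1}{n+1}\right) = 1,$$
which is the result of the infinite series $\sum_{n=1}^{\infty} \frac{1}{n(n+1)}$.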
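The closed form of the partial sums can be confirmed with exact rational arithmetic. The function name below is a hypothetical choice for this sketch.

```python
from fractions import Fraction

def telescoping_partial_sum(n):
    """S_n = sum of 1/(k(k+1)) for k = 1, ..., n, computed term by term."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

for n in (1, 2, 3, 10, 100):
    # Compare the direct sum with the closed form 1 - 1/(n+1)
    print(n, telescoping_partial_sum(n), 1 - Fraction(1, n + 1))
```

Both columns agree exactly, and the values approach 1 as n grows.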
11.3
Arithmetic and geometric sequences and series
Before studying convergence tests of general series, we consider specific sequences and
series, where the consecutive terms have a particular regularity. These sequences and
series can be found in many applications and should be studied in their own right.
Definition 4 (arithmetic sequence/series):
An arithmetic sequence (or progression) is formed by a sequence of numbers where
the difference between consecutive terms is constant. The sum of the terms of an
arithmetic sequence is an arithmetic series.
For example, the sequences {2, 5, 8, 11, 14} or {2, 0, −2, −4, −6, · · · } represent arithmetic sequences, a finite one in the former case, and an infinite one in the latter case.
In the general case, the sequence element $a_k$ is given by
$$a_k = s + k\, d, \quad k = 0, 1, \dots,$$
where $s$ stands for the starting value and $d$ for the distance between consecutive elements. The value of the finite series with $N$ terms of the sequence is then
$$S_N = \sum_{k=0}^{N-1} (s + k\, d).$$
Since the sum starts with k = 0, to have N terms in total, we terminate the sum at
element N − 1. Of course, alternative formulations of sequence and series are possible.
To obtain a simple expression for $S_N$, we write the series twice, once in normal form and
once in reverse order of terms, and sum each term with the corresponding one above it:
$$\begin{aligned}
S_N &= s + (s + d) + (s + 2d) + \cdots + [s + (N-1)d] \\
S_N &= [s + (N-1)d] + [s + (N-2)d] + [s + (N-3)d] + \cdots + s \\
2S_N &= [2s + (N-1)d] + [2s + (N-1)d] + [2s + (N-1)d] + \cdots + [2s + (N-1)d],
\end{aligned}$$
with $N$ identical terms on the right-hand side. We solve for $S_N$:
$$S_N = \frac{N}{2}[2s + (N-1)d] = \frac{N}{2}[s + s + (N-1)d] = \frac{N}{2}(\text{first term} + \text{last term}).$$
In particular, for $s = 1$ and $d = 1$,
$$S_N = 1 + 2 + \cdots + N = \sum_{k=0}^{N-1} (1 + k) = \sum_{k=1}^{N} k = \frac{N}{2}(N + 1).$$
Infinite arithmetic sequences with $d \neq 0$ diverge, and so do the corresponding series.
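The formula for the finite arithmetic series can be compared with a direct summation. The two function names are assumptions of this sketch, and integer inputs are assumed so that the division by 2 is exact.

```python
def arithmetic_sum_direct(s, d, N):
    """Add the N terms s, s + d, ..., s + (N - 1) d one by one."""
    return sum(s + k * d for k in range(N))

def arithmetic_sum_closed(s, d, N):
    """Closed form N/2 * [2s + (N - 1) d]; for integers the product is always even."""
    return N * (2 * s + (N - 1) * d) // 2

print(arithmetic_sum_direct(2, 3, 5), arithmetic_sum_closed(2, 3, 5))
print(arithmetic_sum_direct(1, 1, 100), arithmetic_sum_closed(1, 1, 100))
```

The first line sums a five-term sequence with s = 2 and d = 3, the second one the natural numbers from 1 to 100.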
Definition 5 (geometric sequence/series):
A geometric sequence (or progression) is formed by a sequence of numbers where the
ratio between consecutive terms is constant. The sum of the terms of a geometric
sequence is a geometric series.
For example, the sequences $\{2, 4, 8, 16, 32\}$ or $\{2, -1, \tfrac{1}{2}, -\tfrac{1}{4}, \tfrac{1}{8}, -\tfrac{1}{16}, \tfrac{1}{32}, \cdots\}$ represent
geometric sequences, a finite one in the former case, and an infinite one in the latter
case. In the general case, the sequence element $a_k$ is given by
$$a_k = s\, r^k, \quad k = 0, 1, \dots,$$
where $s$ stands for the non-zero starting value and $r$ for the common ratio between
consecutive elements. The value of the finite series with $N$ terms of the sequence is
then
$$S_N = \sum_{k=0}^{N-1} (s\, r^k).$$
To obtain a simple expression for $S_N$, we write the series twice, once in normal form and
once multiplied by $r$, and then subtract both expressions:
$$\begin{aligned}
S_N &= s + sr + sr^2 + sr^3 + \cdots + sr^{N-1} \\
rS_N &= sr + sr^2 + sr^3 + sr^4 + \cdots + sr^N \\
S_N - rS_N &= s - sr^N = s(1 - r^N).
\end{aligned}$$
For $r \neq 1$, we now simply solve for $S_N$:
$$S_N = \sum_{k=0}^{N-1} (s\, r^k) = \frac{s(1 - r^N)}{1 - r}.$$
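Let us now investigate an infinite geometric series (for $r \neq 1$):
$$\lim_{N \to \infty} S_N = \lim_{N \to \infty} \frac{s(1 - r^N)}{1 - r} = \lim_{N \to \infty} \left(\frac{s}{1 - r} - \frac{s r^N}{1 - r}\right) = \frac{s}{1 - r} - \lim_{N \to \infty} \frac{s r^N}{1 - r}$$
if the latter limits exist. We know that $\lim_{N \to \infty} r^N = 0$ if $|r| < 1$, while for $|r| > 1$ the
sequence $r^N$ diverges (and for $r = -1$ the limit does not exist). Therefore, the infinite geometric series converges to
$$\lim_{N \to \infty} S_N = \frac{s}{1 - r} \quad \text{for } |r| < 1.$$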
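Both the finite formula and the limit can be checked for the infinite example sequence given above (s = 2, r = −1/2). The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def geometric_partial_sum(s, r, N):
    """Add the N terms s, s r, ..., s r^(N-1) one by one."""
    return sum(s * r ** k for k in range(N))

def geometric_closed_form(s, r, N):
    """Closed form s (1 - r^N) / (1 - r), valid for r != 1."""
    return s * (1 - r ** N) / (1 - r)

s, r = 2, Fraction(-1, 2)
limit = s / (1 - r)   # s / (1 - r), here 4/3

for N in (1, 2, 5, 10, 20):
    print(N, geometric_partial_sum(s, r, N), float(limit - geometric_partial_sum(s, r, N)))
```

The printed differences shrink in magnitude by a factor |r| = 1/2 per additional term and alternate in sign.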
11.4
Numerical series: convergence criteria for positive series
Theorem 9 (necessary criterion for convergence):
If $\sum_{k=1}^{\infty} a_k$ converges, then $\lim_{k \to \infty} a_k = 0$.
Comments:
(1) However, the converse is not true: $\lim_{k \to \infty} a_k = 0$ is not sufficient to ensure convergence. For a specific kind of series with additional hypotheses (see Theorem 12),
$\lim_{k \to \infty} a_k = 0$ implies convergence.
(2) The contrapositive of this theorem can be used as a quick test for non-convergence:
If $\lim_{k \to \infty} a_k \neq 0$ (or the limit does not exist), then $\sum_{k=1}^{\infty} a_k$ does not converge.
Theorem 10 (comparison test):
(i) If $\sum_{k=1}^{\infty} a_k$ converges (with $a_k \ge 0$ for all $k$), then the series $\sum_{k=1}^{\infty} b_k$ with $b_k \ge 0$
and $b_k \le a_k$ for all $k$ converges.
(ii) If $\sum_{k=1}^{\infty} a_k$ diverges (with $a_k \ge 0$ for all $k$), then the series $\sum_{k=1}^{\infty} b_k$ with $b_k \ge 0$
and $b_k \ge a_k$ for all $k$ diverges.
Example: Investigate the convergence of $\sum_{k=0}^{\infty} \frac{1}{k!}$ (factorial series).
We write down the partial sum of the factorial series, $A_N$, and also the partial sum of
a suitable geometric series, $C_N = 1 + \sum_{k=0}^{N-1} (s\, r^k)$ with $s = 1$ and $r = 1/2$:
$$\begin{aligned}
A_N &= \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots + \frac{1}{N!}, \\
C_N &= 1 + \frac{1}{2^0} + \frac{1}{2^1} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots + \frac{1}{2^{N-1}}.
\end{aligned}$$
Note that there are $N + 1$ terms in both partial sums and that both series are composed
of positive terms. The 1 is added to the usual geometric series to make a term-wise
comparison: each term of $A_N$ is equal to or smaller than the corresponding term of $C_N$.
We obtain the sum of the series $C_N$ as:
$$C_N = 1 + \frac{1 \left(1 - \frac{1}{2^N}\right)}{1 - \frac{1}{2}} = 1 + 2\left(1 - \frac{1}{2^N}\right) = 3 - \frac{1}{2^{N-1}}.$$
We therefore conclude that $\lim_{N \to \infty} C_N = 3$, so this series converges. By the comparison test, the
factorial series must therefore converge to a limit smaller than 3 (recall that the series is entirely
composed of positive terms).
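The term-wise comparison can be reproduced with exact fractions. The names `A` and `C` mirror the partial sums of the example; the remark that the factorial series sums to Euler's number e is a standard result that is not stated in the text above.

```python
import math
from fractions import Fraction

def A(N):
    """Partial sum 1/0! + 1/1! + ... + 1/N! of the factorial series."""
    return sum(Fraction(1, math.factorial(k)) for k in range(N + 1))

def C(N):
    """Comparison sum 1 + 1/2^0 + 1/2^1 + ... + 1/2^(N-1)."""
    return 1 + sum(Fraction(1, 2 ** k) for k in range(N))

for N in (1, 2, 5, 10, 20):
    # A_N stays below C_N, and C_N stays below 3
    print(N, float(A(N)), float(C(N)))
```

The values of A_N increase but stay below those of C_N, which in turn stay below 3.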
Example: Investigate the convergence of $\sum_{k=1}^{\infty} \frac{1}{k}$ (harmonic series).
We again try to establish a term-wise comparison of partial sums, however, now with
a divergent series. For this, we write the terms with brackets to indicate how they will
be compared ($A$ stands for the harmonic series, $C$ for a divergent one):
$$\begin{aligned}
A &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8}\right) + \left(\frac{1}{9} + \cdots + \frac{1}{16}\right) + \cdots, \\
C &= 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}\right) + \left(\frac{1}{16} + \cdots + \frac{1}{16}\right) + \cdots.
\end{aligned}$$
In this case, each term of $A$ is larger than or equal to the corresponding term of $C$. However,
we see that $C = 1 + \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} + \cdots$ and $C$ is clearly divergent, so by Theorem 10(ii) the harmonic series diverges as well. This also shows
us that for convergence it is not sufficient that the $k$-th term tends to zero
as $k \to \infty$.
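The bracketing argument implies that the partial sum of the harmonic series with 2^m terms is at least 1 + m/2. The sketch below checks this inequality exactly for small m; the function name `harmonic` and the range of m are our own choices.

```python
from fractions import Fraction

def harmonic(n):
    """Partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

for m in range(0, 11):
    n = 2 ** m
    # Each completed bracket contributes at least 1/2.
    print(n, float(harmonic(n)), 1 + m / 2)
```

The lower bound 1 + m/2 grows without limit, although only very slowly in terms of the number of summed terms.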
Theorem 11 (d’Alembert’s test or quotient test):
Suppose an infinite series $\sum_{k=1}^{\infty} a_k$ (with $a_k \ge 0$ for all $k$) satisfies $\lim_{k \to \infty} \frac{a_{k+1}}{a_k} = R$. Then,
the series is convergent if $R < 1$, divergent if $R > 1$ and the criterion does not decide
if $R = 1$.
Comment:
Since a series with non-positive terms can be translated into a series with non-negative
terms through a multiplication with −1, the above-mentioned theorems hold actually
for all series with terms with non-changing sign.
Example: Use d’Alembert’s test to determine whether $\sum_{k=0}^{\infty} \frac{2^k}{(k+1)^2}$ is convergent.
We confirm that all terms $a_k = \frac{2^k}{(k+1)^2}$ are positive and fulfil the hypothesis of
d’Alembert’s test. We calculate
$$\lim_{k \to \infty} \frac{a_{k+1}}{a_k} = \lim_{k \to \infty} \frac{2^{k+1}}{(k+2)^2} \cdot \frac{(k+1)^2}{2^k} = 2 \cdot \lim_{k \to \infty} \left(\frac{k+1}{k+2}\right)^2 = 2.$$
Since $2 > 1$, the infinite series diverges.
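The quotient used in this example can be evaluated for increasing k. This is an illustrative sketch only; the names `a` and `ratio` and the sampled indices are assumptions.

```python
from fractions import Fraction

def a(k):
    """General term a_k = 2^k / (k + 1)^2 as an exact fraction."""
    return Fraction(2 ** k, (k + 1) ** 2)

def ratio(k):
    """Quotient a_{k+1} / a_k of consecutive terms."""
    return a(k + 1) / a(k)

for k in (0, 1, 10, 100, 1000):
    print(k, float(ratio(k)))
```

The quotients increase towards 2, in agreement with the limit calculated above.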
11.5
Numerical series: convergence criteria for general series
Definition 6 (absolute convergence):
An infinite series $\sum_{k=1}^{\infty} a_k$ is absolutely convergent if $\sum_{k=1}^{\infty} |a_k|$ is convergent.
Comments:
(1) An absolutely convergent series is also convergent in the usual sense.
(2) While checking for absolute convergence, the terms of the series are non-negative
and the tests mentioned above (comparison test, d’Alembert) can be used.
(3) There are convergent series that are not absolutely convergent. These are conditionally convergent series. Examples can be found among alternating series.
Definition 7 (alternating series):
Series of the kind $\sum_{k=1}^{\infty} (-1)^{k+1} a_k$ or $\sum_{k=1}^{\infty} (-1)^k a_k$ with $a_k \ge 0$ for all $k$ are called
alternating series.
Comment: Since $\sum_{k=1}^{\infty} (-1)^{k+1} a_k = -\sum_{k=1}^{\infty} (-1)^k a_k$, these two forms are actually
equivalent (compare with the comment made after Theorem 11).
Theorem 12 (Leibniz test):
If $(a_k)$ is monotonically decreasing with $\lim_{k \to \infty} a_k = 0$, then the alternating series
$\sum_{k=1}^{\infty} (-1)^{k+1} a_k$ converges.
Comment: We have just provided an additional hypothesis under which $\lim_{k \to \infty} a_k = 0$
becomes part of a criterion for convergence. As Theorem 9 stated, $\lim_{k \to \infty} a_k = 0$ alone
is not sufficient.
Example: Determine whether the series $\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}$ converges.
We have seen that the harmonic series $\sum_{k=1}^{\infty} \frac{1}{k}$ diverges. However, the sequence $\frac{1}{k}$ is monotonically decreasing with $\lim_{k \to \infty} \frac{1}{k} = 0$ and hence the alternating series $\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}$
converges conditionally.
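The slow convergence of the alternating harmonic series can be observed numerically. The sketch below compares partial sums with ln 2, which is the known value of this series (a standard result that is not derived in this text); the function name is an assumption.

```python
import math

def alternating_harmonic_partial_sum(N):
    """Partial sum 1 - 1/2 + 1/3 - ... up to the term with index N."""
    return sum((-1) ** (k + 1) / k for k in range(1, N + 1))

for N in (10, 100, 1000, 1001):
    s = alternating_harmonic_partial_sum(N)
    print(N, s, math.log(2) - s)
```

Partial sums with an even number of terms lie below the limit and those with an odd number of terms lie above it, and the distance to the limit is bounded by the size of the first omitted term.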
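Example: Determine whether the series $\frac{1}{\sqrt{2} - 1} - \frac{1}{\sqrt{2} + 1} + \frac{1}{\sqrt{3} - 1} - \frac{1}{\sqrt{3} + 1} + \cdots$ converges.
The signs are alternating and the terms tend to zero. However, the terms are not
monotonically decreasing ($\frac{1}{\sqrt{2} - 1} > \frac{1}{\sqrt{2} + 1}$ but $\frac{1}{\sqrt{2} + 1} < \frac{1}{\sqrt{3} - 1}$) and the criterion
cannot be applied. Actually, the series is divergent: since $\frac{1}{\sqrt{n} - 1} - \frac{1}{\sqrt{n} + 1} = \frac{2}{n - 1}$, the partial sum $S_{2k} = \frac{2}{1} + \frac{2}{2} + \frac{2}{3} + \cdots + \frac{2}{k}$
is twice a partial sum of the divergent harmonic series.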