MATH 321: Real Variables II Notes 2015W2 Term

Taught by Dr. Kalle Karu, taken by Adrian She
Please report typos or errors to Adrian at adrian.she@alumni.ubc.ca
Contents
I Riemann-Stieltjes Integration
1 The Riemann Integral
1.1 Darboux's Definition of the Riemann Integral
1.2 Introduction to Integrability
2 The Riemann-Stieltjes Integral
3 Integrability
3.1 Upper and Lower Integrals
3.2 Integrability of Continuous Functions
3.2.1 Review of Uniform Continuity
3.2.2 Proof of Theorem
3.3 Riemann Sums
3.4 Discontinuous Functions
4 Properties of the Integral
4.1 Change of Variables
4.2 The Fundamental Theorem of Calculus
5 Functions of Bounded Variations
5.1 The Riesz Representation Theorem
5.2 The Length of a Curve
5.3 Functional Analysis Revisited
II Sequences and Series of Functions
6 Sequences and Series of Functions: Definitions and Issues
7 Uniform Convergence
7.1 Uniform Convergence of Sequences
7.2 Uniform Convergence of Series
7.3 Interpretation of Uniform Convergence
8 Properties of Uniform Convergence
8.1 Uniform Convergence and Continuity
8.1.1 The Main Result
8.1.2 Dini's Theorem
8.1.3 Strange Functions
8.2 Uniform Convergence and Integration
8.2.1 Application to Function Spaces
8.3 Uniform Convergence and Differentiation
8.4 Some Counterexamples
9 The Arzela-Ascoli Theorem
9.1 Types of Continuity
9.2 Pointwise Boundedness
9.3 Proof of Arzela-Ascoli
9.4 Converse to Arzela-Ascoli Theorem
9.5 Application: Peano's Theorem
9.5.1 Proof of Peano's Theorem
10 Weierstrass' Theorem
10.1 Motivation for the Proof - Averaging Operators
10.2 Proof of Weierstrass' Theorem
10.3 Stone's Generalization of Weierstrass' Theorem
10.4 Proof of Stone's Theorem - The Lattice Version
10.5 Proofs of Stone-Weierstrass Theorem: Algebra Version
10.5.1 The Real Case
10.5.2 The Complex Case
III Power Series and Fourier Series
11 Power Series
11.1 Power Series Properties
11.2 Behaviour at Endpoints
11.3 Rearrangement of Sums
11.4 Application to Taylor Series
11.5 Zeros of Analytic Functions
12 Fourier Series as Orthogonal Series
12.1 The Hermitian Inner Product
12.2 Orthogonal Bases of Functions
12.3 Examples of Orthogonal Systems
12.4 Bessel's Inequality
12.4.1 The Finite Dimensional Case
12.4.2 Orthogonal Series Case
12.5 Riesz-Fischer Theorem
13 Convergence of Fourier Series
13.1 $L^2$ convergence of Fourier Series
13.2 Pointwise Convergence of Fourier Series
List of Figures
1 Illustration of a partition, Riemann sum, and tag
2 Illustration of upper and lower Darboux sums
3 Quantity we want to compute
4 The graph of $f$, and its transformation under $(x, y) \mapsto (\alpha(x), y)$. The area under the left graph represents $\int_a^b f\,dx$ and the area under the right graph represents $\int_a^b f\,d\alpha$
5 Visualization of $\int_0^2 f\,d\alpha$
6 Illustration of Lemma for $L(P, f, \alpha)$. Refining the partition increases $L(P, f, \alpha)$
7 Division of the Interval into Three Parts
8 The Cantor Set can be covered with finitely many intervals of arbitrarily small length.
9 Illustration of the integration by parts formula and symmetry between $\int f\,d\alpha$ and $\int \alpha\,df$
10 The shaded region is $U(P, f, \alpha) - L(P, f, \alpha)$
11 Illustration of $\beta(x)$
12 Example of $f(x)$ and the corresponding $F(x)$
13 A function not of bounded variation
14 Illustration of Riesz Representation Theorem
15 Illustration of the proof for a plane curve
16 Illustration of the Sequence of Functions
17 $f_n$ are a sequence of functions which form a "travelling wave"
18 Illustration of the Sequence of Functions
19 Illustration of uniform convergence
20 $f_n$ does not lie within an $\epsilon$ neighbourhood of the limit
21 Schematic of the Proof
22 Illustration of Proof of Claim. Given $\epsilon$, there are $n, \delta$ such that $|f_n(x)| < \epsilon$ in a $\delta$ neighbourhood of $x$.
23 First few iterations of the Takagi Function
24 First few iterations of construction
25 Alternate Construction of Cantor Staircase
26 Illustration of the $L^\infty$ and $L^1$ distances between functions. Particularly, the $L^\infty$ distance is the maximum pointwise distance between the two functions and the $L^1$ distance is the area between the two curves.
27 Illustration between Modes of Convergence
28 Another solution of the differential equation is constructed by shifting the point where the function is first non-zero
29 Other solutions of the differential equation are constructed in this case, again by shifting
30 Euler's method produces a series of piecewise linear approximations to the solution of a differential equation
31 Two cases for Euler's Method when solving $x' = \sqrt{|x|}$
32 Application of the averaging operator to a step function yields a piecewise linear function, then a piecewise quadratic function
33 Definition of $g(t)$
34 A sequence of smooth $g$ which approach the delta function
35 Recall that such $g_n$ are bump functions, which approach the delta function
36 First few terms of $\frac{1}{2} + \sum_{k=0}^{\infty} \frac{2}{(2k+1)\pi} \sin((2k+1)\pi)$, a Fourier series for a step function, overlaid with the original function. The original function is plotted in green; the Fourier series is plotted in blue.
37 Example of the Gibbs Phenomenon for a Square Wave. The Gibbs phenomenon is displayed at the point of discontinuity, where the overshoot lies approximately on the line $y = 1.09$
38 Plot of the Dirichlet Kernel $D_N(x)$ for some $N$
Part I
Riemann-Stieltjes Integration
1 The Riemann Integral
Our first problem in this course is to rigorously define the integral. How do we define $\int_a^b f(x)\,dx$? This problem was first explored by Riemann in his thesis.
From previous calculus courses, we define the integral as the limit of a Riemann sum. That is:
\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(t_i)\,\Delta x_i \]
where $t_i \in [x_{i-1}, x_i]$ (known as a tag of the partition) and $\Delta x_i = x_i - x_{i-1}$. As $n$ approaches infinity, the partitions should get finer and finer.
Figure 1: Illustration of a partition, Riemann sum, and tag
However, the above definition of the integral raises two problems:
1. How is ti , the tag, chosen?
2. How is the limit taken as n approaches infinity?
The definition of the Riemann integral given by Darboux solves the above two issues. Next,
we will add a generalization of the Darboux integral due to Stieltjes.
1.1 Darboux's Definition of the Riemann Integral
We first define a partition.
Definition I.1 (Partition). A partition $P$ of $[a, b]$ is a set
\[ P = \{a = x_0 < x_1 < x_2 < \dots < x_n = b\} \]
Now suppose that $f(x)$ is a bounded function on $[a, b]$. To solve the first issue, the tagging problem, we will replace $f(t_i)$ with the supremum or infimum of $f$ within each interval of the partition. Let $M_i = \sup_{x \in [x_{i-1}, x_i]} f(x)$ and $m_i = \inf_{x \in [x_{i-1}, x_i]} f(x)$.
Then define the upper and lower sums to be
\[ U(P, f) = \sum_{i=1}^{n} M_i\,\Delta x_i \]
and
\[ L(P, f) = \sum_{i=1}^{n} m_i\,\Delta x_i \]
Figure 2: Illustration of upper and lower Darboux sums
Supposing that $\int_a^b f(x)\,dx$ exists, we conjecture that
\[ L(P, f) \le \int_a^b f(x)\,dx \le U(P, f) \]
should hold. Accordingly, we define the upper and lower integrals respectively as
\[ \overline{\int_a^b} f(x)\,dx = \inf_P \{U(P, f)\} \]
and
\[ \underline{\int_a^b} f(x)\,dx = \sup_P \{L(P, f)\} \]
where the infimum and supremum are taken over all possible partitions $P$ of $[a, b]$. That is, $P$ ranges over the set
\[ \bigcup_{n=1}^{\infty} \{\text{partitions with } n \text{ parts}\}. \]
In the case where partitions have three parts ($a = x_0 < x_1 < x_2 < x_3 = b$), the set of all partitions is an upper-triangular region in the $(x_1, x_2)$-plane bounded by $x_1 = a$, $x_2 = b$ and $x_1 = x_2$, where these lines are not included in the region. We can also think of taking sup or inf over all possible partitions as making partitions finer and finer.
If $\underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx$, then $\int_a^b f(x)\,dx$ is equal to either quantity and we say that $f$ is Riemann-integrable. We write $f \in R$.
This solves the second issue, since in defining the integral this way we take a supremum or infimum over a set instead of a limit.
The process just described is similar to finding the area of a plane region R. We can superimpose a square grid on the plane. Then we define the outer sum as counting every square which
meets R and the inner sum as counting every square which lies within R. As the grid is made
finer and finer, the outer and inner sums approximate more and more the area of R, and in the
limit, everything should be equal.
1.2 Introduction to Integrability
The next thing we want to do is to determine which functions are integrable. We begin with the following example.
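Example I.1 (A non-integrable function). Suppose $f(x)$ is defined on $[0, 1]$ as follows:
\[ f(x) = \begin{cases} 1 & x \in \mathbb{Q} \\ 0 & \text{otherwise} \end{cases} \]
Fix some partition $P$. Then $M_i = 1$ and $m_i = 0$ on each interval of the partition.
Thus, $U(P, f) = \sum_{i=1}^{n} M_i\,\Delta x_i = 1$ for every $P$, which implies that $\overline{\int_0^1} f(x)\,dx = 1$.
Similarly, $L(P, f) = \sum_{i=1}^{n} m_i\,\Delta x_i = 0$ for every $P$, which implies that $\underline{\int_0^1} f(x)\,dx = 0$.
Since $\underline{\int_0^1} f\,dx \ne \overline{\int_0^1} f\,dx$, we have $f \notin R$.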
We will prove that f ∈ R if:
1. f (x) is continuous.
2. f (x) is continuous except at a finite number of points.
The above are sufficient conditions for Riemann integrability. Lebesgue formulated a necessary and sufficient condition for Riemann integrability: a bounded function $f$ is integrable iff $f$ is continuous except on a set of measure zero. Informally, measure denotes the "length" of a set. If we can cover a set with intervals whose total length can be made arbitrarily small, then we say that the set is of measure zero. This is covered in more detail in subsequent analysis courses.
Example I.2 (Computation). Using the definition of the Riemann integral, we would like to compute $\int_a^b x\,dx$. We would expect that this is equal to $I = \frac{b^2}{2} - \frac{a^2}{2}$.
To apply the definition of the Riemann integral, we need to prove that the upper and lower integrals are both equal to $I$. To prove that $I = \sup_P L(P, f)$, we must show
a $L(P, f) \le I$ for every partition $P$
b For every $\epsilon > 0$, there exists a partition $P$ such that $|I - L(P, f)| < \epsilon$
Figure 3: Quantity we want to compute
Proof. a Informally, the rectangles making up $L(P, f)$ lie within the trapezoid of area $I$, and accordingly $I$, representing the area of the trapezoid, will be an upper bound of $L(P, f)$.
b Let $P_n$ be the regular partition with $n$ subintervals. That is, a partition where the $\Delta x_i$ are all equal ($\Delta x_i = \frac{b-a}{n}$). Then $I - L(P_n, f)$ will be the total area of the $n$ triangles below the line $y = x$ and above the rectangles of $L(P_n, f)$. Thus
\[ I - L(P_n, f) = \frac{1}{2}(\Delta x)^2\, n = \frac{1}{2}\left(\frac{b-a}{n}\right)^2 n = \frac{(b-a)^2}{2n}. \]
By choosing sufficiently large $n$, we can make $\frac{(b-a)^2}{2n} < \epsilon$.
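The computation in part b can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `lower_sum`, `gap` and `predicted_gap` are hypothetical, and it assumes $f(x) = x$ so that the infimum on each subinterval is attained at the left endpoint.

```python
def lower_sum(a, b, n):
    """Lower Darboux sum L(P_n, f) for f(x) = x on the regular partition of [a, b] into n parts."""
    dx = (b - a) / n
    # f(x) = x is increasing, so its infimum on [x_{i-1}, x_i] is the left endpoint x_{i-1}.
    return sum((a + (i - 1) * dx) * dx for i in range(1, n + 1))


def gap(a, b, n):
    """I - L(P_n, f), where I = b^2/2 - a^2/2 is the expected value of the integral."""
    return (b**2 / 2 - a**2 / 2) - lower_sum(a, b, n)


def predicted_gap(a, b, n):
    """The closed form (b - a)^2 / (2n) derived in the proof."""
    return (b - a) ** 2 / (2 * n)


for n in (10, 100, 1000):
    print(n, gap(1.0, 3.0, n), predicted_gap(1.0, 3.0, n))
```

The two printed columns should agree up to floating-point rounding, and both shrink like $1/n$.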
Remark. Since we know the "answer" in advance here, we can apply geometric arguments. For a more formal argument, we may need to display a sum or resort to another criterion to prove integrability. For instance, we may write $I$ as $\sum_{i=1}^{n} \frac{x_i + x_{i-1}}{2}\,\Delta x_i$, the sum of the areas of the trapezoids over the intervals of the partition. Since $\frac{x_i + x_{i-1}}{2}(x_i - x_{i-1}) = \frac{x_i^2 - x_{i-1}^2}{2}$, this sum telescopes, which proves that $I = \frac{b^2}{2} - \frac{a^2}{2}$.
We have yet to prove properties of the integral, but before then we will need to define the Riemann-Stieltjes integral.
2 The Riemann-Stieltjes Integral
In computing $L(P, f)$ and $U(P, f)$, we take the heights $m_i$, $M_i$ and multiply them by the width $\Delta x_i$ of each rectangle in the partition. We will change the definition of the length $\Delta x_i = l([x_{i-1}, x_i])$ and use the new length to compute areas.
We do this by fixing $\alpha : [a, b] \to \mathbb{R}$ which is monotonically increasing. Let
\[ l([s, t]) = \alpha(t) - \alpha(s) \]
Then we can define the integral as before, replacing $\Delta x_i$ with $\Delta\alpha_i = \alpha(x_i) - \alpha(x_{i-1})$, which is a new measure of the length of the interval $[x_{i-1}, x_i]$.
We can now define the Riemann-Stieltjes integral.
Definition I.2 (Riemann-Stieltjes Integral). Suppose $f$ is bounded on $[a, b]$ and $\alpha(x)$ is a monotonically increasing function on $[a, b]$. Fixing a partition $P$, define
\[ L(P, f, \alpha) = \sum_{i=1}^{n} m_i\,\Delta\alpha_i \]
taking $\underline{\int_a^b} f\,d\alpha = \sup_P L(P, f, \alpha)$, and
\[ U(P, f, \alpha) = \sum_{i=1}^{n} M_i\,\Delta\alpha_i \]
taking $\overline{\int_a^b} f\,d\alpha = \inf_P U(P, f, \alpha)$.
If these two are equal, we call the common value $\int_a^b f\,d\alpha$ and say $f \in R(\alpha)$.
Remark.
1. Note that if $\alpha(x) = x$, then $\Delta\alpha_i = \Delta x_i$ and the Riemann-Stieltjes integral is the Riemann integral.
2. If $\alpha$ is continuous, there is little new to consider, since the integral can be compared to the Riemann integral. The case where $\alpha$ is discontinuous is the interesting one.
Example I.3 (A discontinuous $\alpha$). Let
\[ \alpha(x) = \begin{cases} 0 & x < 1 \\ 1 & x \ge 1 \end{cases} \]
(this is a step function), $[a, b] = [0, 2]$, and $f$ be any continuous function. We would like to compute $\int_0^2 f\,d\alpha$.
In the interval $[x_{j-1}, x_j]$ where $x_j \ge 1$ and $x_{j-1} < 1$, $\Delta\alpha_j = \alpha(x_j) - \alpha(x_{j-1}) = 1$. In all other intervals of the partition, $\alpha$ is constant and accordingly $\Delta\alpha_i = 0$.
Thus, $L(P, f, \alpha) = m_j$ and $U(P, f, \alpha) = M_j$, where the inf and sup are taken on the interval $[x_{j-1}, x_j]$. As the interval shrinks, we conclude that $\sup_P m_j = f(1)$ and $\inf_P M_j = f(1)$ since $f$ is a continuous function. It follows that $\int_0^2 f\,d\alpha = f(1)$.
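The squeeze in Example I.3 can be observed numerically. This is a small sketch under stated assumptions, not part of the original notes: it takes $f = \exp$, which is increasing, so that the infimum and supremum on each subinterval sit at the endpoints, and it uses regular partitions; the helper name `stieltjes_sums` is hypothetical.

```python
import math


def alpha(x):
    # Step function from Example I.3: jumps from 0 to 1 at x = 1.
    return 0.0 if x < 1 else 1.0


def stieltjes_sums(f, alpha, a, b, n):
    """Return (L, U) for an increasing f on the regular partition of [a, b] into n parts."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lower = upper = 0.0
    for left, right in zip(xs, xs[1:]):
        d_alpha = alpha(right) - alpha(left)
        # f is increasing, so m_i = f(left) and M_i = f(right).
        lower += f(left) * d_alpha
        upper += f(right) * d_alpha
    return lower, upper


for n in (11, 101, 1001):
    lower, upper = stieltjes_sums(math.exp, alpha, 0.0, 2.0, n)
    print(n, lower, upper)
```

Only the subinterval straddling $x = 1$ contributes, and both sums close in on $f(1) = e$ as $n$ grows.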
Remark.
1. The Dirac delta function, $\delta_1(x)$, is one which is infinite at $x = 1$ and 0 everywhere else. It has the property that $\int_{-\infty}^{\infty} \delta_1(x)\,dx = 1$ and $\int_0^2 f(x)\delta_1(x)\,dx = f(1)$. One of the motivations for introducing the Riemann-Stieltjes integral is to study these objects. Later, we will prove that if $\alpha$ is differentiable, then $\int_a^b f\,d\alpha = \int_a^b f\alpha'\,dx$. Thus, we can interpret $\delta_1(x)$ as being the derivative of the discontinuous step function defined in the previous example, using this interpretation of the Riemann-Stieltjes integral.
2. We claim that $\int_0^2 \alpha\,d\alpha$ does not exist, which we need to check.
We interpreted $\int_a^b f\,dx$ as the area under the graph of $f(x)$. We can assign a similar interpretation to $\int_a^b f\,d\alpha$.
Fix some $f$. In the case where $\alpha$ is continuous, consider the map $(x, y) \mapsto (\alpha(x), y)$. Fixing a partition $P$, the heights $m_i$ on each interval of the partition for $\int_a^b f(x)\,dx$ are preserved for $\int_a^b f\,d\alpha$. However, each $\Delta x_i$ is changed to $\Delta\alpha_i = \alpha(x_i) - \alpha(x_{i-1})$ under the transformation. Since each $m_i\,\Delta\alpha_i$ is the area of a rectangle, the integral $\int_a^b f\,d\alpha$ then represents the area under the graph of $f(x)$ after the transformation $(x, y) \mapsto (\alpha(x), y)$.
Figure 4: The graph of $f$, and its transformation under $(x, y) \mapsto (\alpha(x), y)$, which sends each width $\Delta x_i = x_i - x_{i-1}$ to $\Delta\alpha_i = \alpha(x_i) - \alpha(x_{i-1})$. The area under the left graph represents $\int_a^b f\,dx$ and the area under the right graph represents $\int_a^b f\,d\alpha$
The more interesting case is the one where $\alpha$ is discontinuous. Consider the step function:
\[ \alpha(x) = \begin{cases} 0 & x < 1 \\ 1 & x \ge 1 \end{cases} \]
We determined last day that for continuous $f$, $\int f\,d\alpha = f(1)$. Pictorially, we can illustrate this as:
Figure 5: Visualization of $\int_0^2 f\,d\alpha$
The value $f(1)$ is spread out across the interval $[0, 1]$, as this is what the transformed graph would look like in the limit if we take $\alpha$ as the limit of smooth functions which approximate it.
We also claimed last day that $\int_0^2 \alpha\,d\alpha$ does not exist, which we will now prove.
Example I.4 (A Non-Integrable Function). Consider $\int_0^2 \alpha\,d\alpha$. Note that we only need to consider the interval $[x_{j-1}, x_j]$ with $x_{j-1} < 1 \le x_j$ to compute the upper and lower integrals. This is because
\[ \Delta\alpha_i = \begin{cases} 1 & i = j \\ 0 & \text{otherwise} \end{cases} \]
On this interval, $U(P, \alpha, \alpha) = M_j = 1$ and $L(P, \alpha, \alpha) = m_j = 0$ for every partition $P$. Thus, $\underline{\int_0^2} \alpha\,d\alpha = 0$ and $\overline{\int_0^2} \alpha\,d\alpha = 1$. Thus, $\int_0^2 \alpha\,d\alpha$ does not exist.
We remark that Problem 2 on the problem set contains a function similar to $\alpha$ which is integrable on $[0, 2]$.
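The failure of integrability for $\int_0^2 \alpha\,d\alpha$ can also be seen numerically: the gap $U - L$ stays at 1 no matter how fine the partition. The following sketch is illustrative only and restricts attention to regular partitions; the name `upper_minus_lower` is hypothetical.

```python
def alpha(x):
    # Step function: 0 for x < 1, 1 for x >= 1.
    return 0.0 if x < 1 else 1.0


def upper_minus_lower(n):
    """U(P, alpha, alpha) - L(P, alpha, alpha) on the regular partition of [0, 2] into n parts."""
    xs = [2.0 * i / n for i in range(n + 1)]
    # alpha is non-decreasing, so on [x_{i-1}, x_i] its inf is alpha(x_{i-1}) and its sup is
    # alpha(x_i); hence M_i - m_i and Delta alpha_i are the same number.
    return sum((alpha(right) - alpha(left)) ** 2 for left, right in zip(xs, xs[1:]))


gaps = [upper_minus_lower(n) for n in (3, 10, 100, 1000)]
print(gaps)
```

Every entry equals 1, in contrast to the previous example where the gap shrinks for continuous $f$.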
3 Integrability
3.1 Upper and Lower Integrals
We now begin to investigate which functions are integrable. Before then, we need to establish some properties of the integral. For instance:
Question I.1. Is $\underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha$?
We begin by comparing upper and lower sums. For a fixed partition $P$, we know that
\[ \sum_i m_i\,\Delta\alpha_i = L(P, f, \alpha) \le U(P, f, \alpha) = \sum_i M_i\,\Delta\alpha_i. \]
This is because $m_i \le M_i$ by definition (they are lower and upper bounds respectively), and $\Delta\alpha_i \ge 0$ since $\alpha$ is an increasing function.
Question I.2. If we have two partitions P1 , P2 , is it true that L(P1 , f, α) ≤ U (P2 , f, α)?
Definition I.3. A partition $P^*$ is a refinement of $P$ if $\{x_0, x_1, \dots, x_n\} = P \subset P^* = \{y_0, y_1, \dots, y_m\}$.
Lemma I.1. If $P^*$ is a refinement of $P$, then
1 $L(P, f, \alpha) \le L(P^*, f, \alpha)$ and
2 $U(P, f, \alpha) \ge U(P^*, f, \alpha)$
Figure 6: Illustration of Lemma for L(P, f, α). Refining the partition increases L(P, f, α)
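Proof. It suffices to prove this for a partition $P^* = P \cup \{y\}$ where $y \in [x_{i-1}, x_i]$, since any refinement is obtained by adding finitely many points one at a time. Let $m_i = \inf_{x \in [x_{i-1}, x_i]} f(x)$. Then
\[ L(P, f, \alpha) = \dots + m_i\,\Delta\alpha_i + \dots \]
and
\[ L(P^*, f, \alpha) = \dots + \underbrace{m_1^*\,\Delta\alpha_1^* + m_2^*\,\Delta\alpha_2^*}_{s} + \dots \]
where $m_1^* = \inf_{x \in [x_{i-1}, y]} f(x)$, $m_2^* = \inf_{x \in [y, x_i]} f(x)$, $\Delta\alpha_1^* = \alpha(y) - \alpha(x_{i-1})$ and $\Delta\alpha_2^* = \alpha(x_i) - \alpha(y)$. The parts in $\dots$ are the same between $L(P, f, \alpha)$ and $L(P^*, f, \alpha)$.
Noting that $\Delta\alpha_1^* + \Delta\alpha_2^* = \Delta\alpha_i$, $m_1^* \ge m_i$ and $m_2^* \ge m_i$ allows us to conclude that $s \ge m_i(\Delta\alpha_1^* + \Delta\alpha_2^*) = m_i\,\Delta\alpha_i$.
It follows that $L(P^*, f, \alpha) \ge L(P, f, \alpha)$. The other inequality follows similarly.
Note that the fact that $\alpha$ is increasing is crucial, since we need $\Delta\alpha_1^*$ and $\Delta\alpha_2^*$ to be nonnegative for the argument to work.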
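The effect of refinement can be illustrated on a concrete case. This is a hedged sketch rather than anything from the notes: it assumes $f(x) = x^2$ and $\alpha(x) = x$ on $[0, 1]$, where $f$ is non-decreasing so the infimum and supremum on each subinterval are the endpoint values, and the helper name `darboux_sums` is hypothetical.

```python
def darboux_sums(f, alpha, points):
    """Return (L, U) for a non-decreasing f on the partition given by sorted points."""
    lower = upper = 0.0
    for left, right in zip(points, points[1:]):
        d_alpha = alpha(right) - alpha(left)
        lower += f(left) * d_alpha   # m_i = f(x_{i-1}) for non-decreasing f
        upper += f(right) * d_alpha  # M_i = f(x_i) for non-decreasing f
    return lower, upper


f = lambda x: x * x
alpha = lambda x: x

P = [0.0, 0.5, 1.0]
P_star = sorted(P + [0.25])  # refinement P* = P with the single point y = 0.25 added

L_P, U_P = darboux_sums(f, alpha, P)
L_star, U_star = darboux_sums(f, alpha, P_star)
print(L_P, L_star, U_star, U_P)
```

Adding the point $y = 0.25$ raises the lower sum and lowers the upper sum, as the lemma predicts.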
Lemma I.2. Any partitions P1 , P2 have a common refinement P ∗ .
Proof. Take P ∗ = P1 ∪ P2 .
This is the minimal common refinement, but we can also add points to $P_1 \cup P_2$ to create other common refinements.
We now return to the original question we wanted to discuss.
Lemma I.3. Suppose P1 , P2 are partitions. Then L(P1 , f, α) ≤ U (P2 , f, α).
Proof. Let $P^*$ be the common refinement of $P_1, P_2$. Then
\[ L(P_1, f, \alpha) \underbrace{\le}_{\text{by Lemma I.1}} L(P^*, f, \alpha) \underbrace{\le}_{P^* \text{ is the same here}} U(P^*, f, \alpha) \underbrace{\le}_{\text{by Lemma I.1}} U(P_2, f, \alpha) \]
It follows that $L(P_1, f, \alpha) \le U(P_2, f, \alpha)$.
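Theorem I.1. $\underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha$
Proof. This comes from unwinding the definitions of the upper and lower integrals. Fix a partition $P_1$. Then $L(P_1, f, \alpha) \le U(P, f, \alpha)$ for all partitions $P$ by Lemma I.3. Since $L(P_1, f, \alpha)$ is a lower bound for $\{U(P, f, \alpha)\}$, we have $L(P_1, f, \alpha) \le \overline{\int_a^b} f\,d\alpha$, since $\overline{\int_a^b} f\,d\alpha$ is the greatest lower bound for $\{U(P, f, \alpha)\}$.
Since $P_1$ was arbitrary, $\overline{\int_a^b} f\,d\alpha$ is an upper bound for $\{L(P, f, \alpha)\}$, and hence $\underline{\int_a^b} f\,d\alpha$, the least upper bound for $\{L(P, f, \alpha)\}$, must satisfy $\underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha$.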
3.2 Integrability of Continuous Functions
Last day, we established that $\underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha$ holds. When do we have equality between the upper and lower integrals, that is, integrability?
We can restate the condition for integrability as follows:
\[ f \in R(\alpha) \leftrightarrow \forall \epsilon > 0\ \exists P_1, P_2 \text{ s.t. } U(P_1, f, \alpha) - L(P_2, f, \alpha) < \epsilon \quad (*) \]
That is, the difference between the upper and lower sums can be made arbitrarily small. It turns out that we only need one partition which works for both the upper and lower sums.
Theorem I.2. $f \in R(\alpha)$ if and only if for every $\epsilon > 0$, there is a partition $P$ such that $U(P, f, \alpha) - L(P, f, \alpha) < \epsilon$.
Proof. ($\leftarrow$) The condition $(*)$ is satisfied for $P_1 = P$ and $P_2 = P$ and therefore $f \in R(\alpha)$.
($\rightarrow$) Assume that $f \in R(\alpha)$ and let $\epsilon > 0$. Then there are two partitions $P_1, P_2$ for which the condition $(*)$ is true. Let $P$ be the common refinement of $P_1, P_2$. Then
\[ L(P_2, f, \alpha) \le L(P, f, \alpha) \le U(P, f, \alpha) \le U(P_1, f, \alpha) \]
holds. By assumption, $U(P_1, f, \alpha) - L(P_2, f, \alpha) < \epsilon$. Accordingly, $U(P, f, \alpha) - L(P, f, \alpha) < \epsilon$.
Remark. We can rewrite the difference as $U(P, f, \alpha) - L(P, f, \alpha) = \sum_{i=1}^{n} (M_i - m_i)\,\Delta\alpha_i$. Then informally, a function is integrable if the area between the upper and lower sums can be made arbitrarily small.
We will apply the above condition in proving the following:
Theorem I.3. If f is continuous on [a, b], then f lies in R(α). That is, it is integrable with
respect to any α.
3.2.1 Review of Uniform Continuity
Before completing the proof, we will recall the notion of uniform continuity. A function $f$ is continuous at all points $x \in [a, b]$ if
\[ \forall x\ \forall \epsilon > 0\ \exists \delta \text{ s.t. } |x - y| < \delta \rightarrow |f(x) - f(y)| < \epsilon \]
Here $\delta = \delta(x, \epsilon)$. Then $f$ is uniformly continuous if it is continuous on $[a, b]$ and $\delta$ is not dependent on $x$. That is:
\[ \forall \epsilon > 0\ \exists \delta \text{ s.t. } \forall x, y:\ |x - y| < \delta \rightarrow |f(x) - f(y)| < \epsilon \]
Note that if $f$ is continuous on $[a, b]$, then $f$ is uniformly continuous, as the notions of uniform continuity and continuity are equivalent on a compact set.
By contrast, $f(x) = x^2$ on $[0, \infty)$ is continuous but not uniformly continuous. This is because the graph gets steeper as $x$ increases. Accordingly, for $|f(x) - f(y)| < \epsilon$ to hold for fixed $\epsilon$ when $|x - y| < \delta$, $\delta$ must be decreased as $x$ increases. This is consistent with the fact that $[0, \infty)$ is not a compact set.
Note that existence of the derivative is not required for a function to be uniformly continuous.
For example, if we regard part of a circle on [0, 1] as a function, the function is uniformly
continuous on that interval because [0, 1] is compact, although the derivative will be infinite at
x = 1.
3.2.2 Proof of Theorem
Proof. We need to show that for every $\epsilon > 0$, there exists a partition $P$ such that $U(P, f, \alpha) - L(P, f, \alpha) < \epsilon$ holds for any continuous $f$.
Since $f$ is continuous on a compact set, $f$ is uniformly continuous. Then for every $\eta > 0$, there exists $\delta > 0$ for which $|f(x) - f(y)| < \eta$ if $|x - y| \le \delta$. We will specify $\eta$ later.
Next, we define the mesh of a partition $P$ as $\|P\| = \max_{i \in \{1, \dots, n\}} (x_i - x_{i-1})$. Choose $P$ whose mesh is less than $\delta$. It follows that in each interval of the partition, $M_i - m_i \le \eta$ holds by (uniform) continuity of $f$.
Accordingly:
\[ U(P, f, \alpha) - L(P, f, \alpha) = \sum_{i=1}^{n} (M_i - m_i)\,\Delta\alpha_i \le \eta \sum_{i=1}^{n} \Delta\alpha_i = \eta(\alpha(b) - \alpha(a)) < \epsilon \]
Taking any $\eta < \frac{\epsilon}{\alpha(b) - \alpha(a)}$ completes the proof.
Remark.
• In the case when $\alpha(a) = \alpha(b)$, $\int_a^b f\,d\alpha = 0$ holds since $\alpha$ is constant, and any bounded function is integrable with respect to such $\alpha$.
• We actually proved something stronger. We remarked before that $f \in R(\alpha)$ if and only if for every $\epsilon$, there is a partition $P$ such that $U - L < \epsilon$. We can express a sufficient condition in terms of the mesh of the partition: if for every $\epsilon$ there is a $\delta$ such that $\|P\| < \delta$ implies $U - L < \epsilon$, then $f$ is integrable. The proof above shows that this mesh condition holds for continuous $f$.
3.3 Riemann Sums
Recall the definition of a Riemann sum. A Riemann sum
\[ RS(P, \{t\}, f, \alpha) = \sum_{i=1}^{n} f(t_i)\,\Delta\alpha_i \]
depends not only on a partition $P$, but also on the choice of tag $t_i$ in each interval $[x_{i-1}, x_i]$ of the partition.
Theorem I.4. Let $f$ be continuous, so $f \in R(\alpha)$. Choose a sequence of partitions $P_k$ such that $\|P_k\| \to 0$ as $k \to \infty$. Then:
\[ \lim_{k \to \infty} RS(P_k, \{t_i\}, f, \alpha) = \int_a^b f\,d\alpha \]
for any choice of tags $\{t_i\}$ in each $P_k$.
Remark. Continuity is really needed here. This is not necessarily true for non-continuous
functions.
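Proof. By definition $m_i \le f(t_i) \le M_i$ holds on every interval of a partition. It follows that
\[ L(P_k, f, \alpha) \le RS(P_k, \{t_i\}, f, \alpha) \le U(P_k, f, \alpha) \]
holds for any partition $P_k$ and tagging. In the limit:
\[ \lim_{k \to \infty} L(P_k, f, \alpha) \le \lim_{k \to \infty} RS(P_k, \{t_i\}, f, \alpha) \le \lim_{k \to \infty} U(P_k, f, \alpha) \]
But since $f$ is continuous and $\|P_k\| \to 0$, the proof of Theorem I.3 shows that $U(P_k, f, \alpha) - L(P_k, f, \alpha) \to 0$, so $\lim_{k \to \infty} L = \lim_{k \to \infty} U$. It follows that all the above limits are equal to $\int_a^b f\,d\alpha$.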
Remark. We really proved that if $f$ is continuous, then for every $\epsilon > 0$, there exists $\delta > 0$ such that
\[ \left| RS(P, \{t_i\}, f, \alpha) - \int_a^b f\,d\alpha \right| < \epsilon \]
provided $\|P\| < \delta$.
Next time we will prove more about classes of integrable functions.
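The independence from the choice of tags can be tested with randomly chosen tags. The sketch below is illustrative only and makes several assumptions not in the notes at this point: $f = \cos$, $\alpha(x) = x^2$ on $[0, 1]$, regular partitions, and the reference value $\int_0^1 \cos(x)\,2x\,dx = 2(\sin 1 + \cos 1 - 1)$, which relies on the formula $\int f\,d\alpha = \int f\alpha'\,dx$ that the notes prove only later. The name `riemann_stieltjes_sum` is hypothetical.

```python
import math
import random


def riemann_stieltjes_sum(f, alpha, a, b, n, rng):
    """RS(P_n, {t_i}, f, alpha) on the regular partition of [a, b], with a random tag per interval."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        tag = rng.uniform(left, right)  # arbitrary tag t_i in [x_{i-1}, x_i]
        total += f(tag) * (alpha(right) - alpha(left))
    return total


rng = random.Random(0)
alpha = lambda x: x * x
# Reference value, assuming the later result that the integral equals the integral of f * alpha'.
exact = 2 * (math.sin(1) + math.cos(1) - 1)

errors = [
    abs(riemann_stieltjes_sum(math.cos, alpha, 0.0, 1.0, n, rng) - exact)
    for n in (10, 100, 1000)
]
print(errors)
```

Whatever tags the random generator picks, the error is bounded by $U - L$, which here is at most the mesh, so the printed errors shrink as the mesh does.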
3.4 Discontinuous Functions
We already know that if $f$ is continuous, then $f \in R(\alpha)$ for any $\alpha$, but functions may be discontinuous and still be Riemann-Stieltjes integrable. Note that in the case of the Riemann-Stieltjes integral, there is no complete characterization of the integrable functions, unlike the Riemann integral, where a bounded function is integrable if and only if it is continuous almost everywhere.
Theorem I.5. Let f be bounded and α be non-decreasing on [a, b]. If f is continuous except at
a finite number of points y1 , y2 , ..., yn and α is continuous at each yi , then f ∈ R(α).
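Remark. Recall the example of a step function
\[ \alpha(x) = \begin{cases} 0 & x \le 1 \\ 1 & x > 1 \end{cases} \]
and our observation that $\int \alpha\,d\alpha$ did not exist. This is because $\alpha$ had a discontinuity at $x = 1$, exactly where the integrand is discontinuous.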
Proof. It suffices to prove this in the case where $n = 1$, since the argument can then be applied inductively when there is more than one discontinuity. Let $y_1 = y$ be the discontinuity of $f(x)$.
We must show that for every $\epsilon > 0$, there exists a partition $P$ such that $U(P, f, \alpha) - L(P, f, \alpha) < \epsilon$. Divide $[a, b]$ into three pieces: $[a, y - \delta]$, $[y - \delta, y + \delta]$, and $[y + \delta, b]$, where $\delta$ is a quantity we will choose later. We choose partitions $P_1, P_2, P_3$ of each piece separately, such that $U(P_i, f, \alpha) - L(P_i, f, \alpha) < \frac{\epsilon}{3}$ holds for $i = 1, 2, 3$. We will combine the partitions $P_1, P_2, P_3$ into the partition $P$ to get $U(P, f, \alpha) - L(P, f, \alpha) < \epsilon$.
Figure 7: Division of the Interval into Three Parts
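On $[a, y - \delta]$ and $[y + \delta, b]$, $f$ is continuous and hence in $R(\alpha)$. Then there exist $P_1, P_3$ such that $U(P_1, f, \alpha) - L(P_1, f, \alpha) < \frac{\epsilon}{3}$ and $U(P_3, f, \alpha) - L(P_3, f, \alpha) < \frac{\epsilon}{3}$ hold.
Then on $[y - \delta, y + \delta]$ we have a discontinuity. Let $P_2$ have one part ($P_2 = \{y - \delta, y + \delta\}$). It follows that
\[ U(P_2, f, \alpha) - L(P_2, f, \alpha) = (M_i - m_i)\,\Delta\alpha_i = (M_i - m_i)(\alpha(y + \delta) - \alpha(y - \delta)) \]
By boundedness of $f$, there exists $B > 0$ such that $|f| \le B$ and hence $M_i - m_i \le 2B$. By continuity of $\alpha$ at $y$:
\[ \forall \eta > 0\ \exists \delta > 0 \text{ s.t. } |y - x| < \delta \rightarrow |\alpha(y) - \alpha(x)| < \eta \]
Choose $\eta = \frac{\epsilon}{12B}$, and let $\delta$ be a corresponding value, shrunk if necessary so that the implication also applies at the endpoints $x = y \pm \delta$. Then $\alpha(y + \delta) - \alpha(y - \delta) < 2\eta$, so
\[ U(P_2, f, \alpha) - L(P_2, f, \alpha) \le 2B(\alpha(y + \delta) - \alpha(y - \delta)) < 2B(2\eta) = \frac{\epsilon}{3}, \]
completing the proof.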
Remark. We can adapt the above argument to prove that if, for every $\epsilon > 0$, the discontinuities of $f$ can be covered by a finite number of intervals of total $\alpha$-length $< \epsilon$, then $f$ is integrable with respect to $\alpha$.
This is the case if $f$ has discontinuities on the Cantor set (taking $\alpha(x) = x$), since the Cantor set can be covered with finitely many intervals of arbitrarily small total length.
Figure 8: The Cantor Set can be covered with finitely many intervals of arbitrarily small length. (Iteration 2: Length = 4/9. Iteration 3: Length = 8/27.)
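The lengths quoted in the figure can be reproduced by building the covering intervals directly. This is a minimal sketch of the standard middle-thirds construction using exact rational arithmetic; the function names `cantor_cover` and `total_length` are hypothetical.

```python
from fractions import Fraction


def cantor_cover(iteration):
    """Closed intervals remaining after `iteration` middle-third removals from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(iteration):
        next_intervals = []
        for lo, hi in intervals:
            third = (hi - lo) / 3
            # Keep the left and right thirds; drop the open middle third.
            next_intervals.append((lo, lo + third))
            next_intervals.append((hi - third, hi))
        intervals = next_intervals
    return intervals


def total_length(intervals):
    return sum(hi - lo for lo, hi in intervals)


for k in range(1, 6):
    cover = cantor_cover(k)
    print(k, len(cover), total_length(cover))
```

At iteration $k$ there are $2^k$ intervals of total length $(2/3)^k$, which tends to 0, so the Cantor set admits finite covers of arbitrarily small total length.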
Theorem I.6. Let f be non-decreasing and α be continuous (and non-decreasing). Then f is
integrable (f ∈ R(α))
Remark. Here, a non-decreasing $f$ can be "arbitrarily bad" if it is integrated with respect to a continuous $\alpha$. Note that for the step function $\alpha$, $\int \alpha\,d\alpha$ does not exist because there $\alpha$ is non-decreasing but not continuous.
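Proof. Choose a partition $P_n$ for which all $\Delta\alpha_i$ are equal (i.e. $\Delta\alpha_i = \frac{\alpha(b) - \alpha(a)}{n}$). This is possible since $\alpha$ is a continuous function (so the intermediate value theorem holds for $\alpha$). Since $f$ is non-decreasing, $M_i = f(x_i)$ and $m_i = f(x_{i-1})$.
Then
\[ \begin{aligned} U(P_n, f, \alpha) - L(P_n, f, \alpha) &= \sum_{i=1}^{n} (M_i - m_i)\,\Delta\alpha_i \\ &= \frac{\alpha(b) - \alpha(a)}{n} \sum_{i=1}^{n} [f(x_i) - f(x_{i-1})] \\ &= \frac{\alpha(b) - \alpha(a)}{n} (f(b) - f(a)) \quad \text{(this sum telescopes)} \\ &= \frac{[f(b) - f(a)][\alpha(b) - \alpha(a)]}{n} \end{aligned} \]
Therefore, as $n$ approaches infinity, $U(P_n, f, \alpha) - L(P_n, f, \alpha)$ approaches 0, which completes the proof.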
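The telescoping identity in this proof can be checked on an example. The sketch below is an illustration under assumptions of my own choosing: $\alpha(x) = x^2$ on $[0, 1]$, so that the equal-$\Delta\alpha$ partition is $x_i = \sqrt{i/n}$ in closed form, and a non-decreasing step function $f$ with several jumps; the names `f`, `alpha` and `gap` are hypothetical.

```python
import math


def f(x):
    # A non-decreasing step function with jumps at 1/4, 1/2, 3/4 and 1 (illustrative choice).
    return math.floor(4 * x) / 4


def alpha(x):
    # A continuous, increasing integrator on [0, 1].
    return x * x


def gap(n):
    """U(P_n, f, alpha) - L(P_n, f, alpha) on the partition with all Delta alpha_i equal to 1/n."""
    # x_i = sqrt(i/n) gives alpha(x_i) = i/n, so every Delta alpha_i equals 1/n up to rounding.
    xs = [math.sqrt(i / n) for i in range(n + 1)]
    # f is non-decreasing, so M_i = f(x_i) and m_i = f(x_{i-1}).
    return sum((f(right) - f(left)) * (alpha(right) - alpha(left)) for left, right in zip(xs, xs[1:]))


for n in (10, 100, 1000):
    print(n, gap(n), (f(1.0) - f(0.0)) * (alpha(1.0) - alpha(0.0)) / n)
```

Despite the jumps of $f$, the gap matches $[f(b) - f(a)][\alpha(b) - \alpha(a)]/n$ and tends to 0.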
Remark. In the setting of the above theorem, since $f$ is non-decreasing, we may compute both $\int_a^b f\,d\alpha$ and $\int_a^b \alpha\,df$. In particular:
\[ \int_a^b f\,d\alpha + \int_a^b \alpha\,df = f(b)\alpha(b) - f(a)\alpha(a) \]
This is the integration by parts formula. We may interpret the formula as the claim that $\int_a^b f\,d\alpha$ exists iff $\int_a^b \alpha\,df$ exists, although we need to prove this more formally.
Figure 9: Illustration of the integration by parts formula and symmetry between $\int f\,d\alpha$ and $\int \alpha\,df$
Theorem I.7 (Composition of Functions). Let f ∈ R(α) and f : [a, b] → [c, d]. Let φ be
continuous on [c, d]. Then φ(f (x)) ∈ R(α)
Remark. The theorem enlarges the class of functions which are integrable. For instance, if we know that $f \in R(\alpha)$, then we will also know that $f^2 \in R(\alpha)$ and $|f| \in R(\alpha)$.
Before completing this proof, recall that f ∈ R(α) if and only if for every > 0, there exists
a partition P for which U (P, f, α) − L(P, f, α) < . We may illustrate this as follows:
Figure 10: The shaded region is U (P, f, α) − L(P, f, α)
That is, the area between U(P, f, α) and L(P, f, α) may be made arbitrarily small. To do so, we make either the width or the height of each rectangle between U(P, f, α) and L(P, f, α) small. In the case of f continuous, every rectangle has small height, since Mi − mi < η may be satisfied on any interval whose length is small enough. Supposing f has discontinuities, we may have some boxes with large height, although those boxes may have small width to make the difference U − L small. In the bad case, for instance ∫ α dα with α discontinuous, we may have rectangles with both large height and large width in U − L for any partition P. Having understood this principle, we can now proceed with our next proof.
Theorem I.8. Suppose f : [a, b] → [c, d] is integrable with respect to α, and φ is continuous on
[c, d]. Then φ(f (x)) ∈ R(α)
Proof. Assume that f is integrable and φ is continuous. In terms of ε−δ definitions:
1. Since φ is continuous on the closed interval [c, d], it is uniformly continuous. This means:
∀ε > 0, ∃δ > 0 s.t. |y1 − y2| < δ → |φ(y1) − φ(y2)| < ε    (1)
2. Since f is integrable:
∀η > 0, ∃P s.t. U(P, f, α) − L(P, f, α) < η    (2)
We will need to prove that for every ε > 0 there is a partition P with U(P, φ(f), α) − L(P, φ(f), α) < ε, that is, their difference can be made arbitrarily small.
Let η > 0. Take the P which satisfies (2). On each [xi−1, xi] of P, consider Mi, mi, which are the supremum and infimum of f on that interval. Let Mi*, mi* be the supremum and infimum of φ(f) on that interval.
By uniform continuity of φ, if Mi − mi < δ then Mi* − mi* ≤ ε. We then divide the indices of our intervals into two sets A, B defined as follows:
A = {i | Mi − mi < δ}
B = {i | Mi − mi ≥ δ}
Consider the contribution of A, B to U − L for φ(f). In A:
\[
\sum_{i \in A} (M_i^* - m_i^*)\,\Delta\alpha_i \le \varepsilon \sum_{i \in A} \Delta\alpha_i \le \varepsilon\,(\alpha(b) - \alpha(a))
\]
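In B, by boundedness of φ:
\[
\sum_{i \in B} (M_i^* - m_i^*)\,\Delta\alpha_i \le \sum_{i \in B} 2K\,\Delta\alpha_i = 2K \sum_{i \in B} \Delta\alpha_i
\]
where K is a bound on |φ|. We claim that Σ_{i∈B} ∆αi is small since
\[
\sum_{i=1}^{n} (M_i - m_i)\,\Delta\alpha_i < \eta
\]
by integrability of f. We can then derive the following inequalities to bound Σ_{i∈B} ∆αi. Since we are taking fewer terms,
\[
\sum_{i \in B} (M_i - m_i)\,\Delta\alpha_i \le \sum_{i=1}^{n} (M_i - m_i)\,\Delta\alpha_i < \eta,
\]
and by the bound Mi − mi ≥ δ on B,
\[
\delta \sum_{i \in B} \Delta\alpha_i \le \sum_{i \in B} (M_i - m_i)\,\Delta\alpha_i < \eta.
\]
Thus,
\[
\sum_{i \in B} \Delta\alpha_i < \frac{\eta}{\delta}
\]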
We then can complete the proof. Given ε > 0, we get some δ > 0 which satisfies (1). Then choose η = ε · δ to obtain a partition P from (2). Therefore:
\[
U(P, \phi(f), \alpha) - L(P, \phi(f), \alpha) < \underbrace{\varepsilon\,(\alpha(b) - \alpha(a))}_{\text{Contribution from } A} + \underbrace{2K \cdot \frac{\eta}{\delta}}_{\text{Contribution from } B} = \varepsilon\,\underbrace{(\alpha(b) - \alpha(a) + 2K)}_{\text{Constant}}
\]
Accordingly, U − L may be as small as we please and φ(f) is integrable. The key idea to be taken from this proof is the division of the partition into A, B and making U − L small on each set separately.
4 Properties of the Integral
We now list some properties of the integral:
Theorem I.9. The following are properties of the Riemann-Stieltjes integral:
1. Assume f, g ∈ R(α). Then for all c, d ∈ R, cf + dg ∈ R(α) and
\[
\int_a^b (cf + dg)\,d\alpha = c \int_a^b f\,d\alpha + d \int_a^b g\,d\alpha
\]
In other words, the integral is a linear operator.
2. The integral is also linear in α. That is, if f ∈ R(α) and f ∈ R(β), then f ∈ R(c1 α + c2 β) for c1, c2 ≥ 0 and
\[
\int_a^b f\,d(c_1\alpha + c_2\beta) = c_1 \int_a^b f\,d\alpha + c_2 \int_a^b f\,d\beta
\]
The condition that c1, c2 ≥ 0 is needed here to ensure c1 α, c2 β are non-decreasing functions.
3. If f, g are integrable and f(x) ≤ g(x) for all x, then
\[
\int_a^b f\,d\alpha \le \int_a^b g\,d\alpha
\]
4. f ∈ R(α) on [a, b] if and only if f ∈ R(α) on [a, c] and [c, b], where a ≤ c ≤ b. Additionally:
\[
\int_a^b f\,d\alpha = \int_a^c f\,d\alpha + \int_c^b f\,d\alpha
\]
5. If f ∈ R(α) and |f(x)| ≤ M, then
\[
\left| \int_a^b f\,d\alpha \right| \le M[\alpha(b) - \alpha(a)]
\]
We will omit the proof for most of these theorems but they can be done by considering the
difference between the upper and lower sums, as follows:
Proof of Item 1. Suppose f, g ∈ R(α) and let h = f + g. Then on [xi−1, xi]:
\[
m_i = \inf_{x \in [x_{i-1}, x_i]} h(x) \ge \inf_{x \in [x_{i-1}, x_i]} f(x) + \inf_{x \in [x_{i-1}, x_i]} g(x)
\]
\[
M_i = \sup_{x \in [x_{i-1}, x_i]} h(x) \le \sup_{x \in [x_{i-1}, x_i]} f(x) + \sup_{x \in [x_{i-1}, x_i]} g(x)
\]
It follows that
L(P, f, α) + L(P, g, α) ≤ L(P, h, α) ≤ U(P, h, α) ≤ U(P, f, α) + U(P, g, α)
Since f, g are integrable, given ε > 0 we may choose a partition P (a common refinement of one partition for f and one for g) such that
(U(P, f, α) − L(P, f, α)) + (U(P, g, α) − L(P, g, α)) < 2ε
It follows that U(P, h, α) − L(P, h, α) < 2ε, so h ∈ R(α).
We can get the following corollaries from the above theorem:
Corollary I.1. Assume that f, g ∈ R(α). Then:
1. f² ∈ R(α).
2. 1/f ∈ R(α), provided f(x) ≥ ε for some ε > 0.
3. fg ∈ R(α).
4. |f| ∈ R(α), with |∫_a^b f dα| ≤ ∫_a^b |f| dα.
Proof. Apply the preceding theorem on composition of functions. In 1), choose φ(y) = y². In 2), choose φ(y) = 1/y. For 3), note that fg = ¼[(f + g)² − (f − g)²]. Finally, for 4), choose φ(y) = |y|.
To prove the assertion that |∫_a^b f dα| ≤ ∫_a^b |f| dα, it suffices to prove bounds on the upper or lower sums. For instance, |U(P, f, α)| ≤ U(P, |f|, α) can be shown from the triangle inequality. Letting Mi have its usual meaning:
\[
\left| \sum M_i\,\Delta\alpha_i \right| \le \sum |M_i|\,\Delta\alpha_i \le \sum \sup |f(x)|\,\Delta\alpha_i
\]
We now come to our main theorem for today, which reduces Riemann-Stieltjes integrals to Riemann integrals.
Theorem I.10. Suppose α′(x) exists and α′ ∈ R (i.e. α′ is Riemann-integrable). Then f ∈ R(α) if and only if fα′ ∈ R, and
\[
\int_a^b f\,d\alpha = \int_a^b f\alpha'\,dx
\]
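As a numerical illustration of the theorem, the sketch below compares a Riemann-Stieltjes sum for ∫ f dα with an ordinary Riemann sum for ∫ fα′ dx. The choices f(x) = x and α(x) = sin x on [0, π/2] are assumptions made for this example only; in this case the common value is π/2 − 1 by integration by parts.

```python
import math

def tagged_sum(f, alpha, xs):
    """Riemann-Stieltjes sum of f with respect to alpha, using midpoint tags."""
    return sum(f((left + right) / 2) * (alpha(right) - alpha(left))
               for left, right in zip(xs, xs[1:]))

a, b, n = 0.0, math.pi / 2, 2000
xs = [a + (b - a) * i / n for i in range(n + 1)]

f = lambda x: x           # integrand (assumed example)
alpha = math.sin          # differentiable, non-decreasing on [0, pi/2]
alpha_prime = math.cos    # its derivative, Riemann integrable

# Sum of f(t_i) * (alpha(x_i) - alpha(x_{i-1})).
stieltjes = tagged_sum(f, alpha, xs)
# Sum of f(t_i) * alpha'(t_i) * (x_i - x_{i-1}): a Stieltjes sum against the identity.
riemann = tagged_sum(lambda x: f(x) * alpha_prime(x), lambda x: x, xs)
exact = math.pi / 2 - 1

print(stieltjes, riemann, exact)
```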
Example I.5 (Applications of Theorem). Recall that one of the motivations for defining the Riemann-Stieltjes integral was the Dirac delta function. In this case, α is a “smooth” step function where the area under α′(x) is approximately 1, and α′(x) approaches δ(x). Similarly, suppose β = α′(x) is the following function:
Figure 11: Illustration of β(x), a bump of height 1/(2δ) on [1 − δ, 1 + δ], of area 1
Then we may interpret ∫_0^2 f dα = ∫_0^2 fβ dx as the average value of f on [1 − δ, 1 + δ]. To see why, note:
\[
\int_0^2 f\,d\alpha = \int_0^2 f\beta\,dx = \int_{1-\delta}^{1+\delta} f(x)\,\frac{1}{2\delta}\,dx = \frac{1}{2\delta} \int_{1-\delta}^{1+\delta} f(x)\,dx
\]
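The averaging interpretation can be seen numerically: as δ shrinks, the average of a continuous f over [1 − δ, 1 + δ] approaches f(1), which is how the delta function should act. The following is a small sketch with the assumed example f = cos; the helper name `average_near` is hypothetical.

```python
import math

def average_near(f, center, delta, n=1000):
    """Midpoint-rule approximation of (1 / (2*delta)) times the integral of f
    over [center - delta, center + delta]."""
    h = 2 * delta / n
    total = sum(f(center - delta + (i + 0.5) * h) for i in range(n)) * h
    return total / (2 * delta)

f = math.cos  # any continuous function would do (assumed example)
for delta in (0.5, 0.1, 0.01, 0.001):
    print(delta, average_near(f, 1.0, delta), f(1.0))
```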
To prove the above theorem, we will use Riemann sums. Recall that a Riemann sum RS(P, {ti}, f, α) is defined as Σ_{i=1}^n f(ti)∆αi, where {ti} is a tagging of a partition P (that is, ti ∈ [xi−1, xi]). Note that L ≤ RS ≤ U, where L, U denote the lower and upper sums respectively, since L uses the infimum of f on each subinterval, U uses the supremum, and mi ≤ f(ti) ≤ Mi holds. Furthermore, if Pk is a sequence of partitions where lim_{k→∞} L(Pk) = lim_{k→∞} U(Pk), then both are equal to lim_{k→∞} RS(Pk) for any tagging {ti} of the partitions.
Step 1. We will first assume that both integrals exist and prove that ∫_a^b f dα = ∫_a^b fα′ dx in this case.
Fix some partition P. By the mean value theorem, ∆αi = α(xi) − α(xi−1) = α′(ti)∆xi for some ti ∈ [xi−1, xi]. Use these {ti} as the tagging in a Riemann sum. In this case:
\[
RS(P, \{t_i\}, f, \alpha) = \sum f(t_i)\,\Delta\alpha_i = \sum f(t_i)\,\alpha'(t_i)\,\Delta x_i = RS(P, \{t_i\}, f\alpha')
\]
To show that the integrals are equal, suppose they were not, say ∫_a^b f dα < ∫_a^b fα′ dx. Then there exist partitions P1, P′ for which L(P1, f, α) ≤ U(P1, f, α) < L(P′, fα′) ≤ U(P′, fα′). Taking the common refinement P yields L(P, f, α) ≤ U(P, f, α) < L(P, fα′) ≤ U(P, fα′). Then every tagging of P satisfies RS(P, {ti}, f, α) < RS(P, {ti}, fα′), in contradiction to the result we just proved! The case of the reverse inequality is symmetric.
Before proceeding with the rest of the proof, we will prove a lemma.
Lemma I.4. For every η > 0, there exists a partition P such that |RS(P, {si}, f, α) − RS(P, {si}, fα′)| < η for any choice of {si}. Moreover this is true for any refinement of P.
The lemma means that for any choice of tagging {si}, the Riemann sums RS(P, {si}, f, α) and RS(P, {si}, fα′) differ by less than η. Furthermore, the lemma implies the theorem.
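Proof. We will use the fact that α′ ∈ R in the proof of this lemma. Since α′ ∈ R, there exists a partition P such that U(P, α′) − L(P, α′) < ε = η/B, where |f(x)| < B.
Then, let {si} and {ti} be taggings of the partition P.
\[
\begin{aligned}
\left| \sum \alpha'(s_i)\,\Delta x_i - \sum \alpha'(t_i)\,\Delta x_i \right| &\le \sum |\alpha'(s_i) - \alpha'(t_i)|\,\Delta x_i \\
&\le \sum (M_i - m_i)\,\Delta x_i \quad (M_i, m_i \text{ taken of } \alpha') \\
&< \varepsilon \quad \text{(Riemann integrability of } \alpha')
\end{aligned}
\]
Then, taking {ti} to be the tags given by the mean value theorem (∆αi = α′(ti)∆xi):
\[
\sum f(s_i)\,\Delta\alpha_i = \sum f(s_i)\,\alpha'(t_i)\,\Delta x_i \quad \text{(by choice of } t_i)
\]
Changing the ti to si yields a difference of:
\[
\left| \sum f(s_i)[\alpha'(t_i) - \alpha'(s_i)]\,\Delta x_i \right| \le B \sum |\alpha'(t_i) - \alpha'(s_i)|\,\Delta x_i < B\varepsilon = \eta
\]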
We shall continue this discussion more next day.
It remains to show that the lemma implies the theorem.
Proof. Let η > 0 be given and fix a partition P as in the lemma. Recall that U(P, f, α) = sup_{si} RS(P, {si}, f, α) and U(P, fα′) = sup_{si} RS(P, {si}, fα′). Then
|U(P, f, α) − U(P, fα′)| ≤ η.
Otherwise, for P, there would exist Riemann sums for which
|RS(P, {si}, f, α) − RS(P, {si}, fα′)| > η,
in contradiction to the lemma we proved earlier. The same bound holds for every refinement P* of P.
Then
\[
\overline{\int_a^b} f\,d\alpha = \inf_{P'} U(P', f, \alpha) = \inf_{P^* \text{ refining } P} U(P^*, f, \alpha)
\]
It follows that
\[
\left| \overline{\int_a^b} f\,d\alpha - \overline{\int_a^b} f\alpha'\,dx \right| \le \eta \tag{1}
\]
and in the same way
\[
\left| \underline{\int_a^b} f\,d\alpha - \underline{\int_a^b} f\alpha'\,dx \right| \le \eta \tag{2}
\]
Since η > 0 was arbitrary, the upper integrals of f dα and fα′ dx coincide, and so do the lower integrals. If, for instance, the upper and lower integrals of f with respect to α are the same, then (1) and (2) force
\[
\overline{\int_a^b} f\alpha'\,dx = \underline{\int_a^b} f\alpha'\,dx
\]
to hold, and conversely.
Therefore, f ∈ R(α) if and only if fα′ ∈ R, by equality of these upper and lower integrals.
The above theorem, finally, gives a meaning to dα as α′ dx when α is differentiable and α′ is Riemann integrable.
4.1 Change of Variables
Recall from calculus the change of variables formula: if x = u(t), then
\[
\int_a^b f(x)\,dx = \int_A^B f(u(t))\,\underbrace{d(u(t))}_{u'(t)\,dt}
\]
where a = u(A) and b = u(B). We may make a similar statement for the Riemann-Stieltjes integral.
Theorem I.11. Let u : [A, B] → [a, b] be a strictly increasing and onto function, and let f, α have their usual meanings. Then:
\[
\int_a^b f(x)\,d\alpha(x) = \int_A^B f(u(t))\,d\alpha(u(t))
\]
The fact that u is increasing is needed to ensure that α(u(t)) is non-decreasing.
Proof. Let P = {x0, ..., xn} be a partition of [a, b] and Q = {t0, ..., tn} be the partition of [A, B] such that u(ti) = xi. We claim that
U(P, f, α) = U(Q, f(u), α(u))
L(P, f, α) = L(Q, f(u), α(u))
To see why, we write out the upper sums (the lower sums are handled in the same way):
\[
U(P, f, \alpha) = \sum M_i\,\Delta\alpha_i \quad \text{and} \quad U(Q, f(u), \alpha(u)) = \sum M_i'\,\Delta(\alpha \circ u)_i
\]
Then Mi = sup_{x∈[xi−1,xi]} f(x) and Mi′ = sup_{t∈[ti−1,ti]} f(u(t)) = sup_{x∈[xi−1,xi]} f(x) by choice of ti. Furthermore,
∆αi = α(xi) − α(xi−1)
and
∆(α ◦ u)i = α(u(ti)) − α(u(ti−1)) = α(xi) − α(xi−1)
again by choice of ti. So the upper and lower sums for the two integrals are the same. Since u is a bijection, P ↦ Q is a one-to-one correspondence between partitions of [a, b] and partitions of [A, B]. Accordingly, the two integrals are equal.
4.2 The Fundamental Theorem of Calculus
Given f(x), we may define F(x) = ∫_a^x f(t) dt, and we may expect F′(x) = f(x). However, this may not work under some conditions. For instance, consider the following step function f(x) and the corresponding function F(x).
Figure 12: Example of f(x) and the corresponding F(x)
We can see that at the point where f(x) changes from 0 to 1, the corresponding part in F(x) is not differentiable. However, F′(x) = f(x) under some conditions:
Theorem I.12. Let f be integrable on [a, b] and let F(x) = ∫_a^x f(t) dt. Then
i F is continuous.
ii If f is continuous, then F is differentiable and F′ = f.
Proof.
i Here we will need to prove that lim_{x→x0} F(x) = F(x0). For x < y,
\[
|F(y) - F(x)| = \left| \int_x^y f(t)\,dt \right| \le B(y - x)
\]
where B is a bound on |f(t)|. Thus, lim_{y−x→0} (F(y) − F(x)) = 0.
ii Here we will need to show that
\[
\lim_{y \to x_0} \frac{F(y) - F(x_0)}{y - x_0} = \lim_{h \to 0} \frac{F(x_0 + h) - F(x_0)}{h} = f(x_0).
\]
In the case of h positive, we may write the difference quotient as:
\[
\frac{F(x_0 + h) - F(x_0)}{h} = \frac{1}{h} \int_{x_0}^{x_0 + h} f(t)\,dt
\]
Letting m = inf f(t) and M = sup f(t) on [x0, x0 + h] yields the inequality:
\[
mh \le \int_{x_0}^{x_0 + h} f(t)\,dt \le Mh
\]
Therefore, m ≤ (F(x0 + h) − F(x0))/h ≤ M. By continuity of f at x0, both sup f and inf f on [x0, x0 + h] tend to f(x0) as h approaches 0. Therefore lim_{h→0+} (F(x0 + h) − F(x0))/h = f(x0). The case of h negative is similar.
5 Functions of Bounded Variations
We considered the issue of defining ∫_a^b f dα where α was a non-decreasing function. How do we now define ∫_a^b f dg where g may not be monotone?
We may define ∫_a^b f dg whenever g is of bounded variation.
Definition I.4. The variation of g on [a, b] is
\[
V_a^b(g) = \sup_P \sum_{i=1}^{n} |\Delta g_i| = \sup_P \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|
\]
If we consider g : [a, b] → R as a path, then V_a^b(g) is the length of the path. In particular, if g′(x) exists and is integrable, then
\[
V_a^b(g) = \int_a^b |g'(x)|\,dx
\]
The set of functions of bounded variation on [a, b] is denoted as BV[a, b], where
BV[a, b] = {g | V_a^b(g) < ∞}
The following is a function which is not of bounded variation.
Figure 13: A function not of bounded variation
We construct the function as follows. Pick points y0, y1, ... such that |y1 − y0| = 1, |y2 − y1| = 1/2, |y3 − y2| = 1/3, and so on. Then the variation of the function is given by the harmonic series 1 + 1/2 + 1/3 + ..., which diverges.
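The growth of the variation in this construction can be tabulated directly. The sketch below assumes the values alternate up and down (the notes only fix the sizes of the steps), so that the first n steps contribute the n-th harmonic number, which exceeds ln n and so grows without bound.

```python
import math

def variation(values):
    """Sum of |y_k - y_{k-1}| over consecutive sampled values."""
    return sum(abs(values[i] - values[i - 1]) for i in range(1, len(values)))

def zigzag(n):
    """Values y_0, ..., y_n with |y_k - y_{k-1}| = 1/k, alternating in sign
    (the alternation is an assumption made for this illustration)."""
    ys = [0.0]
    for k in range(1, n + 1):
        ys.append(ys[-1] + (-1) ** (k + 1) / k)
    return ys

for n in (10, 100, 1000, 10000):
    print(n, variation(zigzag(n)), math.log(n))
```

The values themselves stay bounded (they are partial sums of the alternating harmonic series), while the variation keeps growing.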
Functions of bounded variation can be used in defining the integral because of the Jordan
decomposition.
Theorem I.13 (Jordan Decomposition). A function g is of bounded variation if and only if
g = α − β for some non-decreasing α, β.
Proof. (→) Assigned on Homework 3. (←) Let g = α − β for some non-decreasing α, β. Then
V_a^b(α − β) ≤ V_a^b(α) + V_a^b(β)
Since α, β are non-decreasing,
V_a^b(α) + V_a^b(β) = [α(b) − α(a)] + [β(b) − β(a)]
which is finite.
Remark. The decomposition into α, β need not be unique. For instance:
g(x) = 0 = α − α = β − β
for any non-decreasing α, β.
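For sampled data, one standard way to produce such a decomposition is to accumulate the upward and downward moves separately. The sketch below is a discrete analogue only (it is not claimed to be the intended homework solution); the function name `jordan_decomposition` and the test signal sin are assumptions of this example.

```python
import math

def jordan_decomposition(values):
    """Split sampled values g_0, ..., g_n into non-decreasing sequences
    alpha, beta with g_i - g_0 = alpha_i - beta_i."""
    alpha, beta = [0.0], [0.0]
    for i in range(1, len(values)):
        step = values[i] - values[i - 1]
        alpha.append(alpha[-1] + max(step, 0.0))    # accumulated upward moves
        beta.append(beta[-1] + max(-step, 0.0))     # accumulated downward moves
    return alpha, beta

n = 200
xs = [2 * math.pi * i / n for i in range(n + 1)]
g = [math.sin(x) for x in xs]
alpha, beta = jordan_decomposition(g)

# alpha[-1] + beta[-1] is the variation of the sampled values, here close to 4.
print(alpha[-1], beta[-1], alpha[-1] + beta[-1])
```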
Definition I.5 (Integral with respect to g). If f is continuous and g ∈ BV (i.e. g is of bounded variation), then g may be decomposed as g = α − β. Therefore, we define:
\[
\int_a^b f\,dg = \int_a^b f\,d\alpha - \int_a^b f\,d\beta
\]
where the integrals on the right hand side are the Riemann-Stieltjes integrals we have already defined.
Remark. It’s enough to assume that f ∈ R(α) and f ∈ R(β) for the integral to exist, but f
continuous ensures that the integral always exists.
Furthermore, the integral is always the same regardless of the choice of α, β, wherever it
exists. This result is a problem on the next problem set.
In the case where g is not of bounded variation, we may also be able to decompose g into g = α − β. Although α − β may be finite for every x in the interval, the issue here is that α, β themselves may not be bounded and, furthermore, the integral ∫_a^b f dα − ∫_a^b f dβ may result in an ∞ − ∞ answer, which is not defined.
This notion of integration, of integrating with respect to functions of bounded variation, is applied in proving a result in functional analysis: the Riesz Representation Theorem.
5.1 The Riesz Representation Theorem
This result was first stated in 1910. Before stating the result, we will need to first make some definitions.
Fix an interval [a, b]. Then let C[a, b] denote the set of continuous functions on [a, b].
Let C* denote the dual of C[a, b] as a vector space over R. The dual of a vector space is the vector space consisting of linear maps (maps respecting addition and scalar multiplication) from the original space to R. In this case:
C* = {T : C → R, T is a linear functional}
For instance, the dual space of the vector space of polynomials R[x] is the set of power series R[[x]]. In this instance, the vector space of polynomials has countable dimension since its basis is the monomials {1, x, x², x³, ...}. However, a linear functional may take arbitrary values on the infinitely many elements of this basis, thereby making the space of power series R[[x]], with an uncountable basis, the dual space of the vector space of polynomials.
Likewise, C[a, b] is itself a very large set with an uncountable basis. However, the dual space C* is not well-behaved due to its extremely large size! We then take the subset C*_B ⊂ C* of bounded functionals. To make precise what bounded means, we will need to define a norm on C[a, b]. For f ∈ C[a, b], its norm is:
\[
\|f\|_\infty = \sup_{x \in [a,b]} |f(x)|
\]
Since any continuous function on [a, b] achieves its maximum value, ||f||∞ returns the maximum value of |f| on [a, b] if f ∈ C[a, b]. We may check that this norm satisfies the properties needed of a norm, such as the triangle inequality and the fact that ||f||∞ = 0 iff f(x) = 0 for all x, for instance. This norm is part of a set of norms called Lp norms and is known as the infinity norm.
Then a linear functional T : C → R will be bounded if there exists M ∈ R such that for every function f ∈ C[a, b]:
|T(f)| ≤ M · ||f||∞
We further note that the set of bounded functionals C*_B is a vector space. Before proceeding further, we will examine some elements of C*_B:
Example I.6 (Examples of Elements of C*_B).
1. The Evaluation Map:
Let ev_{x0} : C → R be the map f ↦ f(x0). That is, we take a function f ∈ C[a, b] and return its value at x0 ∈ [a, b]. It is a linear map since evaluation of f + g and cf at x0 return f(x0) + g(x0) and cf(x0) respectively. Furthermore, it is bounded since:
|ev_{x0}(f)| = |f(x0)| ≤ ||f||∞
as |f(x0)| is necessarily less than or equal to the maximum of |f| on [a, b], which is ||f||∞. Taking M = 1 completes the proof that ev_{x0} is a bounded linear functional.
2. Integration:
a Fix some non-decreasing α and define a map C[a, b] → R as f ↦ ∫_a^b f dα. It is linear by properties of the integral. Furthermore, it is bounded since:
\[
\left| \int_a^b f\,d\alpha \right| \le \|f\|_\infty\,(\alpha(b) - \alpha(a))
\]
since the integral is no bigger in absolute value than the maximum of |f| multiplied by the total increase of α over the interval on which we want to integrate. Taking M = α(b) − α(a) completes the proof that integration is a bounded linear functional.
b Furthermore, fixing some g of bounded variation, f ↦ ∫_a^b f dg defines a map from C → R. This is a bounded linear functional, taking M = V_a^b(g). We may check this as an exercise.
3. Differentiation: Define C¹[a, b] as the set of functions which have continuous derivatives. Let a map C¹[a, b] → R be defined as f ↦ f′(x0) where x0 is a point in [a, b]. This is not a bounded linear functional since |f′(x0)| may be arbitrarily big in relation to ||f||∞, such as in the case of a function which gets arbitrarily steep close to x0.
We now come to the statement of the Riesz Representation Theorem.
Theorem I.14 (Riesz). All bounded linear functionals come from integration. More precisely, every T ∈ C*_B, that is every bounded linear functional, is defined by some g ∈ BV such that
\[
T : f \mapsto \int_a^b f\,dg
\]
Remark.
1. For instance, the evaluation map at x0 may be defined by ∫_a^b f dα where α is a step function which changes values at x0. We have previously encountered this example.
2. We may restate the Riesz representation theorem in terms of the map from functions of bounded variation to C*_B which sends g to the functional f ↦ ∫_a^b f dg. The theorem says that this map is surjective, since every bounded linear functional is defined by some function of bounded variation.
However, the map is not injective since:
• The functions g and g + c where c is a constant define the same integral. We fix this problem by only considering the functions with g(a) = 0.
• Consider two step functions α and β with a unit jump at the same point x0, differing only in their value at x0 itself. Then ∫ f dα = ∫ f dβ = f(x0) where x0 is the jump point. We fix this problem by only considering functions which are continuous from the right, so β will not be included in our set.
By imposing the above two restrictions on BV, we get a subspace of functions NBV ⊂ BV (the normalized functions of bounded variation). The map NBV → C*_B is then an isomorphism between the two sets. We may illustrate this graphically as follows.
Figure 14: Illustration of Riesz Representation Theorem. The map BV → C*_B is surjective; restricted to the subset NBV = {g ∈ BV | g(a) = 0 and g continuous from the right} it is also injective, hence an isomorphism.
5.2 The Length of a Curve
Recall from last day the definition for the variation of a function on [a, b]. This is:
\[
V_a^b(g) = \sup_P \sum_{i=1}^{n} |g(x_i) - g(x_{i-1})|
\]
The variation satisfies properties such as
V_a^b(g) = V_a^c(g) + V_c^b(g)
for a < c < b. We may interpret the variation of a function as the length of the curve the function traces out.
Definition I.6. A curve is a (continuous) function γ : [a, b] → Rⁿ, or a map
x ↦ (γ1(x), γ2(x), ..., γn(x)).
The length of γ is
\[
\Lambda(\gamma) = \sup_{\text{Partitions of } [a,b]} \sum_i \|\gamma(x_i) - \gamma(x_{i-1})\|
\]
Call a curve rectifiable if Λ(γ) is finite.
We may think of calculating the length of a curve as approximating the curve by many little line segments, and adding up the length of each line segment. Furthermore, in the case where γ : [a, b] → R, then Λ(γ) = V_a^b(γ).
Note that Λ(P, γ), the length of the curve calculated using a partition P, underestimates the length of γ by construction (Λ(P, γ) ≤ sup_P Λ(P, γ) = Λ(γ)). As such, we may think of it as a lower sum L(P, f) and indeed, the two quantities share some similar properties.
Lemma I.5. If P* is a refinement of P, then Λ(P*, γ) ≥ Λ(P, γ).
Proof. It suffices to prove this for a partition P* = P ∪ {y}, where xi−1 < y < xi.
Figure 15: Illustration of the proof for a plane curve, showing the points γ(xi−1), γ(y), γ(xi)
Note that
Λ(P, γ) = ... + ||γ(xi) − γ(xi−1)|| + ...
and
Λ(P*, γ) = ... + ||γ(y) − γ(xi−1)|| + ||γ(xi) − γ(y)|| + ...
An application of the triangle inequality completes the proof.
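The lemma can be watched in action on a curve whose length is known. The sketch below assumes the unit circle γ(t) = (cos t, sin t) on [0, 2π] as an example (it is not taken from the notes) and doubles the number of partition points, so each partition refines the previous one; the polygonal lengths increase towards 2π.

```python
import math

def polygonal_length(gamma, partition):
    """Lambda(P, gamma): total length of the inscribed polygon through gamma(x_i)."""
    points = [gamma(t) for t in partition]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def circle(t):
    return (math.cos(t), math.sin(t))

lengths = []
n = 4
while n <= 1024:
    partition = [2 * math.pi * i / n for i in range(n + 1)]
    lengths.append(polygonal_length(circle, partition))
    n *= 2  # doubling n adds the midpoints, giving a refinement

for value in lengths:
    print(value)
print(2 * math.pi)
```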
Theorem I.15. Suppose a < c < b. If γ is rectifiable on [a, b], then
Λ_a^b(γ) = Λ_a^c(γ) + Λ_c^b(γ)
Proof. By definition:
\[
\Lambda_a^b(\gamma) = \sup_{\text{Partitions } P \text{ of } [a,b]} \Lambda(P, \gamma)
\]
We claim that
\[
\sup_{\text{Partitions } P \text{ of } [a,b]} \Lambda(P, \gamma) = \sup_{\text{Partitions } P^* = P \cup \{c\}} \Lambda(P^*, \gamma)
\]
The direction
\[
\sup_{\text{Partitions } P \text{ of } [a,b]} \Lambda(P, \gamma) \ge \sup_{\text{Partitions } P^* = P \cup \{c\}} \Lambda(P^*, \gamma)
\]
follows from the fact that the partitions containing c are a subset of all partitions. The direction
\[
\sup_{\text{Partitions } P \text{ of } [a,b]} \Lambda(P, \gamma) \le \sup_{\text{Partitions } P^* = P \cup \{c\}} \Lambda(P^*, \gamma)
\]
comes from noting that for every partition P of [a, b], there exists a refinement P* of P containing c such that Λ(P, γ) ≤ Λ(P*, γ).
Then, the set of partitions P* for which c ∈ P* is in bijection with the set {(P1, P2)} where P1 is a partition of [a, c] and P2 is a partition of [c, b].
Thus, Λ(P*, γ) = Λ(P1, γ) + Λ(P2, γ). Taking the supremum over P* on the left side, and over (P1, P2) on the right side yields the equality
Λ_a^b(γ) = Λ_a^c(γ) + Λ_c^b(γ)
Note that the above argument also works to show equality of integrals when an interval [a, b] is divided into intervals [a, c] and [c, b].
Example I.7 (Non-Rectifiable Curves). We would like an example of a non-rectifiable continuous (hence bounded) curve. Taking
\[
\gamma(x) = \left( x^a \cos\tfrac{1}{x},\; x^a \sin\tfrac{1}{x} \right)
\]
where a is an appropriate exponent and x ∈ [0, 1] should work. Note that we define γ(0) = (0, 0). In particular, we may calculate the length of a curve as Λ(γ) = ∫_a^b |γ′(t)| dt, the integral of the speed |γ′(t)|, whenever γ is continuously differentiable.
Next, the Koch snowflake is a non-rectifiable curve which begins as a map from [0, 3] to a triangle, and where we draw a new triangle on the middle third of each side upon each iteration. This is not rectifiable since the length of the snowflake increases by a factor of 4/3 upon each iteration, but it is continuous.
Finally, space-filling curves are maps [0, 1] → [0, 1] × [0, 1] which are not rectifiable because they fill the unit square.
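The factor of 4/3 for the Koch construction can be confirmed by building one side of the snowflake explicitly. This is a sketch under the assumption that a side starts as the unit segment from 0 to 1 in the complex plane; the helper names are hypothetical. The full snowflake consists of three such sides, so its length is three times the values printed.

```python
import cmath

def koch_refine(points):
    """Replace each segment by four segments of one third the length,
    raising a triangular bump on the middle third."""
    turn = cmath.exp(1j * cmath.pi / 3)  # rotation by 60 degrees
    refined = [points[0]]
    for p, q in zip(points, points[1:]):
        step = (q - p) / 3
        refined += [p + step, p + step + step * turn, p + 2 * step, q]
    return refined

def polygon_length(points):
    return sum(abs(q - p) for p, q in zip(points, points[1:]))

side = [0 + 0j, 1 + 0j]
for k in range(6):
    print(k, polygon_length(side), (4 / 3) ** k)
    side = koch_refine(side)
```

Since (4/3)^k is unbounded in k, no finite number can serve as the length of the limiting curve.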
5.3 Functional Analysis Revisited
We now make some remarks on the field of functional analysis.
• Functional analysis studies spaces of functions, such as C[a, b], the continuous functions
on [a, b], or BV [a, b], the functions of bounded variation on [a, b].
• By introducing a norm on the space of functions, we may define a metric between two functions and introduce a topology induced by the metric. An example of a norm we saw last day was the supremum norm ||f||∞, which returns the maximum value of |f| on [a, b] for f ∈ C[a, b].
• Functional analysis then studies the bounded linear maps V → W where V, W are two normed spaces. A bounded linear map will be a continuous map in this case.
• In the example of the Riesz representation theorem, the dual space of C[a, b], which consists of all bounded linear maps L : C[a, b] → R, was described in terms of BV[a, b], the functions of bounded variation on [a, b]. This is because all bounded linear functionals on C[a, b] could be represented by integration with respect to a function of bounded variation. This induces a map BV[a, b] → C[a, b]* sending g to the functional f ↦ ∫_a^b f dg, where f is any continuous function. Furthermore, the subspace NBV[a, b] of BV[a, b] defined last day (functions with g(a) = 0 which are continuous from the right) is isomorphic to C[a, b]*.
• There exist different norms on a space. For instance, we may define ||g|| = V_a^b(g) instead of using the supremum norm. Furthermore, operators have norms: we may define the norm of an operator L as
\[
\|L\| = \sup_{f \ne 0} \frac{\|Lf\|}{\|f\|}
\]
• We may claim that the isomorphism NBV ≅ C[a, b]* as stated in the Riesz representation theorem is one which preserves norms. More precisely, if we let L denote the operator f ↦ ∫_a^b f dg on C[a, b], then the norm of L is
\[
\|L\| = \sup_{f \ne 0} \frac{|Lf|}{\|f\|_\infty} = V_a^b(g)
\]
We may see this in the case of α monotonic by the fact that:
\[
\left| \int_a^b f\,d\alpha \right| \le \underbrace{\|f\|_\infty}_{\text{The Maximum}}\; \underbrace{|\alpha(b) - \alpha(a)|}_{V_a^b(\alpha)}
\]
Thus,
\[
\frac{\left| \int_a^b f\,d\alpha \right|}{\|f\|_\infty} \le |\alpha(b) - \alpha(a)|
\]
It follows by taking f as a nonzero constant function that:
\[
\sup_{f \ne 0} \frac{\left| \int_a^b f\,d\alpha \right|}{\|f\|_\infty} = |\alpha(b) - \alpha(a)|
\]
Part II
Sequences and Series of Functions
Today we will begin the main topic of the course: sequences and series of functions. This study culminates in the Stone-Weierstrass theorem.
6 Sequences and Series of Functions: Definitions and Issues
Definition II.1. A sequence of functions f1, f2, ... is denoted as {fi}_{i=1}^∞, where the fi(x) are all defined on some domain E.
Similarly, a series of functions is denoted as Σ_{i=1}^∞ fi.
We may compare sequences of functions with sequences of numbers by considering V : a space
of functions. If V is a function space, then each “point” in the space is a function, wherein a
sequence of points may converge to some function under a particular metric. We formally define
convergence here:
Definition II.2 (Convergence of Sequences). A sequence {fi(x)} converges to f(x) if
∀x ∈ E, lim_{i→∞} fi(x) = f(x).
In other words, for each x ∈ E the sequence of numbers fi(x) converges to f(x). We write lim_{i→∞} fi = f to denote convergence of the sequence of functions.
In terms of ε-N definitions, lim_{i→∞} fi = f if
∀x ∈ E, ∀ε > 0, ∃N = N(x, ε) s.t. |fi(x) − f(x)| < ε for i ≥ N
Definition II.3 (Convergence of Series). A series Σ_{i=1}^∞ fi converges to f if the sequence of partial sums {sn = Σ_{i=1}^n fi} converges to f. Write Σ_{i=1}^∞ fi = f.
Note that in both of the above definitions, we allow different x to take different N such that |fi(x) − f(x)| < ε for i ≥ N.
Example II.1. The Taylor series presents an example of a series of functions. We know that:
\[
e^x = 1 + x + \frac{x^2}{2} + \cdots
\]
This is a series consisting of the functions 1, x, x²/2, ... and is convergent on R.
Next, consider the functions fn(x) = xⁿ on [0, 1]. Note that
\[
\lim_{n \to \infty} f_n = f(x) = \begin{cases} 1 & x = 1 \\ 0 & \text{otherwise} \end{cases}
\]
The sequence of functions may be illustrated as follows:
Figure 16: Illustration of the Sequence of Functions x, x², x³, x⁴
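For fn(x) = xⁿ the dependence of N on x in Definition II.2 is easy to tabulate. The sketch below (the helper name `first_index` is hypothetical) finds the smallest n with xⁿ < ε for a fixed ε and several x approaching 1; the required n grows without bound, so no single N serves every x in [0, 1).

```python
def first_index(x, eps):
    """Smallest n >= 1 with x**n < eps, for 0 <= x < 1 and eps > 0."""
    n, power = 1, x
    while power >= eps:
        n += 1
        power *= x
    return n

eps = 0.01
for x in (0.5, 0.9, 0.99, 0.999):
    print(x, first_index(x, eps))
```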
The above example illustrates the main issue we are dealing with when considering a sequence of functions. Each fn(x) = xⁿ is continuous and differentiable everywhere, but the limit function is not continuous (hence not differentiable) at x = 1. Thus, is the limit compatible with properties of the functions in a sequence? More precisely, we may ask the following questions about a sequence of functions {fn} and f = lim_{n→∞} fn:
1. If each fn is continuous, is f also continuous? Again, this fails in the example above.
2. If each fn is differentiable, is f also differentiable? Furthermore, if f is differentiable, does f′ = lim_{n→∞} fn′?
3. If each fn is integrable, is f also integrable? Furthermore, if f is integrable, is ∫_a^b f dα = lim_{n→∞} ∫_a^b fn dα?
We may extend these questions to ask if any property which each function in {fn} possesses may be extended to the limit f = lim_{n→∞} fn. However, the answer to each of the above questions is no in general, an instance of Murphy’s law in mathematics. However, if the sequence of functions fn is uniformly convergent, then properties of the fn are generally preserved under the limit. For instance, if each fn is continuous and fn converges uniformly, then f will also be continuous. In essence, uniform convergence ensures that N is chosen depending only on ε and not on the x at which the function is evaluated.
Before defining uniform convergence, we will examine a number of sequences of functions which give us negative results for the questions above.
Example II.2.
1. In Rudin Example 7.4, we consider the functions fm(x) = lim_{n→∞} [cos(m!πx)]^{2n}, each of which is integrable, but the limit f(x) = lim_{m→∞} lim_{n→∞} [cos(m!πx)]^{2n} is not. We can see from the example that the crux of the issue is the interchange of two limits (which we generally cannot do).
For instance, f continuous at x0 means that lim_{x→x0} f(x) = f(x0). If we have a sequence of functions {fn(x)}, each continuous at x0, and want to ask if the limit function is continuous at x0, then we must verify that:
\[
\lim_{n \to \infty} \underbrace{\left( \lim_{x \to x_0} f_n(x) \right)}_{f_n(x_0)} = \lim_{x \to x_0} \lim_{n \to \infty} f_n(x)
\]
holds.
2. Let h(x, y) = y/(x + y). Then
\[
\lim_{y \to 0^+} \lim_{x \to 0^+} h(x, y) = \lim_{y \to 0^+} \frac{y}{y} = 1
\]
\[
\lim_{x \to 0^+} \lim_{y \to 0^+} h(x, y) = \lim_{x \to 0^+} 0 = 0
\]
Evidently the interchange of limits fails in this case. We may see this additionally by considering h as a function of the slope, constant on each line passing through the origin. The limit then depends on the line along which the origin is approached.
3. We’ve already considered the case of fn(x) = xⁿ on [0, 1], wherein each fn is continuous and differentiable but f(x) is not continuous and not differentiable.
4. This example shows that each fn(x) may be integrable on [0, 1] while f(x) is not. Let Q ∩ [0, 1] = {q1, q2, q3, ...}. We may write the set this way since Q is countable. Then let:
\[
f_n(x) = \begin{cases} 1 & x \in \{q_1, \ldots, q_n\} \\ 0 & \text{otherwise} \end{cases}
\]
Each fn is integrable since it has a finite number of discontinuities. But the limit function is
\[
f(x) = \begin{cases} 1 & x \in \mathbb{Q} \\ 0 & \text{otherwise} \end{cases}
\]
which we’ve already shown not to be integrable.
When the limit function is also differentiable or integrable, the equalities f′ = lim_{n→∞} fn′ and lim_{n→∞} ∫ fn dα = ∫ f dα may not hold, as we will see in the next two examples.
5. Let fn be defined on [0, 1] as in the picture below:
Figure 17: fn are a sequence of functions which form a “travelling wave” (each fn is a triangular spike of height 2n supported on [0, 1/n])
Then since fn(0) = 0 for all n, f(0) = 0 in the limit. For x > 0 and sufficiently large n (n ≥ 1/x), fn(x) = 0. Hence f(x) = 0 in the limit. We then have f(x) = lim_{n→∞} fn(x) = 0 and that each fn, and f, are integrable on [0, 1].
However, ∫_0^1 fn dx = ½(1/n)(2n) = 1 by the area of a triangle formula, and ∫_0^1 f dx = 0. Thus, we have in this case an example of a sequence of functions where
\[
\int_0^1 \lim_{n \to \infty} f_n(x)\,dx \ne \lim_{n \to \infty} \int_0^1 f_n(x)\,dx
\]
although all functions in question were integrable.
We may easily extend the above example to R by considering a wave of height 2 supported on [n, n + 1].
For n sufficiently large, the wave moves past each x ∈ R, hence the limit function is again 0 in this case, although each constituent fn(x) has a non-zero area under the curve.
6. Finally we give a case where fn, f are differentiable but lim fn′ ≠ f′. Consider fn = xⁿ/n on [0, 1], for which lim_{n→∞} fn = 0 uniformly. This is because for sufficiently large n, fn(x) < ε for all x ∈ [0, 1]. However, fn′ = x^{n−1}, which converges to
\[
\lim_{n \to \infty} f_n'(x) = \begin{cases} 0 & x \ne 1 \\ 1 & x = 1 \end{cases}
\]
as in our previous example, whereas f′ = 0 everywhere. We may extend this example to have functions for which the sequence of second, third, and additional derivatives do not converge to the derivatives of the limit function.
Figure 18: Illustration of the Sequence of Functions x, x²/2, x³/3, x⁴/4
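The travelling wave of item 5 can be checked numerically. The sketch below assumes the spike is the triangle with vertices (0, 0), (1/(2n), 2n) and (1/n, 0), which matches the stated height and base; the function names are hypothetical. At a fixed x the values are eventually 0, while every integral stays at 1.

```python
def spike(n):
    """Triangular spike of height 2n supported on [0, 1/n]."""
    def f(x):
        if x <= 0 or x >= 1 / n:
            return 0.0
        peak = 1 / (2 * n)
        # Rise with slope 4n^2 up to the peak, then fall with slope -4n^2.
        return 4 * n * n * x if x <= peak else 4 * n - 4 * n * n * x
    return f

def midpoint_integral(f, a, b, cells=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / cells
    return sum(f(a + (i + 0.5) * h) for i in range(cells)) * h

x = 0.05
for n in (1, 10, 100):
    f_n = spike(n)
    print(n, f_n(x), midpoint_integral(f_n, 0.0, 1.0))
```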
We will then define uniform convergence in the next class.
7 Uniform Convergence
7.1 Uniform Convergence of Sequences
Before defining uniform convergence, we will write fn → f if a sequence of functions {fn }
converges pointwise to f , and fn ⇒ f if a sequence of functions converges uniformly to f .
Recall that fn → f on a domain E if:
∀x ∈ E, ∀ε > 0, ∃N = N(x, ε) s.t. |fn(x) − f(x)| < ε for n ≥ N
In uniform convergence, N does not depend on which x we choose. Given some ε, we may choose one N which works for all x.
Definition II.4 (Uniform Convergence). fn converges uniformly to f on E if
∀ε > 0, ∃N = N(ε) s.t. |fn(x) − f(x)| < ε for all n ≥ N and x ∈ E
We may illustrate uniform convergence as follows:
Figure 19: Illustration of uniform convergence (graphs of f(x) and fn(x) on [a, b])
If fn ⇒ f, we may draw a tube of width ε about the graph of f. Then the graph of fn lies within the tube for n sufficiently large.
Example II.3. Reconsider our example of a sequence: $f_n(x) = x^n$ defined on $[0, 1]$, which converges to
\[ f(x) = \begin{cases} 0 & x < 1 \\ 1 & x = 1 \end{cases} \]
The claim is that $f_n$ does not converge uniformly to $f$ although $f_n \to f$.
We may see this informally by considering the graph of $f$. If $f_n \Rightarrow f$, the graph of $f_n$ should lie entirely within a region of width $\epsilon$ about the graph of $f$. This is not the case, as each $f_n$ is continuous and accordingly cannot “jump” near $x = 1$.
Figure 20: $f_n$ does not lie within an $\epsilon$-neighbourhood of the limit
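We can reformulate the definition of uniform convergence as follows:
Theorem II.1. Assume $f_n \to f$ on $E$ and let $M_n = \sup_{x\in E} |f_n(x) - f(x)|$. Then $f_n$ converges uniformly to $f$ if and only if $M_n \to 0$ as $n \to \infty$.
Proof. By definition, $f_n$ converges uniformly to $f$ exactly when for every $\epsilon > 0$ there is $N$ such that $|f_n(x) - f(x)| < \epsilon$ for all $x \in E$ and all $n \ge N$. Taking the supremum over $x$, this gives $M_n \le \epsilon$ for all $n \ge N$, so $M_n \to 0$. Conversely, if $M_n \to 0$, then $|f_n(x) - f(x)| \le M_n < \epsilon$ for all $x \in E$ once $n$ is sufficiently large.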
Example II.4. We may apply the above theorem to the sequence $f_n(x) = x^n$, again on $[0, 1]$.
Here $M_n = \sup_{x\in[0,1)} |f_n(x) - 0| = \sup_{x\in[0,1)} |x^n| = 1$. Since $\lim_{n\to\infty} M_n \ne 0$, $f_n$ does not converge uniformly to $f$.
Next, reconsider the example last day of functions which form a travelling wave. Let $f_n$ be a wave of height 1 and width $\frac{1}{n}$. Recall that $f_n \to 0$. Since $M_n = \sup_{x\in[0,\frac{1}{n}]} |f_n - f| = \max_{x\in[0,\frac{1}{n}]} |f_n(x)| = 1 \not\to 0$, the sequence $f_n$ does not converge uniformly to $f$.
Finally, reconsider the example where $f_n(x) = \frac{x^n}{n}$. This converges uniformly to 0 since $M_n = \frac{1}{n} \to 0$ holds.
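The two computations of $M_n$ for powers of $x$ can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than part of the notes: `witness` and `grid_sup` are hypothetical helper names, and a grid maximum is only a lower bound for a supremum, which is why an explicit witness point is used for $x^n$.

```python
def witness(n):
    """A point x in [0, 1) with x**n = 1/2, so sup |x**n - 0| over [0, 1) is >= 1/2."""
    return 0.5 ** (1.0 / n)


def grid_sup(f, samples=1001):
    """Largest |f(x)| over an evenly spaced grid on [0, 1] (a lower bound for the sup)."""
    return max(abs(f(i / (samples - 1))) for i in range(samples))


for n in (1, 10, 1000):
    x = witness(n)
    # For x**n / n the maximum is attained at the grid point x = 1 and equals 1/n.
    m_scaled = grid_sup(lambda t: t ** n / n)
    print(n, x, x ** n, m_scaled)
```

For every $n$ the witness point stays inside $[0, 1)$ while $x^n$ stays at $\frac{1}{2}$, so $M_n$ cannot tend to 0 for $x^n$; for $\frac{x^n}{n}$ the computed maximum shrinks like $\frac{1}{n}$.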
We may state a Cauchy criterion for the uniform convergence of sequences, just as for the convergence of sequences of real numbers.
Theorem II.2 (Cauchy Criterion for Uniform Convergence). A sequence of functions $\{f_n\}$ converges uniformly if and only if for all $\epsilon > 0$, there exists $N$ such that $|f_n(x) - f_m(x)| < \epsilon$ for all $x$ and for all $m, n \ge N$.
Remark that this means that for fixed $x$, the sequence $\{f_n(x)\}$ is a Cauchy sequence, and the $N$ chosen does not depend on $x$ but only on $\epsilon$.
Proof. (→). Assume $f_n$ converges to $f$ uniformly. Then
\[ |f_n(x) - f_m(x)| \le |f_n(x) - f(x)| + |f(x) - f_m(x)| \]
by the triangle inequality. Given $\epsilon$, by uniform convergence there is $N$ such that $|f_n(x) - f(x)| < \frac{\epsilon}{2}$ for all $x$ and all $n \ge N$. Hence $|f_n(x) - f_m(x)| < \epsilon$ for all $x$ and all $m, n \ge N$.
(←). Here, for each $x$, $\{f_n(x)\}$ is a Cauchy sequence of real numbers and so converges; call its limit $f(x)$. Hence we have pointwise convergence to a function $f$. We must now show uniform convergence. We know that for all $\epsilon$, there exists $N$ such that $|f_n(x) - f_m(x)| < \epsilon$ for all $x$ and all $m, n \ge N$. Fix $n \ge N$ and let $m \to \infty$. Since $f_m(x) \to f(x)$ and $|f_m(x) - f_n(x)| < \epsilon$ for all $m \ge N$, it follows that $|f_n(x) - f(x)| \le \epsilon$ for all $x$ and all $n \ge N$. Hence, we have uniform convergence.
7.2
Uniform Convergence of Series
Definition II.5 (Uniform Convergence of Series). Consider a series of functions $\sum_{n=0}^{\infty} f_n(x) = f(x)$. Let $s_n(x) = \sum_{k=0}^{n} f_k(x)$ be the $n$th partial sum. Then the series converges uniformly if $s_n \Rightarrow f$, that is, the sequence of partial sums converges uniformly.
Theorem II.3 (Characterizations of Uniform Convergence). The following are consequences of the definition of uniform convergence and previous theorems:
1 $\sum f_n \to f$ uniformly if and only if $\lim_{n\to\infty} \sup_x |s_n(x) - f(x)| = \lim_{n\to\infty} \sup_x \left|\sum_{k=n+1}^{\infty} f_k(x)\right| = 0$.
2 (Cauchy Criterion) The series converges uniformly if and only if for every $\epsilon > 0$, there exists $N$ such that $\left|\sum_{k=n+1}^{m} f_k(x)\right| = |s_m(x) - s_n(x)| < \epsilon$ for all $m > n \ge N$ and for all $x$.
3 (Weierstrass M-Test) If $|f_n(x)| < M_n$ for all $x$ and $\sum M_n$ converges, then $\sum f_n(x)$ converges uniformly.
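Proof. We will prove the Weierstrass M-Test. Suppose $\sum M_n$ converges. Then for all $\epsilon > 0$, there exists $N$ such that $\sum_{k=n+1}^{m} M_k < \epsilon$ for all $m > n \ge N$, by the Cauchy criterion for convergence of numeric series. By hypothesis:
\[ \left|\sum_{k=n+1}^{m} f_k(x)\right| \le \sum_{k=n+1}^{m} |f_k(x)| \le \sum_{k=n+1}^{m} M_k < \epsilon \]
for all $x$, which implies uniform convergence of $\sum f_k(x)$ by the Cauchy criterion for series.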
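The chain of inequalities in the M-Test can be observed on a concrete series. The sketch below is an assumption-laden illustration: the series $\sum_{k\ge1} \sin(kx)/k^2$ with $M_k = 1/k^2$ is a hypothetical example not taken from the notes, and the helper names are invented for this purpose.

```python
import math


def partial_sum(n, x):
    """s_n(x) = sum_{k=1}^{n} sin(kx) / k^2 (a hypothetical example series)."""
    return sum(math.sin(k * x) / k ** 2 for k in range(1, n + 1))


def majorant_tail(n, m):
    """sum_{k=n+1}^{m} M_k with M_k = 1 / k^2."""
    return sum(1.0 / k ** 2 for k in range(n + 1, m + 1))


n, m = 50, 400
xs = [-10 + 0.25 * i for i in range(81)]   # sample points in [-10, 10]
worst = max(abs(partial_sum(m, x) - partial_sum(n, x)) for x in xs)
# The same bound majorant_tail(n, m) controls every sample point at once.
print(worst, majorant_tail(n, m))
```

The point of the comparison is that the bound on the right does not depend on $x$, which is exactly what the uniform Cauchy criterion requires.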
Beginning of class announcement: Exam 1 will be held next Wednesday, and will cover up to
material done this week. The problems will be mainly based on homework questions.
7.3
Interpretation of Uniform Convergence
Recall that $f_n \to f$ uniformly on $E$ if for all $\epsilon > 0$, there exists $N$ such that $|f_n(x) - f(x)| < \epsilon$ for all $n \ge N$ and for all $x \in E$. As we noticed last day, this is equivalent to $\sup_{x\in E} |f_n(x) - f(x)|$ tending to zero as $n \to \infty$. We may use $\sup |f_n(x) - f(x)|$ as a measure of the distance between $f_n$ and $f$.
More precisely, let $B(E)$ be the space of bounded functions on $E$. We may define a norm on $B(E)$: if $f \in B(E)$, define $\|f\| = \sup_{x\in E} |f(x)|$, known as the infinity norm. We may define the distance between $f, g \in B(E)$ as $\|f - g\|$, which makes $B(E)$ a metric space.
A sequence of functions $f_n$ converges uniformly to $f$ if and only if $f_n \to f$ in $B(E)$ with respect to this norm. We may think of each $f$ as a point in $B(E)$, and uniform convergence is then the convergence of those points with respect to the metric on $B(E)$. More precisely, a sequence of points $\{f_n\}$ converges to $f$ if:
\[ \forall \epsilon > 0,\ \exists N \text{ s.t. } \|f_n - f\| < \epsilon \quad \forall n \ge N \]
Likewise, if $F(E)$ is the space of all functions on $E$, we may define $\|f\|$ as in $B(E)$, although $\|f\| \in \mathbb{R} \cup \{\infty\}$ in this case. Furthermore, even when $\|f\|, \|g\|$ are infinite, it may be the case that $\|f - g\|$ is finite.
Restating the theorems we proved last day in the language above yields the following:
1 $f_n \to f$ uniformly if and only if $\lim_{n\to\infty} \sup_x |f_n(x) - f(x)| = \lim_{n\to\infty} \|f_n - f\| = 0$.
2 (The Cauchy Criterion): $f_n \to f$ uniformly if and only if for all $\epsilon > 0$, there is $N$ such that for all $m, n \ge N$, $|f_n(x) - f_m(x)| < \epsilon$ for all $x$, or $\|f_n - f_m\| \le \epsilon$ for all $n, m \ge N$.
Here, if a sequence of functions $\{f_n\}$ converges uniformly, then $\{f_n\}$ is a Cauchy sequence in $(B(E), \|\cdot\|)$. Recall that a sequence $\{a_n\}$ is Cauchy in an arbitrary metric space if for every $\epsilon > 0$, there exists $N$ for which $d(a_n, a_m) < \epsilon$ for all $n, m \ge N$. A metric space is complete if every Cauchy sequence converges. Hence $B(E)$ is a complete metric space, since a Cauchy sequence in $B(E)$ is the same as a sequence of functions $\{f_n\}$ converging uniformly to some $f$ in $B(E)$.
3 (The Weierstrass M-Test): If a series $\sum_{n=0}^{\infty} f_n(x)$ satisfies $|f_n(x)| < M_n$ for all $x$, where $\sum_{n=0}^{\infty} M_n$ converges, then $\sum f_n(x)$ converges uniformly.
We may rewrite this by replacing $M_n$ with the supremum $\|f_n\|$, since the $M_n$ are upper bounds, and summing over those suprema. Thus, if $\sum \|f_n\|$ converges, then $\sum f_n$ converges uniformly in $B(E)$. This is analogous to saying that absolute convergence implies convergence when dealing with series of real numbers.
The analogies we made above, considering functions as points in a function space, cannot be made when we consider only pointwise, instead of uniform, convergence, since a meaningful distance $d(f_n, f)$ cannot be defined in this case. However, this analogy is useful when we try to extend properties of sequences of points in $\mathbb{R}$ to sequences in $B(E)$.
8
Properties of Uniform Convergence
8.1
Uniform Convergence and Continuity
8.1.1
The Main Result
We now prove an important result in uniform convergence.
Theorem II.4. If fn → f uniformly and each fn is continuous, then f is also continuous.
Proof. We need to show that the limit function $f$ is continuous at $x_0$ for all $x_0 \in E$. That is,
\[ \forall \epsilon > 0\ \exists \delta > 0 \text{ s.t. } |f(x) - f(x_0)| < \epsilon \text{ if } |x - x_0| < \delta \]
Let $\epsilon$ be given. The following is a schematic of the bound we use in bounding the distance $|f(x) - f(x_0)|$:
Figure 21: Schematic of the Proof (the points $f(x)$, $f(x_0)$ on the graph of $f$ and $f_n(x)$, $f_n(x_0)$ on the graph of $f_n$)
By the triangle inequality:
\[ |f(x) - f(x_0)| \le |f(x) - f_n(x)| + |f_n(x) - f_n(x_0)| + |f_n(x_0) - f(x_0)| \]
(This is also illustrated in the figure above.) Since $f_n \to f$ uniformly, there exists $n$ such that $|f(x) - f_n(x)| < \frac{\epsilon}{3}$ for all $x$; in particular $|f_n(x_0) - f(x_0)| < \frac{\epsilon}{3}$. Fix such an $n$. By continuity of $f_n$ at $x_0$, there exists $\delta$ such that $|f_n(x) - f_n(x_0)| < \frac{\epsilon}{3}$ if $|x - x_0| < \delta$. Hence, we have produced $\delta$ such that $|f(x) - f(x_0)| < 3\left(\frac{\epsilon}{3}\right) = \epsilon$ whenever $|x - x_0| < \delta$.
The above proof illustrates a common technique when dealing with sequences of functions: we prove something about the limit function $f$ by jumping to an $f_n$ which is close to it. The proof does not work if we only have pointwise convergence, since $|f(x) - f_n(x)|$ may not be small for all $x$ in the domain simultaneously.
The above theorem also has an interpretation in the language of function spaces. Let $C(E)$ be the continuous and bounded functions on $E$. Note that $C(E) \subset B(E)$ holds. If $\{f_n\}$ is a sequence in $C(E)$ converging to a point of $B(E)$ (i.e. uniformly), then the limit $f$ must lie in $C(E)$. Hence all limit points of $C(E)$ in $B(E)$ lie in $C(E)$, so $C(E)$ is a closed subspace of $B(E)$. Since $B(E)$ is complete, $C(E)$ is also a complete space in its own right. Closedness is an interesting feature for a subspace of the infinite-dimensional normed vector space $B(E)$.
More precisely, if $V \subset W$ is a linear subspace of a normed vector space $W$, then $V$ may not be closed in $W$ if $\dim W = \infty$. Consider the class $C^1(E) \subset B(E)$, where $C^1(E)$ consists of the continuously differentiable functions. There exist $f_n$, each of which is differentiable, converging uniformly to $f$, although $f$ is not differentiable. Geometrically, this represents a sequence of points in a plane whose limit does not lie in the plane itself! This is perhaps a counter-intuitive feature of infinite-dimensional vector spaces.
We’ve established that if fn → f uniformly, and each fn is continuous, then the function in
the limit f is continuous. That is, uniform convergence of continuous functions means we have a
continuous limit function. Today we consider the converse problem. That is, if fn → f pointwise
and each fn , f is continuous, does fn → f uniformly? In other words, if we have a continuous
limit function when each fn is continuous, does the sequence converge uniformly?
Dini’s theorem answers the above problem.
8.1.2
Dini’s Theorem
Theorem II.5 (Dini). Let fn → f pointwise on a compact set K. Assume that the sequence of
functions is monotone. That is fn (x) ≥ fn+1 (x) for all x, n, or fn (x) ≤ fn+1 (x) for all x, n. If
fn and f are continuous, then fn → f uniformly.
We now give some examples to show why the two assumptions are needed in the above statement:
1 Compactness is needed. Consider the sequence of functions $f_n(x)$ defined on $\mathbb{R}$ according to the figure below:
[Figure: graph of $f_n$, with $n$ and $n+1$ marked on the $x$-axis]
These functions are continuous on $\mathbb{R}$, a non-compact set, and are monotonically decreasing since $f_{n+1}(x) \le f_n(x)$ for all $n$. Furthermore, $f_n(x) \to 0$ pointwise. But $\sup_{x\in\mathbb{R}} |f_n(x) - 0| = 1$ for all $n$, hence the convergence is not uniform.
2 Monotonicity is needed. Consider the functions on $[0, 1]$ defined as follows, as triangular waves:
[Figure: triangular wave $f_n$ of height 1 on $[0, 1]$, with $\frac{1}{n}$ marked on the $x$-axis]
The above functions, on the compact set $[0, 1]$, are not monotone, since there exist points $x$ where $f_{n+1}(x) < f_n(x)$ and other points where $f_{n+1}(x) > f_n(x)$. The functions converge pointwise to 0, but $f_n \not\to f$ uniformly since again $\sup_{x\in[0,1]} |f_n(x) - f(x)| = 1$.
Note too that a sequence of discontinuous functions may converge uniformly to a continuous function. Consider $g(x)$ defined as:
[Figure omitted: definition of $g$]
and define $f_n(x) = \frac{g(x)}{n}$. Each $f_n(x)$ is discontinuous, but $f_n(x) \Rightarrow 0$ since $\sup_x |f_n(x)| = \frac{1}{n} \sup_x |g(x)| \to 0$ as $n \to \infty$. Hence Dini’s theorem specifies a sufficient but not necessary condition for uniform convergence, given that the limit function is continuous. We will now prove the theorem.
Proof. Let $f_n \to f$ pointwise. Replace each $f_n$ by the difference $f_n - f$. Hence $f_n \to 0$ pointwise, and without loss of generality we may take the sequence $\{f_n\}$ to be monotonically decreasing (otherwise replace $f_n$ by $-f_n$), so that $f_n \ge 0$.
To prove uniform convergence, we must show that for every $\epsilon > 0$ there exists $N$ for which $|f_n(x)| = f_n(x) < \epsilon$ for all $x \in K$ and all $n \ge N$. We claim that locally:
\[ \forall y \in K,\ \exists \delta_y, n_y \text{ s.t. } f_{n_y}(x) < \epsilon \quad \forall x \text{ s.t. } |x - y| < \delta_y \]
and use compactness to get the global result given in the theorem.
Proof of Theorem Assuming Claim. Since the claim holds, the open intervals $(y - \delta_y, y + \delta_y)$ form a covering of $K$ (each $y \in K$ is the centre of an interval). Hence by compactness, there exists a finite subcover by those open intervals, meaning that there are $y_1, y_2, ..., y_m$ such that $K = I_{y_1} \cup I_{y_2} \cup ... \cup I_{y_m}$, where $I_y = \{x \in K : |x - y| < \delta_y\}$.
Take $N = \max\{n_{y_1}, n_{y_2}, ..., n_{y_m}\}$. Any $x \in K$ lies in some $I_{y_i}$, so $f_{n_{y_i}}(x) < \epsilon$, and since the sequence is decreasing, $f_n(x) \le f_{n_{y_i}}(x) < \epsilon$ for all $n \ge N$. That is, the graph of $f_n$ is guaranteed to lie under the horizontal line at height $\epsilon$. Hence $f_n(x) < \epsilon$ for all $x \in K$ and all $n \ge N$.
We now return to proving the previous claim to complete the proof of the theorem.
Proof of Claim. Fix some $y \in K$. We need to construct $\delta_y, n_y$. By the assumption of pointwise convergence at $y$, $f_n(y) \to 0$. Hence we may choose $n_y$ such that $|f_{n_y}(y)| < \frac{\epsilon}{2}$. By continuity of $f_{n_y}$ at $y$, there exists $\delta_y$ such that
\[ |f_{n_y}(x) - f_{n_y}(y)| < \frac{\epsilon}{2} \quad \forall\, |x - y| < \delta_y \]
Hence, by the triangle inequality
\[ |f_{n_y}(x)| \le |f_{n_y}(x) - f_{n_y}(y)| + |f_{n_y}(y)| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon \]
for such $\delta_y, n_y$.
Figure 22: Illustration of Proof of Claim. Given $\epsilon$, there are $n, \delta$ such that $|f_n(x)| < \epsilon$ in a $\delta$-neighbourhood of $y$.
We can now state a condition for the uniform convergence of series by applying the above theorem.
Corollary II.1. If a series $\sum_{n=0}^{\infty} f_n(x) \to f$ pointwise on a compact set $K$, each $f_n$ and $f$ is continuous, and each $f_n(x)$ is non-negative on $K$, then the series converges to $f$ uniformly.
Proof. Since each $f_n \ge 0$, the sequence of partial sums $s_n$ is monotone increasing. Each $s_n$ is continuous as a finite sum of continuous functions, $f$ is continuous, and $s_n \to f$ pointwise on the compact set $K$. Hence $s_n \Rightarrow f$ by Dini’s Theorem.
We may further apply the theorems we have just proved in constructing some strange continuous functions. If we know that fn → f uniformly and each fn is continuous, then f is
continuous. But there may be some strange behaviours that f may take.
8.1.3
Strange Functions
1. The Weierstrass Function (1872): This is defined as
\[ f_{a,b}(x) = \sum_{k=0}^{\infty} a^k \cos(b^k \pi x) \]
where $a, b$ are constants satisfying $0 < a < 1$ and $ab > 1$. The sum $f_{a,b}$ is continuous as the uniform limit of sums of smooth functions, but is nowhere differentiable and nowhere monotone! In fact it is a fractal function which may be infinitely rough.
2. Rudin presents an example of a continuous but nowhere differentiable function.
3. The Takagi Function (1901): Define $f_0$ as follows:
[Figure: graph of $f_0$ on $[-2, 2]$, with maximum value $\frac{1}{2}$]
Next, define $f_k(x) = \frac{1}{2^k} f_0(2^k x)$. Thus the following are images of $f_1, f_2$:
[Figure: graphs of $f_1$ and $f_2$ on $[-2, 2]$, with maximum values $\frac{1}{4}$ and $\frac{1}{8}$]
Figure 23: First few iterations of the Takagi Function
Define the Takagi function as $f(x) = \sum_{k=0}^{\infty} f_k(x)$. Note that $f(x)$ is continuous since it is the uniform limit of a series of continuous functions. This follows by the Weierstrass M-Test, since $|f_n(x)| \le \frac{1}{2^n} = M_n$ and $\sum M_n < \infty$ imply that the series $f$ is uniformly convergent. However, we also need to check that this is nowhere differentiable.
4. The Devil’s Staircase, or Cantor’s Staircase: Consider the following iterative process which constructs a function based on the intervals removed during the construction of the Cantor set:
[Figure: three successive stages of the construction on $[0, 1]$, with step heights such as $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$ marked]
Figure 24: First few iterations of construction
The Cantor staircase is the limit of this construction, hence we get a “staircase” of infinitely many steps in the limit. What is surprising is that $f(x)$ is continuous in the limit and $f$ is constant almost everywhere (on $[0, 1] \setminus C$), but still rises from 0 to 1.
To see that the Cantor staircase is continuous, we may also construct it as a sequence of
piecewise linear functions as follows:
Figure 25: Alternate Construction of Cantor Staircase
Before moving to the next topic, an additional example of a continuous but nowhere differentiable function is the Koch snowflake, a fractal curve defined as $\gamma : [0, 1] \to \mathbb{R}^2$ where $\gamma(t) = (\gamma_1(t), \gamma_2(t))$. Each of $\gamma_1, \gamma_2$ is continuous but nowhere differentiable.
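The uniform convergence of the Takagi series can be seen numerically through its partial sums. The sketch below is hedged: the notes define $f_0$ only by a figure, so the choice $f_0(x) = $ distance from $x$ to the nearest integer (maximum value $\frac{1}{2}$) is an assumption, and `takagi_partial` is a hypothetical helper name.

```python
def f0(x):
    """Assumed base function: distance from x to the nearest integer (max 1/2)."""
    return abs(x - round(x))


def takagi_partial(n, x):
    """s_n(x) = sum_{k=0}^{n} f0(2^k x) / 2^k."""
    return sum(f0(2 ** k * x) / 2 ** k for k in range(n + 1))


xs = [i / 997 for i in range(998)]     # sample points in [0, 1]
gap = max(abs(takagi_partial(20, x) - takagi_partial(5, x)) for x in xs)
# The M-Test tail bound: sum_{k > 5} 2^{-k} = 2^{-5}, independent of x.
print(gap, 2.0 ** -5)
```

Under this assumed $f_0$, the gap between the 5th and 20th partial sums stays below the tail bound $2^{-5}$ at every sample point, as the M-Test predicts.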
8.2
Uniform Convergence and Integration
Theorem II.6. Let $\alpha$ be non-decreasing, $f_n$ be defined on $[a, b]$, $f_n \to f$ uniformly and each $f_n \in \mathcal{R}(\alpha)$. Then $f \in \mathcal{R}(\alpha)$ and
\[ \int_a^b f\,d\alpha = \lim_{n\to\infty} \int_a^b f_n\,d\alpha. \]
We’ve already seen an example where the limit of the integrals is not equal to the integral of the limit. This is the “triangle wave” example introduced in the first class when we discussed uniform convergence, wherein
\[ \lim_{n\to\infty} \int_0^1 f_n\,dx = 1 \ne \int_0^1 f\,dx = 0. \]
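Proof. We need to prove that for all $\epsilon > 0$, there exists a partition $P$ for which $U(P, f, \alpha) - L(P, f, \alpha) < \epsilon$. Bound this difference, using the triangle inequality, by:
\[ \begin{aligned} |U(P, f, \alpha) - L(P, f, \alpha)| \le{} & |U(P, f, \alpha) - U(P, f_n, \alpha)| \\ & + |U(P, f_n, \alpha) - L(P, f_n, \alpha)| \\ & + |L(P, f_n, \alpha) - L(P, f, \alpha)| \end{aligned} \]
Firstly, choose $n$ such that $|f(x) - f_n(x)| < \frac{\epsilon}{3(\alpha(b) - \alpha(a))}$ for all $x$, which exists by uniform convergence. For every $P$, writing $M_i = \sup f(x)$ and $M_i^* = \sup f_n(x)$ over the $i$th subinterval of $P$:
\[ \begin{aligned} |U(P, f, \alpha) - U(P, f_n, \alpha)| &= \left|\sum_i (M_i - M_i^*)\,\Delta\alpha_i\right| \\ &\le \sum_i |M_i - M_i^*|\,\Delta\alpha_i \\ &\le \frac{\epsilon}{3(\alpha(b) - \alpha(a))} \sum_i \Delta\alpha_i \\ &= \frac{\epsilon}{3} \end{aligned} \]
A similar proof shows that $|L(P, f, \alpha) - L(P, f_n, \alpha)| \le \frac{\epsilon}{3}$ for all partitions and such $n$. Now, choose $P$ such that $|U(P, f_n, \alpha) - L(P, f_n, \alpha)| < \frac{\epsilon}{3}$, which is possible since $f_n$ is integrable.
Hence, $U(P, f, \alpha) - L(P, f, \alpha) < 3\left(\frac{\epsilon}{3}\right) = \epsilon$ and $f \in \mathcal{R}(\alpha)$.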
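To prove that the integral of the limit is equal to the limit of the integrals, we may prove that for every $\eta > 0$ there exists $N$ such that for all $n \ge N$,
\[ \left|\int_a^b f\,d\alpha - \int_a^b f_n\,d\alpha\right| < \eta \]
by appealing to upper and lower sums: the same estimates give $\left|\int_a^b f\,d\alpha - \int_a^b f_n\,d\alpha\right| \le \sup_x |f(x) - f_n(x)|\,(\alpha(b) - \alpha(a))$, and the supremum tends to 0. Hence in the limit, the integrals are equal.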
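The bound of the integral by the sup-norm times the length of the interval can be checked on the sequence $f_n(x) = \frac{x^n}{n}$ from earlier, with $\alpha(x) = x$ on $[0, 1]$ and limit function 0. This is a minimal sketch; `midpoint_integral` is a hypothetical helper and the midpoint rule is only an approximation of the integral.

```python
def midpoint_integral(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


for n in (1, 5, 50):
    integral = midpoint_integral(lambda x: x ** n / n, 0.0, 1.0)
    sup_norm = 1.0 / n                 # sup of x^n / n on [0, 1], attained at x = 1
    # |integral of f_n - integral of 0| <= sup_norm * (b - a), with b - a = 1 here.
    print(n, integral, sup_norm * (1.0 - 0.0))
```

The exact value of the integral is $\frac{1}{n(n+1)}$, which indeed lies below $\frac{1}{n}$ and tends to 0.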
8.2.1
Application to Function Spaces
Recall that $B[a, b]$ denotes the set of all bounded functions on $[a, b]$, with the norm $\|f\| = \sup_x |f(x)|$. The distance between two functions $f, g$ in $B[a, b]$ is $\|f - g\|$, and uniform convergence is the same as convergence with respect to this norm.
What the above theorem implies is that the subspace $\mathcal{R}(\alpha) \subset B[a, b]$ is closed in $B[a, b]$: if $f_n$ is a sequence of functions in $\mathcal{R}(\alpha)$ for which $f_n \to f$ uniformly, then $f \in \mathcal{R}(\alpha)$. Hence the space $\mathcal{R}(\alpha)$ contains all of its limit points. We have thus far constructed two closed subspaces of $B[a, b]$, namely $C[a, b] \subset \mathcal{R}(\alpha) \subset B[a, b]$.
Theorem II.7. Define a functional $I : \mathcal{R}(\alpha) \to \mathbb{R}$ by $f \mapsto \int_a^b f\,d\alpha$. This is a continuous functional.
Proof. The proof comes via the sequential characterization of continuity. If $f_n \to f$ in $B[a, b]$, then $I(f_n) \to I(f)$ also holds by the previous theorem about uniform convergence and integration.
We may also define some additional norms besides the norm $\|f\|$ we had on $B[a, b]$, which is also known as the $L^\infty$ norm. We may define the class of $L^p$ norms as follows:
Definition II.6 ($L^p$ norms). $\|f\|_p = \left[\int_a^b |f|^p\,dx\right]^{\frac{1}{p}}$ for $p \ge 1$
For instance, some common norms include:
• The $L^2$ norm was explored on a previous problem set and comes from the inner product. Here:
\[ \|f\|_2 = \sqrt{\int_a^b |f|^2\,dx}. \]
• The $L^1$ norm is
\[ \|f\|_1 = \int_a^b |f|\,dx. \]
Given some $L^p$ norm, the $L^p$ distance between functions $f, g$ may be defined as $\|f - g\|_p$. We may illustrate the $L^\infty$ and $L^1$ distances as follows:
Figure 26: Illustration of the $L^\infty$ and $L^1$ distances between functions. In particular, the $L^\infty$ distance $\sup |f(x) - g(x)|$ is the maximum pointwise distance between the two functions, and the $L^1$ distance $\int_a^b |f(x) - g(x)|\,dx$ is the area between the two curves.
Now we have three modes of convergence: uniform convergence ($L^\infty$ convergence), pointwise convergence, and $L^p$ convergence. We may illustrate the relationships between these modes of convergence as follows:
Figure 27: Relationships between Modes of Convergence (uniform convergence implies both $L^p$ convergence for all $p$ and pointwise convergence)
Hence, uniform convergence implies both pointwise and $L^p$ convergence for all $p$. But $L^p$ convergence implies neither pointwise nor uniform convergence, and pointwise convergence implies neither $L^p$ nor uniform convergence.
Theorem II.8. Uniform convergence implies $L^p$ convergence for all $p$.
Proof. We need to show that $\|f_n - f\|_p \to 0$ as $n \to \infty$, provided $f_n \to f$ uniformly. By uniform convergence, given $\epsilon > 0$ there exists $N$ such that $|f_n(x) - f(x)| < \epsilon$ for all $x$ and all $n \ge N$. Hence, by definition of $L^p$ distance, for $n \ge N$:
\[ \begin{aligned} \|f_n - f\|_p &= \left(\int_a^b |f_n(x) - f(x)|^p\,dx\right)^{\frac{1}{p}} \\ &\le \left(\int_a^b \epsilon^p\,dx\right)^{\frac{1}{p}} \quad \text{(by uniform convergence)} \\ &= \sqrt[p]{\epsilon^p (b - a)} \\ &= \epsilon \sqrt[p]{b - a} \end{aligned} \]
Hence, the distance $\|f_n - f\|_p$ may be made arbitrarily small given uniform convergence.
We also want to show that the converse doesn’t hold, that is, give an example where $f_n$ converges in $L^p$ but not uniformly. Here, we refer back to the triangle wave example (height 1, width $\frac{1}{n}$) where $f_n \to 0$ pointwise but not uniformly. In addition, $f_n \to 0$ in $L^1$ since:
\[ \|f_n - 0\|_1 = \int_0^1 |f_n|\,dx = \frac{1}{2} \cdot \frac{1}{n} \cdot (1) \to 0 \]
but again $f_n$ does not converge uniformly. Geometrically, we can interpret this as the area underneath the curve tending to zero, while the maximum of each function does not tend to zero.
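The contrast between the $L^1$ and $L^\infty$ distances for the triangle wave can be computed directly. In the sketch below, `tent` and `l1_norm` are hypothetical helpers, and the triangle is assumed to sit on $[0, 1/n]$ with its peak at $1/(2n)$, since the notes specify only its height and width.

```python
def tent(n, x):
    """Triangle of height 1 on [0, 1/n] (assumed position), zero elsewhere."""
    width = 1.0 / n
    if x <= 0.0 or x >= width:
        return 0.0
    half = width / 2.0
    return x / half if x <= half else (width - x) / half


def l1_norm(n, steps=20_000):
    """Midpoint-rule approximation of the L1 norm of tent(n, .) on [0, 1]."""
    h = 1.0 / steps
    return h * sum(abs(tent(n, (i + 0.5) * h)) for i in range(steps))


for n in (1, 5, 20):
    peak = tent(n, 1.0 / (2 * n))      # the sup norm, attained at the peak
    print(n, l1_norm(n), peak)
```

The first printed column of norms shrinks like $\frac{1}{2n}$, while the peak value stays at 1 for every $n$.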
8.3
Uniform Convergence and Differentiation
If a sequence of functions $f_n$ converges to $f$, where each $f_n$ is differentiable, is it the case that $f$ is also differentiable, and that $f_n' \to f'$?
We’ve already seen examples where the above two statements are false:
• The Weierstrass function, defined as $\sum_n a^n \cos(b^n \pi x)$, is a sum of smooth functions. However, the limit function is nowhere differentiable, although the partial sums converge to it uniformly.
• In the case where $f_n = \frac{x^n}{n}$ on $[0, 1]$, this sequence of functions converges uniformly to 0. However, the sequence of derivatives $f_n' = x^{n-1}$ does not converge to the derivative of 0, which is 0, since $f_n'(1) = 1$ for all $n$.
Hence, the condition that a sequence of differentiable $\{f_n\}$ converges to $f$ uniformly is not enough to ensure that $f$ itself is differentiable, or that $f_n' \to f'$. We need extra conditions like the following:
Theorem II.9. Assume $f_n$ is differentiable on $[a, b]$, $f_n \to f$ pointwise, and $f_n' \to g$ uniformly. Then $g = f'$.
Indeed, if $f_n' \to g$ uniformly, then $f_n \to f$ uniformly. It is, however, enough to replace $f_n \to f$ (convergence of the sequence of functions) with $f_n(x_0) \to L$ (convergence of a sequence of real numbers, at one point of the domain). If we attempt to recover $f$ from $f'$ by integration, $f$ is recovered only up to a constant, and hence $L$ needs to be fixed to pin down the limit function. We may restate the theorem as follows, with this weaker condition:
Theorem II.10. Assume $f_n$ is differentiable on $[a, b]$, $f_n(x_0) \to L$ for some $x_0 \in [a, b]$, and $f_n' \to g$ uniformly. Then:
1 $f_n$ converges uniformly to some function $f$.
2 $f' = g$. That is, $\lim_{n\to\infty} f_n' = (\lim_{n\to\infty} f_n)'$.
Proof. In the case where the $f_n'$ are continuous, we may apply the Fundamental Theorem of Calculus to recover $f$. We will prove the above theorem in this special case.
Let $f_n'$ be continuous and $f_n' \to g$ uniformly. Hence $g$ itself is continuous. Write
\[ f_n(x) = \int_{x_0}^{x} f_n'(t)\,dt + f_n(x_0) \]
and define
\[ f(x) = \int_{x_0}^{x} g(t)\,dt + L. \]
We will now check assertion 2, then assertion 1.
By construction, $f'(x) = g(x)$. It remains to check the first assertion. By assumption, $\lim_{n\to\infty} f_n(x_0) = L$. Next, since $f_n' \to g$ uniformly,
\[ \lim_{n\to\infty} \int_{x_0}^{x} f_n'(t)\,dt = \int_{x_0}^{x} \lim_{n\to\infty} f_n'(t)\,dt = \int_{x_0}^{x} g(t)\,dt. \]
Hence, $f_n \to f$ pointwise.
Furthermore,
\[ \begin{aligned} \sup_x |f_n(x) - f(x)| &= \sup_x \left|\int_{x_0}^{x} (f_n'(t) - g(t))\,dt + f_n(x_0) - L\right| \\ &\le \sup_x \left|\int_{x_0}^{x} |f_n'(t) - g(t)|\,dt\right| + |f_n(x_0) - L| \end{aligned} \]
By assumption, $|f_n(x_0) - L| < \epsilon$ for $n \ge N$. Since $f_n' \to g$ uniformly, $|f_n'(t) - g(t)| < \epsilon$ for all $t$ and for $n \ge M$. Hence, for $n \ge \max\{N, M\}$:
\[ \sup_x |f_n(x) - f(x)| \le \sup_x \epsilon |x - x_0| + \epsilon \le \epsilon (b - a) + \epsilon = \epsilon (b - a + 1) \]
This proves uniform convergence of $f_n$ to $f$, by the criterion for uniform convergence.
8.4
Some Counterexamples
• Suppose $f_n(x) = x + n$. Then $f_n' = 1 \to 1$ uniformly. However, the sequence $f_n$ does not converge, since the sequence $f_n(x_0)$ diverges for every $x_0$. Hence the conclusions of our theorem fail here, and this shows why the first assumption is needed.
• Suppose we are considering functions not on $[a, b]$ but on $\mathbb{R}$. Let $f_n(x) = \frac{x}{n}$. Then $f_n \to 0$ pointwise (but not uniformly), but $f_n' = \frac{1}{n} \to 0$ uniformly. Hence, the first consequence of our theorem is violated if we do not have a bounded domain (as $f_n \not\to f$ uniformly here).
– However, if we have $f_n \to f$ pointwise on $\mathbb{R}$, and $f_n' \to g$ uniformly on every interval $[a, b]$, then $f'$ exists on $\mathbb{R}$ and the convergence of $f_n$ to $f$ is uniform on every such interval. We may adapt the original proof of the theorem to this situation.
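The second counterexample can be made concrete with a few lines of Python. This is only an illustrative sketch with hypothetical helper names: it shows that $\frac{x}{n}$ equals 1 at $x = n$, so the supremum over $\mathbb{R}$ never becomes small, whereas on a bounded interval $[-R, R]$ the supremum is $\frac{R}{n}$.

```python
def f(n, x):
    """f_n(x) = x / n."""
    return x / n


def sup_on_interval(n, radius):
    """sup of |x / n| over [-radius, radius]; attained at the endpoints."""
    return radius / n


for n in (1, 10, 1000):
    # On all of R the value at x = n is always 1, so sup |f_n - 0| >= 1,
    # while on the bounded interval [-5, 5] the sup shrinks like 5 / n.
    print(n, f(n, float(n)), sup_on_interval(n, 5.0))
```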
The next two topics which will be studied are two major theorems in analysis: the Arzela-Ascoli Theorem and the Stone-Weierstrass Theorem.
9
The Arzela-Ascoli Theorem
Let $C[a, b]$ be the vector space of continuous functions on $[a, b]$ with $\|f\|_\infty = \sup_{x\in[a,b]} |f(x)|$. Is every closed and bounded set in $C[a, b]$ compact, as is the case in $\mathbb{R}^n$ by the Heine-Borel theorem?
We may restate the compactness condition in terms of sequences. Given a sequence $\{f_n\} \subset C[a, b]$ which is bounded with respect to the supremum norm (this is equivalent to the sequence being uniformly bounded, meaning that there exists $M$ such that $|f_n(x)| < M$ for all $x$ and for all $n$), does $\{f_n\}$ contain a convergent subsequence? That is, does every uniformly bounded sequence of functions contain a uniformly convergent subsequence? The answer is no in general. For instance, if the sequence $\{f_n\}$ consists of the following functions:
[Figure: graph of $f_n$ with height 1; axis labels 0 and $\frac{1}{n}$]
then the sequence of functions $\{f_n\}$ is uniformly bounded by 1, although no subsequence converges uniformly since $\sup |f_n(x)| = 1$ for all $n$. However, $\{f_n\}$ will contain a uniformly convergent subsequence if the sequence is equicontinuous.
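Definition II.7. A function $f_n$ is uniformly continuous (in $x$) if $\forall \epsilon > 0$, $\exists \delta > 0$ such that
\[ |x - y| < \delta \implies |f_n(x) - f_n(y)| < \epsilon \]
A sequence $\{f_n\}$ is equi-continuous (in $n$) if every $f_n$ is uniformly continuous and the same $\delta$ works for all $n$, that is, it is uniformly continuous in $x, n$. In terms of $\epsilon$-$\delta$, this means that $\forall \epsilon > 0$, $\exists \delta > 0$ such that
\[ |x - y| < \delta \implies (\forall n,\ |f_n(x) - f_n(y)| < \epsilon) \]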
The Arzela-Ascoli Theorem is then as follows:
Theorem II.11. If {fn } is a uniformly bounded and equi-continuous sequence of functions on
[a, b], then {fn } contains a uniformly convergent subsequence.
Firstly, a finite interval is needed. Consider the functions defined on $\mathbb{R}$ as:
[Figure: graph of $f_n$, with $n$ and $n+1$ marked on the $x$-axis]
The sequence of these functions $\{f_n\}$ is equicontinuous since their derivatives are uniformly bounded, and $f_n \to 0$ pointwise. But no subsequence converges to 0 uniformly since again we have $\sup_x |f_n(x)| = 1$, where the supremum is taken over $\mathbb{R}$.
9.1
Types of Continuity
Boundedness of the derivative is a sufficient condition for equicontinuity:
Lemma II.1. If $\{f_n\}$ are differentiable and there exists $B \ge 0$ such that $|f_n'(x)| \le B$ for all $n$ and all $x$, then $\{f_n\}$ is equicontinuous.
Proof. Note that by the Mean Value Theorem
\[ |f_n(x) - f_n(y)| = |f_n'(c)||x - y| \le B|x - y| \]
where $c \in [x, y]$. Taking $\delta = \frac{\epsilon}{B}$ (any $\delta$ works if $B = 0$) shows that if $|x - y| < \delta$, then $|f_n(x) - f_n(y)| < \epsilon$ for all $n$.
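The choice $\delta = \frac{\epsilon}{B}$ can be tested on a family with uniformly bounded derivatives. The family $f_n(x) = \frac{\sin(nx)}{n}$, with $|f_n'(x)| = |\cos(nx)| \le 1$, is a hypothetical example chosen for this sketch and does not appear in the notes.

```python
import math


def f(n, x):
    """Hypothetical example family: f_n(x) = sin(n x) / n, so |f_n'(x)| <= 1."""
    return math.sin(n * x) / n


B = 1.0                 # uniform bound on the derivatives
eps = 0.05
delta = eps / B         # the delta from the proof of the lemma

xs = [i * 0.01 for i in range(629)]        # grid on [0, 6.28]
worst = 0.0
for n in range(1, 40):
    for x in xs:
        y = x + 0.99 * delta               # a point with |x - y| < delta
        worst = max(worst, abs(f(n, x) - f(n, y)))
# One delta serves every n in the family at once.
print(worst, eps)
```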
There is a chain of implications for types of continuity:
Theorem II.12 (Types of Continuity).
• If $f'$ exists and is bounded, then $f$ is Lipschitz continuous. That is, there is $K$ such that $|f(x) - f(y)| \le K|x - y|$.
• Lipschitz continuity → Hölder continuity. That is, there are $K$ and $\alpha > 0$ for which $|f(x) - f(y)| \le K|x - y|^\alpha$.
• Hölder continuity → uniform continuity
• Uniform continuity → continuity
In the above example with the functions on $\mathbb{R}$, $\{f_n\}$ is Lipschitz continuous with $K = 1$.
We will now prove that Hölder continuity implies uniform continuity.
Proof. Let $\epsilon > 0$ be given and $f$ be a Hölder continuous function with $K, \alpha$ known. Taking $\delta = \left(\frac{\epsilon}{K}\right)^{\frac{1}{\alpha}}$, if $|x - y| < \delta$ then
\[ |f(x) - f(y)| \le K|x - y|^\alpha < K\delta^\alpha = K\left(\frac{\epsilon}{K}\right) = \epsilon \]
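The formula $\delta = (\epsilon / K)^{1/\alpha}$ can be tried on the square root, which satisfies $|\sqrt{x} - \sqrt{y}| \le |x - y|^{1/2}$ on $[0, 1]$, so $K = 1$ and $\alpha = \frac{1}{2}$. This example and the helper names `delta_for` and `worst_gap` are assumptions made for illustration; the check samples only pairs of the form $(x, x + 0.999\delta)$.

```python
import math


def delta_for(eps, K, alpha):
    """The delta from the proof: (eps / K) ** (1 / alpha)."""
    return (eps / K) ** (1.0 / alpha)


def worst_gap(f, delta, samples=1000):
    """Largest |f(x) - f(y)| over sampled pairs in [0, 1] with y = x + 0.999 * delta."""
    pts = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - f(min(1.0, x + 0.999 * delta))) for x in pts)


# sqrt is Hölder continuous on [0, 1] with K = 1 and alpha = 1/2.
eps = 0.1
delta = delta_for(eps, K=1.0, alpha=0.5)
print(delta, worst_gap(math.sqrt, delta), eps)
```

The largest gap occurs near $x = 0$, where the square root is steepest, and it still stays below $\epsilon$.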
9.2
Pointwise Boundedness
In the statement of the Arzela-Ascoli theorem, we need only pointwise boundedness instead of uniform boundedness for the theorem to hold.
Definition II.8. A sequence $\{f_n\}$ is pointwise bounded if for every $x$, the sequence $\{f_n(x)\}_{n=1}^{\infty}$ is bounded.
Theorem II.13. If a sequence $\{f_n\}$ is pointwise bounded and equi-continuous, then it is uniformly bounded.
Proof. Let $f_n$ be defined on $[a, b]$. Given $\epsilon > 0$, by equicontinuity of the sequence we may choose $\delta$ such that $|f_n(x) - f_n(y)| < \frac{\epsilon}{2}$ for all $n$ if $|x - y| < \delta$. Choose a partition $x_1, ..., x_m$ of $[a, b]$ fine enough that every $x \in [a, b]$ satisfies $|x - x_i| < \delta$ for some $x_i$ in the partition. By the triangle inequality,
\[ |f_n(x)| \le |f_n(x) - f_n(x_i)| + |f_n(x_i)| \]
holds. Since $|f_n(x_i)| \le M_i$ for all $n$ by pointwise boundedness at $x_i$, letting $M = \max_{i=1,...,m} M_i$ yields a uniform bound of $M + \epsilon$ for $|f_n(x)|$.
We will prove the Arzela-Ascoli theorem next day.
9.3
Proof of Arzela-Ascoli
Recall the statement of the Arzela-Ascoli Theorem.
Theorem II.14. Suppose {fn } is a sequence of functions on [a, b] such that {fn } is pointwise
bounded, and {fn } is equi-continuous. Then there exists a subsequence fnk such that fnk → f
uniformly.
Recall that a sequence of functions $\{f_n\}$ is pointwise bounded if for every $x$, the set $\{f_n(x)\}$ is bounded. Furthermore, an equi-continuous sequence of functions $\{f_n\}$ which is pointwise bounded is uniformly bounded too. We will prove the theorem today.
Proof. We shall split the proof into three cases.
Case I: $\{f_n\}$ is defined on a finite set $E = \{x_1, x_2, ..., x_m\}$
Here, if $\{f_n\}$ is pointwise bounded, then there exists a convergent subsequence (and we do not need the equi-continuity condition). The vectors $(f_i(x_1), ..., f_i(x_m))$, $i = 1, 2, ...$, form a bounded sequence in $\mathbb{R}^m$. Hence, by the Heine-Borel theorem, there exists a subsequence $\{f_{n_i}\}$ which converges pointwise.
Case II: $\{f_n\}$ is defined on a countable set $E = \{x_1, x_2, ...\}$
Here, if $\{f_n\}$ is pointwise bounded, then there exists a convergent subsequence (and we again do not need the equi-continuity condition). However, we cannot resort directly to the Heine-Borel Theorem, since there exist sequences in $\mathbb{R}^\infty$ with no convergent subsequence (taking the standard basis vectors in $\mathbb{R}^\infty$, for example). However, the representations of these vectors as functions:
\[ f_i(x_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \]
produce a sequence of functions which tends to zero pointwise.
In this case, let $S = \{f_n\}$. Then the sequence $\{f_n(x_1)\} \subset \mathbb{R}$ is bounded. Hence there exists a subsequence $\{f_{1,1}, f_{1,2}, ...\}$ of $S$ for which $\{f_{1,k}(x_1)\}$ converges to some $y_1$. Call this sequence of functions $S_1 = \{f_{1,1}, f_{1,2}, ...\}$. Next we may construct $S_2$ by taking a subsequence of $S_1$ which converges when evaluated at $x_2$. Repeating this process yields sequences $S_n \subset ... \subset S_2 \subset S_1$ for which $S_n = \{f_{n,1}, f_{n,2}, ...\}$ converges on the set $\{x_1, ..., x_n\}$: $f_{n,k}(x_n) \to y_n$ as $k \to \infty$ (and $f_{n,k}(x_i) \to y_i$ at all previous points $i < n$).
We may now diagonalize, taking the sequence:
\[ L = \{f_{1,1}, f_{2,2}, ..., f_{n,n}, f_{n+1,n+1}, ...\} \]
which is a subsequence of $S$ that converges at every point $x_i \in E$: from the $i$th term onward, $L$ is a subsequence of $S_i$, and $f_{i,k}(x_i) \to y_i$ as $k \to \infty$ by construction.
Case III: $\{f_n\}$ is defined on the interval $E = [a, b]$
Consider the countable set $[a, b] \cap \mathbb{Q} = \{q_1, q_2, ...\}$. By the result of Case II, there exists a subsequence $\{f_{n_k}\}$ of $\{f_n\}$ which converges on $[a, b] \cap \mathbb{Q}$. That is, for each $j$, $f_{n_k}(q_j)$ converges to some real number as $k \to \infty$.
Claim: This subsequence $\{f_{n_k}\} = \{g_k\}$ converges pointwise on $[a, b]$ and also uniformly on that interval.
To prove the claim, we use the uniform Cauchy criterion. Let $\epsilon > 0$ be given. We need to exhibit $N$ such that $m, n \ge N$ implies $|g_n(x) - g_m(x)| < \epsilon$ for all $x$. By the triangle inequality:
\[ |g_n(x) - g_m(x)| \le |g_n(x) - g_n(q_i)| + |g_n(q_i) - g_m(q_i)| + |g_m(q_i) - g_m(x)| \]
By equicontinuity, there exists $\delta$ such that for all $x, y, n$, $|x - y| < \delta$ implies that $|g_n(x) - g_n(y)| < \frac{\epsilon}{3}$. Choose finitely many of the rational points $q_i \in [a, b] \cap \mathbb{Q}$, spaced less than $\delta$ apart, so that every $x \in [a, b]$ lies within $\delta$ of one of them. This ensures that $|g_n(x) - g_n(q_i)| < \frac{\epsilon}{3}$ and $|g_m(q_i) - g_m(x)| < \frac{\epsilon}{3}$ for that $q_i$.
Furthermore, since $g_n(q_i)$ converges by construction at each of these finitely many points, there exists $N$ such that $|g_n(q_i) - g_m(q_i)| < \frac{\epsilon}{3}$ for all of them if $m, n \ge N$.
Hence
\[ |g_n(x) - g_m(x)| \le \underbrace{|g_n(x) - g_n(q_i)|}_{\text{By Equi-Continuity}} + \underbrace{|g_n(q_i) - g_m(q_i)|}_{\text{By Construction}} + \underbrace{|g_m(q_i) - g_m(x)|}_{\text{By Equi-Continuity}} < 3\left(\frac{\epsilon}{3}\right) = \epsilon \]
9.4
Converse to Arzela-Ascoli Theorem
Recall the statement of the Arzela-Ascoli Theorem:
Theorem II.15. If {fn } is uniformly bounded and equi-continuous on [a, b], then there exists a
uniformly convergent subsequence of {fn }.
Recall that the proof involved exhibiting a subsequence fnk which converges to some f on
Q ∩ [a, b], by diagonalization. The equi-continuous condition then allows us to conclude that fnk
converges to f uniformly on [a, b]. The converse to the Arzela-Ascoli Theorem holds:
Theorem II.16. If fn → f uniformly on [a, b] and fn is continuous, then {fn } is equicontinuous
and uniformly bounded.
Proof. Uniform boundedness was proved in a homework. To prove equicontinuity, we need to show that whenever $\varepsilon > 0$ is provided, we may choose a single $\delta > 0$ which works for every $n$.
Firstly, choose $N$ such that $|f_n(x) - f(x)| < \frac{\varepsilon}{3}$ for all $x$ and for all $n \geq N$.
Furthermore, $f$ is continuous as a uniform limit of continuous functions, hence uniformly continuous on $[a, b]$, so we may choose $\delta_N$ such that $|f(x) - f(y)| < \frac{\varepsilon}{3}$ if $|x - y| < \delta_N$. Then by the triangle inequality:
\[ |f_n(x) - f_n(y)| \leq |f_n(x) - f(x)| + |f(x) - f(y)| + |f(y) - f_n(y)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon \]
for all $n \geq N$ and for $|x - y| < \delta_N$. Since the equi-continuity condition stipulates that we must find $\delta$ which works for all $n \in \mathbb{N}$, we may need to make it smaller. The functions $\{f_1, ..., f_{N-1}\}$ in the sequence are uniformly continuous, hence there exist $\delta_1, ..., \delta_{N-1}$ for which $|f_i(x) - f_i(y)| < \varepsilon$ for all $|x - y| < \delta_i$, for $i \in \{1, ..., N-1\}$. Hence, choosing $\delta = \min\{\delta_1, ..., \delta_{N-1}, \delta_N\}$ makes $|f_n(x) - f_n(y)| < \varepsilon$ for all $n$, if $|x - y| < \delta$.
Recall that the Heine-Borel Theorem states that a set $C \subset \mathbb{R}^n$ is compact if and only if it is closed and bounded. One interpretation of the Arzela-Ascoli Theorem is that a set $C \subset C[a, b]$ (continuous functions on $[a, b]$ with the infinity norm) is compact if and only if $C$ is closed, bounded, and equicontinuous. We may make this interpretation based on the converse we just proved, noting that compactness of a set in a metric space means that every sequence in the set has a subsequence converging to a point of the set.
Furthermore, we may define equicontinuity not just for sequences of functions but also any
arbitrary set of functions.
Definition II.9. If $C \subset C[a, b]$ is a family of functions, then it is equicontinuous if for every $\varepsilon > 0$, there exists $\delta > 0$ for which
\[ |x - y| < \delta \rightarrow |f(x) - f(y)| < \varepsilon \text{ for all } f \in C. \]
9.5
Application: Peano’s Theorem
The main application of the Arzela-Ascoli Theorem is the proof of Peano’s Theorem, which tells us about solutions of differential equations. A differential equation relates a function to its derivative(s), with some initial conditions on the function. For instance, $x' = x^2$, $x(0) = 1$, is an example of a differential equation.
Theorem II.17 (Peano’s Theorem). Let $x'(t) = f(t, x(t))$ be a differential equation with initial condition $x(t_0) = x_0$. If $f$ is continuous, then there exist $\varepsilon > 0$ and a solution $x(t)$ where $t \in [t_0, t_0 + \varepsilon]$.
1. Peano’s Theorem only stipulates that a solution exists locally, since a solution of a differential equation may approach $\infty$ in finite time. The differential equation
\[ x'(t) = x^3, \quad x(0) = 1 \]
is such an example. The above differential equation has solution $x(t) = \sqrt{\frac{1}{1 - 2t}}$, which approaches $\infty$ as $t \to \frac{1}{2}$.
2. We may not have a unique solution either. In the case of the differential equation
\[ x' = \sqrt{|x|}, \quad x(0) = 0 \]
the functions $x(t) = 0$ and $x(t) = \frac{t^2}{4}$ (for $t \geq 0$) are solutions. In the latter case, we may verify that $x' = \frac{t}{2} = \sqrt{\frac{t^2}{4}}$. In fact, we may construct infinitely many solutions by shifting the initial condition as follows:
Figure 28: Another solution of the differential equation is constructed by shifting the time where the function is first non-zero (the figure shows $\frac{t^2}{4}$ and the shifted solution $\frac{(t-1)^2}{4}$)
In the case where $x(t_0) < 0$, two different solutions may be constructed similarly, by shifting the time where the function starts growing after reaching 0.
Figure 29: Other solutions of the differential equation are constructed in this case, again by shifting
Hence, locally, there is not even a unique solution for the above differential equation! However, the Picard-Lindelöf theorem says that if $f$ is Lipschitz continuous, then there is a unique solution. That theorem does not apply here since $\sqrt{|x|}$ is not Lipschitz continuous: the slope of the function goes to $\infty$ as $x \to 0$.
The idea of the proof of Peano’s theorem is then to find a sequence of functions $x_n(t)$ which hopefully converges to a solution of the desired differential equation. For example, Euler’s Method with step size $\frac{1}{n}$ produces a piecewise linear approximation to the solution, which we hope converges to the actual solution.
Figure 30: Euler’s method produces a sequence of piecewise linear approximations $x_n(t)$ to the solution $x(t)$ of a differential equation
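As a rough illustration of the piecewise linear approximations described here, the following sketch implements Euler's method in Python. The function name `euler` and the test equation $x' = x$, $x(0) = 1$ are illustrative choices, not taken from the lecture; the well-behaved test equation is chosen so that the approximations can be compared against the known solution $e^t$.

```python
import math

def euler(f, t0, x0, t_end, n):
    """Euler's method with n equal steps on [t0, t_end].

    Returns the grid times and the values of the piecewise linear
    approximation at those times.
    """
    h = (t_end - t0) / n          # step size
    ts, xs = [t0], [x0]
    for _ in range(n):
        t, x = ts[-1], xs[-1]
        xs.append(x + h * f(t, x))  # follow the slope f(t, x) for one step
        ts.append(t + h)
    return ts, xs

# Illustrative equation (an assumption of this sketch): x' = x, x(0) = 1.
for n in (10, 100, 1000):
    _, xs = euler(lambda t, x: x, 0.0, 1.0, 1.0, n)
    print(n, xs[-1], abs(xs[-1] - math.e))
```

For this equation the error at $t = 1$ shrinks as the step size decreases; the discussion that follows explains why such convergence cannot be taken for granted in general.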
The issue is that the sequence of iterates produced by decreasing the step size in Euler’s method, that is the sequence $\{x_n(t)\}$, may not converge to the solution $x(t)$. For instance, when solving the differential equation $x' = \sqrt{|x|}$, assuming that $x(t_0) < 0$, we have two cases with Euler’s method:
Figure 31: Two cases for Euler’s Method when solving $x' = \sqrt{|x|}$
In the case presented on the left, xn (t) = 0 after a finite number of steps and the function then
becomes identically zero. Otherwise, zero is never reached and the solution becomes unbounded
(this is shown in the right case). Hence, the sequence of iterates Euler’s method produces in this
case may contain solutions of both types, and does not converge to a function. However, we may
apply Arzela-Ascoli theorem to obtain a sequence xnk which converges to a solution x. Indeed,
if f (t, x(t)) is bounded, then the slope of xn at every step in Euler’s method is bounded, and we
get an equicontinuous sequence of functions which is uniformly bounded. This is the idea of the
proof.
Proof Sketch. Instead of solving the differential equation $x' = f(t, x)$, $x(t_0) = x_0$, we will solve the integral equation:
\[ x(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\, ds \]
for a function $x(t)$. We know that $f(t, x(t))$ is continuous, hence the right hand side is differentiable and $x'(t)$ exists (and satisfies the original differential equation).
Define an operator $L\{u\}(t) = x_0 + \int_{t_0}^{t} f(s, u(s))\, ds$. Solving the integral equation is then equivalent to finding a fixed point of this operator. We will continue the proof next day.
9.5.1
Proof of Peano’s Theorem
Recall the statement of Peano’s Theorem:
Theorem II.18. Consider the differential equation $x'(t) = f(t, x(t))$ with initial condition $x(t_0) = x_0$. If $f$ is continuous, then there exists a solution $x(t)$ of the equation in some interval $[t_0, t_0 + \varepsilon]$.
Proof. Consider the integral equation
\[ x(t) = \underbrace{x_0 + \int_{t_0}^{t} f(s, x(s))\, ds}_{\text{This is the operator } L\{x\}(t)} \]
Solving the differential equation is equivalent to finding a continuous function $x(t)$ which satisfies the integral equation (that is, a fixed point of the operator $L$).
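Let $f(t, x)$ be defined and continuous for $t_0 \leq t \leq t_0 + a$ and $x_0 - b \leq x \leq x_0 + b$. Since $f(t, x)$ is continuous, there exists $M$ such that $|f(t, x)| < M$ on $D = [t_0, t_0 + a] \times [x_0 - b, x_0 + b]$.
Let $\varepsilon = \min\{a, \frac{b}{M}\}$. Define $x_n(t)$ on $[t_0, t_0 + \varepsilon]$ as follows:
\[ x_n(t) = \begin{cases} x_0 & t \in [t_0, t_0 + \frac{\varepsilon}{n}] \\ x_0 + \int_{t_0}^{t - \frac{\varepsilon}{n}} f(s, x_n(s))\, ds & t \in [t_0 + \frac{\varepsilon}{n}, t_0 + \varepsilon] \end{cases} \]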
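Divide the interval $[t_0, t_0 + \varepsilon]$ into $n$ equal pieces. The claim is that this construction enables the part of $x_n(t)$ on $[t_0 + \frac{i\varepsilon}{n}, t_0 + \frac{(i+1)\varepsilon}{n}]$ to be determined by the part on the previous piece $[t_0 + \frac{(i-1)\varepsilon}{n}, t_0 + \frac{i\varepsilon}{n}]$. We can see this by noting that:
\[ x_n'(t) = \begin{cases} 0 & t \in [t_0, t_0 + \frac{\varepsilon}{n}] \\ f(t - \frac{\varepsilon}{n}, x_n(t - \frac{\varepsilon}{n})) & t \in [t_0 + \frac{\varepsilon}{n}, t_0 + \varepsilon] \end{cases} \]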
Hence the function $x_n(t)$ is determined by the initial condition, and the value of $f$ at $x_n(t - \frac{\varepsilon}{n})$, a time slightly before the current time. In other words, there is a time delay since $x_n'(t) \neq f(t, x_n(t))$, but instead $x_n'(t) = f(t - \frac{\varepsilon}{n}, x_n(t - \frac{\varepsilon}{n}))$. In a way, this construction is similar to Euler’s method since we are using information produced at an earlier time (i.e. the derivative) to construct the solution at the current time.
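We may rewrite $x_n(t)$ using operator notation as:
\[ x_n(t) = \begin{cases} x_0 & t \in [t_0, t_0 + \frac{\varepsilon}{n}] \\ L\{x_n\}(t - \frac{\varepsilon}{n}) = L_n\{x_n\}(t) & t \in [t_0 + \frac{\varepsilon}{n}, t_0 + \varepsilon] \end{cases} \]
So the $x_n$ are the fixed points of the operators $L_n$ which we have just defined.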
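To make the delayed construction concrete, here is a minimal numerical sketch in Python. It samples $x_n$ on a grid and replaces the integral by a left Riemann sum; that discretisation, the function name `delayed_approximation`, and the parameter `m` (grid points per piece) are assumptions of this sketch and not part of the proof.

```python
import math

def delayed_approximation(f, t0, x0, eps, n, m=50):
    """Sample the delayed approximation x_n on [t0, t0 + eps].

    The interval is cut into n pieces of length eps/n, each with m grid steps.
    x_n is constant on the first piece; afterwards its increment over a grid
    step uses f evaluated one full piece (eps/n) earlier. The integral is
    discretised by a left Riemann sum (an assumption of this sketch).
    """
    dt = eps / (n * m)
    ts = [t0 + j * dt for j in range(n * m + 1)]
    xs = [x0] * (m + 1)                # x_n = x0 on [t0, t0 + eps/n]
    for j in range(m + 1, n * m + 1):
        k = j - 1 - m                  # grid index of the delayed time
        xs.append(xs[-1] + dt * f(ts[k], xs[k]))
    return ts, xs

# Illustrative equation (an assumption): x' = x, x(0) = 1 on [0, 1].
for n in (5, 50, 500):
    _, xs = delayed_approximation(lambda t, x: x, 0.0, 1.0, 1.0, n, m=20)
    print(n, xs[-1], abs(xs[-1] - math.e))
```

Each value on a piece only reads values from the previous piece, which mirrors the observation that $x_n$ is determined piece by piece.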
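We will first check that $x_0 - b \leq x_n(t) \leq x_0 + b$, so that $f(s, x_n(s))$ is defined and the $x_n(t)$ are candidates for approximating a solution to the differential equation.
\[ |x_n(t) - x_0| \leq \left| \int_{t_0}^{t - \frac{\varepsilon}{n}} f(s, x_n(s))\, ds \right| \leq \underbrace{\left| t - \frac{\varepsilon}{n} - t_0 \right|}_{\text{Length of interval}} \underbrace{M}_{\text{Bound on } f(t,x)} \leq \varepsilon M \leq \underbrace{b}_{\text{By choice of } \varepsilon} \]
Hence, we have verified that the $x_n(t)$ stay in the domain where $f$ is defined. The full sequence may fail to converge ($x_n(t) \not\to x(t)$ in general), but we may apply the Arzela-Ascoli theorem to obtain a subsequence $x_{n_k} \to x$ which converges uniformly to a candidate solution $x(t)$. It remains to check the two hypotheses: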
• We have just verified that {xn } are uniformly bounded since |xn (t)| ≤ |x0 | + b.
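• $\{x_n\}$ are equicontinuous. Supposing $t_0 + \frac{\varepsilon}{n} \leq t_1 \leq t_2$, then
\[ |x_n(t_2) - x_n(t_1)| \leq \int_{t_1 - \frac{\varepsilon}{n}}^{t_2 - \frac{\varepsilon}{n}} |f(s, x_n(s))|\, ds \leq M |t_2 - t_1| \]
Since $x_n$ is constant on $[t_0, t_0 + \frac{\varepsilon}{n}]$, the same bound holds there. Since $\{x_n\}$ are uniformly Lipschitz by the above observation, they are equi-continuous.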
Hence a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ converges uniformly to some limit $x(t)$. It remains to show that the limit is a solution to the integral equation. We know that $L_{n_k}(x_{n_k}) = x_{n_k}$ and that $L_{n_k} \to L$ in some sense. Using this, we must show that for every $\eta > 0$, $|L(x)(t) - x(t)| < \eta$ for all $t$. By the triangle inequality
\[ |L(x) - x| \leq \underbrace{|L(x) - L_{n_k}(x)|}_{1} + \underbrace{|L_{n_k}(x) - L_{n_k}(x_{n_k})|}_{2} + \underbrace{|x_{n_k} - x|}_{3} \]
where we have used $L_{n_k}(x_{n_k}) = x_{n_k}$ by construction. We must show that each part is less than $\frac{\eta}{3}$ for $n_k$ sufficiently large.
Since $x_{n_k} \to x$ uniformly, we may make $|x_{n_k} - x| < \frac{\eta}{3}$ for all $t$, taking care of the third part of the inequality.
Since $L(x)$ is (uniformly) continuous in $t$, $|L(x)(t) - L(x)(t - \frac{\varepsilon}{n_k})| < \frac{\eta}{3}$ if $\frac{\varepsilon}{n_k}$ is small. Note that $L_{n_k}(x)(t) = L(x)(t - \frac{\varepsilon}{n_k})$, so this makes the first part of the inequality small.
To make the second part of the inequality small, note that $L_{n_k}$ is continuous, so $||x - x_{n_k}||_\infty$ small means that $||L_{n_k}(x) - L_{n_k}(x_{n_k})||_\infty$ is also small.
Hence we have proven that $|L(x) - x| < \eta$ for $n_k$ sufficiently large. Since $\eta$ was arbitrary, $L(x) = x$, and we have produced a solution to the differential equation.
We finally note that the Arzela-Ascoli theorem is useful in finding fixed points of operators, like we have done here. This is equivalent to minimizing the norm $||L(x) - x||_\infty$. If we have a compact set of functions $C$ and a continuous operator $L$, then $||L(x) - x||_\infty$ attains its minimum on $C$, and if that minimum is zero, we have a fixed point of $L$ in $C$. Next day, we will discuss the Stone-Weierstrass Theorem.
10
Weierstrass’ Theorem
Theorem II.19. Suppose $f$ is a continuous function on $[a, b]$. Then there exists a sequence of polynomials $P_n(x)$ such that $P_n(x) \to f(x)$ uniformly on $[a, b]$.
This is in stark contrast to the previous example of Weierstrass which we have encountered, the everywhere continuous but nowhere differentiable function, since this theorem states that any continuous function may be approximated, uniformly, by smooth functions. In the language of function spaces, $\mathbb{R}[x]$, the set of polynomials with real coefficients, is a dense subset of $C[a, b]$. For instance, a function such as $f(x) = |x|$ on $[-1, 1]$ may be approximated by a polynomial of high degree (to approximate the cusp at the origin) on that interval, but the polynomial may not approximate the function well (and may do anything) outside that interval.
10.1
Motivation for the Proof - Averaging Operators
We may interpret Weierstrass’ theorem in terms of averaging operators. Define the averaging operator $A_\delta$ (which maps functions to functions) as:
\[ A_\delta(f)(x) = \text{Average of } f \text{ on } [x - \delta, x + \delta] = \frac{1}{2\delta} \int_{x - \delta}^{x + \delta} f(t)\, dt \]
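The smoothing effect of $A_\delta$ can be checked numerically. The sketch below approximates the average with a midpoint rule; the function name `average`, the sample counts, and the choice $\delta = 0.5$ are illustrative assumptions rather than anything fixed by the notes.

```python
def average(f, delta, samples=1000):
    """Return A_delta(f): x -> mean of f over [x - delta, x + delta].

    The integral is approximated by the midpoint rule with `samples` points.
    """
    def smoothed(x):
        h = 2 * delta / samples
        total = sum(f(x - delta + (i + 0.5) * h) for i in range(samples))
        return total * h / (2 * delta)
    return smoothed

def step(t):
    """Unit step function: 0 for t < 0 and 1 for t >= 0."""
    return 1.0 if t >= 0 else 0.0

once = average(step, 0.5)                 # piecewise linear ramp
twice = average(once, 0.5, samples=200)   # piecewise quadratic

for x in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(x, once(x), twice(x))
```

Applying the operator once to the step function gives a ramp rising from 0 to 1 across $[-\delta, \delta]$, and applying it again rounds the corners of the ramp.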
The following images illustrate what happens when the averaging operator is applied to a step function.
Figure 32: Application of the averaging operator to a step function yields a piecewise linear function, then a piecewise quadratic function
It is evident from the above figure that the averaging operator is a smoothing operator. We may rewrite the averaging operator as follows. Let $g(t)$ be defined as:
\[ g(t) = \begin{cases} \frac{1}{2\delta} & t \in [-\delta, \delta] \\ 0 & \text{otherwise} \end{cases} \]
Figure 33: Definition of $g(t)$
Then, $A_\delta(f)(x)$ may be defined as $\int_{-\infty}^{\infty} f(t) g(t - x)\, dt$, the convolution of $f$ and $g$ (denoted as $f * g$). We may check that this is equal to $\int_{x - \delta}^{x + \delta} f(t) \frac{1}{2\delta}\, dt$, since $g(t - x) = \frac{1}{2\delta}$ on the interval $[x - \delta, x + \delta]$ and is zero everywhere else.
Using the convolution operator, we may define other smoothing operators, for example weighted averages, with other choices of $g$ (instead of using a function with its mass distributed evenly on its support as in the example above). The function which we use for $g$ in a smoothing operator must have bounded support, and we also need $\int_{-\infty}^{\infty} g(t)\, dt = 1$ to hold.
Here, we note that if $g(x)$ is smooth, then $f * g$ is smooth no matter what $f$ is (provided that the integral exists). Furthermore, if $g \to \delta_0$, the delta function, and $f$ is continuous, then $f * g \to f$.
Hence, our goal is to find a sequence of smooth $g_n$, approximating the delta function, to produce a sequence of smooth functions $f * g_n$, each of which approximates $f$ and which approach $f$ in the limit.
Figure 34: A sequence of smooth g which approach the delta function
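Weierstrass’ theorem adds the observation that if $g$ is a polynomial, then so is $f * g$. To see why, expand $g(t - x)$ as a polynomial in $x$ whose coefficients $a_k(t)$ depend on $t$:
\[ f * g(x) = \int_{-\infty}^{\infty} f(t)\left(x^n a_n(t) + x^{n-1} a_{n-1}(t) + \dots\right) dt = \sum_{k=0}^{n} x^k \int_{-\infty}^{\infty} f(t) a_k(t)\, dt = \sum_{k=0}^{n} b_k x^k \in \mathbb{R}[x] \]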
10.2
Proof of Weierstrass’ Theorem
It suffices to assume that $[a, b] = [0, 1]$ and that $f(0) = f(1) = 0$ (and we set $f(x) = 0$ outside $[0, 1]$). We may recover the original $f$ by adding a linear function to $f$, since there is a unique line which passes through $(0, f(0))$ and $(1, f(1))$.
Let $g_n(x) = c_n (1 - x^2)^n$, where $c_n$ is chosen such that $\int_{-1}^{1} g_n(t)\, dt = 1$. Note that this sequence of polynomials, with normalization constants chosen appropriately, approaches the delta function in the limit.
However, this is something we need to prove. We need to establish a bound on the magnitude of $c_n$ (namely $c_n < \sqrt{n}$) and show that $g_n(x) \to \delta$ in order to establish Weierstrass’ theorem. We will show the estimate on $c_n$, and the convergence, next day.
Recall the lemma we stated last day in preparation for proving the Stone-Weierstrass Theorem:
Lemma II.2. If gn are polynomials and f is any continuous function, then Pn = f ∗ gn are
polynomials.
To complete the proof of the Stone-Weierstrass theorem, we must show that $P_n \to f$ uniformly. Recall that we assume that $f$ is defined on $[0, 1]$ where $f(0) = f(1) = 0$ and $f(x) = 0$ outside of that interval. Recall that we may assume this as the original $f$ can be recovered by a linear transformation. Next, recall that $g_n(x) = c_n (1 - x^2)^n$, where $c_n$ are chosen such that $\int_{-1}^{1} g_n(x)\, dx = 1$.
Figure 35: Recall that such $g_n$ (shown for $g_1$, $g_2$, $g_3$) are bump functions, which approach the delta function
Lemma II.3. $g_n$ approaches the delta function. That is
\[ \lim_{n \to \infty} g_n(x) = \begin{cases} 0 & x \in [-1, 1] \setminus \{0\} \\ \infty & x = 0 \end{cases} \]
Lemma II.4. $c_n < \sqrt{n}$
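Proof of Lemma II.3 assuming Lemma II.4. Fixing some $x \in [-1, 1] \setminus \{0\}$ and writing $a = 1 - x^2 \in [0, 1)$, we get:
\[ \lim_{n \to \infty} g_n(x) = \lim_{n \to \infty} c_n (1 - x^2)^n = \lim_{n \to \infty} c_n a^n \leq \lim_{n \to \infty} \sqrt{n}\, a^n = 0 \]
By the Ratio Test:
\[ \frac{\sqrt{n + 1}\, a^{n+1}}{\sqrt{n}\, a^n} = \sqrt{1 + \frac{1}{n}}\; a \to a < 1 \]
Hence, the series $\sum_{n=0}^{\infty} \sqrt{n}\, a^n$ converges, and hence $\lim_{n \to \infty} \sqrt{n}\, a^n = 0$.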
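Proof of Lemma II.4. By symmetry:
\[ \frac{1}{c_n} = 2 \int_{0}^{1} (1 - x^2)^n\, dx \geq 2 \int_{0}^{\frac{1}{\sqrt{n}}} (1 - x^2)^n\, dx \]
By Bernoulli’s inequality, for $0 \leq x \leq 1$: $(1 - x^2)^n \geq 1 - nx^2$. Hence:
\[ \frac{1}{c_n} \geq 2 \int_{0}^{\frac{1}{\sqrt{n}}} (1 - nx^2)\, dx = 2\left(x - \frac{nx^3}{3}\right)\Big|_{0}^{\frac{1}{\sqrt{n}}} = 2\left(\frac{1}{\sqrt{n}} - \frac{1}{3\sqrt{n}}\right) = \frac{4}{3\sqrt{n}} \]
Hence, $\frac{1}{c_n} \geq \frac{4}{3\sqrt{n}} > \frac{1}{\sqrt{n}}$. This means that $c_n < \sqrt{n}$.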
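As a sanity check on the estimate $c_n < \sqrt{n}$, the following Python sketch computes $c_n$ exactly for a few values of $n$. It expands $(1 - x^2)^n$ with the binomial theorem and integrates term by term using exact fractions; the function name `normalising_constant` is a hypothetical name chosen for this sketch.

```python
from fractions import Fraction
from math import comb, sqrt

def normalising_constant(n):
    """c_n = 1 / integral_{-1}^{1} (1 - x^2)^n dx, computed exactly.

    Uses (1 - x^2)^n = sum_k C(n, k) (-1)^k x^(2k), and
    integral_{-1}^{1} x^(2k) dx = 2 / (2k + 1).
    """
    integral = sum(
        Fraction((-1) ** k * comb(n, k) * 2, 2 * k + 1) for k in range(n + 1)
    )
    return 1 / integral

for n in (1, 2, 5, 10, 50):
    c = normalising_constant(n)
    print(n, float(c), sqrt(n), float(c) < sqrt(n))
```

For instance $c_1 = 3/4$, since $\int_{-1}^{1} (1 - x^2)\, dx = 4/3$. The check only covers finitely many $n$, so it illustrates the lemma rather than replacing its proof.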
We have now completed the proof of the lemma, that $g_n \to \delta$ where $\delta$ denotes the delta function. We may now prove the main Stone-Weierstrass theorem: that for any continuous $f$ and for the $g_n$ we have just defined,
\[ P_n(x) = f * g_n(x) = \int_{-\infty}^{\infty} f(t) g_n(t - x)\, dt = \int_{0}^{1} f(t) g_n(t - x)\, dt \]
approaches $f$ uniformly. Note that we may also write $P_n$ as:
\[ P_n(x) = \int_{-\infty}^{\infty} f(x + t) g_n(t)\, dt = \int_{-1}^{1} f(x + t) g_n(t)\, dt \]
that is, in terms of where $g_n$ is defined. Since $x \in [0, 1]$ by assumption, $f(x + t) = 0$ if $t \notin [-1, 1]$.
Theorem II.20. Pn (x) → f (x) uniformly on [0, 1].
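Proof. Let $\varepsilon > 0$ be given. Since $\int_{-1}^{1} g_n(t)\, dt = 1$:
\[ |P_n(x) - f(x)| = \left| f(x) \int_{-1}^{1} g_n(t)\, dt - \int_{-1}^{1} f(x + t) g_n(t)\, dt \right| = \left| \int_{-1}^{1} (f(x) - f(x + t)) g_n(t)\, dt \right| \leq \int_{-1}^{1} |f(x) - f(x + t)||g_n(t)|\, dt \]
Partition $[-1, 1]$ into three intervals as follows:
\[ \underbrace{\int_{-1}^{-\delta} |f(x) - f(x + t)||g_n(t)|\, dt}_{1} + \underbrace{\int_{-\delta}^{\delta} |f(x) - f(x + t)||g_n(t)|\, dt}_{2} + \underbrace{\int_{\delta}^{1} |f(x) - f(x + t)||g_n(t)|\, dt}_{3} \]
and make each part small, so that $P_n(x) \to f(x)$ uniformly on $[0, 1]$.
The fact that part 2 is small is derived from the fact that $f$ is uniformly continuous. Hence, there exists $\delta > 0$ such that
\[ |f(x) - f(x + t)| < \varepsilon \text{ if } |t| \leq \delta \]
The fact that parts 1 and 3 are small is derived from the fact that $|g_n|$ is small for large $n$ away from the origin. (We proved that $g_n(\delta) \to 0$ as $n \to \infty$, if $\delta \neq 0$.) Hence, let $N$ be such that $g_n(\delta) < \varepsilon$ when $n \geq N$. Since $g_n$ is even and decreasing on $[0, 1]$, we then have $g_n(t) \leq g_n(\delta) < \varepsilon$ whenever $\delta \leq |t| \leq 1$.
Hence, by the bound on $g_n(t)$,
\[ \int_{-1}^{-\delta} |f(x) - f(x + t)||g_n(t)|\, dt \leq 2M\varepsilon(1 - \delta) < 2M\varepsilon \]
where $|f(x)| < M$ ($M$ is a bound for $f$). A similar argument holds to bound part 3.
Furthermore, by the uniform continuity of $f$,
\[ \int_{-\delta}^{\delta} |f(x) - f(x + t)||g_n(t)|\, dt \leq \varepsilon \int_{-\delta}^{\delta} |g_n(t)|\, dt \leq \varepsilon \int_{-1}^{1} |g_n(t)|\, dt = \varepsilon \]
Hence, $|P_n(x) - f(x)| < \varepsilon(4M + 1)$ for all $x$, where $4M + 1$ is a constant. Since $\varepsilon$ is arbitrary, the difference $|P_n(x) - f(x)|$ may be made small for all $x$, and hence $P_n(x) \to f(x)$ uniformly.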
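The convergence $P_n \to f$ can be observed numerically. The sketch below evaluates $P_n(x) = \int_0^1 f(t) g_n(t - x)\, dt$ with a midpoint rule; the quadrature, the function names `landau_polynomial`, `tent`, and `sup_error_on_grid`, and the test function are assumptions of this sketch, and the maximum error is only measured on a finite grid.

```python
def landau_polynomial(f, n, samples=2000):
    """Return x -> P_n(x) = integral_0^1 f(t) g_n(t - x) dt (midpoint rule).

    Here g_n(u) = c_n (1 - u^2)^n, with c_n found numerically so that the
    integral of g_n over [-1, 1] equals 1.
    """
    h = 1.0 / samples
    ts = [(i + 0.5) * h for i in range(samples)]            # midpoints in [0, 1]
    us = [-1.0 + (i + 0.5) * 2 * h for i in range(samples)]  # midpoints in [-1, 1]
    c_n = 1.0 / sum((1 - u * u) ** n * 2 * h for u in us)
    fs = [f(t) for t in ts]

    def P(x):
        return sum(ft * c_n * (1 - (t - x) ** 2) ** n for ft, t in zip(fs, ts)) * h

    return P

def tent(t):
    """A continuous test function with tent(0) = tent(1) = 0."""
    return 0.5 - abs(t - 0.5)

def sup_error_on_grid(n, grid=21):
    """Largest |P_n(x) - tent(x)| over an evenly spaced grid in [0, 1]."""
    P = landau_polynomial(tent, n)
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(P(x) - tent(x)) for x in xs)

for n in (10, 50, 200):
    print(n, sup_error_on_grid(n))
```

The grid error decreases as $n$ grows, though slowly, which is consistent with the kernels $g_n$ concentrating near the origin at a rate of roughly $1/\sqrt{n}$.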
Next day, we will discuss Stone’s theorem: a generalization of Weierstrass’ Theorem.
10.3
Stone’s Generalization of Weierstrass’ Theorem
Recall Weierstrass’ Theorem. We may restate the theorem in the language of function spaces as follows:
Theorem II.21. The set of polynomials $\mathbb{R}[x]$ is dense in $C[a, b]$, the set of continuous functions on $[a, b]$, with respect to the supremum norm.
This says that $\overline{\mathbb{R}[x]} = C[a, b]$, where the closure here denotes the set of uniform limits of sequences $\{f_n(x)\}$ of polynomials. We may form such a sequence of polynomials since, for every $\varepsilon > 0$, there exists a polynomial $P_n(x)$ for which $|P_n(x) - f(x)| < \varepsilon$ for every $x$ in the domain, if $f$ is continuous.
Note that Weierstrass’ Theorem is not Taylor’s theorem. Consider the function
\[ f(x) = \frac{1}{1 + x^2} \]
whose Taylor expansion $1 - x^2 + x^4 - x^6 + \dots$ converges only on $|x| < 1$. If we take our interval to be $[0, 2]$, the Taylor polynomials do not converge on all of it. However, Weierstrass’ theorem enables us to find a different sequence of polynomials $P_n(x) \in \mathbb{R}[x]$ which approximates $\frac{1}{1 + x^2}$ uniformly on $[0, 2]$.
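The failure of the Taylor polynomials outside $|x| < 1$ is easy to see numerically. This short Python sketch compares partial sums of $1 - x^2 + x^4 - \dots$ with $\frac{1}{1 + x^2}$ at one point inside and one point outside the interval of convergence; the function names and sample points are illustrative choices.

```python
def taylor_partial_sum(x, terms):
    """Sum of the first `terms` terms of 1 - x^2 + x^4 - x^6 + ..."""
    return sum((-x * x) ** k for k in range(terms))

def target(x):
    return 1.0 / (1.0 + x * x)

for x in (0.5, 1.5):
    for terms in (5, 10, 20):
        error = abs(taylor_partial_sum(x, terms) - target(x))
        print(x, terms, error)
```

At $x = 0.5$ the error shrinks rapidly, while at $x = 1.5$ it grows with the number of terms, even though $1.5$ lies in $[0, 2]$ where the function is perfectly smooth.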
Weierstrass’ theorem tells us that a certain set of functions, the polynomials, are dense in
C[a, b]. In general, if A ⊂ C[a, b], when is A dense in C[a, b]? That is, when is Ā = C[a, b]?
• Weierstrass’ theorem tells us that A = R[x] is dense in C[a, b].
• We may want to investigate if the trigonometric polynomials $A = \{\sum_{n=0}^{N} a_n \cos(nx) + b_n \sin(nx)\}$ are dense in $C[-\pi, \pi]$. This leads into the problem of investigating Fourier series.
Stone’s theorem tells us, in general, which sets of functions are dense in $C[a, b]$. While Weierstrass proved his theorem in the 1880s, Stone proved his theorem in the 1930s and 1940s. Before stating his theorem, we will make some definitions:
Definition II.10. A subset A ⊂ C(E), where E is a domain on which a function is defined and
C(E) is the set of all continuous functions on that domain, is an R-algebra if
1. A is closed under addition, subtraction and multiplication (i.e. A is a subring of C(E))
2. A is closed under scalar product with c ∈ R. That is, cf ∈ A for every f ∈ A and c ∈ R.
An R-algebra is unital if 1 ∈ A
A consequence is that every unital R-algebra contains all of the constant functions. We make
the distinction between unital and non-unital algebras based on the distinction in general ring
theory: for instance, Z is a ring with 1 but 2Z is a ring without 1. The term algebra also comes
from the corresponding term in ring theory. If A, B are rings with B ⊂ A, then A is a B-algebra.
Note too that R and R[x] are both unital R-algebras.
Definition II.11. $A \subset C(E)$ separates points if for all $x, y \in E$ with $x \neq y$, there is $f \in A$ such that $f(x) \neq f(y)$.
In addition, if $A$ is a unital algebra and separates points, then for any $x \neq y$ in the domain there exists a function $f \in A$ such that $f(x) = 0$ and $f(y) = 1$.
• The algebra R[x] separates points, by considering the function f (x) = x − x0 where x0 is
fixed.
• The trigonometric polynomials do not separate the points $-\pi$ and $\pi$, by their periodicity: $f(-\pi) = f(\pi)$. Hence $A$, the set of trigonometric polynomials, is not dense in $C[-\pi, \pi]$ since it cannot approximate a function with different values at the endpoints. The best we can do, in this case, is to have a trigonometric polynomial which takes the average value of the endpoints at each endpoint. Hence, we may not be able to approximate such functions by trigonometric polynomials which are arbitrarily close to the original function. We have a solution here once we identify the two endpoints as being essentially the same.
We may now state a few versions of Stone’s theorem.
Theorem II.22 (Stone). Let A ⊂ C[a, b] be a unital R-algebra. Then Ā = C[a, b] if and only if
A separates points in [a, b].
We may extend this theorem to complex-valued continuous functions, that is, maps $[a, b] \to \mathbb{C}$. In this case, a function in $C([a, b], \mathbb{C})$ can be written in terms of two real-valued functions as $f(x) = f_1(x) + i f_2(x)$. The definition of an $\mathbb{R}$-algebra extends to the definition of a $\mathbb{C}$-algebra.
• The algebra $A = \{\sum_{n=1}^{N} c_n e^{inx}\}$ is a $\mathbb{C}$-algebra, since $e^{inx} \cdot e^{imx} = e^{i(n+m)x}$
Theorem II.23 (Stone - Complex Version). Let $A \subset C([a, b], \mathbb{C})$ be a unital $\mathbb{C}$-algebra closed under complex conjugation. That is, if $f \in A$, then $\bar{f} \in A$. Then, $\bar{A} = C([a, b], \mathbb{C})$ if and only if $A$ separates points.
• The algebra $A = \{\sum_{n=-N}^{N} c_n e^{inx}\}$ satisfies closure under complex conjugation since $\overline{e^{inx}} = e^{-inx}$. Hence the Stone-Weierstrass theorem applies to this algebra, if we again consider $\pi$ to be the same as $-\pi$.
The version of Stone’s theorem we will first prove is the lattice version.
Definition II.12. A subset $A \subset C[a, b]$ is a lattice if it is closed under minimum and maximum operations, where the minimum and maximum are taken pointwise.
Theorem II.24 (Stone - Lattice Version). Let $A$ be a lattice in $C[a, b]$ such that for all $x, y \in [a, b]$, $x \neq y$, and for all $c, d \in \mathbb{R}$, there exists a function $f \in A$ such that $f(x) = c$ and $f(y) = d$. Then $\bar{A} = C[a, b]$.
For instance, $\mathbb{R}[x]$ is not a lattice, since the pointwise maximum or minimum of polynomials is, in general, not a polynomial, but if $A$ consists of the piecewise linear functions on $[a, b]$, then the theorem applies since $A$ is a lattice in this case.
10.4
Proof of Stone’s Theorem- The Lattice Version
Recall from last day the setting of Stone’s theorem. Let A ⊂ (C[a, b], || · ||∞ ) and Ā = {f ∈
C[a, b]|gn → f uniformly where gn ∈ A} When is Ā = C[a, b]? Equivalently, when is A dense in
C[a, b]?
Definition II.13. A ⊂ C[a, b] is a lattice if it is closed under pointwise maximum and minimum.
That is, if f, g ∈ A, max(f, g)(x) = max{f (x), g(x)} and min(f, g)(x) = min{f (x), g(x)} are both
in A.
The term lattice is derived from the corresponding term in the theory of partially ordered sets. We may define a partial order $\leq$ on $C[a, b]$ by setting $f \leq g$ if $f(x) \leq g(x)$ for all $x$. The maximum of the two functions $f, g$ is at least as large as both $f$ and $g$, and similarly, the minimum of the two functions $f, g$ is at most as large as both. Hence, their positions relative to each other “form a lattice”.
Theorem II.25 (Stone - Lattice Version). Let $A \subset C[a, b]$ be a lattice such that for any $x_1 \neq x_2$ in $[a, b]$ and $a_1, a_2 \in \mathbb{R}$, there exists $f \in A$ such that $f(x_1) = a_1$ and $f(x_2) = a_2$. Then $\bar{A} = C[a, b]$.
Note that we may replace $[a, b]$ with any compact set (in $\mathbb{R}^n$ or in a general metric space) with at least two points. This fails when we don’t have two points in our domain. For instance, if we take $a = b$, and define $A = \{f : f(a) = 1\}$, then $\bar{A} = A \neq C[a, a]$.
Proof. We need to prove that for every $f \in C[a, b]$ and $\varepsilon > 0$, there exists $h \in A$ such that $|h(x) - f(x)| < \varepsilon$ for every $x$.
Lemma II.5. Suppose $f$ and $\varepsilon$ are as given before. Fix $x_0 \in [a, b]$. Then there exists $g \in A$ such that $g(x_0) = f(x_0)$ and $g(x) > f(x) - \varepsilon$ for all $x$.
Pictorially, this means that $g$ lies above the curve $f(x) - \varepsilon$ at all points of its domain.
Proof. By assumption, for any $x_1 \in [a, b]$, there exists $h_{x_1} \in A$ such that $h_{x_1}(x_0) = f(x_0) = a_0$ and $h_{x_1}(x_1) = f(x_1) = a_1$. A candidate for $g$ is $g(x) = \max_{x_1 \in [a, b], x_1 \neq x_0} h_{x_1}(x)$, since then $g(x_1) \geq f(x_1)$ for all $x_1$ and $g(x_0) = f(x_0)$. However, a lattice only allows us to take the maximum of a finite number of functions. We now will use the compactness assumption.
Each $h_{x_1} - f$ is continuous and vanishes at $x_1$, hence there exists $\delta_{x_1} > 0$ such that $|x - x_1| < \delta_{x_1} \rightarrow h_{x_1}(x) > f(x) - \varepsilon$. Define $I_{x_1} = (x_1 - \delta_{x_1}, x_1 + \delta_{x_1})$. Construct $I_{x_1}$ for every $h_{x_1}$. Then, the sets $I_{x_1}$ form an open cover of $[a, b]$ and a finite subcover $\{I_{x_1}, ..., I_{x_n}\}$ covers $[a, b]$. Hence choosing $g(x) = \max\{h_{x_1}, h_{x_2}, ..., h_{x_n}\}$ satisfies $g(x_0) = f(x_0)$ and $g(x) > f(x) - \varepsilon$ for every $x$.
For each $x_0 \in [a, b]$, we obtain $g_{x_0}$ constructed by the lemma. We want to take $h = \min_{x_0 \in [a, b]} g_{x_0}$ to satisfy the condition that $|h(x) - f(x)| < \varepsilon$ for every $x$. Choose a finite number of $x_0$ over which to take the minimum as follows:
Each $g_{x_0} - f$ is continuous and vanishes at $x_0$. Hence there are intervals $J_{x_0} = (x_0 - \delta_{x_0}, x_0 + \delta_{x_0})$ on which $g_{x_0}(x) < f(x) + \varepsilon$, that is, on which the graph of $g_{x_0}$ lies in an $\varepsilon$-neighbourhood of the graph of $f$. The set of all $J_{x_0}$ covers $[a, b]$, hence there exists a finite subcover $[a, b] \subset J_{x_0} \cup ... \cup J_{x_n}$. Hence, taking $h = \min\{g_{x_0}, g_{x_1}, ..., g_{x_n}\}$ works, as this function lies below $f(x) + \varepsilon$ on each $J_{x_i}$ but also above $f(x) - \varepsilon$ everywhere.
10.5
Proofs of Stone-Weierstrass Theorem: Algebra Version
10.5.1
The Real Case
Recall the Stone-Weierstrass theorem stated in terms of algebras:
Theorem II.26 (Stone-Weierstrass). Suppose $A \subset C[a, b]$ is a unital $\mathbb{R}$-algebra such that $A$ separates points. Then $\bar{A} = C[a, b]$.
Recall that $A$ is an algebra if it is closed under addition, multiplication, and scalar multiplication with real numbers. The algebra is unital if the constant functions lie in $A$. Furthermore, $A$ separates points if for all $x_1 \neq x_2$ in the domain, there exists an $f \in A$ such that $f(x_1) \neq f(x_2)$.
The Stone-Weierstrass theorem allows us to claim that, for an $A$ satisfying these conditions, $\bar{A} = \{f : g_n \to f \text{ uniformly}, g_n \in A\}$, the set of all uniform limits of sequences in $A$, is $C[a, b]$.
Proof. We have previously proved the Stone-Weierstrass theorem in the case where $A$ is a lattice, but an algebra in general is not a lattice (since an algebra may not be closed under minimum and maximum operations, for example the algebra of polynomials). The idea of the proof is to reduce the case of algebras to the case of lattices.
Let us define $B = \bar{A}$. By the Weierstrass approximation theorem (hence why this theorem is called the Stone-Weierstrass theorem), we get that $B$ is a lattice. Applying the lattice version of the Stone-Weierstrass theorem yields $\bar{B} = C[a, b]$, but since $B$ is closed (as the closure of $A$), then $\bar{B} = B = \bar{A} = C[a, b]$, which was to be shown.
It remains to show that B is a lattice.
Lemma II.6. B is a unital R-algebra.
Proof. Since R ⊂ A ⊂ Ā = B, then B is unital. It remains to show that B is closed under
addition and multiplication. Let f, g ∈ B. Then f = lim fn and g = lim gn where fn , gn ∈ A.
Since f +g = lim(fn +gn ) and f g = lim fn gn , and fn +gn ∈ A and fn gn ∈ A, hence f +g, f g ∈ B.
Hence B is a unital R-algebra.
To apply the lattice version of the theorem, we need to show that:
1. B is closed under minimum and maximum.
2. For all $x_1 \neq x_2$ and for all $a_1, a_2 \in \mathbb{R}$, there exists $f \in B$ such that $f(x_1) = a_1$ and $f(x_2) = a_2$
We will first check property two.
Proof. Let $x_1 \neq x_2$ in the domain and $a_1, a_2 \in \mathbb{R}$ be given. Since $A$ separates points, there exists $g \in A$ such that $g(x_1) \neq g(x_2)$. We now construct $f(x) = c g(x) + d$, for appropriate constants $c, d$ such that $f(x_1) = a_1$ and $f(x_2) = a_2$. Note that $f \in A \subset B$ since $A$ is a unital algebra. The constants $c, d$ satisfy the system of equations:
\[ a_1 = c g(x_1) + d \]
\[ a_2 = c g(x_2) + d \]
Solving the system of equations yields $c = \frac{a_1 - a_2}{g(x_1) - g(x_2)}$ and $d = a_1 - c g(x_1)$. Hence property two is satisfied.
We now prove property one: that B is closed under minimum and maximum.
Lemma II.7. If f ∈ B, then |f | ∈ B.
The above lemma, along with the fact that $B$ is an algebra, implies the lattice property since
\[ \max(f, g) = \frac{f + g}{2} + \frac{|f - g|}{2} \]
In the case where $f \geq g$, then $\max(f, g) = \frac{f + g}{2} + \frac{f - g}{2} = \frac{2f}{2} = f$. Otherwise if $f \leq g$, then $\max(f, g) = \frac{f + g}{2} + \frac{g - f}{2} = \frac{2g}{2} = g$. Similarly
\[ \min(f, g) = \frac{f + g}{2} - \frac{|f - g|}{2} \]
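The two identities for the pointwise maximum and minimum can be checked directly. In the following Python sketch, `pointwise_max` and `pointwise_min` are hypothetical helper names that build the new functions using only addition, scaling, and absolute value, which are the operations available in $B$ once the lemma is known.

```python
def pointwise_max(f, g):
    """max(f, g) written as (f + g)/2 + |f - g|/2."""
    return lambda x: (f(x) + g(x)) / 2 + abs(f(x) - g(x)) / 2

def pointwise_min(f, g):
    """min(f, g) written as (f + g)/2 - |f - g|/2."""
    return lambda x: (f(x) + g(x)) / 2 - abs(f(x) - g(x)) / 2

# Illustrative functions (assumptions of this sketch).
f = lambda x: x * x
g = lambda x: 1 - x

upper = pointwise_max(f, g)
lower = pointwise_min(f, g)
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, upper(x), max(f(x), g(x)), lower(x), min(f(x), g(x)))
```

The printed columns agree up to rounding, which matches the case analysis on $f \geq g$ and $f \leq g$.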
We will use the Weierstrass Approximation Theorem to prove the above lemma.
Proof. Suppose $f \in B$. We must show that $|f| \in B$. Since $f$ is continuous on $[a, b]$, it is bounded, say $|f| \leq M$. Then consider $g(y) = |y|$ on $[-M, M]$. By the Weierstrass approximation theorem applied to $g(y)$, for every $\varepsilon > 0$, there exists $P_n(y) = \sum_{i=0}^{n} c_i y^i$ such that
\[ \left| \sum_{i=0}^{n} c_i y^i - |y| \right| < \varepsilon \]
for all $y \in [-M, M]$. Substituting $y = f(x)$ yields
\[ \left| \sum_{i=0}^{n} c_i f(x)^i - |f(x)| \right| < \varepsilon \]
for all $x \in [a, b]$ (since $x \in [a, b] \rightarrow f(x) \in [-M, M]$). Let $F(x) = \sum_{i=0}^{n} c_i f(x)^i$. Then $f \in B \rightarrow F \in B$ since $B$ is an algebra and is hence closed under addition, scalar multiplication with $\mathbb{R}$, and multiplication of functions.
Furthermore, there exists a sequence $\{g_n(x)\}$ of such $F(x)$ which uniformly approach $|f|$ as $\varepsilon \to 0$, by the above construction. Since $B$ is closed under uniform limits, $|f| \in B$.
Now applying the lattice version of the Stone-Weierstrass theorem to $B$ yields the Stone-Weierstrass theorem for the algebra $A$.
In Rudin, the algebra $A$ may not be a unital algebra (that is, $\mathbb{R} \not\subset A$), although $A$ is an $\mathbb{R}$-algebra. The Stone-Weierstrass theorem holds in this case if $A$ separates points and if $A$ does not vanish at any point. That is, for all $x$, there exists $f \in A$ such that $f(x) \neq 0$. Furthermore, note that the theorem holds not only when the functions are defined on the interval $[a, b]$ but also on any compact Hausdorff space $K$.
10.5.2
The Complex Case
Theorem II.27. Let $A \subset C([a, b], \mathbb{C})$. This consists of functions of the form $f(x) + ig(x)$. If $A$ is a unital $\mathbb{C}$-algebra (closure under $+, \cdot$ holds and $\mathbb{C} \subset A$), $A$ separates points, and $A$ is closed under complex conjugation ($f(x) + ig(x) \in A \rightarrow f(x) - ig(x) \in A$), then $\bar{A} = C([a, b], \mathbb{C})$.
We may call $A$ a $C^*$-algebra in this case. The complex conjugation condition is important since if $F = f + ig \in A$, then $f, g \in A$ since
\[ f = \frac{F + \bar{F}}{2} \quad \text{and} \quad g = \frac{F - \bar{F}}{2i} \]
Conversely, if $f, g \in A$, then $f + ig \in A$.
Proof. Let F = f (x) + ig(x) be in C([a, b], C) and A be an algebra which satisfies the given
properties. We will approximate the function F by elements of A. We will need to find fn → f
and gn → g for which fn , gn ∈ A uniformly approximates f, g respectively. Finding such elements
implies that fn + ign → F uniformly. Applying the real version of the Stone-Weierstrass theorem
to A ∩ C([a, b], R) completes the proof since the real and imaginary parts of the function lie in
A and are each real functions.
Part III
Power Series and Fourier Series
The next topic we will study in this class will be power series and Fourier series, as an
application of the material we have covered on uniform convergence. This will not be tested on
the next exam.
11
Power Series
Definition III.1. A power series is a sum $\sum_{n=0}^{\infty} a_n x^n$, where $a_n \in \mathbb{R}$. The centre of the power series may be shifted to $a$, giving $\sum_{n=0}^{\infty} a_n (x - a)^n$. We may recover a series of the form $\sum_{n=0}^{\infty} a_n y^n$ by making the substitution $y = x - a$.
Definition III.2. A power series has radius of convergence R, if the series converges in
(−R, R) and diverges for |x| > R. We will need to determine behaviour of the series at ±R
separately.
Theorem III.1. Let $\sum_n a_n x^n$ be a power series, and $\alpha = \limsup_{n \to \infty} \sqrt[n]{|a_n|}$. Then the radius of convergence is $R = \frac{1}{\alpha}$.
Proof. This is an application of the root test. Recall that for a series $\sum_n c_n$, the root test says that the series converges or diverges depending on $L = \limsup_{n \to \infty} \sqrt[n]{|c_n|}$. If $L < 1$, the series converges, otherwise if $L > 1$, the series diverges.
For the power series, we compute
\[ \limsup_{n \to \infty} \sqrt[n]{|a_n x^n|} = |x| \limsup_{n \to \infty} \sqrt[n]{|a_n|} = |x| \alpha = \frac{|x|}{R} \]
If $\frac{|x|}{R} < 1$, the series converges. Thus the series converges when $|x| < R$, and $R = \frac{1}{\alpha}$ is the radius of convergence. Otherwise, if $\frac{|x|}{R} > 1$, that is $|x| > R$, the series diverges.
Using the radius of convergence, we may also make a statement as to where the series converges uniformly.
Theorem III.2. The series $\sum_n a_n x^n$ converges uniformly in $[-R + \varepsilon, R - \varepsilon]$ for any $\varepsilon > 0$, provided its radius of convergence $R$ is greater than zero. If $R = \infty$, there is uniform convergence on any finite interval $[-b, b]$.
1
on (−1, 1) and converges uniformly on any
For instance, the series 1 + x + x2 + ... = 1−x
smaller closed interval. That is, for any fixed (but arbitrary small) > 0, and for every η > 0,
1
there exists N such that |Sn (x) − 1−x
| < η for every x in [−1 + , 1 − ] and for every n ≥ N
n
where Sn (x) = 1 + ... + x .
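The uniform convergence of the geometric series on a smaller closed interval can be observed numerically. The sketch below measures the largest error on a grid over $[-1+\epsilon, 1-\epsilon]$; the grid size and the value $\epsilon = 0.1$ are assumptions made for illustration, and a grid maximum only approximates the true supremum.

```python
def sup_error(n, eps, grid=2001):
    """Largest |S_n(x) - 1/(1-x)| over a grid on [-1+eps, 1-eps]."""
    lo, hi = -1.0 + eps, 1.0 - eps
    worst = 0.0
    for k in range(grid):
        x = lo + (hi - lo) * k / (grid - 1)
        partial = sum(x ** j for j in range(n + 1))  # S_n(x) = 1 + x + ... + x^n
        worst = max(worst, abs(partial - 1.0 / (1.0 - x)))
    return worst

for n in (10, 50, 200):
    print(n, sup_error(n, eps=0.1))
```

The error is largest near $x = 1 - \epsilon$, where it behaves like $(1-\epsilon)^{n+1}/\epsilon$, so a smaller $\epsilon$ needs a larger $N$.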
Proof. Apply the Weierstrass M-Test. Note that on $[-R+\epsilon, R-\epsilon]$:
$$|a_n x^n| \le |a_n| |R-\epsilon|^n.$$
Let $M_n = |a_n||R-\epsilon|^n$. Applying the root test to $\sum_n M_n$ yields
$$\limsup_{n\to\infty} \sqrt[n]{M_n} = |R-\epsilon| \limsup_{n\to\infty} \sqrt[n]{|a_n|} = \frac{|R-\epsilon|}{R} < 1.$$
Since the terms of the original series are bounded above by the terms of a convergent series, $\sum_n a_n x^n$ converges uniformly in that interval.
We may combine the above statement that the power series converges uniformly, with the
properties we have already established on uniformly convergent series.
Corollary III.1. $f(x) = \sum_n a_n x^n$ is continuous on $(-R, R)$.
Proof. The partial sums $S_n(x)$ are polynomials which converge uniformly to $f(x)$ on $[-R+\epsilon, R-\epsilon]$. Hence $f$ is continuous on $[-R+\epsilon, R-\epsilon]$. Since this holds for every $\epsilon > 0$, $f$ is continuous on $(-R, R)$.
Corollary III.2. Let $f(x) = \sum_n a_n x^n$ and let $[a, b] \subset (-R, R)$. Then
$$\int_a^b f(x)\,dx = \int_a^b \sum_n a_n x^n\,dx = \sum_n \frac{a_n x^{n+1}}{n+1}\Big|_a^b = \sum_n \frac{a_n}{n+1}\left(b^{n+1} - a^{n+1}\right)$$
since uniform convergence allows us to interchange the sum and the integral.
This enables us to define the antiderivative as a power series as follows:
$$F(x) = \int_0^x f(t)\,dt = \sum_n \frac{a_n x^{n+1}}{n+1}.$$
This is term-by-term integration.
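A quick check of term-by-term integration, sketched in Python for the geometric series ($a_n = 1$, so $f(x) = \frac{1}{1-x}$ and $F(x) = -\log(1-x)$ on $(-1,1)$). The function name and the number of terms are assumptions for this example.

```python
import math

def integrated_geometric(x, terms=200):
    """Partial sum of sum_n x^(n+1)/(n+1), the term-by-term integral of sum_n x^n."""
    return sum(x ** (n + 1) / (n + 1) for n in range(terms))

x = 0.5
print(integrated_geometric(x), -math.log(1.0 - x))
```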
Similarly, may we perform term-by-term differentiation to find a power series representation for $f'(x)$? We have established previously that additional conditions are needed for a sequence of derivatives to converge to the derivative of the limit. However, it is true that $f'(x) = \sum_{n=0}^{\infty} a_n n x^{n-1}$.
Recall that if $f_n \to f$ pointwise and $f_n' \to g$ uniformly, then $g = f'$. In the case of series, $\sum_n a_n x^n \to f$ pointwise (given), and if $\sum_n a_n n x^{n-1} \to g$ uniformly, then $g = f'$.
Theorem III.3. If $\sum_n a_n x^n$ has radius of convergence $R$, then $\sum_n a_n n x^{n-1}$ has the same radius of convergence and $f'(x) = \sum_n a_n n x^{n-1}$ on $(-R, R)$.
Proof.
$$\limsup_{n\to\infty} \sqrt[n]{|a_n n|} = \lim_{n\to\infty} \sqrt[n]{n} \cdot \limsup_{n\to\infty} \sqrt[n]{|a_n|} = (1)\alpha = \frac{1}{R}.$$
Hence $R$ is also the radius of convergence of $\sum_n a_n n x^{n-1}$. Hence, by Theorem III.2, the convergence of $\sum_n a_n n x^{n-1}$ is uniform on $[-R+\epsilon, R-\epsilon]$. This means that $f'(x) = \sum_n a_n n x^{n-1}$ on $[-R+\epsilon, R-\epsilon]$ for every $\epsilon > 0$, and hence for every $x$ in $(-R, R)$.
The above argument may be made more careful with shift-of-index arguments.
Corollary III.3. If $f(x) = \sum_n a_n x^n$, then $f(x)$ has continuous derivatives of any order for all $x \in (-R, R)$.
This comes from inductively applying the above theorem.
Definition III.3. f (x) is analytic if it is defined by a power series near every point in a domain.
For instance, $f(x) = \frac{1}{1-x}$ is analytic in $(-1, 1)$. Every analytic function is a smooth function since, by Corollary III.3, it has derivatives of every order. The above terminology comes from complex function theory (the analysis of functions $\mathbb{C} \to \mathbb{C}$), where once-differentiable complex functions are called holomorphic, but are also analytic, since a once-differentiable function on $\mathbb{C}$ has derivatives of all orders.
This is unfortunately not the case when dealing with functions of a real variable. We have a chain of inclusions in which smooth functions may fail to be analytic (and within the analytic functions we have subsets of functions such as the polynomials). An illustrative example is $f(x) = e^{-1/x^2}$ (with $f(0) = 0$), whose Taylor series at $0$ is identically zero.
11.1
Power Series Properties
Recall the following facts about power series mentioned during the previous class:
• A power series $\sum_n a_n x^n$ has an interval of convergence $(-R, R)$, and converges uniformly in $[-R+\epsilon, R-\epsilon]$.
• Similarly, $\sum_n |a_n| x^n$ converges in $(-R, R)$, since the radius of convergence is calculated as
$$\frac{1}{R} = \limsup_{n\to\infty} \sqrt[n]{|a_n|}.$$
• Furthermore, if $f(x) = \sum_n a_n x^n$, then $f^{(m)}$, the $m$th derivative, exists for every $m$ in the same interval. This implies that $f^{(m)}(0) = m!\,a_m$. Hence, if the series converges in $(-R, R)$, then the series must be the Taylor/Maclaurin series of $f$.
However, we are not guaranteed when dealing with functions of a real variable, that the
function equals its power series, even when dealing with smooth functions! This behaviour is
shown, for instance, by the function
$$f(x) = \begin{cases} e^{-1/x^2} & x \neq 0 \\ 0 & x = 0 \end{cases}$$
where $f^{(m)}(0) = 0$ for every $m$. Hence, $f(x) \neq \sum_{n=0}^{\infty} a_n x^n$ in any $(-R, R)$ where $R > 0$, since the power series is identically 0!
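The flatness of this function at the origin can be seen numerically. The Python sketch below (an illustration only, not a proof that the derivatives vanish) shows that $f(x)/x^k$ becomes tiny as $x \to 0$ for several $k$, even though $f(x) > 0$ for $x \neq 0$. The sample points and exponents are arbitrary choices.

```python
import math

def f(x):
    """The smooth but non-analytic function from the text."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# f(x) / x^k tends to 0 as x -> 0 for every k, which is consistent with all
# Taylor coefficients at 0 vanishing even though f(x) > 0 for x != 0.
for k in (1, 5, 10):
    print(k, [f(x) / x ** k for x in (0.5, 0.2, 0.1)])
```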
We may hence classify power series as divergent, convergent (and hence representing a smooth
function), and smooth functions as having a power series or not having a power series representation.
11.2
Behaviour at Endpoints
Consider $\sum_n a_n x^n$, which converges in $(-R, R)$. The series may converge at its endpoints, $\pm R$. Would $f(x)$, the function which $\sum_n a_n x^n$ represents, be continuous at $x = \pm R$, given that it converges there?
Indeed this is the case, as Abel's theorem states. As an illustrative example, consider $\sum_{n=1}^{\infty} \frac{x^n}{n}$, where $R = 1$. At $x = 1$, the series diverges since $\sum_n \frac{1}{n} = \infty$. At $x = -1$, on the other hand, the series converges (applying the alternating series test to $\sum_n \frac{(-1)^n}{n}$). Since $f'(x) = \sum_{n=0}^{\infty} x^n = \frac{1}{1-x}$, we can conclude that $f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n} = -\log(1-x)$, hence $f(-1) = -\log 2$, but $f$ goes to infinity as $x \to 1$.
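The endpoint behaviour of $\sum_{n \ge 1} x^n/n$ can be checked numerically. This Python sketch compares partial sums with $-\log(1-x)$ at the endpoint $x = -1$ and at a point close to $x = 1$; the number of terms and the sample point $0.999$ are assumptions made for illustration.

```python
import math

def partial_sum(x, terms):
    """Partial sum of sum_{n>=1} x^n / n."""
    return sum(x ** n / n for n in range(1, terms + 1))

print(partial_sum(-1.0, 100000), -math.log(2.0))            # endpoint x = -1
print(partial_sum(0.999, 100000), -math.log(1.0 - 0.999))   # near x = 1
```

At $x = -1$ the alternating series error is at most the first omitted term, so convergence is slow but certain.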
Theorem III.4 (Abel). Let $f(x) = \sum_n c_n x^n$ be a power series where $R$ is its radius of convergence. Suppose it converges at $R$ (or $-R$). Then $f(x)$ is continuous at $R$ (respectively $-R$). In other words:
$$\lim_{x\to R^-} \sum_{n=0}^{\infty} c_n x^n = \sum_{n=0}^{\infty} c_n R^n.$$
Proof. Without loss of generality, suppose $R = 1$ and $\sum_n c_n = s$. We need to prove that $\lim_{x\to 1^-} f(x) = \lim_{x\to 1^-} \sum_n c_n x^n = s$. Symbolically, this means that for every $\epsilon > 0$ there is a $\delta > 0$ such that $|f(x) - s| < \epsilon$ if $|x - 1| < \delta$ and $0 \le x < 1$. Rewrite the sum as
$$\Big|\sum_n c_n x^n - \sum_n c_n\Big| = \Big|\sum_n c_n (x^n - 1)\Big| = \Big|(x-1)\sum_{n=0}^{\infty} c_n (1 + x + x^2 + \dots + x^{n-1})\Big| = \Big|(x-1)\sum_{n=0}^{\infty} x^n (s - s_n)\Big|$$
where $s - s_n = c_{n+1} + c_{n+2} + \dots = s - \sum_{k=0}^{n} c_k$. Note that the above step relied on formally switching the order of two sums.
Hence
$$\Big|(x-1)\sum_{n=0}^{\infty} x^n (s - s_n)\Big| \le |1-x| \sum_{n=0}^{\infty} x^n |s - s_n| = (1-x)\sum_{n=0}^{N} x^n |s - s_n| + (1-x)\sum_{n=N+1}^{\infty} x^n |s - s_n|$$
where $N$ is chosen such that $|s - s_n| < \frac{\epsilon}{2}$ for $n > N$, by convergence of the sum. Hence the second sum may be bounded by
$$(1-x)\sum_{n=N+1}^{\infty} x^n |s - s_n| \le (1-x)\frac{\epsilon}{2}\sum_{n=0}^{\infty} x^n = \frac{\epsilon}{2}\cdot\frac{1-x}{1-x} = \frac{\epsilon}{2}$$
and the first sum may be bounded by
$$(1-x)\sum_{n=0}^{N} x^n |s - s_n| \le (1-x)\sum_{n=0}^{N} |s - s_n|$$
since $0 \le x < 1$. Let $\sum_{n=0}^{N} |s - s_n| < M$. Hence if $|1-x| < \frac{\epsilon}{2M}$, the first sum is bounded by $\frac{\epsilon}{2}$. Choosing $\delta = \frac{\epsilon}{2M}$ completes the proof.
A stronger version of Abel's theorem states that, when the series converges at the endpoint $R$, the convergence is uniform on $[0, R]$.
11.3
Rearrangement of Sums
The proof of Abel's theorem relied on switching two sums:
$$\sum_{n=0}^{\infty} \sum_{m=0}^{n-1} c_n x^m = \sum_{m=0}^{\infty} \sum_{n=m+1}^{\infty} c_n x^m.$$
Under what conditions may we do this?
We will first examine a case where we may not rearrange the series. Take the sequence $s_{m,n} = \frac{m}{m+n}$ as the partial sums of a double series. Then
$$\lim_{n\to\infty} \lim_{m\to\infty} \frac{m}{m+n} = \lim_{n\to\infty} 1 = 1$$
but
$$\lim_{m\to\infty} \lim_{n\to\infty} \frac{m}{m+n} = \lim_{m\to\infty} 0 = 0.$$
Hence the order of addition in the series may not be switched. The series can be reconstructed from the sequence of partial sums as $a_{11} = s_{11}$, $a_{12} = s_{12} - s_{11}$, $a_{21} = s_{21} - s_{11}$, $a_{22} = s_{22} - s_{12} - s_{21} + s_{11}$ and so on. Note that this series cannot be absolutely convergent, as the following theorem shows:
Theorem III.5. Assume $\sum_{j=1}^{\infty} |a_{ij}| = b_i$ where $\sum_i b_i$ converges. Then
$$\sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij}.$$
Compare the above with the theorem that if $\sum_{j=1}^{\infty} a_j$ converges absolutely, then we may reorder the $a_j$ and still get the same sum.
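The counterexample $s_{m,n} = \frac{m}{m+n}$ can be explored numerically. In the Python sketch below, a large value (`big`, an arbitrary choice) stands in for the inner limit; this is only a finite approximation of the two iterated limits.

```python
def s(m, n):
    """The double sequence s_{m,n} = m / (m + n) from the text."""
    return m / (m + n)

big = 10 ** 9
# Inner limit m -> infinity first (n fixed): values approach 1.
print([s(big, n) for n in (1, 10, 100)])
# Inner limit n -> infinity first (m fixed): values approach 0.
print([s(m, big) for m in (1, 10, 100)])
```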
Proof. Take $x_1, x_2, \dots$ where $x_i \in E$, which converge to $x_\infty \in E$. Define $f_i$, $g$ on $E$ as $f_i(x_n) = \sum_{j=1}^{n} a_{ij}$, $f_i(x_\infty) = \sum_{j=1}^{\infty} a_{ij}$, and $g(x) = \sum_{i=1}^{\infty} f_i(x)$. Hence
$$g(x_\infty) = \sum_{i=1}^{\infty} \underbrace{\sum_{j=1}^{\infty} a_{ij}}_{f_i(x_\infty)}.$$
On $\mathbb{Z}^2$, the 2-dimensional lattice where $i$ is on the $x$-axis and $j$ is on the $y$-axis, we may think of $f_i(x_n)$ as a finite sum of $n$ elements at $x = i$, and $f_i(x_\infty)$ as an infinite sum at $x = i$ (i.e. $f_i$ sums vertically). Furthermore, $g(x_n)$ sums the functions $f_i$ along a horizontal infinite strip of width $n$.
1. Note firstly that $f_i(x)$ is continuous at $x_\infty$, since $f_i(x_n) \to f_i(x_\infty)$ as $n \to \infty$, because the partial sums converge by the absolute convergence assumption.
2. Furthermore, each $f_i$ is bounded. Namely, $|f_i(x)| \le b_i$ for every $x$, since
$$|f_i(x)| \le \sum_j |a_{ij}| = b_i$$
by the assumption of absolute convergence.
3. Then applying the Weierstrass M-Test to the sequence $\{f_i\}$, with $M_i = b_i$, shows that $\sum_{i=1}^{\infty} f_i(x)$ converges uniformly.
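Combining the above facts: each $f_i$ is continuous at $x_\infty$ and $\sum_{i=1}^{\infty} f_i$ converges uniformly, hence $g(x) = \sum_i f_i(x)$ is continuous at $x_\infty$. $g$ being continuous means that $g(x_n) \to g(x_\infty)$ as $n \to \infty$. Hence
$$g(x_\infty) = \sum_{i=1}^{\infty} f_i(x_\infty) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \lim_{n\to\infty} g(x_n).$$
Note that
$$\lim_{n\to\infty} g(x_n) = \lim_{n\to\infty} \sum_{i=1}^{\infty} f_i(x_n) = \lim_{n\to\infty} \sum_{i=1}^{\infty} \sum_{j=1}^{n} a_{ij} = \lim_{n\to\infty} \sum_{j=1}^{n} \sum_{i=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij},$$
inductively applying the fact that $\sum_{i=1}^{\infty} (a_i + b_i) = \sum_{i=1}^{\infty} a_i + \sum_{i=1}^{\infty} b_i$. Hence
$$\lim_{n\to\infty} g(x_n) = \sum_{j=1}^{\infty} \sum_{i=1}^{\infty} a_{ij},$$
which is equal to $g(x_\infty) = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} a_{ij} = \lim_{m\to\infty} \sum_{i=1}^{m} \sum_{j=1}^{\infty} a_{ij}$.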
Note that this is actually a problem of interchanging two limit processes, as we want to prove that
$$\lim_{n\to\infty} \sum_{i=1}^{\infty} f_i(x_n) = \sum_{i=1}^{\infty} \lim_{n\to\infty} f_i(x_n).$$
11.4
Application to Taylor Series
Suppose $f$ is a function defined for $x \in E \subset \mathbb{R}$. Say $f$ is analytic if for every $a \in E$, the Taylor series of $f$ at $a$ converges to $f$ in some interval $(a-\epsilon, a+\epsilon)$ where $\epsilon > 0$.
Suppose we have a power series $\sum_{n=0}^{\infty} a_n x^n \to f(x)$ in $(-R, R)$. Is $f$ analytic in $(-R, R)$? That is, given $a \in (-R, R)$, does the Taylor series of $f(x)$ at $a$ converge to $f(x)$ in some $(a-\epsilon, a+\epsilon)$?
The answer is yes, if $|x - a| < R - |a| = \epsilon$. Hence, the function will be analytic in the whole interval.
Proof. $f(x)$ is defined as the sum $\sum_n a_n x^n$ for all $x \in (-R, R)$. Write this as
$$\sum_n a_n ((x-a) + a)^n = \sum_n a_n \sum_{m=0}^{n} \binom{n}{m} (x-a)^m a^{n-m}$$
by the binomial theorem. Grouping terms yields
$$\sum_m \Big[ \sum_{n \ge m} a_n \binom{n}{m} a^{n-m} \Big] (x-a)^m.$$
To justify switching the sums, we must prove that the double series converges absolutely. Let
$$b_n = \sum_{m=0}^{n} \binom{n}{m} |a_n| |a|^{n-m} |x-a|^m = |a_n| (|x-a| + |a|)^n.$$
Set $r = |x-a| + |a|$. Since $|x - a| < R - |a|$, we have $r < R$, and
$$\sum_n b_n = \sum_n |a_n| r^n$$
converges by the root test, since $\limsup_{n\to\infty} \sqrt[n]{|a_n| r^n} = \frac{r}{R} < 1$. Hence the rearrangement is justified by Theorem III.5, and the regrouped series is a power series in $(x-a)$ which converges to $f(x)$.
For the remainder of the course, we will complete our study of power series, and Fourier
series.
11.5
Zeros of Analytic Functions
If $f(x)$ is a polynomial of degree $n$, then $f(x)$ can have at most $n$ roots by the Fundamental Theorem of Algebra. In the case where $f$ is an analytic function, $f$ can have infinitely many roots. For instance, in the case where $f(x) = \sin x = x - \frac{x^3}{3!} + \dots$, its roots are $x = n\pi$ where $n \in \mathbb{Z}$.
The main result is that a sequence of distinct roots of an analytic function which is not identically zero cannot converge to a point inside the interval of convergence.
Theorem III.6. Let $\sum_n a_n x^n$ converge to $f(x) \not\equiv 0$ in $(-R, R)$, and let $E \subset (-R, R)$ be the set of zeroes of $f$. Then $E$ has no limit point in the interval $(-R, R)$.
We note that in the statement of the above theorem, $R$ may be infinite.
Proof. Suppose $x_0 \in (-R, R)$ is a limit point of the set of zeros of $f$. Consider the power series of $f$ centered at $x_0$, which is $f(x) = \sum_{n=0}^{\infty} b_n (x - x_0)^n$. This series converges in $(x_0 - \epsilon, x_0 + \epsilon)$ for some $\epsilon > 0$.
Since $f(x_0) = 0$, then $b_0 = 0$. Hence $f(x) = \sum_{n=k}^{\infty} b_n (x - x_0)^n = (x - x_0)^k \sum_{n=k}^{\infty} b_n (x - x_0)^{n-k}$ for some $k$, since $x_0$ is a zero of order $k$ where $b_k \neq 0$. Let $g(x) = \sum_{n=k}^{\infty} b_n (x - x_0)^{n-k}$. Since $g(x_0) = b_k \neq 0$ and $g$ is continuous, then $g(x) \neq 0$ in some interval $(x_0 - \eta, x_0 + \eta)$, for some $\eta > 0$. Hence $f(x)$ has no zeros in this interval other than $x_0$. This is a contradiction.
Note the above theorem can be used to show that certain functions do not have power series representations about a point. For instance, $f(x) = x \sin\frac{1}{x}$ has no power series at $x = 0$ since $x = 0$ is a limit point of the zeros of this function. The same can be said about the function
$$f(x) = \begin{cases} e^{-1/x^2} & x > 0 \\ 0 & x \le 0 \end{cases}$$
since $x = 0$ is again a limit point of the zeros and there are non-zero values in every neighbourhood about $x = 0$. Hence $f$ is not analytic at $x = 0$.
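The accumulation of zeros of $x \sin\frac{1}{x}$ at the origin can be made concrete: the zeros for $x > 0$ are $x_k = \frac{1}{k\pi}$, $k = 1, 2, \dots$, which tend to $0$. A small Python sketch (the cutoff of 1000 zeros is an arbitrary choice, and the function is only evaluated away from $x = 0$):

```python
import math

def f(x):
    """x * sin(1/x), evaluated only for x != 0."""
    return x * math.sin(1.0 / x)

# Zeros at x_k = 1/(k*pi) for k = 1, 2, 3, ...; they accumulate at 0.
zeros = [1.0 / (k * math.pi) for k in range(1, 1001)]
print(zeros[0], zeros[-1])
print(max(abs(f(x)) for x in zeros))   # at rounding-error level
```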
This ends our study of power series in this course. Power series become useful in complex
analysis, differential equations, algebraic geometry, and number theory.
12
Fourier Series as Orthogonal Series
12.1
The Hermitian Inner Product
Suppose $f(x)$ is defined on $[-\pi, \pi]$. Is it true that
$$f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx) \qquad (1)$$
for some constants $a_n, b_n$? This is an analogue of a power series, built from trigonometric functions instead of powers of $x$.
We will first rewrite the above series a little differently. Allow $f(x)$ to be a complex-valued function and allow $a_n, b_n \in \mathbb{C}$. The series
$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx} \qquad (2)$$
is equivalent to series (1), since $e^{i\theta}$ may be thought of as a point on the unit circle. That is: $e^{i\theta} = \cos\theta + i\sin\theta$. Hence we may obtain
$$\sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i}, \qquad \cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}$$
where $\sin\theta, \cos\theta$ are the real-valued trigonometric functions, if we let $c_n \in \mathbb{C}$ and allow the series to extend infinitely in both directions.
Furthermore, let us consider the trigonometric functions $B = \{1, \cos(nx), \sin(nx)\}_{n=1}^{\infty}$ as a "basis" for $V$, some vector space of functions. This is not really a basis, as every element of a vector space should be expressible as a finite linear combination of elements of a basis, but here we allow an element of the vector space to be written as an infinite linear combination of basis elements (a series). Likewise, $A = \{x^n\}_{n=0}^{\infty}$ is a "basis" for power series.
The basis $B$ is orthogonal, that is
$$\langle \cos(nx), \sin(mx) \rangle = \int_{-\pi}^{\pi} \cos(nx)\sin(mx)\,dx = 0$$
for all $n, m$. This may be used to compute $a_n, b_n$ in the original series (1), since these coefficients may be calculated by projections onto subspaces. Linear algebra applies to this space.
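The orthogonality relations can be checked numerically. The following Python sketch approximates the real inner product $\int_{-\pi}^{\pi} f(x) g(x)\,dx$ by a midpoint rule; the helper name `inner` and the step count are assumptions for this example, and no conjugate is needed because the functions tested are real-valued.

```python
import math

def inner(f, g, steps=2000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    return h * sum(f(-math.pi + (k + 0.5) * h) * g(-math.pi + (k + 0.5) * h)
                   for k in range(steps))

print(inner(lambda x: math.cos(2 * x), lambda x: math.sin(3 * x)))  # close to 0
print(inner(lambda x: math.cos(2 * x), lambda x: math.cos(5 * x)))  # close to 0
print(inner(lambda x: math.cos(2 * x), lambda x: math.cos(2 * x)))  # close to pi
```

The last value shows that $\cos(2x)$ has squared norm $\pi$ rather than 1, which is why normalizing factors appear when an orthonormal system is wanted.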
Before we define more rigorously what orthogonality means, we need a definition of inner product, especially if we employ the series in (2) as the series we use to define Fourier series. For $V = \mathbb{R}^n$, the inner product is $\langle \vec{v}, \vec{w} \rangle = \vec{v} \cdot \vec{w}$, where $\|\vec{v}\| = \sqrt{\vec{v} \cdot \vec{v}}$. In $V = \mathbb{C}^n$, this does not work. For instance, taking $\vec{v} = (1, i)$, we obtain $\vec{v} \cdot \vec{v} = 1^2 + i^2 = 0$, which is not the squared length of the vector.
The correct notion of an inner product, in a complex vector space, then becomes the Hermitian inner product. In a finite-dimensional vector space, we may use the dot product with a conjugate, $\langle \vec{v}, \vec{w} \rangle = \vec{v} \cdot \bar{\vec{w}} = \sum_{k=1}^{n} v_k \bar{w}_k$. In a function space for functions defined on $[-\pi, \pi]$, we may define $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$. We can see that taking the conjugate is the correct notion since here:
$$\|\vec{v}\|^2 = \langle \vec{v}, \vec{v} \rangle = \sum_{k=1}^{n} v_k \bar{v}_k = \sum_{k=1}^{n} |v_k|^2$$
where $|v_k|^2$ denotes the length squared of the complex number $v_k$. Furthermore, the inner product must satisfy certain linearity assumptions. The Hermitian inner product is linear in $\vec{v}$ since
$$\langle a\vec{v}_1 + b\vec{v}_2, \vec{w} \rangle = a\langle \vec{v}_1, \vec{w} \rangle + b\langle \vec{v}_2, \vec{w} \rangle$$
and anti-linear in $\vec{w}$ since
$$\langle \vec{v}, a\vec{w}_1 + b\vec{w}_2 \rangle = \bar{a}\langle \vec{v}, \vec{w}_1 \rangle + \bar{b}\langle \vec{v}, \vec{w}_2 \rangle.$$
Hence the inner product on $\mathbb{R}^n$ is bilinear, since it is linear in both arguments, but on $\mathbb{C}^n$ it is sesquilinear (or $1\frac{1}{2}$-linear), as this inner product is linear in the first argument but anti-linear in the second argument. Furthermore, there is a certain symmetry in the inner product, where $\langle \vec{v}, \vec{w} \rangle = \overline{\langle \vec{w}, \vec{v} \rangle}$. We may finally define what a Hermitian inner product is:
Definition III.4. A Hermitian inner product is a map $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{C}$, where $V$ is a vector space over $\mathbb{C}$, which is:
1. Sesquilinear, or linear in the first argument, and anti-linear in the second
2. Symmetric according to the conjugate ($\langle \vec{v}, \vec{w} \rangle = \overline{\langle \vec{w}, \vec{v} \rangle}$)
3. Positive definite: the quantity $\langle \vec{v}, \vec{v} \rangle$ is real, and we want $\langle \vec{v}, \vec{v} \rangle \ge 0$, with equality only for $\vec{v} = 0$.
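The three properties can be tried out on $\mathbb{C}^2$ with the vector $\vec{v} = (1, i)$ used earlier. This is a minimal Python sketch; the function names and the second vector `w` are hypothetical choices made for illustration.

```python
def naive_dot(v, w):
    """Plain dot product, with no conjugation."""
    return sum(a * b for a, b in zip(v, w))

def hermitian(v, w):
    """Hermitian inner product: linear in v, anti-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

v = [1 + 0j, 1j]
w = [2 - 1j, 3 + 0j]
print(naive_dot(v, v))    # equals 0, so it cannot serve as a squared length
print(hermitian(v, v))    # equals |1|^2 + |i|^2 = 2
print(hermitian(v, w), hermitian(w, v).conjugate())   # conjugate symmetry
```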
Next time, we will show that the basis $B$ of functions is orthogonal with respect to the inner product of functions $\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$.
12.2
Orthogonal Bases of Functions
Let $V$ be a vector space of complex-valued functions on $[-\pi, \pi]$, which may be continuous, integrable, or square-integrable (these are functions in $L^2$ space). We will not impose specific conditions on these functions now. Define a Hermitian inner product on the function space as $\langle f, g \rangle = \int_{-\pi}^{\pi} f\bar{g}\,dx$. The inner product defines orthogonality relations in this space and the norm of a function $f$ as $\|f\| = \sqrt{\langle f, f \rangle}$.
Definition III.5. A sequence $\{\varphi_n\}_{n=1}^{\infty}$ in $V$ is an orthonormal system if
$$\langle \varphi_n, \varphi_m \rangle = \begin{cases} 1 & n = m \\ 0 & n \neq m \end{cases}$$
In other words, the functions, each of length one, are pairwise orthogonal.
For any $f \in V$, write $f(x) \sim \sum_{n=1}^{\infty} c_n \varphi_n$ where $c_n = \langle f, \varphi_n \rangle = \int_{-\pi}^{\pi} f(x)\overline{\varphi_n(x)}\,dx$. The $\sim$ denotes an association of a series with $f$, and may not imply equality. (The series generated may or may not converge to $f$.) For instance, in the case of power series, we may write $e^{-1/x^2} \sim 0$, since $f(x) = e^{-1/x^2}$ has a Taylor series at $x = 0$ which is identically zero, but not equal to the function $f$ in any open interval about 0.
The above definition is motivated by orthonormal bases in $\mathbb{C}^n$. If $\{\vec{v}_1, \dots, \vec{v}_n\}$ is an orthonormal basis in $\mathbb{C}^n$, then for any $\vec{v} \in \mathbb{C}^n$, we may write $\vec{v} = \sum_{k=1}^{n} c_k \vec{v}_k$ for some constants $c_k$, where $c_k = \langle \vec{v}, \vec{v}_k \rangle$.
12.3
Examples of Orthogonal Systems
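1. The sequence of functions $\left\{ \frac{1}{\sqrt{2\pi}}, \frac{\cos(x)}{\sqrt{\pi}}, \frac{\sin(x)}{\sqrt{\pi}}, \frac{\cos(2x)}{\sqrt{\pi}}, \dots \right\}$ is an orthonormal system, which leads to the series
$$f(x) \sim a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx)$$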
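2. The sequence $\left\{ \frac{1}{\sqrt{2\pi}} e^{inx} \right\}_{n=-\infty}^{\infty}$ is an orthonormal system since
$$\left\langle \frac{e^{inx}}{\sqrt{2\pi}}, \frac{e^{imx}}{\sqrt{2\pi}} \right\rangle = \int_{-\pi}^{\pi} \frac{1}{2\pi} e^{inx}\,\overline{e^{imx}}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx} e^{-imx}\,dx = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ix(n-m)}\,dx$$
$$= \begin{cases} \frac{1}{2\pi}\int_{-\pi}^{\pi} dx = \frac{2\pi}{2\pi} = 1 & n = m \\ \frac{1}{2\pi}\frac{e^{ix(n-m)}}{i(n-m)}\Big|_{-\pi}^{\pi} = 0 & n \neq m \end{cases}$$
hence the property of being an orthonormal system is verified. Since $e^{ikx} = \cos(kx) + i\sin(kx)$ and both $\cos$, $\sin$ are periodic functions with period $2\pi$, then $e^{ikx}$ is also a periodic function with period $2\pi$. Hence the series $f(x) \sim \sum_{n=-\infty}^{\infty} a_n e^{inx}$ is also periodic with period $2\pi$.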
3. Using the inner product $\langle f, g \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx$, the sequence $\{e^{inx}\}_{n=-\infty}^{\infty}$ is an orthonormal sequence with respect to this product. The series is then written as $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$ where $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx$.
Note that Rudin calls any orthogonal series Fourier, whereas sometimes a Fourier series only
refers to the series constructed from trigonometric polynomials.
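Example III.1 (Computation of Fourier Series). Let
$$f(x) = \begin{cases} 1 & x \in [0, \pi] \\ 0 & x \in [-\pi, 0) \end{cases}$$
Expand $f$ as $f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$.
The inner product defines $c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x) e^{-inx}\,dx = \frac{1}{2\pi}\int_{0}^{\pi} e^{-inx}\,dx$.
This is $\frac{1}{2}$ when $n = 0$. Otherwise, $c_n = \frac{1}{2\pi}\frac{e^{-inx}}{-in}\Big|_0^{\pi} = -\frac{e^{-in\pi} - 1}{2\pi i n}$.
Since $e^{-in\pi} = \cos(-n\pi) + i\sin(-n\pi) = (-1)^n$, then
$$c_n = \frac{1}{2\pi i n}\left(1 - (-1)^n\right) = \begin{cases} 0 & n \text{ even} \\ \frac{1}{\pi i n} & n \text{ odd} \end{cases}$$
Hence
$$f(x) \sim \frac{1}{2} + \sum_{n \text{ odd}} \frac{1}{\pi i n} e^{inx}.$$
Since $\frac{1}{\pi i n} e^{inx} - \frac{1}{\pi i n} e^{-inx} = \frac{2}{\pi n}\cdot\frac{e^{inx} - e^{-inx}}{2i} = \frac{2\sin(nx)}{\pi n}$, the series may be expressed as
$$f(x) \sim \frac{1}{2} + \sum_{n > 0,\ n \text{ odd}} \frac{2}{\pi n}\sin(nx).$$
The series is periodic with period $2\pi$.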
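The coefficients computed in Example III.1 can be cross-checked numerically. The Python sketch below approximates $c_n = \frac{1}{2\pi}\int_0^{\pi} e^{-inx}\,dx$ with a midpoint rule and compares it with the closed form; the function names and step count are assumptions made for this example.

```python
import cmath
import math

def fourier_coefficient(n, steps=20000):
    """Midpoint-rule value of (1/2pi) * integral over [0, pi] of e^{-inx} dx."""
    h = math.pi / steps
    total = sum(cmath.exp(-1j * n * (k + 0.5) * h) for k in range(steps))
    return total * h / (2.0 * math.pi)

def closed_form(n):
    """c_0 = 1/2, c_n = 0 for even n != 0, c_n = 1/(pi i n) for odd n."""
    if n == 0:
        return 0.5
    return 0.0 if n % 2 == 0 else 1.0 / (math.pi * 1j * n)

for n in (-3, -1, 0, 1, 2, 3):
    print(n, fourier_coefficient(n), closed_form(n))
```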
Note the following plots of the first few terms of the series:
[Figure: three panels on $[-\pi, \pi]$, showing the partial sums with 1 term, 2 terms, and 3 terms.]
Figure 36: First few terms of $\frac{1}{2} + \sum_{k=0}^{\infty} \frac{2}{(2k+1)\pi}\sin((2k+1)x)$, a Fourier series for a step function, overlaid with the original function. The original function is plotted in green; the Fourier series is plotted in blue.
It can be noted from the plots above that the Fourier series does not actually converge to the desired function pointwise everywhere: at $x = 0$ and at the endpoints $x = \pm\pi$, every partial sum is identically $\frac{1}{2}$, and hence never approaches the value of the original function there. However, the Fourier series converges to $f$ in another sense: in the $L^2$ metric.
12.4
Bessel's Inequality
12.4.1
The Finite Dimensional Case
Consider an orthonormal set $S = \{\vec{v}_1, \dots, \vec{v}_m\}$ in $\mathbb{C}^n$, and a vector $\vec{v} \in \mathbb{C}^n$. Set $a_k = \langle \vec{v}, \vec{v}_k \rangle$ and consider the expansion $\vec{w} = \sum_{k=1}^{m} a_k \vec{v}_k$. In the case where $n = m$, $S$ is a basis and we know that $\vec{w} = \vec{v}$, as the space is $n$-dimensional.
Now suppose that $S$ is not a basis of $\mathbb{C}^n$. Then we call $\vec{w}$ the projection of $\vec{v}$ on $\mathrm{Span}\, S = W$. Note that:
1. $\vec{w}$ is the closest vector to $\vec{v}$ among all vectors in the vector space $W$.
2. $\|\vec{w}\| \le \|\vec{v}\|$
We shall derive an inequality from the second claim. By definition, $\|\vec{w}\|^2$ is
$$\Big\langle \sum_{k=1}^{m} a_k \vec{v}_k, \sum_{l=1}^{m} a_l \vec{v}_l \Big\rangle = \sum_k \sum_l \langle a_k \vec{v}_k, a_l \vec{v}_l \rangle.$$
Manipulating the above sum, we have
$$\sum_k \sum_l \langle a_k \vec{v}_k, a_l \vec{v}_l \rangle = \sum_k \sum_l a_k \bar{a}_l \langle \vec{v}_k, \vec{v}_l \rangle$$
and since the $\{\vec{v}_k\}$ form an orthonormal set, we have
$$\langle \vec{v}_k, \vec{v}_l \rangle = \begin{cases} 0 & k \neq l \\ 1 & k = l \end{cases}$$
Hence the above double sum reduces to the single sum:
$$\sum_{k=1}^{m} a_k \bar{a}_k = \|\vec{w}\|^2.$$
This means that the coefficients are related to the norm of the original vector by
$$\sum_{k=1}^{m} a_k \bar{a}_k \le \|\vec{v}\|^2$$
and furthermore note that the left hand side is the squared norm of the coefficient vector $(a_1, \dots, a_m) \in \mathbb{C}^m$. This is a special case of Bessel's Inequality.
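A small worked instance of the finite-dimensional inequality, sketched in Python. The orthonormal pair `S` in $\mathbb{C}^3$ and the vector `v` are hypothetical choices for illustration; the code computes the coefficients $a_k$, the projection $\vec{w}$, and the three squared norms being compared.

```python
import math

def hermitian(v, w):
    """Hermitian inner product on C^n: linear in v, anti-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

r = 1.0 / math.sqrt(2.0)
S = [[1 + 0j, 0j, 0j], [0j, r + 0j, r * 1j]]   # two orthonormal vectors in C^3
v = [1 + 2j, 3 + 0j, -1j]

a = [hermitian(v, vk) for vk in S]              # a_k = <v, v_k>
w = [sum(a[k] * S[k][j] for k in range(len(S))) for j in range(3)]  # projection of v

coeff_norm_sq = sum(abs(ak) ** 2 for ak in a)
print(coeff_norm_sq, hermitian(w, w).real, hermitian(v, v).real)
```

Here the first two numbers agree ($\sum |a_k|^2 = \|\vec{w}\|^2$) and both are smaller than $\|\vec{v}\|^2$, since $S$ spans only a 2-dimensional subspace.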
12.4.2
Orthogonal Series Case
Recall the case of Fourier series, in which we have some function space $V$ of functions defined on $[-\pi, \pi]$, and the inner product is
$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(x)\overline{g(x)}\,dx.$$
Let $\{e_n\}_{n=1}^{\infty}$ denote an orthonormal system in this case. Recall that the expansion of $f$ into an orthogonal series is denoted as
$$f \sim \sum_{n=1}^{\infty} c_n e_n$$
where $c_n = \langle f, e_n \rangle$.
Theorem III.7.
1. For any $N \ge 1$, the partial sum $s_N = \sum_{n=1}^{N} c_n e_n$ is closest to $f$ among all linear combinations $t_N = \sum_{n=1}^{N} d_n e_n$. That is to say that
$$\Big\|f - \sum_{n=1}^{N} c_n e_n\Big\| \le \Big\|f - \sum_{n=1}^{N} d_n e_n\Big\|$$
with equality if and only if $c_n = d_n$ for all $1 \le n \le N$.
2. For all $N \ge 1$: $\sum_{n=1}^{N} c_n \bar{c}_n \le \|f\|^2$ and furthermore
$$\sum_{n=1}^{\infty} c_n \bar{c}_n \le \|f\|^2.$$
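Proof.
1. Note
$$\|f - t_N\|^2 = \langle f - t_N, f - t_N \rangle = \langle f, f \rangle - \langle f, t_N \rangle - \langle t_N, f \rangle + \langle t_N, t_N \rangle.$$
Since
$$\langle f, t_N \rangle = \Big\langle f, \sum_{n=1}^{N} d_n e_n \Big\rangle = \sum_n \langle f, d_n e_n \rangle = \sum_n \bar{d}_n c_n,$$
$$\langle t_N, f \rangle = \overline{\langle f, t_N \rangle} = \sum_{n=1}^{N} d_n \bar{c}_n$$
and
$$\langle t_N, t_N \rangle = \Big\langle \sum_n d_n e_n, \sum_m d_m e_m \Big\rangle = \sum_n d_n \bar{d}_n$$
then
$$\|f - t_N\|^2 = \langle f, f \rangle - \sum_{n=1}^{N} \bar{d}_n c_n - \sum_{n=1}^{N} d_n \bar{c}_n + \sum_{n=1}^{N} d_n \bar{d}_n.$$
Furthermore, since
$$(d_n - c_n)(\bar{d}_n - \bar{c}_n) = d_n \bar{d}_n - c_n \bar{d}_n - d_n \bar{c}_n + c_n \bar{c}_n$$
then
$$\|f - t_N\|^2 = \langle f, f \rangle + \sum_{n=1}^{N} \left[(d_n - c_n)(\bar{d}_n - \bar{c}_n) - c_n \bar{c}_n\right] = \langle f, f \rangle + \sum_{n=1}^{N} (d_n - c_n)\overline{(d_n - c_n)} - \sum_{n=1}^{N} c_n \bar{c}_n.$$
Since $(d_n - c_n)\overline{(d_n - c_n)} = |d_n - c_n|^2 \ge 0$, the expression is minimized if and only if $c_n = d_n$ for all $n$, since in that case $|d_n - c_n|^2 = 0$. This proves Claim 1.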
2. Since $\|f - s_N\|^2 = \langle f, f \rangle - \sum_{n=1}^{N} c_n \bar{c}_n$, then
$$\sum_{n=1}^{N} c_n \bar{c}_n + \|f - s_N\|^2 = \|f\|^2.$$
Since the norm is always non-negative, then
$$\sum_{n=1}^{N} c_n \bar{c}_n \le \|f\|^2.$$
The sequence of partial sums $\sum_{n=1}^{N} c_n \bar{c}_n$ is then a bounded, monotone sequence. This means that as $N \to \infty$, the infinite series converges and
$$\sum_{n=1}^{\infty} c_n \bar{c}_n \le \|f\|^2.$$
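Bessel's inequality can be observed for the step function of Example III.1. The Python sketch below assumes the normalized inner product $\langle f, g \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f\bar{g}\,dx$, under which $\{e^{inx}\}$ is orthonormal, and uses the coefficients found in that example ($c_0 = \frac{1}{2}$, $c_n = \frac{1}{\pi i n}$ for odd $n$, and $0$ for other even $n$). The function name is an assumption.

```python
import math

def bessel_sum(N):
    """Sum of |c_n|^2 over |n| <= N for the step function of Example III.1."""
    total = 0.25                          # |c_0|^2 = (1/2)^2
    for n in range(1, N + 1, 2):          # odd n; c_n and c_{-n} both contribute
        total += 2.0 / (math.pi * n) ** 2
    return total

norm_sq = 0.5   # (1/2pi) * integral of |f|^2 over [-pi, pi], since f = 1 on [0, pi]
for N in (1, 11, 101, 1001):
    print(N, bessel_sum(N), norm_sq)
```

The sums increase with $N$ and stay below $\|f\|^2 = \frac{1}{2}$, as the theorem requires.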
The above inequalities allow us to regard Fourier series as a type of projection. Let $V$ be a function space on which there are some restrictions and let $W$ denote all sequences of complex numbers $\{c_n\}$. Construct a map $V \to W$ by taking $f(x)$ and mapping it to its Fourier coefficients. That is, $f \mapsto (c_n)_n = (\langle f, e_n \rangle)_n$. Both $V$, $W$ are inner product spaces, where in $V$ the inner product and norm are
$$\langle f, g \rangle = \int_{-\pi}^{\pi} f\bar{g}\,dx, \qquad \|f\| = \sqrt{\langle f, f \rangle}$$
and in $W$, the inner product and norm are
$$\langle (c_n), (d_n) \rangle = \sum_{n=1}^{\infty} c_n \bar{d}_n, \qquad \|(c_n)\| = \sqrt{\langle (c_n), (c_n) \rangle}.$$
In some sense, the $c_n$ are then the coordinates of a function in $V$ relative to the orthogonal system $\{e_i\}$, which becomes a "basis" for $V$.
The Riesz-Fischer theorem states this correspondence more rigorously.
Theorem III.8 (Riesz-Fischer). The map $V \to l^2 \subset \mathbb{C}^{\infty}$ is an isomorphism preserving the inner product, if $V$ is the $L^2$ space of square-integrable functions on $[-\pi, \pi]$ where $\|f\|^2 < \infty$, and $(e_n) = \{e^{inx}\}_{n\in\mathbb{Z}}$.
12.5
Riesz-Fischer Theorem
We will now restrict attention to trigonometric series. Let $f$ be expanded into its Fourier series
$$f \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}.$$
Recall the Riesz-Fischer theorem, which states that there is an isomorphism between the two spaces $L^2$, $l^2$ according to the map $\sim$ which takes a function and sends it to its sequence of Fourier coefficients. $L^2$-space is the set of functions $f$ on $[-\pi, \pi]$ for which $\int_{-\pi}^{\pi} |f|^2\,dx < \infty$ (although $f$ itself may be unbounded), and $l^2$-space is the set of sequences $(c_n)_n$ of complex numbers for which $\sum_n |c_n|^2 < \infty$.
The Riesz-Fischer theorem says that the two inner products correspond. That is, taking $(c_n)$, $(d_n)$ to be the Fourier coefficients of $f$, $g$ respectively, then
$$\langle f, g \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f\bar{g}\,dx = \sum_{n=-\infty}^{\infty} c_n \bar{d}_n = \langle (c_n), (d_n) \rangle$$
and furthermore, the norms correspond. That is
$$\|f\| = \|(c_n)\| = \sqrt{\sum_n c_n \bar{c}_n}.$$
We can then conclude that $\{e^{inx}\}$ is an orthonormal basis of $L^2$ space. The corresponding sequences in $l^2$ space under this isomorphism will be the coordinates of $f \in L^2$ with respect to this basis. This is the projection onto subspaces.
13
Convergence of Fourier Series
13.1
$L^2$ convergence of Fourier Series
We shall now discuss the convergence of Fourier series. Define the $N$th partial sum of the Fourier series of $f$ to be $s_N(f, x) = \sum_{n=-N}^{N} c_n e^{inx}$. The Fourier series may converge to $f$ in two ways:
1. For any ($L^2$-)integrable $f$, the Fourier series converges in $L^2$. That is, $\|f - s_N\|_2 \to 0$.
2. Pointwise convergence of the Fourier series to $f$ at a point is guaranteed by a Lipschitz condition at that point.
Before proving the convergence of Fourier series, we first note what the Stone-Weierstrass Theorem has to say about uniform approximation of functions by trigonometric polynomials.
Suppose $A = \{\sum_{n=-N}^{N} c_n e^{inx}\}$ is the set of trigonometric polynomials (each is a finite sum, where $N \ge 0$ and $c_n \in \mathbb{C}$). We claim that $A$ is (uniformly) dense in the space of continuous functions on $[-\pi, \pi]$ that are $2\pi$-periodic ($f(-\pi) = f(\pi)$).
Theorem III.9. For any $2\pi$-periodic continuous function $f$, and for every $\epsilon > 0$, there exists a trigonometric polynomial $\sum_{n=-N}^{N} c_n e^{inx}$ such that $|f(x) - \sum_{n=-N}^{N} c_n e^{inx}| < \epsilon$ for every $x$.
Proof Sketch. We need to check the following conditions:
1. $A$ is a unital $\mathbb{C}$-algebra.
2. $A$ separates points in $[-\pi, \pi]$, regarding $-\pi = \pi$. The function $e^{ix}$ works.
3. $A$ is closed under conjugation.
Hence if $f$ is continuous and $2\pi$-periodic, the Stone-Weierstrass theorem says that there exists a sequence of trigonometric polynomials $p_n$ which approach $f$ uniformly.
However, this does not guarantee that the Fourier series of $f$ converges to $f$. There was a similar situation with Taylor series. Consider the function $f(x) = e^{-1/x^2}$ (with $f(0) = 0$) defined on $[-1, 1]$. Since this is continuous, Weierstrass' theorem says that there exists a sequence of polynomials $p_n(x)$ which converge to $f$ uniformly on $[-1, 1]$. But the Taylor series of $f$ at zero, which is zero, does not converge to $f$. This is because the sequence of polynomials $p_n$ which is given to us by Weierstrass' theorem is not the sequence of partial sums of a single series. The lower order terms may change as $n \to \infty$.
However, we may use the above result in establishing the convergence of Fourier series. Our first theorem establishes $L^2$ convergence of Fourier series, for Riemann-integrable functions.
Theorem III.10 (Parseval's Theorem). Consider Riemann-integrable and $2\pi$-periodic functions $f, g$ on $[-\pi, \pi]$.
1. The Fourier series $s_N(f, x)$ converges to $f$ in $L^2$.
2. The $L^2$ inner product of functions corresponds to the $l^2$ inner product of sequences. That is, $\langle f, g \rangle = \langle (c_n), (d_n) \rangle$.
Proof of Part 1. Let $f$ be Riemann-integrable. By the homework, for every $\epsilon > 0$, there exists a continuous $h$ for which $\|f - h\|_2 < \epsilon$.
The Stone-Weierstrass theorem says that there exists a trigonometric polynomial $P$ such that $\|h - P\|_2 < \epsilon$, since $h$ is continuous. Hence by the triangle inequality: $\|f - P\|_2 < 2\epsilon$.
Suppose $P$ has degree $N$ ($P = \sum_{n=-N}^{N} d_n e^{inx}$). Then by Theorem III.7, among all trigonometric polynomials of degree $N$, the partial sum of the Fourier series $s_N(f, x)$ is closest to $f$. Hence
$$\|f - s_N\|_2 \le \|f - P\|_2 < 2\epsilon.$$
For all $M > N$, the same inequality holds. Hence in the $L^2$ metric,
$$\|f - s_M\| \le \|f - s_N\| \le \|f - P\| < 2\epsilon.$$
This shows that the Fourier series of $f$ converges in $L^2$, since for all $\epsilon$, there exists $N$ such that $\|f - s_M\| < 2\epsilon$ for all $M \ge N$. Note, however, that this theorem does not tell us anything about pointwise convergence of the series.
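The $L^2$ convergence can be watched numerically for the step function of Example III.1. This Python sketch assumes the normalized norm $\|g\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |g|^2\,dx$ and the sine form of the partial sums derived in that example; the quadrature step count and function names are arbitrary choices.

```python
import math

def step(x):
    """The step function of Example III.1 on [-pi, pi]."""
    return 1.0 if x >= 0 else 0.0

def partial_sum(x, N):
    """s_N(x) = 1/2 + sum over odd n <= N of 2 sin(nx) / (pi n)."""
    return 0.5 + sum(2.0 * math.sin(n * x) / (math.pi * n) for n in range(1, N + 1, 2))

def l2_error_sq(N, steps=4000):
    """Midpoint-rule value of (1/2pi) * integral of |f - s_N|^2 over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    xs = (-math.pi + (k + 0.5) * h for k in range(steps))
    return sum((step(x) - partial_sum(x, N)) ** 2 for x in xs) * h / (2.0 * math.pi)

for N in (1, 5, 25):
    print(N, l2_error_sq(N))
```

The squared error decreases towards zero even though the partial sums never match $f$ at the jump.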
Recall the theorem stated last class about the L2 convergence of Fourier Series.
Theorem III.11. Suppose $f, g$ are Riemann-integrable and periodic on $[-\pi, \pi]$.
1. The Fourier series of $f$ converges to $f$ in $L^2$. That is
$$\|f - s_N(f, x)\|_2 \to 0$$
as $N \to \infty$, where $s_N(f, x)$ denotes the $N$th partial sum in the Fourier series of $f$.
2. The $L^2$ inner product corresponds to the $l^2$ inner product. That is, if $f \sim \sum_n c_n e^{inx}$ and $g \sim \sum_n d_n e^{inx}$, then
$$\langle f, g \rangle = \langle (c_n), (d_n) \rangle.$$
This is known as Parseval's equality.
More generally, the above theorem is true if f, g ∈ L2 [−π, π], and not just if f, g are Riemann
integrable.
Proof Sketch for (1). If $f$ is Riemann-integrable, it may be approximated in $L^2$ by a continuous function $h$. Since $h$ is continuous, it may be approximated uniformly, by Stone-Weierstrass, by a trigonometric polynomial $t_N$. Hence
$$\|f - t_N\|_2 < \epsilon$$
in $L^2$. By Theorem III.7, we may conclude that the Fourier partial sum $s_N$ is closest to $f$ in $L^2$ among all trigonometric polynomials of degree $N$. Hence
$$\|f - s_N\| < \epsilon$$
proves $L^2$ convergence, since we have demonstrated that for any $\epsilon$, there exists $N$ such that $n \ge N$ implies $\|f - s_n\| < \epsilon$.
Proof for (2) using a Formal Computation. Note firstly that $\{e^{inx}\}$ is a set of functions which is an orthonormal basis. Then because of the $L^2$ convergence proved in (1):
$$\langle f, g \rangle = \Big\langle \sum_n c_n e^{inx}, \sum_m d_m e^{imx} \Big\rangle = \sum_n c_n \bar{d}_n \langle e^{inx}, e^{inx} \rangle = \langle (c_n), (d_n) \rangle.$$
Note, however, that the above proof assumed that
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_n c_n e^{inx}\,\overline{\sum_m d_m e^{imx}}\,dx = \sum_n \sum_m \frac{1}{2\pi}\int_{-\pi}^{\pi} c_n e^{inx}\,\overline{d_m e^{imx}}\,dx.$$
The switching of the integral and the series may not be justified here since we have an infinite series. Instead, we may repeat the above proof by taking the limit of partial sums.
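Proof. Define the partial sum $s_N(f, x) = \sum_{n=-N}^{N} c_n e^{inx}$. Then
$$\langle s_N(f, x), g \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} \sum_{n=-N}^{N} c_n e^{inx}\,\overline{g(x)}\,dx = \sum_{n=-N}^{N} c_n \underbrace{\frac{1}{2\pi}\int_{-\pi}^{\pi} e^{inx}\,\overline{g(x)}\,dx}_{\text{conjugate of the } n\text{th Fourier coefficient of } g} = \sum_{n=-N}^{N} c_n \bar{d}_n.$$
Hence as $N \to \infty$, $\langle s_N, g \rangle \to \langle (c_n), (d_n) \rangle$. To show the stated equality, we then need to show that $\langle s_N, g \rangle - \langle f, g \rangle = \langle s_N - f, g \rangle$ goes to zero as $N \to \infty$. This follows from:
$$|\langle s_N, g \rangle - \langle f, g \rangle| = |\langle s_N - f, g \rangle| \le \|s_N - f\|\,\|g\|,$$
applying the Schwarz inequality. As $\|s_N - f\| \to 0$ by $L^2$ convergence of the Fourier series and $\|g\|$ is a constant, then $\langle s_N - f, g \rangle \to 0$.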
This is a weaker form of the Riesz-Fischer theorem, which states that the map $f \mapsto (c_n)$ from the space of $L^2$-integrable functions on $[-\pi, \pi]$ to $l^2$, where $(c_n)$ is the sequence of Fourier coefficients of $f$, is an isomorphism of inner product spaces.
Furthermore, convergence for general orthogonal series depends on the completeness of the orthogonal set in question. For instance, for Fourier series, we take $\{\varphi_n\} = \{\sin(nx), \cos(nx)\}$, but if we restrict to $\{\cos(nx)\}$, only symmetric (even) functions $f$ can be expanded in a series with only cosine terms in it. Similarly, if we use $\{\sin(nx)\}$ as the orthogonal set, then only anti-symmetric (odd) functions can be expanded in a series with only sine terms.
13.2
Pointwise Convergence of Fourier Series
We have previously established $L^2$ convergence for the Fourier series, although $L^2$ convergence does not guarantee pointwise convergence. We cannot say that $\sum_n c_n e^{inx} \to f(x)$ for every $x$ in general, even when $f$ is continuous. An example of where the convergence of a Fourier series behaves badly is the Gibbs phenomenon:
Figure 37: Example of the Gibbs Phenomenon for a Square Wave. The overshoot is displayed next to the point of discontinuity and peaks approximately at the line $y = 1.09$.
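The overshoot near a jump can be measured numerically. The Python sketch below assumes the square wave is the 0/1 step function of Example III.1, with partial sums $s_N(x) = \frac{1}{2} + \sum_{n \text{ odd},\, n \le N} \frac{2}{\pi n}\sin(nx)$, and searches a fine grid just to the right of the jump at $x = 0$; the grid range and sample count are arbitrary choices.

```python
import math

def partial_sum(x, N):
    """s_N(x) = 1/2 + sum over odd n <= N of 2 sin(nx) / (pi n) (0/1 step function)."""
    return 0.5 + sum(2.0 * math.sin(n * x) / (math.pi * n) for n in range(1, N + 1, 2))

def peak_near_jump(N, samples=2000):
    """Largest value of s_N on a fine grid in (0, 0.2], just right of the jump."""
    return max(partial_sum(0.2 * k / samples, N) for k in range(1, samples + 1))

for N in (49, 99, 199):
    print(N, peak_near_jump(N))
```

The peak height does not shrink towards 1 as $N$ grows; only its location moves closer to the jump.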
To prove that a certain Fourier series converges to the function pointwise, we will express
the partial sums sN as a convolution sN = f ∗ DN where DN is known as the Dirichlet kernel.
This is similar to how to proved the Weierstrass approximation theorem where we expressed
polynomials approaching a function as a convolution. Let the parial sum sN be expressed as
sN =
N
X
n=−N
cn e
inx
=
N
X
n=−N
1
(
2π
Z
π
f (t)e
−int
−π
84
dt)e
inx
1
=
2π
Z
π
f (t)
−π
N
X
n=−N
ein(x−t) dt
If we let DN (x) =
PN
n=−N
sN
einx , then we have expressed sN as
Z π
1
1
=
f (t)DN (x − t) =
f ∗ DN
2π −π
2π
.
To obtain a more explicit form for $D_N$ we first multiply both sides by $e^{ix} - 1$ to obtain
\[
(e^{ix} - 1) D_N(x) = \sum_{n=-N}^{N} e^{i(n+1)x} - \sum_{n=-N}^{N} e^{inx} = e^{i(N+1)x} - e^{-iNx} \tag{1}
\]
since the sum on the right hand side telescopes. Recalling that $\sin x = \frac{e^{ix} - e^{-ix}}{2i}$, multiplying both sides of (1) by $\frac{e^{-ix/2}}{2i}$ yields
\[
\frac{e^{ix/2} - e^{-ix/2}}{2i} \, D_N(x) = \frac{e^{i(N + \frac{1}{2})x} - e^{-i(N + \frac{1}{2})x}}{2i}.
\]
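Hence we can conclude
\[
\sin\left(\frac{x}{2}\right) D_N(x) = \sin\left(\left(N + \frac{1}{2}\right)x\right),
\]
and so, for $x$ not a multiple of $2\pi$,
\[
D_N(x) = \frac{\sin((N + \frac{1}{2})x)}{\sin(\frac{x}{2})}
\]
is an explicit form of the Dirichlet kernel.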
Figure 38: Plot of the Dirichlet kernel $D_N(x)$ for some $N$. The peak at $x = 0$ has height $2N + 1$.
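As a sanity check on the closed form, the following sketch compares the defining sum $\sum_{n=-N}^{N} e^{inx}$ with $\sin((N + \frac{1}{2})x)/\sin(\frac{x}{2})$ at a few sample points. The function names, the value $N = 5$, and the sample points are arbitrary illustrative choices.

```python
import cmath
import math

def dirichlet_sum(x, N):
    """D_N(x) computed directly as the sum of e^{inx} for n = -N..N."""
    return sum(cmath.exp(1j * n * x) for n in range(-N, N + 1)).real

def dirichlet_closed(x, N):
    """Closed form sin((N + 1/2)x) / sin(x/2), with the limiting value
    2N + 1 used where sin(x/2) vanishes."""
    s = math.sin(x / 2.0)
    if abs(s) < 1e-12:
        return 2.0 * N + 1.0
    return math.sin((N + 0.5) * x) / s

N = 5
points = [-3.0, -1.2, -0.4, 0.0, 0.7, 2.5]
# Largest disagreement between the two expressions over the sample points.
max_gap = max(abs(dirichlet_sum(x, N) - dirichlet_closed(x, N)) for x in points)
print(max_gap)
```

The two expressions should agree up to floating-point rounding, including at $x = 0$ where both give $2N + 1$.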
The Dirichlet kernel in some sense approaches the delta function. Consider expanding $f(x) = \delta_0$ into a Fourier series $\sum_n c_n e^{inx}$. Then
\[
c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \delta_0(x) e^{-inx} \, dx = \frac{1}{2\pi}.
\]
This means that
\[
\frac{1}{2\pi} \sum_{n=-N}^{N} e^{inx} \to \delta_0
\]
as $N \to \infty$, hence $\frac{1}{2\pi} D_N(x)$ is the $N$th partial sum of the Fourier series of $\delta_0$. This mimics our proof of the Weierstrass approximation theorem since we are picking smooth functions which approach the delta function, and convolving a function we want to approximate with them to produce a sequence of functions approaching the original function.
Recall the proof of the Weierstrass approximation theorem, where we picked polynomials $g_n$ such that $g_n \to \delta_0$ and produced a sequence of polynomials $P_n = f * g_n$ which uniformly approximate $f$ as $n \to \infty$. In the case of Fourier series, our partial sum $s_N$ is the convolution $s_N = \frac{1}{2\pi} f * D_N$, where $D_N$ is the Dirichlet kernel
\[
D_N(x) = \sum_{n=-N}^{N} e^{inx} = \frac{\sin((N + \frac{1}{2})x)}{\sin(\frac{1}{2}x)}.
\]
We want to show that $s_N \to f$ as $N \to \infty$. Recall that $D_N$ in some sense is the partial sum of the Fourier series of a delta function. It has the same property as the delta function that
\[
\frac{1}{2\pi} \int_{-\pi}^{\pi} D_N(x) \, dx = 1
\]
since
\[
\frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} \, dx =
\begin{cases}
1 & \text{if } n = 0 \\
0 & \text{otherwise.}
\end{cases}
\]
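The normalization $\frac{1}{2\pi}\int_{-\pi}^{\pi} D_N(x)\,dx = 1$ can also be checked numerically. This is a minimal sketch using a midpoint rule; the helper names, the sample count, and the chosen values of $N$ are illustrative assumptions.

```python
import math

def dirichlet(x, N):
    """Closed form of D_N, with the limiting value 2N + 1 where sin(x/2) = 0."""
    s = math.sin(x / 2.0)
    if abs(s) < 1e-12:
        return 2.0 * N + 1.0
    return math.sin((N + 0.5) * x) / s

def normalized_integral(N, samples=4000):
    """Midpoint-rule approximation of (1/2pi) * integral of D_N over [-pi, pi]."""
    h = 2.0 * math.pi / samples
    total = sum(dirichlet(-math.pi + (i + 0.5) * h, N) for i in range(samples))
    return total * h / (2.0 * math.pi)

values = [normalized_integral(N) for N in (1, 5, 20)]
print(values)
```

Each value should be 1 up to rounding, independently of $N$, even though the kernel itself oscillates more and more as $N$ grows.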
Theorem III.12 (Fourier Convergence Theorem). Suppose $f$ is periodic ($f(-\pi) = f(\pi)$) and fix some $x \in [-\pi, \pi]$. Then $s_N(f, x) \to f(x)$ as $N \to \infty$ if $f$ is Lipschitz at $x$. This means that there exist $M, \delta > 0$ such that
\[
|f(x + t) - f(x)| \le M|t|
\]
for $|t| < \delta$.
Corollary III.4. If $f'$ exists at $x$, then the Fourier series converges to $f$ at $x$, that is, $s_N(f, x) \to f(x)$.
Corollary III.5. If $f'$ is continuous on $[-\pi, \pi]$, then $s_N$ converges to $f$ uniformly.
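To see the theorem in action, the sketch below evaluates $s_N(f, x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t) D_N(x - t)\,dt$ by a midpoint rule for the $2\pi$-periodic extension of $f(t) = |t|$, which is Lipschitz at every point. The choice of $f$, the point $x_0 = 1$, and the helper names are assumptions made for illustration and are not taken from the notes.

```python
import math

def dirichlet(x, N):
    """Closed form of D_N, with the limiting value 2N + 1 where sin(x/2) = 0."""
    s = math.sin(x / 2.0)
    if abs(s) < 1e-12:
        return 2.0 * N + 1.0
    return math.sin((N + 0.5) * x) / s

def f(t):
    """2*pi-periodic extension of |t| on [-pi, pi] (an illustrative choice)."""
    t = (t + math.pi) % (2.0 * math.pi) - math.pi
    return abs(t)

def partial_sum(x, N, samples=20000):
    """s_N(f, x) = (1/2pi) * integral of f(t) D_N(x - t) dt, by the midpoint rule."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for i in range(samples):
        t = -math.pi + (i + 0.5) * h
        total += f(t) * dirichlet(x - t, N)
    return total * h / (2.0 * math.pi)

x0 = 1.0
# Pointwise errors |s_N(f, x0) - f(x0)| for increasing N.
errors = [abs(partial_sum(x0, N) - f(x0)) for N in (2, 8, 32)]
print(errors)
```

For this example the errors should decrease as $N$ grows, consistent with $s_N(f, x_0) \to f(x_0)$.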
Proof. We need to show that $|s_N(f, x) - f(x)| \to 0$. We may write the convolution
\[
\frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_N(x - t) \, dt = -\frac{1}{2\pi} \int_{x+\pi}^{x-\pi} f(x - u) D_N(u) \, du
\]
by making the substitution $t = x - u$. As the integrand is $2\pi$-periodic by assumption, the convolution is then $\frac{1}{2\pi} \int_{-\pi}^{\pi} f(x - t) D_N(t) \, dt$. And since
\[
f(x) = f(x) \, \frac{1}{2\pi} \int_{-\pi}^{\pi} D_N(t) \, dt
\]
we may write
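\begin{align*}
|s_N(f, x) - f(x)| &= \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x - t) D_N(t) \, dt - \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) D_N(t) \, dt \right| \\
&= \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} [f(x - t) - f(x)] D_N(t) \, dt \right| \\
&= \left| \frac{1}{2\pi} \int_{-\pi}^{\pi} [f(x - t) - f(x)] \frac{\sin((N + \frac{1}{2})t)}{\sin(\frac{1}{2}t)} \, dt \right|
\end{align*}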
by definition of the Dirichlet kernel. Since
\[
\sin\left(\left(N + \frac{1}{2}\right)t\right) = \sin(Nt)\cos\left(\frac{t}{2}\right) + \cos(Nt)\sin\left(\frac{t}{2}\right),
\]
the above integral can be decomposed into two parts:
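\[
\frac{1}{2\pi} \int_{-\pi}^{\pi} \underbrace{\frac{f(x - t) - f(x)}{\sin(\frac{t}{2})} \cos\left(\frac{t}{2}\right)}_{g(t)} \sin(Nt) \, dt
+ \frac{1}{2\pi} \int_{-\pi}^{\pi} \underbrace{[f(x - t) - f(x)]}_{h(t)} \cos(Nt) \, dt
\]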
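We may notice now that the first integral is, up to a constant factor, the $N$th Fourier coefficient $b_N$ of $g(t) = \frac{f(x - t) - f(x)}{\sin(\frac{t}{2})} \cos(\frac{t}{2})$ when it is expanded as $g \sim \sum_n b_n \sin(nt)$, and that the second integral is, up to a constant factor, the $N$th Fourier coefficient $a_N$ of $h(t) = f(x - t) - f(x)$ when it is expanded as $h \sim \sum_n a_n \cos(nt)$.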
Now recall Bessel's inequality, which states that for an orthonormal set $\{\varphi_n\}$, if $f \sim \sum_n c_n \varphi_n$, then $\|f\|^2 \ge \|c\|^2 = \sum_n |c_n|^2$, where the two-norms of the function and of the sequence of Fourier coefficients are considered. If $\|f\|$ is finite, the infinite series $\sum_n |c_n|^2$ converges, and hence we may conclude that $|c_n|^2 \to 0$ and $c_n \to 0$.
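Bessel's inequality and the resulting decay $c_n \to 0$ can be illustrated numerically. The sketch below assumes the normalized inner product $\langle f, g \rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f \bar{g}$ (so that the functions $e^{inx}$ are orthonormal) and uses the illustrative function $f(t) = t$, for which $\|f\|^2 = \pi^2/3$. The helper names and the truncation at $|n| \le 30$ are arbitrary choices.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=4000):
    """Midpoint-rule approximation of c_n = (1/2pi) * integral of f(t) e^{-int} dt."""
    h = 2.0 * math.pi / samples
    total = 0j
    for i in range(samples):
        t = -math.pi + (i + 0.5) * h
        total += f(t) * cmath.exp(-1j * n * t)
    return total * h / (2.0 * math.pi)

def f(t):
    """Illustrative function f(t) = t on [-pi, pi]."""
    return t

# ||f||^2 = (1/2pi) * integral of t^2 over [-pi, pi] = pi^2 / 3.
norm_sq = math.pi ** 2 / 3.0
coeffs = {n: fourier_coefficient(f, n) for n in range(-30, 31)}
# Truncated sum of |c_n|^2, which Bessel's inequality bounds by ||f||^2.
bessel_sum = sum(abs(c) ** 2 for c in coeffs.values())
print(bessel_sum, norm_sq)
```

The truncated sum should stay below $\|f\|^2$, and the individual coefficients should shrink as $|n|$ grows (for this $f$, $|c_n| = 1/|n|$ for $n \neq 0$).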
Hence if $g, h$ are bounded (and integrable), we get that $b_N \to 0$ and $a_N \to 0$. This gives the desired result, since $|s_N(f, x) - f(x)|$ is at most a constant multiple of $|a_N| + |b_N|$, which tends to $0$ as $N \to \infty$.
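Since $f$ is Lipschitz at $x$, we have $|h(t)| \le M|t|$ for $|t| < \delta$, so $h$ is bounded in an interval about $t = 0$. It remains to show that $g$ is bounded near $t = 0$. For $0 < |t| < \delta$, by the Lipschitz condition,
\[
|g(t)| = \left| \frac{f(x - t) - f(x)}{\sin(\frac{t}{2})} \right| \left| \cos\left(\frac{t}{2}\right) \right|
\le \left| \frac{f(x - t) - f(x)}{\sin(\frac{t}{2})} \right|
\le \frac{M|t|}{|\sin(\frac{t}{2})|}
\approx \frac{M|t|}{|t|/2} = 2M,
\]
since $\sin(\frac{t}{2}) \approx \frac{t}{2}$ for small $t$. (More precisely, $|\sin(\frac{t}{2})| \ge \frac{|t|}{\pi}$ for $|t| \le \pi$, which gives the bound $|g(t)| \le \pi M$.)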
Hence $g$ is bounded. To show that $g$ is integrable, we furthermore need to show that it is integrable on every interval $[-\pi, -\epsilon]$ (and likewise $[\epsilon, \pi]$) avoiding the singularity, in order to argue that the singularity at $t = 0$ becomes removable upon integration.
Corollary III.6 (The Localization Property). Assume that $f$ is zero on $(x_0 - \delta, x_0 + \delta)$, but may take any value outside that interval. Then $s_N(f, x) \to f(x) = 0$ for every $x \in (x_0 - \delta, x_0 + \delta)$.
The corollary follows from the theorem, since $f$ is zero and hence trivially Lipschitz at every point of $(x_0 - \delta, x_0 + \delta)$. The localization property then implies that if $f(x) = g(x)$ on an interval $(x_0 - \delta, x_0 + \delta)$, then $s_N(f, x) \to f(x)$ iff $s_N(g, x) \to g(x)$ on that interval, since $f - g = 0$ on that interval and we may apply the corollary to $f - g$.
The study of Fourier series then leads into wavelet theory and additional applications in signal
processing.