
MATH 519-520 Lecture Notes: Mathematical Analysis

Notes for MATH 519-520
Paul E. Sacks
August 20, 2015
Contents

1 Orientation
   1.1 Introduction

2 Preliminaries
   2.1 Ordinary differential equations
       2.1.1 Initial Value Problems
       2.1.2 Boundary Value Problems
       2.1.3 Some exactly solvable cases
   2.2 Integral equations
   2.3 Partial differential equations
       2.3.1 First order PDEs and the method of characteristics
       2.3.2 Second order problems in R^2
       2.3.3 Further discussion of model problems
       2.3.4 Standard problems and side conditions
   2.4 Well-posed and ill-posed problems
   2.5 Exercises

3 Vector spaces
   3.1 Axioms of a vector space
   3.2 Linear independence and bases
   3.3 Linear transformations of a vector space
   3.4 Exercises

4 Metric spaces
   4.1 Axioms of a metric space
   4.2 Topological concepts
   4.3 Functions on metric spaces and continuity
   4.4 Compactness and optimization
   4.5 Contraction mapping theorem
   4.6 Exercises

5 Normed linear spaces and Banach spaces
   5.1 Axioms of a normed linear space
   5.2 Infinite series
   5.3 Linear operators and functionals
   5.4 Contraction mappings in a Banach space
   5.5 Exercises

6 Inner product spaces and Hilbert spaces
   6.1 Axioms of an inner product space
   6.2 Norm in a Hilbert space
   6.3 Orthogonality
   6.4 Projections
   6.5 Gram-Schmidt method
   6.6 Bessel's inequality and infinite orthogonal sequences
   6.7 Characterization of a basis of a Hilbert space
   6.8 Isomorphisms of a Hilbert space
   6.9 Exercises

7 Distributions
   7.1 The space of test functions
   7.2 The space of distributions
   7.3 Algebra and Calculus with Distributions
       7.3.1 Multiplication of distributions
       7.3.2 Convergence of distributions
       7.3.3 Derivative of a distribution
   7.4 Convolution and distributions
   7.5 Exercises

8 Fourier analysis and distributions
   8.1 Fourier series in one space dimension
   8.2 Alternative forms of Fourier series
   8.3 More about convergence of Fourier series
   8.4 The Fourier Transform on R^N
   8.5 Further properties of the Fourier transform
   8.6 Fourier series of distributions
   8.7 Fourier transforms of distributions
   8.8 Exercises

9 Distributions and Differential Equations
   9.1 Weak derivatives and Sobolev spaces
   9.2 Differential equations in D'
   9.3 Fundamental solutions
   9.4 Exercises

10 Linear operators
   10.1 Linear mappings between Banach spaces
   10.2 Examples of linear operators
   10.3 Linear operator equations
   10.4 The adjoint operator
   10.5 Examples of adjoints
   10.6 Conditions for solvability of linear operator equations
   10.7 Fredholm operators and the Fredholm alternative
   10.8 Convergence of operators
   10.9 Exercises

11 Unbounded operators
   11.1 General aspects of unbounded linear operators
   11.2 The adjoint of an unbounded linear operator
   11.3 Exercises

12 Spectrum of an operator
   12.1 Resolvent and spectrum of a linear operator
   12.2 Examples of operators and their spectra
   12.3 Properties of spectra
   12.4 Exercises

13 Compact Operators
   13.1 Compact operators
   13.2 The Riesz-Schauder theory
   13.3 The case of self-adjoint compact operators
   13.4 Some properties of eigenvalues
   13.5 The Singular Value Decomposition and Normal Operators
   13.6 Exercises

14 Spectra and Green's functions for differential operators
   14.1 Green's functions for second order ODEs
   14.2 Adjoint problems
   14.3 Sturm-Liouville theory
   14.4 The Laplacian with homogeneous Dirichlet boundary conditions
   14.5 Exercises

15 Further study of integral equations
   15.1 Singular integral operators
   15.2 Layer potentials
   15.3 Convolution equations
   15.4 Wiener-Hopf technique
   15.5 Exercises

16 Variational methods
   16.1 The Dirichlet quotient
   16.2 Eigenvalue approximation
   16.3 The Euler-Lagrange equation
   16.4 Variational methods for elliptic boundary value problems
   16.5 Other problems in the calculus of variations
   16.6 The existence of minimizers
   16.7 The Fréchet derivative
   16.8 Exercises

17 Weak solutions of partial differential equations
   17.1 Lax-Milgram theorem
   17.2 More function spaces
   17.3 Galerkin's method
   17.4 PDEs with variable coefficients
   17.5 Exercises

18 Appendices
   18.1 Inequalities
   18.2 Integration by parts
   18.3 Spherical coordinates in R^N

19 Bibliography
Chapter 1
Orientation
1.1 Introduction

While the phrase 'Applied Mathematics' has a very broad meaning, the purpose of this textbook is much more limited: to present techniques of mathematical analysis which have been found particularly useful for understanding certain kinds of mathematical problems occurring commonly in scientific and technological disciplines, especially physics and engineering. These methods, which are often regarded as belonging to the realm of functional analysis, have been motivated most specifically by the study of ordinary differential equations, partial differential equations and integral equations. The mathematical modeling of physical phenomena typically involves one or more of these types of equations, and insight into the physical phenomenon itself may result from a deep understanding of the underlying mathematical properties which the models possess. All concepts and techniques discussed in this book are ultimately of interest because of their relevance for the study of these three general types of problems. A great deal of beautiful mathematics has grown out of these ideas, and so intrinsic mathematical motivation cannot be denied or ignored.
Chapter 2
Preliminaries
In this chapter we will discuss 'standard problems' in the theory of ordinary differential equations (ODEs), integral equations, and partial differential equations (PDEs). The techniques developed in these notes are all meant to have some relevance for one or more of these kinds of problems, so it seems best to start with some awareness of exactly what the problems are. In each case there are some relatively elementary methods, which the reader may well have seen before, or which depend only on simple considerations, which we will review. At the same time we establish terminology and notations, and begin to get some sense of the ways in which problems are classified.
2.1 Ordinary differential equations

An n'th order ordinary differential equation for an unknown function u = u(t) on an interval (a, b) ⊂ R may be given in the form

    F(t, u, u', u'', . . . , u^(n)) = 0    (2.1.1)

where we use the usual notations u', u'', . . . for derivatives of order 1, 2, . . . , and also u^(n) for the derivative of order n. Unless otherwise stated, we will assume that the ODE can be solved for the highest derivative, i.e. written in the form

    u^(n) = f(t, u, u', . . . , u^(n−1))    (2.1.2)

For the purpose of this discussion, a solution of either equation will mean a real valued function on (a, b) possessing continuous derivatives up through order n, for which the equation is satisfied at every point of (a, b). While it is easy to write down ODEs in the form (2.1.1) without any solutions (for example, (u')² + u² + 1 = 0), we will see that ODEs of the type (2.1.2) essentially always have solutions, subject to some very minimal assumptions on f.
The ODE is linear if it can be written as

    Σ_{j=0}^{n} a_j(t) u^(j)(t) = g(t)    (2.1.3)

for some coefficients a_0, . . . , a_n, g, and homogeneous linear if also g(t) ≡ 0. It is common to use operator notation for derivatives, especially in the linear case. Set

    D = d/dt    (2.1.4)

so that u' = Du, u'' = D²u, etc., and (2.1.3) may be given as

    Lu := Σ_{j=0}^{n} a_j(t) D^j u = g(t)    (2.1.5)

By standard calculus properties L is a linear operator, meaning that

    L(c_1 u_1 + c_2 u_2) = c_1 Lu_1 + c_2 Lu_2    (2.1.6)

for any scalars c_1, c_2 and any n times differentiable functions u_1, u_2.

An ODE normally has infinitely many solutions – the collection of all solutions is called the general solution of the given ODE.
Example 2.1. By elementary calculus considerations, the simple ODE u' = 0 has general solution u(t) = c, where c is an arbitrary constant. Likewise u' = u has the general solution u(t) = ce^t, and u'' = 1 has the general solution u(t) = t²/2 + c_1 t + c_2, where c_1, c_2 are arbitrary constants.
2.1.1 Initial Value Problems

The general solution of an n'th order ODE typically contains exactly n arbitrary constants, whose values may then be chosen so that the solution satisfies n additional, or side, conditions. The most common kind of side conditions for an ODE are initial conditions,

    u^(j)(t_0) = γ_j    j = 0, 1, . . . , n − 1    (2.1.7)

where t_0 is a given point in (a, b) and γ_0, . . . , γ_{n−1} are given constants. Thus we are prescribing the value of the solution and its derivatives up through order n − 1 at the point t_0. The problem of solving (2.1.2) together with the initial conditions (2.1.7) is called an initial value problem (IVP), and it is a very important fact that under fairly unrestrictive hypotheses a unique solution exists. In stating conditions on f, we regard it as a function f = f(t, y_1, . . . , y_n) defined on some domain in R^{n+1}.
Theorem 2.1. Assume that

    f, ∂f/∂y_1, . . . , ∂f/∂y_n    (2.1.8)

are defined and continuous in a neighborhood of the point (t_0, γ_0, . . . , γ_{n−1}) ∈ R^{n+1}. Then there exists ε > 0 such that the initial value problem (2.1.2), (2.1.7) has a unique solution on the interval (t_0 − ε, t_0 + ε).

A proof of this theorem may be found in standard ODE textbooks, see for example [4], [7]. A slightly weaker version of this theorem will be proved in Section 4.5. As will be discussed there, the condition of continuity of the partial derivatives of f with respect to each of the variables y_i can actually be replaced by the weaker assumption that f is Lipschitz continuous with respect to each of these variables. If we assume only that f is continuous in a neighborhood of the point (t_0, γ_0, . . . , γ_{n−1}), then it can be proved that at least one solution exists, but it may not be unique; see Exercise 3.

It should also be emphasized that the theorem asserts a local existence property, i.e. existence only in some sufficiently small interval centered at t_0. It has to be this way, first of all, since the assumptions on f are made only in the vicinity of (t_0, γ_0, . . . , γ_{n−1}). But even if the continuity properties of f were assumed to hold throughout R^{n+1}, then as the following example shows, it would still only be possible to prove that a solution exists for points t close enough to t_0.
Example 2.2. Consider the first order initial value problem

    u' = u²    u(0) = γ    (2.1.9)

for which the assumptions of Theorem 2.1 hold for any γ. It may be checked that the solution of this problem is

    u(t) = γ/(1 − γt)    (2.1.10)

which is only a valid solution for t < 1/γ, and 1/γ can be arbitrarily small.
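The finite blow-up time in this example can also be observed numerically. The following sketch (not part of the notes) integrates u' = u², u(0) = γ with forward Euler and compares against the exact solution γ/(1 − γt); the choice γ = 2 is an arbitrary sample value.

```python
# Sketch (not part of the notes): the blow-up in Example 2.2, seen numerically.
# Forward Euler for u' = u^2, u(0) = gamma, compared with the exact solution
# u(t) = gamma / (1 - gamma*t), which blows up at t = 1/gamma.
def euler(gamma, t_end, n):
    h = t_end / n
    u = gamma
    for _ in range(n):
        u += h * u * u          # u_{k+1} = u_k + h * u_k^2
    return u

gamma = 2.0                     # sample value; blow-up time is 1/gamma = 0.5
exact = lambda t: gamma / (1 - gamma * t)

# well before the blow-up time the two agree closely
assert abs(euler(gamma, 0.4, 100_000) - exact(0.4)) < 1e-2
# approaching t = 1/gamma = 0.5 the exact solution grows without bound
assert exact(0.4999) > 9000
```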
With more restrictions on f it may be possible to show that the solution exists on any interval containing t_0, in which case we say that the solution exists globally. This is the case, for example, for the linear ODE (2.1.3).

Whenever the conditions of Theorem 2.1 hold, the set of all possible solutions may be regarded as being parametrized by the n constants γ_0, . . . , γ_{n−1}, so that, as mentioned above, the general solution will contain n arbitrary parameters. In the special case of the linear equation (2.1.3) it can be shown that the general solution may be given as

    u(t) = Σ_{j=1}^{n} c_j u_j(t) + u_p(t)    (2.1.11)

where u_p is any particular solution of (2.1.3), and u_1, . . . , u_n are any n linearly independent solutions of the corresponding homogeneous equation Lu = 0. Any such set of functions u_1, . . . , u_n is also called a fundamental set for Lu = 0.

Example 2.3. If Lu = u'' + u then by direct substitution we see that u_1(t) = sin t, u_2(t) = cos t are solutions, and they are clearly linearly independent. Thus {sin t, cos t} is a fundamental set for Lu = 0 and u(t) = c_1 sin t + c_2 cos t is the general solution of Lu = 0. For the inhomogeneous ODE u'' + u = e^t one may check that u_p(t) = e^t/2 is a particular solution, so the general solution is u(t) = c_1 sin t + c_2 cos t + e^t/2.
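The claim that u_p(t) = e^t/2 is a particular solution is easy to confirm; here is a quick numerical check (not part of the notes), approximating u'' by a centered difference.

```python
# Sketch (not part of the notes): numerical confirmation that u_p(t) = e^t/2
# solves u'' + u = e^t, using a centered difference for the second derivative.
import math

def u_p(t):
    return math.exp(t) / 2

def check(t, h=1e-4):
    upp = (u_p(t + h) - 2 * u_p(t) + u_p(t - h)) / h ** 2   # ~ u_p''(t)
    return upp + u_p(t) - math.exp(t)                        # residual of u'' + u = e^t

for t in (0.0, 1.0, 2.5):
    assert abs(check(t)) < 1e-5                              # residual is ~ 0
```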
2.1.2 Boundary Value Problems

For an ODE of order n ≥ 2 it may be of interest to impose side conditions at more than one point, typically the endpoints of the interval of interest. We will then refer to the side conditions as boundary conditions, and the problem of solving the ODE subject to the given boundary conditions as a boundary value problem (BVP). Since the general solution still contains n parameters, we still expect to be able to impose a total of n side conditions. However we can see from simple examples that the situation with regard to existence and uniqueness in such boundary value problems is much less clear than for initial value problems.

Example 2.4. Consider the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 1    (2.1.12)

Starting from the general solution u(t) = c_1 sin t + c_2 cos t, the two boundary conditions lead to u(0) = c_2 = 0 and u(π) = −c_2 = 1. Since these are inconsistent, the BVP has no solution.

Example 2.5. For the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 0    (2.1.13)

we have solutions u(t) = C sin t for any constant C; that is, the BVP has infinitely many solutions.

The topic of boundary value problems will be studied in much more detail in Chapter ( ).
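A numerical 'shooting' experiment (not part of the notes) illustrates both examples at once: every solution of u'' + u = 0 with u(0) = 0 has the form c sin t, so u(π) = 0 no matter what initial slope we shoot with, and the condition u(π) = 1 can never be met.

```python
# Sketch (not part of the notes): shooting for u'' + u = 0 on (0, pi), u(0) = 0.
# Each shot integrates the equivalent system u' = v, v' = -u by classical RK4
# from a chosen slope v(0) and reports the value u(pi).
import math

def shoot(slope, n=20_000):
    h = math.pi / n
    u, v = 0.0, slope
    for _ in range(n):
        k1u, k1v = v, -u
        k2u, k2v = v + h/2*k1v, -(u + h/2*k1u)
        k3u, k3v = v + h/2*k2v, -(u + h/2*k2u)
        k4u, k4v = v + h*k3v, -(u + h*k3u)
        u += h/6*(k1u + 2*k2u + 2*k3u + k4u)
        v += h/6*(k1v + 2*k2v + 2*k3v + k4v)
    return u                      # u(pi)

# every shot lands at u(pi) = 0 (Example 2.5 has infinitely many solutions),
# so no slope produces u(pi) = 1 (Example 2.4 has none)
for slope in (-2.0, 0.5, 1.0, 3.0):
    assert abs(shoot(slope)) < 1e-8
```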
2.1.3 Some exactly solvable cases

Let us recall explicit solution methods for some commonly occurring types of ODEs.

• For the first order linear ODE

    u' + p(t)u = q(t)    (2.1.14)

define the so-called integrating factor ρ(t) = e^{P(t)}, where P' = p. Multiplying the equation through by ρ we then get

    (ρu)' = ρq    (2.1.15)

so if we pick Q such that Q' = ρq, the general solution may be given as

    u(t) = (Q(t) + C)/ρ(t)    (2.1.16)

• Next consider the linear homogeneous constant coefficient ODE

    Lu = Σ_{j=0}^{n} a_j u^(j) = 0    (2.1.17)

If we look for solutions in the form u(t) = e^{λt}, then by direct substitution we find that u is a solution provided λ is a root of the corresponding characteristic polynomial

    P(λ) = Σ_{j=0}^{n} a_j λ^j    (2.1.18)

We therefore obtain as many linearly independent solutions as there are distinct roots of P. If this number is less than n, then we may seek further solutions of the form te^{λt}, t²e^{λt}, . . . , until a total of n linearly independent solutions has been found. In the case of complex roots, equivalent expressions in terms of trigonometric functions are often used in place of complex exponentials.

• Finally, closely related to the previous case is the so-called Cauchy-Euler type equation

    Lu = Σ_{j=0}^{n} (t − t_0)^j a_j u^(j) = 0    (2.1.19)

for some constants a_0, . . . , a_n. In this case we look for solutions in the form u(t) = (t − t_0)^λ with λ to be found. Substituting into (2.1.19) we will again find an n'th order polynomial whose roots determine the possible values of λ. The interested reader may refer to any standard undergraduate level ODE book for the additional considerations which arise in the case of complex or repeated roots.
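As an illustration of the constant coefficient case, the following sketch (not part of the notes) takes the sample equation u'' − 3u' + 2u = 0, whose characteristic polynomial λ² − 3λ + 2 has roots 1 and 2, and verifies numerically that e^t and e^{2t} are solutions.

```python
# Sketch (not part of the notes): sample equation u'' - 3u' + 2u = 0.
# Its characteristic polynomial is P(lam) = 2 - 3*lam + lam^2, with roots
# 1 and 2, so e^t and e^(2t) should form a fundamental set.
import math

a = [2, -3, 1]                         # coefficients a_0, a_1, a_2

def P(lam):
    return sum(aj * lam ** j for j, aj in enumerate(a))

assert P(1) == 0 and P(2) == 0         # roots of the characteristic polynomial

def residual(lam, t, h=1e-4):
    u = lambda s: math.exp(lam * s)    # candidate solution e^(lam*t)
    upp = (u(t + h) - 2 * u(t) + u(t - h)) / h ** 2
    up = (u(t + h) - u(t - h)) / (2 * h)
    return upp - 3 * up + 2 * u(t)     # residual of u'' - 3u' + 2u at time t

for lam in (1, 2):
    assert abs(residual(lam, 0.7)) < 1e-5
```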
2.2 Integral equations

In this section we discuss the basic set-up for the study of linear integral equations. See for example [15], [21] for general references on the classical theory of integral equations.

Let Ω ⊂ R^N be a measurable set and set

    T u(x) = ∫_Ω K(x, y)u(y) dy    (2.2.1)

Here the function K should be a measurable function on Ω × Ω, and is called the kernel of the integral operator T, which is linear since (2.1.6) obviously holds.

A class of associated integral equations is then

    ∫_Ω K(x, y)u(y) dy = λu(x) + g(x)    x ∈ Ω    (2.2.2)

for some scalar λ and given function g in some appropriate class. If λ = 0 then (2.2.2) is a first kind integral equation, otherwise it is of second kind. Let us consider some simple examples which may be studied by elementary means.
Example 2.6. Let Ω = (0, 1) ⊂ R and K(x, y) ≡ 1. The corresponding first kind integral equation is therefore

    ∫_0^1 u(y) dy = g(x)    0 < x < 1    (2.2.3)

For simplicity here we will assume that g is a continuous function. The left hand side is independent of x, thus a solution can exist only if g(x) is a constant function. When g is constant, on the other hand, infinitely many solutions will exist, since we just need to find any u with the given definite integral.

For the corresponding second kind equation,

    ∫_0^1 u(y) dy = λu(x) + g(x)    (2.2.4)

a solution must have the specific form u(x) = (C − g(x))/λ for some constant C. Substituting into the equation then gives, after obvious simplification,

    C − ∫_0^1 g(y) dy = λC    (2.2.5)

or

    C = ∫_0^1 g(y) dy / (1 − λ)    (2.2.6)

in the case that λ ≠ 1. Thus, for any continuous function g and λ ≠ 0, 1, there exists a unique solution of the integral equation, namely

    u(x) = (∫_0^1 g(y) dy − (1 − λ)g(x)) / (λ(1 − λ))    (2.2.7)

In the remaining case that λ = 1 it is immediate from (2.2.5) that a solution can exist only if ∫_0^1 g(y) dy = 0, in which case u(x) = C − g(x) is a solution for any choice of C.
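The solution formula just obtained can be confirmed numerically. The following sketch (not part of the notes) uses the sample data g(x) = x² and λ = 1/2, and checks that the resulting u satisfies the second kind equation at several points.

```python
# Sketch (not part of the notes): check the second kind solution formula of
# Example 2.6 with sample data g(x) = x^2 and lam = 0.5.
lam = 0.5
g = lambda x: x * x
int_g = 1.0 / 3.0                        # exact integral of g over (0, 1)

def u(x):
    # the solution formula u = (int_g - (1 - lam)*g(x)) / (lam*(1 - lam))
    return (int_g - (1 - lam) * g(x)) / (lam * (1 - lam))

# integral of u over (0, 1) by the midpoint rule
n = 20_000
int_u = sum(u((k + 0.5) / n) for k in range(n)) / n

# the equation int_0^1 u(y) dy = lam*u(x) + g(x) holds at every sample point
for x in (0.1, 0.5, 0.9):
    assert abs(int_u - (lam * u(x) + g(x))) < 1e-6
```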
This very simple example already exhibits features which turn out to be common to a much larger class of integral equations of this general type. These are:

• The first kind integral equation will require much more restrictive conditions on g in order for a solution to exist.

• For most λ ≠ 0 the second kind integral equation has a unique solution for any g.

• There may exist a few exceptional values of λ for which either existence or uniqueness fails in the corresponding second kind equation.

All of these points will be elaborated and made precise in Chapter ( ).
Example 2.7. Let Ω = (0, 1) and

    T u(x) = ∫_0^x u(y) dy    (2.2.8)

corresponding to the kernel

    K(x, y) = 1 if y < x,  0 if x ≤ y    (2.2.9)

The corresponding integral equation may then be written as

    ∫_0^x u(y) dy = λu(x) + g(x)    (2.2.10)

This is the prototype of an integral operator of so-called Volterra type; see the definition below.

In the first kind case, λ = 0, we see that g(0) = 0 is a necessary condition for solvability, in which case the solution is u(x) = g'(x), provided that g is differentiable in some suitable sense. For λ ≠ 0 we note that differentiation of (2.2.10) with respect to x gives

    u' − u/λ = −g'(x)/λ    (2.2.11)

which is of the type (2.1.14), and so may be solved by the method given there. The result, after some obvious algebraic manipulation, is

    u(x) = −(1/λ) e^{x/λ} g(0) − (1/λ) ∫_0^x e^{(x−y)/λ} g'(y) dy    (2.2.12)

Note, however, that by an integration by parts, this formula is seen to be equivalent to

    u(x) = −g(x)/λ − (1/λ²) ∫_0^x e^{(x−y)/λ} g(y) dy    (2.2.13)

Observe that (2.2.12) seems to require differentiability of g even though (2.2.13) does not, thus (2.2.13) would be the preferred solution formula. It may be verified directly by substitution that (2.2.13) is a valid solution of (2.2.10) for all λ ≠ 0, assuming that g is continuous on [0, 1].
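A numerical check of the preferred solution formula is straightforward; this sketch (not part of the notes) uses the sample data g(x) = sin x and λ = 0.3, approximating both integrals by the midpoint rule and confirming that the residual of the Volterra equation vanishes.

```python
# Sketch (not part of the notes): verify that the formula
#   u(x) = -g(x)/lam - (1/lam^2) * int_0^x e^((x-y)/lam) g(y) dy
# solves the Volterra equation int_0^x u(y) dy = lam*u(x) + g(x),
# with sample data g = sin and lam = 0.3.
import math

lam = 0.3
g = math.sin

def u(x, n=1000):
    if x == 0.0:
        return -g(0.0) / lam
    h = x / n                                    # midpoint rule for the inner integral
    integral = sum(math.exp((x - (k + 0.5) * h) / lam) * g((k + 0.5) * h)
                   for k in range(n)) * h
    return -g(x) / lam - integral / lam ** 2

def residual(x, n=400):
    h = x / n                                    # midpoint rule for int_0^x u(y) dy
    int_u = sum(u((k + 0.5) * h) for k in range(n)) * h
    return int_u - (lam * u(x) + g(x))           # should vanish

assert abs(residual(0.8)) < 1e-3
```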
Concerning the two simple integral equations just discussed, observe that

• For the first kind equation, there are fewer restrictions on g needed for solvability in the Volterra case (2.2.10) than in the non-Volterra case (2.2.3).

• There are no exceptional values λ ≠ 0 in the Volterra case; that is, a unique solution exists for every λ ≠ 0 and every continuous g.
Here are some of the more important ways in which integral operators are classified:

Definition 2.1. The kernel K(x, y) is called

• symmetric if K(x, y) = K(y, x)

• of Volterra type if N = 1 and K(x, y) = 0 either for all x > y or for all x < y

• of convolution type if K(x, y) = K(x − y)

• of Hilbert-Schmidt type if ∫_{Ω×Ω} |K(x, y)|² dxdy < ∞

• singular if K(x, y) is unbounded on Ω × Ω
Some important examples of integral operators, which will receive much more attention later in the book, are the Fourier transform

    T u(x) = (2π)^{−N/2} ∫_{R^N} e^{−ix·y} u(y) dy,    (2.2.14)

the Laplace transform

    T u(x) = ∫_0^∞ e^{−xy} u(y) dy,    (2.2.15)

the Hilbert transform

    T u(x) = (1/π) ∫_{−∞}^{∞} u(y)/(x − y) dy,    (2.2.16)

and the Abel operator

    T u(x) = ∫_0^x u(y)/√(x − y) dy.    (2.2.17)
2.3 Partial differential equations

An m'th order partial differential equation (PDE) for an unknown function u = u(x) on a domain Ω ⊂ R^N may be given in the form

    F(x, {D^α u}_{|α|≤m}) = 0    (2.3.1)

Here we are using the so-called multi-index notation for partial derivatives, which works as follows. A multi-index is a vector of non-negative integers,

    α = (α_1, α_2, . . . , α_N)    α_i ∈ {0, 1, . . . }    (2.3.2)

In terms of α we define

    |α| = Σ_{i=1}^{N} α_i    (2.3.3)

the order of α, and

    D^α u = ∂^{|α|} u / (∂x_1^{α_1} ∂x_2^{α_2} . . . ∂x_N^{α_N})    (2.3.4)

the corresponding α derivative of u. For later use it is also convenient to define the factorial of a multi-index,

    α! = α_1! α_2! . . . α_N!    (2.3.5)

The PDE (2.3.1) is linear if it can be written as

    Lu(x) = Σ_{|α|≤m} a_α(x) D^α u(x) = g(x)    (2.3.6)
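In code, the multi-index operations are one-liners; the following sketch (not part of the notes) implements |α| and α! for a multi-index stored as a tuple.

```python
# Sketch (not part of the notes): the multi-index operations |alpha| and
# alpha! for alpha = (alpha_1, ..., alpha_N) stored as a tuple of ints.
from math import factorial

def order(alpha):
    # |alpha| = alpha_1 + ... + alpha_N
    return sum(alpha)

def mi_factorial(alpha):
    # alpha! = alpha_1! * alpha_2! * ... * alpha_N!
    out = 1
    for a in alpha:
        out *= factorial(a)
    return out

# e.g. alpha = (1, 0, 2) corresponds to the third order derivative
# d^3 u / (dx_1 dx_3^2), with |alpha| = 3 and alpha! = 1! * 0! * 2! = 2
assert order((1, 0, 2)) == 3
assert mi_factorial((1, 0, 2)) == 2
```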
2.3.1 First order PDEs and the method of characteristics

Let us start with the simplest possible example.

Example 2.8. When N = 2 and m = 1 consider

    ∂u/∂x_1 = 0    (2.3.7)

By elementary calculus considerations it is clear that u is a solution if and only if u is independent of x_1, i.e.

    u(x_1, x_2) = f(x_2)    (2.3.8)

for some function f. This is then the general solution of the given PDE, which we note contains an arbitrary function f.
Example 2.9. Next consider, again for N = 2, m = 1, the PDE

    a ∂u/∂x_1 + b ∂u/∂x_2 = 0    (2.3.9)

where a, b are fixed constants. This amounts precisely to the condition that u has directional derivative 0 in the direction θ = ⟨a, b⟩, so u is constant along any line parallel to θ. This in turn leads to the conclusion that u(x_1, x_2) = f(ax_2 − bx_1) for some arbitrary function f, which at least for the moment would seem to need to be differentiable.

The collection of lines parallel to θ, i.e. the lines ax_2 − bx_1 = C, obviously plays a special role in the above example; they are the so-called characteristics, or characteristic curves, associated to this particular PDE. The general concept of characteristic curve will now be described for the case of a first order linear PDE in two independent variables (with a temporary change of notation),

    a(x, y)u_x + b(x, y)u_y = c(x, y)    (2.3.10)

Consider the associated ODE system

    dx/dt = a(x, y)    dy/dt = b(x, y)    (2.3.11)

and suppose we have some solution pair x = x(t), y = y(t), which we regard as a parametrically given curve in the (x, y) plane. Such a curve is then, by definition, a characteristic curve for (2.3.10). Observe that if u(x, y) is a differentiable solution of (2.3.10) then

    d/dt u(x(t), y(t)) = a(x(t), y(t))u_x(x(t), y(t)) + b(x(t), y(t))u_y(x(t), y(t)) = c(x(t), y(t))    (2.3.12)

so that u satisfies a certain first order ODE along any characteristic curve. For example, if c(x, y) ≡ 0 then, as in the previous example, any solution of the PDE is constant along any characteristic curve.

Now let Γ ⊂ R² be some curve, which we assume can be parametrized as

    x = f(s),  y = g(s),  s_0 < s < s_1    (2.3.13)

The Cauchy problem for (2.3.10) consists in finding a solution of (2.3.10) with values prescribed on Γ, that is,

    u(f(s), g(s)) = h(s)    s_0 < s < s_1    (2.3.14)
for some given function h. Assuming for the moment that such a solution u exists, let x(t, s), y(t, s) be the characteristic curve passing through (f(s), g(s)) ∈ Γ when t = 0, i.e.

    ∂x/∂t = a(x, y)    x(0, s) = f(s)
    ∂y/∂t = b(x, y)    y(0, s) = g(s)    (2.3.15)

We must then have

    ∂/∂t u(x(t, s), y(t, s)) = c(x(t, s), y(t, s))    u(x(0, s), y(0, s)) = h(s)    (2.3.16)

This is a first order initial value problem in t, depending on s as a parameter, which is then guaranteed to have a solution at least for |t| < ε for some ε > 0. The three relations x = x(t, s), y = y(t, s), z = u(x(t, s), y(t, s)) generally amount to the parametric description of a surface in R³ containing Γ. If we can eliminate the parameters s, t to obtain the surface in non-parametric form z = u(x, y), then u is the sought after solution of the Cauchy problem.
Example 2.10. Let Γ denote the x axis and let us solve

    xu_x + u_y = 1    (2.3.17)

with u = h on Γ. Introducing f(s) = s, g(s) = 0 as the parametrization of Γ, we must then solve

    ∂x/∂t = x    x(0, s) = s
    ∂y/∂t = 1    y(0, s) = 0    (2.3.18)
    ∂/∂t u(x(t, s), y(t, s)) = 1    u(s, 0) = h(s)

We then easily obtain

    x(s, t) = se^t    y(s, t) = t    u(x(s, t), y(s, t)) = t + h(s)    (2.3.19)

and eliminating t, s yields the solution formula

    u(x, y) = y + h(xe^{−y})    (2.3.20)

The characteristics in this case are the curves x = se^t, y = t for fixed s, or x = se^y in nonparametric form. Note here that the solution is defined throughout the (x, y) plane even though nothing in the preceding discussion guarantees that. Since h has not been otherwise prescribed, we may also regard (2.3.20) as the general solution of (2.3.17).
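The solution formula u(x, y) = y + h(xe^{−y}) can be checked by finite differences; in this sketch (not part of the notes), the choice h = tanh is an arbitrary differentiable sample function.

```python
# Sketch (not part of the notes): finite-difference check that
# u(x, y) = y + h(x * e^(-y)) satisfies x*u_x + u_y = 1,
# where h is an arbitrary sample function (tanh here).
import math

h_fn = math.tanh

def u(x, y):
    return y + h_fn(x * math.exp(-y))

def residual(x, y, d=1e-6):
    u_x = (u(x + d, y) - u(x - d, y)) / (2 * d)   # centered differences
    u_y = (u(x, y + d) - u(x, y - d)) / (2 * d)
    return x * u_x + u_y - 1.0                    # should vanish identically

for x, y in [(0.5, 0.0), (-1.0, 2.0), (3.0, -1.0)]:
    assert abs(residual(x, y)) < 1e-6
```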
The attentive reader may already realize that this procedure cannot work in all cases, as is made clear by the following consideration: if c ≡ 0 and Γ is itself a characteristic curve, then the solution on Γ would have to be simultaneously equal to the given function h and constant, so that no solution can exist except possibly in the case that h is a constant function. From another, more general, point of view, we must eliminate the parameters s, t by inverting the relations x = x(s, t), y = y(s, t) to obtain s, t in terms of x, y, at least near Γ, and according to the inverse function theorem this should require that the Jacobian matrix

    [ ∂x/∂t  ∂y/∂t ]            [ a(f(s), g(s))  b(f(s), g(s)) ]
    [ ∂x/∂s  ∂y/∂s ]_{t=0}  =   [ f'(s)          g'(s)         ]    (2.3.21)

be nonsingular for all s. Equivalently, the direction ⟨f', g'⟩ should not be parallel to ⟨a, b⟩, and since ⟨a, b⟩ must be tangent to the characteristic curve, this amounts to the requirement that Γ itself should have a non-characteristic tangent direction at every point. We say that Γ is non-characteristic for the PDE (2.3.10) when this condition holds.

The following precise theorem can be established; see for example Chapter 1 of [18], or Chapter 3 of [10].

Theorem 2.2. Let Γ ⊂ R² be a continuously differentiable curve which is non-characteristic for (2.3.10), let h be a continuously differentiable function on Γ, and let a, b, c be continuously differentiable functions in a neighborhood of Γ. Then there exists a unique continuously differentiable function u(x, y) defined in a neighborhood of Γ which is a solution of (2.3.10).

The method of characteristics is capable of a considerable amount of generalization, in particular to first order PDEs in any number of independent variables, and to fully nonlinear first order PDEs; see the references just given above.
2.3.2 Second order problems in R²

Let us next look at the following special type of second order PDE in two independent variables x, y:

    Au_xx + Bu_xy + Cu_yy = 0    (2.3.22)

where A, B, C are real constants, not all zero. Consider introducing new coordinates ξ, η by means of a linear change of variables

    ξ = αx + βy    η = γx + δy    (2.3.23)

with αδ − βγ ≠ 0, so that the transformation is invertible. Our goal is to make a good choice of α, β, γ, δ so as to achieve a simpler, but equivalent, PDE to study.
Given any PDE and any change of coordinates, we obtain the expression for the PDE
in the new coordinate system by straightforward application of the chain rule. In our
case, for example, we have

∂u/∂x = (∂u/∂ξ)(∂ξ/∂x) + (∂u/∂η)(∂η/∂x) = α ∂u/∂ξ + γ ∂u/∂η    (2.3.24)

∂²u/∂x² = (α ∂/∂ξ + γ ∂/∂η)(α ∂u/∂ξ + γ ∂u/∂η) = α² ∂²u/∂ξ² + 2αγ ∂²u/∂ξ∂η + γ² ∂²u/∂η²    (2.3.25)
with similar expressions for u_xy and u_yy. Substituting into (2.3.22), the resulting PDE is

a u_ξξ + b u_ξη + c u_ηη = 0    (2.3.26)

where

a = α²A + αβB + β²C    (2.3.27)
b = 2αγA + (αδ + βγ)B + 2βδC    (2.3.28)
c = γ²A + γδB + δ²C    (2.3.29)
The idea now is to make special choices of α, β, γ, δ to achieve as simple a form as possible
for the transformed PDE (2.3.26).

Suppose first that B² − 4AC > 0, so that there exist two real and distinct roots r_1, r_2
of Ar² + Br + C = 0. If α, β, γ, δ are chosen so that

α/β = r_1    γ/δ = r_2    (2.3.30)

then a = c = 0 (and αδ − βγ ≠ 0), so that the transformed PDE is simply u_ξη = 0. The
general solution of this second order PDE is easily obtained: u_ξ must be a function of ξ
alone, so integrating with respect to ξ and observing that the 'constant of integration'
could be any function of η, we get

u(ξ, η) = F(ξ) + G(η)    (2.3.31)

for any differentiable functions F, G. Finally, reverting to the original coordinate system,
the result is

u(x, y) = F(αx + βy) + G(γx + δy)    (2.3.32)

The lines αx + βy = C, γx + δy = C are called the characteristics for (2.3.22). Characteristics are an important concept for this and some more general second order PDEs,
but they do not play as central a role as in the first order case.
Example 2.11. For the PDE

u_xx − u_yy = 0    (2.3.33)

the roots r satisfy r² − 1 = 0. We may then choose, for example, α = β = γ = 1, δ = −1,
to get the general solution

u(x, y) = F(x + y) + G(x − y)    (2.3.34)
Next assume that B² − 4AC = 0. If either of A or C is 0, then so is B, in which case
the PDE already has the form u_ξξ = 0 or u_ηη = 0, say the first of these without loss of
generality. Otherwise, choose

α = −B/(2A)    β = 1    γ = 1    δ = 0    (2.3.35)

to obtain a = b = 0, c = A, so that (after relabeling ξ and η) the transformed PDE in all cases is u_ξξ = 0.
Finally, if B² − 4AC < 0 then A ≠ 0 must hold, and we may choose

α = 1    β = 0    γ = −B/√(4AC − B²)    δ = 2A/√(4AC − B²)    (2.3.36)

in which case the transformed equation is

u_ξξ + u_ηη = 0    (2.3.37)
We have therefore established that any PDE of the type (2.3.22) can be transformed,
by means of a linear change of variables, to one of the three simple types

u_ξη = 0    u_ξξ = 0    u_ξξ + u_ηη = 0    (2.3.38)

each of which then leads to a prototype for a certain class of PDEs. If we allow lower
order terms,

A u_xx + B u_xy + C u_yy + D u_x + E u_y + F u = G    (2.3.39)

then after the transformation (2.3.23) it is clear that the lower order terms remain lower
order terms. Thus any PDE of the type (2.3.39) is, up to a change of coordinates, one of
the three types (2.3.38), up to lower order terms, and only the value of the discriminant
B² − 4AC needs to be known to determine which of the three types is obtained.

The above discussion motivates the following classification: the PDE (2.3.39) is said
to be:
• hyperbolic if B² − 4AC > 0
• parabolic if B² − 4AC = 0
• elliptic if B² − 4AC < 0
The terminology comes from an obvious analogy with conic sections: the solution
set of Ax² + Bxy + Cy² + Dx + Ey + F = 0 is respectively a hyperbola, parabola or
ellipse (or a degenerate case) according as B² − 4AC is positive, zero or negative.
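The classification rule is easy to code directly; the small helper below is an illustrative sketch, not from the text.

```python
def classify(A, B, C):
    """Classify A u_xx + B u_xy + C u_yy + (lower order terms) by its discriminant."""
    disc = B**2 - 4*A*C
    if disc > 0:
        return "hyperbolic"
    elif disc == 0:
        return "parabolic"
    else:
        return "elliptic"

print(classify(1, 0, -1))  # wave operator u_xx - u_yy
print(classify(0, 0, 1))   # e.g. the principal part of the heat equation
print(classify(1, 0, 1))   # Laplace operator
```

For a variable-coefficient equation the same function can be applied pointwise, e.g. classify(1, 0, -x) for the Tricomi equation discussed next.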
We can also allow the coefficients A, B, . . . , G to be variable functions of x, y, and
in this case the classification is done pointwise, so the type can change. An important
example of this phenomenon is the so-called Tricomi equation (see e.g. Chapter 12 of
[13])

u_xx − x u_yy = 0    (2.3.40)

which is hyperbolic for x > 0 and elliptic for x < 0. One might refer to the equation as
being parabolic for x = 0, but generally speaking we do not do this, since it is not really
meaningful to speak of a PDE being satisfied in a set without interior points.
The above discussion is special to the case of N = 2 independent variables, and in
the case N ≥ 3 there is no such complete classification. As we will see, there are still
PDEs referred to as being hyperbolic, parabolic or elliptic, but there are others which
are not of any of these types, although these tend to be of less physical importance.
2.3.3
Further discussion of model problems
According to the previous discussion, we should focus our attention on a representative
problem for each of the three types, since then we will also gain considerable information
about other problems of the given type.
Wave equation
For the hyperbolic case we consider the wave equation

u_tt − c²u_xx = 0    (2.3.41)

where c > 0 is a constant. Here we have changed the name of the variable y to t,
following the usual convention of regarding u = u(x, t) as depending on a 'space' variable
x and a 'time' variable t. This PDE arises in the simplest model of wave propagation in
one dimension, where u represents, for example, the displacement of a vibrating medium
from its equilibrium position, and c is the wave speed.
Following the procedure outlined at the beginning of this section, an appropriate
change of coordinates is ξ = x + ct, η = x − ct, and we obtain the expression, also known
as d'Alembert's formula, for the general solution:

u(x, t) = F(x + ct) + G(x − ct)    (2.3.42)

for arbitrary twice differentiable functions F, G. The general solution may be viewed as
the superposition of two waves of fixed shape, moving to the right and to the left with
speed c.
The initial value problem for the wave equation consists in solving (2.3.41) for x ∈ R
and t > 0 subject to the side conditions

u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ R    (2.3.43)

where f, g represent the initial displacement and initial velocity of the vibrating medium.
This problem may be completely and explicitly solved by means of d'Alembert's formula.
We have

F(x) + G(x) = f(x)    c(F′(x) − G′(x)) = g(x)    x ∈ R    (2.3.44)

Integrating the second relation gives F(x) − G(x) = (1/c)∫_0^x g(s) ds + C for some constant
C, and combining with the first relation yields

F(x) = (1/2)( f(x) + (1/c)∫_0^x g(s) ds + C )    G(x) = (1/2)( f(x) − (1/c)∫_0^x g(s) ds − C )    (2.3.45)

Substituting into (2.3.42) and doing some obvious simplification we obtain

u(x, t) = (1/2)( f(x + ct) + f(x − ct) ) + (1/2c) ∫_{x−ct}^{x+ct} g(s) ds    (2.3.46)
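The d'Alembert solution (2.3.46) is easy to check numerically by finite differences. The sketch below uses arbitrary sample data f(x) = e^{−x²}, g(x) = cos x (chosen so that the integral of g is available in closed form); these choices are illustrative, not from the text.

```python
import numpy as np

c = 2.0                        # wave speed (sample value)
f = lambda x: np.exp(-x**2)    # sample initial displacement
g = lambda x: np.cos(x)        # sample initial velocity; its antiderivative is sin

def u(x, t):
    # d'Alembert's formula (2.3.46); the integral of g here is sin(x+ct) - sin(x-ct)
    return 0.5*(f(x + c*t) + f(x - c*t)) + (np.sin(x + c*t) - np.sin(x - c*t))/(2*c)

x, t, h = 0.7, 0.3, 1e-4
utt = (u(x, t+h) - 2*u(x, t) + u(x, t-h)) / h**2   # centered difference in t
uxx = (u(x+h, t) - 2*u(x, t) + u(x-h, t)) / h**2   # centered difference in x
ut0 = (u(x, h) - u(x, -h)) / (2*h)                 # approximation of u_t at t = 0

print(abs(utt - c**2 * uxx))   # small: u satisfies the wave equation
print(abs(u(x, 0.0) - f(x)))   # zero: initial displacement is matched
print(abs(ut0 - g(x)))         # small: initial velocity is matched
```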
We remark that a general solution formula like (2.3.42) can be given for any PDE
which is exactly transformable to u_ξη = 0, that is to say, any hyperbolic PDE of the
form (2.3.22), but once lower order terms are allowed such a simple solution method is
no longer available. For example, the so-called Klein-Gordon equation u_tt − u_xx + u = 0
may be transformed to u_ξη + u/4 = 0, which cannot be solved in so transparent a form.
Thus the d'Alembert solution method, while very useful when applicable, is limited in
its scope.
Heat equation

Another elementary method, which may be used in a wide variety of situations, is separation of variables. We illustrate with the case of the initial and boundary value problem

u_t = u_xx    0 < x < 1    t > 0    (2.3.47)
u(0, t) = u(1, t) = 0    t > 0    (2.3.48)
u(x, 0) = f(x)    0 < x < 1    (2.3.49)

Here (2.3.47) is the heat equation, a parabolic equation modeling, for example, the temperature u = u(x, t) in a one dimensional medium as a function of location x and time t;
(2.3.48) are the boundary conditions, stating that the temperature is held at zero
at the two boundary points x = 0 and x = 1 for all t; and (2.3.49) represents the
initial condition, i.e. that the initial temperature distribution is given by the prescribed
function f(x).
We begin by ignoring the initial condition and otherwise looking for special solutions
of the form u(x, t) = φ(t)ψ(x). Obviously u = 0 is such a solution, but it cannot be of any
help in eventually solving the full stated problem, so we insist that neither of φ and ψ is
the zero function. Inserting into (2.3.47) we obtain immediately that

φ′(t)ψ(x) = φ(t)ψ″(x)    (2.3.50)

must hold, or equivalently

φ′(t)/φ(t) = ψ″(x)/ψ(x)    (2.3.51)

Since the left side depends on t alone and the right side on x alone, it must be that both
sides are equal to a common constant, which we denote by −λ (without yet at this point
ruling out the possibility that λ itself is negative or even complex). We have therefore
obtained ODEs for φ and ψ,

φ′(t) + λφ(t) = 0    ψ″(x) + λψ(x) = 0    (2.3.52)

linked via the separation constant λ. Next, from the boundary condition (2.3.48) we get
φ(t)ψ(0) = φ(t)ψ(1) = 0, and since φ is nonzero we must have ψ(0) = ψ(1) = 0.

The ODE and side conditions for ψ, namely

ψ″(x) + λψ(x) = 0    0 < x < 1    ψ(0) = ψ(1) = 0    (2.3.53)
is the simplest example of a so-called Sturm-Liouville problem, a topic which will be
studied in detail in Chapter ( ), but this particular case can be handled by elementary
considerations. We emphasize that our goal is to find nonzero solutions of (2.3.53), along
with the values of λ these correspond to, and as we will see, only certain values of λ will
be possible.

Considering first the case that λ > 0, the general solution of the ODE is

ψ(x) = c_1 sin √λ x + c_2 cos √λ x    (2.3.54)

The first boundary condition ψ(0) = 0 implies that c_2 = 0, and the second gives
c_1 sin √λ = 0. We are not allowed to have c_1 = 0, since otherwise ψ ≡ 0, so instead
sin √λ = 0 must hold, i.e. √λ = π, 2π, . . . . Thus we have found one collection of solutions of (2.3.53), which we denote ψ_k(x) = sin kπx, k = 1, 2, . . . , corresponding to
λ = k²π². Since they were found under the assumption that λ > 0, we should next
consider the other possibilities, but it turns out that we have already found all possible
solutions of (2.3.53). For example, if we suppose λ < 0 and k = √(−λ), then to solve
(2.3.53) we must have ψ(x) = c_1 e^{kx} + c_2 e^{−kx}. From the boundary conditions

c_1 + c_2 = 0    c_1 e^k + c_2 e^{−k} = 0    (2.3.55)

we see that the unique solution is c_1 = c_2 = 0 for any k > 0. Likewise we can check that
ψ = 0 is the only possible solution for λ = 0 and for nonreal λ.
For each allowed value of λ we obviously have the corresponding function φ(t) = e^{−λt},
so that

u_k(x, t) = e^{−k²π²t} sin kπx    k = 1, 2, . . .    (2.3.56)

represents, aside from multiplicative constants, all possible product solutions of (2.3.47),(2.3.48).

To complete the solution of the initial and boundary value problem, we observe that
any sum Σ_{k=1}^∞ c_k u_k(x, t) is also a solution of (2.3.47),(2.3.48) as long as c_k → 0 sufficiently
rapidly, and we try to choose the coefficients c_k to achieve the initial condition (2.3.49).
The requirement is therefore that

f(x) = Σ_{k=1}^∞ c_k sin kπx    (2.3.57)

hold. For any f for which such a sine series representation is valid, we then have the
solution of the given PDE problem

u(x, t) = Σ_{k=1}^∞ c_k e^{−k²π²t} sin kπx    (2.3.58)
The question then becomes to characterize this set of f's in some more straightforward
way, and this is done, among many other things, within the theory of Fourier series, which
will be discussed in Chapter 8. Roughly speaking, the result will be that essentially any
reasonable function can be represented this way, but there are many aspects to this,
including elaboration of the precise sense in which the series converges. One other fact
concerning this series which we can easily anticipate at this point is a formula for the
coefficient c_k: if we assume that (2.3.57) holds, we can multiply both sides by sin mπx
for some integer m and integrate with respect to x over (0, 1), to obtain

∫_0^1 f(x) sin mπx dx = c_m ∫_0^1 sin² mπx dx = c_m/2    (2.3.59)

since ∫_0^1 sin kπx sin mπx dx = 0 for k ≠ m. Thus, if f is representable by a sine series,
there is only one possibility for the k'th coefficient, namely

c_k = 2 ∫_0^1 f(x) sin kπx dx    (2.3.60)
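The coefficient formula (2.3.60) and the truncated series solution (2.3.58) are straightforward to realize numerically. The sketch below uses the arbitrary sample choice f(x) = x(1 − x), whose exact sine coefficients are c_k = 8/(kπ)³ for odd k and 0 for even k; the midpoint quadrature rule and the truncation level are likewise illustrative assumptions.

```python
import numpy as np

def sine_coeffs(f, K, n=2000):
    """c_k = 2 * integral_0^1 f(x) sin(k pi x) dx, approximated by the midpoint rule."""
    x = (np.arange(n) + 0.5) / n
    return np.array([2.0 * np.mean(f(x) * np.sin(k*np.pi*x)) for k in range(1, K+1)])

def u(x, t, ck):
    """Truncated series solution (2.3.58)."""
    k = np.arange(1, len(ck) + 1)
    return np.sum(ck * np.exp(-(k*np.pi)**2 * t) * np.sin(k*np.pi*x))

f = lambda x: x*(1 - x)          # sample initial temperature distribution
ck = sine_coeffs(f, 50)

print(ck[0], 8/np.pi**3)         # computed vs exact first coefficient
print(ck[1])                     # even coefficients vanish
print(u(0.5, 0.0, ck), f(0.5))   # at t = 0 the series recovers f
```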
Laplace equation

Finally we discuss a model problem of elliptic type,

u_xx + u_yy = 0    x² + y² < 1    (2.3.61)
u(x, y) = f(x, y)    x² + y² = 1    (2.3.62)

where f is a given function. The PDE in (2.3.61) is known as Laplace's equation, and is
commonly written as Δu = 0, where Δ = ∂²/∂x² + ∂²/∂y² is the Laplace operator, or Laplacian.
A function satisfying Laplace's equation in some set is said to be a harmonic function on
that set; thus we are solving the boundary value problem of finding a harmonic function
in the unit disk x² + y² < 1 subject to a prescribed boundary condition on the boundary
of the disk.

One should immediately recognize that it would be natural here to make use of polar
coordinates (r, θ), where according to the usual calculus notations,

r = √(x² + y²)    tan θ = y/x    x = r cos θ    y = r sin θ    (2.3.63)

and we regard u = u(r, θ) and f = f(θ).
To begin we need to find the expression for Laplace's equation in polar coordinates.
Again this is a straightforward calculation with the chain rule; for example,

∂u/∂x = (∂u/∂r)(∂r/∂x) + (∂u/∂θ)(∂θ/∂x)    (2.3.64)
     = (x/√(x² + y²)) ∂u/∂r − (y/(x² + y²)) ∂u/∂θ    (2.3.65)
     = cos θ ∂u/∂r − (sin θ/r) ∂u/∂θ    (2.3.66)

with similar expressions for ∂u/∂y and the second derivatives. The end result is

u_xx + u_yy = u_rr + (1/r)u_r + (1/r²)u_θθ = 0    (2.3.67)
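The polar form (2.3.67) can be spot-checked symbolically. The sketch below (using sympy, with an arbitrary polynomial sample function; both choices are illustrative assumptions) verifies that the Cartesian and polar expressions for the Laplacian agree:

```python
import sympy as sp

x, y, r, th = sp.symbols('x y r theta', positive=True)

w = x**2 * y                                   # arbitrary sample function
cart = sp.diff(w, x, 2) + sp.diff(w, y, 2)     # u_xx + u_yy

wp = w.subs({x: r*sp.cos(th), y: r*sp.sin(th)})  # the same function in polar coordinates
polar = sp.diff(wp, r, 2) + sp.diff(wp, r)/r + sp.diff(wp, th, 2)/r**2

diff = sp.simplify(polar - cart.subs({x: r*sp.cos(th), y: r*sp.sin(th)}))
print(diff)  # 0
```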
We may now try separation of variables, looking for solutions in the product form
u(r, θ) = R(r)Θ(θ). Substituting into (2.3.67) and dividing by RΘ gives

r² R″(r)/R(r) + r R′(r)/R(r) = −Θ″(θ)/Θ(θ)    (2.3.68)

so both sides must be equal to a common constant λ. Therefore R and Θ must be nonzero
solutions of

Θ″ + λΘ = 0    r²R″ + rR′ − λR = 0    (2.3.69)

Next it is necessary to recognize that there are two 'hidden' side conditions which we
must make use of. The first of these is that Θ must be 2π periodic, since otherwise it
would not be possible to express the solution u in terms of the original variables x, y in
an unambiguous way. We can make this explicit by requiring

Θ(0) = Θ(2π)    Θ′(0) = Θ′(2π)    (2.3.70)

As in the case of (2.3.53), we can search for allowable values of λ by considering the
various cases λ > 0, λ < 0, etc. The outcome is that nontrivial solutions exist precisely if
λ = k², k = 0, 1, 2, . . . , with corresponding solutions, up to multiplicative constant,

Θ_k(θ) = 1 (k = 0),    Θ_k(θ) = sin kθ or cos kθ (k = 1, 2, . . .)    (2.3.71)

If one is willing to use the complex form, we could replace sin kθ, cos kθ by e^{±ikθ} for
k = 1, 2, . . . .
With λ determined, we must next solve the corresponding R equation,

r²R″ + rR′ − k²R = 0    (2.3.72)

which is of the Cauchy-Euler type (2.1.19). The general solution is

R(r) = c_1 + c_2 log r (k = 0),    R(r) = c_1 r^k + c_2 r^{−k} (k = 1, 2, . . .)    (2.3.73)

and here we encounter the second hidden condition: the solution R should not be
singular at the origin, since otherwise the PDE would not be satisfied throughout the
unit disk. Thus we should choose c_2 = 0 in each case, leaving R(r) = r^k, k = 0, 1, . . . .

Summarizing, we have found all possible product solutions R(r)Θ(θ) of (2.3.61), and
they are

1,  r^k sin kθ,  r^k cos kθ    k = 1, 2, . . .    (2.3.74)
up to constant multiples. Any sum of such terms is also a solution of (2.3.61), so we seek
a solution of (2.3.61),(2.3.62) in the form

u(r, θ) = a_0 + Σ_{k=1}^∞ (a_k r^k cos kθ + b_k r^k sin kθ)    (2.3.75)

The coefficients must then be determined from the requirement that

f(θ) = a_0 + Σ_{k=1}^∞ (a_k cos kθ + b_k sin kθ)    (2.3.76)

This is another problem in the theory of Fourier series, very similar to that
associated with (2.3.57), which as mentioned before will be studied in detail in Chapter
8. Exact formulas for the coefficients in terms of f may be given, as in (2.3.60); see
Exercise 19.
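As with the sine series, the coefficients in (2.3.76) can be computed numerically and the series (2.3.75) evaluated. The sketch below uses the standard full-range Fourier coefficient formulas (the subject of Exercise 19) and an arbitrary sample boundary function; both are illustrative assumptions.

```python
import numpy as np

def disk_solution(f, K, n=4096):
    """Return the truncated series (2.3.75) for boundary data f(theta)."""
    th = 2*np.pi*np.arange(n)/n
    fv = f(th)
    a0 = fv.mean()                                              # constant term
    ak = np.array([2*np.mean(fv*np.cos(k*th)) for k in range(1, K+1)])
    bk = np.array([2*np.mean(fv*np.sin(k*th)) for k in range(1, K+1)])
    def u(r, theta):
        k = np.arange(1, K+1)
        return a0 + np.sum(ak * r**k * np.cos(k*theta) + bk * r**k * np.sin(k*theta))
    return u

f = lambda th: np.cos(th)**2        # sample boundary data
u = disk_solution(f, 10)
print(u(1.0, 0.3), f(0.3))          # boundary condition is reproduced at r = 1
print(u(0.0, 0.0))                  # value at the center = average of the boundary data
```

The last line illustrates the mean value property of harmonic functions: u at the center of the disk equals the average of f over the boundary.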
2.3.4
Standard problems and side conditions
Let us now formulate a number of typical PDE problems which will recur throughout
this book, and which are for the most part variants of the model problems discussed in
the previous section. Let Ω be some domain in R^N and let ∂Ω denote the boundary of
Ω. For any sufficiently differentiable function u, the Laplacian of u is

Δu = Σ_{k=1}^N ∂²u/∂x_k²    (2.3.77)

• The PDE

Δu = h    x ∈ Ω    (2.3.78)
is Poisson’s equation, or Laplace’s equation in the special case that h = 0. It is
regarded as being of elliptic type, by analogy with the N = 2 case discussed in the
previous section, or on account of a more general definition of ellipticity which will
be given in Chapter 9. The most common types of side conditions associated with
this PDE are
– Dirichlet, or first kind, boundary conditions

u(x) = g(x)    x ∈ ∂Ω    (2.3.79)

– Neumann, or second kind, boundary conditions

∂u/∂n (x) = g(x)    x ∈ ∂Ω    (2.3.80)

where ∂u/∂n (x) = (∇u · n)(x) is the directional derivative in the direction of the
outward normal n(x) for x ∈ ∂Ω.

– Robin, or third kind, boundary conditions

∂u/∂n (x) + σ(x)u(x) = g(x)    x ∈ ∂Ω    (2.3.81)

for some given function σ.
• The PDE

Δu + λu = h    x ∈ Ω    (2.3.82)

where λ is some constant, is the Helmholtz equation, also of elliptic type. The
three types of boundary condition mentioned for the Poisson equation may also be
imposed in this case.
• The PDE

u_t = Δu    x ∈ Ω    t > 0    (2.3.83)

is the heat equation and is of parabolic type. Here u = u(x, t), where x is regarded
as a spatial variable and t a time variable. By convention, the Laplacian acts only
with respect to the N spatial variables x_1, . . . , x_N. Appropriate side conditions for
determining a solution of the heat equation are an initial condition

u(x, 0) = f(x)    x ∈ Ω    (2.3.84)

and boundary conditions of the Dirichlet, Neumann or Robin type mentioned above.
The only needed modification is that the functions involved may be allowed to
depend on t; for example, the Dirichlet boundary condition for the heat equation is

u(x, t) = g(x, t)    x ∈ ∂Ω    t > 0    (2.3.85)

and similarly for the other two types.
• The PDE

u_tt = Δu    x ∈ Ω    t > 0    (2.3.86)

is the wave equation and is of hyperbolic type. Since it is second order in t, it is
natural that there be two initial conditions, usually given as

u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ Ω    (2.3.87)

Suitable boundary conditions for the wave equation are precisely the same as for
the heat equation.
• Finally, the PDE

iu_t = Δu    x ∈ R^N    t > 0    (2.3.88)

is the Schrödinger equation. Even when N = 1 it does not fall under the classification scheme of Section 2.3.2 because of the complex coefficient i = √−1. It is
nevertheless one of the fundamental partial differential equations of mathematical
physics, and we will have some things to say about it in later chapters. The spatial
domain here is taken to be all of R^N rather than a subset Ω because this is by far
the most common situation and the only one which will arise in this book. Since
there is no spatial boundary, the only needed side condition is an initial condition
for u, u(x, 0) = f(x), as in the heat equation case.
2.4
Well-posed and ill-posed problems
All of the PDEs and associated side conditions discussed in the previous section turn
out to be natural, in the sense that they lead to what are called well-posed problems, a
somewhat imprecise concept we explain next. Roughly speaking a problem is well-posed
if
• A solution exists.
• The solution is unique.
• The solution depends continuously on the data.
Here by 'data' we mean any of the ingredients of the problem which we might imagine
being changed, to obtain a problem of the same general type. For example, in the Dirichlet
problem for the Poisson equation

Δu = f    x ∈ Ω    u = 0    x ∈ ∂Ω    (2.4.1)

the term f = f(x) would be regarded as the given data. The idea of continuous dependence is that if a 'small' change is made in the data, then the resulting solution should
also undergo only a small change. For such a notion to be made precise, it is necessary
to have some specific idea in mind of how we would measure the magnitude of a change
in f. As we shall see, there may be many natural ways to do so, and no precise statement about well-posedness can be given until such choices are made. In fact, even the
existence and uniqueness requirements, which may seem more clear cut, may also turn
out to require much clarification in terms of what the exact meaning of 'solution' is.
A problem which is not well-posed is called ill-posed. A classical problem in which
ill-posedness can be easily recognized is Hadamard's example, which we note is not of
one of the standard types mentioned above:

u_xx + u_yy = 0    −∞ < x < ∞    y > 0    (2.4.2)
u(x, 0) = 0    u_y(x, 0) = g(x)    −∞ < x < ∞    (2.4.3)

If g(x) = α sin kx for some α, k > 0, then a corresponding solution is

u(x, y) = α (sin kx / k) sinh ky    (2.4.4)

This is known to be the unique solution, but notice that a change in α (i.e. of the data g)
of size ε implies a corresponding change in the solution for, say, y = 1 of size ε sinh(k)/k,
which grows like e^k. Since k can be arbitrarily large, it follows that the problem is
ill-posed; that is, small changes in the data do not necessarily lead to small changes
in the solution.
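The loss of stability is easy to see numerically: for a fixed data perturbation size ε, the size of the corresponding change in the solution at y = 1 grows without bound as the frequency k increases. A small illustrative computation (the values of ε and k are arbitrary):

```python
import numpy as np

eps = 1e-6   # size of a perturbation alpha in the data g(x) = alpha*sin(kx)
# corresponding change in the solution at y = 1 has amplitude eps*sinh(k)/k
amp = {k: eps * np.sinh(k) / k for k in (1, 10, 50)}
for k, a in amp.items():
    print(k, a)   # grows roughly like eps * e^k / (2k)
```

Already at k = 50 a perturbation of size 10⁻⁶ in the data produces a change of astronomical size in the solution.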
Note that in this example if we change the PDE from u_xx + u_yy = 0 to u_xx − u_yy = 0
then (aside from the name of a variable) we have precisely the problem (2.3.41),(2.3.43),
which from the explicit solution (2.3.46) may be seen to be well-posed under any reasonable interpretation. Thus we see that some care must be taken in recognizing what
are the 'correct' side conditions for a given PDE. Other interesting examples of ill-posed
problems are given in Exercises 23 and 26; see also [24].
2.5
Exercises
1. Find a fundamental set and the general solution of u′′′ + u″ + u′ = 0.
2. Let L = aD² + bD + c (a ≠ 0) be a constant coefficient second order linear differential
operator, and let p(λ) = aλ² + bλ + c be the associated characteristic polynomial.
If λ_1, λ_2 are the roots of p, show that we can express the operator L as L = a(D −
λ_1)(D − λ_2). Use this factorization to obtain the general solution of Lu = 0 in the
case of repeated roots, λ_1 = λ_2.
3. Show that the solution of the initial value problem y′ = ∛y, y(0) = 0 is not unique.
(Hint: y(t) = 0 is one solution; find another one.) Why doesn't this contradict the
assertion in Theorem 2.1 about unique solvability of the initial value problem?
4. Solve the initial value problem for the Cauchy-Euler equation

(t + 1)²u″ + 4(t + 1)u′ − 10u = 0    u(1) = 2    u′(1) = −1
5. Consider the integral equation

λ ∫_0^1 K(x, y)u(y) dy = u(x) + g(x)

for the kernel

K(x, y) = x²/(1 + y³)

a) For what values of λ ∈ C does there exist a unique solution for any function g
which is continuous on [0, 1]?

b) Find the solution set of the equation for all λ ∈ C and continuous functions g.
(Hint: For λ ≠ 0 any solution must have the form u(x) = Cx² − g(x) for some
constant C.)
6. Find a kernel K(x, y) such that u(x) = ∫_0^1 K(x, y)f(y) dy is the solution of

u″ + u = f(x)    u(0) = u′(0) = 0

(Hint: Review the variation of parameters method in any undergraduate ODE
textbook.)
7. If f ∈ C([0, 1]),

K(x, y) = y(x − 1) for 0 < y < x < 1,    x(y − 1) for 0 < x < y < 1

and

u(x) = ∫_0^1 K(x, y)f(y) dy

show that

u″ = f    0 < x < 1    u(0) = u(1) = 0
8. For each of the integral operators in (2.2.8), (2.2.14), (2.2.15), (2.2.16), and (2.2.17),
discuss the classification(s) of the corresponding kernel, according to Definition 2.1.
9. Find the general solution of (1 + x²)u_x + u_y = 0. Sketch some of the characteristic
curves.
10. The general solution in Example 2.10 was found by solving the corresponding
Cauchy problem with Γ being the x axis. But the general solution should not
actually depend on any specific choice of Γ. Show that the same general solution is
found if instead we take Γ to be the y axis.
11. Find the solution of

y u_x + x u_y = 1    u(0, y) = e^{−y²}

Discuss why the solution you find is only valid for |y| ≥ |x|.
12. The method of characteristics developed in Section 2.3.1 for the linear PDE (2.3.10)
can be easily extended to the so-called semilinear equation

a(x, y)u_x + b(x, y)u_y = c(x, y, u)    (2.5.1)

We simply replace (2.3.12) by

(d/dt) u(x(t), y(t)) = c(x(t), y(t), u(x(t), y(t)))    (2.5.2)

which is still an ODE along a characteristic. With this in mind, solve

u_x + x u_y + u² = 0    u(0, y) = 1/y    (2.5.3)

13. Find the general solution of u_xx − 4u_xy + 3u_yy = 0.
14. Find the regions of the xy plane where the PDE

y u_xx − 2u_xy + x u_yy − 3u_x + u = 0

is elliptic, parabolic, and hyperbolic.
15. Find a solution formula for the half line wave equation problem

u_tt − c²u_xx = 0    x > 0    t > 0    (2.5.4)
u(0, t) = h(t)    t > 0    (2.5.5)
u(x, 0) = f(x)    x > 0    (2.5.6)
u_t(x, 0) = g(x)    x > 0    (2.5.7)

Note where the solution coincides with (2.3.46) and explain why this should be
expected.
16. Complete the details of verifying (2.3.67).
17. If u is a twice differentiable function on R^N depending only on r = |x|, show that

Δu = u_rr + ((N − 1)/r) u_r

(Spherical coordinates in R^N are reviewed in Section 18.3, but the details of the
angular variables are not needed for this calculation. Start by showing that
∂u/∂x_j = u′(r) x_j/r.)
18. Verify in detail that there are no nontrivial solutions of (2.3.53) for nonreal λ ∈ C.
19. Assuming that (2.3.76) is valid, find the coefficients a_k, b_k in terms of f. (Hint:
multiply the equation by sin mθ or cos mθ and integrate from 0 to 2π.)
20. In the two dimensional case, solutions of Laplace's equation Δu = 0 may also be
found by means of analytic function theory. Recall that if z = x + iy, then a function
f(z) is analytic in an open set Ω if f′(z) exists at every point of Ω. If we think of
f = u + iv and u, v as functions of x, y, then u = u(x, y), v = v(x, y) must satisfy
the Cauchy-Riemann equations u_x = v_y, u_y = −v_x. Show in this case that u, v are
also solutions of Laplace's equation. Find u, v if f(z) = z³ and f(z) = e^z.
21. Find all of the product solutions u(x, t) = φ(t)ψ(x) that you can which satisfy the
damped wave equation

u_tt + αu_t = u_xx    0 < x < π    t > 0

and the boundary conditions

u(0, t) = u_x(π, t) = 0    t > 0

Here α > 0 is the damping constant. What is the significance of the condition
α < 1?
22. Show that any solution of the wave equation u_tt − u_xx = 0 has the 'four point
property'

u(x, t) + u(x + h − k, t + h + k) = u(x + h, t + h) + u(x − k, t + k)

for any h, k. (Suggestion: Use d'Alembert's formula.)
23. In the Dirichlet problem for the wave equation

u_tt − u_xx = 0    0 < x < 1    0 < t < 1
u(0, t) = u(1, t) = 0    0 < t < 1
u(x, 0) = 0    u(x, 1) = f(x)    0 < x < 1

show that neither existence nor uniqueness holds. (Hint: For the non-existence
part, use Exercise 22 to find an f for which no solution exists.)
24. Let Ω be the rectangle [0, a] × [0, b] in R². Find all possible product solutions

u(x, y, t) = φ(t)ψ(x)ζ(y)

satisfying

u_t − Δu = 0    (x, y) ∈ Ω    t > 0
u(x, y, t) = 0    (x, y) ∈ ∂Ω    t > 0
25. Find a solution of the Dirichlet problem for u = u(x, y) in the unit disc Ω = {(x, y) :
x² + y² < 1},

Δu = 1    (x, y) ∈ Ω    u(x, y) = 0    (x, y) ∈ ∂Ω

(Suggestion: look for a solution in the form u = u(r) and recall (2.3.67).)
26. The problem

u_t = u_xx    0 < x < 1    t < T    (2.5.8)
u(0, t) = u(1, t) = 0    t > 0    (2.5.9)
u(x, T) = f(x)    0 < x < 1    (2.5.10)

is sometimes called a final value problem for the heat equation.

a) Show that this problem is ill-posed.

b) Show that this problem is equivalent to (2.3.47),(2.3.48),(2.3.49) except with the
heat equation (2.3.47) replaced by the backward heat equation u_t = −u_xx.
Chapter 3
Vector spaces
We will be working frequently with function spaces which are themselves special cases
of more abstract spaces. Most such spaces which are of interest to us have both linear
structure and metric structure. This means that given any two elements of the space it
is meaningful to speak of (i) a linear combination of the elements, and (ii) the distance
between the two elements. These two kinds of concepts are abstracted in the definitions
of vector space and metric space.
3.1
Axioms of a vector space
Definition 3.1. A vector space is a set X such that whenever x, y ∈ X and λ is a scalar
we have x + y ∈ X and λx ∈ X, and the following axioms hold.

[V1] x + y = y + x for all x, y ∈ X

[V2] (x + y) + z = x + (y + z) for all x, y, z ∈ X

[V3] There exists an element 0 ∈ X such that x + 0 = x for all x ∈ X

[V4] For every x ∈ X there exists an element −x ∈ X such that x + (−x) = 0

[V5] λ(x + y) = λx + λy for all x, y ∈ X and any scalar λ

[V6] (λ + μ)x = λx + μx for any x ∈ X and any scalars λ, μ

[V7] λ(μx) = (λμ)x for any x ∈ X and any scalars λ, μ

[V8] 1x = x for any x ∈ X
Here the field of scalars may be either the real numbers R or the complex numbers
C, and we may refer to X as a real or complex vector space accordingly, if a distinction
needs to be made.

By an obvious induction argument, if x_1, . . . , x_m ∈ X and λ_1, . . . , λ_m are scalars, then
the linear combination Σ_{j=1}^m λ_j x_j is itself an element of X.
Example 3.1. Ordinary N-dimensional Euclidean space

R^N := {x = (x_1, x_2, . . . , x_N) : x_j ∈ R}

is a real vector space with the usual operations of vector addition and scalar multiplication,

(x_1, x_2, . . . , x_N) + (y_1, y_2, . . . , y_N) = (x_1 + y_1, x_2 + y_2, . . . , x_N + y_N)

λ(x_1, x_2, . . . , x_N) = (λx_1, λx_2, . . . , λx_N)    λ ∈ R

If we allow the components x_j as well as the scalars λ to be complex, we obtain
instead the complex vector space C^N.
Example 3.2. If E ⊂ R^N, let

C(E) = {f : E → R : f is continuous at x for every x ∈ E}

denote the set of real valued continuous functions on E. Clearly C(E) is a real vector
space with the ordinary operations of function addition and scalar multiplication,

(f + g)(x) = f(x) + g(x)    (λf)(x) = λf(x)    λ ∈ R

If we allow the range space in the definition of C(E) to be C, then C(E) becomes a
complex vector space.

Spaces of differentiable functions likewise may be naturally regarded as vector spaces,
for example

C^m(E) = {f : D^α f ∈ C(E), |α| ≤ m}

and

C^∞(E) = {f : D^α f ∈ C(E) for all α}
Example 3.3. If 0 < p < ∞ and E is a measurable subset of R^N, the space L^p(E) is
defined to be the set of measurable functions f : E → R or f : E → C such that

∫_E |f(x)|^p dx < ∞    (3.1.1)

Here the integral is defined in the Lebesgue sense. Those unfamiliar with measure theory
and Lebesgue integration should consult a standard textbook such as [29],[27], or see a
brief summary in Appendix ( ).

It may then be shown that L^p(E) is a vector space for any 0 < p < ∞. To see this we
use the known fact that if f, g are measurable then so are f + g and λf for any scalar λ,
and the numerical inequality (a + b)^p ≤ C_p(a^p + b^p) for a, b ≥ 0, where C_p = max(2^{p−1}, 1),
to prove that f + g ∈ L^p(E) whenever f, g ∈ L^p(E). Verification of the remaining axioms
is routine.

The related vector space L^∞(E) is defined as the set of measurable functions f for
which

ess sup_{x∈E} |f(x)| < ∞    (3.1.2)

Here M = ess sup_{x∈E} |f(x)| if |f(x)| ≤ M a.e. and there is no smaller constant with this
property.
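The numerical inequality (a + b)^p ≤ C_p(a^p + b^p) used above is easy to spot-check over random samples; the following minimal sketch (sample sizes and ranges are arbitrary choices) does so for a few exponents:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 10.0, 1000)
b = rng.uniform(0.0, 10.0, 1000)

holds = {}
for p in (0.5, 1.0, 2.0, 3.5):
    Cp = max(2.0**(p - 1.0), 1.0)   # C_p = max(2^{p-1}, 1)
    # small slack absorbs floating point rounding
    holds[p] = bool(np.all((a + b)**p <= Cp * (a**p + b**p) + 1e-9))
print(holds)
```

For p ≥ 1 the inequality follows from convexity of t ↦ t^p; for 0 < p < 1 it holds with C_p = 1.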
Definition 3.2. If X is a vector space, a subset M ⊂ X is a subspace of X if

(i) x + y ∈ M whenever x, y ∈ M

(ii) λx ∈ M whenever x ∈ M and λ is a scalar

That is to say, a subspace is a subset of X which is closed under the formation of linear
combinations. Clearly a subspace of a vector space is itself a vector space.

Example 3.4. The subset M = {x ∈ R^N : x_j = 0} is a subspace of R^N for any fixed j.

Example 3.5. If E ⊂ R^N then C^∞(E) is a subspace of C^m(E) for any m, which in turn
is a subspace of C(E).
Example 3.6. If X is any vector space and S ⊂ X, then the set of all finite linear
combinations of elements of S,

L(S) := {x ∈ X : x = Σ_{j=1}^m λ_j x_j for some scalars λ_1, λ_2, . . . , λ_m and elements x_1, . . . , x_m ∈ S}

is a subspace of X. It is also called the span, or linear span, of S, or the subspace generated
by S.
Example 3.7. If in Example 3.6 we take X = C([a, b]) and f_j(x) = x^{j−1} for j = 1, 2, . . . ,
then the subspace generated by {f_j}_{j=1}^{N+1} is P_N, the vector space of polynomials of degree
less than or equal to N. Likewise, the subspace generated by {f_j}_{j=1}^∞ is P, the vector
space of all polynomials.
3.2
Linear independence and bases

Definition 3.3. We say that S ⊂ X is linearly independent if whenever x_1, . . . , x_m ∈ S,
λ_1, . . . , λ_m are scalars, and Σ_{j=1}^m λ_j x_j = 0, then λ_1 = λ_2 = . . . = λ_m = 0. Otherwise S is
linearly dependent.

Equivalently, S is linearly dependent if it is possible to express at least one of its
elements as a linear combination of the remaining ones. In particular, any set containing
the zero element is linearly dependent.
Definition 3.4. We say that S ⊂ X is a basis of X if for any x ∈ X there exist unique
scalars λ_1, λ_2, . . . , λ_m and elements x_1, . . . , x_m ∈ S such that x = Σ_{j=1}^m λ_j x_j.
The following characterization of a basis is then immediate:
Theorem 3.1. S ⊂ X is a basis of X if and only if S is linearly independent and
L(S) = X.
It is important to emphasize that in this definition of basis it is required that every
x ∈ X be expressible as a finite linear combination of the basis elements. This notion
of basis will be inadequate for later purposes, and will be replaced by one which allows
infinite sums, but this cannot be done until a meaning of convergence is available. The
notion of basis in Definition 3.4 is called a Hamel basis if a distinction is necessary.
Definition 3.5. We say that dim X, the dimension of X, is m if there exist m linearly
independent vectors in X but any collection of m + 1 elements of X is linearly dependent.
If there exist m linearly independent vectors for any positive integer m, then we say
dim X = ∞.
Proposition 3.1. The elements {x_1, x_2, . . . x_m} form a basis for L({x_1, x_2, . . . x_m}) if
and only if they are linearly independent.
Proposition 3.2. The dimension of X is the number of vectors in any basis of X.
The proofs of both of these Propositions are left for the exercises.
Example 3.8. R^N or C^N has dimension N. We will denote by e_j the standard unit
vector with a one in the j'th position and zeros elsewhere. Then {e_1, e_2, . . . e_N} is the
standard basis for either R^N or C^N.
Example 3.9. In the vector space C([a, b]) the elements f_j(t) = t^{j−1} are clearly linearly
independent, so that the dimension is ∞, as is the dimension of the subspace P. Also
evidently the subspace P_N has dimension N + 1.
Example 3.10. The set of solutions of the ordinary differential equation u'' + u = 0
is precisely the set of linear combinations u(t) = λ_1 sin t + λ_2 cos t. Since sin t, cos t are
linearly independent functions, they form a basis for this two dimensional space.
The following is interesting, although not of great practical significance. Its proof,
which is not obvious in the infinite dimensional case, relies on the Axiom of Choice and
will not be given here.
Theorem 3.2. Every vector space has a basis.
3.3 Linear transformations of a vector space
If X and Y are vector spaces, a mapping T : X → Y is called linear if

   T(λ_1 x_1 + λ_2 x_2) = λ_1 T(x_1) + λ_2 T(x_2)   (3.3.1)

for all x_1, x_2 ∈ X and all scalars λ_1, λ_2. Such a linear transformation is uniquely
determined on all of X by its action on any basis of X, i.e. if S = {x_α}_{α∈A} is a basis of X
and y_α = T(x_α), then for any x = Σ_{j=1}^m λ_j x_{α_j} we have T(x) = Σ_{j=1}^m λ_j y_{α_j}.
In the case that X and Y are both of finite dimension let us choose bases {x_1, x_2, . . . x_m},
{y_1, y_2, . . . y_n} of X, Y respectively. For 1 ≤ j ≤ m there must exist unique scalars a_{kj}
such that T(x_j) = Σ_{k=1}^n a_{kj} y_k, and it follows that

   x = Σ_{j=1}^m λ_j x_j  ⟹  T(x) = Σ_{k=1}^n μ_k y_k  where  μ_k = Σ_{j=1}^m a_{kj} λ_j   (3.3.2)

For a given basis {x_1, x_2, . . . x_m} of X, if x = Σ_{j=1}^m λ_j x_j we say that λ_1, λ_2, . . . λ_m are
the coordinates of x with respect to the given basis. The n × m matrix A = [a_{kj}] thus
maps the coordinates of x with respect to the basis {x_1, x_2, . . . x_m} to the coordinates of
T(x) with respect to the basis {y_1, y_2, . . . y_n}, and thus encodes all information about the
linear mapping T.
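To make (3.3.2) concrete, here is a minimal sketch (in Python with NumPy; the example is our own illustration, not from the text) computing the matrix of the differentiation map D : P_3 → P_3 of Exercise 9, with respect to the monomial basis {1, t, t², t³}.

```python
import numpy as np

# Matrix A = [a_kj] of the differentiation map D : P_3 -> P_3 with respect
# to the monomial basis {1, t, t^2, t^3}: since D(t^j) = j t^(j-1), column j
# of A holds the coordinates of the image of the j-th basis element.
n = 4
A = np.zeros((n, n))
for j in range(1, n):
    A[j - 1, j] = j

# p(t) = 1 + 2t + 3t^2 has coordinate vector (1, 2, 3, 0);
# A maps it to the coordinates of Dp = 2 + 6t.
p = np.array([1.0, 2.0, 3.0, 0.0])
Dp = A @ p
print(Dp)  # [2. 6. 0. 0.]
```

Note that A is singular (constants are in its null space), consistent with D failing to be an isomorphism of P_3.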
If T : X → Y is linear, one-to-one and onto then we say T is an isomorphism
between X and Y, and the vector spaces X and Y are isomorphic whenever there exists
an isomorphism between them. If T is such an isomorphism, and S is a basis of X, then it
is easy to check that the image set T(S) is a basis of Y. In particular, any two isomorphic
vector spaces have the same finite dimension or are both infinite dimensional.
For any linear mapping T : X → Y we define the kernel, or null space, of T as

   N(T) = {x ∈ X : T(x) = 0}   (3.3.3)

and the range of T as

   R(T) = {y ∈ Y : y = T(x) for some x ∈ X}   (3.3.4)

It is immediate that N(T) and R(T) are subspaces of X, Y respectively, and T is an
isomorphism precisely if N(T) = {0} and R(T) = Y. If X = Y = R^N or C^N, we learn
in linear algebra that these two conditions are equivalent.
3.4 Exercises
1. Using only the vector space axioms, show that the zero element in [V3] is unique.
2. Prove Propositions 3.1 and 3.2.
3. Show that the intersection of any family of subspaces of a vector space is also a
subspace. What about the union of subspaces?
4. Show that M_{m×n}, the set of m × n matrices, with the usual definitions of addition
and scalar multiplication, is a vector space of dimension mn. Show that the subset
of symmetric n × n matrices forms a subspace of M_{n×n}. What is its
dimension?
5. Under what conditions on a measurable set E ⊂ R^N and p ∈ (0, ∞] will it be true
that C(E) is a subspace of L^p(E)? Under what conditions is L^p(E) a subset of
L^q(E)?
6. Let u_j(t) = t^{λ_j} where λ_1, . . . λ_n are arbitrary unequal real numbers. Show that
{u_1, . . . u_n} are linearly independent functions on any interval (a, b) ⊂ R.
(Suggestion: if Σ_{j=1}^n α_j t^{λ_j} = 0, divide by t^{λ_1} and differentiate.)
7. A side condition for a differential equation is homogeneous if whenever two functions
satisfy the side condition then so does any linear combination of the two functions.
For example the Dirichlet type boundary condition u = 0 for x ∈ ∂Ω is homogeneous.
Now let Lu = Σ_{|α|≤m} a_α(x) D^α u denote any linear differential operator. Show that
the set of functions satisfying Lu = 0 and any homogeneous side conditions is a
vector space.
8. Consider the differential equation u'' + u = 0 on the interval (0, π). What is the
dimension of the vector space of solutions which satisfy the homogeneous boundary
conditions a) u(0) = u(π), and b) u(0) = u(π) = 0? Repeat the question if the
interval (0, π) is replaced by (0, 1) and (0, 2π).
9. Let Df = f' for any differentiable function f on R. For any N ≥ 0 show that
D : P_N → P_N is linear, and find its null space and range.
10. If X and Y are vector spaces, then the Cartesian product of X and Y is defined
as the set of ordered pairs

   X × Y = {(x, y) : x ∈ X, y ∈ Y}   (3.4.1)

Addition and scalar multiplication on X × Y are defined in the natural way,

   (x, y) + (x̂, ŷ) = (x + x̂, y + ŷ)   λ(x, y) = (λx, λy)   (3.4.2)

a) Show that X × Y is a vector space.
b) Show that R × R is isomorphic to R².
11. If X, Y are vector spaces of the same finite dimension, show X and Y are isomorphic.
12. Show that L^p(0, 1) and L^p(a, b) are isomorphic, for any a, b ∈ R and p ∈ (0, ∞].
Chapter 4
Metric spaces
4.1 Axioms of a metric space
A metric space is a set on which some natural notion of distance may be defined.
Definition 4.1. A metric space is a pair (X, d) where X is a set and d is a real valued
mapping on X × X, such that the following axioms hold.
[M1] d(x, y) ≥ 0 for all x, y ∈ X
[M2] d(x, y) = 0 if and only if x = y
[M3] d(x, y) = d(y, x) for all x, y ∈ X
[M4] d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.
Here d is the metric on X, i.e. d(x, y) is regarded as the distance from x to y. Axiom
[M4] is known as the triangle inequality. Although strictly speaking the metric space is
the pair (X, d), it is common practice to refer to X itself as being the metric space, with
the metric d understood from context. But as we will see in examples, it is often possible
to assign different metrics to the same set X.
If (X, d) is a metric space and Y ⊂ X, then it is clear that (Y, d) is also a metric
space, and in this case we say that Y inherits the metric of X.
Example 4.1. If X = R^N then there are many choices of d for which (R^N, d) is a metric
space. The most familiar is the ordinary Euclidean distance

   d(x, y) = ( Σ_{j=1}^N |x_j − y_j|² )^{1/2}   (4.1.1)

In general we may define

   d_p(x, y) = ( Σ_{j=1}^N |x_j − y_j|^p )^{1/p}   1 ≤ p < ∞   (4.1.2)

and

   d_∞(x, y) = max(|x_1 − y_1|, |x_2 − y_2|, . . . |x_N − y_N|)   (4.1.3)

The verification that (R^N, d_p) is a metric space for 1 ≤ p ≤ ∞ is left to the exercises
– the triangle inequality is the only nontrivial step. The same family of metrics may be
used with X = C^N.
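The metrics d_p and d_∞ are easy to experiment with numerically. The following sketch (Python with NumPy; the sample points are our own illustration) implements the whole family on R^N.

```python
import numpy as np

def dp(x, y, p):
    """The metric d_p on R^N: the p-th root of the sum of |x_j - y_j|^p,
    or the maximum coordinate difference when p is infinite."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if p == np.inf:
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(dp(x, y, 2))       # 5.0 (Euclidean distance)
print(dp(x, y, 1))       # 7.0
print(dp(x, y, np.inf))  # 4.0
```

One can check numerically that d_∞(x, y) ≤ d_p(x, y) ≤ N^{1/p} d_∞(x, y), so all these metrics give the same convergent sequences.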
Example 4.2. To assign a metric to C(E), more specific assumptions must be made
about E. If we assume, for example, that E is a closed and bounded¹ subset of R^N we
may set

   d_∞(f, g) = max_{x∈E} |f(x) − g(x)|   (4.1.4)

so that d_∞(f, g) is always finite by virtue of the well known theorem that a continuous
function achieves its maximum on a closed, bounded set. Other possibilities are

   d_p(f, g) = ( ∫_E |f(x) − g(x)|^p dx )^{1/p}   1 ≤ p < ∞   (4.1.5)

Note the analogy with the definition of d_p in the case of R^N or C^N.
For more arbitrary sets E there is in general no natural metric for C(E). For example,
if E is an open set, none of the metrics d_p can be used, since there is no reason why d_p(f, g)
should be finite for f, g ∈ C(E).
As in the case of vector spaces, some spaces of differentiable functions may also be
made into metric spaces. For this we will assume a bit more about E, namely that E is
¹ I.e. E is compact in R^N. Compactness is discussed in more detail below, and we avoid using the term until then.
the closure of a bounded open set O ⊂ R^N, and in this case will say that D^α f ∈ C(E) if
the function D^α f defined in the usual pointwise sense on O has a continuous extension
to E. We then can define

   C^m(E) = {f : D^α f ∈ C(E) whenever |α| ≤ m}   (4.1.6)

with metric

   d(f, g) = max_{|α|≤m} max_{x∈E} |D^α(f − g)(x)|   (4.1.7)

which may be easily checked to satisfy [M1]-[M4].
We cannot define a metric on C^∞(E) in the obvious way just by letting m → ∞
in the above definition, since there is no reason why the resulting maximum over m in
(4.1.7) will be finite, even if f ∈ C^m(E) for every m. See however Exercise 18.
Example 4.3. Recall that if E is a measurable subset of R^N, we have defined
corresponding vector spaces L^p(E) for 0 < p ≤ ∞. To endow them with metric space
structure let

   d_p(f, g) = ( ∫_E |f(x) − g(x)|^p dx )^{1/p}   (4.1.8)

for 1 ≤ p < ∞, and

   d_∞(f, g) = ess sup_{x∈E} |f(x) − g(x)|   (4.1.9)
The validity of axioms [M1] and [M3] is clear, and the triangle inequality [M4] is
an immediate consequence of the Minkowski inequality (18.1.10). But axiom [M2] does
not appear to be satisfied here, since, for example, two functions f, g agreeing except at
a single point, or more generally agreeing except on a set of measure zero, would have
d_p(f, g) = 0. It is necessary, therefore, to modify our point of view concerning L^p(E) as
follows. We define an equivalence relation f ∼ g if f = g almost everywhere, i.e. except
on a set of measure zero. If d_p(f, g) = 0 we would then be able to correctly conclude that
f ∼ g, in which case we will regard f and g as being the same element of L^p(E). Thus,
strictly speaking, L^p(E) is the set of equivalence classes of measurable functions, where
the equivalence classes are defined by means of the above equivalence relation.
The distance d_p([f], [g]) between two equivalence classes [f] and [g] may be
unambiguously determined by selecting a representative of each class and then evaluating
the distance from (4.1.8) or (4.1.9). Likewise the vector space structure of L^p(E) is
maintained since, for example, we can define the sum of equivalence classes [f] + [g] by
selecting a representative of each class and observing that if f_1 ∼ f_2 and g_1 ∼ g_2 then
f_1 + g_1 ∼ f_2 + g_2. It is rarely necessary to make a careful distinction between a measurable
function and the equivalence class it belongs to, and whenever it can cause no confusion
we will follow the common practice of referring to members of L^p(E) as functions rather
than equivalence classes. The notation f may be used to stand for either a function or its
equivalence class. An element f ∈ L^p(E) will be said to be continuous if its equivalence
class contains a continuous function, and in this way we can naturally regard C(E) as a
subset of L^p(E).
Although L^p(E) is a vector space for 0 < p ≤ ∞, we cannot use the above definition
of metric for 0 < p < 1, since it turns out the triangle inequality is not satisfied (see
Exercise 6 of Chapter 5) except in degenerate cases.
4.2 Topological concepts
In a metric space various concepts of point set topology may be introduced.
Definition 4.2. If (X, d) is a metric space then
1. B(x, ε) = {y ∈ X : d(x, y) < ε} is the ball centered at x of radius ε.
2. A set E ⊂ X is bounded if there exists some x ∈ X and R < ∞ such that
E ⊂ B(x, R).
3. If E ⊂ X, then a point x ∈ X is an interior point of E if there exists ε > 0 such
that B(x, ε) ⊂ E.
4. If E ⊂ X, then a point x ∈ X is a limit point of E if for any ε > 0 there exists a
point y ∈ B(x, ε) ∩ E, y ≠ x.
5. A subset E ⊂ X is open if every point of E is an interior point of E. By convention,
the empty set is open.
6. A subset E ⊂ X is closed if every limit point of E is in E.
7. The closure Ē of a set E ⊂ X is the union of E and the limit points of E.
8. The interior E° of a set E is the set of all interior points of E.
9. A subset E is dense in X if Ē = X.
10. X is separable if it contains a countable dense subset.
11. If E ⊂ X, we say that x ∈ X is a boundary point of E if for any ε > 0 the ball
B(x, ε) contains at least one point of E and at least one point of the complement
E^c = {x ∈ X : x ∉ E}. The boundary of E is denoted ∂E.
The following Proposition states a number of elementary but important properties.
Proofs are essentially the same as in the more familiar special case when the metric space
is a subset of RN , and will be left for the reader.
Proposition 4.1. Let (X, d) be a metric space. Then
1. B(x, ε) is open for any x ∈ X and ε > 0.
2. E ⊂ X is open if and only if its complement E^c is closed.
3. An arbitrary union or finite intersection of open sets is open.
4. An arbitrary intersection or finite union of closed sets is closed.
5. If E ⊂ X then E° is the union of all open sets contained in E, E° is open, and E
is open if and only if E = E°.
6. Ē is the intersection of all closed sets containing E, Ē is closed, and E is closed if
and only if E = Ē.
7. If E ⊂ X then ∂E = Ē \ E° = Ē ∩ (E^c)‾.
Next we study infinite sequences in X.
Definition 4.3. We say that a sequence {x_n}_{n=1}^∞ in X is convergent to x, that is,
lim_{n→∞} x_n = x, if for any ε > 0 there exists n_0 < ∞ such that d(x_n, x) < ε whenever n ≥ n_0.
Example 4.4. If X = R^N or C^N, and d is any one of the metrics d_p, then x_n → x if and
only if each component sequence converges to the corresponding limit, i.e. x_{j,n} → x_j as
n → ∞ in the ordinary sense of convergence in R or C. (Here x_{j,n} is the j'th component
of x_n.)
Example 4.5. In the metric space (C(E), d_∞) of Example 4.2, lim_{n→∞} f_n = f is
equivalent to the definition of uniform convergence on E.
Definition 4.4. We say that a sequence {x_n}_{n=1}^∞ in X is a Cauchy sequence if for any
ε > 0 there exists n_0 < ∞ such that d(x_n, x_m) < ε whenever n, m ≥ n_0.
It is easy to see that a convergent sequence is always a Cauchy sequence, but the
converse may be false.
Definition 4.5. A metric space X is said to be complete if every Cauchy sequence in X
is convergent in X.
Example 4.6. Completeness is one of the fundamental properties of the real numbers
R, see for example Chapter 1 of [28]. If a sequence {x_n}_{n=1}^∞ in R^N is Cauchy with respect
to any of the metrics d_p, then each component sequence {x_{j,n}}_{n=1}^∞ is a Cauchy sequence
in R, hence convergent in R. It then follows immediately that {x_n}_{n=1}^∞ is convergent in
R^N, again with any of the metrics d_p. The same conclusion holds for C^N, so that R^N, C^N
are complete metric spaces. These spaces are also separable since the subset consisting
of points with rational coordinates is countable and dense. A standard example of an
incomplete metric space is the set of rational numbers with the metric inherited from R.
Most metric spaces used in this book, and indeed most metric spaces used in applied
mathematics, are complete.
Proposition 4.2. If E ⊂ R^N is closed and bounded, then the metric space C(E) with
metric d = d_∞ is complete.
Proof: Let {f_n}_{n=1}^∞ be a Cauchy sequence in C(E). If ε > 0 we may then find n_0 such
that

   max_{x∈E} |f_n(x) − f_m(x)| < ε   (4.2.1)

whenever n, m ≥ n_0. In particular the sequence of numbers {f_n(x)}_{n=1}^∞ is Cauchy in R
or C for each fixed x ∈ E, so we may define f(x) := lim_{n→∞} f_n(x). Letting m → ∞ in
(4.2.1) we obtain

   |f_n(x) − f(x)| ≤ ε   for n ≥ n_0, x ∈ E   (4.2.2)

which means d(f_n, f) ≤ ε for n ≥ n_0. It remains to check that f ∈ C(E). If we pick
x ∈ E, then since f_{n_0} ∈ C(E) there exists δ > 0 such that |f_{n_0}(x) − f_{n_0}(y)| < ε if
|y − x| < δ. Thus for |y − x| < δ we have

   |f(x) − f(y)| ≤ |f(x) − f_{n_0}(x)| + |f_{n_0}(x) − f_{n_0}(y)| + |f_{n_0}(y) − f(y)| < 3ε   (4.2.3)

Since ε is arbitrary, f is continuous at x, and since x is arbitrary, f ∈ C(E). Thus we
have concluded that the Cauchy sequence {f_n}_{n=1}^∞ is convergent in C(E) to f ∈ C(E),
as needed. □
The final part of the above proof should be recognized as the standard proof of the
familiar fact that a uniform limit of continuous functions is continuous.
The spaces C^m(E) can likewise be shown, again assuming that E is closed and
bounded, to be complete metric spaces with the metric defined in (4.1.7), see Exercise 19.
If we were to choose the metric d_1 on C(E) then the resulting metric space is not
complete. Choose for example E = [−1, 1] and f_n(x) = x^{1/(2n+1)}, so that the pointwise
limit of f_n(x) is

   f(x) = 1 for x > 0,   f(x) = −1 for x < 0,   f(0) = 0   (4.2.4)

By a simple calculation

   ∫_{−1}^{1} |f_n(x) − f(x)| dx = 1/(n + 1)   (4.2.5)

so that {f_n}_{n=1}^∞ must be Cauchy in C(E) with metric d_1. On the other hand {f_n}_{n=1}^∞
cannot be convergent in this space, since the only possible limit is f, which does not
belong to C(E).
The same example can be modified to show that C(E) is not complete with any of
the metrics d_p for 1 ≤ p < ∞, and so d_∞ is in some sense the 'natural' metric. For this
reason C(E) will always be assumed to be supplied with the metric d_∞ unless otherwise
stated.
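The computation (4.2.5) can be checked numerically. The sketch below (Python with NumPy; the trapezoid-rule implementation is our own illustration) approximates the d_1 distance between f_n and the sign function and compares it with 1/(n + 1).

```python
import numpy as np

# Numerical check: for f_n(x) = x^(1/(2n+1)) on [-1, 1], with pointwise
# limit f = sign, the d_1 distance is 1/(n+1), which tends to 0 even
# though the limit f is discontinuous.
def fn(x, n):
    # odd root of x, written to handle negative x
    return np.sign(x) * np.abs(x) ** (1.0 / (2 * n + 1))

x = np.linspace(-1.0, 1.0, 200001)
for n in (1, 5, 20):
    integrand = np.abs(fn(x, n) - np.sign(x))
    # composite trapezoid rule for the integral over [-1, 1]
    approx = ((integrand[:-1] + integrand[1:]) / 2 * np.diff(x)).sum()
    print(n, approx, 1.0 / (n + 1))
```

The printed approximations agree with 1/(n + 1) up to small quadrature error.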
We next summarize in the form of a theorem some especially important facts about
the metric spaces Lp (E), which may be found in any standard textbook on Lebesgue
integration, for example Chapter 3 of [29] or Chapter 8 of [37].
Theorem 4.1. If E ⊂ R^N is measurable, then
1. L^p(E) is complete for 1 ≤ p ≤ ∞.
2. L^p(E) is separable for 1 ≤ p < ∞.
3. If C_c(E) is the set of continuous functions of bounded support, i.e.

   C_c(E) = {f ∈ C(E) : there exists R < ∞ such that f(x) ≡ 0 for |x| > R}   (4.2.6)

then C_c(E) is dense in L^p(E) for 1 ≤ p < ∞.
The completeness property is a significant result in measure theory, often known as
the Riesz-Fischer Theorem.
4.3 Functions on metric spaces and continuity
Next, suppose X, Y are two metric spaces with metrics dX , dY respectively.
Definition 4.6. Let T : X → Y be a mapping.
1. We say T is continuous at a point x ∈ X if for any ε > 0 there exists δ > 0 such
that d_Y(T(x), T(x̂)) ≤ ε whenever d_X(x, x̂) ≤ δ.
2. T is continuous on X if it is continuous at each point of X.
3. T is uniformly continuous on X if for any ε > 0 there exists δ > 0 such that
d_Y(T(x), T(x̂)) ≤ ε whenever d_X(x, x̂) ≤ δ, x, x̂ ∈ X.
4. T is Lipschitz continuous on X if there exists L such that

   d_Y(T(x), T(x̂)) ≤ L d_X(x, x̂)   for all x, x̂ ∈ X   (4.3.1)

The infimum of all L's which work in this definition is called the Lipschitz constant
of T.
Clearly we have the implications that T Lipschitz continuous implies T is uniformly
continuous, which in turn implies that T is continuous.
T is one-to-one, or injective, if T(x_1) = T(x_2) only if x_1 = x_2, and onto, or surjective,
if for every y ∈ Y there exists some x ∈ X such that T(x) = y. If T is both one-to-one
and onto then we say it is bijective, and in this case there must exist an inverse mapping
T^{−1} : Y → X.
For any mapping T : X → Y we define, for E ⊂ X and F ⊂ Y,

   T(E) = {y ∈ Y : y = T(x) for some x ∈ E}   (4.3.2)

the image of E in Y, and

   T^{−1}(F) = {x ∈ X : T(x) ∈ F}   (4.3.3)

the preimage of F in X. Note that T is not required to be bijective in order that the
preimage be defined.
The following theorem states two useful characterizations of continuity. Condition b)
is referred to as the sequential definition of continuity, for obvious reasons, while c) is
the topological definition, since it may be used to define continuity in much more general
topological spaces.
Theorem 4.2. Let X, Y be metric spaces and T : X → Y. Then the following are
equivalent:
a) T is continuous on X.
b) If x_n ∈ X and x_n → x, then T(x_n) → T(x).
c) If E is open in Y then T^{−1}(E) is open in X.
Proof: Assume T is continuous on X and let x_n → x in X. If ε > 0 then there exists
δ > 0 such that d_Y(T(x̂), T(x)) < ε if d_X(x̂, x) < δ. Choosing n_0 sufficiently large that
d_X(x_n, x) < δ for n ≥ n_0, we then must have d_Y(T(x_n), T(x)) < ε for n ≥ n_0, so that
T(x_n) → T(x). Thus a) implies b).
To see that b) implies c), suppose condition b) holds, E is open in Y and x ∈ T^{−1}(E).
We must show that there exists δ > 0 such that x̂ ∈ T^{−1}(E) whenever d_X(x̂, x) < δ. If not,
then there exists a sequence x_n → x such that x_n ∉ T^{−1}(E), and by b), T(x_n) → T(x).
Since y = T(x) ∈ E and E is open, there exists ε > 0 such that z ∈ E if d_Y(z, y) < ε.
Thus T(x_n) ∈ E for sufficiently large n, i.e. x_n ∈ T^{−1}(E), a contradiction.
Finally, suppose c) holds and fix x ∈ X. If ε > 0 then corresponding to the open set
E = B(T(x), ε) in Y there exists a ball B(x, δ) in X such that B(x, δ) ⊂ T^{−1}(E). But
this means precisely that if d_X(x̂, x) < δ then d_Y(T(x̂), T(x)) < ε, so that T is continuous
at x. □
4.4 Compactness and optimization
Another important topological concept is that of compactness.
Definition 4.7. If E ⊂ X then a collection of open sets {G_α}_{α∈A} is an open cover of E
if E ⊂ ∪_{α∈A} G_α.
Here A is the index set, and may be finite, countably or uncountably infinite.
Definition 4.8. K ⊂ X is compact if any open cover of K has a finite subcover. More
explicitly, K is compact if whenever K ⊂ ∪_{α∈A} G_α, where each G_α is open, there exists a
finite number of indices α_1, α_2, . . . α_m ∈ A such that K ⊂ ∪_{j=1}^m G_{α_j}. In addition, E ⊂ X
is precompact (or relatively compact) if Ē is compact.
Proposition 4.3. A compact set is closed and bounded. A closed subset of a compact
set is compact.
Proof: Suppose that K is compact and pick x ∈ K^c. For any r > 0 let G_r = {y ∈
X : d(x, y) > r}. It is easy to see that each G_r is open and K ⊂ ∪_{r>0} G_r. Thus there
exist r_1, r_2, . . . r_m such that K ⊂ ∪_{j=1}^m G_{r_j}, and so B(x, r) ⊂ K^c if r < min{r_1, r_2, . . . r_m}.
Thus K^c is open and so K is closed.
Obviously ∪_{r>0} B(x, r) is an open cover of K for any fixed x ∈ X. If K is compact
then there must exist r_1, r_2, . . . r_m such that K ⊂ ∪_{j=1}^m B(x, r_j), and so K ⊂ B(x, R) where
R = max{r_1, r_2, . . . r_m}. Thus K is bounded.
Now suppose that F ⊂ K where F is closed and K is compact. If {G_α}_{α∈A} is an
open cover of F then these sets together with the open set F^c are an open cover of K.
Hence there exist α_1, α_2, . . . α_m such that K ⊂ (∪_{j=1}^m G_{α_j}) ∪ F^c, from which we conclude
that F ⊂ ∪_{j=1}^m G_{α_j}. □
There will be frequent occasions for wanting to know if a certain set is compact, but
it is rare to use the above definition directly. A useful equivalent condition is that of
sequential compactness.
Definition 4.9. A set K ⊂ X is sequentially compact if any infinite sequence in K has
a subsequence convergent to a point of K.
Proposition 4.4. A set K ⊂ X is compact if and only if it is sequentially compact.
We will not prove this result here, but instead refer to Theorem 16, Section 9.5 of
[27] for details. It follows immediately that if E ⊂ X is precompact then any infinite
sequence in E has a convergent subsequence (the point being that the limit need not
belong to E).
We point out that the concepts of compactness and sequential compactness are
applicable in spaces even more general than metric spaces, and are not always equivalent
in such situations. In the case that X = R^N or C^N we have an even more explicit
characterization of compactness, the well known Heine-Borel Theorem, for which we
refer to [28] for a proof.
Theorem 4.3. E ⊂ R^N or E ⊂ C^N is compact if and only if it is closed and bounded.
While we know from Proposition 4.3 that a compact set is always closed and bounded,
the converse implication is definitely false in most function spaces we will be interested
in.
In later chapters a great deal of attention will be paid to optimization problems in
function spaces, that is, problems in the Calculus of Variations. A simple result along
these lines that we can prove already is:
Theorem 4.4. Let X be a compact metric space and f : X → R be continuous. Then
there exists x_0 ∈ X such that

   f(x_0) = max_{x∈X} f(x)   (4.4.1)

Proof: Let M = sup_{x∈X} f(x) (which may be +∞), so there exists a sequence {x_n}_{n=1}^∞
such that lim_{n→∞} f(x_n) = M. By sequential compactness there is a subsequence {x_{n_k}}
and x_0 ∈ X such that lim_{k→∞} x_{n_k} = x_0, and since f is continuous on X we must have
f(x_0) = lim_{k→∞} f(x_{n_k}) = M. Thus M < ∞ and (4.4.1) holds. □
A common notation expressing the same conclusion as (4.4.1) is

   x_0 ∈ argmax(f(x))²   (4.4.2)

which is also useful in making the distinction between the maximum value of a function
and the point(s) at which the maximum is achieved.
We emphasize here the distinction between maximum and supremum, which is an
essential point in later discussion of optimization. If E ⊂ R then M = sup E if
• x ≤ M for all x ∈ E
• if M′ < M there exists x ∈ E such that x > M′
Such a number M exists for any E ⊂ R if we allow the value M = +∞; by convention
M = −∞ if E is the empty set. On the other hand M = max E if
• x ≤ M for all x ∈ E
• M ∈ E
in which case evidently the maximum is finite and equal to the supremum.
² Even though argmax(f(x)) is in general a set of points, i.e. all points where f achieves its maximum value,
one will often see this written as x_0 = argmax(f(x)). Naturally we use the corresponding notation argmin for
points where the minimum of f is achieved.
If f : X → C is continuous on a compact metric space X, then we can apply Theorem
4.4 with f replaced by |f|, to obtain that there exists x_0 ∈ X such that |f(x)| ≤ |f(x_0)|
for all x ∈ X. We can then also conclude, as in Example 4.2 and Proposition 4.2:
Proposition 4.5. If X is a compact metric space, then

   C(X) = {f : X → C : f is continuous at x for every x ∈ X}   (4.4.3)

is a complete metric space with metric d(f, g) = max_{x∈X} |f(x) − g(x)|.
In general C(X), or even a bounded set in C(X), is not itself precompact. A useful
criterion for precompactness of a set of functions in C(X) is given by the Arzelà-Ascoli
theorem, which we review here; see e.g. [28] for a proof.
Definition 4.10. We say a family of real or complex valued functions F defined on a
metric space X is uniformly bounded if there exists a constant M such that

   |f(x)| ≤ M   whenever x ∈ X, f ∈ F   (4.4.4)

and equicontinuous if for every ε > 0 there exists δ > 0 such that

   |f(x) − f(y)| < ε   whenever x, y ∈ X, d(x, y) < δ, f ∈ F   (4.4.5)

We then have
Theorem 4.5. (Arzelà-Ascoli) If X is a compact metric space and F ⊂ C(X) is
uniformly bounded and equicontinuous, then F is precompact in C(X).
Example 4.7. Let

   F = {f ∈ C([0, 1]) : |f'(x)| ≤ M for all x ∈ (0, 1), f(0) = 0}   (4.4.6)

for some fixed M. Then for f ∈ F we have

   f(x) = ∫_0^x f'(s) ds   (4.4.7)

implying in particular that |f(x)| ≤ ∫_0^x M ds ≤ M. Also

   |f(x) − f(y)| = |∫_x^y f'(s) ds| ≤ M|x − y|   (4.4.8)

so that for any ε > 0, δ = ε/M works in the definition of equicontinuity. Thus by the
Arzelà-Ascoli theorem F is precompact in C([0, 1]).
If X is a compact subset of R^N then, since uniform convergence implies L^p convergence,
any set which is precompact in C(X) is also precompact in L^p(X). But more refined, i.e.
less restrictive, criteria for precompactness in L^p spaces are also known; see e.g. [5],
Section 4.5.
4.5 Contraction mapping theorem
One of the most important theorems about metric spaces, frequently used in applied
mathematics, is the Contraction Mapping Theorem, which concerns fixed points of a
mapping of X into itself.
Definition 4.11. A mapping T : X → X is a contraction on X if it is Lipschitz
continuous with Lipschitz constant ρ < 1, that is, there exists ρ ∈ [0, 1) such that

   d(T(x), T(x̂)) ≤ ρ d(x, x̂)   for all x, x̂ ∈ X   (4.5.1)

If ρ = 1 is allowed, we say T is nonexpansive.
Theorem 4.6. If T is a contraction on a complete metric space X then there exists a
unique x 2 X such that T (x) = x.
Proof: The uniqueness assertion is immediate: if T(x_1) = x_1 and T(x_2) = x_2
then d(x_1, x_2) = d(T(x_1), T(x_2)) ≤ ρ d(x_1, x_2). Since ρ < 1 we must have d(x_1, x_2) = 0,
so that x_1 = x_2.
To prove the existence of x, fix any point x_1 ∈ X and define

   x_{n+1} = T(x_n)   (4.5.2)

for n = 1, 2, . . . . We first show that {x_n}_{n=1}^∞ must be a Cauchy sequence.
Note that

   d(x_3, x_2) = d(T(x_2), T(x_1)) ≤ ρ d(x_2, x_1)   (4.5.3)

and by induction

   d(x_{n+1}, x_n) = d(T(x_n), T(x_{n−1})) ≤ ρ^{n−1} d(x_2, x_1)   (4.5.4)
Thus by the triangle inequality and the usual summation formula for a geometric series,
if m > n > 1,

   d(x_m, x_n) ≤ Σ_{j=n}^{m−1} d(x_{j+1}, x_j) ≤ Σ_{j=n}^{m−1} ρ^{j−1} d(x_2, x_1)   (4.5.5)
            = ρ^{n−1} (1 − ρ^{m−n})/(1 − ρ) d(x_2, x_1) ≤ ρ^{n−1}/(1 − ρ) d(x_2, x_1)   (4.5.6)

It follows immediately that {x_n}_{n=1}^∞ is a Cauchy sequence, and since X is complete there
exists x ∈ X such that lim_{n→∞} x_n = x. Since T is continuous, T(x_n) → T(x) as n → ∞,
and so x = T(x) must hold. □
The point x in the Contraction Mapping Theorem which satisfies T(x) = x is called
a fixed point of T, and the process (4.5.2) of generating the sequence {x_n}_{n=1}^∞ is called
fixed point iteration. Not only does the theorem show that T possesses a unique fixed
point under the stated hypotheses, but the proof shows that the fixed point may be
obtained by fixed point iteration starting from an arbitrary point of X.
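The iteration (4.5.2) is easy to run numerically. The following sketch (Python; the choice T = cos is our own illustration, not from the text) shows a generic fixed point iteration on X = R.

```python
import math

def fixed_point(T, x, tol=1e-12, max_iter=1000):
    """Fixed point iteration x_{n+1} = T(x_n); converges whenever T is a
    contraction on a complete metric space containing the iterates."""
    for _ in range(max_iter):
        x_new = T(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")

# T(x) = cos(x) is a contraction on [0, 1] since |T'(x)| <= sin(1) < 1 and
# T maps [0, 1] into itself, so the iteration converges to the unique
# solution of x = cos(x), approximately 0.739085.
x = fixed_point(math.cos, 0.0)
print(round(x, 6))  # 0.739085
```

Per (4.5.6), the error decreases geometrically with ratio ρ = sin(1) ≈ 0.84, which matches the observed convergence rate.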
As a simple application of the theorem, consider a second kind integral equation

   u(x) + ∫_Ω K(x, y) u(y) dy = f(x)   (4.5.7)

with Ω ⊂ R^N a bounded open set, a kernel function K = K(x, y) defined and continuous
for (x, y) ∈ Ω̄ × Ω̄, and f ∈ C(Ω̄). We can then define a mapping T on X = C(Ω̄) by

   T(u)(x) = −∫_Ω K(x, y) u(y) dy + f(x)   (4.5.8)

so that (4.5.7) is equivalent to the fixed point problem u = T(u) in X. Since K is
uniformly continuous on Ω̄ × Ω̄ it is immediate that T(u) ∈ X whenever u ∈ X, and by
elementary estimates we have

   d(T(u), T(v)) = max_{x∈Ω̄} |T(u)(x) − T(v)(x)| = max_{x∈Ω̄} |∫_Ω K(x, y)(u(y) − v(y)) dy| ≤ L d(u, v)   (4.5.9)

where L := max_{x∈Ω̄} ∫_Ω |K(x, y)| dy. We therefore may conclude from the Contraction
Mapping Theorem the following:
Proposition 4.6. If

   max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < 1   (4.5.10)

then (4.5.7) has a unique solution for every f ∈ C(Ω̄).
The condition (4.5.10) will be satisfied if either the maximum of |K| is small enough
or the size of the domain Ω is small enough. Eventually we will see that some such
smallness condition is necessary for unique solvability of (4.5.7), but the exact conditions
will be sharpened considerably.
If we consider instead the family of second kind integral equations

   λu(x) + ∫_Ω K(x, y) u(y) dy = f(x)   (4.5.11)

with the same conditions on K and f, then the above argument shows unique solvability
for all sufficiently large λ, namely provided

   max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < |λ|   (4.5.12)
As a second example, consider the initial value problem for a first order ODE
$$\frac{du}{dt} = f(t, u) \qquad u(t_0) = u_0 \qquad (4.5.13)$$
where we assume at least that $f$ is continuous on $[a, b] \times \mathbb{R}$ with $t_0 \in (a, b)$. If a classical solution $u$ exists, then integrating both sides of the ODE from $t_0$ to $t$, and taking account of the initial condition, we obtain
$$u(t) = u_0 + \int_{t_0}^t f(s, u(s))\,ds \qquad (4.5.14)$$
Conversely, if $u \in C([a, b])$ and satisfies (4.5.14), then necessarily $u'$ exists, is also continuous, and (4.5.13) holds. Thus the problem of solving (4.5.13) is seen to be equivalent to that of finding a continuous solution of (4.5.14). In turn this can be viewed as the problem of finding a fixed point of the nonlinear mapping $T : C([a, b]) \to C([a, b])$ defined by
$$T(u)(t) = u_0 + \int_{t_0}^t f(s, u(s))\,ds \qquad (4.5.15)$$
Now if we assume that $f$ satisfies the Lipschitz condition with respect to $u$,
$$|f(t, u) - f(t, v)| \le L|u - v| \qquad u, v \in \mathbb{R},\ t \in [a, b] \qquad (4.5.16)$$
then
$$|T(u)(t) - T(v)(t)| \le L \left| \int_{t_0}^t |u(s) - v(s)|\,ds \right| \le L|b - a| \max_{a \le t \le b} |u(t) - v(t)| \qquad (4.5.17)$$
or
$$d(T(u), T(v)) \le L|b - a|\,d(u, v) \qquad (4.5.18)$$
where $d$ is again the usual metric on $C([a, b])$. Thus the contraction mapping theorem provides a unique local solution, that is, on any interval $[a, b]$ containing $t_0$ for which $(b - a) < 1/L$.

Instead of the requirement that the Lipschitz condition (4.5.16) be valid on the entire infinite strip $[a, b] \times \mathbb{R}$, it is actually only necessary to assume that it holds on $[a, b] \times [c, d]$ where $u_0 \in (c, d)$. Also, first order systems of ODEs (and thus scalar higher order equations) can be handled in essentially the same manner. Such generalizations may be found in standard ODE textbooks, e.g. Chapter 1 of [CL] or Chapter 3 of [BN].
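The fixed point iteration for (4.5.15) is the classical Picard iteration. The following sketch is our own illustration: it applies $T$ repeatedly to the model problem $u' = u$, $u(0) = 1$ on $[0, 1/2]$, where $L(b - a) = 1/2 < 1$, with the integral approximated by an (assumed) trapezoid rule; the iterates converge to the exact solution $e^t$.

```python
import math

# Picard iteration u_{k+1} = T(u_k) for u' = u, u(0) = 1 on [0, 1/2],
# where L*(b - a) = 1/2 < 1 so T is a contraction.  The integral in
# (4.5.15) is approximated by the trapezoid rule (our own choice).

def picard(f, u0, a, b, n=500, iters=40):
    h = (b - a) / n
    ts = [a + i * h for i in range(n + 1)]
    u = [u0] * (n + 1)                      # initial guess: constant u_0
    for _ in range(iters):
        g = [f(t, v) for t, v in zip(ts, u)]
        new_u = [u0]
        integral = 0.0
        for i in range(1, n + 1):
            integral += 0.5 * h * (g[i - 1] + g[i])
            new_u.append(u0 + integral)
        u = new_u
    return ts, u

ts, u = picard(lambda t, v: v, 1.0, 0.0, 0.5)
err = abs(u[-1] - math.exp(0.5))            # compare with exact e^{1/2}
```

After 40 iterations the iteration error is negligible, and what remains is only the $O(h^2)$ quadrature error of the trapezoid rule.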
We conclude with a useful variant of the contraction mapping theorem. If $T : X \to X$ then we can define the (composition) powers of $T$ by $T^2(x) = T(T(x))$, $T^3(x) = T(T^2(x))$, etc. Thus $T^n : X \to X$ for $n = 1, 2, 3, \dots$

Theorem 4.7. If there exists a positive integer $n$ such that $T^n$ is a contraction on a complete metric space $X$, then there exists a unique $x \in X$ such that $T(x) = x$.

Proof: By Theorem 4.6 there exists a unique $x \in X$ such that $T^n(x) = x$. Applying $T$ to both sides gives $T^n(T(x)) = T^{n+1}(x) = T(x)$, so that $T(x)$ is also a fixed point of $T^n$. By uniqueness, $T(x) = x$, i.e. $T$ has at least one fixed point. To see that the fixed point of $T$ is unique, observe that any fixed point of $T$ is also a fixed point of $T^2, T^3, \dots$. In particular, if $T$ had two distinct fixed points then so would $T^n$, which is a contradiction. □
4.6 Exercises

1. Verify that $d_p$ defined in Example 4.1 is a metric on $\mathbb{R}^N$ or $\mathbb{C}^N$. (Suggestion: to prove the triangle inequality, use the finite dimensional version of the Minkowski inequality (18.1.15).)

2. If $(X, d_X)$, $(Y, d_Y)$ are metric spaces, show that the Cartesian product
$$Z = X \times Y = \{(x, y) : x \in X,\ y \in Y\}$$
is a metric space with distance function
$$d((x_1, y_1), (x_2, y_2)) = d_X(x_1, x_2) + d_Y(y_1, y_2)$$

3. Is $d(x, y) = |x - y|^2$ a metric on $\mathbb{R}$? What about $d(x, y) = \sqrt{|x - y|}$? Find reasonable conditions on a function $\phi : [0, \infty) \to [0, \infty)$ such that $d(x, y) = \phi(|x - y|)$ is a metric on $\mathbb{R}$.

4. Prove that a closed subset of a compact set in a metric space is also compact.

5. Let $(X, d)$ be a metric space, $A \subset X$ be nonempty, and define the distance from a point $x$ to the set $A$ to be
$$d(x, A) = \inf_{y \in A} d(x, y)$$
a) Show that $|d(x, A) - d(y, A)| \le d(x, y)$ for $x, y \in X$ (i.e. $x \mapsto d(x, A)$ is nonexpansive).
b) Assume $A$ is closed. Show that $d(x, A) = 0$ if and only if $x \in A$.
c) Assume $A$ is compact. Show that for any $x \in X$ there exists $z \in A$ such that $d(x, A) = d(x, z)$.

6. Suppose that $F$ is closed and $G$ is open in a metric space $(X, d)$ and $F \subset G$. Show that there exists a continuous function $f : X \to \mathbb{R}$ such that
i) $0 \le f(x) \le 1$ for all $x \in X$.
ii) $f(x) = 1$ for $x \in F$.
iii) $f(x) = 0$ for $x \in G^c$.
Hint: Consider
$$f(x) = \frac{d(x, G^c)}{d(x, G^c) + d(x, F)}$$

7. Two metrics $d, \hat{d}$ on a set $X$ are said to be equivalent if there exist constants $0 < C_* < C^* < \infty$ such that
$$C_* \le \frac{d(x, y)}{\hat{d}(x, y)} \le C^* \qquad \text{for all } x, y \in X,\ x \ne y$$
a) If $d, \hat{d}$ are equivalent, show that a sequence $\{x_k\}_{k=1}^\infty$ is convergent in $(X, d)$ if and only if it is convergent in $(X, \hat{d})$.
b) Show that any two of the metrics $d_p$ on $\mathbb{R}^n$ are equivalent.
8. Prove that $C([a, b])$ is separable (you may quote the Weierstrass approximation theorem) but $L^\infty(a, b)$ is not separable.

9. If $X, Y$ are metric spaces, $f : X \to Y$ is continuous and $K$ is compact in $X$, show that the image $f(K)$ is compact in $Y$.

10. Let
$$F = \left\{ f \in C([0, 1]) : |f(x) - f(y)| \le |x - y| \text{ for all } x, y, \quad \int_0^1 f(x)\,dx = 0 \right\}$$
Show that $F$ is compact in $C([0, 1])$. (Suggestion: to prove that $F$ is uniformly bounded, justify and use the fact that if $f \in F$ then $f(x) = 0$ for some $x \in [0, 1]$.)

11. Show that the set $F$ in Example 4.7 is not closed.

12. From the proof of the contraction mapping theorem it is clear that the smaller $\rho$ is, the faster the sequence $x_n$ converges to the fixed point $x$. With this in mind, explain why Newton's method
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
is in general a very rapidly convergent method for approximating roots of $f : \mathbb{R} \to \mathbb{R}$, as long as the initial guess is close enough.

13. Let $f_n(x) = \sin^n x$ for $n = 1, 2, \dots$
a) Is the sequence $\{f_n\}_{n=1}^\infty$ convergent in $C([0, \pi])$?
b) Is the sequence convergent in $L^2(0, \pi)$?
c) Is the sequence compact or precompact in either of these spaces?

14. Let $X$ be a complete metric space and $T : X \to X$ satisfy $d(T(x), T(y)) < d(x, y)$ for all $x, y \in X$, $x \ne y$. Show that $T$ can have at most one fixed point, but may have none. (Suggestion: for an example of non-existence look at $T(x) = \sqrt{x^2 + 1}$ on $\mathbb{R}$.)

15. Let $S$ denote the linear Volterra type integral operator
$$Su(x) = \int_a^x K(x, y)u(y)\,dy$$
where the kernel $K$ is continuous and satisfies $|K(x, y)| \le M$ for $a \le y \le x$.
a) Show that
$$|S^n u(x)| \le \frac{M^n (x - a)^n}{n!} \max_{a \le y \le x} |u(y)| \qquad x > a,\ n = 1, 2, \dots$$
b) Deduce from this that for any $b > a$, there exists an integer $n$ such that $S^n$ is a contraction on $C([a, b])$.
c) Show that for any $f \in C([a, b])$ the second kind Volterra integral equation
$$\int_a^x K(x, y)u(y)\,dy = u(x) + f(x) \qquad a < x < b$$
has a unique solution $u \in C([a, b])$.

16. Show that for sufficiently small $|\lambda|$ there exists a unique solution of the boundary value problem
$$u'' + \lambda u = f(x) \quad 0 < x < 1 \qquad u(0) = u(1) = 0$$
for any $f \in C([0, 1])$. (Suggestion: use the result of Chapter 2, Exercise 7 to transform the boundary value problem into a fixed point problem for an integral operator, then apply the Contraction Mapping Theorem.) Be as precise as you can about which values of $\lambda$ are allowed.

17. Let $f = f(x, y)$ be continuously differentiable on $[0, 1] \times \mathbb{R}$ and satisfy
$$0 < m \le \frac{\partial f}{\partial y}(x, y) \le M$$
Show that there exists a unique continuous function $\phi(x)$ such that
$$f(x, \phi(x)) = 0 \qquad 0 < x < 1$$
(Suggestion: Define the transformation
$$(T\phi)(x) = \phi(x) - \lambda f(x, \phi(x))$$
and show that $T$ is a contraction on $C([0, 1])$ for some choice of $\lambda$. This is a special case of the implicit function theorem.)
18. Show that if we let
$$d(f, g) = \sum_{n=0}^\infty 2^{-n} \frac{e_n}{1 + e_n} \qquad \text{where} \quad e_n = \max_{x \in [a,b]} |f^{(n)}(x) - g^{(n)}(x)|$$
then $(C^\infty([a, b]), d)$ is a metric space, in which $f_k \to f$ if and only if $f_k^{(n)} \to f^{(n)}$ uniformly on $[a, b]$ for $n = 0, 1, \dots$

19. If $E \subset \mathbb{R}^N$ is closed and bounded, show that $C^1(E)$ is a complete metric space with the metric defined by (4.1.7).
Chapter 5

Normed linear spaces and Banach spaces

5.1 Axioms of a normed linear space
Definition 5.1. A vector space $X$ is said to be a normed linear space if for every $x \in X$ there is defined a nonnegative real number $\|x\|$, the norm of $x$, such that the following axioms hold.

[N1] $\|x\| = 0$ if and only if $x = 0$
[N2] $\|\lambda x\| = |\lambda|\,\|x\|$ for any $x \in X$ and any scalar $\lambda$.
[N3] $\|x + y\| \le \|x\| + \|y\|$ for any $x, y \in X$.

As in the case of a metric space it is technically the pair $(X, \|\cdot\|)$ which constitutes a normed linear space, but the definition of the norm will usually be clear from the context. If two different normed spaces are needed we will use a notation such as $\|x\|_X$ to indicate the space in which the norm is calculated.

Example 5.1. In the vector space $X = \mathbb{R}^N$ or $\mathbb{C}^N$ we can define the family of norms
$$\|x\|_p = \left( \sum_{j=1}^N |x_j|^p \right)^{1/p} \quad 1 \le p < \infty \qquad \|x\|_\infty = \max_{1 \le j \le N} |x_j| \qquad (5.1.1)$$
Axioms [N1] and [N2] are obvious, while axiom [N3] amounts to the Minkowski inequality (18.1.15).
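As a quick numerical sanity check (our own illustration, not part of the notes), the norms (5.1.1) and the triangle inequality [N3] can be verified on sample vectors:

```python
# Check of the norms (5.1.1) and the triangle inequality [N3] on a pair
# of sample vectors in R^4 (an illustration with vectors of our choosing).

def norm_p(x, p):
    if p == float('inf'):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [1.0, -2.0, 3.0, 0.5]
y = [-4.0, 1.0, 0.0, 2.5]
s = [a + b for a, b in zip(x, y)]

triangle_ok = all(norm_p(s, p) <= norm_p(x, p) + norm_p(y, p) + 1e-12
                  for p in (1, 1.5, 2, 3, float('inf')))
# For a fixed vector the p-norms decrease as p increases:
monotone_ok = norm_p(x, float('inf')) <= norm_p(x, 2) <= norm_p(x, 1)
```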
We obviously have $d_p(x, y) = \|x - y\|_p$ here, and this correspondence between norm and metric is a special case of the following general fact, namely that a norm always gives rise to a metric; the proof is immediate from the definitions involved.

Proposition 5.1. Let $(X, \|\cdot\|)$ be a normed linear space. If we set $d(x, y) = \|x - y\|$ for $x, y \in X$ then $(X, d)$ is a metric space.

Example 5.2. If $E \subset \mathbb{R}^N$ is closed and bounded then it is easy to verify that
$$\|f\| = \max_{x \in E} |f(x)| \qquad (5.1.2)$$
defines a norm on $C(E)$, and the usual metric (4.1.4) on $C(E)$ amounts to $d(f, g) = \|f - g\|$. Likewise, the metrics (4.1.8), (4.1.9) on $L^p(E)$ may be viewed as coming from the corresponding $L^p$ norms,
$$\|f\|_{L^p(E)} = \begin{cases} \left( \int_E |f(x)|^p\,dx \right)^{1/p} & 1 \le p < \infty \\ \operatorname{ess\,sup}_{x \in E} |f(x)| & p = \infty \end{cases} \qquad (5.1.3)$$
Note that for such a metric we must have $d(\lambda x, \lambda y) = |\lambda|\,d(x, y)$, so that if this property does not hold, the metric cannot arise from a norm in this way. For example,
$$d(x, y) = \frac{|x - y|}{1 + |x - y|} \qquad (5.1.4)$$
is a metric on $\mathbb{R}$ which does not come from a norm.

Since any normed linear space may now be regarded as a metric space, all of the topological concepts defined for a metric space are meaningful in a normed linear space. Completeness holds in many situations of interest, so we have a special designation in that case.

Definition 5.2. A Banach space is a complete normed linear space.

Example 5.3. The spaces $\mathbb{R}^N$, $\mathbb{C}^N$ are vector spaces which are also complete metric spaces with any of the norms $\|\cdot\|_p$, hence they are Banach spaces. Similarly $C(E)$, $L^p(E)$ are Banach spaces with the norms indicated above.
Here are a few simple results we can prove already.
Proposition 5.2. If $X$ is a normed linear space then the norm is a continuous function on $X$. If $E \subset X$ is compact and $y \in X$ then there exists $x_0 \in E$ such that
$$\|y - x_0\| = \min_{x \in E} \|y - x\| \qquad (5.1.5)$$
Proof: From the triangle inequality we get $|\,\|x_1\| - \|x_2\|\,| \le \|x_1 - x_2\|$, so that $f(x) = \|x\|$ is Lipschitz continuous (with Lipschitz constant 1) on $X$. Similarly $f(x) = \|x - y\|$ is also continuous for any fixed $y$, so we may apply Theorem 4.4, with $X$ replaced by the compact metric space $E$ and $f(x) = \|x - y\|$, to get the second conclusion. □
Another topological point of interest is the following.

Theorem 5.1. If $M$ is a subspace of a normed linear space $X$, and $\dim M < \infty$, then $M$ is closed.

Proof: The proof is by induction on the number of dimensions. Let $\dim(M) = 1$, so that $M = \{u = \lambda e : \lambda \in \mathbb{C}\}$ for some $e \in X$, $\|e\| = 1$. If $u_n \in M$ then $u_n = \lambda_n e$ for some $\lambda_n \in \mathbb{C}$, and $u_n \to u$ in $X$ implies, since $\|u_n - u_m\| = |\lambda_n - \lambda_m|$, that $\{\lambda_n\}$ is a Cauchy sequence in $\mathbb{C}$. Thus there exists $\lambda \in \mathbb{C}$ such that $\lambda_n \to \lambda$, so that $u_n \to u = \lambda e \in M$, as needed.

Now suppose we know that all $N$ dimensional subspaces are closed and $\dim M = N + 1$, so we can find $e_1, \dots, e_{N+1}$ linearly independent unit vectors such that $M = L(e_1, \dots, e_{N+1})$. Let $\tilde{M} = L(e_1, \dots, e_N)$, which is closed by the induction assumption. If $u_n \in M$ there exist $\lambda_n \in \mathbb{C}$ and $v_n \in \tilde{M}$ such that $u_n = v_n + \lambda_n e_{N+1}$. Suppose that $u_n \to u$ in $X$. We claim first that $\{\lambda_n\}$ is bounded in $\mathbb{C}$. If not, there must exist $n_k$ such that $|\lambda_{n_k}| \to \infty$, and since $u_n$ remains bounded in $X$ we get $u_{n_k}/\lambda_{n_k} \to 0$. It follows that
$$e_{N+1} - \frac{u_{n_k}}{\lambda_{n_k}} = -\frac{v_{n_k}}{\lambda_{n_k}} \in \tilde{M} \qquad (5.1.6)$$
Since $\tilde{M}$ is closed, it would follow, upon letting $n_k \to \infty$, that $e_{N+1} \in \tilde{M}$, which is impossible.

Thus $\{\lambda_n\}$ is bounded, and we can find a subsequence $\lambda_{n_k} \to \lambda$ for some $\lambda \in \mathbb{C}$, so that
$$v_{n_k} = u_{n_k} - \lambda_{n_k} e_{N+1} \to u - \lambda e_{N+1} \qquad (5.1.7)$$
Again since $\tilde{M}$ is closed it follows that $u - \lambda e_{N+1} \in \tilde{M}$, so that $u \in M$ as needed. □

For an alternative proof, see for example Theorem 1.21 of [30]. For an infinite dimensional subspace this is false in general. For example, the Weierstrass approximation theorem states that if $f \in C([a, b])$ and $\epsilon > 0$ there exists a polynomial $p$ such that $|p(x) - f(x)| \le \epsilon$ on $[a, b]$. Thus if we take $X = C([a, b])$ and $E$ to be the set of all polynomials on $[a, b]$, then clearly $E$ is a subspace of $X$ and every point of $X$ is a limit point of $E$. Thus $E$ cannot be closed, since otherwise $E$ would be equal to all of $X$.

Recall that when $\bar{E} = X$ as in this example, we say that $E$ is a dense subspace of $X$. Such subspaces play an important role in functional analysis. According to Theorem 5.1 a finite dimensional Banach space $X$ has no dense subspace aside from $X$ itself.
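The density of polynomials can be made concrete with Bernstein polynomials, which furnish the standard constructive proof of the Weierstrass theorem. This is not covered in the notes; the sketch below is our own illustration.

```python
import math

# Bernstein polynomials B_n f(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)
# converge uniformly to f on [0, 1]: the classical constructive proof of
# the Weierstrass approximation theorem.

def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)          # continuous but not differentiable
grid = [i / 200 for i in range(201)]
sup_err = {n: max(abs(bernstein(f, n, x) - f(x)) for x in grid)
           for n in (4, 16, 64)}    # sup-norm error shrinks as n grows
```

The sup-norm distance from $f$ to the polynomial subspace can thus be made as small as we like, which is exactly the density statement above.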
5.2 Infinite series

In a normed linear space we can study limits of sums, i.e. infinite series.

Definition 5.3. We say $\sum_{j=1}^\infty x_j$ is convergent in $X$ to the limit $s$ if $\lim_{n\to\infty} s_n = s$, where $s_n = \sum_{j=1}^n x_j$ is the $n$'th partial sum of the series.

A useful criterion for convergence can then be given, provided the space is also complete.

Proposition 5.3. If $X$ is a Banach space, $x_n \in X$ for $n = 1, 2, \dots$ and $\sum_{n=1}^\infty \|x_n\| < \infty$, then $\sum_{n=1}^\infty x_n$ is convergent to an element $s \in X$ with $\|s\| \le \sum_{n=1}^\infty \|x_n\|$.

Proof: If $m > n$ we have $\|s_m - s_n\| = \|\sum_{j=n+1}^m x_j\| \le \sum_{j=n+1}^m \|x_j\|$ by the triangle inequality. Since $\sum_{j=1}^\infty \|x_j\|$ is convergent, its partial sums form a Cauchy sequence in $\mathbb{R}$, and hence $\{s_n\}$ is also Cauchy. Since the space is complete, $s = \lim_{n\to\infty} s_n$ exists. We also have $\|s_n\| \le \sum_{j=1}^n \|x_j\|$ for any fixed $n$, and $\|s_n\| \to \|s\|$ by Proposition 5.2, so $\|s\| \le \sum_{j=1}^\infty \|x_j\|$ must hold. □

The concepts of linear combination, linear independence and basis may now be extended to allow for infinite sums in an obvious way. We say a countably infinite set of vectors $\{x_n\}_{n=1}^\infty$ is linearly independent if
$$\sum_{n=1}^\infty \lambda_n x_n = 0 \text{ if and only if } \lambda_n = 0 \text{ for all } n \qquad (5.2.1)$$
and $x \in L(\{x_n\}_{n=1}^\infty)$, the span of $\{x_n\}_{n=1}^\infty$, provided $x = \sum_{n=1}^\infty \lambda_n x_n$ for some scalars $\{\lambda_n\}_{n=1}^\infty$. A basis of $X$ is then a linearly independent spanning set, or equivalently $\{x_n\}_{n=1}^\infty$ is a basis of $X$ if for any $x \in X$ there exist unique scalars $\{\lambda_n\}_{n=1}^\infty$ such that $x = \sum_{n=1}^\infty \lambda_n x_n$.

We emphasize that this definition of basis is not the same as that given in Definition 3.4 for a basis of a vector space, the difference being that the sum there is required to always be finite. The term Schauder basis is sometimes used for the above definition if the distinction needs to be made. Throughout the remainder of these notes, the term basis will always mean Schauder basis unless otherwise stated.

A Banach space $X$ which contains a Schauder basis $\{x_n\}_{n=1}^\infty$ is always separable, since then the set of all finite linear combinations of the $x_n$'s with rational coefficients is easily seen to be countable and dense. It is known that not every separable Banach space has a Schauder basis (recall there must always exist a Hamel basis); see for example Section 1.1 of [38].
5.3 Linear operators and functionals

We have previously defined what it means for a mapping $T : X \to Y$ between vector spaces to be linear. When the spaces $X, Y$ are normed linear spaces we usually refer to such a mapping $T$ as a linear operator. We say that $T$ is bounded if there exists a finite constant $C$ such that $\|Tx\| \le C\|x\|$ for every $x \in X$, and we may then define the norm of $T$ as the smallest such $C$, or equivalently
$$\|T\| = \sup_{x \ne 0} \frac{\|Tx\|}{\|x\|} \qquad (5.3.1)$$
The condition $\|T\| < \infty$ is equivalent to continuity of $T$.

Proposition 5.4. If $X, Y$ are normed linear spaces and $T : X \to Y$ is linear then the following three conditions are equivalent.
a) $T$ is bounded
b) $T$ is continuous
c) There exists $x_0 \in X$ such that $T$ is continuous at $x_0$.

Proof: If $x_0, x \in X$ then
$$\|T(x) - T(x_0)\| = \|T(x - x_0)\| \le \|T\|\,\|x - x_0\| \qquad (5.3.2)$$
Thus if $T$ is bounded then it is (Lipschitz) continuous at any point of $X$. The implication that b) implies c) is trivial. Finally suppose $T$ is continuous at $x_0 \in X$. For any $\epsilon > 0$ there must exist $\delta > 0$ such that $\|T(z - x_0)\| = \|T(z) - T(x_0)\| \le \epsilon$ if $\|z - x_0\| \le \delta$. For any $x \ne 0$, choose $z = x_0 + \delta x / \|x\|$ to get
$$\left\| T\left( \frac{\delta x}{\|x\|} \right) \right\| \le \epsilon \qquad (5.3.3)$$
or equivalently, using the linearity of $T$, $\|Tx\| \le C\|x\|$ with $C = \epsilon/\delta$. Thus $T$ is bounded. □

A continuous linear operator is therefore the same as a bounded linear operator, and the two terms are used interchangeably. When the range space $Y$ is the scalar field $\mathbb{R}$ or $\mathbb{C}$ we call $T$ a linear functional instead of a linear operator, and correspondingly a bounded (or continuous) linear functional if $|Tx| \le C\|x\|$ for some finite constant $C$.

We introduce the notation
$$B(X, Y) = \{T : X \to Y : T \text{ is linear and bounded}\} \qquad (5.3.4)$$
and the special cases
$$B(X) = B(X, X) \qquad X' = B(X, \mathbb{C}) \qquad (5.3.5)$$

Examples of linear operators and functionals will be studied much more extensively later. For now we just give two simple examples.

Example 5.4. If $X = \mathbb{R}^N$, $Y = \mathbb{R}^M$ and $A$ is an $M \times N$ real matrix with entries $a_{kj}$, then $y_k = \sum_{j=1}^N a_{kj} x_j$ defines a linear mapping $T$, and according to the discussion of Section 3.3 any linear mapping of $\mathbb{R}^N$ to $\mathbb{R}^M$ is of this form. It is not hard to check that $T$ is always bounded, assuming that we use any of the norms $\|\cdot\|_p$ in $X$ and in $Y$. Evidently $T$ is a linear functional if $M = 1$.

Example 5.5. If $\Omega \subset \mathbb{R}^N$ is compact and $X = C(\Omega)$, pick $x_0 \in \Omega$ and set $T(f) = f(x_0)$ for $f \in X$. Clearly $T$ is a linear functional and $|Tf| \le \|f\|$ so that $\|T\| \le 1$.
5.4 Contraction mappings in a Banach space

If the Contraction Mapping Theorem, Theorem 4.6, is specialized to a Banach space, the resulting statement is that if $X$ is a Banach space and $F : X \to X$ satisfies
$$\|F(x) - F(y)\| \le L\|x - y\| \qquad x, y \in X \qquad (5.4.1)$$
for some $L < 1$, then $F$ has a unique fixed point in $X$.

A particular case which arises frequently in applications is when the mapping $F$ has the form $F(x) = Tx + b$ for some $b \in X$ and bounded linear operator $T$ on $X$, in which case the contraction condition (5.4.1) simply amounts to the requirement that $\|T\| < 1$. If we then initialize the fixed point iteration process (4.5.2) with $x_1 = b$, the successive iterates are
$$x_2 = F(x_1) = F(b) = Tb + b \qquad (5.4.2)$$
$$x_3 = F(x_2) = Tx_2 + b = T^2 b + Tb + b \qquad (5.4.3)$$
etc., the general pattern being
$$x_n = \sum_{j=0}^{n-1} T^j b \qquad n = 1, 2, \dots \qquad (5.4.4)$$
with $T^0 = I$ as usual. If $\|T\| < 1$ we already know that this sequence must converge, but it could also be checked directly from Proposition 5.3 using the obvious inequality $\|T^j b\| \le \|T\|^j \|b\|$. In fact we know that $x_n \to x$, the unique fixed point of $F$, so
$$x = \sum_{j=0}^\infty T^j b \qquad (5.4.5)$$
is an explicit solution formula for the linear, inhomogeneous equation $x - Tx = b$. The right hand side of (5.4.5) is known as the Neumann series for $x = (I - T)^{-1} b$, and symbolically we may write
$$(I - T)^{-1} = \sum_{j=0}^\infty T^j \qquad (5.4.6)$$
Note the formal similarity to the usual geometric series formula for $(1 - z)^{-1}$ if $z \in \mathbb{C}$, $|z| < 1$. If $T$ and $b$ are such that $\|T^j b\| \ll \|Tb\|$ for $j \ge 2$, then truncating the series after two terms we get the Born approximation formula $x \approx b + Tb$.
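The Neumann series is easy to demonstrate in the simplest Banach space setting $X = \mathbb{R}^2$. The sketch below uses a hypothetical concrete choice of $T$ and $b$ (our own, for illustration): it accumulates the partial sums (5.4.4) and compares the limit with the Born approximation $b + Tb$.

```python
# Neumann series x = sum_j T^j b in X = R^2 for a matrix T with ||T|| < 1.
# T and b below are a hypothetical concrete choice, for illustration only.

T = [[0.2, 0.1],
     [0.0, 0.3]]
b = [1.0, 1.0]

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

x = [0.0, 0.0]
term = b[:]                         # T^0 b = b
for _ in range(60):                 # accumulate the partial sums (5.4.4)
    x = [xi + ti for xi, ti in zip(x, term)]
    term = mat_vec(T, term)

# x should solve x - Tx = b; the Born approximation keeps only b + Tb.
residual = max(abs(xi - ti - bi) for xi, ti, bi in zip(x, mat_vec(T, x), b))
born = [bi + ti for bi, ti in zip(b, mat_vec(T, b))]
```

For this upper triangular $T$ the exact solution has second component $1/0.7$, and the Born approximation $(1.3, 1.3)$ is already within a few percent of it.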
5.5 Exercises

1. Give the proof of Proposition 5.1.

2. Show that any two norms on a finite dimensional normed linear space are equivalent. That is to say, if $(X, \|\cdot\|)$, $(X, |||\cdot|||)$ are both normed linear spaces, then there exist constants $0 < c < C < \infty$ such that
$$c \le \frac{|||x|||}{\|x\|} \le C \qquad \text{for all } x \ne 0$$

3. If $X$ is a normed linear space and $Y$ is a Banach space, show that $B(X, Y)$ is a Banach space, with the norm given by (5.3.1).

4. If $T$ is a linear integral operator, $Tu(x) = \int_\Omega K(x, y)u(y)\,dy$, then $T^2$ is also a linear integral operator. What is the kernel for $T^2$?

5. If $X$ is a normed linear space and $E$ is a subspace of $X$, show that $\bar{E}$ is also a subspace of $X$.

6. If $p \in (0, 1)$ show that $\|f\|_p = \left( \int_\Omega |f(x)|^p\,dx \right)^{1/p}$ does not define a norm.

7. The simple initial value problem
$$u' = u \qquad u(0) = 1$$
is equivalent to the integral equation
$$u(x) = 1 + \int_0^x u(s)\,ds$$
which may be viewed as a fixed point problem of the special type discussed in Section 5.4. Find the Neumann series for the solution $u$. Where does it converge?

8. If $Tf = f(0)$, show that $T$ is not a bounded linear functional on $L^p(-1, 1)$ for $1 \le p < \infty$.

9. Let $A \in B(X)$.
a) Show that
$$\exp(A) = e^A := \sum_{n=0}^\infty \frac{A^n}{n!} \qquad (5.5.1)$$
is defined in $B(X)$.
b) If also $B \in B(X)$ and $AB = BA$, show that $\exp(A + B) = \exp(A)\exp(B)$.
c) Show that $\exp((t + s)A) = \exp(tA)\exp(sA)$ for any $t, s \in \mathbb{R}$.
d) Show that the conclusion in b) is false, in general, if $A$ and $B$ do not commute. (Suggestion: a counterexample can be found in $X = \mathbb{R}^2$.)

10. Find an integral equation of the form $u = Tu + f$, $T$ linear, which is equivalent to the initial value problem
$$u'' + u = x^2 \quad x > 0 \qquad u(0) = 1 \quad u'(0) = 2 \qquad (5.5.2)$$
Calculate the Born approximation to the solution $u$ and compare to the exact solution.
Chapter 6

Inner product spaces and Hilbert spaces

6.1 Axioms of an inner product space

Definition 6.1. A vector space $X$ is said to be an inner product space if for every $x, y \in X$ there is defined a complex number $\langle x, y\rangle$, the inner product of $x$ and $y$, such that the following axioms hold.

[H1] $\langle x, x\rangle \ge 0$ for all $x \in X$
[H2] $\langle x, x\rangle = 0$ if and only if $x = 0$
[H3] $\langle \lambda x, y\rangle = \lambda \langle x, y\rangle$ for any $x, y \in X$ and any scalar $\lambda$.
[H4] $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for any $x, y \in X$.
[H5] $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$ for any $x, y, z \in X$

Note that from axioms [H3] and [H4] it follows that
$$\langle x, \lambda y\rangle = \overline{\langle \lambda y, x\rangle} = \overline{\lambda \langle y, x\rangle} = \bar{\lambda}\,\overline{\langle y, x\rangle} = \bar{\lambda}\langle x, y\rangle \qquad (6.1.1)$$
Another immediate consequence of the axioms is that
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \qquad (6.1.2)$$
If we replace $y$ by $-y$ and add the resulting identities we obtain the so-called Parallelogram Law
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \qquad (6.1.3)$$

Example 6.1. The vector space $\mathbb{R}^N$ is an inner product space if we define
$$\langle x, y\rangle = \sum_{j=1}^N x_j y_j \qquad (6.1.4)$$
In the case of $\mathbb{C}^N$ we must define
$$\langle x, y\rangle = \sum_{j=1}^N x_j \bar{y}_j \qquad (6.1.5)$$
in order that [H4] be satisfied.

Example 6.2. For the vector space $L^2(\Omega)$, with $\Omega \subset \mathbb{R}^N$, we may define
$$\langle f, g\rangle = \int_\Omega f(x)\overline{g(x)}\,dx \qquad (6.1.6)$$
where of course the complex conjugation can be ignored in the case of the real vector space $L^2(\Omega)$. Note the formal analogy with the inner product in the case of $\mathbb{R}^N$ or $\mathbb{C}^N$. The finiteness of $\langle f, g\rangle$ is guaranteed by the Hölder inequality (18.1.6), and the validity of [H1]–[H5] is clear.

Example 6.3. Another important inner product space which we introduce at this point is the sequence space
$$\ell^2 = \left\{ x = \{x_j\}_{j=1}^\infty : \sum_{j=1}^\infty |x_j|^2 < \infty \right\} \qquad (6.1.7)$$
with inner product
$$\langle x, y\rangle = \sum_{j=1}^\infty x_j \bar{y}_j \qquad (6.1.8)$$
The fact that $\langle x, y\rangle$ is finite for any $x, y \in \ell^2$ follows now from (18.1.14), the discrete form of the Hölder inequality. The notation $\ell^2(\mathbb{Z})$ is often used when the sequences involved are bi-infinite, i.e. of the form $x = \{x_j\}_{j=-\infty}^\infty$.
6.2 Norm in a Hilbert space

Proposition 6.1. If $x, y \in X$, an inner product space, then
$$|\langle x, y\rangle|^2 \le \langle x, x\rangle \langle y, y\rangle \qquad (6.2.1)$$
Proof: For any $z \in X$ we have
$$0 \le \langle x - z, x - z\rangle = \langle x, x\rangle - \langle x, z\rangle - \langle z, x\rangle + \langle z, z\rangle \qquad (6.2.2)$$
$$= \langle x, x\rangle + \langle z, z\rangle - 2\,\mathrm{Re}\,\langle x, z\rangle \qquad (6.2.3)$$
and hence
$$2\,\mathrm{Re}\,\langle z, x\rangle \le \langle x, x\rangle + \langle z, z\rangle \qquad (6.2.4)$$
If $y = 0$ there is nothing to prove; otherwise choose $z = (\langle x, y\rangle / \langle y, y\rangle)\,y$ to get
$$2\,\frac{|\langle x, y\rangle|^2}{\langle y, y\rangle} \le \langle x, x\rangle + \frac{|\langle x, y\rangle|^2}{\langle y, y\rangle} \qquad (6.2.5)$$
The conclusion (6.2.1) now follows upon rearrangement. □

Theorem 6.1. If $X$ is an inner product space and if we set $\|x\| = \sqrt{\langle x, x\rangle}$ then $\|\cdot\|$ is a norm on $X$.

Proof: By axiom [H1], $\|x\|$ is defined as a nonnegative real number for every $x \in X$, and axiom [H2] implies the corresponding axiom [N1] of a norm. If $\lambda$ is any scalar then $\|\lambda x\|^2 = \langle \lambda x, \lambda x\rangle = \lambda\bar{\lambda}\langle x, x\rangle = |\lambda|^2\|x\|^2$, so that [N2] also holds. Finally, if $x, y \in X$ then
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \qquad (6.2.6)$$
$$\le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 \qquad (6.2.7)$$
$$= (\|x\| + \|y\|)^2 \qquad (6.2.8)$$
so that the triangle inequality [N3] also holds. □

The inequality (6.2.1) may now be restated as
$$|\langle x, y\rangle| \le \|x\|\,\|y\| \qquad (6.2.9)$$
for any $x, y \in X$, and in this form is usually called the Schwarz or Cauchy-Schwarz inequality.
Corollary 6.1. If $x_n \to x$ in $X$ then $\langle x_n, y\rangle \to \langle x, y\rangle$ for any $y \in X$.

Proof: We have that
$$|\langle x_n, y\rangle - \langle x, y\rangle| = |\langle x_n - x, y\rangle| \le \|x_n - x\|\,\|y\| \to 0 \qquad (6.2.10)$$
□

By Theorem 6.1 an inner product space may always be regarded as a normed linear space, and analogously to the definition of a Banach space we have

Definition 6.2. A Hilbert space is a complete inner product space.

Example 6.4. The spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ are Hilbert spaces, as is $L^2(\Omega)$ on account of the completeness property mentioned in Theorem 4.1 of Chapter 4. On the other hand if we consider $C(E)$ with inner product $\langle f, g\rangle = \int_E f(x)\overline{g(x)}\,dx$, then it is an inner product space which is not a Hilbert space, since, as previously observed, $C(E)$ is not complete with the $L^2(E)$ metric. The sequence space $\ell^2$ is also a Hilbert space, see Exercise 7.
6.3 Orthogonality

Recall from elementary calculus that in $\mathbb{R}^n$ the inner product allows one to calculate the angle between two vectors, namely
$$\langle x, y\rangle = \|x\|\,\|y\| \cos\theta \qquad (6.3.1)$$
where $\theta$ is the angle between $x$ and $y$. In particular $x$ and $y$ are perpendicular if and only if $\langle x, y\rangle = 0$. The concept of perpendicularity, also called orthogonality, is fundamental in Hilbert space analysis, even if the geometric picture is less clear.

Definition 6.3. If $x, y \in X$, an inner product space, we say $x, y$ are orthogonal if $\langle x, y\rangle = 0$.

From (6.1.2) we obtain immediately the 'Pythagorean Theorem' that if $x$ and $y$ are orthogonal then
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 \qquad (6.3.2)$$

A set of vectors $\{x_1, x_2, \dots, x_n\}$ is called an orthogonal set if $x_j$ and $x_k$ are orthogonal whenever $j \ne k$, and for such a set we have
$$\Big\| \sum_{j=1}^n x_j \Big\|^2 = \sum_{j=1}^n \|x_j\|^2 \qquad (6.3.3)$$
The set is called orthonormal if in addition $\|x_j\| = 1$ for every $j$. The same terminology is used for countably infinite sets, with (6.3.3) still valid provided that the series on the right is convergent.

We may also use the notation $x \perp y$ if $x, y$ are orthogonal, and if $E \subset X$ we define the orthogonal complement of $E$,
$$E^\perp = \{x \in X : \langle x, y\rangle = 0 \text{ for all } y \in E\}$$
($E^\perp = x^\perp$ if $E$ consists of the single point $x$). We obviously have $\{0\}^\perp = X$, and also $X^\perp = \{0\}$, since if $x \in X^\perp$ then $\langle x, x\rangle = 0$ so that $x = 0$.

Proposition 6.2. If $E \subset X$ then $E^\perp$ is a closed subspace of $X$. If $E$ is a closed subspace then $E = E^{\perp\perp}$.

We leave the proof as an exercise. Here $E^{\perp\perp}$ means $(E^\perp)^\perp$, the orthogonal complement of the orthogonal complement.

Example 6.5. If $X = \mathbb{R}^3$ and $E = \{x = (x_1, x_2, x_3) : x_1 = x_2 = 0\}$ then $E^\perp = \{x \in \mathbb{R}^3 : x_3 = 0\}$.

Example 6.6. If $X = L^2(\Omega)$ with $\Omega$ a bounded open set in $\mathbb{R}^N$, let $E = L\{1\}$, i.e. the set of constant functions. Then $f \in E^\perp$ if and only if $\langle f, 1\rangle = \int_\Omega f(x)\,dx = 0$. Thus $E^\perp$ is the set of functions in $L^2(\Omega)$ with mean value zero.
6.4 Projections

If $E \subset X$ and $x \in X$, the projection $P_E x$ of $x$ onto $E$ is the element of $E$ closest to $x$, if such an element exists. That is, $y = P_E(x)$ if $y$ is the unique solution of the minimization problem
$$\min_{z \in E} \|x - z\| \qquad (6.4.1)$$
Of course such a point may not exist, and may not be unique if it does exist. In a Hilbert space the projection will be well defined provided $E$ is closed and convex.

Definition 6.4. If $X$ is a vector space and $E \subset X$, we say $E$ is convex if $\lambda x + (1 - \lambda)y \in E$ whenever $x, y \in E$ and $\lambda \in [0, 1]$.

Example 6.7. If $X$ is a vector space then any subspace of $X$ is convex. If $X$ is a normed linear space then any ball $B(x, R) \subset X$ is convex.

Theorem 6.2. Let $H$ be a Hilbert space, $E \subset H$ closed and convex, and $x \in H$. Then $y = P_E x$ exists. Furthermore, $y = P_E x$ if and only if
$$y \in E \qquad \mathrm{Re}\,\langle x - y, z - y\rangle \le 0 \quad \text{for all } z \in E \qquad (6.4.2)$$
Proof: Set $d = \inf_{z \in E} \|x - z\|$, so that there exists a sequence $z_n \in E$ such that $\|x - z_n\| \to d$. We wish to show that $\{z_n\}$ is a Cauchy sequence. From the Parallelogram Law (6.1.3) applied to $z_n - x$, $z_m - x$ we have
$$\|z_n - z_m\|^2 = 2\|z_n - x\|^2 + 2\|z_m - x\|^2 - 4\Big\| \frac{z_n + z_m}{2} - x \Big\|^2 \qquad (6.4.3)$$
Since $E$ is convex, $(z_n + z_m)/2 \in E$, so that $\|\frac{z_n + z_m}{2} - x\| \ge d$, and it follows that
$$\|z_n - z_m\|^2 \le 2\|z_n - x\|^2 + 2\|z_m - x\|^2 - 4d^2 \qquad (6.4.4)$$
Letting $n, m \to \infty$, the right hand side tends to zero, so that $\{z_n\}$ is Cauchy. Since the space is complete there exists $y \in H$ such that $\lim_{n\to\infty} z_n = y$, and $y \in E$ since $E$ is closed. It follows that $\|y - x\| = \lim_{n\to\infty} \|z_n - x\| = d$, so that $\min_{z \in E} \|z - x\|$ is achieved at $y$.

For the uniqueness assertion, suppose $\|y - x\| = \|\hat{y} - x\| = d$ with $y, \hat{y} \in E$. Then (6.4.4) holds with $z_n, z_m$ replaced by $y, \hat{y}$, giving
$$\|y - \hat{y}\|^2 \le 2\|y - x\|^2 + 2\|\hat{y} - x\|^2 - 4d^2 = 0 \qquad (6.4.5)$$
so that $y = \hat{y}$. Thus $y = P_E x$ exists.

To obtain the characterization (6.4.2), note that for any $z \in E$,
$$f(t) = \|x - (y + t(z - y))\|^2 \qquad (6.4.6)$$
has its minimum value on the interval $[0, 1]$ when $t = 0$, since $y + t(z - y) = tz + (1 - t)y \in E$. We explicitly calculate
$$f(t) = \|x - y\|^2 - 2t\,\mathrm{Re}\,\langle x - y, z - y\rangle + t^2\|z - y\|^2 \qquad (6.4.7)$$
By elementary calculus considerations, the minimum of this quadratic occurs at $t = 0$ only if $f'(0) = -2\,\mathrm{Re}\,\langle x - y, z - y\rangle \ge 0$, which is equivalent to (6.4.2). If, on the other hand, (6.4.2) holds, then for any $z \in E$ we must have
$$\|z - x\|^2 = f(1) \ge f(0) = \|y - x\|^2 \qquad (6.4.8)$$
so that $\min_{z \in E} \|z - x\|$ must occur at $y$, i.e. $y = P_E x$. □
The most important special case of the above theorem is when $E$ is a closed subspace of the Hilbert space $H$ (recall a subspace is always convex), in which case we have

Theorem 6.3. If $E \subset H$ is a closed subspace of a Hilbert space $H$ and $x \in H$ then $y = P_E x$ if and only if $y \in E$ and $x - y \in E^\perp$. Furthermore

1. $x - y = x - P_E x = P_{E^\perp} x$

2. We have that
$$x = y + (x - y) = P_E x + P_{E^\perp} x \qquad (6.4.9)$$
is the unique decomposition of $x$ as the sum of an element of $E$ and an element of $E^\perp$.

3. $P_E$ is a linear operator on $H$ with $\|P_E\| = 1$ except for the case $E = \{0\}$.

Proof: If $y = P_E x$ then for any $w \in E$ we also have $y \pm w \in E$, and choosing $z = y \pm w$ in (6.4.2) gives $\pm\,\mathrm{Re}\,\langle x - y, w\rangle \le 0$. Thus $\mathrm{Re}\,\langle x - y, w\rangle = 0$, and repeating the same argument with $z = y \pm iw$ gives $\mathrm{Re}\,\langle x - y, iw\rangle = \mathrm{Im}\,\langle x - y, w\rangle = 0$ also. We conclude that $\langle x - y, w\rangle = 0$ for all $w \in E$, i.e. $x - y \in E^\perp$. The converse statement may be proved in a similar manner.

Recall that $E^\perp$ is always a closed subspace of $H$. The statement that $x - y = P_{E^\perp} x$ is then equivalent, by the previous paragraph, to $x - y \in E^\perp$ and $\langle x - (x - y), w\rangle = \langle y, w\rangle = 0$ for every $w \in E^\perp$, which is evidently true since $y \in E$.

Next, if $x = y_1 + z_1 = y_2 + z_2$ with $y_1, y_2 \in E$ and $z_1, z_2 \in E^\perp$, then $y_1 - y_2 = z_2 - z_1$, implying that $y = y_1 - y_2$ belongs to both $E$ and $E^\perp$. But then $y \perp y$, i.e. $\langle y, y\rangle = 0$, must hold, so that $y = 0$ and hence $y_1 = y_2$, $z_1 = z_2$. We leave the proof of linearity to the exercises. □

If we denote by $I$ the identity mapping, we have just proved that $P_{E^\perp} = I - P_E$. We also obtain that
$$\|x\|^2 = \|P_E x\|^2 + \|P_{E^\perp} x\|^2 \qquad (6.4.10)$$
for any $x \in H$.
Example 6.8. In the Hilbert space $L^2(-1, 1)$ let $E$ denote the subspace of even functions, i.e. $f \in E$ if $f(x) = f(-x)$ for almost every $x \in (-1, 1)$. We claim that $E^\perp$ is the subspace of odd functions on $(-1, 1)$. The fact that any odd function belongs to $E^\perp$ is clear, since if $f$ is even and $g$ is odd then $fg$ is odd and so $\langle f, g\rangle = \int_{-1}^1 f(x)g(x)\,dx = 0$. Conversely, if $g \perp E$ then for any $f \in E$ we have
$$0 = \langle g, f\rangle = \int_{-1}^1 g(x)f(x)\,dx = \int_0^1 (g(x) + g(-x))f(x)\,dx \qquad (6.4.11)$$
by an obvious change of variables. Choosing $f(x) = g(x) + g(-x)$ we see that
$$\int_0^1 |g(x) + g(-x)|^2\,dx = 0 \qquad (6.4.12)$$
so that $g(x) = -g(-x)$ for almost every $x \in (0, 1)$, and hence for almost every $x \in (-1, 1)$. Thus any element of $E^\perp$ is an odd function on $(-1, 1)$.

Any function $f \in L^2(-1, 1)$ thus has the unique decomposition $f = P_E f + P_{E^\perp} f$, a sum of an even and an odd function. Since one such splitting is
$$f(x) = \frac{f(x) + f(-x)}{2} + \frac{f(x) - f(-x)}{2} \qquad (6.4.13)$$
we conclude from the uniqueness property that these two terms are the projections, i.e.
$$P_E f(x) = \frac{f(x) + f(-x)}{2} \qquad P_{E^\perp} f(x) = \frac{f(x) - f(-x)}{2} \qquad (6.4.14)$$
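The decomposition (6.4.14) can be checked numerically. The sketch below is our own illustration, using an assumed midpoint-rule quadrature for the $L^2(-1,1)$ inner product: it splits $f(x) = e^x$ into its even part $\cosh x$ and odd part $\sinh x$, and verifies orthogonality together with the identity (6.4.10).

```python
import math

# Even/odd decomposition (6.4.14) of f(x) = e^x in L^2(-1, 1):
# P_E f = cosh, P_{E^perp} f = sinh.  Inner products are computed with
# a midpoint rule (our own choice of quadrature).

def inner(f, g, n=2000):
    h = 2.0 / n
    pts = [-1.0 + (i + 0.5) * h for i in range(n)]
    return h * sum(f(x) * g(x) for x in pts)

f = math.exp
even = lambda x: 0.5 * (f(x) + f(-x))       # = cosh x
odd = lambda x: 0.5 * (f(x) - f(-x))        # = sinh x

ortho = inner(even, odd)                    # <P_E f, P_{E^perp} f>, ~0
pythagoras = inner(f, f) - inner(even, even) - inner(odd, odd)   # cf. (6.4.10)
```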
Example 6.9. Let $\{x_1, x_2, \dots, x_n\}$ be an orthogonal set of nonzero elements in a Hilbert space $X$ and $E = L(x_1, x_2, \dots, x_n)$ the span of these elements. Let us compute $P_E$ for this closed subspace $E$. If $y = P_E x$ then $y = \sum_{j=1}^n \lambda_j x_j$ for some scalars $\lambda_1, \dots, \lambda_n$, since $y \in E$. From Theorem 6.3 we also have that $x - y \perp E$, which is equivalent to $x - y \perp x_k$ for each $k$. Thus $\langle x, x_k\rangle = \langle y, x_k\rangle = \lambda_k \langle x_k, x_k\rangle$, using the orthogonality assumption. Thus we conclude that
$$y = P_E x = \sum_{j=1}^n \frac{\langle x, x_j\rangle}{\langle x_j, x_j\rangle}\, x_j \qquad (6.4.15)$$
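Formula (6.4.15) is immediate to implement. The following sketch uses a hypothetical orthogonal pair in $\mathbb{R}^4$ (our own choice, for illustration) to compute $P_E x$ and check that $x - P_E x \perp E$:

```python
# Projection (6.4.15) onto the span of an orthogonal set in R^4.
# The vectors below are a hypothetical example; note <x1, x2> = 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x1 = [1.0, 1.0, 0.0, 0.0]
x2 = [1.0, -1.0, 0.0, 0.0]
x = [3.0, 1.0, 2.0, 5.0]

coeffs = [dot(x, xj) / dot(xj, xj) for xj in (x1, x2)]   # <x, x_j>/<x_j, x_j>
y = [coeffs[0] * a + coeffs[1] * b for a, b in zip(x1, x2)]   # y = P_E x
r = [xi - yi for xi, yi in zip(x, y)]        # x - P_E x, should lie in E^perp
```

Here $E$ is the plane of the first two coordinates, so $P_E x$ simply keeps those coordinates and the residual $r$ is orthogonal to both spanning vectors.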
6.5 Gram-Schmidt method

The projection formula (6.4.15) provides an explicit and very convenient expression for the solution $y$ of the best approximation problem (6.4.1) provided $E$ is a subspace spanned by mutually orthogonal vectors $\{x_1, x_2, \dots, x_n\}$. If instead $E = L(x_1, x_2, \dots, x_n)$ is a subspace but $\{x_1, x_2, \dots, x_n\}$ are not orthogonal vectors, we can still use (6.4.15) to compute $y = P_E x$ if we can find a set of orthogonal vectors $\{y_1, y_2, \dots, y_m\}$ such that $E = L(x_1, x_2, \dots, x_n) = L(y_1, y_2, \dots, y_m)$, i.e. if we can find an orthogonal basis of $E$. This may always be done by the Gram-Schmidt orthogonalization procedure from linear algebra, which we now describe.

Assume that $\{x_1, x_2, \dots, x_n\}$ are linearly independent, so that $m = n$ must hold. First set $y_1 = x_1$. If orthogonal vectors $y_1, y_2, \dots, y_k$ have been chosen for some $1 \le k < n$ such that $E_k := L(y_1, y_2, \dots, y_k) = L(x_1, x_2, \dots, x_k)$, then define $y_{k+1} = x_{k+1} - P_{E_k} x_{k+1}$. Clearly $\{y_1, y_2, \dots, y_{k+1}\}$ are orthogonal since $y_{k+1}$ is the projection of $x_{k+1}$ onto $E_k^\perp$. Also since $y_{k+1}, x_{k+1}$ differ by an element of $E_k$ it is evident that $L(x_1, x_2, \dots, x_{k+1}) = L(y_1, y_2, \dots, y_{k+1})$. Thus after $n$ steps we obtain an orthogonal set $\{y_1, y_2, \dots, y_n\}$ which spans $E$. If the original set $\{x_1, x_2, \dots, x_n\}$ is not linearly independent then some of the $y_k$'s will be zero. After discarding these and relabeling, we obtain $\{y_1, y_2, \dots, y_m\}$ for some $m \le n$, an orthogonal basis for $E$. Note that we may compute $y_{k+1}$ using (6.4.15), namely
$$y_{k+1} = x_{k+1} - \sum_{j=1}^{k} \frac{\langle x_{k+1}, y_j \rangle}{\langle y_j, y_j \rangle}\, y_j \qquad (6.5.1)$$
In practice the Gram-Schmidt method is often modified to produce an orthonormal basis of $E$ by normalizing $y_k$ to be a unit vector at each step, or else discarding it if it is already a linear combination of $\{y_1, y_2, \dots, y_{k-1}\}$. More explicitly:

• Set $y_1 = \dfrac{x_1}{\|x_1\|}$.

• If orthonormal vectors $\{y_1, y_2, \dots, y_k\}$ have been chosen, set
$$\tilde{y}_{k+1} = x_{k+1} - \sum_{j=1}^{k} \langle x_{k+1}, y_j \rangle\, y_j \qquad (6.5.2)$$
If $\tilde{y}_{k+1} = 0$ discard it, otherwise set $y_{k+1} = \dfrac{\tilde{y}_{k+1}}{\|\tilde{y}_{k+1}\|}$.

The reader may easily check that $\{y_1, y_2, \dots, y_m\}$ constitutes an orthonormal basis of $E$, and consequently $P_E x = \sum_{j=1}^m \langle x, y_j \rangle\, y_j$ for any $x \in H$.
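The normalized procedure (6.5.2), including the discard step, can be sketched in a few lines. This is an illustration only, for vectors in $\mathbb{R}^3$ with the Euclidean inner product; the names `gram_schmidt` and `tol` are our own, and the tolerance replaces the exact test $\tilde{y}_{k+1} = 0$ in floating point arithmetic.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize following (6.5.2), discarding any vector that is
    already (numerically) in the span of the ones kept so far."""
    basis = []
    for x in vectors:
        y = x.astype(float).copy()
        for e in basis:
            y -= np.dot(x, e) * e        # subtract <x_{k+1}, y_j> y_j
        norm = np.linalg.norm(y)
        if norm > tol:                   # discard if linearly dependent
            basis.append(y / norm)
    return basis

vecs = [np.array([1.0, 1.0, 0.0]),
        np.array([2.0, 2.0, 0.0]),       # dependent: gets discarded
        np.array([1.0, 0.0, 1.0])]
E = gram_schmidt(vecs)
print(len(E))                            # 2
print(round(np.dot(E[0], E[1]), 12))     # 0.0
```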
6.6 Bessel's inequality and infinite orthogonal sequences

The formula (6.4.15) for $P_E$ may be adapted for use in infinite dimensional subspaces $E$. If $\{x_n\}_{n=1}^\infty$ is a countable orthogonal set in $H$, $x_n \ne 0$ for all $n$, we formally expect that if $E = \overline{L(\{x_n\}_{n=1}^\infty)}$ then
$$P_E x = \sum_{n=1}^\infty \frac{\langle x, x_n \rangle}{\langle x_n, x_n \rangle}\, x_n \qquad (6.6.1)$$
To verify that this is correct, we must show that the infinite series in (6.6.1) is guaranteed to be convergent in $H$. First of all, let us set
$$e_n = \frac{x_n}{\|x_n\|} \qquad c_n = \langle x, e_n \rangle \qquad E_N = L(x_1, x_2, \dots, x_N) \qquad (6.6.2)$$
so that $\{e_n\}_{n=1}^\infty$ is an orthonormal set, and
$$P_{E_N} x = \sum_{n=1}^N c_n e_n \qquad (6.6.3)$$
From (6.4.10) we have
$$\sum_{n=1}^N |c_n|^2 = \|P_{E_N} x\|^2 \le \|x\|^2 \qquad (6.6.4)$$
Letting $N \to \infty$ we obtain Bessel's inequality
$$\sum_{n=1}^\infty |c_n|^2 = \sum_{n=1}^\infty |\langle x, e_n \rangle|^2 \le \|x\|^2 \qquad (6.6.5)$$
The immediate implication that $\lim_{n\to\infty} c_n = 0$ is sometimes called the Riemann-Lebesgue lemma.
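Bessel's inequality (6.6.4)-(6.6.5) can be checked numerically: the coefficients of $x$ against any orthonormal set that does not span the space still have squared sum bounded by $\|x\|^2$. A sketch, using the (hypothetical) choice of the first four standard basis vectors of $\mathbb{R}^{10}$ as the orthonormal set:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)

# An orthonormal set that does NOT span R^10
E = [np.eye(10)[k] for k in range(4)]
coeffs = [np.dot(x, e) for e in E]       # c_n = <x, e_n>

lhs = sum(c**2 for c in coeffs)          # sum |c_n|^2
rhs = np.dot(x, x)                       # ||x||^2
print(lhs <= rhs)                        # True: Bessel's inequality (6.6.5)
```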
Proposition 6.3. (Riesz-Fischer) Let $\{e_n\}_{n=1}^\infty$ be an orthonormal set in $H$, $E = \overline{L(\{e_n\}_{n=1}^\infty)}$, $x \in H$ and $c_n = \langle x, e_n \rangle$. Then the infinite series $\sum_{n=1}^\infty c_n e_n$ is convergent in $H$ to $P_E x$.

Proof: First we note that the series $\sum_{n=1}^\infty c_n e_n$ is Cauchy in $H$, since if $M > N$
$$\Big\| \sum_{n=N}^{M} c_n e_n \Big\|^2 = \sum_{n=N}^{M} |c_n|^2 \qquad (6.6.6)$$
which is less than any prescribed $\epsilon > 0$ for $N$ sufficiently large, since $\sum_{n=1}^\infty |c_n|^2 < \infty$ by Bessel's inequality. Thus $y = \sum_{n=1}^\infty c_n e_n$ exists in $H$, and clearly $y \in E$. Since $\langle \sum_{n=1}^N c_n e_n, e_m \rangle = c_m$ if $N > m$ it follows easily that $\langle y, e_m \rangle = c_m = \langle x, e_m \rangle$. Thus $y - x \perp e_m$ for any $m$, which implies $y - x \in E^\perp$. From Theorem 6.3 we conclude that $y = P_E x$. $\Box$
6.7 Characterization of a basis of a Hilbert space

Now suppose we have an orthogonal set $\{x_n\}_{n=1}^\infty$ and we wish to determine whether or not it is a basis of the Hilbert space $H$. There are a number of interesting ways to answer this question, summarized in Theorem 6.4 below. First we must make some more definitions.

Definition 6.5. A collection of vectors $\{x_n\}_{n=1}^\infty$ is closed in $H$ if the set of all finite linear combinations of $\{x_n\}_{n=1}^\infty$ is dense in $H$.

A collection of vectors $\{x_n\}_{n=1}^\infty$ is complete in $H$ if there is no nonzero vector orthogonal to all of them, i.e. $\langle x, x_n \rangle = 0$ for all $n$ if and only if $x = 0$.

An orthogonal set $\{x_n\}_{n=1}^\infty$ in $H$ is a maximal orthogonal set if it is not contained in any larger orthogonal set.

Theorem 6.4. Let $\{e_n\}_{n=1}^\infty$ be an orthonormal set in a Hilbert space $H$. Then the following are equivalent.

a) $\{e_n\}_{n=1}^\infty$ is a basis of $H$.

b) $x = \sum_{n=1}^\infty \langle x, e_n \rangle e_n$ for every $x \in H$.

c) $\langle x, y \rangle = \sum_{n=1}^\infty \langle x, e_n \rangle \langle e_n, y \rangle$ for every $x, y \in H$.

d) $\|x\|^2 = \sum_{n=1}^\infty |\langle x, e_n \rangle|^2$ for every $x \in H$.

e) $\{e_n\}_{n=1}^\infty$ is a maximal orthonormal set.

f) $\{e_n\}_{n=1}^\infty$ is closed in $H$.

g) $\{e_n\}_{n=1}^\infty$ is complete in $H$.
Proof: a) implies b): If $\{e_n\}_{n=1}^\infty$ is a basis of $H$ then for any $x \in H$ there exist unique constants $d_n$ such that $x = \lim_{N\to\infty} S_N$ where $S_N = \sum_{n=1}^N d_n e_n$. Since $\langle S_N, e_m \rangle = d_m$ if $N > m$ it follows that
$$|d_m - \langle x, e_m \rangle| = |\langle S_N - x, e_m \rangle| \le \|S_N - x\|\, \|e_m\| \to 0 \qquad (6.7.1)$$
as $N \to \infty$, using the Schwarz inequality. Hence
$$x = \sum_{n=1}^\infty d_n e_n = \sum_{n=1}^\infty \langle x, e_n \rangle e_n \qquad (6.7.2)$$

b) implies c): For any $x, y \in H$ we have
$$\langle x, y \rangle = \Big\langle x, \lim_{N\to\infty} \sum_{n=1}^N \langle y, e_n \rangle e_n \Big\rangle = \lim_{N\to\infty} \Big\langle x, \sum_{n=1}^N \langle y, e_n \rangle e_n \Big\rangle \qquad (6.7.3)$$
$$= \lim_{N\to\infty} \sum_{n=1}^N \overline{\langle y, e_n \rangle}\, \langle x, e_n \rangle = \sum_{n=1}^\infty \overline{\langle y, e_n \rangle}\, \langle x, e_n \rangle \qquad (6.7.4)$$
$$= \sum_{n=1}^\infty \langle x, e_n \rangle \langle e_n, y \rangle \qquad (6.7.5)$$
Here we have used Corollary 6.1 in the second line.

c) implies d): We simply choose $x = y$ in the identity stated in c).
d) implies e): If $\{e_n\}_{n=1}^\infty$ is not maximal then there exists $e \in H$ such that
$$\{e_n\}_{n=1}^\infty \cup \{e\} \qquad (6.7.6)$$
is orthonormal. Since $\langle e, e_n \rangle = 0$ but $\|e\| = 1$ this contradicts d).

e) implies f): Let $E$ denote the closure of the set of finite linear combinations of the $e_n$'s. If $\{e_n\}_{n=1}^\infty$ is not closed then $E \ne H$, so there must exist $x \not\in E$. If we let $y = x - P_E x$ then $y \ne 0$ and $y \perp E$. If $e = y/\|y\|$ we would then have that $\{e_n\}_{n=1}^\infty \cup \{e\}$ is orthonormal, so that $\{e_n\}_{n=1}^\infty$ could not be maximal.

f) implies g): Assume that $\langle x, e_n \rangle = 0$ for all $n$. If $\{e_n\}_{n=1}^\infty$ is closed then for any $\epsilon > 0$ there exist $\lambda_1, \dots, \lambda_N$ such that $\|x - \sum_{n=1}^N \lambda_n e_n\|^2 < \epsilon$. But then $\|x\|^2 + \sum_{n=1}^N |\lambda_n|^2 < \epsilon$ and in particular $\|x\|^2 < \epsilon$. Thus $x = 0$ so $\{e_n\}_{n=1}^\infty$ is complete.
g) implies a): Let $E = \overline{L(\{e_n\}_{n=1}^\infty)}$. If $x \in H$ and $y = P_E x = \sum_{n=1}^\infty \langle x, e_n \rangle e_n$ then as in the proof of Proposition 6.3 $\langle y, e_n \rangle = \langle x, e_n \rangle$. Since $\{e_n\}_{n=1}^\infty$ is complete it follows that $x = y \in E$, so that $\overline{L(\{e_n\}_{n=1}^\infty)} = H$. Since an orthonormal set is obviously linearly independent it follows that $\{e_n\}_{n=1}^\infty$ is a basis of $H$.

Because of the equivalence of the stated conditions, the phrases 'complete orthonormal set', 'maximal orthonormal set', and 'closed orthonormal set' are often used interchangeably with 'orthonormal basis' in a Hilbert space setting. The identity in d) is called the Bessel equality (recall the corresponding inequality (6.6.5) is valid whether or not the orthonormal set $\{e_n\}_{n=1}^\infty$ is a basis), while the identity in c) is the Parseval equality. For reasons which should become more clear in Chapter 8 the infinite series $\sum_{n=1}^\infty \langle x, e_n \rangle e_n$ is often called the generalized Fourier series of $x$ with respect to the orthonormal basis $\{e_n\}_{n=1}^\infty$, and $\langle x, e_n \rangle$ is the $n$'th generalized Fourier coefficient.
Theorem 6.5. Every separable Hilbert space has an orthonormal basis.

Proof: If $\{x_n\}_{n=1}^\infty$ is a countable dense sequence in $H$ and we carry out the Gram-Schmidt procedure, we obtain an orthonormal sequence $\{e_n\}_{n=1}^\infty$. This sequence must be complete, since any vector orthogonal to every $e_n$ must also be orthogonal to every $x_n$, so must be zero, since $\{x_n\}_{n=1}^\infty$ is dense. Therefore by Theorem 6.4 $\{e_n\}_{n=1}^\infty$ (or $\{e_1, e_2, \dots, e_n\}$ in the finite dimensional case) is an orthonormal basis of $H$.

The same conclusion is actually correct in a non-separable Hilbert space also, but needs more explanation. See for example Chapter 4 of [29].
6.8 Isomorphisms of a Hilbert space

There are two interesting isomorphisms of every separable Hilbert space: one is to its so-called dual space, and the second is to the sequence space $\ell^2$. In this section we explain both of these facts.

Recall that in Chapter 5 we have already introduced $X^* = B(X, \mathbb{C})$, the space of continuous linear functionals on the normed linear space $X$. It is itself always a Banach space (see Exercise 3 of Chapter 5), and is also called the dual space of $X$.

Example 6.10. If $H$ is a Hilbert space and $y \in H$, define $\phi(x) = \langle x, y \rangle$. Then $\phi : H \to \mathbb{C}$ is clearly linear, and $|\phi(x)| \le \|y\|\,\|x\|$ by the Schwarz inequality, hence $\phi \in H^*$, with $\|\phi\| \le \|y\|$.

The following theorem asserts that every element of the dual space $H^*$ arises in this way.

Theorem 6.6. (Riesz representation theorem) If $H$ is a Hilbert space and $\phi \in H^*$ then there exists a unique $y \in H$ such that $\phi(x) = \langle x, y \rangle$.

Proof: Let $M = \{x \in H : \phi(x) = 0\}$, which is clearly a closed subspace of $H$. If $M = H$ then $\phi$ can only be the zero functional, so $y = 0$ has the required properties. Otherwise, there must exist $e \in M^\perp$ such that $\|e\| = 1$. For any $x \in H$ let $z = \phi(x)e - \phi(e)x$ and observe that $\phi(z) = 0$ so $z \in M$, and in particular $z \perp e$. It then follows that
$$0 = \langle z, e \rangle = \phi(x)\langle e, e \rangle - \phi(e)\langle x, e \rangle \qquad (6.8.1)$$
Thus $\phi(x) = \langle x, y \rangle$ with $y := \overline{\phi(e)}\, e$, for every $x \in H$.

The uniqueness property is even easier to show. If $\phi(x) = \langle x, y_1 \rangle = \langle x, y_2 \rangle$ for every $x \in H$ then necessarily $\langle x, y_1 - y_2 \rangle = 0$ for all $x$, and choosing $x = y_1 - y_2$ we get $\|y_1 - y_2\|^2 = 0$, that is, $y_1 = y_2$.
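The construction in the proof can be illustrated in the finite dimensional Hilbert space $\mathbb{C}^n$: given only the values of $\phi$ on an orthonormal basis, the representing vector is recovered as $y = \sum_k \overline{\phi(e_k)}\, e_k$, matching $y = \overline{\phi(e)}\, e$ in the rank-one step of the proof. A sketch, assuming the inner product $\langle x, y \rangle = \sum_k x_k \overline{y_k}$ (all names are ours):

```python
import numpy as np

def inner(x, y):
    # Inner product on C^n, conjugate-linear in the second slot
    return np.dot(x, np.conj(y))

n = 5
rng = np.random.default_rng(1)
y_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
phi = lambda v: inner(v, y_true)          # a continuous linear functional

# Recover the representing vector from the values of phi on an ONB:
# y = sum_k conj(phi(e_k)) e_k
e = np.eye(n)
y = sum(np.conj(phi(e[k])) * e[k] for k in range(n))

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.allclose(y, y_true))             # True
print(np.allclose(phi(x), inner(x, y)))   # True: phi(x) = <x, y>
```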
We view the element $y \in H$ as 'representing' the linear functional $\phi \in H^*$, hence the name of the theorem. There are actually several theorems one may encounter, all called the Riesz representation theorem, and what they all have in common is that the dual space of some other space is characterized. The Hilbert space version here is by far the easiest of these theorems.

If we define the mapping $R : H \to H^*$ (the Riesz map) by the condition $R(y) = \phi$, with $\phi, y$ related as above, then Theorem 6.6 amounts to the statement that $R$ is one-to-one and onto. Since it is easy to check that $R$ is also linear, it follows that $R$ is an isomorphism from $H$ to $H^*$. In fact more is true: $R$ is an isometric isomorphism, which means that $\|R(y)\| = \|y\|$ for every $y \in H$. To see this, recall we have already seen in Example 6.10 that $\|\phi\| \le \|y\|$, and by choosing $x = y$ we also get $\phi(y) = \|y\|^2$, which implies $\|\phi\| \ge \|y\|$.

Next, suppose that $H$ is an infinite dimensional separable Hilbert space. According to Theorem 6.5 there exists an orthonormal basis of $H$, which cannot be finite, and so may be written as $\{e_n\}_{n=1}^\infty$. Associate with any $x \in H$ the corresponding sequence of generalized Fourier coefficients $\{c_n\}_{n=1}^\infty$, where $c_n = \langle x, e_n \rangle$, and let $\Lambda$ denote this mapping, i.e. $\Lambda(x) = \{c_n\}_{n=1}^\infty$.

We know by Theorem 6.4 that $\sum_{n=1}^\infty |c_n|^2 < \infty$, i.e. $\Lambda(x) \in \ell^2$. On the other hand, suppose $\sum_{n=1}^\infty |c_n|^2 < \infty$ and let $x = \sum_{n=1}^\infty c_n e_n$. This series is Cauchy, hence convergent in $H$, by precisely the same argument as used in the beginning of the proof of Proposition 6.3. Since $\{e_n\}_{n=1}^\infty$ is a basis, we must have $c_n = \langle x, e_n \rangle$, thus $\Lambda(x) = \{c_n\}_{n=1}^\infty$, and consequently $\Lambda : H \to \ell^2$ is onto. It is also one-to-one, since $\Lambda(x_1) = \Lambda(x_2)$ means that $\langle x_1 - x_2, e_n \rangle = 0$ for every $n$, hence $x_1 - x_2 = 0$ by the completeness property of a basis. Finally it is straightforward to check that $\Lambda$ is linear, so that $\Lambda$ is an isomorphism. Like the Riesz map, the isomorphism $\Lambda$ is also isometric, $\|\Lambda(x)\| = \|x\|$, on account of the Bessel equality. By the above considerations we have then established the following theorem.

Theorem 6.7. If $H$ is an infinite dimensional separable Hilbert space, then $H$ is isometrically isomorphic to $\ell^2$.

Since all such Hilbert spaces are isometrically isomorphic to $\ell^2$, they are then obviously isometrically isomorphic to each other. If $H$ is a Hilbert space of dimension $N$, the same arguments show that $H$ is isometrically isomorphic to the Hilbert space $\mathbb{R}^N$ or $\mathbb{C}^N$, depending on whether real or complex scalars are allowed. Finally, see Theorem 4.17 of [29] for the nonseparable case.
6.9 Exercises

1. Prove Proposition 6.2.

2. In the Hilbert space $L^2(-1,1)$ what is $M^\perp$ if

a) $M = \{u : u(x) = u(-x) \text{ a.e.}\}$

b) $M = \{u : u(x) = 0 \text{ a.e. for } -1 < x < 0\}$.

Give an explicit formula for the projection onto $M$ in each case.

3. Prove that $P_E$ is a linear operator on $H$ with norm $\|P_E\| = 1$ except in the trivial case when $E = \{0\}$. Suggestion: If $x = c_1 x_1 + c_2 x_2$ first show that
$$P_E x - c_1 P_E x_1 - c_2 P_E x_2 = -P_{E^\perp} x + c_1 P_{E^\perp} x_1 + c_2 P_{E^\perp} x_2$$

4. Show that the parallelogram law fails in $L^1(\Omega)$, so there is no choice of inner product which can give rise to the norm in $L^1(\Omega)$. (The same is true in $L^p(\Omega)$ for any $p \ne 2$.)

5. If $(X, \langle \cdot, \cdot \rangle)$ is an inner product space prove the polarization identity
$$\langle x, y \rangle = \frac{1}{4}\big( \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2 \big)$$
Thus, in any normed linear space, there can exist at most one inner product giving rise to the norm.

6. Let $M$ be a closed subspace of a Hilbert space $H$, and $P_M$ be the corresponding projection. Show that

a) $P_M^2 = P_M$

b) $\langle P_M x, y \rangle = \langle P_M x, P_M y \rangle = \langle x, P_M y \rangle$ for any $x, y \in H$.
7. Show that $\ell^2$ is a Hilbert space. (Discussion: The only property you need to check is completeness, and you may freely use the fact that $\mathbb{R}$ is complete. A Cauchy sequence in this case is a sequence of sequences, so use a notation like
$$x^{(n)} = \{x_1^{(n)}, x_2^{(n)}, \dots\}$$
where $x_j^{(n)}$ denotes the $j$'th term of the $n$'th sequence $x^{(n)}$. Given a Cauchy sequence $\{x^{(n)}\}_{n=1}^\infty$ in $\ell^2$ you'll first find a sequence $x$ such that $\lim_{n\to\infty} x_j^{(n)} = x_j$ for each fixed $j$. You then must still show that $x \in \ell^2$, and one good way to do this is by first showing that $x - x^{(n)} \in \ell^2$ for some $n$.)

8. Let $H$ be a Hilbert space.

a) If $x_n \to x$ in $H$ show that $\{x_n\}_{n=1}^\infty$ is bounded in $H$.

b) If $x_n \to x$, $y_n \to y$ in $H$ show that $\langle x_n, y_n \rangle \to \langle x, y \rangle$.

9. Compute orthogonal polynomials of degree 0, 1, 2, 3 on $[-1,1]$ and on $[0,1]$ by applying the Gram-Schmidt procedure to $1, x, x^2, x^3$ in $L^2(-1,1)$ and $L^2(0,1)$. (In the case of $L^2(-1,1)$, you are finding so-called Legendre polynomials.)

10. Use the result of Exercise 9 and the projection formula (6.6.1) to compute the best polynomial approximations of degrees 0, 1, 2 and 3 to $u(x) = e^x$ in $L^2(-1,1)$. Feel free to use any symbolic calculation tool you know to compute the necessary integrals, but give exact coefficients, not calculator approximations. If possible, produce a graph displaying $u$ and the 4 approximations.
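As Exercise 10 suggests, the Gram-Schmidt computation behind Exercise 9 can be carried out exactly with a symbolic tool. A hedged sketch (the function name `gs_polys` is ours; this assumes sympy is available) applying formula (6.5.1) to $1, x, x^2, x^3$ in $L^2(-1,1)$; the output polynomials are proportional to the first four Legendre polynomials:

```python
from sympy import symbols, integrate, expand

x = symbols('x')

def gs_polys(n, a, b):
    """Gram-Schmidt applied to 1, x, x^2, ... in L^2(a, b), per (6.5.1)."""
    ip = lambda f, g: integrate(f * g, (x, a, b))
    ys = []
    for k in range(n + 1):
        p = x**k
        for yj in ys:
            p -= ip(x**k, yj) / ip(yj, yj) * yj
        ys.append(expand(p))
    return ys

legendre_like = gs_polys(3, -1, 1)
print(legendre_like)   # [1, x, x**2 - 1/3, x**3 - 3*x/5]
```

Changing the interval to $(0, 1)$ gives the other family requested in Exercise 9.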
11. Let $\Omega \subset \mathbb{R}^N$, $\rho$ be a measurable function on $\Omega$, and $\rho(x) > 0$ a.e. on $\Omega$. Let $X$ denote the set of measurable functions $u$ for which $\int_\Omega |u(x)|^2 \rho(x)\,dx$ is finite. We can then define the weighted inner product
$$\langle u, v \rangle_\rho = \int_\Omega u(x)\overline{v(x)}\rho(x)\,dx$$
and corresponding norm $\|u\|_\rho = \sqrt{\langle u, u \rangle_\rho}$ on $X$. The resulting inner product space is complete, often denoted $L^2_\rho(\Omega)$. (As in the case of $\rho(x) \equiv 1$ we regard any two functions which agree a.e. as being the same element, so $L^2_\rho(\Omega)$ is again really a set of equivalence classes.)

a) Verify that all of the inner product axioms are satisfied.

b) Suppose that there exist constants $C_1, C_2$ such that $0 < C_1 \le \rho(x) \le C_2$ a.e. Show that $u_n \to u$ in $L^2_\rho(\Omega)$ if and only if $u_n \to u$ in $L^2(\Omega)$.

12. More classes of orthogonal polynomials may be derived by applying the Gram-Schmidt procedure to $\{1, x, x^2, \dots\}$ in $L^2_\rho(a, b)$ for various choices of $\rho, a, b$, two of which occur in Exercise 9. Another class is the Laguerre polynomials, corresponding to $a = 0$, $b = \infty$ and $\rho(x) = e^{-x}$. Find the first four Laguerre polynomials.

13. Show that equality holds in the Schwarz inequality (6.2.1) if and only if $x, y$ are linearly dependent.

14. Show by examples that the best approximation problem (6.4.1) may not have a solution if $E$ is either not closed or not convex.

15. If $\Omega$ is a compact subset of $\mathbb{R}^N$, show that $C(\Omega)$ is a subspace of $L^2(\Omega)$ which isn't closed.

16. Show that
$$\Big\{ \frac{1}{\sqrt{2}} \Big\} \cup \{\cos n\pi x, \sin n\pi x\}_{n=1}^\infty \qquad (6.9.1)$$
is an orthonormal set in $L^2(-1,1)$. (Completeness of this set will be shown in Chapter 8.)
17. For nonnegative integers $n$ define
$$v_n(x) = \cos(n \cos^{-1} x)$$

a) Show that $v_{n+1}(x) + v_{n-1}(x) = 2x v_n(x)$ for $n = 1, 2, \dots$

b) Show that $v_n$ is a polynomial of degree $n$ (the so-called Chebyshev polynomials).

c) Show that $\{v_n\}_{n=1}^\infty$ are orthogonal in $L^2_\rho(-1,1)$ where the weight function is $\rho(x) = \frac{1}{\sqrt{1 - x^2}}$.

18. If $H$ is a Hilbert space we say a sequence $\{x_n\}_{n=1}^\infty$ converges weakly to $x$ (notation: $x_n \xrightarrow{w} x$) if $\langle x_n, y \rangle \to \langle x, y \rangle$ for every $y \in H$.

a) Show that if $x_n \to x$ then $x_n \xrightarrow{w} x$.

b) Prove that the converse is false, as long as $\dim(H) = \infty$, by showing that if $\{e_n\}_{n=1}^\infty$ is any orthonormal sequence in $H$ then $e_n \xrightarrow{w} 0$, but $\lim_{n\to\infty} e_n$ doesn't exist.

c) Prove that if $x_n \xrightarrow{w} x$ then $\|x\| \le \liminf_{n\to\infty} \|x_n\|$.

d) Prove that if $x_n \xrightarrow{w} x$ and $\|x_n\| \to \|x\|$ then $x_n \to x$.

19. Let $M_1, M_2$ be closed subspaces of a Hilbert space $H$ and suppose $M_1 \perp M_2$. Show that
$$M_1 \oplus M_2 = \{x \in H : x = y + z,\ y \in M_1,\ z \in M_2\}$$
is also a closed subspace of $H$.
Chapter 7

Distributions

In this chapter we will introduce and study the concept of distribution, also commonly known as generalized function. To motivate this study we first mention two examples.

Example 7.1. The wave equation $u_{tt} - u_{xx} = 0$ has the general solution $u(x,t) = F(x+t) + G(x-t)$ where $F, G$ must be in $C^2(\mathbb{R})$ in order that $u$ be a classical solution. However from a physical point of view there is no apparent reason why such smoothness restrictions on $F, G$ should be needed. Indeed the two terms represent waves of fixed shape moving to the left and right respectively with speed one, and it ought to be possible to allow the shape functions $F, G$ to have discontinuities. The calculus of distributions will allow us to regard $u$ as a solution of the wave equation in a well defined sense even for such irregular $F, G$.

Example 7.2. In physics and engineering one frequently encounters the Dirac delta function $\delta(x)$, which has the properties
$$\delta(x) = 0 \quad x \ne 0 \qquad \int_{-\infty}^\infty \delta(x)\,dx = 1 \qquad (7.0.1)$$
Unfortunately these properties are inconsistent for ordinary functions: any function which is zero except at a single point must have integral zero. The theory of distributions will allow us to give a precise mathematical meaning to the delta function and in so doing justify formal calculations with it.

Roughly speaking, a distribution is a mathematical object whose unique identity is specified by how it acts on all test functions. It is in a sense quite analogous to a function in the ordinary sense, whose unique identity is specified by how it acts on (i.e. how it maps) all points in its domain. As we will see, most ordinary functions may be viewed as a special kind of distribution, which explains the 'generalized function' terminology. In addition, there is a well defined calculus of distributions which is basic to the modern theory of partial differential equations. We now start to give precise meaning to these concepts.
7.1 The space of test functions

For any real or complex valued function $f$ defined on some domain in $\mathbb{R}^N$, the support of $f$, denoted $\operatorname{supp} f$, is the closure of the set $\{x : f(x) \ne 0\}$.

Definition 7.1. If $\Omega$ is any open set in $\mathbb{R}^N$ the space of test functions on $\Omega$ is
$$C_0^\infty(\Omega) = \{\phi \in C^\infty(\Omega) : \operatorname{supp} \phi \text{ is compact in } \Omega\} \qquad (7.1.1)$$
This function space is also commonly denoted $\mathcal{D}(\Omega)$, which is the notation we will use from now on. Clearly $\mathcal{D}(\Omega)$ is a vector space, but it may not be immediately evident that it contains any function other than $\phi \equiv 0$.

Example 7.3. Define
$$\phi(x) = \begin{cases} e^{\frac{1}{x^2 - 1}} & |x| < 1 \\ 0 & |x| \ge 1 \end{cases} \qquad (7.1.2)$$
Then $\phi \in \mathcal{D}(\Omega)$ with $\Omega = \mathbb{R}$. To see this one only needs to check that $\lim_{x \to 1-} \phi^{(k)}(x) = 0$ for $k = 0, 1, \dots$, and similarly at $x = -1$. Once we have one such function then many others can be derived from it by dilation ($\phi(x) \to \phi(\alpha x)$), translation ($\phi(x) \to \phi(x - \alpha)$), scaling ($\phi(x) \to \alpha\phi(x)$), differentiation ($\phi(x) \to \phi^{(k)}(x)$) or any linear combination of such terms. See also Exercise 1.
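A quick numerical look at the function of Example 7.3 makes it plausible that all derivatives vanish at $\pm 1$: the factor $e^{1/(x^2-1)}$ decays faster than any power of $1 - x$ as $x \to 1-$. This is a plausibility check only, not a proof, and the name `phi` is ours.

```python
import math

def phi(x):
    """The test function of Example 7.3: exp(1/(x^2 - 1)) inside (-1, 1)."""
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1 else 0.0

print(phi(0.0))            # e**(-1), about 0.3679
for t in (0.9, 0.99, 0.999):
    print(phi(t))          # decays to 0 extremely fast as x -> 1-
print(phi(1.0), phi(2.0))  # 0.0 0.0: identically zero outside (-1, 1)
```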
Next, we define convergence in the test function space.

Definition 7.2. If $\phi_n \in \mathcal{D}(\Omega)$ then we say $\phi_n \to 0$ in $\mathcal{D}(\Omega)$ if

(i) There exists a compact set $K \subset \Omega$ such that $\operatorname{supp} \phi_n \subset K$ for every $n$

(ii) $\lim_{n\to\infty} \max_{x \in \Omega} |D^\alpha \phi_n(x)| = 0$ for every multiindex $\alpha$

We also say that $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ provided $\phi_n - \phi \to 0$ in $\mathcal{D}(\Omega)$. By specifying what convergence of a sequence in $\mathcal{D}(\Omega)$ means, we are partly, but not completely, specifying a topology on $\mathcal{D}(\Omega)$. We will have no need of further details about this topology; see Chapter 6 of [30] for more on this point.
7.2 The space of distributions

We come now to the basic definition: a distribution is a continuous linear functional on $\mathcal{D}(\Omega)$. More precisely

Definition 7.3. A linear mapping $T : \mathcal{D}(\Omega) \to \mathbb{C}$ is a distribution on $\Omega$ if $T(\phi_n) \to T(\phi)$ whenever $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$. The set of all distributions on $\Omega$ is denoted $\mathcal{D}'(\Omega)$.

The distribution space $\mathcal{D}'(\Omega)$ is another example of a dual space $X'$, the set of all continuous linear functionals on $X$, which can be defined if $X$ is any vector space in which convergence of sequences is defined. The dual space is always itself a vector space. We'll discuss many more examples of dual spaces later on. We emphasize that the distribution $T$ is defined solely in terms of the values it assigns to test functions $\phi$; in particular two distributions $T_1, T_2$ are equal if and only if $T_1(\phi) = T_2(\phi)$ for every $\phi \in \mathcal{D}(\Omega)$.

To clarify the concept, let us discuss a number of examples.

Example: If $f \in L^1(\Omega)$ define
$$T(\phi) = \int_\Omega f(x)\phi(x)\,dx \qquad (7.2.1)$$
Obviously $|T(\phi)| \le \|f\|_{L^1(\Omega)} \|\phi\|_{L^\infty(\Omega)}$, so that $T : \mathcal{D}(\Omega) \to \mathbb{C}$, and $T$ is also clearly linear. If $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ then by the same token
$$|T(\phi_n) - T(\phi)| \le \|f\|_{L^1(\Omega)} \|\phi_n - \phi\|_{L^\infty(\Omega)} \to 0 \qquad (7.2.2)$$
so that $T$ is continuous. Thus $T \in \mathcal{D}'(\Omega)$.

Because of the fact that $\phi$ must have compact support in $\Omega$ one does not really need $f$ to be in $L^1(\Omega)$ but only in $L^1(K)$ for any compact subset $K$ of $\Omega$. For any $1 \le p \le \infty$ let us define
$$L^p_{loc}(\Omega) = \{f : f \in L^p(K) \text{ for any compact set } K \subset \Omega\} \qquad (7.2.3)$$
Thus a function in $L^p_{loc}(\Omega)$ can become infinite arbitrarily rapidly at the boundary of $\Omega$. We say that $f_n \to f$ in $L^p_{loc}(\Omega)$ if $f_n \to f$ in $L^p(K)$ for every compact subset $K \subset \Omega$. Functions in $L^1_{loc}(\Omega)$ are said to be locally integrable on $\Omega$.
Now if we let $f \in L^1_{loc}(\Omega)$ the definition (7.2.1) still produces a finite value, since
$$|T(\phi)| = \Big| \int_\Omega f(x)\phi(x)\,dx \Big| = \Big| \int_K f(x)\phi(x)\,dx \Big| \le \|f\|_{L^1(K)} \|\phi\|_{L^\infty(K)} < \infty \qquad (7.2.4)$$
if $K = \operatorname{supp} \phi$. Similarly if $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ we can choose a fixed compact set $K \subset \Omega$ containing $\operatorname{supp} \phi$ and $\operatorname{supp} \phi_n$ for every $n$, hence again
$$|T(\phi_n) - T(\phi)| \le \|f\|_{L^1(K)} \|\phi_n - \phi\|_{L^\infty(K)} \to 0 \qquad (7.2.5)$$
so that $T \in \mathcal{D}'(\Omega)$.

When convenient, we will denote the distribution in (7.2.1) by $T_f$. The correspondence $f \to T_f$ allows us to think of $L^1_{loc}(\Omega)$ as a special subspace of $\mathcal{D}'(\Omega)$, i.e. locally integrable functions are always distributions. The point is that a function $f$ can be thought of as a mapping
$$\phi \to \int_\Omega f\phi\,dx \qquad (7.2.6)$$
instead of the more conventional
$$x \to f(x) \qquad (7.2.7)$$
In fact for $L^1_{loc}$ functions the former is in some sense more natural since it doesn't require us to make special arrangements for sets of measure zero. A distribution of the form $T = T_f$ for some $f \in L^1_{loc}(\Omega)$ is sometimes referred to as a regular distribution, while any distribution not of this type is a singular distribution.

The correspondence $f \to T_f$ is also one-to-one. This is a slightly technical result in measure theory which we leave for the exercises, for those with the necessary background. See also Theorem 2, Chapter II of [31]:

Theorem 7.1. Two distributions $T_{f_1}, T_{f_2}$ on $\Omega$ are equal if and only if $f_1 = f_2$ almost everywhere on $\Omega$.
Example 7.4. Fix a point $x_0 \in \Omega$ and define
$$T(\phi) = \phi(x_0) \qquad (7.2.8)$$
Clearly $T$ is defined and linear on $\mathcal{D}(\Omega)$ and if $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ then
$$|T(\phi_n) - T(\phi)| = |\phi_n(x_0) - \phi(x_0)| \to 0 \qquad (7.2.9)$$
since $\phi_n \to \phi$ uniformly on $\Omega$. We claim that $T$ is not of the form $T_f$ for any $f \in L^1_{loc}(\Omega)$ (i.e. not a regular distribution). To see this, suppose some such $f$ existed. We would then have
$$\int_\Omega f(x)\phi(x)\,dx = 0 \qquad (7.2.10)$$
for any test function $\phi$ with $\phi(x_0) = 0$. In particular if $\Omega' = \Omega \backslash \{x_0\}$ and $\phi \in \mathcal{D}(\Omega')$ then defining $\phi(x_0) = 0$ we clearly have $\phi \in \mathcal{D}(\Omega)$ and $T(\phi) = 0$, hence $f = 0$ a.e. on $\Omega'$ and so on $\Omega$, by Theorem 7.1. On the other hand we must also have, for any $\phi \in \mathcal{D}(\Omega)$, that
$$\phi(x_0) = T(\phi) = \int_\Omega f(x)\phi(x)\,dx \qquad (7.2.11a)$$
$$= \int_\Omega f(x)(\phi(x) - \phi(x_0))\,dx + \phi(x_0)\int_\Omega f(x)\,dx = \phi(x_0)\int_\Omega f(x)\,dx \qquad (7.2.11b)$$
since $f = 0$ a.e. on $\Omega$, and therefore $\int_\Omega f(x)\,dx = 1$, a contradiction.

Note that $f(x) = 0$ for a.e. $x \in \Omega$ and $\int_\Omega f(x)\,dx = 1$ are precisely the formal properties of the delta function mentioned in Example 7.2. We define $T$ to be the Dirac delta distribution with singularity at $x_0$, usually denoted $\delta_{x_0}$, or simply $\delta$ in the case $x_0 = 0$. By an acceptable abuse of notation, pretending that $\delta$ is an actual function, we may write a formula like
$$\int_\Omega \delta(x)\phi(x)\,dx = \phi(0) \qquad (7.2.12)$$
but we emphasize that this is simply a formal expression of (7.2.8), and any rigorous arguments must make use of (7.2.8) directly. In the same formal sense $\delta_{x_0}(x) = \delta(x - x_0)$ so that
$$\int_\Omega \delta(x - x_0)\phi(x)\,dx = \phi(x_0) \qquad (7.2.13)$$
Example 7.5. Fix a point $x_0 \in \Omega$ and a multiindex $\alpha$, and define
$$T(\phi) = (D^\alpha \phi)(x_0) \qquad (7.2.14)$$
One may show, as in the previous example, that $T \in \mathcal{D}'(\Omega)$.

Example 7.6. Let $\Sigma$ be a sufficiently smooth hypersurface in $\Omega$ of dimension $m \le N - 1$ and define
$$T(\phi) = \int_\Sigma \phi(x)\,ds(x) \qquad (7.2.15)$$
where $ds$ is the surface area element on $\Sigma$. Then $T$ is a distribution on $\Omega$, sometimes referred to as the delta distribution concentrated on $\Sigma$, sometimes written as $\delta_\Sigma$.
Example 7.7. Let $\Omega = \mathbb{R}$ and define
$$T(\phi) = \lim_{\epsilon \to 0+} \int_{|x| > \epsilon} \frac{\phi(x)}{x}\,dx \qquad (7.2.16)$$
As we'll show below, the indicated limit always exists and is finite for $\phi \in \mathcal{D}(\Omega)$ (even for $\phi \in C_0^1(\Omega)$). In general, a limit of the form
$$\lim_{\epsilon \to 0+} \int_{\Omega \cap \{|x - a| > \epsilon\}} f(x)\,dx \qquad (7.2.17)$$
when it exists, is called the Cauchy principal value of $\int_\Omega f(x)\,dx$, which may be finite even when $\int_\Omega f(x)\,dx$ is divergent in the ordinary sense. For example $\int_{-1}^1 \frac{dx}{x}$ is divergent, regarded as either a Lebesgue integral or an improper Riemann integral, but
$$\lim_{\epsilon \to 0+} \int_{1 > |x| > \epsilon} \frac{dx}{x} = 0 \qquad (7.2.18)$$
To distinguish the principal value meaning of the integral, the notation
$$\mathrm{pv} \int_\Omega f(x)\,dx \qquad (7.2.19)$$
may be used instead of (7.2.17), where the point $a$ in question must be clear from context.
Let us now check that (7.2.16) defines a distribution. If $\operatorname{supp} \phi \subset [-M, M]$ then since
$$\int_{|x|>\epsilon} \frac{\phi(x)}{x}\,dx = \int_{M>|x|>\epsilon} \frac{\phi(x)}{x}\,dx = \int_{M>|x|>\epsilon} \frac{\phi(x) - \phi(0)}{x}\,dx + \phi(0)\int_{M>|x|>\epsilon} \frac{1}{x}\,dx \qquad (7.2.20)$$
and the last term on the right is zero, we have
$$T(\phi) = \lim_{\epsilon \to 0+} \int_{M>|x|>\epsilon} \psi(x)\,dx \qquad (7.2.21)$$
where $\psi(x) = (\phi(x) - \phi(0))/x$. It now follows from the mean value theorem that
$$|T(\phi)| \le \int_{|x|<M} |\psi(x)|\,dx \le 2M\|\phi'\|_{L^\infty} \qquad (7.2.22)$$
so $T(\phi)$ is defined and finite for all test functions. Linearity of $T$ is clear, and if $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ then
$$|T(\phi_n) - T(\phi)| \le 2M\|\phi_n' - \phi'\|_{L^\infty} \to 0 \qquad (7.2.23)$$
where $M$ is chosen so that $\operatorname{supp} \phi_n, \operatorname{supp} \phi \subset [-M, M]$, and it follows that $T$ is continuous.

The distribution $T$ is often denoted $\mathrm{pv}\frac{1}{x}$, so for example $\mathrm{pv}\frac{1}{x}(\phi)$ means the same thing as the right hand side of (7.2.16). For reasons which will become more clear later, it may also be referred to as $\mathrm{pf}\frac{1}{x}$, pf standing for pseudofunction (also finite part).
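The symmetric cancellation used in (7.2.20) also suggests a practical way to evaluate pv integrals numerically: pairing $x$ with $-x$ replaces $\phi(x)/x$ by the bounded integrand $(\phi(x) - \phi(-x))/x$. A sketch, assuming the illustrative (non-compactly-supported) choice $\phi(x) = e^{-x^2}(1+x)$, for which the pairing reduces the integrand to $2e^{-x^2}$ and the principal value equals $\sqrt{\pi}$; all names are ours:

```python
import math

def pv_integral(phi, eps, M=10.0, n=100000):
    """Midpoint-rule approximation of the truncated integral
    over eps < |x| < M of phi(x)/x dx; pairing x with -x as in (7.2.20)
    turns the integrand into (phi(x) - phi(-x))/x, which is bounded."""
    h = (M - eps) / n
    total = 0.0
    for k in range(n):
        xk = eps + (k + 0.5) * h
        total += (phi(xk) - phi(-xk)) / xk * h
    return total

phi = lambda t: math.exp(-t * t) * (1.0 + t)
for eps in (0.1, 0.01, 0.001):
    print(pv_integral(phi, eps))   # approaches sqrt(pi) = 1.7724... as eps -> 0
```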
7.3 Algebra and Calculus with Distributions

7.3.1 Multiplication of distributions

As noted above $\mathcal{D}'(\Omega)$ is a vector space, hence distributions can be added and multiplied by scalars. In general it is not possible to multiply together arbitrary distributions; for example $\delta^2 = \delta \cdot \delta$ cannot be defined in any consistent way. It is always possible, however, to multiply a distribution by a $C^\infty$ function. More precisely, if $a \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$ then we may define the product $aT$ as a distribution via

Definition 7.4. $aT(\phi) = T(a\phi)$ for $\phi \in \mathcal{D}(\Omega)$.

Clearly $a\phi \in \mathcal{D}(\Omega)$ so that the right hand side is well defined, and it is straightforward to check that $aT$ satisfies the necessary linearity and continuity conditions. One should also note that if $T = T_f$ then this definition is consistent with ordinary pointwise multiplication of the functions $f$ and $a$.

7.3.2 Convergence of distributions

An appropriate definition of convergence of a sequence of distributions is as follows.

Definition 7.5. If $T, T_n \in \mathcal{D}'(\Omega)$ for $n = 1, 2, \dots$ then we say $T_n \to T$ in $\mathcal{D}'(\Omega)$ (or in the sense of distributions) if $T_n(\phi) \to T(\phi)$ for every $\phi \in \mathcal{D}(\Omega)$.

It is an interesting fact, which we shall not prove here, that it is not necessary to assume that the limit $T$ belongs to $\mathcal{D}'(\Omega)$; that is to say, if $T(\phi) := \lim_{n\to\infty} T_n(\phi)$ exists for every $\phi \in \mathcal{D}(\Omega)$ then necessarily $T \in \mathcal{D}'(\Omega)$ (see Theorem 6.17 of [30]).

Example 7.8. If $f_n \in L^1_{loc}(\Omega)$ and $f_n \to f$ in $L^1_{loc}(\Omega)$ then the corresponding distributions $T_{f_n} \to T_f$ in the sense of distributions, since
$$|T_{f_n}(\phi) - T_f(\phi)| \le \int_K |f_n - f||\phi|\,dx \le \|f_n - f\|_{L^1(K)} \|\phi\|_{L^\infty(\Omega)} \qquad (7.3.1)$$
where $K$ is the support of $\phi$. Because of the one-to-one correspondence $f \leftrightarrow T_f$, we will usually write instead that $f_n \to f$ in the sense of distributions.
Example 7.9. Define
$$f_n(x) = \begin{cases} n & 0 < x < \frac{1}{n} \\ 0 & \text{otherwise} \end{cases} \qquad (7.3.2)$$
We claim that $f_n \to \delta$ in the sense of distributions. We see this by first observing that
$$|T_{f_n}(\phi) - \delta(\phi)| = \Big| n\int_0^{1/n} \phi(x)\,dx - \phi(0) \Big| = \Big| n\int_0^{1/n} (\phi(x) - \phi(0))\,dx \Big| \qquad (7.3.3)$$
By the continuity of $\phi$, if $\epsilon > 0$ there exists $\eta > 0$ such that $|\phi(x) - \phi(0)| \le \epsilon$ whenever $|x| \le \eta$. Thus if we choose $n > 1/\eta$ there follows
$$n\int_0^{1/n} |\phi(x) - \phi(0)|\,dx \le n\epsilon \int_0^{1/n} dx = \epsilon \qquad (7.3.4)$$
from which the conclusion follows.

Note that the formal properties of the $\delta$ function, $\delta(x) = 0$ for $x \ne 0$, $\delta(0) = +\infty$, $\int \delta(x)\,dx = 1$, are clearly reflected in the pointwise limit of the sequence $f_n$, but it is only the distributional definition that is mathematically satisfactory.

Sequences converging to $\delta$ play a very large role in methods of applied mathematics, especially in the theory of differential and integral equations. The following theorem includes many cases of interest.
Theorem 7.2. Suppose $f_n \in L^1(\mathbb{R}^N)$ for $n = 1, 2, \dots$ and assume

a) $\int_{\mathbb{R}^N} f_n(x)\,dx = 1$ for all $n$.

b) There exists a constant $C$ such that $\|f_n\|_{L^1(\mathbb{R}^N)} \le C$ for all $n$.

c) $\lim_{n\to\infty} \int_{|x|>\eta} |f_n(x)|\,dx = 0$ for all $\eta > 0$.

If $\phi$ is bounded on $\mathbb{R}^N$ and continuous at $x = 0$ then
$$\lim_{n\to\infty} \int_{\mathbb{R}^N} f_n(x)\phi(x)\,dx = \phi(0) \qquad (7.3.5)$$
and in particular $f_n \to \delta$ in $\mathcal{D}'(\mathbb{R}^N)$.

Proof: For any such $\phi$ we have
$$\int_{\mathbb{R}^N} f_n(x)\phi(x)\,dx - \phi(0) = \int_{\mathbb{R}^N} f_n(x)(\phi(x) - \phi(0))\,dx \qquad (7.3.6)$$
and so we will be done if we show that the integral on the right tends to zero as $n \to \infty$. Fix $\epsilon > 0$ and choose $\eta > 0$ such that $|\phi(x) - \phi(0)| \le \epsilon$ whenever $|x| < \eta$. Write the integral on the right in (7.3.6) as the sum $A_{n,\eta} + B_{n,\eta}$ where
$$A_{n,\eta} = \int_{|x| \le \eta} f_n(x)(\phi(x) - \phi(0))\,dx \qquad B_{n,\eta} = \int_{|x| > \eta} f_n(x)(\phi(x) - \phi(0))\,dx \qquad (7.3.7)$$
We then have, by obvious estimations, that
$$|A_{n,\eta}| \le \epsilon \int_{\mathbb{R}^N} |f_n(x)|\,dx \le C\epsilon \qquad (7.3.8)$$
while
$$\limsup_{n\to\infty} |B_{n,\eta}| \le \limsup_{n\to\infty} 2\|\phi\|_{L^\infty} \int_{|x|>\eta} |f_n(x)|\,dx = 0 \qquad (7.3.9)$$
Thus
$$\limsup_{n\to\infty} \Big| \int_{\mathbb{R}^N} f_n(x)\phi(x)\,dx - \phi(0) \Big| \le C\epsilon \qquad (7.3.10)$$
and the conclusion follows since $\epsilon > 0$ is arbitrary. $\Box$
It is often the case that $f_n \ge 0$ for all $n$, in which case assumption b) follows automatically from a) with $C = 1$. We will refer to any sequence satisfying the assumptions of Theorem 7.2 as a delta sequence. A common way to construct such a sequence is to pick any $f \in L^1(\mathbb{R}^N)$ with $\int_{\mathbb{R}^N} f(x)\,dx = 1$ and set
$$f_n(x) = n^N f(nx) \qquad (7.3.11)$$
The verification of this is left to the exercises. If, for example, we choose $f(x) = \chi_{[0,1]}(x)$, then the resulting sequence $f_n(x)$ is the same as is defined in (7.3.2). Since we can also choose such an $f$ in $\mathcal{D}(\mathbb{R}^N)$ we also have

Proposition 7.1. There exists a sequence $\{f_n\}_{n=1}^\infty$ such that $f_n \in \mathcal{D}(\mathbb{R}^N)$ and $f_n \to \delta$ in $\mathcal{D}'(\mathbb{R}^N)$.
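The convergence asserted in Theorem 7.2 is easy to observe numerically for the scaled sequence (7.3.11). A sketch, assuming $N = 1$, $f = \chi_{[0,1)}$ and the bounded function $\phi(x) = \cos x$ (the helper names are ours); the exact value of the pairing is $n \sin(1/n) \to 1 = \phi(0)$:

```python
import math

def f(t):
    """An f in L^1(R) with integral 1: the indicator function of [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def f_n(t, n):
    return n * f(n * t)          # the rescaled sequence (7.3.11)

def pair(n, phi, M=2.0, steps=200000):
    """Midpoint-rule approximation of the pairing of f_n with phi on [-M, M]."""
    h = 2.0 * M / steps
    return sum(f_n(-M + (k + 0.5) * h, n) * phi(-M + (k + 0.5) * h) * h
               for k in range(steps))

for n in (1, 10, 100):
    print(pair(n, math.cos))     # n*sin(1/n): tends to cos(0) = 1
```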
7.3.3 Derivative of a distribution

Next we explain how it is possible to define the derivative of an arbitrary distribution. For the moment, suppose $(a, b) \subset \mathbb{R}$, $f \in C^1(a, b)$ and $T = T_f$ is the corresponding distribution. We clearly then have from integration by parts that
$$T_{f'}(\phi) = \int_a^b f'(x)\phi(x)\,dx = -\int_a^b f(x)\phi'(x)\,dx = -T_f(\phi') \qquad (7.3.12)$$
This suggests defining
$$T'(\phi) = -T(\phi') \qquad \phi \in C_0^\infty(a, b) \qquad (7.3.13)$$
whenever $T \in \mathcal{D}'(a, b)$. The previous equation shows that this definition is consistent with the ordinary concept of differentiability for $C^1$ functions. Clearly, $T'(\phi)$ is always defined, since $\phi'$ is a test function whenever $\phi$ is, linearity of $T'$ is obvious, and if $\phi_n \to \phi$ in $C_0^\infty(a, b)$ then $\phi_n' \to \phi'$ also in $C_0^\infty(a, b)$ so that
$$T'(\phi_n) = -T(\phi_n') \to -T(\phi') = T'(\phi) \qquad (7.3.14)$$
Thus, $T' \in \mathcal{D}'(a, b)$.
Example: Consider the case of the Heaviside (unit step) function $H(x)$
$$H(x) = \begin{cases} 0 & x < 0 \\ 1 & x > 0 \end{cases} \qquad (7.3.15)$$
If we seek the derivative of $H$ (i.e. of $T_H$) according to the above distributional definition, then we compute
$$H'(\phi) = -H(\phi') = -\int_{-\infty}^\infty H(x)\phi'(x)\,dx = -\int_0^\infty \phi'(x)\,dx = \phi(0) \qquad (7.3.16)$$
(where we use the natural notation $H'$ in place of $T_H'$). This means that $H'(\phi) = \delta(\phi)$ for any test function $\phi$, and so $H' = \delta$ in the sense of distributions. This relationship clearly captures the fact that $H' = 0$ at all points where the derivative exists in the classical sense, since we think of the delta function as being zero on any interval not containing the origin. Since $H$ is not differentiable at the origin, the distributional derivative is itself a distribution which is not a function.
Since
is again a distribution, it will itself have a derivative, namely
0
( 0) =
( )=
0
(0)
(7.3.17)
a distribution of the type discussed in Example 7.5, often referred to as the dipole distribution, which of course we may regard as the second derivative of H.
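The computation (7.3.16) can be illustrated numerically. In the sketch below (the test function, quadrature window, and step count are our own choices, with rapid decay standing in for compact support), −∫ H(x)φ′(x) dx reproduces φ(0).

```python
import math

def phi(x):                 # smooth test function; rapid decay stands in
    return math.exp(-x * x) # for compact support

def dphi(x):                # its derivative phi'
    return -2 * x * math.exp(-x * x)

def H(x):                   # Heaviside function (7.3.15)
    return 1.0 if x > 0 else 0.0

# action of H' per (7.3.16):  H'(phi) = -integral H(x) phi'(x) dx
a, b, steps = -10.0, 10.0, 100000
h = (b - a) / steps
Hprime_phi = -sum(H(a + (i + 0.5) * h) * dphi(a + (i + 0.5) * h)
                  for i in range(steps)) * h
print(Hprime_phi)           # should be close to phi(0) = 1
```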
For an arbitrary domain Ω ⊂ R^N and sufficiently smooth function f we have the similar integration by parts formula (see (18.2.3))

∫_Ω (∂f/∂x_i) φ dx = −∫_Ω f (∂φ/∂x_i) dx    (7.3.18)

leading to the definition

Definition 7.6.

(∂T/∂x_i)(φ) = −T(∂φ/∂x_i)    φ ∈ D(Ω)    (7.3.19)

As in the one dimensional case we easily check that ∂T/∂x_i belongs to D′(Ω) whenever T does. This has the far reaching consequence that every distribution is infinitely differentiable in the sense of distributions. Furthermore we have the general formula, obtained by repeated application of the basic definition, that

(D^α T)(φ) = (−1)^{|α|} T(D^α φ)    (7.3.20)

for any multi-index α.
A simple and useful property is

Proposition 7.2. If T_n → T in D′(Ω) then D^α T_n → D^α T in D′(Ω) for any multi-index α.

Proof: D^α T_n(φ) = (−1)^{|α|} T_n(D^α φ) → (−1)^{|α|} T(D^α φ) = D^α T(φ) for any test function φ. □
Next we consider a more generic one dimensional situation. Let x₀ ∈ R and consider a function f which is C^∞ on (−∞, x₀) and on (x₀, ∞), and for which f^(k) has finite left and right hand limits at x = x₀, for any k. Thus, at the point x = x₀, f or any of its derivatives may have a jump discontinuity, and we denote

σ_k f = lim_{x→x₀+} f^(k)(x) − lim_{x→x₀−} f^(k)(x)    (7.3.21)

(and by convention σf = σ₀f.) Define also

[f^(k)](x) = f^(k)(x) for x ≠ x₀,  undefined for x = x₀    (7.3.22)

which we'll refer to as the pointwise k'th derivative. The notation f^(k) will always be understood to mean the distributional derivative unless otherwise stated. The distinction between f^(k) and [f^(k)] is crucial; for example if f(x) = H(x), the Heaviside function, then H′ = δ but [H′] = 0 for x ≠ 0, and is undefined for x = 0.
For f as described above, we now proceed to calculate the distributional derivative. If φ ∈ C₀^∞(R) we have

−∫_{−∞}^{∞} f(x)φ′(x) dx = −∫_{−∞}^{x₀} f(x)φ′(x) dx − ∫_{x₀}^{∞} f(x)φ′(x) dx    (7.3.23a)

= −f(x)φ(x)|_{−∞}^{x₀} + ∫_{−∞}^{x₀} f′(x)φ(x) dx − f(x)φ(x)|_{x₀}^{∞} + ∫_{x₀}^{∞} f′(x)φ(x) dx    (7.3.23b)

= ∫_{−∞}^{∞} [f′(x)]φ(x) dx + (f(x₀+) − f(x₀−))φ(x₀)    (7.3.23c)

It follows that

f′(φ) = ∫_{−∞}^{∞} [f′(x)]φ(x) dx + (σf)φ(x₀)    (7.3.24)

or

f′ = [f′] + (σf)δ(x − x₀)    (7.3.25)

Note in particular that f′ = [f′] if and only if f is continuous at x₀.
The function [f′] satisfies all of the same assumptions as f itself, with σ_k[f′] = σ_{k+1}f, thus we can differentiate again in the distribution sense to obtain

f″ = [f′]′ + (σf)δ′(x − x₀) = [f″] + (σ₁f)δ(x − x₀) + (σf)δ′(x − x₀)    (7.3.26)

Here we use the evident fact that the distributional derivative of δ(x − x₀) is δ′(x − x₀).
A similar calculation can be carried out for higher derivatives of f, leading to the general formula

f^(k) = [f^(k)] + Σ_{j=0}^{k−1} (σ_j f) δ^(k−1−j)(x − x₀)    (7.3.27)

One can also obtain a similar formula if f is allowed to have any finite number of such singular points.
Example 7.10. Let

f(x) = x for x < 0,  cos x for x > 0    (7.3.28)

Clearly f satisfies all of the assumptions mentioned above with x₀ = 0, and

[f′](x) = 1 for x < 0,  −sin x for x > 0    (7.3.29)

[f″](x) = 0 for x < 0,  −cos x for x > 0    (7.3.30)

so that σf = 1, σ₁f = −1. Thus

f′ = [f′] + δ    f″ = [f″] − δ + δ′    (7.3.31)
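The first identity in (7.3.31) can be checked numerically: applied to a test function, the distributional derivative −∫ fφ′ dx should agree with ∫ [f′]φ dx plus the jump term, writing σf = f(0+) − f(0−) = 1 for the jump. The sketch below uses our own choices of test function and quadrature, with rapid decay standing in for compact support.

```python
import math

def f(x):                    # Example 7.10: f(x) = x (x < 0), cos x (x > 0)
    return x if x < 0 else math.cos(x)

def fp(x):                   # pointwise derivative [f'](x), x != 0
    return 1.0 if x < 0 else -math.sin(x)

def phi(x):  return math.exp(-x * x)          # test function stand-in
def dphi(x): return -2 * x * math.exp(-x * x)

a, b, steps = -10.0, 10.0, 100000
h = (b - a) / steps
xs = [a + (i + 0.5) * h for i in range(steps)]

lhs = -sum(f(x) * dphi(x) for x in xs) * h              # f'(phi) = -f(phi')
rhs = sum(fp(x) * phi(x) for x in xs) * h + phi(0.0)    # [f'](phi) + (jump) phi(0)
print(lhs, rhs)              # the two actions should nearly coincide
```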
Here is one more instructive example in the one dimensional case.

Example 7.11. Let

f(x) = log x for x > 0,  0 for x ≤ 0    (7.3.32)

Since f ∈ L¹_loc(R) we may regard it as a distribution on R, but its pointwise derivative H(x)/x is not locally integrable, so does not have an obvious distributional meaning. Nevertheless f′ must exist in the sense of D′(R). To find it we use the definition above,

f′(φ) = −f(φ′) = −∫_0^{∞} φ′(x) log x dx = −lim_{ε→0+} ∫_ε^{∞} φ′(x) log x dx    (7.3.33)

= lim_{ε→0+} [ φ(ε) log ε + ∫_ε^{∞} (φ(x)/x) dx ]    (7.3.34)

= lim_{ε→0+} [ φ(0) log ε + ∫_ε^{∞} (φ(x)/x) dx ]    (7.3.35)
where the final equality is valid because the difference between it and the previous line is lim_{ε→0} (φ(ε) − φ(0)) log ε = 0. The functional defined by the final expression above will be denoted pf(H(x)/x), i.e.

pf(H(x)/x)(φ) = lim_{ε→0+} [ φ(0) log ε + ∫_ε^{∞} (φ(x)/x) dx ]    (7.3.36)
Since we have already established that the derivative of a distribution is also a distribution, it follows that pf(H(x)/x) ∈ D′(R), and in particular the limit here always exists for φ ∈ D(R). It should be emphasized that if φ(0) ≠ 0 then neither of the two terms on the right hand side in (7.3.36) will have a finite limit separately, but the sum always will. For a test function φ with support disjoint from the singularity at x = 0, the action of the distribution pf(H(x)/x) coincides with that of the ordinary function H(x)/x, as we might expect.
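The convergence of the bracketed expression in (7.3.36) can be observed numerically. Here we take φ(x) = e^{−x}, which is not a test function but decays fast enough for the integral; in this particular case the limit works out to −γ ≈ −0.5772, the Euler constant, since ∫_ε^∞ e^{−x}/x dx is the exponential integral E₁(ε). The substitution x = e^t (our choice) keeps the quadrature well behaved near x = 0.

```python
import math

def phi(x):
    return math.exp(-x)      # phi(0) = 1; only x > 0 matters here

def pf_action(eps, b=50.0, steps=100000):
    # phi(0) log(eps) + integral_eps^b phi(x)/x dx, substituting x = e^t
    # so the integrand is smooth; truncation at b is harmless for e^{-x}
    t0, t1 = math.log(eps), math.log(b)
    h = (t1 - t0) / steps
    integral = sum(phi(math.exp(t0 + (i + 0.5) * h)) for i in range(steps)) * h
    return phi(0.0) * math.log(eps) + integral

for eps in (1e-2, 1e-3, 1e-4):
    print(eps, pf_action(eps))   # stabilizes near -0.5772 as eps -> 0
```

Each individual term blows up like log ε, but their sum settles down, exactly as claimed after (7.3.36).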
Next we turn to examples involving partial derivatives.

Example 7.12. Let F ∈ L¹_loc(R) and set u(x, t) = F(x + t). We claim that u_tt − u_xx = 0 in D′(R²). Recall that this is the point that was raised in the first example at the beginning of this chapter. A similar argument works for F(x − t). To verify this claim, first observe that for any φ ∈ D(R²)

(u_tt − u_xx)(φ) = u(φ_tt − φ_xx) = ∫∫_{R²} F(x + t)(φ_tt(x, t) − φ_xx(x, t)) dx dt    (7.3.37)

Make the change of coordinates

ξ = x − t    η = x + t    (7.3.38)

to obtain

(u_tt − u_xx)(φ) = −2 ∫_{−∞}^{∞} F(η) ∫_{−∞}^{∞} φ_ξη(ξ, η) dξ dη = −2 ∫_{−∞}^{∞} F(η) φ_η(ξ, η)|_{ξ=−∞}^{ξ=∞} dη = 0    (7.3.39)

since φ has compact support.
(Recall that the pf notation was introduced earlier in Section 7.2.)

Example 7.13. Let N ≥ 3 and define

u(x) = 1/|x|^{N−2}    (7.3.40)

We claim that

Δu = C_N δ in D′(R^N)    (7.3.41)

where C_N = (2 − N)Ω_{N−1} and Ω_{N−1} is the surface area of the unit sphere in R^N. (The usual notation is to use N − 1 rather than N as the subscript because the sphere is a surface of dimension N − 1.) First note that for any R we have

∫_{|x|<R} |u(x)| dx = Ω_{N−1} ∫_0^R (1/r^{N−2}) r^{N−1} dr < ∞    (7.3.42)

(using, for example, (18.3.1)) so u ∈ L¹_loc(R^N) and in particular u ∈ D′(R^N).

It is natural here to use spherical coordinates in R^N, see Section 18.3 for a review. In particular the expression for the Laplacian in spherical coordinates may be derived from the chain rule, as was done in (2.3.67) for the two dimensional case. When applied to a function depending only on r = |x|, such as u, the result is

Δu = u_rr + ((N − 1)/r) u_r    (7.3.43)

(see Exercise 17 of Chapter 2) and it follows that Δu(x) = 0 for x ≠ 0.

We may use Green's identity (18.2.6) to obtain, for any φ ∈ D(R^N),

Δu(φ) = u(Δφ) = ∫_{R^N} u(x) Δφ(x) dx = lim_{ε→0+} ∫_{|x|>ε} u(x) Δφ(x) dx    (7.3.44)

= lim_{ε→0+} [ ∫_{|x|>ε} Δu(x) φ(x) dx + ∫_{|x|=ε} ( u(x) (∂φ/∂n)(x) − (∂u/∂n)(x) φ(x) ) dS(x) ]    (7.3.45)

Since Δu = 0 for x ≠ 0 and ∂/∂n = −∂/∂r on {x : |x| = ε}, this simplifies to

Δu(φ) = lim_{ε→0+} ∫_{|x|=ε} ( ((2 − N)/ε^{N−1}) φ(x) − (1/ε^{N−2}) (∂φ/∂r)(x) ) dS(x)    (7.3.46)

We next observe that

lim_{ε→0+} ∫_{|x|=ε} ((2 − N)/ε^{N−1}) φ(x) dS(x) = (2 − N) Ω_{N−1} φ(0)    (7.3.47)

since the average of φ over the sphere of radius ε converges to φ(0) as ε → 0. Finally, the second integral tends to zero, since

| ∫_{|x|=ε} (1/ε^{N−2}) (∂φ/∂r)(x) dS(x) | ≤ (Ω_{N−1} ε^{N−1}/ε^{N−2}) ||∇φ||_{L^∞} → 0    (7.3.48)

Thus (7.3.41) holds. When N = 2 an analogous calculation shows that if u(x) = log |x| then Δu = 2π δ in D′(R²).
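For a radial test function φ the claim (7.3.41) reduces, via (7.3.43), to a one dimensional integral, which makes a numerical check easy. The sketch below takes N = 3, so that C₃ φ(0) = −4π φ(0), and φ(r) = e^{−r²} (a rapidly decaying stand-in for a test function; all quadrature details are our own choices).

```python
import math

def phi(r):   return math.exp(-r * r)     # radial test function stand-in
def dphi(r):  return -2 * r * phi(r)      # phi'
def d2phi(r): return (4 * r * r - 2) * phi(r)  # phi''

# Delta u (phi) = u(Delta phi); for radial phi in R^3, using (7.3.43),
# this is  4 pi * integral_0^inf (1/r) (phi'' + 2 phi'/r) r^2 dr
steps, b = 100000, 10.0
h = b / steps
val = 0.0
for i in range(steps):
    r = (i + 0.5) * h
    val += (1.0 / r) * (d2phi(r) + 2.0 * dphi(r) / r) * 4.0 * math.pi * r * r * h
print(val, -4.0 * math.pi * phi(0.0))     # these should nearly agree
```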
7.4 Convolution and distributions

If f, g are locally integrable functions on R^N the classical convolution of f and g is defined to be

(f ∗ g)(x) = ∫_{R^N} f(x − y) g(y) dy    (7.4.1)

whenever the integral is defined. By an obvious change of variable we see that convolution is commutative, f ∗ g = g ∗ f.

Proposition 7.3. If f ∈ L^p(R^N) and g ∈ L^q(R^N) then f ∗ g ∈ L^r(R^N) if 1 + 1/r = 1/p + 1/q, so in particular is defined almost everywhere. Furthermore

||f ∗ g||_{L^r(R^N)} ≤ ||f||_{L^p(R^N)} ||g||_{L^q(R^N)}    (7.4.2)

The inequality (7.4.2) is Young's convolution inequality, and we refer to [37] (Theorem 9.2) for a proof. In the case r = ∞ it can actually be shown that f ∗ g ∈ C(R^N).

Our goal here is to generalize the definition of convolution in such a way that at least one of the two factors can be a distribution. Let us introduce the notations for translation and inversion of a function f,

(τ_h f)(x) = f(x − h)    (7.4.3)

f̌(x) = f(−x)    (7.4.4)

so that f(x − y) = (τ_x f̌)(y). If f ∈ D(R^N) then so is τ_x f̌, so that (f ∗ g)(x) may be regarded as T_g(τ_x f̌), i.e. the value obtained when the distribution corresponding to the locally integrable function g acts on the test function τ_x f̌. This motivates the following definition.
Definition 7.7. If T ∈ D′(R^N) and φ ∈ D(R^N) then (T ∗ φ)(x) = T(τ_x φ̌).

By this definition (T ∗ φ)(x) exists and is finite for every x ∈ R^N, but other smoothness or decay properties of T ∗ φ may not be apparent.

Example 7.14. If T = δ then

(T ∗ φ)(x) = δ(τ_x φ̌) = (τ_x φ̌)(y)|_{y=0} = φ(x − y)|_{y=0} = φ(x)    (7.4.5)

Thus, δ is the 'convolution identity', δ ∗ φ = φ, at least for φ ∈ D(R^N). Formally this corresponds to the widely used formula

∫_{R^N} δ(x − y) φ(y) dy = φ(x)    (7.4.6)

If T_n → δ in D′(R^N) then likewise

(T_n ∗ φ)(x) = T_n(τ_x φ̌) → δ(τ_x φ̌) = φ(x)    (7.4.7)

for any fixed x ∈ R^N.
A key property of convolution is that in computing a derivative D^α(T ∗ φ), the derivative may be applied to either factor in the convolution. More precisely we have the following theorem.

Theorem 7.3. If T ∈ D′(R^N) and φ ∈ D(R^N) then T ∗ φ ∈ C^∞(R^N) and for any multi-index α

D^α(T ∗ φ) = (D^α T) ∗ φ = T ∗ (D^α φ)    (7.4.8)

Proof: First observe that

(−1)^{|α|} D^α(τ_x φ̌) = τ_x((D^α φ)ˇ)    (7.4.9)

and applying T to these identical test functions we get the right hand equality in (7.4.8). We refer to Theorem 6.30 of [30] for the proof of the left hand equality. □

When f, g are continuous functions of compact support it is elementary to see that supp(f ∗ g) ⊂ supp f + supp g. The same property holds for T ∗ φ if T ∈ D′(R^N) and φ ∈ D(R^N), once a proper definition of the support of a distribution is given.
If ω ⊂ Ω is an open set we say that T = 0 in ω if T(φ) = 0 whenever φ ∈ D(Ω) and supp(φ) ⊂ ω. If W denotes the largest open subset of Ω on which T = 0 (equivalently the union of all open subsets of Ω on which T = 0) then the support of T is the complement of W in Ω. In other words, x ∉ supp T if there exists ε > 0 such that T(φ) = 0 whenever φ is a test function with support in B(x, ε). One can easily verify that the support of a distribution is closed, and agrees with the usual notion of support of a function, up to sets of measure zero. The set of distributions of compact support in Ω forms a vector subspace of D′(Ω) denoted E′(Ω). This notation is appropriate because E′(Ω) turns out to be precisely the dual space of C^∞(R^N) =: E(R^N) when a suitable definition of convergence is given, see for example Chapter II, section 5 of [31].

If now T ∈ E′(R^N) and φ ∈ D(R^N), we observe that

supp(τ_x φ̌) = x − supp φ    (7.4.10)

Thus

(T ∗ φ)(x) = T(τ_x φ̌) = 0    (7.4.11)

unless there is a nonempty intersection of supp T and x − supp φ, in other words, unless x ∈ supp T + supp φ. Thus from these remarks and Theorem 7.3 we have

Proposition 7.4. If T ∈ E′(R^N) and φ ∈ D(R^N) then

supp(T ∗ φ) ⊂ supp T + supp φ    (7.4.12)

and in particular T ∗ φ ∈ D(R^N).
Convolution provides an extremely useful and convenient way to approximate functions and distributions by very smooth functions, the exact sense in which the approximation takes place being dependent on the object being approximated. We will discuss several results of this type.

Theorem 7.4. Let f ∈ C(R^N) with supp f compact in R^N. Pick φ ∈ D(R^N) with ∫_{R^N} φ(x) dx = 1, set φ_n(x) = n^N φ(nx) and f_n = f ∗ φ_n. Then f_n ∈ D(R^N) and f_n → f uniformly on R^N.

Proof: The fact that f_n ∈ D(R^N) is immediate from Proposition 7.4. Fix ε > 0. By the assumption that f is continuous and of compact support it must be uniformly continuous on R^N, so there exists δ > 0 such that |f(x) − f(z)| < ε if |x − z| < δ. Now choose n₀ such that supp φ_n ⊂ B(0, δ) for n > n₀. We then have, for n > n₀, that

|f_n(x) − f(x)| = | ∫_{R^N} (f(x − y) − f(x)) φ_n(y) dy |    (7.4.13)

≤ ∫_{|y|<δ} |f(x − y) − f(x)| |φ_n(y)| dy ≤ ε ||φ||_{L¹(R^N)}    (7.4.14)

and the conclusion follows. □
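Theorem 7.4 can be watched in action numerically. The sketch below (the hat function, the bump kernel, the grids, and the quadrature are all our own choices) mollifies a continuous compactly supported f with φ_n(x) = nφ(nx) and reports a sup-norm error, which shrinks as n grows.

```python
import math

def bump(x):                  # unnormalized C-infinity bump supported in [-1, 1]
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

M = 20000                     # normalize the bump so it integrates to 1
Z = sum(bump(-1.0 + (i + 0.5) * (2.0 / M)) for i in range(M)) * (2.0 / M)

def phi_n(n, x):              # phi_n(x) = n phi(n x), still of integral 1
    return n * bump(n * x) / Z

def f(x):                     # continuous, compactly supported "hat"
    return max(0.0, 1.0 - abs(x))

def mollify(n, x, steps=1000):
    # (f * phi_n)(x): the second factor is supported in |y| < 1/n
    a, b = -1.0 / n, 1.0 / n
    h = (b - a) / steps
    return sum(f(x - (a + (i + 0.5) * h)) * phi_n(n, a + (i + 0.5) * h)
               for i in range(steps)) * h

def sup_err(n):               # sup-norm error sampled on a grid
    xs = [-2.0 + 0.02 * k for k in range(201)]
    return max(abs(mollify(n, x) - f(x)) for x in xs)

e4, e16, e64 = sup_err(4), sup_err(16), sup_err(64)
print(e4, e16, e64)           # decreasing, as Theorem 7.4 predicts
```

The error is largest at the kinks of the hat, where it scales like 1/n, consistent with uniform continuity driving the proof above.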
If f is not assumed continuous then of course it is not possible for there to exist f_n ∈ D(R^N) converging uniformly to f. However the following can be shown.

Theorem 7.5. Let f ∈ L^p(R^N), 1 ≤ p < ∞. Pick φ ∈ D(R^N) with ∫_{R^N} φ(x) dx = 1, set φ_n(x) = n^N φ(nx) and f_n = f ∗ φ_n. Then f_n ∈ C^∞(R^N) ∩ L^p(R^N) and f_n → f in L^p(R^N).

Proof: If ε > 0 we can find g ∈ C(R^N) of compact support such that ||f − g||_{L^p(R^N)} < ε. If g_n = g ∗ φ_n then

||f − f_n||_{L^p(R^N)} ≤ ||f − g||_{L^p(R^N)} + ||g − g_n||_{L^p(R^N)} + ||f_n − g_n||_{L^p(R^N)}    (7.4.15)

≤ C ||f − g||_{L^p(R^N)} + ||g − g_n||_{L^p(R^N)}    (7.4.16)

where we have used Young's convolution inequality (7.4.2) to obtain

||f_n − g_n||_{L^p(R^N)} ≤ ||φ_n||_{L¹(R^N)} ||f − g||_{L^p(R^N)} = ||φ||_{L¹(R^N)} ||f − g||_{L^p(R^N)}    (7.4.17)

Since g_n → g uniformly by Theorem 7.4 and g − g_n has support in a fixed compact set independent of n, it follows that ||g − g_n||_{L^p(R^N)} → 0, and so lim sup_{n→∞} ||f − f_n||_{L^p(R^N)} ≤ Cε. □

Further refinements and variants of these results can be proved, see for example Section C.4 of [10].
Next consider the even more general case that T ∈ D′(R^N). As in Proposition 7.1 we can choose φ_n ∈ D(R^N) such that φ_n → δ in D′(R^N). Set T_n = T ∗ φ_n, so that T_n ∈ C^∞(R^N). If ψ ∈ D(R^N) we then have

T_n(ψ) = (T_n ∗ ψ̌)(0) = ((T ∗ φ_n) ∗ ψ̌)(0)    (7.4.18)

= (T ∗ (φ_n ∗ ψ̌))(0) = T((φ_n ∗ ψ̌)ˇ)    (7.4.19)

It may be checked that φ_n ∗ ψ̌ → ψ̌ in D(R^N), thus T_n(ψ) → T(ψ) for all ψ ∈ D(R^N), that is, T_n → T in D′(R^N).

In the above derivation we used associativity of convolution. This property is not completely obvious, and in fact is false in a more general setting in which convolution of two distributions is defined. For example, if we were to assume that convolution of distributions was always defined and that Theorem 7.3 holds, we would have 1 ∗ (δ′ ∗ H) = 1 ∗ H′ = 1 ∗ δ = 1, but (1 ∗ δ′) ∗ H = 0 ∗ H = 0. Nevertheless, associativity is correct in the case we have just used it, and we refer to [30], Theorem 6.30(c), for the proof.

The pattern of the results just stated is that T ∗ φ_n converges to T in the topology appropriate to the space that T itself belongs to, but this cannot be true in all situations which may be encountered. For example it cannot be true that if f ∈ L^∞ then f ∗ φ_n converges to f in L^∞, since this would amount to uniform convergence of a sequence of continuous functions, which is impossible if f itself is not continuous.
7.5 Exercises

1. Construct a test function ψ ∈ C₀^∞(R) with the following properties: 0 ≤ ψ(x) ≤ 1 for all x ∈ R, ψ(x) ≡ 1 for |x| < 1 and ψ(x) ≡ 0 for |x| > 2. (Suggestion: think about what ψ′ would have to look like.)

2. Show that

T(φ) = Σ_{n=1}^{∞} φ^(n)(n)

defines a distribution T ∈ D′(R).

3. If φ ∈ D(R) show that ψ(x) = (φ(x) − φ(0))/x (this function appeared in Example 7.7) belongs to C^∞(R). (Suggestion: first prove ψ(x) = ∫_0^1 φ′(xt) dt.)

4. Find the distributional derivative of f(x) = [x], the greatest integer function.

5. Find the distributional derivatives up through order four of f(x) = |x| sin x.

6. (For readers familiar with the concept of absolute continuity.) If f is absolutely continuous on (a, b) and f′ = g a.e., show that f′ = g in the sense of distributions on (a, b).
7. Let λ_n > 0, λ_n → +∞ and set

f_n(x) = sin λ_n x    g_n(x) = (sin λ_n x)/(πx)

a) Show that f_n → 0 in D′(R) as n → ∞.

b) Show that g_n → δ in D′(R) as n → ∞.

(You may use without proof the fact that the value of the improper integral ∫_{−∞}^{∞} (sin x)/x dx is π.)

8. Let φ ∈ C₀^∞(R) and f ∈ L¹(R).

a) If φ_n(x) = n(φ(x + 1/n) − φ(x)), show that φ_n → φ′ in C₀^∞(R). (Suggestion: use the mean value theorem over and over again.)

b) If g_n(x) = n(f(x + 1/n) − f(x)), show that g_n → f′ in D′(R).

9. Let T = pv(1/x). Find a formula analogous to (7.3.35) for the distributional derivative of T.

10. Find lim_{n→∞} sin² nx in D′(R), or show that it doesn't exist.
11. Define the distribution

T(φ) = ∫_{−∞}^{∞} φ(x, x) dx

for φ ∈ C₀^∞(R²). Show that T satisfies the wave equation u_xx − u_yy = 0 in the sense of distributions on R². Discuss why it makes sense to regard T as being δ(x − y).

12. Let Ω ⊂ R^N be a bounded open set and K ⊂⊂ Ω. Show that there exists ψ ∈ C₀^∞(Ω) such that 0 ≤ ψ(x) ≤ 1 and ψ(x) ≡ 1 for x ∈ K. (Hint: approximate the characteristic function of Σ by convolution, where Σ satisfies K ⊂⊂ Σ ⊂⊂ Ω. Use Proposition 7.4 for the needed support property.)

13. If a ∈ C^∞(Ω) and T ∈ D′(Ω) prove the product rule

(∂/∂x_j)(aT) = a (∂T/∂x_j) + (∂a/∂x_j) T

14. Let T ∈ D′(R^N). We may then regard φ ↦ Aφ = T ∗ φ as a linear mapping from C₀^∞(R^N) into C^∞(R^N). Show that A commutes with translations, that is, τ_h Aφ = A τ_h φ for any φ ∈ C₀^∞(R^N). (The following interesting converse statement can also be proved: if A : C₀^∞(R^N) → C(R^N) is continuous and commutes with translations, then there exists a unique T ∈ D′(R^N) such that Aφ = T ∗ φ. An operator commuting with translations is also said to be translation invariant.)

15. If f ∈ L¹(R^N), ∫_{R^N} f(x) dx = 1, and f_n(x) = n^N f(nx), use Theorem 7.2 to show that f_n → δ in D′(R^N).

16. Prove Theorem 7.1.
17. If T ∈ D′(Ω) prove the equality of mixed partial derivatives

∂²T/∂x_i∂x_j = ∂²T/∂x_j∂x_i    (7.5.1)

in the sense of distributions, and discuss why there is no contradiction with known examples from calculus showing that the mixed partial derivatives need not be equal.

18. Show that the expression

T(φ) = ∫_{−1}^{1} ((φ(x) − φ(0))/|x|) dx + ∫_{|x|>1} (φ(x)/|x|) dx

defines a distribution on R. Show also that xT = sgn x.

19. If f is a function defined on R^N and λ > 0, let f_λ(x) = f(λx). We say that f is homogeneous of degree α if f_λ = λ^α f for any λ > 0. If T is a distribution on R^N we say that T is homogeneous of degree α if

T(φ_λ) = λ^{−α−N} T(φ)

a) Show that these two definitions are consistent, i.e., if T = T_f for some f ∈ L¹_loc(R^N) then T is homogeneous of degree α if and only if f is homogeneous of degree α.

b) Show that the delta function is homogeneous of degree −N.

20. Show that u(x) = (1/2π) log |x| satisfies Δu = δ in D′(R²).

21. Without appealing to Theorem 7.3, give a direct proof of the fact that T ∗ φ is a continuous function of x, for T ∈ D′(R^N) and φ ∈ D(R^N).

22. Let

f(x) = log² x for x > 0,  0 for x < 0

Show that f ∈ D′(R) and find the distributional derivative f′. Is f a tempered distribution?

23. If a ∈ C^∞(R), show that aδ′ = a(0)δ′ − a′(0)δ.

24. If T ∈ D′(R^N) has compact support, show that T(φ) is defined in an unambiguous way for any φ ∈ C^∞(R^N) =: E(R^N). (Suggestion: write φ = ψφ + (1 − ψ)φ where ψ ∈ D(R^N) satisfies ψ ≡ 1 on the support of T.)
Chapter 8

Fourier analysis and distributions

In this chapter we present some of the elements of Fourier analysis, with special attention to those aspects arising in the theory of distributions. Fourier analysis is often viewed as made up of two parts, one being a collection of topics relating to Fourier series, and the second being those connected to the Fourier transform. The essential distinction is that the former focuses on periodic functions while the latter is concerned with functions defined on all of R^N. In either case the central question is that of how we may represent fairly arbitrary functions, or even distributions, as combinations of particularly simple periodic functions.

We will begin with Fourier series, and restrict attention to the one dimensional case. See for example [25] for a treatment of multidimensional Fourier series.
8.1 Fourier series in one space dimension

The fundamental point is that if u_n(x) = e^{inx} then the functions {u_n}_{n=−∞}^{∞} make up an orthogonal basis of L²(−π, π). It will then follow from the general considerations of Chapter 6 that any f ∈ L²(−π, π) may be expressed as a linear combination

f(x) = Σ_{n=−∞}^{∞} c_n e^{inx}    (8.1.1)

where

c_n = ⟨f, u_n⟩/⟨u_n, u_n⟩ = (1/2π) ∫_{−π}^{π} f(y) e^{−iny} dy    (8.1.2)

The right hand side of (8.1.1) is a Fourier series for f, and (8.1.2) is a formula for the n'th Fourier coefficient of f. It must be understood that the equality in (8.1.1) is meant only in the sense of L² convergence of the partial sums, and need not be true at any particular point. From the theory of Lebesgue integration it follows that there is a subsequence of the partial sums which will converge almost everywhere on (−π, π), but more than that we cannot say, without further assumptions on f. Any finite sum Σ_{n=−N}^{N} γ_n e^{inx} is called a trigonometric polynomial, so in particular we will be showing that trigonometric polynomials are dense in L²(−π, π).
Let us set

e_n(x) = (1/√(2π)) e^{inx}    n = 0, ±1, ±2, . . .    (8.1.3)

D_n(x) = (1/2π) Σ_{k=−n}^{n} e^{ikx}    (8.1.4)

K_N(x) = (1/(N + 1)) Σ_{n=0}^{N} D_n(x)    (8.1.5)

It is immediate from checking the necessary integrals that {e_n}_{n=−∞}^{∞} is an orthonormal set in H = L²(−π, π). The main goal for the rest of this section is to prove that {e_n}_{n=−∞}^{∞} is actually an orthonormal basis of H.

For the rest of this section, the inner product symbol ⟨f, g⟩ and norm || · || refer to the inner product and norm in H unless otherwise stated. In the context of Fourier analysis, D_n and K_N are known as the Dirichlet kernel and Fejér kernel respectively. Note that

∫_{−π}^{π} D_n(x) dx = ∫_{−π}^{π} K_N(x) dx = 1    (8.1.6)

for any n, N.

If f ∈ H, let

s_n(x) = Σ_{k=−n}^{n} c_k e^{ikx}    (8.1.7)
where c_k is given by (8.1.2), and

σ_N(x) = (1/(N + 1)) Σ_{n=0}^{N} s_n(x)    (8.1.8)

Since

s_n(x) = Σ_{k=−n}^{n} ⟨f, e_k⟩ e_k(x)    (8.1.9)

it follows that the partial sum s_n is also the projection of f onto the span of {e_k}_{k=−n}^{n} and so in particular the Bessel inequality

||s_n|| = ( Σ_{k=−n}^{n} |⟨f, e_k⟩|² )^{1/2} ≤ ||f||    (8.1.10)

holds for all n. In particular, lim_{n→∞} ⟨f, e_n⟩ = 0, which is the Riemann-Lebesgue lemma for the Fourier coefficients of f ∈ H.

Next observe that by substitution of (8.1.2) into (8.1.7) we obtain

s_n(x) = ∫_{−π}^{π} f(y) D_n(x − y) dy    (8.1.11)
We can therefore regard s_n as being given by the convolution D_n ∗ f if we let f(x) = 0 outside of the interval (−π, π). We can also express D_n in an alternative and useful way:

D_n(x) = (1/2π) e^{−inx} Σ_{k=0}^{2n} e^{ikx} = (1/2π) e^{−inx} (1 − e^{(2n+1)ix})/(1 − e^{ix})    (8.1.12)

for x ≠ 0. Multiplying top and bottom of the fraction by e^{−ix/2} then yields

D_n(x) = (1/2π) sin((n + 1/2)x)/sin(x/2)    x ≠ 0    (8.1.13)

and obviously D_n(0) = (2n + 1)/2π.

An alternative viewpoint of the convolutional relation (8.1.11), which is in some sense more natural, starts by defining the unit circle as T = R mod 2πZ, i.e. we identify any two points of R differing by an integer multiple of 2π. Any 2π periodic function, such as e_n, D_n, s_n etc. may be regarded as a function on T, and if f is originally given as a function on (−π, π) then it may be extended in a 2π periodic manner to all of R and so also viewed as a function on the circle T. With f, D_n both 2π periodic, the integral (8.1.11) could be written as

s_n(x) = ∫_T f(y) D_n(x − y) dy    (8.1.14)

since (8.1.11) simply amounts to using one natural parametrization of the independent variable. By the same token

s_n(x) = ∫_a^{a+2π} f(y) D_n(x − y) dy    (8.1.15)

for any convenient choice of a. A 2π periodic function is continuous on T if it is continuous on [−π, π] and f(π) = f(−π), and the space C(T) may simply be regarded as

C(T) = {f ∈ C([−π, π]) : f(π) = f(−π)}    (8.1.16)

a closed subspace of C([−π, π]), so is itself a Banach space with the maximum norm. Likewise we can define

C^m(T) = {f ∈ C^m([−π, π]) : f^(j)(π) = f^(j)(−π), j = 0, 1, . . . , m}    (8.1.17)

a Banach space with the analogous norm.
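The closed form (8.1.13) of the Dirichlet kernel is easy to confirm numerically against the defining sum (8.1.4); the sample points below are arbitrary choices of ours.

```python
import cmath
import math

def D_sum(n, x):      # Dirichlet kernel from its definition (8.1.4)
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real / (2 * math.pi)

def D_closed(n, x):   # closed form (8.1.13), valid for x != 0
    return math.sin((n + 0.5) * x) / (2 * math.pi * math.sin(x / 2))

for n in (3, 7, 15):
    for x in (0.3, 1.0, 2.5):
        print(n, x, D_sum(n, x), D_closed(n, x))
```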
Next let us make some corresponding observations about K_N.

Proposition 8.1. There holds

σ_N(x) = ∫_T K_N(x − y) f(y) dy    (8.1.18)

and

K_N(x) = Σ_{k=−N}^{N} (1 − |k|/(N + 1)) e^{ikx} = (1/(2π(N + 1))) ( sin((N + 1)x/2)/sin(x/2) )²    x ≠ 0    (8.1.19)
Proof: The identity (8.1.18) is immediate from (8.1.14) and the definition of K_N, and the first identity in (8.1.19) is left as an exercise. To complete the proof we observe that

2π Σ_{n=0}^{N} D_n(x) = Σ_{n=0}^{N} sin((n + 1/2)x)/sin(x/2)    (8.1.20)

= Im( e^{ix/2} Σ_{n=0}^{N} e^{inx} ) / sin(x/2)    (8.1.21)

= Im( e^{ix/2} (1 − e^{i(N+1)x})/(1 − e^{ix}) ) / sin(x/2)    (8.1.22)

= Im( (1 − cos((N + 1)x) − i sin((N + 1)x)) / (−2i sin(x/2)) ) / sin(x/2)    (8.1.23)

= (1 − cos((N + 1)x)) / (2 sin²(x/2))    (8.1.24)

= ( sin((N + 1)x/2)/sin(x/2) )²    (8.1.25)

and the conclusion follows upon dividing by 2π(N + 1). □
Theorem 8.1. Suppose that f ∈ C(T). Then σ_N → f in C(T).

Proof: Since K_N ≥ 0 and ∫_T K_N(x − y) dy = 1 for any x, we have

|σ_N(x) − f(x)| = | ∫_T K_N(x − y)(f(y) − f(x)) dy | ≤ ∫_{x−π}^{x+π} K_N(x − y) |f(y) − f(x)| dy    (8.1.26)

If ε > 0 is given, then since f must be uniformly continuous on T, there exists δ > 0 such that |f(x) − f(y)| < ε if |x − y| < δ. Thus

|σ_N(x) − f(x)| ≤ ε ∫_{|x−y|<δ} K_N(x − y) dy + 2||f||_∞ ∫_{δ<|x−y|<π} K_N(x − y) dy    (8.1.27)

≤ ε + 2||f||_∞ / ((N + 1) sin²(δ/2))    (8.1.28)

Thus there exists N₀ such that for N ≥ N₀, |σ_N(x) − f(x)| < 2ε for all x, that is, σ_N → f uniformly. □
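Theorem 8.1 can be illustrated numerically. Using the first identity in (8.1.19), σ_N has Fourier coefficients (1 − |k|/(N + 1))c_k, so it can be assembled directly from the c_k. Below we take f(x) = |x|, which is continuous on T, and compute the coefficients by quadrature (the function, grids, and step counts are our own choices); the sampled sup-norm error decreases as N grows.

```python
import cmath
import math

def f(x):
    return abs(x)            # continuous on T, since f(pi) = f(-pi)

def c(k, steps=4000):        # Fourier coefficient (8.1.2) by midpoint rule
    h = 2 * math.pi / steps
    s = 0j
    for i in range(steps):
        y = -math.pi + (i + 0.5) * h
        s += f(y) * cmath.exp(-1j * k * y) * h
    return s / (2 * math.pi)

def sigma(N, x, coef):       # Fejer mean: sum of (1 - |k|/(N+1)) c_k e^{ikx}
    return sum((1 - abs(k) / (N + 1)) * coef[k] * cmath.exp(1j * k * x)
               for k in range(-N, N + 1)).real

def sup_err(N):              # sup-norm error sampled on a grid over T
    coef = {k: c(k) for k in range(-N, N + 1)}
    xs = [-math.pi + 0.05 * j for j in range(126)]
    return max(abs(sigma(N, x, coef) - f(x)) for x in xs)

err4, err64 = sup_err(4), sup_err(64)
print(err4, err64)           # the error shrinks with N, uniformly in x
```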
Corollary 8.1. The functions {e_n(x)}_{n=−∞}^{∞} form an orthonormal basis of H = L²(−π, π).

Proof: We have already observed that these functions form an orthonormal set, so it remains only to verify one of the equivalent conditions stated in Theorem 6.4. We will show the closedness property, i.e. that the set of finite linear combinations of {e_n(x)}_{n=−∞}^{∞} is dense in H. Given g ∈ H and ε > 0 we may find f ∈ C(T) such that ||f − g|| < ε, f ∈ D(−π, π) for example. Then choose N such that ||σ_N − f||_{C(T)} < ε, which implies ||σ_N − f|| < √(2π) ε. Thus σ_N is a finite linear combination of the e_n's and

||g − σ_N|| < (1 + √(2π)) ε    (8.1.30)

Since ε is arbitrary, the conclusion follows. □
Corollary 8.2. For any f ∈ H = L²(−π, π), if

s_n(x) = Σ_{k=−n}^{n} c_k e^{ikx}    (8.1.31)

where

c_k = (1/2π) ∫_{−π}^{π} f(x) e^{−ikx} dx    (8.1.32)

then s_n → f in H.

For f ∈ H, we will often write

f(x) = Σ_{n=−∞}^{∞} c_n e^{inx}    (8.1.33)

but we emphasize that without further assumptions this only means that the partial sums converge in L²(−π, π).
At this point we have looked at the convergence properties of two different sequences of trigonometric polynomials, s_n and σ_N, associated with f. While s_n is simply the n'th partial sum of the Fourier series of f, the σ_N's are the so-called Fejér means of f. While each Fejér mean is a trigonometric polynomial, the sequence σ_N does not amount to the partial sums of some other Fourier series, since the n'th coefficient would also have to depend on N. For f ∈ H, we have that s_N → f in H, and so the same is obviously true under the stronger assumption that f ∈ C(T). On the other hand for f ∈ C(T) we have shown that σ_N → f uniformly, but it need not be true that s_N → f uniformly, or even pointwise (example of P. du Bois-Reymond, see Section 1.6.1 of [25]). For f ∈ H it can be shown that σ_N → f in H, but on the other hand the best L² approximation property of s_N implies that

||s_N − f|| ≤ ||σ_N − f||    (8.1.34)

since both s_N and σ_N are in the span of {e_k}_{k=−N}^{N}. That is to say, the rate of convergence of s_N to f is at least as fast, in the L² sense, as that of σ_N. In summary, both s_N and σ_N provide a trigonometric polynomial approximating f, but each has some advantage over the other, depending on what is to be assumed about f.
8.2 Alternative forms of Fourier series

From the basic Fourier series (8.1.1) a number of other closely related and useful expressions can be immediately derived. First suppose that f ∈ L²(−L, L) for some L > 0. If we let f̃(x) = f(Lx/π) then f̃ ∈ L²(−π, π), so

f̃(x) = Σ_{n=−∞}^{∞} c_n e^{inx}    c_n = (1/2π) ∫_{−π}^{π} f̃(y) e^{−iny} dy    (8.2.1)

or equivalently

f(x) = Σ_{n=−∞}^{∞} c_n e^{iπnx/L}    c_n = (1/2L) ∫_{−L}^{L} f(y) e^{−iπny/L} dy    (8.2.2)

Likewise (8.2.2) holds if we just regard f as being 2L periodic and in L², and in the formula for c_n we could replace (−L, L) by any other interval of length 2L. The functions e^{iπnx/L}/√(2L) make up an orthonormal basis of L²(a, b) if b − a = 2L.
Next observe that we can write

f(x) = Σ_{n=−∞}^{∞} c_n ( cos(nπx/L) + i sin(nπx/L) ) = c₀ + Σ_{n=1}^{∞} (c_n + c_{−n}) cos(nπx/L) + i(c_n − c_{−n}) sin(nπx/L)    (8.2.3)

If we let

a_n = c_n + c_{−n}    b_n = i(c_n − c_{−n})    n = 0, 1, 2, . . .    (8.2.4)
then we obtain the equivalent formulas

f(x) = a₀/2 + Σ_{n=1}^{∞} ( a_n cos(nπx/L) + b_n sin(nπx/L) )    (8.2.5)

where

a_n = (1/L) ∫_{−L}^{L} f(y) cos(nπy/L) dy    n = 0, 1, . . .

b_n = (1/L) ∫_{−L}^{L} f(y) sin(nπy/L) dy    n = 1, 2, . . .    (8.2.6)

We refer to (8.2.5),(8.2.6) as the 'real form' of the Fourier series, which is natural to use, for example, if f is real valued, since then no complex quantities appear. Again the precise meaning of (8.2.5) is that s_n → f in H = L²(−L, L) or other interval of length 2L, where now

s_n(x) = a₀/2 + Σ_{k=1}^{n} ( a_k cos(kπx/L) + b_k sin(kπx/L) )    (8.2.7)

with results analogous to those mentioned above for the Fejér means also being valid. It may be easily checked that the set of functions

{ 1/√(2L),  cos(nπx/L)/√L,  sin(nπx/L)/√L }_{n=1}^{∞}    (8.2.8)

make up an orthonormal basis of L²(−L, L).
Another important variant is obtained as follows. If f ∈ L²(0, L) then we may define the associated even and odd extensions of f in L²(−L, L), namely

f_e(x) = f(x) for 0 < x < L,  f(−x) for −L < x < 0    f_o(x) = f(x) for 0 < x < L,  −f(−x) for −L < x < 0    (8.2.9)

If we replace f by f_e in (8.2.5),(8.2.6), then we obtain immediately that b_n = 0 and a resulting cosine series representation for f,

f(x) = a₀/2 + Σ_{n=1}^{∞} a_n cos(nπx/L)    a_n = (2/L) ∫_0^L f(y) cos(nπy/L) dy    n = 0, 1, . . .    (8.2.10)

Likewise replacing f by f_o gives us a corresponding sine series,

f(x) = Σ_{n=1}^{∞} b_n sin(nπx/L)    b_n = (2/L) ∫_0^L f(y) sin(nπy/L) dy    n = 1, 2, . . .    (8.2.11)

Note that if f is continuous on [0, L], then the 2L periodic extension of f_e is also continuous, but this need not be true in the case of f_o. Thus we might expect that the cosine series of f typically has better convergence properties than the sine series.
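The closing remark can be quantified with a small computation. For f(x) = x on (0, 1) (our choice of example, with L = 1), the even periodic extension is continuous while the odd one has jumps at odd integers, and correspondingly the cosine coefficients a_n decay like 1/n² while the sine coefficients b_n decay only like 1/n.

```python
import math

# f(x) = x on (0, 1): cosine coefficients a_n (8.2.10) versus
# sine coefficients b_n (8.2.11), computed by midpoint quadrature
def a(n, steps=20000):
    h = 1.0 / steps
    return 2 * sum((i + 0.5) * h * math.cos(n * math.pi * (i + 0.5) * h)
                   for i in range(steps)) * h

def b(n, steps=20000):
    h = 1.0 / steps
    return 2 * sum((i + 0.5) * h * math.sin(n * math.pi * (i + 0.5) * h)
                   for i in range(steps)) * h

for n in (1, 5, 25):
    print(n, abs(a(n)), abs(b(n)))   # a_n falls off like 1/n^2, b_n like 1/n
```

(For this f the exact values are a_n = 2((−1)^n − 1)/(n²π²) and b_n = 2(−1)^{n+1}/(nπ), which the quadrature reproduces.)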
8.3 More about convergence of Fourier series

If f ∈ L²(−π, π) it was already observed that since the partial sums s_n converge to f in L²(−π, π), some subsequence of the partial sums converges pointwise a.e. In fact it is a famous theorem of Carleson ([6]) that s_n → f (i.e. the entire sequence, not just a subsequence) pointwise a.e. The proof is complicated, and even now is not to be found in advanced textbooks. No better result could be expected, since f itself is only defined up to sets of measure zero.

If we were to assume the stronger condition that f ∈ C(T) then it might be natural to conjecture that s_n(x) → f(x) for every x (recall we know σ_N → f uniformly in this case), but that turns out to be false, as mentioned above: in fact there exist continuous functions for which s_n(x) is divergent at infinitely many x ∈ T, see Section 5.11 of [29].

A sufficient condition implying that s_n(x) → f(x) for every x ∈ T is that f be piecewise continuously differentiable on T. In fact the following more precise theorem can be proved.
Theorem 8.2. Assume that there exist points −π = x₀ < x₁ < · · · < x_M = π such that f ∈ C¹([x_j, x_{j+1}]) for j = 0, 1, . . . , M − 1. Let

f̃(x) = (1/2)( lim_{y→x+} f(y) + lim_{y→x−} f(y) ) for −π < x < π,  (1/2)( lim_{y→−π+} f(y) + lim_{y→π−} f(y) ) for x = ±π    (8.3.1)

Then lim_{n→∞} s_n(x) = f̃(x) for −π ≤ x ≤ π.

Under the stated assumptions on f, the theorem states in particular that s_n converges to f at every point of continuity of f (with appropriate modification at the endpoints), and otherwise converges to the average of the left and right hand limits. The proof is somewhat similar to that of Theorem 8.1 – steps in the proof are outlined in the exercises.
So far we have discussed the convergence properties of the Fourier series based on
assumptions about f , but another point of view we could take is to focus on how con123
vergence properties are influenced by the behavior of the Fourier coefficients cn . A first
simple result of this type is:
prop82
Proposition 8.2. If f 2 H = L2 ( ⇡, ⇡) and its Fourier coefficients satisfy
1
X
n= 1
|cn | < 1
(8.3.2)
acfs
then f 2 C(T) and sn ! f uniformly on T
Proof: By the Weierstrass M-test, the series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ is uniformly convergent on $\mathbb{R}$ to some limit $g$, and since each partial sum is continuous, the same must be true of $g$. Since uniform convergence implies $L^2$ convergence on any finite interval, we have $s_n \to g$ in $H$, but also $s_n \to f$ in $H$ by Corollary 8.2. By uniqueness of the limit $f = g$ and the conclusion follows.
We say that f has an absolutely convergent Fourier series when (8.3.2) holds. We
emphasize here that the conclusion f = g is meant in the sense of L2 , i.e. f (x) = g(x)
a.e., so by saying that f is continuous, we are really saying that the equivalence class of
f contains a continuous function, namely g.
It is not the case that every continuous function has an absolutely convergent Fourier series, according to remarks made earlier in this section. It would therefore be of interest to find other conditions on $f$ which guarantee that (8.3.2) holds. One such condition follows from the next result, which is also of independent interest.
Proposition 8.3. If $f \in C^m(\mathbb{T})$, then $\lim_{n\to\pm\infty} n^m c_n = 0$.
Proof: We integrate by parts in (8.1.2) to get, for $n \ne 0$,
$$c_n = \frac{1}{2\pi}\left[\frac{f(y)e^{-iny}}{-in}\right]_{-\pi}^{\pi} + \frac{1}{2\pi i n}\int_{-\pi}^{\pi} f'(y)e^{-iny}\,dy = \frac{1}{2\pi i n}\int_{-\pi}^{\pi} f'(y)e^{-iny}\,dy \tag{8.3.3}$$
if $f \in C^1(\mathbb{T})$. Since $f' \in L^2(\mathbb{T})$, the Riemann-Lebesgue lemma implies that $nc_n \to 0$ as $n \to \pm\infty$. If $f \in C^2(\mathbb{T})$ we could integrate by parts again to get $n^2 c_n \to 0$, etc.
It is immediate from this result that if $f \in C^2(\mathbb{T})$ then it has an absolutely convergent Fourier series, but in fact even $f \in C^1(\mathbb{T})$ is more than enough, see Exercise 6.

One way to regard Proposition 8.3 is that it says that the smoother $f$ is, the more rapidly its Fourier coefficients must decay. The next result is a sort of converse statement.
Proposition 8.4. If $f \in H = L^2(-\pi,\pi)$ and its Fourier coefficients satisfy
$$|n^{m+\alpha} c_n| \le C \tag{8.3.4}$$
for some $C$ and $\alpha > 1$, then $f \in C^m(\mathbb{T})$.
Proof: When $m = 0$ this is just a special case of Proposition 8.2. When $m = 1$ we see that it is permissible to differentiate the series (8.1.1) term by term, since the differentiated series
$$\sum_{n=-\infty}^{\infty} i n c_n e^{inx} \tag{8.3.5}$$
is uniformly convergent, by the assumption (8.3.4). Thus $f, f'$ are both a.e. equal to an absolutely convergent Fourier series, so $f \in C^1(\mathbb{T})$, by Proposition 8.2. The proof for $m = 2, 3, \dots$ is similar.
Note that Proposition 8.3 states a necessary condition on the Fourier coefficients for
f to be in C m and Proposition 8.4 states a sufficient condition. The two conditions are
not identical, but both point to the general tendency that increased smoothness of f is
associated with more rapid decay of the corresponding Fourier coefficients.
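This tendency is easy to check numerically (a sketch assuming NumPy is available; the sampling-based coefficient approximation is ours): for the $2\pi$-periodic extension of $f(x) = x$, which has a jump at $\pm\pi$, $|c_n| = 1/|n|$, while for the continuous function $f(x) = |x|$ the nonzero coefficients decay like $1/n^2$.

```python
import numpy as np

# Approximate c_n = (1/2pi) * integral of f(x) exp(-inx) dx by a
# Riemann sum over m equally spaced sample points on (-pi, pi).
def fourier_coeff(f, n, m=2**12):
    x = -np.pi + 2 * np.pi * np.arange(m) / m
    return np.sum(f(x) * np.exp(-1j * n * x)) / m

n = 63  # odd, so that c_n of |x| is nonzero
c_jump = abs(fourier_coeff(lambda x: x, n))          # exact value 1/n
c_cont = abs(fourier_coeff(lambda x: np.abs(x), n))  # exact value 2/(pi n^2)
ratio_jump = c_jump * n      # approximately 1
ratio_cont = c_cont * n**2   # approximately 2/pi
```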
8.4 The Fourier Transform on $\mathbb{R}^N$
If $f$ is a given function on $\mathbb{R}^N$ the Fourier transform of $f$ is defined as
$$\hat f(y) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx \qquad y \in \mathbb{R}^N \tag{8.4.1}$$
provided that the integral is defined in some sense. This will always be the case, for example, if $f \in L^1(\mathbb{R}^N)$ and any $y \in \mathbb{R}^N$, since then
$$|\hat f(y)| \le \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} |f(x)|\,dx < \infty \tag{8.4.2}$$
thus in fact $\hat f \in L^\infty(\mathbb{R}^N)$ in this case.
There are a number of other commonly used definitions of the Fourier transform, obtained by changing the numerical constant in front of the integral, and/or replacing $-ix\cdot y$ by $ix\cdot y$, and/or including a factor of $2\pi$ in the exponent in the integrand. Each convention has some convenient properties in certain situations, but none of them is always the best, hence the lack of a universally agreed upon definition. The differences are non-essential, all having to do with the way certain numerical constants turn up, so the only requirement is that we adopt one specific definition, such as (8.4.1), and stick with it.
The Fourier transform is a particular integral operator, and an alternative operator type notation for it,
$$\mathcal{F}f = \hat f \tag{8.4.3}$$
is often convenient to use, especially when discussing its mapping properties.

Example 8.1. If $N = 1$ and $f(x) = \chi_{[a,b]}(x)$, the indicator function of the interval $[a,b]$, then the Fourier transform of $f$ is
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-ixy}\,dx = \frac{e^{-iay} - e^{-iby}}{\sqrt{2\pi}\,iy} \tag{8.4.4}$$

Example 8.2. If $N = 1$, $\alpha > 0$ and $f(x) = e^{-\alpha x^2}$ (a Gaussian function) then
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\alpha x^2} e^{-ixy}\,dx = \frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\alpha\left(x + \frac{iy}{2\alpha}\right)^2}\,dx \tag{8.4.5}$$
$$= \frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}} \sqrt{\frac{\pi}{\alpha}} = \frac{1}{\sqrt{2\alpha}}\, e^{-\frac{y^2}{4\alpha}} \tag{8.4.6}$$
In the above derivation, the key step is the third equality, which is justified by contour integration techniques in complex function theory – the integral of $e^{-\alpha z^2}$ along the real axis is the same as the integral along the parallel line $\operatorname{Im} z = \frac{y}{2\alpha}$ for any $y$.

Thus the Fourier transform of a Gaussian is another Gaussian, and in particular $\hat f = f$ if $\alpha = \frac{1}{2}$.
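The closed form (8.4.6) can be confirmed by direct numerical quadrature (a sketch assuming NumPy is available; the grid sizes are arbitrary choices):

```python
import numpy as np

# Compare a Riemann-sum evaluation of (8.4.1) for f(x) = exp(-a x^2)
# with the closed form (8.4.6): fhat(y) = exp(-y^2/(4a)) / sqrt(2a).
a, y = 0.7, 1.3
x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]
integrand = np.exp(-a * x**2) * np.exp(-1j * x * y)
fhat_numeric = np.sum(integrand) * dx / np.sqrt(2 * np.pi)
fhat_exact = np.exp(-y**2 / (4 * a)) / np.sqrt(2 * a)
err = abs(fhat_numeric - fhat_exact)
```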
It is clear from the Fourier transform definition that if $f$ has the special product form $f(x) = f_1(x_1)f_2(x_2)\dots f_N(x_N)$ then $\hat f(y) = \hat f_1(y_1)\hat f_2(y_2)\dots \hat f_N(y_N)$. The Gaussian in $\mathbb{R}^N$, namely $f(x) = e^{-\alpha|x|^2}$, is of this type, so using (8.4.6) we immediately obtain
$$\hat f(y) = \frac{e^{-\frac{|y|^2}{4\alpha}}}{(2\alpha)^{N/2}} \tag{8.4.7}$$
To state our first theorem about the Fourier transform, let us denote by
$$C_0(\mathbb{R}^N) = \{f \in C(\mathbb{R}^N) : \lim_{|x|\to\infty} |f(x)| = 0\} \tag{8.4.8}$$
the space of continuous functions vanishing at $\infty$. It is a closed subspace of $L^\infty(\mathbb{R}^N)$, hence a Banach space with the $L^\infty$ norm. We emphasize that despite the notation, functions in this space need not be of compact support.
Theorem 8.3. If $f \in L^1(\mathbb{R}^N)$ then $\hat f \in C_0(\mathbb{R}^N)$.

Proof: If $y_n \in \mathbb{R}^N$ and $y_n \to y$ then clearly $f(x)e^{-ix\cdot y_n} \to f(x)e^{-ix\cdot y}$ for a.e. $x \in \mathbb{R}^N$. Also, $|f(x)e^{-ix\cdot y_n}| \le |f(x)|$, and since we assume $f \in L^1(\mathbb{R}^N)$ we can immediately apply the dominated convergence theorem to obtain
$$\lim_{n\to\infty} \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y_n}\,dx = \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx \tag{8.4.9}$$
that is, $\hat f(y_n) \to \hat f(y)$. Hence $\hat f \in C(\mathbb{R}^N)$.
Next, suppose temporarily that $g \in C^1(\mathbb{R}^N)$ and has compact support. An integration by parts gives us, for $j = 1, 2, \dots, N$, that
$$\hat g(y) = \frac{1}{(2\pi)^{N/2}\,iy_j} \int_{\mathbb{R}^N} \frac{\partial g}{\partial x_j}\, e^{-ix\cdot y}\,dx \tag{8.4.10}$$
Thus there exists some $C$, depending on $g$, such that
$$|\hat g(y)|^2 \le \frac{C}{y_j^2} \qquad j = 1, 2, \dots, N \tag{8.4.11}$$
from which it follows that
$$|\hat g(y)|^2 \le \min_j \left(\frac{C}{y_j^2}\right) \le \frac{CN}{|y|^2} \tag{8.4.12}$$
Thus $\hat g(y) \to 0$ as $|y| \to \infty$ in this case.
Finally, such $g$'s are dense in $L^1(\mathbb{R}^N)$, so given $f \in L^1(\mathbb{R}^N)$ and $\epsilon > 0$, choose $g$ as above such that $||f - g||_{L^1(\mathbb{R}^N)} < \epsilon$. We then have, taking into account (8.4.2),
$$|\hat f(y)| \le |\hat f(y) - \hat g(y)| + |\hat g(y)| \le \frac{1}{(2\pi)^{N/2}}\, ||f - g||_{L^1(\mathbb{R}^N)} + |\hat g(y)| \tag{8.4.13}$$
and so
$$\limsup_{|y|\to\infty} |\hat f(y)| \le \frac{\epsilon}{(2\pi)^{N/2}} \tag{8.4.14}$$
Since $\epsilon > 0$ is arbitrary, the conclusion $\hat f \in C_0(\mathbb{R}^N)$ follows.
The fact that $\hat f(y) \to 0$ as $|y| \to \infty$ is analogous to the property that the Fourier coefficients $c_n \to 0$ as $n \to \pm\infty$ in the case of Fourier series, and in fact is also called the Riemann-Lebesgue Lemma.
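For instance, for the indicator of $[0,1]$ the explicit formula (8.4.4) gives $|\hat f(y)| \le 2/(\sqrt{2\pi}\,|y|)$, so the decay promised by Theorem 8.3 indeed holds, though only at the slow rate $O(1/|y|)$. A quick check (a sketch assuming NumPy is available):

```python
import numpy as np

# |fhat(y)| for f = indicator of [0,1], from (8.4.4) with a=0, b=1:
# fhat(y) = (1 - exp(-iy)) / (sqrt(2 pi) i y).
def fhat_abs(y):
    return np.abs((1 - np.exp(-1j * y)) / (np.sqrt(2 * np.pi) * 1j * y))

ys = np.array([10.0, 100.0, 1000.0])
vals = fhat_abs(ys)  # tends to 0, bounded by 2/(sqrt(2 pi) |y|)
```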
One of the fundamental properties of the Fourier transform is that it is 'almost' its own inverse. A first precise version of this is given by the following Fourier Inversion Theorem.

Theorem 8.4. If $f, \hat f \in L^1(\mathbb{R}^N)$ then
$$f(x) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \hat f(y)e^{ix\cdot y}\,dy \qquad \text{a.e. } x \in \mathbb{R}^N \tag{8.4.15}$$
The right hand side of (8.4.15) is not precisely the Fourier transform of $\hat f$ because the exponent contains $ix\cdot y$ rather than $-ix\cdot y$, but it does mean that we can think of it as saying that $\hat{\hat f}(x) = f(-x)$, or
$$\hat{\hat f} = \check f, \tag{8.4.16}$$
where $\check f(x) = f(-x)$ is the reflection of $f$.¹ The requirement in the theorem that both $f$ and $\hat f$ be in $L^1$ will be weakened later on.
Proof: Since $\hat f \in L^1(\mathbb{R}^N)$ the right hand side of (8.4.15) is well defined, and we denote it temporarily by $g(x)$. Define also the family of Gaussians,
$$G_\alpha(x) = \frac{1}{(4\pi\alpha)^{N/2}}\, e^{-\frac{|x|^2}{4\alpha}} \tag{8.4.17}$$

¹Warning: some authors use the symbol $\check f$ to mean the inverse Fourier transform of $f$.
We then have
$$g(x) = \lim_{\alpha\to 0^+} \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \hat f(y)e^{ix\cdot y} e^{-\alpha|y|^2}\,dy \tag{8.4.18}$$
$$= \lim_{\alpha\to 0^+} \frac{1}{(2\pi)^{N}} \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} f(z)e^{-\alpha|y|^2} e^{-i(z-x)\cdot y}\,dz\,dy \tag{8.4.19}$$
$$= \lim_{\alpha\to 0^+} \frac{1}{(2\pi)^{N}} \int_{\mathbb{R}^N} f(z)\left(\int_{\mathbb{R}^N} e^{-\alpha|y|^2} e^{-i(z-x)\cdot y}\,dy\right)dz \tag{8.4.20}$$
$$= \lim_{\alpha\to 0^+} \int_{\mathbb{R}^N} f(z)\, \frac{e^{-\frac{|z-x|^2}{4\alpha}}}{(4\pi\alpha)^{N/2}}\,dz \tag{8.4.21}$$
$$= \lim_{\alpha\to 0^+} (f * G_\alpha)(x) \tag{8.4.22}$$
Here (8.4.18) follows from the dominated convergence theorem and (8.4.20) from Fubini's theorem, which is applicable here because
$$\int_{\mathbb{R}^N}\int_{\mathbb{R}^N} |f(z)e^{-\alpha|y|^2}|\,dz\,dy < \infty \tag{8.4.23}$$
In (8.4.21) we have used the explicit calculation (8.4.7) above for the Fourier transform of a Gaussian.
Noting that $\int_{\mathbb{R}^N} G_\alpha(x)\,dx = 1$ for every $\alpha > 0$, we see that the difference $f * G_\alpha(x) - f(x)$ may be written as
$$\int_{\mathbb{R}^N} G_\alpha(y)(f(x-y) - f(x))\,dy \tag{8.4.24}$$
so that
$$||f * G_\alpha - f||_{L^1(\mathbb{R}^N)} \le \int_{\mathbb{R}^N} G_\alpha(y)\psi(y)\,dy \tag{8.4.25}$$
where $\psi(y) = \int_{\mathbb{R}^N} |f(x-y) - f(x)|\,dx$. Then $\psi$ is bounded and continuous at $y = 0$ with $\psi(0) = 0$ (see Exercise 10), and we can verify that the hypotheses of Theorem 7.2 are satisfied with $f_n$ replaced by $G_{\alpha_n}$ as long as $\alpha_n \to 0^+$. For any sequence $\alpha_n > 0$, $\alpha_n \to 0$ it follows that $G_{\alpha_n} * f \to f$ in $L^1(\mathbb{R}^N)$, and so there is a subsequence $\alpha_{n_k} \to 0$ such that $(G_{\alpha_{n_k}} * f)(x) \to f(x)$ a.e. We conclude that (8.4.15) holds. $\Box$
8.5 Further properties of the Fourier transform
Formally speaking we have
$$\frac{\partial}{\partial y_j} \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx = \int_{\mathbb{R}^N} -ix_j f(x)e^{-ix\cdot y}\,dx \tag{8.5.1}$$
or in more compact notation
$$\frac{\partial \hat f}{\partial y_j} = (-ix_j f)^\wedge \tag{8.5.2}$$
This is rigorously justified by standard theorems of analysis about differentiation of integrals with respect to parameters, provided that $\int_{\mathbb{R}^N} |x_j f(x)|\,dx < \infty$.

A companion property, obtained formally using integration by parts, is that
$$\int_{\mathbb{R}^N} \frac{\partial f}{\partial x_j}\, e^{-ix\cdot y}\,dx = \int_{\mathbb{R}^N} iy_j f(x)e^{-ix\cdot y}\,dx \tag{8.5.3}$$
or
$$\left(\frac{\partial f}{\partial x_j}\right)^{\!\wedge} = iy_j \hat f \tag{8.5.4}$$
which is rigorously correct provided at least that $f \in C^1(\mathbb{R}^N)$ and $\int_{|x|=R} |f(x)|\,dS \to 0$ as $R \to \infty$. Repeating the above arguments with higher derivatives we obtain
Proposition 8.5. If $\alpha$ is any multi-index then
$$D^\alpha \hat f(y) = ((-ix)^\alpha f)^\wedge(y) \tag{8.5.5}$$
if
$$\int_{\mathbb{R}^N} |x^\alpha f(x)|\,dx < \infty \tag{8.5.6}$$
and
$$(D^\alpha f)^\wedge(y) = (iy)^\alpha \hat f(y) \tag{8.5.7}$$
if
$$f \in C^m(\mathbb{R}^N) \qquad \int_{|x|=R} |D^\beta f(x)|\,dS \to 0 \text{ as } R \to \infty \quad |\beta| < |\alpha| = m \tag{8.5.8}$$
We will eventually see that (8.5.5) and (8.5.7) remain valid, suitably interpreted in a distributional sense, under conditions much more general than (8.5.6) and (8.5.8). But for now we introduce a new space in which these last two conditions are guaranteed to hold.

Definition 8.1. The Schwartz space is defined as
$$\mathcal{S}(\mathbb{R}^N) = \{\phi \in C^\infty(\mathbb{R}^N) : x^\alpha D^\beta \phi \in L^\infty(\mathbb{R}^N) \text{ for all } \alpha, \beta\} \tag{8.5.9}$$
Thus a function is in the Schwartz space if any derivative of it decays more rapidly than the reciprocal of any polynomial. Clearly $\mathcal{S}(\mathbb{R}^N)$ contains all test functions in $\mathcal{D}(\mathbb{R}^N)$ as well as other kinds of functions such as Gaussians, $e^{-\alpha|x|^2}$ for any $\alpha > 0$.

If $\phi \in \mathcal{S}(\mathbb{R}^N)$ then in particular, for any $n$,
$$|D^\beta \phi(x)| \le \frac{C}{(1+|x|^2)^n} \tag{8.5.10}$$
for some $C$, and so clearly both (8.5.6) and (8.5.8) hold; thus the two key identities (8.5.5) and (8.5.7) are correct whenever $f$ is in the Schwartz space. It is also immediate from (8.5.10) that $\mathcal{S}(\mathbb{R}^N) \subset L^1(\mathbb{R}^N) \cap L^\infty(\mathbb{R}^N)$.
Proposition 8.6. If $\phi \in \mathcal{S}(\mathbb{R}^N)$ then $\hat\phi \in \mathcal{S}(\mathbb{R}^N)$.

Proof: Note from (8.5.5) and (8.5.7) that
$$(iy)^\alpha D^\beta \hat\phi(y) = (iy)^\alpha ((-ix)^\beta \phi)^\wedge(y) = (D^\alpha((-ix)^\beta \phi))^\wedge(y) \tag{8.5.11}$$
holds for $\phi \in \mathcal{S}(\mathbb{R}^N)$. Also, since $\mathcal{S}(\mathbb{R}^N) \subset L^1(\mathbb{R}^N)$ it follows from (8.4.2) that if $\psi \in \mathcal{S}(\mathbb{R}^N)$ then $\hat\psi \in L^\infty(\mathbb{R}^N)$. Thus we have the following list of implications:
$$\phi \in \mathcal{S}(\mathbb{R}^N) \implies (-ix)^\beta \phi \in \mathcal{S}(\mathbb{R}^N) \tag{8.5.12}$$
$$\implies D^\alpha((-ix)^\beta \phi) \in \mathcal{S}(\mathbb{R}^N) \tag{8.5.13}$$
$$\implies (D^\alpha((-ix)^\beta \phi))^\wedge \in L^\infty(\mathbb{R}^N) \tag{8.5.14}$$
$$\implies y^\alpha D^\beta \hat\phi \in L^\infty(\mathbb{R}^N) \tag{8.5.15}$$
$$\implies \hat\phi \in \mathcal{S}(\mathbb{R}^N) \tag{8.5.16}$$
Corollary 8.3. The Fourier transform $\mathcal{F} : \mathcal{S}(\mathbb{R}^N) \to \mathcal{S}(\mathbb{R}^N)$ is one to one and onto.

Proof: The preceding proposition says that $\mathcal{F}$ maps $\mathcal{S}(\mathbb{R}^N)$ into $\mathcal{S}(\mathbb{R}^N)$, and if $\mathcal{F}\phi = \hat\phi = 0$ then the inversion theorem, Theorem 8.4, is applicable, since both $\phi, \hat\phi$ are in $L^1(\mathbb{R}^N)$. We conclude $\phi = 0$, i.e. $\mathcal{F}$ is one to one. If $\psi \in \mathcal{S}(\mathbb{R}^N)$, let $\phi = \hat{\check\psi}$. Clearly $\phi \in \mathcal{S}(\mathbb{R}^N)$ and one may check directly, again using the inversion theorem, that $\hat\phi = \psi$, so that $\mathcal{F}$ is onto. $\Box$
The next result, usually known as the Parseval identity, is the key step needed to define the Fourier transform of a function in $L^2(\mathbb{R}^N)$, which turns out to be the more natural setting.

Proposition 8.7. If $\phi, \psi \in \mathcal{S}(\mathbb{R}^N)$ then
$$\int_{\mathbb{R}^N} \phi(x)\hat\psi(x)\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\psi(x)\,dx \tag{8.5.17}$$
Proof: The proof is simply an interchange of order in an iterated integral, which is easily justified by Fubini's theorem:
$$\int_{\mathbb{R}^N} \phi(x)\hat\psi(x)\,dx = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \phi(x)\left(\int_{\mathbb{R}^N} \psi(y)e^{-ix\cdot y}\,dy\right)dx \tag{8.5.18}$$
$$= \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \psi(y)\left(\int_{\mathbb{R}^N} \phi(x)e^{-ix\cdot y}\,dx\right)dy \tag{8.5.19}$$
$$= \int_{\mathbb{R}^N} \hat\phi(y)\psi(y)\,dy \tag{8.5.20}$$
There is a slightly different but equivalent formula, which is also sometimes called the Parseval identity, see Exercise 11. The content of the following corollary is the Plancherel identity.

Corollary 8.4. For every $\phi \in \mathcal{S}(\mathbb{R}^N)$ we have
$$||\phi||_{L^2(\mathbb{R}^N)} = ||\hat\phi||_{L^2(\mathbb{R}^N)} \tag{8.5.21}$$
Proof: Given $\phi \in \mathcal{S}(\mathbb{R}^N)$ there exists, by Corollary 8.3, $\psi \in \mathcal{S}(\mathbb{R}^N)$ such that $\hat\psi = \bar\phi$. In addition it follows directly from the definition of the Fourier transform and the inversion theorem that $\psi = \overline{\hat\phi}$. Therefore, by Parseval's identity,
$$||\phi||^2_{L^2(\mathbb{R}^N)} = \int_{\mathbb{R}^N} \phi(x)\bar\phi(x)\,dx = \int_{\mathbb{R}^N} \phi(x)\hat\psi(x)\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\psi(x)\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\overline{\hat\phi(x)}\,dx = ||\hat\phi||^2_{L^2(\mathbb{R}^N)} \tag{8.5.22}$$
Recalling that $\mathcal{D}(\mathbb{R}^N)$ is dense in $L^2(\mathbb{R}^N)$, it follows that the same is true of $\mathcal{S}(\mathbb{R}^N)$, and the Plancherel identity therefore implies that the Fourier transform has an extension to all of $L^2(\mathbb{R}^N)$. To be precise, if $f \in L^2(\mathbb{R}^N)$ pick $\phi_n \in \mathcal{S}(\mathbb{R}^N)$ such that $\phi_n \to f$ in $L^2(\mathbb{R}^N)$. Since $\{\phi_n\}$ is Cauchy in $L^2(\mathbb{R}^N)$, (8.5.21) implies the same for $\{\hat\phi_n\}$, so $g := \lim_{n\to\infty} \hat\phi_n$ exists in the $L^2$ sense, and this limit is by definition $\hat f$. From elementary considerations this limit is independent of the choice of approximating sequence $\{\phi_n\}$, the extended definition of $\hat f$ agrees with the original definition if $f \in L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$, and (8.5.21) continues to hold for all $f \in L^2(\mathbb{R}^N)$.

Since $\hat\phi_n \to \hat f$ in $L^2(\mathbb{R}^N)$, it follows by similar reasoning that $\hat{\hat\phi}_n \to \hat{\hat f}$. By the inversion theorem we know that $\hat{\hat\phi}_n = \check\phi_n$, which must converge to $\check f$; thus $\hat{\hat f} = \check f$, i.e. the Fourier inversion theorem continues to hold on $L^2(\mathbb{R}^N)$.

The subset $L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$ is dense in $L^2(\mathbb{R}^N)$ so we also have that $\hat f = \lim_{n\to\infty} \hat f_n$ if $f_n$ is any sequence in $L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$ convergent in $L^2(\mathbb{R}^N)$ to $f$. A natural choice of such a sequence is
$$f_n(x) = \begin{cases} f(x) & |x| < n \\ 0 & |x| > n \end{cases} \tag{8.5.23}$$
leading to the following explicit formula, similar to an improper integral, for the Fourier transform of an $L^2$ function,
$$\hat f(y) = \lim_{n\to\infty} \frac{1}{(2\pi)^{N/2}} \int_{|x|<n} f(x)e^{-ix\cdot y}\,dx \tag{8.5.24}$$
where again without further assumptions we only know that the limit takes place in the $L^2$ sense.
Let us summarize.

Theorem 8.5. For any $f \in L^2(\mathbb{R}^N)$ there exists a unique $\hat f \in L^2(\mathbb{R}^N)$ such that $\hat f$ is given by (8.4.1) whenever $f \in L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$, and
$$||f||_{L^2(\mathbb{R}^N)} = ||\hat f||_{L^2(\mathbb{R}^N)}. \tag{8.5.25}$$
Furthermore, $f, \hat f$ are related by (8.5.24) and
$$f(x) = \lim_{n\to\infty} \frac{1}{(2\pi)^{N/2}} \int_{|y|<n} \hat f(y)e^{ix\cdot y}\,dy \tag{8.5.26}$$
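The identity (8.5.25) can be illustrated numerically (a sketch assuming NumPy is available; the grids and the test function $f(x) = xe^{-x^2/2}$ are arbitrary choices): approximating $\hat f$ by a Riemann sum over a truncated grid, the two $L^2$ norms agree to high accuracy.

```python
import numpy as np

# f(x) = x exp(-x^2/2) is a Schwartz function; fhat(y) = -i y exp(-y^2/2).
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
f = x * np.exp(-x**2 / 2)

y = np.linspace(-12.0, 12.0, 481)
dy = y[1] - y[0]
kernel = np.exp(-1j * np.outer(y, x))        # rows: y values, columns: x values
fhat = kernel @ f * dx / np.sqrt(2 * np.pi)  # Riemann sum for (8.4.1), N = 1

norm_f = np.sqrt(np.sum(np.abs(f)**2) * dx)
norm_fhat = np.sqrt(np.sum(np.abs(fhat)**2) * dy)
```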
We conclude this section with one final important property of the Fourier transform.

Proposition 8.8. If $f, g \in L^1(\mathbb{R}^N)$ then $f * g \in L^1(\mathbb{R}^N)$ and
$$(f * g)^\wedge = (2\pi)^{N/2}\, \hat f\hat g \tag{8.5.27}$$
Proof: The fact that $f * g \in L^1(\mathbb{R}^N)$ is immediate from Fubini's theorem, or, alternatively, is a special case of Young's convolution inequality (7.4.2). To prove (8.5.27) we have
$$(f * g)^\wedge(z) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} (f * g)(x)e^{-ix\cdot z}\,dx \tag{8.5.28}$$
$$= \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \left(\int_{\mathbb{R}^N} f(x-y)g(y)\,dy\right)e^{-ix\cdot z}\,dx \tag{8.5.29}$$
$$= \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} g(y)e^{-iy\cdot z} \left(\int_{\mathbb{R}^N} f(x-y)e^{-i(x-y)\cdot z}\,dx\right)dy \tag{8.5.30}$$
$$= (2\pi)^{N/2}\, \hat f(z)\hat g(z) \tag{8.5.31}$$
with the exchange of order of integration justified by Fubini's theorem.
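A numerical spot check of (8.5.27) in one dimension (a sketch assuming NumPy is available; the Gaussians and the sample frequency are arbitrary choices):

```python
import numpy as np

# Verify (f*g)^ = sqrt(2 pi) fhat ghat at one frequency, for
# f(x) = exp(-x^2), g(x) = exp(-2 x^2), using Riemann sums throughout.
x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
f = np.exp(-x**2)
g = np.exp(-2 * x**2)

def ft(h, y):
    return np.sum(h * np.exp(-1j * x * y)) * dx / np.sqrt(2 * np.pi)

conv = np.convolve(f, g, mode="same") * dx  # samples of (f*g) on the grid

y0 = 0.8
lhs = ft(conv, y0)
rhs = np.sqrt(2 * np.pi) * ft(f, y0) * ft(g, y0)
err = abs(lhs - rhs)
```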
8.6 Fourier series of distributions
In this and the next section we will see how the theory of Fourier series and Fourier transforms can be extended to a distributional setting. To begin with, let us consider the case of the delta function, viewed as a distribution on $(-\pi,\pi)$. Formally speaking, if $\delta(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}$, then the coefficients $c_n$ should be given by
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \delta(x)e^{-inx}\,dx = \frac{1}{2\pi} \tag{8.6.1}$$
for every $n$, so that
$$\delta(x) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{inx} \tag{8.6.2}$$
Certainly this is not a valid formula in any classical sense, since the terms of the series do not decay to zero. On the other hand, the $N$'th partial sum of this series is precisely the Dirichlet kernel $D_N(x)$, as in (8.1.4) or (8.1.13), and one consequence of Theorem 8.2 is precisely that $D_N \to \delta$ in $\mathcal{D}'(-\pi,\pi)$. Thus we may expect to find Fourier series representations of distributions, provided that we allow for the series to converge in a distributional sense.

Note that since $D_N \to \delta$ we must also have, by Proposition 7.2, that
$$D_N' = \frac{i}{2\pi} \sum_{n=-N}^{N} n e^{inx} \to \delta' \tag{8.6.3}$$
as $N \to \infty$. By repeatedly differentiating, we see that any formal Fourier series $\sum_{n=-\infty}^{\infty} n^m e^{inx}$ is meaningful in the distributional sense, and is simply, up to a constant multiple, some derivative of the delta function. The following proposition shows that we can allow any sequence of Fourier coefficients as long as the rate of growth is at most a power of $n$.
Proposition 8.9. Let $\{c_n\}_{n=-\infty}^{\infty}$ be any sequence of constants satisfying
$$|c_n| \le C|n|^M \tag{8.6.4}$$
for some constant $C$ and positive integer $M$. Then there exists $T \in \mathcal{D}'(-\pi,\pi)$ such that
$$T = \sum_{n=-\infty}^{\infty} c_n e^{inx} \tag{8.6.5}$$
Proof: Let
$$g(x) = \sum_{n=-\infty}^{\infty} \frac{c_n}{(in)^{M+2}}\, e^{inx} \tag{8.6.6}$$
(with the $n = 0$ term omitted; the constant $c_0$ may be added separately), which is a uniformly convergent Fourier series, so in particular the partial sums $S_N \to g$ in the sense of distributions on $(-\pi,\pi)$. But then $S_N^{(j)} \to g^{(j)}$ also in the distributional sense, and in particular
$$\sum_{n=-\infty}^{\infty} c_n e^{inx} = T := g^{(M+2)} \tag{8.6.7}$$
It seems clear that any distribution on $\mathbb{R}$ of the form (8.6.5) should be $2\pi$-periodic since every partial sum is. To make this precise, define the translate of any distribution $T \in \mathcal{D}'(\mathbb{R}^N)$ by the natural definition $\tau_h T(\phi) = T(\tau_{-h}\phi)$, where as usual $\tau_h \phi(x) = \phi(x-h)$, $h \in \mathbb{R}^N$. We then say that $T$ is periodic with period $h \in \mathbb{R}^N$ if $\tau_h T = T$, and it is immediate that if $T_n$ is $h$-periodic and $T_n \to T$ in $\mathcal{D}'(\mathbb{R}^N)$ then $T$ is also $h$-periodic.

Example 8.3. The Fourier series identity (8.6.2) becomes
$$\sum_{n=-\infty}^{\infty} \delta(x - 2n\pi) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{inx} \tag{8.6.8}$$
when regarded as an identity in $\mathcal{D}'(\mathbb{R})$, since the left side is $2\pi$-periodic and coincides with $\delta$ on $(-\pi,\pi)$.
A $2\pi$-periodic distribution on $\mathbb{R}$ may also naturally be regarded as an element of the distribution space $\mathcal{D}'(\mathbb{T})$, which is defined as the space of continuous linear functionals on $C^\infty(\mathbb{T})$. Here, convergence in $C^\infty(\mathbb{T})$ means that $\phi_n^{(j)} \to \phi^{(j)}$ uniformly on $\mathbb{T}$ for all $j = 0, 1, 2, \dots$. Any function $f \in L^1(\mathbb{T})$ gives rise in the usual way to the regular distribution $T_f$ defined by $T_f(\phi) = \int_{-\pi}^{\pi} f(x)\phi(x)\,dx$, and if $f \in L^2$ then the $n$'th Fourier coefficient is $c_n = \frac{1}{2\pi} T_f(e^{-inx})$. Since $e^{-inx} \in C^\infty(\mathbb{T})$ it follows that
$$c_n = \frac{1}{2\pi}\, T(e^{-inx}) \tag{8.6.9}$$
is defined for $T \in \mathcal{D}'(\mathbb{T})$, and is defined to be the $n$'th Fourier coefficient of the distribution $T$. This definition is then consistent with the definition of Fourier coefficient for a regular distribution, and it can be shown (Exercise 29) that
$$\sum_{n=-N}^{N} c_n e^{inx} \to T \quad \text{in } \mathcal{D}'(\mathbb{T}) \tag{8.6.10}$$
Example 8.4. Let us evaluate the distributional Fourier series
$$\sum_{n=0}^{\infty} e^{inx} \tag{8.6.11}$$
The $n$'th partial sum is
$$s_n(x) = \sum_{k=0}^{n} e^{ikx} = \frac{1 - e^{i(n+1)x}}{1 - e^{ix}} \tag{8.6.12}$$
so that we may write, since $\int_{-\pi}^{\pi} s_n(x)\,dx = 2\pi$,
$$s_n(\phi) = 2\pi\phi(0) + \int_{-\pi}^{\pi} \frac{1 - e^{i(n+1)x}}{1 - e^{ix}}\, (\phi(x) - \phi(0))\,dx \tag{8.6.13}$$
for any test function $\phi$. The function $(\phi(x) - \phi(0))/(1 - e^{ix})$ belongs to $L^2(-\pi,\pi)$, hence
$$\int_{-\pi}^{\pi} \frac{e^{i(n+1)x}}{1 - e^{ix}}\, (\phi(x) - \phi(0))\,dx \to 0 \tag{8.6.14}$$
as $n \to \infty$ by the Riemann-Lebesgue lemma. Next, using obvious trigonometric identities we see that $1/(1 - e^{ix}) = \frac{1}{2}(1 + i\cot\frac{x}{2})$, and so
$$\int_{-\pi}^{\pi} \frac{\phi(x) - \phi(0)}{1 - e^{ix}}\,dx = \lim_{\epsilon\to 0^+} \frac{1}{2} \int_{\epsilon<|x|<\pi} (\phi(x) - \phi(0))\left(1 + i\cot\frac{x}{2}\right)dx \tag{8.6.15}$$
$$= \frac{1}{2} \int_{-\pi}^{\pi} \phi(x)\,dx - \pi\phi(0) \tag{8.6.16}$$
$$+ \lim_{\epsilon\to 0^+} \frac{i}{2} \int_{\epsilon<|x|<\pi} \phi(x)\cot\frac{x}{2}\,dx \tag{8.6.17}$$
The principal value integral in (8.6.17) is naturally defined to be the action of the distribution $\mathrm{pv}(\cot\frac{x}{2})$, and we obtain the final result, upon letting $n \to \infty$, that
$$\sum_{n=0}^{\infty} e^{inx} = \pi\delta + \frac{1}{2} + \frac{i}{2}\,\mathrm{pv}\left(\cot\frac{x}{2}\right) \tag{8.6.18}$$
By taking the real and imaginary parts of this identity we also find
$$\sum_{n=0}^{\infty} \cos nx = \pi\delta + \frac{1}{2} \qquad \sum_{n=1}^{\infty} \sin nx = \frac{1}{2}\,\mathrm{pv}\left(\cot\frac{x}{2}\right) \tag{8.6.19}$$

8.7 Fourier transforms of distributions
Taking again the example of the delta function, now considered as a distribution on $\mathbb{R}^N$, it appears formally correct that it should have a Fourier transform which is a constant function, namely
$$\hat\delta(y) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \delta(x)e^{-ix\cdot y}\,dx = \frac{1}{(2\pi)^{N/2}} \tag{8.7.1}$$
If the inversion theorem remains valid then any constant should also have a Fourier transform, e.g. $\hat 1 = (2\pi)^{N/2}\delta$. On the other hand it will turn out that a function such as $e^x$ does not have a Fourier transform in any reasonable sense.
We will now show that the set of distributions for which the Fourier transform can be defined turns out to be precisely the dual space of the Schwartz space, known also as the space of tempered distributions. To define this we must first have a definition of convergence in $\mathcal{S}(\mathbb{R}^N)$.

Definition 8.2. We say that $\phi_n \to \phi$ in $\mathcal{S}(\mathbb{R}^N)$ if
$$\lim_{n\to\infty} ||x^\alpha D^\beta(\phi_n - \phi)||_{L^\infty(\mathbb{R}^N)} = 0 \quad \text{for any } \alpha, \beta \tag{8.7.2}$$
Proof of the following lemma will be left for the exercises.

Lemma 8.1. If $\phi_n \to \phi$ in $\mathcal{S}(\mathbb{R}^N)$ then $\hat\phi_n \to \hat\phi$ in $\mathcal{S}(\mathbb{R}^N)$.

Definition 8.3. The set of tempered distributions on $\mathbb{R}^N$ is the space of continuous linear functionals on $\mathcal{S}(\mathbb{R}^N)$, denoted $\mathcal{S}'(\mathbb{R}^N)$.
It was already observed that $\mathcal{D}(\mathbb{R}^N) \subset \mathcal{S}(\mathbb{R}^N)$ and in addition, if $\phi_n \to \phi$ in $\mathcal{D}(\mathbb{R}^N)$ then the sequence also converges in $\mathcal{S}(\mathbb{R}^N)$. It therefore follows that
$$\mathcal{S}'(\mathbb{R}^N) \subset \mathcal{D}'(\mathbb{R}^N) \tag{8.7.3}$$
i.e. any tempered distribution is also a distribution, as the choice of language suggests. On the other hand, if $T_f$ is the regular distribution corresponding to the $L^1_{loc}$ function $f(x) = e^x$, then $T_f \notin \mathcal{S}'(\mathbb{R})$ since this would require $\int_{-\infty}^{\infty} e^x\phi(x)\,dx$ to be finite for any $\phi \in \mathcal{S}(\mathbb{R})$, which is not true. Thus the inclusion (8.7.3) is strict. Convergence in $\mathcal{S}'(\mathbb{R}^N)$ is defined in the expected way, analogously to Definition 7.5:
Definition 8.4. If $T, T_n \in \mathcal{S}'(\mathbb{R}^N)$ for $n = 1, 2, \dots$ then we say $T_n \to T$ in $\mathcal{S}'(\mathbb{R}^N)$ (or in the sense of tempered distributions) if $T_n(\phi) \to T(\phi)$ for every $\phi \in \mathcal{S}(\mathbb{R}^N)$.

It is easy to see that the delta function belongs to $\mathcal{S}'(\mathbb{R}^N)$, as does any derivative or translate of the delta function. A regular distribution $T_f$ will belong to $\mathcal{S}'(\mathbb{R}^N)$ provided it satisfies the condition
$$\lim_{|x|\to\infty} \frac{f(x)}{|x|^m} = 0 \tag{8.7.4}$$
for some $m$. Such an $f$ is sometimes referred to as a function of slow growth. In particular, any polynomial belongs to $\mathcal{S}'(\mathbb{R}^N)$.
We can now define the Fourier transform $\hat T$ for any $T \in \mathcal{S}'(\mathbb{R}^N)$. For motivation of the definition, recall the Parseval identity (8.5.17), which amounts to the identity $T_{\hat\psi}(\phi) = T_\psi(\hat\phi)$, if we regard $\phi$ as a function in $\mathcal{S}(\mathbb{R}^N)$ and $\psi$ as a tempered distribution.

Definition 8.5. If $T \in \mathcal{S}'(\mathbb{R}^N)$ then $\hat T$ is defined by $\hat T(\phi) = T(\hat\phi)$ for any $\phi \in \mathcal{S}(\mathbb{R}^N)$.

The action of $\hat T$ on any $\phi \in \mathcal{S}(\mathbb{R}^N)$ is well-defined, since $\hat\phi \in \mathcal{S}(\mathbb{R}^N)$, and linearity of $\hat T$ is immediate. If $\phi_n \to \phi$ in $\mathcal{S}(\mathbb{R}^N)$ then by Lemma 8.1 $\hat\phi_n \to \hat\phi$ in $\mathcal{S}(\mathbb{R}^N)$, so that
$$\hat T(\phi_n) = T(\hat\phi_n) \to T(\hat\phi) = \hat T(\phi) \tag{8.7.5}$$
We have thus verified that $\hat T \in \mathcal{S}'(\mathbb{R}^N)$ whenever $T \in \mathcal{S}'(\mathbb{R}^N)$.
Example 8.5. If $T = \delta$, then from the definition,
$$\hat T(\phi) = T(\hat\phi) = \hat\phi(0) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \phi(x)\,dx \tag{8.7.6}$$
Thus, as expected, $\hat\delta = \frac{1}{(2\pi)^{N/2}}$, the constant distribution.
Example 8.6. If $T = 1$ (the constant distribution) then
$$\hat T(\phi) = T(\hat\phi) = \int_{\mathbb{R}^N} \hat\phi(x)\,dx = (2\pi)^{N/2}\,\hat{\hat\phi}(0) = (2\pi)^{N/2}\,\phi(0) \tag{8.7.7}$$
where the last equality follows from the inversion theorem, which is valid for any $\phi \in \mathcal{S}(\mathbb{R}^N)$. Thus again the expected result is obtained,
$$\hat 1 = (2\pi)^{N/2}\,\delta \tag{8.7.8}$$
The previous two examples verify the validity of one particular instance of the Fourier inversion theorem in the distributional context, but it turns out to be rather easy to prove that it always holds. One more definition is needed first, that of the reflection of a distribution.

Definition 8.6. If $T \in \mathcal{D}'(\mathbb{R}^N)$ then $\check T$, the reflection of $T$, is the distribution defined by $\check T(\phi) = T(\check\phi)$.
We now obtain the Fourier inversion theorem in its most general form, analogous to the statement (8.4.16) first justified when $f, \hat f$ are in $L^1(\mathbb{R}^N)$.

Theorem 8.6. If $T \in \mathcal{S}'(\mathbb{R}^N)$ then $\hat{\hat T} = \check T$.

Proof: For any $\phi \in \mathcal{S}(\mathbb{R}^N)$ we have
$$\hat{\hat T}(\phi) = \hat T(\hat\phi) = T(\hat{\hat\phi}) = T(\check\phi) = \check T(\phi) \tag{8.7.9}$$

The apparent triviality of this proof should not be misconstrued, as it relies on the validity of the inversion theorem in the Schwartz space, and other technical machinery which we have developed.
Here we state several more simple but useful properties. Here and elsewhere, we follow the convention of using $x$ and $y$ as the independent variables before and after Fourier transformation respectively.

Proposition 8.10. Let $T \in \mathcal{S}'(\mathbb{R}^N)$ and let $\alpha$ be a multi-index. Then

1. $x^\alpha T \in \mathcal{S}'(\mathbb{R}^N)$.
2. $D^\alpha T \in \mathcal{S}'(\mathbb{R}^N)$.
3. $D^\alpha \hat T = ((-ix)^\alpha T)^\wedge$.
4. $(D^\alpha T)^\wedge = (iy)^\alpha \hat T$.
5. If $T_n \in \mathcal{S}'(\mathbb{R}^N)$ and $T_n \to T$ in $\mathcal{S}'(\mathbb{R}^N)$ then $\hat T_n \to \hat T$ in $\mathcal{S}'(\mathbb{R}^N)$.

Proof: We give the proof of part 3 only, leaving the rest for the exercises. Just like the inversion theorem, it is more or less a direct consequence of the corresponding identity for functions in $\mathcal{S}(\mathbb{R}^N)$. For any $\phi \in \mathcal{S}(\mathbb{R}^N)$ we have
$$D^\alpha \hat T(\phi) = (-1)^{|\alpha|}\, \hat T(D^\alpha \phi) \tag{8.7.10}$$
$$= (-1)^{|\alpha|}\, T((D^\alpha \phi)^\wedge) \tag{8.7.11}$$
$$= (-1)^{|\alpha|}\, T((iy)^\alpha \hat\phi) \tag{8.7.12}$$
$$= ((-ix)^\alpha T)(\hat\phi) = ((-ix)^\alpha T)^\wedge(\phi) \tag{8.7.13}$$
as needed, where we used (8.5.7) to obtain (8.7.12).
Example 8.7. If $T = \delta'$, regarded as an element of $\mathcal{S}'(\mathbb{R})$, then
$$\hat T = (\delta')^\wedge = iy\,\hat\delta = \frac{iy}{\sqrt{2\pi}} \tag{8.7.14}$$
by part 4 of the previous proposition. In other words
$$\hat T(\phi) = \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x\,\phi(x)\,dx \tag{8.7.15}$$
Example 8.8. Let $T = H(x)$, the Heaviside function, again regarded as an element of $\mathcal{S}'(\mathbb{R})$. To evaluate the Fourier transform $\hat H$, one possible approach is to use part 4 of Proposition 8.10 along with $H' = \delta$ to first obtain $iy\hat H = 1/\sqrt{2\pi}$. A formal solution is then $\hat H = 1/\sqrt{2\pi}\,iy$, but it must then be recognized that this distributional equation does not have a unique solution, rather we can add to it any solution of $yT = 0$, e.g. $T = C\delta$ for any constant $C$. It must be verified that there are no other solutions, the constant $C$ must be evaluated, and the meaning of $1/y$ in the distribution sense must be made precise. See Example 8, section 2.4 of [32] for details of how this calculation is completed.

An alternate approach, which yields other useful formulas along the way, is as follows. For any $\phi \in \mathcal{S}(\mathbb{R})$ we have
$$\hat H(\phi) = H(\hat\phi) = \int_0^\infty \hat\phi(y)\,dy \tag{8.7.16}$$
$$= \frac{1}{\sqrt{2\pi}} \int_0^\infty\int_{-\infty}^{\infty} \phi(x)e^{-ixy}\,dx\,dy \tag{8.7.17}$$
$$= \lim_{R\to\infty} \frac{1}{\sqrt{2\pi}} \int_0^R\int_{-\infty}^{\infty} \phi(x)e^{-ixy}\,dx\,dy \tag{8.7.18}$$
$$= \lim_{R\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi(x)\left(\int_0^R e^{-ixy}\,dy\right)dx \tag{8.7.19}$$
$$= \lim_{R\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi(x)\left(\frac{1 - e^{-iRx}}{ix}\right)dx \tag{8.7.20}$$
$$= \lim_{R\to\infty} \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\sin Rx}{x}\,\phi(x)\,dx + \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\cos Rx - 1}{x}\,\phi(x)\,dx\right] \tag{8.7.21}$$
It can then be verified that
$$\frac{\sin Rx}{x} \to \pi\delta \qquad \frac{\cos Rx - 1}{x} \to -\mathrm{pv}\,\frac{1}{x} \tag{8.7.22}$$
as $R \to \infty$ in $\mathcal{D}'(\mathbb{R})$. The first limit is just a restatement of the result of part b) in Exercise 7 of Chapter 7, and the second we leave for the exercises. The final result, therefore, is that
$$\hat H = \sqrt{\frac{\pi}{2}}\,\delta - \frac{i}{\sqrt{2\pi}}\,\mathrm{pv}\,\frac{1}{x} \tag{8.7.23}$$
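Formula (8.7.23) can be spot-checked against a concrete test function (a sketch assuming NumPy is available; the test function $\phi(x) = (x+1)e^{-x^2/2}$, with $\hat\phi(y) = (1 - iy)e^{-y^2/2}$, is our choice): the left side computes $\hat H(\phi) = \int_0^\infty \hat\phi$, the right side evaluates $\sqrt{\pi/2}\,\phi(0) - \frac{i}{\sqrt{2\pi}}\,\mathrm{pv}\!\int \phi(x)/x\,dx$.

```python
import numpy as np

# Left side: H(phihat) = integral of phihat over (0, infinity).
y = np.linspace(0.0, 30.0, 300001)
dy = y[1] - y[0]
phihat = (1 - 1j * y) * np.exp(-y**2 / 2)
lhs = (np.sum(phihat) - 0.5 * phihat[0] - 0.5 * phihat[-1]) * dy  # trapezoid

# Right side: sqrt(pi/2) phi(0) - (i/sqrt(2 pi)) pv integral of phi(x)/x.
# The symmetric grid below omits x = 0 exactly, so the odd singular part
# of phi(x)/x cancels pairwise, which realizes the principal value.
x = np.linspace(-30.0, 30.0, 600000)
dx = x[1] - x[0]
phi = (x + 1) * np.exp(-x**2 / 2)
pv = np.sum(phi / x) * dx
rhs = np.sqrt(np.pi / 2) * 1.0 - 1j / np.sqrt(2 * np.pi) * pv
err = abs(lhs - rhs)
```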
Example 8.9. Let $T_n = \delta(x - n)$, i.e. $T_n(\phi) = \phi(n)$, for $n = 0, \pm 1, \dots$, so that
$$\hat T_n(\phi) = \hat\phi(n) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi(x)e^{-inx}\,dx \tag{8.7.24}$$
Equivalently, $\sqrt{2\pi}\,\hat T_n = e^{-inx}$. If we now set $T = \sum_{n=-\infty}^{\infty} T_n$ then $T \in \mathcal{S}'(\mathbb{R})$ and
$$\hat T = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} e^{-inx} = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} e^{inx} = \sqrt{2\pi} \sum_{n=-\infty}^{\infty} \delta(x - 2\pi n) \tag{8.7.25}$$
where the last equality comes from (8.6.8). The relation $T(\hat\phi) = \hat T(\phi)$ then yields the very interesting identity
$$\sum_{n=-\infty}^{\infty} \hat\phi(n) = \sqrt{2\pi} \sum_{n=-\infty}^{\infty} \phi(2\pi n) \tag{8.7.26}$$
valid at least for $\phi \in \mathcal{S}(\mathbb{R})$, which is known as the Poisson summation formula.
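The formula is easy to verify numerically for the self-dual Gaussian $\phi(x) = e^{-x^2/2}$, for which $\hat\phi = \phi$ by (8.4.6) with $\alpha = \frac{1}{2}$ (a sketch assuming NumPy is available; the truncation at $|n| \le 50$ is harmless since both sums converge extremely fast):

```python
import numpy as np

# Poisson summation (8.7.26) for phi(x) = exp(-x^2/2) = phihat(x):
# sum of phihat(n) should equal sqrt(2 pi) times sum of phi(2 pi n).
n = np.arange(-50, 51)
lhs = np.sum(np.exp(-n**2 / 2))
rhs = np.sqrt(2 * np.pi) * np.sum(np.exp(-(2 * np.pi * n)**2 / 2))
err = abs(lhs - rhs)
```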
We conclude this section with some discussion of the Fourier transform and convolution in a distributional setting. Recall we gave a definition of the convolution $T * \phi$ in Definition 7.7, when $T \in \mathcal{D}'(\mathbb{R}^N)$ and $\phi \in \mathcal{D}(\mathbb{R}^N)$. We can use precisely the same definition if $T \in \mathcal{S}'(\mathbb{R}^N)$ and $\phi \in \mathcal{S}(\mathbb{R}^N)$, that is

Definition 8.7. If $T \in \mathcal{S}'(\mathbb{R}^N)$ and $\phi \in \mathcal{S}(\mathbb{R}^N)$ then $(T * \phi)(x) = T(\tau_x\check\phi)$.

Note that in terms of the action of the distribution $T$, $x$ is just a parameter, and that we must regard $\tau_x\check\phi$ as a function of some unnamed other variable, say $y$ or $\cdot$. By methods similar to those used in the proof of Theorem 7.3 it can be shown that
$$T * \phi \in C^\infty(\mathbb{R}^N) \cap \mathcal{S}'(\mathbb{R}^N) \tag{8.7.27}$$
and
$$D^\alpha(T * \phi) = D^\alpha T * \phi = T * D^\alpha\phi \tag{8.7.28}$$
In addition we have the following generalization of Proposition 8.8:
Theorem 8.7. If $T \in \mathcal{S}'(\mathbb{R}^N)$ and $\phi \in \mathcal{S}(\mathbb{R}^N)$ then
$$(T * \phi)^\wedge = (2\pi)^{N/2}\,\hat T\hat\phi \tag{8.7.29}$$
Sketch of proof: First observe that from Proposition 8.8 and the inversion theorem we see that
$$(\phi\psi)^\wedge = \frac{1}{(2\pi)^{N/2}}\,(\hat\phi * \hat\psi) \tag{8.7.30}$$
for $\phi, \psi \in \mathcal{S}(\mathbb{R}^N)$. Thus for $\psi \in \mathcal{S}(\mathbb{R}^N)$
$$(\hat T\hat\phi)(\psi) = \hat T(\hat\phi\psi) = T((\hat\phi\psi)^\wedge) = \frac{1}{(2\pi)^{N/2}}\,T(\hat{\hat\phi} * \hat\psi) = \frac{1}{(2\pi)^{N/2}}\,T(\check\phi * \hat\psi) \tag{8.7.31}$$
On the other hand,
$$(T * \phi)^\wedge(\psi) = (T * \phi)(\hat\psi) \tag{8.7.32}$$
$$= \int_{\mathbb{R}^N} (T * \phi)(x)\hat\psi(x)\,dx = \int_{\mathbb{R}^N} T(\tau_x\check\phi)\hat\psi(x)\,dx \tag{8.7.33}$$
$$= T\left(\int_{\mathbb{R}^N} \tau_x\check\phi(\cdot)\hat\psi(x)\,dx\right) = T\left(\int_{\mathbb{R}^N} \check\phi(\cdot - x)\hat\psi(x)\,dx\right) \tag{8.7.34}$$
$$= T(\check\phi * \hat\psi) \tag{8.7.35}$$
which completes the proof.

We have labeled the above proof a 'sketch' because one key step, the first equality in (8.7.34), was not explained adequately. See the conclusion of the proof of Theorem 7.19 in [30] for why it is permissible to move $T$ across the integral in this way.
8.8 Exercises

1. Find the Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ for the function $f(x) = x$ on $(-\pi,\pi)$. Use some sort of computer graphics to plot a few of the partial sums of this series on the interval $[-3\pi, 3\pi]$.
2. Use the Fourier series in problem 1 to find the exact value of the series
$$\sum_{n=1}^{\infty} \frac{1}{n^2} \qquad \sum_{n=1}^{\infty} \frac{1}{(2n-1)^2}$$
3. Evaluate explicitly the Fourier series, justifying your steps:
$$\sum_{n=1}^{\infty} \frac{n}{2^n} \cos(nx)$$
(Suggestion: start by evaluating $\sum_{n=1}^{\infty} \frac{e^{inx}}{2^n}$, which is a geometric series.)
4. Produce a sketch of the Dirichlet and Fejér kernels $D_N$ and $K_N$, either by hand or by computer, for some reasonably large value of $N$.

5. Verify the first identity in (8.1.19).

6. We say that $f \in H^k(\mathbb{T})$ if $f \in \mathcal{D}'(\mathbb{T})$ and its Fourier coefficients $c_n$ satisfy
$$\sum_{n=-\infty}^{\infty} n^{2k}|c_n|^2 < \infty \tag{8.8.1}$$
a) If $f \in H^1(\mathbb{T})$ show that $\sum_{n=-\infty}^{\infty} |c_n|$ is convergent and so the Fourier series of $f$ is uniformly convergent.

b) Show that $f \in H^k(\mathbb{T})$ for every $k$ if and only if $f \in C^\infty(\mathbb{T})$.
7. Evaluate the Fourier series
$$\sum_{n=1}^{\infty} (-1)^n n\sin(nx)$$
in $\mathcal{D}'(\mathbb{R})$. If possible, plot some partial sums of this series.

8. Find the Fourier transform of $H(x)e^{-\alpha x}$ for $\alpha > 0$.

9. Let $f \in L^1(\mathbb{R}^N)$.

a) If $f_\lambda(x) = f(\lambda x)$ for $\lambda > 0$, find a relationship between $\hat f_\lambda$ and $\hat f$.

b) If $f_h(x) = f(x - h)$ for $h \in \mathbb{R}^N$, find a relationship between $\hat f_h$ and $\hat f$.

10. If $f \in L^1(\mathbb{R}^N)$ show that $\tau_h f \to f$ in $L^1(\mathbb{R}^N)$ as $h \to 0$. (Hint: First prove it when $f$ is continuous and of compact support.)
11. Show that
$$\int_{\mathbb{R}^N} \phi(x)\overline{\psi(x)}\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\overline{\hat\psi(x)}\,dx \tag{8.8.2}$$
for $\phi$ and $\psi$ in the Schwartz space. (This is also sometimes called the Parseval identity and leads even more directly to the Plancherel formula.)

12. Prove Lemma 8.1.
13. In this problem $J_n$ denotes the Bessel function of the first kind and of order $n$. It may be defined in various ways, one of which is
$$J_n(z) = \frac{i^{-n}}{\pi} \int_0^\pi e^{iz\cos\theta}\cos(n\theta)\,d\theta \tag{8.8.3}$$
Suppose that $f$ is a radially symmetric function in $L^1(\mathbb{R}^2)$, i.e. $f(x) = f(r)$ where $r = |x|$. Show that
$$\hat f(y) = \int_0^\infty J_0(r|y|)f(r)\,r\,dr$$
It follows in particular that $\hat f$ is also radially symmetric. Using the known identity $\frac{d}{dz}(zJ_1(z)) = zJ_0(z)$, compute the Fourier transform of $\chi_{B(0,R)}$, the indicator function of the ball $B(0,R)$ in $\mathbb{R}^2$.
14. For $\alpha \in \mathbb{R}$ let $f_\alpha(x) = \cos\alpha x$.

a) Find the Fourier transform $\hat f_\alpha$.

b) Find $\lim_{\alpha\to 0} \hat f_\alpha$ and $\lim_{\alpha\to\infty} \hat f_\alpha$ in the sense of distributions.

15. Compute the Fourier transform of the Heaviside function $H(x)$ in yet a different way by justifying that
$$\hat H = \lim_{n\to\infty} \hat H_n$$
in the sense of distributions, where $H_n(x) = H(x)e^{-x/n}$, and then evaluating this limit.
16. Prove the remaining parts of Proposition 8.10.

17. Let $f \in C(\mathbb{R})$ be $2\pi$-periodic. It then has a Fourier series in the classical sense, but it also has a Fourier transform since $f$ is a tempered distribution. What is the relationship between the Fourier series and the Fourier transform?

18. Let $f \in L^2(\mathbb{R}^N)$. Show that $f$ is real valued if and only if $\hat f(-k) = \overline{\hat f(k)}$ for all $k \in \mathbb{R}^N$. What is the analog of this for Fourier series?
19. Let $f$ be a continuous $2\pi$-periodic function with the usual Fourier coefficients
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$$
Show that
$$c_n = -\frac{1}{2\pi} \int_{-\pi}^{\pi} f\!\left(x + \frac{\pi}{n}\right)e^{-inx}\,dx$$
and therefore
$$c_n = \frac{1}{4\pi} \int_{-\pi}^{\pi} \left(f(x) - f\!\left(x + \frac{\pi}{n}\right)\right)e^{-inx}\,dx.$$
If $f$ is Lipschitz continuous, use this to show that there exists a constant $M$ such that
$$|c_n| \le \frac{M}{|n|} \qquad n \ne 0$$
20. Let $R = (-1,1) \times (-1,1)$ be a square in $\mathbb{R}^2$, let $f$ be the indicator function of $R$ and $g$ be the indicator function of the complement of $R$.

a) Compute the Fourier transforms $\hat f$ and $\hat g$.

b) Is either $\hat f$ or $\hat g$ in $L^2(\mathbb{R}^2)$?

21. Verify the second limit in (8.7.22).

22. A distribution $T$ on $\mathbb{R}^N$ is even if $\check T = T$, and odd if $\check T = -T$. Prove that the Fourier transform of an even (resp. odd) tempered distribution is even (resp. odd).
23. Let $\psi \in \mathcal{S}(\mathbb{R})$ with $\|\psi\|_{L^2(\mathbb{R})} = 1$, and show that
$$\left(\int_{-\infty}^{\infty} x^2|\psi(x)|^2\,dx\right)\left(\int_{-\infty}^{\infty} y^2|\hat{\psi}(y)|^2\,dy\right) \ge \frac{1}{4} \qquad (8.8.4)$$
This is a mathematical statement of the Heisenberg uncertainty principle. (Suggestion: start with the identity
$$1 = \int_{-\infty}^{\infty} |\psi(x)|^2\,dx = -\int_{-\infty}^{\infty} x\,\frac{d}{dx}|\psi(x)|^2\,dx$$
Make sure to allow $\psi$ to be complex valued.) Show that equality is achieved in
(8.8.4) if $\psi$ is a Gaussian.
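For the normalized Gaussian $\psi(x) = \pi^{-1/4}e^{-x^2/2}$ one has $\hat{\psi} = \psi$ under the unitary transform convention used in these notes, so the equality case of (8.8.4) reduces to computing the single moment $\int x^2|\psi(x)|^2\,dx = 1/2$; a rough trapezoid-rule sketch (interval and step count are arbitrary choices):

```python
import math

# |psi(x)|^2 for the normalized Gaussian psi(x) = pi^{-1/4} e^{-x^2/2}
psi2 = lambda x: math.exp(-x * x) / math.sqrt(math.pi)

def trap(F, a=-10.0, b=10.0, n=8000):
    # simple trapezoid rule on [a, b]
    h = (b - a) / n
    return h * (sum(F(a + k * h) for k in range(1, n)) + 0.5 * (F(a) + F(b)))

moment = trap(lambda x: x * x * psi2(x))   # int x^2 |psi|^2 dx, should be 1/2
# psi-hat equals psi for this Gaussian, so the product in (8.8.4) is moment**2 = 1/4
```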
24. Let $\theta(t) = \sum_{n=-\infty}^{\infty} e^{-\pi n^2 t}$. (It is a particular case of a class of special functions
known as theta functions.) Use the Poisson summation formula (8.7.26) to show
that
$$\theta(t) = \frac{1}{\sqrt{t}}\,\theta\!\left(\frac{1}{t}\right)$$
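The functional equation of $\theta$ is easy to verify numerically, since the series converges extremely fast for $t$ bounded away from $0$; a minimal sketch (the value $t = 0.7$ and the truncation level are arbitrary):

```python
import math

def theta(t, terms=200):
    # truncated theta series; terms far out underflow harmlessly to 0.0
    return sum(math.exp(-math.pi * n * n * t) for n in range(-terms, terms + 1))

t = 0.7
lhs = theta(t)
rhs = theta(1 / t) / math.sqrt(t)   # (1/sqrt(t)) * theta(1/t)
```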
25. Use (8.7.23) to obtain the Fourier transform of $\mathrm{pv}\,\frac{1}{x}$,
$$\Big(\mathrm{pv}\,\frac{1}{x}\Big)^{\widehat{}}\,(y) = -i\sqrt{\frac{\pi}{2}}\,\mathrm{sgn}\,y \qquad (8.8.5)$$
26. The proof of Theorem 8.7 implicitly used the fact that if $\phi, \psi \in \mathcal{S}(\mathbb{R}^N)$ then
$\phi * \psi \in \mathcal{S}(\mathbb{R}^N)$. Prove this property.
27. Where is the mistake in the following argument? If $u(x) = e^{-x}$ then $u' + u = 0$, so
by Fourier transformation
$$iy\hat{u}(y) + \hat{u}(y) = (1 + iy)\hat{u}(y) = 0 \qquad y \in \mathbb{R}$$
Since $1 + iy \ne 0$ for real $y$, it follows that $\hat{u}(y) = 0$ for all real $y$ and hence $u(x) = 0$.
28. If $f \in L^2(\mathbb{R}^N)$, the autocorrelation function of $f$ is defined to be
$$g(x) = (f * \bar{\check{f}})(x) = \int_{\mathbb{R}^N} f(y)\overline{f(y-x)}\,dy$$
Show that $\hat{g}(y) = |\hat{f}(y)|^2$, $\hat{g} \in L^1(\mathbb{R}^N)$ and that $g \in C_0(\mathbb{R}^N)$. ($\hat{g}$ is called the power
spectrum or spectral density of $f$.)
29. If $T \in \mathcal{D}'(\mathbb{T})$ and $c_n = T(e^{-inx})$, show that $T = \sum_{n=-\infty}^{\infty} c_n e^{inx}$ in $\mathcal{D}'(\mathbb{T})$.
30. The ODE $u'' - xu = 0$ is known as Airy's equation, and solutions of it are called
Airy functions.
a) If $u$ is an Airy function which is also a tempered distribution, use the Fourier
transform to find a first order ODE for $\hat{u}(y)$.
b) Find the general solution of the ODE for $\hat{u}$.
c) Obtain the formal solution formula
$$u(x) = C\int_{-\infty}^{\infty} e^{ixy + iy^3/3}\,dy$$
d) Explain why this formula is not meaningful as an ordinary integral, and how it
can be properly interpreted.
e) Is this the general solution of the Airy equation?
Chapter 9
Distributions and Differential
Equations
In this chapter we will begin to apply the theory of distributions developed in the previous chapter in a more systematic way to problems in differential equations. The modern
theory of partial differential equations, and to a somewhat lesser extent ordinary differential equations, makes extensive use of the so-called Sobolev spaces, which we now proceed
to introduce.
9.1 Weak derivatives and Sobolev spaces
If $f \in L^p(\Omega)$ then for any multiindex $\alpha$ we know that $D^\alpha f$ exists as an element of $\mathcal{D}'(\Omega)$,
but in general the distributional derivative need not itself be a function. However, if there
exists $g \in L^q(\Omega)$ such that $D^\alpha f = T_g$ in $\mathcal{D}'(\Omega)$, then we say that $f$ has the weak $\alpha$
derivative $g$ in $L^q(\Omega)$. That is to say, the requirement is that
$$\int_\Omega f D^\alpha\phi\,dx = (-1)^{|\alpha|}\int_\Omega g\phi\,dx \qquad \forall\phi \in \mathcal{D}(\Omega) \qquad (9.1.1)$$
and we write $D^\alpha f \in L^q(\Omega)$. It is important to distinguish the concept of weak derivative
from that of the almost everywhere (a.e.) derivative.
Example 9.1. Let $\Omega = (-1,1)$ and $f(x) = |x|$. Obviously $f \in L^p(\Omega)$ for any $1 \le p \le \infty$,
and in the sense of distributions we have $f'(x) = 2H(x) - 1$ (use, for example, (7.3.27)).
Thus $f' \in L^q(\Omega)$ for any $1 \le q \le \infty$. On the other hand $f'' = 2\delta$, which does not coincide
with $T_g$ for any $g$ in any $L^q$ space. Thus $f$ has the weak first derivative, but not the
weak second derivative, in $L^q(\Omega)$ for any $q$. The first derivative of $f$ coincides with its
a.e. derivative. In the case of the second derivative, $f'' = 2\delta$ in the sense of distributions,
and obviously $f'' = 0$ a.e., but this function does not coincide with the weak second
derivative; indeed there is no weak second derivative according to the above definition.
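The defining identity (9.1.1) for the weak first derivative of $f(x) = |x|$ can be checked numerically against one concrete test function; the choice $\phi(x) = x(1-x^2)^3$ below is mine, and for it both sides of (9.1.1) evaluate to $-1/4$:

```python
import math

f = abs                                   # f(x) = |x| on (-1, 1)
g = lambda x: math.copysign(1.0, x)       # candidate weak derivative, g = 2H - 1
phi  = lambda x: x * (1 - x * x) ** 3     # a test function in D(-1, 1)
dphi = lambda x: (1 - x * x) ** 3 - 6 * x * x * (1 - x * x) ** 2

def trap(F, a=-1.0, b=1.0, n=4000):
    # trapezoid rule; the kink of |x| at 0 falls on a grid node
    h = (b - a) / n
    return h * (sum(F(a + k * h) for k in range(1, n)) + 0.5 * (F(a) + F(b)))

lhs = trap(lambda x: f(x) * dphi(x))      # int f phi' dx
rhs = -trap(lambda x: g(x) * phi(x))      # -int g phi dx, cf. (9.1.1) with |alpha| = 1
```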
We may now define the spaces $W^{k,p}(\Omega)$, known as Sobolev spaces.
Definition 9.1. If $\Omega \subset \mathbb{R}^N$ is an open set, $1 \le p \le \infty$ and $k = 1, 2, \dots$ then
$$W^{k,p}(\Omega) := \{f \in \mathcal{D}'(\Omega) : D^\alpha f \in L^p(\Omega), \ |\alpha| \le k\} \qquad (9.1.2)$$
We emphasize that the meaning of the condition $D^\alpha f \in L^p(\Omega)$ is that $f$ should have
the weak $\alpha$ derivative in $L^p(\Omega)$ as discussed above. Clearly
$$\mathcal{D}(\Omega) \subset W^{k,p}(\Omega) \subset L^p(\Omega) \qquad (9.1.3)$$
so that $W^{k,p}(\Omega)$ is always a dense subspace of $L^p(\Omega)$ for $1 \le p < \infty$.
Example 9.2. If $f(x) = |x|$ then, referring to the discussion in the previous example, we
see that $f \in W^{1,p}(-1,1)$ for any $p \in [1,\infty]$, but $f \notin W^{2,p}$ for any $p$.
It may be readily checked that $W^{k,p}(\Omega)$ is a normed linear space with norm
$$\|f\|_{W^{k,p}(\Omega)} = \begin{cases} \Big(\sum_{|\alpha|\le k} \|D^\alpha f\|^p_{L^p(\Omega)}\Big)^{\frac{1}{p}} & 1 \le p < \infty \\[1ex] \max_{|\alpha|\le k} \|D^\alpha f\|_{L^\infty(\Omega)} & p = \infty \end{cases} \qquad (9.1.4)$$
Furthermore, the necessary completeness property can be shown (Exercise 5, or see Theorem 9.1 below) so that $W^{k,p}(\Omega)$ is a Banach space. When $p = 2$ the norm may be
regarded as arising from the inner product
$$\langle f, g\rangle = \sum_{|\alpha|\le k} \int_\Omega D^\alpha f(x)\overline{D^\alpha g(x)}\,dx \qquad (9.1.5)$$
so that it is a Hilbert space. The alternative notation $H^k(\Omega)$ is commonly used in place
of $W^{k,2}(\Omega)$.
There is a second natural way to give meaning to the idea of a function $f \in L^p(\Omega)$
having a derivative in an $L^q$ space, which is as follows: if there exists $g \in L^q(\Omega)$ such
that there exists $f_n \in C^\infty(\Omega)$ satisfying $f_n \to f$ in $L^p(\Omega)$ and $D^\alpha f_n \to g$ in $L^q(\Omega)$, then
we say $f$ has the strong $\alpha$ derivative $g$ in $L^q(\Omega)$.
It is elementary to see that a strong derivative is also a weak derivative: we simply
let $n \to \infty$ in the identity
$$\int_\Omega \phi D^\alpha f_n\,dx = (-1)^{|\alpha|}\int_\Omega f_n D^\alpha\phi\,dx \qquad (9.1.6)$$
for any test function $\phi$. Far more interesting is that when $p < \infty$ the converse statement is
also true, that is, weak=strong. This important result, which shall not be proved here, was
first established by Friedrichs [12] in some special situations, and then in full generality
by Meyers and Serrin [23]. A more thorough discussion may be found, for example, in
Chapter 3 of Adams [1]. The key idea is to use convolution, as in Theorem 7.5, to obtain
the needed sequence $f_n$ of $C^\infty$ functions. For $f \in W^{k,p}(\Omega)$ the approximating sequence
may clearly be supposed to belong to $C^\infty(\Omega) \cap W^{k,p}(\Omega)$, so this space is dense in $W^{k,p}(\Omega)$
and we have
Theorem 9.1. For any open set $\Omega \subset \mathbb{R}^N$, $1 \le p < \infty$ and $k = 0, 1, 2\dots$ the Sobolev
space $W^{k,p}(\Omega)$ coincides with the closure of $C^\infty(\Omega) \cap W^{k,p}(\Omega)$ in the $W^{k,p}(\Omega)$ norm.
We now define another class of Sobolev spaces which will be important for later use.
Definition 9.2. For $\Omega \subset \mathbb{R}^N$, $W_0^{k,p}(\Omega)$ is defined to be the closure of $C_0^\infty(\Omega)$ in the
$W^{k,p}(\Omega)$ norm.
Obviously $W_0^{k,p}(\Omega) \subset W^{k,p}(\Omega)$, but it may not be immediately clear whether these
are actually the same space. In fact this is certainly true when $k = 0$, since in this case
we know $C_0^\infty(\Omega)$ is dense in $L^p(\Omega)$, $1 \le p < \infty$. It also turns out to be correct for any $k, p$
when $\Omega = \mathbb{R}^N$ (see Corollary 3.19 of Adams [1]). But in general the inclusion is strict,
and $f \in W_0^{k,p}(\Omega)$ carries the interpretation that $D^\alpha f = 0$ on $\partial\Omega$ for $|\alpha| \le k - 1$. This
topic will be taken up in more detail in a later chapter.
9.2 Differential equations in $\mathcal{D}'$
If we consider the simplest differential equation $u' = f$ on an interval $(a,b) \subset \mathbb{R}$, then
from elementary calculus we know that if $f$ is continuous on $[a,b]$, then every solution
is of the form $u(x) = \int_a^x f(y)\,dy + C$, for some constant $C$. Furthermore in this case
$u \in C^1([a,b])$, $u'(x) = f(x)$ for every $x \in (a,b)$, and we would refer to $u$ as a classical
solution of $u' = f$. If we make the weaker assumption that $f \in L^1(a,b)$ then we can no
longer expect $u$ to be $C^1$ or $u'(x) = f(x)$ to hold at every point, since $f$ itself is only
defined up to sets of measure zero. If, however, we let $u(x) = \int_a^x f(y)\,dy + C$ then it is
an important result of measure theory that $u'(x) = f(x)$ a.e. on $(a,b)$. The question
remains whether all solutions of $u' = f$ are of this form, and the answer must now
depend on precisely what is meant by 'solution'. If we were to interpret the differential
equation as meaning $u' = f$ a.e. then the answer is no. For example, $u(x) = H(x)$ is
a nonconstant function on $(-1,1)$ with $u'(x) = 0$ for $x \ne 0$. An alternative meaning is
that the differential equation should be satisfied in the sense of distributions on $(a,b)$, in
which case we have the following theorem.
Theorem 9.2. Let $f \in L^1(a,b)$.
a) If $F(x) = \int_a^x f(y)\,dy$ then $F' = f$ in $\mathcal{D}'(a,b)$.
b) If $u' = f$ in $\mathcal{D}'(a,b)$, then there exists a constant $C$ such that
$$u(x) = \int_a^x f(y)\,dy + C \qquad a < x < b \qquad (9.2.1)$$
Proof: If $F(x) = \int_a^x f(y)\,dy$, then for any $\phi \in C_0^\infty(a,b)$ we have
$$F'(\phi) = -F(\phi') = -\int_a^b F(x)\phi'(x)\,dx \qquad (9.2.2)$$
$$= -\int_a^b \left(\int_a^x f(y)\,dy\right)\phi'(x)\,dx \qquad (9.2.3)$$
$$= -\int_a^b f(y)\left(\int_y^b \phi'(x)\,dx\right)dy \qquad (9.2.4)$$
$$= \int_a^b f(y)\phi(y)\,dy = f(\phi) \qquad (9.2.5)$$
Here the interchange of the order of integration in the third line is easily justified by Fubini's
theorem. This proves part a).
Now if $u' = f$ in the distributional sense then $T = u - F$ satisfies $T' = 0$ in $\mathcal{D}'(a,b)$,
and we will finish by showing that $T$ must be a constant. Choose $\phi_0 \in C_0^\infty(a,b)$ such
that $\int_a^b \phi_0(y)\,dy = 1$. If $\phi \in C_0^\infty(a,b)$, set
$$\psi(x) = \phi(x) - \left(\int_a^b \phi(y)\,dy\right)\phi_0(x) \qquad (9.2.6)$$
so that $\psi \in C_0^\infty(a,b)$ and $\int_a^b \psi(x)\,dx = 0$. Let
$$\zeta(x) = \int_a^x \psi(y)\,dy \qquad (9.2.7)$$
Obviously $\zeta \in C^\infty(a,b)$ since $\zeta' = \psi$, but in fact $\zeta \in C_0^\infty(a,b)$ since $\zeta(a) = \zeta(b) = 0$ and
$\zeta' = \psi \equiv 0$ in some neighborhood of $a$ and of $b$. Finally it follows, since $T' = 0$, that
$$0 = T'(\zeta) = -T(\zeta') = -T(\psi) = \left(\int_a^b \phi(y)\,dy\right)T(\phi_0) - T(\phi) \qquad (9.2.8)$$
or equivalently $T(\phi) = \int_a^b C\phi(y)\,dy$ where $C = T(\phi_0)$. Thus $T$ is the distribution corresponding to the constant function $C$.
We emphasize that part b) of this theorem is of interest, and not obvious, even when
$f = 0$: any distribution whose distributional derivative on some interval is zero must be a
constant distribution on that interval. Therefore, any distribution is uniquely determined
up to an additive constant by its distributional derivative, which, to repeat, is not the
case for the a.e. derivative.
Now let $\Omega \subset \mathbb{R}^N$ be an open set and
$$Lu = \sum_{|\alpha|\le m} a_\alpha(x)D^\alpha u \qquad (9.2.9)$$
be a differential operator of order $m$. We assume that $a_\alpha \in C^\infty(\Omega)$, in which case
$Lu \in \mathcal{D}'(\Omega)$ is well defined for any $u \in \mathcal{D}'(\Omega)$. We will use the following terminology for
the rest of this chapter.
Definition 9.3. If $f \in \mathcal{D}'(\Omega)$ then
• $u$ is a classical solution of $Lu = f$ in $\Omega$ if $u \in C^m(\Omega)$ and $Lu(x) = f(x)$ for every
$x \in \Omega$.
• $u$ is a weak solution of $Lu = f$ in $\Omega$ if $u \in L^1_{loc}(\Omega)$ and $Lu = f$ in $\mathcal{D}'(\Omega)$.
• $u$ is a distributional solution of $Lu = f$ in $\Omega$ if $u \in \mathcal{D}'(\Omega)$ and $Lu = f$ in $\mathcal{D}'(\Omega)$.
It is clear that a classical solution is also a weak solution, and a weak solution is a
distributional solution. The converse statements are false in general, but may be true
in special cases. For example, we have proved above that any distributional solution of
$u' = 0$ must be constant, hence in particular any distributional solution of this differential
equation is actually a classical solution. On the other hand $u = \delta$ is a distributional
solution of $x^2u' = 0$, but is not a classical or weak solution. Of course a classical solution
cannot exist if $f$ is not continuous on $\Omega$. A theorem which says that any solution of
a certain differential equation must be smoother than what is actually needed for the
definition of solution is called a regularity result. Regularity theory is a large and
important research topic within the general area of differential equations.
Example 9.3. Let $Lu = u_{xx} - u_{yy}$. If $F, G \in C^2(\mathbb{R})$ and $u(x,y) = F(x+y) + G(x-y)$
then we know $u$ is a classical solution of $Lu = 0$. We have also observed, in Example 7.12,
that if $F, G \in L^1_{loc}(\mathbb{R})$ then $Lu = 0$ in the sense of distributions, thus $u$ is a weak solution
of $Lu = 0$ according to the above definition. The equation has distributional solutions
also which are not weak solutions, for example the singular distribution $T$ defined by
$T(\phi) = \int_{-\infty}^{\infty} \phi(x,x)\,dx$ in Exercise 11 of Chapter 7.
Example 9.4. If $Lu = u_{xx} + u_{yy}$ then it turns out that all solutions of $Lu = 0$ are classical
solutions; in fact, any distributional solution must be in $C^\infty(\Omega)$. This is an example of a
very important kind of regularity result in PDE theory, and will not be proved here; see
for example Corollary 2.20 of [11]. The difference between Laplace's equation and the
wave equation, i.e. that Laplace's equation has only classical solutions, while the wave
equation has many non-classical solutions, is a typical difference between solutions of
PDEs of elliptic and hyperbolic types.
9.3 Fundamental solutions
Let $\Omega \subset \mathbb{R}^N$, let $L$ be a differential operator as in (9.2.9), and suppose $G(x,y)$ has the
following properties$^1$:
$$G(\cdot,y) \in \mathcal{D}'(\Omega) \qquad L_x G(x,y) = \delta(x-y) \quad \forall y \in \Omega \qquad (9.3.1)$$
$^1$The subscript $x$ in $L_x$ is used here to emphasize that the differential operator is acting in the $x$ variable,
with $y$ in the role of a parameter.
We then call $G$ a fundamental solution of $L$ in $\Omega$. If such a $G$ can be found, then, formally,
if we let
$$u(x) = \int_\Omega G(x,y)f(y)\,dy \qquad (9.3.2)$$
we may expect that
$$Lu(x) = \int_\Omega L_x G(x,y)f(y)\,dy = \int_\Omega \delta(x-y)f(y)\,dy = f(x) \qquad (9.3.3)$$
That is to say, (9.3.2) provides a way to obtain solutions of the PDE $Lu = f$, and
perhaps also a tool to analyze specific properties of solutions. We are of course ignoring
here all questions of rigorous justification: whether the formula for $u$ even makes sense
if $G$ is only a distribution in $x$, for what class of $f$'s this might be so, and whether it is
permissible to differentiate under the integral to obtain (9.3.3). A more advanced PDE
text such as Hörmander [16] may be consulted for such study. Fundamental solutions
are not unique in general, since we could always add to $G$ any function $H(x,y)$ satisfying
the homogeneous equation $L_x H = 0$ for fixed $y$.
We will focus now on the case that $\Omega = \mathbb{R}^N$ and $a_\alpha(x) \equiv a_\alpha$ for every $\alpha$, i.e. $L$ is a
constant coefficient operator. In this case, if we can find $\Phi \in \mathcal{D}'(\mathbb{R}^N)$ for which $L\Phi = \delta$,
then $G(x,y) = \Phi(x-y)$ is a fundamental solution according to the above definition, and
it is normal in this situation to refer to $\Phi$ itself as the fundamental solution rather than
$G$.
Formally, the solution formula (9.3.2) becomes
$$u(x) = \int_{\mathbb{R}^N} \Phi(x-y)f(y)\,dy \qquad (9.3.4)$$
an integral operator of convolution type. Again it may not be clear if this makes sense
as an ordinary integral, but recall that we have earlier defined (Definition 7.7) the convolution of an arbitrary distribution and test function, namely
$$u(x) = (\Phi * f)(x) := \Phi(\tau_x\check{f}) \qquad (9.3.5)$$
if $\Phi \in \mathcal{D}'(\mathbb{R}^N)$ and $f \in C_0^\infty(\mathbb{R}^N)$. Furthermore, using Theorem 7.3, it follows that $u \in C^\infty(\mathbb{R}^N)$ and
$$Lu(x) = ((L\Phi) * f)(x) = (\delta * f)(x) = f(x) \qquad (9.3.6)$$
We have therefore proved
Proposition 9.1. If there exists $\Phi \in \mathcal{D}'(\mathbb{R}^N)$ such that $L\Phi = \delta$, then for any $f \in C_0^\infty(\mathbb{R}^N)$
the function $u = \Phi * f$ is a classical solution of $Lu = f$.
It will essentially always be the case that the solution formula $u = \Phi * f$ is actually valid
for a much larger class of $f$'s than $C_0^\infty(\mathbb{R}^N)$, but this will depend on specific properties of
the fundamental solution $\Phi$, which in turn depend on those of the original operator $L$.
Example 9.5. If $L = \Delta$, the Laplacian operator in $\mathbb{R}^3$, then we have already shown
(Example 7.13) that $\Phi(x) = -1/4\pi|x|$ satisfies $\Delta\Phi = \delta$ in the sense of distributions on
$\mathbb{R}^3$. Thus
$$u(x) = \left(-\frac{1}{4\pi|x|}\right)*f\,(x) = -\frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{f(y)}{|x-y|}\,dy \qquad (9.3.7)$$
provides a solution of $\Delta u = f$ in $\mathbb{R}^3$, at least when $f \in C_0^\infty(\mathbb{R}^3)$. The integral on the
right in (9.3.7) is known as the Newtonian potential of $f$, and can be shown to be a valid
solution formula for a much larger class of $f$'s. It is in any case always a 'candidate'
solution, which can be analyzed directly. A fundamental solution of the Laplacian exists
in $\mathbb{R}^N$ for any dimension, and will be recalled at the end of this section.
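Away from the origin $\Phi(x) = -1/4\pi|x|$ is harmonic, which is easy to check numerically with centered second differences; the evaluation point and step size below are arbitrary choices:

```python
import math

def Phi(x, y, z):
    # fundamental solution of the Laplacian in R^3 (sign convention: Delta Phi = delta)
    return -1.0 / (4 * math.pi * math.sqrt(x * x + y * y + z * z))

def lap(F, p, h=1e-3):
    # centered second differences approximating the Laplacian at p
    x, y, z = p
    return (F(x + h, y, z) + F(x - h, y, z) + F(x, y + h, z) + F(x, y - h, z)
            + F(x, y, z + h) + F(x, y, z - h) - 6 * F(x, y, z)) / (h * h)

residual = lap(Phi, (0.7, -0.4, 0.5))   # should be ~0 away from the singularity
```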
Example 9.6. Consider the wave operator $Lu = u_{tt} - u_{xx}$ in $\mathbb{R}^2$. A fundamental solution
for $L$ (see Exercise 9) is
$$\Phi(x,t) = \frac{1}{2}H(t - |x|) \qquad (9.3.8)$$
The support of $\Phi$, namely the set $\{(x,t) : |x| < t\}$, is in this context known as the forward
light cone, representing the set of points $x$ which, for fixed $t > 0$, a signal emanating
from the origin $x = 0$ at time $t = 0$, and travelling with speed one, may have reached.
The resulting solution formula for $Lu = f$ may then be obtained as
$$u(x,t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi(x-y,t-s)f(y,s)\,dyds \qquad (9.3.9)$$
$$= \frac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} H(t-s-|x-y|)f(y,s)\,dyds \qquad (9.3.10)$$
$$= \frac{1}{2}\int_{-\infty}^{t}\int_{x-t+s}^{x+t-s} f(y,s)\,dyds \qquad (9.3.11)$$
In many cases of interest $f(x,t) \equiv 0$ for $t < 0$, in which case we replace the lower limit
in the $s$ integral by $0$. In any case the region over which $f$ is integrated is the 'backward'
light cone, with vertex at $(x,t)$. Under this support assumption on $f$ it also follows that
$u(x,0) = u_t(x,0) \equiv 0$, so by adding in the corresponding terms in D'Alembert's solution
(2.3.46) we find that
$$u(x,t) = \frac{1}{2}\int_0^t\!\!\int_{x-t+s}^{x+t-s} f(y,s)\,dyds + \frac{1}{2}\big(h(x+t) + h(x-t)\big) + \frac{1}{2}\int_{x-t}^{x+t} g(s)\,ds \qquad (9.3.12)$$
is the unique solution of
$$u_{tt} - u_{xx} = f(x,t) \qquad x \in \mathbb{R} \quad t > 0 \qquad (9.3.13)$$
$$u(x,0) = h(x) \qquad x \in \mathbb{R} \qquad (9.3.14)$$
$$u_t(x,0) = g(x) \qquad x \in \mathbb{R} \qquad (9.3.15)$$
It is of interest to note that this solution formula could also be written, formally at least,
as
$$u(x,t) = (\Phi * f)(x,t) + \frac{\partial}{\partial t}\big(\Phi \underset{(x)}{*} h\big)(x,t) + \big(\Phi \underset{(x)}{*} g\big)(x,t) \qquad (9.3.16)$$
where the notation $\Phi \underset{(x)}{*} h$ indicates that the convolution takes place in $x$ only, with $t$
as a parameter. Thus the fundamental solution enters into the solution not only of the
inhomogeneous equation $Lu = f$ but in solving the Cauchy problem as well. This is not
an accidental feature, and we will see other instances of this sort of thing later.
So far we have seen a couple of examples where an explicit fundamental solution is
known, but have given no indication of a general method for finding it, or even for
determining whether a fundamental solution exists. Let us address the second issue first, by stating
without proof a remarkable theorem.
Theorem 9.3. (Malgrange-Ehrenpreis) If $L \ne 0$ is any constant coefficient linear differential operator then there exists a fundamental solution of $L$.
The proof of this theorem is well beyond the scope of this book; see for example
Theorem 8.5 of [30] or Theorem 10.2.1 of [16]. The assumption of constant coefficients
is essential here; counterexamples are known otherwise.
If we now consider how it might be possible to compute a fundamental solution for a
given operator $L$, it soon becomes apparent that the Fourier transform may be a useful
tool. If we start with the distributional PDE
$$L\Phi = \sum_{|\alpha|\le m} a_\alpha D^\alpha\Phi = \delta \qquad (9.3.17)$$
and take the Fourier transform of both sides, the result is
$$\sum_{|\alpha|\le m} a_\alpha(D^\alpha\Phi)^{\widehat{}} = \sum_{|\alpha|\le m} a_\alpha(iy)^\alpha\hat{\Phi} = \frac{1}{(2\pi)^{\frac{N}{2}}} \qquad (9.3.18)$$
or
$$P(y)\hat{\Phi}(y) = 1 \qquad (9.3.19)$$
where $P(y)$, the so-called symbol or characteristic polynomial of $L$, is defined as
$$P(y) = (2\pi)^{\frac{N}{2}}\sum_{|\alpha|\le m} (iy)^\alpha a_\alpha \qquad (9.3.20)$$
Note it was implicitly assumed here that $\hat{\Phi}$ exists, which would be the case if $\Phi$ were
a tempered distribution, but this is not actually guaranteed by Theorem 9.3. This is a
rather technical issue which we will not discuss here; rather we take the point of view
that we seek a formal solution which, potentially, further analysis may show is a bona
fide fundamental solution.
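The symbol (9.3.20) is a purely mechanical object and can be computed for any constant coefficient operator given as a table of multiindex coefficients; a small sketch (the dictionary encoding of the coefficients is my own convention):

```python
import math

def symbol(coeffs, y):
    # P(y) = (2*pi)^{N/2} * sum_alpha a_alpha (i y)^alpha, cf. (9.3.20)
    # coeffs: dict mapping multiindex tuples alpha to the coefficient a_alpha
    N = len(y)
    total = 0j
    for alpha, a in coeffs.items():
        term = complex(a)
        for yj, aj in zip(y, alpha):
            term *= (1j * yj) ** aj
        total += term
    return (2 * math.pi) ** (N / 2) * total

# the Laplacian in R^2: a_(2,0) = a_(0,2) = 1, so P(y) = -(2*pi)|y|^2
laplacian = {(2, 0): 1.0, (0, 2): 1.0}
P = symbol(laplacian, (1.0, 2.0))
```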
We have thus obtained $\hat{\Phi}(y) = 1/P(y)$, or by the inversion theorem
$$\Phi(x) = \frac{1}{(2\pi)^{\frac{N}{2}}}\int_{\mathbb{R}^N} \frac{1}{P(y)}\,e^{ix\cdot y}\,dy \qquad (9.3.21)$$
as a candidate for fundamental solution of $L$. One particular source of difficulty in
making sense of the inverse transform of $1/P$ is that in general $P$ has zeros, which might
be of arbitrarily high order, making the integrand too singular to have meaning in any
ordinary sense. On the other hand, we have seen, at least in one dimension, how well-defined distributions of the 'pseudo-function' type may be associated with non-locally
integrable functions such as $1/x^m$. Thus there may be some analogous construction in
more than one dimension as well. This is in fact one possible means to proving the
Malgrange-Ehrenpreis theorem.
It also suggests that the situation may be somewhat easier to deal with if the zero
set of $P$ in $\mathbb{R}^N$ is empty, or at least not very large. As a polynomial, of course, $P$
always has zeros, but some or all of these could be complex, whereas the obstructions to
making sense of (9.3.21) pertain to the real zeros of $P$ only. If $L$ is a constant coefficient
differential operator of order $m$ as above, define
$$P_m(y) = (2\pi)^{\frac{N}{2}}\sum_{|\alpha|=m} (iy)^\alpha a_\alpha \qquad (9.3.22)$$
which is known as the principal symbol of $L$.
Definition 9.4. We say that $L$ is elliptic if $y \in \mathbb{R}^N$, $P_m(y) = 0$ implies that $y = 0$.
That is to say, the principal symbol has no nonzero real roots. For example the
Laplacian operator $L = \Delta$ is elliptic, as is $\Delta$ + lower order terms, since either way
$P_2(y) = -(2\pi)^{\frac{N}{2}}|y|^2$. On the other hand, the wave operator, written say as $Lu = \Delta u - u_{x_{N+1}x_{N+1}}$,
is not elliptic, since the principal symbol is $P_2(y) = (2\pi)^{\frac{N+1}{2}}\big(y_{N+1}^2 - \sum_{j=1}^N y_j^2\big)$.
The following is not so difficult to establish (Exercise 16), and may be exploited in
working with the representation (9.3.21) in the elliptic case.
Proposition 9.2. If $L$ is elliptic then
$$\{y \in \mathbb{R}^N : P(y) = 0\} \qquad (9.3.23)$$
the real zero set of $P$, is compact in $\mathbb{R}^N$, and $\lim_{|y|\to\infty}|P(y)| = \infty$.
We will next derive a fundamental solution for the heat equation by using the Fourier
transform, although in a slightly different way from the above discussion. Consider first
the initial value problem for the heat equation
$$u_t - \Delta u = 0 \qquad x \in \mathbb{R}^N \quad t > 0 \qquad (9.3.24)$$
$$u(x,0) = h(x) \qquad x \in \mathbb{R}^N \qquad (9.3.25)$$
with $h \in C_0^\infty(\mathbb{R}^N)$. Assuming a solution exists, define the Fourier transform in the $x$
variables,
$$\hat{u}(y,t) = \frac{1}{(2\pi)^{\frac{N}{2}}}\int_{\mathbb{R}^N} u(x,t)e^{-ix\cdot y}\,dx \qquad (9.3.26)$$
Taking the partial derivative with respect to $t$ of both sides gives $(\hat{u})_t = (u_t)^{\widehat{}}$, so by the
usual Fourier transformation calculation rules,
$$(u_t)^{\widehat{}} = (\hat{u})_t = -|y|^2\hat{u} \qquad (9.3.27)$$
and $\hat{u}(y,0) = \hat{h}(y)$. We may regard this as an ODE in $t$ satisfied by $\hat{u}(y,t)$ for fixed $y$,
for which the solution obtained by elementary means is
$$\hat{u}(y,t) = e^{-|y|^2 t}\hat{h}(y) \qquad (9.3.28)$$
If we let $\Phi$ be such that $\hat{\Phi}(y,t) = \frac{1}{(2\pi)^{N/2}}e^{-|y|^2 t}$, then by Theorem 8.8 it follows that
$$u(x,t) = \big(\Phi \underset{(x)}{*} h\big)(x,t) \qquad (9.3.29)$$
Since $\hat{\Phi}$ is a Gaussian in $y$, the same is true for $\Phi$ itself, as long as $t > 0$, and from (8.4.7)
we get
$$\Phi(x,t) = H(t)\,\frac{e^{-\frac{|x|^2}{4t}}}{(4\pi t)^{\frac{N}{2}}} \qquad (9.3.30)$$
By including the $H(t)$ factor we have for later convenience defined $\Phi(x,t) = 0$ for $t < 0$.
Thus we get an integral representation for the solution of (9.3.24)-(9.3.25), namely
$$u(x,t) = \int_{\mathbb{R}^N} \Phi(x-y,t)h(y)\,dy = \frac{1}{(4\pi t)^{\frac{N}{2}}}\int_{\mathbb{R}^N} e^{-\frac{|x-y|^2}{4t}}h(y)\,dy \qquad (9.3.31)$$
valid for $x \in \mathbb{R}^N$ and $t > 0$. As usual, although this was derived for convenience under
very restrictive conditions on $h$, it is actually valid much more generally (see Exercise
12).
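The representation (9.3.31) can be tested numerically in one dimension against Gaussian initial data, for which the convolution can be done in closed form: for $h(y) = e^{-y^2}$ one gets $u(x,t) = (1+4t)^{-1/2}e^{-x^2/(1+4t)}$ (this test case is my choice, not the text's):

```python
import math

def heat_u(x, t, h, L=12.0, n=6000):
    # u(x,t) = int Phi(x - y, t) h(y) dy in dimension N = 1, Phi from (9.3.30),
    # approximated by the trapezoid rule on [-L, L]
    dy = 2 * L / n
    tot = 0.0
    for k in range(n + 1):
        y = -L + k * dy
        w = 0.5 if k in (0, n) else 1.0
        tot += w * math.exp(-(x - y) ** 2 / (4 * t)) * h(y)
    return tot * dy / math.sqrt(4 * math.pi * t)

h = lambda y: math.exp(-y * y)            # Gaussian initial data
x, t = 0.4, 0.3
exact = math.exp(-x * x / (1 + 4 * t)) / math.sqrt(1 + 4 * t)
```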
Now to derive a solution formula for $u_t - \Delta u = f$, let $v = v(x,t;s)$ be the solution of
(9.3.24)-(9.3.25) with $h(x)$ replaced by $f(x,s)$, regarding $s$ for the moment as a parameter,
and define
$$u(x,t) = \int_0^t v(x,t-s;s)\,ds \qquad (9.3.32)$$
Assuming that $f$ is sufficiently regular, it follows that
$$u_t(x,t) = v(x,0;t) + \int_0^t v_t(x,t-s;s)\,ds \qquad (9.3.33)$$
$$= f(x,t) + \int_0^t \Delta v(x,t-s;s)\,ds \qquad (9.3.34)$$
$$= f(x,t) + \Delta u(x,t) \qquad (9.3.35)$$
Inserting the formula (9.3.31) with $h$ replaced by $f(\cdot,s)$ gives
$$u(x,t) = (\Phi * f)(x,t) = \int_0^t\int_{\mathbb{R}^N} \Phi(x-y,t-s)f(y,s)\,dyds \qquad (9.3.36)$$
with $\Phi$ given again by (9.3.30). Strictly speaking, we should assume that $f(x,t) \equiv 0$ for
$t < 0$ in order that the integral on the right in (9.3.36) coincide with the convolution
in $\mathbb{R}^{N+1}$, but this is without loss of generality, since we only seek to solve the PDE for
$t > 0$. The procedure used above for obtaining the solution of the inhomogeneous PDE
starting with the solution of a corresponding initial value problem is known as Duhamel's
method, and is generally applicable, with suitable modifications, for time dependent PDEs
in which the coefficients are independent of time.
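Duhamel's method is perhaps easiest to see for the scalar ODE $u' = au + f(t)$, $u(0) = 0$, where the homogeneous initial value problem is solved by $e^{at}u_0$ and the method gives $u(t) = \int_0^t e^{a(t-s)}f(s)\,ds$; a numerical sketch (the values of $a$ and $f$ below are arbitrary choices, and the exact solution used for comparison comes from elementary ODE methods):

```python
import math

a = -0.5                       # coefficient in the scalar model problem u' = a u + f
f = lambda s: math.sin(s)

def duhamel(t, n=4000):
    # u(t) = int_0^t e^{a(t-s)} f(s) ds: the homogeneous solution operator applied
    # to the data f(s), superposed over s (trapezoid rule)
    ds = t / n
    tot = 0.0
    for k in range(n + 1):
        s = k * ds
        w = 0.5 if k in (0, n) else 1.0
        tot += w * math.exp(a * (t - s)) * f(s)
    return tot * ds

t = 2.0
exact = (-a * math.sin(t) - math.cos(t) + math.exp(a * t)) / (1 + a * a)
```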
Since $u(x,t)$ in (9.3.32) evidently satisfies $u(x,0) \equiv 0$, it follows (compare to (9.3.16))
that
$$u(x,t) = \big(\Phi \underset{(x)}{*} h\big)(x,t) + (\Phi * f)(x,t) \qquad (9.3.37)$$
is a solution$^2$ of
$$u_t - \Delta u = f(x,t) \qquad x \in \mathbb{R}^N \quad t > 0 \qquad (9.3.38)$$
$$u(x,0) = h(x) \qquad x \in \mathbb{R}^N \qquad (9.3.39)$$
Let us also observe here that if
$$F(x) = \frac{1}{(2\sqrt{\pi})^N}\,e^{-\frac{|x|^2}{4}} \qquad (9.3.40)$$
then $F \ge 0$, $\int_{\mathbb{R}^N} F(x)\,dx = 1$, and
$$\Phi(x,t) = \left(\frac{1}{\sqrt{t}}\right)^N F\!\left(\frac{x}{\sqrt{t}}\right) \qquad (9.3.41)$$
for $t > 0$. From Theorem 7.2, and the observation that a sequence of the form (7.3.11)
satisfies the assumptions of that theorem, it follows that $n^N F(nx) \to \delta$ in $\mathcal{D}'(\mathbb{R}^N)$ as
$n \to \infty$. Choosing $n = \frac{1}{\sqrt{t}}$ we conclude that
$$\lim_{t\to 0^+} \Phi(\cdot,t) = \delta \quad \text{in } \mathcal{D}'(\mathbb{R}^N) \qquad (9.3.42)$$
In particular $\lim_{t\to 0^+}\big(\Phi \underset{(x)}{*} h\big)(x,t) = h(x)$ for all $x \in \mathbb{R}^N$, at least when $h \in C_0^\infty(\mathbb{R}^N)$.
We conclude this section by collecting all in one place a number of important fundamental solutions. Some of these have been discussed already, some will be left for the
exercises, and in several other cases we will be content with a reference.
Laplace operator
For $L = \Delta$ in $\mathbb{R}^N$ there exist the following fundamental solutions$^3$:
$$\Phi(x) = \begin{cases} \dfrac{|x|}{2} & N = 1 \\[1ex] \dfrac{1}{2\pi}\log|x| & N = 2 \\[1ex] \dfrac{C_N}{|x|^{N-2}} & N \ge 3 \end{cases} \qquad (9.3.43)$$
where
$$C_N = \frac{1}{(2-N)\Omega_{N-1}} \qquad \Omega_{N-1} = \int_{|x|=1} dS(x) \qquad (9.3.44)$$
Thus $C_N$ is a geometric constant, related to the area of the unit sphere in $\mathbb{R}^N$; an
equivalent formula in terms of the volume of the unit ball in $\mathbb{R}^N$ is also commonly used.
Of the various cases, $N = 1$ is elementary to check, $N = 2$ is requested in Exercise 20 of
Chapter 7, and we have done the $N \ge 3$ case in Example 7.13.
$^2$Note we do not say 'the solution' here; in fact the solution is not unique without further restrictions.
$^3$Some texts will consistently use the fundamental solution of $-\Delta$ rather than $\Delta$, in which case all of the
signs will be reversed.
Heat operator
For the heat operator $L = \frac{\partial}{\partial t} - \Delta$ in $\mathbb{R}^{N+1}$, we have derived earlier in this section the
fundamental solution
$$\Phi(x,t) = H(t)\,\frac{e^{-\frac{|x|^2}{4t}}}{(4\pi t)^{\frac{N}{2}}} \qquad (9.3.45)$$
for all $N$.
Wave operator
For the wave operator $L = \frac{\partial^2}{\partial t^2} - \Delta$ in $\mathbb{R}^{N+1}$, the fundamental solution is again significantly
dependent on $N$. The cases of $N = 1, 2, 3$ are as follows:
$$\Phi(x,t) = \begin{cases} \dfrac{1}{2}H(t-|x|) & N = 1 \\[1ex] \dfrac{1}{2\pi}\dfrac{H(t-|x|)}{\sqrt{t^2-|x|^2}} & N = 2 \\[1ex] \dfrac{\delta(t-|x|)}{4\pi|x|} & N = 3 \end{cases} \qquad (9.3.46)$$
We have discussed the $N = 1$ case earlier in this section, and refer to [10] or [18] for the
cases $N = 2, 3$. As a distribution, the meaning of the fundamental solution in the
$N = 3$ case is just what one expects from the formal expression, namely
$$\Phi(\phi) = \int_{\mathbb{R}^3}\int_{-\infty}^{\infty} \frac{\delta(t-|x|)}{4\pi|x|}\,\phi(x,t)\,dtdx = \int_{\mathbb{R}^3} \frac{\phi(x,|x|)}{4\pi|x|}\,dx \qquad (9.3.47)$$
for any test function $\phi$. Note the tendency for the fundamental solution to become more
and more singular as $N$ increases. This pattern persists in higher dimensions, as the
fundamental solution starts to contain expressions involving $\delta'$ and higher derivatives of
the $\delta$ function.
Schrödinger operator
The Schrödinger operator is defined as $L = \frac{\partial}{\partial t} - i\Delta$ in $\mathbb{R}^{N+1}$. The derivation of a
fundamental solution here is nearly the same as for the heat equation, the result being
$$\Phi(x,t) = H(t)\,\frac{e^{-\frac{|x|^2}{4it}}}{(4i\pi t)^{\frac{N}{2}}} \qquad (9.3.48)$$
In quantum mechanics $\Phi$ is frequently referred to as the 'propagator'. See [26] for much
material about the Schrödinger equation.
Helmholtz operator
The Helmholtz operator is defined by $Lu = \Delta u - \lambda u$. For $\lambda > 0$ and dimensions $N = 1, 2, 3$,
fundamental solutions are
$$\Phi(x) = \begin{cases} \dfrac{\sinh(\sqrt{\lambda}\,|x|)}{2\sqrt{\lambda}} & N = 1 \\[1ex] -\dfrac{1}{2\pi}K_0(\sqrt{\lambda}\,|x|) & N = 2 \\[1ex] -\dfrac{e^{-\sqrt{\lambda}\,|x|}}{4\pi|x|} & N = 3 \end{cases} \qquad (9.3.49)$$
where $K_0$ is the so-called modified Bessel function of the second kind and order $0$. See
Chapter 6 of [3] for derivations of these formulas when $N = 2, 3$, while the $N = 1$ case
is left for the exercises. This is a case where it may be convenient to use the Fourier
transform method directly, since the symbol of $L$, $P(y) = -(2\pi)^{\frac{N}{2}}(|y|^2 + \lambda)$,
has no real zeros.
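Under the sign convention $\Delta\Phi - \lambda\Phi = \delta$ adopted above, the $N = 3$ entry of (9.3.49) satisfies $\Delta\Phi = \lambda\Phi$ away from the origin, which can be checked numerically by second differences (the value $\lambda = 2$ and the evaluation point are arbitrary choices):

```python
import math

lam = 2.0

def Phi(x, y, z):
    # N = 3 entry of (9.3.49): Phi(x) = -e^{-sqrt(lam)|x|} / (4 pi |x|)
    r = math.sqrt(x * x + y * y + z * z)
    return -math.exp(-math.sqrt(lam) * r) / (4 * math.pi * r)

def lap(F, p, h=1e-3):
    # centered second differences approximating the Laplacian at p
    x, y, z = p
    return (F(x + h, y, z) + F(x - h, y, z) + F(x, y + h, z) + F(x, y - h, z)
            + F(x, y, z + h) + F(x, y, z - h) - 6 * F(x, y, z)) / (h * h)

p = (0.6, 0.3, -0.5)
residual = lap(Phi, p) - lam * Phi(*p)   # Delta Phi - lam Phi, ~0 away from 0
```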
Klein-Gordon operator
The Klein-Gordon operator is defined by $Lu = \frac{\partial^2 u}{\partial t^2} - \Delta u + \lambda u$ in $\mathbb{R}^{N+1}$. We mention only
the case $N = 1$, $\lambda > 0$, in which case a fundamental solution is
$$\Phi(x,t) = \frac{1}{2}H(t-|x|)\,J_0\!\left(\sqrt{\lambda(t^2-x^2)}\right) \qquad N = 1 \qquad (9.3.50)$$
where $J_0$ is the Bessel function of the first kind and order zero (see Exercise 13 of Chapter
8). This may be derived, for example, by the method presented in Problem 2, Section
5.1 of [18], with an appropriate choice of the parameters there.
Biharmonic operator
The biharmonic operator is $L = \Delta^2$, i.e. $Lu = \Delta(\Delta u)$. It arises especially in connection
with the theory of plates and shells, so that $N = 2$ is the most interesting case. A
fundamental solution is
$$\Phi(x) = \frac{1}{8\pi}|x|^2\log|x| \qquad N = 2 \qquad (9.3.51)$$
a derivation of which is outlined in Exercise 10.
9.4 Exercises
1. Show that an equivalent definition of $W^{s,2}(\mathbb{R}^N) = H^s(\mathbb{R}^N)$ for $s = 0, 1, 2, \dots$ is
$$H^s(\mathbb{R}^N) = \Big\{f \in \mathcal{S}'(\mathbb{R}^N) : \int_{\mathbb{R}^N} |\hat{f}(y)|^2(1+|y|^2)^s\,dy < \infty\Big\} \qquad (9.4.1)$$
The second definition makes sense even if $s$ isn't a positive integer and leads to one
way to define fractional and negative order differentiability. Implicitly it requires
that $\hat{f}$ (but not $f$ itself) must be a function.
2. Using the definition (9.4.1), show that $H^s(\mathbb{R}^N) \subset C_0(\mathbb{R}^N)$ if $s > \frac{N}{2}$. Show that
$\delta \in H^s(\mathbb{R}^N)$ if $s < -\frac{N}{2}$.
3. If $\Omega$ is a bounded open set in $\mathbb{R}^3$, and $u(x) = \frac{1}{|x|}$, show that $u \in W^{1,p}(\Omega)$ for
$1 \le p < \frac{3}{2}$. Along the way, you should show carefully that a distributional first
derivative $\frac{\partial u}{\partial x_i}$ agrees with the corresponding pointwise derivative.
4. Prove that if $f \in W^{1,p}(a,b)$ for $p > 1$ then
$$|f(x) - f(y)| \le \|f\|_{W^{1,p}(a,b)}\,|x-y|^{1-\frac{1}{p}} \qquad (9.4.2)$$
so in particular $W^{1,p}(a,b) \subset C([a,b])$. (Caution: You would like to use the fundamental theorem of calculus here, but it isn't quite obvious whether it is valid
assuming only that $f \in W^{1,p}(a,b)$.)
5. Prove directly that $W^{k,p}(\Omega)$ is complete (relying of course on the fact that $L^p(\Omega)$
is complete).
6. Show that Theorem 9.1 is false for $p = \infty$.
7. If $f$ is a nonzero constant function on $[0,1]$, show that $f \notin W_0^{1,p}(0,1)$ for $1 \le p < \infty$.
8. Let $Lu = u'' + u$ and $E(x) = H(x)\sin x$, $x \in \mathbb{R}$.
a) Show that $E$ is a fundamental solution of $L$.
b) What is the corresponding solution formula for $Lu = f$?
c) The fundamental solution $E$ is not the same as the one given in (9.3.49). Does
this call for any explanation?
9. Show that $E(x,t) = \frac{1}{2}H(t-|x|)$ is a fundamental solution for the wave operator
$Lu = u_{tt} - u_{xx}$.
10. The fourth order operator $Lu = u_{xxxx} + 2u_{xxyy} + u_{yyyy}$ in $\mathbb{R}^2$ is the biharmonic
operator which arises in the theory of deformation of elastic plates.
a) Show that $L = \Delta^2$, i.e. $Lu = \Delta(\Delta u)$ where $\Delta$ is the Laplacian.
b) Find a fundamental solution of $L$. (Suggestions: To solve $LE = \delta$, first solve
$\Delta F = \delta$ and then $\Delta E = F$. Since $F$ will depend on $r = \sqrt{x^2+y^2}$ only, you can
look for a solution $E = E(r)$ also.)
11. Let $Lu = u'' + \alpha u'$ where $\alpha > 0$ is a constant.
a) Find a fundamental solution of $L$ which is a tempered distribution.
b) Find a fundamental solution of $L$ which is not a tempered distribution.
12. Show directly that $u(x,t)$ defined by (9.3.31) is a classical solution of the heat
equation for $t > 0$, under the assumption that $h$ is bounded and continuous on $\mathbb{R}^N$.
13. Assuming that (9.3.31) is valid and $h \in L^p(\mathbb{R}^N)$, derive the decay property
$$\|u(\cdot,t)\|_{L^\infty(\mathbb{R}^N)} \le \frac{\|h\|_{L^p(\mathbb{R}^N)}}{t^{\frac{N}{2p}}} \qquad (9.4.3)$$
for $1 \le p \le \infty$.
14. If
$$G(x,y) = \begin{cases} y(x-1) & 0 < y < x < 1 \\ x(y-1) & 0 < x < y < 1 \end{cases}$$
show that $G$ is a fundamental solution of $Lu = u''$ in $(0,1)$.
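As a quick numerical sanity check (not part of the exercise), one can integrate $G$ against the simple choice $f \equiv 1$, for which $u(x) = (x^2 - x)/2$ solves $u'' = 1$ with $u(0) = u(1) = 0$:

```python
def G(x, y):
    # the kernel from the exercise: y(x-1) for y < x, x(y-1) for y > x
    return y * (x - 1) if y < x else x * (y - 1)

def u(x, n=4000):
    # u(x) = int_0^1 G(x, y) f(y) dy with f == 1, trapezoid rule
    dy = 1.0 / n
    return dy * (sum(G(x, k * dy) for k in range(1, n))
                 + 0.5 * (G(x, 0.0) + G(x, 1.0)))
```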
15. Is the heat operator $L = \frac{\partial}{\partial t} - \Delta$ elliptic?
16. Prove Proposition 9.2.