Notes for MATH 519-520
Paul E. Sacks
January 13, 2016
Contents

1 Orientation
  1.1 Introduction

2 Preliminaries
  2.1 Ordinary differential equations
    2.1.1 Initial Value Problems
    2.1.2 Boundary Value Problems
    2.1.3 Some exactly solvable cases
  2.2 Integral equations
  2.3 Partial differential equations
    2.3.1 First order PDEs and the method of characteristics
    2.3.2 Second order problems in R^2
    2.3.3 Further discussion of model problems
    2.3.4 Standard problems and side conditions
  2.4 Well-posed and ill-posed problems
  2.5 Exercises

3 Vector spaces
  3.1 Axioms of a vector space
  3.2 Linear independence and bases
  3.3 Linear transformations of a vector space
  3.4 Exercises

4 Metric spaces
  4.1 Axioms of a metric space
  4.2 Topological concepts
  4.3 Functions on metric spaces and continuity
  4.4 Compactness and optimization
  4.5 Contraction mapping theorem
  4.6 Exercises

5 Normed linear spaces and Banach spaces
  5.1 Axioms of a normed linear space
  5.2 Infinite series
  5.3 Linear operators and functionals
  5.4 Contraction mappings in a Banach space
  5.5 Exercises

6 Inner product spaces and Hilbert spaces
  6.1 Axioms of an inner product space
  6.2 Norm in a Hilbert space
  6.3 Orthogonality
  6.4 Projections
  6.5 Gram-Schmidt method
  6.6 Bessel's inequality and infinite orthogonal sequences
  6.7 Characterization of a basis of a Hilbert space
  6.8 Isomorphisms of a Hilbert space
  6.9 Exercises

7 Distributions
  7.1 The space of test functions
  7.2 The space of distributions
  7.3 Algebra and Calculus with Distributions
    7.3.1 Multiplication of distributions
    7.3.2 Convergence of distributions
    7.3.3 Derivative of a distribution
  7.4 Convolution and distributions
  7.5 Exercises

8 Fourier analysis and distributions
  8.1 Fourier series in one space dimension
  8.2 Alternative forms of Fourier series
  8.3 More about convergence of Fourier series
  8.4 The Fourier Transform on R^N
  8.5 Further properties of the Fourier transform
  8.6 Fourier series of distributions
  8.7 Fourier transforms of distributions
  8.8 Exercises

9 Distributions and Differential Equations
  9.1 Weak derivatives and Sobolev spaces
  9.2 Differential equations in D'
  9.3 Fundamental solutions
  9.4 Exercises

10 Linear operators
  10.1 Linear mappings between Banach spaces
  10.2 Examples of linear operators
  10.3 Linear operator equations
  10.4 The adjoint operator
  10.5 Examples of adjoints
  10.6 Conditions for solvability of linear operator equations
  10.7 Fredholm operators and the Fredholm alternative
  10.8 Convergence of operators
  10.9 Exercises

11 Unbounded operators
  11.1 General aspects of unbounded linear operators
  11.2 The adjoint of an unbounded linear operator
  11.3 Extensions of symmetric operators
  11.4 Exercises

12 Spectrum of an operator
  12.1 Resolvent and spectrum of a linear operator
  12.2 Examples of operators and their spectra
  12.3 Properties of spectra
  12.4 Exercises

13 Compact Operators
  13.1 Compact operators
  13.2 The Riesz-Schauder theory
  13.3 The case of self-adjoint compact operators
  13.4 Some properties of eigenvalues
  13.5 The Singular Value Decomposition and Normal Operators
  13.6 Exercises

14 Spectra and Green's functions for differential operators
  14.1 Green's functions for second order ODEs
  14.2 Adjoint problems
  14.3 Sturm-Liouville theory
  14.4 The Laplacian with homogeneous Dirichlet boundary conditions
  14.5 Exercises

15 Further study of integral equations
  15.1 Singular integral operators
  15.2 Layer potentials
  15.3 Convolution equations
  15.4 Wiener-Hopf technique
  15.5 Exercises

16 Variational methods
  16.1 The Dirichlet quotient
  16.2 Eigenvalue approximation
  16.3 The Euler-Lagrange equation
  16.4 Variational methods for elliptic boundary value problems
  16.5 Other problems in the calculus of variations
  16.6 The existence of minimizers
  16.7 The Fréchet derivative
  16.8 Exercises

17 Weak solutions of partial differential equations
  17.1 Lax-Milgram theorem
  17.2 More function spaces
  17.3 Galerkin's method
  17.4 PDEs with variable coefficients
  17.5 Exercises

18 Appendices
  18.1 Inequalities
  18.2 Integration by parts
  18.3 Spherical coordinates in R^N

19 Bibliography
Chapter 1
Orientation
1.1 Introduction
While the phrase "Applied Mathematics" has a very broad meaning, the purpose of this textbook is much more limited: to present techniques of mathematical analysis which have been found particularly useful for understanding certain kinds of mathematical problems occurring commonly in scientific and technological disciplines, especially physics and engineering. These methods, which are often regarded as belonging to the realm of functional analysis, have been motivated most specifically by the study of ordinary differential equations, partial differential equations, and integral equations. The mathematical modeling of physical phenomena typically involves one or more of these types of equations, and insight into the physical phenomenon itself may result from a deep understanding of the underlying mathematical properties which the models possess. All concepts and techniques discussed in this book are ultimately of interest because of their relevance for the study of these three general types of problems. A great deal of beautiful mathematics has grown out of these ideas, and so intrinsic mathematical motivation cannot be denied or ignored.
Chapter 2
Preliminaries
In this chapter we will discuss "standard problems" in the theory of ordinary differential equations (ODEs), integral equations, and partial differential equations (PDEs). The techniques developed in these notes are all meant to have some relevance for one or more of these kinds of problems, so it seems best to start with a clear picture of exactly what the problems are. In each case there are some relatively elementary methods, which the reader may well have seen before, or which depend only on simple considerations, and these we will review. At the same time we establish terminology and notation, and begin to get some sense of the ways in which problems are classified.
2.1 Ordinary differential equations
An n'th order ordinary differential equation for an unknown function u = u(t) on an interval (a, b) ⊂ R may be given in the form

$$F(t, u, u', u'', \dots, u^{(n)}) = 0 \qquad (2.1.1)$$

where we use the usual notations u', u'', ... for derivatives of order 1, 2, ..., and u^{(n)} for the derivative of order n. Unless otherwise stated, we will assume that the ODE can be solved for the highest derivative, i.e. written in the form

$$u^{(n)} = f(t, u, u', \dots, u^{(n-1)}) \qquad (2.1.2)$$

For the purpose of this discussion, a solution of either equation will mean a real valued function on (a, b) possessing continuous derivatives up through order n, for which the equation is satisfied at every point of (a, b). While it is easy to write down ODEs in the form (2.1.1) without any solutions (for example, (u')^2 + u^2 + 1 = 0), we will see that ODEs of the type (2.1.2) essentially always have solutions, subject to some very minimal assumptions on f.
The ODE is linear if it can be written as

$$\sum_{j=0}^{n} a_j(t)\, u^{(j)}(t) = g(t) \qquad (2.1.3)$$

for some coefficients a_0, ..., a_n, g, and homogeneous linear if also g(t) ≡ 0. It is common to use operator notation for derivatives, especially in the linear case. Set

$$D = \frac{d}{dt} \qquad (2.1.4)$$

so that u' = Du, u'' = D^2 u, etc., and (2.1.3) may be given as

$$Lu := \sum_{j=0}^{n} a_j(t) D^j u = g(t) \qquad (2.1.5)$$

By standard calculus properties L is a linear operator, meaning that

$$L(c_1 u_1 + c_2 u_2) = c_1 L u_1 + c_2 L u_2 \qquad (2.1.6)$$

for any scalars c_1, c_2 and any n times differentiable functions u_1, u_2.
An ODE normally has infinitely many solutions; the collection of all solutions is called the general solution of the given ODE.

Example 2.1. By elementary calculus considerations, the simple ODE u' = 0 has general solution u(t) = c, where c is an arbitrary constant. Likewise u' = u has the general solution u(t) = ce^t, and u'' = 1 has the general solution u(t) = t^2/2 + c_1 t + c_2, where c_1, c_2 are arbitrary constants. □
2.1.1 Initial Value Problems
The general solution of an n'th order ODE typically contains exactly n arbitrary constants, whose values may then be chosen so that the solution satisfies n additional, or side, conditions. The most common kind of side conditions for an ODE are initial conditions,

$$u^{(j)}(t_0) = \gamma_j \qquad j = 0, 1, \dots, n-1 \qquad (2.1.7)$$

where t_0 is a given point in (a, b) and γ_0, ..., γ_{n-1} are given constants. Thus we are prescribing the value of the solution and its derivatives up through order n−1 at the point t_0. The problem of solving (2.1.2) together with the initial conditions (2.1.7) is called an initial value problem (IVP), and it is a very important fact that under fairly unrestrictive hypotheses a unique solution exists. In stating conditions on f, we regard it as a function f = f(t, y_1, ..., y_n) defined on some domain in R^{n+1}.
Theorem 2.1. Assume that

$$f, \ \frac{\partial f}{\partial y_1}, \dots, \frac{\partial f}{\partial y_n} \qquad (2.1.8)$$

are defined and continuous in a neighborhood of the point (t_0, γ_0, ..., γ_{n-1}) ∈ R^{n+1}. Then there exists ε > 0 such that the initial value problem (2.1.2),(2.1.7) has a unique solution on the interval (t_0 − ε, t_0 + ε).
A proof of this theorem may be found in standard ODE textbooks; see for example [4],[7]. A slightly weaker version of this theorem will be proved in Section 4.5. As will be discussed there, the condition of continuity of the partial derivatives of f with respect to each of the variables y_i can actually be replaced by the weaker assumption that f is Lipschitz continuous with respect to each of these variables. If we assume only that f is continuous in a neighborhood of the point (t_0, γ_0, ..., γ_{n-1}), then it can be proved that at least one solution exists, but it may not be unique; see Exercise 3.

It should also be emphasized that the theorem asserts a local existence property, i.e. existence only in some sufficiently small interval centered at t_0. It has to be this way, first of all, since the assumptions on f are made only in the vicinity of (t_0, γ_0, ..., γ_{n-1}). But even if the continuity properties of f were assumed to hold throughout R^{n+1}, then as the following example shows, it would still only be possible to prove that a solution exists for points t close enough to t_0.
Example 2.2. Consider the first order initial value problem

$$u' = u^2 \qquad u(0) = \gamma \qquad (2.1.9)$$

for which the assumptions of Theorem 2.1 hold for any γ. It may be checked that the solution of this problem is

$$u(t) = \frac{\gamma}{1 - \gamma t} \qquad (2.1.10)$$

which is a valid solution only for t < 1/γ, an interval which can be arbitrarily small. □
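The finite-time blow-up in Example 2.2 is easy to observe numerically. The following is a minimal sketch, assuming NumPy and SciPy are available, which integrates the IVP with the test choice γ = 2, for which the exact solution (2.1.10) blows up at t = 1/2.

```python
# Numerical illustration of finite-time blow-up for u' = u^2, u(0) = gamma.
# A minimal sketch assuming NumPy and SciPy are available; the exact
# solution u(t) = gamma/(1 - gamma*t) blows up at t = 1/gamma.
import numpy as np
from scipy.integrate import solve_ivp

gamma = 2.0                                  # blow-up occurs at t = 0.5

sol = solve_ivp(lambda t, u: u**2, (0.0, 0.499), [gamma],
                rtol=1e-10, atol=1e-12, dense_output=True)

for t in [0.0, 0.25, 0.4, 0.45, 0.49]:
    exact = gamma / (1.0 - gamma * t)
    print(f"t = {t:4.2f}   numerical = {sol.sol(t)[0]:12.4f}   exact = {exact:12.4f}")
```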
With more restrictions on f it may be possible to show that the solution exists on
any interval containing t0 , in which case we would say that the solution exists globally.
This is the case, for example, for the linear ODE (2.1.3).
Whenever the conditions of Theorem 2.1 hold, the set of all possible solutions may be regarded as being parametrized by the n constants γ_0, ..., γ_{n-1}, so that as mentioned above, the general solution will contain n arbitrary parameters. In the special case of the linear equation (2.1.3) it can be shown that the general solution may be given as

$$u(t) = \sum_{j=1}^{n} c_j u_j(t) + u_p(t) \qquad (2.1.11)$$

where u_p is any particular solution of (2.1.3), and u_1, ..., u_n are any n linearly independent solutions of the corresponding homogeneous equation Lu = 0. Any such set of functions u_1, ..., u_n is also called a fundamental set for Lu = 0.

Example 2.3. If Lu = u'' + u then by direct substitution we see that u_1(t) = sin t, u_2(t) = cos t are solutions, and they are clearly linearly independent. Thus {sin t, cos t} is a fundamental set for Lu = 0 and u(t) = c_1 sin t + c_2 cos t is the general solution of Lu = 0. For the inhomogeneous ODE u'' + u = e^t one may check that u_p(t) = e^t/2 is a particular solution, so the general solution is u(t) = c_1 sin t + c_2 cos t + e^t/2. □
2.1.2 Boundary Value Problems
For an ODE of order n ≥ 2 it may be of interest to impose side conditions at more than one point, typically the endpoints of the interval of interest. We will then refer to the side conditions as boundary conditions, and to the problem of solving the ODE subject to the given boundary conditions as a boundary value problem (BVP). Since the general solution still contains n parameters, we still expect to be able to impose a total of n side conditions. However, we can see from simple examples that the situation with regard to existence and uniqueness in such boundary value problems is much less clear than for initial value problems.
Example 2.4. Consider the boundary value problem

$$u'' + u = 0 \quad 0 < t < \pi \qquad u(0) = 0 \quad u(\pi) = 1 \qquad (2.1.12)$$

Starting from the general solution u(t) = c_1 sin t + c_2 cos t, the two boundary conditions lead to u(0) = c_2 = 0 and u(π) = −c_2 = 1. Since these are inconsistent, the BVP has no solution. □
Example 2.5. For the boundary value problem

$$u'' + u = 0 \quad 0 < t < \pi \qquad u(0) = 0 \quad u(\pi) = 0 \qquad (2.1.13)$$

we have solutions u(t) = C sin t for any constant C, that is, the BVP has infinitely many solutions. □

The topic of boundary value problems will be studied in much more detail in Chapter ( ).
2.1.3 Some exactly solvable cases

Let us recall explicit solution methods for some commonly occurring types of ODEs; a short symbolic check of all three cases follows the list.
• For the first order linear ODE

$$u' + p(t)u = q(t) \qquad (2.1.14)$$

define the so-called integrating factor ρ(t) = e^{P(t)}, where P' = p. Multiplying the equation through by ρ we then get

$$(\rho u)' = \rho q \qquad (2.1.15)$$

so if we pick Q such that Q' = ρq, the general solution may be given as

$$u(t) = \frac{Q(t) + C}{\rho(t)} \qquad (2.1.16)$$
• Next consider the linear homogeneous constant coefficient ODE

$$Lu = \sum_{j=0}^{n} a_j u^{(j)} = 0 \qquad (2.1.17)$$

If we look for solutions in the form u(t) = e^{λt}, then by direct substitution we find that u is a solution provided λ is a root of the corresponding characteristic polynomial

$$P(\lambda) = \sum_{j=0}^{n} a_j \lambda^j \qquad (2.1.18)$$

We therefore obtain as many linearly independent solutions as there are distinct roots of P. If this number is less than n, then we may seek further solutions of the form te^{λt}, t^2 e^{λt}, ..., until a total of n linearly independent solutions has been found. In the case of complex roots, equivalent expressions in terms of trigonometric functions are often used in place of complex exponentials.
• Finally, closely related to the previous case is the so-called Cauchy-Euler type equation

$$Lu = \sum_{j=0}^{n} (t - t_0)^j a_j u^{(j)} = 0 \qquad (2.1.19)$$

for some constants a_0, ..., a_n. In this case we look for solutions in the form u(t) = (t − t_0)^λ with λ to be found. Substituting into (2.1.19) we will again find an n'th order polynomial whose roots determine the possible values of λ. The interested reader may refer to any standard undergraduate level ODE book for the additional considerations which arise in the case of complex or repeated roots.
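Each of the three solvable classes above can be checked with a computer algebra system. The following is a sketch assuming SymPy is available; the particular equations are arbitrary illustrative choices, one instance of each type.

```python
# One instance of each exactly solvable class, solved symbolically.
# A sketch assuming SymPy; the specific equations are illustrative choices.
import sympy as sp

t = sp.symbols('t')
u = sp.Function('u')

# First order linear (2.1.14): u' + 2t*u = t, integrating factor exp(t^2)
print(sp.dsolve(u(t).diff(t) + 2*t*u(t) - t, u(t)))

# Constant coefficients (2.1.17): u'' + 3u' + 2u = 0, roots -1 and -2
print(sp.dsolve(u(t).diff(t, 2) + 3*u(t).diff(t) + 2*u(t), u(t)))

# Cauchy-Euler (2.1.19) with t0 = 0: t^2 u'' + t u' - u = 0, exponents +1, -1
print(sp.dsolve(t**2*u(t).diff(t, 2) + t*u(t).diff(t) - u(t), u(t)))
```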
2.2 Integral equations
In this section we discuss the basic set-up for the study of linear integral equations. See for example [15], [21] for general references on the classical theory of integral equations.

Let Ω ⊂ R^N be a measurable set and set

$$Tu(x) = \int_\Omega K(x, y)\, u(y)\, dy \qquad (2.2.1)$$

Here the function K should be a measurable function on Ω × Ω, and is called the kernel of the integral operator T, which is linear since (2.1.6) obviously holds.

A class of associated integral equations is then

$$\int_\Omega K(x, y)\, u(y)\, dy = \lambda u(x) + g(x) \qquad x \in \Omega \qquad (2.2.2)$$

for some scalar λ and given function g in some appropriate class. If λ = 0 then (2.2.2) is a first kind integral equation, otherwise it is of the second kind. Let us consider some simple examples which may be studied by elementary means.
Example 2.6. Let Ω = (0, 1) ⊂ R and K(x, y) ≡ 1. The corresponding first kind integral equation is therefore

$$\int_0^1 u(y)\, dy = g(x) \qquad 0 < x < 1 \qquad (2.2.3)$$

For simplicity here we will assume that g is a continuous function. The left hand side is independent of x, thus a solution can exist only if g(x) is a constant function. When g is constant, on the other hand, infinitely many solutions will exist, since we just need to find any u with the given definite integral.

For the corresponding second kind equation,

$$\int_0^1 u(y)\, dy = \lambda u(x) + g(x) \qquad (2.2.4)$$

a solution must have the specific form u(x) = (C − g(x))/λ for some constant C. Substituting into the equation then gives, after obvious simplification,

$$C - \int_0^1 g(y)\, dy = C\lambda \qquad (2.2.5)$$

or

$$C = \frac{\int_0^1 g(y)\, dy}{1 - \lambda} \qquad (2.2.6)$$

in the case that λ ≠ 1. Thus, for any continuous function g and λ ≠ 0, 1, there exists a unique solution of the integral equation, namely

$$u(x) = \frac{\int_0^1 g(y)\, dy}{\lambda(1 - \lambda)} - \frac{g(x)}{\lambda} \qquad (2.2.7)$$

In the remaining case that λ = 1 it is immediate from (2.2.5) that a solution can exist only if ∫₀¹ g(y) dy = 0, in which case u(x) = C − g(x) is a solution for any choice of C. □
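The solution formula (2.2.7) is easy to confirm numerically. The sketch below assumes NumPy; the data g(x) = x and the value λ = 2 are arbitrary test choices.

```python
# Numerical check of the second kind solution formula (2.2.7).
# A sketch assuming NumPy; g(x) = x and lambda = 2 are test choices.
import numpy as np

lam = 2.0
g = lambda x: x
x = np.linspace(0.0, 1.0, 100001)    # uniform samples of (0,1)

int_g = g(x).mean()                  # approximates the integral of g, here 1/2
u = int_g / (lam * (1.0 - lam)) - g(x) / lam

lhs = u.mean()                       # approximates the integral of u over (0,1)
rhs = lam * u + g(x)                 # should be constant and equal to lhs
print(lhs, rhs.min(), rhs.max())
```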
This very simple example already exhibits features which turn out to be common to a much larger class of integral equations of this general type. These are:

• The first kind integral equation will require much more restrictive conditions on g in order for a solution to exist.

• For most λ ≠ 0 the second kind integral equation has a unique solution for any g.

• There may exist a few exceptional values of λ for which either existence or uniqueness fails in the corresponding second kind equation.

All of these points will be elaborated and made precise in Chapter ( ).
Example 2.7. Let Ω = (0, 1) and

$$Tu(x) = \int_0^x u(y)\, dy \qquad (2.2.8)$$

corresponding to the kernel

$$K(x, y) = \begin{cases} 1 & y < x \\ 0 & x \le y \end{cases} \qquad (2.2.9)$$

The corresponding integral equation may then be written as

$$\int_0^x u(y)\, dy = \lambda u(x) + g(x) \qquad (2.2.10)$$

This is the prototype of an integral operator of so-called Volterra type; see the definition below.
In the first kind case, λ = 0, we see that g(0) = 0 is a necessary condition for solvability, in which case the solution is u(x) = g'(x), provided that g is differentiable in some suitable sense. For λ ≠ 0 we note that differentiation of (2.2.10) with respect to x gives

$$u' - \frac{1}{\lambda} u = -\frac{g'(x)}{\lambda} \qquad (2.2.11)$$

which is of the type (2.1.14), and so may be solved by the method given there. The result, after some obvious algebraic manipulation, is

$$u(x) = -\frac{e^{x/\lambda}}{\lambda}\, g(0) - \frac{1}{\lambda}\int_0^x e^{(x-y)/\lambda}\, g'(y)\, dy \qquad (2.2.12)$$

Note, however, that by an integration by parts, this formula is seen to be equivalent to

$$u(x) = -\frac{g(x)}{\lambda} - \frac{1}{\lambda^2}\int_0^x e^{(x-y)/\lambda}\, g(y)\, dy \qquad (2.2.13)$$

Observe that (2.2.12) seems to require differentiability of g even though (2.2.13) does not, thus (2.2.13) would be the preferred solution formula. It may be verified directly by substitution that (2.2.13) is a valid solution of (2.2.10) for all λ ≠ 0, assuming that g is continuous on [0, 1].
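The verification by substitution can also be carried out symbolically. The sketch below assumes SymPy, with the arbitrary test choices g(y) = y and λ = 1; it substitutes (2.2.13) back into (2.2.10).

```python
# Symbolic check that formula (2.2.13) satisfies the Volterra equation
# (2.2.10). A sketch assuming SymPy; g(y) = y and lambda = 1 are test choices.
import sympy as sp

x, y = sp.symbols('x y')
lam = sp.Integer(1)
g = y

# u(x) from (2.2.13)
u = -g.subs(y, x)/lam - sp.integrate(sp.exp((x - y)/lam)*g, (y, 0, x))/lam**2

# the equation requires int_0^x u(y) dy = lam*u(x) + g(x)
lhs = sp.integrate(u.subs(x, y), (y, 0, x))
rhs = lam*u + g.subs(y, x)
print(sp.simplify(lhs - rhs))        # should print 0
```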
Concerning the two simple integral equations just discussed observe that
• For the first kind equation, there are fewer restrictions on g needed for solvability
in the Volterra case (2.2.10) than in the non-Volterra case (2.2.4).
• There are no exceptional values λ 6= 0 in the Volterra case, that is, a unique solution
exists for every λ 6= 0 and every continuous g.
Here are some of the more important ways in which integral operators are classified:

Definition 2.1. The kernel K(x, y) is called

• symmetric if K(x, y) = K(y, x)

• Volterra type if N = 1 and K(x, y) = 0 for x > y or for x < y

• convolution type if K(x, y) = K(x − y)

• Hilbert-Schmidt type if $\int_{\Omega\times\Omega} |K(x, y)|^2\, dx\, dy < \infty$

• singular if K(x, y) is unbounded on Ω × Ω
Some important examples of integral operators, which will receive much more attention later in the book, are the Fourier transform

$$Tu(x) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} e^{-ix\cdot y}\, u(y)\, dy, \qquad (2.2.14)$$

the Laplace transform

$$Tu(x) = \int_0^\infty e^{-xy}\, u(y)\, dy, \qquad (2.2.15)$$

the Hilbert transform

$$Tu(x) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{u(y)}{x - y}\, dy, \qquad (2.2.16)$$

and the Abel operator

$$Tu(x) = \int_0^x \frac{u(y)}{\sqrt{x - y}}\, dy. \qquad (2.2.17)$$
2.3 Partial differential equations
An m'th order partial differential equation (PDE) for an unknown function u = u(x) on a domain Ω ⊂ R^N may be given in the form

$$F(x, \{D^\alpha u\}_{|\alpha| \le m}) = 0 \qquad (2.3.1)$$

Here we are using the so-called multi-index notation for partial derivatives, which works as follows. A multi-index is a vector of non-negative integers,

$$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_N) \qquad \alpha_i \in \{0, 1, \dots\} \qquad (2.3.2)$$

In terms of α we define

$$|\alpha| = \sum_{i=1}^{N} \alpha_i \qquad (2.3.3)$$

the order of α, and

$$D^\alpha u = \frac{\partial^{|\alpha|} u}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_N^{\alpha_N}} \qquad (2.3.4)$$

the corresponding α derivative of u. For later use it is also convenient to define the factorial of a multi-index,

$$\alpha! = \alpha_1!\, \alpha_2! \cdots \alpha_N! \qquad (2.3.5)$$

The PDE (2.3.1) is linear if it can be written as

$$Lu(x) = \sum_{|\alpha| \le m} a_\alpha(x)\, D^\alpha u(x) = g(x) \qquad (2.3.6)$$
2.3.1 First order PDEs and the method of characteristics
Let us start with the simplest possible example.

Example 2.8. When N = 2 and m = 1 consider

$$\frac{\partial u}{\partial x_1} = 0 \qquad (2.3.7)$$

By elementary calculus considerations it is clear that u is a solution if and only if u is independent of x_1, i.e.

$$u(x_1, x_2) = f(x_2) \qquad (2.3.8)$$

for some function f. This is then the general solution of the given PDE, which we note contains an arbitrary function f. □
Example 2.9. Next consider, again for N = 2, m = 1, the PDE

$$a \frac{\partial u}{\partial x_1} + b \frac{\partial u}{\partial x_2} = 0 \qquad (2.3.9)$$

where a, b are fixed constants. This amounts precisely to the condition that u has directional derivative 0 in the direction θ = ⟨a, b⟩, so u is constant along any line parallel to θ. This in turn leads to the conclusion that u(x_1, x_2) = f(ax_2 − bx_1) for some arbitrary function f, which at least for the moment would seem to need to be differentiable. □

The collection of lines parallel to θ, i.e. the lines ax_2 − bx_1 = C, obviously plays a special role in the above example; they are the so-called characteristics, or characteristic curves, associated to this particular PDE. The general concept of characteristic curve will now be described for the case of a first order linear PDE in two independent variables (with a temporary change of notation),

$$a(x, y) u_x + b(x, y) u_y = c(x, y) \qquad (2.3.10)$$
Consider the associated ODE system

$$\frac{dx}{dt} = a(x, y) \qquad \frac{dy}{dt} = b(x, y) \qquad (2.3.11)$$

and suppose we have some solution pair x = x(t), y = y(t), which we regard as a parametrically given curve in the (x, y) plane. Such a curve is then, by definition, a characteristic curve for (2.3.10). Observe that if u(x, y) is a differentiable solution of (2.3.10) then

$$\frac{d}{dt} u(x(t), y(t)) = a(x(t), y(t))\, u_x(x(t), y(t)) + b(x(t), y(t))\, u_y(x(t), y(t)) = c(x(t), y(t)) \qquad (2.3.12)$$

so that u satisfies a certain first order ODE along any characteristic curve. For example if c(x, y) ≡ 0 then, as in the previous example, any solution of the PDE is constant along any characteristic curve.
Now let Γ ⊂ R^2 be some curve, which we assume can be parametrized as

$$x = f(s), \quad y = g(s), \quad s_0 < s < s_1 \qquad (2.3.13)$$

The Cauchy problem for (2.3.10) consists in finding a solution of (2.3.10) with values prescribed on Γ, that is,

$$u(f(s), g(s)) = h(s) \qquad s_0 < s < s_1 \qquad (2.3.14)$$
for some given function h. Assuming for the moment that such a solution u exists, let x(t, s), y(t, s) be the characteristic curve passing through (f(s), g(s)) ∈ Γ when t = 0, i.e.

$$\frac{\partial x}{\partial t} = a(x, y) \quad x(0, s) = f(s) \qquad \frac{\partial y}{\partial t} = b(x, y) \quad y(0, s) = g(s) \qquad (2.3.15)$$

We must then have

$$\frac{\partial}{\partial t} u(x(t, s), y(t, s)) = c(x(t, s), y(t, s)) \qquad u(x(0, s), y(0, s)) = h(s) \qquad (2.3.16)$$

This is a first order initial value problem in t, depending on s as a parameter, which is then guaranteed to have a solution at least for |t| < ε for some ε > 0. The three relations x = x(t, s), y = y(t, s), z = u(x(t, s), y(t, s)) generally amount to the parametric description of a surface in R^3 containing Γ. If we can eliminate the parameters s, t to obtain the surface in non-parametric form z = u(x, y), then u is the sought after solution of the Cauchy problem.
Example 2.10. Let Γ denote the x axis and let us solve

$$x u_x + u_y = 1 \qquad (2.3.17)$$

with u = h on Γ. Introducing f(s) = s, g(s) = 0 as the parametrization of Γ, we must then solve

$$\frac{\partial x}{\partial t} = x \quad x(0, s) = s \qquad \frac{\partial y}{\partial t} = 1 \quad y(0, s) = 0 \qquad \frac{\partial}{\partial t} u(x(t, s), y(t, s)) = 1 \quad u(s, 0) = h(s) \qquad (2.3.18)$$

We then easily obtain

$$x(s, t) = s e^t \qquad y(s, t) = t \qquad u(x(s, t), y(s, t)) = t + h(s) \qquad (2.3.19)$$

and eliminating t, s yields the solution formula

$$u(x, y) = y + h(x e^{-y}) \qquad (2.3.20)$$

The characteristics in this case are the curves x = se^t, y = t for fixed s, or x = se^y in nonparametric form. Note here that the solution is defined throughout the (x, y) plane even though nothing in the preceding discussion guarantees that. Since h has not been otherwise prescribed, we may also regard (2.3.20) as the general solution of (2.3.17).
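The solution formula (2.3.20) can be verified symbolically for an arbitrary differentiable h; a minimal sketch, assuming SymPy is available:

```python
# Check that u(x,y) = y + h(x*exp(-y)) solves x*u_x + u_y = 1 for an
# arbitrary differentiable h, and that u(x,0) = h(x). A sketch using SymPy.
import sympy as sp

x, y = sp.symbols('x y')
h = sp.Function('h')

u = y + h(x*sp.exp(-y))
residual = x*u.diff(x) + u.diff(y) - 1
print(sp.simplify(residual))         # should print 0
print(u.subs(y, 0))                  # should print h(x)
```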
The attentive reader may already realize that this procedure cannot work in all cases, as is made clear by the following consideration: if c ≡ 0 and Γ is itself a characteristic curve, then the solution on Γ would have to simultaneously be equal to the given function h and be constant, so that no solution can exist except possibly in the case that h is a constant function. From another, more general, point of view, we must eliminate the parameters s, t by inverting the relations x = x(s, t), y = y(s, t) to obtain s, t in terms of x, y, at least near Γ, and according to the inverse function theorem this should require that the Jacobian matrix

$$\begin{pmatrix} \frac{\partial x}{\partial t} & \frac{\partial y}{\partial t} \\ \frac{\partial x}{\partial s} & \frac{\partial y}{\partial s} \end{pmatrix}\Bigg|_{t=0} = \begin{pmatrix} a(f(s), g(s)) & b(f(s), g(s)) \\ f'(s) & g'(s) \end{pmatrix} \qquad (2.3.21)$$

be nonsingular for all s. Equivalently, the direction ⟨f', g'⟩ should not be parallel to ⟨a, b⟩, and since ⟨a, b⟩ must be tangent to the characteristic curve, this amounts to the requirement that Γ itself should have a non-characteristic tangent direction at every point. We say that Γ is non-characteristic for the PDE (2.3.10) when this condition holds.
The following precise theorem can be established; see for example Chapter 1 of [18], or Chapter 3 of [10].

Theorem 2.2. Let Γ ⊂ R^2 be a continuously differentiable curve which is non-characteristic for (2.3.10), h a continuously differentiable function on Γ, and let a, b, c be continuously differentiable functions in a neighborhood of Γ. Then there exists a unique continuously differentiable function u(x, y) defined in a neighborhood of Γ which is a solution of (2.3.10).
The method of characteristics is capable of a considerable amount of generalization, in particular to first order PDEs in any number of independent variables, and to fully nonlinear first order PDEs; see the references just given above.
2.3.2 Second order problems in R^2
Let us next look at the following special type of second order PDE in two independent variables x, y:

$$A u_{xx} + B u_{xy} + C u_{yy} = 0 \qquad (2.3.22)$$

where A, B, C are real constants, not all zero. Consider introducing new coordinates ξ, η by means of a linear change of variables

$$\xi = \alpha x + \beta y \qquad \eta = \gamma x + \delta y \qquad (2.3.23)$$
with αδ − βγ ≠ 0, so that the transformation is invertible. Our goal is to make a good choice of α, β, γ, δ so as to achieve a simpler, but equivalent, PDE to study.

Given any PDE and any change of coordinates, we obtain the expression for the PDE in the new coordinate system by straightforward application of the chain rule. In our case, for example, we have

$$\frac{\partial u}{\partial x} = \frac{\partial u}{\partial \xi}\frac{\partial \xi}{\partial x} + \frac{\partial u}{\partial \eta}\frac{\partial \eta}{\partial x} = \alpha \frac{\partial u}{\partial \xi} + \gamma \frac{\partial u}{\partial \eta} \qquad (2.3.24)$$

$$\frac{\partial^2 u}{\partial x^2} = \left(\alpha \frac{\partial}{\partial \xi} + \gamma \frac{\partial}{\partial \eta}\right)\left(\alpha \frac{\partial u}{\partial \xi} + \gamma \frac{\partial u}{\partial \eta}\right) = \alpha^2 \frac{\partial^2 u}{\partial \xi^2} + 2\alpha\gamma \frac{\partial^2 u}{\partial \xi \partial \eta} + \gamma^2 \frac{\partial^2 u}{\partial \eta^2} \qquad (2.3.25)$$

with similar expressions for u_{xy} and u_{yy}. Substituting into (2.3.22), the resulting PDE is

$$a u_{\xi\xi} + b u_{\xi\eta} + c u_{\eta\eta} = 0 \qquad (2.3.26)$$

where

$$a = \alpha^2 A + \alpha\beta B + \beta^2 C \qquad (2.3.27)$$
$$b = 2\alpha\gamma A + (\alpha\delta + \beta\gamma) B + 2\beta\delta C \qquad (2.3.28)$$
$$c = \gamma^2 A + \gamma\delta B + \delta^2 C \qquad (2.3.29)$$

The idea now is to make special choices of α, β, γ, δ to achieve as simple a form as possible for the transformed PDE (2.3.26).
Suppose first that B^2 − 4AC > 0, so that there exist two real and distinct roots r_1, r_2 of Ar^2 + Br + C = 0. If α, β, γ, δ are chosen so that

$$\frac{\alpha}{\beta} = r_1 \qquad \frac{\gamma}{\delta} = r_2 \qquad (2.3.30)$$

then a = c = 0 (and αδ − βγ ≠ 0), so that the transformed PDE is simply u_ξη = 0. The general solution of this second order PDE is easily obtained: u_ξ must be a function of ξ alone, so integrating with respect to ξ and observing that the 'constant of integration' could be any function of η, we get

$$u(\xi, \eta) = F(\xi) + G(\eta) \qquad (2.3.31)$$

for any differentiable functions F, G. Finally, reverting to the original coordinate system, the result is

$$u(x, y) = F(\alpha x + \beta y) + G(\gamma x + \delta y) \qquad (2.3.32)$$

The lines αx + βy = C, γx + δy = C are called the characteristics for (2.3.22). Characteristics are an important concept for this and some more general second order PDEs, but they don't play as central a role as in the first order case.
Example 2.11. For the PDE

$$u_{xx} - u_{yy} = 0 \qquad (2.3.33)$$

the roots r satisfy r^2 − 1 = 0. We may then choose, for example, α = β = γ = 1, δ = −1, to get the general solution

$$u(x, y) = F(x + y) + G(x - y) \qquad (2.3.34)$$
Next assume that B^2 − 4AC = 0. If either of A or C is 0, then so is B, in which case the PDE already has the form u_ξξ = 0 or u_ηη = 0; say the first of these, without loss of generality. Otherwise, choose

$$\alpha = -\frac{B}{2A} \qquad \beta = 1 \qquad \gamma = 1 \qquad \delta = 0 \qquad (2.3.35)$$

to obtain a = b = 0, c = A, so that the transformed PDE in all cases is u_ξξ = 0.

Finally, if B^2 − 4AC < 0 then A ≠ 0 must hold, and we may choose

$$\alpha = 1 \qquad \beta = 0 \qquad \gamma = \frac{-B}{\sqrt{4AC - B^2}} \qquad \delta = \frac{2A}{\sqrt{4AC - B^2}} \qquad (2.3.36)$$

in which case a = c = A and b = 0, so the transformed equation is

$$u_{\xi\xi} + u_{\eta\eta} = 0 \qquad (2.3.37)$$
We have therefore established that any PDE of the type (2.3.22) can be transformed, by means of a linear change of variables, to one of the three simple types

$$u_{\xi\eta} = 0 \qquad u_{\xi\xi} = 0 \qquad u_{\xi\xi} + u_{\eta\eta} = 0 \qquad (2.3.38)$$

each of which then leads to a prototype for a certain class of PDEs. If we allow lower order terms,

$$A u_{xx} + B u_{xy} + C u_{yy} + D u_x + E u_y + F u = G \qquad (2.3.39)$$

then after the transformation (2.3.23) it is clear that the lower order terms remain lower order terms. Thus any PDE of the type (2.3.39) is, up to a change of coordinates and up to lower order terms, of one of the three types (2.3.38), and only the value of the discriminant B^2 − 4AC needs to be known to determine which of the three types is obtained.

The above discussion motivates the following classification: the PDE (2.3.39) is said to be
• hyperbolic if B^2 − 4AC > 0

• parabolic if B^2 − 4AC = 0

• elliptic if B^2 − 4AC < 0

The terminology comes from an obvious analogy with conic sections: the solution set of Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 is respectively a hyperbola, parabola or ellipse (or a degenerate case) according as B^2 − 4AC is positive, zero or negative.
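Since the type is determined by the sign of the discriminant alone, the classification is trivial to automate; a minimal sketch:

```python
# Classify A*u_xx + B*u_xy + C*u_yy + (lower order terms) = G by the
# sign of the discriminant B^2 - 4AC. A minimal sketch.
def classify(A, B, C):
    d = B**2 - 4*A*C
    if d > 0:
        return "hyperbolic"
    if d == 0:
        return "parabolic"
    return "elliptic"

print(classify(1, 0, -1))   # u_xx - u_yy = 0: hyperbolic
print(classify(1, 0, 0))    # u_xx = 0: parabolic
print(classify(1, 0, 1))    # u_xx + u_yy = 0: elliptic
```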
We can also allow the coefficients A, B, ..., G to be variable functions of x, y, and in this case the classification is done pointwise, so the type can change. An important example of this phenomenon is the so-called Tricomi equation (see e.g. Chapter 12 of [13])

$$u_{xx} - x u_{yy} = 0 \qquad (2.3.40)$$

which is hyperbolic for x > 0 and elliptic for x < 0. One might refer to the equation as being parabolic for x = 0, but generally speaking we do not do this, since it is not really meaningful to speak of a PDE being satisfied in a set without interior points.

The above discussion is special to the case of N = 2 independent variables; for N ≥ 3 there is no such complete classification. As we will see, there are still PDEs referred to as being hyperbolic, parabolic or elliptic, but there are others which are not of any of these types, although these tend to be of less physical importance.
2.3.3 Further discussion of model problems
According to the previous discussion, we should focus our attention on a representative
problem for each of the three types, since then we will also gain considerable information
about other problems of the given type.
Wave equation
For the hyperbolic case we consider the wave equation

$$u_{tt} - c^2 u_{xx} = 0 \qquad (2.3.41)$$

where c > 0 is a constant. Here we have changed the name of the variable y to t, following the usual convention of regarding u = u(x, t) as depending on a 'space' variable x and a 'time' variable t. This PDE arises in the simplest model of wave propagation in one dimension, where u represents, for example, the displacement of a vibrating medium from its equilibrium position, and c is the wave speed.

Following the procedure outlined at the beginning of this section, an appropriate change of coordinates is ξ = x + ct, η = x − ct, and we obtain the expression, also known as d'Alembert's formula, for the general solution:

$$u(x, t) = F(x + ct) + G(x - ct) \qquad (2.3.42)$$

for arbitrary twice differentiable functions F, G. The general solution may be viewed as the superposition of two waves of fixed shape, moving to the right and to the left with speed c.
The initial value problem for the wave equation consists in solving (2.3.41) for x ∈ R and t > 0 subject to the side conditions

$$u(x, 0) = f(x) \qquad u_t(x, 0) = g(x) \qquad x \in \mathbb{R} \qquad (2.3.43)$$

where f, g represent the initial displacement and initial velocity of the vibrating medium. This problem may be completely and explicitly solved by means of d'Alembert's formula. We have

$$F(x) + G(x) = f(x) \qquad c\,(F'(x) - G'(x)) = g(x) \qquad x \in \mathbb{R} \qquad (2.3.44)$$

Integrating the second relation gives F(x) − G(x) = (1/c)∫₀ˣ g(s) ds + C for some constant C, and combining with the first relation yields

$$F(x) = \frac{1}{2}\left(f(x) + \frac{1}{c}\int_0^x g(s)\, ds + C\right) \qquad G(x) = \frac{1}{2}\left(f(x) - \frac{1}{c}\int_0^x g(s)\, ds - C\right) \qquad (2.3.45)$$

Substituting into (2.3.42) and doing some obvious simplification we obtain

$$u(x, t) = \frac{1}{2}\big(f(x + ct) + f(x - ct)\big) + \frac{1}{2c}\int_{x-ct}^{x+ct} g(s)\, ds \qquad (2.3.46)$$
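Formula (2.3.46) can be evaluated directly. The sketch below assumes NumPy, with a Gaussian initial displacement and zero initial velocity as arbitrary test choices; it shows the initial profile splitting into traveling waves.

```python
# Direct evaluation of d'Alembert's formula (2.3.46). A sketch assuming
# NumPy; the Gaussian initial displacement is an arbitrary test choice.
import numpy as np

def dalembert(f, g, c, x, t, n=2000):
    # u(x,t) = (f(x+ct) + f(x-ct))/2 + (1/(2c)) * int_{x-ct}^{x+ct} g(s) ds
    s = np.linspace(x - c*t, x + c*t, n)
    integral = g(s).mean() * (s[-1] - s[0])   # simple uniform-sample quadrature
    return 0.5*(f(x + c*t) + f(x - c*t)) + integral/(2.0*c)

f = lambda x: np.exp(-x**2)        # initial displacement
g = lambda x: 0.0*x                # zero initial velocity

for t in [0.0, 1.0, 2.0]:
    # the right-moving half-height wave arrives at x = 2 when t = 2
    print(t, dalembert(f, g, c=1.0, x=2.0, t=t))
```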
We remark that a general solution formula like (2.3.42) can be given for any PDE which is exactly transformable to u_ξη = 0, that is to say, any hyperbolic PDE of the form (2.3.22), but once lower order terms are allowed such a simple solution method is no longer available. For example the so-called Klein-Gordon equation u_tt − u_xx + u = 0 may be transformed to u_ξη + u/4 = 0, which cannot be solved in so transparent a form. Thus the d'Alembert solution method, while very useful when applicable, is limited in its scope.
Heat equation
Another elementary method, which may be used in a wide variety of situations, is separation of variables. We illustrate with the case of the initial and boundary value problem

$$u_t = u_{xx} \qquad 0 < x < 1 \quad t > 0 \qquad (2.3.47)$$
$$u(0, t) = u(1, t) = 0 \qquad t > 0 \qquad (2.3.48)$$
$$u(x, 0) = f(x) \qquad 0 < x < 1 \qquad (2.3.49)$$

Here (2.3.47) is the heat equation, a parabolic equation modeling, for example, the temperature u = u(x, t) in a one dimensional medium as a function of location x and time t; (2.3.48) are the boundary conditions, stating that the temperature is held at zero at the two boundary points x = 0 and x = 1 for all t; and (2.3.49) represents the initial condition, i.e. that the initial temperature distribution is given by the prescribed function f(x).
We begin by ignoring the initial condition and otherwise looking for special solutions of the form u(x, t) = φ(t)ψ(x). Obviously u = 0 is such a solution, but it cannot be of any help in eventually solving the full stated problem, so we insist that neither of φ and ψ is the zero function. Inserting into (2.3.47) we obtain immediately that

$$\varphi'(t)\psi(x) = \varphi(t)\psi''(x) \qquad (2.3.50)$$

must hold, or equivalently

$$\frac{\varphi'(t)}{\varphi(t)} = \frac{\psi''(x)}{\psi(x)} \qquad (2.3.51)$$

Since the left side depends on t alone and the right side on x alone, it must be that both sides are equal to a common constant, which we denote by −λ (without yet at this point ruling out the possibility that λ itself is negative or even complex). We have therefore obtained ODEs for φ and ψ,

$$\varphi'(t) + \lambda \varphi(t) = 0 \qquad \psi''(x) + \lambda \psi(x) = 0 \qquad (2.3.52)$$

linked via the separation constant λ. Next, from the boundary condition (2.3.48) we get φ(t)ψ(0) = φ(t)ψ(1) = 0, and since φ is nonzero we must have ψ(0) = ψ(1) = 0.

The ODE and side conditions for ψ, namely

$$\psi''(x) + \lambda \psi(x) = 0 \quad 0 < x < 1 \qquad \psi(0) = \psi(1) = 0 \qquad (2.3.53)$$
is the simplest example of a so-called Sturm-Liouville problem, a topic which will be studied in detail in Chapter ( ), but this particular case can be handled by elementary considerations. We emphasize that our goal is to find nonzero solutions of (2.3.53), along with the values of λ to which these correspond, and as we will see, only certain values of λ will be possible.

Considering first the case that λ > 0, the general solution of the ODE is

$$\psi(x) = c_1 \sin \sqrt{\lambda}\, x + c_2 \cos \sqrt{\lambda}\, x \qquad (2.3.54)$$

The first boundary condition ψ(0) = 0 implies that c_2 = 0, and the second gives c_1 sin √λ = 0. We are not allowed to have c_1 = 0, since otherwise ψ = 0, so instead sin √λ = 0 must hold, i.e. √λ = π, 2π, .... Thus we have found one collection of solutions of (2.3.53), which we denote ψ_k(x) = sin kπx, k = 1, 2, .... Since they were found under the assumption that λ > 0, we should next consider the other possibilities, but it turns out that we have already found all possible solutions of (2.3.53). For example, if we suppose λ < 0 and k = √(−λ), then to solve (2.3.53) we must have ψ(x) = c_1 e^{kx} + c_2 e^{−kx}. From the boundary conditions

$$c_1 + c_2 = 0 \qquad c_1 e^k + c_2 e^{-k} = 0 \qquad (2.3.55)$$

we see that the unique solution is c_1 = c_2 = 0 for any k > 0. Likewise we can check that ψ = 0 is the only possible solution for λ = 0 and for nonreal λ.
For each allowed value of λ we obviously have the corresponding function φ(t) = e^{−λt}, so that

$$u_k(x, t) = e^{-k^2\pi^2 t} \sin k\pi x \qquad k = 1, 2, \dots \qquad (2.3.56)$$

represents, aside from multiplicative constants, all possible product solutions of (2.3.47),(2.3.48).

To complete the solution of the initial and boundary value problem, we observe that any sum Σ_{k=1}^∞ c_k u_k(x, t) is also a solution of (2.3.47),(2.3.48) as long as c_k → 0 sufficiently rapidly, and we try to choose the coefficients c_k to achieve the initial condition (2.3.49). The requirement is therefore that

$$f(x) = \sum_{k=1}^{\infty} c_k \sin k\pi x \qquad (2.3.57)$$

hold. For any f for which such a sine series representation is valid, we then have the solution of the given PDE problem

$$u(x, t) = \sum_{k=1}^{\infty} c_k e^{-k^2\pi^2 t} \sin k\pi x \qquad (2.3.58)$$
The question then becomes how to characterize this set of f's in some more straightforward way, and this is done, among many other things, within the theory of Fourier series, which will be discussed in Chapter 8. Roughly speaking, the result will be that essentially any reasonable function can be represented this way, but there are many aspects to this, including elaboration of the precise sense in which the series converges. One other fact concerning this series which we can easily anticipate at this point is a formula for the coefficient c_k: if we assume that (2.3.57) holds, we can multiply both sides by sin mπx for some integer m and integrate with respect to x over (0, 1), to obtain

$$\int_0^1 f(x) \sin m\pi x\, dx = c_m \int_0^1 \sin^2 m\pi x\, dx = \frac{c_m}{2} \qquad (2.3.59)$$

since ∫₀¹ sin kπx sin mπx dx = 0 for k ≠ m. Thus, if f is representable by a sine series, there is only one possibility for the k'th coefficient, namely

$$c_k = 2\int_0^1 f(x) \sin k\pi x\, dx \qquad (2.3.60)$$
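The coefficients (2.3.60) and the truncated series (2.3.58) translate directly into a few lines of code. The sketch below assumes NumPy, with the arbitrary test choice f(x) = x(1 − x).

```python
# Truncated series solution (2.3.58) of the heat problem, with the sine
# coefficients computed from (2.3.60). A sketch assuming NumPy; the initial
# data f(x) = x(1-x) is an arbitrary test choice.
import numpy as np

f = lambda x: x*(1.0 - x)
xq = np.linspace(0.0, 1.0, 20001)            # quadrature samples of (0,1)

def ck(k):
    # c_k = 2*int_0^1 f(x) sin(k*pi*x) dx, by uniform-sample quadrature
    return 2.0 * np.mean(f(xq) * np.sin(k*np.pi*xq))

def u(x, t, K=50):
    return sum(ck(k) * np.exp(-(k*np.pi)**2 * t) * np.sin(k*np.pi*x)
               for k in range(1, K + 1))

print(u(0.5, 0.0), f(0.5))    # at t = 0 the series reproduces f
print(u(0.5, 0.1))            # the solution decays as t increases
```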
Laplace equation
Finally we discuss a model problem of elliptic type,

$$u_{xx} + u_{yy} = 0 \qquad x^2 + y^2 < 1 \qquad (2.3.61)$$
$$u(x, y) = f(x, y) \qquad x^2 + y^2 = 1 \qquad (2.3.62)$$

where f is a given function. The PDE in (2.3.61) is known as Laplace's equation, and is commonly written as ∆u = 0 where ∆ = ∂²/∂x² + ∂²/∂y² is the Laplace operator, or Laplacian. A function satisfying Laplace's equation in some set is said to be a harmonic function on that set; thus we are solving the boundary value problem of finding a harmonic function in the unit disk x² + y² < 1 subject to a prescribed boundary condition on the boundary of the disk.

One should immediately recognize that it is natural here to make use of polar coordinates (r, θ), where according to the usual calculus notations,

$$r = \sqrt{x^2 + y^2} \qquad \tan\theta = \frac{y}{x} \qquad x = r\cos\theta \qquad y = r\sin\theta \qquad (2.3.63)$$

and we regard u = u(r, θ) and f = f(θ).
To begin we need to find the expression for Laplace's equation in polar coordinates. Again this is a straightforward calculation with the chain rule; for example,

$$\frac{\partial u}{\partial x} = \frac{\partial u}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial u}{\partial \theta}\frac{\partial \theta}{\partial x} \qquad (2.3.64)$$
$$= \frac{x}{\sqrt{x^2 + y^2}}\frac{\partial u}{\partial r} - \frac{y}{x^2 + y^2}\frac{\partial u}{\partial \theta} \qquad (2.3.65)$$
$$= \cos\theta\, \frac{\partial u}{\partial r} - \frac{\sin\theta}{r}\frac{\partial u}{\partial \theta} \qquad (2.3.66)$$

with similar expressions for ∂u/∂y and the second derivatives. The end result is

$$u_{xx} + u_{yy} = u_{rr} + \frac{1}{r} u_r + \frac{1}{r^2} u_{\theta\theta} = 0 \qquad (2.3.67)$$
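A concrete instance of (2.3.67) can be checked symbolically. For the test choice u = r³ cos 2θ one has u_rr + u_r/r + u_θθ/r² = 5r cos 2θ, and in Cartesian form u = r(x² − y²); the sketch below, assuming SymPy, confirms that u_xx + u_yy gives the same result.

```python
# Concrete check of the polar form (2.3.67) of the Laplacian, a sketch
# using SymPy. For u = r^3*cos(2*theta) = r*(x^2 - y^2), the polar formula
# gives u_rr + u_r/r + u_thth/r^2 = 5*r*cos(2*theta) = 5*(x^2 - y^2)/r.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
r = sp.sqrt(x**2 + y**2)

u = r * (x**2 - y**2)
cartesian = u.diff(x, 2) + u.diff(y, 2)
polar = 5 * (x**2 - y**2) / r

print(sp.simplify(cartesian - polar))   # should print 0
```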
We may now try separation of variables, looking for solutions in the product form u(r, θ) = R(r)Θ(θ). Substituting into (2.3.67) and dividing by RΘ gives

$$r^2 \frac{R''(r)}{R(r)} + r\frac{R'(r)}{R(r)} = -\frac{\Theta''(\theta)}{\Theta(\theta)} \qquad (2.3.68)$$

so both sides must be equal to a common constant λ. Therefore R and Θ must be nonzero solutions of

$$\Theta'' + \lambda\Theta = 0 \qquad r^2 R'' + rR' - \lambda R = 0 \qquad (2.3.69)$$

Next it is necessary to recognize that there are two 'hidden' side conditions which we must make use of. The first of these is that Θ must be 2π periodic, since otherwise it would not be possible to express the solution u in terms of the original variables x, y in an unambiguous way. We can make this explicit by requiring

$$\Theta(0) = \Theta(2\pi) \qquad \Theta'(0) = \Theta'(2\pi) \qquad (2.3.70)$$
As in the case of (2.3.53), we can search for allowable values of λ by considering the various cases λ > 0, λ < 0, etc. The outcome is that nontrivial solutions exist precisely if λ = k², k = 0, 1, 2, ..., with corresponding solutions, up to multiplicative constant,

$$\psi_k(x) = \begin{cases} 1 & k = 0 \\ \sin kx \text{ or } \cos kx & k = 1, 2, \dots \end{cases} \qquad (2.3.71)$$

If one is willing to use the complex form, we could replace sin kx, cos kx by e^{±ikx} for k = 1, 2, ....
With λ determined we must next solve the corresponding R equation,

$$r^2 R'' + rR' - k^2 R = 0 \qquad (2.3.72)$$

which is of the Cauchy-Euler type (2.1.19). The general solution is

$$R(r) = \begin{cases} c_1 + c_2 \log r & k = 0 \\ c_1 r^k + c_2 r^{-k} & k = 1, 2, \dots \end{cases} \qquad (2.3.73)$$

and here we encounter the second hidden condition: the solution R should not be singular at the origin, since otherwise the PDE would not be satisfied throughout the unit disk. Thus we should choose c_2 = 0 in each case, leaving R(r) = r^k, k = 0, 1, ....

Summarizing, we have found all possible product solutions R(r)Θ(θ) of (2.3.61), and they are

$$1, \quad r^k \sin k\theta, \quad r^k \cos k\theta \qquad k = 1, 2, \dots \qquad (2.3.74)$$
up to constant multiples. Any sum of such terms is also a solution of (2.3.61), so we seek a solution of (2.3.61),(2.3.62) in the form

$$u(r, \theta) = a_0 + \sum_{k=1}^{\infty} \left(a_k r^k \cos k\theta + b_k r^k \sin k\theta\right) \qquad (2.3.75)$$

The coefficients must then be determined from the requirement that

$$f(\theta) = a_0 + \sum_{k=1}^{\infty} \left(a_k \cos k\theta + b_k \sin k\theta\right) \qquad (2.3.76)$$

This is another problem in the theory of Fourier series, very similar to that associated with (2.3.57), which as mentioned before will be studied in detail in Chapter 8. Exact formulas for the coefficients in terms of f may be given, as in (2.3.60); see Exercise 19.
2.3.4 Standard problems and side conditions
Let us now formulate a number of typical PDE problems which will recur throughout this book, and which are for the most part variants of the model problems discussed in the previous section. Let Ω be some domain in R^N and let ∂Ω denote the boundary of Ω. For any sufficiently differentiable function u, the Laplacian of u is

$$\Delta u = \sum_{k=1}^{N} \frac{\partial^2 u}{\partial x_k^2} \qquad (2.3.77)$$
• The PDE

$$\Delta u = h \qquad x \in \Omega \qquad (2.3.78)$$

is Poisson's equation, or Laplace's equation in the special case h = 0. It is regarded as being of elliptic type, by analogy with the N = 2 case discussed in the previous section, or on account of a more general definition of ellipticity which will be given in Chapter 9. The most common types of side conditions associated with this PDE are:

– Dirichlet, or first kind, boundary conditions

$$u(x) = g(x) \qquad x \in \partial\Omega \qquad (2.3.79)$$

– Neumann, or second kind, boundary conditions

$$\frac{\partial u}{\partial n}(x) = g(x) \qquad x \in \partial\Omega \qquad (2.3.80)$$

where (∂u/∂n)(x) = (∇u · n)(x) is the directional derivative in the direction of the outward normal n(x) for x ∈ ∂Ω.

– Robin, or third kind, boundary conditions

$$\frac{\partial u}{\partial n}(x) + \sigma(x) u(x) = g(x) \qquad x \in \partial\Omega \qquad (2.3.81)$$

for some given function σ.
• The PDE

$$\Delta u + \lambda u = h \qquad x \in \Omega \qquad (2.3.82)$$

where λ is some constant, is the Helmholtz equation, also of elliptic type. The three types of boundary condition mentioned for the Poisson equation may also be imposed in this case.
• The PDE

$$u_t = \Delta u \qquad x \in \Omega \quad t > 0 \qquad (2.3.83)$$

is the heat equation and is of parabolic type. Here u = u(x, t), where x is regarded as a spatial variable and t a time variable. By convention, the Laplacian acts only with respect to the N spatial variables x_1, ..., x_N. Appropriate side conditions for determining a solution of the heat equation are an initial condition

$$u(x, 0) = f(x) \qquad x \in \Omega \qquad (2.3.84)$$

and boundary conditions of the Dirichlet, Neumann or Robin type mentioned above. The only needed modification is that the functions involved may be allowed to depend on t; for example, the Dirichlet boundary condition for the heat equation is

$$u(x, t) = g(x, t) \qquad x \in \partial\Omega \quad t > 0 \qquad (2.3.85)$$

and similarly for the other two types.
• The PDE

$$u_{tt} = \Delta u \qquad x \in \Omega \quad t > 0 \qquad (2.3.86)$$

is the wave equation and is of hyperbolic type. Since it is second order in t it is natural that there be two initial conditions, usually given as

$$u(x, 0) = f(x) \qquad u_t(x, 0) = g(x) \qquad x \in \Omega \qquad (2.3.87)$$

Suitable boundary conditions for the wave equation are precisely the same as for the heat equation.
• Finally, the PDE

$$i u_t = \Delta u \qquad x \in \mathbb{R}^N \quad t > 0 \qquad (2.3.88)$$

is the Schrödinger equation. Even when N = 1 it does not fall under the classification scheme of Section 2.3.2, because of the complex coefficient i = √−1. It is nevertheless one of the fundamental partial differential equations of mathematical physics, and we will have some things to say about it in later chapters. The spatial domain here is taken to be all of R^N rather than a subset Ω because this is by far the most common situation and the only one which will arise in this book. Since there is no spatial boundary, the only needed side condition is an initial condition for u, u(x, 0) = f(x), as in the heat equation case.
2.4 Well-posed and ill-posed problems
All of the PDEs and associated side conditions discussed in the previous section turn
out to be natural, in the sense that they lead to what are called well-posed problems, a
somewhat imprecise concept we explain next. Roughly speaking a problem is well-posed
if
• A solution exists.
• The solution is unique.
• The solution depends continuously on the data.
Here by 'data' we mean any of the ingredients of the problem which we might imagine being changed, to obtain a problem of the same general type. For example, in the Dirichlet problem for the Poisson equation

$$\Delta u = f \quad x \in \Omega \qquad u = 0 \quad x \in \partial\Omega \qquad (2.4.1)$$

the term f = f(x) would be regarded as the given data. The idea of continuous dependence is that if a 'small' change is made in the data, then the resulting solution should also undergo only a small change. For such a notion to be made precise, it is necessary to have some specific idea in mind of how we would measure the magnitude of a change in f. As we shall see, there may be many natural ways to do so, and no precise statement about well-posedness can be given until such choices are made. In fact, even the existence and uniqueness requirements, which may seem more clear cut, may also turn out to require much clarification in terms of what the exact meaning of 'solution' is.
A problem which is not well-posed is called ill-posed. A classical problem in which ill-posedness can be easily recognized is Hadamard's example, which we note is not one of the standard types mentioned above:

$$u_{xx} + u_{yy} = 0 \qquad -\infty < x < \infty \quad y > 0 \qquad (2.4.2)$$
$$u(x, 0) = 0 \qquad u_y(x, 0) = g(x) \qquad -\infty < x < \infty \qquad (2.4.3)$$

If g(x) = α sin kx for some α, k > 0 then a corresponding solution is

$$u(x, y) = \alpha\, \frac{\sin kx}{k}\, e^{ky} \qquad (2.4.4)$$

This is known to be the unique solution, but notice that a change in α (i.e. of the data g) of size ε implies a corresponding change in the solution for, say, y = 1 of size ε e^k/k. Since k can be arbitrarily large, it follows that the problem is ill-posed, that is, small changes in the data do not necessarily lead to small changes in the solution.
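The amplification factor e^k/k makes the instability quantitative; a minimal sketch:

```python
# In Hadamard's example, data of size epsilon produces a solution of size
# epsilon*exp(k)/k at y = 1; the factor exp(k)/k is unbounded in k.
# A minimal sketch.
import math

for k in [1, 10, 50, 100]:
    print(k, math.exp(k)/k)
```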
Note that in this example if we change the PDE from u_xx + u_yy = 0 to u_xx − u_yy = 0, then (aside from the name of a variable) we have precisely the problem (2.3.41),(2.3.43), which from the explicit solution (2.3.46) may be seen to be well-posed under any reasonable interpretation. Thus some care must be taken in recognizing the 'correct' side conditions for a given PDE. Other interesting examples of ill-posed problems are given in Exercises 23 and 26; see also [25].
2.5 Exercises
1. Find a fundamental set and the general solution of u''' + u'' + u' = 0.
2. Let L = aD² + bD + c (a ≠ 0) be a constant coefficient second order linear differential operator, and let p(λ) = aλ² + bλ + c be the associated characteristic polynomial. If λ_1, λ_2 are the roots of p, show that we can express the operator L as L = a(D − λ_1)(D − λ_2). Use this factorization to obtain the general solution of Lu = 0 in the case of repeated roots, λ_1 = λ_2.
3. Show that the solution of the initial value problem y' = 3√y, y(0) = 0 is not unique. (Hint: y(t) = 0 is one solution; find another one.) Why doesn't this contradict the assertion in Theorem 2.1 about unique solvability of the initial value problem?
4. Solve the initial value problem for the Cauchy-Euler equation

$$(t + 1)^2 u'' + 4(t + 1) u' - 10u = 0 \qquad u(1) = 2 \quad u'(1) = -1$$
5. Consider the integral equation

$$\int_0^1 K(x, y)\, u(y)\, dy = \lambda u(x) + g(x)$$

for the kernel

$$K(x, y) = \frac{x^2}{1 + y^3}$$

a) For what values of λ ∈ C does there exist a unique solution for any function g which is continuous on [0, 1]?

b) Find the solution set of the equation for all λ ∈ C and continuous functions g. (Hint: For λ ≠ 0 any solution must have the form u(x) = −g(x)/λ + Cx² for some constant C.)
6. Find a kernel K(x, y) such that u(x) = ∫₀¹ K(x, y) f(y) dy is the solution of

$$u'' + u = f(x) \qquad u(0) = u'(0) = 0$$

(Hint: Review the variation of parameters method in any undergraduate ODE textbook.)
7. If f ∈ C([0, 1]),

$$K(x, y) = \begin{cases} y(x - 1) & 0 < y < x < 1 \\ x(y - 1) & 0 < x < y < 1 \end{cases}$$

and

$$u(x) = \int_0^1 K(x, y)\, f(y)\, dy$$

show that

$$u'' = f \quad 0 < x < 1 \qquad u(0) = u(1) = 0$$
8. For each of the integral operators in (2.2.8), (2.2.14), (2.2.15), (2.2.16), and (2.2.17), discuss the classification(s) of the corresponding kernel, according to Definition 2.1.
9. Find the general solution of (1 + x²)u_x + u_y = 0. Sketch some of the characteristic curves.
10. The general solution in Example 2.10 was found by solving the corresponding
Cauchy problem with Γ being the x axis. But the general solution should not
actually depend on any specific choice of Γ. Show that the same general solution is
found if instead we take Γ to be the y axis.
11. Find the solution of

$$y u_x + x u_y = 1 \qquad u(0, y) = e^{-y^2}$$

Discuss why the solution you find is only valid for |y| ≥ |x|.
12. The method of characteristics developed in Section 2.3.1 for the linear PDE (2.3.10) can be easily extended to the so-called semilinear equation

$$a(x, y) u_x + b(x, y) u_y = c(x, y, u) \qquad (2.5.1)$$

We simply replace (2.3.12) by

$$\frac{d}{dt} u(x(t), y(t)) = c(x(t), y(t), u(x(t), y(t))) \qquad (2.5.2)$$

which is still an ODE along a characteristic. With this in mind, solve

$$u_x + x u_y + u^2 = 0 \qquad u(0, y) = \frac{1}{y} \qquad (2.5.3)$$
13. Find the general solution of u_xx − 4u_xy + 3u_yy = 0.
14. Find the regions of the xy plane where the PDE

$$y u_{xx} - 2u_{xy} + x u_{yy} - 3u_x + u = 0$$

is elliptic, parabolic, and hyperbolic.
15. Find a solution formula for the half line wave equation problem

$$u_{tt} - c^2 u_{xx} = 0 \qquad x > 0 \quad t > 0 \qquad (2.5.4)$$
$$u(0, t) = h(t) \qquad t > 0 \qquad (2.5.5)$$
$$u(x, 0) = f(x) \qquad x > 0 \qquad (2.5.6)$$
$$u_t(x, 0) = g(x) \qquad x > 0 \qquad (2.5.7)$$

Note where the solution coincides with (2.3.46) and explain why this should be expected.
16. Complete the details of verifying (2.3.67)
17. If u is a twice differentiable function on R^N depending only on r = |x|, show that
Δu = u_rr + ((N − 1)/r) u_r
(Spherical coordinates in R^N are reviewed in Section 18.3, but the details of the angular variables are not needed for this calculation. Start by showing that ∂u/∂x_j = u'(r) x_j/r.)
18. Verify in detail that there are no nontrivial solutions of (2.3.53) for nonreal λ ∈ C.
19. Assuming that (2.3.76) is valid, find the coefficients a_k, b_k in terms of f. (Hint: multiply the equation by sin mθ or cos mθ and integrate from 0 to 2π.)
20. In the two dimensional case, solutions of Laplace's equation Δu = 0 may also be found by means of analytic function theory. Recall that if z = x + iy then a function f(z) is analytic in an open set Ω if f'(z) exists at every point of Ω. If we think of f = u + iv and u, v as functions of x, y then u = u(x, y), v = v(x, y) must satisfy the Cauchy-Riemann equations u_x = v_y, u_y = −v_x. Show in this case that u, v are also solutions of Laplace's equation. Find u, v if f(z) = z^3 and f(z) = e^z.
21. Find all of the product solutions u(x, t) = φ(t)ψ(x) that you can which satisfy the damped wave equation
u_tt + αu_t = u_xx    0 < x < π, t > 0
and the boundary conditions
u(0, t) = u_x(π, t) = 0    t > 0
Here α > 0 is the damping constant. What is the significance of the condition α < 1?
22. Show that any solution of the wave equation u_tt − u_xx = 0 has the 'four point property'
u(x, t) + u(x + h − k, t + h + k) = u(x + h, t + h) + u(x − k, t + k)
for any h, k. (Suggestion: Use d'Alembert's formula.)
23. In the Dirichlet problem for the wave equation
u_tt − u_xx = 0    0 < x < 1, 0 < t < 1
u(0, t) = u(1, t) = 0    0 < t < 1
u(x, 0) = 0,  u(x, 1) = f(x)    0 < x < 1
show that neither existence nor uniqueness holds. (Hint: For the non-existence part, use Exercise 22 to find an f for which no solution exists.)
24. Let Ω be the rectangle [0, a] × [0, b] in R^2. Find all possible product solutions
u(x, y, t) = φ(t)ψ(x)ζ(y)
satisfying
u_t − Δu = 0    (x, y) ∈ Ω, t > 0
u(x, y, t) = 0    (x, y) ∈ ∂Ω, t > 0
25. Find a solution of the Dirichlet problem for u = u(x, y) in the unit disc Ω = {(x, y) : x^2 + y^2 < 1},
Δu = 1    (x, y) ∈ Ω
u(x, y) = 0    (x, y) ∈ ∂Ω
(Suggestion: look for a solution in the form u = u(r) and recall (2.3.67).)
26. The problem
u_t = u_xx    0 < x < 1, t < T    (2.5.8)
u(0, t) = u(1, t) = 0    t > 0    (2.5.9)
u(x, T) = f(x)    0 < x < 1    (2.5.10)
is sometimes called a final value problem for the heat equation.
a) Show that this problem is ill-posed.
b) Show that this problem is equivalent to (2.3.47),(2.3.48),(2.3.49) except with the heat equation (2.3.47) replaced by the backward heat equation u_t = −u_xx.
Chapter 3
Vector spaces
We will be working frequently with function spaces which are themselves special cases
of more abstract spaces. Most such spaces which are of interest to us have both linear
structure and metric structure. This means that given any two elements of the space it
is meaningful to speak of (i) a linear combination of the elements, and (ii) the distance
between the two elements. These two kinds of concepts are abstracted in the definitions
of vector space and metric space.
3.1 Axioms of a vector space
Definition 3.1. A vector space is a set X such that whenever x, y ∈ X and λ is a scalar
we have x + y ∈ X and λx ∈ X, and the following axioms hold.
[V1] x + y = y + x for all x, y ∈ X
[V2] (x + y) + z = x + (y + z) for all x, y, z ∈ X
[V3] There exists an element 0 ∈ X such that x + 0 = x for all x ∈ X
[V4] For every x ∈ X there exists an element −x ∈ X such that x + (−x) = 0
[V5] λ(x + y) = λx + λy for all x, y ∈ X and any scalar λ
[V6] (λ + µ)x = λx + µx for any x ∈ X and any scalars λ, µ
[V7] λ(µx) = (λµ)x for any x ∈ X and any scalars λ, µ
[V8] 1x = x for any x ∈ X
Here the field of scalars may be either the real numbers R or the complex numbers C, and we may refer to X as a real or complex vector space accordingly, if a distinction needs to be made.
By an obvious induction argument, if x_1, . . . , x_m ∈ X and λ_1, . . . , λ_m are scalars, then the linear combination Σ_{j=1}^m λ_j x_j is itself an element of X.
Example 3.1. Ordinary N-dimensional Euclidean space
R^N := {x = (x_1, x_2, . . . , x_N) : x_j ∈ R}
is a real vector space with the usual operations of vector addition and scalar multiplication,
(x_1, x_2, . . . , x_N) + (y_1, y_2, . . . , y_N) = (x_1 + y_1, x_2 + y_2, . . . , x_N + y_N)
λ(x_1, x_2, . . . , x_N) = (λx_1, λx_2, . . . , λx_N),  λ ∈ R
If we allow the components x_j as well as the scalars λ to be complex, we obtain instead the complex vector space C^N.
Example 3.2. If E ⊂ R^N, let
C(E) = {f : E → R : f is continuous at x for every x ∈ E}
denote the set of real valued continuous functions on E. Clearly C(E) is a real vector space with the ordinary operations of function addition and scalar multiplication
(f + g)(x) = f(x) + g(x),    (λf)(x) = λf(x),  λ ∈ R
If we allow the range space in the definition of C(E) to be C then C(E) becomes a complex vector space.
Spaces of differentiable functions likewise may be naturally regarded as vector spaces, for example
C^m(E) = {f : D^α f ∈ C(E), |α| ≤ m}
and
C^∞(E) = {f : D^α f ∈ C(E), for all α}  □
Example 3.3. If 0 < p < ∞ and E is a measurable subset of R^N, the space L^p(E) is defined to be the set of measurable functions f : E → R or f : E → C such that
∫_E |f(x)|^p dx < ∞    (3.1.1)
Here the integral is defined in the Lebesgue sense. Those unfamiliar with measure theory and Lebesgue integration should consult a standard textbook such as [30],[28], or see a brief summary in Appendix ( ).
It may then be shown that L^p(E) is a vector space for any 0 < p < ∞. To see this we use the known fact that if f, g are measurable then so are f + g and λf for any scalar λ, together with the numerical inequality (a + b)^p ≤ C_p(a^p + b^p) for a, b ≥ 0, where C_p = max(2^{p−1}, 1), to prove that f + g ∈ L^p(E) whenever f, g ∈ L^p(E). Verification of the remaining axioms is routine.
The related vector space L^∞(E) is defined as the set of measurable functions f for which
ess sup_{x∈E} |f(x)| < ∞    (3.1.2)
Here M = ess sup_{x∈E} |f(x)| if |f(x)| ≤ M a.e. and there is no smaller constant with this property.
Definition 3.2. If X is a vector space, a subset M ⊂ X is a subspace of X if
(i) x + y ∈ M whenever x, y ∈ M
(ii) λx ∈ M whenever x ∈ M and λ is a scalar
That is to say, a subspace is a subset of X which is closed under formation of linear
combinations. Clearly a subspace of a vector space is itself a vector space.
Example 3.4. The subset M = {x ∈ R^N : x_j = 0} is a subspace of R^N for any fixed j.
Example 3.5. If E ⊂ R^N then C^∞(E) is a subspace of C^m(E) for any m, which in turn is a subspace of C(E).
Example 3.6. If X is any vector space and S ⊂ X, then the set of all finite linear combinations of elements of S,
L(S) := {x ∈ X : x = Σ_{j=1}^m λ_j x_j for some scalars λ_1, λ_2, . . . , λ_m and elements x_1, . . . , x_m ∈ S}
is a subspace of X. It is also called the span, or linear span, of S, or the subspace generated by S.  □
Example 3.7. If in Example 3.6 we take X = C([a, b]) and f_j(x) = x^{j−1} for j = 1, 2, . . . then the subspace generated by {f_j}_{j=1}^{N+1} is P_N, the vector space of polynomials of degree less than or equal to N. Likewise, the subspace generated by {f_j}_{j=1}^∞ is P, the vector space of all polynomials.  □
3.2 Linear independence and bases
Definition 3.3. We say that S ⊂ X is linearly independent if whenever x_1, . . . , x_m ∈ S, λ_1, . . . , λ_m are scalars and Σ_{j=1}^m λ_j x_j = 0, then λ_1 = λ_2 = . . . = λ_m = 0. Otherwise S is linearly dependent.
Equivalently, S is linearly dependent if it is possible to express at least one of its
elements as a linear combination of the remaining ones. In particular any set containing
the zero element is linearly dependent.
Definition 3.4. We say that S ⊂ X is a basis of X if for any x ∈ X there exist unique scalars λ_1, λ_2, . . . , λ_m and elements x_1, . . . , x_m ∈ S such that x = Σ_{j=1}^m λ_j x_j.
The following characterization of a basis is then immediate:
Theorem 3.1. S ⊂ X is a basis of X if and only if S is linearly independent and
L(S) = X.
It is important to emphasize that in this definition of basis it is required that every
x ∈ X be expressible as a finite linear combination of the basis elements. This notion
of basis will be inadequate for later purposes, and will be replaced by one which allows
infinite sums, but this cannot be done until a meaning of convergence is available. The
notion of basis in Definition 3.4 is called a Hamel basis if a distinction is necessary.
Definition 3.5. We say that dim X, the dimension of X, is m if there exist m linearly
independent vectors in X but any collection of m + 1 elements of X is linearly dependent.
If there exist m linearly independent vectors for every positive integer m, then we say dim X = ∞.
Proposition 3.1. The elements {x_1, x_2, . . . , x_m} form a basis for L({x_1, x_2, . . . , x_m}) if and only if they are linearly independent.
Proposition 3.2. The dimension of X is the number of vectors in any basis of X.
The proofs of both of these Propositions are left for the exercises.
Example 3.8. R^N or C^N has dimension N. We will denote by e_j the standard unit vector with a one in the j'th position and zeros elsewhere. Then {e_1, e_2, . . . , e_N} is the standard basis for either R^N or C^N.
Example 3.9. In the vector space C([a, b]) the elements f_j(t) = t^{j−1} are clearly linearly independent, so that the dimension is ∞, as is the dimension of the subspace P. Also evidently the subspace P_N has dimension N + 1.
Example 3.10. The set of solutions of the ordinary differential equation u'' + u = 0 is precisely the set of linear combinations u(t) = λ_1 sin t + λ_2 cos t. Since sin t, cos t are linearly independent functions, they form a basis for this two dimensional space.
The following is interesting, although not of great practical significance. Its proof,
which is not obvious in the infinite dimensional case, relies on the Axiom of Choice and
will not be given here.
Theorem 3.2. Every vector space has a basis.
3.3 Linear transformations of a vector space
If X and Y are vector spaces, a mapping T : X → Y is called linear if
T(λ_1 x_1 + λ_2 x_2) = λ_1 T(x_1) + λ_2 T(x_2)    (3.3.1)
for all x_1, x_2 ∈ X and all scalars λ_1, λ_2. Such a linear transformation is uniquely determined on all of X by its action on any basis of X, i.e. if S = {x_α}_{α∈A} is a basis of X and y_α = T(x_α), then for any x = Σ_{j=1}^m λ_j x_{α_j} we have T x = Σ_{j=1}^m λ_j y_{α_j}.
In the case that X and Y are both of finite dimension let us choose bases {x_1, x_2, . . . , x_m}, {y_1, y_2, . . . , y_n} of X, Y respectively. For 1 ≤ j ≤ m there must exist unique scalars a_kj such that T x_j = Σ_{k=1}^n a_kj y_k, and it follows that
x = Σ_{j=1}^m λ_j x_j  ⇒  T x = Σ_{k=1}^n μ_k y_k  where  μ_k = Σ_{j=1}^m a_kj λ_j    (3.3.2)
For a given basis {x_1, x_2, . . . , x_m} of X, if x = Σ_{j=1}^m λ_j x_j we say that λ_1, λ_2, . . . , λ_m are the coordinates of x with respect to the given basis. The n × m matrix A = [a_kj] thus maps the coordinates of x with respect to the basis {x_1, x_2, . . . , x_m} to the coordinates of T x with respect to the basis {y_1, y_2, . . . , y_n}, and thus encodes all information about the linear mapping T.
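As a concrete numerical illustration of (3.3.2), here is a minimal Python/numpy sketch (not from the original text; the particular mapping T is an arbitrary choice). With the standard bases, the columns of A are the coordinate vectors of T(e_j), and applying A to the coordinates of x gives the coordinates of T x.

import numpy as np

# T : R^3 -> R^2 defined by T(x) = (x1 + 2*x2, 3*x3 - x1)  (arbitrary example)
def T(x):
    return np.array([x[0] + 2 * x[1], 3 * x[2] - x[0]])

# Columns of A are T(e_j) for the standard basis vectors e_j
A = np.column_stack([T(e) for e in np.eye(3)])   # 2 x 3 matrix of T

x = np.array([1.0, -2.0, 0.5])   # coordinates lambda_j w.r.t. the standard basis
print(A @ x)                     # coordinates mu_k of T(x): [-3.   0.5]
print(T(x))                      # agrees with direct evaluation of T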
If T : X → Y is linear, one-to-one and onto then we say T is an isomorphism between X and Y, and the vector spaces X and Y are isomorphic whenever there exists an isomorphism between them. If T is such an isomorphism, and S is a basis of X, then it is easy to check that the image set T(S) is a basis of Y. In particular, any two isomorphic vector spaces have the same finite dimension or are both infinite dimensional.
For any linear mapping T : X → Y we define the kernel, or null space, of T as
N(T) = {x ∈ X : T x = 0}    (3.3.3)
and the range of T as
R(T) = {y ∈ Y : y = T x for some x ∈ X}    (3.3.4)
It is immediate that N(T) and R(T) are subspaces of X, Y respectively, and T is an isomorphism precisely if N(T) = {0} and R(T) = Y. If X = Y = R^N or C^N, we learn in linear algebra that these two conditions are equivalent.
3.4 Exercises
1. Using only the vector space axioms, show that the zero element in [V3] is unique.
2. Prove Propositions 3.1 and 3.2.
3. Show that the intersection of any family of subspaces of a vector space is also a
subspace. What about the union of subspaces?
4. Show that M_{m×n}, the set of m × n matrices, with the usual definitions of addition and scalar multiplication, is a vector space of dimension mn. Show that the subset of symmetric n × n matrices forms a subspace of M_{n×n}. What is its dimension?
5. Under what conditions on a measurable set E ⊂ R^N and p ∈ (0, ∞] will it be true that C(E) is a subspace of L^p(E)? Under what conditions is L^p(E) a subset of L^q(E)?
6. Let u_j(t) = t^{λ_j} where λ_1, . . . , λ_n are arbitrary unequal real numbers. Show that {u_1, . . . , u_n} are linearly independent functions on any interval (a, b) ⊂ R. (Suggestion: If Σ_{j=1}^n α_j t^{λ_j} = 0, divide by t^{λ_1} and differentiate.)
7. A side condition for a differential equation is homogeneous if whenever two functions satisfy the side condition then so does any linear combination of the two functions. For example the Dirichlet type boundary condition u = 0 for x ∈ ∂Ω is homogeneous. Now let Lu = Σ_{|α|≤m} a_α(x) D^α u denote any linear differential operator. Show that the set of functions satisfying Lu = 0 and any homogeneous side conditions is a vector space.
8. Consider the differential equation u'' + u = 0 on the interval (0, π). What is the dimension of the vector space of solutions which satisfy the homogeneous boundary conditions a) u(0) = u(π), and b) u(0) = u(π) = 0? Repeat the question if the interval (0, π) is replaced by (0, 1) and (0, 2π).
9. Let Df = f' for any differentiable function f on R. For any N ≥ 0 show that D : P_N → P_N is linear and find its null space and range.
10. If X and Y are vector spaces, then the Cartesian product of X and Y is defined as the set of ordered pairs
X × Y = {(x, y) : x ∈ X, y ∈ Y}    (3.4.1)
Addition and scalar multiplication on X × Y are defined in the natural way,
(x, y) + (x̂, ŷ) = (x + x̂, y + ŷ),    λ(x, y) = (λx, λy)    (3.4.2)
a) Show that X × Y is a vector space.
b) Show that R × R is isomorphic to R^2.
11. If X, Y are vector spaces of the same finite dimension, show X and Y are isomorphic.
12. Show that L^p(0, 1) and L^p(a, b) are isomorphic, for any a, b ∈ R and p ∈ (0, ∞].
Chapter 4
Metric spaces
4.1 Axioms of a metric space
A metric space is a set on which some natural notion of distance may be defined.
Definition 4.1. A metric space is a pair (X, d) where X is a set and d is a real valued
mapping on X × X, such that the following axioms hold.
[M1] d(x, y) ≥ 0 for all x, y ∈ X
[M2] d(x, y) = 0 if and only if x = y
[M3] d(x, y) = d(y, x) for all x, y ∈ X
[M4] d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.
Here d is the metric on X, i.e. d(x, y) is regarded as the distance from x to y. Axiom
[M4] is known as the triangle inequality. Although strictly speaking the metric space is
the pair (X, d) it is a common practice to refer to X itself as being the metric space, with
the metric d understood from context. But as we will see in examples it is often possible
to assign different metrics to the same set X.
If (X, d) is a metric space and Y ⊂ X then it is clear that (Y, d) is also a metric
space, and in this case we say that Y inherits the metric of X.
Example 4.1. If X = R^N then there are many choices of d for which (R^N, d) is a metric space. The most familiar is the ordinary Euclidean distance
d(x, y) = (Σ_{j=1}^N |x_j − y_j|^2)^{1/2}    (4.1.1)
In general we may define
d_p(x, y) = (Σ_{j=1}^N |x_j − y_j|^p)^{1/p}    1 ≤ p < ∞    (4.1.2)
and
d_∞(x, y) = max(|x_1 − y_1|, |x_2 − y_2|, . . . , |x_N − y_N|)    (4.1.3)
The verification that (R^N, d_p) is a metric space for 1 ≤ p ≤ ∞ is left to the exercises – the triangle inequality is the only nontrivial step. The same family of metrics may be used with X = C^N.
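These formulas are easy to experiment with numerically. The following minimal Python/numpy sketch (the vectors chosen are arbitrary) computes d_p for several p; note the ordering d_∞ ≤ d_2 ≤ d_1, which holds in general.

import numpy as np

def d_p(x, y, p):
    # distance d_p(x, y) on R^N; p = np.inf gives the max metric d_inf
    if p == np.inf:
        return np.max(np.abs(x - y))
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0, -1.0])
y = np.array([0.0, 0.0, 1.0])
for p in (1, 2, np.inf):
    print(p, d_p(x, y, p))     # d_1 = 5.0, d_2 = 3.0, d_inf = 2.0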
Example 4.2. To assign a metric to C(E) more specific assumptions must be made about E. If we assume, for example, that E is a closed and bounded¹ subset of R^N we may set
d_∞(f, g) = max_{x∈E} |f(x) − g(x)|    (4.1.4)
so that d_∞(f, g) is always finite by virtue of the well known theorem that a continuous function achieves its maximum on a closed, bounded set. Other possibilities are
d_p(f, g) = (∫_E |f(x) − g(x)|^p dx)^{1/p}    1 ≤ p < ∞    (4.1.5)
Note the analogy with the definition of d_p in the case of R^N or C^N.
For more arbitrary sets E there is in general no natural metric for C(E). For example, if E is an open set, none of the metrics d_p can be used, since there is no reason why d_p(f, g) should be finite for f, g ∈ C(E).
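For computational purposes d_∞ and d_p are typically approximated by sampling on a fine grid, as in the following Python/numpy sketch (the two functions on E = [0, 1] are arbitrary choices). Grid sampling gives a lower bound for the true maximum, and the trapezoid rule approximates the integral in d_1.

import numpy as np

x = np.linspace(0.0, 1.0, 10001)     # fine grid on E = [0, 1]
f = np.sin(np.pi * x)
g = x * (1 - x)
diff = np.abs(f - g)

print(diff.max())   # approximates d_inf(f, g); about 0.75, attained at x = 1/2
# composite trapezoid rule approximating d_1(f, g) = int_0^1 |f - g| dx
print(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(x)))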
As in the case of vector spaces, some spaces of differentiable functions may also be made into metric spaces. For this we will assume a bit more about E, namely that E is the closure of a bounded open set O ⊂ R^N, and in this case we will say that D^α f ∈ C(E) if the function D^α f, defined in the usual pointwise sense on O, has a continuous extension to E. We then can define
C^m(E) = {f : D^α f ∈ C(E) whenever |α| ≤ m}    (4.1.6)
with metric
d(f, g) = max_{|α|≤m} max_{x∈E} |D^α(f − g)(x)|    (4.1.7)
which may be easily checked to satisfy [M1]-[M4].
(¹ I.e. E is compact in R^N. Compactness is discussed in more detail below, and we avoid using the term until then.)
We cannot define a metric on C^∞(E) in the obvious way just by letting m → ∞ in the above definition, since there is no reason why the resulting maximum over m in (4.1.7) will be finite, even if f ∈ C^m(E) for every m. See, however, Exercise 18.
Example 4.3. Recall that if E is a measurable subset of R^N, we have defined corresponding vector spaces L^p(E) for 0 < p ≤ ∞. To endow them with metric space structure let
d_p(f, g) = (∫_E |f(x) − g(x)|^p dx)^{1/p}    (4.1.8)
for 1 ≤ p < ∞, and
d_∞(f, g) = ess sup_{x∈E} |f(x) − g(x)|    (4.1.9)
The validity of axioms [M1] and [M3] is clear, and the triangle inequality [M4] is an immediate consequence of the Minkowski inequality (18.1.10). But axiom [M2] does not appear to be satisfied here since, for example, two functions f, g agreeing except at a single point, or more generally agreeing except on a set of measure zero, would have d_p(f, g) = 0. It is necessary, therefore, to modify our point of view concerning L^p(E) as follows. We define an equivalence relation f ∼ g if f = g almost everywhere, i.e. except on a set of measure zero. If d_p(f, g) = 0 we may then correctly conclude that f ∼ g, in which case we will regard f and g as being the same element of L^p(E). Thus, strictly speaking, L^p(E) is the set of equivalence classes of measurable functions, where the equivalence classes are defined by means of the above equivalence relation.
The distance d_p([f], [g]) between two equivalence classes [f] and [g] may be unambiguously determined by selecting a representative of each class and then evaluating the distance from (4.1.8) or (4.1.9). Likewise the vector space structure of L^p(E) is maintained since, for example, we can define the sum of equivalence classes [f] + [g] by selecting a representative of each class and observing that if f_1 ∼ f_2 and g_1 ∼ g_2 then f_1 + g_1 ∼ f_2 + g_2. It is rarely necessary to make a careful distinction between a measurable function and the equivalence class it belongs to, and whenever it can cause no confusion we will follow the common practice of referring to members of L^p(E) as functions rather than equivalence classes. The notation f may be used to stand for either a function or its equivalence class. An element f ∈ L^p(E) will be said to be continuous if its equivalence class contains a continuous function, and in this way we can naturally regard C(E) as a subset of L^p(E).
Although L^p(E) is a vector space for 0 < p ≤ ∞, we cannot use the above definition of metric for 0 < p < 1, since it turns out the triangle inequality is not satisfied (see Exercise 6 of Chapter 5) except in degenerate cases.
4.2 Topological concepts
In a metric space various concepts of point set topology may be introduced.
Definition 4.2. If (X, d) is a metric space then
1. B(x, ε) = {y ∈ X : d(x, y) < ε} is the ball centered at x of radius ε.
2. A set E ⊂ X is bounded if there exists some x ∈ X and R < ∞ such that E ⊂ B(x, R).
3. If E ⊂ X, then a point x ∈ X is an interior point of E if there exists ε > 0 such that B(x, ε) ⊂ E.
4. If E ⊂ X, then a point x ∈ X is a limit point of E if for any ε > 0 there exists a point y ∈ B(x, ε) ∩ E, y ≠ x.
5. A subset E ⊂ X is open if every point of E is an interior point of E. By convention, the empty set is open.
6. A subset E ⊂ X is closed if every limit point of E is in E.
7. The closure Ē of a set E ⊂ X is the union of E and the limit points of E.
8. The interior E° of a set E is the set of all interior points of E.
9. A subset E is dense in X if Ē = X.
10. X is separable if it contains a countable dense subset.
11. If E ⊂ X, we say that x ∈ X is a boundary point of E if for any ε > 0 the ball B(x, ε) contains at least one point of E and at least one point of the complement E^c = {x ∈ X : x ∉ E}. The boundary of E is denoted ∂E.
The following Proposition states a number of elementary but important properties. Proofs are essentially the same as in the more familiar special case when the metric space is a subset of R^N, and will be left for the reader.
Proposition 4.1. Let (X, d) be a metric space. Then
1. B(x, ε) is open for any x ∈ X and ε > 0.
2. E ⊂ X is open if and only if its complement E^c is closed.
3. An arbitrary union or finite intersection of open sets is open.
4. An arbitrary intersection or finite union of closed sets is closed.
5. If E ⊂ X then E° is the union of all open sets contained in E, E° is open, and E is open if and only if E = E°.
6. Ē is the intersection of all closed sets containing E, Ē is closed, and E is closed if and only if E = Ē.
7. If E ⊂ X then ∂E = Ē\E°, which is the intersection of the closures of E and of E^c.
Next we study infinite sequences in X.
Definition 4.3. We say that a sequence {x_n}_{n=1}^∞ in X is convergent to x, that is, lim_{n→∞} x_n = x, if for any ε > 0 there exists n_0 < ∞ such that d(x_n, x) < ε whenever n ≥ n_0.
Example 4.4. If X = R^N or C^N, and d is any one of the metrics d_p, then x_n → x if and only if each component sequence converges to the corresponding limit, i.e. x_{j,n} → x_j as n → ∞ in the ordinary sense of convergence in R or C. (Here x_{j,n} is the j'th component of x_n.)
Example 4.5. In the metric space (C(E), d_∞) of Example 4.2, lim_{n→∞} f_n = f is equivalent to the definition of uniform convergence on E.
Definition 4.4. We say that a sequence {x_n}_{n=1}^∞ in X is a Cauchy sequence if for any ε > 0 there exists n_0 < ∞ such that d(x_n, x_m) < ε whenever n, m ≥ n_0.
It is easy to see that a convergent sequence is always a Cauchy sequence, but the
converse may be false.
Definition 4.5. A metric space X is said to be complete if every Cauchy sequence in X
is convergent in X.
Example 4.6. Completeness is one of the fundamental properties of the real numbers R, see for example Chapter 1 of [29]. If a sequence {x_n}_{n=1}^∞ in R^N is Cauchy with respect to any of the metrics d_p, then each component sequence {x_{j,n}}_{n=1}^∞ is a Cauchy sequence in R, hence convergent in R. It then follows immediately that {x_n}_{n=1}^∞ is convergent in R^N, again with any of the metrics d_p. The same conclusion holds for C^N, so that R^N, C^N are complete metric spaces. These spaces are also separable since the subset consisting of points with rational co-ordinates is countable and dense. A standard example of an incomplete metric space is the set of rational numbers with the metric inherited from R.
Most metric spaces used in this book, and indeed most metric spaces used in applied
mathematics, are complete.
Proposition 4.2. If E ⊂ R^N is closed and bounded, then the metric space C(E) with metric d = d_∞ is complete.
Proof: Let {f_n}_{n=1}^∞ be a Cauchy sequence in C(E). If ε > 0 we may then find n_0 such that
max_{x∈E} |f_n(x) − f_m(x)| < ε    (4.2.1)
whenever n, m ≥ n_0. In particular the sequence of numbers {f_n(x)}_{n=1}^∞ is Cauchy in R or C for each fixed x ∈ E, so we may define f(x) := lim_{n→∞} f_n(x). Letting m → ∞ in (4.2.1) we obtain
|f_n(x) − f(x)| ≤ ε    n ≥ n_0, x ∈ E    (4.2.2)
which means d(f_n, f) ≤ ε for n ≥ n_0. It remains to check that f ∈ C(E). If we pick x ∈ E, then since f_{n_0} ∈ C(E) there exists δ > 0 such that |f_{n_0}(x) − f_{n_0}(y)| < ε if |y − x| < δ. Thus for |y − x| < δ we have
|f(x) − f(y)| ≤ |f(x) − f_{n_0}(x)| + |f_{n_0}(x) − f_{n_0}(y)| + |f_{n_0}(y) − f(y)| < 3ε    (4.2.3)
Since ε is arbitrary, f is continuous at x, and since x is arbitrary, f ∈ C(E). Thus we have concluded that the Cauchy sequence {f_n}_{n=1}^∞ is convergent in C(E) to f ∈ C(E), as needed. □
The final part of the above proof should be recognized as the standard proof of the
familiar fact that a uniform limit of continuous functions is continuous.
The spaces C^m(E) can likewise be shown, again assuming that E is closed and bounded, to be complete metric spaces with the metric defined in (4.1.7); see Exercise 19.
If we were to choose the metric d_1 on C(E) then the resulting metric space is not complete. Choose for example E = [−1, 1] and f_n(x) = x^{1/(2n+1)}, so that the pointwise limit of f_n(x) is
f(x) = 1 for x > 0,  f(x) = −1 for x < 0,  f(0) = 0    (4.2.4)
By a simple calculation
∫_{−1}^1 |f_n(x) − f(x)| dx = 1/(n + 1)    (4.2.5)
so that {f_n}_{n=1}^∞ must be Cauchy in C(E) with metric d_1. On the other hand {f_n}_{n=1}^∞ cannot be convergent in this space, since the only possible limit is f, which does not belong to C(E).
The same example can be modified to show that C(E) is not complete with any of the metrics d_p for 1 ≤ p < ∞, and so d_∞ is in some sense the 'natural' metric. For this reason C(E) will always be assumed to be supplied with the metric d_∞ unless otherwise stated.
We next summarize in the form of a theorem some especially important facts about the metric spaces L^p(E), which may be found in any standard textbook on Lebesgue integration, for example Chapter 3 of [30] or Chapter 8 of [38].
Theorem 4.1. If E ⊂ R^N is measurable, then
1. L^p(E) is complete for 1 ≤ p ≤ ∞.
2. L^p(E) is separable for 1 ≤ p < ∞.
3. If C_c(E) is the set of continuous functions of bounded support, i.e.
C_c(E) = {f ∈ C(E) : there exists R < ∞ such that f(x) ≡ 0 for |x| > R}    (4.2.6)
then C_c(E) is dense in L^p(E) for 1 ≤ p < ∞.
The completeness property is a significant result in measure theory, often known as
the Riesz-Fischer Theorem.
4.3 Functions on metric spaces and continuity
Next, suppose X, Y are two metric spaces with metrics dX , dY respectively.
Definition 4.6. Let T : X → Y be a mapping.
1. We say T is continuous at a point x ∈ X if for any ε > 0 there exists δ > 0 such that d_Y(T(x), T(x̂)) ≤ ε whenever d_X(x, x̂) ≤ δ.
2. T is continuous on X if it is continuous at each point of X.
3. T is uniformly continuous on X if for any ε > 0 there exists δ > 0 such that d_Y(T(x), T(x̂)) ≤ ε whenever d_X(x, x̂) ≤ δ, x, x̂ ∈ X.
4. T is Lipschitz continuous on X if there exists L such that
d_Y(T(x), T(x̂)) ≤ L d_X(x, x̂)    x, x̂ ∈ X    (4.3.1)
The infimum of all L's which work in this definition is called the Lipschitz constant of T.
Clearly we have the implications that T Lipschitz continuous implies T is uniformly
continuous, which in turn implies that T is continuous.
T is one-to-one, or injective, if T (x1 ) = T (x2 ) only if x1 = x2 , and onto, or surjective,
if for every y ∈ Y there exists some x ∈ X such that T (x) = y. If T is both one-to-one
and onto then we say it is bijective, and in this case there must exist an inverse mapping
T −1 : Y → X.
For any mapping T : X → Y we define, for E ⊂ X and F ⊂ Y,
T(E) = {y ∈ Y : y = T(x) for some x ∈ E}    (4.3.2)
the image of E in Y, and
T^{−1}(F) = {x ∈ X : T(x) ∈ F}    (4.3.3)
the preimage of F in X. Note that T is not required to be bijective in order that the preimage be defined.
The following theorem states two useful characterizations of continuity. Condition b)
is referred to as the sequential definition of continuity, for obvious reasons, while c) is
the topological definition, since it may be used to define continuity in much more general
topological spaces.
Theorem 4.2. Let X, Y be metric spaces and T : X → Y. Then the following are equivalent:
a) T is continuous on X.
b) If x_n ∈ X and x_n → x, then T(x_n) → T(x).
c) If E is open in Y then T^{−1}(E) is open in X.
Proof: Assume T is continuous on X and let x_n → x in X. If ε > 0 then there exists δ > 0 such that d_Y(T(x̂), T(x)) < ε if d_X(x̂, x) < δ. Choosing n_0 sufficiently large that d_X(x_n, x) < δ for n ≥ n_0 we then must have d_Y(T(x_n), T(x)) < ε for n ≥ n_0, so that T(x_n) → T(x). Thus a) implies b).
To see that b) implies c), suppose condition b) holds, E is open in Y and x ∈ T^{−1}(E). We must show that there exists δ > 0 such that x̂ ∈ T^{−1}(E) whenever d_X(x̂, x) < δ. If not, then there exists a sequence x_n → x such that x_n ∉ T^{−1}(E), and by b), T(x_n) → T(x). Since y = T(x) ∈ E and E is open, there exists ε > 0 such that z ∈ E if d_Y(z, y) < ε. Thus T(x_n) ∈ E for sufficiently large n, i.e. x_n ∈ T^{−1}(E), a contradiction.
Finally, suppose c) holds and fix x ∈ X. If ε > 0 then corresponding to the open set E = B(T(x), ε) in Y there exists a ball B(x, δ) in X such that B(x, δ) ⊂ T^{−1}(E). But this means precisely that if d_X(x̂, x) < δ then d_Y(T(x̂), T(x)) < ε, so that T is continuous at x. □
4.4 Compactness and optimization
Another important topological concept is that of compactness.
Definition 4.7. If E ⊂ X then a collection of open sets {Gα }α∈A is an open cover of E
if E ⊂ ∪α∈A Gα .
Here A is the index set and may be finite, countably or uncountably infinite.
Definition 4.8. K ⊂ X is compact if any open cover of K has a finite subcover. More explicitly, K is compact if whenever K ⊂ ∪_{α∈A} G_α, where each G_α is open, there exists a finite number of indices α_1, α_2, . . . , α_m ∈ A such that K ⊂ ∪_{j=1}^m G_{α_j}. In addition, E ⊂ X is precompact (or relatively compact) if Ē is compact.
Proposition 4.3. A compact set is closed and bounded. A closed subset of a compact set is compact.
Proof: Suppose that K is compact and pick x ∈ K^c. For any r > 0 let G_r = {y ∈ X : d(x, y) > r}. It is easy to see that each G_r is open and K ⊂ ∪_{r>0} G_r. Thus there exist r_1, r_2, . . . , r_m such that K ⊂ ∪_{j=1}^m G_{r_j}, and so B(x, r) ⊂ K^c if r < min{r_1, r_2, . . . , r_m}. Thus K^c is open and so K is closed.
Obviously ∪_{r>0} B(x, r) is an open cover of K for any fixed x ∈ X. If K is compact then there must exist r_1, r_2, . . . , r_m such that K ⊂ ∪_{j=1}^m B(x, r_j), and so K ⊂ B(x, R) where R = max{r_1, r_2, . . . , r_m}. Thus K is bounded.
Now suppose that F ⊂ K where F is closed and K is compact. If {G_α}_{α∈A} is an open cover of F then these sets together with the open set F^c are an open cover of K. Hence there exist α_1, α_2, . . . , α_m such that K ⊂ (∪_{j=1}^m G_{α_j}) ∪ F^c, from which we conclude that F ⊂ ∪_{j=1}^m G_{α_j}. □
There will be frequent occasions for wanting to know if a certain set is compact, but
it is rare to use the above definition directly. A useful equivalent condition is that of
sequential compactness.
Definition 4.9. A set K ⊂ X is sequentially compact if any infinite sequence in K has a subsequence convergent to a point of K.
Proposition 4.4. A set K ⊂ X is compact if and only if it is sequentially compact.
We will not prove this result here, but instead refer to Theorem 16, Section 9.5 of [28] for details. It follows immediately that if E ⊂ X is precompact then any infinite sequence in E has a convergent subsequence (the point being that the limit need not belong to E).
We point out that the concepts of compactness and sequential compactness are applicable in spaces even more general than metric spaces, and are not always equivalent
in such situations. In the case that X = RN or CN we have an even more explicit characterization of compactness, the well known Heine-Borel Theorem, for which we refer to
[29] for a proof.
Theorem 4.3. E ⊂ RN or E ⊂ CN is compact if and only if it is closed and bounded.
While we know from Proposition 4.3 that a compact set is always closed and bounded, the converse implication is definitely false in most function spaces we will be interested in.
In later chapters a great deal of attention will be paid to optimization problems in
function spaces, that is, problems in the Calculus of Variations. A simple result along
these lines that we can prove already is:
Theorem 4.4. Let X be a compact metric space and f : X → R be continuous. Then there exists x_0 ∈ X such that
f(x_0) = max_{x∈X} f(x)    (4.4.1)
Proof: Let M = sup_{x∈X} f(x) (which may be +∞), so there exists a sequence {x_n}_{n=1}^∞ such that lim_{n→∞} f(x_n) = M. By sequential compactness there is a subsequence {x_{n_k}} and x_0 ∈ X such that lim_{k→∞} x_{n_k} = x_0, and since f is continuous on X we must have f(x_0) = lim_{k→∞} f(x_{n_k}) = M. Thus M < ∞ and (4.4.1) holds. □
A common notation expressing the same conclusion as (4.4.1) is
x_0 ∈ argmax(f(x))²    (4.4.2)
which is also useful in making the distinction between the maximum value of a function and the point(s) at which the maximum is achieved.
We emphasize here the distinction between maximum and supremum, which is an essential point in later discussion of optimization. If E ⊂ R then M = sup E if
• x ≤ M for all x ∈ E
• if M′ < M there exists x ∈ E such that x > M′
Such a number M exists for any E ⊂ R if we allow the value M = +∞; by convention M = −∞ if E is the empty set. On the other hand M = max E if x ≤ M for all x ∈ E and M ∈ E, in which case evidently the maximum is finite and equal to the supremum.
(² Even though argmax(f(x)) is in general a set of points, i.e. all points where f achieves its maximum value, one will often see this written as x_0 = argmax(f(x)). Naturally we use the corresponding notation argmin for points where the minimum of f is achieved.)
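A minimal Python illustration of the distinction (the set chosen is arbitrary): for E = {1 − 1/n : n ≥ 1} we have sup E = 1, but no element of E equals 1, so max E does not exist; the finite truncations below have maxima 1 − 1/N approaching, but never reaching, the supremum.

# E = {1 - 1/n : n = 1, 2, ...}: sup E = 1 is never attained, so max E
# does not exist, though each finite truncation of E does have a maximum.
for N in (10, 1000, 100000):
    truncation = [1 - 1 / n for n in range(1, N + 1)]
    print(N, max(truncation))     # 1 - 1/N, increasing toward sup E = 1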
If f : X → C is continuous on a compact metric space X, then we can apply Theorem
4.4 with f replaced by |f |, to obtain that there exists x0 ∈ X such that |f (x)| ≤ |f (x0 )|
for all x ∈ X. We can then also conclude, as in Example 4.2 and Proposition 4.2
Proposition 4.5. If X is a compact metric space, then
C(X) = {f : X → C : f is continuous at x for every x ∈ X}    (4.4.3)
is a complete metric space with metric d(f, g) = max_{x∈X} |f(x) − g(x)|.
In general C(X), or even a bounded set in C(X), is not itself precompact. A useful criterion for precompactness of a set of functions in C(X) is given by the Arzela-Ascoli theorem, which we review here; see e.g. [29] for a proof.
Definition 4.10. We say a family of real or complex valued functions F defined on a metric space X is uniformly bounded if there exists a constant M such that
|f(x)| ≤ M    whenever x ∈ X, f ∈ F    (4.4.4)
and equicontinuous if for every ε > 0 there exists δ > 0 such that
|f(x) − f(y)| < ε    whenever x, y ∈ X, d(x, y) < δ, f ∈ F    (4.4.5)
We then have
Theorem 4.5. (Arzela-Ascoli) If X is a compact metric space and F ⊂ C(X) is uniformly bounded and equicontinuous, then F is precompact in C(X).
Example 4.7. Let
F = {f ∈ C([0, 1]) : |f'(x)| ≤ M for all x ∈ (0, 1), f(0) = 0}    (4.4.6)
for some fixed M. Then for f ∈ F we have
f(x) = ∫_0^x f'(s) ds    (4.4.7)
implying in particular that |f(x)| ≤ ∫_0^x M ds ≤ M. Also
|f(x) − f(y)| = |∫_x^y f'(s) ds| ≤ M|x − y|    (4.4.8)
so that for any ε > 0, δ = ε/M works in the definition of equicontinuity. Thus by the Arzela-Ascoli theorem F is precompact in C([0, 1]).
If X is a compact subset of R^N then, since uniform convergence implies L^p convergence, it follows that any set which is precompact in C(X) is also precompact in L^p(X). But more refined, i.e. less restrictive, criteria for precompactness in L^p spaces are also known; see e.g. [5], Section 4.5.
4.5 Contraction mapping theorem
One of the most important theorems about metric spaces, frequently used in applied
mathematics, is the Contraction Mapping Theorem, which concerns fixed points of a
mapping of X into itself.
Definition 4.11. A mapping T : X → X is a contraction on X if it is Lipschitz continuous with Lipschitz constant ρ < 1, that is, there exists ρ ∈ [0, 1) such that
d(T(x), T(x̂)) ≤ ρ d(x, x̂)    for all x, x̂ ∈ X    (4.5.1)
If ρ = 1 is allowed, we say T is nonexpansive.
Theorem 4.6. If T is a contraction on a complete metric space X then there exists a unique x ∈ X such that T(x) = x.
Proof: The uniqueness assertion is immediate: if T(x_1) = x_1 and T(x_2) = x_2 then d(x_1, x_2) = d(T(x_1), T(x_2)) ≤ ρ d(x_1, x_2). Since ρ < 1 we must have d(x_1, x_2) = 0, so that x_1 = x_2.
To prove the existence of x, fix any point x_1 ∈ X and define
x_{n+1} = T(x_n)    (4.5.2)
for n = 1, 2, . . . . We first show that {x_n}_{n=1}^∞ must be a Cauchy sequence. Note that
d(x_3, x_2) = d(T(x_2), T(x_1)) ≤ ρ d(x_2, x_1)    (4.5.3)
and by induction
d(x_{n+1}, x_n) = d(T(x_n), T(x_{n−1})) ≤ ρ^{n−1} d(x_2, x_1)    (4.5.4)
Thus by the triangle inequality and the usual summation formula for a geometric series, if m > n > 1,
d(x_m, x_n) ≤ Σ_{j=n}^{m−1} d(x_{j+1}, x_j) ≤ Σ_{j=n}^{m−1} ρ^{j−1} d(x_2, x_1)    (4.5.5)
= ρ^{n−1} ((1 − ρ^{m−n})/(1 − ρ)) d(x_2, x_1) ≤ (ρ^{n−1}/(1 − ρ)) d(x_2, x_1)    (4.5.6)
It follows immediately that {x_n}_{n=1}^∞ is a Cauchy sequence, and since X is complete there exists x ∈ X such that lim_{n→∞} x_n = x. Since T is continuous, T(x_n) → T(x) as n → ∞, and so x = T(x) must hold. □
The point x in the Contraction Mapping Theorem which satisfies T(x) = x is called a fixed point of T, and the process (4.5.2) of generating the sequence {x_n}_{n=1}^∞ is called fixed point iteration. Not only does the theorem show that T possesses a unique fixed point under the stated hypotheses, but the proof shows that the fixed point may be obtained by fixed point iteration starting from an arbitrary point of X.
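As a quick numerical illustration of fixed point iteration (a Python sketch; the choice T(x) = cos x is a standard example, not taken from the text): cos maps [−1, 1] into itself and |(cos x)'| = |sin x| ≤ sin 1 < 1 there, so Theorem 4.6 applies on the complete metric space [−1, 1].

import math

# Fixed point iteration x_{n+1} = T(x_n) for T(x) = cos(x) on [-1, 1]
x = 0.0
for n in range(60):
    x = math.cos(x)

print(x)                   # 0.739085..., the unique fixed point of cos
print(math.cos(x) - x)     # residual is essentially 0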
As a simple application of the theorem, consider a second kind integral equation
u(x) + ∫_Ω K(x, y)u(y) dy = f(x)    (4.5.7)
with Ω ⊂ R^N a bounded open set, a kernel function K = K(x, y) defined and continuous for (x, y) ∈ Ω̄ × Ω̄, and f ∈ C(Ω̄). We can then define a mapping T on X = C(Ω̄) by
T(u)(x) = −∫_Ω K(x, y)u(y) dy + f(x)    (4.5.8)
so that (4.5.7) is equivalent to the fixed point problem u = T(u) in X. Since K is uniformly continuous on Ω̄ × Ω̄ it is immediate that T u ∈ X whenever u ∈ X, and by elementary estimates we have
d(T(u), T(v)) = max_{x∈Ω̄} |T(u)(x) − T(v)(x)| = max_{x∈Ω̄} |∫_Ω K(x, y)(u(y) − v(y)) dy| ≤ L d(u, v)    (4.5.9)
where L := max_{x∈Ω̄} ∫_Ω |K(x, y)| dy. We therefore may conclude from the Contraction Mapping Theorem the following:
Proposition 4.6. If
max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < 1    (4.5.10)
then (4.5.7) has a unique solution for every f ∈ C(Ω̄).
The condition (4.5.10) will be satisfied if either the maximum of |K| is small enough
or the size of the domain Ω is small enough. Eventually we will see that some such
smallness condition is necessary for unique solvability of (4.5.7), but the exact conditions
will be sharpened considerably.
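To see the iteration at work on (4.5.7), here is a Python/numpy sketch with an arbitrarily chosen kernel and data: K(x, y) = xy/2 and f(x) = x on Ω = (0, 1), for which the constant in (4.5.10) equals 1/4. The fixed point iteration converges to the exact solution u(x) = 6x/7, which one can verify by hand by substituting u(x) = cx into the equation.

import numpy as np

# Solve u(x) + int_0^1 K(x,y) u(y) dy = f(x) by fixed point iteration
# u <- T(u) = f - int K u dy, with illustrative choices K(x,y) = x*y/2,
# f(x) = x, so that max_x int |K| dy = 1/4 < 1 and T is a contraction.
N = 2001
x = np.linspace(0.0, 1.0, N)
dx = np.diff(x)
K = 0.5 * np.outer(x, x)        # K[i, j] = x_i * y_j / 2
f = x.copy()

u = np.zeros(N)
for _ in range(50):
    g = K * u                   # g[i, j] = K(x_i, y_j) u(y_j)
    integral = np.sum(0.5 * (g[:, 1:] + g[:, :-1]) * dx, axis=1)  # trapezoid in y
    u = f - integral            # u <- T(u)

print(np.max(np.abs(u - 6 * x / 7)))   # tiny: exact solution is u(x) = 6x/7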
If we consider instead the family of second kind integral equations
λu(x) + ∫_Ω K(x, y)u(y) dy = f(x)    (4.5.11)
with the same conditions on K and f, then the above argument shows unique solvability for all sufficiently large λ, namely provided
max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < |λ|    (4.5.12)
As a second example, consider the initial value problem for a first order ODE
du/dt = f(t, u),    u(t_0) = u_0    (4.5.13)
where we assume at least that f is continuous on [a, b] × R with t_0 ∈ (a, b). If a classical solution u exists, then integrating both sides of the ODE from t_0 to t, and taking account of the initial condition, we obtain
u(t) = u_0 + ∫_{t_0}^t f(s, u(s)) ds    (4.5.14)
Conversely, if u ∈ C([a, b]) satisfies (4.5.14) then necessarily u' exists, is also continuous, and (4.5.13) holds. Thus the problem of solving (4.5.13) is seen to be equivalent to that of finding a continuous solution of (4.5.14). In turn this can be viewed as the problem of finding a fixed point of the nonlinear mapping T : C([a, b]) → C([a, b]) defined by
T(u)(t) = u_0 + ∫_{t_0}^t f(s, u(s)) ds    (4.5.15)
Now if we assume that f satisfies the Lipschitz condition with respect to u,
|f(t, u) − f(t, v)| ≤ L|u − v|    u, v ∈ R, t ∈ [a, b]    (4.5.16)
then
|T(u)(t) − T(v)(t)| ≤ L |∫_{t_0}^t |u(s) − v(s)| ds| ≤ L|b − a| max_{a≤t≤b} |u(t) − v(t)|    (4.5.17)
or
d(T(u), T(v)) ≤ L|b − a| d(u, v)    (4.5.18)
where d is again the usual metric on C([a, b]). Thus the contraction mapping theorem provides a unique local solution, that is, on any interval [a, b] containing t_0 for which (b − a) < 1/L.
Instead of the requirement that the Lipschitz condition (4.5.16) be valid on the entire infinite strip [a, b] × R, it is actually only necessary to assume it holds on [a, b] × [c, d] where u_0 ∈ (c, d). Also, first order systems of ODEs (and thus scalar higher order equations) can be handled in essentially the same manner. Such generalizations may be found in standard ODE textbooks, e.g. Chapter 1 of [CL] or Chapter 3 of [BN].
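The fixed point iteration for (4.5.15) is classically known as Picard iteration. Here is a small Python/numpy sketch for the illustrative case f(t, u) = u, t_0 = 0, u_0 = 1 on [0, 1/2] (so L = 1 and b − a < 1/L); the iterates converge uniformly to the exact solution u(t) = e^t, up to quadrature error.

import numpy as np

# Picard iteration u_{k+1}(t) = 1 + int_0^t u_k(s) ds for u' = u, u(0) = 1
t = np.linspace(0.0, 0.5, 1001)
u = np.ones_like(t)                     # starting iterate u_1 = u_0 = 1
for _ in range(20):
    # cumulative trapezoid rule for int_0^t u(s) ds on the grid t
    integral = np.concatenate(([0.0], np.cumsum(
        0.5 * (u[1:] + u[:-1]) * np.diff(t))))
    u = 1.0 + integral

print(np.max(np.abs(u - np.exp(t))))    # ~1e-8, limited by the quadrature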
We conclude with a useful variant of the contraction mapping theorem. If T : X → X then we can define the (composition) powers of T by T^2(x) = T(T(x)), T^3(x) = T(T^2(x)), etc. Thus T^n : X → X for n = 1, 2, 3, . . . .
Theorem 4.7. If there exists a positive integer n such that T^n is a contraction on a complete metric space X, then there exists a unique x ∈ X such that T(x) = x.
Proof: By Theorem 4.6 there exists a unique x ∈ X such that T^n(x) = x. Applying T to both sides gives T^n(T(x)) = T^{n+1}(x) = T(x), so that T(x) is also a fixed point of T^n. By uniqueness, T(x) = x, i.e. T has at least one fixed point. To see that the fixed point of T is unique, observe that any fixed point of T is also a fixed point of T^2, T^3, . . . . In particular, if T had two distinct fixed points then so would T^n, which is a contradiction. □
4.6 Exercises
1. Verify that d_p defined in Example 4.1 is a metric on R^N or C^N. (Suggestion: to prove the triangle inequality, use the finite dimensional version of the Minkowski inequality (18.1.15).)
2. If (X, d_X), (Y, d_Y) are metric spaces, show that the Cartesian product
Z = X × Y = {(x, y) : x ∈ X, y ∈ Y}
is a metric space with distance function
d((x_1, y_1), (x_2, y_2)) = d_X(x_1, x_2) + d_Y(y_1, y_2)
3. Is d(x, y) = |x − y|^2 a metric on R? What about d(x, y) = √(|x − y|)? Find reasonable conditions on a function φ : [0, ∞) → [0, ∞) such that d(x, y) = φ(|x − y|) is a metric on R.
4. Prove that a closed subset of a compact set in a metric space is also compact.
5. Let (X, d) be a metric space, A ⊂ X be nonempty, and define the distance from a point x to the set A to be
d(x, A) = inf_{y∈A} d(x, y)
a) Show that |d(x, A) − d(y, A)| ≤ d(x, y) for x, y ∈ X (i.e. x → d(x, A) is nonexpansive).
b) Assume A is closed. Show that d(x, A) = 0 if and only if x ∈ A.
c) Assume A is compact. Show that for any x ∈ X there exists z ∈ A such that d(x, A) = d(x, z).
6. Suppose that F is closed and G is open in a metric space (X, d) and F ⊂ G. Show that there exists a continuous function f : X → R such that
i) 0 ≤ f(x) ≤ 1 for all x ∈ X.
ii) f(x) = 1 for x ∈ F.
iii) f(x) = 0 for x ∈ G^c.
Hint: Consider
f(x) = d(x, G^c)/(d(x, G^c) + d(x, F))
7. Two metrics d, d̂ on a set X are said to be equivalent if there exist constants 0 < C < C* < ∞ such that
C ≤ d(x, y)/d̂(x, y) ≤ C*    for all x, y ∈ X, x ≠ y
a) If d, d̂ are equivalent, show that a sequence {x_k}_{k=1}^∞ is convergent in (X, d) if and only if it is convergent in (X, d̂).
b) Show that any two of the metrics d_p on R^N are equivalent.
8. Prove that C([a, b]) is separable (you may quote the Weierstrass approximation
theorem) but L∞ (a, b) is not separable.
9. If X, Y are metric spaces, f : X → Y is continuous and K is compact in X, show
that the image f (K) is compact in Y .
10. Let
F = {f ∈ C([0, 1]) : |f(x) − f(y)| ≤ |x − y| for all x, y, and ∫_0^1 f(x) dx = 0}
Show that F is compact in C([0, 1]). (Suggestion: to prove that F is uniformly bounded, justify and use the fact that if f ∈ F then f(x) = 0 for some x ∈ [0, 1].)
11. Show that the set F in Example 4.7 is not closed.
12. From the proof of the contraction mapping theorem it is clear that the smaller ρ is, the faster the sequence x_n converges to the fixed point x. With this in mind, explain why Newton's method
x_{n+1} = x_n − f(x_n)/f'(x_n)
is in general a very rapidly convergent method for approximating roots of f : R → R, as long as the initial guess is close enough.
13. Let f_n(x) = sin^n x for n = 1, 2, . . . .
a) Is the sequence {f_n}_{n=1}^∞ convergent in C([0, π])?
b) Is the sequence convergent in L^2(0, π)?
c) Is the sequence compact or precompact in either of these spaces?
14. Let X be a complete metric space and T : X → X satisfy d(T(x), T(y)) < d(x, y) for all x, y ∈ X, x ≠ y. Show that T can have at most one fixed point, but may have none. (Suggestion: for an example of non-existence look at T(x) = √(x^2 + 1) on R.)
15. Let S denote the linear Volterra type integral operator
Su(x) = ∫_a^x K(x, y)u(y) dy
where the kernel K is continuous and satisfies |K(x, y)| ≤ M for a ≤ y ≤ x.
a) Show that
|S^n u(x)| ≤ (M^n (x − a)^n/n!) max_{a≤y≤x} |u(y)|    x > a, n = 1, 2, . . .
b) Deduce from this that for any b > a, there exists an integer n such that S^n is a contraction on C([a, b]).
c) Show that for any f ∈ C([a, b]) the second kind Volterra integral equation
∫_a^x K(x, y)u(y) dy = u(x) + f(x)    a < x < b
has a unique solution u ∈ C([a, b]).
16. Show that for sufficiently small |λ| there exists a unique solution of the boundary value problem
u'' + λu = f(x),  0 < x < 1,    u(0) = u(1) = 0
for any f ∈ C([0, 1]). (Suggestion: use the result of Chapter 2, Exercise 7 to transform the boundary value problem into a fixed point problem for an integral operator, then apply the Contraction Mapping Theorem.) Be as precise as you can about which values of λ are allowed.
17. Let f = f(x, y) be continuously differentiable on [0, 1] × R and satisfy
0 < m ≤ ∂f/∂y (x, y) ≤ M
Show that there exists a unique continuous function φ(x) such that
f(x, φ(x)) = 0    0 < x < 1
(Suggestion: Define the transformation
(Tφ)(x) = φ(x) − λf(x, φ(x))
and show that T is a contraction on C([0, 1]) for some choice of λ. This is a special case of the implicit function theorem.)
18. Show that if we let
d(f, g) = Σ_{n=0}^∞ 2^{−n} e_n/(1 + e_n)
where
e_n = max_{x∈[a,b]} |f^{(n)}(x) − g^{(n)}(x)|
then (C^∞([a, b]), d) is a metric space, in which f_k → f if and only if f_k^{(n)} → f^{(n)} uniformly on [a, b] for n = 0, 1, . . . .
19. If E ⊂ R^N is closed and bounded, show that C^1(E) is a complete metric space with the metric defined by (4.1.7).
Chapter 5
Normed linear spaces and Banach spaces
5.1 Axioms of a normed linear space
Definition 5.1. A vector space X is said to be a normed linear space if for every x ∈ X
there is defined a nonnegative real number ||x||, the norm of x, such that the following
axioms hold.
[N1] ||x|| = 0 if and only if x = 0
[N2] ||λx|| = |λ|||x|| for any x ∈ X and any scalar λ.
[N3] ||x + y|| ≤ ||x|| + ||y|| for any x, y ∈ X.
As in the case of a metric space it is technically the pair (X, || · ||) which constitutes a normed linear space, but the definition of the norm will usually be clear from the context. If two different normed spaces are needed we will use a notation such as ||x||_X to indicate the space in which the norm is calculated.
Example 5.1. In the vector space X = R^N or C^N we can define the family of norms
||x||_p = (Σ_{j=1}^N |x_j|^p)^{1/p}    1 ≤ p < ∞
||x||_∞ = max_{1≤j≤N} |x_j|    (5.1.1)
Axioms [N1] and [N2] are obvious, while axiom [N3] amounts to the Minkowski inequality (18.1.15).
We obviously have d_p(x, y) = ||x − y||_p here, and this correspondence between norm and metric is a special case of the following general fact, that a norm always gives rise to a metric, whose proof is immediate from the definitions involved.
Proposition 5.1. Let (X, || · ||) be a normed linear space. If we set d(x, y) = ||x − y||
for x, y ∈ X then (X, d) is a metric space.
Example 5.2. If E ⊂ R^N is closed and bounded then it is easy to verify that
||f|| = max_{x∈E} |f(x)|    (5.1.2)
defines a norm on C(E), and the usual metric (4.1.4) on C(E) amounts to d(f, g) = ||f − g||. Likewise, the metrics (4.1.8),(4.1.9) on L^p(E) may be viewed as coming from the corresponding L^p norms,
||f||_{L^p(E)} = (∫_E |f(x)|^p dx)^{1/p} for 1 ≤ p < ∞,    ||f||_{L^∞(E)} = ess sup_{x∈E} |f(x)|    (5.1.3)
Note that for such a metric we must have d(λx, λy) = |λ|d(x, y), so that if this property does not hold, the metric cannot arise from a norm in this way. For example,
d(x, y) = |x − y|/(1 + |x − y|)    (5.1.4)
is a metric on R which does not come from a norm.
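A two-line Python check (with arbitrarily chosen sample points) shows the homogeneity property failing for (5.1.4):

# d(x,y) = |x-y|/(1+|x-y|) cannot come from a norm: a norm-induced
# metric would satisfy d(lam*x, lam*y) = |lam| d(x, y), which fails here.
def d(x, y):
    return abs(x - y) / (1 + abs(x - y))

x, y, lam = 1.0, 3.0, 5.0
print(d(lam * x, lam * y))    # 10/11 = 0.909...
print(abs(lam) * d(x, y))     # 5 * (2/3) = 3.333..., not equal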
Since any normed linear space may now be regarded as a metric space, all of the topological concepts defined for a metric space are meaningful in a normed linear space. Completeness holds in many situations of interest, so we have a special designation in that case.
Definition 5.2. A Banach space is a complete normed linear space.
Example 5.3. The spaces R^N, C^N are vector spaces which are also complete metric spaces with any of the norms || · ||_p, hence they are Banach spaces. Similarly C(E), L^p(E) are Banach spaces with the norms indicated above.
Here are a few simple results we can prove already.
Proposition 5.2. If X is a normed linear space then the norm is a continuous function on X. If E ⊂ X is compact and y ∈ X then there exists x_0 ∈ E such that
||y − x_0|| = min_{x∈E} ||y − x||    (5.1.5)
Proof: From the triangle inequality we get |(||x_1|| − ||x_2||)| ≤ ||x_1 − x_2||, so that f(x) = ||x|| is Lipschitz continuous (with Lipschitz constant 1) on X. Similarly f(x) = ||x − y|| is also continuous for any fixed y, so we may apply Theorem 4.4 with X replaced by the compact metric space E and f(x) = −||x − y|| to get the conclusion (5.1.5). □
Another topological point of interest is the following.
Theorem 5.1. If M is a subspace of a normed linear space X, and dim M < ∞ then
M is closed.
Proof: The proof is by induction on the dimension. Let dim M = 1, so that M = {u = λe : λ ∈ C} for some e ∈ X, ||e|| = 1. If u_n ∈ M then u_n = λ_n e for some λ_n ∈ C, and u_n → u in X implies, since ||u_n − u_m|| = |λ_n − λ_m|, that {λ_n} is a Cauchy sequence in C. Thus there exists λ ∈ C such that λ_n → λ, so that u_n → u = λe ∈ M, as needed.
Now suppose we know that all N dimensional subspaces are closed, and dim M = N + 1, so we can find linearly independent unit vectors e_1, . . . , e_{N+1} such that M = L(e_1, . . . , e_{N+1}). Let M̃ = L(e_1, . . . , e_N), which is closed by the induction assumption. If u_n ∈ M there exist λ_n ∈ C and v_n ∈ M̃ such that u_n = v_n + λ_n e_{N+1}. Suppose that u_n → u in X. We claim first that {λ_n} is bounded in C. If not, there must exist λ_{n_k} such that |λ_{n_k}| → ∞, and since u_n remains bounded in X we get u_{n_k}/λ_{n_k} → 0. It follows that
e_{N+1} − u_{n_k}/λ_{n_k} = −v_{n_k}/λ_{n_k} ∈ M̃    (5.1.6)
Since M̃ is closed, it would follow, upon letting k → ∞, that e_{N+1} ∈ M̃, which is impossible.
Thus we can find a subsequence λ_{n_k} → λ for some λ ∈ C, and
v_{n_k} = u_{n_k} − λ_{n_k} e_{N+1} → u − λe_{N+1}    (5.1.7)
Again since M̃ is closed it follows that u − λe_{N+1} ∈ M̃, so that u ∈ M as needed. □
For the proof, see for example Theorem 1.21 of [31]. For an infinite dimensional subspace this is false in general. For example, the Weierstrass approximation theorem states that if f ∈ C([a, b]) and ε > 0 there exists a polynomial p such that |p(x) − f(x)| ≤ ε on [a, b]. Thus if we take X = C([a, b]) and E to be the set of all polynomials on [a, b] then clearly E is a subspace of X and every point of X is a limit point of E. Thus E cannot be closed, since otherwise E would be equal to all of X.
Recall that when Ē = X as in this example, we say that E is a dense subspace of X. Such subspaces play an important role in functional analysis. According to Theorem 5.1 a finite dimensional Banach space X has no dense subspace aside from X itself.
5.2 Infinite series
In a normed linear space we can study limits of sums, i.e. infinite series.
Definition 5.3. We say Σ_{j=1}^∞ x_j is convergent in X to the limit s if lim_{n→∞} s_n = s, where s_n = Σ_{j=1}^n x_j is the n'th partial sum of the series.
A useful criterion for convergence can then be given, provided the space is also complete.
Proposition 5.3. If X is a Banach space, x_n ∈ X for n = 1, 2, . . . and Σ_{n=1}^∞ ||x_n|| < ∞, then Σ_{n=1}^∞ x_n is convergent to an element s ∈ X with ||s|| ≤ Σ_{n=1}^∞ ||x_n||.
Proof: If m > n we have ||s_m − s_n|| = ||Σ_{j=n+1}^m x_j|| ≤ Σ_{j=n+1}^m ||x_j|| by the triangle inequality. Since Σ_{j=1}^∞ ||x_j|| is convergent, its partial sums form a Cauchy sequence in R, and hence {s_n} is also Cauchy. Since the space is complete, s = lim_{n→∞} s_n exists. We also have ||s_n|| ≤ Σ_{j=1}^n ||x_j|| for any fixed n, and ||s_n|| → ||s|| by Proposition 5.2, so ||s|| ≤ Σ_{j=1}^∞ ||x_j|| must hold. □
The concepts of linear combination, linear independence and basis may now be extended to allow for infinite sums in an obvious way: We say a countably infinite set of vectors {x_n}_{n=1}^∞ is linearly independent if
Σ_{n=1}^∞ λ_n x_n = 0 if and only if λ_n = 0 for all n    (5.2.1)
and x ∈ L({x_n}_{n=1}^∞), the span of {x_n}_{n=1}^∞, provided x = Σ_{n=1}^∞ λ_n x_n for some scalars {λ_n}_{n=1}^∞. A basis of X is then a linearly independent spanning set, or equivalently {x_n}_{n=1}^∞ is a basis of X if for any x ∈ X there exist unique scalars {λ_n}_{n=1}^∞ such that x = Σ_{n=1}^∞ λ_n x_n.
We emphasize that this definition of basis is not the same as that given in Definition
3.4 for a basis of a vector space, the difference being that the sum there is required to
always be finite. The term Schauder basis is sometimes used for the above definition if
the distinction needs to be made. Throughout the remainder of these notes, the term
basis will always mean Schauder basis unless otherwise stated.
A Banach space X which contains a Schauder basis {x_n}_{n=1}^∞ is always separable, since then the set of all finite linear combinations of the x_n's with rational coefficients is easily seen to be countable and dense. It is known that not every separable Banach space has a Schauder basis (recall that a Hamel basis must always exist); see for example Section 1.1 of [39].
5.3 Linear operators and functionals
We have previously defined what it means for a mapping T : X → Y between vector spaces to be linear. When the spaces X, Y are normed linear spaces we usually refer to such a mapping T as a linear operator. We say that T is bounded if there exists a finite constant C such that ||Tx|| ≤ C||x|| for every x ∈ X, and we may then define the norm of T as the smallest such C, or equivalently
||T|| = sup_{x≠0} ||Tx||/||x||    (5.3.1)
The condition ||T || < ∞ is equivalent to continuity of T .
Proposition 5.4. If X, Y are normed linear spaces and T : X → Y is linear then the
following three conditions are equivalent.
a) T is bounded
b) T is continuous
c) There exists x0 ∈ X such that T is continuous at x0 .
Proof: If x_0, x ∈ X then
||T(x) − T(x_0)|| = ||T(x − x_0)|| ≤ ||T|| ||x − x_0||    (5.3.2)
Thus if T is bounded then it is (Lipschitz) continuous at any point of X. The implication that b) implies c) is trivial. Finally, suppose T is continuous at x_0 ∈ X. For any ε > 0 there must exist δ > 0 such that ||T(z − x_0)|| = ||T(z) − T(x_0)|| ≤ ε if ||z − x_0|| ≤ δ. For any x ≠ 0, choose z = x_0 + δx/||x|| to get
||T(δx/||x||)|| ≤ ε    (5.3.3)
or equivalently, using the linearity of T, ||Tx|| ≤ C||x|| with C = ε/δ. Thus T is bounded. □
A continuous linear operator is therefore the same as a bounded linear operator, and the two terms are used interchangeably. When the range space Y is the scalar field R or C we call T a linear functional instead of a linear operator, and correspondingly a bounded (or continuous) linear functional if |Tx| ≤ C||x|| for some finite constant C.
We introduce the notation
B(X, Y) = {T : X → Y : T is linear and bounded}    (5.3.4)
and the special cases
B(X) = B(X, X),    X′ = B(X, C)    (5.3.5)
Examples of linear operators and functionals will be studied much more extensively
later. For now we just give two simple examples.
Example 5.4. If X = R^N, Y = R^M and A is an M × N real matrix with entries a_kj, then y_k = Σ_{j=1}^N a_kj x_j defines a linear mapping, and according to the discussion of Section 3.3 any linear mapping of R^N to R^M is of this form. It is not hard to check that T is always bounded, assuming that we use any of the norms || · ||_p in X and in Y. Evidently T is a linear functional if M = 1.
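For matrices this norm can even be computed exactly: with the Euclidean norms on both spaces, ||T|| equals the largest singular value of A (a standard linear algebra fact, not proved in these notes). Here is a Python/numpy sketch with a randomly chosen A, comparing random sampling of ||Ax||/||x|| against the singular value:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))     # an arbitrary 2 x 3 matrix

# sample the ratio ||Ax|| / ||x|| over many random x != 0
samples = rng.standard_normal((3, 100000))
ratios = np.linalg.norm(A @ samples, axis=0) / np.linalg.norm(samples, axis=0)

print(ratios.max())                            # sampled lower bound for ||T||
print(np.linalg.svd(A, compute_uv=False)[0])   # largest singular value of A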
Example 5.5. If Ω ⊂ R^N is compact and X = C(Ω), pick x_0 ∈ Ω and set T(f) = f(x_0) for f ∈ X. Clearly T is a linear functional and |Tf| ≤ ||f||, so that ||T|| ≤ 1.
5.4 Contraction mappings in a Banach space
If the Contraction Mapping Theorem, Theorem 4.6, is specialized to a Banach space, the resulting statement is that if X is a Banach space and F : X → X satisfies
||F(x) − F(y)|| ≤ L||x − y||    x, y ∈ X    (5.4.1)
for some L < 1, then F has a unique fixed point in X.
A particular case which arises frequently in applications is when the mapping F has
the form F (x) = T x + b for some b ∈ X and bounded linear operator T on X, in which
case the contraction condition (5.4.1) simply amounts to the requirement that ||T || < 1.
If we then initialize the fixed point iteration process (4.5.2) with x1 = b, the successive
iterates are
x_2 = F(x_1) = F(b) = Tb + b    (5.4.2)
x_3 = F(x_2) = Tx_2 + b = T²b + Tb + b    (5.4.3)
etc., the general pattern being
x_n = Σ_{j=0}^{n−1} T^j b,   n = 1, 2, . . .    (5.4.4)
with T⁰ = I as usual. If ||T|| < 1 we already know that this sequence must converge, but it could also be checked directly from Proposition 5.3 using the obvious inequality ||T^j b|| ≤ ||T||^j ||b||. In fact we know that x_n → x, the unique fixed point of F, so
x = Σ_{j=0}^∞ T^j b    (5.4.5)
is an explicit solution formula for the linear, inhomogeneous equation x − Tx = b. The right hand side of (5.4.5) is known as the Neumann series for x = (I − T)⁻¹b, and symbolically we may write
(I − T)⁻¹ = Σ_{j=0}^∞ T^j    (5.4.6)
Note the formal similarity to the usual geometric series formula for (1 − z)⁻¹ if z ∈ C, |z| < 1. If T and b are such that ||T^j b|| ≪ ||Tb|| for j ≥ 2, then truncating the series after two terms we get the Born approximation formula x ≈ b + Tb.
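The following Python sketch (an illustration, not part of the notes; the matrix T and vector b are arbitrary choices with ||T|| < 1) computes the partial sums (5.4.4) and compares the limit with the exact solution of x − Tx = b, along with the Born approximation:

```python
# Sketch (not from the notes): Neumann series in the Banach space R^2.
import numpy as np

T = np.array([[0.2, 0.1],
              [0.0, 0.3]])
b = np.array([1.0, 2.0])
assert np.linalg.norm(T, 2) < 1           # contraction condition

x = b.copy()                              # x_1 = b
for _ in range(50):                       # partial sums of the series (5.4.4)
    x = T @ x + b

print(x)                                  # limit of the iteration
print(np.linalg.solve(np.eye(2) - T, b))  # exact solution of (I - T)x = b
print(b + T @ b)                          # Born approximation x ~ b + Tb
```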
5.5 Exercises
1. Give the proof of Proposition 5.1.
2. Show that any two norms on a finite dimensional normed linear space are equivalent. That is to say, if (X, ||·||), (X, |||·|||) are both normed linear spaces, then there exist constants 0 < c < C < ∞ such that
c ≤ |||x|||/||x|| ≤ C   for all x ≠ 0 in X
3. If X is a normed linear space and Y is a Banach space, show that B(X, Y) is a Banach space, with the norm given by (5.3.1).
4. If T is a linear integral operator, Tu(x) = ∫_Ω K(x, y)u(y) dy, then T² is also a linear integral operator. What is the kernel of T²?
5. If X is a normed linear space and E is a subspace of X, show that Ē, the closure of E, is also a subspace of X.
6. If p ∈ (0, 1) show that ||f||_p = (∫_Ω |f(x)|^p dx)^{1/p} does not define a norm.
7. The simple initial value problem
u′ = u,   u(0) = 1
is equivalent to the integral equation
u(x) = 1 + ∫_0^x u(s) ds
which may be viewed as a fixed point problem of the special type discussed in Section 5.4. Find the Neumann series for the solution u. Where does it converge?
8. If Tf = f(0), show that T is not a bounded linear functional on L^p(−1, 1) for 1 ≤ p < ∞.
9. Let A ∈ B(X).
a) Show that
exp(A) = e^A := Σ_{n=0}^∞ Aⁿ/n!    (5.5.1)
is defined in B(X).
b) If also B ∈ B(X) and AB = BA show that exp(A + B) = exp(A) exp(B).
c) Show that exp((t + s)A) = exp(tA) exp(sA) for any t, s ∈ R.
d) Show that the conclusion in b) is false, in general, if A and B do not commute.
(Suggestion: a counterexample can be found in X = R2 .)
10. Find an integral equation of the form u = Tu + f, T linear, which is equivalent to the initial value problem
u″ + u = x²,   x > 0,   u(0) = 1,   u′(0) = 2    (5.5.2)
Calculate the Born approximation to the solution u and compare it to the exact solution.
Chapter 6
Inner product spaces and Hilbert spaces
6.1 Axioms of an inner product space
Definition 6.1. A vector space X is said to be an inner product space if for every x, y ∈ X there is defined a complex number ⟨x, y⟩, the inner product of x and y, such that the following axioms hold.
[H1] ⟨x, x⟩ ≥ 0 for all x ∈ X
[H2] ⟨x, x⟩ = 0 if and only if x = 0
[H3] ⟨λx, y⟩ = λ⟨x, y⟩ for any x, y ∈ X and any scalar λ.
[H4] ⟨x, y⟩ = \overline{⟨y, x⟩} for any x, y ∈ X.
[H5] ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ for any x, y, z ∈ X
Note that from axioms [H3] and [H4] it follows that
⟨x, λy⟩ = \overline{⟨λy, x⟩} = \overline{λ⟨y, x⟩} = λ̄ \overline{⟨y, x⟩} = λ̄⟨x, y⟩    (6.1.1)
Another immediate consequence of the axioms is that
||x + y||² = ⟨x + y, x + y⟩ = ||x||² + 2Re ⟨x, y⟩ + ||y||²    (6.1.2)
If we replace y by −y and add the resulting identities we obtain the so-called Parallelogram Law
||x + y||² + ||x − y||² = 2||x||² + 2||y||²    (6.1.3)
Example 6.1. The vector space R^N is an inner product space if we define
⟨x, y⟩ = Σ_{j=1}^N x_j y_j    (6.1.4)
In the case of C^N we must define
⟨x, y⟩ = Σ_{j=1}^N x_j ȳ_j    (6.1.5)
in order that [H4] be satisfied.
Example 6.2. For the vector space L²(Ω), with Ω ⊂ R^N, we may define
⟨f, g⟩ = ∫_Ω f(x)\overline{g(x)} dx    (6.1.6)
where of course the complex conjugation can be ignored in the case of the real vector space L²(Ω). Note the formal analogy with the inner product in the case of R^N or C^N. The finiteness of ⟨f, g⟩ is guaranteed by the Hölder inequality (18.1.6), and the validity of [H1]-[H5] is clear.
Example 6.3. Another important inner product space which we introduce at this point is the sequence space
ℓ² = { x = {x_j}_{j=1}^∞ : Σ_{j=1}^∞ |x_j|² < ∞ }    (6.1.7)
with inner product
⟨x, y⟩ = Σ_{j=1}^∞ x_j ȳ_j    (6.1.8)
The fact that ⟨x, y⟩ is finite for any x, y ∈ ℓ² follows from (18.1.14), the discrete form of the Hölder inequality. The notation ℓ²(Z) is often used when the sequences involved are bi-infinite, i.e. of the form x = {x_j}_{j=−∞}^∞.
6.2 Norm in a Hilbert space
Proposition 6.1. If x, y ∈ X, an inner product space, then
|⟨x, y⟩|² ≤ ⟨x, x⟩⟨y, y⟩    (6.2.1)
Proof: For any z ∈ X we have
0 ≤ ⟨x − z, x − z⟩ = ⟨x, x⟩ − ⟨x, z⟩ − ⟨z, x⟩ + ⟨z, z⟩    (6.2.2)
  = ⟨x, x⟩ + ⟨z, z⟩ − 2Re ⟨x, z⟩    (6.2.3)
and hence
2Re ⟨z, x⟩ ≤ ⟨x, x⟩ + ⟨z, z⟩    (6.2.4)
If y = 0 there is nothing to prove; otherwise choose z = (⟨x, y⟩/⟨y, y⟩)y to get
2|⟨x, y⟩|²/⟨y, y⟩ ≤ ⟨x, x⟩ + |⟨x, y⟩|²/⟨y, y⟩    (6.2.5)
The conclusion (6.2.1) now follows upon rearrangement. □
Theorem 6.1. If X is an inner product space and if we set ||x|| = √⟨x, x⟩ then ||·|| is a norm on X.
Proof: By axiom [H1] ||x|| is defined as a nonnegative real number for every x ∈ X, and axiom [H2] implies the corresponding axiom [N1] of a norm. If λ is any scalar then ||λx||² = ⟨λx, λx⟩ = λλ̄⟨x, x⟩ = |λ|²||x||², so that [N2] also holds. Finally, if x, y ∈ X then
||x + y||² = ⟨x + y, x + y⟩ = ||x||² + 2Re ⟨x, y⟩ + ||y||²    (6.2.6)
 ≤ ||x||² + 2|⟨x, y⟩| + ||y||² ≤ ||x||² + 2||x|| ||y|| + ||y||²    (6.2.7)
 = (||x|| + ||y||)²    (6.2.8)
so that the triangle inequality [N3] also holds. □
The inequality (6.2.1) may now be restated as
|⟨x, y⟩| ≤ ||x|| ||y||    (6.2.9)
for any x, y ∈ X, and in this form it is usually called the Schwarz or Cauchy-Schwarz inequality.
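A quick numerical sanity check of the Schwarz inequality and the Parallelogram Law for the standard inner product on C^N can be done as follows (an illustration only, not part of the notes; the vectors are random):

```python
# Check (not from the notes) of Schwarz (6.2.9) and the Parallelogram
# Law (6.1.3) for the standard inner product on C^5.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)

ip = lambda u, v: np.vdot(v, u)          # <u, v> = sum_j u_j * conj(v_j)
norm = lambda u: np.sqrt(ip(u, u).real)

print(abs(ip(x, y)) <= norm(x) * norm(y))            # Schwarz inequality
print(np.isclose(norm(x + y)**2 + norm(x - y)**2,
                 2 * norm(x)**2 + 2 * norm(y)**2))   # Parallelogram Law
```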
Corollary 6.1. If x_n → x in X then ⟨x_n, y⟩ → ⟨x, y⟩ for any y ∈ X.
Proof: We have that
|⟨x_n, y⟩ − ⟨x, y⟩| = |⟨x_n − x, y⟩| ≤ ||x_n − x|| ||y|| → 0    (6.2.10)
By Theorem 6.1 an inner product space may always be regarded as a normed linear space, and analogously to the definition of Banach space we have
Definition 6.2. A Hilbert space is a complete inner product space.
Example 6.4. The spaces R^N and C^N are Hilbert spaces, as is L²(Ω), on account of the completeness property mentioned in Theorem 4.1 of Chapter 4. On the other hand, if we consider C(E) with inner product ⟨f, g⟩ = ∫_E f(x)\overline{g(x)} dx, then it is an inner product space which is not a Hilbert space, since as previously observed, C(E) is not complete in the L² metric. The sequence space ℓ² is also a Hilbert space; see Exercise 7.
6.3 Orthogonality
Recall from elementary calculus that in R^n the inner product allows one to calculate the angle between two vectors, namely
⟨x, y⟩ = ||x|| ||y|| cos θ    (6.3.1)
where θ is the angle between x and y. In particular x and y are perpendicular if and only if ⟨x, y⟩ = 0. The concept of perpendicularity, also called orthogonality, is fundamental in Hilbert space analysis, even if the geometric picture is less clear.
Definition 6.3. If x, y ∈ X, an inner product space, we say x, y are orthogonal if ⟨x, y⟩ = 0.
From (6.1.2) we obtain immediately the 'Pythagorean Theorem': if x and y are orthogonal then
||x + y||² = ||x||² + ||y||²    (6.3.2)
A set of vectors {x_1, x_2, . . . , x_n} is called an orthogonal set if x_j and x_k are orthogonal whenever j ≠ k, and for such a set we have
||Σ_{j=1}^n x_j||² = Σ_{j=1}^n ||x_j||²    (6.3.3)
The set is called orthonormal if in addition ||x_j|| = 1 for every j. The same terminology is used for countably infinite sets, with (6.3.3) still valid provided that the series on the right is convergent.
We may also use the notation x ⊥ y if x, y are orthogonal, and if E ⊂ X we define the orthogonal complement of E,
E⊥ = {x ∈ X : ⟨x, y⟩ = 0 for all y ∈ E}
(E⊥ = x⊥ if E consists of the single point x). We obviously have 0⊥ = X, and also X⊥ = {0}, since if x ∈ X⊥ then ⟨x, x⟩ = 0 so that x = 0.
Proposition 6.2. If E ⊂ X then E⊥ is a closed subspace of X. If E is a closed subspace then E = E⊥⊥.
We leave the proof as an exercise. Here E⊥⊥ means (E⊥)⊥, the orthogonal complement of the orthogonal complement.
Example 6.5. If X = R³ and E = {x = (x_1, x_2, x_3) : x_1 = x_2 = 0} then E⊥ = {x ∈ R³ : x_3 = 0}.
Example 6.6. If X = L²(Ω) with Ω a bounded open set in R^N, let E = L{1}, i.e. the set of constant functions. Then f ∈ E⊥ if and only if ⟨f, 1⟩ = ∫_Ω f(x) dx = 0. Thus E⊥ is the set of functions in L²(Ω) with mean value zero.
6.4 Projections
If E ⊂ X and x ∈ X, the projection P_E x of x onto E is the element of E closest to x, if such an element exists. That is, y = P_E x if y is the unique solution of the minimization problem
min_{z∈E} ||x − z||    (6.4.1)
Of course such a point may not exist, and may not be unique if it does exist. In a Hilbert space the projection will be well defined provided E is closed and convex.
Definition 6.4. If X is a vector space and E ⊂ X, we say E is convex if λx+(1−λ)y ∈ E
whenever x, y ∈ E and λ ∈ [0, 1].
Example 6.7. If X is a vector space then any subspace of X is convex. If X is a normed
linear space then any ball B(x, R) ⊂ X is convex.
Theorem 6.2. Let H be a Hilbert space, E ⊂ H closed and convex, and x ∈ H. Then y = P_E x exists. Furthermore, y = P_E x if and only if
y ∈ E,   Re ⟨x − y, z − y⟩ ≤ 0 for all z ∈ E    (6.4.2)
Proof: Set d = inf_{z∈E} ||x − z||, so that there exists a sequence z_n ∈ E such that ||x − z_n|| → d. We wish to show that {z_n} is a Cauchy sequence. From the Parallelogram Law (6.1.3) applied to z_n − x, z_m − x we have
||z_n − z_m||² = 2||z_n − x||² + 2||z_m − x||² − 4||(z_n + z_m)/2 − x||²    (6.4.3)
Since E is convex, (z_n + z_m)/2 ∈ E so that ||(z_n + z_m)/2 − x|| ≥ d, and it follows that
||z_n − z_m||² ≤ 2||z_n − x||² + 2||z_m − x||² − 4d²    (6.4.4)
Letting n, m → ∞ the right hand side tends to zero, so that {z_n} is Cauchy. Since the space is complete there exists y ∈ H such that lim_{n→∞} z_n = y, and y ∈ E since E is closed. It follows that ||y − x|| = lim_{n→∞} ||z_n − x|| = d so that min_{z∈E} ||z − x|| is achieved at y.
For the uniqueness assertion, suppose ||y − x|| = ||ŷ − x|| = d with y, ŷ ∈ E. Then (6.4.4) holds with z_n, z_m replaced by y, ŷ, giving
||y − ŷ||² ≤ 2||y − x||² + 2||ŷ − x||² − 4d² = 0    (6.4.5)
so that y = ŷ. Thus y = P_E x exists.
To obtain the characterization (6.4.2), note that for any z ∈ E
f(t) = ||x − (y + t(z − y))||²    (6.4.6)
has its minimum value on the interval [0, 1] when t = 0, since y + t(z − y) = tz + (1 − t)y ∈ E. We explicitly calculate
f(t) = ||x − y||² − 2t Re ⟨x − y, z − y⟩ + t²||z − y||²    (6.4.7)
By elementary calculus considerations, the minimum of this quadratic on [0, 1] occurs at t = 0 only if f′(0) = −2 Re ⟨x − y, z − y⟩ ≥ 0, which is equivalent to (6.4.2). If, on the other hand, (6.4.2) holds, then for any z ∈ E we must have
||z − x||² = f(1) ≥ f(0) = ||y − x||²    (6.4.8)
so that min_{z∈E} ||z − x|| must occur at y, i.e. y = P_E x. □
The most important special case of the above theorem is when E is a closed subspace
of the Hilbert space H (recall a subspace is always convex), in which case we have
Theorem 6.3. If E ⊂ H is a closed subspace of a Hilbert space H and x ∈ H then y = P_E x if and only if y ∈ E and x − y ∈ E⊥. Furthermore
1. x − y = x − P_E x = P_{E⊥} x
2. We have that
x = y + (x − y) = P_E x + P_{E⊥} x    (6.4.9)
is the unique decomposition of x as the sum of an element of E and an element of E⊥.
3. P_E is a linear operator on H with ||P_E|| = 1 except for the case E = {0}.
Proof: If y = P_E x then for any w ∈ E we also have y ± w ∈ E, and choosing z = y ± w in (6.4.2) gives ±Re ⟨x − y, w⟩ ≤ 0. Thus Re ⟨x − y, w⟩ = 0, and repeating the same argument with z = y ± iw gives Re ⟨x − y, iw⟩ = Im ⟨x − y, w⟩ = 0 also. We conclude that ⟨x − y, w⟩ = 0 for all w ∈ E, i.e. x − y ∈ E⊥. The converse statement may be proved in a similar manner.
Recall that E⊥ is always a closed subspace of H. The statement that x − y = P_{E⊥} x is then equivalent, by the previous paragraph, to x − y ∈ E⊥ and ⟨x − (x − y), w⟩ = ⟨y, w⟩ = 0 for every w ∈ E⊥, which is evidently true since y ∈ E.
Next, if x = y_1 + z_1 = y_2 + z_2 with y_1, y_2 ∈ E and z_1, z_2 ∈ E⊥ then y_1 − y_2 = z_2 − z_1, implying that y = y_1 − y_2 belongs to both E and E⊥. But then y ⊥ y, i.e. ⟨y, y⟩ = 0, must hold, so that y = 0 and hence y_1 = y_2, z_1 = z_2. We leave the proof of linearity to the exercises. □
If we denote by I the identity mapping, we have just proved that P_{E⊥} = I − P_E. We also obtain that
||x||² = ||P_E x||² + ||P_{E⊥} x||²    (6.4.10)
for any x ∈ H.
Example 6.8. In the Hilbert space L²(−1, 1) let E denote the subspace of even functions, i.e. f ∈ E if f(x) = f(−x) for almost every x ∈ (−1, 1). We claim that E⊥ is the subspace of odd functions on (−1, 1). The fact that any odd function belongs to E⊥ is clear, since if f is even and g is odd then fg is odd and so ⟨f, g⟩ = ∫_{−1}^1 f(x)g(x) dx = 0. Conversely, if g ⊥ E then for any f ∈ E we have
0 = ⟨g, f⟩ = ∫_{−1}^1 g(x)f(x) dx = ∫_0^1 (g(x) + g(−x))f(x) dx    (6.4.11)
by an obvious change of variables. Choosing f(x) = g(x) + g(−x) we see that
∫_0^1 |g(x) + g(−x)|² dx = 0    (6.4.12)
so that g(x) = −g(−x) for almost every x ∈ (0, 1) and hence for almost every x ∈ (−1, 1). Thus any element of E⊥ is an odd function on (−1, 1).
Any function f ∈ L²(−1, 1) thus has the unique decomposition f = P_E f + P_{E⊥} f, a sum of an even and an odd function. Since one such splitting is
f(x) = (f(x) + f(−x))/2 + (f(x) − f(−x))/2    (6.4.13)
we conclude from the uniqueness property that these two terms are the projections, i.e.
P_E f(x) = (f(x) + f(−x))/2,   P_{E⊥} f(x) = (f(x) − f(−x))/2    (6.4.14)
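A short numerical check of (6.4.14) (illustrative only, not part of the notes; the sample function f(x) = eˣ is an arbitrary choice):

```python
# Check (not from the notes) of the even/odd projections (6.4.14) in L^2(-1,1).
import numpy as np

x = np.linspace(-1, 1, 2001)
dx = x[1] - x[0]
f = np.exp(x)

even = (np.exp(x) + np.exp(-x)) / 2     # P_E f
odd  = (np.exp(x) - np.exp(-x)) / 2     # P_{E-perp} f

print(np.allclose(f, even + odd))       # f decomposes exactly
print(np.sum(even * odd) * dx)          # <even, odd> ~ 0: parts are orthogonal
```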
Example 6.9. Let {x_1, x_2, . . . , x_n} be an orthogonal set of nonzero elements in a Hilbert space X and E = L(x_1, x_2, . . . , x_n) the span of these elements. Let us compute P_E for this closed subspace E. If y = P_E x then y = Σ_{j=1}^n λ_j x_j for some scalars λ_1, . . . , λ_n since y ∈ E. From Theorem 6.3 we also have that x − y ⊥ E, which is equivalent to x − y ⊥ x_k for each k. Thus ⟨x, x_k⟩ = ⟨y, x_k⟩ = λ_k⟨x_k, x_k⟩, using the orthogonality assumption. Thus we conclude that
y = P_E x = Σ_{j=1}^n (⟨x, x_j⟩/⟨x_j, x_j⟩) x_j    (6.4.15)
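The projection formula (6.4.15) translates directly into code. The sketch below (not part of the notes; the orthogonal set is generated randomly) computes P_E x in R⁵ and verifies that x − P_E x is orthogonal to each x_j, as Theorem 6.3 requires:

```python
# Sketch (not from the notes) of the projection formula (6.4.15) in R^5.
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))    # orthonormal columns
xs = [3.0 * Q[:, 0], 1.5 * Q[:, 1], 2.0 * Q[:, 2]]  # orthogonal, not unit

x = rng.standard_normal(5)
y = sum((x @ xj) / (xj @ xj) * xj for xj in xs)     # P_E x by (6.4.15)

# Theorem 6.3: x - P_E x must be orthogonal to each x_j.
print([round(float((x - y) @ xj), 12) for xj in xs])   # [0.0, 0.0, 0.0]
```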
6.5 Gram-Schmidt method
The projection formula (6.4.15) provides an explicit and very convenient expression
for the solution y of the best approximation problem (6.4.1) provided E is a subspace
spanned by mutually orthogonal vectors {x1 , x2 , . . . xn }. If instead E = L(x1 , x2 . . . xn )
is a subspace but {x1 , x2 , . . . xn } are not orthogonal vectors, we can still use (6.4.15) to
compute y = PE x if we can find a set of orthogonal vectors {y1 , y2 , . . . ym } such that
E = L(x1 , x2 , . . . xn ) = L(y1 , y2 , . . . ym ), i.e. if we can find an orthogonal basis of E.
This may always be done by the Gram-Schmidt orthogonalization procedure from linear
algebra, which we now describe.
Assume that {x_1, x_2, . . . , x_n} are linearly independent, so that m = n must hold. First set y_1 = x_1. If orthogonal vectors y_1, y_2, . . . , y_k have been chosen for some 1 ≤ k < n such that E_k := L(y_1, y_2, . . . , y_k) = L(x_1, x_2, . . . , x_k), then define y_{k+1} = x_{k+1} − P_{E_k} x_{k+1}. Clearly {y_1, y_2, . . . , y_{k+1}} are orthogonal, since y_{k+1} is the projection of x_{k+1} onto E_k⊥. Also, since y_{k+1} and x_{k+1} differ by an element of E_k, it is evident that L(x_1, x_2, . . . , x_{k+1}) = L(y_1, y_2, . . . , y_{k+1}). Thus after n steps we obtain an orthogonal set {y_1, y_2, . . . , y_n} which spans E. If the original set {x_1, x_2, . . . , x_n} is not linearly independent then some of the y_k's will be zero. After discarding these and relabeling, we obtain {y_1, y_2, . . . , y_m} for some m ≤ n, an orthogonal basis for E. Note that we may compute y_{k+1} using (6.4.15), namely
y_{k+1} = x_{k+1} − Σ_{j=1}^k (⟨x_{k+1}, y_j⟩/⟨y_j, y_j⟩) y_j    (6.5.1)
In practice the Gram-Schmidt method is often modified to produce an orthonormal basis of E by normalizing y_k to be a unit vector at each step, or else discarding it if it is already a linear combination of {y_1, y_2, . . . , y_{k−1}}. More explicitly:
• Set y_1 = x_1/||x_1||
• If orthonormal vectors {y_1, y_2, . . . , y_k} have been chosen, set
ỹ_{k+1} = x_{k+1} − Σ_{j=1}^k ⟨x_{k+1}, y_j⟩ y_j    (6.5.2)
If ỹ_{k+1} = 0 discard it, otherwise set y_{k+1} = ỹ_{k+1}/||ỹ_{k+1}||.
The reader may easily check that {y_1, y_2, . . . , y_m} constitutes an orthonormal basis of E, and consequently P_E x = Σ_{j=1}^m ⟨x, y_j⟩ y_j for any x ∈ H.
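The modified procedure above is easily transcribed into code. The following is a minimal sketch for vectors in R^N (not from the notes; the tolerance and test data are arbitrary choices), including the discarding step for linearly dependent input:

```python
# Minimal transcription (not from the notes) of modified Gram-Schmidt (6.5.2).
import numpy as np

def gram_schmidt(xs, tol=1e-12):
    """Return an orthonormal basis of span{xs}."""
    ys = []
    for xk in xs:
        y = xk - sum(np.dot(xk, yj) * yj for yj in ys)  # subtract projection
        n = np.linalg.norm(y)
        if n > tol:              # discard if xk was already in the span
            ys.append(y / n)
    return ys

rng = np.random.default_rng(3)
xs = list(rng.standard_normal((4, 6)))
xs.append(xs[0] + xs[1])         # a deliberately dependent vector
ys = gram_schmidt(xs)
G = np.array([[np.dot(a, b) for b in ys] for a in ys])
print(len(ys), np.allclose(G, np.eye(len(ys))))   # 4 True
```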
6.6 Bessel's inequality and infinite orthogonal sequences
The formula (6.4.15) for P_E may be adapted for use in infinite dimensional subspaces E. If {x_n}_{n=1}^∞ is a countable orthogonal set in H, with x_n ≠ 0 for all n, and E is the closed span of {x_n}_{n=1}^∞, we formally expect that
P_E x = Σ_{n=1}^∞ (⟨x, x_n⟩/⟨x_n, x_n⟩) x_n    (6.6.1)
To verify that this is correct, we must show that the infinite series in (6.6.1) is guaranteed to be convergent in H.
First of all, let us set
e_n = x_n/||x_n||,   c_n = ⟨x, e_n⟩,   E_N = L(x_1, x_2, . . . , x_N)    (6.6.2)
so that {e_n}_{n=1}^∞ is an orthonormal set, and
P_{E_N} x = Σ_{n=1}^N c_n e_n    (6.6.3)
From (6.4.10) we have
Σ_{n=1}^N |c_n|² = ||P_{E_N} x||² ≤ ||x||²    (6.6.4)
Letting N → ∞ we obtain Bessel's inequality
Σ_{n=1}^∞ |c_n|² = Σ_{n=1}^∞ |⟨x, e_n⟩|² ≤ ||x||²    (6.6.5)
The immediate implication that lim_{n→∞} c_n = 0 is sometimes called the Riemann-Lebesgue lemma.
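Bessel's inequality can be observed numerically. The sketch below (illustrative only, not part of the notes) uses the orthonormal set {1/√2, cos nπx, sin nπx} in L²(−1, 1) from Exercise 16 and the function f(t) = t, for which ||f||² = 2/3; the partial sums of Σ|⟨f, e_n⟩|² stay below ||f||²:

```python
# Numerical look (not from the notes) at Bessel's inequality (6.6.5).
import numpy as np

t = np.linspace(-1, 1, 20001)
dt = t[1] - t[0]
f = t                                 # element of L^2(-1,1), ||f||^2 = 2/3

def ip(u, v):                         # trapezoid-rule L^2(-1,1) inner product
    w = u * v
    return dt * (w.sum() - 0.5 * (w[0] + w[-1]))

basis = [np.full_like(t, 1 / np.sqrt(2))]
for n in range(1, 21):
    basis += [np.cos(n * np.pi * t), np.sin(n * np.pi * t)]

print(sum(ip(f, e)**2 for e in basis))   # ~ 0.647, below the bound
print(ip(f, f))                          # ||f||^2 = 2/3 ~ 0.6667
```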
Proposition 6.3. (Riesz-Fischer) Let {e_n}_{n=1}^∞ be an orthonormal set in H, let E be the closed span of {e_n}_{n=1}^∞, x ∈ H and c_n = ⟨x, e_n⟩. Then the infinite series Σ_{n=1}^∞ c_n e_n is convergent in H to P_E x.
Proof: First we note that the series Σ_{n=1}^∞ c_n e_n is Cauchy in H, since if M > N
||Σ_{n=N}^M c_n e_n||² = Σ_{n=N}^M |c_n|²    (6.6.6)
which is less than any prescribed ε > 0 for N, M sufficiently large, since Σ_{n=1}^∞ |c_n|² < ∞. Thus y = Σ_{n=1}^∞ c_n e_n exists in H, and clearly y ∈ E. Since ⟨Σ_{n=1}^N c_n e_n, e_m⟩ = c_m if N > m it follows easily that ⟨y, e_m⟩ = c_m = ⟨x, e_m⟩. Thus y − x ⊥ e_m for any m, which implies y − x ∈ E⊥. From Theorem 6.3 we conclude that y = P_E x. □
6.7 Characterization of a basis of a Hilbert space
Now suppose we have an orthogonal set {x_n}_{n=1}^∞ and we wish to determine whether or not it is a basis of the Hilbert space H. There are a number of interesting ways to answer this question, summarized in Theorem 6.4 below. First we must make some more definitions.
Definition 6.5. A collection of vectors {x_n}_{n=1}^∞ is closed in H if the set of all finite linear combinations of {x_n}_{n=1}^∞ is dense in H.
A collection of vectors {x_n}_{n=1}^∞ is complete in H if there is no nonzero vector orthogonal to all of them, i.e. ⟨x, x_n⟩ = 0 for all n if and only if x = 0.
An orthogonal set {x_n}_{n=1}^∞ in H is a maximal orthogonal set if it is not contained in any larger orthogonal set.
Theorem 6.4. Let {e_n}_{n=1}^∞ be an orthonormal set in a Hilbert space H. Then the following are equivalent.
a) {e_n}_{n=1}^∞ is a basis of H.
b) x = Σ_{n=1}^∞ ⟨x, e_n⟩e_n for every x ∈ H.
c) ⟨x, y⟩ = Σ_{n=1}^∞ ⟨x, e_n⟩⟨e_n, y⟩ for every x, y ∈ H.
d) ||x||² = Σ_{n=1}^∞ |⟨x, e_n⟩|² for every x ∈ H.
e) {e_n}_{n=1}^∞ is a maximal orthonormal set.
f) {e_n}_{n=1}^∞ is closed in H.
g) {e_n}_{n=1}^∞ is complete in H.
Proof: a) implies b): If {e_n}_{n=1}^∞ is a basis of H then for any x ∈ H there exist unique constants d_n such that x = lim_{N→∞} S_N, where S_N = Σ_{n=1}^N d_n e_n. Since ⟨S_N, e_m⟩ = d_m if N > m, it follows that
|d_m − ⟨x, e_m⟩| = |⟨S_N − x, e_m⟩| ≤ ||S_N − x|| ||e_m|| → 0    (6.7.1)
as N → ∞, using the Schwarz inequality. Hence
x = Σ_{n=1}^∞ d_n e_n = Σ_{n=1}^∞ ⟨x, e_n⟩e_n    (6.7.2)
b) implies c): For any x, y ∈ H we have
⟨x, y⟩ = ⟨x, lim_{N→∞} Σ_{n=1}^N ⟨y, e_n⟩e_n⟩ = lim_{N→∞} ⟨x, Σ_{n=1}^N ⟨y, e_n⟩e_n⟩    (6.7.3)
 = lim_{N→∞} Σ_{n=1}^N \overline{⟨y, e_n⟩}⟨x, e_n⟩ = Σ_{n=1}^∞ ⟨x, e_n⟩\overline{⟨y, e_n⟩} = Σ_{n=1}^∞ ⟨x, e_n⟩⟨e_n, y⟩    (6.7.4), (6.7.5)
Here we have used Corollary 6.1 in the second line.
c) implies d): We simply choose x = y in the identity stated in c).
d) implies e): If {e_n}_{n=1}^∞ is not maximal then there exists e ∈ H such that
{e_n}_{n=1}^∞ ∪ {e}    (6.7.6)
is orthonormal. Since ⟨e, e_n⟩ = 0 for every n but ||e|| = 1, this contradicts d).
e) implies f): Let E denote the set of finite linear combinations of the e_n's. If {e_n}_{n=1}^∞ is not closed then Ē ≠ H, so there must exist x ∉ Ē. If we let y = x − P_{Ē} x then y ≠ 0 and y ⊥ Ē. If e = y/||y|| we would then have that {e_n}_{n=1}^∞ ∪ {e} is orthonormal, so that {e_n}_{n=1}^∞ could not be maximal.
f) implies g): Assume that ⟨x, e_n⟩ = 0 for all n. If {e_n}_{n=1}^∞ is closed then for any ε > 0 there exist λ_1, . . . , λ_N such that ||x − Σ_{n=1}^N λ_n e_n||² < ε. But then ||x||² + Σ_{n=1}^N |λ_n|² < ε, and in particular ||x||² < ε. Thus x = 0, so {e_n}_{n=1}^∞ is complete.
g) implies a): Let E be the closed span of {e_n}_{n=1}^∞. If x ∈ H and y = P_E x = Σ_{n=1}^∞ ⟨x, e_n⟩e_n then, as in the proof of Proposition 6.3, ⟨y, e_n⟩ = ⟨x, e_n⟩ for every n. Since {e_n}_{n=1}^∞ is complete it follows that x = y ∈ E, so that E = H. Since an orthonormal set is obviously linearly independent, it follows that {e_n}_{n=1}^∞ is a basis of H.
Because of the equivalence of the stated conditions, the phrases 'complete orthonormal set', 'maximal orthonormal set', and 'closed orthonormal set' are often used interchangeably with 'orthonormal basis' in a Hilbert space setting. The identity in d) is called the Bessel equality (recall the corresponding inequality (6.6.5) is valid whether or not the orthonormal set {e_n}_{n=1}^∞ is a basis), while the identity in c) is the Parseval equality. For reasons which should become more clear in Chapter 8, the infinite series Σ_{n=1}^∞ ⟨x, e_n⟩e_n is often called the generalized Fourier series of x with respect to the orthonormal basis {e_n}_{n=1}^∞, and ⟨x, e_n⟩ is the n'th generalized Fourier coefficient.
Theorem 6.5. Every separable Hilbert space has an orthonormal basis.
Proof: If {x_n}_{n=1}^∞ is a countable dense sequence in H and we carry out the Gram-Schmidt procedure, we obtain an orthonormal sequence {e_n}_{n=1}^∞. This sequence must be complete, since any vector orthogonal to every e_n must also be orthogonal to every x_n, and so must be zero, since {x_n}_{n=1}^∞ is dense. Therefore by Theorem 6.4 {e_n}_{n=1}^∞ (or {e_1, e_2, . . . , e_n} in the finite dimensional case) is an orthonormal basis of H.
The same conclusion is actually correct in a non-separable Hilbert space also, but needs more explanation. See for example Chapter 4 of [30].
6.8 Isomorphisms of a Hilbert space
There are two interesting isomorphisms of every separable Hilbert space: one is to its so-called dual space, and the second is to the sequence space ℓ². In this section we explain both of these facts.
Recall that in Chapter 5 we have already introduced X′ = B(X, C), the space of continuous linear functionals on the normed linear space X. It is itself always a Banach space (see Exercise 3 of Chapter 5), and is also called the dual space of X.
Example 6.10. If H is a Hilbert space and y ∈ H, define φ(x) = ⟨x, y⟩. Then φ : H → C is clearly linear, and |φ(x)| ≤ ||y|| ||x|| by the Schwarz inequality; hence φ ∈ H′, with ||φ|| ≤ ||y||.
The following theorem asserts that every element of the dual space H′ arises in this way.
Theorem 6.6. (Riesz representation theorem) If H is a Hilbert space and φ ∈ H′ then there exists a unique y ∈ H such that φ(x) = ⟨x, y⟩ for all x ∈ H.
Proof: Let M = {x ∈ H : φ(x) = 0}, which is clearly a closed subspace of H. If M = H then φ can only be the zero functional, so y = 0 has the required properties. Otherwise, there must exist e ∈ M⊥ such that ||e|| = 1. For any x ∈ H let z = φ(x)e − φ(e)x and observe that φ(z) = 0, so z ∈ M, and in particular z ⊥ e. It then follows that
0 = ⟨z, e⟩ = φ(x)⟨e, e⟩ − φ(e)⟨x, e⟩    (6.8.1)
Thus φ(x) = ⟨x, y⟩ with y := \overline{φ(e)}e, for every x ∈ H.
The uniqueness property is even easier to show. If φ(x) = ⟨x, y_1⟩ = ⟨x, y_2⟩ for every x ∈ H then necessarily ⟨x, y_1 − y_2⟩ = 0 for all x, and choosing x = y_1 − y_2 we get ||y_1 − y_2||² = 0, that is, y_1 = y_2.
We view the element y ∈ H as 'representing' the linear functional φ ∈ H′, hence the name of the theorem. There are actually several theorems one may encounter, all called the Riesz representation theorem, and what they all have in common is that the dual space of some other space is characterized. The Hilbert space version here is by far the easiest of these theorems.
If we define the mapping R : H → H′ (the Riesz map) by the condition R(y) = φ, with φ, y related as above, then Theorem 6.6 amounts to the statement that R is one-to-one and onto. Since it is easy to check that R is also linear, it follows that R is an isomorphism from H to H′. In fact more is true: R is an isometric isomorphism, which means that ||R(y)|| = ||y|| for every y ∈ H. To see this, recall we have already seen in Example 6.10 that ||φ|| ≤ ||y||, and by choosing x = y we also get φ(y) = ||y||², which implies ||φ|| ≥ ||y||.
Next, suppose that H is an infinite dimensional separable Hilbert space. According to Theorem 6.5 there exists an orthonormal basis of H, which cannot be finite, and so may be written as {e_n}_{n=1}^∞. Associate with any x ∈ H the corresponding sequence of generalized Fourier coefficients {c_n}_{n=1}^∞, where c_n = ⟨x, e_n⟩, and let Λ denote this mapping, i.e. Λ(x) = {c_n}_{n=1}^∞.
We know by Theorem 6.4 that Σ_{n=1}^∞ |c_n|² < ∞, i.e. Λ(x) ∈ ℓ². On the other hand, suppose Σ_{n=1}^∞ |c_n|² < ∞ and let x = Σ_{n=1}^∞ c_n e_n. This series is Cauchy, hence convergent in H, by precisely the same argument as used in the beginning of the proof of Proposition 6.3. Since {e_n}_{n=1}^∞ is a basis, we must have c_n = ⟨x, e_n⟩, thus Λ(x) = {c_n}_{n=1}^∞, and consequently Λ : H → ℓ² is onto. It is also one-to-one, since Λ(x_1) = Λ(x_2) means that ⟨x_1 − x_2, e_n⟩ = 0 for every n, hence x_1 − x_2 = 0 by the completeness property of a basis. Finally it is straightforward to check that Λ is linear, so that Λ is an isomorphism. Like the Riesz map, the isomorphism Λ is also isometric, ||Λ(x)|| = ||x||, on account of the Bessel equality. By the above considerations we have then established the following theorem.
Theorem 6.7. If H is an infinite dimensional separable Hilbert space, then H is isometrically isomorphic to ℓ².
Since all such Hilbert spaces are isometrically isomorphic to ℓ², they are then obviously isometrically isomorphic to each other. If H is a Hilbert space of dimension N, the same arguments show that H is isometrically isomorphic to the Hilbert space R^N or C^N, depending on whether real or complex scalars are allowed. Finally, see Theorem 4.17 of [30] for the nonseparable case.
6.9 Exercises
1. Prove Proposition 6.2.
2. In the Hilbert space L2 (−1, 1) what is M ⊥ if
a) M = {u : u(x) = u(−x) a.e.}
b) M = {u : u(x) = 0 a.e. for − 1 < x < 0}.
Give an explicit formula for the projection onto M in each case.
3. Prove that P_E is a linear operator on H with norm ||P_E|| = 1 except in the trivial case when E = {0}. Suggestion: If x = c_1 x_1 + c_2 x_2 first show that
P_E x − c_1 P_E x_1 − c_2 P_E x_2 = −P_{E⊥} x + c_1 P_{E⊥} x_1 + c_2 P_{E⊥} x_2
4. Show that the parallelogram law fails in L^∞(Ω), so there is no choice of inner product which can give rise to the norm in L^∞(Ω). (The same is true in L^p(Ω) for any p ≠ 2.)
5. If (X, ⟨·, ·⟩) is an inner product space prove the polarization identity
⟨x, y⟩ = (1/4)( ||x + y||² − ||x − y||² + i||x + iy||² − i||x − iy||² )
Thus, in any normed linear space, there can exist at most one inner product giving rise to the norm.
6. Let M be a closed subspace of a Hilbert space H, and P_M be the corresponding projection. Show that
a) P_M² = P_M
b) ⟨P_M x, y⟩ = ⟨P_M x, P_M y⟩ = ⟨x, P_M y⟩ for any x, y ∈ H.
7. Show that ℓ² is a Hilbert space. (Discussion: The only property you need to check is completeness, and you may freely use the fact that R is complete. A Cauchy sequence in this case is a sequence of sequences, so use a notation like
x^{(n)} = {x_1^{(n)}, x_2^{(n)}, . . . }
where x_j^{(n)} denotes the j'th term of the n'th sequence x^{(n)}. Given a Cauchy sequence {x^{(n)}}_{n=1}^∞ in ℓ² you'll first find a sequence x such that lim_{n→∞} x_j^{(n)} = x_j for each fixed j. You then must still show that x ∈ ℓ², and one good way to do this is by first showing that x − x^{(n)} ∈ ℓ² for some n.)
8. Let H be a Hilbert space.
a) If x_n → x in H show that {x_n}_{n=1}^∞ is bounded in H.
b) If x_n → x, y_n → y in H show that ⟨x_n, y_n⟩ → ⟨x, y⟩.
9. Compute orthogonal polynomials of degree 0, 1, 2, 3 on [−1, 1] and on [0, 1] by applying the Gram-Schmidt procedure to 1, x, x², x³ in L²(−1, 1) and L²(0, 1). (In the case of L²(−1, 1), you are finding the so-called Legendre polynomials.)
10. Use the result of Exercise 9 and the projection formula (6.6.1) to compute the best polynomial approximations of degrees 0, 1, 2 and 3 to u(x) = eˣ in L²(−1, 1). Feel free to use any symbolic calculation tool you know to compute the necessary integrals, but give exact coefficients, not calculator approximations. If possible, produce a graph displaying u and the 4 approximations.
11. Let Ω ⊂ R^N, ρ be a measurable function on Ω, and ρ(x) > 0 a.e. on Ω. Let X denote the set of measurable functions u for which ∫_Ω |u(x)|²ρ(x) dx is finite. We can then define the weighted inner product
⟨u, v⟩_ρ = ∫_Ω u(x)\overline{v(x)}ρ(x) dx
and corresponding norm ||u||_ρ = √⟨u, u⟩_ρ on X. The resulting inner product space is complete, often denoted L²_ρ(Ω). (As in the case ρ(x) = 1 we regard any two functions which agree a.e. as being the same element, so L²_ρ(Ω) is again really a set of equivalence classes.)
a) Verify that all of the inner product axioms are satisfied.
b) Suppose that there exist constants C_1, C_2 such that 0 < C_1 ≤ ρ(x) ≤ C_2 a.e. Show that u_n → u in L²_ρ(Ω) if and only if u_n → u in L²(Ω).
12. More classes of orthogonal polynomials may be derived by applying the Gram-Schmidt procedure to {1, x, x², . . . } in L²_ρ(a, b) for various choices of ρ, a, b, two of which occur in Exercise 9. Another class is the Laguerre polynomials, corresponding to a = 0, b = ∞ and ρ(x) = e⁻ˣ. Find the first four Laguerre polynomials.
13. Show that equality holds in the Schwarz inequality (6.2.1) if and only if x, y are
linearly dependent.
14. Show by examples that the best approximation problem (6.4.1) may not have a
solution if E is either not closed or not convex.
15. If Ω is a compact subset of RN , show that C(Ω) is a subspace of L2 (Ω) which isn’t
closed.
16. Show that
{ 1/√2, cos nπx, sin nπx }_{n=1}^∞    (6.9.1)
is an orthonormal set in L²(−1, 1). (Completeness of this set will be shown in Chapter 8.)
17. For nonnegative integers n define
v_n(x) = cos(n cos⁻¹ x)
a) Show that v_{n+1}(x) + v_{n−1}(x) = 2x v_n(x) for n = 1, 2, . . .
b) Show that v_n is a polynomial of degree n (the so-called Chebyshev polynomials).
c) Show that the v_n's are orthogonal in L²_ρ(−1, 1), where the weight function is ρ(x) = 1/√(1 − x²).
18. If H is a Hilbert space we say a sequence {x_n}_{n=1}^∞ converges weakly to x (notation: x_n ⇀ x) if ⟨x_n, y⟩ → ⟨x, y⟩ for every y ∈ H.
a) Show that if x_n → x then x_n ⇀ x.
b) Prove that the converse is false, as long as dim(H) = ∞, by showing that if {e_n}_{n=1}^∞ is any orthonormal sequence in H then e_n ⇀ 0, but lim_{n→∞} e_n doesn't exist.
c) Prove that if x_n ⇀ x then ||x|| ≤ lim inf_{n→∞} ||x_n||.
d) Prove that if x_n ⇀ x and ||x_n|| → ||x|| then x_n → x.
19. Let M1 , M2 be closed subspaces of a Hilbert space H and suppose M1 ⊥ M2 . Show
that
M1 ⊕ M2 = {x ∈ H : x = y + z, y ∈ M1 , z ∈ M2 }
is also a closed subspace of H.
Chapter 7
Distributions
In this chapter we will introduce and study the concept of distribution, also commonly known as generalized function. To motivate this study we first mention two examples.
Example 7.1. The wave equation u_tt − u_xx = 0 has the general solution u(x, t) = F(x + t) + G(x − t), where F, G must be in C²(R) in order that u be a classical solution. However from a physical point of view there is no apparent reason why such smoothness restrictions on F, G should be needed. Indeed the two terms represent waves of fixed shape moving to the left and right respectively with speed one, and it ought to be possible to allow the shape functions F, G to have discontinuities. The calculus of distributions will allow us to regard u as a solution of the wave equation in a well defined sense even for such irregular F, G.
Example 7.2. In physics and engineering one frequently encounters the Dirac delta function δ(x), which has the properties
δ(x) = 0 for x ≠ 0,   ∫_{−∞}^∞ δ(x) dx = 1    (7.0.1)
Unfortunately these properties are inconsistent for ordinary functions – any function which is zero except at a single point must have integral zero. The theory of distributions will allow us to give a precise mathematical meaning to the delta function, and in so doing justify formal calculations with it.
Roughly speaking, a distribution is a mathematical object whose unique identity is specified by how it acts on all test functions. It is in a sense quite analogous to a function in the ordinary sense, whose unique identity is specified by how it acts on (i.e. how it maps) all points in its domain. As we will see, most ordinary functions may be viewed as a special kind of distribution, which explains the 'generalized function' terminology. In addition, there is a well defined calculus of distributions which is basic to the modern theory of partial differential equations. We now start to give precise meaning to these concepts.
7.1 The space of test functions
For any real or complex valued function f defined on some domain in R^N, the support of f, denoted supp f, is the closure of the set {x : f(x) ≠ 0}.
Definition 7.1. If Ω is any open set in R^N the space of test functions on Ω is
C_0^∞(Ω) = {φ ∈ C^∞(Ω) : supp φ is compact in Ω}    (7.1.1)
This function space is also commonly denoted D(Ω), which is the notation we will
use from now on. Clearly D(Ω) is a vector space, but it may not be immediately evident
that it contains any function other than φ ≡ 0.
Example 7.3. Define
φ(x) = e^{1/(x²−1)} for |x| < 1,   φ(x) = 0 for |x| ≥ 1    (7.1.2)
Then φ ∈ D(Ω) with Ω = R. To see this one only needs to check that lim_{x→1−} φ^{(k)}(x) = 0 for k = 0, 1, . . . , and similarly at x = −1. Once we have one such function, many others can be derived from it by dilation (φ(x) → φ(αx)), translation (φ(x) → φ(x − α)), scaling (φ(x) → αφ(x)), differentiation (φ(x) → φ^{(k)}(x)) or any linear combination of such terms. See also Exercise 1.
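A quick numerical check (illustrative only, not part of the text; the grid is an arbitrary choice) that the function of (7.1.2) behaves as claimed: it is bounded by e⁻¹ and its successive difference quotients vanish outside [−1, 1]:

```python
# Check (not from the notes) of the bump function (7.1.2).
import numpy as np

def phi(x):
    out = np.zeros_like(x, dtype=float)
    m = np.abs(x) < 1
    out[m] = np.exp(1.0 / (x[m]**2 - 1.0))
    return out

x = np.linspace(-1.5, 1.5, 3001)
y = phi(x)
print(y.max())                     # e^{-1} ~ 0.3679, attained at x = 0
for k in range(1, 4):              # numerical derivatives of order 1..3
    y = np.gradient(y, x)
    print(k, np.abs(y[np.abs(x) >= 1.0]).max())   # ~ 0 outside [-1, 1]
```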
Next, we define convergence in the test function space.
Definition 7.2. If φ_n ∈ D(Ω) then we say φ_n → 0 in D(Ω) if
(i) There exists a compact set K ⊂ Ω such that supp φ_n ⊂ K for every n
(ii) lim_{n→∞} max_{x∈Ω} |D^α φ_n(x)| = 0 for every multi-index α
We also say that φ_n → φ in D(Ω) provided φ_n − φ → 0 in D(Ω). By specifying what convergence of a sequence in D(Ω) means, we are partly, but not completely, specifying a topology on D(Ω). We will have no need of further details about this topology; see Chapter 6 of [31] for more on this point.
7.2 The space of distributions
We come now to the basic definition – a distribution is a continuous linear functional on D(Ω). More precisely:
Definition 7.3. A linear mapping T : D(Ω) → C is a distribution on Ω if T(φ_n) → T(φ) whenever φ_n → φ in D(Ω). The set of all distributions on Ω is denoted D′(Ω).
The distribution space D′(Ω) is another example of a dual space X′, the set of all continuous linear functionals on X, which can be defined whenever X is a vector space in which convergence of sequences is defined. The dual space is always itself a vector space. We'll discuss many more examples of dual spaces later on. We emphasize that the distribution T is defined solely in terms of the values it assigns to test functions φ; in particular two distributions T_1, T_2 are equal if and only if T_1(φ) = T_2(φ) for every φ ∈ D(Ω).
To clarify the concept, let us discuss a number of examples.
Example: If f ∈ L¹(Ω) define
T(φ) = ∫_Ω f(x)φ(x) dx    (7.2.1)
Obviously |T(φ)| ≤ ||f||_{L¹(Ω)} ||φ||_{L^∞(Ω)}, so that T : D(Ω) → C, and T is also clearly linear. If φ_n → φ in D(Ω) then by the same token
|T(φ_n) − T(φ)| ≤ ||f||_{L¹(Ω)} ||φ_n − φ||_{L^∞(Ω)} → 0    (7.2.2)
so that T is continuous. Thus T ∈ D′(Ω).
Because of the fact that φ must have compact support in Ω one does not really need f to be in L¹(Ω) but only in L¹(K) for any compact subset K of Ω. For any 1 ≤ p ≤ ∞ let us define
L^p_loc(Ω) = {f : f ∈ L^p(K) for any compact set K ⊂ Ω}    (7.2.3)
Thus a function in L^p_loc(Ω) can become infinite arbitrarily rapidly at the boundary of Ω. We say that f_n → f in L^p_loc(Ω) if f_n → f in L^p(K) for every compact subset K ⊂ Ω. Functions in L¹_loc(Ω) are said to be locally integrable on Ω.
Now if we let f ∈ L¹_loc(Ω) the definition (7.2.1) still produces a finite value, since
|T(φ)| = |∫_Ω f(x)φ(x) dx| = |∫_K f(x)φ(x) dx| ≤ ||f||_{L¹(K)} ||φ||_{L^∞(K)} < ∞    (7.2.4)
if K = supp φ. Similarly if φ_n → φ in D(Ω) we can choose a fixed compact set K ⊂ Ω containing supp φ and supp φ_n for every n, hence again
|T(φ_n) − T(φ)| ≤ ||f||_{L¹(K)} ||φ_n − φ||_{L^∞(K)} → 0    (7.2.5)
so that T ∈ D′(Ω).
When convenient, we will denote the distribution in (7.2.1) by T_f. The correspondence f → T_f allows us to think of L¹_loc(Ω) as a special subspace of D′(Ω), i.e. locally integrable functions are always distributions. The point is that a function f can be thought of as a mapping
φ → ∫_Ω fφ dx    (7.2.6)
instead of the more conventional
x → f(x)    (7.2.7)
In fact for L¹_loc functions the former is in some sense more natural, since it doesn't require us to make special arrangements for sets of measure zero. A distribution of the form T = T_f for some f ∈ L¹_loc(Ω) is sometimes referred to as a regular distribution, while any distribution not of this type is a singular distribution.
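The 'function as mapping' viewpoint (7.2.6) is easy to realize numerically. The sketch below (not from the notes; the choices of f and interval are arbitrary) pairs f(x) = sign(x) with the test function of Example 7.3 by quadrature; the result is 0 since f is odd and φ is even:

```python
# Sketch (not from the notes) of the pairing T_f(phi) of (7.2.6) by quadrature.
import numpy as np

def bump(x):                       # the test function from Example 7.3
    out = np.zeros_like(x, dtype=float)
    m = np.abs(x) < 1
    out[m] = np.exp(1.0 / (x[m]**2 - 1.0))
    return out

def pair(f, phi, a=-2.0, b=2.0, m=200001):
    """Approximate T_f(phi) = integral of f*phi (supp phi assumed in [a,b])."""
    x = np.linspace(a, b, m)
    w = f(x) * phi(x)
    return np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(x))

print(pair(np.sign, bump))         # ~ 0: odd f against even phi
print(pair(np.abs, bump))          # a genuine nonzero pairing
```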
The correspondence f → T_f is also one-to-one. This is a slightly technical result in measure theory which we leave for the exercises, for those with the necessary background. See also Theorem 2, Chapter II of [32]:
Theorem 7.1. Two distributions T_{f_1}, T_{f_2} on Ω are equal if and only if f_1 = f_2 almost everywhere on Ω.
Example 7.4. Fix a point x_0 ∈ Ω and define
T(φ) = φ(x_0)    (7.2.8)
Clearly T is defined and linear on D(Ω), and if φ_n → φ in D(Ω) then
|T(φ_n) − T(φ)| = |φ_n(x_0) − φ(x_0)| → 0    (7.2.9)
since φ_n → φ uniformly on Ω. We claim that T is not of the form T_f for any f ∈ L¹_loc(Ω) (i.e. T is not a regular distribution). To see this, suppose some such f existed. We would then have
∫_Ω f(x)φ(x) dx = 0    (7.2.10)
for any test function φ with φ(x_0) = 0. In particular if Ω_0 = Ω\{x_0} and φ ∈ D(Ω_0) then, defining φ(x_0) = 0, we clearly have φ ∈ D(Ω) and T(φ) = 0, hence f = 0 a.e. on Ω_0 and so on Ω, by Theorem 7.1. On the other hand we must also have, for any φ ∈ D(Ω), that
φ(x_0) = T(φ) = ∫_Ω f(x)φ(x) dx    (7.2.11a)
 = ∫_Ω f(x)(φ(x) − φ(x_0)) dx + φ(x_0) ∫_Ω f(x) dx = φ(x_0) ∫_Ω f(x) dx    (7.2.11b)
since f = 0 a.e. on Ω, and therefore ∫_Ω f(x) dx = 1, a contradiction.
Note that f(x) = 0 for a.e. x ∈ Ω and ∫_Ω f(x) dx = 1 are precisely the formal properties of the delta function mentioned in Example 7.2. We define T to be the Dirac delta distribution with singularity at x_0, usually denoted δ_{x_0}, or simply δ in the case x_0 = 0. By an acceptable abuse of notation, pretending that δ is an actual function, we may write a formula like
∫_Ω δ(x)φ(x) dx = φ(0)    (7.2.12)
but we emphasize that this is simply a formal expression of (7.2.8), and any rigorous arguments must make use of (7.2.8) directly. In the same formal sense δ_{x_0}(x) = δ(x − x_0), so that
∫_Ω δ(x − x_0)φ(x) dx = φ(x_0)    (7.2.13)
Example 7.5. Fix a point x_0 ∈ Ω, a multi-index α, and define
T(φ) = (D^α φ)(x_0)    (7.2.14)
One may show, as in the previous example, that T ∈ D′(Ω).
Example 7.6. Let Σ be a sufficiently smooth hypersurface in Ω of dimension m ≤ N − 1 and define
T(φ) = ∫_Σ φ(x) ds(x)    (7.2.15)
where ds is the surface area element on Σ. Then T is a distribution on Ω, sometimes referred to as the delta distribution concentrated on Σ, and sometimes written as δ_Σ.
Example 7.7. Let Ω = R and define
T(φ) = lim_{ε→0+} ∫_{|x|>ε} φ(x)/x dx    (7.2.16)
As we'll show below, the indicated limit always exists and is finite for φ ∈ D(Ω) (even for φ ∈ C_0^1(Ω)). In general, a limit of the form
lim_{ε→0+} ∫_{Ω∩{|x−a|>ε}} f(x) dx    (7.2.17)
when it exists, is called the Cauchy principal value of ∫_Ω f(x) dx, which may be finite even when ∫_Ω f(x) dx is divergent in the ordinary sense. For example ∫_{−1}^1 dx/x is divergent, regarded as either a Lebesgue integral or an improper Riemann integral, but
lim_{ε→0+} ∫_{1>|x|>ε} dx/x = 0    (7.2.18)
To distinguish the principal value meaning of the integral, the notation
pv ∫_Ω f(x) dx    (7.2.19)
may be used instead of (7.2.17), where the point a in question must be clear from context.
Let us now check that (7.2.16) defines a distribution. If supp φ ⊂ [−M, M] then since
∫_{|x|>ε} φ(x)/x dx = ∫_{M>|x|>ε} φ(x)/x dx = ∫_{M>|x|>ε} (φ(x) − φ(0))/x dx + φ(0) ∫_{M>|x|>ε} dx/x    (7.2.20)
and the last term on the right is zero, we have
T(φ) = lim_{ε→0+} ∫_{M>|x|>ε} ψ(x) dx    (7.2.21)
where ψ(x) = (φ(x) − φ(0))/x. It now follows from the mean value theorem that
|T(φ)| ≤ ∫_{|x|<M} |ψ(x)| dx ≤ 2M ||φ′||_{L^∞}    (7.2.22)
so T(φ) is defined and finite for all test functions. Linearity of T is clear, and if φ_n → φ in D(Ω) then
|T(φ_n) − T(φ)| ≤ 2M ||φ_n′ − φ′||_{L^∞} → 0    (7.2.23)
where M is chosen so that supp φ_n, supp φ ⊂ [−M, M], and it follows that T is continuous.
The distribution T is often denoted pv(1/x), so for example pv(1/x)(φ) means the same thing as the right hand side of (7.2.16). For reasons which will become more clear later, it may also be referred to as pf(1/x), pf standing for pseudofunction (also finite part).
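The principal value limit in (7.2.16) can be observed numerically: symmetric truncations stabilize as ε → 0 even though φ(x)/x is not absolutely integrable near 0. The sketch below (illustrative only, not part of the notes; the shifted bump is an arbitrary test function with φ(0) ≠ 0) uses a logarithmically refined grid:

```python
# Numerical sketch (not from the notes) of pv(1/x) applied to a test function.
import numpy as np

def bump(x):
    out = np.zeros_like(x, dtype=float)
    m = np.abs(x) < 1
    out[m] = np.exp(1.0 / (x[m]**2 - 1.0))
    return out

phi = lambda x: bump(x - 0.3)      # supported in [-0.7, 1.3], phi(0) != 0

def trap(y, x):
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

for eps in (1e-2, 1e-4, 1e-6):
    x = np.geomspace(eps, 2.0, 400001)   # grid refined near the singularity
    val = trap(phi(x) / x, x) + trap(phi(-x) / (-x), x)
    print(eps, val)                      # approaches a finite limit
```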
7.3 Algebra and Calculus with Distributions
7.3.1 Multiplication of distributions
As noted above, D′(Ω) is a vector space, hence distributions can be added and multiplied by scalars. In general it is not possible to multiply together arbitrary distributions – for example δ² = δ · δ cannot be defined in any consistent way. It is always possible, however, to multiply a distribution by a C^∞ function. More precisely, if a ∈ C^∞(Ω) and T ∈ D′(Ω) then we may define the product aT as a distribution via
Definition 7.4. (aT)(φ) = T(aφ) for φ ∈ D(Ω)
Clearly aφ ∈ D(Ω) so that the right hand side is well defined, and it is straightforward to check that aT satisfies the necessary linearity and continuity conditions. One should also note that if T = T_f then this definition is consistent with ordinary pointwise multiplication of the functions f and a.
7.3.2 Convergence of distributions
An appropriate definition of convergence of a sequence of distributions is as follows.
Definition 7.5. If T, T_n ∈ D′(Ω) for n = 1, 2, . . . then we say T_n → T in D′(Ω) (or in the sense of distributions) if T_n(φ) → T(φ) for every φ ∈ D(Ω).
It is an interesting fact, which we shall not prove here, that it is not necessary to assume that the limit T belongs to D′(Ω); that is to say, if T(φ) := lim_{n→∞} T_n(φ) exists for every φ ∈ D(Ω) then necessarily T ∈ D′(Ω) (see Theorem 6.17 of [31]).
Example 7.8. If f_n ∈ L¹_loc(Ω) and f_n → f in L¹_loc(Ω) then the corresponding distributions T_{f_n} → T_f in the sense of distributions, since
|T_{f_n}(φ) − T_f(φ)| ≤ ∫_K |f_n − f||φ| dx ≤ ||f_n − f||_{L¹(K)} ||φ||_{L^∞(Ω)}    (7.3.1)
where K is the support of φ. Because of the one-to-one correspondence f ↔ T_f, we will usually write instead that f_n → f in the sense of distributions.
Example 7.9. Define
f_n(x) = n for 0 < x < 1/n,   f_n(x) = 0 otherwise    (7.3.2)
We claim that f_n → δ in the sense of distributions. We see this by first observing that
|T_{f_n}(φ) − δ(φ)| = | n ∫_0^{1/n} φ(x) dx − φ(0) | = n | ∫_0^{1/n} (φ(x) − φ(0)) dx |    (7.3.3)
By the continuity of φ, if ε > 0 there exists δ > 0 such that |φ(x) − φ(0)| ≤ ε whenever |x| ≤ δ. Thus if we choose n > 1/δ there follows
n ∫_0^{1/n} |φ(x) − φ(0)| dx ≤ nε ∫_0^{1/n} dx = ε    (7.3.4)
from which the conclusion follows.
Note that the formal properties of the δ function, δ(x) = 0 for x ≠ 0, δ(0) = +∞, ∫δ(x) dx = 1, are clearly reflected in the pointwise limit of the sequence f_n, but it is only the distributional definition that is mathematically satisfactory.
Sequences converging to δ play a very large role in methods of applied mathematics, especially in the theory of differential and integral equations. The following theorem includes many cases of interest.
Theorem 7.2. Suppose f_n ∈ L¹(R^N) for n = 1, 2, . . . and assume
a) ∫_{R^N} f_n(x) dx = 1 for all n.
b) There exists a constant C such that ||f_n||_{L¹(R^N)} ≤ C for all n.
c) lim_{n→∞} ∫_{|x|>δ} |f_n(x)| dx = 0 for all δ > 0.
If φ is bounded on R^N and continuous at x = 0 then
lim_{n→∞} ∫_{R^N} f_n(x)φ(x) dx = φ(0)    (7.3.5)
and in particular f_n → δ in D′(R^N).
Proof: For any such φ we have
∫_{R^N} f_n(x)φ(x) dx − φ(0) = ∫_{R^N} f_n(x)(φ(x) − φ(0)) dx    (7.3.6)
and so we will be done if we show that the integral on the right tends to zero as n → ∞. Fix ε > 0 and choose δ > 0 such that |φ(x) − φ(0)| ≤ ε whenever |x| < δ. Write the integral on the right in (7.3.6) as the sum A_{n,δ} + B_{n,δ} where
A_{n,δ} = ∫_{|x|≤δ} f_n(x)(φ(x) − φ(0)) dx,   B_{n,δ} = ∫_{|x|>δ} f_n(x)(φ(x) − φ(0)) dx    (7.3.7)
We then have, by obvious estimations, that
|A_{n,δ}| ≤ ε ∫_{R^N} |f_n(x)| dx ≤ Cε    (7.3.8)
while
lim sup_{n→∞} |B_{n,δ}| ≤ lim sup_{n→∞} 2||φ||_{L^∞} ∫_{|x|>δ} |f_n(x)| dx = 0    (7.3.9)
Thus
lim sup_{n→∞} | ∫_{R^N} f_n(x)φ(x) dx − φ(0) | ≤ Cε    (7.3.10)
and the conclusion follows since ε > 0 is arbitrary. □
It is often the case that f_n ≥ 0 for all n, in which case assumption b) follows automatically from a) with C = 1. We will refer to any sequence satisfying the assumptions of Theorem 7.2 as a delta sequence. A common way to construct such a sequence is to pick any f ∈ L¹(R^N) with ∫_{R^N} f(x) dx = 1 and set
f_n(x) = n^N f(nx)    (7.3.11)
The verification of this is left to the exercises. If, for example, we choose f(x) = χ_{[0,1]}(x), then the resulting sequence f_n(x) is the same as defined in (7.3.2). Since we can also choose such an f in D(R^N), we also have
Proposition 7.1. There exists a sequence {f_n}_{n=1}^∞ such that f_n ∈ D(R^N) and f_n → δ in D′(R^N).
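The convergence asserted in Theorem 7.2 is easy to see numerically for the sequence (7.3.2). In the sketch below (illustrative only, not part of the notes; φ = cos is an arbitrary bounded function continuous at 0), the pairings ∫f_n φ dx approach φ(0) = 1:

```python
# Illustration (not from the notes) of Theorem 7.2 for f_n = n * chi_(0,1/n).
import numpy as np

phi = np.cos                       # bounded, continuous at 0; phi(0) = 1

for n in (10, 100, 1000, 10000):
    x = np.linspace(0.0, 1.0 / n, 10001)
    fn_phi = n * phi(x)            # f_n * phi on the support of f_n
    integral = np.sum(0.5 * (fn_phi[1:] + fn_phi[:-1]) * np.diff(x))
    print(n, integral)             # -> phi(0) = 1
```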
7.3.3 Derivative of a distribution
Next we explain how it is possible to define the derivative of an arbitrary distribution. For the moment, suppose (a, b) ⊂ R, f ∈ C¹(a, b) and T = T_f is the corresponding distribution. We clearly then have from integration by parts that
T_{f′}(φ) = ∫_a^b f′(x)φ(x) dx = − ∫_a^b f(x)φ′(x) dx = −T_f(φ′)    (7.3.12)
This suggests defining
T′(φ) = −T(φ′),   φ ∈ C_0^∞(a, b)    (7.3.13)
whenever T ∈ D′(a, b). The previous equation shows that this definition is consistent with the ordinary concept of differentiability for C¹ functions. Clearly, T′(φ) is always defined, since φ′ is a test function whenever φ is; linearity of T′ is obvious; and if φ_n → φ in C_0^∞(a, b) then φ_n′ → φ′ also in C_0^∞(a, b), so that
T′(φ_n) = −T(φ_n′) → −T(φ′) = T′(φ)    (7.3.14)
Thus, T′ ∈ D′(a, b).
Example: Consider the case of the Heaviside (unit step) function
H(x) = 0 for x < 0,   H(x) = 1 for x > 0    (7.3.15)
If we seek the derivative of H (i.e. of T_H) according to the above distributional definition, then we compute
H′(φ) = −H(φ′) = − ∫_{−∞}^∞ H(x)φ′(x) dx = − ∫_0^∞ φ′(x) dx = φ(0)    (7.3.16)
(where we use the natural notation H′ in place of (T_H)′). This means that H′(φ) = δ(φ) for any test function φ, and so H′ = δ in the sense of distributions. This relationship clearly captures the fact that H′ = 0 at all points where the derivative exists in the classical sense, since we think of the delta function as being zero on any interval not containing the origin. Since H is not differentiable at the origin, the distributional derivative is itself a distribution which is not a function.
Since δ is again a distribution, it will itself have a derivative, namely
δ′(φ) = −δ(φ′) = −φ′(0)    (7.3.17)
a distribution of the type discussed in Example 7.5, often referred to as the dipole distribution, which of course we may regard as the second derivative of H.
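For readers who use symbolic software: sympy implements this distributional calculus with Heaviside and DiracDelta objects. The check below (a sketch, not part of the notes; the Gaussian plays the role of a smooth rapidly decaying test function) reproduces H′ = δ and H″ = δ′, and evaluates the pairings:

```python
# Symbolic check (not from the notes) that H' = delta and H'' = delta'.
import sympy as sp

x = sp.symbols('x', real=True)
H = sp.Heaviside(x)

print(sp.diff(H, x))          # DiracDelta(x)
print(sp.diff(H, x, 2))       # DiracDelta(x, 1), the dipole delta'

phi = sp.exp(-x**2)           # smooth, rapidly decaying "test" function
# delta(phi) = phi(0) = 1; delta'(phi) = -phi'(0) = 0 for this even phi
print(sp.integrate(sp.diff(H, x) * phi, (x, -sp.oo, sp.oo)))       # 1
print(sp.integrate(sp.diff(H, x, 2) * phi, (x, -sp.oo, sp.oo)))    # 0
```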
For an arbitrary domain Ω ⊂ R^N and sufficiently smooth function f we have the similar integration by parts formula (see (18.2.3))
∫_Ω (∂f/∂x_i) φ dx = − ∫_Ω f (∂φ/∂x_i) dx    (7.3.18)
leading to the definition
Definition 7.6.
(∂T/∂x_i)(φ) = −T(∂φ/∂x_i),   φ ∈ D(Ω)    (7.3.19)
As in the one dimensional case we easily check that ∂T/∂x_i belongs to D′(Ω) whenever T does. This has the far reaching consequence that every distribution is infinitely differentiable in the sense of distributions. Furthermore we have the general formula, obtained by repeated application of the basic definition, that
(D^α T)(φ) = (−1)^{|α|} T(D^α φ)    (7.3.20)
for any multi-index α.
A simple and useful property is
Proposition 7.2. If T_n → T in D′(Ω) then D^α T_n → D^α T in D′(Ω) for any multi-index α.
Proof: D^α T_n(φ) = (−1)^{|α|} T_n(D^α φ) → (−1)^{|α|} T(D^α φ) = D^α T(φ) for any test function φ. □
Next we consider a more generic one dimensional situation. Let x_0 ∈ R and consider a function f which is C^∞ on (−∞, x_0) and on (x_0, ∞), and for which f^{(k)} has finite left and right hand limits at x = x_0, for any k. Thus, at the point x = x_0, f or any of its derivatives may have a jump discontinuity, and we denote
Δ_k f = lim_{x→x_0+} f^{(k)}(x) − lim_{x→x_0−} f^{(k)}(x)    (7.3.21)
(and by convention Δf = Δ_0 f). Define also
[f^{(k)}](x) = f^{(k)}(x) for x ≠ x_0, undefined for x = x_0    (7.3.22)
which we'll refer to as the pointwise k'th derivative. The notation f^{(k)} will always be understood to mean the distributional derivative unless otherwise stated. The distinction between f^{(k)} and [f^{(k)}] is crucial: for example, if f(x) = H(x), the Heaviside function, then H′ = δ but [H′] = 0 for x ≠ 0, and is undefined for x = 0.
For f as described above, we now proceed to calculate the distributional derivative. If φ ∈ C_0^∞(R) we have
∫_{−∞}^∞ f(x)φ′(x) dx = ∫_{−∞}^{x_0} f(x)φ′(x) dx + ∫_{x_0}^∞ f(x)φ′(x) dx    (7.3.23a)
 = f(x)φ(x)|_{−∞}^{x_0} − ∫_{−∞}^{x_0} f′(x)φ(x) dx + f(x)φ(x)|_{x_0}^∞ − ∫_{x_0}^∞ f′(x)φ(x) dx    (7.3.23b)
 = − ∫_{−∞}^∞ [f′(x)]φ(x) dx + (f(x_0−) − f(x_0+))φ(x_0)    (7.3.23c)
It follows that
f′(φ) = ∫_{−∞}^∞ [f′(x)]φ(x) dx + (Δf)φ(x_0)    (7.3.24)
or
f′ = [f′] + (Δf)δ(x − x_0)    (7.3.25)
Note in particular that f′ = [f′] if and only if f is continuous at x_0.
The function [f′] satisfies all of the same assumptions as f itself, with Δ[f′] = Δ_1 f; thus we can differentiate again in the distributional sense to obtain
f″ = [f′]′ + (Δf)δ′(x − x_0) = [f″] + (Δ_1 f)δ(x − x_0) + (Δf)δ′(x − x_0)    (7.3.26)
Here we use the evident fact that the distributional derivative of δ(x − x_0) is δ′(x − x_0).
A similar calculation can be carried out for higher derivatives of f, leading to the general formula
f^{(k)} = [f^{(k)}] + Σ_{j=0}^{k−1} (Δ_j f) δ^{(k−1−j)}(x − x_0)    (7.3.27)
One can also obtain a similar formula if f is allowed to have any finite number of such singular points.
Example 7.10. Let
f(x) = x for x < 0,   f(x) = cos x for x > 0    (7.3.28)
Clearly f satisfies all of the assumptions mentioned above with x_0 = 0, and
[f′](x) = 1 for x < 0,   [f′](x) = −sin x for x > 0    (7.3.29)
[f″](x) = 0 for x < 0,   [f″](x) = −cos x for x > 0    (7.3.30)
so that Δf = 1, Δ_1 f = −1. Thus
f′ = [f′] + δ,   f″ = [f″] − δ + δ′    (7.3.31)
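The jumps in Example 7.10 can be checked symbolically. The sketch below (not part of the notes) computes Δf and Δ_1 f as one-sided limits with sympy, and also shows the delta term of (7.3.25) emerging when f is written via the Heaviside function:

```python
# Symbolic check (not from the notes) of Example 7.10 with sympy.
import sympy as sp

x = sp.symbols('x', real=True)

# One-sided limits give the jumps directly:
df  = sp.limit(sp.cos(x), x, 0, '+') - sp.limit(x, x, 0, '-')               # 1
d1f = sp.limit(-sp.sin(x), x, 0, '+') - sp.limit(sp.Integer(1), x, 0, '-')  # -1
print(df, d1f)

# Writing f = x + H(x)(cos x - x) and differentiating symbolically:
H = sp.Heaviside(x)
f1 = sp.diff(x + H * (sp.cos(x) - x), x)
print(f1)   # contains (cos(x) - x)*DiracDelta(x); since g*delta = g(0)*delta,
            # this is (Delta f)*delta = delta, matching f' = [f'] + delta
```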
Here is one more instructive example in the one dimensional case.
Example 7.11. Let
f(x) = log x for x > 0,   f(x) = 0 for x ≤ 0    (7.3.32)
Since f ∈ L¹_loc(R) we may regard it as a distribution on R, but its pointwise derivative H(x)/x is not locally integrable, so does not have an obvious distributional meaning. Nevertheless f′ must exist in the sense of D′(R). To find it we use the definition above:
f′(φ) = −f(φ′) = − ∫_0^∞ φ′(x) log x dx = − lim_{ε→0+} ∫_ε^∞ φ′(x) log x dx    (7.3.33)
 = lim_{ε→0+} ( φ(ε) log ε + ∫_ε^∞ φ(x)/x dx )    (7.3.34)
 = lim_{ε→0+} ( φ(0) log ε + ∫_ε^∞ φ(x)/x dx )    (7.3.35)
7327
where the final equality is valid because the difference between it and the previous line
is lim→0 (φ() − φ(0))
log = 0. The functional defined by the final expression above will
be denoted1 as pf H(x)
, i.e.
x
pf
H(x)
x
∞
Z
(φ) = lim φ(0) log +
→0+
φ(x)
dx
x
(7.3.36)
Since we have already established
that the derivative of a distribution is also a distriH(x)
bution, it follows that pf x
∈ D0 (R) and in particular the limit here always exists
for φ ∈ D(R). It should be emphasized that if φ(0) 6= 0 then neither of the two terms
on the right hand side in (7.3.36) will have a finite limit separately, but the sum always
will. For a test function
φ with
support disjoint from the singularity at x = 0, the action
H(x)
of the distribution pf x
coincides with that of the ordinary function H(x)/x, as we
might expect.
Next we turn to examples involving partial derivatives.
Example 7.12. Let F ∈ L¹_loc(R) and set u(x, t) = F(x + t). We claim that u_tt − u_xx = 0 in D′(R²). Recall that this is the point that was raised in the first example at the beginning of this chapter. A similar argument works for F(x − t). To verify this claim, first observe that for any φ ∈ D(R²)
(u_tt − u_xx)(φ) = u(φ_tt − φ_xx) = ∬_{R²} F(x + t)(φ_tt(x, t) − φ_xx(x, t)) dx dt    (7.3.37)
Make the change of coordinates
ξ = x − t,   η = x + t    (7.3.38)
to obtain
(u_tt − u_xx)(φ) = −2 ∫_{−∞}^∞ F(η) ( ∫_{−∞}^∞ φ_{ξη}(ξ, η) dξ ) dη = −2 ∫_{−∞}^∞ F(η) [φ_η(ξ, η)]_{ξ=−∞}^{ξ=∞} dη = 0    (7.3.39)
since φ has compact support.
Example 7.13. Let N ≥ 3 and define
u(x) = 1/|x|^{N−2}    (7.3.40)
We claim that
Δu = C_N δ  in D′(R^N)    (7.3.41)
where C_N = (2 − N)Ω_{N−1} and Ω_{N−1} is the surface area of the unit sphere in R^N (the subscript N − 1 is customary because the sphere is a surface of dimension N − 1). First note that for any R we have
∫_{|x|<R} |u(x)| dx = Ω_{N−1} ∫_0^R (1/r^{N−2}) r^{N−1} dr < ∞    (7.3.42)
(using, for example, (18.3.1)), so u ∈ L¹_loc(R^N) and in particular u ∈ D′(R^N).
It is natural here to use spherical coordinates in R^N; see Section 18.3 for a review. In particular the expression for the Laplacian in spherical coordinates may be derived from the chain rule, as was done in (2.3.67) for the two dimensional case. When applied to a function depending only on r = |x|, such as u, the result is
Δu = u_rr + ((N − 1)/r) u_r    (7.3.43)
(see Exercise 17 of Chapter 2), and it follows that Δu(x) = 0 for x ≠ 0.
We may use Green's identity (18.2.6) to obtain, for any φ ∈ D(R^N),
Δu(φ) = u(Δφ) = ∫_{R^N} u(x)Δφ(x) dx = lim_{ε→0+} ∫_{|x|>ε} u(x)Δφ(x) dx    (7.3.44)
 = lim_{ε→0+} ( ∫_{|x|>ε} Δu(x)φ(x) dx + ∫_{|x|=ε} ( u(x) ∂φ/∂n(x) − φ(x) ∂u/∂n(x) ) dS(x) )    (7.3.45)
Since Δu = 0 for x ≠ 0 and ∂/∂n = −∂/∂r on {x : |x| = ε}, this simplifies to
Δu(φ) = lim_{ε→0+} ∫_{|x|=ε} ( ((2 − N)/ε^{N−1}) φ(x) − (1/ε^{N−2}) ∂φ/∂r(x) ) dS(x)    (7.3.46)
We next observe that
lim_{ε→0+} ∫_{|x|=ε} ((2 − N)/ε^{N−1}) φ(x) dS(x) = (2 − N)Ω_{N−1} φ(0)    (7.3.47)
since the average of φ over the sphere of radius ε converges to φ(0) as ε → 0. Finally, the second integral tends to zero, since
| ∫_{|x|=ε} (1/ε^{N−2}) ∂φ/∂r(x) dS(x) | ≤ (Ω_{N−1} ε^{N−1}/ε^{N−2}) ||∇φ||_{L^∞} → 0    (7.3.48)
Thus (7.3.41) holds. When N = 2 an analogous calculation shows that if u(x) = log |x| then Δu = 2πδ in D′(R²).
7.4 Convolution and distributions
If f, g are locally integrable functions on R^N the classical convolution of f and g is defined to be
(f ∗ g)(x) = ∫_{R^N} f(x − y)g(y) dy    (7.4.1)
whenever the integral is defined. By an obvious change of variable we see that convolution is commutative, f ∗ g = g ∗ f.
Proposition 7.3. If f ∈ L^p(R^N) and g ∈ L^q(R^N) then f ∗ g ∈ L^r(R^N) if 1 + 1/r = 1/p + 1/q, and so in particular f ∗ g is defined almost everywhere. Furthermore
||f ∗ g||_{L^r(R^N)} ≤ ||f||_{L^p(R^N)} ||g||_{L^q(R^N)}    (7.4.2)
The inequality (7.4.2) is Young's convolution inequality, and we refer to [38] (Theorem 9.2) for a proof. In the case r = ∞ it can actually be shown that f ∗ g ∈ C(R^N).
Our goal here is to generalize the definition of convolution in such a way that at least one of the two factors can be a distribution. Let us introduce the notations for translation and inversion of a function f:
(τ_h f)(x) = f(x − h)    (7.4.3)
f̌(x) = f(−x)    (7.4.4)
so that f(x − y) = (τ_x f̌)(y). If f ∈ D(R^N) then so is τ_x f̌, so that (f ∗ g)(x) may be regarded as T_g(τ_x f̌), i.e. the value obtained when the distribution corresponding to the locally integrable function g acts on the test function τ_x f̌. This motivates the following definition.
Definition 7.7. If T ∈ D′(R^N) and φ ∈ D(R^N) then (T ∗ φ)(x) = T(τ_x φ̌).
By this definition (T ∗ φ)(x) exists and is finite for every x ∈ R^N, but other smoothness or decay properties of T ∗ φ may not be apparent.
Example 7.14. If T = δ then
(T ∗ φ)(x) = δ(τ_x φ̌) = (τ_x φ̌)(y)|_{y=0} = φ(x − y)|_{y=0} = φ(x)    (7.4.5)
Thus, δ is the 'convolution identity': δ ∗ φ = φ, at least for φ ∈ D(R^N). Formally this corresponds to the widely used formula
∫_{R^N} δ(x − y)φ(y) dy = φ(x)    (7.4.6)
If T_n → δ in D′(R^N) then likewise
(T_n ∗ φ)(x) = T_n(τ_x φ̌) → δ(τ_x φ̌) = φ(x)    (7.4.7)
for any fixed x ∈ R^N.
A key property of convolution is that in computing a derivative D^α(T ∗ φ), the derivative may be applied to either factor in the convolution. More precisely we have the following theorem.
Theorem 7.3. If T ∈ D′(R^N) and φ ∈ D(R^N) then T ∗ φ ∈ C^∞(R^N) and for any multi-index α
D^α(T ∗ φ) = (D^α T) ∗ φ = T ∗ (D^α φ)    (7.4.8)
Proof: First observe that
(−1)^{|α|} D^α(τ_x φ̌) = τ_x((D^α φ)ˇ)    (7.4.9)
and applying T to these identical test functions we get the right hand equality in (7.4.8). We refer to Theorem 6.30 of [31] for the proof of the left hand equality.
When f, g are continuous functions of compact support it is elementary to see that
supp (f ∗ g) ⊂ supp f + supp g. The same property holds for T ∗ φ if T ∈ D0 (RN ) and
φ ∈ D(RN ), once a proper definition of the support of a distribution is given.
If ω ⊂ Ω is an open set we say that T = 0 in ω if T (φ) = 0 whenever φ ∈ D(Ω) and
supp (φ) ⊂ ω. If W denotes the largest open subset of Ω on which T = 0 (equivalently the
109
ci2
union of all open subsets of Ω on which T = 0) then the support of T is the complement
of W in Ω. In other words, x 6∈ supp T if there exists > 0 such that T (φ) = 0 whenever
φ is a test function with support in B(x, ). One can easily verify that the support
of a distribution is closed, and agrees with the usual notion of support of a function,
up to sets of measure zero. The set of distributions of compact support in Ω forms a
vector subspace of D0 (Ω) denoted E 0 (Ω). This notation is appropriate because E 0 (Ω)
turns out to be precisely the dual space of C ∞ (RN ) =: E(RN ) when a suitable definition
of convergence is given, see for example Chapter II, section 5 of [32].
If now T ∈ E 0 (RN ) and φ ∈ D(RN ), we observe that
supp (τx φ̌) = x − supp φ
(7.4.10)
(T ∗ φ)(x) = T (τx φ̌) = 0
(7.4.11)
Thus
unless there is a nonempty intersection of supp T and x − supp φ, in other words, x ∈
supp T + supp φ. Thus from these remarks and Theorem 7.3 we have
convth2
Proposition 7.4. If T ∈ E 0 (RN ) and φ ∈ D(RN ) then
supp (T ∗ φ) ⊂ supp T + supp φ
(7.4.12)
and in particular T ∗ φ ∈ D(RN ).
Convolution provides an extremely useful and convenient way to approximate functions and distributions by very smooth functions, the exact sense in which the approximation takes place being dependent on the object being approximated. We will discuss
several results of this type.
thuapprox
Theorem
7.4. Let f ∈ C(RN ) with supp f compact in RN . Pick φ ∈ D(RN ), with
R
φ(x) dx = 1, set φn (x) = nN φ(nx) and fn = f ∗ φn . Then fn ∈ D(RN ) and fn → f
RN
uniformly on RN .
Proof: The fact that fn ∈ D(RN ) is immediate from Proposition 7.4. Fix > 0. By the
assumption that f is continuous and of compact support it must be uniformly continuous
on RN so there exists δ > 0 such that |f (x) − f (z)| < if |x − z| < δ. Now choose n0
such that supp φn ⊂ B(0, δ) for n > n0 . We then have, for n > n0 that
Z
|fn (x) − f (x)| = (fn (x − y) − fn (x))φn (y) dy ≤
(7.4.13)
RN
Z
|fn (x − y) − f (x)||φn (y)| dy ≤ ||φ||L1 (RN )
(7.4.14)
|y|<δ
110
and the conclusion follows. 2
thLpApprox
If f is not assumed continuous then of course it is not possible for there to exist
fn ∈ D(RN ) converging uniformly to f . However the following can be shown.
R
Theorem 7.5. Let f ∈ Lp (RN ), 1 ≤ p < ∞. Pick φ ∈ D(RN ), with RN φ(x) dx = 1,
set φn (x) = nN φ(nx) and fn = f ∗ φn . Then fn ∈ C ∞ (RN ) ∩ Lp (RN ) and fn → f in
Lp (RN ).
Proof: If > 0 we can find g ∈ C(RN ) of compact support such that ||f − g||Lp (RN ) < .
If gn = g ∗ φn then
||f − fn ||Lp (RN ) ≤ ||f − g||Lp (RN ) + ||g − gn ||Lp (RN ) + ||fn − gn ||Lp (RN ) (7.4.15)
≤ C||f − g||Lp (RN ) + ||g − gn ||Lp (RN )
(7.4.16)
where we have used Young’s convolution inequality (7.4.2) to obtain
||fn − gn ||Lp (RN ) ≤ ||φn ||L1 (RN ) ||f − g||Lp (RN ) = ||φ||L1 (RN ) ||f − g||Lp (RN )
(7.4.17)
Since gn → g uniformly by Theorem 7.4 and g − gn has support in a fixed compact set
independent of n, it follows that ||g −gn ||Lp (RN ) → 0, and so lim supn→∞ ||f −fn ||Lp (RN ) ≤
C.
Further refinements and variants of these results can be proved, see for example
Section C.4 of [10].
Next consider the even more general case that T ∈ D0 (RN ). As in Proposition 7.1
we can choose ψn ∈ D(RN ) such that ψn → δ in D0 (RN ). Set Tn = T ∗ ψn , so that
Tn ∈ C ∞ (RN ). If φ ∈ D(RN ) we than have
Tn (φ) = (Tn ∗ φ̌)(0) = ((Tn ∗ ψn ) ∗ φ̌)(0) = ((T ∗ ψn ) ∗ φ̌)(0)
= (T ∗ (ψn ∗ φ̌))(0) = T ((ψn ∗ φ̌)ˇ)
(7.4.18)
(7.4.19)
It may be checked that ψn ∗ φ̌ → φ̌ in D(RN ), thus Tn (φ) → T (φ) for all φ ∈ D(RN ),
that is, Tn → T in D0 (RN ).
In the above derivation we used associativity of convolution. This property is not
completely obvious, and in fact is false in a more general setting in which convolution
of two distributions is defined. For example, if we were to assume that convolution of
distributions was always defined and that Theorem 7.3 holds, we would have 1∗(δ 0 ∗H) =
111
1 ∗ H 0 = 1 ∗ δ = 1, but (1 ∗ δ 0 ) ∗ H = 0 ∗ H = 0. Nevertheless, associativity is correct in
the case we have just used it, and we refer to [31] Theorem 6.30(c), for the proof.
The pattern of the results just stated is that T ∗ ψn converges to T in the topology
appropriate to the space that T itself belongs to, but this cannot be true in all situations
which may be encountered. For example it cannot be true that if f ∈ L∞ then f ∗ ψn
converges to f in L∞ since this would amount to uniform convergence of a sequence of
continuous functions, which is impossible if f itself is not continuous.
7.5
ex7-1
Exercises
1. Construct a test function φ ∈ C0∞ (R) with the following properties: 0 ≤ φ(x) ≤ 1
for all x ∈ R, φ(x) ≡ 1 for |x| < 1 and φ(x) ≡ 0 for |x| > 2. (Suggestion: think
about what φ0 would have to look like.)
2. Show that
T (φ) =
∞
X
φ(n) (n)
n=1
0
defines a distribution T ∈ D (R).
3. If φ ∈ D(R) show that ψ(x) = (φ(x) − φ(0))/x (this function
R 1 appeared in Example
7.7) belongs to C ∞ (R). (Suggestion: first prove ψ(x) = 0 φ0 (xt) dt.)
4. Find the distributional derivative of f (x) = [x], the greatest integer function.
5. Find the distributional derivatives up through order four of f (x) = |x| sin x.
6. (For readers familiar with the concept of absolute continuity.) If f is absolutely
continuous on (a, b) and f 0 = g a.e., show that f 0 = g in the sense of distributions
on (a, b).
7-3
7. Let λn > 0, λn → +∞ and set
fn (x) = sin λn x
gn (x) =
sin λn x
πx
a) Show that fn → 0 in D0 (R) as n → ∞.
b) Show that gn → δ in D0 (R) as n → ∞.
R∞
(You may use without proof the fact that the value of the improper integral −∞
is π.)
112
sin x
x
dx
8. Let φ ∈ C0∞ (R) and f ∈ L1 (R).
a) If ψn (x) = n(φ(x + n1 ) − φ(x)), show that ψn → φ0 in C0∞ (R). (Suggestion: use
the mean value theorem over and over again.)
b) If gn (x) = n(f (x + n1 ) − f (x)), show that gn → f 0 in D0 (R).
1
9. Let T = pv . Find a formula analogous to (7.3.35) for the distributional derivative
x
of T .
10. Find limn→∞ sin2 nx in D0 (R), or show that it doesn’t exist.
ex7-11
11. Define the distribution
Z
∞
T (φ) =
φ(x, x) dx
−∞
for φ ∈ C0∞ (R2 ). Show that T satisfies the wave equation uxx − uyy = 0 in the sense
of distributions on R2 . Discuss why it makes sense to regard T as being δ(x − y).
12. Let Ω ⊂ RN be a bounded open set and K ⊂⊂ Ω. Show that there exists φ ∈
C0∞ (Ω) such that 0 ≤ φ(x) ≤ 1 and φ(x) ≡ 1 for x ∈ K. (Hint: approximate the
characteristic function of Σ by convolution, where Σ satisfies K ⊂⊂ Σ ⊂⊂ Ω. Use
Proposition 7.4 for the needed support property.)
13. If a ∈ C ∞ (Ω) and T ∈ D0 (Ω) prove the product rule
∂
∂T
∂a
(aT ) = a
+
T
∂xj
∂xj ∂xj
14. Let T ∈ D0 (RN ). We may then regard φ 7−→ Aφ = T ∗ φ as a linear mapping
from C0∞ (Rn ) into C ∞ (Rn ). Show that A commutes with translations, that is,
τh Aφ = Aτh φ for any φ ∈ C0∞ (RN ). (The following interesting converse statement
can also be proved: If A : C0∞ (RN ) 7−→ C(RN ) is continuous and commutes with
translations then there exists a unique T ∈ D0 (RN ) such that Aφ = T ∗ φ. An
operator commuting with translations is also said to be translation invariant.)
R∞
15. If f ∈ L1 (RN ), −∞ f (x) dx = 1, and fn (x) = nN f (nx), use Theorem 7.2 to show
that fn → δ in D0 (RN ).
16. Prove Theorem 7.1.
113
17. If T ∈ D0 (Ω) prove the equality of mixed partial derivatives
∂ 2T
∂ 2T
=
∂xi ∂xj
∂xj ∂xi
(7.5.1)
in the sense of distributions, and discuss why there is no contradiction with known
examples from calculus showing that the mixed partial derivatives need not be
equal.
18. Show that the expression
Z
1
T (φ) =
−1
φ(x) − φ(0)
dx +
|x|
Z
|x|>1
φ(x)
dx
|x|
defines a distribution on R. Show also that xT = sgn x.
19. If f is a function defined on RN and λ > 0, let fλ (x) = f (λx). We say that f is
homogeneous of degree α if fλ = λα f for any λ > 0. If T is a distribution on RN
we say that T is homogeneous of degree α if
T (φλ ) = λ−α−N T (φλ−1 )
a) Show that these two definitions are consistent, i.e., if T = Tf for some f ∈
L1loc (RN ) then T is homogeneous of degree α if and only if f is homogeneous of
degree α.
b) Show that the delta function is homogeneous of degree −N .
ex7-17
20. Show that u(x) =
1
2π
log |x| satisfies ∆u = δ in D0 (R2 ).
21. Without appealing to Theorem 7.3, give a direct proof of the fact that T ∗ φ is a
continuous function of x, for T ∈ D0 (RN ) and φ ∈ D(RN ).
22. Let
(
log2 x x > 0
f (x) =
0
x<0
Show that f ∈ D0 (R) and find the distributional derivative f 0 . Is f a tempered
distribution?
23. If a ∈ C ∞ (R), show that
aδ 0 = a(0)δ 0 − a0 (0)δ
24. If T ∈ D0 (RN ) has compact support, show that T (φ) is defined in an unambiguous
way for any φ ∈ C ∞ (RN ) =: E(RN ). (Suggestion: write φ = ψφ + (1 − ψ)φ where
ψ ∈ D(RN ) satisfies ψ ≡ 1 on the support of T .)
114
Chapter 8
Fourier analysis and distributions
chfourier
In this chapter We present some of the elements of Fourier analysis, with special attention
to those aspects arising in the theory of distributions. Fourier analysis is often viewed
as made up of two parts, one being a collection of topics relating to Fourier series, and
the second being those connected to the Fourier transform. The essential distinction is
that the former focuses on periodic functions while the latter is concerned with functions
defined on all of RN . In either case the central question is that of how we may represent
fairly arbitrary functions, or even distributions, as combinations of particularly simple
periodic functions.
We will begin with Fourier series, and restrict attention to the one dimensional case.
See for example [26] for treatment of multidimensional Fourier series.
8.1
Fourier series in one space dimension
The fundamental point is that if un (x) = einx then the functions {un }∞
n=−∞ make up
2
an orthogonal basis of L (−π, π). It will then follow from the general considerations of
Chapter 6 that any f ∈ L2 (−π, π) may expressed as a linear combination
f (x) =
∞
X
n=−∞
115
cn einx
(8.1.1)
81
where
hf, un i
1
cn =
=
hun , un i
2π
Z
π
f (y)e−iny dy
(8.1.2)
82
−π
The right hand side of (8.1.1) is a Fourier series for f , and (8.1.2) is a formula for the n’th
Fourier coefficient of f . It must be understood that the equality in (8.1.1) is meant only
in the sense of L2 convergence of the partial sums, and need not be true at any particular
point. From the theory of Lebesgue integration it follows that there is a subsequence
of the partial sums which will converge almost everywhere on (−π, π), but more than
P
inx
that we cannot say, without further assumptions on f . Any finite sum N
is
n=−N γn e
called a trigonometric polynomial, so in particular we will be showing that trigonometric
polynomials are dense in L2 (−π, π).
Let us set
1
en (x) = √ einx
2π
n = 0, ±1, ±2, . . .
n
1 X ikx
Dn (x) =
e
2π k=−n
(8.1.3)
(8.1.4)
82a
N
1 X
KN (x) =
Dn (x)
N + 1 n=0
(8.1.5)
It is immediate from checking the necessary integrals that {en }∞
n=−∞ is an orthonormal
set in H = L2 (−π, π). The main goal for the rest of this section is to prove that {en }∞
n=−∞
is actually an orthonormal basis of H.
For the rest of this section, the inner product symbol hf, gi and norm || · || refer to the
inner product and norm in H unless otherwise stated. In the context of Fourier analysis,
Dn , KN are known as the Dirichlet kernel and Féjer kernel respectively. Note that
Z π
Z π
Dn (x) dx =
KN (x) dx = 1
(8.1.6)
−π
−π
for any n, N .
If f ∈ H, let
sn (x) =
n
X
k=−n
116
ck eikx
(8.1.7)
83
where ck is given by (8.1.2) and
N
1 X
sn (x)
σN (x) =
N + 1 n=0
Since
sn (x) =
n
X
hf, ek iek (x)
(8.1.8)
(8.1.9)
k=−n
it follows that the partial sum sn is also the projection of f onto the span of {ek }nk=−n
and so in particular the Bessel inequality
v
uX
u n
||sn || = t
|ck |2 ≤ ||f ||
(8.1.10)
k=−n
holds for all n. In particular, limn→∞ hf, en i = 0, which is the Riemann Lebesgue lemma
for the Fourier coefficients of f ∈ H.
Next observe that by substitution of (8.1.2) into (8.1.7) we obtain
Z π
f (y)Dn (x − y) dy
sn (x) =
(8.1.11)
84
−π
We can therefore regard sn as being given by the convolution Dn ∗ f if we let f (x) = 0
outside of the interval (−π, π). We can also express Dn in an alternative and useful way:
2n
1 −inx X ikx
1 −inx 1 − e(2n+1)ix
e
e =
e
Dn (x) =
2π
2π
1 − eix
k=0
(8.1.12)
for x 6= 0. Multiplying top and bottom of the fraction by e−ix/2 then yields
Dn (x) =
1 sin (n + 12 )x
2π
sin x2
x 6= 0
(8.1.13)
and obviously Dn (0) = (2n + 1)/2π.
An alternative viewpoint of the convolutional relation (8.1.11), which is in some sense
more natural, starts by defining the unit circle as T = R mod 2πZ, i.e. we identify any
117
84a
two points of R differing by an integer multiple of 2π. Any 2π periodic function, such
as en , Dn , sn etc may be regarded as a function on T, and if f is originally given as a
function on (−π, π) then it may extended in a 2π periodic manner to all of R and so also
viewed as a function on the circle T. With f , Dn both 2π periodic, the integral (8.1.11)
could be written as
Z
sn (x) = f (y)Dn (x − y) dy
(8.1.14)
85
T
since (8.1.11) simply amounts to using one natural parametrization of the independent
variable. By the same token
Z a+2π
sn (x) =
f (y)Dn (x − y) dy
(8.1.15)
a
for any convenient choice of a. A 2π periodic function is continuous on T if it is continuous
on [−π, π] and f (π) = f (−π), and the space C(T) may simply be regarded as
C(T) = {f ∈ C([−π, π]) : f (π) = f (−π)}
(8.1.16)
a closed subspace of C([−π, π]), so is itself a Banach space with maximum norm. Likewise
we can define
C m (T) = {f ∈ C m ([−π, π]) : f (j) (π) = f (j) (−π), j = 0, 1, . . . m}
(8.1.17)
a Banach space with the analogous norm.
Next let us make some corresponding observations about KN .
Proposition 8.1. There holds
Z
KN (x − y)f (y) dy
σN (x) =
(8.1.18)
86
(8.1.19)
86b
T
and
N X
1−
KN (x) =
k=−N
|k|
N +1
eikx
1
=
2π(N + 1)
sin ( (N +1)x
)
2
x
sin ( 2 )
!2
x 6= 0
Proof: The identity (8.1.18) is immediate from (8.1.14) and the definition of KN , and
118
rconvergence
the first identity in (8.1.19) is left as an exercise. To complete the proof we observe that
2π
N
X
PN
Dn (x) =
n=0
=
=
sin (n + 12 )x
sin x2
xP
inx
Im ei 2 N
e
n=0
n=0
(8.1.21)
sin x
x 2 i(N +1)x Im ei 2 1−e1−eix
(8.1.22)
sin x2
Im
=
(8.1.20)
1−cos (N +1)x−i sin (N +1)x
−2i sin x2
sin x2
cos (N + 1)x − 1
2 sin2 x2
!2
sin (N +1)x
2
=
x
sin ( 2 )
=
(8.1.23)
(8.1.24)
(8.1.25)
and the conclusion follows upon dividing by 2π(N + 1). 2
Theorem 8.1. Suppose that f ∈ C(T). Then σN → f in C(T).
R
Proof: Since KN ≥ 0 and T KN (x − y) dy = 1 for any x, we have
Z
Z x+π
|σN (x) − f (x)| = Kn (x − y)(f (y) − f (x)) dy ≤
Kn (x − y)|f (y) − f (x)| dy
T
x−π
(8.1.26)
If > 0 is given, then since f must be uniformly continuous on T, there exists δ > 0 such
that |f (x) − f (y)| < if |x − y| < δ. Thus
|σN (x) − f (x)|
≤
R
|x−y|<δ
KN (x − y) dy + 2||f ||∞
≤+
R
(8.1.27)
KN (x − y) dy(8.1.28)
δ<|x−y|<π
||f ||∞
π(N +1) sin2 ( 2δ )
(8.1.29)
Thus there exists N0 such that for N ≥ N0 , |σN (x) − f (x)| < 2 for all x, that is, σN → f
uniformly.
119
corr81
2
Corollary 8.1. The functions {en (x)}∞
n=−∞ form an orthonormal basis of H = L (−π, π).
Proof: We have already observed that these functions form an orthonormal set, so it
remains only to verify one of the equivalent conditions stated in Theorem 6.4. We will
show the closedness property, i.e. that set of finite linear combinations of {en (x)}∞
n=−∞
is dense in H. Given g ∈ H and > 0 we may find f ∈ C(T) such that ||f − g|| < ,
f ∈ D(−π, π)√for example. Then choose N such that ||σN − f ||C(T) < , which implies
||σN − f || < 2π. Thus σN is a finite linear combination of the en ’s and
√
(8.1.30)
||g − σN || < (1 + 2π)
Since is arbitrary, the conclusion follows.
corr82
Corollary 8.2. For any f ∈ H = L2 (−π, π), if
n
X
sn (x) =
ck eikx
(8.1.31)
f (x)e−ikx dx
(8.1.32)
k=−n
where
1
ck =
2π
Z
π
−π
then sn → f in H.
For f ∈ H, we will often write
f (x) =
∞
X
cn einx
(8.1.33)
n=−∞
but we emphasize that without further assumptions this only means that the partial
sums converge in L2 (−π, π).
At this point we have looked at the convergence properties of two different sequences
of trigonometric polynomials, sn and σN , associated with f . While sn is simply the n’th
partial sum of the Fourier series of f , the σN ’s are the so-called Féjer means of f . While
each Féjer mean is a trigonometric polynomial, the sequence σN does not amount to the
partial sums of some other Fourier series, since the n’th coefficient would also have to
depend on N . For f ∈ H, we have that sN → f in H, and so the same is obviously true
under the stronger assumption that f ∈ C(T). On the other hand for f ∈ C(T) we have
120
shown that σN → f uniformly, but it need not be true that sN → f uniformly, or even
pointwise (example of P. du Bois-Reymond, see Section 1.6.1 of [26]). For f ∈ H it can
be shown that σN → f in H, but on the other hand the best L2 approximation property
of sN implies that
||sN − f || ≤ ||σN − f ||
(8.1.34)
since both sN and σN are in the span of {ek }N
k=−N . That is to say, the rate of convergence
of sN to f is faster, in the L2 sense at least, than that of σN . In summary, both sN and
σN provide a trigonometric polynomial approximating f , but each has some advantage
over the other, depending on what is to be assumed about f .
8.2
Alternative forms of Fourier series
From the basic Fourier series (8.1.1) a number of other closely related and useful expressions can be immediately derived. First suppose that f ∈ L2 (−L, L) for some L > 0. If
we let f˜(x) = f (Lx/π) then f˜ ∈ L2 (−π, π), so
f˜(x) =
∞
X
inx
cn e
n=−∞
π
1
cn =
2π
Z
1
cn =
2L
Z
f˜(y)e−iny dy
(8.2.1)
f (y)e−iπny/L dy
(8.2.2)
−π
or equivalently
f (x) =
∞
X
iπnx/L
cn e
n=−∞
L
−L
Likewise (8.2.2) holds if we just regard f as being 2L periodic and in L2 , and in the
formula √
for cn we could replace (−L, L) by any other interval of length 2L. The functions
iπnx/L
e
/ 2L make up an orthonormal basis of L2 (a, b) if b − a = 2L.
Next observe that we can write
∞
X
f (x) =
n=−∞
cn
∞
X
nπx
nπx nπx
nπx
cos
+ i sin
= c0 +
(cn + c−n ) cos
+ i(cn − c−n ) sin
L
L
L
L
n=1
(8.2.3)
If we let
an = cn + c−n
bn = i(cn − c−n ) n = 0, 1, 2, . . .
121
(8.2.4)
87
then we obtain the equivalent formulas
∞
a0 X
nπx
nπx
f (x) =
+
an cos
+ bn sin
2
L
L
n=1
(8.2.5)
88
where
1
an =
L
Z
L
nπy
f (y) cos
dy
L
−L
1
bn =
L
n = 0, 1, . . .
Z
L
f (y) sin
−L
nπy
dy
L
n = 1, 2, . . .
(8.2.6)
We refer to (8.2.5),(8.2.6) as the ’real form’ of the Fourier series, which is natural to
use, for example, if f is real valued, since then no complex quantities appear. Again the
precise meaning of (8.2.5) is that sn → f in H = L2 (−L, L) or other interval of length
2L, where now
n
a0 X
kπx
kπx
sn (x) =
+
+ bk sin
(8.2.7)
ak cos
2
L
L
k=1
with results analogous to those mentioned above for the Féjer means also being valid. It
may be easily checked that the set of functions
∞
sin nπx
1 cos nπx
L
L
√ , √ , √
(8.2.8)
2L
L
L
n=1
make up an orthonormal basis of L2 (−L, L).
Another important variant is obtained as follows. If f ∈ L2 (0, L) then we may define
the associated even and odd extensions of f in L2 (−L, L), namely
(
(
f (x) 0 < x < L
f (x) 0 < x < L
fe (x) =
fo (x) =
(8.2.9)
f (−x) − L < x < 0
−f (−x) − L < x < 0
If we replace f by fe in (8.2.5),(8.2.6), then we obtain immediately that bn = 0 and a
resulting cosine series representation for f ,
Z
∞
nπx
2 L
nπy
a0 X
+
an cos
an =
f (y) cos
dy n = 0, 1, . . .
(8.2.10)
f (x) =
2
L
L 0
L
n=1
Likewise replacing f by fo gives us a corresponding sine series,
Z
∞
X
nπy
nπx
2 L
f (x) =
bn sin
bn =
f (y) sin
dy n = 1, 2, . . .
L
L
L
0
n=1
122
(8.2.11)
89
ourPointwise
Note that if the 2L periodic extension of f is continuous, then the same is true of the
2L periodic extension of fe , but this need not be true in the case of fo . Thus we might
expect that the cosine series of f has typically better convergence properties than that
of the sine series.
8.3
More about convergence of Fourier series
If f ∈ L2 (−π, π) it was already observed that since the the partial sums sn converge to
f in L2 (−π, π), some subsequence of the partial sums converges pointwise a.e. In fact it
is a famous theorem of Carleson ([6]) that sn → f (i.e. the entire sequence, not just a
subsequence) pointwise a.e. This is a complicated proof and even now is not to be found
even in advanced textbooks. No better result could be expected since f itself is only
defined up to sets of measure zero.
If we were to assume the stronger condition that f ∈ C(T) then it mighty be natural
to conjecture that sn → f for every x (recall we know σN → f uniformly in this case), but
that turns out to be false, as mentioned above: in fact there exist continuous functions
for which sn (x) is divergent at infinitely many x ∈ T, see Section 5.11 of [30].
A sufficient condition implying that sn (x) → f (x) for every x ∈ T is that f be
piecewise continuously differentiable on T. In fact the following more precise theorem
can be proved.
Theorem 8.2. Assume that there exist points −π ≤ x0 < x1 < . . . xM = π such that
f ∈ C 1 ([xj , xj+1 ]) for j = 0, 1, . . . M − 1. Let
(
1
(limy→x+ f (y) + limy→x− f (y)) − π < x < π
f˜(x) = 21
(8.3.1)
(limy→−π+ f (y) + limy→π− f (y)) x = ±π
2
Then limn→∞ sn (x) = f˜(x) for −π ≤ x ≤ π.
Under the stated assumptions on f , the theorem states in particular that sn converges
to f at every point of continuity of f , (with appropriate modification at the endpoints)
and otherwise converges to the average of the left and right hand limits. The proof is
somewhat similar to that of Theorem 8.1 – steps in the proof are outlined in the exercises.
So far we have discussed the convergence properties of the Fourier series based on
assumptions about f , but another point of view we could take is to focus on how con123
vergence properties are influenced by the behavior of the Fourier coefficients cn . A first
simple result of this type is:
prop82
Proposition 8.2. If f ∈ H = L2 (−π, π) and its Fourier coefficients satisfy
∞
X
|cn | < ∞
(8.3.2)
acfs
n=−∞
then f ∈ C(T) and sn → f uniformly on T
P
inx
Proof: By the Weierstrass M-test, the series ∞
is uniformly convergent on
n=−∞ cn e
R to some limit g, and since each partial sum is continuous, the same must be true of g.
Since uniform convergence implies L2 convergence on any finite interval, we have sn → g
in H, but also sn → f in H by Corollary 8.2. By uniqueness of the limit f = g and the
conclusion follows.
We say that f has an absolutely convergent Fourier series when (8.3.2) holds. We
emphasize here that the conclusion f = g is meant in the sense of L2 , i.e. f (x) = g(x)
a.e., so by saying that f is continuous, we are really saying that the equivalence class of
f contains a continuous function, namely g.
It is not the case that every continuous function has an absolutely convergent Fourier
series, according to remarks made earlier in this section. It would therefore be of interest
to find other conditions on f which guarantee that (8.3.2) holds. One such condition
follows from the following, which is also of independent interest.
prop83
Proposition 8.3. If f ∈ C m (T), then limn→±∞ nm cn = 0.
Proof: We integrate by parts in (8.1.2) to get, for n 6= 0,
Z
Z π
1 f (y)e−iny π
1 π 0
1
−iny
cn =
+
f (y)e
dy =
f 0 (y)e−iny dy
−π
2π
−in
in −π
2πin −π
(8.3.3)
if f ∈ C 1 (T). Since f 0 ∈ L2 (T), the Riemann-Lebesgue lemma implies that ncn → 0 as
n → ±∞. If f ∈ C 2 (T) we could integrate by parts again to get n2 cn → 0 etc.
It is immediate from this result that if f ∈ C 2 (T) then it has an absolutely convergent
Fourier series, but in fact even f ∈ C 1 (T) is more than enough, see Exercise 6.
One way to regard Proposition 8.3 is that it says that the smoother f is, the more
rapidly its Fourier coefficients must decay. The next result is a sort of converse statement.
124
810
prop84
Proposition 8.4. If f ∈ H = L2 (−π, π) and its Fourier coefficients satisfy
|nm+α cn | ≤ C
(8.3.4)
811
for some C and α > 1, then f ∈ C m (T).
Proof: When m = 0 this is just a special case of Proposition 8.2. When m = 1
we see that it is permissible to differentiate the series (8.1.1) term by term, since the
differentiated series
∞
X
incn einx
(8.3.5)
n=−∞
is uniformly convergent, by the assumption (8.3.4). Thus f, f 0 are both a.e. equal to an
absolutely convergent Fourier series, so f ∈ C 1 (T), by Proposition 8.2. The proof for
m = 2, 3, . . . is similar.
Note that Proposition 8.3 states a necessary condition on the Fourier coefficients for
f to be in C m and Proposition 8.4 states a sufficient condition. The two conditions are
not identical, but both point to the general tendency that increased smoothness of f is
associated with more rapid decay of the corresponding Fourier coefficients.
8.4
The Fourier Transform on RN
If f is a given function on RN the Fourier transform of f is defined as
Z
1
ˆ
f (y) =
f (x)e−ix·y dx
y ∈ RN
N
(2π) 2 RN
(8.4.1)
provided that the integral is defined in some sense. This will always be the case, for
example, if f ∈ L1 (RN ) and any y ∈ RN since then
Z
1
ˆ
|f (y)| ≤
|f (x)| dx < ∞
(8.4.2)
N
(2π) 2 RN
thus in fact fˆ ∈ L∞ (RN ) in this case.
There are a number of other commonly used definitions of the Fourier transform,
obtained by changing the numerical constant in front of the integral, and/or replacing
125
812
813
−ix · y by ix · y and/or including a factor of 2π in the exponent in the integrand. Each
convention has some convenient properties in certain situations, but none of them is
always the best, hence the lack of a universally agreed upon definition. The differences
are non-essential, all having to do with the way certain numerical constants turn up, so
the only requirement is that we adopt one specific definition, such as (8.4.1), and stick
with it.
The Fourier transform is a particular integral operator, and an alternative operator
type notation for it,
Fφ = φ̂
(8.4.3)
is often convenient to use, especially when discussing its mapping properties.
Example 8.1. If N = 1 and f (x) = χ[a,b] (x), the indicator function of the interval [a, b],
then the Fourier transform of f is
Z b
1
e−iay − e−iby
ˆ
√
f (y) = √
e−ixy dy =
(8.4.4)
2π a
2πiy
2
Example 8.2. If N = 1, α > 0 and f (x) = e−αx (a Gaussian function) then
y2 Z
∞
− 4α
iy 2
e
e
e−α(x+ 2 ) dx
e−ixy dx = √
2π −∞
−∞
y2 Z
y2 r
y2
e− 4α ∞ −αx2
e− 4α π
1
= √
e
dx = √
= √ e− 4α
2π −∞
2π α
2α
1
fˆ(y) = √
2π
Z
∞
−αx2
(8.4.5)
(8.4.6)
In the above derivation, the key step is the third equality which is justified by contour
2
integration techniques in complex function theory – the integral of e−αz along the real
axis is the same as the integral along the parallel line Imz = y2 for any y.
Thus the Fourier transform of a Gaussian is another Gaussian, and in particular fˆ = f
if α = 21 .
It is clear from the Fourier transform definition that if f has the special product form
f (x) = f1 (x1 )f2 (x2 ) . . . fN (xN ) then fˆ(y) = fˆ1 (y1 )fˆ2 (y2 ) . . . fˆN (yN ). The Gaussian in
2
RN , namely f (x) = e−α|x| , is of this type, so using (8.4.6) we immediately obtain
|y|2
fˆ(y) =
e− 4α
N
(2α) 2
126
(8.4.7)
NdGaussian
To state our first theorem about the Fourier transform, let us denote
C0 (RN ) = {f ∈ C(RN ) : lim |f (x)| = 0}
|x|→∞
(8.4.8)
the space of continuous functions vanishing at ∞. It is a closed subspace of L∞ (RN ),
hence a Banach space with the L∞ norm. We emphasize that despite the notation,
functions in this space need not be of compact support.
Theorem 8.3. If f ∈ L1 (RN ) then fˆ ∈ C0 (RN ).
Proof: If yn ∈ RN and yn → y then clearly f (x)e−ix·yn → f (x)e−ix·y for a.e. x ∈ RN .
Also, |f (x)e−ix·yn | ≤ |f (x)|, and since we assume f ∈ L1 (RN ) we can immediately apply
the dominated convergence theorem to obtain
Z
Z
−ix·yn
lim
f (x)e
dx =
f (x)e−ix·y dx
(8.4.9)
n→∞
RN
RN
that is, fˆ(yn ) → fˆ(y). Hence fˆ ∈ C(RN ).
Next, suppose temporarily that g ∈ C 1 (RN ) and has compact support. An integration
by parts gives us, for j = 1, 2, . . . N that
Z
1
∂g −ix·y
1
e
dx
(8.4.10)
ĝ(y) = −
N
(2π) 2 iyj RN ∂yj
Thus there exists some C, depending on g, such that
|ĝ(y)|2 ≤
C
yj2
j = 1, 2, . . . N
(8.4.11)
from which it follows that
2
|ĝ(y)| ≤ min
j
C
yj2
≤
CN
|y|2
(8.4.12)
Thus ĝ(y) → 0 as |y| → ∞ in this case.
Finally, such g’s are dense in L1 (RN ), so given f ∈ L1 (RN ) and > 0, choose g as
above such that ||f − g||L1 (RN ) < . We then have, taking into account (8.4.2)
|fˆ(y)| ≤ |fˆ(y) − ĝ(y)| + |ĝ(y)| ≤
1
N
(2π) 2
127
||f − g||L1 (RN ) + |ĝ(y)|
(8.4.13)
and so
lim sup |fˆ(y)| <
|y|→∞
N
(2π) 2
(8.4.14)
Since > 0 is arbitrary, the conclusion fˆ ∈ C0 (RN ) follows.
The fact that fˆ(y) → 0 as |y| → ∞ is analogous to the property that the Fourier
coefficients cn → 0 as n → ±∞ in the case of Fourier series, and in fact is also called the
Riemann-Lebesgue Lemma.
One of the fundamental properties of the Fourier transform is that it is ’almost’ its
own inverse. A first precise version of this is given by the following Fourier Inversion
Theorem.
finvthm
Theorem 8.4. If f, fˆ ∈ L1 (RN ) then
Z
1
f (x) =
fˆ(y)eix·y dy
N
(2π) 2 RN
a.e. x ∈ RN
(8.4.15)
The right hand side of (8.4.15) is not precisely the Fourier transform of fˆ because
the exponent contains ix · y rather than −ix · y, but it does mean that we can think of
ˆ
it as saying that f (x) = fˆ(−x), or
ˆ
fˆ = fˇ,
(8.4.16)
where
f and
fˇ(x) = f (−x), is the reflection of f .1 The requirement in the theorem that both
fˆ be in L1 will be weakened later on.
Proof: Since fˆ ∈ L1 (RN ) the right hand side of (8.4.15) is well defined, and we denote
it temporarily by g(x). Define also the family of Gaussians,
|x|2
Gα (x) =
1
e− 4α
N
(4πα) 2
Warning: some authors use the symbol fˇ to mean the inverse Fourier transform of f .
128
(8.4.17)
fourinv
819
We then have
g(x) =
=
=
=
=
Z
1
2
fˆ(y)eix·y e−α|y| dy
α→0+ (2π)
RN
Z Z
1
2
lim
f (z)e−α|y| e−i(z−x)·y dzdy
N
α→0+ (2π)
N
N
ZR R Z
1
−α|y|2 −i(z−x)·y
f (z)
e
e
dy dz
lim
α→0+ (2π)N RN
RN
|z−x|2
Z
e− 4α
lim
f (z)
N dz
α→0+ RN
(4πα) 2
lim (f ∗ Gα )(x)
lim
N
2
α→0+
(8.4.18)
(8.4.19)
(8.4.20)
(8.4.21)
(8.4.22)
Here (8.4.18) follows from the dominated convergence theorem and (8.4.20) from Fubini’s
theorem, which is applicable here because
Z Z
2
|f (z)e−α|y| | dzdy < ∞
(8.4.23)
RN
RN
In (8.4.21) we have used the explicit calculation (8.4.7) above for the Fourier transform
of a Gaussian.
R
Noting that RN Gα (x) dx = 1 for every α > 0, we see that the difference f ∗ Gα (x) −
f (x) may be written as
Z
Gα (y)(f (x − y) − f (x)) dx
(8.4.24)
RN
so that
Z
||f ∗ Gα − f ||L1 (RN ) ≤
Gα (y)φ(y) dy
(8.4.25)
RN
R
where φ(y) = RN |f (x − y) − f (x)| dx. Then φ is bounded and continuous at y = 0 with
φ(0) = 0 (see Exercise 10), and we can verify that the hypotheses of Theorem 7.2 are
satisfied with fn replaced by Gαn as long as αn → 0+. For any sequence αn > 0, αn → 0
it follows that Gαn ∗ f → f in L1 (RN ), and so there is a subsequence αnk → 0 such that
(Gαnk ∗ f )(x) → f (x) a.e. We conclude that (8.4.15) holds. 2
129
8.5
Further properties of the Fourier transform
Formally speaking we have
Z
Z
∂
−ix·y
f (x)e
dx =
−ixj f (x)e−ix·y dx
∂yj RN
RN
(8.5.1)
or in more compact notation
∂ fˆ
= (−ixj f )ˆ
∂yj
(8.5.2)
This is rigorously justified by standard theorems Rof analysis about differentiation of integrals with respect to parameters provided that RN |xj f (x)| dx < ∞.
A companion property, obtained formally using integration by parts, is that
Z
Z
∂f −ix·y
iyj f (x)e−ix·y dx
(8.5.3)
e
dx =
∂x
N
N
j
R
R
or
∂f
ˆ = iyj fˆ
∂xj
(8.5.4)
R
which is rigorously correct provided at least that f ∈ C 1 (RN ) and |x|=R |f (x)| dS → 0
as R → ∞. Repeating the above arguments with higher derivatives we obtain
Proposition 8.5. If α is any multi-index then
Dα fˆ(y) = ((−ix)α f )ˆ(y)
(8.5.5)
821
|xα f (x)| dx < ∞
(8.5.6)
822
(Dα f )ˆ(y) = (iy)α fˆ(y)
(8.5.7)
823
(8.5.8)
824
if
Z
RN
and
if
m
n
Z
f ∈ C (R )
|Dβ f (x)| dS → 0 as R → ∞
|x|=R
130
|β| < |α| = m
We will eventually see that (8.5.5) and (8.5.7) remain valid, suitably interpreted in a
distributional sense, under conditions much more general than (8.5.6) and (8.5.8). But
for now we introduce a new space in which these last two conditions are guaranteed to
hold.
Definition 8.1. The Schwartz space is defined as
S(RN ) = {φ ∈ C ∞ (RN ) : xα Dβ φ ∈ L∞ (RN ) for all α, β}
(8.5.9)
Thus a function is in the Schwartz space if any derivative of it decays more rapidly
than the reciprocal of any polynomial. Clearly S(RN ) contains all test functions D(RN )
2
as well as other kinds of functions such as Gaussians, e−α|x| for any α > 0.
If φ ∈ S(RN ) then in particular, for any n
|Dβ φ(x)| ≤
C
(1 + |x|2 )n
(8.5.10)
for some C, and so clearly both (8.5.5) and (8.5.7) hold, thus the two key identities (8.5.5)
and (8.5.7) are correct whenever f is in the Schwartz space. It is also immediate from
(8.5.10) that S(RN ) ⊂ L1 (RN ) ∩ L∞ (RN ).
Proposition 8.6. If φ ∈ S(RN ) then φ̂ ∈ S(RN ).
Proof: Note from (8.5.5) and (8.5.7) that
(iy)α Dβ φ̂(y) = (iy)α ((−ix)β φ)ˆ(y) = (Dα ((−ix)β φ))ˆ(y)
(8.5.11)
holds for φ ∈ S(RN ). Also, since S(RN ) ⊂ L1 (RN ) it follows from (8.4.2) that if φ ∈
S(RN ) then φ̂ ∈ L∞ (RN ). Thus we have the following list of implications:
φ ∈ S(RN ) =⇒ (−ix)β φ ∈ S(RN )
=⇒ Dα ((−ix)β φ) ∈ S(RN )
=⇒ (Dα ((−ix)β φ))ˆ ∈ L∞ (RN )
=⇒ y α Dβ φ̂ ∈ L∞ (RN )
N
=⇒ φ̂ ∈ S(R )
fmap
(8.5.12)
(8.5.13)
(8.5.14)
(8.5.15)
(8.5.16)
Corollary 8.3. The Fourier transform F : S(RN ) → S(RN ) is one to one and onto.
131
825
Proof: The above theorem says that F maps S(RN ) into S(RN ), and if Fφ = φ̂ = 0
then the inversion theorem Theorem 8.4 is applicable, since both φ, φ̂ are in L1 (RN ). We
ˇ
conclude φ = 0, i.e. F is one to one. If ψ ∈ S(RN ), let φ = ψ̂. Clearly φ ∈ S(RN )
and one may check directly, again using the inversion theorem, that φ̂ = ψ, so that F is
onto.
The next result, usually known as the Parseval identity, is the key step needed to
define the Fourier transform of a function in L2 (RN ), which turns out to be the more
natural setting.
Proposition 8.7. If φ, ψ ∈ S(RN ) then
Z
Z
φ(x)ψ̂(x) dx =
RN
φ̂(x)ψ(x) dx
(8.5.17)
pars
RN
Proof: The proof is simply an interchange of order in an iterated integral, which is easily
justified by Fubini’s theorem:
Z
Z
Z
1
−ix·y
φ(x)ψ̂(x) dx =
ψ(y)e
dy dx
(8.5.18)
φ(x)
N
(2π) 2 RN
RN
RN
Z
Z
1
−ix·y
=
φ(x)e
dx dy
(8.5.19)
ψ(y)
N
(2π) 2 RN
RN
Z
φ̂(y)ψ(y) dy
(8.5.20)
=
RN
There is a slightly different but equivalent formula, which is also sometimes called the
Parseval identity, see Exercise 11. The content of the following corollary is the Plancherel
identity.
planchthm
Corollary 8.4. For every φ ∈ S(RN ) we have
||φ||L2 (RN ) = ||φ̂||L2 (RN )
(8.5.21)
Proof: Given φ ∈ S(RN ) there exists, by Corollary 8.3, ψ ∈ S(RN ) such that ψ̂ = φ. In
addition it follows directly from the definition of the Fourier transform and the inversion
132
planch
theorem that ψ = φ̂. Therefore, by Parseval’s identity
Z
Z
Z
2
φ(x)ψ̂(x) dx =
φ̂(x)ψ(x) =
φ̂(x)φ̂(x) dx = ||φ̂||2L2 (RN ) (8.5.22)
||φ||L2 (RN ) =
RN
RN
RN
Recalling that D(RN ) is dense in L2 (RN ) it follows that the same is true of S(RN )
and the Plancherel identity therefore implies that the Fourier transform has an extension
to all of L2 (RN ). To be precise, if f ∈ L2 (RN ) pick φn ∈ S(RN ) such that φn → f
in L2 (RN ). Since {φn } is Cauchy in L2 (RN ), (8.5.21) implies the same for {φ̂n }, so
g := limn→∞ φ̂n exists in the L2 sense, and this limit is by definition fˆ. From elementary
considerations this limit is independent of the choice of approximating sequence {φn },
the extended definition of fˆ agrees with the original definition if f ∈ L1 (RN ) ∩ L2 (RN ),
and (8.5.21) continues to hold for all f ∈ L2 (RN ).
ˆ
ˆ
Since φ̂n → fˆ in L2 (RN ), it follows by similar reasoning that φˆn → fˆ. By the inversion
ˆ
ˆ
theorem we know that φˆn = φˇn which must converge to fˇ, thus fˇ = fˆ, i.e. the Fourier
inversion theorem continues to hold on L2 (RN ).
The subset L1 (RN ) ∩ L2 (RN ) is dense in L2 (RN ) so we also have that fˆ = limn→∞ fˆn
if fn is any sequence in L1 (RN ) ∩ L2 (RN ) convergent in L2 (RN ) to f . A natural choice
of such a sequence is
(
f (x) |x| < n
fn (x) =
(8.5.23)
0
|x| > n
leading to the following explicit formula, similar to an improper integral, for the Fourier
transform of an L2 function,
Z
1
fˆ(y) = lim
f (x)e−ix·y dx
(8.5.24)
N
n→∞ (2π) 2
|x|<n
fourL2
where again without further assumptions we only know that the limit takes place in the
L2 sense.
Let us summarize.
Theorem 8.5. For any f ∈ L2 (RN ) there exists a unique fˆ ∈ L2 (RN ) such that fˆ is
given by (8.4.1) whenever f ∈ L1 (RN ) ∩ L2 (RN ) and
||f ||L2 (RN ) = ||fˆ||L2 (RN ) .
133
(8.5.25)
planch2
Furthermore, f, fˆ are related by (8.5.24) and
f (x) = lim
n→∞
Z
1
(2π)
N
2
fˆ(y)eix·y dy
(8.5.26)
|y|<n
We conclude this section with one final important property of the Fourier transform.
ftconv
Proposition 8.8. If f, g ∈ L1 (RN ) then f ∗ g ∈ L1 (RN ) and
N
(f ∗ g)ˆ = (2π) 2 fˆĝ
(8.5.27)
Proof: The fact that f ∗ g ∈ L1 (RN ) is immediate from Fubini’s theorem, or, alternatively, is a special case of Young’s convolution inequality (7.4.2). To prove (8.5.27) we
have
Z
1
(f ∗ g)ˆ(z) =
(f ∗ g)(x)e−ix·z dx
(8.5.28)
N
(2π) 2 RN
Z Z
1
=
f (x − y)g(y) dy e−ix·z dx
(8.5.29)
N
(2π) 2 RN
RN
Z
Z
1
−iy·z
−i(x−y)·z
=
g(y)e
f (x − y)e
dx dy (8.5.30)
N
(2π) 2 RN
RN
N
= (2π) 2 fˆ(z)ĝ(z)
(8.5.31)
with the exchange of order of integration justified by Fubini’s theorem.
8.6
Fourier series of distributions
In this and the next section we will see how the theory of Fourier series and Fourier
transforms can be extended to a distributional setting. To begin with let us consider
the casePof the delta function, viewed as a distribution on (−π, π). Formally speaking, if
inx
δ(x) = ∞
, then the coefficients cn should be given by
n=−∞ cn e
Z π
1
1
cn =
δ(x)e−inx dx =
(8.6.1)
2π −π
2π
134
874
for every n, so that
δ(x) =
∞
1 X inx
e
2π n=−∞
(8.6.2)
871
Certainly this is not a valid formula in any classical sense, since the terms of the series
do not decay to zero. On the other hand, the N ’th partial sum of this series is precisely
the Dirichlet kernel DN (x), as in (8.1.4) or (8.1.13), and one consequence of Theorem
8.2 is precisely that DN → δ in D0 (−π, π). Thus we may expect to find Fourier series
representations of distributions, provided that we allow for the series to converge in a
distributional sense.
Note that since DN → δ we must also have, by Proposition 7.2, that
0
=
DN
N
i X
neinx → δ 0
2π n=−N
(8.6.3)
P
m inx
as N → ∞. By repeatedly differentiating, we see that any formal Fourier series ∞
n=−∞ n e
is meaningful in the distributional sense, and is simply, up to a constant multiple, some
derivative of the delta function. The following proposition shows that we can allow any
sequence of Fourier coefficients as long as the rate of growth is at most a power of n.
Proposition 8.9. Let {cn }∞
n=−∞ be any sequence of constants satisfying
|cn | ≤ C|n|M
(8.6.4)
for some constant C and positive integer M . Then there exists T ∈ D0 (−π, π) such that
T =
∞
X
cn einx
(8.6.5)
n=−∞
Proof: Let
g(x) =
∞
X
cn
einx
M +2
(in)
n=−∞
(8.6.6)
which is a uniformly convergent Fourier series, so in particular the partial sums SN → g
(j)
in the sense of distributions on (−π, π). But then SN → g (j) also in the distributional
sense, and in particular
∞
X
cn einx = T := g (M +2)
(8.6.7)
n=−∞
135
distfs
It seems clear that any distribution on R of the form (8.6.5) should be 2π periodic
since every partial sum is. To make this precise, define the translate of any distribution
T ∈ D0 (RN ) by the natural definition τh T (φ) = T (τ−h φ), where as usual τh φ(x) =
φ(x − h), h ∈ RN . We then say that T is h-periodic with period h ∈ RN if τh T = T , and
it is immediate that if Tn is h-periodic and Tn → T in D0 (RN ) then T is also h periodic.
Example 8.3. The Fourier series identity (8.6.2) becomes
∞
X
∞
1 X inx
δ(x − 2nπ) =
e
2π n=−∞
n=−∞
(8.6.8)
when regarded as an identity in D0 (R), since the left side is 2π periodic and coincides
with δ on (−π, π).
A 2π periodic distribution on R may also naturally be regarded as an element of the
distribution space D0 (T), which is defined as the space of continuous linear functionals
(j)
on C ∞ (T). Here, convergence in C ∞ (T) means that φn → φ(j) uniformly on T for all
1
j = 0, 1, 2 . . . . Any function
usual way to regular distribution
R π f ∈ L (T) gives rise in the
2
Tf defined by Tf (φ) = −π f (x)φ(x) dx and if f ∈ L then then n’th Fourier coefficient
1
Tf (e−inx ). Since e−inx ∈ C ∞ (T) it follows that
is cn = 2π
cn = T (e−inx )
(8.6.9)
is defined for T ∈ D0 (T), and is defined to be the n’th Fourier coefficient of the distribution
T . This definition is then consistent with the definition of Fourier coefficient for a regular
distribution, and it can be shown (Exercise 30) that
N
X
cn einx → T
in D0 (T)
(8.6.10)
n=−N
Example 8.4. Let us evaluate the distributional Fourier series
∞
X
einx
(8.6.11)
n=0
The n’th partial sum is
sn (x) =
n
X
eikx =
k=0
136
1 − ei(n+1)x
1 − eix
(8.6.12)
872
Rπ
sn (x) dx = 2π,
Z π
1 − ei(n+1)x
sn (φ) = 2πφ(0) +
(φ(x) − φ(0)) dx
1 − eix
−π
so that we may write, since
−π
(8.6.13)
for any test function φ.
The function (φ(x) − φ(0))/(1 − eix ) belongs to L2 (−π, π), hence
Z π i(n+1)x
e
(φ(x) − φ(0)) dx → 0
ix
−π 1 − e
(8.6.14)
as n → ∞ by the Riemann-Lebesgue lemma. Next, using obvious trigonometric identities
we see that 1/(1 − eix ) = 12 (1 + i cot x2 ), and so
Z π
Z
φ(x) − φ(0)
1
x
dx = lim
(φ(x) − φ(0))(1 + i cot ) dx (8.6.15)
ix
→0+ 2 <|x|<π
1−e
2
−π
Z π
1
φ(x) dx − πφ(0)
(8.6.16)
=
2 −π
Z
i
x
+ lim
φ(x) cot dx
(8.6.17)
→0+ 2 <|x|<π
2
The principal value integral in (8.6.17) is naturally defined to be the action of the distribution pv(cot x2 ), and we obtain the final result, upon letting n → ∞, that
∞
X
einx = πδ +
n=0
1 i
x
+ pv(cot )
2 2
2
(8.6.18)
By taking the real and imaginary parts of this identity we also find
∞
X
n=0
8.7
cos nx = πδ +
∞
X
1
2
sin nx =
n=1
1
x
pv(cot )
2
2
(8.6.19)
Fourier transforms of distributions
Taking again the example of the delta function, now considered as a distribution on RN ,
it appears formally correct that it should have a Fourier transform which is a constant
function, namely
Z
1
1
δ̂(x) =
δ(x)e−ix·y dx =
(8.7.1)
N
N
(2π) 2 RN
(2π) 2
137
If the inversion theorem remains valid then any constant should also have a Fourier
N
transform, e.g. 1̂ = (2π) 2 δ. On the other hand it will turn out that a function such as
ex does not have a Fourier transform in any reasonable sense.
We will now show that the set of distributions for which the Fourier transform can
be defined turns out to be precisely the dual space of the Schwartz space, known also
as the space of tempered distributions. To define this we must first have a definition of
convergence in S(RN ).
Definition 8.2. We say that φn → φ in S(RN ) if
lim ||xα Dβ (φn − φ)||L∞ (RN ) = 0
n→∞
f or any α, β
(8.7.2)
Proof of the following lemma will be left for the exercises.
lemma81
Lemma 8.1. If φn → φ in S(RN ) then φˆn → φ̂ in S(RN ).
Definition 8.3. The set of tempered distributions on RN is the space of continuous
linear functionals on S(RN ), denoted S 0 (RN ).
It was already observed that D(RN ) ⊂ S(RN ) and in addition, if φn → φ in D(RN )
then the sequence also converges in S(RN ). It therefore follows that
S 0 (RN ) ⊂ D0 (RN )
(8.7.3)
i.e. any tempered distribution is also a distribution, as the choice of language suggests.
On the other hand, if Tf is the regular distribution corresponding
to the L1loc function
R
∞
f (x) = ex , then Tf 6∈ S 0 (RN ) since this would require −∞ ex φ(x) dx to be finite for any
φ ∈ S(RN ), which is not true. Thus the inclusion (8.7.3) is strict. We define convergence
in S 0 (RN ) is defined in the expected way, analogously to Definition 7.5:
convS
Definition 8.4. If T, Tn ∈ S 0 (RN ) for n = 1, 2 . . . then we say Tn → T in S 0 (RN ) (or in
the sense of tempered distributions) if Tn (φ) → T (φ) for every φ ∈ S(RN ).
It is easy to see that the delta function belongs to S 0 (RN ) as does any derivative or
translate of the delta function. A regular distribution Tf will belong to S 0 (RN ) provided
it satisfies the condition
f (x)
lim
=0
(8.7.4)
|x|→∞ |x|m
138
851
for some m. Such an f is sometimes referred to as a function of slow growth. In particular,
any polynomial belongs to S 0 (RN ).
We can now define the Fourier transform T̂ for any T ∈ S 0 (RN ). For motivation
of the definition, recall the Parseval identity (8.5.17), which amounts to the identity
Tψ̂ (φ) = Tψ (φ̂), if we regard φ as a function in S(RN ) and ψ as a tempered distribution.
Definition 8.5. If T ∈ S 0 (RN ) then T̂ is defined by T̂ (φ) = T (φ̂) for any φ ∈ S(RN ).
The action of T̂ on any φ ∈ S(RN ) is well-defined, since φ̂ ∈ S(RN ), and linearity of
T̂ is immediate. If φn → φ in S(RN ) then by Lemma 8.1 φˆn → φ̂ in S(RN ), so that
T̂ (φn ) = T (φˆn ) → T (φ̂) = T̂ (φ)
(8.7.5)
We have thus verified that T̂ ∈ S 0 (RN ) whenever T ∈ S 0 (RN ).
Example 8.5. If T = δ, then from the definition,
T̂ (φ) = T (φ̂) = φ̂(0) =
Thus, as expected, δ̂ =
1
(2π)
N
2
Z
1
φ(x) dx
N
(2π) 2
(8.7.6)
RN
, the constant distribution.
Example 8.6. If T = 1 (the constant distribution) then
Z
N ˆ
N
T̂ (φ) = T (φ̂) =
φ̂(x) dx = (2π) 2 φ̂(0) = (2π) 2 φ(0)
(8.7.7)
RN
where the last equality follows from the inversion theorem which is valid for any φ ∈
S(RN ). Thus again the expected result is obtained,
N
1̂ = (2π) 2 δ
(8.7.8)
The previous two examples verify the validity of one particular instance of the Fourier
inversion theorem in the distributional context, but it turns out to be rather easy to
prove that it always holds. One more definition is needed first, that of the reflection of
a distribution.
Definition 8.6. If T ∈ D0 (RN ) then Ť , the reflection of T , is the distribution defined
by Ť (φ) = T (φ̌).
139
We now obtain the Fourier inversion theorem in its most general form, analogous to
the statement (8.4.16) first justified when f, fˆ are in L1 (RN ).
ˆ
Theorem 8.6. If T ∈ S 0 (RN ) then T̂ = Ť .
Proof: For any φ ∈ S(RN ) we have
ˆ
ˆ
T̂ (φ) = T (φ̂) = T (φ̌) = Ť (φ)
(8.7.9)
The apparent triviality of this proof should not be misconstrued, as it relies on the
validity of the inversion theorem in the Schwartz space, and other technical machinery
which we have developed.
Here we state several more simple but useful properties. Here and elsewhere, we
follow the convention of using x and y as the independent variables before and after
Fourier transformation respectively.
ftdprop
Proposition 8.10. Let T ∈ S 0 (RN ) and α be a multi-index. Then
1. xα T ∈ S 0 (RN ).
2. Dα T ∈ S 0 (RN ).
3. Dα T̂ = ((−ix)α T )ˆ.
4. (Dα T )ˆ = (iy)α T̂ .
propftd
5. If Tn ∈ S 0 (RN ) and Tn → T in S 0 (RN ) then T̂n → T̂ in S 0 (RN ).
Proof: We give the proof of part 3 only, leaving the rest for the exercises. Just like the
inversion theorem, it is more or less a direct consequence of the corresponding identity
for functions in S(RN ). For any φ ∈ S(RN ) we have
Dα T̂ (φ) = (−1)|α| T̂ (Dα φ)
= (−1)|α| T ((Dα φ)ˆ)
(8.7.10)
(8.7.11)
= (−1)|α| T ((iy)α φ̂)
α
(8.7.12)
α
= (−ix) T (φ̂) = ((−ix) T )ˆ(φ)
as needed, where we used (8.5.7) to obtain (8.7.12).
140
(8.7.13)
Example 8.7. If T = δ 0 regarded as an element of S 0 (R) then
iy
T̂ = (δ 0 )ˆ = iy δ̂ = √
2π
by part 4 of the previous proposition. In other words
Z ∞
i
xφ(x) dx
T̂ (φ) = √
2π −∞
(8.7.14)
(8.7.15)
Example 8.8. Let T = H(x), the Heaviside function, again regarded as an element of
S 0 (R). To evaluate the Fourier transform Ĥ, one possible approach
is to use part 4 of
√
0
Proposition 8.10
√ along with H = δ to first obtain iy Ĥ = 1/ 2π. A formal solution
is then Ĥ = 1/ 2πiy, but it must then be recognized that this distributional equation
does not have a unique solution, rather we can add to it any solution of yT = 0, e.g.
T = Cδ for any constant C. It must be verified that there are no other solutions, the
constant C must be evaluated, and the meaning of 1/y in the distribution sense must
be made precise. See Example 8, section 2.4 of [33] for details of how this calculation is
completed.
An alternate approach, which yields other useful formulas along the way is as follows.
For any φ ∈ S(RN ) we have
Z ∞
Ĥ(φ) = H(φ̂) =
φ̂(y) dy
(8.7.16)
Z ∞ 0Z ∞
1
= √
φ(x)e−ixy dxdy
(8.7.17)
2π 0
−∞
Z RZ ∞
1
φ(x)e−ixy dxdy
(8.7.18)
= lim √
R→∞
2π 0 −∞
Z R
Z ∞
1
−ixy
e
dy dx
(8.7.19)
= lim √
φ(x)
R→∞
2π −∞
0
Z ∞
1
1 − e−iRx
= lim √
φ(x)
dx
(8.7.20)
R→∞
ix
2π −∞
Z ∞
Z ∞
sin Rx
i
cos Rx − 1
1
= lim √
φ(x) dx + √
φ(x) dx (8.7.21)
R→∞
x
2π −∞ x
2π −∞
It can then be verified that
sin Rx
→ πδ
x
cos Rx − 1
1
→ − pv
x
x
141
(8.7.22)
881
as R → ∞ in D0 (R). The first limit is just a restatement of the result of part b) in
Exercise 7 of Chapter 7, and the second we leave for the exercises. The final result,
therefore, is that
r
π
i
1
Ĥ =
δ − √ pv
(8.7.23)
2
2π x
heavtrans
Example 8.9. Let Tn = δ(x − n), i.e. Tn (φ) = φ(n), for n = 0, ±1, . . . , so that
Z ∞
1
φ(x)e−inx dx
(8.7.24)
T̂n (φ) = φ̂(n) = √
2π −∞
√
P
0
Equivalently, 2π T̂n = e−inx . If we now set T = ∞
n=−∞ Tn then T ∈ S (R) and
∞
∞
∞
X
1 X inx √
1 X −inx
e
=√
e = 2π
δ(x − 2πn)
T̂ = √
2π n=−∞
2π n=−∞
n=−∞
(8.7.25)
where the last equality comes from (8.6.8). The relation T (φ̂) = T̂ (φ), then yields the
very interesting identity
∞
∞
X
X
√
φ̂(n) = 2π
φ(2πn)
(8.7.26)
n=−∞
n=−∞
valid at least for φ ∈ S(R), which is known as the Poisson summation formula.
We conclude this section with some discussion of the Fourier transform and convolution in a distributional setting. Recall we gave a definition of the convolution T ∗ φ
in Definition 7.7, when T ∈ D0 (RN ) and φ ∈ D(RN ). We can use precisely the same
definition if T ∈ S 0 (RN ) and φ ∈ S(RN ), that is
convsp
Definition 8.7. If T ∈ S 0 (RN ) and φ ∈ S(RN ) then (T ∗ φ)(x) = T (τx φ̌).
Note that in terms of the action of the distribution T , x is just a parameter, and that
we must regard φ̌ as a function of some unnamed other variable, say y or ·. By methods
similar to those used in the proof of Theorem 7.3 it can be shown that
T ∗ φ ∈ C ∞ (RN ) ∩ S 0 (RN )
(8.7.27)
Dα (T ∗ φ) = Dα T ∗ φ = T ∗ Dα φ
(8.7.28)
and
In addition we have the following generalization of Proposition 8.8:
142
poissum
convth3
Theorem 8.7. If T ∈ S 0 (RN ) and φ ∈ S(RN ) then
N
(T ∗ φ)ˆ = (2π) 2 T̂ φ̂
(8.7.29)
Sketch of proof: First observe that from Proposition 8.8 and the inversion theorem we
see that
1
(φψ)ˆ =
(8.7.30)
N (φ̂ ∗ ψ̂)
(2π) 2
for φ, ψ ∈ S(RN ). Thus for ψ ∈ S(RN )
(T̂ φ̂)(ψ) = T̂ (φ̂ψ) = T ((φ̂ψ)ˆ) =
1
(2π)
N
2
ˆ
T (φ̂ ∗ ψ̂) =
1
N
(2π) 2
T (φ̌ ∗ ψ̂)
(8.7.31)
On the other hand,
(T ∗ φ)ˆ(ψ) = (T ∗ φ)(ψ̂)
Z
Z
T (τx φ̌)ψ̂(x) dx
(T ∗ φ)(x)ψ̂(x) dx =
=
N
RN
R Z
Z
φ̌(· − x)ψ̂(x) dx
τx φ̌(·)ψ̂(x) dx = T
= T
(8.7.32)
(8.7.33)
(8.7.34)
RN
RN
= T (φ̌ ∗ ψ̂)
(8.7.35)
which completes the proof.
We have labeled the above proof a ’sketch’ because one key step, the first equality in
(8.7.34) was not explained adequately. See the conclusion of the proof of Theorem 7.19
in [31] for why it is permissible to move T across the integral in this way.
8.8
8-1
8-2
Exercises
P
inx
for the function f (x) = x on (−π, π). Use
1. Find the Fourier series ∞
n=−∞ cn e
some sort of computer graphics to plot a few of the partial sums of this series on
the interval [−3π, 3π].
2. Use the Fourier series in problem 1 to find the exact value of the series
∞
X
1
n2
n=1
∞
X
n=1
143
1
(2n − 1)2
3. Evaluate explicitly the Fourier series, justifying your steps:
∞
X
n
cos (nx)
2n
n=1
(Suggestion: start by evaluating
P∞
einx
n=1 2n ,
which is a geometric series.)
4. Produce a sketch of the Dirichlet and Féjer kernels DN and KN , either by hand or
by computer, for some reasonably large value of N .
5. Verify the first identity in (8.1.19).
8-5
6. We say that f ∈ H k (T) if f ∈ D0 (T) and its Fourier coefficients cn satisfy
∞
X
n2k |cn |2 < ∞
(8.8.1)
n=−∞
a) If f ∈ H 1 (T) show that
is uniformly convergent.
P∞
n=−∞
|cn | is convergent and so the Fourier series of f
b) Show that f ∈ H k (T) for every k if and only if f ∈ C ∞ (T).
7. Evaluate the Fourier series
∞
X
(−1)n n sin (nx)
n=1
0
in D (R). If possible, plot some partial sums of this series.
8. Find the Fourier transform of H(x)e−αx for α > 0.
9. Let f ∈ L1 (RN ).
a) If fλ (x) = f (λx) for λ > 0, find a relationship between fbλ and fb.
b) If fh (x) = f (x − h) for h ∈ RN , find a relationship between fbh and fb.
8n1
8-10
10. If f ∈ L1 (RN ) show that τh f → f in L1 (RN ) as h → 0. (Hint: First prove it when
f is continuous and of compact support.)
11. Show that
Z
Z
φ(x)ψ(x) dx =
RN
b ψ(x)
b dx
φ(x)
(8.8.2)
RN
for φ and ψ in the Schwartz space. (This is also sometimes called the Parseval
identity and leads even more directly to the Plancherel formula.)
144
8n3
ex-8-13
12. Prove Lemma 8.1.
13. In this problem Jn denotes the Bessel function of the first kind and of order n. It
may defined in various ways, one of which is
Z
i−n π iz cos θ
Jn (z) =
e
cos (nθ) dθ
(8.8.3)
π 0
Suppose that f is a radially symmetric function in L1 (R2 ), i.e. f (x) = f (r) where
r = |x|. Show that
Z ∞
ˆ
J0 (r|y|)f (r)r dr
f (y) =
0
It follows in particular that fˆ is also radially symmetric. Using the known identity
d
(zJ1 (z)) = zJ0 (z) compute the Fourier transform of χB(0,R) the indicator function
dz
of the ball B(0, R) in R2 .
14. Prove that J0 (z), defined as in (8.8.3), is a solution of the zero order Bessel equation
u00 +
u0
+u=0
z
Suggestion: show that
zJ000 (z)
+
J00 (z)
1
+ zJ0 (z) =
π
Z
0
π
d
(cos θ sin (z sin θ)) dθ
dθ
15. For α ∈ R let fα (x) = cos αx.
a) Find the Fourier transform fbα .
b) Find limα→0 fbα and limα→∞ fbα in the sense of distributions.
16. Compute the Fourier transform of the Heaviside function H(x) in yet a different
way by justifying that
bn
Ĥ = lim H
n→∞
x
in the sense of distributions, where Hn (x) = H(x)e− n , and then evaluating this
limit.
17. Prove the remaining parts of Proposition 8.10.
18. Let f ∈ C(R) be 2π periodic. It then has a Fourier series in the classical sense,
but it also has a Fourier transform since f is a tempered distribution. What is the
relationship between the Fourier series and the Fourier transform?
145
besselint
19. Let f ∈ L2 (RN ). Show that f is real valued if and only if fb(−k) = fb(k) for all
k ∈ RN . What is the analog of this for Fourier series?
20. Let f be a continuous 2π periodic function with the usual Fourier coefficients
Z π
1
cn =
f (x)e−inx dx
2π −π
Show that
and therefore
1
cn = −
2π
1
cn =
4π
Z
π
Z
π
f (x +
−π
π −inx
)e
dx
n
f (x) − f (x +
−π
π −inx
) e
dx.
n
If f is Lipschitz continuous, use this to show that there exists a constant M such
that
M
|cn | ≤
n 6= 0
|n|
21. Let R = (−1, 1) × (−1, 1) be a square in R2 , let f be the indicator function of R
and g be the indicator function of the complement of R.
a) Compute the Fourier transforms fˆ and ĝ.
b) Is either fˆ or ĝ in L2 (R2 )?
8n5
22. Verify the second limit in (8.7.22).
23. A distribution T on RN is even if Ť = T , and odd if Ť = −T . Prove that the
Fourier transform of an even (resp. odd) tempered distribution is even (resp. odd).
24. Let φ ∈ S(R), ||φ||L2 (R) = 1, and show that
Z ∞
Z ∞
1
2
2
2
2
y |φ̂(y)| dy ≥
x |φ(x)| dx
4
−∞
−∞
(8.8.4)
This is a mathematical statement of the Heisenberg uncertainty principle. (Suggestion: start with the identity
Z ∞
Z ∞
d
2
1=
|φ(x)| dx = −
x |φ(x)|2 dx
−∞
−∞ dx
Make sure to allow φ to be complex valued.) Show that equality is achieved in
(8.8.4) if φ is a Gaussian.
146
uncert
P∞
−πn2 t
25. Let θ(t) =
. (It is a particular case of a class of special functions
n=−∞ e
known as theta functions.) Use the Poisson summation formula (8.7.26) to show
that
r
1
1
θ(t) =
θ
t
t
26. Use (8.7.23) to obtain the Fourier transform of pv x1 ,
r
1
π
sgn y
( pv )ˆ(y) = −i
x
2
(8.8.5)
27. The proof of Theorem 8.7 implicitly used the fact that if φ, ψ ∈ S(RN ) then φ ∗ ψ ∈
S(RN ). Prove this property.
28. Where is the mistake in the following argument? If u(x) = e−x then u0 + u = 0 so
by Fourier transformation
iyû(y) + û(y) = (1 + iy)û(y) = 0
y∈R
Since 1 + iy 6= 0 for real y, it follows that û(y) = 0 for all real y and hence u(x) = 0.
29. If f ∈ L²(R^N), the autocorrelation function of f is defined to be

    g(x) = (f ∗ \overline{f̌})(x) = ∫_{R^N} f(y) \overline{f(y - x)} dy

Show that ĝ(y) = |f̂(y)|², ĝ ∈ L¹(R^N) and that g ∈ C_0(R^N). (ĝ is called the power spectrum or spectral density of f.)
30. If T ∈ D'(T) and c_n = T(e^{-inx}), show that T = Σ_{n=-∞}^{∞} c_n e^{inx} in D'(T).
31. The ODE u'' - xu = 0 is known as Airy's equation, and solutions of it are called Airy functions.

a) If u is an Airy function which is also a tempered distribution, use the Fourier transform to find a first order ODE for û(y).

b) Find the general solution of the ODE for û.

c) Obtain the formal solution formula

    u(x) = C ∫_{-∞}^{∞} e^{ixy + iy³/3} dy

d) Explain why this formula is not meaningful as an ordinary integral, and how it can be properly interpreted.

e) Is this the general solution of the Airy equation?
Chapter 9

Distributions and Differential Equations
In this chapter we will begin to apply the theory of distributions developed in the previous chapter in a more systematic way to problems in differential equations. The modern theory of partial differential equations, and to a somewhat lesser extent ordinary differential equations, makes extensive use of the so-called Sobolev spaces, which we now proceed to introduce.
9.1 Weak derivatives and Sobolev spaces
If f ∈ L^p(Ω) then for any multiindex α we know that D^α f exists as an element of D'(Ω), but in general the distributional derivative need not itself be a function. However, if there exists g ∈ L^q(Ω) such that D^α f = T_g in D'(Ω) then we say that f has the weak α derivative g in L^q(Ω). That is to say, the requirement is that

    ∫_Ω f D^α φ dx = (-1)^{|α|} ∫_Ω g φ dx   ∀φ ∈ D(Ω)   (9.1.1)

and we write D^α f ∈ L^q(Ω). It is important to distinguish the concept of weak derivative from that of the almost everywhere (a.e.) derivative.
Example 9.1. Let Ω = (-1, 1) and f(x) = |x|. Obviously f ∈ L^p(Ω) for any 1 ≤ p ≤ ∞, and in the sense of distributions we have f'(x) = 2H(x) - 1 (use, for example, (7.3.27)). Thus f' ∈ L^q(Ω) for any 1 ≤ q ≤ ∞. On the other hand f'' = 2δ, which does not coincide with T_g for any g in any L^q space. Thus f has the weak first derivative, but not the weak second derivative, in L^q(Ω) for any q. The first derivative of f coincides with its a.e. derivative. In the case of the second derivative, f'' = 2δ in the sense of distributions, and obviously f'' = 0 a.e., but this function does not coincide with the weak second derivative; indeed there is no weak second derivative according to the above definition.
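The defining identity (9.1.1) can be checked numerically for this example. The following Python sketch (the particular bump test function is an arbitrary choice) compares the two sides of (9.1.1) for f(x) = |x| and g(x) = 2H(x) - 1:

import numpy as np
from scipy.integrate import quad

def bump(u):
    # A C-infinity function supported in (-1,1).
    return np.exp(-1.0/(1.0 - u*u)) if abs(u) < 1 else 0.0

def dbump(u):
    return bump(u) * (-2.0*u/(1.0 - u*u)**2) if abs(u) < 1 else 0.0

a, c = 0.6, 0.3                      # phi supported in (c-a, c+a) = (-0.3, 0.9) ⊂ (-1,1)
phi  = lambda x: bump((x - c)/a)
dphi = lambda x: dbump((x - c)/a)/a

f = lambda x: abs(x)                 # f(x) = |x|
g = lambda x: np.sign(x)             # claimed weak derivative 2H(x) - 1

lhs, _ = quad(lambda x: f(x)*dphi(x), -1, 1, points=[0.0, c - a, c + a])
rhs, _ = quad(lambda x: g(x)*phi(x), -1, 1, points=[0.0])
print(lhs, -rhs)                     # ∫ f φ' dx = -∫ g φ dx, matching (9.1.1)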
We may now define the spaces W k,p (Ω), known as Sobolev spaces.
Definition 9.1. If Ω ⊂ R^N is an open set, 1 ≤ p ≤ ∞ and k = 1, 2, . . . then

    W^{k,p}(Ω) := {f ∈ D'(Ω) : D^α f ∈ L^p(Ω) for |α| ≤ k}   (9.1.2)

We emphasize that the meaning of the condition D^α f ∈ L^p(Ω) is that f should have the weak α derivative in L^p(Ω) as discussed above. Clearly

    D(Ω) ⊂ W^{k,p}(Ω) ⊂ L^p(Ω)   (9.1.3)

so that W^{k,p}(Ω) is always a dense subspace of L^p(Ω) for 1 ≤ p < ∞.
Example 9.2. If f(x) = |x| then referring to the discussion in the previous example we see that f ∈ W^{1,p}(-1, 1) for any p ∈ [1, ∞], but f ∉ W^{2,p} for any p.
It may be readily checked that W^{k,p}(Ω) is a normed linear space with norm

    ||f||_{W^{k,p}(Ω)} = (Σ_{|α|≤k} ||D^α f||^p_{L^p(Ω)})^{1/p}   1 ≤ p < ∞
    ||f||_{W^{k,∞}(Ω)} = max_{|α|≤k} ||D^α f||_{L^∞(Ω)}          p = ∞        (9.1.4)

Furthermore, the necessary completeness property can be shown (Exercise 5, or see Theorem 9.1 below) so that W^{k,p}(Ω) is a Banach space. When p = 2 the norm may be regarded as arising from the inner product

    ⟨f, g⟩ = Σ_{|α|≤k} ∫_Ω D^α f(x) \overline{D^α g(x)} dx   (9.1.5)

so that it is a Hilbert space. The alternative notation H^k(Ω) is commonly used in place of W^{k,2}(Ω).
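For concreteness, the W^{1,2} norm of the function f(x) = |x| from Example 9.1 can be computed by quadrature; a minimal Python sketch:

import numpy as np
from scipy.integrate import quad

f  = lambda x: abs(x)
df = lambda x: np.sign(x)                      # weak derivative from Example 9.1

norm_sq = quad(lambda x: f(x)**2, -1, 1)[0] + \
          quad(lambda x: df(x)**2, -1, 1)[0]
print(np.sqrt(norm_sq), np.sqrt(8.0/3.0))      # both equal sqrt(2/3 + 2)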
There is a second natural way to give meaning to the idea of a function f ∈ L^p(Ω) having a derivative in an L^q space, which is as follows: if there exists g ∈ L^q(Ω) such that there exists f_n ∈ C^∞(Ω) satisfying f_n → f in L^p(Ω) and D^α f_n → g in L^q(Ω), then we say f has the strong α derivative g in L^q(Ω).
It is elementary to see that a strong derivative is also a weak derivative – we simply let n → ∞ in the identity

    ∫_Ω D^α f_n φ dx = (-1)^{|α|} ∫_Ω f_n D^α φ dx   (9.1.6)
for any test function φ. Far more interesting is that when p < ∞ the converse statement is also true, that is, weak = strong. This important result, which shall not be proved here, was
first established by Friedrichs [12] in some special situations, and then in full generality
by Meyers and Serrin [23]. A more thorough discussion may be found, for example, in
Chapter 3 of Adams [1]. The key idea is to use convolution, as in Theorem 7.5 to obtain
the needed sequence fn of C ∞ functions. For f ∈ W k,p (Ω) the approximating sequence
may clearly be supposed to belong to C ∞ (Ω) ∩ W k,p (Ω), so this space is dense in W k,p (Ω)
and we have
Theorem 9.1. For any open set Ω ⊂ RN , 1 ≤ p < ∞ and k = 0, 1, 2 . . . the Sobolev
space W k,p (Ω) coincides with the closure of C ∞ (Ω) ∩ W k,p (Ω) in the W k,p (Ω) norm.
We now define another class of Sobolev spaces which will be important for later use.
Definition 9.2. For Ω ⊂ RN , W0k,p (Ω) is defined to be the closure of C0∞ (Ω) in the
W k,p (Ω) norm.
Obviously W0k,p (Ω) ⊂ W k,p (Ω), but it may not be immediately clear whether these
are actually the same space. In fact this is certainly true when k = 0 since in this case
we know C0∞ (Ω) is dense in Lp (Ω), 1 ≤ p < ∞. It also turns out to be correct for any k, p
when Ω = RN (see Corollary 3.19 of Adams [ ]). But in general the inclusion is strict,
and f ∈ W0k,p (Ω) carries the interpretation that Dα f = 0 on ∂Ω for |α| ≤ k − 1. This
topic will be continued in more detail in Chapter ( ).
9.2 Differential equations in D'
If we consider the simplest differential equation u' = f on an interval (a, b) ⊂ R, then from elementary calculus we know that if f is continuous on [a, b], then every solution is of the form u(x) = ∫_a^x f(y) dy + C, for some constant C. Furthermore, in this case u ∈ C¹([a, b]) and u'(x) = f(x) for every x ∈ (a, b), and we would refer to u as a classical solution of u' = f. If we make the weaker assumption that f ∈ L¹(a, b) then we can no longer expect u to be C¹ or u'(x) = f(x) to hold at every point, since f itself is only defined up to sets of measure zero. If, however, we let u(x) = ∫_a^x f(y) dy + C then it is an important result of measure theory that u'(x) = f(x) a.e. on (a, b). The question remains whether all solutions of u' = f are of this form, and the answer must now depend on precisely what is meant by 'solution'. If we were to interpret the differential equation as meaning u' = f a.e. then the answer is no. For example u(x) = H(x) is a nonconstant function on (-1, 1) with u'(x) = 0 for x ≠ 0. An alternative meaning is that the differential equation should be satisfied in the sense of distributions on (a, b), in which case we have the following theorem.
Theorem 9.2. Let f ∈ L¹(a, b).

a) If F(x) = ∫_a^x f(y) dy then F' = f in D'(a, b).

b) If u' = f in D'(a, b), then there exists a constant C such that

    u(x) = ∫_a^x f(y) dy + C   a < x < b   (9.2.1)
Proof: If F(x) = ∫_a^x f(y) dy, then for any φ ∈ C_0^∞(a, b) we have

    F'(φ) = -F(φ') = -∫_a^b F(x)φ'(x) dx          (9.2.2)
          = -∫_a^b (∫_a^x f(y) dy) φ'(x) dx       (9.2.3)
          = -∫_a^b f(y) (∫_y^b φ'(x) dx) dy       (9.2.4)
          = ∫_a^b f(y)φ(y) dy = f(φ)              (9.2.5)

Here the interchange of order of integration in the third line is easily justified by Fubini's theorem. This proves part a).
Now if u' = f in the distributional sense then T = u - F satisfies T' = 0 in D'(a, b), and we will finish by showing that T must be a constant. Choose φ_0 ∈ C_0^∞(a, b) such that ∫_a^b φ_0(y) dy = 1. If φ ∈ C_0^∞(a, b), set

    ψ(x) = φ(x) - (∫_a^b φ(y) dy) φ_0(x)   (9.2.6)

so that ψ ∈ C_0^∞(a, b) and ∫_a^b ψ(x) dx = 0. Let

    ζ(x) = ∫_a^x ψ(y) dy   (9.2.7)

Obviously ζ ∈ C^∞(a, b) since ζ' = ψ, but in fact ζ ∈ C_0^∞(a, b) since ζ(a) = ζ(b) = 0 and ζ' = ψ ≡ 0 in some neighborhood of a and of b. Finally it follows, since T' = 0, that

    0 = T'(ζ) = -T(ζ') = -T(ψ) = (∫_a^b φ(y) dy) T(φ_0) - T(φ)   (9.2.8)

or equivalently T(φ) = ∫_a^b Cφ(y) dy where C = T(φ_0). Thus T is the distribution corresponding to the constant function C.
We emphasize that part b) of this theorem is of interest, and not obvious, even when
f = 0: any distribution whose distributional derivative on some interval is zero must be a
constant distribution on that interval. Therefore, any distribution is uniquely determined
up to an additive constant by its distributional derivative, which, to repeat, is not the
case for the a.e. derivative.
Now let Ω ⊂ R^N be an open set and

    Lu = Σ_{|α|≤m} a_α(x) D^α u   (9.2.9)
be a differential operator of order m. We assume that a_α ∈ C^∞(Ω), in which case Lu ∈ D'(Ω) is well defined for any u ∈ D'(Ω). We will use the following terminology for the rest of this chapter.
Definition 9.3. If f ∈ D'(Ω) then

• u is a classical solution of Lu = f in Ω if u ∈ C^m(Ω) and Lu(x) = f(x) for every x ∈ Ω.

• u is a weak solution of Lu = f in Ω if u ∈ L¹_loc(Ω) and Lu = f in D'(Ω).

• u is a distributional solution of Lu = f in Ω if u ∈ D'(Ω) and Lu = f in D'(Ω).
It is clear that a classical solution is also a weak solution, and a weak solution is a distributional solution. The converse statements are false in general, but may be true in special cases. For example, we have proved above that any distributional solution of u' = 0 must be constant, hence in particular any distributional solution of this differential equation is actually a classical solution. On the other hand u = δ is a distributional solution of x²u' = 0, but is not a classical or weak solution. Of course a classical solution cannot exist if f is not continuous on Ω. A theorem which says that any solution of a certain differential equation must be smoother than what is actually needed for the definition of solution is called a regularity result. Regularity theory is a large and important research topic within the general area of differential equations.
Example 9.3. Let Lu = u_xx - u_yy. If F, G ∈ C²(R) and u(x, y) = F(x + y) + G(x - y) then we know u is a classical solution of Lu = 0. We have also observed, in Example 7.12, that if F, G ∈ L¹_loc(R) then Lu = 0 in the sense of distributions, thus u is a weak solution of Lu = 0 according to the above definition. The equation has distributional solutions also, which are not weak solutions: for example, the singular distribution T defined by T(φ) = ∫_{-∞}^{∞} φ(x, x) dx in Exercise 11 of Chapter 7.
Example 9.4. If Lu = u_xx + u_yy then it turns out that all solutions of Lu = 0 are classical solutions; in fact, any distributional solution must be in C^∞(Ω). This is an example of a very important kind of regularity result in PDE theory, and will not be proved here; see for example Corollary 2.20 of [11]. The difference between Laplace's equation and the wave equation, i.e. that Laplace's equation has only classical solutions, while the wave equation has many non-classical solutions, is a typical difference between solutions of PDEs of elliptic and hyperbolic types.
9.3 Fundamental solutions
Let Ω ⊂ R^N, let L be a differential operator as in (9.2.9), and suppose G(x, y) has the following properties¹:

    G(·, y) ∈ D'(Ω)   L_x G(x, y) = δ(x - y)   ∀y ∈ Ω   (9.3.1)

¹The subscript x in L_x is used here to emphasize that the differential operator is acting in the x variable, with y in the role of a parameter.
We then call G a fundamental solution of L in Ω. If such a G can be found, then formally if we let

    u(x) = ∫_Ω G(x, y)f(y) dy   (9.3.2)

we may expect that

    Lu(x) = ∫_Ω L_x G(x, y)f(y) dy = ∫_Ω δ(x - y)f(y) dy = f(x)   (9.3.3)
That is to say, (9.3.2) provides a way to obtain solutions of the PDE Lu = f , and
perhaps also a tool to analyze specific properties of solutions. We are of course ignoring
here all questions of rigorous justification – whether the formula for u even makes sense
if G is only a distribution in x, for what class of f ’s this might be so, and whether it is
permissible to differentiate under the integral to obtain (9.3.3). A more advanced PDE
text such as Hörmander [16] may be consulted for such study. Fundamental solutions
are not unique in general, since we could always add to G any function H(x, y) satisfying
the homogeneous equation Lx H = 0 for fixed y.
We will focus now on the case that Ω = RN and aα (x) ≡ aα for every α, i.e. L is a
constant coefficient operator. In this case, if we can find Γ ∈ D0 (RN ) for which LΓ = δ,
then G(x, y) = Γ(x − y) is a fundamental solution according to the above definition, and
it is normal in this situation to refer to Γ itself as the fundamental solution rather than
G.
Formally, the solution formula (9.3.2) becomes

    u(x) = ∫_{R^N} Γ(x - y)f(y) dy   (9.3.4)

an integral operator of convolution type. Again it may not be clear if this makes sense as an ordinary integral, but recall that we have earlier defined (Definition 7.7) the convolution of an arbitrary distribution and test function, namely

    u(x) = (Γ ∗ f)(x) := Γ(τ_x f̌)   (9.3.5)

if Γ ∈ D'(R^N) and f ∈ C_0^∞(R^N). Furthermore, using Theorem 7.3, it follows that u ∈ C^∞(R^N) and

    Lu(x) = ((LΓ) ∗ f)(x) = (δ ∗ f)(x) = f(x)   (9.3.6)

We have therefore proved

Proposition 9.1. If there exists Γ ∈ D'(R^N) such that LΓ = δ, then for any f ∈ C_0^∞(R^N) the function u = Γ ∗ f is a classical solution of Lu = f.
It will essentially always be the case that the solution formula u = Γ∗f is actually valid
for a much larger class of f ’s than C0∞ (RN ) but this will depend on specific properties of
the fundamental solution Γ, which in turn depend on those of the original operator L.
Example 9.5. If L = ∆, the Laplacian operator in R³, then we have already shown (Example 7.13) that Γ(x) = -1/(4π|x|) satisfies ∆Γ = δ in the sense of distributions on R³. Thus

    u(x) = (-1/(4π|x|) ∗ f)(x) = -(1/4π) ∫_{R³} f(y)/|x - y| dy   (9.3.7)

provides a solution of ∆u = f in R³, at least when f ∈ C_0^∞(R³). The integral on the right in (9.3.7) is known as the Newtonian potential of f, and can be shown to be a valid solution formula for a much larger class of f's. It is in any case always a 'candidate' solution, which can be analyzed directly. A fundamental solution of the Laplacian exists in R^N for any dimension, and will be recalled at the end of this section.
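The harmonicity of Γ away from the origin is easy to confirm symbolically; a sketch using sympy (the variable names are arbitrary):

import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
r = sp.sqrt(x**2 + y**2 + z**2)
Gamma = -1/(4*sp.pi*r)
laplacian = sum(sp.diff(Gamma, v, 2) for v in (x, y, z))
print(sp.simplify(laplacian))   # prints 0: ΔΓ = 0 for x ≠ 0; the δ appears only distributionally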
Example 9.6. Consider the wave operator Lu = u_tt - u_xx in R². A fundamental solution for L (see Exercise 9) is

    Γ(x, t) = (1/2) H(t - |x|)   (9.3.8)

The support of Γ, namely the set {(x, t) : |x| < t}, is in this context known as the forward light cone, representing the set of points x which, for fixed t > 0, a signal emanating from the origin x = 0 at time t = 0 and travelling with speed one may have reached. The resulting solution formula for Lu = f may then be obtained as

    u(x, t) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} Γ(x - y, t - s)f(y, s) dy ds           (9.3.9)
            = (1/2) ∫_{-∞}^{∞} ∫_{-∞}^{∞} H(t - s - |x - y|)f(y, s) dy ds  (9.3.10)
            = (1/2) ∫_{-∞}^{t} ∫_{x-t+s}^{x+t-s} f(y, s) dy ds             (9.3.11)
In many cases of interest f(x, t) ≡ 0 for t < 0, in which case we replace the lower limit in the s integral by 0. In any case the region over which f is integrated is the 'backward' light cone, with vertex at (x, t). Under this support assumption on f it also follows that u(x, 0) = u_t(x, 0) ≡ 0, so by adding in the corresponding terms in D'Alembert's solution (2.3.46) we find that

    u(x, t) = (1/2) ∫_0^t ∫_{x-t+s}^{x+t-s} f(y, s) dy ds + (1/2)(h(x + t) + h(x - t)) + (1/2) ∫_{x-t}^{x+t} g(s) ds   (9.3.12)
is the unique solution of

    u_tt - u_xx = f(x, t)   x ∈ R, t > 0   (9.3.13)
    u(x, 0) = h(x)          x ∈ R          (9.3.14)
    u_t(x, 0) = g(x)        x ∈ R          (9.3.15)
It is of interest to note that this solution formula could also be written, formally at least, as

    u(x, t) = (Γ ∗ f)(x, t) + (∂/∂t)(Γ ∗_{(x)} h)(x, t) + (Γ ∗_{(x)} g)(x, t)   (9.3.16)

where the notation Γ ∗_{(x)} h indicates that the convolution takes place in x only, with t as a parameter.
as a parameter. Thus the fundamental solution Γ enters into the solution not only of the
inhomogeneous equation Lu = f but in solving the Cauchy problem as well. This is not
an accidental feature, and we will see other instances of this sort of thing later.
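The solution formula (9.3.12) lends itself to direct numerical verification. The following Python sketch takes f = 0 and sample data h, g (chosen so that the antiderivative of g is explicit, an assumption made only for convenience), and checks the wave equation by centered differences:

import numpy as np

h = lambda x: np.exp(-x**2)                    # initial displacement
G = lambda x: -0.5*np.exp(-x**2)               # antiderivative of g(x) = x e^{-x^2}

def u(x, t):
    # d'Alembert's formula (9.3.12) with f = 0.
    return 0.5*(h(x + t) + h(x - t)) + 0.5*(G(x + t) - G(x - t))

x0, t0, d = 0.7, 0.5, 1e-4
utt = (u(x0, t0 + d) - 2*u(x0, t0) + u(x0, t0 - d))/d**2
uxx = (u(x0 + d, t0) - 2*u(x0, t0) + u(x0 - d, t0))/d**2
print(utt - uxx)                                # ≈ 0: u solves the wave equation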
So far we have seen a couple of examples where an explicit fundamental solution is
known, but have given no indication of a general method for finding it, or even determining if a fundamental solution exists. Let us address the second issue first, by stating
without proof a remarkable theorem.
Theorem 9.3. (Malgrange-Ehrenpreis) If L 6= 0 is any constant coefficient linear differential operator then there exists a fundamental solution of L.
The proof of this theorem is well beyond the scope of this book, see for example
Theorem 8.5 of [31] or Theorem 10.2.1 of [16]. The assumption of constant coefficients
is essential here, counterexamples are known otherwise, even in the case of very simple
and infinitely differentiable variable coefficients.
If we now consider how it might be possible to compute a fundamental solution for a given operator L, it soon becomes apparent that the Fourier transform may be a useful tool. If we start with the distributional PDE

    LΓ = Σ_{|α|≤m} a_α D^α Γ = δ   (9.3.17)

and take the Fourier transform of both sides, the result is

    Σ_{|α|≤m} a_α (D^α Γ)^ = Σ_{|α|≤m} a_α (iy)^α Γ̂ = 1/(2π)^{N/2}   (9.3.18)

or

    P(y) Γ̂(y) = 1   (9.3.19)

where P(y), the so-called symbol or characteristic polynomial of L, is defined as

    P(y) = (2π)^{N/2} Σ_{|α|≤m} (iy)^α a_α   (9.3.20)
Note it was implicitly assumed here that Γ̂ exists, which would be the case if Γ were
a tempered distribution, but this is not actually guaranteed by Theorem 9.3. This is a
rather technical issue which we will not discuss here, but rather take the point of view
that we seek a formal solution which, potentially, further analysis may show is a bona
fide fundamental solution.
The problem of solving (9.3.19) for a distribution Γ̂ is a special case of the so-called problem of division, which is to solve an equation fS = T for a distribution S, given a distribution T and a smooth function f in a suitable class. Various aspects of this problem may be found in [16].
We have thus obtained Γ̂(y) = 1/P(y), or by the inversion theorem

    Γ(x) = (1/(2π)^{N/2}) ∫_{R^N} (1/P(y)) e^{ix·y} dy   (9.3.21)
as a candidate for fundamental solution of L. One particular source of difficulty in making sense of the inverse transform of 1/P is that in general P has zeros, which might be of arbitrarily high order, making the integrand too singular to have meaning in any ordinary sense. On the other hand, we have seen, at least in one dimension, how well-defined distributions of the 'pseudo-function' type may be associated with non-locally integrable functions such as 1/x^m. Thus there may be some analogous construction in more than one dimension as well. This is in fact one possible means of proving the Malgrange-Ehrenpreis theorem.
It also suggests that the situation may be somewhat easier to deal with if the zero set of P in R^N is empty, or at least not very large. As a polynomial, of course, P always has zeros, but some or all of these could be complex, whereas the obstructions to making sense of (9.3.21) pertain to the real zeros of P only. If L is a constant coefficient differential operator of order m as above, define

    P_m(y) = (2π)^{N/2} Σ_{|α|=m} (iy)^α a_α   (9.3.22)
which is known as the principal symbol of L.
Definition 9.4. We say that L is elliptic if y ∈ R^N, P_m(y) = 0 implies that y = 0.

That is to say, the principal symbol has no nonzero real roots. For example the Laplacian operator L = ∆ is elliptic, as is ∆ + lower order terms, since either way P_2(y) = -|y|². On the other hand, the wave operator, written say as Lu = ∆u - u_{x_{N+1} x_{N+1}}, is not elliptic, since the principal symbol is P_2(y) = y_{N+1}² - Σ_{j=1}^N y_j².
The following is not so difficult to establish (Exercise 16), and may be exploited in working with the representation (9.3.21) in the elliptic case.

Proposition 9.2. If L is elliptic then

    {y ∈ R^N : P(y) = 0}   (9.3.23)

the real zero set of P, is compact in R^N, and lim_{|y|→∞} |P(y)| = ∞.
We will next derive a fundamental solution for the heat equation by using the Fourier transform, although in a slightly different way from the above discussion. Consider first the initial value problem for the heat equation

    u_t - ∆u = 0     x ∈ R^N, t > 0   (9.3.24)
    u(x, 0) = h(x)   x ∈ R^N          (9.3.25)

with h ∈ C_0^∞(R^N). Assuming a solution exists, define the Fourier transform in the x variables,

    û(y, t) = (1/(2π)^{N/2}) ∫_{R^N} u(x, t) e^{-ix·y} dx   (9.3.26)

Taking the partial derivative with respect to t of both sides gives (û)_t = (u_t)^, so by the usual Fourier transformation calculation rules,

    (u_t)^ = (û)_t = -|y|² û   (9.3.27)
and û(y, 0) = ĥ(y). We may regard this as an ODE in t satisfied by û(y, t) for fixed y, for which the solution obtained by elementary means is

    û(y, t) = e^{-|y|²t} ĥ(y)   (9.3.28)

If we let Γ be such that Γ̂(y, t) = (1/(2π)^{N/2}) e^{-|y|²t}, then by Theorem 8.8 it follows that

    u(x, t) = (Γ ∗_{(x)} h)(x, t)   (9.3.29)
Since Γ̂ is a Gaussian in y, the same is true for Γ itself, as long as t > 0, and from (8.4.7) we get

    Γ(x, t) = H(t) e^{-|x|²/4t} / (4πt)^{N/2}   (9.3.30)

By including the H(t) factor we have for later convenience defined Γ(x, t) = 0 for t < 0.
Thus we get an integral representation for the solution of (9.3.24)-(9.3.25), namely

    u(x, t) = ∫_{R^N} Γ(x - y, t)h(y) dy = (1/(4πt)^{N/2}) ∫_{R^N} e^{-|x-y|²/4t} h(y) dy   (9.3.31)

valid for x ∈ R^N and t > 0. As usual, although this was derived for convenience under very restrictive conditions on h, it is actually valid much more generally (see Exercise 12).
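Formula (9.3.31) can likewise be tested numerically; in the sketch below (one space dimension, with an arbitrarily chosen bounded continuous h) the heat equation is checked at a single point by finite differences:

import numpy as np
from scipy.integrate import quad

h = lambda y: 1.0/(1.0 + y**2)                 # bounded continuous initial data

def u(x, t):
    # The heat kernel representation (9.3.31) with N = 1.
    kern = lambda y: np.exp(-(x - y)**2/(4*t))/np.sqrt(4*np.pi*t)*h(y)
    return quad(kern, -np.inf, np.inf, epsabs=1e-12, epsrel=1e-12)[0]

x0, t0, d = 0.4, 0.3, 1e-3
ut  = (u(x0, t0 + d) - u(x0, t0 - d))/(2*d)
uxx = (u(x0 + d, t0) - 2*u(x0, t0) + u(x0 - d, t0))/d**2
print(ut, uxx)                                  # agree up to finite difference error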
Now to derive a solution formula for u_t - ∆u = f, let v = v(x, t; s) be the solution of (9.3.24)-(9.3.25) with h(x) replaced by f(x, s), regarding s for the moment as a parameter, and define

    u(x, t) = ∫_0^t v(x, t - s; s) ds   (9.3.32)
Assuming that f is sufficiently regular, it follows that

    u_t(x, t) = v(x, 0; t) + ∫_0^t v_t(x, t - s; s) ds   (9.3.33)
              = f(x, t) + ∫_0^t ∆v(x, t - s; s) ds       (9.3.34)
              = f(x, t) + ∆u(x, t)                       (9.3.35)

Inserting the formula (9.3.31) with h replaced by f(·, s) gives

    u(x, t) = (Γ ∗ f)(x, t) = ∫_0^t ∫_{R^N} Γ(x - y, t - s)f(y, s) dy ds   (9.3.36)
with Γ given again by (9.3.30). Strictly speaking, we should assume that f (x, t) ≡ 0 for
t < 0 in order that the integral on the right in (9.3.36) coincide with the convolution
in RN +1 , but this is without loss of generality, since we only seek to solve the PDE for
t > 0. The procedure used above for obtaining the solution of the inhomogeneous PDE
starting with the solution of a corresponding initial value problem is known as Duhamel’s
method, and is generally applicable, with suitable modifications, for time dependent PDEs
in which the coefficients are independent of time.
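A crude numerical implementation of Duhamel's method, with the inner solves done by the heat kernel and an arbitrary sample forcing f, might look as follows (a sketch only; no claim of efficiency):

import numpy as np
from scipy.integrate import quad

f = lambda y, s: np.exp(-y**2)*np.cos(s)        # sample forcing term

def v(x, t, s):
    # Solution at time t of the homogeneous heat equation with data f(., s), from (9.3.31).
    k = lambda y: np.exp(-(x - y)**2/(4*t))/np.sqrt(4*np.pi*t)*f(y, s)
    return quad(k, -np.inf, np.inf)[0]

def u(x, t, nq=41):
    # u(x,t) = ∫_0^t v(x, t - s; s) ds, by the trapezoid rule in s.
    s = np.linspace(0.0, t, nq)
    vals = np.array([v(x, t - si, si) if si < t else f(x, si) for si in s])
    ds = s[1] - s[0]
    return ds*(vals.sum() - 0.5*(vals[0] + vals[-1]))

print(u(0.3, 0.5))                              # value of the particular solution at (0.3, 0.5)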
Since u(x, t) in (9.3.32) evidently satisfies u(x, 0) ≡ 0, it follows (compare to (9.3.16)) that

    u(x, t) = (Γ ∗_{(x)} h)(x, t) + (Γ ∗ f)(x, t)   (9.3.37)

is a solution² of

    u_t - ∆u = f(x, t)   x ∈ R^N, t > 0   (9.3.38)
    u(x, 0) = h(x)       x ∈ R^N          (9.3.39)

²Note we do not say 'the solution' here; in fact the solution is not unique without further restrictions.
Let us also observe here that if

    F(x) = (1/(4π)^{N/2}) e^{-|x|²/4}   (9.3.40)

then F ≥ 0, ∫_{R^N} F(x) dx = 1, and

    Γ(x, t) = (1/√t)^N F(x/√t)   (9.3.41)
for t > 0. From Theorem 7.2, and the observation that a sequence of the form (7.3.11) satisfies the assumptions of that theorem, it follows that n^N F(nx) → δ in D'(R^N) as n → ∞. Choosing n = 1/√t we conclude that

    lim_{t→0+} Γ(·, t) = δ   in D'(R^N)   (9.3.42)

In particular lim_{t→0+} (Γ ∗_{(x)} h)(x, t) = h(x) for all x ∈ R^N, at least when h ∈ C_0^∞(R^N).
We conclude this section by collecting all in one place a number of important fundamental solutions. Some of these have been discussed already, some will be left for the
exercises, and in several other cases we will be content with a reference.
Laplace operator

For L = ∆ in R^N there exist the following fundamental solutions³:

    Γ(x) = |x|/2             N = 1
    Γ(x) = (1/2π) log |x|    N = 2
    Γ(x) = C_N / |x|^{N-2}   N ≥ 3        (9.3.43)

where

    C_N = 1/((2 - N)Ω_{N-1})   Ω_{N-1} = ∫_{|x|=1} dS(x)   (9.3.44)

Thus C_N is a geometric constant, related to the area of the unit sphere in R^N – an equivalent formula in terms of the volume of the unit ball in R^N is also commonly used. Of the various cases, N = 1 is elementary to check, N = 2 is requested in Exercise 20 of Chapter 7, and we have done the N ≥ 3 case in Example 7.13.

³Some texts will use consistently the fundamental solution of -∆ rather than ∆, in which case all of the signs will be reversed.
Heat operator

For the heat operator L = ∂/∂t - ∆ in R^{N+1}, we have derived earlier in this section the fundamental solution

    Γ(x, t) = H(t) e^{-|x|²/4t} / (4πt)^{N/2}   (9.3.45)

for all N.
Wave operator

For the wave operator L = ∂²/∂t² - ∆ in R^{N+1}, the fundamental solution is again significantly dependent on N. The cases of N = 1, 2, 3 are as follows:

    Γ(x, t) = (1/2) H(t - |x|)                  N = 1
    Γ(x, t) = (1/2π) H(t - |x|)/√(t² - |x|²)    N = 2
    Γ(x, t) = δ(t - |x|)/(4π|x|)                N = 3        (9.3.46)

We have discussed the N = 1 case earlier in this section, and refer to [10] or [18] for the cases N = 2, 3. As a distribution, the meaning of the fundamental solution in the N = 3 case is just what one expects from the formal expression, namely

    Γ(φ) = ∫_{R³} ∫_{-∞}^{∞} (δ(t - |x|)/(4π|x|)) φ(x, t) dt dx = ∫_{R³} φ(x, |x|)/(4π|x|) dx   (9.3.47)

for any test function φ. Note the tendency for the fundamental solution to become more and more singular as N increases. This pattern persists in higher dimensions, as the fundamental solution starts to contain expressions involving δ' and higher derivatives of the δ function.
Schrödinger operator

The Schrödinger operator is defined as L = ∂/∂t - i∆ in R^{N+1}. The derivation of a fundamental solution here is nearly the same as for the heat equation, the result being

    Γ(x, t) = H(t) e^{-|x|²/4it} / (4iπt)^{N/2}   (9.3.48)

In quantum mechanics Γ is frequently referred to as the 'propagator'. See [27] for much material about the Schrödinger equation.
Helmholtz operator

The Helmholtz operator is defined by Lu = ∆u - λu. For λ > 0 and dimensions N = 1, 2, 3 fundamental solutions are

    Γ(x) = sin(√λ|x|)/(2√λ)          N = 1
    Γ(x) = (1/2π) K_0(√λ|x|)         N = 2
    Γ(x) = -e^{-√λ|x|}/(4π|x|)       N = 3        (9.3.49)

where K_0 is the so-called modified Bessel function of the second kind and order 0. See Chapter 6 of [3] for derivations of these formulas when N = 2, 3, while the N = 1 case is left for the exercises. This is a case where it may be convenient to use the Fourier transform method directly, since the symbol of L, P(y) = -|y|² - λ, has no real zeros.
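As an illustration of the Fourier method, for the one dimensional operator u'' - λu one can recover the exponentially decaying fundamental solution Γ(x) = -e^{-√λ|x|}/(2√λ) by numerically inverting -1/(y² + λ); a sketch (grid sizes are arbitrary, so agreement is only to a few decimals):

import numpy as np

lam = 2.0
y = np.linspace(-300.0, 300.0, 600001)
dy = y[1] - y[0]
vals0 = -1.0/(y**2 + lam)                       # Γ̂ up to constants: 1/P(y) with P(y) = -(y² + λ)
for x in [0.0, 0.5, 1.5]:
    vals = np.cos(x*y)*vals0                    # e^{ixy} contributes only its even part
    Gamma = dy*(vals.sum() - 0.5*(vals[0] + vals[-1]))/(2*np.pi)
    exact = -np.exp(-np.sqrt(lam)*abs(x))/(2*np.sqrt(lam))
    print(x, Gamma, exact)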
Klein-Gordon operator

The Klein-Gordon operator is defined by Lu = ∂²u/∂t² - ∆u - λu in R^{N+1}. We mention only the case N = 1, λ > 0, in which case a fundamental solution is

    Γ(x, t) = (1/2) H(t - |x|) J_0(√(λ(t² - x²)))   (9.3.50)

where J_0 is the Bessel function of the first kind and order zero (see Exercise 13 of Chapter 8). This may be derived, for example, by the method presented in Problem 2, Section 5.1 of [18], choosing ψ = δ.
Biharmonic operator

The biharmonic operator is L = ∆², i.e. Lu = ∆(∆u). It arises especially in connection with the theory of plates and shells, so that N = 2 is the most interesting case. A fundamental solution is

    Γ(x) = (1/8π)|x|² log |x|   N = 2   (9.3.51)

and a derivation of this is outlined in Exercise 10.
9.4 Exercises
1. Show that an equivalent definition of W^{s,2}(R^N) = H^s(R^N) for s = 0, 1, 2, . . . is

    H^s(R^N) = {f ∈ S'(R^N) : ∫_{R^N} |f̂(y)|² (1 + |y|²)^s dy < ∞}   (9.4.1)

The second definition makes sense even if s isn't a positive integer, and leads to one way to define fractional and negative order differentiability. Implicitly it requires that f̂ (but not f itself) must be a function.
2. Using the definition (9.4.1), show that H^s(R^N) ⊂ C_0(R^N) if s > N/2. Show that δ ∈ H^s(R^N) if s < -N/2.
3. If Ω is a bounded open set in R³, and u(x) = 1/|x|, show that u ∈ W^{1,p}(Ω) for 1 ≤ p < 3/2. Along the way, you should show carefully that a distributional first derivative ∂u/∂x_i agrees with the corresponding pointwise derivative.
4. Prove that if f ∈ W^{1,p}(a, b) for p > 1 then

    |f(x) - f(y)| ≤ ||f||_{W^{1,p}(a,b)} |x - y|^{1-1/p}   (9.4.2)

so in particular W^{1,p}(a, b) ⊂ C([a, b]). (Caution: You would like to use the fundamental theorem of calculus here, but it isn't quite obvious whether it is valid assuming only that f ∈ W^{1,p}(a, b).)
5. Prove directly that W^{k,p}(Ω) is complete (relying of course on the fact that L^p(Ω) is complete).
6. Show that Theorem 9.1 is false for p = ∞.
7. If f is a nonzero constant function on [0, 1], show that f 6∈ W01,p (0, 1) for 1 ≤ p < ∞.
8. Let Lu = u'' + u and E(x) = H(x) sin x, x ∈ R.
a) Show that E is a fundamental solution of L.
b) What is the corresponding solution formula for Lu = f ?
c) The fundamental solution E is not the same as the one given in (9.3.49). Does
this call for any explanation?
9. Show that E(x, t) = (1/2) H(t - |x|) is a fundamental solution for the wave operator Lu = u_tt - u_xx.
10. The fourth order operator Lu = u_xxxx + 2u_xxyy + u_yyyy in R² is the biharmonic operator which arises in the theory of deformation of elastic plates.

a) Show that L = ∆², i.e. Lu = ∆(∆u) where ∆ is the Laplacian.

b) Find a fundamental solution of L. (Suggestions: To solve LE = δ, first solve ∆F = δ and then ∆E = F. Since F will depend on r = √(x² + y²) only, you can look for a solution E = E(r) also.)
11. Let Lu = u00 + αu0 where α > 0 is a constant.
a) Find a fundamental solution of L which is a tempered distribution.
b) Find a fundamental solution of L which is not a tempered distribution.
12. Show directly that u(x, t) defined by (9.3.31) is a classical solution of the heat
equation for t > 0, under the assumption that h is bounded and continuous on RN .
13. Assuming that (9.3.31) is valid and h ∈ L^p(R^N), derive the decay property

    ||u(·, t)||_{L^∞(R^N)} ≤ ||h||_{L^p(R^N)} / t^{N/2p}   (9.4.3)

for 1 ≤ p ≤ ∞.
14. If

    G(x, y) = y(x - 1)   0 < y < x < 1
    G(x, y) = x(y - 1)   0 < x < y < 1

show that G is a fundamental solution of Lu = u'' in (0, 1).
15. Is the heat operator L = ∂/∂t - ∆ elliptic?
16. Prove Proposition 9.2.
Chapter 10

Linear operators
10.1 Linear mappings between Banach spaces
Let X, Y be Banach spaces. We say that

    T : D(T) ⊂ X → Y   (10.1.1)

is linear if

    T(c_1 x_1 + c_2 x_2) = c_1 T(x_1) + c_2 T(x_2)   ∀x_1, x_2 ∈ D(T), ∀c_1, c_2 ∈ C   (10.1.2)

Here D(T) is the domain of T, which we do not assume is all of X. Note, however, that it must be a subspace of X according to this definition. Likewise R(T), the range of T, must be a subspace of Y. If \overline{D(T)} = X we say T is densely defined. It is common to write Tx instead of T(x) when T is linear, and we will often use this notation.
The definition of operator norm given earlier in (5.3.1) for the case D(T) = X may be generalized to the present case.

Definition 10.1. The norm of the operator T is

    ||T||_{X,Y} = sup_{x ∈ D(T), x ≠ 0} ||Tx||_Y / ||x||_X   (10.1.3)

In general we allow for the case that ||T||_{X,Y} = ∞.
Definition 10.2. If ||T ||X,Y < ∞ we will say that T is bounded on its domain. If in
addition D(T ) = X we say T is bounded on X, or more simply that T is bounded, if
there is no possibility of confusion.
If it is clear from context what X, Y are, we may write ||x|| instead of ||x||X etc.
We point out, however, that many linear operators of interest may be defined for many
different choices of X, Y, and it will be important to be able to specify precisely which
spaces we have in mind.
Verification of the following properties is left for the reader.
Proposition 10.1. If T : D(T) ⊂ X → Y is a linear operator then

1. ||T|| = sup_{x ∈ D(T), ||x||=1} ||Tx||.

2. ||Tx|| ≤ ||T|| ||x|| for all x ∈ D(T).
The proof of the following is more or less the same as that of Proposition 5.4.
Theorem 10.1. Let T be a linear operator from X to Y. Then the following are equivalent:
1. T is bounded on its domain.
2. T is continuous at every point of D(T ).
3. T is continuous at some point of D(T ).
4. T is continuous at 0.
We also have (see Exercise 3)
Proposition 10.2. If T is bounded on its domain then it has a unique norm preserving extension to \overline{D(T)}. That is to say, there exists a unique linear operator S : \overline{D(T)} ⊂ X → Y such that Sx = Tx for x ∈ D(T) and ||S|| = ||T||.
It follows that if T is densely defined and bounded on its domain, then it automatically
has a unique bounded extension to all of X. In such a case we will always assume that
T has been replaced by this extension, unless otherwise stated.
Recall the notations introduced previously,

    B(X, Y) = {T : X → Y : T is linear and ||T||_{X,Y} < ∞}   (10.1.4)
    B(X) = B(X, X)   X* = B(X, C)                             (10.1.5)

10.2 Examples of linear operators
We next discuss an extensive list of examples of linear operators.
Example 10.1. Let X = C^n, Y = C^m and Tx = Ax for some m × n complex matrix A, i.e.

    (Tx)_k = Σ_{j=1}^n a_{kj} x_j   k = 1, . . . , m   (10.2.1)

where a_{kj} is the (k, j) entry of A. Clearly T is linear, and in Exercise 6 you are asked to verify that T is bounded for any choice of the norms on X, Y. The exact value of the operator norm of T, however, will depend on exactly which norms are used in X, Y.
Suppose we use the usual Euclidean norm || · ||_2 in both spaces. Then using the Schwarz inequality we may obtain

    ||Tx||² = Σ_{k=1}^m |Σ_{j=1}^n a_{kj} x_j|² ≤ Σ_{k=1}^m (Σ_{j=1}^n |a_{kj}|²)(Σ_{j=1}^n |x_j|²)   (10.2.2)
            = (Σ_{k=1}^m Σ_{j=1}^n |a_{kj}|²) ||x||²   (10.2.3)

from which we conclude that

    ||T|| ≤ (Σ_{k=1}^m Σ_{j=1}^n |a_{kj}|²)^{1/2}   (10.2.4)

The right hand side of (10.2.4) is known as the Frobenius norm of A, and it is easy to check that it satisfies all of the axioms of a norm on the vector space of m × n matrices. Note however that (10.2.4) is only an inequality, and it is known to be strict in general, as will be clarified below.
If p, q ∈ [1, ∞] let us temporarily use the notation ||T||_{p,q} for the norm of T when we use the p-norm in X and the q-norm in Y. The problem of computing ||T||_{p,q} in a more or less explicit way from the entries of A is difficult in general, but several special cases are well known.

• If p = q = 1 then ||T||_{1,1} = max_j Σ_{k=1}^m |a_{kj}|, the maximum absolute column sum of A.

• If p = q = ∞ then ||T||_{∞,∞} = max_k Σ_{j=1}^n |a_{kj}|, the maximum absolute row sum of A.

• If p = q = 2 then ||T||_{2,2} is the largest singular value of A, or equivalently ||T||²_{2,2} is the largest eigenvalue of the square Hermitian matrix A*A.

Details about these points may be found in most textbooks on linear algebra or numerical analysis, see for example Chapter 2 of [14].
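These formulas, and the strictness of the Frobenius bound (10.2.4), are easy to observe numerically; a Python sketch with an arbitrarily chosen matrix:

import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])
col = np.abs(A).sum(axis=0).max()                 # ||T||_{1,1}: max absolute column sum
row = np.abs(A).sum(axis=1).max()                 # ||T||_{∞,∞}: max absolute row sum
two = np.linalg.svd(A, compute_uv=False)[0]       # ||T||_{2,2}: largest singular value
fro = np.linalg.norm(A, 'fro')
print(col, row, two, fro)                         # two < fro: the bound (10.2.4) is strict here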
Example 10.2. Let X = Y = L^p(R^N) and T be the translation operator defined on D(T) = X by

    Tu(x) = τ_h u(x) = u(x - h)   (10.2.5)

for some fixed h ∈ R^N. Clearly

    ||Tu|| = ||u||   (10.2.6)

for any u, so that ||T|| = 1.
Example 10.3. Let Ω ⊂ R^N, X = Y = L^p(Ω), m ∈ L^∞(Ω) and define the multiplication operator T on D(T) = X by

    Tu(x) = m(x)u(x)   (10.2.7)

Clearly we have

    ||Tu||_{L^p} ≤ ||m||_{L^∞} ||u||_{L^p}   (10.2.8)

so that ||T|| ≤ ||m||_{L^∞}. We claim that actually equality holds. The case m ≡ 0 is trivial; otherwise in the case 1 ≤ p < ∞ we can see it as follows. For any 0 < ε < ||m||_{L^∞} there must exist a measurable set Σ ⊂ Ω of measure η > 0 such that |m(x)| ≥ ||m||_{L^∞} - ε for x ∈ Σ. If we now choose u = χ_Σ, the characteristic function of Σ, then ||u||_{L^p} = η^{1/p} and

    ||Tu||^p_{L^p} = ∫_Σ |m(x)|^p dx ≥ η(||m||_{L^∞} - ε)^p   (10.2.9)

Thus

    ||Tu||_{L^p} / ||u||_{L^p} ≥ ||m||_{L^∞} - ε   (10.2.10)

which immediately implies that ||T|| ≥ ||m||_{L^∞} as needed. The case p = ∞ is left as an exercise.
Example 10.4. One of the most important classes of operators we will be concerned with in this book is integral operators. Let Ω ⊂ R^N, X = Y = L²(Ω), K ∈ L²(Ω × Ω) and define the operator T by

    Tu(x) = ∫_Ω K(x, y)u(y) dy   (10.2.11)

It may not be immediately clear how we should define D(T), but note by the Schwarz inequality that

    ||Tu||²_{L²} = ∫_Ω |∫_Ω K(x, y)u(y) dy|² dx                       (10.2.12)
                 ≤ ∫_Ω (∫_Ω |K(x, y)|² dy)(∫_Ω |u(y)|² dy) dx        (10.2.13)
                 = (∫_Ω ∫_Ω |K(x, y)|² dy dx)(∫_Ω |u(y)|² dy)        (10.2.14)

This shows simultaneously that Tu ∈ L² whenever u ∈ L², so that we may take D(T) = L², and that

    ||T|| ≤ ||K||_{L²(Ω×Ω)}   (10.2.15)

We refer to K as the kernel¹ of the operator T. Note the formal similarity between this calculation and that of Example 10.1. Just as in that case, the inequality for ||T|| is strict, in general.

¹which is not to be confused with the null space of T!

Example 10.5. Let h be a locally integrable function and define the convolution operator

    Tu(x) = (h ∗ u)(x) = ∫_{R^N} h(x - y)u(y) dy   (10.2.16)

This is obviously an operator of the type (10.2.11) with Ω = R^N, but for which K(x, y) = h(x - y) does not satisfy the L² condition in the previous example, except in trivial cases. Thus it is again not immediately apparent how we should define D(T). Recall, however, Young's convolution inequality (7.4.2), which implies immediately that

    ||Tu||_{L^r} ≤ ||h||_{L^p} ||u||_{L^q}   (10.2.17)

if

    1/p + 1/q = 1 + 1/r   p, q, r ∈ [1, ∞]   (10.2.18)

Thus we may take D(T) = X = L^q(R^N) and Y = L^r(R^N) with p, q, r related as above, in which case ||T|| ≤ ||h||_{L^p}. [Is this sharp?]
Example 10.6. If we let

    Tu(x) = (1/(2π)^{N/2}) ∫_{R^N} u(y) e^{-ix·y} dy   (10.2.19)

then Tu(x) = û(x) is the Fourier transform of u studied in Chapter 8. It is again a special case of (10.2.11) with kernel K not satisfying the L² integrability condition. From the earlier discussion of properties of the Fourier transform we have the following:

1. T is a bounded linear operator from X = L¹(R^N) into Y = C_0(R^N) with norm

    ||T|| ≤ 1/(2π)^{N/2}   (10.2.20)

In fact it is easy to see that equality holds here, see Exercise 16.

2. T is a bounded linear operator from X = L²(R^N) onto Y = L²(R^N) with norm ||T|| = 1. Indeed ||Tu|| = ||u|| for all u ∈ L²(R^N) by the Plancherel identity (8.5.25).

It can also be shown, although this is more difficult (see Chapter I, section 2 of [35]), that T is a bounded linear operator from X = L^p(R^N) into Y = L^q(R^N) if

    1 < p ≤ 2 ≤ q < ∞   1/p + 1/q = 1   (10.2.21)

If u ∈ L^p(R^N) for p > 2 then û always exists in a distributional sense, but may not be a function; see Chapter I, section 4.13 of [35].
Example 10.7. Let m ∈ L^∞(R^N) and define the linear operator T, known as a Fourier multiplication operator, by

    (Tu)^(y) = m(y)û(y)   (10.2.22)

where as usual û denotes the Fourier transform. If we use F as an alternative special notation for the Fourier transform, and let S denote the multiplication operator defined in Example 10.3, then this is equivalent to defining T = F⁻¹SF. If we take X = Y = L²(R^N) then from the known properties of F, S we get immediately from the Plancherel identity that

    ||Tu||_{L²} = ||(Tu)^||_{L²} = ||mû||_{L²} ≤ ||m||_{L^∞} ||û||_{L²} = ||m||_{L^∞} ||u||_{L²}   (10.2.23)

implying that ||T|| ≤ ||m||_{L^∞}. As in the case of the ordinary multiplication operator one can show that equality must hold.

Note that formally we have

    Tu(x) = (1/(2π)^N) ∫_{R^N} e^{ix·y} m(y) (∫_{R^N} e^{-iz·y} u(z) dz) dy   (10.2.24)
          = (1/(2π)^N) ∫_{R^N} u(z) (∫_{R^N} m(y) e^{i(x-z)·y} dy) dz        (10.2.25)
          = ∫_{R^N} u(z)h(x - z) dz                                          (10.2.26)

provided that ĥ(y) = m(y)/(2π)^{N/2}. Thus the Fourier multiplication operator appears to be just a special kind of convolution operator. However m ∈ L^∞(R^N) could happen even if h ∉ L^p(R^N) for any p, in which case the above discussion about convolution operators is not applicable. A trivial example of this is when m(y) ≡ 1, corresponding to T being the identity mapping and h being the delta function.
A significant example is obtained by taking N = 1 and m(y) = -i sgn(y). By (8.8.5) we see that m(y) = √(2π) ĥ(y) if h(x) = (1/π) pv(1/x), where here the Fourier transform is meant in the sense of distributions. Thus, we have at least formally that

    Tu(x) = ((1/π) pv(1/x) ∗ u)(x) = (1/π) pv ∫_{-∞}^{∞} u(y)/(x - y) dy   (10.2.27)

This operator is known as the Hilbert transform, and will be from now on denoted by H. Since we have not rigorously established the validity of the formulas (10.2.27), or even explained why the principal value integral in (10.2.27) should exist in general for u ∈ L²(R), we will always use the above, completely unambiguous definition of H as a Fourier multiplication operator when anything needs to be proved. For example, since |m(y)| ≡ 1, we get |(Hu)^(y)| ≡ |û(y)| and then

    ||Hu||_{L²} = ||(Hu)^||_{L²} = ||û||_{L²} = ||u||_{L²}   (10.2.28)

and in particular ||H|| = 1 as an operator on L²(R). The Hilbert transform is the archetypical example of a singular integral operator, see for example Chapter II of [34].
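Since H is defined through the multiplier m(y) = -i sgn(y), it can be approximated with the discrete Fourier transform; the sketch below (with arbitrary grid parameters) recovers the classical fact that H maps cos to sin:

import numpy as np

n, L = 4096, 40*np.pi
t = np.linspace(0, L, n, endpoint=False)
u = np.cos(t)                                  # H(cos) should be sin

k = np.fft.fftfreq(n, d=L/n)                   # discrete frequencies; only sgn(k) matters
Hu = np.fft.ifft(-1j*np.sign(k)*np.fft.fft(u)).real
print(np.max(np.abs(Hu - np.sin(t))))          # small: H cos = sin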
A Fourier multiplication operator is often referred to as a filter, especially in the electrical engineering and signal processing literature. The idea here is that if u = u(t), t ∈ R represents a signal, then û(k) corresponds to the signal in the 'frequency domain', in the sense that the Fourier inversion formula

    u(t) = (1/√(2π)) ∫_{-∞}^{∞} e^{ikt} û(k) dk   (10.2.29)

represents the signal as a superposition of fixed frequency signals e^{ikt}, with û(k) then being the weight given to the component of frequency k. The effect of a filter is thus to modify the frequency component û(k) by multiplying it by m(k). The operator T coming from the choice

    m(k) = 1   |k| < k_0
    m(k) = 0   |k| ≥ k_0        (10.2.30)

leaves low frequencies (|k| < k_0) unchanged and removes all of the high frequency components, and is for this reason sometimes called an ideal low-pass filter. Likewise 1 - m(k) gives an ideal high-pass filter. A band-pass filter would be one for which m(k) = 1 on some interval of frequencies [k_1, k_2] and is zero otherwise.
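A discrete version of the ideal low-pass filter is a one-line computation with the FFT; in the following sketch (signal and cutoff chosen arbitrarily) the high frequency component is removed exactly:

import numpy as np

n, L = 2048, 2*np.pi
t = np.linspace(0, L, n, endpoint=False)
u = np.sin(3*t) + 0.5*np.sin(40*t)             # low- and high-frequency components

k = np.fft.fftfreq(n, d=L/n)*2*np.pi           # angular frequencies
m = (np.abs(k) < 10).astype(float)             # m(k) = 1 for |k| < k0 = 10, else 0
filtered = np.fft.ifft(m*np.fft.fft(u)).real
print(np.max(np.abs(filtered - np.sin(3*t))))  # ≈ 0: only the |k| < 10 component survives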
Example 10.8. If H is a Hilbert space and M ⊂ H is a closed subspace, we have seen
in Chapter 6 that the orthogonal projection PM is a linear operator defined on all of H.
It is immediate from the relation (6.4.10) that ||PM x|| ≤ ||x|| for all x ∈ H, and aside
from the trivial case PM = 0 there must exist x ∈ H, x 6= 0 such that PM x = x, from
which it follows that ||PM || = 1.
Example 10.9. Let X = Y = ℓ² (the sequence space defined in Example 6.3). If x = {x_1, x_2, . . . } ∈ ℓ² set

    S_+ x = {0, x_1, x_2, . . . }   (10.2.31)
    S_- x = {x_2, x_3, . . . }      (10.2.32)

which are called respectively the right and left shift operators on ℓ². Clearly ||S_+ x|| = ||x|| for any x, and ||S_- x|| ≤ ||x|| with equality if x_1 = 0. Thus ||S_+|| = ||S_-|| = 1. Note that S_- S_+ = I (the identity map), while S_+ S_- = P_M where M is the closed subspace M = {x ∈ ℓ² : x_1 = 0}.
Example 10.10. Let Ω be an open set in R^N, m a positive integer and

    Tu(x) = Σ_{|α|≤m} a_α(x) D^α u   (10.2.33)

where the coefficients a_α ∈ C(Ω). If X = Y = L^p(Ω), 1 ≤ p < ∞ then we can let D(T) = C^m(Ω), which is a dense subset of X (since it contains C_0^∞(Ω), for example). Thus T is a densely defined linear operator, but it is not bounded in general. For example, take X = Y = L²(0, 1), Tu = u' and u_n(x) = sin nπx. Then by explicit calculation we find ||u_n|| = 1/√2 and ||Tu_n|| = nπ/√2, so that ||Tu_n||/||u_n|| → ∞ as n → ∞.

Note that in the constant coefficient case with Ω = R^N we have (Tu)^(y) = P(y)û(y), provided u is a tempered distribution, where P is the characteristic polynomial of L as discussed earlier in Section 9.3. Thus T is formally a Fourier multiplication operator but with a multiplier m(y) = P(y) which is not in L^∞.
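The blow-up of ||Tu_n||/||u_n|| is easily reproduced numerically; a sketch:

import numpy as np
from scipy.integrate import quad

for n in [1, 5, 25]:
    un  = lambda x: np.sin(n*np.pi*x)
    dun = lambda x: n*np.pi*np.cos(n*np.pi*x)
    nu  = np.sqrt(quad(lambda x: un(x)**2, 0, 1, limit=200)[0])
    ndu = np.sqrt(quad(lambda x: dun(x)**2, 0, 1, limit=200)[0])
    print(n, ndu/nu)                           # equals nπ, unbounded as n grows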
Example 10.11. A pseudodifferential operator (ΨDO) is an operator of the form

    Tu(x) = ∫_{R^N} a(x, y) e^{ix·y} û(y) dy   (10.2.34)

for some function a, known as the symbol of T. If a(x, y) = a(y) with a ∈ L^∞(R^N) then T is a Fourier multiplication operator, while if a = a(x) it is an ordinary multiplication operator.
10.3 Linear operator equations
Given a linear operator T : X → Y, we wish to study the operator equation

    Tu = f   (10.3.1)

where f is a given member of Y. In the usual way, if T is one-to-one, i.e. if N(T) = {0}, then we may define the corresponding inverse operator T⁻¹ : R(T) → D(T). It is easy to check that T⁻¹ is also linear when it exists, but it need not be bounded even if T is, or it may be bounded even if T is not. Some key questions which always arise in connection with (10.3.1) are:
• For what f ’s does there exist a solution u, i.e. what is the range R(T )?
• If a solution exists, is it unique? If not, how can we describe the set of all solutions?
Since any two solutions differ by a solution of T u = 0 this amounts to characterizing
the null space N (T ).
The investigation of these questions will clearly require us to be precise about what
the spaces X, Y are. For reasons which will become more apparent below, we will mostly
focus on the case that X = Y = H, a Hilbert space, but the study of more general
situations can be found in more advanced texts.
Let us first consider the case when X = Cn , Y = Cm so T u = Au for some m × n
matrix A = [akj ]. Well known results from linear algebra tell us
• R(T ) is the column space of A, i.e., the set of all linear combinations of the columns
of A.
• R(T ) = N (T ∗ )⊥ , where T ∗ is the matrix multiplication operator with matrix A∗ ,
the conjugate transpose (or Hermitian conjugate, or adjoint matrix) of A.
The second item provides a complete characterization of when T u = f is solvable, namely,
a solution exists if and only if f ⊥ v for every v ∈ N (T ∗ ). If the subspace N (T ∗ ) has the
basis {v1 , . . . vp } then it is equivalent to require hf, vk i = 0, k = 1, . . . p. This amounts to
p solvability, or consistency, conditions on f , which are necessary and sufficient for the
existence of a solution of T u = f . Eventually we will prove a version of this statement
in a Hilbert space setting, for certain types of operator T . The main point, at present,
is that the operator T ∗ plays a key role in understanding the solvability of T u = f , and
so something similar can be expected in the infinite dimensional case. The operator T ∗
is the so-called adjoint operator of T , and in the next section we show how it can be
defined at least in the case that T is bounded. The case of unbounded T is more subtle,
and will be taken up in the following chapter.
10.4 The adjoint operator
In the finite dimensional example of the previous section, note that T* has the property

    ⟨Tu, v⟩ = ⟨u, T*v⟩   ∀u ∈ C^n, v ∈ C^m   (10.4.1)

since either side is equal to Σ_{k=1}^m Σ_{j=1}^n a_{kj} u_j \overline{v_k}.
Now suppose X = Y = H, a Hilbert space, and T is a bounded linear operator on H. With the above motivation we seek another bounded linear operator T* with the property that

    ⟨Tu, v⟩ = ⟨u, T*v⟩   ∀u, v ∈ H   (10.4.2)

If such a T* can be found, observe that if there exists any solution u of Tu = f then we have

    ⟨f, v⟩ = ⟨Tu, v⟩ = ⟨u, T*v⟩ = ⟨u, 0⟩ = 0   (10.4.3)

for any v ∈ N(T*), so that f ⊥ v must hold for all such v. We have thus shown that R(T) ⊥ N(T*), or equivalently

    R(T) ⊂ N(T*)^⊥   N(T*) ⊂ R(T)^⊥   (10.4.4)
where the second inclusion follows from the first and the fact that N (T ∗ ) is closed.
In particular f ⊥ N (T ∗ ) is a necessary condition for the solvability of T u = f . The
sufficiency of this condition need not be true in general as we will see by examples, but
it does hold for some important classes of operator T .
Theorem 10.2. If H is a Hilbert space and T ∈ B(H) then there exists a unique T* ∈ B(H), the adjoint of T, such that (10.4.2) holds. In addition, ||T*|| = ||T||.

Proof: Fix v ∈ H and let ℓ(u) = ⟨Tu, v⟩. Clearly ℓ is linear on H and

    |ℓ(u)| = |⟨Tu, v⟩| ≤ ||Tu|| ||v|| ≤ ||T|| ||u|| ||v||   (10.4.5)

and therefore ℓ ∈ H* with ||ℓ|| ≤ ||T|| ||v||. By the Riesz Representation Theorem there exists a unique v* ∈ H such that

    ℓ(u) = ⟨u, v*⟩   ∀u ∈ H   (10.4.6)

We define T*v = v*, so that clearly T* : H → H and (10.4.2) is true. We claim next that T* is linear. To see this, note that for any v_1, v_2 ∈ H, u ∈ H and scalars c_1, c_2

    ⟨u, T*(c_1 v_1 + c_2 v_2)⟩ = ⟨Tu, c_1 v_1 + c_2 v_2⟩                                  (10.4.7)
                               = \overline{c_1} ⟨Tu, v_1⟩ + \overline{c_2} ⟨Tu, v_2⟩       (10.4.8)
                               = \overline{c_1} ⟨u, T*v_1⟩ + \overline{c_2} ⟨u, T*v_2⟩     (10.4.9)
                               = ⟨u, c_1 T*v_1 + c_2 T*v_2⟩                                (10.4.10)
Since u is arbitrary we must have T ∗ (c1 v1 + c2 v2 ) = c1 T ∗ v1 + c2 T ∗ v2 as needed.
Next we claim that T* is bounded. To verify this, note that ||T*v|| = ||v*|| = ||ℓ|| ≤ ||T|| ||v||, implying that

    ||T*|| ≤ ||T||   (10.4.11)
To check the uniqueness property suppose that there exists some other bounded
linear operator S such that hT u, vi = hu, Svi for all u, v ∈ H. It would then follow that
hu, T ∗ v − Svi = 0 for all u, implying T ∗ v = Sv for all v, in other words S = T ∗ must
hold.
Finally we show that ||T*|| = ||T||. Since T* ∈ B(H) it also has an adjoint T** satisfying ⟨T*u, v⟩ = ⟨u, T**v⟩ for all u, v. But we also have

    ⟨T*u, v⟩ = \overline{⟨v, T*u⟩} = \overline{⟨Tv, u⟩} = ⟨u, Tv⟩   (10.4.12)

so by uniqueness of the adjoint we must have T** = T. But then from (10.4.11) with T replaced by T* it follows that ||T|| = ||T**|| ≤ ||T*||, and so from (10.4.11) again we obtain ||T|| = ||T*||.
Certain special classes of operator are defined, depending on the relationship between
T and T ∗ .
Definition 10.3. If T ∈ B(H) then
• If T ∗ = T we say T is self-adjoint.
• If T ∗ = −T we say T is skew-adjoint.
• If T ∗ = T −1 we say T is unitary.
Proposition 10.3. If S, T ∈ B(H) then ST ∈ B(H) and

    (ST)* = T*S*   (10.4.13)

If T⁻¹ ∈ B(H) then (T*)⁻¹ ∈ B(H) and

    (T⁻¹)* = (T*)⁻¹   (10.4.14)

The proofs of these two properties will be left for the exercises.
10.5 Examples of adjoints
We now revisit several of the examples from Section 10.2, with focus on computing the
corresponding adjoint operators. We remark that the uniqueness assertion of Theorem
10.2 is a relatively elementary thing, but note how it gets used repeatedly below to
establish what the adjoint of a given operator T is.
Example 10.12. In the case H = C^n with Tu = Au, A an n × n matrix, we already know that

    ⟨Tu, v⟩ = ⟨Au, v⟩ = ⟨u, A*v⟩   (10.5.1)

where A* is the conjugate transpose matrix of A. Thus by uniqueness T*v = A*v, as expected. T is then obviously self-adjoint if A* = A, consistent with the usual definition from linear algebra. A is also said to be a Hermitian matrix in this case, or symmetric in the real case. Likewise the meaning of a skew-adjoint operator or unitary operator coincides with the way the terms are normally used in linear algebra.

Note that we haven't considered here the case of an m × n matrix with m ≠ n, since then the domain and range spaces would be different, requiring a somewhat different way of defining the adjoint.
Example 10.13. Consider the multiplication operator Tu(x) = m(x)u(x) on L²(Ω), where m ∈ L^∞(Ω). Then

    ⟨Tu, v⟩ = ∫_Ω m(x)u(x)\overline{v(x)} dx = ∫_Ω u(x) \overline{\overline{m(x)}v(x)} dx   (10.5.2)

from which it follows that T*v(x) = \overline{m(x)}v(x). T is self-adjoint if m is real valued, skew-adjoint if m is purely imaginary, and unitary if |m(x)| ≡ 1.
Example 10.14. Next we look at the integral operator (10.2.11) on L²(Ω), with K ∈ L²(Ω × Ω) so that T is bounded. Assuming that the use of Fubini's theorem below can be justified, we get

    ⟨Tu, v⟩ = ∫_Ω (∫_Ω K(x, y)u(y) dy) \overline{v(x)} dx   (10.5.3)
            = ∫_Ω u(y) (∫_Ω K(x, y)\overline{v(x)} dx) dy   (10.5.4)

which is the same as ⟨u, T*v⟩ if and only if

    T*v(y) = ∫_Ω \overline{K(x, y)} v(x) dx   (10.5.5)

or equivalently

    T*v(x) = ∫_Ω \overline{K(y, x)} v(y) dy   (10.5.6)

Thus T* is the integral operator with kernel \overline{K(y, x)}, and note again the formal similarity to the case of the matrix multiplication operator. The use of Fubini's theorem to exchange the order of integrals above can be justified by observing that K(x, y)u(y)\overline{v(x)} ∈ L¹(Ω × Ω) under our assumptions (Exercise 9). T will be self-adjoint, for example, if K is real valued and symmetric in x, y.
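The relation ⟨Tu, v⟩ = ⟨u, T*v⟩, with T* given by the conjugate transposed kernel, survives discretization; the following Python sketch (kernel and functions chosen arbitrarily) checks it on a midpoint grid:

import numpy as np

n = 400
x = (np.arange(n) + 0.5)/n                     # midpoint grid on Ω = (0,1), weight 1/n
K = np.exp(1j*np.outer(x, x))                  # a sample complex kernel K(x, y)
u = np.sin(np.pi*x) + 1j*x
v = np.cos(np.pi*x)

Tu  = (K @ u)/n                                # (Tu)(x_i) ≈ Σ_j K(x_i, y_j) u(y_j) / n
Tsv = (K.conj().T @ v)/n                       # adjoint kernel conj(K(y, x))
inner = lambda a, b: (a*b.conj()).sum()/n
print(inner(Tu, v), inner(u, Tsv))             # the two inner products agree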
Example 10.15. Consider next T = F, the Fourier transform on L²(R^N). Based on the previous example we may expect that

    T*v(x) = (1/(2π)^{N/2}) ∫_{R^N} e^{ix·y} v(y) dy   (10.5.7)

since the kernel here is the conjugate transpose of that for T. This is correct, but can't be proven as above since the use of Fubini's theorem can't be directly justified. Instead we proceed by first recalling the Parseval identity (8.5.17)

    ∫_{R^N} û(x)v(x) dx = ∫_{R^N} u(x)v̂(x) dx   (10.5.8)

Thus

    ⟨Tu, v⟩ = ∫_{R^N} û(x)\overline{v(x)} dx = ∫_{R^N} u(x)(\overline{v})^(x) dx   (10.5.9)

so that T*v(x) = \overline{(\overline{v})^(x)}. One can now check, by unwinding the definitions, that this is the same as T*v(x) = Tv(-x), which amounts to (10.5.7). Furthermore, we recognize from the Fourier inversion theorem that (10.5.7) may be restated as

    T*v = T⁻¹v   (10.5.10)

so in particular the Fourier transform is a unitary operator.
Example 10.16. If T is the Fourier multiplication operator T = F⁻¹SF on L²(R^N), where S is the multiplication operator with L^∞ multiplier m, then we can obtain that T* = F⁻¹S*F, i.e. T* is the Fourier multiplication operator with multiplier \overline{m}. In particular, the Hilbert transform is skew-adjoint, H* = -H, since \overline{m(y)} = -m(y) in this case.
10.6 Conditions for solvability of linear operator equations
Let us return now to the general study of operator equations T u = f , when T is a
bounded linear operator on a Hilbert space H.
Proposition 10.4. If T ∈ B(H) then N (T ∗ ) = R(T )⊥ .
Proof: By (10.4.4) we have N (T ∗ ) ⊂ R(T )⊥ . Conversely, if v ∈ R(T )⊥ then hu, T ∗ vi =
hT u, vi = 0 for all u ∈ H. Thus T ∗ v = 0 must hold so v ∈ N (T ∗ ).
Since M^⊥⊥ = \overline{M} for any subspace M we get immediately

Corollary 10.1. If T ∈ B(H) then N(T*)^⊥ = \overline{R(T)}.

Corollary 10.2. If T ∈ B(H) has closed range then Tu = f has a solution if and only if f ⊥ N(T*).
Since N (T ∗ )⊥ is always a closed subspace, clearly the identity R(T ) = N (T ∗ )⊥ can
only hold if T has closed range. This is not true in general, although it holds in the finite
dimensional case, by Theorem 5.1.
Example 10.17. Let H = L²(0, 1) and Tu(x) = ∫_0^x u(y) dy. We may think of this operator as the special case of (10.2.11) in which

    K(x, y) = 1   y < x
    K(x, y) = 0   y > x      (10.6.1)

This kernel is clearly in L²((0, 1) × (0, 1)) so that T ∈ B(H). Let f_n be any sequence of continuously differentiable functions such that f_n(0) = 0 for all n and f_n converges in H to f(x) = H(x - 1/2). Each f_n is in the range of T since f_n = Tu_n if u_n = f_n'. But f ∉ R(T), since the range of T contains only continuous functions. Thus T does not have closed range.
Definition 10.4. If T is any linear operator, we set rank (T ) = dim R(T ), and say that
T is a finite rank operator whenever rank (T ) < ∞.
We have thus established the following:
Corollary 10.3. If T ∈ B(H) and rank (T ) < ∞ then R(T ) = N (T ∗ )⊥ .
Aside from the completely finite dimensional situation, there are other finite rank operators which will be of interest to us.

Example 10.18. Let H = L²(0, 1) and Tu(x) = ∫_0^1 xy u(y) dy. Then R(T) = span(e) where e(x) = x, so rank(T) = 1. Here T is self-adjoint, so N(T*) = N(T) = {e}^⊥ and the conclusion of the corollary is obvious.

More generally, let H = L²(Ω) for some bounded open set Ω ⊂ R^N and let Tu(x) = ∫_Ω K(x, y)u(y) dy where

    K(x, y) = Σ_{j=1}^M φ_j(x)ψ_j(y)   (10.6.2)

for some φ_j, ψ_j ∈ L²(Ω). We may always assume that the φ_j's and ψ_j's are linearly independent. Such a kernel K is sometimes said to be degenerate. In this case we have R(T) = L(φ_1, . . . , φ_M) so that rank(T) = M. The condition f ⊥ N(T*) amounts to requiring the M solvability or consistency conditions ⟨f, φ_j⟩ = 0 for j = 1, . . . , M.
10.7 Fredholm operators and the Fredholm alternative
The following is a very useful concept.
Definition 10.5. T ∈ B(H) is of Fredholm type (or more informally, a Fredholm operator) if

• N(T), N(T*) are both finite dimensional,

• R(T) is closed.

For such an operator T we define ind(T), the index of T, as

    ind(T) = dim(N(T)) - dim(N(T*))   (10.7.1)
For our purposes the case of Fredholm operators of index 0 will be the most important
one. If we can show somehow that an operator T belongs to this class then we obtain
immediately the conclusion that ’uniqueness is equivalent to existence’. That is to say,
the property that T u = f has at most one solution for any f ∈ H is equivalent to the
property that T u = f has at least one solution for any f ∈ H. The following elaboration
of this is known as the Fredholm Alternative Theorem.
Theorem 10.3. Let T ∈ B(H) be a Fredholm operator of index 0. Then either
1. N (T ) = N (T ∗ ) = {0} and the equation T u = f has a unique solution for every
f ∈ H, or
2. dim(N (T )) = dim(N (T ∗ )) = M > 0, the equation T u = f has a solution u∗ if
and only if f satisfies the M compatibility conditions f ⊥ N (T ∗ ), and the general
solution of T u = f can be written as {u = u∗ + v : v ∈ N (T )}.
Example 10.19. Every linear operator on CN is of Fredholm type and index 0, since
by a well known fact from matrix theory, a matrix and its transpose have null spaces of
the same dimension.
In the infinite dimensional situation it is easy to find examples of nonzero index – the
simplest example is a shift operator.
Example 10.20. If we define S+ , S− as in (10.2.31), (10.2.32) then by Exercise 10
S+∗ = S− , S−∗ = S+ , and it is then easy to see that ind (S+ ) = −1 and ind (S− ) = 1.
Clearly by shifting to the left or right by more than one entry, we can create an example
of a Fredholm operator with any integer as its index.
Example 10.21. We will see in Chapter 13 that the operator λI + T , where T is an
integral operator of the form (10.2.11), with K ∈ L2 (Ω × Ω) and λ 6= 0, is always a
Fredholm operator of index 0.
10.8 Convergence of operators
Recall that if X, Y are Banach spaces we have defined a norm on B(X, Y) for which all
of the norm axioms are satisfied, so that B(X, Y) is a normed linear space, and in fact
is itself a Banach space.
Definition 10.6. We say Tn → T uniformly if ||Tn − T || → 0, i.e. Tn → T in the
topology of B(X, Y). We say Tn → T strongly if Tn x → T x for every x ∈ X.
Clearly uniform convergence implies strong convergence, but the converse is false
(see Exercise 17). As usual we can define an infinite series of operators as the limit of
the partial sums, and speak of uniform or strong convergence of the series. The series
Σ_{n=1}^∞ Tn will converge uniformly to some limit T ∈ B(X) if

Σ_{n=1}^∞ ||Tn || < ∞          (10.8.1)

and in this case ||T || ≤ Σ_{n=1}^∞ ||Tn || (see Exercise 18). An important special case is given
by the following.
Theorem 10.4. If T ∈ B(X), λ ∈ C and ||T || < |λ| then (λI − T )−1 ∈ B(X),

(λI − T )−1 = Σ_{n=0}^∞ T n /λn+1          (10.8.2)

where the series is uniformly convergent, and

||(λI − T )−1 || ≤ 1/(|λ| − ||T ||)          (10.8.3)
Proof: If Tn is replaced by T n /λn+1 then clearly (10.8.1) holds for the series on the right
hand side of (10.8.2), so it is uniformly convergent to some S ∈ B(X). If SN denotes the
N ’th partial sum then

SN (λI − T ) = I − T N +1 /λN +1          (10.8.4)

Since ||T N +1 /λN +1 || ≤ (||T ||/|λ|)N +1 → 0 we obtain S(λI − T ) = I in the limit as
N → ∞. Likewise (λI − T )S = I, so that (10.8.2), and subsequently (10.8.3), holds.
The formula (10.8.2) is easily remembered as the ’geometric series’ for (λI − T )−1 .
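The following numerical sketch (an illustration, not part of the notes) confirms both (10.8.2) and the bound (10.8.3) for a random matrix rescaled so that ||T || < |λ|.

```python
# Numerical sketch: partial sums of the geometric series (10.8.2)
# converge to the inverse of (lambda*I - T) when ||T|| < |lambda|.
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
T *= 0.5 / np.linalg.norm(T, 2)    # rescale so that ||T|| = 0.5
lam = 1.0                          # then ||T|| < |lam|

S = np.zeros_like(T)
Tn = np.eye(5)                     # running power T^n
for n in range(200):
    S += Tn / lam**(n + 1)         # accumulate T^n / lam^(n+1)
    Tn = Tn @ T

R = np.linalg.inv(lam * np.eye(5) - T)
print(np.linalg.norm(S - R))                             # essentially 0
print(np.linalg.norm(R, 2) <= 1 / (lam - 0.5) + 1e-12)   # bound (10.8.3): True
```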
10.9 Exercises
In these exercises assume that X is a Banach space and H is a Hilbert space.
1. If T1 , T2 ∈ B(X) show that ||T1 + T2 || ≤ ||T1 || + ||T2 ||, ||T1 T2 || ≤ ||T1 || ||T2 ||, and
||T n || ≤ ||T ||n .
2. If A = ( 1 −2 ; 3 4 ), compute the Frobenius norm of A and ||A||p for p = 1, 2 and ∞.
3. Prove Proposition 10.2.
4. Define the averaging operator

T u(x) = (1/x) ∫₀ˣ u(y) dy

Show that T is bounded on Lp (0, ∞) for 1 < p < ∞. (Suggestions: Assume first
that u ≥ 0 and is a continuous function of compact support. If v = T u show that

∫₀∞ v p (x) dx = −p ∫₀∞ v p−1 (x) x v′ (x) dx

Note that xv′ = u − v and apply Hölder’s inequality. Then derive the general case.
The resulting inequality is known as Hardy’s inequality.)
5. Let T be the Fourier multiplication operator on L2 (R) with multiplier m(y) = H(y)
(the Heaviside function), and define
M+ = {u ∈ L2 (R) : û(y) = 0 ∀y < 0} M− = {u ∈ L2 (R) : û(y) = 0 ∀y > 0}
a) Show that T = (1/2)(I + iH), where H is the Hilbert transform.
b) Show that if u is real valued, then u is uniquely determined by either the real or
imaginary part of T u.
c) Show that L2 (R) = M+ ⊕ M− .
d) Show that T = PM+ .
e) If u ∈ M+ show that u = iHu. In particular, if u(x) = α(x) + iβ(x) then
β = Hα
α = −Hβ
(Comments: T u is sometimes called the analytic signal of u. This terminology
comes from the fact that T u can be shown to always have an extension as an analytic
function to the upper half of the complex plane. It is often convenient to work with
T u instead of u, because it avoids ambiguities due to k and −k really being the same
frequency – the analytic signal has only positive frequency components. By b), u
and T u are in one-to-one correspondence, at least for real signals. The relationships
between α and β in e) are sometimes called the Kramers-Kronig relations. Note
that it means that M+ contains no purely real valued functions except for u = 0,
and likewise for M− .)
6. Show that a linear operator T : Cn → Cm is always bounded for any choice of norms
on Cn and Cm .
7. If T, T −1 ∈ B(H) show that (T ∗ )−1 ∈ B(H) and (T −1 )∗ = (T ∗ )−1 .
8. If S, T ∈ B(H), show that
(i) (S + T )∗ = S ∗ + T ∗
(ii) (ST )∗ = T ∗ S ∗
(These properties, together with (iii) (λT )∗ = λ̄T ∗ for scalars λ and (iv) T ∗∗ = T ,
which we have already proved, are the axioms for an involution on B(H), that is to
say the mapping T 7−→ T ∗ is an involution. The term involution is also used more
generally to refer to any mapping which is its own inverse.)
9. Give a careful justification of how (10.5.4) follows from (10.5.3) with reference to
an appropriate version of Fubini’s theorem.
10. Let S+ , S− be the right and left shift operators on `2 . Show that S+ = S−∗ and
S− = S+∗ .
11. Let T be the Volterra integral operator T u(x) = ∫₀ˣ u(y) dy, considered as an
operator on L2 (0, 1). Find T ∗ and N (T ∗ ).
12. Suppose T ∈ B(H) is self-adjoint and there exists a constant c > 0 such that
||T u|| ≥ c||u|| for all u ∈ H. Show that there exists a solution of T u = f for all
f ∈ H. Show by example that the conclusion may be false if the assumption of
self-adjointness is removed.
13. Let M be the multiplication operator M u(x) = xu(x) in L2 (0, 1). Show that R(M )
is dense but not closed.
14. An operator T ∈ B(H) is said to be normal if it commutes with its adjoint, i.e.
T ∗ T = T T ∗ . Thus, for example, any self-adjoint, skew-adjoint, or unitary operator
is normal. For a normal operator T show that
a) ||T u|| = ||T ∗ u|| for every u ∈ H.
b) T is one to one if and only if it has dense range.
c) Show that any Fourier multiplication operator, as in Example 10.7, is normal in
L2 (Ω).
d) Show that the shift operators S+ , S− are not normal in `2 .
15. If U(H) denotes the set of unitary operators on H, show that U(H) is a group under
composition. Is U(H) a subspace of B(H)?
16. Prove that if T is the Fourier transform regarded as a linear operator from L1 (RN )
into C0 (RN ) then ||T || = 1/(2π)N/2 .
17. Give an example of a sequence Tn ∈ B(H) which is strongly convergent but not
uniformly convergent.
18. If Tn ∈ B(X) and Σ_{n=1}^∞ ||Tn || < ∞, show that the series Σ_{n=1}^∞ Tn is uniformly
convergent. In particular verify that the operator exponential

e^T := Σ_{n=0}^∞ T n /n!

is well defined for any T ∈ B(X) and satisfies ||e^T || ≤ e^{||T ||} .
19. If T : D(T ) ⊂ X → Y is a linear operator, then S is a left inverse of T if ST x = x
for every x ∈ D(T ) and is a right inverse if T Sx = x for every x ∈ R(T ). If X = Y
is finite dimensional then it is known from linear algebra that a left inverse must
also be a right inverse. Show by examples that this is false if X 6= Y or if X = Y
is infinite dimensional.
20. If T ∈ B(H), the numerical range of T is the set

{λ ∈ C : λ = ⟨T x, x⟩/⟨x, x⟩ for some x ∈ H, x ≠ 0}

If T is self-adjoint show that the numerical range of T is contained in the interval
[−||T ||, ||T ||] of the real axis. What is the corresponding statement for a skew-adjoint
operator?
Chapter 11
Unbounded operators
11.1 General aspects of unbounded linear operators
Let us return to the general definition of linear operator given at the beginning of the previous chapter, without any assumption about continuity of the operator. For simplicity
we will assume a Hilbert space setting, although much of what is stated below remains
true for mappings between Banach spaces. We have the following essential definition.
Definition 11.1. If H is a Hilbert space and T : D(T ) ⊂ H → H is a linear operator
then we say T is closed if whenever un ∈ D(T ), un → u and T un → v then u ∈ D(T )
and T u = v.
We emphasize that this definition is strictly weaker than continuity of T , since for
a closed operator it is quite possible that un → u but the image sequence {T un } is
divergent. This could not happen for a bounded linear operator. It is simple to check
that any T ∈ B(H) must be closed.
A common alternate way to define a closed operator employs the concept of the graph
of T .
Definition 11.2. If T : D(T ) ⊂ H → H is a linear operator then we define the graph
of T to be
G(T ) = {(u, v) ∈ H × H : v = T u}
(11.1.1)
The definition of G(T ) (and for that matter the definition of closedness) makes sense
even if T is not linear, but it is mostly useful in the linear case. It is easy to check that
H × H is a Hilbert space with the inner product
⟨(u1 , v1 ), (u2 , v2 )⟩ = ⟨u1 , u2 ⟩ + ⟨v1 , v2 ⟩          (11.1.2)
In particular, (un , vn ) → (u, v) in H × H if and only if un → u and vn → v in H. One
may now verify (Exercise 2)
Proposition 11.1. T : D(T ) ⊂ H → H is a closed linear operator if and only if G(T )
is a closed subspace of H × H.
We emphasize that closedness of T does not mean that D(T ) is closed – this is false
in general. In fact we have the so-called Closed Graph Theorem,
Theorem 11.1. If T is a closed linear operator and D(T ) is a closed subspace of H,
then T must be continuous on D(T ).
We refer to Theorem 2.15 of [31] or Theorem 2.9 of [5] for a proof. In particular if T
is closed and unbounded then D(T ) cannot be all of H.
By far the most common type of unbounded operator which we will be interested in
is the differential operator. For use in the next example, let us recall that a function f
defined on a closed interval [a, b] is absolutely continuous on [a, b] (f ∈ AC([a, b])) if for
any ε > 0 there exists δ > 0 such that if {(ak , bk )}_{k=1}^n is a disjoint collection of intervals
in [a, b] and Σ_{k=1}^n |bk − ak | < δ, then Σ_{k=1}^n |f (bk ) − f (ak )| < ε. Clearly an absolutely
continuous function is continuous.
Theorem 11.2. The following are equivalent.
1. f is absolutely continuous on [a, b].
2. f is differentiable a.e. on [a, b], f ′ ∈ L1 (a, b) and

f (x) = f (a) + ∫ₐˣ f ′ (y) dy   ∀x ∈ [a, b]          (11.1.3)

3. f ∈ W 1,1 (a, b) and its distributional derivative coincides with its pointwise a.e.
derivative.
Here, the equivalence of 1 and 2 is an important theorem of analysis, see for example
Theorem 11, section 6.5 of [28], Theorem 7.29 of [38] or Theorem 7.20 of [30], while the
equivalence of 2 and 3 follows from Theorem 9.2 and the definition of the Sobolev space
W 1,1 .
Example 11.1. Let H = L2 (0, 1) and T u = u′ on the domain

D(T ) = {u ∈ H 1 (0, 1) : u(0) = 0}          (11.1.4)

Here D(T ) is a dense subspace of H, since it contains D(0, 1), for example, but is not all
of H, and T is unbounded, as in Example 10.10. We claim that T is closed. To see this,
suppose un ∈ D(T ), un → u in H and vn = un′ → v in H. By our assumptions, (11.1.3)
is valid, so

un (x) = ∫₀ˣ vn (y) dy          (11.1.5)

We can then find a subsequence nk → ∞ and a subset Σ ⊂ (0, 1) such that unk (x) → u(x)
for x ∈ Σ and the complement of Σ has measure zero. For any x we also have that vnk → v
in L2 (0, x), so that passing to the limit in (11.1.5) through the subsequence nk we obtain

u(x) = ∫₀ˣ v(s) ds   ∀x ∈ Σ          (11.1.6)

If we denote the right hand side by w then it is clear that w ∈ D(T ), with w′ = v in the
sense of distributions. Since u = w a.e., u and w coincide as elements of L2 (0, 1) and so
we get the necessary conclusion that u ∈ D(T ) with u′ = v.
The proper definition of D(T ) was essential in this example. If we had defined instead
D(T ) = {u ∈ C 1 ([0, 1]) : u(0) = 0} then we would not have been able to reach the
conclusion that u ∈ D(T ).
An operator which is not closed may still be closeable, meaning that it has a closed
extension. Let us define this concept carefully.
Definition 11.3. If S, T are linear operators on H, we say that S is an extension of T
if D(T ) ⊂ D(S) and T u = Su for u ∈ D(T ). In this case we write T ⊂ S. T is closeable
if it has a closed extension.
If T is not closed, then its graph G(T ) is not closed, but it always has a closure Ḡ(T )
in the topology of H × H, which is then a natural candidate for the graph of a closed
operator which extends T . This procedure may fail however, because it may happen that
(u, v1 ), (u, v2 ) ∈ Ḡ(T ) with v1 ≠ v2 , so that Ḡ(T ) would not correspond to a single valued
operator. If we know somehow that this cannot happen, then Ḡ(T ) will be the graph of
some linear operator S (you should check that Ḡ(T ) is a subspace of H × H) which is
obviously closed and extends T , thus T will be closeable.
It is useful to have a clearer criterion for the closability of a linear operator T . Note
that if (u, v1 ), (u, v2 ) are both in Ḡ(T ), with v1 ≠ v2 , then (0, v) ∈ Ḡ(T ) for v = v1 − v2 ≠
0. This means there must exist un → 0, un ∈ D(T ) such that vn = T un → v ≠ 0. If we
can show that no such sequence un can exist, then evidently no such pair of points can
exist in Ḡ(T ), so that T will be closeable. The converse statement is also valid and is
easy to check. Thus we have established the following
Proposition 11.2. A linear operator T on H is closeable if and only if un ∈ D(T ), un → 0,
T un → v implies v = 0.
Example 11.2. Let T u = u′ on L2 (0, 1) with domain D(T ) = {u ∈ C 1 ([0, 1]) : u(0) = 0}.
We have previously observed that T is not closed, but we can check that the above
criterion holds, so that T is closeable. Let un ∈ D(T ) and un → 0, un′ → v in L2 (0, 1).
As before,

un (x) = ∫₀ˣ un′ (s) ds          (11.1.7)

Picking a subsequence nk → ∞ for which unk → 0 a.e., we get

∫₀ˣ v(s) ds = 0   a.e.          (11.1.8)

The left hand side is absolutely continuous, so equality must hold for every x ∈ [0, 1], and
by Theorem 11.2 we conclude that v = 0 a.e.
An operator which is closeable may in general have many different closed extensions.
However, there always exists a minimal extension in this case, denoted T̄ , the closure of
T , defined by G(T̄ ) = Ḡ(T ). It can be alternatively characterized as follows: T̄ is the
unique linear operator on H with the properties that (i) T ⊂ T̄ and (ii) if T ⊂ S and S
is closed then T̄ ⊂ S.
If T : D(T ) ⊂ H → H and S : D(S) ⊂ H → H are closed linear operators then
the sum S + T is defined and linear on D(S + T ) = D(S) ∩ D(T ), but it is not closed,
in general. Choose, for example, any closed, densely defined and unbounded linear operator
T and S = −T . Then the sum S + T is the zero operator on the dense domain
D(S + T ) = D(T ) ≠ H, which is not closed. In this example S + T is closeable, but even that need not be true,
see Exercise 13. One can show, however, that if T is closed and S is bounded, then S + T
is closed. Likewise the product ST is defined on D(ST ) = {x ∈ D(T ) : T x ∈ D(S)} and
need not be closed even if S, T are. If S ∈ B(H) and T is closed then T S will be closed,
but ST need not be (see Exercise 11).
Finally consider the inverse operator T −1 : R(T ) → D(T ), which is well defined if T
is one-to-one.
Proposition 11.3. If T is one-to-one and closed then T −1 is also closed.
Proof: Let un ∈ D(T −1 ), un → u and T −1 un → v. Then if vn = T −1 un we have
vn ∈ D(T ), vn → v and T vn = un → u. Since T is closed it follows that v ∈ D(T ) and
T v = u, or equivalently u ∈ R(T ) = D(T −1 ) and T −1 u = v as needed.
11.2 The adjoint of an unbounded linear operator
To some extent it is possible to define an adjoint operator, even in the unbounded case,
and obtain some results about the solvability of the operator equation T u = f analogous
to those proved earlier in the case of bounded T .
For the rest of this section we assume that T : D(T ) ⊂ H → H is linear and densely
defined. We will say that (v, v ∗ ) is an admissible pair for T ∗ if

⟨T u, v⟩ = ⟨u, v ∗ ⟩   ∀u ∈ D(T )          (11.2.1)

We then define

D(T ∗ ) = {v ∈ H : there exists v ∗ ∈ H such that (v, v ∗ ) is an admissible pair for T }          (11.2.2)

and

T ∗ v = v ∗   for v ∈ D(T ∗ )          (11.2.3)
For this to be an appropriate definition, we should check that for any v there is at most
one v ∗ for which (v, v ∗ ) is admissible. Indeed if there were two such elements, then the
difference v1∗ − v2∗ would satisfy ⟨u, v1∗ − v2∗ ⟩ = 0 for all u ∈ D(T ). Since we assume D(T )
is dense, it follows that v1∗ = v2∗ .
Note that for v ∈ D(T ∗ ), if we define φv (u) = ⟨T u, v⟩ for u ∈ D(T ), then φv is
bounded on D(T ), since

|φv (u)| = |⟨u, v ∗ ⟩| = |⟨u, T ∗ v⟩| ≤ ||u|| ||T ∗ v||          (11.2.4)
The converse statement is also true (see Exercise 5), so that it is equivalent to define
D(T ∗ ) as the set of all v ∈ H such that u → ⟨T u, v⟩ is bounded on D(T ). The domain
D(T ∗ ) always contains at least the zero element, since (0, 0) is always an admissible pair.
There are known examples for which D(T ∗ ) contains no other points (see Exercise 4).
Here is a useful characterization of T ∗ in terms of its graph G(T ∗ ) ⊂ H × H.
Proposition 11.4. If T is a densely defined linear operator on H then
G(T ∗ ) = (V (G(T )))⊥
(11.2.5)
where V is the unitary operator on H × H defined by

V (x, y) = (−y, x)   for x, y ∈ H          (11.2.6)
We leave the proof as an exercise.
Proposition 11.5. If T is a densely defined linear operator on H then T ∗ is a closed
linear operator on H.
We emphasize that it is not assumed here that T is closed.
Proof: If v1 , v2 ∈ D(T ∗ ) and c1 , c2 are scalars, then there exist unique elements v1∗ , v2∗
such that

⟨T u, v1 ⟩ = ⟨u, v1∗ ⟩   ⟨T u, v2 ⟩ = ⟨u, v2∗ ⟩   for all u ∈ D(T )          (11.2.7)

Then

⟨T u, c1 v1 + c2 v2 ⟩ = c1 ⟨T u, v1 ⟩ + c2 ⟨T u, v2 ⟩ = c1 ⟨u, v1∗ ⟩ + c2 ⟨u, v2∗ ⟩ = ⟨u, c1 v1∗ + c2 v2∗ ⟩          (11.2.8)

for all u ∈ D(T ), thus (c1 v1 + c2 v2 , c1 v1∗ + c2 v2∗ ) is an admissible pair for T ∗ . In particular
c1 v1 + c2 v2 ∈ D(T ∗ ) and

T ∗ (c1 v1 + c2 v2 ) = c1 v1∗ + c2 v2∗ = c1 T ∗ v1 + c2 T ∗ v2          (11.2.9)

To see that T ∗ is closed, let vn ∈ D(T ∗ ), vn → v and T ∗ vn → w. If u ∈ D(T ) then
we must have

⟨T u, vn ⟩ = ⟨u, T ∗ vn ⟩          (11.2.10)

Letting n → ∞ yields ⟨T u, v⟩ = ⟨u, w⟩. Thus (v, w) is an admissible pair for T ∗ , implying
that v ∈ D(T ∗ ) and T ∗ v = w, as needed.
Example 11.3. Let us reconsider the densely defined differential operator in Example
11.1. Our goal here is to find the adjoint operator T ∗ , and we emphasize that one must
determine D(T ∗ ) as part of the answer. It is typical in computing adjoints of unbounded
operators that precisely identifying the domain of the adjoint is more difficult than finding
a formula for the adjoint.
Let v ∈ D(T ∗ ) and T ∗ v = g, so that ⟨T u, v⟩ = ⟨u, g⟩ for all u ∈ D(T ). That is to say,

∫₀¹ u′ (x)v(x) dx = ∫₀¹ u(x)g(x) dx   ∀u ∈ D(T )          (11.2.11)

Let

G(x) = −∫ₓ¹ g(y) dy          (11.2.12)

so that G(1) = 0 and G′ (x) = g(x) a.e., since g is integrable. Integration by parts then
gives

∫₀¹ u(x)g(x) dx = ∫₀¹ u(x)G′ (x) dx = −∫₀¹ u′ (x)G(x) dx          (11.2.13)

since the boundary term vanishes. Thus we have

∫₀¹ u′ (x)(v(x) + G(x)) dx = 0          (11.2.14)

Now in (11.2.14) choose u(x) = ∫₀ˣ (v(y) + G(y)) dy, which is legitimate since u ∈ D(T ).
The result is that

∫₀¹ |v(x) + G(x)|2 dx = 0          (11.2.15)

which can only occur if v(x) = −G(x) = ∫ₓ¹ g(y) dy a.e., implying that T ∗ v = g = −v′ .
The above representation for v also shows that v′ ∈ L2 (0, 1) and v(1) = 0, i.e.

D(T ∗ ) ⊂ {v ∈ L2 (0, 1) : v′ ∈ L2 (0, 1), v(1) = 0}          (11.2.16)

We claim that the reverse inclusion is also correct: If v belongs to the set on the right
and u ∈ D(T ) then

⟨T u, v⟩ = ∫₀¹ u′ (x)v(x) dx = −∫₀¹ u(x)v′ (x) dx = ⟨u, −v′ ⟩          (11.2.17)

Thus (v, −v′ ) is an admissible pair for T ∗ , from which we conclude that v ∈ D(T ∗ ) and
T ∗ v = −v′ as needed.
In summary we have established that T ∗ v = −v′ with domain

D(T ∗ ) = {v ∈ L2 (0, 1) : v′ ∈ L2 (0, 1), v(1) = 0}          (11.2.18)

We remark that if we had originally defined T on the smaller domain {u ∈ C 1 ([0, 1]) :
u(0) = 0} we would have obtained exactly the same result for T ∗ as above. This is a
special case of the general fact that (T̄ )∗ = T ∗ (see Exercise 14).
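The domain swap u(0) = 0 ↔ v(1) = 0 can also be seen at the matrix level. The following finite-difference sketch (an illustration under stated assumptions, not part of the notes) discretizes T u = u′ by backward differences that build in u(0) = 0; the matrix transpose then computes −v′ with v(1) = 0 built in.

```python
# Finite-difference sketch of the adjoint pattern T*v = -v' with v(1) = 0.
import numpy as np

n, h = 10, 1.0 / 10
# (Du)_j = (u_j - u_{j-1})/h with u_0 = 0 built in (domain condition u(0) = 0)
D = (np.eye(n) - np.eye(n, k=-1)) / h

# The transpose computes (v_j - v_{j+1})/h with v_{n+1} = 0 built in, i.e. a
# difference approximation of -v' with the endpoint condition v(1) = 0.
print(np.allclose(D.T, (np.eye(n) - np.eye(n, k=1)) / h))  # True

# D.T is the adjoint of D for the h-weighted inner product <u, v> = h sum u_j v_j
u, v = np.random.rand(n), np.random.rand(n)
print(np.isclose(h * (D @ u) @ v, h * u @ (D.T @ v)))      # True
```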
Definition 11.4. If T = T ∗ we say T is self-adjoint.
It is crucial here that equality of the operators T and T ∗ must include the fact that
their domains are identical.
Example 11.4. If in the previous example we defined T u = iu′ on the same domain
we would find that T ∗ v = iv′ on the domain (11.2.18). Even though the expressions for
T , T ∗ are the same, T is not self-adjoint since the two domains are different. It does,
however, possess the property of symmetry.
Definition 11.5. We say that T is symmetric if ⟨T u, v⟩ = ⟨u, T v⟩ for all u, v ∈ D(T ).
Example 11.5. Let T u = iu′ be the unbounded operator on H = L2 (0, 1) with domain

D(T ) = {u ∈ L2 (0, 1) : u′ ∈ L2 (0, 1), u(0) = u(1) = 0}          (11.2.19)

One sees immediately that T is symmetric; however, it is still not self-adjoint, since
D(T ∗ ) ≠ D(T ). Again, see Exercise 6.
If T is symmetric and v ∈ D(T ) then (v, T v) is an admissible pair for T ∗ , thus
D(T ) ⊂ D(T ∗ ) and T ∗ v = T v for v ∈ D(T ). In other words, T ∗ is always an extension
of T whenever T is symmetric. We see, therefore, that any self-adjoint operator is closed
and any symmetric operator is closeable.
Proposition 11.6. If T is densely defined and one-to-one, and if also R(T ) is dense,
then T ∗ is also one-to-one and (T ∗ )−1 = (T −1 )∗ .
Proof: By our assumptions, S = (T −1 )∗ exists. We are done if we show ST ∗ u = u for
all u ∈ D(T ∗ ) and T ∗ Sv = v for all v ∈ D(S).
First let u ∈ D(T ∗ ) and v ∈ D(T −1 ). Then

⟨v, u⟩ = ⟨T T −1 v, u⟩ = ⟨T −1 v, T ∗ u⟩          (11.2.20)

This means (T ∗ u, u) is an admissible pair for (T −1 )∗ and (T −1 )∗ T ∗ u = u as needed.
Next, if u ∈ D(T ) and v ∈ D(S) then

⟨u, v⟩ = ⟨T −1 T u, v⟩ = ⟨T u, Sv⟩          (11.2.21)

Therefore (Sv, v) is admissible for T ∗ , so that Sv ∈ D(T ∗ ) and T ∗ Sv = v.
With a small modification of the proof, we obtain that Proposition 10.4 remains valid.
Theorem 11.3. If T : D(T ) ⊂ H → H is a densely defined linear operator then
N (T ∗ ) = R(T )⊥ .
Proof: Let f ∈ R(T ) and v ∈ N (T ∗ ). We have f = T u for some u ∈ D(T ) and

⟨f, v⟩ = ⟨T u, v⟩ = ⟨u, T ∗ v⟩ = 0          (11.2.22)

so N (T ∗ ) ⊂ R(T )⊥ . To get the reverse inclusion, let v ∈ R(T )⊥ , so that ⟨T u, v⟩ = 0 =
⟨u, 0⟩ for any u ∈ D(T ). This means (v, 0) is an admissible pair for T ∗ , so v ∈ D(T ∗ )
and T ∗ v = 0. Thus R(T )⊥ ⊂ N (T ∗ ) as needed.
Theorem 11.4. If T, T ∗ are both densely defined then T is closeable.
Proof: If we assume that T ∗ is densely defined, then T ∗∗ exists and is closed. If u ∈ D(T )
and v ∈ D(T ∗ ) then ⟨T ∗ v, u⟩ = ⟨v, T u⟩, which is to say that (u, T u) is an admissible pair
for T ∗∗ . Thus u ∈ D(T ∗∗ ) and T ∗∗ u = T u, or equivalently T ⊂ T ∗∗ . Thus T has a closed
extension, namely T ∗∗ .
There is a converse statement, which we will not prove here, see [2] section 46, or [31]
Theorem 13.12.
Theorem 11.5. If T is densely defined and closeable then T ∗ must be densely defined,
and T̄ = T ∗∗ . In particular if T is closed and densely defined then T ∗∗ = T .
11.3 Extensions of symmetric operators
It has been observed above that if T is a densely defined symmetric operator then the
adjoint T ∗ is always an extension of T . It is an interesting question whether such a T
always possesses a self-adjoint extension – the extension would necessarily be different
from T ∗ at least if T is closed, since then if T ∗ is self-adjoint so is T , by Theorem 11.5
above.
We say that a linear operator T is positive if ⟨T u, u⟩ ≥ 0 for all u ∈ D(T ).
Theorem 11.6. If T is a densely defined, positive, symmetric operator on a Hilbert space
H then T has a positive self-adjoint extension.
Proof: Define

⟨u, v⟩∗ = ⟨u, v⟩ + ⟨T u, v⟩   u, v ∈ D(T )          (11.3.1)

with corresponding norm denoted by ||u||∗ . It may be easily verified that all of the inner
product axioms are satisfied by ⟨·, ·⟩∗ on D(T ), and ||u|| ≤ ||u||∗ . Let H ∗ be the dense
subspace of H obtained as the closure of D(T ) in the || · ||∗ norm, and regard it as
equipped with the ⟨·, ·⟩∗ inner product, for which it is itself a Hilbert space. For any
z ∈ H the functional ψz (u) = ⟨u, z⟩ belongs to the dual space of H ∗ since
|ψz (u)| ≤ ||u|| ||z|| ≤ ||u||∗ ||z||, in particular ||ψz ||∗ ≤ ||z|| as a linear functional on H ∗ .
Thus by the Riesz Representation theorem there exists a unique element Λz ∈ H ∗ ⊂ H such that

ψz (u) = ⟨u, Λz⟩∗   u ∈ H ∗          (11.3.2)

with ||Λz|| ≤ ||Λz||∗ ≤ ||z||.
It may be checked that Λ : H → H is linear, and regarded as an operator on H we
claim it is also self-adjoint. To see this observe that for any u, z ∈ H we have

⟨Λu, z⟩ = ψz (Λu) = ⟨Λu, Λz⟩∗ = ⟨Λz, Λu⟩∗ = ψu (Λz) = ⟨Λz, u⟩ = ⟨u, Λz⟩          (11.3.3)

Choosing u = z we also see that Λ is positive, namely

⟨Λz, z⟩ = ⟨Λz, Λz⟩∗ ≥ 0          (11.3.4)

Next Λ is one-to-one, since if Λz = 0 and u ∈ H ∗ it follows that

0 = ⟨u, Λz⟩∗ = ⟨u, z⟩   ∀u ∈ H ∗          (11.3.5)

and since H ∗ is dense in H the conclusion follows. The range of Λ is also dense in H ∗ ,
hence in H, because otherwise there would exist a nonzero u ∈ H ∗ such that
0 = ⟨u, Λz⟩∗ = ⟨u, z⟩ for all z ∈ H, forcing u = 0, a contradiction. From the above
considerations and Proposition 11.6 we conclude that S = Λ−1 exists and is a densely
defined self-adjoint operator on H.
We will complete the proof by showing that the self-adjoint operator S − I is an
extension of T . For z, w ∈ D(T ) we have

⟨z, w⟩∗ = ⟨(I + T )z, w⟩ = ψ(I+T )z (w) = ⟨w, Λ(I + T )z⟩∗ = ⟨Λ(I + T )z, w⟩∗          (11.3.6)

and so

Λ(I + T )z = z   ∀z ∈ D(T )          (11.3.7)

by the assumed density of D(T ). In particular D(T ) ⊂ R(Λ) = D(S) and (I + T )z =
Λ−1 z = Sz for z ∈ D(T ), as needed. The positivity of S follows immediately from that
of Λ.
A positive symmetric operator may have more than one self-adjoint extension, but the
specific one constructed in the above proof is usually known as the Friedrichs extension.
To clarify what all of the objects in the proof are, it may be helpful to think of the case
that T u = −∆u on the domain D(T ) = C 2 (Ω) ∩ C0 (Ω). In this case ||u||∗ = ||u||H 1 (Ω) ,
H ∗ = H01 (Ω) (except endowed with the usual H 1 norm) and the Friedrichs extension will
turn out to be the Dirichlet Laplacian discussed in detail in Section 14.4.
The condition of positivity for T may be weakened, see Exercise 16.
11.4 Exercises
1. Let T, S be densely defined linear operators on a Hilbert space. If T ⊂ S, show
that S ∗ ⊂ T ∗ .
2. Verify that H × H is a Hilbert space with the inner product given by (11.1.2), and
prove Proposition 11.1.
3. Prove the null space of a closed operator is closed.
4. Let φ ∈ H = L2 (R) be any nonzero function and define the linear operator

T u = (∫_{−∞}^∞ u(x) dx) φ

on the domain D(T ) = L1 (R) ∩ L2 (R).
a) Show that T is unbounded and densely defined.
b) Show that T ∗ is not densely defined; more specifically, show that T ∗ is the zero
operator with domain {φ}⊥ . (Since D(T ∗ ) is not dense, it then follows from Theorem
11.5 that T is not closeable.)
5. If T : D(T ) ⊂ H → H is a densely defined linear operator, v ∈ H and the map
u → ⟨T u, v⟩ is bounded on D(T ), show that there exists v ∗ ∈ H such that (v, v ∗ ) is
an admissible pair for T ∗ .
6. Let H = L2 (0, 1) and T1 u = T2 u = iu′ with domains
D(T1 ) = {u ∈ AC[0, 1] : u(0) = u(1), u′ ∈ H}
D(T2 ) = {u ∈ AC[0, 1] : u(0) = u(1) = 0, u′ ∈ H}
Show that T1 is self-adjoint, and that T2 is closed and symmetric but not self-adjoint.
What is T2∗ ?
7. If T is symmetric and R(T ) = H show that T is self-adjoint. (Suggestion: it is
enough to show that D(T ∗ ) ⊂ D(T ).)
8. Show that if T is self-adjoint and one-to-one then T −1 is also self-adjoint. (Hint:
All you really need to do is show that T −1 is densely defined.)
9. If T is self-adjoint, S is symmetric and T ⊂ S, show that T = S. (Thus a self-adjoint
operator has no proper symmetric extension).
10. Let T, S be densely defined linear operators on H and assume that D(T + S) =
D(T )∩D(S) is also dense. Show that T ∗ +S ∗ ⊂ (T +S)∗ . Give an example showing
that T ∗ + S ∗ and (T + S)∗ may be unequal.
11. Assume that T is closed and S is bounded.
a) Show that S + T is closed.
b) Show that T S is closed, but that ST is not closed, in general.
12. Prove Proposition 11.4.
13. Let H = `2 and define

Sx = {Σ_{n=1}^∞ nxn , 4x2 , 9x3 , . . . }          (11.4.1)

T x = {0, −4x2 , −9x3 , . . . }          (11.4.2)

on D(S) = D(T ) = {x ∈ `2 : Σ_{n=1}^∞ n4 |xn |2 < ∞}. Show that S, T are closed, but S + T
is not closeable. (Hint: for example en /n → 0 but (S + T )en /n → e1 .)
14. If T is closeable, show that T and T̄ have the same adjoint.
15. Suppose that T is densely defined and symmetric with dense range. Prove that
N (T ) = {0}.
16. We say that a linear operator on a Hilbert space H is bounded below if there exists
a constant c0 > 0 such that

⟨T u, u⟩ ≥ −c0 ||u||2   ∀u ∈ D(T )

Show that Theorem 11.6 remains valid if the condition that T be positive is replaced
by the assumption that T is bounded below. (Hint: T + c0 I is positive.)
Chapter 12
Spectrum of an operator
12.1 Resolvent and spectrum of a linear operator
Let T be a densely defined linear operator on a Hilbert space H. As usual, we use I to
denote the identity operator on H.
Definition 12.1. We say that λ ∈ C is a regular point for T if λI − T is one-to-one and
onto, and (λI − T )−1 ∈ B(H). We then define ρ(T ), the resolvent set of T and σ(T ), the
spectrum of T by
ρ(T ) = {λ ∈ C : λ is a regular point for T }
σ(T ) = C\ρ(T )
(12.1.1)
Example 12.1. Let H = CN , T u = Au for some N × N matrix A. From linear algebra
we know that λI − T is one-to-one and onto (automatically with a bounded inverse)
precisely if λI − A is a non-singular matrix. Equivalently, λ is in the resolvent set if and
only if λ is not an eigenvalue of A, where the eigenvalues are the roots of the N ’th degree
polynomial det (λI − A). Thus σ(T ) consists of a finite number of points λ1 , . . . , λM ,
where 1 ≤ M ≤ N , and all other points of the complex plane make up the resolvent set
ρ(T ).
In the case of a finite dimensional Hilbert space there is thus only one kind of point
in the spectrum, where (λI − T ) is neither one-to-one nor onto. But in general there are
more possibilities. The following definition presents a traditional division of the spectrum
into three parts.
Definition 12.2. Let λ ∈ σ(T ). Then
1. If λI − T is not one-to-one then we say λ ∈ σp (T ), the point spectrum of T .
2. If λI − T is one-to-one, R(λI − T ) is dense in H, but (λI − T )−1 is not bounded,
then we say λ ∈ σc (T ), the continuous spectrum of T .
3. If λI − T is one-to-one but R(λI − T ) is not dense in H then we say λ ∈ σr (T ),
the residual spectrum of T .
Thus σ(T ) is the disjoint union of σp (T ), σc (T ) and σr (T ). The point spectrum is
also sometimes called the discrete spectrum. In the case of H = CN , σ(T ) = σp (T ) by
the above discussion, but in general all three parts of the spectrum may be non-empty,
as we will see from examples. There are further subclassifications of the spectrum which
are sometimes useful, see the exercises.
In the case that λ ∈ σp (T ) there must exist u ≠ 0 such that T u = λu, and we then
say that λ is an eigenvalue of T and u is a corresponding eigenvector. In the case that
H is a space of functions we will often refer to u as an eigenfunction instead. Obviously
any nonzero scalar multiple of an eigenvector is also an eigenvector, and the set of all
eigenvectors for a given λ, together with the zero element, make up N (T − λI), the
null space of T − λI, which will also be called the eigenspace of the eigenvalue λ. The
dimension of N (T − λI) is the multiplicity of λ and may be infinite.¹ It is easy to check
that if T is a closed operator then any eigenspace of T is closed.
The concepts of resolvent set and spectrum, and the division of the spectrum just
introduced, are closely connected with what is meant by a well-posed or ill-posed problem,
as discussed in Section 2.4, and which we can restate in somewhat more precise terms
here. If T : D(T ) ⊂ X → Y is an operator between Banach spaces X, Y (T may even
be nonlinear here) then the problem of solving the operator equation T (u) = f is said to
be well posed with respect to X, Y if
1. A solution u exists for every f ∈ Y
2. The solution is unique in X
3. The solution depends continuously on f in the sense that if T (un ) = fn and fn → f
in Y then un → u in X where u is the unique solution of T (u) = f
¹ Note this agrees with the geometric multiplicity concept in linear algebra. In general there is no meaning
for algebraic multiplicity.
If the problem is not well-posed then it is ill-posed. Now observe that if T is a linear
operator on H and λ ∈ ρ(T ) then the problem of solving λu − T u = f is well posed with
respect to H. Existence holds since λI − T is onto, uniqueness since it is one-to-one, and
the continuous dependence property follows from the fact that (λI − T )−1 is bounded.
On the other hand, the three subsets of σ(T ) correspond more or less to the failure of
one of the three conditions above: λ ∈ σp (T ) means that uniqueness fails, λ ∈ σc (T )
means that the inverse map is defined on a dense subspace on which it is discontinuous,
and λ ∈ σr (T ) implies that existence fails in a more dramatic way, namely the closure of
the range of the map is a proper subspace of H.
Because the operator (λI − T )−1 arises so frequently, we introduce the notation
Rλ = (λI − T )−1          (12.1.2)
which is called the resolvent operator of T . Thus λ ∈ ρ(T ) if and only if Rλ ∈ B(H). It
may be checked that the resolvent identity
Rλ − Rµ = (µ − λ)Rλ Rµ
(12.1.3)
is valid (see Exercise 2).
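As a quick sanity check, the resolvent identity can be verified numerically for a matrix. The following sketch (an illustration, not part of the notes) uses two points λ, µ of large modulus, which necessarily lie in the resolvent set.

```python
# Numerical sketch: the resolvent identity (12.1.3) for a random matrix.
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4))
I = np.eye(4)

lam, mu = 10.0, -7.0   # large enough in modulus to be regular points
R_lam = np.linalg.inv(lam * I - T)
R_mu = np.linalg.inv(mu * I - T)

# R_lam - R_mu = (mu - lam) R_lam R_mu; in particular R_lam, R_mu commute
print(np.allclose(R_lam - R_mu, (mu - lam) * R_lam @ R_mu))  # True
print(np.allclose(R_lam @ R_mu, R_mu @ R_lam))               # True
```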
Below we will look at a number of examples of operators and their spectra, but first
we will establish a few general results. Among the most fundamental of these is that
the resolvent set of any linear operator is open, so that the spectrum is closed. More
generally, the property of being in the resolvent set is preserved under any sufficiently
small bounded perturbation.
Proposition 12.1. Let T, S be linear operators on H such that 0 ∈ ρ(T ) and S ∈ B(H)
with ||S|| ||T −1 || < 1. Then 0 ∈ ρ(T + S).
Proof: Since ||T −1 S|| ≤ ||T −1 || ||S|| < 1 it follows from Theorem 10.4 that (I +
T −1 S)−1 ∈ B(H). If we now set A = (I + T −1 S)−1 T −1 then A ∈ B(H) also, and
A(T + S) = (I + T −1 S)−1 T −1 (T + S) = (I + T −1 S)−1 (I + T −1 S) = I
(12.1.4)
Similarly (T + S)A = I, so (T + S) has a bounded inverse, as needed.
We may now immediately obtain the properties of resolvent set and spectrum mentioned above.
Theorem 12.1. If T is a linear operator on H then ρ(T ) is open and σ(T ) is closed in
C. In addition if T ∈ B(H) and λ ∈ σ(T ) then |λ| ≤ ||T ||, so that σ(T ) is compact.
Proof: Let λ ∈ ρ(T ), so (λI − T )−1 ∈ B(H). If |ε| < 1/||(λI − T )−1 || we can apply
Proposition 12.1 with T replaced by λI − T and S = εI to get that 0 ∈ ρ((λ + ε)I − T ),
or equivalently λ + ε ∈ ρ(T ) for all sufficiently small |ε|. When T ∈ B(H), the conclusion
that σ(T ) is contained in the closed disk centered at the origin of radius ||T || is part of
the statement of Theorem 10.4.
Definition 12.3. The spectral radius of T is
r(T ) = sup{|λ| : λ ∈ σ(T )}
(12.1.5)
That is to say, r(T ) is the radius of the smallest disk centered at the origin containing
the spectrum of T . By the previous theorem we have always r(T ) ≤ ||T ||. This inequality
can be strict, even in the case that H = C2 , as may be seen in the example

T u = ( 0 1 ; 0 0 ) u          (12.1.6)

for which r(T ) = 0 but ||T || = 1. We do, however, have the following theorem, generalizing
the well known spectral radius formula from matrix theory.

Theorem 12.2. If T ∈ B(H) then r(T ) = limn→∞ ||T n ||1/n .
We will not prove this here, but see for example Proposition 9.7 of [17] or Theorem
10.13 of [31].
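The following numerical sketch (an illustration, not part of the notes) shows the formula at work: for the nilpotent matrix in (12.1.6) the estimates ||T n ||1/n drop to r(T ) = 0 immediately, while for a generic triangular matrix they approach the largest eigenvalue modulus.

```python
# Numerical sketch of the spectral radius formula r(T) = lim ||T^n||^(1/n).
import numpy as np

def spectral_radius_estimates(T, N=60):
    Tn = T.copy()
    vals = []
    for n in range(1, N + 1):
        vals.append(np.linalg.norm(Tn, 2) ** (1.0 / n))
        Tn = Tn @ T
    return vals

T = np.array([[0.0, 1.0], [0.0, 0.0]])
print(spectral_radius_estimates(T, 3))   # [1.0, 0.0, 0.0]: r(T) = 0, ||T|| = 1

A = np.array([[0.5, 1.0], [0.0, 0.3]])
print(spectral_radius_estimates(A)[-1])  # close to 0.5
print(max(abs(np.linalg.eigvals(A))))    # 0.5 = spectral radius of A
```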
It is a natural question to ask whether it is possible that either of ρ(T ), σ(T ) can be
empty. In fact both can happen, for example any operator which is not closed has an
empty resolvent set. To see this, suppose λ ∈ ρ(T ). Then (λI − T )−1 ∈ B(H) hence
is closed, and by Proposition 11.3 (λI − T ) is then also closed. Finally it follows from
Exercise 11 in Chapter 11 that T = λI − (λI − T ) is also closed. An example for which
σ(T ) is empty is given in Exercise 6. The following theorem, however, says that this is
impossible in the case of bounded T .
Theorem 12.3. If T ∈ B(H) then σ(T ) 6= ∅.
Proof: Let x, y ∈ H and define

f (λ) = ⟨x, Rλ y⟩          (12.1.7)

If σ(T ) = ∅ then f is defined for all λ ∈ C, and is differentiable with respect to the
complex variable λ, so that f is an entire function. On the other hand, for |λ| > ||T || we
have by (10.8.3) that

||Rλ || ≤ 1/(|λ| − ||T ||) → 0   as |λ| → ∞          (12.1.8)

Thus by Liouville’s Theorem f (λ) ≡ 0. Since x is arbitrary we must have Rλ y = 0 for
any y ∈ H, which is clearly false.
12.2 Examples of operators and their spectra
The purpose of introducing the concepts of resolvent and spectrum is to provide a
systematic way of analyzing the solvability properties of operator equations of the form
λu − T u = f . Even if we are actually only interested in the case when λ = 0 (or some
other fixed value) it is still revealing to study the whole family of problems as
λ varies over C. In this section we will look in detail at some examples.
Example 12.2. If H = CN and T u = Au for some N × N matrix A, then by previous
discussion we have
σ(T ) = σp (T ) = {λ1 , . . . , λm }
(12.2.1)
for some 1 ≤ m ≤ N , where λ1 , . . . λm are the distinct eigenvalues of A. Each eigenspace
N (λj I − T ) has dimension equal to the geometric multiplicity of λj and the sum of these
dimensions is also some integer between 1 and N .
Example 12.3. Let Ω ⊂ RN be a bounded open set, H = L2 (Ω) and let T be the
multiplication operator T u(x) = a(x)u(x) for some a ∈ C(Ω). If we begin by looking for
eigenvalues of T then we seek nontrivial solutions of T u = λu, that is to say
(λ − a(x))u(x) = 0
(12.2.2)
If a(x) ≠ λ a.e. then u ≡ 0 is the only solution, so λ ∉ σp (T ).
It is useful here to introduce a notation for the level sets of a, Eλ = {x ∈ Ω : a(x) = λ}.
If for some λ we have m(Eλ ) > 0 then the characteristic function u(x) = χΣ (x) is an
eigenfunction for the eigenvalue λ if Σ is any subset of Eλ of positive, finite measure. In
fact so is any other L2 function whose support lies within Eλ , and thus the corresponding
eigenspace is infinite dimensional. Thus
σp (T ) = {λ ∈ C : m(Eλ ) > 0}          (12.2.3)
Note that σp (T ) is at most countably infinite, since for example An = {λ ∈ C :
m(Eλ ) > 1/n} is at most countable for every n and σp (T ) = ∪_{n=1}^∞ An .
Now let us consider the other parts of the spectrum. Consider the equation λu − T u =
f , whose only possible solution is u(x) = f (x)/(λ − a(x)). For λ ∉ σp (T ), u(x) is well
defined a.e., but it doesn’t necessarily follow that u ∈ L2 (Ω) even if f is. If λ ∉ R(a)
(here R(a) is the range of the function a) then there exists δ > 0 such that |a(x) − λ| ≥ δ
for all x ∈ Ω, from which it follows that u = (λI − T )−1 f exists in L2 (Ω) and satisfies
|u(x)| ≤ δ −1 |f (x)|. Thus ||(λI − T )−1 || ≤ δ −1 and so λ ∈ ρ(T ).
If, on the other hand λ ∈ R(a) it is always possible to find f ∈ L2 (Ω) such that
u(x) = f (x)/(λ − a(x)) is not in L2 (Ω). This means in particular that λI − T is not onto,
i.e. λ is either in the continuous or residual spectrum. In fact it is not hard to verify that
the range of λI − T must be dense in this case. To see this, suppose λ ∈ σ(T )\σp (T ) so
that m(Eλ ) = 0. Then for any n there must exist an open set On containing Eλ such
that m(On ) < n1 . For any function f ∈ L2 (Ω) let Un = Ω\On and fn = f χUn . Then
fn ∈ R(λI − T ) since λ − a(x) will be bounded away from zero on Un , and fn → f in
L2 (Ω) as needed.
To summarize, we have the following conclusions about the spectral properties of T :
• ρ(T ) = {λ ∈ C : λ ∉ R(a)}
• σp (T ) = {λ ∈ R(a) : m(Eλ ) > 0}
• σc (T ) = {λ ∈ R(a) : m(Eλ ) = 0}
• σr (T ) = ∅
Example 12.4. Next we consider the Volterra type integral operator T u(x) = ∫₀ˣ u(s) ds
on H = L2 (0, 1). We first observe that any λ ≠ 0 is in the resolvent set of T . To see this,
consider the problem of solving (λI − T )u = f , i.e.

λu(x) − ∫₀ˣ u(s) ds = f (x)   0 < x < 1          (12.2.4)

with f ∈ L2 (0, 1). This is precisely the equation (2.2.10), whose solution is given in
(2.2.13) if g = −f , and which is well defined for any f ∈ L2 (0, 1). Crude estimation
shows ||u|| = ||(λI − T )−1 f || ≤ C||f || for some constant C which depends only on λ.
Thus any nonzero λ is in ρ(T ). By Theorem 12.3 we can immediately conclude that 0
must be in σ(T ). It is clear that λ = 0 cannot be an eigenvalue, since ∫₀ˣ u(s) ds = 0 for
all x implies u(x) = 0 a.e., by the Fundamental Theorem of Calculus. On the other hand,
R(T ) is dense, since for example it contains D(0, 1). One could also verify directly that
T −1 is unbounded. We conclude then that

σ(T ) = σc (T ) = {0}          (12.2.5)
In the next example we see a typical way that residual spectrum appears.
Example 12.5. Let H = `2 and T = S+ the right shift operator introduced in (10.2.31).
As usual we first look for eigenvalues. The equation T x = λx gives λx1 = 0 and λxn+1 =
xn for n = 1, 2, . . . . Thus if λ 6= 0 we immediately conclude that x = 0. If T x = 0 we also
see directly that x = 0, thus the point spectrum is empty. Since T is a bounded operator
of norm 1, we also know that if |λ| > 1 then λ ∈ ρ(T ). Since R(T ) ⊂ {x ∈ `2 : x1 = 0}
it follows that R(T ) is not dense in `2 , and since we already know 0 is not an eigenvalue
it must be that 0 ∈ σr (T ). See Exercise 4 for classification of the remaining λ values.
Finally we consider the case of an unbounded operator.
Example 12.6. Let H = L2 (0, 1) and T u = −u′′ on the domain

D(T ) = {u ∈ H 2 (0, 1) : u(0) = u(1) = 0}          (12.2.6)

The equation λu − T u = 0 is equivalent to the ODE boundary value problem

u′′ + λu = 0   0 < x < 1,   u(0) = u(1) = 0          (12.2.7)

which was already discussed in Chapter 2, see (2.3.53). We found that a nontrivial
solution un (x) = sin nπx exists for λ = λn = (nπ)2 and there are no other eigenvalues.
Notice that the spectrum is unbounded here, as typically happens for unbounded
operators.
We claim that all other λ ∈ C are in the resolvent set of T . To see this, we begin by
representing the general solution of u′′ + λu = f for f ∈ L2 (0, 1) as

u(x) = C1 sin (√λ x) + C2 cos (√λ x) + (1/√λ) ∫₀ˣ sin (√λ (x − y)) f (y) dy          (12.2.8)

which may be derived from the usual variation of parameters method.² To satisfy the
boundary conditions u(0) = u(1) = 0 we must have C2 = 0 and

C1 sin √λ + (1/√λ) ∫₀¹ sin (√λ (1 − y)) f (y) dy = 0          (12.2.9)

which uniquely determines C1 as long as λ ≠ (nπ)2 . Using this expression for C1 we
obtain a formula for u = (λI − T )−1 f of the form

u(x) = ∫₀¹ Gλ (x, y)f (y) dy          (12.2.10)

with a bounded kernel Gλ (x, y). By previous discussion we know that such an integral
operator is bounded on L2 (0, 1) and so λ ∈ ρ(T ).

² This representation is correct for all complex λ ≠ 0, taking √λ to denote the principal branch of the
square root function. We leave the remaining case λ = 0 as an exercise.
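The eigenvalues λn = (nπ)2 can be checked numerically. The following sketch (an illustration, not part of the notes) uses the standard second-difference approximation of −u′′ with Dirichlet boundary conditions.

```python
# Numerical sketch: the smallest eigenvalues of the discretized operator
# -u'' with u(0) = u(1) = 0 approach (n*pi)^2.
import numpy as np

m = 500
h = 1.0 / (m + 1)
# Tridiagonal second-difference matrix on the interior grid points
A = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2

eigs = np.sort(np.linalg.eigvalsh(A))
print(eigs[:3])                               # close to [9.87, 39.48, 88.83]
print([(n * np.pi)**2 for n in (1, 2, 3)])    # [9.87, 39.48, 88.83]
```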
12.3 Properties of spectra
We will see in this section that if an operator T belongs to some special class, then its
spectrum will often have some corresponding special properties.
Theorem 12.4. Let T be a closed, densely defined operator.
1. If λ ∈ ρ(T ) then λ̄ ∈ ρ(T ∗ ).
2. If λ ∈ σr (T ) then λ̄ ∈ σp (T ∗ ).
3. If λ ∈ σp (T ) then λ̄ ∈ σr (T ∗ ) ∪ σp (T ∗ ).
Proof: If λ ∈ ρ(T ) then

N (λ̄I − T ∗ ) = N ((λI − T )∗ ) = R(λI − T )⊥ = {0}          (12.3.1)

where Theorem 11.3 is used for the second equality. In particular λ̄I − T ∗ is invertible.
Also, since (λ̄I − T ∗ )∗ = λI − T ∗∗ = λI − T when T is closed and densely defined,
Theorem 11.3 gives

R(λ̄I − T ∗ )⊥ = N (λI − T ) = {0}          (12.3.2)

so that (λ̄I − T ∗ )−1 is densely defined. Proposition 11.6 is then applicable, so that

(λ̄I − T ∗ )−1 = ((λI − T )∗ )−1 = ((λI − T )−1 )∗ ∈ B(H)          (12.3.3)

Therefore λ̄ ∈ ρ(T ∗ ).
Next, if λ ∈ σr (T ) then R(λI − T ) = M for some subspace M whose closure is not
all of H. Thus

N (λ̄I − T ∗ ) = R(λI − T )⊥ = M ⊥ ≠ {0}          (12.3.4)

and so λ̄ ∈ σp (T ∗ ).
Finally, if λ ∈ σp (T ) then the closure of R(λ̄I − T ∗ ) equals

N (λI − T )⊥ ≠ H          (12.3.5)

so λ̄ lies in σ(T ∗ ) but not in σc (T ∗ ), as needed.
Next we turn to some special properties of self-adjoint and unitary operators.
Theorem 12.5. Suppose that T is a densely defined operator with T ∗ = T . We then
have
1. σ(T ) ⊂ R.
2. σr (T ) = ∅.
3. If λ1 , λ2 ∈ σp (T ), λ1 ≠ λ2 then N (λ1 I − T ) ⊥ N (λ2 I − T ).
Proof: To prove the first statement, let λ = ξ + iη with η ≠ 0. Then

||λu − T u||2 = ⟨ξu + iηu − T u, ξu + iηu − T u⟩ = ||ξu − T u||2 + |η|2 ||u||2          (12.3.6)

since ⟨ξu − T u, iηu⟩ + ⟨iηu, ξu − T u⟩ = 0. In particular

||λu − T u|| ≥ |η| ||u||          (12.3.7)

so λI − T is one to one, i.e. λ ∉ σp (T ). Likewise λ ∉ σr (T ) since otherwise, by Theorem
12.4, we would have λ̄ ∈ σp (T ∗ ) = σp (T ), which is impossible by the same argument. Thus
if λ ∈ σ(T ) then it can only be in the continuous spectrum, so R(λI − T ) is dense in H.
But (12.3.7) with η ≠ 0 also implies that R(λI − T ) is closed, and (12.3.7) then also says
that ||(λI − T )−1 || ≤ 1/|η|. Thus λ ∈ ρ(T ).
Next, if λ ∈ σr (T ) then λ̄ ∈ σp (T ∗ ) = σp (T ) by Theorem 12.4. But λ̄ must be real by
the first part of this proof, so λ = λ̄ ∈ σp (T ) ∩ σr (T ), which is impossible.
Finally, if λ1 , λ2 are distinct eigenvalues, pick u1 , u2 such that T u1 = λ1 u1 and T u2 =
λ2 u2 . There follows

λ1 ⟨u1 , u2 ⟩ = ⟨λ1 u1 , u2 ⟩ = ⟨T u1 , u2 ⟩ = ⟨u1 , T u2 ⟩ = ⟨u1 , λ2 u2 ⟩ = λ2 ⟨u1 , u2 ⟩          (12.3.8)

Since λ1 , λ2 must be real we see that (λ1 − λ2 )⟨u1 , u2 ⟩ = 0, so u1 ⊥ u2 as needed.
Theorem 12.6. If T is a unitary operator then σr (T ) = ∅ and σ(T ) ⊂ {λ : |λ| = 1}.
Proof: Recall that ||T u|| = ||u|| for all u when T is unitary. Thus if T u = λu we then
have

||u|| = ||T u|| = ||λu|| = |λ| ||u||          (12.3.9)

so |λ| = 1 must hold for any λ ∈ σp (T ). If λ ∈ σr (T ) then λ̄ ∈ σp (T ∗ ) by Theorem
12.4. Since T ∗ is also unitary we get |λ| = |λ̄| = 1. Also T ∗ u = λ̄u implies that
u = T T ∗ u = λ̄T u, so that λ = 1/λ̄ ∈ σp (T ), which is a contradiction to the assumption
that λ ∈ σr (T ). Thus the residual spectrum of T is empty.
To complete the proof, first note that since ||T || = 1 we must have |λ| ≤ 1 if λ ∈ σ(T )
by Theorem 10.4. If |λ| < 1 then (I − λT ∗ )−1 ∈ B(H) by the same theorem, and for
any f ∈ H we can obtain a solution of λu − T u = f by setting u = −T ∗ (I − λT ∗ )−1 f .
Since we already know λ ∉ σp (T ), it follows that λI − T is one-to-one and onto, and
||(λI − T )−1 || = ||T ∗ (I − λT ∗ )−1 ||, which is finite, and so λ ∈ ρ(T ), as needed.
Example 12.7. Let T = F, the Fourier transform on H = L2 (RN ), as defined in (8.4.1),
which we have already established is unitary, see (10.5.10). From the inversion formula
for the Fourier transform it is immediate that F 4 = I. If Fu = λu we would also have
u = F 4 u = λ4 u so that any eigenvalue λ of F satisfies λ4 = 1, i.e. σp (F) ⊂ {±1, ±i}. We
already knew that λ = 1 must be an eigenvalue with the Gaussian e^(−|x|²/2) as a
corresponding eigenfunction. In fact all four values ±1, ±i are eigenvalues with infinite
dimensional eigenspaces spanned by products of Gaussians and so-called Hermite polynomials.
See Section 2.5 of [9] for more details. In Exercise 5 you are asked to show that all other
values of λ are in the resolvent set of F.
Example 12.8. The Hilbert transform H introduced in Example 10.7 is also unitary
on H = L2 (R). Since also H2 = −I it follows that the only possible eigenvalues of
H are ±i. It is readily checked that these are both eigenvalues, with the eigenspace
for λ = i being M− = {u ∈ L2 (R) : û(k) = 0 ∀k > 0} and that for λ = −i being
M+ = {u ∈ L2 (R) : û(k) = 0 ∀k < 0}. Let us check that any λ ≠ ±i is in the resolvent
set. If λu − Hu = f then applying H to both sides we get λHu + u = Hf . Eliminating
Hu between these two equations we can solve for

u = (λf + Hf )/(λ2 + 1)          (12.3.10)

Conversely, by direct substitution we can verify that this formula defines a solution of
λu − Hu = f , so that (λI − H)−1 = (λI + H)/(λ2 + 1), which is obviously bounded for
λ ≠ ±i.
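The resolvent formula is easy to test numerically. The following sketch (an illustration, not part of the notes) assumes the multiplier convention (Hu)ˆ(k) = −i sgn(k) û(k), which is consistent with H2 = −I and the eigenspaces above, and implements H by the FFT.

```python
# Numerical sketch: H^2 = -I and the resolvent formula (12.3.10),
# with H realized as the Fourier multiplier -i sgn(k).
import numpy as np

n, L = 2048, 40.0
x = np.linspace(-L / 2, L / 2, n, endpoint=False)
k = np.fft.fftfreq(n, d=L / n)          # frequency grid

def hilbert(u):
    return np.fft.ifft(-1j * np.sign(k) * np.fft.fft(u))

f = np.exp(-x**2)    # smooth, rapidly decaying test signal
f -= f.mean()        # drop the k = 0 mode, which the discrete multiplier kills

print(np.allclose(hilbert(hilbert(f)), -f, atol=1e-8))    # H^2 = -I

lam = 2.0
u = (lam * f + hilbert(f)) / (lam**2 + 1)
print(np.allclose(lam * u - hilbert(u), f, atol=1e-8))    # solves (λI - H)u = f
```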
Finally we discuss an important example of an unbounded operator.
Example 12.9. Let H = L2 (RN ) and T u = −∆u on D(T ) = H 2 (RN ). If we apply the
Fourier transform then for f, u ∈ H the resolvent equation λu − T u = f is seen to be
equivalent to

(λ − |y|2 )û(y) = fˆ(y)          (12.3.11)

It is then immediate that σ(T ) ⊂ [0, ∞) and that σp (T ) = ∅. On the other hand, a
solution û, and hence u, exists in H as long as fˆ vanishes in a neighborhood of |y| = √λ.
Such f form a dense subset of H, so σr (T ) = ∅ also. This could also be shown by verifying
that T is self-adjoint. Finally, it is clear that for λ > 0 there exists a function u such
that û ∉ L2 (RN ) but g := (λ − |y|2 )û ∈ L2 (RN ). If f ∈ L2 (RN ) is defined by fˆ = g then
it follows that f is not in the range of λI − T , so λ ∈ σc (T ) must hold. In summary,
σ(T ) = σc (T ) = [0, ∞).
12.4 Exercises
1. Let M be a closed subspace of a Hilbert space H, M ≠ {0}, H, and let PM be the
usual orthogonal projection onto M . Show that if λ ≠ 0, 1 then λ ∈ ρ(PM ) and

||(λI − PM )−1 || ≤ 1/|λ| + 1/|1 − λ|
2. Recall that the resolvent operator of T is defined to be Rλ = (λI −T )−1 for λ ∈ ρ(T ).
a) Prove the resolvent identity (12.1.3).
b) Deduce from this that Rλ , Rµ commute.
c) Show also that T, Rλ commute for λ ∈ ρ(T ).
3. Show that λ → Rλ is continuously differentiable, regarded as a mapping from
ρ(T ) ⊂ C into B(H), with

dRλ /dλ = −(Rλ )2
4. Let T denote the right shift operator on `2 . Show that
a) σp (T ) = ∅
b) σc (T ) = {λ : |λ| = 1}
c) σr (T ) = {λ : |λ| < 1}
5. If λ ≠ ±1, ±i show that λ is in the resolvent set of the Fourier transform F.
(Suggestion: Assuming that a solution of Fu − λu = f exists, derive an explicit
formula for it using

F 4 u = λ4 u + λ3 f + λ2 Ff + λF 2 f + F 3 f

and the fact that F 4 = I if F is the Fourier transform.)
6. Let H = L2 (0, 1), T1 u = T2 u = T3 u = u′ on the domains
D(T1 ) = H 1 (0, 1)
D(T2 ) = {u ∈ H 1 (0, 1) : u(0) = 0}
D(T3 ) = {u ∈ H 1 (0, 1) : u(0) = u(1) = 0}
Show that
(i) σ(T1 ) = σp (T1 ) = C
(ii) σ(T2 ) = ∅
(iii) σ(T3 ) = σr (T3 ) = C.
7. Define the translation operator T u(x) = u(x − 1) on L2 (R).
a) Find T ∗ .
b) Show that T is unitary.
c) Show that σ(T ) = σc (T ) = {λ ∈ C : |λ| = 1}.
8. Let T u(x) = ∫₀ˣ K(x, y)u(y) dy be a Volterra integral operator on L2 (0, 1) with a
bounded kernel, |K(x, y)| ≤ M . Show that σ(T ) = {0}. (There are several ways
to show that T has no nonzero eigenvalues. Here is one approach: Define the
equivalent norm on L2 (0, 1)

||u||θ2 = ∫₀¹ u2 (x) e−2θx dx

and show that the supremum of ||T u||θ /||u||θ can be made arbitrarily small by
choosing θ sufficiently large.)
9. If T is a symmetric operator, show that
σp (T ) ∪ σc (T ) ⊂ R
(It is almost the same as showing that σ(T ) ⊂ R for a self-adjoint operator.)
10. The approximate spectrum σa (T ) of a linear operator T is the set of all λ ∈ C
such that there exists a sequence {un } in H such that ||un || = 1 for all n and
||T un − λun || → 0 as n → ∞. Show that
σp (T ) ∪ σc (T ) ⊂ σa (T ) ⊂ σ(T )
(so that σa (T ) = σ(T ) in the case of a self-adjoint operator.) Show by example that
σr (T ) need not be contained in σa (T ).
11. The essential spectrum σe (T ) of a linear operator T is the set of all λ ∈ C such
that λI − T is not a Fredholm operator³ (recall Definition 10.5). Show that
σe (T ) ⊂ σ(T ). Characterize the essential spectrum for the following operators: i)
a linear operator on Cn , ii) an orthogonal projection on a Hilbert space, iii) the
Fourier transform on L2 (RN ), and iv) a multiplication operator on L2 (Ω).
12. If T is a bounded, self-adjoint operator on a Hilbert space H, show that ⟨T u, u⟩ ≥ 0
for all u ∈ H if and only if σ(T ) ⊂ [0, ∞).
³ Actually there are several non-equivalent definitions of essential spectrum which can be found in the
literature.
Chapter 13
Compact Operators
13.1 Compact operators
One type of operator which has not yet been mentioned much in connection with spectral
theory is integral operators. This is because they typically belong to a particular class of
operators known as compact operators for which there is a well developed special theory,
whose main points will be presented in this chapter.
If X is a Banach space, then as usual K ⊂ X is compact if any open cover of K
has a finite subcover. Equivalently, any sequence in K has a subsequence
convergent to an element of K. If dim(X) < ∞ then K is compact if and only if it is
closed and bounded, but this is false if dim(X) = ∞.
Example 13.1. Let H be an infinite dimensional Hilbert space and K = {u ∈ H :
||u|| ≤ 1}, which is obviously closed and bounded. If we let {en }_{n=1}^∞ be an infinite
orthonormal sequence (which we know must exist) there cannot be any convergent
subsequence, since ||en − em || = √2 for any n ≠ m.
Recall also that E ⊂ X is precompact, or relatively compact, if its closure Ē is compact.
Definition 13.1. If X, Y are Banach spaces then a linear operator T : X → Y is
compact if for any bounded set E ⊂ X the image T (E) is precompact in Y.
This definition makes sense even if T is nonlinear, but in this book the terminology
will only be used in the linear case. We will use the notation K(X, Y) to denote the set
of compact linear operators from X to Y and K(X) if Y = X.
Proposition 13.1. If X, Y are Banach spaces then
1. K(X, Y) is a subspace of B(X, Y)
2. If T ∈ B(X, Y) and dim(R(T )) < ∞ then T ∈ K(X, Y)
3. The identity map I ∈ K(X) if and only if dim(X) < ∞
Proof: If T is compact then T (B(0, 1)) is precompact in Y and in particular is bounded in
Y. Thus there exists M < ∞ such that ||T u|| ≤ M if ||u|| ≤ 1, which means ||T || ≤ M .
It is immediate to check that K(X, Y) is a vector space, so (1) is proved.
If E ⊂ X is bounded and T ∈ B(X, Y) then T (E) is bounded in Y. Therefore T (E)
is a bounded subset of the finite dimensional set R(T ), so is relatively compact by the
Heine-Borel theorem. This proves (2) and the ’if’ part of (3). The other half of (3) is
equivalent to the statement that the unit ball B(0, 1) is not compact if dim(X) = ∞.
This was shown in Example 13.1 above in the Hilbert space case, and we refer to Theorem
6.5 of [5] for the general case of a Banach space.
Recall that when dim(R(T )) < ∞ we say that T is of finite rank. Any degenerate
integral operator T u(x) = ∫_Ω K(x, y)u(y) dy with K(x, y) = Σ_{j=1}^n φj (x)ψj (y), φj , ψj ∈
L2 (Ω) for j = 1, . . . , n, is therefore of finite rank, and so in particular is compact.
A convenient alternate characterization of compact operators involves the notion of
weak convergence. Although the following discussion can mostly be carried out in a
Banach space setting, we will consider only the Hilbert space case.
Definition 13.2. If H is a Hilbert space and {un }_{n=1}^∞ is an infinite sequence in H, we
say un converges weakly to u in H (written un ⇀ u), provided that ⟨un , v⟩ → ⟨u, v⟩ for
every v ∈ H.
Note by the Riesz Representation Theorem that this is the same as requiring ℓ(un ) →
ℓ(u) for every ℓ ∈ H ∗ – this is the definition to use when generalizing to the Banach space
situation. The weak limit, if it exists, is unique, see Exercise 3.
Example 13.2. Assume that H is infinite dimensional and let {en }_{n=1}^∞ be any
orthonormal set in H, which is not convergent by Example 13.1. From Bessel’s inequality
we have

Σ_{n=1}^∞ |⟨en , v⟩|2 ≤ ||v||2 < ∞   for all v ∈ H          (13.1.1)

which implies in particular that ⟨en , v⟩ → 0 for every v ∈ H. This means en ⇀ 0.
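The following numerical sketch (an illustration, not part of the notes) exhibits this for the orthonormal sine basis of L2 (0, 1): the coefficients ⟨en , v⟩ of a fixed v decay, while ||en || stays equal to 1.

```python
# Numerical sketch: weak but not strong convergence of an orthonormal sequence.
import numpy as np

m = 5000
x = (np.arange(m) + 0.5) / m       # midpoint quadrature nodes on (0,1)

def e(n):
    return np.sqrt(2) * np.sin(n * np.pi * x)   # orthonormal in L2(0,1)

v = x * (1 - x)                                  # a fixed element of L2(0,1)
coeffs = [abs(np.sum(e(n) * v) / m) for n in (1, 11, 101)]
print(coeffs)                  # decreasing toward 0: <e_n, v> -> 0
print(np.sum(e(101)**2) / m)   # ~1: the norms ||e_n|| do not go to 0
```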
In case it is necessary to emphasize the difference between weak convergence and the
ordinary notion of convergence in H we may refer to the latter as strong convergence. It
is elementary to show that strong convergence always implies weak convergence, but the
converse is false, as the above example shows.
To make the connection to compact operators, let {en }_{n=1}^∞ again denote an infinite
orthonormal set in an infinite dimensional Hilbert space H and suppose T is compact
on H. If un = T en then {un }_{n=1}^∞ is evidently relatively compact in H, so we can find a
convergent subsequence unk → u. For any v ∈ H we then have

⟨unk , v⟩ = ⟨T enk , v⟩ = ⟨enk , T ∗ v⟩ → 0          (13.1.2)

so that unk = T enk ⇀ 0. But since also unk → u, we must have u = 0, and thus unk → 0.
Since the original sequence could be replaced by any of its subsequences, we conclude that
for any subsequence enk there must exist a further subsequence enkj such that T enkj → 0.
We now claim that un → 0, i.e. the entire sequence converges, not just the subsequence.
If not, then there must exist δ > 0 and a subsequence enk such that ||T enk || ≥ δ,
which contradicts the fact just established that T enk must have a subsequence convergent
to zero. We have therefore established that any compact operator maps the weakly
convergent sequence en to a strongly convergent sequence. We will see below that compact
operators always map weakly convergent sequences to strongly convergent sequences and
that this property characterizes compact operators.
Let us first present some more important facts about weak convergence in a Hilbert
space.
Proposition 13.2. Let un ⇀ u in a Hilbert space H. Then
1. ||u|| ≤ lim inf n→∞ ||un ||.
2. If ||un || → ||u|| then un → u.
Proof: We have

0 ≤ ||un − u||2 = ||un ||2 − 2Re ⟨un , u⟩ + ||u||2          (13.1.3)
or

2Re ⟨un , u⟩ − ||u||2 ≤ ||un ||2          (13.1.4)

Now take the lim inf of both sides to get the conclusion of (1). If ||un || → ||u|| then the
right hand identity of (13.1.3) shows that ||un − u|| → 0.
The property in part (1) of the Proposition is often referred to as the weak lower
semicontinuity of the norm. Note that strict inequality can occur, for example in the
case that un is an infinite orthonormal set.
Various familiar topological notions may be based on weak convergence.
Definition 13.3. A set $E \subset H$ is weakly closed if
$$u_n \in E,\ u_n \xrightarrow{w} u \quad \text{implies} \quad u \in E \qquad (13.1.5)$$
and E is weakly open if its complement is weakly closed. We say E is weakly compact if any infinite sequence in E has a subsequence which is weakly convergent to an element $u \in E$.
Clearly a weakly closed set is closed, but the converse is false in general.
Example 13.3. If E = {u ∈ H : ||u|| = 1} then E is closed but is not weakly closed,
since again a counterexample is provided by any infinite orthonormal sequence. On the
other hand, E = {u ∈ H : ||u|| ≤ 1} is weakly closed by Proposition 13.2.
Several key facts relating to the weak convergence concept, which we will not prove
here but will make extensive use of, are given in the next theorem.
Theorem 13.1. Let H be a Hilbert space. Then
1. Any weakly convergent sequence is bounded.
2. Any bounded sequence has a weakly convergent subsequence.
3. If E ⊂ H is convex and closed then it is also weakly closed. In particular any closed
subspace is weakly closed.
The three parts of this theorem are all special cases of some very general results in functional analysis. The first statement is a special case of the Banach-Steinhaus theorem (or Uniform Boundedness Principle), which is more generally a theorem about sequences of bounded linear functionals on a Banach space; see Corollary 1 in Section 23 of [2] or Theorem 5.8 of [30] for the more general Banach space result. The second statement is a special case of the Banach-Alaoglu theorem, which asserts a weak compactness property of bounded sets in the dual space of any Banach space; see Theorem 1 in Section 24 of [2] or Theorem 3.15 of [31] for the more general Banach space result. The third part is a special case of Mazur's theorem, also valid in a more general Banach space setting; see Theorem 3.7 of [5].
Now let us return to the main development and prove the following very important
characterization of compact linear operators.
Theorem 13.2. Let T ∈ B(H). Then T is compact if and only if T maps any weakly
convergent sequence to a strongly convergent sequence.
Proof: Suppose that T is compact and $u_n \xrightarrow{w} u$. Then $\{u_n\}$ is bounded by part 1 of Theorem 13.1. Since T is bounded the image sequence $\{Tu_n\}$ is also bounded, hence has a strongly convergent subsequence by part 2 of the same theorem. Note also that $Tu_n \xrightarrow{w} Tu$, since for any $v \in H$ we have
$$\langle Tu_n, v\rangle = \langle u_n, T^*v\rangle \to \langle u, T^*v\rangle = \langle Tu, v\rangle \qquad (13.1.6)$$
Thus there must exist a subsequence $u_{n_k}$ such that $Tu_{n_k} \to Tu$ strongly in H. By the same argument, any subsequence of $u_n$ has a further subsequence for which the image sequence converges to Tu, and so $Tu_n \to Tu$.
To prove the converse, let $E \subset H$ be bounded and $\{v_n\}_{n=1}^\infty \subset \overline{T(E)}$. We must then have $v_n = z_n + \epsilon_n$ where $z_n = Tu_n$ for some $u_n \in E$ and $\epsilon_n \to 0$ in H. By the boundedness of E and part 2 of Theorem 13.1 there must exist a weakly convergent subsequence $u_{n_k} \xrightarrow{w} u$. Therefore $v_{n_k} = Tu_{n_k} + \epsilon_{n_k}$ is convergent, since we assume that the image of any weakly convergent sequence is strongly convergent. It follows that T(E) is relatively compact, as needed.
The following theorem will turn out to be a key tool in developing the theory of
integral equations with L2 kernels.
Theorem 13.3. K(H) is a closed subspace of B(H).
Proof: We have already observed that K(H) is a subspace of B(H). To verify that it is
closed, pick Tn ∈ K(H) such that ||Tn − T || → 0 for some T ∈ B(H). We are done if we
show T ∈ K(H), and this in turn will follow if we show that for any bounded sequence
{un } there exists a convergent subsequence of the image sequence {T un }.
Since $T_1 \in K(H)$ there must exist a subsequence $\{u_n^1\} \subset \{u_n\}$ such that $\{T_1u_n^1\}$ is convergent. Likewise, since $T_2 \in K(H)$ there must exist a further subsequence $\{u_n^2\} \subset \{u_n^1\}$ such that $\{T_2u_n^2\}$ is convergent. Continuing in this way we get $\{u_n^j\}$ such that $\{u_n^{j+1}\} \subset \{u_n^j\}$ and $\{T_ju_n^j\}$ is convergent, for any fixed j.

Now let $z_n = u_n^n$, so that the tail of $\{z_n\}$ is a subsequence of $\{u_n^j\}$ for any j, and $\{z_n\}$ is obviously a subsequence of the original sequence $\{u_n\}$. We claim that $\{Tz_n\}$ is convergent, which will complete the proof.
Fix $\epsilon > 0$. We may first choose M such that $\|u_n\| \leq M$ for every n, and then some fixed j such that $\|T_j - T\| < \frac{\epsilon}{4M}$. Next pick N so that $\|T_jz_n - T_jz_m\| < \frac{\epsilon}{2}$ when $m, n \geq N$. We then have, for $n, m \geq N$, that
$$\|Tz_n - Tz_m\| \leq \|Tz_n - T_jz_n\| + \|T_jz_n - T_jz_m\| + \|T_jz_m - Tz_m\| \leq \|T - T_j\|(\|z_n\| + \|z_m\|) + \|T_jz_n - T_jz_m\| \leq \epsilon \qquad (13.1.7)$$
It follows that {T zn } is Cauchy, hence convergent, in H.
Recall that an integral operator
$$Tu(x) = \int_\Omega K(x,y)u(y)\,dy \qquad (13.1.8)$$
is of Hilbert-Schmidt type if K ∈ L2 (Ω × Ω), and we have earlier established that such
operators are bounded on L2 (Ω). We will now show that any Hilbert-Schmidt integral
operator is actually compact. The basic idea is to show that T can be approximated
by finite rank operators, which we know to be compact, and then apply the previous
theorem. First we need a lemma.
Lemma 13.1. If $\{\phi_n\}_{n=1}^\infty$ is an orthonormal basis of $L^2(\Omega)$ then $\{\phi_n(x)\phi_m(y)\}_{n,m=1}^\infty$ is an orthonormal basis of $L^2(\Omega\times\Omega)$.
Proof: By direct calculation we see that
$$\int_\Omega\!\int_\Omega \phi_n(x)\phi_m(y)\,\overline{\phi_{n'}(x)\phi_{m'}(y)}\,dx\,dy = \begin{cases} 1 & n = n',\ m = m' \\ 0 & \text{otherwise} \end{cases} \qquad (13.1.9)$$
so that they are orthonormal in $L^2(\Omega\times\Omega)$. To show completeness, by Theorem 6.4 it is enough to verify the Bessel equality. That is, we show
$$\|f\|_{L^2(\Omega\times\Omega)}^2 = \sum_{n,m=1}^{\infty} |c_{n,m}|^2 \qquad (13.1.10)$$
where
$$c_{n,m} = \int_\Omega\!\int_\Omega f(x,y)\,\overline{\phi_n(x)\phi_m(y)}\,dx\,dy \qquad (13.1.11)$$
and it is enough to do this for $f \in C(\Omega)$.
By applying the Bessel equality in x for fixed y, and then integrating with respect to y, we get
$$\int_\Omega\!\int_\Omega |f(x,y)|^2\,dx\,dy = \int_\Omega \sum_{n=1}^{\infty} |c_n(y)|^2\,dy \qquad (13.1.12)$$
where $c_n(y) = \int_\Omega f(x,y)\overline{\phi_n(x)}\,dx$. Since we can clearly exchange the sum and integral, it follows by applying the Bessel equality to $c_n(\cdot)$ that
$$\int_\Omega\!\int_\Omega |f(x,y)|^2\,dx\,dy = \sum_{n=1}^{\infty}\int_\Omega |c_n(y)|^2\,dy = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} |c_{n,m}|^2 \qquad (13.1.13)$$
where
$$c_{n,m} = \int_\Omega c_n(y)\overline{\phi_m(y)}\,dy = \int_\Omega\!\int_\Omega f(x,y)\,\overline{\phi_n(x)\phi_m(y)}\,dx\,dy \qquad (13.1.14)$$
as needed.
Theorem 13.4. If $K \in L^2(\Omega\times\Omega)$ then the integral operator (13.1.8) is compact on $L^2(\Omega)$.

Proof: Let $\{\phi_n\}$ be an orthonormal basis of $L^2(\Omega)$ and set
$$K_N(x,y) = \sum_{n,m=1}^{N} c_{n,m}\phi_n(x)\phi_m(y) \qquad (13.1.15)$$
with $c_{n,m}$ as above, so we know $\|K_N - K\|_{L^2(\Omega\times\Omega)} \to 0$ as $N \to \infty$. Let $T_N$ be the corresponding integral operator with kernel $K_N$, which is compact since it has finite rank. Finally, since $\|T - T_N\| \leq \|K_N - K\|_{L^2(\Omega\times\Omega)} \to 0$ (recall (10.2.15)) it follows from Theorem 13.3 that T is compact.
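A small numerical sketch of the finite rank approximation idea used in this proof: truncate the basis expansion of an illustrative kernel and watch the operator norm error decay. The kernel K(x,y) = min(x,y), the sine basis, and the midpoint-rule discretization are assumptions made here for illustration, not choices from the text.

```python
import numpy as np

# Sketch: rank truncation K_N of K(x,y) = min(x,y) on (0,1) in the
# orthonormal Dirichlet sine basis; ||T - T_N|| estimated on a grid.
M = 400
x = (np.arange(M) + 0.5) / M
h = 1.0 / M
K = np.minimum.outer(x, x)

for N in [1, 2, 4, 8, 16]:
    ns = np.arange(1, N + 1)
    Phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(ns, x))  # N x M basis values
    C = h * h * Phi @ K @ Phi.T                           # coefficients c_{n,m}
    KN = Phi.T @ C @ Phi                                  # truncated kernel
    err = h * np.linalg.norm(K - KN, 2)                   # ~ ||T - T_N||
    print(f"N = {N:2d}:  ||T - T_N|| ~ {err:.2e}")
```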
13.2 The Riesz-Schauder theory
In this section we first establish a fundamental result about the solvability of operator
equations of the form $\lambda u - Tu = f$ when T is compact and $\lambda \neq 0$.
Theorem 13.5. Let $T \in K(H)$ and $\lambda \neq 0$. Then

1. $\lambda I - T$ is a Fredholm operator of index zero.

2. If $\lambda \in \sigma(T)$ then $\lambda \in \sigma_p(T)$.
Recall that the first statement means that $N(\lambda I - T)$ and $N(\bar\lambda I - T^*)$ are of the same finite dimension and that $R(\lambda I - T)$ is closed. It follows that
$$R(\lambda I - T) = N(\bar\lambda I - T^*)^\perp \qquad (13.2.1)$$
and the Fredholm alternative holds: either

• $\lambda I - T$ and $\bar\lambda I - T^*$ are both one to one, and $\lambda u - Tu = f$ has a unique solution for every $f \in H$, or

• $\dim N(\lambda I - T) = \dim N(\bar\lambda I - T^*) < \infty$ and $\lambda u - Tu = f$ has a solution if and only if $f \perp v$ for every v satisfying $T^*v = \bar\lambda v$.

If T is compact then so is $T^*$ (Exercise 2), thus all of the same conclusions hold for $T^*$.
The proof proceeds by means of a number of intermediate steps, some of which are
of independent interest. Without loss of generality we may assume λ = 1, since we could
always write λI − T = λ(I − λ−1 T ). For the rest of the section we denote S = I − T
with the assumption that T ∈ K(H).
Lemma 13.2. There exists C > 0 such that ||Su|| ≥ C||u|| for all u ∈ N (S)⊥ .
Proof: If no such constant exists then we can find a sequence $\{u_n\}_{n=1}^\infty$ such that $u_n \in N(S)^\perp$, $\|u_n\| = 1$ and $\|Su_n\| \to 0$. By weak compactness there exists a subsequence $u_{n_k} \xrightarrow{w} u$ for some u with $\|u\| \leq 1$. Since T is compact it follows that $Tu_{n_k} \to Tu$, so $u_{n_k} = Su_{n_k} + Tu_{n_k} \to Tu$. By uniqueness of the weak limit $Tu = u$, in other words $u \in N(S)$. On the other hand $u_n \in N(S)^\perp$ implies that $u \in N(S)^\perp$, so that u = 0 must hold. Finally we also have $\|u\| = 1$, since $u_{n_k} \to u$ strongly, which is a contradiction.
Lemma 13.3. R(S) is closed.
Proof: Let vn ∈ R(S), vn → v. Obviously we may choose un such that Sun = vn . Let P
denote the orthogonal projection onto the closed subspace N (S). If wn = un − P un then
wn ∈ N (S)⊥ and Swn = Sun = vn . By the previous lemma ||vn − vm || ≥ C||wn − wm ||
for some C > 0, so that {wn } must be a Cauchy sequence. Letting w = limn→∞ wn we
then have Sw = limn→∞ Swn = v, so that v ∈ R(S) as needed.
Lemma 13.4. R(S) = H if and only if N (S) = {0} .
Proof: First suppose that R(S) = H and that there exists $u_1 \in N(S)$, $u_1 \neq 0$. There must exist $u_2 \in H$ such that $Su_2 = u_1$, since we have assumed that S is onto. Similarly we can find $u_p$ for $p = 3, 4, \dots$ such that $Su_p = u_{p-1}$, and evidently $S^{p-1}u_p = u_1$, $S^pu_p = 0$. Let $N_p = N(S^p)$, so that $N_{p-1} \subset N_p$ and the inclusion is strict, since $u_p \in N_p$ but $u_p \notin N_{p-1}$. Now apply the Gram-Schmidt procedure to the sequence $\{u_p\}$ to get a sequence $\{w_p\}$ such that $w_p \in N_p$, $\|w_p\| = 1$ and $w_p \perp N_{p-1}$. We will be done if we show that $\{Tw_p\}$ has no convergent subsequence, since this will contradict the compactness of T.
Fix p > q, let $g = Sw_q - Sw_p - w_q$ and observe that
$$\|Tw_p - Tw_q\| = \|w_p - w_q - Sw_p + Sw_q\| = \|w_p + g\| \qquad (13.2.2)$$
We must have $w_p \perp g$ since $Sw_q, Sw_p, w_q \in N_{p-1}$, therefore
$$\|Tw_p - Tw_q\|^2 = \|w_p\|^2 + \|g\|^2 \geq \|w_p\|^2 = 1 \qquad (13.2.3)$$
and it follows that there can be no convergent subsequence of $\{Tw_p\}$, as needed.
To prove the converse implication, assume that $N(S) = \{0\}$, so that $\overline{R(S^*)} = N(S)^\perp = H$ by Corollary 10.1. But as remarked above $T^*$ is also compact, so by Lemma 13.3 applied to $S^* = I - T^*$, $R(S^*)$ is closed, hence $R(S^*) = H$. By the first half of this lemma $N(S^*) = \{0\}$, so that $\overline{R(S)} = H$, and therefore finally $R(S) = H$ by one more application of Lemma 13.3.
Lemma 13.5. N(S) is of finite dimension.

Proof: If not, then there exists an infinite orthonormal basis $\{e_n\}_{n=1}^\infty$ of N(S), and in particular $\|Te_n\| = \|e_n\| = 1$. But since T is compact we also know that $Te_n \to 0$, a contradiction.
Lemma 13.6. The null spaces N (S) and N (S ∗ ) are of the same finite dimension.
Proof: Denote $m = \dim N(S)$, $m^* = \dim N(S^*)$ and suppose that $m^* > m$. Let $w_1,\dots,w_m$ and $v_1,\dots,v_{m^*}$ be orthonormal bases of N(S) and $N(S^*)$ respectively, and define the operator
$$Au = Su - \sum_{j=1}^{m} \langle u, w_j\rangle v_j \qquad (13.2.4)$$
Since $\langle Su, v_j\rangle = 0$ for $j = 1,\dots,m^*$ it follows that
$$\langle Au, v_k\rangle = \begin{cases} -\langle u, w_k\rangle & k = 1,\dots,m \\ 0 & k = m+1,\dots,m^* \end{cases} \qquad (13.2.5)$$
Next we claim that $N(A) = \{0\}$. To see this, if Au = 0 we'd have $\langle u, w_k\rangle = 0$ for $k = 1,\dots,m$, so that $u \in N(S)^\perp$. But it would also follow that $u \in N(S)$ by (13.2.4), and so u = 0.

We may obviously write $A = I - \tilde T$ for some $\tilde T \in K(H)$, so by Lemma 13.4 we may conclude that R(A) = H. But $v_{m+1} \notin R(A)$, since if $Au = v_{m+1}$ it would follow that $1 = \|v_{m+1}\|^2 = \langle Au, v_{m+1}\rangle = 0$, a contradiction.
Corollary 13.1. If $0 \in \sigma(S)$ then $0 \in \sigma_p(S)$.

Proof: If $0 \notin \sigma_p(S)$ then $N(S) = \{0\}$, so that R(S) = H by Lemma 13.4. But then $0 \notin \sigma_c(S) \cup \sigma_r(S)$, so $0 \notin \sigma(S)$ as needed.
By combining the conclusions of Lemma 13.3, Lemma 13.6 and Corollary 13.1 we
have completed the proof of Theorem 13.5. Further important information about the
spectrum of a compact operator is contained in the next theorem.
Theorem 13.6. If T ∈ K(H) then σ(T ) is at most countably infinite, with 0 as the only
possible accumulation point.
Proof: Since $\sigma(T)\setminus\{0\} = \sigma_p(T)\setminus\{0\}$, it is enough to show that for any $\epsilon > 0$ there exist at most finitely many linearly independent eigenvectors of T corresponding to eigenvalues λ with $|\lambda| > \epsilon$. Assuming the contrary, there must exist $\{x_n\}_{n=1}^\infty$, linearly independent, such that $Tx_n = \lambda_nx_n$ and $|\lambda_n| > \epsilon$. Applying the Gram-Schmidt procedure to the sequence $\{x_n\}_{n=1}^\infty$ we obtain an orthonormal sequence $\{y_n\}_{n=1}^\infty$ such that
$$y_k = \sum_{j=1}^{k} \beta_{kj}x_j \qquad \beta_{kk} \neq 0 \qquad (13.2.6)$$
Therefore
$$Ty_k - \lambda_ky_k = \sum_{j=1}^{k} \beta_{kj}(\lambda_j - \lambda_k)x_j \qquad (13.2.7)$$
implying that
$$Ty_k = \lambda_ky_k + \sum_{j=1}^{k-1} \alpha_{kj}y_j \qquad (13.2.8)$$
for some $\alpha_{kj}$. But then
$$|\lambda_k|^2 \leq |\lambda_k|^2 + \sum_{j=1}^{k-1} |\alpha_{kj}|^2 = \|Ty_k\|^2 \to 0 \qquad (13.2.9)$$
since $\{y_n\}_{n=1}^\infty$ is orthonormal and T is compact, contradicting $|\lambda_n| > \epsilon$.
We emphasize that nothing stated so far implies that a compact operator has any eigenvalues at all. For example, we have already observed that the simple Volterra operator $Tu(x) = \int_0^x u(s)\,ds$, which is certainly compact, has spectrum $\sigma(T) = \sigma_c(T) = \{0\}$ (Example 12.4). In the next section we will see that if the operator T is also self-adjoint, then this sort of behavior cannot happen, i.e. eigenvalues must exist.
We could also use Theorems 13.5 or 13.6 to prove that certain operators are not
compact. For example, a nonzero multiplication operator cannot be compact since it has
either an uncountable spectrum or an infinite dimensional eigenspace, or both.
We conclude this section by summarizing, in the form of a theorem, the implications of the abstract results of this section for the solvability of the integral equation
$$\lambda u(x) - \int_\Omega K(x,y)u(y)\,dy = f(x) \qquad x \in \Omega \qquad (13.2.10)$$
Theorem 13.7. If $K \in L^2(\Omega\times\Omega)$ then there exists a finite or countably infinite set $\{\lambda_n\} \subset \mathbb{C}$, with zero as its only possible accumulation point, such that

• If $\lambda \neq \lambda_n$ for all n and $\lambda \neq 0$, then for every $f \in L^2(\Omega)$ there exists a unique solution $u \in L^2(\Omega)$ of (13.2.10).

• If $\lambda = \lambda_n \neq 0$ then there exist linearly independent solutions $\{v_1,\dots,v_m\}$, for some finite m, of the homogeneous equation
$$\lambda v(x) - \int_\Omega K(x,y)v(y)\,dy = 0 \qquad (13.2.11)$$
and m linearly independent solutions $\{w_1,\dots,w_m\}$ of the adjoint homogeneous equation
$$\bar\lambda w(x) - \int_\Omega \overline{K(y,x)}\,w(y)\,dy = 0 \qquad (13.2.12)$$
such that for $f \in L^2(\Omega)$ a solution of (13.2.10) exists if and only if f satisfies the m solvability conditions $\langle f, w_j\rangle = 0$ for $j = 1,\dots,m$. In that case (13.2.10) has the m parameter family of solutions
$$u = u_p + \sum_{j=1}^{m} c_jv_j \qquad (13.2.13)$$
where $u_p$ denotes any particular solution of (13.2.10).

• If λ = 0 then either existence or uniqueness may fail. The condition $\langle f, w\rangle = 0$ for every solution w of
$$\int_\Omega \overline{K(y,x)}\,w(y)\,dy = 0 \qquad (13.2.14)$$
is necessary, but in general insufficient, for the existence of a solution of (13.2.10).
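As an illustration of how equations of the form (13.2.10) are solved in practice, here is a hedged sketch of the standard Nyström discretization: midpoint quadrature turns the integral equation into a finite linear system. The kernel, the data f, and the value of λ below are hypothetical choices, not taken from the text.

```python
import numpy as np

# Nystrom sketch for lambda*u(x) - int_0^1 K(x,y)u(y)dy = f(x):
# (lambda*I - h*K) u = f on the quadrature grid. The eigenvalues of
# K(x,y) = min(x,y) are 1/((n-1/2)^2 pi^2) < 0.5, so lambda = 0.5 is safe.
M = 200
x = (np.arange(M) + 0.5) / M
h = 1.0 / M
K = np.minimum.outer(x, x)
f = np.ones(M)
lam = 0.5

u = np.linalg.solve(lam * np.eye(M) - h * K, f)
resid = lam * u - h * (K @ u) - f            # should vanish at the nodes
print("max residual:", np.abs(resid).max())
```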
13.3 The case of self-adjoint compact operators

In this section we continue with the study of the spectral properties of compact operators, but now make the additional assumption that the operator is self-adjoint. As motivation, let us recall that in the finite dimensional case a Hermitian matrix is always diagonalizable, and in particular there exists an orthonormal basis of eigenvectors of the matrix.
If $Tx = Ax$ where A is an $N\times N$ Hermitian matrix with eigenvalues $\{\lambda_1,\dots,\lambda_N\}$ and corresponding orthonormal eigenvectors $\{u_1,\dots,u_N\}$, and we let U denote the $N\times N$ matrix whose columns are $u_1,\dots,u_N$, then $U^*U = I$ and $U^*AU = D$ where D is a diagonal matrix with diagonal entries $\lambda_1,\dots,\lambda_N$. It follows that
$$Ax = UDU^*x = \sum_{j=1}^{N} \lambda_j\langle u_j, x\rangle u_j \qquad (13.3.1)$$
or equivalently
$$T = \sum_{j=1}^{N} \lambda_jP_j \qquad (13.3.2)$$
where Pj is the orthogonal projection onto the span of uj . The property that an operator
may have of being expressible as a linear combination of projections is a useful one
when true, and as we will see in this section is generally correct for compact self-adjoint
operators.
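The finite dimensional statement (13.3.2) is easy to verify numerically. The following sketch (assuming NumPy, with a randomly generated Hermitian matrix) rebuilds A from its eigenvalues and the rank one orthogonal projections onto the eigenvectors.

```python
import numpy as np

# Sketch: A = sum_j lambda_j P_j for a random Hermitian matrix,
# with P_j the orthogonal projection onto span(u_j).
rng = np.random.default_rng(1)
N = 5
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (B + B.conj().T) / 2                    # Hermitian

lam, U = np.linalg.eigh(A)                  # columns of U are orthonormal
A_rebuilt = np.zeros_like(A)
for j in range(N):
    u_j = U[:, j:j+1]
    A_rebuilt += lam[j] * (u_j @ u_j.conj().T)   # lambda_j * P_j

print("reconstruction error:", np.abs(A - A_rebuilt).max())  # ~ 1e-15
```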
Definition 13.4. If T is a linear operator on a Hilbert space H, the Rayleigh quotient for T is
$$J(x) = \frac{\langle Tx, x\rangle}{\|x\|^2} \qquad (13.3.3)$$

Clearly $J : D(T)\setminus\{0\} \to \mathbb{C}$ and $|J(x)| \leq \|T\|$ for $T \in B(H)$. If T is self-adjoint then J is real valued, since
$$\langle Tx, x\rangle = \langle x, Tx\rangle = \overline{\langle Tx, x\rangle} \qquad (13.3.4)$$
The range of the function J is sometimes referred to as the numerical range of T, and we may occasionally use the notation $Q(x) = \langle Tx, x\rangle$, the so-called quadratic form associated with T. Note also that $\sigma_p(T)$ is contained in the numerical range of T, since $J(x) = \lambda$ if $Tx = \lambda x$.
Theorem 13.8. If $T \in B(H)$ and $T = T^*$ then
$$\|T\| = \sup_{x\neq 0} |J(x)| \qquad (13.3.5)$$
Proof: If $M_T = \sup_{x\neq 0}|J(x)|$ then we have already observed that $M_T \leq \|T\|$. To derive the reverse inequality, first observe that since J is real valued,
$$\langle T(x+y), x+y\rangle \leq M_T\|x+y\|^2 \qquad (13.3.6)$$
$$-\langle T(x-y), x-y\rangle \leq M_T\|x-y\|^2 \qquad (13.3.7)$$
for any $x, y \in H$. Adding these inequalities and using the self-adjointness gives
$$2\,\mathrm{Re}\,\langle Tx, y\rangle = \langle Tx, y\rangle + \langle Ty, x\rangle \leq M_T(\|x\|^2 + \|y\|^2) \qquad (13.3.9)$$
If $x \notin N(T)$ choose $y = (\|x\|/\|Tx\|)Tx$, so that $\|y\| = \|x\|$ and $\langle Tx, y\rangle = \|Tx\|\,\|x\|$. It follows that
$$2\|Tx\|\,\|x\| \leq 2M_T\|x\|^2 \qquad (13.3.10)$$
and therefore $\|Tx\| \leq M_T\|x\|$ holds for $x \notin N(T)$. Since the same conclusion is obvious for $x \in N(T)$ the proof is completed.
We note that the conclusion of the theorem is false without the self-adjointness assumption; for example, J(x) = 0 for all x if T is the operator of rotation by π/2 in $\mathbb{R}^2$.
Now consider the function $\alpha \mapsto J(x + \alpha y)$ for fixed $x, y \in H\setminus\{0\}$. As a function of α it is simply a quotient of quadratic functions, hence differentiable at any α for which $\|x + \alpha y\| \neq 0$. In particular
$$\frac{d}{d\alpha}J(x + \alpha y)\Big|_{\alpha=0} \qquad (13.3.11)$$
is well defined for any $x \neq 0$. This expression is the directional derivative of J at x in the y direction, and we say that x is a critical point of J if (13.3.11) is zero for every direction y.
We may evaluate (13.3.11) by elementary calculus rules, and we find that
$$\frac{d}{d\alpha}J(x + \alpha y)\Big|_{\alpha=0} = \frac{\langle x, x\rangle(\langle Tx, y\rangle + \langle Ty, x\rangle) - \langle Tx, x\rangle(\langle x, y\rangle + \langle y, x\rangle)}{\langle x, x\rangle^2} \qquad (13.3.12)$$
so at a critical point it must hold that
$$\mathrm{Re}\,\langle Tx, y\rangle = J(x)\,\mathrm{Re}\,\langle x, y\rangle \qquad \forall y \in H \qquad (13.3.13)$$
Replacing y by iy we obtain
$$\mathrm{Im}\,\langle Tx, y\rangle = J(x)\,\mathrm{Im}\,\langle x, y\rangle \qquad \forall y \in H \qquad (13.3.14)$$
and since J is real valued,
$$\langle Tx, y\rangle = J(x)\langle x, y\rangle \qquad \forall y \in H \qquad (13.3.15)$$
If $\lambda = J(x)$ then $\langle Tx - \lambda x, y\rangle = 0$ for all $y \in H$, so that $Tx = \lambda x$ must hold. We therefore see that eigenvalues of a self-adjoint operator T may be obtained from critical points of the corresponding Rayleigh quotient, and it is also clear that the right side of (13.3.12) evaluates to zero for any y if $Tx = \lambda x$. We have therefore established the following.
Proposition 13.3. Let T be a bounded self-adjoint operator on H. Then x ∈ H\{0} is
a critical point of J if and only if x is an eigenvector of T corresponding to eigenvalue
λ = J(x).
We emphasize that at this point we have not yet proved that any such critical points
exist, and indeed we know that a bounded self-adjoint operator can have an empty point
spectrum, for example a multiplication operator if the multiplier is real valued and all
of its level sets have measure zero. Nevertheless we have identified a strategy that will
succeed in proving the existence of eigenvalues, once some additional assumptions are
made. The main such additional assumption we will make is that T is compact.
Theorem 13.9. If T ∈ K(H) and T = T ∗ then either J or −J achieves its maximum
on H\{0}. In particular, either ||T || or −||T || (or both) belong to σp (T ).
Proof: If T = 0 then J(x) ≡ 0 and the conclusion is obvious. Otherwise, if $\|T\| > 0$, then by Theorem 13.8 either
$$\sup_{x\neq 0} J(x) = M_T = \|T\| \qquad \text{or} \qquad \inf_{x\neq 0} J(x) = -M_T = -\|T\| \qquad (13.3.16)$$
or both. For definiteness we assume that the first of these is true, in which case there must exist a sequence $\{x_n\}_{n=1}^\infty$ in H such that $J(x_n) \to M_T$. Without loss of generality we may assume $\|x_n\| = 1$ for all n, so that $\langle Tx_n, x_n\rangle \to M_T$. By weak compactness there is a subsequence $x_{n_k} \xrightarrow{w} x$ for some $x \in H$, and since T is compact we also have $Tx_{n_k} \to Tx$. Thus
$$0 \leq \|Tx_{n_k} - M_Tx_{n_k}\|^2 = \|Tx_{n_k}\|^2 + M_T^2\|x_{n_k}\|^2 - 2M_T\langle Tx_{n_k}, x_{n_k}\rangle \qquad (13.3.17)$$
Letting $k \to \infty$ the right hand side tends to $\|Tx\|^2 - M_T^2 \leq 0$, and thus $\|Tx\| = M_T$. Furthermore, $Tx_{n_k} - M_Tx_{n_k} \to 0$, and since $M_T \neq 0$, $\{x_{n_k}\}$ must be strongly convergent to x; in particular $\|x\| = 1$. Thus we have $Tx = M_Tx$ for some $x \neq 0$, so that $J(x) = M_T$, which means that J achieves its maximum at x and x is an eigenvector corresponding to eigenvalue $\|T\| = M_T$, as needed.
According to this theorem, any nonzero, compact, self-adjoint operator has at least one eigenvector $x_1$ corresponding to an eigenvalue $\lambda_1 \neq 0$. If another such eigenvector exists which is not a scalar multiple of $x_1$, then it must be possible to find one which is orthogonal to $x_1$, since eigenvectors corresponding to distinct eigenvalues are automatically orthogonal (Theorem 12.5), while the eigenvectors corresponding to $\lambda_1$ form a subspace of which we can find an orthogonal basis. This suggests that we seek another eigenvector by maximizing or minimizing the Rayleigh quotient over the subspace $H_1 = \{x_1\}^\perp$.
Let us first make a definition and a simple observation.
Definition 13.5. If T is a linear operator on H then a subspace M ⊂ D(T ) is invariant
for T if T (M ) ⊂ M .
It is obvious that any eigenspace of T is invariant for T , and in the case of a self-adjoint
operator we have also the following.
Lemma 13.7. If $T \in B(H)$ is self-adjoint and M is an invariant subspace for T, then $M^\perp$ is also invariant for T.
Proof: If $y \in M$ and $x \in M^\perp$ then
$$\langle Tx, y\rangle = \langle x, Ty\rangle = 0 \qquad (13.3.18)$$
since $Ty \in M$. Thus $Tx \in M^\perp$.
Now defining $H_1 = \{x_1\}^\perp$ as above, we have immediately by Lemma 13.7 that $H_1$ is invariant for T, so that the restriction of T to $H_1$ belongs to $B(H_1)$ and clearly inherits the properties of compactness and self-adjointness. Theorem 13.9 is therefore immediately applicable, so that the restriction of T to $H_1$ has an eigenvector $x_2$, which is also an eigenvector of T and which is automatically orthogonal to $x_1$. The corresponding eigenvalue is $\lambda_2 = \pm\|T_1\|$, where $T_1$ is the restriction of T to $H_1$, and so obviously $|\lambda_2| \leq |\lambda_1|$.
Continuing this way we obtain orthogonal eigenvectors $x_1, x_2, \dots$ corresponding to real eigenvalues with $|\lambda_1| \geq |\lambda_2| \geq \dots$, where
$$|\lambda_{n+1}| = \max_{\substack{x \in H_n \\ x\neq 0}} |J(x)| = \|T_n\| \qquad (13.3.19)$$
where $H_n = \{x_1,\dots,x_n\}^\perp$ and $T_n$ is the restriction of T to $H_n$. Without loss of generality $\|x_n\| = 1$ for all n obtained this way.
There are now two possibilities: either (i) the process continues indefinitely with $\lambda_n \neq 0$ for all n, or (ii) $\lambda_{n+1} = 0$ for some n. In the first case we must have $\lim_{n\to\infty}\lambda_n = 0$, by Theorem 13.6 and the fact that every eigenspace is of finite dimension. In case (ii), T has only finitely many linearly independent eigenvectors corresponding to nonzero eigenvalues $\lambda_1,\dots,\lambda_n$, and T = 0 on $H_n$. Assuming for definiteness that H is separable and of infinite dimension, $H_n = N(T)$ is then the eigenspace for λ = 0, which must itself be infinite dimensional.
Theorem 13.10. Let H be a separable Hilbert space. If $T \in K(H)$ is self-adjoint then

a) $\overline{R(T)}$ has an orthonormal basis consisting of eigenvectors $\{x_n\}$ of T corresponding to eigenvalues $\lambda_n \neq 0$.

b) H has an orthonormal basis consisting of eigenvectors of T.
Proof: Let $\{x_n\}$ be the finite or countably infinite set of eigenvectors corresponding to the nonzero eigenvalues of T as constructed above. For $x \in H$ let $y = x - \sum_{j=1}^{n}\langle x, x_j\rangle x_j$ for some n. Then y is the orthogonal projection of x onto $H_n$, so $\|y\| \leq \|x\|$ and $\|Ty\| \leq |\lambda_{n+1}|\,\|y\|$. In particular
$$\Big\|Tx - \sum_{j=1}^{n}\langle Tx, x_j\rangle x_j\Big\|^2 = \Big\|Tx - \sum_{j=1}^{n}\langle x, x_j\rangle Tx_j\Big\|^2 \leq |\lambda_{n+1}|^2\|x\|^2 \qquad (13.3.20)$$
where we have used that
$$\langle x, x_j\rangle Tx_j = \langle x, x_j\rangle\lambda_jx_j = \langle x, \lambda_jx_j\rangle x_j = \langle x, Tx_j\rangle x_j = \langle Tx, x_j\rangle x_j \qquad (13.3.21)$$
Letting $n \to \infty$, or taking n sufficiently large in the case of a finite number of nonzero eigenvalues, we therefore see that Tx is in the closed span of $\{x_n\}$. This completes the proof of a).
If we now let $\{z_n\}$ be any orthonormal basis of the closed subspace N(T), then each $z_n$ is an eigenvector of T corresponding to eigenvalue λ = 0, and $z_n \perp x_m$ for any m, n since $N(T) = R(T)^\perp$. For any $x \in H$ let $y = \sum_n \langle x, x_n\rangle x_n$; the series must be convergent by Proposition 6.3 and the fact that $\sum_n |\langle x, x_n\rangle|^2 \leq \|x\|^2$. It is immediate that $x - y \in N(T)$, since
$$Tx = Ty = \sum_n \lambda_n\langle x, x_n\rangle x_n \qquad (13.3.22)$$
and so x has the unique representation
$$x = \sum_n \langle x, x_n\rangle x_n + \sum_n \langle x, z_n\rangle z_n = \sum_n \langle x, x_n\rangle x_n + Px \qquad (13.3.23)$$
where P is the orthogonal projection onto the closed subspace N(T). Thus $\{x_n\} \cup \{z_n\}$ is an orthonormal basis of H.
We note that either sum in (13.3.23) can be finite or infinite, but of course they cannot both be finite unless H is finite dimensional. In the case of a non-separable Hilbert space it is only necessary to allow for an uncountable basis of N(T). From (13.3.22) we also get the diagonalization formula
$$T = \sum_n \lambda_nP_n \qquad (13.3.24)$$
where $P_nx = \langle x, x_n\rangle x_n$ is the orthogonal projection onto the span of $x_n$.
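Here is a sketch of the diagonalization (13.3.24) in a discretized setting. The symmetric kernel K(x,y) = min(x,y) on (0,1) is an illustrative choice whose exact eigenvalues, $1/((n-\frac{1}{2})^2\pi^2)$, are known, so the computed spectrum can be checked.

```python
import numpy as np

# Discretized compact self-adjoint operator Tu(x) = int_0^1 min(x,y)u(y)dy,
# via the midpoint rule; compare the top eigenvalues with the exact ones.
M = 1000
x = (np.arange(M) + 0.5) / M
h = 1.0 / M
T = h * np.minimum.outer(x, x)

lam = np.linalg.eigvalsh(T)[::-1]          # eigenvalues, descending
print("computed:", lam[:5])
print("exact:   ", 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2))
```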
The existence of an eigenfunction basis provides a convenient tool for the study of corresponding operator equations. Let us consider the problem
$$\lambda x - Tx = f \qquad (13.3.25)$$
where T is a compact, self-adjoint operator on a separable, infinite dimensional Hilbert space H. Let $\{x_n\}_{n=1}^\infty$ be an orthonormal basis of eigenvectors of T. We may then expand f, and the solution x if it exists, in this basis:
$$x = \sum_{n=1}^{\infty} a_nx_n \qquad f = \sum_{n=1}^{\infty} b_nx_n \qquad a_n = \langle x, x_n\rangle \quad b_n = \langle f, x_n\rangle \qquad (13.3.26)$$
Inserting these into the equation and using $Tx_n = \lambda_nx_n$, there results
$$\sum_{n=1}^{\infty} ((\lambda - \lambda_n)a_n - b_n)x_n = 0 \qquad (13.3.27)$$
Thus it is a necessary condition that $(\lambda - \lambda_n)a_n = b_n$ for all n, in order that a solution x exist.
Now let us consider several cases.

Case 1. If $\lambda \neq \lambda_n$ for every n and $\lambda \neq 0$, then $\lambda \in \rho(T)$, so a unique solution x of (13.3.25) exists, which must be given by
$$x = \sum_{n=1}^{\infty} \frac{\langle f, x_n\rangle}{\lambda - \lambda_n}\,x_n \qquad (13.3.28)$$
Note that there exists a constant C such that $1/|\lambda - \lambda_n| \leq C$ for all n (recall that 0 is the only possible accumulation point of $\{\lambda_n\}$), from which it follows directly that the series is convergent in H and $\|x\| \leq C\|f\|$.
Case 2. Suppose $\lambda = \lambda_m$ for some m and $\lambda \neq 0$. It is then necessary that $b_n = 0$ for all n for which $\lambda_n = \lambda_m$, which amounts precisely to the solvability condition on f already derived, namely that $f \perp z$ for all $z \in N(\lambda I - T)$. When this holds the constants $a_n$ may be chosen arbitrarily for these n values, while $a_n = b_n/(\lambda - \lambda_n)$ must hold otherwise. Thus the general solution may be written
$$x = \sum_{\{n:\lambda_n \neq \lambda_m\}} \frac{\langle f, x_n\rangle}{\lambda - \lambda_n}\,x_n + \sum_{\{n:\lambda_n = \lambda_m\}} c_nx_n \qquad (13.3.29)$$
for any $f \in R(\lambda I - T)$.
Case 3. If λ = 0 and $\lambda_n \neq 0$ for all n, then the unique solution is given by
$$x = -\sum_{n=1}^{\infty} \frac{\langle f, x_n\rangle}{\lambda_n}\,x_n \qquad (13.3.30)$$
provided the series is convergent in H. Since $\lambda_n \to 0$ must hold in this case, there will always exist $f \in H$ for which the series is not convergent, as must be the case since R(T) is dense in, but not equal to, all of H. In fact we obtain the precise characterization that $f \in R(T)$ if and only if
$$\sum_{n=1}^{\infty} \frac{|\langle f, x_n\rangle|^2}{\lambda_n^2} < \infty \qquad (13.3.31)$$
Case 4. If $\lambda = 0 \in \sigma_p(T)$ let $\{x_n\} \cup \{z_n\}$ be an orthonormal basis of eigenvectors as above, with the $z_n$'s being a basis of N(T). If a solution x exists, then by matching coefficients in the basis expansions of Tx and f we get that a solution exists if f has the properties
$$\langle f, z_n\rangle = 0 \quad \forall n \qquad \text{and} \qquad \sum_n \frac{|\langle f, x_n\rangle|^2}{\lambda_n^2} < \infty \qquad (13.3.32)$$
in which case the general solution is
$$x = -\sum_n \frac{\langle f, x_n\rangle}{\lambda_n}\,x_n + \sum_n c_nz_n \qquad \sum_n |c_n|^2 < \infty \qquad (13.3.33)$$
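The case analysis above translates directly into a computational recipe. Here is a hedged sketch of Case 1 for a discretized operator; the kernel K(x,y) = min(x,y), the data f, and the choice λ = -1 (which lies in the resolvent set) are illustrative assumptions.

```python
import numpy as np

# Solve lambda*x - T x = f through the eigenvector expansion (13.3.28).
M = 500
t = (np.arange(M) + 0.5) / M
h = 1.0 / M
T = h * np.minimum.outer(t, t)

lam_n, V = np.linalg.eigh(T)               # T = V diag(lam_n) V^T
f = t * (1 - t)
lam = -1.0                                 # in the resolvent set of T

x = V @ ((V.T @ f) / (lam - lam_n))        # sum_n <f,x_n>/(lam-lam_n) x_n
print("residual:", np.abs(lam * x - T @ x - f).max())   # ~ machine precision
```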
13.4 Some properties of eigenvalues
When T is a self-adjoint compact operator, we have seen in the previous section that solution formulas for the equation $\lambda x - Tx = f$ can be given purely in terms of the eigenvalues and eigenvectors of T, along with f itself. This means that all of the properties of T are encoded by these eigenvalues and eigenvectors. We will briefly pursue some consequences of this in the case that T is an integral operator, in which case we may anticipate that properties of the kernel of the operator are directly connected to those of the eigenvalues and eigenvectors. Thus let
$$Tu(x) = \int_\Omega K(x,y)u(y)\,dy \qquad (13.4.1)$$
where $K \in L^2(\Omega\times\Omega)$ and $K(x,y) = \overline{K(y,x)}$. Considered as an operator on $L^2(\Omega)$, Theorem 13.10 is then applicable, so we know there must exist an orthonormal basis of eigenfunctions $\{u_n\}_{n=1}^\infty$ and real eigenvalues $\lambda_n$ such that $Tu_n = \lambda_nu_n$, i.e.
$$\int_\Omega K(x,y)u_n(y)\,dy = \lambda_nu_n(x) \qquad (13.4.2)$$
or equivalently
$$\int_\Omega K(y,x)\,\overline{u_n(y)}\,dy = \lambda_n\overline{u_n(x)} \qquad (13.4.3)$$
This means that for almost every $x \in \Omega$, $\lambda_n\overline{u_n(x)}$ is the n'th generalized Fourier coefficient of $K(\cdot,x)$ with respect to the $u_n$ basis. In particular, by the Bessel equality,
$$\int_\Omega |K(x,y)|^2\,dy = \sum_{n=1}^{\infty} \lambda_n^2|u_n(x)|^2 \qquad \text{for a.e. } x \in \Omega \qquad (13.4.4)$$
and integrating with respect to x gives
$$\int\!\!\int_{\Omega\times\Omega} |K(x,y)|^2\,dy\,dx = \sum_{n=1}^{\infty}\int_\Omega \lambda_n^2|u_n(x)|^2\,dx = \sum_{n=1}^{\infty} \lambda_n^2 \qquad (13.4.5)$$
It also follows from the above considerations that
$$K(y,x) = \sum_{n=1}^{\infty} \lambda_n\overline{u_n(x)}\,u_n(y) \qquad (13.4.6)$$
or
$$K(x,y) = \sum_{n=1}^{\infty} \lambda_nu_n(x)\overline{u_n(y)} \qquad (13.4.7)$$
in the sense that the convergence takes place in $L^2(\Omega)$ with respect to y for a.e. x, and vice versa. Formally at least, it follows by setting y = x that
$$K(x,x) = \sum_{n=1}^{\infty} \lambda_n|u_n(x)|^2 \qquad (13.4.8)$$
and integrating in x that
$$\int_\Omega K(x,x)\,dx = \sum_{n=1}^{\infty} \lambda_n \qquad (13.4.9)$$
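For the illustrative kernel K(x,y) = min(x,y) on (0,1), whose eigenvalues are $\lambda_n = 1/((n-\frac{1}{2})^2\pi^2)$, the identities (13.4.5) and (13.4.9) can be checked numerically; the exact integrals are $\int\!\int K^2 = 1/6$ and $\int_0^1 K(x,x)\,dx = \int_0^1 x\,dx = 1/2$.

```python
import numpy as np

# Check sum lambda_n^2 = 1/6 and sum lambda_n = 1/2 for K(x,y) = min(x,y).
n = np.arange(1.0, 200001.0)
lam = 1.0 / ((n - 0.5) ** 2 * np.pi ** 2)

print("sum lambda_n^2 =", (lam ** 2).sum(), " (exact 1/6 =", 1.0 / 6, ")")
print("sum lambda_n   =", lam.sum(), " (exact 1/2; the tail is O(1/N))")
```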
This identity, however, cannot be proved correct without further assumptions, if for no other reason than that K(x,x), being the restriction of K to a set of measure zero in Ω × Ω, could be changed in an arbitrary way without changing the spectrum of T. Here we state without proof Mercer's theorem, which gives sufficient conditions for (13.4.9) to hold; see for example [8], p. 138.
Theorem 13.11. Let T be the compact self-adjoint integral operator (13.4.1). Assume
that Ω is bounded, K is continuous on Ω × Ω and that all but finitely many of the nonzero
eigenvalues of T are of the same sign. Then (13.4.7) is valid, where the convergence is
absolute and uniform, and in particular (13.4.9) holds.
13.5 The Singular Value Decomposition and Normal Operators
If T is a compact operator we know from explicit examples that the point spectrum of T may be empty. However if we let $S = T^*T$, then S is compact and self-adjoint (see Exercise 1), so that Theorem 13.10 applies to S. There must therefore exist an orthonormal basis $\{x_n\}_{n=1}^\infty$ of H consisting of eigenvectors of S, i.e.
$$T^*Tx_n = \lambda_nx_n \qquad (13.5.1)$$
Note that if J is the Rayleigh quotient for S, then
$$\lambda_n = J(x_n) = \langle Sx_n, x_n\rangle = \|Tx_n\|^2 \geq 0 \qquad (13.5.2)$$
We define $\sigma_n = \sqrt{\lambda_n}$ to be the n'th singular value of T. If $T \neq 0$ and we list the nonzero eigenvalues of S in decreasing order, $\lambda_1 \geq \lambda_2 \geq \dots$ (this is possibly a finite list), then from Theorem 13.9 it is immediate that $\lambda_1 = \|T\|^2$, so we have the following simple but important result.
Proposition 13.4. If T ∈ K(H) then ||T || = σ1 , the largest singular value of T .
Now for any n for which $\lambda_n > 0$, let $y_n = Tx_n/\sigma_n$. We then have
$$Tx_n = \sigma_ny_n \qquad T^*y_n = \sigma_nx_n \qquad (13.5.3)$$
The $x_n$'s are orthonormal by construction, and
$$\langle y_n, y_m\rangle = \frac{1}{\sigma_n\sigma_m}\langle Tx_n, Tx_m\rangle = \frac{1}{\sigma_n\sigma_m}\langle T^*Tx_n, x_m\rangle = \frac{\lambda_n}{\sigma_n\sigma_m}\langle x_n, x_m\rangle \qquad (13.5.4)$$
so that the $y_n$'s are also orthonormal. We say that $x_n$ is the n'th right singular vector of T and $y_n$ is the n'th left singular vector. The collection $\{\lambda_n, x_n, y_n\}$ is a singular system for T.
From (13.3.23) we then have
$$Tx = \sum_n \langle x, x_n\rangle Tx_n = \sum_n \sigma_n\langle x, x_n\rangle y_n \qquad (13.5.5)$$
or
$$T = \sum_n \sigma_nQ_n \qquad \text{where } Q_nx = \langle x, x_n\rangle y_n \qquad (13.5.6)$$
Here $Q_n$ is not a projection unless $x_n = y_n$, but is still a so-called rank one operator. This representation of T as a sum of rank one operators is the singular value decomposition of T.
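A numerical sketch of the singular value decomposition for the Volterra operator of Exercise 13 below; the exact singular values used for comparison, $\sigma_n = 1/((n-\frac{1}{2})\pi)$, are obtained by solving the ODE eigenvalue problem for $T^*T$, so in particular $\|T\| = 2/\pi$.

```python
import numpy as np

# Discretized Volterra operator Tu(x) = int_0^x u(s) ds on L^2(0,1).
M = 1000
h = 1.0 / M
T = h * np.tril(np.ones((M, M)))           # midpoint rule for int_0^x

sigma = np.linalg.svd(T, compute_uv=False)
print("computed:", sigma[:5])
print("exact:   ", 1.0 / ((np.arange(1, 6) - 0.5) * np.pi))
print("||T|| ~", sigma[0], " vs 2/pi =", 2 / np.pi)
```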
Now let us consider a normal operator $T \in K(H)$, which we recall means that $T^*T = TT^*$. For simplicity let us also assume that all eigenvalues of the compact self-adjoint operator $S = T^*T$ are nonzero and simple. In that case if $Sx_n = \lambda_nx_n$ it follows that
$$STx_n = T^*T\,Tx_n = TT^*Tx_n = TSx_n = \lambda_nTx_n \qquad (13.5.7)$$
which means either $Tx_n = 0$ or $Tx_n$ is an eigenvector of S corresponding to $\lambda_n$. The first case cannot occur, since then $Sx_n = 0$ would hold, so it must be that $x_n$ and $Tx_n$ are nonzero and linearly dependent, $Tx_n = \theta_nx_n$ for some $\theta_n \in \mathbb{C}\setminus\{0\}$. Thus H has an orthonormal basis consisting of eigenvectors of T, since these are the same as the eigenvectors of S. With a somewhat more complicated proof, the same can be shown for any normal operator T; see Section 56 of [2].
13.6 Exercises
1. Show that if S ∈ B(H) and T is compact, then T S and ST are also compact. (In
algebraic terms this means that the set of compact operators is an ideal in B(H).)
2. If T ∈ B(H) and T ∗ T is compact, show that T must be compact. Use this to show
that if T is compact then T ∗ must also be compact.
3. Prove that a sequence $\{x_n\}_{n=1}^\infty$ in a Hilbert space can have at most one weak limit.
4. If T ∈ B(H) is compact and H is of infinite dimension, show that 0 ∈ σ(T ).
5. Let $\{\phi_j\}_{j=1}^n$, $\{\psi_j\}_{j=1}^n$ be linearly independent sets in $L^2(\Omega)$, let
$$K(x,y) = \sum_{j=1}^{n} \phi_j(x)\psi_j(y)$$
be the corresponding degenerate kernel, and let T be the corresponding integral operator. Show that the problem of finding the nonzero eigenvalues of T always amounts to a matrix eigenvalue problem. In particular, show that T has at most n nonzero eigenvalues. Find $\sigma_p(T)$ in the case that $K(x,y) = 6 + 12xy + 60x^2y^3$ and $\Omega = (0,1)$. (Feel free to use Matlab or some such thing to solve the resulting matrix eigenvalue problem; see the sketch below.)
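A sketch of the matrix computation suggested in this exercise (in Python rather than Matlab): writing an eigenfunction as $u = \sum_k c_k\phi_k$ with $\phi = (6, 12x, 60x^2)$ and $\psi = (1, y, y^3)$, the equation $Tu = \lambda u$ leads to $Ac = \lambda c$ with $A_{jk} = \int_0^1 \psi_j(x)\phi_k(x)\,dx$; the entries below are these exact integrals.

```python
import numpy as np

# Matrix eigenvalue problem for the degenerate kernel 6 + 12xy + 60x^2y^3.
A = np.array([
    [6.0, 6.0, 20.0],     # int 6,    int 12x,   int 60x^2
    [3.0, 4.0, 15.0],     # int 6x,   int 12x^2, int 60x^3
    [1.5, 2.4, 10.0],     # int 6x^3, int 12x^4, int 60x^5
])
print("nonzero eigenvalue candidates:", np.linalg.eigvals(A))
```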
6. Let
$$Tu(x) = \frac{1}{x}\int_0^x u(y)\,dy \qquad u \in L^2(0,1)$$
Show that $(0,2) \subset \sigma_p(T)$ and that T is not compact. (Suggestion: look for eigenfunctions in the form $u(x) = x^\alpha$.)
7. Let $\{\lambda_j\}_{j=1}^\infty$ be a sequence of nonzero real numbers satisfying
$$\sum_{j=1}^{\infty} \lambda_j^2 < \infty$$
Construct a symmetric Hilbert-Schmidt kernel K such that the corresponding integral operator has eigenvalues $\lambda_j$, $j = 1, 2, \dots$ and for which 0 is an eigenvalue of infinite multiplicity. (Suggestion: look for such a K in the form $K(x,y) = \sum_{j=1}^\infty \lambda_ju_j(x)u_j(y)$ where $\{u_j\}$ are orthonormal, but not complete, in $L^2(\Omega)$.)
8. Let T be the integral operator
$$Tu(x) = \int_0^1 (x+y)u(y)\,dy$$
on $L^2(0,1)$. Find $\sigma_p(T)$, $\sigma_c(T)$ and $\sigma_r(T)$ and the multiplicity of each eigenvalue.
9. On the Hilbert space $H = \ell^2$ define the operator T by
$$T\{x_1, x_2, \dots\} = \{a_1x_1, a_2x_2, \dots\}$$
for some sequence $\{a_n\}_{n=1}^\infty$. Show that T is compact if and only if $\lim_{n\to\infty} a_n = 0$.
10. Let T be the integral operator with kernel $K(x,y) = e^{-|x-y|}$ on $L^2(-1,1)$. Find all of the eigenvalues and eigenfunctions of T. (Suggestion: $Tu = \lambda u$ is equivalent to an ODE problem. Don't forget about boundary conditions. The eigenvalues may need to be characterized in terms of the roots of a certain nonlinear function.)
11. We say that $T \in B(H)$ is a positive operator if $\langle Tx, x\rangle \geq 0$ for all $x \in H$. If T is a positive self-adjoint compact operator, show that T has a square root; more precisely, there exists a compact self-adjoint operator S such that $S^2 = T$. (Suggestion: If $T = \sum_{n=1}^\infty \lambda_nP_n$ try $S = \sum_{n=1}^\infty \sqrt{\lambda_n}\,P_n$. In a similar manner one can define other fractional powers of T.)
12. Suppose that S ∈ B(H), 0 ∈ ρ(S), T is a compact operator on H, and N (S + T ) =
{0}. Show that the operator equation
Sx + T x = y
has a unique solution for every y ∈ H.
13. Compute the singular value decomposition of the Volterra operator
$$Tu(x) = \int_0^x u(s)\,ds$$
in $L^2(0,1)$ and use it to find $\|T\|$. Is T normal? (Suggestion: The equation $T^*Tu = \lambda u$ is equivalent to an ODE eigenvalue problem which you can solve explicitly.)
14. The concept of a Hilbert-Schmidt operator can be defined abstractly as follows. If H is a separable Hilbert space, we say that $T \in B(H)$ is Hilbert-Schmidt if
$$\sum_{n=1}^{\infty} \|Tu_n\|^2 < \infty \qquad (13.6.1)$$
for some orthonormal basis $\{u_n\}_{n=1}^\infty$ of H.

a) Show that if T is Hilbert-Schmidt then the sum in (13.6.1) must be finite for any orthonormal basis of H. (Suggestion: If $\{v_n\}_{n=1}^\infty$ is another orthonormal basis, then
$$\sum_{n=1}^{\infty} \|Tv_n\|^2 = \sum_{n,m=1}^{\infty} |(Tv_n, u_m)|^2 = \sum_{n,m=1}^{\infty} |(v_n, T^*u_m)|^2 = \sum_{n,m=1}^{\infty} |(u_n, T^*u_m)|^2$$
etc.)

b) Show that a Hilbert-Schmidt operator is compact.
15. If $Q \in B(H)$ is a Fredholm operator of index zero, show that there exists a one-to-one operator $S \in B(H)$ and $T \in K(H)$ such that Q = S + T. (Hint: Define T = AP where P is the orthogonal projection onto N(Q) and $A : N(Q) \to N(Q^*)$ is one-to-one and onto.)
Chapter 14

Spectra and Green's functions for differential operators
In this chapter we will focus more on spectral properties of unbounded operators, about
which we have had little to say up to this point. Two simple but key observations are that
many interesting unbounded linear operators have an inverse which is compact, and that
if λ 6= 0 is an eigenvalue of some operator then λ−1 is an eigenvalue of the inverse operator,
with the same eigenvector. Thus we may be able to obtain a great deal of information
about the spectrum of an unbounded operator by looking at its inverse, if the inverse
exists. We will carry this plan out in detail for two important special cases. The first is
the case of a second order differential operator in one space dimension (Sturm-Liouville
theory), and the second is the case of the Laplacian operator in a bounded domain of
RN .
14.1 Green's functions for second order ODEs
Let us reconsider the operator on $L^2(0,1)$ from Example 12.6, namely
$$Tu = -u'' \qquad D(T) = \{u \in H^2(0,1) : u(0) = u(1) = 0\} \qquad (14.1.1)$$
Any $u \in N(T)$ is a linear function vanishing at the endpoints, so the associated problem
$$-u'' = f \quad 0 < x < 1 \qquad u(0) = u(1) = 0 \qquad (14.1.2)$$
has at most one solution for any $f \in L^2(0,1)$. In fact an explicit solution formula was given in Exercise 7 of Chapter 2, at least for $f \in C([0,1])$, and it is not hard to check that it remains valid for $f \in L^2(0,1)$ in the sense that if
$$G(x,y) = \begin{cases} y(1-x) & 0 < y < x < 1 \\ x(1-y) & 0 < x < y < 1 \end{cases} \qquad (14.1.3)$$
then
$$u(x) = \int_0^1 G(x,y)f(y)\,dy \qquad (14.1.4)$$
satisfies $-u'' = f$ in the sense of distributions on (0,1), as well as the given boundary conditions.
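A quick numerical check of (14.1.3)-(14.1.4), assuming the right hand side f ≡ 1, for which the exact solution of (14.1.2) is u(x) = x(1-x)/2; the midpoint quadrature is an arbitrary choice.

```python
import numpy as np

# Verify that u(x) = int_0^1 G(x,y) f(y) dy solves -u'' = 1, u(0)=u(1)=0.
M = 400
y = (np.arange(M) + 0.5) / M
h = 1.0 / M
X, Y = np.meshgrid(y, y, indexing="ij")
G = np.where(Y < X, Y * (1 - X), X * (1 - Y))      # the kernel (14.1.3)

u = h * (G @ np.ones(M))                           # f = 1
print("max error:", np.abs(u - y * (1 - y) / 2).max())
```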
Let us next consider how (14.1.3)-(14.1.4) might be derived in the first place. Formally, if (14.1.4) holds, then
$$u''(x) = \int_0^1 G_{xx}(x,y)f(y)\,dy = -f(x) \qquad (14.1.5)$$
which suggests $G_{xx}(x,y) = -\delta(x-y)$ for all $y \in (0,1)$. This in turn means, in particular, that
$$G(x,y) = \begin{cases} Ax + B & 0 < x < y \\ Cx + D & y < x < 1 \end{cases} \qquad (14.1.6)$$
for some constants A, B, C, D. In order that u satisfy the required boundary conditions we should have $B = C + D = 0$. Recalling the discussion leading up to (7.3.27), we expect that $x \mapsto G(x,y)$ should be continuous at x = y and that $x \mapsto G_x(x,y)$ should have a jump of magnitude −1 at x = y. These four conditions uniquely determine the four coefficients defining G in (14.1.3). We call G the Green's function for the problem (14.1.2).
Now let us consider a more general situation of this type. Define a differential expression
$$Lu = a_2(x)u'' + a_1(x)u' + a_0(x)u \qquad (14.1.7)$$
where we require the coefficients to satisfy $a_j \in C([a,b])$ for $j = 0, 1, 2$ and $a_2(x) \neq 0$ on [a,b], together with boundary operators
$$B_1u = c_1u(a) + c_2u'(a) \qquad B_2u = c_3u(b) + c_4u'(b) \qquad |c_1| + |c_2| \neq 0,\quad |c_3| + |c_4| \neq 0 \qquad (14.1.8)$$
We seek a solution of the problem
$$Lu(x) = f(x) \quad a < x < b \qquad B_1u = B_2u = 0 \qquad (14.1.9)$$
in the form
$$u(x) = \int_a^b G(x,y)f(y)\,dy \qquad (14.1.10)$$
for some suitable kernel function G(x,y). Computing again formally,
$$Lu(x) = \int_a^b L_xG(x,y)f(y)\,dy \qquad (14.1.11)$$
where the subscript on L reminds us that L operates in the x variable for fixed y, we see that
$$L_xG = \delta(x-y) \qquad (14.1.12)$$
should hold, and
$$B_{1x}G = B_{2x}G = 0 \qquad (14.1.13)$$
in order that the boundary conditions for u be satisfied. In particular G should satisfy $L_xG = 0$ for $a < x < y < b$ and $a < y < x < b$, plus certain matching conditions at x = y which may be stated as follows: G should be continuous at x = y, since otherwise $L_xG$ would contain a term $C\delta'(x-y)$, and $G_x$ should experience a jump at x = y of the correct magnitude such that $a_2(x)G_{xx}(x,y)$ produces $\delta(x-y)$; in other words the jump in $G_x$ should be $1/a_2(y)$. The same conclusion could be (formally) derived by integrating both sides of (14.1.12) from $y - \epsilon$ to $y + \epsilon$ and letting $\epsilon \to 0+$. Thus our conditions may be summarized as
$$G(y+,y) - G(y-,y) = 0 \qquad G_x(y+,y) - G_x(y-,y) = \frac{1}{a_2(y)} \qquad (14.1.14)$$
$$B_{1x}G = B_{2x}G = 0 \qquad (14.1.15)$$
We now claim that such a function G(x,y) can be found, under the additional assumption that the homogeneous problem (14.1.9) with f ≡ 0 has only the zero solution. First observe that we can find non-trivial solutions $\phi_1, \phi_2$ of
$$L\phi_1 = 0 \quad a < x < b \qquad B_1\phi_1 = 0 \qquad (14.1.16)$$
$$L\phi_2 = 0 \quad a < x < b \qquad B_2\phi_2 = 0 \qquad (14.1.17)$$
since each amounts to a second order ODE with only one initial condition. Now look for G in the form
$$G(x,y) = \begin{cases} C_1(y)\phi_1(x) & a < x < y < b \\ C_2(y)\phi_2(x) & a < y < x < b \end{cases} \qquad (14.1.19)$$
It is then automatic that $L_xG = 0$ for $x \neq y$, and that the boundary conditions (14.1.15) hold. In order that the remaining conditions (14.1.14) be satisfied we must have
$$C_1(y)\phi_1(y) - C_2(y)\phi_2(y) = 0 \qquad (14.1.20)$$
$$C_1(y)\phi_1'(y) - C_2(y)\phi_2'(y) = -\frac{1}{a_2(y)} \qquad (14.1.21)$$
Thus unique values $C_1(y), C_2(y)$ exist provided the coefficient matrix is nonsingular, or equivalently the Wronskian of $\phi_1, \phi_2$ is nonzero for every y. But it is known from ODE theory that if the Wronskian is zero at any point then $\phi_1, \phi_2$ must be linearly dependent, in which case either one is a nontrivial solution of the homogeneous problem. This contradicts the assumption we made, and so the first part of the following theorem has been established.
Theorem 14.1. Assume that (14.1.9) with f ≡ 0 has only the zero solution. Then

1. There exists a unique function G(x,y) defined for $a \leq x, y \leq b$ such that $L_xG(x,y) = \delta(x-y)$ in the sense of distributions on (a,b) for fixed y, and (14.1.14), (14.1.15) hold.

2. G is bounded on $[a,b]\times[a,b]$.

3. If $f \in L^2(a,b)$ and
$$u(x) = Sf(x) := \int_a^b G(x,y)f(y)\,dy \qquad (14.1.22)$$
then u is the unique solution of (14.1.9).
In particular, if we define the unbounded linear operator
$$Tu = Lu \qquad D(T) = \{u \in L^2(a,b) : Lu \in L^2(a,b),\ B_1u = B_2u = 0\} \qquad (14.1.23)$$
then $T^{-1}$, given by (14.1.22), clearly satisfies the Hilbert-Schmidt condition and so is a compact operator on $L^2(a,b)$. (Note that we observe a careful distinction between the operator T and the differential expression L: the operator T corresponds to the triple $(L, B_1, B_2)$.)

Corollary 14.1. Assume that (14.1.9) with f ≡ 0 has only the zero solution and define T by (14.1.23). Then σ(T) consists of at most countably many nonzero simple eigenvalues with no finite accumulation point.
Proof: By Theorem 14.1, $0 \in \rho(T)$. If $\lambda \in \sigma(T)$ then $\mu = \lambda^{-1} \in \sigma(T^{-1})$, since if $\mu \in \rho(T^{-1})$ it would follow that the equation $Tu - \lambda u = f$ has the unique solution
$$u = \mu(\mu I - T^{-1})^{-1}T^{-1}f \qquad \mu = \lambda^{-1} \qquad (14.1.24)$$
which implies that $\lambda \in \rho(T)$. Thus σ(T) is contained in the set $\{\lambda : \lambda^{-1} \in \sigma(T^{-1})\}$, which is at most countable by Theorem 13.6. In addition every such point must be in $\sigma_p(T^{-1})$, and so $\sigma(T) = \sigma_p(T)$. Since $\sigma(T^{-1})$ is bounded with zero as its only possible accumulation point, it follows that σ(T) can have no finite accumulation point. Finally, all eigenvalues of T must be simple, since if there existed two linearly independent functions in $N(T - \lambda I)$ these would form a fundamental set for the ODE $Lu = \lambda u$. But then every solution of $Lu = \lambda u$ would have to be in D(T), in particular satisfying the boundary conditions $B_1u = B_2u = 0$, which is clearly false.
Example 14.1. For the case
$$Lu = u'' - u \qquad B_1u = u'(0) \quad B_2u = u(1) \qquad (14.1.25)$$
we can choose
$$\phi_1(x) = \cosh x \qquad \phi_2(x) = \sinh(x-1) \qquad (14.1.26)$$
The matching conditions at x = y then amount to
$$C_1(y)\cosh(y) - C_2(y)\sinh(y-1) = 0 \qquad (14.1.27)$$
$$C_1(y)\sinh(y) - C_2(y)\cosh(y-1) = -1 \qquad (14.1.28)$$
The solution pair is $C_1(y) = \sinh(y-1)/\cosh(1)$, $C_2(y) = \cosh(y)/\cosh(1)$, giving the Green's function
$$G(x,y) = \begin{cases} \dfrac{\sinh(y-1)\cosh(x)}{\cosh(1)} & 0 < x < y < 1 \\[1ex] \dfrac{\sinh(x-1)\cosh(y)}{\cosh(1)} & 0 < y < x < 1 \end{cases} \qquad (14.1.29)$$
If T is the operator corresponding to $L, B_1, B_2$ then it may be checked by explicit calculation that
$$\sigma(T) = \left\{-1 - \left(\left(n + \tfrac{1}{2}\right)\pi\right)^2\right\}_{n=0}^{\infty} \qquad (14.1.30)$$
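The formulas (14.1.29)-(14.1.30) can be checked numerically: the integral operator S with kernel G is $T^{-1}$, so its eigenvalues $\mu_n$ should approximate $1/\lambda_n$. The midpoint discretization below is an illustrative choice.

```python
import numpy as np

# Eigenvalues of the integral operator with the Green's function (14.1.29).
M = 1500
x = (np.arange(M) + 0.5) / M
h = 1.0 / M
X, Y = np.meshgrid(x, x, indexing="ij")
G = np.where(X < Y,
             np.sinh(Y - 1) * np.cosh(X),
             np.sinh(X - 1) * np.cosh(Y)) / np.cosh(1)

mu = np.linalg.eigvalsh(h * G)                     # G is symmetric
mu = mu[np.argsort(np.abs(mu))[::-1]]              # order by magnitude
print("1/mu_n:  ", 1.0 / mu[:4])
print("lambda_n:", -1 - ((np.arange(4) + 0.5) * np.pi) ** 2)
```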
14.2 Adjoint problems
Note in the last example that the Green's function is real and symmetric, so that the corresponding integral operator S in (14.1.22), and hence also $T = S^{-1}$, is self-adjoint. In this section we consider in more detail the adjoint of the operator T defined in (14.1.23). First we observe that formally, for $\phi, \psi \in C_0^\infty(a,b)$, we have
$$\langle L\phi, \psi\rangle = \int_a^b (a_2\phi'' + a_1\phi' + a_0\phi)\overline{\psi}\,dx = \int_a^b \phi\,\overline{((a_2\psi)'' - (a_1\psi)' + a_0\psi)}\,dx \qquad (14.2.2)$$
That is to say,
$$\langle L\phi, \psi\rangle = \langle \phi, L^*\psi\rangle \qquad (14.2.3)$$
where
$$L^*\psi = (a_2\psi)'' - (a_1\psi)' + a_0\psi \qquad (14.2.4)$$
For simplicity we will make the additional assumption on the coefficients that
$$a_j \in C^j([a,b]) \text{ and is real valued, for } j = 0, 1, 2 \qquad (14.2.5)$$
in which case (14.2.2) is correct. Furthermore, since
$$L^*\psi = a_2\psi'' + (2a_2' - a_1)\psi' + (a_2'' - a_1' + a_0)\psi \qquad (14.2.6)$$
we see that $L^*\psi = L\psi$ precisely if $a_1 = a_2'$. We say that the expression L is formally self-adjoint in this case, but note that this is not the same as having the corresponding operator T be self-adjoint, since so far there has been no taking account of the boundary conditions which are part of the definition of T.
To pursue this point, we see from an integration by parts that for any $\phi, \psi \in C^2([a,b])$ we have
$$\langle L\phi, \psi\rangle - \langle \phi, L^*\psi\rangle = J(\phi,\psi)\Big|_a^b \qquad (14.2.7)$$
where
$$J(\phi,\psi) = a_2(\phi'\overline{\psi} - \phi\overline{\psi}') + (a_1 - a_2')\phi\overline{\psi} \qquad (14.2.8)$$
is the boundary functional. Since we can choose φ, ψ to have compact support, in which case the boundary term is zero, the expression for $T^*$ must be given by $L^*$. Furthermore, $D(T^*)$ must be such that $J(\phi,\psi)\big|_a^b = 0$ whenever $\phi \in D(T)$ and $\psi \in D(T^*)$. As we will see, this amounts to the specification of two more homogeneous boundary conditions to be satisfied by ψ.
Example 14.2. As in Example 14.1 consider $L\phi = \phi'' - \phi$ on (0,1), which is formally self-adjoint, together with the boundary operators $B_1\phi = \phi'(0)$, $B_2\phi = \phi(1)$. By direct calculation we see that
$$J(\phi,\psi)\Big|_0^1 = \phi(0)\overline{\psi'(0)} + \phi'(1)\overline{\psi(1)} \qquad (14.2.9)$$
if $B_1\phi = B_2\phi = 0$. But otherwise $\phi(0), \phi'(1)$ can take on arbitrary values, and the only way this can vanish is if $\psi'(0) = \psi(1) = 0$, i.e. ψ satisfies the same boundary conditions as φ. Thus we expect that $T^* = T$, confirming what we saw earlier from the fact that $T^{-1}$ is self-adjoint.
Example 14.3. Let
$$L\phi = x^2\phi'' + x\phi' - \phi \quad 1 < x < 2 \qquad B_1\phi = \phi'(1) \quad B_2\phi = \phi(2) + \phi'(2) \qquad (14.2.10)$$
In this case we find that the expression for the adjoint operator is
$$L^*\psi = (x^2\psi)'' - (x\psi)' - \psi = x^2\psi'' + 3x\psi' \qquad (14.2.11)$$
Next, the boundary functional is
$$J(\phi,\psi) = x^2(\phi'\psi - \phi\psi') - x\phi\psi \qquad (14.2.12)$$
so that if $B_1\phi = B_2\phi = 0$ it follows that
$$J(\phi,\psi)\Big|_1^2 = \phi(2)(-6\psi(2) - 4\psi'(2)) + \phi(1)(\psi'(1) + \psi(1)) \qquad (14.2.13)$$
Since $\phi(1), \phi(2)$ can be chosen arbitrarily, it must be that
$$3\psi(2) + 2\psi'(2) = 0 \qquad \psi'(1) + \psi(1) = 0 \qquad (14.2.14)$$
for $\psi \in D(T^*)$.
Definition 14.1. We say that a set of boundary operators $\{B_1^*, B_2^*\}$ is adjoint to $\{B_1, B_2\}$, with respect to L, if
$$J(\phi,\psi)\Big|_a^b = 0 \qquad (14.2.15)$$
whenever $B_1\phi = B_2\phi = B_1^*\psi = B_2^*\psi = 0$. The conditions $B_1^*\psi = B_2^*\psi = 0$ are referred to as the adjoint boundary conditions (with respect to L).
Thus, for example, in Examples 14.2, 14.3 we found adjoint boundary operators $\{\psi'(0), \psi(1)\}$ and $\{\psi'(1)+\psi(1),\ 2\psi'(2)+3\psi(2)\}$ respectively. The operators $B_1^*, B_2^*$ are not themselves unique, since for example they could always be interchanged or multiplied by constants. However the subspace $\{\psi : B_1^*\psi = B_2^*\psi = 0\}$ is uniquely determined. If we now define $T^*\psi = L^*\psi$ on the domain
$$D(T^*) = \{\psi \in L^2(a,b) : L^*\psi \in L^2(a,b),\ B_1^*\psi = B_2^*\psi = 0\} \qquad (14.2.16)$$
then $\langle T\phi, \psi\rangle = \langle \phi, T^*\psi\rangle$ if $\phi \in D(T)$ and $\psi \in D(T^*)$, and so $T^*$ is the adjoint operator of T.
It can be shown (Exercise 4) that if $a_1 = a_2'$ (that is, L is formally self-adjoint), and the boundary conditions are of the form (14.1.8), then the adjoint boundary conditions coincide with the original boundary conditions, so that T is self-adjoint. It is also possible to consider non-separated boundary conditions of the form
$$B_1u = c_1u(a) + c_2u'(a) + c_3u(b) + c_4u'(b) = 0 \qquad (14.2.17)$$
$$B_2u = d_1u(a) + d_2u'(a) + d_3u(b) + d_4u'(b) = 0 \qquad (14.2.18)$$
to allow, for example, for periodic boundary conditions; see Exercise 6.
If T satisfies the assumptions of Theorem 14.1 then $N(T^*) = R(T)^\perp = \{0\}$. Thus $T^*$ also satisfies these assumptions, and so has a corresponding Green's function which we denote by $G^*(x,y)$. Let us observe, at least formally, the important property
$$G(x,y) = G^*(y,x) \qquad x, y \in (a,b) \qquad (14.2.19)$$
To see this, use $L_zG(z,y) = \delta(z-y)$, $L_z^*G^*(z,x) = \delta(z-x)$ to get
$$G^*(y,x) - G(x,y) = \int_a^b G^*(z,x)L_zG(z,y)\,dz - \int_a^b G(z,y)L_z^*G^*(z,x)\,dz = J(G(z,y), G^*(z,x))\Big|_{z=a}^{z=b} = 0 \qquad (14.2.20)$$
where the last equality follows from the fact that G and $G^*$ satisfy respectively the $\{B_1, B_2\}$ and $\{B_1^*, B_2^*\}$ boundary conditions as functions of their first variable. This confirms the expected result that $G(x,y) = G(y,x)$ if $T = T^*$. Furthermore it shows that, as a function of its second variable, G(x,y) satisfies the homogeneous adjoint equation for $x \neq y$ and the adjoint boundary conditions.
14.3 Sturm-Liouville theory
If the operator T in (14.1.23) is self-adjoint, then the existence of eigenvalues and eigenfunctions can be directly proved as a consequence of the fact that $T^{-1}$ is compact and self-adjoint. But even if T is not self-adjoint, it is still possible to obtain such results by using a special device known as the Liouville transformation. Essentially we will produce a compact self-adjoint operator in a slightly different space, whose spectrum must agree with that of T. The resulting conclusions about the spectral properties of second order ordinary differential operators, together with some other closely related facts, are generally referred to as Sturm-Liouville theory.
As in (14.1.7), let
$$L_0\phi = a_2(x)\phi'' + a_1(x)\phi' + a_0(x)\phi \qquad (14.3.1)$$
with the assumptions that $a_j \in C([a,b])$, and now for definiteness $a_2(x) < 0$. Define
$$p(x) = \exp\left(\int_a^x \frac{a_1(s)}{a_2(s)}\,ds\right) \qquad \rho(x) = -\frac{p(x)}{a_2(x)} \qquad q(x) = a_0(x)\rho(x) \qquad (14.3.2)$$
so that p, ρ are both positive and continuous on [a,b]. We then observe that $L_0\phi = \lambda\phi$ is equivalent to
$$-(p\phi')' + q\phi = \lambda\rho\phi \qquad (14.3.3)$$
If we define $L_1$, L by
$$L_1\phi = -(p\phi')' + q\phi \qquad L\phi = \frac{L_1\phi}{\rho} \qquad (14.3.4)$$
then we see that
$$L_0\phi = \lambda\phi \quad \text{if and only if} \quad L\phi = \lambda\phi \qquad (14.3.5)$$
Note that $L_1$ is formally self-adjoint. In order to realize L itself as a self-adjoint operator we introduce the weighted space
$$L_\rho^2(a,b) = \Big\{\phi : \int_a^b |\phi(x)|^2\rho(x)\,dx < \infty\Big\} \qquad (14.3.6)$$
Since ρ is continuous and positive on [a,b], this space may be regarded as a Hilbert space when equipped with the inner product
$$\langle\phi,\psi\rangle_\rho := \int_a^b \phi(x)\overline{\psi(x)}\,\rho(x)\,dx \qquad (14.3.7)$$
for which the corresponding norm $\|\phi\|_\rho^2 = \int_a^b |\phi(x)|^2\rho(x)\,dx$ is equivalent to the usual $L^2(a,b)$ norm. We obviously have
$$\langle L\phi,\psi\rangle_\rho - \langle\phi, L\psi\rangle_\rho = \langle L_1\phi,\psi\rangle - \langle\phi, L_1\psi\rangle = 0 \qquad (14.3.8)$$
for $\phi, \psi \in C_0^\infty(a,b)$. For $\phi, \psi \in C^2([a,b])$ we have instead, just as before, that
$$\langle L\phi,\psi\rangle_\rho - \langle\phi, L\psi\rangle_\rho = J(\phi,\psi)\Big|_a^b \qquad (14.3.9)$$
where here $J(\phi,\psi) = p(\phi'\overline{\psi} - \phi\overline{\psi}')$. In the case of separated boundary conditions (14.1.8) we still have the property remarked earlier that $\{B_1^*, B_2^*\} = \{B_1, B_2\}$, so that the operator $T_1$ corresponding to $\{L_1, B_1, B_2\}$ is self-adjoint.
It follows in particular that the solution of
$$L_1\phi = f \qquad B_1\phi = B_2\phi = 0 \qquad (14.3.10)$$
may be given as
$$\phi(x) = \int_a^b G_1(x,y)f(y)\,dy \qquad (14.3.11)$$
as long as there is no non-trivial solution of the homogeneous problem. The Green's function $G_1$ will have the properties stated in Theorem 14.1, and $G_1(x,y) = G_1(y,x)$ by the self-adjointness. The eigenvalue condition $L_1\phi = \lambda\rho\phi$ then amounts to
$$\phi(x) = \lambda\int_a^b G_1(x,y)\rho(y)\phi(y)\,dy \qquad (14.3.12)$$
If we let $\psi(x) = \sqrt{\rho(x)}\,\phi(x)$, $\mu = 1/\lambda$ and
$$G(x,y) = \sqrt{\rho(x)}\,\sqrt{\rho(y)}\,G_1(x,y) \qquad (14.3.13)$$
then we see that
$$\int_a^b G(x,y)\psi(y)\,dy = \mu\psi(x) \qquad (14.3.14)$$
must hold. Conversely, any nontrivial solution of (14.3.14) gives rise, via all of the same transformations, to an eigenfunction of $L_0$ with the $\{B_1, B_2\}$ boundary conditions. The integral operator S with kernel G is clearly compact and self-adjoint, and 0 is not an eigenvalue, since $S\psi = 0$ would imply that $u = 0$ solves $L_1u = \sqrt{\rho}\,\psi$, and hence that ψ = 0. In particular, if $S\psi = \mu\psi$ with $\mu \neq 0$, then $\phi = \psi/\sqrt{\rho}$ satisfies
$$L_0\phi = \lambda\phi \qquad B_1\phi = B_2\phi = 0 \qquad (14.3.15)$$
with $\lambda = 1/\mu$.
The choice we made that $a_2(x) < 0$ implies that the set of eigenvalues is bounded below. Consider for example the case that the boundary conditions are $\phi(a) = \phi(b) = 0$. From the fact that $\lambda_n$ and any corresponding eigenfunction $\phi_n$ satisfy $L_1\phi_n = \lambda_n\rho\phi_n$, it follows, upon multiplying by $\overline{\phi_n}$ and integrating by parts, that
$$\int_a^b (p|\phi_n'|^2 + q|\phi_n|^2)\,dx = \lambda_n\int_a^b \rho|\phi_n|^2\,dx \qquad (14.3.16)$$
Since p > 0 we get in particular that
$$\lambda_n = \frac{\int_a^b (p|\phi_n'|^2 + q|\phi_n|^2)\,dx}{\int_a^b \rho|\phi_n|^2\,dx} \geq \frac{\int_a^b q|\phi_n|^2\,dx}{\int_a^b \rho|\phi_n|^2\,dx} \geq C \qquad (14.3.17)$$
where $C = \min q/\max\rho$. The same conclusion holds for the case of more general boundary conditions, see Exercise 10.
Next we can say a little more about the eigenfunctions $\{\phi_n\}$. We know by Theorem 13.10 that the eigenfunctions $\{\psi_n\}$ of the operator S may be chosen as an orthonormal basis of $L^2(a,b)$. Since $\phi_n$ may be taken to be $\psi_n/\sqrt{\rho}$, by the preceding discussion, it follows that
$$\int_a^b \phi_n\overline{\phi_m}\,\rho\,dx = \int_a^b \psi_n\overline{\psi_m}\,dx = \begin{cases} 0 & n \neq m \\ 1 & n = m \end{cases} \qquad (14.3.18)$$
Thus the eigenfunctions are orthonormal in the weighted space $L_\rho^2(a,b)$. We can also easily verify the completeness of these eigenfunctions as follows. For any $f \in L_\rho^2(a,b)$ we have that $\sqrt{\rho}f \in L^2(a,b)$, so
$$\sqrt{\rho}f = \sum_{n=1}^{\infty} c_n\psi_n \qquad c_n = \langle\sqrt{\rho}f, \psi_n\rangle \qquad (14.3.19)$$
in the sense of $L^2$ convergence. Equivalently, this means
$$f = \sum_{n=1}^{\infty} c_n\phi_n \qquad c_n = \langle f\rho, \phi_n\rangle = \langle f, \phi_n\rangle_\rho \qquad (14.3.20)$$
also in the sense of $L^2$ or $L_\rho^2$ convergence, and so the completeness follows from Theorem 6.4.
From these observations, together with Theorem 13.10 and Corollary 14.1 we obtain
the following.
Theorem 14.2. Assume that $a_0, a_1, a_2 \in C([a,b])$, $a_2(x) < 0$ on [a,b], and that $|c_1| + |c_2| \neq 0$, $|c_3| + |c_4| \neq 0$. Then the problem
$$a_2\phi'' + a_1\phi' + a_0\phi = \lambda\phi \quad a < x < b \qquad c_1\phi(a) + c_2\phi'(a) = c_3\phi(b) + c_4\phi'(b) = 0 \qquad (14.3.21)$$
has a countable sequence of simple real eigenvalues $\{\lambda_n\}_{n=1}^\infty$, with $\lambda_n \to \infty$. The corresponding eigenfunctions may be chosen to form an orthonormal basis of $L_\rho^2(a,b)$.
There is one other notable property of the eigenfunctions which we mention without
proof: The eigenfunction φn has exactly n − 1 roots in (a, b). See for example Theorem
2.1, Chapter 8 of [7].
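A sketch of Theorem 14.2 in the simplest case $-\phi'' = \lambda\phi$, $\phi(0) = \phi(1) = 0$ (so ρ ≡ 1 and the exact eigenvalues are $(n\pi)^2$), using a standard finite difference discretization; the scheme and grid size are illustrative choices, not taken from the text.

```python
import numpy as np

# Finite difference eigenvalues for -phi'' = lambda*phi, phi(0)=phi(1)=0.
M = 500
h = 1.0 / (M + 1)
A = (np.diag(2.0 * np.ones(M)) - np.diag(np.ones(M - 1), 1)
     - np.diag(np.ones(M - 1), -1)) / h**2

lam = np.linalg.eigvalsh(A)
print("computed:", lam[:4])
print("exact:   ", (np.arange(1, 5) * np.pi) ** 2)
```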
14.4 The Laplacian with homogeneous Dirichlet boundary conditions
In this section we develop some theory for the very important eigenvalue problem
$$-\Delta u = \lambda u \quad x \in \Omega \qquad (14.4.1)$$
$$u = 0 \quad x \in \partial\Omega \qquad (14.4.2)$$
Here Ω is a bounded open set in $\mathbb{R}^N$, $N \geq 2$, with sufficiently smooth boundary. The general approach will again be to obtain the existence of eigenvalues and eigenfunctions by first looking at an appropriately defined inverse operator. To begin making precise the definitions of the operators involved, set
$$Tu = -\Delta u \qquad \text{on } D(T) = \{u \in H_0^1(\Omega) : \Delta u \in L^2(\Omega)\} \qquad (14.4.3)$$
to be regarded as an unbounded operator on $L^2(\Omega)$.
Recall that in Section 9.1 we defined the Sobolev spaces $H^1(\Omega)$ and $H_0^1(\Omega)$, and it was mentioned there that it is appropriate to regard $u \in H_0^1(\Omega)$ as meaning that $u \in H^1(\Omega)$ and u = 0 on ∂Ω. The precise meaning of this needs to be clarified, since in general a function $u \in H^1(\Omega)$ need not be continuous on $\overline{\Omega}$, so that its restriction to the lower dimensional set ∂Ω is not defined in an obvious way. The following theorem is proved in [5] or [10].

Theorem 14.3. If Ω is a bounded domain in $\mathbb{R}^N$ with a $C^1$ boundary, then there exists a bounded linear operator $\tau : H^1(\Omega) \to L^2(\partial\Omega)$ such that
$$\tau u = u|_{\partial\Omega} \quad \text{if } u \in H^1(\Omega)\cap C(\overline{\Omega}) \qquad (14.4.4)$$
$$\tau u = 0 \quad \text{if } u \in H_0^1(\Omega) \qquad (14.4.5)$$
The mapping τ in this theorem is the trace operator, that is, the operator of restriction to ∂Ω, and τu is called the trace of u on ∂Ω. According to the theorem, the trace is well defined for any $u \in H^1(\Omega)$, it coincides with the usual notion of restriction if u happens to be continuous on $\overline{\Omega}$, and any function $u \in H_0^1(\Omega)$ has trace equal to 0. It can furthermore be shown that the expected integration by parts formula (see (18.2.3))
$$\int_\Omega u\frac{\partial v}{\partial x_j}\,dx = -\int_\Omega \frac{\partial u}{\partial x_j}v\,dx + \int_{\partial\Omega} uv\,n_j\,dS \qquad (14.4.6)$$
remains valid as long as $u, v \in H^1(\Omega)$, where in the boundary integral u, v must be understood as meaning τu and τv. The boundary integral is well defined since these traces belong to $L^2(\partial\Omega)$, according to the theorem.
For any $f \in L^2(\Omega)$, the condition that $u \in D(T)$ and $Tu = f$ means
$$u \in H_0^1(\Omega) \qquad -\int_\Omega u\,\Delta v\,dx = \int_\Omega fv\,dx \quad \forall v \in C_0^\infty(\Omega) \qquad (14.4.7)$$
The first integral may be equivalently written as $\int_\Omega \nabla u\cdot\nabla v\,dx$, using the integration by parts formula, and then by the density of $C_0^\infty(\Omega)$ in $H_0^1(\Omega)$ we see that
$$u \in H_0^1(\Omega) \qquad \int_\Omega \nabla u\cdot\nabla v\,dx = \int_\Omega fv\,dx \quad \forall v \in H_0^1(\Omega) \qquad (14.4.8)$$
must hold. Conversely, any function u satisfying (14.4.8) must also satisfy Tu = f.
In particular, if λ is an eigenvalue of T then
$$u \in H_0^1(\Omega) \qquad \int_\Omega \nabla u\cdot\nabla v\,dx = \lambda\int_\Omega uv\,dx \quad \forall v \in H_0^1(\Omega) \qquad (14.4.9)$$
We note that λ ≥ 0 must hold, since we can choose v = u. As we will see below, λ = 0 is impossible as well.
Another tool we will make good use of is the so-called Poincaré inequality.

Proposition 14.1. If Ω is a bounded open set in R^N then there exists a constant C, depending only on Ω, such that

‖u‖_{L²(Ω)} ≤ C‖∇u‖_{L²(Ω)}    ∀u ∈ H¹₀(Ω)    (14.4.10)
Proof: It is enough to prove the stated inequality for u ∈ C₀^∞(Ω). If we let R be large enough so that Ω ⊂ Q_R = {x ∈ R^N : |xⱼ| < R, j = 1, . . . , N}, then defining u = 0 outside of Ω we may also regard u as an element of C₀^∞(Q_R), with identical norms whether considered on Ω or Q_R. Therefore

‖u‖²_{L²(Ω)} = ∫_{Q_R} u² dx = −∫_{Q_R} x₁ (∂/∂x₁)(u²) dx    (14.4.11)
    = −2 ∫_{Q_R} x₁ u (∂u/∂x₁) dx    (14.4.12)
    ≤ 2R‖u‖_{L²(Ω)}‖∇u‖_{L²(Ω)}    (14.4.13)

Thus the conclusion holds with C = 2R.
Note that we do not really need Ω to be bounded here, only that it be contained
between two parallel hyperplanes. It is an immediate consequence of Poincaré’s inequality
that
‖u‖_{H¹₀(Ω)} := ‖∇u‖_{L²(Ω)}    (14.4.14)

defines a norm on H¹₀(Ω) which is equivalent to the original norm it inherits as a subspace of H¹(Ω), since

1 ≤ ‖u‖²_{H¹(Ω)} / ‖u‖²_{H¹₀(Ω)} = ∫_Ω (u² + |∇u|²) dx / ∫_Ω |∇u|² dx ≤ C² + 1    (14.4.15)

Unless otherwise stated we always assume that the norm on H¹₀(Ω) is that given by (14.4.14), which of course corresponds to the inner product

⟨u, v⟩_{H¹₀(Ω)} = ∫_Ω ∇u · ∇v dx    (14.4.16)
A simple but important connection between the eigenvalues of T and the Poincaré inequality, obtained by choosing v = u in the right hand equality of (14.4.9), is that any such eigenvalue λ satisfies

λ ≥ 1/C² > 0    (14.4.17)

where C is any constant for which Poincaré's inequality is valid. We will see later that there is a 'best constant', namely a value C = C_P for which (14.4.10) is true, but is false for any smaller value, and the smallest positive eigenvalue of T is precisely 1/C_P².
Any constant which works in the Poincaré inequality also provides a lower bound for the operator T, as follows: if T u = f then choosing v = u in (14.4.8) we get

∫_Ω |∇u|² dx = ∫_Ω f u dx ≤ ‖f‖_{L²(Ω)}‖u‖_{L²(Ω)} ≤ C‖f‖_{L²(Ω)}‖∇u‖_{L²(Ω)}    (14.4.18)

Therefore

‖u‖_{H¹₀(Ω)} ≤ C‖f‖_{L²(Ω)}    and    ‖u‖_{L²(Ω)} ≤ C²‖f‖_{L²(Ω)}    (14.4.19)

or equivalently ‖T u‖_{L²(Ω)} ≥ C⁻²‖u‖_{L²(Ω)}.
Proposition 14.2. Considered as an operator on L²(Ω), T is one-to-one, onto and self-adjoint.
Proof: The property that T is one-to-one is immediate from (14.4.19). Next, if f ∈ L²(Ω), define the linear functional φ by

φ(v) = ∫_Ω f v dx    (14.4.20)

Then φ is continuous on H¹₀(Ω) since

|φ(v)| ≤ ‖f‖_{L²(Ω)}‖v‖_{L²(Ω)} ≤ C‖f‖_{L²(Ω)}‖v‖_{H¹₀(Ω)}    (14.4.21)

By the Riesz Representation theorem, Theorem 6.6, there exists a unique u ∈ H¹₀(Ω) such that

⟨u, v⟩_{H¹₀(Ω)} = ∫_Ω ∇u · ∇v dx = φ(v)    (14.4.22)

which is equivalent to T u = f, as explained above. Thus T is onto. Finally, from (14.4.8) it follows that ⟨T u, v⟩ = ∫_Ω ∇u · ∇v dx = ⟨u, T v⟩, i.e. T is symmetric, and a linear operator which is symmetric and onto must be self-adjoint, see Exercise 7 of Chapter 11.
Next we consider the construction of an inverse operator to T, in the form of an integral operator

Sf(x) = ∫_Ω G(x, y)f(y) dy    (14.4.23)
where G will again be called the Green's function for T u = f, assuming it exists. Thus u(x) = Sf(x) should be the solution of

−Δu = f(x)  x ∈ Ω    u(x) = 0  x ∈ ∂Ω    (14.4.24)

Analogously to the ODE case discussed in the previous section, we expect that G should formally satisfy

−Δₓ G(x, y) = δ(x − y)  x ∈ Ω    G(x, y) = 0  x ∈ ∂Ω    (14.4.25)

for every fixed y ∈ Ω. Recall that we already know that there exists Γ(x) such that −ΔΓ = δ in the sense of distributions, so if we set h(x, y) = G(x, y) − Γ(x − y) then it is necessary for h to satisfy

−Δₓ h(x, y) = 0  x ∈ Ω    h(x, y) = −Γ(x − y)  x ∈ ∂Ω    (14.4.26)

for every fixed y ∈ Ω. Note that since x − y ≠ 0 for x ∈ ∂Ω and y ∈ Ω, the boundary function for h is infinitely differentiable. Thus we have a parametrized set of boundary value problems, each having the form of finding a function harmonic in Ω satisfying a prescribed smooth Dirichlet type boundary condition. Such a problem is known to have a unique solution, assuming only very minimal hypotheses on the smoothness of ∂Ω, see for example Theorem 2, Section 4.3 of [22]. In a few special cases it is possible to compute h(x, y), and hence G(x, y), explicitly; see Exercise 17 for the case when Ω is a ball.
Note however, that whatever h may be, G(x, y) is singular when x = y, and possesses the same local integrability properties as Γ(x − y). It is not hard to check that ∫_{Ω×Ω} |Γ(x − y)|² dx dy is finite for N = 2, 3 but not for N ≥ 4. Thus G is not of Hilbert-Schmidt type in general, so we cannot directly conclude in this way that S = T⁻¹ is compact on L²(Ω). Nevertheless the operator is indeed compact. One approach to showing this comes from the general theory of singular integral operators, see Chapter 15. A simple alternative, which we will use here, is based on the following result, which is of independent importance.
Theorem 14.4. (Rellich-Kondrachov) A bounded set in H¹₀(Ω) is precompact in L²(Ω).

For a proof we refer to [10], Section 5.7, Theorem 1, or [5], Theorem 9.16, where somewhat more general statements are given. With some minimal smoothness assumption on ∂Ω we can replace H¹₀(Ω) by H¹(Ω). It is an equivalent statement to say that the identity map i : H¹₀(Ω) → L²(Ω) is a compact linear operator. Other terminology, such as that H¹₀(Ω) is compactly embedded, compactly included, or compactly injected in L²(Ω) (or H¹₀(Ω) ↪ L²(Ω)), is also commonly used.
Corollary 14.2. If S = T⁻¹ then S is a compact self-adjoint operator on L²(Ω).
Proof: If E ⊂ L2 (Ω) is bounded, then by (14.4.19) the image S(E) = {u = Sf : f ∈ E}
is bounded in H01 (Ω). The Rellich-Kondrachov theorem then implies S(E) is precompact
as a subset of L2 (Ω), so S : L2 (Ω) → L2 (Ω) is compact. The self-adjointness of S follows
immediately from that of T .
Thus S possesses an infinite sequence of real eigenvalues {μₙ}_{n=1}^∞, limₙ→∞ μₙ = 0, and corresponding eigenfunctions {ψₙ}_{n=1}^∞ which may be chosen as an orthonormal basis of L²(Ω). As usual, the reciprocals λₙ = 1/μₙ are eigenvalues of T = S⁻¹, and recall that all eigenvalues of T are strictly positive. We have established the following.
Theorem 14.5. The operator

T u = −Δu    D(T) = {u ∈ H¹₀(Ω) : Δu ∈ L²(Ω)}    (14.4.27)

has an infinite sequence of real eigenvalues of finite multiplicity,

0 < λ₁ ≤ λ₂ ≤ λ₃ ≤ . . .    λₙ → +∞    (14.4.28)

and corresponding eigenfunctions {ψₙ}_{n=1}^∞ which may be chosen as an orthonormal basis of L²(Ω).
The convention here is that an eigenvalue in this sequence is repeated according to
its multiplicity. In comparison with the Sturm-Liouville case, an eigenvalue need not
be simple, although the multiplicity must still be finite, thus repetitions in the sequence
(14.4.28) may occur. It does turn out to be the case, however, that λ1 is always simple
– this will be discussed in Section ( ). We refer to λn , ψn as Dirichlet eigenvalues and
eigenfunctions for the domain Ω. Among many other things, knowledge of the existence
of these eigenvalues and eigenfunctions allows us to greatly expand the scope of the
separation of variables method.
Example 14.4. Consider the initial and boundary value problem for the heat equation in a bounded domain Ω ⊂ R^N,

uₜ − Δu = 0    x ∈ Ω  t > 0    (14.4.29)
u(x, t) = 0    x ∈ ∂Ω  t > 0    (14.4.30)
u(x, 0) = f(x)    x ∈ Ω    (14.4.31)

Employing the separation of variables method, we begin by looking for solutions in the product form ψ(x)φ(t) which satisfy the PDE and the homogeneous boundary condition. Substituting we see that φ′(t)ψ(x) = φ(t)Δψ(x) should hold, and therefore

φ′ + λφ = 0  t > 0    Δψ + λψ = 0  x ∈ Ω    (14.4.32)

In addition the boundary condition implies that ψ(x) = 0 for x ∈ ∂Ω. In order to have a nonzero solution, we conclude that λ, ψ must be a Dirichlet eigenvalue/eigenfunction pair for the domain Ω, and then correspondingly φ(t) = Ce^{−λt}. By linearity we therefore see that if λₙ, ψₙ denote the Dirichlet eigenvalues and L²(Ω) orthonormalized eigenfunctions, then

u(x, t) = ∑_{n=1}^∞ cₙ e^{−λₙt} ψₙ(x)    (14.4.33)

is a solution of (14.4.29),(14.4.30), as long as the coefficients cₙ are sufficiently rapidly decaying.
In order that (14.4.31) also holds, we must have

f(x) = u(x, 0) = ∑_{n=1}^∞ cₙψₙ(x)    (14.4.34)

and so from the orthonormality, cₙ = ⟨f, ψₙ⟩. We have thus obtained the (formal) solution

u(x, t) = ∑_{n=1}^∞ ⟨f, ψₙ⟩ e^{−λₙt} ψₙ(x)    (14.4.35)

of (14.4.29)-(14.4.30)-(14.4.31).
Making use of estimates which may be found in more advanced PDE textbooks, it can be shown that for any f ∈ L²(Ω) the series (14.4.35) is uniformly convergent to an infinitely differentiable limit u(x, t) for t > 0, where u is a classical solution of (14.4.29)-(14.4.30), and the initial condition (14.4.31) is satisfied at least in the sense that lim_{t→0} ∫_Ω (u(x, t) − f(x))² dx = 0. Under stronger conditions on f, the nature of the convergence at t = 0 can be shown to be correspondingly stronger. We refer, for example, to [10] for more details. At the very least, since ∑_{n=1}^∞ |cₙ|² < ∞ must hold, the obvious estimate ∑_{n=1}^∞ |cₙe^{−λₙt}|² < ∞ for t ≥ 0 implies that the series is convergent in L²(Ω) for every fixed t ≥ 0.
Note, again at a formal level at least, that the expression for the solution u can be rewritten as

u(x, t) = ∑_{n=1}^∞ (∫_Ω f(y)ψₙ(y) dy) e^{−λₙt} ψₙ(x)    (14.4.36)
    = ∫_Ω f(y) (∑_{n=1}^∞ e^{−λₙt} ψₙ(x)ψₙ(y)) dy    (14.4.37)
    = ∫_Ω f(y)G(x, y, t) dy    (14.4.38)

suggesting that

G(x, y, t) := ∑_{n=1}^∞ e^{−λₙt} ψₙ(x)ψₙ(y)    (14.4.39)

should be regarded as the Green's function for (14.4.29)-(14.4.30)-(14.4.31).
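As an illustration (an editorial sketch, not in the original notes): in the one-dimensional case Ω = (0, 1) the Dirichlet eigenpairs are λₙ = n²π² and ψₙ(x) = √2 sin(nπx), so (14.4.35) can be evaluated directly, using a simple quadrature for the coefficients cₙ = ⟨f, ψₙ⟩.

```python
import numpy as np

# Spectral solution (14.4.35) of the heat equation on Omega = (0,1):
# lambda_n = (n*pi)^2, psi_n(x) = sqrt(2)*sin(n*pi*x).
m, nmax = 500, 200
x = np.linspace(0, 1, m)
f = np.where(np.abs(x - 0.5) < 0.25, 1.0, 0.0)   # rough (discontinuous) initial data

n = np.arange(1, nmax + 1)
psi = np.sqrt(2) * np.sin(np.pi * np.outer(n, x))  # psi[n-1, :] = psi_n on the grid
c = psi @ f / m                                    # c_n = <f, psi_n>, crude quadrature

def u(t):
    # u(.,t) = sum_n c_n e^{-lambda_n t} psi_n; smooth for any t > 0
    return (c * np.exp(-(n * np.pi)**2 * t)) @ psi

for t in [0.001, 0.01, 0.1]:
    print(t, u(t).max())   # maximum decays, eventually like e^{-pi^2 t}
```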
14.5 Exercises

1. Let Lu = (x − 2)u″ + (1 − x)u′ + u on (0, 1).

a) Find the Green's function for

Lu = f    u′(0) = 0  u(1) = 0

(Hint: First show that x − 1, eˣ are linearly independent solutions of Lu = 0.)

b) Find the adjoint operator and boundary conditions.
2. Let

T u = −(d/dx)(x du/dx)

on the domain

D(T) = {u ∈ H²(1, 2) : u(1) = u(2) = 0}

a) Show that N(T) = {0}.

b) Find the Green's function for the boundary value problem T u = f.

c) State and prove a result about the continuous dependence of the solution u on f in part (b).
3. Let φ, ψ be solutions of Lu = a₂(x)u″ + a₁(x)u′ + a₀(x)u = 0 on (a, b) and W(φ, ψ)(x) = φ(x)ψ′(x) − φ′(x)ψ(x) be the corresponding Wronskian determinant.

a) Show that W is either zero everywhere or zero nowhere. (Suggestion: find a first order ODE satisfied by W.)

b) If a₁(x) = 0 show that W is constant.

4. Let Lu = a₂(x)u″ + a₁(x)u′ + a₀(x)u with a₂′ = a₁, so that L is formally self adjoint. If B₁u = C₁u(a) + C₂u′(a), B₂u = C₃u(b) + C₄u′(b), show that {B₁*, B₂*} = {B₁, B₂}.
5. Find the Green’s function for
u00 + 2u0 − 3u = f (x) 0 < x < ∞
u(0) = 0
lim u(x) = 0
x→∞
(Think of the last condition as a ’boundary condition at infinity’.) Using the Green’s
function, find u(2) if f (x) = e−6x .
256
6. Consider the second order operator

Lu = a₂(x)u″ + a₁(x)u′ + a₀(x)u    a < x < b

with non-separated boundary conditions

B₁u = α₁₁u(a) + α₁₂u′(a) + β₁₁u(b) + β₁₂u′(b) = 0
B₂u = α₂₁u(a) + α₂₂u′(a) + β₂₁u(b) + β₂₂u′(b) = 0

where the vectors (α₁₁, α₁₂, β₁₁, β₁₂), (α₂₁, α₂₂, β₂₁, β₂₂) are linearly independent. We again say that two other non-separated boundary conditions B₁*, B₂* are adjoint to B₁, B₂ with respect to L if J(u, v)|_a^b = 0 whenever B₁u = B₂u = B₁*v = B₂*v = 0. Find the adjoint operator and boundary conditions in the case that

Lu = u″ + xu′    B₁u = u′(0) − 2u(1)    B₂u = u(0) + u(1)
7. When we rewrite a₂(x)u″ + a₁(x)u′ + a₀(x)u = λu as

−(p(x)u′)′ + q(x)u = λρ(x)u

the latter is often referred to as the Liouville normal form. Consider the eigenvalue problem

x²u″ + xu′ + u = λu  1 < x < 2    u(1) = u(2) = 0

a) Find the Liouville normal form.

b) What is the orthogonality relationship satisfied by the eigenfunctions?

c) Find the eigenvalues and eigenfunctions. (You may find the original form of the equation easier to work with than the Liouville normal form when computing the eigenvalues and eigenfunctions.)
8. Consider the Sturm-Liouville equation in the Liouville normal form,

−(p(x)u′)′ + q(x)u = λρ(x)u    a < x < b

where p, ρ ∈ C²([a, b]), q ∈ C([a, b]), p, ρ > 0 on [a, b]. Let

σ(x) = √(ρ(x)/p(x))    η(x) = (p(x)ρ(x))^{1/4}    L = ∫ₐᵇ σ(s) ds    φ(x) = (1/L) ∫ₐˣ σ(s) ds

If ψ = φ⁻¹ (the inverse function of φ) and v(z) = η(ψ(z))u(ψ(z)), show that v satisfies

−v″ + Q(z)v = μv    0 < z < 1    (14.5.1)

for some Q depending on p, ρ, q, and μ = L²λ. (This is mainly a fairly tedious exercise with the chain rule. Focus on making the derivation as clean as possible and be sure to say exactly what Q(z) is. The point of this is that every eigenvalue problem for a second order ODE is equivalent to one with an equation of the form (14.5.1), provided that the coefficients have sufficient smoothness. The map u(x) → v(z) is sometimes called the Liouville transformation, and the ODE (14.5.1) is the canonical form for a 2nd order ODE eigenvalue problem.)
9. Consider the Sturm-Liouville problem

u″ + λu = 0    0 < x < 1    u(0) − u′(0) = u(1) = 0

a) Multiply the equation by u and integrate by parts to show that any eigenvalue is positive.

b) Show that the eigenvalues are the positive solutions of tan √λ = −√λ.

c) Show graphically that such roots exist, and form an infinite sequence λₖ such that (k − ½)π < √λₖ < kπ and

lim_{k→∞} (√λₖ − (k − ½)π) = 0
10. Complete the proof that λₙ → +∞ under the assumptions of Theorem 14.2. (Suggestion: you can obtain an inequality like (14.3.17), except it may also contain boundary terms.)

11. Using separation of variables, compute explicitly the Dirichlet eigenvalues and eigenfunctions of −Δ when the domain is a rectangle (0, A)×(0, B) in R². Verify directly that the first eigenvalue is simple, and that the first eigenfunction is of constant sign. Can there be other eigenvalues of multiplicity greater than one? (Hint: Your answer should depend on whether the ratio A/B is rational or irrational.)

12. Find the Dirichlet eigenvalues and eigenfunctions of −Δ in the unit ball B(0, 1) ⊂ R². (Suggestion: express the PDE and do separation of variables in polar coordinates. Your answer should involve Bessel functions.)
13. If {ψₙ}_{n=1}^∞ are Dirichlet eigenfunctions of the Laplacian making up an orthonormal basis of L²(Ω), let ζₙ = ψₙ/√λₙ (λₙ the corresponding eigenvalue).

a) Show that {ζₙ}_{n=1}^∞ is an orthonormal basis of H¹₀(Ω).

b) Show that f ∈ H¹₀(Ω) if and only if ∑_{n=1}^∞ λₙ|⟨f, ψₙ⟩|² < ∞.
14. If Ω ⊂ Rⁿ is a bounded open set with smooth enough boundary, find a solution of the wave equation problem

uₜₜ − Δu = 0    x ∈ Ω  t > 0
u(x, t) = 0    x ∈ ∂Ω  t > 0
u(x, 0) = f(x)    uₜ(x, 0) = g(x)    x ∈ Ω

in the form

u(x, t) = ∑_{n=1}^∞ cₙ(t)ψₙ(x)

where {ψₙ}_{n=1}^∞ are the Dirichlet eigenfunctions of −Δ in Ω.
15. Derive formally that

G(x, y) = ∑_{n=1}^∞ ψₙ(x)ψₙ(y)/λₙ    (14.5.2)

where λₙ, ψₙ are the Dirichlet eigenvalues and normalized eigenfunctions for the domain Ω, and G(x, y) is the corresponding Green's function in (14.4.23). (Suggestion: if −Δu = f, expand both u and f in the ψₙ basis.)

16. Formulate and prove a result which says that under appropriate conditions

u(x, t) ≈ Ce^{−λ₁t}ψ₁(x)    (14.5.3)

as t → ∞, where u is the solution of (14.4.29)-(14.4.30)-(14.4.31).
17. If Ω = B(0, 1) ⊂ R^N show that the function h(x, y) appearing in (14.4.26) is given by

h(x, y) = −Γ(|x|y − x/|x|)    (14.5.4)

18. Prove the Rellich-Kondrachov Theorem 14.4 directly in the case of one space dimension, by using the Arzela-Ascoli theorem.
Chapter 15

Further study of integral equations

15.1 Singular integral operators

In the very broadest sense, an integral operator

T u(x) = ∫_Ω K(x, y)u(y) dy    (15.1.1)
is said to be singular if the kernel K(x, y) fails to be C ∞ at one or more points. Of course
this does not necessarily affect the general properties of T in a significant way, but there
are certain more specific kinds of singularity which occur in natural and important ways,
which do affect the general behavior of the operator, and so call for some specific study.
First of all let us observe that singularity is not necessarily a bad thing. For example,
the problem of solving T u = f with a C ∞ kernel is a first kind integral equation, for which
a solution only exists, in general, for very restricted f . By comparison, the corresponding
second kind integral equation u + T u = f may be regarded, at least formally, as a first
kind equation with the ‘very singular’ kernel δ(x − y) + K(x, y), and will have a unique
solution for a much larger class of f ’s, typically all f ∈ L2 (Ω) in fact.
As a second kind of example, recall that if Ω = (a, b) ⊂ R, a Volterra type integral equation is generally easier to analyze and solve than a corresponding non-Volterra type equation. The more amenable nature of the Volterra equation may be understood via the fact that the Volterra operator T u(x) = ∫ₐˣ K(x, y)u(y) dy could be rewritten as ∫ₐᵇ K̃(x, y)u(y) dy where

K̃(x, y) = K(x, y) if a < y < x < b,  K̃(x, y) = 0 if a < x < y < b    (15.1.2)

That is to say, K̃ is singular when y = x no matter how smooth K itself is, so singularity is built in to the very structure of a Volterra type integral equation.
Let us also mention that it is often appropriate to regard T as being singular if the underlying domain Ω is unbounded. One might expect this from the fact that if we were to make a change of variable to map the unbounded domain Ω onto a convenient bounded domain, the price to be paid normally is that the transformed kernel will become singular at those points which are the image of ∞. The Fourier transform could be regarded in this light, and its very nice behavior viewed as due to, rather than despite, the presence of singularity.
For the remainder of this section we will focus on a specific class of singular integral operators, in which the kernel K is assumed to satisfy

|K(x, y)| ≤ M/|x − y|^α    x, y ∈ Ω    (15.1.3)

for some constant M and exponent α > 0, with Ω a bounded domain in R^N. If α < N then K is said to be weakly singular. The main result to be proved below is that an integral operator with weakly singular kernel is compact on L²(Ω). Note that such an operator may or may not be of Hilbert-Schmidt type. For example if K(x, y) = 1/|x − y|^α then K ∈ L²(Ω × Ω) if and only if α < N/2. The Green's function G(x, y) for the Laplacian (see (14.4.23)) is always weakly singular, and the compactness result below provides an alternative to the Rellich-Kondrachov theorem (Theorem 14.4) for proving compactness of the corresponding integral operator.
We begin with the following lemma.

Lemma 15.1. Suppose K ∈ L¹(Ω × Ω) and there exists a constant C such that

∫_Ω |K(x, y)| dx ≤ C  ∀y ∈ Ω    ∫_Ω |K(x, y)| dy ≤ C  ∀x ∈ Ω    (15.1.4)

Then the corresponding integral operator T is a bounded linear operator on L²(Ω) with ‖T‖ ≤ C.
Proof: Using the Schwarz inequality we get

∫_Ω |K(x, y)||u(y)| dy ≤ (∫_Ω |K(x, y)| dy)^{1/2} (∫_Ω |K(x, y)||u(y)|² dy)^{1/2}    (15.1.5)
    ≤ √C (∫_Ω |K(x, y)||u(y)|² dy)^{1/2}    (15.1.6)

and therefore

∫_Ω |T u(x)|² dx ≤ C ∫_Ω ∫_Ω |K(x, y)||u(y)|² dy dx    (15.1.7)
    = C ∫_Ω |u(y)|² ∫_Ω |K(x, y)| dx dy    (15.1.8)
    ≤ C² ∫_Ω |u(y)|² dy    (15.1.9)

as needed.
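A quick numerical check of the bound in Lemma 15.1 (an editorial sketch, not from the original notes): discretize a weakly singular kernel on Ω = (0, 1) by midpoint quadrature, and compare the matrix 2-norm of the discretized operator with the maximal row/column sum constant C. The diagonal regularization below is purely a quadrature device.

```python
import numpy as np

# T u(x) = int_0^1 K(x,y) u(y) dy, K(x,y) = |x-y|^(-1/2): weakly singular
# with alpha = 1/2 < N = 1. Midpoint rule; the diagonal distance is set to
# h/4 only to avoid division by zero in the quadrature.
m = 1000
h = 1.0 / m
x = (np.arange(m) + 0.5) * h
dist = np.abs(x[:, None] - x[None, :]) + np.diag(np.full(m, h / 4))
T = h / np.sqrt(dist)                      # T[i,j] ~ K(x_i, y_j) * weight

C = max(np.abs(T).sum(axis=0).max(), np.abs(T).sum(axis=1).max())
opnorm = np.linalg.svd(T, compute_uv=False)[0]
print(opnorm, C)                           # Lemma 15.1: opnorm <= C
```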
We can now prove the compactness result mentioned above.

Theorem 15.1. If Ω is a bounded domain in R^N and K is a weakly singular kernel, then the integral operator (15.1.1) is compact on L²(Ω).
Proof: First observe that

∫_Ω |K(x, y)| dy ≤ M ∫_Ω dy/|x − y|^α ≤ M ∫_{B(x,R)} dy/|x − y|^α    (15.1.10)
    ≤ M Ω_{N−1} ∫₀ᴿ r^{N−1−α} dr = M Ω_{N−1} R^{N−α}/(N − α)    (15.1.11)

for some R depending on Ω. Here Ω_{N−1} denotes the surface area of the unit sphere in R^N, see (18.3.1). The same is true if we integrate with respect to x instead of y, and so by Lemma 15.1, T is bounded. Now let

Kₘ(x, y) = K(x, y) if |x − y| > 1/m,  Kₘ(x, y) = 0 if |x − y| ≤ 1/m    (15.1.12)

and note that K − Kₘ satisfies the same estimate as K above, except that R may be replaced by 1/m. That is,

∫_Ω |K(x, y) − Kₘ(x, y)| dy ≤ M Ω_{N−1}/((N − α)m^{N−α})    (15.1.13)

and likewise for the integral with respect to x. Thus, if Tₘ is the integral operator with kernel Kₘ, then using Lemma 15.1 once more we get

‖T − Tₘ‖ ≤ M Ω_{N−1}/((N − α)m^{N−α}) → 0    (15.1.14)

as m → ∞. Since Kₘ ∈ L^∞(Ω × Ω), the operator Tₘ is compact for each m, by Theorem 13.4, and so the compactness of T follows from Theorem 13.3.
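The compactness is also visible numerically (editorial sketch, continuing the previous snippet): the singular values of the discretized weakly singular operator decay to 0, which is the matrix-level signature of compactness.

```python
# Continuing the previous snippet: singular values of T decay rapidly to 0,
# consistent with compactness of the weakly singular integral operator.
s = np.linalg.svd(T, compute_uv=False)
print(s[[0, 10, 100, 500]])
```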
Theorem 15.2. Let Ω be a bounded domain in R^N and assume K is a weakly singular kernel which is continuous on Ω × Ω for x ≠ y. If u ∈ L^∞(Ω) then T u is uniformly continuous on Ω.

Proof: Fix ε > 0, pick α ∈ (0, N) such that (15.1.3) holds, and set

H(x, y) = K(x, y)|x − y|^α    (15.1.15)

so H is bounded, and continuous for x ≠ y. Assuming u ∈ L^∞(Ω) and x ∈ Ω, we have for z ∈ B(x, δ) ∩ Ω

|T u(z) − T u(x)| = |∫_Ω (K(z, y) − K(x, y))u(y) dy|    (15.1.16)
    ≤ ∫_{Ω∩B(x,2δ)} (|K(z, y)| + |K(x, y)|)|u(y)| dy    (15.1.17)
    + ∫_{Ω\B(x,2δ)} |(K(z, y) − K(x, y))u(y)| dy    (15.1.18)

The integral in (15.1.17) may be estimated by

‖H‖_∞‖u‖_∞ ∫_{B(x,2δ)} (1/|z − y|^α + 1/|x − y|^α) dy    (15.1.19)

and so tends to zero as δ → 0 at a rate which is independent of x, z. We fix δ > 0 such that this term is less than ε.

In the remaining integral (15.1.18), assuming that |x − z| < δ, we have |y − x| > 2δ and so also |y − z| > δ. If E_δ = {(x, y) ∈ Ω × Ω : |x − y| ≥ δ} then K is uniformly continuous on E_δ, so there must exist δ′ < δ such that for z ∈ B(x, δ′) ∩ Ω the integral in (15.1.18) is less than ε. This completes the proof.
In general compactness fails if α ≥ N. A good example to keep in mind is the Hilbert transform (10.2.27), which is in the borderline case α = N = 1, and which we have already noted is not a compact operator. Actually this example doesn't quite fit into our discussion since the underlying domain is Ω = R, which is not bounded. If, however, we consider the so-called finite Hilbert transform defined by

H₀u(x) = (1/π) ∫₀¹ u(y)/(x − y) dy    (15.1.20)

(the integral understood in the principal value sense) as an operator on L²(0, 1), it is known (see [19]) that the spectrum σ(H₀) consists of the segment of the imaginary axis connecting the points ±i. In particular, since this set is uncountable, H₀ is not compact. See Chapter 5, section 2 of [15] for discussion of the operator equation H₀u = f. Note that boundedness of H₀ is automatic from the corresponding property for the Hilbert transform. A thorough investigation of operators which generalize the Hilbert transform may be found in [34].
15.2 Layer potentials

A type of singular integral operator which has played an important role in the historical development of the theory of elliptic PDEs is the so-called layer potential, see for example Kellogg [20] for a very classical treatment. Layer potentials actually come in two common varieties. If Γ denotes the fundamental solution (9.3.43) of Laplace's equation in R^N for N ≥ 2, and Σ ⊂ R^N is a smooth bounded N − 1 dimensional surface, set

Sφ(x) = ∫_Σ Γ(x − y)φ(y) ds(y)    (15.2.1)

Dφ(x) = ∫_Σ (∂/∂n_y)Γ(x − y)φ(y) ds(y)    (15.2.2)

which are respectively known as single and double layer potentials on Σ with density φ. To immediately see why such operators might arise naturally in connection with elliptic PDEs, observe that for any φ which is well behaved on Σ, Sφ and Dφ are harmonic functions in the complement of Σ. For example if u(x) = Sφ(x) then

Δu(x) = ∫_Σ Δₓ Γ(x − y)φ(y) ds(y) = 0    (15.2.3)
may be easily shown to be legitimate for x ∉ Σ, taking into account that ΔΓ(x) = 0 for x ≠ 0. Likewise if u(x) = Dφ(x) then

Δu(x) = Δₓ ∫_Σ (∂/∂n_y)Γ(x − y)φ(y) ds(y) = ∫_Σ (∂/∂n_y)Δₓ Γ(x − y)φ(y) ds(y) = 0    (15.2.4)

A wise choice of φ may then allow us to find harmonic functions satisfying some desired further properties, such as prescribed boundary behavior.

To clarify the definition of D, we suppose that a unit vector n(x) normal to Σ is chosen, which is a continuous function of x ∈ Σ (typically this amounts to making a consistent choice of the sign of n(x), since there are two unit normal vectors at each point of Σ). If Σ is a simple closed surface then we will always adopt the usual convention which is to take n(x) to be the outward normal. In any case we have

(∂/∂n_y)Γ(x − y) = −∑_{j=1}^N Γ_{xⱼ}(x − y)nⱼ(y) = −(x − y) · n(y)/(Ω_{N−1}|x − y|^N) =: K(x, y)    y ∈ Σ    (15.2.5)
Both Sφ and Dφ are obviously well defined for x ∉ Σ, and the kernels are well defined for x ≠ y even if x ∈ Σ. If we wish to view either S or D as an operator, say, on L²(Σ), then formally at least we should think of Σ as being N − 1 dimensional, and since the singularity of Γ is like |x|^{2−N} (with the usual modification for N = 2), S has the character of a weakly singular integral operator. In the case of D, however, the singularity of Γ_{xⱼ} is like |x|^{1−N}, so K appears to be exactly in the borderline case where compactness is lost. On the other hand, under some reasonable assumptions on Σ we will see that extra decay of K when x → y is provided by the n(y) factor, so that compactness of D will be recovered.
Let us consider now the Dirichlet problem

Δu = 0  x ∈ Ω    u = f  x ∈ Σ    (15.2.6)

where Ω is a bounded, connected domain in R^N, N ≥ 2, and Σ = ∂Ω. We will seek a solution in the form of a double layer potential u(x) = Dφ(x) for some density φ defined on Σ. As mentioned above, it is automatic that u is harmonic in Ω, so the condition which φ must be chosen to satisfy is that Dφ = f on Σ, or more precisely

lim_{z→x, z∈Ω} ∫_Σ K(z, y)φ(y) ds(y) = f(x)    (15.2.7)
for x ∈ Σ.

The distinction between evaluating Dφ on Σ and on the other hand taking the limit of Dφ from inside Ω at a point of Σ is important in the following discussion, and must be observed rigorously - they are in fact not the same in general, and it is only the latter which we care about. The simplest possible case, which is contained in the following lemma, illustrates the point.

Lemma 15.2. If φ(x) ≡ 1 and Σ = ∂Ω is C² then

Dφ(x) = 1 for x ∈ Ω,  Dφ(x) = 1/2 for x ∈ Σ,  Dφ(x) = 0 for x ∈ (Ω̄)ᶜ    (15.2.8)
Proof: If x ∈ (Ω̄)ᶜ then y → Γ(x − y) is a harmonic function in all of Ω, so integration by parts gives

Dφ(x) = ∫_Σ (∂/∂n_y)Γ(x − y) ds(y) = ∫_Ω Δ_y Γ(x − y) dy = 0    (15.2.9)

Now set Ω_ε = Ω\B(x, ε). If x ∈ Ω, pick ε > 0 such that B(x, ε) ⊂ Ω, in which case

0 = ∫_{Ω_ε} ΔΓ(x − y) dy = ∫_{∂Ω_ε} (∂/∂n_y)Γ(x − y) ds(y)    (15.2.10)
  = ∫_Σ (∂/∂n_y)Γ(x − y) ds(y) − ∫_{|y−x|=ε} (∂/∂n_y)Γ(x − y) ds(y)    (15.2.11)

where in the last integral n(y) = (y − x)/|y − x| denotes the outward normal of the ball B(x, ε). For |x − y| = ε it is then easy to check that the second term evaluates to be

∫_{|y−x|=ε} ds(y)/(Ω_{N−1}ε^{N−1}) = 1    (15.2.12)

which establishes (15.2.8) for x ∈ Ω. Finally for x ∈ Σ, we repeat the same calculation and find that the integral in (15.2.12) is replaced by

∫_{Ω∩{|y−x|=ε}} ds(y)/(Ω_{N−1}ε^{N−1})    (15.2.13)

Since we assumed that Σ is C², it follows that as ε → 0 we get precisely half of the surface area (i.e. Σ might as well be a hyperplane), so that the limit of 1/2 results, as needed.
Note that if we allowed Σ to have a corner at some point x, then the conclusion that Dφ(x) = 1/2 for x ∈ Σ would definitely no longer be valid.

If u(x) = Dφ(x) for some φ, let us now define

u₊(x) = lim_{α→0+} u(x + αn(x))    u₋(x) = lim_{α→0−} u(x + αn(x))    (15.2.14)

Thus u₋, u₊ are respectively limiting values of u from inside and outside the domain. In the above example we saw that u(x) − u±(x) = ±1/2 for x ∈ Σ, and this generalizes in the following way.

Theorem 15.3. If φ ∈ C(Σ) and u = Dφ then

u(x) − u±(x) = ±φ(x)/2    x ∈ Σ    (15.2.15)
The proof of this result involves technicalities which are beyond the scope of this book. We refer to Theorem 3.22 of [11] for details.

Thus in general Dφ experiences a jump as Σ is crossed, whose magnitude at x ∈ Σ is precisely φ(x). For the Dirichlet problem (15.2.6) the precise meaning of the boundary condition is that we seek a density φ such that u₋(x) = f(x) for x ∈ Σ. It then follows from (15.2.15) that φ should satisfy

φ(x)/2 + ∫_Σ K(x, y)φ(y) ds(y) = f(x)    x ∈ Σ    (15.2.16)

Conversely, if φ is a continuous solution of (15.2.16) and we set u(x) = Dφ(x), then u is harmonic inside Ω and u₋(x) = u(x) + φ(x)/2 = f(x), as required. We therefore have
obtained the very interesting and useful result that solvability properties of (15.2.6) can
be analyzed in terms of the second kind integral equation (15.2.16). We can likewise
study the corresponding exterior Dirichlet problem, in which we seek u harmonic in (Ω̄)ᶜ with prescribed boundary values on Σ, by looking instead at

−φ(x)/2 + ∫_Σ K(x, y)φ(y) ds(y) = f(x)    x ∈ Σ    (15.2.17)

The strategy now is to show that D is a compact operator on L²(Σ), so that the general theory from Chapter 13 can be applied. Again, the technicalities are lengthy, so we will content ourselves with a heuristic discussion, referring to [11] for a detailed treatment.
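The following sketch (an editorial addition; the discretization details are standard but not from these notes) solves (15.2.16) on an ellipse in R² by the Nyström method. For N = 2 one has Ω₁ = 2π and K(x, y) = −(x − y) · n(y)/(2π|x − y|²); by the orthogonality gain discussed below (Lemma 15.3), on a C² curve this kernel extends continuously to x = y with limiting value κ(y)/(4π), κ the curvature. The boundary data is taken as the trace of a known harmonic function, so the computed double layer potential can be checked at an interior point; the ellipse, the data, and the grid size are all merely illustrative choices.

```python
import numpy as np

# Nystrom discretization of phi/2 + int_Sigma K(x,y) phi(y) ds(y) = f(x)
# on the ellipse (a*cos t, b*sin t), with K as in (15.2.5) for N = 2.
a, b, n = 2.0, 1.0, 200
t = 2 * np.pi * np.arange(n) / n
y = np.stack([a * np.cos(t), b * np.sin(t)], axis=1)       # nodes on Sigma
speed = np.sqrt(a**2 * np.sin(t)**2 + b**2 * np.cos(t)**2)
nrm = np.stack([b * np.cos(t), a * np.sin(t)], axis=1) / speed[:, None]
kappa = a * b / speed**3                   # curvature of the ellipse
w = (2 * np.pi / n) * speed                # trapezoid weights for arclength ds

def uexact(p):                             # a harmonic function in the plane
    return np.exp(p[..., 0]) * np.cos(p[..., 1])

d = y[:, None, :] - y[None, :, :]          # d[i,j] = x_i - y_j
r2 = np.sum(d**2, axis=2)
np.fill_diagonal(r2, 1.0)                  # dummy; diagonal is fixed below
K = -np.sum(d * nrm[None, :, :], axis=2) / (2 * np.pi * r2)
np.fill_diagonal(K, kappa / (4 * np.pi))   # continuous limit of K as x -> y

A = 0.5 * np.eye(n) + K * w[None, :]       # discrete version of (15.2.16)
phi = np.linalg.solve(A, uexact(y))

z = np.array([0.5, 0.3])                   # a point inside the ellipse
dz = z - y
Kz = -np.sum(dz * nrm, axis=1) / (2 * np.pi * np.sum(dz**2, axis=1))
u_num = np.sum(Kz * phi * w)               # u(z) = D(phi)(z)
print(u_num, uexact(z))                    # should agree to near machine precision
```

The very rapid convergence reflects the fact that on this analytic boundary the kernel is not singular at all, which is exactly the gain coming from the n(y) factor discussed next.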
In the previous section we have established a sufficient condition for a singular integral operator to be compact. Here, the underlying domain Σ is not a domain in R^N, but assuming it is a reasonably smooth surface, e.g. C², it is 'locally' a domain in R^{N−1}. Thus compactness can be proved, as before, if the singularity of K has an associated exponent α < N − 1. The explicit expression (15.2.5) for K does not appear to imply this, but it will if we take into account that x − y becomes orthogonal to n(y) if x, y ∈ Σ and x → y. More precisely we have

Lemma 15.3. If Σ is a C² surface then there exists a constant M such that

|(x − y) · n(y)| ≤ M|x − y|²    x, y ∈ Σ    (15.2.18)
Proof: Fix x ∈ Σ. Without loss of generality we may assume that x = 0 and that n(0) = (0, 0, . . . , 1). Thus in a neighborhood of x = 0 the surface Σ is given by y_N = Ψ(y′) where y′ = (y₁, . . . , y_{N−1}), Ψ is C² near 0, and Ψ(0) = ∇Ψ(0) = 0. In particular Ψ(y′) = O(|y′|²) as y′ → 0. By Taylor's theorem, for y ∈ Σ

(x − y) · n(y) = −y · (n(0) + n(y) − n(0)) = −y_N + y · (n(0) − n(y))    (15.2.19)
    = −Ψ(y₁, . . . , y_{N−1}) + y · (n(0) − n(y))    (15.2.20)

Since Σ is C² it follows that n(y) is C¹, and so both terms in (15.2.20) are O(|y′|²), which is the needed conclusion at fixed x. The implied constant depends only on bounds for the curvature of Σ, and so a constant M exists which is independent of x ∈ Σ.
Corollary 15.1. The kernel K(x, y) in (15.2.5) satisfies

|K(x, y)| ≤ M|x − y|^{2−N}    x, y ∈ Σ    (15.2.21)

and in particular D is compact on L²(Σ).
From Theorem 13.5 it now follows that there exists a unique solution of (15.2.16) for every f ∈ C(Σ) (or even L²(Σ)), provided that it can be verified that there is no non-trivial solution of the corresponding homogeneous equation. If such a solution φ ≢ 0 exists, then it follows first of all that u₋ = 0 on Σ, so that u = Dφ is a solution of (15.2.6) with f ≡ 0. This must mean u ≡ 0 in Ω. Likewise u satisfies (15.2.6) with Ω replaced by (Ω̄)ᶜ, and this also implies u₊(x) = 0 on Σ, see Exercise ( ). But then by (15.2.15) it follows that

φ(x) = u₋(x) − u₊(x) = 0    (15.2.22)

so that the homogeneous form of (15.2.16) has only the trivial solution, as needed.

Let us also note that if φ ∈ L²(Σ) solves (15.2.16) with f ∈ C(Σ), it can be shown that φ ∈ C(Σ), so that (15.2.15) is valid.
15.3 Convolution equations

Consider the convolution type integral equation

∫_{R^N} K(x − y)u(y) dy − λu(x) = f(x)    x ∈ R^N    (15.3.1)

where K, f ∈ L²(R^N). If there exists a solution u ∈ L²(R^N) then by Theorem 8.8 it must hold that

((2π)^{N/2} K̂(y) − λ)û(y) = f̂(y)    a.e. y ∈ R^N    (15.3.2)

The solution is evidently unique, at least in L²(R^N), provided (2π)^{N/2} K̂(y) ≠ λ a.e. If also there exists ε > 0 such that

|(2π)^{N/2} K̂(y) − λ| ≥ ε    a.e. y ∈ R^N    (15.3.3)

then

û(y) = f̂(y)/((2π)^{N/2} K̂(y) − λ)    (15.3.4)

defines a solution for every f ∈ L²(R^N).
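Numerically, (15.3.4) is one line with the FFT. Here is an editorial sketch on a large periodic truncation of R (the box size, grid, and the choice λ = 3 ∉ [0, 2], which keeps the symbol bounded away from λ as in Example 15.2 below, are artifacts of the illustration, not of the equation): we solve K ∗ u − λu = f with K(x) = e^{−|x|} and check the residual by direct quadrature.

```python
import numpy as np

# Solve K*u - lam*u = f on a periodic grid via the discrete form of (15.3.4).
L, m = 40.0, 2**12
h = L / m
x = h * np.arange(m)
dist = np.minimum(x, L - x)                # periodic distance to 0
K = np.exp(-dist)                          # e^{-|x|}, wrapped periodically
f = np.exp(-(x - L / 2)**2)                # Gaussian data centered in the box
lam = 3.0                                  # outside [0,2], the range of the symbol

u = np.real(np.fft.ifft(np.fft.fft(f) / (np.fft.fft(K) * h - lam)))

conv = np.real(np.fft.ifft(np.fft.fft(K) * np.fft.fft(u))) * h   # K*u by quadrature
print(np.max(np.abs(conv - lam * u - f)))  # ~1e-14: u solves the discrete equation
```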
The requirement K ∈ L²(R^N) can clearly be weakened to some extent. Recall that K ∗ u is well defined under a number of different sets of assumptions which have been made earlier, for example (i) K ∈ D′(R^N) and u ∈ D(R^N), (ii) K ∈ S′(R^N) and u ∈ S(R^N), or (iii) K ∈ L^p(R^N) and u ∈ L^q(R^N) with p⁻¹ + q⁻¹ ≥ 1, and all of these are subject to further refinement. Thus a separate analysis of existence and uniqueness for (15.3.1) could be carried out under a wide variety of assumptions. Let us note in particular that (15.3.4) provides at least a formal solution formula provided that K ∈ S′(R^N), f̂, K̂ are regular distributions (i.e. functions), and (2π)^{N/2} K̂(y) ≠ λ a.e.

Example 15.1. In (15.3.1) let N = 1 and K = (1/π) pv (1/x), so that K ∗ u = Hu, the Hilbert transform of u defined in (10.2.27). Referring to the formula (8.8.5) for the Fourier transform of K, we obtain

(−i sgn y − λ)û(y) = f̂(y)    (15.3.5)
Thus for f ∈ L²(R) and λ ≠ ±i it is clear that

û(y) = f̂(y)/(−i sgn y − λ)    (15.3.6)

defines the unique solution of (15.3.1).
Now let us consider a closely related situation of a so-called Hankel type integral equation,

∫_{R^N} K(x + y)u(y) dy = f(x)    x ∈ R^N    (15.3.7)

If we let K₁(x) = K(−x) and f₁(x) = f(−x) then (15.3.7) is equivalent to K₁ ∗ u = f₁, and so

û(y) = (1/(2π)^{N/2}) f̂₁(y)/K̂₁(y)    (15.3.8)

If we temporarily denote the usual reflection operator by R, i.e. Rφ(x) = φ(−x), note that R commutes with the Fourier transform. Thus,

û = (1/(2π)^{N/2}) R(f̂/K̂)    (15.3.9)

and so from the inversion theorem the solution u is

u = (1/(2π)^{N/2}) (f̂/K̂)^    (15.3.10)

assuming that the expression is meaningful.

Note that using this approach it would not be straightforward to include a λu term on the left side of (15.3.7).
15.4 Wiener-Hopf technique

Throughout this section it will be assumed that the reader has some familiarity with basic ideas and techniques of complex analysis. Consider in one dimension the integral equation of the special type

∫₀^∞ K(x − y)u(y) dy − λu(x) = f(x)    x > 0    (15.4.1)

Here the kernel depends on the difference of the two arguments, as in a convolution equation, but it is not actually a convolution type equation since the integration only takes place over (0, ∞). Nevertheless we can make some artificial extensions for mathematical convenience. Assuming that there exists a solution u to be found, we let u(x) = f(x) = 0 for x < 0 and

g(x) = ∫₀^∞ K(x − y)u(y) dy for x < 0,  g(x) = 0 for x > 0    (15.4.2)

It then follows that

∫_{−∞}^∞ K(x − y)u(y) dy − λu(x) = f(x) + g(x)    x ∈ R    (15.4.3)

This resulting equation is of convolution type, but contains the additional unknown term g. On the other hand, when considered as a solution on all of R, u should be regarded as constrained by the property that it has support in the positive half line.
A pair of operators which are technically useful for dealing with this situation are the so-called Hardy space projection operators P± defined as

P±φ = (1/2)(φ ± iHφ)    (15.4.4)

where H is the Hilbert transform. To motivate these definitions, recall from the discussion just above (10.2.27) that (Hφ)^(y) = −i sgn y φ̂(y), so

(P₊φ)^(y) = φ̂(y) for y > 0, = 0 for y < 0    (P₋φ)^(y) = φ̂(y) for y < 0, = 0 for y > 0    (15.4.5)

It is therefore simple to see that P± are the orthogonal projections of L²(R) onto the corresponding closed subspaces

H²₊ := {u ∈ L²(R) : û(y) = 0 ∀y < 0}    H²₋ := {u ∈ L²(R) : û(y) = 0 ∀y > 0}    (15.4.6)

for which L²(R) = H²₊ ⊕ H²₋ (see also Exercise 5 of Chapter 10). These are so-called Hardy spaces, which of course may be considered as Hilbert spaces in their own right, see Chapter 3 of [9]. In particular it can be readily seen (Exercise 8) that if φ ∈ H²₊ then φ has an analytic extension to the upper half of the complex plane,

∫_{−∞}^∞ |φ(x + iy)|² dx ≤ ‖φ‖²_{L²(R)}    ∀y > 0    (15.4.7)

and

φ(· + iy) → φ in L²(R) as y → 0+    (15.4.8)

Likewise a function φ ∈ H²₋ has an analytic extension to the lower half of the complex plane with analogous properties.
A very important converse of the above is given by a case of the Paley-Wiener theorem.

Theorem 15.4. If φ is analytic in the upper half of the complex plane and there exists a constant C such that

sup_{y>0} ∫_{−∞}^∞ |φ(x + iy)|² dx = C    (15.4.9)

then φ ∈ H²₊ and

∫_{−∞}^∞ |φ(x)|² dx = ∫₀^∞ |φ̂(y)|² dy = C    (15.4.10)

See Theorem 19.2 of [30] or Theorem 1, section 3.4 of [9] for a proof. The spaces H²± actually belong to the larger family of Hardy spaces H^p±, 1 ≤ p ≤ ∞, where for example φ ∈ H^p₊ if φ has an analytic extension to the upper half of the complex plane and

‖φ(· + iy)‖_{L^p(R)} ≤ ‖φ‖_{L^p(R)}    ∀y > 0    (15.4.11)
Returning to (15.4.3), we note that û, f̂ ∈ H²₋ while ĝ ∈ H²₊. Suppose now that it is possible to find a pair of functions q± ∈ H^∞± such that

√(2π) K̂(y) − λ = q₋(y)/q₊(y)    y ∈ R    (15.4.12)

Then from (15.4.3) it follows that

q₋û = q₊f̂ + q₊ĝ    (15.4.13)

From the assumptions made on q₊ and the Paley-Wiener theorem we can conclude that q₊ĝ ∈ H²₊, and likewise q₋û ∈ H²₋. In particular P₋(q₊ĝ) = 0, so that

q₋û = P₋(q₋û) = P₋(q₊f̂)    (15.4.14)

We thus obtain at least a formal solution formula for the Fourier transform of the solution, namely

û = P₋(q₊f̂)/q₋    (15.4.15)

In order that this formula be meaningful it is sufficient that 1/q± ∈ H^∞± along with the other assumptions already made, see Exercise 9. The central question which remains to be more thoroughly studied is the existence of the pair of functions q± satisfying all of the above requirements. We refer the reader to Chapter 3 of [15] or Chapter 3 of [9] for further reading about this, and conclude just with an example.
Example 15.2. Consider (15.4.1) with K(x) = e^{−|x|}, that is

∫₀^∞ e^{−|x−y|}u(y) dy − λu(x) = f(x)    x > 0    (15.4.16)

Since

K̂(y) = √(2/π) · 1/(y² + 1)    (15.4.17)

we get

√(2π) K̂(y) − λ = −λ (y² + b²)/(y² + 1)    (15.4.18)

where b² = (λ − 2)/λ. If we require λ ∉ [0, 2] then b may be chosen as real and positive, and we have

√(2π) K̂(y) − λ = q₋(y)/q₊(y)    (15.4.19)

where

q₋(y) = −λ (y − ib)/(y − i)    q₊(y) = (y + i)/(y + ib)    (15.4.20)

We see immediately that q±, q±⁻¹ ∈ H^∞±, and so (15.4.15) provides the unique solution of (15.4.16) provided λ ∉ [0, 2]. Note the significance of this restriction on λ: it is precisely the requirement that √(2π) K̂(y) − λ ≠ 0 for all y, or equivalently that λ does not belong to the range of the function √(2π) K̂(y).
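As a sanity check on the factorization (an editorial sketch), one can verify (15.4.18)-(15.4.20) numerically on a grid of real y for a sample admissible λ:

```python
import numpy as np

# Check the Wiener-Hopf factorization of Example 15.2 for lam = 4:
# sqrt(2*pi)*Khat(y) - lam should equal q_minus(y)/q_plus(y) on the real line.
lam = 4.0
b = np.sqrt((lam - 2) / lam)
y = np.linspace(-50, 50, 10001)

symbol = 2.0 / (y**2 + 1) - lam
q_minus = -lam * (y - 1j * b) / (y - 1j)
q_plus = (y + 1j) / (y + 1j * b)
print(np.max(np.abs(symbol - q_minus / q_plus)))   # ~1e-15
```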
15.5 Exercises

1. The Abel integral equation is

T u(x) = ∫₀ˣ u(y)/√(x − y) dy = f(x)

a first kind Volterra equation with a weakly singular kernel. Derive the explicit solution formula

u(x) = (1/π) (d/dx) ∫₀ˣ f(y)/√(x − y) dy

(Suggestions: it amounts to showing that T²u(x) = π ∫₀ˣ u(y) dy. You'll need to evaluate an integral of the form ∫_yˣ dz/(√(z − y)√(x − z)). Use the change of variable z = y cos²θ + x sin²θ.)
2. Let K₁, K₂ be weakly singular kernels with associated exponents α₁, α₂, and let T₁, T₂ be the associated Volterra integral operators. Show that T₁T₂ is also a Volterra operator with a weakly singular kernel and associated exponent α₁ + α₂ − 1.

3. If P(x) is any nonzero polynomial, show that the first kind Volterra integral equation

∫ₐˣ P(x − y)u(y) dy = f(x)

is equivalent to a second kind Volterra integral equation.

4. If T is a weakly singular Volterra integral operator, show that there exists a positive integer n such that Tⁿ is a Volterra integral operator with a bounded kernel.

5. Use (15.3.4) to obtain an explicit solution of (15.3.1) if

N = 1    λ = 1    K(x) = e^{−|x|}    f(x) = e^{−x} for x > 0, f(x) = 0 for x < 0    (15.5.1)

6. Discuss the solvability of the integral equation

∫₀^∞ u(s)/(s + t) ds = f(t)    t > 0    (15.5.2)

(Suggestions: Introduce new variables

ξ = (1/2) log t    η = (1/2) log s    ψ(η) = e^η u(e^{2η})    g(ξ) = e^ξ f(e^{2ξ})

You may find it useful to work out, or look up, the Fourier transform of the hyperbolic secant function.)
7. If we look for a solution of

Δu = 0  x ∈ Ω    ∂u/∂n + u = f  x ∈ ∂Ω

in the form of a single layer potential

u(x) = ∫_∂Ω Γ(x − y)φ(y) ds(y)

find an integral equation for the density φ.
8. If φ ∈ H²₊ show that φ has an analytic extension to the upper half of the complex plane. To be precise, show that if

φ̃(z) = (1/√(2π)) ∫₀^∞ φ̂(t)e^{itz} dt

then φ̃ is defined and analytic on {z = x + iy : y > 0} and

lim_{y→0+} φ̃(· + iy) = φ    in L²(R)

Along the way show that (15.4.7) holds.
9. Assume that f ∈ L²(0, ∞) and that (15.4.12) is valid for some q± with q±, q±⁻¹ ∈ H^∞±. Verify that (15.4.15) defines a function u ∈ L²(R) such that u(x) = 0 for x < 0.

10. Find q± in (15.4.12) for the case

K(x) = sinc(x) := sin(πx)/(πx) for x ≠ 0, := 1 for x = 0

and λ = −1. (Suggestion: look for q± in the form q±(x) = lim_{y→0±} F(x + iy) where F(z) = ((z − π)/(z + π))^{iα}.)
Chapter 16

Variational methods

16.1 The Dirichlet quotient

We have earlier introduced the concept of the Rayleigh quotient

J(u) = ⟨T u, u⟩/⟨u, u⟩    (16.1.1)
for a linear operator T on a Hilbert space H. In the previous discussion we were mainly concerned with the case that T was a bounded or even a compact operator, but now we will allow T to be unbounded. In such a case, J(u) is defined at least for u ∈ D(T)\{0}, and possibly on some larger domain. The principal case of interest to us here is the Dirichlet Laplacian discussed in Section 14.4,

T u = −Δu    on D(T) = {u ∈ H¹₀(Ω) : Δu ∈ L²(Ω)}    (16.1.2)

In this case

J(u) = ⟨−Δu, u⟩/⟨u, u⟩ = ∫_Ω |∇u|² dx / ∫_Ω |u|² dx = ‖u‖²_{H¹₀(Ω)} / ‖u‖²_{L²(Ω)}    (16.1.3)
which we may evidently regard as being defined on all of H¹₀(Ω) except the origin. We'll refer to any of these equivalent expressions as the Dirichlet quotient (or Dirichlet form) for −Δ. Throughout this section we take (16.1.3) as the definition of J, and denote by {λₙ, ψₙ} the eigenvalues and eigenfunctions of T, where we may choose the ψₙ's to be an orthonormal basis of L²(Ω), according to the discussion of Section 14.4. It is immediate that

J(ψₙ) = λₙ    (16.1.4)

for all n.
If we define a critical point of J to be any u ∈ H¹₀(Ω)\{0} for which

(d/dα) J(u + αv)|_{α=0} = 0    ∀v ∈ H¹₀(Ω)    (16.1.5)

then precisely as in (13.3.12) and the following discussion we find

∫_Ω ∇u · ∇v dx = J(u) ∫_Ω uv dx    ∀v ∈ H¹₀(Ω)    (16.1.6)

In other words, T u = λu must hold with λ = J(u). Conversely, by straightforward calculation, any eigenfunction of T is a critical point of J. Thus the set of eigenfunctions of T coincides with the set of critical points of the Dirichlet quotient, and by (16.1.4) the eigenvalues are exactly the critical values of J.
Among these critical points, one might expect to find a point at which J achieves its
minimum value, which must then correspond to the critical value λ1 , the least eigenvalue
of T . We emphasize, however, that the existence of a minimizer of J must be proved –
it is not immediate from anything we have stated so far. We give one such proof here,
and indicate another one in Exercise 3.
Theorem 16.1. There exists ψ ∈ H¹₀(Ω), ψ ≠ 0, such that J(ψ) ≤ J(φ) for all φ ∈ H¹₀(Ω), φ ≠ 0.

Proof: Let

λ = inf_{φ∈H¹₀(Ω), φ≠0} J(φ)    (16.1.7)

so λ > 0 by the Poincaré inequality. Therefore there exists ψₙ ∈ H¹₀(Ω) such that J(ψₙ) → λ. Without loss of generality we may assume ‖ψₙ‖_{L²(Ω)} = 1 for all n, in which case ‖ψₙ‖²_{H¹₀(Ω)} → λ. In particular {ψₙ} is a bounded sequence in H¹₀(Ω), so by Theorem 13.1 there exists ψ ∈ H¹₀(Ω) such that ψₙₖ ⇀ ψ weakly in H¹₀(Ω), for some subsequence. By Theorem 14.4 it follows that ψₙₖ → ψ strongly in L²(Ω), so in particular ‖ψ‖_{L²(Ω)} = 1. Finally, using the lower semi-continuity property of weak convergence (Proposition 13.2),

λ ≤ J(ψ) = ‖ψ‖²_{H¹₀(Ω)} ≤ lim inf_{k→∞} ‖ψₙₖ‖²_{H¹₀(Ω)} = lim inf_{k→∞} J(ψₙₖ) = λ    (16.1.8)

so that J(ψ) = λ, i.e. J achieves its minimum at ψ.
Note that by its very definition, the minimum λ₁ of the Rayleigh quotient J gives rise to the best constant in the Poincaré inequality, namely (14.4.10) is valid with C = 1/√λ₁ and no smaller C works.
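For a quick illustration (an editorial sketch): on Ω = (0, 1) the minimizer is ψ₁(x) = sin(πx) with λ₁ = π², so the best Poincaré constant is C_P = 1/π. A finite-difference eigensolve reproduces this:

```python
import numpy as np

# Smallest Dirichlet eigenvalue of -u'' on (0,1) by finite differences;
# the best Poincare constant is then C_P = 1/sqrt(lambda_1) = 1/pi.
m = 1000
h = 1.0 / (m + 1)
A = (np.diag(2 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
     - np.diag(np.ones(m - 1), -1)) / h**2
lam1 = np.linalg.eigvalsh(A)[0]
print(lam1, np.pi**2, 1 / np.sqrt(lam1))   # lambda_1 ~ pi^2, C_P ~ 1/pi
```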
The above argument provides a proof of the existence of one eigenvalue of T, namely the smallest eigenvalue λ₁, with corresponding eigenfunction ψ₁, which is completely independent from the proof given in Chapter 13. It is natural to ask then whether the existence of the other eigenvalues can be obtained in a similar way. Of course they can no longer be obtained by minimizing the Dirichlet quotient (nor is there any maximum to be found), but we know in fact that J has other critical points, since other eigenfunctions exist. Consider, for example, the case of λ₂, for which there must exist an eigenfunction orthogonal in L²(Ω) to the eigenfunction already found for λ₁. Thus it is a natural conjecture that λ₂ may be obtained by minimizing J over the orthogonal complement of ψ₁. Specifically, if we set

H₁ = {φ ∈ H¹₀(Ω) : ∫_Ω φψ₁ dx = 0}    (16.1.9)
then the existence of a minimizer of J over H₁ can be proved just as in Theorem 16.1. If the minimum occurs at ψ₂, with λ₂ = J(ψ₂), then the critical point condition amounts to

∫_Ω ∇ψ₂ · ∇v dx = λ₂ ∫_Ω ψ₂v dx    ∀v ∈ H₁    (16.1.10)

Furthermore, if v = ψ₁ then

∫_Ω ∇ψ₂ · ∇ψ₁ dx = −∫_Ω ψ₂Δψ₁ dx = λ₁ ∫_Ω ψ₂ψ₁ dx = 0    (16.1.11)

since ψ₂ ∈ H₁. It follows that (16.1.10) holds for every v ∈ H¹₀(Ω), so ψ₂ is an eigenfunction of T for the eigenvalue λ₂. Clearly λ₂ ≥ λ₁, since λ₂ is obtained by minimization over a smaller set.
We may continue this way, successively minimizing the Rayleigh quotient over the orthogonal complement in L²(Ω) of the previously obtained eigenfunctions, to obtain a variational characterization of all eigenvalues.

Theorem 16.2. We have

λₙ = J(ψₙ) = min_{u∈H_{n−1}} J(u)    (16.1.12)

where

Hₙ = {u ∈ H¹₀(Ω) : ∫_Ω uψₖ dx = 0, k = 1, 2, . . . , n}    H₀ = H¹₀(Ω)    (16.1.13)
This proof is essentially a mirror image of the proof of Theorem 13.10, in which a
compact operator has been replaced by an unbounded operator, and maximization has
been replaced by minimization. One could also look at critical points of the reciprocal of
J in order to maintain it as a maximization problem, but it is more common to proceed
as above. Similar results can be obtained for a larger class of unbounded self-adjoint
operators, see for example [37]. The eigenfunctions may be interpreted as saddle points
of J, i.e., critical points which are not local extrema.
The characterization of eigenvalues and eigenfunctions stated in Theorem 16.2 is unsatisfactory, in the sense that the minimization problem to be solved in order to obtain
an eigenvalue λn requires knowledge of the eigenfunctions corresponding to smaller eigenvalues. We next discuss two alternative characterizations of eigenvalues, which may be
regarded as advantageous from this point of view.
If E is a finite dimensional subspace of H¹₀(Ω), we define

μ(E) = max_{u∈E, u≠0} J(u)    (16.1.14)

and set

Sₙ = {E ⊂ H¹₀(Ω) : E is a subspace, dim(E) = n}    n = 1, 2, . . .    (16.1.15)

Note that μ(E) exists and is finite for E ∈ Sₙ, since if we choose any orthonormal basis {ζ₁, . . . , ζₙ} of E then

max_{u∈E, u≠0} J(u) = max_{∑ₖ|cₖ|²=1} ∫_Ω |∑_{k=1}ⁿ cₖ∇ζₖ|² dx    (16.1.16)

Thus finding μ(E) amounts to maximizing a continuous function over a compact set.

Theorem 16.3. (Poincaré min-max formula) We have

λₙ = min_{E∈Sₙ} μ(E) = min_{E∈Sₙ} max_{u∈E} J(u)    (16.1.17)

for n = 1, 2, . . .
Proof: J is constant on any one dimensional subspace, i.e. μ(E) = J(φ) if E = span{φ}, so for n = 1 the conclusion is equivalent to the statement of Theorem 16.1. For n ≥ 2, if E ∈ Sₙ we can find w ∈ E, w ≠ 0 such that w ⊥ ψₖ for k = 1, . . . , n − 1, since this amounts to n − 1 linear equations for n unknowns (here {ψₙ} still denotes the orthonormalized Dirichlet eigenfunctions). Thus w ∈ H_{n−1} and so by Theorem 16.2

λₙ ≤ J(w) ≤ max_{u∈E} J(u) = μ(E)    (16.1.18)

It follows that

λₙ ≤ inf_{E∈Sₙ} μ(E)    (16.1.19)

On the other hand, if we choose E = span{ψ₁, . . . , ψₙ} note that

J(u) = ∑_{k=1}ⁿ λₖc²ₖ / ∑_{k=1}ⁿ c²ₖ    (16.1.20)

for any u = ∑_{k=1}ⁿ cₖψₖ ∈ E. Thus

μ(E) = J(ψₙ) = λₙ    (16.1.21)

and so the infimum in (16.1.19) is achieved for this E. The conclusion (16.1.17) then follows.

A companion result, with a similar proof (see for example Theorem 5.2 of [37]) is

Theorem 16.4. (Courant-Weyl max-min formula) We have

λₙ = max_{E∈S_{n−1}} min_{u⊥E} J(u)    (16.1.22)

for n = 1, 2, . . .
An interesting application of the variational characterization of the first eigenvalue is the following monotonicity property. We use temporarily the notation λ₁(Ω) to denote the smallest Dirichlet eigenvalue of −Δ for the domain Ω.

Theorem 16.5. If Ω ⊂ Ω′ then λ₁(Ω′) ≤ λ₁(Ω).

Proof: By the density of C₀^∞(Ω) in H¹₀(Ω) and Theorem 16.1, for any ε > 0 there exists u ∈ C₀^∞(Ω) such that

J(u) ≤ λ₁(Ω) + ε    (16.1.23)

But extending u to be zero outside of Ω we may regard it as also belonging to C₀^∞(Ω′), and the value of J(u) is the same whichever domain we have in mind. Therefore

λ₁(Ω′) ≤ J(u) ≤ λ₁(Ω) + ε    (16.1.24)

and so the conclusion follows by letting ε → 0.
16.2 Eigenvalue approximation

The variational characterizations of eigenvalues discussed in the previous section lead immediately to certain estimates for the eigenvalues. In the simplest possible situation, if we choose any nonzero function v ∈ H¹₀(Ω) (which we call the trial function in this context), then from Theorem 16.1 we have that

λ₁ ≤ J(v)    (16.2.1)

an upper bound for the smallest eigenvalue. Furthermore, if we can choose v to 'resemble' the corresponding eigenfunction ψ₁, then we will typically find that J(v) is close to λ₁. If, for example in the one dimensional case Ω = (0, 1), we choose v(x) = x(1 − x), then by direct calculation we get that J(v) = 10, which should be compared to the exact value π² ≈ 9.87. The trial function v(x) = x²(1 − x), which is not so much like ψ₁ = sin(πx), provides a correspondingly poorer approximation J(v) = 14, which is still of course a valid upper bound.
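These two values are easy to verify symbolically (an editorial sketch):

```python
import sympy as sp

# Rayleigh quotients of the two trial functions on (0,1):
# J(v) = int v'^2 dx / int v^2 dx; expect 10 and 14 respectively.
x = sp.symbols('x')
for v in [x * (1 - x), x**2 * (1 - x)]:
    J = sp.integrate(sp.diff(v, x)**2, (x, 0, 1)) / sp.integrate(v**2, (x, 0, 1))
    print(J)
```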
The so-called Rayleigh-Ritz method generalizes this idea, so as to provide inequalities and/or approximations for other eigenvalues besides the first one. Let v₁, v₂, . . . , vₙ denote n linearly independent trial functions in H¹₀(Ω). Then E = span{v₁, v₂, . . . , vₙ} is an n-dimensional subspace of H¹₀(Ω), and so

λ₁ ≤ min_{u∈E} J(u)    λₙ ≤ max_{u∈E} J(u)    (16.2.2)

by Theorems 16.2 and 16.3.

The problem of computing critical points of J over E is a calculus problem, which may be handled as follows. Any u ∈ E may be written as u = ∑_{k=1}ⁿ cₖvₖ, and so

J(u) = ∫_Ω |∑_{k=1}ⁿ cₖ∇vₖ|² dx / ∫_Ω (∑_{k=1}ⁿ cₖvₖ)² dx = J(c₁, . . . , cₙ)    (16.2.3)
The critical point condition ∂J/∂cⱼ = 0, j = 1, . . . , n is readily seen to be equivalent to the linear system for c = ⟨c₁, . . . , cₙ⟩ᵀ,

Ac = ΛBc    (16.2.4)

where A, B are the n × n matrices with entries

Aₖⱼ = ∫_Ω ∇vₖ · ∇vⱼ dx    Bₖⱼ = ∫_Ω vₖvⱼ dx    (16.2.5)

and Λ = J(u). In other words, the critical points are obtained as the eigenvalues of the generalized eigenvalue problem (16.2.4) defined by means of the two matrices A, B.

As usual, the set of all eigenvalues of (16.2.4) is obtained as the roots of the n'th degree polynomial det(A − ΛB) = 0. We denote these roots (which must be positive and real, by the symmetry of A, B) as 0 < Λ₁ ≤ Λ₂ ≤ · · · ≤ Λₙ, with points repeated as needed according to multiplicity. Thus (16.2.2) amounts to

λ₁ ≤ Λ₁    λₙ ≤ Λₙ    (16.2.6)

Similar inequalities can be proved for all of the intermediate eigenvalues as well; we refer to [37] for the proof.

Theorem 16.6. We have

λₖ ≤ Λₖ    k = 1, . . . , n    (16.2.7)
As in the case of a single eigenvalue, a good choice of trial functions {v₁, . . . , vₙ} will typically result in values of Λ₁, . . . , Λₙ which are good approximations to λ₁, . . . , λₙ.
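A small implementation of the procedure (an editorial sketch) with polynomial trial functions vₖ(x) = xᵏ(1 − x) on Ω = (0, 1): the generalized eigenvalues Λₖ of (16.2.4) are upper bounds for λₖ = k²π² (Theorem 16.6), and improve as n grows.

```python
import numpy as np
import sympy as sp
from scipy.linalg import eigh

# Rayleigh-Ritz for -u'' on (0,1), trial functions v_k = x^k (1-x), k = 1..n:
# assemble A, B of (16.2.5) exactly, then solve A c = Lambda B c.
x = sp.symbols('x')
n = 6
v = [x**k * (1 - x) for k in range(1, n + 1)]
A = np.array([[float(sp.integrate(sp.diff(vi, x) * sp.diff(vj, x), (x, 0, 1)))
               for vj in v] for vi in v])
B = np.array([[float(sp.integrate(vi * vj, (x, 0, 1))) for vj in v] for vi in v])

Lam = eigh(A, B, eigvals_only=True)        # generalized eigenvalues, ascending
print(Lam)                                 # each Lambda_k >= k^2 pi^2
print((np.arange(1, n + 1) * np.pi)**2)
```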
16.3 The Euler-Lagrange equation

In Section 16.1 we observed that the problem of minimizing the nonlinear functional J in (16.1.3), or more generally finding any critical point of J, leads to the eigenvalue problem for T defined in (16.1.2). This corresponds to the situation found even in elementary calculus, where to solve an optimization problem, we look for points where a derivative is equal to zero. In the Calculus of Variations, we continue to extend this kind of thinking from finite dimensional to infinite dimensional situations.

Suppose X is a vector space, 𝒳 ⊂ X, J : 𝒳 → R is a functional, nonlinear in general, and consider the problem

min_{x∈𝒳} J(x)    (16.3.1)

There may also be constraints to be satisfied, for example in the form H(x) = C, where H : 𝒳 → R, so that the problem may be given as

min_{x∈𝒳, H(x)=C} J(x)    (16.3.2)
We refer to (16.3.1) and (16.3.2) as the unconstrained and constrained cases respectively. (Even though the definition of 𝒳 itself will often amount to the imposition of certain constraints.)

In the unconstrained case, if x is a solution of (16.3.1) which is also an interior point of 𝒳, then α → J(x + αy) has a minimum at α = 0, and so

(d/dα) J(x + αy)|_{α=0} = 0    ∀y ∈ X    (16.3.3)

must be satisfied. In the constrained case, a solution must instead have the property that there exists a constant λ such that

(d/dα) (J(x + αy) − λH(x + αy))|_{α=0} = 0    ∀y ∈ X    (16.3.4)

This condition may be motivated in several ways; here is one of them. Suppose we can find a constant λ such that the unconstrained problem of minimizing J − λH has a solution x for which H(x) = C, i.e. J(z) − λH(z) ≥ J(x) − λH(x) for all z. But if we require z to satisfy the constraint, then H(z) = H(x), and so J(z) ≥ J(x) for all z for which H(z) = C. Thus the constrained minimization problem may be regarded as that of solving (16.3.4) simultaneously with the constraint H(x) = C. The special value of λ is called a Lagrange multiplier for the problem. In either the constrained or unconstrained case, the equation which results from (16.3.3) or (16.3.4) is called the Euler-Lagrange equation.

The same conditions would be satisfied if we were seeking a maximum rather than a minimum, and may also be satisfied at critical points which are neither. The Euler-Lagrange equation must be viewed as a necessary condition for a solution, but it does not follow that any solution of the Euler-Lagrange equation must also be a solution of the original optimization problem. Just as in elementary calculus, we only obtain candidates for the solution in this way, and some further argument will in general be needed to complete the solution.
16.4 Variational methods for elliptic boundary value problems

We now present the application of variational methods, and obtain the Euler-Lagrange equation in explicit form, for several important PDE problems.

Example 16.1. Let J denote the Dirichlet quotient defined in (16.1.3), which we regard as defined on 𝒳 = {u ∈ H¹₀(Ω) : u ≠ 0} ⊂ H¹₀(Ω). Precisely as in (13.3.12) we find that

(d/dα) J(u + αv)|_{α=0} = 2 [(∫_Ω u² dx)(∫_Ω ∇u · ∇v dx) − (∫_Ω |∇u|² dx)(∫_Ω uv dx)] / (∫_Ω u² dx)²    (16.4.1)

The condition (16.3.3) for an unconstrained minimum of J over 𝒳 then amounts to

u ≠ 0    ∫_Ω ∇u · ∇v dx − λ ∫_Ω uv dx = 0    ∀v ∈ H¹₀(Ω)    (16.4.2)

with λ = J(u). Thus the Euler-Lagrange equation for this problem is precisely the equation for a Dirichlet eigenfunction in Ω.
Example 16.2. Let

J(u) = ∫_Ω |∇u|² dx    H(u) = ∫_Ω u² dx    (16.4.3)

both regarded as functionals on 𝒳 = X = H¹₀(Ω). By elementary calculations,

(d/dα) J(u + αv)|_{α=0} = 2 ∫_Ω ∇u · ∇v dx    (d/dα) H(u + αv)|_{α=0} = 2 ∫_Ω uv dx    (16.4.4)

The condition (16.3.4) for a constrained minimum of J subject to the constraint H(u) = 1 then amounts to (16.4.2) again, except now the solution is automatically normalized in L². Thus we can regard the problem of finding eigenvalues as coming from either a constrained or an unconstrained optimization problem.
Example 16.3. Define J as in Example 16.1, except replace H¹₀(Ω) by H¹(Ω). The condition for a solution of the unconstrained problem is then

u ≠ 0    ∫_Ω ∇u · ∇v dx − λ ∫_Ω uv dx = 0    ∀v ∈ H¹(Ω)    (16.4.5)

Since we are still free to choose v ∈ C₀^∞(Ω), it again follows that −Δu = λu for λ = J(u), but there is no longer an evident boundary condition for u to be satisfied. We observe, however, that if we choose v to be, say, in C¹(Ω̄) in (16.4.5), then an integration by parts yields

−∫_Ω vΔu dx + ∫_∂Ω v (∂u/∂n) ds = λ ∫_Ω uv dx    (16.4.6)

and since the Ω integrals must cancel, we get

∫_∂Ω v (∂u/∂n) ds = 0    ∀v ∈ C¹(Ω̄)    (16.4.7)

Since v is otherwise arbitrary, we conclude that ∂u/∂n = 0 on ∂Ω should hold. Thus, by looking for critical points of the Dirichlet quotient over the larger space H¹(Ω) we get eigenfunctions of −Δ subject to the homogeneous Neumann condition, in place of the Dirichlet condition. Since this condition was not imposed explicitly, but rather followed from the choice of space we used, it is often referred to in this context as the natural boundary condition.

Note that the actual minimum in this case is clearly J = 0, achieved for any constant function u. Thus it is the fact that infinitely many other critical points can be shown to exist which makes this of interest.
Example 16.4. Let f ∈ L2 (Ω), and set
Z
Z
1
2
J(u) =
|∇u| dx − f u dx u ∈ H01 (Ω)
2 Ω
Ω
The condition for an unconstrained critical point is readily seen to be
Z
Z
1
u ∈ H0 (Ω)
∇u · ∇v dx − f v dx = 0
∀v ∈ H01 (Ω)
Ω
(16.4.8)
16-3-12
(16.4.9)
16-3-13
Ω
Thus, in the distributional sense at least, a minimizer is a solution of the Poisson problem
$$-\Delta u = f \quad x\in\Omega, \qquad u = 0 \quad x\in\partial\Omega \qquad (16.4.10)$$
The existence of a unique solution is already known from Proposition 14.2, and is explicitly given by the integral operator S appearing in (14.4.23). The main interest here
is that we have obtained a variational characterization of it. Furthermore, we can give a
direct proof of the existence of a unique solution of (16.4.9), which is of interest because
it is easily adaptable to some other situations, even if it does not provide a new result in
this particular case. The proof illustrates the so-called direct method of the Calculus of
Variations.
Theorem 16.7. The problem of minimizing the functional J defined in (16.4.8) has a
unique solution, which also satisfies (16.4.9).
Proof: If C denotes any constant for which the Poincaré inequality (14.4.10) is valid,
we obtain
$$\int_\Omega f u\,dx \le \|f\|_{L^2}\|u\|_{L^2} \le C\|f\|_{L^2}\|u\|_{H_0^1} \le \frac14\|u\|_{H_0^1}^2 + C^2\|f\|_{L^2}^2 \qquad (16.4.11)$$
so that
$$J(u) \ge \frac14\|u\|_{H_0^1}^2 - C^2\|f\|_{L^2}^2 \qquad (16.4.12)$$
In particular, J is bounded below, so
$$d := \inf_{u\in H_0^1(\Omega)} J(u) \qquad (16.4.13)$$
is finite and there exists a sequence un ∈ H01 (Ω) such that J(un ) → d. Also, since
$$\|u_n\|_{H_0^1}^2 \le 4\left(J(u_n) + C^2\|f\|_{L^2}^2\right) \qquad (16.4.14)$$
the sequence {u_n} is bounded in H_0^1(Ω). By Theorem 13.1 there exists u ∈ H_0^1(Ω) and a weakly convergent subsequence u_{n_k} ⇀ u, which is therefore strongly convergent in L²(Ω) by Theorem 14.4. Finally,
$$d \le J(u) = \frac12\int_\Omega |\nabla u|^2\,dx - \int_\Omega f u\,dx \qquad (16.4.15)$$
$$\le \liminf_{k\to\infty}\left[\frac12\int_\Omega |\nabla u_{n_k}|^2\,dx - \int_\Omega f u_{n_k}\,dx\right] \qquad (16.4.16)$$
$$= \liminf_{k\to\infty} J(u_{n_k}) = d \qquad (16.4.17)$$
Here, the inequality on the second line follows from the first part of Proposition 13.2 and the fact that the ∫_Ω f u_{n_k} dx term is convergent. We conclude that J(u) = d, so J achieves its minimum value.
If two such solutions u1 , u2 exist, then the difference u = u1 − u2 must satisfy
$$\int_\Omega \nabla u\cdot\nabla v\,dx = 0 \quad \forall v\in H_0^1(\Omega) \qquad (16.4.18)$$
Choosing v = u we get ||u||H01 = 0, so u1 = u2 .
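To see the variational characterization concretely, here is a discrete analogue (our own sketch; grid and right-hand side f are illustrative assumptions): for the finite-difference Dirichlet Laplacian A, the discrete functional J(u) = ½ u·Au − f·u is bounded below and coercive, and its minimizer is exactly the solution of Au = f, the discrete version of (16.4.10).

import numpy as np
from scipy.optimize import minimize

n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = np.sin(np.pi * x)                         # hypothetical data
J = lambda u: 0.5 * u @ A @ u - f @ u         # discrete energy functional
u_min = minimize(J, np.zeros(n), jac=lambda u: A @ u - f, method="CG").x
u_lin = np.linalg.solve(A, f)                 # direct solve of A u = f
print(np.max(np.abs(u_min - u_lin)))          # the two agree to solver tolerance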
Here is one immediate generalization about the solvability of (16.4.10), which is easy to obtain by the above method. Suppose that there exists p ∈ [1, 2) such that the inequality
$$\int_\Omega f u\,dx \le C\|f\|_{L^p}\|u\|_{H_0^1} \qquad (16.4.19)$$
holds. Then the remainder of the proof remains valid, establishing the existence of a solution for all f ∈ L^p(Ω) for this choice of p, corresponding to a class of f's which is larger than L²(Ω). It can in fact be shown that (16.4.19) is correct for p = 2N/(N+2); see Exercise 16.
Example 16.5. Next consider the functional J in (16.4.8), except now regarded as defined on all of H¹(Ω), in which case the critical point condition is
$$u \in H^1(\Omega), \qquad \int_\Omega \nabla u\cdot\nabla v\,dx - \int_\Omega f v\,dx = 0 \quad \forall v\in H^1(\Omega) \qquad (16.4.20)$$
It still follows that u must be a weak solution of −∆u = f, and by the same argument as in Example 16.3, ∂u/∂n = 0 on ∂Ω. Thus critical points of J over H¹(Ω) provide us with solutions of
$$-\Delta u = f \quad x\in\Omega, \qquad \frac{\partial u}{\partial n} = 0 \quad x\in\partial\Omega \qquad (16.4.21)$$
We must first recognize that we can no longer expect a solution to exist for arbitrary
choices of f ∈ L2 (Ω), since if we choose v ≡ 1 we obtain the condition
$$\int_\Omega f\,dx = 0 \qquad (16.4.22)$$
which is thus a necessary condition for solvability. Likewise, if a solution exists it will not be unique, since any constant could be added to it. From another point of view, if we examine the proof of Theorem 16.7, we see that the infimum of J is clearly equal to −∞ unless ∫_Ω f dx = 0, since we can choose u to be an arbitrary constant function. Thus the minimum of J cannot be achieved by any function u ∈ H¹(Ω).
To work around this difficulty, we make use of the closed subspace of zero mean functions in H¹(Ω), namely
$$H_*^1(\Omega) = \left\{u \in H^1(\Omega) : \int_\Omega u\,dx = 0\right\} \qquad (16.4.23)$$
where the inner product and norm will simply be the restriction of the usual ones in H¹ to H_*^1. Analogous to the Poincaré inequality, Proposition 14.1, we have
Proposition 16.1. If Ω is a bounded open set in R^N with sufficiently smooth boundary then there exists a constant C, depending only on Ω, such that
$$\|u\|_{L^2(\Omega)} \le C\|\nabla u\|_{L^2(\Omega)} \quad \forall u\in H_*^1(\Omega) \qquad (16.4.24)$$
See Exercise 6 for the proof. The key point is that H∗1 contains no constant functions
other than zero. Now if we regard the functional J in (16.4.8) as defined only on the
Hilbert space H∗1 (Ω), then the proof of Theorem 16.7 can be modified in an obvious way
to obtain that for any f ∈ L2 (Ω) there exists
$$u \in H_*^1(\Omega), \qquad \int_\Omega \nabla u\cdot\nabla v\,dx - \int_\Omega f v\,dx = 0 \quad \forall v\in H_*^1(\Omega) \qquad (16.4.25)$$
For any v ∈ H¹(Ω) let µ = (1/m(Ω)) ∫_Ω v dx be the mean value of v, so that v − µ ∈ H_*^1(Ω).
If in addition we assume that the necessary condition (16.4.22) holds, it follows that
$$\int_\Omega \nabla u\cdot\nabla v\,dx = \int_\Omega \nabla u\cdot\nabla(v-\mu)\,dx = \int_\Omega f(v-\mu)\,dx = \int_\Omega f v\,dx \qquad (16.4.26)$$
for any v ∈ H¹(Ω). Thus u satisfies (16.4.20) and so is a weak solution of (16.4.21). It is unique within the subspace H_*^1(Ω), but by adding any constant we obtain the general solution u(x) + C in H¹(Ω).
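A discrete sketch of the zero-mean device (the grid, f, and the crude first-order boundary stencil below are our assumptions, not from the text): the Neumann finite-difference matrix is singular, with the constants as its null space, so we solve on the zero-mean subspace and recover the solution only up to an additive constant.

import numpy as np

n = 101
h = 1.0 / (n - 1)
x = np.linspace(0, 1, n)
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0          # one-sided stencils encode u'(0) = u'(1) = 0
A /= h**2
f = np.cos(np.pi * x)              # satisfies the necessary condition ∫ f dx = 0
f -= f.mean()                      # enforce (16.4.22) exactly in the discrete sum
u, *_ = np.linalg.lstsq(A, f, rcond=None)   # minimum-norm solution of singular system
u -= u.mean()                      # normalize to the zero-mean representative
exact = np.cos(np.pi * x) / np.pi**2        # exact zero-mean solution of -u'' = f
print(np.max(np.abs(u - exact)))   # modest discretization error (first order at boundary)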
16.5 Other problems in the calculus of variations
Let L = L(x, u, p) be a sufficiently smooth function on the domain {(x, u, p) : x ∈ Ω, u ∈
R, p ∈ RN } where as usual Ω ⊂ RN , and set
$$J(u) = \int_\Omega L(x, u(x), \nabla u(x))\,dx \qquad (16.5.1)$$
The function L is called the Lagrangian in this context. We consider the problem of
finding critical points of J, and for the moment proceed formally, without regard to the
precise spaces of functions involved. Expanding J(u + αv) in powers of α, we get
$$J(u+\alpha v) = \int_\Omega L(x, u(x)+\alpha v(x), \nabla u(x)+\alpha\nabla v(x))\,dx \qquad (16.5.2)$$
$$= \int_\Omega L(x,u(x),\nabla u(x))\,dx + \alpha\int_\Omega \frac{\partial L}{\partial u}(x,u(x),\nabla u(x))\,v(x)\,dx \qquad (16.5.3)$$
$$\quad + \alpha\int_\Omega \sum_{j=1}^N \frac{\partial L}{\partial p_j}(x,u(x),\nabla u(x))\,\frac{\partial v}{\partial x_j}(x)\,dx + o(\alpha) \qquad (16.5.4)$$
Thus the critical point condition reduces to
$$0 = \int_\Omega\left[\frac{\partial L}{\partial u}(\cdot,u,\nabla u)\,v + \sum_{j=1}^N \frac{\partial L}{\partial p_j}(\cdot,u,\nabla u)\,\frac{\partial v}{\partial x_j}\right]dx \qquad (16.5.5)$$
for all suitable v’s. Among the choices of v we can make, we certainly expect to find
those which satisfy v = 0 on ∂Ω. By an integration by parts we then get
$$0 = \int_\Omega\left[\frac{\partial L}{\partial u}(\cdot,u,\nabla u) - \sum_{j=1}^N \frac{\partial}{\partial x_j}\frac{\partial L}{\partial p_j}(\cdot,u,\nabla u)\right]v\,dx \qquad (16.5.6)$$
Since v is otherwise arbitrary, we conclude that
$$\frac{\partial L}{\partial u} - \sum_{j=1}^N \frac{\partial}{\partial x_j}\frac{\partial L}{\partial p_j} = 0 \qquad (16.5.7)$$
is a necessary condition for a critical point of J. That is to say, (16.5.7) is the Euler-Lagrange equation corresponding to the functional J. Typically it amounts to a partial differential equation for u, or an ordinary differential equation if N = 1.
The fact that (16.5.6) leads to (16.5.7) is often referred to as the Fundamental lemma of the Calculus of Variations, resulting formally from the intuition that we may (approximately) choose v to be equal to the bracketed term in (16.5.6) which it multiplies, so that the bracketed term has L² norm equal to zero. Despite using the term 'lemma', it is not a precise statement of anything unless some specific assumptions are made on L and the function spaces involved.
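A symbolic sketch of (16.5.7) in the one-dimensional case N = 1 (the helper euler_lagrange below is ours, not from the text): for a Lagrangian L(x, u, p), it returns ∂L/∂u − d/dx (∂L/∂p) with p replaced by u'(x). The first print reproduces the computation of Example 16.6 below; the second produces the hanging-chain equation of Example 16.7.

import sympy as sp

x, U, P = sp.symbols('x u p')
u = sp.Function('u')(x)

def euler_lagrange(L):
    # substitute u -> u(x), p -> u'(x) after taking the partial derivatives
    subs = {U: u, P: sp.diff(u, x)}
    return sp.diff(L, U).subs(subs) - sp.diff(sp.diff(L, P).subs(subs), x)

f = sp.Function('f')
print(sp.simplify(euler_lagrange(P**2 / 2 - f(x) * U)))   # gives -f(x) - u''(x)
print(sp.simplify(euler_lagrange(U * sp.sqrt(1 + P**2)))) # the catenary ODE (16.5.11)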
Example 16.6. The functional J in (16.4.8) comes from the Lagrangian
$$L(x,u,p) = \frac12|p|^2 - f(x)u \qquad (16.5.8)$$
Thus ∂L/∂u = −f(x) and ∂L/∂p_j = p_j, so (16.5.7) becomes, upon substituting p = ∇u,
$$-f(x) - \sum_{j=1}^N \frac{\partial}{\partial x_j}\frac{\partial u}{\partial x_j} = 0 \qquad (16.5.9)$$
which is obviously the same as (16.4.10).
Example 16.7. A very classical problem in the calculus of variations is that of finding
the shape of a hanging uniform chain, given fixed locations for its two endpoints. The
physical principle which we invoke is that the shape must be such that the potential
energy is minimized. To find an expression for the potential energy, let the shape be
given by a function h = u(x), a < x < b. Observe that the contribution to the total
potential energy from a short segment of the chain is gh∆m, where g is the gravitational constant and ∆m is the mass of the segment, which may be given as ρ∆s, where ρ is the (constant) density and ∆s is the length of the segment. Since ∆s = √(1 + u'(x)²) ∆x, we are led in the usual way to the potential energy functional
$$J(u) = \int_a^b u(x)\sqrt{1+u'(x)^2}\,dx \qquad (16.5.10)$$
to minimize. Applying (16.5.7) with L(x,u,p) = u√(1+p²) gives the Euler-Lagrange equation
$$\frac{\partial L}{\partial u} - \frac{d}{dx}\frac{\partial L}{\partial p} = \sqrt{1+u'^2} - \frac{d}{dx}\left(\frac{u u'}{\sqrt{1+u'^2}}\right) = 0 \qquad (16.5.11)$$
To solve this nonlinear ODE, we first multiply the equation through by uu'/√(1+u'²) to get
$$u u' - \frac12\frac{d}{dx}\left(\frac{u u'}{\sqrt{1+u'^2}}\right)^{2} = 0 \qquad (16.5.12)$$
so
$$u^2 - \left(\frac{u u'}{\sqrt{1+u'^2}}\right)^{2} = C^2 \qquad (16.5.13)$$
for some constant C. After some obvious algebra we get the separable first order ODE
$$u' = \pm\sqrt{\left(\frac{u}{C}\right)^2 - 1} \qquad (16.5.14)$$
which is readily integrated to obtain the general solution
$$u(x) = C\cosh\left(\frac{x}{C} + D\right) \qquad (16.5.15)$$
The two constants C, D are determined by the values of u(a) and u(b), so that in all cases the hanging chain is seen to assume the 'catenary' shape, determined by the hyperbolic cosine function.
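The constants can be computed numerically; here is a sketch using a generic root finder (the endpoint data below are made-up):

import numpy as np
from scipy.optimize import fsolve

a, b, ua, ub = 0.0, 1.0, 1.0, 1.5    # hypothetical endpoint heights u(a), u(b)

def equations(z):
    C, D = z
    # enforce u(a) = ua and u(b) = ub for u(x) = C cosh(x/C + D)
    return [C * np.cosh(a / C + D) - ua,
            C * np.cosh(b / C + D) - ub]

C, D = fsolve(equations, [1.0, 0.0])
print(C, D)
print(C * np.cosh(a / C + D), C * np.cosh(b / C + D))   # reproduces ua and ub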
Example 16.8. Another important class of examples comes from the theory of minimal
surfaces. A function u = u(x) defined on a domain Ω ⊂ R2 may be regarded as defining
a surface in R3 , and the corresponding surface area is
$$J(u) = \int_\Omega \sqrt{1+|\nabla u|^2}\,dx \qquad (16.5.16)$$
Suppose we seek the surface of least possible area, subject to the requirement that u(x) =
g(x) on ∂Ω, where g is a prescribed function. Such a surface is said to span the bounding
curve Γ = {(x1 , x2 , g(x1 , x2 )) : (x1 , x2 ) ∈ ∂Ω}. The problem of finding a minimal surface
with a given boundary curve is known as Plateau’s problem.
For this discussion we assume that g is the restriction to ∂Ω of some function in H¹(Ω) and then let X = {u ∈ H¹(Ω) : u − g ∈ H_0^1(Ω)}. Thus in looking at J(u + αv) we should always assume that v ∈ H_0^1(Ω), as in the discussion leading to (16.5.6). With L(x, u, p) = √(1+|p|²) we obtain
$$\frac{\partial L}{\partial p_j} = \frac{p_j}{\sqrt{1+|p|^2}} \qquad (16.5.17)$$
The resulting Euler-Lagrange equation is then the minimal surface equation
$$\sum_{j=1}^{2}\left(\frac{u_{x_j}}{\sqrt{1+|\nabla u|^2}}\right)_{x_j} = 0 \qquad (16.5.18)$$
It turns out that the expression on the left hand side is the so-called mean curvature² of the surface defined by u(x,y), so a minimal surface always has zero mean curvature.
Let us finally consider an example in the case of constrained optimization,
$$\min_{H(u)=C} J(u) \qquad (16.5.19)$$
where J is defined as in (16.5.1) and H is another functional of the same sort, say
$$H(u) = \int_\Omega N(x, u(x), \nabla u(x))\,dx \qquad (16.5.20)$$
² It is equal to the average of the principal curvatures.
As discussed in Section 16.3 we should seek critical points of J − λH, which we may regard as coming from the augmented Lagrangian M := L − λN. The Euler-Lagrange equation for a solution will then be
$$\frac{\partial M}{\partial u} - \sum_{j=1}^N \frac{\partial}{\partial x_j}\frac{\partial M}{\partial p_j} = 0, \qquad \int_\Omega N(x,u(x),\nabla u(x))\,dx = C \qquad (16.5.21)$$
Example 16.9. (Dido's problem³) Consider the area A in the (x,y) plane between y = 0 and y = u(x), where u(x) ≥ 0, u(0) = u(1) = 0. If the curve y = u(x) is fixed to have length L, how should we choose the shape of the curve to maximize the area A? This is an example of a so-called isoperimetric problem because the total perimeter of the boundary of A is fixed to be 1 + L. Clearly the mathematical expression of this problem may be written in the form (16.5.19) with
$$J(u) = \int_0^1 u(x)\,dx, \qquad H(u) = \int_0^1 \sqrt{1+u'(x)^2}\,dx, \qquad C = L \qquad (16.5.22)$$
so that
$$M = u - \lambda\sqrt{1+p^2} \qquad (16.5.23)$$
The first equation in (16.5.21) thus gives
$$\left(\frac{u'}{\sqrt{1+u'^2}}\right)' = \frac{1}{\lambda} \qquad (16.5.24)$$
From straightforward algebra and integration we obtain
$$u' = \pm\frac{x-x_0}{\sqrt{\lambda^2-(x-x_0)^2}} \qquad (16.5.25)$$
for some x₀, which subsequently leads to the expected result that the curve must be an arc of a circle,
$$(u-u_0)^2 + (x-x_0)^2 = \lambda^2 \qquad (16.5.26)$$
for some x0 , u0 . From the boundary conditions u(0) = u(1) = 0 it is easy to see that
x0 = 1/2, and the length constraint implies
$$L = \int_0^1 \sqrt{1+u'^2}\,dx = \lambda\int_0^1 \frac{dx}{\sqrt{\lambda^2-(x-\frac12)^2}} = \lambda\left[\sin^{-1}\frac{x-\frac12}{\lambda}\right]_0^1 = 2\lambda\sin^{-1}\frac{1}{2\lambda} \qquad (16.5.27)$$
³ Named for the founder and first queen of the ancient city of Carthage.
By elementary calculus techniques we may verify that a unique λ ≥ 1/2 exists for any L ∈ (1, π/2]. The restriction L > 1 is of course a necessary one for the curve to connect the two endpoints and enclose a positive area, but L ≤ π/2 is only an artifact of requiring that the curve be given in the form y = u(x). If instead we allow more general curves (e.g. given parametrically) then any L > 1 is possible; see Exercise 18.
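The length constraint (16.5.27) is easy to solve for λ numerically; here is a sketch (the target length L below is arbitrary). Note that g(λ) = 2λ sin⁻¹(1/(2λ)) − L decreases from π/2 − L at λ = 1/2 toward 1 − L as λ → ∞, so a root exists precisely for L ∈ (1, π/2].

import numpy as np
from scipy.optimize import brentq

L = 1.3                                                   # assumed curve length
g = lambda lam: 2 * lam * np.arcsin(1 / (2 * lam)) - L
lam = brentq(g, 0.5, 1e6)                                 # bracket the unique root
print(lam, 2 * lam * np.arcsin(1 / (2 * lam)))            # recovers L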
16.6 The existence of minimizers
We turn now to some discussion of conditions which guarantee the existence of a solution
of a minimization problem. We emphasize that (16.5.7) is only a necessary condition
for a solution, and some different kind of argument is needed to establish that a given
minimization problem actually has a solution. Let H be a Hilbert space, X ⊂ H an
admissible subset of H, J : X → R, and consider the problem
$$\min_{x\in X} J(x) \qquad (16.6.1)$$
One result which is immediate from applying Theorem 4.4 to −J is that a solution exists provided X is compact and J is continuous. It is unfortunately the case for many interesting problems that one or both of these conditions fails to be true, so some other considerations are needed. We'll use the following definitions.
Definition 16.1. J is coercive if J(x) → +∞ as ||x|| → ∞, x ∈ X .
Definition 16.2. J is lower semicontinuous if J(x) ≤ lim inf_{n→∞} J(x_n) whenever x_n ∈ X, x_n → x, and weakly lower semicontinuous if J(x) ≤ lim inf_{n→∞} J(x_n) whenever x_n ∈ X, x_n ⇀ x.
Definition 16.3. J is convex if J(tx+(1−t)y) ≤ tJ(x)+(1−t)J(y) whenever 0 ≤ t ≤ 1
and x, y ∈ X .
Recall also that X is weakly closed if x_n ∈ X, x_n ⇀ x implies that x ∈ X.
Theorem 16.8. If J : X → R is coercive and weakly lower semicontinuous, and X ⊂ H
is weakly closed, then there exists a solution of (16.6.1). If J is convex then it is only
necessary to assume that J is lower semicontinuous rather than weakly lower semicontinuous.
Proof: Let d = inf_{x∈X} J(x). If d ≠ −∞ then there exists R > 0 such that J(x) ≥ d + 1 if x ∈ X, ||x|| > R, while if d = −∞ there exists R > 0 such that J(x) ≥ 0 if x ∈ X,
||x|| > R. Either way, the infimum of J over X must be the same as the infimum over {x ∈ X : ||x|| ≤ R}. Thus there must exist a sequence x_n ∈ X, ||x_n|| ≤ R, such that J(x_n) → d. By the second part of Theorem 13.1 and the weak closedness of X, it follows that there is a subsequence {x_{n_k}} and a point x ∈ X such that x_{n_k} ⇀ x. In particular J(x) = d must hold, since
$$d \le J(x) \le \liminf_{k\to\infty} J(x_{n_k}) = d \qquad (16.6.2)$$
Thus d must be finite, and the infimum of J is achieved at x, so x is a solution of (16.6.1).
The final statement is a consequence of the lemma below, which is of independent
interest.
Lemma 16.1. If J is convex and lower semicontinuous then it is weakly lower semicontinuous.
Proof: If
Eα = {x ∈ H : J(x) ≤ α}
(16.6.3)
then E_α is closed, since x_n ∈ E_α, x_n → x implies that J(x) ≤ lim inf_{n→∞} J(x_n) ≤ α. Also, E_α is convex, since if x, y ∈ E_α and t ∈ [0,1], then J(tx + (1−t)y) ≤ tJ(x) + (1−t)J(y) ≤ tα + (1−t)α = α. Now by part 3 of Theorem 13.1 (Mazur's theorem) we get that E_α is weakly closed. Thus, if x_n ⇀ x and α = lim inf_{n→∞} J(x_n), we may find n_k → ∞ such that J(x_{n_k}) → α. If α ≠ −∞ and ε > 0 we must have x_{n_k} ∈ E_{α+ε} for sufficiently large n_k, and so x ∈ E_{α+ε} by the weak closedness. Since ε is arbitrary, we must have J(x) ≤ α, as needed. The proof is similar if α = −∞.
16.7 The Fréchet derivative
In this final section we discuss some notions which are often used in formalizing the
general ideas already used in this chapter.
Let X, Y be Banach spaces and F : D(F ) ⊂ X → Y be a mapping, nonlinear in
general, and let x0 be an interior point of D(F ).
Definition 16.4. If there exists a linear operator A ∈ B(X, Y) such that
$$\lim_{x\to x_0}\frac{\|F(x)-F(x_0)-A(x-x_0)\|}{\|x-x_0\|} = 0 \qquad (16.7.1)$$
then we say F is Fréchet differentiable at x0 , and A =: DF (x0 ) is the Fréchet derivative
of F at x0 .
It is easy to see that there is at most one such operator A, see Exercise 21. It is also
immediate that if DF (x0 ) exists then F must be continuous at x0 .
Note that (16.7.1) is equivalent to
F (x) = F (x0 ) + DF (x0 )(x − x0 ) + o(||x − x0 ||) x ∈ D(F )
(16.7.2)
This general concept of differentiability of a mapping at a given point amounts to the
property that the mapping may be approximated in a precise sense by a linear map4 in
the vicinity of the given point x0 . The difference
E(x, x0 ) := F (x) − F (x0 ) − DF (x0 )(x − x0 ) = o(||x − x0 ||)
(16.7.3)
will be referred to as the linearization error, and approximating F (x) by F (x0 )+DF (x0 )(x−
x0 ) as linearization of F at x0 .
Example 16.10. If F : X → R is defined by F (x) = ||x||2 on a real Hilbert space X
then
$$F(x) - F(x_0) = \|x_0+(x-x_0)\|^2 - \|x_0\|^2 = 2\langle x_0, x-x_0\rangle + \|x-x_0\|^2 \qquad (16.7.4)$$
It follows that (16.7.2) holds with DF(x₀) = A ∈ B(X, R) = X* given by
$$Az = 2\langle x_0, z\rangle \qquad (16.7.5)$$
Example 16.11. Let F : R^N → R^M be defined as
$$F(x) = F(x_1,\ldots,x_N) = \begin{pmatrix} f_1(x_1,\ldots,x_N)\\ \vdots\\ f_M(x_1,\ldots,x_N)\end{pmatrix} \qquad (16.7.6)$$
If the component functions f₁, ..., f_M are continuously differentiable on some open set containing x₀, then
$$f_k(x) = f_k(x_0) + \sum_{j=1}^N \frac{\partial f_k}{\partial x_j}(x_0)(x_j - x_{0j}) + o(\|x-x_0\|) \qquad (16.7.7)$$
⁴ Here we will temporarily use the word linear to refer to what might more properly be called an affine function, F(x₀) + A(x − x₀), which differs from the linear function x → Ax by the constant F(x₀) − Ax₀.
Therefore
$$F(x) = F(x_0) + A(x_0)(x-x_0) + o(\|x-x_0\|) \qquad (16.7.8)$$
with A(x₀) ∈ B(R^N, R^M) given by the Jacobian matrix of the transformation F at x₀, i.e. the M × N matrix whose (k, j) entry is ∂f_k/∂x_j(x₀). It follows that DF(x₀) is the linear mapping defined by the matrix A(x₀), or more informally DF(x₀) = A(x₀).
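The o(||x − x₀||) behavior of the linearization error can be observed numerically; here is a sketch for a hypothetical map F : R² → R², with the Jacobian approximated by forward differences:

import numpy as np

F = lambda x: np.array([np.sin(x[0]) * x[1], x[0]**2 + np.exp(x[1])])

def jacobian(F, x0, h=1e-6):
    # forward-difference approximation to the Jacobian matrix A(x0)
    n, m = len(x0), len(F(x0))
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (F(x0 + e) - F(x0)) / h
    return J

x0 = np.array([0.3, -0.7])
A = jacobian(F, x0)
for t in [1e-1, 1e-2, 1e-3]:
    dx = t * np.array([1.0, 2.0])
    err = np.linalg.norm(F(x0 + dx) - F(x0) - A @ dx)
    print(t, err / np.linalg.norm(dx))   # ratio tends to 0 as t -> 0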
Example 16.12. If A ∈ B(X, Y) and F (x) = Ax then F (x) = F (x0 ) + A(x − x0 ) so
DF (x0 ) = A, i.e. the derivative of a linear map is itself.
Example 16.13. If J : X → R is a functional on X, and if DJ(x0 ) exists then
$$DJ(x_0)y = \frac{d}{d\alpha}J(x_0+\alpha y)\Big|_{\alpha=0} \qquad (16.7.9)$$
since
J(x0 + αy) − J(x0 ) = DJ(x0 )(αy) + E(x0 + αy, x0 )
(16.7.10)
Dividing both sides by α and letting α → 0, we get (16.7.9). The right hand side of
(16.7.9) has the interpretation of being the directional derivative of J at x0 in the y
direction, and in this context is often referred to as the Gateaux derivative. The above
observation is simply that the Gateaux derivative coincides with the Fréchet derivative if the
latter exists. From another point of view, it says that if the Fréchet derivative exists,
a formula for it may be found by computing the Gateaux derivative. It is, however,
possible that J has a derivative in the Gateaux sense, but not in the Fréchet sense, see
Exercise 22. In any case we see that if J is differentiable in the Fréchet sense, then the
Euler-Lagrange equation for a critical point of J amounts to DJ(x0 ) = 0.
With a notion of derivative at hand, we can introduce several additional useful concepts. We denote by C(X, Y) the vector space of continuous mappings from X to Y.
The mapping DF : x0 → DF (x0 ) is evidently itself a mapping between Banach spaces,
namely DF : X → B(X, Y), and we say F ∈ C 1 (X, Y) if this map is continuous with
respect to the usual metrics. Furthermore, we then denote D2 F (x0 ) as the Fréchet derivative of DF at x0 , if it exists, in which case D2 F (x0 ) ∈ B(X, B(X, Y)). There is a natural
isomorphism between B(X, B(X, Y)) and B(X × X, Y), namely if A ∈ B(X, B(X, Y))
there is an associated à ∈ B(X × X, Y) related by
Ã(x, z) = A(x)z
x, z ∈ X
(16.7.11)
Thus it is natural to regard D2 F (x0 ) as a continuous bilinear map, and the action of the
map will be denoted as D2 F (x0 )(x, z) ∈ Y. We say F ∈ C 2 (X, Y) if x0 → D2 F (x0 ) is
continuous. It can be shown that D2 F (x0 ) must be symmetric if F ∈ C 2 (X, Y).
In general, we may inductively define D^k F(x₀) to be the Fréchet derivative of D^{k−1}F at x₀, if it exists, which will then be a k-linear mapping of X × ··· × X (k times) into Y.
Example 16.14. If X is a real Hilbert space and F(x) = ||x||², recall we have seen that DF(x₀)z = 2⟨x₀, z⟩. Thus
$$DF(x)z - DF(x_0)z = 2\langle x-x_0,z\rangle = D^2F(x_0)(x-x_0,z) + o(\|x-x_0\|) \qquad (16.7.12)$$
provided D²F(x₀)(x,z) = 2⟨x,z⟩, and obviously the error term is exactly zero.
Example 16.15. If F : R^N → R then by Example 16.11, DF(x₀) is given by the gradient of F, that is
$$DF(x_0) \in B(\mathbb{R}^N,\mathbb{R}), \qquad DF(x_0)z = \sum_{j=1}^N \frac{\partial F}{\partial x_j}(x_0)z_j \qquad (16.7.13)$$
Therefore we may regard DF : R^N → R^N and so D²F(x₀) ∈ B(R^N, R^N), given by (now using Example 16.11 in the case M = N) the Jacobian of the gradient of F, that is
$$D^2F(x_0)(z,w) = \sum_{j,k=1}^N H_{jk}(x_0)z_jw_k = \sum_{j,k=1}^N \frac{\partial^2 F}{\partial x_k\partial x_j}(x_0)z_jw_k \qquad (16.7.14)$$
where H is the usual Hessian matrix.
Certain calculus rules are valid and may be proved in essentially the same way as in
the finite dimensional case.
Theorem 16.9. (Chain rule for Fréchet derivative). Assume that X, Y, Z are Banach
spaces and
F : D(F ) ⊂ X → Y
G : D(G) ⊂ Y → Z
(16.7.15)
Assume that x0 is an interior point of D(F ), DF (x0 ) exists, y0 = F (x0 ) is an interior
point of D(G) and DG(y0 ) exists. Then G ◦ F : X → Z is Fréchet differentiable at x0
and
D(G ◦ F )(x0 ) = DG(y0 )DF (x0 )
(16.7.16)
Proof: Let
EF (x, x0 ) = F (x)−F (x0 )−DF (x0 )(x−x0 )
EG (y, y0 ) = G(y)−G(y0 )−DG(y0 )(y−y0 )
(16.7.17)
so that
G(F (x)) − G(F (x0 )) = DG(y0 )DF (x0 )(x − x0 ) + DG(y0 )EF (x, x0 ) + EG (F (x), y0 )
(16.7.18)
for x sufficiently close to x0 .
By the differentiability of F, G we have
||EF (x, x0 )|| = o(||x − x0 ||)
||EG (F (x), y0 )|| = o(||F (x) − F (x0 )||) = o(||x − x0 ||)
(16.7.19)
Since also DG(y0 ) is bounded, the conclusion follows.
It is a familiar fact in one space dimension that a bound on the derivative of a function
implies Lipschitz continuity. Here is an analogue for maps on a Banach space.
Theorem 16.10. Let X, Y be Banach spaces, F : D(F) ⊂ X → Y, and let x, x₀ ∈ D(F) be such that tx + (1−t)x₀ ∈ D(F) for t ∈ [0,1]. If
$$M := \sup_{0\le t\le 1}\|DF(tx+(1-t)x_0)\| \qquad (16.7.20)$$
then
$$\|F(x)-F(x_0)\| \le M\|x-x_0\| \qquad (16.7.21)$$
Theorem 16.11. (Second derivative test) Let X be a Banach space and J ∈ C²(X, R). If J achieves its minimum at x₀ ∈ X then D²J(x₀) must be positive semidefinite, that is, D²J(x₀)(z,z) ≥ 0 for all z ∈ X. Conversely, if x₀ is a critical point of J at which D²J is positive definite, D²J(x₀)(z,z) > 0 for z ≠ 0, then x₀ is a local minimum of J.
16.8 Exercises
1. Using the trial function
$$\phi(x) = 1 - \frac{|x|^2}{R^2}$$
compute an upper bound for the first Dirichlet eigenvalue of −∆ in the ball B(0,R) of R^N. Compare to the exact value of λ₁ in dimensions 2 and 3. (Zeros of Bessel functions can be found, for example, in tables, or by means of a root finding routine in Matlab.)
2. Consider the Sturm-Liouville problem
$$u'' + \lambda u = 0 \quad 0<x<1, \qquad u'(0) = u(1) = 0$$
It can be shown that the eigenvalues are the critical points of
$$J(u) = \frac{\int_0^1 u'(x)^2\,dx}{\int_0^1 u(x)^2\,dx}$$
on the space H = {u ∈ H¹(0,1) : u(1) = 0}. Use the Rayleigh-Ritz method to estimate the first two eigenvalues, and compare to the exact values. Choose polynomial trial functions which resemble what the first two eigenfunctions should look like.
3. Use the result of Exercise 13 in Chapter 14 to give an alternate derivation of the fact that the Dirichlet quotient achieves its minimum at ψ₁. (Hint: For u ∈ H_0^1(Ω) compute ||u||²_{H_0^1(Ω)} and ||u||²_{L²(Ω)} by expanding in the eigenfunction basis.)
4. Let T be the integral operator
$$Tu(x) = \int_0^1 |x-y|\,u(y)\,dy$$
on L²(0,1). Show that
$$\frac13 \le \|T\| \le \frac{1}{\sqrt6}$$
(Suggestion: the lower bound can be obtained using a simple choice of trial function in the corresponding Rayleigh quotient.)
5. Let A be an m × n real matrix, b ∈ R^m, and define J(x) = ||Ax − b||₂ for x ∈ R^n. (Here ||·||₂ denotes the 2-norm, the usual Euclidean distance on R^m.)
a) What is the Euler-Lagrange equation for the problem of minimizing J?
b) Under what circumstances does the Euler-Lagrange equation have a unique solution?
c) Under what circumstances will the solution of the Euler-Lagrange equation also
be a solution of Ax = b?
6. Prove the version of the Poincaré inequality stated in Proposition 16.1. (Suggestions: If no such C exists, show that we can find a sequence u_k ∈ H_*^1(Ω) with ||u_k||_{L²(Ω)} = 1 such that ||∇u_k||_{L²(Ω)} ≤ 1/k. Using Rellich's theorem obtain a convergent subsequence whose limit must have contradictory properties.)
7. Fill in the details of the following alternate proof that there exists a weak solution of the Neumann problem
$$-\Delta u = f \quad x\in\Omega, \qquad \frac{\partial u}{\partial n} = 0 \quad x\in\partial\Omega \qquad (NP)$$
(as usual, Ω is a bounded open set in R^N) provided f ∈ L²(Ω) and ∫_Ω f(x) dx = 0:
a) Show that for any ε > 0 there exists a (suitably defined) unique weak solution u_ε of
$$-\Delta u_\epsilon + \epsilon u_\epsilon = f \quad x\in\Omega, \qquad \frac{\partial u_\epsilon}{\partial n} = 0 \quad x\in\partial\Omega$$
b) Show that ∫_Ω u_ε(x) dx = 0 for any such ε.
c) Show that there exists u ∈ H¹(Ω) such that u_ε → u weakly in H¹(Ω) as ε → 0, and u is a weak solution of (NP).
8. Consider a Lagrangian of the form L = L(u, p) (i.e. it happens not to depend on the space variable x) when N = 1. Show that if u is a solution of the Euler-Lagrange equation then
$$L(u,u') - u'\frac{\partial L}{\partial p}(u,u') = C$$
for some constant C. In this way we are able to achieve a reduction of order from a second order ODE to a first order ODE. Use this observation to redo the derivation of the solution of the hanging chain problem.
9. Find the function u(x) which minimizes
$$J(u) = \int_0^1 (u'(x) - u(x))^2\,dx$$
among all functions u ∈ H¹(0,1) satisfying u(0) = 0, u(1) = 1.
10. The area of a surface obtained by revolving the graph of y = u(x), 0 < x < 1, about the x axis is
$$J(u) = 2\pi\int_0^1 u(x)\sqrt{1+u'(x)^2}\,dx$$
Assume that u is required to satisfy u(0) = a, u(1) = b where 0 < a < b.
a) Find the Euler-Lagrange equation for the problem of minimizing this surface area.
b) Show that
$$\frac{u(u')^2}{\sqrt{1+(u')^2}} - u\sqrt{1+(u')^2}$$
is a constant function for any such minimal surface. (Hint: use Exercise 8.)
c) Solve the first order ODE in part b) to find the minimal surface. Make sure to compute all constants of integration.
11. Find a functional on H¹(Ω) for which the Euler-Lagrange equation is
$$-\Delta u = f \quad x\in\Omega, \qquad -\frac{\partial u}{\partial n} = k(x)u \quad x\in\partial\Omega$$
12. Find the Euler-Lagrange equation for minimizing
$$J(u) = \int_\Omega |\nabla u(x)|^q\,dx$$
subject to the constraint
$$H(u) = \int_\Omega |u(x)|^r\,dx = 1$$
where q, r > 1.
13. Let Ω ⊂ R^N be a bounded open set, q ∈ C(Ω), q(x) > 0 in Ω, and
$$J(u) = \frac{\int_\Omega |\nabla u(x)|^2\,dx}{\int_\Omega q(x)u(x)^2\,dx}$$
a) Show that any nonzero critical point u ∈ H_0^1(Ω) of J is a solution of the eigenvalue problem
$$-\Delta u = \lambda q(x)u \quad x\in\Omega, \qquad u = 0 \quad x\in\partial\Omega$$
b) Show that all eigenvalues are positive.
c) If q(x) ≥ 1 in Ω and λ₁ denotes the smallest eigenvalue, show that λ₁ < λ₁* where λ₁* is the corresponding first eigenvalue of −∆ in Ω.
14. Define
$$J(u) = \frac12\int_\Omega (\Delta u)^2\,dx + \int_\Omega f u\,dx$$
What PDE problem is satisfied by a critical point of J over X = H²(Ω) ∩ H_0^1(Ω)? Make sure to specify any relevant boundary conditions. What is different if instead we let X = H_0^2(Ω)?
15. Let H be a Hilbert space and J : H → R. Recall that J is lower semicontinuous if J(x) ≤ lim inf_{n→∞} J(x_n) whenever x_n → x, and is weakly lower semicontinuous if the same is true whenever x_n ⇀ x. We say J is coercive if lim_{||x||→∞} J(x) = +∞.
a) If J is weakly lower semicontinuous and coercive, show that inf_{x∈H} J(x) is finite.
b) If J is weakly lower semicontinuous and coercive, show that min_{x∈H} J(x) has a solution.
c) Show that if f ∈ L²(Ω) then
$$J(u) = \frac12\int_\Omega |\nabla u(x)|^2\,dx - \int_\Omega f(x)u(x)\,dx$$
is weakly lower semicontinuous and coercive on H_0^1(Ω).
16. Let Ω be a bounded open set in R^N. If p < N, a special case of the Sobolev embedding theorem states that there exists a constant C = C(Ω, p, q) such that
$$\|u\|_{L^q(\Omega)} \le C\|u\|_{W^{1,p}(\Omega)} \qquad 1\le q\le \frac{Np}{N-p} \qquad (16.8.1)$$
Use this to show that (16.4.19) holds for N ≥ 3, p = 2N/(N+2), and so the problem (16.4.10) has a solution obtainable by the variational method, for all f in this L^p space.
17. Formulate and derive a replacement for (16.5.7) for the case that u is a vector
function.
18. Redo Dido’s problem (Example 16.9) but allowing for an arbitrary curve (x(t), y(t))
in the plane connecting the points (0, 0) and (1, 0). Since there are now 2 unknown
functions, the result of Exercise 17 will be relevant.
19. Show that if Ω is a bounded domain in R^N and f ∈ L²(Ω), then the problem of minimizing
$$J(u) = \frac12\int_\Omega |\nabla u|^2\,dx - \int_\Omega f u\,dx$$
over H_0^1(Ω) satisfies all of the conditions of Theorem 16.8. What goes wrong if we replace H_0^1(Ω) by H¹(Ω)?
20. We say that J : X → R is strictly convex if
$$J(tx+(1-t)y) < tJ(x)+(1-t)J(y) \qquad x, y \in X,\ x\neq y,\ 0<t<1$$
If J is strictly convex, show that the minimization problem (16.6.1) has at most one solution.
21. Show that the Fréchet derivative, if it exists, must be unique.
22. If F : R² → R is defined by
$$F(x,y) = \begin{cases} \dfrac{xy^2}{x^2+y^4} & (x,y)\ne(0,0)\\[2pt] 0 & (x,y)=(0,0)\end{cases}$$
show that F is Gateaux differentiable but not Fréchet differentiable at the origin.
23. Let F be a C 1 mapping of a Banach space X into itself. Give a formal derivation
of Newton’s method
xn+1 = xn − DF (xn )−1 (F (xn ) − y)
for solving F (x) = y.
24. If A is a bounded linear operator on a Banach space X, discuss the differentiability of the map t → e^{tA}, regarded as a mapping from R into B(X). (Recall that the exponential of a bounded linear operator was defined in Exercise 9 of Chapter 5.)
25. Prove the second derivative test Theorem 16.11.
Chapter 17
Weak solutions of partial differential equations
17.1 Lax-Milgram theorem
The main goal of this final chapter is to develop further tools which will allow us to answer
basic questions about second order linear PDEs with variable coefficients. Beginning our
discussion with the elliptic case, there are actually two natural ways to write such an
equation, namely
$$Lu := -\sum_{j,k=1}^N a_{jk}(x)\frac{\partial^2 u}{\partial x_j\partial x_k} + \sum_{j=1}^N b_j(x)\frac{\partial u}{\partial x_j} + c(x)u = f(x) \quad x\in\Omega \qquad (17.1.1)$$
and
$$Lu := -\sum_{j,k=1}^N \frac{\partial}{\partial x_j}\left(a_{jk}(x)\frac{\partial u}{\partial x_k}\right) + \sum_{j=1}^N b_j(x)\frac{\partial u}{\partial x_j} + c(x)u = f(x) \quad x\in\Omega \qquad (17.1.2)$$
A second order PDE is said to be elliptic if it can be written in one of the forms
(17.1.1), (17.1.2) for which there exists a constant θ > 0 such that
$$\sum_{j,k=1}^N a_{jk}(x)\xi_j\xi_k \ge \theta|\xi|^2 \quad \forall\xi\in\mathbb{R}^N \qquad (17.1.3)$$
That is to say, the matrix with entries a_{jk}(x) is uniformly positive definite on Ω. It is easy to verify that this use of the term 'elliptic' is consistent with all previous usages. We will in addition always assume that the coefficients a_{jk}, b_j, c belong to L^∞(Ω).
These two forms are referred to respectively as non-divergence form and divergence form, since in the second case the leading order sum can be written as ∇·v, where v is the vector field with components v_j = Σ_{k=1}^N a_{jk}u_{x_k}. The minus sign in the leading order term is included for later convenience, for the same reason that Poisson's equation is typically written as −∆u = f. Also for notational simplicity we will from here on adopt the summation convention, that is, repeated indices are summed. Thus the two forms of the PDE may be written instead as
−ajk (x)uxk xj + bj (x)uxj + c(x)u = f (x) x ∈ Ω
(17.1.4)
− (ajk (x)uxk )xj + bj (x)uxj + c(x)u = f (x) x ∈ Ω
(17.1.5)
There is obviously an equivalence between the two forms provided the leading coefficients a_{jk} are differentiable in an appropriate sense, so that
$$(a_{jk}(x)u_{x_k})_{x_j} = a_{jk}(x)u_{x_kx_j} + (a_{jk})_{x_j}u_{x_k} \qquad (17.1.6)$$
is valid, but one of the main reasons to maintain the distinction is that there may be
situations where we do not want to make any such differentiability assumption. In such
a case we cannot expect classical solutions to exist, and will rely instead on a notion of
weak solution, which generalizes (14.4.8) for the case of the Poisson equation.
A second reason, therefore, for direct consideration of the PDE in divergence form is
that a suitable definition of weak solution arises in a very natural way. The formal result
of multiplying the equation by a test function v and integrating over Ω is that
$$\int_\Omega [a_{jk}(x)u_{x_k}(x)v_{x_j}(x) + b_j(x)u_{x_j}(x)v(x) + c(x)u(x)v(x)]\,dx = \int_\Omega f(x)v(x)\,dx \qquad (17.1.7)$$
If we also wish to impose the Dirichlet boundary condition u = 0 for x ∈ ∂Ω then as in
the case of the Laplace equation we interpret this as the requirement that u ∈ H01 (Ω).
Assuming that f ∈ L2 (Ω) the integrals in (17.1.7) are all defined and finite for v ∈ H01 (Ω)
and so we are motivated to make the following definition.
Definition 17.1. If f ∈ L2 (Ω) we say that u is a weak solution of the Dirichlet problem
$$-(a_{jk}(x)u_{x_k})_{x_j} + b_j(x)u_{x_j} + c(x)u = f(x) \quad x\in\Omega \qquad (17.1.8)$$
$$u = 0 \quad x\in\partial\Omega \qquad (17.1.9)$$
if u ∈ H01 (Ω) and (17.1.7) holds for every v ∈ H01 (Ω).
In deciding whether a certain definition of weak solution for a PDE is an appropriate one, the following considerations should be borne in mind:
• If the definition is too narrow, then a solution need not exist.
• If the definition is too broad, then many solutions will exist.
Thus if both existence and uniqueness can be proved, it is an indication that the balance
is just right, i.e. the requirements for a weak solution are neither too narrow nor too
broad, so that the definition is suitable.
Here is a special case for which uniqueness is simple to prove.
Proposition 17.1. Let Ω be a bounded domain in R^N. There exists ε > 0, depending only on the domain Ω and the ellipticity constant θ in (17.1.3), such that if
$$c(x) \ge 0 \quad x\in\Omega \qquad \text{and} \qquad \max_j \|b_j\|_{L^\infty(\Omega)} < \epsilon \qquad (17.1.10)$$
then there is at most one weak solution of the Dirichlet problem (17.1.8)-(17.1.9).
Proof: If u₁, u₂ are both weak solutions then u = u₁ − u₂ is a weak solution with f ≡ 0. We may then choose v = u in (17.1.7) to get
$$\int_\Omega [a_{jk}(x)u_{x_k}(x)u_{x_j}(x) + b_j(x)u_{x_j}(x)u(x) + c(x)u(x)^2]\,dx = 0 \qquad (17.1.11)$$
By the ellipticity assumption we have a_{jk}u_{x_k}u_{x_j} ≥ θ|∇u|², and recalling that c ≥ 0 there results
$$\theta\|u\|_{H_0^1(\Omega)}^2 \le \epsilon\|u\|_{L^2(\Omega)}\|u\|_{H_0^1(\Omega)} \qquad (17.1.12)$$
Now if C = C(Ω) denotes a constant for which Poincaré's inequality (14.4.10) holds, we obtain either u ≡ 0 or θ ≤ εC. Thus any ε < θ/C has the required properties.
The smallness restriction on the bj ’s can be weakened considerably, but the nonnegativity assumption on c(x) is more essential. For example in the case of
−∆u + c(x)u = 0 x ∈ Ω
u = 0 x ∈ ∂Ω
(17.1.13)
uniqueness fails if c(x) = −λ_n, where λ_n is any Dirichlet eigenvalue of −∆, since then any corresponding eigenfunction is a nontrivial solution.
Now turning to the question of the existence of weak solutions, our strategy will be
to adapt the argument that occurs in Proposition 14.2 showing that the operator T is
onto. Consider first the special case
$$-(a_{jk}(x)u_{x_k})_{x_j} = f(x) \quad x\in\Omega, \qquad u = 0 \quad x\in\partial\Omega \qquad (17.1.14)$$
where as before we assume the ellipticity property (17.1.3), ajk ∈ L∞ (Ω), f ∈ L2 (Ω) and
in addition the symmetry property ajk = akj for all j, k. Define
$$A[u,v] = \int_\Omega a_{jk}(x)u_{x_j}(x)v_{x_k}(x)\,dx \qquad (17.1.15)$$
We claim that A is a valid inner product on the real Hilbert space H_0^1(Ω). Note that
$$A[u,v] \le C\|u\|_{H_0^1(\Omega)}\|v\|_{H_0^1(\Omega)} \qquad (17.1.16)$$
for some constant C depending on max_{j,k} ||a_{jk}||_{L^∞(Ω)}, so A[u,v] is defined for all u, v ∈ H_0^1(Ω), and
$$A[u,u] \ge \theta\|u\|_{H_0^1(\Omega)}^2 \qquad (17.1.17)$$
by the ellipticity assumption. Thus the inner product axioms [H1] and [H2] hold. The symmetry axiom [H4] follows from the assumed symmetry of a_{jk}, and the remaining inner product axioms are obvious. If we let ψ(v) = ∫_Ω f v dx then just as in the proof of Proposition 14.2 we have that ψ is a continuous linear functional on H_0^1(Ω). We conclude that there exists u ∈ H_0^1(Ω) such that A[u,v] = ψ(v) for every v ∈ H_0^1(Ω), which is precisely the definition of weak solution of (17.1.14).
The argument just given seems to rely in an essential way on the symmetry assumption, but it turns out that with a somewhat different proof we can eliminate that
hypothesis. This result, in its most abstract form, is the so-called Lax-Milgram theorem.
Note that even if we had no objection to the symmetry assumption on ajk , it would still
not be possible to allow for the presence of first order terms in any obvious way in the
above argument.
For simplicity, and because it is all that is needed in most applications, we will from
now on assume that all abstract and function spaces are real, that is, only real valued
functions and scalars are allowed.
Definition 17.2. If H is a Hilbert space and A : H × H → R, we say A is
• bilinear if it is linear in each argument separately,
• bounded if there exists a constant M such that A[u, v] ≤ M ||u|| ||v|| for all u, v ∈ H,
• coercive if there exists γ > 0 such that A[u, u] ≥ γ||u||2 for all u ∈ H.
Theorem 17.1. (Lax-Milgram) Assume that A is bilinear, bounded and coercive on the
Hilbert space H, and ψ belongs to the dual space H∗ . Then there exists a unique w ∈ H
such that
A[x, w] = ψ(x)
∀x ∈ H
(17.1.18)
Proof: Let
$$E = \{y \in H : \exists w \in H \text{ such that } A[x,w] = \langle x,y\rangle \ \forall x\in H\} \qquad (17.1.19)$$
If w is the element corresponding to some y ∈ E we then have
$$\gamma\|w\|^2 \le A[w,w] = \langle w,y\rangle \le \|w\|\,\|y\| \qquad (17.1.20)$$
so γ||w|| ≤ ||y||. In particular w is uniquely determined by y and E is closed. We claim that E = H. If not, then there exists z ∈ E^⊥, z ≠ 0. If we let φ(x) = A[x,z] then φ ∈ H*, so by the Riesz Representation Theorem 6.6 there exists u ∈ H such that φ(x) = ⟨x,u⟩, or A[x,z] = ⟨x,u⟩, for all x. Thus u ∈ E, but since z ∈ E^⊥ we find γ||z||² ≤ A[z,z] = ⟨z,u⟩ = 0, a contradiction.
Finally if ψ ∈ H∗ , using Theorem 6.6 again, we obtain y ∈ H such that ψ(x) = hx, yi
for every x, and since y ∈ E = H there exists w ∈ H such that ψ(x) = A[x, w], as
needed.
The element w is unique, since if A[x, w1 ] = A[x, w2 ] for all x ∈ H then choosing
x = w1 − w2 we get A[x, x] = 0 and consequently x = w1 − w2 = 0.
Since there is no need for any assumption of symmetry, we can use the Lax-Milgram
theorem to prove a more general result about the existence of weak solutions, under the
same assumptions we used to prove uniqueness above.
Theorem 17.2. Let Ω be a bounded domain in R^N. There exists ε > 0, depending only on Ω and the coercivity constant γ, such that if c(x) ≥ 0 in Ω and max_j ||b_j||_{L^∞(Ω)} < ε, then there exists a unique weak solution of the Dirichlet problem (17.1.8)-(17.1.9) for any f ∈ L²(Ω).
Proof: In the real Hilbert space H = H01 (Ω) let
$$A[u,v] = \int_\Omega [a_{jk}(x)u_{x_k}(x)v_{x_j}(x) + b_j(x)u_{x_j}(x)v(x) + c(x)u(x)v(x)]\,dx \qquad (17.1.21)$$
for u, v ∈ H_0^1(Ω). It is immediate that A is bilinear and bounded. By the ellipticity and other assumptions made on the coefficients we get
$$A[u,u] = \int_\Omega [a_{jk}(x)u_{x_k}(x)u_{x_j}(x) + b_j(x)u_{x_j}(x)u(x) + c(x)u(x)^2]\,dx \qquad (17.1.22)$$
$$\ge \theta\|u\|_{H_0^1(\Omega)}^2 - \epsilon\|u\|_{L^2(\Omega)}\|u\|_{H_0^1(\Omega)} \qquad (17.1.23)$$
$$\ge \gamma\|u\|_{H_0^1(\Omega)}^2 \qquad (17.1.24)$$
if γ = θ/2 and ε = γ/C, where C = C(Ω) is a constant for which the Poincaré inequality (14.4.10) is valid. Finally, since ψ(u) = ∫_Ω f u dx defines an element of H*, the conclusion follows from the Lax-Milgram theorem.
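A discrete sketch of this situation in one dimension (the coefficients and data below are made-up illustrations): assembling the finite-difference analogue of −(a u')' + b u' + c u = f with u(0) = u(1) = 0 gives a nonsymmetric matrix, because of the first-order term, yet in the spirit of the Lax-Milgram theorem the system is still uniquely solvable when b is small, c ≥ 0 and a is uniformly positive.

import numpy as np

n = 200
h = 1.0 / (n + 1)
am = 1 + 0.5 * np.sin(2 * np.pi * (np.arange(n + 1) + 0.5) * h)  # a at cell midpoints; a >= 1/2
b = 0.3                       # small constant first-order coefficient
c = 1.0                       # nonnegative zeroth-order coefficient
M = np.zeros((n, n))
for i in range(n):
    M[i, i] = (am[i] + am[i + 1]) / h**2 + c
    if i > 0:
        M[i, i - 1] = -am[i] / h**2 - b / (2 * h)
    if i < n - 1:
        M[i, i + 1] = -am[i + 1] / h**2 + b / (2 * h)
f = np.ones(n)
u = np.linalg.solve(M, f)     # unique discrete weak solution
print(np.allclose(M @ u, f), u.max())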
As another application of the Lax-Milgram theorem, we can establish the existence
of eigenvalues and eigenfunctions of more general elliptic operators. Let
Lu = −(ajk uxk )xj
(17.1.25)
Here we will assume the ellipticity condition (17.1.3), a_{jk} ∈ L^∞(Ω) and the symmetry property a_{jk} = a_{kj}. For f ∈ L²(Ω) let v = Sf be the unique weak solution v ∈ H_0^1(Ω) of
$$Lv = f \quad x\in\Omega, \qquad v = 0 \quad x\in\partial\Omega \qquad (17.1.26)$$
whose existence is guaranteed by Theorem 17.2, i.e. v ∈ H_0^1(Ω) and A[v,w] = ∫_Ω f w dx for all w ∈ H_0^1(Ω), where
$$A[v,w] = \int_\Omega a_{jk}v_{x_k}w_{x_j}\,dx \qquad (17.1.27)$$
Choosing w = v, using the ellipticity and the Poincaré inequality gives
$$\theta\|v\|_{H_0^1(\Omega)}^2 \le C\|f\|_{L^2(\Omega)}\|v\|_{H_0^1(\Omega)} \qquad (17.1.28)$$
Thus S : L2 (Ω) → H01 (Ω) is bounded and consequently compact as a linear operator on
L2 (Ω) by Rellich’s theorem. We claim next that S is self-adjoint on L2 (Ω). To see this,
suppose f, g ∈ L²(Ω), v = Sf and w = Sg. Then
$$\langle Sf,g\rangle = \langle v,g\rangle = \langle g,v\rangle = A[w,v] \qquad (17.1.29)$$
$$\langle f,Sg\rangle = \langle f,w\rangle = A[v,w] \qquad (17.1.30)$$
But A[w,v] = A[v,w] by our symmetry assumption, so it follows that S is self-adjoint.
It then follows from Theorem 13.10 that there exists a basis {u_n}_{n=1}^∞ of L²(Ω) consisting of eigenfunctions of S, corresponding to real eigenvalues {µ_n}_{n=1}^∞, µ_n → 0. The eigenvalues of S are all strictly positive, since Su = µu is equivalent to A[µu, µu] = ∫_Ω µu² dx. If λ_n = µ_n^{−1} then u_n is evidently a weak solution of
$$Lu_n = \lambda_n u_n \quad x\in\Omega, \qquad u_n = 0 \quad x\in\partial\Omega \qquad (17.1.32)$$
and we may assume the ordering
$$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \to +\infty \qquad (17.1.33)$$
The existence of an orthonormal basis of eigenfunctions now follows from Theorem 13.10.
To summarize, we have obtained the following generalization of Theorem 14.5.
Theorem 17.3. Assume that the ellipticity condition (17.1.3) holds, ajk = akj , and
ajk ∈ L∞ (Ω) for all j, k. Then the operator
T u = −(ajk (x)uxj )xk
D(T ) = {u ∈ H01 (Ω) : (ajk (x)uxj )xk ∈ L2 (Ω)}
(17.1.34)
has an infinite sequence of real eigenvalues of finite multiplicity,
0 < λ1 ≤ λ2 ≤ λ3 ≤ . . . λn → +∞
(17.1.35)
and corresponding eigenfunctions {ψ_n}_{n=1}^∞ which may be chosen as an orthonormal basis of L²(Ω).
As an immediate application, we can derive a formal series solution for the parabolic
problem with time independent coefficients
$$u_t - (a_{jk}(x)u_{x_j})_{x_k} = 0 \quad x\in\Omega,\ t>0 \qquad (17.1.36)$$
$$u(x,t) = 0 \quad x\in\partial\Omega,\ t>0 \qquad (17.1.37)$$
$$u(x,0) = f(x) \quad x\in\Omega \qquad (17.1.38)$$
Making the same assumptions on a_{jk} as in the Theorem, so that an orthonormal basis {ψ_n}_{n=1}^∞ of eigenfunctions exists in L²(Ω), we can obtain the solution in the form
$$u(x,t) = \sum_{n=1}^\infty \langle f,\psi_n\rangle e^{-\lambda_n t}\psi_n(x) \qquad (17.1.39)$$
in precisely the same way as was done to derive (14.4.35) for the heat equation. The
smallest eigenvalue λ1 again plays a distinguished role in determining the overall decay
rate for typical solutions.
17.2 More function spaces
In this section we will introduce some more useful function spaces. Recall that the Sobolev space W_0^{k,p}(Ω) is the closure of C_0^∞(Ω) in the norm of W^{k,p}(Ω).
Definition 17.3. We define the negative order Sobolev space W^{−k,p′}(Ω) to be the dual space of W_0^{k,p}(Ω). That is to say,
$$W^{-k,p'}(\Omega) = \{T \in \mathcal{D}'(\Omega) : \exists C \text{ such that } |T\phi| \le C\|\phi\|_{W^{k,p}(\Omega)} \ \forall\phi\in C_0^\infty(\Omega)\} \qquad (17.2.1)$$
We emphasize that we are defining the dual of W_0^{k,p}(Ω), not W^{k,p}(Ω). The notation suggests that T is the '−k'th derivative (i.e. a k-fold integral) of a function in L^{p′}(Ω), where p′ is the usual Hölder conjugate exponent, and we will make a more precise statement along these lines below. When p = 2 the alternative notation H^{−k}(Ω) is commonly used. The same notation was also used in the case Ω = R^N, in which case a definition using the Fourier transform was given. One can check that the definitions are equivalent. The norm of an element in W^{−k,p′}(Ω) is defined in the usual way for dual spaces, namely
$$\|T\|_{W^{-k,p'}(\Omega)} = \sup_{\phi\ne0}\frac{|T\phi|}{\|\phi\|_{W^{k,p}(\Omega)}} \qquad (17.2.2)$$
If φ ∈ W_0^{k,p}(Ω) and T ∈ W^{−k,p′}(Ω) then it is common to use the 'inner product-like' notation ⟨T,φ⟩ in place of Tφ, and one may refer to this value as the duality pairing of T and φ.
Example 17.1. If x₀ ∈ (a,b) and Tφ = φ(x₀), i.e. T = δ_{x₀}, then T ∈ H^{−1}(a,b). To see this, observe that for φ ∈ C_0^∞(a,b) we have obviously
$$|T\phi| = |\phi(x_0)| = \left|\int_a^{x_0}\phi'(x)\,dx\right| \le \sqrt{b-a}\,\|\phi'\|_{L^2(a,b)} \le \sqrt{b-a}\,\|\phi\|_{H^1(a,b)} \qquad (17.2.3)$$
It is essential that Ω = (a,b) is one dimensional here. If Ω ⊂ R^N and x₀ ∈ Ω it can be shown that δ_{x₀} ∈ W^{−k,p′}(Ω) if k > N/p, see Exercise ( ).
Let us next observe that in the proof of Theorem 17.2, the only property of f which we actually used was that ψ(u) = ∫_Ω f u dx defines an element in the dual space of H_0^1(Ω). Thus it should be possible to obtain similar conclusions if we replace the assumption f ∈ L²(Ω) by f ∈ H^{−1}(Ω). To make this precise, we will first make the obvious definition
that for T ∈ H^{−1}(Ω) and L a divergence form operator as in (17.1.2), with associated bilinear form (17.1.21), u is a weak solution of
$$Lu = T \quad x\in\Omega, \qquad u = 0 \quad x\in\partial\Omega \qquad (17.2.4)$$
provided
$$u \in H_0^1(\Omega), \qquad A[u,v] = Tv \quad \forall v\in H_0^1(\Omega) \qquad (17.2.5)$$
We then have
Theorem 17.4. There exists ε > 0 such that if c(x) ≥ 0 in Ω and max_j ||b_j||_{L^∞(Ω)} < ε, then there exists a unique weak solution of the Dirichlet problem (17.2.4) for any T ∈ H^{−1}(Ω).
Corollary 17.1. If T ∈ H^{−1}(Ω) and u ∈ H_0^1(Ω) is the corresponding weak solution of
$$-\Delta u = T \quad x\in\Omega, \qquad u = 0 \quad x\in\partial\Omega \qquad (17.2.6)$$
then
$$\|u\|_{H_0^1(\Omega)} = \|T\|_{H^{-1}(\Omega)} \qquad (17.2.7)$$
Proof: The definition of weak solution here is
$$\int_\Omega \nabla u\cdot\nabla v\,dx = Tv \quad \forall v\in H_0^1(\Omega) \qquad (17.2.8)$$
so it follows that
$$|Tv| \le \|u\|_{H_0^1(\Omega)}\|v\|_{H_0^1(\Omega)} \qquad (17.2.9)$$
and therefore ||T||_{H^{−1}(Ω)} ≤ ||u||_{H_0^1(Ω)}. But choosing v = u in the same identity gives
$$\|u\|_{H_0^1(\Omega)}^2 = Tu \le \|T\|_{H^{-1}(\Omega)}\|u\|_{H_0^1(\Omega)} \qquad (17.2.10)$$
and the conclusion follows.
In particular we see that the map T → u, which we will denote by (−∆)−1 , is an
isometric isomorphism of H −1 (Ω) onto H01 (Ω), thus is a specific example of the correspondence between a Hilbert space and its dual space, as is guaranteed by Theorem 6.6.
Using this map we can also give a convenient characterization of H −1 (Ω).
Corollary 17.2. T ∈ H^{−1}(Ω) if and only if there exist f₁, ..., f_N ∈ L²(Ω) such that
$$T = \sum_{j=1}^N \frac{\partial f_j}{\partial x_j} \qquad (17.2.11)$$
in the sense of distributions on Ω.
Proof: Given T ∈ H^{−1}(Ω) we let u = (−∆)^{−1}T ∈ H_0^1(Ω), in which case f_j := −u_{x_j} has the required properties. Conversely, if f₁, ..., f_N ∈ L²(Ω) are given and T is defined as a distribution by (17.2.11), it follows that
$$T\phi = -\sum_{j=1}^N \int_\Omega f_j\,\phi_{x_j}\,dx \qquad (17.2.12)$$
for any test function φ. Therefore
$$|T\phi| \le \sum_{j=1}^N \|f_j\|_{L^2(\Omega)}\|\phi_{x_j}\|_{L^2(\Omega)} \le C\|\phi\|_{H_0^1(\Omega)} \qquad (17.2.13)$$
which implies that T ∈ H^{−1}(Ω).
The spaces W^{−k,p′} for finite p ≠ 2 can be characterized in a similar way; see Theorem 3.10 of [1].
A second kind of space we introduce arises very naturally in cases when there is a distinguished variable, such as time t in the heat equation or wave equation. If X is any Banach space and [a,b] ⊂ R, we denote
$$C([a,b] : X) = \{f : [a,b] \to X : f \text{ is continuous on } [a,b]\} \qquad (17.2.14)$$
Continuity here is with respect to the obvious topologies, i.e. for any ε > 0 there exists δ > 0 such that ||f(t) − f(t₀)||_X ≤ ε if |t − t₀| < δ, t, t₀ ∈ [a,b]. One can readily verify that
$$\|f\|_{C([a,b]:X)} = \max_{a\le t\le b}\|f(t)\|_X \qquad (17.2.15)$$
defines a norm with respect to which C([a,b] : X) is a Banach space. The definition may be modified in the usual way for the case that [a,b] is replaced by an open, semi-open or infinite interval, although of course it need not then be a Banach space.
A related collection of spaces is defined by means of the norm
$$\|f\|_{L^p([a,b]:X)} := \left(\int_a^b \|f(t)\|_X^p\,dt\right)^{1/p} \qquad (17.2.16)$$
for 1 ≤ p < ∞. To avoid questions of measurability we will simply define L^p([a,b] : X) to be the closure of C([a,b] : X) with respect to this norm. See, for example, section 5.9.2 of [10] or section 39 of [36] for more details, and for the case p = ∞.
If X is a space of functions and u = u(x,t) is a function for which u(·,t) ∈ X for every (or almost every) t ∈ [a,b], then we will often regard u as being the map u : [a,b] → X defined by u(t)(x) = u(x,t). Thus u may be viewed as a 'curve' in the space X.
The following example illustrates a typical use of such spaces in a PDE problem.
According to the discussion of Example 14.4, if Ω is a bounded open set in RN and
f ∈ L2 (Ω) then the unique solution u = u(x, t) of
$$u_t - \Delta u = 0 \quad x\in\Omega,\ t>0 \qquad (17.2.17)$$
$$u(x,t) = 0 \quad x\in\partial\Omega,\ t>0 \qquad (17.2.18)$$
$$u(x,0) = f(x) \quad x\in\Omega \qquad (17.2.19)$$
is given by
$$u(x,t) = \sum_{n=1}^\infty c_n e^{-\lambda_n t}\psi_n(x) \qquad (17.2.20)$$
Here λ_n > 0 is the n'th Dirichlet eigenvalue of −∆ in Ω, {ψ_n}_{n=1}^∞ is a corresponding orthonormal eigenfunction basis of L²(Ω), and c_n = ⟨f, ψ_n⟩.
Theorem 17.5. For any T > 0 the solution u satisfies
$$u(\cdot,t) \in H_0^1(\Omega) \quad \forall t>0 \qquad (17.2.21)$$
$$u \in C([0,T] : L^2(\Omega)) \cap L^2([0,T] : H_0^1(\Omega)) \qquad (17.2.22)$$
Proof: Pick 0 ≤ t < t′ ≤ T and observe by Bessel's equality that
$$\|u(\cdot,t)-u(\cdot,t')\|_{L^2(\Omega)}^2 = \sum_{n=1}^\infty |c_n|^2\left(e^{-\lambda_n t}-e^{-\lambda_n t'}\right)^2 \le \sum_{n=1}^\infty |c_n|^2\left(1-e^{-\lambda_n(t'-t)}\right)^2 \qquad (17.2.23)$$
Since f ∈ L²(Ω) we know that {c_n} ∈ ℓ², so for given ε > 0 we may pick an integer N such that
$$\sum_{n=N+1}^\infty |c_n|^2 < \frac{\epsilon}{2} \qquad (17.2.24)$$
Next, pick M > 0 such that |c_n|² ≤ M for all n, and then δ > 0 such that
$$|e^{-\lambda_n\delta}-1|^2 \le \frac{\epsilon}{2NM} \qquad (17.2.25)$$
for n = 1, ..., N. If 0 ≤ t < t′ ≤ t + δ we then have
$$\|u(\cdot,t)-u(\cdot,t')\|_{L^2(\Omega)}^2 \le \sum_{n=1}^\infty |c_n|^2(1-e^{-\lambda_n\delta})^2 \qquad (17.2.26)$$
$$\le \sum_{n=1}^N |c_n|^2(1-e^{-\lambda_n\delta})^2 + \sum_{n=N+1}^\infty |c_n|^2 \qquad (17.2.27)$$
$$\le \sum_{n=1}^N M\,\frac{\epsilon}{2NM} + \frac{\epsilon}{2} = \epsilon \qquad (17.2.28)$$
This completes the proof that u ∈ C([0, T ] : L2 (Ω)).
To verify (17.2.21) we use the fact that
$$\|v\|_{H_0^1(\Omega)}^2 = \sum_{n=1}^\infty \lambda_n|\langle v,\psi_n\rangle|^2 \qquad (17.2.29)$$
for v ∈ H_0^1(Ω), see Exercise 13 of Chapter 14. Thus it is enough to show that
$$\sum_{n=1}^\infty \lambda_n|\langle u(\cdot,t),\psi_n\rangle|^2 = \sum_{n=1}^\infty \lambda_n|\langle f,\psi_n\rangle|^2e^{-2\lambda_n t} < \infty \qquad (17.2.30)$$
By means of elementary calculus it is easy to check that se^{−s} ≤ e^{−1} for s ≥ 0, hence
$$\lambda_n e^{-2\lambda_n t} \le \frac{1}{2et} \quad n=1,2,\ldots \qquad (17.2.31)$$
Thus
$$\sum_{n=1}^\infty \lambda_n|\langle f,\psi_n\rangle|^2e^{-2\lambda_n t} \le \frac{\sum_{n=1}^\infty |\langle f,\psi_n\rangle|^2}{2et} = \frac{\|f\|_{L^2(\Omega)}^2}{2et} < \infty \qquad (17.2.32)$$
as needed, as long as t > 0.
Finally,
$$\|u\|_{L^2([0,T]:H_0^1(\Omega))}^2 = \int_0^T \|u(\cdot,t)\|_{H_0^1(\Omega)}^2\,dt = \int_0^T \sum_{n=1}^\infty \lambda_n e^{-2\lambda_n t}|\langle f,\psi_n\rangle|^2\,dt \qquad (17.2.33)$$
$$= \sum_{n=1}^\infty \lambda_n|\langle f,\psi_n\rangle|^2\int_0^T e^{-2\lambda_n t}\,dt \qquad (17.2.34)$$
$$= \sum_{n=1}^\infty \frac{1-e^{-2\lambda_n T}}{2}|\langle f,\psi_n\rangle|^2 \le \|f\|_{L^2(\Omega)}^2 \qquad (17.2.35)$$
This completes the proof.
Note that the proof actually establishes the quantitative estimates
$$\|u(\cdot,t)\|_{H_0^1(\Omega)} \le \frac{\|f\|_{L^2(\Omega)}}{\sqrt{2et}} \quad \forall t>0 \qquad (17.2.36)$$
$$\|u\|_{L^2([0,T]:H_0^1(\Omega))} \le \|f\|_{L^2(\Omega)} \quad \forall T>0 \qquad (17.2.37)$$
The fact that u(·, t) ∈ H01 (Ω) for t > 0 even though f is only assumed to belong
to L2 (Ω) is sometimes referred to as a regularizing effect – the solution becomes instantaneously smoother than it starts out being. With more advanced methods one can
actually show that u is infinitely differentiable, with respect to both x and t for t > 0.
The conclusion u(·, t) ∈ H01 (Ω) for t > 0 also gives a precise meaning for the boundary
condition (17.2.18), and similarly u ∈ C([0, T ] : L2 (Ω)) provides a specific sense in which
the initial condition (17.2.19) holds, namely u(·, t) → f in L2 (Ω) as t → 0+.
The above discussion is very specific to the heat equation – on physical grounds alone
one may expect rather different behavior for solutions of the wave equation. See Exercise
( ).
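The regularizing effect is easy to observe numerically; here is our own sketch for the 1D heat equation on Ω = (0, π), where λ_n = n² and ψ_n = √(2/π) sin(nx). The initial datum f is a step function (in L² but not in H_0^1); the H_0^1-norm of u(·,t), computed from the series as in (17.2.29)-(17.2.30), is finite for each t > 0 and as t → 0+ stays below the bound ||f||_{L²}/√(2et) of (17.2.36).

import numpy as np

n = np.arange(1, 20001)
lam = n.astype(float) ** 2
# Fourier sine coefficients of f = 1 on (0, π/2), 0 elsewhere:
c = np.sqrt(2 / np.pi) * (1 - np.cos(n * np.pi / 2)) / n
normf = np.sqrt(np.pi / 2)                                # ||f||_{L²(0,π)}
for t in [1.0, 0.1, 0.01, 0.001]:
    h1 = np.sqrt(np.sum(lam * c**2 * np.exp(-2 * lam * t)))
    print(t, h1, normf / np.sqrt(2 * np.e * t))           # h1 stays below the bound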
17.3 Galerkin's method
For PDE problems of the form Lu = f , ut = Lu or utt = Lu, we can obtain very explicit
solution formulas involving the eigenvalues and eigenfunctions of a suitable operator T
corresponding to L, provided there exist such eigenvalues and eigenfunctions. But there
are situations of interest when this is not the case, for example if T is not symmetric.
Another case which may arise for time dependent problems is when the expression for
L, and hence the corresponding T , is itself t dependent. Even if the symmetry property
were assumed to hold for each fixed t, it would still not be possible to obtain solution
formulas by means of a suitable eigenvalue/eigenfunction series.
An alternative, but closely related method which will allow for such generalizations
is Galerkin’s method, which we will now discuss in the context of the abstract problem
$$u \in H, \qquad A[v,u] = \psi(v) \quad \forall v\in H \qquad (17.3.1)$$
under the same assumptions as in the Lax-Milgram theorem, Theorem 17.1. Recall this
means we assume that A is bilinear, bounded and coercive on the Hilbert space H and
ψ ∈ H∗ .
We start by choosing an arbitrary basis {v_k} of H, and look for an approximate solution (the Galerkin approximation) in the form
$$u_n = \sum_{k=1}^n c_kv_k \qquad (17.3.2)$$
If u_n happened to be the exact solution we would have A[v, u_n] = ψ(v) for any v ∈ H, and in particular
$$A[v_j,u_n] = \sum_{k=1}^n c_kA[v_j,v_k] = \psi(v_j) \quad \forall j \qquad (17.3.3)$$
However this amounts to infinitely many equations for c₁, ..., c_n, so it can't be satisfied in general. Instead we require it only for j = 1, ..., n, and so obtain an n × n linear system for these unknowns. The resulting system
$$\sum_{k=1}^n c_kA[v_j,v_k] = \psi(v_j) \quad j=1,\ldots,n \qquad (17.3.4)$$
is guaranteed nonsingular under our assumptions. Indeed, if
$$\sum_{k=1}^n d_kA[v_j,v_k] = 0 \quad j=1,\ldots,n \qquad (17.3.5)$$
and w = Σ_{k=1}^n d_kv_k, then
$$A[v_j,w] = 0 \quad j=1,\ldots,n \qquad (17.3.6)$$
and so multiplying the j'th equation by d_j and summing we get A[w,w] = 0. By the coercivity assumption it follows that w = 0, and so d₁ = ··· = d_n = 0 by the linear independence of the v_k's. If we set E_n = span{v₁, ..., v_n} then the previous discussion amounts to defining u_n to be the unique solution of
$$u_n \in E_n, \qquad A[v,u_n] = \psi(v) \quad \forall v\in E_n \qquad (17.3.7)$$
which may be obtained by solving the finite system (17.3.4). It now remains to study
the behavior of un as n → ∞.
The identity A[u_n, u_n] = ψ(u_n), obtained by choosing v = u_n in (17.3.7), together with the coercivity assumption, gives
$$\gamma\|u_n\|^2 \le \|\psi\|\,\|u_n\| \qquad (17.3.8)$$
Thus the sequence u_n is bounded in H and so has a weakly convergent subsequence u_{n_l} ⇀ u in H. We may now pass to the limit as n_l → ∞, taking into account the meaning of weak convergence, in the relation
$$A[v_k,u_{n_l}] = \psi(v_k) \qquad (17.3.9)$$
for any fixed k, obtaining A[v_k, u] = ψ(v_k) for every k. It then follows that (17.3.1) holds, because finite linear combinations of the v_k's are dense in H. Also, since u is the unique solution of (17.3.1), the entire sequence u_n must be weakly convergent to u.
We remark that in a situation like (17.1.14) in which, at least formally, A[v, u] =
hv, LuiH1 and ψ(v) = hf, viH1 for some second Hilbert space H1 ⊃ H, then the system
(17.3.4) amounts to the requirement that Lun − f = L(un − u) ∈ En⊥ , where the orthogonality is with respect to the H1 inner product. If also the embedding of H into H1 is
compact (think of H = H01 (Ω) and H1 = L2 (Ω)) then we also obtain immediately that
un → u strongly in H1 .
The Galerkin approximation can become a very powerful and effective computational technique if the basis {v_n} is chosen in a good way, and in particular much more specific and refined convergence results can be proved for special choices of the basis. For example, in the finite element method, approximations to solutions of PDE problems are obtained in the form (17.3.2) by solving (17.3.4), where the v_n's are chosen to be certain piecewise polynomial functions.
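Here is a concrete Galerkin computation (our own sketch, using the simple model problem −u'' = 2 on (0, π) with u(0) = u(π) = 0 and the sine basis v_k = √(2/π) sin(kx)). For this choice A[v_j, v_k] = ∫ v_j' v_k' dx = k² δ_{jk}, so the system (17.3.4) is diagonal; the exact solution is u(x) = x(π − x).

import numpy as np

n = 50
x = np.linspace(0, np.pi, 201)
u_n = np.zeros_like(x)
for k in range(1, n + 1):
    psi_k = 2 * np.sqrt(2 / np.pi) * (1 - np.cos(k * np.pi)) / k  # ψ(v_k) = ∫ 2 v_k dx
    c_k = psi_k / k**2                                            # solve (17.3.4)
    u_n += c_k * np.sqrt(2 / np.pi) * np.sin(k * x)
print(np.max(np.abs(u_n - x * (np.pi - x))))   # small Galerkin truncation error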
17.4 PDEs with variable coefficients
The Galerkin approach can also be adapted to the case of time dependent problems. We
illustrate by consideration of the parabolic problem
$$u_t = (a_{jk}(x,t)u_{x_k})_{x_j} + h(x,t) \quad x\in\Omega,\ 0<t<T \qquad (17.4.1)$$
$$u(x,t) = 0 \quad x\in\partial\Omega,\ 0<t<T \qquad (17.4.2)$$
$$u(x,0) = f(x) \quad x\in\Omega \qquad (17.4.3)$$
Here we assume that
• Ω is a bounded open set in RN .
• ajk ∈ L∞ (Ω × (0, T )) for all j, k and there exists a constant θ > 0 such that
ajk (x, t)ξj ξk ≥ θ|ξ|2 for all ξ ∈ RN , (x, t) ∈ Ω × (0, T ).
• h ∈ L2 ((0, T ) : L2 (Ω)) and f ∈ L2 (Ω).
By a weak solution of (17.4.1) we will mean a function u ∈ L^∞([0,T] : L²(Ω)) ∩ L²((0,T) : H_0^1(Ω)) such that
$$\int_\Omega u(x,t)\psi(x,t)\,dx - \int_0^t\!\!\int_\Omega u(x,s)\psi_t(x,s)\,dx\,ds \qquad (17.4.4)$$
$$\quad + \int_0^t\!\!\int_\Omega a_{jk}(x,s)u_{x_j}(x,s)\psi_{x_k}(x,s)\,dx\,ds \qquad (17.4.5)$$
$$= \int_0^t\!\!\int_\Omega h(x,s)\psi(x,s)\,dx\,ds + \int_\Omega f(x)\psi(x,0)\,dx \qquad (17.4.6)$$
for almost every t ∈ [0, T ] and every ψ ∈ C 1 ([0, T ] × Ω). We mention here that once
time dependence is allowed, several different reasonable definitions of weak solutions
become possible – see for example Section 7.1.1 of [10], or Section 9.2.d of [22] for other
definitions. Roughly speaking, if the class of test functions ψ is larger then proving
existence becomes harder and proving uniqueness becomes easier. For simple parabolic
problems of this type, however, all such definitions turn out in the end to be equivalent.
We now sketch how the Galerkin method may be adapted to this problem. Choose any basis {v_k}_{k=1}^∞ of H_0^1(Ω) which is orthonormal in L²(Ω), for example the Dirichlet eigenfunctions of −∆. We seek an approximate solution
$$u_n(x,t) = \sum_{k=1}^n c_k(t)v_k(x) \qquad (17.4.7)$$
17.5 Exercises
1. Verify that the definition of ellipticity (17.1.3) is consistent with the one given for
the special case (2.3.39), i.e. for such an equation the two definitions are equivalent.
2. Let λ1 be the smallest Dirichlet eigenvalue for −∆ in Ω, assume that c ∈ C(Ω) and
c(x) > λ1 in Ω. If f ∈ L2 (Ω) prove the existence of a solution of
$$-\Delta u + c(x)u = f \quad x\in\Omega, \qquad u = 0 \quad x\in\partial\Omega \qquad (17.5.1)$$
3. Let λ > 0 and define
$$A[u,v] = \int_\Omega a_{jk}(x)u_{x_k}(x)v_{x_j}(x)\,dx + \lambda\int_\Omega uv\,dx \qquad (17.5.2)$$
for u, v ∈ H¹(Ω). Assume the ellipticity property (17.1.3) and that a_{jk} ∈ L^∞(Ω). If f ∈ L²(Ω) show that there exists a unique solution of
$$u \in H^1(\Omega), \qquad A[u,v] = \int_\Omega f v\,dx \quad \forall v\in H^1(\Omega) \qquad (17.5.3)$$
Justify that u may be regarded as the weak solution of
$$-(a_{jk}u_{x_k})_{x_j} + \lambda u = f(x) \quad x\in\Omega, \qquad a_{jk}u_{x_k}n_j = 0 \quad x\in\partial\Omega \qquad (17.5.4)$$
The above boundary condition is said to be of conormal type.
4. If f ∈ L²(0,1) we say that u is a weak solution of the fourth order problem
$$u'''' + u = f \quad 0<x<1, \qquad u''(0) = u'''(0) = u''(1) = u'''(1) = 0$$
if u ∈ H²(0,1) and
$$\int_0^1 (u''(x)\zeta''(x) + u(x)\zeta(x))\,dx = \int_0^1 f(x)\zeta(x)\,dx \quad \text{for all } \zeta\in H^2(0,1)$$
Discuss why this is a reasonable definition and use the Lax-Milgram Theorem to prove that there exists a weak solution.
The following fact may be useful here: there exists a finite constant C such that
$$\|\phi'\|_{L^2(0,1)}^2 \le C\left(\|\phi\|_{L^2(0,1)}^2 + \|\phi''\|_{L^2(0,1)}^2\right) \quad \forall\phi\in H^2(0,1)$$
see for example Lemma 4.10 of [1] or equation 12.1 in Chapter I of [24].
5. Let Ω ⊂ RN be a bounded open set containing the origin. Show that δ ∈ H −1 (Ω)
if and only if N = 1.
6. Let f and g be in L2(0,1). Use the Lax-Milgram Theorem to prove there is a unique weak solution {u, v} ∈ H01(0,1) × H01(0,1) to

−u′′ + u + v′ = f
−v′′ + v + u′ = g,

where u(0) = v(0) = 0, u(1) = v(1) = 0. (Hint: Start by defining the bilinear form

A[(u,v),(φ,ψ)] = ∫_0^1 (u′φ′ + uφ + v′φ + v′ψ′ + vψ + u′ψ) dx

on H01(0,1) × H01(0,1).)
7. If X is a Banach space prove that C([a, b] : X) is also a Banach space with norm
defined in (17.2.15).
8. Let L be the divergence form elliptic operator Lv = −(ajk(x)vxj)xk in a bounded open set Ω ⊂ RN and let u be a solution of the parabolic problem

ut + Lu = 0   x ∈ Ω, t > 0
u(x,t) = 0   x ∈ ∂Ω, t > 0
u(x,0) = u0(x)   x ∈ Ω

Let φ be a C2 convex function on R with φ′(0) = 0.

a) Show that

∫_Ω φ(u(x,t)) dx ≤ ∫_Ω φ(u0(x)) dx

for any t > 0.

b) By choosing φ(s) = |s|^p and letting p → ∞, show that

||u(·,t)||L∞ ≤ ||u0||L∞
9. What is the dual space of Lp ((a, b) : Lq (Ω)) for p, q ∈ (1, ∞)?
Chapter 18
Appendices
18.1 Inequalities
In this section we state and prove a number of useful inequalities for numbers and functions.
A function φ on an interval (a, b) ⊂ R is convex if
φ(λx1 + (1 − λ)x2) ≤ λφ(x1) + (1 − λ)φ(x2)    (18.1.1)
for all x1, x2 ∈ (a,b) and λ ∈ [0,1]. A convex function is necessarily continuous (see Theorem 3.2 of [30]). If φ is such a function and c ∈ (a,b) then there always exists a supporting line for φ at c; more precisely, there exists m ∈ R such that if we let ψ(x) = m(x − c) + φ(c), then ψ(x) ≤ φ(x) for all x ∈ (a,b). If φ is differentiable at x = c then m = φ′(c); otherwise m may be defined in terms of a certain supremum (or infimum) of slopes. For example, φ(x) = |x| has the supporting lines ψ(x) = mx at c = 0 for any m ∈ [−1,1]. If in addition φ is twice differentiable, then φ is convex if and only if φ′′ ≥ 0.
Proposition 18.1. (Young's inequality) If a, b ≥ 0, 1 < p, q < ∞ and 1/p + 1/q = 1 then

ab ≤ a^p/p + b^q/q    (18.1.2)
Proof: If a or b is zero the conclusion is obvious; otherwise, since the exponential function is convex and 1/p + 1/q = 1, we get

ab = e^{log a + log b} = e^{(log a^p)/p + (log b^q)/q} ≤ e^{log a^p}/p + e^{log b^q}/q = a^p/p + b^q/q    (18.1.3)

In the special case that p = q = 2, (18.1.2) can be proved in an even more elementary way, just by rearranging the obvious inequality a^2 − 2ab + b^2 = (a − b)^2 ≥ 0.
Corollary 18.1. If a, b ≥ 0, 1 < p, q < ∞, 1/p + 1/q = 1, and ε > 0 there holds

ab ≤ ε a^p/p + b^q/(q ε^{q/p})    (18.1.4)

Proof: We can write

ab = (ε^{1/p} a)(ε^{−1/p} b)    (18.1.5)

and then apply Proposition 18.1.
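As a quick numerical sanity check (illustrative only; the exponent p = 3, sample ranges, and tolerance are arbitrary choices), both (18.1.2) and its ε-form (18.1.4) can be tested on random samples:

import numpy as np

# Spot-check Young's inequality (18.1.2) and the eps-version (18.1.4).
rng = np.random.default_rng(0)
p = 3.0
q = p / (p - 1.0)                                # Holder conjugate exponent
for _ in range(1000):
    a, b, eps = rng.uniform(0.01, 10.0, size=3)
    assert a * b <= a**p / p + b**q / q + 1e-9
    assert a * b <= eps * a**p / p + b**q / (q * eps**(q / p)) + 1e-9
print('Young and its eps-form hold on all samples')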
Proposition 18.2. (Hölder's inequality) If u, v are measurable functions on Ω ⊂ RN, 1 ≤ p, q ≤ ∞, and 1/p + 1/q = 1 then

||uv||L1(Ω) ≤ ||u||Lp(Ω) ||v||Lq(Ω)    (18.1.6)

Proof: We may assume that ||u||Lp(Ω), ||v||Lq(Ω) are finite and nonzero, since otherwise (18.1.6) is obvious. When p, q = 1 or ∞, proof of the inequality is elementary, so assume now that 1 < p, q < ∞. Using (18.1.4) with a = |u(x)| and b = |v(x)|, and integrating with respect to x over Ω gives

∫_Ω |u(x)v(x)| dx ≤ (ε/p) ∫_Ω |u(x)|^p dx + (1/(q ε^{q/p})) ∫_Ω |v(x)|^q dx    (18.1.7)

By choosing

ε = (∫_Ω |v(x)|^q dx)^{1/q} / (∫_Ω |u(x)|^p dx)^{(p−1)/p}    (18.1.8)

the right hand side of this inequality simplifies to

(∫_Ω |u(x)|^p dx)^{1/p} (∫_Ω |v(x)|^q dx)^{1/q} (1/p + 1/q) = ||u||Lp(Ω) ||v||Lq(Ω)    (18.1.9)

as needed.
The special case of Hölder's inequality when p = q = 2 is commonly called the Schwarz, or Cauchy-Schwarz inequality. Whenever p, q are related, as in Young's or Hölder's inequality, via 1/p + 1/q = 1, it is common to refer to q = p/(p − 1) =: p′ as the Hölder conjugate exponent of p.
Proposition 18.3. (Minkowski inequality) If u, v are measurable functions on Ω ⊂ RN and 1 ≤ p ≤ ∞, then

||u + v||Lp(Ω) ≤ ||u||Lp(Ω) + ||v||Lp(Ω)    (18.1.10)
Proof: We may assume that ||u||Lp(Ω), ||v||Lp(Ω) are finite and that ||u + v||Lp(Ω) ≠ 0, since otherwise there is nothing to prove. We have earlier noted in Section 3.1 that Lp(Ω) is a vector space, so u + v ∈ Lp(Ω) also. In the case 1 < p < ∞ we write

∫_Ω |u(x) + v(x)|^p dx ≤ ∫_Ω |u(x)||u(x) + v(x)|^{p−1} dx + ∫_Ω |v(x)||u(x) + v(x)|^{p−1} dx    (18.1.11)

By Hölder's inequality

∫_Ω |u(x)||u(x) + v(x)|^{p−1} dx ≤ (∫_Ω |u(x)|^p dx)^{1/p} (∫_Ω |u(x) + v(x)|^{(p−1)q} dx)^{1/q}    (18.1.12)

where 1/q + 1/p = 1, so that (p − 1)q = p. Estimating the second term on the right of (18.1.11) in the same way, we get

∫_Ω |u(x) + v(x)|^p dx ≤ (∫_Ω |u(x) + v(x)|^p dx)^{1/q} (||u||Lp(Ω) + ||v||Lp(Ω))    (18.1.13)

from which the conclusion (18.1.10) follows by obvious algebra. The two limiting cases p = 1, ∞ may be handled in a more elementary manner, and we leave these cases to the reader.
Both the Hölder and Minkowski inequalities have counterparts

Σ_k |ak bk| ≤ (Σ_k |ak|^p)^{1/p} (Σ_k |bk|^q)^{1/q}   1 < p, q < ∞,  1/p + 1/q = 1    (18.1.14)

(Σ_k |ak + bk|^p)^{1/p} ≤ (Σ_k |ak|^p)^{1/p} + (Σ_k |bk|^p)^{1/p}   1 ≤ p < ∞    (18.1.15)

(with suitable modification for the case of p or q being ∞) in which the integrals are replaced by finite or infinite sums of real or complex constants – the proofs are otherwise identical.¹

¹ Or from the point of view of abstract measure theory, the proofs are identical because a sum is just a certain kind of integral.
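As a quick numerical illustration (not part of the development), the discrete inequalities (18.1.14) and (18.1.15) can be spot-checked on random complex vectors; the vector length and the exponent p = 2.5 below are arbitrary choices:

import numpy as np

# Spot-check discrete Holder (18.1.14) and Minkowski (18.1.15).
rng = np.random.default_rng(1)
a = rng.normal(size=50) + 1j * rng.normal(size=50)
b = rng.normal(size=50) + 1j * rng.normal(size=50)
p = 2.5
q = p / (p - 1.0)

lhs_h = np.sum(np.abs(a * b))
rhs_h = np.sum(np.abs(a)**p)**(1/p) * np.sum(np.abs(b)**q)**(1/q)
assert lhs_h <= rhs_h                            # Holder

lhs_m = np.sum(np.abs(a + b)**p)**(1/p)
rhs_m = np.sum(np.abs(a)**p)**(1/p) + np.sum(np.abs(b)**p)**(1/p)
assert lhs_m <= rhs_m                            # Minkowski
print(f'Holder: {lhs_h:.3f} <= {rhs_h:.3f}; Minkowski: {lhs_m:.3f} <= {rhs_m:.3f}')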
18.2 Integration by parts
In the elementary integration by parts formula from calculus

∫_a^b u(x)v′(x) dx = −∫_a^b u′(x)v(x) dx + u(x)v(x)|_a^b    (18.2.1)

one integral is shown to be equal to another integral plus a 'boundary term', where in this case the boundary consists of the two points a, b, namely the boundary of the interval [a, b] over which the integration takes place. In higher dimensional situations we refer to any identity of this general character as being an integration by parts formula. There are a number of such formulas, all more or less equivalent to each other, which are frequently used in applied mathematics, and which we review here.
We will take as a known basic integration by parts formula the divergence theorem

∫_Ω ∇ · F(x) dx = ∫_∂Ω F · n(x) dS(x)    (18.2.2)

valid for a C1 vector field F and bounded open set Ω ⊂ RN, N ≥ 2, with C1 boundary ∂Ω, see for example Theorem 10.51 of [29]. Here n(x) is the unit outward normal to ∂Ω at x ∈ ∂Ω. If we now choose the vector field F to be zero except for the j'th component
Fj(x) = u(x)v(x), there results

∫_Ω u(x) ∂v/∂xj(x) dx = −∫_Ω ∂u/∂xj(x) v(x) dx + ∫_∂Ω u(x)v(x)nj(x) dS(x)    (18.2.3)

Replacing v by vj, the j'th component of a vector function v, and summing on j we next obtain

∫_Ω u(x)(∇ · v)(x) dx = −∫_Ω ∇u(x) · v(x) dx + ∫_∂Ω u(x)(v · n)(x) dS(x)    (18.2.4)
Now choosing v = ∇w, the gradient of some scalar function w, and noting that ∇ · (∇w) = ∆w, we find

∫_Ω u(x)∆w(x) dx = −∫_Ω (∇u · ∇w)(x) dx + ∫_∂Ω u(x) ∂w/∂n(x) dS(x)    (18.2.5)

where as usual ∂w/∂n = ∇w · n is the outer normal derivative of w on ∂Ω. Reversing the roles of u and w, and subtracting the resulting expressions, we may then obtain Green's identity

∫_Ω (u(x)∆w(x) − w(x)∆u(x)) dx = ∫_∂Ω (u(x) ∂w/∂n(x) − w(x) ∂u/∂n(x)) dS(x)    (18.2.6)
The special case of (18.2.6) when u(x) ≡ 1, namely

∫_Ω ∆w(x) dx = ∫_∂Ω ∂w/∂n(x) dS(x)    (18.2.7)

is also of interest.
Finally we mention that the classical Green's theorem in the plane,

∮_∂A (P dx + Q dy) = ∬_A (∂Q/∂x − ∂P/∂y) dxdy    (18.2.8)

is also a special case of (18.2.2), obtained by choosing the vector field in R2 to be F = ⟨Q, −P⟩.
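As a concrete check of (18.2.8) (an arbitrary illustrative instance: P = −y^2, Q = xy on the unit square A = [0,1] × [0,1]), both sides can be evaluated symbolically:

import sympy as sp

# Verify Green's theorem (18.2.8) for P = -y**2, Q = x*y on the unit square.
x, y, t = sp.symbols('x y t')
P, Q = -y**2, x*y

# Right side: double integral of Q_x - P_y over A
area = sp.integrate(sp.diff(Q, x) - sp.diff(P, y), (x, 0, 1), (y, 0, 1))

# Left side: line integral over the four edges, traversed counterclockwise
bottom = sp.integrate(P.subs({x: t, y: 0}), (t, 0, 1))   # y = 0, dy = 0
right  = sp.integrate(Q.subs({x: 1, y: t}), (t, 0, 1))   # x = 1, dx = 0
top    = sp.integrate(-P.subs({x: t, y: 1}), (t, 0, 1))  # y = 1, right to left
left   = sp.integrate(-Q.subs({x: 0, y: t}), (t, 0, 1))  # x = 0, top to bottom
assert area == bottom + right + top + left               # both equal 3/2
print(area)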
18.3 Spherical coordinates in RN
As in the case of R2 or R3, it is often convenient to work with spherical coordinates in RN. Here is how it works. We denote by

S^{N−1} = {x ∈ RN : |x| = 1}

the unit sphere² in RN. Every point x ∈ RN may be expressed as x = rω where r = |x| ≥ 0 and ω ∈ S^{N−1}, and the representation is unique except for x = 0. We may then parametrize S^{N−1} by N − 1 angle variables θ1, θ2, ..., θN−1, where

x1 = r sin θ1 sin θ2 · · · sin θN−2 sin θN−1
x2 = r sin θ1 sin θ2 · · · sin θN−2 cos θN−1
  ⋮
xN−1 = r sin θ1 cos θ2
xN = r cos θ1

Here 0 ≤ θj ≤ π for j = 1, ..., N − 2 and 0 ≤ θN−1 ≤ 2π.

² We try to use the terminology 'unit ball' for {x ∈ RN : |x| < 1}, but sometimes 'sphere' and 'ball' are used interchangeably. Also, S^N is sometimes used as notation for the unit sphere, but S^{N−1} is more common since it is a surface of dimension N − 1.
Thus (r, θ1, θ2, ..., θN−1) are spherical coordinates on RN. The Jacobian of the transformation (x1, ..., xN) → (r, θ1, θ2, ..., θN−1), needed for integration in spherical coordinates, is

r^{N−1} sin^{N−2} θ1 sin^{N−3} θ2 · · · sin θN−2

Integration of a function f over S^{N−1} may be expressed by

∫_{S^{N−1}} f(ω) dω = ∫_0^π · · · ∫_0^π ∫_0^{2π} f(θ1, ..., θN−1) dσ

where

dσ = sin^{N−2} θ1 sin^{N−3} θ2 · · · sin θN−2 dθN−1 · · · dθ1
Likewise integration of a function f over RN is

∫_{RN} f(x) dx = ∫_0^∞ ∫_{S^{N−1}} f(rω) r^{N−1} dω dr = ∫_0^∞ ∫_0^π · · · ∫_0^π ∫_0^{2π} f(r, θ1, ..., θN−1) r^{N−1} dσ dr

In particular if f is radially symmetric, f(x) = f(|x|), we get

∫_{RN} f(x) dx = Ω_{N−1} ∫_0^∞ f(r) r^{N−1} dr    (18.3.1)

where

Ω_{N−1} = ∫_{S^{N−1}} dω

is the surface area of S^{N−1}.
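As a numerical illustration of (18.3.1), take f(x) = e^{−|x|^2}, for which ∫_{RN} f(x) dx = π^{N/2}. The sketch below uses the standard formula Ω_{N−1} = 2π^{N/2}/Γ(N/2) for the surface area of S^{N−1} (a known fact, not derived in these notes) and checks the radial reduction for several N:

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

# Check (18.3.1) for f(x) = exp(-|x|^2): the radial reduction
# Omega_{N-1} * int_0^inf exp(-r^2) r^(N-1) dr should equal pi^(N/2).
for N in (1, 2, 3, 7):
    omega = 2.0 * np.pi ** (N / 2) / gamma(N / 2)        # area of S^{N-1}
    radial, _ = quad(lambda r: np.exp(-r ** 2) * r ** (N - 1), 0, np.inf)
    print(N, omega * radial, np.pi ** (N / 2))           # last two columns agree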
Chapter 19
Bibliography
[1] Robert A. Adams. Sobolev spaces. Academic Press [A subsidiary of Harcourt Brace
Jovanovich, Publishers], New York-London, 1975. Pure and Applied Mathematics,
Vol. 65.
[2] N. I. Akhiezer and I. M. Glazman. Theory of linear operators in Hilbert space. Dover
Publications, Inc., New York, 1993. Translated from the Russian and with a preface
by Merlynd Nestell, Reprint of the 1961 and 1963 translations, Two volumes bound
as one.
[3] Norman Bleistein. Mathematical methods for wave phenomena. Computer Science
and Applied Mathematics. Academic Press, Inc., Orlando, FL, 1984.
[4] F. Brauer and John A. Nohel. The Qualitative theory of ordinary differential equations, an introduction. W. A. Benjamin Inc., Menlo Park, CA, 1969.
[5] Haim Brezis. Functional analysis, Sobolev spaces and partial differential equations.
Universitext. Springer, New York, 2011.
[6] Lennart Carleson. On convergence and growth of partial sums of Fourier series. Acta
Math., 116:135–157, 1966.
[7] Earl A. Coddington and Norman Levinson. Theory of ordinary differential equations.
McGraw-Hill Book Company, Inc., New York-Toronto-London, 1955.
[8] R. Courant and D. Hilbert. Methods of mathematical physics. Vol. I. Interscience
Publishers, Inc., New York, N.Y., 1953.
[9] H. Dym and H. P. McKean. Fourier series and integrals. Academic Press, New
York-London, 1972. Probability and Mathematical Statistics, No. 14.
[10] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies
in Mathematics. American Mathematical Society, Providence, RI, second edition,
2010.
[11] Gerald B. Folland. Introduction to partial differential equations. Princeton University
Press, Princeton, NJ, second edition, 1995.
[12] K. O. Friedrichs. The identity of weak and strong extensions of differential operators.
Trans. Amer. Math. Soc., 55:132–151, 1944.
[13] P. R. Garabedian. Partial differential equations. John Wiley & Sons, Inc., New
York-London-Sydney, 1964.
[14] Gene H. Golub and Charles F. Van Loan. Matrix computations. Johns Hopkins
Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore,
MD, third edition, 1996.
[15] Harry Hochstadt. Integral equations. John Wiley & Sons, New York-London-Sydney,
1973. Pure and Applied Mathematics.
[16] Lars Hörmander. The analysis of linear partial differential operators. II, volume
256 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles
of Mathematical Sciences]. Springer-Verlag, Berlin, 1983. Distribution theory and
Fourier analysis.
[17] John K. Hunter and Bruno Nachtergaele. Applied analysis. World Scientific Publishing Co., Inc., River Edge, NJ, 2001.
[18] Fritz John. Partial differential equations, volume 1 of Applied Mathematical Sciences.
Springer-Verlag, New York, fourth edition, 1982.
[19] R. K. Juberg. Finite Hilbert transforms in Lp . Bull. Amer. Math. Soc., 78:435–438,
1972.
[20] Oliver Dimon Kellogg. Foundations of potential theory. Reprint from the first edition
of 1929. Die Grundlehren der Mathematischen Wissenschaften, Band 31. Springer-Verlag, Berlin-New York, 1967.
[21] Rainer Kress. Linear integral equations, volume 82 of Applied Mathematical Sciences.
Springer-Verlag, Berlin, 1989.
[22] R. C. McOwen. Partial Differential Equations: Methods and Applications, 2nd ed.
Prentice-Hall, Upper Saddle River, NJ, 2003.
[23] Norman G. Meyers and James Serrin. H = W . Proc. Nat. Acad. Sci. U.S.A.,
51:1055–1056, 1964.
[24] D. S. Mitrinović, J. E. Pečarić, and A. M. Fink. Inequalities involving functions and
their integrals and derivatives, volume 53 of Mathematics and its Applications (East
European Series). Kluwer Academic Publishers Group, Dordrecht, 1991.
[25] L. E. Payne. Improperly posed problems in partial differential equations. Society for
Industrial and Applied Mathematics, Philadelphia, Pa., 1975. Regional Conference
Series in Applied Mathematics, No. 22.
[26] Mark A. Pinsky. Introduction to Fourier analysis and wavelets. Brooks/Cole Series
in Advanced Mathematics. Brooks/Cole, Pacific Grove, CA, 2002.
[27] Jeffrey Rauch. Partial differential equations, volume 128 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1991.
[28] H. L. Royden and P. M. Fitzpatrick. Real analysis. Prentice Hall, New York, fourth
edition, 2010.
[29] Walter Rudin. Principles of mathematical analysis. McGraw-Hill Book Co., New
York-Auckland-Düsseldorf, third edition, 1976. International Series in Pure and
Applied Mathematics.
[30] Walter Rudin. Real and complex analysis. McGraw-Hill Book Co., New York, third
edition, 1987.
[31] Walter Rudin. Functional analysis. International Series in Pure and Applied Mathematics. McGraw-Hill, Inc., New York, second edition, 1991.
[32] Laurent Schwartz. Mathematics for the physical sciences. Hermann, Paris; Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills, Ont., 1966.
[33] Ivar Stakgold and Michael Holst. Green’s functions and boundary value problems.
Pure and Applied Mathematics (Hoboken). John Wiley & Sons, Inc., Hoboken, NJ,
third edition, 2011.
[34] Elias M. Stein. Singular integrals and differentiability properties of functions. Princeton Mathematical Series, No. 30. Princeton University Press, Princeton, N.J., 1970.
[35] Elias M. Stein and Guido Weiss. Introduction to Fourier analysis on Euclidean
spaces. Princeton University Press, Princeton, N.J., 1971. Princeton Mathematical
Series, No. 32.
[36] François Trèves. Basic linear partial differential equations. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, 1975. Pure
and Applied Mathematics, Vol. 62.
[37] Hans F. Weinberger. Variational methods for eigenvalue approximation. Society
for Industrial and Applied Mathematics, Philadelphia, Pa., 1974. Based on a series
of lectures presented at the NSF-CBMS Regional Conference on Approximation of
Eigenvalues of Differential Operators, Vanderbilt University, Nashville, Tenn., June
26–30, 1972, Conference Board of the Mathematical Sciences Regional Conference
Series in Applied Mathematics, No. 15.
[38] Richard L. Wheeden and Antoni Zygmund. Measure and integral. Marcel Dekker,
Inc., New York-Basel, 1977. An introduction to real analysis, Pure and Applied
Mathematics, Vol. 43.
[39] Robert M. Young. An introduction to nonharmonic Fourier series. Academic Press,
Inc., San Diego, CA, first edition, 2001.