LECTURE NOTES MATH 5210
WILL FELDMAN
Abstract. These notes are a work in progress. There are sections /
examples / proofs / exercises which have been started but not completed.
Please read in those places with caution. Please let me know if you find
any errors or typos especially in sections that appear complete.
1. A Tiny Bit of General Measure Theory
In this section we will introduce the basic ideas of general measure theory. We will not go into any details on proofs, but only define the relevant
structures and give several important examples.
Definition 1.1. Given a set X we write 2^X for the power set of X, the set
of all subsets of X. A collection Σ ⊂ 2^X is called a σ-algebra on X if:
(1) (Empty set and whole space) ∅, X ∈ Σ.
(2) (Complements) If E ∈ Σ then X \ E ∈ Σ.
(3) (Countable unions / intersections) If E1, E2, . . . ∈ Σ then ∪_{j=1}^∞ Ej ∈ Σ
and ∩_{j=1}^∞ Ej ∈ Σ.
Example 1.2. We have already seen two interesting examples of σ-algebras
on Rn , the Lebesgue measurable sets L(Rn ) and the Borel measurable sets
B(Rn ).
Definition 1.3. Given a σ-algebra Σ on a set X, a map µ : Σ → [0, +∞] is
called a positive measure on (X, Σ) if:
(1) (Empty set) µ(∅) = 0.
(2) (Monotonicity) For E ⊂ F in Σ,

    µ(E) ≤ µ(F ).

(3) (Countable additivity) If E1, E2, . . . ∈ Σ are mutually disjoint then

    µ(∪_{j=1}^∞ Ej) = Σ_{j=1}^∞ µ(Ej).
One can also naturally define signed measures, although some care is
needed due to possible cancellations between +∞ and −∞. We will not go
into more detail on this topic except to say that a natural class of signed
measures to consider consists of measures of the form
µ = µ+ − µ−
where µ+ , µ− are both positive measures on (X, Σ) with finite total mass
µ± (X) < +∞.
Now we have enough background to define a measure space.
Definition 1.4. A measure space is a triplet (X, Σ, µ) where X is a set, Σ
is a σ-algebra on X, and µ is a positive measure on (X, Σ). If we do not
want to specify the measure we can call (X, Σ) a measurable space.
Example 1.5. (Lebesgue measure) The fundamental example is of course
the one we have studied significantly already, (Rn , L(Rn ), m) the Lebesgue
measure on the Lebesgue measurable subsets of Rn .
We will see several more examples soon, but first let’s introduce the idea
of a measurable function and the idea of the integral.
Definition 1.6. Given a pair of measurable spaces (X, Σ) and (Y, Γ) a map
f : X → Y is called measurable if f^{−1}(E) ∈ Σ for all E ∈ Γ. Typically
we will take (Y, Γ) = (R, B(R)); this matches up with our earlier definition of
measurable functions when (X, Σ) = (Rn, L(Rn)).
Given a non-negative real valued measurable function f : X → [0, +∞]
we can define an integral exactly as we did for Lebesgue measure. We
define the integral first for measurable simple functions s : X → [0, +∞] of the form

    s(x) = Σ_{j=1}^J aj χ_{Ej}(x) with Ej ∈ Σ,

setting

    ∫_X s dµ = Σ_{j=1}^J aj µ(Ej).
Then we extend to general non-negative measurable functions by
    ∫_X f dµ = sup { ∫_X s dµ : s simple, s ≤ f }.
Of course we can also extend to absolutely integrable functions in the same
way we did in the Lebesgue theory.
Now we give a few more important examples of measures and their associated integrals.
Example 1.7. (Non-negative measurable functions) Another large class
of examples of positive measures is given by the non-negative measurable
functions on (Rn, L(Rn)). Given ρ : Rn → [0, +∞] measurable, define

    µ(E) = ∫_E ρ.

The integral in this case is

    ∫_{Rn} f dµ = ∫_{Rn} f ρ.
Example 1.8. (Delta mass) The famous δ-function of physics, the bane of
math undergraduates, actually does make sense mathematically, it is just
a measure not a function. Actually, maybe even more correctly, it is a
LECTURE NOTES MATH 5210
3
distribution, but this is an idea we will not explore in this course. We can
define δ measures on any measurable space (X, Σ). Given a point x ∈ X, define
the delta mass δx at x by, for E ∈ Σ,

    δx(E) = 1 if x ∈ E, and δx(E) = 0 if x ∉ E.
The integral in this case is

    ∫_X f dδx = f(x).
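A tiny computational sketch (my own illustration, using a simplified discrete setting): a discrete measure can be stored as a dictionary of point masses, and the delta mass integral really does return f(x):

```python
# A discrete measure as {point: mass}; the delta mass at x is {x: 1}.
def integrate(f, point_masses):
    # integral of f against a discrete measure: sum of f(p) * mass(p)
    return sum(f(p) * m for p, m in point_masses.items())

delta_3 = {3.0: 1.0}              # delta mass at x = 3
f = lambda t: t * t
print(integrate(f, delta_3))      # f(3) = 9.0
```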
Example 1.9. (Surface measure on a smooth graph) Consider a 2-dimensional
surface in R3 given by the graph of a smooth function f : R2 → R,

    S = {x ∈ R3 : x3 = f(x1, x2)}.
Recall from calculus that the surface area of S above a region Ω ⊂ R2 is
given by
    ∫_Ω √(1 + ‖∇f‖²) dx1 dx2.
Define the projection P : R3 → R2
P (x1 , x2 , x3 ) = (x1 , x2 ).
It turns out that the projection of any Borel measurable set in R3 is a
Lebesgue measurable set in R2, so we can define the surface measure

    µ(E) := ∫_{P(E∩S)} √(1 + ‖∇f‖²) dx1 dx2.
This fact about projections is highly non-obvious: the projection of a
Lebesgue measurable set in R3 onto R2 may not be measurable at all, and the
projection of a Borel set may not be Borel.
This example may provide some motivation to generalize the idea of
Lebesgue theory to start with “simple” sets (boxes) then define an outer
measure, and finally identify an appropriate class of measurable sets. In
fact there is such a general theory for constructing measures starting from
something called a pre-measure defined on a Boolean algebra - think of the
volume of boxes (pre-measure), and the class of sets which are finite unions
/ intersections / complements of boxes (Boolean algebra).
Example 1.10. (Cantor measure) This measure requires a bit more work
to construct so I am not including it in the notes at the moment; basically
it is a measure which is “uniform” on the middle thirds Cantor set.
Example 1.11. (Probability measures) In probability theory measurable
spaces are often denoted (Ω, F). A probability measure P on (Ω, F) is just a
positive measure with total mass P(Ω) = 1, and the triplet (Ω, F, P) is called
a probability space. Measurable sets are called events and measurable functions
are called random variables, usually denoted by capital letters like
X : Ω → R. The fundamental computations of probability theory involve,
at the most basic level, computing probabilities like
    P(X ∈ [a, b]) = P({ω ∈ Ω : X(ω) ∈ [a, b]})

for a real valued random variable X and an interval [a, b] ⊂ R.
2. Basics of Functional Analysis
In this lecture we will explore some of the types of spaces, namely normed
spaces and inner product spaces, with examples. Besides the metric space,
which we have already studied a lot in this class, the new types of spaces
that we will introduce here will all be vector spaces over R or C. In the
following diagram each row introduces additional structures
Vector Spaces
↑
Normed Spaces
↑
Inner Product Spaces
↑
Finite Dimensional Euclidean Space.
As we will see, every normed space also has a canonically associated metric
space structure, and so all of the bottom three types of space in the
chart above are also naturally metric spaces.
Remark 2.1. There is another important class of spaces which have more
structure than vector spaces, but less than normed spaces, known as topological
vector spaces. Within this class there are many further distinctions; one of the
most important is the notion of a Fréchet space (which is also a complete metric
space). This idea comes up rather naturally, for example if you want
to view the space C∞(Rn → R) as a metric space.
2.1. Normed Spaces.
Definition 2.2. Suppose that V is a vector space over R or C. We call a
map ‖ · ‖ : V → [0, ∞) a norm if all of the following properties hold:
(1) (Positivity) For all x ∈ V we have kxk ≥ 0 and kxk = 0 if and only
if x = 0.
(2) (Scaling) For any α ∈ R (or C) and x ∈ V we have kαxk = |α|kxk.
(3) (Triangle inequality) For any x, y ∈ V we have
kx + yk ≤ kxk + kyk.
Note that every norm naturally gives rise to a metric

    d(x, y) = ‖x − y‖.

The scaling property implies the symmetry of the metric, but it is also where
norms are genuinely more special than general metrics. Of course
LECTURE NOTES MATH 5210
5
for the scaling property to even make sense we need the underlying vector space
structure.
Many of the examples of metric spaces that we have seen in this class are
actually normed spaces. Let’s see several examples.
Example 2.3. Euclidean space Rn with the Euclidean norm
    ‖x‖2 = (Σ_{j=1}^n xj²)^{1/2},

or any of the ℓp norms

    ‖x‖p = (Σ_{j=1}^n |xj|^p)^{1/p}  and  ‖x‖∞ = max_{1≤j≤n} |xj|.
Of course we have mostly been interested in the case p ∈ {1, 2, ∞}. One can
also make natural analogues of these norms for the complex vector space
Cn .
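A quick Python sanity check (illustrative only; the test vectors are arbitrary) of these norms and the triangle inequality:

```python
def norm_p(x, p):
    # the p-norm on R^n, with p = float("inf") for the max norm
    if p == float("inf"):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x, y = [3.0, -4.0, 1.0], [1.0, 2.0, -2.0]
for p in (1, 2, float("inf")):
    lhs = norm_p([a + b for a, b in zip(x, y)], p)
    rhs = norm_p(x, p) + norm_p(y, p)
    assert lhs <= rhs + 1e-12          # triangle inequality
    print(p, round(lhs, 4), round(rhs, 4))
print(round(norm_p(x, 2), 4))          # sqrt(26) ≈ 5.099
```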
Example 2.4. (Sequence spaces) We have seen several times on the homework,
for 1 ≤ p < +∞, the space

    ℓp(N → R) = { x : N → R : Σ_{j=1}^∞ |xj|^p < +∞ }

and

    ℓ∞(N → R) = { x : N → R : sup_j |xj| < +∞ }.
We mostly studied the cases p ∈ {1, ∞}. It is some work to establish the
triangle inequality for 1 < p < ∞; we will handle this later in these notes, at
least for the case p = 2.
Example 2.5. (Continuous functions) Given a metric space (X, d) the space
of real valued continuous functions
    C(X → R) = {f : X → R | f continuous}

is naturally a normed space with

    ‖f‖sup = sup_{x∈X} |f(x)|.
Example 2.6. (C k spaces) If the domain X = [0, 1] we can also define a
natural normed space of k-times differentiable functions
    C^k([0, 1] → R) = {f : [0, 1] → R | f is k-times differentiable and f^(k) is continuous},

with the norm

    ‖f‖_{C^k} = Σ_{j=0}^k ‖f^(j)‖sup,
6
WILL FELDMAN
here f (0) , the zeroth derivative, is just the function itself. An analogue of
this space also makes sense on subdomains of Rn .
Example 2.7. (C∞ space) The space of smooth functions

    C∞([0, 1] → R) = ∩_{k=0}^∞ C^k([0, 1] → R)
is also naturally a metric space (it is complete so we call it a Fréchet space
actually), although it is not a normed space.
We say that a sequence fn ∈ C∞ converges to f ∈ C∞ if every derivative
fn^(k) converges uniformly to f^(k). You may find it interesting to think about
how to come up with a metric consistent with this notion of convergence.
Spoiler:

    d_{C∞}(f, g) = Σ_{k=0}^∞ 2^{−k} ‖f^(k) − g^(k)‖sup / (1 + ‖f^(k) − g^(k)‖sup).
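For polynomials, which have only finitely many nonzero derivatives, this metric can even be computed numerically. The following Python sketch (my own; the truncation level K and the grid used to approximate the sup norm are arbitrary choices) illustrates the formula:

```python
def deriv(c):
    # derivative of a polynomial given by coefficients [c0, c1, c2, ...]
    return [j * c[j] for j in range(1, len(c))]

def sup_norm(c, grid=200):
    # sup of |p(x)| over a uniform grid in [0, 1]
    return max(abs(sum(cj * (i / grid) ** j for j, cj in enumerate(c)))
               for i in range(grid + 1))

def d_Cinf(f, g, K=20):
    n = max(len(f), len(g))
    diff = [(f[j] if j < len(f) else 0.0) - (g[j] if j < len(g) else 0.0)
            for j in range(n)]
    total = 0.0
    for k in range(K):
        s = sup_norm(diff)
        total += 2.0 ** (-k) * s / (1.0 + s)
        diff = deriv(diff)   # next derivative; eventually the zero polynomial
    return total

f = [0.0, 1.0]         # f(x) = x
g = [0.0, 0.0, 1.0]    # g(x) = x^2
print(d_Cinf(f, g))    # 0.2 + 0.25 + 1/6: only three derivatives differ
```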
Example 2.8. (L1 space) The space of absolutely integrable functions on
Rn is (almost) another normed space.
L1 (Rn → R) = {f : Rn → R| f is absolutely integrable}
with the norm
    ‖f‖_{L1} = ∫_{Rn} |f(x)| dx.
It is fairly straightforward to check all the properties of the norm (do it!),
except that ‖f‖_{L1} = 0 only implies that f = 0 almost everywhere. However,
this is a problem which we have seen and dealt with before: we simply
mod out by the equivalence relation “f = g almost everywhere”.
There are also natural Lp analogues of this space, but we will postpone
discussion of those. For 1 < p < ∞ the triangle inequality is a bit trickier to
prove, and for p = ∞ we will have to do a bit of measure theoretic thinking
to even define the norm.
There is a special name when the metric space induced by the norm is
complete:
Definition 2.9. If (V, k · k) is a complete normed space then we call it a
Banach Space.
Exercise 2.10. Show that a normed vector space (V, ‖·‖) is complete if and
only if every series Σ_{j=1}^∞ vj which is absolutely summable, i.e.
Σ_{j=1}^∞ ‖vj‖ < +∞, converges in V.
All of the examples we gave above are Banach spaces, most of them infinite
dimensional. We will need to establish that L1 (Ω → R) is complete since
that is something we have not done yet, but we will save that for a bit later.
LECTURE NOTES MATH 5210
7
2.2. Inner product spaces. As we know from calculus, Euclidean space
actually has more structure than just a norm: it has the dot product, for
x, y ∈ Rn,

    x · y = Σ_{j=1}^n xj yj.
The notion of inner product is a generalization of the idea of dot product.
There is also a more significant difference between real and complex inner
products, so we will give the real case first and then the complex case.
Definition 2.11. Let V be a vector space over R. A map ⟨·, ·⟩ : V × V → R
is called a real inner product if it satisfies the following properties:
(1) (Symmetry) For all x, y ∈ V

    ⟨x, y⟩ = ⟨y, x⟩.

(2) (Linearity in second entry) For all x, y, z ∈ V and a ∈ R

    ⟨x, ay + z⟩ = a⟨x, y⟩ + ⟨x, z⟩.

(3) (Positivity) For all x ∈ V

    ⟨x, x⟩ ≥ 0 with equality if and only if x = 0.
Note that, by symmetry and linearity in the second entry, a real inner
product is actually bilinear, i.e. linear in both entries.
The definition of complex inner product is almost the same, replacing
only symmetry with conjugate symmetry.
Definition 2.12. Let V be a vector space over C. A map ⟨·, ·⟩ : V × V → C
is called a complex inner product if it satisfies the following properties:
(1) (Conjugate symmetry) For all x, y ∈ V

    ⟨x, y⟩ = \overline{⟨y, x⟩},

where z̄ = u − iv is the complex conjugate of a complex number
z = u + iv.
(2) (Linearity in second entry) For all x, y, z ∈ V and a ∈ C

    ⟨x, ay + z⟩ = a⟨x, y⟩ + ⟨x, z⟩.

(3) (Positivity) For all x ∈ V

    ⟨x, x⟩ ≥ 0 with equality if and only if x = 0;

note we are using the shorthand that ⟨x, x⟩ ≥ 0 says that ⟨x, x⟩ ∈ R
and is non-negative.
Note that, by conjugate symmetry and linearity in the second entry, a complex
inner product is conjugate linear in its first entry. Also note that different
sources may take the convention of linearity in the first entry and conjugate
linearity in the second entry.
8
WILL FELDMAN
One of the key notions in inner product spaces is orthogonality.
A pair of vectors x, y ∈ V are orthogonal if

    ⟨x, y⟩ = 0.

It turns out that every inner product gives rise to a norm, and hence also
to a metric,

    ‖x‖ = ⟨x, x⟩^{1/2}.
Most of the properties of the norm follow directly from the definition, but
we need to prove the triangle inequality, which we do below.
First, however, we prove another extremely important inequality called
the Cauchy-Schwarz inequality.
Lemma 2.13. If (V, ⟨·, ·⟩) is an inner product space then for any x, y ∈ V

    |⟨x, y⟩| ≤ ‖x‖‖y‖,

where ‖ · ‖ is the norm associated with the inner product as defined above.
Sketch. Let x, y ∈ V. By positivity

    0 ≤ ⟨x + λy, x + λy⟩

for all λ ∈ R (or C if we are in the complex case). Then expand out the
right hand side using bilinearity and use calculus to choose a good value of
λ which minimizes the right hand side (exercise).
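A numerical sanity check (my own illustration, using the Euclidean inner product on R^5 and pseudo-random vectors) of the Cauchy-Schwarz inequality:

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def nrm(x):
    return math.sqrt(dot(x, x))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # |<x, y>| <= ||x|| ||y||, up to floating point slack
    assert abs(dot(x, y)) <= nrm(x) * nrm(y) + 1e-12
print("Cauchy-Schwarz held on all samples")
```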
Now we can use Cauchy-Schwarz to prove the triangle inequality.
Lemma 2.14. If (V, ⟨·, ·⟩) is an inner product space then the associated
norm ‖x‖ = ⟨x, x⟩^{1/2} satisfies the triangle inequality.

Sketch. Expand out ‖x + y‖² = ⟨x + y, x + y⟩ and use Cauchy-Schwarz on
an appropriate term.
You might then ask if every norm actually comes from an inner product.
This turns out not to be the case.
Exercise 2.15. Show that any norm ‖·‖ which arises from an inner product
satisfies the parallelogram identity:

    ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².

Use this to check that one of the normed spaces defined in the previous section
is not an inner product space.
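As a hint at the computation involved, the following sketch (mine; the test vectors are arbitrary) checks the parallelogram identity numerically for the Euclidean norm on R² and shows it failing for the sup norm:

```python
def n2(v):
    return (v[0] ** 2 + v[1] ** 2) ** 0.5   # Euclidean norm on R^2

def ninf(v):
    return max(abs(v[0]), abs(v[1]))        # sup norm on R^2

x, y = (1.0, 0.0), (0.0, 1.0)

def lhs(n):
    return n((x[0] + y[0], x[1] + y[1])) ** 2 + n((x[0] - y[0], x[1] - y[1])) ** 2

def rhs(n):
    return 2 * n(x) ** 2 + 2 * n(y) ** 2

print(lhs(n2), rhs(n2))      # both ≈ 4: the identity holds for the 2-norm
print(lhs(ninf), rhs(ninf))  # 2.0 vs 4.0: the identity fails for the sup norm
```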
Now let us see some examples of inner product spaces.
Example 2.16. (Euclidean space) The dot product on Rn is an inner product
    x · y = Σ_{j=1}^n xj yj.
LECTURE NOTES MATH 5210
9
On Cn we have the complex inner product

    x̄ · y = Σ_{j=1}^n x̄j yj.
Example 2.17. (ℓ²-space) Among the normed sequence spaces we introduced
earlier, ℓ²(N → R) is also a real inner product space with

    ⟨x, y⟩_{ℓ²} = Σ_{j=1}^∞ xj yj.

There is also a natural complex analogue of this for ℓ²(N → C).
Example 2.18. (L²-space) We also have natural integral based inner products.
Given any measurable domain Ω ⊂ Rn,

    L²(Ω → R) = {f : Ω → R | f measurable and |f|² is integrable}

with the inner product

    ⟨f, g⟩_{L²} = ∫_Ω f(x)g(x) dx.
Of course we need to ask ourselves whether the product f g is absolutely
integrable, so that this inner product is actually defined. Formally we can
apply the Cauchy-Schwarz inequality to conclude

    ∫_Ω |f g| ≤ (∫_Ω |f|²)^{1/2} (∫_Ω |g|²)^{1/2} < +∞

since f and g are assumed to be square integrable in the definition of the
space. This formal argument is in fact rigorous, because the proof of
Cauchy-Schwarz goes through when applied to ⟨|f|, |g|⟩_{L²}, which is an
integral of a non-negative function and thus well-defined.
The complex version of the inner product on L²(Ω → C) is defined analogously,

    ⟨f, g⟩_{L²} = ∫_Ω f̄(x) g(x) dx.
2.3. Schauder Bases. As you may remember from your linear algebra
class, the notion of basis and dimension play a major role in the study
of vector spaces. Let’s recall the classical definition of a basis.
Definition 2.19 (Hamel Basis). For a given vector space V over R (or C),
a set of vectors B = {vj}_{j∈J} is called a basis for V if every vector v ∈ V can
be written as a unique finite linear combination of elements of B,

    v = Σ_{j∈J} aj vj for some aj ∈ R (or C),

with only finitely many aj nonzero. The dimension is then defined
dim(V) := #(J); this could be +∞ but it does not depend on the choice of basis.
Of course most of the vector spaces of interest in this class are
actually infinite dimensional. It turns out that Hamel bases, although they
do exist in any vector space, are not really the most natural concept of basis
in Banach / Hilbert spaces. Since we have a topology induced by the
norm / inner product, and we are working in a complete space, we can make
sense of convergent infinite linear combinations of basis elements. It is
therefore natural to define a notion of basis allowing for infinite linear
combinations.
Definition 2.20 (Schauder Basis). Suppose that (V, ‖ · ‖) is a Banach space
over R (or C). A countable set of vectors B = {vj}_{j=1}^∞ is called a Schauder
basis for V if for every v ∈ V there is a unique sequence of scalars (aj)_{j=1}^∞
such that

    ‖v − Σ_{j=1}^N aj vj‖ → 0 as N → ∞.

The basis is called normalized if ‖vj‖ = 1 for all j. If ‖ · ‖ is derived
from an inner product ⟨·, ·⟩, i.e. V is actually a Hilbert space, and the basis
has the property that

    ⟨vi, vj⟩ = 0 for i ≠ j,

then we call the basis orthogonal. If a basis is both orthogonal and normalized
it is called an orthonormal (Schauder) basis.
Exercise 2.21. Show that if (V, ‖ · ‖) has a Schauder basis then it is separable.
Exercise 2.22. Show that if (V, ‖ · ‖) is an infinite dimensional Banach
space then any Hamel basis must be uncountable. Hint: The span of any
finite subset of a Hamel basis B is a finite dimensional vector subspace of V.
Show that the complement of this subspace is open and dense. Use the Baire
Category theorem to get a contradiction in the case that B is countable.
Let’s see some examples.
Example 2.23. (ℓp standard basis) In the sequence spaces

    ℓp(N → R) = {x : N → R : ‖x‖_{ℓp} < +∞}

for 1 ≤ p < ∞ there is a natural Schauder basis which is just the obvious
extension of the canonical basis for finite dimensional Euclidean spaces,
B = {ej}_{j=1}^∞ with

    ej(n) = 1 if n = j, and ej(n) = 0 otherwise.

It is straightforward to check that this basis is normalized, and in the case
of ℓ²(N → R) (which is a Hilbert space) the basis is orthonormal.
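A finite-truncation illustration (my own, with arbitrary cut-offs) of why the partial sums along this basis converge in ℓp but not in ℓ∞:

```python
def tail_p(x, N, p):
    # ℓp norm of x with its first N coordinates removed,
    # i.e. the distance from x to its N-th partial sum along the basis
    return sum(abs(t) ** p for t in x[N:]) ** (1.0 / p)

def tail_sup(x, N):
    # same quantity in the sup norm
    return max([abs(t) for t in x[N:]] or [0.0])

x = [1.0 / (j + 1) for j in range(10000)]     # x_j = 1/j, lies in ℓ2
print(tail_p(x, 10, 2), tail_p(x, 1000, 2))   # shrinking toward 0

const = [1.0] * 10000                         # constant sequence, in ℓ∞
print(tail_sup(const, 10), tail_sup(const, 1000))  # stuck at 1.0
```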
Exercise 2.24. Check that the Euclidean basis {ej }j∈N is a Schauder basis
for `p (N → R) for 1 ≤ p < +∞. Why is it not a Schauder basis in the case
p = +∞?
Example 2.25. (Schauder basis of C([0, 1])?) Recall the space of continuous
functions on the unit interval C([0, 1] → R) with the uniform norm

    ‖f‖sup = sup_{x∈[0,1]} |f(x)|.

We already know from the Weierstrass theorem that the set of polynomials
is dense. This allowed us to prove that C([0, 1] → R) is separable. One
might think that the monomials, for n ∈ {0} ∪ N,

    mn(x) = x^n,

could form a Schauder basis because, as we know from Weierstrass, the
finite linear combinations of the monomials are dense. However any function
which admits a representation f(x) = Σ_{n=0}^∞ an x^n, with the series
converging uniformly on [0, 1], is actually infinitely differentiable at least
on [0, 1). I won’t go into the details of the proof here, but you can read Tao
Chapter 4.1 and 4.2 to find the relevant results.
In fact C([0, 1] → R) does indeed have a Schauder basis, but we will need
to come back with a different idea.
2.4. Orthonormal Schauder Bases and Projections. Now let me make
a few additional notes about orthonormal bases in Hilbert spaces, because
these have some additional nice properties.
Definition 2.26. If (H, ⟨·, ·⟩) is a Hilbert space and G is a closed subspace,
we call a bounded linear map PG : H → H the orthogonal projection onto G if
PG x ∈ G for all x ∈ H and

    x − PG x ∈ G⊥.
Lemma 2.27 (Existence of Orthogonal Projections). If (H, ⟨·, ·⟩) is a
Hilbert space and G is a closed subspace, there exists a bounded linear
PG : H → H, called the orthogonal projection onto G, such that PG x ∈ G for
all x ∈ H,

    x − PG x ∈ G⊥ and ‖x − PG x‖ = min_{y∈G} ‖x − y‖.
Proof. The construction of the operator actually goes through the minimization
principle. Let yn be a minimizing sequence for min_{y∈G} ‖x − y‖, i.e.
yn ∈ G and

    ‖yn − x‖ → α := min_{y∈G} ‖x − y‖.

Then, using the parallelogram law,

    ‖yn − ym‖² = ‖(yn − x) − (ym − x)‖² = 2‖yn − x‖² + 2‖ym − x‖² − ‖yn + ym − 2x‖².

Note that

    ‖yn + ym − 2x‖² = 4‖½(yn + ym) − x‖² ≥ 4α²

because ½(yn + ym) ∈ G as well and so has distance to x at least the
minimum value α. Thus

    ‖yn − ym‖² ≤ 2‖yn − x‖² + 2‖ym − x‖² − 4α² → 0 as n, m → ∞.
12
WILL FELDMAN
Thus (yn) is Cauchy; since G is a closed subset of a complete space the
sequence converges in G, and we define PG x to be its limit. By continuity
of the norm with respect to convergence in the norm, ‖x − PG x‖ = α.
Now for the orthogonality property. Let y ∈ G; then by the minimality
property again

    0 = d/dt|_{t=0} ‖x − PG x + ty‖² = d/dt|_{t=0} (‖x − PG x‖² + 2t⟨x − PG x, y⟩ + t²‖y‖²) = 2⟨x − PG x, y⟩,

which is the desired result. Actually note that this argument also shows,
conversely, that if x − PG x ∈ G⊥ then PG x is the point in G closest to x.
Finally we check that PG is a bounded linear operator. Of course

    ‖x − PG x‖ ≤ ‖x‖

since 0 ∈ G was a valid test point for the minimization, so by the triangle
inequality

    ‖PG x‖ ≤ 2‖x‖.

We also need to check linearity. I will leave this as an exercise; use the fact
we just noted that x − PG x ∈ G⊥ implies that PG x minimizes min_{y∈G} ‖x − y‖.
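A concrete finite-dimensional sketch (mine; the subspace and test vector are arbitrary) of the two characterizations in the lemma, projecting a vector in R³ onto the x1x2-plane and testing minimality against random competitors:

```python
import math
import random

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

v1, v2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)   # orthonormal basis of G = x1x2-plane
x = (2.0, -1.0, 3.0)

# P_G x = <v1, x> v1 + <v2, x> v2
PGx = tuple(dot(v1, x) * a + dot(v2, x) * b for a, b in zip(v1, v2))
r = tuple(xi - pi for xi, pi in zip(x, PGx))  # residual x - P_G x

# orthogonality: x - P_G x is perpendicular to G
assert abs(dot(r, v1)) < 1e-12 and abs(dot(r, v2)) < 1e-12

# minimality: no sampled y in G is closer to x than P_G x
dist = math.sqrt(dot(r, r))
random.seed(1)
for _ in range(100):
    y = (random.uniform(-5, 5), random.uniform(-5, 5), 0.0)
    d = tuple(xi - yi for xi, yi in zip(x, y))
    assert math.sqrt(dot(d, d)) >= dist - 1e-12

print(PGx)  # (2.0, -1.0, 0.0)
```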
Lemma 2.28. If (H, ⟨·, ·⟩) is a Hilbert space, G is a closed subspace and
B = {vj}_{j=1}^∞ is an orthonormal basis for G, then, for any x ∈ H,

    PG x = Σ_{j=1}^∞ ⟨vj, x⟩ vj,

with the sum converging in the norm topology. In particular if G = H then
x = Σ_{j=1}^∞ ⟨vj, x⟩ vj for all x ∈ H.
Proof. Since B is a basis for G there are some coefficients αj so that

    PG x = Σ_{j=1}^∞ αj vj

with the sum converging in the norm topology. Then we compute the inner
product

    ⟨vi, PG x⟩ = ⟨vi, Σ_{j=1}^∞ αj vj⟩ = Σ_{j=1}^∞ αj ⟨vi, vj⟩ = αi.
Interchanging the sum and inner product is justified because ⟨·, y⟩ is a norm
continuous (Lipschitz in fact) function on H for any y ∈ H since

    |⟨a, y⟩ − ⟨b, y⟩| = |⟨a − b, y⟩| ≤ ‖a − b‖‖y‖

by Cauchy-Schwarz.
I forgot the following argument in class, so if you were confused by this
proof in lecture look here: now we are almost done, we just need to see that

    αi = ⟨vi, PG x⟩ = ⟨vi, x⟩ + ⟨vi, PG x − x⟩ = ⟨vi, x⟩
LECTURE NOTES MATH 5210
13
since x−PG x is orthogonal to every vector in G, in particular it is orthogonal
to vi , so we have just added 0 to the RHS in the second equality.
Corollary 2.29 (Criterion for being a Schauder basis). If (H, ⟨·, ·⟩) is a
Hilbert space and B = {vj}_{j=1}^∞ is an orthonormal system in H, then B is a
Schauder basis for

    G := the closure of span({vj}_{j=1}^∞),

and if, moreover, G⊥ = {0} then B is a Schauder basis for H. In other
words if

    ⟨x, vj⟩ = 0 for all j =⇒ x = 0,

then B is a Schauder basis for H.
Exercise 2.30. Prove Corollary 2.29. (It is really a corollary of the proof of
Lemma 2.28 so you will need to go into that proof a bit and change things
to match the new assumptions here).
Theorem 2.31 (Plancherel/Parseval identity). If (H, ⟨·, ·⟩) is a Hilbert
space and B = {vj}_{j=1}^∞ is an orthonormal basis for H, then, for any
x, y ∈ H,

    ⟨x, y⟩_H = Σ_{j=1}^∞ \overline{⟨vj, x⟩} ⟨vj, y⟩

(in the real case the conjugate can be dropped), and in particular

    ‖x‖²_H = Σ_{j=1}^∞ |⟨vj, x⟩|².
Proof. Formally we just expand out using the inner product definition

    ⟨x, y⟩ = ⟨Σ_j ⟨vj, x⟩vj, Σ_i ⟨vi, y⟩vi⟩ = Σ_{i,j} \overline{⟨vj, x⟩} ⟨vi, y⟩ ⟨vj, vi⟩.

Finally we use orthonormality so that ⟨vj, vi⟩ = 0 unless i = j, and in that
case it is 1.
The key analysis point here is that the infinite sum converges in the norm
and ⟨·, ·⟩ is norm continuous in both entries due to the Cauchy-Schwarz
inequality. I recommend making this argument carefully on your own with
ε and N.
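A finite-dimensional numerical check (my own illustration) of the Parseval identity, using the orthonormal basis of R² obtained by rotating the standard basis by 45 degrees:

```python
import math

c = math.sqrt(0.5)
v = [(c, c), (-c, c)]          # orthonormal basis of R^2 (45 degree rotation)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

x, y = (3.0, 1.0), (-2.0, 5.0)

# <x, y> = sum over j of <v_j, x><v_j, y>  (real case, no conjugates needed)
lhs = dot(x, y)
rhs = sum(dot(vj, x) * dot(vj, y) for vj in v)
assert abs(lhs - rhs) < 1e-12

# ||x||^2 = sum over j of |<v_j, x>|^2
assert abs(dot(x, x) - sum(dot(vj, x) ** 2 for vj in v)) < 1e-12
print(lhs)  # -1.0
```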
Example 2.32. (Orthogonal polynomials) In our first example we use the
Weierstrass theorem to construct an orthonormal basis for L²([0, 1] → R) of
orthogonal polynomials. The construction is really just the classical
Gram-Schmidt procedure applied to the monomials. We start out with

    P0(x) = 1
and then we define iteratively

    Qn(x) = x^n − Σ_{j=0}^{n−1} (∫_0^1 x^n Pj(x) dx) Pj(x),

i.e. this is exactly the orthogonal projection of x^n onto the orthogonal
complement of the span of P0, . . . , Pn−1. Then we normalize

    Pn(x) = Qn(x) / (∫_0^1 Qn(x)² dx)^{1/2}.
By the construction we have an orthonormal system; now we need to check
that it is a basis. It is worth noting that span(P0, . . . , Pn) is exactly the
space of polynomials of degree at most n, and so span(P0, P1, . . . ) is the
space of polynomials on [0, 1].
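The construction above can be carried out numerically. The following Python sketch (mine; it represents polynomials by coefficient lists and uses the exact moments ∫_0^1 x^m dx = 1/(m+1) for the inner products) reproduces, up to sign, the shifted Legendre polynomials:

```python
def inner(p, q):
    # L^2([0,1]) inner product of polynomials given as coefficient lists,
    # using the exact moments: integral of x^m over [0,1] is 1/(m+1)
    return sum(a * b / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def gram_schmidt(num):
    basis = []
    for n in range(num):
        mono = [0.0] * n + [1.0]          # the monomial x^n
        q = mono[:]
        for p in basis:                   # Q_n = x^n - sum <x^n, P_j> P_j
            c = inner(mono, p)
            for j in range(len(p)):
                q[j] -= c * p[j]
        nrm = inner(q, q) ** 0.5          # normalize: P_n = Q_n / ||Q_n||
        basis.append([cj / nrm for cj in q])
    return basis

def evalp(p, x):
    return sum(c * x ** j for j, c in enumerate(p))

P = gram_schmidt(3)
print(round(abs(inner(P[0], P[1])), 9))   # 0.0 (orthogonal)
print(round(evalp(P[1], 1.0), 6))         # ≈ sqrt(3): P_1(x) = sqrt(3)(2x - 1)
```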
For this we will use the criterion of Corollary 2.29. Suppose there is
f ∈ L²([0, 1] → R) such that

    ∫_0^1 f(x) Pn(x) dx = 0 for all 0 ≤ n < ∞.

We aim to show that f = 0 almost everywhere; if we can do this then we
will know that {Pn} is an orthonormal Schauder basis of L²([0, 1] → R).
Suppose otherwise, that ‖f‖_{L²([0,1])} > 0.
As shown on your homework (more or less HW6 problem 1) the continuous
functions C([0, 1] → R) are dense in L²([0, 1] → R) in the L² norm. So we
can find g ∈ C([0, 1] → R) with

    ‖f − g‖_{L²([0,1])} ≤ ¼ ‖f‖_{L²([0,1])}.

By the Weierstrass theorem there is a polynomial Q on [0, 1] such that

    ‖g − Q‖sup ≤ ¼ ‖f‖_{L²([0,1])}
and so

    ‖g − Q‖_{L²([0,1])} = (∫_0^1 |g(x) − Q(x)|² dx)^{1/2} ≤ ¼ ‖f‖_{L²([0,1])}.
Then also Q is a good approximation of f in L², by the triangle inequality:

    ‖Q − f‖_{L²([0,1])} ≤ ‖Q − g‖_{L²([0,1])} + ‖g − f‖_{L²([0,1])} ≤ ½ ‖f‖_{L²([0,1])}.
Since Q is a polynomial it can be written as a finite linear combination of
the Pj,

    Q(x) = Σ_{j=0}^n αj Pj(x).
LECTURE NOTES MATH 5210
15
Then

    0 = ⟨f, Q⟩_{L²([0,1])}
      = ⟨f, f + Q − f⟩_{L²([0,1])}
      = ‖f‖²_{L²([0,1])} + ⟨f, Q − f⟩_{L²([0,1])}
      ≥ ‖f‖²_{L²([0,1])} − ‖f‖_{L²([0,1])} ‖Q − f‖_{L²([0,1])}
      ≥ ½ ‖f‖²_{L²([0,1])} > 0.

This is a contradiction of ‖f‖_{L²([0,1])} > 0.
Example 2.33. (Fourier basis of L2 (T)) We will devote the next section to
this topic.
Example 2.34. (Haar basis of L²(R)) The Haar basis is built from indicator
functions of dyadic intervals at decreasing scales. At the coarsest scale we
simply take the indicators of the dyadic intervals, but at all subsequent scales
we take a certain linear combination in order to maintain orthogonality with
the previous scales.
We define for k ∈ Z

    ϕ_{0,k}(x) = χ_{[k,k+1]}(x).

Then for n ≥ 1

    ϕ_{n,k}(x) = 2^{(n−1)/2} (χ_{2^{−n}[2k,2k+1]}(x) − χ_{2^{−n}[2k+1,2k+2]}(x)).
Try drawing the graph of ϕ0,0 , ϕ1,0 and ϕ2,0 to get an idea of what is going
on.
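The following Python sketch (my own; the grid is an arbitrary dyadic choice, which makes the midpoint rule exact for these piecewise constant functions) checks orthonormality for a few pairs:

```python
def haar(n, k, x):
    # the Haar functions phi_{n,k} defined above
    if n == 0:
        return 1.0 if k <= x < k + 1 else 0.0
    a = 2.0 ** (-n)
    amp = 2.0 ** ((n - 1) / 2)
    if a * 2 * k <= x < a * (2 * k + 1):
        return amp
    if a * (2 * k + 1) <= x < a * (2 * k + 2):
        return -amp
    return 0.0

def inner(f, g, lo=0.0, hi=2.0, m=4096):
    # midpoint rule on a dyadic grid; sample points never hit breakpoints
    h = (hi - lo) / m
    return sum(f(lo + (i + 0.5) * h) * g(lo + (i + 0.5) * h)
               for i in range(m)) * h

for (n1, k1), (n2, k2) in [((0, 0), (0, 0)), ((1, 0), (1, 0)), ((2, 0), (2, 0)),
                           ((0, 0), (1, 0)), ((1, 0), (2, 0)), ((2, 0), (2, 1))]:
    val = inner(lambda x: haar(n1, k1, x), lambda x: haar(n2, k2, x))
    print((n1, k1), (n2, k2), round(val, 6))  # ≈ 1 on the diagonal, ≈ 0 otherwise
```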
The key point is that the ϕ_{n,k} are normalized. They are orthogonal for fixed
n because the supports are disjoint as k varies. If for ϕ_{n1,k1} and ϕ_{n2,k2}
the scales n1 ≠ n2, say n2 > n1, then ϕ_{n1,k1} is constant on the
support of ϕ_{n2,k2}, with some constant value c, and since n2 > n1 ≥ 0 we have

    0 = c ∫ ϕ_{n2,k2} = ⟨ϕ_{n2,k2}, ϕ_{n1,k1}⟩.
Thus we at least have an orthonormal set of vectors in L²(R). Now we
would like to show that this is indeed a Schauder basis. First let us note
that the indicator functions of all dyadic intervals are in the span of the
(ϕ_{n,k}). I will leave this as an exercise; it is a simple inductive argument,
just start out by proving that you can write χ_{[0,1/2]} as a linear combination
of ϕ_{0,0} and ϕ_{1,0} and this will give you the idea.
Let G be the closure of the set of finite linear combinations of the {ϕ_{n,k}}.
If we can show that G⊥ = {0} then we will have G = H. Suppose that
f ∈ G⊥, that is,

    ⟨f, ϕ_{n,k}⟩ = 0 for all n ≥ 0 and k ∈ Z.
16
WILL FELDMAN
Then f has integral zero on every dyadic interval:

    ∫_{2^{−n}[k,k+1]} f = 0 for all n ≥ 0, k ∈ Z.
From this we can conclude that f has integral zero on every interval,

    ∫_{(a,b)} f = 0 for all (a, b) ⊂ R,
by (almost) covering the interval (a, b) by small dyadic intervals. Since every
open set in R is a disjoint union of open intervals we also get

    ∫_U f = 0 for all U ⊂ R open with finite measure.
We need to also assume finite measure because |f|² is integrable, so we know
that f is absolutely integrable on sets of finite measure (see Homework 6).
Then for every measurable set E with finite measure there is a monotone
decreasing sequence of open finite measure sets E ⊂ Un such that

    m(Un \ E) ↘ 0 as n → ∞.
Then fχ_{Un} → fχ_E pointwise almost everywhere, and the functions are
dominated by |f|χ_{U1}, so by the dominated convergence theorem

    ∫_E f = lim_{n→∞} ∫_{Un} f = 0.
Finally we take Er = {f > 0} ∩ [−r, r], which is a finite measure set, and

    0 = ∫_{Er} f = ∫_{Er} |f|.

So f = 0 almost everywhere on Er for all r, and so f = 0 almost everywhere
on {f > 0}, i.e. {f > 0} has measure zero. A similar argument for {f < 0}
finally shows that f = 0 almost everywhere.
2.5. Dual spaces. Next we introduce a fundamental idea of linear spaces,
the dual space.
Definition 2.35. If V is a vector space we call a linear mapping ℓ : V → R
a linear functional on V. If (V, ‖ · ‖V) is a normed vector space and ℓ is a
linear functional on V which is continuous, we call it a continuous linear
functional on V. If ℓ is a linear functional on V with the property that

    ‖ℓ‖_{V∗} := sup_{‖x‖≤1} |ℓ(x)| < +∞

we call ℓ a bounded linear functional on V. The norm ‖ · ‖_{V∗} defined above
on bounded linear functionals on V is called the dual norm of ‖ · ‖V.
Note that the notion of a bounded linear functional does not mean the
functional is bounded on all of V , just on any finite radius ball. In fact the
only linear functional which is bounded on all of V is the 0 map.
It turns out that boundedness and continuity are the same for linear
functionals.
Exercise 2.36. If (V, ‖ · ‖V) is a normed vector space and ℓ is a bounded
linear functional on V then for all x ∈ V

    |ℓ(x)| ≤ ‖ℓ‖_{V∗} ‖x‖.
Lemma 2.37. If (V, ‖ · ‖V) is a normed vector space and ℓ is a linear
functional on V then ℓ is bounded if and only if ℓ is continuous.

Proof. If ℓ is bounded then for any x, y ∈ V

    |ℓ(x) − ℓ(y)| = |ℓ(x − y)| ≤ ‖ℓ‖_{V∗} ‖x − y‖,

i.e. ℓ is Lipschitz continuous with constant ‖ℓ‖_{V∗}.

On the other hand suppose that ℓ is continuous on V. Then ℓ is continuous
at 0, and recall that ℓ linear means ℓ(0) = 0, so there exists δ > 0 so that

    |ℓ(x)| ≤ 1 for ‖x‖ ≤ δ.

However by linearity for any x ∈ B(0, 1)

    |ℓ(x)| = |δ^{−1} ℓ(δx)| ≤ δ^{−1},

so ℓ is bounded.
Definition 2.38. If (V, ‖ · ‖V) is a normed vector space we define the dual
space

    V∗ = {ℓ : V → R | ℓ is a bounded linear functional on V}.

The dual space is also a normed vector space with a canonical norm
inherited from duality.

Lemma 2.39. If (V, ‖ · ‖V) is a normed vector space then the dual space V∗
is a normed vector space with the dual norm ‖ · ‖_{V∗}.

Exercise 2.40. Prove Lemma 2.39, i.e. check that V∗ is a vector space and
‖ · ‖_{V∗} is a norm.
The following theorem shows that there are at least some bounded linear
functionals.
Theorem 2.41 (Hahn-Banach). If W is a subspace of V and ℓ : W → R is
a bounded linear functional on W then there is an extension L : V → R, a
bounded linear functional on V which agrees with ℓ on the subspace W and

    ‖L‖_{V∗} ≤ ‖ℓ‖_{W∗}.

Proof. We will skip the proof; it uses Zorn’s Lemma. The idea is to show
first that one can extend ℓ by one dimension at a time without increasing the
norm, and then use that the norm preserving extensions of ℓ form a partially
ordered set (ordered by containment of the domain of the extension) so there
is a maximal element.
Let ` be a bounded linear functional on a subspace W in V . If W is
not
Since V∗ is also a normed vector space, it itself has a dual space
V∗∗ = (V∗)∗ which is called the double dual. Notice that every x ∈ V is
actually a bounded linear functional on V∗ via the mapping ℓ ↦ ℓ(x). This
means that every element of V is an element of the double dual space,

    V ⊂ V∗∗.

(Technically we should think of this as a norm preserving isomorphism
between V and a subset of V∗∗.) It is sometimes the case that the inclusion is
strict: the double dual V∗∗ is not equal (isomorphic) to the original space.
If V∗∗ = V (technically I mean that the normed spaces are isomorphic) then
we call V reflexive.
Hilbert spaces have the even more special property of being self-dual: the dual space V∗ is naturally isomorphic to V. This is the content of the Riesz Representation Theorem.
Theorem 2.42 (Riesz Representation Theorem). If (H, ⟨·,·⟩) is a (real or complex) Hilbert space and ℓ ∈ H∗ is a bounded linear functional on H, then there exists y ∈ H such that
ℓ(x) = ⟨y, x⟩ for all x ∈ H.
Proof. We just do the real Hilbert space case. First of all if ℓ = 0 we are done (take y = 0), so we can assume that ℓ(x) ≠ 0 for some x ∈ H.
Call ker(ℓ) = {x ∈ H : ℓ(x) = 0}. Since ℓ is continuous this is a closed subset of H, and since ℓ is linear it is a vector subspace. We claim that
ker(ℓ)⊥ := {y ∈ H : ⟨y, x⟩ = 0 for all x ∈ ker(ℓ)},
which is another (closed) vector subspace of H, is one-dimensional. Note that
ker(ℓ)⊥ ∩ ker(ℓ) = {0}
since any x in the intersection would satisfy ‖x‖² = ⟨x, x⟩ = 0.
First of all, since ker(ℓ) is a proper closed subspace, the Hilbert projection theorem implies that there is at least one nontrivial vector in ker(ℓ)⊥. If v₁ and v₂ are two nontrivial vectors inside of ker(ℓ)⊥ then, since neither lies in ker(ℓ), there is a nonzero λ ∈ R such that
ℓ(v₁) = λℓ(v₂).
Then by linearity
ℓ(v₁ − λv₂) = 0,
thus v₁ − λv₂ ∈ ker(ℓ) ∩ ker(ℓ)⊥ = {0}.
Therefore ker(ℓ)⊥ is one-dimensional. Take now v to be a unit vector in ker(ℓ)⊥; then for any x ∈ H we first note that
x − ⟨v, x⟩v ∈ ker(ℓ)
since that vector is orthogonal to v, which spans ker(ℓ)⊥, and (ker(ℓ)⊥)⊥ = ker(ℓ). Then
ℓ(x) = ℓ((x − ⟨v, x⟩v) + ⟨v, x⟩v) = 0 + ℓ(⟨v, x⟩v) = ⟨v, x⟩ℓ(v).
So calling y = ℓ(v)v we get the result. □
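The construction in the proof can be sketched numerically in a finite dimensional Hilbert space. The following is an illustration only (the functional and the test vectors are invented for the example), using R³ with the standard dot product: the representing vector y = ℓ(v)v reproduces ℓ.

```python
import math

# Illustration in H = R^3 with the standard dot product; the functional ell
# and the test vectors are invented for this example.
def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def ell(x):
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]

# ker(ell)^perp is spanned by the coefficient vector; normalize it to get
# the unit vector v of the proof, then set y = ell(v) * v.
coeffs = [2.0, -1.0, 0.5]
v = [c / math.sqrt(dot(coeffs, coeffs)) for c in coeffs]
y = [ell(v) * vi for vi in v]

# Check ell(x) = <y, x> on a few test vectors.
tests = [[1.0, 0.0, 0.0], [0.3, -2.0, 5.0], [1.0, 1.0, 1.0]]
max_error = max(abs(ell(x) - dot(y, x)) for x in tests)
```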
LECTURE NOTES MATH 5210
19
Example 2.43 (Dual norms of ℓ¹, ℓ², ℓ∞ on R^d). Let's study a bit the dual norms of ‖·‖_p on finite dimensional space. We will write ‖·‖_{p∗} for the dual norm of ‖·‖_p.
First we claim that ‖·‖₁ and ‖·‖_∞ are dual norms on R^d. Let ℓ be a bounded linear functional on R^d. First of all notice that
ℓ(x) = ℓ(x₁e₁ + ··· + x_d e_d) = x₁ℓ(e₁) + ··· + x_d ℓ(e_d),
so we can represent ℓ by another vector in R^d, namely ℓ = (ℓ₁, …, ℓ_d) with ℓ_j = ℓ(e_j). Note that technically this is a linear bijection between (R^d)∗ and R^d itself. We compute
|ℓ(x)| = |∑_{j=1}^d x_j ℓ_j| ≤ ∑_{j=1}^d |x_j ℓ_j| ≤ max_j |ℓ_j| ∑_{j=1}^d |x_j| = ‖ℓ‖_∞ ‖x‖₁.
This means that
‖ℓ‖_{1∗} ≤ ‖ℓ‖_∞ and similarly ‖x‖_{∞∗} ≤ ‖x‖₁.
To get equality we just have to note that, for any dual vector ℓ ≠ 0, there is an x so that equality is achieved in the above inequality: precisely take
x_j = ℓ_j/|ℓ_j| if |ℓ_j| = ‖ℓ‖_∞ and x_j = 0 for other j.
A similar argument shows that ‖x‖_{∞∗} = ‖x‖₁.
As for the dual of the Euclidean norm ‖·‖₂, we can check that ‖x‖_{2∗} = ‖x‖₂. I will leave this as an exercise.
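A quick numerical sanity check of this example (a sketch, not part of the notes' formal argument): the sup of |ℓ(x)| over the ℓ¹ unit ball is attained at a signed coordinate vector ±e_j, so it equals ‖ℓ‖_∞. The specific vector below is invented for the illustration.

```python
# The functional, identified with a vector in R^4, is invented for the sketch.
l = [0.5, -3.0, 2.0, 1.5]
linf = max(abs(c) for c in l)      # the predicted dual norm ||l||_inf

# The sup of a linear functional over the convex l^1 unit ball is attained
# at an extreme point, i.e. at one of the signed coordinate vectors +/- e_j.
d = len(l)
dual_norm = 0.0
for j in range(d):
    for s in (1.0, -1.0):
        x = [0.0] * d
        x[j] = s                   # the extreme point s * e_j
        val = abs(sum(a * b for a, b in zip(l, x)))
        dual_norm = max(dual_norm, val)
```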
Example 2.44 (Dual of ℓ¹, ℓ², ℓ∞ sequence spaces). Very similar arguments to those shown in the finite dimensional case above show that
ℓ¹(N → R)∗ = ℓ∞(N → R) and ℓ²(N → R)∗ = ℓ²(N → R).
However things are more complicated for the dual of ℓ∞: it turns out that the dual of ℓ∞(N → R) is strictly larger than ℓ¹(N → R), although we will need the axiom of choice, by way of the Hahn-Banach Theorem, to construct such a linear functional.
First of all let us consider the following vector subspace of ℓ∞(N → R):
X = {x ∈ ℓ∞(N → R) : lim_{n→∞} x_n exists}.
This subspace is separable and a proper closed subspace, hence a nowhere dense subset of ℓ∞(N → R). We define a bounded linear functional ϕ on X by
ϕ(x) = lim_{n→∞} x_n.
The Hahn-Banach theorem says that ϕ has a bounded extension ϕ : ℓ∞(N → R) → R.
I leave it as an exercise for you to show that ϕ cannot arise as an element of ℓ¹(N → R): more precisely, for ϕ with the above defining property there is no element y ∈ ℓ¹(N → R) such that
ϕ(x) = ∑_{n=1}^∞ y_n x_n
for every x ∈ ℓ∞(N → R) (or even just for x ∈ X).
Example 2.45 (L^p spaces and duals). As it is a Hilbert space, L²(Ω → R) is self dual. The dual of L¹(Ω → R) is L∞(Ω → R). Similar to the case of ℓ∞(N → R), the dual of L∞(Ω → R) is strictly larger than L¹(Ω → R).
Let's discuss the proof that (L¹)∗ = L∞. First of all any g ∈ L∞(Ω → R) defines a linear functional on L¹(Ω → R) by
ϕ_g(f) = ⟨g, f⟩ = ∫_Ω g(x)f(x) dx,
which is bounded because
|∫_Ω g(x)f(x) dx| ≤ ∫_Ω |g(x)f(x)| dx ≤ ‖g‖_{L∞(Ω)} ∫_Ω |f(x)| dx.
Now if we take an arbitrary ϕ in the dual L¹(Ω → R)∗ we need to find g ∈ L∞(Ω) such that
⟨g, f⟩ = ϕ(f) for all f ∈ L¹(Ω).
This requires a serious theorem of measure theory called the Radon-Nikodym Theorem. Basically the idea is to use that indicator functions of finite measure sets are in L¹(Ω), so ϕ defines a measure on the Lebesgue measurable subsets of Ω:
μ(E) = ϕ(χ_E) with |μ(E)| ≤ ‖ϕ‖_{L¹(Ω)∗} m(E).
The idea then is to show that
g(x) := lim_{r→0} μ(B(x, r))/m(B(x, r))
exists at almost every x ∈ Ω. This follows from the Radon-Nikodym Theorem. Taking the existence of g for granted, you can try to prove that ‖g‖_{L∞} = ‖ϕ‖_{L¹(Ω)∗}.
2.6. Weak and weak-* convergence. Unlike in the case of metric spaces
where there is really just one notion of convergence which emerges naturally
from the metric structure, in the case of normed spaces (and actually other
topological vector spaces as well) there are several other topologies / notions
of convergence which arise from the norm.
Definition 2.46. In a Banach space (V, ‖·‖_V) a sequence (x_n)_{n=1}^∞ in V is said to converge weakly to x ∈ V, and we write
x_n ⇀ x as n → ∞,
if
ℓ(x_n) → ℓ(x) for all ℓ ∈ V∗.
That is, the weak topology is the weakest topology that makes all of the bounded linear functionals on V continuous. There is a similar notion on the dual space, which actually turns out to be a bit better in some ways.
Definition 2.47. Suppose (V, ‖·‖_V) is a Banach space and (V∗, ‖·‖_{V∗}) is its dual space. A sequence (ℓ_n)_{n=1}^∞ in V∗ is said to converge weak-* to ℓ ∈ V∗, and we write
ℓ_n ⇀∗ ℓ as n → ∞,
if
ℓ_n(x) → ℓ(x) for all x ∈ V.
If V is separable then the weak-* topology, at least restricted to bounded subsets of V∗, actually comes from a metric. Let X = (x_n)_{n=1}^∞ be a countable dense set in V; then define, for ϕ, ψ ∈ V∗,
ρ(ϕ, ψ) = ∑_{n=1}^∞ 2^{−n} |ϕ(x_n) − ψ(x_n)| / (1 + |ϕ(x_n) − ψ(x_n)|).
The key feature of the weak-* topology is the following fundamental result: the closed unit ball in the V∗ norm is compact in the weak-* topology.
Theorem 2.48 (Banach-Alaoglu). If (V, ‖·‖) is a separable Banach space, (V∗, ‖·‖_{V∗}) is its dual space, and B is the closed unit ball in V∗, then B is compact in the weak-* topology.
Proof. The subsequence diagonalization argument appears again! (Since the weak-* topology on B is metrizable when V is separable, it suffices to prove sequential compactness.) Let X = {x_j}_{j∈N} be a countable dense subset of V. Suppose (ℓ_n) is a sequence inside the closed unit ball of V∗. Then (ℓ_n(x_j))_{n=1}^∞ is a bounded sequence in R for each j ∈ N. By subsequence diagonalization we can find a subsequence ℓ_{n_k} such that ℓ_{n_k}(x_j) converges for every j, and we define
ℓ(x_j) := lim_{k→∞} ℓ_{n_k}(x_j).
Right now ℓ : X → R; we want to check that it is linear and extend it to the whole space. Since, for any x, y ∈ X,
|ℓ_n(x) − ℓ_n(y)| = |ℓ_n(x − y)| ≤ ‖x − y‖ for all n,
the same estimate is true for ℓ. Thus ℓ is uniformly continuous on X, so it has a unique continuous extension to X̄ = V.
Now we check that ℓ_{n_k}(x) → ℓ(x) for all x ∈ V. Let ε > 0; there exists y ∈ X with ‖x − y‖ ≤ ε/3, and there exists K so that for all k ≥ K
|ℓ_{n_k}(y) − ℓ(y)| ≤ ε/3,
so then
|ℓ_{n_k}(x) − ℓ(x)| ≤ |ℓ_{n_k}(x) − ℓ_{n_k}(y)| + |ℓ_{n_k}(y) − ℓ(y)| + |ℓ(y) − ℓ(x)| ≤ ε.
Finally we need to check that ℓ is linear. This is easy to check from the fact that it is a pointwise limit of linear maps. □
Of course this (amazing) compactness comes with a price: the notion of convergence is much weaker than most we have looked at so far. Let's see some examples.
Example 2.49 (Oscillations). Let's work in L¹([0,1]) with the dual space L∞([0,1]). Here is an example of a bounded sequence in L∞([0,1]) which does not have any strongly convergent subsequence, but does converge in the weak-* topology. Define
f_n(x) = sin(2πnx).
This sequence has supremum norm uniformly bounded by 1, but obviously is not convergent in L∞, in any of the L^p norms, or pointwise almost everywhere. In fact it does not even have a subsequence converging in any of these modes. However, by Banach-Alaoglu it does have a subsequence converging weak-* in L∞, and in fact we can check that the whole sequence converges weak-* to 0:
f_n ⇀∗ 0 as n → ∞.
The proof involves some technical measure theory, but the idea is typical: argue first on a dense subset of L¹ and then extend to all of L¹ by the uniform (in n) continuity of the maps g ↦ ⟨f_n, g⟩.
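The claimed weak-* convergence can be observed numerically. This sketch approximates the pairings ⟨f_n, g⟩ for one hypothetical choice of g ∈ L¹([0,1]) by a Riemann sum and watches them tend to 0; it is an illustration, not a proof.

```python
import math

# g is a hypothetical L^1 test function; the pairing <f_n, g> is approximated
# by a midpoint Riemann sum on [0, 1].
def pairing(n, g, M=20000):
    h = 1.0 / M
    return sum(math.sin(2 * math.pi * n * (i + 0.5) * h) * g((i + 0.5) * h)
               for i in range(M)) * h

g = lambda x: x * x
# |<f_n, g>| for increasing frequencies; should decrease toward 0
vals = [abs(pairing(n, g)) for n in (1, 4, 16, 64)]
```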
As we can see, weak-* convergence does not preserve the norm: in the above example we had a sequence of functions with unit norm whose weak-* limit was 0. One of the key facts about weak-* convergence, reminiscent of Fatou's Lemma, is that the norm can only decrease in the limit. We call this lower semi-continuity of the norm.
Lemma 2.50 (Lower semi-continuity of the norm). Suppose that (V, ‖·‖_V) is a Banach space and (V∗, ‖·‖_{V∗}) is its dual space. If (ℓ_n) is a sequence in the dual converging in the weak-* topology to ℓ then
‖ℓ‖_{V∗} ≤ lim inf_{n→∞} ‖ℓ_n‖_{V∗}.
Proof. Let x ∈ V with at most unit norm; then, using |ℓ_n(x)| ≤ ‖ℓ_n‖_{V∗},
|ℓ(x)| = lim_{n→∞} |ℓ_n(x)| ≤ lim inf_{n→∞} ‖ℓ_n‖_{V∗}.
Taking the supremum over such x,
‖ℓ‖_{V∗} = sup_{‖x‖_V ≤ 1} |ℓ(x)| ≤ lim inf_{n→∞} ‖ℓ_n‖_{V∗}. □
2.7. Application of dual space ideas. In this section we consider the existence of a solution to the ODE boundary value problem
(2.1)  −∂_x(p(x)∂_x u) + q(x)u = 0 for x ∈ (0, 1),  u(x) = g(x) for x ∈ ∂(0,1) = {0, 1}.
Here g : {0,1} → R is arbitrary, and p, q : [0,1] → (0,∞) are positive and continuous; to be concrete we just assume λ ≤ p, q ≤ 1 for some 0 < λ ≤ 1.
This type of problem is called Sturm-Liouville, analogous PDE problems in
higher dimensions also come up in many applications and are simply called
divergence form elliptic boundary value problems.
We will take a functional analytic / variational approach to the problem. We define an associated inner product
⟨u, v⟩_X := ∫₀¹ p(x)∂_x u ∂_x v + q(x)uv dx.
This is a weighted version of the standard L² inner product; because of the positivity of the coefficients p and q one can check that it is indeed an inner product. The associated norm is, of course,
‖u‖²_X = ∫₀¹ p(x)(∂_x u)² + q(x)u² dx = ‖∂_x u‖²_{L²_p} + ‖u‖²_{L²_q}.
We define X to be the completion of C¹([0,1]) under this norm. Basically this space consists of functions which are square integrable on [0,1] and have a square integrable derivative on [0,1], although you have not yet studied precisely the sense of derivative that we mean here.
We should note that, at first appearances, neither the equation nor the boundary condition in (2.1) makes sense on this space. First let us make sense of the boundary condition. We note that functions in X are actually Hölder-1/2 continuous. Using the fundamental theorem of calculus and Cauchy-Schwarz, for any x, y ∈ [0,1],
|u(x) − u(y)| = |∫ₓ^y ∂_x u(t) dt|
≤ (∫ₓ^y (∂_x u(t))² dt)^{1/2} |x − y|^{1/2}
≤ (∫₀¹ (∂_x u(t))² dt)^{1/2} |x − y|^{1/2}
≤ λ^{−1/2} (∫₀¹ p(t)(∂_x u(t))² dt)^{1/2} |x − y|^{1/2}
≤ λ^{−1/2} ‖u‖_X |x − y|^{1/2}.
At a technical level this argument was valid for functions in C¹([0,1]); however, the result extends to the completion because a Cauchy sequence in the X-norm is bounded in the X-norm and hence, by this estimate, equicontinuous. Then Arzelà-Ascoli, with a bit of extra work thinking about uniform boundedness, shows that there is a uniformly convergent subsequence, and we obtain this same Hölder continuity estimate for the pointwise limit.
Thus functions in X are actually Hölder continuous, and so point evaluation of the function (but not of the derivative) does make sense in this space.
Thus we can define the subset of X (a convex subset but not a subspace)
X_g = {u ∈ X : u = g on ∂(0,1)}.
Now we aim to solve the following minimization problem: find u ∈ X_g such that
‖u‖²_X = min_{v∈X_g} ‖v‖²_X =: α.
It turns out that solving this minimization problem is the same as solving the ODE (2.1), but we will explore this later. It also turns out that this is exactly a Hilbert space projection problem, so we could use the Hilbert projection theorem to find a minimizer, but let's use a different approach which relies on dual space ideas instead of the Hilbert structure.
Let's take a minimizing sequence u_n ∈ X_g such that
lim_{n→∞} ‖u_n‖²_X = α.
The sequence has bounded norm so, by the Hölder estimate again, we can take a subsequence (not relabeled) so that
u_n → u uniformly on [0,1].
In particular u ∈ X_g as well. Also, by Banach-Alaoglu, there is a further subsequence (again not relabeled) so that
u_n ⇀∗ u in X as n → ∞.
(A priori the weak-* limit could be different; it is a good exercise to check that if a sequence converges pointwise and in the weak-* sense in this space then the limits are the same.) By the lower semi-continuity of the norm,
‖u‖²_X ≤ lim inf_{n→∞} ‖u_n‖²_X.
Thus
α ≤ ‖u‖²_X ≤ lim inf_{n→∞} ‖u_n‖²_X = α,
and so we have found a minimizer.
Finally we explain what this minimization problem has to do with the original ODE. In words, the ODE (2.1) is the first derivative test for the minimization problem of this norm functional. We do a computation to check this under the assumption that the minimizer is C². In reality we have not shown the existence of a C² minimizer, so there is still work to be done. This is a typical situation with this kind of method: one finds a solution with less regularity than expected and there is more work to be done.
Suppose that our minimizer derived above is in fact in C²([0,1]). Let v ∈ C¹₀([0,1]) (i.e. v = 0 on the boundary) be another function. Note that the perturbations satisfy u + tv ∈ X_g for all t ∈ R, so
F(t) := ‖u + tv‖²_X
has its minimum at t = 0. We compute the first derivative test
0 = d/dt|_{t=0} ‖u + tv‖²_X = 2⟨u, v⟩_X,
which says
0 = ∫₀¹ p(x)∂_x u ∂_x v + q(x)uv dx.
Now we integrate by parts in the first term and, using that v = 0 on ∂(0,1),
0 = ∫₀¹ v(−∂_x(p(x)∂_x u) + q(x)u) dx.
Since −∂_x(p(x)∂_x u) + q(x)u is continuous and integrates to zero against every v ∈ C¹₀([0,1]), we can prove that
−∂_x(p(x)∂_x u) + q(x)u = 0 for all x ∈ (0,1),
i.e. we have solved our ODE!
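A discrete version of this variational problem can be computed directly. The sketch below takes the simplest admissible coefficients p = q = 1 (so λ = 1), replaces the Euler-Lagrange equation −u″ + u = 0 by centered finite differences, and solves the resulting tridiagonal system with the Thomas algorithm; the boundary data and grid size are invented for the example. The exact solution for comparison is u(x) = sinh(1 − x)/sinh(1).

```python
import math

# -u'' + u = 0 on (0,1), u(0) = 1, u(1) = 0; exact: sinh(1 - x)/sinh(1).
N = 200                      # interior grid points (an arbitrary choice)
h = 1.0 / (N + 1)
g0, g1 = 1.0, 0.0

# Tridiagonal system from -(u_{i+1} - 2 u_i + u_{i-1})/h^2 + u_i = 0.
a = [-1.0 / h**2] * N        # sub-diagonal
b = [2.0 / h**2 + 1.0] * N   # diagonal
c = [-1.0 / h**2] * N        # super-diagonal
rhs = [0.0] * N
rhs[0] += g0 / h**2          # boundary values moved to the right hand side
rhs[-1] += g1 / h**2

# Thomas algorithm: forward elimination, then back substitution.
for i in range(1, N):
    m = a[i] / b[i - 1]
    b[i] -= m * c[i - 1]
    rhs[i] -= m * rhs[i - 1]
u = [0.0] * N
u[-1] = rhs[-1] / b[-1]
for i in range(N - 2, -1, -1):
    u[i] = (rhs[i] - c[i] * u[i + 1]) / b[i]

exact = [math.sinh(1.0 - (i + 1) * h) / math.sinh(1.0) for i in range(N)]
max_err = max(abs(ui - ei) for ui, ei in zip(u, exact))
```

The O(h²) consistency of the centered difference makes the discrete minimizer converge to the true one as the grid is refined.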
3. Fourier Analysis on the Torus
As we have seen, L² and ℓ² spaces have a very special structure compared to other L^p or ℓ^p spaces: they are inner product spaces. In this section we will explore an even more specialized duality structure which appears in L² or ℓ² spaces on Tⁿ, Rⁿ and Zⁿ. At an elementary level we can think of Fourier duality as providing an interesting and natural Schauder basis for L²(Tⁿ).
This material will be parallel to the material in Tao, Chapter 4. I highly
recommend reading along in Tao’s book as well, he is, to say the least, one of
the world experts on Fourier Analysis. However in our class we have already
studied measure theory and basic functional analysis so we will be able to
add some of that additional context in these notes.
3.1. The flat torus. We define the unit width flat torus Td = Rd /Zd . By
this we mean that we identify points x and y in Rd which differ by an integer
vector. There is a natural bijection between Td and [0, 1)d , for x ∈ Td we
write ϕ(x) ∈ [0, 1)d to be the unique element of that equivalence class in
[0, 1)d .
Definition 3.1. A function f : Rd → R is Zd -periodic if f (x + k) = f (x)
for all k ∈ Zd .
Zd -periodic functions on Rd can naturally be viewed as functions on the
torus Rd /Zd and vice versa.
The torus T^d naturally inherits a metric from R^d:
d_{T^d}(x, y) = min_{k∈Z^d} d₂(x, y + k),
where d₂ is the Euclidean distance.
Exercise 3.2. (Td , dTd ) is a compact metric space.
The torus T^d also has a natural group structure coming from addition: x + y is defined by adding the representatives ϕ(x) and ϕ(y) in R^d and reducing modulo Z^d. Writing this out in symbols is confusing, but it does have all the appropriate properties of an addition operation.
We can also make sense of differentiation for functions on T^d, by just viewing them as Z^d-periodic functions on R^d. Similarly we can define the integral: for any f : T^d → R and Ω ⊂ T^d measurable,
∫_Ω f(x) dx := ∫_{ϕ(Ω)} f(ϕ^{−1}(y)) dy,
where ϕ is the canonical bijection ϕ : T^d → [0,1)^d defined previously. Measurability of f and Ω is defined exactly so that f ∘ ϕ^{−1} and ϕ(Ω) are measurable on [0,1)^d.
This may seem a bit burdensome, but in practice we will not need to be
so pedantic and in most cases we can just identify Td and [0, 1)d without
being careful to write the bijection ϕ.
Fourier analysis on T^d is closely connected with the L²(T^d) inner product: for f, g : T^d → C,
⟨f, g⟩_{L²(T^d)} = ∫_{T^d} \overline{f(x)} g(x) dx,
conjugate-linear in the first slot, matching the convention f̂(k) = ⟨e_k, f⟩ below. Since we will always use this inner product in this section we will usually drop the subscript.
3.2. Fourier Transform. Let’s just work on the one dimensional torus
T = R/Z, we will explain the generalization to higher dimensions at the
end.
Definition 3.3 (Characters). Define the character with frequency n ∈ Z:
e_n(x) = e^{2πinx} = cos(2πnx) + i sin(2πnx).
Note that these are all Z-periodic functions on R, and thus, naturally, functions on the torus T. Also note that
e_n(x)e_m(x) = e_{n+m}(x).
Definition 3.4 (Trigonometric polynomial). A function f ∈ C(T → C) is called a trigonometric polynomial if it is a finite linear combination of characters
f(x) = ∑_{n=−N}^N c_n e_n(x).
Exercise 3.5. The characters form an orthonormal system:
⟨e_n, e_m⟩ = 1 if n = m, and 0 else.
Exercise 3.6. For any trigonometric polynomial f(x) = ∑_{n=−N}^N c_n e_n(x),
‖f‖²_{L²} = ∑_{n=−N}^N |c_n|².
Definition 3.7. Given a function f ∈ L²(T → C), i.e. a 1-periodic function on R, we define the Fourier transform f̂ : Z → C by
f̂(k) = ⟨e_k, f⟩_{L²} = ∫_T e^{−2πikx} f(x) dx.
Given a sequence α ∈ ℓ²(Z → C) we define the inverse Fourier transform
α̌(x) = ∑_{k∈Z} α_k e^{2πikx}.
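The definition can be tested on a trigonometric polynomial, where an equally spaced Riemann sum computes the integral defining f̂(k) exactly up to roundoff. The coefficients below are arbitrary choices for the illustration.

```python
import math, cmath

# Coefficients c_n of f = sum c_n e_n, invented for the sketch.
coeffs = {-2: 1.0 - 2.0j, 0: 0.5, 3: 2.0j}

def f(x):
    return sum(c * cmath.exp(2j * math.pi * n * x) for n, c in coeffs.items())

def fhat(k, M=64):
    # equally spaced Riemann sum for the integral over T; exact on
    # trigonometric polynomials once M exceeds twice the top degree
    return sum(cmath.exp(-2j * math.pi * k * j / M) * f(j / M)
               for j in range(M)) / M

# the transform recovers each coefficient
err = max(abs(fhat(n) - c) for n, c in coeffs.items())
```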
If we knew that the orthonormal set {e_n}_{n∈Z} forms a Schauder basis for L²(T → C) then, from the abstract results of the previous section,
f(x) = ∑_{k∈Z} f̂(k) e^{2πikx}
with the sum converging in L²(T → C) (i.e. you should not necessarily interpret this pointwise at this stage). In other words f is the inverse Fourier transform of its Fourier transform f̂.
Via the results in the section on orthonormal bases of Hilbert space, we can show that {e_n}_{n∈Z} is a Schauder basis for L²(T → C) if we can show that
⟨f, e_n⟩ = 0 for all n ∈ Z ⟹ f = 0 a.e.
This will follow if span({e_n}) is dense in L²(T). By the Stone-Weierstrass theorem span({e_n}) is dense in C(T) (in the uniform metric and hence also in the L² metric) and we have also shown that C(T) is dense in L²(T) (in the L² metric), so we find that span({e_n}) is dense in L²(T).
Taking these arguments, which were only outlined, for granted, we also get the Plancherel formula, which says that the Fourier transform is a norm and inner product preserving isomorphism between L²(T → C) and ℓ²(Z → C):
Lemma 3.8 (Plancherel/Parseval formula). For f, g ∈ L²(T),
⟨f, g⟩_{L²(T)} = ⟨f̂, ĝ⟩_{ℓ²(Z)}
and, in particular,
‖f‖_{L²(T)} = ‖f̂‖_{ℓ²(Z)}.
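Plancherel can be checked numerically on a trigonometric polynomial, again using the fact that equally spaced Riemann sums are exact on trigonometric polynomials with few enough modes. The coefficients are invented for the sketch.

```python
import math, cmath

coeffs = {-1: 2.0, 0: -1.0j, 2: 0.5 + 0.5j}   # invented coefficients

def f(x):
    return sum(c * cmath.exp(2j * math.pi * n * x) for n, c in coeffs.items())

M = 32   # enough samples that the Riemann sum of |f|^2 is exact
l2_sq = sum(abs(f(j / M)) ** 2 for j in range(M)) / M   # ||f||_{L^2}^2
sum_sq = sum(abs(c) ** 2 for c in coeffs.values())       # sum of |c_n|^2
```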
3.3. Properties of Fourier Transform. The Fourier transform is linear and interacts in a very simple way with translation and differentiation; the interaction with products is a bit more complicated.
Proposition 3.9. If f, g ∈ L²(T → C) and α ∈ C:
(1) (Linearity) The Fourier transform is linear:
(f + αg)^∧(k) = f̂(k) + α ĝ(k).
(2) (Translation) Translation on the physical side corresponds to modulation by a character on the Fourier side:
(f(· + y))^∧(k) = e^{2πiky} f̂(k).
(3) (Modulation) Modulation by a character on the physical side corresponds to translation on the Fourier side:
(e_ℓ f)^∧(k) = f̂(k − ℓ).
(4) (Differentiation) If f ∈ L²(T → C) ∩ C¹(T → C) then
(f′)^∧(k) = 2πik f̂(k).
(5) (Products) Products on the physical side correspond to convolutions on the Fourier side:
(fg)^∧(k) = (f̂ ∗ ĝ)(k) = ∑_{ℓ∈Z} f̂(k − ℓ) ĝ(ℓ).
(6) (Convolutions) Convolutions on the physical side correspond to products on the Fourier side:
(f ∗ g)^∧(k) = f̂(k) ĝ(k).
Remark 3.10. The derivative of f ∈ L²(T) may be defined via the Fourier transform even if f is not differentiable in the classical sense. In particular, as long as k f̂(k) is square summable on Z, the series
g(x) = ∑_{k∈Z} 2πik f̂(k) e_k(x)
converges in L²(T). This g is a notion of weak derivative of f.
3.4. Fourier transform of finite measures. If μ is a finite measure on (T, B(T)) (the Borel subsets of the torus) then we actually can still define the Fourier transform. Note that the Fourier coefficients
μ̂(k) = ∫_T e^{−2πikx} dμ(x)
are perfectly well defined, since the characters e_k(x) are continuous and bounded.
Example 3.11 (δ-mass). One of the most interesting measures whose Fourier transform we can compute is the δ-mass. We simply compute
δ̂₀(k) = ∫_T e^{−2πikx} dδ₀(x) = 1 for all k ∈ Z.
The Fourier transform of the δ-mass centered at a general point x ∈ T, δ_x, can be computed using the rule for the Fourier transform of a translate.
This may give an idea for generating a sequence of approximate identities out of trigonometric polynomials. We simply take the Fourier series of the δ-mass and cut off at some mode N:
D_N(x) = ∑_{n=−N}^N e^{2πinx} = e^{−2πiNx} (e^{2πi(2N+1)x} − 1)/(e^{2πix} − 1),
where we used the geometric sum formula. Multiplying numerator and denominator by e^{−πix} creates symmetric terms:
D_N(x) = (e^{2πi(N+1/2)x} − e^{−2πi(N+1/2)x})/(e^{πix} − e^{−πix}),
and now we can rewrite this as
D_N(x) = sin(2π(N + 1/2)x)/sin(πx).
This is called the Dirichlet kernel. It is a bit like an approximate identity, but it is highly oscillatory and its L¹ norm diverges as N → ∞. The analysis of the pointwise or pointwise a.e. convergence D_N ∗ f → f is a quite tricky topic; it is equivalent to the pointwise / pointwise a.e. convergence of Fourier series.
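The closed form for D_N can be checked against the defining sum; a small sketch (N and the sample points are arbitrary, and at integer x the closed form is interpreted by its limit 2N + 1):

```python
import math, cmath

def D_sum(N, x):
    # defining sum of the Dirichlet kernel; real by symmetry of the modes
    return sum(cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1)).real

def D_closed(N, x):
    # closed form sin(2 pi (N + 1/2) x) / sin(pi x), valid for x not in Z
    return math.sin(2 * math.pi * (N + 0.5) * x) / math.sin(math.pi * x)

N = 7
max_err = max(abs(D_sum(N, x) - D_closed(N, x))
              for x in (0.1, 0.25, 0.37, 0.81))
```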
Let’s try something similar, following Tao, but we will force non-negativity
N
1 X 2πinx
ρN (x) :=
e
N
n=0
2
N
N
X
1 X 2πi(n−m)x
|`|
e
=
(1 − )e2πi`x .
=
N
N
n,m=0
`=−N
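The two key properties of this kernel, non-negativity and unit mass, can be verified numerically; this is a sketch with an arbitrary N.

```python
import math, cmath

def rho(N, x):
    # (1/N) |sum_{n=0}^{N-1} e^{2 pi i n x}|^2, manifestly non-negative
    s = sum(cmath.exp(2j * math.pi * n * x) for n in range(N))
    return (abs(s) ** 2) / N

N, M = 10, 400
samples = [j / M for j in range(M)]
min_val = min(rho(N, x) for x in samples)   # non-negativity
mass = sum(rho(N, x) for x in samples) / M  # Riemann sum for the total mass
```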
3.5. Fourier transform decay, regularity, and convergence. The rate of decay of the Fourier modes at ∞ in Z is closely connected with regularity properties (continuity, differentiability) of the function in physical space and with the mode of convergence of the Fourier series.
Our first result adds just a small amount of decay: we consider absolutely summable instead of square summable sequences of Fourier modes.
Lemma 3.12. If α ∈ ℓ¹(Z → C) (i.e. it is an absolutely summable sequence), then the Fourier series
α̌(x) = ∑_{k∈Z} α_k e^{2πikx}
converges uniformly on T and α̌ is continuous.
Proof. This is simply an application of the Weierstrass M-test with M_k = |α_k|; the partial sums are continuous functions and the uniform limit of continuous functions is continuous. □
Now if we add even more decay on the Fourier side we can find higher regularity of the inverse transform.
Lemma 3.13. Suppose that (α_k)_{k∈Z} is a sequence of complex coefficients such that
∑_{k∈Z} |k|^m |α_k| < +∞.
Then the Fourier series
α̌(x) = ∑_{k∈Z} α_k e^{2πikx}
converges uniformly and α̌ is m-times continuously differentiable with
α̌^{(ℓ)}(x) = ∑_{k∈Z} α_k (2πik)^ℓ e^{2πikx} for 0 ≤ ℓ ≤ m.
Proof. Exercise. □
3.6. Applications of Fourier Analysis - PDE. The heat equation is a classical partial differential equation (PDE) model for the transfer of heat, or the diffusion of a large number of micro-particles undergoing Brownian motion. It is also a fundamental archetype in the world of PDE. Actually the development of Fourier series by Jean-Baptiste Joseph Fourier was exactly for the purpose of solving the heat equation.
We will look for a heat distribution function u(x,t), a real valued function of a space variable x ∈ T and a time variable t ∈ R₊, solving the initial value problem
u_t − u_xx = 0 for (x,t) ∈ T × (0,∞),  u(x,0) = g(x),
where the initial data g is some function in L²(T → R).
Classical approach. Fourier's idea (or at least my semi-modern interpretation of it) started with looking for a solution of the separated form
u(x,t) = T(t)X(x).
Plugging this form into the equation yields
T′(t)/T(t) = X″(x)/X(x).
Since the left hand side depends only on t and the right hand side depends only on x, both must be equal to a constant λ ∈ R:
T′ = λT for t ∈ R₊ and X″ = λX for x ∈ T.
A different way of stating the second equation, which may be more clear, is that we look for a solution of X″ = λX on R which is 1-periodic. This imposes some restriction on λ: if λ > 0 then the solutions of X″ = λX are of the form
X(x) = A e^{λ^{1/2} x} + B e^{−λ^{1/2} x},
which are not 1-periodic functions on R (unless A = B = 0), and so we must throw this case out.
Thus we arrive at the case λ = −ω² ≤ 0. Now the solutions of the second equation are of the form
X(x) = A e^{iωx} + B e^{−iωx};
again, in order to be 1-periodic on R we arrive at the restriction
ω = 2πk for some k ∈ Z.
Then the corresponding solution of the first equation is
T(t) = C e^{−(2πk)² t},
and our solution of the heat equation is
u(x,t) = D e^{−(2πk)² t} e^{2πikx}.
This is of course just a special solution, with initial data forced on us to be a multiple of the character e_k(x) for some k ∈ Z. However, you may observe, we have a bit more than that. The heat equation is linear, so any linear combination of solutions is a solution. Thus actually we can solve the heat equation with any trigonometric polynomial as initial data. This is the origin of the idea of Fourier series: what if we could express an arbitrary periodic function as an (infinite) linear combination of these trigonometric functions?
Modern approach. Now in our current situation we already know about the Fourier transform on T, so let's approach the solution of the heat equation with this in mind. We simply take the Fourier transform on both sides of the heat equation to obtain the following equations for the Fourier modes:
û_t(k,t) + (2πk)² û(k,t) = 0 for (k,t) ∈ Z × (0,∞),  û(k,0) = ĝ(k).
This is a first order linear ODE for each k ∈ Z which we can solve to find
û(k,t) = ĝ(k) e^{−(2πk)² t} for (k,t) ∈ Z × [0,∞).
Then taking the inverse Fourier transform, i.e. writing out the Fourier series,
(3.1) u(x,t) = ∑_{k∈Z} ĝ(k) e^{−(2πk)² t} e^{2πikx}.
This sum certainly converges in the L² norm, but actually the convergence is much better: even if ĝ were only bounded (i.e. the Fourier transform of a finite measure), û(k,t) decays extremely quickly in k for any t > 0. From this we can easily derive an important and fundamental property of the heat equation: instantaneous regularization. Informally speaking, even for quite singular initial data the solution of the heat equation exists and is smooth for all t > 0.
Lemma 3.14. If ĝ : Z → C is bounded then the solution of the heat equation, u(x,t) defined in (3.1), is C^∞ for any t > 0.
Proof. Apply Lemma 3.13, noting that ∑_k |k|^m |ĝ(k)| e^{−(2πk)² t} < +∞ for every m when t > 0. □
With the Fourier series it is also easy to understand the long time behavior of solutions of the heat equation. If ĝ is bounded then the solution of the heat equation, u(x,t) defined in (3.1), converges as t → ∞ to ĝ(0) = ∫_T g, with
|u(x,t) − ĝ(0)| ≤ C ‖ĝ‖_{ℓ∞} e^{−(2π)² t} for t ≥ 1,
and we can even describe the asymptotic profile:
e^{(2π)² t}(u(x,t) − ĝ(0)) → ĝ(1)e^{2πix} + ĝ(−1)e^{−2πix} as t → ∞.
Since we assumed g is real valued we actually have the relation ĝ(−k) = \overline{ĝ(k)}, so
ĝ(1)e^{2πix} + ĝ(−1)e^{−2πix} = 2Re(ĝ(1)e^{2πix}) = 2Re(ĝ(1))cos(2πx) − 2Im(ĝ(1))sin(2πx).
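The decay rate and limit can be illustrated on a trigonometric polynomial initial datum; the coefficients below are invented for the sketch, chosen with ĝ(−k) = conj(ĝ(k)) so that u is real.

```python
import math, cmath

# Heat flow on T from a trig-polynomial initial datum; the coefficients
# ghat are invented and satisfy ghat(-k) = conj(ghat(k)).
ghat = {0: 1.5, 1: 0.5 - 0.25j, -1: 0.5 + 0.25j, 2: 1.0j, -2: -1.0j}

def u(x, t):
    # Fourier series of the solution (finitely many modes, so a full sum)
    return sum(c * math.exp(-(2 * math.pi * k) ** 2 * t)
               * cmath.exp(2j * math.pi * k * x)
               for k, c in ghat.items()).real

def deviation(t, M=50):
    # max over a sample grid of |u(x,t) - ghat(0)|
    return max(abs(u(j / M, t) - 1.5) for j in range(M))

dev1 = deviation(0.1)
dev2 = deviation(0.2)   # roughly a factor e^{-(2 pi)^2 * 0.1} smaller
```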
3.7. Schrödinger equation on the circle. The Schrödinger equation arrived on the scene much more recently than the heat equation, in the 20th century. It is a fundamental equation of quantum mechanics, describing the evolution of the (complex L²) probability amplitude of a quantum particle.
Given an initial data g ∈ L²(T → C) with ∫_T |g|² = 1 (an L² probability normalization) we look for a solution u(x,t) ∈ C of the equation
iu_t − u_xx = 0 for (x,t) ∈ T × (0,∞),  u(x,0) = g(x).
Obviously this looks quite similar to the heat equation, but the appearance of the complex unit i completely changes the nature of the equation. Nonetheless there are computational similarities and we can solve by the same Fourier method.
Taking the Fourier transform on both sides of the equation we find
i û_t(k) + (2πk)² û(k) = 0 for (k,t) ∈ Z × (0,∞),
and again we can integrate this simple first order ODE to find
û(k,t) = e^{(2πk)² it} ĝ(k).
Of course the major difference here is that there is no additional decay in k! We are just multiplying by a complex phase, so |û(k,t)| = |ĝ(k)| for all t > 0. In particular, and this is essential to the quantum mechanical interpretation of the equation, we have (using Plancherel)
∫_T |u(x,t)|² dx = ∑_{k∈Z} |û(k,t)|² = ∑_{k∈Z} |ĝ(k)|² = ∫_T |g|² = 1
for all t > 0.
3.8. Making sense of nonlinear functions of the derivative. One interesting feature of the Fourier transform is that, since it diagonalizes differentiation, we can easily make sense of nonlinear functions of the derivative operator which would have been very mysterious on the physical side.
For example let's write ∂_x for the derivative operator mapping C¹(T) → C(T). Many math students find the idea of fractional derivatives quite intriguing: what is the meaning of
∂_x^α f for α ∈ R,
or even the absolute value of the derivative,
|∂_x| f?
What about some other operators which arise from formally solving the heat or Schrödinger equations,
G_t f = e^{t∂_x²} f and U_t f = e^{−it∂_x²} f?
These operators are difficult to interpret on the physical side, but we can define them easily by using the Fourier transform.
Given a function m defined on {2πik : k ∈ Z} we can define the Fourier multiplier operator
m(∂_x)f := ∑_{k∈Z} m(2πik) f̂(k) e^{2πikx}.
The operator makes sense on any function f whose Fourier coefficients decay sufficiently fast that m(2πik)f̂(k) is square summable over the integers.
For example, for the fractional derivatives above, if
|k|^α f̂ ∈ ℓ²(Z)
then the fractional derivative is well defined in L²:
∂_x^α f ∈ L²(T).
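As a sketch of this multiplier calculus on trigonometric polynomials: applying a half-derivative multiplier (2πik)^{1/2} twice reproduces the full derivative multiplier 2πik. The branch of the square root and the sample coefficients are assumptions of the illustration.

```python
import math, cmath

# Trig polynomial f given by its Fourier coefficients (invented for the sketch).
fhat = {1: 1.0, -1: 1.0, 3: 0.5j}

def multiplier(m, coeffs):
    # m(d/dx) acts diagonally on Fourier coefficients: c_k -> m(k) c_k
    return {k: m(k) * c for k, c in coeffs.items()}

half = lambda k: cmath.sqrt(2j * math.pi * k)   # (2 pi i k)^(1/2), principal branch
deriv = lambda k: 2j * math.pi * k              # symbol of d/dx

twice = multiplier(half, multiplier(half, fhat))  # half-derivative applied twice
exact = multiplier(deriv, fhat)                   # ordinary derivative

err = max(abs(twice[k] - exact[k]) for k in fhat)
```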
4. Fourier analysis on R
In this section I will introduce some ideas of Fourier analysis on Euclidean
space. The situation here is more complicated than on the torus and we need
some more sophisticated functional analysis. In this case we cannot view the
Fourier transform as simply a choice of Schauder basis for L2 (R), although
there are certainly philosophical similarities.
The functional analytic basis of the Fourier transform on R starts with the
space of Schwartz functions which is a metrizable topological vector space.
4.1. Schwartz functions. We introduce the space of Schwartz functions
S(R), a space which is well suited for defining the Fourier transform. The
space is defined by a collection of semi-norms.
Definition 4.1. If V is a vector space then a map [·] : V → [0,∞) is called a semi-norm if it satisfies:
(1) (Non-negativity) For all x ∈ V we have [x] ≥ 0.
(2) (Scaling) For any α ∈ R (or C) and x ∈ V we have [αx] = |α|[x].
(3) (Triangle inequality) For any x, y ∈ V we have
[x + y] ≤ [x] + [y].
A semi-norm satisfies all the properties of a norm except that the semi-norm of some nonzero vectors may be zero. Of course, as we have seen, one can mod out by the equivalence relation induced by [x − y] = 0, but that is not actually our intention here.
We introduce a sequence of semi-norms which measure the differentiability and decay of a function f : R → C. The Schwartz semi-norms are defined by
[f]_{k,ℓ} = sup_{x∈R} (1 + |x|²)^{k/2} |f^{(ℓ)}(x)| for 0 ≤ k, ℓ < ∞.
Note that these are actually norms for ℓ = 0, but for ℓ > 0 they are only semi-norms. In plain language: if [f]_{k,ℓ} < +∞ then f is ℓ-times differentiable and the ℓ-th derivative is bounded and decays at least as fast as (1 + |x|²)^{−k/2}.
Now we define the Schwartz space
S(R) = {f : R → C : [f]_{k,ℓ} < +∞ for all 0 ≤ k, ℓ < +∞}.
It is not hard to check that this is a vector space. We define a notion of convergence on S(R): we say a sequence f_n → f in S(R) if
(4.1) [f_n − f]_{k,ℓ} → 0 as n → ∞ for all k, ℓ.
We can actually define a metric which produces this notion of convergence, making S(R) a metric space:
d_{S(R)}(f, g) = ∑_{k,ℓ=0}^∞ 2^{−k−ℓ} [f − g]_{k,ℓ} / (1 + [f − g]_{k,ℓ}).
This notion of convergence makes S(R) a complete metrizable topological vector space, i.e. it is a Fréchet space.
Exercise 4.2. Check that d_{S(R)} is a metric and that f_n → f in the sense of (4.1) if and only if f_n → f in d_{S(R)}. Check that S(R) is complete in this metric.
The key property of the Schwartz space in relation to the Fourier transform is that S(R) is closed under the operations of products, differentiation, convolution, and multiplication by x.
Lemma 4.3. If f, g ∈ S(R) then f g ∈ S(R), f ∗ g ∈ S(R), f′ ∈ S(R), and xf ∈ S(R).
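To make the seminorms concrete, here is a small numerical sketch (an addition, not part of the notes) which approximates [f]_{k,ℓ} for the Gaussian f(x) = e^{−x²}, the standard example of a Schwartz function. The derivative recursion f^{(ℓ)}(x) = p_ℓ(x) e^{−x²} is exact; only the supremum over a finite grid is an approximation.

```python
import numpy as np

def gaussian_seminorm(k, ell, xs):
    """Approximate the Schwartz seminorm [f]_{k,ell} for f(x) = exp(-x^2).

    Uses the exact recursion f^{(ell)}(x) = p_ell(x) * exp(-x^2), where
    p_0 = 1 and p_{ell+1} = p_ell' - 2 x p_ell (polynomials in x).
    """
    p = np.array([1.0])                             # coefficients, lowest degree first
    for _ in range(ell):
        dp = np.polynomial.polynomial.polyder(p)    # p'
        shifted = np.concatenate([[0.0], p])        # x * p
        n = max(len(dp), len(shifted))
        p = (np.pad(dp, (0, n - len(dp)))
             - 2.0 * np.pad(shifted, (0, n - len(shifted))))
    deriv = np.polynomial.polynomial.polyval(xs, p) * np.exp(-xs**2)
    return np.max((1.0 + xs**2) ** (k / 2) * np.abs(deriv))

xs = np.linspace(-10.0, 10.0, 100001)
# every seminorm of the Gaussian is finite: it is a Schwartz function
vals = {(k, ell): gaussian_seminorm(k, ell, xs) for k in range(4) for ell in range(4)}
print(vals[(0, 0)])   # sup of e^{-x^2} is 1, attained at x = 0
```

Here the grid [−10, 10] is wide enough that the Gaussian tail is negligible; for a slowly decaying non-Schwartz function the same computation would blow up as k grows.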
4.2. Fourier transform on Schwartz space. The reason for the introduction of Schwartz space is that the Fourier transform acts extremely nicely
on S(R), in particular the Fourier transform of a Schwartz function is another Schwartz function.
For an f ∈ S(R) we define the Fourier transform
\[ \hat{f}(\xi) = \int_{\mathbb{R}} e^{-2\pi i \xi x} f(x) \, dx. \]
This integral is actually well defined for any f ∈ L1(R), and so it is certainly well defined on Schwartz functions. We also introduce an alternative notation which is useful when we want to view the Fourier transform as a linear operator on Schwartz space,
\[ \mathcal{F}f = \hat{f}. \]
The Fourier transform interacts nicely with many of the symmetries of R.
Proposition 4.4. If f, g ∈ S(R) and α ∈ C then:
(1) (Linearity) The Fourier Transform is linear
F[f + αg] = Ff + αFg.
(2) (Translation) Translation on the physical side corresponds to modulation by a character on the Fourier side
\[ \mathcal{F}[f(x + x_0)](\xi) = e^{2\pi i \xi x_0} \hat{f}(\xi). \]
(3) (Modulation) Modulation by a character on the physical side corresponds to translation on the Fourier side
\[ \mathcal{F}[e^{2\pi i \xi_0 x} f(x)](\xi) = (\mathcal{F}f)(\xi - \xi_0). \]
(4) (Differentiation) Differentiation on the physical side corresponds to
multiplication by ξ on the Fourier side
\[ \mathcal{F}[f'](\xi) = 2\pi i \xi \, \mathcal{F}[f](\xi). \]
(5) (Multiplication by x) Multiplication by x on the physical side corresponds to differentiation on the Fourier side
\[ \mathcal{F}(xf)(\xi) = -\frac{1}{2\pi i} (\mathcal{F}f)'(\xi). \]
(Note the sign: differentiating \hat{f}(\xi) = \int e^{-2\pi i \xi x} f(x)\,dx in \xi brings down a factor of -2\pi i x.)
(6) (Products) Products on the physical side correspond to convolutions on the Fourier side
\[ \mathcal{F}(fg)(\xi) = (\mathcal{F}[f] * \mathcal{F}[g])(\xi) = \int_{\mathbb{R}} \hat{f}(\xi - \eta)\, \hat{g}(\eta) \, d\eta. \]
(7) (Convolutions) Convolutions on the physical side correspond to products on the Fourier side
F(f ∗ g) = (Ff )(Fg).
Corollary 4.5. The Fourier transform of a Schwartz function is a Schwartz
function.
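As an illustration (an addition, not from the notes), one can check this convention numerically on the Gaussian f(x) = e^{−πx²}, which under this normalization is its own Fourier transform, and test property (4) at the same time. The integral is approximated by a Riemann sum on a fine grid.

```python
import numpy as np

# Numerical sanity check of the convention  fhat(xi) = int e^{-2 pi i xi x} f(x) dx
# on the Gaussian f(x) = exp(-pi x^2), which is its own Fourier transform,
# and of property (4): F[f'](xi) = 2 pi i xi F[f](xi).
xs = np.linspace(-8.0, 8.0, 2**14 + 1)
dx = xs[1] - xs[0]
f = np.exp(-np.pi * xs**2)

def fourier(values, xi):
    """Riemann-sum approximation of the Fourier transform at frequency xi."""
    return np.sum(np.exp(-2j * np.pi * xi * xs) * values) * dx

xis = np.linspace(-3.0, 3.0, 25)
fhat = np.array([fourier(f, xi) for xi in xis])
err_gauss = np.max(np.abs(fhat - np.exp(-np.pi * xis**2)))

fprime = -2.0 * np.pi * xs * f                 # exact derivative of the Gaussian
err_deriv = np.max(np.abs(
    np.array([fourier(fprime, xi) for xi in xis]) - 2j * np.pi * xis * fhat))
print(err_gauss, err_deriv)
```

Because the Gaussian is Schwartz, both the truncation to [−8, 8] and the Riemann-sum discretization contribute only negligible error, so both residuals are tiny.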
5. Lp -spaces and other notions of convergence in Measure
Theory
5.1. Notions of convergence in measure theory. Given a measurable domain Ω ⊂ R^n and a sequence of measurable functions
f_n : Ω → R
we have already seen several notions of convergence fn → f as n → ∞.
Often in measure theory we will identify functions which are equal almost everywhere (a.e.), that is m({x : f(x) ≠ g(x)}) = 0, and many of the notions of convergence will only uniquely identify the limit up to a set of measure zero.
The notions of pointwise and uniform convergence are by now well known to us; we restate them for context.
Definition 5.1. A sequence fn → f pointwise in Ω if
|fn (x) − f (x)| → 0 as n → ∞ for all x ∈ Ω.
Definition 5.2. A sequence fn → f uniformly in Ω if
\[ \sup_{x \in \Omega} |f_n(x) - f(x)| \to 0 \text{ as } n \to \infty. \]
As we know uniform convergence is a very strong notion of convergence
and implies pointwise convergence.
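A standard illustration of the gap between the two notions (an assumed example, not taken from the notes) is f_n(x) = xⁿ on [0, 1): it converges pointwise to 0, but the supremum over [0, 1) stays equal to 1 for every n. On a finite grid we can only approximate that supremum, so the sketch below shows it staying near 1 rather than exactly 1.

```python
import numpy as np

# f_n(x) = x^n on [0, 1): pointwise limit is 0, but sup_{[0,1)} |f_n| = 1
# for all n, so convergence is not uniform. The grid sup only approximates
# the true sup, so it sits near (not at) 1 for moderate n.
xs = np.linspace(0.0, 1.0, 10001)[:-1]                     # grid points in [0, 1)

limit_at_half = [0.5 ** n for n in (10, 100, 200)]         # pointwise: -> 0
sup_norms = [np.max(xs ** n) for n in (1, 10, 100, 1000)]  # stays near 1
print(limit_at_half[-1], sup_norms[-1])
```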
Next we have a measure theoretic version of pointwise convergence, called
convergence pointwise almost everywhere, which we have already introduced
in class.
Definition 5.3. A sequence fn → f pointwise a.e. in Ω if there is a set
E ⊂ Ω with m(Ω \ E) = 0 such that
|fn (x) − f (x)| → 0 as n → ∞ for all x ∈ E.
Naturally this notion is weaker than pointwise convergence, but it is still
sufficient to apply the important integral convergence theorems of measure
theory: the monotone convergence theorem and the dominated convergence
theorem.
Another natural notion of convergence is L1-convergence.
Definition 5.4. A sequence fn of absolutely integrable functions converges
to f in L1 on Ω if
\[ \|f_n - f\|_{L^1} = \int_{\Omega} |f_n - f| \to 0 \text{ as } n \to \infty. \]
The relationship between pointwise a.e. convergence and L1 convergence is less clear cut than the relationship between the previous notions. We have some information; the following is a corollary of the dominated convergence theorem:
Corollary 5.5. If fn : Ω → R are absolutely integrable and fn → f pointwise a.e. in Ω and there exists F absolutely integrable such that |fn | ≤ F
for all n then fn → f in L1 .
What about the other direction? Does convergence in L1 imply convergence pointwise a.e.? The answer is no in general, but it turns out that if ‖fn − f‖_{L1} → 0 fast enough then pointwise a.e. convergence does happen.
Example 5.6 (Typewriter/Piano sequence). Let's start with an example where convergence in L1 does not imply convergence pointwise almost everywhere. Define
\[ f_{j,k}(x) = 1_{[j2^{-k},\,(j+1)2^{-k}]}(x) \quad \text{for } 0 \le j < 2^k. \]
For each k this is a sequence of indicators of dyadic intervals which traverses [0, 1] from left to right. For every k and every x ∈ [0, 1] there is an appropriate 0 ≤ j < 2^k such that f_{j,k}(x) = 1.
Now for n ∈ N define k(n) to be the largest natural number so that 2^{k} ≤ n, define j(n) = n − 2^{k(n)} (so that 0 ≤ j(n) < 2^{k(n)}), and then
\[ g_n(x) = f_{j(n),k(n)}(x). \]
This sequence traverses the dyadic subintervals of [0, 1] of length 2^{-k} from left to right, and then starts again at the left traversing the dyadic subintervals of length 2^{-k-1}.
With some additional checking we can see that this sequence does converge to zero in L1, since ‖g_n‖_{L1} = 2^{-k(n)} → 0, but not pointwise almost everywhere, since for every x ∈ [0, 1] we have g_n(x) = 1 for infinitely many n.
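The typewriter example can be sketched in code; the indexing n = 2^k + j with 0 ≤ j < 2^k is one standard choice for enumerating the blocks.

```python
# Sketch of the typewriter sequence: for n >= 1 write n = 2^k + j with
# 0 <= j < 2^k, and let g_n be the indicator of [j 2^{-k}, (j+1) 2^{-k}].
def block_index(n):
    k = n.bit_length() - 1        # largest k with 2^k <= n
    j = n - 2**k
    return j, k

def g(n, x):
    j, k = block_index(n)
    return 1.0 if j * 2.0**-k <= x <= (j + 1) * 2.0**-k else 0.0

def l1_norm(n):
    _, k = block_index(n)
    return 2.0**-k                # exact: the interval has length 2^{-k}

# L1 norms tend to 0, yet any fixed point is hit by every dyadic generation:
x0 = 0.3
hits = [n for n in range(1, 1025) if g(n, x0) == 1.0]
print(l1_norm(1024), len(hits))
```

For n up to 1024 the point x0 = 0.3 is covered once per generation k = 0, …, 9 (generation k = 10 has only its first interval present), so g_n(x0) = 1 occurs ten times even though the L1 norms have decayed to 2^{-10}.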
Now we show that fast L1 convergence implies pointwise a.e. convergence.
Lemma 5.7. Suppose that fn : Ω → R are absolutely integrable and
\[ \sum_{n=1}^{\infty} \|f_{n+1} - f_n\|_{L^1(\Omega)} < +\infty, \]
then fn converges pointwise a.e. and in L1 to an absolutely integrable function f : Ω → R.
Proof. Our idea is to define the limit f as the pointwise a.e. limit of the telescoping sums
\[ (5.1)\qquad f_n = f_1 + \sum_{j=1}^{n-1} (f_{j+1} - f_j), \]
but we need to establish that this series actually does converge pointwise
almost everywhere.
We start by showing that the series is absolutely summable pointwise a.e.; define
\[ g(x) = |f_1(x)| + \sum_{j=1}^{\infty} |f_{j+1}(x) - f_j(x)|; \]
this definition makes sense for each x ∈ Ω as an element of [0, +∞] since it is a sum of non-negative terms. By the summability assumption of the lemma and the monotone convergence theorem, g is absolutely integrable on Ω:
\[ \int_{\Omega} g = \|f_1\|_{L^1} + \sum_{j=1}^{\infty} \|f_{j+1} - f_j\|_{L^1} < +\infty. \]
Absolutely integrable functions are finite almost everywhere, so m({g =
+∞}) = 0.
Thus the following series does converge pointwise on the set {g < +∞}, which has full measure in Ω:
\[ f(x) := f_1(x) + \sum_{j=1}^{\infty} (f_{j+1}(x) - f_j(x)); \]
in particular fn → f pointwise a.e. on Ω. We know f is absolutely integrable
because |f | ≤ g.
We also need to show that fn → f in L1 norm. Note that |fn − f| ≤ g for every n (apply the triangle inequality to the tail of the series (5.1)), so by the dominated convergence theorem
\[ \int_{\Omega} |f_n - f| \to 0 \text{ as } n \to \infty. \]
Theorem 5.8. For Ω ⊂ R^n measurable, (L^1(Ω), ‖·‖_{L^1(Ω)}) is complete.

Proof. Suppose (f_n)_{n=1}^{∞} is a Cauchy sequence in L^1(Ω). If (f_n)_{n=1}^{∞} has a convergent subsequence then the whole sequence converges, so we would be done.
Because sup_{n,m≥N} ‖f_n − f_m‖_{L^1} → 0 as N → ∞, we can choose a subsequence f_{n_j} so that
\[ \sum_{j=1}^{\infty} \|f_{n_{j+1}} - f_{n_j}\|_{L^1} < +\infty. \]
Now we can simply apply the result of Lemma 5.7 to the subsequence.
Extremely similar arguments work to show that Lp (Ω → R) is complete
for any 1 ≤ p < ∞.
5.2. Continuous functions of compact support are dense in L1 . In
this section we will show that continuous functions of compact support are
dense in L1 (Rn → R). This is also true for Lp (Rn → R) for every 1 ≤ p <
+∞. I will outline the ideas and leave most of the details as an exercise for
the homework.
Here is the outline: we will first show that indicator functions of finite
measure sets can be approximated in L1 norm by continuous compactly
supported functions, then we will use the density of simple functions, which
we already know, to conclude.
5.3. Generalizations of uniform convergence. One may also think of
trying to make generalizations of the notion of uniform convergence, mixing
in some of the philosophy of measure theory (i.e. sets of small or zero
measure can be ignored). For example, instead of the smallest upper bound, the supremum, one may measure the smallest upper bound which is valid on a set of full measure.
Let f : Ω → R; first we re-interpret the classical supremum sup_Ω f:
\[ \sup_{\Omega} f = \inf\{ a \in \mathbb{R} : \{f > a\} = \emptyset \}. \]
This motivates the following definition
Definition 5.9. Define the essential supremum of a measurable function f : Ω → R by
\[ \operatorname*{ess\,sup}_{\Omega} f = \inf\{ a \in \mathbb{R} : m(\{f > a\}) = 0 \} \]
and the essential infimum
\[ \operatorname*{ess\,inf}_{\Omega} f = \sup\{ a \in \mathbb{R} : m(\{f < a\}) = 0 \}. \]
This allows us to define a measure theoretic analogue of the supremum norm
\[ \|f\|_{L^\infty} = \operatorname*{ess\,sup}_{\Omega} |f|. \]
The associated normed space is defined
\[ L^\infty(\Omega \to \mathbb{R}) = \{ f : \Omega \to \mathbb{R} \mid f \text{ measurable and } \|f\|_{L^\infty(\Omega)} < +\infty \}. \]
This is an example of a non-separable Banach space. The proof of nonseparability is very similar to one we already saw on the homework for the
space `∞ (N → R).
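A rough numerical illustration (an assumed example, not from the notes) of the difference between the supremum and the essential supremum: changing a function at a single point moves the supremum but not the essential supremum. Here m({f > a}) is crudely approximated by counting grid points, each carrying weight dx, so this is only a sketch of the measure-theoretic statement.

```python
import numpy as np

# Modifying a function at one point changes its sup but not its ess-sup.
# We approximate m({f > a}) by the total weight of grid points exceeding a
# (a crude stand-in for Lebesgue measure; one altered grid point has
# weight dx, which tends to 0 as the grid is refined).
xs = np.linspace(0.0, 1.0, 100001)
dx = xs[1] - xs[0]
f = xs * (1.0 - xs)              # sup = 1/4, attained at x = 1/2
f_mod = f.copy()
f_mod[0] = 100.0                 # change the value at a single point

def approx_measure_above(values, a):
    return dx * np.count_nonzero(values > a)

sup_mod = f_mod.max()                              # jumps to 100.0
m_above_quarter = approx_measure_above(f_mod, 0.25)  # ~ dx, i.e. nearly 0
print(sup_mod, m_above_quarter)
```

In the limit of grid refinement the set {f_mod > 1/4} is the single altered point, which has Lebesgue measure zero, so the essential supremum of f_mod is still 1/4.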
There are some other generalizations of uniform convergence, which (to
be honest) are not as useful as they seem. However they do come up occasionally so it is good to be aware of them.
Definition 5.10. A sequence of measurable functions fn : Ω → R converges
almost uniformly to a function f if for all δ > 0 there is a set E with
m(E) ≤ δ such that fn → f uniformly on Ω \ E.
This notion of convergence looks stronger than pointwise a.e. convergence, but on domains of finite measure it turns out to be equivalent.

Theorem 5.11 (Egorov's Theorem). Suppose that m(Ω) < +∞ and that fn : Ω → R are measurable and converge pointwise a.e. to f : Ω → R; then fn → f almost uniformly.
Proof. We start out with a trick that we have seen before in the sharp Riemann integrability criterion: starting with the set where pointwise convergence holds, which has full measure, we write it as a countable union of sets where quantified convergence holds. Define
\[ N(x, \varepsilon) = \sup\{ n : |f_n(x) - f(x)| \ge \varepsilon \}. \]
Since fn → f pointwise a.e. we know that there is a set F ⊂ Ω of full measure on which N(x, ε) < +∞ for all ε > 0.
Thus, by the downward monotone continuity of the measure (here we use m(Ω) < +∞),
\[ m(\{x : N(x, \varepsilon) > n\}) \to 0 \text{ as } n \to \infty. \]
For each k choose M(k) so that
\[ m(\{x : N(x, \tfrac{1}{k}) > M(k)\}) \le \delta 2^{-k}. \]
Then define
\[ E = \bigcup_{k=1}^{\infty} \{x : N(x, \tfrac{1}{k}) > M(k)\}. \]
By countable subadditivity
m(E) ≤ δ.
We claim that fn → f uniformly on Ω \ E. Let ε > 0; there exists k such that 1/k ≤ ε. Since
\[ \Omega \setminus E \subset \{x : N(x, \tfrac{1}{k}) \le M(k)\}, \]
for n > M(k) we have n > N(x, 1/k) for all x ∈ Ω \ E, and so
\[ |f_n(x) - f(x)| < \frac{1}{k} \le \varepsilon. \]
5.4. Appendix. Here we provide some additional notes and alternative
proofs on the material of the section.
We will start with an alternative “more hands on” proof of Lemma 5.7.
We state a very important lemma which will be useful in the upcoming proof.
In fact it is highly useful in many situations and it is worth remembering
the argument.
Lemma 5.12 (Markov/Chebyshev Inequality). Suppose that f : Ω → [0, ∞] and λ ∈ (0, ∞); then
\[ m(\{x \in \Omega : f(x) > \lambda\}) \le \frac{1}{\lambda} \int_{\{f > \lambda\}} f. \]
Proof. We simply note that f χ_{\{f>\lambda\}} \ge \lambda χ_{\{f>\lambda\}} and then compute
\[ m(\{f > \lambda\}) = \int_{\Omega} \chi_{\{f>\lambda\}} \le \frac{1}{\lambda} \int_{\Omega} f \chi_{\{f>\lambda\}}. \]
It is also worth noting that \int_{\{f>\lambda\}} f \le \int_{\Omega} f, since f is non-negative, and this form of the inequality is often sufficient.
Now we show that fast L1 convergence implies pointwise a.e. convergence.
Lemma 5.13. Suppose that fn : Ω → R are absolutely integrable and
\[ \sum_{n=1}^{\infty} \|f_n - f\|_{L^1(\Omega)} < +\infty; \]
then fn converges pointwise a.e. in Ω to f.
Proof. Define
\[ E = \{x \in \Omega : \inf_{N} \sup_{n \ge N} |f_n(x) - f(x)| > 0\}; \]
this is the set of points where fn(x) is NOT convergent to f(x); we aim to show this set has measure zero. We start out with a trick that we have seen before in the sharp Riemann integrability criterion and Egorov's Theorem, writing E as a countable union of sets of points where fn is quantitatively far from f. Note that
\[ E = \bigcup_{\delta \in 1/\mathbb{N}} E_\delta \]
where 1/\mathbb{N} = \{1/k : k \in \mathbb{N}\} and
\[ E_\delta = \{x \in \Omega : \inf_{N} \sup_{n \ge N} |f_n(x) - f(x)| \ge \delta\}. \]
If we can show that m(Eδ ) = 0 for all δ > 0 then we are done since E will
be a countable union of sets of measure zero, and hence it will have measure
zero.
We can bound E_δ by
\[ E_\delta \subset \bigcap_{N=1}^{\infty} \bigcup_{n \ge N} \{x \in \Omega : |f_n(x) - f(x)| > \delta/2\}, \]
since, for each N, \sup_{n \ge N} |f_n(x) - f(x)| \ge \delta implies |f_n(x) - f(x)| > \delta/2 for some n \ge N. By Markov's inequality
\[ m(\{x \in \Omega : |f_n(x) - f(x)| > \delta/2\}) \le \frac{2}{\delta} \|f_n - f\|_{L^1(\Omega)}. \]
Thus by sub-additivity, for any N \ge 1,
\[ m(E_\delta) \le m\Big(\bigcup_{n \ge N} \{x \in \Omega : |f_n(x) - f(x)| > \delta/2\}\Big) \le \frac{2}{\delta} \sum_{n=N}^{\infty} \|f_n - f\|_{L^1(\Omega)}. \]
By the summability assumption in the Lemma the right hand side in the
above inequality converges to zero as N → ∞. Thus m(Eδ ) = 0.
This argument should be familiar from Homework 5 problem 4(b). If you
read about the Borel-Cantelli lemma in the book, which we did not use in
class, that Lemma could have been cited to skip some parts of the proof.
5.5. Other important theorems of measure theory. There are some
additional important theorems of measure theory which we will not prove
in the class. Nonetheless they are useful to know and will allow us to prove
some of the functional analytic results we are interested in on the Lp spaces.
Theorem 5.14 (Lebesgue Differentiation Theorem). Suppose that f : R^d → R is absolutely integrable on any bounded subset of R^d, i.e. f is locally integrable. Then there exists a set of measure zero E ⊂ R^d such that for all x ∈ R^d \ E
\[ f(x) = \lim_{r \to 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} f(y) \, dy. \]
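In one dimension, the theorem can be illustrated numerically (an assumed example with a continuous f, for which the statement in fact holds at every point): the averages of f(y) = cos(y) over shrinking balls B(x, r) converge to f(x).

```python
import numpy as np

# Sketch of the Lebesgue differentiation theorem in one dimension:
# averages of a (continuous) function over shrinking balls B(x, r)
# converge to the value at the center.
def ball_average(f, x, r, n=20001):
    ys = np.linspace(x - r, x + r, n)
    return np.mean(f(ys))        # ~ (1/m(B(x,r))) * int_{B(x,r)} f

x0 = 0.7
errors = [abs(ball_average(np.cos, x0, r) - np.cos(x0))
          for r in (0.5, 0.05, 0.005)]
print(errors)
```

For smooth f the error of the ball average is of order r², so shrinking r by a factor of 10 should shrink the error by roughly a factor of 100, which the printed values reflect.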