Notes on Real Analysis II

Real Analysis II
John Loftin∗
May 4, 2013
1  Spaces of functions

1.1  Banach spaces
Many natural spaces of functions form infinite-dimensional vector spaces.
Examples are the space of polynomials and the space of smooth functions.
If we are interested in solving differential equations, then, it is important to
understand analysis in infinite-dimensional vector spaces (over R or C).
First of all, we should recognize the following straightforward fact about
finite-dimensional vector spaces:
Homework Problem 1. Let $x = (x^1, \dots, x^m)$ denote a point in $\mathbb{R}^m$, and let $\{x_n\} = \{(x^1_n, \dots, x^m_n)\}$ be a sequence of points in $\mathbb{R}^m$. Then $x_n \to x$ if and only if $x^i_n \to x^i$ for all $i = 1, \dots, m$.

(Recall the standard metric on $\mathbb{R}^m$ is given by $|x - y|$, where the norm $|\cdot|$ is given by $|x| = \sqrt{(x^1)^2 + \cdots + (x^m)^2}$.)
Thus for taking limits in $\mathbb{R}^m$, we could even dispense with taking limits using the metric on $\mathbb{R}^m$, and simply define $x_n \to x$ by $x^i_n \to x^i$ for each $i = 1, \dots, m$. This reflects the fact that there is only one natural topology on a finite-dimensional vector space: that given by the standard norm.
∗ Partially supported by NSF Grant DMS-0405873.

For infinite-dimensional vector spaces, say with a countable basis, so that $x = (x^1, x^2, \dots)$, it is possible to define a topology by $x_n \to x$ if and only if each $x^i_n \to x^i$. It turns out that this is not usually the most useful way to define limits in infinite-dimensional spaces, however (though a related construction is used in defining the topology of Fréchet spaces).
Finite-dimensional vector spaces are also all complete with respect to
their standard norm (in other words, they are all Banach spaces). Given a
norm on an infinite dimensional vector space, completeness must be proved,
however. There are many examples of Banach function spaces: on a measure space, the $L^p$ spaces of functions are all Banach spaces for $1 \le p \le \infty$. Also, on a metric space $X$, the space of all bounded continuous functions $C^0(X)$ is a Banach space under the norm
$$\|f\|_{C^0(X)} = \sup_{x \in X} |f(x)|.$$
The $L^p$ and $C^0$ spaces form the basis of most other useful Banach spaces, with extensions typically provided by measuring not just the functions themselves, but also their partial derivatives (as in Sobolev and $C^k$ spaces) or their difference quotients (Hölder spaces).
Completeness of a metric space of course means that any Cauchy sequence
has a unique limit. More roughly, this means that any sequence that should
converge, in that its elements are becoming infinitesimally close to each other,
will converge to a limit in the space. As we will see, taking such limits is
a powerful way to construct solutions to analytic problems. Unfortunately,
many of the most familiar spaces of functions (such as smooth functions) do
not have the structure of a Banach space, and so it is difficult to ensure that
a given limit of smooth functions is smooth. In fact we have the following
theorem, which we state without proof:
Theorem 1. On $\mathbb{R}^n$ equipped with Lebesgue measure, the space $C^\infty_0(\mathbb{R}^n)$ of smooth functions with compact support is dense in $L^p(\mathbb{R}^n)$ for all $1 \le p < \infty$.

In other words, the completion of the space of smooth functions with compact support on $\mathbb{R}^n$ with respect to the $L^p$ norm is simply the space of all $L^p$ functions for $1 \le p < \infty$.
If we are working in L2 , for example, it is possible for the limit of smooth
functions to be quite non-smooth: there are many L2 functions which are
discontinuous everywhere. This poses a potential problem if the limit we
have produced is supposed to be a solution to a differential equation. In
particular, such a limit may be nowhere differentiable. Some of our goals then
are to understand (1) how to make sense of taking derivatives of functions
which are not classically differentiable (the theory of distributions and weak
derivatives), and (2) how to show that a limit function actually has enough
derivatives to solve the equation (bootstrapping).
Theorem 1 reminds us that the Lp Banach spaces have a very large overlap, which of course includes many more functions than the smooth functions
with compact support. In particular, it is often useful to take the point of
view that these Banach function spaces are not so much different spaces but
different tools to study either the space of all functions or (via the completion process) the space of only very nice functions (e.g., smooth functions of
compact support).
In particular, two function spaces which are very closely related to each
other are L∞ and C 0 . As we will see below, they have essentially the same
norm. First of all, we show that C 0 (X) is a Banach space for any metric
space X.
1.2  The Banach space $C^0$
Given a metric space $X$, define
$$C^0(X) = \{f \colon X \to \mathbb{R} : f \text{ is continuous and } \sup_X |f| < \infty\}.$$
Define the norm
$$\|f\|_{C^0(X)} = \sup_X |f|.$$
It is straightforward to verify that $\|\cdot\|_{C^0}$ satisfies the requirements for a norm:

• $\|f\|_{C^0} = 0 \iff f \equiv 0$,

• $\|\lambda f\|_{C^0} = |\lambda| \, \|f\|_{C^0}$,

• $\|f + g\|_{C^0} \le \|f\|_{C^0} + \|g\|_{C^0}$.

Remark. If $f_i \to f$ in $C^0(X)$, then we say $f_i \to f$ uniformly on $X$, and $C^0(X)$ convergence is the same as uniform convergence.
The main thing to check is that the norm gives C 0 (X) the structure of a
complete metric space:
Proposition 1. For any metric space $X$, $C^0(X)$ is a Banach space with norm $\|\cdot\|_{C^0}$.
Proof. We simply need to check that the metric induced on $C^0(X)$ is complete. Let $d$ denote the metric on $X$, and consider a Cauchy sequence $\{f_i\} \subset C^0(X)$. In other words, for all $\epsilon > 0$, there is an $N$ so that $n, m > N$ implies $\|f_n - f_m\|_{C^0} < \epsilon$. By the definition of the norm, this is equivalent to $|f_n(x) - f_m(x)| < \epsilon$ for all $x \in X$. Now for each $x \in X$, $\{f_i(x)\} \subset \mathbb{R}$ is a Cauchy sequence, and since $\mathbb{R}$ is complete, there is a limit $f_\infty(x) = \lim_i f_i(x)$.

Now we have produced a limit function $f_\infty$. We need to show that $\|f_i - f_\infty\|_{C^0} \to 0$ and $f_\infty \in C^0(X)$. The first statement is straightforward: for all $\epsilon > 0$, there is an $N$ so that for all $n, m > N$ and all $x \in X$,
$$|f_n(x) - f_m(x)| < \epsilon.$$
Now let $m \to \infty$ to see that
$$|f_n(x) - f_\infty(x)| \le \epsilon.$$
So we have that for all $\epsilon > 0$, there is an $N$ so that for all $n > N$ and all $x \in X$,
$$|f_n(x) - f_\infty(x)| \le \epsilon.$$
Since this is true for all $x \in X$, we have
$$\|f_n - f_\infty\|_{C^0} = \sup_{x \in X} |f_n(x) - f_\infty(x)| \le \epsilon,$$
and so $\|f_i - f_\infty\|_{C^0} \to 0$.

We still need to prove that the limit function $f_\infty$ is continuous. So let $x \in X$ and choose $\epsilon > 0$. Then there is an $N$ so that for $n > N$, $\|f_n - f_\infty\|_{C^0} < \epsilon$. By the previous paragraph and the definition of $\|\cdot\|_{C^0}$,
$$|f_n(x) - f_\infty(x)| < \epsilon \quad\text{and}\quad |f_n(y) - f_\infty(y)| < \epsilon \text{ for all } y \in X.$$
Choose a particular $n > N$; since $f_n$ is continuous at $x$, there is a $\delta > 0$ so that $|f_n(x) - f_n(y)| < \epsilon$ for $y$ with $d(x, y) < \delta$. Then for such $y$ in a $\delta$-ball around $x$,
$$\begin{aligned}
|f_\infty(x) - f_\infty(y)| &= | [f_\infty(x) - f_n(x)] + [f_n(x) - f_n(y)] + [f_n(y) - f_\infty(y)] | \\
&\le |f_\infty(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f_\infty(y)| \\
&< \epsilon + \epsilon + \epsilon = 3\epsilon.
\end{aligned}$$
So we have proved that for all $\epsilon > 0$ and $x \in X$, there is a $\delta > 0$ so that $d(x, y) < \delta \Rightarrow |f_\infty(x) - f_\infty(y)| < 3\epsilon$. This proves $f_\infty$ is continuous.
The last bit of the proof can be remembered as this: Any uniform limit
of continuous functions is continuous.
Remark. The previous proposition works as well for functions whose range is
the complex numbers C, or a vector space Rn , or in fact any Banach space
B. The proof is the same. In this last case, we could refer to the Banach
space C 0 (X; B) as the Banach space of continuous functions from X into B.
Consider an open set Ω ⊂ Rn . On Ω, the C 0 norm is essentially the same
as the L∞ norm, but is simpler to define because we can consider functions
as elements of C 0 , while we need equivalence classes of functions to define
L∞ . In fact, more is true. Let Ω inherit the standard metric and Lebesgue
measure from Rn . For a measurable function f : Ω → R, let [f ] be the
equivalence class whose members are all functions from Ω → R which agree
with f almost everywhere.
Proposition 2. The map $\Phi \colon C^0(\Omega) \to L^\infty(\Omega)$ given by $\Phi(f) = [f]$ is one-to-one and preserves the norm.
Proof. First of all, note that it follows immediately from the definitions that for $f \in C^0(\Omega)$, $\Phi(f) \in L^\infty(\Omega)$. To show $\Phi$ preserves the norm, we must show $\|f\|_{C^0} = \|\Phi(f)\|_{L^\infty}$.
The proof hinges on the simple fact that every full-measure subset V of Ω
is dense in Ω. (Recall V ⊂ Ω has full measure if Ω \ V has Lebesgue measure
zero.) This fact may be proved as follows: let V ⊂ Ω have full measure.
Then there is no open ball contained in Ω \ V (since open balls have positive
measure). This shows $V$ is dense in $\Omega$. (Question: we need to use that $\Omega$ is an open subset of $\mathbb{R}^n$ in this paragraph. Where did we use that $\Omega$ is open?)
Now we prove the map $\Phi$ is injective. So if $f$ and $g$ are in $C^0(\Omega)$, and $[f] = [g]$, then by definition $f \equiv g$ on a set $V$ of full measure. Let $x \in \Omega$. Since $V$ is dense, there is a sequence $x_n \to x$ with $x_n \in V$. Then
$$f(x) = f(\lim_n x_n) = \lim_n f(x_n) = \lim_n g(x_n) = g(\lim_n x_n) = g(x)$$
since $f$ and $g$ are continuous and $f(x_n) = g(x_n)$. So $f$ and $g$ coincide at each point of $\Omega$, and so $f = g$ in $C^0(\Omega)$.
Finally, we show that for $f \in C^0(\Omega)$, $\|f\|_{C^0} = \|f\|_{L^\infty}$. In particular, let $\mu$ denote Lebesgue measure and compute (recall we often write $\|f\|_{L^\infty}$ instead of the more correct $\|[f]\|_{L^\infty} = \|\Phi(f)\|_{L^\infty}$)
$$\|f\|_{L^\infty(\Omega)} = \inf\{a : |f(x)| \le a \text{ for almost every } x \in \Omega\} = \inf\{a : \mu\{x : |f(x)| > a\} = 0\}.$$
But $\mu\{x : |f(x)| > a\} = 0$ implies that $\{x : |f(x)| > a\} = \emptyset$. (Proof: if the set is not empty, it is a nonempty open subset of $\Omega$ since $|f|$ is continuous. The only open subset of $\Omega$ with measure zero is the empty set.) So now
$$\begin{aligned}
\|f\|_{L^\infty(\Omega)} &= \inf\{a : \mu\{x : |f(x)| > a\} = 0\} \\
&= \inf\{a : \{x : |f(x)| > a\} = \emptyset\} \\
&= \inf\{a : |f(x)| \le a \text{ for all } x \in \Omega\} \\
&= \sup_{x \in \Omega} |f(x)| = \|f\|_{C^0(\Omega)}.
\end{aligned}$$
Remark. The previous Proposition is true for any measurable subset Ω of Rn
with the following property: every nonempty open subset of Ω has positive
measure.
Remark. The map Φ from C 0 (Ω) to L∞ (Ω) is far from being onto. A typical
discontinuous function g cannot be changed on a set of measure zero to be
continuous. The following homework problem is to show this is the case with
the Heaviside function.
Homework Problem 2. Let g(x) be the Heaviside function on R. In other
words, let g(x) = 0 if x < 0 and g(x) = 1 if x ≥ 0.
(a) Show there is no function in C 0 (R) which is equal to g almost everywhere.
(b) Show that there is no sequence of functions fn ∈ C 0 (R) which satisfy
fn → g in L∞ (R).
Hint for (b): Show that if fn → g in L∞ (R), then {fn } is a Cauchy sequence
in C 0 (R). Then use Proposition 1 and show the resulting limit function
f∞ ∈ C 0 (R) must be equal to g almost everywhere. (This amounts to showing
that Φ(C 0 ) is a closed subspace of L∞ .) Provide a contradiction.
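As a numerical hint of what part (b) asserts: the essential sup distance from any continuous function to the Heaviside function is at least $1/2$, since a continuous function cannot be near $0$ just left of the jump and near $1$ just right of it. The symmetric ramp family below is an illustrative choice of continuous approximants, and its sup distance (sampled away from $x = 0$) plateaus near $1/2$:

```python
# Homework Problem 2(b) numerically: continuous ramps never get closer
# than sup distance ~1/2 to the Heaviside function g.
def g(x):
    return 0.0 if x < 0 else 1.0

def ramp(x, n):
    # continuous, rises linearly from 0 to 1 across [-1/n, 1/n]
    return min(1.0, max(0.0, (n * x + 1.0) / 2.0))

for n in (10, 100, 1000):
    grid = [k / 10000.0 for k in range(-10000, 10001) if k != 0]
    print(n, max(abs(ramp(x, n) - g(x)) for x in grid))
```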
1.3  Quantifiers

It is worth taking the time to look in some detail at $C^0$ convergence, and to compare it to pointwise convergence. Recall that $C^0$ convergence is often called uniform convergence.
For a metric space $X$, $f_n \to f$ in $C^0(X)$ if for all $\epsilon > 0$, there is an $N$ so that
$$n > N \implies \|f_n - f\|_{C^0(X)} < \epsilon.$$
In other words, for all $\epsilon > 0$, there is an $N$ so that
$$n > N \implies \sup_{x \in X} |f_n(x) - f(x)| < \epsilon.$$
So then $f_n \to f$ in $C^0(X)$ implies that for all $\epsilon > 0$, there is an $N$ so that for all $x \in X$,
$$n > N \implies |f_n(x) - f(x)| < \epsilon.$$
A few easy manipulations imply in fact the following

Lemma 3. Let $X$ be a metric space and let $f_n \in C^0(X)$. Then $f_n \to f$ in $C^0(X)$ if and only if for every $\epsilon > 0$, there is an $N = N(\epsilon)$ so that for all $x \in X$,
$$n > N \implies |f_n(x) - f(x)| < \epsilon.$$
Homework Problem 3. Prove Lemma 3.
Since $C^0(X)$ is a Banach space, we know that the limit function $f \in C^0(X)$ as well, and thus the uniform limit of continuous functions is continuous. $C^0$ convergence is called uniform convergence because the $N$ in Lemma 3 depends only on $\epsilon > 0$ and not on $x \in X$: thus $N$ is uniform over all $x \in X$.

We contrast this with pointwise convergence. If $f_n$ are functions on $X$, then $f_n \to f$ pointwise if for all $\epsilon > 0$ and $x \in X$, there is an $N = N(\epsilon, x)$ so that
$$n > N \implies |f_n(x) - f(x)| < \epsilon.$$
The difference between pointwise and uniform convergence is subtle but very important: in pointwise convergence $N = N(\epsilon, x)$ may depend on $\epsilon$ and $x$, while in uniform convergence $N = N(\epsilon)$ depends only on $\epsilon$ and is independent of $x$.
We have belabored this point because it is one of the major issues in
analysis: keeping track of which constants, or quantifiers, depend on which
other quantifiers. (It is even better to have explicit bounds (estimates) on the
behavior of quantifiers with respect to each other.) Of course it is desirable
(though not always possible) to have more uniform dependence of quantifiers,
as we see in the following standard example:
We have seen that the uniform limit of continuous functions is continuous. On the other hand, a pointwise limit of continuous functions may not be:
Example 1. Consider $X = [0, 1]$ and $f_n(x) = x^n$. Then $f_n \to f$ pointwise on $[0, 1]$, where
$$f(x) = \begin{cases} 0 & \text{for } x \in [0, 1), \\ 1 & \text{for } x = 1. \end{cases}$$
So the pointwise limit $f$ is discontinuous, and thus we see that $f_n \not\to f$ uniformly.
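Example 1 can be probed numerically: at each fixed $x_0 < 1$ the values $f_n(x_0)$ die off, but the sup distance to the pointwise limit never does. The witness point $x = 2^{-1/n}$ below is an illustrative choice satisfying $f_n(x) = 1/2$ exactly:

```python
# Pointwise vs. uniform convergence for f_n(x) = x^n on [0, 1].
def f_n(x, n):
    return x ** n

# Pointwise: at a fixed x0 < 1, f_n(x0) -> 0, but the N needed for
# f_n(x0) < eps blows up as x0 -> 1 (N depends on the point x0).
for x0 in (0.5, 0.9, 0.99):
    print(x0, f_n(x0, 200))

# Not uniform: at the witness x = 2^(-1/n) we have f_n(x) = 1/2,
# so sup over [0,1) of |f_n(x) - 0| >= 1/2 for every n.
for n in (1, 10, 100, 1000):
    print(n, f_n(2 ** (-1.0 / n), n))  # 0.5 (up to rounding) each time
```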
1.4  Derivatives
The theory of derivatives in one variable is fairly straightforward: if a function
f : R → R is differentiable at p (i.e., f 0 (p) exists), then f must be continuous
at p. For functions of more than one variable, however, consider the following
example:
Example 2. The function
$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{for } (x, y) \ne (0, 0), \\ 0 & \text{for } (x, y) = (0, 0), \end{cases}$$
has first partial derivatives everywhere but is not even continuous at $(0, 0)$.
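A quick numerical look at Example 2 (the difference-quotient step size is an arbitrary choice): both partials at the origin are zero because $f$ vanishes on the axes, yet $f \equiv 1/2$ on the diagonal.

```python
# Example 2 numerically: both partials exist at (0,0), yet f is
# discontinuous there.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x ** 2 + y ** 2)

# Partials at (0,0) via difference quotients: f(h,0) = f(0,h) = 0,
# so both quotients vanish identically.
h = 1e-6
print((f(h, 0.0) - f(0.0, 0.0)) / h)  # 0.0
print((f(0.0, h) - f(0.0, 0.0)) / h)  # 0.0

# But along the diagonal, f(t, t) = 1/2 for every t != 0, which does
# not approach f(0,0) = 0.
for t in (1.0, 1e-3, 1e-9):
    print(t, f(t, t))  # 0.5 each time
```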
Even though $f$ has all its first partial derivatives at $(0, 0)$, we do not consider $f$ to be differentiable at $(0, 0)$. For functions of more than one variable, we introduce the following definition of differentiability, which is stronger than just the existence of all the partial derivatives. Instead of $\mathbb{R}$-valued functions, we consider the slightly more general case of maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. A basic reference is Spivak, Calculus on Manifolds, Chapter 2.

Let $O \subset \mathbb{R}^n$ be a domain, and let $f = (f^1, \dots, f^m) \colon O \to \mathbb{R}^m$. Then $f$ is differentiable at a point $a \in O$ if there is a linear map $Df(a) \colon \mathbb{R}^n \to \mathbb{R}^m$ which satisfies
$$\lim_{h \to 0} \frac{|f(a + h) - f(a) - Df(a)(h)|}{|h|} = 0,$$
where $h \in \mathbb{R}^n$. $Df(a)$ is called the derivative, or total derivative, of $f$ at $a$.
Lemma 4. In terms of the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, $Df(a)$ is written as the Jacobian matrix
$$Df(a) = \left( \frac{\partial f^i}{\partial x^j}(a) \right), \qquad i = 1, \dots, m, \quad j = 1, \dots, n.$$
In particular, if $f$ is differentiable at $a$, then all the partial derivatives $\partial f^i / \partial x^j$ exist at $a$.
Proof. Write $Df(a)$ as the matrix $(\lambda^i_j)$. Also consider a path $h = (0, \dots, k, \dots, 0)$, where $k \to 0$ sits in the $j$th slot. (In other words, $h^l = \delta^l_j k$, where $\delta^l_j$ is the Kronecker delta, which is 1 if $l = j$ and 0 otherwise.) We also use Einstein's summation convention: in $n$-space, any repeated index which appears in both up and down positions — such as the $l$ in the middle lines below — is assumed to be summed from 1 to $n$. Compute
$$\begin{aligned}
\frac{\partial f^i}{\partial x^j}(a) &= \lim_{k \to 0} \frac{f^i(a^1, \dots, a^j + k, \dots, a^n) - f^i(a)}{k} \\
&= \lim_{k \to 0} \frac{[f^i(a^1, \dots, a^j + k, \dots, a^n) - f^i(a) - \lambda^i_l h^l] + \lambda^i_l h^l}{k} \\
&= 0 + \lim_{k \to 0} \frac{\lambda^i_l \delta^l_j k}{k} \\
&= \lambda^i_l \delta^l_j = \lambda^i_j.
\end{aligned}$$
The key step, going from the second to the third line, follows from the assumption that $f$ is differentiable at $a$.
Another important result with essentially the same proof concerns directional derivatives. For a vector $v = (v^j) \in \mathbb{R}^n$, the directional derivative at $a$ in the direction $v$ is the vector
$$D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t}.$$
(Note we do not require $\|v\| = 1$ to define the directional derivative.) We have the following lemma:

Lemma 5. If $f$ is differentiable at $a$, then the directional derivative $D_v f(a)$ exists and
$$D_v f(a) = v^j \frac{\partial f}{\partial x^j}.$$
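Lemma 5 can be sanity-checked numerically: for a sample smooth map (my choice, not from the notes), the difference quotient defining $D_v f(a)$ matches the Jacobian applied to $v$.

```python
import math

# Numeric check of Lemma 5 on the sample map f(x, y) = (x^2 y, sin x).
def f(x, y):
    return (x * x * y, math.sin(x))

def jacobian(x, y):
    # exact partial derivatives of the sample map, computed by hand
    return [[2 * x * y, x * x],
            [math.cos(x), 0.0]]

def directional_derivative(x, y, v, t=1e-6):
    # difference quotient (f(a + t v) - f(a)) / t
    ft = f(x + t * v[0], y + t * v[1])
    f0 = f(x, y)
    return [(ft[i] - f0[i]) / t for i in range(2)]

a, v = (1.0, 2.0), (3.0, -1.0)
J = jacobian(*a)
predicted = [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]  # Df(a) v
measured = directional_derivative(*a, v)
print(predicted, measured)  # agree closely
```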
As Example 2 above shows, the converse of Lemma 4 is not true without
extra assumptions on the partial derivatives. The following proposition gives
an easy criterion for a function to be differentiable:
Proposition 6. If $f = (f^1, \dots, f^m)$ has continuous first partial derivatives $\partial f^i / \partial x^j$ on a neighborhood of $a$, then $f$ is differentiable at $a$.
Proof. For a component function $f^i$, write
$$\begin{aligned}
f^i(a + h) - f^i(a) = {} & f^i(a^1 + h^1, a^2, \dots, a^n) - f^i(a^1, a^2, \dots, a^n) \\
& + f^i(a^1 + h^1, a^2 + h^2, \dots, a^n) - f^i(a^1 + h^1, a^2, \dots, a^n) \\
& + \cdots + f^i(a^1 + h^1, \dots, a^n + h^n) - f^i(a^1 + h^1, \dots, a^{n-1} + h^{n-1}, a^n).
\end{aligned}$$
Now consider the first term in terms of the function $f^i(x^1, a^2, \dots, a^n)$ of the first variable $x^1$ alone. The Mean Value Theorem shows that there is a $b^1$ between $a^1$ and $a^1 + h^1$ so that
$$f^i(a^1 + h^1, a^2, \dots, a^n) - f^i(a^1, a^2, \dots, a^n) = h^1 \frac{\partial f^i}{\partial x^1}(b^1, a^2, \dots, a^n).$$
Similarly, for all other terms the difference equals
$$h^j \frac{\partial f^i}{\partial x^j}(a^1 + h^1, \dots, a^{j-1} + h^{j-1}, b^j, a^{j+1}, \dots, a^n)$$
for $b^j$ between $a^j$ and $a^j + h^j$. So if we set $c_j = (a^1 + h^1, \dots, a^{j-1} + h^{j-1}, b^j, a^{j+1}, \dots, a^n)$, then we have
$$f^i(a + h) - f^i(a) = \sum_{j=1}^n h^j \frac{\partial f^i}{\partial x^j}(c_j),$$
where each $c_j \to a$ as $h \to 0$. So compute
$$\begin{aligned}
\lim_{h \to 0} \frac{\left| f^i(a + h) - f^i(a) - \sum_{j=1}^n \frac{\partial f^i}{\partial x^j}(a)\, h^j \right|}{|h|}
&= \lim_{h \to 0} \frac{\left| \sum_{j=1}^n \left[ \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right] h^j \right|}{|h|} \\
&\le \lim_{h \to 0} \frac{\sum_{j=1}^n \left| \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right| |h^j|}{|h|} \\
&\le \lim_{h \to 0} \sum_{j=1}^n \left| \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right| \\
&= 0
\end{aligned}$$
So we have proved that each component function $f^i$ is differentiable at $a$. To show $f$ is differentiable, just note
$$\frac{\left| f(a + h) - f(a) - Df(a)(h) \right|}{|h|} \le \sum_{i=1}^m \frac{\left| f^i(a + h) - f^i(a) - \frac{\partial f^i}{\partial x^j}(a)\, h^j \right|}{|h|},$$
which goes to 0 as $h \to 0$.
Recall a function is (locally) $C^1$ if its first partial derivatives are continuous. The previous Proposition 6 shows that such functions are differentiable, and Lemma 5 then shows that directional derivatives work as expected for $C^1$ functions.

Now, for functions $f$ on an open subset $\Omega \subset \mathbb{R}^m$, consider the norm
$$\|f\|_{C^1(\Omega)} = \|f\|_{C^0(\Omega)} + \sum_{i=1}^m \left\| \frac{\partial f}{\partial x^i} \right\|_{C^0(\Omega)}$$
and the space
$$C^1(\Omega) = \{f \colon \Omega \to \mathbb{R} : f, \partial_1 f, \dots, \partial_m f \text{ are bounded and continuous}\}.$$
Similarly, we can consider $\mathbb{R}^p$-valued $C^1$ functions, the difference being that the functions $f$, $\partial_i f$ take bounded values in $\mathbb{R}^p$.
Proposition 7. On any open set $\Omega \subset \mathbb{R}^m$, $C^1(\Omega, \mathbb{R}^p)$ is a Banach space.
Proof. It is straightforward to check $\|\cdot\|_{C^1}$ is a norm.

Since $\|f\|_{C^1} \ge \|f\|_{C^0}$ and $\|f\|_{C^1} \ge \|\partial f / \partial x^j\|_{C^0}$, for any Cauchy sequence $\{f_n\}$ in $C^1$, the sequences $\{f_n\}$ and $\{\partial f_n / \partial x^j\}$ are Cauchy in $C^0$. Therefore, since $C^0$ is a Banach space, there are uniform limits
$$f_\infty = \lim_n f_n, \qquad g_i = \lim_n \frac{\partial f_n}{\partial x^i}, \quad i = 1, \dots, m,$$
and $f_\infty, g_i \in C^0$. Since
$$\|f\|_{C^1} = \|f\|_{C^0} + \sum_{i=1}^m \left\| \frac{\partial f}{\partial x^i} \right\|_{C^0}, \tag{1}$$
(1) shows it suffices to prove that
$$\frac{\partial f_\infty}{\partial x^i} = g_i, \qquad i = 1, \dots, m.$$
As usual, we recognize that integrating has better properties than differentiating. For $x \in \Omega$, choose $x_0 = x - (0, \dots, k, \dots, 0)$, where the $k > 0$ is in the $i$th slot. Since $\Omega$ is open, we may choose $k$ small enough so that the line segment from $x_0$ to $x$ is contained in $\Omega$. Compute
$$\begin{aligned}
f_\infty(x) = \lim_n f_n(x) &= \lim_n \left[ f_n(x_0) + \int_{y = x^i - k}^{y = x^i} \frac{\partial f_n}{\partial x^i}(x^1, \dots, x^{i-1}, y, x^{i+1}, \dots, x^m)\, dy \right] \\
&= f_\infty(x_0) + \int_{y = x^i - k}^{y = x^i} g_i(x^1, \dots, x^{i-1}, y, x^{i+1}, \dots, x^m)\, dy. 
\end{aligned} \tag{2}$$
The key step in the computation is the last one: $f_n(x_0) \to f_\infty(x_0)$ is easy, and the integral converges by the Dominated Convergence Theorem: since $g_i \in C^0$, there is a constant $C$ so that $|g_i| \le C$ on $\Omega$. Moreover, since $\partial f_n / \partial x^i \to g_i$ in $C^0$, there is an $N$ so that $|\partial f_n / \partial x^i - g_i| \le 1$ for all $n \ge N$. Thus the $\partial f_n / \partial x^i$ are all bounded by the integrable function $C + 1$, and the Dominated Convergence Theorem applies.

Now we can differentiate (2) with respect to $x^i$ (by the Fundamental Theorem of Calculus, since $g_i$ is continuous), and we see that $\partial f_\infty / \partial x^i = g_i$ at each $x \in \Omega$. This completes the proof.
The last part of the proof is of independent interest. We record it as

Proposition 8. Let $f_n$ be $C^1$ functions on a domain $\Omega \subset \mathbb{R}^m$. If $f_n \to f$ uniformly and $\partial f_n / \partial x^i \to g_i$ uniformly for $i = 1, \dots, m$, then $g_i = \partial f / \partial x^i$.
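Proposition 8 genuinely needs the uniform convergence of the derivatives; uniform convergence of the $f_n$ alone is not enough. A standard counterexample (my choice, not from the notes), sketched numerically:

```python
import math

# f_n(x) = sin(n x) / n converges to 0 uniformly (|f_n| <= 1/n), but the
# derivatives f_n'(x) = cos(n x) do not converge to the derivative 0.
def f_n(x, n):
    return math.sin(n * x) / n

def df_n(x, n):
    return math.cos(n * x)

# uniform smallness of f_n on a sample grid of [0, 1]:
print(max(abs(f_n(k / 1000.0, 50)) for k in range(1001)))  # <= 1/50

# but the derivative at 0 equals 1 for every n:
print(df_n(0.0, 50))
```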
Remark. We can also define $C^k(\Omega, \mathbb{R}^p)$ to be the space of all functions $f \colon \Omega \to \mathbb{R}^p$ so that $f$ and all its partial derivatives up to order $k$ are continuous and bounded. The norm is given by
$$\|f\|_{C^k} = \sum_{|\alpha| \le k} \|\partial_\alpha f\|_{C^0}, \tag{3}$$
where $\alpha = (\alpha_1, \dots, \alpha_m)$, each $\alpha_i \ge 0$, $|\alpha| = \alpha_1 + \cdots + \alpha_m$, and
$$\partial_\alpha f = \frac{\partial^{|\alpha|} f}{(\partial x^1)^{\alpha_1} \cdots (\partial x^m)^{\alpha_m}}$$
(if some $\alpha_i = 0$, then there is no differentiation with respect to $x^i$).

We can use the same proof as above to conclude that $C^k$ is a Banach space. In particular, we can apply the theorem to $F = (f, f_{,1}, \dots, f_{,n})$ and then relate $\|F\|_{C^1}$ to $\|f\|_{C^2}$ to provide an inductive step.

$C^\infty$ is not a Banach space, as the analog of (3) would involve an infinite sum.
We’ve used the following problem implicitly a few times above.
Homework Problem 4. Show that if f : Rn → Rm is differentiable at a
point a, then it is continuous at a.
Homework Problem 5. Let $f$ be a real-valued function defined on a domain in $\mathbb{R}^2$. Show that if the second mixed partials $f_{,12} = \frac{\partial^2 f}{\partial x^1 \partial x^2}$ and $f_{,21} = \frac{\partial^2 f}{\partial x^2 \partial x^1}$ are continuous in a neighborhood of a point $y$, then
$$\frac{\partial^2 f}{\partial x^1 \partial x^2}(y) = \frac{\partial^2 f}{\partial x^2 \partial x^1}(y).$$
Hint: If the two are not equal, assume without loss of generality that the difference $f_{,12} - f_{,21} > 0$ at $y$. Then it must be positive on a rectangular neighborhood. Integrate this quantity over the rectangular neighborhood, and use Fubini's Theorem and the Fundamental Theorem of Calculus to arrive at a contradiction.
Finally, we introduce the Chain Rule. We need the following lemma first:
Lemma 9. Let $A \colon \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then there is a constant $C = C(A)$ so that $|Ax| \le C|x|$ for all $x \in \mathbb{R}^n$.
Homework Problem 6. Prove Lemma 9. Hint: write down Ax in terms
of the matrix entries of A.
Proposition 10 (Chain Rule). Let $g \colon O \to \mathbb{R}^n$ and $f \colon U \to O$, where $O \subset \mathbb{R}^m$ and $U \subset \mathbb{R}^l$ are domains. Assume $f$ is differentiable at $a \in U$, and $g$ is differentiable at $f(a) \in O$. Then there is a composition of linear maps
$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$
In terms of partial derivatives, this is equivalent to
$$\frac{\partial g^p}{\partial x^i} = \frac{\partial g^p}{\partial y^j} \frac{\partial y^j}{\partial x^i},$$
where $\{x^i\}$ are coordinates on $\mathbb{R}^l$, $\{y^j\}$ are coordinates on $\mathbb{R}^m$, and we follow the usual rules of Leibniz notation and Einstein summation.
Proof. Let $A = Df(a)$ and $B = Dg(f(a))$. Now consider the remainder terms in the definition of differentiable maps. For $h \in \mathbb{R}^l$, $k \in \mathbb{R}^m$, define
$$\begin{aligned}
\varphi(h) &= f(a + h) - f(a) - A(h), \\
\psi(k) &= g(f(a) + k) - g(f(a)) - B(k), \\
\rho(h) &= (g \circ f)(a + h) - (g \circ f)(a) - (B \circ A)(h).
\end{aligned}$$
Then since $f$ and $g$ are differentiable,
$$\lim_{h \to 0} \frac{|\varphi(h)|}{|h|} = 0, \tag{4}$$
$$\lim_{k \to 0} \frac{|\psi(k)|}{|k|} = 0, \tag{5}$$
and we want to show that
$$\lim_{h \to 0} \frac{|\rho(h)|}{|h|} = 0.$$
So compute
$$\begin{aligned}
\rho(h) &= g(f(a + h)) - g(f(a)) - B(A(h)) \\
&= g(f(a + h)) - g(f(a)) - B(f(a + h) - f(a) - \varphi(h)) \\
&= [g(f(a + h)) - g(f(a)) - B(f(a + h) - f(a))] + B(\varphi(h)) \\
&= \psi(f(a + h) - f(a)) + B(\varphi(h)).
\end{aligned}$$
So then
$$\frac{|\rho(h)|}{|h|} \le \frac{|\psi(f(a + h) - f(a))|}{|h|} + \frac{|B(\varphi(h))|}{|h|}.$$
Now $|B(\varphi(h))|/|h| \to 0$ as $h \to 0$ by (4) and Lemma 9. On the other hand, (5) shows that for all $\epsilon > 0$ there is a $\delta$ so that
$$|k| < \delta \implies |\psi(k)| \le \epsilon |k|.$$
Therefore if $|f(a + h) - f(a)| < \delta$ (which can be achieved if $|h| < \gamma$ since $f$ is continuous),
$$\frac{|\psi(f(a + h) - f(a))|}{|h|} \le \epsilon\, \frac{|f(a + h) - f(a)|}{|h|} \le \epsilon\, \frac{|A(h)|}{|h|} + \epsilon\, \frac{|\varphi(h)|}{|h|}.$$
Now if we let $h \to 0$, using (4) and Lemma 9,
$$\limsup_{h \to 0} \frac{|\psi(f(a + h) - f(a))|}{|h|} \le \epsilon C.$$
Now we may let $\epsilon \to 0$ to show that $|\rho(h)|/|h| \to 0$ as $h \to 0$.
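The Chain Rule can be checked numerically by comparing the matrix product $Dg(f(a)) \cdot Df(a)$ with a difference-quotient derivative of the composite. The maps below are illustrative choices, not from the notes:

```python
import math

# Numeric check of D(g o f)(a) = Dg(f(a)) . Df(a) for sample maps
# f: R^2 -> R^2 and g: R^2 -> R.
def f(x, y):
    return (x * y, x + y)

def g(u, v):
    return u * u + math.sin(v)

def Df(x, y):            # Jacobian of f, computed by hand
    return [[y, x], [1.0, 1.0]]

def Dg(u, v):            # gradient of g as a 1x2 Jacobian
    return [2 * u, math.cos(v)]

def num_grad(F, a, h=1e-6):
    # central-difference gradient of a scalar function F: R^2 -> R
    return [(F(a[0] + h, a[1]) - F(a[0] - h, a[1])) / (2 * h),
            (F(a[0], a[1] + h) - F(a[0], a[1] - h)) / (2 * h)]

a = (0.7, -0.3)
B, A = Dg(*f(*a)), Df(*a)
chain = [B[0] * A[0][j] + B[1] * A[1][j] for j in range(2)]  # Dg(f(a)) . Df(a)
numeric = num_grad(lambda x, y: g(*f(x, y)), a)
print(chain, numeric)  # agree closely
```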
1.5  Contraction mappings

Another tool we need is a basic fact about complete metric spaces, the Contraction Mapping Theorem.

A fixed point of a map $f \colon X \to X$ is a point $x \in X$ so that $f(x) = x$. For a metric space $X$ with metric $d$, a contraction map is a map $g \colon X \to X$ so that there is a constant $\lambda \in (0, 1)$ for which
$$d(g(x), g(y)) \le \lambda\, d(x, y) \quad \text{for all } x, y \in X.$$

Remark. It is important that the constant $\lambda < 1$ is independent of the $x$ and $y$ in $X$. As we'll see below in a homework exercise, the following theorem is false if we let $\lambda$ depend on $x$ and $y$.
Theorem 2 (Contraction Mapping). Any contraction mapping on a complete metric space has a unique fixed point.
Proof. As above, denote our metric space by $X$ with metric $d$, and let $\lambda \in (0, 1)$ be the constant for the contraction map $g$: for all $x, y \in X$, $d(g(x), g(y)) \le \lambda\, d(x, y)$.

First we prove uniqueness. If $x$ and $y$ are fixed points of $g$ (so $g(x) = x$, $g(y) = y$), then
$$d(x, y) = d(g(x), g(y)) \le \lambda\, d(x, y).$$
So $(1 - \lambda)\, d(x, y) \le 0$. Since $\lambda < 1$ and $d(x, y) \ge 0$ (since $X$ is a metric space), we must have $d(x, y) = 0$ and so $x = y$ (again since $X$ is a metric space).

To prove existence of the fixed point, we consider any point $x_0 \in X$, and consider iterates defined inductively by $x_{n+1} = g(x_n)$ for all $n \ge 0$. We claim $x_n$ is a Cauchy sequence and the limit $x_\infty$ of $x_n$ is the fixed point. For $n \ge m \ge 0$, compute
$$\begin{aligned}
d(x_n, x_m) &\le d(x_n, x_{n-1}) + \cdots + d(x_{m+1}, x_m) \\
&= d(g(x_{n-1}), g(x_{n-2})) + \cdots + d(g(x_m), g(x_{m-1})) \\
&\le \lambda\, d(x_{n-1}, x_{n-2}) + \cdots + \lambda\, d(x_m, x_{m-1}) \\
&\le \lambda^2\, d(x_{n-2}, x_{n-3}) + \cdots + \lambda^2\, d(x_{m-1}, x_{m-2}) \\
&\;\le \cdots \le \lambda^{n-1}\, d(x_1, x_0) + \cdots + \lambda^m\, d(x_1, x_0) \\
&= d(x_1, x_0) \sum_{i=m}^{n-1} \lambda^i \le d(x_1, x_0) \sum_{i=m}^{\infty} \lambda^i = d(x_1, x_0)\, \frac{\lambda^m}{1 - \lambda}.
\end{aligned}$$
(Note that in this computation, we've used the exact sum of the geometric series, and it is crucial that $\lambda \in (0, 1)$: the geometric series diverges for $\lambda \ge 1$.) So if $N$ is a positive integer, then for all $n, m > N$, $d(x_n, x_m) \le d(x_1, x_0)\, \lambda^N / (1 - \lambda)$, and this last quantity tends to 0 as $N \to \infty$. Thus $\{x_n\}$ is a Cauchy sequence, which has a limit $x_\infty \in X$ since $X$ is a complete metric space.

Now we prove that $x_\infty$ is a fixed point. Since $x_\infty = \lim_i x_i = \lim_i x_{i+1}$, we have
$$g(x_\infty) = g(\lim_i x_i) = \lim_i g(x_i) = \lim_i x_{i+1} = x_\infty,$$
and so $x_\infty$ is a fixed point. One point to note is that we have interchanged $g$ with $\lim$, which is valid only if $g$ is continuous (this is a homework problem below).
Homework Problem 7. Show any contraction map is continuous.
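The existence proof is an algorithm: iterate $g$ from any starting point. A minimal sketch, using the sample contraction $g(x) = \cos(x)/2$ on $\mathbb{R}$ (my choice; $|g'| \le 1/2$, so $\lambda = 1/2$ works):

```python
import math

# Fixed-point iteration from the proof of Theorem 2.
def g(x):
    return math.cos(x) / 2.0

x = 0.0                  # any starting point x_0
for _ in range(60):      # iterates x_{n+1} = g(x_n)
    x = g(x)

print(x, g(x))           # x is (numerically) the unique fixed point
# The a priori bound d(x_n, x_inf) <= d(x_1, x_0) lambda^n / (1 - lambda)
# predicts the geometric convergence rate observed here.
```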
Homework Problem 8. Newton's method is an iterative method for finding zeros of differentiable functions. For an initial $x_0$, we proceed by the recursive definition
$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.$$
Then the limit $\lim x_n$ should produce a zero of the function $f$.

A differentiable function $f \colon \mathbb{R} \to \mathbb{R}$ has a nondegenerate zero at $x$ if $f(x) = 0$ and $f'(x) \ne 0$.

Assume $f \colon \mathbb{R} \to \mathbb{R}$ is a locally $C^2$ function (i.e., $f''$ is continuous on all of $\mathbb{R}$). Show that every nondegenerate zero $x$ of $f$ has a neighborhood $N_x$ so that for any initial $x_0 \in N_x$, Newton's method converges to $x$. Hints:

(a) The main point is to exhibit the Newton's method iteration as a contraction map on a complete metric space (recall a closed subset of any complete metric space is complete). You must find an appropriately small neighborhood of $x$ on whose closure Newton's method is a contraction map.

(b) You will need the following lemma: for a $C^1$ function $g \colon \mathbb{R} \to \mathbb{R}$,
$$y \ne z \in [a, b] \implies \frac{|g(y) - g(z)|}{|y - z|} \le \max_{w \in [a, b]} |g'(w)|.$$

(c) Show that any fixed point of Newton's method is a zero.

(d) Show the zero you have produced via Newton's method must be the original zero $x$.
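As a sketch of the iteration in Homework Problem 8 (the sample function $f(x) = x^2 - 2$ and the starting point are my choices), Newton's method homes in on the nondegenerate zero $\sqrt{2}$:

```python
# Newton's method: x_{i+1} = x_i - f(x_i) / f'(x_i).
def newton(f, df, x0, steps=20):
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

# Sample run on f(x) = x^2 - 2, whose nondegenerate zero is sqrt(2).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)  # approximately 1.41421356...
```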
1.6  Differentiating under the Integral
Proposition 11. Let $f = f(y, x)$ be a locally $C^1$ real-valued function for $y \in \mathbb{R}^n$ and $x \in O$, an open subset of $\mathbb{R}^m$. Then on a measurable $\Omega \subset\subset O \subset \mathbb{R}^m$ equipped with Lebesgue measure $dx$,
$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x)\, dx = \int_\Omega \frac{\partial f}{\partial y^i}(y, x)\, dx.$$
Moreover, $\int_\Omega f(y, x)\, dx$ is $C^1$ as a function of $y$.

Remark. $\Omega \subset\subset O$ means that the closure $\bar\Omega$ in $\mathbb{R}^m$ is a compact subset of $O$.
Proof. Compute. Let $e_i$ be the standard $i$th basis vector on $\mathbb{R}^n$:
$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x)\, dx = \lim_{k \to 0} \frac{1}{k} \left[ \int_\Omega f(y + k e_i, x)\, dx - \int_\Omega f(y, x)\, dx \right] = \lim_{k \to 0} \int_\Omega \frac{f(y + k e_i, x) - f(y, x)}{k}\, dx.$$
Clearly as $k \to 0$, the integrand goes to $\frac{\partial f}{\partial y^i}(y, x)$ pointwise. We need to show that the integrands are bounded in absolute value by a fixed integrable function in order to use the Dominated Convergence Theorem. This follows from the Mean Value Theorem, which shows that the integrand is equal to
$$\frac{\partial f}{\partial y^i}(\tilde y, x)$$
for $\tilde y = (y^1, \dots, y^{i-1}, b^i, y^{i+1}, \dots, y^n)$, with $b^i$ between $y^i$ and $y^i + k$. Since $f$ is $C^1$, $\partial f / \partial y^i$ is continuous, $\bar\Omega$ is compact, and $\tilde y$ stays in a compact neighborhood of $y$, so the absolute value of the integrand is bounded by a constant $M$. Since $\int_\Omega M\, dx < \infty$, the Dominated Convergence Theorem shows that
$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x)\, dx = \lim_{k \to 0} \int_\Omega \frac{f(y + k e_i, x) - f(y, x)}{k}\, dx = \int_\Omega \lim_{k \to 0} \frac{f(y + k e_i, x) - f(y, x)}{k}\, dx = \int_\Omega \frac{\partial f}{\partial y^i}(y, x)\, dx.$$
To show that $\int_\Omega f(y, x)\, dx$ is $C^1$ as a function of $y$, note that its partial derivatives
$$g_i(y) = \int_\Omega \frac{\partial f}{\partial y^i}(y, x)\, dx$$
are continuous in $y$ by the Dominated Convergence Theorem again, since if $y \to y_0$, then
$$\lim_{y \to y_0} g_i(y) = \lim_{y \to y_0} \int_\Omega \frac{\partial f}{\partial y^i}(y, x)\, dx = \int_\Omega \lim_{y \to y_0} \frac{\partial f}{\partial y^i}(y, x)\, dx = \int_\Omega \frac{\partial f}{\partial y^i}(y_0, x)\, dx = g_i(y_0)$$
because $\partial f / \partial y^i$ is continuous in $y$.
Remark. The last argument also shows that if $f = f(z, x)$ is a continuous function of $z$ and $x$, with $x \in \Omega$ a compact subset of $\mathbb{R}^n$, then the function
$$z \mapsto \int_\Omega f(z, x)\, dx$$
is continuous.
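Proposition 11 can be sanity-checked numerically: differentiate the integral by a difference quotient, and compare with the integral of the derivative. The integrand $f(y, x) = \sin(xy)$ on $\Omega = [0, 1]$ and the midpoint-rule quadrature below are illustrative choices, not from the notes.

```python
import math

# Check d/dy of integral = integral of df/dy for f(y, x) = sin(x y).
def integrate(F, n=2000):
    # midpoint rule on [0, 1]
    return sum(F((j + 0.5) / n) for j in range(n)) / n

def I(y):
    return integrate(lambda x: math.sin(x * y))

y, h = 1.3, 1e-5
lhs = (I(y + h) - I(y - h)) / (2 * h)             # d/dy of the integral
rhs = integrate(lambda x: x * math.cos(x * y))    # integral of df/dy
print(lhs, rhs)  # agree closely
```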
1.7  The Inverse Function Theorem
We need the following lemma first:

Lemma 12. If $f$ is a $C^1$ function from a ball $B$ in $\mathbb{R}^n$ to $\mathbb{R}^m$ which satisfies
$$\left| \frac{\partial f^i}{\partial x^j} \right| \le C$$
on $B$, then for $y, z \in B$,
$$|f(y) - f(z)| \le C m n\, |y - z|.$$

Proof. If $y, z \in B$, then the line segment $\{ty + (1 - t)z : 0 \le t \le 1\}$ between them is also contained in $B$ (see Homework Problem 13 below). Then use the Chain Rule to compute, for $i = 1, \dots, m$,
$$|f^i(y) - f^i(z)| = \left| \int_0^1 \frac{\partial}{\partial t} f^i(ty + (1 - t)z)\, dt \right| = \left| \int_0^1 (y^j - z^j) \frac{\partial f^i}{\partial x^j}(ty + (1 - t)z)\, dt \right| \le C n\, |y - z|.$$
(Note this argument is essentially the same as the use of the Mean Value Theorem.) Now apply
$$|f(y) - f(z)| \le \sum_{i=1}^m |f^i(y) - f^i(z)|.$$
Theorem 3 (Inverse Function Theorem). Let $f \colon O \to U$ be a $C^1$ map between domains in $\mathbb{R}^m$. Assume that for $a \in O$, $Df(a)$ is an invertible matrix (i.e., $\det Df(a) \ne 0$). Then there are neighborhoods $O' \ni a$ and $U' \ni f(a)$ so that $f \colon O' \to U'$ is a bijection and $f^{-1}$ is also a $C^1$ map. For every $b \in O'$, $D(f^{-1})(f(b)) = (Df(b))^{-1}$.

Proof. First of all, we may reduce to the case that $a = f(a) = 0$ and $Df(a) = I$, the identity map from $\mathbb{R}^m$ to itself. (This can be achieved by replacing $f(x)$ by $(Df(a))^{-1}(f(x + a) - f(a))$. Then use the Chain Rule and the fact that the derivative of the linear map $(Df(a))^{-1}$ is $(Df(a))^{-1}$ itself.)
Now consider $g(x) = x - f(x)$ and note that $Dg(0) = 0$, the zero linear transformation. Since $g$ is $C^1$, there is an $r > 0$ so that $|x| < 2r$ implies
$$\left| \frac{\partial g^i}{\partial x^j}(x) \right| < \frac{1}{2m^2} \quad \text{for } i, j = 1, \dots, m. \tag{6}$$
Let $B(r) = \{x \in \mathbb{R}^m : |x| < r\}$. Then Lemma 12 and $g(0) = 0$ imply that $g(B(r)) \subset B(r/2)$, and the same bound holds on the closed ball $\overline{B(r)}$.

Now let $y \in B(r/2)$ and consider
$$g_y(x) = g(x) + y = x - f(x) + y.$$
Then

• $g_y(x) = x$ is equivalent to $f(x) = y$, and so a fixed point of $g_y$ is equivalent to a solution to $f(x) = y$.

• If $x \in \overline{B(r)}$, then $|g_y(x)| \le |g(x)| + |y| < r$, and so $g_y$ is a map from the closed ball $\overline{B(r)}$, a complete metric space, to itself.

• Lemma 12 and (6) imply $g_y$ is a contraction map (with $\lambda = 1/2$). In other words, for $x_1, x_2 \in \overline{B(r)}$,
$$|g_y(x_1) - g_y(x_2)| = |g(x_1) - g(x_2)| \le \tfrac{1}{2} |x_1 - x_2|. \tag{7}$$

Therefore, for each $y \in B(r/2)$, there is a unique fixed point $x$ of $g_y$, which shows there is a unique solution $x$ to $f(x) = y$ in $\overline{B(r)}$ (and in fact $|x| = |g_y(x)| < r$).

Now we show $x = f^{-1}(y)$ is continuous: for $x_1, x_2 \in \overline{B(r)}$, we have, by the definition $g(x) = x - f(x)$ and (7),
$$\begin{aligned}
|x_1 - x_2| &\le |g(x_1) - g(x_2)| + |f(x_1) - f(x_2)| \le \tfrac{1}{2}|x_1 - x_2| + |f(x_1) - f(x_2)|, \\
\tfrac{1}{2}|x_1 - x_2| &\le |f(x_1) - f(x_2)|, \\
|f^{-1}(y_1) - f^{-1}(y_2)| &\le 2|y_1 - y_2| 
\end{aligned} \tag{8}$$
for $y_i = f(x_i)$. Thus $f^{-1}$ is continuous.
To show $f^{-1}$ is differentiable at $y_2$ with total derivative $(Df(x_2))^{-1}$, we need to show that
$$\lim_{y_1 \to y_2} \frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|y_1 - y_2|} = 0.$$
To show this, compute
$$\begin{aligned}
|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)| &= |x_1 - x_2 - (Df(x_2))^{-1}(f(x_1) - f(x_2))| \\
&= |(Df(x_2))^{-1}[Df(x_2)(x_1 - x_2) - (y_1 - y_2)]| \\
&\le C\,|Df(x_2)(x_1 - x_2) - (y_1 - y_2)| \quad \text{(by Lemma 9)} \\
&= C\,|Df(x_2)(x_1 - x_2) - [f(x_1) - f(x_2)]|.
\end{aligned} \tag{9}$$
Therefore,
$$\frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|y_1 - y_2|} = \frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|x_1 - x_2|} \cdot \frac{|x_1 - x_2|}{|y_1 - y_2|}.$$
(Note $y_1 \ne y_2$ implies $x_1 \ne x_2$ since $y_i = f(x_i)$.) This expression goes to zero as $y_1 \to y_2$ by (8) and (9), since $f$ is differentiable at $x_2$.
Finally we show the total derivative $(Df(x))^{-1}$ is continuous in $y$. We can think of $Df$ as a map from $x$ to $\mathbb{R}^{m^2}$, which represents the space of $m \times m$ matrices. $Df(x)$ is continuous in $x$ (as $f$ is $C^1$), and $x = f^{-1}(y)$ is continuous in $y$, so $Df(x)$ is continuous in $y$. The determinant function $\det \colon \mathbb{R}^{m^2} \to \mathbb{R}$ is continuous, since it is a polynomial in the matrix entries. So $\det Df(x)$ is bounded away from zero, by compactness of $\overline{B(r)}$. We are left to prove the continuity of the matrix inverse operation for square matrices with determinant bounded away from 0. This follows from the formula for the inverse in terms of cofactor matrices: each entry of the inverse matrix $A^{-1} = (a_{ij})^{-1}$ is of the form
$$\frac{(m-1)\text{st-order polynomial in the } a_{ij}}{\det(a_{ij})}.$$
Homework Problem 9. If, in the Inverse Function Theorem, f is a smooth
(C ∞ ) map, then f −1 : U ′ → O′ , the C 1 local inverse of f , is also C ∞ . Hints:
(a) If A = A(s) is a family of invertible n × n matrices which depend differentiably on a real parameter s, differentiate the equation AA−1 = I to show
d(A−1 )/ds = −A−1 (dA/ds) A−1 .
(b) Use the formula for D(f −1 ) to show that f −1 is C ∞ .
Hints: It may be helpful to use the following notation. If f = f (x) = f (x1 , . . . , xn ), we may write (y 1 , . . . , y n ) = y = y(x) = f (x). And so f −1 (y) = x may be written simply as x = x(y). To show f −1 is C 2 , for example, you should write
∂^2 (f −1 )^k /∂y^i ∂y^j = ∂^2 x^k /∂y^i ∂y^j
in terms of (the components of ) the first and second derivatives
∂y/∂x^i = ∂f /∂x^i   and   ∂^2 y/∂x^i ∂x^j = ∂^2 f /∂x^i ∂x^j ,
and verify that the resulting expression is continuous.
Remember to use the Chain Rule, as in e.g.
∂/∂y^j = (∂x^i /∂y^j ) ∂/∂x^i ,
and recall that Df −1 = (Df )−1 can be written as
(∂x^i /∂y^j ) = (∂y^k /∂x^l )−1 .
It will also be helpful to use Einstein’s summation notation. In particular, the matrix notation used in part (a) is insufficient, as there may
be quantities with more than 2 indices which need to be summed.
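The formula in part (a) of the previous problem lends itself to a quick numerical sanity check. The sketch below (a hypothetical 2 × 2 family A(s) chosen only for illustration, pure Python) compares a symmetric difference quotient of A(s)−1 with the closed form −A−1 (dA/ds)A−1 :

```python
# Finite-difference check of d(A^-1)/ds = -A^-1 (dA/ds) A^-1
# for a hypothetical 2x2 family A(s); pure Python, no numpy.

def inv2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def A(s):
    # an arbitrary smooth, invertible family (illustrative choice)
    return [[2.0 + s, s * s], [1.0, 3.0 - s]]

def dA(s):
    # entrywise derivative of A(s)
    return [[1.0, 2.0 * s], [0.0, -1.0]]

s, h = 0.5, 1e-6
# symmetric difference quotient of A(s)^-1
Ainv_p, Ainv_m = inv2(A(s + h)), inv2(A(s - h))
numeric = [[(Ainv_p[i][j] - Ainv_m[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]
# the closed-form expression -A^-1 (dA/ds) A^-1
Ainv = inv2(A(s))
closed = [[-x for x in row] for row in mul2(mul2(Ainv, dA(s)), Ainv)]
err = max(abs(numeric[i][j] - closed[i][j]) for i in range(2) for j in range(2))
print(err)  # small, on the order of the finite-difference error
```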
Theorem 4 (Implicit Function Theorem). Suppose f : Rn × Rm → Rm
is C 1 in an open set containing (a, b), and assume f (a, b) = 0. Assume the
m × m matrix
(∂f^i /∂x^{n+j} (a, b)),   1 ≤ i, j ≤ m,
is invertible. Then there is an open set O ⊂ Rn containing a and an open
set U ⊂ Rm containing b so that for each x ∈ O, there is a unique g(x) ∈ U
so that f (x, g(x)) = 0. g is locally C 1 .
Homework Problem 10. Prove the Implicit Function Theorem. Hints:
(a) Consider F : Rn × Rm → Rn × Rm defined by F (x, y) = (x, f (x, y)) and
apply the Inverse Function Theorem to F .
(b) Show that, on a suitably small neighborhood, F −1 is of the form F −1 (x, y) =
(x, p(x, y)) for p : Rn × Rm → Rm .
(c) Show that g(x) = p(x, 0) satisfies the conditions of the theorem.
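To see the Implicit Function Theorem in action numerically, here is a minimal sketch using the standard example f (x, y) = x2 + y 2 − 1 (an assumption for illustration, not from the text): near (a, b) = (0, 1) we have ∂f /∂y = 2y ≠ 0, and Newton's method in y recovers the implicit function g(x):

```python
import math

# Implicit Function Theorem, numerically: f(x, y) = x^2 + y^2 - 1 = 0
# near (a, b) = (0, 1), where df/dy = 2y is invertible (nonzero).

def f(x, y):
    return x * x + y * y - 1.0

def dfdy(x, y):
    return 2.0 * y

def g(x, y0=1.0, steps=20):
    """Solve f(x, y) = 0 for y near y0 by Newton's method in y."""
    y = y0
    for _ in range(steps):
        y -= f(x, y) / dfdy(x, y)
    return y

x = 0.3
y = g(x)
print(abs(f(x, y)))                   # ~ 0: (x, g(x)) lies on the zero set
print(abs(y - math.sqrt(1 - x * x)))  # matches the explicit branch y = sqrt(1 - x^2)
```

Near (0, −1), the same iteration started from y0 = −1 would pick out the other branch; the theorem is local, and each branch is one implicit function.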
1.8 Lipschitz constants and functions
A closely related concept to the contraction map is the Lipschitz constant.
A map f : X → Y has Lipschitz constant
L = sup { dY (f (x), f (x′ )) / dX (x, x′ ) : x, x′ ∈ X, x ≠ x′ }.
Here of course dX and dY are the metrics on X and Y respectively. An
equivalent definition is that L is the smallest constant so that
dY (f (x), f (x′ )) ≤ L dX (x, x′ ) for all x, x′ ∈ X.
A function with finite Lipschitz constant is called Lipschitz. A basic fact is
the following:
Lemma 13. Any Lipschitz function is continuous.
If f : X → X, then the Lipschitz constant gives a criterion for a mapping
to be a contraction mapping:
Lemma 14. f : X → X is a contraction map if and only if the Lipschitz
constant L of f is strictly less than 1.
Idea of proof. The Lipschitz constant is the smallest value of λ for which f
is a contraction map.
If f : R → R, then the Lipschitz constant is simply
L = sup_{x≠y} |f (x) − f (y)| / |x − y|,
which of course is suggestive of the definition of the derivative. In fact, the
following is true:
Homework Problem 11. The Lipschitz constant of a locally C 1 function
f : R → R is equal to supx∈R |f ′ (x)|.
Hint: To show the two quantities are equal, you need to relate the sup of
the derivative to the sup of the difference quotients. To relate the derivative
f ′ (x) to difference quotients, use the definition of the derivative. To relate a
given difference quotient to a derivative, use the Mean Value Theorem.
The previous problem shows that any differentiable function with bounded
derivative is Lipschitz. The converse is false, as we see in the following example.
Example 3. The function x 7→ |x| is a Lipschitz function from R to R. This
follows from the observation that for each x ≠ y ∈ R,
| |x| − |y| | / |x − y| ≤ 1.
(This can be proved using the Triangle Inequality.)
Example 4. For any constant α ∈ (0, 1), the function x 7→ |x|^α from R to R is not Lipschitz. In particular,
lim_{x→0} (|x|^α − |0|^α ) / |x − 0| = lim_{x→0} |x|^{α−1} = ∞.
In terms of the graph of a function, a function whose graph has a corner
(as does x 7→ |x|) is Lipschitz, while a function whose graph has a cusp (as
does x 7→ |x|α ) is not Lipschitz.
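The corner-versus-cusp contrast can be checked numerically: difference quotients of x 7→ |x| based at 0 stay bounded by 1, while those of x 7→ |x|^{1/2} are unbounded as the sample point approaches 0. A small sketch:

```python
# Difference quotients at 0 for the corner |x| (Lipschitz)
# versus the cusp |x|**0.5 (not Lipschitz).

corner = lambda x: abs(x)
cusp = lambda x: abs(x) ** 0.5

xs = [10.0 ** (-k) for k in range(1, 8)]
corner_quotients = [abs(corner(x) - corner(0.0)) / x for x in xs]
cusp_quotients = [abs(cusp(x) - cusp(0.0)) / x for x in xs]

print(max(corner_quotients))  # stays at 1
print(max(cusp_quotients))    # grows like x**(-1/2): unbounded as x -> 0
```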
Another basic fact we establish is this: the conclusion of the Contraction
Map Theorem may be false if the Lipschitz constant is equal to 1. An easy
example is the map x 7→ x + 1 from R to R. The Lipschitz constant is
obviously 1, and there is no fixed point. A related, but somewhat more
surprising fact, is outlined in the following problem:
Homework Problem 12. Find an example of a differentiable function f :
R → R so that for each x ≠ y,
|f (x) − f (y)| / |x − y| < 1,
and yet f has no fixed point. Prove your answer works.
Hint: The point of this problem is that there should be no uniform L < 1
which works for all x and y. To construct such a function f , use Problem
11 above. In particular, first construct the derivative f ′ and then integrate
to find f . (You'll need supx |f ′ (x)| = 1; why?) Use the Mean Value Theorem
to relate values of f 0 to difference quotients.
A subset C of a real vector space is convex if every line segment connecting
two points in C is contained in C. More formally, C is convex if
x, y ∈ C, t ∈ [0, 1] =⇒ tx + (1 − t)y ∈ C.
Proposition 15. Any globally C 1 function from a convex domain Ω ⊂ Rn
to Rm is globally Lipschitz.
Proof. Lemma 12 above shows that for any x, y ∈ Ω,
|f (x) − f (y)| ≤ Cnm|x − y|,   for C = sup { |∂f^i /∂x^j (z)| : z ∈ Ω, i ≤ m, j ≤ n }.
C < ∞ since f is C 1 . Thus f is Lipschitz.
Consider X a locally compact metric space and Y any metric space. Then
we say a function f : X → Y is locally Lipschitz if f satisfies one of the two
following equivalent definitions:
1. f is Lipschitz when restricted to any compact set of X. In other words,
if K ⊂ X is compact, then there is a constant LK so that
x, x′ ∈ K =⇒ dY (f (x), f (x′ )) ≤ LK dX (x, x′ ).
2. Each x ∈ X has a neighborhood on which f is Lipschitz.
We prove these two definitions are equivalent below.
Corollary 16. On any domain Ω ⊂ Rn , any locally C 1 function f is locally
Lipschitz.
Proof. Any ball is convex (see the following homework problem), and so if
f is C 1 on a small ball, then it is Lipschitz on the ball by the previous
Proposition 15.
Homework Problem 13. Show that any ball Bx (r) = {y ∈ Rn : |y−x| < r}
is convex.
Proposition 17. Let X be a locally compact metric space and Y be any
metric space, then for maps f from X to Y , the two definitions (1) and (2)
above are equivalent.
Proof. To prove (1) =⇒ (2), consider x ∈ X. Since X is locally compact,
there is a neighborhood O of x with compact closure. By the definition of
locally Lipschitz, f is Lipschitz when restricted to Ō, and is thus Lipschitz
on O also.
To prove part (2) =⇒ (1), let K ⊂ X be a compact subset. Given that
all points in X have neighborhoods on which f is Lipschitz, we need to prove
that f is Lipschitz on K. The set of all neighborhoods of points in K on
which f is Lipschitz forms an open cover of K, and thus there is a finite
subcover O1 , . . . , On . The set
P = (K × K) \ (O1 × O1 ∪ · · · ∪ On × On )
is compact, and so the function
dY (f (x), f (x′ )) / dX (x, x′ ),
which is continuous on P , attains its maximum M on P .
Consider any x ≠ x′ ∈ K. Then either (x, x′ ) ∈ P or x, x′ ∈ Oi for some i = 1, . . . , n. Let Li be the Lipschitz constant of f |Oi . Choose L = max{M, L1 , . . . , Ln }. Then for every x ≠ x′ ∈ K,
dY (f (x), f (x′ )) / dX (x, x′ ) ≤ L
and f is Lipschitz on K.
2 Ordinary Differential Equations
2.1 Introduction
An ordinary differential equation (an ODE ) is an equation of the form
x(n) (t) = F (x(n−1) , . . . , ẋ, x, t),
(10)
where x : I → R is a function of t, I is an open interval in R,
ẋ = dx/dt   and   x(n) = d^n x/dt^n .
The order of the above equation is n, the highest derivative of x which
appears. It is also useful to consider the case
x = (x1 , . . . , xm ) : I → Rm ,
which is called a system of ODEs.
Some ODEs can be solved explicitly by using integration techniques, but
most cannot. For most ODEs, instead of explicit solutions, we must rely
on an abstract existence theorem to show that for nice enough F (Lipschitz
suffices), there is a unique solution locally. We also investigate the regularity
of solutions, showing, for example, if F is smooth, then any solution to (10)
is smooth. Existence, uniqueness, and regularity are three main themes in
the theory of all differential equations, and there are satisfactory theorems
to handle all three for ODEs.
Consider the following example (where x, not t, is the independent variable):
Example 5. Consider the differential equation dy/dx = x2 y. This first order
ODE is called separable, since it is written in the form dy/dx = f (x)g(y).
Recall the solution procedure for a separable ODE:
• If c is a root of g(y), then y = c is a solution. (Why?) So in the present
case, y = 0 is a solution.
• For other values of g(y), compute
dy/dx = x^2 y,
dy/y = x^2 dx,
∫ dy/y = ∫ x^2 dx,
ln |y| = x^3 /3 + C,
y = ±e^C e^{x^3 /3} = C ′ e^{x^3 /3},
where C ′ = ±e^C is a nonzero constant.
• If we let C ′ be any real number, then we capture both cases above, and the general solution is y = C ′ e^{x^3 /3}.
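The general solution can be compared against a direct numerical integration. The following sketch (classical fourth-order Runge–Kutta, pure Python) integrates dy/dx = x^2 y from y(0) = 2, so the constant in the general solution is 2, and checks the value at x = 1 against y = 2 e^{x^3 /3}:

```python
import math

# Integrate dy/dx = x^2 * y with classical RK4 and compare with the
# separable-ODE solution y = C' * exp(x^3 / 3), here with C' = y(0) = 2.

def rk4(f, x0, y0, x1, n=1000):
    h = (x1 - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

f = lambda x, y: x * x * y
numeric = rk4(f, 0.0, 2.0, 1.0)
exact = 2.0 * math.exp(1.0 / 3.0)   # y(1) = 2 e^{1/3}
print(abs(numeric - exact))         # tiny RK4 discretization error
```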
Homework Problem 14. Consider the ODE
dy/dx = (1 + y^2 )/(1 + x^2 ).
(a) Find the general solution to this differential equation. Your answer
should be rational functions of x. You may need to write your answer
using more than one case.
(b) Find the particular solution passing through (x, y) = (1, 1).
(c) Find the particular solution passing through (x, y) = (1, −1). (Hint:
What is the formula for tan(φ + π/2)?)
2.2 Local Existence and Uniqueness
The most natural setting for systems of ODEs is in terms of an initial value
problem. Let x = (x1 , . . . , xn ) = x(t). An initial value problem for a first
order system of ODEs at t = t0 consists of
• a system of ODEs ẋ = v(x, t)
• and an initial condition x(t0 ) = x0 .
We'll see below that if v satisfies a Lipschitz condition, then for t in a small interval around t0 there is a unique solution to the initial value problem.
Example 6. Consider the following problem: Find a solution to the ODE
ẏ = y 2 subject to the initial condition y(0) = 1. Interpreting t as a time
variable, what happens as time goes forward from t = 0?
Solution: dy/dt = y^2 is separable, and so compute
dy/y^2 = dt =⇒ ∫ dy/y^2 = ∫ dt =⇒ −1/y = t + C =⇒ y = −1/(t + C).
Plug in the initial condition y = 1 and t = 0 to solve for C to find C = −1 and
y = 1/(1 − t).
Note that y(t) is discontinuous at t = 1, so as time goes forward from t = 0,
the solution only exists until time 1. Also note there is no problem going
backward in time, and so the solution to the initial value problem is
y = 1/(1 − t),   t ∈ (−∞, 1).
It does not make sense to talk about the solution to the initial value problem
beyond t = 1.
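The blow-up can also be watched numerically. A minimal sketch: forward Euler applied to ẏ = y^2 , y(0) = 1 tracks the exact solution 1/(1 − t), which grows without bound as t approaches 1:

```python
# Forward Euler for dy/dt = y^2, y(0) = 1; the exact solution is
# y = 1/(1 - t), which blows up at t = 1.

def euler(t_end, n):
    h = t_end / n
    t, y = 0.0, 1.0
    for _ in range(n):
        y += h * y * y
        t += h
    return y

print(euler(0.5, 100000))   # close to 1/(1 - 0.5) = 2
print(euler(0.9, 100000))   # close to 1/(1 - 0.9) = 10
print(euler(0.99, 100000))  # already near 100: blowing up as t -> 1
```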
The previous example shows that it is not in general possible to extend a
solution to an initial value problem for all time. However, we can still hope
to find a solution to an initial value problem on a neighborhood (t0 − ε, t0 + ε)
of t0 .
Theorem 5. Consider the initial value problem
ẋ = v(x, t),
x(t0 ) = x0
(11)
for x : I → Rn for I an open neighborhood of t0 . Assume v is a Lipschitz
function from O × I → Rn , where O ⊂ Rn is an open neighborhood of x0 .
Then on a neighborhood Ĩ of t0 contained in I, there is a unique solution φ
to (11).
Before we give the proof, let us consider a few examples.
Example 7. The differential equation ẋ = x2 + t has no solution which can
be written down in terms of standard algebraic and transcendental functions
(such as roots, exponentials, trigonometric functions). Theorem 5 states that
there is a local solution for every initial value problem. For example, for
initial conditions x(0) = 1, there is a solution valid on an open interval
containing t = 0.
Theorem 5 does not guarantee a solution which is valid for all time t (see
Example 6 above). In fact the solution for the present initial-value problem
will also blow up in finite time. This is basically because for t ≥ 0, ẋ =
x2 + t ≥ x2 , and so the solution should grow faster than the solution to
Example 6, which goes to infinity in finite time.
If v in Theorem 5 is not Lipschitz, then it is possible to lose the uniqueness
statement from Theorem 5 (although existence is still valid).
Example 8. Consider the initial value problem
ẋ = x^(2/3) ,   x(0) = 0.
Then it is straightforward to verify that x(t) = 0 is a solution. There is
another solution, however. Solve the equation
dx/dt = x^(2/3) ,
x^(−2/3) dx = dt,
∫ x^(−2/3) dx = ∫ dt,
3x^(1/3) = t + C,
x = ((t + C)/3)^3 .
Then plug in x(0) = 0 to find C = 0 and the solution x(t) = (t/3)^3 .
The point of this example is that v = x^(2/3) is not Lipschitz—see Example 4 above. Therefore, Theorem 5 does not apply.
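Both solutions can be checked directly. The sketch below verifies numerically (by centered difference quotients) that x(t) ≡ 0 and x(t) = (t/3)^3 each satisfy ẋ = x^(2/3) with x(0) = 0:

```python
# Two distinct solutions of the IVP  x' = x^(2/3), x(0) = 0:
# x1(t) = 0  and  x2(t) = (t/3)^3.  Check that x' - x^(2/3) = 0 numerically.

def x2(t):
    return (t / 3.0) ** 3

def residual(x, t, h=1e-5):
    deriv = (x(t + h) - x(t - h)) / (2 * h)  # centered difference quotient
    return abs(deriv - x(t) ** (2.0 / 3.0))

for t in [0.5, 1.0, 2.0]:
    print(residual(lambda s: 0.0, t), residual(x2, t))  # both ~ 0
```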
Proof of Theorem 5. The idea of the proof is to set up the problem in terms
of a contraction mapping. We first find an iteration whose fixed point solves
the differential equation and then find an appropriate complete metric space
on which the iteration is a contraction map.
For a continuous Rn -valued function φ defined on a neighborhood of t0 ,
let Aφ be another such function defined as follows:
(Aφ)(t) = x0 + ∫_{t0}^{t} v(φ(τ ), τ ) dτ.   (12)
(Note we are integrating Rn -valued function. This may be related to the
usual R-valued integration theory by considering each component separately.)
A will be our iterative map, and we consider φ, Aφ, A2 φ, etc., to be the
Picard approximations for the initial value problem. We consider Picard
approximations because of the following
Lemma 18. A continuous fixed point of the Picard approximation (12) is a
solution to the initial value problem (11). In particular, any such fixed point
is continuously differentiable.
Proof. If Aφ = φ, then compute
φ̇ = (d/dt) [ x0 + ∫_{t0}^{t} v(φ(τ ), τ ) dτ ] = v(φ(t), t)
by the Fundamental Theorem of Calculus. In particular, since φ and v are continuous (Lemma 13), φ̇ is continuous, and so φ is continuously differentiable. Lastly, check the initial condition
φ(t0 ) = x0 + ∫_{t0}^{t0} v(φ(τ ), τ ) dτ = x0
to complete the proof of the lemma.
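In simple cases the Picard iterates can be computed in closed form. For ẏ = y, y(0) = 1, so that v(y, t) = y and the exact solution is e^t , the iterate A^n φ starting from φ ≡ 1 is the nth Taylor partial sum of e^t . A sketch representing φ by polynomial coefficients, so that the integral in (12) is exact:

```python
import math

# Picard iteration for y' = y, y(0) = 1:  (A phi)(t) = 1 + integral_0^t phi(tau) dtau.
# Represent phi as polynomial coefficients [c0, c1, ...] (phi = sum ck t^k);
# integrating term by term maps ck to ck/(k+1), shifted up one degree.

def picard_step(coeffs):
    return [1.0] + [c / (k + 1) for k, c in enumerate(coeffs)]

phi = [1.0]                       # phi_0(t) = 1
for _ in range(10):
    phi = picard_step(phi)

# phi is now the 10th Taylor partial sum of e^t: [1, 1, 1/2!, ..., 1/10!]
value_at_1 = sum(phi)             # evaluate at t = 1
print(abs(value_at_1 - math.e))   # small: only the tail of the series remains
```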
Our complete metric space will be
X = {φ ∈ C 0 (Ĩ, Rn ) : φ(t0 ) = x0 , sup_{t∈Ĩ} |φ(t) − x0 | ≤ P },
where Ĩ = [t0 − ε, t0 + ε] ⊂ I for a small positive ε to be determined later, | · | is the norm on Rn , and P is chosen so that the closed ball Bx0 (P ) = {x : |x − x0 | ≤ P } ⊂ O. We first demonstrate
Lemma 19. X is a complete metric space.
Proof. First of all, C 0 (Ĩ, Rn ) is complete by Proposition 1. Moreover, the conditions imposed give closed subsets of the Banach space C 0 . The second condition is obviously closed since the norm on any Banach space is continuous. To check the condition φ(t0 ) = x0 is closed, use the following lemma, whose proof is immediate:
Lemma 20. For a metric space J and y ∈ J, the map from the Banach
space C 0 (J, Rn ) to Rn given by f 7→ f (y) is continuous.
Since these two conditions are closed, X is a closed subset of the complete metric space C 0 (Ĩ, Rn ), and so X is complete with the induced metric.
Remark. Lemma 20 is false for the Banach space L∞ . Why?
So we have proved that X is a complete metric space. Next we show
Lemma 21. For ε > 0 small enough, A : X → X.
Proof. First of all, choose δ > 0 so that [t0 − δ, t0 + δ] ⊂ I. Since v is
continuous and {x : |x − x0 | ≤ P } × [t0 − δ, t0 + δ] is compact, there is a
constant M so that
sup_{|t−t0 |≤δ, |x−x0 |≤P} |v(x, t)| ≤ M.
In order for this bound to work below, we must have ε ≤ δ (so then Ĩ ⊂ [t0 − δ, t0 + δ]). To check A : X → X, we need to check for each φ ∈ X,
1. Aφ is continuous. This follows as in Lemma 18 above.
2. (Aφ)(t0 ) = x0 . This is easy to check as in Lemma 18.
3. sup_{t∈Ĩ} |(Aφ)(t) − x0 | ≤ P . To check this, write
|(Aφ)(t) − x0 | = |∫_{t0}^{t} v(φ(τ ), τ ) dτ | ≤ M |t − t0 | ≤ M ε,
where we have used the fact that φ ∈ X and the definition of M to show the first inequality. So this condition is satisfied if ε ≤ P/M .
So A : X → X if ε ≤ min{δ, P/M }.
Finally we use the Lipschitz hypothesis on v to show that A is a contraction map. Let L be the Lipschitz constant for v. Then for φ, ψ ∈ X, compute
|(Aφ)(t) − (Aψ)(t)| = |∫_{t0}^{t} [v(φ(τ ), τ ) − v(ψ(τ ), τ )] dτ |
≤ |∫_{t0}^{t} |v(φ(τ ), τ ) − v(ψ(τ ), τ )| dτ |
≤ |∫_{t0}^{t} L|φ(τ ) − ψ(τ )| dτ |
≤ L‖φ − ψ‖C 0 |t − t0 |
≤ Lε‖φ − ψ‖C 0 .
Then since ‖Aφ − Aψ‖C 0 = sup_{t∈Ĩ} |(Aφ)(t) − (Aψ)(t)|, we see that
‖Aφ − Aψ‖C 0 ≤ Lε‖φ − ψ‖C 0 .
So A is a contraction map if ε < 1/L. Thus all together, if we require ε < min{δ, P/M, 1/L}, then A is a contraction map on X, and its fixed point is a solution to the initial value problem.
In order to show uniqueness of the initial value problem, note that the Contraction Mapping Theorem automatically proves that any two continuous solutions φ1 and φ2 to the initial value problem from Ĩ to Rn must coincide if the additional constraint
sup_{t∈Ĩ} |φ(t) − x0 | ≤ P
is satisfied. Since φ1 and φ2 are continuous and satisfy the initial condition, this condition is automatically satisfied for both φ1 and φ2 on a (perhaps smaller) interval Î ⊂ Ĩ containing t0 . Then uniqueness applies on this smaller interval, since A is a contraction map for any small enough ε. Note that the interval Î on which φ1 = φ2 may depend on φ1 and φ2 . The proof that the two solutions must coincide on all of Ĩ depends on the Extension Theorem 6 below.
We record what we have proven so far with respect to uniqueness here.
Proposition 22. Any two solutions φ1 and φ2 to the initial value problem
(11) coincide on a small interval containing t0 . The interval may depend on
the solutions φ1 and φ2 .
Remark. Note that in the proof of the previous theorem, we only use that v is Lipschitz in the x variables (with a Lipschitz constant uniformly valid for all t). We still require v to be continuous in t.
The previous theorem provides a continuously differentiable solution on an interval Ĩ containing the initial time t0 and proves uniqueness on a (perhaps) smaller interval Î. There is a satisfactory, more global theory of ODEs, which we detail in the next subsection.
2.3 Extension of solutions
Recall, from Corollary 16 above, that any locally C 1 function f from Ω, a
domain in Rn , to Rm is locally Lipschitz. In other words, f is Lipschitz when
restricted to any compact subset of Ω.
Theorem 6 (Extension). Consider an initial value problem
ẋ = v(x, t),
x(t0 ) = x0 .
(13)
Assume v is continuous and locally Lipschitz in Rn × I, where I is an open
interval containing t0 . Then there is an open interval J satisfying t0 ∈ J ⊂ I
and a unique solution φ : J → Rn to the initial value problem. Moreover,
J is maximal in the following sense: if there is a time T ∈ I ∩ ∂J, then
lim supt→T |φ(t)| = ∞.
So this theorem says that if we start with an initial condition x(t0 ) = x0
and flow forward (or backward) in time by satisfying the ODE, then there
is a unique solution which continues until (1) the end of the interval I is
reached, or (2) the solution blows up.
Proof. We first consider the following lemma, which is a consequence of the
proof of Theorem 5 above:
Lemma 23. On any compact subset K of Rn × I, there is an ε > 0 so that for any (x0 , t0 ) ∈ K, there exists a solution to the initial value problem ẋ = v(x, t), x(t0 ) = x0 which is valid on [t0 − ε, t0 + ε].
The point is that there is a uniform ε which works for all initial conditions (x0 , t0 ) ∈ K.
Proof. Recall that in the proof of Theorem 5, any ε < min{δ, P/M, 1/L}
works. By compactness of K and since I is open, we can choose a uniform
δ > 0 so that for all (x0 , t0 ) ∈ K, [t0 − δ, t0 + δ] ⊂ I. We may choose P to
be any positive number (since O = Rn in the present case). The Lipschitz
constant L = LK̃ is uniform over any compact set K̃ by the locally Lipschitz
property of v (Proposition 17). Let
M = max_{(x,t)∈K̃} |v(x, t)|,
where
K̃ = {(x, t) ∈ Rn+1 : ∃(x0 , t0 ) ∈ K, |t − t0 | ≤ δ, |x − x0 | ≤ P }.
It is straightforward to check K̃ is compact (it is the image of the compact set K × BP (0) × [−δ, δ] ⊂ Rn+1 × Rn+1 under the continuous map + : Rn+1 × Rn+1 → Rn+1 ). Therefore, since v is continuous, M can be chosen independently of (x0 , t0 ) ∈ K.
(Note the reason we need to go to all of K̃: the definition of M in the proof of Theorem 5 above is
M = sup_{|t−t0 |≤δ, |x−x0 |≤P} |v(x, t)|.
In order to have a single M work for all (x0 , t0 ) ∈ K, we must let (x, t) range over all of K̃. L must be valid on all of K̃ as well, since we consider integrals from t0 to t, where (x0 , t0 ) ∈ K and |t − t0 | ≤ ε < δ.)
Now we must ensure that ε < min{δ, P/M, 1/L}. All of these quantities
can be chosen independently of (x0 , t0 ) ∈ K.
Lemma 24 (Gluing solutions). Consider any two solutions to ẋ = v(x, t)
which are defined on intervals in R. If the two coincide on any interval
in R then they must coincide on the entire intersection of their intervals of
definition. Thus they can be glued together to form a solution on the union
of their intervals of definition.
Proof. Consider two solutions φ1 , φ2 to ẋ = v(x, t) defined on intervals I1
and I2 . Assume they coincide on an interval I3 ⊂ I1 ∩ I2 . We want to show
φ1 = φ2 on all of I1 ∩ I2 . Let I4 be the largest interval containing I3 on which
φ1 and φ2 coincide (take I4 to be the path-connected component of the closed
set {t : φ1 (t) = φ2 (t)} containing I3 ). Now we will show that I4 = I1 ∩ I2 .
Assume I4 ≠ I1 ∩ I2 . Then since I4 is a relatively closed subinterval of
I1 ∩ I2 , there is an endpoint T of I4 in the interior of I1 ∩ I2 . Now φ1 and φ2
are both solutions of
ẋ = v(x, t),
x(T ) = φ1 (T ) [= φ2 (T )].
Proposition 22 shows that φ1 and φ2 must agree on a small interval I5 ∋ T .
Thus I4 must contain I5 , and we have a contradiction to the assumption that
T is an endpoint of I4 in the interior of I1 ∩ I2 . Thus I4 = I1 ∩ I2 .
It may help to draw a picture of the intervals I1 , I2 , I1 ∩ I2 , I3 , I4 , and I5 , together with the endpoint T .
Now we have proved that φ1 = φ2 on the intersection of their domains of
definition I1 ∩ I2 . To extend to I1 ∪ I2 , define
φ(t) = φ1 (t) for t ∈ I1 ,   φ(t) = φ2 (t) for t ∈ I2 \ I1 .
Note that φ is a solution to the differential equation since both φ1 and φ2
are. There is no trouble with the differentiability of this piecewise-defined
function since φ1 = φ2 on the whole interval I1 ∩ I2 .
For simplicity, consider only solutions moving forward in time. Let
E = {t ∈ I+ : there is a unique solution φ to (13) on [t0 , t)},
where I+ = I ∩ (t0 , ∞). We will set this E to be equal to J+ = J ∩ (t0 , ∞).
Uniqueness on [t0 , t) means any other solution to the initial value problem
defined on an interval containing [t0 , t) must coincide with φ there. It will
suffice to prove the following
Lemma 25. If supE |φ| ≤ C < ∞, then E = I+ .
Proof. Assume |φ| is uniformly bounded on E. Then to prove the lemma it
is enough to show that E is a nonempty, open, and closed subset of I+ (and
so E = I+ since I+ is connected). E is nonempty by Theorem 5 and Lemma
24 above.
To show E is open in I+ , let T ∈ E. Then there is a unique solution φ defined on [t0 , T ). First we note that (t0 , T ] ⊂ E. To see this, let T ′ ∈ (t0 , T ]. Then the restriction of φ to [t0 , T ′ ) is a solution to (13) on [t0 , T ′ ). Moreover, it is unique, since any other solution to (13) on [t0 , T ′ ) agrees with φ on a neighborhood of t0 , and so Lemma 24 shows they must agree on all of [t0 , T ′ ).
So to show E is open, we may restrict our attention to times larger than T . Since |φ| is uniformly bounded by C and [t0 , T ] is a compact subinterval of I, we may apply Lemma 23 to show there is a uniform ε so that any solution to the differential equation with initial condition x(τ ) = χ for τ ∈ [t0 , T ], |χ| ≤ C must exist on [τ − ε, τ + ε]. Now we may consider the initial value problem
ẋ = v(x, t),   x(T − ε/2) = φ(T − ε/2).   (14)
So Lemma 23 shows there is a solution φ̃ to this initial value problem which exists on [T − 3ε/2, T + ε/2]. Moreover, Lemma 24 says that φ = φ̃ on the intersection of their intervals of definition, and moreover, that φ may be extended by φ̃ to a solution on [t0 , T + ε/2]. Lemma 24 also implies this extension is unique on every subinterval containing t0 , and so in particular [T − ε/2, T + ε/2] ⊂ E and E is open.
It remains to show that E is closed in I+ . Let T ∈ Ē ∩ I+ . Let ti ∈ E, ti → T . Then the assumption that |φ| ≤ C on E implies there is a uniform ε so that for all ti , there is a solution on [ti − ε, ti + ε]. Choose ti so that |T − ti | < ε. Also, let τ < ti so that |T − τ | < ε. Now we use the same argument as in previous paragraphs: use the solution φ on [t0 , ti ) to construct a solution φ̃ on [τ − ε, τ + ε] ∋ T . Lemma 24 allows us to glue φ and φ̃ together to form a unique solution valid on [t0 , τ + ε] ∋ T . So T ∈ E as above and E is closed in I+ .
Lemma 25 completes the proof of the Extension Theorem 6, at least
for solutions moving forward in time. The reason is this: if there is a time
T ∈ I+ ∩ ∂J (we may choose I+ since we are only moving forward in time),
then
E = J+ 6= I+ .
Therefore, by the contrapositive of Lemma 25, supE |φ| = ∞. But since φ is
continuous on [t0 , T ), we must have lim supt→T |φ(t)| = ∞.
The argument for solutions moving backward in time is the same.
The above theorem may be improved as follows:
Theorem 7. Consider an initial value problem ẋ = v(x, t), x(t0 ) = x0 .
Assume v is continuous and locally Lipschitz in U, where U is a connected
open subset of Rn × R containing (x0 , t0 ). Then there is an open interval
J satisfying t0 ∈ J and a unique solution φ : J → Rn to the initial value
problem. Moreover, J is maximal in the following sense: Let J+ = J ∩(t0 , ∞)
and J− = J ∩ (−∞, t0 ). Then neither of the graphs G± = {(t, φ(t)) : t ∈ J± }
is contained in any compact subset of U.
The proof is essentially the same as that of Theorem 6.
Here is an important principle which follows from the basic theorems:
Proposition 26. Consider the graph of a solution (t, x(t)) to a differential
equation ẋ = v(x, t), where v is Lipschitz. If any two solutions have graphs
which cross, then they must coincide on the intersection of their intervals of
definition.
Proof. Let φ1 and φ2 be the two solutions. If their graphs cross at (t0 , x0 ),
then they both solve the initial value problem
ẋ = v(x, t),
x(t0 ) = x0 .
The solutions must coincide on a small interval by Proposition 22, and then
must coincide on the whole intersection of their intervals of definition by
Lemma 24.
Homework Problem 15. Consider the initial value problem ẋ = x2 + t,
x(0) = 1. Show that the solution to this problem (moving forward in time)
exists only until some time T > 0, where T < 1.
Hint: See Examples 6 and 7 above. Let φ(t) be the solution to the current initial value problem. We will compare φ to the solution ψ(t) = 1/(1 − t) of the
initial value problem ẋ = x2 , x(0) = 1. Let J be the maximal interval on
which φ can be extended. Let J+ = J ∩ (0, ∞); T is then the positive endpoint
of J+ . Now consider the interval
E = {t ∈ J+ : φ(τ ) ≥ ψ(τ ) for all τ ∈ (0, t]}.
(a) Show that E = J+ implies T ≤ 1. (Use Theorem 6.)
(b) Proceed to show E = J+ . It suffices to show E is nonempty, open and
closed in J+ . Why?
(c) To show E is nonempty, differentiate the equation φ̇ = φ2 + t at t = 0.
This will allow you to compute φ̈(0). Show that φ(0) = ψ(0), φ̇(0) =
ψ̇(0), and φ̈(0) > ψ̈(0). Why does this show E is nonempty? (Use
Taylor’s Theorem or integrate in t twice; in particular, by the regularity
results in Subsection 2.5 below, φ̈ is continuous.)
(d) To show E is open, show that φ̇(t) > ψ̇(t) for t ∈ E.
(e) To show E is closed, use the continuity of φ and ψ. So this proves
E = J+ and so T ≤ 1.
(f ) To show T < 1, note that part (c) implies there is a point τ ∈ E
where φ(τ ) > ψ(τ ). Let ψ̃(t) be the solution to the initial value problem
ẋ = x2 , x(τ ) = φ(τ ). Solve this equation explicitly and show that ψ̃
blows up at a time T̃ < 1. Then note that parts (a)-(e) can be repeated
to show that J+ ⊂ (0, T̃ ).
2.4 Linear systems
If x ∈ Rn , a homogeneous linear system is a system of the form ẋ = A(t)x,
where A(t) is an n × n matrix valued function of t alone. In this case, it is
straightforward to see that the space of solutions is a vector space over R.
In other words, if α ∈ R, φ, ψ satisfy the equation, then αφ + ψ also satisfies
the equation. The existence and uniqueness theorem allows us to find the
dimension of the solution space.
Proposition 27. Consider the equation ẋ = A(t)x, where A(t) is a continuous n × n matrix valued function of t, and x(t) ∈ Rn . For each t0 , there is
an interval I ∋ t0 so that the space of solutions φ(t) on I has dimension n.
Consider an initial value condition x(t0 ) = x0 . Let φx0 (t) be the solution to
this initial value problem. Then the map S : x0 7→ φx0 is a linear isomorphism
from Rn to the space of solutions defined on I.
Remark. It is not too hard to show that the interval I can be taken to be
the maximal open interval containing t0 on which A(t) is continuous. (See
Michael Taylor, Partial Differential Equations, Basic Theory.)
Proof. A(t)x is locally Lipschitz in x and continuous in t, as needed for
Theorems 5 and 6. First of all, for a basis ξi of Rn , let I be a small interval
on which all the solutions φξi exist. Note the map x0 7→ φx0 is obviously
linear. S is injective since if x0 ≠ y0 , then φx0 (t0 ) = x0 ≠ y0 = φy0 (t0 ), and thus φx0 ≠ φy0 .
By linearity, if x0 = ai ξi , then φx0 = ai φξi . Again by uniqueness, any solution φ to
ẋ = A(t)x is determined by the initial value φ(t0 ) = x0 , and so S is onto.
Given a linear equation ẋ = A(t)x, for x = x(t) ∈ Rn , we can consider a
similar equation Ẋ = A(t)X for X = X(t) an n × n matrix valued function.
The solution Φ(t) of the initial value problem
Ẋ = A(t)X,
X(t0 ) = I the identity matrix,
is called the fundamental solution of the equation ẋ = A(t)x. It is straightforward to see that the ith column of Φ(t) is the solution to ẋ = A(t)x,
xj (t0 ) = δij . Moreover, the fundamental solution can be used to compute any
solution to the differential equation near t0 .
Lemma 28. On the maximal interval of existence of the fundamental solution Φ(t) of ẋ = A(t)x, the solution to the initial value problem
ẋ = A(t)x,
x(t0 ) = x0 ,
is given by Φ(t)x0 .
Proof. The proof is an immediate calculation.
Homework Problem 16. An inhomogeneous linear system is a system of
the form
ẋ = A(t)x + b(t),
(15)
where A(t) and x are as above and b(t) is a continuous Rn -valued function.
(a) Let ψ(t) be a solution to (15). Show that the solution space to (15) is
equal to
{ψ(t) + φ(t) : φ(t) solves ẋ = A(t)x}.
(b) In dimension 1, let Φ(t) be the fundamental solution to ẋ = A(t)x. Show that the general solution to (15) is
Φ(t) ( ∫ b(t)/Φ(t) dt + C ).
(c) Still in dimension 1, solve the initial value problem ẋ = x + t, x(0) = 1.
An important class of examples consists of equations with constant coefficients: ẋ = Ax, for A a constant n × n matrix. The fundamental solution to
such an equation (with t0 = 0) can be calculated directly. In the case that A
is diagonalizable, write A = P DP −1 , with D = diag (λ1 , . . . , λn ) the diagonal matrix with the eigenvalues λi of A along the diagonal and P the matrix
whose columns are a basis of eigenvectors for the appropriate eigenvalues.
Then if we define
etD = diag (etλ1 , . . . , etλn ),
then the fundamental solution to ẋ = Ax is given by
etA ≡ P etD P −1 .
To check that etA is the fundamental solution, note that e0A = I and
(d/dt) etA = (d/dt)(P etD P −1 ) = P ((d/dt) etD ) P −1 = P DetD P −1 = P DP −1 P etD P −1 = AetA .
One thing to note is that D and P may be complex-valued matrices. This doesn’t
cause any problem if we use Euler’s formula
ex+iy = ex (cos y + i sin y).
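As a concrete check (my sketch, not from the notes), take A = [[0, −1], [1, 0]], with eigenvalues ±i. The matrices P and D below are complex, but by Euler's formula the product P etD P −1 comes out real, and should equal the rotation matrix [[cos t, −sin t], [sin t, cos t]].

```python
# Hypothetical check that P e^{tD} P^{-1} is real and equals a rotation.
import cmath
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Columns of P are eigenvectors of A = [[0, -1], [1, 0]] for i and -i.
P    = [[1.0, 1.0], [-1j, 1j]]
Pinv = [[0.5, 0.5j], [0.5, -0.5j]]   # one can check that P Pinv = I

t = 0.7
etD = [[cmath.exp(1j * t), 0.0], [0.0, cmath.exp(-1j * t)]]
etA = mat_mul(mat_mul(P, etD), Pinv)

expected = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
err = max(abs(etA[i][j] - expected[i][j]) for i in range(2) for j in range(2))
print(err)  # essentially zero: the complex parts cancel by Euler's formula
```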
Not every matrix B is diagonalizable. To find a general formula for the
fundamental solution etB , we need to deal with the case of Jordan blocks.
The following problem addresses this.
Homework Problem 17. Let B be the n × n Jordan block matrix

    ⎡ λ 1 0 ··· 0 ⎤
    ⎢ 0 λ 1 ··· 0 ⎥
    ⎢ 0 0 λ ··· 0 ⎥                                    (16)
    ⎢ ⋮ ⋮ ⋮ ⋱ ⋮ ⎥
    ⎣ 0 0 0 ··· λ ⎦
with λ on the diagonal, 1 just above the diagonal, and 0 elsewhere. Find the
fundamental solution etB to ẋ = Bx.
Hint: Write out the system of equations in terms of components. Note
that ẋn only involves xn and not any other xi . So first solve the appropriate
initial value problems for xn (you’ll need to do one initial value problem for
each column of the identity matrix I). Then do xn−1 , then xn−2 , etc., and
find a formula that works for all xi .
Alternatively, it is possible to write out etB as a power series. If you
approach the problem this way, you must check to be sure your answer works.
Of course the reason we consider Jordan blocks is the following famous
theorem.
Theorem 8 (Jordan Canonical Form). Let A be an n×n complex matrix.
Then we can write A = P BP −1 , where B is an upper triangular, block
diagonal matrix of the form


        ⎡ B1 0  0  ··· 0  ⎤
        ⎢ 0  B2 0  ··· 0  ⎥
    B = ⎢ 0  0  B3 ··· 0  ⎥
        ⎢ ⋮  ⋮  ⋮  ⋱  ⋮  ⎥
        ⎣ 0  0  0  ··· Bm ⎦
where each Bi is an li × li Jordan block matrix of the form (16) for i =
1, . . . , m, λ = λi an eigenvalue of A. Of course l1 + · · · + lm = n. If λ is a
root of the characteristic polynomial det(λI − A) repeated k times, then
    ∑_{λi = λ} li = k.
B is unique up to the ordering of the blocks Bi .
Remark. A is diagonalizable if and only if each Jordan block is 1 × 1. If
the characteristic polynomial of A has distinct roots, then A is diagonalizable, but the converse is false in general (A = I the identity matrix is a
counterexample).
Homework Problem 18. Assume that all the eigenvalues of the n × n
matrix A have negative real part. (A is not necessarily diagonalizable.) Show
that etA → 0 as t → ∞. (Just check that each entry in the matrix etA goes
to 0.)
Homework Problem 19. Solve the initial value problem

    ẋ = 2x − y,    ẏ = 2x + 5y,    x(0) = 2,    y(0) = 1.

2.5
Regularity
Regularity of a function refers to how many times the function may be differentiated. A function is (locally) C k if it and all of its partial derivatives up to
order k are continuous. A function is C ∞ if it and all of its partial derivatives
of all orders are continuous. For the purposes of this course, a function is
smooth if it is C ∞ (in other settings a function may be called smooth if it has
as many derivatives as the purpose at hand requires). There are other notions of regularity in which the function and perhaps its derivatives, suitably
defined, are in Lp or other Banach spaces.
A vector-valued function is smooth or C k if and only if each of its component functions is smooth or C k respectively.
Theorem 9. Assume v : O × I → Rn is smooth (O ⊂ Rn is a domain and
I ⊂ R is an open interval). Any solution to ẋ = v(x, t) is smooth.
Proof. Let φ be a solution. Since φ̇ exists, then φ is differentiable, and thus
continuous. Since v is continuous as well, φ̇ = v(φ, t) is continuous and so φ
is (locally) C 1 . Now since v is smooth, we may differentiate to find
    φ̈(t) = (∂v/∂xi )(φ, t) φ̇i (t) + (∂v/∂t)(φ, t).
Now since φ and φ̇ and the partial derivatives of v are continuous, we see that
φ̈ is continuous and φ is (locally) C 2 . Since v is smooth, we can keep differentiating, using the chain and product rules, to find by induction dm φ/dtm
is continuous for all m and so φ is C ∞ .
Remark. The technique used in the proof of Theorem 9 above is called bootstrapping. In this process, once we know that φ is C 0 , we plug into the
equation to find that φ is C 1 . Then we use the fact that φ is C 1 to prove φ
is C 2 , etc.
Remark. The proof above also shows that if v is C k , then φ is C k+1 .
2.6
Higher order equations
A higher-order system of ODEs is of the form

    x(m) = v(x(m−1) , . . . , ẋ, x, t),                (17)

where of course x(m) = dm x/dtm . There is an easy trick to transform this system to
an equivalent first-order system with more variables. Let y 1 = ẋ, . . . , y m−1 =
x(m−1) . Then it is easy to see the system (17) above is equivalent to the
system


    ẏ m−1 = v(y m−1 , . . . , y 1 , x, t),
    ẏ m−2 = y m−1 ,
        ⋮                                              (18)
    ẏ 1 = y 2 ,
    ẋ = y 1 .
This first-order system leads us to the appropriate formulation of the initial
value problem:

Theorem 10. Let U be a neighborhood of (x0m−1 , . . . , x01 , x0 , t0 ) in Rnm+1 =
Rn × · · · × Rn × R. Let v : U × I → Rn be locally Lipschitz. Then there is an
interval J on which there is a unique solution to the initial value problem

    x(m) = v(x(m−1) , . . . , ẋ, x, t),
    x(m−1) (t0 ) = x0m−1 ,
        ⋮                                              (19)
    ẋ(t0 ) = x01 ,
    x(t0 ) = x0 .
Moreover, if T is an endpoint of J (either finite or infinite), then as t → T ,
(x(m−1) , . . . , ẋ, x, t) leaves every compact subset of U.
Proof. Apply Theorems 5 and 7.
So for an mth order differential equation, we need initial conditions for
the function and its derivatives up to order m − 1.
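As a sketch of the order-reduction trick (mine, not from the notes): the second-order equation ẍ = −x with x(0) = 1, ẋ(0) = 0 becomes the first-order system ẋ = y, ẏ = −x, and Euler-integrating the system reproduces the known solution x(t) = cos t.

```python
# Hypothetical illustration of the order-reduction trick for x'' = -x.
import math

x, y = 1.0, 0.0           # x(0) = 1 and y(0) = x'(0) = 0
T, n = 1.0, 100000
h = T / n
for _ in range(n):
    # Euler step of the equivalent first-order system x' = y, y' = -x.
    x, y = x + h * y, y - h * x
err = abs(x - math.cos(T))
print(err)  # only Euler discretization error remains
```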
Remark. The trick of introducing new variables into a system of ODEs is
standard in physics. For a particle at position x = x(t), a typical equation
involves how a force acts on the particle. The sum F of the forces acting on
the particle must be equal to mẍ, where m is a constant called the mass. It is
standard to introduce a new vector quantity, called the momentum q = mẋ.
Then F = mẍ is equivalent to the system
    q̇ = F,        ẋ = q/m.
Again, an important class of examples is linear equations with constant
coefficients. If
x(m) + am−1 x(m−1) + · · · + a1 ẋ + a0 x = 0,
for x a real-valued function, the functions {eλk t } are linearly independent in
the solution space, if λk solve the characteristic equation
λm + am−1 λm−1 + · · · + a1 λ + a0 = 0.
If all the roots are distinct, then {eλk t } form a basis. If a root is repeated l
times, then we must consider functions of the form tj eλk t for j = 0, . . . , l − 1
to form a basis of the solution space.
Euler’s formula again allows us to handle complex roots of the characteristic equation.
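For instance (a sketch of the repeated-root case, mine and not from the notes), ẍ − 2ẋ + x = 0 has characteristic equation (λ − 1)² = 0, so t et should be a solution alongside et; the residual below is computed with central differences.

```python
# Hypothetical check that x(t) = t e^t solves x'' - 2x' + x = 0 (lambda = 1 twice).
import math

def x(t):
    return t * math.exp(t)

t, h = 0.9, 1e-4
xp  = (x(t + h) - x(t - h)) / (2 * h)             # central first difference
xpp = (x(t + h) - 2 * x(t) + x(t - h)) / h ** 2   # central second difference
residual = xpp - 2 * xp + x(t)
print(abs(residual))  # small
```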
Homework Problem 20. For which real values of the constants a and b do
all the solutions to
ẍ + aẋ + bx = 0
go to 0 as t → ∞? Prove your answer, and draw your answer as a region in
the (a, b) plane.
2.7
Dependence on initial conditions and parameters
We’ve shown above that if v = v(x, t) is smooth, then the resulting solution
to ẋ = v(x, t), x(t0 ) = x0 is also smooth as a function of t. The initial value
problem also depends on the initial point x0 . We investigate regularity of
the solution depending on x0 .
First of all we remark that there is a neighborhood N of (x0 , t0 ) in Rn+1
and an ε > 0 so that every solution to the equation with initial condition
x(τ ) = y for (y, τ ) ∈ N exists for |t − τ | < ε by Lemma 23. This existence on a neighborhood allows us to consider taking derivatives in y in what follows.
Theorem 11. Let v be a C 2 function on a neighborhood of the initial conditions (y, t0 ) ∈ Rn × R. Then the solution φ = φ(y, t) to the initial value
problem
ẋ = v(x, t),
x(t0 ) = y,
is C 1 in y.
Proof. If ∂φ/∂y i exists, then it must satisfy
    (∂/∂t) Dy φ = Dx v(φ, t) ◦ Dy φ.
(Here Dy φ is the total derivative matrix with respect to the y variables. So
its entries are ∂φj /∂y i .) So Φ = (φ, Dy φ) = (x, z) should satisfy the initial
value problem

    ẋ = v(x, t),
    ż = Dx v(x, t) ◦ z,                                (20)
    x(t0 ) = y,
    z(t0 ) = I the identity matrix.
Note that since v is C 2 , Dx v = (∂v k /∂xj ) is C 1 and is thus locally Lipschitz
by Proposition 15. Even though we don’t yet know that the derivative Dy φ
satisfies the equation, we do know that the initial value problem (20) is
solvable.
In order to show that the solution to (20) is the partial derivative, we return to the
proof of Theorem 5. Let φ0 = y, ψ0 = I the identity matrix. Then (φ0 , ψ0 )
satisfy the initial conditions in (20). Now we form Picard approximations
    φn+1 (y, t) = y + ∫_{t0}^{t} v(φn (y, τ ), τ ) dτ,
    ψn+1 (y, t) = I + ∫_{t0}^{t} Dx v(φn (y, τ ), τ ) ◦ ψn (y, τ ) dτ.
It is easy to show by induction that Dy φn = ψn . We already have the
initial step n = 0, and since we can differentiate under the integral sign
(see Proposition 11 above), we can easily check that Dy φn = ψn implies
Dy φn+1 = ψn+1 .
We know by the proof of Theorem 5 that φn → φ and ψn → ψ uniformly
on a small interval containing t0 . Then Proposition 8 shows that ∂φ/∂y i = ψi ,
the ith component of ψ, for i = 1, . . . , n. Since these partial derivatives are
continuous (the uniform limit of continuous functions is continuous), then
Proposition 6 shows Dy φ = ψ.
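A numerical sketch of the variational system (20) (my illustration, with an arbitrarily chosen right-hand side v(x) = x²): integrating ż = 2xz, z(t0) = 1 alongside ẋ = x² should reproduce ∂φ/∂y, compared here against a centered difference quotient of the flow in its initial condition.

```python
# Hypothetical illustration of the variational system (20); v(x) = x^2,
# and y, T, n, eps are arbitrary choices.

def flow(y, T, n):
    """Euler-integrate x' = x^2 from x(0) = y up to time T."""
    x, h = y, T / n
    for _ in range(n):
        x = x + h * x * x
    return x

y, T, n = 0.5, 0.5, 50000
h = T / n
x, z = y, 1.0
for _ in range(n):
    # Euler steps for x' = v(x) = x^2 and z' = (dv/dx) z = 2 x z, z(0) = 1.
    x, z = x + h * x * x, z + h * 2 * x * z

eps = 1e-4
fd = (flow(y + eps, T, n) - flow(y - eps, T, n)) / (2 * eps)
err = abs(z - fd)
print(err)  # z tracks the y-derivative of the flow
```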
Remark. The previous theorem is true if we assume v is only C 1 and not
necessarily C 2 . The proof is more involved in the case v is only C 1 . (See
Taylor, Partial Differential Equations, Basic Theory, section 1.6.)
A bootstrapping argument can be used to prove the following
Proposition 29. For r ≥ 2, let v be a C r function on a neighborhood of
the initial conditions (x0 , t0 ) ∈ Rn × R. Then the solution φ = φ(y, t) to the
initial value problem
ẋ = v(x, t),
x(t0 ) = y,
is C r−1 in y.
Proof. Let Proposition Tr be the proposition for a given r ≥ 2. We proceed
by induction. The case r = 2 is proved above in Theorem 11. Now assume
that the Proposition Tr has been proved. To prove Tr+1 , assume that v is
locally C r+1 and let φ be a solution to the initial value problem. Then Dx v
is locally C r . Now as above, the pair (φ, Dy φ) = (x, z) satisfies
    ẋ = v(x, t),        ż = Dx v(x, t) ◦ z.            (21)
Now analyze the right-hand side of the equations in (21). They are C r
functions of x, z, t. Therefore, Proposition Tr shows that z = Dy φ is locally
C r−1 in y. Since the first partial derivatives of φ are C r−1 , φ is C r . This
proves the inductive step, and the proposition.
We also have the following
Corollary 30. If v = v(x, t) is smooth (C ∞ ), then the solution φ to the
initial value problem ẋ = v(x, t), x(t0 ) = y is smooth in y.
Moreover, it is not too hard to prove the following:
Theorem 12. Let r ≥ 2. If v(x, t) is C r jointly in x and t, and if φ is the
solution to ẋ = v(x, t), x(t0 ) = y, then φ is jointly C r−1 in y, t and t0 .
Idea of proof. The difficult part is already done (the C r−1 dependence on y).
For the rest, recall that any solution φ = φ(y, t0 , t) satisfies
    φ = y + ∫_{t0}^{t} v(φ(y, t0 , τ ), τ ) dτ.
Then use the Fundamental Theorem of Calculus and Proposition 11 above
to produce a bootstrapping argument to show that the appropriate partial
derivatives are continuous.
For a complete proof, see Arnol’d, Ordinary Differential Equations, section 32.5.
Homework Problem 21. For f = f (x, t, y) a smooth function of the real
variables x, t, and y, compute

    (d/dt) ∫_{0}^{t²} f (x(t, y), t, y) dy.
Make sure your answer works for the functions f (x, t, y) = x2 ty + t3 y 2 + x,
x(t, y) = y 2 + t2 .
Hint: Carefully rename all intermediate variables and apply the Chain
Rule. It should also help to write down the anti-derivative F = ∫ f (x(t, y), t, y) dy
and work with the function F using the Fundamental Theorem of Calculus.
Homework Problem 22 (Smooth dependence on parameters). Show
that if v = v(x, t, α) is jointly smooth on a neighborhood of (x0 , t0 , α0 ) in
Rn × R × Rm , then the solution φ to the initial value problem
ẋ = v(x, t, α),
x(t0 ) = x0
is smooth as a function of α.
Hint: Show that this initial value problem is equivalent to the problem
ẋ = v(x, t, β),
x(t0 ) = x0 ,
β̇ = 0,
β(t0 ) = α.
2.8
Autonomous equations
An ODE system of the form ẋ = v(x) is autonomous. In other words, a
system is autonomous if there is no explicit dependence on t. The main fact
about autonomous systems is the following proposition, whose proof is an
easy computation:
Proposition 31. If φ is a solution to ẋ = v(x), then for all T ∈ R, φ̃(t) =
φ(t + T ) is also a solution.
A constant solution to an ODE system is called an equilibrium solution.
The equilibrium solutions to autonomous equations correspond to the roots
of v.
Example 9. Consider the equation ẋ = x2 − 1. First, we have the equilibrium
solutions x = 1 and x = −1. If x2 − 1 ≠ 0, compute

    dx/dt = x2 − 1,
    ∫ dx/(x2 − 1) = ∫ dt,
    (1/2) ∫ [1/(x − 1) − 1/(x + 1)] dx = t + C,
    (1/2) ln |(x − 1)/(x + 1)| = t + C,
    (x − 1)/(x + 1) = ±e2t+2C = Ae2t ,    A = ±e2C ≠ 0,
    x = (1 + Ae2t )/(1 − Ae2t ),
    A = (x(0) − 1)/(x(0) + 1).
If x(0) ∈ (−1, 1), then A < 0, and the solution x exists for all time and
is bounded between the equilibrium solutions at 1 and −1. Moreover, x
approaches the equilibrium solutions x → −1 as t → ∞ and x → 1 as
t → −∞. If x(0) > 1, then A ∈ (0, 1) and the solution exists only for
t ∈ (−∞, −(1/2) ln A). If x(0) < −1, then A > 1 and the solution exists only
for t ∈ (−(1/2) ln A, ∞).
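The closed form in Example 9 can be checked numerically (my sketch, not from the notes): with x(0) = 0 we get A = −1, and the formula should satisfy ẋ = x² − 1 and stay trapped between the equilibria ±1.

```python
# Hypothetical check of Example 9 with x(0) = 0, so A = -1.
import math

def x(t):
    A = -1.0
    return (1 + A * math.exp(2 * t)) / (1 - A * math.exp(2 * t))

t, h = 0.8, 1e-5
xp = (x(t + h) - x(t - h)) / (2 * h)   # numerical derivative
residual = abs(xp - (x(t) ** 2 - 1))
print(residual)            # x satisfies the ODE
print(-1 < x(5.0) < 1)     # and stays between the equilibria
```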
This behavior is typical of the behavior of autonomous equations for Lipschitz v. Any bounded solution which exists for all time must be asymptotic
to equilibrium solutions as t → ±∞. Also note that any integral curve I
acts as a barrier to other solutions, in that no other integral curves can cross
I (see Proposition 26 above).
Homework Problem 23. Let v : R → R be locally Lipschitz. Show
that any bounded solution φ of ẋ = v(x) which exists for all time satisfies
limt→∞ φ(t) = c, where v(c) = 0.
Hint: There are three cases:
Case 1: v(φ(0)) = 0. Show that φ is constant by uniqueness.
Case 2: v(φ(0)) > 0. Show that v(φ(t)) > 0 for all t (if it is ever equal to
zero, apply the argument of Case 1 above to show φ is constant; also use the
continuity of v ◦ φ). Now show φ(t) is always increasing, and so must have
a finite limit c as t → ∞. Compute limt→∞ v(φ(t)). Write
    ∞ > c = φ(0) + ∫_{0}^{∞} φ̇(t) dt = φ(0) + ∫_{0}^{∞} v(φ(t)) dt,
and show that v(c) = 0.
Case 3: v(φ(0)) < 0 is essentially the same as Case 2.
2.9
Vector fields and flows
An important interpretation of autonomous systems of equations is given in
terms of vector fields. Interpret x(t) as a parametrized curve x : I → Rn ,
where I ⊂ R is an interval. Then ẋ(t) is the tangent vector to the curve at
time t. For O ⊂ Rn an open set, a function v : O → Rn can be thought
of as a vector field. In other words, at every point x ∈ O, v(x) is a vector
in Rn based at x. Then we have a natural interpretation of an autonomous
differential equation ẋ = v(x) as the flow along the vector field v.
For any solution to ẋ = v(x), the tangent vector ẋ(t) must be equal to
the value of the vector field v(x(t)). The solution x(t) is an integral curve
to the equation ẋ = v(x). The integral curves for the solution are tangent
to the vector field at each point x. Moreover, if v(x) is locally Lipschitz,
then the solutions are unique, and we may think of the vector field as giving
unique directions for how to proceed in time at each point in space. By
the invariance of solutions in time, we have the following strong version of
uniqueness:
Proposition 32. Let O ⊂ Rn be an open set, and let v : O → Rn be locally
Lipschitz. If φ1 and φ2 are two maximally extended solutions to ẋ = v(x)
which satisfy φ1 (t1 ) = φ2 (t2 ), then φ1 (t) = φ2 (t + t2 − t1 ) for all t in the
maximal interval of definition of φ1 .
Proof. φ1 (t) and φ̃2 (t) = φ2 (t + t2 − t1 ) both satisfy the initial value problem
ẋ = v(x),
x(t1 ) = φ1 (t1 ),
and so must be the same by Theorems 5 and 6.
For a vector field v on O ⊂ Rn , a picture of all the integral curves on O
is called the phase portrait of v. Recall we drew in class the phase portraits
of the two systems in R2

    ẋ = ⎡ 1  0 ⎤ x,        ẋ = ⎡ −3  4 ⎤ x.
        ⎣ 0 −1 ⎦               ⎣ −2  3 ⎦
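For the first system, the flow is (x1 , x2 ) ↦ (x1 et , x2 e−t ), so the product x1 x2 is constant along integral curves, and the phase portrait consists of hyperbolas together with the axes. A quick numerical check (my sketch, not from the notes):

```python
# Hypothetical check: along x' = x, y' = -y the product x*y is conserved.
x, y = 2.0, 0.5
c0 = x * y
T, n = 1.0, 50000
h = T / n
for _ in range(n):
    x, y = x + h * x, y - h * y   # Euler step of the linear system
drift = abs(x * y - c0)
print(drift)  # approximately zero: the trajectory stays on a hyperbola
```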
Homework Problem 24.
(a) Draw the phase portrait of the system in R2

        ẋ = ⎡ 1 0 ⎤ x.
            ⎣ 0 2 ⎦

    Show that each integral curve lies in a parabola or a line in R2 .

(b) Draw the phase portrait of the system in R2

        ẋ = ⎡  3/2  −1/2 ⎤ x.
            ⎣ −1/2   3/2 ⎦
Here is the principal theorem regarding flows of vector fields on open sets:
Theorem 13. Let O ⊂ Rn be open, and v : O → Rn be smooth. Then there
is an open set U so that O × {0} ⊂ U ⊂ O × R on which the solution φ(y, t)
to
ẋ = v(x),
x(0) = y
exists, is unique, and is smooth jointly as a function of (y, t).
Proof. This follows immediately from Theorems 5, 7 and 11.
Remark. It may not be possible to find an ε > 0 so that O × (−ε, ε) ⊂ U. The
reason is that solutions may leave O in shorter and shorter times for initial
conditions y → ∂O. A simple example is given by v(x) = 1, O = (0, 1).
This problem cannot be fixed by considering O = Rn , since we may have
v(y) → ∞ rapidly as y → ∞ in Rn . However, see the following corollary.
Corollary 33. Under the conditions of Theorem 13 above, if K ⊂ O is
compact, then there is an ε > 0 so that the solution

    φ : K × (−ε, ε) → O.
Proposition 34. Consider φ(y, t) the solution to ẋ = v(x), x(0) = y, for v
smooth. Then as long as φ(y, t1 ), φ(y, t1 + t2 ) ∈ O, then
φ(y, t1 + t2 ) = φ(φ(y, t1 ), t2 ).
Proof. Consider
ψ(t) = φ(y, t1 + t),
θ(t) = φ(φ(y, t1 ), t).
Then if we show ψ and θ satisfy the same initial value problem, then uniqueness will show that ψ(t) = θ(t) and we are done.
Compute
    ψ(0) = φ(y, t1 ),
    θ(0) = φ(φ(y, t1 ), 0) = φ(y, t1 ),
    ψ̇(t) = φ̇(y, t1 + t) · 1 = v(φ(y, t1 + t)) = v(ψ(t)),
    θ̇(t) = φ̇(φ(y, t1 ), t) = v(φ(φ(y, t1 ), t)) = v(θ(t)).
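The flow property of Proposition 34 is easy to test numerically; in this sketch (mine, not from the notes) the field v(x) = sin x and the times are arbitrary choices.

```python
# Hypothetical check of phi(y, t1 + t2) = phi(phi(y, t1), t2) for v(x) = sin x.
import math

def phi(y, t, n=50000):
    """Euler-integrate x' = sin x from x(0) = y for time t."""
    x, h = y, t / n
    for _ in range(n):
        x = x + h * math.sin(x)
    return x

y, t1, t2 = 1.0, 0.3, 0.4
gap = abs(phi(y, t1 + t2) - phi(phi(y, t1), t2))
print(gap)  # small: the flow property holds up to discretization error
```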
Note that it is necessary in the previous Proposition 34 to restrict to
times in which the solution does not leave O. In fact, long-time existence
of flows along vector fields is problematic on open subsets of Rn . Recall we
require our subsets to be open for ODEs since we want to be able to take two-sided limits for any derivatives involved. On the other hand, compactness
guarantees a uniform time interval for existence. But compact subsets of Rn
are closed and bounded, and thus (if nonempty) cannot be open. The way
out of this problem is to consider compact manifolds, which we will realize
as compact lower-dimensional subsets of Rn . For example,
S1 = {(x1 , x2 ) : (x1 )2 + (x2 )2 = 1}
is a compact one-dimensional submanifold of R2 .
2.10
Vector fields as differential operators
A vector field v on O naturally differentiates functions f on O by the directional derivative:
    vf = Dv f = v i ∂f /∂xi

for v i the components of v. Therefore, we often write

    v = v i ∂/∂xi .
We say that v is a first-order differential operator on functions f .
This observation is natural from the point of view of ODEs by the following
Proposition 35. For an interval I ⊂ R, let φ : I → Rn be a solution to the
autonomous system ẋ = v(x), where v : O → Rn is a continuous function and
O an open subset in Rn . Also consider a differentiable function f : O → R.
Then the derivative
(f ◦ φ)′ (t) = (Df )(φ(t)) ◦ (Dφ)(t) = (vf )(φ(t)).
Proof. Compute

    (f ◦ φ)′ (t) = (Df )(φ(t)) ◦ (Dφ)(t)
                 = (∂f /∂xi )(φ(t)) (dφi /dt)(t)
                 = (∂f /∂xi )(φ(t)) v i (φ(t))
                 = (v i ∂/∂xi f )(φ(t))
                 = (vf )(φ(t)).
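As a sketch of Proposition 35 (mine, not from the notes): for the rotation field v = −y ∂/∂x + x ∂/∂y and f (x, y) = x² + y², we get vf = (−y)(2x) + (x)(2y) = 0, so f must be constant along every integral curve.

```python
# Hypothetical check: v = (-y, x), f = x^2 + y^2, and vf = 0, so f is
# constant along the flow of v (integrated here with Euler steps).
x, y = 1.0, 0.0
f0 = x * x + y * y
T, n = 2.0, 100000
h = T / n
for _ in range(n):
    x, y = x - h * y, y + h * x   # Euler step along the rotation field
drift = abs(x * x + y * y - f0)
print(drift)  # (vf) = 0, so f drifts only by the Euler error
```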
Define the bracket [v, w] of two operators to be
[v, w]f = (vw − wv)f = v(wf ) − w(vf ).
Homework Problem 25. Let v and w be two smooth vector fields on O.
(a) Show that the differential operator [v, w] is also a first-order differential
operator determined by a vector field (which we also write as [v, w]).
What are the components of [v, w]?
(b) For smooth vector fields u, v and w, show that
[u, v] = −[v, u]
and
[[u, v], w] + [[v, w], u] + [[w, u], v] = 0.
(This last identity is the Jacobi identity.)
Remark. Part (b) of the previous problem shows that the vector space of
smooth vector fields on O is a Lie algebra. The bracket [·, ·] is called the Lie
bracket.
3
Manifolds
3.1
Smooth manifolds
We define smooth manifolds as subsets of RN . We basically follow Spivak,
Calculus on Manifolds, Chapter 5. When we say smooth in this section, we
mean C ∞ .
We say a subset M ⊂ Rn is a smooth k-dimensional manifold (or, more
properly, a submanifold of Rn ), if for all x ∈ M , there are open subsets
U ⊂ Rk and O ⊂ M with x ∈ O and a one-to-one C ∞ map φ : U → Rn
satisfying
1. φ(U) = O.
2. For all y ∈ U, Dφ(y) has rank k.
3. φ−1 : O → U is continuous.
Such a pair (φ, U) is called a local parametrization of M . The components
of the map φ−1 : O → Rk are local coordinates on M . A set of triples
(φα , Uα , Oα ) is called an atlas of M if {Oα } is an open cover of M .
Since O is an open subset of M , there is an open subset W ⊂ Rn so that
O = M ∩ W . In this case, we may rewrite condition (1) as
(1′ ) φ(U) = M ∩ W .
Also note that φ : U → O is a homeomorphism from U to O since it is
smooth, one-to-one, onto, and φ−1 is continuous.
Now we note with a few examples why conditions (2) and (3) are necessary. First of all, consider φ : R → R2 given by φ(t) = (t2 , t3 ). Then
φ is smooth, one-to-one, and φ−1 : φ(R) → R is continuous. But we note
the image φ(R), which is the graph of x1 = (x2 )2/3 in R2 , is not smooth at
(0, 0) ∈ R2 . We also check that

    Dφ = ⎡ 2t  ⎤ = 0    when t = 0 and φ(t) = (0, 0),
         ⎣ 3t2 ⎦
and so Dφ has rank 0 < 1 at the point at which φ(R) is not smooth.
Condition (3) is necessary by the following problem:
Homework Problem 26. Recall polar coordinates (x, y) = (r cos θ, r sin θ)
in R2 . Show that a portion of the polar graph r = sin 2θ can be parametrized
for I an open interval in R, by φ : I → R2 so that φ is one-to-one, C ∞ , and
Dφ is never 0, but so that φ−1 : φ(I) → I is not continuous. Sketch the graph
and indicate pictorially why φ(I) should not be considered a submanifold of
R2 .
If W and V are open subsets of Rn , then a map f : W → V is a diffeomorphism if f is one-to-one, onto, C ∞ , and f −1 is C ∞ . The Inverse Function
Theorem and Problem 9 show
Lemma 36. f : W → V is a diffeomorphism if and only if f is one-to-one,
onto, C ∞ , and det Df (x) ≠ 0 for all x ∈ W .
The following theorem is useful in proving properties about manifolds:
Theorem 14. M ⊂ Rn is a k-dimensional manifold if and only if for all x ∈
M , there are two open subsets V, W of Rn , with x ∈ W and a diffeomorphism
h : W → V satisfying
h(W ∩ M ) = V ∩ (Rk × {0}) = {y ∈ V : y k+1 = · · · = y n = 0}.
Proof. (⇐) Let U = {a ∈ Rk : (a, 0) ∈ h(W )}, and define φ : U → Rn by
φ(a) = h−1 (a, 0). φ is smooth and one-to-one since h is a diffeomorphism.
Moreover, φ(U) = M ∩ W , satisfying condition (1′ ), and φ−1 = h|W ∩M is continuous.
So all that is left to check is the rank condition (2). Consider H : W → Rk given by
H(z) = (h1 (z), . . . , hk (z)).
Then H(φ(y)) = y for all y ∈ U. Then use the Chain Rule to compute
DH(φ(y)) ◦ Dφ(y) = I, and so Dφ(y) must be an injective linear map, and
so must have rank k. Thus M is a smooth manifold.
(⇒) Now assume M is a manifold, and define y = φ−1 (x). Then Dφ(y)
has rank k, and so there is at least one k × k submatrix of Dφ(y) with
nonzero determinant. (We may think of Dφ(y) as an n × k matrix mapping
column vectors in Rk to column vectors in Rn . Then a k × k submatrix is
simply a collection of k distinct rows of Dφ(y).) By a linear change of basis,
if necessary, then, we may assume that
    det_{1≤i,j≤k} (∂φi /∂y j )(y) ≠ 0.
By continuity, this is true on an open neighborhood U ′ of y.
Define g : U ′ × Rn−k → Rn by g(a, b) = φ(a) + (0, b). Then, in block matrix
form,

    Dg(a, b) = ⎡ (∂φi /∂y j )1≤i,j≤k        0    ⎤
               ⎣ (∂φi /∂y j )1≤j≤k, k<i≤n   In−k ⎦

So det Dg(a, b) = det_{1≤i,j≤k} (∂φi /∂y j ) ≠ 0. So we may apply the Inverse Function
Theorem to find that there are open subsets of Rn , V1′ ∋ (y, 0) and V2′ ∋
g(y, 0) = x, so that g : V1′ → V2′ has a smooth inverse h : V2′ → V1′ .
Define O via

    O = {φ(a) : (a, 0) ∈ V1′ } = (φ−1 )−1 (ι−1 (V1′ )),

where ι : Rk → Rn sends a to (a, 0). Since φ−1 is continuous, O is an open
subset of φ(U ′ ), and of M . Therefore, there is an open subset Ṽ of Rn so
that O = M ∩ Ṽ .
Let W = Ṽ ∩ V2′ , and V = g −1 (W ). Then h : W → V is a diffeomorphism
and

    W ∩ M = {φ(a) : (a, 0) ∈ V }
          = {g(a, 0) : (a, 0) ∈ V },
    h(W ∩ M ) = g −1 (W ∩ M )
              = g −1 ({g(a, 0) : (a, 0) ∈ V })
              = V ∩ (Rk × {0}).
This completes the proof.
This characterization of manifolds is quite useful. Consider two smooth
local parametrizations φα : Uα → Oα and φβ : Uβ → Oβ . If Oα ∩ Oβ ≠ ∅,
then we have the following:

Proposition 37. φβ−1 ◦ φα : φα−1 (Oβ ) → φβ−1 (Oα ) is a diffeomorphism.
Proof. Consider π : Rn → Rk given by (a, b) ↦ a for (a, b) ∈ Rk × Rn−k , and
ι : Rk → Rn given by ι(a) = (a, 0). Let hα and hβ be the diffeomorphisms
guaranteed by Theorem 14. Then φα (a) = hα−1 (a, 0), φα−1 (x) = π(hα (x)), and
so

    φβ−1 ◦ φα = π ◦ hβ ◦ hα−1 ◦ ι

is smooth since hα , hβ are diffeomorphisms.
The maps φβ−1 ◦ φα are called gluing maps.
Remark. It is often useful to think of a manifold M as being glued together
from domains Uα in Rk by the gluing maps. In fact, the previous proposition is the starting point for the abstract definition of a smooth manifold:
A smooth k-dimensional manifold is a Hausdorff, sigma-compact topological
space for which each point x has a neighborhood Oα homeomorphic to a
domain Uα ⊂ Rk via φα : Uα → Oα . In addition, we require the gluing maps
φβ−1 ◦ φα to be smooth on φα−1 (Oβ ).
If M is a smooth manifold, then a function f : M → Rp is said to
be smooth if for each smooth parametrization φ : U → M , f ◦ φ : U →
Rp is smooth. If N ⊂ Rp is a smooth submanifold, then f : M → N is
said to be smooth if the induced map f : M → Rp is smooth. (For abstract
target manifolds N , we may work with local parametrizations instead.) This
definition of smooth maps from manifolds is consistent in the following sense:
Proposition 38. If f : M → Rp , and f ◦ φα is smooth from Uα → Rp , then
on φβ−1 (Oα ) ⊂ Uβ , f ◦ φβ is also smooth.
Proof. Apply Proposition 37 and the Chain Rule.
Proposition 39. If M ⊂ Rn is a smooth manifold and f : M → Rp , then f
is smooth if and only if f can be locally extended to smooth functions from
domains in Rn to Rp . In other words, f is smooth if and only if every x ∈ M
has a neighborhood W in Rn and there is a smooth function F : W → Rp so
that F |W ∩M = f .
Proof. (⇒) For x ∈ M , consider the local diffeomorphism h : W → V
guaranteed by Theorem 14. Then for the smooth parametrization φ(a) =
h−1 (a, 0), we know f ◦ φ is smooth. Now define
F = f ◦ h−1 ◦ π ◦ h : W → Rp
for π : (a, b) 7→ a. F is smooth since
F = f ◦ h−1 ◦ π ◦ h = (f ◦ φ) ◦ π ◦ h.
(⇐) For a local parametrization φ, f ◦ φ is smooth since locally, f ◦ φ =
F ◦ φ, which is smooth by the Chain Rule.
X ⊂ RN is a smooth manifold of dimension k if every x ∈ X has a
neighborhood that is diffeomorphic to an open subset of Rk . In other words,
there is an open cover Oα of X so that each Oα is diffeomorphic to an open
subset Uα ⊂ Rk . Let φα : Uα → Oα be the diffeomorphism. φα is called a
parametrization of Oα ⊂ X, and the inverse map φα−1 is called a coordinate
system. The open cover, together with the coordinate systems
{Oα , φα , Uα }
is called a smooth atlas of X, and X is a smooth manifold if and only if it
has a smooth atlas.
Example 10. The unit sphere
S2 = {(x1 , x2 , x3 ) ∈ R3 : (x1 )2 + (x2 )2 + (x3 )2 = 1}
is a two-dimensional submanifold of R3 .
To show this, we provide an atlas. Let N = (0, 0, 1) be the north pole and
S = (0, 0, −1) be the south pole. Then let O1 = S2 \ {N }, O2 = S2 \ {S},
U1 = U2 = R2 . We construct the coordinate systems φα−1 , α = 1, 2, by
stereographic projection. We may realize R2 as the plane {x3 = 0} ⊂ R3 .
For a point x in O1 , consider the line Lx,N in R3 through N and x. We
define φ1−1 (x) to be the unique point in R2 ∩ Lx,N . It is easy to compute

    (y 1 , y 2 ) = φ1−1 (x1 , x2 , x3 ) = ( x1 /(1 − x3 ), x2 /(1 − x3 ) ),
    (x1 , x2 , x3 ) = φ1 (y 1 , y 2 ) = ( 2y 1 /(|y|2 + 1), 2y 2 /(|y|2 + 1), (|y|2 − 1)/(|y|2 + 1) ).

Similarly, for any point x ∈ O2 , define φ2−1 (x) to be the unique point in
R2 ∩ Lx,S , and we find as above

    (z 1 , z 2 ) = φ2−1 (x1 , x2 , x3 ) = ( x1 /(1 + x3 ), x2 /(1 + x3 ) ),
    (x1 , x2 , x3 ) = φ2 (z 1 , z 2 ) = ( 2z 1 /(|z|2 + 1), 2z 2 /(|z|2 + 1), −(|z|2 − 1)/(|z|2 + 1) ).
It is straightforward to check that each of these coordinate systems is a diffeomorphism, and since S2 = O1 ∪ O2 , we have produced a smooth atlas of
S2 and thus have shown that S2 is a two-dimensional manifold.
Given a smooth manifold X with a smooth atlas {Oα , φα , Uα }, let Oαβ =
Oα ∩ Oβ . Also define Uαβ = Uα ∩ φα−1 (Oαβ ). As long as Oαβ ≠ ∅, the map

    φαβ ≡ φβ−1 ◦ φα : Uαβ → Uβα
is a diffeomorphism. These maps φαβ are called the gluing maps of the manifold X associated to the atlas. In particular, the manifold can be thought of
as the union of the coordinate charts Uα glued together by the gluing maps.
It is straightforward to see that, at least as a set, we may identify

    X = ( ⊔α Uα ) / ∼,

where ⊔ means disjoint union and the equivalence relation ∼ is given by

    x ∼ y    if x ∈ Uαβ ⊂ Uα , y ∈ Uβα ⊂ Uβ , y = φαβ (x).
Gluing maps may be used to define smooth manifolds which are not necessarily subsets of RN (though we won’t do so here). It is instructive to think of
k-dimensional smooth manifolds as spaces that are smoothly glued together
from open sets in Rk .
Example 11. Recall the example of the atlas of S2 above. Compute
    O12 = S2 \ {S, N },    U12 = R2 \ {0},    U21 = R2 \ {0},

    z = φ12 (y) = φ2−1 (φ1 (y)) = ( y 1 /|y|2 , y 2 /|y|2 ) = y/|y|2 .
This gluing map is called inversion across the circle |y|2 = 1 in R2 . Each
point is mapped to a point on the same ray through the origin, but the distance
to the origin is replaced by its reciprocal. So we can think of S2 as two copies
of R2 glued together along R2 \{0} by the inversion map across the unit circle.
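The formulas of Examples 10 and 11 can be verified directly. In this sketch (mine, not from the notes), φ1 should land on the sphere, and φ2−1 ◦ φ1 should be inversion across the unit circle.

```python
# Hypothetical check of the stereographic formulas and the gluing map.

def phi1(y1, y2):
    """Inverse stereographic projection from the north pole."""
    s = y1 * y1 + y2 * y2
    return (2 * y1 / (s + 1), 2 * y2 / (s + 1), (s - 1) / (s + 1))

def phi2_inv(x1, x2, x3):
    """Stereographic projection from the south pole."""
    return (x1 / (1 + x3), x2 / (1 + x3))

y = (0.6, -1.3)
p = phi1(*y)
on_sphere = abs(p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - 1)

z = phi2_inv(*p)                      # the gluing map phi2^{-1} o phi1
s = y[0] ** 2 + y[1] ** 2
inv_err = max(abs(z[0] - y[0] / s), abs(z[1] - y[1] / s))
print(on_sphere, inv_err)  # both essentially zero
```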
3.2
Tangent vectors on manifolds
Recall that for a solution φ to an autonomous system ẋ = v(x), the parametric curve φ(t) has tangent vector φ̇(t) = v(φ(t)) at time t. We will use
this to define tangent vectors to manifolds. A tangent vector at a point p
in a smooth manifold X is given by the derivative α̇(0) of a smooth curve
α : (−ε, ε) → X ⊂ RN so that α(0) = p. (Note the fact RN is a vector space
allows us to differentiate α.) The space of all tangent vectors at p is called
the tangent space Tp X of X at p, and it is characterized by the following
proposition.
Proposition 40. If X ⊂ RN is a k-dimensional smooth manifold, then the
tangent space Tp X is the following: Given a local parametrization of X
φ : U → O ∋ p so that φ(0) = p,

    Tp X = Dφ(0)(Rk ).
In particular, Tp X is naturally a k-dimensional vector space.
Proof. First of all, given a curve α : (−ε, ε) → X so that α(0) = p, we can
ensure (by shrinking ε if necessary) that the image of α is contained in the
coordinate neighborhood O. Now
α = φ ◦ (φ−1 ◦ α)
and the chain rule shows that
α′ (0) = Dφ(0)[(φ−1 ◦ α)′ (0)] ∈ Dφ(0)(Rk ).
Thus we’ve shown Tp X ⊂ Dφ(0)(Rk ).
To show Dφ(0)(Rk ) ⊂ Tp X, for any vector v ∈ Rk , consider α(t) = φ(tv)
for |t| small enough that the image of α is contained in O. Then
α′ (0) = Dφ(0)v
and so Dφ(0)(Rk ) = Tp X.
Also note the following corollary of our definition of Tp X:
Corollary 41. Tp X is independent of the coordinate neighborhood O of p.
If f : X → Rm is a smooth map from a smooth k-dimensional manifold
X, and if p ∈ X, then we define
Df (p) : Tp X → Rm
by using a local parametrization φ : U → X so that φ(q) = p. Then we define
Df (p) = D(f ◦ φ)(q) ◦ (Dφ(q))−1 .
The following exercise verifies this definition makes sense (see Guillemin and
Pollack).
Homework Problem 27.
(a) Show that Dφ(q) is invertible as a linear map from Rk to Tp X.
(b) Show that the definition of Df (p) is independent of the coordinate
parametrization φ.
(c) Show that if f : X → Y for Y ⊂ Rm a manifold, then Df (p)(Tp X) ⊂
Tf (p) Y .
Tangent vectors naturally differentiate functions at a point. So if f : X →
R and v = α′ (0) is the tangent vector of a curve α so that α(0) = p,
then we may define

    (vf )(p) = (f ◦ α)′ (0) = Df (p)α′ (0) = Df (p)v.
This definition depends only on v, and not on the curve α used. (For each v
there are many α, since v only depends on the first derivative α′ (0) and no
higher Taylor coefficients.)
For a coordinate system
φ−1 = (x1 , . . . , xk ) : O → Rk ,
(where we assume as usual that φ(0) = p), then the coordinate basis of Tp X
induced by φ may be written as {∂/∂xi }, which are thought of as tangent
vectors differentiating functions f by
(∂/∂xi )|p f = (∂/∂xi )(f ◦ φ)|0 = (∂/∂xi ) f (x1 , . . . , xk )|0 .
(∂/∂xi is the tangent vector associated to the curve α(t) = φ(tei ), for ei the
ith standard basis vector in Rk .) Thus we can write any tangent vector v
at p as
v = v i ∂/∂xi .
Writing tangent vectors in terms of the coordinate basis of Tp X is much more
useful than writing them in terms of a basis of RN ⊃ Tp X.
The components v i will change depending on the local coordinates. On
Oαβ = Oα ∩ Oβ the intersection of two coordinate neighborhoods of p, then
we have two coordinate systems φα −1 = (x1 , . . . , xk ) and φβ −1 = (y 1 , . . . , y k ).
We can write by using the chain rule
v = v i (x) ∂/∂xi = v i (x) (∂y j /∂xi ) ∂/∂y j = v j (y) ∂/∂y j .
Therefore, we know how the v i change under coordinate transformations
x → y:
v j (y) = v i (x) ∂y j /∂xi .    (22)
(In a more coordinate-free notation, the Jacobian matrix ∂y j /∂xi is the
derivative of the gluing map φαβ = φβ −1 ◦ φα . It is easy to check that
y = φαβ ◦ x.)
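The transformation rule (22) can be checked numerically. The sketch below (all function names are illustrative, not from the text) uses the polar-to-Cartesian change of coordinates y(r, θ) = (r cos θ, r sin θ): pushing the components v i forward with the Jacobian ∂y j /∂xi should agree with differentiating the curve t ↦ y(x + tv).

```python
import math

# Check v^j(y) = v^i(x) * dy^j/dx^i for polar -> Cartesian coordinates.

def y_of_x(x):
    r, th = x
    return (r * math.cos(th), r * math.sin(th))

def jacobian(x):
    r, th = x
    # dy^j/dx^i, row index j, column index i
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th),  r * math.cos(th)]]

x = (2.0, 0.7)     # a point in polar coordinates (r, theta)
v = (0.3, -1.1)    # components v^i in the polar coordinate frame

# Transform the components with the Jacobian, as in equation (22)
J = jacobian(x)
v_y = tuple(sum(J[j][i] * v[i] for i in range(2)) for j in range(2))

# Independent check: v is the velocity of the curve t -> x + t v, so its
# Cartesian components are the t-derivative of t -> y(x + t v) at t = 0.
h = 1e-6
fd = tuple((y_of_x((x[0] + h * v[0], x[1] + h * v[1]))[j]
            - y_of_x((x[0] - h * v[0], x[1] - h * v[1]))[j]) / (2 * h)
           for j in range(2))

print(all(abs(v_y[j] - fd[j]) < 1e-6 for j in range(2)))
```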
All the tangent spaces of a manifold X patch together to make a larger
manifold T X called the tangent bundle. We define the tangent bundle
T X = {(p, w) ∈ RN × RN : p ∈ X, w ∈ Tp X}.
Homework Problem 28. If X is a k-dimensional manifold, show that T X
is a 2k-dimensional submanifold of R2N . To prove this, consider a local
parametrization φ : U → X ⊂ RN .
(a) Define Φ : U × Rk → R2N for y = (y 1 , . . . , y k ) by
Φ(x, y) = ( φ(x), (∂φ/∂xi )(x) y i ).
Show that Φ(U ×Rk ) is an open subset of T X and that Φ is one-to-one.
(b) Show that DΦ has rank 2k.
(c) Show that Φ−1 is continuous from Φ(U × Rk ) to U × Rk .
There is a natural smooth map
π : T X → X,
π(p, w) = p,
and each π −1 ({p}) is the vector space Tp X.
Each coordinate system φ−1 = (x1 , . . . , xk ) provides a local frame {∂/∂xi }
of the tangent bundle. A local frame is a basis of the tangent space for every p in a neighborhood O ⊂ X. These frames are patched together in the
following paragraph.
A more abstract view of the tangent bundle is given by looking at a given
smooth atlas {Oα , φα , Uα } of X. Then as a set, we may identify
T X = ( ⊔α Uα × Rk ) / ≈,
where the equivalence class ≈ is given by
(x, v) ≈ (y, w) if x ∈ Uαβ , y ∈ Uβα , y = φαβ (x), w = Dφαβ v.
A vector field on a manifold X provides a tangent vector at every point
in X. More precisely, a vector field is a section of the tangent bundle. In
other words, v : X → T X is a vector field if π(v(p)) = p for all p ∈ X. So
v(p) = (p, w(p)) for w(p) ∈ Tp X. In fact, for X ⊂ RN , w : X → RN so that
w(p) ∈ Tp X is equivalent to v(p) = (p, w(p)). (Clearly v and w carry the
same amount of information, and we often will refer to both of them using
the same symbol v.)
A vector field v is smooth if it is given as a smooth map from X to
RN × RN ⊃ T X as above. Equivalently, v is smooth if for every local
coordinate system (x1 , . . . , xk ),
v = v i (x) ∂/∂xi
for v i smooth on U ⊂ Rk .
3.3 Flows on manifolds
A smooth vector field v on a manifold X defines a system of ODEs in the
local coordinates of X (or we may say more simply a system on X). The
ODE system is given by
ẋ = v(x)
for x : I → X a parametric curve.
In order to describe the relationship between the local and global pictures
of the ODE system, consider X ⊂ RN and v : X → RN so that for each
p ∈ X, v(p) ∈ Tp X. Consider a local parametrization φα : Uα → Oα . Let
φα −1 = (x1α , . . . , xkα ). Locally on Uα ⊂ Rk , we represent v by
vα = vαi ∂/∂xiα .
In other words, for p ∈ Oα ⊂ X, we have
v(p) = Dφα (p)vα (p).
Proposition 42. Consider v a smooth vector field on X ⊂ RN . Consider a
solution ψα to ẋα = vα (xα ), where ψα : I → Uα for a time interval I. Then
ψ = φα ◦ ψα
is a solution to ẋ = v(x) from I to Oα ⊂ X. Every solution to ẋ = v(x)
restricted to Oα is of this form.
Proof. First of all, note that ẋα = vα (xα ) is a well-defined system of ODEs
on the open set Uα ⊂ Rk . On the other hand, on X, the system ẋ = v(x) is
not an ODE system on RN ⊃ X. This may be remedied locally as follows:
For each p ∈ X, v(p) ∈ Tp X ⊂ RN . Then since v is a smooth function,
we may locally extend v to a smooth function to RN (we refer to each local
extension simply as v).
Consider a solution ψα to ẋα = vα (xα ). Letting ψ = φα ◦ ψα , we
compute
ψ̇ = Dφα (ψ̇α ) = Dφα (vα ) = v.
Thus ψ is a solution. To show that every solution ψ to ẋ = v(x) is of this
form, note that since Tp X is the image of Dφα (q) for φα (q) = p (Proposition
40), then every smooth vector field v is locally equal to Dφα vα . Then by
uniqueness of ODEs, the solution to ẋ = v(x) must be the image of the
solution to ẋα = vα (xα ).
Remark. The restriction to autonomous equations ẋ = v(x) is unnecessary.
The same proof works for non-autonomous systems ẋ = v(x, t) on manifolds.
Recall a subset X of a metric space Y is compactly contained in another
subset Z if X̄ is compact and X̄ ⊂ Z. In this case we write X ⊂⊂ Z, and
say X is a precompact subset of Z.
Theorem 15. Let v be a smooth vector field on a compact manifold X. Then
the flow F (y, t) along the vector field (the solution to ẋ = v(x), x(0) = y)
is a smooth function from X × R to X. In particular, any flow on a compact
manifold exists for all time.
Proof. Consider an atlas {Oα , φα , Uα } of X. First of all, by Lemma 43 below,
there is an open cover Qβ of X so that each Qβ ⊂⊂ Oα for some Oα in the
atlas. Then each φα −1 (Qβ ) is a compact subset of Uα . Our differential equation
is equivalent to ẋα = vα (xα ) on each Uα .
Since X is compact, we can choose a finite subcover {Q1 , . . . , Qn } of the
open cover {Qβ }. For each i = 1, . . . , n, a straightforward analog of Lemma
23 shows there is an εi > 0 so that if x0 ∈ φα −1 (Qi ), then the solution to
ẋα = vα (xα ),    x(0) = x0
stays in Uα for t ∈ [−εi , εi ]. Moreover, by Proposition 31, for any T ∈ R, the
solution with initial condition x(T ) = x0 ∈ φα −1 (Qi ) stays within Uα for time
t ∈ [T − εi , T + εi ].
Let ε = min{ε1 , . . . , εn } > 0. Then for every T ∈ R, p ∈ X, we claim the
solution to ẋ = v(x), x(T ) = p exists for all t ∈ [T − ε, T + ε]. To prove the
claim, note that each p ∈ X lies in one of the Qi ⊂ Oα , and that the solution
to
ẋα = vα (xα ),    x(T ) = φα −1 (p)
lies in Uα for t ∈ [T − ε, T + ε]. Thus Proposition 42 shows that the solution
to ẋ = v(x), x(T ) = p is in Oα for t ∈ [T − ε, T + ε], and the claim is proved.
In order to prove the Theorem, continue as in the proof of Lemma 25 to
show the solution exists for all time. The smoothness of the solution follows
from Theorem 12 and Proposition 42.
Lemma 43. Given an atlas {Oα , φα , Uα } of a manifold X, there is an open
cover {Qβ } of X so that each Qβ is precompact in some Oα .
Proof. We can cover each open Uα ⊂ Rk by open balls Bβ ⊂⊂ Uα . Then
Qβ = φα (Bβ ) forms an open cover of X.
The support of an Rm -valued function f is the closure
supp(f ) = {x : f (x) ≠ 0}.
An important class of functions is smooth functions with compact support.
Prominent examples can be constructed using the smooth function on R
f (x) = e−1/x for x > 0,    f (x) = 0 for x ≤ 0.
See the notes on bump functions.
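A minimal sketch of this construction (the helper names f and bump are illustrative): composing f with 1 − x2 gives a smooth function that is positive on (−1, 1) and vanishes identically outside, i.e. a smooth function with compact support.

```python
import math

# The building block f(x) = exp(-1/x) for x > 0, zero otherwise.
def f(x):
    return math.exp(-1.0 / x) if x > 0 else 0.0

# bump(x) = f(1 - x^2): smooth, positive on (-1, 1), zero outside [-1, 1],
# so its support is the compact set [-1, 1].
def bump(x):
    return f(1.0 - x * x)

print(bump(0.0) > 0, bump(1.0) == 0.0, bump(2.0) == 0.0)
```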
Homework Problem 29. Let Ω ⊂ Rn be a domain. Consider a smooth
vector field v : Ω → Rn with compact support. Show that any solution ψ to
ẋ = v(x), x(0) = x0 ∈ Ω, exists for all time t ∈ R.
Hint: First show that if v(y) = 0, then any solution to ẋ = v(x), x(t0 ) =
y, must be constant for all time. Use this to show that any solution to ẋ =
v(x), v(x(0)) ≠ 0, must remain in supp(v) for its entire maximal interval of
definition. Apply Theorem 7.
Given a smooth manifold X, consider the set Diff(X) of diffeomorphisms
from X to itself. Then for f, g ∈ Diff(X), it is easy to see that
f ◦ g ∈ Diff(X),
f −1 ∈ Diff(X),
f ◦ f −1 = id
for id the identity map. Therefore, Diff(X) is a group.
Proposition 44. Let v be a smooth vector field on a compact manifold
X. Then for the flow F (y, t), define Ft (y) = F (y, t). Then Ft ∈ Diff(X),
Ft1 +t2 = Ft1 ◦ Ft2 , and F−t = Ft−1 . (And so F is a group homomorphism
from the additive group R to Diff(X).)
Proof. Theorem 15 shows that Ft is smooth for any t. The group homomorphism property is simply a restatement of Proposition 34. Therefore,
Ft ◦ F−t = F0 , which is the flow along v for time 0. By definition, F0 = id the
identity map. Now Ft−1 = F−t is smooth, and so Ft is a diffeomorphism.
Remark. Note the only place we used the fact that X is compact is to guarantee the existence of the flow for all time. So the proposition still holds for
any smooth vector field v on a smooth manifold X so that the flow exists for
all time.
Example 12. For the sphere S2 ⊂ R3 , consider the vector field defined by
v(x1 , x2 , x3 ) = (−x2 , x1 , 0). It is straightforward to show that the tangent
space to S2 at (x1 , x2 , x3 ) is given by v = (v 1 , v 2 , v 3 ) ∈ R3 so that v 1 x1 +
v 2 x2 + v 3 x3 = 0. (Proof: S2 = {f = 1} for f = (x1 )2 + (x2 )2 + (x3 )2 ,
and so for any local parametrization φ, we have f ◦ φ = 1. Thus the Chain
Rule shows that Df (x)(Tx S2 ) = 0, and so Tx S2 ⊂ ker Df (x). They must
be equal since both are two-dimensional vector spaces. Then simply compute
ker Df (x).) Therefore, v is a smooth vector field on S2 .
Recall that the coordinate systems of the atlas introduced above are
(y 1 , y 2 ) = φ1 −1 (x1 , x2 , x3 ) = ( x1 /(1 − x3 ), x2 /(1 − x3 ) ),
(z 1 , z 2 ) = φ2 −1 (x1 , x2 , x3 ) = ( x1 /(1 + x3 ), x2 /(1 + x3 ) ).
On U1 , compute at x = (x1 , x2 , x3 ) ∈ O1 ⊂ S2 ,
Dφ1 −1 (x)(v) = [ 1/(1 − x3 ), 0, x1 /(1 − x3 )2 ; 0, 1/(1 − x3 ), x2 /(1 − x3 )2 ] (−x2 , x1 , 0)
= ( −x2 /(1 − x3 ), x1 /(1 − x3 ) ) = ( −y 2 , y 1 ).
It turns out that for x ∈ O2 ,
Dφ2 −1 (x)(v) = ( −z 2 , z 1 )
as well.
In the coordinate charts, these systems can be solved explicitly. For
A = [ 0, −1 ; 1, 0 ],
compute the fundamental solution
eAt = P etD P −1
= [ 1, 1 ; −i, i ] exp( [ i, 0 ; 0, −i ] t ) [ 1/2, i/2 ; 1/2, −i/2 ]
= [ 1, 1 ; −i, i ] [ cos t + i sin t, 0 ; 0, cos t − i sin t ] [ 1/2, i/2 ; 1/2, −i/2 ]
= [ cos t, − sin t ; sin t, cos t ].
Therefore, for y ∈ U1 , the solution to ẏ = v(y), y(0) = y0 is
y(t) = [ cos t, − sin t ; sin t, cos t ] y0 .    (23)
And also, for z ∈ U2 , the solution to ż = v(z), z(0) = z0 is
z(t) = [ cos t, − sin t ; sin t, cos t ] z0 .    (24)
Proposition 42 implies that these two flows should be related, since they both
correspond to flows on S2 . In particular, for y0 ∈ U12 , let z0 = φ12 (y0 ) =
y0 |y0 |−2 . Then we check that z(t) = φ12 (y(t)) for y(t) from (23) and z(t)
from (24). So compute, for y0 = (y01 , y02 ),
y(t) = ( y01 cos t − y02 sin t, y01 sin t + y02 cos t ),
|y(t)|2 = (y01 cos t − y02 sin t)2 + (y01 sin t + y02 cos t)2 = |y0 |2 ,
φ12 (y(t)) = y(t)/|y(t)|2 = [ cos t, − sin t ; sin t, cos t ] y0 /|y0 |2 = [ cos t, − sin t ; sin t, cos t ] z0 = z(t).
Therefore, the flow patches from U1 to U2 .
The flow itself can be represented on U1 by
Ft (y) = [ cos t, − sin t ; sin t, cos t ] y,
on U2 by
Ft (z) = [ cos t, − sin t ; sin t, cos t ] z,
and even on S2 ⊂ R3 itself by
Ft (x) = exp( [ 0, −1, 0 ; 1, 0, 0 ; 0, 0, 0 ] t ) x = [ cos t, − sin t, 0 ; sin t, cos t, 0 ; 0, 0, 1 ] x.
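The patching just verified can also be checked numerically: with φ12 (y) = y|y|−2 and the rotation flow, applying φ12 to the U1 flow should agree with running the U2 flow from the glued initial condition (the names below are illustrative):

```python
import math

# Check that the two chart flows agree on the overlap: phi12 intertwines
# the rotation flow on U1 with the rotation flow on U2.

def rot(t, p):
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def phi12(y):
    n2 = y[0] ** 2 + y[1] ** 2
    return (y[0] / n2, y[1] / n2)

y0 = (0.8, -1.5)
t = 1.234
lhs = phi12(rot(t, y0))   # push the U1 flow through the gluing map
rhs = rot(t, phi12(y0))   # flow the glued initial condition in U2
print(max(abs(lhs[i] - rhs[i]) for i in range(2)) < 1e-12)
```

This works because rotations preserve |y|, so φ12 commutes with them, exactly the computation carried out above.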
Homework Problem 30. Consider the atlas given above for S2 . On U1 ,
consider the vector field
v = −y 1 ∂/∂y 1 − y 2 ∂/∂y 2 .
Show that Dφ1 v extends to a smooth vector field on all of S2 (i.e., it
extends smoothly across N = S2 \ O1 .) Write down this vector field in the
z coordinates on U2 as well. Solve for the flow on U1 and U2 , and explicitly
check they agree on the overlap O12 .
3.4 Riemannian metrics
For a vector v at a point p on a manifold X ⊂ RN , we can measure the length
of v by using the inner product on RN . So if v ∈ Tp X ⊂ RN , and
v = v a ∂/∂y a
for y = (y 1 , . . . , y N ) coordinates on RN , then the length |v| of v is given by
|v|2 = (v 1 )2 + · · · + (v N )2 = δab v a v b
for the Kronecker δab = 1 if a = b and δab = 0 if a ≠ b. In this usage
for computing the length of a tangent vector on RN , the Kronecker δ is a
Riemannian metric.
(Note we use the following convention for an n-dimensional manifold X ⊂
RN : use indices a, b, c from 1 to N to represent coordinates in RN , and use
i, j, k from 1 to n to represent local coordinates on X.)
On a manifold X, a Riemannian metric is a smoothly varying positive
definite inner product on Tp X for all p ∈ X. Recall the definitions involved.
An inner product on a real vector space V is a pairing g : V ×V → R which is
bilinear and symmetric. g is bilinear if for every v ∈ V , the maps g(v, ·) and
g(·, v) from V to R are linear maps, and g is symmetric if for each v, w ∈ V ,
g(v, w) = g(w, v). An inner product is positive definite if g(v, v) ≥ 0 for all
v ∈ V and g(v, v) = 0 only if v = 0.
If the vector space V has a basis ei , then the inner product g is determined
by gij = g(ei , ej ), since for any linear combination v = v i ei , w = wj ej ,
bilinearity shows
g(v, w) = g(v i ei , wj ej ) = v i g(ei , wj ej ) = v i wj g(ei , ej ) = v i wj gij .
The fact g is symmetric is equivalent to gij = gji .
Note that a positive definite inner product g provides a way to measure
the length of a vector |v|g = √(g(v, v)), and it also provides a measurement
of the angle θ between two nonzero vectors v and w:
cos θ = g(v, w)/( |v|g |w|g ).
A Riemannian metric on X gives a positive definite inner product on each
tangent space Tp X. We also require these inner products to vary smoothly
as the point p varies in X. To describe this, consider a smooth atlas on X,
and a local coordinate system (x1 , . . . , xk ) around p. Then a smooth vector
field v can be represented as v = v i ∂/∂xi for the standard local frame {∂/∂xi }
of the tangent bundle. Then at each point, the inner product g is represented
by gij (x), and
g(v, w) = gij v i wj ,
v i = v i (x), wj = wj (x), gij = gij (x).
Then g is smoothly varying on X if the functions gij are smoothly varying
on each coordinate chart in the smooth atlas of X.
Euclidean space RN has a standard Riemannian metric given by the standard inner product δab . As we've seen above, this inner product endows any
submanifold X ⊂ RN with a Riemannian metric: for v, w ∈ Tp X ⊂ RN ,
we can form g(v, w) using the inner product δab . In particular, consider a
smooth parametrization φ : U → O ⊂ X ⊂ RN . Then φ = (φ1 , · · · , φN ). A
vector field represented by
v = v i ∂/∂xi
on U ⊂ Rn is represented by
Dφ(x)(v) = (∂φa /∂xi )(x) v i (x) ∈ Tφ(x) X ⊂ RN .
Dφ(x)(v) is called the push-forward of v under the map φ. For v, w ∈ Tp X,
we may define the metric
gij v i wj = g(v, w) = ( (∂φa /∂xi ) v i ) ( (∂φb /∂xj ) wj ) δab = (∂φa /∂xi )(∂φb /∂xj ) δab v i wj .
Therefore, the Euclidean inner product on RN induces the Riemannian metric on X locally given by the formula
g( ∂/∂xi , ∂/∂xj ) = gij = (∂φa /∂xi )(∂φb /∂xj ) δab .    (25)
Given a real vector space V , the dual vector space V ∗ is given by the set
of all linear functions from V to R. It is easy to check V ∗ is a vector space.
If V has a basis {ei }, then there is a dual basis {η i } of V ∗ , which is defined
as follows:
η i (ej ) = δji .
Given a local coordinate frame {∂/∂xi } of T X, the local frame on the dual
space is written as {dxi }. Each dxi is called a differential. The dual space
Tp∗ X of Tp X is called the cotangent space of X at p.
Lemma 45. If y = y(x) is a coordinate change as in (22), then
dy j = (∂y j /∂xi ) dxi .
Proof. Write dy j = ξ`j dx` . Then we have
δij = dy j (∂/∂y i ) = ξ`j dx` (∂/∂y i ) = ξ`j dx` ( (∂xk /∂y i ) ∂/∂xk ) = ξ`j (∂xk /∂y i ) δk` = ξkj (∂xk /∂y i ).
Therefore, (ξkj ) is the inverse matrix of (∂xk /∂y i ), and so ξkj = ∂y j /∂xk .
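Lemma 45 says the matrices (∂y j /∂xk ) and (∂xk /∂y i ) are inverse to each other. A numerical sketch (polar coordinates are an assumed example, not from the text) checks that their product is the identity δij :

```python
import math

# Polar coordinates x = (r, th), Cartesian y = (r cos th, r sin th).
r, th = 1.7, 0.4
dy_dx = [[math.cos(th), -r * math.sin(th)],      # dy^j/dx^k
         [math.sin(th),  r * math.cos(th)]]

# For the inverse map r = sqrt(y1^2 + y2^2), th = atan2(y2, y1),
# the partial derivatives dx^k/dy^i in closed form:
y1, y2 = r * math.cos(th), r * math.sin(th)
rho2 = y1 * y1 + y2 * y2
dx_dy = [[y1 / math.sqrt(rho2), y2 / math.sqrt(rho2)],   # dr/dy^i
         [-y2 / rho2, y1 / rho2]]                        # dth/dy^i

# (dy^j/dx^k)(dx^k/dy^i) summed over k should be delta^j_i
prod = [[sum(dy_dx[j][k] * dx_dy[k][i] for k in range(2)) for i in range(2)]
        for j in range(2)]
ident = [[1.0, 0.0], [0.0, 1.0]]
print(max(abs(prod[j][i] - ident[j][i]) for j in range(2) for i in range(2)) < 1e-12)
```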
A Riemannian metric can be naturally written as
gk` dy k dy ` = gk` (∂y k /∂xi )(∂y ` /∂xj ) dxi dxj .
This makes sense because the natural pairing
dxi ( ∂/∂xj ) = δji
between the tangent and cotangent spaces implies that
g(v, w) = gij dxi ( v k ∂/∂xk ) dxj ( w` ∂/∂x` ) = gij (v k δki )(w` δ`j ) = gk` v k w` .
A Riemannian metric is an example of a tensor on X. The tensor product
V ⊗ W of two real vector spaces with bases respectively νi and ωj is the real
vector space formed from the basis
{νi ⊗ ωj }.
This implies
dim V ⊗ W = (dim V )(dim W ).
A tensor of type (k, `) on a manifold X assigns to each point p ∈ X an
element of
(Tp X)⊗k ⊗ (Tp∗ X)⊗` ,
which has as its basis
∂/∂xi1 ⊗ · · · ⊗ ∂/∂xik ⊗ dxj1 ⊗ · · · ⊗ dxj` .
Locally, we write a tensor ω as
ω^{i1 ···ik }_{j1 ···j` } ∂/∂xi1 ⊗ · · · ⊗ ∂/∂xik ⊗ dxj1 ⊗ · · · ⊗ dxj` ,
or simply as ω^{i1 ···ik }_{j1 ···j` }. We say ω is smooth if each ω^{i1 ···ik }_{j1 ···j` } is smooth
locally for all coordinates in a smooth atlas of X.
A Riemannian metric is then a smooth symmetric (0, 2) tensor on a manifold X. Since the product is symmetric, we omit the ⊗ and simply write
gij dxi dxj for a Riemannian metric in local coordinates x. (There are also
antisymmetric (0, k) tensors, or k-forms, for which the tensor product ⊗ is
replaced by ∧.)
Example 13. For S2 , in the local coordinates given by stereographic projection, recall the coordinate chart φ = φ1 :
φ(y 1 , y 2 ) = ( 2y 1 /(|y|2 + 1), 2y 2 /(|y|2 + 1), (|y|2 − 1)/(|y|2 + 1) ),
and the Riemannian metric induced from R3 is
gij dy i dy j = δab (∂φa /∂y i )(∂φb /∂y j ) dy i dy j
= δab dφa dφb
= dφ1 dφ1 + dφ2 dφ2 + dφ3 dφ3
= ( (−2(y 1 )2 + 2(y 2 )2 + 2)/(|y|2 + 1)2 dy 1 + (−4y 1 y 2 )/(|y|2 + 1)2 dy 2 )2
+ ( (−4y 1 y 2 )/(|y|2 + 1)2 dy 1 + (2(y 1 )2 − 2(y 2 )2 + 2)/(|y|2 + 1)2 dy 2 )2
+ ( 4y 1 /(|y|2 + 1)2 dy 1 + 4y 2 /(|y|2 + 1)2 dy 2 )2
= 4/(|y|2 + 1)2 (dy 1 dy 1 + dy 2 dy 2 ).
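The final formula can be spot-checked numerically: approximating the partials ∂φa /∂y i by central differences at one point and forming gij = Σa (∂φa /∂y i )(∂φa /∂y j ) should reproduce 4/(|y|2 + 1)2 δij . A minimal sketch (names are illustrative):

```python
import math

# Stereographic parametrization of S^2 from Example 13
def phi(y1, y2):
    n = y1 * y1 + y2 * y2 + 1.0
    return (2 * y1 / n, 2 * y2 / n, (y1 * y1 + y2 * y2 - 1.0) / n)

def dphi(y1, y2, i, h=1e-6):
    # central-difference partial derivative of phi in the y^i direction
    e = (h, 0.0) if i == 0 else (0.0, h)
    p = phi(y1 + e[0], y2 + e[1])
    m = phi(y1 - e[0], y2 - e[1])
    return tuple((p[a] - m[a]) / (2 * h) for a in range(3))

y1, y2 = 0.6, -0.3
# pullback metric g_ij = sum_a (dphi^a/dy^i)(dphi^a/dy^j)
g = [[sum(dphi(y1, y2, i)[a] * dphi(y1, y2, j)[a] for a in range(3))
      for j in range(2)] for i in range(2)]
scale = 4.0 / (y1 * y1 + y2 * y2 + 1.0) ** 2
expected = [[scale, 0.0], [0.0, scale]]
print(max(abs(g[i][j] - expected[i][j]) for i in range(2) for j in range(2)) < 1e-8)
```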
Note in the previous example, we used the formula for differentials
dφa = (∂φa /∂y i ) dy i .
It is also useful to have the following notation: If h = hab dz a dz b is a Riemannian metric on Z, and φ : Y → Z is a smooth map, then we denote the
pullback metric
φ∗ h = hab (φ) dφa dφb
on Y . Thus in the construction above, if δ = δab dxa dxb is the Euclidean
metric on RN , then the metric g induced on a submanifold φ : X ,→ RN is
the pullback φ∗ δ.
Homework Problem 31. Let φ : X → Y be a smooth map of manifolds.
Let Y have a Riemannian metric h on it. Show that φ∗ h is a Riemannian
metric on X if and only if the tangent map Dφ(x) : Tx X → Tφ(x) Y is injective
for every x ∈ X. (In this case φ is called an immersion.)
Hint: Do the calculations in local coordinates on X and Y . The key point
to check is whether φ∗ h is positive definite. Show φ∗ h(x) is 0 on the kernel
of Dφ(x).
Note in the previous example, we considered the Riemannian metric on
S2 pulled back from the Euclidean metric on R3 . It is possible to write down
other Riemannian metrics as well.
Example 14. Consider hyperbolic space
Hn = {x = (x1 , . . . , xn ) ∈ Rn : xn > 0}
equipped with the Riemannian metric
(dx1 dx1 + · · · + dxn dxn )/(xn )2 .
A famous theorem of John Nash shows that for every Riemannian metric
g on a smooth manifold X, there is an embedding i : X → RN so that g is
induced from the standard metric on RN . (Although it is not in most cases
obvious what the embedding is.)
3.5 Vector bundles and tensors
In order to explain better what tensors are, we introduce the idea of a vector
bundle. The tangent bundle T X of a smooth n-dimensional manifold X is a
vector bundle. Recall there is a map
π : T X → X.
The fiber π −1 (p) = Tp X over a point p ∈ X is an n-dimensional vector
space. Moreover, over each coordinate neighborhood O ⊂ X with coordinates {x1 , . . . , xn }, π −1 O is diffeomorphic to O × Rn , the diffeomorphism
being
(p, v) 7→ (p, v 1 , . . . , v n )
for p ∈ O, v = v i ∂/∂xi ∈ Tp X.
We generalize these properties of T X to define a vector bundle. A vector bundle of rank k over a manifold X is given by an n + k dimensional
manifold V with a smooth map π : V → X. V is called the total space of
the vector bundle. Every point in X has a neighborhood O so that π −1 O is
diffeomorphic to O × Rk . Under this diffeomorphism, π is simply the natural
projection from O × Rk → O. Thus vector bundles are locally trivial, in that
each vector bundle is locally a product of a neighborhood times Rk . Note
that each diffeomorphism
π −1 O → O × Rk
provides for each p ∈ O a basis of the vector space π −1 (p) by taking the
preimage of the standard basis of Rk under the diffeomorphism. Such a
smoothly varying basis is called a local frame of the vector bundle over O.
Given a gluing map y = y(x) of two small coordinate neighborhoods Ox
and Oy in X, there is a corresponding gluing map of Ox × Rk and Oy × Rk .
We require this gluing map to be of the form
(x, v) 7→ (y(x), A(x)v)
for v a vector in Rk and A(x) a smoothly varying nonsingular matrix in
x. Therefore, above each point p, if we change coordinates from x to y, the
frame changes by the matrix A(x). A(x) is a transition function of the vector
bundle V . So the transition functions act on the fibers of a vector bundle as
linear isomorphisms. This preserves the vector-space structure on each fiber
when changing coordinates.
Remark. We have defined real vector bundles of rank k, for which each fiber is
diffeomorphic to Rk . We may also define complex vector bundles with fibers
diffeomorphic to Ck .
A section of a vector bundle π : V → X is a map s : X → V satisfying
π(s(p)) = p for all p ∈ X. So for each p ∈ X, s(p) is an element of the
vector space π −1 (p). A vector field is precisely a section of the tangent
bundle. Locally, k sections which are linearly independent on each fiber
form a frame of the vector bundle. For example, {∂/∂xi } are n linearly
independent sections of the tangent bundle over a coordinate chart.
Since vector bundles preserve the linear structure on each fiber, we may
do linear algebra on the fibers to create new vector bundles. In particular,
we can take duals and tensor products of the fiber space to form new vector
bundles. The tensor bundle of type (k, `) over an n-dimensional manifold X
is the vector bundle of rank n^{k+`} with the fiber over p given by
Tp X ⊗k ⊗ Tp∗ X ⊗` .
Over each coordinate chart, the natural frame of the tensor bundle is
∂/∂xi1 ⊗ · · · ⊗ ∂/∂xik ⊗ dxj1 ⊗ · · · ⊗ dxj`
for i1 , . . . , ik , j1 , . . . , j` ∈ {1, . . . , n}. The transition functions of a tensor
bundle are determined by the formulas
∂/∂xi = (∂y k /∂xi ) ∂/∂y k ,    dxj = (∂xj /∂y ` ) dy ` .
For example, the transition functions for the (0, 2) tensor bundle are given by
dxi dxj = (∂xi /∂y k )(∂xj /∂y ` ) dy k dy ` .
Note we can view (∂xi /∂y k )(∂xj /∂y ` ) as a nonsingular n2 × n2 matrix, which
is the tensor product of the matrix (∂xi /∂y k ) with itself.
A smooth tensor of type (k, `) is a smooth section of the (k, `) tensor
bundle. Thus a Riemannian metric is a smooth symmetric, positive-definite
(0, 2) tensor.
3.6 Integration and densities
We begin by introducing the Change of Variables Formula for multiple integrals:
Theorem 16 (Change of Variables). Let Ω ⊂ Rn be an open set, and let
g : Ω → Rn be one-to-one and locally C 1 . Then for every L1 function f on
g(Ω) with Lebesgue measure dx and dy,
∫g(Ω) f (y) dy = ∫Ω f (g(x)) | det Dg(x)| dx.
Proof. See Spivak Calculus on Manifolds.
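A quick numerical sanity check of the theorem, assuming polar coordinates g(r, θ) = (r cos θ, r sin θ) on (0, 1) × (0, 2π) as the change of variables, where | det Dg| = r, and taking f ≡ 1: the right-hand side should recover the area π of the unit disk (the image of g up to a measure-zero set).

```python
import math

# Midpoint Riemann sum of f(g(x)) |det Dg(x)| = r over (0,1) x (0, 2*pi);
# by the Change of Variables formula this equals the area of the unit disk.
n = 400
h_r, h_t = 1.0 / n, 2.0 * math.pi / n
rhs = sum((i + 0.5) * h_r * h_r * h_t for i in range(n) for j in range(n))
print(abs(rhs - math.pi) < 1e-3)
```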
Here is another useful concept. Given an open cover {Oα } of a smooth
manifold X, a partition of unity subordinate to the cover is a collection of
smooth functions ρβ : X → R satisfying
1. ρβ (x) ∈ [0, 1].
2. For each ρβ , there is an α so that supp(ρβ ) ⊂⊂ Oα .
3. Every x ∈ X has a neighborhood which intersects only finitely many
of the supports of the ρβ .
4. Σβ ρβ (x) = 1.
Proposition 46. For every open cover of a smooth manifold X, there exists
a subordinate partition of unity.
For a proof, see Spivak or Guillemin and Pollack.
Theorem 17. A Riemannian metric g on a manifold X provides a measure
on X called the Riemannian density.
The construction of this measure follows below, along with a sketch of a
proof.
Let {Oα , φα , Uα } be a smooth atlas of X. A function f : X → R is
measurable if each f ◦ φα : Uα → R is measurable. For a Riemannian metric
g on X, the density dVg is defined first for measurable functions f : X → R
whose supports are contained in some Oα . In this case, define
∫X f dVg = ∫Oα f dVg = ∫Uα f (x) √( det gij (x) ) dx
for local coordinates x on Oα and Lebesgue measure dx on Uα ⊂ Rn .
The key point is to make sure this definition makes sense for functions f
whose support is contained in two open charts Oα and Oβ . As above, let x
be the local coordinates on Oα , and let y be the coordinates on Oβ . Then we
use the rule (25) for changing gij under a change y = y(x) and the Change
of Variables Theorem 16 to show
∫Uβ f (y) √( det gij (y) ) dy = ∫Uα f (x) √( det gij (y) ) | det(∂y i /∂xj )| dx
= ∫Uα f (x) √( det( gk` (x) (∂xk /∂y i )(∂x` /∂y j ) ) ) | det(∂y i /∂xj )| dx
= ∫Uα f (x) √( det gk` (x) ) | det(∂xk /∂y i )| | det(∂y i /∂xj )| dx
= ∫Uα f (x) √( det gk` (x) ) dx.
Let {ρβ } be a partition of unity subordinate to the atlas {Oα } of X. For any
measurable subset Ω ⊂ X, consider its characteristic function χΩ . Then
Vg (Ω) = ∫X χΩ dVg = Σβ ∫X ρβ χΩ dVg .
The calculation in the previous paragraph can be used to ensure that this definition is independent of the atlas and partition of unity used. It is straightforward to check that dVg defines a measure on X. Then for any L1 function
f on X (measured by dVg of course),
∫X f dVg = Σβ ∫X ρβ f dVg .
Homework Problem 32. Check that Vg is a measure on X.
Remark. To complete a proof of Theorem 17, it is necessary to check that
the definition depends only on g and not on the atlas {Oα , φα , Uα } or the
partition of unity {ρβ } subordinate to the open cover {Oα }.
If Ω is a domain in Rn with smooth boundary, then the measure on the
boundary ∂Ω is given by the restriction of the Riemannian metric on Rn .
(So this gives a Riemannian metric on ∂Ω, and thus a density as above.) If
∂Ω is locally given by the graph of a function (x1 , . . . , xn−1 , f (x1 , . . . , xn−1 )),
then
φ(x1 , . . . , xn−1 ) = (x1 , . . . , xn−1 , f (x1 , . . . , xn−1 ))
is a local parametrization of the (n − 1)-dimensional manifold ∂Ω ⊂ Rn . The
matrix
Dφ = [ 1, 0, . . . , 0 ; 0, 1, . . . , 0 ; . . . ; 0, 0, . . . , 1 ; f,1 , f,2 , . . . , f,n−1 ].
Then the pullback metric
gij dxi dxj = φ∗ δ = δab dφa dφb = (dx1 )2 + · · · + (dxn−1 )2 + (f,1 dx1 + · · · + f,n−1 dxn−1 )2 ,
with i, j running from 1 to n − 1.
As a matrix,
(gij ) = (δij + f,i f,j ).
In order to compute the volume form, we should compute det gij . Fortunately,
it is easy to compute in this case
det g = 1 + |df |2 = 1 + (f,1 )2 + · · · + (f,n−1 )2 ,
(see Problem 33 below). So the density
dVg = √(1 + |df |2 ) dxn−1
for dxn−1 Lebesgue measure on Rn−1 .
Homework Problem 33. For w an n-dimensional column vector, and I
the n × n identity matrix, show that det(I + ww> ) = 1 + |w|2 .
Hint: Show that I +ww> can be diagonalized, with one eigenvalue 1+|w|2 ,
and with the eigenvalue 1 repeated n − 1 times. (For this last step, show that
on the n − 1 space orthogonal to the natural (1 + |w|2 )-eigenvector, I + ww>
acts as the identity. What is a natural eigenvector to try?)
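Problem 33's identity can be tested numerically before proving it; the sketch below uses a plain Gaussian-elimination determinant (the helper det is illustrative, not part of the text).

```python
import random

# Check det(I + w w^T) = 1 + |w|^2 for a random vector w.

def det(a):
    # determinant via Gaussian elimination with partial pivoting
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-14:
            return 0.0
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            m = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= m * a[col][c]
    return d

random.seed(0)
n = 5
w = [random.uniform(-1, 1) for _ in range(n)]
m = [[(1.0 if i == j else 0.0) + w[i] * w[j] for j in range(n)] for i in range(n)]
lhs = det(m)
rhs = 1.0 + sum(x * x for x in w)
print(abs(lhs - rhs) < 1e-10)
```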
For a function f : Ω → R, the differential, or one-form, is df = (∂f /∂xi ) dxi .
Under a change of coordinates y = y(x), df transforms via the chain rule
(∂f /∂y j ) dy j = df = (∂f /∂xi ) dxi = (∂f /∂y j )(∂y j /∂xi ) dxi .
In particular, this gives the formula for differentials (cf. Lemma 45)
dy j = (∂y j /∂xi ) dxi .
It also shows that for a manifold X and each p ∈ X, we can think of df (p)
as an element of the cotangent space Tp∗ X. This is investigated further in
the following problem:
Homework Problem 34. If f is a smooth function on X and v is a smooth
vector field, show that at each point p ∈ X,
(vf )(p) = df (p)(v(p)).
(In the expression on the right, consider df (p) as an element of the dual space
Tp∗ X.)
Hint: Check it in a single coordinate chart.
On a Riemannian manifold (X, g) (i.e., g is a Riemannian metric on the
manifold X), for each smooth function f , there is a vector field called the
gradient of f . We define the gradient ∇f in local coordinates to be
(∇f )i = g ij f,j ,    g k` g`m = δmk .
(So g ij is the inverse of the matrix gij .) Note that the Einstein convention
with one index up (typically) indicates that ∇f is a vector field.
Homework Problem 35. Show that ∇f transforms as a vector field under
coordinate changes. In other words, check that if y = y(x),
(∇f )j (y) = (∂y j /∂xi ) (∇f )i (x)
as in (22).
Hint: First check how the inverse of the metric g ij transforms. Note that
in the definition g ij gjk = δki , δki is independent of coordinate changes.
In the case of Euclidean space, it is common to use the gradient of a
function instead of its differential. In this case, (∇f )a = δ ab f,b . Note that on
any Riemannian manifold
|df |2 = g ab f,a f,b = g ac gcd g db f,a f,b = gcd (∇f )c (∇f )d = |∇f |2 .
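The identity |df |2 = |∇f |2 is purely pointwise linear algebra, so it can be checked with an arbitrary positive definite gij and arbitrary components f,a (the numbers below are illustrative):

```python
# Check g^{ab} f_a f_b = g_{cd} (grad f)^c (grad f)^d with (grad f)^i = g^{ij} f_j.

g = [[2.0, 0.5], [0.5, 1.0]]    # a positive definite symmetric g_ij
df = [0.7, -1.3]                 # components f_{,a} at a point

# inverse metric g^{ij} for the 2x2 case
detg = g[0][0] * g[1][1] - g[0][1] * g[1][0]
ginv = [[g[1][1] / detg, -g[0][1] / detg],
        [-g[1][0] / detg, g[0][0] / detg]]

grad = [sum(ginv[i][j] * df[j] for j in range(2)) for i in range(2)]

df_norm2 = sum(ginv[a][b] * df[a] * df[b] for a in range(2) for b in range(2))
grad_norm2 = sum(g[c][d] * grad[c] * grad[d] for c in range(2) for d in range(2))
print(abs(df_norm2 - grad_norm2) < 1e-12)
```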
Let v = v i ∂/∂xi be a vector field on a domain in Rn . Then the divergence
of v is a function defined to be
∇ · v = ∂v i /∂xi .
The divergence of a vector field may also be defined on Riemannian manifolds,
but the definition is somewhat more involved.
Here is another important theorem, which is a consequence of Stokes’s
Theorem (see Spivak, Guillemin and Pollack, or Taylor). We only state it
for domains in Rn , and not in its more general context of compact manifolds
with boundary.
Theorem 18 (Divergence Theorem). Let Ω ⊂⊂ Rn be a domain with
smooth boundary ∂Ω. Then for any C 1 vector field v on Ω̄,
∫Ω ∇ · v dxn = ∫∂Ω v · n dV.
(Here n is the unit outward normal vector field to ∂Ω, and dV is the measure
on ∂Ω induced from the Euclidean metric.)
Remark. The way we have put the integration depends on the Euclidean
metric (to form the dot product, dV and n). In the general form of Stokes's
Theorem, it is unnecessary to use the metric. (We may recast v and ∇ · v as
differential forms.)
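The Divergence Theorem can be sanity-checked numerically on the unit disk with v(x, y) = (x, y), where ∇ · v = 2: the interior integral is 2π (twice the area), and on the unit circle v · n = 1, so the boundary integral is the circumference 2π. A rough Riemann-sum sketch (all numerical parameters are illustrative):

```python
import math

# Interior integral: sum div v = 2 over grid cells inside the unit disk
n = 500
h = 2.0 / n
interior = 0.0
for i in range(n):
    for j in range(n):
        x = -1.0 + (i + 0.5) * h
        y = -1.0 + (j + 0.5) * h
        if x * x + y * y < 1.0:
            interior += 2.0 * h * h

# Boundary integral: on the unit circle n = (cos t, sin t) = v, so v.n = 1
m = 2000
dt = 2.0 * math.pi / m
boundary = sum(1.0 * dt for k in range(m))

print(abs(interior - 2 * math.pi) < 0.1, abs(boundary - 2 * math.pi) < 1e-9)
```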
Idea of proof. We do the computation in a very special case, for v having
compact support in Ω, which is the lower half-space {x = (x1 , . . . , xn ) ∈ Rn :
xn ≤ 0}.
In this case the unit normal vector n = (0, . . . , 0, 1) and dV = dxn−1
Lebesgue measure on Rn−1 = {xn = 0}. Then, using Fubini's Theorem, we
want to prove
∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} ∫_{−∞}^{0} (∂v i /∂xi ) dxn dxn−1 · · · dx1 = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} v n dxn−1 · · · dx1 .
Note that the left-hand integral is a sum from i = 1 to n. For i = n, compute
∫_{−∞}^{0} (∂v n /∂xn ) dxn = v n (x1 , . . . , xn−1 , 0) − lim_{t→−∞} v n (x1 , . . . , xn−1 , t) = v n (x1 , . . . , xn−1 , 0)
since v has compact support. On the other hand, for i ≠ n,
∫_{−∞}^{∞} (∂v i /∂xi ) dxi = 0
since v has compact support. Therefore, using Fubini's Theorem, for each
i ≠ n, we can integrate ∂v i /∂xi with respect to xi first to get zero. The
remaining term is the case i = n, and so
∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} ∫_{−∞}^{0} (∂v i /∂xi ) dxn dxn−1 · · · dx1 = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} ∫_{−∞}^{0} (∂v n /∂xn ) dxn dxn−1 · · · dx1
= ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} v n dxn−1 · · · dx1 .
This proves the Divergence Theorem in this special case.
The general case can be reduced to this special case by using a partition
of unity and the Implicit Function Theorem (see Spivak). In particular, near
each point in ∂Ω, there is a local diffeomorphism of Ω̄ to the lower halfspace, sending the boundary to the boundary. Together with open subsets of
Ω, these form an open cover of the compact Ω̄, and so we may take a finite
subcover, and a partition of unity subordinate to this subcover. Then we can
apply the above special case to ρv for ρ in the partition of unity and v the
vector field.
It is also necessary to make sure that the various terms in the integrals
transform well with respect to the local diffeomorphisms. This can be checked
directly, but it is better to use the language of differential forms (see Spivak
or Guillemin and Pollack).
Homework Problem 36. Let Ω be a domain in Rn with smooth boundary.
On a neighborhood N ⊂ Rn of a point in the boundary ∂Ω, assume that
Ω ∩ N = {x ∈ N : xn < f (x1 , . . . , xn−1 )}
so that Ω is locally the region under the graph of a smooth function f . Compute n and dV . For a smooth vector field v, compute
∫∂Ω∩N v · n dV
in terms of the integral of a function times Lebesgue measure on Rn−1 .
Hint: Locally, ∂Ω is a submanifold of Rn which is the image of
φ(x1 , . . . , xn−1 ) = (x1 , . . . , xn−1 , f (x1 , . . . , xn−1 )).
Show that n is proportional to ∇ψ, for
ψ(x1 , . . . , xn ) = xn − f (x1 , . . . , xn−1 ).
Your answer should be of the form
∫φ−1 (∂Ω∩N ) h dxn−1
for h a function of x1 , . . . , xn−1 .
Corollary 47 (Integration by Parts). Let Ω ⊂⊂ Rn be a domain with
smooth boundary ∂Ω. Then for any C 1 vector field v on Ω̄ and C 1 function
f on Ω̄,
$$\int_\Omega v\cdot\nabla f\,dx^n = -\int_\Omega f\,\nabla\cdot v\,dx^n + \int_{\partial\Omega} f\,v\cdot n\,dV.$$
Proof. It is easy to check that ∇·(f v) = (∇f)·v + f ∇·v, and by the Divergence Theorem,
$$\int_\Omega \nabla\cdot(f v)\,dx^n = \int_{\partial\Omega} f\,v\cdot n\,dV.$$
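A minimal numerical sketch of the one-dimensional case of Corollary 47 (my own illustration, with arbitrary smooth choices f = sin and v = exp on Ω = [0, 1], where the outward normal is −1 at 0 and +1 at 1):

```python
import math

# Check: integral of v * f' = - integral of f * v' + [f*v at 1] - [f*v at 0]
f  = math.sin
df = math.cos
v  = math.exp
dv = math.exp

N = 2000
h = 1.0 / N
mid = [(i + 0.5) * h for i in range(N)]        # midpoint rule nodes
lhs = sum(v(x) * df(x) for x in mid) * h
rhs = -sum(f(x) * dv(x) for x in mid) * h + f(1.0) * v(1.0) - f(0.0) * v(0.0)
print(abs(lhs - rhs))  # agrees up to quadrature error
```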
3.7 The ε-Neighborhood Theorem
Theorem 19. Let X ⊂ R^n be a compact k-dimensional manifold. Then there is an ε > 0 so that for
$$X_\varepsilon = X + B_\varepsilon(0) = \{y\in\mathbb{R}^n : \text{there is an } x\in X \text{ so that } |x-y|<\varepsilon\},$$
there is a smooth projection map from X_ε to X which restricts to the identity on X.
Before we prove Theorem 19, we need to introduce the normal bundle N X, which is a vector bundle over X for X ⊂ R^n. Let ⟨·, ·⟩ denote the standard inner product on R^n. Define
$$NX = \{(x,y)\in\mathbb{R}^n\times\mathbb{R}^n : x\in X,\ \langle y,z\rangle = 0 \text{ for all } z\in T_xX\}.$$
Then N X is a vector bundle of rank n − k, with π : N X → X given by
π : (x, y) 7→ x. For a given x ∈ X, Nx X = π −1 (x) is the normal space to X
at x, which consists of all vectors in Rn perpendicular to the tangent space
Tx X.
First of all, we show that N X is a smooth n-dimensional manifold.
Homework Problem 37. N X is a smooth manifold of dimension n.
(a) Show that X ⊂ Rn is a smooth manifold if and only if for each x ∈ X,
there is a neighborhood W of x in Rn and a smooth function ψ : W →
Rn−k so that Dψ has constant rank n − k and X ∩ W = ψ −1 (0). (To
show =⇒, use Theorem 14, and to show ⇐=, use the Implicit Function
Theorem.)
(b) At each x ∈ X, and given a smooth function ψ as above, show that the normal space N_x is the image of the transpose of the tangent map, Dψ(x)^T : R^{n−k} → R^n.
(c) Use the previous section and the techniques of Problem 28 to show N X
is a manifold.
We will prove the ε-Neighborhood Theorem by showing that there is a neighborhood of X in R^n which is diffeomorphic to a neighborhood of the zero section {(x, 0) : x ∈ X} ⊂ N X, and the map required by the ε-Neighborhood Theorem then comes from π : N X → X.
Proof of the ε-Neighborhood Theorem. Consider the map F : N X → R^n given by F : (x, y) ↦ x + y. For each x ∈ X, DF(x, 0) : T_{(x,0)}(N X) → R^n is a linear isomorphism. This can be proved since T_{(x,0)}(N X) can be written as a sum T_x(X) + N_x(X), and DF(x, 0), when restricted to each factor, is a linear isomorphism. The Inverse Function Theorem then shows that for each x ∈ X, there are neighborhoods N_x of (x, 0) in N X and W_x of x in R^n so that F|_{N_x} is a diffeomorphism from N_x to W_x. Note we may apply the Inverse Function Theorem by considering a local parametrization of N X, since diffeomorphisms of (open subsets of) manifolds are defined in terms of these parametrizations.
Consider the following lemma:
Lemma 48. There are open sets N and X̃ so that X × {0} ⊂ N ⊂ N X and
X ⊂ X̃ ⊂ Rn and the restriction of F is a diffeomorphism from N to X̃.
Proof. First of all, we note that DF is a linear isomorphism at each point of N′ = ⋃_{x∈X} N_x. The Inverse Function Theorem then shows that F|_{N′} is a local diffeomorphism, and hence a diffeomorphism onto its image as long as it is one-to-one. Therefore, we need only find an open N satisfying X × {0} ⊂ N ⊂ N′ on which F is one-to-one.
Now assume by contradiction that no such N exists. Then there are points (x_n, y_n) ≠ (x′_n, y′_n) ∈ N X satisfying F(x_n, y_n) = F(x′_n, y′_n) and so that |y_n|, |y′_n| < 1/n. (Why? You must use the compactness of X.) Since X is compact, there must be a subsequence n_i so that (x_{n_i}, y_{n_i}) → (x, 0) as i → ∞. Then we may take a further subsequence n_{i_j} so that (x′_{n_{i_j}}, y′_{n_{i_j}}) → (x′, 0) as j → ∞. For simplicity, we rename the subsequence n_{i_j} as simply n. Then the continuity of F shows that
$$x = F(x, 0) = \lim_{n\to\infty} F(x_n, y_n) = \lim_{n\to\infty} F(x'_n, y'_n) = F(x', 0) = x'.$$
Thus x = x′. But then for large n, both (x_n, y_n) and (x′_n, y′_n) lie in N_x, on which F is injective; this contradicts (x_n, y_n) ≠ (x′_n, y′_n). Therefore, the lemma is proved.
Now since X is compact, there is a small ε > 0 so that X_ε ⊂ F(N). The projection map from X_ε to X is then given by π ◦ F^{-1}, which is smooth. This completes the proof of the ε-Neighborhood Theorem.
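For X = S¹ ⊂ R², all of these objects are explicit, and the projection of the ε-Neighborhood Theorem is just radial projection p ↦ p/|p|. A small illustrative sketch (my own example, not from the text):

```python
import math

# For the unit circle, N_x X = span(x), F(x, y) = x + y maps
# {(x, y) : |y| < 1} diffeomorphically onto the annulus 0 < |p| < 2,
# and pi o F^{-1} is p |-> p / |p|.
def project(p):
    r = math.hypot(p[0], p[1])
    assert r > 0.0, "origin is outside the tubular neighborhood"
    return (p[0] / r, p[1] / r)

q = project((1.2, 0.3))                 # a point in the epsilon-neighborhood
assert abs(math.hypot(q[0], q[1]) - 1.0) < 1e-12   # lands on X

x = (math.cos(0.7), math.sin(0.7))      # a point already on X
px = project(x)
# the projection restricts to the identity on X (up to rounding)
assert abs(px[0] - x[0]) < 1e-12 and abs(px[1] - x[1]) < 1e-12
```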
4 The Calculus of Variations
4.1 The variational principle
In this section, we want to consider the problem of constructing a function
which minimizes a given functional. (A functional is a map from functions
to R.)
Example 15. Let Ω ⊂⊂ Rn be a domain with smooth boundary. Then we
consider the class
F = {f ∈ C 2 (Ω) ∩ C 0 (Ω̄) : f = g on ∂Ω}
for a given C 2 function g on ∂Ω. Consider the graph of f
{(x, f (x)) ∈ Ω̄ × R}.
By pulling back the Euclidean metric on Rn+1 , we can consider the n-volume
of the graph. We have computed above
$$\mathrm{Vol}(f) = \int_\Omega \sqrt{1+|\nabla f|^2}\,dx^n.$$
Then we want to consider the following question: Is there an f ∈ F which
minimizes Vol(f ) over all of F?
If it exists, f must satisfy
$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \mathrm{Vol}(f+\varepsilon h) = 0$$
for every h so that f + εh ∈ F. We compute and integrate by parts to find a differential equation f must satisfy. First of all, f + εh ∈ F if and only if
h ∈ C 2 (Ω) ∩ C 0 (Ω̄) and h = 0 on ∂Ω.
$$\begin{aligned}
0 &= \frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\mathrm{Vol}(f+\varepsilon h)\\
&= \frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\int_\Omega \sqrt{1+|\nabla f+\varepsilon\nabla h|^2}\,dx^n\\
&= \frac{d}{d\varepsilon}\Big|_{\varepsilon=0}\int_\Omega \sqrt{1+|\nabla f|^2+2\varepsilon\,\nabla f\cdot\nabla h+\varepsilon^2|\nabla h|^2}\,dx^n\\
&= \int_\Omega \frac{2\,\nabla f\cdot\nabla h+2\varepsilon|\nabla h|^2}{2\sqrt{1+|\nabla f+\varepsilon\nabla h|^2}}\bigg|_{\varepsilon=0}\,dx^n\\
&= \int_\Omega \frac{\nabla f\cdot\nabla h}{\sqrt{1+|\nabla f|^2}}\,dx^n\\
&= -\int_\Omega h\,\nabla\cdot\!\left(\frac{\nabla f}{\sqrt{1+|\nabla f|^2}}\right)dx^n
+ \int_{\partial\Omega} h\,\frac{\nabla f}{\sqrt{1+|\nabla f|^2}}\cdot n\,dV\\
&= -\int_\Omega h\,\nabla\cdot\!\left(\frac{\nabla f}{\sqrt{1+|\nabla f|^2}}\right)dx^n.
\end{aligned}$$
This last integral must be equal to zero for every h ∈ C^0(Ω̄) which vanishes on ∂Ω. We claim this forces
$$g = \nabla\cdot\!\left(\frac{\nabla f}{\sqrt{1+|\nabla f|^2}}\right) = 0$$
on Ω.
To prove the claim, note that since f is C 2 , g is continuous on Ω. We
prove the claim by contradiction. If g is nonzero at any point x ∈ Ω, assume
without loss of generality that g(x) > 0. Then by continuity, g > 0 in a small
ball B centered at x. Now it is easy to find a smooth nonnegative bump function h, positive at x, whose support is contained in B. In this case
$$\int_\Omega hg\,dx^n = \int_B hg\,dx^n > 0,$$
which provides the contradiction.
Thus any function f which minimizes the functional Vol satisfies the
Euler-Lagrange equation of the functional
$$\nabla\cdot\!\left(\frac{\nabla f}{\sqrt{1+|\nabla f|^2}}\right) = 0.$$
This equation is known as the minimal surface equation.
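As an illustration (my own, not from the text), one can check numerically that the graph f(x, y) = log(cos x) − log(cos y) (a Scherk-type minimal graph) satisfies the minimal surface equation, using nested central differences:

```python
import math

def f(x, y):
    return math.log(math.cos(x)) - math.log(math.cos(y))

h = 1e-4

def grad_over_W(x, y):
    # grad f / sqrt(1 + |grad f|^2), with grad f by central differences
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    W = math.sqrt(1.0 + fx * fx + fy * fy)
    return (fx / W, fy / W)

def mse_residual(x, y):
    # divergence of (grad f / W), again by central differences
    d1 = (grad_over_W(x + h, y)[0] - grad_over_W(x - h, y)[0]) / (2 * h)
    d2 = (grad_over_W(x, y + h)[1] - grad_over_W(x, y - h)[1]) / (2 * h)
    return d1 + d2

print(mse_residual(0.3, 0.5))  # close to 0
```

Here f_x = −tan x and f_y = tan y, and the residual vanishes analytically; the finite differences only confirm this up to truncation error.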
So a solution to our problem satisfies the minimal surface equation, and
the boundary condition f = g on ∂Ω. This sort of boundary condition of
specifying the value of a solution f is called a Dirichlet boundary condition.
The problem of finding a solution to the equation with this boundary condition
is a Dirichlet boundary value problem. Note that the Dirichlet boundary
condition is essential in making sure the variational function h vanishes on
the boundary, and thus there are no boundary terms when we integrate by
parts. There is another useful type of boundary condition, the Neumann
boundary condition, in which the normal derivative ∇f · n = 0. Notice that
this also makes the integral over ∂Ω vanish in the integration by parts.
In the previous example, we computed the Euler-Lagrange equation for
Vol. There may be solutions to the Euler-Lagrange equation which are not
minimizers of Vol, since we have only checked the first-derivative test. A
solution to the Euler-Lagrange equation may correspond to a local maximum, a saddle point or a local but non-global minimum. We’ll see below
specific techniques for finding a global minimizer, which we apply in another
geometric problem.
The Euler-Lagrange equations come from the first variation formula that a minimizer must satisfy: Given a family f_ε with f_0 = f, then if f minimizes a functional P,
$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} P(f_\varepsilon) = 0.$$
This is the formula of the first variation, which comes from the first derivative
test in calculus. We may also use the second derivative test. A minimizer f as above must satisfy the second variation formula
$$\frac{d^2}{d\varepsilon^2}\Big|_{\varepsilon=0} P(f_\varepsilon) \ge 0.$$
Homework Problem 38. Consider a variational problem for C^2 functions y = y(x) on a domain [a, b] with fixed endpoint values y(a) = y_0, y(b) = y_1.
Assume the functional is of the form
$$J(y) = \int_a^b F(y, y')\,dx,$$
for F a smooth function of 2 variables.
(a) Compute the general Euler-Lagrange equation for J.
(b) Multiply the Euler-Lagrange equation by y 0 to show that any solution to
the Euler-Lagrange equation must satisfy
$$\frac{dG}{dx} = 0$$
for a function G depending on F, y and their derivatives.
(c) A graph y = y(x) of a C 1 positive function determines a surface of
revolution around the x-axis with surface area
$$A(y) = 2\pi\int_a^b y\sqrt{1+(y')^2}\,dx.$$
Compute the Euler-Lagrange equation for A (assume y is C 2 ) and compute its general solution. (The graph of this solution is called a catenary.)
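As a hedged numerical illustration of the first-variation condition (my own sketch; it previews the catenary named in part (c)): y = cosh x on [−1, 1] should make the first variation of A vanish for perturbations vanishing at the endpoints.

```python
import math

# A(y) = 2*pi * integral of y * sqrt(1 + y'^2), midpoint rule on [-1, 1].
def area(y, dy, N=4000):
    h = 2.0 / N
    total = 0.0
    for i in range(N):
        x = -1.0 + (i + 0.5) * h
        total += y(x) * math.sqrt(1.0 + dy(x) ** 2) * h
    return 2.0 * math.pi * total

# perturb the catenary by eps * sin(pi*x), which vanishes at x = -1, 1
def A(eps):
    y  = lambda x: math.cosh(x) + eps * math.sin(math.pi * x)
    dy = lambda x: math.sinh(x) + eps * math.pi * math.cos(math.pi * x)
    return area(y, dy)

eps = 1e-4
first_variation = (A(eps) - A(-eps)) / (2 * eps)
print(first_variation)  # approximately 0: cosh is a critical point of A
```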
4.2
Geodesics
Given a C 1 path γ : I → X for I = [α, β] an interval and X ⊂ RN a manifold
with Riemannian metric g induced from the Euclidean metric on RN , the
length of the path γ(I) is given by
$$L(\gamma) = \int_\alpha^\beta |\dot\gamma|_g\,dt
= \int_\alpha^\beta \sqrt{g(\dot\gamma,\dot\gamma)}\,dt
= \int_\alpha^\beta \sqrt{g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t)}\,dt.$$
(In the last formulation, note the use of local coordinates. So the last formulation is strictly only true when γ(I) is contained in a single coordinate chart.) L is called the length functional; it takes paths γ to R.
Proposition 49. The length of a path is independent of the parametrization.
In other words, if γ̃(τ ) = γ(t(τ )) for t = t(τ ) a C 1 diffeomorphism onto I,
then L(γ̃) = L(γ).
Proof. Let t = t(τ) with t(α̃) = α, t(β̃) = β. Assume that α̃ < β̃; since t is a diffeomorphism, dt/dτ > 0. Then compute
$$\begin{aligned}
L(\tilde\gamma) &= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\!\left(\frac{d\tilde\gamma}{d\tau},\frac{d\tilde\gamma}{d\tau}\right)}\,d\tau
= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\!\left(\frac{d\gamma}{dt}\frac{dt}{d\tau},\frac{d\gamma}{dt}\frac{dt}{d\tau}\right)}\,d\tau\\
&= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\!\left(\frac{d\gamma}{dt},\frac{d\gamma}{dt}\right)}\,\frac{dt}{d\tau}\,d\tau
= \int_{\alpha}^{\beta} \sqrt{g\!\left(\frac{d\gamma}{dt},\frac{d\gamma}{dt}\right)}\,dt = L(\gamma).
\end{aligned}$$
The case when dt/dτ < 0 and α̃ > β̃ is similar.
So this definition corresponds to the usual definition of the arc length of a
parametric curve. In particular, it is invariant under change of parametrization. This particular feature turns out to cause trouble analytically. In the
following sections, we’ll seek to find paths minimizing arc length by constructing a sequence of paths approaching a length-minimizing one. The fact
that a potentially minimizing path has many different parametrizations will
make the analysis more difficult, since it will be difficult to find a sequence of
paths which approaches a particular minimizing path among all the possible
parametrizations. Another analytic objection to the length functional is that
it is the L1 norm of the length of the tangent vector γ̇. L2 norms tend to
behave better, since we can use the structure of Hilbert spaces.
Assume for convenience that the interval I = [0, 1]. This can always be
achieved by using a linear map to take a given I to [0, 1].
Thus we introduce a related functional, the energy of a C^1 path γ : [0, 1] → X. Define
$$E(\gamma) = \int_0^1 |\dot\gamma|_g^2\,dt.$$
The energy is related to the length by the following proposition.
Proposition 50. For a given homotopy class C of curves γ : [0, 1] → X, a
C 1 curve γ minimizes E in C if and only if it minimizes L among C 1 curves
in C and the speed |γ̇(t)|g is constant.
Before we start the proof, we recall a little about homotopy classes.
Two continuous curves γ_i : [0, 1] → X, i = 0, 1, are homotopic if γ_i(0) = p,
γi (1) = q for i = 0, 1, and if there is a continuous function (called a homotopy)
G : [0, 1] × [0, 1] → X so that G(0, t) = γ0 (t), G(1, t) = γ1 (t) for all t ∈ [0, 1],
and G(s, 0) = p and G(s, 1) = q for all s ∈ [0, 1]. (More generally, if Y
and X are both metric spaces, then two continuous maps f0 , f1 : Y → X
are said to be homotopic if there is a continuous map F : [0, 1] × Y → X
with F (0, y) = f0 (y), F (1, y) = f1 (y) for all y ∈ Y . In the present case, the
space Y = [0, 1] and we impose the extra conditions that the values at the
endpoints t = 0, 1 are fixed at p, q respectively as well.)
Since we are measuring length and energy, we are only interested in curves
γi which are C 1 , while we allow the homotopy G to be only continuous.
Proposition 51. The condition of two paths being homotopic is an equivalence relation, and thus we may consider homotopy classes of paths.
Proof. We need to show the property is reflexive, symmetric, and transitive.
If γ : [0, 1] → X is a continuous path, then it is homotopic to itself via the
homotopy G(s, t) = γ(t) for s ∈ [0, 1]. This shows the reflexive property.
If γ0 is homotopic to γ1 via the homotopy G, then we see γ1 is homotopic
to γ0 via the homotopy G̃(s, t) = G(1 − s, t). This shows the symmetric
property.
Finally, to show the transitive property, if γ0 is homotopic to γ1 via a
homotopy G and γ1 is homotopic to γ2 via a homotopy F , then we construct
a homotopy from γ0 to γ2 by the formula
G(2s, t)
for s ∈ [0, 1/2]
H(s, t) =
F (2s − 1, t) for s ∈ [1/2, 1]
Note this definition is well-defined, since for H(1/2, t) = γ1 (t) for either
definition above. This observation also shows that H is continuous. It is
straightforward to show H is a homotopy.
A C 1 diffeomorphism t = t(τ ) of [0, 1] is called orientation preserving if
dt/dτ > 0. Another fact about homotopy we’ll presently use is the following
Lemma 52. If γ̃(τ ) = γ(t(τ )) for t = t(τ ) an orientation-preserving diffeomorphism of [0, 1], then γ̃ and γ are homotopic.
Proof. For s, τ ∈ [0, 1], define ψ(s, τ ) = sτ + (1 − s)t(τ ). Then we will
show that G(s, τ ) = γ(ψ(s, τ )) is the required homotopy. First of all, since
t(τ ) is an orientation-preserving diffeomorphism, we see t(0) = 0, t(1) = 1.
Now check that for s, τ ∈ [0, 1], ψ(s, τ ) ∈ [0, 1]: because 0 ≤ τ ≤ 1 and
0 ≤ t(τ ) ≤ 1, then
0 = s(0) + (1 − s)0 ≤ sτ + (1 − s)t(τ ) ≤ s(1) + (1 − s)(1) = 1.
This shows the homotopy G is well-defined. It is obvious for τ ∈ [0, 1] that G(0, τ) = γ̃(τ) and G(1, τ) = γ(τ). Also compute for s ∈ [0, 1], G(s, 0) = γ(0) and G(s, 1) = γ(1).
Also, note the following
Lemma 53. For any C^1 path γ, E(γ) ≥ L(γ)², and they are equal if and only if |γ̇(t)|_g is constant.
Proof. Apply Hölder's inequality:
$$L(\gamma) = \int_0^1 |\dot\gamma(t)|_g\,dt
\le \left(\int_0^1 1^2\,dt\right)^{\frac12}\left(\int_0^1 |\dot\gamma(t)|_g^2\,dt\right)^{\frac12}
= \sqrt{E(\gamma)},$$
with equality if and only if 1 is proportional to |γ̇(t)|_g, which is the same as |γ̇(t)|_g being constant.
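Lemma 53 is easy to test numerically for curves in the Euclidean plane (my own sketch; midpoint-rule quadrature):

```python
import math

# Compute L and E for a plane curve given its velocity dgamma on [0, 1].
def L_and_E(dgamma, N=4000):
    h = 1.0 / N
    L = E = 0.0
    for i in range(N):
        t = (i + 0.5) * h
        sp = math.hypot(*dgamma(t))      # speed |gamma'(t)|
        L += sp * h
        E += sp * sp * h
    return L, E

# non-constant speed: gamma(t) = (t^2, t), so E > L^2 strictly
L1, E1 = L_and_E(lambda t: (2 * t, 1.0))
assert E1 > L1 ** 2

# constant speed: straight line gamma(t) = (t, 2t), speed sqrt(5): equality
L2, E2 = L_and_E(lambda t: (1.0, 2.0))
assert abs(E2 - L2 ** 2) < 1e-9
```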
Proof of Proposition 50. Let γ ∈ C satisfy E(γ) ≤ E(γ 0 ) for all γ 0 ∈ C. Given
γ, let γc be the constant speed reparametrization of γ (this exists by Problem
39 below). Then we have by Proposition 49 and Lemma 53
$$L(\gamma_c)^2 = L(\gamma)^2 \le E(\gamma) \le E(\gamma_c) = L(\gamma_c)^2.$$
Thus all the inequalities in the above equation must be equalities and L(γ)2 =
E(γ). Then Lemma 53 implies γ must have constant speed. So we’ve shown
so far that if γ minimizes E, then γ has constant speed.
Let γ minimize E. For each C 1 curve γ 0 ∈ C, let γc0 be a constant speed
reparametrization. Then since γ has constant speed, Lemma 53 and Proposition 49 show
$$L(\gamma)^2 = E(\gamma) \le E(\gamma_c') = L(\gamma_c')^2 = L(\gamma')^2.$$
So we’ve shown that if γ minimizes E in C, then γ minimizes L in C.
We leave the converse statement as Problem 40 below.
Homework Problem 39. (a) Let γ : [0, 1] → X, γ = γ(t) be a C^1 path into a Riemannian manifold X. Assume |γ̇(t)|_g ≠ 0 for all t ∈ [0, 1]. Show that there is a reparametrization t(τ) so that t(0) = 0, t(1) = 1, dt/dτ > 0, and |dγ/dτ|_g is constant.
Hint: Show the constant must be equal to L(γ). Then show the condition is an ODE in τ = τ(t). (Note that if dt/dτ > 0, then t(τ) is strictly increasing and thus has an inverse on [0, 1].)
(b) Remove the condition that |γ̇(t)|_g ≠ 0. In this case, t(τ) will only be Lipschitz.
Hint: Consider the open set O = {t : γ̇(t) ≠ 0}. Perform a similar analysis on each connected component of O.
*** This still needs work. ***
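A numerical sketch of the reparametrization in part (a), for the concrete Euclidean-plane path γ(t) = (t, t²) (my own example): build the arc-length function s(t) by quadrature, invert it by bisection, and check that the new speed is the constant L(γ).

```python
import math

def speed(t):                    # |gamma'(t)| for gamma(t) = (t, t^2)
    return math.hypot(1.0, 2.0 * t)

N = 20000
h = 1.0 / N
# cumulative arc length s(t) on a grid
s = [0.0]
for i in range(N):
    s.append(s[-1] + speed((i + 0.5) * h) * h)
L = s[-1]                        # total length L(gamma)

def t_of_tau(tau):
    # invert s(t) = tau * L by bisection on the grid, then interpolate
    target = tau * L
    lo, hi = 0, N
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if s[mid] < target:
            lo = mid
        else:
            hi = mid
    frac = (target - s[lo]) / (s[hi] - s[lo])
    return (lo + frac) * h

# check |d gamma / d tau| = speed(t(tau)) * dt/dtau is the constant L
for tau in (0.1, 0.5, 0.9):
    dtau = 1e-4
    dt_dtau = (t_of_tau(tau + dtau) - t_of_tau(tau - dtau)) / (2 * dtau)
    assert abs(speed(t_of_tau(tau)) * dt_dtau - L) < 1e-3
```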
Homework Problem 40. For a given homotopy class C of curves γ :
[0, 1] → X, assume γ has constant speed |γ̇(t)|g and γ minimizes L among
C 1 curves in C. Then γ minimizes E among C 1 curves in C.
Now we compute the first variation of the energy functional. Let γ be a smooth curve from [0, 1] to X so that γ(0) = p, γ(1) = q. X ⊂ R^N has the Riemannian metric pulled back from R^N. Assume γ minimizes E in a homotopy class C, and that γ is C^2. Then for each smooth family γ_ε(t) with γ_0 = γ, we have
$$\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} E(\gamma_\varepsilon) = 0.$$
Consider a variation of the following special form. Near a point in γ([0, 1]), pick local coordinates x : O → U ⊂ R^n. Then there is a small time interval I = γ^{-1}(O) ⊂ [0, 1]. Assume for simplicity that I doesn't contain either endpoint 0 or 1. In terms of the local coordinates x, x(γ(t)) = γ(t) ∈ U ⊂ R^n for t ∈ I. Then let h : R → R^n be a smooth function so that supp(h) ⊂⊂ I. For ε near 0, define
$$\gamma_\varepsilon(t) = \gamma(t) + \varepsilon h(t) \in U$$
for t ∈ I. We define γ_ε outside of O to be simply γ. Apply the first variational
formula:
$$\begin{aligned}
\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} E(\gamma_\varepsilon)
&= \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \int_0^1 g(\dot\gamma_\varepsilon(t),\dot\gamma_\varepsilon(t))\,dt\\
&= \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \int_I g_{ij}(\gamma(t)+\varepsilon h(t))\,[\dot\gamma^i(t)+\varepsilon\dot h^i(t)][\dot\gamma^j(t)+\varepsilon\dot h^j(t)]\,dt\\
&= \int_I \frac{\partial g_{ij}}{\partial x^k}(\gamma(t))\,h^k(t)\,\dot\gamma^i(t)\,\dot\gamma^j(t)\,dt\\
&\quad+ \int_I g_{ij}(\gamma(t))\,\dot h^i(t)\,\dot\gamma^j(t)\,dt
+ \int_I g_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot h^j(t)\,dt.
\end{aligned}$$
Now we integrate by parts in the last two integrals. Note that since h has
compact support, all the boundary terms involving h vanish. Compute
$$\int_I g_{ij}(\gamma(t))\,\dot h^i(t)\,\dot\gamma^j(t)\,dt
= -\int_I \frac{\partial g_{ij}}{\partial x^k}(\gamma(t))\,\dot\gamma^k(t)\,h^i(t)\,\dot\gamma^j(t)\,dt
- \int_I g_{ij}(\gamma(t))\,h^i(t)\,\ddot\gamma^j(t)\,dt.$$
We may plug this in to find, for a minimizer,
$$\begin{aligned}
0 &= \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} E(\gamma_\varepsilon)\\
&= \int_I \left(\frac{\partial g_{ij}}{\partial x^k}h^k\dot\gamma^i\dot\gamma^j
- \frac{\partial g_{ij}}{\partial x^k}\dot\gamma^k h^i\dot\gamma^j - g_{ij}h^i\ddot\gamma^j
- \frac{\partial g_{ij}}{\partial x^k}\dot\gamma^k\dot\gamma^i h^j - g_{ij}\ddot\gamma^i h^j\right)dt\\
&= \int_I h^k\left(\frac{\partial g_{ij}}{\partial x^k}\dot\gamma^i\dot\gamma^j
- \frac{\partial g_{kj}}{\partial x^i}\dot\gamma^i\dot\gamma^j - g_{kj}\ddot\gamma^j
- \frac{\partial g_{ik}}{\partial x^j}\dot\gamma^i\dot\gamma^j - g_{jk}\ddot\gamma^j\right)dt.
\end{aligned}$$
Since this is true for each h with compact support in I, then we must have for each k = 1, . . . , n, and for all t in the open interval I,
$$0 = \frac{\partial g_{ij}}{\partial x^k}\dot\gamma^i\dot\gamma^j
- \frac{\partial g_{kj}}{\partial x^i}\dot\gamma^i\dot\gamma^j - g_{kj}\ddot\gamma^j
- \frac{\partial g_{ik}}{\partial x^j}\dot\gamma^i\dot\gamma^j - g_{jk}\ddot\gamma^j.$$
Since g_{kj} = g_{jk}, we have
$$\begin{aligned}
0 &= g_{jk}\ddot\gamma^j + \frac12\left(\frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k}\right)\dot\gamma^i\dot\gamma^j,\\
0 &= \ddot\gamma^\ell + \frac12\,g^{k\ell}\left(\frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k}\right)\dot\gamma^i\dot\gamma^j
= \ddot\gamma^\ell + \Gamma^\ell_{ij}\dot\gamma^i\dot\gamma^j,\\
\Gamma^\ell_{ij} &= \frac12\,g^{k\ell}\left(\frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k}\right).
\end{aligned}$$
The Γ^ℓ_{ij} are called the Christoffel symbols of the metric g_{ij}, and
$$\ddot\gamma^\ell + \Gamma^\ell_{ij}\dot\gamma^i\dot\gamma^j = 0 \qquad (26)$$
is called the geodesic equation for the metric g. Note Γ^ℓ_{ij} = Γ^ℓ_{ji}.
Any curve satisfying this second-order system is called a geodesic on the
Riemannian manifold X.
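As a numerical illustration (my own sketch), the geodesic system can be integrated directly; for the hyperbolic plane (Example 17 below with n = 2) the conserved speed |γ̇|_g provides a built-in correctness check.

```python
import math

# Geodesic system on H^2 (coordinates (x, y), y > 0, metric (dx^2+dy^2)/y^2):
#   x'' = (2/y) x' y',   y'' = (y'^2 - x'^2)/y.
def rhs(state):
    x, y, vx, vy = state
    return (vx, vy, 2.0 * vx * vy / y, (vy * vy - vx * vx) / y)

def rk4_step(state, h):
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = rhs([s + h * k for s, k in zip(state, k3)])
    return [s + h / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def g_speed(state):
    x, y, vx, vy = state
    return math.hypot(vx, vy) / y    # |gamma'|_g in the hyperbolic metric

state = [0.0, 1.0, 1.0, 0.3]         # initial point and velocity
s0 = g_speed(state)
for _ in range(2000):                # integrate up to t = 2
    state = rk4_step(state, 1e-3)
assert abs(g_speed(state) - s0) < 1e-8   # speed is conserved
assert state[1] > 0.0                    # stays in the upper half-plane
```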
Remark. Our definition of geodesic requires a specific parametrization to
solve the equation (the constant speed parametrization). Many other authors
define a geodesic to be a curve which satisfies the first variational equation
of arc-length. These geodesics are the same as our geodesics as subsets of the
Riemannian manifold, but the parametrization is not required to be constant
speed.
Note that this analysis does not work at the endpoints 0 and 1. There,
we simply have the conditions γ(0) = p and γ(1) = q to remain in the class
C. This is essentially a Dirichlet boundary condition on the problem.
Homework Problem 41. Let p, q be points in a manifold X, and consider
the class C of all smooth paths from p to q.
(a) Compute the Euler-Lagrange equations for the length functional L(γ)
for γ ∈ C. Show that any γ : [0, 1] → X which is a critical point of L
must satisfy
$$\ddot\gamma^\ell(t) + \Gamma^\ell_{ij}(\gamma(t))\,\dot\gamma^i(t)\,\dot\gamma^j(t) = c(t)\,\dot\gamma^\ell(t)$$
for t ∈ (0, 1) and c(t) a real-valued function of t.
(b) Use part (a) to prove the following generalization of Proposition 50:
A curve γ in C is a critical point of E if and only if it is a critical point
of L and it has constant speed.
Homework Problem 42. Let (X, g) be an n-dimensional smooth compact
Riemannian manifold. By Nash's Theorem, we may assume that g = i^*δ, the
pull-back of the Euclidean metric δ on RN for some embedding i : X → RN .
If (p, v) ∈ T X (i.e. p ∈ X and v ∈ Tp X), show that the solution to the
geodesic equation (26) on X with initial conditions γ(0) = p and γ̇(0) = v
exists for all time.
Hints:
(a) Show that if γ(t) solves the geodesic equation (26), then the speed |γ̇(t)|g
is constant in t.
(b) Reduce the problem to the case the initial speed |v|g(p) = 1.
(c) The unit tangent bundle U T X is defined by
U T X = {(p, v) ∈ T X : |v|g(p) = 1}.
Show U T X is compact as long as X is compact.
(d) Mimic the proof of Theorem 15 to complete the proof.
Example 16. Euclidean space is Rn with the standard Euclidean metric
δ = δij dxi dxj . In this case, all the Christoffel symbols Γkij vanish, since each
term involves differentiating the components of the metric tensor, all of which
are constant. Therefore, the geodesic system is simply γ̈ k = 0. Solutions to
this ODE are simply linear functions of t, and so geodesics are of the form
γ = tv + w for v, w ∈ Rn . So geodesics on Euclidean space are straight lines
traversed at constant speed.
Example 17. For hyperbolic space, recall the metric g_{ij} = (x^n)^{-2}δ_{ij} on {x ∈ R^n : x^n > 0}. Compute the Christoffel symbols:
$$\begin{aligned}
g^{ij} &= (x^n)^2\delta^{ij}, \qquad g_{ij,k} = -2(x^n)^{-3}\delta_{ij}\delta_{kn},\\
\Gamma^k_{ij} &= \tfrac12 (x^n)^2 \delta^{k\ell}\,(g_{i\ell,j} + g_{\ell j,i} - g_{ij,\ell})\\
&= \tfrac12 (x^n)^2 \delta^{k\ell}\left[-2(x^n)^{-3}\right](\delta_{i\ell}\delta_{jn} + \delta_{\ell j}\delta_{in} - \delta_{ij}\delta_{\ell n})\\
&= -(x^n)^{-1}(\delta_{ik}\delta_{jn} + \delta_{jk}\delta_{in} - \delta_{kn}\delta_{ij}).
\end{aligned}$$
Now consider i, j, k distinct integers in {1, . . . , n}. Then
$$\Gamma^k_{ij} = 0, \qquad \Gamma^i_{ik} = \Gamma^i_{ki} = -(x^n)^{-1}\delta_{kn}, \qquad \Gamma^k_{ii} = (x^n)^{-1}\delta_{kn}, \qquad \Gamma^i_{ii} = -(x^n)^{-1}\delta_{in}.$$
First, we look for solutions in which γ̇ k = 0 for k = 1, . . . , n − 1 (so only
γ n varies in t). It is plausible to look for such solutions since the coefficients
gij of the metric depend only on xn .
In this case, for k < n, compute
$$0 = \ddot\gamma^k = -\Gamma^k_{ij}\dot\gamma^i\dot\gamma^j = -\Gamma^k_{nn}\dot\gamma^n\dot\gamma^n = (x^n)^{-1}\delta_{kn}\,\dot\gamma^n\dot\gamma^n = 0.$$
Thus if γ̇^1 = · · · = γ̇^{n−1} = 0, then the geodesic equations for γ̈^k for k < n are automatically solved.
Now compute the geodesic equation for γ̈^n:
$$\ddot\gamma^n = -\Gamma^n_{ij}\dot\gamma^i\dot\gamma^j
= -\Gamma^n_{nn}\dot\gamma^n\dot\gamma^n
= (x^n)^{-1}\dot\gamma^n\dot\gamma^n
= (\gamma^n)^{-1}\dot\gamma^n\dot\gamma^n. \qquad (27)$$
This is a second-order nonlinear equation in γ n , and we do not have any
general technique to solve such an equation. We can, however, make some
educated guesses. In particular, note that
$$\frac{d}{dt}\left(\gamma^n\dot\gamma^n\right) = \gamma^n\ddot\gamma^n + \dot\gamma^n\dot\gamma^n,$$
and that each of these terms is similar to those in the geodesic equation (27)
above.
In particular, compute for a function f of γ^n:
$$0 = \frac{d}{dt}\left(f(\gamma^n)\dot\gamma^n\right) = f(\gamma^n)\ddot\gamma^n + f'(\gamma^n)\dot\gamma^n\dot\gamma^n, \qquad (28)$$
$$0 = \ddot\gamma^n + \frac{f'(\gamma^n)}{f(\gamma^n)}\dot\gamma^n\dot\gamma^n. \qquad (29)$$
This last equation is the same as the geodesic equation (27) if
$$\frac{f'(\gamma^n)}{f(\gamma^n)} = -\frac{1}{\gamma^n},$$
and this is now a first-order separable equation for f. We may solve to find that f = (γ^n)^{-1} is a solution.
Now plug into (28) to find
$$0 = \frac{d}{dt}\left(\frac{\dot\gamma^n}{\gamma^n}\right), \qquad
C = \frac{\dot\gamma^n}{\gamma^n} = \frac{d}{dt}\log\gamma^n, \qquad
Ct + D = \log\gamma^n, \qquad \gamma^n = Ae^{Ct}$$
for A a positive constant (since in hyperbolic space, we have x^n = γ^n > 0) and C any real constant. Therefore,
$$\gamma^1 = \gamma_0^1,\quad \dots,\quad \gamma^{n-1} = \gamma_0^{n-1},\quad \gamma^n = Ae^{Ct}$$
solves the geodesic system on hyperbolic space.
So far we have only found geodesics in the special case that γ̇ 1 = · · · =
γ̇ n−1 = 0. To find all the geodesics on hyperbolic space, we introduce the
notion of an isometry of a Riemannian manifold.
Given a Riemannian manifold (X, g), a diffeomorphism Φ : X → X is an
isometry if Φ∗ g = g. Isometries of Hn are well understood, and we introduce
a specific type. For α > 0, let
$$\iota_\alpha : x \mapsto \alpha\,\frac{x}{|x|^2},$$
where x ∈ Hn ⊂ Rn and |x|2 = (x1 )2 + · · · + (xn )2 comes from Rn . It is easy
to see that ια is a diffeomorphism of Hn . To show that it is an isometry, let
y = ι_α(x). Then
$$\iota_\alpha^* g = \iota_\alpha^*\left(\frac{\sum_{j=1}^n (dy^j)^2}{(y^n)^2}\right).$$
Dropping the pull back ι_α^* notation, we compute
$$y^j = \alpha\,\frac{x^j}{|x|^2}, \qquad
dy^j = \frac{\partial y^j}{\partial x^i}\,dx^i
= \alpha\sum_{i=1}^n \frac{|x|^2\delta_{ij} - 2x^ix^j}{|x|^4}\,dx^i,$$
$$\begin{aligned}
(dy^j)^2 &= \alpha^2\left(\sum_{i=1}^n \frac{|x|^2\delta_{ij} - 2x^ix^j}{|x|^4}\,dx^i\right)
\left(\sum_{k=1}^n \frac{|x|^2\delta_{kj} - 2x^kx^j}{|x|^4}\,dx^k\right)\\
&= \frac{\alpha^2}{|x|^8}\sum_{i,k=1}^n \left(4x^ix^k(x^j)^2 - 2|x|^2x^ix^j\delta_{kj} - 2|x|^2x^kx^j\delta_{ij} + |x|^4\delta_{ij}\delta_{kj}\right)dx^i\,dx^k\\
&= \frac{\alpha^2}{|x|^8}\left\{4(x^j)^2\sum_{i,k=1}^n x^ix^k\,dx^i\,dx^k
- 4|x|^2x^j\,dx^j\sum_{i=1}^n x^i\,dx^i + |x|^4(dx^j)^2\right\}.
\end{aligned}$$
Summing over j,
$$\begin{aligned}
\sum_{j=1}^n (dy^j)^2
&= \frac{\alpha^2}{|x|^8}\left\{4\left(\sum_{j=1}^n (x^j)^2\right)\sum_{i,k=1}^n x^ix^k\,dx^i\,dx^k
- 4|x|^2\sum_{i,j=1}^n x^jx^i\,dx^j\,dx^i + |x|^4\sum_{j=1}^n (dx^j)^2\right\}\\
&= \frac{\alpha^2}{|x|^8}\left\{4|x|^2\sum_{i,k=1}^n x^ix^k\,dx^i\,dx^k
- 4|x|^2\sum_{i,k=1}^n x^ix^k\,dx^i\,dx^k + |x|^4\sum_{j=1}^n (dx^j)^2\right\}\\
&= \frac{\alpha^2}{|x|^4}\sum_{j=1}^n (dx^j)^2.
\end{aligned}$$
Since $(y^n)^2 = \alpha^2(x^n)^2/|x|^4$, we conclude
$$\frac{\sum_{j=1}^n (dy^j)^2}{(y^n)^2}
= \frac{\dfrac{\alpha^2}{|x|^4}\displaystyle\sum_{j=1}^n (dx^j)^2}{\dfrac{\alpha^2(x^n)^2}{|x|^4}}
= \frac{\sum_{j=1}^n (dx^j)^2}{(x^n)^2}.$$
Therefore, ι_α^* g = g and ι_α is an isometry.
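The pullback identity ι_α^* g = g can also be checked numerically with a finite-difference Jacobian (my own sketch, n = 2):

```python
import math

alpha = 2.5

def iota(x):
    r2 = x[0] ** 2 + x[1] ** 2
    return (alpha * x[0] / r2, alpha * x[1] / r2)

def g(p, u, v):
    # hyperbolic inner product of tangent vectors u, v at p (y > 0)
    return (u[0] * v[0] + u[1] * v[1]) / p[1] ** 2

def push(x, u, h=1e-6):
    # D iota at x applied to u, by central differences
    xp = (x[0] + h * u[0], x[1] + h * u[1])
    xm = (x[0] - h * u[0], x[1] - h * u[1])
    yp, ym = iota(xp), iota(xm)
    return ((yp[0] - ym[0]) / (2 * h), (yp[1] - ym[1]) / (2 * h))

x = (0.7, 1.3)
u = (1.0, -0.4)
v = (0.2, 0.9)
lhs = g(iota(x), push(x, u), push(x, v))
assert abs(lhs - g(x, u, v)) < 1e-6   # pullback metric equals the metric
```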
Moreover, it is trivial to check that any translation x 7→ x + x0 is an
isometry of Hn if the last component xn0 = 0. Also, note that the composition
of two isometries is again an isometry (indeed the set of isometries of a
Riemannian manifold X forms a subgroup of the diffeomorphism group called
the isometry group).
Proposition 55 below shows that for any geodesic ψ : R → H^n, ι_α ◦ ψ is also a geodesic. Recall we know so far that
$$\gamma = (\gamma_0^1, \dots, \gamma_0^{n-1}, Ae^{Ct})$$
are geodesics for A > 0, C ∈ R. Compute for α > 0,
$$\iota_\alpha\circ\gamma = \alpha\,\frac{\gamma}{|\gamma|^2}
= \frac{\alpha(\gamma_0^1, \dots, \gamma_0^{n-1}, Ae^{Ct})}{(\gamma_0^1)^2 + \cdots + (\gamma_0^{n-1})^2 + A^2e^{2Ct}}.$$
The image ι_α ◦ γ(R) is then the half-circle in R^n which intersects {x^n = 0} perpendicularly at
$$0 \qquad\text{and}\qquad \frac{\alpha(\gamma_0^1, \dots, \gamma_0^{n-1}, 0)}{(\gamma_0^1)^2 + \cdots + (\gamma_0^{n-1})^2}.$$
Then if we apply the isometry given by adding a constant x_0 with x_0^n = 0, then every half-circle in H^n which intersects {x^n = 0} perpendicularly at both endpoints is the image of a geodesic path in H^n.
All together, for constants
$$\gamma_0^1, \dots, \gamma_0^{n-1},\ x_0^1, \dots, x_0^{n-1},\ C \in \mathbb{R}, \qquad A, \alpha > 0,$$
the path for t ∈ R
$$\psi(t) = \frac{\alpha(\gamma_0^1, \dots, \gamma_0^{n-1}, Ae^{Ct})}{(\gamma_0^1)^2 + \cdots + (\gamma_0^{n-1})^2 + A^2e^{2Ct}} + (x_0^1, \dots, x_0^{n-1}, 0) \qquad (30)$$
is a geodesic in H^n, and the image ψ(R) is a ray or a half-circle in R^n perpendicular to {x^n = 0}. All such rays and semicircles are represented by such geodesic paths.
We claim that we have found all the geodesics in Hn . The way to check
this is to recognize that the geodesic system, as a second-order ODE system
with smooth coefficients, has a unique solution for each initial value problem
$$\ddot\gamma^k = -\Gamma^k_{ij}\dot\gamma^i\dot\gamma^j, \qquad \gamma(0) = y_0, \qquad \dot\gamma(0) = v_0.$$
Then if we can check that every initial condition (y0 , v0 ) ∈ T Hn occurs as
(ψ(0), ψ̇(0)) for a geodesic ψ(t) in (30), uniqueness of the geodesic system
will imply that we have found all the geodesics in Hn .
So we must check that every (y0 , v0 ) ∈ T Hn = Hn × Rn can be represented
by (ψ(0), ψ̇(0)) for a ψ(t) in (30). For a given point y0 ∈ Hn , and vector
v0 ∈ Ty0 Hn = Rn , consider first the case when
$$v_0^1 = \cdots = v_0^{n-1} = 0.$$
In this case, we can choose A > 0 and C so that
$$\psi(t) = (y_0^1, \dots, y_0^{n-1}, Ae^{Ct})$$
satisfies ψ(0) = y0 and ψ̇(0) = v0 . Otherwise, y0 and v0 span a plane P in
Hn . Let L = P∩{xn = 0}. It is straightforward to check that there is a unique
semicircle in the plane P which hits L perpendicularly, passes through y0 and
is tangent to v0 at y0 . This is the image of some geodesic ψ(t) in (30). Then
we can adjust C and A to ensure that ψ(0) = y0 and ψ̇(0) = v0 . Therefore,
every initial condition (y0 , v0 ) is achieved by a geodesic on our list, and we
have found all the geodesics in hyperbolic space.
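As a direct check (my own sketch), one can plug a concrete path of the form (30) with n = 2 into the geodesic system, using the H² Christoffel symbols from Example 17 and finite differences:

```python
import math

# H^2 Christoffel symbols (Example 17, n = 2, coordinates (x, y)):
#   G1_12 = G1_21 = -1/y,  G2_11 = 1/y,  G2_22 = -1/y,  all others 0.
alpha, A, C, g0, x0 = 1.7, 0.8, 0.6, 1.1, 0.4

def psi(t):
    # the path (30) with n = 2 and these (hypothetical) constants
    e = A * math.exp(C * t)
    d = g0 ** 2 + e ** 2
    return (alpha * g0 / d + x0, alpha * e / d)

h = 1e-4

def residual(t):
    # |psi'' + Gamma(psi', psi')| via central differences
    pm, p0, pp = psi(t - h), psi(t), psi(t + h)
    vel = [(pp[i] - pm[i]) / (2 * h) for i in range(2)]
    acc = [(pp[i] - 2 * p0[i] + pm[i]) / (h * h) for i in range(2)]
    y = p0[1]
    r1 = acc[0] - (2.0 / y) * vel[0] * vel[1]
    r2 = acc[1] + (1.0 / y) * vel[0] ** 2 - (1.0 / y) * vel[1] ** 2
    return abs(r1) + abs(r2)

assert residual(0.0) < 1e-4 and residual(1.0) < 1e-4
```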
The following proposition was discussed in Example 17 above.
Proposition 54. Consider a Riemannian manifold (X, g). Given p ∈ X, v ∈ T_pX, there is an ε > 0 and a unique geodesic γ : (−ε, ε) → X with γ(0) = p, γ̇(0) = v.
Remark. In general, the geodesic γ may not exist for all time, although we
have seen that all the geodesics on hyperbolic space (Example 17) and on
compact Riemannian manifolds (Problem 42) do exist for all time.
A map Φ : X → Y for manifolds X and Y with Riemannian metrics g
and h respectively is a local isometry if every point in X has a neighborhood
O on which Φ : O → Φ(O) ⊂ Y is an isometry.
Proposition 55. If Φ : X → Y is a local isometry of Riemannian manifolds, then for every geodesic ψ : (−ε, ε) → X, Φ ◦ ψ is a geodesic on Y. Any geodesic on Φ(X) ⊂ Y is of this form.
Proof. In local coordinates on X and Y , we can write the isometry as y =
y(x). Note this is the same form as a coordinate change, and the condition
that the map is an isometry is simply that the metric pulls back as a (0, 2)
tensor when changing coordinates.
Therefore, the proof boils down to the following fact: for a local isometry, and for any C^2 path γ, the quantity
$$w^k = \ddot\gamma^k + \Gamma^k_{ij}\dot\gamma^i\dot\gamma^j$$
transforms like a tangent vector (i.e. a (1, 0) tensor) under changes of coordinates. Therefore,
$$w^k\,\frac{\partial}{\partial x^k} = w^k\,\frac{\partial y^I}{\partial x^k}\,\frac{\partial}{\partial y^I},$$
and w^k(x) = 0 for k = 1, . . . , n is equivalent to w^I(y) = 0 for I = 1, . . . , n. This is because ∂y^I/∂x^k is nonsingular for y = y(x) a diffeomorphism.
In order to compute how wk transforms, we use the following index convention. Indices i, j, k, . . . are with respect to the x variables, while indices
I, J, K, . . . are with respect to the y variables. For example, gij is the metric
in the x coordinates, while gIJ is the metric in the y coordinates.
First of all, note
$$g_{IJ} = g_{ij}\,\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}, \qquad
g^{IJ} = g^{ij}\,\frac{\partial y^I}{\partial x^i}\frac{\partial y^J}{\partial x^j}.$$
Compute
$$\begin{aligned}
g_{IJ,K} = \frac{\partial g_{IJ}}{\partial y^K}
&= \frac{\partial}{\partial y^K}\left(g_{ij}\,\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}\right)\\
&= \frac{\partial g_{ij}}{\partial y^K}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}
+ g_{ij}\,\frac{\partial^2 x^i}{\partial y^I\partial y^K}\frac{\partial x^j}{\partial y^J}
+ g_{ij}\,\frac{\partial x^i}{\partial y^I}\frac{\partial^2 x^j}{\partial y^J\partial y^K}\\
&= g_{ij,k}\,\frac{\partial x^k}{\partial y^K}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}
+ g_{ij}\,\frac{\partial^2 x^i}{\partial y^I\partial y^K}\frac{\partial x^j}{\partial y^J}
+ g_{ij}\,\frac{\partial x^i}{\partial y^I}\frac{\partial^2 x^j}{\partial y^J\partial y^K}.
\end{aligned}$$
Then compute
$$\begin{aligned}
g_{KJ,I} + g_{IK,J} - g_{IJ,K}
&= g_{ij,k}\,\frac{\partial x^k}{\partial y^I}\frac{\partial x^i}{\partial y^K}\frac{\partial x^j}{\partial y^J}
+ g_{ij}\,\frac{\partial^2 x^i}{\partial y^K\partial y^I}\frac{\partial x^j}{\partial y^J}
+ g_{ij}\,\frac{\partial x^i}{\partial y^K}\frac{\partial^2 x^j}{\partial y^J\partial y^I}\\
&\quad+ g_{ij,k}\,\frac{\partial x^k}{\partial y^J}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^K}
+ g_{ij}\,\frac{\partial^2 x^i}{\partial y^I\partial y^J}\frac{\partial x^j}{\partial y^K}
+ g_{ij}\,\frac{\partial x^i}{\partial y^I}\frac{\partial^2 x^j}{\partial y^K\partial y^J}\\
&\quad- g_{ij,k}\,\frac{\partial x^k}{\partial y^K}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}
- g_{ij}\,\frac{\partial^2 x^i}{\partial y^I\partial y^K}\frac{\partial x^j}{\partial y^J}
- g_{ij}\,\frac{\partial x^i}{\partial y^I}\frac{\partial^2 x^j}{\partial y^J\partial y^K}\\
&= g_{ij,k}\,\frac{\partial x^k}{\partial y^I}\frac{\partial x^i}{\partial y^K}\frac{\partial x^j}{\partial y^J}
+ g_{ij,k}\,\frac{\partial x^k}{\partial y^J}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^K}
- g_{ij,k}\,\frac{\partial x^k}{\partial y^K}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}\\
&\quad+ 2\,g_{ij}\,\frac{\partial^2 x^i}{\partial y^I\partial y^J}\frac{\partial x^j}{\partial y^K}.
\end{aligned}$$
Then the Christoffel symbols
$$\begin{aligned}
\Gamma^L_{IJ} &= \tfrac12\,g^{KL}(g_{KJ,I} + g_{IK,J} - g_{IJ,K})\\
&= \tfrac12\,g^{m\ell}\,\frac{\partial y^K}{\partial x^m}\frac{\partial y^L}{\partial x^\ell}
\bigg(g_{ij,k}\,\frac{\partial x^k}{\partial y^I}\frac{\partial x^i}{\partial y^K}\frac{\partial x^j}{\partial y^J}
+ g_{ij,k}\,\frac{\partial x^k}{\partial y^J}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^K}\\
&\qquad\qquad- g_{ij,k}\,\frac{\partial x^k}{\partial y^K}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}
+ 2\,g_{ij}\,\frac{\partial^2 x^i}{\partial y^I\partial y^J}\frac{\partial x^j}{\partial y^K}\bigg)\\
&= \tfrac12\,g^{m\ell}\,\frac{\partial y^L}{\partial x^\ell}
\bigg(g_{mj,k}\,\frac{\partial x^k}{\partial y^I}\frac{\partial x^j}{\partial y^J}
+ g_{im,k}\,\frac{\partial x^k}{\partial y^J}\frac{\partial x^i}{\partial y^I}
- g_{ij,m}\,\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}
+ 2\,g_{im}\,\frac{\partial^2 x^i}{\partial y^I\partial y^J}\bigg)\\
&= \Gamma^\ell_{ij}\,\frac{\partial y^L}{\partial x^\ell}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}
+ \frac{\partial y^L}{\partial x^\ell}\,\frac{\partial^2 x^\ell}{\partial y^I\partial y^J}.
\end{aligned}$$
Note that the second term in the last formula shows that the Christoffel
symbols do not transform as a tensor. In fact, this is fortunate, as the extra
non-tensorial term will cancel out a similar term coming from the second
derivative γ̈ k .
Note that
$$\dot\gamma^I = \frac{\partial y^I}{\partial x^i}\,\dot\gamma^i, \qquad
\ddot\gamma^L = \frac{d}{dt}\left(\frac{\partial y^L}{\partial x^\ell}(\gamma)\,\dot\gamma^\ell\right)
= \frac{\partial y^L}{\partial x^\ell}\,\ddot\gamma^\ell
+ \frac{\partial^2 y^L}{\partial x^\ell\partial x^j}\,\dot\gamma^j\dot\gamma^\ell.$$
γ̈ L
Compute
$$\begin{aligned}
\Gamma^L_{IJ}\dot\gamma^I\dot\gamma^J
&= \left(\Gamma^\ell_{ij}\,\frac{\partial y^L}{\partial x^\ell}\frac{\partial x^i}{\partial y^I}\frac{\partial x^j}{\partial y^J}
+ \frac{\partial y^L}{\partial x^\ell}\,\frac{\partial^2 x^\ell}{\partial y^I\partial y^J}\right)
\frac{\partial y^I}{\partial x^m}\,\dot\gamma^m\,\frac{\partial y^J}{\partial x^p}\,\dot\gamma^p\\
&= \Gamma^\ell_{ij}\,\dot\gamma^i\dot\gamma^j\,\frac{\partial y^L}{\partial x^\ell}
+ \frac{\partial^2 x^k}{\partial y^I\partial y^J}\,\frac{\partial y^L}{\partial x^k}\frac{\partial y^I}{\partial x^j}\frac{\partial y^J}{\partial x^\ell}\,\dot\gamma^j\dot\gamma^\ell.
\end{aligned}$$
Γ`ij
Therefore, γ̈^L + Γ^L_{IJ}γ̇^Iγ̇^J will transform like a tensor if we can show that the non-tensorial terms cancel: We need to show
$$\frac{\partial^2 y^L}{\partial x^\ell\partial x^j}
+ \frac{\partial^2 x^k}{\partial y^I\partial y^J}\,\frac{\partial y^L}{\partial x^k}\frac{\partial y^I}{\partial x^j}\frac{\partial y^J}{\partial x^\ell} = 0. \qquad (31)$$
This equation follows from the formula for the first derivative of an inverse
matrix. If Ȧ represents the first derivative of a matrix A (with respect to
any parameter or variable), then
(A−1 )˙ = −A−1 ȦA−1 .
(Proof: Differentiate the equation AA−1 = I to find ȦA−1 + A(A−1 )˙ = 0.)
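This matrix identity is easy to spot-check numerically (my own sketch, 2 × 2 case):

```python
import math

# Compare a central difference of A(t)^{-1} with -A^{-1} A' A^{-1}.
def A(t):
    return [[1.0 + t * t, math.sin(t)], [t, 2.0 + math.cos(t)]]

def Adot(t):
    return [[2.0 * t, math.cos(t)], [1.0, -math.sin(t)]]

def inv(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t, h = 0.7, 1e-5
fd = [[(inv(A(t + h))[i][j] - inv(A(t - h))[i][j]) / (2 * h)
       for j in range(2)] for i in range(2)]
Ai = inv(A(t))
formula = [[-x for x in row] for row in mul(mul(Ai, Adot(t)), Ai)]
err = max(abs(fd[i][j] - formula[i][j]) for i in range(2) for j in range(2))
assert err < 1e-7
```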
Then since (∂y^L/∂x^ℓ) is the inverse matrix of (∂x^ℓ/∂y^L),
$$\frac{\partial^2 y^L}{\partial x^\ell\partial x^j}
= \frac{\partial}{\partial x^j}\left(\frac{\partial y^L}{\partial x^\ell}\right)
= -\frac{\partial y^L}{\partial x^k}\,\frac{\partial}{\partial x^j}\!\left(\frac{\partial x^k}{\partial y^J}\right)\frac{\partial y^J}{\partial x^\ell}
= -\frac{\partial y^L}{\partial x^k}\,\frac{\partial y^I}{\partial x^j}\,\frac{\partial}{\partial y^I}\!\left(\frac{\partial x^k}{\partial y^J}\right)\frac{\partial y^J}{\partial x^\ell}.$$
Upon plugging in, this proves formula (31) and the proposition.
Remark. There is also a more geometric proof of the previous proposition.
Recall that we derived the geodesic equation as the Euler-Lagrange equation
of the energy functional. So any path which minimizes the energy satisfies the
geodesic equation. It is easy to see that the energy of a path is invariant under
an isometry; therefore, the notion of energy-minimizing path is invariant
under isometries.
The problem is that there are geodesics which do not minimize the energy. (They may be saddle points of the energy functional.) This can be
surmounted by restricting to small domains by using the following fact from
Riemannian geometry: Every point in a Riemannian manifold has a neighborhood O so that all geodesic paths in O are energy-minimizing for endpoints
in O. (In Riemannian geometry books, this fact is usually stated in terms
of the length functional instead; to translate to the present situation, recall that energy-minimizing paths are length-minimizing paths parametrized
with constant speed.)
Homework Problem 43. Given a smooth function f on a Riemannian manifold, the Hessian of f is defined locally by the formula
$$H(f)_{ij} = \frac{\partial^2 f}{\partial x^i\partial x^j} - \Gamma^k_{ij}\,\frac{\partial f}{\partial x^k}.$$
Show that the Hessian of f is a symmetric (0, 2) tensor.
Homework Problem 44. Compute all the geodesics on S2 .
Hint: Use the expression for the metric in local coordinates (y^1, y^2) from Example 13. Compute the Christoffel symbols. Analyze the case when y^2 = 0 and only y^1 varies. Solve the resulting second-order ODE for γ^1 = y^1. Then move these geodesics around via the isometry group of S2.
(The isometry group of S2 is given by the orthogonal group of 3 × 3 matrices O(3) = {A : AAᵀ = I}.
Show that each such linear action is an isometry of R3 which takes the unit
sphere S2 to itself. For every line L through the origin in R3, show that
rotating by an angle θ around the line L is a linear map in O(3). Show
that every initial condition (p, v) ∈ T S2 of the geodesic equation on S2 can
be realized by the examples you computed above, when acted on by such a
rotation in O(3).)
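For intuition alongside Problem 44 (this is my numerical aside, not a solution): in the ambient coordinates of R3, constant-speed geodesics of the unit sphere satisfy γ̈ = −|γ̇|²γ, since the acceleration is purely normal to S2. Integrating this ODE shows a unit-speed geodesic closing up after time 2π, as a great circle should.

```python
import numpy as np

# Integrate gamma'' = -|gamma'|^2 gamma (geodesic equation of the unit
# sphere in ambient R^3 coordinates) with classical RK4.
def rhs(state):
    p, v = state[:3], state[3:]
    return np.concatenate([v, -np.dot(v, v) * p])

def rk4_step(state, dt):
    k1 = rhs(state)
    k2 = rhs(state + 0.5 * dt * k1)
    k3 = rhs(state + 0.5 * dt * k2)
    k4 = rhs(state + dt * k3)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

p0 = np.array([1.0, 0.0, 0.0])        # initial point on S^2
v0 = np.array([0.0, 0.8, 0.6])        # unit-length tangent vector at p0
state = np.concatenate([p0, v0])

dt = 2 * np.pi / 10000                # one full period for a unit-speed geodesic
for _ in range(10000):
    state = rk4_step(state, dt)

p = state[:3]
print(abs(np.dot(p, p) - 1.0))        # the flow stays on the sphere
print(np.linalg.norm(p - p0))         # the geodesic closes up (a great circle)
```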
4.3
The direct method: An example
We have computed the Euler-Lagrange equations of the energy functional.
Now we introduce an example of the direct method in the calculus of variations.
The direct method is this: Given a functional E : C → R, if there is a
lower bound I = inf γ∈C E(γ) > −∞, then there is a sequence of paths γi so
that E(γi ) → I. The direct method is to show that there is a subsequence
of {γi } which converges to some γ, and to show that the limiting γ ∈ C and
that E(γ) = I. Thus we have constructed a minimizer γ over the class C
of the functional E. There are subtle points to deal with along the way.
Typically, the class C is a closed subset of a Banach space, and in passing
to the limit of a subsequence, the limit γ we construct may be in a weaker
Banach space (for example, a sequence in C 1 may produce a limit only in
C 0 , which will be problematic if the functional involves any derivatives).
A related issue is that in passing to the limit γij → γ, we may not have
E(γij ) → E(γ). In particular, below we will have to deal with the situation
in which we only know limj→∞ E(γij ) ≥ E(γ)—so that the functional is only
lower semi-continuous under the limit. Thus we will typically need to spend
time improving the regularity of the limit γ and showing some semi-continuity
of the functional under the limiting subsequence.
The direct method of the calculus of variations is very useful in solving
elliptic PDEs. The problem we approach involves geodesics, and thus the solution we produce will be a solution to an ODE. This will allow us to proceed
with much of the general picture of the calculus of variations while avoiding some of the more technical points. In particular, we will learn about
distributions, weak derivatives, Hilbert spaces, and compact maps between
Banach spaces in solving our problem.
Given a smooth manifold X, a loop is a continuous map from the circle S1
to X. Each such loop is equivalent to a continuous map γ : R → X which is
periodic in the sense that γ(t+1) = γ(t) for all t ∈ R. We will abuse notation
by using the same γ for γ : S1 → X and the periodic γ : R → X. (This is
because S1 is naturally the quotient R/Z, where Z acts on R by adding
integers to real numbers.) Two loops γ0 , γ1 : S1 → X are freely homotopic if
there is a continuous homotopy
G : [0, 1] × S1 → X,
G(0, t) = γ0 (t),
G(1, t) = γ1 (t).
The condition of being freely homotopic is an equivalence relation, and thus
each loop on a manifold X is a member of a free homotopy class.
Here is our problem:
Problem: Find a curve of least length in a free homotopy class of loops on
a compact Riemannian manifold.
The problem may have no solution on a noncompact Riemannian manifold. There may be loops of arbitrarily small length in a given nontrivial free homotopy class, corresponding to loops slipping off a narrowing end of the manifold.
Homotopy classes are objects defined by continuity, and the following
result should come as no surprise.
Proposition 56. For a smooth compact manifold X ⊂ RN, there is an ε > 0 so that if two loops γ0, γ1 : S1 → X ⊂ RN satisfy

‖γ0 − γ1‖_{C0(S1,RN)} < ε,

then γ0 and γ1 are homotopic as loops in X.
Proof. We apply the ε-Neighborhood Theorem (Theorem 19): For ε > 0, let X^ε be the open subset of RN consisting of all points of distance less than ε from X. There is an ε > 0 small enough so that every point in X^ε has a unique closest point in X. Then the map π : X^ε → X which sends a point in X^ε to its closest point in X is a smooth map of X^ε to X, and it fixes each point in X ⊂ X^ε.

Let γ0 and γ1 be loops on X satisfying

‖γ0 − γ1‖_{C0(S1,RN)} < ε.

Then consider the homotopy in RN

G̃(s, t) = (1 − s)γ0(t) + sγ1(t) ∈ RN.

For s, t ∈ [0, 1], the distance in RN satisfies

|G̃(s, t) − γ0(t)| = s|γ0(t) − γ1(t)| < 1 · ε.

So G̃(s, t) ∈ X^ε for all s, t ∈ [0, 1], and we may define a homotopy in X by G(s, t) = π(G̃(s, t)).
Remark. The homotopy G(s, t) constructed is a smooth homotopy if γ0 and
γ1 are smooth. Thus the same theorem works with smooth homotopy classes
(as considered in Guillemin and Pollack).
Corollary 57. If γi are a sequence of loops in a free homotopy class in X ⊂ RN, and

lim_{i→∞} ‖γi − γ‖_{C0(S1,RN)} = 0,

then the loop γ is in the same free homotopy class.
Proof. For the ε > 0 of Proposition 56 above, there is a γi so that

‖γi − γ‖_{C0(S1,RN)} < ε.

Apply Proposition 56 to show γ and γi are in the same free homotopy class.
The ε-Neighborhood Theorem, together with the mollifier technique of approximation, allows us to prove an important foundational result in topology:
Theorem 20. Let f : Rn → Y be uniformly continuous, where Y ⊂ RN is
a compact submanifold without boundary. Then f is homotopic to a smooth
map from Rn → Y .
Proof. Since f is uniformly continuous, for all ε > 0, there is a δ > 0 so that if |x − x′| < δ, then |f(x) − f(x′)| < ε. The ε-Neighborhood Theorem shows that there is an ε > 0 so that the map π : Y^ε → Y is well-defined and smooth. Let δ be the corresponding δ from the uniform continuity of f.
Let ρ be a smooth nonnegative bump function with support in the unit ball B_1(0) in Rn so that ∫_{Rn} ρ dx_n = 1. Then for α > 0, define ρ_α(x) = α^{−n} ρ(x/α). Note supp ρ_α = B_α(0). Define

f^α(x) = ∫_{Rn} f(y) ρ_α(x − y) dy_n = ∫_{{y:|x−y|≤α}} f(y) ρ_α(x − y) dy_n.

(Note each f^α is RN-valued.) If α < δ, then |f(y) − f(x)| < ε for y in the domain of integration, and so

f^α(x) = ∫_{{y:|x−y|≤α}} f(y) ρ_α(x − y) dy_n
  = ∫_{{y:|x−y|≤α}} [f(y) − f(x)] ρ_α(x − y) dy_n + ∫_{{y:|x−y|≤α}} f(x) ρ_α(x − y) dy_n
  = ∫_{{y:|x−y|≤α}} [f(y) − f(x)] ρ_α(x − y) dy_n + f(x)

since

∫_{{y:|x−y|≤α}} ρ_α(x − y) dy_n = ∫_{Rn} ρ_α(x − y) dy_n = ∫_{Rn} ρ_α(z) dz_n = 1

for the substitution z = x − y. So

|f^α(x) − f(x)| = |∫_{{y:|x−y|≤α}} [f(y) − f(x)] ρ_α(x − y) dy_n|
  ≤ ∫_{{y:|x−y|≤α}} |f(y) − f(x)| ρ_α(x − y) dy_n
  < ε ∫_{{y:|x−y|≤α}} ρ_α(x − y) dy_n = ε.   (32)

Therefore if α ∈ (0, δ), then f^α(x) ∈ Y^ε. Then we check that f̃^α(x) = π(f^α(x)) is the desired homotopy. In particular, as α → 0, f̃^α(x) → f(x) uniformly by (32) (view ε as varying to zero instead of fixed for this interpretation). Since π and f^α are smooth, f̃^α is smooth for small α > 0. In particular, we have shown that

F(α, x) = f̃^α(x) for α > 0 small, and F(0, x) = f(x),

is the desired homotopy.
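The uniform estimate (32) is easy to watch numerically. This sketch (my illustration; the bump ρ and the function f are arbitrary choices, and the convolution is done by quadrature rather than exactly) mollifies a Lipschitz function and checks that the smoothing stays uniformly within the modulus of continuity over a window of width α.

```python
import numpy as np

# Mollify f(x) = |sin(3x)| (3-Lipschitz, not smooth) by a bump of width alpha
# and verify sup |f_alpha - f| <= sup{|f(y)-f(x)| : |y-x| <= alpha} <= 3*alpha.
def rho(u):
    # standard bump supported in [-1, 1]; normalized by Z below
    u2 = np.minimum(u * u, 1.0 - 1e-12)
    return np.where(np.abs(u) < 1.0, np.exp(-1.0 / (1.0 - u2)), 0.0)

grid = np.linspace(-1.0, 1.0, 4001)
Z = np.trapz(rho(grid), grid)          # normalizing constant: int rho = Z

def f(x):
    return np.abs(np.sin(3.0 * x))

def f_alpha(x, alpha):
    y = np.linspace(x - alpha, x + alpha, 2001)
    kernel = rho((x - y) / alpha) / (Z * alpha)   # rho_alpha(x - y), unit mass
    return np.trapz(f(y) * kernel, y)

alpha = 0.01
err = max(abs(f_alpha(x, alpha) - f(x)) for x in np.linspace(-2.0, 2.0, 101))
print(err)    # stays below 3 * alpha = 0.03, up to a little quadrature error
```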
Theorem 21. Let f : X → Y be a continuous map between smooth manifolds. Then f is homotopic to a smooth map from X → Y .
Sketch of proof. We may assume X ⊂ RM by Whitney's Embedding Theorem. Then there is a ν > 0 so that π_M : X^ν → X is well-defined and smooth. Define g : RM → RN by g(p) = f(π_M(p)) for p ∈ X^ν and g(p) = 0 for p ∉ X^ν. Note g is uniformly continuous on a neighborhood of X. Apply the mollifier argument as above to g and show that the homotopy constructed in the proof of Theorem 20, when restricted to X ⊂ RM, has the desired properties.
The discussion above about energy and length still holds. Assuming the
minimizer is smooth enough, then a constant-speed length-minimizing loop
is the same as an energy-minimizing loop. Thus we may as well consider
energy-minimizing loops, and we have the equivalent problem.
Problem: Find a curve of least energy in a free homotopy class of loops on
a compact Riemannian manifold.
So far in our discussion, the formulation of length and energy depends on the loop γ being C1 (so that the derivative γ̇ is C0 and thus can be integrated). If we look more closely, the energy is defined as the square of the L2 norm of γ̇:

E(γ) = ∫_0^1 |γ̇|²_g dt.
Therefore, we really do not need γ̇ to be continuous, but only L2 . In terms of
γ itself, we need to develop a theory of how to take a derivative which ends
up not being continuous, but only L2 . For this purpose, we define derivatives
in the sense of distributions, or weak derivatives.
4.4
Distributions
On Rn, we consider each smooth function φ with compact support to be a test function. For any C1 function f on Rn and test function φ, we have the following formula by integrating by parts:

∫_{Rn} f_{,i} φ dx_n = −∫_{Rn} f φ_{,i} dx_n.   (33)

For two locally L1 functions f and h on Rn, we say f_{,i} = h in the sense of distributions if for all test functions φ,

∫_{Rn} h φ dx_n = −∫_{Rn} f φ_{,i} dx_n.
Let D(Rn ) be the vector space of all smooth functions with compact support in Rn . For our purposes, we will define a distribution on Rn to be a linear
map from D(Rn ) → R. We often allow C-valued test functions and consider
complex linear maps to C; complex-valued functions are useful when doing
Fourier analysis. (The usual definition of a distribution is more involved:
one must define a topology on D(Rn ) and then consider distributions to be
only continuous linear maps to C. For our purposes, the simpler definition
suffices. See Section 4.9 below for a more standard treatment of distributions
on the circle S1.) Recall a measurable function f is locally L1 if over every compact subset K of the domain of f, ∫_K |f| < ∞. Any locally L1 function f on Rn gives a distribution by sending

f : φ ↦ f(φ) = ∫_{Rn} f φ dx_n.
Notice that there is a slight abuse of notation: f (φ) for φ a test function is
not to be confused with f(x) for x ∈ Rn. Two locally L1 functions f1, f2 are said to be equal in the sense of distributions if for every test function φ,

∫_{Rn} f1 φ dx_n = ∫_{Rn} f2 φ dx_n  ⟺  ∫_{Rn} (f1 − f2) φ dx_n = 0.
Remark. On Rn, note that any locally Lp function for p ≥ 1 is also locally L1. This is because for K ⊂⊂ Rn, 1/p + 1/q = 1, and f locally Lp, Hölder's inequality states

∫_K |f| dx_n ≤ (∫_K 1 dx_n)^{1/q} (∫_K |f|^p dx_n)^{1/p} < ∞.
Example 18. Any locally finite Borel measure dµ on Rn defines a distribution by sending

φ ↦ ∫_{Rn} φ dµ

for any test function φ.
An important example of this is the inaptly named δ-function, or unit point mass, at the origin. The δ-function is a measure on Rn so that for any subset Ω ⊂ Rn,

δ(Ω) = 1 if 0 ∈ Ω,  δ(Ω) = 0 if 0 ∉ Ω.

So the distribution defined by this measure is

δ : φ ↦ φ(0),
which is just evaluation of φ at the origin. The following problem shows there
is no locally L1 function which is equal to the δ-function.
Homework Problem 45. Show that there is no L1 function f on Rn so that

∫_{Rn} f φ dx_n = φ(0) for all φ ∈ D(Rn).

Hint: Consider a smooth nonnegative function ρ : Rn → R with support in B_1(0), the unit ball centered at 0, and so that ∫_{Rn} ρ dx_n = 1. Use this ρ to define ρ_ε(x) = ε^{−n} ρ(x/ε). If there were such an L1 function f, recall that if

f_ε(x) = ∫_{Rn} f(y) ρ_ε(x − y) dy_n,

then f_ε → f in L1 as ε → 0.

(a) Show that for all x ≠ 0, f_ε(x) = 0 for ε small enough. (Follow the proof of Proposition 58.)

(b) Suppose a family of continuous functions f_ε → f in L1(Rn) as ε → 0+, and let O ⊂ Rn be a measurable subset on which f_ε = 0 identically for all ε sufficiently small. Show that f = 0 almost everywhere on O. (Split up the relevant integrals on Rn into integrals on O and Rn \ O.)

(c) Show our f = 0 almost everywhere on Rn.
(d) Find a contradiction.
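A numerical picture of the hint (my illustration, not a solution): the normalized bumps ρ_ε act more and more like evaluation at 0, which is exactly the δ-distribution behavior that no fixed integrable function can reproduce.

```python
import numpy as np

# The family rho_eps has unit mass concentrated in (-eps, eps), so pairing it
# with a smooth phi approaches phi(0) = 1 as eps shrinks.
def rho_eps(x, eps):
    out = np.zeros_like(x)
    m = np.abs(x) < eps
    out[m] = np.exp(-1.0 / (1.0 - (x[m] / eps) ** 2))
    return out

x = np.linspace(-1.0, 1.0, 200001)
phi = np.cos(5.0 * x)                  # phi(0) = 1

vals = []
for eps in [0.5, 0.1, 0.02]:
    r = rho_eps(x, eps)
    r /= np.trapz(r, x)                # normalize to unit mass
    vals.append(float(np.trapz(r * phi, x)))
print(vals)                            # approaches phi(0) = 1 as eps -> 0
```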
We have just seen that distributions are more general than functions.
In particular, it is possible to differentiate any distribution by mimicking
formula (33). A distributional derivative of a function may no longer be a
function, but it will be well-defined as a distribution. Given a distribution f
defined by a map f : φ 7→ f (φ) ∈ R, the partial derivative f,i in the sense of
distributions is defined to be the distribution
f,i : φ 7→ −f (φ,i ).
It is this innovation which allows us to define the derivatives of L2 functions.
Remark. Note that the equation (33) motivating the distributional derivative is essentially the same as the integration by parts used to calculate the Euler-Lagrange equations for γ + εh. Thus if h is smooth with compact support, we can still integrate by parts even if γ is no longer regular enough for ordinary differentiation; we simply consider the derivatives to be taken in the sense of distributions.
Homework Problem 46. Consider the Heaviside function

h(x) = 1 if x ≥ 0,  h(x) = 0 if x < 0.

Show that the derivative h′ (taken in the sense of distributions) is the δ-function on R.
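Before attacking the problem symbolically, one can test the defining identity h′(φ) = −∫ h φ̇ dx by quadrature (my illustration; the Gaussian φ is an arbitrary test function, smooth and decaying fast enough to stand in for compact support).

```python
import numpy as np

# Pair the distributional derivative of the Heaviside function h with a test
# function phi: by definition, h'(phi) = -int h(x) phi'(x) dx.
x = np.linspace(-10.0, 10.0, 400001)
phi = np.exp(-x * x)                        # test function, phi(0) = 1
phi_prime = -2.0 * x * np.exp(-x * x)
h = (x >= 0).astype(float)

lhs = -np.trapz(h * phi_prime, x)           # -int_0^infty phi'(x) dx
print(lhs)                                  # equals phi(0) = 1: delta-like action
```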
Homework Problem 47. Consider for any test function φ ∈ D(R),

PV(1/x)(φ) = lim_{ε→0+} [ ∫_{−∞}^{−ε} (1/x) φ(x) dx + ∫_{ε}^{∞} (1/x) φ(x) dx ].

Part (a) shows that PV(1/x) is a distribution. It is called the principal value of 1/x.

(a) Show PV(1/x)(φ) converges for all smooth test functions φ. (Hint: The potential problem is clearly at x = 0. Use Taylor's Theorem to write φ = φ(0) + O(x), where O(x) represents a term so that O(x)/x converges to a real limit as x → 0.)

(b) Show that the first derivative in the sense of distributions of PV(1/x) is given in terms of φ ∈ D(R) as

lim_{ε→0+} [ ∫_{−∞}^{−ε} (−1/x²) φ(x) dx + ∫_{ε}^{∞} (−1/x²) φ(x) dx + (2/ε) φ(0) ].
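A quadrature sketch of part (a) (my illustration; the test function and the cutoff R = 10 are arbitrary choices): the two one-sided integrals each blow up like log ε, but their sum settles to a finite limit because 1/x is odd.

```python
import numpy as np

# Approximate PV(1/x)(phi) by cutting out (-eps, eps) and truncating at |x| = R.
def pv(phi, eps, R=10.0, n=200000):
    xr = np.linspace(eps, R, n)
    xl = np.linspace(-R, -eps, n)
    return float(np.trapz(phi(xl) / xl, xl) + np.trapz(phi(xr) / xr, xr))

phi = lambda x: np.exp(-(x - 1.0) ** 2)     # smooth, fast-decaying, not even
vals = [pv(phi, eps) for eps in [1e-1, 1e-2, 1e-3]]
print(vals)                                 # the values settle down to a limit
```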
One more thing is needed to complete the picture of distributions as
generalizations of functions. Recall that every locally Lp function for p ≥ 1
defines a distribution. The following proposition shows this map is injective.
Proposition 58. If two locally L1 functions f1 and f2 on Rn define the same
distribution, then f1 = f2 almost everywhere.
Proof. We first consider the case when f1 and f2 are both globally L1 on Rn. Then recall that we can use a mollifier to approximate each in L1 by smooth functions. In particular, if ρ is a smooth nonnegative function with compact support so that ∫_{Rn} ρ dx_n = 1, then define

ρ_ε(x) = ε^{−n} ρ(x/ε),  f_i^ε(x) = ∫_{Rn} ρ_ε(x − y) f_i(y) dy_n,  i = 1, 2.

Then each f_i^ε is a smooth L1 function on Rn and f_i^ε → f_i in L1 as ε → 0. Now for each fixed x ∈ Rn, ρ_ε(x − y) is a smooth test function with compact support in y, and f_i^ε(x) is simply the evaluation of this test function by the distribution f_i. Since f1 = f2 in the sense of distributions, then f_1^ε(x) = f_2^ε(x) for all x ∈ Rn. So then

‖f1 − f2‖_{L1} = lim_{ε→0} ‖f_1^ε − f_2^ε‖_{L1} = lim_{ε→0} 0 = 0.

Then f1 = f2 in L1, which is equivalent to f1 = f2 almost everywhere.
If f1 and f2 are only locally L1 , consider a smooth function βR with
compact support which is identically equal to 1 on BR = {|x| ≤ R}. It is
easy to check that the condition f1 = f2 in the sense of distributions implies
βR f1 = βR f2 in the sense of distributions. Then since each fi is locally L1 ,
each βR fi is globally in L1 . We apply the argument of the previous paragraph;
so βR f1 = βR f2 almost everywhere on Rn . This implies that f1 = f2 almost
everywhere on the ball BR . Now let R → ∞ to conclude that f1 = f2 almost
everywhere on Rn .
So far, we have discussed distributions on Rn . On the circle S1 , the
definitions are similar, the main difference being that since S1 is compact,
our test functions are simply all smooth functions on S1 . In particular, we
can think of test functions on S1 as smooth periodic functions on R with
period 1. In this way, an L1 function f on S1 acts on test functions by
f : φ ↦ ∫_0^1 f φ dt.
One thing to check is that integration by parts still works. If f is C1 on S1 and φ is smooth on S1, then

∫_{S1} f˙ φ dt = ∫_0^1 f˙ φ dt
  = −∫_0^1 f φ̇ dt + (f φ)|_0^1
  = −∫_0^1 f φ̇ dt + f(1)φ(1) − f(0)φ(0)
  = −∫_{S1} f φ̇ dt

because f(0) = f(1) and φ(0) = φ(1) since f and φ are periodic. So we
have the same basic formula as in (33), and we may define distributions and
distributional derivatives in the same manner as above.
Now we return to our problem. We want to consider all loops γ : S1 → X ⊂ RN so that

E(γ) = ∫_0^1 |γ̇|²_g dt = ‖γ̇‖²_{L2(S1,RN)} < ∞.

Therefore, we consider the Sobolev space

L21(S1, RN) = {γ : S1 → RN : ‖γ‖²_{L21} = ‖γ‖²_{L2} + ‖γ̇‖²_{L2} < ∞},

where the derivative γ̇ is taken in the sense of distributions. Note that γ ∈ L21(S1, RN) means precisely that γ̇, when defined in the sense of distributions, may be represented as a function (and an L2 function at that).
We may consider each component γ^1, . . . , γ^N separately, and it should be clear that γ_i → γ in L21(S1, RN) if and only if each component γ_i^a → γ^a in L21(S1, R) for each a = 1, . . . , N. Thus we may work with each component of γ separately
in RN . Below we will see that L21 is a Hilbert space, but for now we are
content to show that every function in L21 (S1 ) is continuous. Recall that
elements of L21 (S1 ) are only equivalence classes of functions, two functions
being equivalent if they agree almost everywhere.
Proposition 59. Every element of L21 (R) contains a continuous representative.
Remark. This proposition is an important example of the Sobolev embedding theorem, which gives a means to embed Sobolev spaces L^p_k(Rn) into appropriate C^ℓ spaces, ℓ = ℓ(p, k, n). In particular, the present result depends strongly on the fact that the dimension of the domain R of the functions is one. (There are elements of L21(R2) which do not have continuous representatives.)
Proof. Let f ∈ L21(R). So ∫_R |f˙|² dt = C² < ∞. Then compute for t2 ≥ t1

|f(t2) − f(t1)| = |∫_{t1}^{t2} f˙(t) dt|
  ≤ (∫_{t1}^{t2} |f˙(t)|² dt)^{1/2} (∫_{t1}^{t2} dt)^{1/2}
  ≤ C(t2 − t1)^{1/2}.

So this formula shows f is continuous, as long as we can justify using the Fundamental Theorem of Calculus

f(t2) − f(t1) = ∫_{t1}^{t2} f˙(t) dt.

We achieve this by defining g(t) = ∫_0^t f˙(s) ds. The previous argument implies that g is continuous. Now we argue that there is a constant K so that f − g = K almost everywhere. This will show there is a continuous representative g + K in the equivalence class of f.
First we show that ġ = f˙ in the sense of distributions. Consider a test function φ. Then

ġ(φ) = −∫_{−∞}^{∞} g(t) φ̇(t) dt
  = −∫_{−∞}^{∞} (∫_0^t f˙(s) ds) φ̇(t) dt
  = −∫∫_{R1} f˙(s) φ̇(t) ds dt + ∫∫_{R2} f˙(s) φ̇(t) ds dt

by Fubini's Theorem, for the regions in the plane

R1 = {(s, t) : s ≥ 0, t ≥ s},  R2 = −R1.

Then again by Fubini, and since φ has compact support,

ġ(φ) = −∫_0^∞ (∫_s^∞ φ̇(t) dt) f˙(s) ds + ∫_{−∞}^0 (∫_{−∞}^s φ̇(t) dt) f˙(s) ds
  = −∫_0^∞ (−φ(s)) f˙(s) ds + ∫_{−∞}^0 φ(s) f˙(s) ds
  = ∫_{−∞}^∞ φ(s) f˙(s) ds
  = f˙(φ).
Therefore, ġ = f˙ in the sense of distributions.
The following proposition, applied to f −g, shows that there is a constant
K so that f = g + K in the sense of distributions. Then Proposition 58
above shows f = g + K almost everywhere, and thus there is a continuous
representative in the equivalence class of f .
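The Hölder-1/2 estimate at the heart of this proof can be spot-checked numerically (my illustration; f(t) = sin t on [0, 2π] is an arbitrary smooth choice): for randomly sampled pairs t1 ≤ t2 we test |f(t2) − f(t1)| ≤ ‖f˙‖_{L2} (t2 − t1)^{1/2}.

```python
import numpy as np

# Check |f(t2) - f(t1)| <= C * sqrt(t2 - t1) with C = ||f'||_{L^2}, for f = sin.
t = np.linspace(0.0, 2.0 * np.pi, 100001)
f = np.sin(t)
fdot = np.cos(t)
C = float(np.sqrt(np.trapz(fdot ** 2, t)))   # here C = sqrt(pi)

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    i, j = sorted(rng.integers(0, len(t), size=2))
    if i == j:
        continue
    ok = ok and abs(f[j] - f[i]) <= C * np.sqrt(t[j] - t[i]) + 1e-9
print(ok)    # the Cauchy-Schwarz bound holds for every sampled pair
```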
Proposition 60. If a distribution h on R satisfies ḣ = 0 in the sense of
distributions, then there is a constant K so that h = K as distributions.
Proof. Let φ be a test function with integral ∫_R φ dt = 1. Let K = h(φ). Then for a test function ψ with ∫_R ψ dt = L, compute

h(ψ) = h(ψ − Lφ) + L h(φ) = h(ψ − Lφ) + LK.

But now

∫_{−∞}^{∞} (ψ − Lφ) dt = L − L · 1 = 0,

and thus the function

χ(t) = ∫_{−∞}^t [ψ(s) − Lφ(s)] ds   (34)

is a smooth function with compact support. (Proof: Let supp(ψ − Lφ) ⊂ [T, T′]. It is clear that χ(t) = 0 for t < T. For t > T′, note that χ′(t) = ψ(t) − Lφ(t) = 0 and so χ is constant on (T′, ∞). Then (34) shows that χ(t) → 0 as t → ∞, and so χ = 0 on (T′, ∞).)

Then since χ̇ = ψ − Lφ,

h(ψ) = LK + h(ψ − Lφ) = LK + h(χ̇) = LK − ḣ(χ) = LK

since ḣ = 0 in the sense of distributions. But then

h(ψ) = LK = K ∫_R ψ dt = ∫_R Kψ dt,

and h = K as distributions.
Homework Problem 48. Prove Propositions 59 and 60 above for distributions on S1 instead of on R. Here are the key steps:
(a) Let f : S1 → R be an L2 function, and assume that the distributional
derivative f˙ is L2 as well. Represent f and f˙ as periodic functions
from R → R. For any t ∈ R, define

g(t) = ∫_0^t f˙(s) ds.
Show that g is periodic and continuous (and so defines a continuous
function on S1 .) Note that the constant function 1 is a test function
on S1 .
(b) Show that f˙ = ġ in the sense of distributions. In other words, for every smooth periodic test function φ ∈ D(S1), show that

∫_0^1 f˙ φ dt = −∫_0^1 g φ̇ dt.
(c) If h is a distribution on S1 which satisfies ḣ = 0 in the sense of distributions, show there is a constant K so that h = K as distributions. In
other words, show that for every periodic smooth ψ : R → R,

h(ψ) = ∫_0^1 Kψ dt.
Now since any L21 map from S1 → X ⊂ RN is continuous, each one is in
a free homotopy class of loops on X. With that in mind, we formulate our
final version of the problem:
For X ⊂ RN a smooth submanifold with Riemannian metric pulled back
from the Euclidean metric on RN , define
L21 (S1 , X) = {γ ∈ L21 (S1 , RN ) : γ(S1 ) ⊂ X}.
Here we assume that γ is continuous, as we may by Proposition 59 above.
Problem: Let X ⊂ RN be a smooth compact manifold equipped with the
Riemannian metric pulled back from the Euclidean metric on RN . Let C be
the class of loops γ : S1 → X in a free homotopy class on X and in L21 (S1 , X).
Find a loop of least energy in C.
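As a toy version of this problem (entirely my illustration, not from the notes), take X = S1 ⊂ R2, where free homotopy classes are labeled by winding numbers. Projected gradient descent on a discretized energy, n Σ|γ_{i+1} − γ_i|², starting from an uneven winding-number-1 loop, flows toward the constant-speed parametrization of the circle, whose energy is (2π)²: the closed geodesic in that class. This is the direct method in miniature: minimize within the class, then inspect the limit.

```python
import numpy as np

# Discrete energy of a closed polygon with n vertices: E = n * sum |p_{i+1}-p_i|^2
# (a Riemann sum of int |gamma'|^2 dt with step 1/n).
n = 100
theta0 = 2 * np.pi * np.arange(n) / n
theta = theta0 + 0.3 * np.sin(3 * theta0)      # uneven winding-1 starting loop
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def energy(p):
    d = np.roll(p, -1, axis=0) - p
    return float(n * np.sum(d * d))

step = 1e-4
for _ in range(30000):
    grad = 2 * n * (2 * pts - np.roll(pts, 1, axis=0) - np.roll(pts, -1, axis=0))
    pts = pts - step * grad                              # gradient step on E
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)    # project back to S^1

seg = np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)
print(energy(pts))              # near (2*pi)^2, the geodesic energy
print(seg.std() / seg.mean())   # speeds equalize: nearly constant-speed loop
```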
Proposition 61. Let γ ∈ L21(S1, X) be energy minimizing in a free homotopy class on X, for X ⊂ RN a smooth manifold without boundary. Then γ solves (a version of) the geodesic equation

2(g_{ik} γ̇^i)˙ − g_{ij,k} γ̇^i γ̇^j = 0

for all k = 1, . . . , n, in the sense of distributions.
Proof. First of all, note that we can choose γ to be continuous by Problem 48 above. Thus it makes sense that γ is in a free homotopy class. Since γ minimizes energy, then for each h smooth with compact support so that γ(supp h) ⊂⊂ a single coordinate chart in X, we have

(d/dε) E(γ + εh)|_{ε=0} = 0.
Compute the first variation as in the derivation of the Euler-Lagrange equations in Subsection 4.2 above:

∫_{S1} g_{ij,k} h^k γ̇^i γ̇^j dt + ∫_{S1} g_{ij} ḣ^i γ̇^j dt + ∫_{S1} g_{ij} γ̇^i ḣ^j dt = 0.
Since the components of h are smooth with compact support, they act as
test functions, and we may then integrate by parts in the second and third
integrals, in the sense of distributions, to conclude that
0 = (g_{kj} γ̇^j)˙ + (g_{ik} γ̇^i)˙ − g_{ij,k} γ̇^i γ̇^j = 2(g_{ik} γ̇^i)˙ − g_{ij,k} γ̇^i γ̇^j
in the sense of distributions.
Remark. In the previous proposition, we cannot immediately perform the
usual rules of calculus, since the objects involved are only distributions. In
particular, we show in the next homework problem that functions which are
only continuous cannot be meaningfully multiplied by distributions which
are not Borel measures.
Homework Problem 49. Note that if λ : Rn → R is a smooth function, and
f is a locally L1 function, then the product λf is also a locally L1 function.
(a) If λ : Rn → R is a smooth function, and p is a distribution on Rn , then
show that it is possible to define the product λp in such a way that if p
is induced from a locally L1 function, then λp is induced from the usual
product of two functions.
(b) Let δ be the δ-function on R. Compute its first derivative δ̇ in the sense
of distributions.
(c) Show that if g : R → R is a continuous function which is not differentiable at 0, then the formula for the product developed in part (a) above
does not give a reliable answer for the product g δ̇ of the continuous
function g and the distribution δ̇.
4.5
Hilbert spaces
Recall that a Hilbert space is a Banach space whose norm comes from a
positive definite inner product. We now show that L21 (S1 , R) is a Hilbert
space. Recall that L21(S1, R) consists of all L2 functions on S1 whose derivative in the sense of distributions is also L2. This suggests a natural inner product:

⟨f, h⟩_{L21} = ∫_{S1} f h dt + ∫_{S1} f˙ ḣ dt.

Then plug in f = h to find

‖f‖²_{L21} = ∫_{S1} |f|² dt + ∫_{S1} |f˙|² dt = ⟨f, f⟩_{L21},
and so the norm on L21 is induced by the inner product. Below in Corollary
67, we show that any positive definite inner product defines a norm.
Remark. L21(S1, RN) is also naturally a Hilbert space, with inner product given by

⟨f, h⟩_{L21} = ∫_{S1} ⟨f, h⟩ dt + ∫_{S1} ⟨f˙, ḣ⟩ dt,

where ⟨·, ·⟩ is the inner product on RN.
It is also useful to define complex Hilbert spaces, in which the inner product ⟨·, ·⟩ is Hermitian and positive definite. A Hermitian inner product on a complex vector space V is a map from V × V → C which satisfies, for λ ∈ C and f, g, h ∈ V,

⟨λf + g, h⟩ = λ⟨f, h⟩ + ⟨g, h⟩,
⟨f, λg + h⟩ = λ̄⟨f, g⟩ + ⟨f, h⟩,
⟨f, g⟩ = \overline{⟨g, f⟩}.

These three conditions are respectively that the inner product is complex linear in the first slot, complex antilinear in the second slot, and conjugate-symmetric. The first two conditions together are called sesquilinear.
Then L21(S1, C) is a complex Hilbert space with inner product

⟨f, g⟩ = ∫_{S1} f ḡ dt + ∫_{S1} f˙ \overline{ġ} dt.
We can also define the Sobolev space L21(Rn, R) by the inner product

⟨f, g⟩ = ∫_{Rn} f g dx_n + Σ_{i=1}^n ∫_{Rn} f_{,i} g_{,i} dx_n,
the derivatives taken in the sense of distributions. The elements of L21 (Rn , R)
are then equivalence classes of functions in L2 so that all the first partials in
the sense of distributions are also in L2 .
We will work with L21 (S1 , R) instead of L21 (S1 , RN ), since convergence
in L21 (S1 , RN ) is equivalent to each component converging in L21 (S1 , R). The
proofs that follow will work with minor modifications for the spaces L21 (S1 , RN )
and L21 (S1 , C).
We focus on L21 (S1 , R), which we refer to simply as L21 .
Proposition 62. L21 (S1 , R) is a Hilbert space.
Proof. We’ve exhibited an inner product on L21 , and it is easy to check that
it is positive definite (if we consider elements to be equivalence classes of
functions, two functions being equivalent if they agree almost everywhere).
Thus the remaining thing to check is that the metric L21 (S1 , R) is complete
(and so it is a Banach space).
First of all, note that fn → f in L21 is equivalent to fn → f in L2 and f˙n → f˙ in L2.
Let fn be a Cauchy sequence in L21 . Then by the definition of the norm,
it is clear that fn and f˙n are both Cauchy sequences in L2 . Then we have
limits fn → f and f˙n → g in L2 . In order to show that fn → f in L21 , it
suffices to show that f˙ = g in the sense of distributions.
Let φ be a test function, and note that fn → f in L2 implies by Hölder's inequality that

|fn(φ) − f(φ)| = |∫_{S1} (fn − f) φ dt| ≤ ‖fn − f‖_{L2} ‖φ‖_{L2} → 0

as n → ∞. We use this fact for both fn → f and f˙n → g to compute for a test function φ

g(φ) = ∫_{S1} g φ dt = lim_{n→∞} ∫_{S1} f˙n φ dt
  = −lim_{n→∞} ∫_{S1} fn φ̇ dt
  = −∫_{S1} f φ̇ dt
  = −f(φ̇) = f˙(φ).
Therefore, g = f˙ in the sense of distributions.
Remark. Essentially the same proof shows that L21 (Rn , Rm ) is a Hilbert space.
For a real Hilbert space H, an orthonormal basis is a collection of elements {e_α}_{α∈A} which are orthonormal in that

⟨e_α, e_β⟩ = δ_{αβ}

and so that every element v ∈ H can be written as

v = Σ_{α∈A} v^α e_α

for v^α ∈ R. Here A is an index set, which may be finite, countably infinite, or
uncountable (and of course the convergence of any infinite sum is controlled
by the norm). A Hilbert space which has a countable (finite or infinite)
orthonormal basis is called separable. The following is true:
Proposition 63. Every Hilbert space has an orthonormal basis. In fact,
every orthonormal set in a Hilbert space can be completed to an orthonormal
basis.
We omit the proof, which is similar to the proof of the corresponding
fact for vector spaces (any linearly independent set can be completed to a
basis). In particular, Zorn’s Lemma is needed in the case of non-separable
Hilbert spaces. But see Problem 54 below for a proof of this Proposition
for separable Hilbert spaces, and for a discussion of how this special case is
adequate for the proofs of the results in this section.
Theorem 22 (Pythagorean Theorem). If v, w ∈ H a Hilbert space, and ⟨v, w⟩ = 0, then

‖v‖² + ‖w‖² = ‖v + w‖².

Proof. Compute

‖v + w‖² = ⟨v + w, v + w⟩ = ⟨v, v⟩ + 2⟨v, w⟩ + ⟨w, w⟩ = ‖v‖² + ‖w‖².
Lemma 64 (Bessel’s Inequality). If {e1 , . . . , en } is a finite orthonormal
set in H, then for all y ∈ H,
2
kyk ≥
n
X
|hy, ei i|2 .
i=1
123
Pn
Proof. Check that for w =
i=1 hy, ei iei , hy − w, wi = 0. Then apply
the Pythagorian Theorem to y = (y − w) + w, and note that kwk2 =
P
n
2
i=1 |hy, ei i| .
Corollary 65. If {e_i} is a countable orthonormal set, then

‖y‖² ≥ Σ_{i=1}^∞ |⟨y, e_i⟩|².
Proof. Use Bessel’s Inequality and take limits of partial sums.
Theorem 23. Given a Hilbert space H with an orthonormal basis {e_α}_{α∈A}, for every element v ∈ H,

v = Σ_{α∈A} ⟨v, e_α⟩ e_α,   (35)

‖v‖²_H = ⟨v, v⟩ = Σ_{α∈A} |⟨v, e_α⟩|²,   (36)

where the (possibly uncountable) sums are defined by using Homework Problem 50 below. Moreover, if there are v^α ∈ R so that Σ_{α∈A} |v^α|² < ∞, then v = Σ_{α∈A} v^α e_α converges to an element of H.
Remark. For each v ∈ H, only a countable number of the coefficients v^α = ⟨v, e_α⟩ are nonzero. This is due to the following fact:
Homework Problem 50. Let A be an uncountable set, and for each α ∈ A, let x_α ≥ 0.

(a) If A′ ⊂ A is a finite set, let S_{A′} = Σ_{α∈A′} x_α. Show that if the set

{S_{A′} : A′ ⊂ A is a finite set}

is bounded, then x_α = 0 for all but countably many α ∈ A.

(b) Use part (a) to define

Σ_{α∈A} x_α = sup{S_{A′} : A′ ⊂ A is a finite set}

as an element of [0, ∞] for any x_α ≥ 0. In particular, if the sum is finite, show that

Σ_{α∈A} x_α = Σ_{α∈Ã} x_α,

where Ã = {α ∈ A : x_α > 0} is countable. Show that if Ã is infinite, the right-hand sum is the usual sum of a convergent countably infinite series (for any bijection between Ã and the natural numbers).

Hint for (a): Each x_α > 0 satisfies x_α ∈ [2^n, 2^{n+1}) for some n ∈ Z. Derive a contradiction if the number of positive x_α is uncountable.
Remark. Note that

Σ_{α∈A} x_α = ∫_A x dc

for dc the counting measure on A. If A′ ⊂ A, then the counting measure c(A′) = |A′|, the cardinality of A′ (and so c(A′) = +∞ when A′ is infinite).
Proof of Theorem 23. First assume that v^α ∈ R and Σ_{α∈A} |v^α|² < ∞. Then Homework Problem 50 above shows that all but countably many of the v^α are zero, and so we may write v as a countable sum Σ_{i=1}^∞ v^i e_i. Let v_n = Σ_{i=1}^n v^i e_i. Then for n > m,

‖v_n − v_m‖² = ‖Σ_{i=m+1}^n v^i e_i‖² = Σ_{i=m+1}^n |v^i|² ≤ Σ_{i=m+1}^∞ |v^i|².

Here, the second equality is by the Pythagorean Theorem. Since the series Σ_{i=1}^∞ |v^i|² converges, the tail Σ_{i=m+1}^∞ |v^i|² of the series must go to zero as m → ∞, and thus {v_n} is a Cauchy sequence in H. Since H is complete, v_n converges to the limit v ∈ H.

Now let v ∈ H and v^α = ⟨v, e_α⟩. Then Bessel's Inequality shows that for all finite subsets A′ ⊂ A,

Σ_{α∈A′} |v^α|² ≤ ‖v‖².

So for the collection {|v^α|²}_{α∈A}, the set of finite partial sums is bounded. So Homework Problem 50 shows that all but countably many v^α = 0. Denumerate the countable number of nonzero terms as v^1, v^2, . . . , and the corresponding elements of the orthonormal basis as e_1, e_2, . . . .

Since the sequence Σ_{i=1}^N |v^i|² is bounded and increasing in N, it has a finite limit as N → ∞. We have shown above that the series Σ_{i=1}^∞ v^i e_i converges to a limit v′ ∈ H. Compute

⟨v − v′, e_i⟩ = lim_{n→∞} ⟨v − Σ_{j=1}^n v^j e_j, e_i⟩ = v^i − v^j δ_j^i = 0.

And for any e_α ∉ {e_1, e_2, . . . }, compute

⟨v − v′, e_α⟩ = lim_{n→∞} ⟨v − Σ_{j=1}^n v^j e_j, e_α⟩ = 0.

So for all e_α in the orthonormal basis,

⟨v, e_α⟩ = ⟨v′, e_α⟩ = ⟨Σ_{i=1}^∞ v^i e_i, e_α⟩ = v^α.

Now the definition of orthonormal basis shows that there are ṽ^α ∈ R so that Σ_{α∈A} ṽ^α e_α = v. By the same analysis as above, all but countably many ṽ^α are zero, and we may write v = Σ_{i=1}^∞ ṽ^i e_i. Moreover, as in the previous paragraph,

v^α = ⟨v, e_α⟩ = ⟨Σ_{i=1}^∞ ṽ^i e_i, e_α⟩ = ṽ^α,

and so (35) is proved.

To prove (36), note that (35) shows that v = lim_{n→∞} v_n in H, for v_n = Σ_{i=1}^n ⟨v, e_i⟩ e_i. Since the norm is continuous,

‖v‖² = lim_{n→∞} ‖v_n‖² = lim_{n→∞} Σ_{i=1}^n |⟨v, e_i⟩|² = Σ_{i=1}^∞ |⟨v, e_i⟩|² = Σ_{α∈A} |⟨v, e_α⟩|².

This concludes the proof of the theorem.
Corollary 66. If $v = \sum_{i=1}^\infty v^i e_i$ and $w = \sum_{i=1}^\infty w^i e_i$ for $\{e_i\}$ an orthonormal basis of a separable Hilbert space, then
$$\langle v, w \rangle = \sum_{i=1}^\infty v^i w^i.$$

Proof. Compute
$$\|v + w\|^2 = \|v\|^2 + 2\langle v, w \rangle + \|w\|^2,$$
so that
$$\langle v, w \rangle = \tfrac12 \left[ \|v + w\|^2 - \|v\|^2 - \|w\|^2 \right] = \tfrac12 \sum_{i=1}^\infty \left[ (v^i + w^i)^2 - (v^i)^2 - (w^i)^2 \right] = \sum_{i=1}^\infty v^i w^i.$$
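The polarization identity used in this proof is easy to sanity-check numerically. The following is a finite-dimensional sketch with NumPy (the dimension and random seed are arbitrary choices, not part of the corollary):

```python
import numpy as np

# Numerical check of the polarization identity from the proof of
# Corollary 66: <v, w> = (1/2)[||v+w||^2 - ||v||^2 - ||w||^2].
rng = np.random.default_rng(0)
v = rng.standard_normal(8)
w = rng.standard_normal(8)

lhs = v @ w
rhs = 0.5 * (np.linalg.norm(v + w)**2
             - np.linalg.norm(v)**2
             - np.linalg.norm(w)**2)
print(abs(lhs - rhs))  # agrees up to floating-point rounding
```

The corollary applies this identity coordinate-by-coordinate in a separable Hilbert space; the finite-dimensional computation is the same algebra.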
Remark. The corresponding formula for a complex Hilbert space is
$$\langle v, w \rangle = \sum_{i=1}^\infty v^i \overline{w^i}.$$

Remark. Homework Problem 50 shows that this result still holds for nonseparable Hilbert spaces, since the number of basis elements with nonzero coefficients for $v$ and/or $w$ is countable.
Here is another basic result in Hilbert spaces:

Homework Problem 51 (Cauchy-Schwarz Inequality). If $v, w \in H$, a real Hilbert space, then $|\langle v, w \rangle| \le \|v\| \|w\|$, and there is equality if and only if $v$ and $w$ are linearly dependent.
Hint: Use calculus to compute the minimum value of $\|tv + w\|^2$ as a function of $t$, and note the minimum value must be nonnegative.

Remark. The Cauchy-Schwarz Inequality is also true for complex Hilbert spaces, but for the proof, note that the minimum value of $\|te^{i\theta} v + w\|^2$, for $t \in \mathbb{R}$ and $\theta$ chosen so that $e^{i\theta} \langle v, w \rangle = |\langle v, w \rangle|$, is nonnegative.
Corollary 67. Any positive definite inner product on a real vector space $V$ produces a norm by the formula $\|v\|^2 = \langle v, v \rangle$.

Proof. The main thing to check is the triangle inequality. Let $v, w \in V$ and note that
$$\|v + w\| \le \|v\| + \|w\| \iff \|v + w\|^2 \le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \iff \|v\|^2 + 2\langle v, w \rangle + \|w\|^2 \le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \iff \langle v, w \rangle \le \|v\| \|w\|,$$
and the last inequality holds by Cauchy-Schwarz.
The main results we will use regarding Hilbert spaces involve another topology on the Hilbert space, different from the topology defined by the metric. The usual metric convergence of sequences is called strong convergence. So a sequence $v_i \to v$ in $H$ strongly if
$$\|v_i - v\|_H \to 0,$$
and we write $v_i \to v$. On the other hand, a sequence $v_i \in H$ is weakly convergent to a limit $v \in H$ if
$$\langle v_i, w \rangle \to \langle v, w \rangle \quad \text{for all } w \in H,$$
and we write $v_i \rightharpoonup v$. If $v_i \to v$, then $v_i \rightharpoonup v$ (Homework Problem 52 below), but the converse is not true in general, as the following example shows:
Example 19. Let $H$ be a Hilbert space with a countably infinite orthonormal basis $e_1, e_2, \ldots$. Then $e_i \rightharpoonup 0$ in $H$, but $\{e_i\}$ does not converge strongly.

Proof. Let $w \in H$. Then since $\|w\|^2 = \sum_{i=1}^\infty |\langle w, e_i \rangle|^2 < \infty$, we must have $|\langle w, e_i \rangle|^2 \to 0$ as $i \to \infty$. This shows $e_i \to 0$ weakly in $H$ as $i \to \infty$.

To show $\{e_i\}$ does not converge strongly, note that
$$\|e_i - e_j\|_H = \sqrt 2 \quad \text{for } i \ne j$$
by the Pythagorean Theorem. Thus $\{e_i\}$ cannot be a Cauchy sequence in $H$, and thus cannot converge strongly.
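This example can be illustrated numerically by truncating to $\mathbb{R}^n$. The code below is only a finite-dimensional sketch (the choice of $w$ and the dimension are illustrative assumptions), but it exhibits both phenomena: pairings with a fixed square-summable $w$ shrink, while mutual distances stay $\sqrt 2$.

```python
import numpy as np

# Finite-dimensional illustration: standard basis vectors e_1, ..., e_n
# pair with a fixed w so that <w, e_i> tails off, while
# ||e_i - e_j|| = sqrt(2) for i != j, so no subsequence is Cauchy.
n = 1000
w = np.array([1.0 / (i + 1) for i in range(n)])  # square-summable coefficients

def e(i, n=n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

# pairings with w decay (weak convergence of the basis vectors to 0):
pairings = [w @ e(i) for i in range(n)]
print(pairings[0], pairings[999])

# mutual distances stay sqrt(2) (no strong convergence):
print(np.linalg.norm(e(3) - e(7)))  # sqrt(2) ≈ 1.41421...
```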
Homework Problem 52. Show that if $v_i \to v$ converges strongly in a Hilbert space $H$, then $v_i \rightharpoonup v$ weakly in $H$.
Hint: Use Cauchy-Schwarz.
Theorem 24. Let $\{v_i\}$ be a sequence in a Hilbert space $H$ satisfying $\|v_i\| \le K$ for a uniform constant $K$. Then there is a subsequence converging weakly to a limit $v$ which satisfies $\|v\| \le K$. In other words, the closed ball of radius $K$ is (sequentially) compact in the weak topology on $H$.

Proof. Let $\{e_\alpha\}_{\alpha \in A}$ be an orthonormal basis. Problem 50 shows that for each of $v_1, v_2, \ldots$, only a countable subset $A_{v_1}, A_{v_2}, \cdots \subset A$ of basis elements have nonzero coefficients in the orthonormal decomposition. Then the union
$$\bigcup_{i=1}^\infty A_{v_i}$$
is also countable, and it contains all the basis elements with nonvanishing coefficients for all the $v_i$. Denumerate these elements as $e_1, e_2, \ldots$, and write
$$v_i = \sum_{j=1}^\infty v_i^j e_j.$$
Since there is a constant $K$ so that $\|v_i\| \le K$, Theorem 23 shows for each $N$
$$\sum_{j=1}^N |v_i^j|^2 \le K^2. \tag{37}$$
Thus, since the interval $[-K, K] \subset \mathbb{R}$ is compact, there is a subsequence $\{{}_1v_i\}$ of $\{v_i\}$ so that
$$\lim_{i \to \infty} {}_1v_i^1 = v^1 \in [-K, K].$$
Now there is a subsequence $\{{}_2v_i\}$ of $\{{}_1v_i\}$ so that
$$\lim_{i \to \infty} {}_2v_i^1 = v^1, \qquad \lim_{i \to \infty} {}_2v_i^2 = v^2, \qquad |v^1|^2 + |v^2|^2 \le K^2.$$
This is because ${}_1v_i^2 \in [-K, K]$, which is compact, and the bound follows from (37). Recursively, we may define for each $N$ a subsequence $\{{}_Nv_i\}$ and a real number $v^N$ so that $\{{}_Nv_i\}$ is a subsequence of $\{{}_{(N-1)}v_i\}$,
$$\lim_{i \to \infty} {}_Nv_i^j = v^j \quad \text{for } j = 1, \ldots, N, \tag{38}$$
$$|v^1|^2 + \cdots + |v^N|^2 \le K^2. \tag{39}$$
We use a diagonalization procedure to find a weakly convergent subsequence. $\{{}_iv_i\}$ is a subsequence of $\{v_i\}$, and we will show that it converges weakly to $v = \sum_{j=1}^\infty v^j e_j$ and that $v \in H$. Note by construction that ${}_iv_i^j \to v^j$ as $i \to \infty$ for each $j = 1, 2, \ldots$: for each $j$, $\{{}_iv_i\}_{i=j}^\infty$ is a subsequence of $\{{}_jv_i\}_{i=1}^\infty$, which converges to $v^j$ by condition (38).
That $v \in H$ follows directly from (39) and Theorem 23. Now we show ${}_iv_i \rightharpoonup v$ weakly in $H$. Let $w \in H$, and let $\epsilon > 0$. Write
$$|\langle {}_iv_i, w \rangle - \langle v, w \rangle| = |\langle {}_iv_i - v, w \rangle| \le \sum_{\alpha \in A} |({}_iv_i^\alpha - v^\alpha) w^\alpha| = \sum_{j=1}^\infty |({}_iv_i^j - v^j) w^j| = \sum_{j=1}^n |({}_iv_i^j - v^j) w^j| + \sum_{j=n+1}^\infty |({}_iv_i^j - v^j) w^j|$$
for all $n$. Here the third expression equals the second since ${}_iv_i^\alpha = v^\alpha = 0$ if $e_\alpha \notin \{e_1, e_2, \ldots\}$.
Since $\|{}_iv_i\| \le K$ and $\|v\| \le K$, then $\|{}_iv_i - v\| \le 2K$, and Cauchy-Schwarz shows that
$$\sum_{j=n+1}^\infty |({}_iv_i^j - v^j) w^j| \le \Big(\sum_{j=n+1}^\infty |{}_iv_i^j - v^j|^2\Big)^{\frac12} \Big(\sum_{j=n+1}^\infty |w^j|^2\Big)^{\frac12} \le \Big(\sum_{j=1}^\infty |{}_iv_i^j - v^j|^2\Big)^{\frac12} \Big(\sum_{j=n+1}^\infty |w^j|^2\Big)^{\frac12} \le 2K \Big(\sum_{j=n+1}^\infty |w^j|^2\Big)^{\frac12}.$$
Since $w \in H$, the sum $\sum_{j=1}^\infty |w^j|^2 \le \sum_{\alpha \in A} |w^\alpha|^2 = \|w\|^2$ converges, and there is an $n$ so that
$$\Big(\sum_{j=n+1}^\infty |w^j|^2\Big)^{\frac12} < \epsilon.$$
Now for $j = 1, 2, \ldots, n$, each ${}_iv_i^j \to v^j$ as $i \to \infty$. So we may choose an $I$ so that for all $i \ge I$, $|{}_iv_i^j - v^j| < \epsilon'$, where $\epsilon'$ satisfies $(|w^1| + \cdots + |w^n|)\epsilon' < \epsilon$. Therefore, for $i \ge I$,
$$|\langle {}_iv_i, w \rangle - \langle v, w \rangle| \le \sum_{j=1}^n |({}_iv_i^j - v^j) w^j| + \sum_{j=n+1}^\infty |({}_iv_i^j - v^j) w^j| \le \epsilon'(|w^1| + \cdots + |w^n|) + 2K\epsilon < \epsilon + 2K\epsilon.$$
Since $K$ is independent of $i$ and $\epsilon$ is arbitrary, $\langle {}_iv_i, w \rangle \to \langle v, w \rangle$ as $i \to \infty$, and thus ${}_iv_i \rightharpoonup v$ weakly in $H$.
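The nested refinement in this proof can be simulated on a finite truncation. The sketch below is an illustration under stated assumptions, not the proof itself: the bounded family $v(i, j)$ and all sizes are arbitrary choices. It implements Bolzano-Weierstrass by bisecting $[-K, K]$ (with "contains infinitely many terms" replaced by "contains the larger half" in this finite truncation) and refines the index set coordinate by coordinate, mirroring the construction of $\{{}_Nv_i\}$.

```python
K = 1.0

def v(i, j):
    # a bounded doubly indexed family, oscillating in i so that
    # refinement is genuinely needed
    return ((-1) ** i * K) / (j + 1)

indices = list(range(1, 513))   # finite truncation of the sequence index i
J = 4                           # number of coordinates to refine
for j in range(J):
    lo, hi = -K, K
    for _ in range(8):          # bisect [-K, K] eight times
        mid = (lo + hi) / 2
        lower = [i for i in indices if v(i, j) <= mid]
        upper = [i for i in indices if v(i, j) > mid]
        # keep the half holding the larger tail of the sequence
        if len(lower) >= len(upper):
            indices, hi = lower, mid
        else:
            indices, lo = upper, mid

# along the refined index set, each treated coordinate barely varies
spreads = [max(v(i, j) for i in indices) - min(v(i, j) for i in indices)
           for j in range(J)]
print(len(indices), spreads)
```

In the actual proof the diagonal sequence $\{{}_iv_i\}$ replaces this finite refinement, since infinitely many indices survive every stage.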
Theorem 25. Let $v_i \rightharpoonup v$ in a Hilbert space. Then
$$\|v\| \le \liminf_{i \to \infty} \|v_i\|.$$
In other words, the Hilbert space norm is lower semicontinuous under weak convergence.
Proof. The strategy is to translate the problem into Fatou's Lemma. Let $\{e_\alpha\}_{\alpha \in A}$ be an orthonormal basis of our Hilbert space $H$. Then put the counting measure $c$ on the index set $A$. Let $f \colon A \to [0, \infty)$, $f \colon \alpha \mapsto f_\alpha$, be a nonnegative real-valued function on $A$. Then it is straightforward to check that
$$\int_A f \, dc = \sum_{\alpha \in A} f_\alpha,$$
and thus each sum may be thought of as an integral with respect to the counting measure.

In our case, if
$$v_i = \sum_{\alpha \in A} v_i^\alpha e_\alpha, \qquad v = \sum_{\alpha \in A} v^\alpha e_\alpha,$$
we may view $v_i$ as a function from $A \to \mathbb{R}$ by $v_i \colon \alpha \mapsto v_i^\alpha$. (The same holds for $v$.) Theorem 23 shows
$$\|v_i\|^2 = \sum_{\alpha \in A} |v_i^\alpha|^2, \qquad \|v\|^2 = \sum_{\alpha \in A} |v^\alpha|^2. \tag{40}$$
Now since $v_i \rightharpoonup v$,
$$v_i^\alpha = \langle v_i, e_\alpha \rangle \to \langle v, e_\alpha \rangle = v^\alpha$$
as $i \to \infty$ for all $\alpha$. Thus with respect to the counting measure on $A$, $v_i \to v$ everywhere on $A$. Each term in the sums in (40) is nonnegative, and for each $\alpha$, $|v_i^\alpha|^2 \to |v^\alpha|^2$, so the limit $v_i \to v$ satisfies the hypotheses of Fatou's Lemma with respect to the counting measure, and so
$$\liminf_{i \to \infty} \|v_i\|^2 = \liminf_{i \to \infty} \sum_{\alpha \in A} |v_i^\alpha|^2 = \liminf_{i \to \infty} \int_A |v_i|^2 \, dc \ge \int_A |v|^2 \, dc = \sum_{\alpha \in A} |v^\alpha|^2 = \|v\|^2.$$
Note that the above proofs depend heavily on the existence of an orthonormal basis, Proposition 63, which we did not prove. The following problem outlines a standard procedure for getting around the proof of Proposition 63, by proving the existence of an orthonormal basis for any Hilbert space with a countable spanning set. A subset $S$ of a Hilbert space $H$ is said to be a spanning set if the (strong) closure of finite linear combinations of elements in $S$ is equal to all of $H$. For example, in the proof of Theorem 24, we need only deal with the closure $H'$ of the span of $\{v_1, v_2, \ldots\}$. The existence of an orthonormal basis of $H'$ is sufficient for the proof of Theorem 24.
Homework Problem 53. Show that any strongly closed linear subspace of
a Hilbert space H is again a Hilbert space (with the same inner product).
We say a subset $\{v_\alpha\}_{\alpha \in A} \subset H$ is linearly independent (in the sense of Banach spaces) if any convergent sum
$$\sum_{\alpha \in A} b^\alpha v_\alpha = 0$$
implies $b^\alpha = 0$ for all $\alpha \in A$. Note in particular that the implication holds for any finite sum (and thus linear independence in this Banach-space sense implies linear independence in the usual vector-space sense).
Homework Problem 54 (Gram-Schmidt Orthogonalization).

(a) Let $H$ be a Hilbert space with a spanning set $\{v_1, v_2, \ldots\}$ which is finite or countably infinite. Show that there is a subset of $\{v_1, v_2, \ldots\}$ which is a linearly independent spanning set of $H$.

(b) Given a linearly independent spanning set $\{v_1, v_2, \ldots\}$ of a Hilbert space $H$, define $f_i$ and $e_i$ recursively by
$$f_1 = v_1, \quad e_1 = \frac{f_1}{\|f_1\|}; \qquad f_2 = v_2 - \langle v_2, e_1 \rangle e_1, \quad e_2 = \frac{f_2}{\|f_2\|}; \qquad f_n = v_n - \sum_{i=1}^{n-1} \langle v_n, e_i \rangle e_i, \quad e_n = \frac{f_n}{\|f_n\|}.$$
Show that this recursive definition can be carried out (in particular, show that $f_n \ne 0$). Then show that $\{e_1, e_2, \ldots\}$ is an orthonormal basis for $H$. In other words, show that $\langle e_i, e_j \rangle = \delta_{ij}$ and that any $v$ in $H$ can be written as a convergent sum $v = \sum_{i=1}^\infty v^i e_i$.
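The recursion in part (b) is straightforward to implement. Here is a sketch in $\mathbb{R}^4$ with NumPy, standing in for the Hilbert space $H$ (the dimension and the random test vectors are arbitrary choices):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list, following
    f_n = v_n - sum_{i<n} <v_n, e_i> e_i,  e_n = f_n / ||f_n||."""
    es = []
    for vn in vectors:
        fn = vn - sum((vn @ ei) * ei for ei in es)
        norm = np.linalg.norm(fn)
        if norm == 0:
            raise ValueError("input vectors are linearly dependent")
        es.append(fn / norm)
    return es

rng = np.random.default_rng(1)
vs = [rng.standard_normal(4) for _ in range(4)]
es = gram_schmidt(vs)

# check <e_i, e_j> = delta_ij
G = np.array([[ei @ ej for ej in es] for ei in es])
print(np.allclose(G, np.eye(4)))
```

The `fn == 0` branch is exactly the point of the problem: it can only trigger if the inputs are linearly dependent.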
The use of the previous problem isn't strictly necessary for our purposes, as $L^2_1(S^1, \mathbb{R})$ is separable (though we won't prove that it is).

Recall that for every Banach space $B$, the dual Banach space $B^*$ is the space of all continuous linear functionals $\lambda \colon B \to \mathbb{R}$, with norm given by
$$\|\lambda\|_{B^*} = \sup_{x \in B \setminus \{0\}} \frac{|\lambda(x)|}{\|x\|_B}.$$
Also recall that for any $p \in (1, \infty)$, the dual Banach space of $L^p(\mathbb{R}^n)$ is $L^q(\mathbb{R}^n)$ for $p^{-1} + q^{-1} = 1$. Thus $L^2(\mathbb{R}^n)$ is dual to itself. This fact is true for all Hilbert spaces, as the following problem shows in the separable case.
Homework Problem 55. Let $H$ be a separable real Hilbert space. Show that the dual Banach space $H^*$ is naturally equal to $H$. In particular, the inner product provides a map from $H \to H^*$ by
$$x \mapsto \lambda_x = \langle \cdot, x \rangle.$$
Show that this map preserves the norm and is one-to-one and onto.
Hint: The most significant step is showing the map is onto. First reduce to the case $\lambda \ne 0$. Show that $L = \lambda^{-1}(0)$ is also a separable Hilbert space, and let $\{e_i\}$ be an orthonormal basis for $L$. Let $y \notin L$ and use a version of Gram-Schmidt to show we may assume $y \perp L$. Construct $x$ from $y$ and $\lambda$.
A sequence $v_i$ in a Banach space $B$ converges to $v \in B$ in the weak* topology if for every $\lambda \in B^*$, $\lambda(v_i) \to \lambda(v)$. The previous problem shows that Theorem 24 is a special case of the following more general theorem about Banach spaces:

Theorem 26 (Banach-Alaoglu). In a Banach space $B$, the unit ball $\{x \in B : \|x\|_B \le 1\}$ is compact in the weak* topology. In other words, if $x_i$ is a sequence in the unit ball, then there is a subsequence $x_{i_j}$ and a limit $x \in B$ so that for all $\lambda \in B^*$, $\lambda(x_{i_j}) \to \lambda(x)$ as $j \to \infty$.
Example 20 (Fourier series). In the following theorem, we compute perhaps the easiest nontrivial example of an orthonormal basis on an infinite-dimensional Hilbert space. $L^2(S^1, \mathbb{C})$ is a complex Hilbert space with inner product given by
$$\langle f, g \rangle = \int_{S^1} f \bar g \, dt.$$
Theorem 27. The complex exponential functions
$$\{e^{2\pi i k t} : k \in \mathbb{Z}\}$$
form an orthonormal basis of $L^2(S^1, \mathbb{C})$.

Proof. It is clear that each $e^{2\pi i k t} \in L^2(S^1, \mathbb{C})$, and we compute
$$\langle e^{2\pi i k t}, e^{2\pi i \ell t} \rangle = \int_{S^1} e^{2\pi i k t} \overline{e^{2\pi i \ell t}} \, dt = \int_0^1 e^{2\pi i (k - \ell) t} \, dt = \begin{cases} \left. \dfrac{e^{2\pi i (k - \ell) t}}{2\pi i (k - \ell)} \right|_0^1 = 0 & \text{if } k \ne \ell, \\[2ex] \displaystyle\int_0^1 dt = 1 & \text{if } k = \ell. \end{cases}$$
Therefore, $\{e^{2\pi i k t}\}_{k=-\infty}^\infty$ forms an orthonormal set in $L^2(S^1, \mathbb{C})$.
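The orthonormality computation above can be checked numerically. A left-endpoint Riemann sum with $N$ points evaluates $\int_0^1 e^{2\pi i (k - \ell) t}\, dt$ exactly (up to rounding) whenever $0 < |k - \ell| < N$, since the sample values sum a full set of roots of unity; the values of $N$, $k$, $\ell$ below are arbitrary choices.

```python
import numpy as np

# discrete check of <e^{2 pi i k t}, e^{2 pi i l t}> = delta_{kl}
N = 64
t = np.arange(N) / N

def inner(k, l):
    return np.mean(np.exp(2j * np.pi * k * t)
                   * np.conj(np.exp(2j * np.pi * l * t)))

print(abs(inner(3, 3)))   # ≈ 1
print(abs(inner(3, 5)))   # ≈ 0
```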
We must show that every element $f \in L^2(S^1, \mathbb{C})$ can be written as a Fourier series
$$f = \sum_{k=-\infty}^\infty \langle f, e^{2\pi i k t} \rangle e^{2\pi i k t},$$
with the convergence in the $L^2$ sense.

First, we address this problem for smooth functions $f \in C^\infty(S^1, \mathbb{C})$. Recall that $C^\infty(S^1, \mathbb{C})$ is dense in $L^2(S^1, \mathbb{C})$ (which may be proved by mollifying $L^2$ functions).
Lemma 68. If $f \in C^\infty(S^1, \mathbb{C})$, then for every polynomial $P = P(k)$,
$$\lim_{k \to \infty} P(k) \langle f, e^{2\pi i k t} \rangle = \lim_{k \to -\infty} P(k) \langle f, e^{2\pi i k t} \rangle = 0.$$

Proof. We use the following claim: for any $L^2$ function $f$, the Fourier coefficients $\langle f, e^{2\pi i k t} \rangle \to 0$ as $k \to \pm\infty$. This follows from Bessel's Inequality
$$\sum_{k=-\infty}^\infty |\langle f, e^{2\pi i k t} \rangle|^2 \le \|f\|_{L^2}^2 < \infty.$$
If $f$ is smooth, then $\dot f$ is also smooth (and thus is in $L^2$), and integration by parts gives us
$$\langle \dot f, e^{2\pi i k t} \rangle = \int_0^1 \dot f \, e^{-2\pi i k t} \, dt = -\int_0^1 f \, \frac{d}{dt}\big(e^{-2\pi i k t}\big) \, dt + \left. f(t) e^{-2\pi i k t} \right|_0^1 = 2\pi i k \langle f, e^{2\pi i k t} \rangle + 0.$$
Now by the claim, $\langle \dot f, e^{2\pi i k t} \rangle = 2\pi i k \langle f, e^{2\pi i k t} \rangle \to 0$ as $k \to \pm\infty$. Now we may apply induction to show that
$$\lim_{k \to \pm\infty} k^n \langle f, e^{2\pi i k t} \rangle = 0 \quad \text{for each } n = 0, 1, 2, \ldots.$$
Thus any polynomial $P(k)$ times the Fourier coefficients also goes to zero as $k \to \pm\infty$.
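The conclusion of Lemma 68 is visible numerically: for a smooth periodic function, the coefficients decay faster than any fixed power of $k$. Below we approximate the coefficients by a discrete Fourier transform; the test function $e^{\cos 2\pi t}$ and the grid size are illustrative choices.

```python
import numpy as np

# approximate <f, e^{2 pi i k t}> by a DFT for the smooth 1-periodic
# function f(t) = exp(cos(2 pi t))
N = 256
t = np.arange(N) / N
f = np.exp(np.cos(2 * np.pi * t))
c = np.fft.fft(f) / N   # c[k] ~ k-th Fourier coefficient for small k

# even after multiplying by k^3, the coefficients still shrink:
print(abs(c[2]) * 2**3, abs(c[8]) * 8**3)
```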
The previous lemma shows that for any smooth function $f \in C^\infty(S^1, \mathbb{C})$, the Fourier series
$$g(t) = \sum_{k=-\infty}^\infty \langle f, e^{2\pi i k t} \rangle e^{2\pi i k t}$$
converges uniformly: this is because there is a constant $C > 0$ so that
$$|\langle f, e^{2\pi i k t} \rangle| \le \frac{C}{1 + k^2}$$
(why?), which shows that the $C^0$ norm of the Fourier series satisfies
$$\sum_{k=-\infty}^\infty \left\| \langle f, e^{2\pi i k t} \rangle e^{2\pi i k t} \right\|_{C^0} \le \sum_{k=-\infty}^\infty \frac{C}{1 + k^2} < \infty.$$
So the sup norm of the tails of the series
$$\sum_{k=-\infty}^\infty \langle f, e^{2\pi i k t} \rangle e^{2\pi i k t}$$
must go to zero, as they are bounded by the tails of an absolutely convergent series.
Therefore, uniform convergence implies that $g(t)$ is continuous (and thus is in $L^2$ as well; why?). (In fact, $g(t)$ is smooth; see Homework Problem 58 below.) If we let
$$h(t) = f(t) - g(t) = f(t) - \sum_{k=-\infty}^\infty \langle f, e^{2\pi i k t} \rangle e^{2\pi i k t},$$
then by the same techniques as in the proof of Theorem 23 above, we see that
$$\langle h, e^{2\pi i k t} \rangle = 0 \quad \text{for all } k \in \mathbb{Z}.$$
The following lemma shows that $h = 0$:

Lemma 69. Given a function $h \in C^0(S^1, \mathbb{C})$ all of whose Fourier coefficients $\langle h, e^{2\pi i k t} \rangle = 0$, then $h = 0$ identically.

Proof. We prove by contradiction. If $h$ is not identically zero, then there is a point $\tau \in S^1$ at which $h(\tau) \ne 0$. Then we know that at least one of the following is true:
$$\mathrm{Re}\, h(\tau) > 0, \quad \mathrm{Re}\, h(\tau) < 0, \quad \mathrm{Im}\, h(\tau) > 0, \quad \mathrm{Im}\, h(\tau) < 0.$$
Assume that $\mathrm{Re}\, h(\tau) > 0$ (the other cases are similar), and let $\alpha(t) = \mathrm{Re}\, h(t)$. Since $\alpha$ is continuous, there is a $\delta > 0$ so that
$$\alpha(t) > \tfrac12 \alpha(\tau) > 0 \quad \text{if } t \in (\tau - \delta, \tau + \delta).$$
We will construct an approximate bump function to produce a contradiction. For $n$ a positive integer, define
$$b_n(t) = \left[ \tfrac12 + \tfrac12 \cos 2\pi(t - \tau) \right]^n = \left[ \tfrac12 + \tfrac14 e^{-2\pi i \tau} e^{2\pi i t} + \tfrac14 e^{2\pi i \tau} e^{-2\pi i t} \right]^n.$$
It is obvious that $b_n(t)$ is real-valued, periodic with period 1 (and so defines a function on $S^1$), and is equal to a finite Fourier series. Moreover, note that
$$\tfrac12 + \tfrac12 \cos 2\pi(t - \tau) \in [0, 1]$$
always, and is equal to 1 only if $t = \tau$ in $S^1$. Thus the powers $b_n(t) \to 0$ as $n \to \infty$ away from $t = \tau$, while $b_n(\tau) = 1$ for every $n$. This is the property that makes $b_n$ similar to bump functions centered at $t = \tau$.
Now compute
$$|\mathrm{Re}\, \langle h, b_n \rangle| = \left| \mathrm{Re} \int_{S^1} h(t) b_n(t) \, dt \right| = \left| \int_{S^1} \alpha(t) b_n(t) \, dt \right| = \left| \int_{\tau-\delta}^{\tau+\delta} \alpha(t) b_n(t) \, dt + \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \alpha(t) b_n(t) \, dt \right|$$
$$\ge \int_{\tau-\delta}^{\tau+\delta} \alpha(t) b_n(t) \, dt - \left| \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \alpha(t) b_n(t) \, dt \right| > \int_{\tau-\delta/2}^{\tau+\delta/2} \alpha(t) b_n(t) \, dt - \left| \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \alpha(t) b_n(t) \, dt \right|.$$
(Note the last inequality follows since the integrand is positive on $(\tau - \delta, \tau + \delta)$.) Also, we have the following bounds:
$$t \in [\tau - \tfrac\delta2, \tau + \tfrac\delta2] \implies \alpha(t) > \tfrac12 \alpha(\tau) > 0, \quad b_n(t) \ge \left(\tfrac12 + \tfrac12 \cos \pi\delta\right)^n,$$
$$t \notin [\tau - \delta, \tau + \delta] \implies |\alpha(t)| < C, \quad b_n(t) \le \left(\tfrac12 + \tfrac12 \cos 2\pi\delta\right)^n,$$
for some constant $C$ (since $\alpha$ is continuous). The bounds on $b_n$ follow by examining the graph of the cosine function. The key point is that
$$\tfrac12 + \tfrac12 \cos \pi\delta > \tfrac12 + \tfrac12 \cos 2\pi\delta > 0. \tag{41}$$
Now compute
$$|\mathrm{Re}\, \langle h, b_n \rangle| > \int_{\tau-\delta/2}^{\tau+\delta/2} \alpha(t) b_n(t) \, dt - \left| \int_{S^1 \setminus [\tau-\delta, \tau+\delta]} \alpha(t) b_n(t) \, dt \right| \ge \delta \cdot \tfrac12 \alpha(\tau) \left( \tfrac12 + \tfrac12 \cos \pi\delta \right)^n - (1 - 2\delta) C \left( \tfrac12 + \tfrac12 \cos 2\pi\delta \right)^n.$$
Now (41) shows the ratio of the first term over the second goes to $+\infty$ as $n \to \infty$, and thus there is an $n$ so that $|\mathrm{Re}\, \langle h, b_n \rangle| > 0$.

Now the contradiction is this: since $b_n$ is a finite Fourier series, $\langle h, b_n \rangle$ is a finite linear combination of Fourier coefficients $\langle h, e^{2\pi i k t} \rangle$, which we assume are all zero. Thus $\langle h, b_n \rangle = 0$, and we have a contradiction.
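The approximate bump functions $b_n$ are easy to probe numerically. The sketch below (with arbitrary choices of $\tau$ and $n$) confirms that $b_n(\tau) = 1$ for every $n$, while $b_n$ collapses geometrically away from $\tau$:

```python
import numpy as np

# b_n(t) = (1/2 + 1/2 cos(2 pi (t - tau)))^n from the proof of Lemma 69
tau = 0.25

def b(n, t):
    return (0.5 + 0.5 * np.cos(2 * np.pi * (t - tau))) ** n

print(b(50, tau))        # 1.0 for every n
print(b(50, tau + 0.3))  # vanishingly small
```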
Since $h$ is the difference between the smooth $f$ and its Fourier series, we have shown

Lemma 70. Let $f \in C^\infty(S^1, \mathbb{C})$. Then
$$f(t) = \sum_{k=-\infty}^\infty \langle f, e^{2\pi i k t} \rangle e^{2\pi i k t},$$
and the series converges uniformly in $t$.

Uniform convergence on $S^1$ implies $L^2$ convergence (since $S^1$ has finite measure; why does this imply $L^2$ convergence?). Therefore, as in Theorem 23, we have
$$\|f\|_{L^2}^2 = \int_{S^1} |f|^2 \, dt = \sum_{k=-\infty}^\infty |\langle f, e^{2\pi i k t} \rangle|^2$$
for $f \in C^\infty(S^1, \mathbb{C})$.
To complete the proof of Theorem 27, first define the Hilbert space $\ell^2 = L^2(\mathbb{Z}, \mathbb{C})$ for the counting measure on $\mathbb{Z}$. In other words, $\ell^2$ is the set of all complex-valued integer-indexed sequences $\{v^k\}_{k \in \mathbb{Z}}$ so that $\sum_{k=-\infty}^\infty |v^k|^2 < \infty$. Then we have the operation $\mathcal{F}$ of taking Fourier coefficients:
$$\mathcal{F} \colon L^2(S^1, \mathbb{C}) \to \ell^2, \qquad \mathcal{F} \colon f \mapsto \hat f^k = \langle f, e^{2\pi i k t} \rangle.$$
Moreover, on the dense subset $C^\infty(S^1, \mathbb{C}) \subset L^2(S^1, \mathbb{C})$, $\mathcal{F}$ is an isometry. Bessel's Inequality and the fact that $\{e^{2\pi i k t}\}$ is an orthonormal set in $L^2(S^1, \mathbb{C})$ show that for all $f \in L^2(S^1, \mathbb{C})$,
$$\|f\|_{L^2}^2 \ge \sum_{k=-\infty}^\infty |\langle f, e^{2\pi i k t} \rangle|^2 = \|\mathcal{F}(f)\|_{\ell^2}^2.$$
Therefore $\mathcal{F}$ is a bounded linear map from $L^2(S^1, \mathbb{C})$ to $\ell^2$. A linear map $L$ from a Banach space $B_1$ to another Banach space $B_2$ is called bounded if there is a positive constant $C$ so that for all $v \in B_1$,
$$\|L(v)\|_{B_2} \le C \|v\|_{B_1}.$$
A linear map between Banach spaces is bounded if and only if it is continuous (see Problem 56 below). Therefore, $\mathcal{F}$ is continuous.
Also, define the linear map $\mathcal{G} \colon \ell^2 \to L^2(S^1, \mathbb{C})$ by
$$\mathcal{G}(v) = \sum_{k=-\infty}^\infty v^k e^{2\pi i k t}.$$
The proof of Theorem 23 shows that $\mathcal{G}$ preserves norms. In other words,
$$\|\mathcal{G}(v)\|_{L^2(S^1, \mathbb{C})} = \|v\|_{\ell^2}.$$
Let $f \in L^2(S^1, \mathbb{C})$. Since smooth functions are dense in $L^2$, there is a sequence $f_n \to f$ in $L^2$ with $f_n \in C^\infty(S^1, \mathbb{C})$. Since $\mathcal{F}$ is continuous, $\mathcal{F}(f_n) \to \mathcal{F}(f)$ in $\ell^2$ as $n \to \infty$. In other words,
$$0 = \lim_{n \to \infty} \|\mathcal{F}(f_n) - \mathcal{F}(f)\|_{\ell^2}^2 = \lim_{n \to \infty} \|\hat f_n - \hat f\|_{\ell^2}^2 = \lim_{n \to \infty} \|\mathcal{G}(\hat f_n) - \mathcal{G}(\hat f)\|_{L^2}^2.$$
Now recall that
$$\mathcal{G}(\hat f_n) = \sum_{k=-\infty}^\infty \langle f_n, e^{2\pi i k t} \rangle e^{2\pi i k t} = f_n$$
since $f_n$ is smooth. Therefore,
$$0 = \lim_{n \to \infty} \|\mathcal{G}(\hat f_n) - \mathcal{G}(\hat f)\|_{L^2} = \lim_{n \to \infty} \|f_n - \mathcal{G}(\hat f)\|_{L^2}.$$
So in $L^2$,
$$f_n \to \mathcal{G}(\hat f) = \sum_{k=-\infty}^\infty \hat f^k e^{2\pi i k t}.$$
Since we assumed $f_n \to f$ in $L^2$, this shows
$$f = \sum_{k=-\infty}^\infty \hat f^k e^{2\pi i k t}$$
in $L^2$. Since the sum converges in $L^2$, finite linear combinations of the orthonormal set $\{e^{2\pi i k t}\}$ are dense in $L^2(S^1, \mathbb{C})$, and so $\{e^{2\pi i k t}\}$ is an orthonormal basis of $L^2(S^1, \mathbb{C})$. So Theorem 27 is proved.
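The isometry just proved has an exact discrete analogue, Parseval's identity for the DFT, which makes a convenient numerical check (the test function and grid size below are arbitrary choices):

```python
import numpy as np

# discrete Parseval: the L^2 norm of f equals the l^2 norm of its
# Fourier coefficients; exact for the DFT up to rounding
N = 128
t = np.arange(N) / N
f = np.exp(np.cos(2 * np.pi * t)) + 1j * np.sin(4 * np.pi * t)

c = np.fft.fft(f) / N                # discrete Fourier coefficients
l2_f = np.mean(np.abs(f)**2)         # ~ integral of |f|^2 over S^1
l2_c = np.sum(np.abs(c)**2)          # sum over k of |c_k|^2

print(abs(l2_f - l2_c))  # ~0: the two norms agree
```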
Homework Problem 56. Let L : B1 → B2 be a linear map between Banach
spaces. Show that L is bounded if and only if L is continuous.
Homework Problem 57. Using the notation of the proof of Theorem 27 above, show that $\mathcal{F} \colon L^2(S^1, \mathbb{C}) \to \ell^2$ is an isometry and that $\mathcal{F} \circ \mathcal{G}$ is the identity map.
Homework Problem 58. Let $f^k \in \mathbb{C}$ for all $k \in \mathbb{Z}$, and assume for all $n \ge 0$ that
$$\lim_{k \to \infty} k^n f^k = \lim_{k \to -\infty} k^n f^k = 0.$$
Then the Fourier series
$$f(t) = \sum_{k=-\infty}^\infty f^k e^{2\pi i k t}$$
converges uniformly to a smooth function from $S^1 \to \mathbb{C}$.
Hint: The key is being able to change the order of the derivative $d/dt$ with the summation $\sum_{k=-\infty}^\infty$. Recall that the summation $\sum_{k=-\infty}^\infty$ can be interpreted as an integral over $\mathbb{Z}$ with respect to the counting measure $d\mu$. Thus for all $t \in S^1$,
$$f(t) = \int_{\mathbb{Z}} f^k e^{2\pi i k t} \, d\mu(k).$$
To show that $f(t) \in C^1(S^1, \mathbb{C})$, show that there is a constant $C > 0$ so that
$$|f^k| \le \frac{C}{1 + |k|^3}.$$
Mimic the proof of Proposition 11: show that the absolute value of the difference quotient
$$\frac{f^k e^{2\pi i k (t + h)} - f^k e^{2\pi i k t}}{h}$$
is uniformly $\le \dfrac{C'(|k| + 1)}{1 + |k|^3}$ for a constant $C'$. (Apply the Mean Value Theorem to the real and imaginary parts of $e^{2\pi i k t}$ separately.) Show that the series
$$\sum_{k=-\infty}^\infty \frac{C'(|k| + 1)}{1 + |k|^3}$$
converges by using the procedure in the proof of Lemma 70 above.
Use induction to show $f(t)$ is smooth.
4.6 Compact maps and the Ascoli-Arzelà Theorem

Recall that every element of $L^2_1(S^1) = L^2_1(S^1, \mathbb{R})$ has a continuous representative (Proposition 59). So there is a natural linear map $L^2_1(S^1) \to C^0(S^1)$. In this section, we show that this map is compact. A linear map between Banach spaces $\Lambda \colon B_1 \to B_2$ is called compact if the closure of the image of the unit ball in $B_1$ is strongly compact in $B_2$. In other words, if $v_i \in B_1$ satisfy $\|v_i\|_{B_1} \le 1$, then $\{\Lambda(v_i)\}$ has a strongly convergent subsequence in $B_2$: i.e., there is a subsequence $\{v_{i_j}\}$ and an element $w \in B_2$ so that
$$\lim_{j \to \infty} \|\Lambda(v_{i_j}) - w\|_{B_2} = 0.$$
The basic observation which allows us to conclude that the natural inclusion map $L^2_1(S^1) \to C^0(S^1)$ is compact comes from the proof of Proposition 59. If $f \in L^2_1(S^1)$, then for $t_1 < t_2$,
$$|f(t_2) - f(t_1)| = \left| \int_{t_1}^{t_2} \dot f(t) \, dt \right| \le \left( \int_{t_1}^{t_2} |\dot f(t)|^2 \, dt \right)^{\frac12} \left( \int_{t_1}^{t_2} dt \right)^{\frac12} \le \left( \int_{S^1} |\dot f(t)|^2 \, dt \right)^{\frac12} (t_2 - t_1)^{\frac12} \le \|f\|_{L^2_1} (t_2 - t_1)^{\frac12}.$$
(Note that the first equality was justified in the proof of Proposition 59.) Therefore, $f$ is continuous. But moreover, for every $\epsilon > 0$, we may choose
$$\delta = \left( \frac{\epsilon}{\|f\|_{L^2_1}} \right)^2$$
so that
$$|t_2 - t_1| < \delta \implies |f(t_2) - f(t_1)| < \epsilon.$$
So the modulus of continuity $\delta$ does not depend on $t$, and depends only on the norm $\|f\|_{L^2_1}$, and on no other information about $f$.
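The estimate $|f(t_2) - f(t_1)| \le \|f\|_{L^2_1} |t_2 - t_1|^{1/2}$ can be checked numerically on a sample function. Here $f(t) = \sin 2\pi t$, with the $L^2_1$ norm approximated by Riemann sums; the grid size and sampling stride are arbitrary choices.

```python
import numpy as np

N = 2000
t = np.arange(N) / N
f = np.sin(2 * np.pi * t)
fdot = 2 * np.pi * np.cos(2 * np.pi * t)

# ||f||_{L^2_1}^2 = integral of f^2 + integral of fdot^2
norm_L21 = np.sqrt(np.mean(f**2) + np.mean(fdot**2))

# probe the Hoelder-type quotient on a grid of pairs t_1 < t_2
worst = 0.0
for i in range(0, N, 37):
    for j in range(i + 1, N, 37):
        ratio = abs(f[j] - f[i]) / np.sqrt(t[j] - t[i])
        worst = max(worst, ratio)

print(worst, "<=", norm_L21)  # the bound holds with room to spare
```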
A family $\Omega$ of functions from a metric space $X$ to a metric space $Y$ is called equicontinuous at a point $x \in X$ if for all $\epsilon > 0$, there is a $\delta > 0$ so that
$$d_X(x, x') < \delta \implies d_Y(f(x), f(x')) < \epsilon$$
for all $f \in \Omega$. The point is that $\delta$ does not depend on $f$. Such a family $\Omega$ is called equicontinuous on $X$ if it is equicontinuous at each point $x \in X$.

Note that if $\Omega$ is equicontinuous on $X$ then each $f \in \Omega$ is continuous. The computations above show

Lemma 71. The unit ball in $L^2_1(S^1)$ is equicontinuous on $S^1$.
Theorem 28 (Ascoli-Arzelà). Let $X$ be a compact metric space, and let $\Omega$ be an equicontinuous family of real-valued functions on $X$. Assume there is a uniform $C$ so that $|f(x)| \le C$ for all $f \in \Omega$ and $x \in X$. Then each sequence $\{f_n\} \subset \Omega$ has a uniformly convergent subsequence.

Proof. We'll prove the theorem with the help of a few lemmas.

Lemma 72. Any compact metric space has a countable dense subset.

Proof. Let $X$ be the compact metric space. For $\epsilon = 1/n$, obviously
$$X = \bigcup_{x \in X} B_\epsilon(x), \qquad B_\epsilon(x) = \{y \in X : d_X(x, y) < \epsilon\}.$$
For each positive integer $n$, this open cover of $X$ has a finite subcover consisting of balls of radius $1/n$ centered at points $x_{n,1}, \ldots, x_{n,m_n}$. The union
$$\bigcup_{n=1}^\infty \{x_{n,1}, \ldots, x_{n,m_n}\}$$
is a countable dense subset of $X$.
Lemma 73. Let P be a countable set, and let fn : P → R be a sequence
of functions. Assume there is a constant C so that |fn (p)| ≤ C for all
n = 1, 2, . . . and all p ∈ P. Then there is a subsequence of {fn } which
converges everywhere on P to a function f : P → R.
Proof. See Problem 59 below.
Lemma 74. Let {fn } be an equicontinuous sequence of mappings from a
compact metric space X to R. If the sequence {fn (x)} converges for each x
in a dense subset of X, then {fn } converges uniformly on X to a continuous
limit function.
Proof. First we show that $f_n(x)$ converges pointwise everywhere to a function $f(x)$. Let $y \in X$ and let $\epsilon > 0$. Then by equicontinuity, there is a $\delta > 0$ so that
$$d_X(x, y) < \delta \implies |f_n(x) - f_n(y)| < \epsilon.$$
(Note $\delta$ is independent of $n$.) Since $f_n$ converges on a dense subset of $X$, there is an $x \in B_\delta(y)$ for which $f_n(x)$ converges. Therefore, $\{f_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$, and so there is an $N$ so that
$$n, m \ge N \implies |f_n(x) - f_m(x)| < \epsilon.$$
Therefore, for $n, m \ge N$,
$$|f_n(y) - f_m(y)| \le |f_n(y) - f_n(x)| + |f_n(x) - f_m(x)| + |f_m(x) - f_m(y)| < 3\epsilon.$$
Therefore, $\{f_n(y)\}$ is a Cauchy sequence in the complete metric space $\mathbb{R}$, and so it converges to a limit which we call $f(y)$.

Let $y \in X$ and $\epsilon > 0$. Then equicontinuity shows that there is a $\delta > 0$ so that
$$x \in B_\delta(y) \implies |f_n(x) - f_n(y)| < \epsilon \tag{42}$$
for all $n$. By letting $n \to \infty$, we also have
$$x \in B_\delta(y) \implies |f(x) - f(y)| \le \epsilon. \tag{43}$$
These $B_\delta(y)$ form an open cover of $X$, and so there is a finite subcover
$$X = \bigcup_{i=1}^k B_{\delta_i}(y_i)$$
since $X$ is compact. Choose $N$ large enough so that
$$n \ge N \implies |f_n(y_i) - f(y_i)| < \epsilon, \quad i = 1, \ldots, k. \tag{44}$$
Then for $x \in X$, $x \in B_{\delta_i}(y_i)$ for some $y_i$, and so (42), (43) and (44) show
$$|f_n(x) - f(x)| \le |f_n(x) - f_n(y_i)| + |f_n(y_i) - f(y_i)| + |f(y_i) - f(x)| < 3\epsilon.$$
Since the same $N$ works for all $x \in X$, the convergence is uniform. Finally, $f$, as the uniform limit of continuous functions, is continuous.

This completes the proof of Theorem 28.
Homework Problem 59. Let P be a countable set, and let fn : P → R
be a sequence of functions. Assume that for each p ∈ P, there is a constant
C = Cp so that |fn (p)| ≤ C for all n = 1, 2, . . . . Show there is a subsequence
of {fn } which converges everywhere on P to a function f : P → R.
Hint: Use a diagonalization argument.
An important version of the Ascoli-Arzelà Theorem is the following:

Theorem 29. Let $X$ be a metric space so that there is a countable collection of open subsets $O_i$ satisfying
$$X = \bigcup_{i=1}^\infty O_i, \qquad O_i \subset\subset O_{i+1}, \tag{45}$$
and let $\Omega$ be an equicontinuous set of real-valued functions on $X$. If for a sequence of functions $\{f_n\} \subset \Omega$ there is a uniform $C$ so that $|f_n(x)| \le C$ for all $n$ and all $x \in X$, then there is a subsequence of $\{f_n\}$ which converges pointwise to a function $f \colon X \to \mathbb{R}$, and the convergence is uniform on every compact subset of $X$.

Remark. Recall $A \subset\subset B$, for $A$ a subspace of a topological space $B$, means that the closure $\bar A$ relative to $B$ is compact.

Remark. A sequence of functions converging uniformly on compact subsets of $X$ is said to converge normally on $X$.

We relegate the proof of Theorem 29 to the following problem:

Homework Problem 60. Prove Theorem 29.
Hint: Consider $X$, $O_i$ as in the theorem. Note we may apply Theorem 28 to each of the compact sets $\bar O_i$. Use a diagonalization argument to find a uniformly convergent subsequence on each $\bar O_i$. Show that every compact subset of $X$ is contained in some $O_i$.

Remark. For every smooth manifold $X$ (which is Hausdorff and sigma-compact), there is a countable collection of open sets $O_i$ satisfying condition (45). See the notes on "The Real Definition of a Smooth Manifold."
The Ascoli-Arzelà Theorem provides the following.

Proposition 75. If $C > 0$ and $\{f_n\}$ is a sequence of functions in $L^2_1(S^1, \mathbb{R})$ which satisfy $\|f_n\|_{L^2_1} \le C$, then there is a uniformly convergent subsequence.

Proof. This follows from the Ascoli-Arzelà Theorem and Lemma 71 above, once we know in addition that there is a constant $K$ so that $|f_n| \le K$ pointwise. First of all, note that
$$|f_n(t_2) - f_n(t_1)| \le \|f_n\|_{L^2_1} |t_2 - t_1|^{\frac12} \le C |t_2 - t_1|^{\frac12}$$
shows that for every $t_2, t_1 \in S^1$,
$$|f_n(t_2) - f_n(t_1)| \le C,$$
since we may choose $t_2, t_1 \in [0, 1)$. Since
$$\left( \int_0^1 |f_n|^2 \, dt \right)^{\frac12} = \|f_n\|_{L^2} \le \|f_n\|_{L^2_1} \le C,$$
there must be a $t_1 \in S^1$ so that $|f_n(t_1)| \le C$. Then for any $t_2 \in S^1$,
$$|f_n(t_2)| \le |f_n(t_1)| + |f_n(t_2) - f_n(t_1)| \le 2C.$$
Thus the hypotheses of the Ascoli-Arzelà Theorem are satisfied (with $K = 2C$).

Corollary 76. The inclusion $L^2_1(S^1, \mathbb{R}) \hookrightarrow C^0(S^1, \mathbb{R})$ is compact.

Proof. Take $C = 1$ in the above proposition.
Corollary 77. Let $C > 0$, let $X \subset \mathbb{R}^N$ be a compact manifold, and let $\gamma_n \in L^2_1(S^1, X) \subset L^2_1(S^1, \mathbb{R}^N)$ satisfy $E(\gamma_n) \le C$. Then there is a uniformly convergent subsequence of $\{\gamma_n\}$, and the limit is a continuous function $\gamma \colon S^1 \to X$.

Proof. Recall
$$\|\gamma_n\|^2_{L^2_1(S^1, \mathbb{R}^N)} = \|\gamma_n\|^2_{L^2(S^1, \mathbb{R}^N)} + \|\dot\gamma_n\|^2_{L^2(S^1, \mathbb{R}^N)} = \|\gamma_n\|^2_{L^2(S^1, \mathbb{R}^N)} + E(\gamma_n).$$
Since $\gamma_n(S^1) \subset X$ and $X$ is compact, there is a constant $K$ so that $|\gamma_n(t)| \le K$ for all $n$ and $t$. Therefore,
$$\|\gamma_n\|^2_{L^2(S^1, \mathbb{R}^N)} \le \int_{S^1} K^2 \, dt = K^2,$$
and moreover,
$$\|\gamma_n\|^2_{L^2_1(S^1, \mathbb{R}^N)} \le C + K^2$$
independently of $n$. So each component function $\gamma_n^a$, for $a = 1, \ldots, N$, satisfies
$$\|\gamma_n^a\|_{L^2_1(S^1, \mathbb{R})} \le \sqrt{C + K^2}.$$
Then Proposition 75 shows that there is a subsequence $\{{}_1\gamma_n\}$ of $\{\gamma_n\}$ so that the component ${}_1\gamma_n^1$ converges uniformly. Let $\{{}_2\gamma_n\}$ be a subsequence of $\{{}_1\gamma_n\}$ so that ${}_2\gamma_n^1$ and ${}_2\gamma_n^2$ converge uniformly. By induction, as in the proof of Theorem 24, there is a subsequence $\{{}_N\gamma_n\}$ of $\{\gamma_n\}$ so that ${}_N\gamma_n^a$ converges uniformly for $a = 1, \ldots, N$. Since this subsequence converges uniformly in each component in $\mathbb{R}^N$, ${}_N\gamma_n$ converges uniformly as $n \to \infty$ to a limit $\gamma$ in $C^0(S^1, \mathbb{R}^N)$.

Since $X$ is closed in $\mathbb{R}^N$ and the subsequence converges pointwise, the limit $\gamma$ maps $S^1$ into $X$.
It is also useful to define the Hölder norm for functions $f \colon S^1 \to \mathbb{R}$:
$$\|f\|_{C^{0,\frac12}(S^1)} = \|f\|_{C^0} + \sup_{t_1 \ne t_2} \frac{|f(t_1) - f(t_2)|}{d_{S^1}(t_1, t_2)^{\frac12}}.$$
(Here we define
$$d_{S^1}(t_1, t_2) = \inf_{k \in \mathbb{Z}} |(t_1 + k) - t_2|.$$
This definition is necessary, since we identify the real numbers $t$ and $t + k$ on the circle $S^1$. For example, $d_{S^1}(0, 0.9) = |1 - 0.9| = 0.1$.) It is easy to check that this defines a norm. Define the space $C^{0,\frac12}(S^1)$ to be all $f$ from $S^1 \to \mathbb{R}$ so that $\|f\|_{C^{0,\frac12}(S^1)} < \infty$.

$C^{0,\frac12}(S^1)$ is a Banach space (Proposition 78 below), and the calculations above show that there is a natural continuous inclusion map $L^2_1(S^1) \to C^{0,\frac12}(S^1)$. Moreover, the natural inclusion map $C^{0,\frac12}(S^1) \to C^0(S^1)$ is compact. Then Problem 63 below shows that the composition of inclusions $L^2_1(S^1) \to C^0(S^1)$ is compact.

In general, for any metric space $X$ and $\alpha \in (0, 1]$, we can define
$$C^{0,\alpha}(X) = \{f \colon X \to \mathbb{R} : \|f\|_{C^{0,\alpha}} < \infty\}, \qquad \|f\|_{C^{0,\alpha}} = \sup_{x \in X} |f(x)| + \sup_{x \ne y \in X} \frac{|f(x) - f(y)|}{d_X(x, y)^\alpha}.$$
These are called Hölder spaces and Hölder norms, respectively.
Example 21. This is the standard example, for $X = [-1, 1] \subset \mathbb{R}$: $f(x) = |x|^\alpha$ is in $C^{0,\alpha}(X)$.

Proof. It clearly suffices to bound the difference quotient
$$q(x, y) = \frac{\big| |x|^\alpha - |y|^\alpha \big|}{|x - y|^\alpha}, \qquad x \ne y \in [-1, 1].$$
We will show that this is always $\le 1$. First, we reduce to the case that $x$ and $y$ have the same sign, since if they have opposite signs, $q(x, y) < q(-x, y)$.

So we may assume $x$ and $y$ have the same sign. By possibly interchanging $(x, y) \leftrightarrow (-x, -y)$ and switching $x$ and $y$, we may assume $x > y \ge 0$. Then write
$$q(x, y) = \frac{x^\alpha - y^\alpha}{(x - y)^\alpha} = \frac{1 - \rho^\alpha}{(1 - \rho)^\alpha}, \qquad \rho = \frac{y}{x} \in [0, 1).$$
Then we compute
$$\frac{dq}{d\rho} = \frac{\alpha(1 - \rho^{\alpha - 1})}{(1 - \rho)^{\alpha + 1}} \le 0,$$
since $\rho^{\alpha - 1} \ge 1$ for $\rho \in (0, 1)$ and $\alpha \le 1$. Therefore, the maximum of $q(\rho)$ is achieved at $\rho = 0$, where $q = 1$.

We also say $f(x) = |x|^\alpha$ is locally $C^{0,\alpha}$ on $\mathbb{R}$, since the $\alpha$-Hölder norm of $f$ is finite on any compact subset of $\mathbb{R}$.

In the case $\alpha = 1$, note that a function in $C^{0,1}$ is simply a $C^0$ function which is globally Lipschitz.
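The bound $q(x, y) \le 1$ from Example 21 can be probed numerically at random sample pairs (the exponent $\alpha$, the sample count, and the random seed are arbitrary choices):

```python
import numpy as np

# check that the alpha-Hoelder quotient of f(x) = |x|^alpha on [-1, 1]
# never exceeds 1
alpha = 0.5
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 2000)
y = rng.uniform(-1, 1, 2000)
mask = x != y
q = (np.abs(np.abs(x[mask])**alpha - np.abs(y[mask])**alpha)
     / np.abs(x[mask] - y[mask])**alpha)

print(q.max())  # <= 1, as the proof shows
```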
Homework Problem 61.

(a) Show that the inclusion $C^1(S^1) \hookrightarrow C^0(S^1)$ is compact. (Hint: use the Mean Value Theorem.)

(b) Show that every bounded sequence $f_n \in C^1(\mathbb{R})$ (i.e., there is a uniform $C$ so that $\|f_n\|_{C^1} \le C$ for all $n$) has a subsequence which converges uniformly on compact subsets of $\mathbb{R}$ to a continuous limit $f$. Hint: It is easy to show that $\mathbb{R}$ satisfies condition (45).

(c) Find an example of a bounded sequence of functions $f_n \in C^1(\mathbb{R})$ which does not have a convergent subsequence in $C^0(\mathbb{R})$. Thus the inclusion $C^1(\mathbb{R}) \hookrightarrow C^0(\mathbb{R})$ is not compact. (Hint: How is this situation different from parts (a) and (b)? You must use the noncompactness of $\mathbb{R}$. Therefore, the interesting behavior of the $f_n$ should be "moving off to infinity.")
It is also useful to apply Hölder norms to the derivatives of a function. In particular, on $\mathbb{R}^n$, we may define for $k$ a positive integer and $\alpha \in (0, 1]$,
$$\|f\|_{C^{k,\alpha}} = \sum_{|\beta| \le k} \|\partial_\beta f\|_{C^{0,\alpha}},$$
where, as in (3) above, we use multi-index notation to denote all the partial derivatives of $f$ of order $\le k$.

Remark. It is not useful to define $C^{0,\alpha}$ for $\alpha > 1$, as the following problem shows.

Homework Problem 62. Let $\alpha > 1$, and let $f \colon \mathbb{R} \to \mathbb{R}$. Assume that
$$\sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|^\alpha} = C < \infty.$$
Show that $f$ is a constant function.
Hint: Use the definition of the derivative to show that $f'(x) = 0$ for all $x$.
Proposition 78. Let $X$ be a metric space and $\alpha \in (0, 1]$. Then $C^{0,\alpha}(X)$ is a Banach space.

Proof. It is straightforward to show that $\|\cdot\|_{C^{0,\alpha}}$ is a norm. As always, we must check completeness carefully.

Let $\{f_n\}$ be a Cauchy sequence in $C^{0,\alpha}(X)$. We want to show that there is a limit $f \in C^{0,\alpha}$ and that $\|f_n - f\|_{C^{0,\alpha}} \to 0$ as $n \to \infty$.

First of all, it is obvious from the definition of the Hölder norm that $\{f_n\}$ is a Cauchy sequence in $C^0(X)$, and since $C^0$ is complete, there is a continuous limit function $f$, and $f_n \to f$ uniformly.

Now we show $f \in C^{0,\alpha}$. Let $\epsilon > 0$. Then there is an $N$ so that
$$m, n \ge N \implies \|f_m - f_n\|_{C^{0,\alpha}} < \epsilon. \tag{46}$$
Then for all $m \ge N$, $\|f_m\|_{C^{0,\alpha}} < \|f_N\|_{C^{0,\alpha}} + \epsilon \equiv C_\epsilon$. By the definition of the Hölder norm, for all $x, y \in X$,
$$|f_m(x) - f_m(y)| \le C_\epsilon \, d_X(x, y)^\alpha.$$
Taking $m \to \infty$ shows that $f \in C^{0,\alpha}$. Now (46) also implies that for all $m, n \ge N$ and all $x, y \in X$,
$$|f_m(x) - f_n(x) - f_m(y) + f_n(y)| \le \epsilon \, d_X(x, y)^\alpha,$$
and so again let $m \to \infty$ to show, for all $x, y \in X$ and all $n \ge N$,
$$|f(x) - f_n(x) - f(y) + f_n(y)| \le \epsilon \, d_X(x, y)^\alpha.$$
Since we already know $f_n \to f$ in $C^0$, this is exactly the additional statement we need to show $f_n \to f$ in $C^{0,\alpha}$.
Remark. If X is a smooth manifold, then it is possible (by using an atlas and
a subordinate partition of unity) to define C k,α (X). If X is compact, then
C k,α (X) ,→ C k (X) is a compact inclusion.
Homework Problem 63. Let Λ : B1 → B2 and Φ : B2 → B3 be linear maps
between Banach spaces.
(a) Assume Λ is continuous and Φ is compact. Then Φ ◦ Λ is compact.
(b) Assume Λ is compact and Φ is continuous. Then Φ ◦ Λ is compact.
Homework Problem 64. Let Λ : B1 → B2 be a compact linear map of
Banach spaces. Show Λ is continuous.
Hint: It suffices to show Λ is bounded. For B1 (0) the unit ball in B1 ,
consider the image of the compact set
ΛB1 (0) ⊂⊂ B2
under the norm map k · kB2 : B2 → R.
Remark. The Hölder spaces C k,α , for α ∈ (0, 1), and the Sobolev spaces Lpk , for p ∈ (1, ∞), play a very important role in the theory of partial differential equations. In particular, they behave much better than the more obvious spaces C k . Our simple proofs that L21 (S1 ) embeds continuously in C 0,1/2 (S1 ) and compactly in C 0 (S1 ) constitute some of the easiest cases of the Sobolev embedding theorem. The Sobolev embedding theorem allows us to embed certain Sobolev spaces, in which derivatives are defined only in the sense of distributions, into Hölder and C k spaces, in which we may take derivatives in the usual sense. These spaces are crucial to the regularity theory of solutions to PDEs.
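To make the embedding concrete, here is a small numerical sanity check (my own sketch, not from the notes) of the basic estimate behind it, |f (x) − f (y)| ≤ kḟ kL2 d(x, y)^{1/2} on S1 , for the sample function f (t) = sin 2πt, whose derivative 2π cos 2πt has L2 (S1 ) norm π√2:

```python
import math

# L^2(S^1) norm of f'(t) = 2*pi*cos(2*pi*t): (4*pi^2 * 1/2)^{1/2} = pi*sqrt(2)
fdot_l2 = math.pi * math.sqrt(2)

n = 400
worst = 0.0  # largest sampled Holder-1/2 quotient of f(t) = sin(2*pi*t)
for i in range(n):
    for j in range(i + 1, n):
        d = min((j - i) / n, 1 - (j - i) / n)  # distance on the circle
        lhs = abs(math.sin(2 * math.pi * i / n) - math.sin(2 * math.pi * j / n))
        worst = max(worst, lhs / math.sqrt(d))

# The sampled quotient stays below ||f'||_{L^2}, as the estimate predicts.
print(worst, fdot_l2)
```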
4.7 Convergence
Now we have finally developed the tools needed to solve our problem. Recall
Problem: Let X ⊂ RN be a smooth compact manifold equipped with the
Riemannian metric pulled back from the Euclidean metric on RN . Let C be
the class of loops γ : S1 → X in a free homotopy class on X and in L21 (S1 , X).
Find a loop of least energy in C.
Our strategy is as follows: Define

L = inf_{γ∈C} E(γ).
Since E(γ) ≥ 0 always, L ≥ 0. Now there is a sequence of γi ∈ C so that
E(γi ) → L. We want to find a subsequence γij which converges to a limit
γ ∈ C so that E(γ) = L. Moreover, we expect γ to be a geodesic—it should
satisfy the geodesic equations not just in the sense of distributions, but also
in the usual sense. Therefore, by the theory of ODEs, γ should be smooth.
First of all, we show the existence of a limit γ. Corollary 77 shows that
there is a subsequence of γi which converges uniformly to a continuous γ :
S1 → X. (For simplicity, we just refer to this subsequence as γi again.) Since
γi → γ uniformly, Corollary 57 shows that γ is in the same free homotopy
class. Thus we have
Proposition 79. There is a subsequence of γi which converges uniformly to
a limit γ in the same free homotopy class.
Proposition 80. Let X ⊂ RN be a compact manifold. If γi : S1 → X satisfy E(γi ) → L, then there is a constant K independent of i so that kγi kL21 (S1 ,RN ) ≤ K.
Proof. Since X is compact, there is a uniform C so that kγi kL2 (S1 ,RN ) ≤ C. Since E(γi ) → L, {E(γi )} is a bounded sequence. Therefore,

kγi k^2_{L21 (S1 ,RN )} = E(γi ) + kγi k^2_{L2 (S1 ,RN )}

is bounded independent of i.
This proposition shows there is a further subsequence of γi which converges weakly to a γ̃ ∈ L21 (S1 , RN ) by Theorem 24. (Explanatory note: a
further subsequence means that we take a subsequence not just of the original γi , but of the subsequence taken in the paragraph above Proposition 79.)
We still refer to this further subsequence as γi . Then Theorem 25 shows that
the Hilbert space norm satisfies

kγ̃kL21 (S1 ,RN ) ≤ lim inf_{i→∞} kγi kL21 (S1 ,RN ) .
Note a potential problem: We have taken a subsequence of the original
γi to converge uniformly to a continuous γ, and then we take a further subsequence converging weakly in L21 to a limit γ̃. We must show γ and γ̃ are the
same. This will follow from the fact that they must be equal in the sense of
distributions, and thus are equal almost everywhere (Proposition 58). Since
both γ and γ̃ are continuous, they must be equal everywhere. In particular,
we require
Proposition 81. γ = γ̃ in the sense of distributions.
Proof. It suffices to show each component γ a = γ̃ a in the sense of distributions for a = 1, . . . , N .
For each a = 1, . . . , N , γia → γ a uniformly as i → ∞. So if φ ∈ D(S1 ) is
a smooth test function, then
|γia (φ) − γ a (φ)| = | ∫_{S1} (γia − γ a ) φ dt | ≤ kφkL1 kγia − γ a kC 0 ,
which goes to 0 as i → ∞ by uniform convergence. Therefore,
γ a (φ) = lim_{i→∞} γia (φ).    (47)
Also, γia → γ̃ a weakly in L21 (S1 ). Let φ ∈ D(S1 ) ⊂ L21 (S1 ) be a test
function. Let fi = γia − γ̃ a . Then fi → 0 weakly in L21 . Compute
hfi , φiL21 = ∫_{S1} (fi φ + ḟi φ̇) dt = ∫_{S1} (fi φ − fi φ̈) dt = fi (φ − φ̈),
the last term denoting fi acting in the sense of distributions. Therefore, for
all φ ∈ D(S1 ),
lim_{i→∞} fi (φ − φ̈) = lim_{i→∞} hfi , φiL21 = 0.
By Proposition 82 below, for every ψ ∈ D(S1 ), there is a φ ∈ D(S1 ) so that
φ − φ̈ = ψ. Therefore, for all ψ ∈ D(S1 ),
lim_{i→∞} fi (ψ) = 0 ⇐⇒ lim_{i→∞} γia (ψ) = γ̃ a (ψ).
Therefore, by (47) above, γ a (ψ) = γ̃ a (ψ) for every test function ψ and every a, so γ = γ̃ in the sense of distributions.
Proposition 82. For every ψ ∈ D(S1 ), there is a φ ∈ D(S1 ) so that ψ =
φ − φ̈.
Proof. Recall D(S1 ) = C ∞ (S1 , R). Moreover, Lemma 70 and Problem 58
show that
C ∞ (S1 , C) = { Σ_{k=−∞}^{∞} f^k e^{2πikt} : lim_{k→±∞} f^k |k|^n = 0 for n = 1, 2, . . . }.    (48)
The convergence of each such series is uniform, and the sum commutes with
the derivative d/dt.
Therefore, if

φ = Σ_{k=−∞}^{∞} φ̂^k e^{2πikt} ∈ C ∞ (S1 , C),

then

φ̈ = Σ_{k=−∞}^{∞} (−4π^2 k^2 ) φ̂^k e^{2πikt} ,

φ − φ̈ = Σ_{k=−∞}^{∞} (1 + 4π^2 k^2 ) φ̂^k e^{2πikt} .
So if

ψ = Σ_{k=−∞}^{∞} ψ̂^k e^{2πikt} ∈ C ∞ (S1 , C),

then we may let

φ = Σ_{k=−∞}^{∞} [ ψ̂^k / (1 + 4π^2 k^2 ) ] e^{2πikt} ,

so that φ − φ̈ = ψ.
We must prove that φ ∈ C ∞ (S1 , C). Let n be a positive integer. Then

lim_{k→±∞} φ̂^k |k|^n = lim_{k→±∞} ψ̂^k |k|^n / (1 + 4π^2 k^2 ) = 0,

because |ψ̂^k| |k|^{n−2} → 0. So φ is smooth. (Note that we went from a |k|^n limit to a |k|^{n−2} limit. This is because the differential equation is of order two.)
We have considered C-valued functions so far. It is easy to check that
ψ ∈ C ∞ (S1 , R) implies φ ∈ C ∞ (S1 , R).
Remark. The previous proposition uses a standard technique for solving
constant-coefficient differential equations on S1 . The differential equation
then breaks into an algebraic equation for each Fourier coefficient, each of
which can typically be solved.
This also works for functions on the n-torus (S1 )n . In this case, the Fourier
series is summed over Zn , and we can solve constant-coefficient PDEs. Also,
on Rn , the Fourier transform turns constant-coefficient PDEs into algebraic
equations of the Fourier transform variable.
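This mode-by-mode technique can be tried numerically. Below is a minimal pure-Python sketch (my own illustration; an O(N²) discrete Fourier transform stands in for the full Fourier series): dividing each mode of ψ by 1 + 4π^2 k^2 produces φ with φ − φ̈ = ψ. For ψ(t) = cos 2πt the exact solution is cos(2πt)/(1 + 4π^2 ), which the sketch recovers to roundoff:

```python
import cmath
import math

def solve_phi(psi):
    """Solve phi - phi'' = psi on S^1: divide each discrete Fourier mode
    of psi by 1 + 4 pi^2 k^2 (k the signed frequency), then invert."""
    n = len(psi)
    # forward DFT: psi_hat[k] = (1/n) * sum_x psi[x] * e^{-2 pi i k x / n}
    psi_hat = [sum(psi[x] * cmath.exp(-2 * math.pi * 1j * k * x / n)
                   for x in range(n)) / n
               for k in range(n)]
    phi_hat = [psi_hat[k] / (1 + 4 * math.pi ** 2
                             * (k if k <= n // 2 else k - n) ** 2)
               for k in range(n)]
    # inverse DFT; the imaginary parts cancel up to roundoff
    return [sum(phi_hat[k] * cmath.exp(2 * math.pi * 1j * k * x / n)
                for k in range(n)).real
            for x in range(n)]

n = 32
psi = [math.cos(2 * math.pi * x / n) for x in range(n)]
phi = solve_phi(psi)
# exact solution for psi = cos(2 pi t): phi = cos(2 pi t) / (1 + 4 pi^2)
exact = [math.cos(2 * math.pi * x / n) / (1 + 4 * math.pi ** 2)
         for x in range(n)]
err = max(abs(p - e) for p, e in zip(phi, exact))
print(err)
```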
Homework Problem 65. L22 (S1 , C) is the complex Hilbert space defined by
the inner product

hf, giL22 = ∫_{S1} ( f \bar{g} + ḟ \bar{ġ} + f̈ \bar{g̈} ) dt.
The elements of L22 (S1 , C) are all complex-valued functions f on S1 which are
L2 and whose first and second derivatives f˙ and f¨ in the sense of distributions
are also L2 functions. (You may assume L22 (S1 , C) is a Hilbert space, as in
Proposition 62.)
Show that if fn → f converges weakly in L22 (S1 , C), then for all φ ∈ D(S1 ),
fn (φ) → f (φ).
Hint: Mimic the proofs of Propositions 81 and 82.
To recap, so far we have a sequence of loops γi in C so that

lim_{i→∞} E(γi ) = L = inf_{α∈C} E(α),

lim_{i→∞} γi = γ uniformly and weakly in L21 (S1 , RN ).

Moreover, γ ∈ C, the same free homotopy class of L21 loops containing the γi .
Since γi → γ uniformly, we have

kγi − γk^2_{L2 (S1 ,RN )} = ∫_{S1} |γi − γ|^2 dt ≤ sup_t |γi − γ|^2 → 0,
and so γi → γ in L2 .
Now Theorem 25 shows that

kγk^2_{L21 (S1 ,RN )} ≤ lim inf_{i→∞} kγi k^2_{L21 (S1 ,RN )} = lim inf_{i→∞} [ E(γi ) + kγi k^2_{L2 (S1 ,RN )} ] = L + kγk^2_{L2 (S1 ,RN )} ,

and so

E(γ) = kγk^2_{L21 (S1 ,RN )} − kγk^2_{L2 (S1 ,RN )} ≤ L.
Since L is the infimum of the energy of all loops in C, and γ ∈ C, then
E(γ) ≥ L as well. So E(γ) = L. Thus we have proved
Theorem 30. Let X be a compact Riemannian manifold without boundary.
Then in each free homotopy class of loops, there is a γ ∈ L21 (S1 , X) which
minimizes the energy.
Corollary 83. This minimizing γ satisfies the geodesic equations (in local
coordinates on X) in the form
2(gik γ̇ i )˙ − gij,k γ̇ i γ̇ j = 0
in the sense of distributions.
Proof. See Proposition 61.
Note in the proof of Theorem 30 above, we implicitly use the fact that
the inclusion L21 (S1 ) → L2 (S1 ) is compact, by using the inclusions
L21 (S1 ) ,→ C 0 (S1 ) ,→ L2 (S1 ),
the first of which is compact and the second of which is continuous. The
following problem gives a direct proof.
Homework Problem 66. Show directly that the inclusion L21 (S1 , C) ,→
L2 (S1 , C) is a compact linear map.
Hints:
(a) Use the characterization of L21 (S1 , C) in terms of Fourier series from
Proposition 87 below.
(b) If kfi kL21 ≤ 1, then use a diagonalization argument to produce a subsequence {fij } so that for each k ∈ Z, the Fourier coefficients f̂^k_{ij} converge to constants g^k ∈ C as j → ∞.
(c) For all ε > 0, show that there is an N so that

Σ_{|k|≥N} |f̂^k|^2 < ε

for all f such that kf kL21 ≤ 1.
(d) Conclude that the subsequence {fij } converges strongly to

Σ_{k∈Z} g^k e^{2πikt}

in L2 (S1 , C).
Remark. The proof presented in the previous problem works for Sobolev
spaces in higher dimensions (for functions on the n-dimensional torus S1 ×
· · · × S1 ), whereas the use of the Sobolev embedding theorem for the compact
inclusion L21 (S1 , C) ,→ C 0 (S1 , C) is only available in dimension n = 1.
4.8 Regularity
Now we show that γ is smooth. First of all, note that Γkij is smooth in each set of local coordinates x on X. Also, since γ ∈ L21 (S1 , RN ), we know
that γ is continuous in t ∈ S1 , and so Γkij (γ) is continuous on S1 .
Until now, we’ve been lax about distinguishing between γ = (γ 1 , . . . , γ N ) ∈
X ⊂ RN and γ in local coordinates. There is an important point in which
we should make a distinction. Recall we are working on a coordinate chart
φ : U → O ⊂ X ⊂ RN , where U ⊂ Rn . Our notation has been this: γ a is the
ath coordinate of γ in RN ⊃ X, while γ i has been shorthand for (φ−1 ◦ γ)i , the ith coordinate of φ−1 ◦ γ in Rn ⊃ U.
In the previous subsections, we have dealt with the L21 norm of γ in RN ,
while in local coordinates, we should deal with the L21 norm of φ−1 ◦ γ in
U ⊂ Rn . Let φ−1 : O → U be the restriction of the smooth map
y = (y 1 , . . . , y n ) : Q → U,
where Q is an open subset of RN which contains O ⊂ X ⊂ RN . (Recall
we may do this by the definition of smooth maps from O to Rn .) Let x =
(x1 , . . . , xN ) represent coordinates on RN . Compute for k = 1, . . . , n
(∂/∂t)(y ◦ γ)^k = (∂y^k/∂x^a ) γ̇^a ,
where a is summed from 1 to N .
Proposition 84. Let φ : U → O be a smooth coordinate parametrization of
X. Let I ⊂ R be a compact interval, and let K ⊂ O be compact. Then there
are positive constants C1 , . . . , C5 so that
C1 kγkL21 (I,RN ) + C3 ≥ kφ−1 ◦ γkL21 (I,Rn ) + C4 ≥ C2 kγkL21 (I,RN ) + C5
(49)
for all γ so that γ(I) ⊂ K. (The point is that C1 , C2 , C3 , C4 , C5 are independent of γ.)
Corollary 85. kγkL21 (I,RN ) is bounded if and only if kφ−1 ◦ γkL21 (I,Rn ) is
bounded.
Remark. A related, simpler notion is the following: Two norms k · kB1 and
k · kB2 on a single linear space B are called equivalent if there are constants
C1 > C2 > 0 so that for all x ∈ B,
C1 kxkB1 ≥ kxkB2 ≥ C2 kxkB1 .
Remark. As long as we restrict to compact subsets of coordinate charts,
the norms in RN ⊃ X and in local coordinates on Rn are equivalent. The
corollary holds for all the Banach function spaces we have discussed, not just
for L21 . Also, a similar proposition holds for Banach spaces of functions from
X to R, not simply spaces of maps from S1 to X:
For K ⊂⊂ O, the norms on L21 (K) and L21 (φ−1 K) are equivalent under
the map
L21 (K) → L21 (φ−1 K),    f ↦ f ◦ φ.
Proof of Proposition 84. We claim it suffices to prove the bound (49) separately for the L2 norm of γ and for the L2 norm of γ̇. Proof: if A = kγkL2 and B = kγ̇kL2 , then

kγkL21 = √(A^2 + B^2 ).

Then it is easy to check that for A, B ≥ 0,

(1/√2)(A + B) ≤ √(A^2 + B^2 ) ≤ A + B.
In other words, the norm on γ given by the sum of the L2 norm of γ and the
L2 norm of γ̇ is equivalent to the L21 norm. It is straightforward to use this
fact to prove the claim.
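The elementary inequality used in this claim can be verified directly (a standard two-line computation):

```latex
(A+B)^2 = A^2 + 2AB + B^2 \ \ge\ A^2 + B^2
  \quad\Longrightarrow\quad \sqrt{A^2+B^2} \le A+B,
\qquad
2AB \le A^2 + B^2
  \quad\Longrightarrow\quad (A+B)^2 \le 2(A^2+B^2)
  \quad\Longrightarrow\quad \tfrac{1}{\sqrt{2}}(A+B) \le \sqrt{A^2+B^2}.
```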
Since φ−1 is C 1 on K, it is locally Lipschitz and thus globally Lipschitz
on K (see Proposition 17). So for C the Lipschitz constant and x0 a point
in K, for all x ∈ K,
|φ−1 (x)| ≤ |φ−1 (x0 )| + C|x − x0 | ≤ C′ + C|x|,    where C′ = |φ−1 (x0 )| + C|x0 |.
Therefore, the Triangle Inequality gives

kφ−1 (γ)kL2 (S1 ,Rn ) = ( ∫_{S1} |φ−1 (γ(t))|^2 dt )^{1/2}
≤ ( ∫_{S1} (C′ + C|γ(t)|)^2 dt )^{1/2}
≤ ( ∫_{S1} (C′)^2 dt )^{1/2} + ( ∫_{S1} [C|γ(t)|]^2 dt )^{1/2}
= C′ + C kγkL2 (S1 ,RN ) .
This is essentially one half of (49) for the L2 norm of γ. The other half
follows from the fact that φ is a C 1 function on the compact set φ−1 K.
We still must address the L2 norm of γ̇. Recall for y = φ−1 as above, that

(φ−1 ◦ γ)˙ = (∂y/∂x^a ) γ̇^a .
On the compact set K, since φ−1 is C 1 , there is a constant C so that

|∂y/∂x^a | ≤ C on K for each a,

and so on K

|(φ−1 ◦ γ)˙| = | Σ_{a=1}^{N} (∂y/∂x^a ) γ̇^a | ≤ C Σ_{a=1}^{N} |γ̇^a | ≤ CN |γ̇|.
Thus, as in the previous paragraph,
k(φ−1 ◦ γ)˙kL2 (S1 ,Rn ) ≤ CN kγ̇kL2 (S1 ,RN ) .
The opposite inequality can be obtained by considering φ as a C 1 map instead
of φ−1 .
Remark. In the previous proof, it sufficed to consider the L2 norms of γ and γ̇ separately. For higher derivatives, this is no longer adequate: Compute

(φ−1 ◦ γ)¨ = (∂y/∂x^a ) γ̈^a + (∂^2 y/∂x^a ∂x^b ) γ̇^a γ̇^b .

So first derivative terms of γ come into the calculations of the second derivatives of φ−1 ◦ γ.
The geodesic equation is written in terms of the coordinates on U ⊂ Rn , and for an open interval I ⊂ S1 , γ(I) ⊂ O. On any compact subinterval of I, the components of the metric gkℓ (γ) and their first derivatives gkℓ,m (γ) have absolute values bounded by a constant C, since γ is continuous on the compact subinterval. Since γ ∈ L21 , each
γ̇ i ∈ L2 . Therefore, Hölder’s inequality shows that
∫_I (1/2)|gij,k γ̇^i γ̇^j | dt ≤ (C/2) Σ_{i,j=1}^{n} ( ∫_I |γ̇^i |^2 dt )^{1/2} ( ∫_I |γ̇^j |^2 dt )^{1/2} < ∞.
Thus (1/2)|gij,k γ̇^i γ̇^j | ∈ L1 (I) for each k, and thus Corollary 83 shows (gik γ̇^i )˙ ∈ L1 (I) for each k in the sense of distributions. Lemma 86 below and the proof of Proposition 59 above then show gik (γ)γ̇^i is continuous. Moreover,

gkℓ (γ) γ̈^k ∈ L1 (I)    (50)

in the sense of distributions. Now since the inverse metric g^{ℓm} (γ) is continuous in t, we may multiply by it to show that each γ̇^i is continuous as well. Thus γ is locally C 1 .
Now bootstrap using Corollary 83 again to show that (gik (γ)γ̇ i )˙ is continuous as well. Thus gik (γ)γ̇ i is, in the sense of distributions, a C 1 function.
As above, this shows γ̇ i is also C 1 , and thus γ is locally C 2 .
We now have enough regularity to rewrite Corollary 83 as the geodesic equation

γ̈^k = −Γkij (γ) γ̇^i γ̇^j

for the C 2 functions γ^k . The equation holds in the usual sense of ODEs. Therefore, since Γkij is smooth, the usual regularity theory for ODEs, Theorem 9, applies, and the geodesic γ is smooth.
Lemma 86. Let f ∈ L1loc (R). Then

g(t) = ∫_{t0}^{t} f (s) ds

is continuous.
Proof. Let t ∈ R, and let h > 0 (the case h < 0 is similar). Compute

g(t + h) − g(t) = ∫_{t}^{t+h} f (s) ds = ∫_{R} χ[t,t+h] (s) f (s) ds
for χ[t,t+h] the characteristic function of the interval [t, t + h]. Then as h → 0,
χ[t,t+h] (s) f (s) → 0
almost everywhere on R. For small h,
|χ[t,t+h] (s) f (s)| ≤ |χ[t−1,t+1] (s) f (s)| ,
and the right-hand function is integrable since f is locally L1 . Then the
Dominated Convergence Theorem says that

g(t + h) − g(t) = ∫_{R} χ[t,t+h] (s) f (s) ds → ∫_{R} 0 ds = 0

as h → 0+ . The case h → 0− is similar. Thus g(t + h) → g(t) as h → 0 and g is continuous at each t ∈ R.
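For instance (an illustration not in the notes): the function f (s) = |s|^{−1/2} is in L1loc (R) despite being unbounded near 0, and its primitive

```latex
g(t) = \int_0^t |s|^{-1/2}\, ds = 2\,\operatorname{sgn}(t)\sqrt{|t|}
```

is continuous at every t, including t = 0, just as the lemma asserts.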
Homework Problem 67. Let f : R → R be an L1 function. Show that

ψ(t) = exp( ∫_{0}^{t} f (τ ) dτ )

is a continuous function satisfying ψ(0) = 1 and ψ solves ψ̇ = f (t) ψ in the sense of distributions. (Hint: approximate f in L1 by a sequence of C ∞ functions.)
4.9 Sobolev spaces, distributions, and Fourier series
In this subsection, we provide some more background results about Sobolev
spaces and distributions on S1 .
First of all, we describe C-valued distributions. A complex-valued distribution is a C-linear map from C ∞ (S1 , C) to C.
Example 22. For k ∈ Z, the map

φ ↦ φ̂^k = ∫_{S1} φ e^{2πikt} dt

is a distribution.
Proposition 87.

L21 (S1 , C) = { Σ_{k∈Z} f^k e^{2πikt} : Σ_{k∈Z} |f^k|^2 (k^2 + 1) < ∞ }.

Moreover, the norm kf kL21 is equivalent to

( Σ_{k∈Z} |f̂^k|^2 (k^2 + 1) )^{1/2} .
Proof. First we show ⊂. Let f ∈ L21 (S1 , C) and compute

ḟ (e^{−2πikt} ) = ∫_{S1} ḟ (t) e^{−2πikt} dt = − ∫_{S1} f (t)(−2πik) e^{−2πikt} dt = 2πik f̂^k .

Since ḟ ∈ L2 ,

Σ_{k∈Z} 4π^2 k^2 |f̂^k|^2 = kḟ k^2_{L2} < ∞ ⇐⇒ Σ_{k∈Z} k^2 |f̂^k|^2 < ∞.
Now since f ∈ L2 also, then

Σ_{k∈Z} |f̂^k|^2 < ∞ and Σ_{k∈Z} |f̂^k|^2 (k^2 + 1) < ∞.

This proves ⊂.
To show ⊃, note that

Σ_{k∈Z} |f^k|^2 (k^2 + 1) < ∞ ⇐⇒ Σ_{k∈Z} |f^k|^2 < ∞ and Σ_{k∈Z} k^2 |f^k|^2 < ∞.
Therefore,

f = Σ_{k∈Z} f^k e^{2πikt} ∈ L2 ,

and by the computations in the previous paragraph ḟ^k ≡ ḟ (e^{−2πikt} ) = 2πik f^k . Consider a test function φ ∈ C ∞ (S1 , C). Then compute

ḟ (φ) = −f (φ̇)
      = − ∫_{S1} f φ̇ dt
      = − hf, φ̇iL2
      = − Σ_{k∈Z} f^k \overline{2πik φ̂^k}
      = − Σ_{k∈Z} (−2πik) f^k \overline{φ̂^k}
      = Σ_{k∈Z} ḟ^k \overline{φ̂^k}
      = h Σ_{k∈Z} ḟ^k e^{2πikt} , φ iL2 .

This shows that

ḟ = Σ_{k∈Z} ḟ^k e^{2πikt} = Σ_{k∈Z} (2πik) f^k e^{2πikt}
in the sense of distributions. Therefore, both f and f˙ are in L2 , and thus
f ∈ L21 (S1 , C).
The statement about equivalence of the norms follows easily.
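As a numerical illustration of the proposition (my own sketch, not part of the notes), one can check Parseval's identity in the form kf k^2_{L2} + kḟ k^2_{L2} = Σ_{k∈Z} |f̂^k|^2 (1 + 4π^2 k^2 ) for f (t) = sin 2πt, where the left side is 1/2 + 2π^2 ; the proposition's weight (k^2 + 1) differs from 1 + 4π^2 k^2 only by bounded factors, which is exactly the equivalence of norms:

```python
import cmath
import math

n = 64
f = [math.sin(2 * math.pi * x / n) for x in range(n)]

# discrete Fourier coefficients f_hat[k] ~ integral of f e^{-2 pi i k t} dt
f_hat = [sum(f[x] * cmath.exp(-2 * math.pi * 1j * k * x / n)
             for x in range(n)) / n
         for k in range(n)]

# sum of |f_hat_k|^2 (1 + 4 pi^2 k^2) over signed frequencies k
sobolev_sq = sum(abs(f_hat[k]) ** 2
                 * (1 + 4 * math.pi ** 2 * (k if k <= n // 2 else k - n) ** 2)
                 for k in range(n))

# exact value of ||f||^2 + ||f'||^2 for f = sin(2 pi t):
# 1/2 + (2 pi)^2 * 1/2 = 1/2 + 2 pi^2
exact = 0.5 + 2 * math.pi ** 2
print(sobolev_sq, exact)
```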
Remark. Similar easy calculations show that

L2m (S1 , C) = { Σ_{k∈Z} f^k e^{2πikt} : Σ_{k∈Z} |f^k|^2 (k^2 + 1)^m < ∞ }

for every m = 0, 1, 2, . . . . Our characterization of smooth functions in (48) above then shows that

C ∞ (S1 , C) = ⋂_{m=0}^{∞} L2m (S1 , C).

Proof: it is straightforward to show that L2m (S1 , C) compactly embeds in C m−1 (S1 , C) for all m ≥ 1.
The Fourier series isometry between L2 (S1 , C) and the sequence space ℓ2 = L2 (Z, C) also allows us to define even more Sobolev spaces.
For any s ∈ R, define L2s (S1 , C) to be the set of distributions f which act on

φ = Σ_{k∈Z} φ̂^k e^{2πikt}

by

f (φ) = Σ_{k∈Z} f̂^k φ̂^k ,    (51)

where f̂^k = f (e^{2πikt} ) and we assume that

Σ_{k∈Z} |f̂^k|^2 (1 + k^2)^s < ∞.    (52)
Homework Problem 68. Show that if {f̂^k} is a sequence of complex numbers
satisfying (52), then for any φ ∈ C ∞ (S1 , C), the sum in (51) converges.
Now we are able to put a topology on C ∞ (S1 , C). We only describe this
topology in terms of convergence of sequences. We say φj → φ in C ∞ (S1 , C),
if φj → φ in L2m (S1 , C) for all m ≥ 0.
Homework Problem 69. Show that φj → φ in C ∞ (S1 , C) if and only if
φj → φ in C p (S1 , C) for all p ≥ 0.
Hint: You may use the fact that L2m (S1 , C) embeds compactly into C m−1 (S1 , C)
for each m ≥ 1. Also show that C p (S1 , C) embeds continuously into L2p (S1 , C)
for all p ≥ 0.
Now we finally give the correct definition of complex distributions on S1 . A distribution on S1 is a continuous C-linear map from C ∞ (S1 , C) to C. Denote the space of complex distributions on S1 by D′ (S1 , C).
Proposition 88. D′ (S1 , C) = ⋃_{m∈Z} L2m (S1 , C), and the image of D′ (S1 , C) under the Fourier transform is the set of all polynomially bounded complex sequences. In other words, it is the set of all sequences {f^k} so that there are m ∈ Z, C > 0 so that |f^k| ≤ C(k^2 + 1)^{m/2} for all k ∈ Z.
Proof. We prove the first equality, and leave the rest as an exercise.
To prove ⊃, if f is in the union, then f ∈ L2−m (S1 , C) for some positive m. To show f ∈ D′ (S1 , C), consider a sequence of φj → φ in C ∞ (S1 , C). Then by definition, φj → φ in L2m . Then

|f (φj ) − f (φ)| = |f (φj − φ)|
≤ Σ_{k∈Z} |f̂^k (φ̂j^k − φ̂^k )|
= Σ_{k∈Z} [ |f̂^k| / (1 + k^2)^{m/2} ] · |φ̂j^k − φ̂^k| (1 + k^2)^{m/2}
≤ ( Σ_{k∈Z} |f̂^k|^2 / (1 + k^2)^m )^{1/2} · ( Σ_{k∈Z} |φ̂j^k − φ̂^k|^2 (1 + k^2)^m )^{1/2} .

The second term in the last line goes to zero by the remark after Proposition 87, while the first term is finite by the fact that f ∈ L2−m . Therefore, f (φj ) → f (φ) for every test function φ, and f ∈ D′ (S1 , C).
We prove ⊂ by contradiction. If f ∈ D′ (S1 , C) is not in L2m (S1 , C) for every m ∈ Z, then for all m ∈ Z,

Σ_{k=−∞}^{∞} |f̂^k|^2 (1 + k^2)^m = ∞.

This implies that

sup_{k∈Z} |f̂^k|^2 (1 + k^2)^m = ∞ for all m ∈ Z.

(Proof of the contrapositive:

|f̂^k|^2 (1 + k^2)^m ≤ C for all k =⇒ Σ_{k∈Z} |f̂^k|^2 (1 + k^2)^{m−1} ≤ Σ_{k∈Z} C / (1 + k^2) < ∞.)
So for each j, there is a kj so that

|f̂^{kj}| / (1 + kj^2)^{j/2} ≥ 1.

We may assume kj ≠ 0.
Now we construct a sequence φj which converges to 0 in C ∞ (S1 , C), but for which f (φj ) ↛ 0. Define

φj = [ \overline{f̂^{kj}} / ( |f̂^{kj}| (1 + kj^2)^{j/2} ) ] e^{2πi kj t} .

Compute

kφj k^2_{L2n} ≈ (1 + kj^2)^n / (1 + kj^2)^j = (1 + kj^2)^{n−j} ,

where ≈ denotes equivalence of norms. For each fixed n, since each kj^2 ≥ 1, then

lim_{j→∞} kφj k^2_{L2n} = 0,

and so φj → 0 in C ∞ (S1 , C). On the other hand,

f (φj ) = f̂^{kj} · \overline{f̂^{kj}} / ( |f̂^{kj}| (1 + kj^2)^{j/2} ) = |f̂^{kj}| / (1 + kj^2)^{j/2} ≥ 1.
So f (φj ) ↛ 0 = f (0) = f (lim φj ), where φj → 0 in C ∞ (S1 , C). This contradicts the continuity of f , completing the proof.
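A standard example (not worked out in the notes, but consistent with Proposition 88): the Dirac distribution δ(φ) = φ(0) is continuous on C ∞ (S1 , C), since φ(0) = Σ_{k∈Z} φ̂^k and this Fourier series converges uniformly. Its coefficients are δ̂^k = δ(e^{2πikt}) = 1 for all k, a polynomially bounded sequence (take m = 0), and

```latex
\delta(\varphi) = \varphi(0) = \sum_{k \in \mathbb{Z}} \hat\varphi^k, \qquad
\hat\delta^k = 1, \qquad
\sum_{k \in \mathbb{Z}} |\hat\delta^k|^2 (1+k^2)^{-1}
  = \sum_{k \in \mathbb{Z}} \frac{1}{1+k^2} < \infty,
```

so δ ∈ L2−1 (S1 , C) ⊂ D′ (S1 , C).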