Real Analysis II

John Loftin (partially supported by NSF Grant DMS-0405873)

May 4, 2013

1 Spaces of functions

1.1 Banach spaces

Many natural spaces of functions form infinite-dimensional vector spaces. Examples are the space of polynomials and the space of smooth functions. If we are interested in solving differential equations, then, it is important to understand analysis in infinite-dimensional vector spaces (over ℝ or ℂ). First of all, we should recognize the following straightforward fact about finite-dimensional vector spaces:

Homework Problem 1. Let x = (x^1, ..., x^m) denote a point in ℝ^m, and let {x_n} = {(x_n^1, ..., x_n^m)} be a sequence of points in ℝ^m. Then x_n → x if and only if x_n^i → x^i for all i = 1, ..., m.

(Recall the standard metric on ℝ^m is given by |x − y|, where the norm |·| is given by |x| = √((x^1)² + ··· + (x^m)²).)

Thus for taking limits in ℝ^m, we could dispense with the notion of taking limits using the metric on ℝ^m, and simply define x_n → x by x_n^i → x^i for each i = 1, ..., m. This reflects the fact that there is only one natural topology on a finite-dimensional vector space: that given by the standard norm.

For infinite-dimensional vector spaces, say with a countable basis, so that x = (x^1, x^2, ...), it is possible to define a topology by x_n → x if and only if each x_n^i → x^i. It turns out that this is not usually the most useful way to define limits in infinite-dimensional spaces, however (though a related construction is used in defining the topology of Fréchet spaces).

Finite-dimensional vector spaces are also all complete with respect to their standard norm (in other words, they are all Banach spaces). Given a norm on an infinite-dimensional vector space, however, completeness must be proved. There are many examples of Banach function spaces: on a measure space, the L^p spaces of functions are all Banach spaces for 1 ≤ p ≤ ∞.
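The defining triangle inequality for the L^p norms (Minkowski's inequality) can be sanity-checked numerically. The following Python sketch is our own illustration, not part of the text: it approximates L^p([0, 1]) norms by Riemann sums for two sample functions of our own choosing.

```python
# A numerical check (our own example functions and grid) of the
# triangle (Minkowski) inequality ||f + g||_p <= ||f||_p + ||g||_p,
# approximating L^p([0,1]) norms by Riemann sums.
import math

N = 10000
grid = [i / N for i in range(N + 1)]

def lp_norm(f, p):
    # Riemann-sum approximation of (integral of |f|^p)^(1/p) over [0, 1]
    return (sum(abs(f(x)) ** p for x in grid) / (N + 1)) ** (1 / p)

f = lambda x: math.sin(7 * x)
g = lambda x: x**2 - 0.5

for p in [1, 2, 4]:
    lhs = lp_norm(lambda x: f(x) + g(x), p)
    # Minkowski holds exactly for the discrete sums as well
    assert lhs <= lp_norm(f, p) + lp_norm(g, p) + 1e-12
```

Since Minkowski's inequality holds exactly for finite sums, the discretized check passes for every grid, not merely approximately.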
Also, on a metric space X, the space of all bounded continuous functions C^0(X) is a Banach space under the norm

‖f‖_{C^0(X)} = sup_{x∈X} |f(x)|.

The L^p and C^0 spaces form the basis of most other useful Banach spaces, with extensions typically provided by measuring not just the functions themselves, but also their partial derivatives (as in Sobolev and C^k spaces) or their difference quotients (Hölder spaces).

Completeness of a metric space of course means that every Cauchy sequence has a (unique) limit. More roughly, this means that any sequence that should converge, in that its elements are becoming infinitesimally close to each other, will converge to a limit in the space. As we will see, taking such limits is a powerful way to construct solutions to analytic problems.

Unfortunately, many of the most familiar spaces of functions (such as the smooth functions) do not have the structure of a Banach space, and so it is difficult to ensure that a given limit of smooth functions is smooth. In fact we have the following theorem, which we state without proof:

Theorem 1. On ℝ^n equipped with Lebesgue measure, the space C_0^∞(ℝ^n) of smooth functions with compact support is dense in L^p(ℝ^n) for all 1 ≤ p < ∞.

In other words, the completion of the space of smooth functions with compact support on ℝ^n with respect to the L^p norm is simply the space of all L^p functions for 1 ≤ p < ∞. If we are working in L², for example, it is possible for the limit of smooth functions to be quite non-smooth: there are many L² functions which are discontinuous everywhere. This poses a potential problem if the limit we have produced is supposed to be a solution to a differential equation. In particular, such a limit may be nowhere differentiable.
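To make the loss of smoothness concrete, here is a small numerical sketch in Python (the particular mollified step functions and grids are our own choices, not from the text): the smooth functions f_n(x) = (1 + tanh(nx))/2 converge in the L² norm on [−1, 1] to the discontinuous Heaviside step, while the sup-norm distance never goes below 1/2.

```python
import math

# Smooth approximations f_n(x) = (1 + tanh(n x)) / 2 to the Heaviside
# step H(x) = 0 for x < 0, 1 for x >= 0, on the interval [-1, 1].

def f_n(x, n):
    return 0.5 * (1.0 + math.tanh(n * x))

def heaviside(x):
    return 1.0 if x >= 0 else 0.0

def l2_dist(n, pts=20001):
    # Riemann-sum approximation of (integral over [-1,1] of |f_n - H|^2)^(1/2)
    h = 2.0 / (pts - 1)
    s = sum((f_n(-1.0 + i * h, n) - heaviside(-1.0 + i * h)) ** 2
            for i in range(pts))
    return math.sqrt(s * h)

# The L^2 distance shrinks as n grows ...
assert l2_dist(100) < l2_dist(10) < l2_dist(1)

# ... but the C^0 (sup-norm) distance does not: |f_n(0) - H(0)| = 1/2
# for every n, so the limit cannot be recovered as a uniform limit.
assert abs(f_n(0.0, 1000) - 0.5) < 1e-12
```

This is the phenomenon behind Theorem 1: L^p convergence controls the size of the error set, not pointwise behavior, so smoothness is not preserved in the limit.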
Some of our goals, then, are to understand (1) how to make sense of taking derivatives of functions which are not classically differentiable (the theory of distributions and weak derivatives), and (2) how to show that a limit function actually has enough derivatives to solve the equation (bootstrapping).

Theorem 1 reminds us that the L^p Banach spaces have a very large overlap, which of course includes many more functions than the smooth functions with compact support. In particular, it is often useful to take the point of view that these Banach function spaces are not so much different spaces but different tools to study either the space of all functions or (via the completion process) the space of only very nice functions (e.g., smooth functions of compact support). In particular, two function spaces which are very closely related to each other are L^∞ and C^0. As we will see below, they have essentially the same norm. First of all, we show that C^0(X) is a Banach space for any metric space X.

1.2 The Banach space C^0

Given a metric space X, define

C^0(X) = {f : X → ℝ : f is continuous and sup_X |f| < ∞}.

Define the norm

‖f‖_{C^0(X)} = sup_X |f|.

It is straightforward to verify that ‖·‖_{C^0} satisfies the requirements for a norm:

• ‖f‖_{C^0} = 0 ⟺ f ≡ 0,
• ‖λf‖_{C^0} = |λ| ‖f‖_{C^0},
• ‖f + g‖_{C^0} ≤ ‖f‖_{C^0} + ‖g‖_{C^0}.

Remark. If f_i → f in C^0(X), then we say f_i → f uniformly on X, and C^0(X) convergence is the same as uniform convergence.

The main thing to check is that the norm gives C^0(X) the structure of a complete metric space:

Proposition 1. For any metric space X, C^0(X) is a Banach space with norm ‖·‖_{C^0}.

Proof. We simply need to check that the metric induced on C^0(X) is complete. Let d denote the metric on X, and consider a Cauchy sequence {f_i} ⊂ C^0(X). In other words, for all ε > 0, there is an N so that n, m > N implies ‖f_n − f_m‖_{C^0} < ε. By the definition of the norm, this is equivalent to |f_n(x) − f_m(x)| < ε for all x ∈ X.
Now for each x ∈ X, {f_i(x)} ⊂ ℝ is a Cauchy sequence, and since ℝ is complete, there is a limit f_∞(x) = lim_i f_i(x). So we have produced a limit function f_∞; we need to show that ‖f_i − f_∞‖_{C^0} → 0 and f_∞ ∈ C^0(X).

The first statement is straightforward: for all ε > 0, there is an N so that for all n, m > N and all x ∈ X, |f_n(x) − f_m(x)| < ε. Now let m → ∞ to see that |f_n(x) − f_∞(x)| ≤ ε. So we have that for all ε > 0, there is an N so that for all n > N and all x ∈ X, |f_n(x) − f_∞(x)| ≤ ε. Since this is true for all x ∈ X, we have

‖f_n − f_∞‖_{C^0} = sup_{x∈X} |f_n(x) − f_∞(x)| ≤ ε,

and so ‖f_i − f_∞‖_{C^0} → 0.

We still need to prove that the limit function f_∞ is continuous. So let x ∈ X and choose ε > 0. Then there is an N so that for n > N, ‖f_n − f_∞‖_{C^0} < ε. By the previous paragraph and the definition of ‖·‖_{C^0}, |f_n(x) − f_∞(x)| < ε and |f_n(y) − f_∞(y)| < ε for all y ∈ X. Choose a particular n > N; since f_n is continuous at x, there is a δ > 0 so that |f_n(x) − f_n(y)| < ε for all y with d(x, y) < δ. Then for such y in a δ-ball around x,

|f_∞(x) − f_∞(y)| = |[f_∞(x) − f_n(x)] + [f_n(x) − f_n(y)] + [f_n(y) − f_∞(y)]|
≤ |f_∞(x) − f_n(x)| + |f_n(x) − f_n(y)| + |f_n(y) − f_∞(y)|
< ε + ε + ε = 3ε.

So we have proved that for all ε > 0 and x ∈ X, there is a δ > 0 so that d(x, y) < δ ⟹ |f_∞(x) − f_∞(y)| < 3ε. This proves f_∞ is continuous. □

The last bit of the proof can be remembered as this: any uniform limit of continuous functions is continuous.

Remark. The previous proposition works as well for functions whose range is the complex numbers ℂ, or a vector space ℝ^n, or in fact any Banach space B. The proof is the same. In this last case, we refer to the Banach space C^0(X; B) of continuous functions from X into B.

Consider an open set Ω ⊂ ℝ^n. On Ω, the C^0 norm is essentially the same as the L^∞ norm, but is simpler to define because we can consider functions as elements of C^0, while we need equivalence classes of functions to define L^∞.
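Proposition 1 can be watched in action numerically. In the Python sketch below (the series and grid are our own illustration, not from the text), the partial sums s_N(x) = Σ_{k=1}^N sin(kx)/k² form a Cauchy sequence in C^0(ℝ), since the tails are dominated by the convergent series Σ 1/k² independently of x; by Proposition 1 the uniform limit exists and is continuous.

```python
import math

# Partial sums of sum_{k>=1} sin(k x)/k^2, uniformly Cauchy on all of R:
# ||s_n - s_m||_{C^0} <= sum_{k=m+1}^{n} 1/k^2, independently of x.

def s(N, x):
    return sum(math.sin(k * x) / k**2 for k in range(1, N + 1))

grid = [i * 0.01 for i in range(-314, 315)]   # a grid covering [-pi, pi]

def sup_dist(n, m):
    # numerical stand-in for ||s_n - s_m||_{C^0} on the grid
    return max(abs(s(n, x) - s(m, x)) for x in grid)

# The C^0 distance between partial sums is controlled by the tail of
# sum 1/k^2, uniformly in x -- exactly the Cauchy property Proposition 1
# converts into a continuous limit function.
for m, n in [(10, 20), (50, 100), (100, 200)]:
    tail = sum(1.0 / k**2 for k in range(m + 1, n + 1))
    assert sup_dist(n, m) <= tail
```

The uniform tail bound is an instance of the Weierstrass M-test; the point here is only that the Cauchy estimate holds with an N(ε) independent of x.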
In fact, more is true. Let Ω inherit the standard metric and Lebesgue measure from ℝ^n. For a measurable function f : Ω → ℝ, let [f] be the equivalence class whose members are all functions from Ω → ℝ which agree with f almost everywhere.

Proposition 2. The map Φ : C^0(Ω) → L^∞(Ω) given by Φ(f) = [f] is one-to-one and preserves the norm.

Proof. First of all, note that it follows immediately from the definitions that for f ∈ C^0(Ω), Φ(f) ∈ L^∞(Ω). Also, to show Φ preserves the norm, we should show that ‖f‖_{C^0} = ‖Φ(f)‖_{L^∞}.

The proof hinges on the simple fact that every full-measure subset V of Ω is dense in Ω. (Recall V ⊂ Ω has full measure if Ω \ V has Lebesgue measure zero.) This fact may be proved as follows: let V ⊂ Ω have full measure. Then there is no open ball contained in Ω \ V (since open balls have positive measure). This shows V is dense in Ω. (Question: We need to use that Ω is an open subset of ℝ^n in this paragraph. Where did we use that Ω is open?)

Now we prove the map Φ is injective. If f and g are in C^0(Ω) and [f] = [g], then by definition f ≡ g on a set V of full measure. Let x ∈ Ω. Since V is dense, there is a sequence x_n → x with x_n ∈ V. Then

f(x) = f(lim_n x_n) = lim_n f(x_n) = lim_n g(x_n) = g(lim_n x_n) = g(x)

since f and g are continuous and f(x_n) = g(x_n). So f and g coincide at each point of Ω, and so f = g in C^0(Ω).

Finally, we show that for f ∈ C^0(Ω), ‖f‖_{C^0} = ‖f‖_{L^∞}. Let μ denote Lebesgue measure and compute (recall we often write ‖f‖_{L^∞} instead of the more correct ‖[f]‖_{L^∞} = ‖Φ(f)‖_{L^∞})

‖f‖_{L^∞(Ω)} = inf{a : |f(x)| ≤ a for almost every x ∈ Ω}
= inf{a : μ{x : |f(x)| > a} = 0}.

But μ{x : |f(x)| > a} = 0 implies that {x : |f(x)| > a} = ∅. (Proof: If the set is not empty, it is a nonempty open subset of Ω since |f| is continuous. The only open subset of Ω with measure zero is the empty set.)
So now

‖f‖_{L^∞(Ω)} = inf{a : μ{x : |f(x)| > a} = 0}
= inf{a : {x : |f(x)| > a} = ∅}
= inf{a : |f(x)| ≤ a for all x ∈ Ω}
= sup_{x∈Ω} |f(x)| = ‖f‖_{C^0(Ω)}. □

Remark. The previous Proposition is true for any measurable subset Ω of ℝ^n with the following property: every nonempty open subset of Ω has positive measure.

Remark. The map Φ from C^0(Ω) to L^∞(Ω) is far from being onto. A typical discontinuous function g cannot be changed on a set of measure zero to be continuous. The following homework problem is to show this is the case with the Heaviside function.

Homework Problem 2. Let g(x) be the Heaviside function on ℝ. In other words, let g(x) = 0 if x < 0 and g(x) = 1 if x ≥ 0.

(a) Show there is no function in C^0(ℝ) which is equal to g almost everywhere.

(b) Show that there is no sequence of functions f_n ∈ C^0(ℝ) which satisfies f_n → g in L^∞(ℝ).

Hint for (b): Show that if f_n → g in L^∞(ℝ), then {f_n} is a Cauchy sequence in C^0(ℝ). Then use Proposition 1 and show the resulting limit function f_∞ ∈ C^0(ℝ) must be equal to g almost everywhere. (This amounts to showing that Φ(C^0) is a closed subspace of L^∞.) Derive a contradiction.

1.3 Quantifiers

It is worth taking the time to look in some detail at C^0 convergence, and to compare it to pointwise convergence. By contrast with pointwise convergence, C^0 convergence is often called uniform convergence.

For a metric space X, f_n → f in C^0(X) if for all ε > 0, there is an N so that

n > N ⟹ ‖f_n − f‖_{C^0(X)} < ε.

In other words, for all ε > 0, there is an N so that

n > N ⟹ sup_{x∈X} |f_n(x) − f(x)| < ε.

So then f_n → f in C^0(X) implies that for all ε > 0, there is an N so that for all x ∈ X,

n > N ⟹ |f_n(x) − f(x)| < ε.

A few easy manipulations imply in fact the following

Lemma 3. Let X be a metric space and let f_n ∈ C^0(X). Then f_n → f in C^0(X) if and only if for every ε > 0, there is an N = N(ε) so that for all x ∈ X,

n > N ⟹ |f_n(x) − f(x)| < ε.

Homework Problem 3. Prove Lemma 3.
Since C^0(X) is a Banach space, we know that the limit function f ∈ C^0(X) as well, and thus the uniform limit of continuous functions is continuous.

C^0 convergence is called uniform convergence because the N in Lemma 3 depends only on ε > 0 and not on x ∈ X: thus N is uniform over all x ∈ X. We contrast this with pointwise convergence. If f_n are functions on X, then f_n → f pointwise if for all ε > 0 and x ∈ X, there is an N = N(ε, x) so that

n > N ⟹ |f_n(x) − f(x)| < ε.

The difference between pointwise and uniform convergence is subtle but very important: in pointwise convergence N = N(ε, x) may depend on ε and x, while in uniform convergence N = N(ε) only depends on ε and is independent of x. We have belabored this point because it is one of the major issues in analysis: keeping track of which constants, or quantifiers, depend on which other quantifiers. (It is even better to have explicit bounds (estimates) on the behavior of quantifiers with respect to each other.) Of course it is desirable (though not always possible) to have more uniform dependence of quantifiers, as we see in the following standard example. We have seen that the uniform limit of continuous functions is continuous. On the other hand, a pointwise limit of continuous functions may not be:

Example 1. Consider X = [0, 1] and f_n(x) = x^n. Then f_n → f pointwise on [0, 1], where

f(x) = 0 for x ∈ [0, 1),   f(x) = 1 for x = 1.

So the pointwise limit f is discontinuous, and thus we see that f_n does not converge to f uniformly.

1.4 Derivatives

The theory of derivatives in one variable is fairly straightforward: if a function f : ℝ → ℝ is differentiable at p (i.e., f′(p) exists), then f must be continuous at p. For functions of more than one variable, however, consider the following example:

Example 2. The function

f(x, y) = xy/(x² + y²) for (x, y) ≠ (0, 0),   f(0, 0) = 0,

has first partial derivatives everywhere but is not even continuous at (0, 0).
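Example 2 can be checked numerically. In the short Python sketch below (the step sizes are our own choices), both partials of f at the origin are 0, since f vanishes identically on the coordinate axes, yet f ≡ 1/2 along the diagonal, so f has no limit at (0, 0).

```python
# Example 2: f(x, y) = xy / (x^2 + y^2) for (x, y) != (0, 0), f(0, 0) = 0.

def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x**2 + y**2)

# Both first partials at the origin exist and equal 0, since f
# vanishes identically on both coordinate axes.
h = 1e-8
fx = (f(h, 0.0) - f(0.0, 0.0)) / h   # difference quotient in x
fy = (f(0.0, h) - f(0.0, 0.0)) / h   # difference quotient in y
assert fx == 0.0 and fy == 0.0

# But f is not continuous at (0, 0): along the diagonal y = x,
# f(t, t) = 1/2 for every t != 0, which does not approach f(0, 0) = 0.
for t in [1e-1, 1e-5, 1e-10]:
    assert abs(f(t, t) - 0.5) < 1e-12
```

The difference quotients along the axes are exactly zero for every h, so the partials exist; the diagonal values show why their existence alone is too weak a notion of differentiability.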
Even though f has all its first partial derivatives at (0, 0), we do not consider f to be differentiable at (0, 0). For functions of more than one variable, we introduce the following definition of differentiability, which is stronger than just the existence of all the partial derivatives. Instead of ℝ-valued functions, we consider the slightly more general case of maps from ℝ^n to ℝ^m. A basic reference is Spivak, Calculus on Manifolds, Chapter 2.

Let O ⊂ ℝ^n be a domain, and let f = (f^1, ..., f^m) : O → ℝ^m. Then f is differentiable at a point a ∈ O if there is a linear map Df(a) : ℝ^n → ℝ^m which satisfies

lim_{h→0} |f(a + h) − f(a) − Df(a)(h)| / |h| = 0,

where h ∈ ℝ^n. Df(a) is called the derivative, or total derivative, of f at a.

Lemma 4. In terms of the standard bases of ℝ^n and ℝ^m, Df(a) is written as the Jacobian matrix

Df(a) = (∂f^i/∂x^j (a)),   i = 1, ..., m,  j = 1, ..., n.

In particular, if f is differentiable at a, then all the partial derivatives ∂f^i/∂x^j exist at a.

Proof. Write Df(a) as the matrix (λ^i_j). Also consider a path h = (0, ..., k, ..., 0), where k → 0 sits in the j-th slot. (In other words, h^l = δ^l_j k, where δ^l_j is the Kronecker delta, which is 1 if l = j and 0 otherwise.) We also use Einstein's summation convention: in n space, this convention requires that any repeated index which appears in both up and down positions (such as the l in the last two lines below) is summed from 1 to n. Compute

∂f^i/∂x^j (a) = lim_{k→0} [f^i(a^1, ..., a^j + k, ..., a^n) − f^i(a)] / k
= lim_{k→0} ([f^i(a^1, ..., a^j + k, ..., a^n) − f^i(a) − λ^i_l h^l] + λ^i_l h^l) / k
= 0 + lim_{k→0} λ^i_l δ^l_j k / k
= λ^i_l δ^l_j = λ^i_j.

The key step, going from the second to the third line, follows from the assumption that f is differentiable at a. □

Another important result with essentially the same proof concerns directional derivatives.
For a vector v = (v^j) ∈ ℝ^n, the directional derivative of f at a in the direction v is the vector

D_v f(a) = lim_{t→0} [f(a + tv) − f(a)] / t.

(Note we do not require ‖v‖ = 1 to define the directional derivative.) We have the following lemma:

Lemma 5. If f is differentiable at a, then the directional derivative D_v f(a) exists and

D_v f(a) = v^j ∂f/∂x^j.

As Example 2 above shows, the converse of Lemma 4 is not true without extra assumptions on the partial derivatives. The following proposition gives an easy criterion for a function to be differentiable:

Proposition 6. If f = (f^1, ..., f^m) has continuous first partial derivatives ∂f^i/∂x^j on a neighborhood of a, then f is differentiable at a.

Proof. For a component function f^i, write

f^i(a + h) − f^i(a) = [f^i(a^1 + h^1, a^2, ..., a^n) − f^i(a^1, a^2, ..., a^n)]
+ [f^i(a^1 + h^1, a^2 + h^2, ..., a^n) − f^i(a^1 + h^1, a^2, ..., a^n)]
+ ··· + [f^i(a^1 + h^1, ..., a^n + h^n) − f^i(a^1 + h^1, ..., a^{n−1} + h^{n−1}, a^n)].

Now consider the first term as a difference of the function f^i(x^1, a^2, ..., a^n) of the first variable x^1 alone. The Mean Value Theorem shows that there is a b^1 between a^1 and a^1 + h^1 so that

f^i(a^1 + h^1, a^2, ..., a^n) − f^i(a^1, a^2, ..., a^n) = h^1 ∂f^i/∂x^1 (b^1, a^2, ..., a^n).

Similarly, the j-th term of the sum equals

h^j ∂f^i/∂x^j (a^1 + h^1, ..., a^{j−1} + h^{j−1}, b^j, a^{j+1}, ..., a^n)

for some b^j between a^j and a^j + h^j. So if we set c_j = (a^1 + h^1, ..., a^{j−1} + h^{j−1}, b^j, a^{j+1}, ..., a^n), then we have

f^i(a + h) − f^i(a) = Σ_{j=1}^n h^j ∂f^i/∂x^j (c_j),

where each c_j → a as h → 0. So compute

lim_{h→0} |f^i(a + h) − f^i(a) − Σ_j ∂f^i/∂x^j (a) h^j| / |h|
= lim_{h→0} |Σ_j h^j [∂f^i/∂x^j (c_j) − ∂f^i/∂x^j (a)]| / |h|
≤ lim_{h→0} Σ_j (|h^j| / |h|) |∂f^i/∂x^j (c_j) − ∂f^i/∂x^j (a)|
≤ lim_{h→0} Σ_j |∂f^i/∂x^j (c_j) − ∂f^i/∂x^j (a)|
= 0

since each ∂f^i/∂x^j is assumed to be continuous at a.
So we have proved that each component function f^i is differentiable at a. To show f is differentiable, just note

|f(a + h) − f(a) − (∂f/∂x^j)(a) h^j| / |h| ≤ Σ_{i=1}^m |f^i(a + h) − f^i(a) − (∂f^i/∂x^j)(a) h^j| / |h|,

which goes to 0 as h → 0. □

Recall a function is (locally) C^1 if its first partial derivatives are continuous. The previous Proposition 6 shows that such functions are differentiable, and Lemma 5 then shows that directional derivatives work as expected for C^1 functions.

Now, for functions f on an open subset Ω of ℝ^m, consider the norm

‖f‖_{C^1(Ω)} = ‖f‖_{C^0(Ω)} + Σ_{i=1}^m ‖∂f/∂x^i‖_{C^0(Ω)}

and the space

C^1(Ω) = {f : Ω → ℝ : f, ∂_1 f, ..., ∂_m f are bounded and continuous}.

Similarly, we can consider ℝ^p-valued C^1 functions, the difference being that the functions f, ∂_i f take bounded values in ℝ^p.

Proposition 7. On any open set Ω ⊂ ℝ^m, C^1(Ω, ℝ^p) is a Banach space.

Proof. It is straightforward to check ‖·‖_{C^1} is a norm. Since ‖f‖_{C^1} ≥ ‖f‖_{C^0} and ‖f‖_{C^1} ≥ ‖∂f/∂x^j‖_{C^0}, for any Cauchy sequence {f_n} in C^1, the sequences {f_n} and {∂f_n/∂x^j} are Cauchy in C^0. Therefore, since C^0 is a Banach space, there are uniform limits

f_∞ = lim_n f_n,   g_i = lim_n ∂f_n/∂x^i,   i = 1, ..., m,

and f_∞, g_i ∈ C^0. Since

‖f‖_{C^1} = ‖f‖_{C^0} + Σ_{i=1}^m ‖∂f/∂x^i‖_{C^0},   (1)

it suffices to prove that

∂f_∞/∂x^i = g_i,   i = 1, ..., m.

As usual, we recognize that integrating has better properties than differentiating. For x ∈ Ω, choose x₀ = x − (0, ..., k, ..., 0), where the k > 0 is in the i-th slot. Since Ω is open, we may choose k small enough so that the line segment from x₀ to x is contained in Ω. Compute

f_∞(x) = lim_n f_n(x)
= lim_n [ f_n(x₀) + ∫_{x^i−k}^{x^i} ∂f_n/∂x^i (x^1, ..., x^{i−1}, y, x^{i+1}, ..., x^m) dy ]
= f_∞(x₀) + ∫_{x^i−k}^{x^i} g_i(x^1, ..., x^{i−1}, y, x^{i+1}, ..., x^m) dy.   (2)

The key step in the computation is the last one: f_n(x₀) → f_∞(x₀) is easy, and the integral converges by the Dominated Convergence Theorem: since g_i ∈ C^0, there is a constant C so that |g_i| ≤ C on Ω. Moreover, since ∂f_n/∂x^i → g_i in C^0, there is an N so that |∂f_n/∂x^i − g_i| ≤ 1 for all n ≥ N. Thus the ∂f_n/∂x^i are all bounded by the integrable function C + 1, and the Dominated Convergence Theorem applies.

Now we can differentiate (2) with respect to x^i and we see that ∂f_∞/∂x^i = g_i at each x ∈ Ω. This completes the proof. □

The last part of the proof is of independent interest. We record it as

Proposition 8. Let f_n be C^1 functions on a domain Ω ⊂ ℝ^m. If f_n → f uniformly and ∂f_n/∂x^i → g_i uniformly for i = 1, ..., m, then g_i = ∂f/∂x^i.

Remark. We can also define C^k(Ω, ℝ^p) to be the space of all functions f : Ω → ℝ^p so that f and all its partial derivatives up to order k are continuous and bounded. The norm is given by

‖f‖_{C^k} = Σ_{|α|≤k} ‖∂_α f‖_{C^0},   (3)

where α = (α_1, ..., α_m), each α_i ≥ 0, |α| = α_1 + ··· + α_m, and

∂_α f = ∂^{|α|} f / (∂x^1)^{α_1} ··· (∂x^m)^{α_m}

(if some α_i = 0, then there is no differentiation with respect to x^i). We can use the same proof as above to conclude that C^k is a Banach space. In particular, we can apply the theorem to F = (f, f_{,1}, ..., f_{,n}) and then relate ‖F‖_{C^1} to ‖f‖_{C^2} to provide an inductive step. C^∞ is not a Banach space, as the analog of (3) would involve an infinite sum.

We've used the following problem implicitly a few times above.

Homework Problem 4. Show that if f : ℝ^n → ℝ^m is differentiable at a point a, then it is continuous at a.

Homework Problem 5. Let f be a real-valued function defined on a domain in ℝ². Show that if the second mixed partials f_{,12} = ∂²f/∂x¹∂x² and f_{,21} = ∂²f/∂x²∂x¹ are continuous in a neighborhood of a point y, then

∂²f/∂x¹∂x² (y) = ∂²f/∂x²∂x¹ (y).

Hint: If the two are not equal, assume without loss of generality that the difference f_{,12} − f_{,21} > 0 at y. Then it must be positive on a rectangular neighborhood. Integrate this quantity over the rectangular neighborhood, and use Fubini's Theorem and the Fundamental Theorem of Calculus to arrive at a contradiction.

Finally, we introduce the Chain Rule. We need the following lemma first:

Lemma 9. Let A : ℝ^n → ℝ^m be a linear map. Then there is a constant C = C(A) so that |Ax| ≤ C|x| for all x ∈ ℝ^n.

Homework Problem 6. Prove Lemma 9. Hint: write down Ax in terms of the matrix entries of A.

Proposition 10 (Chain Rule). Let g : O → ℝ^n, f : U → O, where O ⊂ ℝ^m and U ⊂ ℝ^l are domains. Assume f is differentiable at a ∈ U, and g is differentiable at f(a) ∈ O. Then g ∘ f is differentiable at a, with derivative the composition of linear maps

D(g ∘ f)(a) = Dg(f(a)) ∘ Df(a).

In terms of partial derivatives, this is equivalent to

∂g^p/∂x^i = (∂g^p/∂y^j)(∂y^j/∂x^i),

where {x^i} are coordinates on ℝ^l, {y^j} are coordinates on ℝ^m, and we follow the usual rules of Leibniz notation and Einstein summation.

Proof. Let A = Df(a), B = Dg(f(a)). Consider the remainder terms in the definition of differentiable maps: for h ∈ ℝ^l, k ∈ ℝ^m,

φ(h) = f(a + h) − f(a) − A(h),
ψ(k) = g(f(a) + k) − g(f(a)) − B(k),
ρ(h) = (g ∘ f)(a + h) − (g ∘ f)(a) − (B ∘ A)(h).

Then since f and g are differentiable,

lim_{h→0} |φ(h)| / |h| = 0,   (4)
lim_{k→0} |ψ(k)| / |k| = 0,   (5)

and we want to show that

lim_{h→0} |ρ(h)| / |h| = 0.

So compute

ρ(h) = g(f(a + h)) − g(f(a)) − B(A(h))
= g(f(a + h)) − g(f(a)) − B(f(a + h) − f(a) − φ(h))
= [g(f(a + h)) − g(f(a)) − B(f(a + h) − f(a))] + B(φ(h))
= ψ(f(a + h) − f(a)) + B(φ(h)).

So then

|ρ(h)| / |h| ≤ |ψ(f(a + h) − f(a))| / |h| + |B(φ(h))| / |h|.

|B(φ(h))|/|h| → 0 as h → 0 by (4) and Lemma 9. On the other hand, (5) shows that for all ε > 0 there is a δ so that

|k| < δ ⟹ |ψ(k)| ≤ ε|k|.

Therefore if |f(a + h) − f(a)| < δ (which can be achieved if |h| < γ for some γ > 0, since f is continuous),

|ψ(f(a + h) − f(a))| / |h| ≤ ε |f(a + h) − f(a)| / |h|
≤ ε (|A(h)|/|h| + |φ(h)|/|h|).

Now if we let h → 0, using (4) and Lemma 9,

lim sup_{h→0} |ψ(f(a + h) − f(a))| / |h| ≤ εC.

Now we may let ε → 0 to show that |ρ(h)|/|h| → 0 as h → 0. □

1.5 Contraction mappings

Another tool we need is a basic fact about complete metric spaces, the Contraction Mapping Theorem. A fixed point of a map f : X → X is a point x ∈ X so that f(x) = x. For a metric space X with metric d, a contraction map is a map g : X → X so that there is a constant λ ∈ (0, 1) for which

d(g(x), g(y)) ≤ λ d(x, y) for all x, y ∈ X.

Remark. It is important that the constant λ < 1 is independent of x and y in X. As we'll see below in a homework exercise, the following theorem is false if we let λ depend on x and y.

Theorem 2 (Contraction Mapping). Any contraction mapping on a complete metric space has a unique fixed point.

Proof. As above, denote our metric space by X with metric d, and let λ ∈ (0, 1) be the constant for the contraction map g: for all x, y ∈ X, d(g(x), g(y)) ≤ λ d(x, y).

First we prove uniqueness. If x and y are fixed points of g (so g(x) = x, g(y) = y), then

d(x, y) = d(g(x), g(y)) ≤ λ d(x, y).

So (1 − λ) d(x, y) ≤ 0. Since λ < 1 and d(x, y) ≥ 0 (since X is a metric space), we must have d(x, y) = 0 and so x = y (again since X is a metric space).

To prove existence of the fixed point, consider any point x₀ ∈ X, and define iterates inductively by x_{n+1} = g(x_n) for all n ≥ 0. We claim x_n is a Cauchy sequence and the limit x_∞ of x_n is the fixed point. For n ≥ m ≥ 0, compute

d(x_n, x_m) ≤ d(x_n, x_{n−1}) + ··· + d(x_{m+1}, x_m)
= d(g(x_{n−1}), g(x_{n−2})) + ··· + d(g(x_m), g(x_{m−1}))
≤ λ d(x_{n−1}, x_{n−2}) + ··· + λ d(x_m, x_{m−1})
≤ λ² d(x_{n−2}, x_{n−3}) + ··· + λ² d(x_{m−1}, x_{m−2})
≤ ··· ≤ λ^{n−1} d(x_1, x_0) + ··· + λ^m d(x_1, x_0)
= d(x_1, x_0) Σ_{i=m}^{n−1} λ^i ≤ d(x_1, x_0) Σ_{i=m}^{∞} λ^i = d(x_1, x_0) λ^m / (1 − λ).

(Note that in this computation we've used the exact sum of the geometric series, and it is crucial that λ ∈ (0, 1): the geometric series diverges for λ ≥ 1.)

So if N is a positive integer, then for all n, m > N, d(x_n, x_m) ≤ d(x_1, x_0) λ^N / (1 − λ), and this last quantity goes to 0 as N → ∞. Thus {x_n} is a Cauchy sequence, which has a limit x_∞ ∈ X since X is a complete metric space.

Now we prove that x_∞ is a fixed point. Since x_∞ = lim_i x_i = lim_i x_{i+1}, we have

g(x_∞) = g(lim_i x_i) = lim_i g(x_i) = lim_i x_{i+1} = x_∞,

and so x_∞ is a fixed point. One point to note is that we have interchanged g with lim, which is valid only if g is continuous (this is a homework problem below). □

Homework Problem 7. Show any contraction map is continuous.

Homework Problem 8. Newton's method is an iterative method for finding zeros of differentiable functions. For an initial x₀, we proceed by the recursive definition

x_{i+1} = x_i − f(x_i)/f′(x_i).

Then the limit lim x_n should produce a zero of the function f. A differentiable function f : ℝ → ℝ has a nondegenerate zero at x if f(x) = 0 and f′(x) ≠ 0. Assume f : ℝ → ℝ is a locally C² function (i.e., f″ is continuous on all of ℝ). Show that every nondegenerate zero x of f has a neighborhood N_x so that for any initial x₀ ∈ N_x, Newton's method converges to x. Hints:

(a) The main point is to exhibit the Newton's method iteration as a contraction map on a complete metric space (recall a closed subset of any complete metric space is complete). You must find an appropriately small neighborhood of x on whose closure Newton's method is a contraction map.
(b) You will need the following lemma: For a C^1 function g : ℝ → ℝ and y ≠ z ∈ [a, b],

|g(y) − g(z)| / |y − z| ≤ max_{w∈[a,b]} |g′(w)|.

(c) Show that any fixed point of Newton's method is a zero.

(d) Show the zero you have produced via Newton's method must be the original zero x.

1.6 Differentiating under the Integral

Proposition 11. Let f = f(y, x) be a locally C^1 real-valued function for y ∈ ℝ^n, x ∈ O an open subset of ℝ^m. Then on a measurable Ω ⊂⊂ O ⊂ ℝ^m equipped with Lebesgue measure dx,

∂/∂y^i ∫_Ω f(y, x) dx = ∫_Ω ∂f/∂y^i (y, x) dx.

Moreover, ∫_Ω f(y, x) dx is C^1 as a function of y.

Remark. Ω ⊂⊂ O means that the closure Ω̄ in ℝ^m is a compact subset of O.

Proof. Compute. Let e_i be the standard i-th basis vector on ℝ^n. Then

∂/∂y^i ∫_Ω f(y, x) dx = lim_{k→0} (1/k) [ ∫_Ω f(y + k e_i, x) dx − ∫_Ω f(y, x) dx ]
= lim_{k→0} ∫_Ω [f(y + k e_i, x) − f(y, x)] / k dx.

Clearly as k → 0, the integrand goes to ∂f/∂y^i (y, x) pointwise. We need to show that the integrands are bounded in absolute value by a fixed integrable function in order to use the Dominated Convergence Theorem. This follows from the Mean Value Theorem, which shows that the integrand is equal to

∂f/∂y^i (ỹ, x)

for ỹ = (y^1, ..., y^{i−1}, b^i, y^{i+1}, ..., y^n), with b^i between y^i and y^i + k. Since f is C^1, ∂f/∂y^i is continuous, Ω̄ is compact, and ỹ stays in a compact neighborhood of y, so the absolute value of the integrand is bounded by a constant M. Since ∫_Ω M dx < ∞, the Dominated Convergence Theorem shows that

∂/∂y^i ∫_Ω f(y, x) dx = lim_{k→0} ∫_Ω [f(y + k e_i, x) − f(y, x)] / k dx
= ∫_Ω lim_{k→0} [f(y + k e_i, x) − f(y, x)] / k dx
= ∫_Ω ∂f/∂y^i (y, x) dx.

To show that ∫_Ω f(y, x) dx is C^1 as a function of y, note that its partial derivatives

g_i(y) = ∫_Ω ∂f/∂y^i (y, x) dx

are continuous in y by the Dominated Convergence Theorem again, since if y → y₀, then

lim_{y→y₀} g_i(y) = lim_{y→y₀} ∫_Ω ∂f/∂y^i (y, x) dx
= ∫_Ω lim_{y→y₀} ∂f/∂y^i (y, x) dx
= ∫_Ω ∂f/∂y^i (y₀, x) dx
= g_i(y₀)

because ∂f/∂y^i is continuous in y. □

Remark.
The last argument also shows that if f = f(z, x) is a continuous function of z and x, with x ∈ Ω a compact subset of ℝ^n, then the function

z ↦ ∫_Ω f(z, x) dx

is continuous.

1.7 The Inverse Function Theorem

We need the following lemma first:

Lemma 12. If f is a C^1 function from a ball B in ℝ^n to ℝ^m which satisfies |∂f^i/∂x^j| ≤ C on B, then for y, z ∈ B,

|f(y) − f(z)| ≤ Cmn |y − z|.

Proof. If y, z ∈ B, then the line segment {ty + (1 − t)z : 0 ≤ t ≤ 1} between them is also contained in B (see Homework Problem 13 below). Then use the Chain Rule to compute for i = 1, ..., m,

|f^i(y) − f^i(z)| = | ∫_0^1 (∂/∂t) f^i(ty + (1 − t)z) dt |
= | ∫_0^1 (y^j − z^j) ∂f^i/∂x^j (ty + (1 − t)z) dt |
≤ Cn |y − z|.

(Note this argument is essentially the same as the use of the Mean Value Theorem.) Now apply

|f(y) − f(z)| ≤ Σ_{i=1}^m |f^i(y) − f^i(z)|. □

Theorem 3 (Inverse Function Theorem). Let f : O → U be a C^1 map between domains in ℝ^m. Assume that for a ∈ O, Df(a) is an invertible matrix (i.e., det Df(a) ≠ 0). Then there are neighborhoods O′ ∋ a and U′ ∋ f(a) so that f : O′ → U′ is a bijection and f^{−1} is also a C^1 map. For every b ∈ O′,

D(f^{−1})(f(b)) = (Df(b))^{−1}.

Proof. First of all, we may reduce to the case that a = f(a) = 0 and Df(a) = I, the identity map from ℝ^m to itself. (This can be achieved by replacing f(x) by (Df(a))^{−1}(f(x + a) − f(a)). Then use the Chain Rule and the fact that the derivative of the linear map (Df(a))^{−1} is (Df(a))^{−1} itself.)

Now consider g(x) = x − f(x) and note that Dg(0) = 0, the zero linear transformation. Since g is C^1, there is an r > 0 so that |x| < 2r implies

|∂g^i/∂x^j (x)| < 1/(2m²)   for i, j = 1, ..., m.   (6)

Let B(r) = {x ∈ ℝ^m : |x| < r}. Then Lemma 12 and g(0) = 0 imply that g(B(r)) ⊂ B(r/2). Now let y ∈ B(r/2) and consider g_y(x) = g(x) + y = x − f(x) + y. Then:

• g_y(x) = x is equivalent to f(x) = y, and so a fixed point of g_y is equivalent to a solution to f(x) = y.
• If x is in the closed ball B̄(r), then |g_y(x)| ≤ |g(x)| + |y| ≤ r, and so g_y is a map from the complete metric space B̄(r) (a closed subset of ℝ^m) to itself.

• Lemma 12 and (6) imply g_y is a contraction map (with λ = 1/2). In other words, for x_1, x_2 ∈ B̄(r),

|g_y(x_1) − g_y(x_2)| = |g(x_1) − g(x_2)| ≤ (1/2)|x_1 − x_2|.   (7)

Therefore, for each y ∈ B(r/2), there is a unique fixed point x of g_y, which shows there is a unique solution x to f(x) = y in B̄(r).

Now we show x = f^{−1}(y) is continuous: for x_1, x_2 ∈ B̄(r), we have, by the definition g(x) = x − f(x) and (7),

|x_1 − x_2| ≤ |g(x_1) − g(x_2)| + |f(x_1) − f(x_2)| ≤ (1/2)|x_1 − x_2| + |f(x_1) − f(x_2)|,

so

(1/2)|x_1 − x_2| ≤ |f(x_1) − f(x_2)|,
|f^{−1}(y_1) − f^{−1}(y_2)| ≤ 2|y_1 − y_2|   (8)

for y_i = f(x_i). Thus f^{−1} is continuous.

To show f^{−1} is differentiable at y_2 with total derivative (Df(x_2))^{−1}, we need to show that

lim_{y_1→y_2} |f^{−1}(y_1) − f^{−1}(y_2) − (Df(x_2))^{−1}(y_1 − y_2)| / |y_1 − y_2| = 0.

To show this, compute

|f^{−1}(y_1) − f^{−1}(y_2) − (Df(x_2))^{−1}(y_1 − y_2)|
= |x_1 − x_2 − (Df(x_2))^{−1}(f(x_1) − f(x_2))|
= |(Df(x_2))^{−1}[Df(x_2)(x_1 − x_2) − (y_1 − y_2)]|
≤ C |Df(x_2)(x_1 − x_2) − (y_1 − y_2)|   (by Lemma 9)
= C |Df(x_2)(x_1 − x_2) − [f(x_1) − f(x_2)]|.   (9)

Therefore,

|f^{−1}(y_1) − f^{−1}(y_2) − (Df(x_2))^{−1}(y_1 − y_2)| / |y_1 − y_2|
= (|f^{−1}(y_1) − f^{−1}(y_2) − (Df(x_2))^{−1}(y_1 − y_2)| / |x_1 − x_2|) · (|x_1 − x_2| / |y_1 − y_2|).

(Note y_1 ≠ y_2 implies x_1 ≠ x_2 since y_i = f(x_i).) This expression goes to zero as y_1 → y_2 by (8) and (9), since f is differentiable at x_2.

Finally, we show the total derivative (Df(x))^{−1} is continuous in y. We can think of Df as a map from x to ℝ^{m²}, which represents the space of m × m matrices. Df(x) is continuous in x (f is C^1), and x = f^{−1}(y) is continuous in y, so Df(x) is continuous in y. The determinant function det : ℝ^{m²} → ℝ is continuous, since it is a polynomial in the matrix entries. So det Df(x) is bounded away from zero, by compactness of B̄(r).
We are left to prove the continuity of the matrix inverse operation for square matrices with determinant bounded away from $0$. This follows from the formula for the inverse in terms of cofactor matrices: each entry of the inverse matrix $A^{-1} = (a_{ij})^{-1}$ is of the form
$$\frac{\text{an $(m-1)$st-order polynomial in the } a_{ij}}{\det(a_{ij})}.$$

Homework Problem 9. If, in the Inverse Function Theorem, $f$ is a smooth ($C^\infty$) map, then $f^{-1} \colon \mathcal{U}' \to \mathcal{O}'$, the $C^1$ local inverse of $f$, is also $C^\infty$.

Hints:

(a) If $A = A(s)$ is a family of invertible $n \times n$ matrices which depends differentiably on a real parameter $s$, differentiate the equation $AA^{-1} = I$ to show
$$\frac{d(A^{-1})}{ds} = -A^{-1}\,\frac{dA}{ds}\,A^{-1}.$$

(b) Use the formula for $D(f^{-1})$ to show that $f^{-1}$ is $C^\infty$. It may be helpful to use the following notation. If $f = f(x) = f(x^1, \dots, x^n)$, we may write $(y^1, \dots, y^n) = y = y(x) = f(x)$, and so $f^{-1}(y) = x$ may be written simply as $x = x(y)$. To show $f^{-1}$ is $C^2$, for example, you should write
$$\frac{\partial^2 (f^{-1})^k}{\partial y^i\,\partial y^j} = \frac{\partial^2 x^k}{\partial y^i\,\partial y^j}$$
in terms of (the components of) the first and second derivatives
$$\frac{\partial y}{\partial x^i} = \frac{\partial f}{\partial x^i} \qquad \text{and} \qquad \frac{\partial^2 y}{\partial x^i\,\partial x^j} = \frac{\partial^2 f}{\partial x^i\,\partial x^j},$$
and verify that the resulting expression is continuous. Remember to use the Chain Rule, as in, e.g.,
$$\frac{\partial}{\partial y^j} = \frac{\partial x^i}{\partial y^j}\,\frac{\partial}{\partial x^i},$$
and recall that $Df^{-1} = (Df)^{-1}$ can be written as
$$\frac{\partial x^i}{\partial y^j} = \left(\frac{\partial y^k}{\partial x^l}\right)^{-1}.$$
It will also be helpful to use Einstein's summation notation. In particular, the matrix notation used in part (a) is insufficient, as there may be quantities with more than two indices which need to be summed.

Theorem 4 (Implicit Function Theorem). Suppose $f \colon \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ is $C^1$ in an open set containing $(a,b)$, and assume $f(a,b) = 0$. Assume the $m \times m$ matrix
$$\left(\frac{\partial f^i}{\partial x^{n+j}}(a,b)\right), \qquad 1 \le i, j \le m,$$
is invertible. Then there is an open set $\mathcal{O} \subset \mathbb{R}^n$ containing $a$ and an open set $\mathcal{U} \subset \mathbb{R}^m$ containing $b$ so that for each $x \in \mathcal{O}$, there is a unique $g(x) \in \mathcal{U}$ so that $f(x, g(x)) = 0$. Moreover, $g$ is locally $C^1$.

Homework Problem 10. Prove the Implicit Function Theorem.
Hints:

(a) Consider $F \colon \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n \times \mathbb{R}^m$ defined by $F(x,y) = (x, f(x,y))$ and apply the Inverse Function Theorem to $F$.

(b) Show that, on a suitably small neighborhood, $F^{-1}$ is of the form $F^{-1}(x,y) = (x, p(x,y))$ for some $p \colon \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$.

(c) Show that $g(x) = p(x, 0)$ satisfies the conditions of the theorem.

1.8 Lipschitz constants and functions

A concept closely related to the contraction map is the Lipschitz constant. A map $f \colon X \to Y$ has Lipschitz constant
$$L = \sup_{x, x' \in X,\ x \neq x'} \frac{d_Y(f(x), f(x'))}{d_X(x, x')}.$$
Here of course $d_X$ and $d_Y$ are the metrics on $X$ and $Y$ respectively. An equivalent definition is that $L$ is the smallest constant so that
$$d_Y(f(x), f(x')) \le L\,d_X(x, x') \qquad \text{for all } x, x' \in X.$$
A function with finite Lipschitz constant is called Lipschitz. A basic fact is the following:

Lemma 13. Any Lipschitz function is continuous.

If $f \colon X \to X$, then the Lipschitz constant gives a criterion for a mapping to be a contraction mapping:

Lemma 14. $f \colon X \to X$ is a contraction map if and only if the Lipschitz constant $L$ of $f$ is strictly less than $1$.

Idea of proof. The Lipschitz constant is the smallest value of $\lambda$ for which $f$ is a contraction map.

If $f \colon \mathbb{R} \to \mathbb{R}$, then the Lipschitz constant is simply
$$L = \sup_{x \neq y} \frac{|f(x) - f(y)|}{|x - y|},$$
which of course is suggestive of the definition of the derivative. In fact, the following is true:

Homework Problem 11. The Lipschitz constant of a locally $C^1$ function $f \colon \mathbb{R} \to \mathbb{R}$ is equal to $\sup_{x \in \mathbb{R}} |f'(x)|$.

Hint: To show the two quantities are equal, you need to relate the sup of the derivative to the sup of the difference quotients. To relate the derivative $f'(x)$ to difference quotients, use the definition of the derivative. To relate a given difference quotient to a derivative, use the Mean Value Theorem.

The previous problem shows that any differentiable function with bounded derivative is Lipschitz. The converse is false, as we see in the following example.

Example 3.
The function $x \mapsto |x|$ is a Lipschitz function from $\mathbb{R}$ to $\mathbb{R}$. This follows from the observation that for each $x \neq y \in \mathbb{R}$,
$$\frac{\big||x| - |y|\big|}{|x - y|} \le 1.$$
(This can be proved using the Triangle Inequality.)

Example 4. For any constant $\alpha \in (0,1)$, the function $x \mapsto |x|^\alpha$ from $\mathbb{R}$ to $\mathbb{R}$ is not Lipschitz. In particular,
$$\lim_{x \to 0} \frac{\big||x|^\alpha - |0|^\alpha\big|}{|x - 0|} = \lim_{x \to 0} |x|^{\alpha - 1} = \infty.$$

In terms of the graph of a function: a function whose graph has a corner (as does $x \mapsto |x|$) is Lipschitz, while a function whose graph has a cusp (as does $x \mapsto |x|^\alpha$) is not Lipschitz.

Another basic fact we establish is this: the conclusion of the Contraction Map Theorem may be false if the Lipschitz constant is equal to $1$. An easy example is the map $x \mapsto x + 1$ from $\mathbb{R}$ to $\mathbb{R}$. The Lipschitz constant is obviously $1$, and there is no fixed point. A related, but somewhat more surprising, fact is outlined in the following problem:

Homework Problem 12. Find an example of a differentiable function $f \colon \mathbb{R} \to \mathbb{R}$ so that for each $x \neq y$,
$$\frac{|f(x) - f(y)|}{|x - y|} < 1,$$
and yet $f$ has no fixed point. Prove your answer works.

Hint: The point of this problem is that there should be no uniform $L < 1$ which works for all $x$ and $y$. To construct such a function $f$, use Problem 11 above. In particular, first construct the derivative $f'$ and then integrate to find $f$. (You'll need $\sup_x |f'(x)| = 1$; why?) Use the Mean Value Theorem to relate values of $f'$ to difference quotients.

A subset $C$ of a real vector space is convex if every line segment connecting two points in $C$ is contained in $C$. More formally, $C$ is convex if
$$x, y \in C,\ t \in [0,1] \implies tx + (1-t)y \in C.$$

Proposition 15. Any globally $C^1$ function from a convex domain $\Omega \subset \mathbb{R}^n$ to $\mathbb{R}^m$ is globally Lipschitz.

Proof. Lemma 12 above shows that for any $x, y \in \Omega$,
$$|f(x) - f(y)| \le Cnm\,|x - y|, \qquad \text{for } C = \sup\left\{\left|\frac{\partial f^i}{\partial x^j}(z)\right| : z \in \Omega,\ i \le m,\ j \le n\right\}.$$
Here $C < \infty$ since $f$ is globally $C^1$. Thus $f$ is Lipschitz.

Consider $X$ a locally compact metric space and $Y$ any metric space.
Then we say a function $f \colon X \to Y$ is locally Lipschitz if $f$ satisfies one of the two following equivalent definitions:

1. $f$ is Lipschitz when restricted to any compact set of $X$. In other words, if $K \subset X$ is compact, then there is a constant $L_K$ so that
$$x, x' \in K \implies d_Y(f(x), f(x')) \le L_K\,d_X(x, x').$$

2. Each $x \in X$ has a neighborhood on which $f$ is Lipschitz.

We prove these two definitions are equivalent below.

Corollary 16. On any domain $\Omega \subset \mathbb{R}^n$, any locally $C^1$ function $f$ is locally Lipschitz.

Proof. Any ball is convex (see the following homework problem), and so if $f$ is $C^1$ on a small ball, then it is Lipschitz on the ball by the previous Proposition 15.

Homework Problem 13. Show that any ball $B_x(r) = \{y \in \mathbb{R}^n : |y - x| < r\}$ is convex.

Proposition 17. Let $X$ be a locally compact metric space and $Y$ be any metric space. Then for maps $f$ from $X$ to $Y$, the two definitions (1) and (2) above are equivalent.

Proof. To prove (1) $\implies$ (2), consider $x \in X$. Since $X$ is locally compact, there is a neighborhood $\mathcal{O}$ of $x$ with compact closure. By definition (1), $f$ is Lipschitz when restricted to $\bar{\mathcal{O}}$, and is thus Lipschitz on $\mathcal{O}$ also.

To prove (2) $\implies$ (1), let $K \subset X$ be a compact subset. Given that all points in $X$ have neighborhoods on which $f$ is Lipschitz, we need to prove that $f$ is Lipschitz on $K$. The set of all neighborhoods of points in $K$ on which $f$ is Lipschitz forms an open cover of $K$, and thus there is a finite subcover $\mathcal{O}_1, \dots, \mathcal{O}_n$. The set
$$P = K \times K \setminus \left(\bigcup_{i=1}^n \mathcal{O}_i \times \mathcal{O}_i\right)$$
is compact, and so the function
$$\frac{d_Y(f(x), f(x'))}{d_X(x, x')},$$
which is continuous on $P$, attains its maximum $M$ on $P$. Consider any $x \neq x' \in K$. Then either $(x, x') \in P$ or $x, x' \in \mathcal{O}_i$ for some $i = 1, \dots, n$. Let $L_i$ be the Lipschitz constant of $f|_{\mathcal{O}_i}$, and choose $L = \max\{M, L_1, \dots, L_n\}$. Then for every $x \neq x' \in K$,
$$\frac{d_Y(f(x), f(x'))}{d_X(x, x')} \le L,$$
and $f$ is Lipschitz on $K$.
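The two facts above—Problem 11's identification of the Lipschitz constant with $\sup |f'|$, and Example 4's cusp with unbounded difference quotients—can be observed numerically. The sketch below is only an illustration (the sample grids and functions are my own choices, not from the text): it samples difference quotients, which bound the Lipschitz constant from below.

```python
import math

def max_difference_quotient(f, xs):
    """Largest difference quotient |f(x)-f(y)|/|x-y| over all pairs of
    sample points: a lower bound for the Lipschitz constant of f."""
    best = 0.0
    for i, x in enumerate(xs):
        for y in xs[i + 1:]:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best

# sin is C^1 with sup |cos x| = 1, so its Lipschitz constant is 1:
xs = [k / 100.0 for k in range(-300, 301)]
L_sin = max_difference_quotient(math.sin, xs)   # close to (and at most) 1

# sqrt(|x|) has a cusp at 0 (alpha = 1/2 in Example 4): the quotients
# against the point 0 are |x|^{-1/2}, which grow without bound
cusp = lambda x: abs(x) ** 0.5
quotients = [cusp(10.0 ** (-k)) / 10.0 ** (-k) for k in range(1, 7)]
```

By the Mean Value Theorem no sampled quotient of $\sin$ can exceed $1$, so the estimate approaches the true constant from below, while the cusp's quotients illustrate why no finite $L$ exists.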
2 Ordinary Differential Equations

2.1 Introduction

An ordinary differential equation (an ODE) is an equation of the form
$$x^{(n)}(t) = F(x^{(n-1)}, \dots, \dot{x}, x, t), \qquad (10)$$
where $x \colon I \to \mathbb{R}$ is a function of $t$, $I$ is an open interval in $\mathbb{R}$, $\dot{x} = \frac{dx}{dt}$, and $x^{(n)} = \frac{d^n x}{dt^n}$. The order of the above equation is $n$, the highest derivative of $x$ which appears. It is also useful to consider the case $x = (x^1, \dots, x^m) \colon I \to \mathbb{R}^m$, which is called a system of ODEs.

Some ODEs can be solved explicitly by using integration techniques, but most cannot. For most ODEs, instead of explicit solutions, we must rely on an abstract existence theorem to show that for nice enough $F$ (Lipschitz suffices), there is a unique solution locally. We also investigate the regularity of solutions, showing, for example, that if $F$ is smooth, then any solution to (10) is smooth. Existence, uniqueness, and regularity are three main themes in the theory of all differential equations, and there are satisfactory theorems to handle all three for ODEs.

Consider the following example (where $x$, not $t$, is the independent variable):

Example 5. Consider the differential equation $dy/dx = x^2 y$. This first-order ODE is called separable, since it is written in the form $dy/dx = f(x)g(y)$. Recall the solution procedure for a separable ODE:

• If $c$ is a root of $g(y)$, then $y = c$ is a solution. (Why?) So in the present case, $y = 0$ is a solution.

• For other values of $g(y)$, compute
$$\frac{dy}{dx} = x^2 y, \qquad \frac{dy}{y} = x^2\,dx, \qquad \int \frac{dy}{y} = \int x^2\,dx, \qquad \ln|y| = \frac{x^3}{3} + C,$$
$$y = \pm e^C e^{x^3/3} = C' e^{x^3/3},$$
where $C' = \pm e^C$ is a nonzero constant.

• If we let $C'$ be any real number, then we capture both cases above, and the general solution is $y = C' e^{x^3/3}$.

Homework Problem 14. Consider the ODE
$$\frac{dy}{dx} = \frac{1 + y^2}{1 + x^2}.$$

(a) Find the general solution to this differential equation. Your answer should be a rational function of $x$. You may need to write your answer using more than one case.

(b) Find the particular solution passing through $(x, y) = (1, 1)$.
(c) Find the particular solution passing through $(x, y) = (1, -1)$. (Hint: What is the formula for $\tan(\phi + \frac{\pi}{2})$?)

2.2 Local Existence and Uniqueness

The most natural setting for systems of ODEs is in terms of an initial value problem. Let $x = (x^1, \dots, x^n) = x(t)$. An initial value problem for a first-order system of ODEs at $t = t_0$ consists of

• a system of ODEs $\dot{x} = v(x, t)$

• and an initial condition $x(t_0) = x_0$.

We'll see below that if $v$ satisfies a Lipschitz condition, then for $t$ in a small interval around $t_0$, there is a unique solution to the initial value problem.

Example 6. Consider the following problem: Find a solution to the ODE $\dot{y} = y^2$ subject to the initial condition $y(0) = 1$. Interpreting $t$ as a time variable, what happens as time goes forward from $t = 0$?

Solution: $dy/dt = y^2$ is separable, and so compute
$$\frac{dy}{y^2} = dt \implies \int \frac{dy}{y^2} = \int dt \implies -\frac{1}{y} = t + C \implies y = -\frac{1}{t + C}.$$
Plug in the initial condition $y = 1$ and $t = 0$ to solve for $C$, which gives $C = -1$ and
$$y = \frac{1}{1 - t}.$$
Note that $y(t)$ blows up at $t = 1$, so as time goes forward from $t = 0$, the solution only exists until time $1$. Also note there is no problem going backward in time, and so the solution to the initial value problem is
$$y = \frac{1}{1 - t}, \qquad t \in (-\infty, 1).$$
It does not make sense to talk about the solution to the initial value problem beyond $t = 1$.

The previous example shows that it is not in general possible to extend a solution to an initial value problem for all time. However, we can still hope to find a solution to an initial value problem on a neighborhood $(t_0 - \epsilon, t_0 + \epsilon)$ of $t_0$.

Theorem 5. Consider the initial value problem
$$\dot{x} = v(x, t), \qquad x(t_0) = x_0 \qquad (11)$$
for $x \colon I \to \mathbb{R}^n$, where $I$ is an open neighborhood of $t_0$. Assume $v$ is a Lipschitz function from $\mathcal{O} \times I \to \mathbb{R}^n$, where $\mathcal{O} \subset \mathbb{R}^n$ is an open neighborhood of $x_0$. Then on a neighborhood $\tilde{I}$ of $t_0$ contained in $I$, there is a unique solution $\phi$ to (11).

Before we give the proof, let us consider a few examples.

Example 7.
The differential equation $\dot{x} = x^2 + t$ has no solution which can be written down in terms of standard algebraic and transcendental functions (such as roots, exponentials, trigonometric functions). Theorem 5 states that there is a local solution for every initial value problem. For example, for the initial condition $x(0) = 1$, there is a solution valid on an open interval containing $t = 0$.

Theorem 5 does not guarantee a solution which is valid for all time $t$ (see Example 6 above). In fact the solution for the present initial value problem will also blow up in finite time. This is basically because for $t \ge 0$, $\dot{x} = x^2 + t \ge x^2$, and so the solution should grow faster than the solution to Example 6, which goes to infinity in finite time.

If $v$ in Theorem 5 is not Lipschitz, then it is possible to lose the uniqueness statement from Theorem 5 (although existence is still valid).

Example 8. Consider the initial value problem
$$\dot{x} = x^{2/3}, \qquad x(0) = 0.$$
Then it is straightforward to verify that $x(t) = 0$ is a solution. There is another solution, however. Solve the equation
$$\frac{dx}{dt} = x^{2/3}, \qquad x^{-2/3}\,dx = dt, \qquad \int x^{-2/3}\,dx = \int dt, \qquad 3x^{1/3} = t + C, \qquad x = \left(\tfrac{1}{3}t + \tfrac{1}{3}C\right)^3.$$
Then plug in $x(0) = 0$ to find $C = 0$ and the solution $x(t) = \left(\tfrac{1}{3}t\right)^3$.

The point of this example is that $v = x^{2/3}$ is not Lipschitz—see Example 4 above. Therefore, Theorem 5 does not apply.

Proof of Theorem 5. The idea of the proof is to set up the problem in terms of a contraction mapping. We first find an iteration whose fixed point solves the differential equation and then find an appropriate complete metric space on which the iteration is a contraction map.

For a continuous $\mathbb{R}^n$-valued function $\phi$ defined on a neighborhood of $t_0$, let $A\phi$ be another such function defined as follows:
$$(A\phi)(t) = x_0 + \int_{t_0}^t v(\phi(\tau), \tau)\,d\tau. \qquad (12)$$
(Note we are integrating an $\mathbb{R}^n$-valued function. This may be related to the usual $\mathbb{R}$-valued integration theory by considering each component separately.)
$A$ will be our iterative map, and we consider $\phi, A\phi, A^2\phi$, etc., to be the Picard approximations for the initial value problem. We consider Picard approximations because of the following

Lemma 18. A continuous fixed point of the Picard approximation (12) is a solution to the initial value problem (11). In particular, any such fixed point is continuously differentiable.

Proof. If $A\phi = \phi$, then compute
$$\dot\phi = \frac{d}{dt}\left[x_0 + \int_{t_0}^t v(\phi(\tau), \tau)\,d\tau\right] = v(\phi(t), t)$$
by the Fundamental Theorem of Calculus. In particular, since $\phi$ and $v$ are continuous (Lemma 13), $\dot\phi$ is continuous, and so $\phi$ is continuously differentiable. Lastly, check the initial condition
$$\phi(t_0) = x_0 + \int_{t_0}^{t_0} v(\phi(\tau), \tau)\,d\tau = x_0$$
to complete the proof of the lemma.

Our complete metric space will be
$$X = \left\{\phi \in C^0(\tilde{I}, \mathbb{R}^n) : \phi(t_0) = x_0,\ \sup_{t \in \tilde{I}} |\phi(t) - x_0| \le P\right\},$$
where $\tilde{I} = [t_0 - \epsilon, t_0 + \epsilon] \subset I$ for a small positive $\epsilon$ to be determined later, $|\cdot|$ is the norm on $\mathbb{R}^n$, and $P$ is chosen so that the closed ball $\bar{B}_{x_0}(P) = \{x : |x - x_0| \le P\} \subset \mathcal{O}$. We first demonstrate

Lemma 19. $X$ is a complete metric space.

Proof. First of all, $C^0(\tilde{I}, \mathbb{R}^n)$ is complete by Proposition 1. Moreover, the conditions imposed give closed subsets of the Banach space $C^0$. The second condition is obviously closed since the norm on any Banach space is continuous. To check that the condition $\phi(t_0) = x_0$ is closed, use the following lemma, whose proof is immediate:

Lemma 20. For a metric space $J$ and $y \in J$, the map from the Banach space $C^0(J, \mathbb{R}^n)$ to $\mathbb{R}^n$ given by $f \mapsto f(y)$ is continuous.

Since these two conditions are closed, $X$ is a closed subset of the complete metric space $C^0(\tilde{I}, \mathbb{R}^n)$, and so is complete with the induced metric.

Remark. Lemma 20 is false for the Banach space $L^\infty$. Why?

So we have proved that $X$ is a complete metric space. Next we show

Lemma 21. For $\epsilon > 0$ small enough, $A \colon X \to X$.

Proof. First of all, choose $\delta > 0$ so that $[t_0 - \delta, t_0 + \delta] \subset I$.
Since $v$ is continuous and $\{x : |x - x_0| \le P\} \times [t_0 - \delta, t_0 + \delta]$ is compact, there is a constant $M$ so that
$$\sup_{|t - t_0| \le \delta,\ |x - x_0| \le P} |v(x, t)| \le M.$$
In order for this bound to work below, we must have $\epsilon \le \delta$ (so that $\tilde{I} \subset [t_0 - \delta, t_0 + \delta]$).

To check $A \colon X \to X$, we need to check for each $\phi \in X$:

1. $A\phi$ is continuous. This follows as in Lemma 18 above.

2. $(A\phi)(t_0) = x_0$. This is easy to check as in Lemma 18.

3. $\sup_{t \in \tilde{I}} |(A\phi)(t) - x_0| \le P$. To check this, write
$$|(A\phi)(t) - x_0| = \left|\int_{t_0}^t v(\phi(\tau), \tau)\,d\tau\right| \le M|t - t_0| \le M\epsilon,$$
where we have used the fact that $\phi \in X$ and the definition of $M$ to show the first inequality. So this condition is satisfied if $\epsilon \le P/M$.

So $A \colon X \to X$ if $\epsilon \le \min\{\delta, P/M\}$.

Finally we use the Lipschitz hypothesis on $v$ to show that $A$ is a contraction map. Let $L$ be the Lipschitz constant for $v$. Then for $\phi, \psi \in X$, compute
$$|(A\phi)(t) - (A\psi)(t)| = \left|\int_{t_0}^t [v(\phi(\tau), \tau) - v(\psi(\tau), \tau)]\,d\tau\right| \le \left|\int_{t_0}^t |v(\phi(\tau), \tau) - v(\psi(\tau), \tau)|\,d\tau\right|$$
$$\le \left|\int_{t_0}^t L|\phi(\tau) - \psi(\tau)|\,d\tau\right| \le L\|\phi - \psi\|_{C^0}\,|t - t_0| \le L\epsilon\,\|\phi - \psi\|_{C^0}.$$
Then since $\|A\phi - A\psi\|_{C^0} = \sup_{t \in \tilde{I}} |(A\phi)(t) - (A\psi)(t)|$, we see that
$$\|A\phi - A\psi\|_{C^0} \le L\epsilon\,\|\phi - \psi\|_{C^0}.$$
So $A$ is a contraction map if $\epsilon < 1/L$.

Thus all together, if we require $\epsilon < \min\{\delta, P/M, 1/L\}$, then $A$ is a contraction map on $X$, and its fixed point is a solution to the initial value problem.

In order to show uniqueness of the initial value problem, note that the Contraction Mapping Theorem automatically proves that any two continuous solutions $\phi_1$ and $\phi_2$ to the initial value problem from $\tilde{I}$ to $\mathbb{R}^n$ must coincide if the additional constraint
$$\sup_{t \in \tilde{I}} |\phi(t) - x_0| \le P$$
is satisfied. Since $\phi_1$ and $\phi_2$ are continuous and satisfy the initial condition, this constraint is automatically satisfied for both $\phi_1$ and $\phi_2$ on a (perhaps smaller) interval $\hat{I} \subset \tilde{I}$ containing $t_0$. Then uniqueness applies on this smaller interval, since $A$ is a contraction map for any small enough $\epsilon$. Note that the interval $\hat{I}$ on which $\phi_1 = \phi_2$ may depend on $\phi_1$ and $\phi_2$.
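The Picard iterates $\phi, A\phi, A^2\phi, \dots$ from the proof of Theorem 5 can be computed explicitly in a simple case. For $\dot{x} = x$, $x(0) = 1$ (my own illustrative choice of $v$, $t_0$, and $x_0$), each iterate is a polynomial, and (12) amounts to prepending the constant $1$ and integrating term by term, so the iterates are exactly the Taylor partial sums of the solution $e^t$. A sketch:

```python
import math

def picard_step(coeffs):
    """One application of the map (12) for x' = x, x(0) = 1, acting on
    a polynomial stored by its coefficients: phi -> 1 + integral_0^t phi.

    Integrating sum(c_k t^k) term by term gives sum(c_k t^(k+1)/(k+1)).
    """
    return [1.0] + [c / (k + 1) for k, c in enumerate(coeffs)]

phi = [1.0]                      # phi_0(t) = x_0 = 1, the constant function
for _ in range(15):
    phi = picard_step(phi)       # phi_n is the degree-n Taylor sum of e^t

# Evaluate the 15th iterate at t = 0.5; it should be very close to e^{0.5}
value = sum(c * 0.5 ** k for k, c in enumerate(phi))
```

The convergence of the iterates on a small interval is exactly what the Contraction Mapping Theorem guarantees; here it is visible as the convergence of the Taylor series of $e^t$.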
The proof that the two solutions must coincide on all of $\tilde{I}$ depends on the Extension Theorem 6 below. We record what we have proven so far with respect to uniqueness here.

Proposition 22. Any two solutions $\phi_1$ and $\phi_2$ to the initial value problem (11) coincide on a small interval containing $t_0$. The interval may depend on the solutions $\phi_1$ and $\phi_2$.

Remark. Note that in the proof of the previous theorem, we only use that $v$ is Lipschitz in the $x$ variables (with a uniform Lipschitz constant valid for all $t$). We still require $v$ to be continuous in $t$.

The previous theorem provides a continuously differentiable solution on an interval $\tilde{I}$ containing the initial time $t_0$ and proves uniqueness on a (perhaps) smaller interval $\hat{I}$. There is a satisfactory more global theory of ODEs, which we detail in the next subsection.

2.3 Extension of solutions

Recall, from Corollary 16 above, that any locally $C^1$ function $f$ from $\Omega$, a domain in $\mathbb{R}^n$, to $\mathbb{R}^m$ is locally Lipschitz. In other words, $f$ is Lipschitz when restricted to any compact subset of $\Omega$.

Theorem 6 (Extension). Consider an initial value problem
$$\dot{x} = v(x, t), \qquad x(t_0) = x_0. \qquad (13)$$
Assume $v$ is continuous and locally Lipschitz in $\mathbb{R}^n \times I$, where $I$ is an open interval containing $t_0$. Then there is an open interval $J$ satisfying $t_0 \in J \subset I$ and a unique solution $\phi \colon J \to \mathbb{R}^n$ to the initial value problem. Moreover, $J$ is maximal in the following sense: if there is a time $T \in I \cap \partial J$, then $\limsup_{t \to T} |\phi(t)| = \infty$.

So this theorem says that if we start with an initial condition $x(t_0) = x_0$ and flow forward (or backward) in time by satisfying the ODE, then there is a unique solution which continues until (1) the end of the interval $I$ is reached, or (2) the solution blows up.

Proof. We first consider the following lemma, which is a consequence of the proof of Theorem 5 above:

Lemma 23.
On any compact subset $K$ of $\mathbb{R}^n \times I$, there is an $\epsilon > 0$ so that for any $(x_0, t_0) \in K$, there exists a solution to the initial value problem $\dot{x} = v(x, t)$, $x(t_0) = x_0$ which is valid on $[t_0 - \epsilon, t_0 + \epsilon]$.

The point is that there is a uniform $\epsilon$ which works for all initial conditions $(x_0, t_0) \in K$.

Proof. Recall that in the proof of Theorem 5, any $\epsilon < \min\{\delta, P/M, 1/L\}$ works. By compactness of $K$ and since $I$ is open, we can choose a uniform $\delta > 0$ so that for all $(x_0, t_0) \in K$, $[t_0 - \delta, t_0 + \delta] \subset I$. We may choose $P$ to be any positive number (since $\mathcal{O} = \mathbb{R}^n$ in the present case). The Lipschitz constant $L = L_{\tilde{K}}$ is uniform over any compact set $\tilde{K}$ by the locally Lipschitz property of $v$ (Proposition 17). Let
$$M = \max_{(x,t) \in \tilde{K}} |v(x, t)|,$$
where
$$\tilde{K} = \{(x, t) \in \mathbb{R}^{n+1} : \exists\,(x_0, t_0) \in K \text{ with } |t - t_0| \le \delta,\ |x - x_0| \le P\}.$$
It is straightforward to check that $\tilde{K}$ is compact (it is the image of the compact set $K \times \bar{B}_P(0) \times [-\delta, \delta] \subset \mathbb{R}^{n+1} \times \mathbb{R}^{n+1}$ under the continuous addition map $+ \colon \mathbb{R}^{n+1} \times \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$). Therefore, since $v$ is continuous, $M$ can be chosen independently of $(x_0, t_0) \in K$.

(Note the reason we need to go to all of $\tilde{K}$: the definition of $M$ in the proof of Theorem 5 above is
$$M = \sup_{|t - t_0| \le \delta,\ |x - x_0| \le P} |v(x, t)|.$$
In order to have a single $M$ work for all $(x_0, t_0) \in K$, we must let $(x, t)$ range over all of $\tilde{K}$. $L$ must be valid on all of $\tilde{K}$ as well, since we consider integrals from $t_0$ to $t$, where $(x_0, t_0) \in K$ and $|t - t_0| \le \epsilon < \delta$.)

Now we must ensure that $\epsilon < \min\{\delta, P/M, 1/L\}$. All of these quantities can be chosen independently of $(x_0, t_0) \in K$.

Lemma 24 (Gluing solutions). Consider any two solutions to $\dot{x} = v(x, t)$ which are defined on intervals in $\mathbb{R}$. If the two coincide on any interval in $\mathbb{R}$, then they must coincide on the entire intersection of their intervals of definition. Thus they can be glued together to form a solution on the union of their intervals of definition.

Proof. Consider two solutions $\phi_1, \phi_2$ to $\dot{x} = v(x, t)$ defined on intervals $I_1$ and $I_2$. Assume they coincide on an interval $I_3 \subset I_1 \cap I_2$.
We want to show $\phi_1 = \phi_2$ on all of $I_1 \cap I_2$. Let $I_4$ be the largest interval containing $I_3$ on which $\phi_1$ and $\phi_2$ coincide (take $I_4$ to be the path-connected component of the closed set $\{t : \phi_1(t) = \phi_2(t)\}$ containing $I_3$). Now we will show that $I_4 = I_1 \cap I_2$.

Assume $I_4 \neq I_1 \cap I_2$. Then since $I_4$ is a relatively closed subinterval of $I_1 \cap I_2$, there is an endpoint $T$ of $I_4$ in the interior of $I_1 \cap I_2$. Now $\phi_1$ and $\phi_2$ are both solutions of
$$\dot{x} = v(x, t), \qquad x(T) = \phi_1(T)\ [= \phi_2(T)].$$
Proposition 22 shows that $\phi_1$ and $\phi_2$ must agree on a small interval $I_5 \ni T$. Thus $I_4$ must contain $I_5$, and we have a contradiction to the assumption that $T$ is an endpoint of $I_4$ in the interior of $I_1 \cap I_2$. Thus $I_4 = I_1 \cap I_2$.

[Diagram: the nested intervals $I_3 \subset I_4 \subset I_1 \cap I_2 \subset I_1, I_2$, with the endpoint $T$ of $I_4$ covered by the small interval $I_5$.]

Now we have proved that $\phi_1 = \phi_2$ on the intersection of their domains of definition $I_1 \cap I_2$. To extend to $I_1 \cup I_2$, define
$$\phi(t) = \begin{cases} \phi_1(t) & \text{for } t \in I_1, \\ \phi_2(t) & \text{for } t \in I_2 \setminus I_1. \end{cases}$$
Note that $\phi$ is a solution to the differential equation since both $\phi_1$ and $\phi_2$ are. There is no trouble with the differentiability of this piecewise-defined function since $\phi_1 = \phi_2$ on the whole interval $I_1 \cap I_2$.

For simplicity, consider only solutions moving forward in time. Let
$$E = \{t \in I_+ : \text{there is a unique solution } \phi \text{ to (13) on } [t_0, t)\},$$
where $I_+ = I \cap (t_0, \infty)$. We will set this $E$ to be equal to $J_+ = J \cap (t_0, \infty)$. Uniqueness on $[t_0, t)$ means any other solution to the initial value problem defined on an interval containing $[t_0, t)$ must coincide with $\phi$ there. It will suffice to prove the following

Lemma 25. If $\sup_E |\phi| \le C < \infty$, then $E = I_+$.

Proof. Assume $|\phi|$ is uniformly bounded on $E$. Then to prove the lemma it is enough to show that $E$ is a nonempty, open, and closed subset of $I_+$ (and so $E = I_+$ since $I_+$ is connected). $E$ is nonempty by Theorem 5 and Lemma 24 above.

To show $E$ is open in $I_+$, let $T \in E$. Then there is a unique solution $\phi$ defined on $[t_0, T)$. First we note that $(t_0, T] \subset E$.
To see this, let $T' \in (t_0, T]$. Then the restriction of $\phi$ to $[t_0, T')$ is a solution to (13) on $[t_0, T')$. Moreover, it is unique, since any other solution to (13) on $[t_0, T')$ agrees with $\phi$ on a neighborhood of $t_0$, and so Lemma 24 shows they must agree on all of $[t_0, T')$. So to show $E$ is open, we may restrict our attention to times larger than $T$.

Since $|\phi|$ is uniformly bounded by $C$ and $[t_0, T]$ is a compact subinterval of $I$, we may apply Lemma 23 to show there is a uniform $\epsilon > 0$ so that any solution to the differential equation with initial condition $x(\tau) = \chi$, for $\tau \in [t_0, T]$ and $|\chi| \le C$, must exist on $[\tau - \epsilon, \tau + \epsilon]$. Now we may consider the initial value problem
$$\dot{x} = v(x, t), \qquad x(T - \tfrac{\epsilon}{2}) = \phi(T - \tfrac{\epsilon}{2}). \qquad (14)$$
So Lemma 23 shows there is a solution $\tilde\phi$ to this initial value problem which exists on $[T - \tfrac{3\epsilon}{2}, T + \tfrac{\epsilon}{2}]$. Moreover, Lemma 24 says that $\phi = \tilde\phi$ on the intersection of their intervals of definition, and moreover, that $\phi$ may be extended by $\tilde\phi$ to a solution on $[t_0, T + \tfrac{\epsilon}{2}]$. Lemma 24 also implies this extension is unique on every subinterval containing $t_0$, and so in particular $[T - \tfrac{\epsilon}{2}, T + \tfrac{\epsilon}{2}] \subset E$ and $E$ is open.

[Diagram: the interval $[t_0, T]$ together with $[T - \tfrac{3\epsilon}{2}, T + \tfrac{\epsilon}{2}]$, which overlaps it and extends past $T$.]

It remains to show that $E$ is closed in $I_+$. Let $T \in \bar{E} \cap I_+$, and let $t_i \in E$, $t_i \to T$. Then the assumption that $|\phi| \le C$ on $E$ implies there is a uniform $\epsilon$ so that for all $t_i$, there is a solution on $[t_i - \epsilon, t_i + \epsilon]$. Choose $t_i$ so that $|T - t_i| < \epsilon$. Also, let $\tau < t_i$ so that $|T - \tau| < \epsilon$. Now we use the same argument as in the previous paragraphs: use the solution $\phi$ on $[t_0, t_i)$ to construct a solution $\tilde\phi$ on $[\tau - \epsilon, \tau + \epsilon] \ni T$. Lemma 24 allows us to glue $\phi$ and $\tilde\phi$ together to form a unique solution valid on $[t_0, \tau + \epsilon] \ni T$. So $T \in E$ as above, and $E$ is closed in $I_+$.

[Diagram: points $\tau < t_i < T$ inside $[t_0, T]$, with the interval $[\tau - \epsilon, \tau + \epsilon]$ containing $T$.]

This Lemma 25 completes the proof of the Extension Theorem 6, at least for solutions moving forward in time.
The reason is this: if there is a time $T \in I_+ \cap \partial J$ (we may consider $I_+$ since we are only moving forward in time), then $E = J_+ \neq I_+$. Therefore, by the contrapositive of Lemma 25, $\sup_E |\phi| = \infty$. But since $\phi$ is continuous on $[t_0, T)$, we must have $\limsup_{t \to T} |\phi(t)| = \infty$. The argument for solutions moving backward in time is the same.

The above theorem may be improved as follows:

Theorem 7. Consider an initial value problem
$$\dot{x} = v(x, t), \qquad x(t_0) = x_0.$$
Assume $v$ is continuous and locally Lipschitz in $\mathcal{U}$, where $\mathcal{U}$ is a connected open subset of $\mathbb{R}^n \times \mathbb{R}$ containing $(x_0, t_0)$. Then there is an open interval $J$ satisfying $t_0 \in J$ and a unique solution $\phi \colon J \to \mathbb{R}^n$ to the initial value problem. Moreover, $J$ is maximal in the following sense: Let $J_+ = J \cap (t_0, \infty)$ and $J_- = J \cap (-\infty, t_0)$. Then neither of the graphs
$$G_\pm = \{(t, \phi(t)) : t \in J_\pm\}$$
is contained in any compact subset of $\mathcal{U}$.

The proof is essentially the same as that of Theorem 6.

Here is an important principle which follows from the basic theorems:

Proposition 26. Consider the graph of a solution $(t, x(t))$ to a differential equation $\dot{x} = v(x, t)$, where $v$ is Lipschitz. If any two solutions have graphs which cross, then they must coincide on the intersection of their intervals of definition.

Proof. Let $\phi_1$ and $\phi_2$ be the two solutions. If their graphs cross at $(t_0, x_0)$, then they both solve the initial value problem $\dot{x} = v(x, t)$, $x(t_0) = x_0$. The solutions must coincide on a small interval by Proposition 22, and then must coincide on the whole intersection of their intervals of definition by Lemma 24.

Homework Problem 15. Consider the initial value problem
$$\dot{x} = x^2 + t, \qquad x(0) = 1.$$
Show that the solution to this problem (moving forward in time) exists only until some time $T > 0$, where $T < 1$.

Hint: See Examples 6 and 7 above. Let $\phi(t)$ be the solution to the current initial value problem. We will compare $\phi$ to the solution $\psi(t) = \frac{1}{1-t}$ of the initial value problem $\dot{x} = x^2$, $x(0) = 1$. Let $J$ be the maximal interval on which $\phi$ can be extended.
Let $J_+ = J \cap (0, \infty)$; $T$ is then the positive endpoint of $J_+$. Now consider the interval
$$E = \{t \in J_+ : \phi(\tau) \ge \psi(\tau) \text{ for all } \tau \in (0, t]\}.$$

(a) Show that $E = J_+$ implies $T \le 1$. (Use Theorem 6.)

(b) Proceed to show $E = J_+$. It suffices to show $E$ is nonempty, open, and closed in $J_+$. Why?

(c) To show $E$ is nonempty, differentiate the equation $\dot\phi = \phi^2 + t$ at $t = 0$. This will allow you to compute $\ddot\phi(0)$. Show that $\phi(0) = \psi(0)$, $\dot\phi(0) = \dot\psi(0)$, and $\ddot\phi(0) > \ddot\psi(0)$. Why does this show $E$ is nonempty? (Use Taylor's Theorem or integrate in $t$ twice; in particular, by the regularity results in Subsection 2.5 below, $\ddot\phi$ is continuous.)

(d) To show $E$ is open, show that $\dot\phi(t) > \dot\psi(t)$ for $t \in E$.

(e) To show $E$ is closed, use the continuity of $\phi$ and $\psi$. So this proves $E = J_+$ and so $T \le 1$.

(f) To show $T < 1$, note that part (c) implies there is a point $\tau \in E$ where $\phi(\tau) > \psi(\tau)$. Let $\tilde\psi(t)$ be the solution to the initial value problem $\dot{x} = x^2$, $x(\tau) = \phi(\tau)$. Solve this equation explicitly and show that $\tilde\psi$ blows up at a time $\tilde{T} < 1$. Then note that parts (a)–(e) can be repeated to show that $J_+ \subset (0, \tilde{T})$.

2.4 Linear systems

If $x \in \mathbb{R}^n$, a homogeneous linear system is a system of the form
$$\dot{x} = A(t)x,$$
where $A(t)$ is an $n \times n$ matrix-valued function of $t$ alone. In this case, it is straightforward to see that the space of solutions is a vector space over $\mathbb{R}$. In other words, if $\alpha \in \mathbb{R}$ and $\phi, \psi$ satisfy the equation, then $\alpha\phi + \psi$ also satisfies the equation. The existence and uniqueness theorem allows us to find the dimension of the solution space.

Proposition 27. Consider the equation $\dot{x} = A(t)x$, where $A(t)$ is a continuous $n \times n$ matrix-valued function of $t$, and $x(t) \in \mathbb{R}^n$. For each $t_0$, there is an interval $I \ni t_0$ so that the space of solutions $\phi(t)$ on $I$ has dimension $n$. Consider an initial value condition $x(t_0) = x_0$, and let $\phi_{x_0}(t)$ be the solution to this initial value problem. Then the map $S \colon x_0 \mapsto \phi_{x_0}$ is a linear isomorphism from $\mathbb{R}^n$ to the space of solutions defined on $I$.

Remark.
It is not too hard to show that the interval $I$ can be taken to be the maximal open interval containing $t_0$ on which $A(t)$ is continuous. (See Michael Taylor, Partial Differential Equations, Basic Theory.)

Proof. $A(t)x$ is locally Lipschitz in $x$ and continuous in $t$, as needed for Theorems 5 and 6. First of all, for a basis $\xi_i$ of $\mathbb{R}^n$, let $I$ be a small interval on which all the solutions $\phi_{\xi_i}$ exist. The map $x_0 \mapsto \phi_{x_0}$ is linear by uniqueness: if $x_0 = a^i \xi_i$, then $a^i \phi_{\xi_i}$ is a solution with initial value $x_0$, and so $\phi_{x_0} = a^i \phi_{\xi_i}$. $S$ is injective since if $x_0 \neq y_0$, then $\phi_{x_0}(t_0) \neq \phi_{y_0}(t_0)$, and thus $\phi_{x_0} \neq \phi_{y_0}$. Again by uniqueness, any solution $\phi$ to $\dot{x} = A(t)x$ is determined by the initial value $\phi(t_0) = x_0$, and so $S$ is onto.

Given a linear equation $\dot{x} = A(t)x$, for $x = x(t) \in \mathbb{R}^n$, we can consider a similar equation $\dot{X} = A(t)X$ for $X = X(t)$ an $n \times n$ matrix-valued function. The solution $\Phi(t)$ of the initial value problem $\dot{X} = A(t)X$, $X(t_0) = I$ (the identity matrix), is called the fundamental solution of the equation $\dot{x} = A(t)x$. It is straightforward to see that the $i$th column of $\Phi(t)$ is the solution to
$$\dot{x} = A(t)x, \qquad x^j(t_0) = \delta^j_i.$$
Moreover, the fundamental solution can be used to compute any solution to the differential equation near $t_0$.

Lemma 28. On the maximal interval of existence of the fundamental solution $\Phi(t)$ of $\dot{x} = A(t)x$, the solution to the initial value problem $\dot{x} = A(t)x$, $x(t_0) = x_0$, is given by $\Phi(t)x_0$.

Proof. The proof is an immediate calculation.

Homework Problem 16. An inhomogeneous linear system is a system of the form
$$\dot{x} = A(t)x + b(t), \qquad (15)$$
where $A(t)$ and $x$ are as above and $b(t)$ is a continuous $\mathbb{R}^n$-valued function.

(a) Let $\psi(t)$ be a solution to (15). Show that the solution space to (15) is equal to
$$\{\psi(t) + \phi(t) : \phi(t) \text{ solves } \dot{x} = A(t)x\}.$$

(b) In dimension $1$, let $\Phi(t)$ be the fundamental solution to $\dot{x} = A(t)x$. Show that the general solution to (15) is
$$\Phi(t)\left(\int \frac{b(t)}{\Phi(t)}\,dt + C\right).$$

(c) Still in dimension $1$, solve the initial value problem $\dot{x} = x + t$, $x(0) = 1$.
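The formula in part (b) can be checked numerically on an equation other than the one in part (c). The sketch below is my own illustrative example, with constant $A(t) = a$ so that $\Phi(t) = e^{at}$ and $t_0 = 0$; the indefinite integral is replaced by a definite integral from $0$, approximated with the trapezoid rule, and the initial condition fixes $C = x(0)$.

```python
import math

def inhomogeneous_1d(a, b, x0, t1, n=20000):
    """Evaluate x(t1) = Phi(t1) * (int_0^{t1} b(s)/Phi(s) ds + x0)
    for x' = a*x + b(t), x(0) = x0, where Phi(t) = exp(a*t) is the
    fundamental solution of x' = a*x.  The integral is approximated
    by the trapezoid rule with n subintervals."""
    h = t1 / n
    integrand = lambda s: b(s) * math.exp(-a * s)
    total = (integrand(0.0) + integrand(t1)) / 2
    for k in range(1, n):
        total += integrand(k * h)
    return math.exp(a * t1) * (h * total + x0)

# Illustrative equation: x' = -x + 1, x(0) = 0, with exact
# solution x(t) = 1 - e^{-t}
x_num = inhomogeneous_1d(-1.0, lambda s: 1.0, 0.0, 2.0)
x_exact = 1.0 - math.exp(-2.0)
```

The agreement with the exact solution reflects that the formula of part (b) is the standard variation-of-parameters representation, here applied with a particular antiderivative.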
An important class of examples consists of the equations with constant coefficients:
$$\dot{x} = Ax,$$
for $A$ a constant $n \times n$ matrix. The fundamental solution to such an equation (with $t_0 = 0$) can be calculated directly. In the case that $A$ is diagonalizable, write $A = PDP^{-1}$, with $D = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ the diagonal matrix with the eigenvalues $\lambda_i$ of $A$ along the diagonal, and $P$ the matrix whose columns are a basis of eigenvectors for the appropriate eigenvalues. Then if we define
$$e^{tD} = \mathrm{diag}(e^{t\lambda_1}, \dots, e^{t\lambda_n}),$$
the fundamental solution to $\dot{x} = Ax$ is given by $e^{tA} \equiv P e^{tD} P^{-1}$. To check that $e^{tA}$ is the fundamental solution, note that $e^{0A} = I$ and
$$\frac{d}{dt}e^{tA} = \frac{d}{dt}(P e^{tD} P^{-1}) = P\left(\frac{d}{dt}e^{tD}\right)P^{-1} = P D e^{tD} P^{-1} = P D P^{-1} P e^{tD} P^{-1} = A e^{tA}.$$
One thing to note is that $D$ and $P$ may be complex-valued matrices. This doesn't cause any problem if we use Euler's formula $e^{x+iy} = e^x(\cos y + i \sin y)$.

Not every matrix $B$ is diagonalizable. To find a general formula for the fundamental solution $e^{tB}$, we need to deal with the case of Jordan blocks. The following problem addresses this.

Homework Problem 17. Let $B$ be the $n \times n$ Jordan block matrix
$$B = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda \end{pmatrix} \qquad (16)$$
with $\lambda$ on the diagonal, $1$ just above the diagonal, and $0$ elsewhere. Find the fundamental solution $e^{tB}$ to $\dot{x} = Bx$.

Hint: Write out the system of equations in terms of components. Note that $\dot{x}^n$ only involves $x^n$ and not any other $x^i$. So first solve the appropriate initial value problems for $x^n$ (you'll need to do one initial value problem for each column of the identity matrix $I$). Then do $x^{n-1}$, then $x^{n-2}$, etc., and find a formula that works for all $x^i$. Alternatively, it is possible to write out $e^{tB}$ as a power series. If you approach the problem this way, you must check to be sure your answer works.

Of course the reason we consider Jordan blocks is the following famous theorem.

Theorem 8 (Jordan Canonical Form). Let $A$ be an $n \times n$ complex matrix.
Then we can write A = PBP⁻¹, where B is an upper triangular, block diagonal matrix of the form

    B = [ B₁ 0  0  ··· 0
          0  B₂ 0  ··· 0
          0  0  B₃ ··· 0
          ⋮  ⋮  ⋮  ⋱  ⋮
          0  0  0  ··· Bₘ ],

where each Bᵢ is an lᵢ × lᵢ Jordan block matrix of the form (16) for i = 1, …, m, with λ = λᵢ an eigenvalue of A. Of course l₁ + ··· + lₘ = n. If λ is a root of the characteristic polynomial det(λI − A) repeated k times, then

    Σ_{λᵢ=λ} lᵢ = k.

B is unique up to the ordering of the blocks Bᵢ.

Remark. A is diagonalizable if and only if each Jordan block is 1 × 1. If the characteristic polynomial of A has distinct roots, then A is diagonalizable, but the converse is false in general (A = I the identity matrix is a counterexample).

Homework Problem 18. Assume that all the eigenvalues of the n × n matrix A have negative real part. (A is not necessarily diagonalizable.) Show that e^{tA} → 0 as t → ∞. (Just check that each entry in the matrix e^{tA} goes to 0.)

Homework Problem 19. Solve the initial value problem

    ẋ = 2x − y,  ẏ = 2x + 5y,  x(0) = 2,  y(0) = 1.

2.5 Regularity

Regularity of a function refers to how many times the function may be differentiated. A function is (locally) Cᵏ if it and all of its partial derivatives up to order k are continuous. A function is C^∞ if it and all of its partial derivatives of all orders are continuous. For the purposes of this course, a function is smooth if it is C^∞ (in other settings a function may be called smooth if it has as many derivatives as the purpose at hand requires). There are other notions of regularity in which the function, and perhaps its derivatives, suitably defined, are in Lᵖ or other Banach spaces. A vector-valued function is smooth or Cᵏ if and only if each of its component functions is smooth or Cᵏ respectively.

Theorem 9. Assume v : O × I → ℝⁿ is smooth (O ⊂ ℝⁿ is a domain and I ⊂ ℝ is an open interval). Any solution to ẋ = v(x, t) is smooth.

Proof. Let φ be a solution.
Since φ̇ exists, φ is differentiable, and thus continuous. Since v is continuous as well, φ̇ = v(φ, t) is continuous and so φ is (locally) C¹. Now since v is smooth, we may differentiate to find

    φ̈(t) = (∂v/∂xⁱ)(φ, t) φ̇ⁱ(t) + (∂v/∂t)(φ, t).

Now since φ and φ̇ and the partial derivatives of v are continuous, we see that φ̈ is continuous and φ is (locally) C². Since v is smooth, we can keep differentiating, using the chain and product rules, to find by induction that dᵐφ/dtᵐ is continuous for all m, and so φ is C^∞.

Remark. The technique used in the proof of Theorem 9 above is called bootstrapping. In this process, once we know that φ is C⁰, we plug into the equation to find that φ is C¹. Then we use the fact that φ is C¹ to prove φ is C², etc.

Remark. The proof above also shows that if v is Cᵏ, then φ is C^{k+1}.

2.6 Higher order equations

A higher-order system of ODEs is of the form

    x^{(m)} = v(x^{(m−1)}, …, ẋ, x, t),    (17)

where of course x^{(m)} = dᵐx/dtᵐ. There is an easy trick to transform this system to an equivalent first-order system with more variables. Let y¹ = ẋ, …, y^{m−1} = x^{(m−1)}. Then it is easy to see the system (17) above is equivalent to the system

    ẏ^{m−1} = v(y^{m−1}, …, y¹, x, t),
    ẏ^{m−2} = y^{m−1},
       ⋮
    ẏ¹ = y²,
    ẋ = y¹.    (18)

This first-order system leads us to the appropriate formulation of the initial value problem:

Theorem 10. Let U be a neighborhood of (x₀^{m−1}, …, x₀¹, x₀, t₀) in ℝ^{nm+1} = ℝⁿ × ··· × ℝⁿ × ℝ. Let v : U → ℝⁿ be locally Lipschitz. Then there is an interval J on which there is a unique solution to the initial value problem

    x^{(m)} = v(x^{(m−1)}, …, ẋ, x, t),
    x^{(m−1)}(t₀) = x₀^{m−1},
       ⋮
    ẋ(t₀) = x₀¹,
    x(t₀) = x₀.    (19)

Moreover, if T is an endpoint of J (either finite or infinite), then as t → T, (x^{(m−1)}, …, ẋ, x, t) leaves every compact subset of U.

Proof. Apply Theorems 5 and 7.
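The order-reduction trick of (18) is easy to exercise numerically. A minimal sketch (assuming the numpy and scipy libraries), using the harmonic oscillator ẍ = −x as a hypothetical example: with y = ẋ, the second-order equation becomes the first-order system ẋ = y, ẏ = −x.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, state):
    # first-order form of the second-order equation xdd = -x:
    # with y = xd, the system is xd = y, yd = -x
    x, y = state
    return [y, -x]

# initial conditions x(0) = 1, xd(0) = 0 pick out the solution x(t) = cos t
sol = solve_ivp(rhs, (0.0, np.pi), [1.0, 0.0],
                rtol=1e-10, atol=1e-12, dense_output=True)
x_half, xdot_half = sol.sol(np.pi / 2)   # x(pi/2) and xd(pi/2)
```

As Theorem 10 prescribes, initial conditions for both x and ẋ are needed to pin down the solution.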
So for an mth order differential equation, we need initial conditions for the function and its derivatives up to order m − 1.

Remark. The trick of introducing new variables into a system of ODEs is standard in physics. For a particle at position x = x(t), a typical equation involves how a force acts on the particle. The sum F of the forces acting on the particle must be equal to mẍ, where m is a constant called the mass. It is standard to introduce a new vector quantity, called the momentum, q = mẋ. Then F = mẍ is equivalent to the system

    q̇ = F,  ẋ = q/m.

Again, an important class of examples is linear equations with constant coefficients. If

    x^{(m)} + a_{m−1}x^{(m−1)} + ··· + a₁ẋ + a₀x = 0,

for x a real-valued function, the functions {e^{λₖt}} are linearly independent in the solution space, if the λₖ solve the characteristic equation

    λᵐ + a_{m−1}λ^{m−1} + ··· + a₁λ + a₀ = 0.

If all the roots are distinct, then {e^{λₖt}} form a basis. If a root is repeated l times, then we must consider functions of the form tʲe^{λₖt} for j = 0, …, l − 1 to form a basis of the solution space. Euler's formula again allows us to handle complex roots of the characteristic equation.

Homework Problem 20. For which real values of the constants a and b do all the solutions to ẍ + aẋ + bx = 0 go to 0 as t → ∞? Prove your answer, and draw your answer as a region in the (a, b) plane.

2.7 Dependence on initial conditions and parameters

We've shown above that if v = v(x, t) is smooth, then the resulting solution to ẋ = v(x, t), x(t₀) = x₀ is also smooth as a function of t. The initial value problem also depends on the initial point x₀. We investigate regularity of the solution depending on x₀.

First of all, we remark that there is a neighborhood N of (x₀, t₀) in ℝ^{n+1} and an ε > 0 so that every solution to the equation with initial condition x(τ) = y for (y, τ) ∈ N exists for |t − τ| ≤ ε, by Lemma 23. This existence on a neighborhood allows us to consider taking derivatives in y in what follows.

Theorem 11.
Let v be a C² function on a neighborhood of the initial conditions (y, t₀) ∈ ℝⁿ × ℝ. Then the solution φ = φ(y, t) to the initial value problem ẋ = v(x, t), x(t₀) = y, is C¹ in y.

Proof. If ∂φ/∂yⁱ exists, then it must satisfy

    (∂/∂t) D_y φ = D_x v(φ, t) ∘ D_y φ.

(Here D_y φ is the total derivative matrix with respect to the y variables. So its entries are ∂φʲ/∂yⁱ.) So Φ = (φ, D_y φ) = (x, z) should satisfy the initial value problem

    ẋ = v(x, t),  ż = D_x v(x, t) ∘ z,    (20)
    x(t₀) = y,  z(t₀) = I the identity matrix.

Note that since v is C², D_x v = (∂vᵏ/∂xʲ) is C¹ and is thus locally Lipschitz by Proposition 15. Even though we don't yet know that the derivative D_y φ satisfies the equation, we do know that the initial value problem (20) is solvable.

In order to show the solution to (20) is the partial derivative, we return to the proof of Theorem 5. Let φ₀ = y, ψ₀ = I the identity matrix. Then (φ₀, ψ₀) satisfy the initial conditions in (20). Now we form Picard approximations

    φ_{n+1}(y, t) = y + ∫_{t₀}ᵗ v(φₙ(y, τ), τ) dτ,
    ψ_{n+1}(y, t) = I + ∫_{t₀}ᵗ D_x v(φₙ(y, τ), τ) ∘ ψₙ(y, τ) dτ.

It is easy to show by induction that D_y φₙ = ψₙ. We already have the initial step n = 0, and since we can differentiate under the integral sign (see Proposition 11 above), we can easily check that D_y φₙ = ψₙ implies D_y φ_{n+1} = ψ_{n+1}. We know by the proof of Theorem 5 that φₙ → φ and ψₙ → ψ uniformly on a small interval containing t₀. Then Proposition 8 shows that ∂φ/∂yⁱ = ψᵢ, the ith column of ψ, for i = 1, …, n. Since these partial derivatives are continuous (the uniform limit of continuous functions is continuous), Proposition 6 shows D_y φ = ψ.

Remark. The previous theorem is true if we assume v is only C¹ and not necessarily C². The proof is more involved in the case v is only C¹. (See Taylor, Partial Differential Equations, Basic Theory, section 1.6.)

A bootstrapping argument can be used to prove the following Proposition 29.
For r ≥ 2, let v be a Cʳ function on a neighborhood of the initial conditions (x₀, t₀) ∈ ℝⁿ × ℝ. Then the solution φ = φ(y, t) to the initial value problem ẋ = v(x, t), x(t₀) = y, is C^{r−1} in y.

Proof. Let Proposition T_r be the proposition for a given r ≥ 2. We proceed by induction. The case r = 2 is proved above in Theorem 11. Now assume that Proposition T_r has been proved. To prove T_{r+1}, assume that v is locally C^{r+1} and let φ be a solution to the initial value problem. Then D_x v is locally Cʳ. Now as above, the pair (φ, D_y φ) = (x, z) satisfies

    ẋ = v(x, t),  ż = D_x v(x, t) ∘ z.    (21)

Now analyze the right-hand sides of the equations in (21). They are Cʳ functions of x, z, t. Therefore, Proposition T_r shows that z = D_y φ is locally C^{r−1} in y. Since the first partial derivatives of φ are C^{r−1}, φ is Cʳ. This proves the inductive step, and the proposition.

We also have the following

Corollary 30. If v = v(x, t) is smooth (C^∞), then the solution φ to the initial value problem ẋ = v(x, t), x(t₀) = y is smooth in y.

Moreover, it is not too hard to prove the following:

Theorem 12. Let r ≥ 2. If v(x, t) is Cʳ jointly in x and t, and if φ is the solution to ẋ = v(x, t), x(t₀) = y, then φ is jointly C^{r−1} in y, t and t₀.

Idea of proof. The difficult part is already done (the C^{r−1} dependence on y). For the rest, recall that any solution φ = φ(y, t₀, t) satisfies

    φ = y + ∫_{t₀}ᵗ v(φ(y, t₀, τ), τ) dτ.

Then use the Fundamental Theorem of Calculus and Proposition 11 above to produce a bootstrapping argument to show that the appropriate partial derivatives are continuous. For a complete proof, see Arnol'd, Ordinary Differential Equations, section 32.5.

Homework Problem 21. For f = f(x, t, y) a smooth function of the real variables x, t, and y, compute

    (d/dt) ∫₀^{t²} f(x(t, y), t, y) dy.

Make sure your answer works for the functions

    f(x, t, y) = x²ty + t³y² + x,  x(t, y) = y² + t².
Hint: Carefully rename all intermediate variables and apply the Chain Rule. It also should help to write down the anti-derivative F = ∫ f(x(t, y), t, y) dy and work with the function F using the Fundamental Theorem of Calculus.

Homework Problem 22 (Smooth dependence on parameters). Show that if v = v(x, t, α) is jointly smooth on a neighborhood of (x₀, t₀, α₀) in ℝⁿ × ℝ × ℝᵐ, then the solution φ to the initial value problem ẋ = v(x, t, α), x(t₀) = x₀ is smooth as a function of α. Hint: Show that this initial value problem is equivalent to the problem

    ẋ = v(x, t, β),  x(t₀) = x₀,
    β̇ = 0,  β(t₀) = α.

2.8 Autonomous equations

An ODE system of the form ẋ = v(x) is autonomous. In other words, a system is autonomous if there is no explicit dependence on t. The main fact about autonomous systems is the following proposition, whose proof is an easy computation:

Proposition 31. If φ is a solution to ẋ = v(x), then for all T ∈ ℝ, φ̃(t) = φ(t + T) is also a solution.

A constant solution to an ODE system is called an equilibrium solution. The equilibrium solutions to autonomous equations correspond to the roots of v.

Example 9. Consider the initial value problem ẋ = x² − 1. To solve, first note the equilibrium solutions x = 1 and x = −1. If x² − 1 ≠ 0, compute

    dx/dt = x² − 1,
    ∫ dx/(x² − 1) = ∫ dt,
    ∫ ½ (1/(x − 1) − 1/(x + 1)) dx = t + C,
    ½ ln |(x − 1)/(x + 1)| = t + C,
    (x − 1)/(x + 1) = ±e^{2t+2C} = Ae^{2t},  A = ±e^{2C} ≠ 0,
    x = (1 + Ae^{2t})/(1 − Ae^{2t}),
    A = (x(0) − 1)/(x(0) + 1).

If x(0) ∈ (−1, 1), then A < 0, and the solution x exists for all time and is bounded between the equilibrium solutions at 1 and −1. Moreover, x approaches the equilibrium solutions: x → −1 as t → ∞ and x → 1 as t → −∞. If x(0) > 1, then A ∈ (0, 1) and the solution exists only for t ∈ (−∞, −½ ln A). If x(0) < −1, then A > 1 and the solution exists only for t ∈ (−½ ln A, ∞).

This behavior is typical of the behavior of autonomous equations for Lipschitz v.
Any bounded solution which exists for all time must be asymptotic to equilibrium solutions as t → ±∞. Also note that any integral curve I acts as a barrier to other solutions, in that no other integral curves can cross I (see Proposition 26 above).

Homework Problem 23. Let v : ℝ → ℝ be locally Lipschitz. Show that any bounded solution φ of ẋ = v(x) which exists for all time satisfies lim_{t→∞} φ(t) = c, where v(c) = 0.

Hint: There are three cases:

Case 1: v(φ(0)) = 0. Show that φ is constant by uniqueness.

Case 2: v(φ(0)) > 0. Show that v(φ(t)) > 0 for all t (if it is ever equal to zero, apply the argument of Case 1 above to show φ is constant; also use the continuity of v ∘ φ). Now show φ(t) is always increasing, and so must have a finite limit c as t → ∞. Compute lim_{t→∞} v(φ(t)). Write

    ∞ > c = φ(0) + ∫₀^∞ φ̇(t) dt = φ(0) + ∫₀^∞ v(φ(t)) dt,

and show that v(c) = 0.

Case 3: v(φ(0)) < 0 is essentially the same as Case 2.

2.9 Vector fields and flows

An important interpretation of autonomous systems of equations is given in terms of vector fields. Interpret x(t) as a parametrized curve x : I → ℝⁿ, where I ⊂ ℝ is an interval. Then ẋ(t) is the tangent vector to the curve at time t. For O ⊂ ℝⁿ an open set, a function v : O → ℝⁿ can be thought of as a vector field. In other words, at every point x ∈ O, v(x) is a vector in ℝⁿ based at x. Then we have a natural interpretation of an autonomous differential equation ẋ = v(x) as the flow along the vector field v. For any solution to ẋ = v(x), the tangent vector ẋ(t) must be equal to the value of the vector field v(x(t)). The solution x(t) is an integral curve of the equation ẋ = v(x). The integral curves are tangent to the vector field at each point. Moreover, if v(x) is locally Lipschitz, then the solutions are unique, and we may think of the vector field as giving unique directions for how to proceed in time at each point in space.
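Returning to Example 9 above, the closed form can be compared against a direct numerical integration; a minimal check, assuming the numpy and scipy libraries:

```python
import numpy as np
from scipy.integrate import solve_ivp

def closed_form(t, x0):
    # Example 9: x(t) = (1 + A e^{2t}) / (1 - A e^{2t}), A = (x0-1)/(x0+1)
    A = (x0 - 1) / (x0 + 1)
    return (1 + A * np.exp(2 * t)) / (1 - A * np.exp(2 * t))

x0 = 0.5          # x(0) in (-1, 1), so A < 0 and the solution is global
sol = solve_ivp(lambda t, x: x**2 - 1, (0.0, 4.0), [x0],
                rtol=1e-10, atol=1e-12)
x_end = sol.y[0, -1]
```

By t = 4 the solution is already close to the equilibrium x = −1, as predicted.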
By the invariance of solutions in time, we have the following strong version of uniqueness:

Proposition 32. Let O ⊂ ℝⁿ be an open set, and let v : O → ℝⁿ be locally Lipschitz. If φ₁ and φ₂ are two maximally extended solutions to ẋ = v(x) which satisfy φ₁(t₁) = φ₂(t₂), then φ₁(t) = φ₂(t + t₂ − t₁) for all t in the maximal interval of definition of φ₁.

Proof. φ₁(t) and φ̃₂(t) = φ₂(t + t₂ − t₁) both satisfy the initial value problem ẋ = v(x), x(t₁) = φ₁(t₁), and so must be the same by Theorems 5 and 6.

For a vector field v on O ⊂ ℝⁿ, a picture of all the integral curves on O is called the phase portrait of v. Recall we drew in class the phase portraits of the two systems in ℝ²

    ẋ = [ 1 0 ; 0 −1 ] x,    ẋ = [ −3 4 ; −2 3 ] x.

Homework Problem 24. (a) Draw the phase portrait of the system in ℝ²

    ẋ = [ 1 0 ; 0 2 ] x.

Show that each integral curve lies in a parabola or a line in ℝ².

(b) Draw the phase portrait of the system in ℝ²

    ẋ = [ 3/2 −1/2 ; −1/2 3/2 ] x.

Here is the principal theorem regarding flows of vector fields on open sets:

Theorem 13. Let O ⊂ ℝⁿ be open, and v : O → ℝⁿ be smooth. Then there is an open set U so that O × {0} ⊂ U ⊂ O × ℝ on which the solution φ(y, t) to ẋ = v(x), x(0) = y exists, is unique, and is smooth jointly as a function of (y, t).

Proof. This follows immediately from Theorems 5, 7 and 11.

Remark. It may not be possible to find an ε > 0 so that O × (−ε, ε) ⊂ U. The reason is that solutions may leave O in shorter and shorter times for initial conditions y → ∂O. A simple example is given by v(x) = 1, O = (0, 1). This problem cannot be fixed by considering O = ℝⁿ, since we may have v(y) → ∞ rapidly as y → ∞ in ℝⁿ. However, see the following corollary.

Corollary 33. Under the conditions of Theorem 13 above, if K ⊂ O is compact, then there is an ε > 0 so that the solution φ : K × (−ε, ε) → O is defined.

Proposition 34. Consider φ(y, t) the solution to ẋ = v(x), x(0) = y, for v smooth.
Then as long as φ(y, t₁), φ(y, t₁ + t₂) ∈ O, we have

    φ(y, t₁ + t₂) = φ(φ(y, t₁), t₂).

Proof. Consider ψ(t) = φ(y, t₁ + t), θ(t) = φ(φ(y, t₁), t). Then if we show ψ and θ satisfy the same initial value problem, uniqueness will show that ψ(t) = θ(t) and we are done. Compute

    ψ(0) = φ(y, t₁),
    θ(0) = φ(φ(y, t₁), 0) = φ(y, t₁),
    ψ̇(t) = φ̇(y, t₁ + t) · 1 = v(φ(y, t₁ + t)) = v(ψ(t)),
    θ̇(t) = φ̇(φ(y, t₁), t) = v(φ(φ(y, t₁), t)) = v(θ(t)).

Note that it is necessary in the previous Proposition 34 to restrict to times at which the solution does not leave O. In fact, long-time existence of flows along vector fields is problematic on open subsets of ℝⁿ. Recall we require our subsets to be open for ODEs since we want to be able to take two-sided limits for any derivatives involved. On the other hand, compactness guarantees a uniform time interval for existence. But compact subsets of ℝⁿ are closed and bounded, and thus (if nonempty) cannot be open. The way out of this problem is to consider compact manifolds, which we will realize as compact lower-dimensional subsets of ℝⁿ. For example, S¹ = {(x¹, x²) : (x¹)² + (x²)² = 1} is a compact one-dimensional submanifold of ℝ².

2.10 Vector fields as differential operators

A vector field v on O naturally differentiates functions f on O by the directional derivative:

    vf = D_v f = vⁱ ∂f/∂xⁱ

for vⁱ the components of v. Therefore, we often write

    v = vⁱ ∂/∂xⁱ.

We say that v is a first-order differential operator on functions f. This observation is natural from the point of view of ODEs by the following

Proposition 35. For an interval I ⊂ ℝ, let φ : I → ℝⁿ be a solution to the autonomous system ẋ = v(x), where v : O → ℝⁿ is a continuous function and O an open subset of ℝⁿ. Also consider a differentiable function f : O → ℝ. Then the derivative

    (f ∘ φ)′(t) = (D_v f)(φ(t)) = (vf)(φ(t)).

Proof.
Compute

    (f ∘ φ)′(t) = (Df)(φ(t)) ∘ (Dφ)(t)
                = (∂f/∂xⁱ)(φ(t)) (dφⁱ/dt)(t)
                = (∂f/∂xⁱ)(φ(t)) vⁱ(φ(t))
                = (vⁱ ∂/∂xⁱ f)(φ(t))
                = (vf)(φ(t)).

Define the bracket [v, w] of two operators to be

    [v, w]f = (vw − wv)f = v(wf) − w(vf).

Homework Problem 25. Let v and w be two smooth vector fields on O.

(a) Show that the differential operator [v, w] is also a first-order differential operator determined by a vector field (which we also write as [v, w]). What are the components of [v, w]?

(b) For smooth vector fields u, v and w, show that [u, v] = −[v, u] and

    [[u, v], w] + [[v, w], u] + [[w, u], v] = 0.

(This last identity is the Jacobi identity.)

Remark. Part (b) of the previous problem shows that the vector space of smooth vector fields on O is a Lie algebra. The bracket [·, ·] is called the Lie bracket.

3 Manifolds

3.1 Smooth manifolds

We define smooth manifolds as subsets of ℝᴺ. We basically follow Spivak, Calculus on Manifolds, Chapter 5. When we say smooth in this section, we mean C^∞.

We say a subset M ⊂ ℝⁿ is a smooth k-dimensional manifold (or, more properly, a submanifold of ℝⁿ), if for all x ∈ M, there are open subsets U ⊂ ℝᵏ and O ⊂ M with x ∈ O and a one-to-one C^∞ map φ : U → ℝⁿ satisfying

1. φ(U) = O.
2. For all y ∈ U, Dφ(y) has rank k.
3. φ⁻¹ : O → U is continuous.

Such a pair (φ, U) is called a local parametrization of M. The components of the map φ⁻¹ : O → ℝᵏ are local coordinates on M. A set of triples (φ_α, U_α, O_α) is called an atlas of M if {O_α} is an open cover of M.

Since O is an open subset of M, there is an open subset W ⊂ ℝⁿ so that O = M ∩ W. In this case, we may rewrite condition (1) as

(1′) φ(U) = M ∩ W.

Also note that φ : U → O is a homeomorphism from U to O, since it is smooth, one-to-one, onto, and φ⁻¹ is continuous.

Now we note with a few examples why conditions (2) and (3) are necessary. First of all, consider φ : ℝ → ℝ² given by φ(t) = (t², t³). Then φ is smooth, one-to-one, and φ⁻¹ : φ(ℝ) → ℝ is continuous.
But we note the image φ(ℝ), which is the graph of x¹ = (x²)^{2/3} in ℝ², is not smooth at (0, 0) ∈ ℝ². We also check that

    Dφ = [ 2t
           3t² ] = 0  when t = 0 and φ(t) = (0, 0),

and so Dφ has rank 0 < 1 at the point at which φ(ℝ) is not smooth.

Condition (3) is necessary by the following problem:

Homework Problem 26. Recall polar coordinates (x, y) = (r cos θ, r sin θ) in ℝ². Show that a portion of the polar graph r = sin 2θ can be parametrized, for I an open interval in ℝ, by φ : I → ℝ² so that φ is one-to-one, C^∞, and Dφ is never 0, but so that φ⁻¹ : φ(I) → I is not continuous. Sketch the graph and indicate pictorially why φ(I) should not be considered a submanifold of ℝ².

If W and V are open subsets of ℝⁿ, then a map f : W → V is a diffeomorphism if f is one-to-one, onto, C^∞, and f⁻¹ is C^∞. The Inverse Function Theorem and Problem 9 show

Lemma 36. f : W → V is a diffeomorphism if and only if f is one-to-one, onto, C^∞, and det Df(x) ≠ 0 for all x ∈ W.

The following theorem is useful in proving properties about manifolds:

Theorem 14. M ⊂ ℝⁿ is a k-dimensional manifold if and only if for all x ∈ M, there are two open subsets V, W of ℝⁿ, with x ∈ W, and a diffeomorphism h : W → V satisfying

    h(W ∩ M) = V ∩ (ℝᵏ × {0}) = {y ∈ V : y^{k+1} = ··· = yⁿ = 0}.

Proof. (⇐) Let U = {a ∈ ℝᵏ : (a, 0) ∈ h(W)}, and define φ : U → ℝⁿ by φ(a) = h⁻¹(a, 0). φ is smooth and one-to-one since h is a diffeomorphism. Moreover, φ(U) = M ∩ W, satisfying condition (1′). Consider H : W → ℝᵏ,

    H(z) = (h¹(z), …, hᵏ(z)).

Then φ⁻¹ = H|_{W∩M}, which is continuous. So all that is left to check is the rank condition (2). Since H(φ(y)) = y for all y ∈ U, we use the Chain Rule to compute

    DH(φ(y)) ∘ Dφ(y) = I,

and so Dφ(y) must be an injective linear map, and so must have rank k. Thus M is a smooth manifold.

(⇒) Now assume M is a manifold, and for x ∈ M with local parametrization φ, define y = φ⁻¹(x). Then Dφ(y) has rank k, and so there is at least one k × k submatrix of Dφ(y) with nonzero determinant.
(We may think of Dφ(y) as an n × k matrix mapping column vectors in ℝᵏ to column vectors in ℝⁿ. Then a k × k submatrix is simply a collection of k distinct rows of Dφ(y).) By a linear change of basis, if necessary, we may assume that

    det_{1≤i,j≤k} (∂φⁱ/∂yʲ)(y) ≠ 0.

By continuity, this is true on an open neighborhood U′ of y. Define g : U′ × ℝ^{n−k} → ℝⁿ by g(a, b) = φ(a) + (0, b). Then, in block matrix form,

    Dg(a, b) = [ (∂φⁱ/∂yʲ)_{1≤i,j≤k}        0
                 (∂φⁱ/∂yʲ)_{1≤j≤k, k<i≤n}   I_{n−k} ].

So det Dg(a, b) = det_{1≤i,j≤k} (∂φⁱ/∂yʲ) ≠ 0. So we may apply the Inverse Function Theorem to find that there are open subsets of ℝⁿ, V₁′ ∋ (y, 0) and V₂′ ∋ g(y, 0) = x, so that g : V₁′ → V₂′ has a smooth inverse h : V₂′ → V₁′. Define O via

    O = {φ(a) : (a, 0) ∈ V₁′} = (φ⁻¹)⁻¹(ι⁻¹(V₁′)),

where ι : ℝᵏ → ℝⁿ sends a to (a, 0). Since φ⁻¹ is continuous, O is an open subset of φ(U′), and of M. Therefore, there is an open subset Ṽ of ℝⁿ so that O = M ∩ Ṽ. Let W = Ṽ ∩ V₂′, and V = g⁻¹(W). Then h : W → V is a diffeomorphism and

    W ∩ M = {φ(a) : (a, 0) ∈ V} = {g(a, 0) : (a, 0) ∈ V},
    h(W ∩ M) = g⁻¹(W ∩ M) = g⁻¹({g(a, 0) : (a, 0) ∈ V}) = V ∩ (ℝᵏ × {0}).

This completes the proof.

This characterization of manifolds is quite useful. Consider two smooth local parametrizations φ_α : U_α → O_α and φ_β : U_β → O_β. Then if O_α ∩ O_β ≠ ∅, we have the following

Proposition 37. φ_β⁻¹ ∘ φ_α : φ_α⁻¹(O_β) → φ_β⁻¹(O_α) is a diffeomorphism.

Proof. Consider π : ℝⁿ → ℝᵏ given by (a, b) ↦ a for (a, b) ∈ ℝᵏ × ℝ^{n−k}, and ι : ℝᵏ → ℝⁿ given by ι(a) = (a, 0). Let h_α and h_β be the diffeomorphisms guaranteed by Theorem 14. Then φ_α(a) = h_α⁻¹(a, 0), φ_α⁻¹(x) = π(h_α(x)), and so

    φ_β⁻¹ ∘ φ_α = π ∘ h_β ∘ h_α⁻¹ ∘ ι

is smooth since h_α, h_β are diffeomorphisms.

The maps φ_β⁻¹ ∘ φ_α are called gluing maps.

Remark. It is often useful to think of a manifold M as being glued together from domains U_α in ℝᵏ by the gluing maps.
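Returning to the cusp example φ(t) = (t², t³) above, the failure of the rank condition (2) at the singular point can be checked symbolically; a small sketch, assuming the sympy library:

```python
import sympy as sp

t = sp.symbols('t')
phi = sp.Matrix([t**2, t**3])   # the cusp curve phi(t) = (t^2, t^3)
Dphi = phi.diff(t)              # the 2x1 derivative matrix (2t, 3t^2)

rank_at_0 = Dphi.subs(t, 0).rank()   # rank drops to 0 at the cusp
rank_at_1 = Dphi.subs(t, 1).rank()   # full rank 1 away from it
```

So φ fails condition (2) exactly at t = 0, the point where the image fails to be smooth.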
In fact, the previous proposition is the starting point for the abstract definition of a smooth manifold: A smooth k-dimensional manifold is a Hausdorff, σ-compact topological space for which each point x has a neighborhood O_α homeomorphic to a domain U_α ⊂ ℝᵏ via φ_α : U_α → O_α. In addition, we require the gluing maps φ_β⁻¹ ∘ φ_α to be smooth on φ_α⁻¹(O_β).

If M is a smooth manifold, then a function f : M → ℝᵖ is said to be smooth if for each smooth parametrization φ : U → M, f ∘ φ : U → ℝᵖ is smooth. If N ⊂ ℝᵖ is a smooth submanifold, then f : M → N is said to be smooth if the induced map f : M → ℝᵖ is smooth. (For abstract target manifolds N, we may work with local parametrizations instead.) This definition of smooth maps from manifolds is consistent in the following sense:

Proposition 38. If f : M → ℝᵖ, and f ∘ φ_α is smooth from U_α → ℝᵖ, then on φ_β⁻¹(O_α) ⊂ U_β, f ∘ φ_β is also smooth.

Proof. Apply Proposition 37 and the Chain Rule.

Proposition 39. If M ⊂ ℝⁿ is a smooth manifold and f : M → ℝᵖ, then f is smooth if and only if f can be locally extended to smooth functions from domains in ℝⁿ to ℝᵖ. In other words, f is smooth if and only if every x ∈ M has a neighborhood W in ℝⁿ on which there is a smooth function F : W → ℝᵖ so that F|_{W∩M} = f.

Proof. (⇒) For x ∈ M, consider the local diffeomorphism h : W → V guaranteed by Theorem 14. Then for the smooth parametrization φ(a) = h⁻¹(a, 0), we know f ∘ φ is smooth. Now define F = (f ∘ φ) ∘ π ∘ h : W → ℝᵖ for π : (a, b) ↦ a. F is smooth since f ∘ φ, π, and h are smooth.

(⇐) For a local parametrization φ, f ∘ φ is smooth since locally f ∘ φ = F ∘ φ, which is smooth by the Chain Rule.

X ⊂ ℝᴺ is a smooth manifold of dimension k if every x ∈ X has a neighborhood that is diffeomorphic to an open subset of ℝᵏ. In other words, there is an open cover {O_α} of X so that each O_α is diffeomorphic to an open subset U_α ⊂ ℝᵏ. Let φ_α : U_α → O_α be the diffeomorphism.
φ_α is called a parametrization of O_α ⊂ X, and the inverse map φ_α⁻¹ is called a coordinate system. The open cover, together with the coordinate systems, {O_α, φ_α, U_α} is called a smooth atlas of X, and X is a smooth manifold if and only if it has a smooth atlas.

Example 10. The unit sphere

    S² = {(x¹, x², x³) ∈ ℝ³ : (x¹)² + (x²)² + (x³)² = 1}

is a two-dimensional submanifold of ℝ³. To show this, we provide an atlas. Let N = (0, 0, 1) be the north pole and S = (0, 0, −1) be the south pole. Then let O₁ = S² \ {N}, O₂ = S² \ {S}, U₁ = U₂ = ℝ². We construct the coordinate systems φ_α⁻¹, α = 1, 2, by stereographic projection. We may realize ℝ² as the plane {x³ = 0} ⊂ ℝ³. For a point x in O₁, consider the line L_{x,N} in ℝ³ through N and x. We define φ₁⁻¹(x) to be the unique point in ℝ² ∩ L_{x,N}. It is easy to compute

    (y¹, y²) = φ₁⁻¹(x¹, x², x³) = ( x¹/(1 − x³), x²/(1 − x³) ),
    (x¹, x², x³) = φ₁(y¹, y²) = ( 2y¹/(|y|² + 1), 2y²/(|y|² + 1), (|y|² − 1)/(|y|² + 1) ).

Similarly, for any point x ∈ O₂, define φ₂⁻¹(x) to be the unique point in ℝ² ∩ L_{x,S}, and we find as above

    (z¹, z²) = φ₂⁻¹(x¹, x², x³) = ( x¹/(1 + x³), x²/(1 + x³) ),
    (x¹, x², x³) = φ₂(z¹, z²) = ( 2z¹/(|z|² + 1), 2z²/(|z|² + 1), −(|z|² − 1)/(|z|² + 1) ).

It is straightforward to check that each of these coordinate systems is a diffeomorphism, and since S² = O₁ ∪ O₂, we have produced a smooth atlas of S² and thus have shown that S² is a two-dimensional manifold.

Given a smooth manifold X with a smooth atlas {O_α, φ_α, U_α}, let O_{αβ} = O_α ∩ O_β. Also define U_{αβ} = U_α ∩ φ_α⁻¹(O_{αβ}). As long as O_{αβ} ≠ ∅, the map

    φ_{αβ} ≡ φ_β⁻¹ ∘ φ_α : U_{αβ} → U_{βα}

is a diffeomorphism. These maps φ_{αβ} are called the gluing maps of the manifold X associated to the atlas. In particular, the manifold can be thought of as the union of the coordinate charts U_α glued together by the gluing maps. It is straightforward to see that, at least as a set, we may identify
    X = ( ⊔_α U_α ) / ∼,

where ⊔ means disjoint union and the equivalence relation ∼ is given by

    x ∼ y  if x ∈ U_{αβ} ⊂ U_α, y ∈ U_{βα} ⊂ U_β, y = φ_{αβ}(x).

Gluing maps may be used to define smooth manifolds which are not necessarily subsets of ℝᴺ (though we won't do so here). It is instructive to think of k-dimensional smooth manifolds as spaces that are smoothly glued together from open sets in ℝᵏ.

Example 11. Recall the example of the atlas of S² above. Compute

    O₁₂ = S² \ {S, N},  U₁₂ = ℝ² \ {0},  U₂₁ = ℝ² \ {0},
    z = φ₁₂(y) = φ₂⁻¹(φ₁(y)) = ( y¹/|y|², y²/|y|² ) = y/|y|².

This gluing map is called inversion across the circle |y|² = 1 in ℝ². Each point is mapped to a point on the same ray through the origin, but the distance to the origin is replaced by its reciprocal. So we can think of S² as two copies of ℝ² glued together along ℝ² \ {0} by the inversion map across the unit circle.

3.2 Tangent vectors on manifolds

Recall that for a solution φ to an autonomous system ẋ = v(x), the parametric curve φ(t) has tangent vector φ̇(t) = v(φ(t)) at time t. We will use this to define tangent vectors to manifolds. A tangent vector at a point p in a smooth manifold X is given by the derivative α̇(0) of a smooth curve α : (−ε, ε) → X ⊂ ℝᴺ so that α(0) = p. (Note the fact that ℝᴺ is a vector space allows us to differentiate α.) The space of all tangent vectors at p is called the tangent space T_pX of X at p, and it is characterized by the following proposition.

Proposition 40. If X ⊂ ℝᴺ is a k-dimensional smooth manifold, then the tangent space T_pX is the following: Given a local parametrization of X, φ : U → O ∋ p, so that φ(0) = p,

    T_pX = Dφ(0)(ℝᵏ).

In particular, T_pX is naturally a k-dimensional vector space.

Proof. First of all, given a curve α : (−ε, ε) → X so that α(0) = p, we can ensure (by shrinking ε if necessary) that the image of α is contained in the coordinate neighborhood O. Now α = φ ∘ (φ⁻¹ ∘ α) and the chain rule shows that

    α′(0) = Dφ(0)[(φ⁻¹ ∘ α)′(0)] ∈ Dφ(0)(ℝᵏ).
Thus we’ve shown Tp X ⊂ Dφ(0)(Rk ). To show Dφ(0)(Rk ) ⊂ Tp X, for any vector v ∈ Rk , consider α(t) = φ(tv) for |t| small enough that the image of α is contained in O. Then α0 (0) = Dφ(0)v and so Dφ(0)(Rk ) = Tp X. Also note the following corollary of our definition of Tp X: Corollary 41. Tp X is independent of the coordinate neighborhood O of p. If f : X → Rm is a smooth map from a smooth k-dimensional manifold X, and if p ∈ X, then we define Df (p) : Tp X → Rm 61 by using a local parametrization φ : U → X so that φ(q) = p. Then we define Df (p) = D(f ◦ φ)(q) ◦ (Dφ(q))−1 . The following exercise verifies this definition makes sense (see Guillemin and Pollack). Homework Problem 27. (a) Show that Dφ(q) is invertible as a linear map from Rk to Tp X. (b) Show that the definition of Df (p) is independent of the coordinate parametrization φ. (c) Show that if f : X → Y for Y ⊂ Rm a manifold, then Df (p)(Tp X) ⊂ Tf (p) Y . Tangent vectors naturally differentiate functions at a point. So if f : X → R, then and the tangent vector v = α0 (0) for a curve α so that α(0) = p, then we may define (vf )(p) = (f ◦ α)0 (0) = Df (p)α0 (0) = Df (p)v. This definition depends only on v, and not on the curve α used. (For each v there are many α, since v only depends on the first derivative α0 (0) and no higher Taylor coefficients.) For a coordinate system φ−1 = (x1 , . . . , xk ) : O → Rk , (where we assume as usual that φ(0) = p), then the coordinate basis of Tp X induced by φ may be written as {∂/∂xi }, which are thought of as tangent vectors differentiating functions f by ∂ ∂ ∂ −1 1 k f = f ◦φ = f (x , . . . , x ) . i i i ∂x ∂x ∂x p 0 0 (∂/∂xi is the tangent vector associated to the curve α = φ(tei ), for ei the ith basis standard basis vector in Rk .) Thus we can write any tangent vector v at p as ∂ v = vi i . ∂x 62 Writing tangent vectors in terms of the coordinate basis of Tp X is much more useful than writing them in terms of a basis of RN ⊃ Tp X. 
The components vⁱ will change depending on the local coordinates. On O_{αβ} = O_α ∩ O_β, the intersection of two coordinate neighborhoods of p, we have two coordinate systems φ_α⁻¹ = (x¹, …, xᵏ) and φ_β⁻¹ = (y¹, …, yᵏ). We can write, by using the chain rule,

    v = vⁱ(x) ∂/∂xⁱ = vⁱ(x) (∂yʲ/∂xⁱ) ∂/∂yʲ = vʲ(y) ∂/∂yʲ.

Therefore, we know how the vⁱ change under coordinate transformations x → y:

    vʲ(y) = vⁱ(x) ∂yʲ/∂xⁱ.    (22)

(In a more coordinate-free notation, the Jacobian matrix ∂yʲ/∂xⁱ is the derivative of the gluing map φ_{αβ} = φ_β⁻¹ ∘ φ_α. It is easy to check that y = φ_{αβ} ∘ x.)

All the tangent spaces of a manifold X patch together to make a larger manifold TX called the tangent bundle. We define the tangent bundle

    TX = {(p, w) ∈ ℝᴺ × ℝᴺ : p ∈ X, w ∈ T_pX}.

Homework Problem 28. If X is a k-dimensional manifold, show that TX is a 2k-dimensional submanifold of ℝ²ᴺ. To prove this, consider a local parametrization φ : U → X ⊂ ℝᴺ.

(a) Define Φ : U × ℝᵏ → ℝ²ᴺ for y = (y¹, …, yᵏ) by

    Φ(x, y) = ( φ(x), (∂φ/∂xⁱ)(x) yⁱ ).

Show that Φ(U × ℝᵏ) is an open subset of TX and that Φ is one-to-one.

(b) Show that DΦ has rank 2k.

(c) Show that Φ⁻¹ is continuous from Φ(U × ℝᵏ) to U × ℝᵏ.

There is a natural smooth map

    π : TX → X,  π(p, w) = p,

and each π⁻¹({p}) is the vector space T_pX. Each coordinate system φ⁻¹ = (x¹, …, xᵏ) provides a local frame {∂/∂xⁱ} of the tangent bundle. A local frame is a basis of the tangent space for every p in a neighborhood O ⊂ X. These frames are patched together in the following paragraph.

A more abstract view of the tangent bundle is given by looking at a given smooth atlas {O_α, φ_α, U_α} of X. Then as a set, we may identify

    TX = ( ⊔_α U_α × ℝᵏ ) / ≈,

where the equivalence relation ≈ is given by

    (x, v) ≈ (y, w)  if x ∈ U_{αβ}, y ∈ U_{βα}, y = φ_{αβ}(x), w = Dφ_{αβ} v.

A vector field on a manifold X provides a tangent vector at every point in X. More precisely, a vector field is a section of the tangent bundle.
In other words, $v\colon X \to TX$ is a vector field if $\pi(v(p)) = p$ for all $p \in X$. So $v(p) = (p, w(p))$ for $w(p) \in T_pX$. In fact, for $X \subset \mathbb{R}^N$, a map $w\colon X \to \mathbb{R}^N$ with $w(p) \in T_pX$ is equivalent to $v(p) = (p, w(p))$. (Clearly $v$ and $w$ carry the same amount of information, and we will often refer to both of them using the same symbol $v$.)

A vector field $v$ is smooth if it is given as a smooth map from $X$ to $\mathbb{R}^N\times\mathbb{R}^N \supset TX$ as above. Equivalently, $v$ is smooth if for every local coordinate system $(x^1, \ldots, x^k)$,
$$v = v^i(x)\,\frac{\partial}{\partial x^i}$$
for $v^i$ smooth on $U \subset \mathbb{R}^k$.

3.3 Flows on manifolds

A smooth vector field $v$ on a manifold $X$ defines a system of ODEs in the local coordinates of $X$ (or we may say more simply a system on $X$). The ODE system is given by $\dot x = v(x)$ for $x\colon I \to X$ a parametric curve.

In order to describe the relationship between the local and global pictures of the ODE system, consider $X \subset \mathbb{R}^N$ and $v\colon X \to \mathbb{R}^N$ so that for each $p \in X$, $v(p) \in T_pX$. Consider a local parametrization $\phi_\alpha\colon U_\alpha \to O_\alpha$. Let $\phi_\alpha^{-1} = (x_\alpha^1, \ldots, x_\alpha^k)$. Locally on $U_\alpha \subset \mathbb{R}^k$, we represent $v$ by
$$v_\alpha = v_\alpha^i\,\frac{\partial}{\partial x_\alpha^i}.$$
In other words, for $p \in O_\alpha \subset X$, we have $v(p) = D\phi_\alpha(p)\,v_\alpha(p)$.

Proposition 42. Consider $v$ a smooth vector field on $X \subset \mathbb{R}^N$. Consider a solution $\psi_\alpha$ to $\dot x_\alpha = v_\alpha(x_\alpha)$, where $\psi_\alpha\colon I \to U_\alpha$ for a time interval $I$. Then $\psi = \phi_\alpha\circ\psi_\alpha$ is a solution to $\dot x = v(x)$ from $I$ to $O_\alpha \subset X$. Every solution to $\dot x = v(x)$ restricted to $O_\alpha$ is of this form.

Proof. First of all, note that $\dot x_\alpha = v_\alpha(x_\alpha)$ is a well-defined system of ODEs on the open set $U_\alpha \subset \mathbb{R}^k$. On the other hand, on $X$, the system $\dot x = v(x)$ is not an ODE system on $\mathbb{R}^N \supset X$. This may be remedied locally as follows: for each $p \in X$, $v(p) \in T_pX \subset \mathbb{R}^N$. Then since $v$ is a smooth function, we may locally extend $v$ to a smooth function on $\mathbb{R}^N$ (we refer to each local extension simply as $v$). Consider a solution $\psi_\alpha$ to $\dot x_\alpha = v_\alpha(x_\alpha)$. Then if we let $\psi = \phi_\alpha\circ\psi_\alpha$, we compute
$$\dot\psi = D\phi_\alpha(\dot\psi_\alpha) = D\phi_\alpha(v_\alpha) = v.$$
Thus $\psi$ is a solution.
To show that every solution $\psi$ to $\dot x = v(x)$ is of this form, note that since $T_pX$ is the image of $D\phi_\alpha(q)$ for $\phi_\alpha(q) = p$ (Proposition 40), every smooth vector field $v$ is locally equal to $D\phi_\alpha v_\alpha$. Then by uniqueness of solutions of ODEs, the solution to $\dot x = v(x)$ must be the image of the solution to $\dot x_\alpha = v_\alpha(x_\alpha)$. $\square$

Remark. The restriction to autonomous equations $\dot x = v(x)$ is unnecessary. The same proof works for non-autonomous systems $\dot x = v(x, t)$ on manifolds.

Recall a subset $X$ of a metric space $Y$ is compactly contained in another subset $Z$ if $\bar X$ is compact and $\bar X \subset Z$. In this case we write $X \subset\subset Z$, and say $X$ is a precompact subset of $Z$.

Theorem 15. Let $v$ be a smooth vector field on a compact manifold $X$. Then the flow $F(y, t)$ along the vector field (the solution to $\dot x = v(x)$, $x(0) = y$) is a smooth function from $X\times\mathbb{R}$ to $X$. In particular, any flow on a compact manifold exists for all time.

Proof. Consider an atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of $X$. First of all, by Lemma 43 below, there is an open cover $\{Q_\beta\}$ of $X$ so that each $Q_\beta \subset\subset O_\alpha$ for some $O_\alpha$ in the atlas. Then each $\phi_\alpha^{-1}\overline{Q_\beta}$ is a compact subset of $U_\alpha$. Our differential equation is equivalent to $\dot x_\alpha = v_\alpha(x_\alpha)$ on each $U_\alpha$.

Since $X$ is compact, we can choose a finite subcover $\{Q_1, \ldots, Q_n\}$ of the open cover $\{Q_\beta\}$. For each $i = 1, \ldots, n$, a straightforward analog of Lemma 23 shows there is an $\epsilon_i > 0$ so that if $x_0 \in \phi_\alpha^{-1}Q_i$, then the solution to $\dot x_\alpha = v_\alpha(x_\alpha)$, $x(0) = x_0$ stays in $U_\alpha$ for $t \in [-\epsilon_i, \epsilon_i]$. Moreover, by Proposition 31, for any $T \in \mathbb{R}$, the solution with initial condition $x(T) = x_0 \in \phi_\alpha^{-1}Q_i$ stays within $U_\alpha$ for time $t \in [T - \epsilon_i, T + \epsilon_i]$.

Let $\epsilon = \min\{\epsilon_1, \ldots, \epsilon_n\} > 0$. Then for every $T \in \mathbb{R}$, $p \in X$, we claim the solution to $\dot x = v(x)$, $x(T) = p$ exists for all $t \in [T - \epsilon, T + \epsilon]$. To prove the claim, note that each $p \in X$ lies in one of the $Q_i \subset O_\alpha$, and that the solution to $\dot x_\alpha = v_\alpha(x_\alpha)$, $x(T) = \phi_\alpha^{-1}(p)$ lies in $U_\alpha$ for $t \in [T - \epsilon, T + \epsilon]$.
Thus Proposition 42 shows that the solution to $\dot x = v(x)$, $x(T) = p$ is in $O_\alpha$ for $t \in [T - \epsilon, T + \epsilon]$, and the claim is proved. In order to prove the theorem, continue as in the proof of Lemma 25 to show the solution exists for all time. The smoothness of the solution follows from Theorem 12 and Proposition 42. $\square$

Lemma 43. Given an atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of a manifold $X$, there is an open cover $\{Q_\beta\}$ of $X$ so that each $Q_\beta$ is precompact in some $O_\alpha$.

Proof. We can cover each open $U_\alpha \subset \mathbb{R}^k$ by open balls $B_\beta \subset\subset U_\alpha$. Then $Q_\beta = \phi_\alpha(B_\beta)$ forms an open cover of $X$. $\square$

The support of an $\mathbb{R}^m$-valued function $f$ is the closure
$$\mathrm{supp}(f) = \overline{\{x : f(x) \neq 0\}}.$$
An important class of functions is smooth functions with compact support. Prominent examples can be constructed using the smooth function on $\mathbb{R}$
$$f(x) = \begin{cases} e^{-1/x} & \text{for } x > 0, \\ 0 & \text{for } x \leq 0. \end{cases}$$
See the notes on bump functions.

Homework Problem 29. Let $\Omega \subset \mathbb{R}^n$ be a domain. Consider a smooth vector field $v\colon \Omega \to \mathbb{R}^n$ with compact support. Show that any solution $\psi$ to $\dot x = v(x)$, $x(0) = x_0 \in \Omega$, exists for all time $t \in \mathbb{R}$. Hint: First show that if $v(y) = 0$, then any solution to $\dot x = v(x)$, $x(t_0) = y$, must be constant for all time. Use this to show that any solution to $\dot x = v(x)$ with $v(x(0)) \neq 0$ must remain in $\mathrm{supp}(v)$ for its entire maximal interval of definition. Apply Theorem 7.

Given a smooth manifold $X$, consider the set $\mathrm{Diff}(X)$ of diffeomorphisms from $X$ to itself. Then for $f, g \in \mathrm{Diff}(X)$, it is easy to see that $f\circ g \in \mathrm{Diff}(X)$, $f^{-1} \in \mathrm{Diff}(X)$, and $f\circ f^{-1} = \mathrm{id}$ for $\mathrm{id}$ the identity map. Therefore, $\mathrm{Diff}(X)$ is a group.

Proposition 44. Let $v$ be a smooth vector field on a compact manifold $X$. Then for the flow $F(y, t)$, define $F_t(y) = F(y, t)$. Then $F_t \in \mathrm{Diff}(X)$, $F_{t_1+t_2} = F_{t_1}\circ F_{t_2}$, and $F_{-t} = F_t^{-1}$. (And so $t \mapsto F_t$ is a group homomorphism from the additive group $\mathbb{R}$ to $\mathrm{Diff}(X)$.)

Proof. Theorem 15 shows that $F_t$ is smooth for any $t$. The group homomorphism property is simply a restatement of Proposition 34. Therefore, $F_t\circ F_{-t} = F_0$, which is the flow along $v$ for time 0.
By definition, $F_0 = \mathrm{id}$, the identity map. Now $F_t^{-1} = F_{-t}$ is smooth, and so $F_t$ is a diffeomorphism. $\square$

Remark. Note the only place we used the fact that $X$ is compact is to guarantee the existence of the flow for all time. So the proposition still holds for any smooth vector field $v$ on a smooth manifold $X$ so that the flow exists for all time.

Example 12. For the sphere $S^2 \subset \mathbb{R}^3$, consider the vector field defined by $v(x^1, x^2, x^3) = (-x^2, x^1, 0)$. It is straightforward to show that the tangent space to $S^2$ at $(x^1, x^2, x^3)$ is given by the set of $v = (v^1, v^2, v^3) \in \mathbb{R}^3$ so that $v^1x^1 + v^2x^2 + v^3x^3 = 0$. (Proof: $S^2 = \{f = 1\}$ for $f = (x^1)^2 + (x^2)^2 + (x^3)^2$, and so for any local parametrization $\phi$, we have $f\circ\phi = 1$. Thus the Chain Rule shows that $Df(x)(T_xS^2) = 0$, and so $T_xS^2 \subset \ker Df(x)$. They must be equal since both are two-dimensional vector spaces. Then simply compute $\ker Df(x)$.) Therefore, $v$ is a smooth vector field on $S^2$.

Recall that the coordinate systems of the atlas introduced above are
$$(y^1, y^2) = \phi_1^{-1}(x^1, x^2, x^3) = \left(\frac{x^1}{1 - x^3},\ \frac{x^2}{1 - x^3}\right), \qquad (z^1, z^2) = \phi_2^{-1}(x^1, x^2, x^3) = \left(\frac{x^1}{1 + x^3},\ \frac{x^2}{1 + x^3}\right).$$
On $U_1$, compute at $x = (x^1, x^2, x^3) \in O_1 \subset S^2$,
$$D\phi_1^{-1}(x)(v) = \begin{pmatrix} \frac{1}{1-x^3} & 0 & \frac{x^1}{(1-x^3)^2} \\ 0 & \frac{1}{1-x^3} & \frac{x^2}{(1-x^3)^2} \end{pmatrix}\begin{pmatrix} -x^2 \\ x^1 \\ 0 \end{pmatrix} = \begin{pmatrix} -\frac{x^2}{1-x^3} \\ \frac{x^1}{1-x^3} \end{pmatrix} = \begin{pmatrix} -y^2 \\ y^1 \end{pmatrix}.$$
It turns out that for $x \in O_2$, $D\phi_2^{-1}(x)(v) = \begin{pmatrix} -z^2 \\ z^1 \end{pmatrix}$ as well. In these coordinate charts, the systems can be solved explicitly. For $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, compute the fundamental solution
$$e^{At} = Pe^{tD}P^{-1} = \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}\exp\left(t\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}\right)\begin{pmatrix} \frac12 & \frac{i}{2} \\ \frac12 & -\frac{i}{2} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}\begin{pmatrix} \cos t + i\sin t & 0 \\ 0 & \cos t - i\sin t \end{pmatrix}\begin{pmatrix} \frac12 & \frac{i}{2} \\ \frac12 & -\frac{i}{2} \end{pmatrix} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}.$$
Therefore, for $y \in U_1$, the solution to $\dot y = v(y)$, $y(0) = y_0$ is
$$y(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}y_0, \qquad (23)$$
and for $z \in U_2$, the solution to $\dot z = v(z)$, $z(0) = z_0$ is
$$z(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}z_0. \qquad (24)$$
Proposition 42 implies that these two flows should be related, since they both correspond to flows on $S^2$.
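The fundamental solution computed above can be checked numerically; this sketch diagonalizes $A$ over $\mathbb{C}$ with numpy, exactly mirroring the computation, and compares the result with the rotation matrix.

```python
import numpy as np

# For A = [[0, -1], [1, 0]], compute e^{At} = P e^{tD} P^{-1} via
# eigendecomposition (eigenvalues are +-i) and compare with the
# rotation matrix [[cos t, -sin t], [sin t, cos t]].

A = np.array([[0.0, -1.0], [1.0, 0.0]])
t = 0.7

evals, P = np.linalg.eig(A)                     # complex eigenpairs
expAt = (P @ np.diag(np.exp(evals * t)) @ np.linalg.inv(P)).real

R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
```

The imaginary parts of the conjugated product cancel, leaving the real rotation matrix, as in the computation by hand.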
In particular, for $y_0 \in U_{12}$, let $z_0 = \phi_{12}(y_0) = y_0|y_0|^{-2}$. Then we check that $z(t) = \phi_{12}(y(t))$ for $y(t)$ from (23) and $z(t)$ from (24). So compute
$$y(t) = \begin{pmatrix} y_0^1\cos t - y_0^2\sin t \\ y_0^1\sin t + y_0^2\cos t \end{pmatrix} \quad\text{for } y_0 = \begin{pmatrix} y_0^1 \\ y_0^2 \end{pmatrix},$$
$$|y(t)|^2 = (y_0^1\cos t - y_0^2\sin t)^2 + (y_0^1\sin t + y_0^2\cos t)^2 = |y_0|^2,$$
$$\phi_{12}(y(t)) = \frac{y(t)}{|y(t)|^2} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}\frac{y_0}{|y_0|^2} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}z_0 = z(t).$$
Therefore, the flow patches from $U_1$ to $U_2$. The flow itself can be represented on $U_1$ by
$$F_t(y) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}y,$$
on $U_2$ by
$$F_t(z) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}z,$$
and even on $S^2 \subset \mathbb{R}^3$ itself by
$$F_t(x) = \exp\left(\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}t\right)x = \begin{pmatrix} \cos t & -\sin t & 0 \\ \sin t & \cos t & 0 \\ 0 & 0 & 1 \end{pmatrix}x.$$

Homework Problem 30. Consider the atlas given above for $S^2$. On $U_1$, consider the vector field
$$v = -y^1\frac{\partial}{\partial y^1} - y^2\frac{\partial}{\partial y^2}.$$
Show that $D\phi_1 v$ extends to a smooth vector field on all of $S^2$ (i.e., it extends smoothly across $N = S^2\setminus O_1$). Write down this vector field in the $z$ coordinates on $U_2$ as well. Solve for the flow on $U_1$ and $U_2$, and explicitly check that they agree on the overlap $O_{12}$.

3.4 Riemannian metrics

For a vector $v$ at a point $p$ on a manifold $X \subset \mathbb{R}^N$, we can measure the length of $v$ by using the inner product on $\mathbb{R}^N$. So if $v \in T_pX \subset \mathbb{R}^N$, and
$$v = v^a\frac{\partial}{\partial y^a}$$
for $y = (y^1, \cdots, y^N)$ coordinates on $\mathbb{R}^N$, then the length $|v|$ of $v$ is given by
$$|v|^2 = \sum_{a=1}^N (v^a)^2 = \delta_{ab}v^av^b$$
for the Kronecker $\delta_{ab} = 1$ if $a = b$ and $\delta_{ab} = 0$ if $a \neq b$. In this usage for computing the length of a tangent vector on $\mathbb{R}^N$, the Kronecker $\delta$ is a Riemannian metric. (Note we use the following convention for an $n$-dimensional manifold $X \subset \mathbb{R}^N$: use indices $a, b, c$ from 1 to $N$ to represent coordinates in $\mathbb{R}^N$, and use $i, j, k$ from 1 to $n$ to represent local coordinates on $X$.)

On a manifold $X$, a Riemannian metric is a smoothly varying positive definite inner product on $T_pX$ for all $p \in X$. Recall the definitions involved.
An inner product on a real vector space V is a pairing g : V ×V → R which is bilinear and symmetric. g is bilinear if for every v ∈ V , the maps g(v, ·) and g(·, v) from V to R are linear maps, and g is symmetric if for each v, w ∈ V , g(v, w) = g(w, v). An inner product is positive definite if g(v, v) ≥ 0 for all v ∈ V and g(v, v) = 0 only if v = 0. If the vector space V has a basis ei , then the inner product g is determined by gij = g(ei , ej ), since for any linear combination v = v i ei , w = wj ej , 70 bilinearity shows g(v, w) = g(v i ei , wj ej ) = v i g(ei , wj ej ) = v i wj g(ei , ej ) = v i wj gij . The fact g is symmetric is equivalent to gij = gji . Note that a positive definite p inner product g provides a way to measure the length of a vector |v|g = g(v, v), and it also provides a measurement of the angle θ between two nonzero vectors v and w: cos θ = g(v, w) . |v|g |w|g A Riemannian metric on X gives a positive definite inner product on each tangent space Tp X. We also require these inner products to vary smoothly as the point p varies in X. To describe this, consider a smooth atlas on X, and a local coordinate system (x1 , . . . , xk ) around p. Then a smooth vector field v can be represented as v = v i ∂x∂ i for the standard local frame {∂/∂xi } of the tangent bundle. Then at each point, the inner product g is represented by gij (x), and g(v, w) = gij v i wj , v i = v i (x), wj = wj (x), gij = gij (x). Then g is smoothly varying on X if the functions gij are smoothly varying on each coordinate chart in the smooth atlas of X. Euclidean space RN has a standard Riemannian metric given by the standard inner product δab . As we’ve seen above, for any submanifold X ⊂ RN endows X with a Riemannian metric. In particular, for v, w ∈ Tp X ⊂ RN , we can form g(v, w) using the inner product δab . In particular, consider a smooth parametrization φ : U → O ⊂ X ⊂ RN . Then φ = (φ1 , · · · , φN ). 
A vector field represented by
$$v = v^i\frac{\partial}{\partial x^i}$$
on $U \subset \mathbb{R}^n$ is represented by
$$D\phi(x)(v) = \frac{\partial\phi^a}{\partial x^i}(x)\,v^i(x) \in T_{\phi(x)}X \subset \mathbb{R}^N.$$
$D\phi(x)(v)$ is called the push-forward of $v$ under the map $\phi$. For $v, w \in T_pX$, we may define the metric
$$g_{ij}v^iw^j = g(v, w) = \left(\frac{\partial\phi^a}{\partial x^i}v^i\right)\left(\frac{\partial\phi^b}{\partial x^j}w^j\right)\delta_{ab} = \delta_{ab}\frac{\partial\phi^a}{\partial x^i}\frac{\partial\phi^b}{\partial x^j}\,v^iw^j.$$
Therefore, the Euclidean inner product on $\mathbb{R}^N$ induces the Riemannian metric on $X$ locally given by the formula
$$g\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right) = g_{ij} = \delta_{ab}\frac{\partial\phi^a}{\partial x^i}\frac{\partial\phi^b}{\partial x^j}. \qquad (25)$$

Given a real vector space $V$, the dual vector space $V^*$ is given by the set of all linear functions from $V$ to $\mathbb{R}$. It is easy to check $V^*$ is a vector space. If $V$ has a basis $\{e_i\}$, then there is a dual basis $\{\eta^i\}$ of $V^*$, defined by $\eta^i(e_j) = \delta^i_j$. Given a local coordinate frame $\{\partial/\partial x^i\}$ of $TX$, the local frame on the dual space is written as $\{dx^i\}$. Each $dx^i$ is called a differential. The dual space $T_p^*X$ of $T_pX$ is called the cotangent space of $X$ at $p$.

Lemma 45. If $y = y(x)$ is a coordinate change as in (22), then
$$dy^j = \frac{\partial y^j}{\partial x^i}\,dx^i.$$

Proof. Write $dy^j = \xi^j_\ell\,dx^\ell$. Then we have
$$\delta^j_i = dy^j\left(\frac{\partial}{\partial y^i}\right) = \xi^j_\ell\,dx^\ell\left(\frac{\partial x^k}{\partial y^i}\frac{\partial}{\partial x^k}\right) = \xi^j_\ell\,\frac{\partial x^k}{\partial y^i}\,dx^\ell\left(\frac{\partial}{\partial x^k}\right) = \xi^j_\ell\,\frac{\partial x^k}{\partial y^i}\,\delta^\ell_k = \xi^j_k\,\frac{\partial x^k}{\partial y^i}.$$
Therefore, $(\xi^j_k)$ is the inverse matrix of $\left(\dfrac{\partial x^k}{\partial y^i}\right)$, and so $\xi^j_k = \dfrac{\partial y^j}{\partial x^k}$. $\square$

A Riemannian metric can be naturally written as
$$g_{k\ell}(y)\,dy^k\,dy^\ell = g_{k\ell}(y)\,\frac{\partial y^k}{\partial x^i}\frac{\partial y^\ell}{\partial x^j}\,dx^i\,dx^j = g_{ij}(x)\,dx^i\,dx^j.$$
This makes sense because the natural pairing
$$dx^i\left(\frac{\partial}{\partial x^j}\right) = \delta^i_j$$
between the tangent and cotangent spaces implies that
$$g(v, w) = g_{ij}\,dx^i\left(v^k\frac{\partial}{\partial x^k}\right)dx^j\left(w^\ell\frac{\partial}{\partial x^\ell}\right) = g_{ij}(v^k\delta^i_k)(w^\ell\delta^j_\ell) = g_{k\ell}\,v^kw^\ell.$$

A Riemannian metric is an example of a tensor on $X$. The tensor product $V\otimes W$ of two real vector spaces with bases respectively $\{\nu_i\}$ and $\{\omega_j\}$ is the real vector space formed from the basis $\{\nu_i\otimes\omega_j\}$. This implies $\dim V\otimes W = (\dim V)(\dim W)$. A tensor of type $(k, \ell)$ on a manifold $X$ assigns to each point $p \in X$ an element of $(T_pX)^{\otimes k}\otimes(T_p^*X)^{\otimes\ell}$, which has as its basis
$$\frac{\partial}{\partial x^{i_1}}\otimes\cdots\otimes\frac{\partial}{\partial x^{i_k}}\otimes dx^{j_1}\otimes\cdots\otimes dx^{j_\ell}.$$
Locally, we write a tensor $\omega$ as
$$\omega^{i_1\cdots i_k}_{j_1\cdots j_\ell}\,\frac{\partial}{\partial x^{i_1}}\otimes\cdots\otimes\frac{\partial}{\partial x^{i_k}}\otimes dx^{j_1}\otimes\cdots\otimes dx^{j_\ell},$$
or simply as $\omega^{i_1\cdots i_k}_{j_1\cdots j_\ell}$. We say $\omega$ is smooth if each $\omega^{i_1\cdots i_k}_{j_1\cdots j_\ell}$ is smooth locally for all coordinates in a smooth atlas of $X$.

A Riemannian metric is then a smooth symmetric $(0, 2)$ tensor on a manifold $X$. Since the product is symmetric, we omit the $\otimes$ and simply write $g_{ij}\,dx^i\,dx^j$ for a Riemannian metric in local coordinates $x$. (There are also antisymmetric $(0, k)$ tensors, or $k$-forms, for which the tensor product $\otimes$ is replaced by $\wedge$.)

Example 13. For $S^2$, in the local coordinate given by stereographic projection, recall the coordinate chart $\phi = \phi_1$:
$$\phi(y^1, y^2) = \left(\frac{2y^1}{|y|^2+1},\ \frac{2y^2}{|y|^2+1},\ \frac{|y|^2-1}{|y|^2+1}\right),$$
and the Riemannian metric induced from $\mathbb{R}^3$ is
$$g_{ij}\,dy^i\,dy^j = \delta_{ab}\frac{\partial\phi^a}{\partial y^i}\frac{\partial\phi^b}{\partial y^j}\,dy^i\,dy^j = \delta_{ab}\,d\phi^a\,d\phi^b = d\phi^1\,d\phi^1 + d\phi^2\,d\phi^2 + d\phi^3\,d\phi^3$$
$$= \left(\frac{-2(y^1)^2 + 2(y^2)^2 + 2}{(|y|^2+1)^2}\,dy^1 + \frac{-4y^1y^2}{(|y|^2+1)^2}\,dy^2\right)^2 + \left(\frac{-4y^1y^2}{(|y|^2+1)^2}\,dy^1 + \frac{2(y^1)^2 - 2(y^2)^2 + 2}{(|y|^2+1)^2}\,dy^2\right)^2 + \left(\frac{4y^1}{(|y|^2+1)^2}\,dy^1 + \frac{4y^2}{(|y|^2+1)^2}\,dy^2\right)^2$$
$$= \frac{4}{(|y|^2+1)^2}\,(dy^1\,dy^1 + dy^2\,dy^2).$$
Note in the previous example, we used the formula for differentials
$$d\phi^a = \frac{\partial\phi^a}{\partial y^i}\,dy^i.$$
It is also useful to have the following notation: if $h = h_{ab}\,dz^a\,dz^b$ is a Riemannian metric on $Z$, and $\phi\colon Y \to Z$ is a smooth map, then we denote the pullback metric
$$\phi^*h = h_{ab}(\phi)\,d\phi^a\,d\phi^b$$
on $Y$. Thus in the construction above, if $\delta = \delta_{ab}\,dx^a\,dx^b$ is the Euclidean metric on $\mathbb{R}^N$, then the metric $g$ induced on a submanifold $\phi\colon X \hookrightarrow \mathbb{R}^N$ is the pullback $\phi^*\delta$.

Homework Problem 31. Let $\phi\colon X \to Y$ be a smooth map of manifolds. Let $Y$ have a Riemannian metric $h$ on it. Show that $\phi^*h$ is a Riemannian metric on $X$ if and only if the tangent map $D\phi(x)\colon T_xX \to T_{\phi(x)}Y$ is injective for every $x \in X$. (In this case $\phi$ is called an immersion.) Hint: Do the calculations in local coordinates on $X$ and $Y$. The key point to check is whether $\phi^*h$ is positive definite.
Show φ∗ h(x) is 0 on the kernel of Dφ(x). Note in the previous example, we considered the Riemannian metric on S pulled back from the Euclidean metric on R3 . It is possible to write down other Riemannian metrics as well. 2 74 Example 14. Consider hyperbolic space Hn = {x = (x1 , . . . , xn ) ∈ Rn : xn > 0} equipped with the Riemannian metric dx1 dx1 + · · · + dxn dxn . (xn )2 A famous theorem of John Nash shows that for every Riemannian metric g on a smooth manifold X, there is an embedding i : X → RN so that g is induced from the standard metric on RN . (Although it is not in most cases obvious what the embedding is.) 3.5 Vector bundles and tensors In order to explain better what tensors are, we introduce the idea of a vector bundle. The tangent bundle T X of a smooth n-dimensional manifold X is a vector bundle. Recall there is a map π : T X → X. The fiber over a point p ∈ X π −1 (p) = Tp X is an n-dimensional vector space. Moreover, over each coordinate neighborhood O ⊂ X with coordinates {x1 , . . . , xn }, π −1 O is diffeomorphic to O × Rn , the diffeomorphism being (p, v) 7→ (p, v 1 , . . . , v n ) for p ∈ O, v = v i ∂x∂ i ∈ Tp X. We generalize these properties of T X to define a vector bundle. A vector bundle of rank k over a manifold X is given by an n + k dimensional manifold V with a smooth map π : V → X. V is called the total space of the vector bundle. Every point in X has a neighborhood O so that π −1 O is diffeomorphic to O × Rk . Under this diffeomorphism, π is simply the natural projection from O × Rk → O. Thus vector bundles are locally trivial, in that each vector bundle is locally a product of a neighborhood times Rk . Note that each diffeomorphism π −1 O → O × Rk 75 provides for each p ∈ O a basis of the vector space π −1 (p) by taking the preimage of the standard basis of Rk under the diffeomorphism. Such a smoothly varying basis is called a local frame of the vector bundle over O. 
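To make local triviality concrete, here is a numerical sketch, not part of the notes, using the stereographic gluing map $\phi_{12}(y) = y/|y|^2$ from the $S^2$ example. The transition matrix over the overlap is the Jacobian of the gluing map, and it must be nonsingular at every point.

```python
import numpy as np

# Gluing map between the two stereographic charts on S^2 (from Example 12).
def phi12(y):
    return y / (y @ y)

def jacobian(f, y, h=1e-6):
    # finite-difference Jacobian; column i is d f / d y^i
    n = len(y)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = h
        J[:, i] = (f(y + e) - f(y - e)) / (2 * h)
    return J

y = np.array([0.8, -1.3])
A = jacobian(phi12, y)              # transition matrix A(y) of TS^2

# Closed form for the Jacobian of a plane inversion:
# (I - 2 y y^T / |y|^2) / |y|^2, with determinant -1/|y|^4 (nonsingular).
r2 = y @ y
A_exact = (np.eye(2) - 2 * np.outer(y, y) / r2) / r2
```

Since $\det A(y) = -1/|y|^4 \neq 0$ on the overlap (the overlap excludes $y = 0$), the fiberwise gluing is a linear isomorphism, as the definition of a vector bundle requires.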
Given a gluing map y = y(x) of two small coordinate neighborhoods Ox and Oy in X, there is a corresponding gluing map of Ox × Rk and Oy × Rk . We require this gluing map to be of the form (x, v) 7→ (y(x), A(x)v) for v a vector in Rk and A(x) a smoothly varying nonsingular matrix in x. Therefore, above each point p, if we change coordinates from x to y, the frame changes by the matrix A(x). A(x) is a transition function of the vector bundle V . So the transition functions act on the fibers of a vector bundle as linear isomorphisms. This preserves the vector-space structure on each fiber when changing coordinates. Remark. We have defined real vector bundles of rank k, for which each fiber is diffeomorphic to Rk . We may also define complex vector bundles with fibers diffeomorphic to Ck . A section of a vector bundle π : V → X is a map s : X → V satisfying π(s(p)) = p for all p ∈ X. So for each p ∈ X, s(p) is an element of the vector space π −1 (p). A vector field is precisely a section of the tangent bundle. Locally, k sections which are linearly independent on each fiber form a frame of the vector bundle. For example, {∂/∂xi } are n linearly independent sections of the tangent bundle over a coordinate chart. Since vector bundles preserve the linear structure on each fiber, we may do linear algebra on the fibers to create new vector bundles. In particular, we can take duals and tensor products of the fiber space to form new vector bundles. The tensor bundle of type (k, `) over an n dimensional manifold X is the vector bundle of rank nk+` with the fiber over p given by Tp X ⊗k ⊗ Tp∗ X ⊗` . Over each coordinate chart, the natural frame of the tensor bundle is ∂ ∂ ⊗ · · · ⊗ i ⊗ dxj1 ⊗ · · · ⊗ dxj` i 1 ∂x ∂x k for i1 , . . . , ik , j1 , . . . , j` ∈ {1, . . . , n}. The transition functions of a tensor bundle are determined by the formulas ∂ ∂y k ∂ = , ∂xi ∂xi ∂y k dxj = 76 ∂xj ` dy . 
∂y ` For example the transition functions for the (0, 2) tensor bundle are given by dxi dxj = Note we can view ∂xi ∂xj k ` dy dy . ∂y k ∂y ` ∂xi ∂xj ∂y k ∂y ` as a nonsingular n2 × n2 matrix, which is the tensor product of the matrix ∂xi with itself. ∂y k A smooth tensor of type (k, `) is a smooth section of the (k, `) tensor bundle. Thus a Riemannian metric is a smooth symmetric, positive-definite (0, 2) tensor. 3.6 Integration and densities We begin by introducing the Change of Variables Formula for multiple integrals: Theorem 16 (Change of Variables). Let Ω ⊂ Rn be an open set, and let g : Ω → Rn be one-to-one and locally C 1 . Then for every L1 function f on g(Ω) with Lebesgue measure dx and dy, Z Z f (y) dy = f (g(x))| det Dg(x)| dx. g(Ω) Ω Proof. See Spivak Calculus on Manifolds. Here is another useful concept. Given an open cover {Oα } of a smooth manifold X, a partition of unity subordinate to the cover is a collection of smooth functions ρβ : X → R satisfying 1. ρβ (x) ∈ [0, 1]. 2. For each ρβ , there is an α so that supp(ρβ ) ⊂⊂ Oα . 3. Every x ∈ X has a neighborhood which intersects only finitely many of the supports of the ρβ . P 4. β ρβ (x) = 1. 77 Proposition 46. For every open cover of a smooth manifold X, there exists a subordinate partition of unity. For a proof, see Spivak or Guillemin and Pollack. Theorem 17. A Riemannian metric g on a manifold X provides a measure on X called the Riemannian density. The construction of this measure follows below, along with a sketch of a proof. Let {Oα , φα , Uα } be a smooth atlas of X. A function f : X → R is measurable if each f ◦ φα : Uα → R is measurable. For a Riemannian metric g on X, the density dVg is defined first for measurable functions f : X → R whose supports are contained in some Oα . In this case, define Z Z Z q f dVg = f dVg = f (x) det gij (x) dx X Oα Uα for local coordinate x on Oα and Lebesgue measure dx on Uα ⊂ Rn . 
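As a sanity check on this definition, we can integrate the density of the round metric on $S^2$ in a single stereographic chart, using the chart expression $g_{ij} = 4\delta_{ij}/(1+|y|^2)^2$ from Example 13. The chart misses only one point, a null set, so the integral should recover the full area $4\pi$ of the unit sphere. This numerical sketch truncates the plane at a large radius, which is the source of the small error.

```python
import numpy as np

# sqrt(det g) for the round metric on S^2 in a stereographic chart:
# g_ij = 4 delta_ij / (1 + |y|^2)^2, so sqrt(det g) = 4 / (1 + |y|^2)^2.
def sqrt_det_g(r2):
    return 4.0 / (1.0 + r2) ** 2

# Integrate over R^2 in polar form with a midpoint rule:
# area = int_0^infty sqrt(det g)(r^2) * 2 pi r dr, truncated at r = 500.
r = np.linspace(0.0, 500.0, 500_001)
mid = 0.5 * (r[:-1] + r[1:])
dr = np.diff(r)
area = np.sum(sqrt_det_g(mid ** 2) * 2 * np.pi * mid * dr)
```

The truncation discards the mass beyond $r = 500$, which is $4\pi/(1+500^2) \approx 5\times10^{-5}$, so the computed area agrees with $4\pi$ to that accuracy.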
The key point is to make sure this definition makes sense for functions $f$ whose support is contained in two open charts $O_\alpha$ and $O_\beta$. As above, let $x$ be the local coordinates on $O_\alpha$, and let $y$ be the coordinates on $O_\beta$. Then we use the transformation rule for $g_{ij}$ under a change $y = y(x)$ and the Change of Variables Theorem 16 to show
$$\int_{U_\beta} f(y)\sqrt{\det g_{ij}(y)}\,dy = \int_{U_\alpha} f(x)\sqrt{\det g_{ij}(y)}\left|\det\frac{\partial y^i}{\partial x^j}\right|dx = \int_{U_\alpha} f(x)\sqrt{\det\left(g_{k\ell}(x)\frac{\partial x^k}{\partial y^i}\frac{\partial x^\ell}{\partial y^j}\right)}\left|\det\frac{\partial y^i}{\partial x^j}\right|dx$$
$$= \int_{U_\alpha} f(x)\sqrt{\det g_{k\ell}(x)}\left|\det\frac{\partial x^k}{\partial y^i}\right|\left|\det\frac{\partial y^i}{\partial x^j}\right|dx = \int_{U_\alpha} f(x)\sqrt{\det g_{k\ell}(x)}\,dx.$$

Let $\{\rho_\beta\}$ be a partition of unity subordinate to the atlas $\{O_\alpha\}$ of $X$. For any measurable subset $\Omega \subset X$, consider its characteristic function $\chi_\Omega$. Then
$$V_g(\Omega) = \int_X \chi_\Omega\,dV_g = \sum_\beta\int_X \rho_\beta\,\chi_\Omega\,dV_g.$$
The calculation in the previous paragraph can be used to ensure that this definition is independent of the atlas and partition of unity used. It is straightforward to check that $dV_g$ defines a measure on $X$. Then for any $L^1$ function $f$ on $X$ (measured by $dV_g$ of course),
$$\int_X f\,dV_g = \sum_\beta\int_X \rho_\beta\,f\,dV_g.$$

Homework Problem 32. Check that $V_g$ is a measure on $X$.

Remark. To complete a proof of Theorem 17, it is necessary to check that the definition depends only on $g$ and not on the atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ or the partition of unity $\{\rho_\beta\}$ subordinate to the open cover $\{O_\alpha\}$.

If $\Omega$ is a domain in $\mathbb{R}^n$ with smooth boundary, then the measure on the boundary $\partial\Omega$ is given by the restriction of the Riemannian metric on $\mathbb{R}^n$. (So this gives a Riemannian metric on $\partial\Omega$, and thus a density as above.) If $\partial\Omega$ is locally given by the graph of a function $(x^1, \ldots, x^{n-1}, f(x^1, \ldots, x^{n-1}))$, then
$$\phi(x^1, \ldots, x^{n-1}) = (x^1, \ldots, x^{n-1}, f(x^1, \ldots, x^{n-1}))$$
is a local parametrization of the $(n-1)$-dimensional manifold $\partial\Omega \subset \mathbb{R}^n$. The matrix
$$D\phi = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ f_{,1} & f_{,2} & \cdots & f_{,n-1} \end{pmatrix}.$$
Then the pullback metric
$$\sum_{i,j=1}^{n-1} g_{ij}\,dx^i\,dx^j = \phi^*\delta = \delta_{ab}\,d\phi^a\,d\phi^b = (dx^1)^2 + \cdots + (dx^{n-1})^2 + (f_{,1}\,dx^1 + \cdots + f_{,n-1}\,dx^{n-1})^2.$$
As a matrix, $(g_{ij}) = (\delta_{ij} + f_{,i}f_{,j})$. In order to compute the volume form, we should compute $\det g_{ij}$. Fortunately, it is easy to compute in this case:
$$\det g = 1 + |df|^2 = 1 + f_{,1}^2 + \cdots + f_{,n-1}^2$$
(see Problem 33 below). So the density
$$dV_g = \sqrt{1 + |df|^2}\,dx^{n-1}$$
for $dx^{n-1}$ Lebesgue measure on $\mathbb{R}^{n-1}$.

Homework Problem 33. For $w$ an $n$-dimensional column vector, and $I$ the $n\times n$ identity matrix, show that $\det(I + ww^\top) = 1 + |w|^2$. Hint: Show that $I + ww^\top$ can be diagonalized, with one eigenvalue $1 + |w|^2$, and with the eigenvalue 1 repeated $n-1$ times. (For this last step, show that on the $(n-1)$-dimensional space orthogonal to the natural $(1+|w|^2)$-eigenvector, $I + ww^\top$ acts as the identity. What is a natural eigenvector to try?)

For a function $f\colon \Omega \to \mathbb{R}$, the differential, or one-form, is $df = \frac{\partial f}{\partial x^i}\,dx^i$. Under a change of coordinates $y = y(x)$, $df$ transforms via the chain rule:
$$df = \frac{\partial f}{\partial y^j}\,dy^j = \frac{\partial f}{\partial y^j}\frac{\partial y^j}{\partial x^i}\,dx^i = \frac{\partial f}{\partial x^i}\,dx^i.$$
In particular, this gives the formula for differentials (cf. Lemma 45)
$$dy^j = \frac{\partial y^j}{\partial x^i}\,dx^i.$$
It also shows that for each $p \in X$ a manifold, we can think of $df(p) \in T_p^*X$, the cotangent space. This is investigated further in the following problem:

Homework Problem 34. If $f$ is a smooth function on $X$ and $v$ is a smooth vector field, show that at each point $p \in X$, $(vf)(p) = df(p)(v(p))$. (In the expression on the right, consider $df(p)$ as an element of the dual space $T_p^*X$.) Hint: Check it in a single coordinate chart.

On a Riemannian manifold $(X, g)$ (i.e., $g$ is a Riemannian metric on the manifold $X$), for each smooth function $f$ there is a vector field called the gradient of $f$. We define the gradient $\nabla f$ in local coordinates to be
$$(\nabla f)^i = g^{ij}f_{,j}, \qquad g^{k\ell}g_{\ell m} = \delta^k_m.$$
(So $g^{ij}$ is the inverse of the matrix $g_{ij}$.) Note that the Einstein convention with one index up (typically) indicates that $\nabla f$ is a vector field.

Homework Problem 35. Show that $\nabla f$ transforms as a vector field under coordinate changes.
In other words, check that if $y = y(x)$,
$$(\nabla f)^j(y) = \frac{\partial y^j}{\partial x^i}\,(\nabla f)^i(x)$$
as in (22). Hint: First check how the inverse of the metric $g^{ij}$ transforms. Note that in the definition $g^{ij}g_{jk} = \delta^i_k$, the $\delta^i_k$ is independent of coordinate changes.

In the case of Euclidean space, it is common to use the gradient of a function instead of its differential. In this case, $(\nabla f)^a = \delta^{ab}f_{,b}$. Note that on any Riemannian manifold
$$|df|^2 = g^{ab}f_{,a}f_{,b} = g^{ac}g_{cd}g^{db}f_{,a}f_{,b} = g_{cd}(\nabla f)^c(\nabla f)^d = |\nabla f|^2.$$

Let $v = v^i\frac{\partial}{\partial x^i}$ be a vector field on a domain in $\mathbb{R}^n$. Then the divergence of $v$ is the function defined by
$$\nabla\cdot v = \frac{\partial v^i}{\partial x^i}.$$
The divergence of a vector field may also be defined on Riemannian manifolds, but the definition is somewhat more involved.

Here is another important theorem, which is a consequence of Stokes's Theorem (see Spivak, Guillemin and Pollack, or Taylor). We only state it for domains in $\mathbb{R}^n$, and not in its more general context of compact manifolds with boundary.

Theorem 18 (Divergence Theorem). Let $\Omega \subset\subset \mathbb{R}^n$ be a domain with smooth boundary $\partial\Omega$. Then for any $C^1$ vector field $v$ on $\bar\Omega$,
$$\int_\Omega \nabla\cdot v\,dx^n = \int_{\partial\Omega} v\cdot n\,dV.$$
(Here $n$ is the unit outward normal vector field to $\partial\Omega$, and $dV$ is the measure on $\partial\Omega$ induced from the Euclidean metric.)

Remark. The way we have set up the integration depends on the Euclidean metric (to form the dot product, $dV$ and $n$). In the general form of Stokes's Theorem, it is unnecessary to use the metric. (We may recast $v$ and $\nabla\cdot v$ as differential forms.)

Idea of proof. We do the computation in a very special case, for $v$ having compact support in $\Omega$, which is the lower half-space $\{x = (x^1, \ldots, x^n) \in \mathbb{R}^n : x^n \leq 0\}$. In this case the unit normal vector is $n = (0, \ldots, 0, 1)$ and $dV = dx^{n-1}$, Lebesgue measure on $\mathbb{R}^{n-1} = \{x^n = 0\}$. Then, using Fubini's Theorem, we want to prove
$$\int_{-\infty}^\infty\!\cdots\!\int_{-\infty}^\infty\!\int_{-\infty}^0 \frac{\partial v^i}{\partial x^i}\,dx^n\,dx^{n-1}\cdots dx^1 = \int_{-\infty}^\infty\!\cdots\!\int_{-\infty}^\infty v^n\,dx^{n-1}\cdots dx^1.$$
Note that the left-hand integral is a sum from $i = 1$ to $n$.
For $i = n$, compute
$$\int_{-\infty}^0 \frac{\partial v^n}{\partial x^n}\,dx^n = v^n(x^1, \ldots, x^{n-1}, 0) - \lim_{t\to-\infty}v^n(x^1, \ldots, x^{n-1}, t) = v^n(x^1, \ldots, x^{n-1}, 0)$$
since $v$ has compact support. On the other hand, for $i \neq n$,
$$\int_{-\infty}^\infty \frac{\partial v^i}{\partial x^i}\,dx^i = 0$$
since $v$ has compact support. Therefore, using Fubini's Theorem, for each $i \neq n$, we can integrate $\partial v^i/\partial x^i$ with respect to $x^i$ first to get zero. The remaining term is the case $i = n$, and so
$$\int_{-\infty}^\infty\!\cdots\!\int_{-\infty}^\infty\!\int_{-\infty}^0 \frac{\partial v^i}{\partial x^i}\,dx^n\,dx^{n-1}\cdots dx^1 = \int_{-\infty}^\infty\!\cdots\!\int_{-\infty}^\infty\!\int_{-\infty}^0 \frac{\partial v^n}{\partial x^n}\,dx^n\,dx^{n-1}\cdots dx^1 = \int_{-\infty}^\infty\!\cdots\!\int_{-\infty}^\infty v^n\,dx^{n-1}\cdots dx^1.$$
This proves the Divergence Theorem in this special case.

The general case can be reduced to this special case by using a partition of unity and the Implicit Function Theorem (see Spivak). In particular, near each point in $\partial\Omega$, there is a local diffeomorphism of $\bar\Omega$ to the lower half-space, sending the boundary to the boundary. Together with open subsets of $\Omega$, these form an open cover of the compact $\bar\Omega$, and so we may take a finite subcover, and a partition of unity subordinate to this subcover. Then we can apply the above special case to $\rho v$ for $\rho$ in the partition of unity and $v$ the vector field. It is also necessary to make sure that the various terms in the integrals transform well with respect to the local diffeomorphisms. This can be checked directly, but it is better to use the language of differential forms (see Spivak or Guillemin and Pollack). $\square$

Homework Problem 36. Let $\Omega$ be a domain in $\mathbb{R}^n$ with smooth boundary. On a neighborhood $N \subset \mathbb{R}^n$ of a point in the boundary $\partial\Omega$, assume that
$$\Omega\cap N = \{x \in N : x^n < f(x^1, \ldots, x^{n-1})\},$$
so that $\Omega$ is locally the region under the graph of a smooth function $f$. Compute $n$ and $dV$. For a smooth vector field $v$, compute
$$\int_{\partial\Omega\cap N} v\cdot n\,dV$$
in terms of the integral of a function times Lebesgue measure on $\mathbb{R}^{n-1}$. Hint: Locally, $\partial\Omega$ is a submanifold of $\mathbb{R}^n$ which is the image of
$$\phi(x^1, \ldots, x^{n-1}) = (x^1, \ldots, x^{n-1}, f(x^1, \ldots, x^{n-1})).$$
Show that $n$ is proportional to $\nabla\psi$, for $\psi(x^1, \ldots, x^n) = x^n - f(x^1, \ldots, x^{n-1})$. Your answer should be of the form
$$\int_{\phi^{-1}(\partial\Omega\cap N)} h\,dx^{n-1}$$
for $h$ a function of $x^1, \ldots, x^{n-1}$.

Corollary 47 (Integration by Parts). Let $\Omega \subset\subset \mathbb{R}^n$ be a domain with smooth boundary $\partial\Omega$. Then for any $C^1$ vector field $v$ on $\bar\Omega$ and $C^1$ function $f$ on $\bar\Omega$,
$$\int_\Omega v\cdot\nabla f\,dx^n = -\int_\Omega f\,\nabla\cdot v\,dx^n + \int_{\partial\Omega} f\,v\cdot n\,dV.$$

Proof. It is easy to check that $\nabla\cdot(fv) = (\nabla f)\cdot v + f\,\nabla\cdot v$, and
$$\int_\Omega \nabla\cdot(fv)\,dx^n = \int_{\partial\Omega} f\,v\cdot n\,dV. \qquad\square$$

3.7 The $\epsilon$-Neighborhood Theorem

Theorem 19. Let $X \subset \mathbb{R}^n$ be a compact $k$-dimensional manifold. Then there is an $\epsilon > 0$ so that for
$$X_\epsilon = X + B_\epsilon(0) = \{y \in \mathbb{R}^n : \text{there is an } x \in X \text{ so that } |x - y| < \epsilon\},$$
there is a smooth projection map from $X_\epsilon$ to $X$ which restricts to the identity on $X$.

Before we prove Theorem 19, we need to introduce the normal bundle $NX$, which is a vector bundle over $X$ for $X \subset \mathbb{R}^n$. Let $\langle\cdot,\cdot\rangle$ denote the standard inner product on $\mathbb{R}^n$. Define
$$NX = \{(x, y) \in \mathbb{R}^n\times\mathbb{R}^n : x \in X,\ \langle y, z\rangle = 0 \text{ for all } z \in T_xX\}.$$
Then $NX$ is a vector bundle of rank $n - k$, with $\pi\colon NX \to X$ given by $\pi\colon (x, y)\mapsto x$. For a given $x \in X$, $N_xX = \pi^{-1}(x)$ is the normal space to $X$ at $x$, which consists of all vectors in $\mathbb{R}^n$ perpendicular to the tangent space $T_xX$. First of all, we show that $NX$ is a smooth $n$-dimensional manifold.

Homework Problem 37. $NX$ is a smooth manifold of dimension $n$.

(a) Show that $X \subset \mathbb{R}^n$ is a smooth manifold if and only if for each $x \in X$, there is a neighborhood $W$ of $x$ in $\mathbb{R}^n$ and a smooth function $\psi\colon W \to \mathbb{R}^{n-k}$ so that $D\psi$ has constant rank $n - k$ and $X\cap W = \psi^{-1}(0)$. (To show $\Longrightarrow$, use Theorem 14, and to show $\Longleftarrow$, use the Implicit Function Theorem.)

(b) At each $x \in X$, and given a smooth function $\psi$ as above, show that the normal space $N_xX$ is the image of the transpose of the tangent map $D\psi(x)^\top\colon \mathbb{R}^{n-k} \to \mathbb{R}^n$.

(c) Use the previous section and the techniques of Problem 28 to show $NX$ is a manifold.
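For a concrete instance of the normal space, take $X = S^2 \subset \mathbb{R}^3$, where $N_xS^2$ is spanned by $x$ itself. The following sketch, not part of the notes, checks numerically that $x$ is orthogonal to $T_xS^2$, using the columns of the Jacobian of the stereographic parametrization (from Example 13) as a basis of the tangent space.

```python
import numpy as np

# Inverse stereographic parametrization phi: R^2 -> S^2 (Example 13).
def phi(y):
    r2 = y @ y
    return np.array([2 * y[0], 2 * y[1], r2 - 1.0]) / (r2 + 1.0)

def jac(f, y, h=1e-6):
    # 3x2 finite-difference Jacobian; columns span T_x S^2
    cols = []
    for i in range(2):
        e = np.zeros(2); e[i] = h
        cols.append((f(y + e) - f(y - e)) / (2 * h))
    return np.column_stack(cols)

q = np.array([0.4, -0.9])
x = phi(q)                 # point on the sphere, |x| = 1
J = jac(phi, q)
dots = x @ J               # inner products of x with the tangent basis
```

Both inner products vanish (up to finite-difference error), confirming that the position vector spans the rank-one normal space at each point of $S^2$.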
We will prove the -Neighborhood Theorem by showing that there is a neighborhood of X in Rn which is diffeomorphic to the a neighborhood of the zero section {(x, 0) : x ∈ X} ⊂ N X, and the map required by the -Neighborhood Theorem then comes from π : N X → X. 84 Proof of the -Neighborhood Theorem. Consider the map F : N X → Rn given by F : (x, y) 7→ x + y. For each x ∈ X, DF (x, 0) : Tx (N X) → Rn is a linear isomorphism. This can be proved since T(x,0) (N X) can be written as a sum Tx (X) + Nx (X), and DF (x), when restricted to each factor, is a linear isomorphism. The Inverse Function Theorem then shows that each x ∈ X, there are neighborhoods Nx of (x, 0) in N X and Wx of x in Rn so that F |Nx is a diffeomorphism from Nx to Wx . Note we may apply the Inverse Function Theorem because by considering a local parametrization of N X, and diffeomorphisms of (open subsets of) manifolds are defined in terms of these parametrizations. Consider the following lemma: Lemma 48. There are open sets N and X̃ so that X × {0} ⊂ N ⊂ N X and X ⊂ X̃ ⊂ Rn and the restriction of F is a diffeomorphism from N to X̃. Proof. First of all, we note that DF is a linear isomorphism on N 0 = S x∈X Nx . The Inverse Function Theorem then shows that F |N 0 is a diffeomorphism onto its image as long as it is one-to-one. Therefore, we need only find an open N satisfying X × {0} ⊂ N ⊂ N 0 on which F is one-to-one. Now assume by contradiction that no such N exists. Then there are points (xn , yn ) 6= (x0n , yn0 ) ∈ N X satisfying F (xn , yn ) = F (x0n , yn0 ) and so that |yn |, |yn0 | < n1 (Why? You must use the compactness of X.) Since X is compact, there must be a subsequence ni so that (xni , yni ) → (x, 0) as i → ∞. Then we may take a further subsequence nij so that (x0ni , yn0 i ) → (x0 , 0) as j j j → ∞. For simplicity, we rename the subsequence nij as simply n. Then the continuity of F shows that x = F (x, 0) = lim F (xn , yn ) = lim F (x0n , yn0 ) = F (x0 , 0) = x0 . 
Since F is injective on X × {0}, we have x = x′. But then F|_{N_x} is injective, which contradicts our assumption that (x_n, y_n) ≠ (x′_n, y′_n), since both sequences lie in N_x for large n. Therefore, the lemma is proved.

Now since X is compact, there is a small ε > 0 so that X^ε ⊂ F(N). The projection map from X^ε → X is then given by π ∘ F^{−1}, which is smooth. This completes the proof of the ε-Neighborhood Theorem.

4 The Calculus of Variations

4.1 The variational principle

In this section, we want to consider the problem of constructing a function which minimizes a given functional. (A functional is a map from functions to R.)

Example 15. Let Ω ⊂⊂ R^n be a domain with smooth boundary. Then we consider the class

F = {f ∈ C²(Ω) ∩ C⁰(Ω̄) : f = g on ∂Ω}

for a given C² function g on ∂Ω. Consider the graph of f, {(x, f(x)) ∈ Ω̄ × R}. By pulling back the Euclidean metric on R^{n+1}, we can consider the n-volume of the graph. We have computed above

Vol(f) = ∫_Ω √(1 + |∇f|²) dx_n.

Then we want to consider the following question: Is there an f ∈ F which minimizes Vol(f) over all of F? If it exists, f must satisfy

d/dϵ|_{ϵ=0} Vol(f + ϵh) = 0

for every h so that f + ϵh ∈ F. We compute and integrate by parts to find a differential equation f must satisfy. First of all, f + ϵh ∈ F if and only if h ∈ C²(Ω) ∩ C⁰(Ω̄) and h = 0 on ∂Ω.

0 = d/dϵ|_{ϵ=0} Vol(f + ϵh)
  = d/dϵ|_{ϵ=0} ∫_Ω √(1 + |∇f + ϵ∇h|²) dx_n
  = d/dϵ|_{ϵ=0} ∫_Ω √(1 + |∇f|² + 2ϵ ∇f·∇h + ϵ²|∇h|²) dx_n
  = ∫_Ω [ (2∇f·∇h + 2ϵ|∇h|²) / (2√(1 + |∇f + ϵ∇h|²)) ]|_{ϵ=0} dx_n
  = ∫_Ω (∇f·∇h) / √(1 + |∇f|²) dx_n
  = −∫_Ω h ∇·( ∇f / √(1 + |∇f|²) ) dx_n + ∫_{∂Ω} h ( ∇f / √(1 + |∇f|²) ) · n dV
  = −∫_Ω h ∇·( ∇f / √(1 + |∇f|²) ) dx_n.

This last integral must be equal to zero for every h ∈ C⁰(Ω̄) which vanishes on ∂Ω. We claim this forces

g = ∇·( ∇f / √(1 + |∇f|²) ) = 0

on Ω. To prove the claim, note that since f is C², g is continuous on Ω. We prove the claim by contradiction. If g is nonzero at any point x ∈ Ω, assume without loss of generality that g(x) > 0.
Then by continuity, g > 0 in a small ball B centered at x. Now it is easy to find a smooth nonnegative bump function h, positive at x, whose support is contained in B. In this case

∫_Ω hg dx_n = ∫_B hg dx_n > 0,

which provides the contradiction.

Thus any function f which minimizes the functional Vol satisfies the Euler-Lagrange equation of the functional,

∇·( ∇f / √(1 + |∇f|²) ) = 0.

This equation is known as the minimal surface equation. So a solution to our problem satisfies the minimal surface equation, together with the boundary condition f = g on ∂Ω. This sort of boundary condition, specifying the value of a solution f, is called a Dirichlet boundary condition. The problem of finding a solution to the equation with this boundary condition is a Dirichlet boundary value problem. Note that the Dirichlet boundary condition is essential in making sure the variational function h vanishes on the boundary, so that there are no boundary terms when we integrate by parts. There is another useful type of boundary condition, the Neumann boundary condition, in which the normal derivative ∇f · n = 0. Notice that this also makes the integral over ∂Ω vanish in the integration by parts.

In the previous example, we computed the Euler-Lagrange equation for Vol. There may be solutions to the Euler-Lagrange equation which are not minimizers of Vol, since we have only checked the first-derivative test. A solution to the Euler-Lagrange equation may correspond to a local maximum, a saddle point, or a local but non-global minimum. We'll see below specific techniques for finding a global minimizer, which we apply in another geometric problem.

The Euler-Lagrange equations come from the first variation formula that a minimizer must satisfy: Given a family f_ϵ with f_0 = f, then if f minimizes a functional P,

d/dϵ|_{ϵ=0} P(f_ϵ) = 0.

This is the formula of the first variation, which comes from the first derivative test in calculus. We may also use the second derivative test.
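The bump-function step of the argument can be illustrated numerically. This is a one-dimensional sketch with our own choice of g: the standard smooth bump h supported in the unit ball integrates against a positive continuous g to a strictly positive number, which is the contradiction used above.

```python
import numpy as np

# Standard smooth bump: positive for |x| < 1, identically 0 for |x| >= 1.
def bump(r):
    r_safe = np.minimum(r, 0.999999)          # avoid division by zero at r = 1
    return np.where(r < 1.0, np.exp(-1.0 / (1.0 - r_safe**2)), 0.0)

# g continuous and positive on a neighborhood of 0 (our example choice)
xs = np.linspace(-2.0, 2.0, 4001)
g = 1.0 + 0.5 * np.cos(xs)
h = bump(np.abs(xs))

# integral of h*g over the line, by a simple Riemann sum
integral = float(np.sum(h * g) * (xs[1] - xs[0]))
print(integral)  # strictly positive
```

Since h vanishes outside the ball, the integral only sees the region where g > 0, exactly as in the proof of the claim.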
A minimizer f as above must satisfy the second variation formula

d²/dϵ²|_{ϵ=0} P(f_ϵ) ≥ 0.

Homework Problem 38. Consider a variational problem for C² functions y = y(x) on a domain [a, b] with fixed endpoints y(a) = y_0, y(b) = y_1. Assume the functional is of the form

J(y) = ∫_a^b F(y, y′) dx,

for F a smooth function of 2 variables.

(a) Compute the general Euler-Lagrange equation for J.

(b) Multiply the Euler-Lagrange equation by y′ to show that any solution to the Euler-Lagrange equation must satisfy

dG/dx = 0

for a function G depending on F, y and their derivatives.

(c) A graph y = y(x) of a C¹ positive function determines a surface of revolution around the x-axis with surface area

A(y) = 2π ∫_a^b y √(1 + (y′)²) dx.

Compute the Euler-Lagrange equation for A (assume y is C²) and compute its general solution. (The graph of this solution is called a catenary.)

4.2 Geodesics

Given a C¹ path γ: I → X for I = [α, β] an interval and X ⊂ R^N a manifold with Riemannian metric g induced from the Euclidean metric on R^N, the length of the path γ(I) is given by

L(γ) = ∫_α^β |γ̇|_g dt = ∫_α^β √(g(γ̇, γ̇)) dt = ∫_α^β √(g_{ij}(γ(t)) γ̇^i(t) γ̇^j(t)) dt.

(In the last formulation, note the use of local coordinates. So the last formulation is strictly only true when γ(I) is contained in a single coordinate chart.) L is called the length functional; it takes paths γ to R.

Proposition 49. The length of a path is independent of the parametrization. In other words, if γ̃(τ) = γ(t(τ)) for t = t(τ) a C¹ diffeomorphism onto I, then L(γ̃) = L(γ).

Proof. Let t = t(τ) with t(α̃) = α, t(β̃) = β. Assume that α̃ < β̃; since t is a diffeomorphism, then dt/dτ > 0. Then compute

L(γ̃) = ∫_{α̃}^{β̃} √( g(dγ̃/dτ, dγ̃/dτ) ) dτ
     = ∫_{α̃}^{β̃} √( g(dγ/dt · dt/dτ, dγ/dt · dt/dτ) ) dτ
     = ∫_{α̃}^{β̃} √( g(dγ/dt, dγ/dt) ) · (dt/dτ) dτ
     = ∫_α^β √( g(dγ/dt, dγ/dt) ) dt
     = L(γ).

The case when dt/dτ < 0 and α̃ > β̃ is similar. So this definition corresponds to the usual definition of the arc length of a parametric curve.
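The conserved quantity of Problem 38(b) can be checked on the catenary of part (c). A hedged numerical sketch (the specific conserved quantity G below is the one obtained from F(y, y′) = y√(1 + (y′)²), dropping the constant 2π): along y = cosh(x), the quantity G = F − y′ ∂F/∂y′ = y/√(1 + (y′)²) is constant.

```python
import numpy as np

# For F(y, y') = y*sqrt(1 + y'^2), multiplying the Euler-Lagrange equation by
# y' gives d/dx [ y / sqrt(1 + y'^2) ] = 0.  We sample the catenary y = cosh(x)
# and check that G = y / sqrt(1 + y'^2) has no visible variation.

xs = np.linspace(-1.0, 1.0, 201)
y = np.cosh(xs)
yp = np.sinh(xs)                       # exact derivative of cosh
G = y / np.sqrt(1.0 + yp**2)           # conserved quantity; equals 1 here
print(G.max() - G.min())               # ~ 0 up to rounding
```

Indeed 1 + sinh² = cosh², so G ≡ 1 for this particular catenary; shifted and scaled catenaries give other constant values.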
In particular, it is invariant under change of parametrization. This particular feature turns out to cause trouble analytically. In the following sections, we'll seek to find paths minimizing arc length by constructing a sequence of paths approaching a length-minimizing one. The fact that a potentially minimizing path has many different parametrizations will make the analysis more difficult, since it will be difficult to find a sequence of paths which approaches a particular minimizing path among all the possible parametrizations.

Another analytic objection to the length functional is that it is the L¹ norm of the length of the tangent vector γ̇. L² norms tend to behave better, since we can use the structure of Hilbert spaces. Assume for convenience that the interval I = [0, 1]. This can always be achieved by using a linear map to take a given I to [0, 1]. Thus we introduce a related functional, the energy of a C¹ path γ: [0, 1] → X. Define

E(γ) = ∫_0^1 |γ̇|²_g dt.

The energy is related to the length by the following proposition.

Proposition 50. For a given homotopy class C of curves γ: [0, 1] → X, a C¹ curve γ minimizes E in C if and only if it minimizes L among C¹ curves in C and the speed |γ̇(t)|_g is constant.

Before we start the proof, we recall a little about homotopy classes. Two continuous curves γ_i: [0, 1] → X, i = 0, 1, are homotopic if γ_i(0) = p, γ_i(1) = q for i = 0, 1, and if there is a continuous function (called a homotopy) G: [0, 1] × [0, 1] → X so that G(0, t) = γ_0(t), G(1, t) = γ_1(t) for all t ∈ [0, 1], and G(s, 0) = p and G(s, 1) = q for all s ∈ [0, 1]. (More generally, if Y and X are both metric spaces, then two continuous maps f_0, f_1: Y → X are said to be homotopic if there is a continuous map F: [0, 1] × Y → X with F(0, y) = f_0(y), F(1, y) = f_1(y) for all y ∈ Y. In the present case, the space Y = [0, 1] and we impose the extra conditions that the values at the endpoints t = 0, 1 are fixed at p, q respectively as well.)
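The contrast between L and E can be seen numerically. A minimal sketch with our own example curve: two parametrizations of the same half-circle in R² (with the Euclidean metric) have the same length, but the non-constant-speed one has strictly larger energy, consistent with Proposition 50.

```python
import numpy as np

# Finite-difference approximations of L = int |gamma'| dt and E = int |gamma'|^2 dt
# for curves gamma : [0,1] -> R^2 sampled on a fine grid.
def length_and_energy(gamma, ts):
    dt = ts[1] - ts[0]
    v = np.diff(gamma, axis=0) / dt            # velocity samples
    speed = np.linalg.norm(v, axis=1)
    return float(np.sum(speed) * dt), float(np.sum(speed**2) * dt)

ts = np.linspace(0.0, 1.0, 20001)
# constant-speed parametrization of the upper half unit circle
const_speed = np.stack([np.cos(np.pi * ts), np.sin(np.pi * ts)], axis=1)
# same curve, distorted parametrization (speed 2*pi*t, not constant)
distorted = np.stack([np.cos(np.pi * ts**2), np.sin(np.pi * ts**2)], axis=1)

L1, E1 = length_and_energy(const_speed, ts)
L2, E2 = length_and_energy(distorted, ts)
print(L1, E1, L2, E2)
```

Here L1 and L2 both approximate π, E1 approximates π² = L², and E2 approximates 4π²/3 > π², so the distorted parametrization pays an energy penalty even though the image curve is unchanged.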
Since we are measuring length and energy, we are only interested in curves γ_i which are C¹, while we allow the homotopy G to be only continuous.

Proposition 51. The condition of two paths being homotopic is an equivalence relation, and thus we may consider homotopy classes of paths.

Proof. We need to show the property is reflexive, symmetric, and transitive. If γ: [0, 1] → X is a continuous path, then it is homotopic to itself via the homotopy G(s, t) = γ(t) for s ∈ [0, 1]. This shows the reflexive property. If γ_0 is homotopic to γ_1 via the homotopy G, then we see γ_1 is homotopic to γ_0 via the homotopy G̃(s, t) = G(1 − s, t). This shows the symmetric property. Finally, to show the transitive property, if γ_0 is homotopic to γ_1 via a homotopy G and γ_1 is homotopic to γ_2 via a homotopy F, then we construct a homotopy from γ_0 to γ_2 by the formula

H(s, t) = { G(2s, t) for s ∈ [0, 1/2],   F(2s − 1, t) for s ∈ [1/2, 1] }.

Note this definition is well-defined, since H(1/2, t) = γ_1(t) for either definition above. This observation also shows that H is continuous. It is straightforward to show H is a homotopy.

A C¹ diffeomorphism t = t(τ) of [0, 1] is called orientation-preserving if dt/dτ > 0. Another fact about homotopy we'll presently use is the following.

Lemma 52. If γ̃(τ) = γ(t(τ)) for t = t(τ) an orientation-preserving diffeomorphism of [0, 1], then γ̃ and γ are homotopic.

Proof. For s, τ ∈ [0, 1], define ψ(s, τ) = sτ + (1 − s)t(τ). Then we will show that G(s, τ) = γ(ψ(s, τ)) is the required homotopy. First of all, since t(τ) is an orientation-preserving diffeomorphism, we see t(0) = 0, t(1) = 1. Now check that for s, τ ∈ [0, 1], ψ(s, τ) ∈ [0, 1]: because 0 ≤ τ ≤ 1 and 0 ≤ t(τ) ≤ 1, then

0 = s(0) + (1 − s)0 ≤ sτ + (1 − s)t(τ) ≤ s(1) + (1 − s)(1) = 1.

This shows the homotopy G is well-defined. It is obvious for τ ∈ [0, 1] that G(0, τ) = γ̃(τ) and G(1, τ) = γ(τ). Also compute for s ∈ [0, 1], G(s, 0) = γ(0) and G(s, 1) = γ(1).
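The interpolation ψ(s, τ) = sτ + (1 − s)t(τ) in the proof of Lemma 52 can be sampled directly. A small sketch with t(τ) = τ² as our example of an orientation-preserving diffeomorphism of [0, 1]:

```python
import numpy as np

taus = np.linspace(0.0, 1.0, 101)
t_of_tau = taus**2                     # orientation-preserving diffeomorphism of [0,1]
ss = np.linspace(0.0, 1.0, 11)[:, None]
# psi(s, tau) = s*tau + (1 - s)*t(tau), evaluated on a grid by broadcasting
psi = ss * taus[None, :] + (1.0 - ss) * t_of_tau[None, :]
print(psi.min(), psi.max())            # stays inside [0, 1]
```

The row s = 0 recovers t(τ) and the row s = 1 recovers τ, so composing with γ interpolates between γ̃ and γ exactly as the lemma requires.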
Also, note the following.

Lemma 53. For any C¹ path γ, E(γ) ≥ L(γ)², and they are equal if and only if |γ̇(t)|_g is constant.

Proof. Apply Hölder's inequality:

L(γ) = ∫_0^1 |γ̇(t)|_g dt ≤ ( ∫_0^1 1² dt )^{1/2} ( ∫_0^1 |γ̇(t)|²_g dt )^{1/2} = √(E(γ)),

with equality if and only if 1 is proportional to |γ̇(t)|_g, which is the same as |γ̇(t)|_g being constant.

Proof of Proposition 50. Let γ ∈ C satisfy E(γ) ≤ E(γ′) for all γ′ ∈ C. Given γ, let γ_c be the constant-speed reparametrization of γ (this exists by Problem 39 below). Then we have by Proposition 49 and Lemma 53

L(γ_c)² = L(γ)² ≤ E(γ) ≤ E(γ_c) = L(γ_c)².

Thus all the inequalities in the above equation must be equalities, and L(γ)² = E(γ). Then Lemma 53 implies γ must have constant speed. So we've shown so far that if γ minimizes E, then γ has constant speed.

Let γ minimize E. For each C¹ curve γ′ ∈ C, let γ′_c be a constant-speed reparametrization. Then since γ has constant speed, Lemma 53 and Proposition 49 show

L(γ)² = E(γ) ≤ E(γ′_c) = L(γ′_c)² = L(γ′)².

So we've shown that if γ minimizes E in C, then γ minimizes L in C. We leave the converse statement as Problem 40 below.

Homework Problem 39. (a) Let γ: [0, 1] → X, γ = γ(t), be a C¹ path into a Riemannian manifold X. Assume |γ̇(t)|_g ≠ 0 for all t ∈ [0, 1]. Show that there is a reparametrization t(τ) so that t(0) = 0, t(1) = 1, dt/dτ > 0, and |dγ/dτ|_g is constant.

Hint: Show the constant must be equal to L(γ). Then show the condition is an ODE in τ = τ(t). (Note that if dt/dτ > 0, then t(τ) is strictly increasing and thus has an inverse on [0, 1].)

(b) Remove the condition that |γ̇(t)|_g ≠ 0. In this case, t(τ) will only be Lipschitz. Hint: Consider the open set O = {t : γ̇(t) ≠ 0}. Perform a similar analysis on each connected component of O. *** This still needs work. ***

Homework Problem 40. For a given homotopy class C of curves γ: [0, 1] → X, assume γ has constant speed |γ̇(t)|_g and γ minimizes L among C¹ curves in C. Then γ minimizes E among C¹ curves in C.
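The constant-speed reparametrization of Problem 39(a) can be carried out numerically by inverting the arc-length function. A hedged sketch (the ellipse arc and all names are our own): we compute s(t) = ∫_0^t |γ̇| as a cumulative sum, invert it by interpolation, and resample the curve at equal arc-length increments, after which the speed is constant.

```python
import numpy as np

def constant_speed_reparam(gamma, ts, m=2000):
    """Resample a curve gamma (sampled at parameters ts) at constant speed."""
    dt = ts[1] - ts[0]
    v = np.diff(gamma, axis=0) / dt
    speed = np.linalg.norm(v, axis=1)
    s = np.concatenate([[0.0], np.cumsum(speed) * dt])   # arc-length function s(t)
    L = float(s[-1])
    taus = np.linspace(0.0, L, m)                        # equal arc-length steps
    t_of_tau = np.interp(taus, s, ts)                    # invert s(t) (s is increasing)
    resampled = np.stack([np.interp(t_of_tau, ts, gamma[:, k])
                          for k in range(gamma.shape[1])], axis=1)
    return resampled, L

ts = np.linspace(0.0, 1.0, 40001)
# half of the ellipse x = 2 cos, y = sin: nonconstant speed, nonvanishing gamma-dot
ellipse = np.stack([2.0 * np.cos(np.pi * ts), np.sin(np.pi * ts)], axis=1)
gc, L = constant_speed_reparam(ellipse, ts)

dtau = L / (gc.shape[0] - 1)
speeds = np.linalg.norm(np.diff(gc, axis=0), axis=1) / dtau
print(speeds.min(), speeds.max(), L)   # speeds cluster near 1; L ~ 4.84
```

Rescaling the new parameter from [0, L] back to [0, 1] would give the constant speed L, matching the hint that the constant must be L(γ).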
Now we compute the first variation of the energy functional. Let γ be a smooth curve from [0, 1] to X so that γ(0) = p, γ(1) = q, where X ⊂ R^N has the Riemannian metric pulled back from R^N. Assume γ minimizes E in a homotopy class C, and that γ is C². Then for each smooth family γ_ϵ(t), we have

d/dϵ|_{ϵ=0} E(γ_ϵ) = 0.

Consider a variation of the following special form. Near a point in γ([0, 1]), pick local coordinates x: O → U ⊂ R^n. Then there is a small time interval I = γ^{−1}(O) ⊂ [0, 1]. Assume for simplicity that I doesn't contain either endpoint 0 or 1. In terms of the local coordinates x, x(γ(t)) = γ(t) ∈ U ⊂ R^n for t ∈ I. Then let h: R → R^n be a smooth function so that supp(h) ⊂⊂ I. For ϵ near 0,

γ_ϵ(t) = γ(t) + ϵh(t) ⊂ U for t ∈ I.

We define γ_ϵ outside of O to be simply γ. Apply the first variation formula:

d/dϵ|_{ϵ=0} E(γ_ϵ)
  = ∫_0^1 d/dϵ|_{ϵ=0} g(γ̇_ϵ(t), γ̇_ϵ(t)) dt
  = ∫_I d/dϵ|_{ϵ=0} g_{ij}(γ(t) + ϵh(t)) [γ̇^i(t) + ϵḣ^i(t)] [γ̇^j(t) + ϵḣ^j(t)] dt
  = ∫_I (∂g_{ij}/∂x^k)(γ(t)) h^k(t) γ̇^i(t) γ̇^j(t) dt
    + ∫_I g_{ij}(γ(t)) ḣ^i(t) γ̇^j(t) dt
    + ∫_I g_{ij}(γ(t)) γ̇^i(t) ḣ^j(t) dt.

Now we integrate by parts in the last two integrals. Note that since h has compact support in I, all the boundary terms involving h vanish. Compute

∫_I g_{ij}(γ(t)) ḣ^i(t) γ̇^j(t) dt = −∫_I (∂g_{ij}/∂x^k)(γ(t)) γ̇^k(t) h^i(t) γ̇^j(t) dt − ∫_I g_{ij}(γ(t)) h^i(t) γ̈^j(t) dt.

We may plug this in (and the analogous formula for the third integral) to find for a minimizer

0 = d/dϵ|_{ϵ=0} E(γ_ϵ)
  = ∫_I [ (∂g_{ij}/∂x^k) h^k γ̇^i γ̇^j − (∂g_{ij}/∂x^k) γ̇^k h^i γ̇^j − g_{ij} h^i γ̈^j − (∂g_{ij}/∂x^k) γ̇^k γ̇^i h^j − g_{ij} γ̈^i h^j ] dt
  = ∫_I [ (∂g_{ij}/∂x^k) γ̇^i γ̇^j − (∂g_{kj}/∂x^i) γ̇^i γ̇^j − g_{kj} γ̈^j − (∂g_{ik}/∂x^j) γ̇^i γ̇^j − g_{jk} γ̈^j ] h^k dt.

Since this is true for each h with compact support in I, we must have, for each k = 1, ..., n and for all t in the open interval I,

0 = (∂g_{ij}/∂x^k) γ̇^i γ̇^j − (∂g_{kj}/∂x^i) γ̇^i γ̇^j − g_{kj} γ̈^j − (∂g_{ik}/∂x^j) γ̇^i γ̇^j − g_{jk} γ̈^j.

Since g_{kj} = g_{jk}, we have

0 = g_{jk} γ̈^j + ½ ( ∂g_{kj}/∂x^i + ∂g_{ik}/∂x^j − ∂g_{ij}/∂x^k ) γ̇^i γ̇^j,
0 = γ̈^ℓ + ½ g^{kℓ} ( ∂g_{kj}/∂x^i + ∂g_{ik}/∂x^j − ∂g_{ij}/∂x^k ) γ̇^i γ̇^j = γ̈^ℓ + Γ^ℓ_{ij} γ̇^i γ̇^j,
Γ^ℓ_{ij} = ½ g^{kℓ} ( ∂g_{kj}/∂x^i + ∂g_{ik}/∂x^j − ∂g_{ij}/∂x^k ).

The Γ^ℓ_{ij} are called the Christoffel symbols of the metric g_{ij}, and

γ̈^ℓ + Γ^ℓ_{ij} γ̇^i γ̇^j = 0    (26)

is called the geodesic equation for the metric g. Note Γ^ℓ_{ij} = Γ^ℓ_{ji}. Any curve satisfying this second-order system is called a geodesic on the Riemannian manifold X.

Remark. Our definition of geodesic requires a specific parametrization to solve the equation (the constant-speed parametrization). Many other authors define a geodesic to be a curve which satisfies the first variational equation of arc length. These geodesics are the same as our geodesics as subsets of the Riemannian manifold, but the parametrization is not required to be constant speed.

Note that this analysis does not work at the endpoints 0 and 1. There, we simply have the conditions γ(0) = p and γ(1) = q to remain in the class C. This is essentially a Dirichlet boundary condition on the problem.

Homework Problem 41. Let p, q be points in a manifold X, and consider the class C of all smooth paths from p to q.

(a) Compute the Euler-Lagrange equations for the length functional L(γ) for γ ∈ C. Show that any γ: [0, 1] → X which is a critical point of L must satisfy

γ̈^ℓ(t) + Γ^ℓ_{ij}(γ(t)) γ̇^i(t) γ̇^j(t) = c(t) γ̇^ℓ(t)

for t ∈ (0, 1) and c(t) a real-valued function of t.

(b) Use part (a) to prove the following generalization of Proposition 50: A curve γ in C is a critical point of E if and only if it is a critical point of L and it has constant speed.

Homework Problem 42. Let (X, g) be an n-dimensional smooth compact Riemannian manifold. By Nash's Theorem, we may assume that g = i*δ, the pull-back of the Euclidean metric δ on R^N for some embedding i: X → R^N. If (p, v) ∈ TX (i.e. p ∈ X and v ∈ T_pX), show that the solution to the geodesic equation (26) on X with initial conditions γ(0) = p and γ̇(0) = v exists for all time. Hints:

(a) Show that if γ(t) solves the geodesic equation (26), then the speed |γ̇(t)|_g is constant in t.

(b) Reduce the problem to the case the initial speed |v|_{g(p)} = 1.

(c) The unit tangent bundle UTX is defined by UTX = {(p, v) ∈ TX : |v|_{g(p)} = 1}. Show UTX is compact as long as X is compact.

(d) Mimic the proof of Theorem 15 to complete the proof.

Example 16. Euclidean space is R^n with the standard Euclidean metric δ = δ_{ij} dx^i dx^j. In this case, all the Christoffel symbols Γ^k_{ij} vanish, since each term involves differentiating the components of the metric tensor, all of which are constant. Therefore, the geodesic system is simply γ̈^k = 0. Solutions to this ODE are simply linear functions of t, and so geodesics are of the form γ = tv + w for v, w ∈ R^n. So geodesics on Euclidean space are straight lines traversed at constant speed.

Example 17. For hyperbolic space, recall the metric g_{ij} = (x^n)^{−2} δ_{ij} on {x ∈ R^n : x^n > 0}. Compute the Christoffel symbols:

g^{ij} = (x^n)² δ^{ij},   g_{ij,k} = −2 (x^n)^{−3} δ_{ij} δ_{kn},
Γ^k_{ij} = ½ (x^n)² δ^{kℓ} (g_{iℓ,j} + g_{ℓj,i} − g_{ij,ℓ})
        = ½ (x^n)² δ^{kℓ} [−2 (x^n)^{−3}] (δ_{iℓ} δ_{jn} + δ_{ℓj} δ_{in} − δ_{ij} δ_{ℓn})
        = −(x^n)^{−1} (δ_{ik} δ_{jn} + δ_{jk} δ_{in} − δ_{kn} δ_{ij}).

Now consider i, j, k distinct integers in {1, ..., n}. Then

Γ^k_{ij} = 0,   Γ^i_{ik} = Γ^i_{ki} = −(x^n)^{−1} δ_{kn},   Γ^k_{ii} = (x^n)^{−1} δ_{kn},   Γ^i_{ii} = −(x^n)^{−1} δ_{in}.

First, we look for solutions in which γ̇^k = 0 for k = 1, ..., n − 1 (so only γ^n varies in t). It is plausible to look for such solutions since the coefficients g_{ij} of the metric depend only on x^n. In this case, for k < n, compute

γ̈^k = −Γ^k_{ij} γ̇^i γ̇^j = −Γ^k_{nn} γ̇^n γ̇^n = (x^n)^{−1} δ_{kn} γ̇^n γ̇^n = 0,

since δ_{kn} = 0 for k < n. Thus if γ̇^1 = ··· = γ̇^{n−1} = 0, then the geodesic equations for γ̈^k for k < n are automatically solved.
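The Christoffel formula Γ^ℓ_{ij} = ½ g^{kℓ}(g_{kj,i} + g_{ik,j} − g_{ij,k}) can be implemented directly from finite differences of the metric. A hedged numerical sketch: for the hyperbolic plane (n = 2, g_{ij} = δ_{ij}/(x²)²), the result should match the closed-form symbols computed above, e.g. Γ¹₁₂ = −1/x², Γ²₁₁ = 1/x², Γ²₂₂ = −1/x².

```python
import numpy as np

def metric(x):
    # hyperbolic metric on the upper half plane {x2 > 0}
    return np.eye(2) / x[1]**2

def christoffel(x, h=1e-5):
    """Gamma^l_ij = (1/2) g^{kl} (d_i g_kj + d_j g_ik - d_k g_ij), via central differences."""
    g = metric(x)
    ginv = np.linalg.inv(g)
    dg = np.zeros((2, 2, 2))          # dg[k] = derivative of the metric in direction x^k
    for k in range(2):
        e = np.zeros(2); e[k] = h
        dg[k] = (metric(x + e) - metric(x - e)) / (2.0 * h)
    Gamma = np.zeros((2, 2, 2))       # Gamma[l, i, j]
    for l in range(2):
        for i in range(2):
            for j in range(2):
                Gamma[l, i, j] = 0.5 * sum(
                    ginv[k, l] * (dg[i][k, j] + dg[j][i, k] - dg[k][i, j])
                    for k in range(2))
    return Gamma

x = np.array([0.4, 0.5])              # sample point with x2 = 0.5, our choice
G = christoffel(x)
print(G[0, 0, 1], G[1, 0, 0], G[1, 1, 1])   # expect -2, 2, -2 at x2 = 0.5
```

The symmetry Γ^ℓ_{ij} = Γ^ℓ_{ji} is also visible in the output array.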
Now compute the geodesic equation for γ̈^n:

γ̈^n = −Γ^n_{ij} γ̇^i γ̇^j = −Γ^n_{nn} γ̇^n γ̇^n = (x^n)^{−1} γ̇^n γ̇^n = (γ^n)^{−1} γ̇^n γ̇^n.    (27)

This is a second-order nonlinear equation in γ^n, and we do not have any general technique to solve such an equation. We can, however, make some educated guesses. In particular, note that (γ^n γ̇^n)˙ = γ^n γ̈^n + γ̇^n γ̇^n, and that each of these terms is similar to those in the geodesic equation (27) above. In particular, compute for a function f of γ^n

0 = (f(γ^n) γ̇^n)˙ = f(γ^n) γ̈^n + f′(γ^n) γ̇^n γ̇^n,    (28)

0 = γ̈^n + [f′(γ^n)/f(γ^n)] γ̇^n γ̇^n.    (29)

This last equation is the same as the geodesic equation (27) if

f′(γ^n)/f(γ^n) = −1/γ^n,

and this is now a first-order separable equation for f. We may solve to find that f = (γ^n)^{−1} is a solution. Now plug into (28) to find

0 = (γ̇^n/γ^n)˙,
C = γ̇^n/γ^n = (log γ^n)˙,
Ct + D = log γ^n,
γ^n = A e^{Ct},

for A a positive constant (since in hyperbolic space, we have x^n = γ^n > 0) and C any real constant. Therefore,

γ^1 = γ^1_0, ..., γ^{n−1} = γ^{n−1}_0, γ^n = A e^{Ct}

solves the geodesic system on hyperbolic space.

So far we have only found geodesics in the special case that γ̇^1 = ··· = γ̇^{n−1} = 0. To find all the geodesics on hyperbolic space, we introduce the notion of an isometry of a Riemannian manifold. Given a Riemannian manifold (X, g), a diffeomorphism Φ: X → X is an isometry if Φ*g = g. Isometries of H^n are well understood, and we introduce a specific type. For α > 0, let

ι_α: x ↦ α x/|x|²,

where x ∈ H^n ⊂ R^n and |x|² = (x^1)² + ··· + (x^n)² comes from R^n. It is easy to see that ι_α is a diffeomorphism of H^n. To show that it is an isometry, let y = ι_α(x). Then

ι_α* g = ι_α* ( Σ_{j=1}^n (dy^j)² / (y^n)² ).

Dropping the pull-back ι_α* notation, we compute

y^j = α x^j/|x|²,

dy^j = (∂y^j/∂x^i) dx^i = α Σ_{i=1}^n [ (|x|² δ_{ij} − 2 x^i x^j)/|x|⁴ ] dx^i,

(dy^j)² = α² ( Σ_{i=1}^n [ (|x|² δ_{ij} − 2 x^i x^j)/|x|⁴ ] dx^i ) ( Σ_{k=1}^n [ (|x|² δ_{kj} − 2 x^k x^j)/|x|⁴ ] dx^k )
        = (α²/|x|⁸) Σ_{i,k=1}^n [ 4 x^i x^k (x^j)² − 2|x|² x^i x^j δ_{kj} − 2|x|² x^k x^j δ_{ij} + |x|⁴ δ_{ij} δ_{kj} ] dx^i dx^k
        = (α²/|x|⁸) { 4 (x^j)² Σ_{i,k=1}^n x^i x^k dx^i dx^k − 4|x|² x^j dx^j Σ_{i=1}^n x^i dx^i + |x|⁴ (dx^j)² }.

Summing over j, and using Σ_j (x^j)² = |x|²,

Σ_{j=1}^n (dy^j)² = (α²/|x|⁸) { 4|x|² Σ_{i,k=1}^n x^i x^k dx^i dx^k − 4|x|² Σ_{i,j=1}^n x^j x^i dx^j dx^i + |x|⁴ Σ_{j=1}^n (dx^j)² }
                = (α²/|x|⁴) Σ_{j=1}^n (dx^j)².

Also (y^n)² = α² (x^n)²/|x|⁴, so

Σ_{j=1}^n (dy^j)² / (y^n)² = Σ_{j=1}^n (dx^j)² / (x^n)².

Therefore ι_α* g = g and ι_α is an isometry. Moreover, it is trivial to check that any translation x ↦ x + x_0 is an isometry of H^n if the last component x^n_0 = 0. Also, note that the composition of two isometries is again an isometry (indeed the set of isometries of a Riemannian manifold X forms a subgroup of the diffeomorphism group, called the isometry group). Proposition 55 below shows that for any geodesic ψ: R → H^n, ι_α ∘ ψ is also a geodesic. Recall we know so far that the curves γ = (γ^1_0, ..., γ^{n−1}_0, A e^{Ct}) are geodesics for A > 0, C ∈ R. Compute for α > 0,

ι_α ∘ γ = α γ/|γ|² = α (γ^1_0, ..., γ^{n−1}_0, A e^{Ct}) / [ (γ^1_0)² + ··· + (γ^{n−1}_0)² + A² e^{2Ct} ].

The image ι_α ∘ γ(R) is then the half-circle in R^n which intersects {x^n = 0} perpendicularly at 0 and at

α (γ^1_0, ..., γ^{n−1}_0, 0) / [ (γ^1_0)² + ··· + (γ^{n−1}_0)² ].

Then if we apply the isometry given by adding a constant x_0 with x^n_0 = 0, every half-circle in H^n which intersects {x^n = 0} perpendicularly at both endpoints is the image of a geodesic path in H^n. All together, for constants γ^1_0, ..., γ^{n−1}_0, x^1_0, ..., x^{n−1}_0, C ∈ R and A, α > 0, the path

ψ(t) = α (γ^1_0, ..., γ^{n−1}_0, A e^{Ct}) / [ (γ^1_0)² + ··· + (γ^{n−1}_0)² + A² e^{2Ct} ] + (x^1_0, ..., x^{n−1}_0, 0),  t ∈ R,    (30)

is a geodesic in H^n, and the image ψ(R) is a ray or a half-circle in R^n perpendicular to {x^n = 0}. All such rays and semicircles are represented by such geodesic paths.

We claim that we have found all the geodesics in H^n. The way to check this is to recognize that the geodesic system, as a second-order ODE system with smooth coefficients, has a unique solution for each initial value problem

γ̈^k = −Γ^k_{ij} γ̇^i γ̇^j,   γ(0) = y_0,   γ̇(0) = v_0.

Then if we can check that every initial condition (y_0, v_0) ∈ TH^n occurs as (ψ(0), ψ̇(0)) for a geodesic ψ(t) in (30), uniqueness of the geodesic system will imply that we have found all the geodesics in H^n.

So we must check that every (y_0, v_0) ∈ TH^n = H^n × R^n can be represented by (ψ(0), ψ̇(0)) for a ψ(t) in (30). For a given point y_0 ∈ H^n and vector v_0 ∈ T_{y_0}H^n = R^n, consider first the case when v^1_0 = ··· = v^{n−1}_0 = 0. In this case, we can choose A > 0 and C so that ψ(t) = (y^1_0, ..., y^{n−1}_0, A e^{Ct}) satisfies ψ(0) = y_0 and ψ̇(0) = v_0. Otherwise, y_0 and v_0 span a plane P in H^n. Let L = P ∩ {x^n = 0}. It is straightforward to check that there is a unique semicircle in the plane P which hits L perpendicularly, passes through y_0, and is tangent to v_0 at y_0. This is the image of some geodesic ψ(t) in (30). Then we can adjust C and A to ensure that ψ(0) = y_0 and ψ̇(0) = v_0. Therefore, every initial condition (y_0, v_0) is achieved by a geodesic on our list, and we have found all the geodesics in hyperbolic space.

The following proposition was discussed in Example 17 above.

Proposition 54. Consider a Riemannian manifold (X, g). Given p ∈ X, v ∈ T_pX, there is an ε > 0 and a unique geodesic γ: (−ε, ε) → X with γ(0) = p, γ̇(0) = v.

Remark. In general, the geodesic γ may not exist for all time, although we have seen that all the geodesics on hyperbolic space (Example 17) and on compact Riemannian manifolds (Problem 42) do exist for all time.
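Two claims from Example 17 can be verified numerically. A hedged sketch with our own constants: first, that γ^n(t) = A e^{Ct} solves the vertical geodesic equation (27); second, that the isometry ι_α maps the vertical geodesic (c, A e^{Ct}) in H² onto a half-circle centered on {x² = 0}, so it meets the boundary at right angles.

```python
import numpy as np

A, C, alpha, c = 2.0, 0.5, 1.5, 0.8     # sample constants, our choice
ts = np.linspace(-3.0, 3.0, 50)

# (i) gamma^n = A e^{Ct} solves gamma-ddot = gamma-dot^2 / gamma (equation (27))
gn = A * np.exp(C * ts)
gn_dot = C * gn                         # exact first derivative
gn_ddot = C**2 * gn                     # exact second derivative
residual = float(np.abs(gn_ddot - gn_dot**2 / gn).max())

# (ii) image of the vertical geodesic (c, gamma^n) under iota_alpha(x) = alpha x / |x|^2
gamma = np.stack([np.full_like(ts, c), gn], axis=1)
image = alpha * gamma / np.sum(gamma**2, axis=1, keepdims=True)
center = np.array([alpha / (2.0 * c), 0.0])   # circle of diameter [0, alpha/c]
radius = alpha / (2.0 * c)
circle_residual = float(np.abs(np.linalg.norm(image - center, axis=1) - radius).max())
print(residual, circle_residual)        # both vanish up to rounding
```

A circle centered on the boundary line {x² = 0} automatically meets that line perpendicularly, which is the geometric statement in the text.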
A map Φ: X → Y, for manifolds X and Y with Riemannian metrics g and h respectively, is a local isometry if every point in X has a neighborhood O on which Φ: O → Φ(O) ⊂ Y is an isometry.

Proposition 55. If Φ: X → Y is a local isometry of Riemannian manifolds, then for every geodesic ψ: (−ε, ε) → X, Φ ∘ ψ is a geodesic on Y. Any geodesic on Φ(X) ⊂ Y is of this form.

Proof. In local coordinates on X and Y, we can write the isometry as y = y(x). Note this is the same form as a coordinate change, and the condition that the map is an isometry is simply that the metric pulls back as a (0, 2) tensor when changing coordinates. Therefore, the proof boils down to the following fact: for a local isometry, and for any C² path γ, the quantity

w^k = γ̈^k + Γ^k_{ij} γ̇^i γ̇^j

transforms like a tangent vector (i.e. a (1, 0) tensor) under changes of coordinates. Therefore,

w^k ∂/∂x^k = w^k (∂y^I/∂x^k) ∂/∂y^I,

and w^k(x) = 0 for k = 1, ..., n is equivalent to w^I(y) = 0 for I = 1, ..., n. This is because the matrix (∂y^I/∂x^k) is nonsingular for y = y(x) a diffeomorphism.

In order to compute how w^k transforms, we use the following index convention. Indices i, j, k, ... are with respect to the x variables, while indices I, J, K, ... are with respect to the y variables. For example, g_{ij} is the metric in the x coordinates, while g_{IJ} is the metric in the y coordinates. First of all, note

g_{IJ} = g_{ij} (∂x^i/∂y^I)(∂x^j/∂y^J),   g^{IJ} = g^{ij} (∂y^I/∂x^i)(∂y^J/∂x^j).

Compute

g_{IJ,K} = ∂g_{IJ}/∂y^K = ∂/∂y^K [ g_{ij} (∂x^i/∂y^I)(∂x^j/∂y^J) ]
        = g_{ij,k} (∂x^k/∂y^K)(∂x^i/∂y^I)(∂x^j/∂y^J) + g_{ij} (∂²x^i/∂y^I∂y^K)(∂x^j/∂y^J) + g_{ij} (∂x^i/∂y^I)(∂²x^j/∂y^J∂y^K).

Then compute (the mixed second-derivative terms cancel in pairs, using g_{ij} = g_{ji})

g_{KJ,I} + g_{IK,J} − g_{IJ,K}
  = g_{ij,k} (∂x^k/∂y^I)(∂x^i/∂y^K)(∂x^j/∂y^J) + g_{ij,k} (∂x^k/∂y^J)(∂x^i/∂y^I)(∂x^j/∂y^K) − g_{ij,k} (∂x^k/∂y^K)(∂x^i/∂y^I)(∂x^j/∂y^J)
    + 2 g_{ij} (∂²x^i/∂y^I∂y^J)(∂x^j/∂y^K).

Then the Christoffel symbols

Γ^L_{IJ} = ½ g^{KL} (g_{KJ,I} + g_{IK,J} − g_{IJ,K})
  = ½ g^{mℓ} (∂y^K/∂x^m)(∂y^L/∂x^ℓ) [ g_{ij,k} (∂x^k/∂y^I)(∂x^i/∂y^K)(∂x^j/∂y^J) + g_{ij,k} (∂x^k/∂y^J)(∂x^i/∂y^I)(∂x^j/∂y^K) − g_{ij,k} (∂x^k/∂y^K)(∂x^i/∂y^I)(∂x^j/∂y^J) + 2 g_{ij} (∂²x^i/∂y^I∂y^J)(∂x^j/∂y^K) ]
  = ½ g^{mℓ} (∂y^L/∂x^ℓ) [ g_{mj,i} (∂x^i/∂y^I)(∂x^j/∂y^J) + g_{im,j} (∂x^i/∂y^I)(∂x^j/∂y^J) − g_{ij,m} (∂x^i/∂y^I)(∂x^j/∂y^J) + 2 g_{im} (∂²x^i/∂y^I∂y^J) ]
  = Γ^ℓ_{ij} (∂y^L/∂x^ℓ)(∂x^i/∂y^I)(∂x^j/∂y^J) + (∂y^L/∂x^ℓ)(∂²x^ℓ/∂y^I∂y^J).

Note that the second term in the last formula shows that the Christoffel symbols do not transform as a tensor. In fact, this is fortunate, as the extra non-tensorial term will cancel out a similar term coming from the second derivative γ̈^k. Note that

γ̇^I = (∂y^I/∂x^i) γ̇^i,
γ̈^L = d/dt [ (∂y^L/∂x^ℓ)(γ) γ̇^ℓ ] = (∂y^L/∂x^ℓ) γ̈^ℓ + (∂²y^L/∂x^ℓ∂x^j) γ̇^j γ̇^ℓ.

Compute

Γ^L_{IJ} γ̇^I γ̇^J = [ Γ^ℓ_{ij} (∂y^L/∂x^ℓ)(∂x^i/∂y^I)(∂x^j/∂y^J) + (∂y^L/∂x^ℓ)(∂²x^ℓ/∂y^I∂y^J) ] (∂y^I/∂x^m)(∂y^J/∂x^p) γ̇^m γ̇^p
               = Γ^ℓ_{ij} γ̇^i γ̇^j (∂y^L/∂x^ℓ) + (∂y^L/∂x^k)(∂²x^k/∂y^I∂y^J)(∂y^I/∂x^j)(∂y^J/∂x^ℓ) γ̇^j γ̇^ℓ.

Therefore, γ̈^L + Γ^L_{IJ} γ̇^I γ̇^J will transform like a tensor if we can show that the non-tensorial terms cancel: we need to show

∂²y^L/∂x^ℓ∂x^j + (∂y^L/∂x^k)(∂²x^k/∂y^I∂y^J)(∂y^I/∂x^j)(∂y^J/∂x^ℓ) = 0.    (31)

This equation follows from the formula for the first derivative of an inverse matrix. If Ȧ represents the first derivative of a matrix A (with respect to any parameter or variable), then (A^{−1})˙ = −A^{−1} Ȧ A^{−1}. (Proof: Differentiate the equation A A^{−1} = I to find Ȧ A^{−1} + A (A^{−1})˙ = 0.) Then since (∂y^L/∂x^ℓ) is the inverse matrix of (∂x^ℓ/∂y^L),

∂²y^L/∂x^j∂x^ℓ = ∂/∂x^j (∂y^L/∂x^ℓ)
             = −(∂y^L/∂x^k) [ ∂/∂x^j (∂x^k/∂y^J) ] (∂y^J/∂x^ℓ)
             = −(∂y^L/∂x^k) (∂²x^k/∂y^I∂y^J) (∂y^I/∂x^j) (∂y^J/∂x^ℓ).

Upon plugging in, this proves formula (31) and the proposition.

Remark. There is also a more geometric proof of the previous proposition. Recall that we derived the geodesic equation as the Euler-Lagrange equation of the energy functional. So any path which minimizes the energy satisfies the geodesic equation. It is easy to see that the energy of a path is invariant under an isometry; therefore, the notion of energy-minimizing path is invariant under isometries. The problem is that there are geodesics which do not minimize the energy. (They may be saddle points of the energy functional.) This can be surmounted by restricting to small domains by using the following fact from Riemannian geometry: Every point in a Riemannian manifold has a neighborhood O so that all geodesic paths in O are energy-minimizing for endpoints in O. (In Riemannian geometry books, this fact is usually stated in terms of the length functional instead; to translate to the present situation, recall that energy-minimizing paths are length-minimizing paths parametrized with constant speed.)

Homework Problem 43. Given a smooth function f on a Riemannian manifold, the Hessian of f is defined locally by the formula

H(f)_{ij} = ∂²f/∂x^i∂x^j − Γ^k_{ij} ∂f/∂x^k.

Show that the Hessian of f is a symmetric (0, 2) tensor.

Homework Problem 44. Compute all the geodesics on S². Hint: Use the expression for the metric in local coordinates (y^1, y^2) from Example 13. Compute the Christoffel symbols. Analyze the case when y^2 = 0 and only y^1 varies. Solve the resulting second-order ODE for γ^1 = y^1. Then move these geodesics around via the isometry group of S². (The isometry group of S² is given by the orthogonal group of 3 × 3 matrices O(3) = {A : AA^⊤ = I}.
Show that each such linear action is an isometry of R³ which takes the unit sphere S² to itself. For every line L through the origin in R³, show that rotating by an angle θ around the line L is a linear map in O(3). Show that every initial condition (p, v) ∈ TS² of the geodesic equation on S² can be realized by the examples you computed above, when acted on by such a rotation in O(3).)

4.3 The direct method: An example

We have computed the Euler-Lagrange equations of the energy functional. Now we introduce an example of the direct method in the calculus of variations. The direct method is this: Given a functional E: C → R, if there is a lower bound I = inf_{γ∈C} E(γ) > −∞, then there is a sequence of paths γ_i so that E(γ_i) → I. The direct method is to show that there is a subsequence of {γ_i} which converges to some γ, and to show that the limit γ ∈ C and that E(γ) = I. Thus we have constructed a minimizer γ of the functional E over the class C.

There are subtle points to deal with along the way. Typically, the class C is a closed subset of a Banach space, and in passing to the limit of a subsequence, the limit γ we construct may be in a weaker Banach space (for example, a sequence in C¹ may produce a limit only in C⁰, which will be problematic if the functional involves any derivatives). A related issue is that in passing to the limit γ_{i_j} → γ, we may not have E(γ_{i_j}) → E(γ). In particular, below we will have to deal with the situation in which we only know lim_{j→∞} E(γ_{i_j}) ≥ E(γ), so that the functional is only lower semi-continuous under the limit. Thus we will typically need to spend time improving the regularity of the limit γ and showing some semi-continuity of the functional under the limiting subsequence.

The direct method of the calculus of variations is very useful in solving elliptic PDEs. The problem we approach involves geodesics, and thus the solution we produce will be a solution to an ODE.
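The spirit of the direct method can be illustrated by a toy discrete minimization (a loose analogy, not the construction carried out below in the text): on the flat torus R²/Z², we represent a loop of winding type (1, 0) by a lift γ with γ(1) = γ(0) + (1, 0), discretize the energy, and decrease it by gradient descent. The minimizer should be the straight constant-speed geodesic, with energy 1. Pinning the endpoints of the lift is our simplification; it fixes a basepoint rather than working with genuinely free loops.

```python
import numpy as np

m = 100
dt = 1.0 / m
ts = np.linspace(0.0, 1.0, m + 1)
# initial wiggly loop lift in the class (1, 0): endpoints (0,0) and (1,0)
gamma = np.stack([ts + 0.2 * np.sin(2 * np.pi * ts),
                  0.3 * np.sin(4 * np.pi * ts)], axis=1)

def energy(g):
    # discrete energy: sum |g_{k+1} - g_k|^2 / dt
    return float(np.sum(np.diff(g, axis=0)**2) / dt)

for _ in range(20000):
    # gradient descent on the discrete energy = heat flow on the interior points
    lap = gamma[2:] - 2.0 * gamma[1:-1] + gamma[:-2]
    gamma[1:-1] += 0.4 * lap          # endpoints stay fixed, pinning the class

E = energy(gamma)
print(E)   # close to 1.0, the energy of the straight geodesic
```

The minimizing sequence here is the sequence of descent iterates; the analytic work in the following sections replaces this finite-dimensional picture with compactness and lower semi-continuity arguments in function spaces.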
This will allow us to proceed with much of the general picture of the calculus of variations while avoiding some of the more technical points. In particular, we will learn about distributions, weak derivatives, Hilbert spaces, and compact maps between Banach spaces in solving our problem.

Given a smooth manifold X, a loop is a continuous map from the circle S¹ to X. Each such loop is equivalent to a continuous map γ: R → X which is periodic in the sense that γ(t + 1) = γ(t) for all t ∈ R. We will abuse notation by using the same γ for γ: S¹ → X and the periodic γ: R → X. (This is because S¹ is naturally the quotient R/Z, where Z acts on R by adding integers to real numbers.) Two loops γ_0, γ_1: S¹ → X are freely homotopic if there is a continuous homotopy G: [0, 1] × S¹ → X, G(0, t) = γ_0(t), G(1, t) = γ_1(t). The condition of being freely homotopic is an equivalence relation, and thus each loop on a manifold X is a member of a free homotopy class. Here is our problem:

Problem: Find a curve of least length in a free homotopy class of loops on a compact Riemannian manifold.

The problem may have no solution on a noncompact Riemannian manifold: there may be loops of arbitrarily small length in a given nontrivial free homotopy class, corresponding to loops slipping off a narrowing end of the manifold.

Homotopy classes are objects defined by continuity, and the following result should come as no surprise.

Proposition 56. For a smooth compact manifold X ⊂ R^N, there is an ε > 0 so that if two loops γ_0, γ_1: S¹ → X ⊂ R^N satisfy ‖γ_0 − γ_1‖_{C⁰(S¹,R^N)} < ε, then γ_0 and γ_1 are homotopic as loops in X.

Proof. We apply the ε-Neighborhood Theorem (Theorem 19): For ε > 0, let X^ε be the open subset of R^N consisting of all points of distance less than ε from X. There is an ε > 0 small enough so that every point in X^ε has a unique closest point in X. Then the map π: X^ε → X which sends a point in X^ε to its closest point in X is a smooth map of X^ε to X, and it fixes each point in X ⊂ X^ε.
Let γ0 and γ1 be loops on X satisfying kγ0 − γ1 kC0(S1,RN) < ε. Then consider the homotopy in RN

G̃(s, t) = (1 − s)γ0(t) + sγ1(t) ∈ RN.

For s, t ∈ [0, 1], the distance in RN satisfies

|G̃(s, t) − γ0(t)| = s|γ0(t) − γ1(t)| < 1 · ε = ε.

So G̃(s, t) ∈ Xε for all s, t ∈ [0, 1], and we may define a homotopy in X by G(s, t) = π(G̃(s, t)).

Remark. The homotopy G(s, t) constructed is a smooth homotopy if γ0 and γ1 are smooth. Thus the same theorem works with smooth homotopy classes (as considered in Guillemin and Pollack).

Corollary 57. If γi are a sequence of loops in a free homotopy class in X ⊂ RN, and

limi→∞ kγi − γkC0(S1,RN) = 0,

then the loop γ is in the same free homotopy class.

Proof. For the ε > 0 of Proposition 56 above, there is a γi so that kγi − γkC0(S1,RN) < ε. Apply Proposition 56 to show γ and γi are in the same free homotopy class.

The ε-Neighborhood Theorem, together with the mollifier technique of approximation, allows us to prove an important foundational result in topology:

Theorem 20. Let f : Rn → Y be uniformly continuous, where Y ⊂ RN is a compact submanifold without boundary. Then f is homotopic to a smooth map from Rn → Y.

Proof. Since f is uniformly continuous, for all ε > 0, there is a δ > 0 so that if |x − x′| < δ, then |f(x) − f(x′)| < ε. The ε-Neighborhood Theorem shows that there is an ε > 0 so that the map π : Yε → Y is well-defined and smooth. Let δ be the corresponding δ from the uniform continuity of f. Let ρ be a smooth nonnegative bump function with support in the unit ball B1(0) in Rn so that ∫Rn ρ dxn = 1. Then for α > 0, define ρα(x) = α−n ρ(x/α). Note supp ρα = Bα(0). Define

f α(x) = ∫Rn f(y)ρα(x − y) dyn = ∫{y:|x−y|≤α} f(y)ρα(x − y) dyn.

(Note each f α is RN-valued.)
If α < δ, then |f (y) − f (x)| < for y in the domain of integration, and so Z α f (x) = f (y)ρα (x − y) dyn {y:|x−y|≤α} Z = [f (y) − f (x)]ρα (x − y) dyn {y:|x−y|≤α} Z + f (x)ρα (x − y) dyn {y:|x−y|≤α} Z = [f (y) − f (x)]ρα (x − y) dyn + f (x) {y:|x−y|≤α} since Z {y:|x−y|≤α} ρα (x − y) dyn = Z ρα (x − y) dyn = Rn Z ρα (z) dzn = 1 Rn for the substitution z = x − y. So Z α |f (x) − f (x)| = [f (y) − f (x)]ρα (x − y) dyn {y:|x−y|≤α} Z ≤ |f (y) − f (x)|ρα (x − y) dyn {y:|x−y|≤α} Z < ρα (x − y) dyn = . {y:|x−y|≤α} 109 (32) Therefore if α ∈ (0, δ), then f α (x) ∈ Y . Then we check that f˜α (x) = π(f α (x)) is the desired homotopy. In particular, as α → 0, f˜α (x) → f (x) uniformly by (32) (view as varying to zero instead of fixed for this interpretation). Since π and f α are smooth, then f˜α is smooth for small α > 0. In particular, we have shown that α f˜ (x) for α > 0 small F (α, x) = f (x) for α = 0 is the desired homotopy. Theorem 21. Let f : X → Y be a continuous map between smooth manifolds. Then f is homotopic to a smooth map from X → Y . Sketch of proof. We may assume X ⊂ RM by Whitney’s Embedding Theorem. Then there is a ν > 0 so that πM : X ν → X is well-defined and smooth. Define g : RM → RN by g(p) = f (πM (p)) for p ∈ X ν and g(p) = 0 for p 6∈ X ν . Note g(p) is uniformly continuous on a neighborhood of X. Apply the mollifier argument as above to g and show that the homotopy constructed in the proof of Theorem 20, when restricted to X ⊂ RM , has the desired properties. The discussion above about energy and length still holds. Assuming the minimizer is smooth enough, then a constant-speed length-minimizing loop is the same as an energy-minimizing loop. Thus we may as well consider energy-minimizing loops, and we have the equivalent problem. Problem: Find a curve of least energy in a free homotopy class of loops on a compact Riemannian manifold. 
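The mollifier construction in the proof of Theorem 20 can be tried numerically. Below is a one-dimensional sketch (our own setup; the bump ρ is one standard choice) showing that the mollification f α of a continuous function stays uniformly close to f, as in estimate (32).

```python
import math

# Sketch of the mollification f^alpha from the proof of Theorem 20, in
# one dimension.  bump is one standard smooth bump supported in (-1, 1);
# the normalizing constant C is computed numerically so the integral is 1.

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

N = 2000
zs = [-1 + 2 * i / N for i in range(N + 1)]
C = 1.0 / (sum(bump(z) for z in zs) * (2 / N))

def f(x):
    return abs(x)        # continuous, not differentiable at 0, 1-Lipschitz

def f_alpha(x, a):
    # f^alpha(x) = integral of f(y) rho_a(x - y) dy, after y = x - a*z
    return sum(f(x - a * z) * C * bump(z) for z in zs) * (2 / N)

# estimate (32): |f^alpha(x) - f(x)| is at most the oscillation of f on
# the ball of radius alpha, which here is at most alpha (f is 1-Lipschitz)
alpha = 0.05
err = max(abs(f_alpha(x, alpha) - f(x)) for x in (-0.5, -0.1, 0.0, 0.2, 1.0))
```

The error is largest near the corner at 0, where f fails to be smooth, but it never exceeds α, exactly as the uniform-continuity estimate predicts.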
So far in our discussion, the formulation of length and energy depend on the loop γ being C 1 (so that the derivative γ̇ is C 0 and thus can be integrated). If we look more closely, the energy is defined as the square of the L2 norm of γ̇ Z 1 |γ̇|2g dt. E(γ) = 0 Therefore, we really do not need γ̇ to be continuous, but only L2 . In terms of γ itself, we need to develop a theory of how to take a derivative which ends up not being continuous, but only L2 . For this purpose, we define derivatives in the sense of distributions, or weak derivatives. 110 4.4 Distributions On Rn , we consider each smooth function φ with compact support to be a test function. For any C 1 function f on Rn and test function φ, we have the following formula by integrating by parts: Z Z f,i φ dxn = − f φ,i dxn . (33) Rn Rn For two locally L1 functions f and h on Rn , we say f,i = h in the sense of distributions if for all test functions φ, Z Z hφ dxn = − f φ,i dxn . Rn Rn Let D(Rn ) be the vector space of all smooth functions with compact support in Rn . For our purposes, we will define a distribution on Rn to be a linear map from D(Rn ) → R. We often allow C-valued test functions and consider complex linear maps to C; complex-valued functions are useful when doing Fourier analysis. (The usual definition of a distribution is more involved: one must define a topology on D(Rn ) and then consider distributions to be only continuous linear maps to C. For our purposes, the simpler definition suffices. See Section 4.9 below for a more standard treatment of distributions on the circle S1 .) Recall a measurable function f is locally L1 if over every R compact subset K of the domain of f , K |f | < ∞. Any locally L1 function f on Rn gives a distribution by sending Z f : φ 7→ f (φ) = f φ dxn . Rn Notice that there is a slight abuse of notation: f (φ) for φ a test function is not to be confused with f (x) for x ∈ Rn . 
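The integration-by-parts identity (33), which everything below rests on, can be sanity-checked numerically. The sketch below (our own setup; the test function is a standard bump) pairs the derivative of f = sin against a test function φ and compares with minus the pairing of f against φ̇.

```python
import math

# Numerical check of identity (33): for f C^1 and a test function phi,
# the pairing of f' with phi equals minus the pairing of f with phi'.

def phi(x):    # smooth bump supported in (-1, 1)
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def dphi(x):   # phi'(x), by the chain rule
    return phi(x) * (-2.0 * x / (1.0 - x * x) ** 2) if abs(x) < 1 else 0.0

f, df = math.sin, math.cos
N = 4000
h = 2.0 / N
xs = [-1 + h * i for i in range(N + 1)]
lhs = sum(df(x) * phi(x) for x in xs) * h     # pairing of f' with phi
rhs = -sum(f(x) * dphi(x) for x in xs) * h    # minus pairing of f with phi'
```

The two sums agree to high precision; the same computation with a merely locally L1 function f is what *defines* its derivative as a distribution.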
Two locally L1 functions f1 , f2 are said to be equal in the sense of distributions if for every test function φ, Z Z Z f1 φ dxn = f2 φ dxn ⇐⇒ (f1 − f2 )φ dxn = 0. Rn Rn Rn Remark. On RN , note that any locally Lp function for p ≥ 1 is also locally L1 . This is because for K ⊂⊂ Rn , p1 + 1q = 1, and f locally Lp , Hölder’s inequality states Z p1 1q Z Z p |f | dxn ≤ 1 dxn |f | dxn < ∞. K K K 111 Example 18. Any locally finite Borel measure dµ on Rn defines a distribution by sending Z φ 7→ φ dµ Rn for any test function φ. An important example of this is the point mass, at the origin. The δ-function subset Ω ⊂ Rn , 1 if δ(Ω) = 0 if inaptly named δ-function, or unit is a measure on Rn so that for any 0∈Ω 0∈ / Ω. So the distribution defined by this measure is δ : φ 7→ φ(0), which is just evaluation of φ at the origin. The following problem shows there is no locally L1 function which is equal to the δ-function. Homework Problem 45. Show that there is no L1 function f on Rn so that Z f φ dxn = φ(0) for all φ ∈ D(Rn ). Rn n Hint: Consider a smooth nonnegative function R ρ : R → R with support in B1 (0) the unit ball centered at 0 and so that Rn ρ dxn = 1. Use this ρ to define ρ (x) = −n ρ(x/). If there were such an L1 function f , recall that if Z f (y)ρ (x − y) dyn , f (x) = Rn then f → f in L1 as → 0. (a) Show that for all x 6= 0 that f (x) = 0 for small enough. (Follow the proof of Proposition 58.) (b) Suppose a family of continuous functions f → f in L1 (Rn ) as → 0+ , and let O ⊂ Rn be a measurable subset on which f = 0 identically on O for all sufficiently small. Show that f = 0 almost everywhere on O. (Split up the relevant integrals on Rn into integrals on O and Rn \ O.) (c) Show our f = 0 almost everywhere on Rn . 112 (d) Find a contradiction. We have just seen that distributions are more general than functions. In particular, it is possible to differentiate any distribution by mimicking formula (33). 
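The family ρε appearing in the hint to Homework Problem 45 converges to the δ-function in the sense of distributions: its pairing with any test function tends to φ(0). Here is a numerical sketch (our own choice of bump and of test function):

```python
import math

# Numerical sketch: rho_eps(x) = eps^{-1} rho(x/eps) pairs with a test
# function phi to give values tending to phi(0), i.e. rho_eps converges
# to the delta-function as a distribution as eps -> 0.

def bump(x):
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

N = 4000
zs = [-1 + 2 * i / N for i in range(N + 1)]
C = 1.0 / (sum(bump(z) for z in zs) * (2 / N))   # normalize: integral = 1

def phi(x):
    return bump(x / 2.0)   # smooth test function supported in (-2, 2)

def pair(eps):
    # integral of rho_eps(x) phi(x) dx, after substituting x = eps * z
    return sum(C * bump(z) * phi(eps * z) for z in zs) * (2 / N)

vals = [pair(e) for e in (1.0, 0.1, 0.01)]
target = phi(0.0)          # = e^{-1}; the pairings converge to this
```

Problem 45 shows that no single L1 function can reproduce this limit behavior, which is precisely why distributions are needed.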
A distributional derivative of a function may no longer be a function, but it will be well-defined as a distribution. Given a distribution f defined by a map f : φ 7→ f (φ) ∈ R, the partial derivative f,i in the sense of distributions is defined to be the distribution f,i : φ 7→ −f (φ,i ). It is this innovation which allows us to define the derivatives of L2 functions. Remark. Note that the equation (33) motivating the distributional derivative is essentially the same as the integration by parts used to calculate the EulerLagrange equations for γ +h. Thus if h is smooth with compact support, we can still integrate by parts even if γ is no longer regular enough for ordinary differentiation; we simply consider the derivatives to be taken in the sense of distributions. Homework Problem 46. Consider the Heaviside function 1 if x ≥ 0 h(x) = 0 if x < 0. Show that the derivative h0 (taken in the sense of distributions) is the δ function on R. Homework Problem 47. Consider for any test function φ ∈ D(R), Z − Z ∞ 1 1 1 PV (φ) = lim+ φ(x) dx + φ(x) dx . →0 x x −∞ x Part (a) shows that P V ( x1 ) is a distribution. It is called the principal value of x1 . (a) Show P V ( x1 )(φ) converges for all smooth test functions φ. (Hint: The potential problem is clearly at x = 0. Use Taylor’s Theorem to write φ = φ(0) + O(x), where O(x) represents a term so that O(x)/x converges to a real limit as x → 0.) 113 (b) Show that the first derivative in the sense of distributions of P V ( x1 ) is given in terms of φ ∈ D(R) as Z − Z ∞ 1 1 2 lim − 2 φ(x) dx + − 2 φ(x) dx + φ(0) . →0+ x x −∞ One more thing is needed to complete the picture of distributions as generalizations of functions. Recall that every locally Lp function for p ≥ 1 defines a distribution. The following proposition shows this map is injective. Proposition 58. If two locally L1 functions f1 and f2 on Rn define the same distribution, then f1 = f2 almost everywhere. Proof. 
We first consider the case when f1 and f2 are both globally L1 on Rn . Then recall that we can use a mollifier to approximate each in L1 by smooth functions. In particular, if ρ is a smooth nonnegative function with compact R support so that Rn ρ dxn = 1, then define Z 1 x , fi (x) = ρ (x − y)fi (y) dyn , i = 1, 2. ρ (x) = n ρ Rn Then each fi is a smooth L1 function on Rn and fi → fi in L1 as → 0. Now for each fixed x ∈ Rn , ρ (x − y) is a smooth test function with compact support in y, and fi (x) is simply the evaluation of this test function by the distribution fi . Since f1 = f2 in the sense of distributions, then f1 (x) = f2 (x) for all x ∈ Rn . So then kf1 − f2 kL1 = lim kf1 − f2 kL1 = lim 0 = 0. →0 →0 Then f1 = f2 in L1 , which is equivalent to f1 = f2 almost everywhere. If f1 and f2 are only locally L1 , consider a smooth function βR with compact support which is identically equal to 1 on BR = {|x| ≤ R}. It is easy to check that the condition f1 = f2 in the sense of distributions implies βR f1 = βR f2 in the sense of distributions. Then since each fi is locally L1 , each βR fi is globally in L1 . We apply the argument of the previous paragraph; so βR f1 = βR f2 almost everywhere on Rn . This implies that f1 = f2 almost everywhere on the ball BR . Now let R → ∞ to conclude that f1 = f2 almost everywhere on Rn . 114 So far, we have discussed distributions on Rn . On the circle S1 , the definitions are similar, the main difference being that since S1 is compact, our test functions are simply all smooth functions on S1 . In particular, we can think of test functions on S1 as smooth periodic functions on R with period 1. In this way, an L1 function f on S1 acts on test functions by Z 1 f:φ→ f φ dt. 0 One thing to check is that integration by parts still works. 
If f is C 1 on S1 and φ is smooth on S1 , then Z Z 1 ˙ f φ dt = f˙φ dt 1 S 0 1 Z 1 = − f φ̇ dt + (f φ) 0 Z0 = − f φ̇ dt + f (1)φ(1) − f (0)φ(0) 1 ZS = − f φ̇ dt S1 because f (0) = f (1) and φ(0) = φ(1) since f and φ are periodic. So we have the same basic formula as in (33), and we may define distributions and distributional derivatives in the same manner as above. Now we return to our problem. We want to consider all loops γ : S1 → X ⊂ RN so that Z 1 E(γ) = |γ̇|2g dt = kγ̇k2L2 (S1 ,RN ) < ∞. 0 Therefore, we consider the Sobolev space L21 (S1 , RN ) = {γ : S1 → RN : kγk2L2 = kγk2L2 + kγ̇k2L2 < ∞}, 1 where the derivative γ̇ is taken in the sense of distributions. Note that γ ∈ L2 (S1 , RN ) implies that γ̇, when defined in the sense of distributions, may be represented as a function (and an L2 function at that). We may consider each component γ 1 , . . . , γ N separately, and it should be clear that γi → γ in L21 (S1 , RN ) if and only if each γia → γ a in L21 (S1 , R) for 115 each a = 1, . . . , N . Thus we may work with each component of γ separately in RN . Below we will see that L21 is a Hilbert space, but for now we are content to show that every function in L21 (S1 ) is continuous. Recall that elements of L21 (S1 ) are only equivalence classes of functions, two functions being equivalent if they agree almost everywhere. Proposition 59. Every element of L21 (R) contains a continuous representative. Remark. This proposition is an important example of the Sobolev embedding theorem, which gives a means to embed Sobolev spaces Lpk (Rn ) into appropriate C ` spaces ` = `(p, k, n). In particular, the present result depends strongly on the fact that the dimension of the domain R of the functions is one. (There are elements of L21 (R2 ) which do not have continuous representatives.) R Proof. Let f ∈ L21 (R). So R |f˙|2 dt = C 2 < ∞. Then compute for t2 ≥ t1 Z t2 ˙ |f (t2 ) − f (t1 )| = f (t) dt t1 Z t2 12 Z t2 12 ≤ |f˙(t)|2 dt dt t1 t1 1 2 ≤ C(t2 − t1 ) . 
So this formula shows f is continuous, as long as we can justify using the Fundamental Theorem of Calculus Z t2 f (t2 ) − f (t1 ) = f˙(t) dt. t1 Rt We achieve this by defining g(t) = 0 f˙(s) ds. The previous argument implies that g is continuous. Now we argue that there is a constant K so that f − g = K almost everywhere. This will show there is an continuous representative g + K in the equivalence class of f . First we show that ġ = f˙ in the sense of distributions. Consider a test 116 function φ. Then ġ(φ) = − Z ∞ g(t)φ̇(t) dt Z t ˙ = − f (s) ds φ̇(t) dt −∞ 0 Z Z ˙ = − f (s)φ̇(t) ds dt + f˙(s)φ̇(t) ds dt −∞ Z ∞ R1 R2 by Fubini’s Theorem, for the regions in the plane R1 = {(s, t) : s ≥ 0, t ≥ s}, R2 = −R1 . Then again by Fubini, and since φ has compact support, Z ∞ Z ∞ Z 0 Z s ġ(φ) = − φ̇(t) dt f˙(s) ds + φ̇(t) dt f˙(s) ds 0 s −∞ −∞ Z ∞ Z 0 ˙ = − (−φ(s))f (s) ds + φ(s)f˙(s) ds −∞ Z ∞0 = φ(s)f˙(s) ds −∞ = f˙(φ). Therefore, ġ = f˙ in the sense of distributions. The following proposition, applied to f −g, shows that there is a constant K so that f = g + K in the sense of distributions. Then Proposition 58 above shows f = g + K almost everywhere, and thus there is a continuous representative in the equivalence class of f . Proposition 60. If a distribution h on R satisfies ḣ = 0 in the sense of distributions, then there is a constant K so that h = K as distributions. R Proof. Let φ be a test function Rwith integral R φ dt = 1. Let K = h(φ). Then for a test function ψ with R ψ dt = L, compute h(ψ) = h(ψ − Lφ) + Lh(φ) = h(ψ − Lφ) + LK. But now Z ∞ (ψ − Lφ) dt = L − L · 1 = 0, −∞ 117 and thus the function χ(t) = Z t [ψ(s) − Lφ(s)]ds (34) −∞ is a smooth function with compact support—Proof: Let supp(ψ − Lφ) ⊂ [T, T0 ]. It is clear that χ(t) = 0 for t < T . For t > T 0 , note that χ0 (t) = ψ(t) − Lφ(t) = 0 and so χ is constant on (T 0 , ∞). Then (34) shows that χ(t) → 0 as t → ∞, and so χ = 0 on (T 0 , ∞). 
Then since χ̇ = ψ − Lφ,

h(ψ) = LK + h(ψ − Lφ) = LK + h(χ̇) = LK − ḣ(χ) = LK

since ḣ = 0 in the sense of distributions. But then

h(ψ) = LK = K ∫R ψ dt = ∫R Kψ dt,

and h = K as distributions.

Homework Problem 48. Prove Propositions 59 and 60 above for distributions on S1 instead of on R. Here are the key steps:

(a) Let f : S1 → R be an L2 function, and assume that the distributional derivative f˙ is L2 as well. Represent f and f˙ as periodic functions from R → R. For any t ∈ R, define g(t) = ∫0t f˙(s) ds. Show that g is periodic and continuous (and so defines a continuous function on S1). Note that the constant function 1 is a test function on S1.

(b) Show that f˙ = ġ in the sense of distributions. In other words, for every smooth periodic test function φ ∈ D(S1), show that ∫01 f˙φ dt = − ∫01 g φ̇ dt.

(c) If h is a distribution on S1 which satisfies ḣ = 0 in the sense of distributions, show there is a constant K so that h = K as distributions. In other words, show that for every periodic smooth ψ : R → R, h(ψ) = ∫01 Kψ dt.

Now since any L21 map from S1 → X ⊂ RN is continuous, each one is in a free homotopy class of loops on X. With that in mind, we formulate our final version of the problem: For X ⊂ RN a smooth submanifold with Riemannian metric pulled back from the Euclidean metric on RN, define

L21(S1, X) = {γ ∈ L21(S1, RN) : γ(S1) ⊂ X}.

Here we assume that γ is continuous, as we may by Proposition 59 above.

Problem: Let X ⊂ RN be a smooth compact manifold equipped with the Riemannian metric pulled back from the Euclidean metric on RN. Let C be the class of loops γ : S1 → X in a free homotopy class on X and in L21(S1, X). Find a loop of least energy in C.

Proposition 61. Let γ ∈ L21(S1, X) be energy minimizing in a free homotopy class on X for X ⊂ RN a smooth manifold without boundary. Then γ solves (a version of) the geodesic equation

2(gik γ̇ i)˙ − gij,k γ̇ i γ̇ j = 0

for all k = 1, . . . , n, in the sense of distributions.

Proof.
First of all, note that we can choose γ to be continuous by Problem 48 above. Thus it makes sense that γ is in a free homotopy class. Since γ minimizes energy, then for each h smooth with compact support so that γ(supp h) ⊂⊂ a single coordinate chart in X, that d E(γ + h) = 0. d =0 119 Compute the first variation as in the derivation of the Euler-Lagrange equations in Subsection 4.2 above: Z Z Z k i j i j gij,k h γ̇ γ̇ dt + gij ḣ γ̇ dt + gij γ̇ i ḣj dt = 0. S1 S1 S1 Since the components of h are smooth with compact support, they act as test functions, and we may then integrate by parts in the second and third integrals, in the sense of distributions, to conclude that 0 = (gkj γ̇ j )˙ + (gik γ̇ i )˙ − gij,k γ̇ i γ̇ j = 2(gik γ̇ i )˙ − gij,k γ̇ i γ̇ j in the sense of distributions. Remark. In the previous proposition, we cannot immediately perform the usual rules of calculus, since the objects involved are only distributions. In particular, we show in the next homework problem that functions which are only continuous cannot be meaningfully multiplied by distributions which are not Borel measures. Homework Problem 49. Note that if λ : Rn → R is a smooth function, and f is a locally L1 function, then the product λf is also a locally L1 function. (a) If λ : Rn → R is a smooth function, and p is a distribution on Rn , then show that it is possible to define the product λp in such a way that if p is induced from a locally L1 function, then λp is induced from the usual product of two functions. (b) Let δ be the δ-function on R. Compute its first derivative δ̇ in the sense of distributions. (c) Show that if g : R → R is a continuous function which is not differentiable at 0, then the formula for the product developed in part (a) above does not give a reliable answer for the product g δ̇ of the continuous function g and the distribution δ̇. 4.5 Hilbert spaces Recall that a Hilbert space is a Banach space whose norm comes from a positive definite inner product. 
We now show that L21 (S1 , R) is a Hilbert 120 space. Recall that L21 (S1 , R) consists of all L2 functions on S1 whose derivative in the sense of distributions is also L2 . This suggests a natural inner product: Z Z hf, hiL21 = f h dt + f˙ḣ dt. S1 S1 Then plug in f = h to find Z Z 2 2 kf kL2 = |f | dt + |f˙|2 dt = hf, f iL21 , 1 S1 S1 and so the norm on L21 is induced by the inner product. Below in Corollary 67, we show that any positive definite inner product defines a norm. Remark. L21 (S1 , RN ) is also naturally a Hilbert space, with inner product given by Z Z hf, hiL2 = hf, hi dt + hf˙, ḣi dt, 1 S1 S1 N where h·, ·i is the inner product on R . It is also useful to define complex Hilbert spaces, in which the inner product h·, ·i is Hermitian and positive definite. A Hermitian inner product on a complex vector space V is a map from V × V → C which satisfies for λ ∈ C and f, g, h ∈ V , hλf + g, hi = λhf, hi + hg, hi, hf, λg + hi = λ̄hf, gi + hf, hi, hf, gi = hg, f i. These three conditions are respectively that the inner product is complex linear in the first slot, complex antilinear in the second slot, and skewsymmetric. The first two conditions together are called sesquilinear. Then L21 (S1 , C) is a complex Hilbert space with inner product Z Z hf, gi = f ḡ dt + f˙ġ dt. S1 S1 We can also define the Sobolev space L21 (Rn , R) by the inner product Z n Z X hf, gi = f g dxn + f,i g,i dxn , Rn i=1 Rn the derivatives taken in the sense of distributions. The elements of L21 (Rn , R) are then equivalence classes of functions in L2 so that all the first partials in the sense of distributions are also in L2 . 121 We will work with L21 (S1 , R) instead of L21 (S1 , RN ), since convergence in L21 (S1 , RN ) is equivalent to each component converging in L21 (S1 , R). The proofs that follow will work with minor modifications for the spaces L21 (S1 , RN ) and L21 (S1 , C). We focus on L21 (S1 , R), which we refer to simply as L21 . Proposition 62. 
L21 (S1 , R) is a Hilbert space. Proof. We’ve exhibited an inner product on L21 , and it is easy to check that it is positive definite (if we consider elements to be equivalence classes of functions, two functions being equivalent if they agree almost everywhere). Thus the remaining thing to check is that the metric L21 (S1 , R) is complete (and so it is a Banach space). First of all note that fn → f in L21 is equivalent to fn → f in L2 and ˙ fn → f˙ in L2 . Let fn be a Cauchy sequence in L21 . Then by the definition of the norm, it is clear that fn and f˙n are both Cauchy sequences in L2 . Then we have limits fn → f and f˙n → g in L2 . In order to show that fn → f in L21 , it suffices to show that f˙ = g in the sense of distributions. Let φ be a test function, and note that fn → f in L2 implies by Hölder’s inequality that Z |fn (φ) − f (φ)| = (fn − f )φ dt ≤ kfn − f kL2 kφkL2 → 0 S1 as n → ∞. We use this fact for both fn → f and f˙n → g to compute for a test function φ Z Z g(φ) = gφ dt = lim f˙n φ dt n→∞ S1 S1 Z = − lim fn φ̇ dt n→∞ S1 Z f φ̇ dt = − S1 = −f (φ̇) = f˙(φ). Therefore, g = f˙ in the sense of distributions. Remark. Essentially the same proof shows that L21 (Rn , Rm ) is a Hilbert space. 122 For a real Hilbert space H, an orthonormal basis is a collection of elements {eα }α∈A which are orthonormal in that heα , eβ i = δαβ and so that every element v ∈ H can be written as X v= v α eα α∈A for v α ∈ R. Here A is an index set, which may be finite, countably infinite, or uncountable (and of course the convergence of any infinite sum is controlled by the norm). A Hilbert space which has a countable (finite or infinite) orthonormal basis is called separable. The following is true: Proposition 63. Every Hilbert space has an orthonormal basis. In fact, every orthonormal set in a Hilbert space can be completed to an orthonormal basis. 
We omit the proof, which is similar to the proof of the corresponding fact for vector spaces (any linearly independent set can be completed to a basis). In particular, Zorn’s Lemma is needed in the case of non-separable Hilbert spaces. But see Problem 54 below for a proof of this Proposition for separable Hilbert spaces, and for a discussion of how this special case is adequate for the proofs of the results in this section. Theorem 22 (Pythagorean Theorem). If v, w ∈ H a Hilbert space, and hv, wi = 0, then kvk2 + kwk2 = kv + wk2 . Proof. Compute kv + wk2 = hv + w, v + wi = hv, vi + 2hv, wi + hw, wi = kvk2 + kwk2 . Lemma 64 (Bessel’s Inequality). If {e1 , . . . , en } is a finite orthonormal set in H, then for all y ∈ H, 2 kyk ≥ n X |hy, ei i|2 . i=1 123 Pn Proof. Check that for w = i=1 hy, ei iei , hy − w, wi = 0. Then apply the Pythagorian Theorem to y = (y − w) + w, and note that kwk2 = P n 2 i=1 |hy, ei i| . Corollary 65. If ei is a countable orthonormal set, then kyk2 ≥ ∞ X |hy, ei i|2 . i=1 Proof. Use Bessel’s Inequality and take limits of partial sums. Theorem 23. Given a Hilbert space H with an orthonormal basis {eα }α∈A , for every element v ∈ H, X v = hv, eα ieα , (35) α∈A kvk2H = hv, vi = X |hv, eα i|2 , (36) α∈A where the (possibly uncountable) sums are defined by using ProbP Homework α α 2 lem P 50 below. Moreover, if there are v ∈ R so that α∈A |v | < ∞, then v = α∈A v α eα converges to an element of H. Remark. For each v ∈ H, only a countable number of the coefficients v α = hv, eα i are nonzero. This is due to the following fact: Homework Problem 50. Let A be an uncountable set, and for each α ∈ A, let xα ≥ 0. P (a) If A0 ⊂ A is a finite set, let SA0 = α∈A0 xα . Show that if the set {SA0 : A0 ⊂ A is a finite set} is bounded, then xα = 0 for all but countably many α ∈ A. (b) Use part (a) to define X xα = sup{SA0 : A0 ⊂ A is a finite set} α∈A as an element of [0, ∞] for any xα ≥ 0. 
In particular, if the sum is finite, show that X X xα = xα , α∈A α∈à 124 where à = {α ∈ A : xα > 0} is countable. Show that if à is infinite, the right-hand sum is the usual sum of a convergent countably infinite series (for any bijection between à and the natural numbers). Hint for (a): Each xα > 0 satisfies xα ∈ [2n , 2n+1 ) for some n ∈ Z. Derive a contradiction if the number of positive xα is uncountable. Remark. Note that X xα = Z x dc A α∈A for dc the counting measure on A. If A0 ⊂ A, then the counting measure c(A0 ) = |A0 | the cardinality of A0 (and so c(A0 ) = +∞ when A0 is infinite). P Proof of Theorem 23. First assume that v α ∈ R and α∈A |v α |2 < ∞. Then Homework Problem 50 above shows that all but manyP of v α are P∞countably i zero, and so we may write v as a countable sum i=1 v ei . Let vn = ni=1 v i ei . Then for n > m 2 n n ∞ X X X 2 i i 2 kvn − vm k = v ei = |v | ≤ |v i |2 . i=m+1 i=m+1 i=m+1 Here, second equality is by the Pythagorean Since the series P∞ Theorem. P∞ the i 2 i 2 i=m+1 |v | must go to zero as i=1 |v | converges, the tail of the series m → ∞, and thus {vn } is a Cauchy sequence in H. Since H is complete, vn converges to the limit v ∈ H. Now let v ∈ H and v α = hv, eα i. Then Bessel’s Inequality shows that for all finite subsets A0 ⊂ A, that X |v α |2 ≤ kvk2 . α∈A0 So for the collection {|v α |2 }α∈A , the set S of finite partial sums is bounded. So Homework Problem 50 shows that all but countably many v α = 0. Denumerate the countable number of nonzero terms as v 1 , v 2 , . . . , and the corresponding elements of the basis as e1 , e2 , . . . . Porthonormal N i 2 Since the sequence i=1 |v | is bounded and increasing, P iti has a finite limit as N → ∞. We have shown above that the series ∞ i=1 v ei converges 0 to a limit v ∈ H. Compute * + n X hv − v 0 , ei i = lim v − v j ej , ei = v i − v j δji = 0. n→∞ j=1 125 And for any eα 6∈ {e1 , e2 , . . . }, compute * 0 hv − v , eα i = lim n→∞ v− n X j v ej , eα + = 0. 
j=1 So for all eα in the orthonormal basis, *∞ + X hv, eα i = hv 0 , eα i = v i ei , eα = v α . i=1 α Now P the αdefinition of orthonormal basis shows that there are ṽ ∈ R so that α∈A ṽ eα = v. By the same analysis P∞ ias above, all but countably many α v are zero, and we may write v = i=1 ṽ ei . Moreover, as in the previous paragraph, *∞ + X v α = hv, eα i = ṽ i ei , eα = ṽ α i=1 and so (35) is proved. PnTo prove (36), note that (35) shows that v = limn→∞ vn in H, for vn = i=1 hv, ei iei . Since the norm is continuous, then kvk2 = lim kvn k2 = lim n→∞ n→∞ n X |hv, ei i|2 = i=1 ∞ X i=1 |hv, ei i|2 = X |hv, eα i|2 . α∈A This concludes the proof of the theorem. P P∞ i i Corollary 66. If v = ∞ i=1 w ei for {ei } an orthonormal i=1 v ei , w = basis of a separable Hilbert space, then ∞ X hv, wi = v i wi . i=1 Proof. Compute kv + wk2 = kvk2 + 2hv, wi + kwk2 , hv, wi = 12 [kv + wk2 − kvk2 − kwk2 ] ∞ X [(v i + wi )2 − (v i )2 − (wi )2 ] = 12 i=1 = ∞ X v i wi . i=1 126 Remark. The formula for a complex Hilbert space is hv, wi = ∞ X v i wi . i=1 Remark. Homework Problem 50 shows that this result still holds for nonseparable Hilbert spaces, since the number of basis elements with nonzero coefficients for v and/or w is countable. Here is another basic result in Hilbert spaces: Homework Problem 51 (Cauchy-Schwartz Inequality). If v, w ∈ H a real Hilbert space, then |hv, wi| ≤ kvkkwk, and there is equality if and only if v and w are linearly dependent. Hint: Use calculus to compute the minimum value of ktv + wk2 as a function of t, and note the minimum value must be nonnegative. Remark. The Cauchy-Schwartz Inequality is also true for complex Hilbert spaces, but for the proof, note that the minimum value of kteiθ v + wk2 , for t ∈ R and θ so that eiθ hv, wi = |hv, wi|, is nonnegative. Corollary 67. Any positive definite inner product on a real vector space V produces a norm by the formula kvk2 = hv, vi. Proof. The main thing to check is the triangle inequality. 
Let v, w ∈ V and note that kv + wk ≤ kvk + kwk ⇐⇒ kv + wk2 ≤ kvk2 + 2kvkkwk + kwk2 ⇐⇒ kvk2 + 2hv, wi + kwk2 ≤ kvk2 + 2kvkkwk + kwk2 ⇐⇒ hv, wi ≤ kvkkwk. The main results we will use regarding Hilbert spaces involve another topology on the Hilbert space which is different from the topology defined by the metric. The usual metric convergence of sequences is called strong convergence. So a sequence vi → v in H strongly if kvi − vkH → 0, 127 and we write vi → v. On the other hand, a sequence vi ∈ H is weakly convergent to a limit v ∈ H if hvi , wi → hv, wi for all w ∈ H, and we write vi * v. If vi → v, then vi * v (Homework Problem 52 below), but the converse is not true in general, as the following example shows: Example 19. Let H be a Hilbert space with a countably infinite orthonormal basis e1 , e2 , . . . . Then ei * 0 in H, but {ei } does not converge strongly. P 2 Proof. Let w ∈ H. Then since kwk2 = ∞ i=1 |hw, ei i| < ∞, we must have each term |hw, ei i|2 → 0 as i → ∞. This shows ei → 0 weakly in H as i → ∞. To show {ei } does not converge strongly, note that √ kei − ej kH = 2 for i 6= j by the Pythagorean Theorem. Thus {ei } cannot be a Cauchy sequence in H, and thus cannot converge strongly. Homework Problem 52. Show that if vi → v converges strongly in a Hilbert space H, then vi * v weakly in H. Hint: Use Cauchy-Schwartz. Theorem 24. Let {vi } be a sequence in a Hilbert space H satisfying kvi k ≤ K for a uniform constant K. Then there is a weakly convergent subsequence to a limit v which satisfies kvk ≤ K. In other words, the closed ball of radius K is (sequentially) compact in the weak topology on H. Proof. Let {eα }α∈A be an orthonormal basis. Problem 50 shows that for each of {v1 , v2 , . . . }, only a countable subset Av1 , Av2 , · · · ⊂ A have nonzero coefficients in the orthonormal decomposition. Then the union ∞ [ Av i i=1 is also countable, and it represents all the basis elements with nonvanishing coefficients for all the vi . 
Denumerate these elements as e1 , e2 , . . . , and write vi = ∞ X j=1 128 vij ej . Since there is a constant K so that kvi k ≤ K, then Theorem 23 shows for each N N X |vij |2 ≤ K 2 . (37) j=1 Thus, since the interval [−K, K] ⊂ R is compact, there is a subsequence {1 vi } of {vi } so that lim 1 vi1 = v 1 ∈ [−K, K]. i→∞ Now there is a subsequence {2 vi } of {1 vi } so that lim 2 vi1 = v 1 , lim 2 vi2 = v 2 , i→∞ |v 1 |2 + |v 2 |2 ≤ K 2 . i→∞ This is because 1 vi2 ∈ [−K, K], which is compact, and the bound follows from (37). Recursively, we may define for each N a subsequence {N vi } and a real number v N so that {N vi } is a subsequence of {(N −1) vi }, lim N vij = v j for j = 1, . . . , N, (38) |v | + · · · + |v N |2 ≤ K 2 . (39) i→∞ 1 2 We use a diagonalization procedure to find a weakly convergent subsequence. {i vi } isPa subsequence of {vi }, and we will show that it converges j i j weakly to v = ∞ i=1 v ei and v ∈ H. Note by construction that {i vi } → v as i → ∞ for each j = 1, 2, . . . . This is because, for each j, {i vi }∞ i=j is a subsequence of {j vi }∞ and by condition (38). i=1 That v ∈ H follows directly from (39) and Theorem 23. Now we show i vi → v weakly in H. Let w ∈ H, and let > 0. Write |hi vi , wi − hv, wi| = |hi vi − v, wi| X ≤ |(i viα − v α )wα | = = α∈A ∞ X j=1 n X |(i vij − v j )wj | |(i vij j j − v )w | + j=1 ∞ X j=n+1 129 |(i vij − v j )wj | for all n. Here the third line follows from the second since i viα = v α = 0 if eα 6∈ {e1 , e2 , . . . }. Since ki vi k ≤ K and kvk ≤ K, then ki vi −vk ≤ 2K and Cauchy-Schwartz shows that ! 21 ! 12 ∞ ∞ ∞ X X X |(i vij − v j )wj | ≤ |wj |2 |i vij − v j |2 j=n+1 j=n+1 j=n+1 ∞ X ≤ |i vij − v j |2 ! 12 ∞ X j=1 j=n+1 ∞ X ≤ 2K |wj |2 ! 12 |wj |2 ! 12 j=n+1 Since w ∈ H, so that P∞ j=1 |wj |2 ≤ P α∈A |wα |2 = kwk2 converges, and there is an n ∞ X |wj |2 ! 12 < . j=n+1 Now for j = 1, 2, . . . , n, each i vij → v j as i → ∞. 
So we may choose ε′ > 0 with (|w¹| + ··· + |wⁿ|)ε′ < ε, and then an I so that for all i ≥ I, |ᵢvᵢʲ − vʲ| < ε′ for j = 1, …, n. Therefore, for i ≥ I,

  |⟨ᵢvᵢ, w⟩ − ⟨v, w⟩| ≤ Σ_{j=1}^{n} |(ᵢvᵢʲ − vʲ)wʲ| + Σ_{j=n+1}^{∞} |(ᵢvᵢʲ − vʲ)wʲ|
    ≤ (|w¹| + ··· + |wⁿ|)ε′ + 2Kε
    < ε + 2Kε.

Since K is independent of i, ⟨ᵢvᵢ, w⟩ → ⟨v, w⟩ as i → ∞, and thus ᵢvᵢ ⇀ v weakly in H.

Theorem 25. Let vᵢ ⇀ v in a Hilbert space. Then

  ‖v‖ ≤ lim inf_{i→∞} ‖vᵢ‖.

In other words, the Hilbert space norm is lower semicontinuous under weak convergence.

Proof. The idea is to translate the problem into Fatou's Lemma. Let {e_α}_{α∈A} be an orthonormal basis of our Hilbert space H, and put the counting measure c on the index set A. Let f: A → [0, ∞), f: α ↦ f_α, be a nonnegative real-valued function on A. Then it is straightforward to check that

  ∫_A f dc = Σ_{α∈A} f_α,

and thus each such sum may be thought of as an integral with respect to the counting measure.

In our case, if vᵢ = Σ_{α∈A} vᵢ^α e_α and v = Σ_{α∈A} v^α e_α, we may view vᵢ as a function from A → R by vᵢ: α ↦ vᵢ^α. (The same holds for v.) Theorem 23 shows

  ‖vᵢ‖² = Σ_{α∈A} |vᵢ^α|²,  ‖v‖² = Σ_{α∈A} |v^α|².  (40)

Now since vᵢ ⇀ v, then vᵢ^α = ⟨vᵢ, e_α⟩ → ⟨v, e_α⟩ = v^α as i → ∞ for all α. Thus, with respect to the counting measure on A, vᵢ → v everywhere on A. Each term in the sums in (40) is nonnegative, and for each α, |vᵢ^α|² → |v^α|², so the functions |vᵢ|² satisfy the hypotheses of Fatou's Lemma with respect to the counting measure. Therefore,

  lim inf_{i→∞} ‖vᵢ‖² = lim inf_{i→∞} Σ_{α∈A} |vᵢ^α|²
    = lim inf_{i→∞} ∫_A |vᵢ|² dc
    ≥ ∫_A |v|² dc = Σ_{α∈A} |v^α|² = ‖v‖².

Note that the above proofs depend heavily on the existence of an orthonormal basis, Proposition 63, which we did not prove. The following problem outlines a standard procedure for getting around the proof of Proposition 63, by proving the existence of an orthonormal basis for any Hilbert space with a countable spanning set.
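Example 19 and Theorem 25 can both be probed numerically in a finite truncation of ℓ². In the sketch below (all names are ad hoc, and the infinite-dimensional space is truncated to dimension N), the orthonormal basis vectors pair to zero against a fixed square-summable element, as in Example 19, while their norms stay equal to 1 — so the weak limit 0 realizes the strict inequality ‖0‖ < lim inf ‖eᵢ‖ permitted by Theorem 25.

```python
import math

N = 500                                   # truncation dimension

def e(i):
    # orthonormal basis vectors of the truncated l^2
    return [1.0 if j == i else 0.0 for j in range(N)]

def inner(v, w):
    return sum(a * b for a, b in zip(v, w))

# a fixed square-summable element: w^j = 1/(j+1)
w = [1.0 / (j + 1) for j in range(N)]

# weak convergence: <e_i, w> = w^i -> 0 as i grows
assert abs(inner(e(499), w)) < abs(inner(e(9), w)) < abs(inner(e(0), w))
assert abs(inner(e(499), w)) < 3e-3

# but no strong convergence: ||e_i - e_j|| = sqrt(2) for i != j
d = [a - b for a, b in zip(e(3), e(7))]
assert abs(math.sqrt(inner(d, d)) - math.sqrt(2)) < 1e-12

# lower semicontinuity (Theorem 25): ||weak limit|| = 0 <= liminf ||e_i|| = 1
assert all(abs(inner(e(i), e(i)) - 1.0) < 1e-12 for i in (0, 100, 499))
```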
A subset S of a Hilbert space H is said to be a spanning set if the (strong) closure of finite linear combinations of elements in S is equal to all of H. For example, in the proof of Theorem 24, we need only deal with the closure H 0 of the span of {v1 , v2 , . . . }. The existence of an orthonormal basis of H 0 is sufficient for the proof of Theorem 24. Homework Problem 53. Show that any strongly closed linear subspace of a Hilbert space H is again a Hilbert space (with the same inner product). We say a subset {vα }α∈A ⊂ H is linearly independent (in the sense of Banach spaces) if any convergent sum X bα vα = 0 α∈A implies bα = 0 for all α ∈ A. Note in particular, the implication holds for any finite sum (and thus this notion of linearly independence in this Banach-space sense implies linear independence in the usual vector-space sense). Homework Problem 54 (Gram-Schmidt Orthogonalization). (a) Let H be a Hilbert space with a countable spanning set {v1 , v2 , . . . } which is finite or countably infinite. Show that there is a subset of {v1 , v2 , . . . } which is a linearly independent spanning set of H. (b) Given a linearly independent spanning set {v1 , v2 , . . . } on a Hilbert space H, define fi and ei recursively by f1 = v1 , f2 = v2 − hv2 , e1 ie1 , fn = vn − n−1 X hvn , ei iei , i=1 f1 , kf1 k f2 e2 = , kf2 k e1 = en = fn . kfn k Show that this recursive definition can be carried out (in particular, show that fn 6= 0). Then show that {e1 , e2 , . . . } is an orthonormal basis for H. In other words, show that hei , P ej i = δij and that any v in i H can be written as a convergent sum v = ∞ i=1 v ei . 132 The use of the previous problem isn’t strictly necessary for our purposes, as L21 (S1 , R) is separable (though we won’t prove that it is). Recall that for every Banach space B, the dual space Banach space B ∗ is the space of all continuous linear functionals λ : B → R, with norm given by |λ(x)| . 
x∈B\{0} kxkB kλkB ∗ = sup Also recall that for any p ∈ (1, ∞), the dual Banach space of Lp (Rn ) is Lq (Rn ) for p−1 + q −1 = 1. Thus L2 (Rn ) is dual to itself. This fact is true for all Hilbert spaces, as the following problem shows in the separable case. Homework Problem 55. Let H be a separable real Hilbert space. Show that the dual Banach space H ∗ is naturally equal to H. In particular, the inner product provides a map from H → H ∗ by x 7→ λx = h·, xi. Show that this map preserves the norm, is one-to-one and onto. Hint: The most significant step is showing the map is onto. First reduce to the case λ 6= 0. Show that L = λ−1 (0) is also a separable Hilbert space, and let {ei } be an orthonormal basis for L. Let y ∈ / L and use a version of Gram-Schmidt to show we may assume y ⊥ L. Construct x from y and λ. A sequence vi in a Banach space B converges to v ∈ B, in the weak* topology if for every λ ∈ B ∗ , λ(vi ) → λ(v). The previous problem shows that Theorem 24 is a special case of the following more general theorem about Banach spaces: Theorem 26 (Banach-Alaoglu). In a Banach space B, the unit ball {x ∈ B : kxkB ≤ 1} is compact in the weak* topology. In other words, if xi is a sequence in the unit ball, then there is a subsequence xij and a limit x ∈ B so that for all λ ∈ B ∗ , λ(xij ) → λ(x) as j → ∞. Example 20 (Fourier series). In the following theorem, we compute perhaps the easiest nontrivial example of an orthonormal basis on an infinitedimensional Hilbert space. L2 (S1 , C) is a complex Hilbert space with inner product given by Z hf, gi = f ḡ dt. S1 133 Theorem 27. The complex exponential functions {e2πikt : k ∈ Z} form an orthonormal basis of L2 (S1 , C). Proof. It is clear that each e2πikt ∈ L2 (S1 , C), and we compute Z 2πikt 2πi`t he ,e i = e2πikt e2πi`t dt 1 ZS 1 = e2πi(k−`)t dt 0 1 e2πi(k−`)t if k 6= ` =0 2πi(k − `) 0 = Z 1 dt =1 if k = `. 0 2 1 Therefore, {e2πikt }∞ k=−∞ forms an orthonormal set in L (S , C). 
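The orthonormality computation above is easy to confirm numerically: for the exponentials e^{2πikt}, an equispaced Riemann sum over [0, 1) evaluates the inner-product integral exactly (up to roundoff) whenever |k − ℓ| is less than the number of sample points, because the sum of the N-th roots of unity vanishes. A sketch (the helper name is ad hoc):

```python
import cmath

def l2_inner(k, l, N=256):
    # Riemann-sum approximation of <e^{2 pi i k t}, e^{2 pi i l t}>
    # = integral over [0,1) of e^{2 pi i (k - l) t} dt
    return sum(cmath.exp(2j * cmath.pi * (k - l) * n / N) for n in range(N)) / N

# orthonormality: 1 when k = l, and 0 (a vanishing sum of roots of unity)
# whenever 0 < |k - l| < N
assert abs(l2_inner(3, 3) - 1.0) < 1e-12
assert abs(l2_inner(3, -2)) < 1e-10
assert abs(l2_inner(-5, 4)) < 1e-10
```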
We must show that every element f ∈ L2 (S1 , C) can be written as a Fourier series ∞ X f= hf, e2πikt ie2πikt , k=−∞ with the convergence in the L2 sense. First, we address this problem for smooth functions f ∈ C ∞ (S1 , C). Recall that C ∞ (S1 , C) is dense in L2 (S1 , C) (which may be proved by mollifying L2 functions). Lemma 68. If f ∈ C ∞ (S1 , C), then for every polynomial P = P (k), lim P (k)hf, e2πikt i = lim P (k)hf, e2πikt i = 0. k→∞ k→−∞ Proof. We use the following claim: For any L2 function f , the Fourier coefficients hf, e2πikt i → 0 as k → ±∞. This follows from Bessel’s Inequality ∞ X |hf, e2πikt i|2 ≤ kf k2L2 < ∞. k=−∞ 134 If f is smooth, then f˙ is also smooth (and thus is in L2 ), and integration by parts gives us Z 1 2πikt ˙ hf , e i = f˙e−2πikt dt 0 1 Z 1 −2πikt −2πikt f (e )˙dt + f (t)e = − 0 0 = 2πikhf, e2πikt i + 0. Now by the claim, hf˙, e2πikt i = 2πikhf, e2πikt i → 0 as k → ±∞. Now we may apply induction to show that lim k n hf, e2πikt i = 0 for each n = 0, 1, 2, . . . . k→±∞ Thus any polynomial P (k) times the Fourier coefficients also goes to zero as k → ±∞. The previous lemma shows that for any smooth function f ∈ C ∞ (S1 , C), the Fourier series ∞ X g(t) = hf, e2πikt ie2πikt k=−∞ converges uniformly: This is because there is a constant C > 0 so that |hf, e2πikt i| ≤ C 1 + k2 (why?), which shows that the C 0 norm of the Fourier series satisfies ∞ ∞ X X hf, e2πikt ie2πikt 0 ≤ C C < ∞. 1 + k2 k=−∞ k=−∞ So the sup norm of the tails of the series ∞ X hf, e2πikt ie2πikt k=−∞ must go to zero, as they are bounded by the tails of an absolutely convergent series. 135 Therefore, uniform convergence implies that g(t) is continuous (and thus is in L2 as well—why?). (In fact, g(t) is smooth—see Homework Problem 58 below.) If we let h(t) = f (t) − g(t) = f (t) − ∞ X hf, e2πikt ie2πikt , k=−∞ then by the same techniques in the proof of Theorem 23 above, we see that hh, e2πikt i = 0 for all k ∈ Z. The following lemma shows that h = 0: Lemma 69. 
Given a function h ∈ C 0 (S1 , C) all of whose Fourier coefficients hh, e2πikt i = 0, then h = 0 identically. Proof. We prove by contradiction. If h is not identically zero, then there is a point τ ∈ S1 at which h(τ ) 6= 0. Then we know that at least one of the following is true: Re h(τ ) > 0, Re h(τ ) < 0, Im h(τ ) > 0, Im h(τ ) < 0. Assume that Re h(τ ) > 0 (the other cases are similar), and let α(t) = Re h(t). Since α is continuous, there is a δ > 0 so that α(t) > 12 α(τ ) > 0 if t ∈ (τ − δ, τ + δ). We will construct an approximate bump function to prove a contradiction. For n a positive integer, define n bn (t) = [ 12 + 12 cos 2π(t − τ )]n = 21 + 14 e−2πiτ e2πit + 14 e2πiτ e−2πit . It is obvious that bn (t) is real-valued, periodic with period 1 (and so defines a function on S1 ), and is equal to a finite Fourier series. Moreover, note that 1 2 + 12 cos 2π(t − τ ) ∈ [0, 1] always, and is equal to 1 only if t = τ in S1 . Thus the powers bn (t) → 0 as n → ∞ away from t = τ , while bn (τ ) → 1. This is the property that makes bn similar to bump functions centered around t = τ . 136 Now compute Z |Re hh, bn i| = Re h(t)bn (t) dt 1 Z S = α(t)bn (t) dt 1 ZS τ +δ Z α(t)bn (t) dt + = S1 \[τ −δ,τ +δ] τ −δ α(t)bn (t) dt Z τ +δ Z ≥ α(t)bn (t) dt − α(t)bn (t) dt τ −δ S1 \[τ −δ,τ +δ] Z τ+ δ Z 2 > α(t)bn (t) dt − α(t)bn (t) dt . τ − 2δ S1 \[τ −δ,τ +δ] (Note the last inequality follows since the integrand is positive.) Also, we have the following bounds: t ∈ [t − 2δ , t + 2δ ] =⇒ α(t) > 12 α(τ ) > 0, bn (t) ≥ ( 12 + 12 cos πδ)n , t 6∈ [t − δ, t + δ] =⇒ |α(t)| < C, bn (t) ≤ ( 12 + 12 cos 2πδ)n . for some constant C (since α is continuous). The bounds on bn follow by examining the graph of the cosine function. The key point is that 1 2 + 12 cos πδ > 1 2 + 12 cos 2πδ > 0. (41) Now compute |Re hh, bn i| > ≥ Z τ + 2δ τ − 2δ Z α(t)bn (t) dt − δ 12 α(τ )( 12 S1 \[τ −δ,τ +δ] n α(t)bn (t) dt + cos πδ) − (1 − 2δ)C( 12 + 12 cos 2πδ)n . 
1 2 Now (41) shows the ratio of the first term over the second goes to +∞ as n → ∞ and thus there is an n so that |Re hh, bn i| > 0. Now the contradiction is this: Since bn is a finite Fourier series, hh, bn i is a finite linear combination of Fourier coefficients hh, e2πikt i, which we assume are all zero. Thus hh, bn i = 0, and we have a contradiction. Since h is the difference between the smooth f and its Fourier series, we have shown 137 Lemma 70. Let f ∈ C ∞ (S1 , C). Then f (t) = ∞ X hf, e2πikt i e2πikt , k=−∞ and the series converges uniformly in t. Uniform convergence on S1 implies L2 convergence (since S1 has finite measure; why does this imply L2 convergence?). Therefore, as in Theorem 23, we have Z ∞ X 2 |f |2 dt = |hf, e2πikt i|2 , kf kL2 = S1 k=−∞ for f ∈ C ∞ (S1 , C). To complete the proof of Theorem 27, first define the Hilbert space `2 = L2 (Z, C) for the counting measure on Z. In other words, P `2 is the set of all k k 2 complex-valued integer-indexed sequences {v }k∈Z so that ∞ k=−∞ |v | < ∞. Then we have the operation F of defining Fourier series: F : f 7→ fˆk = hf, e2πikt i. F : L2 (S1 , C) → `2 , Moreover, on the dense subset C ∞ (S1 , C) ⊂ L2 (S1 , C), F is an isometry. Bessel’s Inequality and the fact that {e2πikt } is an orthonormal set in L2 (S1 , C) shows that for all f ∈ L2 (S1 , C), kf k2L2 ≥ ∞ X |hf, e2πikt i|2 = kF(f )k2`2 . k=−∞ Therefore F is a bounded linear map from L2 (S1 , C) to `2 . A linear map L from a Banach space B1 to another Banach space B2 is called bounded if there is a positive constant C so that for all v ∈ B1 , kL(v)kB2 ≤ CkvkB1 . A linear map between Banach spaces is bounded if and only if it is continuous (see Problem 56 below). Therefore, F is continuous. Also, define the linear map G : `2 → L2 (S1 , C) by G(v) = ∞ X k=−∞ 138 v k e2πikt . The proof of Theorem 23 shows that G preserves the norms. In other words, kG(v)kL2 (S1 ,C) = kvk`2 . Let f ∈ L2 (S1 , C). 
Since smooth functions are dense in L², there is a sequence fₙ → f in L² with fₙ ∈ C∞(S¹, C). Since F is continuous, F(fₙ) → F(f) in ℓ² as n → ∞. In other words,

  0 = lim_{n→∞} ‖F(fₙ) − F(f)‖²_{ℓ²}
    = lim_{n→∞} ‖f̂ₙ − f̂‖²_{ℓ²}
    = lim_{n→∞} ‖G(f̂ₙ) − G(f̂)‖²_{L²}.

Now recall that

  G(f̂ₙ) = Σ_{k=−∞}^{∞} ⟨fₙ, e^{2πikt}⟩ e^{2πikt} = fₙ

since fₙ is smooth. Therefore,

  0 = lim_{n→∞} ‖G(f̂ₙ) − G(f̂)‖_{L²} = lim_{n→∞} ‖fₙ − G(f̂)‖_{L²}.

So in L²,

  fₙ → G(f̂) = Σ_{k=−∞}^{∞} f̂ᵏ e^{2πikt}.

Since we assumed fₙ → f in L², this shows

  f = Σ_{k=−∞}^{∞} f̂ᵏ e^{2πikt}

in L². Since the sum converges in L², finite linear combinations of the orthonormal set {e^{2πikt}} are dense in L²(S¹, C), and {e^{2πikt}} is an orthonormal basis of L²(S¹, C). So Theorem 27 is proved.

Homework Problem 56. Let L: B₁ → B₂ be a linear map between Banach spaces. Show that L is bounded if and only if L is continuous.

Homework Problem 57. Using the notation of the proof of Theorem 27 above, show that F: L²(S¹, C) → ℓ² is an isometry and that F ∘ G is the identity map.

Homework Problem 58. Let fᵏ ∈ C for all k ∈ Z, and assume for all n ≥ 0 that

  lim_{k→∞} kⁿ fᵏ = lim_{k→−∞} kⁿ fᵏ = 0.

Then the Fourier series

  f(t) = Σ_{k=−∞}^{∞} fᵏ e^{2πikt}

converges uniformly to a smooth function from S¹ → C. Hint: The key is being able to change the order of the derivative d/dt with the summation Σ_{k=−∞}^{∞}. Recall that the summation can be interpreted as an integral over Z with respect to the counting measure dµ. Thus for all t ∈ S¹,

  f(t) = ∫_Z fᵏ e^{2πikt} dµ(k).

To show that f(t) ∈ C¹(S¹, C), show that there is a constant C > 0 so that

  |fᵏ| ≤ C / (1 + |k|³).

Mimic the proof of Proposition 11: Show that the absolute value of the difference quotient

  (fᵏ e^{2πik(t+h)} − fᵏ e^{2πikt}) / h

is uniformly ≤ C′(|k| + 1)/(1 + |k|³) for a constant C′. (Apply the Mean Value Theorem to the real and imaginary parts of e^{2πikt} separately.) Show that the series

  Σ_{k=−∞}^{∞} C′(|k| + 1)/(1 + |k|³)

converges by using the procedure in the proof of Lemma 70 above.
Use induction to show f (t) is smooth. 4.6 Compact maps and the Ascoli-Arzelá Theorem Recall that every element of L21 (S1 ) = L21 (S1 , R) has a continuous representative (Proposition 59). So there is a natural linear map L21 (S1 ) → C 0 (S1 ). 140 In this section, we show that this map is compact. A linear map between Banach spaces Λ : B1 → B2 is called compact if the closure of the image of the unit ball in B1 is strongly compact in B2 . In other words, if vi ∈ B1 satisfy kvi kB1 ≤ 1, then {Λ(vi )} has a strongly convergent subsequence in B2 : i.e. there is a subsequence {vij } and an element w ∈ B2 so that lim kΛ(vij ) − wkB2 = 0. j→∞ The basic observation which allows us to conclude that the natural inclusion map L21 (S1 ) → C 0 (S1 ) is compact comes from the proof of Proposition 59. If f ∈ L21 (S1 ), then Z t2 ˙ |f (t2 ) − f (t1 )| = f (t) dt t1 Z t2 12 Z t2 12 ≤ |f˙(t)|2 dt dt t1 ≤ t1 12 1 2 ˙ |f (t)| dt (t2 − t1 ) 2 Z S1 1 ≤ kf kL21 (t2 − t1 ) 2 (Note that the first equality was justified in the proof of Proposition 59.) Therefore, f is continuous. But moreover, for every > 0, we may choose !2 δ= kf kL21 so that |t2 − t1 | < δ =⇒ |f (t2 ) − f (t1 )| < . So the modulus of continuity δ does not depend on t, and depends only on the norm kf kL21 , and on no other information about f . A family of functions Ω of functions from a metric space X to a metric space Y is called equicontinuous at a point x ∈ X if for all > 0, there is a δ > 0 so that dX (x, x0 ) < δ =⇒ dY (f (x), f (x0 )) < for all f ∈ Ω. The point is that δ does not depend on f . Such a family of functions Ω is called equicontinuous on X if it is equicontinuous at each point x ∈ X. 141 Note that if Ω is equicontinuous on X then each f ∈ Ω is continuous. The computations above show Lemma 71. The unit ball in L21 (S1 ) is equicontinuous on S1 . Theorem 28 (Ascoli-Arzelá). Let X be a compact metric space, and let Ω be an equicontinuous family of real-valued functions on X. 
Assume there is a uniform C so that |f (x)| ≤ C for all f ∈ Ω and x ∈ X. Then each sequence {fn } ⊂ Ω has a uniformly convergent subsequence. Proof. We’ll prove the theorem with the help of a few lemmas. Lemma 72. Any compact metric space has a countable dense subset. Proof. Let X be the compact metric space. For = 1/n, obviously [ X= B (x), B (x) = {y ∈ X : dX (x, y) < }. x∈X For each positive integer n, this open cover of X has a finite subcover consisting of balls of radius 1/n centered at points xn,1 , . . . , xn,mn . The union ∞ [ {xn,1 , . . . , xn,mn } n=1 is a countable dense subset of X. Lemma 73. Let P be a countable set, and let fn : P → R be a sequence of functions. Assume there is a constant C so that |fn (p)| ≤ C for all n = 1, 2, . . . and all p ∈ P. Then there is a subsequence of {fn } which converges everywhere on P to a function f : P → R. Proof. See Problem 59 below. Lemma 74. Let {fn } be an equicontinuous sequence of mappings from a compact metric space X to R. If the sequence {fn (x)} converges for each x in a dense subset of X, then {fn } converges uniformly on X to a continuous limit function. 142 Proof. First we show that fn (x) converges pointwise everywhere to a function f (x). Let y ∈ X and let > 0. Then by equicontinuity, there is a δ > 0 so that dX (x, y) < δ =⇒ |fn (x) − fn (y)| < . (Note δ is independent of n.) Since fn converges on a dense subset of X, there is an x ∈ Bδ (y) for which fn (x) converges. Therefore, {fn (x)} is a Cauchy sequence in R, and so there is an N so that n, m ≥ N |fn (x) − fm (x)| < . =⇒ Therefore, for n, m ≥ N , |fn (y) − fm (y)| ≤ |fn (y) − fn (x)| + |fn (x) − fm (x)| + |fm (x) − fm (y)| < 3. Therefore, {fn (y)} is a Cauchy sequence in the complete metric space R, and so it converges to a limit which we call f (y). Let y ∈ X and > 0. Then equicontinuity shows that there is a δ > 0 so that x ∈ Bδ (y) =⇒ |fn (x) − fn (y)| < (42) for all n. 
By letting n → ∞, we also have x ∈ Bδ (y) |f (x) − f (y)| ≤ =⇒ (43) These Bδ (y) form an open cover of X, and so there is a finite subcover X= k [ Bδi (yi ) i=1 since X is compact. Choose N large enough so that n≥N =⇒ |fn (yi ) − f (yi )| < , i = 1, . . . , k. (44) Then for x ∈ X, x ∈ Bδi (yi ) for some yi , and so (42), (43) and (44) show |fn (x) − f (x)| ≤ |fn (x) − fn (yi )| + |fn (yi ) − f (yi )| + |f (yi ) − f (x)| < 3. Since the same N works for all x ∈ X, the convergence is uniform. f , as the uniform limit of continuous functions, is continuous. This completes the proof of Theorem 28. 143 Homework Problem 59. Let P be a countable set, and let fn : P → R be a sequence of functions. Assume that for each p ∈ P, there is a constant C = Cp so that |fn (p)| ≤ C for all n = 1, 2, . . . . Show there is a subsequence of {fn } which converges everywhere on P to a function f : P → R. Hint: Use a diagonalization argument. An important version of the Ascoli-Arzelá Theorem is the following: Theorem 29. Let X be a metric space so that there is a countable number of open subsets Oi satisfying X= ∞ [ Oi , Oi ⊂⊂ Oi+1 , (45) i=1 and let Ω be an equicontinuous set of real-valued functions on X. If for a sequence of functions {fn } ⊂ Ω, there is a uniform C so that |fn (x)| ≤ C for all n and all x ∈ X, then there is a subsequence of {fn } which converges pointwise to a function f : X → R, and the convergence is uniform on every compact subset of X. Remark. Recall A ⊂⊂ B for A a subspace of a topological space B means that the closure A relative to B is compact. Remark. A sequence of functions converging uniformly on compact subsets of X is said to converge normally on X. We relegate the proof of Theorem 29 to the following problem: Homework Problem 60. Prove Theorem 29. Hint: Consider X, Oi as in the previous theorem. Note we may apply Theorem 28 to each of the compact sets Oi . Use a diagonalization argument to find a uniformly convergent subsequence on each Oi . 
Show that every compact subset of X is contained in some Oi . Remark. For every smooth manifold X (which is Hausdorff and sigma-compact), there are a countable collection of open sets Oi satisfying condition (45). See the notes on “The Real Definition of a Smooth Manifold.” The Ascoli-Arzelá Theorem provides the following. Proposition 75. If C > 0 and {fn } is a sequence of functions in L21 (S1 , R) which satisfy kfn kL21 ≤ C, then there is a uniformly convergent subsequence. 144 Proof. This follows from the Ascoli-Arzelá Theorem and Lemma 71 above, once we know in addition that there is a constant K so that |fn | ≤ K pointwise. First of all, note that 1 1 |fn (t2 ) − fn (t1 )| ≤ kfn kL21 |t2 − t1 | 2 ≤ C|t2 − t1 | 2 shows that for every t2 , t1 ∈ S1 , |fn (t2 ) − fn (t1 )| ≤ C since we may choose t2 , t1 ∈ [0, 1). Since Z 0 1 21 |fn | dt = kfn kL2 ≤ kfn kL21 ≤ C, 2 there must be a t1 ∈ S1 so that |fn (t1 )| ≤ C. Then for any t2 ∈ S1 , |fn (t2 )| ≤ |fn (t1 )| + |fn (t2 ) − fn (t1 )| ≤ 2C. Thus the hypotheses of the Ascoli-Arzelá Theorem are satisfied. Corollary 76. The inclusion L21 (S1 , R) ,→ C 0 (S1 , R) is compact. Proof. Take C = 1 in the above theorem. Corollary 77. Let C > 0 and let X ⊂ RN be a compact manifold, and let γn ∈ L21 (S1 , X) ⊂ L21 (S1 , RN ) satisfy E(γn ) ≤ C. Then there is a uniformly convergent subsequence of {γn }, and the limit is a continuous function γ : S1 → X. Proof. Recall kγn k2L2 (S1 ,RN ) = kγn k2L2 (S1 ,RN ) + kγ̇n k2L2 (S1 ,RN ) = kγn k2L2 (S1 ,RN ) + E(γn ). 1 Since γn (S1 ) ⊂ X and X is compact, there is a constant K so that |γn (t)| ≤ K for all n and t. Therefore, Z 2 kγn kL2 (S1 ,RN ) ≤ K 2 dt = K 2 , S1 and moreover, kγn k2L2 (S1 ,RN ) ≤ C + K 2 1 145 independently of n. So each component function γna for a = 1, . . . , N satisfies √ kγna kL21 (S1 ,R) ≤ C + K 2 . Then Proposition 75 shows that there is a subsequence {1 γn } of {γn } so that the component 1 γn1 converges uniformly. 
Let {2 γn } be a subsequence of {1 γn } so that 2 γn1 and 2 γn1 converge uniformly. By induction, as in the proof of Theorem 24, there is a subsequence {N γn } of {γn } so that N γna converges uniformly for a = 1, . . . , N . Since this subsequence converges uniformly on each component in RN , N γn converges uniformly as n → ∞ to a limit γ in C 0 (S1 , RN ). Since X is closed in RN and the subsequence converges pointwise, the limit γ : S1 → X. It is also useful to define the Hölder norm for functions f : S1 → R kf kC 0, 21 (S1 ) = kf kC 0 + sup t1 6=t2 |f (t1 ) − f (t2 )| 1 dS1 (t1 , t2 ) 2 . (Here we define dS1 (t1 , t2 ) = inf |(t1 + k) − t2 |. k∈Z This definition is necessary, since we identify the real numbers t and t + k on the circle S1 . For example, dS1 (0, 0.9) = |1 − 0.9| = 0.1.) It is easy to check 1 that this defines a norm. Define the space C 0, 2 (S1 ) to be all f from S1 → R so that kf kC 0, 21 (S1 ) < ∞. 1 C 0, 2 (S1 ) is a Banach space (Proposition 78 below), and the calculations above show that there is a natural continuous inclusion map from L21 (S1 ) → 1 1 C 0, 2 (S1 ). Moreover, the natural inclusion map from C 0, 2 (S1 ) → C 0 (S1 ) is compact. Then Problem 63 below shows that composition inclusion from L21 (S1 ) → C 0 (S1 ) is compact. In general, for any metric space X, α ∈ (0, 1], we can define C 0,α (X) = {f : X → R : kf kC 0,α < ∞}, |f (x) − f (y)| kf kC 0,α = sup |f (x)| + sup . dX (x, y)α x∈X x6=y∈X These are called Hölder spaces and Hölder norms respectively. 146 Example 21. This is the standard example for X = [−1, 1] ⊂ R. f (x) = |x|α is in C 0,α (X). Proof. It clearly suffices to bound the difference quotient q(x, y) = ||x|α − |y|α | , |x − y|α x 6= y ∈ [−1, 1]. We will show that this is always ≤ 1. First, simplify to the case x and y have the same sign, since if they have opposite signs, q(x, y) < q(−x, y). We may assume x and y have the same sign. 
By possibly interchanging (x, y) ↔ (−x, −y) and switching x and y, we may assume x > y ≥ 0. Then write xα − y α 1 − ρα y q(x, y) = = , ρ = ∈ [0, 1). α α (x − y) (1 − ρ) x Then we compute dq α(1 − ρα−1 ) = ≤ 0. dρ (1 − ρ)α+1 Therefore, the max of q(ρ) is achieved at ρ = 0, q = 1. We also say f (x) = |x|α is locally C 0,α on R, since the α Hölder norm of f is finite on any compact subset of R. In the case α = 1, note that a function in C 0,1 is simply a C 0 function which is globally Lipschitz. Homework Problem 61. (a) Show that the inclusion C 1 (S1 ) ,→ C 0 (S1 ) is compact (Hint: use the Mean Value Theorem). (b) Show that every bounded sequence fn ∈ C 1 (R) (i.e., there is a uniform C so that kfn kC 1 ≤ C for all n) has a subsequence which converges uniformly on compact subsets of R to a continuous limit f . Hint: It is easy to show that R satisfies condition (45). (c) Find an example of a bounded sequence of functions fn ∈ C 1 (R) which does not have a convergent subsequence in C 0 (R). Thus the inclusion C 1 (R) ,→ C 0 (R) is not compact. (Hint: How is this situation different from parts (a) and (b)? You must use the noncompactness of R. Therefore, the interesting behavior of the fn should be “moving off to infinity.”) 147 It is also useful to apply Hölder norms to the derivatives of a functions. In particular, on Rn , we may define for k a positive integer, α ∈ (0, 1], X kf kC k,α = k∂β f kC 0,α , |β|≤k where, as in (3) above, we use the multi-index notation to denote all the partial derivatives of f of order ≤ k. Remark. It is not useful to define C 0,α for α > 1, as the following problem shows. Homework Problem 62. Let α > 1, and let f : R → R. Assume that sup x6=y |f (x) − f (y)| = C < ∞. |x − y|α Show that f is a constant function. Hint: Use the definition of the derivative to show that f 0 (x) = 0 for all x. Proposition 78. Let X be a metric space and α ∈ (0, 1]. Then C 0,α (X) is a Banach space. Proof. 
It is straightforward to show that k · kC 0,α is a norm. As always, we must check completeness carefully. Let {fn } be a Cauchy sequence in C 0,α (X). We want to show that there is a limit f ∈ C 0,α and that kfn − f kC 0,α → 0 as n → ∞. First of all, it is obvious from the definition of the Hölder norm that {fn } is a Cauchy sequence in C 0 (X), and since C 0 is complete, there is a continuous limit function f , and fn → f uniformly. Now we show f ∈ C 0,α . Let > 0. Then there is an N so that m, n ≥ N =⇒ kfm − fn kC 0,α < . (46) Then for all m ≥ N , kfm kC 0,α < kfN kC 0,α + ≡ C . By the definition of the Hölder norm, for all x, y ∈ X, |fm (x) − fm (y)| ≤ C dX (x, y)α . Taking m → ∞ shows that f ∈ C 0,α . Now (46) also implies that for all x, y ∈ X, |fm (x) − fn (x) − fm (y) + fn (y)| ≤ dX (x, y)α , 148 and so again let m → ∞ to show for all x, y ∈ X, and for all n ≥ N , |f (x) − fn (x) − f (y) + fn (y)| ≤ dX (x, y)α . Since we already know fn → f in C 0 , this is exactly the additional statement we need to show fn → f in C 0,α . Remark. If X is a smooth manifold, then it is possible (by using an atlas and a subordinate partition of unity) to define C k,α (X). If X is compact, then C k,α (X) ,→ C k (X) is a compact inclusion. Homework Problem 63. Let Λ : B1 → B2 and Φ : B2 → B3 be linear maps between Banach spaces. (a) Assume Λ is continuous and Φ is compact. Then Φ ◦ Λ is compact. (b) Assume Λ is compact and Φ is continuous. Then Φ ◦ Λ is compact. Homework Problem 64. Let Λ : B1 → B2 be a compact linear map of Banach spaces. Show Λ is continuous. Hint: It suffices to show Λ is bounded. For B1 (0) the unit ball in B1 , consider the image of the compact set ΛB1 (0) ⊂⊂ B2 under the norm map k · kB2 : B2 → R. Remark. The Hölder spaces C k,α , for α ∈ (0, 1), and the Sobolev spaces Lpk , for p ∈ (1, ∞), play a very important role in the theory of partial differential equations. In particular, the behave much better than the more obvious 1 spaces C k . 
Our simple proofs that L21 (S1 ) embeds continuously in C 0, 2 (S1 ) and compactly in C 0 (S1 ) constitute some of the easiest cases of Sobolev embedding theorem. The Sobolev embedding theorem allow us to embed certain Sobolev spaces, in which derivatives are defined only in the sense of distributions, to Hölder and C k spaces, in which we may take derivatives in the usual sense. These spaces are crucial to the regularity theory of solutions to PDEs. 149 4.7 Convergence Now we have finally developed the tools needed to solve our problem. Recall Problem: Let X ⊂ RN be a smooth compact manifold equipped with the Riemannian metric pulled back from the Euclidean metric on RN . Let C be the class of loops γ : S1 → X in a free homotopy class on X and in L21 (S1 , X). Find a loop of least energy in C. Our strategy is as follows: Define L = inf E(γ). γ∈C Since E(γ) ≥ 0 always, L ≥ 0. Now there is a sequence of γi ∈ C so that E(γi ) → L. We want to find a subsequence γij which converges to a limit γ ∈ C so that E(γ) = L. Moreover, we expect γ to be a geodesic—it should satisfy the geodesic equations not just in the sense of distributions, but also in the usual sense. Therefore, by the theory of ODEs, γ should be smooth. First of all, we show the existence of a limit γ. Corollary 77 shows that there is a subsequence of γi which converges uniformly to a continuous γ : S1 → X. (For simplicity, we just refer to this subsequence as γi again.) Since γi → γ uniformly, Corollary 57 shows that γ is in the same free homotopy class. Thus we have Proposition 79. There is a subsequence of γi which converges uniformly to a limit γ in the same free homotopy class. Proposition 80. Let X ⊂ RN be a compact manifold. If γi : S1 → X satisfy E(γi ) → L, then there is a constant K independent of i so that kγi kL21 (RN ) ≤ K. Proof. Since X is compact, there is a uniform C so that kγi kL2 (S1 ,RN ) ≤ C. Since E(γi ) → L, {E(γi )} is a bounded sequence. 
Therefore, kγi k2L2 (S1 ,RN ) = E(γi ) + kγi k2L2 (S1 ,RN ) 1 is bounded independent of i. This proposition shows there is a further subsequence of γi which converges weakly to a γ̃ ∈ L21 (S1 , RN ) by Theorem 24. (Explanatory note: a 150 further subsequence means that we take a subsequence not just of the original γi , but of the subsequence taken in the paragraph above Proposition 79.) We still refer to this further subsequence as γi . Then Theorem 25 shows that the Hilbert space norm kγ̃kL21 (S1 ,RN ) ≤ lim inf kγi kL21 (S1 ,RN ) . i→∞ Note a potential problem: We have taken a subsequence of the original γi to converge uniformly to a continuous γ, and then we take a further subsequence to converge weakly in L21 to γ̃ in L21 . We must show γ and γ̃ are the same. This will follow from the fact that they must be equal in the sense of distributions, and thus are equal almost everywhere (Proposition 58). Since both γ and γ̃ are continuous, they must be equal everywhere. In particular, we require Proposition 81. γ = γ̃ in the sense of distributions. Proof. It suffices to show each component γ a = γ̃ a in the sense of distributions for a = 1, . . . , N . For each a = 1, . . . , N , γia → γ a uniformly as i → ∞. So if φ ∈ D(S1 ) is a smooth test function, then Z |γia (φ) − γ a (φ)| = (γia − γ a )φ dt ≤ kφkL1 kγia − γ a kC 0 , 1 S which goes to 0 as i → ∞ by uniform convergence. Therefore, γ a (φ) = lim γia (φ). i→∞ (47) Also, γia → γ̃ a weakly in L21 (S1 ). Let φ ∈ D(S1 ) ⊂ L21 (S1 ) be a test function. Let fi = γia − γ̃ a . Then fi → 0 weakly in L21 . Compute Z Z ˙ (fi φ − fi φ̈) dt = fi (φ − φ̈), (fi φ + fi φ̇) dt = hfi , φiL21 = S1 S1 the last term denoting fi acting in the sense of distributions. Therefore, for all φ ∈ D(S1 ), lim fi (φ − φ̈) = lim hfi , φiL21 = 0. i→∞ i→∞ By Proposition 82 below, for every ψ ∈ D(S1 ), there is a φ ∈ D(S1 ) so that φ − φ̈ = ψ. Therefore, for all ψ ∈ D(S1 ), lim fi (ψ) = 0 i→∞ ⇐⇒ lim γia (ψ) = γ̃ a (ψ). 
i→∞ Therefore, by (47) above, γ = γ̃ in the sense of distributions. 151 Proposition 82. For every ψ ∈ D(S1 ), there is a φ ∈ D(S1 ) so that ψ = φ − φ̈. Proof. Recall D(S1 ) = C ∞ (S1 , R). Moreover, Lemma 70 and Problem 58 show that ( ∞ ) X ∞ 1 k 2πikt k n C (S , C) = f e : lim f |k| = 0 for n = 1, 2, . . . . (48) k→±∞ k=−∞ The convergence of each such series is uniform, and the sum commutes with the derivative d/dt. Therefore, if ∞ X φ= φ̂k e2πikt ∈ C ∞ (S1 , C), k=−∞ then ∞ X φ̈ = (−4π 2 k 2 )φ̂k e2πikt , k=−∞ ∞ X (1 + 4π 2 k 2 )φ̂k e2πikt . φ − φ̈ = k=−∞ So if ψ= ∞ X ψ̂ k e2πikt ∈ C ∞ (S1 , C), k=−∞ then we may let φ= ∞ X ψ̂ k e2πikt , 2k2 1 + 4π k=−∞ so that φ − φ̈ = ψ. We must prove that φ ∈ C ∞ (S1 , C). Let n be a positive integer. Then ψ̂ k |k|n = 0. k→±∞ 1 + 4π 2 k 2 lim φ̂k |k|n = lim k→±∞ because |ψ̂ k ||k|n−2 → 0. So φ is smooth. (Note that we went from a |k|n limit to a |k|n−2 limit. This is because the differential equation is of order two.) We have considered C-valued functions so far. It is easy to check that ψ ∈ C ∞ (S1 , R) implies φ ∈ C ∞ (S1 , R). 152 Remark. The previous proposition uses a standard technique for solving constant-coefficient differential equations on S1 . The differential equation then breaks into an algebraic equation for each Fourier coefficient, each of which can be typically be solved. This also works for functions on the n-torus (S1 )n . In this case, the Fourier series is summed over Zn , and we can solve constant-coefficient PDEs. Also, on Rn , the Fourier transform turns constant-coefficient PDEs into algebraic equations of the Fourier transform variable. Homework Problem 65. L22 (S1 , C) is the complex Hilbert space defined by the inner product Z ¯ dt. hf, giL2 = (f ḡ + f˙ġ¯ + f¨g̈) 2 S1 The elements of L22 (S1 , C) are all complex-valued functions f on S1 which are L2 and whose first and second derivatives f˙ and f¨ in the sense of distributions are also L2 functions. 
(You may assume $L^2_2(S^1,\mathbb{C})$ is a Hilbert space, as in Proposition 62.) Show that if $f_n \to f$ converges weakly in $L^2_2(S^1,\mathbb{C})$, then for all $\phi \in \mathcal{D}(S^1)$, $f_n(\phi) \to f(\phi)$. Hint: Mimic the proofs of Propositions 81 and 82.

To recap, so far we have a sequence of loops $\gamma_i$ in $\mathcal{C}$ so that
$$\lim_{i\to\infty} E(\gamma_i) = L = \inf_{\alpha\in\mathcal{C}} E(\alpha), \qquad \lim_{i\to\infty} \gamma_i = \gamma \text{ uniformly and weakly in } L^2_1(S^1,\mathbb{R}^N).$$
Moreover, $\gamma \in \mathcal{C}$, the free homotopy class of $L^2_1$ loops containing the $\gamma_i$. Since $\gamma_i \to \gamma$ uniformly, we have
$$\|\gamma_i - \gamma\|^2_{L^2(S^1,\mathbb{R}^N)} = \int_{S^1} |\gamma_i - \gamma|^2\, dt \le \sup_t |\gamma_i - \gamma|^2 \to 0,$$
and so $\gamma_i \to \gamma$ in $L^2$. Now Theorem 25 shows that
$$\|\gamma\|^2_{L^2_1(S^1,\mathbb{R}^N)} \le \liminf_{i\to\infty} \|\gamma_i\|^2_{L^2_1(S^1,\mathbb{R}^N)} = \liminf_{i\to\infty}\left[ E(\gamma_i) + \|\gamma_i\|^2_{L^2(S^1,\mathbb{R}^N)} \right] = L + \|\gamma\|^2_{L^2(S^1,\mathbb{R}^N)},$$
$$E(\gamma) = \|\gamma\|^2_{L^2_1(S^1,\mathbb{R}^N)} - \|\gamma\|^2_{L^2(S^1,\mathbb{R}^N)} \le L.$$
Since $L$ is the infimum of the energy of all loops in $\mathcal{C}$, and $\gamma \in \mathcal{C}$, then $E(\gamma) \ge L$ as well. So $E(\gamma) = L$. Thus we have proved

Theorem 30. Let $X$ be a compact Riemannian manifold without boundary. Then in each free homotopy class of loops, there is a $\gamma \in L^2_1(S^1,X)$ which minimizes the energy.

Corollary 83. This minimizing $\gamma$ satisfies the geodesic equations (in local coordinates on $X$) in the form
$$2(g_{ik}\dot\gamma^i)\dot{}\; - g_{ij,k}\dot\gamma^i\dot\gamma^j = 0$$
in the sense of distributions.

Proof. See Proposition 61. $\square$

Note in the proof of Theorem 30 above, we implicitly use the fact that the map from $L^2_1(S^1) \to L^2(S^1)$ is compact, by using the inclusions
$$L^2_1(S^1) \hookrightarrow C^0(S^1) \hookrightarrow L^2(S^1),$$
the first of which is compact and the second of which is continuous. The following problem gives a direct proof.

Homework Problem 66. Show directly that the inclusion $L^2_1(S^1,\mathbb{C}) \hookrightarrow L^2(S^1,\mathbb{C})$ is a compact linear map. Hints:

(a) Use the characterization of $L^2_1(S^1,\mathbb{C})$ in terms of Fourier series from Proposition 87 below.

(b) If $\|f_i(t)\|_{L^2_1} \le 1$, then use a diagonalization argument to produce a subsequence $\{f_{i_j}\}$ so that for each $k \in \mathbb{Z}$, the Fourier coefficients $\hat f^k_{i_j}$ converge to constants $g^k \in \mathbb{C}$ as $j \to \infty$.
(c) For all $\epsilon > 0$, show that there is an $N$ so that
$$\sum_{|k|\ge N} |\hat f^k|^2 < \epsilon$$
for all $f$ such that $\|f\|_{L^2_1} \le 1$.

(d) Conclude that the subsequence $\{f_{i_j}\}$ converges strongly to
$$\sum_{k\in\mathbb{Z}} g^k e^{2\pi i k t}$$
in $L^2(S^1,\mathbb{C})$.

Remark. The proof presented in the previous problem works for Sobolev spaces in higher dimensions (for functions on the $n$-dimensional torus $S^1 \times \cdots \times S^1$), whereas the use of the Sobolev embedding theorem for the compact inclusion $L^2_1(S^1,\mathbb{C}) \hookrightarrow C^0(S^1,\mathbb{C})$ is only available in dimension $n = 1$.

4.8 Regularity

Now we show that $\gamma$ is smooth. First of all, note that $\Gamma^k_{ij}$ is smooth in each set of local coordinates $x$ on $X$. Also, since $\gamma \in L^2_1(S^1,\mathbb{R}^N)$, we know that $\gamma$ is continuous in $t \in S^1$, and so $\Gamma^k_{ij}(\gamma)$ is continuous on $S^1$.

Until now, we've been lax about distinguishing between $\gamma = (\gamma^1,\dots,\gamma^N) \in X \subset \mathbb{R}^N$ and $\gamma$ in local coordinates. There is an important point at which we should make a distinction. Recall we are working on a coordinate chart $\phi\colon U \to O \subset X \subset \mathbb{R}^N$, where $U \subset \mathbb{R}^n$. Our notation has been this: $\gamma^a$ is the $a$th coordinate of $\gamma$ in $\mathbb{R}^N \supset X$, while $\gamma^i$ has been shorthand for $(\phi^{-1}\circ\gamma)^i$, the $i$th coordinate of $\phi^{-1}\circ\gamma$ in $\mathbb{R}^n \supset U$. In the previous subsections, we have dealt with the $L^2_1$ norm of $\gamma$ in $\mathbb{R}^N$, while in local coordinates, we should deal with the $L^2_1$ norm of $\phi^{-1}\circ\gamma$ in $U \subset \mathbb{R}^n$.

Let $\phi^{-1}\colon O \to U$ be the restriction of the smooth map
$$y = (y^1, \dots, y^n)\colon Q \to U,$$
where $Q$ is an open subset of $\mathbb{R}^N$ which contains $O \subset X \subset \mathbb{R}^N$. (Recall we may do this by the definition of smooth maps from $O$ to $\mathbb{R}^n$.) Let $x = (x^1,\dots,x^N)$ represent coordinates on $\mathbb{R}^N$. Compute for $k = 1,\dots,n$
$$\frac{\partial}{\partial t}(y\circ\gamma)^k = \frac{\partial y^k}{\partial x^a}\,\dot\gamma^a,$$
where $a$ is summed from $1$ to $N$.

Proposition 84. Let $\phi\colon U \to O$ be a smooth coordinate parametrization of $X$. Let $I \subset \mathbb{R}$ be a compact interval, and let $K \subset O$ be compact. Then there are positive constants $C_1, \dots$
, $C_5$ so that
$$C_1\|\gamma\|_{L^2_1(I,\mathbb{R}^N)} + C_3 \ge \|\phi^{-1}\circ\gamma\|_{L^2_1(I,\mathbb{R}^n)} + C_4 \ge C_2\|\gamma\|_{L^2_1(I,\mathbb{R}^N)} + C_5 \qquad (49)$$
for all $\gamma$ so that $\gamma(I) \subset K$. (The point is that $C_1, C_2, C_3, C_4, C_5$ are independent of $\gamma$.)

Corollary 85. $\|\gamma\|_{L^2_1(I,\mathbb{R}^N)}$ is bounded if and only if $\|\phi^{-1}\circ\gamma\|_{L^2_1(I,\mathbb{R}^n)}$ is bounded.

Remark. A related, simpler notion is the following: Two norms $\|\cdot\|_{B_1}$ and $\|\cdot\|_{B_2}$ on a single linear space $B$ are called equivalent if there are constants $C_1 > C_2 > 0$ so that for all $x \in B$,
$$C_1\|x\|_{B_1} \ge \|x\|_{B_2} \ge C_2\|x\|_{B_1}.$$

Remark. As long as we restrict to compact subsets of coordinate charts, the norms in $\mathbb{R}^N \supset X$ and in local coordinates on $\mathbb{R}^n$ are equivalent. The corollary holds for all the Banach function spaces we have discussed, not just for $L^2_1$. Also, a similar proposition holds for Banach spaces of functions from $X$ to $\mathbb{R}$, not simply spaces of maps from $S^1$ to $X$: For $K \subset\subset O$, the norms on $L^2_1(K)$ and $L^2_1(\phi^{-1}K)$ are equivalent under the map
$$L^2_1(K) \to L^2_1(\phi^{-1}K), \qquad f \mapsto f\circ\phi.$$

Proof of Proposition 84. We claim it suffices to prove the bound (49) separately for the $L^2$ norm of $\gamma$ and for the $L^2$ norm of $\dot\gamma$. Proof: if $A = \|\gamma\|_{L^2}$ and $B = \|\dot\gamma\|_{L^2}$, then
$$\|\gamma\|_{L^2_1} = \sqrt{A^2 + B^2}.$$
Then it is easy to check that for $A, B \ge 0$,
$$\tfrac{1}{\sqrt2}(A + B) \le \sqrt{A^2 + B^2} \le A + B.$$
In other words, the norm on $\gamma$ given by the sum of the $L^2$ norm of $\gamma$ and the $L^2$ norm of $\dot\gamma$ is equivalent to the $L^2_1$ norm. It is straightforward to use this fact to prove the claim.

Since $\phi^{-1}$ is $C^1$ on $K$, it is locally Lipschitz and thus globally Lipschitz on $K$ (see Proposition 17). So for $C$ the Lipschitz constant and $x_0$ a point in $K$, for all $x \in K$,
$$|\phi^{-1}(x)| \le |\phi^{-1}(x_0)| + C|x - x_0| \le C' + C|x|, \qquad C' = |\phi^{-1}(x_0)| + C|x_0|.$$
Therefore, the Triangle Inequality gives
$$\|\phi^{-1}(\gamma)\|_{L^2(S^1,\mathbb{R}^n)} = \left(\int_{S^1} |\phi^{-1}(\gamma(t))|^2\, dt\right)^{\frac12} \le \left(\int_{S^1} (C' + C|\gamma(t)|)^2\, dt\right)^{\frac12} \le \left(\int_{S^1} (C')^2\, dt\right)^{\frac12} + \left(\int_{S^1} [C|\gamma(t)|]^2\, dt\right)^{\frac12} = C' + C\|\gamma\|_{L^2(S^1,\mathbb{R}^N)}.$$
This is essentially one half of (49) for the $L^2$ norm of $\gamma$.
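As an aside, the elementary inequality $\frac{1}{\sqrt2}(A+B) \le \sqrt{A^2+B^2} \le A+B$ used in the claim above is easy to test numerically. A quick sketch (not from the text; the random samples are an arbitrary choice):

```python
import numpy as np

# Check (1/sqrt(2))(A + B) <= sqrt(A^2 + B^2) <= A + B for A, B >= 0
# on many random pairs; a small tolerance absorbs floating-point roundoff.
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 10.0, size=10_000)
B = rng.uniform(0.0, 10.0, size=10_000)
root = np.sqrt(A**2 + B**2)
ok = bool(np.all((A + B) / np.sqrt(2.0) <= root + 1e-12)
          and np.all(root <= A + B + 1e-12))
print(ok)
```

The left inequality is the Cauchy-Schwarz bound $A + B \le \sqrt2\,\sqrt{A^2+B^2}$; equality holds on the left when $A = B$ and on the right when $AB = 0$.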
The other half follows from the fact that $\phi$ is a $C^1$ function on the compact set $\phi^{-1}K$.

We still must address the $L^2$ norm of $\dot\gamma$. Recall for $y = \phi^{-1}$ as above, that
$$(\phi^{-1}\circ\gamma)\dot{}\; = \frac{\partial y}{\partial x^a}\,\dot\gamma^a.$$
On the compact set $K$, since $\phi^{-1}$ is $C^1$, there is a constant $C$ so that
$$\left|\frac{\partial y}{\partial x^a}\right| \le C \quad\text{on } K,$$
and so on $K$
$$|(\phi^{-1}\circ\gamma)\dot{}\,| = \left|\frac{\partial y}{\partial x^a}\,\dot\gamma^a\right| \le C\sum_{a=1}^N |\dot\gamma^a| \le CN|\dot\gamma|.$$
Thus, as in the previous paragraph,
$$\|(\phi^{-1}\circ\gamma)\dot{}\,\|_{L^2(S^1,\mathbb{R}^n)} \le CN\|\dot\gamma\|_{L^2(S^1,\mathbb{R}^N)}.$$
The opposite inequality can be obtained by considering $\phi$ as a $C^1$ map instead of $\phi^{-1}$. $\square$

Remark. In the previous proof, it sufficed to consider the $L^2$ norms of $\gamma$ and $\dot\gamma$ separately. For higher derivatives, this is no longer adequate: Compute
$$(\phi^{-1}\circ\gamma)\ddot{}\; = \frac{\partial y}{\partial x^a}\,\ddot\gamma^a + \frac{\partial^2 y}{\partial x^a \partial x^b}\,\dot\gamma^a\dot\gamma^b.$$
So first derivative terms of $\gamma$ come into the calculations of the second derivatives of $\phi^{-1}\circ\gamma$.

The geodesic equation is written in terms of the coordinates on $U \subset \mathbb{R}^n$, and for an open interval $I \subset S^1$, $\gamma(I) \subset O$. On any compact subinterval of $I$, the components of the metric $g_{k\ell}(\gamma)$ and its first derivatives $g_{k\ell,m}(\gamma)$ have absolute values bounded by a constant $C$ (this is since $\gamma$ is continuous on the compact interval $I$). Since $\gamma \in L^2_1$, each $\dot\gamma^i \in L^2$. Therefore, Hölder's inequality shows that
$$\int_I \left|\tfrac12 g_{ij,k}\dot\gamma^i\dot\gamma^j\right| dt \le \frac{C}{2}\sum_{i,j=1}^n \left(\int_I |\dot\gamma^i|^2\, dt\right)^{\frac12}\left(\int_I |\dot\gamma^j|^2\, dt\right)^{\frac12} < \infty.$$
Thus $\tfrac12|g_{ij,k}\dot\gamma^i\dot\gamma^j| \in L^1(I)$ for each $k$, and thus Corollary 83 shows $(g_{ik}\dot\gamma^i)\dot{}\; \in L^1(I)$ for each $k$ in the sense of distributions. Lemma 86 below and the proof of Proposition 59 above then show $g_{ik}(\gamma)\dot\gamma^i$ is continuous, and
$$g_{k\ell}(\gamma)\ddot\gamma^k \in L^1(I) \qquad (50)$$
in the sense of distributions. Now since the inverse metric $g^{\ell m}(\gamma)$ is continuous in $t$, we may multiply by it to show that each $\dot\gamma^i$ is continuous as well. Thus $\gamma$ is locally $C^1$.

Now bootstrap using Corollary 83 again to show that $(g_{ik}(\gamma)\dot\gamma^i)\dot{}\;$ is continuous as well. Thus $g_{ik}(\gamma)\dot\gamma^i$ is, in the sense of distributions, a $C^1$ function.
As above, this shows $\dot\gamma^i$ is also $C^1$, and thus $\gamma$ is locally $C^2$. We now have enough regularity to rewrite Corollary 83 as the geodesic equation
$$\ddot\gamma^k = -\Gamma^k_{ij}(\gamma)\,\dot\gamma^i\dot\gamma^j$$
for $C^2$ functions $\gamma^k$. The equation holds in the usual sense of ODEs. Therefore, since $\Gamma^k_{ij}$ is smooth, the usual regularity theory for ODEs, Theorem 9, applies, and the geodesic $\gamma$ is smooth.

Lemma 86. Let $f \in L^1_{loc}(\mathbb{R})$. Then
$$g(t) = \int_{t_0}^t f(s)\, ds$$
is continuous.

Proof. Let $t \in \mathbb{R}$, and let $h > 0$ (the case $h < 0$ is similar). Compute
$$g(t+h) - g(t) = \int_t^{t+h} f(s)\, ds = \int_{\mathbb{R}} \chi_{[t,t+h]}(s)\, f(s)\, ds$$
for $\chi_{[t,t+h]}$ the characteristic function of the interval $[t, t+h]$. Then as $h \to 0$, $\chi_{[t,t+h]}(s)\, f(s) \to 0$ almost everywhere on $\mathbb{R}$. For small $h$,
$$\left|\chi_{[t,t+h]}(s)\, f(s)\right| \le \left|\chi_{[t-1,t+1]}(s)\, f(s)\right|,$$
and the right-hand function is integrable since $f$ is locally $L^1$. Then the Dominated Convergence Theorem says that
$$g(t+h) - g(t) = \int_{\mathbb{R}} \chi_{[t,t+h]}(s)\, f(s)\, ds \to \int_{\mathbb{R}} 0\, ds = 0$$
as $h \to 0^+$. The case $h \to 0^-$ is similar. Thus $g(t+h) \to g(t)$ as $h \to 0$ and $g$ is continuous at each $t$. $\square$

Homework Problem 67. Let $f\colon \mathbb{R} \to \mathbb{R}$ be an $L^1$ function. Show that
$$\psi(t) = \exp\left(\int_0^t f(\tau)\, d\tau\right)$$
is a continuous function satisfying $\psi(0) = 1$, and that $\psi$ solves $\dot\psi = f(t)\,\psi$ in the sense of distributions. (Hint: approximate $f$ in $L^1$ by a sequence of $C^\infty$ functions.)

4.9 Sobolev spaces, distributions, and Fourier series

In this subsection, we provide some more background results about Sobolev spaces and distributions on $S^1$. First of all, we describe $\mathbb{C}$-valued distributions. A complex-valued distribution is a $\mathbb{C}$-linear map from $C^\infty(S^1,\mathbb{C})$ to $\mathbb{C}$.

Example 22. For $k \in \mathbb{Z}$, the map
$$\phi \mapsto \hat\phi^k = \int_{S^1} \phi\, e^{2\pi i k t}\, dt$$
is a distribution.

Proposition 87.
$$L^2_1(S^1,\mathbb{C}) = \left\{ \sum_{k\in\mathbb{Z}} f^k e^{2\pi i k t} : \sum_{k\in\mathbb{Z}} |f^k|^2(k^2+1) < \infty \right\}.$$
Moreover, the norm $\|f\|_{L^2_1}$ is equivalent to
$$\left(\sum_{k\in\mathbb{Z}} |\hat f^k|^2(k^2+1)\right)^{\frac12}.$$

Proof. First we show $\subset$. Let $f \in L^2_1(S^1,\mathbb{C})$ and compute
$$\dot f(e^{-2\pi i k t}) = \int_{S^1} \dot f(t)\, e^{-2\pi i k t}\, dt = -\int_{S^1} f(t)(-2\pi i k)\,e^{-2\pi i k t}\, dt = 2\pi i k\, \hat f^k.$$
Since $\dot f \in L^2$,
$$\sum_{k\in\mathbb{Z}} 4\pi^2 k^2 |\hat f^k|^2 = \|\dot f\|^2_{L^2} < \infty \iff \sum_{k\in\mathbb{Z}} k^2|\hat f^k|^2 < \infty.$$
Now since $f \in L^2$ also, then
$$\sum_{k\in\mathbb{Z}} |\hat f^k|^2 < \infty \quad\text{and}\quad \sum_{k\in\mathbb{Z}} |\hat f^k|^2(k^2+1) < \infty.$$
This proves $\subset$. To show $\supset$, note that
$$\sum_{k\in\mathbb{Z}} |f^k|^2(k^2+1) < \infty \iff \sum_{k\in\mathbb{Z}} |f^k|^2 < \infty \quad\text{and}\quad \sum_{k\in\mathbb{Z}} k^2|f^k|^2 < \infty.$$
Therefore,
$$f = \sum_{k\in\mathbb{Z}} f^k e^{2\pi i k t} \in L^2,$$
and by the computations in the previous paragraph
$$\dot f^k \equiv \dot f(e^{-2\pi i k t}) = 2\pi i k\, f^k.$$
Consider a test function $\phi \in C^\infty(S^1,\mathbb{C})$. Then compute
$$\dot f(\phi) = -f(\dot\phi) = -\int_{S^1} f\,\dot\phi\, dt = -\langle f, \dot\phi\rangle_{L^2} = -\sum_{k\in\mathbb{Z}} f^k\, \overline{2\pi i k\, \hat\phi^k} = -\sum_{k\in\mathbb{Z}} (-2\pi i k)\, f^k\, \overline{\hat\phi^k} = \sum_{k\in\mathbb{Z}} \dot f^k\, \overline{\hat\phi^k} = \Big\langle \sum_{k\in\mathbb{Z}} \dot f^k e^{2\pi i k t},\, \phi\Big\rangle_{L^2}.$$
This shows that
$$\dot f = \sum_{k\in\mathbb{Z}} \dot f^k e^{2\pi i k t} = \sum_{k\in\mathbb{Z}} (2\pi i k)\, f^k e^{2\pi i k t}$$
in the sense of distributions. Therefore, both $f$ and $\dot f$ are in $L^2$, and thus $f \in L^2_1(S^1,\mathbb{C})$. The statement about equivalence of the norms follows easily. $\square$

Remark. Similar easy calculations show that
$$L^2_m(S^1,\mathbb{C}) = \left\{ \sum_{k\in\mathbb{Z}} f^k e^{2\pi i k t} : \sum_{k\in\mathbb{Z}} |f^k|^2(k^2+1)^m < \infty \right\}$$
for every $m = 0, 1, 2, \dots$. Our characterization of smooth functions in (48) above then shows that
$$C^\infty(S^1,\mathbb{C}) = \bigcap_{m=0}^\infty L^2_m(S^1,\mathbb{C}).$$
Proof: it is straightforward to show that $L^2_m(S^1,\mathbb{C})$ compactly embeds in $C^{m-1}(S^1,\mathbb{C})$ for all $m \ge 1$.

The Fourier series isometry between $L^2(S^1,\mathbb{C})$ and the sequence space $\ell^2 = L^2(\mathbb{Z},\mathbb{C})$ also allows us to define even more Sobolev spaces. For any $s \in \mathbb{R}$, define $L^2_s(S^1,\mathbb{C})$ to be the set of distributions $f$ which act on
$$\phi = \sum_{k\in\mathbb{Z}} \hat\phi^k e^{2\pi i k t}$$
by
$$f(\phi) = \sum_{k\in\mathbb{Z}} \hat f^k \hat\phi^k, \qquad (51)$$
where $\hat f^k = f(e^{2\pi i k t})$ and we assume that
$$\sum_{k\in\mathbb{Z}} |\hat f^k|^2(1+k^2)^s < \infty. \qquad (52)$$

Homework Problem 68. Show that if $\hat f^k$ is a sequence of complex numbers satisfying (52), then for any $\phi \in C^\infty(S^1,\mathbb{C})$, the sum in (51) converges.

Now we are able to put a topology on $C^\infty(S^1,\mathbb{C})$. We only describe this topology in terms of convergence of sequences. We say $\phi_j \to \phi$ in $C^\infty(S^1,\mathbb{C})$ if $\phi_j \to \phi$ in $L^2_m(S^1,\mathbb{C})$ for all $m \ge 0$.

Homework Problem 69. Show that $\phi_j \to \phi$ in $C^\infty(S^1,\mathbb{C})$ if and only if $\phi_j \to \phi$ in $C^p(S^1,\mathbb{C})$ for all $p \ge 0$.
Hint: You may use the fact that $L^2_m(S^1,\mathbb{C})$ embeds compactly into $C^{m-1}(S^1,\mathbb{C})$ for each $m \ge 1$. Also show that $C^p(S^1,\mathbb{C})$ embeds continuously into $L^2_p(S^1,\mathbb{C})$ for all $p \ge 0$.

Now we finally give the correct definition of complex distributions on $S^1$. A distribution on $S^1$ is a continuous $\mathbb{C}$-linear map from $C^\infty(S^1,\mathbb{C})$ to $\mathbb{C}$. Denote the space of complex distributions on $S^1$ by $\mathcal{D}'(S^1,\mathbb{C})$.

Proposition 88. $\mathcal{D}'(S^1,\mathbb{C}) = \bigcup_{m\in\mathbb{Z}} L^2_m(S^1,\mathbb{C})$, and the image of $\mathcal{D}'(S^1,\mathbb{C})$ under the Fourier transform is the set of all polynomially bounded complex sequences. In other words, it is the set of all sequences $\{f^k\}$ so that there are $m \in \mathbb{Z}$, $C > 0$ so that $|f^k| \le C(k^2+1)^{\frac m2}$ for all $k \in \mathbb{Z}$.

Proof. We prove the first equality, and leave the rest as an exercise. To prove $\supset$, if $f$ is in the union, then $f \in L^2_{-m}(S^1,\mathbb{C})$ for some positive $m$. To show $f \in \mathcal{D}'(S^1,\mathbb{C})$, consider a sequence of $\phi_j \to \phi$ in $C^\infty(S^1,\mathbb{C})$. Then by definition, $\phi_j \to \phi$ in $L^2_m$. Then
$$|f(\phi_j) - f(\phi)| = |f(\phi_j - \phi)| \le \sum_{k\in\mathbb{Z}} |\hat f^k(\hat\phi_j^k - \hat\phi^k)| = \sum_{k\in\mathbb{Z}} \frac{|\hat f^k|}{(1+k^2)^{\frac m2}}\, |\hat\phi_j^k - \hat\phi^k|\,(1+k^2)^{\frac m2} \le \left(\sum_{k\in\mathbb{Z}} \frac{|\hat f^k|^2}{(1+k^2)^m}\right)^{\frac12}\left(\sum_{k\in\mathbb{Z}} |\hat\phi_j^k - \hat\phi^k|^2(1+k^2)^m\right)^{\frac12}.$$
The second factor in the last line goes to zero by the remark after Proposition 87, while the first factor is finite by the fact $f \in L^2_{-m}$. Therefore, $f(\phi_j) \to f(\phi)$ for every test function $\phi$, and $f \in \mathcal{D}'(S^1,\mathbb{C})$.

We prove $\subset$ by contradiction. If $f \in \mathcal{D}'(S^1,\mathbb{C})$ is not in $L^2_m(S^1,\mathbb{C})$ for every $m \in \mathbb{Z}$, then for all $m \in \mathbb{Z}$,
$$\sum_{k=-\infty}^\infty |\hat f^k|^2(1+k^2)^m = \infty.$$
This implies that
$$\sup_{k\in\mathbb{Z}} |\hat f^k|^2(1+k^2)^m = \infty \quad\text{for all } m \in \mathbb{Z}.$$
(Proof of the contrapositive:
$$|\hat f^k|^2(1+k^2)^m \le C \implies \sum_{k\in\mathbb{Z}} |\hat f^k|^2(1+k^2)^{m-1} \le \sum_{k\in\mathbb{Z}} \frac{C}{1+k^2} < \infty.)$$
So for each $j$, there is a $k_j$ so that
$$\frac{|\hat f^{k_j}|}{(1+k_j^2)^{\frac j2}} \ge 1.$$
We may assume $k_j \ne 0$. Now we construct a sequence $\phi_j$ which converges to $0$ in $C^\infty(S^1,\mathbb{C})$, but for which $f(\phi_j) \not\to 0$. Define
$$\phi_j = \frac{\overline{\hat f^{k_j}}}{|\hat f^{k_j}|\,(1+k_j^2)^{\frac j2}}\, e^{2\pi i k_j t}.$$
Compute
$$\|\phi_j\|^2_{L^2_n} \approx \frac{(1+k_j^2)^n}{(1+k_j^2)^j} = (1+k_j^2)^{n-j},$$
where $\approx$ denotes equivalence of norms. For each fixed $n$, since each $k_j^2 \ge 1$, then
$$\lim_{j\to\infty} \|\phi_j\|^2_{L^2_n} = 0,$$
and so $\phi_j \to 0$ in $C^\infty(S^1,\mathbb{C})$. On the other hand,
$$f(\phi_j) = \hat f^{k_j}\cdot \frac{\overline{\hat f^{k_j}}}{|\hat f^{k_j}|\,(1+k_j^2)^{\frac j2}} = \frac{|\hat f^{k_j}|}{(1+k_j^2)^{\frac j2}} \ge 1.$$
So $f(\phi_j) \not\to 0 = f(0) = f(\lim \phi_j)$, where $\phi_j \to 0$ in $C^\infty(S^1,\mathbb{C})$. This contradicts the continuity of $f$. $\square$
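To get a concrete feel for the pairing (51), here is a small numerical sketch (not from the text; the particular sequences are arbitrary illustrative choices). A polynomially bounded sequence $f^k = k^2$, paired against coefficients $\phi^k = e^{-|k|}$ that decay faster than any power of $k$ — as the Fourier coefficients of a smooth test function do, by (48) — gives an absolutely convergent sum:

```python
import numpy as np

# f^k = k^2 is polynomially bounded (take m = 2 in Proposition 88);
# phi^k = e^{-|k|} decays faster than any power of k, like the
# coefficients of a smooth test function.
k = np.arange(-500, 501)
fk = k.astype(float) ** 2
phik = np.exp(-np.abs(k))
terms = np.abs(fk * phik)

# Absolute convergence of sum(f^k * phi^k): the tail past |k| = 50
# is negligible compared to the whole sum.
total = terms.sum()
tail = terms[np.abs(k) > 50].sum()
print(tail < 1e-15 * total)
```

The rapid decay of $\phi^k$ overwhelms any fixed polynomial growth of $f^k$, which is exactly the mechanism behind Homework Problem 68 and the $\supset$ half of Proposition 88.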