LECTURE NOTES MATH 5210

WILL FELDMAN

Abstract. These notes are a work in progress. There are sections / examples / proofs / exercises which have been started but not completed. Please read in those places with caution. Please let me know if you find any errors or typos, especially in sections that appear complete.

1. A Tiny Bit of General Measure Theory

In this section we will introduce the basic ideas of general measure theory. We will not go into any details on proofs, but only define the relevant structures and give several important examples.

Definition 1.1. Given a set X we call 2^X the power set of X, the set of all subsets of X. A collection Σ ⊂ 2^X is called a σ-algebra on X if:
(1) (Empty set and whole space) ∅, X ∈ Σ.
(2) (Complements) If E ∈ Σ then X \ E ∈ Σ.
(3) (Countable unions / intersections) If (E_j)_{j=1}^∞ are in Σ then ∪_{j=1}^∞ E_j ∈ Σ and ∩_{j=1}^∞ E_j ∈ Σ.

Example 1.2. We have already seen two interesting examples of σ-algebras on R^n, the Lebesgue measurable sets L(R^n) and the Borel measurable sets B(R^n).

Definition 1.3. Given a σ-algebra Σ on a set X, a map μ : Σ → [0, +∞] is called a positive measure on (X, Σ) if:
(1) (Empty set) μ(∅) = 0.
(2) (Monotonicity) For E ⊂ F in Σ, μ(E) ≤ μ(F).
(3) (Countable additivity) If (E_j)_{j=1}^∞ in Σ are mutually disjoint then
    μ(∪_{j=1}^∞ E_j) = ∑_{j=1}^∞ μ(E_j).

One can also naturally define signed measures, although some care is needed due to possible cancellations between +∞ and −∞. We will not go into more detail on this topic except to say that a natural class of signed measures to consider consists of measures of the form
    μ = μ⁺ − μ⁻
where μ⁺, μ⁻ are both positive measures on (X, Σ) with finite total mass μ±(X) < +∞.

Now we have enough background to define a measure space.

Definition 1.4. A measure space is a triplet (X, Σ, μ) where X is a set, Σ is a σ-algebra on X, and μ is a positive measure on (X, Σ). If we do not want to specify the measure we can call (X, Σ) a measurable space.
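As a toy illustration (my own sketch, not part of the notes), the counting measure on the power set of a small finite set can be checked against the three axioms in a few lines of Python; all names here are chosen for the example.

```python
from itertools import chain, combinations

X = {0, 1, 2}

def powerset(S):
    """All subsets of S -- the sigma-algebra 2^X of a finite set X."""
    s = list(S)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def mu(E):
    """Counting measure: mu(E) = number of points in E."""
    return len(E)

sigma = powerset(X)

# (1) Empty set: mu(emptyset) = 0
assert mu(frozenset()) == 0
# (2) Monotonicity: E subset of F implies mu(E) <= mu(F)
assert all(mu(E) <= mu(F) for E in sigma for F in sigma if E <= F)
# (3) Additivity over disjoint sets (the finite case of countable additivity)
assert all(mu(E | F) == mu(E) + mu(F) for E in sigma for F in sigma if not (E & F))
```

On a finite set every collection axiom is a finite check, which is why the infinite part of countable additivity is invisible here; it is the genuinely measure-theoretic ingredient in the general definition.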
Example 1.5. (Lebesgue measure) The fundamental example is of course the one we have studied significantly already, (R^n, L(R^n), m), the Lebesgue measure on the Lebesgue measurable subsets of R^n.

We will see several more examples soon, but first let's introduce the idea of a measurable function and the idea of the integral.

Definition 1.6. Given a pair of measurable spaces (X, Σ) and (Y, Γ), a map f : X → Y is called measurable if f⁻¹(E) ∈ Σ for all E ∈ Γ.

Typically we will take (Y, Γ) = (R, B(R)); this matches up with our definition of measurable functions before when (X, Σ) = (R^n, L(R^n)).

Given a non-negative real valued measurable function f : X → [0, +∞] we can define an integral exactly as we did for Lebesgue measure. We define the integral for measurable simple functions s : X → [0, +∞] first: for
    s(x) = ∑_{j=1}^J a_j χ_{E_j}(x) with E_j ∈ Σ,
define
    ∫_X s dμ = ∑_{j=1}^J a_j μ(E_j).
Then we extend to general non-negative measurable functions by
    ∫_X f dμ = sup{ ∫_X s dμ : s ≤ f simple }.
Of course we can also extend to absolutely integrable functions as well, in the same way we did in the Lebesgue theory.

Now we give a few more important examples of measures and their associated integrals.

Example 1.7. (Non-negative measurable functions) Another large class of examples of positive measures is given by the non-negative measurable functions on (R^n, L(R^n)). Given ρ : R^n → [0, +∞] measurable, define
    μ(E) = ∫_E ρ.
The integral in this case is
    ∫_{R^n} f dμ = ∫_{R^n} f ρ.

Example 1.8. (Delta mass) The famous δ-function of physics, the bane of math undergraduates, actually does make sense mathematically; it is just a measure, not a function. Actually, maybe even more correctly, it is a distribution, but this is an idea we will not explore in this course. We can define δ measures on any measurable space (X, Σ). Given a point x ∈ X and a set E ∈ Σ, define δ_x, the delta mass at x, by
    δ_x(E) = 1 if x ∈ E, and δ_x(E) = 0 if x ∉ E.
The integral in this case is
    ∫_X f dδ_x = f(x).

Example 1.9.
(Surface measure on a smooth graph) Consider a 2-dimensional surface in R^3 given by the graph of a smooth function f : R^2 → R,
    S = {x ∈ R^3 : x_3 = f(x_1, x_2)}.
Recall from calculus that the surface area of S above a region Ω ⊂ R^2 is given by
    ∫_Ω √(1 + ‖∇f‖²) dx_1 dx_2.
Define the projection P : R^3 → R^2, P(x_1, x_2, x_3) = (x_1, x_2). It turns out that the projection of any Borel measurable set in R^3 is a Lebesgue measurable set in R^2, so we can define the surface measure
    μ(E) := ∫_{P(E ∩ S)} √(1 + ‖∇f‖²) dx_1 dx_2.
This fact about projections is highly non-obvious though; the projection of a Lebesgue set in R^3 onto R^2 may not be measurable at all, and the projection of a Borel set may not be Borel. This example may provide some motivation to generalize the idea of Lebesgue theory: start with "simple" sets (boxes), then define an outer measure, and finally identify an appropriate class of measurable sets. In fact there is such a general theory for constructing measures starting from something called a pre-measure defined on a Boolean algebra - think of the volume of boxes (pre-measure), and the class of sets which are finite unions / intersections / complements of boxes (Boolean algebra).

Example 1.10. (Cantor measure) This measure requires a bit more work to construct so I am not including it in the notes at the moment; basically this is a measure which is "uniform" on the middle thirds Cantor set.

Example 1.11. (Probability measures) In probability theory measurable spaces are often denoted (Ω, F). A probability measure P on (Ω, F) is just a positive measure with total mass P(Ω) = 1, and the triplet (Ω, F, P) is called a probability space. Measurable sets are called events and measurable functions are called random variables, usually denoted by capital letters like X : Ω → R.
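As a small illustration (my own sketch, not from the notes): on a finite sample space every subset is an event, a probability measure is determined by its point masses, and a random variable is just a function on Ω.

```python
from fractions import Fraction

# Sample space for one roll of a fair die.
omega = [1, 2, 3, 4, 5, 6]
# A probability measure is determined by point masses summing to 1.
p = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(E) = sum of the point masses over the event E (a subset of omega)."""
    return sum(p[w] for w in event)

# A random variable X : omega -> R, here X(w) = w squared.
X = lambda w: w * w

def prob_X_in(a, b):
    """P(X in [a, b]) = P({w in omega : a <= X(w) <= b})."""
    return prob({w for w in omega if a <= X(w) <= b})

assert prob(set(omega)) == 1                # total mass P(Omega) = 1
assert prob_X_in(1, 10) == Fraction(1, 2)   # X(w) in [1, 10] iff w in {1, 2, 3}
```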
The fundamental computations of probability theory involve, at the most basic level, computing probabilities like
    P(X ∈ [a, b]) = P({ω ∈ Ω : X(ω) ∈ [a, b]})
for a real valued random variable X and an interval [a, b] ⊂ R.

2. Basics of Functional Analysis

In this lecture we will explore some new types of spaces, namely normed spaces and inner product spaces, with examples. Besides the metric space, which we have already studied a lot in this class, the new types of spaces that we will introduce here will all be vector spaces over R or C. In the following diagram each row introduces additional structure:

    Vector Spaces
    ↑
    Normed Spaces
    ↑
    Inner Product Spaces
    ↑
    Finite Dimensional Euclidean Space.

As we will see, every normed space also has a canonically associated metric space structure, and so all of the bottom three types of space in the chart above are also naturally metric spaces.

Remark 2.1. There is another important class of spaces which have more structure than vector spaces, but less than normed spaces, known as topological vector spaces. Within this class there are also many distinctions; one of the most important is the notion of Fréchet space (these are complete metric spaces as well). This idea comes up rather naturally, for example if you want to view the space C∞(R^n → R) as a metric space.

2.1. Normed Spaces.

Definition 2.2. Suppose that V is a vector space over R or C. We call a map ‖·‖ : V → [0, ∞) a norm if all of the following properties hold:
(1) (Positivity) For all x ∈ V we have ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0.
(2) (Scaling) For any α ∈ R (or C) and x ∈ V we have ‖αx‖ = |α|‖x‖.
(3) (Triangle inequality) For any x, y ∈ V we have ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Note that every norm naturally gives rise to a metric d(x, y) = ‖x − y‖. The scaling property implies the symmetry of the metric, and this scaling property is also where norms are really more special than general metrics.
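A quick numerical sanity check of the norm axioms and of the induced metric (my own sketch, not part of the notes), using the ℓ¹ norm on R³:

```python
import random

def norm1(x):
    """The l1 norm on R^n: sum of absolute values of the coordinates."""
    return sum(abs(t) for t in x)

def d(x, y):
    """Metric induced by the norm: d(x, y) = ||x - y||."""
    return norm1([a - b for a, b in zip(x, y)])

random.seed(0)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = [random.uniform(-1, 1) for _ in range(3)]
    a = random.uniform(-2, 2)
    # Scaling: ||a x|| = |a| ||x||  (up to floating point error)
    assert abs(norm1([a * t for t in x]) - abs(a) * norm1(x)) < 1e-12
    # Triangle inequality: ||x + y|| <= ||x|| + ||y||
    assert norm1([s + t for s, t in zip(x, y)]) <= norm1(x) + norm1(y) + 1e-12
    # Symmetry of the induced metric, which follows from scaling with a = -1
    assert abs(d(x, y) - d(y, x)) < 1e-12
```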
Of course, for the scaling property to even make sense we use the underlying vector space structure. Many of the examples of metric spaces that we have seen in this class are actually normed spaces. Let's see several examples.

Example 2.3. Euclidean space R^n with the Euclidean norm
    ‖x‖_2 = (∑_{j=1}^n x_j²)^{1/2},
or any of the ℓ^p norms
    ‖x‖_p = (∑_{j=1}^n |x_j|^p)^{1/p} and ‖x‖_∞ = max_{1≤j≤n} |x_j|.
Of course we have mostly been interested in the cases p ∈ {1, 2, ∞}. One can also make natural analogues of these norms for the complex vector space C^n.

Example 2.4. (Sequence spaces) We have seen several times on the homework the spaces, for 1 ≤ p < +∞,
    ℓ^p(N → R) = {x : N → R | ∑_{j=1}^∞ |x_j|^p < +∞}
and
    ℓ^∞(N → R) = {x : N → R | sup_j |x_j| < +∞}.
We mostly studied the cases p ∈ {1, ∞}. It is some work to establish the triangle inequality for 1 < p < ∞; we will handle this later in these notes, at least for the case p = 2.

Example 2.5. (Continuous functions) Given a metric space (X, d) the space of real valued continuous functions
    C(X → R) = {f : X → R | f continuous}
is naturally a normed space with
    ‖f‖_sup = sup_{x∈X} |f(x)|.

Example 2.6. (C^k spaces) If the domain X = [0, 1] we can also define a natural normed space of k-times differentiable functions
    C^k([0, 1] → R) = {f : [0, 1] → R | f is k-times differentiable and f^(k) is continuous},
with the norm
    ‖f‖_{C^k} = ∑_{j=0}^k ‖f^(j)‖_sup;
here f^(0), the zeroth derivative, is just the function itself. An analogue of this space also makes sense on subdomains of R^n.

Example 2.7. (C∞ space) The space of smooth functions
    C∞([0, 1] → R) = ∩_{k=0}^∞ C^k([0, 1] → R)
is also naturally a metric space (it is complete so we call it a Fréchet space actually), although it is not a normed space. We say that a sequence f_n ∈ C∞ converges to f ∈ C∞ if every derivative f_n^(k) converges uniformly to f^(k).
You may find it interesting to think about how to come up with a metric consistent with this notion of convergence. Spoiler:
    d_{C∞}(f, g) = ∑_{k=0}^∞ 2^{−k} ‖f^(k) − g^(k)‖_sup / (1 + ‖f^(k) − g^(k)‖_sup).

Example 2.8. (L¹ space) The space of absolutely integrable functions on R^n is (almost) another normed space:
    L¹(R^n → R) = {f : R^n → R | f is absolutely integrable}
with the norm
    ‖f‖_{L¹} = ∫_{R^n} |f(x)| dx.
It is fairly straightforward to check all the properties of the norm (do it!), except that ‖f‖_{L¹} = 0 only implies that f = 0 almost everywhere. However this is a problem which we have seen and dealt with before; we simply need to mod out by the equivalence relation "f = g almost everywhere". There are also natural L^p analogues of this space; we will postpone discussion of those. For 1 < p < ∞ the triangle inequality is a bit trickier to prove, and for p = ∞ we will have to do a bit of measure theoretic thinking to even define the norm.

There is a special name when the metric space induced by the norm is complete:

Definition 2.9. If (V, ‖·‖) is a complete normed space then we call it a Banach space.

Exercise 2.10. Show that a normed vector space (V, ‖·‖) is complete if and only if any series ∑_{j=1}^∞ v_j which is absolutely summable, i.e. ∑_{j=1}^∞ ‖v_j‖ < +∞, converges in V.

All of the examples we gave above are Banach spaces, most of them infinite dimensional. We will need to establish that L¹(Ω → R) is complete since that is something we have not done yet, but we will save that for a bit later.

2.2. Inner product spaces.

As we know from calculus, Euclidean space actually has more structure than just a norm, it has the dot product: for x, y ∈ R^n,
    x · y = ∑_{j=1}^n x_j y_j.
The notion of inner product is a generalization of the idea of dot product. There is also a more significant difference between real and complex inner products, so we will give the real case first and then the complex case.

Definition 2.11.
Let V be a vector space over R. A map ⟨·,·⟩ : V × V → R is called a real inner product if it satisfies the following properties:
(1) (Symmetry) For all x, y ∈ V, ⟨x, y⟩ = ⟨y, x⟩.
(2) (Linearity in second entry) For all x, y, z ∈ V and a ∈ R, ⟨x, ay + z⟩ = a⟨x, y⟩ + ⟨x, z⟩.
(3) (Positivity) For all x ∈ V, ⟨x, x⟩ ≥ 0 with equality if and only if x = 0.

Note that, by symmetry and linearity in the second entry, a real inner product is actually bilinear, i.e. linear in both entries.

The definition of complex inner product is almost the same, replacing only symmetry with conjugate symmetry.

Definition 2.12. Let V be a vector space over C. A map ⟨·,·⟩ : V × V → C is called a complex inner product if it satisfies the following properties:
(1) (Conjugate symmetry) For all x, y ∈ V, ⟨x, y⟩ = \overline{⟨y, x⟩}, where z̄ = u − iv is the complex conjugate of a complex number z = u + iv.
(2) (Linearity in second entry) For all x, y, z ∈ V and a ∈ C, ⟨x, ay + z⟩ = a⟨x, y⟩ + ⟨x, z⟩.
(3) (Positivity) For all x ∈ V, ⟨x, x⟩ ≥ 0 with equality if and only if x = 0; note we are using the shorthand that ⟨x, x⟩ ≥ 0 says that ⟨x, x⟩ ∈ R and is non-negative.

Note that, by conjugate symmetry and linearity in the second entry, a complex inner product is conjugate linear in its first entry. Also note that different sources may take the convention of linearity in the first entry and conjugate linearity in the second entry.

One of the key important notions in inner product spaces is orthogonality. A pair of vectors x, y ∈ V are orthogonal if ⟨x, y⟩ = 0.

It turns out that every inner product gives rise to a norm, and hence also to a metric,
    ‖x‖ = ⟨x, x⟩^{1/2}.
Most of the properties of the norm follow directly from the definition, but we need to prove the triangle inequality, which we do below. First, however, we prove another extremely important inequality called the Cauchy-Schwarz inequality.

Lemma 2.13.
If (V, ⟨·,·⟩) is an inner product space then for any x, y ∈ V,
    |⟨x, y⟩| ≤ ‖x‖‖y‖,
where ‖·‖ is the norm associated with the inner product as defined above.

Sketch. Let x, y ∈ V. By positivity
    0 ≤ ⟨x + λy, x + λy⟩
for all λ ∈ R (or C if we are in the complex case). Then expand out the right hand side using bilinearity and use calculus to choose a good value of λ which minimizes the right hand side (exercise; in the real case with y ≠ 0 the minimizing choice is λ = −⟨x, y⟩/‖y‖²).

Now we can use Cauchy-Schwarz to prove the triangle inequality.

Lemma 2.14. If (V, ⟨·,·⟩) is an inner product space then the associated norm ‖x‖ = ⟨x, x⟩^{1/2} satisfies the triangle inequality.

Sketch. Expand out ‖x + y‖² = ⟨x + y, x + y⟩ and use Cauchy-Schwarz on an appropriate term.

You might then ask if every norm actually comes from an inner product. This turns out not to be the case.

Exercise 2.15. Show that any norm ‖·‖ which arises from an inner product satisfies the parallelogram identity:
    ‖x + y‖² + ‖x − y‖² = 2‖x‖² + 2‖y‖².
Use this to check that one of the normed spaces defined in the previous section is not an inner product space.

Now let us see some examples of inner product spaces.

Example 2.16. (Euclidean space) The dot product on R^n is an inner product
    x · y = ∑_{j=1}^n x_j y_j.
On C^n we have the complex inner product
    x̄ · y = ∑_{j=1}^n x̄_j y_j.

Example 2.17. (ℓ²-space) Among the normed sequence spaces we introduced earlier, ℓ²(N → R) is also a real inner product space with
    ⟨x, y⟩_{ℓ²} = ∑_{j=1}^∞ x_j y_j.
There is also a natural complex analogue of this for ℓ²(N → C).

Example 2.18. (L²-space) We also have natural integral based inner products. Given any measurable domain Ω ⊂ R^n,
    L²(Ω → R) = {f : Ω → R | f measurable and |f|² is integrable}
with the inner product
    ⟨f, g⟩_{L²} = ∫_Ω f(x)g(x) dx.
Of course we need to ask ourselves if the product fg is absolutely integrable so that this inner product is actually defined.
Formally we can apply the Cauchy-Schwarz inequality to conclude
    ∫_Ω |fg| ≤ (∫_Ω |f|²)^{1/2} (∫_Ω |g|²)^{1/2} < +∞
since f and g are assumed to be square integrable in the definition of the space. This argument is not really just formal though, because the proof of Cauchy-Schwarz goes through when applied to ⟨|f|, |g|⟩_{L²}, which is an integral of a non-negative function and thus well-defined.

The complex version of the inner product on L²(Ω → C) is defined analogously:
    ⟨f, g⟩_{L²} = ∫_Ω \overline{f(x)} g(x) dx.

2.3. Schauder Bases.

As you may remember from your linear algebra class, the notions of basis and dimension play a major role in the study of vector spaces. Let's recall the classical definition of a basis.

Definition 2.19 (Hamel Basis). For a given vector space V over R (or C), a set of vectors B = {v_j}_{j∈J} is called a basis for V if every vector v ∈ V can be written as a unique finite linear combination of elements of B:
    v = ∑_{j∈J} a_j v_j for some a_j ∈ R (or C), with only finitely many a_j nonzero.
The dimension is then defined dim(V) := #(J); this could be +∞ but it does not depend on the choice of basis.

Of course most of the vector spaces that we are interested in in this class are actually infinite dimensional. It turns out that Hamel bases, although they do exist in any vector space, are not really the most natural concept of basis in Banach / Hilbert spaces. Since we do have a topology induced by the norm / inner product, and we are working in a complete space, we can make sense of absolutely summable linear combinations of basis elements. So it makes sense to define a notion of basis allowing for infinite linear combinations.

Definition 2.20 (Schauder Basis). Suppose that (V, ‖·‖) is a Banach space over R (or C). A countable set of vectors B = {v_j}_{j=1}^∞ is called a Schauder basis for V if for every v ∈ V there is a unique sequence of scalars (a_j)_{j=1}^∞ such that
    ‖v − ∑_{j=1}^N a_j v_j‖ → 0 as N → ∞.
The basis is called normalized if ‖v_j‖ = 1 for all j.
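A numerical illustration of this definition (my own sketch, not from the notes): for x_j = 1/j, which lies in ℓ², the partial sums of the standard basis expansion converge in norm, while for the constant sequence (1, 1, 1, …) in ℓ∞ the tail sup never drops below 1. The standard basis {e_j} appears as Example 2.23 below; here truncation at a large index stands in for an infinite sequence.

```python
import math

N_MAX = 10_000  # truncation level standing in for an infinite sequence

x = [1.0 / j for j in range(1, N_MAX + 1)]  # x_j = 1/j is in l2 since sum 1/j^2 < inf

def l2_tail(x, N):
    """||x - sum_{j<=N} x_j e_j||_2 = l2 norm of the tail (x_{N+1}, x_{N+2}, ...)."""
    return math.sqrt(sum(t * t for t in x[N:]))

def sup_tail(x, N):
    """The same tail measured in the sup norm of l-infinity."""
    return max(x[N:], default=0.0)

# In l2 the partial sums converge: the tail norm decreases toward 0.
assert l2_tail(x, 10) > l2_tail(x, 100) > l2_tail(x, 1000)

# For the constant sequence in l-infinity the tail sup is always 1:
# no finite linear combination of the e_j gets within distance less than 1.
ones = [1.0] * N_MAX
assert all(sup_tail(ones, N) == 1.0 for N in (10, 100, 1000))
```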
If ‖·‖ is derived from an inner product ⟨·,·⟩, i.e. V is actually a Hilbert space, and the basis has the property that ⟨v_i, v_j⟩ = 0 for i ≠ j, then we call the basis orthogonal. If a basis is both orthogonal and normalized it is called an orthonormal (Schauder) basis.

Exercise 2.21. Show that if (V, ‖·‖) has a Schauder basis then it is separable.

Exercise 2.22. Show that if (V, ‖·‖) is an infinite dimensional Banach space then any Hamel basis must be uncountable. Hint: The span of any finite subset of a Hamel basis B is a finite dimensional vector subspace of V. Show that the complement of this subspace is open and dense. Use the Baire Category theorem to get a contradiction in the case that B is countable.

Let's see some examples.

Example 2.23. (ℓ^p standard basis) In the sequence spaces ℓ^p(N → R) = {x : N → R : ‖x‖_{ℓ^p} < +∞} for 1 ≤ p < ∞ there is a natural Schauder basis which is just the obvious extension of the canonical basis for finite dimensional Euclidean spaces, B = {e_j}_{j=1}^∞ with
    e_j(n) = 1 if n = j, and e_j(n) = 0 else.
It is straightforward to check that this basis is normalized, and in the case of ℓ²(N → R) (which is a Hilbert space) the basis is orthonormal.

Exercise 2.24. Check that the Euclidean basis {e_j}_{j∈N} is a Schauder basis for ℓ^p(N → R) for 1 ≤ p < +∞. Why is it not a Schauder basis in the case p = +∞?

Example 2.25. (Schauder basis of C([0, 1])?) Recall the space of continuous functions on the unit interval C([0, 1] → R) with the uniform norm
    ‖f‖_sup = sup_{x∈[0,1]} |f(x)|.
We already know from the Weierstrass theorem that the set of polynomials is dense. This allowed us to prove that C([0, 1] → R) is separable. One might think that the monomials, for n ∈ {0} ∪ N,
    m_n(x) = x^n,
could form a Schauder basis because, as we know from Weierstrass, the finite linear combinations of the monomials are dense. However, any function which admits a representation f(x) = ∑_{n=0}^∞ a_n x^n, with the series converging uniformly on [0, 1], is actually infinitely differentiable at least on [0, 1). I won't go into the details of the proof here, but you can read Tao Chapter 4.1 and 4.2 to find the relevant results. In fact C([0, 1] → R) does indeed have a Schauder basis, but we will need to come back with a different idea.

2.4. Orthonormal Schauder Bases and Projections.

Now let me make a few additional notes about orthonormal bases in Hilbert spaces, because these have some additional nice properties.

Definition 2.26. If (H, ⟨·,·⟩) is a Hilbert space and G is a closed subspace, we call a bounded linear P_G : H → H the orthogonal projection onto G if P_G x ∈ G for all x ∈ H and x − P_G x ∈ G^⊥.

Lemma 2.27 (Existence of Orthogonal Projections). If (H, ⟨·,·⟩) is a Hilbert space and G is a closed subspace, there exists a bounded linear P_G : H → H, called the orthogonal projection onto G, such that P_G x ∈ G for all x ∈ H, x − P_G x ∈ G^⊥, and
    ‖x − P_G x‖ = min_{y∈G} ‖x − y‖.

Proof. The construction of the operator actually goes through the minimization principle. Let y_n be a minimizing sequence for min_{y∈G} ‖x − y‖, i.e. y_n ∈ G and
    ‖y_n − x‖ → α := min_{y∈G} ‖x − y‖.
Then, using the parallelogram law,
    ‖y_n − y_m‖² = ‖(y_n − x) − (y_m − x)‖² = 2‖y_n − x‖² + 2‖y_m − x‖² − ‖y_n + y_m − 2x‖².
Note that
    ‖y_n + y_m − 2x‖² = 4‖½(y_n + y_m) − x‖² ≥ 4α²
because ½(y_n + y_m) ∈ G as well and so has distance to x at least the minimum value α. Thus
    ‖y_n − y_m‖² ≤ 2‖y_n − x‖² + 2‖y_m − x‖² − 4α² → 0 as n, m → ∞.
Thus (y_n) is Cauchy; since G is a closed subset of a complete space it is convergent in G, and we define P_G x to be the limit of this sequence. By continuity of the norm with respect to convergence in the norm, ‖x − P_G x‖ = α.

Now for the orthogonality property. Let y ∈ G. Then by the minimality property again
    0 = d/dt|_{t=0} ‖x − P_G x + ty‖² = d/dt|_{t=0} (‖x − P_G x‖² + 2t⟨x − P_G x, y⟩ + t²‖y‖²) = 2⟨x − P_G x, y⟩,
which is the desired result.
Actually note that this argument also shows, conversely, that if x − P_G x ∈ G^⊥ then P_G x is the point in G closest to x.

Finally we check that P_G is a bounded linear operator. Of course ‖x − P_G x‖ ≤ ‖x‖ since 0 ∈ G was a valid test point for the minimization, so ‖P_G x‖ ≤ 2‖x‖. We also need to check linearity. I will leave this as an exercise; use the fact we just noted, that x − P_G x ∈ G^⊥ implies that P_G x minimizes min_{y∈G} ‖x − y‖.

Lemma 2.28. If (H, ⟨·,·⟩) is a Hilbert space, G is a closed subspace, and B = {v_j}_{j=1}^∞ is an orthonormal basis for G, then, for any x ∈ H,
    P_G x = ∑_{j=1}^∞ ⟨v_j, x⟩ v_j,
with the sum converging in the norm topology. In particular if G = H then x = ∑_{j=1}^∞ ⟨v_j, x⟩ v_j for all x ∈ H.

Proof. Since B is a basis for G there are some coefficients α_j so that
    P_G x = ∑_{j=1}^∞ α_j v_j
with the sum converging in the norm topology. Then we compute the inner product
    ⟨v_i, P_G x⟩ = ⟨v_i, ∑_{j=1}^∞ α_j v_j⟩ = ∑_{j=1}^∞ α_j ⟨v_i, v_j⟩ = α_i.
Interchanging the sum and inner product is justified because ⟨·, y⟩ is a norm continuous (Lipschitz in fact) function on H for any y ∈ H, since
    |⟨a, y⟩ − ⟨b, y⟩| = |⟨a − b, y⟩| ≤ ‖a − b‖‖y‖
by Cauchy-Schwarz.

I forgot the following argument in class, so if you were confused by this proof in lecture look here: Now we are almost done, we just need to see that
    α_i = ⟨v_i, P_G x⟩ = ⟨v_i, x⟩ + ⟨v_i, P_G x − x⟩ = ⟨v_i, x⟩
since x − P_G x is orthogonal to every vector in G; in particular it is orthogonal to v_i, so we have just added 0 to the RHS in the second equality.

Corollary 2.29 (Criterion for being a Schauder basis). If (H, ⟨·,·⟩) is a Hilbert space and B = {v_j}_{j=1}^∞ is an orthonormal system in H, then B is a Schauder basis for G := the closure of span({v_j}_{j=1}^∞), and if, moreover, G^⊥ = {0} then B is a Schauder basis for H. In other words, if
    ⟨x, v_j⟩ = 0 ∀j ⟹ x = 0,
then B is a Schauder basis for H.

Exercise 2.30. Prove Corollary 2.29.
(It is really a corollary of the proof of Lemma 2.28, so you will need to go into that proof a bit and change things to match the new assumptions here.)

Theorem 2.31 (Plancherel/Parseval identity). If (H, ⟨·,·⟩) is a Hilbert space and B = {v_j}_{j=1}^∞ is an orthonormal basis for H, then, for any x, y ∈ H,
    ⟨x, y⟩_H = ∑_{j=1}^∞ ⟨v_j, x⟩⟨v_j, y⟩,
and in particular
    ‖x‖²_H = ∑_{j=1}^∞ |⟨v_j, x⟩|².

Proof. Formally we just expand out using the basis expansions of x and y:
    ⟨x, y⟩ = ⟨∑_j ⟨v_j, x⟩v_j, ∑_i ⟨v_i, y⟩v_i⟩ = ∑_{i,j} ⟨v_j, x⟩⟨v_i, y⟩⟨v_j, v_i⟩.
Finally we use orthonormality, so that ⟨v_i, v_j⟩ = 0 unless i = j, in which case it is 1. The key analysis point here is that the infinite sum converges in the norm and ⟨·,·⟩ is norm continuous in both entries due to the Cauchy-Schwarz inequality. I recommend making this argument carefully on your own with ε and N.

Example 2.32. (Orthogonal polynomials) In our first example we use the Weierstrass theorem to construct an orthonormal basis for L²([0, 1] → R) of orthogonal polynomials. The construction is really just the classical Gram-Schmidt procedure applied to the monomials. We start out with
    P_0(x) = 1
and then we define iteratively
    Q_n(x) = x^n − ∑_{j=0}^{n−1} (∫_0^1 x^n P_j(x) dx) P_j(x),
i.e. this is exactly the orthogonal projection of x^n onto the orthogonal complement of the span of P_0, …, P_{n−1}. Then we normalize
    P_n(x) = Q_n(x) / (∫_0^1 Q_n(x)² dx)^{1/2}.
By the construction we have an orthonormal system; now we need to check that it is a basis. It is worth noting that span(P_0, …, P_n) is exactly the space of polynomials of degree at most n, and so span(P_0, P_1, …) is the space of polynomials on [0, 1]. For this we will use the criterion of Corollary 2.29. Suppose there is f ∈ L²([0, 1] → R) such that
    ∫_0^1 f(x)P_n(x) dx = 0 for all 0 ≤ n < ∞.
We aim to show that f = 0 almost everywhere; if we can do this then we will know that {P_n} are an orthonormal Schauder basis of L²([0, 1] → R).
Suppose otherwise, that ‖f‖_{L²([0,1])} > 0. As shown on your homework (more or less HW6 problem 1) the continuous functions C([0, 1] → R) are dense in L²([0, 1] → R) in the L² norm. So we can find g ∈ C([0, 1] → R) with
    ‖f − g‖_{L²([0,1])} ≤ (1/4)‖f‖_{L²([0,1])}.
By the Weierstrass theorem there is a polynomial Q on [0, 1] such that
    ‖g − Q‖_sup ≤ (1/4)‖f‖_{L²([0,1])}
and so
    ‖g − Q‖_{L²([0,1])} = (∫_0^1 |g(x) − Q(x)|² dx)^{1/2} ≤ (1/4)‖f‖_{L²([0,1])}.
Then also Q is a good approximation of f in L², by the triangle inequality:
    ‖Q − f‖_{L²([0,1])} ≤ ‖Q − g‖_{L²([0,1])} + ‖g − f‖_{L²([0,1])} ≤ (1/2)‖f‖_{L²([0,1])}.
Since Q is a polynomial it can be written as a finite linear combination of the P_j:
    Q(x) = ∑_{j=0}^n α_j P_j(x).
Then
    0 = ⟨f, Q⟩_{L²([0,1])} = ⟨f, f + Q − f⟩_{L²([0,1])} = ‖f‖²_{L²([0,1])} + ⟨f, Q − f⟩_{L²([0,1])}
      ≥ ‖f‖²_{L²([0,1])} − ‖f‖_{L²([0,1])} ‖Q − f‖_{L²([0,1])} ≥ ‖f‖²_{L²([0,1])} − (1/2)‖f‖²_{L²([0,1])} > 0.
This is a contradiction of ‖f‖_{L²([0,1])} > 0.

Example 2.33. (Fourier basis of L²(T)) We will devote the next section to this topic.

Example 2.34. (Haar basis of L²(R)) The Haar basis is based around indicator functions of dyadic intervals at decreasing scales. At the coarsest scale we simply take the indicators of the dyadic intervals, but at all subsequent scales we take a certain linear combination in order to maintain orthogonality with the previous scales. We define for k ∈ Z
    φ_{0,k}(x) = χ_{[k,k+1]}(x).
Then for n ≥ 1
    φ_{n,k}(x) = 2^{(n−1)/2} (χ_{2^{−n}[2k,2k+1]}(x) − χ_{2^{−n}[2k+1,2k+2]}(x)).
Try drawing the graphs of φ_{0,0}, φ_{1,0} and φ_{2,0} to get an idea of what is going on. The key point is that the φ_{n,k} are normalized. They are orthogonal for fixed n because the supports are disjoint as k varies. If for φ_{n₁,k₁} and φ_{n₂,k₂} the scales n₁ ≠ n₂, then, let's say n₂ > n₁, φ_{n₁,k₁} is constant (say equal to c) on the support of φ_{n₂,k₂}, and since n₂ > n₁ ≥ 0 we have
    ⟨φ_{n₂,k₂}, φ_{n₁,k₁}⟩ = c ∫ φ_{n₂,k₂} = 0.
Thus we at least have an orthonormal set of vectors in L²(R).
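The orthonormality computation above can be sanity checked numerically (my own sketch, not part of the notes); on a dyadic grid finer than the finest scale, left-endpoint Riemann sums integrate these step functions exactly.

```python
# Sanity check of Haar orthonormality, restricted to functions supported in [0, 1).
DEPTH = 6                 # check scales n = 0 .. DEPTH - 1
M = 2 ** (DEPTH + 2)      # grid points per unit; finer than the finest scale
h = 1.0 / M

def haar(n, k, x):
    """The Haar function phi_{n,k}; n = 0 gives the indicator of [k, k+1)."""
    if n == 0:
        return 1.0 if k <= x < k + 1 else 0.0
    s = 2.0 ** ((n - 1) / 2.0)
    lo, mid, hi = 2 * k * 2.0 ** -n, (2 * k + 1) * 2.0 ** -n, (2 * k + 2) * 2.0 ** -n
    if lo <= x < mid:
        return s
    if mid <= x < hi:
        return -s
    return 0.0

def inner(f, g):
    """<f, g> over [0, 1], exact for step functions with dyadic breakpoints."""
    return sum(f(i * h) * g(i * h) for i in range(M)) * h

# Indices (n, k) whose support lies in [0, 1): n = 0 with k = 0,
# and n >= 1 with 0 <= k < 2^(n-1).
fns = [(0, 0)] + [(n, k) for n in range(1, DEPTH) for k in range(2 ** (n - 1))]
for a in fns:
    for b in fns:
        expected = 1.0 if a == b else 0.0
        got = inner(lambda t: haar(*a, t), lambda t: haar(*b, t))
        assert abs(got - expected) < 1e-9
```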
Now we would like to show that this is indeed a Schauder basis. First let us note that indicator functions of all dyadic intervals are in the span of the (φ_{n,k}). I will leave this as an exercise; it is a simple inductive argument. Just start out by proving that you can write χ_{[0,1/2]} as a linear combination of φ_{0,0} and φ_{1,0}, and this will give you the idea.

Call G the closure of the set of finite linear combinations of {φ_{n,k}}_{n,k}. If we can show that G^⊥ = {0} then we will have G = H. Suppose that f ∈ G^⊥, that is,
    ⟨f, φ_{n,k}⟩ = 0 for all n ≥ 0 and k ∈ Z.
Then f has integral zero on every dyadic interval:
    ∫_{2^{−n}[k,k+1]} f = 0 for all n, k ∈ Z.
From this we can conclude that f has integral zero on every interval:
    ∫_{(a,b)} f = 0 for all (a, b) ⊂ R,
by (almost) covering the interval (a, b) by small dyadic intervals. Since every open set in R is a disjoint union of open intervals we also get
    ∫_U f = 0 for all U ⊂ R open and of finite measure.
We need to also assume finite measure because |f|² is integrable, so we only know that f is absolutely integrable on sets of finite measure (see Homework 6). Then for every measurable set E with finite measure there is a monotone decreasing sequence of open finite measure sets E ⊂ U_n such that m(U_n \ E) ↘ 0 as n → ∞. Then fχ_{U_n} → fχ_E pointwise almost everywhere and the functions are dominated by |f|χ_{U_1}, so by the dominated convergence theorem
    ∫_E f = lim_{n→∞} ∫_{U_n} f = 0.
Finally we take E_r = {f > 0} ∩ [−r, r], which is a finite measure set, and
    0 = ∫_{E_r} f = ∫_{E_r} |f|.
So f = 0 almost everywhere on E_r for all r, and so f = 0 almost everywhere on {f > 0}, i.e. {f > 0} has measure zero. A similar argument for {f < 0} finally shows that f = 0 almost everywhere.

2.5. Dual spaces.

Next we introduce a fundamental idea of linear spaces, the dual space.

Definition 2.35. If V is a vector space we call a linear mapping ℓ : V → R a linear functional on V.
If (V, ‖·‖_V) is a normed vector space and ℓ is a linear functional on V which is continuous, we call it a continuous linear functional on V. If ℓ is a linear functional on V with the property that
    ‖ℓ‖_{V*} := sup_{‖x‖≤1} |ℓ(x)| < +∞,
we call ℓ a bounded linear functional on V. The norm ‖·‖_{V*} defined above on bounded linear functionals on V is called the dual norm of ‖·‖_V.

Note that the notion of a bounded linear functional does not mean the functional is bounded on all of V, just on any finite radius ball. In fact the only linear functional which is bounded on all of V is the 0 map.

It turns out that boundedness and continuity are the same for linear functionals.

Exercise 2.36. If (V, ‖·‖_V) is a normed vector space and ℓ is a bounded linear functional on V then for all x ∈ V,
    |ℓ(x)| ≤ ‖ℓ‖_{V*}‖x‖.

Lemma 2.37. If (V, ‖·‖_V) is a normed vector space and ℓ is a linear functional on V then ℓ is bounded if and only if ℓ is continuous.

Proof. If ℓ is bounded then for any x, y ∈ V,
    |ℓ(x) − ℓ(y)| = |ℓ(x − y)| ≤ ‖ℓ‖_{V*}‖x − y‖,
i.e. ℓ is Lipschitz continuous with constant ‖ℓ‖_{V*}. On the other hand suppose that ℓ is continuous on V. Then ℓ is continuous at 0, and recall that ℓ linear means ℓ(0) = 0, so there exists δ > 0 so that |ℓ(x)| ≤ 1 for ‖x‖ ≤ δ. However, by linearity, for any x ∈ B(0, 1),
    |ℓ(x)| = |δ⁻¹ ℓ(δx)| ≤ δ⁻¹,
so ℓ is bounded.

Definition 2.38. If (V, ‖·‖_V) is a normed vector space we define the dual space
    V* = {ℓ : V → R | ℓ is a bounded linear functional on V}.

The dual space is also a normed vector space with a canonical norm inherited from duality.

Lemma 2.39. If (V, ‖·‖_V) is a normed vector space then the dual space V* is a normed vector space with the dual norm ‖·‖_{V*}.

Exercise 2.40. Prove Lemma 2.39, i.e. check that V* is a vector space and ‖·‖_{V*} is a norm.

The following theorem shows that there are at least some bounded linear functionals.

Theorem 2.41 (Hahn-Banach).
If W is a subspace of V and ℓ : W → R is a bounded linear functional on W, then there is an extension L : V → R, a bounded linear functional on V which agrees with ℓ on the subspace W, with
    ‖L‖_{V*} ≤ ‖ℓ‖_{W*}.

Proof. We will skip the proof; it uses Zorn's Lemma. The idea is to show first that one can extend ℓ by one dimension at a time without increasing the norm, and then use that the norm preserving extensions of ℓ form a partially ordered set (ordered by containment of the domain of the extension), so there is a maximal element.

Since V* is also a normed vector space, it itself also has a dual space (V*)*, which is called the double dual. Notice that every x ∈ V is actually a bounded linear functional on V* via the mapping ℓ ↦ ℓ(x). This means that every element of V is an element of the double dual space,
    V ⊂ V**.
(Technically we should think of this actually as a norm preserving isomorphism between V and a subset of V**.) It is sometimes the case that the inclusion is strict; the double dual V** is not equal (isomorphic) to the original space. If V** = V (technically I mean that the normed spaces are isomorphic) then we call V reflexive.

Hilbert spaces have the even more special property of being self-dual: the dual space V* is naturally isomorphic to V. This is known as the Riesz Representation Theorem.

Theorem 2.42 (Riesz Representation Theorem). If (H, ⟨·,·⟩) is a (real or complex) Hilbert space and ℓ ∈ H* is a bounded linear functional on H, then there exists y ∈ H such that
    ℓ(x) = ⟨y, x⟩ for all x ∈ H.

Proof. We just do the real Hilbert space case. First of all, if ℓ = 0 we are done, so we can assume that ℓ(x) ≠ 0 for some x ∈ H. Call ker(ℓ) = {x ∈ H : ℓ(x) = 0}. Since ℓ is continuous this is a closed subset of H, and since ℓ is linear it is a vector subspace.
We claim that

ker(ℓ)^⊥ := {y ∈ H : ⟨y, x⟩ = 0 for all x ∈ ker(ℓ)},

which is another vector subspace of H, is one-dimensional. Note that ker(ℓ)^⊥ ∩ ker(ℓ) = {0} since any x in the intersection would satisfy ‖x‖² = ⟨x, x⟩ = 0. First of all, the proof of the Hahn-Banach theorem also implies that there is at least one nontrivial vector in ker(ℓ)^⊥. If v₁ and v₂ are two nontrivial vectors inside of ker(ℓ)^⊥ then there is a nonzero λ ∈ ℝ with ℓ(v₁) = λℓ(v₂). Then by linearity ℓ(v₁ − λv₂) = 0, thus

v₁ − λv₂ ∈ ker(ℓ) ∩ ker(ℓ)^⊥ = {0}.

Therefore ker(ℓ)^⊥ is one-dimensional. Take now v to be a unit vector in ker(ℓ)^⊥; then for any x ∈ H we first note that x − ⟨v, x⟩v ∈ ker(ℓ), since that vector is orthogonal to v and (ker(ℓ)^⊥)^⊥ = ker(ℓ). Then

ℓ(x) = ℓ(x − ⟨v, x⟩v + ⟨v, x⟩v) = 0 + ℓ(⟨v, x⟩v) = ⟨v, x⟩ℓ(v).

So calling y = ℓ(v)v we get the result.

Example 2.43. (Dual norm of ℓ¹, ℓ², ℓ^∞ on ℝ^d) Let's study a bit the dual norms of ‖·‖_p on finite dimensional space. We will write ‖·‖_{p*} for the dual norm of ‖·‖_p.

First we claim that ‖·‖₁ and ‖·‖_∞ are dual norms on ℝ^d. Let ℓ be a bounded linear functional on ℝ^d. First of all notice that

ℓ(x) = ℓ(x₁e₁ + ··· + x_d e_d) = x₁ℓ(e₁) + ··· + x_d ℓ(e_d)

so we can represent ℓ by another vector in ℝ^d, namely ℓ = (ℓ₁, ..., ℓ_d) with ℓ_j = ℓ(e_j). Note that technically this is a linear bijection between (ℝ^d)* and ℝ^d itself. We compute

|ℓ(x)| = |Σ_{j=1}^d x_j ℓ_j| ≤ Σ_{j=1}^d |x_j ℓ_j| ≤ max_j |ℓ_j| Σ_{j=1}^d |x_j| = ‖ℓ‖_∞ ‖x‖₁.

This means that ‖ℓ‖_{1*} ≤ ‖ℓ‖_∞ and similarly ‖x‖_{∞*} ≤ ‖x‖₁. To get equality we just have to note that, for any dual vector ℓ, there is an x so that equality is achieved in the above inequality: precisely, take x_j = ℓ_j/|ℓ_j| for the j with |ℓ_j| = ‖ℓ‖_∞ and zero for the other j. A similar argument shows that ‖x‖_{∞*} = ‖x‖₁. As for the dual of the Euclidean norm ‖·‖₂, we can check that ‖x‖_{2*} = ‖x‖₂. I will leave this as an exercise.

Example 2.44.
(Dual of ℓ¹, ℓ², ℓ^∞ sequence spaces) Very similar arguments to those shown in the finite dimensional case above show that ℓ¹(ℕ → ℝ)* = ℓ^∞(ℕ → ℝ) and ℓ²(ℕ → ℝ)* = ℓ²(ℕ → ℝ). However things are more complicated for the dual of ℓ^∞: it turns out that the dual of ℓ^∞(ℕ → ℝ) is strictly larger than ℓ¹(ℕ → ℝ), although we will need the axiom of choice, by way of the Hahn-Banach Theorem, to construct such a linear functional.

First of all let us consider the following vector subspace of ℓ^∞(ℕ → ℝ)

X = {x ∈ ℓ^∞(ℕ → ℝ) | lim_{n→∞} x_n exists}.

This subspace is actually separable and hence is a nowhere dense subset of ℓ^∞(ℕ → ℝ). We define a bounded linear functional ϕ on X

ϕ(x) = lim_{n→∞} x_n.

The Hahn-Banach theorem says that ϕ has a bounded extension ϕ : ℓ^∞(ℕ → ℝ) → ℝ. I leave it as an exercise for you to show that ϕ cannot arise as an element of ℓ¹(ℕ → ℝ): more precisely, for ϕ with the above defining property there is no element y ∈ ℓ¹(ℕ → ℝ) such that

ϕ(x) = Σ_{n=1}^∞ y_n x_n

for every x ∈ ℓ^∞(ℕ → ℝ) (or even just for x ∈ X).

Example 2.45. (L^p spaces and duals) As it is a Hilbert space, L²(Ω → ℝ) is self dual. The dual of L¹(Ω → ℝ) is L^∞(Ω → ℝ). Similar to the case of ℓ^∞(ℕ → ℝ), the dual of L^∞(Ω → ℝ) is strictly larger than L¹(Ω → ℝ).

Let's discuss the proof that (L¹)* = L^∞. First of all any g ∈ L^∞(Ω → ℝ) defines a linear functional on L¹(Ω → ℝ) by

ϕ_g(f) = ⟨g, f⟩ = ∫_Ω g(x)f(x) dx

which is bounded because

|∫_Ω g(x)f(x) dx| ≤ ∫_Ω |g(x)f(x)| dx ≤ ‖g‖_{L^∞(Ω)} ∫_Ω |f(x)| dx.

Now if we take an arbitrary ϕ in the dual of L¹(Ω → ℝ) we need to find g ∈ L^∞(Ω) such that ⟨g, f⟩ = ϕ(f) for all f ∈ L¹(Ω). This requires a serious theorem of measure theory called the Radon-Nikodym Theorem. Basically the idea is to use that indicator functions of finite measure sets are in L¹(Ω), so ϕ defines a measure on the Lebesgue subsets of Ω

µ(E) = ϕ(χ_E) with |µ(E)| ≤ ‖ϕ‖_{L¹(Ω)*} m(E).
The idea then is to show that

g(x) := lim_{r→0} µ(B(x, r))/m(B(x, r))

exists at almost every x. This follows from the Radon-Nikodym Theorem. Taking the existence of g for granted, you can try to prove that ‖g‖_{L^∞} = ‖ϕ‖_{L¹(Ω)*}.

2.6. Weak and weak-* convergence. Unlike in the case of metric spaces, where there is really just one notion of convergence which emerges naturally from the metric structure, in the case of normed spaces (and actually other topological vector spaces as well) there are several other topologies / notions of convergence which arise from the norm.

Definition 2.46. In a Banach space (V, ‖·‖_V) a sequence (x_n)_{n=1}^∞ in V is said to converge weakly to x ∈ V, and we write x_n ⇀ x as n → ∞, if

ℓ(x_n) → ℓ(x) for all ℓ ∈ V*.

That is, the weak topology is the weakest topology that makes all of the bounded linear functionals on V continuous. There is a similar notion on the dual space, which actually turns out to be a bit better in some ways.

Definition 2.47. Suppose (V, ‖·‖_V) is a Banach space and (V*, ‖·‖_{V*}) is its dual space. A sequence (ℓ_n)_{n=1}^∞ in V* is said to converge weak-* to ℓ ∈ V*, and we write ℓ_n ⇀* ℓ as n → ∞, if

ℓ_n(x) → ℓ(x) for all x ∈ V.

If V is separable then the weak-* topology actually comes from a metric. Let X = (x_n)_{n=1}^∞ be a countable dense set in V; then define, for ϕ, ψ ∈ V*,

ρ(ϕ, ψ) = Σ_{n=1}^∞ 2^{−n} |ϕ(x_n) − ψ(x_n)| / (1 + |ϕ(x_n) − ψ(x_n)|).

The key feature of the weak-* topology is the following fundamental result: the closed unit ball in the V* norm is compact in the weak-* topology.

Theorem 2.48 (Banach-Alaoglu). If (V, ‖·‖) is a separable Banach space, (V*, ‖·‖_{V*}) is its dual space, and B is the closed unit ball in V*, then B is compact in the weak-* topology.

Proof. Subsequence diagonalization argument appears again! Let X = {x_α}_{α∈I} be a countable dense subset of V. Suppose ℓ_n is a sequence inside the closed unit ball of V*.
Then (ℓ_n(x_α))_{n=1}^∞ is a bounded sequence in ℝ for each α ∈ I. By subsequence diagonalization we can find a subsequence ℓ_{n_k} such that ℓ_{n_k}(x_α) converges for every α, and we define

ℓ(x_α) := lim_{k→∞} ℓ_{n_k}(x_α).

Right now ℓ : X → ℝ; we want to check that it is linear and extend it to the whole space. Since, for any x, y ∈ X,

|ℓ_n(x) − ℓ_n(y)| ≤ ‖x − y‖ for all n

(the ℓ_n lie in the unit ball of V*), the same estimate is true for ℓ. Thus ℓ is uniformly norm continuous on X so it has a unique continuous extension to X̄ = V. Now we check that ℓ_{n_k}(x) → ℓ(x) for all x ∈ V. Let ε > 0; there exists y ∈ X with ‖x − y‖ ≤ ε/3 and there exists K so that for all k ≥ K, |ℓ_{n_k}(y) − ℓ(y)| ≤ ε/3, so then

|ℓ_{n_k}(x) − ℓ(x)| ≤ |ℓ_{n_k}(x) − ℓ_{n_k}(y)| + |ℓ_{n_k}(y) − ℓ(y)| + |ℓ(y) − ℓ(x)| ≤ ε.

Finally we need to check that ℓ is linear. This is easy to check from the fact that it is a pointwise limit of linear maps.

Of course this (amazing) compactness comes with a price: the notion of convergence is much weaker than most we have looked at so far. Let's see some examples.

Example 2.49. (Oscillations) Let's work with L¹([0, 1]) and its dual space L^∞([0, 1]). Here is an example of a bounded sequence in L^∞([0, 1]) which does not have any strongly convergent subsequence, but does converge in the weak-* topology. Define

f_n(x) = sin(2πnx).

This sequence has supremum norm uniformly bounded by 1, but obviously is not convergent in L^∞, in any of the L^p norms, or pointwise almost everywhere. In fact it does not even have a subsequence converging in any of these modes. However, by Banach-Alaoglu it does have a subsequence converging weak-* in L^∞, and in fact we can check that the whole sequence converges weak-* to 0:

f_n ⇀* 0 as n → ∞.

The proof is quite technical measure theory, but the idea is typical: argue with a dense subset of L¹ first and then extend to all of L¹ by uniform continuity (of the maps g ↦ ⟨f_n, g⟩).
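The weak-* convergence in Example 2.49 can be sanity checked numerically; the sketch below pairs f_n(x) = sin(2πnx) against a fixed integrable function g (the test function √x and the quadrature grid are my choices for illustration, not from the notes).

```python
import numpy as np

# Numerical illustration (not a proof) of Example 2.49: for a fixed g in
# L^1([0,1]), the pairings <f_n, g> = ∫ sin(2*pi*n*x) g(x) dx tend to 0
# as n grows, even though ||f_n||_∞ = 1 for every n.
M = 200000
x = (np.arange(M) + 0.5) / M          # midpoint quadrature grid on [0,1]
g = np.sqrt(x)                        # an arbitrary integrable test function

def pairing(n):
    """Approximate <f_n, g> by the midpoint rule."""
    return np.mean(np.sin(2 * np.pi * n * x) * g)

values = [pairing(n) for n in (1, 10, 100, 1000)]
print(values)  # magnitudes shrink toward 0
```

Note that only the pairings against each fixed g converge; the functions f_n themselves do not converge in any of the stronger modes listed above.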
As we can see, weak-* convergence does not preserve the norm: in the above example we had a sequence of functions with unit norm whose weak-* limit was 0. One of the key facts about weak-* convergence, reminiscent of Fatou's Lemma, is that the norm can only decrease in the limit. We call this lower semi-continuity of the norm.

Lemma 2.50 (Lower semi-continuity of the norm). Suppose that (V, ‖·‖_V) is a Banach space and (V*, ‖·‖_{V*}) is its dual space. If ℓ_n is a sequence in the dual converging in the weak-* topology to ℓ then

‖ℓ‖_{V*} ≤ lim inf_{n→∞} ‖ℓ_n‖_{V*}.

Proof. Let x ∈ V with at most unit norm; then

|ℓ(x)| = lim_{n→∞} |ℓ_n(x)| ≤ lim inf_{n→∞} ‖ℓ_n‖_{V*}.

Then

‖ℓ‖_{V*} = sup_{‖x‖_V ≤ 1} |ℓ(x)| ≤ lim inf_{n→∞} ‖ℓ_n‖_{V*}.

2.7. Application of dual space ideas. In this section we consider the existence of a solution to the ODE boundary value problem

(2.1)  −∂_x(p(x)∂_x u) + q(x)u = 0 for x ∈ (0, 1), with u(x) = g(x) for x ∈ ∂(0, 1) = {0, 1}.

Here g : {0, 1} → ℝ is arbitrary, and p, q : [0, 1] → (0, ∞) are positive and continuous; to make this precise let us just assume λ ≤ p, q ≤ 1 for some λ > 0. This type of problem is called Sturm-Liouville; analogous PDE problems in higher dimensions also come up in many applications and are simply called divergence form elliptic boundary value problems.

We will take a functional analytic / variational approach to the problem. We define an associated inner product

⟨u, v⟩_X := ∫₀¹ p(x)∂_x u ∂_x v + q(x)uv dx.

This is a weighted version of the standard L² inner product; because of the positivity of the coefficients p and q one can check that it is indeed an inner product. The associated norm is, of course,

‖u‖²_X = ∫₀¹ p(x)(∂_x u)² + q(x)u² dx = ‖∂_x u‖²_{L²_p} + ‖u‖²_{L²_q}.

We define X to be the completion of C¹([0, 1]) under this norm.
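Although this section solves (2.1) by variational methods, the boundary value problem can also be approximated directly by a standard finite-difference scheme. The sketch below uses the illustrative coefficients p ≡ q ≡ 1 and boundary data g(0) = 0, g(1) = 1 (my choices, not from the notes), for which the equation reduces to u″ = u with exact solution sinh(x)/sinh(1).

```python
import numpy as np

# Finite-difference sketch for the Sturm-Liouville problem (2.1):
#   -(p u')' + q u = 0 on (0,1),  u(0) = g0, u(1) = g1,
# with centered differences and p evaluated at midpoints x_{i±1/2}.
# Illustrative data: p = q = 1, g0 = 0, g1 = 1 (exact: sinh(x)/sinh(1)).
N = 200
h = 1.0 / N
x = np.linspace(0.0, 1.0, N + 1)
p = np.ones(N + 1)
q = np.ones(N + 1)
g0, g1 = 0.0, 1.0

A = np.zeros((N - 1, N - 1))
b = np.zeros(N - 1)
for i in range(1, N):
    pm = 0.5 * (p[i - 1] + p[i])   # p(x_{i-1/2})
    pp = 0.5 * (p[i] + p[i + 1])   # p(x_{i+1/2})
    A[i - 1, i - 1] = (pm + pp) / h**2 + q[i]
    if i > 1:
        A[i - 1, i - 2] = -pm / h**2
    if i < N - 1:
        A[i - 1, i] = -pp / h**2
    if i == 1:                      # known boundary values go to the RHS
        b[i - 1] += pm / h**2 * g0
    if i == N - 1:
        b[i - 1] += pp / h**2 * g1

u = np.empty(N + 1)
u[0], u[-1] = g0, g1
u[1:-1] = np.linalg.solve(A, b)

exact = np.sinh(x) / np.sinh(1.0)
print(np.max(np.abs(u - exact)))    # O(h^2) discretization error
```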
Basically this space consists of functions which are square integrable on [0, 1] and have a square integrable derivative on [0, 1], although you have not yet studied precisely the sense of derivative that we mean here. We should note that, at first appearance, neither the equation nor the boundary condition in (2.1) makes sense on this space.

First let us make sense of the boundary condition. We note that functions in X are actually Hölder-1/2 continuous. Using the fundamental theorem of calculus and Cauchy-Schwarz, for any x, y ∈ [0, 1],

|u(x) − u(y)| = |∫_x^y ∂_x u(t) dt|
≤ (∫_x^y (∂_x u(t))² dt)^{1/2} |x − y|^{1/2}
≤ (∫₀¹ (∂_x u(t))² dt)^{1/2} |x − y|^{1/2}
≤ λ^{−1/2} (∫₀¹ p(t)(∂_x u(t))² dt)^{1/2} |x − y|^{1/2}
≤ λ^{−1/2} ‖u‖_X |x − y|^{1/2}.

At a technical level this argument was valid for functions in C¹([0, 1]); however, the result extends to the completion because a Cauchy sequence in the X-norm will be bounded in the X-norm and hence, by this estimate, equicontinuous. Then Arzelà-Ascoli, with a bit of extra work thinking about uniform boundedness, shows that there is a uniformly convergent subsequence and we can obtain this same Hölder continuity estimate for the pointwise limit. Thus functions in X are actually Hölder continuous and so point evaluation of the function (but not the derivative) does make sense in this space.

Thus we can define the subset of X (a convex subset but not a subspace)

X_g = {u ∈ X : u = g on ∂(0, 1)}.

Now we aim to solve the following minimization problem: find u ∈ X_g such that

‖u‖²_X = min_{v∈X_g} ‖v‖²_X =: α.

It turns out that solving this minimization problem is the same as solving the ODE (2.1), but we will explore this later. It also turns out that this is exactly a Hilbert space projection problem, so we could use the Hilbert projection theorem to find a minimizer, but let's use a different approach which relies on dual space ideas instead of the Hilbert structure. Let's take a minimizing sequence u_n ∈ X_g such that lim_{n→∞} ‖u_n‖²_X = α.
The sequence has bounded norm so, by the Hölder estimate again, we can take a subsequence (not relabeled) so that u_n → u uniformly on [0, 1]. In particular u ∈ X_g as well. Also, by Banach-Alaoglu, there is a further subsequence (again not relabeled) so that u_n ⇀* u in X as n → ∞. (A priori the weak-* limit could be different; it is a good exercise to check that if a sequence converges pointwise and in the weak-* sense in this space then the limits are the same.) By the lower semi-continuity of the norm

‖u‖²_X ≤ lim inf_{n→∞} ‖u_n‖²_X.

Thus

α ≤ ‖u‖²_X ≤ lim inf_{n→∞} ‖u_n‖²_X = α

and so we have found a minimizer.

Finally we explain what this minimization problem has to do with the original ODE. In words, the ODE (2.1) is the first derivative test for the minimization problem of this norm functional. We do a computation to check this under the assumption that the minimizer is C². In reality we have not shown the existence of a C² minimizer, so there is still work to be done. This is a typical situation with this kind of method: one finds a solution with less regularity than expected and there is more work to be done.

Suppose that our minimizer derived above is in fact in C²([0, 1]). Let v ∈ C₀¹([0, 1]) (i.e. v = 0 on the boundary) be another function. Note that the perturbations satisfy u + tv ∈ X_g for all t ∈ ℝ, so

F(t) := ‖u + tv‖²_X

has its minimum at t = 0. We compute the first derivative test

0 = (d/dt)|_{t=0} ‖u + tv‖²_X = 2⟨u, v⟩_X,

which says

0 = ∫₀¹ p(x)∂_x u ∂_x v + q(x)uv dx.

Now we integrate by parts in the first term and, using that v = 0 on ∂(0, 1),

0 = ∫₀¹ v(−∂_x(p(x)∂_x u) + q(x)u) dx.

Since −∂_x(p(x)∂_x u) + q(x)u is continuous and integrates to zero against every v ∈ C₀¹([0, 1]) we can prove that

−∂_x(p(x)∂_x u) + q(x)u = 0 for all x ∈ (0, 1),

i.e. we have solved our ODE!

3. Fourier Analysis on the Torus

As we have seen, L² and ℓ² spaces have a very special structure compared to other L^p or ℓ^p spaces: they are inner product spaces.
In this section we will explore an even more specialized duality structure which appears in L² or ℓ² spaces on T^n, ℝ^n and ℤ^n. At an elementary level we can think of Fourier duality as providing an interesting and natural Schauder basis for L²(T^n).

This material will be parallel to the material in Tao, Chapter 4. I highly recommend reading along in Tao's book as well; he is, to say the least, one of the world experts on Fourier analysis. However, in our class we have already studied measure theory and basic functional analysis, so we will be able to add some of that additional context in these notes.

3.1. The flat torus. We define the unit width flat torus T^d = ℝ^d/ℤ^d. By this we mean that we identify points x and y in ℝ^d which differ by an integer vector. There is a natural bijection between T^d and [0, 1)^d: for x ∈ T^d we write ϕ(x) ∈ [0, 1)^d for the unique element of that equivalence class in [0, 1)^d.

Definition 3.1. A function f : ℝ^d → ℝ is ℤ^d-periodic if f(x + k) = f(x) for all k ∈ ℤ^d.

ℤ^d-periodic functions on ℝ^d can naturally be viewed as functions on the torus ℝ^d/ℤ^d and vice versa. The torus T^d naturally inherits a metric from ℝ^d

d_{T^d}(x, y) = min_{k∈ℤ^d} d₂(x, y + k).

Exercise 3.2. (T^d, d_{T^d}) is a compact metric space.

The torus T^d also has a natural group structure coming from addition: x + y is the equivalence class of ϕ(x) + ϕ(y) in ℝ^d/ℤ^d. Writing it this way is confusing, but it does have all the appropriate properties of an addition operation. We can also make sense of differentiation for functions on T^d by just viewing them as ℤ^d-periodic functions on ℝ^d. Similarly we can define the integral: for any f : T^d → ℝ and Ω ⊂ T^d measurable

∫_Ω f(x) dx := ∫_{ϕ(Ω)} f(ϕ^{−1}(y)) dy

where ϕ is the canonical bijection ϕ : T^d → [0, 1)^d defined previously. Measurability of f and Ω is defined exactly so that f ∘ ϕ^{−1} and ϕ(Ω) are measurable on [0, 1)^d.
This may seem a bit burdensome, but in practice we will not need to be so pedantic and in most cases we can just identify T^d and [0, 1)^d without carefully writing the bijection ϕ.

Fourier analysis on T^d is closely connected with the L²(T^d) inner product: for f, g : T^d → ℂ

⟨f, g⟩_{L²(T^d)} = ∫_{T^d} f̄(x) g(x) dx,

where f̄ denotes the complex conjugate. Since we will always use this inner product in this section we will usually drop the subscript.

3.2. Fourier Transform. Let's just work on the one dimensional torus T = ℝ/ℤ; we will explain the generalization to higher dimensions at the end.

Definition 3.3. (Characters) Define the character with frequency n ∈ ℤ

e_n(x) = e^{2πinx} = cos(2πnx) + i sin(2πnx).

Note that these are all ℤ-periodic functions on ℝ, and thus, naturally, functions on the torus T. Also note that e_n(x)e_m(x) = e_{n+m}(x).

Definition 3.4. (Trigonometric polynomial) A function f ∈ C(T → ℂ) is called a trigonometric polynomial if it is a finite linear combination of characters

f(x) = Σ_{n=−N}^N c_n e_n(x).

Exercise 3.5. The characters form an orthonormal system:

⟨e_n, e_m⟩ = 1 if n = m, and 0 otherwise.

Exercise 3.6. For any trigonometric polynomial f(x) = Σ_{n=−N}^N c_n e_n(x),

‖f‖²_{L²} = Σ_{n=−N}^N |c_n|².

Definition 3.7. Given a function f ∈ L²(T → ℂ), i.e. a 1-periodic function on ℝ, we define the Fourier transform f̂ : ℤ → ℂ

f̂(k) = ⟨e_k, f⟩_{L²} = ∫_T e^{−2πikx} f(x) dx.

Given a sequence α ∈ ℓ²(ℤ → ℂ) we define the inverse Fourier transform

α̌(x) = Σ_{k∈ℤ} α_k e^{2πikx}.

If we knew that the orthonormal set {e_n}_{n∈ℤ} forms a Schauder basis for L²(T → ℂ) then, from the abstract results of the previous section,

f(x) = Σ_{k∈ℤ} f̂(k) e^{2πikx}

with the sum converging in L²(T → ℂ) (i.e. you should not necessarily interpret this pointwise at this stage). In other words, f is the inverse Fourier transform of its Fourier transform f̂.
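Definition 3.7 can be explored numerically. The sketch below approximates f̂(k) by a Riemann sum for a smooth periodic test function (my choice, for illustration) and checks that the partial Fourier sums converge to f in the L²(T) norm, very rapidly for a smooth f.

```python
import numpy as np

# Sketch: compute Fourier coefficients f^(k) = <e_k, f> by quadrature and
# verify L^2 convergence of the partial Fourier sums for a smooth test f.
M = 4096
x = np.arange(M) / M                     # uniform grid on [0,1)
f = np.exp(np.cos(2 * np.pi * x))        # smooth 1-periodic test function

def fhat(k):
    """f^(k) approximated by a Riemann sum on the grid."""
    return np.mean(f * np.exp(-2j * np.pi * k * x))

def partial_sum(N):
    s = np.zeros(M, dtype=complex)
    for k in range(-N, N + 1):
        s += fhat(k) * np.exp(2j * np.pi * k * x)
    return s

# discrete L^2(T) errors of the partial sums
err = [np.sqrt(np.mean(np.abs(f - partial_sum(N)) ** 2)) for N in (1, 4, 8)]
print(err)  # rapid (spectral) decay for smooth f
```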
Via the results in the section on orthonormal bases of Hilbert spaces, we can show that {e_n}_{n∈ℤ} is a Schauder basis for L²(T → ℂ) if we can show that

⟨f, e_n⟩ = 0 for all n ∈ ℤ  ⟹  f = 0 a.e.

This will follow if span({e_n}) is dense in L²(T). By the Stone-Weierstrass theorem span({e_n}) is dense in C(T) (in the uniform metric and hence also in the L² metric) and we have also shown that C(T) is dense in L²(T) (in the L² metric), so we find that span({e_n}) is dense in L²(T).

Taking these arguments, which were only outlined, for granted we also get the Plancherel formula, which says that the Fourier transform is a norm and inner product preserving isomorphism between L²(T → ℂ) and ℓ²(ℤ → ℂ):

Lemma 3.8 (Plancherel/Parseval formula). For f, g ∈ L²(T)

⟨f, g⟩_{L²(T)} = ⟨f̂, ĝ⟩_{ℓ²(ℤ)} and, in particular, ‖f‖_{L²(T)} = ‖f̂‖_{ℓ²(ℤ)}.

3.3. Properties of the Fourier Transform. The Fourier transform is linear and interacts in a very simple way with translation and differentiation; the interaction with products is a bit more complicated.

Proposition 3.9. If f, g ∈ L²(T → ℂ) and α ∈ ℂ:

(1) (Linearity) The Fourier transform is linear

(f + αg)^(k) = f̂(k) + α ĝ(k).

(2) (Translation) Translation on the physical side corresponds to modulation by a character on the Fourier side

(f(· + y))^(k) = e^{2πiky} f̂(k).

(3) (Modulation) Modulation by a character on the physical side corresponds to translation on the Fourier side

(e_ℓ f)^(k) = f̂(k − ℓ).

(4) (Differentiation) If f ∈ L²(T → ℂ) ∩ C¹(T → ℂ) then

(f′)^(k) = 2πik f̂(k).

(5) (Products) Products on the physical side correspond to convolutions on the Fourier side

(fg)^(k) = (f̂ ∗ ĝ)(k) = Σ_{ℓ∈ℤ} f̂(k − ℓ)ĝ(ℓ).

(6) (Convolutions) Convolutions on the physical side correspond to products on the Fourier side

(f ∗ g)^(k) = f̂(k)ĝ(k).

Remark 3.10. The derivative of f ∈ L²(T) may be defined via the Fourier transform even if f is not differentiable.
In particular, as long as k f̂(k) is square summable on ℤ, the series

g(x) = Σ_{k∈ℤ} 2πik f̂(k) e_k(x)

converges in L²(T). This is a notion of weak derivative.

3.4. Fourier transform of finite measures. If µ is a finite measure on (T, B(T)) (the Borel subsets of the torus) then we can still define the Fourier transform. The Fourier coefficients

µ̂(k) = ∫_T e^{−2πikx} dµ(x)

are perfectly well defined, since the characters e_k(x) are continuous and bounded.

Example 3.11 (δ-mass). One of the most interesting measures whose Fourier transform one can compute is the δ-mass. We simply compute

δ̂₀(k) = ∫_T e^{−2πikx} dδ₀(x) = 1.

The Fourier transform of the δ-mass centered at a general point x ∈ T, δ_x, can be computed using the rule for the Fourier transform of a translate.

This may give an idea for generating a sequence of approximate identities out of trigonometric polynomials. We simply take the Fourier transform of the δ-mass and cut off at some mode N:

D_N(x) = Σ_{n=−N}^N e^{2πinx} = e^{−2πiNx} (e^{2πi(2N+1)x} − 1)/(e^{2πix} − 1)

where we used the geometric sum formula. Multiplying numerator and denominator by e^{−πix} creates symmetric terms

D_N(x) = (e^{2πi(N+1/2)x} − e^{−2πi(N+1/2)x})/(e^{πix} − e^{−πix})

and now we can rewrite this as

D_N(x) = sin(2π(N + 1/2)x)/sin(πx).

This is called the Dirichlet kernel. It is a bit like an approximate identity, but it is highly oscillatory and its L¹ norm diverges as N → ∞. The analysis of the pointwise or pointwise a.e. convergence D_N ∗ f → f is a quite tricky topic; it is equivalent to the pointwise / pointwise a.e. convergence of Fourier series.

Let's try something similar, following Tao, but we will force non-negativity:

ρ_N(x) := (1/N) |Σ_{n=0}^{N−1} e^{2πinx}|² = (1/N) Σ_{n,m=0}^{N−1} e^{2πi(n−m)x} = Σ_{ℓ=−N}^{N} (1 − |ℓ|/N) e^{2πiℓx}.

3.5. Fourier transform decay, regularity, and convergence.
The rate of decay of the Fourier modes at ∞ in ℤ is closely connected with regularity properties (continuity, differentiability) of the function in physical space and with the mode of convergence of the Fourier series. Our first result adds just a small amount of decay: we consider absolutely summable instead of square summable sequences of Fourier modes.

Lemma 3.12. If α ∈ ℓ¹(ℤ → ℂ) (i.e. it is an absolutely summable sequence), then the Fourier series

α̌(x) = Σ_{k∈ℤ} α_k e^{2πikx}

converges uniformly on T and α̌ is continuous.

Proof. This is simply an application of the Weierstrass M-test; the partial sums are continuous functions and the uniform limit of continuous functions is continuous.

Now if we add even more decay on the Fourier side we can find higher regularity of the inverse transform.

Lemma 3.13. Suppose that (α_k)_{k∈ℤ} is a sequence of complex coefficients such that

Σ_{k∈ℤ} |k|^m |α_k| < +∞.

Then the Fourier series α̌(x) = Σ_{k∈ℤ} α_k e^{2πikx} converges uniformly and α̌ is m-times continuously differentiable with

α̌^{(ℓ)}(x) = Σ_{k∈ℤ} α_k (2πik)^ℓ e^{2πikx} for 0 ≤ ℓ ≤ m.

Proof. Exercise.

3.6. Applications of Fourier Analysis - PDE. The heat equation is a classical partial differential equation (PDE) model for the transfer of heat, or the diffusion of a large number of micro-particles undergoing Brownian motion. It is also a fundamental archetype in the world of PDE. Actually the development of Fourier series by Jean-Baptiste Joseph Fourier was exactly for the purpose of solving the heat equation.

We will look for a heat distribution function u(x, t), a real valued function of a space variable x ∈ T and a time variable t ∈ ℝ₊, solving the initial value problem

u_t − u_xx = 0 for (x, t) ∈ T × (0, ∞), with u(x, 0) = g(x),

where the initial data g is some function in L²(T → ℝ).

Classical approach. Fourier's idea (or at least my semi-modern interpretation of it) started with looking for a solution of the separated form

u(x, t) = T(t)X(x).
Plugging this form into the equation yields

T′(t)/T(t) = X″(x)/X(x).

Since the left hand side depends only on t and the right hand side depends only on x, both must be equal to a constant λ ∈ ℝ:

T′ = λT for t ∈ ℝ₊, and X″ = λX on T.

A different way of stating the second equation, which may be more clear, is that we look for a solution of X″ = λX on ℝ which is 1-periodic. This imposes some restriction on λ: if λ > 0 then the solutions of X″ = λX are of the form

X(x) = Ae^{λ^{1/2} x} + Be^{−λ^{1/2} x},

which are not 1-periodic functions on ℝ, and so we must throw this case out. Thus we arrive at the case λ = −ω² ≤ 0. Now the solutions of the second equation are of the form

X(x) = Ae^{iωx} + Be^{−iωx};

again, in order to be 1-periodic on ℝ we arrive at the restriction ω = 2πk for some k ∈ ℤ. Then the corresponding solution of the first equation is T(t) = Ce^{−(2πk)²t}, and our solution of the heat equation is

u(x, t) = De^{−(2πk)²t} e^{2πikx}.

This is of course just a special solution, with initial data forced on us to be the character e_k(x) for some k ∈ ℤ. However, you may observe, we have a bit more than that. The heat equation is linear so any linear combination of solutions is a solution. Thus actually we can solve the heat equation with any trigonometric polynomial as initial data. This is the origin of the idea of Fourier series: what if we could express an arbitrary periodic function as an (infinite) linear combination of these trigonometric functions?

Modern approach. Now, in our current situation, we already know about the Fourier transform on T, so let's approach the solution of the heat equation with this in mind. We simply take the Fourier transform of both sides of the heat equation to obtain the following equations for the Fourier modes:

û_t(k) + (2πk)² û(k) = 0 for (k, t) ∈ ℤ × (0, ∞), with û(k, 0) = ĝ(k).

This is a first order linear ODE for each k ∈ ℤ which we can solve to find

û(k, t) = ĝ(k)e^{−(2πk)²t} for (k, t) ∈ ℤ × [0, ∞).
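The mode-by-mode formula û(k, t) = ĝ(k)e^{−(2πk)²t} translates directly into an FFT-based numerical scheme; here is a sketch (the discontinuous step initial datum and the grid size are my illustrative choices).

```python
import numpy as np

# Sketch of the spectral solution of the heat equation on T:
#   u^(k, t) = g^(k) * exp(-(2*pi*k)^2 * t),
# implemented with the FFT; we check that the solution relaxes toward
# the mean g^(0) of the initial data.
M = 256
x = np.arange(M) / M
g = np.where(x < 0.5, 1.0, 0.0)          # discontinuous initial data

def heat(t):
    ghat = np.fft.fft(g) / M             # coefficients g^(k)
    k = np.fft.fftfreq(M, d=1.0 / M)     # integer frequencies
    decay = np.exp(-(2 * np.pi * k) ** 2 * t)
    return np.real(np.fft.ifft(ghat * decay) * M)

print(np.max(np.abs(heat(0.0) - g)))     # t = 0 recovers g exactly
print(np.max(np.abs(heat(0.5) - 0.5)))   # u is nearly flat at g^(0) = 1/2
```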
Then taking the inverse Fourier transform, i.e. writing out the Fourier series,

(3.1)  u(x, t) = Σ_{k∈ℤ} ĝ(k)e^{−(2πk)²t} e^{2πikx}.

This sum certainly converges in the L² norm, but actually the convergence is much better: even if ĝ were only bounded (i.e. the Fourier transform of a finite measure), û(k, t) decays extremely quickly in k for any t > 0. From this we can easily derive an important and fundamental property of the heat equation: instantaneous regularization. Informally speaking, even for quite singular initial data the solution of the heat equation exists and is smooth for all t > 0.

Lemma 3.14. If ĝ : ℤ → ℂ is bounded then the solution of the heat equation, u(x, t) defined in (3.1), is C^∞ for any t > 0.

Proof. Apply Lemma 3.13.

With the Fourier series it is also easy to understand the long time behavior of solutions of the heat equation. If ĝ : ℤ → ℂ is bounded then the solution of the heat equation, u(x, t) defined in (3.1), converges as t → ∞ to ĝ(0) = ∫_T g:

|u(x, t) − ĝ(0)| ≤ C‖ĝ‖_{ℓ^∞} e^{−(2π)²t} for t ≥ 1,

and we can even describe the asymptotic profile:

e^{(2π)²t}(u(x, t) − ĝ(0)) → ĝ(1)e^{2πix} + ĝ(−1)e^{−2πix} as t → ∞.

Since we assumed g is real valued we actually have the relation that ĝ(−k) is the complex conjugate of ĝ(k), so

ĝ(1)e^{2πix} + ĝ(−1)e^{−2πix} = 2Re(ĝ(1)e^{2πix}) = 2Re(ĝ(1)) cos(2πx) − 2Im(ĝ(1)) sin(2πx).

3.7. Schrödinger equation on the circle. The Schrödinger equation arrived on the scene much more recently than the heat equation, in the 20th century. It is a fundamental equation of quantum mechanics describing the evolution of the (complex, L²-normalized) wave function of a quantum particle. Given an initial datum g ∈ L²(T → ℂ) with ∫_T |g|² = 1, we look for a solution u(x, t) ∈ ℂ of the equation

iu_t − u_xx = 0 for (x, t) ∈ T × (0, ∞), with u(x, 0) = g(x).

Obviously this looks quite similar to the heat equation, but the appearance of the complex unit i completely changes the nature of the equation.
Nonetheless there are computational similarities and we can solve by the same Fourier method. Taking the Fourier transform of both sides of the equation we find

iû_t(k) + (2πk)² û(k) = 0 for (k, t) ∈ ℤ × (0, ∞),

and again we can integrate this simple first order ODE to find

û(k, t) = e^{i(2πk)²t} ĝ(k).

Of course the major difference here is that there is no additional decay in k! We are just multiplying by a complex phase, so |û(k, t)| = |ĝ(k)| for all t > 0. In particular, and this is essential to the quantum mechanical interpretation of the equation, we have

∫_T |u(x, t)|² dx = Σ_{k∈ℤ} |û(k, t)|² = Σ_{k∈ℤ} |ĝ(k)|² = ∫_T |g|² = 1 for all t > 0.

3.8. Making sense of nonlinear functions of the derivative. One interesting thing about the Fourier transform: since it diagonalizes differentiation, we can easily compute nonlinear functions of the derivative operator which would have been very mysterious on the physical side. For example let's write ∂_x for the derivative operator mapping C¹(T) → C(T). Many math students find the idea of fractional derivatives quite intriguing: what is the meaning of ∂_x^α f for α ∈ ℝ, or even of the absolute value of the derivative |∂_x|f? What about some other operators which arise from formally solving the heat or Schrödinger equations,

G_t f = e^{t∂_x²} f and U_t f = e^{−it∂_x²} f?

These operators are difficult to interpret on the physical side, but we can define them easily by using the Fourier transform. Given a function m defined on the values 2πik, k ∈ ℤ, we can define the Fourier multiplier operator

m(∂_x)f := Σ_{k∈ℤ} m(2πik) f̂(k) e^{2πikx}.

The operator makes sense on any function f which has sufficient decay so that m(2πik)f̂(k) is square summable over the integers. For example, for the fractional derivatives above, if |k|^α f̂ ∈ ℓ²(ℤ) then the fractional derivative is well defined in L²: ∂_x^α f ∈ L²(T).

4. Fourier analysis on ℝ

In this section I will introduce some ideas of Fourier analysis on Euclidean space.
The situation here is more complicated than on the torus and we need some more sophisticated functional analysis. In this case we cannot view the Fourier transform as simply a choice of Schauder basis for L²(ℝ), although there are certainly philosophical similarities. The functional analytic basis of the Fourier transform on ℝ starts with the space of Schwartz functions, which is a metrizable topological vector space.

4.1. Schwartz functions. We introduce the space of Schwartz functions S(ℝ), a space which is well suited for defining the Fourier transform. The space is defined by a collection of semi-norms.

Definition 4.1. If V is a vector space then a map [·] : V → [0, ∞) is called a semi-norm if it satisfies

(1) (Non-negativity) For all x ∈ V we have [x] ≥ 0.
(2) (Scaling) For any α ∈ ℝ (or ℂ) and x ∈ V we have [αx] = |α|[x].
(3) (Triangle inequality) For any x, y ∈ V we have [x + y] ≤ [x] + [y].

A semi-norm satisfies all the properties of a norm except that the semi-norm of some nonzero vectors may be zero. Of course, as we have seen, one can mod out by the equivalence relation induced by [x − y] = 0, but that is not actually our intention here.

We introduce a sequence of semi-norms which measure the differentiability and decay of a function f : ℝ → ℂ. The Schwartz semi-norms are defined by

[f]_{k,ℓ} = sup_{x∈ℝ} (1 + |x|²)^{k/2} |f^{(ℓ)}(x)| for 0 ≤ k, ℓ < ∞.

Note that these are actually norms for ℓ = 0, but for ℓ > 0 they are only semi-norms. In plain language: if [f]_{k,ℓ} < +∞ then f is ℓ-times differentiable and the ℓ-th derivative is bounded and decays at least as fast as (1 + |x|²)^{−k/2}. Now we define the Schwartz space

S(ℝ) = {f : ℝ → ℂ | [f]_{k,ℓ} < +∞ for all 0 ≤ k, ℓ < +∞}.

It is not hard to check that this is a vector space. We define a notion of convergence on S(ℝ): we say a sequence f_n → f in S(ℝ) if

(4.1)  [f_n − f]_{k,ℓ} → 0 as n → ∞ for all k, ℓ.
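To make the seminorms concrete, here is a sketch estimating [f]_{k,ℓ} for the Gaussian f(x) = e^{−x²}, a standard example of a Schwartz function. The grid truncation to [−10, 10] and the finite-difference derivatives are admitted numerical approximations, not part of the definition.

```python
import numpy as np

# Numerically estimate the Schwartz seminorms [f]_{k,l} of f(x) = exp(-x^2)
# on a truncated grid, taking the l-th derivative by repeated finite
# differences.  Every seminorm of a Gaussian is finite, as membership in
# S(R) requires.
x = np.linspace(-10.0, 10.0, 400001)
h = x[1] - x[0]
f = np.exp(-x ** 2)

def seminorm(f_vals, k, l):
    d = f_vals.copy()
    for _ in range(l):
        d = np.gradient(d, h)            # numerical derivative
    return np.max((1 + x ** 2) ** (k / 2.0) * np.abs(d))

vals = {(k, l): seminorm(f, k, l) for k in range(3) for l in range(3)}
print(vals)  # all finite; [f]_{0,0} is the sup of the Gaussian, namely 1
```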
We can actually define a metric which produces this notion of convergence, making S(ℝ) a metric space:

d_{S(ℝ)}(f, g) = Σ_{k,ℓ=0}^∞ 2^{−k−ℓ} [f − g]_{k,ℓ} / (1 + [f − g]_{k,ℓ}).

This notion of convergence makes S(ℝ) a complete metrizable topological vector space, i.e. it is a Fréchet space.

Exercise 4.2. Check that d_{S(ℝ)} is a metric and that f_n → f in the sense of (4.1) if and only if f_n → f in d_{S(ℝ)}. Check that S(ℝ) is complete in this metric.

The key property of the Schwartz space in relation to the Fourier transform is that S(ℝ) is closed under the operations of products, differentiation, convolution and multiplication by x.

Lemma 4.3. If f, g ∈ S(ℝ) then fg ∈ S(ℝ), f ∗ g ∈ S(ℝ), f′ ∈ S(ℝ), and xf ∈ S(ℝ).

4.2. Fourier transform on Schwartz space. The reason for the introduction of Schwartz space is that the Fourier transform acts extremely nicely on S(ℝ); in particular, the Fourier transform of a Schwartz function is another Schwartz function. For f ∈ S(ℝ) we define the Fourier transform

f̂(ξ) = ∫_ℝ e^{−2πiξx} f(x) dx.

This integral is actually well defined for any f ∈ L¹(ℝ), and so it is certainly well defined on Schwartz functions. We also introduce an alternative notation, which is useful when we want to view the Fourier transform as a linear operator on Schwartz space:

Ff = f̂.

The Fourier transform interacts nicely with many of the symmetries of ℝ.

Proposition 4.4. If f, g ∈ S(ℝ) and α ∈ ℂ:

(1) (Linearity) The Fourier transform is linear

F[f + αg] = Ff + αFg.

(2) (Translation) Translation on the physical side corresponds to modulation by a character on the Fourier side

F[f(x + x₀)](ξ) = e^{2πiξx₀} f̂(ξ).

(3) (Modulation) Modulation by a character on the physical side corresponds to translation on the Fourier side

F[e^{2πiξ₀x} f(x)](ξ) = (Ff)(ξ − ξ₀).

(4) (Differentiation) Differentiation on the physical side corresponds to multiplication by ξ on the Fourier side

F[f′](ξ) = 2πiξ F[f](ξ).
(5) (Multiplication by $x$) Multiplication by $x$ on the physical side corresponds to differentiation on the Fourier side: $\mathcal{F}(xf) = \frac{1}{2\pi i}(\mathcal{F}f)'(\xi)$.
(6) (Products) Products on the physical side correspond to convolutions on the Fourier side:
$$\mathcal{F}(fg) = \mathcal{F}[f] * \mathcal{F}[g] = \int_{\mathbb{R}} \hat{f}(\xi - \eta)\,\hat{g}(\eta)\,d\eta.$$
(7) (Convolutions) Convolutions on the physical side correspond to products on the Fourier side: $\mathcal{F}(f * g) = (\mathcal{F}f)(\mathcal{F}g)$.

Corollary 4.5. The Fourier transform of a Schwartz function is a Schwartz function.

5. $L^p$-spaces and other notions of convergence in measure theory

5.1. Notions of convergence in measure theory. Given a measurable domain $\Omega \subset \mathbb{R}^n$ and a sequence of measurable functions $f_n : \Omega \to \mathbb{R}$, we have already seen several notions of convergence $f_n \to f$ as $n \to \infty$. Often in measure theory we will identify functions which are equivalent almost everywhere (a.e.), that is $m(\{x : f(x) \neq g(x)\}) = 0$, and many of the notions of convergence will only uniquely identify the limit up to a set of measure zero.

The notions of pointwise and uniform convergence are by now well known to us; we restate them for context.

Definition 5.1. A sequence $f_n \to f$ pointwise in $\Omega$ if $|f_n(x) - f(x)| \to 0$ as $n \to \infty$ for all $x \in \Omega$.

Definition 5.2. A sequence $f_n \to f$ uniformly in $\Omega$ if
$$\sup_{x \in \Omega} |f_n(x) - f(x)| \to 0 \quad \text{as } n \to \infty.$$

As we know, uniform convergence is a very strong notion of convergence and implies pointwise convergence. Next we have a measure theoretic version of pointwise convergence, called convergence pointwise almost everywhere, which we have already introduced in class.

Definition 5.3. A sequence $f_n \to f$ pointwise a.e. in $\Omega$ if there is a set $E \subset \Omega$ with $m(\Omega \setminus E) = 0$ such that $|f_n(x) - f(x)| \to 0$ as $n \to \infty$ for all $x \in E$.

Naturally this notion is weaker than pointwise convergence, but it is still sufficient to apply the important integral convergence theorems of measure theory: the monotone convergence theorem and the dominated convergence theorem.
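A quick numerical illustration of the gap between Definitions 5.1 and 5.2 (the example $f_n(x) = x^n$ on $[0,1]$ is a standard one, chosen here for illustration and not taken from the notes): $f_n$ converges pointwise, but the supremum over $[0,1)$ never decays.

```python
import numpy as np

# f_n(x) = x^n on [0,1] converges pointwise (to 0 for x < 1, to 1 at x = 1),
# hence pointwise a.e. to 0, but NOT uniformly: sup over [0,1) stays near 1.
xs = np.linspace(0.0, 1.0, 10001)

for n in [1, 10, 100, 1000]:
    fn = xs**n
    print(n, "sup on [0,1):", np.max(fn[:-1]), "value at x=1/2:", 0.5**n)

# Each fixed x < 1 decays geometrically...
assert 0.5**1000 < 1e-300
# ...while points near 1 keep the supremum bounded away from zero.
assert np.max((xs[:-1])**1000) > 0.9
```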
Another natural notion of convergence is $L^1$-convergence.

Definition 5.4. A sequence $f_n$ of absolutely integrable functions converges to $f$ in $L^1$ on $\Omega$ if
$$\|f_n - f\|_{L^1} = \int_\Omega |f_n - f| \to 0 \quad \text{as } n \to \infty.$$

The relationship between pointwise a.e. convergence and $L^1$ convergence is less clear cut than the relationship between the previous notions. We have some information; the following is a corollary of the dominated convergence theorem:

Corollary 5.5. If $f_n : \Omega \to \mathbb{R}$ are absolutely integrable, $f_n \to f$ pointwise a.e. in $\Omega$, and there exists $F$ absolutely integrable such that $|f_n| \le F$ for all $n$, then $f_n \to f$ in $L^1$.

What about the other direction? Does convergence in $L^1$ imply convergence pointwise a.e.? The answer is no in general, but it turns out that if $\|f_n - f\|_{L^1} \to 0$ fast enough then pointwise a.e. convergence does happen.

Example 5.6 (Typewriter/Piano sequence). Let's start with an example where convergence in $L^1$ does not imply convergence pointwise almost everywhere. Define
$$f_{j,k}(x) = 1_{[j2^{-k}, (j+1)2^{-k}]}(x) \quad \text{for } 0 \le j < 2^k.$$
For each $k$ this is a sequence of indicators of dyadic intervals which traverses $[0,1]$ from left to right: for every $k$ and every $x \in [0,1]$ there is an appropriate $0 \le j < 2^k$ such that $f_{j,k}(x) = 1$. Now for $n \in \mathbb{N}$ define $k$ to be the largest natural number so that $2^k \le n$, define $j = n - 2^k$, and then
$$g_n(x) = f_{j(n),k(n)}(x).$$
This sequence traverses the dyadic subintervals of $[0,1]$ of length $2^{-k}$ from left to right, and then starts again at the left, traversing the dyadic subintervals of length $2^{-k-1}$. With some additional checking we can see that this sequence does converge to zero in $L^1$ but not pointwise almost everywhere.

Now we show that fast $L^1$ convergence implies pointwise a.e. convergence.

Lemma 5.7. Suppose that $f_n : \Omega \to \mathbb{R}$ are absolutely integrable and
$$\sum_{n=1}^{\infty} \|f_{n+1} - f_n\|_{L^1(\Omega)} < +\infty;$$
then $f_n$ converges pointwise a.e. and in $L^1$ to an absolutely integrable function $f : \Omega \to \mathbb{R}$.

Proof.
Our idea is to define the limit $f$ as the pointwise a.e. limit of the telescoping sums
(5.1) $\quad f_n = f_1 + \sum_{j=1}^{n-1} (f_{j+1} - f_j)$,
but we need to establish that this series actually does converge pointwise almost everywhere. We start by showing that the series is absolutely summable pointwise a.e.; define
$$g(x) = |f_1(x)| + \sum_{j=1}^{\infty} |f_{j+1}(x) - f_j(x)|.$$
This definition makes sense for each $x \in \Omega$ as an element of $[0,+\infty]$ since it is a sum of non-negative terms. By the summability assumption of the lemma, $g$ is absolutely integrable on $\Omega$:
$$\int_\Omega g = \|f_1\|_{L^1} + \sum_{j=1}^{\infty} \|f_{j+1} - f_j\|_{L^1} < +\infty.$$
Absolutely integrable functions are finite almost everywhere, so $m(\{g = +\infty\}) = 0$. Thus the following series does converge pointwise on the set $\{g < +\infty\}$, which has full measure in $\Omega$:
$$f(x) := f_1(x) + \sum_{j=1}^{\infty} (f_{j+1}(x) - f_j(x));$$
in particular $f_n \to f$ pointwise a.e. on $\Omega$. We know $f$ is absolutely integrable because $|f| \le g$.

We also need to show that $f_n \to f$ in $L^1$ norm. Note that every $|f_n - f| \le g$ (triangle inequality applied to (5.1)), so by the dominated convergence theorem
$$\int_\Omega |f_n - f| \to 0 \quad \text{as } n \to \infty. \qquad \square$$

Theorem 5.8. For $\Omega \subset \mathbb{R}^n$ measurable, $(L^1(\Omega), \|\cdot\|_{L^1(\Omega)})$ is complete.

Proof. Suppose $(f_n)_{n=1}^{\infty}$ is a Cauchy sequence in $L^1(\Omega)$. If $(f_n)_{n=1}^{\infty}$ has a convergent subsequence then the whole sequence converges, so we would be done. Because $\sup_{n,m \ge N} \|f_n - f_m\|_{L^1} \to 0$ we can choose a subsequence $f_{n_j}$ so that
$$\sum_{j=1}^{\infty} \|f_{n_{j+1}} - f_{n_j}\|_{L^1} < +\infty.$$
Now we can simply apply the result of Lemma 5.7 to the subsequence. $\square$

Extremely similar arguments work to show that $L^p(\Omega \to \mathbb{R})$ is complete for any $1 \le p < \infty$.

5.2. Continuous functions of compact support are dense in $L^1$. In this section we will show that continuous functions of compact support are dense in $L^1(\mathbb{R}^n \to \mathbb{R})$. This is also true for $L^p(\mathbb{R}^n \to \mathbb{R})$ for every $1 \le p < +\infty$. I will outline the ideas and leave most of the details as an exercise for the homework.
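A minimal numerical sketch of the density statement in one dimension (the ramp width `eps` and the helper name `trapezoid_bump` are illustrative, not from the notes): the indicator of $[0,1]$ is approximated in $L^1$ by continuous, compactly supported "trapezoid" functions, with $L^1$ error exactly $\varepsilon$ coming from two triangles of area $\varepsilon/2$.

```python
import numpy as np

def trapezoid_bump(x, eps):
    """Continuous with compact support: 0 outside [0,1], 1 on [eps, 1-eps],
    linear ramps of width eps near each endpoint of [0,1]."""
    return np.clip(np.minimum(x, 1.0 - x) / eps, 0.0, 1.0)

xs = np.linspace(-0.5, 1.5, 400001)
dx = xs[1] - xs[0]
indicator = ((xs >= 0.0) & (xs <= 1.0)).astype(float)

for eps in [0.1, 0.01, 0.001]:
    # Riemann sum approximation of the L^1 distance; the true value is eps
    err = np.sum(np.abs(indicator - trapezoid_bump(xs, eps))) * dx
    print(eps, err)
```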
Here is the outline: we will first show that indicator functions of finite measure sets can be approximated in $L^1$ norm by continuous compactly supported functions; then we will use the density of simple functions, which we already know, to conclude.

5.3. Generalizations of uniform convergence. One may also think of trying to make generalizations of the notion of uniform convergence, mixing in some of the philosophy of measure theory (i.e. sets of small or zero measure can be ignored). For example, one may think to measure, instead of the smallest upper bound (the supremum), the smallest upper bound which is valid on a set of full measure. Let $f : \Omega \to \mathbb{R}$; first we re-interpret the classical supremum $\sup_\Omega f$:
$$\sup_\Omega f = \inf\{a \in \mathbb{R} : \{f > a\} = \emptyset\}.$$
This motivates the following definition.

Definition 5.9. Define the essential supremum of a measurable function $f : \Omega \to \mathbb{R}$ by
$$\operatorname{ess\,sup}_\Omega f = \inf\{a \in \mathbb{R} : m(\{f > a\}) = 0\}$$
and the essential infimum
$$\operatorname{ess\,inf}_\Omega f = \sup\{a \in \mathbb{R} : m(\{f < a\}) = 0\}.$$

This allows us to define a measure theoretic analogue of the supremum norm
$$\|f\|_{L^\infty} = \operatorname{ess\,sup}_\Omega |f|.$$
The associated normed space is defined
$$L^\infty(\Omega \to \mathbb{R}) = \{f : \Omega \to \mathbb{R} \mid f \text{ measurable and } \|f\|_{L^\infty(\Omega)} < +\infty\}.$$
This is an example of a non-separable Banach space. The proof of non-separability is very similar to one we already saw on the homework for the space $\ell^\infty(\mathbb{N} \to \mathbb{R})$.

There are some other generalizations of uniform convergence which (to be honest) are not as useful as they seem. However they do come up occasionally, so it is good to be aware of them.

Definition 5.10. A sequence of measurable functions $f_n : \Omega \to \mathbb{R}$ converges almost uniformly to a function $f$ if for all $\delta > 0$ there is a set $E$ with $m(E) \le \delta$ such that $f_n \to f$ uniformly on $\Omega \setminus E$.

This notion of convergence looks stronger than pointwise a.e. convergence, but on sets of finite measure it turns out to be equivalent.

Theorem 5.11 (Egorov's Theorem). Suppose that $m(\Omega) < +\infty$ and that $f_n : \Omega \to \mathbb{R}$ are measurable and converge pointwise a.e. to $f : \Omega \to \mathbb{R}$; then $f_n \to f$ almost uniformly.
Proof. We start out with a trick that we have seen before in the sharp Riemann integrability criterion: starting with the set where pointwise convergence holds, which has full measure, and writing it as a countable union of sets where quantified convergence holds. Define
$$N(x,\varepsilon) = \sup\{n : |f_n(x) - f(x)| \ge \varepsilon\}.$$
Since $f_n \to f$ pointwise a.e. we know that there is a set $F \subset \Omega$ of full measure on which $N(x,\varepsilon) < +\infty$ for all $\varepsilon > 0$. Thus, by downward monotone convergence of the measure (here we use $m(\Omega) < +\infty$),
$$m(\{x : N(x,\varepsilon) > n\}) \to 0 \quad \text{as } n \to \infty.$$
For each $k$ choose $M(k)$ so that
$$m(\{x : N(x, \tfrac{1}{k}) > M(k)\}) \le \delta 2^{-k}.$$
Then define
$$E = \bigcup_{k=1}^{\infty} \{x : N(x, \tfrac{1}{k}) > M(k)\}.$$
By countable subadditivity $m(E) \le \delta$. We claim that $f_n \to f$ uniformly on $\Omega \setminus E$. Let $\varepsilon > 0$; there exists $k$ such that $1/k \le \varepsilon$, and since
$$\Omega \setminus E \subset \{x : N(x, \tfrac{1}{k}) \le M(k)\},$$
for $n > M(k)$ we have $n > N(x, \tfrac{1}{k})$ for all $x \in \Omega \setminus E$, and so
$$|f_n(x) - f(x)| < \tfrac{1}{k} \le \varepsilon. \qquad \square$$

5.4. Appendix. Here we provide some additional notes and alternative proofs on the material of the section. We will start with an alternative "more hands on" proof of Lemma 5.7. We state a very important lemma which will be useful in the upcoming proof. In fact it is highly useful in many situations and it is worth remembering the argument.

Lemma 5.12 (Markov/Chebyshev Inequality). Suppose that $f : \Omega \to [0,\infty]$ and $\lambda \in (0,\infty)$; then
$$m(\{x \in \Omega : f(x) > \lambda\}) \le \frac{1}{\lambda} \int_{\{f > \lambda\}} f.$$

Proof. We simply note that $f\chi_{\{f > \lambda\}} \ge \lambda \chi_{\{f > \lambda\}}$ and then compute
$$m(\{f > \lambda\}) = \int_\Omega \chi_{\{f > \lambda\}} \le \frac{1}{\lambda} \int_\Omega f \chi_{\{f > \lambda\}}. \qquad \square$$

It is also worth noting that $\int_{\{f > \lambda\}} f \le \int_\Omega f$, since $f$ is non-negative, and this form of the inequality is often sufficient.

Now we show that fast $L^1$ convergence implies pointwise a.e. convergence.

Lemma 5.13. Suppose that $f_n : \Omega \to \mathbb{R}$ are absolutely integrable and
$$\sum_{n=1}^{\infty} \|f_n - f\|_{L^1(\Omega)} < +\infty;$$
then $f_n$ converges pointwise a.e. in $\Omega$ to $f$.

Proof.
Define
$$E = \{x \in \Omega : \inf_N \sup_{n \ge N} |f_n(x) - f(x)| > 0\};$$
this is the set of points where $f_n(x)$ is NOT convergent to $f(x)$, and we aim to show this set has measure zero. We start out with a trick that we have seen before in the sharp Riemann integrability criterion and Egorov's Theorem, writing $E$ as a countable union of sets of points where $f_n$ is quantitatively far from $f$. Note that
$$E = \bigcup_{\delta \in 1/\mathbb{N}} E_\delta \quad \text{where } 1/\mathbb{N} = \{1/k : k \in \mathbb{N}\}$$
and
$$E_\delta = \{x \in \Omega : \inf_N \sup_{n \ge N} |f_n(x) - f(x)| \ge \delta\}.$$
If we can show that $m(E_\delta) = 0$ for all $\delta > 0$ then we are done, since $E$ will be a countable union of sets of measure zero and hence will have measure zero. We rewrite $E_\delta$ as
$$E_\delta = \bigcap_{N=1}^{\infty} \bigcup_{n \ge N} \{x \in \Omega : |f_n(x) - f(x)| \ge \delta\}.$$
By Markov's inequality
$$m(\{x \in \Omega : |f_n(x) - f(x)| \ge \delta\}) \le \frac{1}{\delta} \|f_n - f\|_{L^1(\Omega)}.$$
Thus by sub-additivity, for any $N \ge 1$,
$$m(E_\delta) \le m\Big(\bigcup_{n \ge N} \{x \in \Omega : |f_n(x) - f(x)| \ge \delta\}\Big) \le \frac{1}{\delta} \sum_{n=N}^{\infty} \|f_n - f\|_{L^1(\Omega)}.$$
By the summability assumption in the lemma the right hand side in the above inequality converges to zero as $N \to \infty$. Thus $m(E_\delta) = 0$. $\square$

This argument should be familiar from Homework 5 problem 4(b). If you read about the Borel–Cantelli lemma in the book, which we did not use in class, that lemma could have been cited to skip some parts of the proof.

5.5. Other important theorems of measure theory. There are some additional important theorems of measure theory which we will not prove in the class. Nonetheless they are useful to know and will allow us to prove some of the functional analytic results we are interested in on the $L^p$ spaces.

Theorem 5.14 (Lebesgue Differentiation Theorem). Suppose that $f : \mathbb{R}^d \to \mathbb{R}$ is absolutely integrable on any bounded subset of $\mathbb{R}^d$, i.e. $f$ is locally integrable. Then there exists a set of measure zero $E \subset \mathbb{R}^d$ such that for all $x \in \mathbb{R}^d \setminus E$
$$f(x) = \lim_{r \to 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} f(y)\,dy.$$
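The averaging in Theorem 5.14 is easy to visualize numerically. Here is a one-dimensional sketch (the choice $f = 1_{[0,\infty)}$, the evaluation points, and the helper `ball_average` are illustrative assumptions): the shrinking averages recover $f(0.3) = 1$, while at the jump point $x = 0$ they hover at $1/2$, so $0$ lies in the exceptional null set $E$.

```python
import numpy as np

def ball_average(f, x, r, n=100001):
    """Approximate (1/m(B(x,r))) * int_{B(x,r)} f, i.e. the average of f
    over the interval [x - r, x + r], by a uniform grid average."""
    ys = np.linspace(x - r, x + r, n)
    return np.mean(f(ys))

f = lambda y: (y >= 0.0).astype(float)   # f = indicator of [0, infinity)

for r in [1.0, 0.1, 0.01]:
    print(r, ball_average(f, 0.3, r), ball_average(f, 0.0, r))
# Once r < 0.3 the ball lies inside {f = 1}, so the average equals f(0.3) = 1;
# at the discontinuity x = 0 the averages stay near 1/2 for every r.
```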