
Measures and Integration
László Erdős
Nov 9, 2007

Based upon the poll in class (and the required prerequisite for the course, Analysis III), I assume that everybody is familiar with general measure theory and Lebesgue integration. The beginning of this note (Sections 1 and 2) is meant to remind you what concepts this involves. If something is unknown, look it up. I provide a good summary of basic concepts (without proofs) by Marcel Griesemer. This file actually contains a bit more than we need, see the list below. Another summary can be found e.g. in Appendix A of Werner: Funktionalanalysis. If you need to check more details (e.g. proofs), consult any analysis or measure theory book. Very good books are H.L. Royden: Real Analysis or Walter Rudin: Principles of Mathematical Analysis. A very concise and sharp introduction is in E. Lieb, M. Loss: Analysis.

1 Need for Lebesgue Integral

One can justify the necessity of a more general integration (than Riemann) in many ways. From the functional analysis point of view there are two "natural" arguments.

Need for Lebesgue I. Recall that we have equipped $C[0,1]$ with two different metrics (actually, norms) $d_1$ and $d_\infty$ (or $\|\cdot\|_1$ and $\|\cdot\|_\infty$ in terms of norms). This space was proven to be complete under the $d_\infty$ metric. It is clearly not complete under the $d_1$ metric: it is trivial to find a sequence of continuous functions $f_n$ and a discontinuous function $f : [0,1] \to \mathbb{R}$ such that $\int_0^1 |f_n - f| \to 0$. In particular, $f_n$ is Cauchy in $d_1$ (it even converges), but the limit is not in $C[0,1]$.

You may think there is no problem, since we know how to Riemann-integrate functions with jumps, e.g. piecewise continuous functions. So if $(C[0,1], d_1)$ is not complete, maybe $(PC[0,1], d_1)$ is. It is fairly easy to see that this is not the case:

Homework 1.1 Prove that $(PC[0,1], d_1)$ is not complete.
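As a numerical illustration of the incompleteness of $(C[0,1], d_1)$, the following sketch uses one hypothetical choice of sequence (not taken from the notes): continuous ramps $f_n$ that rise linearly from 0 to 1 on $[1/2 - 1/n, 1/2]$, and the step function $f = \chi_{[1/2,1]}$. The names `ramp`, `step` and `d1` are illustrative, and the integral is approximated by a midpoint Riemann sum, so the printed values only approximate the exact distance $1/(2n)$.

```python
def ramp(n, x):
    """Continuous f_n: 0 up to 1/2 - 1/n, linear in between, 1 from 1/2 on."""
    return min(1.0, max(0.0, n * (x - 0.5) + 1.0))


def step(x):
    """Discontinuous limit f: characteristic function of [1/2, 1]."""
    return 1.0 if x >= 0.5 else 0.0


def d1(g, h, cells=100_000):
    """Midpoint-rule approximation of the integral of |g - h| over [0, 1]."""
    width = 1.0 / cells
    total = 0.0
    for k in range(cells):
        x = (k + 0.5) * width
        total += abs(g(x) - h(x))
    return total * width


if __name__ == "__main__":
    for n in (2, 10, 100):
        # The distance should be close to 1 / (2 n), hence tend to zero.
        print(n, d1(lambda x: ramp(n, x), step))
```

Each $f_n$ is continuous, the distances shrink like $1/(2n)$, and yet the $d_1$-limit has a jump at $1/2$.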
We know that $PC[0,1]$ is not the biggest class of functions that are Riemann integrable. Indeed, Riemann integrability can allow infinitely many discontinuities, as long as the difference of the lower sums and upper sums converges to zero, i.e. the oscillation of the function is not too big. The basic theorem about Riemann integrability is the following:

Theorem 1.2 A function $f : [0,1] \to \mathbb{R}$ is Riemann integrable if and only if $f$ is bounded and it is continuous almost everywhere, i.e. the set of discontinuities is of (Lebesgue) measure zero.

Homework 1.3 Prove directly (without reference to the above characterization of Riemann integrability) that the set of Riemann integrable functions equipped with the $d_1$ metric is not complete. (Hint: take a Cantor set $C$ that has positive measure and consider its approximations $C_n$ that you obtain along the Cantor procedure after removing the $n$-th generation of intervals. Then take the characteristic functions of these sets.)

Need for Lebesgue II. We have seen that pointwise convergence and continuity (say in $C[0,1]$) are not compatible without further assumptions. What about pointwise convergence and Riemann integration? Is
$$ \lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx $$
true? (In the sense that the pointwise limit of Riemann integrable functions is Riemann integrable and the limit of the integrals is the integral of the limit.) We know that without some further condition this cannot hold: just consider the sequence
$$ f_n(x) = n\,\chi_{(0,1/n)}(x) $$
that clearly converges to $f \equiv 0$ pointwise, but $\int_0^1 f_n = 1$. Suppose we are willing to assume uniform boundedness (that is anyway reasonable in the realm of Riemann integrable functions and, a posteriori, we know from the dominated convergence theorem that some condition is necessary) in order to save the exchangeability of the limit and integral.
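The counterexample $f_n = n\chi_{(0,1/n)}$ above can be checked by hand or in a few lines of code. The sketch below (function names are illustrative, not from the notes) uses exact rational arithmetic: the integral of $f_n$ over $[0,1]$ is height times width, $n \cdot (1/n) = 1$, while for each fixed $x$ the value $f_n(x)$ vanishes as soon as $n \ge 1/x$.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n on the open interval (0, 1/n), and 0 elsewhere."""
    return Fraction(n) if 0 < x < Fraction(1, n) else Fraction(0)


def integral(n):
    """Exact integral of f_n over [0, 1]: height n times width 1/n."""
    return Fraction(n) * Fraction(1, n)


if __name__ == "__main__":
    x = Fraction(1, 50)  # a fixed sample point
    for n in (10, 49, 50, 1000):
        # The integral stays 1, while f_n(x) drops to 0 once n >= 1/x.
        print(n, integral(n), f(n, x))
```

This sequence is not uniformly bounded, which is exactly why uniform boundedness is the next assumption to try.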
Even with uniform boundedness, it still does not work. For example, consider the Dirichlet function
$$ f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0 & \text{if } x \in [0,1] \setminus \mathbb{Q} \end{cases} $$
and its approximations
$$ f_n(x) = \begin{cases} 1 & \text{if } x = \frac{p}{q},\ p, q \in \mathbb{Z},\ p \le q \le n \\ 0 & \text{otherwise.} \end{cases} $$
Clearly $f_n$ is Riemann integrable (since it is zero everywhere apart from finitely many points), while its pointwise limit, $f$, is not Riemann integrable (WHY?). Again, the problem is the big oscillation.

2 Concepts you should know

You should be familiar with the following concepts and theorems:

• σ-algebra (meaningful on any set $X$).
• Borel sets (meaningful on a topological space).
• Measures, outer measures. Measure spaces.
• Regular measures on topological spaces (approximability of measures of sets by open sets from outside and compact sets from inside).
• Lebesgue measure and its properties. Lebesgue measure is the unique measure on $\mathbb{R}^n$ that is invariant under Euclidean motions and assigns 1 to the unit cube.
• Lebesgue measurable sets. Zero measure sets. Concept of "almost everywhere". Not all Lebesgue sets are Borel (this is not easy to prove).
• Counting measure on the measure space $(\mathbb{N}, P(\mathbb{N}), \mu)$, where $P(\mathbb{N})$ is the σ-algebra of all subsets and $\mu$ is the counting measure.
• (Borel)-measurable functions. This class is closed under arithmetic operations, compositions, lim inf and lim sup.
• Lebesgue integral. Integrable functions. Usual properties (linearity, monotonicity, $|\int f| \le \int |f|$).
• Lebesgue integral coincides with Riemann integral for Riemann integrable functions.
• Basic limit theorems: monotone and dominated convergence, Fatou's Lemma.
• Lebesgue integral of complex valued functions (infinite integral is not allowed, $\int |f| < \infty$ is required).
• σ-finite measure spaces.
• Product of two measure spaces (construction of the product σ-algebra and product measure).
• Fubini theorem (need non-negativity or integrability with respect to the product measure to interchange integrals).

We will use the notations
$$ \int_\Omega f\,d\mu = \int_\Omega f(x)\,d\mu(x) $$
interchangeably for the Lebesgue integral. The second notation is favored if for some reason the integration variable needs to be spelled out explicitly (e.g. we have a multiple integral). If $\Omega \subset \mathbb{R}^d$, then we use
$$ \int_\Omega f = \int_\Omega f(x)\,dx $$
where $dx$ stands for the Lebesgue measure. Unless we indicate otherwise, integration on subsets of $\mathbb{R}^d$ is always understood with respect to the Lebesgue measure.

3 Singular measures

This chapter usually belongs to measure theory but I am not sure if the majority of you had it. So we review it. We first present examples on $\mathbb{R}$, then develop the general definitions.

Let $\alpha : \mathbb{R} \to \mathbb{R}$ be a monotonically increasing function. A monotone function may not be continuous, but its one-sided limits exist at every point. We introduce the notation
$$ \alpha(a+0) := \lim_{x \to a+0} \alpha(x), \qquad \alpha(a-0) := \lim_{x \to a-0} \alpha(x). $$
We define a measure $\mu_\alpha$ by assigning
$$ \mu_\alpha((a,b)) := \alpha(b-0) - \alpha(a+0) $$
to any open interval $(a,b)$. Since open intervals generate the σ-algebra of Borel sets, it is easy to see that the usual construction of Lebesgue measure (using $\alpha(x) = x$) goes through for this more general case. The resulting measure is called the Lebesgue-Stieltjes measure. With respect to this measure we can integrate; the corresponding integral is sometimes denoted as
$$ \int f\,d\mu_\alpha = \int f\,d\alpha $$
(the right hand side is only a notation). This is called the Lebesgue-Stieltjes integral.

Examples:

(i) As mentioned, $\alpha(x) = x$ gives back the Lebesgue integral. A bit more generally, if $\alpha \in C^1$ (continuously differentiable), then it is easy to check that
$$ \int f\,d\alpha = \int f(x)\,\alpha'(x)\,dx, $$
i.e. in this case the Lebesgue-Stieltjes integral can be expressed as a Lebesgue integral with a weight function $\alpha'$.

(ii) Fix a number $p \in \mathbb{R}$. Let $\alpha(x) := \chi(x \ge p)$ be the characteristic function of the semi-axis $[p, \infty)$.
CHECK that
$$ \int f\,d\alpha = f(p) $$
for any function $f$. In particular, any function is integrable and the integral depends only on the value at the point $p$. The corresponding $L^1$ space is simply
$$ L^1(\mathbb{R}, d\alpha) \cong \mathbb{C}, $$
i.e. it is a one-dimensional vector space (CHECK!). The generated Lebesgue-Stieltjes measure is called the Dirac delta measure at $p$ and it is denoted as $\delta_p$. In particular,
$$ \delta_p(A) = \begin{cases} 1 & \text{if } p \in A \\ 0 & \text{if } p \notin A. \end{cases} $$

(iii) Let $f \ge 0$ be a measurable function on $\mathbb{R}$ with finite total integral. Let
$$ \alpha(x) := \int_{-\infty}^x f(s)\,ds. $$
Then
$$ \mu_\alpha(A) = \int_A f(x)\,dx. $$

(iv) A considerably more interesting example is the following function. Consider the standard Cantor set, i.e.
$$ C := [0,1] \setminus \Big( \big(\tfrac13, \tfrac23\big) \cup \big(\tfrac19, \tfrac29\big) \cup \big(\tfrac79, \tfrac89\big) \cup \big(\tfrac1{27}, \tfrac2{27}\big) \cup \ldots \Big). $$
Recall that the Cantor set is a compact, uncountable set. It is easy to see that the Lebesgue measure of $C$ is zero. Define an increasing function $\alpha$ on $[0,1]$ as follows: $\alpha$ will be constant on each of the sets removed in the definition of $C$, more precisely
$$ \alpha(x) := \begin{cases} 1/2 & x \in (1/3, 2/3) \\ 1/4 & x \in (1/9, 2/9) \\ 3/4 & x \in (7/9, 8/9) \\ 1/8 & x \in (1/27, 2/27) \\ 3/8 & x \in (7/27, 8/27) \\ 5/8 & x \in (19/27, 20/27) \\ 7/8 & x \in (25/27, 26/27) \end{cases} $$
etc. Make a picture to see the successive definition of $\alpha$ on the complement of the Cantor set. With these formulas we have not yet defined $\alpha$ on $C$.

Homework 3.1
(a) Show that the function $\alpha$ defined on $[0,1] \setminus C$ above can be uniquely extended to $[0,1]$ by keeping monotonicity. This is called the Devil's staircase.
(b) Show that the extension is continuous.
(c) Let $\mu_\alpha$ be the corresponding Lebesgue-Stieltjes measure. Show that $\mu_\alpha(\{p\}) = 0$ for any point $p \in [0,1]$.
(d) Show that $d\mu_\alpha$ is supported on a set of Lebesgue measure zero. (Recall that the support (Träger) of a measure $\mu$ is the smallest closed set $K$ of full measure, i.e. $\mu$ vanishes on the complement of $K$ and for any proper closed subset $H \subsetneq K$ we have $\mu(K \setminus H) > 0$.)
(e) Show that $\alpha$ is almost everywhere differentiable in $[0,1]$ but the fundamental theorem of calculus does not hold, e.g.
$$ \alpha(1) - \alpha(0) \ne \int_0^1 \alpha'(x)\,dx. $$

Homework 3.2 Let $\mu_\alpha$ be the Lebesgue-Stieltjes measure constructed in the previous Homework. Compute
$$ \text{(a)} \int_0^1 x\,d\mu_\alpha(x), \qquad \text{and} \qquad \text{(b)} \int_0^1 x^2\,d\mu_\alpha(x). $$
[Hint: use the hierarchical structure of $C$.]

This example shows that without the fundamental theorem of calculus it can be quite complicated to compute integrals. In this particular example the special structure of $C$ and $\alpha$ helped. If one constructs a less symmetric Cantor set or defines $\alpha$ differently, it may be very complicated to compute the integral.

The last three examples were prototypes of a certain classification of measures according to their singularity structure. The Dirac delta measure is so singular that it assigns a nonzero value to a set consisting of a single point, namely $\delta_p(\{p\}) = 1$. The measure $d\mu_\alpha$ obtained from the "Devil's staircase", example (iv), is less singular, since it assigns zero measure to every point, but it is still supported on a small set (measured with respect to the usual Lebesgue measure). Finally, example (iii) shows a non-singular measure in the sense that $\mu_\alpha(A) = 0$ for any set $A$ of zero Lebesgue measure. We give the precise definitions of these classes.

Definition 3.3 Let $\mu$ and $\nu$ be two measures defined on a fixed σ-algebra on a space $X$. Then
(a) $\nu$ is absolutely continuous (absolutstetig) with respect to $\mu$ if $\nu(A) = 0$ whenever $\mu(A) = 0$. Notation: $\nu \ll \mu$;
(b) $\mu$ and $\nu$ are mutually singular if there is a measurable set $A$ such that $\mu(A) = 0$ and $\nu(X \setminus A) = 0$. Notation: $\mu \perp \nu$.

Example (iii) is a measure that is absolutely continuous with respect to the Lebesgue measure, while examples (ii) and (iv) are both mutually singular with the Lebesgue measure (and with each other as well).

It is clear that example (iii) is an absolutely continuous measure. It is less clear that essentially every absolutely continuous measure is the result of an integration.
This is the content of the important Radon-Nikodym theorem, whose proof we postpone:

Theorem 3.4 (Radon-Nikodym) Let $\mu$ and $\nu$ be two measures on a common σ-algebra on $X$ and let $\mu$ be σ-finite. Then $\nu \ll \mu$ if and only if there exists a measurable function $f : X \to \mathbb{R}_+$ (infinity is allowed), such that
$$ \nu(A) = \int_A f(x)\,d\mu(x) $$
for any $A$ in the σ-algebra. This function is $\mu$-a.e. (almost everywhere) unique. Notation:
$$ f = \frac{d\nu}{d\mu} $$
(this is only a formal fraction!).

Moreover, we also have the following decomposition which we mention without proof:

Theorem 3.5 (Lebesgue decomposition I.) Let $\mu$ and $\nu$ be two σ-finite measures on a common σ-algebra. Then $\nu$ can be uniquely decomposed as
$$ \nu = \nu_{ac} + \nu_{sing} $$
where $\nu_{ac} \ll \mu$ and $\nu_{sing} \perp \mu$.

The singular part can be further decomposed under a mild countability condition on the number of points that have positive measure:

Definition 3.6 Let $(X, \mathcal{B}, \mu)$ be a measure space such that for every point $x \in X$, the set $\{x\}$ belongs to $\mathcal{B}$, and let
$$ P := \{x \in X : \mu(\{x\}) \ne 0\}. $$
The set $P$ is called the set of pure points or atomic points of the measure $\mu$. Assume that $P$ is a countable set. Then the measure
$$ \mu_{pp}(A) := \mu(A \cap P) = \sum_{x \in A \cap P} \mu(\{x\}) $$
is well-defined and it is called the pure point or atomic component of $\mu$. A measure $\mu$ is called a pure point measure if $\mu = \mu_{pp}$. A measure $\mu$ is called continuous if $\mu_{pp} = 0$. Given another measure $\nu$ on the same σ-algebra, the measure $\mu$ is singular continuous with respect to $\nu$ if $\mu$ is continuous and $\mu \perp \nu$.

The Dirac delta measure from example (ii) is an atomic measure; examples (iii) and (iv) are continuous measures. Example (iii) is a measure that is absolutely continuous with respect to the Lebesgue measure, while (iv) and the Lebesgue measure are mutually singular. The measure in (iv) is thus a singular continuous measure with respect to the Lebesgue measure.
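To get a feeling for example (iv), here is a small sketch that evaluates the Devil's staircase $\alpha$ through the ternary expansion of $x$. This digit-based formula is a standard description of the Cantor function, used here as an assumption since the notes define $\alpha$ only on the removed intervals; the names `cantor_alpha` and `mu_alpha` and the truncation parameter `depth` are illustrative. Because $\alpha$ is continuous, the Lebesgue-Stieltjes measure of an interval $(a,b)$ is simply $\alpha(b) - \alpha(a)$.

```python
def cantor_alpha(x, depth=40):
    """Approximate the Devil's staircase at x in [0, 1] from the ternary digits of x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)      # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:      # x lies in a removed middle third, where alpha is constant
            return value + weight
        if digit == 2:      # right third: contributes a binary digit 1
            value += weight
        weight /= 2.0
    return value            # truncated after `depth` digits


def mu_alpha(a, b):
    """Lebesgue-Stieltjes measure of the interval (a, b) for the continuous alpha."""
    return cantor_alpha(b) - cantor_alpha(a)


if __name__ == "__main__":
    for x in (0.5, 0.15, 0.8, 0.04, 0.72):
        print(x, cantor_alpha(x))
    # A removed interval carries no mass, although its Lebesgue measure is positive.
    print(mu_alpha(0.4, 0.6))
```

The sample points lie in the intervals $(1/3,2/3)$, $(1/9,2/9)$, $(7/9,8/9)$, $(1/27,2/27)$ and $(19/27,20/27)$, so the values can be compared with the table defining $\alpha$.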
The following theorem is a simple exercise from these definitions:

Theorem 3.7 Given two measures $\mu$, $\nu$ on the same σ-algebra that contains each $\{x\}$, assume that $\nu \perp \mu$ and assume that the set of atoms of $\nu$ is countable. Then the measure $\nu$ can be uniquely decomposed into $\nu = \nu_{pp} + \nu_{sc}$, where $\nu_{pp}$ is the pure point component of $\nu$ and $\nu_{sc}$ is a singular continuous measure that is also mutually singular to $\nu_{pp}$.

The most important application is the following version of these decomposition theorems, whose proof is a simple exercise from the definitions above.

Theorem 3.8 (Lebesgue decomposition II.) Let $\mu$ and $\nu$ be two σ-finite measures on a common σ-algebra that contains each $\{x\}$. (In particular there are at most countably many points with nonzero weight.) Then $\nu$ can be uniquely decomposed as
$$ \nu = \nu_{ac} + \nu_{pp} + \nu_{sc} $$
where $\nu_{ac} \ll \mu$, $\nu_{sc} \perp \mu$ and $\nu_{pp}$ is the pure-point component of $\nu$.

4 Lp-spaces

The dominated convergence theorem resolved the "Need for Lebesgue II." by demonstrating that pointwise limit and integration can be interchanged within the Lebesgue framework (assuming the existence of an integrable dominating function). What about "Need for Lebesgue I"?

It is clear that the formula
$$ \|f\|_1 := \int_0^1 |f(x)|\,dx \qquad (4.1) $$
extends the norm (metric) $d_1$ from $C[0,1]$ to all Lebesgue integrable functions on $[0,1]$, since Riemann and Lebesgue integrals coincide on continuous functions. In the Riesz-Fischer theorem below (Section 5) we will show that the space of Lebesgue integrable functions is actually complete, so it is one of the possible completions of $(C[0,1], d_1)$ (we do not know yet that it is the smallest possible; for that we will have to show that the continuous functions are dense in the set of Lebesgue integrable ones).

However, before we discuss this, we have to introduce the $L^p$ spaces. It would be tempting to equip the space of Lebesgue integrable functions with the norm given by (4.1).
Unfortunately, this is not a norm, for a "stupid" reason: the Lebesgue integral is insensitive to changing the integrand on a zero measure set. In particular, $\|f\|_1 = 0$ does not imply that $f(x) = 0$ for all $x$, only for almost all $x$. The following idea circumvents this problem and we discuss it in full generality.

Let $(\Omega, \mathcal{B}, \mu)$ be a measure space (where $\Omega$ is the base set, $\mathcal{B}$ is a σ-algebra and $\mu$ is the measure). We consider an equivalence relation on the set of functions $f : \Omega \to \mathbb{C}$:
$$ f \sim g \quad \text{iff} \quad f(x) = g(x) \ \text{for } \mu\text{-almost all } x. $$
It needs a (trivial) proof that this is indeed an equivalence relation. Suppose that $f$ is integrable; then obviously any function in its equivalence class is also integrable with the same integral. Therefore we consider the space
$$ L^1(\Omega, \mathcal{B}, \mu) = L^1(\Omega, \mu) = L^1(\Omega) = L^1 := \{\text{Integrable functions}\}/\sim, $$
i.e. the integrable functions factorized with this equivalence relation (the various notations are all used in practice; in principle the concept of $L^1$ depends on the space, the measure and the σ-algebra, but in most cases it is clear from the context which σ-algebra and measure we consider, so we omit them from the notation). It is easy to see that the usual vector space operations extend to the factor space. Moreover, the integration naturally extends to $L^1(\Omega, \mu)$.

The only thing to keep in mind is that notationally we still denote elements of $L^1(\Omega, \mu)$ by $f(x)$, even though $f(x)$ does not make sense for a fixed $x$ for a general $L^1$ function (for continuous functions it is of course meaningful).

Definition 4.1 Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and let $0 < p \le \infty$. We set
$$ L^p(\Omega, \mu) := \Big\{ f : \Omega \to \mathbb{C}, \text{ measurable} : \int_\Omega |f|^p\,d\mu < \infty \Big\}/\sim $$
for $p < \infty$ and
$$ L^\infty(\Omega, \mu) := \{ f : \Omega \to \mathbb{C}, \text{ measurable} : \operatorname{ess\,sup} |f| < \infty \}/\sim $$
where the essential supremum of a function is defined by
$$ \operatorname{ess\,sup} |f| := \inf\{K \in \mathbb{R} : |f(x)| \le K \text{ for almost all } x\}. $$
These spaces are called $L^p$-spaces or Lebesgue spaces.

Note that every element of a Lebesgue space is actually an equivalence class of functions.
But this fact is usually omitted from the notations.

Homework 4.2 Prove that $L^p(\Omega, \mu)$ is a vector space for any $p > 0$.

Definition 4.3 For $f \in L^p$ we define
$$ \|f\|_p := \Big( \int_\Omega |f|^p\,d\mu \Big)^{1/p} $$
if $p < \infty$ and
$$ \|f\|_\infty := \operatorname{ess\,sup} |f| $$
if $p = \infty$.

These formulas do not define a norm if $0 < p < 1$ (the triangle inequality is not satisfied) but they do define a norm for $1 \le p \le \infty$. For the proof, one needs the Minkowski inequality (Theorem 6.5)
$$ \|f + g\|_p \le \|f\|_p + \|g\|_p $$
that is exactly the triangle inequality for $\|\cdot\|_p$ (the other two properties of the norm are trivially satisfied). From now on we will always assume that $1 \le p \le \infty$ whenever we talk about $L^p$ spaces.

These norms naturally define the concept of $L^p$ convergence of functions:

Definition 4.4 A sequence of functions $f_n \in L^p$ converges to $f \in L^p$ in $L^p$-sense or in $L^p$-norm if $\|f_n - f\|_p \to 0$ as $n \to \infty$.

In case of $L^p$ convergent sequences, we often say that $f_n$ converges strongly (stark), although this is a bit imprecise, since it does not specify the exponent $p$. We will see later that it nevertheless distinguishes this notion from the concept of weak convergence. These convergences naturally extend the $d_1$, $d_p$ and $d_\infty$ convergences on continuous functions we have studied earlier.

Moreover, pointwise convergence also naturally extends to $L^p$ functions, but we must keep in mind the problem that everything is defined only almost everywhere.

Definition 4.5
(i) A sequence of measurable functions $f_n$ on a measure space $(\Omega, \mathcal{B}, \mu)$ converges to a measurable function $f$ almost everywhere (fast überall) if there exists a set $Z$ of measure zero, $\mu(Z) = 0$, such that
$$ f_n(x) \to f(x) \qquad \forall x \notin Z. $$
(ii) A sequence of equivalence classes of measurable functions $f_n$ converges pointwise to an equivalence class of measurable functions $f$, if any sequence of representatives of the classes of $f_n$ converges almost everywhere to any representative of $f$.
It is an easy exercise to show that if the convergence holds for at least one sequence of representatives, then it holds for any sequence (of course the exceptional set $Z$ changes); in particular part (ii) of the above definition is meaningful. Therefore one does not need to distinguish between almost everywhere pointwise convergence of equivalence classes and of their representatives. In the future, we will thus freely talk about, e.g., $L^p$ functions converging almost everywhere pointwise without ever mentioning the equivalence classes.

Homework 4.6 Give examples that pointwise convergence does not imply $L^p$ convergence and vice versa. Give also examples that convergence in $L^p$ does not in general imply convergence in $L^q$, $p \ne q$.

There is, however, one positive statement:

Lemma 4.7 Suppose the total measure of the space is finite, $\mu(\Omega) < \infty$. Then $L^p$ convergence implies $L^q$ convergence whenever $q \le p$.

Proof. Use the Hölder inequality (we will prove it later, but I assume everybody has seen it) with the exponents $p/q$ and $p/(p-q)$:
$$ \int_\Omega |f|^q\,d\mu = \int_\Omega |f|^q \cdot 1\,d\mu \le \Big( \int_\Omega |f|^{q \cdot p/q}\,d\mu \Big)^{q/p} \Big( \int_\Omega 1^{p/(p-q)}\,d\mu \Big)^{(p-q)/p}, $$
thus
$$ \|f\|_q \le \|f\|_p\,\mu(\Omega)^{\frac1q - \frac1p}. $$
Applying this bound to $f_n - f$ proves the lemma.

5 Riesz-Fischer theorem

The following theorem presents the most important step towards proving that $L^1[0,1]$ is the completion of $C[0,1]$ equipped with the $d_1$ metric.

Theorem 5.1 (Riesz-Fischer) Let $(\Omega, \mathcal{B}, \mu)$ be an arbitrary measure space, let $1 \le p \le \infty$ and consider the space $L^p = L^p(\Omega, \mu)$.
(i) The space $L^p$, equipped with the norm $\|\cdot\|_p$, is complete, i.e. if $f_i \in L^p$ is Cauchy, then there is a function $f \in L^p$ such that $f_i \to f$ in $L^p$-sense.
(ii) If $f_i \to f$ in $L^p$, then there exists a subsequence, $f_{i_k}$, and a function $F \in L^p$ such that $|f_{i_k}(x)| \le F(x)$ for all $k$ (almost everywhere in $x$) and $f_{i_k}$ converges to $f$ almost everywhere, as $k \to \infty$.

Proof. We will do the proof for $p < \infty$. The $p = \infty$ case requires a somewhat different treatment (since $L^\infty$ is defined differently) but it is simpler.

Step 1: Subsequential convergence is enough. This is an important basic idea.
We want to prove that a Cauchy sequence $f_i$ converges strongly. It turns out that it is sufficient to show that some subsequence converges strongly. Apparently this is much weaker, but actually it is not. Suppose that $f_{i_k}$ is a strongly convergent subsequence, i.e. $f_{i_k} \to f$ (in $L^p$) as $k \to \infty$. But then
$$ \|f_i - f\|_p \le \|f_i - f_{i_k}\|_p + \|f_{i_k} - f\|_p $$
and thus for any $\varepsilon > 0$ we can make the second term smaller than $\varepsilon/2$ by choosing $k$ sufficiently large, and then, by the Cauchy property, the first term is smaller than $\varepsilon/2$ if $i$ and $k$ are sufficiently large. Thus from subsequential strong convergence of a Cauchy sequence we concluded the strong convergence of the whole sequence.

Step 2: Selecting a subsequence. To find a convergent subsequence we proceed successively. Pick $i_1$ such that
$$ \|f_n - f_{i_1}\|_p \le \frac12 \qquad \forall n \ge i_1; $$
such $i_1$ exists by the Cauchy property. Now select $i_2 > i_1$ such that
$$ \|f_n - f_{i_2}\|_p \le \frac14 \qquad \forall n \ge i_2 $$
and again by the Cauchy property such $i_2$ exists. Next we choose $i_3 > i_2$ such that
$$ \|f_n - f_{i_3}\|_p \le \frac18 \qquad \forall n \ge i_3 $$
etc.; in general we have $i_k > i_{k-1}$ with
$$ \|f_n - f_{i_k}\|_p \le \frac1{2^k} \qquad \forall n \ge i_k. $$

Step 3: Telescopic sum. Now we define
$$ F_\ell := |f_{i_1}| + \sum_{k=1}^{\ell-1} |f_{i_k} - f_{i_{k+1}}|. $$
By the Minkowski inequality
$$ \|F_\ell\|_p \le \|f_{i_1}\|_p + \frac12 + \frac14 + \ldots = \|f_{i_1}\|_p + 1 $$
and clearly $F_\ell$ is a monotone increasing sequence of functions. Let
$$ F := \lim_\ell F_\ell $$
be the almost everywhere pointwise limit; then by the monotone convergence theorem and by the uniform bound on the $L^p$ norm of $F_\ell$, we have
$$ \|F\|_p < \infty, $$
in particular, $F(x) < \infty$ almost everywhere. Now use the telescopic sum
$$ f_{i_k} = f_{i_1} + (f_{i_2} - f_{i_1}) + (f_{i_3} - f_{i_2}) + \ldots + (f_{i_k} - f_{i_{k-1}}). $$
As $k \to \infty$ this is an absolutely convergent series for every $x$ such that $F(x) < \infty$; let $f(x)$ be its limit, thus
$$ f_{i_k}(x) \to f(x), \qquad k \to \infty, $$
for almost every $x$. Moreover, from the telescopic sum it also follows that
$$ |f_{i_k}| \le F \in L^p $$
and thus by dominated convergence, we have
$$ f \in L^p. $$
Using dominated convergence once more, for
$$ |f_{i_k} - f| \le |f_{i_k}| + |f| \le F + |f| \in L^p $$
we also have $\|f_{i_k} - f\|_p \to 0$, $k \to \infty$.
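Steps 2 and 3 can be traced on a concrete Cauchy sequence. The sketch below assumes, purely for illustration, the sequence $f_n(x) = x^n$ in $L^1[0,1]$, for which $\|f_n - f_m\|_1 = |1/(n+1) - 1/(m+1)|$ can be computed exactly; the helper names are hypothetical. It selects indices $i_k$ with $\|f_n - f_{i_k}\|_1 \le 2^{-k}$ for all $n \ge i_k$ and then sums the norms of the telescopic increments, which is the quantity bounded by 1 in the estimate for $F_\ell$.

```python
from fractions import Fraction


def dist(n, m):
    """Exact L^1[0,1] distance between x^n and x^m."""
    return abs(Fraction(1, n + 1) - Fraction(1, m + 1))


def tail_sup(i):
    """Supremum over n >= i of dist(n, i); it is approached as n tends to infinity."""
    return Fraction(1, i + 1)


def select_indices(count):
    """Pick i_1 < i_2 < ... with tail_sup(i_k) <= 2^(-k), as in Step 2."""
    indices, i = [], 0
    for k in range(1, count + 1):
        i += 1                                   # enforce i_k > i_(k-1)
        while tail_sup(i) > Fraction(1, 2 ** k):
            i += 1
        indices.append(i)
    return indices


if __name__ == "__main__":
    idx = select_indices(6)
    increments = [dist(a, b) for a, b in zip(idx, idx[1:])]
    # The k-th increment is at most 2^(-k), so the total stays below 1.
    print(idx, increments, sum(increments))
```

In this toy case the limit is $f = 0$ almost everywhere, and the dominating function $F$ of Step 3 is built from exactly these increments.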
We have proved earlier that $(C[0,1], \|\cdot\|_\infty)$ is complete and now we have seen that $(L^\infty[0,1], \|\cdot\|_\infty)$ is also complete. However, for any $p < \infty$, the space $(C[0,1], \|\cdot\|_p)$ is not complete (EXAMPLE!) but $(L^p[0,1], \|\cdot\|_p)$ is complete. Actually it is the (smallest) completion of $(C[0,1], \|\cdot\|_p)$ as we will soon prove.

Remark 5.2 The $p = \infty$ case often behaves exceptionally. Many theorems about $L^p$ spaces hold only with the restriction $p < \infty$, and/or sometimes, by duality, $p > 1$ is necessary. Rule of thumb: whenever you use some theorem about $L^p$ spaces watch out for the borderline cases, $p = 1, \infty$, and make sure the theorem applies to them. The Riesz-Fischer theorem holds without restrictions, but many other theorems do not.

6 Inequalities

The primary tools in analysis are inequalities. Even though theorems in analysis are often formulated as limiting statements, the heart of the proof is almost always an inequality. Here we discuss a few basic inequalities involving integrals of functions.

I assume that you have already seen Jensen's, Hölder's and Minkowski's inequalities. I will not prove them in class, but I enclose their proofs. They are important; if you forgot them, review them.

Theorem 6.1 (Jensen's inequality) Let $J : \mathbb{R} \to \mathbb{R}$ be a convex function and let $(\Omega, \mu)$ be a measure space with finite total measure, i.e. $\mu(\Omega) < \infty$. Let $f \in L^1(\Omega, \mu)$ be a real valued function and define its average as
$$ \langle f \rangle := \frac{1}{\mu(\Omega)} \int_\Omega f\,d\mu. $$
Then
(i) $(J \circ f)_- \in L^1$ (here $a_- := \max\{0, -a\}$ is the negative part (Negativteil) of $a$), in particular, $\int J \circ f\,d\mu$ is well defined (maybe $+\infty$).
(ii) $\langle J \circ f \rangle \ge J(\langle f \rangle)$.
(iii) If $J$ is strictly convex at $\langle f \rangle$ then equality in (ii) holds iff $f = \langle f \rangle$ (almost everywhere).

Proof. By convexity, there exists a number $v$ such that
$$ J(t) \ge J(\langle f \rangle) + v(t - \langle f \rangle) \qquad (6.2) $$
holds for every $t \in \mathbb{R}$. (The graph of a convex function lies "above" every tangent line.)
Plugging in $t = f(x)$, we have
$$ J(f(x)) \ge J(\langle f \rangle) + v(f(x) - \langle f \rangle) \qquad (6.3) $$
and thus
$$ J(f(x))_- \le |J(\langle f \rangle)| + |v||f(x)| + |v||\langle f \rangle| \in L^1, $$
thus (i) is proven (we needed only an upper bound on $J(f(x))_-$ since it is always non-negative). Integrating (6.3) over $\Omega$ with respect to $\mu$, then dividing by $\mu(\Omega)$, we get exactly (ii).

Finally, to prove (iii): if $f$ is constant (almost everywhere), then clearly this constant must be its average, $\langle f \rangle$, and (ii) holds with equality. If $f$ is not a constant, then $f(x) - \langle f \rangle$ takes on positive and negative values on sets of positive measure. Since $J$ is strictly convex, (6.2) is a strict inequality either for all $t > \langle f \rangle$ or for all $t < \langle f \rangle$. That means that inequality (6.3) is a strict inequality on a set of positive measure, thus after integration we get a strict inequality in (ii).

Remark 6.2 A measure space $(M, \mu)$ is called a probability space (Wahrscheinlichkeitsraum) if $\mu(M) = 1$. On a probability space, Jensen's inequality simplifies a bit since there is no need for normalization with the total measure. For example, from the convexity of the function
$$ J(t) = t^p, \qquad t \ge 0, $$
in case of $1 \le p < \infty$, it follows that on a probability space
$$ \Big( \int |f|\,d\mu \Big)^p \le \int |f|^p\,d\mu. \qquad (6.4) $$

The last example is a special case of the (probably) most important inequality in analysis:

Theorem 6.3 (Hölder's inequality) Let $1 \le p, q \le \infty$ be conjugate exponents (konjugierte Exponenten), i.e. satisfy $\frac1p + \frac1q = 1$ (by convention, $1/\infty = 0$). Then for any two nonnegative functions $f, g \ge 0$ defined on a measure space $(\Omega, \mu)$ we have
$$ \int_\Omega f g\,d\mu \le \|f\|_p \|g\|_q. \qquad (6.5) $$
Furthermore, if the assumption $f, g \ge 0$ is dropped but we assume $f \in L^p$ and $g \in L^q$, then $fg \in L^1$ and (6.5) holds.

Finally, if $f \in L^p$, $g \in L^q$ then (6.5) holds with equality if and only if there exists $\lambda \in \mathbb{R}$ such that
(i) $g = \lambda |f|^{p-1}$ in case of $1 < p < \infty$;
(ii) in case of $p = 1$ we have $|g| \le \lambda$ (a.e.) and $|g| = \lambda$ on the set where $f(x) \ne 0$. The case $p = \infty$ is the dual of (ii).
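A quick sanity check of (6.5) and of the equality case (i): the sketch below takes the counting measure on a three-point set, so that integrals become finite sums. The vectors, the exponent $p = 3$ and the helper names are illustrative choices, not part of the notes.

```python
def norm(values, p):
    """The l^p norm, i.e. the L^p norm for the counting measure on a finite set."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def holder_sides(f, g, p):
    """Return (sum of f*g, ||f||_p * ||g||_q) for the conjugate exponent q of p."""
    q = p / (p - 1.0)
    lhs = sum(a * b for a, b in zip(f, g))
    return lhs, norm(f, p) * norm(g, q)


if __name__ == "__main__":
    p = 3.0
    f = [1.0, 2.0, 3.0]
    # A generic g: the left side should be strictly smaller than the right side.
    print(holder_sides(f, [2.0, 0.5, 1.0], p))
    # g = |f|^(p-1), i.e. lambda = 1: both sides should agree up to rounding.
    g_equal = [a ** (p - 1.0) for a in f]
    print(holder_sides(f, g_equal, p))
```

Rescaling `g_equal` by any positive constant keeps the two sides equal, in line with the role of $\lambda$ in (i).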
Hölder's inequality is usually stated for two functions, but it is trivial to extend it to a product of many functions by induction:
$$ \int_\Omega f_1 f_2 \cdots f_k\,d\mu \le \|f_1\|_{p_1} \|f_2\|_{p_2} \cdots \|f_k\|_{p_k} \qquad (6.6) $$
whenever
$$ \frac1{p_1} + \frac1{p_2} + \ldots + \frac1{p_k} = 1. $$

Proof. I will just show the inequality; the cases of equality follow from these arguments (THINK IT OVER!). We also assume that $f \in L^p$ and $g \in L^q$, otherwise (6.5) holds trivially for $f, g \ge 0$. [Note that this statement is not true without the non-negativity assumption, since $\int fg\,d\mu$ may not be defined!]

First proof. The standard proof starts with observing that it is sufficient to prove the inequality if $\|f\|_p = \|g\|_q = 1$, otherwise one could redefine $f \to f/\|f\|_p$, $g \to g/\|g\|_q$ by the homogeneity of the norm. Then one uses the arithmetic inequality
$$ ab \le \frac{a^p}{p} + \frac{b^q}{q}, \qquad a, b \ge 0 $$
(that can be proven by elementary calculus) and gets
$$ \int_\Omega |f||g|\,d\mu \le \frac1p \int_\Omega |f|^p + \frac1q \int_\Omega |g|^q = \frac1p + \frac1q = 1 $$
and this was to be proven under the condition that $\|f\|_p = \|g\|_q = 1$.

Second proof. Again, we will prove only the $\|f\|_p = \|g\|_q = 1$ case and for simplicity we can clearly assume that $f, g \ge 0$ (replace $f \to |f|$ and $g \to |g|$). In this case, the measure $g(x)^q\,d\mu(x)$ is a probability measure and we write
$$ \Big( \int f g\,d\mu \Big)^p = \Big( \int f g^{1-q}\,g^q\,d\mu \Big)^p \le \int f^p g^{(1-q)p}\,g^q\,d\mu $$
by the probability space version of Jensen's inequality (6.4). Thus
$$ \Big( \int f g\,d\mu \Big)^p \le \int f^p g^{(1-q)p+q}\,d\mu = \int f^p\,d\mu = 1 $$
since $p, q$ were conjugate exponents, thus $(1-q)p + q = 0$.

The most commonly used case of Hölder's inequality is the case $p = q = 2$, i.e. the Cauchy-Schwarz inequality
$$ \int f g\,d\mu \le \|f\|_2 \|g\|_2. \qquad (6.7) $$

Homework 6.4 Prove the following form of the Cauchy-Schwarz inequality. For any $\alpha > 0$
$$ \int f g\,d\mu \le \frac12 \Big[ \alpha \|f\|_2^2 + \alpha^{-1} \|g\|_2^2 \Big]. $$
This form is actually stronger; (6.7) follows from it easily (HOW?). In many cases it is useful to have the freedom of choosing the additional parameter $\alpha$ in the estimate. Keep this in mind!

Theorem 6.5 (Minkowski inequality) Let $1 \le p \le \infty$ and let $f, g$ be defined on a measure space $(\Omega, \mu)$.
Then
$$ \|f + g\|_p \le \|f\|_p + \|g\|_p. \qquad (6.8) $$
If $f \ne 0$ and $1 < p < \infty$, then equality holds iff $g = \lambda f$ for some $\lambda \ge 0$. For the endpoint exponents, $p = 1$ or $p = \infty$, equality can hold in other cases as well.

The Minkowski inequality states the triangle inequality of the $L^p$ norm, as was mentioned earlier.

Proof. Again, the Minkowski inequality has many proofs, see e.g. a very general version of this inequality whose proof uses Fubini's theorem in Lieb-Loss: Analysis, Section 2.4. The most direct proof relies on convexity of the function $t \to t^p$ (we can assume $1 < p < \infty$; the $p = 1$ case is trivial, the $p = \infty$ case requires a different but equally trivial argument). We first note that $f, g \ge 0$ can be assumed (WHY?). Then we write
$$ (f + g)^p = f(f + g)^{p-1} + g(f + g)^{p-1} $$
and apply Hölder's inequality
$$ \int f(f + g)^{p-1}\,d\mu \le \|f\|_p \Big( \int (f + g)^{(p-1)q}\,d\mu \Big)^{1/q} = \|f\|_p \Big( \int (f + g)^p\,d\mu \Big)^{1/q} $$
(since $(p-1)q = p$). Similarly
$$ \int g(f + g)^{p-1}\,d\mu \le \|g\|_p \Big( \int (f + g)^{(p-1)q}\,d\mu \Big)^{1/q} = \|g\|_p \Big( \int (f + g)^p\,d\mu \Big)^{1/q}. $$
Thus
$$ \int (f + g)^p\,d\mu \le \big( \|f\|_p + \|g\|_p \big) \Big( \int (f + g)^p\,d\mu \Big)^{1/q}; $$
dividing through by the second factor and using that $1 - \frac1q = \frac1p$, we obtain (6.8).

There is only one small thing to check: the last step of the argument would not have been correct if $\int (f + g)^p\,d\mu = \infty$. But by convexity of $t \to t^p$ ($t \ge 0$), we have
$$ \Big( \frac{f + g}{2} \Big)^p \le \frac{f^p + g^p}{2} $$
and since the right hand side is integrable, so is the left hand side.

So far we worked on arbitrary measure spaces. The following inequality uses that the underlying space has a vector space structure and the measure is translation invariant. For simplicity we state it only for $\mathbb{R}^d$ and the Lebesgue measure.

Theorem 6.6 (Young's inequality) Let $1 \le p, q, r \le \infty$ be three exponents satisfying
$$ \frac1p + \frac1q + \frac1r = 2. \qquad (6.9) $$
Then for any $f \in L^p(\mathbb{R}^d)$, $g \in L^q(\mathbb{R}^d)$, $h \in L^r(\mathbb{R}^d)$ it holds that
$$ \Big| \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(x)\,g(x - y)\,h(y)\,dx\,dy \Big| \le \|f\|_p \|g\|_q \|h\|_r. \qquad (6.10) $$

Proof of Young's inequality. It is a smart way of applying Hölder's inequality. We can assume that $f, g, h \ge 0$. Let $p', q', r'$ be the dual exponents of $p, q, r$, i.e.
$$ \frac1p + \frac1{p'} = \frac1q + \frac1{q'} = \frac1r + \frac1{r'} = 1 $$
and note that (6.9) implies
$$ \frac1{p'} + \frac1{q'} + \frac1{r'} = 1. \qquad (6.11) $$
Define
$$ \alpha(x, y) := f(x)^{p/r'} g(x - y)^{q/r'}, $$
$$ \beta(x, y) := g(x - y)^{q/p'} h(y)^{r/p'}, $$
$$ \gamma(x, y) := f(x)^{p/q'} h(y)^{r/q'} $$
and notice that the integral in Young's inequality is exactly
$$ I = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \alpha(x, y)\,\beta(x, y)\,\gamma(x, y)\,dx\,dy $$
by using (6.11). Now we can use the generalized Hölder's inequality (6.6) for three functions with exponents $p', q', r'$ on the measure space $(\mathbb{R}^d \times \mathbb{R}^d, dx\,dy)$ and conclude that
$$ I \le \|\alpha\|_{r'} \|\beta\|_{p'} \|\gamma\|_{q'}. $$
These norms can all be computed, e.g.
$$ \|\alpha\|_{r'} = \Big( \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(x)^p\,g(x - y)^q\,dx\,dy \Big)^{1/r'} = \|f\|_p^{p/r'} \|g\|_q^{q/r'} $$
and similarly the other two. Putting these together, we arrive at (6.10).

One important application of Young's inequality is the honest definition of the convolution. Recall the definition:

Definition 6.7 The convolution (Faltung) of two functions $f, g$ on $\mathbb{R}^d$ is given by
$$ (f * g)(x) := \int_{\mathbb{R}^d} f(y)\,g(x - y)\,dy. $$

It is a nontrivial question whether the integral in this definition makes sense and, if it does, in which sense (for all $x$, maybe only for almost all $x$?). If $f, g$ are "nice" functions (e.g. bounded and sufficiently decaying at infinity), then it is easy to see that the convolution integral always exists; moreover, by a change of variables,
$$ f * g = g * f. $$
If, however, $f, g$ are just in some Lebesgue spaces, then the integral may not exist. It is exactly Young's inequality that tells us under which conditions on the exponents one can define convolution on Lebesgue spaces.

Theorem 6.8 Let $1 \le \frac1p + \frac1q \le 2$. Let $f \in L^p(\mathbb{R}^d)$, $g \in L^q(\mathbb{R}^d)$, then $f * g$ is a function in $L^{r'}$, where $r'$ is the dual exponent to $r$ from Young's inequality, i.e.
$$ 1 + \frac1{r'} = \frac1p + \frac1q. $$

Proof of the special case $q = 1$. We want to show that
$$ \|f * g\|_p^p = \int \Big| \int f(y)\,g(x - y)\,dy \Big|^p dx \qquad (6.12) $$
is finite. It is clearly enough to assume that $f, g \ge 0$ (see the remark below).
Write
\[
  f g = f g^{1/p} \cdot g^{1 - 1/p} = f g^{1/p} \cdot g^{1/r}
\]
(notice that $p, r$ are dual exponents) and use Hölder's inequality for the inner integral (with $p, r$ as exponents):
\[
  \begin{aligned}
    \|f \star g\|_p^p
    &\le \int \Big[ \int f(y)^p g(x - y) \, dy \Big] \Big[ \int g(x - y) \, dy \Big]^{p/r} dx \\
    &= \|g\|_1^{p/r} \iint f(y)^p g(x - y) \, dx \, dy
     = \|f\|_p^p \, \|g\|_1^{p/r + 1}
     = \big[ \|f\|_p \|g\|_1 \big]^p
  \end{aligned}
\]
(since $p/r + 1 = p$), which proves the claim for the special case $q = 1$.

There are two related general remarks:

(1) Note that Fubini's theorem has been used, but for non-negative functions this is justified without any further assumptions.

(2) You may not like that before we have proved that $f \star g$ is actually in $L^p$, or even that it exists, we already computed its $L^p$ norm. However, none of these steps actually requires any of these integrals to be finite: this is a big advantage of Lebesgue integrals of nonnegative functions. Recall that, for example, Hölder's inequality was stated for any two nonnegative functions.

To convince you that there is nothing fishy here, I show once the absolutely correct argument, but later similar arguments will not be spelled out. We first consider nonnegative $f, g$; for these functions every step is well justified, even if some of the above integrals are infinite. A posteriori, we obtain from $\|f\|_p \|g\|_1 < \infty$ that every integral is finite. This does not mean that
\[
  \int f(y) g(x - y) \, dy
\]
is finite for every $x$, but it means that this is an $L^p$ function in $x$ (in particular, it is finite for almost all $x$). Now for arbitrary $f$ and $g$ we want to prove that
\[
  \int f(y) g(x - y) \, dy \tag{6.13}
\]
defines an $L^p$ function, in particular that this integral is meaningful for almost all $x$. But this integral is clearly dominated pointwise (in $x$) by the integral
\[
  \int |f(y)| \, |g(x - y)| \, dy \tag{6.14}
\]
and we know that the latter is in $L^p$ by the argument above for nonnegative $f, g$. In particular, for almost all $x$, the function $y \mapsto |f(y)| \, |g(x - y)|$ is integrable, thus for almost all $x$ the function $y \mapsto f(y) g(x - y)$ is in $L^1$.
Therefore the integral (6.13) is meaningful for almost all $x$, and to check that it is in $L^p$ as a function of $x$, it is enough to show that it has a nonnegative majorant in $L^p$. But clearly (6.14) majorizes (6.13) and it is in $L^p$.

Remark on the proof of Theorem 6.8 in the general case. We do not yet have all the tools for the proof of this theorem in the general case: it requires knowing that the dual space of $L^r$ is $L^{r'}$. Then $f \star g$ will be identified by its integral against any function $h \in L^r$, i.e. by
\[
  \int (f \star g)(y) h(y) \, dy,
\]
which (modulo a sign flip) is exactly the double integral in Young's inequality. Young's inequality will tell us that this double integral makes sense for any $h \in L^r$; moreover, it is a bounded linear functional on $L^r$, therefore $f \star g$ can be identified with an element of $L^{r'}$. We will learn all this later, but keep the theorem in mind.

7 Approximation by $C_0^\infty$ functions

The goal is to prove the following basic approximation theorem. Recall that for any open domain $\Omega \subset \mathbb{R}^d$ we denote by $C_0^\infty(\Omega)$ the set of compactly supported, smooth (= infinitely many times differentiable) functions:
\[
  C_0^\infty(\Omega) := \Big\{ f : \Omega \to \mathbb{C} \; : \; \operatorname{supp}(f) \subset \Omega \text{ is compact}, \; \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_d^{\alpha_d} f(x) \text{ exists } \forall x \in \Omega, \; \forall \alpha_j \in \mathbb{N} \Big\}.
\]
(Some books use the notation $C_c^\infty(\Omega)$.)

WARNING: Recall the precise definition of the support (Träger) of a continuous function,
\[
  \operatorname{supp}(f) := \overline{\{ x \in \mathbb{R}^d : f(x) \ne 0 \}},
\]
i.e. it is the closure of the set of all points where $f$ does not vanish. In particular, since $\Omega$ is open, a function with compact support in $\Omega$ must vanish in a neighborhood of the boundary.

Theorem 7.1 Let $\Omega \subset \mathbb{R}^d$ be a non-empty open set and let $1 \le p < \infty$. Then $C_0^\infty(\Omega)$ is dense in the space $L^p(\Omega, dx)$ equipped with the $L^p$ norm.

In particular, it follows from this theorem that $C[0, 1]$ is dense in $L^p[0, 1]$ for any $L^p$ norm if $p < \infty$. Note that equipped with the supremum (or $L^\infty$) norm, $(C[0, 1], L^\infty)$ is not dense in $(L^\infty[0, 1], L^\infty)$, because both spaces are complete and they are obviously not equal.
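The non-density of $C[0,1]$ in $L^\infty[0,1]$ can also be seen by hand. The following is only an illustrative sketch, not part of the original argument; the particular function $g = \chi_{[0,1/2]}$ is a choice made here for the example. It shows that $g$ stays at $L^\infty$ distance at least $1/2$ from every continuous function.

```latex
% Illustration only: g is a hypothetical example function chosen for this check.
Let $g := \chi_{[0,1/2]} \in L^\infty[0,1]$ and let $h \in C[0,1]$ be arbitrary.
Suppose that $\delta := \|g - h\|_\infty < \tfrac12$. Then
\[
  |1 - h(x)| \le \delta \quad \text{for a.e. } x \in [0, \tfrac12],
  \qquad
  |h(x)| \le \delta \quad \text{for a.e. } x \in (\tfrac12, 1].
\]
A set of full measure in an interval is dense in it, and $h$ is continuous,
so letting $x \to \tfrac12$ from the left and from the right gives
\[
  h(\tfrac12) \ge 1 - \delta > \tfrac12
  \qquad \text{and} \qquad
  h(\tfrac12) \le \delta < \tfrac12,
\]
a contradiction. Hence $\|g - h\|_\infty \ge \tfrac12$ for every $h \in C[0,1]$.
```

The same jump argument fails for $p < \infty$, since a jump can be smoothed out at an arbitrarily small cost in the $L^p$ norm.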
Summarizing the conclusions of the Riesz-Fischer theorem and Theorem 7.1, we obtain

Corollary 7.2 Let $1 \le p < \infty$ and let $\Omega \subset \mathbb{R}^d$ be open. Then the completion of $C_0^\infty(\Omega)$ equipped with the $L^p$ norm is $L^p(\Omega)$.

Homework 7.3 Let $\Omega \subset \mathbb{R}^d$ be open. Show that the completion of $C_0^\infty(\Omega)$ equipped with the supremum norm is $C(\Omega)$.

Proof of Theorem 7.1. We will show the proof for $\Omega = \mathbb{R}^d$; the general case will be homework. Choose an arbitrary function $j \in L^1(\mathbb{R}^d)$ with $\int j = 1$. Define
\[
  j_\varepsilon(x) := \varepsilon^{-d} j\Big( \frac{x}{\varepsilon} \Big).
\]
Note that
\[
  \int j_\varepsilon = 1, \qquad \|j\|_1 = \|j_\varepsilon\|_1 \tag{7.15}
\]
(this is how the normalization was chosen), and as $\varepsilon \to 0$, the function $j_\varepsilon$ is more and more concentrated and peaked around the origin. Let $f \in L^p$, $1 \le p < \infty$, and define
\[
  f_\varepsilon(x) := (f \star j_\varepsilon)(x) = \int f(y) j_\varepsilon(x - y) \, dy.
\]
According to Theorem 6.8, $f_\varepsilon$ is an $L^p$ function and
\[
  \|f_\varepsilon\|_p \le \|f\|_p \|j\|_1 \tag{7.16}
\]
(we used $\|j\|_1 = \|j_\varepsilon\|_1$, and we used only the special case of Theorem 6.8 that we proved). Since $j_\varepsilon$ is very strongly concentrated around $0$ with total integral $1$, we expect that $f_\varepsilon$ is close to $f$. This is the content of the following

Proposition 7.4 Assuming $f \in L^p$, $1 \le p < \infty$, we have
\[
  \lim_{\varepsilon \to 0} \|f - f_\varepsilon\|_p = 0.
\]

Proof of Proposition 7.4. The proof consists of several standard steps. We will go through them because similar arguments are used very often in analysis, and usually they are not explained in detail: one simply says "by standard approximation arguments", and it is assumed that everybody went through such a proof once in his/her life.

Step 1. We show that it is sufficient to prove the Proposition if $j$ has compact support. For any sufficiently large $R$, we define
\[
  j^R(x) := C_R \, \chi(|x| < R) \, j(x)
\]
(here $R$ is not a power, but an upper index), where $\chi(|x| < R)$ is the characteristic function of the ball $|x| < R$ and $C_R$ is the normalization
\[
  C_R := \Big( \int_{|x| < R} j(x) \, dx \Big)^{-1}
\]
to ensure that $\int j^R = 1$. Obviously, $C_R \to 1$ as $R \to \infty$.
As before, we define
\[
  j_\varepsilon^R(x) := \varepsilon^{-d} j^R(x / \varepsilon).
\]
Then, by using $j - j^R = [(1 - \chi) + (1 - C_R)\chi] \, j$, we have
\[
  \|j_\varepsilon - j_\varepsilon^R\|_1 = \|j - j^R\|_1 \le \int_{|x| \ge R} |j| + |C_R - 1| \int_{|x| \le R} |j| \to 0
\]
as $R \to \infty$, uniformly in $\varepsilon$. Therefore, by inequality (7.16) (which is basically a special case of Young's inequality), we have
\[
  \|j_\varepsilon \star f - j_\varepsilon^R \star f\|_p \le \|f\|_p \|j_\varepsilon - j_\varepsilon^R\|_1 \to 0
\]
uniformly in $\varepsilon$ as $R \to \infty$. This shows that one can replace $j$ with a compactly supported version $j^R$ and the error can be made arbitrarily small. This technique is called cutoff at infinity.

Step 2. With an almost identical (actually somewhat easier) cutoff argument, it is sufficient to show the Proposition for compactly supported $f$ (HOMEWORK: think it over).

Step 3. Now we show that it is sufficient to prove the Proposition for bounded $f$. We again use a cutoff argument, but now not in the domain ($x$-space) but in the range. For a sufficiently large positive $h$ we define
\[
  f^h(x) := f(x) \, \chi\{ x : |f(x)| \le h \}.
\]
Again, by (7.16) and (7.15), we have
\[
  \|j_\varepsilon \star f - j_\varepsilon \star f^h\|_p \le \|j\|_1 \|f - f^h\|_p,
\]
and clearly $\|f - f^h\|_p \to 0$ as $h \to \infty$. The estimate is again uniform in $\varepsilon$.

Step 4. Now we show that it is sufficient to prove the Proposition for $p = 1$. Indeed, for any $1 < p < \infty$ we have
\[
  \|j_\varepsilon \star f - f\|_p^p = \int \Big| \int j_\varepsilon(x - y) f(y) \, dy - f(x) \Big|^p dx.
\]
We can estimate
\[
  \Big| \int j_\varepsilon(x - y) f(y) \, dy - f(x) \Big|^{p-1} \le C \|f\|_\infty^{p-1},
\]
where $C := (\|j\|_1 + 1)^{p-1}$, thus
\[
  \|j_\varepsilon \star f - f\|_p^p \le C \|f\|_\infty^{p-1} \int \Big| \int j_\varepsilon(x - y) f(y) \, dy - f(x) \Big| \, dx = C \|f\|_\infty^{p-1} \|j_\varepsilon \star f - f\|_1.
\]
Thus it is sufficient to show $\|j_\varepsilon \star f - f\|_1 \to 0$ as $\varepsilon \to 0$. One should also check that the condition $f \in L^p$ can be translated to $f \in L^1$, but we already assumed that $f$ is compactly supported and bounded, so it is in every $L^p$ space.

Step 5. It is sufficient to prove the Proposition for simple functions of the form
\[
  f = \sum_i c_i \chi_{R_i},
\]
where the sum is finite, $c_i \in \mathbb{C}$ and the $R_i$ are rectangles. To see this, we recall that the set of simple functions of this form is dense in $L^1$; in other words, any $L^1$ function can be approximated by them in the $L^1$ sense.
(This fact follows from the construction of the Lebesgue integral, plus the regularity of the Lebesgue measure, plus the fact that any open set in $\mathbb{R}^d$ can be approximated by rectangles. THINK IT OVER!)

For any given $f \in L^1$, let $f_n$ be a sequence of simple functions such that $f_n \to f$ in $L^1$. Suppose that the Proposition is proven for every $f_n$. Then
\[
  \begin{aligned}
    \|j_\varepsilon \star f - f\|_1
    &\le \|j_\varepsilon \star (f - f_n)\|_1 + \|j_\varepsilon \star f_n - f_n\|_1 + \|f_n - f\|_1 \\
    &\le (\|j\|_1 + 1) \|f_n - f\|_1 + \|j_\varepsilon \star f_n - f_n\|_1.
  \end{aligned}
\]
For any given $\eta > 0$, the first term can be made smaller than $\eta/2$ by choosing $n$ sufficiently large, and this choice is uniform in $\varepsilon$. After choosing $n$ sufficiently large, we can fix it and choose $\varepsilon$ sufficiently small so that the second term becomes smaller than $\eta/2$. Thus $\|j_\varepsilon \star f - f\|_1$ can be made smaller than any given $\eta$ if $\varepsilon$ is sufficiently small, and this proves Step 5.

Step 6. By linearity of the convolution and the triangle inequality of the norm, it is sufficient to prove the Proposition for $f = \chi_R$, i.e. for the characteristic function of a single rectangle.

Step 7. By an explicit calculation:
\[
  \begin{aligned}
    \|j_\varepsilon \star \chi_R - \chi_R\|_1
    &= \int \Big| \int j_\varepsilon(x - y) \chi_R(y) \, dy - \chi_R(x) \Big| \, dx \\
    &= \int \Big| \int j_\varepsilon(x - y) \big( \chi_R(y) - \chi_R(x) \big) \, dy \Big| \, dx.
  \end{aligned}
\]
Notice the trick of bringing the second term $\chi_R(x)$ inside the integration by using that $\int j_\varepsilon = 1$. The integrand $j_\varepsilon(x - y)(\chi_R(y) - \chi_R(x))$ is explicitly zero unless $\operatorname{dist}(x, \partial R) \le \varepsilon \ell$, where $\partial R$ is the boundary of $R$ and $j$ is supported in a ball of radius $\ell$. This is because the first factor in $j_\varepsilon(x - y)(\chi_R(y) - \chi_R(x))$ is zero whenever $|x - y| \ge \varepsilon \ell$, and the second factor is nonzero only if exactly one of the two points $x, y$ lies in $R$. Therefore
\[
  \begin{aligned}
    \|j_\varepsilon \star \chi_R - \chi_R\|_1
    &= \int_{\operatorname{dist}(x, \partial R) \le \varepsilon \ell} \Big| \int j_\varepsilon(x - y) \big( \chi_R(y) - \chi_R(x) \big) \, dy \Big| \, dx \\
    &\le 2 \|j_\varepsilon\|_1 \int_{\operatorname{dist}(x, \partial R) \le \varepsilon \ell} 1 \, dx
     = 2 \|j\|_1 \operatorname{vol}\{ x : \operatorname{dist}(x, \partial R) \le \varepsilon \ell \} \to 0
  \end{aligned}
\]
as $\varepsilon \to 0$, since the volume of an $\varepsilon \ell$ neighborhood of the boundary of a fixed rectangle $R$ is of order $\varepsilon$ (here $\ell$ is fixed). This completes the proof of Proposition 7.4.

From Proposition 7.4 our Theorem 7.1 easily follows.
Simply consider a smooth, compactly supported function $j \in L^1$ with $\int j = 1$. If $f$ has compact support, then so does
\[
  f_\varepsilon(x) = \int j_\varepsilon(x - y) f(y) \, dy,
\]
and by the same argument as in Step 2 above, it is sufficient to prove Theorem 7.1 for compactly supported $f$ (THINK IT OVER). Since $f_\varepsilon \to f$ in $L^p$, Theorem 7.1 will be proven once we show that $f_\varepsilon \in C_0^\infty$. We will show that
\[
  \frac{\partial f_\varepsilon}{\partial x_j} = \frac{\partial j_\varepsilon}{\partial x_j} \star f, \tag{7.17}
\]
i.e. the convolution $f_\varepsilon = j_\varepsilon \star f$ can be differentiated by differentiating one of its factors. Differentiability up to arbitrary order will then follow by induction. To show (7.17), we form the difference quotient on the left hand side after changing variables:
\[
  \int \frac{j_\varepsilon(\ldots, y_j + \delta, \ldots) - j_\varepsilon(\ldots, y_j, \ldots)}{\delta} \, f(x - y) \, dy.
\]
The fraction in the integrand converges to $\frac{\partial j_\varepsilon}{\partial x_j}(y)$ pointwise, and it is also uniformly bounded in $\delta$ (here $\varepsilon$ is fixed!), since $j_\varepsilon$ is smooth and compactly supported, thus its first derivatives are bounded (and the first derivatives control the difference quotients by the Taylor formula with remainder term, THINK IT OVER!). Thus by the dominated convergence theorem we obtain (7.17), and this completes the proof for the case $\Omega = \mathbb{R}^d$.

Homework 7.5 The above proof was for $\Omega = \mathbb{R}^d$. Prove the theorem for any open set $\Omega$. [Hint: show that there exists an increasing sequence of compact sets $K_1 \subset K_2 \subset \ldots \subset \Omega$ such that if $f_n := f \chi_{K_n}$, then $\|f - f_n\|_{L^p(\Omega)} \to 0$ as $n \to \infty$. Apply the construction described above to each $f_n$. The construction shows that the support of the approximating functions of $f_n$ can be chosen arbitrarily close to the support of $f_n$; in particular, it can be chosen in an arbitrarily small neighborhood of $K_n$, i.e. it can be chosen within $\Omega$.]

This completes the proof of the approximation theorem.
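The proof only requires some smooth, compactly supported $j$ with $\int j = 1$; no specific choice is made in these notes. As an illustration, here is one standard choice (the bump function below and the constant $c_d$ are introduced only for this example), together with the elementary checks of the normalization and of the support of $f_\varepsilon$, the latter being what the hint of Homework 7.5 relies on.

```latex
% Illustration only: a standard mollifier; the notes do not fix a particular j.
\[
  j(x) :=
  \begin{cases}
    c_d \exp\!\Big( -\dfrac{1}{1 - |x|^2} \Big), & |x| < 1, \\[1ex]
    0, & |x| \ge 1,
  \end{cases}
  \qquad
  c_d := \Big( \int_{|x| < 1} \exp\!\Big( -\frac{1}{1 - |x|^2} \Big) \, dx \Big)^{-1}.
\]
Then $j \in C_0^\infty(\mathbb{R}^d)$, $j \ge 0$, $\operatorname{supp}(j) = \{ |x| \le 1 \}$
and $\int j = 1$. For $j_\varepsilon(x) = \varepsilon^{-d} j(x/\varepsilon)$, the
substitution $x = \varepsilon z$, $dx = \varepsilon^{d} \, dz$ gives
\[
  \int_{\mathbb{R}^d} j_\varepsilon(x) \, dx
  = \int_{\mathbb{R}^d} \varepsilon^{-d} j(z) \, \varepsilon^{d} \, dz = 1,
  \qquad
  \operatorname{supp}(j_\varepsilon) = \{ |x| \le \varepsilon \}.
\]
Consequently, if $f$ vanishes outside a compact set $K$, then
$f_\varepsilon(x) = \int j_\varepsilon(x - y) f(y) \, dy$ can be nonzero only if
some $y \in K$ satisfies $|x - y| < \varepsilon$, so
\[
  \operatorname{supp}(f_\varepsilon) \subset \{ x : \operatorname{dist}(x, K) \le \varepsilon \}.
\]
```

With this choice the radius $\ell$ from Step 7 equals $1$, and for $K \subset \Omega$ compact the set $\{ x : \operatorname{dist}(x, K) \le \varepsilon \}$ lies inside $\Omega$ as soon as $\varepsilon$ is smaller than the distance from $K$ to the complement of $\Omega$.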