Measures and Integration
László Erdős
Nov 9, 2007
Based upon the poll in class (and the required prerequisite for the course, Analysis III),
I assume that everybody is familiar with general measure theory and Lebesgue integration.
The beginning of this note (Sections 1 and 2) is meant to remind you which concepts
this involves. If something is unknown, look it up. I provide a good summary of the basic
concepts (without proofs) by Marcel Griesemer. This file actually contains a bit more than
we need; see the list below. Another summary can be found e.g. in Appendix A of Werner:
Funktionalanalysis. If you need to check more details (e.g. proofs), consult any analysis
or measure theory book. Very good books are H.L. Royden: Real Analysis and Walter Rudin:
Principles of Mathematical Analysis. A very concise and sharp introduction is in E. Lieb, M.
Loss: Analysis.
Need for Lebesgue Integral
One can justify the necessity of a more general integration (than Riemann) in many ways.
From the point of view of functional analysis there are two “natural” arguments.
Need for Lebesgue I.
Recall that we have equipped $C[0,1]$ with two different metrics (actually, norms) $d_1$ and
$d_\infty$ (or $\|\cdot\|_1$ and $\|\cdot\|_\infty$ in terms of norms). This space was proven to be complete under the
$d_\infty$ metric. It is clearly not complete under the $d_1$ metric: it is trivial to find a sequence of
continuous functions $f_n$ and a discontinuous function $f : [0,1] \to \mathbb{R}$ such that $\int_0^1 |f_n - f| \to 0$.
In particular, $f_n$ is Cauchy in $d_1$ (it even converges), but the limit is not in $C[0,1]$.
You may think there is no problem, since we know how to Riemann-integrate functions
with jumps, e.g. piecewise continuous functions. So if $(C[0,1], d_1)$ is not complete, maybe
$(PC[0,1], d_1)$ is. It is fairly easy to see that this is not the case:
Homework 1.1 Prove that $(PC[0,1], d_1)$ is not complete.
We know that $PC[0,1]$ is not the biggest class of functions that are Riemann integrable;
in fact, Riemann integrability allows infinitely many discontinuities, as long as the
difference of the lower and upper sums converges to zero, i.e. the oscillation of the
function is not too big. The basic theorem about Riemann integrability is the following:
Theorem 1.2 A function $f : [0,1] \to \mathbb{R}$ is Riemann integrable if and only if $f$ is bounded
and it is continuous almost everywhere, i.e. the set of discontinuities is of (Lebesgue) measure
zero.
Homework 1.3 Prove directly (without reference to the above characterization of Riemann
integrability) that the set of Riemann integrable functions equipped with the $d_1$ metric is not
complete. (Hint: take a Cantor set $C$ that has nonzero measure and consider its approximations $C_n$ that you obtain along the Cantor procedure after removing the $n$-th generation of
intervals. Then take the characteristic functions of these sets.)
Need for Lebesgue II.
We have seen that pointwise convergence and continuity (say in C[0, 1]) are not compatible
without further assumptions. What about pointwise convergence and Riemann integration?
Is
$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx$$
true? (In the sense that the pointwise limit of Riemann integrable functions is Riemann integrable
and the limit of the integrals is the integral of the limit.) We know that without some further
condition this cannot hold; just consider the sequence
$$f_n(x) = n\,\chi_{(0,1/n)}(x),$$
that clearly converges to $f \equiv 0$ pointwise, but $\int_0^1 f_n = 1$.
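The counterexample can be checked by hand, but a small sketch may help to see both statements at once. The names `f` and `integral` below are illustrative choices, not notation from these notes, and the integral is computed as height times interval length rather than by a general integration routine.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n * chi_(0,1/n)(x), evaluated at a rational point x."""
    return n if 0 < x < Fraction(1, n) else 0

def integral(n):
    """Integral of f_n over [0,1]: height n times the length 1/n of (0, 1/n)."""
    return n * Fraction(1, n)

ns = (1, 5, 10, 100, 1000)
x = Fraction(1, 10)

# At the fixed point x = 1/10 the values vanish as soon as 1/n <= x ...
values_at_x = [f(n, x) for n in ns]
# ... while the integrals never move away from 1.
integrals = [integral(n) for n in ns]

print(values_at_x)
print(integrals)
```

The same happens at every fixed x in (0, 1]: the values are eventually zero, so the pointwise limit is 0 although each integral equals 1.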
Suppose we are willing to assume uniform boundedness (that is anyway reasonable in
the realm of Riemann integrable functions and, a-posteriori, we know from the dominated
convergence theorem that some condition is necessary) in order to save the exchangeability of
the limit and integral. It still does not work, for example, consider the Dirichlet function
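$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0 & \text{if } x \in [0,1] \setminus \mathbb{Q} \end{cases}$$
and its approximations
$$f_n(x) = \begin{cases} 1 & \text{if } x = \frac{p}{q},\ p, q \in \mathbb{Z},\ p \le q \le n \\ 0 & \text{otherwise.} \end{cases}$$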
Clearly $f_n$ is Riemann integrable (since it is zero everywhere apart from finitely many points),
while its pointwise limit, $f$, is not Riemann integrable (WHY?). Again, the problem is the big
oscillation.
Concepts you should know
You should be familiar with the following concepts and theorems:
• σ-algebra (meaningful on any set X)
• Borel sets (meaningful on a topological space).
• Measures, outer measures. Measure spaces.
• Regular measures on topological spaces (approximability of measures of sets by open
sets from outside and compact sets from inside)
• Lebesgue measure and its properties. Lebesgue measure is the unique measure on $\mathbb{R}^n$
that is invariant under Euclidean motions and assigns 1 to the unit cube.
• Lebesgue measurable sets. Zero measure sets. Concept of “almost everywhere”. Not all
Lebesgue sets are Borel (this is not easy to prove)
• Counting measure on the measure space (N, P (N), µ), where P (N) is the σ-algebra of
all subsets and µ is the counting measure.
• (Borel)-measurable functions. This class is closed under arithmetic operations, compositions, lim inf and lim sup.
• Lebesgue integral. Integrable functions. Usual properties (linearity, monotonicity,
$\left|\int f\right| \le \int |f|$).
• Lebesgue integral coincides with Riemann integral for Riemann integrable functions.
• Basic limit theorems: Monotone and dominated convergence, Fatou’s Lemma.
• Lebesgue integral of complex valued functions (infinite integral is not allowed, $\int |f| < \infty$
is required).
• σ-finite measure spaces.
• Product of two measure spaces (construction of the product σ-algebra and product
measure). Fubini theorem (need non-negativity or integrability with respect to the
product measure to interchange integrals)
We will use the notations
$$\int_\Omega f\,d\mu = \int_\Omega f(x)\,d\mu(x)$$
interchangeably for the Lebesgue integral. The second notation is favored if for some reason
the integration variable needs to be spelled out explicitly (e.g. when we have multiple integrals).
If $\Omega \subset \mathbb{R}^d$, then we use
$$\int_\Omega f(x)\,dx,$$
where $dx$ stands for the Lebesgue measure. Unless we indicate otherwise, integration on
subsets of $\mathbb{R}^d$ is always understood with respect to the Lebesgue measure.
Singular measures
This chapter usually belongs to measure theory but I am not sure if the majority of you had
it. So we review it. We first present examples on R, then develop the general definitions.
Let $\alpha : \mathbb{R} \to \mathbb{R}$ be a monotonically increasing function. A monotone function need not
be continuous, but its one-sided limits exist at every point; we introduce the notation
$$\alpha(a+0) := \lim_{x \to a+0} \alpha(x), \qquad \alpha(a-0) := \lim_{x \to a-0} \alpha(x).$$
We define a measure $\mu_\alpha$ by assigning
$$\mu_\alpha((a,b)) := \alpha(b-0) - \alpha(a+0)$$
to any open interval $(a,b)$. Since open intervals generate the sigma-algebra of Borel sets, it is
easy to see that the usual construction of Lebesgue measure (using $\alpha(x) = x$) goes through for
this more general case. The resulting measure is called the Lebesgue-Stieltjes measure.
With respect to this measure we can integrate; the corresponding integral is sometimes denoted by
$$\int f\,d\mu_\alpha = \int f\,d\alpha$$
(the right hand side is only a notation). This is called the Lebesgue-Stieltjes integral.
(i) As mentioned, $\alpha(x) = x$ gives back the Lebesgue integral. A bit more generally, if
$\alpha \in C^1$ (continuously differentiable), then it is easy to check that
$$\int f\,d\alpha = \int f(x)\,\alpha'(x)\,dx,$$
i.e. in this case the Lebesgue-Stieltjes integral can be expressed as a Lebesgue integral
with the weight function $\alpha'$.
(ii) Fix a number $p \in \mathbb{R}$. Let $\alpha(x) := \chi(x \ge p)$ be the characteristic function of the semi-axis
$[p, \infty)$. CHECK that
$$\int f\,d\alpha = f(p)$$
for any function $f$. In particular, any function is integrable and the integral depends
only on the value at the point $p$. The corresponding $L^1$ space is simply
$$L^1(\mathbb{R}, d\alpha) \cong \mathbb{C},$$
i.e. it is a one-dimensional vectorspace (CHECK!).
The generated Lebesgue-Stieltjes measure is called the Dirac delta measure at $p$ and
it is denoted by $\delta_p$. In particular,
$$\delta_p(A) = \begin{cases} 1 & \text{if } p \in A \\ 0 & \text{if } p \notin A. \end{cases}$$
(iii) Let $f \ge 0$ be a measurable function on $\mathbb{R}$ with finite total integral. Let
$$\alpha(x) := \int_{-\infty}^x f(s)\,ds,$$
then
$$\mu_\alpha(A) = \int_A f(x)\,dx.$$
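(iv) A considerably more interesting example is the following function. Consider the standard
Cantor set, i.e.
$$C := [0,1] \setminus \Big[ \big(\tfrac{1}{3}, \tfrac{2}{3}\big) \cup \big(\tfrac{1}{9}, \tfrac{2}{9}\big) \cup \big(\tfrac{7}{9}, \tfrac{8}{9}\big) \cup \big(\tfrac{1}{27}, \tfrac{2}{27}\big) \cup \dots \Big].$$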
Recall that the Cantor set is a compact, uncountable set. It is easy to see that the
Lebesgue measure of $C$ is zero. Define an increasing function $\alpha$ on $[0,1]$ as follows: $\alpha$
will be constant on each of the intervals removed in the definition of $C$, more precisely
$$\alpha(x) := \begin{cases}
1/2 & x \in (1/3, 2/3) \\
1/4 & x \in (1/9, 2/9) \\
3/4 & x \in (7/9, 8/9) \\
1/8 & x \in (1/27, 2/27) \\
3/8 & x \in (7/27, 8/27) \\
5/8 & x \in (19/27, 20/27) \\
7/8 & x \in (25/27, 26/27) \\
\ \vdots &
\end{cases}$$
Make a picture to see the successive definition of α on the complement of the Cantor set.
With these formulas we have not yet defined α on C.
Homework 3.1 (a) Show that the function α defined on [0, 1] \ C above can be uniquely
extended to [0, 1] by keeping monotonicity. This is called the Devil’s staircase.
(b) Show that the extension is continuous.
(c) Let $\mu_\alpha$ be the corresponding Lebesgue-Stieltjes measure. Show that $\mu_\alpha(\{p\}) = 0$ for any
point $p \in [0,1]$.
(d) Show that dµα is supported on a set of Lebesgue measure zero. (Recall that the
support (Träger) of a measure µ is the smallest closed set K such that for any
proper closed subset H we have µ(K \ H) > 0.)
(e) Show that $\alpha$ is almost everywhere differentiable in $[0,1]$ but the fundamental theorem
of calculus does not hold, e.g.
$$\alpha(1) - \alpha(0) \ne \int_0^1 \alpha'(x)\,dx.$$
Homework 3.2 Let $\mu_\alpha$ be the Lebesgue-Stieltjes measure constructed in the previous
Homework. Compute
$$\text{(a)} \int_0^1 x\,d\mu_\alpha(x), \quad\text{and}\quad \text{(b)} \int_0^1 x^2\,d\mu_\alpha(x).$$
[Hint: use the hierarchical structure of C]
This example shows that without the fundamental theorem of calculus it can be quite
complicated to compute integrals. In this particular example the special structure of
C and α helped. If one constructs either a less symmetric Cantor set or one defines α
differently, it may be very complicated to compute the integral.
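As a companion to the picture suggested in example (iv), the following sketch evaluates α at rational points. It relies on the standard ternary-digit description of the Devil's staircase (ternary digits 0 and 2 become binary digits 0 and 1, and the first digit 1 terminates the expansion); this description is an assumption brought in for illustration and is not derived in these notes. The function name `cantor_alpha` and the cut-off `depth` are hypothetical.

```python
from fractions import Fraction

def cantor_alpha(x, depth=40):
    """Evaluate the Devil's staircase at a rational x in [0, 1].

    Reads the ternary digits of x one by one. A digit 1 means that x lies
    in (the closure of) a removed middle third, where alpha is constant.
    """
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)            # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return value + weight
        value += weight * (digit // 2)
        weight /= 2
    return value                  # truncated after `depth` digits

# Midpoints of the removed intervals listed in example (iv).
midpoints = [Fraction(1, 2), Fraction(1, 6), Fraction(5, 6),
             Fraction(1, 18), Fraction(5, 18), Fraction(13, 18), Fraction(17, 18)]
print([cantor_alpha(m) for m in midpoints])
```

The printed values can be compared with the table defining α on the removed intervals; the sketch says nothing about uniqueness or continuity of the extension, which remain the content of Homework 3.1.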
The last three examples were prototypes of a certain classification of measures according
to their singularity structure. The Dirac delta measure is so singular that it assigns nonzero
value to a set consisting of a single point, namely δp ({p}) = 1. The measure dµα obtained from
the “Devil’s staircase”, example (iv), is less singular, since it assigns zero measure to every
point, but it is still supported on a small set (measured with respect to the usual Lebesgue
measure). Finally, example (iii) shows a non-singular measure in the sense that $\mu_\alpha(A) = 0$ for
any set $A$ of zero Lebesgue measure.
We give the precise definitions of these classes.
Definition 3.3 Let µ and ν be two measures defined on a fixed σ-algebra on a space X. Then
(a) ν is absolutely continuous (absolutstetig) with respect to µ if ν(A) = 0 whenever
µ(A) = 0. Notation: ν ≪ µ;
(b) µ and ν are mutually singular if there is a measurable set A such that µ(A) = 0 and
ν(X \ A) = 0. Notation: µ ⊥ ν.
Example (iii) is a measure that is absolutely continuous with respect to the Lebesgue
measure, while examples (ii) and (iv) are both mutually singular with the Lebesgue measure
(and with each other as well).
It is clear that example (iii) is an absolutely continuous measure. It is less clear that
essentially every absolutely continuous measure is the result of an integration. This is the content
of the important Radon-Nikodym theorem, whose proof we postpone:
Theorem 3.4 (Radon-Nikodym) Let µ and ν be two measures on a common σ-algebra on
X and let µ be σ-finite. Then ν ≪ µ if and only if there exists a measurable function $f : X \to \mathbb{R}_+$
(infinity is allowed) such that
$$\nu(A) = \int_A f(x)\,d\mu(x)$$
for any A in the σ-algebra. This function is µ-a.e. (almost everywhere) unique. Notation:
$$f = \frac{d\nu}{d\mu}$$
(this is only a formal fraction!)
Moreover, we also have the following decomposition which we mention without proof:
Theorem 3.5 (Lebesgue decomposition I.) Let µ and ν be two σ-finite measures on a
common σ-algebra. Then ν can be uniquely decomposed as
$$\nu = \nu_{ac} + \nu_{sing},$$
where $\nu_{ac} \ll \mu$ and $\nu_{sing} \perp \mu$.
The singular part can be further decomposed under a mild countability condition on the
number of points that have positive measure:
Definition 3.6 Let $(X, B, \mu)$ be a measure space such that for every point $x \in X$, the set $\{x\}$
belongs to $B$, and let
$$P := \{x \in X : \mu(\{x\}) \ne 0\}.$$
The set $P$ is called the set of pure points or atomic points of the measure $\mu$. Assume that $P$ is
a countable set. Then the measure
$$\mu_{pp}(A) := \mu(A \cap P) = \sum_{x \in A \cap P} \mu(\{x\})$$
is well-defined and it is called the pure point or atomic component of $\mu$. A measure $\mu$ is
called a pure point measure if $\mu = \mu_{pp}$. A measure $\mu$ is called continuous if $\mu_{pp} = 0$.
Given another measure ν on the same σ-algebra, the measure µ is singular continuous
with respect to ν if µ is continuous and µ ⊥ ν.
The Dirac delta measure from example (ii) is an atomic measure; examples (iii) and (iv)
are continuous measures. Example (iii) is a measure that is absolutely continuous with respect
to the Lebesgue measure, while (iv) and the Lebesgue measure are mutually singular. The
measure in (iv) is thus a singular continuous measure with respect to the Lebesgue measure.
The following theorem is a simple exercise from these definitions:
Theorem 3.7 Given two measures µ, ν on the same σ-algebra that contains each {x}, assume
that ν ⊥ µ and assume that the set of atoms of ν is countable. Then the measure ν can be
uniquely decomposed into ν = νpp + νsc , where νpp is the pure point component of ν and νsc is
a singular continuous measure that is also mutually singular to νpp .
The most important application is the following version of these decomposition theorems
whose proof is a simple exercise from the definitions above.
Theorem 3.8 (Lebesgue decomposition II.) Let µ and ν be two σ-finite measures on a
common σ-algebra that contains each {x}. (In particular there are at most countably many
points with nonzero weight). Then ν can be uniquely decomposed as
$$\nu = \nu_{ac} + \nu_{pp} + \nu_{sc},$$
where $\nu_{ac} \ll \mu$, $\nu_{sc} \perp \mu$ and $\nu_{pp}$ is the pure-point component of $\nu$.
Dominated convergence theorem resolved the “Need for Lebesgue II.” by demonstrating that
pointwise limit and integration can be interchanged within the Lebesgue framework (assuming
the existence of the integrable dominating function). What about “Need for Lebesgue I”?
It is clear that the formula
$$\|f\|_1 := \int_0^1 |f(x)|\,dx \tag{4.1}$$
extends the norm (metric) d1 from C[0, 1] to all Lebesgue integrable functions on [0, 1], since
Riemann and Lebesgue integrals coincide on continuous functions. In the Riesz-Fischer theorem below (Section 5) we will show, that the space of Lebesgue integrable functions is actually
complete, so it is one of the possible completions of (C[0, 1], d1) (we do not know yet that it
is the smallest possible, for that we will have to show that the continuous functions are dense
in the set of Lebesgue integrable ones).
However, before we discuss this, we have to introduce the $L^p$ spaces. It would be tempting
to equip the space of Lebesgue integrable functions with the norm given by (4.1). Unfortunately,
this is not a norm, for a “stupid” reason: the Lebesgue integral is insensitive to changing the
integrand on a set of measure zero. In particular, $\|f\|_1 = 0$ does not imply that $f(x) = 0$ for all
$x$, only for almost all $x$.
The following idea circumvents this problem and we discuss it in full generality. Let
(Ω, B, µ) be a measure space (where Ω is the base set, B is a σ-algebra and µ is the measure).
We consider the following equivalence relation on the set of functions $f : \Omega \to \mathbb{C}$:
$$f \sim g \quad\text{iff}\quad f(x) = g(x) \text{ for $\mu$-almost all } x.$$
It needs a (trivial) proof that this is indeed an equivalence relation.
Suppose that $f$ is integrable; then obviously any function in its equivalence class is also
integrable with the same integral. Therefore we consider the space
$$L^1(\Omega, B, \mu) = L^1(\Omega, \mu) = L^1(\Omega) = L^1 := \{\text{Integrable functions}\}/\sim,$$
i.e. the integrable functions factorized with this equivalence relation (the various notations
are all used in practice; in principle the concept of $L^1$ depends on the space, the measure
and the sigma-algebra, but in most cases it is clear from the context which sigma-algebra
and measure we consider, so we omit them from the notation). It is easy to see that the usual
vectorspace operations extend to the factorspace. Moreover, integration naturally extends
to $L^1(\Omega, \mu)$.
The only thing to keep in mind is that notationally we still keep denoting elements of
$L^1(\Omega, \mu)$ by $f(x)$, even though $f(x)$ does not make sense for a fixed $x$ for a general $L^1$ function
(for continuous functions it is of course meaningful).
Definition 4.1 Let $(\Omega, B, \mu)$ be a measure space and let $0 < p \le \infty$. We set
$$L^p(\Omega, \mu) := \Big\{ f : \Omega \to \mathbb{C} \text{ measurable} : \int_\Omega |f|^p\,d\mu < \infty \Big\}/\sim$$
for $p < \infty$ and
$$L^\infty(\Omega, \mu) := \{ f : \Omega \to \mathbb{C} \text{ measurable} : \operatorname{ess\,sup} |f| < \infty \}/\sim,$$
where the essential supremum of a function is defined by
$$\operatorname{ess\,sup} |f| := \inf\{K \in \mathbb{R} : |f(x)| \le K \text{ for almost all } x\}.$$
These spaces are called $L^p$-spaces or Lebesgue spaces.
Note that every element of a Lebesgue space is actually an equivalence class of functions. But this fact
is usually omitted from the notation.
Homework 4.2 Prove that $L^p(\Omega, \mu)$ is a vectorspace for any $p > 0$.
Definition 4.3 For $f \in L^p$ we define
$$\|f\|_p := \Big(\int_\Omega |f|^p\,d\mu\Big)^{1/p}$$
if $p < \infty$ and
$$\|f\|_\infty := \operatorname{ess\,sup} |f|$$
if $p = \infty$.
These formulas do not define a norm if $0 < p < 1$ (the triangle inequality is not satisfied),
but they do define a norm for $1 \le p \le \infty$. For the proof, one needs the Minkowski inequality
(Theorem 6.5)
$$\|f + g\|_p \le \|f\|_p + \|g\|_p,$$
that is exactly the triangle inequality for $\|\cdot\|_p$ (the other two properties of the norm are
trivially satisfied). From now on we will always assume that $1 \le p \le \infty$ whenever we talk
about $L^p$ spaces.
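The failure of the triangle inequality for $p < 1$ is already visible on a two-point space with counting measure. The sketch below is only an illustration under that assumption; the helper `p_norm` and the particular functions `f`, `g` are hypothetical choices, not taken from these notes.

```python
def p_norm(values, p):
    """(sum |f_i|^p)^(1/p) for a function on a finite set with counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

# Two "bumps" with disjoint supports on the two-point space {1, 2}.
f = [1.0, 0.0]
g = [0.0, 1.0]
h = [a + b for a, b in zip(f, g)]

# p = 1/2: the left side exceeds the right side, so this is not a norm.
lhs_half = p_norm(h, 0.5)
rhs_half = p_norm(f, 0.5) + p_norm(g, 0.5)

# p = 2: the triangle (Minkowski) inequality holds.
lhs_two = p_norm(h, 2)
rhs_two = p_norm(f, 2) + p_norm(g, 2)

print(lhs_half, rhs_half)
print(lhs_two, rhs_two)
```

For $p = 1/2$ the left side is $(1 + 1)^2$ while each summand on the right has norm 1, which is the whole point of the example.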
These norms naturally define the concept of $L^p$ convergence of functions:
Definition 4.4 A sequence of functions $f_n \in L^p$ converges to $f \in L^p$ in $L^p$-sense or in
$L^p$-norm if $\|f_n - f\|_p \to 0$ as $n \to \infty$.
In the case of $L^p$ convergent sequences, we often say that $f_n$ converges strongly (stark),
although this is a bit imprecise, since it does not specify the exponent $p$. We will see later
that it nevertheless distinguishes this notion from the concept of weak convergence.
These convergences naturally extend the $d_1$, $d_p$ and $d_\infty$ convergences on continuous functions we have studied earlier. Moreover, pointwise convergence also naturally extends to
$L^p$ functions, but we must keep in mind the problem that everything is defined only almost
everywhere.
Definition 4.5 (i) A sequence of measurable functions $f_n$ on a measure space $(\Omega, B, \mu)$ converges to a measurable function $f$ almost everywhere (fast überall) if there exists a set
$Z$ of measure zero, $\mu(Z) = 0$, such that
$$f_n(x) \to f(x) \qquad \forall x \notin Z.$$
(ii) A sequence of equivalence classes of measurable functions fn converges pointwise to an
equivalence class of measurable functions f , if any sequence of representatives of the classes
of fn converges to any representative of f .
It is an easy exercise to show that if the convergence holds for at least one sequence
of representatives, then it holds for any sequence (of course the exceptional set Z changes),
in particular part (ii) of the above definition is meaningful. Therefore one does not need to
distinguish between almost everywhere pointwise convergence of equivalence classes and their
representatives. In the future, we will thus freely talk about, e.g., Lp functions converging
almost everywhere pointwise without ever mentioning the equivalence classes.
Homework 4.6 Give examples that pointwise convergence does not imply $L^p$ convergence and
vice versa. Give also examples that convergence in $L^p$ does not in general imply convergence
in $L^q$, $p \ne q$.
There is, however, one positive statement:
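Lemma 4.7 Suppose the total measure of the space is finite, $\mu(\Omega) < \infty$. Then $L^p$ convergence
implies $L^q$ convergence whenever $q \le p$.
Proof. Use Hölder's inequality (we will prove it later, but I assume everybody has seen it):
$$\int_\Omega |f|^q\,d\mu = \int_\Omega |f|^q \cdot 1\,d\mu \le \Big(\int_\Omega |f|^{q \cdot p/q}\,d\mu\Big)^{q/p} \Big(\int_\Omega 1\,d\mu\Big)^{1 - q/p},$$
i.e.
$$\|f\|_q \le \|f\|_p\,\mu(\Omega)^{\frac{1}{q} - \frac{1}{p}}.$$
Applying this bound to $f_n - f$ gives the claim.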
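The bound in the proof of Lemma 4.7 can be tried out numerically on a finite measure space, i.e. finitely many points carrying positive weights. This is a sketch under that simplifying assumption; `weighted_norm`, the random data and the exponents are illustrative choices.

```python
import random

def weighted_norm(values, weights, p):
    """||f||_p = (sum_i |f_i|^p w_i)^(1/p) on a finite measure space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

random.seed(0)
weights = [random.uniform(0.1, 2.0) for _ in range(50)]   # masses of the points
values = [random.uniform(-3.0, 3.0) for _ in range(50)]   # the function f
total_mass = sum(weights)                                  # mu(Omega)

q, p = 1.5, 4.0                                            # any pair with q <= p
lhs = weighted_norm(values, weights, q)
rhs = weighted_norm(values, weights, p) * total_mass ** (1.0 / q - 1.0 / p)

print(lhs, rhs, lhs <= rhs)
```

Note that the factor $\mu(\Omega)^{1/q - 1/p}$ is essential: it blows up with the total mass, which is why the lemma needs $\mu(\Omega) < \infty$.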
Riesz-Fischer theorem
The following theorem presents the most important step towards proving that L1 [0, 1] is the
completion of C[0, 1] equipped with the d1 metric.
Theorem 5.1 (Riesz-Fischer) Let $(\Omega, B, \mu)$ be an arbitrary measure space, let $1 \le p \le \infty$
and consider the space $L^p = L^p(\Omega, \mu)$.
(i) The space $L^p$, equipped with the norm $\|\cdot\|_p$, is complete, i.e. if $f_i \in L^p$ is Cauchy, then
there is a function $f \in L^p$ such that $f_i \to f$ in $L^p$-sense.
(ii) If $f_i \to f$ in $L^p$, then there exists a subsequence, $f_{i_k}$, and a function $F \in L^p$ such that
$|f_{i_k}(x)| \le F(x)$ for all $k$ (almost everywhere in $x$) and $f_{i_k}$ converges to $f$ almost everywhere,
as $k \to \infty$.
Proof. We will do the proof for $p < \infty$. The $p = \infty$ case requires a somewhat different
treatment (since $L^\infty$ is defined differently) but it is simpler.
Step 1: Subsequential convergence is enough.
This is an important basic idea. We want to prove that a Cauchy sequence $f_i$ converges
strongly. It turns out that it is sufficient to show that some subsequence converges strongly.
Seemingly this is much weaker, but actually it is not. Suppose that $f_{i_k}$ is a strongly convergent
subsequence, i.e. $f_{i_k} \to f$ (in $L^p$) as $k \to \infty$. Then
$$\|f_i - f\|_p \le \|f_i - f_{i_k}\|_p + \|f_{i_k} - f\|_p,$$
and thus for any $\varepsilon > 0$ we can make the second term smaller than $\varepsilon/2$ by choosing $k$ sufficiently large, and then, by the Cauchy property, the first term is smaller than $\varepsilon/2$ if $i$ and $k$
are sufficiently large. Thus from subsequential strong convergence of a Cauchy sequence we
concluded the strong convergence of the whole sequence.
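Step 2. Selecting a subsequence.
To find a convergent subsequence we proceed successively. Pick $i_1$ such that
$$\|f_n - f_{i_1}\|_p \le \frac{1}{2} \qquad \forall n \ge i_1;$$
such $i_1$ exists by the Cauchy property. Now select $i_2 > i_1$ such that
$$\|f_n - f_{i_2}\|_p \le \frac{1}{4} \qquad \forall n \ge i_2,$$
and again by the Cauchy property such $i_2$ exists. Next we choose $i_3 > i_2$ such that
$$\|f_n - f_{i_3}\|_p \le \frac{1}{8} \qquad \forall n \ge i_3,$$
etc.; in general we have $i_k > i_{k-1}$ with
$$\|f_n - f_{i_k}\|_p \le 2^{-k} \qquad \forall n \ge i_k.$$
Step 3. Telescopic sum
Now we define
$$F_\ell := |f_{i_1}| + \sum_{k=1}^{\ell} |f_{i_k} - f_{i_{k+1}}|.$$
By the Minkowski inequality
$$\|F_\ell\|_p \le \|f_{i_1}\|_p + \frac{1}{2} + \frac{1}{4} + \dots = \|f_{i_1}\|_p + 1,$$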
and clearly $F_\ell$ is a monotone increasing sequence of functions. Let
$$F := \lim_{\ell \to \infty} F_\ell$$
be the almost everywhere pointwise limit; then by the monotone convergence theorem and by the
uniform bound on the $L^p$ norm of $F_\ell$, we have
$$\|F\|_p < \infty,$$
in particular, $F(x) < \infty$ almost everywhere.
Now use the telescopic sum
$$f_{i_k} = f_{i_1} + (f_{i_2} - f_{i_1}) + (f_{i_3} - f_{i_2}) + \dots + (f_{i_k} - f_{i_{k-1}}).$$
As $k \to \infty$ this is an absolutely convergent series for every $x$ such that $F(x) < \infty$; let $f(x)$
be its limit, thus
$$f_{i_k}(x) \to f(x)$$
for almost every $x$. Moreover, from the telescopic sum it also follows that
$$|f_{i_k}| \le F \in L^p,$$
and thus by dominated convergence, we have
$$f \in L^p.$$
Using dominated convergence once more, for
$$|f_{i_k} - f| \le |f_{i_k}| + |f| \le F + |f| \in L^p,$$
we also have
$$\|f_{i_k} - f\|_p \to 0, \qquad k \to \infty.$$
We have proved earlier that $(C[0,1], \|\cdot\|_\infty)$ is complete and now we have seen that
$(L^\infty[0,1], \|\cdot\|_\infty)$ is also complete. However, for any $p < \infty$, the space $(C[0,1], \|\cdot\|_p)$ is not
complete (EXAMPLE!) but $(L^p[0,1], \|\cdot\|_p)$ is complete. Actually it is the (smallest) completion of $(C[0,1], \|\cdot\|_p)$, as we will soon prove.
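For $p = 1$ one possible example is a sequence of continuous ramps that steepen towards a jump at $1/2$. The sketch below is only a numerical illustration of that choice; the functions `ramp`, `step` and `l1_distance`, as well as the midpoint Riemann sum used to approximate the integral, are assumptions made here and not part of these notes.

```python
def ramp(n, x):
    """Continuous f_n: 0 left of 1/2 - 1/(2n), 1 right of 1/2 + 1/(2n), linear in between."""
    return min(1.0, max(0.0, n * (x - 0.5) + 0.5))

def step(x):
    """Discontinuous candidate limit: the characteristic function of (1/2, 1]."""
    return 1.0 if x > 0.5 else 0.0

def l1_distance(u, v, grid=20_000):
    """Midpoint Riemann sum approximating the integral of |u - v| over [0, 1]."""
    h = 1.0 / grid
    return sum(abs(u((k + 0.5) * h) - v((k + 0.5) * h)) for k in range(grid)) * h

# The two triangles between the ramp and the step have total area 1/(4n).
d_10 = l1_distance(lambda x: ramp(10, x), step)
d_100 = l1_distance(lambda x: ramp(100, x), step)

print(d_10, d_100)
```

The distances shrink like $1/(4n)$, so the ramps form a Cauchy sequence in $d_1$ whose limit has a jump; checking that no continuous function can be the $d_1$-limit is still left to the reader.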
Remark 5.2 The p = ∞ case often behaves exceptionally. Many theorems about Lp spaces
hold only with the restriction p < ∞, and/or sometimes, by duality, p > 1 is necessary. Rule
of thumb: whenever you use some theorem about Lp spaces watch out for the borderline cases,
p = 1, ∞ and make sure the theorem applies to them. Riesz-Fischer theorem holds without
restrictions, but many other theorems do not.
The primary tools in analysis are inequalities. Even though often theorems in analysis are
formulated as limiting statements, the heart of the proof is almost always an inequality. Here
we discuss a few basic inequalities involving integrals of functions. I assume that you have
already seen Jensen’s, Hölder’s and Minkowski’s inequalities. I will not prove them in class,
but I enclose their proofs; they are important, so if you forgot them, review them.
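Theorem 6.1 (Jensen’s inequality) Let $J : \mathbb{R} \to \mathbb{R}$ be a convex function and let $(\Omega, \mu)$
be a measure space with finite total measure, i.e. $\mu(\Omega) < \infty$. Let $f \in L^1(\Omega, \mu)$ be a function and
define its average as
$$\langle f \rangle := \frac{1}{\mu(\Omega)} \int_\Omega f\,d\mu.$$
Then
(i) $(J \circ f)_- \in L^1$ (here $a_- := \max\{0, -a\}$ is the negative part (Negativteil) of $a$), in
particular, $\int_\Omega J \circ f\,d\mu$ is well defined (maybe $+\infty$).
(ii) $\langle J \circ f \rangle \ge J(\langle f \rangle)$.
(iii) If $J$ is strictly convex at $\langle f \rangle$, then equality in (ii) holds iff $f = \langle f \rangle$ (almost everywhere).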
Proof. By convexity, there exists a number $v$ such that
$$J(t) \ge J(\langle f \rangle) + v\,(t - \langle f \rangle) \tag{6.2}$$
holds for every $t \in \mathbb{R}$. (The graph of a convex function lies “above” every tangent line.)
Plugging in $t = f(x)$, we have
$$J(f(x)) \ge J(\langle f \rangle) + v\,(f(x) - \langle f \rangle), \tag{6.3}$$
and thus
$$J(f(x))_- \le |J(\langle f \rangle)| + |v||f(x)| + |v||\langle f \rangle| \in L^1,$$
thus (i) is proven (we needed only an upper bound on $J(f(x))_-$ since it is always non-negative).
Integrating (6.3) over $\Omega$ with respect to $\mu$, then dividing by $\mu(\Omega)$, we get exactly (ii).
Finally, to prove (iii): if $f$ is constant (almost everywhere), then clearly this
constant must be its average, $\langle f \rangle$, and (ii) holds with equality. If $f$ is not a constant, then
$f(x) - \langle f \rangle$ takes on positive and negative values on sets of positive measure. Since $J$ is strictly
convex at $\langle f \rangle$, (6.2) is a strict inequality either for all $t > \langle f \rangle$ or for all $t < \langle f \rangle$. That means
that inequality (6.3) is a strict inequality on a set of positive measure, thus after integration
we get a strict inequality in (ii).
Remark 6.2 A measure space $(M, \mu)$ is called a probability space (Wahrscheinlichkeitsraum) if $\mu(M) = 1$. On a probability space, Jensen's inequality simplifies a bit since there is
no need for normalization with $\mu(\Omega)$. For example, from the convexity of the function
$$J(t) = |t|^p$$
in the case of $1 \le p < \infty$, it follows that on a probability space
$$\Big(\int_\Omega |f|\,d\mu\Big)^p \le \int_\Omega |f|^p\,d\mu. \tag{6.4}$$
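Jensen's inequality on a probability space is easy to test on a finite sample space with uniform weights. The sketch below assumes exactly this discrete setting; the helper names `average` and `jensen_gap` and the choice of convex functions are hypothetical.

```python
import math
import random

random.seed(1)
n = 1000
weights = [1.0 / n] * n                        # uniform probability measure
f = [random.gauss(0.0, 1.0) for _ in range(n)]

def average(values):
    """<f>: the integral of f with respect to the probability measure."""
    return sum(v * w for v, w in zip(values, weights))

def jensen_gap(J):
    """<J o f> - J(<f>); Jensen's inequality says this is >= 0 for convex J."""
    return average([J(v) for v in f]) - J(average(f))

gap_exp = jensen_gap(math.exp)
gap_square = jensen_gap(lambda t: t * t)
gap_cube_abs = jensen_gap(lambda t: abs(t) ** 3)   # J(t) = |t|^p with p = 3

print(gap_exp, gap_square, gap_cube_abs)
```

For $J(t) = t^2$ the gap is just the variance of $f$, which is a convenient sanity check.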
The last example is a special case of (probably) the most important inequality in analysis:
Theorem 6.3 (Hölder’s inequality) Let $1 \le p, q \le \infty$ be conjugate exponents (konjugierte Exponenten), i.e. satisfy $\frac{1}{p} + \frac{1}{q} = 1$ (by convention, $1/\infty = 0$). Then for any two
nonnegative functions $f, g \ge 0$ defined on a measure space $(\Omega, \mu)$ we have
$$\int_\Omega f g\,d\mu \le \|f\|_p \|g\|_q. \tag{6.5}$$
Furthermore, if the assumption $f, g \ge 0$ is dropped but we assume $f \in L^p$ and $g \in L^q$, then
$f g \in L^1$ and (6.5) holds.
Finally, if $f \in L^p$, $g \in L^q$ then (6.5) holds with equality if and only if there exists $\lambda \in \mathbb{R}$
such that
(i) $g = \lambda |f|^{p-1}$ in case of $1 < p < \infty$;
(ii) in case of $p = 1$ we have $|g| \le \lambda$ (a.e.) and $|g| = \lambda$ on the set where $f(x) \ne 0$.
The case $p = \infty$ is the dual of (ii).
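Hölder’s inequality is usually stated for two functions, but it is trivial to extend it to
a product of many functions by induction:
$$\int_\Omega f_1 f_2 \dots f_k\,d\mu \le \|f_1\|_{p_1} \|f_2\|_{p_2} \dots \|f_k\|_{p_k} \quad\text{if}\quad \frac{1}{p_1} + \frac{1}{p_2} + \dots + \frac{1}{p_k} = 1. \tag{6.6}$$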
Proof. I will just show the inequality; the cases of equality follow from these arguments
(THINK IT OVER!). We also assume that $f \in L^p$ and $g \in L^q$, otherwise (6.5) holds trivially
for $f, g \ge 0$. [Note that this statement is not true without the non-negativity assumption,
since $\int f g\,d\mu$ may not be defined!]
First proof. The standard proof starts with observing that it is sufficient to prove the
inequality if $\|f\|_p = \|g\|_q = 1$, otherwise one could redefine $f \to f/\|f\|_p$, $g \to g/\|g\|_q$ by the
homogeneity of the norm. Then one uses the arithmetic inequality
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}, \qquad a, b \ge 0$$
(that can be proven by elementary calculus) and gets
$$\int_\Omega |f||g|\,d\mu \le \frac{1}{p} \int_\Omega |f|^p\,d\mu + \frac{1}{q} \int_\Omega |g|^q\,d\mu = \frac{1}{p} + \frac{1}{q} = 1,$$
and this was to be proven under the condition that $\|f\|_p = \|g\|_q = 1$.
Second proof. Again, we will prove only the $\|f\|_p = \|g\|_q = 1$ case and for simplicity
we can clearly assume that $f, g \ge 0$ (replace $f \to |f|$ and $g \to |g|$). In this case, the measure
$g(x)^q\,d\mu(x)$ is a probability measure and we write
$$\Big(\int f g\,d\mu\Big)^p = \Big(\int f g^{1-q}\, g^q\,d\mu\Big)^p \le \int f^p g^{(1-q)p}\, g^q\,d\mu$$
by the probability space version of Jensen’s inequality (6.4). Thus
$$\Big(\int f g\,d\mu\Big)^p \le \int f^p g^{(1-q)p+q}\,d\mu = \int f^p\,d\mu = 1,$$
since $p, q$ are conjugate exponents, thus $(1-q)p + q = 0$.
The most commonly used case of Hölder’s inequality is the case $p = q = 2$, i.e. the
Cauchy-Schwarz inequality
$$\int f g\,d\mu \le \|f\|_2 \|g\|_2. \tag{6.7}$$
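Both Hölder's inequality and its equality case (i) can be observed numerically on a finite measure space. The following is a sketch under that discrete assumption; the helper `norm`, the random data, the exponent $p = 3$ and the constant $\lambda = 2$ are illustrative choices only.

```python
import random

def norm(values, weights, p):
    """||f||_p = (sum_i |f_i|^p w_i)^(1/p) on a finite measure space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

random.seed(2)
weights = [random.uniform(0.1, 1.0) for _ in range(200)]
f = [random.uniform(0.0, 5.0) for _ in range(200)]
g = [random.uniform(0.0, 5.0) for _ in range(200)]

p = 3.0
q = p / (p - 1.0)                 # conjugate exponent: 1/p + 1/q = 1

# Generic nonnegative f, g: the integral of f*g stays below the bound.
integral_fg = sum(a * b * w for a, b, w in zip(f, g, weights))
holder_bound = norm(f, weights, p) * norm(g, weights, q)

# Equality case (i): g = lambda * f^(p-1), here with lambda = 2.
g_eq = [2.0 * a ** (p - 1.0) for a in f]
integral_eq = sum(a * b * w for a, b, w in zip(f, g_eq, weights))
bound_eq = norm(f, weights, p) * norm(g_eq, weights, q)

print(integral_fg, holder_bound)
print(integral_eq, bound_eq)
```

In the second pair the two numbers agree up to rounding, since both equal $2\|f\|_p^p$.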
Homework 6.4 Prove the following form of the Cauchy-Schwarz inequality. For any $\alpha > 0$
$$\int f g\,d\mu \le \alpha \|f\|_2^2 + \alpha^{-1} \|g\|_2^2.$$
This form is actually stronger; (6.7) follows from it easily (HOW?). In many cases it is useful
to have the freedom of choosing the additional parameter $\alpha$ in the estimate. Keep this in mind!
Theorem 6.5 (Minkowski inequality) Let $1 \le p \le \infty$ and let $f, g$ be defined on a measure
space $(\Omega, d\mu)$. Then
$$\|f + g\|_p \le \|f\|_p + \|g\|_p. \tag{6.8}$$
If $f \ne 0$ and $1 < p < \infty$, then equality holds iff $g = \lambda f$ for some $\lambda \ge 0$. For the endpoint
exponents, $p = 1$ or $p = \infty$, equality can hold in other cases as well.
The Minkowski inequality states the triangle inequality for the $L^p$ norm, as was mentioned
earlier.
Proof. Again, the Minkowski inequality has many proofs; see e.g. a very general version of
this inequality whose proof uses Fubini’s theorem in Lieb-Loss: Analysis, Section 2.4.
The most direct proof relies on convexity of the function $t \to t^p$ (we can assume $1 < p < \infty$;
the $p = 1$ case is trivial, the $p = \infty$ case requires a different but equally trivial argument).
We first note that $f, g \ge 0$ can be assumed (WHY?). Then we write
$$(f + g)^p = f(f + g)^{p-1} + g(f + g)^{p-1}$$
and apply Hölder’s inequality
$$\int f (f + g)^{p-1}\,d\mu \le \|f\|_p \Big(\int (f + g)^{(p-1)q}\,d\mu\Big)^{1/q} = \|f\|_p \Big(\int (f + g)^p\,d\mu\Big)^{1/q}$$
(since $(p - 1)q = p$). Similarly
$$\int g (f + g)^{p-1}\,d\mu \le \|g\|_p \Big(\int (f + g)^{(p-1)q}\,d\mu\Big)^{1/q} = \|g\|_p \Big(\int (f + g)^p\,d\mu\Big)^{1/q},$$
thus
$$\int (f + g)^p\,d\mu \le \big(\|f\|_p + \|g\|_p\big) \Big(\int (f + g)^p\,d\mu\Big)^{1/q}.$$
Dividing through by the second factor and using that $1 - \frac{1}{q} = \frac{1}{p}$, we obtain (6.8). There is
one small thing to check: the last step of the argument would not have been correct if
$\int (f + g)^p\,d\mu = \infty$. But by convexity of $t \to t^p$ ($t \ge 0$), we have
$$\Big(\frac{f + g}{2}\Big)^p \le \frac{f^p + g^p}{2},$$
and since the right hand side is integrable, so is the left hand side.
So far we worked on arbitrary measure spaces. The following inequality uses that the
underlying space has a vectorspace structure and the measure is translation invariant. For
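simplicity we state it only for $\mathbb{R}^d$ and the Lebesgue measure.
Theorem 6.6 (Young’s inequality) Let $1 \le p, q, r \le \infty$ be three exponents satisfying
$$\frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 2. \tag{6.9}$$
Then for any $f \in L^p(\mathbb{R}^d)$, $g \in L^q(\mathbb{R}^d)$, $h \in L^r(\mathbb{R}^d)$ it holds that
$$\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(x) g(x - y) h(y)\,dx\,dy \le \|f\|_p \|g\|_q \|h\|_r. \tag{6.10}$$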
Proof of Young’s inequality. It is a smart way of applying Hölder’s inequality. We can
assume that $f, g, h \ge 0$. Let $p', q', r'$ be the dual exponents of $p, q, r$, i.e.
$$\frac{1}{p} + \frac{1}{p'} = \frac{1}{q} + \frac{1}{q'} = \frac{1}{r} + \frac{1}{r'} = 1,$$
and note that (6.9) implies
$$\frac{1}{p'} + \frac{1}{q'} + \frac{1}{r'} = 1. \tag{6.11}$$
Define
$$\alpha(x, y) := f(x)^{p/r'} g(x - y)^{q/r'},$$
$$\beta(x, y) := g(x - y)^{q/p'} h(y)^{r/p'},$$
$$\gamma(x, y) := f(x)^{p/q'} h(y)^{r/q'},$$
and notice that the integral in Young’s inequality is exactly
$$I := \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \alpha(x, y) \beta(x, y) \gamma(x, y)\,dx\,dy$$
by using (6.11). Now we can use the generalized Hölder’s inequality (6.6) for three functions
with exponents $p', q', r'$ on the measure space $(\mathbb{R}^d \times \mathbb{R}^d, dx\,dy)$ and conclude that
$$I \le \|\alpha\|_{r'} \|\beta\|_{p'} \|\gamma\|_{q'}.$$
These norms can all be computed, e.g.
$$\|\alpha\|_{r'} = \Big(\int \int f(x)^p g(x - y)^q\,dx\,dy\Big)^{1/r'} = \|f\|_p^{p/r'} \|g\|_q^{q/r'},$$
and similarly the other two. Putting these together, we arrive at (6.10).
One important application of Young’s inequality is the honest definition of the convolution.
Recall the definition
Definition 6.7 The convolution (Faltung) of two functions $f, g$ on $\mathbb{R}^d$ is given by
$$(f \star g)(x) := \int_{\mathbb{R}^d} f(y) g(x - y)\,dy.$$
It is a nontrivial question whether the integral in this definition makes sense and, if it does, in
which sense (for all $x$, or maybe only for almost all $x$?). If $f, g$ are “nice” functions (e.g. bounded
and sufficiently decaying at infinity), then it is easy to see that the convolution integral always
exists; moreover, by a change of variables,
$$f \star g = g \star f.$$
If, however, $f, g$ are just in some Lebesgue spaces, then the integral may not exist. It
is exactly Young’s inequality that tells us under which conditions on the exponents one can
define convolution on Lebesgue spaces.
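Theorem 6.8 Let $1 \le \frac{1}{p} + \frac{1}{q} \le 2$. Let $f \in L^p(\mathbb{R}^d)$, $g \in L^q(\mathbb{R}^d)$, then $f \star g$ is a function in
$L^{r'}$, where $r'$ is the dual exponent to $r$ from Young’s inequality, i.e.
$$1 + \frac{1}{r'} = \frac{1}{p} + \frac{1}{q}.$$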
Proof of the special case $q = 1$. In this case $r' = p$, and we want to show that
$$\|f * g\|_p^p = \int \Big| \int f(y)\, g(x - y)\, dy \Big|^p dx$$
is finite. It is clearly enough to assume that $f, g \ge 0$ (see the remark below). Write
$$f g = f g^{1/p} \cdot g^{1 - 1/p} = f g^{1/p} \cdot g^{1/r}$$
(notice that $p, r$ are dual exponents) and use Hölder’s inequality for the inner integral (with $p, r$ as exponents):
$$\|f * g\|_p^p \le \int \Big( \int f(y)^p g(x - y)\, dy \Big) \Big( \int g(x - y)\, dy \Big)^{p/r} dx = \|g\|_1^{p/r} \iint f(y)^p g(x - y)\, dx\, dy = \|f\|_p^p \|g\|_1^{p/r + 1} = \|f\|_p^p \|g\|_1^p$$
(since $p/r + 1 = p$), which proves the claim for the special case $q = 1$.
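A concrete instance may help to see the inequality at work. The following sketch takes $d = 1$ and $f = g = \chi_{[0,1]}$ (a choice made here purely for illustration), so that $\|f\|_p = \|g\|_1 = 1$:

```latex
\begin{align*}
(f*g)(x) &= \int_0^1 \chi_{[0,1]}(x-y)\,dy
          = \bigl|[0,1]\cap[x-1,x]\bigr|
          = \begin{cases} x, & 0\le x\le 1,\\ 2-x, & 1\le x\le 2,\\ 0, & \text{otherwise,}\end{cases} \\
\|f*g\|_p^p &= 2\int_0^1 t^p\,dt = \frac{2}{p+1}
          \;\le\; 1 = \|f\|_p^p\,\|g\|_1^p \qquad (p\ge 1).
\end{align*}
```

Equality holds exactly for $p = 1$, in agreement with $\int (f * g) = \int f \int g$ for nonnegative functions.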
There are two related general remarks:
(1) Note that Fubini's theorem has been used, but for nonnegative functions this is justified without any further assumptions.
(2) You may not like that, before we have proved that $f * g$ is actually in $L^p$ or even that it exists, we already computed its $L^p$ norm. However, none of these steps actually requires any of these integrals to be finite: this is a big advantage of Lebesgue integrals of nonnegative functions. Recall that, for example, Hölder’s inequality was stated for any two nonnegative functions. To convince you that there is nothing fishy here, I show the fully rigorous argument once, but later similar arguments will not be spelled out.
We first consider nonnegative $f, g$; for these functions every step is well justified, even if some of the above integrals are infinite. A posteriori, we obtain from $\|f\|_p \|g\|_1 < \infty$ that every integral is finite. This does not mean that
$$\int f(y)\, g(x - y)\, dy$$
is finite for every $x$, but it means that this is an $L^p$ function in $x$ (in particular, it is finite for almost all $x$).
Now for arbitrary $f$ and $g$ we want to prove that
$$\int f(y)\, g(x - y)\, dy \tag{6.13}$$
defines an $L^p$ function, in particular that this integral is meaningful for almost all $x$. But this integral is clearly dominated pointwise (in $x$) by the integral
$$\int |f(y)|\, |g(x - y)|\, dy \tag{6.14}$$
and we know that the latter is in $L^p$ by the argument above for nonnegative $f, g$. In particular, for almost all $x$, the function
$$y \mapsto |f(y)|\, |g(x - y)|$$
is integrable, thus for almost all $x$ the function
$$y \mapsto f(y)\, g(x - y)$$
is in $L^1$. Therefore the integral (6.13) is meaningful for almost all $x$, and then to check that it is in $L^p$ as a function of $x$, it is enough to show that it has a nonnegative majorant in $L^p$. But clearly (6.14) majorizes (6.13) and it is in $L^p$.
Remark on the proof of Theorem 6.8 in the general case.
We do not yet have all the tools for the proof of this theorem in the general case: it requires knowing that the dual space of $L^r$ is $L^{r'}$. Then $f * g$ will be identified by its integral against any function $h \in L^r$, i.e. by
$$\int (f * g)(y)\, h(y)\, dy,$$
which (modulo a sign flip) is exactly the double integral in Young’s inequality. Young’s inequality will tell us that this double integral makes sense for any $h \in L^r$; moreover, it is a bounded linear functional on $L^r$, therefore $f * g$ can be identified with an element of $L^{r'}$. We will learn all this later, but keep the theorem in mind.
Approximation by $C_0^\infty$ functions
The goal is to prove the following basic approximation theorem. Recall that for any open domain $\Omega \subset \mathbb{R}^d$ we denote by $C_0^\infty(\Omega)$ the set of compactly supported, smooth (= infinitely many times differentiable) functions:
$$C_0^\infty(\Omega) := \big\{ f : \Omega \to \mathbb{C} \;:\; \mathrm{supp}(f) \subset \Omega \text{ is compact},\ \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_d^{\alpha_d} f(x) \text{ exists } \forall x \in \Omega,\ \forall \alpha_j \in \mathbb{N} \big\}.$$
(Some books use the notation $C_c^\infty(\Omega)$.)
WARNING: Recall the precise definition of the support (Träger) of a continuous function:
$$\mathrm{supp}(f) := \overline{\{ x \in \mathbb{R}^d : f(x) \ne 0 \}},$$
i.e. it is the closure of the set of all points where $f$ does not vanish.
In particular, since $\Omega$ is open, a function with compact support in $\Omega$ must vanish in a neighborhood of the boundary.
Theorem 7.1 Let $\Omega \subset \mathbb{R}^d$ be a non-empty open set and let $1 \le p < \infty$. Then $C_0^\infty(\Omega)$ is dense in the space $L^p(\Omega, dx)$ equipped with the $L^p$ norm.
In particular, from this theorem it follows that $C[0, 1]$ is dense in $L^p[0, 1]$ with respect to the $L^p$ norm for any $p < \infty$. Note that $C[0, 1]$ is not dense in $L^\infty[0, 1]$ with respect to the supremum (or $L^\infty$) norm, because both spaces are complete in this norm and they are obviously not equal.
Summarizing the conclusions of the Riesz-Fischer theorem and Theorem 7.1, we obtain
Corollary 7.2 Let $1 \le p < \infty$ and let $\Omega \subset \mathbb{R}^d$ be open. Then the completion of $C_0^\infty(\Omega)$ equipped with the $L^p$ norm is $L^p(\Omega)$.
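Homework 7.3 Let $\Omega \subset \mathbb{R}^d$ be open. Show that the completion of $C_0^\infty(\Omega)$ equipped with the supremum norm is $C_0(\Omega)$, the space of continuous functions on $\Omega$ that vanish at the boundary of $\Omega$ and at infinity (i.e. for every $\eta > 0$ the set $\{x \in \Omega : |f(x)| \ge \eta\}$ is a compact subset of $\Omega$).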
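Proof of Theorem 7.1. We will show the proof for $\Omega = \mathbb{R}^d$; the general case is left as Homework 7.5.
Choose an arbitrary function $j \in L^1(\mathbb{R}^d)$ with $\int j = 1$. Define
$$j_\varepsilon(x) := \varepsilon^{-d} j(x/\varepsilon).$$
Note that
$$\int j_\varepsilon = 1, \qquad \|j\|_1 = \|j_\varepsilon\|_1 \tag{7.15}$$
(this is how the normalization was chosen) and as $\varepsilon \to 0$, the function $j_\varepsilon$ is more and more concentrated and peaky around the origin.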
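Both observations follow from a single rescaling of the integration variable. The computation below is a short sketch; the radius $\rho > 0$ is a symbol introduced here for illustration and does not appear elsewhere in the notes.

```latex
\begin{align*}
\int_{\mathbb{R}^d} j_\varepsilon(x)\,dx
  &= \int_{\mathbb{R}^d} \varepsilon^{-d}\, j(x/\varepsilon)\,dx
  && \text{substitute } x = \varepsilon z,\ dx = \varepsilon^{d}\,dz \\
  &= \int_{\mathbb{R}^d} j(z)\,dz = 1, \\
\int_{|x|\ge \rho} |j_\varepsilon(x)|\,dx
  &= \int_{|z|\ge \rho/\varepsilon} |j(z)|\,dz \;\longrightarrow\; 0
  \quad (\varepsilon\to 0,\ \rho>0 \text{ fixed}).
\end{align*}
```

The same substitution applied to $|j_\varepsilon|$ gives the equality of the $L^1$ norms, and the second line makes "concentrated around the origin" precise: the mass of $j_\varepsilon$ outside any fixed ball tends to zero, because the tail of an integrable function does.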
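Let $f \in L^p$, $1 \le p < \infty$, and define
$$f_\varepsilon(x) := (f * j_\varepsilon)(x) = \int f(y)\, j_\varepsilon(x - y)\, dy.$$
According to Theorem 6.8, $f_\varepsilon$ is an $L^p$ function and
$$\|f_\varepsilon\|_p \le \|f\|_p \|j\|_1 \tag{7.16}$$
(we used $\|j\|_1 = \|j_\varepsilon\|_1$ and we used only the special case of Theorem 6.8 that we proved). Since $j_\varepsilon$ is very strongly concentrated around 0 with total integral 1, we expect that $f_\varepsilon$ is close to $f$. This is the content of the following proposition.
Proposition 7.4 Assuming $f \in L^p$, $1 \le p < \infty$, we have
$$\lim_{\varepsilon \to 0} \|f - f_\varepsilon\|_p = 0.$$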
Proof of Proposition 7.4. The proof consists of several standard steps. We will go through them because similar arguments are used very often in analysis, and usually they are not explained in detail: they are referred to as “standard approximation arguments”, and it is assumed that everybody has gone through such a proof at least once in his/her life.
Step 1. We show that it is sufficient to prove the Proposition if $j$ has compact support.
For any sufficiently large $R$, we define
$$j^R(x) := C_R\, \chi(|x| < R)\, j(x)$$
(here $R$ is not a power, but an upper index), where $\chi(|x| < R)$ is the characteristic function of the ball $|x| < R$ and $C_R$ is the normalization
$$C_R := \Big( \int_{|x| < R} j(x)\, dx \Big)^{-1}$$
to ensure that $\int j^R = 1$. Obviously, $C_R \to 1$ as $R \to \infty$. As before, we define
$$j_\varepsilon^R(x) := \varepsilon^{-d} j^R(x/\varepsilon).$$
Then, by using $j - j^R = [(1 - \chi) + (1 - C_R)\chi]\, j$, we have
$$\|j_\varepsilon - j_\varepsilon^R\|_1 = \|j - j^R\|_1 \le \int_{|x| \ge R} |j| + |C_R - 1| \int_{|x| < R} |j| \to 0$$
as $R \to \infty$, uniformly in $\varepsilon$. Therefore, by inequality (7.16) (which is basically a special case of Young’s inequality), we have
$$\|j_\varepsilon * f - j_\varepsilon^R * f\|_p \le \|f\|_p \|j_\varepsilon - j_\varepsilon^R\|_1 \to 0$$
uniformly in $\varepsilon$ as $R \to \infty$. This shows that one can replace $j$ with a compactly supported version $j^R$ and the error can be made arbitrarily small.
This technique is called cutoff at infinity.
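To see the cutoff at infinity in a case where everything can be computed, consider the following sketch. It assumes $d = 1$ and the specific kernel $j(x) = \tfrac12 e^{-|x|}$, which is chosen here only as an example and is not part of the proof.

```latex
\begin{align*}
\int_{|x|<R} j(x)\,dx &= 1-e^{-R}, \qquad C_R = \frac{1}{1-e^{-R}} \to 1, \\
\|j-j^R\|_1 &= \int_{|x|\ge R} j + (C_R-1)\int_{|x|<R} j
             = e^{-R} + \frac{e^{-R}}{1-e^{-R}}\,(1-e^{-R}) = 2e^{-R}.
\end{align*}
```

Since this $j$ is nonnegative, the inequality in Step 1 is an equality here, and the error of the cutoff decays exponentially in $R$, with no dependence on $\varepsilon$.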
Step 2. With an almost identical (actually somewhat easier) cutoff argument, it is sufficient to show the Proposition for compactly supported f (HOMEWORK: think it over).
Step 3. Now we show that it is sufficient to prove the Proposition for bounded $f$. We again use a cutoff argument, but now not in the domain ($x$-space) but in the range. For a sufficiently large positive $h$ we define
$$f^h(x) := f(x)\, \chi\{x : |f(x)| \le h\}.$$
Again, by (7.16) and (7.15), we have
$$\|j_\varepsilon * f - j_\varepsilon * f^h\|_p \le \|j\|_1 \|f - f^h\|_p,$$
and clearly $\|f - f^h\|_p \to 0$ as $h \to \infty$. The estimate is again uniform in $\varepsilon$.
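A simple unbounded example shows how the cutoff in the range behaves. The sketch below assumes $d = 1$, $p = 1$ and the function $f(x) = x^{-1/2} \chi_{(0,1)}(x)$, picked here for illustration, with $h \ge 1$.

```latex
\begin{align*}
\{x : |f(x)| > h\} &= (0, h^{-2}), \qquad
f - f^h = x^{-1/2}\,\chi_{(0,h^{-2})}(x), \\
\|f - f^h\|_1 &= \int_0^{h^{-2}} x^{-1/2}\,dx = 2\sqrt{h^{-2}} = \frac{2}{h}
 \;\longrightarrow\; 0 \quad (h\to\infty).
\end{align*}
```

For a general $f \in L^p$ the convergence follows from the dominated convergence theorem, since $|f - f^h|^p \le |f|^p$ and $f - f^h \to 0$ at every point where $f$ is finite.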
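Step 4. Now we show that it is sufficient to prove the Proposition for $p = 1$. Indeed, for any $1 < p < \infty$ we have
$$\|j_\varepsilon * f - f\|_p^p = \int \Big| \int j_\varepsilon(x - y) f(y)\, dy - f(x) \Big|^p dx.$$
We can estimate
$$\Big| \int j_\varepsilon(x - y) f(y)\, dy - f(x) \Big|^p \le C \|f\|_\infty^{p-1} \Big| \int j_\varepsilon(x - y) f(y)\, dy - f(x) \Big|,$$
where $C := (\|j\|_1 + 1)^{p-1}$, thus
$$\|j_\varepsilon * f - f\|_p^p \le C \|f\|_\infty^{p-1} \int \Big| \int j_\varepsilon(x - y) f(y)\, dy - f(x) \Big| dx = C \|f\|_\infty^{p-1} \|j_\varepsilon * f - f\|_1.$$
Thus it is sufficient to show $\|j_\varepsilon * f - f\|_1 \to 0$ as $\varepsilon \to 0$. One should also check that the condition $f \in L^p$ can be translated to $f \in L^1$, but we already assumed that $f$ is compactly supported and bounded, so it is in every $L^p$ space.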
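The pointwise estimate used in Step 4 can be unpacked in two lines. In this sketch $u$ stands for the number $\int j_\varepsilon(x - y) f(y)\, dy - f(x)$ at a fixed $x$, and $M$ is an abbreviation introduced here; neither symbol is used elsewhere in the notes.

```latex
\begin{align*}
\Bigl|\int j_\varepsilon(x-y)f(y)\,dy - f(x)\Bigr|
  &\le \|f\|_\infty \int |j_\varepsilon(x-y)|\,dy + \|f\|_\infty
   = (\|j\|_1+1)\,\|f\|_\infty =: M, \\
|u|^p = |u|^{p-1}\,|u| &\le M^{p-1}\,|u|
  \qquad \text{for } |u|\le M,\ p>1.
\end{align*}
```

Here $M^{p-1} = (\|j\|_1 + 1)^{p-1} \|f\|_\infty^{p-1} = C \|f\|_\infty^{p-1}$, which is exactly the constant appearing in the estimate.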
Step 5. It is sufficient to prove the Proposition for simple functions of the form
$$f = \sum_i c_i \chi_{R_i},$$
where the sum is finite, $c_i \in \mathbb{C}$ and the $R_i$’s are rectangles. To see this, we recall that the set of simple functions of this form is dense in $L^1$, in other words any $L^1$-function can be approximated by them in the $L^1$-sense. (This fact follows from the construction of the Lebesgue integral plus the regularity of the Lebesgue measure plus the fact that any open set in $\mathbb{R}^d$ can be approximated by rectangles. THINK IT OVER!)
For any given $f \in L^1$, let $f_n$ be a sequence of simple functions such that $f_n \to f$ in $L^1$. Suppose that the Proposition is proven for every $f_n$. Then
$$\|j_\varepsilon * f - f\|_1 \le \|j_\varepsilon * (f - f_n)\|_1 + \|j_\varepsilon * f_n - f_n\|_1 + \|f_n - f\|_1 \le (\|j\|_1 + 1) \|f_n - f\|_1 + \|j_\varepsilon * f_n - f_n\|_1.$$
For any given $\eta > 0$, the first term can be made smaller than $\eta/2$ by choosing $n$ sufficiently large, and this choice is uniform in $\varepsilon$. After choosing $n$ sufficiently large, we can fix it and choose $\varepsilon$ sufficiently small so that the second term becomes smaller than $\eta/2$. Thus $\|j_\varepsilon * f - f\|_1$ can be made smaller than any given $\eta$ if $\varepsilon$ is sufficiently small, and this proves Step 5.
Step 6. By linearity of the convolution and the triangle inequality of the norm, it is sufficient to prove the Proposition for $f = \chi_R$, i.e. for the characteristic function of a single rectangle $R$.
Step 7. By an explicit calculation:
$$\|j_\varepsilon * \chi_R - \chi_R\|_1 = \int \Big| \int j_\varepsilon(x - y) \chi_R(y)\, dy - \chi_R(x) \Big| dx = \int \Big| \int j_\varepsilon(x - y) \big( \chi_R(y) - \chi_R(x) \big)\, dy \Big| dx.$$
Notice the trick of bringing the second term $\chi_R(x)$ inside the integration by using that $\int j_\varepsilon = 1$.
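Before estimating this quantity in general, it may help to compute it in one concrete case. The sketch below assumes $d = 1$, $R = [0, 1]$, the box kernel $j = \tfrac12 \chi_{[-1,1]}$ and $0 < \varepsilon \le \tfrac12$; all of these are choices made here for illustration. The difference is written out near the endpoint $x = 0$, and the behavior near $x = 1$ is symmetric, which gives the factor 2 in the last line.

```latex
\begin{align*}
(j_\varepsilon*\chi_R)(x) &= \frac{1}{2\varepsilon}\,
   \bigl|[x-\varepsilon,x+\varepsilon]\cap[0,1]\bigr|, \\
(j_\varepsilon*\chi_R)(x)-\chi_R(x) &=
 \begin{cases}
   \dfrac{x+\varepsilon}{2\varepsilon}, & -\varepsilon\le x<0,\\[2mm]
   -\dfrac{\varepsilon-x}{2\varepsilon}, & 0\le x\le\varepsilon,\\[2mm]
   0, & \varepsilon\le x\le 1-\varepsilon \ \text{or}\ x\notin[-\varepsilon,1+\varepsilon],
 \end{cases} \\
\|j_\varepsilon*\chi_R-\chi_R\|_1 &=
   2\Bigl(\int_{-\varepsilon}^{0}\frac{x+\varepsilon}{2\varepsilon}\,dx
   +\int_{0}^{\varepsilon}\frac{\varepsilon-x}{2\varepsilon}\,dx\Bigr)
   = 2\Bigl(\frac{\varepsilon}{4}+\frac{\varepsilon}{4}\Bigr)=\varepsilon .
\end{align*}
```

So in this example the error is exactly $\varepsilon$, and it lives entirely in an $\varepsilon$-neighborhood of the two boundary points of $R$.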
The integrand $j_\varepsilon(x - y)(\chi_R(y) - \chi_R(x))$ is explicitly zero unless $\mathrm{dist}(x, \partial R) \le \varepsilon\ell$, where $\partial R$ is the boundary of $R$ and $j$ is supported in a ball of radius $\ell$. This is because the first factor in $j_\varepsilon(x - y)(\chi_R(y) - \chi_R(x))$ is zero whenever $|x - y| \ge \varepsilon\ell$ and the second factor is nonzero only if exactly one of the two points $x, y$ lies in $R$. Therefore
$$\|j_\varepsilon * \chi_R - \chi_R\|_1 = \int_{\mathrm{dist}(x, \partial R) \le \varepsilon\ell} \Big| \int j_\varepsilon(x - y) \big( \chi_R(y) - \chi_R(x) \big)\, dy \Big| dx \le 2 \|j_\varepsilon\|_1 \int_{\mathrm{dist}(x, \partial R) \le \varepsilon\ell} 1\, dx = 2 \|j\|_1\, \mathrm{vol}\{x : \mathrm{dist}(x, \partial R) \le \varepsilon\ell\} \to 0$$
as $\varepsilon \to 0$, since the volume of an $\varepsilon\ell$-neighborhood of the boundary of a fixed rectangle $R$ is of order $\varepsilon$ (here $\ell$ is fixed). This completes the proof of Proposition 7.4.
From Proposition 7.4 our Theorem 7.1 easily follows. Simply consider a smooth, compactly supported function $j \in L^1$ with $\int j = 1$. If $f$ has compact support, then so does
$$f_\varepsilon(x) = \int j_\varepsilon(x - y) f(y)\, dy,$$
and by the same argument as in Step 2 above, it is sufficient to prove Theorem 7.1 for compactly supported $f$ (THINK IT OVER). Since $f_\varepsilon \to f$ in $L^p$, Theorem 7.1 will be proven once we show that $f_\varepsilon \in C_0^\infty$. We will show that
$$\frac{\partial}{\partial x_j} (j_\varepsilon * f)(x) = \Big( \frac{\partial j_\varepsilon}{\partial x_j} * f \Big)(x), \tag{7.17}$$
i.e. the convolution $f_\varepsilon = j_\varepsilon * f$ can be differentiated by differentiating one factor. The differentiability up to arbitrary order will then follow by induction.
To show (7.17), we form the difference quotient on the left hand side; after changing variables it reads
$$\int \frac{j_\varepsilon(\ldots, y_j + \delta, \ldots) - j_\varepsilon(\ldots, y_j, \ldots)}{\delta}\, f(x - y)\, dy.$$
The fraction in the integrand converges to $\frac{\partial j_\varepsilon}{\partial x_j}(y)$ pointwise as $\delta \to 0$, and it is also uniformly bounded in $\delta$ (here $\varepsilon$ is fixed!) since $j_\varepsilon$ is smooth and compactly supported, thus its first derivatives are bounded (and the first derivatives control the difference quotients by the Taylor formula with remainder term, THINK IT OVER!). Thus by the dominated convergence theorem we obtain (7.17), and this completes the proof for the case $\Omega = \mathbb{R}^d$.
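The uniform bound on the difference quotient and the resulting integrable majorant can be sketched as follows. The notation $e_j$ for the $j$-th coordinate unit vector and the constant $M_\varepsilon$ are introduced here for illustration; the sketch assumes, as in the text, that $f$ is compactly supported and in $L^p$, hence in $L^1$.

```latex
\begin{align*}
\frac{j_\varepsilon(y+\delta e_j)-j_\varepsilon(y)}{\delta}
  &= \frac{1}{\delta}\int_0^{\delta}
     \frac{\partial j_\varepsilon}{\partial x_j}(y+s e_j)\,ds, \\
\Bigl|\frac{j_\varepsilon(y+\delta e_j)-j_\varepsilon(y)}{\delta}\Bigr|
  &\le \sup_{z\in\mathbb{R}^d}
     \Bigl|\frac{\partial j_\varepsilon}{\partial x_j}(z)\Bigr|
     =: M_\varepsilon < \infty, \\
\Bigl|\frac{j_\varepsilon(y+\delta e_j)-j_\varepsilon(y)}{\delta}\,f(x-y)\Bigr|
  &\le M_\varepsilon\,|f(x-y)| \in L^1(dy).
\end{align*}
```

The last line provides a majorant that is independent of $\delta$, which is what the dominated convergence theorem requires.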
Homework 7.5 The above proof was for $\Omega = \mathbb{R}^d$. Prove the theorem for any open set $\Omega$.
[Hint: show that there exists an increasing sequence of compact sets $K_1 \subset K_2 \subset \ldots \subset \Omega$ such that if $f_n := f \chi_{K_n}$, then $\|f - f_n\|_{L^p(\Omega)} \to 0$ as $n \to \infty$. Apply the construction described above to each $f_n$; the construction shows that the support of the approximating functions of $f_n$ can be chosen arbitrarily close to the support of $f_n$, in particular it can be chosen in an arbitrarily small neighborhood of $K_n$, i.e. it can be chosen within $\Omega$.]
This completes the proof of the approximation theorem.