
MATH 519-520 Lecture Notes: Mathematical Analysis

Notes for MATH 519-520
Paul E. Sacks
August 20, 2015
Contents

1 Orientation
   1.1 Introduction

2 Preliminaries
   2.1 Ordinary differential equations
       2.1.1 Initial Value Problems
       2.1.2 Boundary Value Problems
       2.1.3 Some exactly solvable cases
   2.2 Integral equations
   2.3 Partial differential equations
       2.3.1 First order PDEs and the method of characteristics
       2.3.2 Second order problems in R^2
       2.3.3 Further discussion of model problems
       2.3.4 Standard problems and side conditions
   2.4 Well-posed and ill-posed problems
   2.5 Exercises

3 Vector spaces
   3.1 Axioms of a vector space
   3.2 Linear independence and bases
   3.3 Linear transformations of a vector space
   3.4 Exercises

4 Metric spaces
   4.1 Axioms of a metric space
   4.2 Topological concepts
   4.3 Functions on metric spaces and continuity
   4.4 Compactness and optimization
   4.5 Contraction mapping theorem
   4.6 Exercises

5 Normed linear spaces and Banach spaces
   5.1 Axioms of a normed linear space
   5.2 Infinite series
   5.3 Linear operators and functionals
   5.4 Contraction mappings in a Banach space
   5.5 Exercises

6 Inner product spaces and Hilbert spaces
   6.1 Axioms of an inner product space
   6.2 Norm in a Hilbert space
   6.3 Orthogonality
   6.4 Projections
   6.5 Gram-Schmidt method
   6.6 Bessel's inequality and infinite orthogonal sequences
   6.7 Characterization of a basis of a Hilbert space
   6.8 Isomorphisms of a Hilbert space
   6.9 Exercises

7 Distributions
   7.1 The space of test functions
   7.2 The space of distributions
   7.3 Algebra and Calculus with Distributions
       7.3.1 Multiplication of distributions
       7.3.2 Convergence of distributions
       7.3.3 Derivative of a distribution
   7.4 Convolution and distributions
   7.5 Exercises

8 Fourier analysis and distributions
   8.1 Fourier series in one space dimension
   8.2 Alternative forms of Fourier series
   8.3 More about convergence of Fourier series
   8.4 The Fourier Transform on R^N
   8.5 Further properties of the Fourier transform
   8.6 Fourier series of distributions
   8.7 Fourier transforms of distributions
   8.8 Exercises

9 Distributions and Differential Equations
   9.1 Weak derivatives and Sobolev spaces
   9.2 Differential equations in D'
   9.3 Fundamental solutions
   9.4 Exercises

10 Linear operators
   10.1 Linear mappings between Banach spaces
   10.2 Examples of linear operators
   10.3 Linear operator equations
   10.4 The adjoint operator
   10.5 Examples of adjoints
   10.6 Conditions for solvability of linear operator equations
   10.7 Fredholm operators and the Fredholm alternative
   10.8 Convergence of operators
   10.9 Exercises

11 Unbounded operators
   11.1 General aspects of unbounded linear operators
   11.2 The adjoint of an unbounded linear operator
   11.3 Exercises

12 Spectrum of an operator
   12.1 Resolvent and spectrum of a linear operator
   12.2 Examples of operators and their spectra
   12.3 Properties of spectra
   12.4 Exercises

13 Compact Operators
   13.1 Compact operators
   13.2 The Riesz-Schauder theory
   13.3 The case of self-adjoint compact operators
   13.4 Some properties of eigenvalues
   13.5 The Singular Value Decomposition and Normal Operators
   13.6 Exercises

14 Spectra and Green's functions for differential operators
   14.1 Green's functions for second order ODEs
   14.2 Adjoint problems
   14.3 Sturm-Liouville theory
   14.4 The Laplacian with homogeneous Dirichlet boundary conditions
   14.5 Exercises

15 Further study of integral equations
   15.1 Singular integral operators
   15.2 Layer potentials
   15.3 Convolution equations
   15.4 Wiener-Hopf technique
   15.5 Exercises

16 Variational methods
   16.1 The Dirichlet quotient
   16.2 Eigenvalue approximation
   16.3 The Euler-Lagrange equation
   16.4 Variational methods for elliptic boundary value problems
   16.5 Other problems in the calculus of variations
   16.6 The existence of minimizers
   16.7 The Fréchet derivative
   16.8 Exercises

17 Weak solutions of partial differential equations
   17.1 Lax-Milgram theorem
   17.2 More function spaces
   17.3 Galerkin's method
   17.4 PDEs with variable coefficients
   17.5 Exercises

18 Appendices
   18.1 Inequalities
   18.2 Integration by parts
   18.3 Spherical coordinates in R^N

19 Bibliography
Chapter 1
Orientation
1.1 Introduction

While the phrase 'Applied Mathematics' has a very broad meaning, the purpose of this textbook is much more limited: to present techniques of mathematical analysis which have been found particularly useful for understanding certain kinds of mathematical problems occurring commonly in scientific and technological disciplines, especially physics and engineering. These methods, which are often regarded as belonging to the realm of functional analysis, have been motivated most specifically by the study of ordinary differential equations, partial differential equations and integral equations. The mathematical modeling of physical phenomena typically involves one or more of these types of equations, and insight into the physical phenomenon itself may result from a deep understanding of the underlying mathematical properties which the models possess. All concepts and techniques discussed in this book are ultimately of interest because of their relevance for the study of these three general types of problems. A great deal of beautiful mathematics has grown out of these ideas, and so intrinsic mathematical motivation cannot be denied or ignored.
Chapter 2
Preliminaries
In this chapter we will discuss 'standard problems' in the theory of ordinary differential equations (ODEs), integral equations, and partial differential equations (PDEs). The techniques developed in these notes are all meant to have some relevance for one or more of these kinds of problems, so it seems best to start with some awareness of exactly what the problems are. In each case there are some relatively elementary methods, which the reader may well have seen before, or which depend only on simple considerations, which we will review. At the same time we establish terminology and notations, and begin to get some sense of the ways in which problems are classified.
2.1 Ordinary differential equations

An n'th order ordinary differential equation for an unknown function u = u(t) on an interval (a, b) ⊂ R may be given in the form

    F(t, u, u', u'', . . . , u^(n)) = 0    (2.1.1)

where we use the usual notations u', u'', . . . for derivatives of order 1, 2, . . . , and also u^(n) for the derivative of order n. Unless otherwise stated, we will assume that the ODE can be solved for the highest derivative, i.e. written in the form

    u^(n) = f(t, u, u', . . . , u^(n−1))    (2.1.2)

For the purpose of this discussion, a solution of either equation will mean a real valued function on (a, b) possessing continuous derivatives up through order n, for which the equation is satisfied at every point of (a, b). While it is easy to write down ODEs in the form (2.1.1) without any solutions (for example, (u')² + u² + 1 = 0), we will see that ODEs of the type (2.1.2) essentially always have solutions, subject to some very minimal assumptions on f.
The ODE is linear if it can be written as

    Σ_{j=0}^{n} a_j(t) u^(j)(t) = g(t)    (2.1.3)

for some coefficients a_0, . . . , a_n, g, and homogeneous linear if also g(t) ≡ 0. It is common to use operator notation for derivatives, especially in the linear case. Set

    D = d/dt    (2.1.4)

so that u' = Du, u'' = D²u, etc., and (2.1.3) may be given as

    Lu := Σ_{j=0}^{n} a_j(t) D^j u = g(t)    (2.1.5)

By standard calculus properties L is a linear operator, meaning that

    L(c_1 u_1 + c_2 u_2) = c_1 Lu_1 + c_2 Lu_2    (2.1.6)

for any scalars c_1, c_2 and any n times differentiable functions u_1, u_2.

An ODE normally has infinitely many solutions – the collection of all solutions is called the general solution of the given ODE.
Example 2.1. By elementary calculus considerations, the simple ODE u' = 0 has general solution u(t) = c, where c is an arbitrary constant. Likewise u' = u has the general solution u(t) = ce^t, and u'' = 1 has the general solution u(t) = t²/2 + c_1 t + c_2, where c_1, c_2 are arbitrary constants.
2.1.1 Initial Value Problems

The general solution of an n'th order ODE typically contains exactly n arbitrary constants, whose values may then be chosen so that the solution satisfies n additional, or side, conditions. The most common kind of side conditions for an ODE are initial conditions,

    u^(j)(t_0) = γ_j    j = 0, 1, . . . , n − 1    (2.1.7)

where t_0 is a given point in (a, b) and γ_0, . . . , γ_{n−1} are given constants. Thus we are prescribing the value of the solution and its derivatives up through order n − 1 at the point t_0. The problem of solving (2.1.2) together with the initial conditions (2.1.7) is called an initial value problem (IVP), and it is a very important fact that under fairly unrestrictive hypotheses a unique solution exists. In stating conditions on f, we regard it as a function f = f(t, y_1, . . . , y_n) defined on some domain in R^{n+1}.
Theorem 2.1. Assume that

    f, ∂f/∂y_1, . . . , ∂f/∂y_n    (2.1.8)

are defined and continuous in a neighborhood of the point (t_0, γ_0, . . . , γ_{n−1}) ∈ R^{n+1}. Then there exists ε > 0 such that the initial value problem (2.1.2), (2.1.7) has a unique solution on the interval (t_0 − ε, t_0 + ε).

A proof of this theorem may be found in standard ODE textbooks, see for example [4], [7]. A slightly weaker version of this theorem will be proved in Section 4.5. As will be discussed there, the condition of continuity of the partial derivatives of f with respect to each of the variables y_i can actually be replaced by the weaker assumption that f is Lipschitz continuous with respect to each of these variables. If we assume only that f is continuous in a neighborhood of the point (t_0, γ_0, . . . , γ_{n−1}), then it can be proved that at least one solution exists, but it may not be unique; see Exercise 3.

It should also be emphasized that the theorem asserts a local existence property, i.e. existence only in some sufficiently small interval centered at t_0. It has to be this way, first of all, since the assumptions on f are made only in the vicinity of (t_0, γ_0, . . . , γ_{n−1}). But even if the continuity properties of f were assumed to hold throughout R^{n+1}, then as the following example shows, it would still only be possible to prove that a solution exists for points t close enough to t_0.
Example 2.2. Consider the first order initial value problem

    u' = u²    u(0) = γ    (2.1.9)

for which the assumptions of Theorem 2.1 hold for any γ. It may be checked that the solution of this problem is

    u(t) = γ/(1 − γt)    (2.1.10)

which is only a valid solution for t < 1/γ, and 1/γ can be arbitrarily small.
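The finite blow-up time in this example can also be observed numerically. The following sketch (not part of the notes) integrates u' = u², u(0) = γ with forward Euler and compares against the exact solution γ/(1 − γt); the choice γ = 2 is an arbitrary sample value.

```python
# Sketch (not part of the notes): the blow-up in Example 2.2, seen numerically.
# Forward Euler for u' = u^2, u(0) = gamma, compared with the exact solution
# u(t) = gamma / (1 - gamma*t), which blows up at t = 1/gamma.
def euler(gamma, t_end, n):
    h = t_end / n
    u = gamma
    for _ in range(n):
        u += h * u * u          # u_{k+1} = u_k + h * u_k^2
    return u

gamma = 2.0                     # sample value; blow-up time is 1/gamma = 0.5
exact = lambda t: gamma / (1 - gamma * t)

# well before the blow-up time the two agree closely
assert abs(euler(gamma, 0.4, 100_000) - exact(0.4)) < 1e-2
# approaching t = 1/gamma = 0.5 the exact solution grows without bound
assert exact(0.4999) > 9000
```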
With more restrictions on f it may be possible to show that the solution exists on any interval containing t_0, in which case we say that the solution exists globally. This is the case, for example, for the linear ODE (2.1.3).

Whenever the conditions of Theorem 2.1 hold, the set of all possible solutions may be regarded as being parametrized by the n constants γ_0, . . . , γ_{n−1}, so that, as mentioned above, the general solution will contain n arbitrary parameters. In the special case of the linear equation (2.1.3) it can be shown that the general solution may be given as

    u(t) = Σ_{j=1}^{n} c_j u_j(t) + u_p(t)    (2.1.11)

where u_p is any particular solution of (2.1.3), and u_1, . . . , u_n are any n linearly independent solutions of the corresponding homogeneous equation Lu = 0. Any such set of functions u_1, . . . , u_n is also called a fundamental set for Lu = 0.

Example 2.3. If Lu = u'' + u then by direct substitution we see that u_1(t) = sin t, u_2(t) = cos t are solutions, and they are clearly linearly independent. Thus {sin t, cos t} is a fundamental set for Lu = 0 and u(t) = c_1 sin t + c_2 cos t is the general solution of Lu = 0. For the inhomogeneous ODE u'' + u = e^t one may check that u_p(t) = e^t/2 is a particular solution, so the general solution is u(t) = c_1 sin t + c_2 cos t + e^t/2.
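The claim that u_p(t) = e^t/2 is a particular solution is easy to confirm; here is a quick numerical check (not part of the notes), approximating u'' by a centered difference.

```python
# Sketch (not part of the notes): numerical confirmation that u_p(t) = e^t/2
# solves u'' + u = e^t, using a centered difference for the second derivative.
import math

def u_p(t):
    return math.exp(t) / 2

def check(t, h=1e-4):
    upp = (u_p(t + h) - 2 * u_p(t) + u_p(t - h)) / h ** 2   # ~ u_p''(t)
    return upp + u_p(t) - math.exp(t)                        # residual of u'' + u = e^t

for t in (0.0, 1.0, 2.5):
    assert abs(check(t)) < 1e-5                              # residual is ~ 0
```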
2.1.2 Boundary Value Problems

For an ODE of order n ≥ 2 it may be of interest to impose side conditions at more than one point, typically the endpoints of the interval of interest. We will then refer to the side conditions as boundary conditions, and the problem of solving the ODE subject to the given boundary conditions as a boundary value problem (BVP). Since the general solution still contains n parameters, we still expect to be able to impose a total of n side conditions. However we can see from simple examples that the situation with regard to existence and uniqueness in such boundary value problems is much less clear than for initial value problems.

Example 2.4. Consider the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 1    (2.1.12)

Starting from the general solution u(t) = c_1 sin t + c_2 cos t, the two boundary conditions lead to u(0) = c_2 = 0 and u(π) = −c_2 = 1. Since these are inconsistent, the BVP has no solution.

Example 2.5. For the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 0    (2.1.13)

we have solutions u(t) = C sin t for any constant C; that is, the BVP has infinitely many solutions.

The topic of boundary value problems will be studied in much more detail in Chapter ( ).
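A numerical 'shooting' experiment (not part of the notes) illustrates both examples at once: every solution of u'' + u = 0 with u(0) = 0 has the form c sin t, so u(π) = 0 no matter what initial slope we shoot with, and the condition u(π) = 1 can never be met.

```python
# Sketch (not part of the notes): shooting for u'' + u = 0 on (0, pi), u(0) = 0.
# Each shot integrates the equivalent system u' = v, v' = -u by classical RK4
# from a chosen slope v(0) and reports the value u(pi).
import math

def shoot(slope, n=20_000):
    h = math.pi / n
    u, v = 0.0, slope
    for _ in range(n):
        k1u, k1v = v, -u
        k2u, k2v = v + h/2*k1v, -(u + h/2*k1u)
        k3u, k3v = v + h/2*k2v, -(u + h/2*k2u)
        k4u, k4v = v + h*k3v, -(u + h*k3u)
        u += h/6*(k1u + 2*k2u + 2*k3u + k4u)
        v += h/6*(k1v + 2*k2v + 2*k3v + k4v)
    return u                      # u(pi)

# every shot lands at u(pi) = 0 (Example 2.5 has infinitely many solutions),
# so no slope produces u(pi) = 1 (Example 2.4 has none)
for slope in (-2.0, 0.5, 1.0, 3.0):
    assert abs(shoot(slope)) < 1e-8
```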
2.1.3 Some exactly solvable cases

Let us recall explicit solution methods for some commonly occurring types of ODEs.

• For the first order linear ODE

    u' + p(t)u = q(t)    (2.1.14)

define the so-called integrating factor ρ(t) = e^{P(t)}, where P' = p. Multiplying the equation through by ρ we then get

    (ρu)' = ρq    (2.1.15)

so if we pick Q such that Q' = ρq, the general solution may be given as

    u(t) = (Q(t) + C)/ρ(t)    (2.1.16)

• Next consider the linear homogeneous constant coefficient ODE

    Lu = Σ_{j=0}^{n} a_j u^(j) = 0    (2.1.17)

If we look for solutions in the form u(t) = e^{λt}, then by direct substitution we find that u is a solution provided λ is a root of the corresponding characteristic polynomial

    P(λ) = Σ_{j=0}^{n} a_j λ^j    (2.1.18)

We therefore obtain as many linearly independent solutions as there are distinct roots of P. If this number is less than n, then we may seek further solutions of the form te^{λt}, t²e^{λt}, . . . , until a total of n linearly independent solutions has been found. In the case of complex roots, equivalent expressions in terms of trigonometric functions are often used in place of complex exponentials.

• Finally, closely related to the previous case is the so-called Cauchy-Euler type equation

    Lu = Σ_{j=0}^{n} (t − t_0)^j a_j u^(j) = 0    (2.1.19)

for some constants a_0, . . . , a_n. In this case we look for solutions in the form u(t) = (t − t_0)^λ with λ to be found. Substituting into (2.1.19) we will again find an n'th order polynomial whose roots determine the possible values of λ. The interested reader may refer to any standard undergraduate level ODE book for the additional considerations which arise in the case of complex or repeated roots.
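As an illustration of the constant coefficient case, the following sketch (not part of the notes) takes the sample equation u'' − 3u' + 2u = 0, whose characteristic polynomial λ² − 3λ + 2 has roots 1 and 2, and verifies numerically that e^t and e^{2t} are solutions.

```python
# Sketch (not part of the notes): sample equation u'' - 3u' + 2u = 0.
# Its characteristic polynomial is P(lam) = 2 - 3*lam + lam^2, with roots
# 1 and 2, so e^t and e^(2t) should form a fundamental set.
import math

a = [2, -3, 1]                         # coefficients a_0, a_1, a_2

def P(lam):
    return sum(aj * lam ** j for j, aj in enumerate(a))

assert P(1) == 0 and P(2) == 0         # roots of the characteristic polynomial

def residual(lam, t, h=1e-4):
    u = lambda s: math.exp(lam * s)    # candidate solution e^(lam*t)
    upp = (u(t + h) - 2 * u(t) + u(t - h)) / h ** 2
    up = (u(t + h) - u(t - h)) / (2 * h)
    return upp - 3 * up + 2 * u(t)     # residual of u'' - 3u' + 2u at time t

for lam in (1, 2):
    assert abs(residual(lam, 0.7)) < 1e-5
```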
2.2 Integral equations

In this section we discuss the basic set-up for the study of linear integral equations. See for example [15], [21] for general references on the classical theory of integral equations.

Let Ω ⊂ R^N be a measurable set and set

    T u(x) = ∫_Ω K(x, y)u(y) dy    (2.2.1)

Here the function K should be a measurable function on Ω × Ω, and is called the kernel of the integral operator T, which is linear since (2.1.6) obviously holds.

A class of associated integral equations is then

    ∫_Ω K(x, y)u(y) dy = λu(x) + g(x)    x ∈ Ω    (2.2.2)

for some scalar λ and given function g in some appropriate class. If λ = 0 then (2.2.2) is a first kind integral equation, otherwise it is of second kind. Let us consider some simple examples which may be studied by elementary means.
Example 2.6. Let Ω = (0, 1) ⊂ R and K(x, y) ≡ 1. The corresponding first kind integral equation is therefore

    ∫_0^1 u(y) dy = g(x)    0 < x < 1    (2.2.3)

For simplicity here we will assume that g is a continuous function. The left hand side is independent of x, thus a solution can exist only if g(x) is a constant function. When g is constant, on the other hand, infinitely many solutions will exist, since we just need to find any u with the given definite integral.

For the corresponding second kind equation,

    ∫_0^1 u(y) dy = λu(x) + g(x)    (2.2.4)

a solution must have the specific form u(x) = (C − g(x))/λ for some constant C. Substituting into the equation then gives, after obvious simplification,

    C − ∫_0^1 g(y) dy = λC    (2.2.5)

or

    C = ∫_0^1 g(y) dy / (1 − λ)    (2.2.6)

in the case that λ ≠ 1. Thus, for any continuous function g and λ ≠ 0, 1, there exists a unique solution of the integral equation, namely

    u(x) = (∫_0^1 g(y) dy − (1 − λ)g(x)) / (λ(1 − λ))    (2.2.7)

In the remaining case that λ = 1 it is immediate from (2.2.5) that a solution can exist only if ∫_0^1 g(y) dy = 0, in which case u(x) = C − g(x) is a solution for any choice of C.
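The solution formula just obtained can be confirmed numerically. The following sketch (not part of the notes) uses the sample data g(x) = x² and λ = 1/2, and checks that the resulting u satisfies the second kind equation at several points.

```python
# Sketch (not part of the notes): check the second kind solution formula of
# Example 2.6 with sample data g(x) = x^2 and lam = 0.5.
lam = 0.5
g = lambda x: x * x
int_g = 1.0 / 3.0                        # exact integral of g over (0, 1)

def u(x):
    # the solution formula u = (int_g - (1 - lam)*g(x)) / (lam*(1 - lam))
    return (int_g - (1 - lam) * g(x)) / (lam * (1 - lam))

# integral of u over (0, 1) by the midpoint rule
n = 20_000
int_u = sum(u((k + 0.5) / n) for k in range(n)) / n

# the equation int_0^1 u(y) dy = lam*u(x) + g(x) holds at every sample point
for x in (0.1, 0.5, 0.9):
    assert abs(int_u - (lam * u(x) + g(x))) < 1e-6
```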
This very simple example already exhibits features which turn out to be common to a much larger class of integral equations of this general type. These are:

• The first kind integral equation will require much more restrictive conditions on g in order for a solution to exist.

• For most λ ≠ 0 the second kind integral equation has a unique solution for any g.

• There may exist a few exceptional values of λ for which either existence or uniqueness fails in the corresponding second kind equation.

All of these points will be elaborated and made precise in Chapter ( ).
Example 2.7. Let Ω = (0, 1) and

    T u(x) = ∫_0^x u(y) dy    (2.2.8)

corresponding to the kernel

    K(x, y) = 1 if y < x,  0 if x ≤ y    (2.2.9)

The corresponding integral equation may then be written as

    ∫_0^x u(y) dy = λu(x) + g(x)    (2.2.10)

This is the prototype of an integral operator of so-called Volterra type; see the definition below.

In the first kind case, λ = 0, we see that g(0) = 0 is a necessary condition for solvability, in which case the solution is u(x) = g'(x), provided that g is differentiable in some suitable sense. For λ ≠ 0 we note that differentiation of (2.2.10) with respect to x gives

    u' − u/λ = −g'(x)/λ    (2.2.11)

which is of the type (2.1.14), and so may be solved by the method given there. The result, after some obvious algebraic manipulation, is

    u(x) = −(1/λ) e^{x/λ} g(0) − (1/λ) ∫_0^x e^{(x−y)/λ} g'(y) dy    (2.2.12)

Note, however, that by an integration by parts, this formula is seen to be equivalent to

    u(x) = −g(x)/λ − (1/λ²) ∫_0^x e^{(x−y)/λ} g(y) dy    (2.2.13)

Observe that (2.2.12) seems to require differentiability of g even though (2.2.13) does not, thus (2.2.13) would be the preferred solution formula. It may be verified directly by substitution that (2.2.13) is a valid solution of (2.2.10) for all λ ≠ 0, assuming that g is continuous on [0, 1].
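A numerical check of the preferred solution formula is straightforward; this sketch (not part of the notes) uses the sample data g(x) = sin x and λ = 0.3, approximating both integrals by the midpoint rule and confirming that the residual of the Volterra equation vanishes.

```python
# Sketch (not part of the notes): verify that the formula
#   u(x) = -g(x)/lam - (1/lam^2) * int_0^x e^((x-y)/lam) g(y) dy
# solves the Volterra equation int_0^x u(y) dy = lam*u(x) + g(x),
# with sample data g = sin and lam = 0.3.
import math

lam = 0.3
g = math.sin

def u(x, n=1000):
    if x == 0.0:
        return -g(0.0) / lam
    h = x / n                                    # midpoint rule for the inner integral
    integral = sum(math.exp((x - (k + 0.5) * h) / lam) * g((k + 0.5) * h)
                   for k in range(n)) * h
    return -g(x) / lam - integral / lam ** 2

def residual(x, n=400):
    h = x / n                                    # midpoint rule for int_0^x u(y) dy
    int_u = sum(u((k + 0.5) * h) for k in range(n)) * h
    return int_u - (lam * u(x) + g(x))           # should vanish

assert abs(residual(0.8)) < 1e-3
```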
Concerning the two simple integral equations just discussed, observe that

• For the first kind equation, there are fewer restrictions on g needed for solvability in the Volterra case (2.2.10) than in the non-Volterra case (2.2.3).

• There are no exceptional values λ ≠ 0 in the Volterra case; that is, a unique solution exists for every λ ≠ 0 and every continuous g.
Here are some of the more important ways in which integral operators are classified:

Definition 2.1. The kernel K(x, y) is called

• symmetric if K(x, y) = K(y, x)

• of Volterra type if N = 1 and K(x, y) = 0 either for all x > y or for all x < y

• of convolution type if K(x, y) = K(x − y)

• of Hilbert-Schmidt type if ∫_{Ω×Ω} |K(x, y)|² dxdy < ∞

• singular if K(x, y) is unbounded on Ω × Ω
Some important examples of integral operators, which will receive much more attention later in the book, are the Fourier transform

    T u(x) = (2π)^{−N/2} ∫_{R^N} e^{−ix·y} u(y) dy,    (2.2.14)

the Laplace transform

    T u(x) = ∫_0^∞ e^{−xy} u(y) dy,    (2.2.15)

the Hilbert transform

    T u(x) = (1/π) ∫_{−∞}^{∞} u(y)/(x − y) dy,    (2.2.16)

and the Abel operator

    T u(x) = ∫_0^x u(y)/√(x − y) dy.    (2.2.17)
2.3 Partial differential equations

An m'th order partial differential equation (PDE) for an unknown function u = u(x) on a domain Ω ⊂ R^N may be given in the form

    F(x, {D^α u}_{|α|≤m}) = 0    (2.3.1)

Here we are using the so-called multi-index notation for partial derivatives, which works as follows. A multi-index is a vector of non-negative integers,

    α = (α_1, α_2, . . . , α_N)    α_i ∈ {0, 1, . . . }    (2.3.2)

In terms of α we define

    |α| = Σ_{i=1}^{N} α_i    (2.3.3)

the order of α, and

    D^α u = ∂^{|α|} u / (∂x_1^{α_1} ∂x_2^{α_2} . . . ∂x_N^{α_N})    (2.3.4)

the corresponding α derivative of u. For later use it is also convenient to define the factorial of a multi-index,

    α! = α_1! α_2! . . . α_N!    (2.3.5)

The PDE (2.3.1) is linear if it can be written as

    Lu(x) = Σ_{|α|≤m} a_α(x) D^α u(x) = g(x)    (2.3.6)
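In code, the multi-index operations are one-liners; the following sketch (not part of the notes) implements |α| and α! for a multi-index stored as a tuple.

```python
# Sketch (not part of the notes): the multi-index operations |alpha| and
# alpha! for alpha = (alpha_1, ..., alpha_N) stored as a tuple of ints.
from math import factorial

def order(alpha):
    # |alpha| = alpha_1 + ... + alpha_N
    return sum(alpha)

def mi_factorial(alpha):
    # alpha! = alpha_1! * alpha_2! * ... * alpha_N!
    out = 1
    for a in alpha:
        out *= factorial(a)
    return out

# e.g. alpha = (1, 0, 2) corresponds to the third order derivative
# d^3 u / (dx_1 dx_3^2), with |alpha| = 3 and alpha! = 1! * 0! * 2! = 2
assert order((1, 0, 2)) == 3
assert mi_factorial((1, 0, 2)) == 2
```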
2.3.1 First order PDEs and the method of characteristics

Let us start with the simplest possible example.

Example 2.8. When N = 2 and m = 1 consider

    ∂u/∂x_1 = 0    (2.3.7)

By elementary calculus considerations it is clear that u is a solution if and only if u is independent of x_1, i.e.

    u(x_1, x_2) = f(x_2)    (2.3.8)

for some function f. This is then the general solution of the given PDE, which we note contains an arbitrary function f.
Example 2.9. Next consider, again for N = 2, m = 1, the PDE

    a ∂u/∂x_1 + b ∂u/∂x_2 = 0    (2.3.9)

where a, b are fixed constants. This amounts precisely to the condition that u has directional derivative 0 in the direction θ = ⟨a, b⟩, so u is constant along any line parallel to θ. This in turn leads to the conclusion that u(x_1, x_2) = f(ax_2 − bx_1) for some arbitrary function f, which at least for the moment would seem to need to be differentiable.

The collection of lines parallel to θ, i.e. the lines ax_2 − bx_1 = C, obviously plays a special role in the above example; they are the so-called characteristics, or characteristic curves, associated to this particular PDE. The general concept of characteristic curve will now be described for the case of a first order linear PDE in two independent variables (with a temporary change of notation),

    a(x, y)u_x + b(x, y)u_y = c(x, y)    (2.3.10)

Consider the associated ODE system

    dx/dt = a(x, y)    dy/dt = b(x, y)    (2.3.11)

and suppose we have some solution pair x = x(t), y = y(t), which we regard as a parametrically given curve in the (x, y) plane. Such a curve is then, by definition, a characteristic curve for (2.3.10). Observe that if u(x, y) is a differentiable solution of (2.3.10) then

    d/dt u(x(t), y(t)) = a(x(t), y(t))u_x(x(t), y(t)) + b(x(t), y(t))u_y(x(t), y(t)) = c(x(t), y(t))    (2.3.12)

so that u satisfies a certain first order ODE along any characteristic curve. For example, if c(x, y) ≡ 0 then, as in the previous example, any solution of the PDE is constant along any characteristic curve.

Now let Γ ⊂ R² be some curve, which we assume can be parametrized as

    x = f(s),  y = g(s),  s_0 < s < s_1    (2.3.13)

The Cauchy problem for (2.3.10) consists in finding a solution of (2.3.10) with values prescribed on Γ, that is,

    u(f(s), g(s)) = h(s)    s_0 < s < s_1    (2.3.14)
for some given function h. Assuming for the moment that such a solution u exists, let x(t, s), y(t, s) be the characteristic curve passing through (f(s), g(s)) ∈ Γ when t = 0, i.e.

    ∂x/∂t = a(x, y)    x(0, s) = f(s)
    ∂y/∂t = b(x, y)    y(0, s) = g(s)    (2.3.15)

We must then have

    ∂/∂t u(x(t, s), y(t, s)) = c(x(t, s), y(t, s))    u(x(0, s), y(0, s)) = h(s)    (2.3.16)

This is a first order initial value problem in t, depending on s as a parameter, which is then guaranteed to have a solution at least for |t| < ε for some ε > 0. The three relations x = x(t, s), y = y(t, s), z = u(x(t, s), y(t, s)) generally amount to the parametric description of a surface in R³ containing Γ. If we can eliminate the parameters s, t to obtain the surface in non-parametric form z = u(x, y), then u is the sought after solution of the Cauchy problem.
Example 2.10. Let Γ denote the x axis and let us solve

    xu_x + u_y = 1    (2.3.17)

with u = h on Γ. Introducing f(s) = s, g(s) = 0 as the parametrization of Γ, we must then solve

    ∂x/∂t = x    x(0, s) = s
    ∂y/∂t = 1    y(0, s) = 0    (2.3.18)
    ∂/∂t u(x(t, s), y(t, s)) = 1    u(s, 0) = h(s)

We then easily obtain

    x(s, t) = se^t    y(s, t) = t    u(x(s, t), y(s, t)) = t + h(s)    (2.3.19)

and eliminating t, s yields the solution formula

    u(x, y) = y + h(xe^{−y})    (2.3.20)

The characteristics in this case are the curves x = se^t, y = t for fixed s, or x = se^y in nonparametric form. Note here that the solution is defined throughout the (x, y) plane even though nothing in the preceding discussion guarantees that. Since h has not been otherwise prescribed, we may also regard (2.3.20) as the general solution of (2.3.17).
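The solution formula u(x, y) = y + h(xe^{−y}) can be checked by finite differences; in this sketch (not part of the notes), the choice h = tanh is an arbitrary differentiable sample function.

```python
# Sketch (not part of the notes): finite-difference check that
# u(x, y) = y + h(x * e^(-y)) satisfies x*u_x + u_y = 1,
# where h is an arbitrary sample function (tanh here).
import math

h_fn = math.tanh

def u(x, y):
    return y + h_fn(x * math.exp(-y))

def residual(x, y, d=1e-6):
    u_x = (u(x + d, y) - u(x - d, y)) / (2 * d)   # centered differences
    u_y = (u(x, y + d) - u(x, y - d)) / (2 * d)
    return x * u_x + u_y - 1.0                    # should vanish identically

for x, y in [(0.5, 0.0), (-1.0, 2.0), (3.0, -1.0)]:
    assert abs(residual(x, y)) < 1e-6
```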
The attentive reader may already realize that this procedure cannot work in all cases, as is made clear by the following consideration: if c ≡ 0 and Γ is itself a characteristic curve, then the solution on Γ would have to be simultaneously equal to the given function h and constant, so that no solution can exist except possibly in the case that h is a constant function. From another, more general, point of view, we must eliminate the parameters s, t by inverting the relations x = x(s, t), y = y(s, t) to obtain s, t in terms of x, y, at least near Γ, and according to the inverse function theorem this should require that the Jacobian matrix

    [ ∂x/∂t  ∂y/∂t ]            [ a(f(s), g(s))  b(f(s), g(s)) ]
    [ ∂x/∂s  ∂y/∂s ]_{t=0}  =   [ f'(s)          g'(s)         ]    (2.3.21)

be nonsingular for all s. Equivalently, the direction ⟨f', g'⟩ should not be parallel to ⟨a, b⟩, and since ⟨a, b⟩ must be tangent to the characteristic curve, this amounts to the requirement that Γ itself should have a non-characteristic tangent direction at every point. We say that Γ is non-characteristic for the PDE (2.3.10) when this condition holds.

The following precise theorem can be established; see for example Chapter 1 of [18], or Chapter 3 of [10].

Theorem 2.2. Let Γ ⊂ R² be a continuously differentiable curve which is non-characteristic for (2.3.10), let h be a continuously differentiable function on Γ, and let a, b, c be continuously differentiable functions in a neighborhood of Γ. Then there exists a unique continuously differentiable function u(x, y) defined in a neighborhood of Γ which is a solution of (2.3.10).

The method of characteristics is capable of a considerable amount of generalization, in particular to first order PDEs in any number of independent variables, and to fully nonlinear first order PDEs; see the references just given above.
2.3.2 Second order problems in R²

Let us next look at the following special type of second order PDE in two independent variables x, y:

    Au_xx + Bu_xy + Cu_yy = 0    (2.3.22)

where A, B, C are real constants, not all zero. Consider introducing new coordinates ξ, η by means of a linear change of variables

    ξ = αx + βy    η = γx + δy    (2.3.23)

with αδ − βγ ≠ 0, so that the transformation is invertible. Our goal is to make a good choice of α, β, γ, δ so as to achieve a simpler, but equivalent, PDE to study.
Given any PDE and any change of coordinates, we obtain the expression for the PDE
in the new coordinate system by straightforward application of the chain rule. In our
case, for example, we have

∂u/∂x = (∂u/∂ξ)(∂ξ/∂x) + (∂u/∂η)(∂η/∂x) = α ∂u/∂ξ + γ ∂u/∂η    (2.3.24)

∂²u/∂x² = (α ∂/∂ξ + γ ∂/∂η)(α ∂u/∂ξ + γ ∂u/∂η) = α² ∂²u/∂ξ² + 2αγ ∂²u/∂ξ∂η + γ² ∂²u/∂η²    (2.3.25)
with similar expressions for u_xy and u_yy. Substituting into (2.3.22), the resulting PDE is

a u_ξξ + b u_ξη + c u_ηη = 0    (2.3.26)

where

a = α²A + αβB + β²C    (2.3.27)
b = 2αγA + (αδ + βγ)B + 2βδC    (2.3.28)
c = γ²A + γδB + δ²C    (2.3.29)
The idea now is to make special choices of α, β, γ, δ to achieve as simple a form as possible
for the transformed PDE (2.3.26).

Suppose first that B² − 4AC > 0, so that there exist two real and distinct roots r_1, r_2
of Ar² + Br + C = 0. If α, β, γ, δ are chosen so that

α/β = r_1    γ/δ = r_2    (2.3.30)

then a = c = 0 (and αδ − βγ ≠ 0), so that the transformed PDE is simply u_ξη = 0. The
general solution of this second order PDE is easily obtained: u_ξ must be a function of ξ
alone, so integrating with respect to ξ and observing that the 'constant of integration'
could be any function of η, we get

u(ξ, η) = F(ξ) + G(η)    (2.3.31)

for any differentiable functions F, G. Finally, reverting to the original coordinate system,
the result is

u(x, y) = F(αx + βy) + G(γx + δy)    (2.3.32)

The lines αx + βy = C, γx + δy = C are called the characteristics for (2.3.22). Characteristics are an important concept for this and some more general second order PDEs,
but they do not play as central a role as in the first order case.
Example 2.11. For the PDE

u_xx − u_yy = 0    (2.3.33)

the roots r satisfy r² − 1 = 0. We may then choose, for example, α = β = γ = 1, δ = −1,
to get the general solution

u(x, y) = F(x + y) + G(x − y)    (2.3.34)
Next assume that B² − 4AC = 0. If either of A or C is 0, then so is B, in which case
the PDE already has the form u_ξξ = 0 or u_ηη = 0, say the first of these without loss of
generality. Otherwise, choose

α = −B/(2A)    β = 1    γ = 1    δ = 0    (2.3.35)

to obtain a = b = 0, c = A, so that (after relabeling ξ and η) the transformed PDE in all cases is u_ξξ = 0.
Finally, if B² − 4AC < 0 then A ≠ 0 must hold, and we may choose

α = 1    β = 0    γ = −B/√(4AC − B²)    δ = 2A/√(4AC − B²)    (2.3.36)

in which case the transformed equation is

u_ξξ + u_ηη = 0    (2.3.37)
We have therefore established that any PDE of the type (2.3.22) can be transformed,
by means of a linear change of variables, to one of the three simple types

u_ξη = 0    u_ξξ = 0    u_ξξ + u_ηη = 0    (2.3.38)

each of which then leads to a prototype for a certain class of PDEs. If we allow lower
order terms,

A u_xx + B u_xy + C u_yy + D u_x + E u_y + F u = G    (2.3.39)

then after the transformation (2.3.23) it is clear that the lower order terms remain lower
order terms. Thus any PDE of the type (2.3.39) is, up to a change of coordinates, one of
the three types (2.3.38), up to lower order terms, and only the value of the discriminant
B² − 4AC needs to be known to determine which of the three types is obtained.

The above discussion motivates the following classification: the PDE (2.3.39) is said
to be:
• hyperbolic if B² − 4AC > 0
• parabolic if B² − 4AC = 0
• elliptic if B² − 4AC < 0
The terminology comes from an obvious analogy with conic sections: the solution
set of Ax² + Bxy + Cy² + Dx + Ey + F = 0 is respectively a hyperbola, parabola or
ellipse (or a degenerate case) according as B² − 4AC is positive, zero or negative.
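The classification rule is easy to code directly; the small helper below is an illustrative sketch, not from the text.

```python
def classify(A, B, C):
    """Classify A u_xx + B u_xy + C u_yy + (lower order terms) by its discriminant."""
    disc = B**2 - 4*A*C
    if disc > 0:
        return "hyperbolic"
    elif disc == 0:
        return "parabolic"
    else:
        return "elliptic"

print(classify(1, 0, -1))  # wave operator u_xx - u_yy
print(classify(0, 0, 1))   # e.g. the principal part of the heat equation
print(classify(1, 0, 1))   # Laplace operator
```

For a variable-coefficient equation the same function can be applied pointwise, e.g. classify(1, 0, -x) for the Tricomi equation discussed next.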
We can also allow the coefficients A, B, . . . , G to be variable functions of x, y, and
in this case the classification is done pointwise, so the type can change. An important
example of this phenomenon is the so-called Tricomi equation (see e.g. Chapter 12 of
[13])

u_xx − x u_yy = 0    (2.3.40)

which is hyperbolic for x > 0 and elliptic for x < 0. One might refer to the equation as
being parabolic for x = 0, but generally speaking we do not do this, since it is not really
meaningful to speak of a PDE being satisfied in a set without interior points.
The above discussion is special to the case of N = 2 independent variables, and in
the case N ≥ 3 there is no such complete classification. As we will see, there are still
PDEs referred to as being hyperbolic, parabolic or elliptic, but there are others which
are not of any of these types, although these tend to be of less physical importance.
2.3.3
Further discussion of model problems
According to the previous discussion, we should focus our attention on a representative
problem for each of the three types, since then we will also gain considerable information
about other problems of the given type.
Wave equation
For the hyperbolic case we consider the wave equation

u_tt − c²u_xx = 0    (2.3.41)

where c > 0 is a constant. Here we have changed the name of the variable y to t,
following the usual convention of regarding u = u(x, t) as depending on a 'space' variable
x and a 'time' variable t. This PDE arises in the simplest model of wave propagation in
one dimension, where u represents, for example, the displacement of a vibrating medium
from its equilibrium position, and c is the wave speed.
Following the procedure outlined at the beginning of this section, an appropriate
change of coordinates is ξ = x + ct, η = x − ct, and we obtain the expression, also known
as d'Alembert's formula, for the general solution:

u(x, t) = F(x + ct) + G(x − ct)    (2.3.42)

for arbitrary twice differentiable functions F, G. The general solution may be viewed as
the superposition of two waves of fixed shape, moving to the right and to the left with
speed c.
The initial value problem for the wave equation consists in solving (2.3.41) for x ∈ R
and t > 0 subject to the side conditions

u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ R    (2.3.43)

where f, g represent the initial displacement and initial velocity of the vibrating medium.
This problem may be completely and explicitly solved by means of d'Alembert's formula.
We have

F(x) + G(x) = f(x)    c(F′(x) − G′(x)) = g(x)    x ∈ R    (2.3.44)

Integrating the second relation gives F(x) − G(x) = (1/c)∫_0^x g(s) ds + C for some constant
C, and combining with the first relation yields

F(x) = (1/2)( f(x) + (1/c)∫_0^x g(s) ds + C )    G(x) = (1/2)( f(x) − (1/c)∫_0^x g(s) ds − C )    (2.3.45)

Substituting into (2.3.42) and doing some obvious simplification we obtain

u(x, t) = (1/2)( f(x + ct) + f(x − ct) ) + (1/2c) ∫_{x−ct}^{x+ct} g(s) ds    (2.3.46)
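The d'Alembert solution (2.3.46) is easy to check numerically by finite differences. The sketch below uses arbitrary sample data f(x) = e^{−x²}, g(x) = cos x (chosen so that the integral of g is available in closed form); these choices are illustrative, not from the text.

```python
import numpy as np

c = 2.0                        # wave speed (sample value)
f = lambda x: np.exp(-x**2)    # sample initial displacement
g = lambda x: np.cos(x)        # sample initial velocity; its antiderivative is sin

def u(x, t):
    # d'Alembert's formula (2.3.46); the integral of g here is sin(x+ct) - sin(x-ct)
    return 0.5*(f(x + c*t) + f(x - c*t)) + (np.sin(x + c*t) - np.sin(x - c*t))/(2*c)

x, t, h = 0.7, 0.3, 1e-4
utt = (u(x, t+h) - 2*u(x, t) + u(x, t-h)) / h**2   # centered difference in t
uxx = (u(x+h, t) - 2*u(x, t) + u(x-h, t)) / h**2   # centered difference in x
ut0 = (u(x, h) - u(x, -h)) / (2*h)                 # approximation of u_t at t = 0

print(abs(utt - c**2 * uxx))   # small: u satisfies the wave equation
print(abs(u(x, 0.0) - f(x)))   # zero: initial displacement is matched
print(abs(ut0 - g(x)))         # small: initial velocity is matched
```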
We remark that a general solution formula like (2.3.42) can be given for any PDE
which is exactly transformable to u_ξη = 0, that is to say, any hyperbolic PDE of the
form (2.3.22), but once lower order terms are allowed such a simple solution method is
no longer available. For example, the so-called Klein-Gordon equation u_tt − u_xx + u = 0
may be transformed to u_ξη + u/4 = 0, which cannot be solved in so transparent a form.
Thus the d'Alembert solution method, while very useful when applicable, is limited in
its scope.
Heat equation

Another elementary method, which may be used in a wide variety of situations, is separation of variables. We illustrate with the case of the initial and boundary value problem

u_t = u_xx    0 < x < 1    t > 0    (2.3.47)
u(0, t) = u(1, t) = 0    t > 0    (2.3.48)
u(x, 0) = f(x)    0 < x < 1    (2.3.49)

Here (2.3.47) is the heat equation, a parabolic equation modeling, for example, the temperature u = u(x, t) in a one dimensional medium as a function of location x and time t;
(2.3.48) are the boundary conditions, stating that the temperature is held at zero
at the two boundary points x = 0 and x = 1 for all t; and (2.3.49) represents the
initial condition, i.e. that the initial temperature distribution is given by the prescribed
function f(x).
We begin by ignoring the initial condition and otherwise looking for special solutions
of the form u(x, t) = φ(t)ψ(x). Obviously u = 0 is such a solution, but it cannot be of any
help in eventually solving the full stated problem, so we insist that neither of φ and ψ is
the zero function. Inserting into (2.3.47) we obtain immediately that

φ′(t)ψ(x) = φ(t)ψ″(x)    (2.3.50)

must hold, or equivalently

φ′(t)/φ(t) = ψ″(x)/ψ(x)    (2.3.51)

Since the left side depends on t alone and the right side on x alone, it must be that both
sides are equal to a common constant, which we denote by −λ (without yet at this point
ruling out the possibility that λ itself is negative or even complex). We have therefore
obtained ODEs for φ and ψ,

φ′(t) + λφ(t) = 0    ψ″(x) + λψ(x) = 0    (2.3.52)

linked via the separation constant λ. Next, from the boundary condition (2.3.48) we get
φ(t)ψ(0) = φ(t)ψ(1) = 0, and since φ is nonzero we must have ψ(0) = ψ(1) = 0.

The ODE and side conditions for ψ, namely

ψ″(x) + λψ(x) = 0    0 < x < 1    ψ(0) = ψ(1) = 0    (2.3.53)
is the simplest example of a so-called Sturm-Liouville problem, a topic which will be
studied in detail in Chapter ( ), but this particular case can be handled by elementary
considerations. We emphasize that our goal is to find nonzero solutions of (2.3.53), along
with the values of λ these correspond to, and as we will see, only certain values of λ will
be possible.

Considering first the case that λ > 0, the general solution of the ODE is

ψ(x) = c_1 sin √λ x + c_2 cos √λ x    (2.3.54)

The first boundary condition ψ(0) = 0 implies that c_2 = 0, and the second gives
c_1 sin √λ = 0. We are not allowed to have c_1 = 0, since otherwise ψ ≡ 0, so instead
sin √λ = 0 must hold, i.e. √λ = π, 2π, . . . . Thus we have found one collection of solutions of (2.3.53), which we denote ψ_k(x) = sin kπx, k = 1, 2, . . . , corresponding to
λ = k²π². Since they were found under the assumption that λ > 0, we should next
consider the other possibilities, but it turns out that we have already found all possible
solutions of (2.3.53). For example, if we suppose λ < 0 and k = √(−λ), then to solve
(2.3.53) we must have ψ(x) = c_1 e^{kx} + c_2 e^{−kx}. From the boundary conditions

c_1 + c_2 = 0    c_1 e^k + c_2 e^{−k} = 0    (2.3.55)

we see that the unique solution is c_1 = c_2 = 0 for any k > 0. Likewise we can check that
ψ = 0 is the only possible solution for λ = 0 and for nonreal λ.
For each allowed value of λ we obviously have the corresponding function φ(t) = e^{−λt},
so that

u_k(x, t) = e^{−k²π²t} sin kπx    k = 1, 2, . . .    (2.3.56)

represents, aside from multiplicative constants, all possible product solutions of (2.3.47),(2.3.48).

To complete the solution of the initial and boundary value problem, we observe that
any sum Σ_{k=1}^∞ c_k u_k(x, t) is also a solution of (2.3.47),(2.3.48) as long as c_k → 0 sufficiently
rapidly, and we try to choose the coefficients c_k to achieve the initial condition (2.3.49).
The requirement is therefore that

f(x) = Σ_{k=1}^∞ c_k sin kπx    (2.3.57)

hold. For any f for which such a sine series representation is valid, we then have the
solution of the given PDE problem

u(x, t) = Σ_{k=1}^∞ c_k e^{−k²π²t} sin kπx    (2.3.58)
The question then becomes to characterize this set of f's in some more straightforward
way, and this is done, among many other things, within the theory of Fourier series, which
will be discussed in Chapter 8. Roughly speaking, the result will be that essentially any
reasonable function can be represented this way, but there are many aspects to this,
including elaboration of the precise sense in which the series converges. One other fact
concerning this series which we can easily anticipate at this point is a formula for the
coefficient c_k: if we assume that (2.3.57) holds, we can multiply both sides by sin mπx
for some integer m and integrate with respect to x over (0, 1), to obtain

∫_0^1 f(x) sin mπx dx = c_m ∫_0^1 sin² mπx dx = c_m/2    (2.3.59)

since ∫_0^1 sin kπx sin mπx dx = 0 for k ≠ m. Thus, if f is representable by a sine series,
there is only one possibility for the k'th coefficient, namely

c_k = 2 ∫_0^1 f(x) sin kπx dx    (2.3.60)
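The coefficient formula (2.3.60) and the truncated series solution (2.3.58) are straightforward to realize numerically. The sketch below uses the arbitrary sample choice f(x) = x(1 − x), whose exact sine coefficients are c_k = 8/(kπ)³ for odd k and 0 for even k; the midpoint quadrature rule and the truncation level are likewise illustrative assumptions.

```python
import numpy as np

def sine_coeffs(f, K, n=2000):
    """c_k = 2 * integral_0^1 f(x) sin(k pi x) dx, approximated by the midpoint rule."""
    x = (np.arange(n) + 0.5) / n
    return np.array([2.0 * np.mean(f(x) * np.sin(k*np.pi*x)) for k in range(1, K+1)])

def u(x, t, ck):
    """Truncated series solution (2.3.58)."""
    k = np.arange(1, len(ck) + 1)
    return np.sum(ck * np.exp(-(k*np.pi)**2 * t) * np.sin(k*np.pi*x))

f = lambda x: x*(1 - x)          # sample initial temperature distribution
ck = sine_coeffs(f, 50)

print(ck[0], 8/np.pi**3)         # computed vs exact first coefficient
print(ck[1])                     # even coefficients vanish
print(u(0.5, 0.0, ck), f(0.5))   # at t = 0 the series recovers f
```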
Laplace equation

Finally we discuss a model problem of elliptic type,

u_xx + u_yy = 0    x² + y² < 1    (2.3.61)
u(x, y) = f(x, y)    x² + y² = 1    (2.3.62)

where f is a given function. The PDE in (2.3.61) is known as Laplace's equation, and is
commonly written as Δu = 0, where Δ = ∂²/∂x² + ∂²/∂y² is the Laplace operator, or Laplacian.
A function satisfying Laplace's equation in some set is said to be a harmonic function on
that set; thus we are solving the boundary value problem of finding a harmonic function
in the unit disk x² + y² < 1 subject to a prescribed boundary condition on the boundary
of the disk.

One should immediately recognize that it would be natural here to make use of polar
coordinates (r, θ), where according to the usual calculus notations,

r = √(x² + y²)    tan θ = y/x    x = r cos θ    y = r sin θ    (2.3.63)

and we regard u = u(r, θ) and f = f(θ).
To begin we need to find the expression for Laplace's equation in polar coordinates.
Again this is a straightforward calculation with the chain rule; for example,

∂u/∂x = (∂u/∂r)(∂r/∂x) + (∂u/∂θ)(∂θ/∂x)    (2.3.64)
     = (x/√(x² + y²)) ∂u/∂r − (y/(x² + y²)) ∂u/∂θ    (2.3.65)
     = cos θ ∂u/∂r − (sin θ/r) ∂u/∂θ    (2.3.66)

with similar expressions for ∂u/∂y and the second derivatives. The end result is

u_xx + u_yy = u_rr + (1/r)u_r + (1/r²)u_θθ = 0    (2.3.67)
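The polar form (2.3.67) can be spot-checked symbolically. The sketch below (using sympy, with an arbitrary polynomial sample function; both choices are illustrative assumptions) verifies that the Cartesian and polar expressions for the Laplacian agree:

```python
import sympy as sp

x, y, r, th = sp.symbols('x y r theta', positive=True)

w = x**2 * y                                   # arbitrary sample function
cart = sp.diff(w, x, 2) + sp.diff(w, y, 2)     # u_xx + u_yy

wp = w.subs({x: r*sp.cos(th), y: r*sp.sin(th)})  # the same function in polar coordinates
polar = sp.diff(wp, r, 2) + sp.diff(wp, r)/r + sp.diff(wp, th, 2)/r**2

diff = sp.simplify(polar - cart.subs({x: r*sp.cos(th), y: r*sp.sin(th)}))
print(diff)  # 0
```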
We may now try separation of variables, looking for solutions in the product form
u(r, θ) = R(r)Θ(θ). Substituting into (2.3.67) and dividing by RΘ gives

r² R″(r)/R(r) + r R′(r)/R(r) = −Θ″(θ)/Θ(θ)    (2.3.68)

so both sides must be equal to a common constant λ. Therefore R and Θ must be nonzero
solutions of

Θ″ + λΘ = 0    r²R″ + rR′ − λR = 0    (2.3.69)

Next it is necessary to recognize that there are two 'hidden' side conditions which we
must make use of. The first of these is that Θ must be 2π periodic, since otherwise it
would not be possible to express the solution u in terms of the original variables x, y in
an unambiguous way. We can make this explicit by requiring

Θ(0) = Θ(2π)    Θ′(0) = Θ′(2π)    (2.3.70)

As in the case of (2.3.53), we can search for allowable values of λ by considering the
various cases λ > 0, λ < 0, etc. The outcome is that nontrivial solutions exist precisely if
λ = k², k = 0, 1, 2, . . . , with corresponding solutions, up to multiplicative constant,

Θ_k(θ) = 1 (k = 0),    Θ_k(θ) = sin kθ or cos kθ (k = 1, 2, . . .)    (2.3.71)

If one is willing to use the complex form, we could replace sin kθ, cos kθ by e^{±ikθ} for
k = 1, 2, . . . .
With λ determined, we must next solve the corresponding R equation,

r²R″ + rR′ − k²R = 0    (2.3.72)

which is of the Cauchy-Euler type (2.1.19). The general solution is

R(r) = c_1 + c_2 log r (k = 0),    R(r) = c_1 r^k + c_2 r^{−k} (k = 1, 2, . . .)    (2.3.73)

and here we encounter the second hidden condition: the solution R should not be
singular at the origin, since otherwise the PDE would not be satisfied throughout the
unit disk. Thus we should choose c_2 = 0 in each case, leaving R(r) = r^k, k = 0, 1, . . . .

Summarizing, we have found all possible product solutions R(r)Θ(θ) of (2.3.61), and
they are

1,  r^k sin kθ,  r^k cos kθ    k = 1, 2, . . .    (2.3.74)
up to constant multiples. Any sum of such terms is also a solution of (2.3.61), so we seek
a solution of (2.3.61),(2.3.62) in the form

u(r, θ) = a_0 + Σ_{k=1}^∞ (a_k r^k cos kθ + b_k r^k sin kθ)    (2.3.75)

The coefficients must then be determined from the requirement that

f(θ) = a_0 + Σ_{k=1}^∞ (a_k cos kθ + b_k sin kθ)    (2.3.76)

This is another problem in the theory of Fourier series, very similar to that
associated with (2.3.57), which as mentioned before will be studied in detail in Chapter
8. Exact formulas for the coefficients in terms of f may be given, as in (2.3.60); see
Exercise 19.
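As with the sine series, the coefficients in (2.3.76) can be computed numerically and the series (2.3.75) evaluated. The sketch below uses the standard full-range Fourier coefficient formulas (the subject of Exercise 19) and an arbitrary sample boundary function; both are illustrative assumptions.

```python
import numpy as np

def disk_solution(f, K, n=4096):
    """Return the truncated series (2.3.75) for boundary data f(theta)."""
    th = 2*np.pi*np.arange(n)/n
    fv = f(th)
    a0 = fv.mean()                                              # constant term
    ak = np.array([2*np.mean(fv*np.cos(k*th)) for k in range(1, K+1)])
    bk = np.array([2*np.mean(fv*np.sin(k*th)) for k in range(1, K+1)])
    def u(r, theta):
        k = np.arange(1, K+1)
        return a0 + np.sum(ak * r**k * np.cos(k*theta) + bk * r**k * np.sin(k*theta))
    return u

f = lambda th: np.cos(th)**2        # sample boundary data
u = disk_solution(f, 10)
print(u(1.0, 0.3), f(0.3))          # boundary condition is reproduced at r = 1
print(u(0.0, 0.0))                  # value at the center = average of the boundary data
```

The last line illustrates the mean value property of harmonic functions: u at the center of the disk equals the average of f over the boundary.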
2.3.4
Standard problems and side conditions
Let us now formulate a number of typical PDE problems which will recur throughout
this book, and which are for the most part variants of the model problems discussed in
the previous section. Let Ω be some domain in R^N and let ∂Ω denote the boundary of
Ω. For any sufficiently differentiable function u, the Laplacian of u is

Δu = Σ_{k=1}^N ∂²u/∂x_k²    (2.3.77)

• The PDE

Δu = h    x ∈ Ω    (2.3.78)
is Poisson’s equation, or Laplace’s equation in the special case that h = 0. It is
regarded as being of elliptic type, by analogy with the N = 2 case discussed in the
previous section, or on account of a more general definition of ellipticity which will
be given in Chapter 9. The most common types of side conditions associated with
this PDE are
– Dirichlet, or first kind, boundary conditions

u(x) = g(x)    x ∈ ∂Ω    (2.3.79)

– Neumann, or second kind, boundary conditions

∂u/∂n (x) = g(x)    x ∈ ∂Ω    (2.3.80)

where ∂u/∂n (x) = (∇u · n)(x) is the directional derivative in the direction of the
outward normal n(x) for x ∈ ∂Ω.

– Robin, or third kind, boundary conditions

∂u/∂n (x) + σ(x)u(x) = g(x)    x ∈ ∂Ω    (2.3.81)

for some given function σ.
• The PDE

Δu + λu = h    x ∈ Ω    (2.3.82)

where λ is some constant, is the Helmholtz equation, also of elliptic type. The
three types of boundary condition mentioned for the Poisson equation may also be
imposed in this case.
• The PDE

u_t = Δu    x ∈ Ω    t > 0    (2.3.83)

is the heat equation and is of parabolic type. Here u = u(x, t), where x is regarded
as a spatial variable and t a time variable. By convention, the Laplacian acts only
with respect to the N spatial variables x_1, . . . , x_N. Appropriate side conditions for
determining a solution of the heat equation are an initial condition

u(x, 0) = f(x)    x ∈ Ω    (2.3.84)

and boundary conditions of the Dirichlet, Neumann or Robin type mentioned above.
The only needed modification is that the functions involved may be allowed to
depend on t; for example, the Dirichlet boundary condition for the heat equation is

u(x, t) = g(x, t)    x ∈ ∂Ω    t > 0    (2.3.85)

and similarly for the other two types.
• The PDE

u_tt = Δu    x ∈ Ω    t > 0    (2.3.86)

is the wave equation and is of hyperbolic type. Since it is second order in t, it is
natural that there be two initial conditions, usually given as

u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ Ω    (2.3.87)

Suitable boundary conditions for the wave equation are precisely the same as for
the heat equation.
• Finally, the PDE

iu_t = Δu    x ∈ R^N    t > 0    (2.3.88)

is the Schrödinger equation. Even when N = 1 it does not fall under the classification scheme of Section 2.3.2 because of the complex coefficient i = √−1. It is
nevertheless one of the fundamental partial differential equations of mathematical
physics, and we will have some things to say about it in later chapters. The spatial
domain here is taken to be all of R^N rather than a subset Ω because this is by far
the most common situation and the only one which will arise in this book. Since
there is no spatial boundary, the only needed side condition is an initial condition
for u, u(x, 0) = f(x), as in the heat equation case.
2.4
Well-posed and ill-posed problems
All of the PDEs and associated side conditions discussed in the previous section turn
out to be natural, in the sense that they lead to what are called well-posed problems, a
somewhat imprecise concept we explain next. Roughly speaking a problem is well-posed
if
• A solution exists.
• The solution is unique.
• The solution depends continuously on the data.
Here by 'data' we mean any of the ingredients of the problem which we might imagine
being changed, to obtain a problem of the same general type. For example, in the Dirichlet
problem for the Poisson equation

Δu = f    x ∈ Ω    u = 0    x ∈ ∂Ω    (2.4.1)

the term f = f(x) would be regarded as the given data. The idea of continuous dependence is that if a 'small' change is made in the data, then the resulting solution should
also undergo only a small change. For such a notion to be made precise, it is necessary
to have some specific idea in mind of how we would measure the magnitude of a change
in f. As we shall see, there may be many natural ways to do so, and no precise statement about well-posedness can be given until such choices are made. In fact, even the
existence and uniqueness requirements, which may seem more clear cut, may also turn
out to require much clarification in terms of what the exact meaning of 'solution' is.
A problem which is not well-posed is called ill-posed. A classical problem in which
ill-posedness can be easily recognized is Hadamard's example, which we note is not of
one of the standard types mentioned above:

u_xx + u_yy = 0    −∞ < x < ∞    y > 0    (2.4.2)
u(x, 0) = 0    u_y(x, 0) = g(x)    −∞ < x < ∞    (2.4.3)

If g(x) = α sin kx for some α, k > 0, then a corresponding solution is

u(x, y) = α (sin kx / k) sinh ky    (2.4.4)

This is known to be the unique solution, but notice that a change in α (i.e. of the data g)
of size ε implies a corresponding change in the solution for, say, y = 1 of size ε sinh(k)/k,
which grows like e^k. Since k can be arbitrarily large, it follows that the problem is
ill-posed; that is, small changes in the data do not necessarily lead to small changes
in the solution.
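The loss of stability is easy to see numerically: for a fixed data perturbation size ε, the size of the corresponding change in the solution at y = 1 grows without bound as the frequency k increases. A small illustrative computation (the values of ε and k are arbitrary):

```python
import numpy as np

eps = 1e-6   # size of a perturbation alpha in the data g(x) = alpha*sin(kx)
# corresponding change in the solution at y = 1 has amplitude eps*sinh(k)/k
amp = {k: eps * np.sinh(k) / k for k in (1, 10, 50)}
for k, a in amp.items():
    print(k, a)   # grows roughly like eps * e^k / (2k)
```

Already at k = 50 a perturbation of size 10⁻⁶ in the data produces a change of astronomical size in the solution.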
Note that in this example if we change the PDE from u_xx + u_yy = 0 to u_xx − u_yy = 0
then (aside from the name of a variable) we have precisely the problem (2.3.41),(2.3.43),
which from the explicit solution (2.3.46) may be seen to be well-posed under any reasonable interpretation. Thus we see that some care must be taken in recognizing what
are the 'correct' side conditions for a given PDE. Other interesting examples of ill-posed
problems are given in Exercises 23 and 26; see also [24].
2.5
Exercises
1. Find a fundamental set and the general solution of u′′′ + u″ + u′ = 0.
2. Let L = aD² + bD + c (a ≠ 0) be a constant coefficient second order linear differential
operator, and let p(λ) = aλ² + bλ + c be the associated characteristic polynomial.
If λ_1, λ_2 are the roots of p, show that we can express the operator L as L = a(D −
λ_1)(D − λ_2). Use this factorization to obtain the general solution of Lu = 0 in the
case of repeated roots, λ_1 = λ_2.
3. Show that the solution of the initial value problem y′ = ∛y, y(0) = 0 is not unique.
(Hint: y(t) = 0 is one solution; find another one.) Why doesn't this contradict the
assertion in Theorem 2.1 about unique solvability of the initial value problem?
4. Solve the initial value problem for the Cauchy-Euler equation

(t + 1)²u″ + 4(t + 1)u′ − 10u = 0    u(1) = 2    u′(1) = −1
5. Consider the integral equation

λ ∫_0^1 K(x, y)u(y) dy = u(x) + g(x)

for the kernel

K(x, y) = x²/(1 + y³)

a) For what values of λ ∈ C does there exist a unique solution for any function g
which is continuous on [0, 1]?

b) Find the solution set of the equation for all λ ∈ C and continuous functions g.
(Hint: For λ ≠ 0 any solution must have the form u(x) = Cx² − g(x) for some
constant C.)
6. Find a kernel K(x, y) such that u(x) = ∫_0^1 K(x, y)f(y) dy is the solution of

u″ + u = f(x)    u(0) = u′(0) = 0

(Hint: Review the variation of parameters method in any undergraduate ODE
textbook.)
7. If f ∈ C([0, 1]),

K(x, y) = y(x − 1) for 0 < y < x < 1,    x(y − 1) for 0 < x < y < 1

and

u(x) = ∫_0^1 K(x, y)f(y) dy

show that

u″ = f    0 < x < 1    u(0) = u(1) = 0
8. For each of the integral operators in (2.2.8), (2.2.14), (2.2.15), (2.2.16), and (2.2.17),
discuss the classification(s) of the corresponding kernel, according to Definition 2.1.
9. Find the general solution of (1 + x²)u_x + u_y = 0. Sketch some of the characteristic
curves.
10. The general solution in Example 2.10 was found by solving the corresponding
Cauchy problem with Γ being the x axis. But the general solution should not
actually depend on any specific choice of Γ. Show that the same general solution is
found if instead we take Γ to be the y axis.
11. Find the solution of

y u_x + x u_y = 1    u(0, y) = e^{−y²}

Discuss why the solution you find is only valid for |y| ≥ |x|.
12. The method of characteristics developed in Section 2.3.1 for the linear PDE (2.3.10)
can be easily extended to the so-called semilinear equation

a(x, y)u_x + b(x, y)u_y = c(x, y, u)    (2.5.1)

We simply replace (2.3.12) by

(d/dt) u(x(t), y(t)) = c(x(t), y(t), u(x(t), y(t)))    (2.5.2)

which is still an ODE along a characteristic. With this in mind, solve

u_x + x u_y + u² = 0    u(0, y) = 1/y    (2.5.3)

13. Find the general solution of u_xx − 4u_xy + 3u_yy = 0.
14. Find the regions of the xy plane where the PDE

y u_xx − 2u_xy + x u_yy − 3u_x + u = 0

is elliptic, parabolic, and hyperbolic.
15. Find a solution formula for the half line wave equation problem

u_tt − c²u_xx = 0    x > 0    t > 0    (2.5.4)
u(0, t) = h(t)    t > 0    (2.5.5)
u(x, 0) = f(x)    x > 0    (2.5.6)
u_t(x, 0) = g(x)    x > 0    (2.5.7)

Note where the solution coincides with (2.3.46) and explain why this should be
expected.
16. Complete the details of verifying (2.3.67).
17. If u is a twice differentiable function on R^N depending only on r = |x|, show that

Δu = u_rr + ((N − 1)/r) u_r

(Spherical coordinates in R^N are reviewed in Section 18.3, but the details of the
angular variables are not needed for this calculation. Start by showing that
∂u/∂x_j = u′(r) x_j/r.)
18. Verify in detail that there are no nontrivial solutions of (2.3.53) for nonreal λ ∈ C.
19. Assuming that (2.3.76) is valid, find the coefficients a_k, b_k in terms of f. (Hint:
multiply the equation by sin mθ or cos mθ and integrate from 0 to 2π.)
20. In the two dimensional case, solutions of Laplace's equation Δu = 0 may also be
found by means of analytic function theory. Recall that if z = x + iy, then a function
f(z) is analytic in an open set Ω if f′(z) exists at every point of Ω. If we think of
f = u + iv and u, v as functions of x, y, then u = u(x, y), v = v(x, y) must satisfy
the Cauchy-Riemann equations u_x = v_y, u_y = −v_x. Show in this case that u, v are
also solutions of Laplace's equation. Find u, v if f(z) = z³ and f(z) = e^z.
21. Find all of the product solutions u(x, t) = φ(t)ψ(x) that you can which satisfy the
damped wave equation

u_tt + αu_t = u_xx    0 < x < π    t > 0

and the boundary conditions

u(0, t) = u_x(π, t) = 0    t > 0

Here α > 0 is the damping constant. What is the significance of the condition
α < 1?
22. Show that any solution of the wave equation u_tt − u_xx = 0 has the 'four point
property'

u(x, t) + u(x + h − k, t + h + k) = u(x + h, t + h) + u(x − k, t + k)

for any h, k. (Suggestion: Use d'Alembert's formula.)
23. In the Dirichlet problem for the wave equation

u_tt − u_xx = 0    0 < x < 1    0 < t < 1
u(0, t) = u(1, t) = 0    0 < t < 1
u(x, 0) = 0    u(x, 1) = f(x)    0 < x < 1

show that neither existence nor uniqueness holds. (Hint: For the non-existence
part, use Exercise 22 to find an f for which no solution exists.)
24. Let Ω be the rectangle [0, a] × [0, b] in R². Find all possible product solutions

u(x, y, t) = φ(t)ψ(x)ζ(y)

satisfying

u_t − Δu = 0    (x, y) ∈ Ω    t > 0
u(x, y, t) = 0    (x, y) ∈ ∂Ω    t > 0
25. Find a solution of the Dirichlet problem for u = u(x, y) in the unit disc Ω = {(x, y) :
x² + y² < 1},

Δu = 1    (x, y) ∈ Ω    u(x, y) = 0    (x, y) ∈ ∂Ω

(Suggestion: look for a solution in the form u = u(r) and recall (2.3.67).)
26. The problem

u_t = u_xx    0 < x < 1    t < T    (2.5.8)
u(0, t) = u(1, t) = 0    t > 0    (2.5.9)
u(x, T) = f(x)    0 < x < 1    (2.5.10)

is sometimes called a final value problem for the heat equation.

a) Show that this problem is ill-posed.

b) Show that this problem is equivalent to (2.3.47),(2.3.48),(2.3.49) except with the
heat equation (2.3.47) replaced by the backward heat equation u_t = −u_xx.
Chapter 3
Vector spaces
We will be working frequently with function spaces which are themselves special cases
of more abstract spaces. Most such spaces which are of interest to us have both linear
structure and metric structure. This means that given any two elements of the space it
is meaningful to speak of (i) a linear combination of the elements, and (ii) the distance
between the two elements. These two kinds of concepts are abstracted in the definitions
of vector space and metric space.
3.1
Axioms of a vector space
Definition 3.1. A vector space is a set X such that whenever x, y ∈ X and λ is a scalar
we have x + y ∈ X and λx ∈ X, and the following axioms hold.

[V1] x + y = y + x for all x, y ∈ X

[V2] (x + y) + z = x + (y + z) for all x, y, z ∈ X

[V3] There exists an element 0 ∈ X such that x + 0 = x for all x ∈ X

[V4] For every x ∈ X there exists an element −x ∈ X such that x + (−x) = 0

[V5] λ(x + y) = λx + λy for all x, y ∈ X and any scalar λ

[V6] (λ + μ)x = λx + μx for any x ∈ X and any scalars λ, μ

[V7] λ(μx) = (λμ)x for any x ∈ X and any scalars λ, μ

[V8] 1x = x for any x ∈ X
Here the field of scalars may be either the real numbers R or the complex numbers
C, and we may refer to X as a real or complex vector space accordingly, if a distinction
needs to be made.

By an obvious induction argument, if x_1, . . . , x_m ∈ X and λ_1, . . . , λ_m are scalars, then
the linear combination Σ_{j=1}^m λ_j x_j is itself an element of X.
Example 3.1. Ordinary N-dimensional Euclidean space

R^N := {x = (x_1, x_2, . . . , x_N) : x_j ∈ R}

is a real vector space with the usual operations of vector addition and scalar multiplication,

(x_1, x_2, . . . , x_N) + (y_1, y_2, . . . , y_N) = (x_1 + y_1, x_2 + y_2, . . . , x_N + y_N)

λ(x_1, x_2, . . . , x_N) = (λx_1, λx_2, . . . , λx_N)    λ ∈ R

If we allow the components x_j as well as the scalars λ to be complex, we obtain
instead the complex vector space C^N.
Example 3.2. If E ⊂ R^N, let

C(E) = {f : E → R : f is continuous at x for every x ∈ E}

denote the set of real valued continuous functions on E. Clearly C(E) is a real vector
space with the ordinary operations of function addition and scalar multiplication,

(f + g)(x) = f(x) + g(x)    (λf)(x) = λf(x)    λ ∈ R

If we allow the range space in the definition of C(E) to be C, then C(E) becomes a
complex vector space.

Spaces of differentiable functions likewise may be naturally regarded as vector spaces,
for example

C^m(E) = {f : D^α f ∈ C(E), |α| ≤ m}

and

C^∞(E) = {f : D^α f ∈ C(E) for all α}
Example 3.3. If 0 < p < ∞ and E is a measurable subset of R^N, the space L^p(E) is
defined to be the set of measurable functions f : E → R or f : E → C such that

∫_E |f(x)|^p dx < ∞    (3.1.1)

Here the integral is defined in the Lebesgue sense. Those unfamiliar with measure theory
and Lebesgue integration should consult a standard textbook such as [29],[27], or see a
brief summary in Appendix ( ).

It may then be shown that L^p(E) is a vector space for any 0 < p < ∞. To see this we
use the known fact that if f, g are measurable then so are f + g and λf for any scalar λ,
and the numerical inequality (a + b)^p ≤ C_p(a^p + b^p) for a, b ≥ 0, where C_p = max(2^{p−1}, 1),
to prove that f + g ∈ L^p(E) whenever f, g ∈ L^p(E). Verification of the remaining axioms
is routine.

The related vector space L^∞(E) is defined as the set of measurable functions f for
which

ess sup_{x∈E} |f(x)| < ∞    (3.1.2)

Here M = ess sup_{x∈E} |f(x)| if |f(x)| ≤ M a.e. and there is no smaller constant with this
property.
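The numerical inequality (a + b)^p ≤ C_p(a^p + b^p) used above is easy to spot-check over random samples; the following minimal sketch (sample sizes and ranges are arbitrary choices) does so for a few exponents:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 10.0, 1000)
b = rng.uniform(0.0, 10.0, 1000)

holds = {}
for p in (0.5, 1.0, 2.0, 3.5):
    Cp = max(2.0**(p - 1.0), 1.0)   # C_p = max(2^{p-1}, 1)
    # small slack absorbs floating point rounding
    holds[p] = bool(np.all((a + b)**p <= Cp * (a**p + b**p) + 1e-9))
print(holds)
```

For p ≥ 1 the inequality follows from convexity of t ↦ t^p; for 0 < p < 1 it holds with C_p = 1.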
Definition 3.2. If X is a vector space, a subset M ⊂ X is a subspace of X if

(i) x + y ∈ M whenever x, y ∈ M

(ii) λx ∈ M whenever x ∈ M and λ is a scalar

That is to say, a subspace is a subset of X which is closed under the formation of linear
combinations. Clearly a subspace of a vector space is itself a vector space.

Example 3.4. The subset M = {x ∈ R^N : x_j = 0} is a subspace of R^N for any fixed j.

Example 3.5. If E ⊂ R^N then C^∞(E) is a subspace of C^m(E) for any m, which in turn
is a subspace of C(E).
Example 3.6. If X is any vector space and S ⊂ X, then the set of all finite linear
combinations of elements of S,

L(S) := {x ∈ X : x = Σ_{j=1}^m λ_j x_j for some scalars λ_1, λ_2, . . . , λ_m and elements x_1, . . . , x_m ∈ S}

is a subspace of X. It is also called the span, or linear span, of S, or the subspace generated
by S.
Example 3.7. If in Example 3.6 we take X = C([a, b]) and f_j(x) = x^{j−1} for j = 1, 2, . . . ,
then the subspace generated by {f_j}_{j=1}^{N+1} is P_N, the vector space of polynomials of degree
less than or equal to N. Likewise, the subspace generated by {f_j}_{j=1}^∞ is P, the vector
space of all polynomials.
3.2
Linear independence and bases

Definition 3.3. We say that S ⊂ X is linearly independent if whenever x_1, . . . , x_m ∈ S,
λ_1, . . . , λ_m are scalars, and Σ_{j=1}^m λ_j x_j = 0, then λ_1 = λ_2 = . . . = λ_m = 0. Otherwise S is
linearly dependent.

Equivalently, S is linearly dependent if it is possible to express at least one of its
elements as a linear combination of the remaining ones. In particular, any set containing
the zero element is linearly dependent.
Definition 3.4. We say that S ⊂ X is a basis of X if for any x ∈ X there exist unique
scalars λ_1, λ_2, . . . , λ_m and elements x_1, . . . , x_m ∈ S such that x = Σ_{j=1}^m λ_j x_j.
The following characterization of a basis is then immediate:
Theorem 3.1. S ⊂ X is a basis of X if and only if S is linearly independent and
L(S) = X.
It is important to emphasize that in this definition of basis it is required that every
x ∈ X be expressible as a finite linear combination of the basis elements. This notion
of basis will be inadequate for later purposes, and will be replaced by one which allows
infinite sums, but this cannot be done until a meaning of convergence is available. The
notion of basis in Definition 3.4 is called a Hamel basis if a distinction is necessary.
Definition 3.5. We say that dim X, the dimension of X, is m if there exist m linearly
independent vectors in X but any collection of m + 1 elements of X is linearly dependent.
If there exist m linearly independent vectors for any positive integer m, then we say
dim X = ∞.
Proposition 3.1. The elements {x_1, x_2, . . . x_m} form a basis for L({x_1, x_2, . . . x_m}) if
and only if they are linearly independent.
Proposition 3.2. The dimension of X is the number of vectors in any basis of X.
The proofs of both of these Propositions are left for the exercises.
Example 3.8. R^N or C^N has dimension N. We will denote by e_j the standard unit
vector with a one in the j'th position and zeros elsewhere. Then {e_1, e_2, . . . e_N} is the
standard basis for either R^N or C^N.
Example 3.9. In the vector space C([a, b]) the elements f_j(t) = t^{j−1} are clearly linearly
independent, so that the dimension is ∞, as is the dimension of the subspace P. Also
evidently the subspace P_N has dimension N + 1.
Example 3.10. The set of solutions of the ordinary differential equation u'' + u = 0
is precisely the set of linear combinations u(t) = λ_1 sin t + λ_2 cos t. Since sin t, cos t are
linearly independent functions, they form a basis for this two dimensional space.
The following is interesting, although not of great practical significance. Its proof,
which is not obvious in the infinite dimensional case, relies on the Axiom of Choice and
will not be given here.
Theorem 3.2. Every vector space has a basis.
3.3 Linear transformations of a vector space
If X and Y are vector spaces, a mapping T : X → Y is called linear if

   T(λ_1 x_1 + λ_2 x_2) = λ_1 T(x_1) + λ_2 T(x_2)   (3.3.1)

for all x_1, x_2 ∈ X and all scalars λ_1, λ_2. Such a linear transformation is uniquely
determined on all of X by its action on any basis of X, i.e. if S = {x_α}_{α∈A} is a basis of X
and y_α = T(x_α), then for any x = Σ_{j=1}^m λ_j x_{α_j} we have T(x) = Σ_{j=1}^m λ_j y_{α_j}.
In the case that X and Y are both of finite dimension let us choose bases {x_1, x_2, . . . x_m},
{y_1, y_2, . . . y_n} of X, Y respectively. For 1 ≤ j ≤ m there must exist unique scalars a_{kj}
such that T(x_j) = Σ_{k=1}^n a_{kj} y_k, and it follows that

   x = Σ_{j=1}^m λ_j x_j  ⟹  T(x) = Σ_{k=1}^n μ_k y_k  where  μ_k = Σ_{j=1}^m a_{kj} λ_j   (3.3.2)

For a given basis {x_1, x_2, . . . x_m} of X, if x = Σ_{j=1}^m λ_j x_j we say that λ_1, λ_2, . . . λ_m are
the coordinates of x with respect to the given basis. The n × m matrix A = [a_{kj}] thus
maps the coordinates of x with respect to the basis {x_1, x_2, . . . x_m} to the coordinates of
T(x) with respect to the basis {y_1, y_2, . . . y_n}, and thus encodes all information about the
linear mapping T.
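To make (3.3.2) concrete, here is a minimal sketch (in Python with NumPy; the example is our own illustration, not from the text) computing the matrix of the differentiation map D : P_3 → P_3 of Exercise 9, with respect to the monomial basis {1, t, t², t³}.

```python
import numpy as np

# Matrix A = [a_kj] of the differentiation map D : P_3 -> P_3 with respect
# to the monomial basis {1, t, t^2, t^3}: since D(t^j) = j t^(j-1), column j
# of A holds the coordinates of the image of the j-th basis element.
n = 4
A = np.zeros((n, n))
for j in range(1, n):
    A[j - 1, j] = j

# p(t) = 1 + 2t + 3t^2 has coordinate vector (1, 2, 3, 0);
# A maps it to the coordinates of Dp = 2 + 6t.
p = np.array([1.0, 2.0, 3.0, 0.0])
Dp = A @ p
print(Dp)  # [2. 6. 0. 0.]
```

Note that A is singular (constants are in its null space), consistent with D failing to be an isomorphism of P_3.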
If T : X → Y is linear, one-to-one and onto then we say T is an isomorphism
between X and Y, and the vector spaces X and Y are isomorphic whenever there exists
an isomorphism between them. If T is such an isomorphism, and S is a basis of X, then it
is easy to check that the image set T(S) is a basis of Y. In particular, any two isomorphic
vector spaces have the same finite dimension or are both infinite dimensional.
For any linear mapping T : X → Y we define the kernel, or null space, of T as

   N(T) = {x ∈ X : T(x) = 0}   (3.3.3)

and the range of T as

   R(T) = {y ∈ Y : y = T(x) for some x ∈ X}   (3.3.4)

It is immediate that N(T) and R(T) are subspaces of X, Y respectively, and T is an
isomorphism precisely if N(T) = {0} and R(T) = Y. If X = Y = R^N or C^N, we learn
in linear algebra that these two conditions are equivalent.
3.4 Exercises
1. Using only the vector space axioms, show that the zero element in [V3] is unique.
2. Prove Propositions 3.1 and 3.2.
3. Show that the intersection of any family of subspaces of a vector space is also a
subspace. What about the union of subspaces?
4. Show that M_{m×n}, the set of m × n matrices, with the usual definitions of addition
and scalar multiplication, is a vector space of dimension mn. Show that the subset
of symmetric n × n matrices forms a subspace of M_{n×n}. What is its
dimension?
5. Under what conditions on a measurable set E ⊂ R^N and p ∈ (0, ∞] will it be true
that C(E) is a subspace of L^p(E)? Under what conditions is L^p(E) a subset of
L^q(E)?
6. Let u_j(t) = t^{λ_j} where λ_1, . . . λ_n are arbitrary unequal real numbers. Show that
{u_1, . . . u_n} are linearly independent functions on any interval (a, b) ⊂ R.
(Suggestion: if Σ_{j=1}^n α_j t^{λ_j} = 0, divide by t^{λ_1} and differentiate.)
7. A side condition for a differential equation is homogeneous if whenever two functions
satisfy the side condition then so does any linear combination of the two functions.
For example the Dirichlet type boundary condition u = 0 for x ∈ ∂Ω is homogeneous.
Now let Lu = Σ_{|α|≤m} a_α(x) D^α u denote any linear differential operator. Show that
the set of functions satisfying Lu = 0 and any homogeneous side conditions is a
vector space.
8. Consider the differential equation u'' + u = 0 on the interval (0, π). What is the
dimension of the vector space of solutions which satisfy the homogeneous boundary
conditions a) u(0) = u(π), and b) u(0) = u(π) = 0? Repeat the question if the
interval (0, π) is replaced by (0, 1) and (0, 2π).
9. Let Df = f' for any differentiable function f on R. For any N ≥ 0 show that
D : P_N → P_N is linear, and find its null space and range.
10. If X and Y are vector spaces, then the Cartesian product of X and Y is defined
as the set of ordered pairs

   X × Y = {(x, y) : x ∈ X, y ∈ Y}   (3.4.1)

Addition and scalar multiplication on X × Y are defined in the natural way,

   (x, y) + (x̂, ŷ) = (x + x̂, y + ŷ)   λ(x, y) = (λx, λy)   (3.4.2)

a) Show that X × Y is a vector space.
b) Show that R × R is isomorphic to R².
11. If X, Y are vector spaces of the same finite dimension, show X and Y are isomorphic.
12. Show that L^p(0, 1) and L^p(a, b) are isomorphic, for any a, b ∈ R and p ∈ (0, ∞].
Chapter 4
Metric spaces
4.1 Axioms of a metric space
A metric space is a set on which some natural notion of distance may be defined.
Definition 4.1. A metric space is a pair (X, d) where X is a set and d is a real valued
mapping on X × X, such that the following axioms hold.
[M1] d(x, y) ≥ 0 for all x, y ∈ X
[M2] d(x, y) = 0 if and only if x = y
[M3] d(x, y) = d(y, x) for all x, y ∈ X
[M4] d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.
Here d is the metric on X, i.e. d(x, y) is regarded as the distance from x to y. Axiom
[M4] is known as the triangle inequality. Although strictly speaking the metric space is
the pair (X, d), it is common practice to refer to X itself as being the metric space, with
the metric d understood from context. But as we will see in examples, it is often possible
to assign different metrics to the same set X.
If (X, d) is a metric space and Y ⊂ X, then it is clear that (Y, d) is also a metric
space, and in this case we say that Y inherits the metric of X.
Example 4.1. If X = R^N then there are many choices of d for which (R^N, d) is a metric
space. The most familiar is the ordinary Euclidean distance

   d(x, y) = ( Σ_{j=1}^N |x_j − y_j|² )^{1/2}   (4.1.1)

In general we may define

   d_p(x, y) = ( Σ_{j=1}^N |x_j − y_j|^p )^{1/p}   1 ≤ p < ∞   (4.1.2)

and

   d_∞(x, y) = max(|x_1 − y_1|, |x_2 − y_2|, . . . |x_N − y_N|)   (4.1.3)

The verification that (R^N, d_p) is a metric space for 1 ≤ p ≤ ∞ is left to the exercises
– the triangle inequality is the only nontrivial step. The same family of metrics may be
used with X = C^N.
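The metrics d_p and d_∞ are easy to experiment with numerically. The following sketch (Python with NumPy; the sample points are our own illustration) implements the whole family on R^N.

```python
import numpy as np

def dp(x, y, p):
    """The metric d_p on R^N: the p-th root of the sum of |x_j - y_j|^p,
    or the maximum coordinate difference when p is infinite."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if p == np.inf:
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(dp(x, y, 2))       # 5.0 (Euclidean distance)
print(dp(x, y, 1))       # 7.0
print(dp(x, y, np.inf))  # 4.0
```

One can check numerically that d_∞(x, y) ≤ d_p(x, y) ≤ N^{1/p} d_∞(x, y), so all these metrics give the same convergent sequences.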
Example 4.2. To assign a metric to C(E), more specific assumptions must be made
about E. If we assume, for example, that E is a closed and bounded¹ subset of R^N we
may set

   d_∞(f, g) = max_{x∈E} |f(x) − g(x)|   (4.1.4)

so that d_∞(f, g) is always finite by virtue of the well known theorem that a continuous
function achieves its maximum on a closed, bounded set. Other possibilities are

   d_p(f, g) = ( ∫_E |f(x) − g(x)|^p dx )^{1/p}   1 ≤ p < ∞   (4.1.5)

Note the analogy with the definition of d_p in the case of R^N or C^N.
For more arbitrary sets E there is in general no natural metric for C(E). For example,
if E is an open set, none of the metrics d_p can be used, since there is no reason why d_p(f, g)
should be finite for f, g ∈ C(E).
As in the case of vector spaces, some spaces of differentiable functions may also be
made into metric spaces. For this we will assume a bit more about E, namely that E is
¹ I.e. E is compact in R^N. Compactness is discussed in more detail below, and we avoid using the term until then.
the closure of a bounded open set O ⊂ R^N, and in this case will say that D^α f ∈ C(E) if
the function D^α f defined in the usual pointwise sense on O has a continuous extension
to E. We then can define

   C^m(E) = {f : D^α f ∈ C(E) whenever |α| ≤ m}   (4.1.6)

with metric

   d(f, g) = max_{|α|≤m} max_{x∈E} |D^α(f − g)(x)|   (4.1.7)

which may be easily checked to satisfy [M1]-[M4].
We cannot define a metric on C^∞(E) in the obvious way just by letting m → ∞
in the above definition, since there is no reason why the resulting maximum over m in
(4.1.7) will be finite, even if f ∈ C^m(E) for every m. See however Exercise 18.
Example 4.3. Recall that if E is a measurable subset of R^N, we have defined
corresponding vector spaces L^p(E) for 0 < p ≤ ∞. To endow them with metric space
structure let

   d_p(f, g) = ( ∫_E |f(x) − g(x)|^p dx )^{1/p}   (4.1.8)

for 1 ≤ p < ∞, and

   d_∞(f, g) = ess sup_{x∈E} |f(x) − g(x)|   (4.1.9)
The validity of axioms [M1] and [M3] is clear, and the triangle inequality [M4] is
an immediate consequence of the Minkowski inequality (18.1.10). But axiom [M2] does
not appear to be satisfied here, since, for example, two functions f, g agreeing except at
a single point, or more generally agreeing except on a set of measure zero, would have
d_p(f, g) = 0. It is necessary, therefore, to modify our point of view concerning L^p(E) as
follows. We define an equivalence relation f ∼ g if f = g almost everywhere, i.e. except
on a set of measure zero. If d_p(f, g) = 0 we would then be able to correctly conclude that
f ∼ g, in which case we will regard f and g as being the same element of L^p(E). Thus,
strictly speaking, L^p(E) is the set of equivalence classes of measurable functions, where
the equivalence classes are defined by means of the above equivalence relation.
The distance d_p([f], [g]) between two equivalence classes [f] and [g] may be
unambiguously determined by selecting a representative of each class and then evaluating
the distance from (4.1.8) or (4.1.9). Likewise the vector space structure of L^p(E) is
maintained since, for example, we can define the sum of equivalence classes [f] + [g] by
selecting a representative of each class and observing that if f_1 ∼ f_2 and g_1 ∼ g_2 then
f_1 + g_1 ∼ f_2 + g_2. It is rarely necessary to make a careful distinction between a measurable
function and the equivalence class it belongs to, and whenever it can cause no confusion
we will follow the common practice of referring to members of L^p(E) as functions rather
than equivalence classes. The notation f may be used to stand for either a function or its
equivalence class. An element f ∈ L^p(E) will be said to be continuous if its equivalence
class contains a continuous function, and in this way we can naturally regard C(E) as a
subset of L^p(E).
Although L^p(E) is a vector space for 0 < p ≤ ∞, we cannot use the above definition
of metric for 0 < p < 1, since it turns out the triangle inequality is not satisfied (see
Exercise 6 of Chapter 5) except in degenerate cases.
4.2 Topological concepts
In a metric space various concepts of point set topology may be introduced.
Definition 4.2. If (X, d) is a metric space then
1. B(x, ε) = {y ∈ X : d(x, y) < ε} is the ball centered at x of radius ε.
2. A set E ⊂ X is bounded if there exists some x ∈ X and R < ∞ such that
E ⊂ B(x, R).
3. If E ⊂ X, then a point x ∈ X is an interior point of E if there exists ε > 0 such
that B(x, ε) ⊂ E.
4. If E ⊂ X, then a point x ∈ X is a limit point of E if for any ε > 0 there exists a
point y ∈ B(x, ε) ∩ E, y ≠ x.
5. A subset E ⊂ X is open if every point of E is an interior point of E. By convention,
the empty set is open.
6. A subset E ⊂ X is closed if every limit point of E is in E.
7. The closure Ē of a set E ⊂ X is the union of E and the limit points of E.
8. The interior E° of a set E is the set of all interior points of E.
9. A subset E is dense in X if Ē = X.
10. X is separable if it contains a countable dense subset.
11. If E ⊂ X, we say that x ∈ X is a boundary point of E if for any ε > 0 the ball
B(x, ε) contains at least one point of E and at least one point of the complement
E^c = {x ∈ X : x ∉ E}. The boundary of E is denoted ∂E.
The following Proposition states a number of elementary but important properties.
Proofs are essentially the same as in the more familiar special case when the metric space
is a subset of RN , and will be left for the reader.
Proposition 4.1. Let (X, d) be a metric space. Then
1. B(x, ε) is open for any x ∈ X and ε > 0.
2. E ⊂ X is open if and only if its complement E^c is closed.
3. An arbitrary union or finite intersection of open sets is open.
4. An arbitrary intersection or finite union of closed sets is closed.
5. If E ⊂ X then E° is the union of all open sets contained in E, E° is open, and E
is open if and only if E = E°.
6. Ē is the intersection of all closed sets containing E, Ē is closed, and E is closed if
and only if E = Ē.
7. If E ⊂ X then ∂E = Ē \ E° = Ē ∩ (E^c)‾.
Next we study infinite sequences in X.
Definition 4.3. We say that a sequence {x_n}_{n=1}^∞ in X is convergent to x, that is,
lim_{n→∞} x_n = x, if for any ε > 0 there exists n_0 < ∞ such that d(x_n, x) < ε whenever n ≥ n_0.
Example 4.4. If X = R^N or C^N, and d is any one of the metrics d_p, then x_n → x if and
only if each component sequence converges to the corresponding limit, i.e. x_{j,n} → x_j as
n → ∞ in the ordinary sense of convergence in R or C. (Here x_{j,n} is the j'th component
of x_n.)
Example 4.5. In the metric space (C(E), d_∞) of Example 4.2, lim_{n→∞} f_n = f is
equivalent to the definition of uniform convergence on E.
Definition 4.4. We say that a sequence {x_n}_{n=1}^∞ in X is a Cauchy sequence if for any
ε > 0 there exists n_0 < ∞ such that d(x_n, x_m) < ε whenever n, m ≥ n_0.
It is easy to see that a convergent sequence is always a Cauchy sequence, but the
converse may be false.
Definition 4.5. A metric space X is said to be complete if every Cauchy sequence in X
is convergent in X.
Example 4.6. Completeness is one of the fundamental properties of the real numbers
R, see for example Chapter 1 of [28]. If a sequence {x_n}_{n=1}^∞ in R^N is Cauchy with respect
to any of the metrics d_p, then each component sequence {x_{j,n}}_{n=1}^∞ is a Cauchy sequence
in R, hence convergent in R. It then follows immediately that {x_n}_{n=1}^∞ is convergent in
R^N, again with any of the metrics d_p. The same conclusion holds for C^N, so that R^N, C^N
are complete metric spaces. These spaces are also separable since the subset consisting
of points with rational coordinates is countable and dense. A standard example of an
incomplete metric space is the set of rational numbers with the metric inherited from R.
Most metric spaces used in this book, and indeed most metric spaces used in applied
mathematics, are complete.
Proposition 4.2. If E ⊂ R^N is closed and bounded, then the metric space C(E) with
metric d = d_∞ is complete.
Proof: Let {f_n}_{n=1}^∞ be a Cauchy sequence in C(E). If ε > 0 we may then find n_0 such
that

   max_{x∈E} |f_n(x) − f_m(x)| < ε   (4.2.1)

whenever n, m ≥ n_0. In particular the sequence of numbers {f_n(x)}_{n=1}^∞ is Cauchy in R
or C for each fixed x ∈ E, so we may define f(x) := lim_{n→∞} f_n(x). Letting m → ∞ in
(4.2.1) we obtain

   |f_n(x) − f(x)| ≤ ε   for n ≥ n_0, x ∈ E   (4.2.2)

which means d(f_n, f) ≤ ε for n ≥ n_0. It remains to check that f ∈ C(E). If we pick
x ∈ E, then since f_{n_0} ∈ C(E) there exists δ > 0 such that |f_{n_0}(x) − f_{n_0}(y)| < ε if
|y − x| < δ. Thus for |y − x| < δ we have

   |f(x) − f(y)| ≤ |f(x) − f_{n_0}(x)| + |f_{n_0}(x) − f_{n_0}(y)| + |f_{n_0}(y) − f(y)| < 3ε   (4.2.3)

Since ε is arbitrary, f is continuous at x, and since x is arbitrary, f ∈ C(E). Thus we
have concluded that the Cauchy sequence {f_n}_{n=1}^∞ is convergent in C(E) to f ∈ C(E),
as needed. □
The final part of the above proof should be recognized as the standard proof of the
familiar fact that a uniform limit of continuous functions is continuous.
The spaces C^m(E) can likewise be shown, again assuming that E is closed and
bounded, to be complete metric spaces with the metric defined in (4.1.7), see Exercise 19.
If we were to choose the metric d_1 on C(E) then the resulting metric space is not
complete. Choose for example E = [−1, 1] and f_n(x) = x^{1/(2n+1)}, so that the pointwise
limit of f_n(x) is

   f(x) = 1 for x > 0,   f(x) = −1 for x < 0,   f(0) = 0   (4.2.4)

By a simple calculation

   ∫_{−1}^{1} |f_n(x) − f(x)| dx = 1/(n + 1)   (4.2.5)

so that {f_n}_{n=1}^∞ must be Cauchy in C(E) with metric d_1. On the other hand {f_n}_{n=1}^∞
cannot be convergent in this space, since the only possible limit is f, which does not
belong to C(E).
The same example can be modified to show that C(E) is not complete with any of
the metrics d_p for 1 ≤ p < ∞, and so d_∞ is in some sense the 'natural' metric. For this
reason C(E) will always be assumed to be supplied with the metric d_∞ unless otherwise
stated.
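The computation (4.2.5) can be checked numerically. The sketch below (Python with NumPy; the trapezoid-rule implementation is our own illustration) approximates the d_1 distance between f_n and the sign function and compares it with 1/(n + 1).

```python
import numpy as np

# Numerical check: for f_n(x) = x^(1/(2n+1)) on [-1, 1], with pointwise
# limit f = sign, the d_1 distance is 1/(n+1), which tends to 0 even
# though the limit f is discontinuous.
def fn(x, n):
    # odd root of x, written to handle negative x
    return np.sign(x) * np.abs(x) ** (1.0 / (2 * n + 1))

x = np.linspace(-1.0, 1.0, 200001)
for n in (1, 5, 20):
    integrand = np.abs(fn(x, n) - np.sign(x))
    # composite trapezoid rule for the integral over [-1, 1]
    approx = ((integrand[:-1] + integrand[1:]) / 2 * np.diff(x)).sum()
    print(n, approx, 1.0 / (n + 1))
```

The printed approximations agree with 1/(n + 1) up to small quadrature error.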
We next summarize in the form of a theorem some especially important facts about
the metric spaces Lp (E), which may be found in any standard textbook on Lebesgue
integration, for example Chapter 3 of [29] or Chapter 8 of [37].
Theorem 4.1. If E ⊂ R^N is measurable, then
1. L^p(E) is complete for 1 ≤ p ≤ ∞.
2. L^p(E) is separable for 1 ≤ p < ∞.
3. If C_c(E) is the set of continuous functions of bounded support, i.e.

   C_c(E) = {f ∈ C(E) : there exists R < ∞ such that f(x) ≡ 0 for |x| > R}   (4.2.6)

then C_c(E) is dense in L^p(E) for 1 ≤ p < ∞.
The completeness property is a significant result in measure theory, often known as
the Riesz-Fischer Theorem.
4.3 Functions on metric spaces and continuity
Next, suppose X, Y are two metric spaces with metrics dX , dY respectively.
Definition 4.6. Let T : X → Y be a mapping.
1. We say T is continuous at a point x ∈ X if for any ε > 0 there exists δ > 0 such
that d_Y(T(x), T(x̂)) ≤ ε whenever d_X(x, x̂) ≤ δ.
2. T is continuous on X if it is continuous at each point of X.
3. T is uniformly continuous on X if for any ε > 0 there exists δ > 0 such that
d_Y(T(x), T(x̂)) ≤ ε whenever d_X(x, x̂) ≤ δ, x, x̂ ∈ X.
4. T is Lipschitz continuous on X if there exists L such that

   d_Y(T(x), T(x̂)) ≤ L d_X(x, x̂)   for all x, x̂ ∈ X   (4.3.1)

The infimum of all L's which work in this definition is called the Lipschitz constant
of T.
Clearly we have the implications that T Lipschitz continuous implies T is uniformly
continuous, which in turn implies that T is continuous.
T is one-to-one, or injective, if T(x_1) = T(x_2) only if x_1 = x_2, and onto, or surjective,
if for every y ∈ Y there exists some x ∈ X such that T(x) = y. If T is both one-to-one
and onto then we say it is bijective, and in this case there must exist an inverse mapping
T^{−1} : Y → X.
For any mapping T : X → Y we define, for E ⊂ X and F ⊂ Y,

   T(E) = {y ∈ Y : y = T(x) for some x ∈ E}   (4.3.2)

the image of E in Y, and

   T^{−1}(F) = {x ∈ X : T(x) ∈ F}   (4.3.3)

the preimage of F in X. Note that T is not required to be bijective in order that the
preimage be defined.
The following theorem states two useful characterizations of continuity. Condition b)
is referred to as the sequential definition of continuity, for obvious reasons, while c) is
the topological definition, since it may be used to define continuity in much more general
topological spaces.
Theorem 4.2. Let X, Y be metric spaces and T : X → Y. Then the following are
equivalent:
a) T is continuous on X.
b) If x_n ∈ X and x_n → x, then T(x_n) → T(x).
c) If E is open in Y then T^{−1}(E) is open in X.
Proof: Assume T is continuous on X and let x_n → x in X. If ε > 0 then there exists
δ > 0 such that d_Y(T(x̂), T(x)) < ε if d_X(x̂, x) < δ. Choosing n_0 sufficiently large that
d_X(x_n, x) < δ for n ≥ n_0, we then must have d_Y(T(x_n), T(x)) < ε for n ≥ n_0, so that
T(x_n) → T(x). Thus a) implies b).
To see that b) implies c), suppose condition b) holds, E is open in Y and x ∈ T^{−1}(E).
We must show that there exists δ > 0 such that x̂ ∈ T^{−1}(E) whenever d_X(x̂, x) < δ. If not,
then there exists a sequence x_n → x such that x_n ∉ T^{−1}(E), and by b), T(x_n) → T(x).
Since y = T(x) ∈ E and E is open, there exists ε > 0 such that z ∈ E if d_Y(z, y) < ε.
Thus T(x_n) ∈ E for sufficiently large n, i.e. x_n ∈ T^{−1}(E), a contradiction.
Finally, suppose c) holds and fix x ∈ X. If ε > 0 then corresponding to the open set
E = B(T(x), ε) in Y there exists a ball B(x, δ) in X such that B(x, δ) ⊂ T^{−1}(E). But
this means precisely that if d_X(x̂, x) < δ then d_Y(T(x̂), T(x)) < ε, so that T is continuous
at x. □
4.4 Compactness and optimization
Another important topological concept is that of compactness.
Definition 4.7. If E ⊂ X then a collection of open sets {G_α}_{α∈A} is an open cover of E
if E ⊂ ∪_{α∈A} G_α.
Here A is the index set, and may be finite, countably or uncountably infinite.
Definition 4.8. K ⊂ X is compact if any open cover of K has a finite subcover. More
explicitly, K is compact if whenever K ⊂ ∪_{α∈A} G_α, where each G_α is open, there exists a
finite number of indices α_1, α_2, . . . α_m ∈ A such that K ⊂ ∪_{j=1}^m G_{α_j}. In addition, E ⊂ X
is precompact (or relatively compact) if Ē is compact.
Proposition 4.3. A compact set is closed and bounded. A closed subset of a compact
set is compact.
Proof: Suppose that K is compact and pick x ∈ K^c. For any r > 0 let G_r = {y ∈
X : d(x, y) > r}. It is easy to see that each G_r is open and K ⊂ ∪_{r>0} G_r. Thus there
exist r_1, r_2, . . . r_m such that K ⊂ ∪_{j=1}^m G_{r_j}, and so B(x, r) ⊂ K^c if r < min{r_1, r_2, . . . r_m}.
Thus K^c is open and so K is closed.
Obviously ∪_{r>0} B(x, r) is an open cover of K for any fixed x ∈ X. If K is compact
then there must exist r_1, r_2, . . . r_m such that K ⊂ ∪_{j=1}^m B(x, r_j), and so K ⊂ B(x, R) where
R = max{r_1, r_2, . . . r_m}. Thus K is bounded.
Now suppose that F ⊂ K where F is closed and K is compact. If {G_α}_{α∈A} is an
open cover of F then these sets together with the open set F^c are an open cover of K.
Hence there exist α_1, α_2, . . . α_m such that K ⊂ (∪_{j=1}^m G_{α_j}) ∪ F^c, from which we conclude
that F ⊂ ∪_{j=1}^m G_{α_j}. □
There will be frequent occasions for wanting to know if a certain set is compact, but
it is rare to use the above definition directly. A useful equivalent condition is that of
sequential compactness.
Definition 4.9. A set K ⊂ X is sequentially compact if any infinite sequence in K has
a subsequence convergent to a point of K.
Proposition 4.4. A set K ⊂ X is compact if and only if it is sequentially compact.
We will not prove this result here, but instead refer to Theorem 16, Section 9.5 of
[27] for details. It follows immediately that if E ⊂ X is precompact then any infinite
sequence in E has a convergent subsequence (the point being that the limit need not
belong to E).
We point out that the concepts of compactness and sequential compactness are
applicable in spaces even more general than metric spaces, and are not always equivalent
in such situations. In the case that X = R^N or C^N we have an even more explicit
characterization of compactness, the well known Heine-Borel Theorem, for which we
refer to [28] for a proof.
Theorem 4.3. E ⊂ R^N or E ⊂ C^N is compact if and only if it is closed and bounded.
While we know from Proposition 4.3 that a compact set is always closed and bounded,
the converse implication is definitely false in most function spaces we will be interested
in.
In later chapters a great deal of attention will be paid to optimization problems in
function spaces, that is, problems in the Calculus of Variations. A simple result along
these lines that we can prove already is:
Theorem 4.4. Let X be a compact metric space and f : X → R be continuous. Then
there exists x_0 ∈ X such that

   f(x_0) = max_{x∈X} f(x)   (4.4.1)

Proof: Let M = sup_{x∈X} f(x) (which may be +∞), so there exists a sequence {x_n}_{n=1}^∞
such that lim_{n→∞} f(x_n) = M. By sequential compactness there is a subsequence {x_{n_k}}
and x_0 ∈ X such that lim_{k→∞} x_{n_k} = x_0, and since f is continuous on X we must have
f(x_0) = lim_{k→∞} f(x_{n_k}) = M. Thus M < ∞ and (4.4.1) holds. □
A common notation expressing the same conclusion as (4.4.1) is

   x_0 ∈ argmax(f(x))²   (4.4.2)

which is also useful in making the distinction between the maximum value of a function
and the point(s) at which the maximum is achieved.
We emphasize here the distinction between maximum and supremum, which is an
essential point in later discussion of optimization. If E ⊂ R then M = sup E if
• x ≤ M for all x ∈ E
• if M′ < M there exists x ∈ E such that x > M′
Such a number M exists for any E ⊂ R if we allow the value M = +∞; by convention
M = −∞ if E is the empty set. On the other hand M = max E if
• x ≤ M for all x ∈ E
• M ∈ E
in which case evidently the maximum is finite and equal to the supremum.
² Even though argmax(f(x)) is in general a set of points, i.e. all points where f achieves its maximum value,
one will often see this written as x_0 = argmax(f(x)). Naturally we use the corresponding notation argmin for
points where the minimum of f is achieved.
If f : X → C is continuous on a compact metric space X, then we can apply Theorem
4.4 with f replaced by |f|, to obtain that there exists x_0 ∈ X such that |f(x)| ≤ |f(x_0)|
for all x ∈ X. We can then also conclude, as in Example 4.2 and Proposition 4.2:
Proposition 4.5. If X is a compact metric space, then

   C(X) = {f : X → C : f is continuous at x for every x ∈ X}   (4.4.3)

is a complete metric space with metric d(f, g) = max_{x∈X} |f(x) − g(x)|.
In general C(X), or even a bounded set in C(X), is not itself precompact. A useful
criterion for precompactness of a set of functions in C(X) is given by the Arzelà-Ascoli
theorem, which we review here; see e.g. [28] for a proof.
Definition 4.10. We say a family of real or complex valued functions F defined on a
metric space X is uniformly bounded if there exists a constant M such that

   |f(x)| ≤ M   whenever x ∈ X, f ∈ F   (4.4.4)

and equicontinuous if for every ε > 0 there exists δ > 0 such that

   |f(x) − f(y)| < ε   whenever x, y ∈ X, d(x, y) < δ, f ∈ F   (4.4.5)

We then have
Theorem 4.5. (Arzelà-Ascoli) If X is a compact metric space and F ⊂ C(X) is
uniformly bounded and equicontinuous, then F is precompact in C(X).
Example 4.7. Let

   F = {f ∈ C([0, 1]) : |f'(x)| ≤ M for all x ∈ (0, 1), f(0) = 0}   (4.4.6)

for some fixed M. Then for f ∈ F we have

   f(x) = ∫_0^x f'(s) ds   (4.4.7)

implying in particular that |f(x)| ≤ ∫_0^x M ds ≤ M. Also

   |f(x) − f(y)| = |∫_x^y f'(s) ds| ≤ M|x − y|   (4.4.8)

so that for any ε > 0, δ = ε/M works in the definition of equicontinuity. Thus by the
Arzelà-Ascoli theorem F is precompact in C([0, 1]).
If X is a compact subset of R^N then, since uniform convergence implies L^p convergence,
any set which is precompact in C(X) is also precompact in L^p(X). But more refined, i.e.
less restrictive, criteria for precompactness in L^p spaces are also known; see e.g. [5],
Section 4.5.
4.5 Contraction mapping theorem
One of the most important theorems about metric spaces, frequently used in applied
mathematics, is the Contraction Mapping Theorem, which concerns fixed points of a
mapping of X into itself.
Definition 4.11. A mapping T : X → X is a contraction on X if it is Lipschitz
continuous with Lipschitz constant ρ < 1, that is, there exists ρ ∈ [0, 1) such that

   d(T(x), T(x̂)) ≤ ρ d(x, x̂)   for all x, x̂ ∈ X   (4.5.1)

If ρ = 1 is allowed, we say T is nonexpansive.
Theorem 4.6. If T is a contraction on a complete metric space X then there exists a
unique x 2 X such that T (x) = x.
Proof: The uniqueness assertion is immediate: if T(x_1) = x_1 and T(x_2) = x_2
then d(x_1, x_2) = d(T(x_1), T(x_2)) ≤ ρ d(x_1, x_2). Since ρ < 1 we must have d(x_1, x_2) = 0,
so that x_1 = x_2.
To prove the existence of x, fix any point x_1 ∈ X and define

   x_{n+1} = T(x_n)   (4.5.2)

for n = 1, 2, . . . . We first show that {x_n}_{n=1}^∞ must be a Cauchy sequence.
Note that

   d(x_3, x_2) = d(T(x_2), T(x_1)) ≤ ρ d(x_2, x_1)   (4.5.3)

and by induction

   d(x_{n+1}, x_n) = d(T(x_n), T(x_{n−1})) ≤ ρ^{n−1} d(x_2, x_1)   (4.5.4)
Thus by the triangle inequality and the usual summation formula for a geometric series,
if m > n > 1,

   d(x_m, x_n) ≤ Σ_{j=n}^{m−1} d(x_{j+1}, x_j) ≤ Σ_{j=n}^{m−1} ρ^{j−1} d(x_2, x_1)   (4.5.5)
            = ρ^{n−1} (1 − ρ^{m−n})/(1 − ρ) d(x_2, x_1) ≤ ρ^{n−1}/(1 − ρ) d(x_2, x_1)   (4.5.6)

It follows immediately that {x_n}_{n=1}^∞ is a Cauchy sequence, and since X is complete there
exists x ∈ X such that lim_{n→∞} x_n = x. Since T is continuous, T(x_n) → T(x) as n → ∞,
and so x = T(x) must hold. □
The point x in the Contraction Mapping Theorem which satisfies T(x) = x is called
a fixed point of T, and the process (4.5.2) of generating the sequence {x_n}_{n=1}^∞ is called
fixed point iteration. Not only does the theorem show that T possesses a unique fixed
point under the stated hypotheses, but the proof shows that the fixed point may be
obtained by fixed point iteration starting from an arbitrary point of X.
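The iteration (4.5.2) is easy to run numerically. The following sketch (Python; the choice T = cos is our own illustration, not from the text) shows a generic fixed point iteration on X = R.

```python
import math

def fixed_point(T, x, tol=1e-12, max_iter=1000):
    """Fixed point iteration x_{n+1} = T(x_n); converges whenever T is a
    contraction on a complete metric space containing the iterates."""
    for _ in range(max_iter):
        x_new = T(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")

# T(x) = cos(x) is a contraction on [0, 1] since |T'(x)| <= sin(1) < 1 and
# T maps [0, 1] into itself, so the iteration converges to the unique
# solution of x = cos(x), approximately 0.739085.
x = fixed_point(math.cos, 0.0)
print(round(x, 6))  # 0.739085
```

Per (4.5.6), the error decreases geometrically with ratio ρ = sin(1) ≈ 0.84, which matches the observed convergence rate.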
As a simple application of the theorem, consider a second kind integral equation

   u(x) + ∫_Ω K(x, y) u(y) dy = f(x)   (4.5.7)

with Ω ⊂ R^N a bounded open set, a kernel function K = K(x, y) defined and continuous
for (x, y) ∈ Ω̄ × Ω̄, and f ∈ C(Ω̄). We can then define a mapping T on X = C(Ω̄) by

   T(u)(x) = −∫_Ω K(x, y) u(y) dy + f(x)   (4.5.8)

so that (4.5.7) is equivalent to the fixed point problem u = T(u) in X. Since K is
uniformly continuous on Ω̄ × Ω̄ it is immediate that T(u) ∈ X whenever u ∈ X, and by
elementary estimates we have

   d(T(u), T(v)) = max_{x∈Ω̄} |T(u)(x) − T(v)(x)| = max_{x∈Ω̄} |∫_Ω K(x, y)(u(y) − v(y)) dy| ≤ L d(u, v)   (4.5.9)

where L := max_{x∈Ω̄} ∫_Ω |K(x, y)| dy. We therefore may conclude from the Contraction
Mapping Theorem the following:
Proposition 4.6. If

   max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < 1   (4.5.10)

then (4.5.7) has a unique solution for every f ∈ C(Ω̄).
The condition (4.5.10) will be satisfied if either the maximum of |K| is small enough
or the size of the domain Ω is small enough. Eventually we will see that some such
smallness condition is necessary for unique solvability of (4.5.7), but the exact conditions
will be sharpened considerably.
If we consider instead the family of second kind integral equations

   λu(x) + ∫_Ω K(x, y) u(y) dy = f(x)   (4.5.11)

with the same conditions on K and f, then the above argument shows unique solvability
for all sufficiently large λ, namely provided

   max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < |λ|   (4.5.12)
As a second example, consider the initial value problem for a first order ODE
$$\frac{du}{dt} = f(t, u) \qquad u(t_0) = u_0 \qquad (4.5.13)$$
where we assume at least that $f$ is continuous on $[a, b] \times \mathbb{R}$ with $t_0 \in (a, b)$. If a classical solution $u$ exists, then integrating both sides of the ODE from $t_0$ to $t$, and taking account of the initial condition, we obtain
$$u(t) = u_0 + \int_{t_0}^t f(s, u(s))\,ds \qquad (4.5.14)$$
Conversely, if $u \in C([a, b])$ and satisfies (4.5.14), then necessarily $u'$ exists, is also continuous, and (4.5.13) holds. Thus the problem of solving (4.5.13) is seen to be equivalent to that of finding a continuous solution of (4.5.14). In turn this can be viewed as the problem of finding a fixed point of the nonlinear mapping $T : C([a, b]) \to C([a, b])$ defined by
$$T(u)(t) = u_0 + \int_{t_0}^t f(s, u(s))\,ds \qquad (4.5.15)$$
Now if we assume that $f$ satisfies the Lipschitz condition with respect to $u$,
$$|f(t, u) - f(t, v)| \le L|u - v| \qquad u, v \in \mathbb{R},\ t \in [a, b] \qquad (4.5.16)$$
then
$$|T(u)(t) - T(v)(t)| \le L \left| \int_{t_0}^t |u(s) - v(s)|\,ds \right| \le L|b - a| \max_{a \le t \le b} |u(t) - v(t)| \qquad (4.5.17)$$
or
$$d(T(u), T(v)) \le L|b - a|\,d(u, v) \qquad (4.5.18)$$
where $d$ is again the usual metric on $C([a, b])$. Thus the contraction mapping theorem provides a unique local solution, that is, on any interval $[a, b]$ containing $t_0$ for which $(b - a) < 1/L$.

Instead of the requirement that the Lipschitz condition (4.5.16) be valid on the entire infinite strip $[a, b] \times \mathbb{R}$, it is actually only necessary to assume that it holds on $[a, b] \times [c, d]$ where $u_0 \in (c, d)$. Also, first order systems of ODEs (and thus scalar higher order equations) can be handled in essentially the same manner. Such generalizations may be found in standard ODE textbooks, e.g. Chapter 1 of [CL] or Chapter 3 of [BN].
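The fixed point iteration for (4.5.15) is the classical Picard iteration. The following sketch is our own illustration: it applies $T$ repeatedly to the model problem $u' = u$, $u(0) = 1$ on $[0, 1/2]$, where $L(b - a) = 1/2 < 1$, with the integral approximated by an (assumed) trapezoid rule; the iterates converge to the exact solution $e^t$.

```python
import math

# Picard iteration u_{k+1} = T(u_k) for u' = u, u(0) = 1 on [0, 1/2],
# where L*(b - a) = 1/2 < 1 so T is a contraction.  The integral in
# (4.5.15) is approximated by the trapezoid rule (our own choice).

def picard(f, u0, a, b, n=500, iters=40):
    h = (b - a) / n
    ts = [a + i * h for i in range(n + 1)]
    u = [u0] * (n + 1)                      # initial guess: constant u_0
    for _ in range(iters):
        g = [f(t, v) for t, v in zip(ts, u)]
        new_u = [u0]
        integral = 0.0
        for i in range(1, n + 1):
            integral += 0.5 * h * (g[i - 1] + g[i])
            new_u.append(u0 + integral)
        u = new_u
    return ts, u

ts, u = picard(lambda t, v: v, 1.0, 0.0, 0.5)
err = abs(u[-1] - math.exp(0.5))            # compare with exact e^{1/2}
```

After 40 iterations the iteration error is negligible, and what remains is only the $O(h^2)$ quadrature error of the trapezoid rule.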
We conclude with a useful variant of the contraction mapping theorem. If $T : X \to X$ then we can define the (composition) powers of $T$ by $T^2(x) = T(T(x))$, $T^3(x) = T(T^2(x))$, etc. Thus $T^n : X \to X$ for $n = 1, 2, 3, \dots$

Theorem 4.7. If there exists a positive integer $n$ such that $T^n$ is a contraction on a complete metric space $X$, then there exists a unique $x \in X$ such that $T(x) = x$.

Proof: By Theorem 4.6 there exists a unique $x \in X$ such that $T^n(x) = x$. Applying $T$ to both sides gives $T^n(T(x)) = T^{n+1}(x) = T(x)$, so that $T(x)$ is also a fixed point of $T^n$. By uniqueness, $T(x) = x$, i.e. $T$ has at least one fixed point. To see that the fixed point of $T$ is unique, observe that any fixed point of $T$ is also a fixed point of $T^2, T^3, \dots$. In particular, if $T$ had two distinct fixed points then so would $T^n$, which is a contradiction. □
4.6 Exercises

1. Verify that $d_p$ defined in Example 4.1 is a metric on $\mathbb{R}^N$ or $\mathbb{C}^N$. (Suggestion: to prove the triangle inequality, use the finite dimensional version of the Minkowski inequality (18.1.15).)

2. If $(X, d_X)$, $(Y, d_Y)$ are metric spaces, show that the Cartesian product
$$Z = X \times Y = \{(x, y) : x \in X,\ y \in Y\}$$
is a metric space with distance function
$$d((x_1, y_1), (x_2, y_2)) = d_X(x_1, x_2) + d_Y(y_1, y_2)$$

3. Is $d(x, y) = |x - y|^2$ a metric on $\mathbb{R}$? What about $d(x, y) = \sqrt{|x - y|}$? Find reasonable conditions on a function $\phi : [0, \infty) \to [0, \infty)$ such that $d(x, y) = \phi(|x - y|)$ is a metric on $\mathbb{R}$.

4. Prove that a closed subset of a compact set in a metric space is also compact.

5. Let $(X, d)$ be a metric space, $A \subset X$ be nonempty, and define the distance from a point $x$ to the set $A$ to be
$$d(x, A) = \inf_{y \in A} d(x, y)$$
a) Show that $|d(x, A) - d(y, A)| \le d(x, y)$ for $x, y \in X$ (i.e. $x \mapsto d(x, A)$ is nonexpansive).
b) Assume $A$ is closed. Show that $d(x, A) = 0$ if and only if $x \in A$.
c) Assume $A$ is compact. Show that for any $x \in X$ there exists $z \in A$ such that $d(x, A) = d(x, z)$.

6. Suppose that $F$ is closed and $G$ is open in a metric space $(X, d)$ and $F \subset G$. Show that there exists a continuous function $f : X \to \mathbb{R}$ such that
i) $0 \le f(x) \le 1$ for all $x \in X$.
ii) $f(x) = 1$ for $x \in F$.
iii) $f(x) = 0$ for $x \in G^c$.
Hint: Consider
$$f(x) = \frac{d(x, G^c)}{d(x, G^c) + d(x, F)}$$

7. Two metrics $d, \hat{d}$ on a set $X$ are said to be equivalent if there exist constants $0 < C_* < C^* < \infty$ such that
$$C_* \le \frac{d(x, y)}{\hat{d}(x, y)} \le C^* \qquad \text{for all } x, y \in X,\ x \ne y$$
a) If $d, \hat{d}$ are equivalent, show that a sequence $\{x_k\}_{k=1}^\infty$ is convergent in $(X, d)$ if and only if it is convergent in $(X, \hat{d})$.
b) Show that any two of the metrics $d_p$ on $\mathbb{R}^n$ are equivalent.
8. Prove that $C([a, b])$ is separable (you may quote the Weierstrass approximation theorem) but $L^\infty(a, b)$ is not separable.

9. If $X, Y$ are metric spaces, $f : X \to Y$ is continuous and $K$ is compact in $X$, show that the image $f(K)$ is compact in $Y$.

10. Let
$$F = \left\{ f \in C([0, 1]) : |f(x) - f(y)| \le |x - y| \text{ for all } x, y, \quad \int_0^1 f(x)\,dx = 0 \right\}$$
Show that $F$ is compact in $C([0, 1])$. (Suggestion: to prove that $F$ is uniformly bounded, justify and use the fact that if $f \in F$ then $f(x) = 0$ for some $x \in [0, 1]$.)

11. Show that the set $F$ in Example 4.7 is not closed.

12. From the proof of the contraction mapping theorem it is clear that the smaller $\rho$ is, the faster the sequence $x_n$ converges to the fixed point $x$. With this in mind, explain why Newton's method
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
is in general a very rapidly convergent method for approximating roots of $f : \mathbb{R} \to \mathbb{R}$, as long as the initial guess is close enough.

13. Let $f_n(x) = \sin^n x$ for $n = 1, 2, \dots$
a) Is the sequence $\{f_n\}_{n=1}^\infty$ convergent in $C([0, \pi])$?
b) Is the sequence convergent in $L^2(0, \pi)$?
c) Is the sequence compact or precompact in either of these spaces?

14. Let $X$ be a complete metric space and $T : X \to X$ satisfy $d(T(x), T(y)) < d(x, y)$ for all $x, y \in X$, $x \ne y$. Show that $T$ can have at most one fixed point, but may have none. (Suggestion: for an example of non-existence look at $T(x) = \sqrt{x^2 + 1}$ on $\mathbb{R}$.)

15. Let $S$ denote the linear Volterra type integral operator
$$Su(x) = \int_a^x K(x, y)u(y)\,dy$$
where the kernel $K$ is continuous and satisfies $|K(x, y)| \le M$ for $a \le y \le x$.
a) Show that
$$|S^n u(x)| \le \frac{M^n (x - a)^n}{n!} \max_{a \le y \le x} |u(y)| \qquad x > a,\ n = 1, 2, \dots$$
b) Deduce from this that for any $b > a$, there exists an integer $n$ such that $S^n$ is a contraction on $C([a, b])$.
c) Show that for any $f \in C([a, b])$ the second kind Volterra integral equation
$$\int_a^x K(x, y)u(y)\,dy = u(x) + f(x) \qquad a < x < b$$
has a unique solution $u \in C([a, b])$.

16. Show that for sufficiently small $|\lambda|$ there exists a unique solution of the boundary value problem
$$u'' + \lambda u = f(x) \quad 0 < x < 1 \qquad u(0) = u(1) = 0$$
for any $f \in C([0, 1])$. (Suggestion: use the result of Chapter 2, Exercise 7 to transform the boundary value problem into a fixed point problem for an integral operator, then apply the Contraction Mapping Theorem.) Be as precise as you can about which values of $\lambda$ are allowed.

17. Let $f = f(x, y)$ be continuously differentiable on $[0, 1] \times \mathbb{R}$ and satisfy
$$0 < m \le \frac{\partial f}{\partial y}(x, y) \le M$$
Show that there exists a unique continuous function $\phi(x)$ such that
$$f(x, \phi(x)) = 0 \qquad 0 < x < 1$$
(Suggestion: Define the transformation
$$(T\phi)(x) = \phi(x) - \lambda f(x, \phi(x))$$
and show that $T$ is a contraction on $C([0, 1])$ for some choice of $\lambda$. This is a special case of the implicit function theorem.)
18. Show that if we let
$$d(f, g) = \sum_{n=0}^\infty 2^{-n} \frac{e_n}{1 + e_n} \qquad \text{where} \quad e_n = \max_{x \in [a,b]} |f^{(n)}(x) - g^{(n)}(x)|$$
then $(C^\infty([a, b]), d)$ is a metric space, in which $f_k \to f$ if and only if $f_k^{(n)} \to f^{(n)}$ uniformly on $[a, b]$ for $n = 0, 1, \dots$

19. If $E \subset \mathbb{R}^N$ is closed and bounded, show that $C^1(E)$ is a complete metric space with the metric defined by (4.1.7).
Chapter 5

Normed linear spaces and Banach spaces

5.1 Axioms of a normed linear space
Definition 5.1. A vector space $X$ is said to be a normed linear space if for every $x \in X$ there is defined a nonnegative real number $\|x\|$, the norm of $x$, such that the following axioms hold.

[N1] $\|x\| = 0$ if and only if $x = 0$
[N2] $\|\lambda x\| = |\lambda|\,\|x\|$ for any $x \in X$ and any scalar $\lambda$.
[N3] $\|x + y\| \le \|x\| + \|y\|$ for any $x, y \in X$.

As in the case of a metric space it is technically the pair $(X, \|\cdot\|)$ which constitutes a normed linear space, but the definition of the norm will usually be clear from the context. If two different normed spaces are needed we will use a notation such as $\|x\|_X$ to indicate the space in which the norm is calculated.

Example 5.1. In the vector space $X = \mathbb{R}^N$ or $\mathbb{C}^N$ we can define the family of norms
$$\|x\|_p = \left( \sum_{j=1}^N |x_j|^p \right)^{1/p} \quad 1 \le p < \infty \qquad \|x\|_\infty = \max_{1 \le j \le N} |x_j| \qquad (5.1.1)$$
Axioms [N1] and [N2] are obvious, while axiom [N3] amounts to the Minkowski inequality (18.1.15).
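As a quick numerical sanity check (our own illustration, not part of the notes), the norms (5.1.1) and the triangle inequality [N3] can be verified on sample vectors:

```python
# Check of the norms (5.1.1) and the triangle inequality [N3] on a pair
# of sample vectors in R^4 (an illustration with vectors of our choosing).

def norm_p(x, p):
    if p == float('inf'):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = [1.0, -2.0, 3.0, 0.5]
y = [-4.0, 1.0, 0.0, 2.5]
s = [a + b for a, b in zip(x, y)]

triangle_ok = all(norm_p(s, p) <= norm_p(x, p) + norm_p(y, p) + 1e-12
                  for p in (1, 1.5, 2, 3, float('inf')))
# For a fixed vector the p-norms decrease as p increases:
monotone_ok = norm_p(x, float('inf')) <= norm_p(x, 2) <= norm_p(x, 1)
```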
We obviously have $d_p(x, y) = \|x - y\|_p$ here, and this correspondence between norm and metric is a special case of the following general fact, namely that a norm always gives rise to a metric; the proof is immediate from the definitions involved.

Proposition 5.1. Let $(X, \|\cdot\|)$ be a normed linear space. If we set $d(x, y) = \|x - y\|$ for $x, y \in X$ then $(X, d)$ is a metric space.

Example 5.2. If $E \subset \mathbb{R}^N$ is closed and bounded then it is easy to verify that
$$\|f\| = \max_{x \in E} |f(x)| \qquad (5.1.2)$$
defines a norm on $C(E)$, and the usual metric (4.1.4) on $C(E)$ amounts to $d(f, g) = \|f - g\|$. Likewise, the metrics (4.1.8), (4.1.9) on $L^p(E)$ may be viewed as coming from the corresponding $L^p$ norms,
$$\|f\|_{L^p(E)} = \begin{cases} \left( \int_E |f(x)|^p\,dx \right)^{1/p} & 1 \le p < \infty \\ \operatorname{ess\,sup}_{x \in E} |f(x)| & p = \infty \end{cases} \qquad (5.1.3)$$
Note that for such a metric we must have $d(\lambda x, \lambda y) = |\lambda|\,d(x, y)$, so that if this property does not hold, the metric cannot arise from a norm in this way. For example,
$$d(x, y) = \frac{|x - y|}{1 + |x - y|} \qquad (5.1.4)$$
is a metric on $\mathbb{R}$ which does not come from a norm.

Since any normed linear space may now be regarded as a metric space, all of the topological concepts defined for a metric space are meaningful in a normed linear space. Completeness holds in many situations of interest, so we have a special designation in that case.

Definition 5.2. A Banach space is a complete normed linear space.

Example 5.3. The spaces $\mathbb{R}^N$, $\mathbb{C}^N$ are vector spaces which are also complete metric spaces with any of the norms $\|\cdot\|_p$, hence they are Banach spaces. Similarly $C(E)$, $L^p(E)$ are Banach spaces with the norms indicated above.
Here are a few simple results we can prove already.
Proposition 5.2. If $X$ is a normed linear space then the norm is a continuous function on $X$. If $E \subset X$ is compact and $y \in X$ then there exists $x_0 \in E$ such that
$$\|y - x_0\| = \min_{x \in E} \|y - x\| \qquad (5.1.5)$$
Proof: From the triangle inequality we get $|\,\|x_1\| - \|x_2\|\,| \le \|x_1 - x_2\|$, so that $f(x) = \|x\|$ is Lipschitz continuous (with Lipschitz constant 1) on $X$. Similarly $f(x) = \|x - y\|$ is also continuous for any fixed $y$, so we may apply Theorem 4.4, with $X$ replaced by the compact metric space $E$ and $f(x) = \|x - y\|$, to get the second conclusion. □
Another topological point of interest is the following.

Theorem 5.1. If $M$ is a subspace of a normed linear space $X$, and $\dim M < \infty$, then $M$ is closed.

Proof: The proof is by induction on the number of dimensions. Let $\dim(M) = 1$, so that $M = \{u = \lambda e : \lambda \in \mathbb{C}\}$ for some $e \in X$, $\|e\| = 1$. If $u_n \in M$ then $u_n = \lambda_n e$ for some $\lambda_n \in \mathbb{C}$, and $u_n \to u$ in $X$ implies, since $\|u_n - u_m\| = |\lambda_n - \lambda_m|$, that $\{\lambda_n\}$ is a Cauchy sequence in $\mathbb{C}$. Thus there exists $\lambda \in \mathbb{C}$ such that $\lambda_n \to \lambda$, so that $u_n \to u = \lambda e \in M$, as needed.

Now suppose we know that all $N$ dimensional subspaces are closed and $\dim M = N + 1$, so we can find $e_1, \dots, e_{N+1}$ linearly independent unit vectors such that $M = L(e_1, \dots, e_{N+1})$. Let $\tilde{M} = L(e_1, \dots, e_N)$, which is closed by the induction assumption. If $u_n \in M$ there exist $\lambda_n \in \mathbb{C}$ and $v_n \in \tilde{M}$ such that $u_n = v_n + \lambda_n e_{N+1}$. Suppose that $u_n \to u$ in $X$. We claim first that $\{\lambda_n\}$ is bounded in $\mathbb{C}$. If not, there must exist $n_k$ such that $|\lambda_{n_k}| \to \infty$, and since $u_n$ remains bounded in $X$ we get $u_{n_k}/\lambda_{n_k} \to 0$. It follows that
$$e_{N+1} - \frac{u_{n_k}}{\lambda_{n_k}} = -\frac{v_{n_k}}{\lambda_{n_k}} \in \tilde{M} \qquad (5.1.6)$$
Since $\tilde{M}$ is closed, it would follow, upon letting $n_k \to \infty$, that $e_{N+1} \in \tilde{M}$, which is impossible.

Thus $\{\lambda_n\}$ is bounded, and we can find a subsequence $\lambda_{n_k} \to \lambda$ for some $\lambda \in \mathbb{C}$, so that
$$v_{n_k} = u_{n_k} - \lambda_{n_k} e_{N+1} \to u - \lambda e_{N+1} \qquad (5.1.7)$$
Again since $\tilde{M}$ is closed it follows that $u - \lambda e_{N+1} \in \tilde{M}$, so that $u \in M$ as needed. □

For an alternative proof, see for example Theorem 1.21 of [30]. For an infinite dimensional subspace this is false in general. For example, the Weierstrass approximation theorem states that if $f \in C([a, b])$ and $\epsilon > 0$ there exists a polynomial $p$ such that $|p(x) - f(x)| \le \epsilon$ on $[a, b]$. Thus if we take $X = C([a, b])$ and $E$ to be the set of all polynomials on $[a, b]$, then clearly $E$ is a subspace of $X$ and every point of $X$ is a limit point of $E$. Thus $E$ cannot be closed, since otherwise $E$ would be equal to all of $X$.

Recall that when $\bar{E} = X$ as in this example, we say that $E$ is a dense subspace of $X$. Such subspaces play an important role in functional analysis. According to Theorem 5.1 a finite dimensional Banach space $X$ has no dense subspace aside from $X$ itself.
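The density of polynomials can be made concrete with Bernstein polynomials, which furnish the standard constructive proof of the Weierstrass theorem. This is not covered in the notes; the sketch below is our own illustration.

```python
import math

# Bernstein polynomials B_n f(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)
# converge uniformly to f on [0, 1]: the classical constructive proof of
# the Weierstrass approximation theorem.

def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)          # continuous but not differentiable
grid = [i / 200 for i in range(201)]
sup_err = {n: max(abs(bernstein(f, n, x) - f(x)) for x in grid)
           for n in (4, 16, 64)}    # sup-norm error shrinks as n grows
```

The sup-norm distance from $f$ to the polynomial subspace can thus be made as small as we like, which is exactly the density statement above.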
5.2 Infinite series

In a normed linear space we can study limits of sums, i.e. infinite series.

Definition 5.3. We say $\sum_{j=1}^\infty x_j$ is convergent in $X$ to the limit $s$ if $\lim_{n\to\infty} s_n = s$, where $s_n = \sum_{j=1}^n x_j$ is the $n$'th partial sum of the series.

A useful criterion for convergence can then be given, provided the space is also complete.

Proposition 5.3. If $X$ is a Banach space, $x_n \in X$ for $n = 1, 2, \dots$ and $\sum_{n=1}^\infty \|x_n\| < \infty$, then $\sum_{n=1}^\infty x_n$ is convergent to an element $s \in X$ with $\|s\| \le \sum_{n=1}^\infty \|x_n\|$.

Proof: If $m > n$ we have $\|s_m - s_n\| = \|\sum_{j=n+1}^m x_j\| \le \sum_{j=n+1}^m \|x_j\|$ by the triangle inequality. Since $\sum_{j=1}^\infty \|x_j\|$ is convergent, its partial sums form a Cauchy sequence in $\mathbb{R}$, and hence $\{s_n\}$ is also Cauchy. Since the space is complete, $s = \lim_{n\to\infty} s_n$ exists. We also have $\|s_n\| \le \sum_{j=1}^n \|x_j\|$ for any fixed $n$, and $\|s_n\| \to \|s\|$ by Proposition 5.2, so $\|s\| \le \sum_{j=1}^\infty \|x_j\|$ must hold. □

The concepts of linear combination, linear independence and basis may now be extended to allow for infinite sums in an obvious way. We say a countably infinite set of vectors $\{x_n\}_{n=1}^\infty$ is linearly independent if
$$\sum_{n=1}^\infty \lambda_n x_n = 0 \text{ if and only if } \lambda_n = 0 \text{ for all } n \qquad (5.2.1)$$
and $x \in L(\{x_n\}_{n=1}^\infty)$, the span of $\{x_n\}_{n=1}^\infty$, provided $x = \sum_{n=1}^\infty \lambda_n x_n$ for some scalars $\{\lambda_n\}_{n=1}^\infty$. A basis of $X$ is then a linearly independent spanning set, or equivalently $\{x_n\}_{n=1}^\infty$ is a basis of $X$ if for any $x \in X$ there exist unique scalars $\{\lambda_n\}_{n=1}^\infty$ such that $x = \sum_{n=1}^\infty \lambda_n x_n$.

We emphasize that this definition of basis is not the same as that given in Definition 3.4 for a basis of a vector space, the difference being that the sum there is required to always be finite. The term Schauder basis is sometimes used for the above definition if the distinction needs to be made. Throughout the remainder of these notes, the term basis will always mean Schauder basis unless otherwise stated.

A Banach space $X$ which contains a Schauder basis $\{x_n\}_{n=1}^\infty$ is always separable, since then the set of all finite linear combinations of the $x_n$'s with rational coefficients is easily seen to be countable and dense. It is known that not every separable Banach space has a Schauder basis (recall there must always exist a Hamel basis); see for example Section 1.1 of [38].
5.3 Linear operators and functionals

We have previously defined what it means for a mapping $T : X \to Y$ between vector spaces to be linear. When the spaces $X, Y$ are normed linear spaces we usually refer to such a mapping $T$ as a linear operator. We say that $T$ is bounded if there exists a finite constant $C$ such that $\|Tx\| \le C\|x\|$ for every $x \in X$, and we may then define the norm of $T$ as the smallest such $C$, or equivalently
$$\|T\| = \sup_{x \ne 0} \frac{\|Tx\|}{\|x\|} \qquad (5.3.1)$$
The condition $\|T\| < \infty$ is equivalent to continuity of $T$.

Proposition 5.4. If $X, Y$ are normed linear spaces and $T : X \to Y$ is linear then the following three conditions are equivalent.
a) $T$ is bounded
b) $T$ is continuous
c) There exists $x_0 \in X$ such that $T$ is continuous at $x_0$.

Proof: If $x_0, x \in X$ then
$$\|T(x) - T(x_0)\| = \|T(x - x_0)\| \le \|T\|\,\|x - x_0\| \qquad (5.3.2)$$
Thus if $T$ is bounded then it is (Lipschitz) continuous at any point of $X$. The implication that b) implies c) is trivial. Finally suppose $T$ is continuous at $x_0 \in X$. For any $\epsilon > 0$ there must exist $\delta > 0$ such that $\|T(z - x_0)\| = \|T(z) - T(x_0)\| \le \epsilon$ if $\|z - x_0\| \le \delta$. For any $x \ne 0$, choose $z = x_0 + \delta x / \|x\|$ to get
$$\left\| T\left( \frac{\delta x}{\|x\|} \right) \right\| \le \epsilon \qquad (5.3.3)$$
or equivalently, using the linearity of $T$, $\|Tx\| \le C\|x\|$ with $C = \epsilon/\delta$. Thus $T$ is bounded. □

A continuous linear operator is therefore the same as a bounded linear operator, and the two terms are used interchangeably. When the range space $Y$ is the scalar field $\mathbb{R}$ or $\mathbb{C}$ we call $T$ a linear functional instead of a linear operator, and correspondingly a bounded (or continuous) linear functional if $|Tx| \le C\|x\|$ for some finite constant $C$.

We introduce the notation
$$B(X, Y) = \{T : X \to Y : T \text{ is linear and bounded}\} \qquad (5.3.4)$$
and the special cases
$$B(X) = B(X, X) \qquad X' = B(X, \mathbb{C}) \qquad (5.3.5)$$

Examples of linear operators and functionals will be studied much more extensively later. For now we just give two simple examples.

Example 5.4. If $X = \mathbb{R}^N$, $Y = \mathbb{R}^M$ and $A$ is an $M \times N$ real matrix with entries $a_{kj}$, then $y_k = \sum_{j=1}^N a_{kj} x_j$ defines a linear mapping $T$, and according to the discussion of Section 3.3 any linear mapping of $\mathbb{R}^N$ to $\mathbb{R}^M$ is of this form. It is not hard to check that $T$ is always bounded, assuming that we use any of the norms $\|\cdot\|_p$ in $X$ and in $Y$. Evidently $T$ is a linear functional if $M = 1$.

Example 5.5. If $\Omega \subset \mathbb{R}^N$ is compact and $X = C(\Omega)$, pick $x_0 \in \Omega$ and set $T(f) = f(x_0)$ for $f \in X$. Clearly $T$ is a linear functional and $|Tf| \le \|f\|$ so that $\|T\| \le 1$.
5.4 Contraction mappings in a Banach space

If the Contraction Mapping Theorem, Theorem 4.6, is specialized to a Banach space, the resulting statement is that if $X$ is a Banach space and $F : X \to X$ satisfies
$$\|F(x) - F(y)\| \le L\|x - y\| \qquad x, y \in X \qquad (5.4.1)$$
for some $L < 1$, then $F$ has a unique fixed point in $X$.

A particular case which arises frequently in applications is when the mapping $F$ has the form $F(x) = Tx + b$ for some $b \in X$ and bounded linear operator $T$ on $X$, in which case the contraction condition (5.4.1) simply amounts to the requirement that $\|T\| < 1$. If we then initialize the fixed point iteration process (4.5.2) with $x_1 = b$, the successive iterates are
$$x_2 = F(x_1) = F(b) = Tb + b \qquad (5.4.2)$$
$$x_3 = F(x_2) = Tx_2 + b = T^2 b + Tb + b \qquad (5.4.3)$$
etc., the general pattern being
$$x_n = \sum_{j=0}^{n-1} T^j b \qquad n = 1, 2, \dots \qquad (5.4.4)$$
with $T^0 = I$ as usual. If $\|T\| < 1$ we already know that this sequence must converge, but it could also be checked directly from Proposition 5.3 using the obvious inequality $\|T^j b\| \le \|T\|^j \|b\|$. In fact we know that $x_n \to x$, the unique fixed point of $F$, so
$$x = \sum_{j=0}^\infty T^j b \qquad (5.4.5)$$
is an explicit solution formula for the linear, inhomogeneous equation $x - Tx = b$. The right hand side of (5.4.5) is known as the Neumann series for $x = (I - T)^{-1} b$, and symbolically we may write
$$(I - T)^{-1} = \sum_{j=0}^\infty T^j \qquad (5.4.6)$$
Note the formal similarity to the usual geometric series formula for $(1 - z)^{-1}$ if $z \in \mathbb{C}$, $|z| < 1$. If $T$ and $b$ are such that $\|T^j b\| \ll \|Tb\|$ for $j \ge 2$, then truncating the series after two terms we get the Born approximation formula $x \approx b + Tb$.
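The Neumann series is easy to demonstrate in the simplest Banach space setting $X = \mathbb{R}^2$. The sketch below uses a hypothetical concrete choice of $T$ and $b$ (our own, for illustration): it accumulates the partial sums (5.4.4) and compares the limit with the Born approximation $b + Tb$.

```python
# Neumann series x = sum_j T^j b in X = R^2 for a matrix T with ||T|| < 1.
# T and b below are a hypothetical concrete choice, for illustration only.

T = [[0.2, 0.1],
     [0.0, 0.3]]
b = [1.0, 1.0]

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

x = [0.0, 0.0]
term = b[:]                         # T^0 b = b
for _ in range(60):                 # accumulate the partial sums (5.4.4)
    x = [xi + ti for xi, ti in zip(x, term)]
    term = mat_vec(T, term)

# x should solve x - Tx = b; the Born approximation keeps only b + Tb.
residual = max(abs(xi - ti - bi) for xi, ti, bi in zip(x, mat_vec(T, x), b))
born = [bi + ti for bi, ti in zip(b, mat_vec(T, b))]
```

For this upper triangular $T$ the exact solution has second component $1/0.7$, and the Born approximation $(1.3, 1.3)$ is already within a few percent of it.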
5.5 Exercises

1. Give the proof of Proposition 5.1.

2. Show that any two norms on a finite dimensional normed linear space are equivalent. That is to say, if $(X, \|\cdot\|)$, $(X, |||\cdot|||)$ are both normed linear spaces, then there exist constants $0 < c < C < \infty$ such that
$$c \le \frac{|||x|||}{\|x\|} \le C \qquad \text{for all } x \ne 0$$

3. If $X$ is a normed linear space and $Y$ is a Banach space, show that $B(X, Y)$ is a Banach space, with the norm given by (5.3.1).

4. If $T$ is a linear integral operator, $Tu(x) = \int_\Omega K(x, y)u(y)\,dy$, then $T^2$ is also a linear integral operator. What is the kernel for $T^2$?

5. If $X$ is a normed linear space and $E$ is a subspace of $X$, show that $\bar{E}$ is also a subspace of $X$.

6. If $p \in (0, 1)$ show that $\|f\|_p = \left( \int_\Omega |f(x)|^p\,dx \right)^{1/p}$ does not define a norm.

7. The simple initial value problem
$$u' = u \qquad u(0) = 1$$
is equivalent to the integral equation
$$u(x) = 1 + \int_0^x u(s)\,ds$$
which may be viewed as a fixed point problem of the special type discussed in Section 5.4. Find the Neumann series for the solution $u$. Where does it converge?

8. If $Tf = f(0)$, show that $T$ is not a bounded linear functional on $L^p(-1, 1)$ for $1 \le p < \infty$.

9. Let $A \in B(X)$.
a) Show that
$$\exp(A) = e^A := \sum_{n=0}^\infty \frac{A^n}{n!} \qquad (5.5.1)$$
is defined in $B(X)$.
b) If also $B \in B(X)$ and $AB = BA$, show that $\exp(A + B) = \exp(A)\exp(B)$.
c) Show that $\exp((t + s)A) = \exp(tA)\exp(sA)$ for any $t, s \in \mathbb{R}$.
d) Show that the conclusion in b) is false, in general, if $A$ and $B$ do not commute. (Suggestion: a counterexample can be found in $X = \mathbb{R}^2$.)

10. Find an integral equation of the form $u = Tu + f$, $T$ linear, which is equivalent to the initial value problem
$$u'' + u = x^2 \quad x > 0 \qquad u(0) = 1 \quad u'(0) = 2 \qquad (5.5.2)$$
Calculate the Born approximation to the solution $u$ and compare to the exact solution.
Chapter 6

Inner product spaces and Hilbert spaces

6.1 Axioms of an inner product space

Definition 6.1. A vector space $X$ is said to be an inner product space if for every $x, y \in X$ there is defined a complex number $\langle x, y\rangle$, the inner product of $x$ and $y$, such that the following axioms hold.

[H1] $\langle x, x\rangle \ge 0$ for all $x \in X$
[H2] $\langle x, x\rangle = 0$ if and only if $x = 0$
[H3] $\langle \lambda x, y\rangle = \lambda \langle x, y\rangle$ for any $x, y \in X$ and any scalar $\lambda$.
[H4] $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for any $x, y \in X$.
[H5] $\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$ for any $x, y, z \in X$

Note that from axioms [H3] and [H4] it follows that
$$\langle x, \lambda y\rangle = \overline{\langle \lambda y, x\rangle} = \overline{\lambda \langle y, x\rangle} = \bar{\lambda}\,\overline{\langle y, x\rangle} = \bar{\lambda}\langle x, y\rangle \qquad (6.1.1)$$
Another immediate consequence of the axioms is that
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \qquad (6.1.2)$$
If we replace $y$ by $-y$ and add the resulting identities we obtain the so-called Parallelogram Law
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 \qquad (6.1.3)$$

Example 6.1. The vector space $\mathbb{R}^N$ is an inner product space if we define
$$\langle x, y\rangle = \sum_{j=1}^N x_j y_j \qquad (6.1.4)$$
In the case of $\mathbb{C}^N$ we must define
$$\langle x, y\rangle = \sum_{j=1}^N x_j \bar{y}_j \qquad (6.1.5)$$
in order that [H4] be satisfied.

Example 6.2. For the vector space $L^2(\Omega)$, with $\Omega \subset \mathbb{R}^N$, we may define
$$\langle f, g\rangle = \int_\Omega f(x)\overline{g(x)}\,dx \qquad (6.1.6)$$
where of course the complex conjugation can be ignored in the case of the real vector space $L^2(\Omega)$. Note the formal analogy with the inner product in the case of $\mathbb{R}^N$ or $\mathbb{C}^N$. The finiteness of $\langle f, g\rangle$ is guaranteed by the Hölder inequality (18.1.6), and the validity of [H1]–[H5] is clear.

Example 6.3. Another important inner product space which we introduce at this point is the sequence space
$$\ell^2 = \left\{ x = \{x_j\}_{j=1}^\infty : \sum_{j=1}^\infty |x_j|^2 < \infty \right\} \qquad (6.1.7)$$
with inner product
$$\langle x, y\rangle = \sum_{j=1}^\infty x_j \bar{y}_j \qquad (6.1.8)$$
The fact that $\langle x, y\rangle$ is finite for any $x, y \in \ell^2$ follows now from (18.1.14), the discrete form of the Hölder inequality. The notation $\ell^2(\mathbb{Z})$ is often used when the sequences involved are bi-infinite, i.e. of the form $x = \{x_j\}_{j=-\infty}^\infty$.
6.2 Norm in a Hilbert space

Proposition 6.1. If $x, y \in X$, an inner product space, then
$$|\langle x, y\rangle|^2 \le \langle x, x\rangle \langle y, y\rangle \qquad (6.2.1)$$
Proof: For any $z \in X$ we have
$$0 \le \langle x - z, x - z\rangle = \langle x, x\rangle - \langle x, z\rangle - \langle z, x\rangle + \langle z, z\rangle \qquad (6.2.2)$$
$$= \langle x, x\rangle + \langle z, z\rangle - 2\,\mathrm{Re}\,\langle x, z\rangle \qquad (6.2.3)$$
and hence
$$2\,\mathrm{Re}\,\langle z, x\rangle \le \langle x, x\rangle + \langle z, z\rangle \qquad (6.2.4)$$
If $y = 0$ there is nothing to prove; otherwise choose $z = (\langle x, y\rangle / \langle y, y\rangle)\,y$ to get
$$2\,\frac{|\langle x, y\rangle|^2}{\langle y, y\rangle} \le \langle x, x\rangle + \frac{|\langle x, y\rangle|^2}{\langle y, y\rangle} \qquad (6.2.5)$$
The conclusion (6.2.1) now follows upon rearrangement. □

Theorem 6.1. If $X$ is an inner product space and if we set $\|x\| = \sqrt{\langle x, x\rangle}$ then $\|\cdot\|$ is a norm on $X$.

Proof: By axiom [H1], $\|x\|$ is defined as a nonnegative real number for every $x \in X$, and axiom [H2] implies the corresponding axiom [N1] of a norm. If $\lambda$ is any scalar then $\|\lambda x\|^2 = \langle \lambda x, \lambda x\rangle = \lambda\bar{\lambda}\langle x, x\rangle = |\lambda|^2\|x\|^2$, so that [N2] also holds. Finally, if $x, y \in X$ then
$$\|x + y\|^2 = \langle x + y, x + y\rangle = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \qquad (6.2.6)$$
$$\le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 \qquad (6.2.7)$$
$$= (\|x\| + \|y\|)^2 \qquad (6.2.8)$$
so that the triangle inequality [N3] also holds. □

The inequality (6.2.1) may now be restated as
$$|\langle x, y\rangle| \le \|x\|\,\|y\| \qquad (6.2.9)$$
for any $x, y \in X$, and in this form is usually called the Schwarz or Cauchy-Schwarz inequality.
Corollary 6.1. If $x_n \to x$ in $X$ then $\langle x_n, y\rangle \to \langle x, y\rangle$ for any $y \in X$.

Proof: We have that
$$|\langle x_n, y\rangle - \langle x, y\rangle| = |\langle x_n - x, y\rangle| \le \|x_n - x\|\,\|y\| \to 0 \qquad (6.2.10)$$
□

By Theorem 6.1 an inner product space may always be regarded as a normed linear space, and analogously to the definition of a Banach space we have

Definition 6.2. A Hilbert space is a complete inner product space.

Example 6.4. The spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ are Hilbert spaces, as is $L^2(\Omega)$ on account of the completeness property mentioned in Theorem 4.1 of Chapter 4. On the other hand if we consider $C(E)$ with inner product $\langle f, g\rangle = \int_E f(x)\overline{g(x)}\,dx$, then it is an inner product space which is not a Hilbert space, since, as previously observed, $C(E)$ is not complete with the $L^2(E)$ metric. The sequence space $\ell^2$ is also a Hilbert space, see Exercise 7.
6.3 Orthogonality

Recall from elementary calculus that in $\mathbb{R}^n$ the inner product allows one to calculate the angle between two vectors, namely
$$\langle x, y\rangle = \|x\|\,\|y\| \cos\theta \qquad (6.3.1)$$
where $\theta$ is the angle between $x$ and $y$. In particular $x$ and $y$ are perpendicular if and only if $\langle x, y\rangle = 0$. The concept of perpendicularity, also called orthogonality, is fundamental in Hilbert space analysis, even if the geometric picture is less clear.

Definition 6.3. If $x, y \in X$, an inner product space, we say $x, y$ are orthogonal if $\langle x, y\rangle = 0$.

From (6.1.2) we obtain immediately the 'Pythagorean Theorem' that if $x$ and $y$ are orthogonal then
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 \qquad (6.3.2)$$

A set of vectors $\{x_1, x_2, \dots, x_n\}$ is called an orthogonal set if $x_j$ and $x_k$ are orthogonal whenever $j \ne k$, and for such a set we have
$$\Big\| \sum_{j=1}^n x_j \Big\|^2 = \sum_{j=1}^n \|x_j\|^2 \qquad (6.3.3)$$
The set is called orthonormal if in addition $\|x_j\| = 1$ for every $j$. The same terminology is used for countably infinite sets, with (6.3.3) still valid provided that the series on the right is convergent.

We may also use the notation $x \perp y$ if $x, y$ are orthogonal, and if $E \subset X$ we define the orthogonal complement of $E$,
$$E^\perp = \{x \in X : \langle x, y\rangle = 0 \text{ for all } y \in E\}$$
($E^\perp = x^\perp$ if $E$ consists of the single point $x$). We obviously have $\{0\}^\perp = X$, and also $X^\perp = \{0\}$, since if $x \in X^\perp$ then $\langle x, x\rangle = 0$ so that $x = 0$.

Proposition 6.2. If $E \subset X$ then $E^\perp$ is a closed subspace of $X$. If $E$ is a closed subspace then $E = E^{\perp\perp}$.

We leave the proof as an exercise. Here $E^{\perp\perp}$ means $(E^\perp)^\perp$, the orthogonal complement of the orthogonal complement.

Example 6.5. If $X = \mathbb{R}^3$ and $E = \{x = (x_1, x_2, x_3) : x_1 = x_2 = 0\}$ then $E^\perp = \{x \in \mathbb{R}^3 : x_3 = 0\}$.

Example 6.6. If $X = L^2(\Omega)$ with $\Omega$ a bounded open set in $\mathbb{R}^N$, let $E = L\{1\}$, i.e. the set of constant functions. Then $f \in E^\perp$ if and only if $\langle f, 1\rangle = \int_\Omega f(x)\,dx = 0$. Thus $E^\perp$ is the set of functions in $L^2(\Omega)$ with mean value zero.
6.4 Projections

If $E \subset X$ and $x \in X$, the projection $P_E x$ of $x$ onto $E$ is the element of $E$ closest to $x$, if such an element exists. That is, $y = P_E(x)$ if $y$ is the unique solution of the minimization problem
$$\min_{z \in E} \|x - z\| \qquad (6.4.1)$$
Of course such a point may not exist, and may not be unique if it does exist. In a Hilbert space the projection will be well defined provided $E$ is closed and convex.

Definition 6.4. If $X$ is a vector space and $E \subset X$, we say $E$ is convex if $\lambda x + (1 - \lambda)y \in E$ whenever $x, y \in E$ and $\lambda \in [0, 1]$.

Example 6.7. If $X$ is a vector space then any subspace of $X$ is convex. If $X$ is a normed linear space then any ball $B(x, R) \subset X$ is convex.

Theorem 6.2. Let $H$ be a Hilbert space, $E \subset H$ closed and convex, and $x \in H$. Then $y = P_E x$ exists. Furthermore, $y = P_E x$ if and only if
$$y \in E \qquad \mathrm{Re}\,\langle x - y, z - y\rangle \le 0 \quad \text{for all } z \in E \qquad (6.4.2)$$
Proof: Set $d = \inf_{z \in E} \|x - z\|$, so that there exists a sequence $z_n \in E$ such that $\|x - z_n\| \to d$. We wish to show that $\{z_n\}$ is a Cauchy sequence. From the Parallelogram Law (6.1.3) applied to $z_n - x$, $z_m - x$ we have
$$\|z_n - z_m\|^2 = 2\|z_n - x\|^2 + 2\|z_m - x\|^2 - 4\Big\| \frac{z_n + z_m}{2} - x \Big\|^2 \qquad (6.4.3)$$
Since $E$ is convex, $(z_n + z_m)/2 \in E$, so that $\|\frac{z_n + z_m}{2} - x\| \ge d$, and it follows that
$$\|z_n - z_m\|^2 \le 2\|z_n - x\|^2 + 2\|z_m - x\|^2 - 4d^2 \qquad (6.4.4)$$
Letting $n, m \to \infty$, the right hand side tends to zero, so that $\{z_n\}$ is Cauchy. Since the space is complete there exists $y \in H$ such that $\lim_{n\to\infty} z_n = y$, and $y \in E$ since $E$ is closed. It follows that $\|y - x\| = \lim_{n\to\infty} \|z_n - x\| = d$, so that $\min_{z \in E} \|z - x\|$ is achieved at $y$.

For the uniqueness assertion, suppose $\|y - x\| = \|\hat{y} - x\| = d$ with $y, \hat{y} \in E$. Then (6.4.4) holds with $z_n, z_m$ replaced by $y, \hat{y}$, giving
$$\|y - \hat{y}\|^2 \le 2\|y - x\|^2 + 2\|\hat{y} - x\|^2 - 4d^2 = 0 \qquad (6.4.5)$$
so that $y = \hat{y}$. Thus $y = P_E x$ exists.

To obtain the characterization (6.4.2), note that for any $z \in E$,
$$f(t) = \|x - (y + t(z - y))\|^2 \qquad (6.4.6)$$
has its minimum value on the interval $[0, 1]$ when $t = 0$, since $y + t(z - y) = tz + (1 - t)y \in E$. We explicitly calculate
$$f(t) = \|x - y\|^2 - 2t\,\mathrm{Re}\,\langle x - y, z - y\rangle + t^2\|z - y\|^2 \qquad (6.4.7)$$
By elementary calculus considerations, the minimum of this quadratic occurs at $t = 0$ only if $f'(0) = -2\,\mathrm{Re}\,\langle x - y, z - y\rangle \ge 0$, which is equivalent to (6.4.2). If, on the other hand, (6.4.2) holds, then for any $z \in E$ we must have
$$\|z - x\|^2 = f(1) \ge f(0) = \|y - x\|^2 \qquad (6.4.8)$$
so that $\min_{z \in E} \|z - x\|$ must occur at $y$, i.e. $y = P_E x$. □
The most important special case of the above theorem is when $E$ is a closed subspace of the Hilbert space $H$ (recall a subspace is always convex), in which case we have

Theorem 6.3. If $E \subset H$ is a closed subspace of a Hilbert space $H$ and $x \in H$ then $y = P_E x$ if and only if $y \in E$ and $x - y \in E^\perp$. Furthermore

1. $x - y = x - P_E x = P_{E^\perp} x$

2. We have that
$$x = y + (x - y) = P_E x + P_{E^\perp} x \qquad (6.4.9)$$
is the unique decomposition of $x$ as the sum of an element of $E$ and an element of $E^\perp$.

3. $P_E$ is a linear operator on $H$ with $\|P_E\| = 1$ except for the case $E = \{0\}$.

Proof: If $y = P_E x$ then for any $w \in E$ we also have $y \pm w \in E$, and choosing $z = y \pm w$ in (6.4.2) gives $\pm\,\mathrm{Re}\,\langle x - y, w\rangle \le 0$. Thus $\mathrm{Re}\,\langle x - y, w\rangle = 0$, and repeating the same argument with $z = y \pm iw$ gives $\mathrm{Re}\,\langle x - y, iw\rangle = \mathrm{Im}\,\langle x - y, w\rangle = 0$ also. We conclude that $\langle x - y, w\rangle = 0$ for all $w \in E$, i.e. $x - y \in E^\perp$. The converse statement may be proved in a similar manner.

Recall that $E^\perp$ is always a closed subspace of $H$. The statement that $x - y = P_{E^\perp} x$ is then equivalent, by the previous paragraph, to $x - y \in E^\perp$ and $\langle x - (x - y), w\rangle = \langle y, w\rangle = 0$ for every $w \in E^\perp$, which is evidently true since $y \in E$.

Next, if $x = y_1 + z_1 = y_2 + z_2$ with $y_1, y_2 \in E$ and $z_1, z_2 \in E^\perp$, then $y_1 - y_2 = z_2 - z_1$, implying that $y = y_1 - y_2$ belongs to both $E$ and $E^\perp$. But then $y \perp y$, i.e. $\langle y, y\rangle = 0$, must hold, so that $y = 0$ and hence $y_1 = y_2$, $z_1 = z_2$. We leave the proof of linearity to the exercises. □

If we denote by $I$ the identity mapping, we have just proved that $P_{E^\perp} = I - P_E$. We also obtain that
$$\|x\|^2 = \|P_E x\|^2 + \|P_{E^\perp} x\|^2 \qquad (6.4.10)$$
for any $x \in H$.
Example 6.8. In the Hilbert space $L^2(-1, 1)$ let $E$ denote the subspace of even functions, i.e. $f \in E$ if $f(x) = f(-x)$ for almost every $x \in (-1, 1)$. We claim that $E^\perp$ is the subspace of odd functions on $(-1, 1)$. The fact that any odd function belongs to $E^\perp$ is clear, since if $f$ is even and $g$ is odd then $fg$ is odd and so $\langle f, g\rangle = \int_{-1}^1 f(x)g(x)\,dx = 0$. Conversely, if $g \perp E$ then for any $f \in E$ we have
$$0 = \langle g, f\rangle = \int_{-1}^1 g(x)f(x)\,dx = \int_0^1 (g(x) + g(-x))f(x)\,dx \qquad (6.4.11)$$
by an obvious change of variables. Choosing $f(x) = g(x) + g(-x)$ we see that
$$\int_0^1 |g(x) + g(-x)|^2\,dx = 0 \qquad (6.4.12)$$
so that $g(x) = -g(-x)$ for almost every $x \in (0, 1)$, and hence for almost every $x \in (-1, 1)$. Thus any element of $E^\perp$ is an odd function on $(-1, 1)$.

Any function $f \in L^2(-1, 1)$ thus has the unique decomposition $f = P_E f + P_{E^\perp} f$, a sum of an even and an odd function. Since one such splitting is
$$f(x) = \frac{f(x) + f(-x)}{2} + \frac{f(x) - f(-x)}{2} \qquad (6.4.13)$$
we conclude from the uniqueness property that these two terms are the projections, i.e.
$$P_E f(x) = \frac{f(x) + f(-x)}{2} \qquad P_{E^\perp} f(x) = \frac{f(x) - f(-x)}{2} \qquad (6.4.14)$$
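The decomposition (6.4.14) can be checked numerically. The sketch below is our own illustration, using an assumed midpoint-rule quadrature for the $L^2(-1,1)$ inner product: it splits $f(x) = e^x$ into its even part $\cosh x$ and odd part $\sinh x$, and verifies orthogonality together with the identity (6.4.10).

```python
import math

# Even/odd decomposition (6.4.14) of f(x) = e^x in L^2(-1, 1):
# P_E f = cosh, P_{E^perp} f = sinh.  Inner products are computed with
# a midpoint rule (our own choice of quadrature).

def inner(f, g, n=2000):
    h = 2.0 / n
    pts = [-1.0 + (i + 0.5) * h for i in range(n)]
    return h * sum(f(x) * g(x) for x in pts)

f = math.exp
even = lambda x: 0.5 * (f(x) + f(-x))       # = cosh x
odd = lambda x: 0.5 * (f(x) - f(-x))        # = sinh x

ortho = inner(even, odd)                    # <P_E f, P_{E^perp} f>, ~0
pythagoras = inner(f, f) - inner(even, even) - inner(odd, odd)   # cf. (6.4.10)
```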
Example 6.9. Let $\{x_1, x_2, \dots, x_n\}$ be an orthogonal set of nonzero elements in a Hilbert space $X$ and $E = L(x_1, x_2, \dots, x_n)$ the span of these elements. Let us compute $P_E$ for this closed subspace $E$. If $y = P_E x$ then $y = \sum_{j=1}^n \lambda_j x_j$ for some scalars $\lambda_1, \dots, \lambda_n$, since $y \in E$. From Theorem 6.3 we also have that $x - y \perp E$, which is equivalent to $x - y \perp x_k$ for each $k$. Thus $\langle x, x_k\rangle = \langle y, x_k\rangle = \lambda_k \langle x_k, x_k\rangle$, using the orthogonality assumption. Thus we conclude that
$$y = P_E x = \sum_{j=1}^n \frac{\langle x, x_j\rangle}{\langle x_j, x_j\rangle}\, x_j \qquad (6.4.15)$$
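Formula (6.4.15) is immediate to implement. The following sketch uses a hypothetical orthogonal pair in $\mathbb{R}^4$ (our own choice, for illustration) to compute $P_E x$ and check that $x - P_E x \perp E$:

```python
# Projection (6.4.15) onto the span of an orthogonal set in R^4.
# The vectors below are a hypothetical example; note <x1, x2> = 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x1 = [1.0, 1.0, 0.0, 0.0]
x2 = [1.0, -1.0, 0.0, 0.0]
x = [3.0, 1.0, 2.0, 5.0]

coeffs = [dot(x, xj) / dot(xj, xj) for xj in (x1, x2)]   # <x, x_j>/<x_j, x_j>
y = [coeffs[0] * a + coeffs[1] * b for a, b in zip(x1, x2)]   # y = P_E x
r = [xi - yi for xi, yi in zip(x, y)]        # x - P_E x, should lie in E^perp
```

Here $E$ is the plane of the first two coordinates, so $P_E x$ simply keeps those coordinates and the residual $r$ is orthogonal to both spanning vectors.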
6.5 Gram-Schmidt method

The projection formula (6.4.15) provides an explicit and very convenient expression for the solution $y$ of the best approximation problem (6.4.1) provided $E$ is a subspace spanned by mutually orthogonal vectors $\{x_1, x_2, \dots, x_n\}$. If instead $E = L(x_1, x_2, \dots, x_n)$ is a subspace but $\{x_1, x_2, \dots, x_n\}$ are not orthogonal vectors, we can still use (6.4.15) to compute $y = P_E x$ if we can find a set of orthogonal vectors $\{y_1, y_2, \dots, y_m\}$ such that $E = L(x_1, x_2, \dots, x_n) = L(y_1, y_2, \dots, y_m)$, i.e. if we can find an orthogonal basis of $E$. This may always be done by the Gram-Schmidt orthogonalization procedure from linear algebra, which we now describe.

Assume that $\{x_1, x_2, \dots, x_n\}$ are linearly independent, so that $m = n$ must hold. First set $y_1 = x_1$. If orthogonal vectors $y_1, y_2, \dots, y_k$ have been chosen for some $1 \le k < n$ such that $E_k := L(y_1, y_2, \dots, y_k) = L(x_1, x_2, \dots, x_k)$, then define $y_{k+1} = x_{k+1} - P_{E_k} x_{k+1}$. Clearly $\{y_1, y_2, \dots, y_{k+1}\}$ are orthogonal since $y_{k+1}$ is the projection of $x_{k+1}$ onto $E_k^\perp$. Also since $y_{k+1}, x_{k+1}$ differ by an element of $E_k$ it is evident that $L(x_1, x_2, \dots, x_{k+1}) = L(y_1, y_2, \dots, y_{k+1})$. Thus after $n$ steps we obtain an orthogonal set $\{y_1, y_2, \dots, y_n\}$ which spans $E$. If the original set $\{x_1, x_2, \dots, x_n\}$ is not linearly independent then some of the $y_k$'s will be zero. After discarding these and relabeling, we obtain $\{y_1, y_2, \dots, y_m\}$ for some $m \le n$, an orthogonal basis for $E$. Note that we may compute $y_{k+1}$ using (6.4.15), namely
$$y_{k+1} = x_{k+1} - \sum_{j=1}^{k} \frac{\langle x_{k+1}, y_j \rangle}{\langle y_j, y_j \rangle}\, y_j \qquad (6.5.1)$$
In practice the Gram-Schmidt method is often modified to produce an orthonormal basis of $E$ by normalizing $y_k$ to be a unit vector at each step, or else discarding it if it is already a linear combination of $\{y_1, y_2, \dots, y_{k-1}\}$. More explicitly:

• Set $y_1 = \dfrac{x_1}{\|x_1\|}$.

• If orthonormal vectors $\{y_1, y_2, \dots, y_k\}$ have been chosen, set
$$\tilde{y}_{k+1} = x_{k+1} - \sum_{j=1}^{k} \langle x_{k+1}, y_j \rangle\, y_j \qquad (6.5.2)$$
If $\tilde{y}_{k+1} = 0$ discard it, otherwise set $y_{k+1} = \dfrac{\tilde{y}_{k+1}}{\|\tilde{y}_{k+1}\|}$.

The reader may easily check that $\{y_1, y_2, \dots, y_m\}$ constitutes an orthonormal basis of $E$, and consequently $P_E x = \sum_{j=1}^m \langle x, y_j \rangle\, y_j$ for any $x \in H$.
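The normalized procedure (6.5.2), including the discard step, can be sketched in a few lines. This is an illustration only, for vectors in $\mathbb{R}^3$ with the Euclidean inner product; the names `gram_schmidt` and `tol` are our own, and the tolerance replaces the exact test $\tilde{y}_{k+1} = 0$ in floating point arithmetic.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize following (6.5.2), discarding any vector that is
    already (numerically) in the span of the ones kept so far."""
    basis = []
    for x in vectors:
        y = x.astype(float).copy()
        for e in basis:
            y -= np.dot(x, e) * e        # subtract <x_{k+1}, y_j> y_j
        norm = np.linalg.norm(y)
        if norm > tol:                   # discard if linearly dependent
            basis.append(y / norm)
    return basis

vecs = [np.array([1.0, 1.0, 0.0]),
        np.array([2.0, 2.0, 0.0]),       # dependent: gets discarded
        np.array([1.0, 0.0, 1.0])]
E = gram_schmidt(vecs)
print(len(E))                            # 2
print(round(np.dot(E[0], E[1]), 12))     # 0.0
```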
6.6 Bessel's inequality and infinite orthogonal sequences

The formula (6.4.15) for $P_E$ may be adapted for use in infinite dimensional subspaces $E$. If $\{x_n\}_{n=1}^\infty$ is a countable orthogonal set in $H$, $x_n \ne 0$ for all $n$, we formally expect that if $E = \overline{L(\{x_n\}_{n=1}^\infty)}$ then
$$P_E x = \sum_{n=1}^\infty \frac{\langle x, x_n \rangle}{\langle x_n, x_n \rangle}\, x_n \qquad (6.6.1)$$
To verify that this is correct, we must show that the infinite series in (6.6.1) is guaranteed to be convergent in $H$. First of all, let us set
$$e_n = \frac{x_n}{\|x_n\|} \qquad c_n = \langle x, e_n \rangle \qquad E_N = L(x_1, x_2, \dots, x_N) \qquad (6.6.2)$$
so that $\{e_n\}_{n=1}^\infty$ is an orthonormal set, and
$$P_{E_N} x = \sum_{n=1}^N c_n e_n \qquad (6.6.3)$$
From (6.4.10) we have
$$\sum_{n=1}^N |c_n|^2 = \|P_{E_N} x\|^2 \le \|x\|^2 \qquad (6.6.4)$$
Letting $N \to \infty$ we obtain Bessel's inequality
$$\sum_{n=1}^\infty |c_n|^2 = \sum_{n=1}^\infty |\langle x, e_n \rangle|^2 \le \|x\|^2 \qquad (6.6.5)$$
The immediate implication that $\lim_{n\to\infty} c_n = 0$ is sometimes called the Riemann-Lebesgue lemma.
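Bessel's inequality (6.6.4)-(6.6.5) can be checked numerically: the coefficients of $x$ against any orthonormal set that does not span the space still have squared sum bounded by $\|x\|^2$. A sketch, using the (hypothetical) choice of the first four standard basis vectors of $\mathbb{R}^{10}$ as the orthonormal set:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)

# An orthonormal set that does NOT span R^10
E = [np.eye(10)[k] for k in range(4)]
coeffs = [np.dot(x, e) for e in E]       # c_n = <x, e_n>

lhs = sum(c**2 for c in coeffs)          # sum |c_n|^2
rhs = np.dot(x, x)                       # ||x||^2
print(lhs <= rhs)                        # True: Bessel's inequality (6.6.5)
```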
Proposition 6.3. (Riesz-Fischer) Let $\{e_n\}_{n=1}^\infty$ be an orthonormal set in $H$, $E = \overline{L(\{e_n\}_{n=1}^\infty)}$, $x \in H$ and $c_n = \langle x, e_n \rangle$. Then the infinite series $\sum_{n=1}^\infty c_n e_n$ is convergent in $H$ to $P_E x$.

Proof: First we note that the series $\sum_{n=1}^\infty c_n e_n$ is Cauchy in $H$, since if $M > N$
$$\Big\| \sum_{n=N}^{M} c_n e_n \Big\|^2 = \sum_{n=N}^{M} |c_n|^2 \qquad (6.6.6)$$
which is less than any prescribed $\epsilon > 0$ for $N$ sufficiently large, since $\sum_{n=1}^\infty |c_n|^2 < \infty$ by Bessel's inequality. Thus $y = \sum_{n=1}^\infty c_n e_n$ exists in $H$, and clearly $y \in E$. Since $\langle \sum_{n=1}^N c_n e_n, e_m \rangle = c_m$ if $N > m$ it follows easily that $\langle y, e_m \rangle = c_m = \langle x, e_m \rangle$. Thus $y - x \perp e_m$ for any $m$, which implies $y - x \in E^\perp$. From Theorem 6.3 we conclude that $y = P_E x$. $\Box$
6.7 Characterization of a basis of a Hilbert space

Now suppose we have an orthogonal set $\{x_n\}_{n=1}^\infty$ and we wish to determine whether or not it is a basis of the Hilbert space $H$. There are a number of interesting ways to answer this question, summarized in Theorem 6.4 below. First we must make some more definitions.

Definition 6.5. A collection of vectors $\{x_n\}_{n=1}^\infty$ is closed in $H$ if the set of all finite linear combinations of $\{x_n\}_{n=1}^\infty$ is dense in $H$.

A collection of vectors $\{x_n\}_{n=1}^\infty$ is complete in $H$ if there is no nonzero vector orthogonal to all of them, i.e. $\langle x, x_n \rangle = 0$ for all $n$ if and only if $x = 0$.

An orthogonal set $\{x_n\}_{n=1}^\infty$ in $H$ is a maximal orthogonal set if it is not contained in any larger orthogonal set.

Theorem 6.4. Let $\{e_n\}_{n=1}^\infty$ be an orthonormal set in a Hilbert space $H$. Then the following are equivalent.

a) $\{e_n\}_{n=1}^\infty$ is a basis of $H$.

b) $x = \sum_{n=1}^\infty \langle x, e_n \rangle e_n$ for every $x \in H$.

c) $\langle x, y \rangle = \sum_{n=1}^\infty \langle x, e_n \rangle \langle e_n, y \rangle$ for every $x, y \in H$.

d) $\|x\|^2 = \sum_{n=1}^\infty |\langle x, e_n \rangle|^2$ for every $x \in H$.

e) $\{e_n\}_{n=1}^\infty$ is a maximal orthonormal set.

f) $\{e_n\}_{n=1}^\infty$ is closed in $H$.

g) $\{e_n\}_{n=1}^\infty$ is complete in $H$.
Proof: a) implies b): If $\{e_n\}_{n=1}^\infty$ is a basis of $H$ then for any $x \in H$ there exist unique constants $d_n$ such that $x = \lim_{N\to\infty} S_N$ where $S_N = \sum_{n=1}^N d_n e_n$. Since $\langle S_N, e_m \rangle = d_m$ if $N > m$ it follows that
$$|d_m - \langle x, e_m \rangle| = |\langle S_N - x, e_m \rangle| \le \|S_N - x\|\, \|e_m\| \to 0 \qquad (6.7.1)$$
as $N \to \infty$, using the Schwarz inequality. Hence
$$x = \sum_{n=1}^\infty d_n e_n = \sum_{n=1}^\infty \langle x, e_n \rangle e_n \qquad (6.7.2)$$

b) implies c): For any $x, y \in H$ we have
$$\langle x, y \rangle = \Big\langle x, \lim_{N\to\infty} \sum_{n=1}^N \langle y, e_n \rangle e_n \Big\rangle = \lim_{N\to\infty} \Big\langle x, \sum_{n=1}^N \langle y, e_n \rangle e_n \Big\rangle \qquad (6.7.3)$$
$$= \lim_{N\to\infty} \sum_{n=1}^N \overline{\langle y, e_n \rangle}\, \langle x, e_n \rangle = \sum_{n=1}^\infty \overline{\langle y, e_n \rangle}\, \langle x, e_n \rangle \qquad (6.7.4)$$
$$= \sum_{n=1}^\infty \langle x, e_n \rangle \langle e_n, y \rangle \qquad (6.7.5)$$
Here we have used Corollary 6.1 in the second line.

c) implies d): We simply choose $x = y$ in the identity stated in c).
d) implies e): If $\{e_n\}_{n=1}^\infty$ is not maximal then there exists $e \in H$ such that
$$\{e_n\}_{n=1}^\infty \cup \{e\} \qquad (6.7.6)$$
is orthonormal. Since $\langle e, e_n \rangle = 0$ but $\|e\| = 1$ this contradicts d).

e) implies f): Let $E$ denote the closure of the set of finite linear combinations of the $e_n$'s. If $\{e_n\}_{n=1}^\infty$ is not closed then $E \ne H$, so there must exist $x \not\in E$. If we let $y = x - P_E x$ then $y \ne 0$ and $y \perp E$. If $e = y/\|y\|$ we would then have that $\{e_n\}_{n=1}^\infty \cup \{e\}$ is orthonormal, so that $\{e_n\}_{n=1}^\infty$ could not be maximal.

f) implies g): Assume that $\langle x, e_n \rangle = 0$ for all $n$. If $\{e_n\}_{n=1}^\infty$ is closed then for any $\epsilon > 0$ there exist $\lambda_1, \dots, \lambda_N$ such that $\|x - \sum_{n=1}^N \lambda_n e_n\|^2 < \epsilon$. But then $\|x\|^2 + \sum_{n=1}^N |\lambda_n|^2 < \epsilon$ and in particular $\|x\|^2 < \epsilon$. Thus $x = 0$ so $\{e_n\}_{n=1}^\infty$ is complete.
g) implies a): Let $E = \overline{L(\{e_n\}_{n=1}^\infty)}$. If $x \in H$ and $y = P_E x = \sum_{n=1}^\infty \langle x, e_n \rangle e_n$ then as in the proof of Proposition 6.3 $\langle y, e_n \rangle = \langle x, e_n \rangle$. Since $\{e_n\}_{n=1}^\infty$ is complete it follows that $x = y \in E$, so that $\overline{L(\{e_n\}_{n=1}^\infty)} = H$. Since an orthonormal set is obviously linearly independent it follows that $\{e_n\}_{n=1}^\infty$ is a basis of $H$.

Because of the equivalence of the stated conditions, the phrases 'complete orthonormal set', 'maximal orthonormal set', and 'closed orthonormal set' are often used interchangeably with 'orthonormal basis' in a Hilbert space setting. The identity in d) is called the Bessel equality (recall the corresponding inequality (6.6.5) is valid whether or not the orthonormal set $\{e_n\}_{n=1}^\infty$ is a basis), while the identity in c) is the Parseval equality. For reasons which should become more clear in Chapter 8 the infinite series $\sum_{n=1}^\infty \langle x, e_n \rangle e_n$ is often called the generalized Fourier series of $x$ with respect to the orthonormal basis $\{e_n\}_{n=1}^\infty$, and $\langle x, e_n \rangle$ is the $n$'th generalized Fourier coefficient.
Theorem 6.5. Every separable Hilbert space has an orthonormal basis.

Proof: If $\{x_n\}_{n=1}^\infty$ is a countable dense sequence in $H$ and we carry out the Gram-Schmidt procedure, we obtain an orthonormal sequence $\{e_n\}_{n=1}^\infty$. This sequence must be complete, since any vector orthogonal to every $e_n$ must also be orthogonal to every $x_n$, so must be zero, since $\{x_n\}_{n=1}^\infty$ is dense. Therefore by Theorem 6.4 $\{e_n\}_{n=1}^\infty$ (or $\{e_1, e_2, \dots, e_n\}$ in the finite dimensional case) is an orthonormal basis of $H$.

The same conclusion is actually correct in a non-separable Hilbert space also, but needs more explanation. See for example Chapter 4 of [29].
6.8 Isomorphisms of a Hilbert space

There are two interesting isomorphisms of every separable Hilbert space: one is to its so-called dual space, and the second is to the sequence space $\ell^2$. In this section we explain both of these facts.

Recall that in Chapter 5 we have already introduced $X^* = B(X, \mathbb{C})$, the space of continuous linear functionals on the normed linear space $X$. It is itself always a Banach space (see Exercise 3 of Chapter 5), and is also called the dual space of $X$.

Example 6.10. If $H$ is a Hilbert space and $y \in H$, define $\phi(x) = \langle x, y \rangle$. Then $\phi : H \to \mathbb{C}$ is clearly linear, and $|\phi(x)| \le \|y\|\,\|x\|$ by the Schwarz inequality, hence $\phi \in H^*$, with $\|\phi\| \le \|y\|$.

The following theorem asserts that every element of the dual space $H^*$ arises in this way.

Theorem 6.6. (Riesz representation theorem) If $H$ is a Hilbert space and $\phi \in H^*$ then there exists a unique $y \in H$ such that $\phi(x) = \langle x, y \rangle$.

Proof: Let $M = \{x \in H : \phi(x) = 0\}$, which is clearly a closed subspace of $H$. If $M = H$ then $\phi$ can only be the zero functional, so $y = 0$ has the required properties. Otherwise, there must exist $e \in M^\perp$ such that $\|e\| = 1$. For any $x \in H$ let $z = \phi(x)e - \phi(e)x$ and observe that $\phi(z) = 0$ so $z \in M$, and in particular $z \perp e$. It then follows that
$$0 = \langle z, e \rangle = \phi(x)\langle e, e \rangle - \phi(e)\langle x, e \rangle \qquad (6.8.1)$$
Thus $\phi(x) = \langle x, y \rangle$ with $y := \overline{\phi(e)}\, e$, for every $x \in H$.

The uniqueness property is even easier to show. If $\phi(x) = \langle x, y_1 \rangle = \langle x, y_2 \rangle$ for every $x \in H$ then necessarily $\langle x, y_1 - y_2 \rangle = 0$ for all $x$, and choosing $x = y_1 - y_2$ we get $\|y_1 - y_2\|^2 = 0$, that is, $y_1 = y_2$.
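The construction in the proof can be illustrated in the finite dimensional Hilbert space $\mathbb{C}^n$: given only the values of $\phi$ on an orthonormal basis, the representing vector is recovered as $y = \sum_k \overline{\phi(e_k)}\, e_k$, matching $y = \overline{\phi(e)}\, e$ in the rank-one step of the proof. A sketch, assuming the inner product $\langle x, y \rangle = \sum_k x_k \overline{y_k}$ (all names are ours):

```python
import numpy as np

def inner(x, y):
    # Inner product on C^n, conjugate-linear in the second slot
    return np.dot(x, np.conj(y))

n = 5
rng = np.random.default_rng(1)
y_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
phi = lambda v: inner(v, y_true)          # a continuous linear functional

# Recover the representing vector from the values of phi on an ONB:
# y = sum_k conj(phi(e_k)) e_k
e = np.eye(n)
y = sum(np.conj(phi(e[k])) * e[k] for k in range(n))

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.allclose(y, y_true))             # True
print(np.allclose(phi(x), inner(x, y)))   # True: phi(x) = <x, y>
```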
We view the element $y \in H$ as 'representing' the linear functional $\phi \in H^*$, hence the name of the theorem. There are actually several theorems one may encounter, all called the Riesz representation theorem, and what they all have in common is that the dual space of some other space is characterized. The Hilbert space version here is by far the easiest of these theorems.

If we define the mapping $R : H \to H^*$ (the Riesz map) by the condition $R(y) = \phi$, with $\phi, y$ related as above, then Theorem 6.6 amounts to the statement that $R$ is one-to-one and onto. Since it is easy to check that $R$ is also linear, it follows that $R$ is an isomorphism from $H$ to $H^*$. In fact more is true: $R$ is an isometric isomorphism, which means that $\|R(y)\| = \|y\|$ for every $y \in H$. To see this, recall we have already seen in Example 6.10 that $\|\phi\| \le \|y\|$, and by choosing $x = y$ we also get $\phi(y) = \|y\|^2$, which implies $\|\phi\| \ge \|y\|$.

Next, suppose that $H$ is an infinite dimensional separable Hilbert space. According to Theorem 6.5 there exists an orthonormal basis of $H$, which cannot be finite, and so may be written as $\{e_n\}_{n=1}^\infty$. Associate with any $x \in H$ the corresponding sequence of generalized Fourier coefficients $\{c_n\}_{n=1}^\infty$, where $c_n = \langle x, e_n \rangle$, and let $\Lambda$ denote this mapping, i.e. $\Lambda(x) = \{c_n\}_{n=1}^\infty$.

We know by Theorem 6.4 that $\sum_{n=1}^\infty |c_n|^2 < \infty$, i.e. $\Lambda(x) \in \ell^2$. On the other hand, suppose $\sum_{n=1}^\infty |c_n|^2 < \infty$ and let $x = \sum_{n=1}^\infty c_n e_n$. This series is Cauchy, hence convergent in $H$, by precisely the same argument as used in the beginning of the proof of Proposition 6.3. Since $\{e_n\}_{n=1}^\infty$ is a basis, we must have $c_n = \langle x, e_n \rangle$, thus $\Lambda(x) = \{c_n\}_{n=1}^\infty$, and consequently $\Lambda : H \to \ell^2$ is onto. It is also one-to-one, since $\Lambda(x_1) = \Lambda(x_2)$ means that $\langle x_1 - x_2, e_n \rangle = 0$ for every $n$, hence $x_1 - x_2 = 0$ by the completeness property of a basis. Finally it is straightforward to check that $\Lambda$ is linear, so that $\Lambda$ is an isomorphism. Like the Riesz map, the isomorphism $\Lambda$ is also isometric, $\|\Lambda(x)\| = \|x\|$, on account of the Bessel equality. By the above considerations we have then established the following theorem.

Theorem 6.7. If $H$ is an infinite dimensional separable Hilbert space, then $H$ is isometrically isomorphic to $\ell^2$.

Since all such Hilbert spaces are isometrically isomorphic to $\ell^2$, they are then obviously isometrically isomorphic to each other. If $H$ is a Hilbert space of dimension $N$, the same arguments show that $H$ is isometrically isomorphic to the Hilbert space $\mathbb{R}^N$ or $\mathbb{C}^N$, depending on whether real or complex scalars are allowed. Finally, see Theorem 4.17 of [29] for the nonseparable case.
6.9 Exercises

1. Prove Proposition 6.2.

2. In the Hilbert space $L^2(-1,1)$ what is $M^\perp$ if

a) $M = \{u : u(x) = u(-x) \text{ a.e.}\}$

b) $M = \{u : u(x) = 0 \text{ a.e. for } -1 < x < 0\}$.

Give an explicit formula for the projection onto $M$ in each case.

3. Prove that $P_E$ is a linear operator on $H$ with norm $\|P_E\| = 1$ except in the trivial case when $E = \{0\}$. Suggestion: If $x = c_1 x_1 + c_2 x_2$ first show that
$$P_E x - c_1 P_E x_1 - c_2 P_E x_2 = -P_{E^\perp} x + c_1 P_{E^\perp} x_1 + c_2 P_{E^\perp} x_2$$

4. Show that the parallelogram law fails in $L^1(\Omega)$, so there is no choice of inner product which can give rise to the norm in $L^1(\Omega)$. (The same is true in $L^p(\Omega)$ for any $p \ne 2$.)

5. If $(X, \langle \cdot, \cdot \rangle)$ is an inner product space prove the polarization identity
$$\langle x, y \rangle = \frac{1}{4}\big( \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2 \big)$$
Thus, in any normed linear space, there can exist at most one inner product giving rise to the norm.

6. Let $M$ be a closed subspace of a Hilbert space $H$, and $P_M$ be the corresponding projection. Show that

a) $P_M^2 = P_M$

b) $\langle P_M x, y \rangle = \langle P_M x, P_M y \rangle = \langle x, P_M y \rangle$ for any $x, y \in H$.
7. Show that $\ell^2$ is a Hilbert space. (Discussion: The only property you need to check is completeness, and you may freely use the fact that $\mathbb{R}$ is complete. A Cauchy sequence in this case is a sequence of sequences, so use a notation like
$$x^{(n)} = \{x_1^{(n)}, x_2^{(n)}, \dots\}$$
where $x_j^{(n)}$ denotes the $j$'th term of the $n$'th sequence $x^{(n)}$. Given a Cauchy sequence $\{x^{(n)}\}_{n=1}^\infty$ in $\ell^2$ you'll first find a sequence $x$ such that $\lim_{n\to\infty} x_j^{(n)} = x_j$ for each fixed $j$. You then must still show that $x \in \ell^2$, and one good way to do this is by first showing that $x - x^{(n)} \in \ell^2$ for some $n$.)

8. Let $H$ be a Hilbert space.

a) If $x_n \to x$ in $H$ show that $\{x_n\}_{n=1}^\infty$ is bounded in $H$.

b) If $x_n \to x$, $y_n \to y$ in $H$ show that $\langle x_n, y_n \rangle \to \langle x, y \rangle$.

9. Compute orthogonal polynomials of degree 0, 1, 2, 3 on $[-1,1]$ and on $[0,1]$ by applying the Gram-Schmidt procedure to $1, x, x^2, x^3$ in $L^2(-1,1)$ and $L^2(0,1)$. (In the case of $L^2(-1,1)$, you are finding so-called Legendre polynomials.)

10. Use the result of Exercise 9 and the projection formula (6.6.1) to compute the best polynomial approximations of degrees 0, 1, 2 and 3 to $u(x) = e^x$ in $L^2(-1,1)$. Feel free to use any symbolic calculation tool you know to compute the necessary integrals, but give exact coefficients, not calculator approximations. If possible, produce a graph displaying $u$ and the 4 approximations.
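As Exercise 10 suggests, the Gram-Schmidt computation behind Exercise 9 can be carried out exactly with a symbolic tool. A hedged sketch (the function name `gs_polys` is ours; this assumes sympy is available) applying formula (6.5.1) to $1, x, x^2, x^3$ in $L^2(-1,1)$; the output polynomials are proportional to the first four Legendre polynomials:

```python
from sympy import symbols, integrate, expand

x = symbols('x')

def gs_polys(n, a, b):
    """Gram-Schmidt applied to 1, x, x^2, ... in L^2(a, b), per (6.5.1)."""
    ip = lambda f, g: integrate(f * g, (x, a, b))
    ys = []
    for k in range(n + 1):
        p = x**k
        for yj in ys:
            p -= ip(x**k, yj) / ip(yj, yj) * yj
        ys.append(expand(p))
    return ys

legendre_like = gs_polys(3, -1, 1)
print(legendre_like)   # [1, x, x**2 - 1/3, x**3 - 3*x/5]
```

Changing the interval to $(0, 1)$ gives the other family requested in Exercise 9.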
11. Let $\Omega \subset \mathbb{R}^N$, $\rho$ be a measurable function on $\Omega$, and $\rho(x) > 0$ a.e. on $\Omega$. Let $X$ denote the set of measurable functions $u$ for which $\int_\Omega |u(x)|^2 \rho(x)\,dx$ is finite. We can then define the weighted inner product
$$\langle u, v \rangle_\rho = \int_\Omega u(x)\overline{v(x)}\rho(x)\,dx$$
and corresponding norm $\|u\|_\rho = \sqrt{\langle u, u \rangle_\rho}$ on $X$. The resulting inner product space is complete, often denoted $L^2_\rho(\Omega)$. (As in the case of $\rho(x) \equiv 1$ we regard any two functions which agree a.e. as being the same element, so $L^2_\rho(\Omega)$ is again really a set of equivalence classes.)

a) Verify that all of the inner product axioms are satisfied.

b) Suppose that there exist constants $C_1, C_2$ such that $0 < C_1 \le \rho(x) \le C_2$ a.e. Show that $u_n \to u$ in $L^2_\rho(\Omega)$ if and only if $u_n \to u$ in $L^2(\Omega)$.

12. More classes of orthogonal polynomials may be derived by applying the Gram-Schmidt procedure to $\{1, x, x^2, \dots\}$ in $L^2_\rho(a, b)$ for various choices of $\rho, a, b$, two of which occur in Exercise 9. Another class is the Laguerre polynomials, corresponding to $a = 0$, $b = \infty$ and $\rho(x) = e^{-x}$. Find the first four Laguerre polynomials.

13. Show that equality holds in the Schwarz inequality (6.2.1) if and only if $x, y$ are linearly dependent.

14. Show by examples that the best approximation problem (6.4.1) may not have a solution if $E$ is either not closed or not convex.

15. If $\Omega$ is a compact subset of $\mathbb{R}^N$, show that $C(\Omega)$ is a subspace of $L^2(\Omega)$ which isn't closed.

16. Show that
$$\Big\{ \frac{1}{\sqrt{2}} \Big\} \cup \{\cos n\pi x, \sin n\pi x\}_{n=1}^\infty \qquad (6.9.1)$$
is an orthonormal set in $L^2(-1,1)$. (Completeness of this set will be shown in Chapter 8.)
17. For nonnegative integers $n$ define
$$v_n(x) = \cos(n \cos^{-1} x)$$

a) Show that $v_{n+1}(x) + v_{n-1}(x) = 2x v_n(x)$ for $n = 1, 2, \dots$

b) Show that $v_n$ is a polynomial of degree $n$ (the so-called Chebyshev polynomials).

c) Show that $\{v_n\}_{n=1}^\infty$ are orthogonal in $L^2_\rho(-1,1)$ where the weight function is $\rho(x) = \frac{1}{\sqrt{1 - x^2}}$.

18. If $H$ is a Hilbert space we say a sequence $\{x_n\}_{n=1}^\infty$ converges weakly to $x$ (notation: $x_n \xrightarrow{w} x$) if $\langle x_n, y \rangle \to \langle x, y \rangle$ for every $y \in H$.

a) Show that if $x_n \to x$ then $x_n \xrightarrow{w} x$.

b) Prove that the converse is false, as long as $\dim(H) = \infty$, by showing that if $\{e_n\}_{n=1}^\infty$ is any orthonormal sequence in $H$ then $e_n \xrightarrow{w} 0$, but $\lim_{n\to\infty} e_n$ doesn't exist.

c) Prove that if $x_n \xrightarrow{w} x$ then $\|x\| \le \liminf_{n\to\infty} \|x_n\|$.

d) Prove that if $x_n \xrightarrow{w} x$ and $\|x_n\| \to \|x\|$ then $x_n \to x$.

19. Let $M_1, M_2$ be closed subspaces of a Hilbert space $H$ and suppose $M_1 \perp M_2$. Show that
$$M_1 \oplus M_2 = \{x \in H : x = y + z,\ y \in M_1,\ z \in M_2\}$$
is also a closed subspace of $H$.
Chapter 7

Distributions

In this chapter we will introduce and study the concept of distribution, also commonly known as generalized function. To motivate this study we first mention two examples.

Example 7.1. The wave equation $u_{tt} - u_{xx} = 0$ has the general solution $u(x,t) = F(x+t) + G(x-t)$ where $F, G$ must be in $C^2(\mathbb{R})$ in order that $u$ be a classical solution. However from a physical point of view there is no apparent reason why such smoothness restrictions on $F, G$ should be needed. Indeed the two terms represent waves of fixed shape moving to the left and right respectively with speed one, and it ought to be possible to allow the shape functions $F, G$ to have discontinuities. The calculus of distributions will allow us to regard $u$ as a solution of the wave equation in a well defined sense even for such irregular $F, G$.

Example 7.2. In physics and engineering one frequently encounters the Dirac delta function $\delta(x)$, which has the properties
$$\delta(x) = 0 \quad x \ne 0 \qquad \int_{-\infty}^\infty \delta(x)\,dx = 1 \qquad (7.0.1)$$
Unfortunately these properties are inconsistent for ordinary functions: any function which is zero except at a single point must have integral zero. The theory of distributions will allow us to give a precise mathematical meaning to the delta function and in so doing justify formal calculations with it.

Roughly speaking, a distribution is a mathematical object whose unique identity is specified by how it acts on all test functions. It is in a sense quite analogous to a function in the ordinary sense, whose unique identity is specified by how it acts on (i.e. how it maps) all points in its domain. As we will see, most ordinary functions may be viewed as a special kind of distribution, which explains the 'generalized function' terminology. In addition, there is a well defined calculus of distributions which is basic to the modern theory of partial differential equations. We now start to give precise meaning to these concepts.
7.1 The space of test functions

For any real or complex valued function $f$ defined on some domain in $\mathbb{R}^N$, the support of $f$, denoted $\operatorname{supp} f$, is the closure of the set $\{x : f(x) \ne 0\}$.

Definition 7.1. If $\Omega$ is any open set in $\mathbb{R}^N$ the space of test functions on $\Omega$ is
$$C_0^\infty(\Omega) = \{\phi \in C^\infty(\Omega) : \operatorname{supp} \phi \text{ is compact in } \Omega\} \qquad (7.1.1)$$
This function space is also commonly denoted $\mathcal{D}(\Omega)$, which is the notation we will use from now on. Clearly $\mathcal{D}(\Omega)$ is a vector space, but it may not be immediately evident that it contains any function other than $\phi \equiv 0$.

Example 7.3. Define
$$\phi(x) = \begin{cases} e^{\frac{1}{x^2 - 1}} & |x| < 1 \\ 0 & |x| \ge 1 \end{cases} \qquad (7.1.2)$$
Then $\phi \in \mathcal{D}(\Omega)$ with $\Omega = \mathbb{R}$. To see this one only needs to check that $\lim_{x \to 1-} \phi^{(k)}(x) = 0$ for $k = 0, 1, \dots$, and similarly at $x = -1$. Once we have one such function then many others can be derived from it by dilation ($\phi(x) \to \phi(\alpha x)$), translation ($\phi(x) \to \phi(x - \alpha)$), scaling ($\phi(x) \to \alpha\phi(x)$), differentiation ($\phi(x) \to \phi^{(k)}(x)$) or any linear combination of such terms. See also Exercise 1.
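A quick numerical look at the function of Example 7.3 makes it plausible that all derivatives vanish at $\pm 1$: the factor $e^{1/(x^2-1)}$ decays faster than any power of $1 - x$ as $x \to 1-$. This is a plausibility check only, not a proof, and the name `phi` is ours.

```python
import math

def phi(x):
    """The test function of Example 7.3: exp(1/(x^2 - 1)) inside (-1, 1)."""
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1 else 0.0

print(phi(0.0))            # e**(-1), about 0.3679
for t in (0.9, 0.99, 0.999):
    print(phi(t))          # decays to 0 extremely fast as x -> 1-
print(phi(1.0), phi(2.0))  # 0.0 0.0: identically zero outside (-1, 1)
```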
Next, we define convergence in the test function space.

Definition 7.2. If $\phi_n \in \mathcal{D}(\Omega)$ then we say $\phi_n \to 0$ in $\mathcal{D}(\Omega)$ if

(i) There exists a compact set $K \subset \Omega$ such that $\operatorname{supp} \phi_n \subset K$ for every $n$

(ii) $\lim_{n\to\infty} \max_{x \in \Omega} |D^\alpha \phi_n(x)| = 0$ for every multiindex $\alpha$

We also say that $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ provided $\phi_n - \phi \to 0$ in $\mathcal{D}(\Omega)$. By specifying what convergence of a sequence in $\mathcal{D}(\Omega)$ means, we are partly, but not completely, specifying a topology on $\mathcal{D}(\Omega)$. We will have no need of further details about this topology; see Chapter 6 of [30] for more on this point.
7.2 The space of distributions

We come now to the basic definition: a distribution is a continuous linear functional on $\mathcal{D}(\Omega)$. More precisely

Definition 7.3. A linear mapping $T : \mathcal{D}(\Omega) \to \mathbb{C}$ is a distribution on $\Omega$ if $T(\phi_n) \to T(\phi)$ whenever $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$. The set of all distributions on $\Omega$ is denoted $\mathcal{D}'(\Omega)$.

The distribution space $\mathcal{D}'(\Omega)$ is another example of a dual space $X'$, the set of all continuous linear functionals on $X$, which can be defined if $X$ is any vector space in which convergence of sequences is defined. The dual space is always itself a vector space. We'll discuss many more examples of dual spaces later on. We emphasize that the distribution $T$ is defined solely in terms of the values it assigns to test functions $\phi$; in particular two distributions $T_1, T_2$ are equal if and only if $T_1(\phi) = T_2(\phi)$ for every $\phi \in \mathcal{D}(\Omega)$.

To clarify the concept, let us discuss a number of examples.

Example: If $f \in L^1(\Omega)$ define
$$T(\phi) = \int_\Omega f(x)\phi(x)\,dx \qquad (7.2.1)$$
Obviously $|T(\phi)| \le \|f\|_{L^1(\Omega)} \|\phi\|_{L^\infty(\Omega)}$, so that $T : \mathcal{D}(\Omega) \to \mathbb{C}$, and $T$ is also clearly linear. If $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ then by the same token
$$|T(\phi_n) - T(\phi)| \le \|f\|_{L^1(\Omega)} \|\phi_n - \phi\|_{L^\infty(\Omega)} \to 0 \qquad (7.2.2)$$
so that $T$ is continuous. Thus $T \in \mathcal{D}'(\Omega)$.

Because of the fact that $\phi$ must have compact support in $\Omega$ one does not really need $f$ to be in $L^1(\Omega)$ but only in $L^1(K)$ for any compact subset $K$ of $\Omega$. For any $1 \le p \le \infty$ let us define
$$L^p_{loc}(\Omega) = \{f : f \in L^p(K) \text{ for any compact set } K \subset \Omega\} \qquad (7.2.3)$$
Thus a function in $L^p_{loc}(\Omega)$ can become infinite arbitrarily rapidly at the boundary of $\Omega$. We say that $f_n \to f$ in $L^p_{loc}(\Omega)$ if $f_n \to f$ in $L^p(K)$ for every compact subset $K \subset \Omega$. Functions in $L^1_{loc}(\Omega)$ are said to be locally integrable on $\Omega$.
Now if we let $f \in L^1_{loc}(\Omega)$ the definition (7.2.1) still produces a finite value, since
$$|T(\phi)| = \Big| \int_\Omega f(x)\phi(x)\,dx \Big| = \Big| \int_K f(x)\phi(x)\,dx \Big| \le \|f\|_{L^1(K)} \|\phi\|_{L^\infty(K)} < \infty \qquad (7.2.4)$$
if $K = \operatorname{supp} \phi$. Similarly if $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ we can choose a fixed compact set $K \subset \Omega$ containing $\operatorname{supp} \phi$ and $\operatorname{supp} \phi_n$ for every $n$, hence again
$$|T(\phi_n) - T(\phi)| \le \|f\|_{L^1(K)} \|\phi_n - \phi\|_{L^\infty(K)} \to 0 \qquad (7.2.5)$$
so that $T \in \mathcal{D}'(\Omega)$.

When convenient, we will denote the distribution in (7.2.1) by $T_f$. The correspondence $f \to T_f$ allows us to think of $L^1_{loc}(\Omega)$ as a special subspace of $\mathcal{D}'(\Omega)$, i.e. locally integrable functions are always distributions. The point is that a function $f$ can be thought of as a mapping
$$\phi \to \int_\Omega f\phi\,dx \qquad (7.2.6)$$
instead of the more conventional
$$x \to f(x) \qquad (7.2.7)$$
In fact for $L^1_{loc}$ functions the former is in some sense more natural since it doesn't require us to make special arrangements for sets of measure zero. A distribution of the form $T = T_f$ for some $f \in L^1_{loc}(\Omega)$ is sometimes referred to as a regular distribution, while any distribution not of this type is a singular distribution.

The correspondence $f \to T_f$ is also one-to-one. This is a slightly technical result in measure theory which we leave for the exercises, for those with the necessary background. See also Theorem 2, Chapter II of [31]:

Theorem 7.1. Two distributions $T_{f_1}, T_{f_2}$ on $\Omega$ are equal if and only if $f_1 = f_2$ almost everywhere on $\Omega$.
Example 7.4. Fix a point $x_0 \in \Omega$ and define
$$T(\phi) = \phi(x_0) \qquad (7.2.8)$$
Clearly $T$ is defined and linear on $\mathcal{D}(\Omega)$ and if $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ then
$$|T(\phi_n) - T(\phi)| = |\phi_n(x_0) - \phi(x_0)| \to 0 \qquad (7.2.9)$$
since $\phi_n \to \phi$ uniformly on $\Omega$. We claim that $T$ is not of the form $T_f$ for any $f \in L^1_{loc}(\Omega)$ (i.e. not a regular distribution). To see this, suppose some such $f$ existed. We would then have
$$\int_\Omega f(x)\phi(x)\,dx = 0 \qquad (7.2.10)$$
for any test function $\phi$ with $\phi(x_0) = 0$. In particular if $\Omega' = \Omega \backslash \{x_0\}$ and $\phi \in \mathcal{D}(\Omega')$ then defining $\phi(x_0) = 0$ we clearly have $\phi \in \mathcal{D}(\Omega)$ and $T(\phi) = 0$, hence $f = 0$ a.e. on $\Omega'$ and so on $\Omega$, by Theorem 7.1. On the other hand we must also have, for any $\phi \in \mathcal{D}(\Omega)$, that
$$\phi(x_0) = T(\phi) = \int_\Omega f(x)\phi(x)\,dx \qquad (7.2.11a)$$
$$= \int_\Omega f(x)(\phi(x) - \phi(x_0))\,dx + \phi(x_0)\int_\Omega f(x)\,dx = \phi(x_0)\int_\Omega f(x)\,dx \qquad (7.2.11b)$$
since $f = 0$ a.e. on $\Omega$, and therefore $\int_\Omega f(x)\,dx = 1$, a contradiction.

Note that $f(x) = 0$ for a.e. $x \in \Omega$ and $\int_\Omega f(x)\,dx = 1$ are precisely the formal properties of the delta function mentioned in Example 7.2. We define $T$ to be the Dirac delta distribution with singularity at $x_0$, usually denoted $\delta_{x_0}$, or simply $\delta$ in the case $x_0 = 0$. By an acceptable abuse of notation, pretending that $\delta$ is an actual function, we may write a formula like
$$\int_\Omega \delta(x)\phi(x)\,dx = \phi(0) \qquad (7.2.12)$$
but we emphasize that this is simply a formal expression of (7.2.8), and any rigorous arguments must make use of (7.2.8) directly. In the same formal sense $\delta_{x_0}(x) = \delta(x - x_0)$ so that
$$\int_\Omega \delta(x - x_0)\phi(x)\,dx = \phi(x_0) \qquad (7.2.13)$$
Example 7.5. Fix a point $x_0 \in \Omega$ and a multiindex $\alpha$, and define
$$T(\phi) = (D^\alpha \phi)(x_0) \qquad (7.2.14)$$
One may show, as in the previous example, that $T \in \mathcal{D}'(\Omega)$.

Example 7.6. Let $\Sigma$ be a sufficiently smooth hypersurface in $\Omega$ of dimension $m \le N - 1$ and define
$$T(\phi) = \int_\Sigma \phi(x)\,ds(x) \qquad (7.2.15)$$
where $ds$ is the surface area element on $\Sigma$. Then $T$ is a distribution on $\Omega$, sometimes referred to as the delta distribution concentrated on $\Sigma$, sometimes written as $\delta_\Sigma$.
Example 7.7. Let $\Omega = \mathbb{R}$ and define
$$T(\phi) = \lim_{\epsilon \to 0+} \int_{|x| > \epsilon} \frac{\phi(x)}{x}\,dx \qquad (7.2.16)$$
As we'll show below, the indicated limit always exists and is finite for $\phi \in \mathcal{D}(\Omega)$ (even for $\phi \in C_0^1(\Omega)$). In general, a limit of the form
$$\lim_{\epsilon \to 0+} \int_{\Omega \cap \{|x - a| > \epsilon\}} f(x)\,dx \qquad (7.2.17)$$
when it exists, is called the Cauchy principal value of $\int_\Omega f(x)\,dx$, which may be finite even when $\int_\Omega f(x)\,dx$ is divergent in the ordinary sense. For example $\int_{-1}^1 \frac{dx}{x}$ is divergent, regarded as either a Lebesgue integral or an improper Riemann integral, but
$$\lim_{\epsilon \to 0+} \int_{1 > |x| > \epsilon} \frac{dx}{x} = 0 \qquad (7.2.18)$$
To distinguish the principal value meaning of the integral, the notation
$$\mathrm{pv} \int_\Omega f(x)\,dx \qquad (7.2.19)$$
may be used instead of (7.2.17), where the point $a$ in question must be clear from context.
Let us now check that (7.2.16) defines a distribution. If $\operatorname{supp} \phi \subset [-M, M]$ then since
$$\int_{|x|>\epsilon} \frac{\phi(x)}{x}\,dx = \int_{M>|x|>\epsilon} \frac{\phi(x)}{x}\,dx = \int_{M>|x|>\epsilon} \frac{\phi(x) - \phi(0)}{x}\,dx + \phi(0)\int_{M>|x|>\epsilon} \frac{1}{x}\,dx \qquad (7.2.20)$$
and the last term on the right is zero, we have
$$T(\phi) = \lim_{\epsilon \to 0+} \int_{M>|x|>\epsilon} \psi(x)\,dx \qquad (7.2.21)$$
where $\psi(x) = (\phi(x) - \phi(0))/x$. It now follows from the mean value theorem that
$$|T(\phi)| \le \int_{|x|<M} |\psi(x)|\,dx \le 2M\|\phi'\|_{L^\infty} \qquad (7.2.22)$$
so $T(\phi)$ is defined and finite for all test functions. Linearity of $T$ is clear, and if $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ then
$$|T(\phi_n) - T(\phi)| \le 2M\|\phi_n' - \phi'\|_{L^\infty} \to 0 \qquad (7.2.23)$$
where $M$ is chosen so that $\operatorname{supp} \phi_n, \operatorname{supp} \phi \subset [-M, M]$, and it follows that $T$ is continuous.

The distribution $T$ is often denoted $\mathrm{pv}\frac{1}{x}$, so for example $\mathrm{pv}\frac{1}{x}(\phi)$ means the same thing as the right hand side of (7.2.16). For reasons which will become more clear later, it may also be referred to as $\mathrm{pf}\frac{1}{x}$, pf standing for pseudofunction (also finite part).
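The symmetric cancellation used in (7.2.20) also suggests a practical way to evaluate pv integrals numerically: pairing $x$ with $-x$ replaces $\phi(x)/x$ by the bounded integrand $(\phi(x) - \phi(-x))/x$. A sketch, assuming the illustrative (non-compactly-supported) choice $\phi(x) = e^{-x^2}(1+x)$, for which the pairing reduces the integrand to $2e^{-x^2}$ and the principal value equals $\sqrt{\pi}$; all names are ours:

```python
import math

def pv_integral(phi, eps, M=10.0, n=100000):
    """Midpoint-rule approximation of the truncated integral
    over eps < |x| < M of phi(x)/x dx; pairing x with -x as in (7.2.20)
    turns the integrand into (phi(x) - phi(-x))/x, which is bounded."""
    h = (M - eps) / n
    total = 0.0
    for k in range(n):
        xk = eps + (k + 0.5) * h
        total += (phi(xk) - phi(-xk)) / xk * h
    return total

phi = lambda t: math.exp(-t * t) * (1.0 + t)
for eps in (0.1, 0.01, 0.001):
    print(pv_integral(phi, eps))   # approaches sqrt(pi) = 1.7724... as eps -> 0
```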
7.3 Algebra and Calculus with Distributions

7.3.1 Multiplication of distributions

As noted above $\mathcal{D}'(\Omega)$ is a vector space, hence distributions can be added and multiplied by scalars. In general it is not possible to multiply together arbitrary distributions; for example $\delta^2 = \delta \cdot \delta$ cannot be defined in any consistent way. It is always possible, however, to multiply a distribution by a $C^\infty$ function. More precisely, if $a \in C^\infty(\Omega)$ and $T \in \mathcal{D}'(\Omega)$ then we may define the product $aT$ as a distribution via

Definition 7.4. $aT(\phi) = T(a\phi)$ for $\phi \in \mathcal{D}(\Omega)$.

Clearly $a\phi \in \mathcal{D}(\Omega)$ so that the right hand side is well defined, and it is straightforward to check that $aT$ satisfies the necessary linearity and continuity conditions. One should also note that if $T = T_f$ then this definition is consistent with ordinary pointwise multiplication of the functions $f$ and $a$.

7.3.2 Convergence of distributions

An appropriate definition of convergence of a sequence of distributions is as follows.

Definition 7.5. If $T, T_n \in \mathcal{D}'(\Omega)$ for $n = 1, 2, \dots$ then we say $T_n \to T$ in $\mathcal{D}'(\Omega)$ (or in the sense of distributions) if $T_n(\phi) \to T(\phi)$ for every $\phi \in \mathcal{D}(\Omega)$.

It is an interesting fact, which we shall not prove here, that it is not necessary to assume that the limit $T$ belongs to $\mathcal{D}'(\Omega)$; that is to say, if $T(\phi) := \lim_{n\to\infty} T_n(\phi)$ exists for every $\phi \in \mathcal{D}(\Omega)$ then necessarily $T \in \mathcal{D}'(\Omega)$ (see Theorem 6.17 of [30]).

Example 7.8. If $f_n \in L^1_{loc}(\Omega)$ and $f_n \to f$ in $L^1_{loc}(\Omega)$ then the corresponding distributions $T_{f_n} \to T_f$ in the sense of distributions, since
$$|T_{f_n}(\phi) - T_f(\phi)| \le \int_K |f_n - f||\phi|\,dx \le \|f_n - f\|_{L^1(K)} \|\phi\|_{L^\infty(\Omega)} \qquad (7.3.1)$$
where $K$ is the support of $\phi$. Because of the one-to-one correspondence $f \leftrightarrow T_f$, we will usually write instead that $f_n \to f$ in the sense of distributions.
Example 7.9. Define
$$f_n(x) = \begin{cases} n & 0 < x < \frac{1}{n} \\ 0 & \text{otherwise} \end{cases} \qquad (7.3.2)$$
We claim that $f_n \to \delta$ in the sense of distributions. We see this by first observing that
$$|T_{f_n}(\phi) - \delta(\phi)| = \Big| n\int_0^{1/n} \phi(x)\,dx - \phi(0) \Big| = \Big| n\int_0^{1/n} (\phi(x) - \phi(0))\,dx \Big| \qquad (7.3.3)$$
By the continuity of $\phi$, if $\epsilon > 0$ there exists $\eta > 0$ such that $|\phi(x) - \phi(0)| \le \epsilon$ whenever $|x| \le \eta$. Thus if we choose $n > 1/\eta$ there follows
$$n\int_0^{1/n} |\phi(x) - \phi(0)|\,dx \le n\epsilon \int_0^{1/n} dx = \epsilon \qquad (7.3.4)$$
from which the conclusion follows.

Note that the formal properties of the $\delta$ function, $\delta(x) = 0$ for $x \ne 0$, $\delta(0) = +\infty$, $\int \delta(x)\,dx = 1$, are clearly reflected in the pointwise limit of the sequence $f_n$, but it is only the distributional definition that is mathematically satisfactory.

Sequences converging to $\delta$ play a very large role in methods of applied mathematics, especially in the theory of differential and integral equations. The following theorem includes many cases of interest.
Theorem 7.2. Suppose $f_n \in L^1(\mathbb{R}^N)$ for $n = 1, 2, \dots$ and assume

a) $\int_{\mathbb{R}^N} f_n(x)\,dx = 1$ for all $n$.

b) There exists a constant $C$ such that $\|f_n\|_{L^1(\mathbb{R}^N)} \le C$ for all $n$.

c) $\lim_{n\to\infty} \int_{|x|>\eta} |f_n(x)|\,dx = 0$ for all $\eta > 0$.

If $\phi$ is bounded on $\mathbb{R}^N$ and continuous at $x = 0$ then
$$\lim_{n\to\infty} \int_{\mathbb{R}^N} f_n(x)\phi(x)\,dx = \phi(0) \qquad (7.3.5)$$
and in particular $f_n \to \delta$ in $\mathcal{D}'(\mathbb{R}^N)$.

Proof: For any such $\phi$ we have
$$\int_{\mathbb{R}^N} f_n(x)\phi(x)\,dx - \phi(0) = \int_{\mathbb{R}^N} f_n(x)(\phi(x) - \phi(0))\,dx \qquad (7.3.6)$$
and so we will be done if we show that the integral on the right tends to zero as $n \to \infty$. Fix $\epsilon > 0$ and choose $\eta > 0$ such that $|\phi(x) - \phi(0)| \le \epsilon$ whenever $|x| < \eta$. Write the integral on the right in (7.3.6) as the sum $A_{n,\eta} + B_{n,\eta}$ where
$$A_{n,\eta} = \int_{|x| \le \eta} f_n(x)(\phi(x) - \phi(0))\,dx \qquad B_{n,\eta} = \int_{|x| > \eta} f_n(x)(\phi(x) - \phi(0))\,dx \qquad (7.3.7)$$
We then have, by obvious estimations, that
$$|A_{n,\eta}| \le \epsilon \int_{\mathbb{R}^N} |f_n(x)|\,dx \le C\epsilon \qquad (7.3.8)$$
while
$$\limsup_{n\to\infty} |B_{n,\eta}| \le \limsup_{n\to\infty} 2\|\phi\|_{L^\infty} \int_{|x|>\eta} |f_n(x)|\,dx = 0 \qquad (7.3.9)$$
Thus
$$\limsup_{n\to\infty} \Big| \int_{\mathbb{R}^N} f_n(x)\phi(x)\,dx - \phi(0) \Big| \le C\epsilon \qquad (7.3.10)$$
and the conclusion follows since $\epsilon > 0$ is arbitrary. $\Box$
It is often the case that $f_n \ge 0$ for all $n$, in which case assumption b) follows automatically from a) with $C = 1$. We will refer to any sequence satisfying the assumptions of Theorem 7.2 as a delta sequence. A common way to construct such a sequence is to pick any $f \in L^1(\mathbb{R}^N)$ with $\int_{\mathbb{R}^N} f(x)\,dx = 1$ and set
$$f_n(x) = n^N f(nx) \qquad (7.3.11)$$
The verification of this is left to the exercises. If, for example, we choose $f(x) = \chi_{[0,1]}(x)$, then the resulting sequence $f_n(x)$ is the same as is defined in (7.3.2). Since we can also choose such an $f$ in $\mathcal{D}(\mathbb{R}^N)$ we also have

Proposition 7.1. There exists a sequence $\{f_n\}_{n=1}^\infty$ such that $f_n \in \mathcal{D}(\mathbb{R}^N)$ and $f_n \to \delta$ in $\mathcal{D}'(\mathbb{R}^N)$.
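The convergence asserted in Theorem 7.2 is easy to observe numerically for the scaled sequence (7.3.11). A sketch, assuming $N = 1$, $f = \chi_{[0,1)}$ and the bounded function $\phi(x) = \cos x$ (the helper names are ours); the exact value of the pairing is $n \sin(1/n) \to 1 = \phi(0)$:

```python
import math

def f(t):
    """An f in L^1(R) with integral 1: the indicator function of [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def f_n(t, n):
    return n * f(n * t)          # the rescaled sequence (7.3.11)

def pair(n, phi, M=2.0, steps=200000):
    """Midpoint-rule approximation of the pairing of f_n with phi on [-M, M]."""
    h = 2.0 * M / steps
    return sum(f_n(-M + (k + 0.5) * h, n) * phi(-M + (k + 0.5) * h) * h
               for k in range(steps))

for n in (1, 10, 100):
    print(pair(n, math.cos))     # n*sin(1/n): tends to cos(0) = 1
```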
7.3.3 Derivative of a distribution

Next we explain how it is possible to define the derivative of an arbitrary distribution. For the moment, suppose $(a, b) \subset \mathbb{R}$, $f \in C^1(a, b)$ and $T = T_f$ is the corresponding distribution. We clearly then have from integration by parts that
$$T_{f'}(\phi) = \int_a^b f'(x)\phi(x)\,dx = -\int_a^b f(x)\phi'(x)\,dx = -T_f(\phi') \qquad (7.3.12)$$
This suggests defining
$$T'(\phi) = -T(\phi') \qquad \phi \in C_0^\infty(a, b) \qquad (7.3.13)$$
whenever $T \in \mathcal{D}'(a, b)$. The previous equation shows that this definition is consistent with the ordinary concept of differentiability for $C^1$ functions. Clearly, $T'(\phi)$ is always defined, since $\phi'$ is a test function whenever $\phi$ is, linearity of $T'$ is obvious, and if $\phi_n \to \phi$ in $C_0^\infty(a, b)$ then $\phi_n' \to \phi'$ also in $C_0^\infty(a, b)$ so that
$$T'(\phi_n) = -T(\phi_n') \to -T(\phi') = T'(\phi) \qquad (7.3.14)$$
Thus, $T' \in \mathcal{D}'(a, b)$.
Example: Consider the case of the Heaviside (unit step) function $H(x)$
$$H(x) = \begin{cases} 0 & x < 0 \\ 1 & x > 0 \end{cases} \qquad (7.3.15)$$
If we seek the derivative of $H$ (i.e. of $T_H$) according to the above distributional definition, then we compute
$$H'(\phi) = -H(\phi') = -\int_{-\infty}^\infty H(x)\phi'(x)\,dx = -\int_0^\infty \phi'(x)\,dx = \phi(0) \qquad (7.3.16)$$
(where we use the natural notation $H'$ in place of $T_H'$). This means that $H'(\phi) = \delta(\phi)$ for any test function $\phi$, and so $H' = \delta$ in the sense of distributions. This relationship clearly captures the fact that $H' = 0$ at all points where the derivative exists in the classical sense, since we think of the delta function as being zero on any interval not containing the origin. Since $H$ is not differentiable at the origin, the distributional derivative is itself a distribution which is not a function.
Since
is again a distribution, it will itself have a derivative, namely
0
( 0) =
( )=
0
(0)
(7.3.17)
a distribution of the type discussed in Example 7.5, often referred to as the dipole distribution, which of course we may regard as the second derivative of H.
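The computation (7.3.16) can be illustrated numerically. In the sketch below (the test function, quadrature window, and step count are our own choices, with rapid decay standing in for compact support), −∫ H(x)φ′(x) dx reproduces φ(0).

```python
import math

def phi(x):                 # smooth test function; rapid decay stands in
    return math.exp(-x * x) # for compact support

def dphi(x):                # its derivative phi'
    return -2 * x * math.exp(-x * x)

def H(x):                   # Heaviside function (7.3.15)
    return 1.0 if x > 0 else 0.0

# action of H' per (7.3.16):  H'(phi) = -integral H(x) phi'(x) dx
a, b, steps = -10.0, 10.0, 100000
h = (b - a) / steps
Hprime_phi = -sum(H(a + (i + 0.5) * h) * dphi(a + (i + 0.5) * h)
                  for i in range(steps)) * h
print(Hprime_phi)           # should be close to phi(0) = 1
```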
For an arbitrary domain Ω ⊂ R^N and sufficiently smooth function f we have the similar integration by parts formula (see (18.2.3))

∫_Ω (∂f/∂x_i) φ dx = −∫_Ω f (∂φ/∂x_i) dx    (7.3.18)

leading to the definition

Definition 7.6.

(∂T/∂x_i)(φ) = −T(∂φ/∂x_i)    φ ∈ D(Ω)    (7.3.19)

As in the one dimensional case we easily check that ∂T/∂x_i belongs to D′(Ω) whenever T does. This has the far reaching consequence that every distribution is infinitely differentiable in the sense of distributions. Furthermore we have the general formula, obtained by repeated application of the basic definition, that

(D^α T)(φ) = (−1)^{|α|} T(D^α φ)    (7.3.20)

for any multi-index α.
A simple and useful property is

Proposition 7.2. If T_n → T in D′(Ω) then D^α T_n → D^α T in D′(Ω) for any multi-index α.

Proof: D^α T_n(φ) = (−1)^{|α|} T_n(D^α φ) → (−1)^{|α|} T(D^α φ) = D^α T(φ) for any test function φ. □
Next we consider a more generic one dimensional situation. Let x₀ ∈ R and consider a function f which is C^∞ on (−∞, x₀) and on (x₀, ∞), and for which f^(k) has finite left and right hand limits at x = x₀, for any k. Thus, at the point x = x₀, f or any of its derivatives may have a jump discontinuity, and we denote

σ_k f = lim_{x→x₀+} f^(k)(x) − lim_{x→x₀−} f^(k)(x)    (7.3.21)

(and by convention σf = σ₀f.) Define also

[f^(k)](x) = f^(k)(x) for x ≠ x₀,  undefined for x = x₀    (7.3.22)

which we'll refer to as the pointwise k'th derivative. The notation f^(k) will always be understood to mean the distributional derivative unless otherwise stated. The distinction between f^(k) and [f^(k)] is crucial; for example if f(x) = H(x), the Heaviside function, then H′ = δ but [H′] = 0 for x ≠ 0, and is undefined for x = 0.
For f as described above, we now proceed to calculate the distributional derivative. If φ ∈ C₀^∞(R) we have

−∫_{−∞}^{∞} f(x)φ′(x) dx = −∫_{−∞}^{x₀} f(x)φ′(x) dx − ∫_{x₀}^{∞} f(x)φ′(x) dx    (7.3.23a)

= −f(x)φ(x)|_{−∞}^{x₀} + ∫_{−∞}^{x₀} f′(x)φ(x) dx − f(x)φ(x)|_{x₀}^{∞} + ∫_{x₀}^{∞} f′(x)φ(x) dx    (7.3.23b)

= ∫_{−∞}^{∞} [f′(x)]φ(x) dx + (f(x₀+) − f(x₀−))φ(x₀)    (7.3.23c)

It follows that

f′(φ) = ∫_{−∞}^{∞} [f′(x)]φ(x) dx + (σf)φ(x₀)    (7.3.24)

or

f′ = [f′] + (σf)δ(x − x₀)    (7.3.25)

Note in particular that f′ = [f′] if and only if f is continuous at x₀.
The function [f′] satisfies all of the same assumptions as f itself, with σ_k[f′] = σ_{k+1}f, thus we can differentiate again in the distribution sense to obtain

f″ = [f′]′ + (σf)δ′(x − x₀) = [f″] + (σ₁f)δ(x − x₀) + (σf)δ′(x − x₀)    (7.3.26)

Here we use the evident fact that the distributional derivative of δ(x − x₀) is δ′(x − x₀).
A similar calculation can be carried out for higher derivatives of f, leading to the general formula

f^(k) = [f^(k)] + Σ_{j=0}^{k−1} (σ_j f) δ^(k−1−j)(x − x₀)    (7.3.27)

One can also obtain a similar formula if f is allowed to have any finite number of such singular points.
Example 7.10. Let

f(x) = x for x < 0,  cos x for x > 0    (7.3.28)

Clearly f satisfies all of the assumptions mentioned above with x₀ = 0, and

[f′](x) = 1 for x < 0,  −sin x for x > 0    (7.3.29)

[f″](x) = 0 for x < 0,  −cos x for x > 0    (7.3.30)

so that σf = 1, σ₁f = −1. Thus

f′ = [f′] + δ    f″ = [f″] − δ + δ′    (7.3.31)
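The first identity in (7.3.31) can be checked numerically: applied to a test function, the distributional derivative −∫ fφ′ dx should agree with ∫ [f′]φ dx plus the jump term, writing σf = f(0+) − f(0−) = 1 for the jump. The sketch below uses our own choices of test function and quadrature, with rapid decay standing in for compact support.

```python
import math

def f(x):                    # Example 7.10: f(x) = x (x < 0), cos x (x > 0)
    return x if x < 0 else math.cos(x)

def fp(x):                   # pointwise derivative [f'](x), x != 0
    return 1.0 if x < 0 else -math.sin(x)

def phi(x):  return math.exp(-x * x)          # test function stand-in
def dphi(x): return -2 * x * math.exp(-x * x)

a, b, steps = -10.0, 10.0, 100000
h = (b - a) / steps
xs = [a + (i + 0.5) * h for i in range(steps)]

lhs = -sum(f(x) * dphi(x) for x in xs) * h              # f'(phi) = -f(phi')
rhs = sum(fp(x) * phi(x) for x in xs) * h + phi(0.0)    # [f'](phi) + (jump) phi(0)
print(lhs, rhs)              # the two actions should nearly coincide
```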
Here is one more instructive example in the one dimensional case.

Example 7.11. Let

f(x) = log x for x > 0,  0 for x ≤ 0    (7.3.32)

Since f ∈ L¹_loc(R) we may regard it as a distribution on R, but its pointwise derivative H(x)/x is not locally integrable, so does not have an obvious distributional meaning. Nevertheless f′ must exist in the sense of D′(R). To find it we use the definition above,

f′(φ) = −f(φ′) = −∫_0^{∞} φ′(x) log x dx = −lim_{ε→0+} ∫_ε^{∞} φ′(x) log x dx    (7.3.33)

= lim_{ε→0+} [ φ(ε) log ε + ∫_ε^{∞} (φ(x)/x) dx ]    (7.3.34)

= lim_{ε→0+} [ φ(0) log ε + ∫_ε^{∞} (φ(x)/x) dx ]    (7.3.35)
where the final equality is valid because the difference between it and the previous line is lim_{ε→0} (φ(ε) − φ(0)) log ε = 0. The functional defined by the final expression above will be denoted pf(H(x)/x), i.e.

pf(H(x)/x)(φ) = lim_{ε→0+} [ φ(0) log ε + ∫_ε^{∞} (φ(x)/x) dx ]    (7.3.36)
Since we have already established that the derivative of a distribution is also a distribution, it follows that pf(H(x)/x) ∈ D′(R), and in particular the limit here always exists for φ ∈ D(R). It should be emphasized that if φ(0) ≠ 0 then neither of the two terms on the right hand side in (7.3.36) will have a finite limit separately, but the sum always will. For a test function φ with support disjoint from the singularity at x = 0, the action of the distribution pf(H(x)/x) coincides with that of the ordinary function H(x)/x, as we might expect.
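The convergence of the bracketed expression in (7.3.36) can be observed numerically. Here we take φ(x) = e^{−x}, which is not a test function but decays fast enough for the integral; in this particular case the limit works out to −γ ≈ −0.5772, the Euler constant, since ∫_ε^∞ e^{−x}/x dx is the exponential integral E₁(ε). The substitution x = e^t (our choice) keeps the quadrature well behaved near x = 0.

```python
import math

def phi(x):
    return math.exp(-x)      # phi(0) = 1; only x > 0 matters here

def pf_action(eps, b=50.0, steps=100000):
    # phi(0) log(eps) + integral_eps^b phi(x)/x dx, substituting x = e^t
    # so the integrand is smooth; truncation at b is harmless for e^{-x}
    t0, t1 = math.log(eps), math.log(b)
    h = (t1 - t0) / steps
    integral = sum(phi(math.exp(t0 + (i + 0.5) * h)) for i in range(steps)) * h
    return phi(0.0) * math.log(eps) + integral

for eps in (1e-2, 1e-3, 1e-4):
    print(eps, pf_action(eps))   # stabilizes near -0.5772 as eps -> 0
```

Each individual term blows up like log ε, but their sum settles down, exactly as claimed after (7.3.36).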
Next we turn to examples involving partial derivatives.

Example 7.12. Let F ∈ L¹_loc(R) and set u(x, t) = F(x + t). We claim that u_tt − u_xx = 0 in D′(R²). Recall that this is the point that was raised in the first example at the beginning of this chapter. A similar argument works for F(x − t). To verify this claim, first observe that for any φ ∈ D(R²)

(u_tt − u_xx)(φ) = u(φ_tt − φ_xx) = ∫∫_{R²} F(x + t)(φ_tt(x, t) − φ_xx(x, t)) dx dt    (7.3.37)

Make the change of coordinates

ξ = x − t    η = x + t    (7.3.38)

to obtain

(u_tt − u_xx)(φ) = −2 ∫_{−∞}^{∞} F(η) ∫_{−∞}^{∞} φ_ξη(ξ, η) dξ dη = −2 ∫_{−∞}^{∞} F(η) φ_η(ξ, η)|_{ξ=−∞}^{ξ=∞} dη = 0    (7.3.39)

since φ has compact support.
(Recall that the pf notation was introduced earlier in Section 7.2.)

Example 7.13. Let N ≥ 3 and define

u(x) = 1/|x|^{N−2}    (7.3.40)

We claim that

Δu = C_N δ in D′(R^N)    (7.3.41)

where C_N = (2 − N)Ω_{N−1} and Ω_{N−1} is the surface area of the unit sphere in R^N. (The usual notation is to use N − 1 rather than N as the subscript because the sphere is a surface of dimension N − 1.) First note that for any R we have

∫_{|x|<R} |u(x)| dx = Ω_{N−1} ∫_0^R (1/r^{N−2}) r^{N−1} dr < ∞    (7.3.42)

(using, for example, (18.3.1)) so u ∈ L¹_loc(R^N) and in particular u ∈ D′(R^N).

It is natural here to use spherical coordinates in R^N, see Section 18.3 for a review. In particular the expression for the Laplacian in spherical coordinates may be derived from the chain rule, as was done in (2.3.67) for the two dimensional case. When applied to a function depending only on r = |x|, such as u, the result is

Δu = u_rr + ((N − 1)/r) u_r    (7.3.43)

(see Exercise 17 of Chapter 2) and it follows that Δu(x) = 0 for x ≠ 0.

We may use Green's identity (18.2.6) to obtain, for any φ ∈ D(R^N),

Δu(φ) = u(Δφ) = ∫_{R^N} u(x) Δφ(x) dx = lim_{ε→0+} ∫_{|x|>ε} u(x) Δφ(x) dx    (7.3.44)

= lim_{ε→0+} [ ∫_{|x|>ε} Δu(x) φ(x) dx + ∫_{|x|=ε} ( u(x) (∂φ/∂n)(x) − (∂u/∂n)(x) φ(x) ) dS(x) ]    (7.3.45)

Since Δu = 0 for x ≠ 0 and ∂/∂n = −∂/∂r on {x : |x| = ε}, this simplifies to

Δu(φ) = lim_{ε→0+} ∫_{|x|=ε} ( ((2 − N)/ε^{N−1}) φ(x) − (1/ε^{N−2}) (∂φ/∂r)(x) ) dS(x)    (7.3.46)

We next observe that

lim_{ε→0+} ∫_{|x|=ε} ((2 − N)/ε^{N−1}) φ(x) dS(x) = (2 − N) Ω_{N−1} φ(0)    (7.3.47)

since the average of φ over the sphere of radius ε converges to φ(0) as ε → 0. Finally, the second integral tends to zero, since

| ∫_{|x|=ε} (1/ε^{N−2}) (∂φ/∂r)(x) dS(x) | ≤ (Ω_{N−1} ε^{N−1}/ε^{N−2}) ||∇φ||_{L^∞} → 0    (7.3.48)

Thus (7.3.41) holds. When N = 2 an analogous calculation shows that if u(x) = log |x| then Δu = 2π δ in D′(R²).
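For a radial test function φ the claim (7.3.41) reduces, via (7.3.43), to a one dimensional integral, which makes a numerical check easy. The sketch below takes N = 3, so that C₃ φ(0) = −4π φ(0), and φ(r) = e^{−r²} (a rapidly decaying stand-in for a test function; all quadrature details are our own choices).

```python
import math

def phi(r):   return math.exp(-r * r)     # radial test function stand-in
def dphi(r):  return -2 * r * phi(r)      # phi'
def d2phi(r): return (4 * r * r - 2) * phi(r)  # phi''

# Delta u (phi) = u(Delta phi); for radial phi in R^3, using (7.3.43),
# this is  4 pi * integral_0^inf (1/r) (phi'' + 2 phi'/r) r^2 dr
steps, b = 100000, 10.0
h = b / steps
val = 0.0
for i in range(steps):
    r = (i + 0.5) * h
    val += (1.0 / r) * (d2phi(r) + 2.0 * dphi(r) / r) * 4.0 * math.pi * r * r * h
print(val, -4.0 * math.pi * phi(0.0))     # these should nearly agree
```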
7.4 Convolution and distributions

If f, g are locally integrable functions on R^N the classical convolution of f and g is defined to be

(f ∗ g)(x) = ∫_{R^N} f(x − y) g(y) dy    (7.4.1)

whenever the integral is defined. By an obvious change of variable we see that convolution is commutative, f ∗ g = g ∗ f.

Proposition 7.3. If f ∈ L^p(R^N) and g ∈ L^q(R^N) then f ∗ g ∈ L^r(R^N) if 1 + 1/r = 1/p + 1/q, so in particular is defined almost everywhere. Furthermore

||f ∗ g||_{L^r(R^N)} ≤ ||f||_{L^p(R^N)} ||g||_{L^q(R^N)}    (7.4.2)

The inequality (7.4.2) is Young's convolution inequality, and we refer to [37] (Theorem 9.2) for a proof. In the case r = ∞ it can actually be shown that f ∗ g ∈ C(R^N).

Our goal here is to generalize the definition of convolution in such a way that at least one of the two factors can be a distribution. Let us introduce the notations for translation and inversion of a function f,

(τ_h f)(x) = f(x − h)    (7.4.3)

f̌(x) = f(−x)    (7.4.4)

so that f(x − y) = (τ_x f̌)(y). If f ∈ D(R^N) then so is τ_x f̌, so that (f ∗ g)(x) may be regarded as T_g(τ_x f̌), i.e. the value obtained when the distribution corresponding to the locally integrable function g acts on the test function τ_x f̌. This motivates the following definition.
Definition 7.7. If T ∈ D′(R^N) and φ ∈ D(R^N) then (T ∗ φ)(x) = T(τ_x φ̌).

By this definition (T ∗ φ)(x) exists and is finite for every x ∈ R^N, but other smoothness or decay properties of T ∗ φ may not be apparent.

Example 7.14. If T = δ then

(T ∗ φ)(x) = δ(τ_x φ̌) = (τ_x φ̌)(y)|_{y=0} = φ(x − y)|_{y=0} = φ(x)    (7.4.5)

Thus, δ is the 'convolution identity', δ ∗ φ = φ, at least for φ ∈ D(R^N). Formally this corresponds to the widely used formula

∫_{R^N} δ(x − y) φ(y) dy = φ(x)    (7.4.6)

If T_n → δ in D′(R^N) then likewise

(T_n ∗ φ)(x) = T_n(τ_x φ̌) → δ(τ_x φ̌) = φ(x)    (7.4.7)

for any fixed x ∈ R^N.
A key property of convolution is that in computing a derivative D^α(T ∗ φ), the derivative may be applied to either factor in the convolution. More precisely we have the following theorem.

Theorem 7.3. If T ∈ D′(R^N) and φ ∈ D(R^N) then T ∗ φ ∈ C^∞(R^N) and for any multi-index α

D^α(T ∗ φ) = (D^α T) ∗ φ = T ∗ (D^α φ)    (7.4.8)

Proof: First observe that

(−1)^{|α|} D^α(τ_x φ̌) = τ_x((D^α φ)ˇ)    (7.4.9)

and applying T to these identical test functions we get the right hand equality in (7.4.8). We refer to Theorem 6.30 of [30] for the proof of the left hand equality. □

When f, g are continuous functions of compact support it is elementary to see that supp(f ∗ g) ⊂ supp f + supp g. The same property holds for T ∗ φ if T ∈ D′(R^N) and φ ∈ D(R^N), once a proper definition of the support of a distribution is given.
If ω ⊂ Ω is an open set we say that T = 0 in ω if T(φ) = 0 whenever φ ∈ D(Ω) and supp(φ) ⊂ ω. If W denotes the largest open subset of Ω on which T = 0 (equivalently the union of all open subsets of Ω on which T = 0) then the support of T is the complement of W in Ω. In other words, x ∉ supp T if there exists ε > 0 such that T(φ) = 0 whenever φ is a test function with support in B(x, ε). One can easily verify that the support of a distribution is closed, and agrees with the usual notion of support of a function, up to sets of measure zero. The set of distributions of compact support in Ω forms a vector subspace of D′(Ω) denoted E′(Ω). This notation is appropriate because E′(Ω) turns out to be precisely the dual space of C^∞(R^N) =: E(R^N) when a suitable definition of convergence is given, see for example Chapter II, section 5 of [31].

If now T ∈ E′(R^N) and φ ∈ D(R^N), we observe that

supp(τ_x φ̌) = x − supp φ    (7.4.10)

Thus

(T ∗ φ)(x) = T(τ_x φ̌) = 0    (7.4.11)

unless there is a nonempty intersection of supp T and x − supp φ, in other words, unless x ∈ supp T + supp φ. Thus from these remarks and Theorem 7.3 we have

Proposition 7.4. If T ∈ E′(R^N) and φ ∈ D(R^N) then

supp(T ∗ φ) ⊂ supp T + supp φ    (7.4.12)

and in particular T ∗ φ ∈ D(R^N).
Convolution provides an extremely useful and convenient way to approximate functions and distributions by very smooth functions, the exact sense in which the approximation takes place being dependent on the object being approximated. We will discuss several results of this type.

Theorem 7.4. Let f ∈ C(R^N) with supp f compact in R^N. Pick φ ∈ D(R^N) with ∫_{R^N} φ(x) dx = 1, set φ_n(x) = n^N φ(nx) and f_n = f ∗ φ_n. Then f_n ∈ D(R^N) and f_n → f uniformly on R^N.

Proof: The fact that f_n ∈ D(R^N) is immediate from Proposition 7.4. Fix ε > 0. By the assumption that f is continuous and of compact support it must be uniformly continuous on R^N, so there exists δ > 0 such that |f(x) − f(z)| < ε if |x − z| < δ. Now choose n₀ such that supp φ_n ⊂ B(0, δ) for n > n₀. We then have, for n > n₀, that

|f_n(x) − f(x)| = | ∫_{R^N} (f(x − y) − f(x)) φ_n(y) dy |    (7.4.13)

≤ ∫_{|y|<δ} |f(x − y) − f(x)| |φ_n(y)| dy ≤ ε ||φ||_{L¹(R^N)}    (7.4.14)

and the conclusion follows. □
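Theorem 7.4 can be watched in action numerically. The sketch below (the hat function, the bump kernel, the grids, and the quadrature are all our own choices) mollifies a continuous compactly supported f with φ_n(x) = nφ(nx) and reports a sup-norm error, which shrinks as n grows.

```python
import math

def bump(x):                  # unnormalized C-infinity bump supported in [-1, 1]
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

M = 20000                     # normalize the bump so it integrates to 1
Z = sum(bump(-1.0 + (i + 0.5) * (2.0 / M)) for i in range(M)) * (2.0 / M)

def phi_n(n, x):              # phi_n(x) = n phi(n x), still of integral 1
    return n * bump(n * x) / Z

def f(x):                     # continuous, compactly supported "hat"
    return max(0.0, 1.0 - abs(x))

def mollify(n, x, steps=1000):
    # (f * phi_n)(x): the second factor is supported in |y| < 1/n
    a, b = -1.0 / n, 1.0 / n
    h = (b - a) / steps
    return sum(f(x - (a + (i + 0.5) * h)) * phi_n(n, a + (i + 0.5) * h)
               for i in range(steps)) * h

def sup_err(n):               # sup-norm error sampled on a grid
    xs = [-2.0 + 0.02 * k for k in range(201)]
    return max(abs(mollify(n, x) - f(x)) for x in xs)

e4, e16, e64 = sup_err(4), sup_err(16), sup_err(64)
print(e4, e16, e64)           # decreasing, as Theorem 7.4 predicts
```

The error is largest at the kinks of the hat, where it scales like 1/n, consistent with uniform continuity driving the proof above.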
If f is not assumed continuous then of course it is not possible for there to exist f_n ∈ D(R^N) converging uniformly to f. However the following can be shown.

Theorem 7.5. Let f ∈ L^p(R^N), 1 ≤ p < ∞. Pick φ ∈ D(R^N) with ∫_{R^N} φ(x) dx = 1, set φ_n(x) = n^N φ(nx) and f_n = f ∗ φ_n. Then f_n ∈ C^∞(R^N) ∩ L^p(R^N) and f_n → f in L^p(R^N).

Proof: If ε > 0 we can find g ∈ C(R^N) of compact support such that ||f − g||_{L^p(R^N)} < ε. If g_n = g ∗ φ_n then

||f − f_n||_{L^p(R^N)} ≤ ||f − g||_{L^p(R^N)} + ||g − g_n||_{L^p(R^N)} + ||f_n − g_n||_{L^p(R^N)}    (7.4.15)

≤ C ||f − g||_{L^p(R^N)} + ||g − g_n||_{L^p(R^N)}    (7.4.16)

where we have used Young's convolution inequality (7.4.2) to obtain

||f_n − g_n||_{L^p(R^N)} ≤ ||φ_n||_{L¹(R^N)} ||f − g||_{L^p(R^N)} = ||φ||_{L¹(R^N)} ||f − g||_{L^p(R^N)}    (7.4.17)

Since g_n → g uniformly by Theorem 7.4 and g − g_n has support in a fixed compact set independent of n, it follows that ||g − g_n||_{L^p(R^N)} → 0, and so lim sup_{n→∞} ||f − f_n||_{L^p(R^N)} ≤ Cε. □

Further refinements and variants of these results can be proved, see for example Section C.4 of [10].
Next consider the even more general case that T ∈ D′(R^N). As in Proposition 7.1 we can choose φ_n ∈ D(R^N) such that φ_n → δ in D′(R^N). Set T_n = T ∗ φ_n, so that T_n ∈ C^∞(R^N). If ψ ∈ D(R^N) we then have

T_n(ψ) = (T_n ∗ ψ̌)(0) = ((T ∗ φ_n) ∗ ψ̌)(0)    (7.4.18)

= (T ∗ (φ_n ∗ ψ̌))(0) = T((φ_n ∗ ψ̌)ˇ)    (7.4.19)

It may be checked that φ_n ∗ ψ̌ → ψ̌ in D(R^N), thus T_n(ψ) → T(ψ) for all ψ ∈ D(R^N), that is, T_n → T in D′(R^N).

In the above derivation we used associativity of convolution. This property is not completely obvious, and in fact is false in a more general setting in which convolution of two distributions is defined. For example, if we were to assume that convolution of distributions was always defined and that Theorem 7.3 holds, we would have 1 ∗ (δ′ ∗ H) = 1 ∗ H′ = 1 ∗ δ = 1, but (1 ∗ δ′) ∗ H = 0 ∗ H = 0. Nevertheless, associativity is correct in the case we have just used it, and we refer to [30], Theorem 6.30(c), for the proof.

The pattern of the results just stated is that T ∗ φ_n converges to T in the topology appropriate to the space that T itself belongs to, but this cannot be true in all situations which may be encountered. For example it cannot be true that if f ∈ L^∞ then f ∗ φ_n converges to f in L^∞, since this would amount to uniform convergence of a sequence of continuous functions, which is impossible if f itself is not continuous.
7.5 Exercises

1. Construct a test function ψ ∈ C₀^∞(R) with the following properties: 0 ≤ ψ(x) ≤ 1 for all x ∈ R, ψ(x) ≡ 1 for |x| < 1 and ψ(x) ≡ 0 for |x| > 2. (Suggestion: think about what ψ′ would have to look like.)

2. Show that

T(φ) = Σ_{n=1}^{∞} φ^(n)(n)

defines a distribution T ∈ D′(R).

3. If φ ∈ D(R) show that ψ(x) = (φ(x) − φ(0))/x (this function appeared in Example 7.7) belongs to C^∞(R). (Suggestion: first prove ψ(x) = ∫_0^1 φ′(xt) dt.)

4. Find the distributional derivative of f(x) = [x], the greatest integer function.

5. Find the distributional derivatives up through order four of f(x) = |x| sin x.

6. (For readers familiar with the concept of absolute continuity.) If f is absolutely continuous on (a, b) and f′ = g a.e., show that f′ = g in the sense of distributions on (a, b).
7. Let λ_n > 0, λ_n → +∞ and set

f_n(x) = sin λ_n x    g_n(x) = (sin λ_n x)/(πx)

a) Show that f_n → 0 in D′(R) as n → ∞.

b) Show that g_n → δ in D′(R) as n → ∞.

(You may use without proof the fact that the value of the improper integral ∫_{−∞}^{∞} (sin x)/x dx is π.)

8. Let φ ∈ C₀^∞(R) and f ∈ L¹(R).

a) If φ_n(x) = n(φ(x + 1/n) − φ(x)), show that φ_n → φ′ in C₀^∞(R). (Suggestion: use the mean value theorem over and over again.)

b) If g_n(x) = n(f(x + 1/n) − f(x)), show that g_n → f′ in D′(R).

9. Let T = pv(1/x). Find a formula analogous to (7.3.35) for the distributional derivative of T.

10. Find lim_{n→∞} sin² nx in D′(R), or show that it doesn't exist.
11. Define the distribution

T(φ) = ∫_{−∞}^{∞} φ(x, x) dx

for φ ∈ C₀^∞(R²). Show that T satisfies the wave equation u_xx − u_yy = 0 in the sense of distributions on R². Discuss why it makes sense to regard T as being δ(x − y).

12. Let Ω ⊂ R^N be a bounded open set and K ⊂⊂ Ω. Show that there exists ψ ∈ C₀^∞(Ω) such that 0 ≤ ψ(x) ≤ 1 and ψ(x) ≡ 1 for x ∈ K. (Hint: approximate the characteristic function of Σ by convolution, where Σ satisfies K ⊂⊂ Σ ⊂⊂ Ω. Use Proposition 7.4 for the needed support property.)

13. If a ∈ C^∞(Ω) and T ∈ D′(Ω) prove the product rule

(∂/∂x_j)(aT) = a (∂T/∂x_j) + (∂a/∂x_j) T

14. Let T ∈ D′(R^N). We may then regard φ ↦ Aφ = T ∗ φ as a linear mapping from C₀^∞(R^N) into C^∞(R^N). Show that A commutes with translations, that is, τ_h Aφ = A τ_h φ for any φ ∈ C₀^∞(R^N). (The following interesting converse statement can also be proved: if A : C₀^∞(R^N) → C(R^N) is continuous and commutes with translations, then there exists a unique T ∈ D′(R^N) such that Aφ = T ∗ φ. An operator commuting with translations is also said to be translation invariant.)

15. If f ∈ L¹(R^N), ∫_{R^N} f(x) dx = 1, and f_n(x) = n^N f(nx), use Theorem 7.2 to show that f_n → δ in D′(R^N).

16. Prove Theorem 7.1.
17. If T ∈ D′(Ω) prove the equality of mixed partial derivatives

∂²T/∂x_i∂x_j = ∂²T/∂x_j∂x_i    (7.5.1)

in the sense of distributions, and discuss why there is no contradiction with known examples from calculus showing that the mixed partial derivatives need not be equal.

18. Show that the expression

T(φ) = ∫_{−1}^{1} ((φ(x) − φ(0))/|x|) dx + ∫_{|x|>1} (φ(x)/|x|) dx

defines a distribution on R. Show also that xT = sgn x.

19. If f is a function defined on R^N and λ > 0, let f_λ(x) = f(λx). We say that f is homogeneous of degree α if f_λ = λ^α f for any λ > 0. If T is a distribution on R^N we say that T is homogeneous of degree α if

T(φ_λ) = λ^{−α−N} T(φ)

a) Show that these two definitions are consistent, i.e., if T = T_f for some f ∈ L¹_loc(R^N) then T is homogeneous of degree α if and only if f is homogeneous of degree α.

b) Show that the delta function is homogeneous of degree −N.

20. Show that u(x) = (1/2π) log |x| satisfies Δu = δ in D′(R²).

21. Without appealing to Theorem 7.3, give a direct proof of the fact that T ∗ φ is a continuous function of x, for T ∈ D′(R^N) and φ ∈ D(R^N).

22. Let

f(x) = log² x for x > 0,  0 for x < 0

Show that f ∈ D′(R) and find the distributional derivative f′. Is f a tempered distribution?

23. If a ∈ C^∞(R), show that aδ′ = a(0)δ′ − a′(0)δ.

24. If T ∈ D′(R^N) has compact support, show that T(φ) is defined in an unambiguous way for any φ ∈ C^∞(R^N) =: E(R^N). (Suggestion: write φ = ψφ + (1 − ψ)φ where ψ ∈ D(R^N) satisfies ψ ≡ 1 on the support of T.)
Chapter 8

Fourier analysis and distributions

In this chapter we present some of the elements of Fourier analysis, with special attention to those aspects arising in the theory of distributions. Fourier analysis is often viewed as made up of two parts, one being a collection of topics relating to Fourier series, and the second being those connected to the Fourier transform. The essential distinction is that the former focuses on periodic functions while the latter is concerned with functions defined on all of R^N. In either case the central question is that of how we may represent fairly arbitrary functions, or even distributions, as combinations of particularly simple periodic functions.

We will begin with Fourier series, and restrict attention to the one dimensional case. See for example [25] for a treatment of multidimensional Fourier series.
8.1 Fourier series in one space dimension

The fundamental point is that if u_n(x) = e^{inx} then the functions {u_n}_{n=−∞}^{∞} make up an orthogonal basis of L²(−π, π). It will then follow from the general considerations of Chapter 6 that any f ∈ L²(−π, π) may be expressed as a linear combination

f(x) = Σ_{n=−∞}^{∞} c_n e^{inx}    (8.1.1)

where

c_n = ⟨f, u_n⟩/⟨u_n, u_n⟩ = (1/2π) ∫_{−π}^{π} f(y) e^{−iny} dy    (8.1.2)

The right hand side of (8.1.1) is a Fourier series for f, and (8.1.2) is a formula for the n'th Fourier coefficient of f. It must be understood that the equality in (8.1.1) is meant only in the sense of L² convergence of the partial sums, and need not be true at any particular point. From the theory of Lebesgue integration it follows that there is a subsequence of the partial sums which will converge almost everywhere on (−π, π), but more than that we cannot say, without further assumptions on f. Any finite sum Σ_{n=−N}^{N} γ_n e^{inx} is called a trigonometric polynomial, so in particular we will be showing that trigonometric polynomials are dense in L²(−π, π).
Let us set

e_n(x) = (1/√(2π)) e^{inx}    n = 0, ±1, ±2, . . .    (8.1.3)

D_n(x) = (1/2π) Σ_{k=−n}^{n} e^{ikx}    (8.1.4)

K_N(x) = (1/(N + 1)) Σ_{n=0}^{N} D_n(x)    (8.1.5)

It is immediate from checking the necessary integrals that {e_n}_{n=−∞}^{∞} is an orthonormal set in H = L²(−π, π). The main goal for the rest of this section is to prove that {e_n}_{n=−∞}^{∞} is actually an orthonormal basis of H.

For the rest of this section, the inner product symbol ⟨f, g⟩ and norm || · || refer to the inner product and norm in H unless otherwise stated. In the context of Fourier analysis, D_n and K_N are known as the Dirichlet kernel and Fejér kernel respectively. Note that

∫_{−π}^{π} D_n(x) dx = ∫_{−π}^{π} K_N(x) dx = 1    (8.1.6)

for any n, N.

If f ∈ H, let

s_n(x) = Σ_{k=−n}^{n} c_k e^{ikx}    (8.1.7)
where c_k is given by (8.1.2), and

σ_N(x) = (1/(N + 1)) Σ_{n=0}^{N} s_n(x)    (8.1.8)

Since

s_n(x) = Σ_{k=−n}^{n} ⟨f, e_k⟩ e_k(x)    (8.1.9)

it follows that the partial sum s_n is also the projection of f onto the span of {e_k}_{k=−n}^{n} and so in particular the Bessel inequality

||s_n|| = ( Σ_{k=−n}^{n} |⟨f, e_k⟩|² )^{1/2} ≤ ||f||    (8.1.10)

holds for all n. In particular, lim_{n→∞} ⟨f, e_n⟩ = 0, which is the Riemann-Lebesgue lemma for the Fourier coefficients of f ∈ H.

Next observe that by substitution of (8.1.2) into (8.1.7) we obtain

s_n(x) = ∫_{−π}^{π} f(y) D_n(x − y) dy    (8.1.11)
We can therefore regard s_n as being given by the convolution D_n ∗ f if we let f(x) = 0 outside of the interval (−π, π). We can also express D_n in an alternative and useful way:

D_n(x) = (1/2π) e^{−inx} Σ_{k=0}^{2n} e^{ikx} = (1/2π) e^{−inx} (1 − e^{(2n+1)ix})/(1 − e^{ix})    (8.1.12)

for x ≠ 0. Multiplying top and bottom of the fraction by e^{−ix/2} then yields

D_n(x) = (1/2π) sin((n + 1/2)x)/sin(x/2)    x ≠ 0    (8.1.13)

and obviously D_n(0) = (2n + 1)/2π.

An alternative viewpoint of the convolutional relation (8.1.11), which is in some sense more natural, starts by defining the unit circle as T = R mod 2πZ, i.e. we identify any two points of R differing by an integer multiple of 2π. Any 2π periodic function, such as e_n, D_n, s_n etc. may be regarded as a function on T, and if f is originally given as a function on (−π, π) then it may be extended in a 2π periodic manner to all of R and so also viewed as a function on the circle T. With f, D_n both 2π periodic, the integral (8.1.11) could be written as

s_n(x) = ∫_T f(y) D_n(x − y) dy    (8.1.14)

since (8.1.11) simply amounts to using one natural parametrization of the independent variable. By the same token

s_n(x) = ∫_a^{a+2π} f(y) D_n(x − y) dy    (8.1.15)

for any convenient choice of a. A 2π periodic function is continuous on T if it is continuous on [−π, π] and f(π) = f(−π), and the space C(T) may simply be regarded as

C(T) = {f ∈ C([−π, π]) : f(π) = f(−π)}    (8.1.16)

a closed subspace of C([−π, π]), so is itself a Banach space with the maximum norm. Likewise we can define

C^m(T) = {f ∈ C^m([−π, π]) : f^(j)(π) = f^(j)(−π), j = 0, 1, . . . , m}    (8.1.17)

a Banach space with the analogous norm.
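The closed form (8.1.13) of the Dirichlet kernel is easy to confirm numerically against the defining sum (8.1.4); the sample points below are arbitrary choices of ours.

```python
import cmath
import math

def D_sum(n, x):      # Dirichlet kernel from its definition (8.1.4)
    return sum(cmath.exp(1j * k * x) for k in range(-n, n + 1)).real / (2 * math.pi)

def D_closed(n, x):   # closed form (8.1.13), valid for x != 0
    return math.sin((n + 0.5) * x) / (2 * math.pi * math.sin(x / 2))

for n in (3, 7, 15):
    for x in (0.3, 1.0, 2.5):
        print(n, x, D_sum(n, x), D_closed(n, x))
```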
Next let us make some corresponding observations about K_N.

Proposition 8.1. There holds

σ_N(x) = ∫_T K_N(x − y) f(y) dy    (8.1.18)

and

K_N(x) = Σ_{k=−N}^{N} (1 − |k|/(N + 1)) e^{ikx} = (1/(2π(N + 1))) ( sin((N + 1)x/2)/sin(x/2) )²    x ≠ 0    (8.1.19)
Proof: The identity (8.1.18) is immediate from (8.1.14) and the definition of K_N, and the first identity in (8.1.19) is left as an exercise. To complete the proof we observe that

2π Σ_{n=0}^{N} D_n(x) = Σ_{n=0}^{N} sin((n + 1/2)x)/sin(x/2)    (8.1.20)

= Im( e^{ix/2} Σ_{n=0}^{N} e^{inx} ) / sin(x/2)    (8.1.21)

= Im( e^{ix/2} (1 − e^{i(N+1)x})/(1 − e^{ix}) ) / sin(x/2)    (8.1.22)

= Im( (1 − cos((N + 1)x) − i sin((N + 1)x)) / (−2i sin(x/2)) ) / sin(x/2)    (8.1.23)

= (1 − cos((N + 1)x)) / (2 sin²(x/2))    (8.1.24)

= ( sin((N + 1)x/2)/sin(x/2) )²    (8.1.25)

and the conclusion follows upon dividing by 2π(N + 1). □
Theorem 8.1. Suppose that f ∈ C(T). Then σ_N → f in C(T).

Proof: Since K_N ≥ 0 and ∫_T K_N(x − y) dy = 1 for any x, we have

|σ_N(x) − f(x)| = | ∫_T K_N(x − y)(f(y) − f(x)) dy | ≤ ∫_{x−π}^{x+π} K_N(x − y) |f(y) − f(x)| dy    (8.1.26)

If ε > 0 is given, then since f must be uniformly continuous on T, there exists δ > 0 such that |f(x) − f(y)| < ε if |x − y| < δ. Thus

|σ_N(x) − f(x)| ≤ ε ∫_{|x−y|<δ} K_N(x − y) dy + 2||f||_∞ ∫_{δ<|x−y|<π} K_N(x − y) dy    (8.1.27)

≤ ε + 2||f||_∞ / ((N + 1) sin²(δ/2))    (8.1.28)

Thus there exists N₀ such that for N ≥ N₀, |σ_N(x) − f(x)| < 2ε for all x, that is, σ_N → f uniformly. □
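Theorem 8.1 can be illustrated numerically. Using the first identity in (8.1.19), σ_N has Fourier coefficients (1 − |k|/(N + 1))c_k, so it can be assembled directly from the c_k. Below we take f(x) = |x|, which is continuous on T, and compute the coefficients by quadrature (the function, grids, and step counts are our own choices); the sampled sup-norm error decreases as N grows.

```python
import cmath
import math

def f(x):
    return abs(x)            # continuous on T, since f(pi) = f(-pi)

def c(k, steps=4000):        # Fourier coefficient (8.1.2) by midpoint rule
    h = 2 * math.pi / steps
    s = 0j
    for i in range(steps):
        y = -math.pi + (i + 0.5) * h
        s += f(y) * cmath.exp(-1j * k * y) * h
    return s / (2 * math.pi)

def sigma(N, x, coef):       # Fejer mean: sum of (1 - |k|/(N+1)) c_k e^{ikx}
    return sum((1 - abs(k) / (N + 1)) * coef[k] * cmath.exp(1j * k * x)
               for k in range(-N, N + 1)).real

def sup_err(N):              # sup-norm error sampled on a grid over T
    coef = {k: c(k) for k in range(-N, N + 1)}
    xs = [-math.pi + 0.05 * j for j in range(126)]
    return max(abs(sigma(N, x, coef) - f(x)) for x in xs)

err4, err64 = sup_err(4), sup_err(64)
print(err4, err64)           # the error shrinks with N, uniformly in x
```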
Corollary 8.1. The functions {e_n(x)}_{n=−∞}^{∞} form an orthonormal basis of H = L²(−π, π).

Proof: We have already observed that these functions form an orthonormal set, so it remains only to verify one of the equivalent conditions stated in Theorem 6.4. We will show the closedness property, i.e. that the set of finite linear combinations of {e_n(x)}_{n=−∞}^{∞} is dense in H. Given g ∈ H and ε > 0 we may find f ∈ C(T) such that ||f − g|| < ε, f ∈ D(−π, π) for example. Then choose N such that ||σ_N − f||_{C(T)} < ε, which implies ||σ_N − f|| < √(2π) ε. Thus σ_N is a finite linear combination of the e_n's and

||g − σ_N|| < (1 + √(2π)) ε    (8.1.30)

Since ε is arbitrary, the conclusion follows. □
Corollary 8.2. For any f ∈ H = L²(−π, π), if

s_n(x) = Σ_{k=−n}^{n} c_k e^{ikx}    (8.1.31)

where

c_k = (1/2π) ∫_{−π}^{π} f(x) e^{−ikx} dx    (8.1.32)

then s_n → f in H.

For f ∈ H, we will often write

f(x) = Σ_{n=−∞}^{∞} c_n e^{inx}    (8.1.33)

but we emphasize that without further assumptions this only means that the partial sums converge in L²(−π, π).
At this point we have looked at the convergence properties of two different sequences of trigonometric polynomials, s_n and σ_N, associated with f. While s_n is simply the n'th partial sum of the Fourier series of f, the σ_N's are the so-called Fejér means of f. While each Fejér mean is a trigonometric polynomial, the sequence σ_N does not amount to the partial sums of some other Fourier series, since the n'th coefficient would also have to depend on N. For f ∈ H, we have that s_N → f in H, and so the same is obviously true under the stronger assumption that f ∈ C(T). On the other hand for f ∈ C(T) we have shown that σ_N → f uniformly, but it need not be true that s_N → f uniformly, or even pointwise (example of P. du Bois-Reymond, see Section 1.6.1 of [25]). For f ∈ H it can be shown that σ_N → f in H, but on the other hand the best L² approximation property of s_N implies that

||s_N − f|| ≤ ||σ_N − f||    (8.1.34)

since both s_N and σ_N are in the span of {e_k}_{k=−N}^{N}. That is to say, the rate of convergence of s_N to f is at least as fast, in the L² sense, as that of σ_N. In summary, both s_N and σ_N provide a trigonometric polynomial approximating f, but each has some advantage over the other, depending on what is to be assumed about f.
8.2 Alternative forms of Fourier series

From the basic Fourier series (8.1.1) a number of other closely related and useful expressions can be immediately derived. First suppose that f ∈ L²(−L, L) for some L > 0. If we let f̃(x) = f(Lx/π) then f̃ ∈ L²(−π, π), so

f̃(x) = Σ_{n=−∞}^{∞} c_n e^{inx}    c_n = (1/2π) ∫_{−π}^{π} f̃(y) e^{−iny} dy    (8.2.1)

or equivalently

f(x) = Σ_{n=−∞}^{∞} c_n e^{iπnx/L}    c_n = (1/2L) ∫_{−L}^{L} f(y) e^{−iπny/L} dy    (8.2.2)

Likewise (8.2.2) holds if we just regard f as being 2L periodic and in L², and in the formula for c_n we could replace (−L, L) by any other interval of length 2L. The functions e^{iπnx/L}/√(2L) make up an orthonormal basis of L²(a, b) if b − a = 2L.
Next observe that we can write

f(x) = Σ_{n=−∞}^{∞} c_n ( cos(nπx/L) + i sin(nπx/L) ) = c₀ + Σ_{n=1}^{∞} (c_n + c_{−n}) cos(nπx/L) + i(c_n − c_{−n}) sin(nπx/L)    (8.2.3)

If we let

a_n = c_n + c_{−n}    b_n = i(c_n − c_{−n})    n = 0, 1, 2, . . .    (8.2.4)
then we obtain the equivalent formulas

f(x) = a₀/2 + Σ_{n=1}^{∞} ( a_n cos(nπx/L) + b_n sin(nπx/L) )    (8.2.5)

where

a_n = (1/L) ∫_{−L}^{L} f(y) cos(nπy/L) dy    n = 0, 1, . . .

b_n = (1/L) ∫_{−L}^{L} f(y) sin(nπy/L) dy    n = 1, 2, . . .    (8.2.6)

We refer to (8.2.5),(8.2.6) as the 'real form' of the Fourier series, which is natural to use, for example, if f is real valued, since then no complex quantities appear. Again the precise meaning of (8.2.5) is that s_n → f in H = L²(−L, L) or other interval of length 2L, where now

s_n(x) = a₀/2 + Σ_{k=1}^{n} ( a_k cos(kπx/L) + b_k sin(kπx/L) )    (8.2.7)

with results analogous to those mentioned above for the Fejér means also being valid. It may be easily checked that the set of functions

{ 1/√(2L),  cos(nπx/L)/√L,  sin(nπx/L)/√L }_{n=1}^{∞}    (8.2.8)

make up an orthonormal basis of L²(−L, L).
Another important variant is obtained as follows. If f ∈ L²(0, L) then we may define the associated even and odd extensions of f in L²(−L, L), namely

f_e(x) = f(x) for 0 < x < L,  f(−x) for −L < x < 0    f_o(x) = f(x) for 0 < x < L,  −f(−x) for −L < x < 0    (8.2.9)

If we replace f by f_e in (8.2.5),(8.2.6), then we obtain immediately that b_n = 0 and a resulting cosine series representation for f,

f(x) = a₀/2 + Σ_{n=1}^{∞} a_n cos(nπx/L)    a_n = (2/L) ∫_0^L f(y) cos(nπy/L) dy    n = 0, 1, . . .    (8.2.10)

Likewise replacing f by f_o gives us a corresponding sine series,

f(x) = Σ_{n=1}^{∞} b_n sin(nπx/L)    b_n = (2/L) ∫_0^L f(y) sin(nπy/L) dy    n = 1, 2, . . .    (8.2.11)

Note that if f is continuous on [0, L], then the 2L periodic extension of f_e is also continuous, but this need not be true in the case of f_o. Thus we might expect that the cosine series of f typically has better convergence properties than the sine series.
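The closing remark can be quantified with a small computation. For f(x) = x on (0, 1) (our choice of example, with L = 1), the even periodic extension is continuous while the odd one has jumps at odd integers, and correspondingly the cosine coefficients a_n decay like 1/n² while the sine coefficients b_n decay only like 1/n.

```python
import math

# f(x) = x on (0, 1): cosine coefficients a_n (8.2.10) versus
# sine coefficients b_n (8.2.11), computed by midpoint quadrature
def a(n, steps=20000):
    h = 1.0 / steps
    return 2 * sum((i + 0.5) * h * math.cos(n * math.pi * (i + 0.5) * h)
                   for i in range(steps)) * h

def b(n, steps=20000):
    h = 1.0 / steps
    return 2 * sum((i + 0.5) * h * math.sin(n * math.pi * (i + 0.5) * h)
                   for i in range(steps)) * h

for n in (1, 5, 25):
    print(n, abs(a(n)), abs(b(n)))   # a_n falls off like 1/n^2, b_n like 1/n
```

(For this f the exact values are a_n = 2((−1)^n − 1)/(n²π²) and b_n = 2(−1)^{n+1}/(nπ), which the quadrature reproduces.)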
8.3 More about convergence of Fourier series

If f ∈ L²(−π, π) it was already observed that since the partial sums s_n converge to f in L²(−π, π), some subsequence of the partial sums converges pointwise a.e. In fact it is a famous theorem of Carleson ([6]) that s_n → f (i.e. the entire sequence, not just a subsequence) pointwise a.e. The proof is complicated, and even now is not to be found in advanced textbooks. No better result could be expected, since f itself is only defined up to sets of measure zero.

If we were to assume the stronger condition that f ∈ C(T) then it might be natural to conjecture that s_n(x) → f(x) for every x (recall we know σ_N → f uniformly in this case), but that turns out to be false, as mentioned above: in fact there exist continuous functions for which s_n(x) is divergent at infinitely many x ∈ T, see Section 5.11 of [29].

A sufficient condition implying that s_n(x) → f(x) for every x ∈ T is that f be piecewise continuously differentiable on T. In fact the following more precise theorem can be proved.
Theorem 8.2. Assume that there exist points −π = x₀ < x₁ < · · · < x_M = π such that f ∈ C¹([x_j, x_{j+1}]) for j = 0, 1, . . . , M − 1. Let

f̃(x) = (1/2)( lim_{y→x+} f(y) + lim_{y→x−} f(y) ) for −π < x < π,  (1/2)( lim_{y→−π+} f(y) + lim_{y→π−} f(y) ) for x = ±π    (8.3.1)

Then lim_{n→∞} s_n(x) = f̃(x) for −π ≤ x ≤ π.

Under the stated assumptions on f, the theorem states in particular that s_n converges to f at every point of continuity of f (with appropriate modification at the endpoints), and otherwise converges to the average of the left and right hand limits. The proof is somewhat similar to that of Theorem 8.1 – steps in the proof are outlined in the exercises.
So far we have discussed the convergence properties of the Fourier series based on
assumptions about f , but another point of view we could take is to focus on how con123
vergence properties are influenced by the behavior of the Fourier coefficients cn . A first
simple result of this type is:
prop82
Proposition 8.2. If f 2 H = L2 ( ⇡, ⇡) and its Fourier coefficients satisfy
1
X
n= 1
|cn | < 1
(8.3.2)
acfs
then f 2 C(T) and sn ! f uniformly on T
Proof: By the Weierstrass M-test, the series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ is uniformly convergent on $\mathbb{R}$ to some limit $g$, and since each partial sum is continuous, the same must be true of $g$. Since uniform convergence implies $L^2$ convergence on any finite interval, we have $s_n \to g$ in $H$, but also $s_n \to f$ in $H$ by Corollary 8.2. By uniqueness of the limit $f = g$ and the conclusion follows.
We say that f has an absolutely convergent Fourier series when (8.3.2) holds. We
emphasize here that the conclusion f = g is meant in the sense of L2 , i.e. f (x) = g(x)
a.e., so by saying that f is continuous, we are really saying that the equivalence class of
f contains a continuous function, namely g.
It is not the case that every continuous function has an absolutely convergent Fourier series, according to remarks made earlier in this section. It would therefore be of interest to find other conditions on $f$ which guarantee that (8.3.2) holds. One such condition follows from the next result, which is also of independent interest.
Proposition 8.3. If $f \in C^m(\mathbb{T})$, then $\lim_{n\to\pm\infty} n^m c_n = 0$.
Proof: We integrate by parts in (8.1.2) to get, for $n \ne 0$,
$$c_n = \frac{1}{2\pi}\left[\frac{f(y)e^{-iny}}{-in}\right]_{-\pi}^{\pi} + \frac{1}{2\pi i n}\int_{-\pi}^{\pi} f'(y)e^{-iny}\,dy = \frac{1}{2\pi i n}\int_{-\pi}^{\pi} f'(y)e^{-iny}\,dy \tag{8.3.3}$$
if $f \in C^1(\mathbb{T})$. Since $f' \in L^2(\mathbb{T})$, the Riemann-Lebesgue lemma implies that $nc_n \to 0$ as $n \to \pm\infty$. If $f \in C^2(\mathbb{T})$ we could integrate by parts again to get $n^2 c_n \to 0$, etc.
It is immediate from this result that if $f \in C^2(\mathbb{T})$ then it has an absolutely convergent Fourier series, but in fact even $f \in C^1(\mathbb{T})$ is more than enough, see Exercise 6.

One way to regard Proposition 8.3 is that it says that the smoother $f$ is, the more rapidly its Fourier coefficients must decay. The next result is a sort of converse statement.
Proposition 8.4. If $f \in H = L^2(-\pi,\pi)$ and its Fourier coefficients satisfy
$$|n^{m+\alpha} c_n| \le C \tag{8.3.4}$$
for some $C$ and $\alpha > 1$, then $f \in C^m(\mathbb{T})$.
Proof: When $m = 0$ this is just a special case of Proposition 8.2. When $m = 1$ we see that it is permissible to differentiate the series (8.1.1) term by term, since the differentiated series
$$\sum_{n=-\infty}^{\infty} i n c_n e^{inx} \tag{8.3.5}$$
is uniformly convergent, by the assumption (8.3.4). Thus $f, f'$ are both a.e. equal to an absolutely convergent Fourier series, so $f \in C^1(\mathbb{T})$, by Proposition 8.2. The proof for $m = 2, 3, \dots$ is similar.
Note that Proposition 8.3 states a necessary condition on the Fourier coefficients for
f to be in C m and Proposition 8.4 states a sufficient condition. The two conditions are
not identical, but both point to the general tendency that increased smoothness of f is
associated with more rapid decay of the corresponding Fourier coefficients.
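This tendency is easy to check numerically (a sketch assuming NumPy is available; the sampling-based coefficient approximation is ours): for the $2\pi$-periodic extension of $f(x) = x$, which has a jump at $\pm\pi$, $|c_n| = 1/|n|$, while for the continuous function $f(x) = |x|$ the nonzero coefficients decay like $1/n^2$.

```python
import numpy as np

# Approximate c_n = (1/2pi) * integral of f(x) exp(-inx) dx by a
# Riemann sum over m equally spaced sample points on (-pi, pi).
def fourier_coeff(f, n, m=2**12):
    x = -np.pi + 2 * np.pi * np.arange(m) / m
    return np.sum(f(x) * np.exp(-1j * n * x)) / m

n = 63  # odd, so that c_n of |x| is nonzero
c_jump = abs(fourier_coeff(lambda x: x, n))          # exact value 1/n
c_cont = abs(fourier_coeff(lambda x: np.abs(x), n))  # exact value 2/(pi n^2)
ratio_jump = c_jump * n      # approximately 1
ratio_cont = c_cont * n**2   # approximately 2/pi
```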
8.4 The Fourier Transform on $\mathbb{R}^N$
If $f$ is a given function on $\mathbb{R}^N$ the Fourier transform of $f$ is defined as
$$\hat f(y) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx \qquad y \in \mathbb{R}^N \tag{8.4.1}$$
provided that the integral is defined in some sense. This will always be the case, for example, if $f \in L^1(\mathbb{R}^N)$ and any $y \in \mathbb{R}^N$, since then
$$|\hat f(y)| \le \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} |f(x)|\,dx < \infty \tag{8.4.2}$$
thus in fact $\hat f \in L^\infty(\mathbb{R}^N)$ in this case.
There are a number of other commonly used definitions of the Fourier transform, obtained by changing the numerical constant in front of the integral, and/or replacing $-ix\cdot y$ by $ix\cdot y$, and/or including a factor of $2\pi$ in the exponent in the integrand. Each convention has some convenient properties in certain situations, but none of them is always the best, hence the lack of a universally agreed upon definition. The differences are non-essential, all having to do with the way certain numerical constants turn up, so the only requirement is that we adopt one specific definition, such as (8.4.1), and stick with it.
The Fourier transform is a particular integral operator, and an alternative operator type notation for it,
$$\mathcal{F}f = \hat f \tag{8.4.3}$$
is often convenient to use, especially when discussing its mapping properties.

Example 8.1. If $N = 1$ and $f(x) = \chi_{[a,b]}(x)$, the indicator function of the interval $[a,b]$, then the Fourier transform of $f$ is
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-ixy}\,dx = \frac{e^{-iay} - e^{-iby}}{\sqrt{2\pi}\,iy} \tag{8.4.4}$$

Example 8.2. If $N = 1$, $\alpha > 0$ and $f(x) = e^{-\alpha x^2}$ (a Gaussian function) then
$$\hat f(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\alpha x^2} e^{-ixy}\,dx = \frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\alpha\left(x + \frac{iy}{2\alpha}\right)^2}\,dx \tag{8.4.5}$$
$$= \frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\alpha x^2}\,dx = \frac{e^{-\frac{y^2}{4\alpha}}}{\sqrt{2\pi}} \sqrt{\frac{\pi}{\alpha}} = \frac{1}{\sqrt{2\alpha}}\, e^{-\frac{y^2}{4\alpha}} \tag{8.4.6}$$
In the above derivation, the key step is the third equality, which is justified by contour integration techniques in complex function theory – the integral of $e^{-\alpha z^2}$ along the real axis is the same as the integral along the parallel line $\operatorname{Im} z = \frac{y}{2\alpha}$ for any $y$.

Thus the Fourier transform of a Gaussian is another Gaussian, and in particular $\hat f = f$ if $\alpha = \frac{1}{2}$.
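The closed form (8.4.6) can be confirmed by direct numerical quadrature (a sketch assuming NumPy is available; the grid sizes are arbitrary choices):

```python
import numpy as np

# Compare a Riemann-sum evaluation of (8.4.1) for f(x) = exp(-a x^2)
# with the closed form (8.4.6): fhat(y) = exp(-y^2/(4a)) / sqrt(2a).
a, y = 0.7, 1.3
x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]
integrand = np.exp(-a * x**2) * np.exp(-1j * x * y)
fhat_numeric = np.sum(integrand) * dx / np.sqrt(2 * np.pi)
fhat_exact = np.exp(-y**2 / (4 * a)) / np.sqrt(2 * a)
err = abs(fhat_numeric - fhat_exact)
```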
It is clear from the Fourier transform definition that if $f$ has the special product form $f(x) = f_1(x_1)f_2(x_2)\dots f_N(x_N)$ then $\hat f(y) = \hat f_1(y_1)\hat f_2(y_2)\dots \hat f_N(y_N)$. The Gaussian in $\mathbb{R}^N$, namely $f(x) = e^{-\alpha|x|^2}$, is of this type, so using (8.4.6) we immediately obtain
$$\hat f(y) = \frac{e^{-\frac{|y|^2}{4\alpha}}}{(2\alpha)^{N/2}} \tag{8.4.7}$$
To state our first theorem about the Fourier transform, let us denote by
$$C_0(\mathbb{R}^N) = \{f \in C(\mathbb{R}^N) : \lim_{|x|\to\infty} |f(x)| = 0\} \tag{8.4.8}$$
the space of continuous functions vanishing at $\infty$. It is a closed subspace of $L^\infty(\mathbb{R}^N)$, hence a Banach space with the $L^\infty$ norm. We emphasize that despite the notation, functions in this space need not be of compact support.
Theorem 8.3. If $f \in L^1(\mathbb{R}^N)$ then $\hat f \in C_0(\mathbb{R}^N)$.

Proof: If $y_n \in \mathbb{R}^N$ and $y_n \to y$ then clearly $f(x)e^{-ix\cdot y_n} \to f(x)e^{-ix\cdot y}$ for a.e. $x \in \mathbb{R}^N$. Also, $|f(x)e^{-ix\cdot y_n}| \le |f(x)|$, and since we assume $f \in L^1(\mathbb{R}^N)$ we can immediately apply the dominated convergence theorem to obtain
$$\lim_{n\to\infty} \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y_n}\,dx = \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx \tag{8.4.9}$$
that is, $\hat f(y_n) \to \hat f(y)$. Hence $\hat f \in C(\mathbb{R}^N)$.
Next, suppose temporarily that $g \in C^1(\mathbb{R}^N)$ and has compact support. An integration by parts gives us, for $j = 1, 2, \dots, N$, that
$$\hat g(y) = \frac{1}{(2\pi)^{N/2}\,iy_j} \int_{\mathbb{R}^N} \frac{\partial g}{\partial x_j}\, e^{-ix\cdot y}\,dx \tag{8.4.10}$$
Thus there exists some $C$, depending on $g$, such that
$$|\hat g(y)|^2 \le \frac{C}{y_j^2} \qquad j = 1, 2, \dots, N \tag{8.4.11}$$
from which it follows that
$$|\hat g(y)|^2 \le \min_j \left(\frac{C}{y_j^2}\right) \le \frac{CN}{|y|^2} \tag{8.4.12}$$
Thus $\hat g(y) \to 0$ as $|y| \to \infty$ in this case.
Finally, such $g$'s are dense in $L^1(\mathbb{R}^N)$, so given $f \in L^1(\mathbb{R}^N)$ and $\epsilon > 0$, choose $g$ as above such that $||f - g||_{L^1(\mathbb{R}^N)} < \epsilon$. We then have, taking into account (8.4.2),
$$|\hat f(y)| \le |\hat f(y) - \hat g(y)| + |\hat g(y)| \le \frac{1}{(2\pi)^{N/2}}\, ||f - g||_{L^1(\mathbb{R}^N)} + |\hat g(y)| \tag{8.4.13}$$
and so
$$\limsup_{|y|\to\infty} |\hat f(y)| \le \frac{\epsilon}{(2\pi)^{N/2}} \tag{8.4.14}$$
Since $\epsilon > 0$ is arbitrary, the conclusion $\hat f \in C_0(\mathbb{R}^N)$ follows.
The fact that $\hat f(y) \to 0$ as $|y| \to \infty$ is analogous to the property that the Fourier coefficients $c_n \to 0$ as $n \to \pm\infty$ in the case of Fourier series, and in fact is also called the Riemann-Lebesgue Lemma.
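For instance, for the indicator of $[0,1]$ the explicit formula (8.4.4) gives $|\hat f(y)| \le 2/(\sqrt{2\pi}\,|y|)$, so the decay promised by Theorem 8.3 indeed holds, though only at the slow rate $O(1/|y|)$. A quick check (a sketch assuming NumPy is available):

```python
import numpy as np

# |fhat(y)| for f = indicator of [0,1], from (8.4.4) with a=0, b=1:
# fhat(y) = (1 - exp(-iy)) / (sqrt(2 pi) i y).
def fhat_abs(y):
    return np.abs((1 - np.exp(-1j * y)) / (np.sqrt(2 * np.pi) * 1j * y))

ys = np.array([10.0, 100.0, 1000.0])
vals = fhat_abs(ys)  # tends to 0, bounded by 2/(sqrt(2 pi) |y|)
```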
One of the fundamental properties of the Fourier transform is that it is 'almost' its own inverse. A first precise version of this is given by the following Fourier Inversion Theorem.

Theorem 8.4. If $f, \hat f \in L^1(\mathbb{R}^N)$ then
$$f(x) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \hat f(y)e^{ix\cdot y}\,dy \qquad \text{a.e. } x \in \mathbb{R}^N \tag{8.4.15}$$
The right hand side of (8.4.15) is not precisely the Fourier transform of $\hat f$ because the exponent contains $ix\cdot y$ rather than $-ix\cdot y$, but it does mean that we can think of it as saying that $\hat{\hat f}(x) = f(-x)$, or
$$\hat{\hat f} = \check f, \tag{8.4.16}$$
where $\check f(x) = f(-x)$ is the reflection of $f$.¹ The requirement in the theorem that both $f$ and $\hat f$ be in $L^1$ will be weakened later on.
Proof: Since $\hat f \in L^1(\mathbb{R}^N)$ the right hand side of (8.4.15) is well defined, and we denote it temporarily by $g(x)$. Define also the family of Gaussians,
$$G_\alpha(x) = \frac{1}{(4\pi\alpha)^{N/2}}\, e^{-\frac{|x|^2}{4\alpha}} \tag{8.4.17}$$

¹Warning: some authors use the symbol $\check f$ to mean the inverse Fourier transform of $f$.
We then have
$$g(x) = \lim_{\alpha\to 0^+} \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \hat f(y)e^{ix\cdot y} e^{-\alpha|y|^2}\,dy \tag{8.4.18}$$
$$= \lim_{\alpha\to 0^+} \frac{1}{(2\pi)^{N}} \int_{\mathbb{R}^N}\int_{\mathbb{R}^N} f(z)e^{-\alpha|y|^2} e^{-i(z-x)\cdot y}\,dz\,dy \tag{8.4.19}$$
$$= \lim_{\alpha\to 0^+} \frac{1}{(2\pi)^{N}} \int_{\mathbb{R}^N} f(z)\left(\int_{\mathbb{R}^N} e^{-\alpha|y|^2} e^{-i(z-x)\cdot y}\,dy\right)dz \tag{8.4.20}$$
$$= \lim_{\alpha\to 0^+} \int_{\mathbb{R}^N} f(z)\, \frac{e^{-\frac{|z-x|^2}{4\alpha}}}{(4\pi\alpha)^{N/2}}\,dz \tag{8.4.21}$$
$$= \lim_{\alpha\to 0^+} (f * G_\alpha)(x) \tag{8.4.22}$$
Here (8.4.18) follows from the dominated convergence theorem and (8.4.20) from Fubini's theorem, which is applicable here because
$$\int_{\mathbb{R}^N}\int_{\mathbb{R}^N} |f(z)e^{-\alpha|y|^2}|\,dz\,dy < \infty \tag{8.4.23}$$
In (8.4.21) we have used the explicit calculation (8.4.7) above for the Fourier transform of a Gaussian.
Noting that $\int_{\mathbb{R}^N} G_\alpha(x)\,dx = 1$ for every $\alpha > 0$, we see that the difference $f * G_\alpha(x) - f(x)$ may be written as
$$\int_{\mathbb{R}^N} G_\alpha(y)(f(x-y) - f(x))\,dy \tag{8.4.24}$$
so that
$$||f * G_\alpha - f||_{L^1(\mathbb{R}^N)} \le \int_{\mathbb{R}^N} G_\alpha(y)\psi(y)\,dy \tag{8.4.25}$$
where $\psi(y) = \int_{\mathbb{R}^N} |f(x-y) - f(x)|\,dx$. Then $\psi$ is bounded and continuous at $y = 0$ with $\psi(0) = 0$ (see Exercise 10), and we can verify that the hypotheses of Theorem 7.2 are satisfied with $f_n$ replaced by $G_{\alpha_n}$ as long as $\alpha_n \to 0^+$. For any sequence $\alpha_n > 0$, $\alpha_n \to 0$ it follows that $G_{\alpha_n} * f \to f$ in $L^1(\mathbb{R}^N)$, and so there is a subsequence $\alpha_{n_k} \to 0$ such that $(G_{\alpha_{n_k}} * f)(x) \to f(x)$ a.e. We conclude that (8.4.15) holds. $\Box$
8.5 Further properties of the Fourier transform
Formally speaking we have
$$\frac{\partial}{\partial y_j} \int_{\mathbb{R}^N} f(x)e^{-ix\cdot y}\,dx = \int_{\mathbb{R}^N} -ix_j f(x)e^{-ix\cdot y}\,dx \tag{8.5.1}$$
or in more compact notation
$$\frac{\partial \hat f}{\partial y_j} = (-ix_j f)^\wedge \tag{8.5.2}$$
This is rigorously justified by standard theorems of analysis about differentiation of integrals with respect to parameters, provided that $\int_{\mathbb{R}^N} |x_j f(x)|\,dx < \infty$.

A companion property, obtained formally using integration by parts, is that
$$\int_{\mathbb{R}^N} \frac{\partial f}{\partial x_j}\, e^{-ix\cdot y}\,dx = \int_{\mathbb{R}^N} iy_j f(x)e^{-ix\cdot y}\,dx \tag{8.5.3}$$
or
$$\left(\frac{\partial f}{\partial x_j}\right)^{\!\wedge} = iy_j \hat f \tag{8.5.4}$$
which is rigorously correct provided at least that $f \in C^1(\mathbb{R}^N)$ and $\int_{|x|=R} |f(x)|\,dS \to 0$ as $R \to \infty$. Repeating the above arguments with higher derivatives we obtain
Proposition 8.5. If $\alpha$ is any multi-index then
$$D^\alpha \hat f(y) = ((-ix)^\alpha f)^\wedge(y) \tag{8.5.5}$$
if
$$\int_{\mathbb{R}^N} |x^\alpha f(x)|\,dx < \infty \tag{8.5.6}$$
and
$$(D^\alpha f)^\wedge(y) = (iy)^\alpha \hat f(y) \tag{8.5.7}$$
if
$$f \in C^m(\mathbb{R}^N) \qquad \int_{|x|=R} |D^\beta f(x)|\,dS \to 0 \text{ as } R \to \infty \quad |\beta| < |\alpha| = m \tag{8.5.8}$$
We will eventually see that (8.5.5) and (8.5.7) remain valid, suitably interpreted in a distributional sense, under conditions much more general than (8.5.6) and (8.5.8). But for now we introduce a new space in which these last two conditions are guaranteed to hold.

Definition 8.1. The Schwartz space is defined as
$$\mathcal{S}(\mathbb{R}^N) = \{\phi \in C^\infty(\mathbb{R}^N) : x^\alpha D^\beta \phi \in L^\infty(\mathbb{R}^N) \text{ for all } \alpha, \beta\} \tag{8.5.9}$$
Thus a function is in the Schwartz space if any derivative of it decays more rapidly than the reciprocal of any polynomial. Clearly $\mathcal{S}(\mathbb{R}^N)$ contains all test functions in $\mathcal{D}(\mathbb{R}^N)$ as well as other kinds of functions such as Gaussians, $e^{-\alpha|x|^2}$ for any $\alpha > 0$.

If $\phi \in \mathcal{S}(\mathbb{R}^N)$ then in particular, for any $n$,
$$|D^\beta \phi(x)| \le \frac{C}{(1+|x|^2)^n} \tag{8.5.10}$$
for some $C$, and so clearly both (8.5.6) and (8.5.8) hold; thus the two key identities (8.5.5) and (8.5.7) are correct whenever $f$ is in the Schwartz space. It is also immediate from (8.5.10) that $\mathcal{S}(\mathbb{R}^N) \subset L^1(\mathbb{R}^N) \cap L^\infty(\mathbb{R}^N)$.
Proposition 8.6. If $\phi \in \mathcal{S}(\mathbb{R}^N)$ then $\hat\phi \in \mathcal{S}(\mathbb{R}^N)$.

Proof: Note from (8.5.5) and (8.5.7) that
$$(iy)^\alpha D^\beta \hat\phi(y) = (iy)^\alpha ((-ix)^\beta \phi)^\wedge(y) = (D^\alpha((-ix)^\beta \phi))^\wedge(y) \tag{8.5.11}$$
holds for $\phi \in \mathcal{S}(\mathbb{R}^N)$. Also, since $\mathcal{S}(\mathbb{R}^N) \subset L^1(\mathbb{R}^N)$ it follows from (8.4.2) that if $\psi \in \mathcal{S}(\mathbb{R}^N)$ then $\hat\psi \in L^\infty(\mathbb{R}^N)$. Thus we have the following list of implications:
$$\phi \in \mathcal{S}(\mathbb{R}^N) \implies (-ix)^\beta \phi \in \mathcal{S}(\mathbb{R}^N) \tag{8.5.12}$$
$$\implies D^\alpha((-ix)^\beta \phi) \in \mathcal{S}(\mathbb{R}^N) \tag{8.5.13}$$
$$\implies (D^\alpha((-ix)^\beta \phi))^\wedge \in L^\infty(\mathbb{R}^N) \tag{8.5.14}$$
$$\implies y^\alpha D^\beta \hat\phi \in L^\infty(\mathbb{R}^N) \tag{8.5.15}$$
$$\implies \hat\phi \in \mathcal{S}(\mathbb{R}^N) \tag{8.5.16}$$
Corollary 8.3. The Fourier transform $\mathcal{F} : \mathcal{S}(\mathbb{R}^N) \to \mathcal{S}(\mathbb{R}^N)$ is one to one and onto.

Proof: The preceding proposition says that $\mathcal{F}$ maps $\mathcal{S}(\mathbb{R}^N)$ into $\mathcal{S}(\mathbb{R}^N)$, and if $\mathcal{F}\phi = \hat\phi = 0$ then the inversion theorem, Theorem 8.4, is applicable, since both $\phi, \hat\phi$ are in $L^1(\mathbb{R}^N)$. We conclude $\phi = 0$, i.e. $\mathcal{F}$ is one to one. If $\psi \in \mathcal{S}(\mathbb{R}^N)$, let $\phi = \hat{\check\psi}$. Clearly $\phi \in \mathcal{S}(\mathbb{R}^N)$ and one may check directly, again using the inversion theorem, that $\hat\phi = \psi$, so that $\mathcal{F}$ is onto. $\Box$
The next result, usually known as the Parseval identity, is the key step needed to define the Fourier transform of a function in $L^2(\mathbb{R}^N)$, which turns out to be the more natural setting.

Proposition 8.7. If $\phi, \psi \in \mathcal{S}(\mathbb{R}^N)$ then
$$\int_{\mathbb{R}^N} \phi(x)\hat\psi(x)\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\psi(x)\,dx \tag{8.5.17}$$
Proof: The proof is simply an interchange of order in an iterated integral, which is easily justified by Fubini's theorem:
$$\int_{\mathbb{R}^N} \phi(x)\hat\psi(x)\,dx = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \phi(x)\left(\int_{\mathbb{R}^N} \psi(y)e^{-ix\cdot y}\,dy\right)dx \tag{8.5.18}$$
$$= \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \psi(y)\left(\int_{\mathbb{R}^N} \phi(x)e^{-ix\cdot y}\,dx\right)dy \tag{8.5.19}$$
$$= \int_{\mathbb{R}^N} \hat\phi(y)\psi(y)\,dy \tag{8.5.20}$$
There is a slightly different but equivalent formula, which is also sometimes called the Parseval identity, see Exercise 11. The content of the following corollary is the Plancherel identity.

Corollary 8.4. For every $\phi \in \mathcal{S}(\mathbb{R}^N)$ we have
$$||\phi||_{L^2(\mathbb{R}^N)} = ||\hat\phi||_{L^2(\mathbb{R}^N)} \tag{8.5.21}$$
Proof: Given $\phi \in \mathcal{S}(\mathbb{R}^N)$ there exists, by Corollary 8.3, $\psi \in \mathcal{S}(\mathbb{R}^N)$ such that $\hat\psi = \bar\phi$. In addition it follows directly from the definition of the Fourier transform and the inversion theorem that $\psi = \overline{\hat\phi}$. Therefore, by Parseval's identity,
$$||\phi||^2_{L^2(\mathbb{R}^N)} = \int_{\mathbb{R}^N} \phi(x)\bar\phi(x)\,dx = \int_{\mathbb{R}^N} \phi(x)\hat\psi(x)\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\psi(x)\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\overline{\hat\phi(x)}\,dx = ||\hat\phi||^2_{L^2(\mathbb{R}^N)} \tag{8.5.22}$$
Recalling that $\mathcal{D}(\mathbb{R}^N)$ is dense in $L^2(\mathbb{R}^N)$, it follows that the same is true of $\mathcal{S}(\mathbb{R}^N)$, and the Plancherel identity therefore implies that the Fourier transform has an extension to all of $L^2(\mathbb{R}^N)$. To be precise, if $f \in L^2(\mathbb{R}^N)$ pick $\phi_n \in \mathcal{S}(\mathbb{R}^N)$ such that $\phi_n \to f$ in $L^2(\mathbb{R}^N)$. Since $\{\phi_n\}$ is Cauchy in $L^2(\mathbb{R}^N)$, (8.5.21) implies the same for $\{\hat\phi_n\}$, so $g := \lim_{n\to\infty} \hat\phi_n$ exists in the $L^2$ sense, and this limit is by definition $\hat f$. From elementary considerations this limit is independent of the choice of approximating sequence $\{\phi_n\}$, the extended definition of $\hat f$ agrees with the original definition if $f \in L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$, and (8.5.21) continues to hold for all $f \in L^2(\mathbb{R}^N)$.

Since $\hat\phi_n \to \hat f$ in $L^2(\mathbb{R}^N)$, it follows by similar reasoning that $\hat{\hat\phi}_n \to \hat{\hat f}$. By the inversion theorem we know that $\hat{\hat\phi}_n = \check\phi_n$, which must converge to $\check f$; thus $\hat{\hat f} = \check f$, i.e. the Fourier inversion theorem continues to hold on $L^2(\mathbb{R}^N)$.

The subset $L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$ is dense in $L^2(\mathbb{R}^N)$ so we also have that $\hat f = \lim_{n\to\infty} \hat f_n$ if $f_n$ is any sequence in $L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$ convergent in $L^2(\mathbb{R}^N)$ to $f$. A natural choice of such a sequence is
$$f_n(x) = \begin{cases} f(x) & |x| < n \\ 0 & |x| > n \end{cases} \tag{8.5.23}$$
leading to the following explicit formula, similar to an improper integral, for the Fourier transform of an $L^2$ function,
$$\hat f(y) = \lim_{n\to\infty} \frac{1}{(2\pi)^{N/2}} \int_{|x|<n} f(x)e^{-ix\cdot y}\,dx \tag{8.5.24}$$
where again without further assumptions we only know that the limit takes place in the $L^2$ sense.
Let us summarize.

Theorem 8.5. For any $f \in L^2(\mathbb{R}^N)$ there exists a unique $\hat f \in L^2(\mathbb{R}^N)$ such that $\hat f$ is given by (8.4.1) whenever $f \in L^1(\mathbb{R}^N) \cap L^2(\mathbb{R}^N)$, and
$$||f||_{L^2(\mathbb{R}^N)} = ||\hat f||_{L^2(\mathbb{R}^N)}. \tag{8.5.25}$$
Furthermore, $f, \hat f$ are related by (8.5.24) and
$$f(x) = \lim_{n\to\infty} \frac{1}{(2\pi)^{N/2}} \int_{|y|<n} \hat f(y)e^{ix\cdot y}\,dy \tag{8.5.26}$$
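The identity (8.5.25) can be illustrated numerically (a sketch assuming NumPy is available; the grids and the test function $f(x) = xe^{-x^2/2}$ are arbitrary choices): approximating $\hat f$ by a Riemann sum over a truncated grid, the two $L^2$ norms agree to high accuracy.

```python
import numpy as np

# f(x) = x exp(-x^2/2) is a Schwartz function; fhat(y) = -i y exp(-y^2/2).
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
f = x * np.exp(-x**2 / 2)

y = np.linspace(-12.0, 12.0, 481)
dy = y[1] - y[0]
kernel = np.exp(-1j * np.outer(y, x))        # rows: y values, columns: x values
fhat = kernel @ f * dx / np.sqrt(2 * np.pi)  # Riemann sum for (8.4.1), N = 1

norm_f = np.sqrt(np.sum(np.abs(f)**2) * dx)
norm_fhat = np.sqrt(np.sum(np.abs(fhat)**2) * dy)
```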
We conclude this section with one final important property of the Fourier transform.

Proposition 8.8. If $f, g \in L^1(\mathbb{R}^N)$ then $f * g \in L^1(\mathbb{R}^N)$ and
$$(f * g)^\wedge = (2\pi)^{N/2}\, \hat f\hat g \tag{8.5.27}$$
Proof: The fact that $f * g \in L^1(\mathbb{R}^N)$ is immediate from Fubini's theorem, or, alternatively, is a special case of Young's convolution inequality (7.4.2). To prove (8.5.27) we have
$$(f * g)^\wedge(z) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} (f * g)(x)e^{-ix\cdot z}\,dx \tag{8.5.28}$$
$$= \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \left(\int_{\mathbb{R}^N} f(x-y)g(y)\,dy\right)e^{-ix\cdot z}\,dx \tag{8.5.29}$$
$$= \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} g(y)e^{-iy\cdot z} \left(\int_{\mathbb{R}^N} f(x-y)e^{-i(x-y)\cdot z}\,dx\right)dy \tag{8.5.30}$$
$$= (2\pi)^{N/2}\, \hat f(z)\hat g(z) \tag{8.5.31}$$
with the exchange of order of integration justified by Fubini's theorem.
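A numerical spot check of (8.5.27) in one dimension (a sketch assuming NumPy is available; the Gaussians and the sample frequency are arbitrary choices):

```python
import numpy as np

# Verify (f*g)^ = sqrt(2 pi) fhat ghat at one frequency, for
# f(x) = exp(-x^2), g(x) = exp(-2 x^2), using Riemann sums throughout.
x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
f = np.exp(-x**2)
g = np.exp(-2 * x**2)

def ft(h, y):
    return np.sum(h * np.exp(-1j * x * y)) * dx / np.sqrt(2 * np.pi)

conv = np.convolve(f, g, mode="same") * dx  # samples of (f*g) on the grid

y0 = 0.8
lhs = ft(conv, y0)
rhs = np.sqrt(2 * np.pi) * ft(f, y0) * ft(g, y0)
err = abs(lhs - rhs)
```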
8.6 Fourier series of distributions
In this and the next section we will see how the theory of Fourier series and Fourier transforms can be extended to a distributional setting. To begin with, let us consider the case of the delta function, viewed as a distribution on $(-\pi,\pi)$. Formally speaking, if $\delta(x) = \sum_{n=-\infty}^{\infty} c_n e^{inx}$, then the coefficients $c_n$ should be given by
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} \delta(x)e^{-inx}\,dx = \frac{1}{2\pi} \tag{8.6.1}$$
for every $n$, so that
$$\delta(x) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{inx} \tag{8.6.2}$$
Certainly this is not a valid formula in any classical sense, since the terms of the series do not decay to zero. On the other hand, the $N$'th partial sum of this series is precisely the Dirichlet kernel $D_N(x)$, as in (8.1.4) or (8.1.13), and one consequence of Theorem 8.2 is precisely that $D_N \to \delta$ in $\mathcal{D}'(-\pi,\pi)$. Thus we may expect to find Fourier series representations of distributions, provided that we allow for the series to converge in a distributional sense.

Note that since $D_N \to \delta$ we must also have, by Proposition 7.2, that
$$D_N' = \frac{i}{2\pi} \sum_{n=-N}^{N} n e^{inx} \to \delta' \tag{8.6.3}$$
as $N \to \infty$. By repeatedly differentiating, we see that any formal Fourier series $\sum_{n=-\infty}^{\infty} n^m e^{inx}$ is meaningful in the distributional sense, and is simply, up to a constant multiple, some derivative of the delta function. The following proposition shows that we can allow any sequence of Fourier coefficients as long as the rate of growth is at most a power of $n$.
Proposition 8.9. Let $\{c_n\}_{n=-\infty}^{\infty}$ be any sequence of constants satisfying
$$|c_n| \le C|n|^M \tag{8.6.4}$$
for some constant $C$ and positive integer $M$. Then there exists $T \in \mathcal{D}'(-\pi,\pi)$ such that
$$T = \sum_{n=-\infty}^{\infty} c_n e^{inx} \tag{8.6.5}$$
Proof: Let
$$g(x) = \sum_{n=-\infty}^{\infty} \frac{c_n}{(in)^{M+2}}\, e^{inx} \tag{8.6.6}$$
(with the $n = 0$ term omitted; the constant $c_0$ may be added separately), which is a uniformly convergent Fourier series, so in particular the partial sums $S_N \to g$ in the sense of distributions on $(-\pi,\pi)$. But then $S_N^{(j)} \to g^{(j)}$ also in the distributional sense, and in particular
$$\sum_{n=-\infty}^{\infty} c_n e^{inx} = T := g^{(M+2)} \tag{8.6.7}$$
It seems clear that any distribution on $\mathbb{R}$ of the form (8.6.5) should be $2\pi$-periodic since every partial sum is. To make this precise, define the translate of any distribution $T \in \mathcal{D}'(\mathbb{R}^N)$ by the natural definition $\tau_h T(\phi) = T(\tau_{-h}\phi)$, where as usual $\tau_h \phi(x) = \phi(x-h)$, $h \in \mathbb{R}^N$. We then say that $T$ is periodic with period $h \in \mathbb{R}^N$ if $\tau_h T = T$, and it is immediate that if $T_n$ is $h$-periodic and $T_n \to T$ in $\mathcal{D}'(\mathbb{R}^N)$ then $T$ is also $h$-periodic.

Example 8.3. The Fourier series identity (8.6.2) becomes
$$\sum_{n=-\infty}^{\infty} \delta(x - 2n\pi) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{inx} \tag{8.6.8}$$
when regarded as an identity in $\mathcal{D}'(\mathbb{R})$, since the left side is $2\pi$-periodic and coincides with $\delta$ on $(-\pi,\pi)$.
A $2\pi$-periodic distribution on $\mathbb{R}$ may also naturally be regarded as an element of the distribution space $\mathcal{D}'(\mathbb{T})$, which is defined as the space of continuous linear functionals on $C^\infty(\mathbb{T})$. Here, convergence in $C^\infty(\mathbb{T})$ means that $\phi_n^{(j)} \to \phi^{(j)}$ uniformly on $\mathbb{T}$ for all $j = 0, 1, 2, \dots$. Any function $f \in L^1(\mathbb{T})$ gives rise in the usual way to the regular distribution $T_f$ defined by $T_f(\phi) = \int_{-\pi}^{\pi} f(x)\phi(x)\,dx$, and if $f \in L^2$ then the $n$'th Fourier coefficient is $c_n = \frac{1}{2\pi} T_f(e^{-inx})$. Since $e^{-inx} \in C^\infty(\mathbb{T})$ it follows that
$$c_n = \frac{1}{2\pi}\, T(e^{-inx}) \tag{8.6.9}$$
is defined for $T \in \mathcal{D}'(\mathbb{T})$, and is defined to be the $n$'th Fourier coefficient of the distribution $T$. This definition is then consistent with the definition of Fourier coefficient for a regular distribution, and it can be shown (Exercise 29) that
$$\sum_{n=-N}^{N} c_n e^{inx} \to T \quad \text{in } \mathcal{D}'(\mathbb{T}) \tag{8.6.10}$$
Example 8.4. Let us evaluate the distributional Fourier series
$$\sum_{n=0}^{\infty} e^{inx} \tag{8.6.11}$$
The $n$'th partial sum is
$$s_n(x) = \sum_{k=0}^{n} e^{ikx} = \frac{1 - e^{i(n+1)x}}{1 - e^{ix}} \tag{8.6.12}$$
so that we may write, since $\int_{-\pi}^{\pi} s_n(x)\,dx = 2\pi$,
$$s_n(\phi) = 2\pi\phi(0) + \int_{-\pi}^{\pi} \frac{1 - e^{i(n+1)x}}{1 - e^{ix}}\, (\phi(x) - \phi(0))\,dx \tag{8.6.13}$$
for any test function $\phi$. The function $(\phi(x) - \phi(0))/(1 - e^{ix})$ belongs to $L^2(-\pi,\pi)$, hence
$$\int_{-\pi}^{\pi} \frac{e^{i(n+1)x}}{1 - e^{ix}}\, (\phi(x) - \phi(0))\,dx \to 0 \tag{8.6.14}$$
as $n \to \infty$ by the Riemann-Lebesgue lemma. Next, using obvious trigonometric identities we see that $1/(1 - e^{ix}) = \frac{1}{2}(1 + i\cot\frac{x}{2})$, and so
$$\int_{-\pi}^{\pi} \frac{\phi(x) - \phi(0)}{1 - e^{ix}}\,dx = \lim_{\epsilon\to 0^+} \frac{1}{2} \int_{\epsilon<|x|<\pi} (\phi(x) - \phi(0))\left(1 + i\cot\frac{x}{2}\right)dx \tag{8.6.15}$$
$$= \frac{1}{2} \int_{-\pi}^{\pi} \phi(x)\,dx - \pi\phi(0) \tag{8.6.16}$$
$$+ \lim_{\epsilon\to 0^+} \frac{i}{2} \int_{\epsilon<|x|<\pi} \phi(x)\cot\frac{x}{2}\,dx \tag{8.6.17}$$
The principal value integral in (8.6.17) is naturally defined to be the action of the distribution $\mathrm{pv}(\cot\frac{x}{2})$, and we obtain the final result, upon letting $n \to \infty$, that
$$\sum_{n=0}^{\infty} e^{inx} = \pi\delta + \frac{1}{2} + \frac{i}{2}\,\mathrm{pv}\left(\cot\frac{x}{2}\right) \tag{8.6.18}$$
By taking the real and imaginary parts of this identity we also find
$$\sum_{n=0}^{\infty} \cos nx = \pi\delta + \frac{1}{2} \qquad \sum_{n=1}^{\infty} \sin nx = \frac{1}{2}\,\mathrm{pv}\left(\cot\frac{x}{2}\right) \tag{8.6.19}$$

8.7 Fourier transforms of distributions
Taking again the example of the delta function, now considered as a distribution on $\mathbb{R}^N$, it appears formally correct that it should have a Fourier transform which is a constant function, namely
$$\hat\delta(y) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \delta(x)e^{-ix\cdot y}\,dx = \frac{1}{(2\pi)^{N/2}} \tag{8.7.1}$$
If the inversion theorem remains valid then any constant should also have a Fourier transform, e.g. $\hat 1 = (2\pi)^{N/2}\delta$. On the other hand it will turn out that a function such as $e^x$ does not have a Fourier transform in any reasonable sense.
We will now show that the set of distributions for which the Fourier transform can be defined turns out to be precisely the dual space of the Schwartz space, known also as the space of tempered distributions. To define this we must first have a definition of convergence in $\mathcal{S}(\mathbb{R}^N)$.

Definition 8.2. We say that $\phi_n \to \phi$ in $\mathcal{S}(\mathbb{R}^N)$ if
$$\lim_{n\to\infty} ||x^\alpha D^\beta(\phi_n - \phi)||_{L^\infty(\mathbb{R}^N)} = 0 \quad \text{for any } \alpha, \beta \tag{8.7.2}$$
Proof of the following lemma will be left for the exercises.

Lemma 8.1. If $\phi_n \to \phi$ in $\mathcal{S}(\mathbb{R}^N)$ then $\hat\phi_n \to \hat\phi$ in $\mathcal{S}(\mathbb{R}^N)$.

Definition 8.3. The set of tempered distributions on $\mathbb{R}^N$ is the space of continuous linear functionals on $\mathcal{S}(\mathbb{R}^N)$, denoted $\mathcal{S}'(\mathbb{R}^N)$.
It was already observed that $\mathcal{D}(\mathbb{R}^N) \subset \mathcal{S}(\mathbb{R}^N)$ and in addition, if $\phi_n \to \phi$ in $\mathcal{D}(\mathbb{R}^N)$ then the sequence also converges in $\mathcal{S}(\mathbb{R}^N)$. It therefore follows that
$$\mathcal{S}'(\mathbb{R}^N) \subset \mathcal{D}'(\mathbb{R}^N) \tag{8.7.3}$$
i.e. any tempered distribution is also a distribution, as the choice of language suggests. On the other hand, if $T_f$ is the regular distribution corresponding to the $L^1_{loc}$ function $f(x) = e^x$, then $T_f \notin \mathcal{S}'(\mathbb{R})$ since this would require $\int_{-\infty}^{\infty} e^x\phi(x)\,dx$ to be finite for any $\phi \in \mathcal{S}(\mathbb{R})$, which is not true. Thus the inclusion (8.7.3) is strict. Convergence in $\mathcal{S}'(\mathbb{R}^N)$ is defined in the expected way, analogously to Definition 7.5:
Definition 8.4. If $T, T_n \in \mathcal{S}'(\mathbb{R}^N)$ for $n = 1, 2, \dots$ then we say $T_n \to T$ in $\mathcal{S}'(\mathbb{R}^N)$ (or in the sense of tempered distributions) if $T_n(\phi) \to T(\phi)$ for every $\phi \in \mathcal{S}(\mathbb{R}^N)$.

It is easy to see that the delta function belongs to $\mathcal{S}'(\mathbb{R}^N)$, as does any derivative or translate of the delta function. A regular distribution $T_f$ will belong to $\mathcal{S}'(\mathbb{R}^N)$ provided it satisfies the condition
$$\lim_{|x|\to\infty} \frac{f(x)}{|x|^m} = 0 \tag{8.7.4}$$
for some $m$. Such an $f$ is sometimes referred to as a function of slow growth. In particular, any polynomial belongs to $\mathcal{S}'(\mathbb{R}^N)$.
We can now define the Fourier transform $\hat T$ for any $T \in \mathcal{S}'(\mathbb{R}^N)$. For motivation of the definition, recall the Parseval identity (8.5.17), which amounts to the identity $T_{\hat\psi}(\phi) = T_\psi(\hat\phi)$, if we regard $\phi$ as a function in $\mathcal{S}(\mathbb{R}^N)$ and $\psi$ as a tempered distribution.

Definition 8.5. If $T \in \mathcal{S}'(\mathbb{R}^N)$ then $\hat T$ is defined by $\hat T(\phi) = T(\hat\phi)$ for any $\phi \in \mathcal{S}(\mathbb{R}^N)$.

The action of $\hat T$ on any $\phi \in \mathcal{S}(\mathbb{R}^N)$ is well-defined, since $\hat\phi \in \mathcal{S}(\mathbb{R}^N)$, and linearity of $\hat T$ is immediate. If $\phi_n \to \phi$ in $\mathcal{S}(\mathbb{R}^N)$ then by Lemma 8.1 $\hat\phi_n \to \hat\phi$ in $\mathcal{S}(\mathbb{R}^N)$, so that
$$\hat T(\phi_n) = T(\hat\phi_n) \to T(\hat\phi) = \hat T(\phi) \tag{8.7.5}$$
We have thus verified that $\hat T \in \mathcal{S}'(\mathbb{R}^N)$ whenever $T \in \mathcal{S}'(\mathbb{R}^N)$.
Example 8.5. If $T = \delta$, then from the definition,
$$\hat T(\phi) = T(\hat\phi) = \hat\phi(0) = \frac{1}{(2\pi)^{N/2}} \int_{\mathbb{R}^N} \phi(x)\,dx \tag{8.7.6}$$
Thus, as expected, $\hat\delta = \frac{1}{(2\pi)^{N/2}}$, the constant distribution.
Example 8.6. If $T = 1$ (the constant distribution) then
$$\hat T(\phi) = T(\hat\phi) = \int_{\mathbb{R}^N} \hat\phi(x)\,dx = (2\pi)^{N/2}\,\hat{\hat\phi}(0) = (2\pi)^{N/2}\,\phi(0) \tag{8.7.7}$$
where the last equality follows from the inversion theorem, which is valid for any $\phi \in \mathcal{S}(\mathbb{R}^N)$. Thus again the expected result is obtained,
$$\hat 1 = (2\pi)^{N/2}\,\delta \tag{8.7.8}$$
The previous two examples verify the validity of one particular instance of the Fourier inversion theorem in the distributional context, but it turns out to be rather easy to prove that it always holds. One more definition is needed first, that of the reflection of a distribution.

Definition 8.6. If $T \in \mathcal{D}'(\mathbb{R}^N)$ then $\check T$, the reflection of $T$, is the distribution defined by $\check T(\phi) = T(\check\phi)$.
We now obtain the Fourier inversion theorem in its most general form, analogous to the statement (8.4.16) first justified when $f, \hat f$ are in $L^1(\mathbb{R}^N)$.

Theorem 8.6. If $T \in \mathcal{S}'(\mathbb{R}^N)$ then $\hat{\hat T} = \check T$.

Proof: For any $\phi \in \mathcal{S}(\mathbb{R}^N)$ we have
$$\hat{\hat T}(\phi) = \hat T(\hat\phi) = T(\hat{\hat\phi}) = T(\check\phi) = \check T(\phi) \tag{8.7.9}$$

The apparent triviality of this proof should not be misconstrued, as it relies on the validity of the inversion theorem in the Schwartz space, and other technical machinery which we have developed.
Here we state several more simple but useful properties. Here and elsewhere, we follow the convention of using $x$ and $y$ as the independent variables before and after Fourier transformation respectively.

Proposition 8.10. Let $T \in \mathcal{S}'(\mathbb{R}^N)$ and let $\alpha$ be a multi-index. Then

1. $x^\alpha T \in \mathcal{S}'(\mathbb{R}^N)$.
2. $D^\alpha T \in \mathcal{S}'(\mathbb{R}^N)$.
3. $D^\alpha \hat T = ((-ix)^\alpha T)^\wedge$.
4. $(D^\alpha T)^\wedge = (iy)^\alpha \hat T$.
5. If $T_n \in \mathcal{S}'(\mathbb{R}^N)$ and $T_n \to T$ in $\mathcal{S}'(\mathbb{R}^N)$ then $\hat T_n \to \hat T$ in $\mathcal{S}'(\mathbb{R}^N)$.

Proof: We give the proof of part 3 only, leaving the rest for the exercises. Just like the inversion theorem, it is more or less a direct consequence of the corresponding identity for functions in $\mathcal{S}(\mathbb{R}^N)$. For any $\phi \in \mathcal{S}(\mathbb{R}^N)$ we have
$$D^\alpha \hat T(\phi) = (-1)^{|\alpha|}\, \hat T(D^\alpha \phi) \tag{8.7.10}$$
$$= (-1)^{|\alpha|}\, T((D^\alpha \phi)^\wedge) \tag{8.7.11}$$
$$= (-1)^{|\alpha|}\, T((iy)^\alpha \hat\phi) \tag{8.7.12}$$
$$= ((-ix)^\alpha T)(\hat\phi) = ((-ix)^\alpha T)^\wedge(\phi) \tag{8.7.13}$$
as needed, where we used (8.5.7) to obtain (8.7.12).
Example 8.7. If $T = \delta'$, regarded as an element of $\mathcal{S}'(\mathbb{R})$, then
$$\hat T = (\delta')^\wedge = iy\,\hat\delta = \frac{iy}{\sqrt{2\pi}} \tag{8.7.14}$$
by part 4 of the previous proposition. In other words
$$\hat T(\phi) = \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x\,\phi(x)\,dx \tag{8.7.15}$$
Example 8.8. Let $T = H(x)$, the Heaviside function, again regarded as an element of $\mathcal{S}'(\mathbb{R})$. To evaluate the Fourier transform $\hat H$, one possible approach is to use part 4 of Proposition 8.10 along with $H' = \delta$ to first obtain $iy\hat H = 1/\sqrt{2\pi}$. A formal solution is then $\hat H = 1/\sqrt{2\pi}\,iy$, but it must then be recognized that this distributional equation does not have a unique solution, rather we can add to it any solution of $yT = 0$, e.g. $T = C\delta$ for any constant $C$. It must be verified that there are no other solutions, the constant $C$ must be evaluated, and the meaning of $1/y$ in the distribution sense must be made precise. See Example 8, section 2.4 of [32] for details of how this calculation is completed.

An alternate approach, which yields other useful formulas along the way, is as follows. For any $\phi \in \mathcal{S}(\mathbb{R})$ we have
$$\hat H(\phi) = H(\hat\phi) = \int_0^\infty \hat\phi(y)\,dy \tag{8.7.16}$$
$$= \frac{1}{\sqrt{2\pi}} \int_0^\infty\int_{-\infty}^{\infty} \phi(x)e^{-ixy}\,dx\,dy \tag{8.7.17}$$
$$= \lim_{R\to\infty} \frac{1}{\sqrt{2\pi}} \int_0^R\int_{-\infty}^{\infty} \phi(x)e^{-ixy}\,dx\,dy \tag{8.7.18}$$
$$= \lim_{R\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi(x)\left(\int_0^R e^{-ixy}\,dy\right)dx \tag{8.7.19}$$
$$= \lim_{R\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi(x)\left(\frac{1 - e^{-iRx}}{ix}\right)dx \tag{8.7.20}$$
$$= \lim_{R\to\infty} \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\sin Rx}{x}\,\phi(x)\,dx + \frac{i}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\cos Rx - 1}{x}\,\phi(x)\,dx\right] \tag{8.7.21}$$
It can then be verified that
$$\frac{\sin Rx}{x} \to \pi\delta \qquad \frac{\cos Rx - 1}{x} \to -\mathrm{pv}\,\frac{1}{x} \tag{8.7.22}$$
as $R \to \infty$ in $\mathcal{D}'(\mathbb{R})$. The first limit is just a restatement of the result of part b) in Exercise 7 of Chapter 7, and the second we leave for the exercises. The final result, therefore, is that
$$\hat H = \sqrt{\frac{\pi}{2}}\,\delta - \frac{i}{\sqrt{2\pi}}\,\mathrm{pv}\,\frac{1}{x} \tag{8.7.23}$$
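Formula (8.7.23) can be spot-checked against a concrete test function (a sketch assuming NumPy is available; the test function $\phi(x) = (x+1)e^{-x^2/2}$, with $\hat\phi(y) = (1 - iy)e^{-y^2/2}$, is our choice): the left side computes $\hat H(\phi) = \int_0^\infty \hat\phi$, the right side evaluates $\sqrt{\pi/2}\,\phi(0) - \frac{i}{\sqrt{2\pi}}\,\mathrm{pv}\!\int \phi(x)/x\,dx$.

```python
import numpy as np

# Left side: H(phihat) = integral of phihat over (0, infinity).
y = np.linspace(0.0, 30.0, 300001)
dy = y[1] - y[0]
phihat = (1 - 1j * y) * np.exp(-y**2 / 2)
lhs = (np.sum(phihat) - 0.5 * phihat[0] - 0.5 * phihat[-1]) * dy  # trapezoid

# Right side: sqrt(pi/2) phi(0) - (i/sqrt(2 pi)) pv integral of phi(x)/x.
# The symmetric grid below omits x = 0 exactly, so the odd singular part
# of phi(x)/x cancels pairwise, which realizes the principal value.
x = np.linspace(-30.0, 30.0, 600000)
dx = x[1] - x[0]
phi = (x + 1) * np.exp(-x**2 / 2)
pv = np.sum(phi / x) * dx
rhs = np.sqrt(np.pi / 2) * 1.0 - 1j / np.sqrt(2 * np.pi) * pv
err = abs(lhs - rhs)
```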
Example 8.9. Let $T_n = \delta(x - n)$, i.e. $T_n(\phi) = \phi(n)$, for $n = 0, \pm 1, \dots$, so that
$$\hat T_n(\phi) = \hat\phi(n) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi(x)e^{-inx}\,dx \tag{8.7.24}$$
Equivalently, $\sqrt{2\pi}\,\hat T_n = e^{-inx}$. If we now set $T = \sum_{n=-\infty}^{\infty} T_n$ then $T \in \mathcal{S}'(\mathbb{R})$ and
$$\hat T = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} e^{-inx} = \frac{1}{\sqrt{2\pi}} \sum_{n=-\infty}^{\infty} e^{inx} = \sqrt{2\pi} \sum_{n=-\infty}^{\infty} \delta(x - 2\pi n) \tag{8.7.25}$$
where the last equality comes from (8.6.8). The relation $T(\hat\phi) = \hat T(\phi)$ then yields the very interesting identity
$$\sum_{n=-\infty}^{\infty} \hat\phi(n) = \sqrt{2\pi} \sum_{n=-\infty}^{\infty} \phi(2\pi n) \tag{8.7.26}$$
valid at least for $\phi \in \mathcal{S}(\mathbb{R})$, which is known as the Poisson summation formula.
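The formula is easy to verify numerically for the self-dual Gaussian $\phi(x) = e^{-x^2/2}$, for which $\hat\phi = \phi$ by (8.4.6) with $\alpha = \frac{1}{2}$ (a sketch assuming NumPy is available; the truncation at $|n| \le 50$ is harmless since both sums converge extremely fast):

```python
import numpy as np

# Poisson summation (8.7.26) for phi(x) = exp(-x^2/2) = phihat(x):
# sum of phihat(n) should equal sqrt(2 pi) times sum of phi(2 pi n).
n = np.arange(-50, 51)
lhs = np.sum(np.exp(-n**2 / 2))
rhs = np.sqrt(2 * np.pi) * np.sum(np.exp(-(2 * np.pi * n)**2 / 2))
err = abs(lhs - rhs)
```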
We conclude this section with some discussion of the Fourier transform and convolution in a distributional setting. Recall we gave a definition of the convolution $T * \phi$ in Definition 7.7, when $T \in \mathcal{D}'(\mathbb{R}^N)$ and $\phi \in \mathcal{D}(\mathbb{R}^N)$. We can use precisely the same definition if $T \in \mathcal{S}'(\mathbb{R}^N)$ and $\phi \in \mathcal{S}(\mathbb{R}^N)$, that is

Definition 8.7. If $T \in \mathcal{S}'(\mathbb{R}^N)$ and $\phi \in \mathcal{S}(\mathbb{R}^N)$ then $(T * \phi)(x) = T(\tau_x\check\phi)$.

Note that in terms of the action of the distribution $T$, $x$ is just a parameter, and that we must regard $\tau_x\check\phi$ as a function of some unnamed other variable, say $y$ or $\cdot$. By methods similar to those used in the proof of Theorem 7.3 it can be shown that
$$T * \phi \in C^\infty(\mathbb{R}^N) \cap \mathcal{S}'(\mathbb{R}^N) \tag{8.7.27}$$
and
$$D^\alpha(T * \phi) = D^\alpha T * \phi = T * D^\alpha\phi \tag{8.7.28}$$
In addition we have the following generalization of Proposition 8.8:
Theorem 8.7. If $T \in \mathcal{S}'(\mathbb{R}^N)$ and $\phi \in \mathcal{S}(\mathbb{R}^N)$ then
$$(T * \phi)^\wedge = (2\pi)^{N/2}\,\hat T\hat\phi \tag{8.7.29}$$
Sketch of proof: First observe that from Proposition 8.8 and the inversion theorem we see that
$$(\phi\psi)^\wedge = \frac{1}{(2\pi)^{N/2}}\,(\hat\phi * \hat\psi) \tag{8.7.30}$$
for $\phi, \psi \in \mathcal{S}(\mathbb{R}^N)$. Thus for $\psi \in \mathcal{S}(\mathbb{R}^N)$
$$(\hat T\hat\phi)(\psi) = \hat T(\hat\phi\psi) = T((\hat\phi\psi)^\wedge) = \frac{1}{(2\pi)^{N/2}}\,T(\hat{\hat\phi} * \hat\psi) = \frac{1}{(2\pi)^{N/2}}\,T(\check\phi * \hat\psi) \tag{8.7.31}$$
On the other hand,
$$(T * \phi)^\wedge(\psi) = (T * \phi)(\hat\psi) \tag{8.7.32}$$
$$= \int_{\mathbb{R}^N} (T * \phi)(x)\hat\psi(x)\,dx = \int_{\mathbb{R}^N} T(\tau_x\check\phi)\hat\psi(x)\,dx \tag{8.7.33}$$
$$= T\left(\int_{\mathbb{R}^N} \tau_x\check\phi(\cdot)\hat\psi(x)\,dx\right) = T\left(\int_{\mathbb{R}^N} \check\phi(\cdot - x)\hat\psi(x)\,dx\right) \tag{8.7.34}$$
$$= T(\check\phi * \hat\psi) \tag{8.7.35}$$
which completes the proof.

We have labeled the above proof a 'sketch' because one key step, the first equality in (8.7.34), was not explained adequately. See the conclusion of the proof of Theorem 7.19 in [30] for why it is permissible to move $T$ across the integral in this way.
8.8 Exercises

1. Find the Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{inx}$ for the function $f(x) = x$ on $(-\pi,\pi)$. Use some sort of computer graphics to plot a few of the partial sums of this series on the interval $[-3\pi, 3\pi]$.
2. Use the Fourier series in problem 1 to find the exact value of the series
$$\sum_{n=1}^{\infty} \frac{1}{n^2} \qquad \sum_{n=1}^{\infty} \frac{1}{(2n-1)^2}$$
3. Evaluate explicitly the Fourier series, justifying your steps:
$$\sum_{n=1}^{\infty} \frac{n}{2^n} \cos(nx)$$
(Suggestion: start by evaluating $\sum_{n=1}^{\infty} \frac{e^{inx}}{2^n}$, which is a geometric series.)
4. Produce a sketch of the Dirichlet and Fejér kernels $D_N$ and $K_N$, either by hand or by computer, for some reasonably large value of $N$.

5. Verify the first identity in (8.1.19).

6. We say that $f \in H^k(\mathbb{T})$ if $f \in \mathcal{D}'(\mathbb{T})$ and its Fourier coefficients $c_n$ satisfy
$$\sum_{n=-\infty}^{\infty} n^{2k}|c_n|^2 < \infty \tag{8.8.1}$$
a) If $f \in H^1(\mathbb{T})$ show that $\sum_{n=-\infty}^{\infty} |c_n|$ is convergent and so the Fourier series of $f$ is uniformly convergent.

b) Show that $f \in H^k(\mathbb{T})$ for every $k$ if and only if $f \in C^\infty(\mathbb{T})$.
7. Evaluate the Fourier series
$$\sum_{n=1}^{\infty} (-1)^n n\sin(nx)$$
in $\mathcal{D}'(\mathbb{R})$. If possible, plot some partial sums of this series.

8. Find the Fourier transform of $H(x)e^{-\alpha x}$ for $\alpha > 0$.

9. Let $f \in L^1(\mathbb{R}^N)$.

a) If $f_\lambda(x) = f(\lambda x)$ for $\lambda > 0$, find a relationship between $\hat f_\lambda$ and $\hat f$.

b) If $f_h(x) = f(x - h)$ for $h \in \mathbb{R}^N$, find a relationship between $\hat f_h$ and $\hat f$.

10. If $f \in L^1(\mathbb{R}^N)$ show that $\tau_h f \to f$ in $L^1(\mathbb{R}^N)$ as $h \to 0$. (Hint: First prove it when $f$ is continuous and of compact support.)
11. Show that
$$\int_{\mathbb{R}^N} \phi(x)\overline{\psi(x)}\,dx = \int_{\mathbb{R}^N} \hat\phi(x)\overline{\hat\psi(x)}\,dx \tag{8.8.2}$$
for $\phi$ and $\psi$ in the Schwartz space. (This is also sometimes called the Parseval identity and leads even more directly to the Plancherel formula.)

12. Prove Lemma 8.1.
13. In this problem $J_n$ denotes the Bessel function of the first kind and of order $n$. It may be defined in various ways, one of which is
$$J_n(z) = \frac{i^{-n}}{\pi} \int_0^\pi e^{iz\cos\theta}\cos(n\theta)\,d\theta \tag{8.8.3}$$
Suppose that $f$ is a radially symmetric function in $L^1(\mathbb{R}^2)$, i.e. $f(x) = f(r)$ where $r = |x|$. Show that
$$\hat f(y) = \int_0^\infty J_0(r|y|)f(r)\,r\,dr$$
It follows in particular that $\hat f$ is also radially symmetric. Using the known identity $\frac{d}{dz}(zJ_1(z)) = zJ_0(z)$, compute the Fourier transform of $\chi_{B(0,R)}$, the indicator function of the ball $B(0,R)$ in $\mathbb{R}^2$.
14. For $\alpha \in \mathbb{R}$ let $f_\alpha(x) = \cos\alpha x$.

a) Find the Fourier transform $\hat f_\alpha$.

b) Find $\lim_{\alpha\to 0} \hat f_\alpha$ and $\lim_{\alpha\to\infty} \hat f_\alpha$ in the sense of distributions.

15. Compute the Fourier transform of the Heaviside function $H(x)$ in yet a different way by justifying that
$$\hat H = \lim_{n\to\infty} \hat H_n$$
in the sense of distributions, where $H_n(x) = H(x)e^{-x/n}$, and then evaluating this limit.
16. Prove the remaining parts of Proposition 8.10.

17. Let $f \in C(\mathbb{R})$ be $2\pi$-periodic. It then has a Fourier series in the classical sense, but it also has a Fourier transform since $f$ is a tempered distribution. What is the relationship between the Fourier series and the Fourier transform?

18. Let $f \in L^2(\mathbb{R}^N)$. Show that $f$ is real valued if and only if $\hat f(-k) = \overline{\hat f(k)}$ for all $k \in \mathbb{R}^N$. What is the analog of this for Fourier series?
19. Let $f$ be a continuous $2\pi$-periodic function with the usual Fourier coefficients
$$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)e^{-inx}\,dx$$
Show that
$$c_n = -\frac{1}{2\pi} \int_{-\pi}^{\pi} f\!\left(x + \frac{\pi}{n}\right)e^{-inx}\,dx$$
and therefore
$$c_n = \frac{1}{4\pi} \int_{-\pi}^{\pi} \left(f(x) - f\!\left(x + \frac{\pi}{n}\right)\right)e^{-inx}\,dx.$$
If $f$ is Lipschitz continuous, use this to show that there exists a constant $M$ such that
$$|c_n| \le \frac{M}{|n|} \qquad n \ne 0$$
20. Let $R = (-1,1) \times (-1,1)$ be a square in $\mathbb{R}^2$, let $f$ be the indicator function of $R$ and $g$ be the indicator function of the complement of $R$.

a) Compute the Fourier transforms $\hat f$ and $\hat g$.

b) Is either $\hat f$ or $\hat g$ in $L^2(\mathbb{R}^2)$?

21. Verify the second limit in (8.7.22).

22. A distribution $T$ on $\mathbb{R}^N$ is even if $\check T = T$, and odd if $\check T = -T$. Prove that the Fourier transform of an even (resp. odd) tempered distribution is even (resp. odd).
23. Let $\psi \in \mathcal{S}(\mathbb{R})$ with $\|\psi\|_{L^2(\mathbb{R})} = 1$, and show that
$$\left(\int_{-\infty}^{\infty} x^2|\psi(x)|^2\,dx\right)\left(\int_{-\infty}^{\infty} y^2|\hat{\psi}(y)|^2\,dy\right) \ge \frac{1}{4} \qquad (8.8.4)$$
This is a mathematical statement of the Heisenberg uncertainty principle. (Suggestion: start with the identity
$$1 = \int_{-\infty}^{\infty} |\psi(x)|^2\,dx = -\int_{-\infty}^{\infty} x\,\frac{d}{dx}|\psi(x)|^2\,dx$$
Make sure to allow $\psi$ to be complex valued.) Show that equality is achieved in
(8.8.4) if $\psi$ is a Gaussian.
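For the normalized Gaussian $\psi(x) = \pi^{-1/4}e^{-x^2/2}$ one has $\hat{\psi} = \psi$ under the unitary transform convention used in these notes, so the equality case of (8.8.4) reduces to computing the single moment $\int x^2|\psi(x)|^2\,dx = 1/2$; a rough trapezoid-rule sketch (interval and step count are arbitrary choices):

```python
import math

# |psi(x)|^2 for the normalized Gaussian psi(x) = pi^{-1/4} e^{-x^2/2}
psi2 = lambda x: math.exp(-x * x) / math.sqrt(math.pi)

def trap(F, a=-10.0, b=10.0, n=8000):
    # simple trapezoid rule on [a, b]
    h = (b - a) / n
    return h * (sum(F(a + k * h) for k in range(1, n)) + 0.5 * (F(a) + F(b)))

moment = trap(lambda x: x * x * psi2(x))   # int x^2 |psi|^2 dx, should be 1/2
# psi-hat equals psi for this Gaussian, so the product in (8.8.4) is moment**2 = 1/4
```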
24. Let $\theta(t) = \sum_{n=-\infty}^{\infty} e^{-\pi n^2 t}$. (It is a particular case of a class of special functions
known as theta functions.) Use the Poisson summation formula (8.7.26) to show
that
$$\theta(t) = \frac{1}{\sqrt{t}}\,\theta\!\left(\frac{1}{t}\right)$$
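The functional equation of $\theta$ is easy to verify numerically, since the series converges extremely fast for $t$ bounded away from $0$; a minimal sketch (the value $t = 0.7$ and the truncation level are arbitrary):

```python
import math

def theta(t, terms=200):
    # truncated theta series; terms far out underflow harmlessly to 0.0
    return sum(math.exp(-math.pi * n * n * t) for n in range(-terms, terms + 1))

t = 0.7
lhs = theta(t)
rhs = theta(1 / t) / math.sqrt(t)   # (1/sqrt(t)) * theta(1/t)
```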
25. Use (8.7.23) to obtain the Fourier transform of $\mathrm{pv}\,\frac{1}{x}$,
$$\Big(\mathrm{pv}\,\frac{1}{x}\Big)^{\widehat{}}\,(y) = -i\sqrt{\frac{\pi}{2}}\,\mathrm{sgn}\,y \qquad (8.8.5)$$
26. The proof of Theorem 8.7 implicitly used the fact that if $\phi, \psi \in \mathcal{S}(\mathbb{R}^N)$ then
$\phi * \psi \in \mathcal{S}(\mathbb{R}^N)$. Prove this property.
27. Where is the mistake in the following argument? If $u(x) = e^{-x}$ then $u' + u = 0$, so
by Fourier transformation
$$iy\hat{u}(y) + \hat{u}(y) = (1 + iy)\hat{u}(y) = 0 \qquad y \in \mathbb{R}$$
Since $1 + iy \ne 0$ for real $y$, it follows that $\hat{u}(y) = 0$ for all real $y$ and hence $u(x) = 0$.
28. If $f \in L^2(\mathbb{R}^N)$, the autocorrelation function of $f$ is defined to be
$$g(x) = (f * \bar{\check{f}})(x) = \int_{\mathbb{R}^N} f(y)\overline{f(y-x)}\,dy$$
Show that $\hat{g}(y) = |\hat{f}(y)|^2$, $\hat{g} \in L^1(\mathbb{R}^N)$ and that $g \in C_0(\mathbb{R}^N)$. ($\hat{g}$ is called the power
spectrum or spectral density of $f$.)
29. If $T \in \mathcal{D}'(\mathbb{T})$ and $c_n = T(e^{-inx})$, show that $T = \sum_{n=-\infty}^{\infty} c_n e^{inx}$ in $\mathcal{D}'(\mathbb{T})$.
30. The ODE $u'' - xu = 0$ is known as Airy's equation, and solutions of it are called
Airy functions.
a) If $u$ is an Airy function which is also a tempered distribution, use the Fourier
transform to find a first order ODE for $\hat{u}(y)$.
b) Find the general solution of the ODE for $\hat{u}$.
c) Obtain the formal solution formula
$$u(x) = C\int_{-\infty}^{\infty} e^{ixy + iy^3/3}\,dy$$
d) Explain why this formula is not meaningful as an ordinary integral, and how it
can be properly interpreted.
e) Is this the general solution of the Airy equation?
Chapter 9
Distributions and Differential
Equations
In this chapter we will begin to apply the theory of distributions developed in the previous chapter in a more systematic way to problems in differential equations. The modern
theory of partial differential equations, and to a somewhat lesser extent ordinary differential equations, makes extensive use of the so-called Sobolev spaces, which we now proceed
to introduce.
9.1 Weak derivatives and Sobolev spaces
If $f \in L^p(\Omega)$ then for any multiindex $\alpha$ we know that $D^\alpha f$ exists as an element of $\mathcal{D}'(\Omega)$,
but in general the distributional derivative need not itself be a function. However, if there
exists $g \in L^q(\Omega)$ such that $D^\alpha f = T_g$ in $\mathcal{D}'(\Omega)$, then we say that $f$ has the weak $\alpha$
derivative $g$ in $L^q(\Omega)$. That is to say, the requirement is that
$$\int_\Omega f D^\alpha\phi\,dx = (-1)^{|\alpha|}\int_\Omega g\phi\,dx \qquad \forall\phi \in \mathcal{D}(\Omega) \qquad (9.1.1)$$
and we write $D^\alpha f \in L^q(\Omega)$. It is important to distinguish the concept of weak derivative
from that of the almost everywhere (a.e.) derivative.
Example 9.1. Let $\Omega = (-1,1)$ and $f(x) = |x|$. Obviously $f \in L^p(\Omega)$ for any $1 \le p \le \infty$,
and in the sense of distributions we have $f'(x) = 2H(x) - 1$ (use, for example, (7.3.27)).
Thus $f' \in L^q(\Omega)$ for any $1 \le q \le \infty$. On the other hand $f'' = 2\delta$, which does not coincide
with $T_g$ for any $g$ in any $L^q$ space. Thus $f$ has the weak first derivative, but not the
weak second derivative, in $L^q(\Omega)$ for any $q$. The first derivative of $f$ coincides with its
a.e. derivative. In the case of the second derivative, $f'' = 2\delta$ in the sense of distributions,
and obviously $f'' = 0$ a.e., but this function does not coincide with the weak second
derivative; indeed there is no weak second derivative according to the above definition.
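The defining identity (9.1.1) for the weak first derivative of $f(x) = |x|$ can be checked numerically against one concrete test function; the choice $\phi(x) = x(1-x^2)^3$ below is mine, and for it both sides of (9.1.1) evaluate to $-1/4$:

```python
import math

f = abs                                   # f(x) = |x| on (-1, 1)
g = lambda x: math.copysign(1.0, x)       # candidate weak derivative, g = 2H - 1
phi  = lambda x: x * (1 - x * x) ** 3     # a test function in D(-1, 1)
dphi = lambda x: (1 - x * x) ** 3 - 6 * x * x * (1 - x * x) ** 2

def trap(F, a=-1.0, b=1.0, n=4000):
    # trapezoid rule; the kink of |x| at 0 falls on a grid node
    h = (b - a) / n
    return h * (sum(F(a + k * h) for k in range(1, n)) + 0.5 * (F(a) + F(b)))

lhs = trap(lambda x: f(x) * dphi(x))      # int f phi' dx
rhs = -trap(lambda x: g(x) * phi(x))      # -int g phi dx, cf. (9.1.1) with |alpha| = 1
```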
We may now define the spaces $W^{k,p}(\Omega)$, known as Sobolev spaces.
Definition 9.1. If $\Omega \subset \mathbb{R}^N$ is an open set, $1 \le p \le \infty$ and $k = 1, 2, \dots$ then
$$W^{k,p}(\Omega) := \{f \in \mathcal{D}'(\Omega) : D^\alpha f \in L^p(\Omega), \ |\alpha| \le k\} \qquad (9.1.2)$$
We emphasize that the meaning of the condition $D^\alpha f \in L^p(\Omega)$ is that $f$ should have
the weak $\alpha$ derivative in $L^p(\Omega)$ as discussed above. Clearly
$$\mathcal{D}(\Omega) \subset W^{k,p}(\Omega) \subset L^p(\Omega) \qquad (9.1.3)$$
so that $W^{k,p}(\Omega)$ is always a dense subspace of $L^p(\Omega)$ for $1 \le p < \infty$.
Example 9.2. If $f(x) = |x|$ then, referring to the discussion in the previous example, we
see that $f \in W^{1,p}(-1,1)$ for any $p \in [1,\infty]$, but $f \notin W^{2,p}$ for any $p$.
It may be readily checked that $W^{k,p}(\Omega)$ is a normed linear space with norm
$$\|f\|_{W^{k,p}(\Omega)} = \begin{cases} \Big(\sum_{|\alpha|\le k} \|D^\alpha f\|^p_{L^p(\Omega)}\Big)^{\frac{1}{p}} & 1 \le p < \infty \\[1ex] \max_{|\alpha|\le k} \|D^\alpha f\|_{L^\infty(\Omega)} & p = \infty \end{cases} \qquad (9.1.4)$$
Furthermore, the necessary completeness property can be shown (Exercise 5, or see Theorem 9.1 below) so that $W^{k,p}(\Omega)$ is a Banach space. When $p = 2$ the norm may be
regarded as arising from the inner product
$$\langle f, g\rangle = \sum_{|\alpha|\le k} \int_\Omega D^\alpha f(x)\overline{D^\alpha g(x)}\,dx \qquad (9.1.5)$$
so that it is a Hilbert space. The alternative notation $H^k(\Omega)$ is commonly used in place
of $W^{k,2}(\Omega)$.
There is a second natural way to give meaning to the idea of a function $f \in L^p(\Omega)$
having a derivative in an $L^q$ space, which is as follows: if there exists $g \in L^q(\Omega)$ such
that there exists $f_n \in C^\infty(\Omega)$ satisfying $f_n \to f$ in $L^p(\Omega)$ and $D^\alpha f_n \to g$ in $L^q(\Omega)$, then
we say $f$ has the strong $\alpha$ derivative $g$ in $L^q(\Omega)$.
It is elementary to see that a strong derivative is also a weak derivative: we simply
let $n \to \infty$ in the identity
$$\int_\Omega \phi D^\alpha f_n\,dx = (-1)^{|\alpha|}\int_\Omega f_n D^\alpha\phi\,dx \qquad (9.1.6)$$
for any test function $\phi$. Far more interesting is that when $p < \infty$ the converse statement is
also true, that is, weak=strong. This important result, which shall not be proved here, was
first established by Friedrichs [12] in some special situations, and then in full generality
by Meyers and Serrin [23]. A more thorough discussion may be found, for example, in
Chapter 3 of Adams [1]. The key idea is to use convolution, as in Theorem 7.5, to obtain
the needed sequence $f_n$ of $C^\infty$ functions. For $f \in W^{k,p}(\Omega)$ the approximating sequence
may clearly be supposed to belong to $C^\infty(\Omega) \cap W^{k,p}(\Omega)$, so this space is dense in $W^{k,p}(\Omega)$
and we have
Theorem 9.1. For any open set $\Omega \subset \mathbb{R}^N$, $1 \le p < \infty$ and $k = 0, 1, 2\dots$ the Sobolev
space $W^{k,p}(\Omega)$ coincides with the closure of $C^\infty(\Omega) \cap W^{k,p}(\Omega)$ in the $W^{k,p}(\Omega)$ norm.
We now define another class of Sobolev spaces which will be important for later use.
Definition 9.2. For $\Omega \subset \mathbb{R}^N$, $W_0^{k,p}(\Omega)$ is defined to be the closure of $C_0^\infty(\Omega)$ in the
$W^{k,p}(\Omega)$ norm.
Obviously $W_0^{k,p}(\Omega) \subset W^{k,p}(\Omega)$, but it may not be immediately clear whether these
are actually the same space. In fact this is certainly true when $k = 0$, since in this case
we know $C_0^\infty(\Omega)$ is dense in $L^p(\Omega)$, $1 \le p < \infty$. It also turns out to be correct for any $k, p$
when $\Omega = \mathbb{R}^N$ (see Corollary 3.19 of Adams [1]). But in general the inclusion is strict,
and $f \in W_0^{k,p}(\Omega)$ carries the interpretation that $D^\alpha f = 0$ on $\partial\Omega$ for $|\alpha| \le k - 1$. This
topic will be taken up in more detail in a later chapter.
9.2 Differential equations in $\mathcal{D}'$
If we consider the simplest differential equation $u' = f$ on an interval $(a,b) \subset \mathbb{R}$, then
from elementary calculus we know that if $f$ is continuous on $[a,b]$, then every solution
is of the form $u(x) = \int_a^x f(y)\,dy + C$, for some constant $C$. Furthermore in this case
$u \in C^1([a,b])$, $u'(x) = f(x)$ for every $x \in (a,b)$, and we would refer to $u$ as a classical
solution of $u' = f$. If we make the weaker assumption that $f \in L^1(a,b)$ then we can no
longer expect $u$ to be $C^1$ or $u'(x) = f(x)$ to hold at every point, since $f$ itself is only
defined up to sets of measure zero. If, however, we let $u(x) = \int_a^x f(y)\,dy + C$ then it is
an important result of measure theory that $u'(x) = f(x)$ a.e. on $(a,b)$. The question
remains whether all solutions of $u' = f$ are of this form, and the answer must now
depend on precisely what is meant by 'solution'. If we were to interpret the differential
equation as meaning $u' = f$ a.e. then the answer is no. For example, $u(x) = H(x)$ is
a nonconstant function on $(-1,1)$ with $u'(x) = 0$ for $x \ne 0$. An alternative meaning is
that the differential equation should be satisfied in the sense of distributions on $(a,b)$, in
which case we have the following theorem.
Theorem 9.2. Let $f \in L^1(a,b)$.
a) If $F(x) = \int_a^x f(y)\,dy$ then $F' = f$ in $\mathcal{D}'(a,b)$.
b) If $u' = f$ in $\mathcal{D}'(a,b)$, then there exists a constant $C$ such that
$$u(x) = \int_a^x f(y)\,dy + C \qquad a < x < b \qquad (9.2.1)$$
Proof: If $F(x) = \int_a^x f(y)\,dy$, then for any $\phi \in C_0^\infty(a,b)$ we have
$$F'(\phi) = -F(\phi') = -\int_a^b F(x)\phi'(x)\,dx \qquad (9.2.2)$$
$$= -\int_a^b \left(\int_a^x f(y)\,dy\right)\phi'(x)\,dx \qquad (9.2.3)$$
$$= -\int_a^b f(y)\left(\int_y^b \phi'(x)\,dx\right)dy \qquad (9.2.4)$$
$$= \int_a^b f(y)\phi(y)\,dy = f(\phi) \qquad (9.2.5)$$
Here the interchange of the order of integration in the third line is easily justified by Fubini's
theorem. This proves part a).
Now if $u' = f$ in the distributional sense then $T = u - F$ satisfies $T' = 0$ in $\mathcal{D}'(a,b)$,
and we will finish by showing that $T$ must be a constant. Choose $\phi_0 \in C_0^\infty(a,b)$ such
that $\int_a^b \phi_0(y)\,dy = 1$. If $\phi \in C_0^\infty(a,b)$, set
$$\psi(x) = \phi(x) - \left(\int_a^b \phi(y)\,dy\right)\phi_0(x) \qquad (9.2.6)$$
so that $\psi \in C_0^\infty(a,b)$ and $\int_a^b \psi(x)\,dx = 0$. Let
$$\zeta(x) = \int_a^x \psi(y)\,dy \qquad (9.2.7)$$
Obviously $\zeta \in C^\infty(a,b)$ since $\zeta' = \psi$, but in fact $\zeta \in C_0^\infty(a,b)$ since $\zeta(a) = \zeta(b) = 0$ and
$\zeta' = \psi \equiv 0$ in some neighborhood of $a$ and of $b$. Finally it follows, since $T' = 0$, that
$$0 = T'(\zeta) = -T(\zeta') = -T(\psi) = \left(\int_a^b \phi(y)\,dy\right)T(\phi_0) - T(\phi) \qquad (9.2.8)$$
or equivalently $T(\phi) = \int_a^b C\phi(y)\,dy$ where $C = T(\phi_0)$. Thus $T$ is the distribution corresponding to the constant function $C$.
We emphasize that part b) of this theorem is of interest, and not obvious, even when
$f = 0$: any distribution whose distributional derivative on some interval is zero must be a
constant distribution on that interval. Therefore, any distribution is uniquely determined
up to an additive constant by its distributional derivative, which, to repeat, is not the
case for the a.e. derivative.
Now let $\Omega \subset \mathbb{R}^N$ be an open set and
$$Lu = \sum_{|\alpha|\le m} a_\alpha(x)D^\alpha u \qquad (9.2.9)$$
be a differential operator of order $m$. We assume that $a_\alpha \in C^\infty(\Omega)$, in which case
$Lu \in \mathcal{D}'(\Omega)$ is well defined for any $u \in \mathcal{D}'(\Omega)$. We will use the following terminology for
the rest of this chapter.
Definition 9.3. If $f \in \mathcal{D}'(\Omega)$ then
• $u$ is a classical solution of $Lu = f$ in $\Omega$ if $u \in C^m(\Omega)$ and $Lu(x) = f(x)$ for every
$x \in \Omega$.
• $u$ is a weak solution of $Lu = f$ in $\Omega$ if $u \in L^1_{loc}(\Omega)$ and $Lu = f$ in $\mathcal{D}'(\Omega)$.
• $u$ is a distributional solution of $Lu = f$ in $\Omega$ if $u \in \mathcal{D}'(\Omega)$ and $Lu = f$ in $\mathcal{D}'(\Omega)$.
It is clear that a classical solution is also a weak solution, and a weak solution is a
distributional solution. The converse statements are false in general, but may be true
in special cases. For example, we have proved above that any distributional solution of
$u' = 0$ must be constant, hence in particular any distributional solution of this differential
equation is actually a classical solution. On the other hand $u = \delta$ is a distributional
solution of $x^2u' = 0$, but is not a classical or weak solution. Of course a classical solution
cannot exist if $f$ is not continuous on $\Omega$. A theorem which says that any solution of
a certain differential equation must be smoother than what is actually needed for the
definition of solution is called a regularity result. Regularity theory is a large and
important research topic within the general area of differential equations.
Example 9.3. Let $Lu = u_{xx} - u_{yy}$. If $F, G \in C^2(\mathbb{R})$ and $u(x,y) = F(x+y) + G(x-y)$
then we know $u$ is a classical solution of $Lu = 0$. We have also observed, in Example 7.12,
that if $F, G \in L^1_{loc}(\mathbb{R})$ then $Lu = 0$ in the sense of distributions, thus $u$ is a weak solution
of $Lu = 0$ according to the above definition. The equation has distributional solutions
also which are not weak solutions, for example the singular distribution $T$ defined by
$T(\phi) = \int_{-\infty}^{\infty} \phi(x,x)\,dx$ in Exercise 11 of Chapter 7.
Example 9.4. If $Lu = u_{xx} + u_{yy}$ then it turns out that all solutions of $Lu = 0$ are classical
solutions; in fact, any distributional solution must be in $C^\infty(\Omega)$. This is an example of a
very important kind of regularity result in PDE theory, and will not be proved here; see
for example Corollary 2.20 of [11]. The difference between Laplace's equation and the
wave equation, i.e. that Laplace's equation has only classical solutions, while the wave
equation has many non-classical solutions, is a typical difference between solutions of
PDEs of elliptic and hyperbolic types.
9.3 Fundamental solutions
Let $\Omega \subset \mathbb{R}^N$, let $L$ be a differential operator as in (9.2.9), and suppose $G(x,y)$ has the
following properties$^1$:
$$G(\cdot,y) \in \mathcal{D}'(\Omega) \qquad L_x G(x,y) = \delta(x-y) \quad \forall y \in \Omega \qquad (9.3.1)$$
$^1$The subscript $x$ in $L_x$ is used here to emphasize that the differential operator is acting in the $x$ variable,
with $y$ in the role of a parameter.
We then call $G$ a fundamental solution of $L$ in $\Omega$. If such a $G$ can be found, then, formally,
if we let
$$u(x) = \int_\Omega G(x,y)f(y)\,dy \qquad (9.3.2)$$
we may expect that
$$Lu(x) = \int_\Omega L_x G(x,y)f(y)\,dy = \int_\Omega \delta(x-y)f(y)\,dy = f(x) \qquad (9.3.3)$$
That is to say, (9.3.2) provides a way to obtain solutions of the PDE $Lu = f$, and
perhaps also a tool to analyze specific properties of solutions. We are of course ignoring
here all questions of rigorous justification: whether the formula for $u$ even makes sense
if $G$ is only a distribution in $x$, for what class of $f$'s this might be so, and whether it is
permissible to differentiate under the integral to obtain (9.3.3). A more advanced PDE
text such as Hörmander [16] may be consulted for such study. Fundamental solutions
are not unique in general, since we could always add to $G$ any function $H(x,y)$ satisfying
the homogeneous equation $L_x H = 0$ for fixed $y$.
We will focus now on the case that $\Omega = \mathbb{R}^N$ and $a_\alpha(x) \equiv a_\alpha$ for every $\alpha$, i.e. $L$ is a
constant coefficient operator. In this case, if we can find $\Phi \in \mathcal{D}'(\mathbb{R}^N)$ for which $L\Phi = \delta$,
then $G(x,y) = \Phi(x-y)$ is a fundamental solution according to the above definition, and
it is normal in this situation to refer to $\Phi$ itself as the fundamental solution rather than
$G$.
Formally, the solution formula (9.3.2) becomes
$$u(x) = \int_{\mathbb{R}^N} \Phi(x-y)f(y)\,dy \qquad (9.3.4)$$
an integral operator of convolution type. Again it may not be clear if this makes sense
as an ordinary integral, but recall that we have earlier defined (Definition 7.7) the convolution of an arbitrary distribution and test function, namely
$$u(x) = (\Phi * f)(x) := \Phi(\tau_x\check{f}) \qquad (9.3.5)$$
if $\Phi \in \mathcal{D}'(\mathbb{R}^N)$ and $f \in C_0^\infty(\mathbb{R}^N)$. Furthermore, using Theorem 7.3, it follows that $u \in C^\infty(\mathbb{R}^N)$ and
$$Lu(x) = ((L\Phi) * f)(x) = (\delta * f)(x) = f(x) \qquad (9.3.6)$$
We have therefore proved
Proposition 9.1. If there exists $\Phi \in \mathcal{D}'(\mathbb{R}^N)$ such that $L\Phi = \delta$, then for any $f \in C_0^\infty(\mathbb{R}^N)$
the function $u = \Phi * f$ is a classical solution of $Lu = f$.
It will essentially always be the case that the solution formula $u = \Phi * f$ is actually valid
for a much larger class of $f$'s than $C_0^\infty(\mathbb{R}^N)$, but this will depend on specific properties of
the fundamental solution $\Phi$, which in turn depend on those of the original operator $L$.
Example 9.5. If $L = \Delta$, the Laplacian operator in $\mathbb{R}^3$, then we have already shown
(Example 7.13) that $\Phi(x) = -1/4\pi|x|$ satisfies $\Delta\Phi = \delta$ in the sense of distributions on
$\mathbb{R}^3$. Thus
$$u(x) = \left(-\frac{1}{4\pi|x|}\right)*f\,(x) = -\frac{1}{4\pi}\int_{\mathbb{R}^3} \frac{f(y)}{|x-y|}\,dy \qquad (9.3.7)$$
provides a solution of $\Delta u = f$ in $\mathbb{R}^3$, at least when $f \in C_0^\infty(\mathbb{R}^3)$. The integral on the
right in (9.3.7) is known as the Newtonian potential of $f$, and can be shown to be a valid
solution formula for a much larger class of $f$'s. It is in any case always a 'candidate'
solution, which can be analyzed directly. A fundamental solution of the Laplacian exists
in $\mathbb{R}^N$ for any dimension, and will be recalled at the end of this section.
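Away from the origin $\Phi(x) = -1/4\pi|x|$ is harmonic, which is easy to check numerically with centered second differences; the evaluation point and step size below are arbitrary choices:

```python
import math

def Phi(x, y, z):
    # fundamental solution of the Laplacian in R^3 (sign convention: Delta Phi = delta)
    return -1.0 / (4 * math.pi * math.sqrt(x * x + y * y + z * z))

def lap(F, p, h=1e-3):
    # centered second differences approximating the Laplacian at p
    x, y, z = p
    return (F(x + h, y, z) + F(x - h, y, z) + F(x, y + h, z) + F(x, y - h, z)
            + F(x, y, z + h) + F(x, y, z - h) - 6 * F(x, y, z)) / (h * h)

residual = lap(Phi, (0.7, -0.4, 0.5))   # should be ~0 away from the singularity
```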
Example 9.6. Consider the wave operator $Lu = u_{tt} - u_{xx}$ in $\mathbb{R}^2$. A fundamental solution
for $L$ (see Exercise 9) is
$$\Phi(x,t) = \frac{1}{2}H(t - |x|) \qquad (9.3.8)$$
The support of $\Phi$, namely the set $\{(x,t) : |x| < t\}$, is in this context known as the forward
light cone, representing the set of points $x$ which, for fixed $t > 0$, a signal emanating
from the origin $x = 0$ at time $t = 0$, and travelling with speed one, may have reached.
The resulting solution formula for $Lu = f$ may then be obtained as
$$u(x,t) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi(x-y,t-s)f(y,s)\,dyds \qquad (9.3.9)$$
$$= \frac{1}{2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} H(t-s-|x-y|)f(y,s)\,dyds \qquad (9.3.10)$$
$$= \frac{1}{2}\int_{-\infty}^{t}\int_{x-t+s}^{x+t-s} f(y,s)\,dyds \qquad (9.3.11)$$
In many cases of interest $f(x,t) \equiv 0$ for $t < 0$, in which case we replace the lower limit
in the $s$ integral by $0$. In any case the region over which $f$ is integrated is the 'backward'
light cone, with vertex at $(x,t)$. Under this support assumption on $f$ it also follows that
$u(x,0) = u_t(x,0) \equiv 0$, so by adding in the corresponding terms in D'Alembert's solution
(2.3.46) we find that
$$u(x,t) = \frac{1}{2}\int_0^t\!\!\int_{x-t+s}^{x+t-s} f(y,s)\,dyds + \frac{1}{2}\big(h(x+t) + h(x-t)\big) + \frac{1}{2}\int_{x-t}^{x+t} g(s)\,ds \qquad (9.3.12)$$
is the unique solution of
$$u_{tt} - u_{xx} = f(x,t) \qquad x \in \mathbb{R} \quad t > 0 \qquad (9.3.13)$$
$$u(x,0) = h(x) \qquad x \in \mathbb{R} \qquad (9.3.14)$$
$$u_t(x,0) = g(x) \qquad x \in \mathbb{R} \qquad (9.3.15)$$
It is of interest to note that this solution formula could also be written, formally at least,
as
$$u(x,t) = (\Phi * f)(x,t) + \frac{\partial}{\partial t}\big(\Phi \underset{(x)}{*} h\big)(x,t) + \big(\Phi \underset{(x)}{*} g\big)(x,t) \qquad (9.3.16)$$
where the notation $\Phi \underset{(x)}{*} h$ indicates that the convolution takes place in $x$ only, with $t$
as a parameter. Thus the fundamental solution enters into the solution not only of the
inhomogeneous equation $Lu = f$ but in solving the Cauchy problem as well. This is not
an accidental feature, and we will see other instances of this sort of thing later.
So far we have seen a couple of examples where an explicit fundamental solution is
known, but have given no indication of a general method for finding it, or even for
determining whether a fundamental solution exists. Let us address the second issue first, by stating
without proof a remarkable theorem.
Theorem 9.3. (Malgrange-Ehrenpreis) If $L \ne 0$ is any constant coefficient linear differential operator then there exists a fundamental solution of $L$.
The proof of this theorem is well beyond the scope of this book; see for example
Theorem 8.5 of [30] or Theorem 10.2.1 of [16]. The assumption of constant coefficients
is essential here; counterexamples are known otherwise.
If we now consider how it might be possible to compute a fundamental solution for a
given operator $L$, it soon becomes apparent that the Fourier transform may be a useful
tool. If we start with the distributional PDE
$$L\Phi = \sum_{|\alpha|\le m} a_\alpha D^\alpha\Phi = \delta \qquad (9.3.17)$$
and take the Fourier transform of both sides, the result is
$$\sum_{|\alpha|\le m} a_\alpha(D^\alpha\Phi)^{\widehat{}} = \sum_{|\alpha|\le m} a_\alpha(iy)^\alpha\hat{\Phi} = \frac{1}{(2\pi)^{\frac{N}{2}}} \qquad (9.3.18)$$
or
$$P(y)\hat{\Phi}(y) = 1 \qquad (9.3.19)$$
where $P(y)$, the so-called symbol or characteristic polynomial of $L$, is defined as
$$P(y) = (2\pi)^{\frac{N}{2}}\sum_{|\alpha|\le m} (iy)^\alpha a_\alpha \qquad (9.3.20)$$
Note it was implicitly assumed here that $\hat{\Phi}$ exists, which would be the case if $\Phi$ were
a tempered distribution, but this is not actually guaranteed by Theorem 9.3. This is a
rather technical issue which we will not discuss here; rather we take the point of view
that we seek a formal solution which, potentially, further analysis may show is a bona
fide fundamental solution.
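The symbol (9.3.20) is a purely mechanical object and can be computed for any constant coefficient operator given as a table of multiindex coefficients; a small sketch (the dictionary encoding of the coefficients is my own convention):

```python
import math

def symbol(coeffs, y):
    # P(y) = (2*pi)^{N/2} * sum_alpha a_alpha (i y)^alpha, cf. (9.3.20)
    # coeffs: dict mapping multiindex tuples alpha to the coefficient a_alpha
    N = len(y)
    total = 0j
    for alpha, a in coeffs.items():
        term = complex(a)
        for yj, aj in zip(y, alpha):
            term *= (1j * yj) ** aj
        total += term
    return (2 * math.pi) ** (N / 2) * total

# the Laplacian in R^2: a_(2,0) = a_(0,2) = 1, so P(y) = -(2*pi)|y|^2
laplacian = {(2, 0): 1.0, (0, 2): 1.0}
P = symbol(laplacian, (1.0, 2.0))
```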
We have thus obtained $\hat{\Phi}(y) = 1/P(y)$, or by the inversion theorem
$$\Phi(x) = \frac{1}{(2\pi)^{\frac{N}{2}}}\int_{\mathbb{R}^N} \frac{1}{P(y)}\,e^{ix\cdot y}\,dy \qquad (9.3.21)$$
as a candidate for fundamental solution of $L$. One particular source of difficulty in
making sense of the inverse transform of $1/P$ is that in general $P$ has zeros, which might
be of arbitrarily high order, making the integrand too singular to have meaning in any
ordinary sense. On the other hand, we have seen, at least in one dimension, how well-defined distributions of the 'pseudo-function' type may be associated with non-locally
integrable functions such as $1/x^m$. Thus there may be some analogous construction in
more than one dimension as well. This is in fact one possible means to proving the
Malgrange-Ehrenpreis theorem.
It also suggests that the situation may be somewhat easier to deal with if the zero
set of $P$ in $\mathbb{R}^N$ is empty, or at least not very large. As a polynomial, of course, $P$
always has zeros, but some or all of these could be complex, whereas the obstructions to
making sense of (9.3.21) pertain to the real zeros of $P$ only. If $L$ is a constant coefficient
differential operator of order $m$ as above, define
$$P_m(y) = (2\pi)^{\frac{N}{2}}\sum_{|\alpha|=m} (iy)^\alpha a_\alpha \qquad (9.3.22)$$
which is known as the principal symbol of $L$.
Definition 9.4. We say that $L$ is elliptic if $y \in \mathbb{R}^N$, $P_m(y) = 0$ implies that $y = 0$.
That is to say, the principal symbol has no nonzero real roots. For example the
Laplacian operator $L = \Delta$ is elliptic, as is $\Delta$ + lower order terms, since either way
$P_2(y) = -(2\pi)^{\frac{N}{2}}|y|^2$. On the other hand, the wave operator, written say as $Lu = \Delta u - u_{x_{N+1}x_{N+1}}$,
is not elliptic, since the principal symbol is $P_2(y) = (2\pi)^{\frac{N+1}{2}}\big(y_{N+1}^2 - \sum_{j=1}^N y_j^2\big)$.
The following is not so difficult to establish (Exercise 16), and may be exploited in
working with the representation (9.3.21) in the elliptic case.
Proposition 9.2. If $L$ is elliptic then
$$\{y \in \mathbb{R}^N : P(y) = 0\} \qquad (9.3.23)$$
the real zero set of $P$, is compact in $\mathbb{R}^N$, and $\lim_{|y|\to\infty}|P(y)| = \infty$.
We will next derive a fundamental solution for the heat equation by using the Fourier
transform, although in a slightly different way from the above discussion. Consider first
the initial value problem for the heat equation
$$u_t - \Delta u = 0 \qquad x \in \mathbb{R}^N \quad t > 0 \qquad (9.3.24)$$
$$u(x,0) = h(x) \qquad x \in \mathbb{R}^N \qquad (9.3.25)$$
with $h \in C_0^\infty(\mathbb{R}^N)$. Assuming a solution exists, define the Fourier transform in the $x$
variables,
$$\hat{u}(y,t) = \frac{1}{(2\pi)^{\frac{N}{2}}}\int_{\mathbb{R}^N} u(x,t)e^{-ix\cdot y}\,dx \qquad (9.3.26)$$
Taking the partial derivative with respect to $t$ of both sides gives $(\hat{u})_t = (u_t)^{\widehat{}}$, so by the
usual Fourier transformation calculation rules,
$$(u_t)^{\widehat{}} = (\hat{u})_t = -|y|^2\hat{u} \qquad (9.3.27)$$
and $\hat{u}(y,0) = \hat{h}(y)$. We may regard this as an ODE in $t$ satisfied by $\hat{u}(y,t)$ for fixed $y$,
for which the solution obtained by elementary means is
$$\hat{u}(y,t) = e^{-|y|^2 t}\hat{h}(y) \qquad (9.3.28)$$
If we let $\Phi$ be such that $\hat{\Phi}(y,t) = \frac{1}{(2\pi)^{N/2}}e^{-|y|^2 t}$, then by Theorem 8.8 it follows that
$$u(x,t) = \big(\Phi \underset{(x)}{*} h\big)(x,t) \qquad (9.3.29)$$
Since $\hat{\Phi}$ is a Gaussian in $y$, the same is true for $\Phi$ itself, as long as $t > 0$, and from (8.4.7)
we get
$$\Phi(x,t) = H(t)\,\frac{e^{-\frac{|x|^2}{4t}}}{(4\pi t)^{\frac{N}{2}}} \qquad (9.3.30)$$
By including the $H(t)$ factor we have for later convenience defined $\Phi(x,t) = 0$ for $t < 0$.
Thus we get an integral representation for the solution of (9.3.24)-(9.3.25), namely
$$u(x,t) = \int_{\mathbb{R}^N} \Phi(x-y,t)h(y)\,dy = \frac{1}{(4\pi t)^{\frac{N}{2}}}\int_{\mathbb{R}^N} e^{-\frac{|x-y|^2}{4t}}h(y)\,dy \qquad (9.3.31)$$
valid for $x \in \mathbb{R}^N$ and $t > 0$. As usual, although this was derived for convenience under
very restrictive conditions on $h$, it is actually valid much more generally (see Exercise
12).
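The representation (9.3.31) can be tested numerically in one dimension against Gaussian initial data, for which the convolution can be done in closed form: for $h(y) = e^{-y^2}$ one gets $u(x,t) = (1+4t)^{-1/2}e^{-x^2/(1+4t)}$ (this test case is my choice, not the text's):

```python
import math

def heat_u(x, t, h, L=12.0, n=6000):
    # u(x,t) = int Phi(x - y, t) h(y) dy in dimension N = 1, Phi from (9.3.30),
    # approximated by the trapezoid rule on [-L, L]
    dy = 2 * L / n
    tot = 0.0
    for k in range(n + 1):
        y = -L + k * dy
        w = 0.5 if k in (0, n) else 1.0
        tot += w * math.exp(-(x - y) ** 2 / (4 * t)) * h(y)
    return tot * dy / math.sqrt(4 * math.pi * t)

h = lambda y: math.exp(-y * y)            # Gaussian initial data
x, t = 0.4, 0.3
exact = math.exp(-x * x / (1 + 4 * t)) / math.sqrt(1 + 4 * t)
```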
Now to derive a solution formula for $u_t - \Delta u = f$, let $v = v(x,t;s)$ be the solution of
(9.3.24)-(9.3.25) with $h(x)$ replaced by $f(x,s)$, regarding $s$ for the moment as a parameter,
and define
$$u(x,t) = \int_0^t v(x,t-s;s)\,ds \qquad (9.3.32)$$
Assuming that $f$ is sufficiently regular, it follows that
$$u_t(x,t) = v(x,0;t) + \int_0^t v_t(x,t-s;s)\,ds \qquad (9.3.33)$$
$$= f(x,t) + \int_0^t \Delta v(x,t-s;s)\,ds \qquad (9.3.34)$$
$$= f(x,t) + \Delta u(x,t) \qquad (9.3.35)$$
Inserting the formula (9.3.31) with $h$ replaced by $f(\cdot,s)$ gives
$$u(x,t) = (\Phi * f)(x,t) = \int_0^t\int_{\mathbb{R}^N} \Phi(x-y,t-s)f(y,s)\,dyds \qquad (9.3.36)$$
with $\Phi$ given again by (9.3.30). Strictly speaking, we should assume that $f(x,t) \equiv 0$ for
$t < 0$ in order that the integral on the right in (9.3.36) coincide with the convolution
in $\mathbb{R}^{N+1}$, but this is without loss of generality, since we only seek to solve the PDE for
$t > 0$. The procedure used above for obtaining the solution of the inhomogeneous PDE
starting with the solution of a corresponding initial value problem is known as Duhamel's
method, and is generally applicable, with suitable modifications, for time dependent PDEs
in which the coefficients are independent of time.
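Duhamel's method is perhaps easiest to see for the scalar ODE $u' = au + f(t)$, $u(0) = 0$, where the homogeneous initial value problem is solved by $e^{at}u_0$ and the method gives $u(t) = \int_0^t e^{a(t-s)}f(s)\,ds$; a numerical sketch (the values of $a$ and $f$ below are arbitrary choices, and the exact solution used for comparison comes from elementary ODE methods):

```python
import math

a = -0.5                       # coefficient in the scalar model problem u' = a u + f
f = lambda s: math.sin(s)

def duhamel(t, n=4000):
    # u(t) = int_0^t e^{a(t-s)} f(s) ds: the homogeneous solution operator applied
    # to the data f(s), superposed over s (trapezoid rule)
    ds = t / n
    tot = 0.0
    for k in range(n + 1):
        s = k * ds
        w = 0.5 if k in (0, n) else 1.0
        tot += w * math.exp(a * (t - s)) * f(s)
    return tot * ds

t = 2.0
exact = (-a * math.sin(t) - math.cos(t) + math.exp(a * t)) / (1 + a * a)
```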
Since $u(x,t)$ in (9.3.32) evidently satisfies $u(x,0) \equiv 0$, it follows (compare to (9.3.16))
that
$$u(x,t) = \big(\Phi \underset{(x)}{*} h\big)(x,t) + (\Phi * f)(x,t) \qquad (9.3.37)$$
is a solution$^2$ of
$$u_t - \Delta u = f(x,t) \qquad x \in \mathbb{R}^N \quad t > 0 \qquad (9.3.38)$$
$$u(x,0) = h(x) \qquad x \in \mathbb{R}^N \qquad (9.3.39)$$
Let us also observe here that if
$$F(x) = \frac{1}{(2\sqrt{\pi})^N}\,e^{-\frac{|x|^2}{4}} \qquad (9.3.40)$$
then $F \ge 0$, $\int_{\mathbb{R}^N} F(x)\,dx = 1$, and
$$\Phi(x,t) = \left(\frac{1}{\sqrt{t}}\right)^N F\!\left(\frac{x}{\sqrt{t}}\right) \qquad (9.3.41)$$
for $t > 0$. From Theorem 7.2, and the observation that a sequence of the form (7.3.11)
satisfies the assumptions of that theorem, it follows that $n^N F(nx) \to \delta$ in $\mathcal{D}'(\mathbb{R}^N)$ as
$n \to \infty$. Choosing $n = \frac{1}{\sqrt{t}}$ we conclude that
$$\lim_{t\to 0^+} \Phi(\cdot,t) = \delta \quad \text{in } \mathcal{D}'(\mathbb{R}^N) \qquad (9.3.42)$$
In particular $\lim_{t\to 0^+}\big(\Phi \underset{(x)}{*} h\big)(x,t) = h(x)$ for all $x \in \mathbb{R}^N$, at least when $h \in C_0^\infty(\mathbb{R}^N)$.
We conclude this section by collecting all in one place a number of important fundamental solutions. Some of these have been discussed already, some will be left for the
exercises, and in several other cases we will be content with a reference.
Laplace operator
For $L = \Delta$ in $\mathbb{R}^N$ there exist the following fundamental solutions$^3$:
$$\Phi(x) = \begin{cases} \dfrac{|x|}{2} & N = 1 \\[1ex] \dfrac{1}{2\pi}\log|x| & N = 2 \\[1ex] \dfrac{C_N}{|x|^{N-2}} & N \ge 3 \end{cases} \qquad (9.3.43)$$
where
$$C_N = \frac{1}{(2-N)\Omega_{N-1}} \qquad \Omega_{N-1} = \int_{|x|=1} dS(x) \qquad (9.3.44)$$
Thus $C_N$ is a geometric constant, related to the area of the unit sphere in $\mathbb{R}^N$; an
equivalent formula in terms of the volume of the unit ball in $\mathbb{R}^N$ is also commonly used.
Of the various cases, $N = 1$ is elementary to check, $N = 2$ is requested in Exercise 20 of
Chapter 7, and we have done the $N \ge 3$ case in Example 7.13.
$^2$Note we do not say 'the solution' here; in fact the solution is not unique without further restrictions.
$^3$Some texts will consistently use the fundamental solution of $-\Delta$ rather than $\Delta$, in which case all of the
signs will be reversed.
Heat operator
For the heat operator $L = \frac{\partial}{\partial t} - \Delta$ in $\mathbb{R}^{N+1}$, we have derived earlier in this section the
fundamental solution
$$\Phi(x,t) = H(t)\,\frac{e^{-\frac{|x|^2}{4t}}}{(4\pi t)^{\frac{N}{2}}} \qquad (9.3.45)$$
for all $N$.
Wave operator
For the wave operator $L = \frac{\partial^2}{\partial t^2} - \Delta$ in $\mathbb{R}^{N+1}$, the fundamental solution is again significantly
dependent on $N$. The cases of $N = 1, 2, 3$ are as follows:
$$\Phi(x,t) = \begin{cases} \dfrac{1}{2}H(t-|x|) & N = 1 \\[1ex] \dfrac{1}{2\pi}\dfrac{H(t-|x|)}{\sqrt{t^2-|x|^2}} & N = 2 \\[1ex] \dfrac{\delta(t-|x|)}{4\pi|x|} & N = 3 \end{cases} \qquad (9.3.46)$$
We have discussed the $N = 1$ case earlier in this section, and refer to [10] or [18] for the
cases $N = 2, 3$. As a distribution, the meaning of the fundamental solution in the
$N = 3$ case is just what one expects from the formal expression, namely
$$\Phi(\phi) = \int_{\mathbb{R}^3}\int_{-\infty}^{\infty} \frac{\delta(t-|x|)}{4\pi|x|}\,\phi(x,t)\,dtdx = \int_{\mathbb{R}^3} \frac{\phi(x,|x|)}{4\pi|x|}\,dx \qquad (9.3.47)$$
for any test function $\phi$. Note the tendency for the fundamental solution to become more
and more singular as $N$ increases. This pattern persists in higher dimensions, as the
fundamental solution starts to contain expressions involving $\delta'$ and higher derivatives of
the $\delta$ function.
Schrödinger operator
The Schrödinger operator is defined as $L = \frac{\partial}{\partial t} - i\Delta$ in $\mathbb{R}^{N+1}$. The derivation of a
fundamental solution here is nearly the same as for the heat equation, the result being
$$\Phi(x,t) = H(t)\,\frac{e^{-\frac{|x|^2}{4it}}}{(4i\pi t)^{\frac{N}{2}}} \qquad (9.3.48)$$
In quantum mechanics $\Phi$ is frequently referred to as the 'propagator'. See [26] for much
material about the Schrödinger equation.
Helmholtz operator
The Helmholtz operator is defined by $Lu = \Delta u - \lambda u$. For $\lambda > 0$ and dimensions $N = 1, 2, 3$,
fundamental solutions are
$$\Phi(x) = \begin{cases} \dfrac{\sinh(\sqrt{\lambda}\,|x|)}{2\sqrt{\lambda}} & N = 1 \\[1ex] -\dfrac{1}{2\pi}K_0(\sqrt{\lambda}\,|x|) & N = 2 \\[1ex] -\dfrac{e^{-\sqrt{\lambda}\,|x|}}{4\pi|x|} & N = 3 \end{cases} \qquad (9.3.49)$$
where $K_0$ is the so-called modified Bessel function of the second kind and order $0$. See
Chapter 6 of [3] for derivations of these formulas when $N = 2, 3$, while the $N = 1$ case
is left for the exercises. This is a case where it may be convenient to use the Fourier
transform method directly, since the symbol of $L$, $P(y) = -(2\pi)^{\frac{N}{2}}(|y|^2 + \lambda)$,
has no real zeros.
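Under the sign convention $\Delta\Phi - \lambda\Phi = \delta$ adopted above, the $N = 3$ entry of (9.3.49) satisfies $\Delta\Phi = \lambda\Phi$ away from the origin, which can be checked numerically by second differences (the value $\lambda = 2$ and the evaluation point are arbitrary choices):

```python
import math

lam = 2.0

def Phi(x, y, z):
    # N = 3 entry of (9.3.49): Phi(x) = -e^{-sqrt(lam)|x|} / (4 pi |x|)
    r = math.sqrt(x * x + y * y + z * z)
    return -math.exp(-math.sqrt(lam) * r) / (4 * math.pi * r)

def lap(F, p, h=1e-3):
    # centered second differences approximating the Laplacian at p
    x, y, z = p
    return (F(x + h, y, z) + F(x - h, y, z) + F(x, y + h, z) + F(x, y - h, z)
            + F(x, y, z + h) + F(x, y, z - h) - 6 * F(x, y, z)) / (h * h)

p = (0.6, 0.3, -0.5)
residual = lap(Phi, p) - lam * Phi(*p)   # Delta Phi - lam Phi, ~0 away from 0
```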
Klein-Gordon operator
The Klein-Gordon operator is defined by $Lu = \frac{\partial^2 u}{\partial t^2} - \Delta u + \lambda u$ in $\mathbb{R}^{N+1}$. We mention only
the case $N = 1$, $\lambda > 0$, in which case a fundamental solution is
$$\Phi(x,t) = \frac{1}{2}H(t-|x|)\,J_0\!\left(\sqrt{\lambda(t^2-x^2)}\right) \qquad N = 1 \qquad (9.3.50)$$
where $J_0$ is the Bessel function of the first kind and order zero (see Exercise 13 of Chapter
8). This may be derived, for example, by the method presented in Problem 2, Section
5.1 of [18], with an appropriate choice of the parameters there.
Biharmonic operator
The biharmonic operator is $L = \Delta^2$, i.e. $Lu = \Delta(\Delta u)$. It arises especially in connection
with the theory of plates and shells, so that $N = 2$ is the most interesting case. A
fundamental solution is
$$\Phi(x) = \frac{1}{8\pi}|x|^2\log|x| \qquad N = 2 \qquad (9.3.51)$$
a derivation of which is outlined in Exercise 10.
9.4 Exercises
1. Show that an equivalent definition of $W^{s,2}(\mathbb{R}^N) = H^s(\mathbb{R}^N)$ for $s = 0, 1, 2, \dots$ is
$$H^s(\mathbb{R}^N) = \Big\{f \in \mathcal{S}'(\mathbb{R}^N) : \int_{\mathbb{R}^N} |\hat{f}(y)|^2(1+|y|^2)^s\,dy < \infty\Big\} \qquad (9.4.1)$$
The second definition makes sense even if $s$ isn't a positive integer and leads to one
way to define fractional and negative order differentiability. Implicitly it requires
that $\hat{f}$ (but not $f$ itself) must be a function.
2. Using the definition (9.4.1), show that $H^s(\mathbb{R}^N) \subset C_0(\mathbb{R}^N)$ if $s > \frac{N}{2}$. Show that
$\delta \in H^s(\mathbb{R}^N)$ if $s < -\frac{N}{2}$.
3. If $\Omega$ is a bounded open set in $\mathbb{R}^3$, and $u(x) = \frac{1}{|x|}$, show that $u \in W^{1,p}(\Omega)$ for
$1 \le p < \frac{3}{2}$. Along the way, you should show carefully that a distributional first
derivative $\frac{\partial u}{\partial x_i}$ agrees with the corresponding pointwise derivative.
4. Prove that if $f \in W^{1,p}(a,b)$ for $p > 1$ then
$$|f(x) - f(y)| \le \|f\|_{W^{1,p}(a,b)}\,|x-y|^{1-\frac{1}{p}} \qquad (9.4.2)$$
so in particular $W^{1,p}(a,b) \subset C([a,b])$. (Caution: You would like to use the fundamental theorem of calculus here, but it isn't quite obvious whether it is valid
assuming only that $f \in W^{1,p}(a,b)$.)
5. Prove directly that $W^{k,p}(\Omega)$ is complete (relying of course on the fact that $L^p(\Omega)$
is complete).
6. Show that Theorem 9.1 is false for $p = \infty$.
7. If $f$ is a nonzero constant function on $[0,1]$, show that $f \notin W_0^{1,p}(0,1)$ for $1 \le p < \infty$.
8. Let $Lu = u'' + u$ and $E(x) = H(x)\sin x$, $x \in \mathbb{R}$.
a) Show that $E$ is a fundamental solution of $L$.
b) What is the corresponding solution formula for $Lu = f$?
c) The fundamental solution $E$ is not the same as the one given in (9.3.49). Does
this call for any explanation?
9. Show that $E(x,t) = \frac{1}{2}H(t-|x|)$ is a fundamental solution for the wave operator
$Lu = u_{tt} - u_{xx}$.
10. The fourth order operator $Lu = u_{xxxx} + 2u_{xxyy} + u_{yyyy}$ in $\mathbb{R}^2$ is the biharmonic
operator which arises in the theory of deformation of elastic plates.
a) Show that $L = \Delta^2$, i.e. $Lu = \Delta(\Delta u)$ where $\Delta$ is the Laplacian.
b) Find a fundamental solution of $L$. (Suggestions: To solve $LE = \delta$, first solve
$\Delta F = \delta$ and then $\Delta E = F$. Since $F$ will depend on $r = \sqrt{x^2+y^2}$ only, you can
look for a solution $E = E(r)$ also.)
11. Let $Lu = u'' + \alpha u'$ where $\alpha > 0$ is a constant.
a) Find a fundamental solution of $L$ which is a tempered distribution.
b) Find a fundamental solution of $L$ which is not a tempered distribution.
12. Show directly that $u(x,t)$ defined by (9.3.31) is a classical solution of the heat
equation for $t > 0$, under the assumption that $h$ is bounded and continuous on $\mathbb{R}^N$.
13. Assuming that (9.3.31) is valid and $h \in L^p(\mathbb{R}^N)$, derive the decay property
$$\|u(\cdot,t)\|_{L^\infty(\mathbb{R}^N)} \le \frac{\|h\|_{L^p(\mathbb{R}^N)}}{t^{\frac{N}{2p}}} \qquad (9.4.3)$$
for $1 \le p \le \infty$.
14. If
$$G(x,y) = \begin{cases} y(x-1) & 0 < y < x < 1 \\ x(y-1) & 0 < x < y < 1 \end{cases}$$
show that $G$ is a fundamental solution of $Lu = u''$ in $(0,1)$.
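As a quick numerical sanity check (not part of the exercise), one can integrate $G$ against the simple choice $f \equiv 1$, for which $u(x) = (x^2 - x)/2$ solves $u'' = 1$ with $u(0) = u(1) = 0$:

```python
def G(x, y):
    # the kernel from the exercise: y(x-1) for y < x, x(y-1) for y > x
    return y * (x - 1) if y < x else x * (y - 1)

def u(x, n=4000):
    # u(x) = int_0^1 G(x, y) f(y) dy with f == 1, trapezoid rule
    dy = 1.0 / n
    return dy * (sum(G(x, k * dy) for k in range(1, n))
                 + 0.5 * (G(x, 0.0) + G(x, 1.0)))
```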
15. Is the heat operator $L = \frac{\partial}{\partial t} - \Delta$ elliptic?
16. Prove Proposition 9.2.