Notes for MATH 519-520

Paul E. Sacks

January 13, 2016

Contents

1 Orientation
  1.1 Introduction
2 Preliminaries
  2.1 Ordinary differential equations
    2.1.1 Initial Value Problems
    2.1.2 Boundary Value Problems
    2.1.3 Some exactly solvable cases
  2.2 Integral equations
  2.3 Partial differential equations
    2.3.1 First order PDEs and the method of characteristics
    2.3.2 Second order problems in R^2
    2.3.3 Further discussion of model problems
    2.3.4 Standard problems and side conditions
  2.4 Well-posed and ill-posed problems
  2.5 Exercises
3 Vector spaces
  3.1 Axioms of a vector space
  3.2 Linear independence and bases
  3.3 Linear transformations of a vector space
  3.4 Exercises
4 Metric spaces
  4.1 Axioms of a metric space
  4.2 Topological concepts
  4.3 Functions on metric spaces and continuity
  4.4 Compactness and optimization
  4.5 Contraction mapping theorem
  4.6 Exercises
5 Normed linear spaces and Banach spaces
  5.1 Axioms of a normed linear space
  5.2 Infinite series
  5.3 Linear operators and functionals
  5.4 Contraction mappings in a Banach space
  5.5 Exercises
6 Inner product spaces and Hilbert spaces
  6.1 Axioms of an inner product space
  6.2 Norm in a Hilbert space
  6.3 Orthogonality
  6.4 Projections
  6.5 Gram-Schmidt method
  6.6 Bessel's inequality and infinite orthogonal sequences
  6.7 Characterization of a basis of a Hilbert space
  6.8 Isomorphisms of a Hilbert space
  6.9 Exercises
7 Distributions
  7.1 The space of test functions
  7.2 The space of distributions
  7.3 Algebra and Calculus with Distributions
    7.3.1 Multiplication of distributions
    7.3.2 Convergence of distributions
    7.3.3 Derivative of a distribution
  7.4 Convolution and distributions
  7.5 Exercises
8 Fourier analysis and distributions
  8.1 Fourier series in one space dimension
  8.2 Alternative forms of Fourier series
  8.3 More about convergence of Fourier series
  8.4 The Fourier Transform on R^N
  8.5 Further properties of the Fourier transform
  8.6 Fourier series of distributions
  8.7 Fourier transforms of distributions
  8.8 Exercises
9 Distributions and Differential Equations
  9.1 Weak derivatives and Sobolev spaces
  9.2 Differential equations in D'
  9.3 Fundamental solutions
  9.4 Exercises
10 Linear operators
  10.1 Linear mappings between Banach spaces
  10.2 Examples of linear operators
  10.3 Linear operator equations
  10.4 The adjoint operator
  10.5 Examples of adjoints
  10.6 Conditions for solvability of linear operator equations
  10.7 Fredholm operators and the Fredholm alternative
  10.8 Convergence of operators
  10.9 Exercises
11 Unbounded operators
  11.1 General aspects of unbounded linear operators
  11.2 The adjoint of an unbounded linear operator
  11.3 Extensions of symmetric operators
  11.4 Exercises
12 Spectrum of an operator
  12.1 Resolvent and spectrum of a linear operator
  12.2 Examples of operators and their spectra
  12.3 Properties of spectra
  12.4 Exercises
13 Compact Operators
  13.1 Compact operators
  13.2 The Riesz-Schauder theory
  13.3 The case of self-adjoint compact operators
  13.4 Some properties of eigenvalues
  13.5 The Singular Value Decomposition and Normal Operators
  13.6 Exercises
14 Spectra and Green's functions for differential operators
  14.1 Green's functions for second order ODEs
  14.2 Adjoint problems
  14.3 Sturm-Liouville theory
  14.4 The Laplacian with homogeneous Dirichlet boundary conditions
  14.5 Exercises
15 Further study of integral equations
  15.1 Singular integral operators
  15.2 Layer potentials
  15.3 Convolution equations
  15.4 Wiener-Hopf technique
  15.5 Exercises
16 Variational methods
  16.1 The Dirichlet quotient
  16.2 Eigenvalue approximation
  16.3 The Euler-Lagrange equation
  16.4 Variational methods for elliptic boundary value problems
  16.5 Other problems in the calculus of variations
  16.6 The existence of minimizers
  16.7 The Fréchet derivative
  16.8 Exercises
17 Weak solutions of partial differential equations
  17.1 Lax-Milgram theorem
  17.2 More function spaces
  17.3 Galerkin's method
  17.4 PDEs with variable coefficients
  17.5 Exercises
18 Appendices
  18.1 Inequalities
  18.2 Integration by parts
  18.3 Spherical coordinates in R^N
19 Bibliography

Chapter 1

Orientation

1.1 Introduction

While the phrase 'Applied Mathematics' has a very broad meaning, the purpose of this textbook is much more limited, namely to present techniques of mathematical analysis which have been found particularly useful for understanding the kinds of mathematical problems that occur very commonly in scientific and technological disciplines, especially physics and engineering. These methods, which are often regarded as belonging to the realm of functional analysis, have been motivated most specifically by the study of ordinary differential equations, partial differential equations and integral equations.
The mathematical modeling of physical phenomena typically involves one or more of these types of equations, and insight into the physical phenomenon itself may result from a deep understanding of the underlying mathematical properties which the models possess. All concepts and techniques discussed in this book are ultimately of interest because of their relevance for the study of these three general types of problems. There is a great deal of beautiful mathematics which has grown out of these ideas, and so the intrinsic mathematical motivation cannot be denied or ignored.

Chapter 2

Preliminaries

In this chapter we will discuss 'standard problems' in the theory of ordinary differential equations (ODEs), integral equations, and partial differential equations (PDEs). The techniques developed in these notes are all meant to have some relevance for one or more of these kinds of problems, so it seems best to start with some awareness of exactly what the problems are. In each case there are some relatively elementary methods, which the reader may well have seen before, or which depend only on simple considerations, which we will review. At the same time we establish terminology and notations, and begin to get some sense of the ways in which problems are classified.

2.1 Ordinary differential equations

An n'th order ordinary differential equation for an unknown function u = u(t) on an interval (a, b) ⊂ R may be given in the form

    F(t, u, u', u'', \ldots, u^{(n)}) = 0        (2.1.1)

where we use the usual notations u', u'', ... for derivatives of order 1, 2, ..., and also u^{(n)} for the derivative of order n. Unless otherwise stated, we will assume that the ODE can be solved for the highest derivative, i.e. written in the form

    u^{(n)} = f(t, u, u', \ldots, u^{(n-1)})        (2.1.2)

For the purpose of this discussion, a solution of either equation will mean a real valued function on (a, b) possessing continuous derivatives up through order n, and for which the equation is satisfied at every point of (a, b). While it is easy to write down ODEs in the form (2.1.1) without any solutions (for example, (u')^2 + u^2 + 1 = 0), we will see that ODEs of the type (2.1.2) essentially always have solutions, subject to some very minimal assumptions on f.

The ODE is linear if it can be written as

    \sum_{j=0}^{n} a_j(t) u^{(j)}(t) = g(t)        (2.1.3)

for some coefficients a_0, ..., a_n, g, and homogeneous linear if also g(t) ≡ 0. It is common to use also operator notation for derivatives, especially in the linear case. Set

    D = \frac{d}{dt}        (2.1.4)

so that u' = Du, u'' = D^2 u, etc., and (2.1.3) may be given as

    Lu := \sum_{j=0}^{n} a_j(t) D^j u = g(t)        (2.1.5)

By standard calculus properties L is a linear operator, meaning that

    L(c_1 u_1 + c_2 u_2) = c_1 L u_1 + c_2 L u_2        (2.1.6)

for any scalars c_1, c_2 and any n times differentiable functions u_1, u_2. An ODE normally has infinitely many solutions; the collection of all solutions is called the general solution of the given ODE.

Example 2.1. By elementary calculus considerations, the simple ODE u' = 0 has general solution u(t) = c, where c is an arbitrary constant. Likewise u' = u has the general solution u(t) = c e^t, and u'' = 1 has the general solution u(t) = t^2/2 + c_1 t + c_2, where c_1, c_2 are arbitrary constants.
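The general solutions in Example 2.1 are easily reproduced with a computer algebra system. The following minimal Python sketch (an illustration, not part of the original notes; it assumes the sympy library) displays the arbitrary constants explicitly:

    import sympy as sp

    # Reproduce Example 2.1 symbolically; sympy names the arbitrary
    # constants C1, C2 in the general solutions it returns.
    t = sp.symbols('t')
    u = sp.Function('u')

    print(sp.dsolve(sp.Eq(u(t).diff(t), u(t))))    # u(t) = C1*exp(t)
    print(sp.dsolve(sp.Eq(u(t).diff(t, 2), 1)))    # u(t) = C1 + C2*t + t**2/2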
2.1.1 Initial Value Problems

The general solution of an n'th order ODE typically contains exactly n arbitrary constants, whose values may then be chosen so that the solution satisfies n additional, or side, conditions. The most common kind of side conditions for an ODE are initial conditions,

    u^{(j)}(t_0) = \gamma_j    j = 0, 1, \ldots, n-1        (2.1.7)

where t_0 is a given point in (a, b) and γ_0, ..., γ_{n-1} are given constants. Thus we are prescribing the value of the solution and its derivatives up through order n−1 at the point t_0. The problem of solving (2.1.2) together with the initial conditions (2.1.7) is called an initial value problem (IVP), and it is a very important fact that under fairly unrestrictive hypotheses a unique solution exists. In stating conditions on f, we regard it as a function f = f(t, y_1, ..., y_n) defined on some domain in R^{n+1}.

Theorem 2.1. Assume that

    f, \frac{\partial f}{\partial y_1}, \ldots, \frac{\partial f}{\partial y_n}        (2.1.8)

are defined and continuous in a neighborhood of the point (t_0, γ_0, ..., γ_{n-1}) ∈ R^{n+1}. Then there exists ε > 0 such that the initial value problem (2.1.2),(2.1.7) has a unique solution on the interval (t_0 − ε, t_0 + ε).

A proof of this theorem may be found in standard ODE textbooks, see for example [4],[7]. A slightly weaker version of this theorem will be proved in Section 4.5. As will be discussed there, the condition of continuity of the partial derivatives of f with respect to each of the variables y_i can actually be replaced by the weaker assumption that f is Lipschitz continuous with respect to each of these variables. If we assume only that f is continuous in a neighborhood of the point (t_0, γ_0, ..., γ_{n-1}) then it can be proved that at least one solution exists, but it may not be unique, see Exercise 3.

It should also be emphasized that the theorem asserts a local existence property, i.e. only in some sufficiently small interval centered at t_0. It has to be this way, first of all, since the assumptions on f are made only in the vicinity of (t_0, γ_0, ..., γ_{n-1}). But even if the continuity properties of f were assumed to hold throughout R^{n+1}, then as the following example shows, it would still only be possible to prove that a solution exists for points t close enough to t_0.

Example 2.2. Consider the first order initial value problem

    u' = u^2    u(0) = γ        (2.1.9)

for which the assumptions of Theorem 2.1 hold for any γ. It may be checked that the solution of this problem is

    u(t) = \frac{\gamma}{1 - \gamma t}        (2.1.10)

which is only a valid solution for t < 1/γ, which can be arbitrarily small.

With more restrictions on f it may be possible to show that the solution exists on any interval containing t_0, in which case we would say that the solution exists globally. This is the case, for example, for the linear ODE (2.1.3).
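The finite-time blow-up in Example 2.2 is easy to observe numerically. Here is a short sketch (illustrative only; it assumes scipy and takes γ = 1, so that the exact solution 1/(1 − t) ceases to exist at t = 1):

    import numpy as np
    from scipy.integrate import solve_ivp

    # Integrate u' = u^2, u(0) = 1 up to t = 0.999; the solution grows
    # without bound as t -> 1, even though f(t,u) = u^2 is smooth on all of R^2.
    sol = solve_ivp(lambda t, u: u**2, (0.0, 0.999), [1.0], rtol=1e-10, atol=1e-12)
    print(sol.t[-1], sol.y[0, -1])     # u is already near 1000 at t = 0.999
    print(1.0 / (1.0 - sol.t[-1]))     # agrees with the exact solution 1/(1-t)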
Whenever the conditions of Theorem 2.1 hold, the set of all possible solutions may be regarded as being parametrized by the n constants γ_0, ..., γ_{n-1}, so that as mentioned above, the general solution will contain n arbitrary parameters. In the special case of the linear equation (2.1.3) it can be shown that the general solution may be given as

    u(t) = \sum_{j=1}^{n} c_j u_j(t) + u_p(t)        (2.1.11)

where u_p is any particular solution of (2.1.3), and u_1, ..., u_n are any n linearly independent solutions of the corresponding homogeneous equation Lu = 0. Any such set of functions u_1, ..., u_n is also called a fundamental set for Lu = 0.

Example 2.3. If Lu = u'' + u then by direct substitution we see that u_1(t) = sin t, u_2(t) = cos t are solutions, and they are clearly linearly independent. Thus {sin t, cos t} is a fundamental set for Lu = 0 and u(t) = c_1 sin t + c_2 cos t is the general solution of Lu = 0. For the inhomogeneous ODE u'' + u = e^t one may check that u_p(t) = e^t/2 is a particular solution, so the general solution is u(t) = c_1 sin t + c_2 cos t + e^t/2.

2.1.2 Boundary Value Problems

For an ODE of order n ≥ 2 it may be of interest to impose side conditions at more than one point, typically the endpoints of the interval of interest. We will then refer to the side conditions as boundary conditions and the problem of solving the ODE subject to the given boundary conditions as a boundary value problem (BVP). Since the general solution still contains n parameters, we still expect to be able to impose a total of n side conditions. However we can see from simple examples that the situation with regard to existence and uniqueness in such boundary value problems is much less clear than for initial value problems.

Example 2.4. Consider the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 1        (2.1.12)

Starting from the general solution u(t) = c_1 sin t + c_2 cos t, the two boundary conditions lead to u(0) = c_2 = 0 and u(π) = −c_2 = 1. Since these are inconsistent, the BVP has no solution.

Example 2.5. For the boundary value problem

    u'' + u = 0    0 < t < π    u(0) = 0    u(π) = 0        (2.1.13)

we have solutions u(t) = C sin t for any constant C, that is, the BVP has infinitely many solutions.

The topic of boundary value problems will be studied in much more detail in Chapter ( ).

2.1.3 Some exactly solvable cases

Let us recall explicit solution methods for some commonly occurring types of ODEs.

• For the first order linear ODE

    u' + p(t)u = q(t)        (2.1.14)

define the so-called integrating factor ρ(t) = e^{P(t)}, where P' = p. Multiplying the equation through by ρ we then get

    (ρu)' = ρq        (2.1.15)

so if we pick Q such that Q' = ρq, the general solution may be given as

    u(t) = \frac{Q(t) + C}{\rho(t)}        (2.1.16)

• Next consider the linear homogeneous constant coefficient ODE

    Lu = \sum_{j=0}^{n} a_j u^{(j)} = 0        (2.1.17)

If we look for solutions in the form u(t) = e^{λt} then by direct substitution we find that u is a solution provided λ is a root of the corresponding characteristic polynomial

    P(\lambda) = \sum_{j=0}^{n} a_j \lambda^j        (2.1.18)

We therefore obtain as many linearly independent solutions as there are distinct roots of P. If this number is less than n, then we may seek further solutions of the form t e^{λt}, t^2 e^{λt}, ..., until a total of n linearly independent solutions have been found. In the case of complex roots, equivalent expressions in terms of trigonometric functions are often used in place of complex exponentials.

• Finally, closely related to the previous case is the so-called Cauchy-Euler type equation

    Lu = \sum_{j=0}^{n} a_j (t - t_0)^j u^{(j)} = 0        (2.1.19)

for some constants a_0, ..., a_n. In this case we look for solutions in the form u(t) = (t − t_0)^λ with λ to be found. Substituting into (2.1.19) we will find again an n'th order polynomial whose roots determine the possible values of λ. The interested reader may refer to any standard undergraduate level ODE book for the additional considerations which arise in the case of complex or repeated roots.

2.2 Integral equations

In this section we discuss the basic set-up for the study of linear integral equations. See for example [15], [21] for general references on the classical theory of integral equations. Let Ω ⊂ R^N be a measurable set and set

    Tu(x) = \int_\Omega K(x, y) u(y)\, dy        (2.2.1)

Here the function K should be a measurable function on Ω × Ω, and is called the kernel of the integral operator T, which is linear since (2.1.6) obviously holds.
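Before proceeding, it may help to see the finite-dimensional picture behind such operators. The following sketch (illustrative only; it assumes Ω = (0,1), the smooth sample kernel K(x,y) = e^{−|x−y|}, and the midpoint quadrature rule) approximates T by a matrix acting on samples of u:

    import numpy as np

    # Discretize (Tu)(x) = int_0^1 K(x,y) u(y) dy by the midpoint rule:
    # (Tu)(x_i) ~ sum_j K(x_i, y_j) u(y_j) * (1/n), a matrix-vector product.
    n = 200
    y = (np.arange(n) + 0.5) / n                      # midpoint nodes in (0,1)
    K = np.exp(-np.abs(y[:, None] - y[None, :]))      # samples of K(x,y) = e^{-|x-y|}
    T = K / n                                         # kernel matrix times weight

    u = np.sin(np.pi * y)                             # sample input function
    Tu = T @ u                                        # samples of Tu

Many properties of integral operators studied later (adjoints, compactness, spectra) have transparent analogues for this kernel matrix.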
A class of associated integral equations is then

    \int_\Omega K(x, y) u(y)\, dy = \lambda u(x) + g(x)    x ∈ Ω        (2.2.2)

for some scalar λ and given function g in some appropriate class. If λ = 0 then (2.2.2) is a first kind integral equation, otherwise it is second kind. Let us consider some simple examples which may be studied by elementary means.

Example 2.6. Let Ω = (0, 1) ⊂ R and K(x, y) ≡ 1. The corresponding first kind integral equation is therefore

    \int_0^1 u(y)\, dy = g(x)    0 < x < 1        (2.2.3)

For simplicity here we will assume that g is a continuous function. The left hand side is independent of x, thus a solution can exist only if g(x) is a constant function. When g is constant, on the other hand, infinitely many solutions will exist, since we just need to find any u with the given definite integral.

For the corresponding second kind equation,

    \int_0^1 u(y)\, dy = \lambda u(x) + g(x)        (2.2.4)

a solution must have the specific form u(x) = (C − g(x))/λ for some constant C. Substituting into the equation then gives, after obvious simplification, that

    C - \int_0^1 g(y)\, dy = C\lambda        (2.2.5)

or

    C = \frac{\int_0^1 g(y)\, dy}{1 - \lambda}        (2.2.6)

in the case that λ ≠ 1. Thus, for any continuous function g and λ ≠ 0, 1, there exists a unique solution of the integral equation, namely

    u(x) = \frac{\int_0^1 g(y)\, dy}{\lambda(1 - \lambda)} - \frac{g(x)}{\lambda}        (2.2.7)

In the remaining case that λ = 1 it is immediate from (2.2.5) that a solution can exist only if \int_0^1 g(y)\, dy = 0, in which case u(x) = C − g(x) is a solution for any choice of C.

This very simple example already exhibits features which turn out to be common to a much larger class of integral equations of this general type. These are

• The first kind integral equation will require much more restrictive conditions on g in order for a solution to exist.

• For most λ ≠ 0 the second kind integral equation has a unique solution for any g.

• There may exist a few exceptional values of λ for which either existence or uniqueness fails in the corresponding second kind equation.

All of these points will be elaborated and made precise in Chapter ( ).

Example 2.7. Let Ω = (0, 1) and

    Tu(x) = \int_0^x u(y)\, dy        (2.2.8)

corresponding to the kernel

    K(x, y) = \begin{cases} 1 & y < x \\ 0 & x \le y \end{cases}        (2.2.9)

The corresponding integral equation may then be written as

    \int_0^x u(y)\, dy = \lambda u(x) + g(x)        (2.2.10)

This is the prototype of an integral operator of so-called Volterra type, see the definition below. In the first kind case, λ = 0, we see that g(0) = 0 is a necessary condition for solvability, in which case the solution is u(x) = g'(x), provided that g is differentiable in some suitable sense. For λ ≠ 0 we note that differentiation of (2.2.10) with respect to x gives

    u' - \frac{1}{\lambda} u = -\frac{g'(x)}{\lambda}        (2.2.11)

which is of the type (2.1.14), and so may be solved by the method given there. The result, after some obvious algebraic manipulation, is

    u(x) = -\frac{e^{x/\lambda}}{\lambda} g(0) - \frac{1}{\lambda} \int_0^x e^{(x-y)/\lambda} g'(y)\, dy        (2.2.12)

Note, however, that by an integration by parts, this formula is seen to be equivalent to

    u(x) = -\frac{1}{\lambda^2} \int_0^x e^{(x-y)/\lambda} g(y)\, dy - \frac{g(x)}{\lambda}        (2.2.13)

Observe that (2.2.12) seems to require differentiability of g even though (2.2.13) does not, thus (2.2.13) would be the preferred solution formula. It may be verified directly by substitution that (2.2.13) is a valid solution of (2.2.10) for all λ ≠ 0, assuming that g is continuous on [0, 1].
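Such a verification can also be carried out numerically. The sketch below (illustrative only; it assumes scipy and the sample choices λ = 0.7 and g(x) = cos 3x) evaluates (2.2.13) on a grid and substitutes it back into (2.2.10):

    import numpy as np
    from scipy.integrate import cumulative_trapezoid

    lam = 0.7                              # any lambda != 0
    x = np.linspace(0.0, 1.0, 4001)
    g = np.cos(3 * x)                      # sample continuous g

    # (2.2.13): u(x) = -(1/lam^2) int_0^x e^{(x-y)/lam} g(y) dy - g(x)/lam.
    # Writing e^{(x-y)/lam} = e^{x/lam} e^{-y/lam} reduces this to one
    # cumulative integral over the whole grid.
    inner = cumulative_trapezoid(np.exp(-x / lam) * g, x, initial=0.0)
    u = -np.exp(x / lam) * inner / lam**2 - g / lam

    # Residual of (2.2.10): int_0^x u(y) dy - lam*u(x) - g(x) should vanish.
    residual = cumulative_trapezoid(u, x, initial=0.0) - lam * u - g
    print(np.max(np.abs(residual)))        # small, at the level of quadrature error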
Concerning the two simple integral equations just discussed, observe that

• For the first kind equation, there are fewer restrictions on g needed for solvability in the Volterra case (2.2.10) than in the non-Volterra case (2.2.4).

• There are no exceptional values λ ≠ 0 in the Volterra case, that is, a unique solution exists for every λ ≠ 0 and every continuous g.

Here are some of the more important ways in which integral operators are classified:

Definition 2.1. The kernel K(x, y) is called

• symmetric if K(x, y) = K(y, x)

• Volterra type if N = 1 and K(x, y) = 0 for x > y or x < y

• convolution type if K(x, y) = K(x − y)

• Hilbert-Schmidt type if \int_{\Omega\times\Omega} |K(x, y)|^2\, dx\, dy < \infty

• singular if K(x, y) is unbounded on Ω × Ω

Some important examples of integral operators, which will receive much more attention later in the book, are the Fourier transform

    Tu(x) = \frac{1}{(2\pi)^{N/2}} \int_{R^N} e^{-ix\cdot y} u(y)\, dy,        (2.2.14)

the Laplace transform

    Tu(x) = \int_0^\infty e^{-xy} u(y)\, dy,        (2.2.15)

the Hilbert transform

    Tu(x) = \frac{1}{\pi} \int_{-\infty}^\infty \frac{u(y)}{x - y}\, dy,        (2.2.16)

and the Abel operator

    Tu(x) = \int_0^x \frac{u(y)}{\sqrt{x - y}}\, dy.        (2.2.17)

2.3 Partial differential equations

An m'th order partial differential equation (PDE) for an unknown function u = u(x) on a domain Ω ⊂ R^N may be given in the form

    F(x, \{D^\alpha u\}_{|\alpha| \le m}) = 0        (2.3.1)

Here we are using the so-called multi-index notation for partial derivatives, which works as follows. A multi-index is a vector of non-negative integers,

    \alpha = (\alpha_1, \alpha_2, \ldots, \alpha_N)    \alpha_i \in \{0, 1, \ldots\}        (2.3.2)

In terms of α we define

    |\alpha| = \sum_{i=1}^{N} \alpha_i        (2.3.3)

the order of α, and

    D^\alpha u = \frac{\partial^{|\alpha|} u}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_N^{\alpha_N}}        (2.3.4)

the corresponding α derivative of u. For later use it is also convenient to define the factorial of a multi-index,

    \alpha! = \alpha_1!\, \alpha_2! \cdots \alpha_N!        (2.3.5)

The PDE (2.3.1) is linear if it can be written as

    Lu(x) = \sum_{|\alpha| \le m} a_\alpha(x) D^\alpha u(x) = g(x)        (2.3.6)

2.3.1 First order PDEs and the method of characteristics

Let us start with the simplest possible example.

Example 2.8. When N = 2 and m = 1 consider

    \frac{\partial u}{\partial x_1} = 0        (2.3.7)

By elementary calculus considerations it is clear that u is a solution if and only if u is independent of x_1, i.e.

    u(x_1, x_2) = f(x_2)        (2.3.8)

for some function f. This is then the general solution of the given PDE, which we note contains an arbitrary function f.

Example 2.9. Next consider, again for N = 2, m = 1, the PDE

    a \frac{\partial u}{\partial x_1} + b \frac{\partial u}{\partial x_2} = 0        (2.3.9)

where a, b are fixed constants. This amounts precisely to the condition that u has directional derivative 0 in the direction θ = ⟨a, b⟩, so u is constant along any line parallel to θ. This in turn leads to the conclusion that u(x_1, x_2) = f(a x_2 − b x_1) for some arbitrary function f, which at least for the moment would seem to need to be differentiable.

The collection of lines parallel to θ, i.e. the lines a x_2 − b x_1 = C, obviously play a special role in the above example; they are the so-called characteristics, or characteristic curves, associated to this particular PDE. The general concept of characteristic curve will now be described for the case of a first order linear PDE in two independent variables (with a temporary change of notation),

    a(x, y) u_x + b(x, y) u_y = c(x, y)        (2.3.10)

Consider the associated ODE system

    \frac{dx}{dt} = a(x, y)    \frac{dy}{dt} = b(x, y)        (2.3.11)

and suppose we have some solution pair x = x(t), y = y(t), which we regard as a parametrically given curve in the (x, y) plane. Such a curve is then, by definition, a characteristic curve for (2.3.10).
Observe that if u(x, y) is a differentiable solution of (2.3.10) then

    \frac{d}{dt} u(x(t), y(t)) = a(x(t), y(t)) u_x(x(t), y(t)) + b(x(t), y(t)) u_y(x(t), y(t)) = c(x(t), y(t))        (2.3.12)

so that u satisfies a certain first order ODE along any characteristic curve. For example if c(x, y) ≡ 0 then, as in the previous example, any solution of the PDE is constant along any characteristic curve.

Now let Γ ⊂ R^2 be some curve, which we assume can be parametrized as

    x = f(s),  y = g(s),  s_0 < s < s_1        (2.3.13)

The Cauchy problem for (2.3.10) consists in finding a solution of (2.3.10) with values prescribed on Γ, that is,

    u(f(s), g(s)) = h(s)    s_0 < s < s_1        (2.3.14)

for some given function h. Assuming for the moment that such a solution u exists, let x(t, s), y(t, s) be the characteristic curve passing through (f(s), g(s)) ∈ Γ when t = 0, i.e.

    \frac{\partial x}{\partial t} = a(x, y)    x(0, s) = f(s)
    \frac{\partial y}{\partial t} = b(x, y)    y(0, s) = g(s)        (2.3.15)

We must then have

    \frac{\partial}{\partial t} u(x(t, s), y(t, s)) = c(x(t, s), y(t, s))    u(x(0, s), y(0, s)) = h(s)        (2.3.16)

This is a first order initial value problem in t, depending on s as a parameter, which is then guaranteed to have a solution at least for |t| < ε for some ε > 0. The three relations x = x(t, s), y = y(t, s), z = u(x(t, s), y(t, s)) generally amount to the parametric description of a surface in R^3 containing Γ. If we can eliminate the parameters s, t to obtain the surface in non-parametric form z = u(x, y), then u is the sought after solution of the Cauchy problem.

Example 2.10. Let Γ denote the x axis and let us solve

    x u_x + u_y = 1        (2.3.17)

with u = h on Γ. Introducing f(s) = s, g(s) = 0 as the parametrization of Γ, we must then solve

    \frac{\partial x}{\partial t} = x    x(0, s) = s
    \frac{\partial y}{\partial t} = 1    y(0, s) = 0
    \frac{\partial}{\partial t} u(x(t, s), y(t, s)) = 1    u(s, 0) = h(s)        (2.3.18)

We then easily obtain

    x(s, t) = s e^t    y(s, t) = t    u(x(s, t), y(s, t)) = t + h(s)        (2.3.19)

and eliminating t, s yields the solution formula

    u(x, y) = y + h(x e^{-y})        (2.3.20)

The characteristics in this case are the curves x = s e^t, y = t for fixed s, or x = s e^y in nonparametric form. Note here that the solution is defined throughout the x, y plane even though nothing in the preceding discussion guarantees that. Since h has not been otherwise prescribed, we may also regard (2.3.20) as the general solution of (2.3.17).

The attentive reader may already realize that this procedure cannot work in all cases, as is made clear by the following consideration: if c ≡ 0 and Γ is itself a characteristic curve, then the solution on Γ would have to simultaneously be equal to the given function h and to be constant, so that no solution can exist except possibly in the case that h is a constant function. From another, more general, point of view, we must eliminate the parameters s, t by inverting the relations x = x(s, t), y = y(s, t) to obtain s, t in terms of x, y, at least near Γ, and according to the inverse function theorem this should require that the Jacobian matrix

    \begin{pmatrix} \frac{\partial x}{\partial t} & \frac{\partial y}{\partial t} \\[2pt] \frac{\partial x}{\partial s} & \frac{\partial y}{\partial s} \end{pmatrix}_{t=0} = \begin{pmatrix} a(f(s), g(s)) & b(f(s), g(s)) \\ f'(s) & g'(s) \end{pmatrix}        (2.3.21)

be nonsingular for all s. Equivalently the direction ⟨f', g'⟩ should not be parallel to ⟨a, b⟩, and since ⟨a, b⟩ must be tangent to the characteristic curve, this amounts to the requirement that Γ itself should have a non-characteristic tangent direction at every point. We say that Γ is non-characteristic for the PDE (2.3.10) when this condition holds.
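The characteristic construction of Example 2.10 can be imitated numerically when no closed form is available. The sketch below (illustrative only; it assumes scipy and the sample choice h(s) = sin s) integrates the system (2.3.18) and compares with the exact solution (2.3.20):

    import numpy as np
    from scipy.integrate import solve_ivp

    h = np.sin                     # sample Cauchy data on the x axis

    def rhs(t, v):                 # v = (x, y, z) with z = u along a characteristic
        x, y, z = v
        return [x, 1.0, 1.0]       # x' = x, y' = 1, du/dt = 1, cf. (2.3.18)

    for s in [-1.0, 0.5, 2.0]:     # starting points (s, 0) on Gamma
        sol = solve_ivp(rhs, (0.0, 1.0), [s, 0.0, h(s)], rtol=1e-10)
        x, y, z = sol.y[:, -1]
        print(z - (y + h(x * np.exp(-y))))   # ~0: matches u(x,y) = y + h(x e^{-y})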
The following precise theorem can be established, see for example Chapter 1 of [18], or Chapter 3 of [10].

Theorem 2.2. Let Γ ⊂ R^2 be a continuously differentiable curve which is non-characteristic for (2.3.10), h a continuously differentiable function on Γ, and let a, b, c be continuously differentiable functions in a neighborhood of Γ. Then there exists a unique continuously differentiable function u(x, y) defined in a neighborhood of Γ which is a solution of (2.3.10).

The method of characteristics is capable of a considerable amount of generalization, in particular to first order PDEs in any number of independent variables, and to fully nonlinear first order PDEs; see the references just given above.

2.3.2 Second order problems in R^2

Let us next look at the following special type of second order PDE in two independent variables x, y:

    A u_{xx} + B u_{xy} + C u_{yy} = 0        (2.3.22)

where A, B, C are real constants, not all zero. Consider introducing new coordinates ξ, η by means of a linear change of variable

    \xi = \alpha x + \beta y    \eta = \gamma x + \delta y        (2.3.23)

with αδ − βγ ≠ 0, so that the transformation is invertible. Our goal is to make a good choice of α, β, γ, δ so as to achieve a simpler, but equivalent, PDE to study. Given any PDE and any change of coordinates, we obtain the expression for the PDE in the new coordinate system by straightforward application of the chain rule. In our case, for example, we have

    \frac{\partial u}{\partial x} = \frac{\partial u}{\partial \xi}\frac{\partial \xi}{\partial x} + \frac{\partial u}{\partial \eta}\frac{\partial \eta}{\partial x} = \alpha \frac{\partial u}{\partial \xi} + \gamma \frac{\partial u}{\partial \eta}        (2.3.24)

    \frac{\partial^2 u}{\partial x^2} = \left( \alpha \frac{\partial}{\partial \xi} + \gamma \frac{\partial}{\partial \eta} \right)^2 u = \alpha^2 \frac{\partial^2 u}{\partial \xi^2} + 2\alpha\gamma \frac{\partial^2 u}{\partial \xi \partial \eta} + \gamma^2 \frac{\partial^2 u}{\partial \eta^2}        (2.3.25)

with similar expressions for u_{xy} and u_{yy}. Substituting into (2.3.22) the resulting PDE is

    a u_{\xi\xi} + b u_{\xi\eta} + c u_{\eta\eta} = 0        (2.3.26)

where

    a = \alpha^2 A + \alpha\beta B + \beta^2 C        (2.3.27)
    b = 2\alpha\gamma A + (\alpha\delta + \beta\gamma) B + 2\beta\delta C        (2.3.28)
    c = \gamma^2 A + \gamma\delta B + \delta^2 C        (2.3.29)

The idea now is to make special choices of α, β, γ, δ to achieve as simple a form as possible for the transformed PDE (2.3.26).

Suppose first that B^2 − 4AC > 0, so that there exist two real and distinct roots r_1, r_2 of A r^2 + B r + C = 0. If α, β, γ, δ are chosen so that

    \frac{\alpha}{\beta} = r_1    \frac{\gamma}{\delta} = r_2        (2.3.30)

then a = c = 0 (and αδ − βγ ≠ 0), so that the transformed PDE is simply u_{ξη} = 0. The general solution of this second order PDE is easily obtained: u_ξ must be a function of ξ alone, so integrating with respect to ξ and observing that the 'constant of integration' could be any function of η, we get

    u(\xi, \eta) = F(\xi) + G(\eta)        (2.3.31)

for any differentiable functions F, G. Finally reverting to the original coordinate system, the result is

    u(x, y) = F(\alpha x + \beta y) + G(\gamma x + \delta y)        (2.3.32)

The lines αx + βy = C, γx + δy = C are called the characteristics for (2.3.22). Characteristics are an important concept for this and some more general second order PDEs, but they don't play as central a role as in the first order case.

Example 2.11. For the PDE

    u_{xx} - u_{yy} = 0        (2.3.33)

the roots r satisfy r^2 − 1 = 0. We may then choose, for example, α = β = γ = 1, δ = −1, to get the general solution

    u(x, y) = F(x + y) + G(x - y)        (2.3.34)
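The claim in Example 2.11 is easy to confirm symbolically; in the following sketch (illustrative only, assuming the sympy library) F and G are left as arbitrary functions:

    import sympy as sp

    # Check that u = F(x+y) + G(x-y) satisfies u_xx - u_yy = 0 identically.
    x, y = sp.symbols('x y')
    F, G = sp.Function('F'), sp.Function('G')
    u = F(x + y) + G(x - y)
    print(sp.simplify(u.diff(x, 2) - u.diff(y, 2)))   # prints 0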
Next assume that B^2 − 4AC = 0. If either of A or C is 0, then so is B, in which case the PDE already has the form u_{ξξ} = 0 or u_{ηη} = 0, say the first of these without loss of generality. Otherwise, choose

    \alpha = -\frac{B}{2A}    \beta = 1    \gamma = 1    \delta = 0        (2.3.35)

to obtain a = b = 0, c = A, so that (after interchanging the names of ξ and η if necessary) the transformed PDE in all cases is u_{ξξ} = 0.

Finally, if B^2 − 4AC < 0 then A ≠ 0 must hold, and we may choose

    \alpha = \frac{2A}{\sqrt{4AC - B^2}}    \beta = \frac{-B}{\sqrt{4AC - B^2}}    \gamma = 0    \delta = 1        (2.3.36)

in which case the transformed equation is

    u_{\xi\xi} + u_{\eta\eta} = 0        (2.3.37)

We have therefore established that any PDE of the type (2.3.22) can be transformed, by means of a linear change of variables, to one of the three simple types

    u_{\xi\eta} = 0    u_{\xi\xi} = 0    u_{\xi\xi} + u_{\eta\eta} = 0        (2.3.38)

each of which then leads to a prototype for a certain class of PDEs. If we allow lower order terms,

    A u_{xx} + B u_{xy} + C u_{yy} + D u_x + E u_y + F u = G        (2.3.39)

then after the transformation (2.3.23) it is clear that the lower order terms remain as lower order terms. Thus any PDE of the type (2.3.39) is, up to a change of coordinates, one of the three types (2.3.38), up to lower order terms, and only the value of the discriminant B^2 − 4AC needs to be known to determine which of the three types is obtained. The above discussion motivates the following classification: the PDE (2.3.39) is said to be

• hyperbolic if B^2 − 4AC > 0

• parabolic if B^2 − 4AC = 0

• elliptic if B^2 − 4AC < 0

The terminology comes from an obvious analogy with conic sections, i.e. the solution set of Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 is respectively a hyperbola, parabola or ellipse (or a degenerate case) according as B^2 − 4AC is positive, zero or negative. We can also allow the coefficients A, B, ..., G to be variable functions of x, y, and in this case the classification is done pointwise, so the type can change. An important example of this phenomenon is the so-called Tricomi equation (see e.g. Chapter 12 of [13])

    u_{xx} - x u_{yy} = 0        (2.3.40)

which is hyperbolic for x > 0 and elliptic for x < 0. One might refer to the equation as being parabolic for x = 0, but generally speaking we do not do this, since it is not really meaningful to speak of a PDE being satisfied in a set without interior points.

The above discussion is special to the case of N = 2 independent variables, and in the case of N ≥ 3 there is no such complete classification. As we will see, there are still PDEs referred to as being hyperbolic, parabolic or elliptic, but there are others which are not of any of these types, although these tend to be of less physical importance.

2.3.3 Further discussion of model problems

According to the previous discussion, we should focus our attention on a representative problem for each of the three types, since then we will also gain considerable information about other problems of the given type.

Wave equation

For the hyperbolic case we consider the wave equation

    u_{tt} - c^2 u_{xx} = 0        (2.3.41)

where c > 0 is a constant. Here we have changed the name of the variable y to t, following the usual convention of regarding u = u(x, t) as depending on a 'space' variable x and a 'time' variable t. This PDE arises in the simplest model of wave propagation in one dimension, where u represents, for example, the displacement of a vibrating medium from its equilibrium position, and c is the wave speed. Following the procedure outlined at the beginning of this section, an appropriate change of coordinates is ξ = x + ct, η = x − ct, and we obtain the expression, also known as d'Alembert's formula, for the general solution,

    u(x, t) = F(x + ct) + G(x - ct)        (2.3.42)

for arbitrary twice differentiable functions F, G. The general solution may be viewed as the superposition of two waves of fixed shape, moving to the right and to the left with speed c.

The initial value problem for the wave equation consists in solving (2.3.41) for x ∈ R and t > 0 subject to the side conditions

    u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ R        (2.3.43)

where f, g represent the initial displacement and initial velocity of the vibrating medium. This problem may be completely and explicitly solved by means of d'Alembert's formula. We have

    F(x) + G(x) = f(x)    c(F'(x) - G'(x)) = g(x)    x ∈ R        (2.3.44)

Integrating the second relation gives F(x) − G(x) = \frac{1}{c} \int_0^x g(s)\, ds + C for some constant C, and combining with the first relation yields

    F(x) = \frac{1}{2}\left( f(x) + \frac{1}{c}\int_0^x g(s)\, ds + C \right)    G(x) = \frac{1}{2}\left( f(x) - \frac{1}{c}\int_0^x g(s)\, ds - C \right)        (2.3.45)

Substituting into (2.3.42) and doing some obvious simplification we obtain

    u(x, t) = \frac{1}{2}\big( f(x + ct) + f(x - ct) \big) + \frac{1}{2c} \int_{x-ct}^{x+ct} g(s)\, ds        (2.3.46)

We remark that a general solution formula like (2.3.42) can be given for any PDE which is exactly transformable to u_{ξη} = 0, that is to say, any hyperbolic PDE of the form (2.3.22), but once lower order terms are allowed such a simple solution method is no longer available. For example the so-called Klein-Gordon equation u_{tt} − u_{xx} + u = 0 may be transformed to u_{ξη} + 4u = 0, which cannot be solved in so transparent a form. Thus the d'Alembert solution method, while very useful when applicable, is limited in its scope.
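Formula (2.3.46) is simple enough to implement directly. In the sketch below (illustrative only; it assumes scipy, c = 1, and sample data f, g) one can check the side conditions (2.3.43) numerically:

    import numpy as np
    from scipy.integrate import quad

    c = 1.0
    f = lambda x: np.exp(-x**2)                 # initial displacement
    g = lambda x: np.sin(x) * np.exp(-x**2)     # initial velocity

    def u(x, t):
        # d'Alembert's formula (2.3.46)
        integral, _ = quad(g, x - c * t, x + c * t)
        return 0.5 * (f(x + c * t) + f(x - c * t)) + integral / (2.0 * c)

    print(u(0.5, 0.0), f(0.5))                  # u(x,0) = f(x)
    dt = 1e-6                                   # central difference for u_t(x,0)
    print((u(0.5, dt) - u(0.5, -dt)) / (2 * dt), g(0.5))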
Heat equation

Another elementary method, which may be used in a wide variety of situations, is separation of variables. We illustrate with the case of the initial and boundary value problem

    u_t = u_{xx}    0 < x < 1    t > 0        (2.3.47)
    u(0, t) = u(1, t) = 0    t > 0        (2.3.48)
    u(x, 0) = f(x)    0 < x < 1        (2.3.49)

Here (2.3.47) is the heat equation, a parabolic equation modeling, for example, the temperature u = u(x, t) in a one dimensional medium as a function of location x and time t; (2.3.48) are the boundary conditions, stating that the temperature is held at zero at the two boundary points x = 0 and x = 1 for all t; and (2.3.49) represents the initial condition, i.e. that the initial temperature distribution is given by the prescribed function f(x).

We begin by ignoring the initial condition and otherwise looking for special solutions of the form u(x, t) = φ(t)ψ(x). Obviously u = 0 is such a solution, but it cannot be of any help in eventually solving the full stated problem, so we insist that neither of φ and ψ is the zero function. Inserting into (2.3.47) we obtain immediately that

    \varphi'(t)\psi(x) = \varphi(t)\psi''(x)        (2.3.50)

must hold, or equivalently

    \frac{\varphi'(t)}{\varphi(t)} = \frac{\psi''(x)}{\psi(x)}        (2.3.51)

Since the left side depends on t alone and the right side on x alone, it must be that both sides are equal to a common constant, which we denote by −λ (without yet at this point ruling out the possibility that λ itself is negative or even complex). We have therefore obtained ODEs for φ and ψ,

    \varphi'(t) + \lambda\varphi(t) = 0    \psi''(x) + \lambda\psi(x) = 0        (2.3.52)

linked via the separation constant λ. Next, from the boundary condition (2.3.48) we get φ(t)ψ(0) = φ(t)ψ(1) = 0, and since φ is nonzero we must have ψ(0) = ψ(1) = 0. The ODE and side conditions for ψ, namely

    \psi''(x) + \lambda\psi(x) = 0    0 < x < 1    \psi(0) = \psi(1) = 0        (2.3.53)

is the simplest example of a so-called Sturm-Liouville problem, a topic which will be studied in detail in Chapter ( ), but this particular case can be handled by elementary considerations. We emphasize that our goal is to find nonzero solutions of (2.3.53), along with the values of λ these correspond to, and as we will see, only certain values of λ will be possible.
Considering first the case that λ > 0, the general solution of the ODE is

    \psi(x) = c_1 \sin\sqrt{\lambda}\, x + c_2 \cos\sqrt{\lambda}\, x        (2.3.54)

The first boundary condition ψ(0) = 0 implies that c_2 = 0, and the second gives c_1 sin √λ = 0. We are not allowed to have c_1 = 0, since otherwise ψ = 0, so instead sin √λ = 0 must hold, i.e. √λ = π, 2π, .... Thus we have found one collection of solutions of (2.3.53), which we denote ψ_k(x) = sin kπx, k = 1, 2, .... Since they were found under the assumption that λ > 0, we should next consider other possibilities, but it turns out that we have already found all possible solutions of (2.3.53). For example if we suppose λ < 0 and k = √(−λ), then to solve (2.3.53) we must have ψ(x) = c_1 e^{kx} + c_2 e^{−kx}. From the boundary conditions

    c_1 + c_2 = 0    c_1 e^k + c_2 e^{-k} = 0        (2.3.55)

we see that the unique solution is c_1 = c_2 = 0 for any k > 0. Likewise we can check that ψ = 0 is the only possible solution for λ = 0 and for nonreal λ.

For each allowed value of λ we obviously have the corresponding function φ(t) = e^{−λt}, so that

    u_k(x, t) = e^{-k^2\pi^2 t} \sin k\pi x    k = 1, 2, \ldots        (2.3.56)

represents, aside from multiplicative constants, all possible product solutions of (2.3.47),(2.3.48).

To complete the solution of the initial and boundary value problem, we observe that any sum \sum_{k=1}^\infty c_k u_k(x, t) is also a solution of (2.3.47),(2.3.48) as long as c_k → 0 sufficiently rapidly, and we try to choose the coefficients c_k to achieve the initial condition (2.3.49). The requirement is therefore that

    f(x) = \sum_{k=1}^{\infty} c_k \sin k\pi x        (2.3.57)

hold. For any f for which such a sine series representation is valid, we then have the solution of the given PDE problem,

    u(x, t) = \sum_{k=1}^{\infty} c_k e^{-k^2\pi^2 t} \sin k\pi x        (2.3.58)

The question then becomes to characterize this set of f's in some more straightforward way, and this is done, among many other things, within the theory of Fourier series, which will be discussed in Chapter 8. Roughly speaking the result will be that essentially any reasonable function can be represented this way, but there are many aspects to this, including elaboration of the precise sense in which the series converges.

One other fact concerning this series which we can easily anticipate at this point is a formula for the coefficient c_k: if we assume that (2.3.57) holds, we can multiply both sides by sin mπx for some integer m and integrate with respect to x over (0, 1), to obtain

    \int_0^1 f(x) \sin m\pi x\, dx = c_m \int_0^1 \sin^2 m\pi x\, dx = \frac{c_m}{2}        (2.3.59)

since \int_0^1 \sin k\pi x \sin m\pi x\, dx = 0 for k ≠ m. Thus, if f is representable by a sine series, there is only one possibility for the k'th coefficient, namely

    c_k = 2\int_0^1 f(x) \sin k\pi x\, dx        (2.3.60)
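The truncated series (2.3.58), with coefficients computed from (2.3.60), is already a practical solution method. Here is a minimal sketch (illustrative only; it assumes scipy, the sample data f(x) = x(1 − x), and a 50-term truncation):

    import numpy as np
    from scipy.integrate import trapezoid

    x = np.linspace(0.0, 1.0, 401)
    f = x * (1.0 - x)                 # sample initial temperature

    def u(t, K=50):
        total = np.zeros_like(x)
        for k in range(1, K + 1):
            ck = 2.0 * trapezoid(f * np.sin(k * np.pi * x), x)            # (2.3.60)
            total += ck * np.exp(-(k * np.pi)**2 * t) * np.sin(k * np.pi * x)
        return total

    print(np.max(np.abs(u(0.0) - f)))   # the series reproduces f at t = 0
    print(np.max(np.abs(u(0.1))))       # and decays rapidly as t increases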
Laplace equation

Finally we discuss a model problem of elliptic type,

    u_{xx} + u_{yy} = 0    x^2 + y^2 < 1        (2.3.61)
    u(x, y) = f(x, y)    x^2 + y^2 = 1        (2.3.62)

where f is a given function. The PDE in (2.3.61) is known as Laplace's equation, and is commonly written as Δu = 0, where Δ = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} is the Laplace operator, or Laplacian. A function satisfying Laplace's equation in some set is said to be a harmonic function on that set; thus we are solving the boundary value problem of finding a harmonic function in the unit disk x^2 + y^2 < 1 subject to a prescribed boundary condition on the boundary of the disk.

One should immediately recognize that it would be natural here to make use of polar coordinates (r, θ), where according to the usual calculus notations,

    r = \sqrt{x^2 + y^2}    \tan\theta = \frac{y}{x}    x = r\cos\theta    y = r\sin\theta        (2.3.63)

and we regard u = u(r, θ) and f = f(θ).

To begin we need to find the expression for Laplace's equation in polar coordinates. Again this is a straightforward calculation with the chain rule, for example

    \frac{\partial u}{\partial x} = \frac{\partial u}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial u}{\partial \theta}\frac{\partial \theta}{\partial x}        (2.3.64)
      = \frac{x}{\sqrt{x^2 + y^2}} \frac{\partial u}{\partial r} - \frac{y}{x^2 + y^2} \frac{\partial u}{\partial \theta}        (2.3.65)
      = \cos\theta\, \frac{\partial u}{\partial r} - \frac{\sin\theta}{r} \frac{\partial u}{\partial \theta}        (2.3.66)

and similar expressions for ∂u/∂y and the second derivatives. The end result is

    u_{xx} + u_{yy} = u_{rr} + \frac{1}{r} u_r + \frac{1}{r^2} u_{\theta\theta} = 0        (2.3.67)

We may now try separation of variables, looking for solutions in the product form u(r, θ) = R(r)Θ(θ). Substituting into (2.3.67) and dividing by RΘ gives

    r^2 \frac{R''(r)}{R(r)} + r \frac{R'(r)}{R(r)} = -\frac{\Theta''(\theta)}{\Theta(\theta)}        (2.3.68)

so both sides must be equal to a common constant λ. Therefore R and Θ must be nonzero solutions of

    \Theta'' + \lambda\Theta = 0    r^2 R'' + r R' - \lambda R = 0        (2.3.69)

Next it is necessary to recognize that there are two 'hidden' side conditions which we must make use of. The first of these is that Θ must be 2π periodic, since otherwise it would not be possible to express the solution u in terms of the original variables x, y in an unambiguous way. We can make this explicit by requiring

    \Theta(0) = \Theta(2\pi)    \Theta'(0) = \Theta'(2\pi)        (2.3.70)

As in the case of (2.3.53) we can search for allowable values of λ by considering the various cases λ > 0, λ < 0, etc. The outcome is that nontrivial solutions exist precisely if λ = k^2, k = 0, 1, 2, ..., with corresponding solutions, up to multiplicative constant,

    \psi_k(x) = \begin{cases} 1 & k = 0 \\ \sin kx \text{ or } \cos kx & k = 1, 2, \ldots \end{cases}        (2.3.71)

If one is willing to use the complex form, we could replace sin kx, cos kx by e^{±ikx} for k = 1, 2, ....

With λ determined we must next solve the corresponding R equation,

    r^2 R'' + r R' - k^2 R = 0        (2.3.72)

which is of the Cauchy-Euler type (2.1.19). The general solution is

    R(r) = \begin{cases} c_1 + c_2 \log r & k = 0 \\ c_1 r^k + c_2 r^{-k} & k = 1, 2, \ldots \end{cases}        (2.3.73)

and here we encounter the second hidden condition: the solution R should not be singular at the origin, since otherwise the PDE would not be satisfied throughout the unit disk. Thus we should choose c_2 = 0 in each case, leaving R(r) = r^k, k = 0, 1, ....

Summarizing, we have found all possible product solutions R(r)Θ(θ) of (2.3.61), and they are

    1,  r^k \sin k\theta,  r^k \cos k\theta    k = 1, 2, \ldots        (2.3.74)

up to constant multiples. Any sum of such terms is also a solution of (2.3.61), so we seek a solution of (2.3.61),(2.3.62) in the form

    u(r, \theta) = a_0 + \sum_{k=1}^{\infty} a_k r^k \cos k\theta + b_k r^k \sin k\theta        (2.3.75)

The coefficients must then be determined from the requirement that

    f(\theta) = a_0 + \sum_{k=1}^{\infty} a_k \cos k\theta + b_k \sin k\theta        (2.3.76)

This is another problem in the theory of Fourier series, very similar to that associated with (2.3.57), which as mentioned before will be studied in detail in Chapter 8. Exact formulas for the coefficients in terms of f may be given, as in (2.3.60), see Exercise 19.
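For concreteness, here is a sketch of the resulting solution procedure (illustrative only; it assumes scipy, the sample boundary data f(θ) = e^{cos θ}, and a 30-term truncation), with the coefficients of (2.3.76) computed by quadrature as in Exercise 19:

    import numpy as np
    from scipy.integrate import trapezoid

    theta = np.linspace(0.0, 2.0 * np.pi, 2001)
    f = np.exp(np.cos(theta))                      # sample boundary data

    def u(r, th, K=30):
        # evaluate the truncated series (2.3.75)
        total = trapezoid(f, theta) / (2.0 * np.pi)            # a_0
        for k in range(1, K + 1):
            ak = trapezoid(f * np.cos(k * theta), theta) / np.pi
            bk = trapezoid(f * np.sin(k * theta), theta) / np.pi
            total += r**k * (ak * np.cos(k * th) + bk * np.sin(k * th))
        return total

    print(u(0.99, 1.0))              # close to the boundary value below
    print(np.exp(np.cos(1.0)))       # f at theta = 1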
2.3.4 Standard problems and side conditions

Let us now formulate a number of typical PDE problems which will recur throughout this book, and which are for the most part variants of the model problems discussed in the previous section. Let Ω be some domain in R^N and let ∂Ω denote the boundary of Ω. For any sufficiently differentiable function u, the Laplacian of u is

    \Delta u = \sum_{k=1}^{N} \frac{\partial^2 u}{\partial x_k^2}        (2.3.77)

• The PDE

    \Delta u = h    x ∈ Ω        (2.3.78)

is Poisson's equation, or Laplace's equation in the special case that h = 0. It is regarded as being of elliptic type, by analogy with the N = 2 case discussed in the previous section, or on account of a more general definition of ellipticity which will be given in Chapter 9. The most common types of side conditions associated with this PDE are

  – Dirichlet, or first kind, boundary conditions

      u(x) = g(x)    x ∈ ∂Ω        (2.3.79)

  – Neumann, or second kind, boundary conditions

      \frac{\partial u}{\partial n}(x) = g(x)    x ∈ ∂Ω        (2.3.80)

    where \frac{\partial u}{\partial n}(x) = (∇u · n)(x) is the directional derivative in the direction of the outward normal n(x) for x ∈ ∂Ω.

  – Robin, or third kind, boundary conditions

      \frac{\partial u}{\partial n}(x) + \sigma(x) u(x) = g(x)    x ∈ ∂Ω        (2.3.81)

    for some given function σ.

• The PDE

    \Delta u + \lambda u = h    x ∈ Ω        (2.3.82)

where λ is some constant, is the Helmholtz equation, also of elliptic type. The three types of boundary condition mentioned for the Poisson equation may also be imposed in this case.

• The PDE

    u_t = \Delta u    x ∈ Ω    t > 0        (2.3.83)

is the heat equation and is of parabolic type. Here u = u(x, t), where x is regarded as a spatial variable and t a time variable. By convention, the Laplacian acts only with respect to the N spatial variables x_1, ..., x_N. Appropriate side conditions for determining a solution of the heat equation are an initial condition

    u(x, 0) = f(x)    x ∈ Ω        (2.3.84)

and boundary conditions of the Dirichlet, Neumann or Robin type mentioned above. The only needed modification is that the functions involved may be allowed to depend on t; for example the Dirichlet boundary condition for the heat equation is

    u(x, t) = g(x, t)    x ∈ ∂Ω    t > 0        (2.3.85)

and similarly for the other two types.

• The PDE

    u_{tt} = \Delta u    x ∈ Ω    t > 0        (2.3.86)

is the wave equation and is of hyperbolic type. Since it is second order in t it is natural that there be two initial conditions, usually given as

    u(x, 0) = f(x)    u_t(x, 0) = g(x)    x ∈ Ω        (2.3.87)

Suitable boundary conditions for the wave equation are precisely the same as for the heat equation.

• Finally, the PDE

    i u_t = \Delta u    x ∈ R^N    t > 0        (2.3.88)

is the Schrödinger equation. Even when N = 1 it does not fall under the classification scheme of Section 2.3.2 because of the complex coefficient i = √−1. It is nevertheless one of the fundamental partial differential equations of mathematical physics, and we will have some things to say about it in later chapters. The spatial domain here is taken to be all of R^N rather than a subset Ω because this is by far the most common situation and the only one which will arise in this book. Since there is no spatial boundary, the only needed side condition is an initial condition for u, u(x, 0) = f(x), as in the heat equation case.

2.4 Well-posed and ill-posed problems

All of the PDEs and associated side conditions discussed in the previous section turn out to be natural, in the sense that they lead to what are called well-posed problems, a somewhat imprecise concept we explain next. Roughly speaking, a problem is well-posed if

• A solution exists.

• The solution is unique.

• The solution depends continuously on the data.

Here by 'data' we mean any of the ingredients of the problem which we might imagine being changed, to obtain a problem of the same general type. For example in the Dirichlet problem for the Poisson equation

    \Delta u = f    x ∈ Ω    u = 0    x ∈ ∂Ω        (2.4.1)

the term f = f(x) would be regarded as the given data. The idea of continuous dependence is that if a 'small' change is made in the data, then the resulting solution should also undergo only a small change.
For such a notion to be made precise, it is necessary to have some specific idea in mind of how we would measure the magnitude of a change in f. As we shall see, there may be many natural ways to do so, and no precise statement about well-posedness can be given until such choices are made. In fact, even the existence and uniqueness requirements, which may seem more clear cut, may also turn out to require much clarification in terms of what the exact meaning of 'solution' is.

A problem which is not well-posed is called ill-posed. A classical problem in which ill-posedness can be easily recognized is Hadamard's example, which we note is not of one of the standard types mentioned above:

    u_{xx} + u_{yy} = 0    -\infty < x < \infty    y > 0        (2.4.2)
    u(x, 0) = 0    u_y(x, 0) = g(x)    -\infty < x < \infty        (2.4.3)

If g(x) = α sin kx for some α, k > 0 then a corresponding solution is

    u(x, y) = \alpha\, \frac{\sin kx}{k}\, e^{ky}        (2.4.4)

This is known to be the unique solution, but notice that a change in α (i.e. of the data g) of size ε implies a corresponding change in the solution for, say, y = 1 of size ε e^k/k. Since k can be arbitrarily large, it follows that the problem is ill-posed, that is, small changes in the data do not necessarily lead to small changes in the solution.

Note that in this example if we change the PDE from u_{xx} + u_{yy} = 0 to u_{xx} − u_{yy} = 0 then (aside from the name of a variable) we have precisely the problem (2.3.41),(2.3.43), which from the explicit solution (2.3.46) may be seen to be well-posed under any reasonable interpretation. Thus we see that some care must be taken in recognizing what are the 'correct' side conditions for a given PDE. Other interesting examples of ill-posed problems are given in Exercises 23 and 26; see also [25].

2.5 Exercises

1. Find a fundamental set and the general solution of u''' + u'' + u' = 0.

2. Let L = aD^2 + bD + c (a ≠ 0) be a constant coefficient second order linear differential operator, and let p(λ) = aλ^2 + bλ + c be the associated characteristic polynomial. If λ_1, λ_2 are the roots of p, show that we can express the operator L as L = a(D − λ_1)(D − λ_2). Use this factorization to obtain the general solution of Lu = 0 in the case of repeated roots, λ_1 = λ_2.

3. Show that the solution of the initial value problem y' = \sqrt[3]{y}, y(0) = 0 is not unique. (Hint: y(t) = 0 is one solution; find another one.) Why doesn't this contradict the assertion in Theorem 2.1 about unique solvability of the initial value problem?

4. Solve the initial value problem for the Cauchy-Euler equation

    (t+1)^2 u'' + 4(t+1) u' - 10u = 0    u(1) = 2    u'(1) = -1

5. Consider the integral equation

    \int_0^1 K(x, y) u(y)\, dy = \lambda u(x) + g(x)

for the kernel

    K(x, y) = \frac{x^2}{1 + y^3}

a) For what values of λ ∈ C does there exist a unique solution for any function g which is continuous on [0, 1]?

b) Find the solution set of the equation for all λ ∈ C and continuous functions g. (Hint: For λ ≠ 0 any solution must have the form u(x) = −\frac{g(x)}{\lambda} + Cx^2 for some constant C.)

6. Find a kernel K(x, y) such that u(x) = \int_0^1 K(x, y) f(y)\, dy is the solution of

    u'' + u = f(x)    u(0) = u'(0) = 0

(Hint: Review the variation of parameters method in any undergraduate ODE textbook.)

7. If f ∈ C([0, 1]),

    K(x, y) = \begin{cases} y(x-1) & 0 < y < x < 1 \\ x(y-1) & 0 < x < y < 1 \end{cases}

and

    u(x) = \int_0^1 K(x, y) f(y)\, dy

show that

    u'' = f    0 < x < 1    u(0) = u(1) = 0

8. For each of the integral operators in (2.2.8),(2.2.14),(2.2.15),(2.2.16) and (2.2.17), discuss the classification(s) of the corresponding kernel, according to Definition 2.1.

9. Find the general solution of (1 + x^2) u_x + u_y = 0.
Sketch some of the characteristic curves.

10. The general solution in Example 2.10 was found by solving the corresponding Cauchy problem with Γ being the x axis. But the general solution should not actually depend on any specific choice of Γ. Show that the same general solution is found if instead we take Γ to be the y axis.

11. Find the solution of

    y u_x + x u_y = 1    u(0, y) = e^{-y^2}

Discuss why the solution you find is only valid for |y| ≥ |x|.

12. The method of characteristics developed in Section 2.3.1 for the linear PDE (2.3.10) can be easily extended to the so-called semilinear equation

    a(x, y) u_x + b(x, y) u_y = c(x, y, u)        (2.5.1)

We simply replace (2.3.12) by

    \frac{d}{dt} u(x(t), y(t)) = c(x(t), y(t), u(x(t), y(t)))        (2.5.2)

which is still an ODE along a characteristic. With this in mind, solve

    u_x + x u_y + u^2 = 0    u(0, y) = \frac{1}{y}        (2.5.3)

13. Find the general solution of u_{xx} − 4u_{xy} + 3u_{yy} = 0.

14. Find the regions of the xy plane where the PDE

    y u_{xx} - 2u_{xy} + x u_{yy} - 3u_x + u = 0

is elliptic, parabolic, and hyperbolic.

15. Find a solution formula for the half line wave equation problem

    u_{tt} - c^2 u_{xx} = 0    x > 0    t > 0        (2.5.4)
    u(0, t) = h(t)    t > 0        (2.5.5)
    u(x, 0) = f(x)    x > 0        (2.5.6)
    u_t(x, 0) = g(x)    x > 0        (2.5.7)

Note where the solution coincides with (2.3.46) and explain why this should be expected.

16. Complete the details of verifying (2.3.67).

17. If u is a twice differentiable function on R^N depending only on r = |x|, show that

    \Delta u = u_{rr} + \frac{N-1}{r} u_r

(Spherical coordinates in R^N are reviewed in Section 18.3, but the details of the angular variables are not needed for this calculation. Start by showing that \frac{\partial u}{\partial x_j} = \frac{x_j}{r} u'(r).)

18. Verify in detail that there are no nontrivial solutions of (2.3.53) for nonreal λ ∈ C.

19. Assuming that (2.3.76) is valid, find the coefficients a_k, b_k in terms of f. (Hint: multiply the equation by sin mθ or cos mθ and integrate from 0 to 2π.)

20. In the two dimensional case, solutions of Laplace's equation Δu = 0 may also be found by means of analytic function theory. Recall that if z = x + iy then a function f(z) is analytic in an open set Ω if f'(z) exists at every point of Ω. If we think of f = u + iv and u, v as functions of x, y then u = u(x, y), v = v(x, y) must satisfy the Cauchy-Riemann equations u_x = v_y, u_y = −v_x. Show in this case that u, v are also solutions of Laplace's equation. Find u, v if f(z) = z^3 and if f(z) = e^z.

21. Find all of the product solutions u(x, t) = φ(t)ψ(x) that you can which satisfy the damped wave equation

    u_{tt} + \alpha u_t = u_{xx}    0 < x < π    t > 0

and the boundary conditions

    u(0, t) = u_x(π, t) = 0    t > 0

Here α > 0 is the damping constant. What is the significance of the condition α < 1?

22. Show that any solution of the wave equation u_{tt} − u_{xx} = 0 has the 'four point property'

    u(x, t) + u(x + h - k, t + h + k) = u(x + h, t + h) + u(x - k, t + k)

for any h, k. (Suggestion: Use d'Alembert's formula.)

23. In the Dirichlet problem for the wave equation

    u_{tt} - u_{xx} = 0    0 < x < 1    0 < t < 1
    u(0, t) = u(1, t) = 0    0 < t < 1
    u(x, 0) = 0    u(x, 1) = f(x)    0 < x < 1

show that neither existence nor uniqueness holds. (Hint: For the non-existence part, use Exercise 22 to find an f for which no solution exists.)

24. Let Ω be the rectangle [0, a] × [0, b] in R^2. Find all possible product solutions u(x, y, t) = φ(t)ψ(x)ζ(y) satisfying

    u_t - \Delta u = 0    (x, y) ∈ Ω    t > 0
    u(x, y, t) = 0    (x, y) ∈ ∂Ω    t > 0

25.
Find a solution of the Dirichlet problem for u = u(x, y) in the unit disc Ω = {(x, y) : x2 + y 2 < 1}, ∆u = 1 (x, y) ∈ Ω u(x, y) = 0 (x, y) ∈ ∂Ω (Suggestion: look for a solution in the form u = u(r) and recall (2.3.67).) ex26 26. The problem ut = uxx 0<x<1 t<T u(0, t) = u(1, t) = 0 t>0 u(x, T ) = f (x) 0<x<1 (2.5.8) (2.5.9) (2.5.10) is sometimes called a final value problem for the heat equation. a) Show that this problem is ill-posed. b) Show that this problem is equivalent to (2.3.47),(2.3.48),(2.3.49) except with the heat equation (2.3.47) replaced by the backward heat equation ut = −uxx . 38 Chapter 3 Vector spaces We will be working frequently with function spaces which are themselves special cases of more abstract spaces. Most such spaces which are of interest to us have both linear structure and metric structure. This means that given any two elements of the space it is meaningful to speak of (i) a linear combination of the elements, and (ii) the distance between the two elements. These two kinds of concepts are abstracted in the definitions of vector space and metric space. 3.1 Axioms of a vector space chvec-1 Definition 3.1. A vector space is a set X such that whenever x, y ∈ X and λ is a scalar we have x + y ∈ X and λx ∈ X, and the following axioms hold. [V1] x + y = y + x for all x, y ∈ X [V2] (x + y) + z = x + (y + z) for all x, y, z ∈ X [V3] There exists an element 0 ∈ X such that x + 0 = x for all x ∈ X [V4] For every x ∈ X there exists an element −x ∈ X such that x + (−x) = 0 [V5] λ(x + y) = λx + λy for all x, y ∈ X and any scalar λ [V6] (λ + µ)x = λx + µx for any x ∈ X and any scalars λ, µ 39 [V7] λ(µx) = (λµ)x for any x ∈ X and any scalars λ, µ [V8] 1x = x for any x ∈ X Here the field of scalars my be either the real numbers R or the complex numbers C, and we may refer to X as a real or complex vector space accordingly, if a distinction needs to be made. By an obvious induction Pm argument, if x1 , . . . , xm ∈ X and λ1 , . . . , λm are scalars, then the linear combination j=1 λj xj is itself an element of X. Example 3.1. Ordinary N -dimensional Euclidean space RN := {x = (x1 , x2 . . . xN ) : xj ∈ R} is a real vector space with the usual operations of vector addition and scalar multiplication, (x1 , x2 . . . xN ) + (y1 , y2 , . . . yN ) = (x1 + y1 , x2 + y2 . . . xN + yN ) λ(x1 , x2 . . . xN ) = (λx1 , λx2 , . . . λxN ) λ ∈ R If we allow the components xj as well as the scalars λ to be complex, we obtain instead the complex vector space CN . Example 3.2. If E ⊂ RN , let C(E) = {f : E → R : f is continous at x for every x ∈ E} denote the set of real valued continuous functions on E. Clearly C(E) is a real vector space with the ordinary operations of function addition and scalar multiplication (λf )(x) = λf (x) λ ∈ R (f + g)(x) = f (x) + g(x) If we allow the range space in the definition of C(E) to be C then C(E) becomes a complex vector space. Spaces of differentiable functions likewise may be naturally regarded as vector spaces, for example C m (E) = {f : Dα f ∈ C(E), |α| ≤ m} and C ∞ (E) = {f : Dα f ∈ C(E), for all α} 2 40 Example 3.3. If 0 < p < ∞ and E is a measurable subset of RN , the space Lp (E) is defined to be the set of measurable functions f : E → R or f : E → C such that Z |f (x)|p dx < ∞ (3.1.1) E Here the integral is defined in the Lebesgue sense. Those unfamiliar with measure theory and Lebesgue integration should consult a standard textbook such as [30],[28], or see a brief summary in Appendix ( ). 
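As a minimal numerical illustration of the defining condition (3.1.1), membership in L^p can depend on p: the function f(x) = x^{−1/3} on E = (0, 1), an illustrative choice, lies in L^p(0, 1) exactly when p < 3. The following Python sketch (the grid size and the sample exponents are arbitrary choices) estimates the truncated integrals; they stabilize for p < 3 and grow without bound for p ≥ 3.

    import numpy as np

    # Estimate int_eps^1 |f(x)|^p dx for f(x) = x**(-1/3) by a midpoint sum.
    # For p < 3 the truncated integrals approach the exact value 1/(1 - p/3);
    # for p >= 3 they grow without bound as eps -> 0, so f is not in L^p.
    def truncated_integral(p, eps, n=200000):
        edges = np.linspace(eps, 1.0, n + 1)
        mid = 0.5 * (edges[:-1] + edges[1:])
        return float(np.sum(mid ** (-p / 3.0) * np.diff(edges)))

    for p in (2.0, 3.5):
        print(p, [round(truncated_integral(p, eps), 2)
                  for eps in (1e-2, 1e-4, 1e-6)])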
It may then be shown that Lp (E) is vector space for any 0 < p < ∞. To see this we use the known fact that if f, g are measurable then so are f + g and λf for any scalar λ, and the numerical inequality (a + b)p ≤ Cp (ap + bp ) for a, b ≥ 0, where Cp = max (2p−1 , 1) to prove that f + g ∈ Lp (E) whenever f, g ∈ Lp (E). Verification of the remaining axioms is routine. The related vector space L∞ (E) is defined as the set of measurable functions f for which ess supx∈E |f (x)| < ∞ (3.1.2) Here M = ess supx∈E |f (x)| if |f (x)| ≤ M a.e. and there is no smaller constant with this property. Definition 3.2. If X is a vector space, a subset M ⊂ X is a subspace of X if (i) x + y ∈ M whenever x, y ∈ M (ii) λx ∈ M whenever x ∈ M and λ is a scalar That is to say, a subspace is a subset of X which is closed under formation of linear combinations. Clearly a subspace of a vector space is itself a vector space. Example 3.4. The subset M = {x ∈ RN : xj = 0} is a subspace of RN for any fixed j. Example 3.5. If E ⊂ RN then C ∞ (E) is a subspace of C m (E) for any m, which in turn is a subspace of C(E). Example 3.6. If X is any vector space and S ⊂ X, then the set of all finite linear combinations of elements of S, L(S) := {v ∈ X : x = m X λj xj for some scalars λ1 , λ2 , . . . λm and elements x1 , . . . xm ∈ S} j=1 41 is a subspace of X. It is also called the span, or linear span of S, or the subspace generated by S. 2 Example 3.7. If in Example 5 we take X = C([a, b]) and fj (x) = xj−1 for j = 1, 2, . . . +1 then the subspace generated by {fj }N j=1 is PN , the vector space of polynomials of degree less than or equal to N . Likewise, the the subspace generated by {fj }∞ j=1 is P, the vector space of all polynomials. 2 3.2 Linear independence and bases Definition 3.3. We say that PmS ⊂ X is linearly independent if whenever x1 , . . . xm ∈ S, λ1 , . . . λm are scalars and j=1 λj xj = 0 then λ1 = λ2 = . . . λm = 0. Otherwise S is linearly dependent. Equivalently, S is linearly dependent if it is possible to express at least one of its elements as a linear combination of the remaining ones. In particular any set containing the zero element is linearly dependent. hamel Definition 3.4. We say that S ⊂ X is a basis of X if for any x ∈ there exists unique PX m scalars λ1 , λ2 , . . . λm and elements x1 , . . . , xm ∈ S such that x = j=1 λj xj . The following characterization of a basis is then immediate: Theorem 3.1. S ⊂ X is a basis of X if and only if S is linearly independent and L(S) = X. It is important to emphasize that in this definition of basis it is required that every x ∈ X be expressible as a finite linear combination of the basis elements. This notion of basis will be inadequate for later purposes, and will be replaced by one which allows infinite sums, but this cannot be done until a meaning of convergence is available. The notion of basis in Definition 3.4 is called a Hamel basis if a distinction is necessary. Definition 3.5. We say that dim X, the dimension of X, is m if there exist m linearly independent vectors in X but any collection of m + 1 elements of X is linearly dependent. If there exists m linearly independent vectors for any positive integer m, then we say dim X = ∞. prop31 Proposition 3.1. The elements {x1 , x2 , . . . xm } form a basis for L({x1 , x2 , . . . xm }) if and only if they are linearly independent. 42 prop32 Proposition 3.2. The dimension of X is the number of vectors in any basis of X. The proof of both of these Propositions is left for the exercises. Example 3.8. 
RN or CN has dimension N . We will denote by ej the standard unit vector with a one in the j’th position and zero elsewhere. Then {e1 , e2 , . . . eN } is the standard basis for either RN or CN . Example 3.9. In the vector space C([a, b]) the elements fj (t) = tj−1 are clearly linearly independent, so that the dimension is ∞, as is the dimension of the subspace P. Also evidently the subspace PN has dimension N + 1. Example 3.10. The set of solutions of the ordinary differential equation u00 + u = 0 is precisely the set of linear combinations u(t) = λ1 sin t + λ2 cos t. Since sin t, cos t are linearly independent functions, they form a basis for this two dimensional space. The following is interesting, although not of great practical significance. Its proof, which is not obvious in the infinite dimensional case, relies on the Axiom of Choice and will not be given here. Theorem 3.2. Every vector space has a basis. 3.3 Linear transformations of a vector space sec33 If X and Y are vector spaces, a mapping T : X 7−→ Y is called linear if T (λ1 x1 + λ2 x2 ) = λ1 T (x1 ) + λ2 T (x2 ) (3.3.1) for all x1 , x2 ∈ X and all scalars λ1 , λ2 . Such a linear transformation is uniquely determined on all of X by its action onPany basis of X, i.e. if S P = {xα }α∈A is a basis of X m and yα = T (xα ), then for any x = m λ x we have T x = j=1 λj yαj . j=1 j αj In the case that X and Y are both of finite dimension let us choose bases {x1 , x2 , . . . xm }, {y1 , y2 , . . . yn } of P X, Y respectively. For 1 ≤ j ≤ m there must exist unique scalars akj such that T xj = nk=1 akj yk and it follows that x= m X j=1 λ j xj ⇒ T x = n X µk y k where µk = m X j=1 k=1 43 akj λj (3.3.2) P For a given basis {x1 , x2 , . . . xm } of X, if x = m j=1 λj xj we say that λ1 , λ2 , . . . λm are the coordinates of x with respect to the given basis. The n × m matrix A = [akj ] thus maps the coordinates of x with respect to the basis {x1 , x2 , . . . xm } to the coordinates of T x with respect to the basis {y1 , y2 , . . . yn }, and thus encodes all information about the linear mapping T . If T : X 7−→ Y is linear, one-to-one and onto then we say T is an isomorphism between X to Y, and the vector spaces X and Y are isomorphic whenever there exists an isomorphism between them. If T is such an isomorphism, and S is a basis of X then it easy to check that the image set T (S) is a basis of Y. In particular, any two isomorphic vector spaces have the same finite dimension or are both infinite dimensional. For any linear mapping T : X → Y we define the kernel, or null space, of T as N (T ) = {x ∈ X : T x = 0} (3.3.3) R(T ) = {y ∈ Y : y = T x for some x ∈ X} (3.3.4) and the range of T as It is immediate that N (T ) and R(T ) are subspaces of X, Y respectively, and T is an isomorphism precisely if N (T ) = {0} and R(T ) = Y. If X = Y = RN or CN , we learn in linear algebra that these two conditions are equivalent. 3.4 Exercises 1. Using only the vector space axioms, show that the zero element in [V3] is unique. 2. Prove Propositions 3.1 and 3.2. 3. Show that the intersection of any family of subspaces of a vector space is also a subspace. What about the union of subspaces? 4. Show that Mm×n , the set of m × n matrices, with the usual definitions of addition and scalar multiplication, is a vector space of dimension mn. Show that the subset of symmetric matrices n × n matrices forms a subspace of Mn×n . What is its dimension? 5. Under what conditions on a measurable set E ⊂ RN and p ∈ (0, ∞] will it be true that C(E) is a subspace of Lp (E)? 
Under what conditions is Lp (E) a subset of Lq (E)? 44 6. Let uj (t) = tλj where λ1 , . . . λn are arbitrary unequal real numbers. Show that {u1 . . . uP n } are linearly independent functions on any interval (a, b) ⊂ R. (Suggestion: If nj=1 αj tλj = 0, divide by tλ1 and differentiate.) 7. A side condition for a differential equation is homogeneous if whenever two functions satisfy the side condition then so does any linear combination of the two functions. For example the Dirichlet type boundary condition u = 0 for x ∈ ∂Ω is P homogeneous. Now let Lu = |α|≤m aα (x)Dα u denote any linear differential operator. Show that the set of functions satisfying Lu = 0 and any homogeneous side conditions is a vector space. 8. Consider the differential equation u00 + u = 0 on the interval (0, π). What is the dimension of the vector space of solutions which satisfy the homogeneous boundary conditions a) u(0) = u(π), and b) u(0) = u(π) = 0. Repeat the question if the interval (0, π) is replaced by (0, 1) and (0, 2π). 9. Let Df = f 0 for any differentiable function f on R. For any N ≥ 0 show that D : PN → PN is linear and find its null space and range. 10. If X and Y are vector spaces, then the Cartesian product of X and Y, is defined as the set of ordered pairs X × Y = {(x, y) : x ∈ X, y ∈ Y} (3.4.1) Addition and scalar multiplication on X × Y are defined in the natural way, (x, y) + (x̂, ŷ) = (x + x̂, y + ŷ) λ(x, y) = (λx, λy) (3.4.2) a) Show that X × Y is a vector space. b) Show that R × R is isomorphic to R2 . 11. If X, Y are vector spaces of the same finite dimension, show X and Y are isomorphic. 12. Show that Lp (0, 1) and Lp (a, b) are isomorphic, for any a, b ∈ R and p ∈ (0, ∞]. 45 Chapter 4 Metric spaces chmetric 4.1 Axioms of a metric space A metric space is a set on which some natural notion of distance may be defined. Definition 4.1. A metric space is a pair (X, d) where X is a set and d is a real valued mapping on X × X, such that the following axioms hold. [M1] d(x, y) ≥ 0 for all x, y ∈ X [M2] d(x, y) = 0 if and only if x = y [M3] d(x, y) = d(y, x) for all x, y ∈ X [M4] d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X. Here d is the metric on X, i.e. d(x, y) is regarded as the distance from x to y. Axiom [M4] is known as the triangle inequality. Although strictly speaking the metric space is the pair (X, d) it is a common practice to refer to X itself as being the metric space, with the metric d understood from context. But as we will see in examples it is often possible to assign different metrics to the same set X. If (X, d) is a metric space and Y ⊂ X then it is clear that (Y, d) is also a metric space, and in this case we say that Y inherits the metric of X. 46 ex41 Example 4.1. If X = RN then there are many choices of d for which (RN , d) is a metric space. The most familiar is the ordinary Euclidean distance d(x, y) = N X ! 21 |xj − yj |2 (4.1.1) j=1 In general we may define N X dp (x, y) = ! p1 |xj − yj |p 1≤p<∞ (4.1.2) j=1 and d∞ (x, y) = max (|x1 − y1 |, |x2 − y2 |, . . . |xn − yn |) (4.1.3) The verification that (Rn , dp ) is a metric space for 1 ≤ p ≤ ∞ is left to the exercises – the triangle inequality is the only nontrivial step. The same family of metrics may be used with X = CN . CofE Example 4.2. To assign a metric to C(E) more specific assumptions must be made about E. 
If we assume, for example, that E is a closed and bounded1 subset of RN we may set d∞ (f, g) = max |f (x) − g(x)| (4.1.4) x∈E so that d(f, g) is always finite by virtue of the well known theorem that a continuous function achieves its maximum on a closed, bounded set. Other possibilities are p1 Z p |f (x) − g(x)| dx dp (f, g) = 1≤p<∞ (4.1.5) E Note the analogy with the definition of dp in the case of RN or CN . For more arbitrary sets E there is in general no natural metric for C(E). For example, if E is an open set, none of the metrics dp can be used since there is no reason why dp (f, g) should be finite for f, g ∈ C(E). As in the case of vector spaces, some spaces of differentiable functions may also be made into metric spaces. For this we will assume a bit more about E, namely that E is 1 I.e. E is compact in RN . Compactness is discussed in more detail below, and we avoid using the term until then. 47 CMetric the closure of a bounded open set O ⊂ RN , and in this case will say that Dα f ∈ C(E) if the function Dα f defined in the usual pointwise sense on O has a continuous extension to E. We then can define C m (E) = {f : Dα f ∈ C(E) whenever |α| ≤ m} (4.1.6) d(f, g) = max max |Dα (f − g)(x)| (4.1.7) with metric |α|≤m x∈E CmMetric which may be easily checked to satisfy [M1]-[M4]. We cannot define a metric on C ∞ (E) in the obvious way just by letting m → ∞ in the above definition, since there is no reason why the resulting maximum over m in (4.1.7) will be finite, even if f ∈ C m (E) for every m. See however Exercise 18. Example 4.3. Recall that if E is a measurable subset of RN , we have defined corresponding vector spaces Lp (E) for 0 < p ≤ ∞. To endow them with metric space structure let p1 Z p |f (x) − g(x)| dx dp (f, g) = (4.1.8) dpmet E for 1 ≤ p < ∞, and d∞ (f, g) = ess supx∈E |f (x) − g(x)| (4.1.9) The validity of axioms [M1] and [M3] is clear, and the triangle inequality [M4] is an immediate consequence of the Minkowski inequality (18.1.10). But axiom [M2] does not appear to be satisfied here, since for example, two functions f, g agreeing except at a single point, or more generally agreeing except on a set of measure zero, would have dp (f, g) = 0. It is necessary, therefore, to modify our point of view concerning Lp (E) as follows. We define an equivalence relation f ∼ g if f = g almost everywhere, i.e. except on a set of measure zero. If dp (f, g) = 0 we would be able to correctly conclude that f ∼ g, in which case we will regard f and g as being the same element of Lp (E). Thus strictly speaking, Lp (E) is the set of equivalence classes of measurable functions, where the equivalence classes are defined by means of the above equivalence relation. The distance dp ([f ], [g]) between two equivalence classes [f ] and [g] may be unambiguously determined by selecting a representative of each class and then evaluating the distance from (4.1.8) or (4.1.9). Likewise the vector space structure of Lp (E) is maintained since, for example, we can define the sum of equivalence classes [f ] + [g] by selecting a representative of each class and observing that if f1 ∼ f2 and g1 ∼ g2 then 48 dinfmet f1 +g1 ∼ f2 +g2 . It is rarely necessary to make a careful distinction between a measurable function and the equivalence class it belongs to, and whenever it can cause no confusion we will follow the common practice of referring to members of Lp (E) as functions rather than equivalence classes. The notation f may be used to stand for either a function or its equivalence class. 
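As a concrete illustration, the metrics d_p of (4.1.8) are easily approximated by sampling. The sketch below (the functions f(x) = x and g(x) = x^2 on E = [0, 1] are illustrative choices) computes d_1, d_2 and d_∞; note that altering f at finitely many sample points would leave the computed d_p essentially unchanged, in keeping with the identification of functions that agree almost everywhere.

    import numpy as np

    # Approximate d_p(f,g) for f(x) = x, g(x) = x^2 on E = [0,1].
    # Exact values: d_1 = 1/6, d_2 = 1/sqrt(30), d_inf = 1/4 (at x = 1/2).
    x = np.linspace(0.0, 1.0, 100001)
    diff = np.abs(x - x**2)

    def trapezoid(y, x):
        # trapezoidal rule, written out to avoid version-specific numpy names
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

    d1 = trapezoid(diff, x)
    d2 = trapezoid(diff**2, x) ** 0.5
    d_inf = float(diff.max())
    print(d1, d2, d_inf)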
An element f ∈ L^p(E) will be said to be continuous if its equivalence class contains a continuous function, and in this way we can naturally regard C(E) as a subset of L^p(E). Although L^p(E) is a vector space for 0 < p ≤ ∞, we cannot use the above definition of metric for 0 < p < 1, since it turns out the triangle inequality is not satisfied (see Exercise 6 of Chapter 5) except in degenerate cases.

4.2 Topological concepts

In a metric space various concepts of point set topology may be introduced.

Definition 4.2. If (X, d) is a metric space then
1. B(x, ε) = {y ∈ X : d(x, y) < ε} is the ball centered at x of radius ε.
2. A set E ⊂ X is bounded if there exists some x ∈ X and R < ∞ such that E ⊂ B(x, R).
3. If E ⊂ X, then a point x ∈ X is an interior point of E if there exists ε > 0 such that B(x, ε) ⊂ E.
4. If E ⊂ X, then a point x ∈ X is a limit point of E if for any ε > 0 there exists a point y ∈ B(x, ε) ∩ E, y ≠ x.
5. A subset E ⊂ X is open if every point of E is an interior point of E. By convention, the empty set is open.
6. A subset E ⊂ X is closed if every limit point of E is in E.
7. The closure Ē of a set E ⊂ X is the union of E and the limit points of E.
8. The interior E° of a set E is the set of all interior points of E.
9. A subset E is dense in X if Ē = X.
10. X is separable if it contains a countable dense subset.
11. If E ⊂ X, we say that x ∈ X is a boundary point of E if for any ε > 0 the ball B(x, ε) contains at least one point of E and at least one point of the complement E^c = {x ∈ X : x ∉ E}. The boundary of E is denoted ∂E.

The following Proposition states a number of elementary but important properties. Proofs are essentially the same as in the more familiar special case when the metric space is a subset of R^N, and will be left for the reader.

Proposition 4.1. Let (X, d) be a metric space. Then
1. B(x, ε) is open for any x ∈ X and ε > 0.
2. E ⊂ X is open if and only if its complement E^c is closed.
3. An arbitrary union or finite intersection of open sets is open.
4. An arbitrary intersection or finite union of closed sets is closed.
5. If E ⊂ X then E° is the union of all open sets contained in E, E° is open, and E is open if and only if E = E°.
6. Ē is the intersection of all closed sets containing E, Ē is closed, and E is closed if and only if E = Ē.
7. If E ⊂ X then ∂E = Ē \ E° = Ē ∩ cl(E^c), where cl denotes closure.

Next we study infinite sequences in X.

Definition 4.3. We say that a sequence {x_n}_{n=1}^∞ in X is convergent to x, that is, lim_{n→∞} x_n = x, if for any ε > 0 there exists n0 < ∞ such that d(x_n, x) < ε whenever n ≥ n0.

Example 4.4. If X = R^N or C^N, and d is any one of the metrics d_p, then x_n → x if and only if each component sequence converges to the corresponding limit, i.e. x_{j,n} → x_j as n → ∞ in the ordinary sense of convergence in R or C. (Here x_{j,n} is the j'th component of x_n.)

Example 4.5. In the metric space (C(E), d_∞) of Example 4.2, lim_{n→∞} f_n = f is equivalent to the definition of uniform convergence on E.

Definition 4.4. We say that a sequence {x_n}_{n=1}^∞ in X is a Cauchy sequence if for any ε > 0 there exists n0 < ∞ such that d(x_n, x_m) < ε whenever n, m ≥ n0.

It is easy to see that a convergent sequence is always a Cauchy sequence, but the converse may be false.

Definition 4.5. A metric space X is said to be complete if every Cauchy sequence in X is convergent in X.

Example 4.6. Completeness is one of the fundamental properties of the real numbers R, see for example Chapter 1 of [29].
If a sequence {x_n}_{n=1}^∞ in R^N is Cauchy with respect to any of the metrics d_p, then each component sequence {x_{j,n}}_{n=1}^∞ is a Cauchy sequence in R, hence convergent in R. It then follows immediately that {x_n}_{n=1}^∞ is convergent in R^N, again with any of the metrics d_p. The same conclusion holds for C^N, so that R^N, C^N are complete metric spaces. These spaces are also separable since the subset consisting of points with rational coordinates is countable and dense. A standard example of an incomplete metric space is the set of rational numbers with the metric inherited from R. Most metric spaces used in this book, and indeed most metric spaces used in applied mathematics, are complete.

Proposition 4.2. If E ⊂ R^N is closed and bounded, then the metric space C(E) with metric d = d_∞ is complete.

Proof: Let {f_n}_{n=1}^∞ be a Cauchy sequence in C(E). If ε > 0 we may then find n0 such that

    max_{x∈E} |f_n(x) − f_m(x)| < ε   (4.2.1)

whenever n, m ≥ n0. In particular the sequence of numbers {f_n(x)}_{n=1}^∞ is Cauchy in R or C for each fixed x ∈ E, so we may define f(x) := lim_{n→∞} f_n(x). Letting m → ∞ in (4.2.1) we obtain

    |f_n(x) − f(x)| ≤ ε   n ≥ n0   x ∈ E   (4.2.2)

which means d(f_n, f) ≤ ε for n ≥ n0. It remains to check that f ∈ C(E). If we pick x ∈ E, then since f_{n0} ∈ C(E) there exists δ > 0 such that |f_{n0}(x) − f_{n0}(y)| < ε if |y − x| < δ. Thus for |y − x| < δ we have

    |f(x) − f(y)| ≤ |f(x) − f_{n0}(x)| + |f_{n0}(x) − f_{n0}(y)| + |f_{n0}(y) − f(y)| < 3ε   (4.2.3)

Since ε is arbitrary, f is continuous at x, and since x is arbitrary, f ∈ C(E). Thus we have concluded that the Cauchy sequence {f_n}_{n=1}^∞ is convergent in C(E) to f ∈ C(E), as needed. 2

The final part of the above proof should be recognized as the standard proof of the familiar fact that a uniform limit of continuous functions is continuous. The spaces C^m(E) can likewise be shown, again assuming that E is closed and bounded, to be complete metric spaces with the metric defined in (4.1.7), see Exercise 19.

If we were to choose the metric d_1 on C(E) then the resulting metric space is not complete. Choose for example E = [−1, 1] and f_n(x) = x^{1/(2n+1)}, so that the pointwise limit of f_n(x) is

    f(x) = 1 for x > 0   f(x) = −1 for x < 0   f(0) = 0   (4.2.4)

By a simple calculation

    ∫_{−1}^{1} |f_n(x) − f(x)| dx = 1/(n + 1)   (4.2.5)

so that {f_n}_{n=1}^∞ must be Cauchy in C(E) with metric d_1. On the other hand {f_n}_{n=1}^∞ cannot be convergent in this space, since the only possible limit is f, which does not belong to C(E).

The same example can be modified to show that C(E) is not complete with any of the metrics d_p for 1 ≤ p < ∞, and so d_∞ is in some sense the 'natural' metric. For this reason C(E) will always be assumed to be supplied with the metric d_∞ unless otherwise stated.

We next summarize in the form of a theorem some especially important facts about the metric spaces L^p(E), which may be found in any standard textbook on Lebesgue integration, for example Chapter 3 of [30] or Chapter 8 of [38].

Theorem 4.1. If E ⊂ R^N is measurable, then
1. L^p(E) is complete for 1 ≤ p ≤ ∞.
2. L^p(E) is separable for 1 ≤ p < ∞.
3. If C_c(E) is the set of continuous functions of bounded support, i.e.

    C_c(E) = {f ∈ C(E) : there exists R < ∞ such that f(x) ≡ 0 for |x| > R}   (4.2.6)

then C_c(E) is dense in L^p(E) for 1 ≤ p < ∞.

The completeness property is a significant result in measure theory, often known as the Riesz-Fischer Theorem.

4.3 Functions on metric spaces and continuity

Next, suppose X, Y are two metric spaces with metrics d_X, d_Y respectively.

Definition 4.6.
Let T : X → Y be a mapping. 1. We say T is continuous at a point x ∈ X if for any > 0 there exists δ > 0 such that dY (T (x), T (x̂)) ≤ whenever dX (x, x̂) ≤ δ. 2. T is continuous on X if it is continuous at each point of X. 3. T is uniformly continuous on X if for any > 0 there exists δ > 0 such that dY (T (x), T (x̂)) ≤ whenever dX (x, x̂) ≤ δ, x, x̂ ∈ X. 4. T is Lipschitz continuous on X if there exists L such that dY (T (x), T (x̂)) ≤ LdX (x, x̂) x, x̂ ∈ X (4.3.1) The infimum of all L’s which work in this definition is called the Lipschitz constant of T . Clearly we have the implications that T Lipschitz continuous implies T is uniformly continuous, which in turn implies that T is continuous. T is one-to-one, or injective, if T (x1 ) = T (x2 ) only if x1 = x2 , and onto, or surjective, if for every y ∈ Y there exists some x ∈ X such that T (x) = y. If T is both one-to-one and onto then we say it is bijective, and in this case there must exist an inverse mapping T −1 : Y → X. For any mapping T : X → Y we define, for E ⊂ X and F ⊂ Y T (E) = {y ∈ Y : y = T (x) for some x ∈ E} (4.3.2) the image of E in Y, and T −1 (E) = {x ∈ X : T (x) ∈ E} (4.3.3) the preimage of F in X. Note that T is not required to be bijective in order that the preimage be defined. The following theorem states two useful characterizations of continuity. Condition b) is referred to as the sequential definition of continuity, for obvious reasons, while c) is the topological definition, since it may be used to define continuity in much more general topological spaces. 53 Theorem 4.2. Let X, Y be metric spaces and T : X → Y. Then the following are equivalent: a) T is continuous on X. b) If xn ∈ X and xn → x, then T (xn ) → T (x). c) If E is open in Y then T −1 (E) is open in X. Proof: Assume T is continuous on X and let xn → x in X. If > 0 then there exists δ > 0 such that dY (T (x̂), T (x)) < if dX (x̂, x) < δ. Choosing n0 sufficiently large that dX (xn , x) < δ for n ≥ n0 we then must have dY (T (xn ), T (x)) < for n ≥ n0 , so that T (xn ) → T (x). Thus a) implies b). To see that b) implies c), suppose condition b) holds, E is open in Y and x ∈ T −1 (E). We must show that there exists δ > 0 such that x̂ ∈ T −1 (E) whenever dX (x̂, x) < δ. If not then there exists a sequence xn → x such that xn 6∈ T −1 (E), and by b), T (xn ) → T (x). Since y = T (x) ∈ E and E is open, there exists > 0 such that z ∈ E if dY (z, y) < . Thus T (xn ) ∈ E for sufficiently large n, i.e. xn ∈ T −1 (E), a contradiction. Finally, suppose c) holds and fix x ∈ X. If > 0 then corresponding to the open set E = B(T (x), ) in Y there exists a ball B(x, δ) in X such that B(x, δ) ⊂ T −1 (E). But this means precisely that if dX (x̂, x) < δ then dY (T (x̂), T (x)) < , so that T is continuous at x. 2 4.4 Compactness and optimization Another important topological concept is that of compactness. Definition 4.7. If E ⊂ X then a collection of open sets {Gα }α∈A is an open cover of E if E ⊂ ∪α∈A Gα . Here A is the index set and may be finite, countably or uncountably infinite. Definition 4.8. K ⊂ X is compact if any open cover of K has a finite subcover. More explicitly, K is compact if whenever K ⊂ ∪α∈A Gα , where each Gα is open, there exists a finite number of indices α1 , α2 , . . . αm ∈ A such that K ⊂ ∪m j=1 Gαj . In addition, E ⊂ X is precompact (or relatively compact) if E is compact. 54 compact1 Proposition 4.3. A compact set is closed and bounded. A closed subset of a compact set is compact. Proof: Suppose that K is compact and pick x ∈ K c . 
For any r > 0 let Gr = {y ∈ X : d(x, y) > r}. It is easy to see that each Gr is open and K ⊂ ∪r>0 Gr . Thus there c exists r1 , r2 , . . . rm such that K ⊂ ∪m j=1 Grj and so B(x, r) ⊂ K if r < min {r1 , r2 , . . . rm }. Thus K c is open and so K is closed. Obviously ∪r>0 B(x, r) is an open cover of K for any fixed x ∈ X. If K is compact then there must exist r1 , r2 , . . . rm such that K ⊂ ∪m j=1 B(x, rj ) and so K ⊂ B(x, R) where R = max {r1 , r2 , . . . rm }. Thus K is bounded. Now suppose that F ⊂ K where F is closed and K is compact. If {Gα }α∈A is an open cover of F then these sets together with the open set F c are an open cover of K. c Hence there exists α1 , α2 , . . . αm such that K ⊂ (∪m j=1 Gαj ) ∪ F , from which we conclude that F ⊂ ∪m j=1 Gαj . 2 There will be frequent occasions for wanting to know if a certain set is compact, but it is rare to use the above definition directly. A useful equivalent condition is that of sequential compactness. Definition 4.9. A set K ⊂ X is sequentially compact if any infinite sequence in E has a subsequence convergent to a point of K. Proposition 4.4. A set K ⊂ X is compact if and only if it is sequentially compact. We will not prove this result here, but instead refer to Theorem 16, Section 9.5 of [28] for details. It follows immediately that if E ⊂ X is precompact then any infinite sequence in X has a convergent subsequence (the point being that the limit need not belong to E). We point out that the concepts of compactness and sequential compactness are applicable in spaces even more general than metric spaces, and are not always equivalent in such situations. In the case that X = RN or CN we have an even more explicit characterization of compactness, the well known Heine-Borel Theorem, for which we refer to [29] for a proof. thhb Theorem 4.3. E ⊂ RN or E ⊂ CN is compact if and only if it is closed and bounded. While we know from Proposition 4.3 that a compact set is always closed and bounded, 55 the converse implication is definitely false in most function spaces we will be interested in. In later chapters a great deal of attention will be paid to optimization problems in function spaces, that is, problems in the Calculus of Variations. A simple result along these lines that we can prove already is: th43 Theorem 4.4. Let X be a compact metric space and f : X → R be continuous. Then there exists x0 ∈ X such that f (x0 ) = max f (x) (4.4.1) x∈X Proof: Let M = supx∈X f (x) (which may be +∞). so there exists a sequence {xn }∞ n=1 such that limn→∞ f (xn ) = M . By sequential compactness there is a subsequence {xnk } and x0 ∈ X such that limk→∞ xnk = x0 and since f is continuous on X we must have f (x0 ) = limk→∞ f (xnk ) = M . Thus M < ∞ and 4.4.1 holds. 2 A common notation expressing the same conclusion as 4.4.1 is x0 ∈ argmax(f (x)) 2 (4.4.2) which is also useful in making the distinction between the maximum value of a function and the point(s) at which the maximum is achieved. We emphasize here the distinction between maximum and supremum, which is an essential point in later discussion of optimization. If E ⊂ R then M = sup E if • x ≤ M for all x ∈ E • if M 0 < M there exists x ∈ E such that x > M 0 Such a number M exists for any E ⊂ R if we allow the value M = +∞; by convention M = −∞ if E is the empty set. On the other hand M = max E if • x≤M ∈E 2 Even though argmax(f (x)) is in general a set of points, i.e. all points where f achieves its maximum value, one will often see this written as x0 = argmax(f (x)). 
Naturally we use the corresponding notation argmin for points where the minimum of f is achieved. Both conditions in the definition of M = max E must hold, namely x ≤ M for all x ∈ E and M ∈ E, in which case evidently the maximum is finite and equal to the supremum.

If f : X → C is continuous on a compact metric space X, then we can apply Theorem 4.4 with f replaced by |f|, to obtain that there exists x0 ∈ X such that |f(x)| ≤ |f(x0)| for all x ∈ X. We can then also conclude, as in Example 4.2 and Proposition 4.2:

Proposition 4.5. If X is a compact metric space, then

    C(X) = {f : X → C : f is continuous at x for every x ∈ X}   (4.4.3)

is a complete metric space with metric d(f, g) = max_{x∈X} |f(x) − g(x)|.

In general C(X), or even a bounded set in C(X), is not itself precompact. A useful criterion for precompactness of a set of functions in C(X) is given by the Arzela-Ascoli theorem, which we review here; see e.g. [29] for a proof.

Definition 4.10. We say a family of real or complex valued functions F defined on a metric space X is uniformly bounded if there exists a constant M such that

    |f(x)| ≤ M whenever x ∈ X, f ∈ F   (4.4.4)

and equicontinuous if for every ε > 0 there exists δ > 0 such that

    |f(x) − f(y)| < ε whenever x, y ∈ X, d(x, y) < δ, f ∈ F   (4.4.5)

We then have

Theorem 4.5. (Arzela-Ascoli) If X is a compact metric space and F ⊂ C(X) is uniformly bounded and equicontinuous, then F is precompact in C(X).

Example 4.7. Let

    F = {f ∈ C([0, 1]) : |f'(x)| ≤ M for all x ∈ (0, 1), f(0) = 0}   (4.4.6)

for some fixed M. Then for f ∈ F we have

    f(x) = ∫_0^x f'(s) ds   (4.4.7)

implying in particular that |f(x)| ≤ ∫_0^x M ds ≤ M. Also

    |f(x) − f(y)| = |∫_x^y f'(s) ds| ≤ M|x − y|   (4.4.8)

so that for any ε > 0, δ = ε/M works in the definition of equicontinuity. Thus by the Arzela-Ascoli theorem F is precompact in C([0, 1]).

If X is a compact subset of R^N then, since uniform convergence implies L^p convergence, it follows that any set which is precompact in C(X) is also precompact in L^p(X). But more refined, i.e. less restrictive, criteria for precompactness in L^p spaces are also known; see e.g. [5], Section 4.5.

4.5 Contraction mapping theorem

One of the most important theorems about metric spaces, frequently used in applied mathematics, is the Contraction Mapping Theorem, which concerns fixed points of a mapping of X into itself.

Definition 4.11. A mapping T : X → X is a contraction on X if it is Lipschitz continuous with Lipschitz constant ρ < 1, that is, there exists ρ ∈ [0, 1) such that

    d(T(x), T(x̂)) ≤ ρ d(x, x̂) for all x, x̂ ∈ X   (4.5.1)

If ρ = 1 is allowed, we say T is nonexpansive.

Theorem 4.6. If T is a contraction on a complete metric space X then there exists a unique x ∈ X such that T(x) = x.

Proof: The uniqueness assertion is immediate, namely if T(x1) = x1 and T(x2) = x2 then d(x1, x2) = d(T(x1), T(x2)) ≤ ρ d(x1, x2). Since ρ < 1 we must have d(x1, x2) = 0 so that x1 = x2. To prove the existence of x, fix any point x1 ∈ X and define

    x_{n+1} = T(x_n)   (4.5.2)

for n = 1, 2, . . . . We first show that {x_n}_{n=1}^∞ must be a Cauchy sequence.
Note that

    d(x3, x2) = d(T(x2), T(x1)) ≤ ρ d(x2, x1)   (4.5.3)

and by induction

    d(x_{n+1}, x_n) = d(T(x_n), T(x_{n−1})) ≤ ρ^{n−1} d(x2, x1)   (4.5.4)

Thus by the triangle inequality and the usual summation formula for a geometric series, if m > n > 1,

    d(x_m, x_n) ≤ Σ_{j=n}^{m−1} d(x_{j+1}, x_j) ≤ Σ_{j=n}^{m−1} ρ^{j−1} d(x2, x1)   (4.5.5)
    = (ρ^{n−1}(1 − ρ^{m−n})/(1 − ρ)) d(x2, x1) ≤ (ρ^{n−1}/(1 − ρ)) d(x2, x1)   (4.5.6)

It follows immediately that {x_n}_{n=1}^∞ is a Cauchy sequence, and since X is complete there exists x ∈ X such that lim_{n→∞} x_n = x. Since T is continuous, T(x_n) → T(x) as n → ∞, and so x = T(x) must hold. 2

The point x in the Contraction Mapping Theorem which satisfies T(x) = x is called a fixed point of T, and the process (4.5.2) of generating the sequence {x_n}_{n=1}^∞ is called fixed point iteration. Not only does the theorem show that T possesses a unique fixed point under the stated hypotheses, but the proof shows that the fixed point may be obtained by fixed point iteration starting from an arbitrary point of X.

As a simple application of the theorem, consider a second kind integral equation

    u(x) + ∫_Ω K(x, y)u(y) dy = f(x)   (4.5.7)

with Ω ⊂ R^N a bounded open set, a kernel function K = K(x, y) defined and continuous for (x, y) ∈ Ω̄ × Ω̄, and f ∈ C(Ω̄). We can then define a mapping T on X = C(Ω̄) by

    T(u)(x) = −∫_Ω K(x, y)u(y) dy + f(x)   (4.5.8)

so that (4.5.7) is equivalent to the fixed point problem u = T(u) in X. Since K is uniformly continuous on Ω̄ × Ω̄ it is immediate that Tu ∈ X whenever u ∈ X, and by elementary estimates we have

    d(T(u), T(v)) = max_{x∈Ω̄} |T(u)(x) − T(v)(x)| = max_{x∈Ω̄} |∫_Ω K(x, y)(u(y) − v(y)) dy| ≤ L d(u, v)   (4.5.9)

where L := max_{x∈Ω̄} ∫_Ω |K(x, y)| dy. We therefore may conclude from the Contraction Mapping Theorem the following:

Proposition 4.6. If

    max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < 1   (4.5.10)

then (4.5.7) has a unique solution for every f ∈ C(Ω̄).

The condition (4.5.10) will be satisfied if either the maximum of |K| is small enough or the size of the domain Ω is small enough. Eventually we will see that some such smallness condition is necessary for unique solvability of (4.5.7), but the exact conditions will be sharpened considerably. If we consider instead the family of second kind integral equations

    λu(x) + ∫_Ω K(x, y)u(y) dy = f(x)   (4.5.11)

with the same conditions on K and f, then the above argument shows unique solvability for all sufficiently large λ, namely provided

    max_{x∈Ω̄} ∫_Ω |K(x, y)| dy < |λ|   (4.5.12)

As a second example, consider the initial value problem for a first order ODE

    du/dt = f(t, u)   u(t0) = u0   (4.5.13)

where we assume at least that f is continuous on [a, b] × R with t0 ∈ (a, b). If a classical solution u exists, then integrating both sides of the ODE from t0 to t, and taking account of the initial condition, we obtain

    u(t) = u0 + ∫_{t0}^t f(s, u(s)) ds   (4.5.14)

Conversely, if u ∈ C([a, b]) and satisfies (4.5.14) then necessarily u' exists, is also continuous, and (4.5.13) holds. Thus the problem of solving (4.5.13) is seen to be equivalent to that of finding a continuous solution of (4.5.14).
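Before continuing, note that the fixed point iteration (4.5.2) underlying Proposition 4.6 is easy to carry out numerically. The following is a minimal Python sketch: the kernel K(x, y) = xy/2 and data f ≡ 1 on Ω = (0, 1) are illustrative choices for which (4.5.10) holds (here max_x ∫_0^1 |K| dy = 1/4), and the integral is replaced by a midpoint Riemann sum.

    import numpy as np

    # Fixed point iteration for u(x) + int_0^1 K(x,y) u(y) dy = 1,
    # with the illustrative kernel K(x,y) = x*y/2; any K satisfying
    # (4.5.10) would do. Integrals are approximated by a midpoint sum.
    n = 500
    x = (np.arange(n) + 0.5) / n        # midpoint grid on (0,1)
    h = 1.0 / n
    K = 0.5 * np.outer(x, x)            # K(x,y) = x*y/2
    f = np.ones(n)

    u = f.copy()                        # start the iteration at u = f
    for k in range(50):
        u_new = f - h * (K @ u)         # u_{n+1} = T(u_n), see (4.5.8)
        if np.max(np.abs(u_new - u)) < 1e-12:
            break
        u = u_new

    # For this separable kernel the exact solution is found by hand:
    # u(x) = 1 - c*x/2 with c = int_0^1 y*u(y) dy, which gives c = 3/7.
    # The discrepancy below is just the O(h^2) quadrature error.
    print(np.max(np.abs(u - (1 - (3/7) * x / 2))))

We now return to the problem of finding a continuous solution of (4.5.14).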
In turn this can be viewed as the problem of finding a fixed point of the nonlinear mapping T : C([a, b]) → C([a, b]) defined by Z t T (u)(t) = u0 + f (s, u(s)) ds (4.5.15) t0 Now if we assume that f satisfies the Lipschitz condition with respect to u, |f (t, u) − f (t, v)| ≤ L|u − v| 60 u, v ∈ R t ∈ [a, b] (4.5.16) odeivp1 odeie then Z t |u(s) − v(s)| ds ≤ L|b − a| max |u(t) − v(t)| |T (u)(t) − T (v)(t)| ≤ L a≤t≤b t0 (4.5.17) or d(T (u), T (v)) ≤ L|b − a|d(u, v) (4.5.18) where d is again the usual metric on C([a, b]). Thus the contraction mapping provides a unique local solution, that is, on any interval [a, b] containing t0 for which (b − a) < 1/L. Instead of the requirement that the Lipschitz condition (4.5.18) be valid on the entire infinite strip [a, b]×R, it is actually only necessary to assume it holds on [a, b]×[c, d] where u0 ∈ (c, d). Also, first order systems of ODEs (and thus scalar higher order equations) can be handled in essentially the same manner. Such generalizations may be found in standard ODE textbooks, e.g. Chapter 1 of [CL] or Chapter 3 of [BN]. We conclude with a useful variant of the contraction mapping theorem. If T : X → X then we can define the (composition) powers of T by T 2 (x) = T (T (x)), T 3 (x) = T (T 2 (x)) etc. Thus T n : X → X for n = 1, 2, 3, . . . . Theorem 4.7. If there exists a positive integer n such that T n is a contraction on a complete metric space X then there exists a unique x ∈ X such that T (x) = x. Proof: By Theorem 4.6 there exists a unique x ∈ X such that T n (x) = x. Applying T to both sides gives T n (T (x)) = T n+1 (x) = T (x) so that T (x) is also a fixed point of T n . By uniqueness, T (x) = x, i.e. T has at least one fixed point. To see that the fixed point of T is unique, observe that any fixed point of T is also a fixed point of T 2 , T 3 , . . . . In particular, if T has two distinct fixed points then so does T n , which is a contradiction. 2 4.6 Exercises 1. Verify that dp defined in Example 4.1 is a metric on RN or CN . (Suggestion: to prove the triangle inequality, use the finite dimensional version of the Minkowski inequality (18.1.15)). 2. If (X, dX ), (Y, dY ) are metric spaces, show that the Cartesian product Z = X × Y = {(x, y) : x ∈ X, y ∈ Y } 61 odelip is a metric space with distance function d((x1 , y1 ), (x2 , y2 )) = dX (x1 , x2 ) + dY (y1 , y2 ) p 3. Is d(x, y) = |x − y|2 a metric on R? What about d(x, y) = |x − y|? Find reasonable conditions on a function φ : [0, ∞) → [0, ∞) such that d(x, y) = φ(|x−y|) is a metric on R. 4. Prove that a closed subset of a compact set in a metric space is also compact. 5. Let (X, d) be a metric space, A ⊂ X be nonempty and define the distance from a point x to the set A to be d(x, A) = inf d(x, y) y∈A a) Show that |d(x, A) − d(y, A)| ≤ d(x, y) for x, y ∈ X (i.e. x → d(x, A) is nonexpansive). b) Assume A is closed. Show that d(x, A) = 0 if and only if x ∈ A. c) Assume A is compact. Show that for any x ∈ X there exists z ∈ A such that d(x, A) = d(x, z). 6. Suppose that F is closed and G is open in a metric space (X, d) and F ⊂ G. Show that there exists a continuous function f : X → R such that i) 0 ≤ f (x) ≤ 1 for all x ∈ X. ii) f (x) = 1 for x ∈ F . iii) f (x) = 0 for x ∈ Gc . Hint: Consider f (x) = d(x, Gc ) d(x, Gc ) + d(x, F ) 7. 
Two metrics d, dˆ on a set X are said to be equivalent if there exist constants 0 < C < C ∗ < ∞ such that C≤ d(x, y) ≤ C∗ ˆ y) d(x, ∀x, y ∈ X a) If d, dˆ are equivalent, show that a sequence {xk }∞ k=1 is convergent in (X, d) if and ˆ only if it is convergent in (X, d). b) Show that any two of the metrics dp on Rn are equivalent. 62 ex4-8 8. Prove that C([a, b]) is separable (you may quote the Weierstrass approximation theorem) but L∞ (a, b) is not separable. 9. If X, Y are metric spaces, f : X → Y is continuous and K is compact in X, show that the image f (K) is compact in Y . 10. Let Z F = {f ∈ C([0, 1]) : |f (x) − f (y)| ≤ |x − y| for all x, y, 1 f (x) dx = 0} 0 Show that F is compact in C([0, 1]). (Suggestion: to prove that F is uniformly bounded, justify and use the fact that if f ∈ F then f (x) = 0 for some x ∈ [0, 1].) 11. Show that the set F in Example 4.7 is not closed. 12. From the proof of the contraction mapping it is clear that the smaller ρ is, the faster the sequence xn converges to the fixed point x. With this in mind, explain why Newton’s method f (xn ) xn+1 = xn − 0 f (xn ) is in general a very rapidly convergent method for approximating roots of f : R → R, as long as the initial guess is close enough. 13. Let fn (x) = sinn x for n = 1, 2, . . . . a) Is the sequence {fn }∞ n=1 convergent in C([0, π])? b) Is the sequence convergent in L2 (0, π)? c) Is the sequence compact or precompact in either of these spaces? 14. Let X be a complete metric space and T : X → X satisfy d(T (x), T (y)) < d(x, y) for all x, y ∈ X, x 6= y. Show that T can have at most one fixed point, but √ may have none. (Suggestion: for an example of non-existence look at T (x) = x2 + 1 on R.) 15. Let S denote the linear Volterra type integral operator Z x Su(x) = K(x, y)u(y) dy a where the kernel K is continuous and satisfies |K(x, y)| ≤ M for a ≤ y ≤ x. 63 a) Show that |S n u(x)| ≤ M n (x − a)n max |u(y)| x > a n = 1, 2, . . . a≤y≤x n! b) Deduce from this that for any b > a, there exists an integer n such that S n is a contraction on C([a, b]). c) Show that for any f ∈ C([a, b]) the second kind Volterra integral equation Z x K(x, y)u(y) dy = u(x) + f (x) a < x < b a has a unique solution u ∈ C([a, b]). 16. Show that for sufficiently small |λ| there exists a unique solution of the boundary value problem u00 + λu = f (x) 0 < x < 1 u(0) = u(1) = 0 for any f ∈ C([0, 1]). (Suggestion: use the result of Chapter 2, Exercise 7 to transform the boundary value problem into a fixed point problem for an integral operator, then apply the Contraction Mapping Theorem.) Be as precise as you can about which values of λ are allowed. 17. Let f = f (x, y) be continuously differentiable on [0, 1] × R and satisfy 0<m≤ ∂f (x, y) ≤ M ∂y Show that there exists a unique continuous function φ(x) such that f (x, φ(x)) = 0 0 < x < 1 (Suggestion: Define the transformation (T φ)(x) = φ(x) − λf (x, φ(x)) and show that T is a contraction on C([0, 1]) for some choice of λ. This is a special case of the implicit function theorem.) 64 ex4-18 18. Show that if we let ∞ X 2−n en d(f, g) = 1 + en n=0 where en = max |f (n) (x) − g (n) (x)| x∈[a,b] (n) then (C ∞ ([a, b]), d) is a metric space, in which fk → f if and only if fk uniformly on [a, b] for n = 0, 1, . . . . ex4-19 → f (n) 19. If E ⊂ RN is closed and bounded, show that C 1 (E) is a complete metric space with the metric defined by (4.1.7). 65 Chapter 5 Normed linear spaces and Banach spaces chbanach 5.1 Axioms of a normed linear space Definition 5.1. 
A vector space X is said to be a normed linear space if for every x ∈ X there is defined a nonnegative real number ||x||, the norm of x, such that the following axioms hold. [N1] ||x|| = 0 if and only if x = 0 [N2] ||λx|| = |λ|||x|| for any x ∈ X and any scalar λ. [N3] ||x + y|| ≤ ||x|| + ||y|| for any x, y ∈ X. As in the case of a metric space it is technically the pair (X, || · ||) which constitute a normed linear space, but the definition of the norm will usually be clear from the context. If two different normed spaces are needed we will use a notation such as ||x||X to indicate the space in which the norm is calculated. Example 5.1. In the vector space X = RM or CN we can define the family of norms ! p1 n X ||x||p = |xj |p 1≤p<∞ j=1 66 ||x||∞ = max |xj | 1≤j≤n (5.1.1) Axioms [N1] and [N2] are obvious, while axiom [N3] amounts to the Minkowski inequality (18.1.15). We obviously have dp (x, y) = ||x − y||p here, and this correspondence between norm and metric is a special case of the following general fact that a norm always gives rise to a metric, and whose proof is immediate from the definitions involved. prop51 Proposition 5.1. Let (X, || · ||) be a normed linear space. If we set d(x, y) = ||x − y|| for x, y ∈ X then (X, d) is a metric space. Example 5.2. If E ⊂ RN is closed and bounded then it is easy to verify that ||f || = max |f (x)| x∈E (5.1.2) defines a norm on C(E), and the usual metric (4.1.4) on C(E) amounts to d(f, g) = ||f − g||. Likewise, the metrics (4.1.8),(4.1.9) on Lp (E) may be viewed as coming from the corresponding Lp norms, ( R p1 p |f (x)| dx 1≤p<∞ E ||f ||Lp (E) = (5.1.3) ess supx∈E |f (x)| p = ∞ Note that for such a metric we must have d(λx, λy) = |λ|d(x, y) so that if this property does not hold, the metric cannot arise from a norm in this way. For example, d(x, y) = |x − y| 1 + |x − y| (5.1.4) is a metric on R which does not come from a norm. Since any normed linear space may now be regarded as metric space, all of the topological concepts defined for a metric space are meaningful in a normed linear space. Completeness holds in many situations of interest, so we have a special designation in that case. Definition 5.2. A Banach space is a complete normed linear space. Example 5.3. The spaces RN , CN are vector spaces which are also complete metric spaces with any of the norms || · ||p , hence they are Banach spaces. Similarly C(E), Lp (E) are Banach spaces with norms indicated above. Here are a few simple results we can prove already. 67 prop52 Proposition 5.2. If X is a normed linear space the the norm is a continuous function on X. If E ⊂ X is compact and y ∈ X then there exists x0 ∈ E such that ||y − x0 || = min ||y − x|| x∈E (5.1.5) Proof: From the triangle inequality we get |(||x1 || − ||x2 ||)| ≤ ||x1 − x2 || so that f (x) = ||x|| is Lipschitz continuous (with Lipschitz constant 1) on X. Similarly f (x) = ||x − y|| is also continuous for any fixed y, so we may apply Theorem 4.4 with X replaced by the compact metric space E and f (x) = −||x − y|| to get the conclusion (ii). 2 Another topological point of interest is the following. th51 Theorem 5.1. If M is a subspace of a normed linear space X, and dim M < ∞ then M is closed. Proof: The proof is by induction on the number of dimensions. Let dim(M ) = 1 so that M = {u = λe : λ ∈ C} for some e ∈ X, ||e|| = 1. If un ∈ M then un = λn e for some λn ∈ C and un → u in X implies, since ||un − um || = |λn − λm |, that {λn } is a Cauchy sequence in C. 
Thus there exist λ ∈ C such that λn → λ so that un → u = λe ∈ M , as needed. Now suppose we know that all N dimensional subspaces are closed and dim M = N + 1, so we can find e1 , . . . , eN +1 linearly independent unit vectors such that M = L(e1 , . . . , eN +1 ). Let M̃ = L(e1 , . . . , eN ) which is closed by the induction assumption. If un ∈ M there exists λn ∈ C and vn ∈ M̃ such that un = vn + λn eN +1 . Suppose that un → u in X. We claim first that {λn } is bounded in C. If not, there must exist λnk such that |λnk | → ∞, and since un remains bounded in X we get unk /λnk → 0. It follows that eN +1 − unk vn = − k ∈ M̃ λnk λnk (5.1.6) Since M̃ is closed, it would follow, upon letting nk → ∞, that eN +1 ∈ M̃ , which is impossible. Thus we can find a subsequence λnk → λ for some λ ∈ C and vnk = unk − λnk eN +1 → u − λeN +1 (5.1.7) Again since M̃ is closed it follows that u − λeN +1 ∈ M̃ , so that u ∈ M as needed. 68 For the proof, see for example Theorem 1.21 of [31]. For an infinite dimensional subspace this is false in general. For example, the Weierstrass approximation theorem states that if f ∈ C([a, b]) and > 0 there exists a polynomial p such that |p(x)−f (x)| ≤ on [a, b]. Thus if we take X = C([a, b]) and E to be the set of all polynomials on [a, b] then clearly E is a subspace of X and every point of X is a limit point of E. Thus E cannot be closed since otherwise E would be equal to all of X. Recall that when E = X as in this example, we say that E is a dense subspace of X. Such subspaces play an important role in functional analysis. According to Theorem 5.1 a finite dimensional Banach space X has no dense subspace aside from X itself. 5.2 Infinite series In a normed linear space we can study limits of sums, i.e. infinite series. P Definition 5.3. We say ∞ j=1 xj is convergent in X to the limit s if limn→∞ sn = s, Pn where sn = j=1 xj is the n’th partial sum of the series. prop53 A useful criterion for convergence can then be given, provided the space is also complete. P∞ Proposition 5.3. If X is a Banach space, x ∈ X for n = 1, 2, . . . and n n=1 ||xn || < ∞ P∞ P∞ then n=1 xn is convergent to an element s ∈ X with ||s|| ≤ n=1 ||xn ||. P P xj || ≤ m Proof: If m > n we have ||sm − sn || = || m j=n+1 ||xj || by the triangle j=n+1 P∞ inequality. If j=1 ||xj || it is convergent, its partial sums form a Cauchy sequence in R, and hence {sn } is P also Cauchy. Since the space is complete s = limn→∞ sn exists. We also have ||sn || ≤ nj=1 ||xj || for any fixed n, and ||sn || → ||s|| by Proposition 5.2, so P ||s|| ≤ ∞ j=1 ||xj || must hold. 2 The concepts of linear combination, linear independence and basis may now be extended to allow for infinite sums in an obvious way: We say a countably infinite set of vectors {xn }∞ n=1 is linearly independent if ∞ X λn xn = 0 if and only if λn = 0 for all n n=1 69 (5.2.1) P∞ ∞ and x ∈ L({xn }∞ n=1 ), the span of ({xn }n=1 ), provided x = n=1 λn xn for some scalars {λn }∞ . A basis of X is then a linearly independent spanning set, or equivalently n=1 ∞ ∞ {xn }P n=1 is a basis of X if for any x ∈ X there exist unique scalars {λn }n=1 such that ∞ x = n=1 λn xn . We emphasize that this definition of basis is not the same as that given in Definition 3.4 for a basis of a vector space, the difference being that the sum there is required to always be finite. The term Schauder basis is sometimes used for the above definition if the distinction needs to be made. 
Throughout the remainder of these notes, the term basis will always mean Schauder basis unless otherwise stated. A Banach space X which contains a Schauder basis {xn }∞ n=1 is always separable, since then the set of all finite linear combinations of the xn ’s with rational coefficients is easily seen to be countable and dense. It is known that not every separable Banach space has a Schauder basis (recall there must exist a Hamel basis), see for example Section 1.1 of [39]. 5.3 Linear operators and functionals We have previously defined what it means for a mapping T : X 7−→ Y between vector spaces to be linear. When the spaces X, Y are normed linear spaces we usually refer to such a mapping T as a linear operator. We say that T is bounded if there exists a finite constant C such that ||T x|| ≤ C||x|| for every x ∈ X, and we may then define the norm of T as the smallest such C, or equivalently ||T || = sup x6=0 ||T x|| ||x|| (5.3.1) The condition ||T || < ∞ is equivalent to continuity of T . prop54 Proposition 5.4. If X, Y are normed linear spaces and T : X → Y is linear then the following three conditions are equivalent. a) T is bounded b) T is continuous c) There exists x0 ∈ X such that T is continuous at x0 . 70 normdef Proof: If x0 , x ∈ X then ||T (x) − T (x0 )|| = ||T (x − x0 )|| ≤ ||T || ||x − x0 || (5.3.2) Thus if T is bounded then it is (Lipschitz) continuous at any point of X. The implication that b) implies c) is trivial. Finally suppose T is continuous at x0 ∈ X. For any > 0 there must exist δ > 0 such that ||T (z − x0 )|| = ||T (z) − T (x0 )|| ≤ if ||z − x0 || ≤ δ. For x any x 6= 0, choose z = x0 + δ ||x|| to get x || ≤ ||T δ ||x|| (5.3.3) or equivalently, using the linearity of T , ||T x|| ≤ C||x|| with C = /δ. Thus T is bounded. 2 A continuous linear operator is therefore the same as a bounded linear operator, and the two terms are used interchangeably. When the range space Y is the scalar field R or C we call T a linear functional instead of linear operator, and correspondingly a bounded (or continuous) linear functional if |T x| ≤ C||x|| for some finite constant C. We introduce the notation B(X, Y) = {T : X → Y : T is linear and bounded} (5.3.4) and the special cases B(X) = B(X, X) X0 = B(X, C) (5.3.5) Examples of linear operators and functionals will be studied much more extensively later. For now we just give two simple examples. Example 5.4. If X = RN , Y = RM and A is an M ×N real matrix with entries akj , then P yk = M j=1 akj xj defines a linear mapping, and according to the discussion of Section 3.3 any linear mapping of RN to RM is of this form. It is not hard to check that T is always bounded, assuming that we use any of the norms || · ||p in X and in Y. Evidently T is a linear functional if M = 1. Example 5.5. If Ω ⊂ RN is compact and X = C(Ω) pick x0 ∈ Ω and set T (f ) = f (x0 ) for f ∈ X. Clearly T is a linear functional and |T f | ≤ ||f || so that ||T || ≤ 1. 71 5.4 Contraction mappings in a Banach space sec54 If the Contraction Mapping theorem, Theorem 4.6, is specialized to a Banach space, the resulting statement is that if X is a Banach space and F : X → X satisfies ||F (x) − F (y)|| ≤ L||x − y|| x, y ∈ X (5.4.1) conaflin for some L < 1, then F has a unique fixed point in X. A particular case which arises frequently in applications is when the mapping F has the form F (x) = T x + b for some b ∈ X and bounded linear operator T on X, in which case the contraction condition (5.4.1) simply amounts to the requirement that ||T || < 1. 
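Anticipating the series representation derived in the next paragraph, the iteration for F(x) = Tx + b is easy to carry out in the finite dimensional case. The following minimal Python sketch is an illustration only: it takes X = R^5 with a matrix T scaled so that ||T|| = 1/2 in the spectral norm (an arbitrary choice with ||T|| < 1), and compares the limit of the iteration with a direct solution of x − Tx = b.

    import numpy as np

    # Fixed point iteration for F(x) = T x + b with ||T|| < 1; by Theorem
    # 4.6 the iterates converge to the unique solution of x - T x = b.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    T = 0.5 * A / np.linalg.norm(A, 2)   # scale so ||T|| = 1/2
    b = rng.standard_normal(5)

    x = b.copy()                         # initialize with x_1 = b, as below
    for n in range(100):
        x = T @ x + b                    # x_{n+1} = F(x_n)

    exact = np.linalg.solve(np.eye(5) - T, b)
    print(np.max(np.abs(x - exact)))     # ~ machine precision, since (1/2)^100 is tiny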
If we then initialize the fixed point iteration process (4.5.2) with x1 = b, the successive iterates are x2 = F (x1 ) = F (b) = T b + b x3 = F (x2 ) = T x2 + b = T 2 b + T b + b (5.4.2) (5.4.3) etc., the general pattern being xn = n−1 X T j b n = 1, 2, . . . (5.4.4) j=0 with T 0 = I as usual. If ||T || < 1 we already know that this sequence must converge, but it could also be checked directly from Proposition 5.3 using the obvious inequality ||T j b|| ≤ ||T ||j ||b||. In fact we know that xn → x, the unique fixed point of F , so x= ∞ X T jb (5.4.5) j=0 is an explicit solution formula for the linear, inhomogeneous equation x − T x = b. The right hand side of (5.4.5) is known as the Neumann series for x = (I − T )−1 b, and symbolically we may write ∞ X (I − T )−1 = Tj (5.4.6) j=0 Note the formal similarity to the usual geometric series formula for (1 − z)−1 if z ∈ C, |z| < 1. If T and b are such that ||T j b|| << ||T b|| for j ≥ 2, then truncating the series after two terms we get the Born approximation formula x ≈ b + T b. 72 neuser 5.5 Exercises 1. Give the proof of Proposition 5.1. 2. Show that any two norms on a finite dimensional normed linear space are equivalent. That is to say, if (X, || · ||), (X, ||| · |||) are both normed linear spaces, then there exist constants 0 < c < C < ∞ such that c≤ ex5-3 ex5-7 |||x||| ≤C ||x|| for all x ∈ X 3. If X is a normed linear space and Y is a Banach space, show that B(X, Y) is a Banach space, with the norm given by (5.3.1). R 4. If T is a linear integral operator, T u(x) = Ω K(x, y)u(y) dy, then T 2 is also a linear integral operator. What is the kernel for T 2 ? 5. If X is a normed linear space and E is a subspace of X, show that E is also a subspace of X. 1/p R does not define a norm. 6. If p ∈ (0, 1) show that ||f ||p = Ω |f (x)|p dx 7. The simple initial value problem u0 = u u(0) = 1 is equivalent to the integral equation Z x u(x) = 1 + u(s) ds 0 which may be viewed as a fixed point problem of the special type discussed in Section 5.4. Find the Neumann series for the solution u. Where does it converge? 8. If T f = f (0), show that T is not a bounded linear functional on Lp (−1, 1) for 1 ≤ p < ∞. expop 9. Let A ∈ B(X). a) Show that exp(A) = eA := ∞ X An n=0 73 n! (5.5.1) is defined in B(X). b) If also B ∈ B(X) and AB = BA show that exp(A + B) = exp(A) exp(B). c) Show that exp((t + s)A) = exp(tA) exp(sA) for any t, s ∈ R. d) Show that the conclusion in b) is false, in general, if A and B do not commute. (Suggestion: a counterexample can be found in X = R2 .) 10. Find an integral equation of the form u = T u + f , T linear, which is equivalent to the initial value problem u00 + u = x2 x>0 u(0) = 1 u0 (0) = 2 (5.5.2) Calculate the Born approximation to the solution u and compare to the exact solution. 74 Chapter 6 Inner product spaces and Hilbert spaces chhilbert 6.1 Axioms of an inner product space Definition 6.1. A vector space X is said to be an inner product space if for every x, y ∈ X there is defined a complex number hx, yi, the inner product of x and y such that the following axioms hold. [H1] hx, xi ≥ 0 for all x ∈ X [H2] hx, xi = 0 if and only if x = 0 [H3] hλx, yi = λhx, yi for any x, y ∈ X and any scalar λ. [H4] hx, yi = hy, xi for any x, y ∈ X. 
[H5] $\langle x+y, z\rangle = \langle x, z\rangle + \langle y, z\rangle$ for any $x, y, z \in X$

Note that from axioms [H3] and [H4] it follows that
$$\langle x, \lambda y\rangle = \overline{\langle \lambda y, x\rangle} = \overline{\lambda\langle y, x\rangle} = \bar\lambda\,\overline{\langle y, x\rangle} = \bar\lambda\langle x, y\rangle \qquad (6.1.1)$$
Another immediate consequence of the axioms is that
$$\|x+y\|^2 = \langle x+y, x+y\rangle = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \qquad (6.1.2)$$
If we replace $y$ by $-y$ and add the resulting identities we obtain the so-called Parallelogram Law
$$\|x+y\|^2 + \|x-y\|^2 = 2\|x\|^2 + 2\|y\|^2 \qquad (6.1.3)$$

Example 6.1. The vector space $\mathbb{R}^N$ is an inner product space if we define
$$\langle x, y\rangle = \sum_{j=1}^N x_j y_j \qquad (6.1.4)$$
In the case of $\mathbb{C}^N$ we must define
$$\langle x, y\rangle = \sum_{j=1}^N x_j \overline{y_j} \qquad (6.1.5)$$
in order that [H4] be satisfied.

Example 6.2. For the vector space $L^2(\Omega)$, with $\Omega \subset \mathbb{R}^N$, we may define
$$\langle f, g\rangle = \int_\Omega f(x)\overline{g(x)}\,dx \qquad (6.1.6)$$
where of course the complex conjugation can be ignored in the case of the real vector space $L^2(\Omega)$. Note the formal analogy with the inner product in the case of $\mathbb{R}^N$ or $\mathbb{C}^N$. The finiteness of $\langle f, g\rangle$ is guaranteed by the Hölder inequality (18.1.6), and the validity of [H1]–[H5] is clear.

Example 6.3. Another important inner product space which we introduce at this point is the sequence space
$$\ell^2 = \Big\{ x = \{x_j\}_{j=1}^\infty : \sum_{j=1}^\infty |x_j|^2 < \infty \Big\} \qquad (6.1.7)$$
with inner product
$$\langle x, y\rangle = \sum_{j=1}^\infty x_j \overline{y_j} \qquad (6.1.8)$$
The fact that $\langle x, y\rangle$ is finite for any $x, y \in \ell^2$ follows now from (18.1.14), the discrete form of the Hölder inequality. The notation $\ell^2(\mathbb{Z})$ is often used when the sequences involved are bi-infinite, i.e. of the form $x = \{x_j\}_{j=-\infty}^\infty$.

6.2 Norm in a Hilbert space

Proposition 6.1. If $x, y \in X$, an inner product space, then
$$|\langle x, y\rangle|^2 \le \langle x, x\rangle\langle y, y\rangle \qquad (6.2.1)$$

Proof: For any $z \in X$ we have
$$0 \le \langle x-z, x-z\rangle = \langle x, x\rangle - \langle x, z\rangle - \langle z, x\rangle + \langle z, z\rangle \qquad (6.2.2)$$
$$= \langle x, x\rangle + \langle z, z\rangle - 2\,\mathrm{Re}\,\langle x, z\rangle \qquad (6.2.3)$$
and hence
$$2\,\mathrm{Re}\,\langle z, x\rangle \le \langle x, x\rangle + \langle z, z\rangle \qquad (6.2.4)$$
If $y = 0$ there is nothing to prove; otherwise choose $z = (\langle x, y\rangle/\langle y, y\rangle)\,y$ to get
$$2\,\frac{|\langle x, y\rangle|^2}{\langle y, y\rangle} \le \langle x, x\rangle + \frac{|\langle x, y\rangle|^2}{\langle y, y\rangle} \qquad (6.2.5)$$
The conclusion (6.2.1) now follows upon rearrangement. $\Box$

Theorem 6.1. If $X$ is an inner product space and if we set $\|x\| = \sqrt{\langle x, x\rangle}$ then $\|\cdot\|$ is a norm on $X$.

Proof: By axiom [H1] $\|x\|$ is defined as a nonnegative real number for every $x \in X$, and axiom [H2] implies the corresponding axiom [N1] of norm. If $\lambda$ is any scalar then $\|\lambda x\|^2 = \langle \lambda x, \lambda x\rangle = \lambda\bar\lambda\langle x, x\rangle = |\lambda|^2\|x\|^2$ so that [N2] also holds. Finally, if $x, y \in X$ then
$$\|x+y\|^2 = \langle x+y, x+y\rangle = \|x\|^2 + 2\,\mathrm{Re}\,\langle x, y\rangle + \|y\|^2 \qquad (6.2.6)$$
$$\le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 \qquad (6.2.7)$$
$$= (\|x\| + \|y\|)^2 \qquad (6.2.8)$$
so that the triangle inequality [N3] also holds. $\Box$

The inequality (6.2.1) may now be restated as
$$|\langle x, y\rangle| \le \|x\|\,\|y\| \qquad (6.2.9)$$
for any $x, y \in X$, and in this form is usually called the Schwarz or Cauchy–Schwarz inequality.

Corollary 6.1. If $x_n \to x$ in $X$ then $\langle x_n, y\rangle \to \langle x, y\rangle$ for any $y \in X$.

Proof: We have that
$$|\langle x_n, y\rangle - \langle x, y\rangle| = |\langle x_n - x, y\rangle| \le \|x_n - x\|\,\|y\| \to 0 \qquad (6.2.10)$$

By Theorem 6.1 an inner product space may always be regarded as a normed linear space, and analogously to the definition of Banach space we have

Definition 6.2. A Hilbert space is a complete inner product space.

Example 6.4. The spaces $\mathbb{R}^N$ and $\mathbb{C}^N$ are Hilbert spaces, as is $L^2(\Omega)$ on account of the completeness property mentioned in Theorem 4.1 of Chapter 4. On the other hand, if we consider $C(E)$ with inner product $\langle f, g\rangle = \int_E f(x)\overline{g(x)}\,dx$, then it is an inner product space which is not a Hilbert space, since as previously observed, $C(E)$ is not complete with the $L^2(E)$ metric.
The sequence space $\ell^2$ is also a Hilbert space, see Exercise 7.

6.3 Orthogonality

Recall from elementary calculus that in $\mathbb{R}^n$ the inner product allows one to calculate the angle between two vectors, namely
$$\langle x, y\rangle = \|x\|\,\|y\|\cos\theta \qquad (6.3.1)$$
where $\theta$ is the angle between $x$ and $y$. In particular $x$ and $y$ are perpendicular if and only if $\langle x, y\rangle = 0$. The concept of perpendicularity, also called orthogonality, is fundamental in Hilbert space analysis, even if the geometric picture is less clear.

Definition 6.3. If $x, y \in X$, an inner product space, we say $x, y$ are orthogonal if $\langle x, y\rangle = 0$.

From (6.1.2) we obtain immediately the 'Pythagorean Theorem' that if $x$ and $y$ are orthogonal then
$$\|x+y\|^2 = \|x\|^2 + \|y\|^2 \qquad (6.3.2)$$
A set of vectors $\{x_1, x_2, \ldots, x_n\}$ is called an orthogonal set if $x_j$ and $x_k$ are orthogonal whenever $j \ne k$, and for such a set we have
$$\Big\|\sum_{j=1}^n x_j\Big\|^2 = \sum_{j=1}^n \|x_j\|^2 \qquad (6.3.3)$$
The set is called orthonormal if in addition $\|x_j\| = 1$ for every $j$. The same terminology is used for countably infinite sets, with (6.3.3) still valid provided that the series on the right is convergent.

We may also use the notation $x \perp y$ if $x, y$ are orthogonal, and if $E \subset X$ we define the orthogonal complement of $E$
$$E^\perp = \{x \in X : \langle x, y\rangle = 0 \text{ for all } y \in E\}$$
($E^\perp = x^\perp$ if $E$ consists of the single point $x$). We obviously have $0^\perp = X$ and $X^\perp = \{0\}$ also, since if $x \in X^\perp$ then $\langle x, x\rangle = 0$ so that $x = 0$.

Proposition 6.2. If $E \subset X$ then $E^\perp$ is a closed subspace of $X$. If $E$ is a closed subspace then $E = E^{\perp\perp}$.

We leave the proof as an exercise. Here $E^{\perp\perp}$ means $(E^\perp)^\perp$, the orthogonal complement of the orthogonal complement.

Example 6.5. If $X = \mathbb{R}^3$ and $E = \{x = (x_1, x_2, x_3) : x_1 = x_2 = 0\}$ then $E^\perp = \{x \in \mathbb{R}^3 : x_3 = 0\}$.

Example 6.6. If $X = L^2(\Omega)$ with $\Omega$ a bounded open set in $\mathbb{R}^N$, let $E = L\{1\}$, i.e. the set of constant functions. Then $f \in E^\perp$ if and only if $\langle f, 1\rangle = \int_\Omega f(x)\,dx = 0$. Thus $E^\perp$ is the set of functions in $L^2(\Omega)$ with mean value zero.

6.4 Projections

If $E \subset X$ and $x \in X$, the projection $P_E x$ of $x$ onto $E$ is the element of $E$ closest to $x$, if such an element exists. That is, $y = P_E(x)$ if $y$ is the unique solution of the minimization problem
$$\min_{z \in E} \|x - z\| \qquad (6.4.1)$$
Of course such a point may not exist, and may not be unique if it does exist. In a Hilbert space the projection will be well defined provided $E$ is closed and convex.

Definition 6.4. If $X$ is a vector space and $E \subset X$, we say $E$ is convex if $\lambda x + (1-\lambda)y \in E$ whenever $x, y \in E$ and $\lambda \in [0,1]$.

Example 6.7. If $X$ is a vector space then any subspace of $X$ is convex. If $X$ is a normed linear space then any ball $B(x, R) \subset X$ is convex.

Theorem 6.2. Let $H$ be a Hilbert space, $E \subset H$ closed and convex, and $x \in H$. Then $y = P_E x$ exists. Furthermore, $y = P_E x$ if and only if
$$y \in E \qquad \mathrm{Re}\,\langle x-y, z-y\rangle \le 0 \text{ for all } z \in E \qquad (6.4.2)$$

Proof: Set $d = \inf_{z \in E}\|x - z\|$ so that there exists a sequence $z_n \in E$ such that $\|x - z_n\| \to d$. We wish to show that $\{z_n\}$ is a Cauchy sequence. From the Parallelogram Law (6.1.3) applied to $z_n - x$, $z_m - x$ we have
$$\|z_n - z_m\|^2 = 2\|z_n - x\|^2 + 2\|z_m - x\|^2 - 4\Big\|\frac{z_n + z_m}{2} - x\Big\|^2 \qquad (6.4.3)$$
Since $E$ is convex, $(z_n + z_m)/2 \in E$ so that $\|\frac{z_n + z_m}{2} - x\| \ge d$, and it follows that
$$\|z_n - z_m\|^2 \le 2\|z_n - x\|^2 + 2\|z_m - x\|^2 - 4d^2 \qquad (6.4.4)$$
Letting $n, m \to \infty$ the right hand side tends to zero, so that $\{z_n\}$ is Cauchy. Since the space is complete there exists $y \in H$ such that $\lim_{n\to\infty} z_n = y$, and $y \in E$ since $E$ is closed. It follows that $\|y - x\| = \lim_{n\to\infty}\|z_n - x\| = d$ so that $\min_{z \in E}\|z - x\|$ is achieved at $y$.
For the uniqueness assertion, suppose $\|y - x\| = \|\hat y - x\| = d$ with $y, \hat y \in E$. Then (6.4.4) holds with $z_n, z_m$ replaced by $y, \hat y$, giving
$$\|y - \hat y\|^2 \le 2\|y - x\|^2 + 2\|\hat y - x\|^2 - 4d^2 = 0 \qquad (6.4.5)$$
so that $y = \hat y$. Thus $y = P_E x$ exists.

To obtain the characterization (6.4.2), note that for any $z \in E$
$$f(t) = \|x - (y + t(z-y))\|^2 \qquad (6.4.6)$$
has its minimum value on the interval $[0,1]$ when $t = 0$, since $y + t(z-y) = tz + (1-t)y \in E$. We explicitly calculate
$$f(t) = \|x-y\|^2 - 2t\,\mathrm{Re}\,\langle x-y, z-y\rangle + t^2\|z-y\|^2 \qquad (6.4.7)$$
By elementary calculus considerations, the minimum of this quadratic occurs at $t = 0$ only if $f'(0) = -2\,\mathrm{Re}\,\langle x-y, z-y\rangle \ge 0$, which is equivalent to (6.4.2). If, on the other hand, (6.4.2) holds, then for any $z \in E$ we must have
$$\|z - x\|^2 = f(1) \ge f(0) = \|y - x\|^2 \qquad (6.4.8)$$
so that $\min_{z \in E}\|z - x\|$ must occur at $y$, i.e. $y = P_E x$. $\Box$

The most important special case of the above theorem is when $E$ is a closed subspace of the Hilbert space $H$ (recall a subspace is always convex), in which case we have

Theorem 6.3. If $E \subset H$ is a closed subspace of a Hilbert space $H$ and $x \in H$ then $y = P_E x$ if and only if $y \in E$ and $x - y \in E^\perp$. Furthermore

1. $x - y = x - P_E x = P_{E^\perp} x$

2. We have that
$$x = y + (x - y) = P_E x + P_{E^\perp} x \qquad (6.4.9)$$
is the unique decomposition of $x$ as the sum of an element of $E$ and an element of $E^\perp$.

3. $P_E$ is a linear operator on $H$ with $\|P_E\| = 1$ except for the case $E = \{0\}$.

Proof: If $y = P_E x$ then for any $w \in E$ we also have $y \pm w \in E$, and choosing $z = y \pm w$ in (6.4.2) gives $\pm\,\mathrm{Re}\,\langle x-y, w\rangle \le 0$. Thus $\mathrm{Re}\,\langle x-y, w\rangle = 0$, and repeating the same argument with $z = y \pm iw$ gives $\mathrm{Re}\,\langle x-y, iw\rangle = \mathrm{Im}\,\langle x-y, w\rangle = 0$ also. We conclude that $\langle x-y, w\rangle = 0$ for all $w \in E$, i.e. $x - y \in E^\perp$. The converse statement may be proved in a similar manner.

Recall that $E^\perp$ is always a closed subspace of $H$. The statement that $x - y = P_{E^\perp}x$ is then equivalent, by the previous paragraph, to $x - y \in E^\perp$ and $\langle x - (x-y), w\rangle = \langle y, w\rangle = 0$ for every $w \in E^\perp$, which is evidently true since $y \in E$.

Next, if $x = y_1 + z_1 = y_2 + z_2$ with $y_1, y_2 \in E$ and $z_1, z_2 \in E^\perp$ then $y_1 - y_2 = z_2 - z_1$, implying that $y = y_1 - y_2$ belongs to both $E$ and $E^\perp$. But then $y \perp y$, i.e. $\langle y, y\rangle = 0$, must hold, so that $y = 0$ and hence $y_1 = y_2$, $z_1 = z_2$. We leave the proof of linearity to the exercises. $\Box$

If we denote by $I$ the identity mapping, we have just proved that $P_{E^\perp} = I - P_E$. We also obtain that
$$\|x\|^2 = \|P_E x\|^2 + \|P_{E^\perp} x\|^2 \qquad (6.4.10)$$
for any $x \in H$.

Example 6.8. In the Hilbert space $L^2(-1,1)$ let $E$ denote the subspace of even functions, i.e. $f \in E$ if $f(x) = f(-x)$ for almost every $x \in (-1,1)$. We claim that $E^\perp$ is the subspace of odd functions on $(-1,1)$. The fact that any odd function belongs to $E^\perp$ is clear, since if $f$ is even and $g$ is odd then $fg$ is odd and so $\langle f, g\rangle = \int_{-1}^1 f(x)g(x)\,dx = 0$. Conversely, if $g \perp E$ then for any $f \in E$ we have
$$0 = \langle g, f\rangle = \int_{-1}^1 g(x)f(x)\,dx = \int_0^1 (g(x) + g(-x))f(x)\,dx \qquad (6.4.11)$$
by an obvious change of variables. Choosing $f(x) = g(x) + g(-x)$ we see that
$$\int_0^1 |g(x) + g(-x)|^2\,dx = 0 \qquad (6.4.12)$$
so that $g(x) = -g(-x)$ for almost every $x \in (0,1)$ and hence for almost every $x \in (-1,1)$. Thus any element of $E^\perp$ is an odd function on $(-1,1)$.

Any function $f \in L^2(-1,1)$ thus has the unique decomposition $f = P_E f + P_{E^\perp} f$, a sum of an even and an odd function. Since one such splitting is
$$f(x) = \frac{f(x) + f(-x)}{2} + \frac{f(x) - f(-x)}{2} \qquad (6.4.13)$$
we conclude from the uniqueness property that these two terms are the projections, i.e.
$$P_E f(x) = \frac{f(x) + f(-x)}{2} \qquad P_{E^\perp} f(x) = \frac{f(x) - f(-x)}{2} \qquad (6.4.14)$$

Example 6.9. Let $\{x_1, x_2, \ldots, x_n\}$ be an orthogonal set of nonzero elements in a Hilbert space $X$ and $E = L(x_1, x_2, \ldots, x_n)$ the span of these elements. Let us compute $P_E$ for this closed subspace $E$. If $y = P_E x$ then $y = \sum_{j=1}^n \lambda_j x_j$ for some scalars $\lambda_1, \ldots, \lambda_n$ since $y \in E$. From Theorem 6.3 we also have that $x - y \perp E$, which is equivalent to $x - y \perp x_k$ for each $k$. Thus $\langle x, x_k\rangle = \langle y, x_k\rangle = \lambda_k\langle x_k, x_k\rangle$ using the orthogonality assumption. Thus we conclude that
$$y = P_E x = \sum_{j=1}^n \frac{\langle x, x_j\rangle}{\langle x_j, x_j\rangle}\,x_j \qquad (6.4.15)$$

6.5 Gram-Schmidt method

The projection formula (6.4.15) provides an explicit and very convenient expression for the solution $y$ of the best approximation problem (6.4.1) provided $E$ is a subspace spanned by mutually orthogonal vectors $\{x_1, x_2, \ldots, x_n\}$. If instead $E = L(x_1, x_2, \ldots, x_n)$ is a subspace but $\{x_1, x_2, \ldots, x_n\}$ are not orthogonal vectors, we can still use (6.4.15) to compute $y = P_E x$ if we can find a set of orthogonal vectors $\{y_1, y_2, \ldots, y_m\}$ such that $E = L(x_1, x_2, \ldots, x_n) = L(y_1, y_2, \ldots, y_m)$, i.e. if we can find an orthogonal basis of $E$. This may always be done by the Gram-Schmidt orthogonalization procedure from linear algebra, which we now describe.

Assume that $\{x_1, x_2, \ldots, x_n\}$ are linearly independent, so that $m = n$ must hold. First set $y_1 = x_1$. If orthogonal vectors $y_1, y_2, \ldots, y_k$ have been chosen for some $1 \le k < n$ such that $E_k := L(y_1, y_2, \ldots, y_k) = L(x_1, x_2, \ldots, x_k)$, then define $y_{k+1} = x_{k+1} - P_{E_k}x_{k+1}$. Clearly $\{y_1, y_2, \ldots, y_{k+1}\}$ are orthogonal since $y_{k+1}$ is the projection of $x_{k+1}$ onto $E_k^\perp$. Also, since $y_{k+1}, x_{k+1}$ differ by an element of $E_k$ it is evident that $L(x_1, x_2, \ldots, x_{k+1}) = L(y_1, y_2, \ldots, y_{k+1})$. Thus after $n$ steps we obtain an orthogonal set $\{y_1, y_2, \ldots, y_n\}$ which spans $E$. If the original set $\{x_1, x_2, \ldots, x_n\}$ is not linearly independent then some of the $y_k$'s will be zero. After discarding these and relabeling, we obtain $\{y_1, y_2, \ldots, y_m\}$ for some $m \le n$, an orthogonal basis for $E$. Note that we may compute $y_{k+1}$ using (6.4.15), namely
$$y_{k+1} = x_{k+1} - \sum_{j=1}^k \frac{\langle x_{k+1}, y_j\rangle}{\langle y_j, y_j\rangle}\,y_j \qquad (6.5.1)$$

In practice the Gram-Schmidt method is often modified to produce an orthonormal basis of $E$ by normalizing $y_k$ to be a unit vector at each step, or else discarding it if it is already a linear combination of $\{y_1, y_2, \ldots, y_{k-1}\}$. More explicitly:

• Set $y_1 = \frac{x_1}{\|x_1\|}$

• If orthonormal vectors $\{y_1, y_2, \ldots, y_k\}$ have been chosen, set
$$\tilde y_{k+1} = x_{k+1} - \sum_{j=1}^k \langle x_{k+1}, y_j\rangle\,y_j \qquad (6.5.2)$$
If $\tilde y_{k+1} = 0$ discard it, otherwise set $y_{k+1} = \frac{\tilde y_{k+1}}{\|\tilde y_{k+1}\|}$.

The reader may easily check that $\{y_1, y_2, \ldots, y_m\}$ constitutes an orthonormal basis of $E$, and consequently $P_E x = \sum_{j=1}^m \langle x, y_j\rangle y_j$ for any $x \in H$.

6.6 Bessel's inequality and infinite orthogonal sequences

The formula (6.4.15) for $P_E$ may be adapted for use in infinite dimensional subspaces $E$. If $\{x_n\}_{n=1}^\infty$ is a countable orthogonal set in $H$, $x_n \ne 0$ for all $n$, we formally expect that if $E = L(\{x_n\}_{n=1}^\infty)$ then
$$P_E x = \sum_{n=1}^\infty \frac{\langle x, x_n\rangle}{\langle x_n, x_n\rangle}\,x_n \qquad (6.6.1)$$
To verify that this is correct, we must show that the infinite series in (6.6.1) is guaranteed to be convergent in $H$. First of all, let us set
$$e_n = \frac{x_n}{\|x_n\|} \qquad c_n = \langle x, e_n\rangle \qquad E_N = L(x_1, x_2, \ldots, x_N) \qquad (6.6.2)$$
so that $\{e_n\}_{n=1}^\infty$ is an orthonormal set, and
$$P_{E_N} x = \sum_{n=1}^N c_n e_n \qquad (6.6.3)$$
From (6.4.10) we have
$$\sum_{n=1}^N |c_n|^2 = \|P_{E_N}x\|^2 \le \|x\|^2 \qquad (6.6.4)$$
Letting $N \to \infty$ we obtain Bessel's inequality
$$\sum_{n=1}^\infty |c_n|^2 = \sum_{n=1}^\infty |\langle x, e_n\rangle|^2 \le \|x\|^2 \qquad (6.6.5)$$
The immediate implication that $\lim_{n\to\infty} c_n = 0$ is sometimes called the Riemann-Lebesgue lemma.

Proposition 6.3. (Riesz-Fischer) Let $\{e_n\}_{n=1}^\infty$ be an orthonormal set in $H$, $E = L(\{e_n\}_{n=1}^\infty)$, $x \in H$ and $c_n = \langle x, e_n\rangle$. Then the infinite series $\sum_{n=1}^\infty c_n e_n$ is convergent in $H$ to $P_E x$.

Proof: First we note that the series $\sum_{n=1}^\infty c_n e_n$ is Cauchy in $H$ since if $M > N$
$$\Big\|\sum_{n=N}^M c_n e_n\Big\|^2 = \sum_{n=N}^M |c_n|^2 \qquad (6.6.6)$$
which is less than any prescribed $\epsilon > 0$ for $N$ (and hence $M$) sufficiently large, since $\sum_{n=1}^\infty |c_n|^2 < \infty$. Thus $y = \sum_{n=1}^\infty c_n e_n$ exists in $H$, and clearly $y \in E$. Since $\langle\sum_{n=1}^N c_n e_n, e_m\rangle = c_m$ if $N > m$ it follows easily that $\langle y, e_m\rangle = c_m = \langle x, e_m\rangle$. Thus $y - x \perp e_m$ for any $m$, which implies $y - x \in E^\perp$. From Theorem 6.3 we conclude that $y = P_E x$. $\Box$

6.7 Characterization of a basis of a Hilbert space

Now suppose we have an orthogonal set $\{x_n\}_{n=1}^\infty$ and we wish to determine whether or not it is a basis of the Hilbert space $H$. There are a number of interesting ways to answer this question, summarized in Theorem 6.4 below. First we must make some more definitions.

Definition 6.5. A collection of vectors $\{x_n\}_{n=1}^\infty$ is closed in $H$ if the set of all finite linear combinations of $\{x_n\}_{n=1}^\infty$ is dense in $H$.

A collection of vectors $\{x_n\}_{n=1}^\infty$ is complete in $H$ if there is no nonzero vector orthogonal to all of them, i.e. $\langle x, x_n\rangle = 0$ for all $n$ if and only if $x = 0$.

An orthogonal set $\{x_n\}_{n=1}^\infty$ in $H$ is a maximal orthogonal set if it is not contained in any larger orthogonal set.

Theorem 6.4. Let $\{e_n\}_{n=1}^\infty$ be an orthonormal set in a Hilbert space $H$. Then the following are equivalent.

a) $\{e_n\}_{n=1}^\infty$ is a basis of $H$.

b) $x = \sum_{n=1}^\infty \langle x, e_n\rangle e_n$ for every $x \in H$.

c) $\langle x, y\rangle = \sum_{n=1}^\infty \langle x, e_n\rangle\langle e_n, y\rangle$ for every $x, y \in H$.

d) $\|x\|^2 = \sum_{n=1}^\infty |\langle x, e_n\rangle|^2$ for every $x \in H$.

e) $\{e_n\}_{n=1}^\infty$ is a maximal orthonormal set.

f) $\{e_n\}_{n=1}^\infty$ is closed in $H$.

g) $\{e_n\}_{n=1}^\infty$ is complete in $H$.

Proof: a) implies b): If $\{e_n\}_{n=1}^\infty$ is a basis of $H$ then for any $x \in H$ there exist unique constants $d_n$ such that $x = \lim S_N$ where $S_N = \sum_{n=1}^N d_n e_n$. Since $\langle S_N, e_m\rangle = d_m$ if $N > m$ it follows that
$$|d_m - \langle x, e_m\rangle| = |\langle S_N - x, e_m\rangle| \le \|S_N - x\|\,\|e_m\| \to 0 \qquad (6.7.1)$$
as $N \to \infty$, using the Schwarz inequality. Hence
$$x = \sum_{n=1}^\infty d_n e_n = \sum_{n=1}^\infty \langle x, e_n\rangle e_n \qquad (6.7.2)$$

b) implies c): For any $x, y \in H$ we have
$$\langle x, y\rangle = \Big\langle x, \lim_{N\to\infty}\sum_{n=1}^N \langle y, e_n\rangle e_n\Big\rangle \qquad (6.7.3)$$
$$= \lim_{N\to\infty}\Big\langle x, \sum_{n=1}^N \langle y, e_n\rangle e_n\Big\rangle = \lim_{N\to\infty}\sum_{n=1}^N \overline{\langle y, e_n\rangle}\langle x, e_n\rangle \qquad (6.7.4)$$
$$= \sum_{n=1}^\infty \langle x, e_n\rangle\overline{\langle y, e_n\rangle} = \sum_{n=1}^\infty \langle x, e_n\rangle\langle e_n, y\rangle \qquad (6.7.5)$$
Here we have used Corollary 6.1 in the second line.

c) implies d): We simply choose x = y in the identity stated in c).

d) implies e): If $\{e_n\}_{n=1}^\infty$ is not maximal then there exists $e \in H$ such that
$$\{e_n\}_{n=1}^\infty \cup \{e\} \qquad (6.7.6)$$
is orthonormal. Since $\langle e, e_n\rangle = 0$ but $\|e\| = 1$ this contradicts d).

e) implies f): Let $E$ denote the set of finite linear combinations of the $e_n$'s. If $\{e_n\}_{n=1}^\infty$ is not closed then $\overline{E} \ne H$ so there must exist $x \not\in \overline{E}$. If we let $y = x - P_{\overline{E}}\,x$ then $y \ne 0$ and $y \perp \overline{E}$. If $e = y/\|y\|$ we would then have that $\{e_n\}_{n=1}^\infty \cup \{e\}$ is orthonormal, so that $\{e_n\}_{n=1}^\infty$ could not be maximal.

f) implies g): Assume that $\langle x, e_n\rangle = 0$ for all $n$. If $\{e_n\}_{n=1}^\infty$ is closed then for any $\epsilon > 0$ there exist $\lambda_1, \ldots, \lambda_N$ such that $\|x - \sum_{n=1}^N \lambda_n e_n\|^2 < \epsilon$. But then $\|x\|^2 + \sum_{n=1}^N |\lambda_n|^2 < \epsilon$ and in particular $\|x\|^2 < \epsilon$. Thus $x = 0$ so $\{e_n\}_{n=1}^\infty$ is complete.

g) implies a): Let $E = L(\{e_n\}_{n=1}^\infty)$. If $x \in H$ and $y = P_E x = \sum_{n=1}^\infty \langle x, e_n\rangle e_n$ then, as in the proof of Proposition 6.3, $\langle y, e_n\rangle = \langle x, e_n\rangle$. Since $\{e_n\}_{n=1}^\infty$ is complete it follows that $x = y \in E$, so that $L(\{e_n\}_{n=1}^\infty) = H$. Since an orthonormal set is obviously linearly independent it follows that $\{e_n\}_{n=1}^\infty$ is a basis of $H$. $\Box$

Because of the equivalence of the stated conditions, the phrases 'complete orthonormal set', 'maximal orthonormal set', and 'closed orthonormal set' are often used interchangeably with 'orthonormal basis' in a Hilbert space setting. The identity in d) is called the Bessel equality (recall the corresponding inequality (6.6.5) is valid whether or not the orthonormal set $\{e_n\}_{n=1}^\infty$ is a basis), while the identity in c) is the Parseval equality. For reasons which should become more clear in Chapter 8, the infinite series $\sum_{n=1}^\infty \langle x, e_n\rangle e_n$ is often called the generalized Fourier series of $x$ with respect to the orthonormal basis $\{e_n\}_{n=1}^\infty$, and $\langle x, e_n\rangle$ is the $n$'th generalized Fourier coefficient.

Theorem 6.5. Every separable Hilbert space has an orthonormal basis.

Proof: If $\{x_n\}_{n=1}^\infty$ is a countable dense sequence in $H$ and we carry out the Gram-Schmidt procedure, we obtain an orthonormal sequence $\{e_n\}_{n=1}^\infty$. This sequence must be complete, since any vector orthogonal to every $e_n$ must also be orthogonal to every $x_n$, so must be zero, since $\{x_n\}_{n=1}^\infty$ is dense. Therefore by Theorem 6.4 $\{e_n\}_{n=1}^\infty$ (or $\{e_1, e_2, \ldots, e_n\}$ in the finite dimensional case) is an orthonormal basis of $H$.

The same conclusion is actually correct in a nonseparable Hilbert space also, but needs more explanation. See for example Chapter 4 of [30].

6.8 Isomorphisms of a Hilbert space

There are two interesting isomorphisms of every separable Hilbert space: one is to its so-called dual space, and the second is to the sequence space $\ell^2$. In this section we explain both of these facts. Recall that in Chapter 5 we have already introduced $X^* = B(X, \mathbb{C})$, the space of continuous linear functionals on the normed linear space $X$. It is itself always a Banach space (see Exercise 3 of Chapter 5), and is also called the dual space of $X$.

Example 6.10. If $H$ is a Hilbert space and $y \in H$, define $\phi(x) = \langle x, y\rangle$. Then $\phi : H \to \mathbb{C}$ is clearly linear, and $|\phi(x)| \le \|y\|\,\|x\|$ by the Schwarz inequality, hence $\phi \in H^*$, with $\|\phi\| \le \|y\|$.

The following theorem asserts that every element of the dual space $H^*$ arises in this way.

Theorem 6.6. (Riesz representation theorem) If $H$ is a Hilbert space and $\phi \in H^*$ then there exists a unique $y \in H$ such that $\phi(x) = \langle x, y\rangle$.

Proof: Let $M = \{x \in H : \phi(x) = 0\}$, which is clearly a closed subspace of $H$. If $M = H$ then $\phi$ can only be the zero functional, so $y = 0$ has the required properties. Otherwise, there must exist $e \in M^\perp$ such that $\|e\| = 1$. For any $x \in H$ let $z = \phi(x)e - \phi(e)x$ and observe that $\phi(z) = 0$ so $z \in M$, and in particular $z \perp e$. It then follows that
$$0 = \langle z, e\rangle = \phi(x)\langle e, e\rangle - \phi(e)\langle x, e\rangle \qquad (6.8.1)$$
Thus $\phi(x) = \langle x, y\rangle$ with $y := \overline{\phi(e)}\,e$, for every $x \in H$.

The uniqueness property is even easier to show. If $\phi(x) = \langle x, y_1\rangle = \langle x, y_2\rangle$ for every $x \in H$ then necessarily $\langle x, y_1 - y_2\rangle = 0$ for all $x$, and choosing $x = y_1 - y_2$ we get $\|y_1 - y_2\|^2 = 0$, that is, $y_1 = y_2$. $\Box$

We view the element $y \in H$ as 'representing' the linear functional $\phi \in H^*$, hence the name of the theorem.
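Looking back over Sections 6.5–6.7, here is a small computational sketch (an illustration, not part of the notes) of the two main constructions: Gram-Schmidt applied to $1, x, x^2, x^3$ in $L^2(-1,1)$, producing multiples of the Legendre polynomials as in Exercise 9, followed by the generalized Fourier coefficients $\langle f, e_n\rangle$ of $f(x) = e^x$, as in Exercise 10.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.integrate import quad

def inner(p, q):
    # L^2(-1,1) inner product of polynomials given by coefficient arrays
    return quad(lambda x: P.polyval(x, p) * P.polyval(x, q), -1, 1)[0]

monomials = [np.array([1.0]), np.array([0.0, 1.0]),
             np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 0.0, 1.0])]
ortho = []                                     # orthonormal e_0, e_1, ...
for x_k in monomials:
    y = x_k
    for e in ortho:
        y = P.polysub(y, inner(x_k, e) * e)    # subtract <x_k, e> e, cf. (6.5.2)
    ortho.append(y / np.sqrt(inner(y, y)))     # normalize

# Generalized Fourier coefficients <f, e_n> for f(x) = e^x; the best
# degree-3 approximation is then P_E f = sum_n <f, e_n> e_n, cf. (6.6.1)
coeffs = [quad(lambda x, e=e: np.exp(x) * P.polyval(x, e), -1, 1)[0]
          for e in ortho]
print(coeffs)
```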
There are actually several theorems one may encounter, all called the Riesz representation theorem, and what they all have in common is that the dual space of some other space is characterized. The Hilbert space version here is by the far the easiest of these theorems. If we define the mapping R : H → H∗ (the Riesz map) by the condition R(y) = φ, with φ, y related as above, then Theorem 6.6 amounts to the statement that R is one to one and onto. Since it is easy to check that R is also linear, it follows that R is an isomorphism from H to H∗ . In fact more is true, R is an isometric isomorphism which means that ||R(y)|| = ||y|| for every y ∈ H. To see this, recall we have already seen in Example 6.10 that ||φ|| ≤ ||y||, and by choosing x = y we also get φ(y) = ||y||2 , which implies ||φ|| ≥ ||y||. Next, suppose that H is an infinite dimensional separable Hilbert space. According to Theorem 6.5 there exists an orthonormal basis of H which cannot be finite, and so may be written as {en }∞ n=1 . Associate with any x ∈ H the corresponding sequence 88 of generalized Fourier coefficients {cn }∞ n=1 , where cn = hx, en i, and let Λ denote this mapping, i.e. Λ(x) = {cn }∞ . n=1 P∞ 2 2 We know byPTheorem 6.4 that n=1 |cn |P< ∞, i.e. Λ(x) ∈ ` . On the other ∞ ∞ hand, suppose n=1 |cn |2 < ∞ and let x = n=1 cn en . This series is Cauchy, hence convergent in H, by precisely the same argument as used in the beginning of the proof of ∞ Proposition 6.3. Since {en }∞ n=1 is a basis, we must have cn = hx, en i, thus Λ(x) = {cn }n=1 , 2 and consequently Λ : H → ` is onto. It is also one-to-one, since Λ(x1 ) = Λ(x2 ) means that hx1 − x2 , en i = 0 for every n, hence x1 − x2 = 0 by the completeness property of a basis. Finally it is straightforward to check that Λ is linear, so that Λ is an isomorphism. Like the Riesz map, the isomorphism Λ is also isometric, ||Λ(x)|| = ||x||, on account of the Bessel equality. By the above considerations we have then established the following theorem. Theorem 6.7. If H is an infinite dimensional separable Hilbert space, then H is isometrically isomorphic to `2 . Since all such Hilbert spaces are isometrically isomorphic to `2 , they are then obviously isometrically isomorphic to each other. If H is a Hilbert space of dimension N , the same arguments show that H is isometrically isomorphic to the Hilbert space RN or CN , depending on whether real or complex scalars are allowed. Finally, see Theorem 4.17 of [30] for the nonseparable case. 6.9 Exercises 1. Prove Proposition 6.2. 2. In the Hilbert space L2 (−1, 1) what is M ⊥ if a) M = {u : u(x) = u(−x) a.e.} b) M = {u : u(x) = 0 a.e. for − 1 < x < 0}. Give an explicit formula for the projection onto M in each case. 3. Prove that PE is a linear operator on H with norm ||PE || = 1 except in the trivial case when E = {0}. Suggestion: If x = c1 x1 + c2 x2 first show that PE x − c1 PE x1 − c2 PE x2 = −PE ⊥ x + c1 PE ⊥ x1 + c2 PE ⊥ x2 89 4. Show that the parallelogram law fails in L∞ (Ω), so there is no choice of inner product which can give rise to the norm in L∞ (Ω). (The same is true in Lp (Ω) for any p 6= 2.) 5. If (X, h·, ·i) is an inner product space prove the polarization identity hx, yi = 1 ||x + y||2 − ||x − y||2 + i||x + iy||2 − i||x − iy||2 4 Thus, in any normed linear space, there can exist at most one inner product giving rise to the norm. 6. Let M be a closed subspace of a Hilbert space H, and PM be the corresponding projection. Show that 2 a) PM = PM b) hPM x, yi = hPM x, PM yi = hx, PM yi for any x, y ∈ H. ex6-6 7. 
Show that `2 is a Hilbert space. (Discussion: The only property you need to check is completeness, and you may freely use the fact that R is complete. A Cauchy sequence in this case is a sequence of sequences, so use a notation like (n) (n) x(n) = {x1 , x2 , . . . } (n) where xj denotes the j’th term of the n’th sequence x(n) . Given a Cauchy sequence (n) 2 {x(n) }∞ = xj for each n=1 in ` you’ll first find a sequence x such that limn→∞ xj fixed j. You then must still show that x ∈ `2 , and one good way to do this is by first showing that x − x(n) ∈ `2 for some n.) 8. Let H be a Hilbert space. a) If xn → x in H show that {xn }∞ n=1 is bounded in H. b) If xn → x, yn → y in H show that hxn , yn i → hx, yi. ex6-8 9. Compute orthogonal polynomials of degree 0,1,2,3 on [−1, 1] and on [0, 1] by applying the Gram-Schmidt procedure to 1, x, x2 , x3 in L2 (−1, 1) and L2 (0, 1). (In the case of L2 (−1, 1), you are finding so-called Legendre polynomials.) 10. Use the result of Exercise 9 and the projection formula (6.6.1) to compute the best polynomial approximations of degrees 0,1,2 and 3 to u(x) = ex in L2 (−1, 1). Feel free to use any symbolic calculation tool you know to compute the necessary integrals, but give exact coefficients, not calculator approximations. If possible, produce a graph displaying u and the 4 approximations. 90 11. Let Ω ⊂ RN , ρ be a measurable function on Ω, andR ρ(x) > 0 a.e. on Ω. Let X denote the set of measurable functions u for which Ω |u(x)|2 ρ(x) dx is finite. We can then define the weighted inner product Z u(x)v(x)ρ(x) dx hu, viρ = Ω p and corresponding norm ||u||ρ = hu, uiρ on X. The resulting inner product space is complete, often denoted L2ρ (Ω). (As in the case of ρ(x) = 1 we regard any two functions which agree a.e. as being the same element, so L2ρ (Ω) is again really a set of equivalence classes.) a) Verify that all of the inner product axioms are satisfied. b) Suppose that there exist constants C1 , C2 such that 0 < C1 ≤ ρ(x) ≤ C2 a.e. Show that un → u in L2ρ (Ω) if and only if un → u in L2 (Ω). 12. More classes of orthogonal polynomials may be derived by applying the GramSchmidt procedure to {1, x, x2 , . . . } in L2ρ (a, b) for various choices of ρ, a, b, two of which occur in Exercise 9. Another class is the Laguerre polynomials, corresponding to a = 0, b = ∞ and ρ(x) = e−x . Find the first four Laguerre polynomials. 13. Show that equality holds in the Schwarz inequality (6.2.1) if and only if x, y are linearly dependent. 14. Show by examples that the best approximation problem (6.4.1) may not have a solution if E is either not closed or not convex. 15. If Ω is a compact subset of RN , show that C(Ω) is a subspace of L2 (Ω) which isn’t closed. 16. Show that ∞ 1 √ , cos nπx, sin nπx 2 n=1 (6.9.1) is an orthonormal set in L2 (−1, 1). (Completeness of this set will be shown in Chapter 8.) 17. For nonnegative integers n define vn (x) = cos (n cos−1 x) a) Show that vn+1 (x) + vn−1 (x) = 2xvn (x) for n = 1, 2, . . . 91 b) Show that vn is a polynomial of degree n (the so-called Chebyshev polynomials). 2 c) Show that {vn }∞ n=1 are orthogonal in Lρ (−1, 1) where the weight function is 1 ρ(x) = √1−x 2. 18. If H is a Hilbert space we say a sequence {xn }∞ n=1 converges weakly to x (notation: w xn → x) if hxn , yi → hx, yi for every y ∈ H. w a) Show that if xn → x then xn → x. b) Prove that the converse is false, as long as dim(H) = ∞, by showing that if w {en }∞ n=1 is any orthonormal sequence in H then en → 0, but limn→∞ en doesn’t exist. 
c) Prove that if $x_n \overset{w}{\to} x$ then $\|x\| \le \liminf_{n\to\infty}\|x_n\|$.

d) Prove that if $x_n \overset{w}{\to} x$ and $\|x_n\| \to \|x\|$ then $x_n \to x$.

19. Let $M_1, M_2$ be closed subspaces of a Hilbert space $H$ and suppose $M_1 \perp M_2$. Show that $M_1 \oplus M_2 = \{x \in H : x = y + z,\ y \in M_1,\ z \in M_2\}$ is also a closed subspace of $H$.

Chapter 7

Distributions

In this chapter we will introduce and study the concept of distribution, also commonly known as generalized function. To motivate this study we first mention two examples.

Example 7.1. The wave equation $u_{tt} - u_{xx} = 0$ has the general solution $u(x,t) = F(x+t) + G(x-t)$, where $F, G$ must be in $C^2(\mathbb{R})$ in order that $u$ be a classical solution. However, from a physical point of view there is no apparent reason why such smoothness restrictions on $F, G$ should be needed. Indeed the two terms represent waves of fixed shape moving to the left and right respectively with speed one, and it ought to be possible to allow the shape functions $F, G$ to have discontinuities. The calculus of distributions will allow us to regard $u$ as a solution of the wave equation in a well defined sense even for such irregular $F, G$.

Example 7.2. In physics and engineering one frequently encounters the Dirac delta function $\delta(x)$, which has the properties
$$\delta(x) = 0 \quad x \ne 0 \qquad \int_{-\infty}^\infty \delta(x)\,dx = 1 \qquad (7.0.1)$$
Unfortunately these properties are inconsistent for ordinary functions – any function which is zero except at a single point must have integral zero. The theory of distributions will allow us to give a precise mathematical meaning to the delta function and in so doing justify formal calculations with it.

Roughly speaking, a distribution is a mathematical object whose unique identity is specified by how it acts on all test functions. It is in a sense quite analogous to a function in the ordinary sense, whose unique identity is specified by how it acts on (i.e. how it maps) all points in its domain. As we will see, most ordinary functions may be viewed as a special kind of distribution, which explains the 'generalized function' terminology. In addition, there is a well defined calculus of distributions which is basic to the modern theory of partial differential equations. We now start to give precise meaning to these concepts.

7.1 The space of test functions

For any real or complex valued function $f$ defined on some domain in $\mathbb{R}^N$, the support of $f$, denoted $\mathrm{supp}\,f$, is the closure of the set $\{x : f(x) \ne 0\}$.

Definition 7.1. If $\Omega$ is any open set in $\mathbb{R}^N$ the space of test functions on $\Omega$ is
$$C_0^\infty(\Omega) = \{\phi \in C^\infty(\Omega) : \mathrm{supp}\,\phi \text{ is compact in } \Omega\} \qquad (7.1.1)$$

This function space is also commonly denoted $\mathcal{D}(\Omega)$, which is the notation we will use from now on. Clearly $\mathcal{D}(\Omega)$ is a vector space, but it may not be immediately evident that it contains any function other than $\phi \equiv 0$.

Example 7.3. Define
$$\phi(x) = \begin{cases} e^{\frac{1}{x^2-1}} & |x| < 1 \\ 0 & |x| \ge 1 \end{cases} \qquad (7.1.2)$$
Then $\phi \in \mathcal{D}(\Omega)$ with $\Omega = \mathbb{R}$. To see this one only needs to check that $\lim_{x\to 1-}\phi^{(k)}(x) = 0$ for $k = 0, 1, \ldots$, and similarly at $x = -1$. Once we have one such function then many others can be derived from it by dilation ($\phi(x) \to \phi(\alpha x)$), translation ($\phi(x) \to \phi(x-\alpha)$), scaling ($\phi(x) \to \alpha\phi(x)$), differentiation ($\phi(x) \to \phi^{(k)}(x)$) or any linear combination of such terms. See also Exercise 1.

Next, we define convergence in the test function space.

Definition 7.2. If $\phi_n \in \mathcal{D}(\Omega)$ then we say $\phi_n \to 0$ in $\mathcal{D}(\Omega)$ if

(i) There exists a compact set $K \subset \Omega$ such that $\mathrm{supp}\,\phi_n \subset K$ for every $n$

(ii) $\lim_{n\to\infty}\max_{x\in\Omega}|D^\alpha\phi_n(x)| = 0$ for every multiindex $\alpha$

We also say that $\phi_n \to \phi$ in $\mathcal{D}(\Omega)$ provided $\phi_n - \phi \to 0$ in $\mathcal{D}(\Omega)$.
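A quick numerical look at Example 7.3 (an illustrative sketch, not part of the notes): sampling $\phi$ confirms that it vanishes identically outside $[-1,1]$ and decays extremely rapidly as $|x| \to 1^-$, consistent with all derivatives tending to zero there.

```python
import numpy as np

def phi(x):
    # The bump function of Example 7.3: exp(1/(x^2-1)) on (-1,1), 0 elsewhere
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(1.0 / (x[inside] ** 2 - 1.0))
    return out

print(phi([-1.5, -0.5, 0.0, 0.5, 1.5]))  # 0 outside (-1,1); e^{-4/3}, e^{-1} inside
print(phi(0.9999))   # ~ e^{-5000}: underflows to 0, reflecting the flat decay at 1
```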
By specifying what convergence of a sequence in D(Ω) means, we are partly, but not completely, specifying 94 a topology on D(Ω). We will have no need of further details about this topology, see Chapter 6 of [31] for more on this point. 7.2 The space of distributions sec72 We come now to the basic definition – a distribution is a continuous linear functional on D(Ω). More precisely Definition 7.3. A linear mapping T : D(Ω) → C is a distribution on Ω if T (φn ) → T (φ) whenever φn → φ in D(Ω). The set of all distributions on Ω is denoted D0 (Ω). The distribution space D0 (Ω) is another example of a dual space X0 , the set of all continuous linear functionals on X, which can be defined if X is any vector space in which convergence of sequences is defined. The dual space is always itself a vector space. We’ll discuss many more examples of dual spaces later on. We emphasize that the distribution T is defined solely in terms of the values it assigns to test functions φ, in particular two distributions T1 , T2 are equal if and only if T1 (φ) = T2 (φ) for every φ ∈ D(Ω). To clarify the concept, let us discuss a number of examples. Example: If f ∈ L1 (Ω) define Z T (φ) = f (x)φ(x) dx (7.2.1) Ω Obviously |T (φ)| ≤ ||f ||L1 (Ω) ||φ||L∞ (Ω) , so that T : D(Ω) → C and is also clearly linear. If φn → φ in D(Ω) then by the same token |T (φn ) − T (φ)| ≤ ||f ||L1 (Ω) ||φn − φ||L∞ (Ω) → 0 (7.2.2) so that T is continuous. Thus T ∈ D0 (Ω). Because of the fact that φ must have compact support in Ω one does not really need f to be in L1 (Ω) but only in L1 (K) for any compact subset K of Ω. For any 1 ≤ p ≤ ∞ let us define Lploc (Ω) = {f : f ∈ Lp (K) for any compact set K ⊂ Ω} 95 (7.2.3) Tf Thus a function in Lploc (Ω) can become infinite arbitrarily rapidly at the boundary of Ω. We say that fn → f in Lploc (Ω) if fn → f in Lp (K) for every compact subset K ⊂ Ω. Functions in L1loc are said to be locally integrable on Ω. Now if we let f ∈ L1loc (Ω) the definition (7.2.1) still produces a finite value, since Z Z |T (φ)| = f (x)φ(x) dx = f (x)φ(x) dx ≤ ||f ||L1 (K) ||φ||L∞ (K) < ∞ (7.2.4) Ω K if K = supp φ. Similarly if φn → φ in D(Ω) we can choose a fixed compact set K ⊂ Ω containing supp φ and supp φn for every n, hence again |T (φn ) − T (φ)| ≤ ||f ||L1 (K) ||φn − φ||L∞ (K) → 0 (7.2.5) so that T ∈ D0 (Ω). When convenient, we will denote the distribution in (7.2.1) by Tf . The correspondence f → Tf allows us to think of L1loc (Ω) as a special subspace of D0 (Ω), i.e. locally integrable functions are always distributions. The point is that a function f can be thought of as a mapping Z φ→ f φ dx (7.2.6) Ω instead of the more conventional x → f (x) (7.2.7) In fact for L1loc functions the former is in some sense more natural since it doesn’t require us to make special arrangements for sets of measure zero. A distribution of the form T = Tf for some f ∈ L1loc (Ω) is sometimes referred to as a regular distribution, while any distribution not of this type is a singular distribution. The correspondence f → Tf is also one-to-one. This is a slightly technical result in measure theory which we leave for the exercises, for those with the necessary background. See also Theorem 2, Chapter II of [32]: th71 Theorem 7.1. Two distributions Tf1 , Tf2 on Ω are equal if and only if f1 = f2 almost everywhere on Ω. Example 7.4. Fix a point x0 ∈ Ω and define T (φ) = φ(x0 ) (7.2.8) Clearly T is defined and linear on D(Ω) and if φn → φ in D(Ω) then |T (φn ) − T (φ)| = |φn (x0 ) − φ(x0 )| → 0 96 (7.2.9) Tdelta since φn → φ uniformly on Ω. 
We claim that T is not of the form Tf for any f ∈ L1loc (Ω) (i.e. not a regular distribution). To see this, suppose some such f existed. We would then have Z f (x)φ(x) dx = 0 (7.2.10) Ω for any test function φ with φ(x0 ) = 0. In particular if Ω0 = Ω\{x0 } and φ ∈ D(Ω0 ) then defining φ(x0 ) = 0 we clearly have φ ∈ D(Ω) and T (φ) = 0, hence f = 0 a.e. on Ω0 and so on Ω, by Theorem 7.1. On the other hand we must also have, for any φ ∈ D(Ω) that Z φ(x0 ) = T (φ) = f (x)φ(x) dx (7.2.11a) Ω Z Z Z f (x)(φ(x) − φ(x0 )) dx + φ(x0 ) f (x) dx = φ(x0 ) f (x) dx (7.2.11b) = Ω Ω since f = 0 a.e. on Ω, and therefore Ω R f (x) dx = 1 a contradiction. R Note that f (x) = 0 for a.e x ∈ Ω. and Ω f (x) dx − 1 are precisely the formal properties of the delta function mentioned in Example 2. We define T to be the Dirac delta distribution with singularity at x0 , usually denoted δx0 , or simply δ in the case x0 = 0. By an acceptable abuse of notation, pretending that δ is an actual function, we may write a formula like Z δ(x)φ(x) dx = φ(0) (7.2.12) Ω Ω but we emphasize that this is simply a formal expression of (7.2.8), and any rigorous arguments must make use of (7.2.8) directly. In the same formal sense δx0 (x) = δ(x − x0 ) so that Z δ(x − x0 )φ(x) dx = φ(x0 ) (7.2.13) Ω ex-75 Example 7.5. Fix a point x0 ∈ Ω, a multiindex α and define T (φ) = (Dα φ)(x0 ) (7.2.14) One may show, as in the previous example, that T ∈ D0 (Ω). Example 7.6. Let Σ be a sufficiently smooth hypersurface in Ω of dimension m ≤ n − 1 and define Z T (φ) = φ(x) ds(x) (7.2.15) Σ where ds is the surface area element on Σ. Then T is a distribution on Ω sometimes referred to as the delta distribution concentrated on Σ, sometimes written as δΣ . 97 ex-77 Example 7.7. Let Ω = R and define Z T (φ) = lim →0+ |x|> φ(x) dx x (7.2.16) As we’ll show below, the indicated limit always exists and is finite for φ ∈ D(Ω) (even for φ ∈ C01 (Ω)). In general, a limit of the form Z lim f (x) dx (7.2.17) →0+ Ω∩|x−a|> R when it exists, is called the Cauchy principal value of Ω f (x) dx, which may be finite R R1 is divergent, even when Ω f (x) dx is divergent in the ordinary sense. For example −1 dx x regarded as either a Lebesgue integral or an improper Riemann integral, but Z dx lim =0 (7.2.18) →0+ 1>|x|> x To distinguish the principal value meaning of the integral, the notation Z pv f (x) dx (7.2.19) Ω may be used instead of (7.2.17), where the point a in question must be clear from context. Let us now check that (7.2.16) defines a distribution. If supp φ ⊂ [−M, M ] then since Z |x|> φ(x) dx = x Z M >|x|> φ(x) dx = x Z M >|x|> φ(x) − φ(0) dx + φ(0) x and the last term on the right is zero, we have Z T (φ) = lim →0+ ψ(x) dx Z M >|x|> 1 dx (7.2.20) x (7.2.21) M >|x|> where ψ(x) = (φ(x) − φ(0))/x. It now follows from the mean value theorem that Z |T (φ)| ≤ |ψ(x)| dx ≤ 2M ||φ0 ||L∞ (7.2.22) |x|<M 98 pv1x pvdef so T (φ) is defined and finite for all test functions. Linearity of T is clear, and if φn → φ in D(Ω) then (7.2.23) |T (φn ) − T (φ)| ≤ 2M ||φ0n − φ0 ||L∞ → 0 where M is chosen so that supp φn , supp φ ⊂ [−M, M ], and it follows that T is continuous. The distribution T is often denoted pv x1 , so for example pv x1 (φ) means the same thing as the right hand side of (7.2.16). For reasons which will become more clear later, it may also be referred to as pf x1 , pf standing for pseudofunction (also finite part). 
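The principal value can be evaluated numerically using the symmetrized form that appears in (7.2.20): folding the negative axis onto the positive axis gives $\mathrm{pv}\int \phi(x)/x\,dx = \int_0^\infty (\phi(x)-\phi(-x))/x\,dx$, with a bounded integrand. The sketch below is illustrative only; $\phi(x) = e^{-x^2}$ and $\phi(x) = xe^{-x^2}$ stand in for test functions even though they are not compactly supported.

```python
import numpy as np
from scipy.integrate import quad

def pv_one_over_x(phi, M=10.0):
    # Folded form of (7.2.16): the even part of phi cancels, the difference
    # quotient (phi(x) - phi(-x))/x stays bounded near x = 0
    return quad(lambda x: (phi(x) - phi(-x)) / x, 0.0, M)[0]

print(pv_one_over_x(lambda x: np.exp(-x**2)))      # ~ 0 (even function)
print(pv_one_over_x(lambda x: x * np.exp(-x**2)))  # ~ sqrt(pi) = 1.7724...
```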
7.3 7.3.1 Algebra and Calculus with Distributions Multiplication of distributions As noted above D0 (Ω) is a vector space, hence distributions can be added and multiplied by scalars. In general it is not possible to multiply together arbitrary distributions – for example δ 2 = δ · δ cannot be defined in any consistent way. It is always possible, however, to multiply a distribution by a C ∞ function. More precisely, if a ∈ C ∞ (Ω) and T ∈ D0 (Ω) then we may define the product aT as a distribution via Definition 7.4. aT (φ) = T (aφ) φ ∈ D(Ω) Clearly aφ ∈ D(Ω) so that the right hand side is well defined, and it it straightforward to check that aT satisfies the necessary linearity and continuity conditions. One should also note that if T = Tf then this definition is consistent with ordinary pointwise multiplication of the functions f and a. 7.3.2 Convergence of distributions An appropriate definition of convergence of a sequence of distributions is as follows. convD Definition 7.5. If T, Tn ∈ D0 (Ω) for n = 1, 2 . . . then we say Tn → T in D0 (Ω) (or in the sense of distributions) if Tn (φ) → T (φ) for every φ ∈ D(Ω). 99 It is an interesting fact, which we shall not prove here, that it is not necessary to assume that the limit T belongs to D0 (Ω), that is to say, if T (φ) := limn→∞ Tn (φ) exists for every φ ∈ D(Ω) then necessarily T ∈ D0 (Ω), (see Theorem 6.17 of [31]). Example 7.8. If fn ∈ L1loc (Ω) and fn → f in L1loc (Ω) then the corresponding distribution Tfn → Tf in the sense of distributions, since Z |Tfn (φ) − Tf (φ)| ≤ |fn − f ||φ| dx ≤ ||fn − f ||L1 (K) ||φ||L∞ (Ω) (7.3.1) K where K is the support of φ. Because of the one-to-one correspondence f ↔ Tf , we will usually write instead that fn → f in the sense of distributions. Example 7.9. Define fn (x) = n 0 < x < n1 0 otherwise (7.3.2) We claim that fn → δ in the sense of distributions. We see this by first observing that Z 1 Z 1 n n |Tfn (φ) − δ(φ)| = n φ(x) dx − φ(0) = n (φ(x) − φ(0)) dx (7.3.3) 0 0 By the continuity of φ, if > 0 there exists δ > 0 such that |φ(x) − φ(0)| ≤ whenever |x| ≤ δ. Thus if we choose n > 1δ there follows Z n 1 n Z |φ(x) − φ(0)| dx ≤ n 0 1 n dx = (7.3.4) 0 from which the conclusion follows. Note that the formal properties of the δ function, R δ(x) = 0, x 6= 0, δ(0) = +∞, δ(x) dx = 1, are clearly reflected in the pointwise limit of the sequence fn , but it is only the distributional definition that is mathematically satisfactory. Sequences converging to δ play a very large role in methods of applied mathematics, especially in the theory of differential and integral equations. The following theorem includes many cases of interest. dst Theorem 7.2. Suppose fn ∈ L1 (RN ) for n = 1, 2, . . . and assume R a) RN fn (x) dx = 1 for all n. b) There exists a constant C such that ||fn ||L1 (RN ) ≤ C for all n. 100 ds2 c) limn→∞ R |x|>δ |fn (x)| dx = 0 for all δ > 0. If φ is bounded on RN and continuous at x = 0 then Z fn (x)φ(x) dx = φ(0) lim n→∞ (7.3.5) RN and in particular fn → δ in D0 (RN ). Proof: For any φ ∈ D(Ω) we have Z Z fn (x)φ(x) dx − φ(0) = RN fn (x)(φ(x) − φ(0)) dx (7.3.6) ds1 RN and so we will be done if we show that that the integral on the right tends to zero as n → ∞. Fix > 0 and choose δ > 0 such that |φ(x) − φ(0)| ≤ whenever |x| < δ. 
Write the integral on the right in (7.3.6) as the sum An,δ + Bn,δ where Z Z An,δ = fn (x)(φ(x) − φ(0)) dx Bn,δ = fn (x)(φ(x) − φ(0)) dx (7.3.7) |x|≤δ |x|>δ We then have, by obvious estimations, that Z |An,δ | ≤ |fn (x)| ≤ C (7.3.8) RN while Z |fn (x)| dx = 0 lim sup |Bn,δ | ≤ lim sup 2||φ||L∞ n→∞ Thus n→∞ Z lim sup n→∞ RN (7.3.9) |x|>δ fn (x)φ(x) dx − φ(0) ≤ C (7.3.10) and the conclusion follows since > 0 is arbitrary. 2 It is often the case that fn ≥ 0 for all n, in which case assumption b) follows automatically from a) with C = 1. We will refer to any sequence satisfying the assumptions of Theorem 7.2 as a delta sequence. A common way to construct such a sequence is to R pick any f ∈ L1 (RN ) with RN f (x) dx = 1 and set fn (x) = nN f (nx) (7.3.11) The verification of this is left to the exercises. If, for example, we choose f (x) = χ[0,1] (x), then the resulting sequence fn (x) is the same as is defined in (7.3.2). Since we can also choose such an f in D(RN ) we also have 101 7311 dst2 N Proposition 7.1. There exists a sequence {fn }∞ n=1 such that fn ∈ D(R ) and fn → δ 0 N in D (R ). 7.3.3 Derivative of a distribution Next we explain how it is possible to define the derivative of an arbitrary distribution. For the moment, suppose (a, b) ⊂ R, f ∈ C 1 (a, b) and T = Tf is the corresponding distribution. We clearly then have from integration by parts that Z b Z b 0 f (x)φ0 (x) dx = −Tf (φ0 ) (7.3.12) f (x)φ(x) dx = − Tf 0 (φ) = a a This suggests defining T 0 (φ) = −T (φ0 ) φ ∈ C0∞ (a, b) (7.3.13) whenever T ∈ D0 (a, b). The previous equation shows that this definition is consistent with the ordinary concept of differentiability for C 1 functions. Clearly, T 0 (φ) is always defined, since φ0 is a test function whenever φ is, linearity of T 0 is obvious, and if φn → φ in C0∞ (a, b) then φ0n → φ0 also in C0∞ (a, b) so that T 0 (φn ) = −T (φ0n ) → −T (φ0 ) = T 0 (φ) (7.3.14) Thus, T 0 ∈ D0 (a, b). Example: Consider the case of the Heaviside (unit step) function H(x) 0 x<0 H(x) = 1 x>0 (7.3.15) If we seek the derivative of H (i.e. of TH ) according to the above distributional definition, then we compute Z ∞ Z ∞ 0 0 0 H(x)φ (x) dx = − φ0 (x) dx = φ(0) (7.3.16) H (φ) = −H(φ ) = − −∞ 0 (where we use the natural notation H 0 in place of TH0 ). This means that H 0 (φ) = δ(φ) for any test function φ, and so H 0 = δ in the sense of distributions. This relationship clearly captures the fact the H 0 = 0 at all points where the derivative exists in the classical sense, since we think of the delta function as being zero on any interval not containing 102 the origin. Since H is not differentiable at the origin, the distributional derivative is itself a distribution which is not a function. Since δ is again a distribution, it will itself have a derivative, namely δ 0 (φ) = −δ(φ0 ) = −φ0 (0) (7.3.17) a distribution of the type discussed in Example 7.5, often referred to as the dipole distribution, which of course we may regard as the second derivative of H. For an arbitrary domain Ω ⊂ RN and sufficiently smooth function f we have the similar integration by parts formula (see (18.2.3)) Z Z ∂f ∂φ φ dx = − f dx (7.3.18) Ω ∂xi Ω ∂xi leading to the definition Definition 7.6. ∂T (φ) = −T ∂xi ∂φ ∂xi φ ∈ D(Ω) (7.3.19) ∂T As in the one dimensional case we easily check that ∂x belongs to D0 (Ω) whenever T i does. This has the far reaching consequence that every distribution is infinitely differentiable in the sense of distributions. 
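Returning for a moment to the one dimensional Heaviside example, the identity $H' = \delta$ is easy to test numerically (a sketch, not part of the notes, using the bump function of Example 7.3 as the test function): $H'(\phi) = -H(\phi') = -\int_0^1 \phi'(x)\,dx$ should equal $\phi(0)$.

```python
import numpy as np
from scipy.integrate import quad

# Bump function of Example 7.3 and its derivative (by the chain rule)
phi  = lambda x: np.exp(1.0 / (x**2 - 1.0)) if abs(x) < 1 else 0.0
dphi = lambda x: -2.0 * x / (x**2 - 1.0)**2 * phi(x) if abs(x) < 1 else 0.0

# H'(phi) = -H(phi') = -int_0^1 phi'(x) dx, which should equal phi(0) = e^{-1}
print(-quad(dphi, 0.0, 1.0)[0], phi(0.0))   # both ~ 0.36787...
```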
Furthermore we have the general formula, obtained by repeated application of the basic definition, that (Dα T )(φ) = (−1)|α| T (Dα φ) (7.3.20) for any multiindex α. A simple and useful property is prop72 Proposition 7.2. If Tn → T in D0 (Ω) then Dα Tn → Dα T in D0 (Ω) for any multiindex α. Proof: Dα Tn (φ) = (−1)|α| Tn (Dα φ) → (−1)|α| T (Dα φ) = Dα T (φ) for any test function φ. 2 Next we consider a more generic one dimensional situation. Let x0 ∈ R and consider a function f which is C ∞ on (−∞, x0 ) and on (x0 , ∞), and for which f (k) has finite left 103 and right hand limits at x = x0 , for any k. Thus, at the point x = x0 , f or any of its derivatives may have a jump discontinuity, and we denote ∆k f = lim f (k) (x) − lim f (k) (x) x→x0 − x→x0 + (and by convention ∆f = ∆0 f .) Define also (k) f (x) x 6= x0 (k) [f ](x) = undefined x = x0 (7.3.21) (7.3.22) which we’ll refer to as the pointwise k’th derivative. The notation f (k) will always be understood to mean the distributional derivative unless otherwise stated. The distinction between f (k) and [f (k) ] is crucial, for example if f (x) = H(x), the Heaviside function, then H 0 = δ but [H 0 ] = 0 for x 6= 0, and is undefined for x = 0. For f as described above, we now proceed to calculate the distributional derivative. If φ ∈ C0∞ (R) we have Z ∞ Z x0 Z ∞ 0 0 f (x)φ (x) dx = f (x)φ (x) dx + f (x)φ0 (x) dx (7.3.23a) −∞ −∞ x0 Z x0 Z −∞ x0 −∞ 0 = f (x)φ(x) −∞ − f (x)φ(x) dx + f (x)φ(x) x0 − f 0 (x)φ(x) dx (7.3.23b) −∞ x0 Z ∞ =− [f 0 (x)]φ(x) dx + (f (x0 −) − f (x0 +))φ(x0 ) (7.3.23c) −∞ It follows that 0 Z ∞ f (φ) = [f 0 (x)]φ(x) dx + (∆f )φ(x0 ) (7.3.24) −∞ or f 0 = [f 0 ] + (∆f )δ(x − x0 ) (7.3.25) Note in particular that f 0 = [f 0 ] if and only if f is continuous at x0 . The function [f 0 ] satisfies all of the same assumptions as f itself, with ∆f 0 = ∆[f 0 ], thus we can differentiate again in the distribution sense to obtain f 00 = [f 0 ]0 + (∆f )δ 0 (x − x0 ) = [f 00 ] + (∆1 f )δ(x − x0 ) + (∆f )δ 0 (x − x0 ) (7.3.26) Here we use the evident fact that the distributional derivative of δ(x − x0 ) is δ 0 (x − x0 ). 104 A similar calculation can be carried out for higher derivatives of f , leading to the general formula k−1 X f (k) = [f (k) ] + (∆j f )δ (k−1−j) (x − x0 ) (7.3.27) j=0 One can also obtain a similar formula if f is allowed to have any finite number of such singular points. Example 7.10. Let f (x) = x x<0 cos x x > 0 Clearly f satisfies all of the assumptions mentioned above with x0 = 0, and 1 x<0 0 [f ](x) = − sin x x > 0 0 x<0 00 [f ](x) = − cos x x > 0 (7.3.28) (7.3.29) (7.3.30) so that ∆f = 1, ∆1 f = −1. Thus f 0 = [f 0 ] + δ f 00 = [f 00 ] − δ + δ 0 (7.3.31) Here is one more instructive example in the one dimensional case. Example 7.11. Let ( log x x > 0 f (x) = 0 x≤0 (7.3.32) Since f ∈ L1loc (R) we may regard it as a distribution on R, but its pointwise derivative H(x)/x is not locally integrable, so does not have an obvious distributional meaning. Nevertheless f 0 must exist in the sense of D0 (R). To find it we use the definition above, Z ∞ Z ∞ 0 0 0 f (φ) = −f (φ ) = − φ (x) log x dx = − lim φ0 (x) log x dx (7.3.33) →0+ 0 Z ∞ φ(x) = lim φ() log + dx (7.3.34) →0+ x Z ∞ φ(x) = lim φ(0) log + dx (7.3.35) →0+ x 105 7327 where the final equality is valid because the difference between it and the previous line is lim→0 (φ() − φ(0)) log = 0. The functional defined by the final expression above will be denoted1 as pf H(x) , i.e. 
x pf H(x) x ∞ Z (φ) = lim φ(0) log + →0+ φ(x) dx x (7.3.36) Since we have already established that the derivative of a distribution is also a distriH(x) bution, it follows that pf x ∈ D0 (R) and in particular the limit here always exists for φ ∈ D(R). It should be emphasized that if φ(0) 6= 0 then neither of the two terms on the right hand side in (7.3.36) will have a finite limit separately, but the sum always will. For a test function φ with support disjoint from the singularity at x = 0, the action H(x) of the distribution pf x coincides with that of the ordinary function H(x)/x, as we might expect. Next we turn to examples involving partial derivatives. exm7-12 Example 7.12. Let F ∈ L1loc (R) and set u(x, t) = F (x + t). We claim that utt − uxx = 0 in D0 (R2 ). Recall that this is the point that was raised in the first example at the beginning of this chapter. A similar argument works for F (x − t). To verify this claim, first observe that for any φ ∈ D(R2 ) ZZ F (x + t)(φtt (x, t) − φxx (x, t)) dxdt (7.3.37) (utt − uxx )(φ) = u(φtt − φxx ) = R2 Make the change of coordinates ξ =x−t η =x+t (7.3.38) to obtain Z ∞ (utt − uxx )(φ) = 2 Z ∞ Z ∞ φξη (ξ, η) dξ dη = 2 F (η) −∞ −∞ F (η) φη (ξ, η)|∞ ξ=−∞ dη = 0 −∞ (7.3.39) since φ has compact support. exm7-13 Example 7.13. Let N ≥ 3 and define u(x) = 1 1 |x|N −2 Recall the pf notation was introduced earlier in Section 7.2. 106 (7.3.40) pf1 We claim that in D0 (RN ) ∆u = CN δ (7.3.41) 7341 where CN = (2 − N )ΩN −1 and ΩN −1 is the surface area2 of the unit sphere in RN . First note that for any R we have Z Z R 1 N −1 |u(x)| dx = ΩN −1 r dr < ∞ (7.3.42) N −2 |x|<R 0 r (using, for example (18.3.1)) so u ∈ L1loc (RN ) and in particular u ∈ D0 (RN ). It is natural here to use spherical coordinates in RN , see Section 18.3 for a review. In particular the expression for the Laplacian in spherical coordinates may be derived from the chain rule, as was done in (2.3.67) for the two dimensional case. When applied to a function depending only on r = |x|, such as u, the result is ∆u = urr + N −1 ur r (7.3.43) (see Exercise 17 of Chapter 2) and it follows that ∆u(x) = 0 for x 6= 0. We may use Green’s identity (18.2.6) to obtain, for any φ ∈ D(RN ) Z Z u(x)∆φ(x) dx = lim u(x)∆φ(x) dx ∆u(φ) = u(∆φ) = →0+ |x|> RN Z Z ∂φ ∂u ∆u(x)φ(x) dx + = lim u(x) (x) − φ(x) (x) dS(x) →0+ ∂n ∂n |x|> |x|= ∂ = − ∂r on {x : |x| = } this simplifies to Z 2−N 1 ∂φ ∆u(φ) = lim φ(x) − N −2 (x) dS(x) →0+ |x|= N −1 ∂r Since ∆u = 0 for x 6= 0 and (7.3.44) (7.3.45) ∂ ∂n (7.3.46) We next observe that Z lim →0+ |x|= 2−N φ(x) dS(x) = (2 − N )ΩN −1 φ(0) N −1 2 (7.3.47) The usual notation is to use N −1 rather than N as the subscript because the sphere is a surface of dimension N − 1. 107 radialNlapla since the average of φ over the sphere of radius converges to φ(0) as → 0. Finally, the second integral tends to zero, since Z ΩN −1 N −1 1 ∂φ ≤ (x) dS(x) ||∇φ||L∞ → 0 (7.3.48) N −2 ∂r N −2 |x|= Thus (7.3.41) holds. When N = 2 an analogous calculation shows that if u(x) = log |x| then ∆u = 2πδ in D0 (R2 ). 7.4 Convolution and distributions If f, g are locally integrable functions on RN the classical convolution of f and g is defined to be Z f (x − y)g(y) dy (7.4.1) (f ∗ g)(x) = RN whenever the integral is defined. By an obvious change of variable we see that convolution is commutative, f ∗ g = g ∗ f . Proposition 7.3. If f ∈ Lp (RN ) and g ∈ Lq (RN ) then f ∗ g ∈ Lr (RN ) if 1 + 1r = so in particular is defined almost everywhere. 
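A numerical check of (7.3.41) in the case $N = 3$ (an illustration, not from the notes): for a radial function, $\Delta\phi = \phi'' + \frac{2}{r}\phi'$ by (7.3.43), and $u(\Delta\phi)$ should equal $C_3\,\phi(0) = (2-3)\Omega_2\,\phi(0) = -4\pi\,\phi(0)$. The Gaussian below is a stand-in for a test function; it is not compactly supported but decays rapidly enough for the computation to make sense.

```python
import numpy as np
from scipy.integrate import quad

phi = lambda r: np.exp(-r**2)                       # radial stand-in test function
lap_phi = lambda r: (4*r**2 - 6) * np.exp(-r**2)    # phi'' + (2/r) phi'

# u(Delta phi) = int_{R^3} |x|^{-1} Delta phi dx in spherical coordinates
value = quad(lambda r: (1.0 / r) * lap_phi(r) * 4.0 * np.pi * r**2, 0.0, np.inf)[0]
print(value, -4.0 * np.pi * phi(0.0))               # both ~ -12.566
```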
Furthermore ||f ∗ g||Lr (RN ) ≤ ||f ||Lp (RN ) ||g||Lq (RN ) 1 p + 1q , (7.4.2) The inequality (7.4.2) is Young’s convolution inequality, and we refer to [38] (Theorem 9.2) for a proof. In the case r = ∞ it can actually be shown that f ∗ g ∈ C(RN ). Our goal here is to generalize the definition of convolution in such a way that at least one of the two factors can be a distribution. Let us introduce the notations for translation and inversion of a function f , (τh f )(x) = f (x − h) (7.4.3) fˇ(x) = f (−x) (7.4.4) so that f (x − y) = (τx fˇ)(y). If f ∈ D(RN ) then so is (τx fˇ) so that (f ∗ g)(x) may be regarded as Tg (τx fˇ), i.e. the value obtained when the distribution corresponding to the locally integrable function g acts on the test function (τx fˇ). This motivates the following definition. 108 youngci convdp Definition 7.7. If T ∈ D0 (RN ) and φ ∈ D(RN ) then (T ∗ φ)(x) = T (τx φ̌). By this definition (T ∗φ)(x) exists and is finite for every x ∈ RN but other smoothness or decay properties of T ∗ φ may not be apparent. Example 7.14. If T = δ then (T ∗ φ)(x) = δ(τx φ̌) = (τx φ̌)(y)|y=0 = φ(x − y)|y=0 = φ(x) (7.4.5) Thus, δ is the ’convolution identity’, δ ∗ φ = φ at least for φ ∈ D(RN ). Formally this corresponds to the widely used formula Z δ(x − y)φ(y) dy = φ(x) (7.4.6) RN If Tn → δ in D0 (RN ) then likewise (Tn ∗ φ)(x) = Tn (τx φ̌) → δ(τx φ̌) = φ(x) (7.4.7) ci3 for any fixed x ∈ RN . A key property of convolution is that in computing a derivative Dα (T ∗ φ), the derivative may be applied to either factor in the convolution. More precisely we have the following theorem. convth1 Theorem 7.3. If T ∈ D0 (RN ) and φ ∈ D(RN ) then T ∗ φ ∈ C ∞ (RN ) and for any multi-index α Dα (T ∗ φ) = Dα T ∗ φ = T ∗ Dα φ (7.4.8) Proof: First observe that (−1)|α| Dα (τx φ̌) = τx ((Dα φ)ˇ) (7.4.9) and applying T to these identical test functions we get the right hand equality in (7.4.8). We refer to Theorem 6.30 of [31] for the proof of the left hand equality. When f, g are continuous functions of compact support it is elementary to see that supp (f ∗ g) ⊂ supp f + supp g. The same property holds for T ∗ φ if T ∈ D0 (RN ) and φ ∈ D(RN ), once a proper definition of the support of a distribution is given. If ω ⊂ Ω is an open set we say that T = 0 in ω if T (φ) = 0 whenever φ ∈ D(Ω) and supp (φ) ⊂ ω. If W denotes the largest open subset of Ω on which T = 0 (equivalently the 109 ci2 union of all open subsets of Ω on which T = 0) then the support of T is the complement of W in Ω. In other words, x 6∈ supp T if there exists > 0 such that T (φ) = 0 whenever φ is a test function with support in B(x, ). One can easily verify that the support of a distribution is closed, and agrees with the usual notion of support of a function, up to sets of measure zero. The set of distributions of compact support in Ω forms a vector subspace of D0 (Ω) denoted E 0 (Ω). This notation is appropriate because E 0 (Ω) turns out to be precisely the dual space of C ∞ (RN ) =: E(RN ) when a suitable definition of convergence is given, see for example Chapter II, section 5 of [32]. If now T ∈ E 0 (RN ) and φ ∈ D(RN ), we observe that supp (τx φ̌) = x − supp φ (7.4.10) (T ∗ φ)(x) = T (τx φ̌) = 0 (7.4.11) Thus unless there is a nonempty intersection of supp T and x − supp φ, in other words, x ∈ supp T + supp φ. Thus from these remarks and Theorem 7.3 we have convth2 Proposition 7.4. If T ∈ E 0 (RN ) and φ ∈ D(RN ) then supp (T ∗ φ) ⊂ supp T + supp φ (7.4.12) and in particular T ∗ φ ∈ D(RN ). 
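The support property in Proposition 7.4 is easy to see in a discrete sketch (illustration only, with invented data): convolving samples of indicator functions supported in $[1,2]$ and $[3,4]$ produces a function supported, up to grid resolution, in $[1,2] + [3,4] = [4,6]$.

```python
import numpy as np

h = 0.01
x = np.arange(0.0, 8.0, h)
f = np.where((x >= 1) & (x <= 2), 1.0, 0.0)   # supp f = [1, 2]
g = np.where((x >= 3) & (x <= 4), 1.0, 0.0)   # supp g = [3, 4]

conv = np.convolve(f, g) * h                  # Riemann-sum approximation of f*g
xc = np.arange(len(conv)) * h                 # output grid also starts at 0
support = xc[conv > 1e-10]
print(support.min(), support.max())           # ~ 4.0 and ~ 6.0
```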
Convolution provides an extremely useful and convenient way to approximate functions and distributions by very smooth functions, the exact sense in which the approximation takes place being dependent on the object being approximated. We will discuss several results of this type. thuapprox Theorem 7.4. Let f ∈ C(RN ) with supp f compact in RN . Pick φ ∈ D(RN ), with R φ(x) dx = 1, set φn (x) = nN φ(nx) and fn = f ∗ φn . Then fn ∈ D(RN ) and fn → f RN uniformly on RN . Proof: The fact that fn ∈ D(RN ) is immediate from Proposition 7.4. Fix > 0. By the assumption that f is continuous and of compact support it must be uniformly continuous on RN so there exists δ > 0 such that |f (x) − f (z)| < if |x − z| < δ. Now choose n0 such that supp φn ⊂ B(0, δ) for n > n0 . We then have, for n > n0 that Z |fn (x) − f (x)| = (fn (x − y) − fn (x))φn (y) dy ≤ (7.4.13) RN Z |fn (x − y) − f (x)||φn (y)| dy ≤ ||φ||L1 (RN ) (7.4.14) |y|<δ 110 and the conclusion follows. 2 thLpApprox If f is not assumed continuous then of course it is not possible for there to exist fn ∈ D(RN ) converging uniformly to f . However the following can be shown. R Theorem 7.5. Let f ∈ Lp (RN ), 1 ≤ p < ∞. Pick φ ∈ D(RN ), with RN φ(x) dx = 1, set φn (x) = nN φ(nx) and fn = f ∗ φn . Then fn ∈ C ∞ (RN ) ∩ Lp (RN ) and fn → f in Lp (RN ). Proof: If > 0 we can find g ∈ C(RN ) of compact support such that ||f − g||Lp (RN ) < . If gn = g ∗ φn then ||f − fn ||Lp (RN ) ≤ ||f − g||Lp (RN ) + ||g − gn ||Lp (RN ) + ||fn − gn ||Lp (RN ) (7.4.15) ≤ C||f − g||Lp (RN ) + ||g − gn ||Lp (RN ) (7.4.16) where we have used Young’s convolution inequality (7.4.2) to obtain ||fn − gn ||Lp (RN ) ≤ ||φn ||L1 (RN ) ||f − g||Lp (RN ) = ||φ||L1 (RN ) ||f − g||Lp (RN ) (7.4.17) Since gn → g uniformly by Theorem 7.4 and g − gn has support in a fixed compact set independent of n, it follows that ||g −gn ||Lp (RN ) → 0, and so lim supn→∞ ||f −fn ||Lp (RN ) ≤ C. Further refinements and variants of these results can be proved, see for example Section C.4 of [10]. Next consider the even more general case that T ∈ D0 (RN ). As in Proposition 7.1 we can choose ψn ∈ D(RN ) such that ψn → δ in D0 (RN ). Set Tn = T ∗ ψn , so that Tn ∈ C ∞ (RN ). If φ ∈ D(RN ) we than have Tn (φ) = (Tn ∗ φ̌)(0) = ((Tn ∗ ψn ) ∗ φ̌)(0) = ((T ∗ ψn ) ∗ φ̌)(0) = (T ∗ (ψn ∗ φ̌))(0) = T ((ψn ∗ φ̌)ˇ) (7.4.18) (7.4.19) It may be checked that ψn ∗ φ̌ → φ̌ in D(RN ), thus Tn (φ) → T (φ) for all φ ∈ D(RN ), that is, Tn → T in D0 (RN ). In the above derivation we used associativity of convolution. This property is not completely obvious, and in fact is false in a more general setting in which convolution of two distributions is defined. For example, if we were to assume that convolution of distributions was always defined and that Theorem 7.3 holds, we would have 1∗(δ 0 ∗H) = 111 1 ∗ H 0 = 1 ∗ δ = 1, but (1 ∗ δ 0 ) ∗ H = 0 ∗ H = 0. Nevertheless, associativity is correct in the case we have just used it, and we refer to [31] Theorem 6.30(c), for the proof. The pattern of the results just stated is that T ∗ ψn converges to T in the topology appropriate to the space that T itself belongs to, but this cannot be true in all situations which may be encountered. For example it cannot be true that if f ∈ L∞ then f ∗ ψn converges to f in L∞ since this would amount to uniform convergence of a sequence of continuous functions, which is impossible if f itself is not continuous. 7.5 ex7-1 Exercises 1. 
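The mollification procedure of Theorems 7.4 and 7.5 is easy to simulate on a grid (a sketch under the stated assumptions, not part of the notes, with the bump of Example 7.3 playing the role of $\phi$): the $L^1$ error between a discontinuous $f$ and $f * \phi_n$ decreases as $n$ grows.

```python
import numpy as np

h = 1e-3
x = np.arange(-3.0, 3.0, h)
f = np.where(np.abs(x) <= 1, 1.0, 0.0)        # discontinuous f = indicator of [-1,1]

def bump(y):
    out = np.zeros_like(y)
    m = np.abs(y) < 1
    out[m] = np.exp(1.0 / (y[m]**2 - 1.0))
    return out

for n in [2, 8, 32]:
    phi_n = n * bump(n * x)                   # phi_n(x) = n phi(nx), cf. (7.3.11)
    phi_n /= phi_n.sum() * h                  # normalize so the integral is 1
    f_n = np.convolve(f, phi_n, mode="same") * h
    print(n, np.sum(np.abs(f_n - f)) * h)     # L^1 error shrinks with n
```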
7.5 Exercises

1. Construct a test function φ ∈ C_0^∞(R) with the following properties: 0 ≤ φ(x) ≤ 1 for all x ∈ R, φ(x) ≡ 1 for |x| < 1 and φ(x) ≡ 0 for |x| > 2. (Suggestion: think about what φ′ would have to look like.)

2. Show that T(φ) = Σ_{n=1}^∞ φ^{(n)}(n) defines a distribution T ∈ D′(R).

3. If φ ∈ D(R) show that ψ(x) = (φ(x) − φ(0))/x (this function appeared in Example 7.7) belongs to C^∞(R). (Suggestion: first prove that ψ(x) = ∫_0^1 φ′(xt) dt.)

4. Find the distributional derivative of f(x) = [x], the greatest integer function.

5. Find the distributional derivatives up through order four of f(x) = |x| sin x.

6. (For readers familiar with the concept of absolute continuity.) If f is absolutely continuous on (a,b) and f′ = g a.e., show that f′ = g in the sense of distributions on (a,b).

7. Let λ_n > 0, λ_n → +∞ and set
f_n(x) = sin λ_n x, g_n(x) = sin(λ_n x)/(πx)
a) Show that f_n → 0 in D′(R) as n → ∞.
b) Show that g_n → δ in D′(R) as n → ∞.
(You may use without proof the fact that the value of the improper integral ∫_{−∞}^∞ (sin x)/x dx is π.)

8. Let φ ∈ C_0^∞(R) and f ∈ L¹(R).
a) If ψ_n(x) = n(φ(x + 1/n) − φ(x)), show that ψ_n → φ′ in C_0^∞(R). (Suggestion: use the mean value theorem over and over again.)
b) If g_n(x) = n(f(x + 1/n) − f(x)), show that g_n → f′ in D′(R).

9. Let T = pv(1/x). Find a formula analogous to (7.3.35) for the distributional derivative of T.

10. Find lim_{n→∞} sin² nx in D′(R), or show that it doesn't exist.

11. Define the distribution
T(φ) = ∫_{−∞}^∞ φ(x,x) dx
for φ ∈ C_0^∞(R²). Show that T satisfies the wave equation u_xx − u_yy = 0 in the sense of distributions on R². Discuss why it makes sense to regard T as being δ(x−y).

12. Let Ω ⊂ R^N be a bounded open set and K ⊂⊂ Ω. Show that there exists φ ∈ C_0^∞(Ω) such that 0 ≤ φ(x) ≤ 1 and φ(x) ≡ 1 for x ∈ K. (Hint: approximate the characteristic function of Σ by convolution, where Σ satisfies K ⊂⊂ Σ ⊂⊂ Ω. Use Proposition 7.4 for the needed support property.)

13. If a ∈ C^∞(Ω) and T ∈ D′(Ω) prove the product rule
∂(aT)/∂x_j = a ∂T/∂x_j + (∂a/∂x_j) T

14. Let T ∈ D′(R^N). We may then regard φ ↦ Aφ = T∗φ as a linear mapping from C_0^∞(R^N) into C^∞(R^N). Show that A commutes with translations, that is, τ_h Aφ = A τ_h φ for any φ ∈ C_0^∞(R^N). (The following interesting converse statement can also be proved: if A : C_0^∞(R^N) → C(R^N) is continuous and commutes with translations then there exists a unique T ∈ D′(R^N) such that Aφ = T∗φ. An operator commuting with translations is also said to be translation invariant.)

15. If f ∈ L¹(R^N), ∫_{R^N} f(x) dx = 1, and f_n(x) = n^N f(nx), use Theorem 7.2 to show that f_n → δ in D′(R^N).

16. Prove Theorem 7.1.

17. If T ∈ D′(Ω) prove the equality of mixed partial derivatives
∂²T/∂x_i∂x_j = ∂²T/∂x_j∂x_i (7.5.1)
in the sense of distributions, and discuss why there is no contradiction with known examples from calculus showing that the mixed partial derivatives need not be equal.

18. Show that the expression
T(φ) = ∫_{−1}^1 (φ(x) − φ(0))/|x| dx + ∫_{|x|>1} φ(x)/|x| dx
defines a distribution on R. Show also that xT = sgn x.

19. If f is a function defined on R^N and λ > 0, let f_λ(x) = f(λx). We say that f is homogeneous of degree α if f_λ = λ^α f for every λ > 0. If T is a distribution on R^N we say that T is homogeneous of degree α if T(φ_λ) = λ^{−α−N} T(φ) for every φ ∈ D(R^N) and λ > 0.
a) Show that these two definitions are consistent, i.e. if T = T_f for some f ∈ L¹_loc(R^N) then T is homogeneous of degree α if and only if f is homogeneous of degree α.
b) Show that the delta function is homogeneous of degree −N.
20. Show that u(x) = (1/2π) log|x| satisfies Δu = δ in D′(R²).

21. Without appealing to Theorem 7.3, give a direct proof of the fact that T∗φ is a continuous function of x, for T ∈ D′(R^N) and φ ∈ D(R^N).

22. Let f(x) = log² x for x > 0 and f(x) = 0 for x < 0. Show that f defines a distribution in D′(R) and find the distributional derivative f′. Is f a tempered distribution?

23. If a ∈ C^∞(R), show that aδ′ = a(0)δ′ − a′(0)δ.

24. If T ∈ D′(R^N) has compact support, show that T(φ) is defined in an unambiguous way for any φ ∈ C^∞(R^N) =: E(R^N). (Suggestion: write φ = ψφ + (1−ψ)φ where ψ ∈ D(R^N) satisfies ψ ≡ 1 on the support of T.)

Chapter 8 Fourier analysis and distributions

In this chapter we present some of the elements of Fourier analysis, with special attention to those aspects arising in the theory of distributions. Fourier analysis is often viewed as made up of two parts, one being a collection of topics relating to Fourier series, and the second being those connected to the Fourier transform. The essential distinction is that the former focuses on periodic functions while the latter is concerned with functions defined on all of R^N. In either case the central question is that of how we may represent fairly arbitrary functions, or even distributions, as combinations of particularly simple periodic functions. We will begin with Fourier series, and restrict attention to the one dimensional case. See for example [26] for a treatment of multidimensional Fourier series.

8.1 Fourier series in one space dimension

The fundamental point is that if u_n(x) = e^{inx} then the functions {u_n}_{n=−∞}^∞ make up an orthogonal basis of L²(−π,π). It will then follow from the general considerations of Chapter 6 that any f ∈ L²(−π,π) may be expressed as a linear combination
f(x) = Σ_{n=−∞}^∞ c_n e^{inx} (8.1.1)
where
c_n = ⟨f,u_n⟩/⟨u_n,u_n⟩ = (1/2π) ∫_{−π}^π f(y) e^{−iny} dy (8.1.2)
The right hand side of (8.1.1) is a Fourier series for f, and (8.1.2) is a formula for the n'th Fourier coefficient of f. It must be understood that the equality in (8.1.1) is meant only in the sense of L² convergence of the partial sums, and need not be true at any particular point. From the theory of Lebesgue integration it follows that there is a subsequence of the partial sums which will converge almost everywhere on (−π,π), but more than that we cannot say, without further assumptions on f. Any finite sum Σ_{n=−N}^N γ_n e^{inx} is called a trigonometric polynomial, so in particular we will be showing that trigonometric polynomials are dense in L²(−π,π).

Let us set
e_n(x) = e^{inx}/√(2π), n = 0, ±1, ±2, ... (8.1.3)
D_n(x) = (1/2π) Σ_{k=−n}^n e^{ikx} (8.1.4)
K_N(x) = (1/(N+1)) Σ_{n=0}^N D_n(x) (8.1.5)
It is immediate from checking the necessary integrals that {e_n}_{n=−∞}^∞ is an orthonormal set in H = L²(−π,π). The main goal for the rest of this section is to prove that {e_n}_{n=−∞}^∞ is actually an orthonormal basis of H. For the rest of this section, the inner product symbol ⟨·,·⟩ and norm ‖·‖ refer to the inner product and norm in H unless otherwise stated.

In the context of Fourier analysis, D_n and K_N are known as the Dirichlet kernel and the Fejér kernel respectively. Note that
∫_{−π}^π D_n(x) dx = ∫_{−π}^π K_N(x) dx = 1 (8.1.6)
for any n, N.
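A short numerical aside (not in the original notes; it assumes numpy): the closed form (8.1.13) for D_n derived below can be checked against the defining sum (8.1.4), as can the normalization (8.1.6).

```python
import numpy as np

n, N = 10, 10
x = np.linspace(-np.pi, np.pi, 200000)   # even point count, so x = 0 is avoided
Dn = np.real(sum(np.exp(1j * k * x) for k in range(-n, n + 1))) / (2 * np.pi)
closed = np.sin((n + 0.5) * x) / (2 * np.pi * np.sin(x / 2))
print(np.max(np.abs(Dn - closed)))       # ~1e-13: (8.1.4) and (8.1.13) agree

KN = sum(np.real(sum(np.exp(1j * k * x) for k in range(-m, m + 1)))
         for m in range(N + 1)) / (2 * np.pi * (N + 1))
print(np.trapz(Dn, x), np.trapz(KN, x))  # both ~1.0, as in (8.1.6)
```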
If f ∈ H, let
s_n(x) = Σ_{k=−n}^n c_k e^{ikx} (8.1.7)
where c_k is given by (8.1.2), and
σ_N(x) = (1/(N+1)) Σ_{n=0}^N s_n(x) (8.1.8)
Since
s_n(x) = Σ_{k=−n}^n ⟨f,e_k⟩ e_k(x) (8.1.9)
it follows that the partial sum s_n is also the projection of f onto the span of {e_k}_{k=−n}^n, and so in particular the Bessel inequality
‖s_n‖ = ( 2π Σ_{k=−n}^n |c_k|² )^{1/2} ≤ ‖f‖ (8.1.10)
holds for all n. In particular, lim_{n→∞} ⟨f,e_n⟩ = 0, which is the Riemann-Lebesgue lemma for the Fourier coefficients of f ∈ H.

Next observe that by substitution of (8.1.2) into (8.1.7) we obtain
s_n(x) = ∫_{−π}^π f(y) D_n(x−y) dy (8.1.11)
We can therefore regard s_n as being given by the convolution D_n∗f if we let f(x) = 0 outside of the interval (−π,π). We can also express D_n in an alternative and useful way:
D_n(x) = (1/2π) e^{−inx} Σ_{k=0}^{2n} e^{ikx} = (1/2π) e^{−inx} (1 − e^{(2n+1)ix})/(1 − e^{ix}) (8.1.12)
for x ≠ 0. Multiplying top and bottom of the fraction by e^{−ix/2} then yields
D_n(x) = (1/2π) sin((n+½)x)/sin(x/2), x ≠ 0 (8.1.13)
and obviously D_n(0) = (2n+1)/2π.

An alternative viewpoint on the convolution relation (8.1.11), which is in some sense more natural, starts by defining the unit circle as T = R mod 2πZ, i.e. we identify any two points of R differing by an integer multiple of 2π. Any 2π periodic function, such as e_n, D_n, s_n etc., may be regarded as a function on T, and if f is originally given as a function on (−π,π) then it may be extended in a 2π periodic manner to all of R and so also viewed as a function on the circle T. With f and D_n both 2π periodic, the integral (8.1.11) can be written as
s_n(x) = ∫_T f(y) D_n(x−y) dy (8.1.14)
since (8.1.11) simply amounts to using one natural parametrization of the independent variable. By the same token
s_n(x) = ∫_a^{a+2π} f(y) D_n(x−y) dy (8.1.15)
for any convenient choice of a.

A 2π periodic function is continuous on T if it is continuous on [−π,π] and f(π) = f(−π), and the space C(T) may simply be regarded as
C(T) = {f ∈ C([−π,π]) : f(π) = f(−π)} (8.1.16)
a closed subspace of C([−π,π]), so itself a Banach space with the maximum norm. Likewise we can define
C^m(T) = {f ∈ C^m([−π,π]) : f^{(j)}(π) = f^{(j)}(−π), j = 0,1,...,m} (8.1.17)
a Banach space with the analogous norm.

Next let us make some corresponding observations about K_N.

Proposition 8.1. There holds
σ_N(x) = ∫_T K_N(x−y) f(y) dy (8.1.18)
and
K_N(x) = Σ_{k=−N}^N (1 − |k|/(N+1)) e^{ikx} = (1/(2π(N+1))) ( sin((N+1)x/2)/sin(x/2) )², x ≠ 0 (8.1.19)

Proof: The identity (8.1.18) is immediate from (8.1.14) and the definition of K_N, and the first identity in (8.1.19) is left as an exercise. To complete the proof we observe that
2π Σ_{n=0}^N D_n(x) = Σ_{n=0}^N sin((n+½)x)/sin(x/2) (8.1.20)
= Im( e^{ix/2} Σ_{n=0}^N e^{inx} ) / sin(x/2) (8.1.21)
= Im( e^{ix/2} (1 − e^{i(N+1)x})/(1 − e^{ix}) ) / sin(x/2) (8.1.22)
= Im( (1 − cos((N+1)x) − i sin((N+1)x)) / (−2i sin(x/2)) ) / sin(x/2) (8.1.23)
= (1 − cos((N+1)x)) / (2 sin²(x/2)) (8.1.24)
= ( sin((N+1)x/2)/sin(x/2) )² (8.1.25)
and the conclusion follows upon dividing by 2π(N+1). □

Theorem 8.1. Suppose that f ∈ C(T). Then σ_N → f in C(T).

Proof: Since K_N ≥ 0 and ∫_T K_N(x−y) dy = 1 for any x, we have
|σ_N(x) − f(x)| = | ∫_T K_N(x−y)(f(y) − f(x)) dy | ≤ ∫_{x−π}^{x+π} K_N(x−y) |f(y) − f(x)| dy (8.1.26)
If ε > 0 is given, then since f must be uniformly continuous on T, there exists δ > 0 such that |f(x) − f(y)| < ε if |x−y| < δ.
Thus
|σ_N(x) − f(x)| ≤ ε ∫_{|x−y|<δ} K_N(x−y) dy + 2‖f‖_∞ ∫_{δ<|x−y|<π} K_N(x−y) dy (8.1.27)-(8.1.28)
≤ ε + 2‖f‖_∞ / ((N+1) sin²(δ/2)) (8.1.29)
where the second term is estimated using the closed form (8.1.19), which gives K_N(t) ≤ 1/(2π(N+1) sin²(δ/2)) for δ ≤ |t| ≤ π. Thus there exists N_0 such that for N ≥ N_0, |σ_N(x) − f(x)| < 2ε for all x, that is, σ_N → f uniformly. □

Corollary 8.1. The functions {e_n(x)}_{n=−∞}^∞ form an orthonormal basis of H = L²(−π,π).

Proof: We have already observed that these functions form an orthonormal set, so it remains only to verify one of the equivalent conditions stated in Theorem 6.4. We will show the closedness property, i.e. that the set of finite linear combinations of {e_n(x)}_{n=−∞}^∞ is dense in H. Given g ∈ H and ε > 0 we may find f ∈ C(T) such that ‖f−g‖ < ε, f ∈ D(−π,π) for example. Then choose N such that ‖σ_N − f‖_{C(T)} < ε, which implies ‖σ_N − f‖ < ε√(2π). Thus σ_N is a finite linear combination of the e_n's and
‖g − σ_N‖ < (1 + √(2π)) ε (8.1.30)
Since ε is arbitrary, the conclusion follows. □

Corollary 8.2. For any f ∈ H = L²(−π,π), if
s_n(x) = Σ_{k=−n}^n c_k e^{ikx} (8.1.31)
where
c_k = (1/2π) ∫_{−π}^π f(x) e^{−ikx} dx (8.1.32)
then s_n → f in H.

For f ∈ H, we will often write
f(x) = Σ_{n=−∞}^∞ c_n e^{inx} (8.1.33)
but we emphasize that without further assumptions this only means that the partial sums converge in L²(−π,π).

At this point we have looked at the convergence properties of two different sequences of trigonometric polynomials, s_n and σ_N, associated with f. While s_n is simply the n'th partial sum of the Fourier series of f, the σ_N's are the so-called Fejér means of f. While each Fejér mean is a trigonometric polynomial, the sequence σ_N does not amount to the partial sums of some other Fourier series, since the n'th coefficient would also have to depend on N. For f ∈ H, we have that s_N → f in H, and so the same is obviously true under the stronger assumption that f ∈ C(T). On the other hand for f ∈ C(T) we have shown that σ_N → f uniformly, but it need not be true that s_N → f uniformly, or even pointwise (example of P. du Bois-Reymond, see Section 1.6.1 of [26]). For f ∈ H it can be shown that σ_N → f in H, but on the other hand the best L² approximation property of s_N implies that
‖s_N − f‖ ≤ ‖σ_N − f‖ (8.1.34)
since both s_N and σ_N are in the span of {e_k}_{k=−N}^N. That is to say, s_N converges to f at least as fast, in the L² sense, as σ_N does. In summary, both s_N and σ_N provide a trigonometric polynomial approximating f, but each has some advantage over the other, depending on what is to be assumed about f.

8.2 Alternative forms of Fourier series

From the basic Fourier series (8.1.1) a number of other closely related and useful expressions can be immediately derived. First suppose that f ∈ L²(−L,L) for some L > 0. If we let f̃(x) = f(Lx/π) then f̃ ∈ L²(−π,π), so
f̃(x) = Σ_{n=−∞}^∞ c_n e^{inx}, c_n = (1/2π) ∫_{−π}^π f̃(y) e^{−iny} dy (8.2.1)
or equivalently
f(x) = Σ_{n=−∞}^∞ c_n e^{iπnx/L}, c_n = (1/2L) ∫_{−L}^L f(y) e^{−iπny/L} dy (8.2.2)
Likewise (8.2.2) holds if we just regard f as being 2L periodic and in L², and in the formula for c_n we could replace (−L,L) by any other interval of length 2L. The functions e^{iπnx/L}/√(2L) make up an orthonormal basis of L²(a,b) if b−a = 2L.

Next observe that we can write
f(x) = Σ_{n=−∞}^∞ c_n ( cos(nπx/L) + i sin(nπx/L) ) = c_0 + Σ_{n=1}^∞ ( (c_n + c_{−n}) cos(nπx/L) + i(c_n − c_{−n}) sin(nπx/L) ) (8.2.3)
If we let
a_n = c_n + c_{−n}, b_n = i(c_n − c_{−n}), n = 0, 1, 2, ... (8.2.4)
then we obtain the equivalent formulas
f(x) = a_0/2 + Σ_{n=1}^∞ ( a_n cos(nπx/L) + b_n sin(nπx/L) ) (8.2.5)
where
a_n = (1/L) ∫_{−L}^L f(y) cos(nπy/L) dy, n = 0,1,..., b_n = (1/L) ∫_{−L}^L f(y) sin(nπy/L) dy, n = 1,2,... (8.2.6)
We refer to (8.2.5),(8.2.6) as the 'real form' of the Fourier series, which is natural to use, for example, if f is real valued, since then no complex quantities appear. Again the precise meaning of (8.2.5) is that s_n → f in H = L²(−L,L) (or any other interval of length 2L), where now
s_n(x) = a_0/2 + Σ_{k=1}^n ( a_k cos(kπx/L) + b_k sin(kπx/L) ) (8.2.7)
with results analogous to those mentioned above for the Fejér means also being valid. It may be easily checked that the set of functions
{ 1/√(2L) } ∪ { cos(nπx/L)/√L, sin(nπx/L)/√L }_{n=1}^∞ (8.2.8)
makes up an orthonormal basis of L²(−L,L).

Another important variant is obtained as follows. If f ∈ L²(0,L) then we may define the associated even and odd extensions of f in L²(−L,L), namely
f_e(x) = f(x) for 0 < x < L, f(−x) for −L < x < 0; f_o(x) = f(x) for 0 < x < L, −f(−x) for −L < x < 0 (8.2.9)
If we replace f by f_e in (8.2.5),(8.2.6), then we obtain immediately that b_n = 0 and a resulting cosine series representation for f,
f(x) = a_0/2 + Σ_{n=1}^∞ a_n cos(nπx/L), a_n = (2/L) ∫_0^L f(y) cos(nπy/L) dy, n = 0,1,... (8.2.10)
Likewise replacing f by f_o gives us a corresponding sine series,
f(x) = Σ_{n=1}^∞ b_n sin(nπx/L), b_n = (2/L) ∫_0^L f(y) sin(nπy/L) dy, n = 1,2,... (8.2.11)
Note that if the 2L periodic extension of f is continuous, then the same is true of the 2L periodic extension of f_e, but this need not be true in the case of f_o. Thus we might expect that the cosine series of f typically has better convergence properties than the sine series.

8.3 More about convergence of Fourier series

If f ∈ L²(−π,π) it was already observed that since the partial sums s_n converge to f in L²(−π,π), some subsequence of the partial sums converges pointwise a.e. In fact it is a famous theorem of Carleson ([6]) that s_n → f (i.e. the entire sequence, not just a subsequence) pointwise a.e. The proof is complicated, and even now is not to be found in most advanced textbooks. No better result could be expected, since f itself is only defined up to sets of measure zero. If we were to assume the stronger condition that f ∈ C(T) then it might be natural to conjecture that s_n(x) → f(x) for every x (recall we know σ_N → f uniformly in this case), but that turns out to be false, as mentioned above: in fact there exist continuous functions whose partial sums s_n(x) diverge at infinitely many x ∈ T, see Section 5.11 of [30].

A sufficient condition implying that s_n(x) → f(x) for every x ∈ T is that f be piecewise continuously differentiable on T. In fact the following more precise theorem can be proved.

Theorem 8.2. Assume that there exist points −π = x_0 < x_1 < ... < x_M = π such that f ∈ C¹([x_j, x_{j+1}]) for j = 0,1,...,M−1. Let
f̃(x) = ½( lim_{y→x+} f(y) + lim_{y→x−} f(y) ), −π < x < π
f̃(x) = ½( lim_{y→−π+} f(y) + lim_{y→π−} f(y) ), x = ±π (8.3.1)
Then lim_{n→∞} s_n(x) = f̃(x) for −π ≤ x ≤ π.

Under the stated assumptions on f, the theorem states in particular that s_n converges to f at every point of continuity of f (with the appropriate modification at the endpoints), and otherwise converges to the average of the left and right hand limits. The proof is somewhat similar to that of Theorem 8.1; steps in the proof are outlined in the exercises.
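These two convergence behaviors are easy to see numerically. The sketch below (not from the notes; it assumes numpy) uses the square wave f(x) = sgn x, whose sine coefficients are b_k = 2(1−(−1)^k)/(πk): at the jump x = 0 the partial sums equal 0, the average predicted by Theorem 8.2, while near the jump s_n overshoots (the Gibbs phenomenon); the Fejér means, being averages against the nonnegative kernel K_N, never exceed sup|f|.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 4001)

def s(n, t):
    # n'th partial sum of the Fourier series of sgn on (-pi, pi):
    # a_k = 0 and b_k = 2*(1 - (-1)**k)/(pi*k)
    out = np.zeros_like(t)
    for k in range(1, n + 1):
        out += 2 * (1 - (-1) ** k) / (np.pi * k) * np.sin(k * t)
    return out

n = 50
sn = s(n, x)
sigma = sum(s(m, x) for m in range(n + 1)) / (n + 1)   # Fejer mean
print(sn.max(), sigma.max())     # about 1.09 (overshoot) versus < 1
print(s(n, np.array([0.0]))[0])  # exactly 0 = (f(0+) + f(0-))/2
```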
So far we have discussed the convergence properties of the Fourier series based on assumptions about f, but another point of view we could take is to focus on how convergence properties are influenced by the behavior of the Fourier coefficients c_n. A first simple result of this type is:

Proposition 8.2. If f ∈ H = L²(−π,π) and its Fourier coefficients satisfy
Σ_{n=−∞}^∞ |c_n| < ∞ (8.3.2)
then f ∈ C(T) and s_n → f uniformly on T.

Proof: By the Weierstrass M-test, the series Σ_{n=−∞}^∞ c_n e^{inx} is uniformly convergent on R to some limit g, and since each partial sum is continuous, the same must be true of g. Since uniform convergence implies L² convergence on any finite interval, we have s_n → g in H, but also s_n → f in H by Corollary 8.2. By uniqueness of the limit f = g, and the conclusion follows. □

We say that f has an absolutely convergent Fourier series when (8.3.2) holds. We emphasize here that the conclusion f = g is meant in the sense of L², i.e. f(x) = g(x) a.e., so by saying that f is continuous we are really saying that the equivalence class of f contains a continuous function, namely g.

It is not the case that every continuous function has an absolutely convergent Fourier series, according to remarks made earlier in this section. It would therefore be of interest to find other conditions on f which guarantee that (8.3.2) holds. One such condition follows from the next result, which is also of independent interest.

Proposition 8.3. If f ∈ C^m(T), then lim_{n→±∞} n^m c_n = 0.

Proof: We integrate by parts in (8.1.2) to get, for n ≠ 0,
c_n = (1/2π) [ f(y) e^{−iny}/(−in) ]_{−π}^π + (1/2πin) ∫_{−π}^π f′(y) e^{−iny} dy = (1/2πin) ∫_{−π}^π f′(y) e^{−iny} dy (8.3.3)
if f ∈ C¹(T). Since f′ ∈ L²(T), the Riemann-Lebesgue lemma implies that nc_n → 0 as n → ±∞. If f ∈ C²(T) we could integrate by parts again to get n²c_n → 0, etc. □

It is immediate from this result that if f ∈ C²(T) then it has an absolutely convergent Fourier series, but in fact even f ∈ C¹(T) is more than enough, see Exercise 6. One way to regard Proposition 8.3 is that it says that the smoother f is, the more rapidly its Fourier coefficients must decay. The next result is a sort of converse statement.

Proposition 8.4. If f ∈ H = L²(−π,π) and its Fourier coefficients satisfy
|n^{m+α} c_n| ≤ C (8.3.4)
for some C and some α > 1, then f ∈ C^m(T).

Proof: When m = 0 this is just a special case of Proposition 8.2. When m = 1 we see that it is permissible to differentiate the series (8.1.1) term by term, since the differentiated series
Σ_{n=−∞}^∞ inc_n e^{inx} (8.3.5)
is uniformly convergent, by the assumption (8.3.4). Thus f, f′ are both a.e. equal to an absolutely convergent Fourier series, so f ∈ C¹(T), by Proposition 8.2. The proof for m = 2, 3, ... is similar. □

Note that Proposition 8.3 states a necessary condition on the Fourier coefficients for f to be in C^m, and Proposition 8.4 states a sufficient condition. The two conditions are not identical, but both point to the general tendency that increased smoothness of f is associated with more rapid decay of the corresponding Fourier coefficients.
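This smoothness-decay correspondence is easy to observe numerically. The sketch below (not from the notes; it assumes numpy) computes |c_n| by quadrature for three 2π periodic functions: a jump function, a continuous but only piecewise smooth function, and a C^∞ function. (Odd values of n are sampled because c_n = 0 for even n ≠ 0 in the first two examples.)

```python
import numpy as np

y = np.linspace(-np.pi, np.pi, 20001)

def c(f, n):
    # n'th Fourier coefficient (8.1.2) by the trapezoid rule
    return np.trapz(f(y) * np.exp(-1j * n * y), y) / (2 * np.pi)

tests = [
    ("sgn x      (jump):      ", np.sign),                     # |c_n| ~ 1/n
    ("|x|        (C^0 only):  ", np.abs),                      # |c_n| ~ 1/n^2
    ("exp(cos x) (C-infinity):", lambda t: np.exp(np.cos(t))), # faster than any power
]
for name, f in tests:
    print(name, [round(abs(c(f, n)), 8) for n in (5, 17, 65)])
```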
8.4 The Fourier Transform on R^N

If f is a given function on R^N the Fourier transform of f is defined as
f̂(y) = (2π)^{−N/2} ∫_{R^N} f(x) e^{−ix·y} dx, y ∈ R^N (8.4.1)
provided that the integral is defined in some sense. This will always be the case, for example, if f ∈ L¹(R^N), for any y ∈ R^N, since then
|f̂(y)| ≤ (2π)^{−N/2} ∫_{R^N} |f(x)| dx < ∞ (8.4.2)
thus in fact f̂ ∈ L^∞(R^N) in this case.

There are a number of other commonly used definitions of the Fourier transform, obtained by changing the numerical constant in front of the integral, and/or replacing −ix·y by ix·y, and/or including a factor of 2π in the exponent in the integrand. Each convention has some convenient properties in certain situations, but none of them is always the best, hence the lack of a universally agreed upon definition. The differences are non-essential, all having to do with the way certain numerical constants turn up, so the only requirement is that we adopt one specific definition, such as (8.4.1), and stick with it.

The Fourier transform is a particular integral operator, and an alternative operator type notation for it,
Fφ = φ̂ (8.4.3)
is often convenient to use, especially when discussing its mapping properties.

Example 8.1. If N = 1 and f(x) = χ_{[a,b]}(x), the indicator function of the interval [a,b], then the Fourier transform of f is
f̂(y) = (1/√(2π)) ∫_a^b e^{−ixy} dx = (e^{−iay} − e^{−iby})/(√(2π) iy) (8.4.4)

Example 8.2. If N = 1, α > 0 and f(x) = e^{−αx²} (a Gaussian function) then
f̂(y) = (1/√(2π)) ∫_{−∞}^∞ e^{−αx²} e^{−ixy} dx = (e^{−y²/4α}/√(2π)) ∫_{−∞}^∞ e^{−α(x + iy/2α)²} dx (8.4.5)
= (e^{−y²/4α}/√(2π)) ∫_{−∞}^∞ e^{−αx²} dx = (e^{−y²/4α}/√(2π)) √(π/α) = e^{−y²/4α}/√(2α) (8.4.6)
In the above derivation the key step is the third equality, which is justified by contour integration techniques in complex function theory: the integral of e^{−αz²} along the real axis is the same as the integral along the parallel line Im z = y/2α, for any y. Thus the Fourier transform of a Gaussian is another Gaussian, and in particular f̂ = f if α = ½.

It is clear from the Fourier transform definition that if f has the special product form f(x) = f_1(x_1) f_2(x_2) ... f_N(x_N) then f̂(y) = f̂_1(y_1) f̂_2(y_2) ... f̂_N(y_N). The Gaussian in R^N, namely f(x) = e^{−α|x|²}, is of this type, so using (8.4.6) we immediately obtain
f̂(y) = e^{−|y|²/4α}/(2α)^{N/2} (8.4.7)

To state our first theorem about the Fourier transform, let us denote
C_0(R^N) = {f ∈ C(R^N) : lim_{|x|→∞} |f(x)| = 0} (8.4.8)
the space of continuous functions vanishing at ∞. It is a closed subspace of L^∞(R^N), hence a Banach space with the L^∞ norm. We emphasize that despite the notation, functions in this space need not be of compact support.

Theorem 8.3. If f ∈ L¹(R^N) then f̂ ∈ C_0(R^N).

Proof: If y_n ∈ R^N and y_n → y then clearly f(x)e^{−ix·y_n} → f(x)e^{−ix·y} for a.e. x ∈ R^N. Also, |f(x)e^{−ix·y_n}| ≤ |f(x)|, and since we assume f ∈ L¹(R^N) we can immediately apply the dominated convergence theorem to obtain
lim_{n→∞} ∫_{R^N} f(x) e^{−ix·y_n} dx = ∫_{R^N} f(x) e^{−ix·y} dx (8.4.9)
that is, f̂(y_n) → f̂(y). Hence f̂ ∈ C(R^N).

Next, suppose temporarily that g ∈ C¹(R^N) and has compact support. An integration by parts gives us, for j = 1,2,...,N, that
ĝ(y) = −(2π)^{−N/2} (1/iy_j) ∫_{R^N} (∂g/∂x_j)(x) e^{−ix·y} dx (8.4.10)
Thus there exists some C, depending on g, such that
|ĝ(y)|² ≤ C/y_j², j = 1,2,...,N (8.4.11)
from which it follows that
|ĝ(y)|² ≤ min_j C/y_j² ≤ CN/|y|² (8.4.12)
Thus ĝ(y) → 0 as |y| → ∞ in this case. Finally, such g's are dense in L¹(R^N), so given f ∈ L¹(R^N) and ε > 0, choose g as above such that ‖f−g‖_{L¹(R^N)} < ε. We then have, taking into account (8.4.2),
|f̂(y)| ≤ |f̂(y) − ĝ(y)| + |ĝ(y)| ≤ (2π)^{−N/2} ‖f−g‖_{L¹(R^N)} + |ĝ(y)| (8.4.13)
and so
lim sup_{|y|→∞} |f̂(y)| ≤ ε/(2π)^{N/2} (8.4.14)
Since ε > 0 is arbitrary, the conclusion f̂ ∈ C_0(R^N) follows. □
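The Gaussian formula (8.4.6) is convenient for testing any implementation of the convention (8.4.1); the following minimal check (not from the notes; it assumes numpy) evaluates the transform by quadrature.

```python
import numpy as np

# Check Example 8.2: under convention (8.4.1), the transform of
# exp(-alpha x^2) is exp(-y^2/(4 alpha)) / sqrt(2 alpha).
alpha = 0.7
x = np.linspace(-30.0, 30.0, 200001)
f = np.exp(-alpha * x**2)
for yv in (0.0, 1.0, 2.5):
    fhat = np.trapz(f * np.exp(-1j * x * yv), x) / np.sqrt(2 * np.pi)
    exact = np.exp(-yv**2 / (4 * alpha)) / np.sqrt(2 * alpha)
    print(yv, abs(fhat - exact))   # errors at roundoff level
```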
The fact that f̂(y) → 0 as |y| → ∞ is analogous to the property that the Fourier coefficients c_n → 0 as n → ±∞ in the case of Fourier series, and in fact is also called the Riemann-Lebesgue lemma.

One of the fundamental properties of the Fourier transform is that it is 'almost' its own inverse. A first precise version of this is given by the following Fourier inversion theorem.

Theorem 8.4. If f, f̂ ∈ L¹(R^N) then
f(x) = (2π)^{−N/2} ∫_{R^N} f̂(y) e^{ix·y} dy, a.e. x ∈ R^N (8.4.15)

The right hand side of (8.4.15) is not precisely the Fourier transform of f̂ because the exponent contains ix·y rather than −ix·y, but it does mean that we can think of it as saying that f(x) = f̂̂(−x), or
f̂̂ = f̌ (8.4.16)
where f̌(x) = f(−x) is the reflection of f. (Warning: some authors use the symbol f̌ to mean the inverse Fourier transform of f.) The requirement in the theorem that both f and f̂ be in L¹ will be weakened later on.

Proof: Since f̂ ∈ L¹(R^N) the right hand side of (8.4.15) is well defined, and we denote it temporarily by g(x). Define also the family of Gaussians
G_α(x) = e^{−|x|²/4α}/(4πα)^{N/2} (8.4.17)
We then have
g(x) = lim_{α→0+} (2π)^{−N/2} ∫_{R^N} f̂(y) e^{ix·y} e^{−α|y|²} dy (8.4.18)
= lim_{α→0+} (2π)^{−N} ∫_{R^N} ∫_{R^N} f(z) e^{−α|y|²} e^{−i(z−x)·y} dz dy (8.4.19)
= lim_{α→0+} (2π)^{−N} ∫_{R^N} f(z) ( ∫_{R^N} e^{−α|y|²} e^{−i(z−x)·y} dy ) dz (8.4.20)
= lim_{α→0+} ∫_{R^N} f(z) e^{−|z−x|²/4α}/(4πα)^{N/2} dz (8.4.21)
= lim_{α→0+} (f ∗ G_α)(x) (8.4.22)
Here (8.4.18) follows from the dominated convergence theorem and (8.4.20) from Fubini's theorem, which is applicable here because
∫_{R^N} ∫_{R^N} |f(z) e^{−α|y|²}| dz dy < ∞ (8.4.23)
In (8.4.21) we have used the explicit calculation (8.4.7) above for the Fourier transform of a Gaussian.

Noting that ∫_{R^N} G_α(x) dx = 1 for every α > 0, we see that the difference (f∗G_α)(x) − f(x) may be written as
∫_{R^N} G_α(y) (f(x−y) − f(x)) dy (8.4.24)
so that
‖f∗G_α − f‖_{L¹(R^N)} ≤ ∫_{R^N} G_α(y) φ(y) dy (8.4.25)
where φ(y) = ∫_{R^N} |f(x−y) − f(x)| dx. Then φ is bounded and continuous at y = 0 with φ(0) = 0 (see Exercise 10), and we can verify that the hypotheses of Theorem 7.2 are satisfied with f_n replaced by G_{α_n} as long as α_n → 0+. For any sequence α_n > 0, α_n → 0 it follows that G_{α_n}∗f → f in L¹(R^N), and so there is a subsequence α_{n_k} → 0 such that (G_{α_{n_k}}∗f)(x) → f(x) a.e. We conclude that (8.4.15) holds. □

8.5 Further properties of the Fourier transform

Formally speaking we have
(∂/∂y_j) ∫_{R^N} f(x) e^{−ix·y} dx = ∫_{R^N} (−ix_j) f(x) e^{−ix·y} dx (8.5.1)
or in more compact notation
∂f̂/∂y_j = (−ix_j f)ˆ (8.5.2)
This is rigorously justified by standard theorems of analysis about differentiation of integrals with respect to parameters, provided that ∫_{R^N} |x_j f(x)| dx < ∞. A companion property, obtained formally using integration by parts, is that
∫_{R^N} (∂f/∂x_j) e^{−ix·y} dx = iy_j ∫_{R^N} f(x) e^{−ix·y} dx (8.5.3)
or
(∂f/∂x_j)ˆ = iy_j f̂ (8.5.4)
which is rigorously correct provided at least that f ∈ C¹(R^N) and ∫_{|x|=R} |f(x)| dS → 0 as R → ∞. Repeating the above arguments with higher derivatives we obtain

Proposition 8.5. If α is any multi-index then
D^α f̂(y) = ((−ix)^α f)ˆ(y) (8.5.5)
if
∫_{R^N} |x^α f(x)| dx < ∞ (8.5.6)
and
(D^α f)ˆ(y) = (iy)^α f̂(y) (8.5.7)
if
f ∈ C^m(R^N), ∫_{|x|=R} |D^β f(x)| dS → 0 as R → ∞ for |β| < |α| = m (8.5.8)

We will eventually see that (8.5.5) and (8.5.7) remain valid, suitably interpreted in a distributional sense, under conditions much more general than (8.5.6) and (8.5.8).
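Identity (8.5.4) can also be checked numerically; the following sketch (not from the notes; it assumes numpy) uses f(x) = e^{−x²/2}, for which f̂ = f by Example 8.2, so (f′)ˆ(y) should equal iy e^{−y²/2}.

```python
import numpy as np

x = np.linspace(-30.0, 30.0, 200001)
f = np.exp(-x**2 / 2)
fprime = -x * f                          # f'(x), computed exactly
for yv in (0.5, 1.0, 3.0):
    lhs = np.trapz(fprime * np.exp(-1j * x * yv), x) / np.sqrt(2 * np.pi)
    rhs = 1j * yv * np.exp(-yv**2 / 2)   # (8.5.4) with fhat = f
    print(yv, abs(lhs - rhs))            # agreement to quadrature accuracy
```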
But for now we introduce a new space in which these last two conditions are guaranteed to hold.

Definition 8.1. The Schwartz space is defined as
S(R^N) = {φ ∈ C^∞(R^N) : x^α D^β φ ∈ L^∞(R^N) for all multi-indices α, β} (8.5.9)

Thus a function is in the Schwartz space if each of its derivatives decays more rapidly than the reciprocal of any polynomial. Clearly S(R^N) contains all test functions D(R^N) as well as other kinds of functions, such as the Gaussians e^{−α|x|²} for any α > 0. If φ ∈ S(R^N) then in particular, for any n,
|D^β φ(x)| ≤ C/(1+|x|²)^n (8.5.10)
for some C, and so clearly both (8.5.6) and (8.5.8) hold; thus the two key identities (8.5.5) and (8.5.7) are correct whenever f is in the Schwartz space. It is also immediate from (8.5.10) that S(R^N) ⊂ L¹(R^N) ∩ L^∞(R^N).

Proposition 8.6. If φ ∈ S(R^N) then φ̂ ∈ S(R^N).

Proof: Note from (8.5.5) and (8.5.7) that
(iy)^α D^β φ̂(y) = (iy)^α ((−ix)^β φ)ˆ(y) = (D^α((−ix)^β φ))ˆ(y) (8.5.11)
holds for φ ∈ S(R^N). Also, since S(R^N) ⊂ L¹(R^N) it follows from (8.4.2) that if φ ∈ S(R^N) then φ̂ ∈ L^∞(R^N). Thus we have the following list of implications:
φ ∈ S(R^N) ⟹ (−ix)^β φ ∈ S(R^N) (8.5.12)
⟹ D^α((−ix)^β φ) ∈ S(R^N) (8.5.13)
⟹ (D^α((−ix)^β φ))ˆ ∈ L^∞(R^N) (8.5.14)
⟹ y^α D^β φ̂ ∈ L^∞(R^N) (8.5.15)
⟹ φ̂ ∈ S(R^N) (8.5.16) □

Corollary 8.3. The Fourier transform F : S(R^N) → S(R^N) is one to one and onto.

Proof: The above theorem says that F maps S(R^N) into S(R^N), and if Fφ = φ̂ = 0 then the inversion theorem, Theorem 8.4, is applicable, since both φ, φ̂ are in L¹(R^N). We conclude φ = 0, i.e. F is one to one. If ψ ∈ S(R^N), let φ = (ψ̂)ˇ. Clearly φ ∈ S(R^N), and one may check directly, again using the inversion theorem, that φ̂ = ψ, so that F is onto. □

The next result, usually known as the Parseval identity, is the key step needed to define the Fourier transform of a function in L²(R^N), which turns out to be the more natural setting.

Proposition 8.7. If φ, ψ ∈ S(R^N) then
∫_{R^N} φ(x) ψ̂(x) dx = ∫_{R^N} φ̂(x) ψ(x) dx (8.5.17)

Proof: The proof is simply an interchange of order in an iterated integral, which is easily justified by Fubini's theorem:
∫ φ(x) ψ̂(x) dx = ∫ φ(x) ( (2π)^{−N/2} ∫ ψ(y) e^{−ix·y} dy ) dx (8.5.18)
= ∫ ψ(y) ( (2π)^{−N/2} ∫ φ(x) e^{−ix·y} dx ) dy (8.5.19)
= ∫ φ̂(y) ψ(y) dy (8.5.20) □

There is a slightly different but equivalent formula, which is also sometimes called the Parseval identity, see Exercise 11. The content of the following corollary is the Plancherel identity.

Corollary 8.4. For every φ ∈ S(R^N) we have
‖φ‖_{L²(R^N)} = ‖φ̂‖_{L²(R^N)} (8.5.21)

Proof: Given φ ∈ S(R^N) there exists, by Corollary 8.3, ψ ∈ S(R^N) such that ψ̂ = φ*, where * denotes complex conjugation; indeed ψ = (φ̂)* works, as follows directly from the definition of the Fourier transform and the inversion theorem. Therefore, by the Parseval identity,
‖φ‖²_{L²(R^N)} = ∫ φ(x) ψ̂(x) dx = ∫ φ̂(x) ψ(x) dx = ∫ φ̂(x) (φ̂(x))* dx = ‖φ̂‖²_{L²(R^N)} (8.5.22) □
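A direct numerical check of (8.5.21) (not from the notes; it assumes numpy, and uses the Schwartz function φ(x) = x e^{−x²}):

```python
import numpy as np

x = np.linspace(-20.0, 20.0, 40001)
phi = x * np.exp(-x**2)
y = np.linspace(-20.0, 20.0, 2001)
phihat = np.array([np.trapz(phi * np.exp(-1j * x * yv), x)
                   for yv in y]) / np.sqrt(2 * np.pi)
norm_phi = np.sqrt(np.trapz(np.abs(phi) ** 2, x))
norm_phihat = np.sqrt(np.trapz(np.abs(phihat) ** 2, y))
print(norm_phi, norm_phihat)   # both ~0.5597, verifying Plancherel
```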
Recalling that D(R^N) is dense in L²(R^N), it follows that the same is true of S(R^N), and the Plancherel identity therefore implies that the Fourier transform has an extension to all of L²(R^N). To be precise, if f ∈ L²(R^N) pick φ_n ∈ S(R^N) such that φ_n → f in L²(R^N). Since {φ_n} is Cauchy in L²(R^N), (8.5.21) implies the same for {φ̂_n}, so g := lim_{n→∞} φ̂_n exists in the L² sense, and this limit is by definition f̂. From elementary considerations this limit is independent of the choice of approximating sequence {φ_n}, the extended definition of f̂ agrees with the original definition if f ∈ L¹(R^N) ∩ L²(R^N), and (8.5.21) continues to hold for all f ∈ L²(R^N).

Since φ̂_n → f̂ in L²(R^N), it follows by similar reasoning that φ̂̂_n → f̂̂. By the inversion theorem we know that φ̂̂_n = φ̌_n, which must converge to f̌; thus f̂̂ = f̌, i.e. the Fourier inversion theorem continues to hold on L²(R^N).

The subset L¹(R^N) ∩ L²(R^N) is dense in L²(R^N), so we also have that f̂ = lim_{n→∞} f̂_n if f_n is any sequence in L¹(R^N) ∩ L²(R^N) convergent in L²(R^N) to f. A natural choice of such a sequence is
f_n(x) = f(x) for |x| < n, 0 for |x| > n (8.5.23)
leading to the following explicit formula, similar to an improper integral, for the Fourier transform of an L² function:
f̂(y) = lim_{n→∞} (2π)^{−N/2} ∫_{|x|<n} f(x) e^{−ix·y} dx (8.5.24)
where again without further assumptions we only know that the limit takes place in the L² sense. Let us summarize.

Theorem 8.5. For any f ∈ L²(R^N) there exists a unique f̂ ∈ L²(R^N) such that f̂ is given by (8.4.1) whenever f ∈ L¹(R^N) ∩ L²(R^N), and
‖f‖_{L²(R^N)} = ‖f̂‖_{L²(R^N)} (8.5.25)
Furthermore, f and f̂ are related by (8.5.24) and
f(x) = lim_{n→∞} (2π)^{−N/2} ∫_{|y|<n} f̂(y) e^{ix·y} dy (8.5.26)

We conclude this section with one final important property of the Fourier transform.

Proposition 8.8. If f, g ∈ L¹(R^N) then f∗g ∈ L¹(R^N) and
(f∗g)ˆ = (2π)^{N/2} f̂ ĝ (8.5.27)

Proof: The fact that f∗g ∈ L¹(R^N) is immediate from Fubini's theorem, or, alternatively, is a special case of Young's convolution inequality (7.4.2). To prove (8.5.27) we have
(f∗g)ˆ(z) = (2π)^{−N/2} ∫_{R^N} (f∗g)(x) e^{−ix·z} dx (8.5.28)
= (2π)^{−N/2} ∫_{R^N} ∫_{R^N} f(x−y) g(y) dy e^{−ix·z} dx (8.5.29)
= (2π)^{−N/2} ∫_{R^N} g(y) e^{−iy·z} ( ∫_{R^N} f(x−y) e^{−i(x−y)·z} dx ) dy (8.5.30)
= (2π)^{N/2} f̂(z) ĝ(z) (8.5.31)
with the exchange of order of integration justified by Fubini's theorem. □

8.6 Fourier series of distributions

In this and the next section we will see how the theory of Fourier series and Fourier transforms can be extended to a distributional setting. To begin with, let us consider the case of the delta function, viewed as a distribution on (−π,π). Formally speaking, if δ(x) = Σ_{n=−∞}^∞ c_n e^{inx}, then the coefficients c_n should be given by
c_n = (1/2π) ∫_{−π}^π δ(x) e^{−inx} dx = 1/2π (8.6.1)
for every n, so that
δ(x) = (1/2π) Σ_{n=−∞}^∞ e^{inx} (8.6.2)
Certainly this is not a valid formula in any classical sense, since the terms of the series do not decay to zero. On the other hand, the N'th partial sum of this series is precisely the Dirichlet kernel D_N(x), as in (8.1.4) or (8.1.13), and one consequence of Theorem 8.2 is precisely that D_N → δ in D′(−π,π). Thus we may expect to find Fourier series representations of distributions, provided that we allow for the series to converge in a distributional sense. Note that since D_N → δ we must also have, by Proposition 7.2, that
D′_N = (i/2π) Σ_{n=−N}^N n e^{inx} → δ′ (8.6.3)
as N → ∞. By repeatedly differentiating, we see that any formal Fourier series Σ_{n=−∞}^∞ n^m e^{inx} is meaningful in the distributional sense, and is simply, up to a constant multiple, some derivative of the delta function. The following proposition shows that we can allow any sequence of Fourier coefficients, as long as the rate of growth is at most a power of n.
Proposition 8.9. Let {c_n}_{n=−∞}^∞ be any sequence of constants satisfying
|c_n| ≤ C|n|^M (8.6.4)
for some constant C and positive integer M. Then there exists T ∈ D′(−π,π) such that
T = Σ_{n=−∞}^∞ c_n e^{inx} (8.6.5)

Proof: Let
g(x) = Σ_{n≠0} c_n e^{inx}/(in)^{M+2} (8.6.6)
which is a uniformly convergent Fourier series (its coefficients are O(|n|^{−2})), so in particular the partial sums S_N → g in the sense of distributions on (−π,π). But then S_N^{(j)} → g^{(j)} also in the distributional sense, and in particular
Σ_{n=−∞}^∞ c_n e^{inx} = T := c_0 + g^{(M+2)} (8.6.7) □

It seems clear that any distribution on R of the form (8.6.5) should be 2π periodic, since every partial sum is. To make this precise, define the translate of any distribution T ∈ D′(R^N) by the natural definition τ_h T(φ) = T(τ_{−h} φ), where as usual τ_h φ(x) = φ(x−h), h ∈ R^N. We then say that T is periodic with period h ∈ R^N if τ_h T = T, and it is immediate that if each T_n is periodic with period h and T_n → T in D′(R^N), then T is also periodic with period h.

Example 8.3. The Fourier series identity (8.6.2) becomes
Σ_{n=−∞}^∞ δ(x−2nπ) = (1/2π) Σ_{n=−∞}^∞ e^{inx} (8.6.8)
when regarded as an identity in D′(R), since the left side is 2π periodic and coincides with δ on (−π,π).

A 2π periodic distribution on R may also naturally be regarded as an element of the distribution space D′(T), which is defined as the space of continuous linear functionals on C^∞(T). Here, convergence in C^∞(T) means that φ_n^{(j)} → φ^{(j)} uniformly on T for all j = 0,1,2,.... Any function f ∈ L¹(T) gives rise in the usual way to a regular distribution T_f defined by T_f(φ) = ∫_{−π}^π f(x)φ(x) dx, and if f ∈ L² then the n'th Fourier coefficient of f is c_n = (1/2π) T_f(e^{−inx}). Since e^{−inx} ∈ C^∞(T), it follows that
c_n = (1/2π) T(e^{−inx}) (8.6.9)
is defined for any T ∈ D′(T), and it is defined to be the n'th Fourier coefficient of the distribution T. This definition is then consistent with the definition of the Fourier coefficient for a regular distribution, and it can be shown (Exercise 30) that
Σ_{n=−N}^N c_n e^{inx} → T in D′(T) (8.6.10)

Example 8.4. Let us evaluate the distributional Fourier series
Σ_{n=0}^∞ e^{inx} (8.6.11)
The n'th partial sum is
s_n(x) = Σ_{k=0}^n e^{ikx} = (1 − e^{i(n+1)x})/(1 − e^{ix}) (8.6.12)
so that we may write, since ∫_{−π}^π s_n(x) dx = 2π,
s_n(φ) = 2πφ(0) + ∫_{−π}^π (1 − e^{i(n+1)x})/(1 − e^{ix}) (φ(x) − φ(0)) dx (8.6.13)
for any test function φ. The function (φ(x) − φ(0))/(1 − e^{ix}) belongs to L²(−π,π), hence
∫_{−π}^π e^{i(n+1)x} (φ(x) − φ(0))/(1 − e^{ix}) dx → 0 (8.6.14)
as n → ∞ by the Riemann-Lebesgue lemma. Next, using obvious trigonometric identities we see that 1/(1 − e^{ix}) = ½(1 + i cot(x/2)), and so
∫_{−π}^π (φ(x) − φ(0))/(1 − e^{ix}) dx = lim_{ε→0+} ½ ∫_{ε<|x|<π} (φ(x) − φ(0))(1 + i cot(x/2)) dx (8.6.15)
= ½ ∫_{−π}^π φ(x) dx − πφ(0) (8.6.16)
+ (i/2) lim_{ε→0+} ∫_{ε<|x|<π} φ(x) cot(x/2) dx (8.6.17)
The principal value integral in (8.6.17) is naturally defined to be the action of the distribution pv(cot(x/2)), and we obtain the final result, upon letting n → ∞, that
Σ_{n=0}^∞ e^{inx} = πδ + ½ + (i/2) pv(cot(x/2)) (8.6.18)
By taking the real and imaginary parts of this identity we also find
Σ_{n=0}^∞ cos nx = πδ + ½, Σ_{n=1}^∞ sin nx = ½ pv(cot(x/2)) (8.6.19)

8.7 Fourier transforms of distributions

Taking again the example of the delta function, now considered as a distribution on R^N, it appears formally correct that it should have a Fourier transform which is a constant function, namely
δ̂(y) = (2π)^{−N/2} ∫_{R^N} δ(x) e^{−ix·y} dx = (2π)^{−N/2} (8.7.1)
If the inversion theorem remains valid then any constant should also have a Fourier transform, e.g. 1̂ = (2π)^{N/2} δ.
On the other hand it will turn out that a function such as e^x does not have a Fourier transform in any reasonable sense. We will now show that the set of distributions for which the Fourier transform can be defined turns out to be precisely the dual space of the Schwartz space, known also as the space of tempered distributions. To define this we must first have a definition of convergence in S(R^N).

Definition 8.2. We say that φ_n → φ in S(R^N) if
lim_{n→∞} ‖x^α D^β(φ_n − φ)‖_{L^∞(R^N)} = 0 for all α, β (8.7.2)

Proof of the following lemma will be left for the exercises.

Lemma 8.1. If φ_n → φ in S(R^N) then φ̂_n → φ̂ in S(R^N).

Definition 8.3. The set of tempered distributions on R^N is the space of continuous linear functionals on S(R^N), denoted S′(R^N).

It was already observed that D(R^N) ⊂ S(R^N), and in addition, if φ_n → φ in D(R^N) then the sequence also converges in S(R^N). It therefore follows that
S′(R^N) ⊂ D′(R^N) (8.7.3)
i.e. any tempered distribution is also a distribution, as the choice of language suggests. On the other hand, if T_f is the regular distribution corresponding to the L¹_loc function f(x) = e^x on R, then T_f ∉ S′(R), since this would require ∫_{−∞}^∞ e^x φ(x) dx to be finite for any φ ∈ S(R), which is not true. Thus the inclusion (8.7.3) is strict.

Convergence in S′(R^N) is defined in the expected way, analogously to Definition 7.5:

Definition 8.4. If T, T_n ∈ S′(R^N) for n = 1,2,... then we say T_n → T in S′(R^N) (or in the sense of tempered distributions) if T_n(φ) → T(φ) for every φ ∈ S(R^N).

It is easy to see that the delta function belongs to S′(R^N), as does any derivative or translate of the delta function. A regular distribution T_f will belong to S′(R^N) provided it satisfies the condition
lim_{|x|→∞} f(x)/|x|^m = 0 (8.7.4)
for some m. Such an f is sometimes referred to as a function of slow growth. In particular, any polynomial belongs to S′(R^N).

We can now define the Fourier transform T̂ for any T ∈ S′(R^N). For motivation of the definition, recall the Parseval identity (8.5.17), which amounts to the identity T_ψ̂(φ) = T_ψ(φ̂), if we regard φ as a function in S(R^N) and ψ as a tempered distribution.

Definition 8.5. If T ∈ S′(R^N) then T̂ is defined by T̂(φ) = T(φ̂) for any φ ∈ S(R^N).

The action of T̂ on any φ ∈ S(R^N) is well defined, since φ̂ ∈ S(R^N), and linearity of T̂ is immediate. If φ_n → φ in S(R^N) then by Lemma 8.1 φ̂_n → φ̂ in S(R^N), so that
T̂(φ_n) = T(φ̂_n) → T(φ̂) = T̂(φ) (8.7.5)
We have thus verified that T̂ ∈ S′(R^N) whenever T ∈ S′(R^N).

Example 8.5. If T = δ, then from the definition,
T̂(φ) = T(φ̂) = φ̂(0) = (2π)^{−N/2} ∫_{R^N} φ(x) dx (8.7.6)
Thus, as expected, δ̂ = (2π)^{−N/2}, the constant distribution.

Example 8.6. If T = 1 (the constant distribution) then
T̂(φ) = T(φ̂) = ∫_{R^N} φ̂(x) dx = (2π)^{N/2} φ̂̂(0) = (2π)^{N/2} φ(0) (8.7.7)
where the last equality follows from the inversion theorem, which is valid for any φ ∈ S(R^N). Thus again the expected result is obtained,
1̂ = (2π)^{N/2} δ (8.7.8)

The previous two examples verify the validity of one particular instance of the Fourier inversion theorem in the distributional context, but it turns out to be rather easy to prove that it always holds. One more definition is needed first, that of the reflection of a distribution.

Definition 8.6. If T ∈ D′(R^N) then Ť, the reflection of T, is the distribution defined by Ť(φ) = T(φ̌).
We now obtain the Fourier inversion theorem in its most general form, analogous to the statement (8.4.16) first justified when f, f̂ are in L¹(R^N).

Theorem 8.6. If T ∈ S′(R^N) then T̂̂ = Ť.

Proof: For any φ ∈ S(R^N) we have
T̂̂(φ) = T̂(φ̂) = T(φ̂̂) = T(φ̌) = Ť(φ) (8.7.9) □

The apparent triviality of this proof should not be misconstrued, as it relies on the validity of the inversion theorem in the Schwartz space, and other technical machinery which we have developed.

Here we state several more simple but useful properties. Here and elsewhere, we follow the convention of using x and y as the independent variables before and after Fourier transformation respectively.

Proposition 8.10. Let T ∈ S′(R^N) and let α be a multi-index. Then
1. x^α T ∈ S′(R^N).
2. D^α T ∈ S′(R^N).
3. D^α T̂ = ((−ix)^α T)ˆ.
4. (D^α T)ˆ = (iy)^α T̂.
5. If T_n ∈ S′(R^N) and T_n → T in S′(R^N) then T̂_n → T̂ in S′(R^N).

Proof: We give the proof of part 3 only, leaving the rest for the exercises. Just like the inversion theorem, it is more or less a direct consequence of the corresponding identity for functions in S(R^N). For any φ ∈ S(R^N) we have
D^α T̂(φ) = (−1)^{|α|} T̂(D^α φ) (8.7.10)
= (−1)^{|α|} T((D^α φ)ˆ) (8.7.11)
= (−1)^{|α|} T((iy)^α φ̂) (8.7.12)
= ((−ix)^α T)(φ̂) = ((−ix)^α T)ˆ(φ) (8.7.13)
as needed, where we used (8.5.7) to obtain (8.7.12). □

Example 8.7. If T = δ′, regarded as an element of S′(R), then
T̂ = (δ′)ˆ = iy δ̂ = iy/√(2π) (8.7.14)
by part 4 of the previous proposition. In other words,
T̂(φ) = (i/√(2π)) ∫_{−∞}^∞ x φ(x) dx (8.7.15)

Example 8.8. Let T = H(x), the Heaviside function, again regarded as an element of S′(R). To evaluate the Fourier transform Ĥ, one possible approach is to use part 4 of Proposition 8.10 along with H′ = δ to first obtain iy Ĥ = 1/√(2π). A formal solution is then Ĥ = 1/(√(2π) iy), but it must then be recognized that this distributional equation does not have a unique solution; rather, we can add to it any solution of yT = 0, e.g. T = Cδ for any constant C. It must be verified that there are no other solutions, the constant C must be evaluated, and the meaning of 1/y in the distribution sense must be made precise. See Example 8, section 2.4 of [33] for details of how this calculation is completed.

An alternative approach, which yields other useful formulas along the way, is as follows. For any φ ∈ S(R) we have
Ĥ(φ) = H(φ̂) = ∫_0^∞ φ̂(y) dy (8.7.16)
= (1/√(2π)) ∫_0^∞ ∫_{−∞}^∞ φ(x) e^{−ixy} dx dy (8.7.17)
= lim_{R→∞} (1/√(2π)) ∫_0^R ∫_{−∞}^∞ φ(x) e^{−ixy} dx dy (8.7.18)
= lim_{R→∞} (1/√(2π)) ∫_{−∞}^∞ φ(x) ( ∫_0^R e^{−ixy} dy ) dx (8.7.19)
= lim_{R→∞} (1/√(2π)) ∫_{−∞}^∞ φ(x) (1 − e^{−iRx})/(ix) dx (8.7.20)
= lim_{R→∞} ( (1/√(2π)) ∫_{−∞}^∞ φ(x) (sin Rx)/x dx + (i/√(2π)) ∫_{−∞}^∞ φ(x) (cos Rx − 1)/x dx ) (8.7.21)
It can then be verified that
(sin Rx)/x → πδ, (cos Rx − 1)/x → −pv(1/x) (8.7.22)
as R → ∞ in D′(R). The first limit is just a restatement of the result of part b) in Exercise 7 of Chapter 7, and the second we leave for the exercises. The final result, therefore, is that
Ĥ = √(π/2) δ − (i/√(2π)) pv(1/x) (8.7.23)

Example 8.9. Let T_n = δ(x−n), i.e. T_n(φ) = φ(n), for n = 0, ±1, ..., so that
T̂_n(φ) = φ̂(n) = (1/√(2π)) ∫_{−∞}^∞ φ(x) e^{−inx} dx (8.7.24)
Equivalently, √(2π) T̂_n = e^{−inx}. If we now set T = Σ_{n=−∞}^∞ T_n then T ∈ S′(R) and
T̂ = (1/√(2π)) Σ_{n=−∞}^∞ e^{−inx} = (1/√(2π)) Σ_{n=−∞}^∞ e^{inx} = √(2π) Σ_{n=−∞}^∞ δ(x−2πn) (8.7.25)
where the last equality comes from (8.6.8).
The relation T(φ̂) = T̂(φ) then yields the very interesting identity
Σ_{n=−∞}^∞ φ̂(n) = √(2π) Σ_{n=−∞}^∞ φ(2πn) (8.7.26)
valid at least for φ ∈ S(R), which is known as the Poisson summation formula.

We conclude this section with some discussion of the Fourier transform and convolution in a distributional setting. Recall we gave a definition of the convolution T∗φ in Definition 7.7, when T ∈ D′(R^N) and φ ∈ D(R^N). We can use precisely the same definition if T ∈ S′(R^N) and φ ∈ S(R^N), that is,

Definition 8.7. If T ∈ S′(R^N) and φ ∈ S(R^N) then (T∗φ)(x) = T(τ_x φ̌).

Note that in terms of the action of the distribution T, x is just a parameter, and that we must regard φ̌ as a function of some unnamed other variable, say y or ·. By methods similar to those used in the proof of Theorem 7.3 it can be shown that
T∗φ ∈ C^∞(R^N) ∩ S′(R^N) (8.7.27)
and
D^α(T∗φ) = D^αT ∗ φ = T ∗ D^αφ (8.7.28)
In addition we have the following generalization of Proposition 8.8:

Theorem 8.7. If T ∈ S′(R^N) and φ ∈ S(R^N) then
(T∗φ)ˆ = (2π)^{N/2} T̂ φ̂ (8.7.29)

Sketch of proof: First observe that from Proposition 8.8 and the inversion theorem we see that
(φψ)ˆ = (2π)^{−N/2} (φ̂ ∗ ψ̂) (8.7.30)
for φ, ψ ∈ S(R^N). Thus for ψ ∈ S(R^N),
(T̂φ̂)(ψ) = T̂(φ̂ψ) = T((φ̂ψ)ˆ) = (2π)^{−N/2} T(φ̂̂ ∗ ψ̂) = (2π)^{−N/2} T(φ̌ ∗ ψ̂) (8.7.31)
On the other hand,
(T∗φ)ˆ(ψ) = (T∗φ)(ψ̂) (8.7.32)
= ∫_{R^N} (T∗φ)(x) ψ̂(x) dx = ∫_{R^N} T(τ_x φ̌) ψ̂(x) dx (8.7.33)
= T( ∫_{R^N} τ_x φ̌(·) ψ̂(x) dx ) = T( ∫_{R^N} φ̌(· − x) ψ̂(x) dx ) (8.7.34)
= T(φ̌ ∗ ψ̂) (8.7.35)
which completes the proof, upon comparing (8.7.31) and (8.7.35). We have labeled the above proof a 'sketch' because one key step, the first equality in (8.7.34), was not explained adequately. See the conclusion of the proof of Theorem 7.19 in [31] for why it is permissible to move T across the integral in this way.

8.8 Exercises

1. Find the Fourier series Σ_{n=−∞}^∞ c_n e^{inx} for the function f(x) = x on (−π,π). Use some sort of computer graphics to plot a few of the partial sums of this series on the interval [−3π,3π].

2. Use the Fourier series in problem 1 to find the exact value of the series
Σ_{n=1}^∞ 1/n², Σ_{n=1}^∞ 1/(2n−1)²

3. Evaluate explicitly the Fourier series, justifying your steps:
Σ_{n=1}^∞ n cos(nx)/2^n
(Suggestion: start by evaluating Σ_{n=1}^∞ e^{inx}/2^n, which is a geometric series.)

4. Produce a sketch of the Dirichlet and Fejér kernels D_N and K_N, either by hand or by computer, for some reasonably large value of N.

5. Verify the first identity in (8.1.19).

6. We say that f ∈ H^k(T) if f ∈ D′(T) and its Fourier coefficients c_n satisfy
Σ_{n=−∞}^∞ n^{2k} |c_n|² < ∞ (8.8.1)
a) If f ∈ H¹(T) show that Σ_{n=−∞}^∞ |c_n| is convergent, and so the Fourier series of f is uniformly convergent.
b) Show that f ∈ H^k(T) for every k if and only if f ∈ C^∞(T).

7. Evaluate the Fourier series
Σ_{n=1}^∞ (−1)^n n sin(nx)
in D′(R). If possible, plot some partial sums of this series.

8. Find the Fourier transform of H(x)e^{−αx} for α > 0.

9. Let f ∈ L¹(R^N).
a) If f_λ(x) = f(λx) for λ > 0, find a relationship between f̂_λ and f̂.
b) If f_h(x) = f(x−h) for h ∈ R^N, find a relationship between f̂_h and f̂.

10. If f ∈ L¹(R^N) show that τ_h f → f in L¹(R^N) as h → 0. (Hint: first prove it when f is continuous and of compact support.)

11. Show that
∫_{R^N} φ(x) (ψ(x))* dx = ∫_{R^N} φ̂(x) (ψ̂(x))* dx (8.8.2)
for φ and ψ in the Schwartz space, where * denotes complex conjugation. (This is also sometimes called the Parseval identity, and leads even more directly to the Plancherel formula.)

12. Prove Lemma 8.1.
13. In this problem J_n denotes the Bessel function of the first kind and of order n. It may be defined in various ways, one of which is
J_n(z) = (i^{−n}/π) ∫_0^π e^{iz cos θ} cos(nθ) dθ (8.8.3)
Suppose that f is a radially symmetric function in L¹(R²), i.e. f(x) = f(r) where r = |x|. Show that
f̂(y) = ∫_0^∞ J_0(r|y|) f(r) r dr
It follows in particular that f̂ is also radially symmetric. Using the known identity (d/dz)(zJ_1(z)) = zJ_0(z), compute the Fourier transform of χ_{B(0,R)}, the indicator function of the ball B(0,R) in R².

14. Prove that J_0(z), defined as in (8.8.3), is a solution of the zero order Bessel equation
u″ + u′/z + u = 0
Suggestion: show that
zJ_0″(z) + J_0′(z) + zJ_0(z) = (1/π) ∫_0^π (d/dθ)( cos θ sin(z sin θ) ) dθ

15. For α ∈ R let f_α(x) = cos αx.
a) Find the Fourier transform f̂_α.
b) Find lim_{α→0} f̂_α and lim_{α→∞} f̂_α in the sense of distributions.

16. Compute the Fourier transform of the Heaviside function H(x) in yet a different way by justifying that
Ĥ = lim_{n→∞} Ĥ_n
in the sense of distributions, where H_n(x) = H(x)e^{−x/n}, and then evaluating this limit.

17. Prove the remaining parts of Proposition 8.10.

18. Let f ∈ C(R) be 2π periodic. It then has a Fourier series in the classical sense, but it also has a Fourier transform since f is a tempered distribution. What is the relationship between the Fourier series and the Fourier transform?

19. Let f ∈ L²(R^N). Show that f is real valued if and only if f̂(−k) = (f̂(k))* for all k ∈ R^N, where * denotes complex conjugation. What is the analog of this for Fourier series?

20. Let f be a continuous 2π periodic function with the usual Fourier coefficients
c_n = (1/2π) ∫_{−π}^π f(x) e^{−inx} dx
Show that
c_n = −(1/2π) ∫_{−π}^π f(x + π/n) e^{−inx} dx
and therefore
c_n = (1/4π) ∫_{−π}^π ( f(x) − f(x + π/n) ) e^{−inx} dx
If f is Lipschitz continuous, use this to show that there exists a constant M such that
|c_n| ≤ M/|n|, n ≠ 0

21. Let R = (−1,1) × (−1,1) be a square in R², let f be the indicator function of R and g be the indicator function of the complement of R.
a) Compute the Fourier transforms f̂ and ĝ.
b) Is either f̂ or ĝ in L²(R²)?

22. Verify the second limit in (8.7.22).

23. A distribution T on R^N is even if Ť = T, and odd if Ť = −T. Prove that the Fourier transform of an even (resp. odd) tempered distribution is even (resp. odd).

24. Let φ ∈ S(R), ‖φ‖_{L²(R)} = 1, and show that
( ∫_{−∞}^∞ y² |φ̂(y)|² dy ) ( ∫_{−∞}^∞ x² |φ(x)|² dx ) ≥ 1/4 (8.8.4)
This is a mathematical statement of the Heisenberg uncertainty principle. (Suggestion: start with the identity
1 = ∫_{−∞}^∞ |φ(x)|² dx = −∫_{−∞}^∞ x (d/dx)|φ(x)|² dx
Make sure to allow φ to be complex valued.) Show that equality is achieved in (8.8.4) if φ is a Gaussian.

25. Let θ(t) = Σ_{n=−∞}^∞ e^{−πn²t}. (It is a particular case of a class of special functions known as theta functions.) Use the Poisson summation formula (8.7.26) to show that
θ(t) = (1/√t) θ(1/t)

26. Use (8.7.23) to obtain the Fourier transform of pv(1/x),
(pv(1/x))ˆ(y) = −i √(π/2) sgn y (8.8.5)

27. The proof of Theorem 8.7 implicitly used the fact that if φ, ψ ∈ S(R^N) then φ∗ψ ∈ S(R^N). Prove this property.

28. Where is the mistake in the following argument? If u(x) = e^{−x} then u′ + u = 0, so by Fourier transformation
iyû(y) + û(y) = (1+iy)û(y) = 0, y ∈ R
Since 1+iy ≠ 0 for real y, it follows that û(y) = 0 for all real y and hence u(x) = 0.

29. If f ∈ L²(R^N), the autocorrelation function of f is defined to be
g(x) = (f ∗ (f*)ˇ)(x) = ∫_{R^N} f(y) (f(y−x))* dy
where * denotes complex conjugation. Show that ĝ(y) = |f̂(y)|², ĝ ∈ L¹(R^N) and that g ∈ C_0(R^N).
(ĝ is called the power spectrum or spectral density of f.)

30. If T ∈ D′(T) and c_n = (1/2π) T(e^{−inx}), show that T = Σ_{n=−∞}^∞ c_n e^{inx} in D′(T).

31. The ODE u″ − xu = 0 is known as Airy's equation, and solutions of it are called Airy functions.
a) If u is an Airy function which is also a tempered distribution, use the Fourier transform to find a first order ODE for û(y).
b) Find the general solution of the ODE for û.
c) Obtain the formal solution formula
u(x) = C ∫_{−∞}^∞ e^{ixy + iy³/3} dy
d) Explain why this formula is not meaningful as an ordinary integral, and how it can be properly interpreted.
e) Is this the general solution of the Airy equation?

Chapter 9 Distributions and Differential Equations

In this chapter we will begin to apply the theory of distributions developed in the previous chapter in a more systematic way to problems in differential equations. The modern theory of partial differential equations, and to a somewhat lesser extent ordinary differential equations, makes extensive use of the so-called Sobolev spaces, which we now proceed to introduce.

9.1 Weak derivatives and Sobolev spaces

If f ∈ L^p(Ω) then for any multi-index α we know that D^α f exists as an element of D′(Ω), but in general the distributional derivative need not itself be a function. However, if there exists g ∈ L^q(Ω) such that D^α f = T_g in D′(Ω), then we say that f has the weak α derivative g in L^q(Ω). That is to say, the requirement is that
∫_Ω f D^α φ dx = (−1)^{|α|} ∫_Ω g φ dx, ∀φ ∈ D(Ω) (9.1.1)
and we write D^α f ∈ L^q(Ω). It is important to distinguish the concept of weak derivative from that of the almost everywhere (a.e.) derivative.

Example 9.1. Let Ω = (−1,1) and f(x) = |x|. Obviously f ∈ L^p(Ω) for any 1 ≤ p ≤ ∞, and in the sense of distributions we have f′(x) = 2H(x) − 1 (use, for example, (7.3.27)). Thus f′ ∈ L^q(Ω) for any 1 ≤ q ≤ ∞. On the other hand f″ = 2δ, which does not coincide with T_g for any g in any L^q space. Thus f has the weak first derivative, but not the weak second derivative, in L^q(Ω) for any q. The first derivative of f coincides with its a.e. derivative. In the case of the second derivative, f″ = 2δ in the sense of distributions, and obviously f″ = 0 a.e., but this a.e. derivative does not coincide with a weak second derivative; indeed there is no weak second derivative according to the above definition.

We may now define the spaces W^{k,p}(Ω), known as Sobolev spaces.

Definition 9.1. If Ω ⊂ R^N is an open set, 1 ≤ p ≤ ∞ and k = 1,2,..., then
W^{k,p}(Ω) := {f ∈ D′(Ω) : D^α f ∈ L^p(Ω) for |α| ≤ k} (9.1.2)

We emphasize that the meaning of the condition D^α f ∈ L^p(Ω) is that f should have the weak α derivative in L^p(Ω), as discussed above. Clearly
D(Ω) ⊂ W^{k,p}(Ω) ⊂ L^p(Ω) (9.1.3)
so that W^{k,p}(Ω) is always a dense subspace of L^p(Ω) for 1 ≤ p < ∞.

Example 9.2. If f(x) = |x| then, referring to the discussion in the previous example, we see that f ∈ W^{1,p}(−1,1) for any p ∈ [1,∞], but f ∉ W^{2,p} for any p.

It may be readily checked that W^{k,p}(Ω) is a normed linear space with norm
‖f‖_{W^{k,p}(Ω)} = ( Σ_{|α|≤k} ‖D^α f‖^p_{L^p(Ω)} )^{1/p} for 1 ≤ p < ∞, and ‖f‖_{W^{k,∞}(Ω)} = max_{|α|≤k} ‖D^α f‖_{L^∞(Ω)} for p = ∞ (9.1.4)
Furthermore, the necessary completeness property can be shown (Exercise 5, or see Theorem 9.1 below), so that W^{k,p}(Ω) is a Banach space. When p = 2 the norm may be regarded as arising from the inner product
⟨f,g⟩ = Σ_{|α|≤k} ∫_Ω D^α f(x) (D^α g(x))* dx (9.1.5)
so that it is a Hilbert space. The alternative notation H^k(Ω) is commonly used in place of W^{k,2}(Ω).
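The defining identity (9.1.1) is easy to test numerically. The following sketch (not from the notes; it assumes numpy) checks that g = sgn x is the weak first derivative of f = |x| on (−1,1), i.e. that ∫ fφ′ dx = −∫ gφ dx for a test function φ:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 200001)

def phi(t):
    # smooth bump supported in |t| < 0.9, a valid test function on (-1, 1)
    out = np.zeros_like(t)
    m = np.abs(t) < 0.9
    out[m] = np.exp(-1.0 / (0.81 - t[m] ** 2))
    return out

h = 1e-6
phi_prime = (phi(x + h) - phi(x - h)) / (2 * h)   # central difference
lhs = np.trapz(np.abs(x) * phi_prime, x)          # int f * phi'
rhs = -np.trapz(np.sign(x) * phi(x), x)           # -int g * phi
print(lhs, rhs)   # agree to ~1e-9, as (9.1.1) requires with alpha = 1
```

Repeating with the candidate second derivative g = 0 fails: ∫ |x| φ″ dx = 2φ(0) ≠ 0 in general, reflecting the fact that f″ = 2δ is not a function.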
There is a second natural way to give meaning to the idea of a function f ∈ L^p(Ω) having a derivative in an L^q space, which is as follows: if there exist g ∈ L^q(Ω) and f_n ∈ C^∞(Ω) satisfying f_n → f in L^p(Ω) and D^α f_n → g in L^q(Ω), then we say f has the strong α derivative g in L^q(Ω). It is elementary to see that a strong derivative is also a weak derivative: we simply let n → ∞ in the identity
∫_Ω D^α f_n φ dx = (−1)^{|α|} ∫_Ω f_n D^α φ dx (9.1.6)
for any test function φ. Far more interesting is that when p < ∞ the converse statement is also true, that is, weak = strong. This important result, which shall not be proved here, was first established by Friedrichs [12] in some special situations, and then in full generality by Meyers and Serrin [23]. A more thorough discussion may be found, for example, in Chapter 3 of Adams [1]. The key idea is to use convolution, as in Theorem 7.5, to obtain the needed sequence f_n of C^∞ functions. For f ∈ W^{k,p}(Ω) the approximating sequence may clearly be supposed to belong to C^∞(Ω) ∩ W^{k,p}(Ω), so this space is dense in W^{k,p}(Ω) and we have

Theorem 9.1. For any open set Ω ⊂ R^N, 1 ≤ p < ∞ and k = 0,1,2,..., the Sobolev space W^{k,p}(Ω) coincides with the closure of C^∞(Ω) ∩ W^{k,p}(Ω) in the W^{k,p}(Ω) norm.

We now define another class of Sobolev spaces which will be important for later use.

Definition 9.2. For Ω ⊂ R^N, W_0^{k,p}(Ω) is defined to be the closure of C_0^∞(Ω) in the W^{k,p}(Ω) norm.

Obviously W_0^{k,p}(Ω) ⊂ W^{k,p}(Ω), but it may not be immediately clear whether these are actually the same space. In fact this is certainly true when k = 0, since in this case we know C_0^∞(Ω) is dense in L^p(Ω), 1 ≤ p < ∞. It also turns out to be correct for any k, p when Ω = R^N (see Corollary 3.19 of Adams [1]). But in general the inclusion is strict, and f ∈ W_0^{k,p}(Ω) carries the interpretation that D^α f = 0 on ∂Ω for |α| ≤ k−1. This topic will be continued in more detail in a later chapter.

9.2 Differential equations in D′

If we consider the simplest differential equation u′ = f on an interval (a,b) ⊂ R, then from elementary calculus we know that if f is continuous on [a,b], then every solution is of the form u(x) = ∫_a^x f(y) dy + C, for some constant C. Furthermore in this case u ∈ C¹([a,b]), and u′(x) = f(x) for every x ∈ (a,b), and we would refer to u as a classical solution of u′ = f.

If we make the weaker assumption that f ∈ L¹(a,b) then we can no longer expect u to be C¹, or u′(x) = f(x) to hold at every point, since f itself is only defined up to sets of measure zero. If, however, we let u(x) = ∫_a^x f(y) dy + C then it is an important result of measure theory that u′(x) = f(x) a.e. on (a,b). The question remains whether all solutions of u′ = f are of this form, and the answer must now depend on precisely what is meant by 'solution'. If we were to interpret the differential equation as meaning u′ = f a.e. then the answer is no: for example, u(x) = H(x) is a nonconstant function on (−1,1) with u′(x) = 0 for x ≠ 0. An alternative meaning is that the differential equation should be satisfied in the sense of distributions on (a,b), in which case we have the following theorem.

Theorem 9.2. Let f ∈ L¹(a,b).
a) If F(x) = ∫_a^x f(y) dy then F′ = f in D′(a,b).
An alternative meaning is that the differential equation should be satisfied in the sense of distributions on (a,b), in which case we have the following theorem.

Theorem 9.2. Let f ∈ L¹(a,b).

a) If F(x) = ∫_a^x f(y) dy then F′ = f in D′(a,b).

b) If u′ = f in D′(a,b), then there exists a constant C such that

u(x) = ∫_a^x f(y) dy + C   a < x < b   (9.2.1)

Proof: If F(x) = ∫_a^x f(y) dy, then for any φ ∈ C_0^∞(a,b) we have

F′(φ) = −F(φ′) = −∫_a^b F(x)φ′(x) dx   (9.2.2)
= −∫_a^b ( ∫_a^x f(y) dy ) φ′(x) dx   (9.2.3)
= −∫_a^b f(y) ( ∫_y^b φ′(x) dx ) dy   (9.2.4)
= ∫_a^b f(y)φ(y) dy = f(φ)   (9.2.5)

Here the interchange of the order of integration in the third line is easily justified by Fubini's theorem. This proves part a).

Now if u′ = f in the distributional sense then T = u − F satisfies T′ = 0 in D′(a,b), and we will finish by showing that T must be a constant. Choose φ_0 ∈ C_0^∞(a,b) such that ∫_a^b φ_0(y) dy = 1. If φ ∈ C_0^∞(a,b), set

ψ(x) = φ(x) − ( ∫_a^b φ(y) dy ) φ_0(x)   (9.2.6)

so that ψ ∈ C_0^∞(a,b) and ∫_a^b ψ(x) dx = 0. Let

ζ(x) = ∫_a^x ψ(y) dy   (9.2.7)

Obviously ζ ∈ C^∞(a,b) since ζ′ = ψ, but in fact ζ ∈ C_0^∞(a,b), since ζ(a) = ζ(b) = 0 and ζ′ = ψ ≡ 0 in some neighborhood of a and of b. Finally, since T′ = 0, it follows that

0 = T′(ζ) = −T(ζ′) = −T(ψ) = ( ∫_a^b φ(y) dy ) T(φ_0) − T(φ)   (9.2.8)

or equivalently T(φ) = ∫_a^b Cφ(y) dy where C = T(φ_0). Thus T is the distribution corresponding to the constant function C.

We emphasize that part b) of this theorem is of interest, and not obvious, even when f = 0: any distribution whose distributional derivative on some interval is zero must be a constant distribution on that interval. Therefore any distribution is uniquely determined, up to an additive constant, by its distributional derivative, which, to repeat, is not the case for the a.e. derivative.

Now let Ω ⊂ R^N be an open set and

Lu = Σ_{|α|≤m} a_α(x) D^α u   (9.2.9)

be a differential operator of order m. We assume that a_α ∈ C^∞(Ω), in which case Lu ∈ D′(Ω) is well defined for any u ∈ D′(Ω). We will use the following terminology for the rest of this chapter.

Definition 9.3. If f ∈ D′(Ω) then

• u is a classical solution of Lu = f in Ω if u ∈ C^m(Ω) and Lu(x) = f(x) for every x ∈ Ω.

• u is a weak solution of Lu = f in Ω if u ∈ L¹_loc(Ω) and Lu = f in D′(Ω).

• u is a distributional solution of Lu = f in Ω if u ∈ D′(Ω) and Lu = f in D′(Ω).

It is clear that a classical solution is also a weak solution, and a weak solution is a distributional solution. The converse statements are false in general, but may be true in special cases. For example, we have proved above that any distributional solution of u′ = 0 must be constant, hence in particular any distributional solution of this differential equation is actually a classical solution. On the other hand, u = δ is a distributional solution of x²u′ = 0, but is not a classical or weak solution. Of course a classical solution cannot exist if f is not continuous on Ω. A theorem which says that any solution of a certain differential equation must be smoother than what is actually needed for the definition of solution is called a regularity result. Regularity theory is a large and important research topic within the general area of differential equations.

Example 9.3. Let Lu = u_xx − u_yy. If F, G ∈ C²(R) and u(x,y) = F(x+y) + G(x−y) then we know u is a classical solution of Lu = 0. We have also observed, in Example 7.12, that if F, G ∈ L¹_loc(R) then Lu = 0 in the sense of distributions, thus u is a weak solution of Lu = 0 according to the above definition. The equation also has distributional solutions which are not weak solutions, for example the singular distribution T defined by T(φ) = ∫_{−∞}^∞ φ(x,x) dx in Exercise 11 of Chapter 7.
Example 9.4. If Lu = u_xx + u_yy then it turns out that all distributional solutions of Lu = 0 are classical solutions; in fact, any distributional solution must be in C^∞(Ω). This is an example of a very important kind of regularity result in PDE theory, and will not be proved here; see for example Corollary 2.20 of [11]. The difference between Laplace's equation and the wave equation, namely that Laplace's equation has only classical solutions while the wave equation has many non-classical solutions, is a typical difference between solutions of PDEs of elliptic and hyperbolic types.

9.3 Fundamental solutions

Let Ω ⊂ R^N, let L be a differential operator as in (9.2.9), and suppose G(x,y) has the following properties¹:

G(·,y) ∈ D′(Ω)   L_x G(x,y) = δ(x−y)   ∀y ∈ Ω   (9.3.1)

(¹ The subscript x in L_x is used here to emphasize that the differential operator acts in the x variable, with y in the role of a parameter.)

We then call G a fundamental solution of L in Ω. If such a G can be found, then, formally, if we let

u(x) = ∫_Ω G(x,y)f(y) dy   (9.3.2)

we may expect that

Lu(x) = ∫_Ω L_x G(x,y)f(y) dy = ∫_Ω δ(x−y)f(y) dy = f(x)   (9.3.3)

That is to say, (9.3.2) provides a way to obtain solutions of the PDE Lu = f, and perhaps also a tool to analyze specific properties of solutions. We are of course ignoring here all questions of rigorous justification – whether the formula for u even makes sense if G is only a distribution in x, for what class of f's this might be so, and whether it is permissible to differentiate under the integral to obtain (9.3.3). A more advanced PDE text such as Hörmander [16] may be consulted for such study. Fundamental solutions are not unique in general, since we could always add to G any function H(x,y) satisfying the homogeneous equation L_x H = 0 for fixed y.

We will focus now on the case that Ω = R^N and a_α(x) ≡ a_α for every α, i.e. L is a constant coefficient operator. In this case, if we can find Γ ∈ D′(R^N) for which LΓ = δ, then G(x,y) = Γ(x−y) is a fundamental solution according to the above definition, and it is normal in this situation to refer to Γ itself as the fundamental solution rather than G. Formally, the solution formula (9.3.2) becomes

u(x) = ∫_{R^N} Γ(x−y)f(y) dy   (9.3.4)

an integral operator of convolution type. Again it may not be clear whether this makes sense as an ordinary integral, but recall that we have earlier defined (Definition 7.7) the convolution of an arbitrary distribution and a test function, namely

u(x) = (Γ ∗ f)(x) := Γ(τ_x f̌)   (9.3.5)

if Γ ∈ D′(R^N) and f ∈ C_0^∞(R^N). Furthermore, using Theorem 7.3, it follows that u ∈ C^∞(R^N) and

Lu(x) = ((LΓ) ∗ f)(x) = (δ ∗ f)(x) = f(x)   (9.3.6)

We have therefore proved

Proposition 9.1. If there exists Γ ∈ D′(R^N) such that LΓ = δ, then for any f ∈ C_0^∞(R^N) the function u = Γ ∗ f is a classical solution of Lu = f.

It will essentially always be the case that the solution formula u = Γ ∗ f is actually valid for a much larger class of f's than C_0^∞(R^N), but this will depend on specific properties of the fundamental solution Γ, which in turn depend on those of the original operator L.

Example 9.5. If L = ∆, the Laplacian operator in R³, then we have already shown (Example 7.13) that Γ(x) = −1/(4π|x|) satisfies ∆Γ = δ in the sense of distributions on R³. Thus

u(x) = ( −1/(4π|x|) ) ∗ f(x) = −(1/4π) ∫_{R³} f(y)/|x−y| dy   (9.3.7)

provides a solution of ∆u = f in R³, at least when f ∈ C_0^∞(R³).
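The defining property ∆Γ = δ can be confirmed numerically for radial test functions, since then ∫_{R³} Γ ∆φ dx reduces to a one dimensional integral. The Python sketch below is our own illustration: the Gaussian φ is an arbitrary choice, and while a true test function would also have compact support, the rapid decay makes that immaterial here.

```python
# Check int_{R^3} Gamma(x) (Lap phi)(x) dx = phi(0) for Gamma = -1/(4 pi |x|)
# and the radial function phi(r) = exp(-r^2).  For radial phi,
# Lap phi = phi'' + (2/r) phi'.
import numpy as np
from scipy.integrate import quad

lap_phi = lambda r: (4 * r**2 - 6) * np.exp(-r**2)   # Lap phi for phi = e^{-r^2}

# spherical shells: dx -> 4 pi r^2 dr, and Gamma = -1/(4 pi r)
integrand = lambda r: (-1.0 / (4 * np.pi * r)) * lap_phi(r) * 4 * np.pi * r**2
val, _ = quad(integrand, 0, np.inf)
print(val)    # prints 1.0 = phi(0), consistent with Lap Gamma = delta in D'(R^3)
```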
The integral on the right in (9.3.7) is known as the Newtonian potential of f, and can be shown to be a valid solution formula for a much larger class of f's. It is in any case always a ‘candidate’ solution, which can be analyzed directly. A fundamental solution of the Laplacian exists in R^N for any dimension N, and will be recalled at the end of this section.

Example 9.6. Consider the wave operator Lu = u_tt − u_xx in R². A fundamental solution for L (see Exercise 9) is

Γ(x,t) = ½ H(t − |x|)   (9.3.8)

The support of Γ, namely the set {(x,t) : |x| ≤ t}, is in this context known as the forward light cone: for fixed t > 0 it is the set of points x which may have been reached by a signal emanating from the origin x = 0 at time t = 0 and travelling with speed one. The resulting solution formula for Lu = f may then be obtained as

u(x,t) = ∫_{−∞}^∞ ∫_{−∞}^∞ Γ(x−y, t−s) f(y,s) dy ds   (9.3.9)
= ½ ∫_{−∞}^∞ ∫_{−∞}^∞ H(t−s−|x−y|) f(y,s) dy ds   (9.3.10)
= ½ ∫_{−∞}^t ∫_{x−t+s}^{x+t−s} f(y,s) dy ds   (9.3.11)

In many cases of interest f(x,t) ≡ 0 for t < 0, in which case we may replace the lower limit in the s integral by 0. In any case the region over which f is integrated is the ‘backward’ light cone with vertex at (x,t). Under this support assumption on f it also follows that u(x,0) = u_t(x,0) ≡ 0, so by adding in the corresponding terms in D'Alembert's solution (2.3.46) we find that

u(x,t) = ½ ∫_0^t ∫_{x−t+s}^{x+t−s} f(y,s) dy ds + ½ (h(x+t) + h(x−t)) + ½ ∫_{x−t}^{x+t} g(s) ds   (9.3.12)

is the unique solution of

u_tt − u_xx = f(x,t)   x ∈ R, t > 0   (9.3.13)
u(x,0) = h(x)   x ∈ R   (9.3.14)
u_t(x,0) = g(x)   x ∈ R   (9.3.15)

It is of interest to note that this solution formula could also be written, formally at least, as

u(x,t) = (Γ ∗ f)(x,t) + ∂/∂t (Γ ∗_(x) h)(x,t) + (Γ ∗_(x) g)(x,t)   (9.3.16)

where the notation Γ ∗_(x) h indicates that the convolution takes place in x only, with t as a parameter. Thus the fundamental solution Γ enters not only into the solution of the inhomogeneous equation Lu = f but into solving the Cauchy problem as well. This is not an accidental feature, and we will see other instances of this sort of thing later.
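Formula (9.3.12) is straightforward to evaluate numerically. As a quick illustration (our own sketch, with an arbitrarily chosen smooth example, not part of the text), take f = 0, h(x) = sin x, g = 0, for which the solution is the classical standing wave u(x,t) = sin x cos t.

```python
# d'Alembert's formula (9.3.12) with f = 0, compared against the known
# classical solution u(x,t) = sin(x) cos(t) for h = sin, g = 0.
import numpy as np
from scipy.integrate import quad

def dalembert(x, t, h, g):
    g_int, _ = quad(g, x - t, x + t)
    return 0.5 * (h(x + t) + h(x - t)) + 0.5 * g_int

for (x, t) in [(0.0, 0.5), (0.7, 1.3), (2.0, 3.0)]:
    print(dalembert(x, t, np.sin, lambda s: 0.0), np.sin(x) * np.cos(t))
```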
So far we have seen a couple of examples where an explicit fundamental solution is known, but we have given no indication of a general method for finding one, or even for determining whether a fundamental solution exists. Let us address the second issue first, by stating without proof a remarkable theorem.

Theorem 9.3. (Malgrange–Ehrenpreis) If L ≠ 0 is any constant coefficient linear differential operator, then there exists a fundamental solution of L.

The proof of this theorem is well beyond the scope of this book; see for example Theorem 8.5 of [31] or Theorem 10.2.1 of [16]. The assumption of constant coefficients is essential here: counterexamples are known otherwise, even in the case of very simple operators with infinitely differentiable variable coefficients.

If we now consider how it might be possible to compute a fundamental solution for a given operator L, it soon becomes apparent that the Fourier transform may be a useful tool. If we start with the distributional PDE

LΓ = Σ_{|α|≤m} a_α D^α Γ = δ   (9.3.17)

and take the Fourier transform of both sides, the result is

Σ_{|α|≤m} a_α (D^α Γ)ˆ = Σ_{|α|≤m} a_α (iy)^α Γ̂ = 1/(2π)^{N/2}   (9.3.18)

or

P(y) Γ̂(y) = 1   (9.3.19)

where P(y), the so-called symbol or characteristic polynomial of L, is defined as

P(y) = (2π)^{N/2} Σ_{|α|≤m} a_α (iy)^α   (9.3.20)

Note that it was implicitly assumed here that Γ̂ exists, which would be the case if Γ were a tempered distribution, but this is not actually guaranteed by Theorem 9.3. This is a rather technical issue which we will not discuss here; rather we take the point of view that we seek a formal solution which, potentially, further analysis may show to be a bona fide fundamental solution. The problem of solving (9.3.19) for a distribution Γ̂ is a special case of the so-called problem of division, which is to solve an equation fS = T for a distribution S, given a distribution T and a smooth function f in a suitable class. Various aspects of this problem may be found in [16].

We have thus obtained Γ̂(y) = 1/P(y), or by the inversion theorem

Γ(x) = 1/(2π)^{N/2} ∫_{R^N} e^{ix·y} / P(y) dy   (9.3.21)

as a candidate for the fundamental solution of L. One particular source of difficulty in making sense of the inverse transform of 1/P is that in general P has zeros, which might be of arbitrarily high order, making the integrand too singular to have meaning in any ordinary sense. On the other hand we have seen, at least in one dimension, how well-defined distributions of the ‘pseudo-function’ type may be associated with non-locally integrable functions such as 1/x^m, so there may be some analogous construction in more than one dimension as well. This is in fact one possible route to proving the Malgrange–Ehrenpreis theorem. It also suggests that the situation may be somewhat easier to deal with if the zero set of P in R^N is empty, or at least not very large. As a polynomial, of course, P always has zeros, but some or all of these could be complex, whereas the obstructions to making sense of (9.3.21) pertain to the real zeros of P only.

If L is a constant coefficient differential operator of order m as above, define

P_m(y) = (2π)^{N/2} Σ_{|α|=m} a_α (iy)^α   (9.3.22)

which is known as the principal symbol of L.

Definition 9.4. We say that L is elliptic if y ∈ R^N and P_m(y) = 0 imply that y = 0, that is to say, the principal symbol has no nonzero real roots.

For example the Laplacian operator L = ∆ is elliptic, as is ∆ plus lower order terms, since either way P₂(y) = −|y|² up to a positive constant factor. On the other hand the wave operator, written say as Lu = ∆u − u_{x_{N+1} x_{N+1}}, is not elliptic, since the principal symbol is P₂(y) = y²_{N+1} − Σ_{j=1}^N y_j². The following is not so difficult to establish (Exercise 16), and may be exploited in working with the representation (9.3.21) in the elliptic case.

Proposition 9.2. If L is elliptic then

{y ∈ R^N : P(y) = 0}   (9.3.23)

the real zero set of P, is compact in R^N, and lim_{|y|→∞} |P(y)| = ∞.
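When the full symbol has no real zeros at all, the recipe (9.3.21) can be carried out numerically. As a one dimensional illustration (our own sketch; the operator, the cutoff and the grid are arbitrary choices), take Lu = u″ − u, for which P(y) = −√(2π)(1 + y²) and Γ̂ = 1/P is a genuine function; the inversion integral can then be compared with the exact fundamental solution Γ(x) = −e^{−|x|}/2.

```python
# Fourier inversion (9.3.21) for the elliptic operator Lu = u'' - u in R^1:
# P(y) = -sqrt(2 pi)(1 + y^2) has no real zeros, so the inversion integral is
# an ordinary (absolutely convergent) integral, approximated here by the
# trapezoid rule on a large interval.  Exact answer: Gamma(x) = -e^{-|x|}/2.
import numpy as np

y = np.linspace(-500, 500, 1_000_001)
Gamma_hat = 1.0 / (-np.sqrt(2 * np.pi) * (1 + y**2))
for x in [0.0, 0.5, 1.0, 2.0]:
    val = np.trapz(np.exp(1j * x * y) * Gamma_hat, y) / np.sqrt(2 * np.pi)
    print(x, val.real, -0.5 * np.exp(-abs(x)))   # agree up to the tail cutoff error
```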
We will next derive a fundamental solution for the heat equation by using the Fourier transform, although in a slightly different way from the above discussion. Consider first the initial value problem for the heat equation

u_t − ∆u = 0   x ∈ R^N, t > 0   (9.3.24)
u(x,0) = h(x)   x ∈ R^N   (9.3.25)

with h ∈ C_0^∞(R^N). Assuming a solution exists, define the Fourier transform in the x variables,

û(y,t) = 1/(2π)^{N/2} ∫_{R^N} u(x,t) e^{−ix·y} dx   (9.3.26)

Taking the partial derivative with respect to t of both sides gives (û)_t = (u_t)ˆ, so by the usual Fourier transformation calculation rules

(u_t)ˆ = (û)_t = −|y|² û   (9.3.27)

and û(y,0) = ĥ(y). We may regard this as an ODE in t satisfied by û(y,t) for fixed y, for which the solution obtained by elementary means is

û(y,t) = e^{−|y|²t} ĥ(y)   (9.3.28)

If we let Γ be such that Γ̂(y,t) = (2π)^{−N/2} e^{−|y|²t}, then by Theorem 8.8 it follows that

u(x,t) = (Γ ∗_(x) h)(x,t)   (9.3.29)

Since Γ̂ is a Gaussian in y, the same is true of Γ itself as a function of x, as long as t > 0, and from (8.4.7) we get

Γ(x,t) = H(t) e^{−|x|²/4t} / (4πt)^{N/2}   (9.3.30)

By including the H(t) factor we have, for later convenience, defined Γ(x,t) = 0 for t < 0. Thus we get an integral representation for the solution of (9.3.24)–(9.3.25), namely

u(x,t) = ∫_{R^N} Γ(x−y,t) h(y) dy = 1/(4πt)^{N/2} ∫_{R^N} e^{−|x−y|²/4t} h(y) dy   (9.3.31)

valid for x ∈ R^N and t > 0. As usual, although this was derived for convenience under very restrictive conditions on h, it is actually valid much more generally (see Exercise 12).

Now to derive a solution formula for u_t − ∆u = f, let v = v(x,t;s) be the solution of (9.3.24)–(9.3.25) with h(x) replaced by f(x,s), regarding s for the moment as a parameter, and define

u(x,t) = ∫_0^t v(x, t−s; s) ds   (9.3.32)

Assuming that f is sufficiently regular, it follows that

u_t(x,t) = v(x,0;t) + ∫_0^t v_t(x, t−s; s) ds   (9.3.33)
= f(x,t) + ∫_0^t ∆v(x, t−s; s) ds   (9.3.34)
= f(x,t) + ∆u(x,t)   (9.3.35)

Inserting the formula (9.3.31) with h replaced by f(·,s) gives

u(x,t) = (Γ ∗ f)(x,t) = ∫_0^t ∫_{R^N} Γ(x−y, t−s) f(y,s) dy ds   (9.3.36)

with Γ given again by (9.3.30). Strictly speaking, we should assume that f(x,t) ≡ 0 for t < 0 in order that the integral on the right in (9.3.36) coincide with the convolution in R^{N+1}, but this is without loss of generality, since we only seek to solve the PDE for t > 0. The procedure used above, obtaining the solution of the inhomogeneous PDE from the solution of a corresponding initial value problem, is known as Duhamel's method, and is generally applicable, with suitable modifications, to time dependent PDEs in which the coefficients are independent of time.

Since u(x,t) in (9.3.32) evidently satisfies u(x,0) ≡ 0, it follows (compare to (9.3.16)) that

u(x,t) = (Γ ∗_(x) h)(x,t) + (Γ ∗ f)(x,t)   (9.3.37)

is a solution of

u_t − ∆u = f(x,t)   x ∈ R^N, t > 0   (9.3.38)
u(x,0) = h(x)   x ∈ R^N   (9.3.39)

Let us also observe here that if

F(x) = 1/(4π)^{N/2} e^{−|x|²/4}   (9.3.40)

then F ≥ 0, ∫_{R^N} F(x) dx = 1, and

Γ(x,t) = (1/√t)^N F(x/√t)   (9.3.41)

for t > 0. From Theorem 7.2, and the observation that a sequence of the form (7.3.11) satisfies the assumptions of that theorem, it follows that n^N F(nx) → δ in D′(R^N) as n → ∞. Choosing n = 1/√t we conclude that

lim_{t→0+} Γ(·,t) = δ   in D′(R^N)   (9.3.42)

In particular, lim_{t→0+} (Γ ∗_(x) h)(x,t) = h(x) for all x ∈ R^N, at least when h ∈ C_0^∞(R^N).
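Both the solution formula (9.3.31) and the initial-condition recovery (9.3.42) are easy to observe numerically. In the sketch below (our own illustration; the Gaussian datum is chosen because the exact solution is then available in closed form), the heat kernel convolution is evaluated by quadrature for h(x) = e^{−x²} with N = 1 and compared with the exact solution u(x,t) = e^{−x²/(1+4t)}/√(1+4t).

```python
# Evaluate u(x,t) = int Gamma(x-y,t) h(y) dy  (formula (9.3.31), N = 1) for
# the Gaussian datum h(x) = e^{-x^2}, and compare with the closed form.
# As t -> 0+, u(x,t) -> h(x), illustrating (9.3.42).
import numpy as np

y = np.linspace(-30, 30, 60001)

def heat_solution(x, t):
    kernel = np.exp(-(x - y)**2 / (4 * t)) / np.sqrt(4 * np.pi * t)
    return np.trapz(kernel * np.exp(-y**2), y)

for (x, t) in [(0.0, 1e-4), (0.0, 0.1), (1.0, 0.5), (2.0, 2.0)]:
    exact = np.exp(-x**2 / (1 + 4 * t)) / np.sqrt(1 + 4 * t)
    print(x, t, heat_solution(x, t), exact)
```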
We conclude this section by collecting in one place a number of important fundamental solutions. Some of these have been discussed already, some will be left for the exercises, and in several other cases we will be content with a reference.

Laplace operator

For L = ∆ in R^N there exist the following fundamental solutions²:

Γ(x) = |x|/2   N = 1
Γ(x) = (1/2π) log |x|   N = 2
Γ(x) = C_N / |x|^{N−2}   N ≥ 3   (9.3.43)

where

C_N = 1/((2−N) Ω_{N−1})   Ω_{N−1} = ∫_{|x|=1} dS(x)   (9.3.44)

(² Note we do not say ‘the fundamental solution’ here; in fact the fundamental solution is not unique without further restrictions. Some texts consistently use the fundamental solution of −∆ rather than ∆, in which case all of the signs are reversed.)

Thus C_N is a geometric constant related to the area of the unit sphere in R^N – an equivalent formula in terms of the volume of the unit ball in R^N is also commonly used. Of the various cases, N = 1 is elementary to check, N = 2 is requested in Exercise 20 of Chapter 7, and we have done the N ≥ 3 case in Example 7.13.

Heat operator

For the heat operator L = ∂/∂t − ∆ in R^{N+1} we have derived earlier in this section the fundamental solution

Γ(x,t) = H(t) e^{−|x|²/4t} / (4πt)^{N/2}   (9.3.45)

for all N.

Wave operator

For the wave operator L = ∂²/∂t² − ∆ in R^{N+1}, the fundamental solution again depends significantly on N. The cases N = 1, 2, 3 are as follows:

Γ(x,t) = ½ H(t − |x|)   N = 1
Γ(x,t) = (1/2π) H(t − |x|) / √(t² − |x|²)   N = 2
Γ(x,t) = δ(t − |x|) / (4π|x|)   N = 3   (9.3.46)

We have discussed the N = 1 case earlier in this section, and refer to [10] or [18] for the cases N = 2, 3. As a distribution, the meaning of the fundamental solution in the N = 3 case is just what one expects from the formal expression, namely

Γ(φ) = ∫_{R³} ∫_{−∞}^∞ ( δ(t − |x|) / (4π|x|) ) φ(x,t) dt dx = ∫_{R³} φ(x,|x|) / (4π|x|) dx   (9.3.47)

for any test function φ. Note the tendency of the fundamental solution to become more and more singular as N increases. This pattern persists in higher dimensions, where the fundamental solution starts to contain expressions involving δ′ and higher derivatives of the δ function.

Schrödinger operator

The Schrödinger operator is defined as L = ∂/∂t − i∆ in R^{N+1}. The derivation of a fundamental solution here is nearly the same as for the heat equation, the result being

Γ(x,t) = H(t) e^{−|x|²/4it} / (4iπt)^{N/2}   (9.3.48)

In quantum mechanics Γ is frequently referred to as the ‘propagator’. See [27] for much material about the Schrödinger equation.

Helmholtz operator

The Helmholtz operator is defined by Lu = ∆u − λu. For λ > 0, fundamental solutions in dimensions N = 1, 2, 3 are

Γ(x) = sin(√λ |x|) / (2√λ)   N = 1
Γ(x) = (1/2π) K₀(√λ |x|)   N = 2
Γ(x) = −e^{−√λ |x|} / (4π|x|)   N = 3   (9.3.49)

where K₀ is the so-called modified Bessel function of the second kind and order 0. (One may check that the N = 1 formula corresponds to the opposite sign convention Lu = ∆u + λu; cf. Exercise 8.) See Chapter 6 of [3] for derivations of these formulas when N = 2, 3, while the N = 1 case is left for the exercises. This is a case where it may be convenient to use the Fourier transform method directly, since the symbol of L, P(y) = −|y|² − λ, has no real zeros.

Klein–Gordon operator

The Klein–Gordon operator is defined by Lu = ∂²u/∂t² − ∆u − λu in R^{N+1}. We mention only the case N = 1, λ > 0, in which case a fundamental solution is

Γ(x,t) = ½ H(t − |x|) J₀(√(λ(t² − x²)))   N = 1   (9.3.50)

where J₀ is the Bessel function of the first kind and order zero (see Exercise 13 of Chapter 8). This may be derived, for example, by the method presented in Problem 2, Section 5.1 of [18], choosing ψ = δ.

Biharmonic operator

The biharmonic operator is L = ∆², i.e. Lu = ∆(∆u). It arises especially in connection with the theory of plates and shells, so that N = 2 is the most interesting case. A fundamental solution is

Γ(x) = (1/8π) |x|² log |x|   N = 2   (9.3.51)

a derivation of which is outlined in Exercise 10.

9.4 Exercises
1. Show that an equivalent definition of W^{s,2}(R^N) = H^s(R^N) for s = 0, 1, 2, ... is

H^s(R^N) = {f ∈ S′(R^N) : ∫_{R^N} |f̂(y)|² (1 + |y|²)^s dy < ∞}   (9.4.1)

The second definition makes sense even if s isn't a nonnegative integer, and leads to one way to define fractional and negative order differentiability. Implicitly it requires that f̂ (but not f itself) must be a function.

2. Using the definition (9.4.1), show that H^s(R^N) ⊂ C₀(R^N) if s > N/2. Show that δ ∈ H^s(R^N) if s < −N/2.

3. If Ω is a bounded open set in R³ and u(x) = 1/|x|, show that u ∈ W^{1,p}(Ω) for 1 ≤ p < 3/2. Along the way you should show carefully that a distributional first derivative ∂u/∂x_i agrees with the corresponding pointwise derivative.

4. Prove that if f ∈ W^{1,p}(a,b) for p > 1 then

|f(x) − f(y)| ≤ ||f||_{W^{1,p}(a,b)} |x − y|^{1−1/p}   (9.4.2)

so in particular W^{1,p}(a,b) ⊂ C([a,b]). (Caution: you would like to use the fundamental theorem of calculus here, but it isn't quite obvious whether it is valid assuming only that f ∈ W^{1,p}(a,b).)

5. Prove directly that W^{k,p}(Ω) is complete (relying, of course, on the fact that L^p(Ω) is complete).

6. Show that Theorem 9.1 is false for p = ∞.

7. If f is a nonzero constant function on [0,1], show that f ∉ W₀^{1,p}(0,1) for 1 ≤ p < ∞.

8. Let Lu = u″ + u and E(x) = H(x) sin x, x ∈ R.

a) Show that E is a fundamental solution of L.
b) What is the corresponding solution formula for Lu = f?
c) The fundamental solution E is not the same as the one given in (9.3.49). Does this call for any explanation?

9. Show that E(x,t) = ½ H(t − |x|) is a fundamental solution for the wave operator Lu = u_tt − u_xx.

10. The fourth order operator Lu = u_xxxx + 2u_xxyy + u_yyyy in R² is the biharmonic operator which arises in the theory of deformation of elastic plates.

a) Show that L = ∆², i.e. Lu = ∆(∆u), where ∆ is the Laplacian.
b) Find a fundamental solution of L. (Suggestions: to solve LE = δ, first solve ∆F = δ and then ∆E = F. Since F will depend on r = √(x² + y²) only, you can look for a solution E = E(r) also.)

11. Let Lu = u″ + αu′ where α > 0 is a constant.

a) Find a fundamental solution of L which is a tempered distribution.
b) Find a fundamental solution of L which is not a tempered distribution.

12. Show directly that u(x,t) defined by (9.3.31) is a classical solution of the heat equation for t > 0, under the assumption that h is bounded and continuous on R^N.

13. Assuming that (9.3.31) is valid and h ∈ L^p(R^N), derive the decay property

||u(·,t)||_{L^∞(R^N)} ≤ ||h||_{L^p(R^N)} / t^{N/2p}   (9.4.3)

for 1 ≤ p ≤ ∞.

14. If

G(x,y) = y(x−1)   0 < y < x < 1
G(x,y) = x(y−1)   0 < x < y < 1

show that G is a fundamental solution of Lu = u″ in (0,1).

15. Is the heat operator L = ∂/∂t − ∆ elliptic?

16. Prove Proposition 9.2.

Chapter 10

Linear operators

10.1 Linear mappings between Banach spaces

Let X, Y be Banach spaces. We say that

T : D(T) ⊂ X ⟼ Y   (10.1.1)

is linear if

T(c₁x₁ + c₂x₂) = c₁T(x₁) + c₂T(x₂)   ∀x₁, x₂ ∈ D(T), ∀c₁, c₂ ∈ C   (10.1.2)

Here D(T) is the domain of T, which we do not assume is all of X. Note, however, that it must be a subspace of X according to this definition. Likewise R(T), the range of T, must be a subspace of Y. If D(T) is dense in X, i.e. \overline{D(T)} = X, we say T is densely defined. It is common to write Tx instead of T(x) when T is linear, and we will often use this notation.

The definition of operator norm given earlier in (5.3.1) for the case D(T) = X may be generalized to the present case.
Definition 10.1. The norm of the operator T is

||T||_{X,Y} = sup_{x ∈ D(T), x ≠ 0} ||Tx||_Y / ||x||_X   (10.1.3)

In general we allow for the case that ||T||_{X,Y} = ∞.

Definition 10.2. If ||T||_{X,Y} < ∞ we will say that T is bounded on its domain. If in addition D(T) = X we say T is bounded on X, or more simply that T is bounded, if there is no possibility of confusion.

If it is clear from context what X, Y are, we may write ||x|| instead of ||x||_X, etc. We point out, however, that many linear operators of interest may be defined for many different choices of X, Y, and it will be important to be able to specify precisely which spaces we have in mind. Verification of the following properties is left for the reader.

Proposition 10.1. If T : D(T) ⊂ X ⟼ Y is a linear operator then

1. ||T|| = sup_{x ∈ D(T), ||x||=1} ||Tx||.
2. ||Tx|| ≤ ||T|| ||x|| for all x ∈ D(T).

The proof of the following is more or less the same as that of Proposition 5.4.

Theorem 10.1. Let T be a linear operator from X to Y. Then the following are equivalent:

1. T is bounded on its domain.
2. T is continuous at every point of D(T).
3. T is continuous at some point of D(T).
4. T is continuous at 0.

We also have (see Exercise 3)

Proposition 10.2. If T is bounded on its domain then it has a unique norm preserving extension to \overline{D(T)}. That is to say, there exists a unique linear operator S : \overline{D(T)} ⊂ X ⟼ Y such that Sx = Tx for x ∈ D(T) and ||S|| = ||T||.

It follows that if T is densely defined and bounded on its domain, then it automatically has a unique bounded extension to all of X. In such a case we will always assume that T has been replaced by this extension, unless otherwise stated.

Recall the notations introduced previously:

B(X,Y) = {T : X → Y : T is linear and ||T||_{X,Y} < ∞}   B(X) = B(X,X)   (10.1.4)
X* = B(X,C)   (10.1.5)

10.2 Examples of linear operators

We next discuss an extensive list of examples of linear operators.

Example 10.1. Let X = C^n, Y = C^m and Tx = Ax for some m × n complex matrix A, i.e.

(Tx)_k = Σ_{j=1}^n a_{kj} x_j   k = 1, ..., m   (10.2.1)

where a_{kj} is the (k,j) entry of A. Clearly T is linear, and in Exercise 6 you are asked to verify that T is bounded for any choice of the norms on X, Y. The exact value of the operator norm of T, however, will depend on exactly which norms are used in X, Y.

Suppose we use the usual Euclidean norm ||·||₂ in both spaces. Then using the Schwarz inequality we may obtain

||Tx||² = Σ_{k=1}^m | Σ_{j=1}^n a_{kj} x_j |² ≤ Σ_{k=1}^m ( Σ_{j=1}^n |a_{kj}|² )( Σ_{j=1}^n |x_j|² )   (10.2.2)
= ( Σ_{k=1}^m Σ_{j=1}^n |a_{kj}|² ) ||x||²   (10.2.3)

from which we conclude that

||T|| ≤ ( Σ_{k=1}^m Σ_{j=1}^n |a_{kj}|² )^{1/2}   (10.2.4)

The right hand side of (10.2.4) is known as the Frobenius norm of A, and it is easy to check that it satisfies all of the axioms of a norm on the vector space of m × n matrices. Note however that (10.2.4) is only an inequality, and it is known to be strict in general, as will be clarified below.

If p, q ∈ [1,∞], let us temporarily use the notation ||T||_{p,q} for the norm of T when we use the p-norm in X and the q-norm in Y. The problem of computing ||T||_{p,q} in a more or less explicit way from the entries of A is difficult in general, but several special cases are well known.

• If p = q = 1 then ||T||_{1,1} = max_j Σ_{k=1}^m |a_{kj}|, the maximum absolute column sum of A.

• If p = q = ∞ then ||T||_{∞,∞} = max_k Σ_{j=1}^n |a_{kj}|, the maximum absolute row sum of A.
• If p = q = 2 then ||T||_{2,2} is the largest singular value of A, or equivalently ||T||²_{2,2} is the largest eigenvalue of the square Hermitian matrix A*A.

Details about these points may be found in most textbooks on linear algebra or numerical analysis; see for example Chapter 2 of [14].

Example 10.2. Let X = Y = L^p(R^N) and let T be the translation operator defined on D(T) = X by

Tu(x) = τ_h u(x) = u(x − h)   (10.2.5)

for some fixed h ∈ R^N. Clearly

||Tu|| = ||u||   (10.2.6)

for any u, so that ||T|| = 1.

Example 10.3. Let Ω ⊂ R^N, X = Y = L^p(Ω), m ∈ L^∞(Ω), and define the multiplication operator T on D(T) = X by

Tu(x) = m(x)u(x)   (10.2.7)

Clearly we have

||Tu||_{L^p} ≤ ||m||_{L^∞} ||u||_{L^p}   (10.2.8)

so that ||T|| ≤ ||m||_{L^∞}. We claim that equality actually holds. The case m ≡ 0 is trivial; otherwise, in the case 1 ≤ p < ∞, we can see it as follows. For any 0 < ε < ||m||_{L^∞} there must exist a measurable set Σ ⊂ Ω of measure η > 0 such that |m(x)| ≥ ||m||_{L^∞} − ε for x ∈ Σ. If we now choose u = χ_Σ, the characteristic function of Σ, then ||u||_{L^p} = η^{1/p} and

||Tu||^p_{L^p} = ∫_Σ |m(x)|^p dx ≥ η (||m||_{L^∞} − ε)^p   (10.2.9)

Thus

||Tu||_{L^p} ≥ (||m||_{L^∞} − ε) ||u||_{L^p}   (10.2.10)

which implies ||T|| ≥ ||m||_{L^∞} − ε, and since ε was arbitrary, ||T|| ≥ ||m||_{L^∞} as needed. The case p = ∞ is left as an exercise.

Example 10.4. One of the most important classes of operators we will be concerned with in this book is that of integral operators. Let Ω ⊂ R^N, X = Y = L²(Ω), K ∈ L²(Ω × Ω), and define the operator T by

Tu(x) = ∫_Ω K(x,y)u(y) dy   (10.2.11)

It may not be immediately clear how we should define D(T), but note by the Schwarz inequality that

||Tu||²_{L²} = ∫_Ω | ∫_Ω K(x,y)u(y) dy |² dx   (10.2.12)
≤ ∫_Ω ( ∫_Ω |K(x,y)|² dy )( ∫_Ω |u(y)|² dy ) dx   (10.2.13)
= ( ∫_Ω ∫_Ω |K(x,y)|² dy dx )( ∫_Ω |u(y)|² dy )   (10.2.14)

This shows simultaneously that Tu ∈ L² whenever u ∈ L², so that we may take D(T) = L², and that

||T|| ≤ ||K||_{L²(Ω×Ω)}   (10.2.15)

We refer to K as the kernel¹ of the operator T. Note the formal similarity between this calculation and that of Example 10.1. Just as in that case, the inequality for ||T|| is strict in general.

(¹ which is not to be confused with the null space of T!)

Example 10.5. Let h be a locally integrable function and define the convolution operator

Tu(x) = (h ∗ u)(x) = ∫_{R^N} h(x−y)u(y) dy   (10.2.16)

This is obviously an operator of the type (10.2.11) with Ω = R^N, but for which K(x,y) = h(x−y) does not satisfy the L² condition of the previous example, except in trivial cases. Thus it is again not immediately apparent how we should define D(T). Recall, however, Young's convolution inequality (7.4.2), which implies immediately that

||Tu||_{L^r} ≤ ||h||_{L^p} ||u||_{L^q}   (10.2.17)

if

1/p + 1/q = 1 + 1/r   p, q, r ∈ [1,∞]   (10.2.18)

Thus we may take D(T) = X = L^q(R^N) and Y = L^r(R^N) with p, q, r related as above, in which case ||T|| ≤ ||h||_{L^p}. (Is this bound sharp?)
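Returning to the bounds (10.2.4) and (10.2.15): both are strict in general, which is easy to observe numerically in the matrix case, where ||T||_{2,2} is computable as a largest singular value. A short Python sketch (our own illustration; the random matrix is arbitrary):

```python
# Compare the exact operator norms of Example 10.1 with the Frobenius bound
# (10.2.4): the bound is strict except in special (e.g. rank one) cases.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 7))

print(np.linalg.norm(A, 2), np.linalg.norm(A, 'fro'))          # ||T||_{2,2} < Frobenius
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # max abs column sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # max abs row sum
```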
Example 10.6. If we let

Tu(x) = 1/(2π)^{N/2} ∫_{R^N} u(y) e^{−ix·y} dy   (10.2.19)

then Tu(x) = û(x) is the Fourier transform of u studied in Chapter 8. It is again a special case of (10.2.11), with kernel K not satisfying the L² integrability condition. From the earlier discussion of properties of the Fourier transform we have the following:

1. T is a bounded linear operator from X = L¹(R^N) into Y = C₀(R^N) with norm

||T|| ≤ 1/(2π)^{N/2}   (10.2.20)

In fact it is easy to see that equality holds here; see Exercise 16.

2. T is a bounded linear operator from X = L²(R^N) onto Y = L²(R^N) with norm ||T|| = 1. Indeed ||Tu|| = ||u|| for all u ∈ L²(R^N) by the Plancherel identity (8.5.25).

It can also be shown, although this is more difficult (see Chapter I, Section 2 of [35]), that T is a bounded linear operator from X = L^p(R^N) into Y = L^q(R^N) if

1 < p ≤ 2 ≤ q < ∞   1/p + 1/q = 1   (10.2.21)

If u ∈ L^p(R^N) for p > 2 then û always exists in a distributional sense, but may not be a function; see Chapter I, Section 4.13 of [35].

Example 10.7. Let m ∈ L^∞(R^N) and define the linear operator T, known as a Fourier multiplication operator, by

(Tu)ˆ(y) = m(y)û(y)   (10.2.22)

where as usual û denotes the Fourier transform. If we use F as an alternative special notation for the Fourier transform, and let S denote the multiplication operator defined in Example 10.3, then this is equivalent to defining T = F^{−1}SF. If we take X = Y = L²(R^N) then, from the known properties of F and S, we get immediately from the Plancherel identity that

||Tu||_{L²} = ||(Tu)ˆ||_{L²} = ||mû||_{L²} ≤ ||m||_{L^∞} ||û||_{L²} = ||m||_{L^∞} ||u||_{L²}   (10.2.23)

implying that ||T|| ≤ ||m||_{L^∞}. As in the case of the ordinary multiplication operator, one can show that equality must hold.

Note that formally we have

Tu(x) = 1/(2π)^N ∫_{R^N} e^{ix·y} m(y) ( ∫_{R^N} e^{−iz·y} u(z) dz ) dy   (10.2.24)
= ∫_{R^N} ( 1/(2π)^N ∫_{R^N} m(y) e^{i(x−z)·y} dy ) u(z) dz   (10.2.25)
= ∫_{R^N} u(z) h(x−z) dz   (10.2.26)

provided that ĥ(y) = m(y)/(2π)^{N/2}. Thus the Fourier multiplication operator appears to be just a special kind of convolution operator. However, m ∈ L^∞(R^N) can happen even if h ∉ L^p(R^N) for any p, in which case the above discussion of convolution operators is not applicable. A trivial example of this is m(y) ≡ 1, corresponding to T being the identity mapping and h being the delta function.

A significant example is obtained by taking N = 1 and m(y) = −i sgn(y). By (8.8.5) we see that m(y) = √(2π) ĥ(y) if h(x) = (1/π) pv (1/x), where here the Fourier transform is meant in the sense of distributions. Thus we have, at least formally, that

Tu(x) = ( (1/π) pv (1/x) ∗ u )(x) = (1/π) pv ∫_{−∞}^∞ u(y)/(x−y) dy   (10.2.27)

This operator is known as the Hilbert transform, and will from now on be denoted by H. Since we have not rigorously established the validity of the formulas (10.2.27), or even explained why the principal value integral in (10.2.27) should exist in general for u ∈ L²(R), we will always use the above, completely unambiguous definition of H as a Fourier multiplication operator when anything needs to be proved. For example, since |m(y)| ≡ 1, we get |(Hu)ˆ(y)| ≡ |û(y)| and then

||Hu||_{L²} = ||(Hu)ˆ||_{L²} = ||û||_{L²} = ||u||_{L²}   (10.2.28)

and in particular ||H|| = 1 as an operator on L²(R). The Hilbert transform is the archetypical example of a singular integral operator; see for example Chapter II of [34].

A Fourier multiplication operator is often referred to as a filter, especially in the electrical engineering and signal processing literature. The idea here is that if u = u(t), t ∈ R, represents a signal, then û(k) corresponds to the signal in the ‘frequency domain’, in the sense that the Fourier inversion formula

u(t) = 1/√(2π) ∫_{−∞}^∞ e^{ikt} û(k) dk   (10.2.29)

represents the signal as a superposition of fixed frequency signals e^{ikt}, with û(k) being the weight given to the component of frequency k. The effect of a filter is thus to modify the frequency component û(k) by multiplying it by m(k).
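The multiplier definition of H is also the practical way to compute it. The following Python sketch (our own illustration, using the discrete Fourier transform as a stand-in for F; the grid and the signal are arbitrary choices) realizes m(k) = −i sgn(k) and checks two facts noted above: ||Hu|| = ||u||, and, since m(y)² = −1 for y ≠ 0, H(Hu) = −u for mean zero data.

```python
# The Hilbert transform as a Fourier multiplication operator, realized
# discretely with the FFT: H = F^{-1} S F with multiplier m(k) = -i sgn(k).
import numpy as np

n = 4096
x = np.linspace(-40, 40, n, endpoint=False)
u = np.exp(-x**2) * np.cos(3 * x)
u -= u.mean()                     # remove the k = 0 mode, which m annihilates

k = np.fft.fftfreq(n)
m = -1j * np.sign(k)
H = lambda v: np.fft.ifft(m * np.fft.fft(v)).real

print(np.linalg.norm(H(u)), np.linalg.norm(u))   # equal: ||Hu|| = ||u||
print(np.max(np.abs(H(H(u)) + u)))               # ~ machine zero: H(Hu) = -u
```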
The operator T coming from the choice

m(k) = 1 for |k| < k₀,   m(k) = 0 for |k| ≥ k₀   (10.2.30)

leaves low frequencies (|k| < k₀) unchanged and removes all of the high frequency components, and for this reason is sometimes called an ideal low-pass filter. Likewise 1 − m(k) gives an ideal high-pass filter. A band-pass filter would be one for which m(k) = 1 on some interval of frequencies [k₁, k₂] and is zero otherwise.

Example 10.8. If H is a Hilbert space and M ⊂ H is a closed subspace, we have seen in Chapter 6 that the orthogonal projection P_M is a linear operator defined on all of H. It is immediate from the relation (6.4.10) that ||P_M x|| ≤ ||x|| for all x ∈ H, and aside from the trivial case P_M = 0 there must exist x ∈ H, x ≠ 0, such that P_M x = x, from which it follows that ||P_M|| = 1.

Example 10.9. Let X = Y = ℓ² (the sequence space defined in Example 6.3). If x = {x₁, x₂, ...} ∈ ℓ², set

S₊x = {0, x₁, x₂, ...}   (10.2.31)
S₋x = {x₂, x₃, ...}   (10.2.32)

which are called respectively the right and left shift operators on ℓ². Clearly ||S₊x|| = ||x|| for any x, and ||S₋x|| ≤ ||x|| with equality if x₁ = 0. Thus ||S₊|| = ||S₋|| = 1. Note that S₋S₊ = I (the identity map), while S₊S₋ = P_M, where M is the closed subspace M = {x ∈ ℓ² : x₁ = 0}.

Example 10.10. Let Ω be an open set in R^N, m a positive integer, and

Tu(x) = Σ_{|α|≤m} a_α(x) D^α u   (10.2.33)

where the coefficients a_α ∈ C(Ω). If X = Y = L^p(Ω), 1 ≤ p < ∞, then we can let D(T) = C^m(Ω), which is a dense subset of X (since it contains C_0^∞(Ω), for example). Thus T is a densely defined linear operator, but it is not bounded in general. For example, take X = Y = L²(0,1), Tu = u′ and u_n(x) = sin nπx. Then by explicit calculation we find ||u_n|| = 1/√2 and ||Tu_n|| = nπ/√2, so that ||Tu_n||/||u_n|| → ∞ as n → ∞.

Note that in the constant coefficient case with Ω = R^N we have (Tu)ˆ(y) = P(y)û(y) (up to the normalization constant in (9.3.20)), provided u is a tempered distribution, where P is the characteristic polynomial of the operator as discussed earlier in Section 9.3. Thus T is formally a Fourier multiplication operator, but with a multiplier m(y) = P(y) which is not in L^∞.

Example 10.11. A pseudodifferential operator (ΨDO) is an operator of the form

Tu(x) = ∫_{R^N} a(x,y) e^{ix·y} û(y) dy   (10.2.34)

for some function a, known as the symbol of T. If a(x,y) = a(y) with a ∈ L^∞(R^N), then T is a Fourier multiplication operator, while if a = a(x) it is an ordinary multiplication operator.

10.3 Linear operator equations

Given a linear operator T : X → Y, we wish to study the operator equation

Tu = f   (10.3.1)

where f is a given member of Y. In the usual way, if T is one-to-one, i.e. if N(T) = {0}, then we may define the corresponding inverse operator T^{−1} : R(T) → D(T). It is easy to check that T^{−1} is also linear when it exists, but it need not be bounded even if T is, and it may be bounded even if T is not. Some key questions which always arise in connection with (10.3.1) are:

• For what f's does there exist a solution u, i.e. what is the range R(T)?

• If a solution exists, is it unique? If not, how can we describe the set of all solutions? Since any two solutions differ by a solution of Tu = 0, this amounts to characterizing the null space N(T).

The investigation of these questions will clearly require us to be precise about what the spaces X, Y are.
For reasons which will become more apparent below, we will mostly focus on the case that X = Y = H, a Hilbert space, but the study of more general situations can be found in more advanced texts.

Let us first consider the case X = C^n, Y = C^m, so that Tu = Au for some m × n matrix A = [a_{kj}]. Well known results from linear algebra tell us:

• R(T) is the column space of A, i.e. the set of all linear combinations of the columns of A.

• R(T) = N(T*)^⊥, where T* is the matrix multiplication operator with matrix A*, the conjugate transpose (or Hermitian conjugate, or adjoint matrix) of A.

The second item provides a complete characterization of when Tu = f is solvable, namely, a solution exists if and only if f ⊥ v for every v ∈ N(T*). If the subspace N(T*) has the basis {v₁, ..., v_p}, then it is equivalent to require ⟨f, v_k⟩ = 0, k = 1, ..., p. This amounts to p solvability, or consistency, conditions on f, which are necessary and sufficient for the existence of a solution of Tu = f. Eventually we will prove a version of this statement in a Hilbert space setting, for certain types of operator T.

The main point, at present, is that the operator T* plays a key role in understanding the solvability of Tu = f, so something similar can be expected in the infinite dimensional case. The operator T* is the so-called adjoint operator of T, and in the next section we show how it can be defined, at least in the case that T is bounded. The case of unbounded T is more subtle, and will be taken up in the following chapter.

10.4 The adjoint operator

In the finite dimensional example of the previous section, note that T* has the property

⟨Tu, v⟩ = ⟨u, T*v⟩   ∀u ∈ C^n, v ∈ C^m   (10.4.1)

since either side is equal to Σ_{k=1}^m Σ_{j=1}^n a_{kj} u_j \bar{v}_k. Now suppose X = Y = H, a Hilbert space, and T is a bounded linear operator on H. With the above motivation we seek another bounded linear operator T* with the property that

⟨Tu, v⟩ = ⟨u, T*v⟩   ∀u, v ∈ H   (10.4.2)

If such a T* can be found, observe that if there exists any solution u of Tu = f, then we have

⟨f, v⟩ = ⟨Tu, v⟩ = ⟨u, T*v⟩ = ⟨u, 0⟩ = 0   (10.4.3)

for any v ∈ N(T*), so that f ⊥ v must hold for all such v. We have thus shown that R(T) ⊥ N(T*), or equivalently

R(T) ⊂ N(T*)^⊥   N(T*) ⊂ R(T)^⊥   (10.4.4)

where the second inclusion follows from the first and the fact that N(T*) is closed. In particular, f ⊥ N(T*) is a necessary condition for the solvability of Tu = f. The sufficiency of this condition need not hold in general, as we will see by examples, but it does hold for some important classes of operator T.

Theorem 10.2. If H is a Hilbert space and T ∈ B(H), then there exists a unique T* ∈ B(H), the adjoint of T, such that (10.4.2) holds. In addition, ||T*|| = ||T||.

Proof: Fix v ∈ H and let ℓ(u) = ⟨Tu, v⟩. Clearly ℓ is linear on H, and

|ℓ(u)| = |⟨Tu, v⟩| ≤ ||Tu|| ||v|| ≤ ||T|| ||u|| ||v||   (10.4.5)

and therefore ℓ ∈ H* with ||ℓ|| ≤ ||T|| ||v||. By the Riesz Representation Theorem there exists a unique v* ∈ H such that

ℓ(u) = ⟨u, v*⟩   ∀u ∈ H   (10.4.6)

We define T*v = v*, so that clearly T* : H → H and (10.4.2) is true. We claim next that T* is linear.
To see this, note that for any v₁, v₂ ∈ H, u ∈ H and scalars c₁, c₂,

⟨u, T*(c₁v₁ + c₂v₂)⟩ = ⟨Tu, c₁v₁ + c₂v₂⟩   (10.4.7)
= \bar{c}_1 ⟨Tu, v₁⟩ + \bar{c}_2 ⟨Tu, v₂⟩   (10.4.8)
= \bar{c}_1 ⟨u, T*v₁⟩ + \bar{c}_2 ⟨u, T*v₂⟩   (10.4.9)
= ⟨u, c₁T*v₁ + c₂T*v₂⟩   (10.4.10)

Since u is arbitrary we must have T*(c₁v₁ + c₂v₂) = c₁T*v₁ + c₂T*v₂, as needed.

Next we claim that T* is bounded. To verify this, note that ||T*v|| = ||v*|| = ||ℓ|| ≤ ||T|| ||v||, implying that

||T*|| ≤ ||T||   (10.4.11)

To check the uniqueness property, suppose that there exists some other bounded linear operator S such that ⟨Tu, v⟩ = ⟨u, Sv⟩ for all u, v ∈ H. It would then follow that ⟨u, T*v − Sv⟩ = 0 for all u, implying T*v = Sv for all v; in other words S = T* must hold.

Finally we show that ||T*|| = ||T||. Since T* ∈ B(H) it also has an adjoint T** satisfying ⟨T*u, v⟩ = ⟨u, T**v⟩ for all u, v. But we also have

⟨T*u, v⟩ = \overline{⟨v, T*u⟩} = \overline{⟨Tv, u⟩} = ⟨u, Tv⟩   (10.4.12)

so by uniqueness of the adjoint we must have T** = T. But then from (10.4.11), with T replaced by T*, it follows that ||T|| = ||T**|| ≤ ||T*||, and so from (10.4.11) again we obtain ||T|| = ||T*||.

Certain special classes of operator are defined according to the relationship between T and T*.

Definition 10.3. If T ∈ B(H) then

• If T* = T we say T is self-adjoint.
• If T* = −T we say T is skew-adjoint.
• If T* = T^{−1} we say T is unitary.

Proposition 10.3. If S, T ∈ B(H) then ST ∈ B(H) and

(ST)* = T*S*   (10.4.13)

If T^{−1} ∈ B(H) then (T*)^{−1} ∈ B(H) and

(T^{−1})* = (T*)^{−1}   (10.4.14)

The proofs of these two properties will be left for the exercises.

10.5 Examples of adjoints

We now revisit several of the examples from Section 10.2, with a focus on computing the corresponding adjoint operators. We remark that the uniqueness assertion of Theorem 10.2 is a relatively elementary thing, but note how it gets used repeatedly below to establish what the adjoint of a given operator T is.

Example 10.12. In the case H = C^n with Tu = Au, A an n × n matrix, we already know that

⟨Tu, v⟩ = ⟨Au, v⟩ = ⟨u, A*v⟩   (10.5.1)

where A* is the conjugate transpose matrix of A. Thus by uniqueness T*v = A*v, as expected. T is then obviously self-adjoint if A* = A, consistent with the usual definition from linear algebra; A is also said to be a Hermitian matrix in this case, or symmetric in the real case. Likewise the meanings of skew-adjoint and unitary coincide with the way these terms are normally used in linear algebra. Note that we haven't considered here the case of an m × n matrix with m ≠ n, since then the domain and range spaces would be different, requiring a somewhat different way of defining the adjoint.

Example 10.13. Consider the multiplication operator Tu(x) = m(x)u(x) on L²(Ω), where m ∈ L^∞(Ω). Then

⟨Tu, v⟩ = ∫_Ω m(x)u(x)\overline{v(x)} dx = ∫_Ω u(x)\overline{\overline{m(x)}v(x)} dx   (10.5.2)

from which it follows that T*v(x) = \overline{m(x)}v(x). T is self-adjoint if m is real valued, skew-adjoint if m is purely imaginary, and unitary if |m(x)| ≡ 1.
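The adjoint relation (10.4.2) is easy to observe in a discretized setting. The following Python sketch (our own illustration; the multiplier and the vectors are arbitrary choices) checks ⟨Tu,v⟩ = ⟨u,T*v⟩ for the multiplication operator of Example 10.13, with T*v = m̄v.

```python
# Discrete check of Example 10.13: for Tu = m*u on L^2(0,1), the adjoint is
# multiplication by the complex conjugate of m.
import numpy as np

n = 1000
x = np.linspace(0, 1, n, endpoint=False); dx = 1.0 / n
inner = lambda f, g: np.vdot(g, f) * dx      # <f,g> = int f * conj(g) dx

m = np.exp(2j * np.pi * x)                   # a bounded multiplier with |m| = 1
rng = np.random.default_rng(0)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(inner(m * u, v), inner(u, np.conj(m) * v))   # equal: <Tu,v> = <u,T*v>
```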
Example 10.14. Next we look at the integral operator (10.2.11) on L²(Ω), with K ∈ L²(Ω × Ω), so that T is bounded. Assuming that the use of Fubini's theorem below can be justified, we get

⟨Tu, v⟩ = ∫_Ω ( ∫_Ω K(x,y)u(y) dy ) \overline{v(x)} dx   (10.5.3)
= ∫_Ω u(y) ( ∫_Ω K(x,y)\overline{v(x)} dx ) dy   (10.5.4)

which is the same as ⟨u, T*v⟩ if and only if

T*v(y) = ∫_Ω \overline{K(x,y)} v(x) dx   (10.5.5)

or equivalently

T*v(x) = ∫_Ω \overline{K(y,x)} v(y) dy   (10.5.6)

Thus T* is the integral operator with kernel \overline{K(y,x)}; note again the formal similarity to the case of the matrix multiplication operator. The use of Fubini's theorem to exchange the order of integration above can be justified by observing that K(x,y)u(y)\overline{v(x)} ∈ L¹(Ω × Ω) under our assumptions (Exercise 9). T will be self-adjoint, for example, if K is real valued and symmetric in x, y.

Example 10.15. Consider next T = F, the Fourier transform on L²(R^N). Based on the previous example we may expect that

T*v(x) = 1/(2π)^{N/2} ∫_{R^N} e^{ix·y} v(y) dy   (10.5.7)

since the kernel here is the conjugate transpose of that for T. This is correct, but can't be proved as above, since the use of Fubini's theorem can't be directly justified. Instead we proceed by first recalling the Parseval identity (8.5.17),

∫_{R^N} û(x)v(x) dx = ∫_{R^N} u(x)v̂(x) dx   (10.5.8)

Thus

⟨Tu, v⟩ = ∫_{R^N} û(x)\overline{v(x)} dx = ∫_{R^N} u(x) \widehat{\bar{v}}(x) dx   (10.5.9)

so that T*v(x) = \overline{\widehat{\bar{v}}(x)}. One can now check, by unwinding the definitions, that this is the same as T*v(x) = Tv(−x), which amounts to (10.5.7). Furthermore, we recognize from the Fourier inversion theorem that (10.5.7) may be restated as

T*v = T^{−1}v   (10.5.10)

so in particular the Fourier transform is a unitary operator.

Example 10.16. If T is the Fourier multiplication operator T = F^{−1}SF on L²(R^N), where S is the multiplication operator with L^∞ multiplier m, then we obtain T* = F^{−1}S*F, i.e. T* is the Fourier multiplication operator with multiplier \bar{m}. In particular the Hilbert transform is skew-adjoint, H* = −H, since \overline{m(y)} = −m(y) in this case.

10.6 Conditions for solvability of linear operator equations

Let us return now to the general study of operator equations Tu = f, when T is a bounded linear operator on a Hilbert space H.

Proposition 10.4. If T ∈ B(H) then N(T*) = R(T)^⊥.

Proof: By (10.4.4) we have N(T*) ⊂ R(T)^⊥. Conversely, if v ∈ R(T)^⊥ then ⟨u, T*v⟩ = ⟨Tu, v⟩ = 0 for all u ∈ H. Thus T*v = 0 must hold, so v ∈ N(T*).

Since M^{⊥⊥} = \overline{M} for any subspace M, we get immediately

Corollary 10.1. If T ∈ B(H) then N(T*)^⊥ = \overline{R(T)}.

Corollary 10.2. If T ∈ B(H) has closed range then Tu = f has a solution if and only if f ⊥ N(T*).

Since N(T*)^⊥ is always a closed subspace, clearly the identity R(T) = N(T*)^⊥ can only hold if T has closed range. This is not true in general, although it does hold in the finite dimensional case, by Theorem 5.1.

Example 10.17. Let H = L²(0,1) and Tu(x) = ∫_0^x u(y) dy. We may think of this operator as the special case of (10.2.11) in which

K(x,y) = 1 for y < x,   K(x,y) = 0 for y > x   (10.6.1)

This kernel is clearly in L²((0,1) × (0,1)), so that T ∈ B(H). Let f_n be any sequence of continuously differentiable functions such that f_n(0) = 0 for all n and f_n converges in H to f(x) = H(x − 1/2). Each f_n is in the range of T, since f_n = Tu_n with u_n = f_n′. But f ∉ R(T), since the range of T contains only continuous functions. Thus T does not have closed range.

Definition 10.4. If T is any linear operator, we set rank(T) = dim R(T), and say that T is a finite rank operator whenever rank(T) < ∞.
We have thus established the following:

Corollary 10.3. If T ∈ B(H) and rank(T) < ∞ then R(T) = N(T*)^⊥.

Aside from the completely finite dimensional situation, there are other finite rank operators which will be of interest to us.

Example 10.18. Let H = L²(0,1) and Tu(x) = ∫_0^1 xy u(y) dy. Then R(T) = span(e), where e(x) = x, so rank(T) = 1. Here T is self-adjoint, so N(T*) = N(T) = {e}^⊥, and the conclusion of the corollary is obvious.

More generally, let H = L²(Ω) for some bounded open set Ω ⊂ R^N and let Tu(x) = ∫_Ω K(x,y)u(y) dy, where

K(x,y) = Σ_{j=1}^M φ_j(x)ψ_j(y)   (10.6.2)

for some φ_j, ψ_j ∈ L²(Ω). We may always assume that the φ_j's and ψ_j's are linearly independent. Such a kernel K is sometimes said to be degenerate. In this case we have R(T) = span(φ₁, ..., φ_M), so that rank(T) = M, and the solvability condition f ⊥ N(T*) amounts to requiring that f belong to this M dimensional subspace spanned by φ₁, ..., φ_M.

10.7 Fredholm operators and the Fredholm alternative

The following is a very useful concept.

Definition 10.5. T ∈ B(H) is of Fredholm type (or, more informally, a Fredholm operator) if

• N(T), N(T*) are both finite dimensional, and
• R(T) is closed.

For such an operator T we define ind(T), the index of T, as

ind(T) = dim(N(T)) − dim(N(T*))   (10.7.1)

For our purposes the case of Fredholm operators of index 0 will be the most important one. If we can somehow show that an operator T belongs to this class, then we obtain immediately the conclusion that ‘uniqueness is equivalent to existence’. That is to say, the property that Tu = f has at most one solution for any f ∈ H is equivalent to the property that Tu = f has at least one solution for any f ∈ H. The following elaboration of this is known as the Fredholm Alternative Theorem.

Theorem 10.3. Let T ∈ B(H) be a Fredholm operator of index 0. Then either

1. N(T) = N(T*) = {0} and the equation Tu = f has a unique solution for every f ∈ H, or

2. dim(N(T)) = dim(N(T*)) = M > 0, the equation Tu = f has a solution u* if and only if f satisfies the M compatibility conditions f ⊥ N(T*), and the general solution of Tu = f is then {u = u* + v : v ∈ N(T)}.

Example 10.19. Every linear operator on C^N is of Fredholm type and index 0, since by a well known fact from matrix theory a matrix and its transpose have null spaces of the same dimension.

In the infinite dimensional situation it is easy to find examples of nonzero index – the simplest example is a shift operator.

Example 10.20. If we define S₊, S₋ as in (10.2.31), (10.2.32), then by Exercise 10 S₊* = S₋ and S₋* = S₊, and it is then easy to see that ind(S₊) = −1 and ind(S₋) = 1. Clearly, by shifting to the left or right by more than one entry, we can create an example of a Fredholm operator with any integer as its index.

Example 10.21. We will see in Chapter 13 that the operator λI + T, where T is an integral operator of the form (10.2.11) with K ∈ L²(Ω × Ω) and λ ≠ 0, is always a Fredholm operator of index 0.

10.8 Convergence of operators

Recall that if X, Y are Banach spaces, we have defined a norm on B(X,Y) for which all of the norm axioms are satisfied, so that B(X,Y) is a normed linear space, and in fact is itself a Banach space.

Definition 10.6. We say T_n → T uniformly if ||T_n − T|| → 0, i.e. T_n → T in the topology of B(X,Y). We say T_n → T strongly if T_n x → Tx for every x ∈ X.

Clearly uniform convergence implies strong convergence, but the converse is false (see Exercise 17).
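A standard concrete instance of the failure of the converse (closely related to Exercise 17) can be visualized numerically: on ℓ², the orthogonal projections P_n onto the first n coordinates converge strongly to the identity, yet ||P_n − I|| = 1 for every n. A finite truncation in Python (our own illustration):

```python
# On l^2, the truncations P_n x = (x_1,...,x_n,0,0,...) satisfy P_n x -> x for
# each fixed x (strong convergence), but (P_n - I) e_{n+1} = -e_{n+1} shows
# ||P_n - I|| = 1 for all n, so there is no uniform convergence.
import numpy as np

N = 5000                                   # finite-dimensional stand-in for l^2
x = 1.0 / np.arange(1, N + 1)              # a fixed element of l^2
for n in [10, 100, 1000]:
    Pn_x = np.where(np.arange(N) < n, x, 0.0)
    e = np.zeros(N); e[n] = 1.0            # the unit vector e_{n+1}
    Pn_e = np.where(np.arange(N) < n, e, 0.0)   # = 0, since e lives at index n
    print(n, np.linalg.norm(Pn_x - x), np.linalg.norm(Pn_e - e))  # -> 0, always 1
```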
As usual we can define an infinite series of operators as the limit of the partial sums, and speak of uniform or strong convergence of the series. The series Σ_{n=1}^∞ T_n will converge uniformly to some limit T ∈ B(X) if

Σ_{n=1}^∞ ||T_n|| < ∞   (10.8.1)

and in this case ||T|| ≤ Σ_{n=1}^∞ ||T_n|| (see Exercise 18). An important special case is given by the following.

Theorem 10.4. If T ∈ B(X), λ ∈ C and ||T|| < |λ|, then (λI − T)^{−1} ∈ B(X),

(λI − T)^{−1} = Σ_{n=0}^∞ T^n / λ^{n+1}   (10.8.2)

where the series is uniformly convergent, and

||(λI − T)^{−1}|| ≤ 1 / (|λ| − ||T||)   (10.8.3)

Proof: If T_n is replaced by T^n/λ^{n+1}, then clearly (10.8.1) holds for the series on the right hand side of (10.8.2), so it is uniformly convergent to some S ∈ B(X). If S_N denotes the N'th partial sum, then

S_N(λI − T) = I − T^{N+1}/λ^{N+1}   (10.8.4)

Since ||T^{N+1}/λ^{N+1}|| ≤ (||T||/|λ|)^{N+1} → 0, we obtain S(λI − T) = I in the limit as N → ∞. Likewise (λI − T)S = I, so that (10.8.2), and subsequently (10.8.3), holds.

The formula (10.8.2) is easily remembered as the ‘geometric series’ for (λI − T)^{−1}.

10.9 Exercises

In these exercises assume that X is a Banach space and H is a Hilbert space.

1. If T₁, T₂ ∈ B(X), show that ||T₁ + T₂|| ≤ ||T₁|| + ||T₂||, ||T₁T₂|| ≤ ||T₁|| ||T₂||, and ||T^n|| ≤ ||T||^n.

2. If

A = [ 1  −2 ]
    [ 3   4 ]

compute the Frobenius norm of A and ||A||_p for p = 1, 2 and ∞.

3. Prove Proposition 10.2.

4. Define the averaging operator

Tu(x) = (1/x) ∫_0^x u(y) dy

Show that T is bounded on L^p(0,∞) for 1 < p < ∞. (Suggestions: assume first that u ≥ 0 and is a continuous function of compact support. If v = Tu, show that

∫_0^∞ v^p(x) dx = −p ∫_0^∞ v^{p−1}(x) x v′(x) dx

Note that xv′ = u − v and apply Hölder's inequality. Then derive the general case. The resulting inequality is known as Hardy's inequality.)

5. Let T be the Fourier multiplication operator on L²(R) with multiplier m(y) = H(y) (the Heaviside function), and define

M₊ = {u ∈ L²(R) : û(y) = 0 ∀y < 0}   M₋ = {u ∈ L²(R) : û(y) = 0 ∀y > 0}

a) Show that T = ½(I + iH), where H is the Hilbert transform.
b) Show that if u is real valued, then u is uniquely determined by either the real or the imaginary part of Tu.
c) Show that L²(R) = M₊ ⊕ M₋.
d) Show that T = P_{M₊}.
e) If u ∈ M₊ show that u = iHu. In particular, if u(x) = α(x) + iβ(x), then β = Hα and α = −Hβ.

(Comments: Tu is sometimes called the analytic signal of u. This terminology comes from the fact that Tu can be shown to always have an extension as an analytic function to the upper half of the complex plane. It is often convenient to work with Tu instead of u, because it avoids ambiguities due to k and −k really being the same frequency – the analytic signal has only positive frequency components. By b), u and Tu are in one-to-one correspondence, at least for real signals. The relationships between α and β in e) are sometimes called the Kramers–Kronig relations. Note that this means that M₊ contains no purely real valued functions except for u = 0, and likewise for M₋.)

6. Show that a linear operator T : C^n → C^m is always bounded, for any choice of norms on C^n and C^m.

7. If T, T^{−1} ∈ B(H), show that (T*)^{−1} ∈ B(H) and (T^{−1})* = (T*)^{−1}.

8. If S, T ∈ B(H), show that (i) (S + T)* = S* + T* and (ii) (ST)* = T*S*. (These properties, together with (iii) (λT)* = λ̄T* for scalars λ and (iv) T** = T, which we have already proved, are the axioms for an involution on B(H); that is to say, the mapping T ⟼ T* is an involution.
The term involution is also used more generally to refer to any mapping which is its own inverse.)

9. Give a careful justification of how (10.5.4) follows from (10.5.3), with reference to an appropriate version of Fubini's theorem.

10. Let S₊, S₋ be the right and left shift operators on ℓ². Show that S₊ = S₋* and S₋ = S₊*.

11. Let T be the Volterra integral operator Tu(x) = ∫_0^x u(y) dy, considered as an operator on L²(0,1). Find T* and N(T*).

12. Suppose T ∈ B(H) is self-adjoint and there exists a constant c > 0 such that ||Tu|| ≥ c||u|| for all u ∈ H. Show that there exists a solution of Tu = f for all f ∈ H. Show by example that the conclusion may be false if the assumption of self-adjointness is removed.

13. Let M be the multiplication operator Mu(x) = xu(x) in L²(0,1). Show that R(M) is dense but not closed.

14. An operator T ∈ B(H) is said to be normal if it commutes with its adjoint, i.e. T*T = TT*. Thus, for example, any self-adjoint, skew-adjoint, or unitary operator is normal. For a normal operator T show that

a) ||Tu|| = ||T*u|| for every u ∈ H.
b) T is one-to-one if and only if it has dense range.
c) Any Fourier multiplication operator, as in Example 10.7, is normal in L²(R^N).
d) The shift operators S₊, S₋ are not normal in ℓ².

15. If U(H) denotes the set of unitary operators on H, show that U(H) is a group under composition. Is U(H) a subspace of B(H)?

16. Prove that if T is the Fourier transform regarded as a linear operator from L¹(R^N) into C₀(R^N), then ||T|| = 1/(2π)^{N/2}.

17. Give an example of a sequence T_n ∈ B(H) which is strongly convergent but not uniformly convergent.

18. If T_n ∈ B(X) and Σ_{n=1}^∞ ||T_n|| < ∞, show that the series Σ_{n=1}^∞ T_n is uniformly convergent. In particular, verify that the operator exponential

e^T := Σ_{n=0}^∞ T^n / n!

is well defined for any T ∈ B(X) and satisfies ||e^T|| ≤ e^{||T||}.

19. If T : D(T) ⊂ X → Y is a linear operator, then S is a left inverse of T if STx = x for every x ∈ D(T), and is a right inverse if TSx = x for every x ∈ R(T). If X = Y is finite dimensional then it is known from linear algebra that a left inverse must also be a right inverse. Show by examples that this is false if X ≠ Y, or if X = Y is infinite dimensional.

20. If T ∈ B(H), the numerical range of T is the set

{λ ∈ C : λ = ⟨Tx, x⟩ / ⟨x, x⟩ for some x ∈ H, x ≠ 0}

If T is self-adjoint, show that the numerical range of T is contained in the interval [−||T||, ||T||] of the real axis. What is the corresponding statement for a skew-adjoint operator?

Chapter 11

Unbounded operators

11.1 General aspects of unbounded linear operators

Let us return to the general definition of a linear operator given at the beginning of the previous chapter, without any assumption about continuity of the operator. For simplicity we will assume a Hilbert space setting, although much of what is stated below remains true for mappings between Banach spaces. We have the following essential definition.

Definition 11.1. If H is a Hilbert space and T : D(T) ⊂ H → H is a linear operator, then we say T is closed if whenever u_n ∈ D(T), u_n → u and Tu_n → v, then u ∈ D(T) and Tu = v.

We emphasize that this definition is strictly weaker than continuity of T, since for a closed operator it is quite possible that u_n → u but the image sequence {Tu_n} is divergent. This could not happen for a bounded linear operator. It is simple to check that any T ∈ B(H) must be closed.
A common alternate way to define a closed operator employs the concept of the graph of T.

Definition 11.2. If T : D(T) ⊂ H → H is a linear operator, then we define the graph of T to be
\[ G(T) = \{(u, v) \in H \times H : v = Tu\} \tag{11.1.1} \]

The definition of G(T) (and for that matter the definition of closedness) makes sense even if T is not linear, but it is mostly useful in the linear case. It is easy to check that H × H is a Hilbert space with the inner product
\[ \langle (u_1, v_1), (u_2, v_2) \rangle = \langle u_1, u_2 \rangle + \langle v_1, v_2 \rangle \tag{11.1.2} \]
In particular, (un, vn) → (u, v) in H × H if and only if un → u and vn → v in H. One may now verify (Exercise 2)

Proposition 11.1. T : D(T) ⊂ H → H is a closed linear operator if and only if G(T) is a closed subspace of H × H.

We emphasize that closedness of T does not mean that D(T) is closed – this is false in general. In fact we have the so-called Closed Graph Theorem:

Theorem 11.1. If T is a closed linear operator and D(T) is a closed subspace of H, then T must be continuous on D(T).

We refer to Theorem 2.15 of [31] or Theorem 2.9 of [5] for a proof. In particular, if T is closed and unbounded then D(T) cannot be all of H.

By far the most common type of unbounded operator which we will be interested in are differential operators. For use in the next example, let us recall that a function f defined on a closed interval [a,b] is absolutely continuous on [a,b] (f ∈ AC([a,b])) if for any ε > 0 there exists δ > 0 such that if {(a_k, b_k)}_{k=1}^n is a disjoint collection of intervals in [a,b] and Σ_{k=1}^n |b_k − a_k| < δ, then Σ_{k=1}^n |f(b_k) − f(a_k)| < ε. Clearly an absolutely continuous function is continuous.

Theorem 11.2. The following are equivalent.
1. f is absolutely continuous on [a,b].
2. f is differentiable a.e. on [a,b], f′ ∈ L¹(a,b) and
\[ f(x) = f(a) + \int_a^x f'(y)\,dy \quad \forall x \in [a,b] \tag{11.1.3} \]
3. f ∈ W^{1,1}(a,b) and its distributional derivative coincides with its pointwise a.e. derivative.

Here, the equivalence of 1 and 2 is an important theorem of analysis, see for example Theorem 11, Section 6.5 of [28], Theorem 7.29 of [38] or Theorem 7.20 of [30], while the equivalence of 2 and 3 follows from Theorem 9.2 and the definition of the Sobolev space W^{1,1}.

Example 11.1. Let H = L²(0,1) and Tu = u′ on the domain
\[ D(T) = \{u \in H^1(0,1) : u(0) = 0\} \tag{11.1.4} \]
Here D(T) is a dense subspace of H, since it contains D(0,1), for example, but is not all of H, and T is unbounded, as in Example 10.10. We claim that T is closed. To see this, suppose un ∈ D(T), un → u in H and vn = u′n → v in H. By our assumptions, (11.1.3) is valid, so
\[ u_n(x) = \int_0^x v_n(y)\,dy \tag{11.1.5} \]
We can then find a subsequence nk → ∞ and a subset Σ ⊂ (0,1) such that u_{nk}(x) → u(x) for x ∈ Σ and the complement of Σ has measure zero. For any x we also have that v_{nk} → v in L²(0,x), so that passing to the limit in (11.1.5) through the subsequence nk we obtain
\[ u(x) = \int_0^x v(s)\,ds \quad \forall x \in \Sigma \tag{11.1.6} \]
If we denote the right hand side by w then it is clear that w ∈ D(T), with w′ = v in the sense of distributions. Since u = w a.e., u and w coincide as elements of L²(0,1), and so we get the necessary conclusion that u ∈ D(T) with u′ = v.

The proper definition of D(T) was essential in this example. If we had defined instead D(T) = {u ∈ C¹([0,1]) : u(0) = 0} then we would not have been able to reach the conclusion that u ∈ D(T).
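The integral representation (11.1.3), which drove the argument above, is also easy to test numerically. The following sketch (an aside, with an arbitrary choice of f) reconstructs the absolutely continuous function f(x) = |x − 1/2| from its a.e. derivative.

import numpy as np

# f(x) = |x - 1/2| is absolutely continuous; recover it from f'(x) = sign(x - 1/2)
# by cumulative (trapezoid) integration, as in (11.1.3).
x = np.linspace(0.0, 1.0, 100001)
f = np.abs(x - 0.5)
fp = np.sign(x - 0.5)                        # defined except at the single point x = 1/2
integral = np.concatenate(([0.0], np.cumsum(0.5 * (fp[1:] + fp[:-1]) * np.diff(x))))
print(np.max(np.abs(f[0] + integral - f)))   # ~1e-5: f(x) = f(0) + int_0^x f'(y) dy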
An operator which is not closed may still be closeable, meaning that it has a closed extension. Let us define this concept carefully.

Definition 11.3. If S, T are linear operators on H, we say that S is an extension of T if D(T) ⊂ D(S) and Tu = Su for u ∈ D(T). In this case we write T ⊂ S. T is closeable if it has a closed extension.

If T is not closed, then its graph G(T) is not closed, but it always has a closure $\overline{G(T)}$ in the topology of H × H, which is then a natural candidate for the graph of a closed operator which extends T. This procedure may fail, however, because it may happen that (u, v1), (u, v2) ∈ $\overline{G(T)}$ with v1 ≠ v2, so that $\overline{G(T)}$ would not correspond to a single valued operator. If we know somehow that this cannot happen, then $\overline{G(T)}$ will be the graph of some linear operator S (you should check that $\overline{G(T)}$ is a subspace of H × H) which is obviously closed and extends T, thus T will be closeable.

It is useful to have a clearer criterion for the closability of a linear operator T. Note that if (u, v1), (u, v2) are both in $\overline{G(T)}$, with v1 ≠ v2, then (0, v) ∈ $\overline{G(T)}$ for v = v1 − v2 ≠ 0. This means there must exist un ∈ D(T) with un → 0 such that vn = Tun → v ≠ 0. If we can show that no such sequence un can exist, then evidently no such pair of points can exist in $\overline{G(T)}$, so that T will be closeable. The converse statement is also valid and is easy to check. Thus we have established the following.

Proposition 11.2. A linear operator T on H is closeable if and only if un ∈ D(T), un → 0, Tun → v implies v = 0.

Example 11.2. Let Tu = u′ on L²(0,1) with domain D(T) = {u ∈ C¹([0,1]) : u(0) = 0}. We have previously observed that T is not closed, but we can check that the above criterion holds, so that T is closeable. Let un ∈ D(T) with un → 0, u′n → v in L²(0,1). As before,
\[ u_n(x) = \int_0^x u_n'(s)\,ds \tag{11.1.7} \]
Picking a subsequence nk → ∞ for which u_{nk} → 0 a.e., we get
\[ \int_0^x v(s)\,ds = 0 \quad \text{a.e.} \tag{11.1.8} \]
The left hand side is absolutely continuous, so equality must hold for every x ∈ [0,1], and by Theorem 11.2 we conclude that v = 0 a.e.

An operator which is closeable may in general have many different closed extensions. However, there always exists a minimal extension in this case, denoted T̄, the closure of T, defined by G(T̄) = $\overline{G(T)}$. It can be alternatively characterized as follows: T̄ is the unique linear operator on H with the properties that (i) T ⊂ T̄ and (ii) if T ⊂ S and S is closed then T̄ ⊂ S.

If T : D(T) ⊂ H → H and S : D(S) ⊂ H → H are closed linear operators then the sum S + T is defined and linear on D(S + T) = D(S) ∩ D(T), but it is not closed, in general. Choose, for example, any closed, densely defined, unbounded linear operator T and S = −T. Then the sum S + T is the zero operator on the dense domain D(S + T) = D(S) ∩ D(T) = D(T), which is not closed since D(T) is not closed. In this example S + T is closeable, but even that need not be true, see Exercise 13. One can show, however, that if T is closed and S is bounded, then S + T is closed. Likewise the product ST is defined on D(ST) = {x ∈ D(T) : Tx ∈ D(S)} and need not be closed even if S, T are. If S ∈ B(H) and T is closed then TS will be closed, but ST need not be (see Exercise 11).

Finally, consider the inverse operator T^{-1} : R(T) → D(T), which is well defined if T is one-to-one.

Proposition 11.3. If T is one-to-one and closed then T^{-1} is also closed.

Proof: Let un ∈ D(T^{-1}), un → u and T^{-1}un → v. Then if vn = T^{-1}un we have vn ∈ D(T), vn → v and Tvn = un → u. Since T is closed it follows that v ∈ D(T) and Tv = u, or equivalently u ∈ R(T) = D(T^{-1}) and T^{-1}u = v, as needed.
11.2 The adjoint of an unbounded linear operator

To some extent it is possible to define an adjoint operator, even in the unbounded case, and obtain some results about the solvability of the operator equation Tu = f analogous to those proved earlier in the case of bounded T. For the rest of this section we assume that T : D(T) ⊂ H → H is linear and densely defined.

We will say that (v, v*) is an admissible pair for T* if
\[ \langle Tu, v \rangle = \langle u, v^* \rangle \quad \forall u \in D(T) \tag{11.2.1} \]
We then define
\[ D(T^*) = \{v \in H : \text{there exists } v^* \in H \text{ such that } (v, v^*) \text{ is an admissible pair for } T^*\} \tag{11.2.2} \]
and
\[ T^* v = v^* \quad v \in D(T^*) \tag{11.2.3} \]
For this to be an appropriate definition, we should check that for any v there is at most one v* for which (v, v*) is admissible. Indeed, if there were two such elements, then the difference v1* − v2* would satisfy ⟨u, v1* − v2*⟩ = 0 for all u ∈ D(T). Since we assume D(T) is dense, it follows that v1* = v2*.

Note that for v ∈ D(T*), if we define φv(u) = ⟨Tu, v⟩ for u ∈ D(T), then φv is bounded on D(T), since
\[ |\varphi_v(u)| = |\langle u, v^* \rangle| = |\langle u, T^* v \rangle| \le \|u\|\,\|T^* v\| \tag{11.2.4} \]
The converse statement is also true (see Exercise 5), so that it is equivalent to define D(T*) as the set of all v ∈ H such that u ↦ ⟨Tu, v⟩ is bounded on D(T). The domain D(T*) always contains at least the zero element, since (0, 0) is always an admissible pair. There are known examples for which D(T*) contains no other points (see Exercise 4).

Here is a useful characterization of T* in terms of its graph G(T*) ⊂ H × H.

Proposition 11.4. If T is a densely defined linear operator on H then
\[ G(T^*) = (V(G(T)))^\perp \tag{11.2.5} \]
where V is the unitary operator on H × H defined by
\[ V(x, y) = (-y, x) \quad x, y \in H \tag{11.2.6} \]
We leave the proof as an exercise.

Proposition 11.5. If T is a densely defined linear operator on H then T* is a closed linear operator on H.

We emphasize that it is not assumed here that T is closed.

Proof: If v1, v2 ∈ D(T*) and c1, c2 are scalars, then there exist unique elements v1*, v2* such that
\[ \langle Tu, v_1 \rangle = \langle u, v_1^* \rangle \qquad \langle Tu, v_2 \rangle = \langle u, v_2^* \rangle \qquad \text{for all } u \in D(T) \tag{11.2.7} \]
Then
\[ \langle Tu, c_1 v_1 + c_2 v_2 \rangle = \bar c_1 \langle Tu, v_1 \rangle + \bar c_2 \langle Tu, v_2 \rangle = \bar c_1 \langle u, v_1^* \rangle + \bar c_2 \langle u, v_2^* \rangle = \langle u, c_1 v_1^* + c_2 v_2^* \rangle \tag{11.2.8} \]
for all u ∈ D(T), thus (c1v1 + c2v2, c1v1* + c2v2*) is an admissible pair for T*. In particular c1v1 + c2v2 ∈ D(T*) and
\[ T^*(c_1 v_1 + c_2 v_2) = c_1 v_1^* + c_2 v_2^* = c_1 T^* v_1 + c_2 T^* v_2 \tag{11.2.9} \]
To see that T* is closed, let vn ∈ D(T*), vn → v and T*vn → w. If u ∈ D(T) then we must have
\[ \langle Tu, v_n \rangle = \langle u, T^* v_n \rangle \tag{11.2.10} \]
Letting n → ∞ yields ⟨Tu, v⟩ = ⟨u, w⟩. Thus (v, w) is an admissible pair for T*, implying that v ∈ D(T*) and T*v = w, as needed.

Example 11.3. Let us reconsider the densely defined differential operator in Example 11.1. Our goal here is to find the adjoint operator T*, and we emphasize that one must determine D(T*) as part of the answer. It is typical in computing adjoints of unbounded operators that precisely identifying the domain of the adjoint is more difficult than finding a formula for the adjoint. Let v ∈ D(T*) and T*v = g, so that ⟨Tu, v⟩ = ⟨u, g⟩ for all u ∈ D(T). That is to say,
\[ \int_0^1 u'(x) v(x)\,dx = \int_0^1 u(x) g(x)\,dx \quad \forall u \in D(T) \tag{11.2.11} \]
Let
\[ G(x) = -\int_x^1 g(y)\,dy \tag{11.2.12} \]
so that G(1) = 0 and G′(x) = g(x) a.e., since g is integrable. Integration by parts then gives
\[ \int_0^1 u(x) g(x)\,dx = \int_0^1 u(x) G'(x)\,dx = -\int_0^1 u'(x) G(x)\,dx \tag{11.2.13} \]
since the boundary term vanishes (recall u(0) = 0 and G(1) = 0). Thus we have
\[ \int_0^1 u'(x)\,(v(x) + G(x))\,dx = 0 \tag{11.2.14} \]
Now in (11.2.14) choose u(x) = ∫₀ˣ (v(y) + G(y)) dy, which is legitimate since u ∈ D(T). The result is that
\[ \int_0^1 |v(x) + G(x)|^2\,dx = 0 \tag{11.2.15} \]
which can only occur if v(x) = −G(x) = ∫ₓ¹ g(y) dy a.e., implying that T*v = g = −v′. The above representation for v also shows that v′ ∈ L²(0,1) and v(1) = 0, i.e.
\[ D(T^*) \subset \{v \in L^2(0,1) : v' \in L^2(0,1),\ v(1) = 0\} \tag{11.2.16} \]
We claim that the reverse inclusion is also correct: if v belongs to the set on the right and u ∈ D(T) then
\[ \langle Tu, v \rangle = \int_0^1 u'(x) v(x)\,dx = -\int_0^1 u(x) v'(x)\,dx = \langle u, -v' \rangle \tag{11.2.17} \]
Thus (v, −v′) is an admissible pair for T*, from which we conclude that v ∈ D(T*) and T*v = −v′, as needed.

In summary, we have established that T*v = −v′ with domain
\[ D(T^*) = \{v \in L^2(0,1) : v' \in L^2(0,1),\ v(1) = 0\} \tag{11.2.18} \]
We remark that if we had originally defined T on the smaller domain {u ∈ C¹([0,1]) : u(0) = 0} we would have obtained exactly the same result for T* as above. This is a special case of the general fact that T̄* = T* (see Exercise 14).
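A finite-difference sketch makes the boundary-condition swap in this computation tangible (an illustration only; the grid size is arbitrary). Backward differences encode the condition u(0) = 0; the matrix transpose, which is the adjoint with respect to the h-weighted Euclidean inner product, turns out to be a negative forward difference encoding v(1) = 0.

import numpy as np

# T = d/dx with u(0) = 0, discretized by backward differences on n grid points.
n = 6
h = 1.0 / n
D = (np.eye(n) - np.eye(n, k=-1)) / h      # (Du)_i = (u_i - u_{i-1})/h, with u_0 = 0
F = (np.eye(n, k=1) - np.eye(n)) / h       # forward difference, with v_{n+1} = 0
print(np.allclose(D.T, -F))                # True: the adjoint is -d/dx with v(1) = 0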
Definition 11.4. If T = T* we say T is self-adjoint.

It is crucial here that equality of the operators T and T* must include the fact that their domains are identical.

Example 11.4. If in the previous example we defined Tu = iu′ on the same domain, we would find that T*v = iv′ on the domain (11.2.18). Even though the expressions for T, T* are the same, T is not self-adjoint since the two domains are different. It does, however, possess the property of symmetry.

Definition 11.5. We say that T is symmetric if ⟨Tu, v⟩ = ⟨u, Tv⟩ for all u, v ∈ D(T).

Example 11.5. Let Tu = iu′ be the unbounded operator on H = L²(0,1) with domain
\[ D(T) = \{u \in L^2(0,1) : u' \in L^2(0,1),\ u(0) = u(1) = 0\} \tag{11.2.19} \]
One sees immediately that T is symmetric; however, it is still not self-adjoint since again D(T*) ≠ D(T), see Exercise 6.

If T is symmetric and v ∈ D(T), then (v, Tv) is an admissible pair for T*, thus D(T) ⊂ D(T*) and T*v = Tv for v ∈ D(T). In other words, T* is always an extension of T whenever T is symmetric. We see, therefore, that any self-adjoint operator is closed and any symmetric operator is closeable.

Proposition 11.6. If T is densely defined and one-to-one, and if also R(T) is dense, then T* is also one-to-one and (T*)^{-1} = (T^{-1})*.

Proof: By our assumptions, S = (T^{-1})* exists. We are done if we show ST*u = u for all u ∈ D(T*) and T*Sv = v for all v ∈ D(S). First let u ∈ D(T*) and v ∈ D(T^{-1}). Then
\[ \langle v, u \rangle = \langle T T^{-1} v, u \rangle = \langle T^{-1} v, T^* u \rangle \tag{11.2.20} \]
This means (T*u, u) is an admissible pair for (T^{-1})*, and (T^{-1})*T*u = u as needed. Next, if u ∈ D(T) and v ∈ D(S), then
\[ \langle u, v \rangle = \langle T^{-1} T u, v \rangle = \langle Tu, Sv \rangle \tag{11.2.21} \]
Therefore (Sv, v) is admissible for T*, so that Sv ∈ D(T*) and T*Sv = v.

With a small modification of the proof, we obtain that Proposition 10.4 remains valid:

Theorem 11.3. If T : D(T) ⊂ H → H is a densely defined linear operator then N(T*) = R(T)⊥.

Proof: Let f ∈ R(T) and v ∈ N(T*). We have f = Tu for some u ∈ D(T) and
\[ \langle f, v \rangle = \langle Tu, v \rangle = \langle u, T^* v \rangle = 0 \tag{11.2.22} \]
so N(T*) ⊂ R(T)⊥. To get the reverse inclusion, let v ∈ R(T)⊥, so that ⟨Tu, v⟩ = 0 = ⟨u, 0⟩ for any u ∈ D(T). This means (v, 0) is an admissible pair for T*, so v ∈ D(T*) and T*v = 0. Thus R(T)⊥ ⊂ N(T*), as needed.
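Theorem 11.3 is already familiar from the bounded case, and in finite dimensions it can be checked with a few lines of linear algebra. The following sketch (illustrative only; the random seed and sizes are arbitrary) builds a rank-deficient matrix A, extracts a basis of N(A*) from the singular value decomposition, and confirms that it is orthogonal to R(A).

import numpy as np

# Theorem 11.3 in R^n: N(A^T) = R(A)^perp for a rank-deficient matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 5))   # 5x5, rank 3
U, s, Vh = np.linalg.svd(A.T)
null_At = Vh[3:].T                          # basis of N(A^T): singular values 4,5 vanish
print(np.allclose(A.T @ null_At, 0.0, atol=1e-10))              # lies in N(A^T)
r = A @ rng.standard_normal(5)              # an arbitrary element of R(A)
print(np.allclose(null_At.T @ r, 0.0, atol=1e-10))              # orthogonal to R(A)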
Theorem 11.4. If T, T* are both densely defined then T is closeable.

Proof: Since T* is densely defined, T** exists and is closed. If u ∈ D(T) and v ∈ D(T*) then ⟨T*v, u⟩ = ⟨v, Tu⟩, which is to say that (u, Tu) is an admissible pair for T**. Thus u ∈ D(T**) and T**u = Tu, or equivalently T ⊂ T**. Thus T has a closed extension, namely T**.

There is a converse statement, which we will not prove here, see [2] Section 46, or [31] Theorem 13.12.

Theorem 11.5. If T is densely defined and closeable then T* must be densely defined, and T̄ = T**.

In particular, if T is closed and densely defined then T** = T.

11.3 Extensions of symmetric operators

It has been observed above that if T is a densely defined symmetric operator then the adjoint T* is always an extension of T. It is an interesting question whether such a T always possesses a self-adjoint extension – the extension would necessarily be different from T*, at least if T is closed, since then if T* is self-adjoint so is T, by Theorem 11.5 above. We say that a linear operator T is positive if ⟨Tu, u⟩ ≥ 0 for all u ∈ D(T).

Theorem 11.6. If T is a densely defined, positive, symmetric operator on a Hilbert space H, then T has a positive self-adjoint extension.

Proof: Define
\[ \langle u, v \rangle_* = \langle u, v \rangle + \langle Tu, v \rangle \quad u, v \in D(T) \tag{11.3.1} \]
with corresponding norm denoted by ||u||*. It may be easily verified that all of the inner product axioms are satisfied by ⟨·,·⟩* on D(T), and ||u|| ≤ ||u||*. Let H* be the subspace of H obtained as the closure of D(T) in the ||·||* norm, regarded as equipped with the ⟨·,·⟩* inner product; H* is then itself a Hilbert space, and it is dense in H since it contains D(T).

For any z ∈ H the functional ψz(u) = ⟨u, z⟩ belongs to the dual space of H*, since |ψz(u)| ≤ ||u|| ||z|| ≤ ||u||* ||z||; in particular ||ψz||* ≤ ||z|| as a linear functional on H*. Thus by the Riesz Representation theorem there exists a unique element Λz ∈ H* ⊂ H such that
\[ \psi_z(u) = \langle u, \Lambda z \rangle_* \quad u \in H^* \tag{11.3.2} \]
with ||Λz|| ≤ ||Λz||* ≤ ||z||. It may be checked that Λ : H → H is linear, and regarded as an operator on H we claim it is also self-adjoint. To see this, observe that for any u, z ∈ H we have
\[ \langle \Lambda u, z \rangle = \psi_z(\Lambda u) = \langle \Lambda u, \Lambda z \rangle_* \quad \text{and} \quad \langle u, \Lambda z \rangle = \overline{\psi_u(\Lambda z)} = \overline{\langle \Lambda z, \Lambda u \rangle_*} = \langle \Lambda u, \Lambda z \rangle_* \tag{11.3.3} \]
so that ⟨Λu, z⟩ = ⟨u, Λz⟩. Choosing u = z we also see that Λ is positive, namely
\[ \langle \Lambda z, z \rangle = \langle \Lambda z, \Lambda z \rangle_* \ge 0 \tag{11.3.4} \]
Next, Λ is one-to-one, since if Λz = 0 then
\[ 0 = \langle u, \Lambda z \rangle_* = \langle u, z \rangle \quad \forall u \in H^* \tag{11.3.5} \]
and since H* is dense in H it follows that z = 0. The range of Λ is also dense in H*, hence in H, because otherwise there must exist u ∈ H*, u ≠ 0, such that 0 = ⟨u, Λz⟩* = ⟨u, z⟩ for all z ∈ H, which is impossible.

From the above considerations and Proposition 11.6 we conclude that S = Λ^{-1} exists and is a densely defined self-adjoint operator on H. We will complete the proof by showing that the self-adjoint operator S − I is a positive extension of T. For z, w ∈ D(T) we have
\[ \langle z, w \rangle_* = \langle (I + T)z, w \rangle = \overline{\psi_{(I+T)z}(w)} = \overline{\langle w, \Lambda (I + T) z \rangle_*} = \langle \Lambda (I + T) z, w \rangle_* \tag{11.3.6} \]
and so
\[ \Lambda (I + T) z = z \quad \forall z \in D(T) \tag{11.3.7} \]
by the density of D(T) in H*. In particular D(T) ⊂ R(Λ) = D(S) and (I + T)z = Λ^{-1}z = Sz for z ∈ D(T), so that S − I extends T, as needed. Finally, S − I is positive: for u = Λz ∈ D(S) we have ⟨Su, u⟩ = ⟨z, Λz⟩ = ⟨Λz, Λz⟩* = ||u||*² ≥ ||u||², so that ⟨(S − I)u, u⟩ ≥ 0.

A positive symmetric operator may have more than one self-adjoint extension, but the specific one constructed in the above proof is usually known as the Friedrichs extension. To clarify what all of the objects in the proof are, it may be helpful to think of the case that Tu = −∆u on the domain D(T) = C²(Ω) ∩ C₀(Ω).
In this case ||u||* = ||u||_{H¹(Ω)}, H* = H₀¹(Ω) (except endowed with the usual H¹ norm), and the Friedrichs extension will turn out to be the Dirichlet Laplacian discussed in detail in Section 14.4. The condition of positivity for T may be weakened, see Exercise 16.

11.4 Exercises

1. Let T, S be densely defined linear operators on a Hilbert space. If T ⊂ S, show that S* ⊂ T*.

2. Verify that H × H is a Hilbert space with the inner product given by (11.1.2), and prove Proposition 11.1.

3. Prove that the null space of a closed operator is closed.

4. Let φ ∈ H = L²(R) be any nonzero function and define the linear operator
\[ Tu = \Big( \int_{-\infty}^{\infty} u(x)\,dx \Big)\,\varphi \]
on the domain D(T) = L¹(R) ∩ L²(R).
a) Show that T is unbounded and densely defined.
b) Show that T* is not densely defined; more specifically, show that T* is the zero operator with domain {φ}⊥. (Since D(T*) is not dense, it then follows from Theorem 11.5 that T is not closeable.)

5. If T : D(T) ⊂ H → H is a densely defined linear operator, v ∈ H and the map u ↦ ⟨Tu, v⟩ is bounded on D(T), show that there exists v* ∈ H such that (v, v*) is an admissible pair for T*.

6. Let H = L²(0,1) and T1u = T2u = iu′ with domains
D(T1) = {u ∈ AC[0,1] : u(0) = u(1), u′ ∈ H}
D(T2) = {u ∈ AC[0,1] : u(0) = u(1) = 0, u′ ∈ H}
Show that T1 is self-adjoint, and that T2 is closed and symmetric but not self-adjoint. What is T2*?

7. If T is symmetric and R(T) = H, show that T is self-adjoint. (Suggestion: it is enough to show that D(T*) ⊂ D(T).)

8. Show that if T is self-adjoint and one-to-one then T^{-1} is also self-adjoint. (Hint: All you really need to do is show that T^{-1} is densely defined.)

9. If T is self-adjoint, S is symmetric and T ⊂ S, show that T = S. (Thus a self-adjoint operator has no proper symmetric extension.)

10. Let T, S be densely defined linear operators on H and assume that D(T + S) = D(T) ∩ D(S) is also dense. Show that T* + S* ⊂ (T + S)*. Give an example showing that T* + S* and (T + S)* may be unequal.

11. Assume that T is closed and S is bounded.
a) Show that S + T is closed.
b) Show that TS is closed, but that ST is not closed, in general.

12. Prove Proposition 11.4.

13. Let H = ℓ² and define
\[ Sx = \Big\{ \sum_{n=1}^{\infty} n x_n,\ 4x_2,\ 9x_3,\ \dots \Big\} \tag{11.4.1} \]
\[ Tx = \{0,\ -4x_2,\ -9x_3,\ \dots\} \tag{11.4.2} \]
on D(S) = D(T) = {x ∈ ℓ² : Σ n⁴|xn|² < ∞}. Show that S, T are closed, but S + T is not closeable. (Hint: for example, e_n/n → 0 but (S + T)e_n/n → e_1.)

14. If T is closable, show that T and T̄ have the same adjoint.

15. Suppose that T is densely defined and symmetric with dense range. Prove that N(T) = {0}.

16. We say that a linear operator on a Hilbert space H is bounded below if there exists a constant c₀ > 0 such that
⟨Tu, u⟩ ≥ −c₀||u||² ∀u ∈ D(T)
Show that Theorem 11.6 remains valid if the condition that T be positive is replaced by the assumption that T is bounded below. (Hint: T + c₀I is positive.)

Chapter 12

Spectrum of an operator

12.1 Resolvent and spectrum of a linear operator

Let T be a densely defined linear operator on a Hilbert space H. As usual, we use I to denote the identity operator on H.

Definition 12.1. We say that λ ∈ C is a regular point for T if λI − T is one-to-one and onto, and (λI − T)^{-1} ∈ B(H). We then define ρ(T), the resolvent set of T, and σ(T), the spectrum of T, by
\[ \rho(T) = \{\lambda \in \mathbb{C} : \lambda \text{ is a regular point for } T\} \qquad \sigma(T) = \mathbb{C} \setminus \rho(T) \tag{12.1.1} \]

Example 12.1. Let H = C^N, Tu = Au for some N × N matrix A.
From linear algebra we know that λI − T is one-to-one and onto (automatically with a bounded inverse) precisely if λI − A is a non-singular matrix. Equivalently, λ is in the resolvent set if and only if λ is not an eigenvalue of A, where the eigenvalues are the roots of the N'th degree polynomial det(λI − A). Thus σ(T) consists of a finite number of points λ1, …, λM, where 1 ≤ M ≤ N, and all other points of the complex plane make up the resolvent set ρ(T).

In the case of a finite dimensional Hilbert space there is thus only one kind of point in the spectrum, where λI − T is neither one-to-one nor onto. But in general there are more possibilities. The following definition presents a traditional division of the spectrum into three parts.

Definition 12.2. Let λ ∈ σ(T). Then
1. If λI − T is not one-to-one, then we say λ ∈ σp(T), the point spectrum of T.
2. If λI − T is one-to-one, $\overline{R(\lambda I - T)} = H$, but (λI − T)^{-1} is not bounded, then we say λ ∈ σc(T), the continuous spectrum of T.
3. If λI − T is one-to-one but $\overline{R(\lambda I - T)} \ne H$, then we say λ ∈ σr(T), the residual spectrum of T.

Thus σ(T) is the disjoint union of σp(T), σc(T) and σr(T). The point spectrum is also sometimes called the discrete spectrum. In the case of H = C^N, σ(T) = σp(T) by the above discussion, but in general all three parts of the spectrum may be non-empty, as we will see from examples. There are further subclassifications of the spectrum which are sometimes useful, see the exercises.

In the case that λ ∈ σp(T) there must exist u ≠ 0 such that Tu = λu, and we then say that λ is an eigenvalue of T and u is a corresponding eigenvector. In the case that H is a space of functions, we will often refer to u as an eigenfunction instead. Obviously any nonzero scalar multiple of an eigenvector is also an eigenvector, and the set of all eigenvectors for a given λ, together with the zero element, makes up N(T − λI), the null space of T − λI, which will also be called the eigenspace of the eigenvalue λ. The dimension of N(T − λI) is the multiplicity of λ and may be infinite. (Note that this agrees with the geometric multiplicity concept in linear algebra; in general there is no meaning for algebraic multiplicity.) It is easy to check that if T is a closed operator then any eigenspace of T is closed.

The concepts of resolvent set and spectrum, and the division of the spectrum just introduced, are closely connected with what is meant by a well-posed or ill-posed problem, as discussed in Section 2.4, and which we can restate in somewhat more precise terms here. If T : D(T) ⊂ X → Y is an operator between Banach spaces X, Y (T may even be nonlinear here), then the problem of solving the operator equation T(u) = f is said to be well-posed with respect to X, Y if
1. A solution u exists for every f ∈ Y.
2. The solution is unique in X.
3. The solution depends continuously on f, in the sense that if T(un) = fn and fn → f in Y, then un → u in X, where u is the unique solution of T(u) = f.
If the problem is not well-posed then it is ill-posed.

Now observe that if T is a linear operator on H and λ ∈ ρ(T), then the problem of solving λu − Tu = f is well-posed with respect to H. Existence holds since λI − T is onto, uniqueness since it is one-to-one, and the continuous dependence property follows from the fact that (λI − T)^{-1} is bounded. On the other hand, the three subsets of σ(T) correspond more or less to the failure of one of the three conditions above: λ ∈ σp(T) means that uniqueness fails, λ ∈ σc(T) means that the inverse map is defined on a dense subspace on which it is discontinuous, and λ ∈ σr(T) implies that existence fails in a more dramatic way, namely the closure of the range of the map is a proper subspace of H.
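As a numerical aside (an illustration only; the matrix is an arbitrary choice), the dichotomy is visible already for matrices: away from the eigenvalues the resolvent has moderate norm, and the norm blows up as λ approaches a point of the spectrum.

import numpy as np

# ||(lambda*I - A)^{-1}|| blows up as lambda approaches sigma(A) = {1, 3}.
A = np.array([[1.0, 2.0], [0.0, 3.0]])
I = np.eye(2)
for lam in (2.0, 1.5, 1.1, 1.01, 1.001):
    R = np.linalg.inv(lam * I - A)
    print(lam, np.linalg.norm(R, 2))        # grows without bound as lam -> 1
print(np.linalg.eigvals(A))                 # the point spectrum {1, 3}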
Because the operator (λI − T)^{-1} arises so frequently, we introduce the notation
\[ R_\lambda = (\lambda I - T)^{-1} \tag{12.1.2} \]
which is called the resolvent operator of T. Thus λ ∈ ρ(T) if and only if Rλ ∈ B(H). It may be checked that the resolvent identity
\[ R_\lambda - R_\mu = (\mu - \lambda) R_\lambda R_\mu \tag{12.1.3} \]
is valid (see Exercise 2).

Below we will look at a number of examples of operators and their spectra, but first we will establish a few general results. Among the most fundamental of these is that the resolvent set of any linear operator is open, so that the spectrum is closed. More generally, the property of being in the resolvent set is preserved under any sufficiently small bounded perturbation.

Proposition 12.1. Let T, S be linear operators on H such that 0 ∈ ρ(T) and S ∈ B(H) with ||S|| ||T^{-1}|| < 1. Then 0 ∈ ρ(T + S).

Proof: Since ||T^{-1}S|| ≤ ||T^{-1}|| ||S|| < 1, it follows from Theorem 10.4 that (I + T^{-1}S)^{-1} ∈ B(H). If we now set A = (I + T^{-1}S)^{-1}T^{-1}, then A ∈ B(H) also, and
\[ A(T + S) = (I + T^{-1}S)^{-1} T^{-1} (T + S) = (I + T^{-1}S)^{-1}(I + T^{-1}S) = I \tag{12.1.4} \]
Similarly (T + S)A = I, so T + S has a bounded inverse, as needed.

We may now immediately obtain the properties of resolvent set and spectrum mentioned above.

Theorem 12.1. If T is a linear operator on H, then ρ(T) is open and σ(T) is closed in C. In addition, if T ∈ B(H) and λ ∈ σ(T), then |λ| ≤ ||T||, so that σ(T) is compact.

Proof: Let λ ∈ ρ(T), so (λI − T)^{-1} ∈ B(H). If |ε| < 1/||(λI − T)^{-1}||, we can apply Proposition 12.1 with T replaced by λI − T and S = εI to get that 0 ∈ ρ((λ + ε)I − T), or equivalently λ + ε ∈ ρ(T), for all sufficiently small |ε|. When T ∈ B(H), the conclusion that σ(T) is contained in the closed disk centered at the origin of radius ||T|| is part of the statement of Theorem 10.4.

Definition 12.3. The spectral radius of T is
\[ r(T) = \sup\{|\lambda| : \lambda \in \sigma(T)\} \tag{12.1.5} \]
That is to say, r(T) is the radius of the smallest disk centered at the origin containing the spectrum of T. By the previous theorem we always have r(T) ≤ ||T||. This inequality can be strict, even in the case that H = C², as may be seen in the example
\[ Tu = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} u \tag{12.1.6} \]
for which r(T) = 0 but ||T|| = 1. We do, however, have the following theorem, generalizing the well-known spectral radius formula from matrix theory.

Theorem 12.2. If T ∈ B(H) then r(T) = lim_{n→∞} ||T^n||^{1/n}.

We will not prove this here, but see for example Proposition 9.7 of [17] or Theorem 10.13 of [31].
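A quick numerical look at Theorem 12.2 (a sketch only; the second matrix A is an arbitrary illustrative choice with r(A) = 1 < ||A|| = 2): for the nilpotent matrix (12.1.6) the limit is 0 even though the norm is 1.

import numpy as np

# r(T) = lim ||T^n||^{1/n}: nilpotent T from (12.1.6) and a matrix with A^2 = I.
T = np.array([[0.0, 1.0], [0.0, 0.0]])
A = np.array([[0.0, 2.0], [0.5, 0.0]])
for M in (T, A):
    vals = [np.linalg.norm(np.linalg.matrix_power(M, n), 2) ** (1.0 / n)
            for n in (1, 2, 8, 32)]
    print(vals, "spectral radius:", max(abs(np.linalg.eigvals(M))))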
It is a natural question to ask whether it is possible that either of ρ(T), σ(T) can be empty. In fact both can happen; for example, any operator which is not closed has an empty resolvent set. To see this, suppose λ ∈ ρ(T). Then (λI − T)^{-1} ∈ B(H), hence is closed, and by Proposition 11.3 λI − T is then also closed. Finally, it follows from Exercise 11 in Chapter 11 that T = λI − (λI − T) is also closed. An example for which σ(T) is empty is given in Exercise 6. The following theorem, however, says that this is impossible in the case of bounded T.

Theorem 12.3. If T ∈ B(H) then σ(T) ≠ ∅.

Proof: Let x, y ∈ H and define
\[ f(\lambda) = \langle x, R_\lambda y \rangle \tag{12.1.7} \]
If σ(T) = ∅ then f is defined for all λ ∈ C, and is differentiable with respect to the complex variable λ, so that f is an entire function. On the other hand, for |λ| > ||T|| we have by (10.8.3) that
\[ \|R_\lambda\| \le \frac{1}{|\lambda| - \|T\|} \to 0 \quad \text{as } |\lambda| \to \infty \tag{12.1.8} \]
Thus f is a bounded entire function which tends to zero as |λ| → ∞, so by Liouville's Theorem f is constant, and the constant must be zero, i.e. f(λ) ≡ 0. Since x is arbitrary we must have Rλy = 0 for any y ∈ H, which is clearly false.

12.2 Examples of operators and their spectra

The purpose of introducing the concepts of resolvent and spectrum is to provide a systematic way of analyzing the solvability properties of operator equations of the form λu − Tu = f. Even if we are actually only interested in the case when λ = 0 (or some other fixed value), it is somehow still revealing to study the whole family of problems, as λ varies over C. In this section we will look in detail at some examples.

Example 12.2. If H = C^N and Tu = Au for some N × N matrix A, then by the previous discussion we have
\[ \sigma(T) = \sigma_p(T) = \{\lambda_1, \dots, \lambda_m\} \tag{12.2.1} \]
for some 1 ≤ m ≤ N, where λ1, … λm are the distinct eigenvalues of A. Each eigenspace N(λjI − T) has dimension equal to the geometric multiplicity of λj, and the sum of these dimensions is also some integer between 1 and N.

Example 12.3. Let Ω ⊂ R^N be a bounded open set, H = L²(Ω), and let T be the multiplication operator Tu(x) = a(x)u(x) for some a ∈ C(Ω̄). If we begin by looking for eigenvalues of T, then we seek nontrivial solutions of Tu = λu, that is to say
\[ (\lambda - a(x))u(x) = 0 \tag{12.2.2} \]
If a(x) ≠ λ a.e., then u ≡ 0 is the only solution, so λ ∉ σp(T). It is useful here to introduce a notation for the level sets of a, Eλ = {x ∈ Ω : a(x) = λ}. If for some λ we have m(Eλ) > 0, then the characteristic function u(x) = χΣ(x) is an eigenfunction for the eigenvalue λ whenever Σ is any subset of Eλ of positive, finite measure. In fact, so is any other L² function whose support lies within Eλ, and thus the corresponding eigenspace is infinite dimensional. Thus
\[ \sigma_p(T) = \{\lambda \in \mathbb{C} : m(E_\lambda) > 0\} \tag{12.2.3} \]
Note that σp(T) is at most countably infinite, since for example An = {λ ∈ C : m(Eλ) > 1/n} is at most countable for every n, and σp(T) = ∪_{n=1}^∞ An.

Now let us consider the other parts of the spectrum. Consider the equation λu − Tu = f, whose only possible solution is u(x) = f(x)/(λ − a(x)). For λ ∉ σp(T), u(x) is well defined a.e., but it doesn't necessarily follow that u ∈ L²(Ω) even if f is. If λ ∉ R(a) (here R(a) is the range of the function a), then there exists δ > 0 such that |a(x) − λ| ≥ δ for all x ∈ Ω, from which it follows that u = (λI − T)^{-1}f exists in L²(Ω) and satisfies |u(x)| ≤ δ^{-1}|f(x)|. Thus ||(λI − T)^{-1}|| ≤ δ^{-1}, and so λ ∈ ρ(T). If, on the other hand, λ ∈ R(a), it is always possible to find f ∈ L²(Ω) such that u(x) = f(x)/(λ − a(x)) is not in L²(Ω). This means in particular that λI − T is not onto, i.e. λ is either in the continuous or residual spectrum. In fact, it is not hard to verify that the range of λI − T must be dense in this case. To see this, suppose λ ∈ σ(T)\σp(T), so that m(Eλ) = 0. Then for any n there must exist an open set On containing Eλ such that m(On) < 1/n. For any function f ∈ L²(Ω) let Un = Ω\On and fn = fχUn. Then fn ∈ R(λI − T), since λ − a(x) is bounded away from zero on Un, and fn → f in L²(Ω), as needed. To summarize, we have the following conclusions about the spectral properties of T:

• ρ(T) = {λ ∈ C : λ ∉ R(a)}
• σp(T) = {λ ∈ R(a) : m(Eλ) > 0}
• σc(T) = {λ ∈ R(a) : m(Eλ) = 0}
• σr(T) = ∅
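A discrete stand-in for this example (a sketch only; the function a and the grid are arbitrary choices): multiplication by a(x) sampled on a grid is a diagonal matrix, whose spectrum consists exactly of the sampled values of a. Each grid point carries positive weight, which is the discrete analogue of m(Eλ) > 0, so every sampled value is an eigenvalue; in the continuum, values λ with m(Eλ) = 0 move into the continuous spectrum instead.

import numpy as np

# Multiplication by a(x) = x^2 on a grid: a diagonal matrix, spectrum = sampled values.
x = np.linspace(0.0, 1.0, 7)
M = np.diag(x**2)
print(np.linalg.eigvals(M))                 # exactly the values a(x_j), all in [0, 1]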
Example 12.4. Next we consider the Volterra type integral operator Tu(x) = ∫₀ˣ u(s) ds on H = L²(0,1). We first observe that any λ ≠ 0 is in the resolvent set of T. To see this, consider the problem of solving (λI − T)u = f, i.e.
\[ \lambda u(x) - \int_0^x u(s)\,ds = f(x) \quad 0 < x < 1 \tag{12.2.4} \]
with f ∈ L²(0,1). This is precisely the equation (2.2.10), whose solution is given in (2.2.13) if g = −f, and which is well defined for any f ∈ L²(0,1). Crude estimation shows that ||u|| = ||(λI − T)^{-1}f|| ≤ C||f|| for some constant C which depends only on λ. Thus any nonzero λ is in ρ(T), and by Theorem 12.3 we can immediately conclude that 0 must be in σ(T). It is clear that λ = 0 cannot be an eigenvalue, since ∫₀ˣ u(s) ds = 0 implies u(x) = 0 a.e., by the Fundamental Theorem of Calculus. On the other hand, R(T) is dense, since for example it contains D(0,1). One could also verify directly that T^{-1} is unbounded. We conclude that
\[ \sigma(T) = \sigma_c(T) = \{0\} \tag{12.2.5} \]

In the next example we see a typical way that residual spectrum appears.

Example 12.5. Let H = ℓ² and T = S+, the right shift operator introduced in (10.2.31). As usual, we first look for eigenvalues. The equation Tx = λx gives λx1 = 0 and λx_{n+1} = xn for n = 1, 2, …. Thus if λ ≠ 0 we immediately conclude that x = 0. If Tx = 0 we also see directly that x = 0; thus the point spectrum is empty. Since T is a bounded operator of norm 1, we also know that if |λ| > 1 then λ ∈ ρ(T). Since R(T) ⊂ {x ∈ ℓ² : x1 = 0}, it follows that R(T) is not dense in ℓ², and since we already know 0 is not an eigenvalue, it must be that 0 ∈ σr(T). See Exercise 4 for the classification of the remaining λ values.

Finally, we consider the case of an unbounded operator.

Example 12.6. Let H = L²(0,1) and Tu = −u″ on the domain
\[ D(T) = \{u \in H^2(0,1) : u(0) = u(1) = 0\} \tag{12.2.6} \]
The equation λu − Tu = 0 is equivalent to the ODE boundary value problem
\[ u'' + \lambda u = 0 \quad 0 < x < 1 \qquad u(0) = u(1) = 0 \tag{12.2.7} \]
which was already discussed in Chapter 2, see (2.3.53). We found that a nontrivial solution un(x) = sin nπx exists for λ = λn = (nπ)², and there are no other eigenvalues. Notice that the spectrum is unbounded here, as typically happens for unbounded operators. We claim that all other λ ∈ C are in the resolvent set of T. To see this, we begin by representing the general solution of u″ + λu = f for f ∈ L²(0,1) as
\[ u(x) = C_1 \sin\sqrt{\lambda}\,x + C_2 \cos\sqrt{\lambda}\,x + \frac{1}{\sqrt{\lambda}} \int_0^x \sin\sqrt{\lambda}(x - y)\,f(y)\,dy \tag{12.2.8} \]
which may be derived from the usual variation of parameters method. (This is correct for all complex λ ≠ 0, taking √λ to denote the principal branch of the square root function; we leave the remaining case λ = 0 as an exercise.) To satisfy the boundary conditions u(0) = u(1) = 0 we must have C2 = 0 and
\[ C_1 \sin\sqrt{\lambda} + \frac{1}{\sqrt{\lambda}} \int_0^1 \sin\sqrt{\lambda}(1 - y)\,f(y)\,dy = 0 \tag{12.2.9} \]
which uniquely determines C1 as long as λ ≠ (nπ)². Using this expression for C1 we obtain a formula for u = (λI − T)^{-1}f of the form
\[ u(x) = \int_0^1 G_\lambda(x, y)\,f(y)\,dy \tag{12.2.10} \]
with a bounded kernel Gλ(x, y). By previous discussion we know that such an integral operator is bounded on L²(0,1), and so λ ∈ ρ(T).
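The eigenvalues (nπ)² found above are easy to confirm numerically with the standard second-difference approximation of −d²/dx² (a sketch; the grid resolution is an arbitrary choice).

import numpy as np

# Discrete Dirichlet Laplacian on (0,1): eigenvalues approximate (n*pi)^2.
N = 200
h = 1.0 / N
L = (np.diag(2.0 * np.ones(N - 1)) - np.diag(np.ones(N - 2), 1)
     - np.diag(np.ones(N - 2), -1)) / h**2
ev = np.sort(np.linalg.eigvalsh(L))
print(ev[:4])                               # approx 9.87, 39.5, 88.8, 157.9
print([(n * np.pi)**2 for n in (1, 2, 3, 4)])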
12.3 Properties of spectra

We will see in this section that if an operator T belongs to some special class, then its spectrum will often have some corresponding special properties.

Theorem 12.4. Let T be a closed, densely defined operator.
1. If λ ∈ ρ(T) then λ̄ ∈ ρ(T*).
2. If λ ∈ σr(T) then λ̄ ∈ σp(T*).
3. If λ ∈ σp(T) then λ̄ ∈ σr(T*) ∪ σp(T*).

Proof: If λ ∈ ρ(T) then
\[ N(\bar\lambda I - T^*) = N((\lambda I - T)^*) = R(\lambda I - T)^\perp = \{0\} \tag{12.3.1} \]
where Theorem 11.3 is used for the second equality. In particular, λ̄I − T* is one-to-one. Also
\[ \overline{R(\bar\lambda I - T^*)} = N(\lambda I - T)^\perp = \{0\}^\perp = H \tag{12.3.2} \]
so that (λ̄I − T*)^{-1} is densely defined. Proposition 11.6 is then applicable, so that
\[ (\bar\lambda I - T^*)^{-1} = ((\lambda I - T)^*)^{-1} = ((\lambda I - T)^{-1})^* \in B(H) \tag{12.3.3} \]
Therefore λ̄ ∈ ρ(T*). Next, if λ ∈ σr(T) then R(λI − T) = M for some subspace M whose closure is not all of H. Thus
\[ N(\bar\lambda I - T^*) = R(\lambda I - T)^\perp = M^\perp = \overline{M}^\perp \ne \{0\} \tag{12.3.4} \]
and so λ̄ ∈ σp(T*). Finally, if λ ∈ σp(T) then
\[ \overline{R(\bar\lambda I - T^*)} = N(\lambda I - T)^\perp \ne H \tag{12.3.5} \]
In particular λ̄I − T* is not onto, so λ̄ ∈ σ(T*); since its range is not dense, λ̄ ∉ σc(T*), and thus λ̄ ∈ σp(T*) ∪ σr(T*), as needed.

Next we turn to some special properties of self-adjoint and unitary operators.

Theorem 12.5. Suppose that T is a densely defined operator with T* = T. We then have
1. σ(T) ⊂ R.
2. σr(T) = ∅.
3. If λ1, λ2 ∈ σp(T), λ1 ≠ λ2, then N(λ1I − T) ⊥ N(λ2I − T).

Proof: To prove the first statement, let λ = ξ + iη with η ≠ 0. Then
\[ \|\lambda u - Tu\|^2 = \langle \xi u + i\eta u - Tu,\ \xi u + i\eta u - Tu \rangle = \|\xi u - Tu\|^2 + |\eta|^2 \|u\|^2 \tag{12.3.6} \]
since ⟨ξu − Tu, iηu⟩ + ⟨iηu, ξu − Tu⟩ = 0. In particular
\[ \|\lambda u - Tu\| \ge |\eta|\,\|u\| \tag{12.3.7} \]
so λI − T is one-to-one, i.e. λ ∉ σp(T). Likewise λ ∉ σr(T), since otherwise by Theorem 12.4 we would have λ̄ ∈ σp(T*) = σp(T), which is impossible by the same argument. Thus if λ ∈ σ(T), then it can only be in the continuous spectrum, so R(λI − T) is dense in H. But (12.3.7) with η ≠ 0 also implies that R(λI − T) is closed, and (12.3.7) then also says that ||(λI − T)^{-1}|| ≤ 1/|η|. Thus λ ∈ ρ(T), and the first statement is proved.

Next, if λ ∈ σr(T), then λ̄ ∈ σp(T*) = σp(T) by Theorem 12.4. But λ must be real by the first part of this proof, so λ = λ̄ ∈ σp(T) ∩ σr(T), which is impossible.

Finally, if λ1, λ2 are distinct eigenvalues, pick u1, u2 such that Tu1 = λ1u1 and Tu2 = λ2u2. There follows
\[ \lambda_1 \langle u_1, u_2 \rangle = \langle \lambda_1 u_1, u_2 \rangle = \langle Tu_1, u_2 \rangle = \langle u_1, Tu_2 \rangle = \langle u_1, \lambda_2 u_2 \rangle = \lambda_2 \langle u_1, u_2 \rangle \tag{12.3.8} \]
using in the last step that λ2 is real. Thus (λ1 − λ2)⟨u1, u2⟩ = 0, so u1 ⊥ u2, as needed.

Theorem 12.6. If T is a unitary operator then σr(T) = ∅ and σ(T) ⊂ {λ : |λ| = 1}.

Proof: Recall that ||Tu|| = ||u|| for all u when T is unitary. Thus if Tu = λu with u ≠ 0, we have
\[ \|u\| = \|Tu\| = \|\lambda u\| = |\lambda|\,\|u\| \tag{12.3.9} \]
so |λ| = 1 must hold for any λ ∈ σp(T). If λ ∈ σr(T), then λ̄ ∈ σp(T*) by Theorem 12.4. Since T* is also unitary we get |λ| = |λ̄| = 1. Also, T*u = λ̄u implies that u = TT*u = λ̄Tu, so that Tu = (1/λ̄)u = λu, i.e. λ ∈ σp(T), which contradicts the assumption that λ ∈ σr(T). Thus the residual spectrum of T is empty.

To complete the proof, first note that since ||T|| = 1 we must have |λ| ≤ 1 if λ ∈ σ(T), by Theorem 10.4. If |λ| < 1, then (I − λT*)^{-1} ∈ B(H) by the same theorem, and for any f ∈ H we can obtain a solution of λu − Tu = f by setting u = −T*(I − λT*)^{-1}f. Since we already know λ ∉ σp(T), it follows that λI − T is one-to-one and onto, and ||(λI − T)^{-1}|| = ||T*(I − λT*)^{-1}||, which is finite, and so λ ∈ ρ(T), as needed.
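Both theorems can be sampled numerically in C^n (a sketch; the random seed and size are arbitrary): a Hermitian matrix has real eigenvalues and orthonormal eigenvectors, and a unitary matrix has its spectrum on the unit circle.

import numpy as np

# Hermitian matrix: real spectrum, orthogonal eigenspaces (Theorem 12.5).
rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Hm = (B + B.conj().T) / 2
w, V = np.linalg.eigh(Hm)                   # w is real
print(np.allclose(V.conj().T @ V, np.eye(4)))   # eigenvectors orthonormal

# Unitary matrix: eigenvalues of modulus 1 (Theorem 12.6).
Q, _ = np.linalg.qr(B)
print(np.abs(np.linalg.eigvals(Q)))         # all entries ~ 1.0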
Example 12.7. Let T = F, the Fourier transform on H = L²(R^N), as defined in (8.4.1), which we have already established is unitary, see (10.5.10). From the inversion formula for the Fourier transform it is immediate that F⁴ = I. If Fu = λu we would also have u = F⁴u = λ⁴u, so that any eigenvalue λ of F satisfies λ⁴ = 1, i.e. σp(F) ⊂ {±1, ±i}. We already knew that λ = 1 must be an eigenvalue, with a Gaussian e^{−|x|²/2} as a corresponding eigenfunction. In fact, all four values ±1, ±i are eigenvalues, with infinite dimensional eigenspaces spanned by products of Gaussians and so-called Hermite polynomials. See Section 2.5 of [9] for more details. In Exercise 5 you are asked to show that all other values of λ are in the resolvent set of F.

Example 12.8. The Hilbert transform H introduced in Example 10.7 is also unitary on H = L²(R). Since also H² = −I, it follows that the only possible eigenvalues of H are ±i. It is readily checked that these are both eigenvalues, with the eigenspace for λ = i being M− = {u ∈ L²(R) : û(k) = 0 ∀k > 0} and that for λ = −i being M+ = {u ∈ L²(R) : û(k) = 0 ∀k < 0}. Let us check that any λ ≠ ±i is in the resolvent set. If λu − Hu = f, then applying H to both sides we get λHu + u = Hf. Eliminating Hu between these two equations we can solve for
\[ u = \frac{\lambda f + Hf}{\lambda^2 + 1} \tag{12.3.10} \]
Conversely, by direct substitution we can verify that this formula defines a solution of λu − Hu = f, so that (λI − H)^{-1} = (λI + H)/(λ² + 1), which is obviously bounded for λ ≠ ±i.

Finally, we discuss an important example of an unbounded operator.

Example 12.9. Let H = L²(R^N) and Tu = −∆u on D(T) = H²(R^N). If we apply the Fourier transform, then for f, u ∈ H the resolvent equation λu − Tu = f is seen to be equivalent to
\[ (\lambda - |y|^2)\hat{u}(y) = \hat{f}(y) \tag{12.3.11} \]
It is then immediate that σ(T) ⊂ [0, ∞) and that σp(T) = ∅. On the other hand, for λ ≥ 0 a solution û, and hence u, exists in H as long as f̂ vanishes in a neighborhood of the sphere |y| = √λ. Such f form a dense subset of H, so σr(T) = ∅ also. This could also be shown by verifying that T is self-adjoint. Finally, it is clear that for λ ≥ 0 there exists a function u such that û ∉ L²(R^N) but g := (λ − |y|²)û ∈ L²(R^N). If f ∈ L²(R^N) is defined by f̂ = g, then it follows that f is not in the range of λI − T, so λ ∈ σc(T) must hold. In summary, σ(T) = σc(T) = [0, ∞).
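For λ in the resolvent set, the Fourier multiplier formula u = F^{-1}[f̂/(λ − |y|²)] is directly computable. The following one-dimensional sketch uses the FFT on a large periodic box as a stand-in for R (an illustration only; the box size, grid, and test function are arbitrary choices), with λ = −1 ∈ ρ(T).

import numpy as np

# Solve lam*u + u'' = f via the discrete Fourier transform, lam = -1 in rho(T).
n, Lbox = 1024, 40.0
dx = Lbox / n
x = (np.arange(n) - n // 2) * dx
y = 2 * np.pi * np.fft.fftfreq(n, d=dx)     # dual (frequency) variable
lam = -1.0
u_exact = np.exp(-x**2)
f = lam * u_exact + (4 * x**2 - 2) * np.exp(-x**2)   # f = lam*u + u'' for this u
u = np.real(np.fft.ifft(np.fft.fft(f) / (lam - y**2)))
print(np.max(np.abs(u - u_exact)))          # near machine precision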
12.4 Exercises

1. Let M be a closed subspace of a Hilbert space H, M ≠ {0}, H, and let PM be the usual orthogonal projection onto M. Show that if λ ≠ 0, 1 then λ ∈ ρ(PM) and
\[ \|(\lambda I - P_M)^{-1}\| \le \frac{1}{|\lambda|} + \frac{1}{|1 - \lambda|} \]

2. Recall that the resolvent operator of T is defined to be Rλ = (λI − T)^{-1} for λ ∈ ρ(T).
a) Prove the resolvent identity (12.1.3).
b) Deduce from this that Rλ, Rμ commute.
c) Show also that T, Rλ commute for λ ∈ ρ(T).

3. Show that λ ↦ Rλ is continuously differentiable, regarded as a mapping from ρ(T) ⊂ C into B(H), with
\[ \frac{dR_\lambda}{d\lambda} = -R_\lambda^2 \]

4. Let T denote the right shift operator on ℓ². Show that
a) σp(T) = ∅
b) σc(T) = {λ : |λ| = 1}
c) σr(T) = {λ : |λ| < 1}

5. If λ ≠ ±1, ±i, show that λ is in the resolvent set of the Fourier transform F. (Suggestion: assuming that a solution of Fu − λu = f exists, derive an explicit formula for it using F⁴u = λ⁴u + λ³f + λ²Ff + λF²f + F³f and the fact that F⁴ = I if F is the Fourier transform.)

6. Let H = L²(0,1), T1u = T2u = T3u = u′ on the domains
D(T1) = H¹(0,1)
D(T2) = {u ∈ H¹(0,1) : u(0) = 0}
D(T3) = {u ∈ H¹(0,1) : u(0) = u(1) = 0}
Show that (i) σ(T1) = σp(T1) = C, (ii) σ(T2) = ∅, (iii) σ(T3) = σr(T3) = C.

7. Define the translation operator Tu(x) = u(x − 1) on L²(R).
a) Find T*.
b) Show that T is unitary.
c) Show that σ(T) = σc(T) = {λ ∈ C : |λ| = 1}.

8. Let Tu(x) = ∫₀ˣ K(x,y)u(y) dy be a Volterra integral operator on L²(0,1) with a bounded kernel, |K(x,y)| ≤ M. Show that σ(T) = {0}. (There are several ways to show that T has no nonzero eigenvalues. Here is one approach: define the equivalent norm on L²(0,1)
\[ \|u\|_\theta^2 = \int_0^1 |u(x)|^2 e^{-2\theta x}\,dx \]
and show that the supremum of ||Tu||θ/||u||θ can be made arbitrarily small by choosing θ sufficiently large.)

9. If T is a symmetric operator, show that σp(T) ∪ σc(T) ⊂ R. (It is almost the same as showing that σ(T) ⊂ R for a self-adjoint operator.)

10. The approximate spectrum σa(T) of a linear operator T is the set of all λ ∈ C such that there exists a sequence {un} in H with ||un|| = 1 for all n and ||Tun − λun|| → 0 as n → ∞. Show that
σp(T) ∪ σc(T) ⊂ σa(T) ⊂ σ(T)
(so that σa(T) = σ(T) in the case of a self-adjoint operator). Show by example that σr(T) need not be contained in σa(T).

11. The essential spectrum σe(T) of a linear operator T is the set of all λ ∈ C such that λI − T is not a Fredholm operator (recall Definition 10.5; actually there are several non-equivalent definitions of essential spectrum which can be found in the literature). Show that σe(T) ⊂ σ(T). Characterize the essential spectrum for the following operators: i) a linear operator on C^n, ii) an orthogonal projection on a Hilbert space, iii) the Fourier transform on L²(R^N), and iv) a multiplication operator on L²(Ω).

12. If T is a bounded, self-adjoint operator on a Hilbert space H, show that ⟨Tu, u⟩ ≥ 0 for all u ∈ H if and only if σ(T) ⊂ [0, ∞).

Chapter 13

Compact Operators

13.1 Compact operators

One type of operator which has not yet been mentioned much in connection with spectral theory is integral operators. This is because they typically belong to a particular class of operators known as compact operators, for which there is a well developed special theory, whose main points will be presented in this chapter.

If X is a Banach space, then as usual K ⊂ X is compact if any open cover of K has a finite subcover. Equivalently, any sequence in K has a subsequence convergent to an element of K. If dim(X) < ∞, then K is compact if and only if it is closed and bounded, but this is false if dim(X) = ∞.

Example 13.1. Let H be an infinite dimensional Hilbert space and K = {u ∈ H : ||u|| ≤ 1}, which is obviously closed and bounded. If we let {en}_{n=1}^∞ be an infinite orthonormal sequence (which we know must exist), there cannot be any convergent subsequence, since ||en − em|| = √2 for any n ≠ m.

Recall also that E ⊂ X is precompact, or relatively compact, if Ē is compact.

Definition 13.1. If X, Y are Banach spaces, then a linear operator T : X → Y is compact if for any bounded set E ⊂ X the image T(E) is precompact in Y.

This definition makes sense even if T is nonlinear, but in this book the terminology will only be used in the linear case. We will use the notation K(X, Y) to denote the set of compact linear operators from X to Y, and K(X) if Y = X.

Proposition 13.1. If X, Y are Banach spaces, then
1. K(X, Y) is a subspace of B(X, Y).
2. If T ∈ B(X, Y) and dim(R(T)) < ∞ then T ∈ K(X, Y).
3. The identity map I ∈ K(X) if and only if dim(X) < ∞.

Proof: If T is compact, then $\overline{T(B(0,1))}$ is compact in Y and in particular is bounded in Y. Thus there exists M < ∞ such that ||Tu|| ≤ M if ||u|| ≤ 1, which means ||T|| ≤ M. It is immediate to check that K(X, Y) is a vector space, so (1) is proved.
If E ⊂ X is bounded and T ∈ B(X, Y), then T(E) is bounded in Y. Therefore T(E) is a bounded subset of the finite dimensional space R(T), so is relatively compact by the Heine-Borel theorem. This proves (2) and the 'if' part of (3). The other half of (3) is equivalent to the statement that the unit ball B(0,1) is not compact if dim(X) = ∞. This was shown in Example 13.1 above in the Hilbert space case, and we refer to Theorem 6.5 of [5] for the general case of a Banach space.

Recall that when dim(R(T)) < ∞ we say that T is of finite rank. Any degenerate integral operator Tu(x) = ∫_Ω K(x,y)u(y) dy with K(x,y) = Σ_{j=1}^n φj(x)ψj(y), φj, ψj ∈ L²(Ω) for j = 1, … n, is therefore of finite rank, and so in particular is compact.

A convenient alternate characterization of compact operators involves the notion of weak convergence. Although the following discussion can mostly be carried out in a Banach space setting, we will consider only the Hilbert space case.

Definition 13.2. If H is a Hilbert space and {un}_{n=1}^∞ is an infinite sequence in H, we say un converges weakly to u in H (un ⇀ u) provided that ⟨un, v⟩ → ⟨u, v⟩ for every v ∈ H.

Note by the Riesz Representation Theorem that this is the same as requiring ℓ(un) → ℓ(u) for every ℓ ∈ H* – this is the definition to use when generalizing to the Banach space situation. The weak limit, if it exists, is unique, see Exercise 3.

Example 13.2. Assume that H is infinite dimensional and let {en}_{n=1}^∞ be any orthonormal set in H, which is not convergent by Example 13.1. From Bessel's inequality we have
\[ \sum_{n=1}^{\infty} |\langle e_n, v \rangle|^2 \le \|v\|^2 < \infty \quad \text{for all } v \in H \tag{13.1.1} \]
which implies in particular that ⟨en, v⟩ → 0 for every v ∈ H. This means en ⇀ 0.

In case it is necessary to emphasize the difference between weak convergence and the ordinary notion of convergence in H, we may refer to the latter as strong convergence. It is elementary to show that strong convergence always implies weak convergence, but the converse is false, as the above example shows.

To make the connection to compact operators, let {en}_{n=1}^∞ again denote an infinite orthonormal set in an infinite dimensional Hilbert space H, and suppose T is compact on H. If un = Ten, then {un}_{n=1}^∞ is evidently relatively compact in H, so we can find a convergent subsequence u_{nk} → u. For any v ∈ H we then have
\[ \langle u_{n_k}, v \rangle = \langle T e_{n_k}, v \rangle = \langle e_{n_k}, T^* v \rangle \to 0 \tag{13.1.2} \]
so that u_{nk} = Te_{nk} ⇀ 0. But since also u_{nk} → u, we must have u = 0, i.e. u_{nk} → 0. Since the original sequence could be replaced by any of its subsequences, we conclude that any subsequence of {Ten} has a further subsequence convergent to zero. We now claim that Ten → 0, i.e. the entire sequence converges, not just a subsequence. If not, then there must exist δ > 0 and a subsequence e_{nk} such that ||Te_{nk}|| ≥ δ, which contradicts the fact just established that {Te_{nk}} must have a subsequence convergent to zero.

We have therefore established that any compact operator maps the weakly convergent sequence en to a strongly convergent sequence. We will see below that compact operators always map weakly convergent sequences to strongly convergent sequences, and that this property characterizes compact operators. Let us first present some more important facts about weak convergence in a Hilbert space.
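Example 13.2 is easy to visualize in ℓ² (a sketch, truncated to finitely many coordinates; the choice of v is arbitrary): the inner products ⟨en, v⟩ tend to zero while ||en|| stays equal to 1, so the convergence is weak but not strong.

import numpy as np

# e_n converges weakly but not strongly in (truncated) l^2.
N = 5000
v = 1.0 / np.arange(1, N + 1)               # v = (1, 1/2, 1/3, ...) lies in l^2
for n in (1, 10, 100, 1000):
    e = np.zeros(N)
    e[n - 1] = 1.0                          # the n-th standard basis vector
    print(n, np.dot(e, v), np.linalg.norm(e))   # <e_n, v> = 1/n -> 0, ||e_n|| = 1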
Proposition 13.2. Let un ⇀ u in a Hilbert space H. Then
1. ||u|| ≤ lim inf_{n→∞} ||un||.
2. If ||un|| → ||u|| then un → u.

Proof: We have
\[ 0 \le \|u_n - u\|^2 = \|u_n\|^2 - 2\,\mathrm{Re}\,\langle u_n, u \rangle + \|u\|^2 \tag{13.1.3} \]
or
\[ 2\,\mathrm{Re}\,\langle u_n, u \rangle - \|u\|^2 \le \|u_n\|^2 \tag{13.1.4} \]
Now take the lim inf of both sides: since Re⟨un, u⟩ → ||u||², the left side tends to ||u||², which gives the conclusion of (1). If ||un|| → ||u||, then the right hand identity of (13.1.3) shows that ||un − u|| → 0.

The property in part (1) of the Proposition is often referred to as the weak lower semicontinuity of the norm. Note that strict inequality can occur, for example in the case that un is an infinite orthonormal set.

Various familiar topological notions may be based on weak convergence.

Definition 13.3. A set E ⊂ H is weakly closed if
\[ u_n \in E,\quad u_n \rightharpoonup u \quad \text{implies} \quad u \in E \tag{13.1.5} \]
and E is weakly open if its complement is weakly closed. We say E is weakly compact if any infinite sequence in E has a subsequence which is weakly convergent to an element u ∈ E.

Clearly a weakly closed set is closed, but the converse is false in general.

Example 13.3. If E = {u ∈ H : ||u|| = 1} then E is closed but is not weakly closed, since again a counterexample is provided by any infinite orthonormal sequence. On the other hand, E = {u ∈ H : ||u|| ≤ 1} is weakly closed by Proposition 13.2.

Several key facts relating to the weak convergence concept, which we will not prove here but will make extensive use of, are given in the next theorem.

Theorem 13.1. Let H be a Hilbert space. Then
1. Any weakly convergent sequence is bounded.
2. Any bounded sequence has a weakly convergent subsequence.
3. If E ⊂ H is convex and closed then it is also weakly closed. In particular, any closed subspace is weakly closed.

The three parts of this theorem are all special cases of some very general results in functional analysis. The first statement is a special case of the Banach-Steinhaus theorem (or Uniform Boundedness Principle), which is more generally a theorem about sequences of bounded linear functionals on a Banach space; see Corollary 1 in Section 23 of [2] or Theorem 5.8 of [30] for the more general Banach space result. The second statement is a special case of the Banach-Alaoglu theorem, which asserts a weak compactness property of bounded sets in the dual space of any Banach space, see Theorem 1 in Section 24 of [2] or Theorem 3.15 of [31] for the more general Banach space result. The third part is a special case of Mazur's theorem, also valid in a more general Banach space setting, see Theorem 3.7 of [5].

Now let us return to the main development and prove the following very important characterization of compact linear operators.

Theorem 13.2. Let T ∈ B(H). Then T is compact if and only if T maps any weakly convergent sequence to a strongly convergent sequence.

Proof: Suppose that T is compact and un ⇀ u. Then {un} is bounded by part 1 of Theorem 13.1, so by the compactness of T the image sequence {Tun} has a strongly convergent subsequence. Note also that Tun ⇀ Tu, since for any v ∈ H we have
\[ \langle Tu_n, v \rangle = \langle u_n, T^* v \rangle \to \langle u, T^* v \rangle = \langle Tu, v \rangle \tag{13.1.6} \]
Thus there must exist a subsequence u_{nk} such that Tu_{nk} → Tu strongly in H. By the same argument, any subsequence of {un} has a further subsequence for which the image sequence converges to Tu, and so Tun → Tu.

To prove the converse, let E ⊂ H be bounded and {vn}_{n=1}^∞ ⊂ $\overline{T(E)}$. We must then have vn = zn + εn, where zn = Tun for some un ∈ E and εn → 0 in H. By the boundedness of E and part 2 of Theorem 13.1 there must exist a weakly convergent subsequence u_{nk} ⇀ u.
Therefore v_{nk} = Tu_{nk} + ε_{nk} is convergent, since we assume that the image of any weakly convergent sequence is strongly convergent. It follows that T(E) is relatively compact, as needed.

The following theorem will turn out to be a key tool in developing the theory of integral equations with L² kernels.

Theorem 13.3. K(H) is a closed subspace of B(H).

Proof: We have already observed that K(H) is a subspace of B(H). To verify that it is closed, pick Tn ∈ K(H) such that ||Tn − T|| → 0 for some T ∈ B(H). We are done if we show T ∈ K(H), and this in turn will follow if we show that for any bounded sequence {un} there exists a convergent subsequence of the image sequence {Tun}.

Since T1 ∈ K(H), there must exist a subsequence {u1n} ⊂ {un} such that {T1u1n} is convergent. Likewise, since T2 ∈ K(H), there must exist a further subsequence {u2n} ⊂ {u1n} such that {T2u2n} is convergent. Continuing in this way we get {ujn} such that {u_{j+1,n}} ⊂ {ujn} and {Tjujn} is convergent, for any fixed j. Now let zn = unn, so that {zn} is eventually a subsequence of {ujn} for any j, and is obviously a subsequence of the original sequence {un}. We claim that {Tzn} is convergent, which will complete the proof.

Fix ε > 0. We may first choose M such that ||un|| ≤ M for every n, then some fixed j such that ||Tj − T|| < ε/(4M), and finally N such that ||Tjzn − Tjzm|| < ε/2 whenever m, n ≥ N. We then have, for n, m ≥ N,
\[ \|Tz_n - Tz_m\| \le \|Tz_n - T_j z_n\| + \|T_j z_n - T_j z_m\| + \|T_j z_m - Tz_m\| \le \|T - T_j\|(\|z_n\| + \|z_m\|) + \|T_j z_n - T_j z_m\| \le \frac{\varepsilon}{4M}\cdot 2M + \frac{\varepsilon}{2} = \varepsilon \tag{13.1.7} \]
It follows that {Tzn} is Cauchy, hence convergent, in H.
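Theorem 13.3 is typically applied by exhibiting a compact operator as a norm limit of finite rank operators. The simplest instance (a sketch, truncated to a finite matrix) is the diagonal operator D = diag(1, 1/2, 1/3, …) on ℓ²: its rank-N truncations D_N satisfy ||D − D_N|| = 1/(N+1) → 0, so D is compact.

import numpy as np

# Norm distance from D = diag(1, 1/2, 1/3, ...) to its rank-N truncation D_N.
M = 2000
d = 1.0 / np.arange(1, M + 1)
for N in (5, 50, 500):
    tail = d.copy()
    tail[:N] = 0.0                          # D - D_N keeps the entries beyond N
    print(N, np.max(tail))                  # norm of a diagonal operator = max |entry|
                                            # equals 1/(N+1) -> 0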
Recall that an integral operator
\[ Tu(x) = \int_\Omega K(x, y)\,u(y)\,dy \tag{13.1.8} \]
is of Hilbert-Schmidt type if K ∈ L²(Ω × Ω), and we have earlier established that such operators are bounded on L²(Ω). We will now show that any Hilbert-Schmidt integral operator is actually compact. The basic idea is to show that T can be approximated by finite rank operators, which we know to be compact, and then apply the previous theorem. First we need a lemma.

Lemma 13.1. If {φn}_{n=1}^∞ is an orthonormal basis of L²(Ω), then {φn(x)φm(y)}_{n,m=1}^∞ is an orthonormal basis of L²(Ω × Ω).

Proof: By direct calculation we see that
\[ \int_\Omega \int_\Omega \varphi_n(x)\varphi_m(y)\,\overline{\varphi_{n'}(x)\varphi_{m'}(y)}\,dx\,dy = \begin{cases} 1 & n = n',\ m = m' \\ 0 & \text{otherwise} \end{cases} \tag{13.1.9} \]
so that they are orthonormal in L²(Ω × Ω). To show completeness, by Theorem 6.4 it is enough to verify the Bessel equality; that is, we show
\[ \|f\|_{L^2(\Omega \times \Omega)}^2 = \sum_{n,m=1}^{\infty} |c_{n,m}|^2 \tag{13.1.10} \]
where
\[ c_{n,m} = \int_\Omega \int_\Omega f(x,y)\,\overline{\varphi_n(x)\varphi_m(y)}\,dx\,dy \tag{13.1.11} \]
and it is enough to do this for continuous f, by density. By applying the Bessel equality in x for fixed y, and then integrating with respect to y, we get
\[ \int_\Omega \int_\Omega |f(x,y)|^2\,dx\,dy = \int_\Omega \sum_{n=1}^{\infty} |c_n(y)|^2\,dy \tag{13.1.12} \]
where cn(y) = ∫_Ω f(x,y) φn(x)‾ dx. Since we can clearly exchange the sum and integral, it follows, by applying the Bessel equality to cn(·), that
\[ \int_\Omega \int_\Omega |f(x,y)|^2\,dx\,dy = \sum_{n=1}^{\infty} \int_\Omega |c_n(y)|^2\,dy = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} |c_{n,m}|^2 \tag{13.1.13} \]
since
\[ \int_\Omega c_n(y)\,\overline{\varphi_m(y)}\,dy = \int_\Omega \int_\Omega f(x,y)\,\overline{\varphi_n(x)\varphi_m(y)}\,dx\,dy = c_{n,m} \tag{13.1.14} \]
as needed.

Theorem 13.4. If K ∈ L²(Ω × Ω), then the integral operator (13.1.8) is compact on L²(Ω).

Proof: Let {φn} be an orthonormal basis of L²(Ω) and set
\[ K_N(x,y) = \sum_{n,m=1}^{N} c_{n,m}\,\varphi_n(x)\varphi_m(y) \tag{13.1.15} \]
with cn,m as above, so we know ||KN − K||_{L²(Ω×Ω)} → 0 as N → ∞. Let TN be the corresponding integral operator with kernel KN, which is compact since it has finite rank. Finally, since ||T − TN|| ≤ ||KN − K||_{L²(Ω×Ω)} → 0 (recall (10.2.15)), it follows from Theorem 13.3 that T is compact.

13.2 The Riesz-Schauder theory

In this section we first establish a fundamental result about the solvability of operator equations of the form λu − Tu = f when T is compact and λ ≠ 0.

Theorem 13.5. Let T ∈ K(H) and λ ≠ 0. Then
1. λI − T is a Fredholm operator of index zero.
2. If λ ∈ σ(T) then λ ∈ σp(T).

Recall that the first statement means that N(λI − T) and N(λ̄I − T*) are of the same finite dimension and that R(λI − T) is closed. It follows that
\[ R(\lambda I - T) = N(\bar\lambda I - T^*)^\perp \tag{13.2.1} \]
and the Fredholm alternative holds: either
• λI − T and λ̄I − T* are both one-to-one, and λu − Tu = f has a unique solution for every f ∈ H, or
• dim N(λI − T) = dim N(λ̄I − T*) < ∞, and λu − Tu = f has a solution if and only if f ⊥ v for every v satisfying T*v = λ̄v.

If T is compact then so is T* (Exercise 2), thus all of the same conclusions hold for T*.

The proof proceeds by means of a number of intermediate steps, some of which are of independent interest. Without loss of generality we may assume λ = 1, since we could always write λI − T = λ(I − λ^{-1}T). For the rest of the section we denote S = I − T, with the assumption that T ∈ K(H).

Lemma 13.2. There exists C > 0 such that ||Su|| ≥ C||u|| for all u ∈ N(S)⊥.
By the first half of this lemma N (S ∗ ) = {0} so that R(S) = H and therefore finally R(S) = H by one more application of Lemma 13.3. 221 Lemma 13.5. N (S) is of finite dimension. Proof: If not, then there exists an infinite orthonormal basis {en }∞ n=1 of N (S), and in particular ||T en || = ||en || = 1. But since T is compact we also know that T en → 0, a contradiction. lemma13-6 Lemma 13.6. The null spaces N (S) and N (S ∗ ) are of the same finite dimension. Proof: Denote m = dim N (S), m∗ = dim N (S ∗ ) and suppose that m∗ > m. Let w1 , . . . wm , v1 , . . . vm∗ be orthonormal bases of N (S), N (S ∗ ) respectively and define the operator m X hu, wj ivj (13.2.4) Au = Su − j=1 Since hSu, vj i = 0 for j = 1, . . . m∗ it follows that ( −hu, wk i k = 1, . . . m hAu, vk i = 0 k = m + 1, . . . m∗ (13.2.5) Next we claim that N (A) = {0}. To see this, if Au = 0 we’d have hu, wk i = 0 for k = 1, . . . m, so that u ∈ N (S)⊥ . But it would also follow that u ∈ N (S) by (13.2.4), and so u = 0. We may obviously write A = I − T̃ for some T̃ ∈ K(H), so by Lemma 13.4 we may conclude that R(A) = H. But vm+1 6∈ R(A) since if Au = vm+1 it would follow that 1 = ||vm+1 ||2 = hAu, vm+1 i = 0, a contradiction. corr13-1 Corollary 13.1. If 0 ∈ σ(S) then 0 ∈ σp (S). Proof: If 0 6∈ σp (S) then N (S) = {0} so that R(S) = H by Lemma 13.4. But then 0 6∈ σc (S) ∪ σr (S), so 0 6∈ σ(S) as needed. By combining the conclusions of Lemma 13.3, Lemma 13.6 and Corollary 13.1 we have completed the proof of Theorem 13.5. Further important information about the spectrum of a compact operator is contained in the next theorem. th13-6 Theorem 13.6. If T ∈ K(H) then σ(T ) is at most countably infinite, with 0 as the only possible accumulation point. 222 13-2-4 Proof: Since σ(T )\{0} = σp (T ), it is enough to show that for any > 0 there exists at most a finite number of linearly independent eigenvectors of T corresponding to eigenvalues λ with |λ| > . Assuming to the contrary, there must exist {xn }∞ n=1 , linearly independent, such that T xn = λn xn and |λn | > . Applying the Gram-Schmidt procedure ∞ to the sequence {xn }∞ n=1 we obtain an orthonormal sequence {yn }n=1 such that yk = k X βkk 6= 0 (13.2.6) βkj (λj − λk )xj (13.2.7) βkj xj j=1 Therefore T yk − λk yk = k X j=1 implying that T yk = λk yk + k−1 X αkj yj (13.2.8) |αkj |2 = ||T yk ||2 → 0 (13.2.9) j=1 for some αkj . But then 2 2 |λk | ≤ |λk | + k−1 X j=1 since {yn }∞ n=1 is orthonormal and T is compact, contradicting |λn | > . We emphasize that nothing stated so far implies that a compact operator has any eigenvalues at Rall. For example we have already observed that the simple Volterra operx ator T u(x) = 0 u(s) ds, which is certainly compact, has spectrum σ(T ) = σc (T ) = {0} (Example 12.4). In the next section we will see that if the operator T is also self-adjoint, then this sort of behavior cannot happen, i.e. eigenvalues must exist. We could also use Theorems 13.5 or 13.6 to prove that certain operators are not compact. For example, a nonzero multiplication operator cannot be compact since it has either an uncountable spectrum or an infinite dimensional eigenspace, or both. We conclude this section by summarizing in the form of a theorem the implications of the abstract results in this section for the solvability of integral equations Z λu(x) − K(x, y)u(y) dy = f (x) x∈Ω (13.2.10) Ω 223 13-2-10 Theorem 13.7. 
If K ∈ L2 (Ω × Ω) then there exists a finite or countably infinite set {λn ∈ C} with zero as its only possible accumulation point, such that • If λ 6= λn , λ 6= 0 then for every f ∈ L2 (Ω) there exists a unique solution u ∈ L2 (Ω) of (13.2.10). • If λ = λn 6= 0 then there exist linearly independent solutions {v1 , . . . vm }, for some finite m, of the homogeneous equation Z λv(x) − K(x, y)v(y) dy = 0 (13.2.11) Ω and m linearly independent solutions {w1 , . . . wm } of the adjoint homogeneous equation Z λw(x) − K(y, x)w(y) dy = 0 (13.2.12) Ω such that for f ∈ L2 (Ω) a solution of (13.2.10) exists if and only if f satisfies the m solvability conditions hf, wj i = 0 for j = 1, . . . m. In such case (13.2.10) has the m parameter family of solutions u = up + m X cj v j (13.2.13) j=1 where up denotes any solution of (13.2.10). • If λ = 0 then either existence or uniqueness may fail. The condition hf, wi = 0 for any solution w of Z K(y, x)w(y) dy = 0 (13.2.14) Ω is necessary, but in general insufficient for the existence of a solution of (13.2.10). 13.3 The case of self-adjoint compact operators In this section we continue with the study of the spectral properties of compact operators, but now make the additional assumption that the operator is self-adjoint. As motivation, let us recall that in the finite dimensional case a Hermitian matrix is always diagonalizable, and in particular there exists an orthonormal basis of eigenvectors of the matrix. 224 If T x = Ax where A is an N × N Hermitian matrix with eigenvalues {λ1 , . . . λN } and corresponding orthonormal eigenvectors {u1 , . . . uN }, and we let U denote the N ×N matrix whose columns are u1 , . . . uN , then U ∗ U = I and U ∗ AU = D where D is a diagonal matrix with diagonal entries λ1 , . . . λN . It follows that ∗ Ax = U DU x = N X λj huj , xiuj (13.3.1) j=1 or equivalently T = N X λ j Pj (13.3.2) j=1 where Pj is the orthogonal projection onto the span of uj . The property that an operator may have of being expressible as a linear combination of projections is a useful one when true, and as we will see in this section is generally correct for compact self-adjoint operators. Definition 13.4. If T is a linear operator on a Hilbert space H, the Rayleigh quotient for T is hT x, xi J(x) = (13.3.3) ||x||2 Clearly J : D(T )\{0} → C and |J(x)| ≤ ||T || for T ∈ B(H). If T is self-adjoint then J is real valued since hT x, xi = hx, T xi = hT x, xi (13.3.4) The range of the function J is sometimes referred to as the numerical range of T , and we may occasionally use the notation Q(x) = hT x, xi, the so-called quadratic form associated with T . Note also that σp (T ) is contained in the numerical range of T , since J(x) = λ if T x = λx. th13-8 Theorem 13.8. If T ∈ B(H) and T = T ∗ then ||T || = sup |J(x)| (13.3.5) x6=0 Proof: If MT = supx6=0 |J(x)| then we have already observed that MT ≤ ||T ||. To derive the reverse inequality, first observe that since J is real valued, hT (x + y), x + yi ≤ MT ||x + y||2 −hT (x − y), x − yi ≤ MT ||x − y||2 225 (13.3.6) (13.3.7) (13.3.8) for any x, y ∈ H. Adding these inequalities and using the self-adjointness gives 2Re hT x, yi = hT x, yi + hT y, xi ≤ MT (||x||2 + ||y||2 ) (13.3.9) If x 6∈ N (T ) choose y = (||x||/||T x||)T x so that ||y|| = ||x|| and hT x, yi = ||T x|| ||x||. It follows that 2||T x|| ||x|| ≤ 2MT ||x||2 (13.3.10) and therefore ||T x|| ≤ MT ||x|| holds for x 6∈ N (T ). Since the same conclusion is obvious for x ∈ N (T ) the proof is completed. 
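As a quick finite dimensional illustration of Theorem 13.8 (an aside, not part of the notes), the following Python sketch compares $\|A\|$ with $\sup_{x\ne 0}|J(x)|$ for a random Hermitian matrix, which stands in for a bounded self-adjoint operator; the matrix and all names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random Hermitian matrix stands in for a bounded self-adjoint operator T.
n = 300
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                      # A = A*, so J is real valued

def J(x):
    """Rayleigh quotient <Ax, x> / ||x||^2 (real, since A is Hermitian)."""
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

# sup |J(x)| is attained at an eigenvector for the eigenvalue of largest
# modulus, and equals the operator (spectral) norm ||A||.
eigs = np.linalg.eigvalsh(A)
print(np.linalg.norm(A, 2), np.abs(eigs).max())    # these two agree

# A random trial vector only gives a lower bound, consistent with |J| <= ||A||.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(abs(J(x)) <= np.linalg.norm(A, 2) + 1e-12)   # True
```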
We note that the conclusion of the theorem is false without the self-adjointness assumption; for example $J(x) = 0$ for all $x$ if $T$ is the operator of rotation by $\pi/2$ in $\mathbb{R}^2$.

Now consider the function $\alpha \to J(x + \alpha y)$ for fixed $x, y \in H\setminus\{0\}$. As a function of $\alpha$ it is simply a quotient of quadratic functions, hence differentiable at any $\alpha$ for which $\|x + \alpha y\| \ne 0$. In particular

$$\frac{d}{d\alpha} J(x + \alpha y)\Big|_{\alpha=0} \qquad (13.3.11)$$

is well defined for any $x \ne 0$. This expression is the directional derivative of $J$ at $x$ in the $y$ direction, and we say that $x$ is a critical point of $J$ if (13.3.11) is zero for every direction $y$.

We may evaluate (13.3.11) by elementary calculus rules, and we find that

$$\frac{d}{d\alpha} J(x + \alpha y)\Big|_{\alpha=0} = \frac{\langle x,x\rangle(\langle Tx,y\rangle + \langle Ty,x\rangle) - \langle Tx,x\rangle(\langle x,y\rangle + \langle y,x\rangle)}{\langle x,x\rangle^2} \qquad (13.3.12)$$

so at a critical point it must hold that

$$\mathrm{Re}\,\langle Tx,y\rangle = J(x)\,\mathrm{Re}\,\langle x,y\rangle \quad \forall y \in H \qquad (13.3.13)$$

Replacing $y$ by $iy$ we obtain

$$\mathrm{Im}\,\langle Tx,y\rangle = J(x)\,\mathrm{Im}\,\langle x,y\rangle \quad \forall y \in H \qquad (13.3.14)$$

and since $J$ is real valued,

$$\langle Tx,y\rangle = J(x)\langle x,y\rangle \quad \forall y \in H \qquad (13.3.15)$$

If $\lambda = J(x)$ then $\langle Tx - \lambda x, y\rangle = 0$ for all $y \in H$, so that $Tx = \lambda x$ must hold. We therefore see that eigenvalues of a self-adjoint operator $T$ may be obtained from critical points of the corresponding Rayleigh quotient, and it is also clear that the right side of (13.3.12) evaluates to zero for any $y$ if $Tx = \lambda x$. We have therefore established the following.

Proposition 13.3. Let $T$ be a bounded self-adjoint operator on $H$. Then $x \in H\setminus\{0\}$ is a critical point of $J$ if and only if $x$ is an eigenvector of $T$ corresponding to the eigenvalue $\lambda = J(x)$.

We emphasize that at this point we have not yet proved that any such critical points exist, and indeed we know that a bounded self-adjoint operator can have an empty point spectrum, for example a multiplication operator whose multiplier is real valued with all of its level sets of measure zero. Nevertheless we have identified a strategy that will succeed in proving the existence of eigenvalues, once some additional assumptions are made. The main such additional assumption we will make is that $T$ is compact.

Theorem 13.9. If $T \in K(H)$ and $T = T^*$ then either $J$ or $-J$ achieves its maximum on $H\setminus\{0\}$. In particular, either $\|T\|$ or $-\|T\|$ (or both) belongs to $\sigma_p(T)$.

Proof: If $T = 0$ then $J(x) \equiv 0$ and the conclusion is obvious. Otherwise, if $\|T\| > 0$ then by Theorem 13.8 either

$$\sup_{x\ne 0} J(x) = M_T = \|T\| \qquad \text{or} \qquad \inf_{x\ne 0} J(x) = -M_T = -\|T\| \qquad (13.3.16)$$

or both. For definiteness we assume that the first of these is true, in which case there must exist a sequence $\{x_n\}_{n=1}^\infty$ in $H$ such that $J(x_n) \to M_T$. Without loss of generality we may assume $\|x_n\| = 1$ for all $n$, so that $\langle Tx_n, x_n\rangle \to M_T$. By weak compactness there is a subsequence $x_{n_k} \rightharpoonup x$ for some $x \in H$, and since $T$ is compact we also have $Tx_{n_k} \to Tx$. Thus

$$0 \le \|Tx_{n_k} - M_T x_{n_k}\|^2 = \|Tx_{n_k}\|^2 + M_T^2\|x_{n_k}\|^2 - 2M_T\langle Tx_{n_k}, x_{n_k}\rangle \qquad (13.3.17)$$

Letting $k \to \infty$, the right hand side tends to $\|Tx\|^2 - M_T^2 \le 0$, and thus $\|Tx\| = M_T$. Furthermore $Tx_{n_k} - M_T x_{n_k} \to 0$, and since $M_T \ne 0$, $\{x_{n_k}\}$ must be strongly convergent to $x$; in particular $\|x\| = 1$. Thus we have $Tx = M_T x$ for some $x \ne 0$, so that $J(x) = M_T$, which means that $J$ achieves its maximum at $x$, and $x$ is an eigenvector corresponding to the eigenvalue $\|T\| = M_T$, as needed.

According to this theorem, any nonzero, compact, self-adjoint operator has at least one eigenvector $x_1$ corresponding to an eigenvalue $\lambda_1 \ne 0$. (A finite dimensional numerical sketch of this maximization appears below.)
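Theorem 13.9 produces one eigenpair by maximizing $|J|$. As a minimal finite dimensional sketch (our own illustration, anticipating the construction carried out next), power iteration can serve as the maximization step, and projecting away the eigenvectors already found plays the role of restricting $T$ to smaller and smaller subspaces.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                 # symmetric: stands in for compact self-adjoint T

found = []                        # orthonormal eigenvectors found so far
for _ in range(n):
    Q = np.eye(n)
    for v in found:               # projector onto the complement of span(found)
        Q -= np.outer(v, v)
    Ak = Q @ A @ Q                # "restriction" of A to that complement
    x = Q @ rng.standard_normal(n)
    for _ in range(5000):         # power iteration: generically converges to an
        x = Ak @ x                # eigenvector for the eigenvalue of largest
        x /= np.linalg.norm(x)    # modulus of Ak, i.e. a maximizer of |J|
    print(x @ A @ x)              # J(x): the next eigenvalue, in decreasing |.|
    found.append(x)

print(np.sort(np.linalg.eigvalsh(A)))   # same numbers, up to ordering
```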
If another such eigenvector exists which is not a scalar multiple of $x_1$, then it must be possible to find one which is orthogonal to $x_1$, since eigenvectors corresponding to distinct eigenvalues are automatically orthogonal (Theorem 12.5), while the eigenvectors corresponding to $\lambda_1$ form a subspace, of which we can find an orthogonal basis. This suggests that we seek another eigenvector by maximizing or minimizing the Rayleigh quotient over the subspace $H_1 = \{x_1\}^\perp$. Let us first make a definition and a simple observation.

Definition 13.5. If $T$ is a linear operator on $H$ then a subspace $M \subset D(T)$ is invariant for $T$ if $T(M) \subset M$.

It is obvious that any eigenspace of $T$ is invariant for $T$, and in the case of a self-adjoint operator we have also the following.

Lemma 13.7. If $T \in B(H)$ is self-adjoint and $M$ is an invariant subspace for $T$, then $M^\perp$ is also invariant for $T$.

Proof: If $y \in M$ and $x \in M^\perp$ then

$$\langle Tx, y\rangle = \langle x, Ty\rangle = 0 \qquad (13.3.18)$$

since $Ty \in M$. Thus $Tx \in M^\perp$.

Now defining $H_1 = \{x_1\}^\perp$ as above, the restriction $T_1$ of $T$ to $H_1$ satisfies $T_1 \in B(H_1)$, and it clearly inherits the properties of compactness and self-adjointness from $T$. Theorem 13.9 is therefore immediately applicable, so that the restriction of $T$ to $H_1$ has an eigenvector $x_2$, which is also an eigenvector of $T$ and which is automatically orthogonal to $x_1$. The corresponding eigenvalue is $\lambda_2 = \pm\|T_1\|$, and so obviously $|\lambda_2| \le |\lambda_1|$. Continuing in this way we obtain orthogonal eigenvectors $x_1, x_2, \dots$ corresponding to real eigenvalues $|\lambda_1| \ge |\lambda_2| \ge \dots$, where

$$|\lambda_{n+1}| = \max_{x \in H_n,\, x \ne 0} |J(x)| = \|T_n\| \qquad (13.3.19)$$

with $H_n = \{x_1, \dots, x_n\}^\perp$ and $T_n$ the restriction of $T$ to $H_n$. Without loss of generality $\|x_n\| = 1$ for every $n$ obtained this way.

There are now two possibilities: either (i) the process continues indefinitely with $\lambda_n \ne 0$ for all $n$, or (ii) $\lambda_{n+1} = 0$ for some $n$. In the first case we must have $\lim_{n\to\infty}\lambda_n = 0$ by Theorem 13.6 and the fact that every eigenspace is of finite dimension. In case (ii), $T$ has only finitely many linearly independent eigenvectors corresponding to nonzero eigenvalues $\lambda_1, \dots, \lambda_n$, and $T = 0$ on $H_n$. Assuming for definiteness that $H$ is separable and of infinite dimension, $H_n = N(T)$ is then the eigenspace for $\lambda = 0$, which must itself be infinite dimensional.

Theorem 13.10. Let $H$ be a separable Hilbert space. If $T \in K(H)$ is self-adjoint then

a) $\overline{R(T)}$ has an orthonormal basis consisting of eigenvectors $\{x_n\}$ of $T$ corresponding to eigenvalues $\lambda_n \ne 0$.

b) $H$ has an orthonormal basis consisting of eigenvectors of $T$.

Proof: Let $\{x_n\}$ be the finite or countably infinite set of eigenvectors corresponding to the nonzero eigenvalues of $T$, as constructed above. For $x \in H$ let $y = x - \sum_{j=1}^n \langle x, x_j\rangle x_j$ for some $n$. Then $y$ is the orthogonal projection of $x$ onto $H_n$, so $\|y\| \le \|x\|$ and $\|Ty\| \le |\lambda_{n+1}|\,\|y\|$. In particular

$$\Big\|Tx - \sum_{j=1}^n \langle Tx, x_j\rangle x_j\Big\|^2 = \Big\|Tx - \sum_{j=1}^n \langle x, x_j\rangle Tx_j\Big\|^2 \le |\lambda_{n+1}|^2 \|x\|^2 \qquad (13.3.20)$$

where we have used that

$$\sum_{j=1}^n \langle x, x_j\rangle Tx_j = \sum_{j=1}^n \langle x, x_j\rangle \lambda_j x_j = \sum_{j=1}^n \langle x, \lambda_j x_j\rangle x_j = \sum_{j=1}^n \langle x, Tx_j\rangle x_j = \sum_{j=1}^n \langle Tx, x_j\rangle x_j \qquad (13.3.21)$$

Letting $n \to \infty$, or taking $n$ sufficiently large in the case of a finite number of nonzero eigenvalues, we therefore see that $Tx$ is in the closed span of $\{x_n\}$. This completes the proof of a).

If we now let $\{z_n\}$ be any orthonormal basis of the closed subspace $N(T)$, then each $z_n$ is an eigenvector of $T$ corresponding to the eigenvalue $\lambda = 0$, and $z_n \perp x_m$ for any $m, n$ since $N(T) = R(T)^\perp$.
For any $x \in H$ let $y = \sum_n \langle x, x_n\rangle x_n$; the series must be convergent by Proposition 6.3 and the fact that $\sum_n |\langle x, x_n\rangle|^2 \le \|x\|^2$. It is immediate that $x - y \in N(T)$ since

$$Tx = Ty = \sum_n \lambda_n \langle x, x_n\rangle x_n \qquad (13.3.22)$$

and so $x$ has the unique representation

$$x = \sum_n \langle x, x_n\rangle x_n + \sum_n \langle x, z_n\rangle z_n = \sum_n \langle x, x_n\rangle x_n + Px \qquad (13.3.23)$$

where $P$ is the orthogonal projection onto the closed subspace $N(T)$. Thus $\{x_n\} \cup \{z_n\}$ is an orthonormal basis of $H$.

We note that either sum in (13.3.23) can be finite or infinite, but of course they cannot both be finite unless $H$ is finite dimensional. In the case of a non-separable Hilbert space it is only necessary to allow for an uncountable basis of $N(T)$. From (13.3.22) we also get the diagonalization formula

$$T = \sum_n \lambda_n P_n \qquad (13.3.24)$$

where $P_n x = \langle x, x_n\rangle x_n$ is the orthogonal projection onto the span of $x_n$.

The existence of an eigenfunction basis provides a convenient tool for the study of corresponding operator equations. Let us consider the problem

$$\lambda x - Tx = f \qquad (13.3.25)$$

where $T$ is a compact, self-adjoint operator on a separable, infinite dimensional Hilbert space $H$. Let $\{x_n\}_{n=1}^\infty$ be an orthonormal basis of eigenvectors of $T$. We may therefore expand $f$, and the solution $x$ if it exists, in this basis,

$$x = \sum_{n=1}^\infty a_n x_n, \quad a_n = \langle x, x_n\rangle \qquad f = \sum_{n=1}^\infty b_n x_n, \quad b_n = \langle f, x_n\rangle \qquad (13.3.26)$$

Inserting these into the equation and using $Tx_n = \lambda_n x_n$ there results

$$\sum_{n=1}^\infty ((\lambda - \lambda_n)a_n - b_n)x_n = 0 \qquad (13.3.27)$$

Thus it is a necessary condition that $(\lambda - \lambda_n)a_n = b_n$ for all $n$, in order that a solution $x$ exist. Now let us consider several cases.

Case 1. If $\lambda \ne \lambda_n$ for every $n$ and $\lambda \ne 0$, then $\lambda \in \rho(T)$, so a unique solution $x$ of (13.3.25) exists, which must be given by

$$x = \sum_{n=1}^\infty \frac{\langle f, x_n\rangle}{\lambda - \lambda_n}\, x_n \qquad (13.3.28)$$

Note that there exists a constant $C$ such that $1/|\lambda - \lambda_n| \le C$ for all $n$, from which it follows directly that the series is convergent in $H$ and $\|x\| \le C\|f\|$.

Case 2. Suppose $\lambda = \lambda_m$ for some $m$ and $\lambda \ne 0$. It is then necessary that $b_n = 0$ for all $n$ for which $\lambda_n = \lambda_m$, which amounts precisely to the solvability condition on $f$ already derived, namely that $f \perp z$ for all $z \in N(\lambda I - T)$. When this holds, the constants $a_n$ may be chosen arbitrarily for these $n$ values, while $a_n = b_n/(\lambda - \lambda_n)$ must hold otherwise. Thus the general solution may be written

$$x = \sum_{\{n:\lambda_n \ne \lambda_m\}} \frac{\langle f, x_n\rangle}{\lambda - \lambda_n}\, x_n + \sum_{\{n:\lambda_n = \lambda_m\}} c_n x_n \qquad (13.3.29)$$

for any $f \in R(\lambda I - T)$.

Case 3. If $\lambda = 0$ and $\lambda_n \ne 0$ for all $n$ then the unique solution is given by

$$x = -\sum_{n=1}^\infty \frac{\langle f, x_n\rangle}{\lambda_n}\, x_n \qquad (13.3.30)$$

provided the series is convergent in $H$. Since $\lambda_n \to 0$ must hold in this case, there will always exist $f \in H$ for which the series is not convergent, as must be the case since $R(T)$ is dense in, but not equal to, all of $H$. In fact we obtain the precise characterization that $f \in R(T)$ if and only if

$$\sum_{n=1}^\infty \frac{|\langle f, x_n\rangle|^2}{\lambda_n^2} < \infty \qquad (13.3.31)$$

Case 4. If $\lambda = 0 \in \sigma_p(T)$, let $\{x_n\} \cup \{z_n\}$ be an orthonormal basis of eigenvectors as above, with the $z_n$'s being a basis of $N(T)$. By matching coefficients in the basis expansions of $Tx$ and $f$, we get that a solution exists if and only if $f$ has the properties

$$\langle f, z_n\rangle = 0 \quad \forall n \qquad \text{and} \qquad \sum_n \frac{|\langle f, x_n\rangle|^2}{\lambda_n^2} < \infty \qquad (13.3.32)$$

in which case the general solution is

$$x = -\sum_n \frac{\langle f, x_n\rangle}{\lambda_n}\, x_n + \sum_n c_n z_n \qquad \sum_n |c_n|^2 < \infty \qquad (13.3.33)$$

13.4 Some properties of eigenvalues

When $T$ is a self-adjoint compact operator, we have seen in the previous section that solution formulas for the equation $\lambda x - Tx = f$ can be given purely in terms of the eigenvalues and eigenvectors of $T$, along with $f$ itself.
This means that all of the properties of T are encoded by these eigenvalues and eigenvectors. We will briefly pursue some consequences of this in the case that T is an integral operator, in which case we may anticipate that properties of the kernel of the operator are directly connected to those of the eigenvalues and eigenvectors. Thus let Z K(x, y)u(y) dy (13.4.1) T u(x) = 13-4-1 Ω where K ∈ L2 (Ω × Ω) and K(x, y) = K(y, x). Considered as an operator on L2 (Ω) Theorem 13.10 is then applicable, so we know there must exist an orthonormal basis of eigenfunctions {un }∞ n=1 and real eigenvalues λn such that T un = λn un , i.e. Z K(x, y)un (y) dy = λn un (x) (13.4.2) Ω or equivalently Z K(y, x)un (y) dy = λn un (x) (13.4.3) Ω This means that for almost every x ∈ Ω, λn un (x) is the n’th generalized Fourier coefficient of K(·, x) with respect to the un basis. In particular, by the Bessel equality Z 2 |K(x, y)| dy = Ω ∞ X λ2n |un (x)|2 for a.e. x ∈ Ω (13.4.4) n=1 and integrating with respect to x gives Z ZZ ∞ ∞ X X 2 2 2 |K(x, y)| dydx = λn |un (x)| dx = λ2n Ω×Ω n=1 Ω (13.4.5) n=1 It also follows from the above considerations that K(y, x) = ∞ X λn un (x)un (y) (13.4.6) λn un (x)un (y) (13.4.7) n=1 or K(x, y) = ∞ X n=1 232 13-4-7 in the sense that the convergence takes place in L2 (Ω) with respect to y for a.e. x and vice versa. Formally at least it follows by setting y = x that K(x, x) = ∞ X λn |un (x)|2 (13.4.8) n=1 and integrating in x that Z K(x, x) dx = Ω ∞ X λn (13.4.9) n=1 This identity, however, cannot be proved to be correct without further assumptions, if for no other reason than that K(x, x), being a restriction of K to a set of measure zero in Ω × Ω, could be changed in an arbitrary way with out changing the spectrum of T . Here we state without proof Mercer’s theorem, which states sufficient conditions for (13.4.9) to hold – see for example [8], p. 138. Theorem 13.11. Let T be the compact self-adjoint integral operator (13.4.1). Assume that Ω is bounded, K is continuous on Ω × Ω and that all but finitely many of the nonzero eigenvalues of T are of the same sign. Then (13.4.7) is valid, where the convergence is absolute and uniform, and in particular (13.4.9) holds. 13.5 The Singular Value Decomposition and Normal Operators If T is a compact operator we know from explicit examples that the point spectrum of T may be empty. However if we let S = T ∗ T , the so-called normal operator of T , then S is compact and self-adjoint (see Exercise 1), so that Theorem 13.10 applies to S. There must therefore exist an orthonormal basis {xn }∞ n=1 of H consisting of eigenvectors of S, i.e. T ∗ T xn = λn xn (13.5.1) Note that if J is the Rayleigh quotient for S then λn = J(xn ) = hSxn , xn i = ||T xn ||2 ≥ 0 √ (13.5.2) We define σn = λn to be the n’th singular value of T . If T 6= 0 and we list the nonzero eigenvalues of S in decreasing order, λ1 ≥ λ2 ≥ . . . (this is possibly a finite list) then from Theorem 13.9 it is immediate that λ1 = ||T ||2 so we have the following simple but important result. 233 ktrace Proposition 13.4. If T ∈ K(H) then ||T || = σ1 , the largest singular value of T . Now for any n for which λn > 0, let yn = T xn /σn . We then have T ∗ y n = σ n xn T xn = σn yn (13.5.3) The xn ’s are orthonormal by construction, and hyn , ym i = 1 hT xn , T xm i = hxn , xm i λn (13.5.4) so that the yn ’s are also orthonormal. We say that xn is the n’th right singular vector of T and yn is the n’th left singular vector. The collection {λn , xn , yn } is a singular system for T . 
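As a concrete aside (our own illustration, anticipating Exercise 13 below), a singular system can be computed numerically for the Volterra operator $Tu(x) = \int_0^x u(s)\,ds$ on $L^2(0,1)$: discretizing and taking a matrix SVD reproduces the exact singular values $\sigma_k = 2/((2k-1)\pi)$, and in particular $\|T\| = \sigma_1 = 2/\pi$, consistent with Proposition 13.4.

```python
import numpy as np

# Midpoint discretization of the Volterra operator Tu(x) = ∫_0^x u(s) ds
# on L^2(0,1):  (Au)_i = h * sum_{j <= i} u_j  approximates  (Tu)(x_i).
n = 1000
h = 1.0 / n
A = h * np.tril(np.ones((n, n)))

s = np.linalg.svd(A, compute_uv=False)   # singular values of the discretization

# Exact singular values of T: sigma_k = 2/((2k-1)*pi); in particular
# ||T|| = sigma_1 = 2/pi.  The two rows agree to a few digits.
k = np.arange(1, 6)
print(s[:5])
print(2 / ((2 * k - 1) * np.pi))
```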
From (13.3.23) we then have

$$Tx = \sum_n \langle x, x_n\rangle Tx_n = \sum_n \sigma_n \langle x, x_n\rangle y_n \qquad \text{or} \qquad T = \sum_n \sigma_n Q_n \qquad (13.5.5)$$

where

$$Q_n x = \langle x, x_n\rangle y_n \qquad (13.5.6)$$

Here $Q_n$ is not a projection unless $x_n = y_n$, but is still a so-called rank one operator. This representation of $T$ as a sum of rank one operators is the singular value decomposition of $T$.

Now let us consider a normal operator $T \in K(H)$, which we recall means that $T^*T = TT^*$. For simplicity let us also assume that all eigenvalues of the compact self-adjoint operator $S = T^*T$ are nonzero and simple. In that case if $Sx_n = \lambda_n x_n$ it follows that

$$STx_n = T^*T^2 x_n = TT^*Tx_n = TSx_n = \lambda_n Tx_n \qquad (13.5.7)$$

which means either $Tx_n = 0$ or $Tx_n$ is an eigenvector of $S$ corresponding to $\lambda_n$. The first case cannot occur, since then $Sx_n = 0$ would hold, so it must be that $x_n$ and $Tx_n$ are nonzero and linearly dependent, $Tx_n = \theta_n x_n$ for some $\theta_n \in \mathbb{C}\setminus\{0\}$. Thus $H$ has an orthonormal basis consisting of eigenvectors of $T$, since these are the same as the eigenvectors of $S$. With a somewhat more complicated proof, the same can be shown for any normal operator $T$; see Section 56 of [2].

13.6 Exercises

1. Show that if $S \in B(H)$ and $T$ is compact, then $TS$ and $ST$ are also compact. (In algebraic terms this means that the set of compact operators is an ideal in $B(H)$.)

2. If $T \in B(H)$ and $T^*T$ is compact, show that $T$ must be compact. Use this to show that if $T$ is compact then $T^*$ must also be compact.

3. Prove that a sequence $\{x_n\}_{n=1}^\infty$ in a Hilbert space can have at most one weak limit.

4. If $T \in B(H)$ is compact and $H$ is of infinite dimension, show that $0 \in \sigma(T)$.

5. Let $\{\phi_j\}_{j=1}^n$, $\{\psi_j\}_{j=1}^n$ be linearly independent sets in $L^2(\Omega)$, let

$$K(x,y) = \sum_{j=1}^n \phi_j(x)\psi_j(y)$$

be the corresponding degenerate kernel and $T$ the corresponding integral operator. Show that the problem of finding the nonzero eigenvalues of $T$ always amounts to a matrix eigenvalue problem. In particular, show that $T$ has at most $n$ nonzero eigenvalues. Find $\sigma_p(T)$ in the case that $K(x,y) = 6 + 12xy + 60x^2y^3$ and $\Omega = (0,1)$. (Feel free to use Matlab or some such thing to solve the resulting matrix eigenvalue problem.)

6. Let

$$Tu(x) = \frac{1}{x}\int_0^x u(y)\,dy \qquad u \in L^2(0,1)$$

Show that $(0,2) \subset \sigma_p(T)$ and that $T$ is not compact. (Suggestion: look for eigenfunctions in the form $u(x) = x^\alpha$.)

7. Let $\{\lambda_j\}_{j=1}^\infty$ be a sequence of nonzero real numbers satisfying

$$\sum_{j=1}^\infty \lambda_j^2 < \infty$$

Construct a symmetric Hilbert-Schmidt kernel $K$ such that the corresponding integral operator has eigenvalues $\lambda_j$, $j = 1, 2, \dots$ and for which $0$ is an eigenvalue of infinite multiplicity. (Suggestion: look for such a $K$ in the form $K(x,y) = \sum_{j=1}^\infty \lambda_j u_j(x)u_j(y)$ where $\{u_j\}$ are orthonormal, but not complete, in $L^2(\Omega)$.)

8. Let $T$ be the integral operator

$$Tu(x) = \int_0^1 (x+y)u(y)\,dy$$

on $L^2(0,1)$. Find $\sigma_p(T)$, $\sigma_c(T)$ and $\sigma_r(T)$ and the multiplicity of each eigenvalue.

9. On the Hilbert space $H = \ell^2$ define the operator $T$ by $T\{x_1, x_2, \dots\} = \{a_1x_1, a_2x_2, \dots\}$ for some sequence $\{a_n\}_{n=1}^\infty$. Show that $T$ is compact if and only if $\lim_{n\to\infty} a_n = 0$.

10. Let $T$ be the integral operator with kernel $K(x,y) = e^{-|x-y|}$ on $L^2(-1,1)$. Find all of the eigenvalues and eigenfunctions of $T$. (Suggestion: $Tu = \lambda u$ is equivalent to an ODE problem. Don't forget about boundary conditions. The eigenvalues may need to be characterized in terms of the roots of a certain nonlinear function.)

11. We say that $T \in B(H)$ is a positive operator if $\langle Tx, x\rangle \ge 0$ for all $x \in H$.
If $T$ is a positive self-adjoint compact operator, show that $T$ has a square root; more precisely, there exists a compact self-adjoint operator $S$ such that $S^2 = T$. (Suggestion: if $T = \sum_{n=1}^\infty \lambda_n P_n$ try $S = \sum_{n=1}^\infty \sqrt{\lambda_n}\,P_n$. In a similar manner, one can define other fractional powers of $T$.)

12. Suppose that $S \in B(H)$, $0 \in \rho(S)$, $T$ is a compact operator on $H$, and $N(S+T) = \{0\}$. Show that the operator equation $Sx + Tx = y$ has a unique solution for every $y \in H$.

13. Compute the singular value decomposition of the Volterra operator

$$Tu(x) = \int_0^x u(s)\,ds$$

in $L^2(0,1)$ and use it to find $\|T\|$. Is $T$ normal? (Suggestion: the equation $T^*Tu = \lambda u$ is equivalent to an ODE eigenvalue problem which you can solve explicitly.)

14. The concept of a Hilbert-Schmidt operator can be defined abstractly as follows. If $H$ is a separable Hilbert space, we say that $T \in B(H)$ is Hilbert-Schmidt if

$$\sum_{n=1}^\infty \|Tu_n\|^2 < \infty \qquad (13.6.1)$$

for some orthonormal basis $\{u_n\}_{n=1}^\infty$ of $H$.

a) Show that if $T$ is Hilbert-Schmidt then the sum (13.6.1) must be finite for any orthonormal basis of $H$. (Suggestion: if $\{v_n\}_{n=1}^\infty$ is another orthonormal basis, then

$$\sum_{n=1}^\infty \|Tv_n\|^2 = \sum_{n,m=1}^\infty |(Tv_n, u_m)|^2 = \sum_{n,m=1}^\infty |(v_n, T^*u_m)|^2 = \sum_{n,m=1}^\infty |(u_n, T^*u_m)|^2$$

etc.)

b) Show that a Hilbert-Schmidt operator is compact.

15. If $Q \in B(H)$ is a Fredholm operator of index zero, show that there exists a one-to-one operator $S \in B(H)$ and $T \in K(H)$ such that $Q = S + T$. (Hint: define $T = AP$ where $P$ is the orthogonal projection onto $N(Q)$ and $A : N(Q) \to N(Q^*)$ is one-to-one and onto.)

Chapter 14. Spectra and Green's functions for differential operators

In this chapter we will focus more on spectral properties of unbounded operators, about which we have had little to say up to this point. Two simple but key observations are that many interesting unbounded linear operators have an inverse which is compact, and that if $\lambda \ne 0$ is an eigenvalue of some operator then $\lambda^{-1}$ is an eigenvalue of the inverse operator, with the same eigenvector. Thus we may be able to obtain a great deal of information about the spectrum of an unbounded operator by looking at its inverse, if the inverse exists. We will carry this plan out in detail for two important special cases. The first is the case of a second order differential operator in one space dimension (Sturm-Liouville theory), and the second is the case of the Laplacian operator in a bounded domain of $\mathbb{R}^N$.

14.1 Green's functions for second order ODEs

Let us reconsider the operator on $L^2(0,1)$ from Example 12.6, namely

$$Tu = -u'' \qquad D(T) = \{u \in H^2(0,1) : u(0) = u(1) = 0\} \qquad (14.1.1)$$

Any $u \in N(T)$ is a linear function vanishing at the endpoints, so the associated problem

$$-u'' = f \quad 0 < x < 1 \qquad u(0) = u(1) = 0 \qquad (14.1.2)$$

has at most one solution for any $f \in L^2(0,1)$. In fact an explicit solution formula was given in Exercise 7 of Chapter 2, at least for $f \in C([0,1])$, and it is not hard to check that it remains valid for $f \in L^2(0,1)$, in the sense that if

$$G(x,y) = \begin{cases} y(1-x) & 0 < y < x < 1 \\ x(1-y) & 0 < x < y < 1 \end{cases} \qquad (14.1.3)$$

then

$$u(x) = \int_0^1 G(x,y)f(y)\,dy \qquad (14.1.4)$$

satisfies $-u'' = f$ in the sense of distributions on $(0,1)$, as well as the given boundary conditions.

Let us next consider how (14.1.3)-(14.1.4) might be derived in the first place. Formally, if (14.1.4) holds, then

$$u''(x) = \int_0^1 G_{xx}(x,y)f(y)\,dy = -f(x) \qquad (14.1.5)$$

which suggests $G_{xx}(x,y) = -\delta(x-y)$ for all $y \in (0,1)$. (A quick numerical check of (14.1.3)-(14.1.4) is sketched below, before the derivation continues.)
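As a quick sanity check (our own, not part of the notes), the formula (14.1.4) can be tested by quadrature against the exact solution of (14.1.2) for $f \equiv 1$, namely $u(x) = x(1-x)/2$.

```python
import numpy as np

# Quadrature check of (14.1.3)-(14.1.4): u(x) = ∫_0^1 G(x,y) f(y) dy
# should solve -u'' = f, u(0) = u(1) = 0.  For f ≡ 1 the exact solution
# is u(x) = x(1-x)/2.
def G(x, y):
    return np.where(y < x, y * (1.0 - x), x * (1.0 - y))

n = 4000
y = (np.arange(n) + 0.5) / n            # midpoint rule nodes on (0,1)
f = np.ones(n)

for xv in (0.25, 0.5, 0.8):
    u = np.mean(G(xv, y) * f)           # midpoint rule for the integral
    print(u, xv * (1.0 - xv) / 2)       # columns agree to quadrature error
```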
This condition in turn means, in particular, that

$$G(x,y) = \begin{cases} Ax + B & 0 < x < y \\ Cx + D & y < x < 1 \end{cases} \qquad (14.1.6)$$

for some constants $A, B, C, D$ (depending on $y$). In order that $u$ satisfy the required boundary conditions we should have $B = C + D = 0$. Recalling the discussion leading up to (7.3.27), we expect that $x \to G(x,y)$ should be continuous at $x = y$ and $x \to G_x(x,y)$ should have a jump of magnitude $-1$ at $x = y$. These four conditions uniquely determine the four coefficients determining $G$ in (14.1.3). We call $G$ the Green's function for the problem (14.1.2).

Now let us consider a more general situation of this type. Define a differential expression

$$Lu = a_2(x)u'' + a_1(x)u' + a_0(x)u \qquad (14.1.7)$$

where we require the coefficients to satisfy $a_j \in C([a,b])$ for $j = 0, 1, 2$ and $a_2(x) \ne 0$ on $[a,b]$, together with boundary operators

$$B_1u = c_1u(a) + c_2u'(a) \qquad B_2u = c_3u(b) + c_4u'(b) \qquad |c_1| + |c_2| \ne 0 \quad |c_3| + |c_4| \ne 0 \qquad (14.1.8)$$

We seek a solution of the problem

$$Lu(x) = f(x) \quad a < x < b \qquad B_1u = B_2u = 0 \qquad (14.1.9)$$

in the form

$$u(x) = \int_a^b G(x,y)f(y)\,dy \qquad (14.1.10)$$

for some suitable kernel function $G(x,y)$. Computing again formally,

$$Lu(x) = \int_a^b L_xG(x,y)f(y)\,dy \qquad (14.1.11)$$

where the subscript on $L$ reminds us that $L$ operates in the $x$ variable for fixed $y$, we see that

$$L_xG = \delta(x-y) \qquad (14.1.12)$$

should hold, and

$$B_{1x}G = B_{2x}G = 0 \qquad (14.1.13)$$

in order that the boundary conditions for $u$ be satisfied. In particular $G$ should satisfy $L_xG = 0$ for $a < x < y < b$ and $a < y < x < b$, plus certain matching conditions at $x = y$ which may be stated as follows: $G$ should be continuous at $x = y$, since otherwise $L_xG$ would contain a term $C\delta'(x-y)$, and $G_x$ should experience a jump at $x = y$ of the correct magnitude such that $a_2(x)G_{xx}(x,y) = \delta(x-y)$; in other words the jump in $G_x$ should be $1/a_2(y)$. The same conclusion could be (formally) derived by integrating both sides of (14.1.12) from $y - \epsilon$ to $y + \epsilon$ and letting $\epsilon \to 0+$. Thus our conditions may be summarized as

$$G(y+,y) - G(y-,y) = 0 \qquad G_x(y+,y) - G_x(y-,y) = \frac{1}{a_2(y)} \qquad (14.1.14)$$

$$B_{1x}G = B_{2x}G = 0 \qquad (14.1.15)$$

We now claim that such a function $G(x,y)$ can be found, under the additional assumption that the homogeneous problem (14.1.9) with $f \equiv 0$ has only the zero solution. First observe that we can find non-trivial solutions $\phi_1, \phi_2$ of

$$L\phi_1 = 0, \quad B_1\phi_1 = 0 \qquad \text{and} \qquad L\phi_2 = 0, \quad B_2\phi_2 = 0 \qquad a < x < b \qquad (14.1.16)-(14.1.18)$$

since each amounts to a second order ODE with only one initial condition. Now look for $G$ in the form

$$G(x,y) = \begin{cases} C_1(y)\phi_1(x) & a < x < y < b \\ C_2(y)\phi_2(x) & a < y < x < b \end{cases} \qquad (14.1.19)$$

It is then automatic that $L_xG = 0$ for $x \ne y$, and that the boundary conditions (14.1.15) hold. In order that the remaining conditions (14.1.14) be satisfied we must have

$$C_1(y)\phi_1(y) - C_2(y)\phi_2(y) = 0 \qquad (14.1.20)$$

$$C_1(y)\phi_1'(y) - C_2(y)\phi_2'(y) = -\frac{1}{a_2(y)} \qquad (14.1.21)$$

Thus unique values $C_1(y), C_2(y)$ exist provided the coefficient matrix is nonsingular, or equivalently the Wronskian of $\phi_1, \phi_2$ is nonzero at every $y$. But it is known from ODE theory that if the Wronskian is zero at any point then $\phi_1, \phi_2$ must be linearly dependent, in which case either one is a nontrivial solution of the homogeneous problem. This contradicts the assumption we made, and so the first part of the following theorem has been established.

Theorem 14.1. Assume that (14.1.9) with $f \equiv 0$ has only the zero solution. Then

1. There exists a unique function $G(x,y)$ defined for $a \le x, y \le b$ such that $L_xG(x,y) = \delta(x-y)$ in the sense of distributions on $(a,b)$ for fixed $y$, and (14.1.14), (14.1.15) hold.
2. $G$ is bounded on $[a,b] \times [a,b]$.

3. If $f \in L^2(a,b)$ and

$$u(x) = Sf := \int_a^b G(x,y)f(y)\,dy \qquad (14.1.22)$$

then $u$ is the unique solution of (14.1.9).

In particular, if we define the unbounded linear operator

$$Tu = Lu \qquad D(T) = \{u \in L^2(a,b) : Lu \in L^2(a,b),\ B_1u = B_2u = 0\} \qquad (14.1.23)$$

then $T^{-1}$, given by (14.1.22), clearly satisfies the Hilbert-Schmidt condition and so is a compact operator on $L^2(a,b)$. (Note that we observe a careful distinction between the operator $T$ and the differential expression defined by $L$: the operator $T$ corresponds to the triple $(L, B_1, B_2)$.)

Corollary 14.1. Assume that (14.1.9) with $f \equiv 0$ has only the zero solution and define $T$ by (14.1.23). Then $\sigma(T)$ consists of at most countably many nonzero simple eigenvalues with no finite accumulation point.

Proof: By Theorem 14.1, $0 \in \rho(T)$. If $\lambda \in \sigma(T)$ then $\mu = \lambda^{-1} \in \sigma(T^{-1})$, since if $\mu \in \rho(T^{-1})$ it would follow that the equation $Tu - \lambda u = f$ has the unique solution

$$u = \mu(\mu I - T^{-1})^{-1}T^{-1}f \qquad \mu = \lambda^{-1} \qquad (14.1.24)$$

which implies that $\lambda \in \rho(T)$. Thus $\sigma(T)$ is contained in the set $\{\lambda : \lambda^{-1} \in \sigma(T^{-1})\}$, which is at most countable by Theorem 13.6. In addition every such point must be in $\sigma_p(T^{-1})$ and so $\sigma(T) = \sigma_p(T)$. Since $\sigma(T^{-1})$ is bounded with zero as its only accumulation point, it follows that $\sigma(T)$ can have no finite accumulation point. Finally, all eigenvalues of $T$ must be simple, since if there existed two linearly independent functions in $N(T - \lambda I)$ these would form a fundamental set for the ODE $Lu = \lambda u$. But then every solution of $Lu = \lambda u$ would have to be in $D(T)$, in particular satisfying the boundary conditions $B_1u = B_2u = 0$, which is clearly false.

Example 14.1. For the case

$$Lu = u'' - u \qquad B_1u = u'(0) \qquad B_2u = u(1) \qquad (14.1.25)$$

we can choose

$$\phi_1(x) = \cosh x \qquad \phi_2(x) = \sinh(x-1) \qquad (14.1.26)$$

The matching conditions at $x = y$ then amount to

$$C_1(y)\cosh(y) - C_2(y)\sinh(y-1) = 0 \qquad (14.1.27)$$

$$C_1(y)\sinh(y) - C_2(y)\cosh(y-1) = -1 \qquad (14.1.28)$$

The solution pair is $C_1(y) = \sinh(y-1)/\cosh(1)$, $C_2(y) = \cosh(y)/\cosh(1)$, giving the Green's function

$$G(x,y) = \begin{cases} \dfrac{\sinh(y-1)\cosh(x)}{\cosh(1)} & 0 < x < y < 1 \\[1ex] \dfrac{\sinh(x-1)\cosh(y)}{\cosh(1)} & 0 < y < x < 1 \end{cases} \qquad (14.1.29)$$

If $T$ is the operator corresponding to $L, B_1, B_2$ then it may be checked by explicit calculation that

$$\sigma(T) = \left\{-1 - \left(\left(n + \tfrac{1}{2}\right)\pi\right)^2\right\}_{n=0}^\infty \qquad (14.1.30)$$

14.2 Adjoint problems

Note in the last example that the Green's function is real and symmetric, so that the corresponding integral operator $S$ in (14.1.22), and hence also $T = S^{-1}$, is self-adjoint. In this section we consider in more detail the adjoint of the operator $T$ defined in (14.1.23).
To pursue this point, we see from an integration by parts that for any φ, ψ ∈ C 2 ([a, b]) we have b hLφ, ψi − hφ, L∗ ψi = J(φ, ψ)a (14.2.7) where 0 J(φ, ψ) = a2 (φ0 ψ − φψ ) + (a1 − a02 )φψ (14.2.8) is the boundary functional. Since we can choose φ, ψ to have compact support in which case the boundary term is zero, the expression for T ∗ must be given by L∗ . Furthermore, b D(T ∗ ) must be such that J(φ, ψ)a = 0 whenever φ ∈ D(T ) and ψ ∈ D(T ∗ ). As we will see, this amounts to the specification of two more homogeneous boundary conditions to be satisfied by ψ. exmp14-2 Example 14.2. As in Example 14.1 consider Lφ = φ00 − φ on (0, 1), which is formally self-adjoint, together with the boundary operators B1 φ = φ0 (0), B2 φ = φ(1). By direct calculation we see that J(φ, ψ) = φ(0)ψ 0 (0) + φ0 (1)ψ(1) (14.2.9) 243 if B1 φ = B2 φ = 0. But otherwise φ(0), φ0 (1) can take on arbitrary values, and the only way this can be true is if ψ 0 (0) = ψ(1) = 0, i.e. ψ satisfies the same boundary conditions as φ. Thus we expect that T ∗ = T , confirming what we saw earlier from the fact that T −1 is self-adjoint. exmp14-3 Example 14.3. Let Lφ = x2 φ00 + xφ0 − φ 1 < x < 2 B1 φ = φ0 (1) B2 φ = φ(2) + φ0 (2) (14.2.10) In this case we find that the expression for the adjoint operator is Lψ = (x2 ψ)00 − (xψ)0 − ψ = x2 ψ 00 + 3xψ 0 (14.2.11) Next, the boundary functional is J(φ, ψ) = x2 (φ0 ψ − φψ 0 ) − xφψ so that if B1 φ = B2 φ = 0 it follows that 2 J(φ, ψ)1 = φ(2)(−6ψ(2) − 4ψ 0 (2)) + φ(1)(ψ 0 (1) + ψ(1)) (14.2.12) (14.2.13) Since φ(1), φ(2) can be chosen arbitrarily, it must be that 2ψ 0 (2) + 3ψ 0 (2) = 0 ψ 0 (1) + ψ(1) = 0 (14.2.14) for ψ ∈ D(T ∗ ). Definition 14.1. We say that a set of boundary operators {B1∗ , B2∗ } are adjoint to {B1 , B2 }, with respect to L, if b J(φ, ψ)a = 0 (14.2.15) whenever B1 φ = B2 φ = B1∗ ψ = B2∗ ψ = 0. The conditions B1∗ ψ = B2∗ ψ = 0 are referred to as the adjoint boundary conditions (with respect to L). Thus, for example, in Examples 14.2, 14.3 we found adjoint boundary operators {ψ 0 (0), ψ(1)} and {ψ 0 (1) + ψ(1), 2ψ 0 (2) + 3ψ(2)} respectively. The operators B1∗ , B2∗ are not themselves unique, since for example they could always be interchanged or multiplied by constants. However the subspace {ψ : B1∗ ψ = B2∗ ψ = 0} is uniquely determined. If we now define T ∗ ψ = L∗ ψ on the domain D(T ∗ ) = {ψ ∈ L2 (a, b) : L∗ ψ ∈ L2 (a, b), B1∗ ψ = B2∗ ψ = 0} (14.2.16) then hT φ, ψi = hφ, T ∗ ψi if φ ∈ D(T ) and ψ ∈ D(T ∗ ) and so T ∗ is the adjoint operator of T . 244 It can be shown (Exercise 4) that if a1 = a02 (that is, L is formally self-adjoint), and the boundary conditions are of the form (14.1.8), then the adjoint boundary conditions coincide with the original boundary conditions, so that T is self-adjoint. It is possible to also consider non-separated boundary conditions of the form B1 u = c1 u(a) + c2 u0 (a) + c3 u(b) + c4 u0 (b) = 0 B2 u = d1 u(a) + d2 u0 (a) + d3 u(b) + d4 u0 (b) = 0 (14.2.17) (14.2.18) to allow, for example, for periodic boundary conditions, see Exercise 6. If T satisfies the assumptions of Theorem 14.1 then N (T ∗ ) = R(T )⊥ = {0}. Thus T ∗ also satisfies these assumptions, and so has a corresponding Green’s function which we denote by G∗ (x, y). 
Let us observe, at least formally, the important property G(x, y) = G∗ (y, x) x, y ∈ (a, b) (14.2.19) To see this, use Lz G(z, y) = δ(z − y), L∗z G∗ (z, x) = δ(z − x) to get Z b Z b ∗ ∗ G (y, x) − G(x, y) = G (z, x)Lz G(z, y) dz − G(z, y)L∗z G∗ (z, x) dz (14.2.20) a a z=b = J(G∗ (z, x), G(z, y))z=a = 0 (14.2.21) where the last equality follows from the fact that G, G∗ satisfy respectively the {B1 , B2 } and {B1∗ , B2∗ } boundary conditions as a function of their first variable. This confirms the expected result that G(x, y) = G(y, x) if T = T ∗ . Furthermore it shows that as a function of the second variable, G(x, y) satisfies the homogeneous adjoint equation for x 6= y and the adjoint boundary conditions. 14.3 Sturm-Liouville theory If the operator T in (14.1.23) is self-adjoint, then the existence of eigenvalues and eigenfunctions can be directly proved as a consequence of the fact that T −1 is compact and self-adjoint. But even if T is not self-adjoint, it is still possible to obtain such results by using a special device known as the Liouville transformation. Essentially we will produce a compact self-adjoint operator in a slightly different space, whose spectrum must agree with that of T . The resulting conclusions about the spectral properties of second order ordinary differential operators, together with some other closely related facts, is generally referred to as Sturm-Liouville theory. 245 As in (14.1.7), let L0 φ = a2 (x)φ00 + a1 (x)φ0 + a0 (x)φ (14.3.1) with the assumptions that aj ∈ C([a, b]), and now for definiteness a2 (x) < 0. Define Z x a1 (s) p(x) p(x) = exp ds ρ(x) = − q(x) = a0 (x)ρ(x) (14.3.2) a2 (x) a a2 (s) so that p, ρ are both positive and continuous on [a, b]. We then observe that L0 φ = λφ is equivalent to −(pφ0 )0 + qφ = λρφ (14.3.3) If we define L1 , L by L1 φ = −(pφ0 )0 + qφ L1 φ ρ (14.3.4) Lφ = λφ (14.3.5) Lφ = then we see that L0 φ = λφ if and only if Note that L1 is formally self-adjoint. In order to realize L itself as a self-adjoint operator we introduce the weighted space Z b 2 Lρ (a, b) = {φ : |φ(x)|2 ρ(x) dx < ∞} (14.3.6) a Since ρ is continuous and positive on [a, b], this space may be regarded as the Hilbert space equipped with inner product Z b φ(x)ψ(x)ρ(x) dx hφ, ψiρ := (14.3.7) a for which the corresponding norm ||φ||2ρ = L2 (a, b) norm. We have obviously Rb a |φ(x)|2 ρ(x) dx is equivalent to the usual hLφ, ψiρ − hφ, Lψiρ = hL1 φ, ψi − hφ, L1 ψi = 0 for φ, ψ ∈ C0∞ (a, b). For φ, ψ ∈ C 2 ([a, b]) we have instead, just as before, that b hLφ, ψiρ − hφ, Lψiρ = J(φ, ψ)a 0 (14.3.8) (14.3.9) where here J(φ, ψ) = p(φ0 ψ − φψ ). In the case of separated boundary conditions (14.1.8) we still have the property remarked earlier that {B1∗ , B2∗ } = {B1 , B2 } so that the operator T1 corresponding to {L1 , B1 , B2 } is self-adjoint. 246 14-3-1 It follows in particular that the solution of L1 φ = f B1 φ = B2 φ = 0 (14.3.10) may be given as Z b G1 (x, y)f (y) dy φ(x) = (14.3.11) a as long as there is no non-trivial solution of the homogeneous problem. The Green’s function G1 will have the properties stated in Theorem 14.1 and G1 (x, y) = G1 (y, x) by the self-adjointness. The eigenvalue condition L1 φ = λρφ then amounts to Z b G1 (x, y)ρ(y)φ(y) dy (14.3.12) φ(x) = λ a p If we let ψ(x) = ρ(x)φ(x), µ = 1/λ and p p G(x, y) = ρ(x) ρ(y)G1 (x, y) then we see that Z (14.3.13) b G(x, y)ψ(y) dy = µψ(x) (14.3.14) 14-3-14 a must hold. 
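As a sanity check of the weighted symmetry (14.3.8) (our own illustration; the coefficients $a_0, a_1, a_2$ below are arbitrary choices with $a_2 < 0$), one can compute $\rho = -p/a_2$ from (14.3.2) and verify $\langle L_0\phi, \psi\rangle_\rho = \langle \phi, L_0\psi\rangle_\rho$ by quadrature for $\phi, \psi$ vanishing at the endpoints.

```python
import numpy as np

# Coefficients (our choice):  a2 = -(1 + x^2),  a1 = x,  a0 = 1  on (0,1).
# Then p = exp(∫_0^x a1/a2) = (1 + x^2)^(-1/2) and rho = -p/a2 = (1+x^2)^(-3/2),
# and <L0 phi, psi>_rho = <phi, L0 psi>_rho whenever phi, psi vanish at 0 and 1.
n = 100000
x = (np.arange(n) + 0.5) / n                # midpoint nodes on (0,1)
a2, a1, a0 = -(1 + x**2), x, np.ones(n)
rho = (1 + x**2) ** (-1.5)

def L0(u, du, ddu):
    return a2 * ddu + a1 * du + a0 * u

phi, dphi, ddphi = (np.sin(np.pi*x), np.pi*np.cos(np.pi*x),
                    -(np.pi**2)*np.sin(np.pi*x))
psi, dpsi, ddpsi = (np.sin(2*np.pi*x), 2*np.pi*np.cos(2*np.pi*x),
                    -(4*np.pi**2)*np.sin(2*np.pi*x))

lhs = np.mean(L0(phi, dphi, ddphi) * psi * rho)   # <L0 phi, psi>_rho
rhs = np.mean(phi * L0(psi, dpsi, ddpsi) * rho)   # <phi, L0 psi>_rho
print(lhs, rhs)                                    # agree up to quadrature error
```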
Conversely, any nontrivial solution of (14.3.14) gives rise, via all of the same transformations, to an eigenfunction of L0 with the {B1 , B2 } boundary conditions. The integral operator S with kernel G is clearly compact and self-adjoint, and p 0 is not an eigenvalue, since Sψ = 0 would imply that zero is a solution of L1 u = ρ(x)ψ(x). In √ particular, if T ψ = µψ, then φ = ψ/ ρ satisfies L0 φ = λφ B1 φ = B2 φ = 0 (14.3.15) with λ = 1/µ. The choice we made that a2 (x) < 0 implies that the set of eigenvalues is bounded below. Consider for example the case that the boundary conditions are φ(a) = φ(b) = 0. From the fact that λn and any corresponding eigenfunction φn satisfy L1 φn = λn ρφn , it follows, upon multiplying by φn and integrating by parts, that Z b Z b 0 2 2 (p|φn | + q|φn | ) dx = λn ρ|φn |2 dx (14.3.16) a a Since p > 0 we get in particular that Rb Rb (p|φ0n |2 + q|φn |2 ) dx q|φn |2 dx a a λn = ≥ ≥C Rb Rb 2 dx 2 dx ρ|φ | ρ|φ | n n a a 247 (14.3.17) 14-3-17 where C = min q/ max ρ. The same conclusion holds for the case of more general boundary conditions, see Exercise 10. Next we can say a little more about the eigenfunctions {φn }. We know by Theorem 13.10 that the eigenfunctions {ψn } of the operator S may be chosen as an orthonormal √ basis of L2 (a, b). Since φn may be taken to be ψn / ρ, by the preceding discussion, it follows that ( Z b Z b 0 n 6= m ψn ψm dx = (14.3.18) φn φm ρ dx = 1 n=m a a Thus the eigenfunctions are orthonormal in the weighted space L2ρ (a, b). We can also easily verify the completeness of these eigenfunctions as follows. For any f ∈ L2ρ (a, b) we √ have that ρf ∈ L2 (a, b), so √ f ρ= ∞ X √ cn = hf ρ, ψn i cn ψn (14.3.19) n=1 in the sense of L2 convergence. Equivalently, this means f= ∞ X cn φn cn = hf ρ, φn i = hf, φn iρ (14.3.20) n=1 also in the sense of L2 or L2ρ convergence, and so the completeness follows from Theorem 6.4. From these observations, together with Theorem 13.10 and Corollary 14.1 we obtain the following. th14-2 Theorem 14.2. Assume that a0 , a1 , a2 ∈ C([a, b]), a2 (x) < 0 on [a, b], and that |c1 | + |c2 | = 6 0, |c3 | + |c4 | = 6 0. Then the problem a2 φ00 + a1 φ0 + a0 φ = λφ a<x<b c1 φ(a) + c2 φ0 (a) = c3 φ(b) + c4 φ0 (b) = 0 (14.3.21) has a countable sequence of simple real eigenvalues {λn }∞ n=1 , with λn → ∞. The corresponding eigenfunctions may be chosen to form an orthonormal basis of L2ρ (a, b). There is one other notable property of the eigenfunctions which we mention without proof: The eigenfunction φn has exactly n − 1 roots in (a, b). See for example Theorem 2.1, Chapter 8 of [7]. 248 14.4 The Laplacian with homogeneous Dirichlet boundary conditions DirLap In this section we develop some theory for the very important eigenvalue problem −∆u = λu x ∈ Ω u = 0 x ∈ ∂Ω (14.4.1) (14.4.2) Here Ω is a bounded open set in RN , N ≥ 2, with sufficiently smooth boundary. The general approach will again be to obtain the existence of eigenvalues and eigenfunctions by first looking at an appropriately defined inverse operator. To begin making precise the definitions of the operators involved, set T u = −∆u on D(T ) = {u ∈ H01 (Ω) : ∆u ∈ L2 (Ω)} (14.4.3) to be regarded as an unbounded operator on L2 (Ω). Recall that in Section 9.1 we have defined the Sobolev spaces H 1 (Ω) and H01 (Ω), and it was mentioned there that it is appropriate to regard u ∈ H01 (Ω) as meaning that u ∈ H 1 (Ω) and u = 0 on ∂Ω. 
The precise meaning of this needs to be clarified, since in general a function u ∈ H 1 (Ω) need not be continuous on Ω, so that its restriction to the lower dimensional set ∂Ω is not defined in an obvious way. The following theorem is proved in [5]or [10]. Theorem 14.3. If Ω is a bounded domain in RN with a C 1 boundary, then there exists a bounded linear operator τ : H 1 (Ω) → L2 (∂Ω) such that τ u = u|∂Ω if u ∈ H 1 (Ω) ∩ C(Ω) τ u = 0 if u ∈ H01 (Ω) (14.4.4) (14.4.5) The mapping τ in this theorem is the trace operator, that is, the operator of restriction to ∂Ω, and τ u is called the trace of u on ∂Ω. According to the theorem, the trace is well defined for any u ∈ H 1 (Ω), it coincides with the usual notion of restriction if u happens to be continuous on Ω, and any function u ∈ H01 (Ω) has trace equal to 0. It can be further more be shown that the expected integration by parts formula (see (18.2.3)) Z Z Z ∂v ∂u u dx = − v dx + uvnj dS (14.4.6) Ω ∂xj Ω ∂xj ∂Ω 249 remains valid as long as u, v ∈ H 1 (Ω), where in the boundary integral u, v must be understood as meaning τ u and τ v. The boundary integral is well defined since these traces belong to L2 (∂Ω), according to the theorem. For any f ∈ L2 (Ω), the condition that u ∈ D(T ) and T u = f means Z Z 1 u∆v dx = f v dx ∀v ∈ C0∞ (Ω) u ∈ H0 (Ω) (14.4.7) wk-a R The first integral may be equivalently written as Ω ∇u · ∇v dx, using the integration by parts formula, and then by the density of C0∞ (Ω) in H01 (Ω), we see that Z Z 1 u ∈ H0 (Ω) ∇u · ∇v dx = f v dx ∀v ∈ H01 (Ω) (14.4.8) wk-b Ω Ω Ω Ω must hold. Conversely, any function u satisfying (14.4.8) must also satisfy T u = f . In particular, if λ is an eigenvalue of T then Z Z 1 u ∈ H0 (Ω) ∇u · ∇v dx = λ uv dx Ω ∀v ∈ H01 (Ω) (14.4.9) 14-4-7 Ω We note that λ ≥ 0 must hold, since we can choose v = u. As we will see below λ = 0 is impossible also. Another tool we will make good use of is the so-called Poincaré inequality. poincineq Proposition 14.1. If Ω is a bounded open set in RN then there exists a constant C, depending only on Ω, such that ||u||L2 (Ω) ≤ C||∇u||L2 (Ω) ∀u ∈ H01 (Ω) (14.4.10) Proof: It is enough to prove the stated inequality for u ∈ C0∞ (Ω). If we let R be large enough so that Ω ⊂ QR = {x ∈ RN : |xj | < R, j = 1, . . . N } then defining u = 0 outside of Ω we may also regard u as an element of C0∞ (QR ), with identical norms whether considered on Ω or QR . Therefore Z Z ∂ 2 2 2 ||u||L2 (Ω) = u dx = − x1 u dx (14.4.11) ∂x1 QR QR Z ∂u = −2 x1 u dx (14.4.12) ∂x1 QR ≤ 2R||u||L2 (Ω) ||∇u||L2 (Ω) (14.4.13) 250 14-4-8 Thus the conclusion holds with C = 2R. Note that we do not really need Ω to be bounded here, only that it be contained between two parallel hyperplanes. It is an immediate consequence of Poincaré’s inequality that ||u||H01 (Ω) := ||∇u||L2 (Ω) (14.4.14) 14-4-12 defines a norm on H01 (Ω) which is equivalent to the original norm it inherits as a subspace of H 1 (Ω), since R 2 ||u||2H 1 (Ω) (u + |∇u|2 ) dx Ω R 1≤ = ≤C +1 (14.4.15) ||u||2H 1 (Ω) |∇u|2 dx Ω 0 Unless otherwise stated we always assume that the norm on H01 (Ω) is that given by (14.4.14), which of course corresponds to the inner product Z ∇u · ∇v dx hu, viH01 (Ω) = (14.4.16) Ω A simple but important connection between the eigenvalues of T and the Poincaré inequality, obtained by choosing v = u in the right hand equality of (14.4.9), is that any such eigenvalue λ satisfies 1 λ≥ 2 >0 (14.4.17) C where C is any constant for which Poincaré’s inequality is valid. 
We will see later that there is a ’best constant’, namely a value C = CP for which (14.4.10) is true, but is false for any smaller value, and the smallest positive eigenvalue of T is precisely 1/CP2 . Any constant which works in the Poincaré inequality also provides a lower bound for the operator T , as follows: If T u = f then choosing v = u in (14.4.9) we get Z Z 2 |∇u| dx = f u dx ≤ ||f ||L2 (Ω) ||u||L2 (Ω) ≤ C||f ||L2 (Ω) ||∇u||L2 (Ω) (14.4.18) Ω Ω Therefore ||u||H01 (Ω) ≤ C||f ||L2 (Ω) and ||u||L2 (Ω) ≤ C 2 ||f ||L2 (Ω) (14.4.19) or equivalently ||T u||L2 (Ω) ≥ C −2 ||u||L2 (Ω) . prop14-2 Proposition 14.2. Considered as an operator on L2 (Ω), T is one-to-one, onto and selfadjoint. 251 14-4-19 Proof: The property that T is one-to-one is immediate from (14.4.19). Next, if f ∈ L2 (Ω) define the linear functional φ by Z f v dx (14.4.20) φ(v) = Ω Then φ is continuous on H01 (Ω) since |φ(v)| ≤ ||f ||L2 (Ω) ||v||L2 (Ω) ≤ C||f ||L2 (Ω) ||v||H01 (Ω) (14.4.21) By the Riesz Representation theorem, Theorem 6.6, there exists a unique u ∈ H01 (Ω) such that Z ∇u · ∇v dx = φ(v) (14.4.22) hu, viH01 (Ω) = Ω which is equivalent to T u = f , asR explained above. Thus T is onto. Finally, from (14.4.9) it follows that hT u, vi = Ω ∇u · ∇v dx = hu, T vi, i.e. T is symmetric, and a linear operator which is symmetric and onto must be self-adjoint, see Exercise 7 of Chapter 11. Next we consider the construction of an inverse operator to T , in the form of an integral operator Z Sf (x) = G(x, y)f (y) dy (14.4.23) 14-4-23 Ω where G will again be called the Green’s function for T u = f , assuming it exists. Thus u(x) = Sf (x) should be the solution of −∆u = f (x) x ∈ Ω u(x) = 0 x ∈ ∂Ω (14.4.24) Analogously to the ODE case discussed in the previous section, we expect that G should formally satisfy −∆x G(x, y) = δ(x − y) x ∈ Ω G(x, y) = 0 x ∈ ∂Ω (14.4.25) for every fixed y ∈ Ω. Recall that we already know that there exist Γ(x) such that −∆Γ = δ in the sense of distributions, so if we set h(x, y) = G(x, y) − Γ(x − y) then it is necessary for h to satisfy −∆x h(x, y) = 0 x ∈ Ω h(x, y) = −Γ(x − y) x ∈ ∂Ω (14.4.26) for every fixed y ∈ Ω. Note that since x − y 6= 0 for x ∈ ∂Ω and y ∈ Ω, the boundary function for h is infinitely differentiable. Thus we have a parametrized set of boundary 252 14-4-26 value problems, each having the form of finding a function harmonic in Ω satisfying a prescribed smooth Dirichlet type boundary condition. Such a problem is known to have a unique solution, assuming only very minimal hypotheses on the smoothness of ∂Ω, see for example Theorem 2, Section 4.3 of [22]. In a few special cases it is possible to compute h(x, y), and hence G(x, y), explicitly, see Exercise 17 for the case when Ω is a ball. Note however, that whatever h may be, G(x, y) is singular when x = y, and possesses the same local integrability properties as Γ(x − y). It is not hard to check that R |Γ(x − y)|2 dxdy is finite for N = 2, 3 but not for N ≥ 4. Thus G is not of HilbertΩ×Ω Schmidt type in general, so we cannot directly conclude in this way that S = T −1 is compact on L2 (Ω). Nevertheless the operator is indeed compact. One approach to showing this comes from the general theory of singular integral operators, see Chapter ( ). A simple alternative, which we will use here, is based on the following result, which is of independent importance. rellich Theorem 14.4. (Rellich-Kondrachov) A bounded set in H01 (Ω) is precompact in L2 (Ω). 
For a proof we refer to [10], Section 5.7, Theorem 1, or [5], Theorem 9.16, where somewhat more general statements are given. With some minimal smoothness assumption on ∂Ω we can replace H01 (Ω) by H 1 (Ω). It is an equivalent statement to say that the identity map i : H01 (Ω) → L2 (Ω) is a compact linear operator. Other terminology, such as that H01 (Ω) is compactly embedded, compactly included, or compactly injected in L2 (Ω) (or H01 (Ω) ,→ L2 (Ω)) are also commonly used. Corollary 14.2. If S = T −1 then S is a compact self-adjoint operator on L2 (Ω) Proof: If E ⊂ L2 (Ω) is bounded, then by (14.4.19) the image S(E) = {u = Sf : f ∈ E} is bounded in H01 (Ω). The Rellich-Kondrachov theorem then implies S(E) is precompact as a subset of L2 (Ω), so S : L2 (Ω) → L2 (Ω) is compact. The self-adjointness of S follows immediately from that of T . Thus S possesses an infinite sequence of real eigenvalues {µn }∞ n=1 , limn→∞ µn = 0, ∞ and corresponding eigenfunctions {ψn }n=1 which may be chosen as an orthonormal basis of L2 (Ω). As usual, the reciprocals λn = 1/µn are eigenvalues of T = S −1 , and recall that all eigenvalues of T are strictly positive. We have established the following. th14-5 Theorem 14.5. The operator T u = −∆u D(T ) = {u ∈ H01 (Ω) : ∆u ∈ L2 (Ω)} 253 (14.4.27) HeatEqSepVar has an infinite sequence of real eigenvalues of finite multiplicity, 0 < λ1 ≤ λ2 ≤ λ3 ≤ . . . λn → +∞ (14.4.28) and corresponding eigenfunctions {ψn }∞ n=1 which may be chosen as an orthonormal basis 2 of L (Ω). The convention here is that an eigenvalue in this sequence is repeated according to its multiplicity. In comparison with the Sturm-Liouville case, an eigenvalue need not be simple, although the multiplicity must still be finite, thus repetitions in the sequence (14.4.28) may occur. It does turn out to be the case, however, that λ1 is always simple – this will be discussed in Section ( ). We refer to λn , ψn as Dirichlet eigenvalues and eigenfunctions for the domain Ω. Among many other things, knowledge of the existence of these eigenvalues and eigenfunctions allows us to greatly expand the scope of the separation of variables method. Example 14.4. Consider the initial and boundary value problem for the heat equation in a bounded domain Ω ⊂ RN , ut − ∆u = 0 x∈Ω t>0 u(x, t) = 0 x ∈ ∂Ω t > 0 u(x, 0) = f (x) x∈Ω (14.4.29) (14.4.30) (14.4.31) Employing the separation of variables method, we begin by looking for solutions in the product form ψ(x)φ(t) which satisfy the PDE and the homogeneous boundary condition. Substituting we see that φ0 (t)ψ(x) = φ(t)∆ψ(x) should hold, and therefore φ0 + λφ = 0 t > 0 ∆ψ + λψ = 0 x ∈ Ω (14.4.32) In addition the boundary condition implies that ψ(x) = 0 for x ∈ ∂Ω. In order to have a nonzero solution, we concluded that λ, ψ must be a Dirichlet eigenvalue/eigenfunction pair for the domain Ω, and then correspondingly φ(t) = Ce−λt . By linearity we therefore see that if λn , ψn denote the Dirichlet eigenvalues and L2 (Ω) orthonormalized eigenfunctions, then ∞ X u(x, t) = cn e−λn t ψn (x) (14.4.33) n=1 is a solution of (14.4.29),(14.4.30), as long as the coefficients ck are sufficiently rapidly decaying. 254 14-4-28 In order that (14.4.31) also holds, we must have f (x) = u(x, 0) = ∞ X cn ψn (x) (14.4.34) n=1 and so from the orthonormality, cn = hf, ψn i. We have thus obtained the (formal) solution ∞ X u(x, t) = hf, ψn ie−λn t ψn (x) (14.4.35) n=1 of (14.4.29)-(14.4.30)-(14.4.31). 
Making use of estimates which may be found in more advanced PDE textbooks, it can be shown that for any f ∈ L2 (Ω) the series (14.4.35) is uniformly convergent to an infinitely differentiable limit u(x, t) for t > 0, where u is a classical solution of (14.4.29)-(14.4.30), and the initial condition (14.4.31) is satisfied at least in the sense R that limt→0 Ω (u(x, t) − f (x))2 dx = 0. Under stronger conditions on f , the nature of the convergence at t = 0 can be shown to be correspondingly We refer, for P∞ stronger. 2 example, to [10] for At the very least, since n=1 |cn | < ∞ must hold, the Pmore details. −λn t 2 |c e | < ∞ for t ≥ 0 implies that the series is convergent in obvious estimate ∞ n=1 n 2 L (Ω) for every fixed t ≥ 0. Note, again at a formal level at least, that the expression for the solution u can be rewritten as ∞ Z X f (y)ψn (y) dy e−λn t ψn (x) (14.4.36) u(x, t) = n=1 Ω Z = f (y) Ω ∞ X ! e−λn t ψn (x)ψn (y) dy (14.4.37) n=1 Z = f (y)G(x, y, t) dy (14.4.38) Ω suggesting that G(x, y, t) := ∞ X e−λn t ψn (x)ψn (y) n=1 should be regarded as the Green’s function for (14.4.29)-(14.4.30)-(14.4.31). 255 (14.4.39) 14-4-35 14.5 Exercises 1. Let Lu = (x − 2)u00 + (1 − x)u0 + u on (0, 1). a) Find the Green’s function for Lu = f u0 (0) = 0 u(1) = 0 (Hint: First show that x − 1, ex are linearly independent solutions of Lu = 0.) b) Find the adjoint operator and boundary conditions. 2. Let d Tu = − dx du x dx on the domain D(T ) = {u ∈ H 2 (1, 2) : u(1) = u(2) = 0} a) Show that N (T ) = {0}. b) Find the Green’s function for the boundary value problem T u = f . c) State and prove a result about the continuous dependence of the solution u on f in part (b). 3. Let φ, ψ be solutions of Lu = a2 (x)u00 + a1 (x)u0 + a0 (x)u = 0 on (a, b) and W (φ, ψ)(x) = φ(x)ψ 0 (x) − φ0 (x)ψ(x) be the corresponding Wronskian determinant. a) Show that W is either zero everywhere or zero nowhere. (Suggestion: find a first order ODE satisfied by W .) b) If a1 (x) = 0 show that the W is constant. Ec14-4 4. Let Lu = a2 (x)u00 + a1 (x)u0 + a0 (x)u with a02 = a1 , so that L is formally self adjoint. If B1 u = C1 u(a)+C2 u0 (a), B2 u = C3 u(b)+C4 u0 (b), show that {B1∗ , B2∗ } = {B1 , B2 }. 5. Find the Green’s function for u00 + 2u0 − 3u = f (x) 0 < x < ∞ u(0) = 0 lim u(x) = 0 x→∞ (Think of the last condition as a ’boundary condition at infinity’.) Using the Green’s function, find u(2) if f (x) = e−6x . 256 Ec14-6 6. Consider the second order operator Lu = a2 (x)u00 + a1 (x)u0 + a0 (x)u a<x<b with non-separated boundary conditions B1 u = α11 u(a) + α12 u0 (a) + β11 u(b) + β12 u0 (b) = 0 B2 u = α21 u(a) + α22 u0 (a) + β21 u(b) + β22 u0 (b) = 0 where the vectors (α11 , α12 , β11 , β12 ), (α21 , α22 , β21 , β22 ) are linearly independent. We again say that two other non-separated boundary conditions B1∗ , B2∗ are adjoint to B1 , B2 with respect to L if J(u, v)|ba = 0 whenever B1 u = B2 u = B1∗ v = B2∗ v = 0. Find the adjoint operator and boundary conditions in the case that Lu = u00 + xu0 B1 u = u0 (0) − 2u(1) B2 u = u(0) + u(1) 7. When we rewrite a2 (x)u00 + a1 (x)u0 + a0 (x)u = λu as −(p(x)u0 )0 + q(x)u = λρ(x)u the latter is often referred to as the Liouville normal form. Consider the eigenvalue problem x2 u00 + xu0 + u = λu 1 < x < 2 u(1) = u(2) = 0 a) Find the Liouville normal form. b) What is the orthogonality relationship satisfied by the eigenfunctions? c) Find the eigenvalues and eigenfunctions. 
(You may find the original form of the equation easier to work with than the Liouville normal form when computing the eigenvalues and eigenfunctions.)

8. Consider the Sturm-Liouville equation in the Liouville normal form,
$$-(p(x)u')' + q(x)u = \lambda\rho(x)u \quad a < x < b$$
where $p, \rho \in C^2([a,b])$, $q \in C([a,b])$, $p, \rho > 0$ on $[a,b]$. Let
$$\sigma(x) = \sqrt{\frac{\rho(x)}{p(x)}} \qquad \eta(x) = (p(x)\rho(x))^{1/4} \qquad L = \int_a^b \sigma(s)\,ds \qquad \phi(x) = \frac{1}{L}\int_a^x \sigma(s)\,ds$$
If $\psi = \phi^{-1}$ (the inverse function of $\phi$) and $v(z) = \eta(\psi(z))u(\psi(z))$, show that $v$ satisfies
$$-v'' + Q(z)v = \mu v \quad 0 < z < 1 \qquad (14.5.1)$$
for some $Q$ depending on $p, \rho, q$, and $\mu = L^2\lambda$. (This is mainly a fairly tedious exercise with the chain rule. Focus on making the derivation as clean as possible and be sure to say exactly what $Q(z)$ is. The point of this is that every eigenvalue problem for a second order ODE is equivalent to one with an equation of the form (14.5.1), provided that the coefficients have sufficient smoothness. The map $u(x) \to v(z)$ is sometimes called the Liouville transformation, and the ODE (14.5.1) is the canonical form for a 2nd order ODE eigenvalue problem.)

9. Consider the Sturm-Liouville problem
$$u'' + \lambda u = 0 \quad 0 < x < 1 \qquad u(0) - u'(0) = u(1) = 0$$
a) Multiply the equation by $u$ and integrate by parts to show that any eigenvalue is positive.
b) Show that the eigenvalues are the positive solutions of $\tan\sqrt{\lambda} = -\sqrt{\lambda}$.
c) Show graphically that such roots exist, and form an infinite sequence $\lambda_k$ such that $(k-\frac12)\pi < \sqrt{\lambda_k} < k\pi$ and
$$\lim_{k\to\infty}\left(\sqrt{\lambda_k} - (k - \tfrac12)\pi\right) = 0$$

10. Complete the proof that $\lambda_n \to +\infty$ under the assumptions of Theorem 14.2. (Suggestion: you can obtain an inequality like (14.3.17), except it may also contain boundary terms.)

11. Using separation of variables, compute explicitly the Dirichlet eigenvalues and eigenfunctions of $-\Delta$ when the domain is a rectangle $(0,A)\times(0,B)$ in $\mathbb{R}^2$. Verify directly that the first eigenvalue is simple, and that the first eigenfunction is of constant sign. Can there be other eigenvalues of multiplicity greater than one? (Hint: Your answer should depend on whether the ratio $A/B$ is rational or irrational.)

12. Find Dirichlet eigenvalues and eigenfunctions of $-\Delta$ in the unit ball $B(0,1) \subset \mathbb{R}^2$. (Suggestion: express the PDE and do separation of variables in polar coordinates. Your answer should involve Bessel functions.)

13. If $\{\psi_n\}_{n=1}^\infty$ are Dirichlet eigenfunctions of the Laplacian making up an orthonormal basis of $L^2(\Omega)$, let $\zeta_n = \psi_n/\sqrt{\lambda_n}$ ($\lambda_n$ the corresponding eigenvalue).
a) Show that $\{\zeta_n\}_{n=1}^\infty$ is an orthonormal basis of $H_0^1(\Omega)$.
b) Show that $f \in H_0^1(\Omega)$ if and only if $\sum_{n=1}^\infty \lambda_n|\langle f,\psi_n\rangle|^2 < \infty$.

14. If $\Omega \subset \mathbb{R}^n$ is a bounded open set with smooth enough boundary, find a solution of the wave equation problem
$$u_{tt} - \Delta u = 0 \quad x \in \Omega,\ t > 0$$
$$u(x,t) = 0 \quad x \in \partial\Omega,\ t > 0$$
$$u(x,0) = f(x) \quad u_t(x,0) = g(x) \quad x \in \Omega$$
in the form
$$u(x,t) = \sum_{n=1}^\infty c_n(t)\psi_n(x)$$
where $\{\psi_n\}_{n=1}^\infty$ are the Dirichlet eigenfunctions of $-\Delta$ in $\Omega$.

15. Derive formally that
$$G(x,y) = \sum_{n=1}^\infty \frac{\psi_n(x)\psi_n(y)}{\lambda_n} \qquad (14.5.2)$$
where $\lambda_n, \psi_n$ are the Dirichlet eigenvalues and normalized eigenfunctions for the domain $\Omega$, and $G(x,y)$ is the corresponding Green's function in (14.4.23). (Suggestion: if $-\Delta u = f$, expand both $u$ and $f$ in the $\psi_n$ basis.)

16. Formulate and prove a result which says that under appropriate conditions
$$u(x,t) \approx Ce^{-\lambda_1 t}\psi_1(x) \qquad (14.5.3)$$
as $t \to \infty$, where $u$ is the solution of (14.4.29)-(14.4.30)-(14.4.31).

17. If $\Omega = B(0,1) \subset \mathbb{R}^N$ show that the function $h(x,y)$ appearing in (14.4.26) is given by
$$h(x,y) = -\Gamma(|x|y - x/|x|) \qquad (14.5.4)$$
18. Prove the Rellich-Kondrachov Theorem 14.4 directly in the case of one space dimension, by using the Arzelà-Ascoli theorem.

Chapter 15

Further study of integral equations

15.1 Singular integral operators

In the very broadest sense, an integral operator
$$Tu(x) = \int_\Omega K(x,y)u(y)\,dy \qquad (15.1.1)$$
is said to be singular if the kernel $K(x,y)$ fails to be $C^\infty$ at one or more points. Of course this does not necessarily affect the general properties of $T$ in a significant way, but there are certain more specific kinds of singularity which occur in natural and important ways, which do affect the general behavior of the operator, and so call for some specific study.

First of all, let us observe that singularity is not necessarily a bad thing. For example, the problem of solving $Tu = f$ with a $C^\infty$ kernel is a first kind integral equation, for which a solution only exists, in general, for very restricted $f$. By comparison, the corresponding second kind integral equation $u + Tu = f$ may be regarded, at least formally, as a first kind equation with the 'very singular' kernel $\delta(x-y) + K(x,y)$, and will have a unique solution for a much larger class of $f$'s, typically all $f \in L^2(\Omega)$ in fact.

As a second kind of example, recall that if $\Omega = (a,b) \subset \mathbb{R}$, a Volterra type integral equation is generally easier to analyze and solve than a corresponding non-Volterra type equation. The more amenable nature of the Volterra equation may be understood via the fact that the Volterra operator $Tu(x) = \int_a^x K(x,y)u(y)\,dy$ could be rewritten as $\int_a^b \tilde K(x,y)u(y)\,dy$ where
$$\tilde K(x,y) = \begin{cases} K(x,y) & a < y < x < b \\ 0 & a < x < y < b \end{cases} \qquad (15.1.2)$$
That is to say, $\tilde K$ is singular when $y = x$ no matter how smooth $K$ itself is, so singularity is built in to the very structure of a Volterra type integral equation.

Let us also mention that it is often appropriate to regard $T$ as being singular if the underlying domain $\Omega$ is unbounded. One might expect this from the fact that if we were to make a change of variable to map the unbounded domain $\Omega$ onto a convenient bounded domain, the price to be paid normally is that the transformed kernel will become singular at those points which are the image of $\infty$. The Fourier transform could be regarded in this light, and its very nice behavior viewed as due to, rather than despite, the presence of singularity.

For the remainder of this section we will focus on a specific class of singular integral operators, in which the kernel $K$ is assumed to satisfy
$$|K(x,y)| \le \frac{M}{|x-y|^\alpha} \quad x, y \in \Omega \qquad (15.1.3)$$
for some constant $M$ and exponent $\alpha > 0$, with $\Omega$ a bounded domain in $\mathbb{R}^N$. If $\alpha < N$ then $K$ is said to be weakly singular. The main result to be proved below is that an integral operator with weakly singular kernel is compact on $L^2(\Omega)$. Note that such an operator may or may not be of Hilbert-Schmidt type. For example, if $K(x,y) = 1/|x-y|^\alpha$ then $K \in L^2(\Omega\times\Omega)$ if and only if $\alpha < N/2$. The Green's function $G(x,y)$ for the Laplacian (see (14.4.23)) is always weakly singular, and the compactness result below provides an alternative to the Rellich-Kondrachov theorem (Theorem 14.4) for proving compactness of the corresponding integral operator.

We begin with the following lemma.

Lemma 15.1. Suppose $K \in L^1(\Omega\times\Omega)$ and there exists a constant $C$ such that
$$\int_\Omega |K(x,y)|\,dx \le C \quad \forall y \in \Omega \qquad \int_\Omega |K(x,y)|\,dy \le C \quad \forall x \in \Omega \qquad (15.1.4)$$
Then the corresponding integral operator $T$ is a bounded linear operator on $L^2(\Omega)$ with $\|T\| \le C$.
Proof: Using the Schwarz inequality we get
$$\int_\Omega |K(x,y)||u(y)|\,dy \le \sqrt{\int_\Omega |K(x,y)|\,dy}\,\sqrt{\int_\Omega |K(x,y)||u(y)|^2\,dy} \qquad (15.1.5)$$
$$\le \sqrt{C}\,\sqrt{\int_\Omega |K(x,y)||u(y)|^2\,dy} \qquad (15.1.6)$$
and therefore
$$\int_\Omega |Tu(x)|^2\,dx \le C\int_\Omega\int_\Omega |K(x,y)||u(y)|^2\,dy\,dx \qquad (15.1.7)$$
$$= C\int_\Omega |u(y)|^2\int_\Omega |K(x,y)|\,dx\,dy \qquad (15.1.8)$$
$$\le C^2\int_\Omega |u(y)|^2\,dy \qquad (15.1.9)$$
as needed. $\square$

We can now prove the compactness result mentioned above.

Theorem 15.1. If $\Omega$ is a bounded domain in $\mathbb{R}^N$ and $K$ is a weakly singular kernel, then the integral operator (15.1.1) is compact on $L^2(\Omega)$.

Proof: First observe that
$$\int_\Omega |K(x,y)|\,dy \le M\int_\Omega \frac{dy}{|x-y|^\alpha} \le M\int_{B(x,R)} \frac{dy}{|x-y|^\alpha} \qquad (15.1.10)$$
$$\le M\Omega_{N-1}\int_0^R r^{N-1-\alpha}\,dr = \frac{M\Omega_{N-1}R^{N-\alpha}}{N-\alpha} \qquad (15.1.11)$$
for some $R$ depending on $\Omega$. Here $\Omega_{N-1}$ denotes the surface area of the unit sphere in $\mathbb{R}^N$, see (18.3.1). The same is true if we integrate with respect to $x$ instead of $y$, and so by Lemma 15.1, $T$ is bounded. Now let
$$K_m(x,y) = \begin{cases} K(x,y) & |x-y| > \frac1m \\ 0 & |x-y| \le \frac1m \end{cases} \qquad (15.1.12)$$
and note that $K - K_m$ satisfies the same estimate as $K$ above, except that $R$ may be replaced by $1/m$. That is,
$$\int_\Omega |K(x,y) - K_m(x,y)|\,dy \le \frac{M\Omega_{N-1}}{(N-\alpha)m^{N-\alpha}} \qquad (15.1.13)$$
and likewise for the integral with respect to $x$. Thus, if $T_m$ is the integral operator with kernel $K_m$, then using Lemma 15.1 once more we get
$$\|T - T_m\| \le \frac{M\Omega_{N-1}}{(N-\alpha)m^{N-\alpha}} \to 0 \qquad (15.1.14)$$
as $m \to \infty$. Since $K_m \in L^\infty(\Omega\times\Omega)$, the operator $T_m$ is compact for each $m$, by Theorem 13.4, and so the compactness of $T$ follows from Theorem 13.3. $\square$

Theorem 15.2. Let $\Omega$ be a bounded domain in $\mathbb{R}^N$ and assume $K$ is a weakly singular kernel which is continuous on $\Omega\times\Omega$ for $x \ne y$. If $u \in L^\infty(\Omega)$ then $Tu$ is uniformly continuous on $\Omega$.

Proof: Fix $\epsilon > 0$, pick $\alpha \in (0,N)$ such that (15.1.3) holds, and set
$$H(x,y) = K(x,y)|x-y|^\alpha \qquad (15.1.15)$$
so $H$ is bounded and continuous for $x \ne y$. Assuming $u \in L^\infty(\Omega)$ and $x \in \Omega$, we have for $z \in B(x,\delta)\cap\Omega$
$$|Tu(z) - Tu(x)| = \left|\int_\Omega (K(z,y) - K(x,y))u(y)\,dy\right| \qquad (15.1.16)$$
$$\le \int_{\Omega\cap B(x,2\delta)} (|K(z,y)| + |K(x,y)|)|u(y)|\,dy \qquad (15.1.17)$$
$$+ \int_{\Omega\setminus B(x,2\delta)} |(K(z,y) - K(x,y))u(y)|\,dy \qquad (15.1.18)$$
The integral in (15.1.17) may be estimated by
$$\|H\|_\infty\|u\|_\infty\int_{B(x,2\delta)}\left(\frac{1}{|z-y|^\alpha} + \frac{1}{|x-y|^\alpha}\right)dy \qquad (15.1.19)$$
and so tends to zero as $\delta \to 0$ at a rate which is independent of $x, z$. We fix $\delta > 0$ such that this term is less than $\epsilon$. In the remaining integral, assuming that $|x-z| < \delta$, we have $|y-x| > 2\delta$ and so also $|y-z| > \delta$. If $E_\delta = \{(x,y) \in \Omega\times\Omega : |x-y| \ge \delta\}$ then $K$ is uniformly continuous on $E_\delta$, so there must exist $\delta' < \delta$ such that for $z \in B(x,\delta')\cap\Omega$ the integral in (15.1.18) is less than $\epsilon$. This completes the proof. $\square$

In general compactness fails if $\alpha \ge N$. A good example to keep in mind is the Hilbert transform ((10.2.27)), which is in the borderline case $\alpha = N = 1$, and which we have already noted is not a compact operator. Actually this example doesn't quite fit into our discussion since the underlying domain is $\Omega = \mathbb{R}$, which is not bounded. If, however, we consider the so-called finite Hilbert transform defined by
$$H_0u(x) = \frac{1}{\pi}\int_0^1 \frac{u(y)}{x-y}\,dy \qquad (15.1.20)$$
(the integral being understood in the principal value sense) as an operator on $L^2(0,1)$, it is known (see [19]) that the spectrum $\sigma_p(H_0)$ consists of the segment of the imaginary axis connecting the points $\pm i$. In particular, since this set is uncountable, $H_0$ is not compact. See Chapter 5, section 2 of [15] for discussion of the operator equation $H_0u = f$. Note that boundedness of $H_0$ is automatic from the corresponding property for the Hilbert transform. A thorough investigation of operators which generalize the Hilbert transform may be found in [34].
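The compactness assertion of Theorem 15.1 can be seen 'in action' numerically: discretizing a weakly singular operator and inspecting the singular values of the resulting matrix shows them decaying to zero. The following Python sketch is an added illustration, not from the text; the kernel $|x-y|^{-1/2}$ on $\Omega = (0,1)$, the grid size, and the Nyström-style quadrature (with an exact integral used for the diagonal cell) are choices made here.

```python
import numpy as np

n, alpha = 400, 0.5            # weakly singular: alpha = 1/2 < N = 1
h = 1.0 / n
x = (np.arange(n) + 0.5) * h   # cell midpoints in (0,1)

D = np.abs(x[:, None] - x[None, :])
np.fill_diagonal(D, 1.0)       # avoid 0**(-alpha); diagonal fixed below
A = h * D ** (-alpha)          # A[i,j] ~ integral over cell j of K(x_i, y) dy
# exact integral of |x_i - y|**(-alpha) over the diagonal cell of width h:
np.fill_diagonal(A, 2.0 * (h / 2.0) ** (1.0 - alpha) / (1.0 - alpha))

s = np.linalg.svd(A, compute_uv=False)
print(s[:4])    # largest singular values: bounded, consistent with Lemma 15.1
print(s[-4:])   # smallest: near zero, consistent with compactness
```

Refining the grid leaves the leading singular values essentially unchanged while the tail keeps decaying, which is exactly the behavior one expects from a norm limit of finite rank (or bounded kernel) operators as in the proof above.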
15.2 Layer potentials

A type of singular integral operator which has played an important role in the historical development of the theory of elliptic PDEs is the so-called layer potential; see for example Kellogg [20] for a very classical treatment. Layer potentials actually come in two common varieties. If $\Gamma$ denotes the fundamental solution (9.3.43) of Laplace's equation in $\mathbb{R}^N$ for $N \ge 2$, and $\Sigma \subset \mathbb{R}^N$ is a smooth bounded $N-1$ dimensional surface, set
$$S\phi(x) = \int_\Sigma \Gamma(x-y)\phi(y)\,ds(y) \qquad (15.2.1)$$
$$D\phi(x) = \int_\Sigma \frac{\partial}{\partial n_y}\Gamma(x-y)\phi(y)\,ds(y) \qquad (15.2.2)$$
which are respectively known as single and double layer potentials on $\Sigma$ with density $\phi$.

To immediately see why such operators might arise naturally in connection with elliptic PDEs, observe that for any $\phi$ which is well behaved on $\Sigma$, $S\phi$ and $D\phi$ are harmonic functions in the complement of $\Sigma$. For example, if $u(x) = S\phi(x)$ then
$$\Delta u(x) = \int_\Sigma \Delta_x\Gamma(x-y)\phi(y)\,ds(y) = 0 \qquad (15.2.3)$$
may be easily shown to be legitimate for $x \notin \Sigma$, taking into account that $\Delta\Gamma(x) = 0$ for $x \ne 0$. Likewise, if $u(x) = D\phi(x)$ then
$$\Delta u(x) = \Delta_x\int_\Sigma \frac{\partial}{\partial n_y}\Gamma(x-y)\phi(y)\,ds(y) = \int_\Sigma \frac{\partial}{\partial n_y}\Delta_x\Gamma(x-y)\phi(y)\,ds(y) = 0 \qquad (15.2.4)$$
A wise choice of $\phi$ may then allow us to find harmonic functions satisfying some desired further properties, such as prescribed boundary behavior.

To clarify the definition of $D$, we suppose that a unit vector $n(x)$ normal to $\Sigma$ is chosen, which is a continuous function of $x \in \Sigma$ (typically this amounts to making a consistent choice of the sign of $n(x)$, since there are two unit normal vectors at each point of $\Sigma$). If $\Sigma$ is a simple closed surface then we will always adopt the usual convention, which is to take $n(x)$ to be the outward normal. In any case we have
$$\frac{\partial}{\partial n_y}\Gamma(x-y) = -\sum_{j=1}^N \Gamma_{x_j}(x-y)n_j(y) = -\frac{(x-y)\cdot n(y)}{\Omega_{N-1}|x-y|^N} := K(x,y) \quad y \in \Sigma \qquad (15.2.5)$$
Both $S\phi$ and $D\phi$ are obviously well defined for $x \notin \Sigma$, and the kernels are well defined for $x \ne y$ even if $x \in \Sigma$. If we wish to view either $S$ or $D$ as an operator, say, on $L^2(\Sigma)$, then formally at least we should think of $\Sigma$ as being $N-1$ dimensional, and since the singularity of $\Gamma$ is like $|x|^{2-N}$ (with the usual modification for $N = 2$), $S$ has the character of a weakly singular integral operator. In the case of $D$, however, the singularity of $\Gamma_{x_j}$ is like $|x|^{1-N}$, so $K$ appears to be exactly in the borderline case where compactness is lost. On the other hand, under some reasonable assumptions on $\Sigma$ we will see that extra decay of $K$ when $x \to y$ is provided by the $n(y)$ factor, so that compactness of $D$ will be recovered.

Let us consider now the Dirichlet problem
$$\Delta u = 0 \quad x \in \Omega \qquad u = f \quad x \in \Sigma \qquad (15.2.6)$$
where $\Omega$ is a bounded, connected domain in $\mathbb{R}^N$, $N \ge 2$, and $\Sigma = \partial\Omega$. We will seek a solution in the form of a double layer potential $u(x) = D\phi(x)$ for some density $\phi$ defined on $\Sigma$. As mentioned above, it is automatic that $u$ is harmonic in $\Omega$, so the condition which $\phi$ must be chosen to satisfy is that $D\phi = f$ on $\Sigma$, or more precisely
$$\lim_{z\to x,\ z\in\Omega}\int_\Sigma K(z,y)\phi(y)\,ds(y) = f(x) \qquad (15.2.7)$$
for $x \in \Sigma$. The distinction between evaluating $D\phi$ on $\Sigma$ and, on the other hand, taking the limit of $D\phi$ from inside $\Omega$ at a point of $\Sigma$ is important in the following discussion, and must be observed rigorously: they are in fact not the same in general, and it is only the latter which we care about. The simplest possible case, which is contained in the following lemma, illustrates the point.
Lemma 15.2. If $\phi(x) \equiv 1$ and $\Sigma = \partial\Omega$ is $C^2$ then
$$D\phi(x) = \begin{cases} 1 & x \in \Omega \\ \frac12 & x \in \Sigma \\ 0 & x \in \overline{\Omega}^c \end{cases} \qquad (15.2.8)$$

Proof: If $x \in \overline{\Omega}^c$ then $y \to \Gamma(x-y)$ is a harmonic function in all of $\Omega$, so integration by parts gives
$$D\phi(x) = \int_\Sigma \frac{\partial}{\partial n_y}\Gamma(x-y)\,ds(y) = \int_\Omega \Delta_y\Gamma(x-y)\,dy = 0 \qquad (15.2.9)$$
Now set $\Omega_\epsilon = \Omega\setminus B(x,\epsilon)$. If $x \in \Omega$, pick $\epsilon > 0$ such that $B(x,\epsilon) \subset \Omega$, in which case
$$0 = \int_{\Omega_\epsilon}\Delta\Gamma(x-y)\,dy = \int_{\partial\Omega_\epsilon}\frac{\partial}{\partial n_y}\Gamma(x-y)\,ds(y) \qquad (15.2.10)$$
$$= \int_\Sigma \frac{\partial}{\partial n_y}\Gamma(x-y)\,ds(y) - \int_{|y-x|=\epsilon}\frac{\partial}{\partial n_y}\Gamma(x-y)\,ds(y) \qquad (15.2.11)$$
For $|x-y| = \epsilon$ it is easy to check that $n(y) = (x-y)/|x-y|$ (the outward normal points towards $x$), and so the second term evaluates to
$$\int_{|y-x|=\epsilon}\frac{1}{\Omega_{N-1}\epsilon^{N-1}}\,ds(y) = 1 \qquad (15.2.12)$$
which establishes (15.2.8) for $x \in \Omega$. Finally, for $x \in \Sigma$, we repeat the same calculation and find that the integral in (15.2.12) is replaced by
$$\int_{\Omega\cap\{|y-x|=\epsilon\}}\frac{1}{\Omega_{N-1}\epsilon^{N-1}}\,ds(y) \qquad (15.2.13)$$
Since we assumed that $\Sigma$ is $C^2$, it follows that as $\epsilon \to 0$ we pick up precisely half of the surface area of the sphere (i.e. $\Sigma$ might as well be a hyperplane), so that the limit of $1/2$ results, as needed. $\square$

Note that if we allowed $\Sigma$ to have a corner at some point $x$, then the conclusion that $D\phi(x) = 1/2$ for $x \in \Sigma$ would definitely no longer be valid.

If $u(x) = D\phi(x)$ for some $\phi$, let us now define
$$u_+(x) = \lim_{\alpha\to 0^+}u(x + \alpha n(x)) \qquad u_-(x) = \lim_{\alpha\to 0^-}u(x + \alpha n(x)) \qquad (15.2.14)$$
Thus $u_-$, $u_+$ are respectively the limiting values of $u$ from inside and outside the domain. In the above example we saw that $u(x) - u_\pm(x) = \pm\frac12$ for $x \in \Sigma$, and this generalizes in the following way.

Theorem 15.3. If $\phi \in C(\Sigma)$ and $u = D\phi$ then
$$u(x) - u_\pm(x) = \pm\frac{\phi(x)}{2} \quad x \in \Sigma \qquad (15.2.15)$$

The proof of this result involves technicalities which are beyond the scope of this book. We refer to Theorem 3.22 of [11] for details. Thus in general $D\phi$ experiences a jump as $\Sigma$ is crossed, whose magnitude at $x \in \Sigma$ is precisely $\phi(x)$.

For the Dirichlet problem (15.2.6) the precise meaning of the boundary condition is that we seek a density $\phi$ such that $u_-(x) = f(x)$ for $x \in \Sigma$. It then follows from (15.2.15) that $\phi$ should satisfy
$$\frac{\phi(x)}{2} + \int_\Sigma K(x,y)\phi(y)\,ds(y) = f(x) \quad x \in \Sigma \qquad (15.2.16)$$
Conversely, if $\phi$ is a continuous solution of (15.2.16) and we set $u(x) = D\phi(x)$, then $u$ is harmonic inside $\Omega$ and $u_-(x) = u(x) + \frac{\phi(x)}{2} = f(x)$, as required. We have therefore obtained the very interesting and useful result that solvability properties of (15.2.6) can be analyzed in terms of the second kind integral equation (15.2.16). We can likewise study the corresponding exterior Dirichlet problem, in which we seek $u$ harmonic in $\overline{\Omega}^c$ with prescribed boundary values on $\Sigma$, by looking instead at
$$-\frac{\phi(x)}{2} + \int_\Sigma K(x,y)\phi(y)\,ds(y) = f(x) \quad x \in \Sigma \qquad (15.2.17)$$

The strategy now is to show that $D$ is a compact operator on $L^2(\Sigma)$, so that the general theory from Chapter 13 can be applied. Again, the technicalities are lengthy, so we will content ourselves with a heuristic discussion, referring to [11] for a detailed treatment. In the previous section we established a sufficient condition for a singular integral operator to be compact. Here, the underlying domain $\Sigma$ is not a domain in $\mathbb{R}^N$, but assuming it is a reasonably smooth surface, e.g. $C^2$, it is 'locally' a domain in $\mathbb{R}^{N-1}$. Thus compactness can be proved, as before, if the singularity of $K$ has an associated exponent $\alpha < N-1$. The explicit expression (15.2.5) for $K$ does not appear to imply this, but it will if we take into account that $x - y$ becomes orthogonal to $n(y)$ if $x, y \in \Sigma$ and $x \to y$. More precisely we have
Lemma 15.3. If $\Sigma$ is a $C^2$ surface then there exists a constant $M$ such that
$$|(x-y)\cdot n(y)| \le M|x-y|^2 \quad x, y \in \Sigma \qquad (15.2.18)$$

Proof: Fix $x \in \Sigma$. Without loss of generality we may assume that $x = 0$ and that $n(0) = (0,0,\dots,1)$. Thus in a neighborhood of $x = 0$ the surface $\Sigma$ is given by $y_N = \Psi(y')$ where $y' = (y_1,\dots,y_{N-1})$, $\Psi$ is $C^2$ near $0$, and $\Psi(0) = \nabla\Psi(0) = 0$. In particular $\Psi(y') = O(|y'|^2)$ as $y' \to 0$. By Taylor's theorem, for $y \in \Sigma$
$$(x-y)\cdot n(y) = -y\cdot(n(0) + n(y) - n(0)) = -y_N + y\cdot(n(0) - n(y)) \qquad (15.2.19)$$
$$= -\Psi(y_1,\dots,y_{N-1}) + y\cdot(n(0) - n(y)) \qquad (15.2.20)$$
Since $\Sigma$ is $C^2$ it follows that $n(y)$ is $C^1$, and so both terms in (15.2.20) are $O(|y'|^2)$, which is the needed conclusion at fixed $x$. The implied constant depends only on bounds for the curvature of $\Sigma$, and so a constant $M$ exists which is independent of $x \in \Sigma$. $\square$

Corollary 15.1. The kernel $K(x,y)$ in (15.2.5) satisfies
$$|K(x,y)| \le M|x-y|^{2-N} \quad x, y \in \Sigma \qquad (15.2.21)$$
and in particular $D$ is compact on $L^2(\Sigma)$.

From Theorem 13.5 it now follows that there exists a unique solution of (15.2.16) for every $f \in C(\Sigma)$ (or even $L^2(\Sigma)$), provided that it can be verified that there is no non-trivial solution of the corresponding homogeneous equation. If such a solution $\phi \not\equiv 0$ exists, then it follows first of all that $u = D\phi$ is a solution of (15.2.6) with $f \equiv 0$. This must mean $u \equiv 0$, and so in consequence $u_-(x) = 0$ on $\Sigma$. Likewise, $u$ satisfies (15.2.6) with $\Omega$ replaced by $\overline{\Omega}^c$, and this also implies $u_+(x) = 0$ on $\Sigma$, see Exercise ( ). But then by (15.2.15) it follows that
$$\phi(x) = u_-(x) - u_+(x) = 0 \qquad (15.2.22)$$
so that the homogeneous version of (15.2.16) has only the trivial solution, as needed. Let us also note that if $\phi$ solves (15.2.16) with $f \in C(\Sigma)$, it can be shown that $\phi \in C(\Sigma)$, so that (15.2.15) is valid.

15.3 Convolution equations

Consider the convolution type integral equation
$$\int_{\mathbb{R}^N} K(x-y)u(y)\,dy - \lambda u(x) = f(x) \quad x \in \mathbb{R}^N \qquad (15.3.1)$$
where $K, f \in L^2(\mathbb{R}^N)$. If there exists a solution $u \in L^2(\mathbb{R}^N)$ then by Theorem 8.8 it must hold that
$$\big((2\pi)^{N/2}\hat K(y) - \lambda\big)\hat u(y) = \hat f(y) \quad \text{a.e. } y \in \mathbb{R}^N \qquad (15.3.2)$$
The solution is evidently unique, at least in $L^2(\mathbb{R}^N)$, provided $(2\pi)^{N/2}\hat K(y) \ne \lambda$ a.e. If also there exists $\epsilon > 0$ such that
$$|(2\pi)^{N/2}\hat K(y) - \lambda| \ge \epsilon \quad \text{a.e. } y \in \mathbb{R}^N \qquad (15.3.3)$$
then
$$\hat u(y) = \frac{\hat f(y)}{(2\pi)^{N/2}\hat K(y) - \lambda} \qquad (15.3.4)$$
defines a solution for every $f \in L^2(\mathbb{R}^N)$.

The requirement $K \in L^2(\mathbb{R}^N)$ can clearly be weakened to some extent. Recall that $K*u$ is well defined under a number of different sets of assumptions which have been made earlier, for example (i) $K \in \mathcal{D}'(\mathbb{R}^N)$ and $u \in \mathcal{D}(\mathbb{R}^N)$, (ii) $K \in \mathcal{S}'(\mathbb{R}^N)$ and $u \in \mathcal{S}(\mathbb{R}^N)$, or (iii) $K \in L^p(\mathbb{R}^N)$ and $u \in L^q(\mathbb{R}^N)$ with $p^{-1} + q^{-1} \ge 1$, and all of these are subject to further refinement. Thus a separate analysis of existence and uniqueness for (15.3.1) could be carried out under a wide variety of assumptions. Let us note in particular that (15.3.4) provides at least a formal solution formula provided that $K \in \mathcal{S}'(\mathbb{R}^N)$, $\hat f, \hat K$ are regular distributions (i.e. functions), and $(2\pi)^{N/2}\hat K(y) \ne \lambda$ a.e.

Example 15.1. In (15.3.1) let $N = 1$ and $K = \frac{1}{\pi}\,\mathrm{pv}\,\frac1x$, so that $K*u = Hu$, the Hilbert transform of $u$ defined in (10.2.27). Referring to the formula (8.8.5) for the Fourier transform of $K$, we obtain
$$(\lambda - i\,\mathrm{sgn}\,y)\hat u(y) = \hat f(y) \qquad (15.3.5)$$
Thus for $f \in L^2(\mathbb{R})$ and $\lambda \ne \pm i$ it is clear that
$$\hat u(y) = \frac{\hat f(y)}{\lambda - i\,\mathrm{sgn}\,y} \qquad (15.3.6)$$
defines the unique solution of (15.3.1).
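Formula (15.3.4) also suggests a numerical method: on a large periodic grid, the discrete Fourier transform stands in for the Fourier transform on $\mathbb{R}$. The following Python sketch is an added illustration; the domain truncation $(-L,L)$, the grid size, and the choices $K(x) = e^{-|x|}$, $\lambda = -1$ (which keeps the symbol $\sqrt{2\pi}\hat K - \lambda = 2/(1+y^2)+1$ away from zero) and a Gaussian $f$ are all assumptions of the sketch.

```python
import numpy as np

# Solve K*u - lam*u = f approximately, following (15.3.4) on a periodic grid.
L, n, lam = 40.0, 4096, -1.0
x = np.linspace(-L, L, n, endpoint=False)
dx = x[1] - x[0]

K = np.exp(-np.abs(x))                 # kernel, negligible at the box edges
f = np.exp(-x**2)                      # right hand side

# discrete stand-in for sqrt(2*pi)*Khat: dx * FFT of K, centered at index 0
Khat = dx * np.fft.fft(np.fft.ifftshift(K))
uhat = np.fft.fft(f) / (Khat - lam)    # division by the symbol, as in (15.3.4)
u = np.real(np.fft.ifft(uhat))

# residual check: the discrete K*u - lam*u should reproduce f
conv = np.real(np.fft.ifft(Khat * np.fft.fft(u)))
print(np.max(np.abs(conv - lam * u - f)))   # ~ machine precision
```

The division by the symbol is well conditioned here precisely because of the lower bound (15.3.3); if $\lambda$ were taken inside the range of $2/(1+y^2)$ the denominator would vanish at some frequency and the method would break down, mirroring the theory.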
Now let us consider the closely related situation of a so-called Hankel type integral equation,
$$\int_{\mathbb{R}^N} K(x+y)u(y)\,dy = f(x) \quad x \in \mathbb{R}^N \qquad (15.3.7)$$
If we let $K_1(x) = K(-x)$ and $f_1(x) = f(-x)$, then (15.3.7) is equivalent to $K_1*u = f_1$, and so
$$\hat u(y) = \frac{1}{(2\pi)^{N/2}}\frac{\hat f_1(y)}{\hat K_1(y)} \qquad (15.3.8)$$
If we temporarily denote the usual reflection operator by $R$, i.e. $R\phi(x) = \phi(-x)$, note that $R$ commutes with the Fourier transform. Thus
$$\hat u = \frac{1}{(2\pi)^{N/2}}\,R\left(\frac{\hat f}{\hat K}\right) \qquad (15.3.9)$$
and so from the inversion theorem the solution $u$ is
$$u = \frac{1}{(2\pi)^{N/2}}\,\widehat{\left(\frac{\hat f}{\hat K}\right)} \qquad (15.3.10)$$
assuming that the expression is meaningful. Note that using this approach it would not be straightforward to include a $\lambda u$ term on the left side of (15.3.7).

15.4 Wiener-Hopf technique

Throughout this section it will be assumed that the reader has some familiarity with basic ideas and techniques of complex analysis.

Consider in one dimension the integral equation of the special type
$$\int_0^\infty K(x-y)u(y)\,dy - \lambda u(x) = f(x) \quad x > 0 \qquad (15.4.1)$$
Here the kernel depends on the difference of the two arguments, as in a convolution equation, but it is not actually a convolution type equation since the integration only takes place over $(0,\infty)$. Nevertheless we can make some artificial extensions for mathematical convenience. Assuming that there exists a solution $u$ to be found, we let $u(x) = f(x) = 0$ for $x < 0$ and
$$g(x) = \begin{cases} \int_0^\infty K(x-y)u(y)\,dy & x < 0 \\ 0 & x > 0 \end{cases} \qquad (15.4.2)$$
It then follows that
$$\int_{-\infty}^\infty K(x-y)u(y)\,dy - \lambda u(x) = f(x) + g(x) \quad x \in \mathbb{R} \qquad (15.4.3)$$
The resulting equation is of convolution type, but contains the additional unknown term $g$. On the other hand, when considered as a solution on all of $\mathbb{R}$, $u$ should be regarded as constrained by the property that it has support in the positive half line.

A pair of operators which are technically useful for dealing with this situation are the so-called Hardy space projection operators $P_\pm$ defined as
$$P_\pm\phi = \frac12(\phi \pm iH\phi) \qquad (15.4.4)$$
where $H$ is the Hilbert transform. To motivate these definitions, recall from the discussion just above (10.2.27) that $(H\phi)\hat{\ }(y) = -i\,\mathrm{sgn}\,y\,\hat\phi(y)$, so
$$(P_+\phi)\hat{\ }(y) = \begin{cases} \hat\phi(y) & y > 0 \\ 0 & y < 0 \end{cases} \qquad (P_-\phi)\hat{\ }(y) = \begin{cases} \hat\phi(y) & y < 0 \\ 0 & y > 0 \end{cases} \qquad (15.4.5)$$
It is therefore simple to see that $P_\pm$ are the orthogonal projections of $L^2(\mathbb{R})$ onto the corresponding closed subspaces
$$H_+^2 := \{u \in L^2(\mathbb{R}) : \hat u(y) = 0\ \forall y < 0\} \qquad H_-^2 := \{u \in L^2(\mathbb{R}) : \hat u(y) = 0\ \forall y > 0\} \qquad (15.4.6)$$
for which $L^2(\mathbb{R}) = H_+^2 \oplus H_-^2$ (see also Exercise 5 of Chapter 10). These are the so-called Hardy spaces, which of course may be considered as Hilbert spaces in their own right, see Chapter 3 of [9]. In particular, it can be readily seen (Exercise 8) that if $\phi \in H_+^2$ then $\phi$ has an analytic extension to the upper half of the complex plane,
$$\int_{-\infty}^\infty |\phi(x+iy)|^2\,dx \le \|\phi\|_{L^2(\mathbb{R})}^2 \quad \forall y > 0 \qquad (15.4.7)$$
and
$$\phi(\cdot+iy) \to \phi \ \text{in } L^2(\mathbb{R}) \text{ as } y \to 0^+ \qquad (15.4.8)$$
Likewise, a function $\phi \in H_-^2$ has an analytic extension to the lower half of the complex plane with analogous properties. A very important converse of the above is given by a case of the Paley-Wiener theorem.

Theorem 15.4. If $\phi$ is analytic in the upper half of the complex plane and there exists a constant $C$ such that
$$\sup_{y>0}\int_{-\infty}^\infty |\phi(x+iy)|^2\,dx = C \qquad (15.4.9)$$
then $\phi \in H_+^2$ and
$$\int_{-\infty}^\infty |\phi(x)|^2\,dx = \int_0^\infty |\hat\phi(y)|^2\,dy = C \qquad (15.4.10)$$

See Theorem 19.2 of [30] or Theorem 1, section 3.4 of [9] for a proof.
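As an aside not in the original notes, the projections (15.4.5) are easy to experiment with numerically: on a periodic grid the discrete Fourier transform plays the role of the Fourier transform, and $P_\pm$ become multiplication of the spectrum by indicator functions. The grid, the test function, and the ad hoc handling of the zero and Nyquist frequencies below are choices of this sketch.

```python
import numpy as np

n = 1024
x = np.linspace(-20, 20, n, endpoint=False)
phi = np.exp(-x**2) * np.cos(3 * x)          # real test function

freqs = np.fft.fftfreq(n)                    # signed frequencies of the grid
phihat = np.fft.fft(phi)
Pp = np.fft.ifft(np.where(freqs > 0, phihat, 0))   # discrete P+ phi
Pm = np.fft.ifft(np.where(freqs < 0, phihat, 0))   # discrete P- phi

# P+ + P- recovers phi up to the (zeroed) mean:
print(np.max(np.abs(Pp + Pm + phihat[0] / n - phi)))        # ~ 0

# consistency with (15.4.4): P+ = (phi + i*H*phi)/2, H via -i*sgn multiplier
Hphi = np.fft.ifft(-1j * np.sign(freqs) * phihat)
print(np.max(np.abs(Pp - 0.5 * (phi + 1j * Hphi) + 0.5 * phihat[0] / n)))
```

Both printed quantities are at the level of rounding error, reflecting the exact identities $P_+ + P_- = I$ and $P_\pm = \frac12(I \pm iH)$ on the Fourier side.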
The spaces $H_\pm^2$ actually belong to the larger family of Hardy spaces $H_\pm^p$, $1 \le p \le \infty$, where for example $\phi \in H_+^p$ if $\phi$ has an analytic extension to the upper half of the complex plane and
$$\|\phi(\cdot+iy)\|_{L^p(\mathbb{R})} \le \|\phi\|_{L^p(\mathbb{R})} \quad \forall y > 0 \qquad (15.4.11)$$
Returning to (15.4.3), we note that $\hat u, \hat f \in H_-^2$ while $\hat g \in H_+^2$. Suppose now that it is possible to find a pair of functions $q_\pm \in H_\pm^\infty$ such that
$$\sqrt{2\pi}\,\hat K(y) - \lambda = \frac{q_-(y)}{q_+(y)} \quad y \in \mathbb{R} \qquad (15.4.12)$$
Then from (15.4.3) it follows that
$$q_-\hat u = q_+\hat f + q_+\hat g \qquad (15.4.13)$$
From the assumptions made on $q_+$ and the Paley-Wiener theorem we can conclude that $q_+\hat g \in H_+^2$, and likewise $q_-\hat u \in H_-^2$. In particular $P_-(q_+\hat g) = 0$, so that
$$q_-\hat u = P_-(q_-\hat u) = P_-(q_+\hat f) \qquad (15.4.14)$$
We thus obtain at least a formal solution formula for the Fourier transform of the solution, namely
$$\hat u = \frac{P_-(q_+\hat f)}{q_-} \qquad (15.4.15)$$
In order that this formula be meaningful it is sufficient that $1/q_\pm \in H_\pm^\infty$ along with the other assumptions already made, see Exercise 9.

The central question which remains to be more thoroughly studied is the existence of the pair of functions $q_\pm$ satisfying all of the above requirements. We refer the reader to Chapter 3 of [15] or Chapter 3 of [9] for further reading about this, and conclude with just an example.

Example 15.2. Consider (15.4.1) with $K(x) = e^{-|x|}$, that is
$$\int_0^\infty e^{-|x-y|}u(y)\,dy - \lambda u(x) = f(x) \quad x > 0 \qquad (15.4.16)$$
Since
$$\hat K(y) = \sqrt{\frac{2}{\pi}}\frac{1}{y^2+1} \qquad (15.4.17)$$
we get
$$\sqrt{2\pi}\,\hat K(y) - \lambda = -\lambda\frac{y^2+b^2}{y^2+1} \qquad (15.4.18)$$
where $b^2 = (\lambda-2)/\lambda$. If we require $\lambda \notin [0,2]$ then $b$ may be chosen real and positive, and we have
$$\sqrt{2\pi}\,\hat K(y) - \lambda = \frac{q_-(y)}{q_+(y)} \qquad (15.4.19)$$
where
$$q_-(y) = -\lambda\frac{y-ib}{y-i} \qquad q_+(y) = \frac{y+i}{y+ib} \qquad (15.4.20)$$
We see immediately that $q_\pm, q_\pm^{-1} \in H_\pm^\infty$, and so (15.4.15) provides the unique solution of (15.4.16) provided $\lambda \notin [0,2]$. Note that the significance of this restriction on $\lambda$ is that it is precisely the requirement that $\sqrt{2\pi}\,\hat K(y) - \lambda \ne 0$ for all $y$, or equivalently that $\lambda$ does not belong to the numerical range of $\sqrt{2\pi}\,\hat K(y)$.

15.5 Exercises

1. The Abel integral equation is
$$Tu(x) = \int_0^x \frac{u(y)}{\sqrt{x-y}}\,dy = f(x)$$
a first kind Volterra equation with a weakly singular kernel. Derive the explicit solution formula
$$u(x) = \frac{1}{\pi}\frac{d}{dx}\int_0^x \frac{f(y)}{\sqrt{x-y}}\,dy$$
(Suggestions: it amounts to showing that $T^2u(x) = \pi\int_0^x u(y)\,dy$. You'll need to evaluate an integral of the form $\int_y^x \frac{dz}{\sqrt{z-y}\sqrt{x-z}}$. Use the change of variable $z = y\cos^2\theta + x\sin^2\theta$.)

2. Let $K_1, K_2$ be weakly singular kernels with associated exponents $\alpha_1, \alpha_2$, and let $T_1, T_2$ be the associated Volterra integral operators. Show that $T_1T_2$ is also a Volterra operator with a weakly singular kernel and associated exponent $\alpha_1 + \alpha_2 - 1$.

3. If $P(x)$ is any nonzero polynomial, show that the first kind Volterra integral equation
$$\int_a^x P(x-y)u(y)\,dy = f(x)$$
is equivalent to a second kind Volterra integral equation.

4. If $T$ is a weakly singular Volterra integral operator, show that there exists a positive integer $n$ such that $T^n$ is a Volterra integral operator with a bounded kernel.

5. Use (15.3.4) to obtain an explicit solution of (15.3.1) if
$$N = 1 \quad \lambda = 1 \quad K(x) = e^{-|x|} \quad f(x) = \begin{cases} e^{-x} & x > 0 \\ 0 & x < 0 \end{cases} \qquad (15.5.1)$$

6. Discuss the solvability of the integral equation
$$\int_0^\infty \frac{u(s)}{s+t}\,ds = f(t) \quad t > 0 \qquad (15.5.2)$$
(Suggestions: Introduce new variables
$$\xi = \frac12\log t \quad \eta = \frac12\log s \quad \psi(\eta) = e^\eta u(e^{2\eta}) \quad g(\xi) = e^\xi f(e^{2\xi})$$
You may find it useful to work out, or look up, the Fourier transform of the hyperbolic secant function.)
7. If we look for a solution of
$$\Delta u = 0 \quad x \in \Omega \qquad \frac{\partial u}{\partial n} + u = f \quad x \in \partial\Omega$$
in the form of a single layer potential
$$u(x) = \int_{\partial\Omega}\Gamma(x-y)\phi(y)\,ds(y)$$
find an integral equation for the density $\phi$.

8. If $\phi \in H_+^2$ show that $\phi$ has an analytic extension to the upper half of the complex plane. To be precise, show that if
$$\tilde\phi(z) = \frac{1}{\sqrt{2\pi}}\int_0^\infty \hat\phi(t)e^{itz}\,dt$$
then $\tilde\phi$ is defined and analytic on $\{z = x+iy : y > 0\}$ and
$$\lim_{y\to 0^+}\tilde\phi(\cdot+iy) = \phi \ \text{in } L^2(\mathbb{R})$$
Along the way show that (15.4.7) holds.

9. Assume that $f \in L^2(0,\infty)$ and that (15.4.12) is valid for some $q_\pm$ with $q_\pm, q_\pm^{-1} \in H_\pm^\infty$. Verify that (15.4.15) defines a function $u \in L^2(\mathbb{R})$ such that $u(x) = 0$ for $x < 0$.

10. Find $q_\pm$ in (15.4.12) for the case
$$K(x) = \mathrm{sinc}(x) := \begin{cases} \dfrac{\sin(\pi x)}{\pi x} & x \ne 0 \\ 1 & x = 0 \end{cases}$$
and $\lambda = -1$. (Suggestion: look for $q_\pm$ in the form $q_\pm(x) = \lim_{y\to 0^\pm}F(x+iy)$ where $F(z) = ((z-\pi)/(z+\pi))^{i\alpha}$.)

Chapter 16

Variational methods

16.1 The Dirichlet quotient

We have earlier introduced the concept of the Rayleigh quotient
$$J(u) = \frac{\langle Tu, u\rangle}{\langle u, u\rangle} \qquad (16.1.1)$$
for a linear operator $T$ on a Hilbert space $H$. In the previous discussion we were mainly concerned with the case that $T$ was a bounded or even a compact operator, but now we will allow $T$ to be unbounded. In such a case, $J(u)$ is defined at least for $u \in D(T)\setminus\{0\}$, and possibly on some larger domain. The principal case of interest to us here is that of the Dirichlet Laplacian discussed in Section 14.4,
$$Tu = -\Delta u \quad \text{on } D(T) = \{u \in H_0^1(\Omega) : \Delta u \in L^2(\Omega)\} \qquad (16.1.2)$$
In this case
$$J(u) = \frac{\langle -\Delta u, u\rangle}{\langle u, u\rangle} = \frac{\int_\Omega |\nabla u|^2\,dx}{\int_\Omega |u|^2\,dx} = \frac{\|u\|_{H_0^1(\Omega)}^2}{\|u\|_{L^2(\Omega)}^2} \qquad (16.1.3)$$
which we may evidently regard as being defined on all of $H_0^1(\Omega)$ except the origin. We'll refer to any of these equivalent expressions as the Dirichlet quotient (or Dirichlet form) for $-\Delta$.

Throughout this section we take (16.1.3) as the definition of $J$, and denote by $\{\lambda_n, \psi_n\}$ the eigenvalues and eigenfunctions of $T$, where we may choose the $\psi_n$'s to be an orthonormal basis of $L^2(\Omega)$, according to the discussion of Section 14.4. It is immediate that
$$J(\psi_n) = \lambda_n \qquad (16.1.4)$$
for all $n$. If we define a critical point of $J$ to be any $u \in H_0^1(\Omega)\setminus\{0\}$ for which
$$\frac{d}{d\alpha}J(u+\alpha v)\Big|_{\alpha=0} = 0 \quad \forall v \in H_0^1(\Omega) \qquad (16.1.5)$$
then precisely as in (13.3.12) and the following discussion we find
$$\int_\Omega \nabla u\cdot\nabla v\,dx = J(u)\int_\Omega uv\,dx \quad \forall v \in H_0^1(\Omega) \qquad (16.1.6)$$
In other words, $Tu = \lambda u$ must hold with $\lambda = J(u)$. Conversely, by straightforward calculation, any eigenfunction of $T$ is a critical point of $J$. Thus the set of eigenfunctions of $T$ coincides with the set of critical points of the Dirichlet quotient, and by (16.1.4) the eigenvalues are exactly the critical values of $J$.

Among these critical points, one might expect to find a point at which $J$ achieves its minimum value, which must then correspond to the critical value $\lambda_1$, the least eigenvalue of $T$. We emphasize, however, that the existence of a minimizer of $J$ must be proved; it is not immediate from anything we have stated so far. We give one such proof here, and indicate another one in Exercise 3.

Theorem 16.1. There exists $\psi \in H_0^1(\Omega)$, $\psi \ne 0$, such that $J(\psi) \le J(\phi)$ for all $\phi \in H_0^1(\Omega)$, $\phi \ne 0$.

Proof: Let
$$\lambda = \inf_{\phi\in H_0^1(\Omega)}J(\phi) \qquad (16.1.7)$$
so $\lambda > 0$ by the Poincaré inequality. There exists a sequence $\psi_n \in H_0^1(\Omega)$ such that $J(\psi_n) \to \lambda$. Without loss of generality we may assume $\|\psi_n\|_{L^2(\Omega)} = 1$ for all $n$, in which case $\|\psi_n\|_{H_0^1(\Omega)}^2 \to \lambda$. In particular, $\{\psi_n\}$ is a bounded sequence in $H_0^1(\Omega)$, so by Theorem 13.1 there exists $\psi \in H_0^1(\Omega)$ such that $\psi_{n_k} \rightharpoonup \psi$ weakly in $H_0^1(\Omega)$, for some subsequence.
By Theorem 14.4 it follows that $\psi_{n_k} \to \psi$ strongly in $L^2(\Omega)$, so in particular $\|\psi\|_{L^2(\Omega)} = 1$. Finally, using the lower semicontinuity property of weak convergence (Proposition 13.2),
$$\lambda \le J(\psi) = \|\psi\|_{H_0^1(\Omega)}^2 \le \liminf_{n_k\to\infty}\|\psi_{n_k}\|_{H_0^1(\Omega)}^2 = \liminf_{n_k\to\infty}J(\psi_{n_k}) = \lambda \qquad (16.1.8)$$
so that $J(\psi) = \lambda$, i.e. $J$ achieves its minimum at $\psi$. $\square$

Note that by its very definition, the minimum $\lambda_1$ of the Rayleigh quotient $J$ gives rise to the best constant in the Poincaré inequality, namely (14.4.10) is valid with $C = \frac{1}{\sqrt{\lambda_1}}$ and no smaller $C$ works.

The above argument provides a proof of the existence of one eigenvalue of $T$, namely the smallest eigenvalue $\lambda_1$, with corresponding eigenfunction $\psi_1$, which is completely independent of the proof given in Chapter 13. It is natural to ask then whether the existence of the other eigenvalues can be obtained in a similar way. Of course they can no longer be obtained by minimizing the Dirichlet quotient (nor is there any maximum to be found), but we know in fact that $J$ has other critical points, since other eigenfunctions exist. Consider, for example, the case of $\lambda_2$, for which there must exist an eigenfunction orthogonal in $L^2(\Omega)$ to the eigenfunction already found for $\lambda_1$. Thus it is a natural conjecture that $\lambda_2$ may be obtained by minimizing $J$ over the orthogonal complement of $\psi_1$. Specifically, if we set
$$\mathcal{H}_1 = \{\phi \in H_0^1(\Omega) : \int_\Omega \phi\psi_1\,dx = 0\} \qquad (16.1.9)$$
then the existence of a minimizer of $J$ over $\mathcal{H}_1$ can be proved just as in Theorem 16.1. If the minimum occurs at $\psi_2$, with $\lambda_2 = J(\psi_2)$, then the critical point condition amounts to
$$\int_\Omega \nabla\psi_2\cdot\nabla v\,dx = \lambda_2\int_\Omega \psi_2v\,dx \quad \forall v \in \mathcal{H}_1 \qquad (16.1.10)$$
Furthermore, if $v = \psi_1$ then
$$\int_\Omega \nabla\psi_2\cdot\nabla\psi_1\,dx = -\int_\Omega \psi_2\Delta\psi_1\,dx = \lambda_1\int_\Omega \psi_2\psi_1\,dx = 0 \qquad (16.1.11)$$
since $\psi_2 \in \mathcal{H}_1$. It follows that (16.1.10) holds for every $v \in H_0^1(\Omega)$, so $\psi_2$ is an eigenfunction of $T$ for the eigenvalue $\lambda_2$. Clearly $\lambda_2 \ge \lambda_1$, since $\lambda_2$ is obtained by minimization over a smaller set.

We may continue in this way, successively minimizing the Rayleigh quotient over the orthogonal complement in $L^2(\Omega)$ of the previously obtained eigenfunctions, to obtain a variational characterization of all eigenvalues.

Theorem 16.2. We have
$$\lambda_n = J(\psi_n) = \min_{u\in\mathcal{H}_{n-1}}J(u) \qquad (16.1.12)$$
where
$$\mathcal{H}_n = \{u \in H_0^1(\Omega) : \int_\Omega u\psi_k\,dx = 0,\ k = 1,2,\dots,n\} \qquad \mathcal{H}_0 = H_0^1(\Omega) \qquad (16.1.13)$$

The proof is essentially a mirror image of the proof of Theorem 13.10, in which a compact operator has been replaced by an unbounded operator, and maximization has been replaced by minimization. One could also look at critical points of the reciprocal of $J$ in order to maintain it as a maximization problem, but it is more common to proceed as above. Similar results can be obtained for a larger class of unbounded self-adjoint operators, see for example [37]. The eigenfunctions may be interpreted as saddle points of $J$, i.e., critical points which are not local extrema.

The characterization of eigenvalues and eigenfunctions stated in Theorem 16.2 is unsatisfactory in the sense that the minimization problem to be solved in order to obtain an eigenvalue $\lambda_n$ requires knowledge of the eigenfunctions corresponding to smaller eigenvalues. We next discuss two alternative characterizations of eigenvalues, which may be regarded as advantageous from this point of view.

If $E$ is a finite dimensional subspace of $H_0^1(\Omega)$, we define
$$\mu(E) = \max_{u\in E}J(u) \qquad (16.1.14)$$
and set
$$\mathcal{S}_n = \{E \subset H_0^1(\Omega) : E \text{ is a subspace},\ \dim(E) = n\} \quad n = 0, 1, \dots \qquad (16.1.15)$$
Note that $\mu(E)$ exists and is finite for $E \in \mathcal{S}_n$, since if we choose any orthonormal basis
$\{\zeta_1,\dots,\zeta_n\}$ of $E$, then
$$\max_{u\in E}J(u) = \max_{\sum_{k=1}^n|c_k|^2=1}\int_\Omega\Big|\sum_{k=1}^n c_k\nabla\zeta_k\Big|^2\,dx \qquad (16.1.16)$$
Thus finding $\mu(E)$ amounts to maximizing a continuous function over a compact set.

Theorem 16.3. (Poincaré min-max formula) We have
$$\lambda_n = \min_{E\in\mathcal{S}_n}\mu(E) = \min_{E\in\mathcal{S}_n}\max_{u\in E}J(u) \qquad (16.1.17)$$
for $n = 1, 2, \dots$

Proof: $J$ is constant on any one dimensional subspace, i.e. $\mu(E) = J(\phi)$ if $E = \mathrm{span}\{\phi\}$, so for $n = 1$ the conclusion is equivalent to the statement of Theorem 16.1. For $n \ge 2$, if $E \in \mathcal{S}_n$ we can find $w \in E$, $w \ne 0$, such that $w \perp \psi_k$ for $k = 1,\dots,n-1$, since this amounts to $n-1$ linear equations for $n$ unknowns (here $\{\psi_n\}$ still denotes the orthonormalized Dirichlet eigenfunctions). Thus $w \in \mathcal{H}_{n-1}$ and so by Theorem 16.2
$$\lambda_n \le J(w) \le \max_{u\in E}J(u) = \mu(E) \qquad (16.1.18)$$
It follows that
$$\lambda_n \le \inf_{E\in\mathcal{S}_n}\mu(E) \qquad (16.1.19)$$
On the other hand, if we choose $E = \mathrm{span}\{\psi_1,\dots,\psi_n\}$, note that
$$J(u) = \frac{\sum_{k=1}^n \lambda_kc_k^2}{\sum_{k=1}^n c_k^2} \qquad (16.1.20)$$
for any $u = \sum_{k=1}^n c_k\psi_k \in E$. Thus
$$\mu(E) = J(\psi_n) = \lambda_n \qquad (16.1.21)$$
and so the infimum in (16.1.19) is achieved for this $E$. The conclusion (16.1.17) then follows. $\square$

A companion result, with a similar proof (see for example Theorem 5.2 of [37]), is

Theorem 16.4. (Courant-Weyl max-min formula) We have
$$\lambda_n = \max_{E\in\mathcal{S}_{n-1}}\min_{u\perp E}J(u) \qquad (16.1.22)$$
for $n = 1, 2, \dots$

An interesting application of the variational characterization of the first eigenvalue is the following monotonicity property. We temporarily use the notation $\lambda_1(\Omega)$ to denote the smallest Dirichlet eigenvalue of $-\Delta$ for the domain $\Omega$.

Theorem 16.5. If $\Omega \subset \Omega'$ then $\lambda_1(\Omega') \le \lambda_1(\Omega)$.

Proof: By the density of $C_0^\infty(\Omega)$ in $H_0^1(\Omega)$ and Theorem 16.1, for any $\epsilon > 0$ there exists $u \in C_0^\infty(\Omega)$ such that
$$J(u) \le \lambda_1(\Omega) + \epsilon \qquad (16.1.23)$$
cn ) (16.2.3) ( nk=1 ck vk )2 Ω 281 16-2-2 ∂J The critical point condition ∂c = 0, j = 1, . . . n is readily seen to be equivalent to the j T linear system for c = hc1 , . . . cn i , Ac = ΛBc (16.2.4) where A, B are the n × n matrices with entries Z Z Akj = ∇vk · ∇vj dx Bkj = vk vj dx Ω geneval (16.2.5) Ω and Λ = J(u). In other words, the critical points are obtained as the eigenvalues of the generalized eigenvalue problem (16.2.4) defined by means of the two matrices A, B. As usual, the set of all eigenvalues of (16.2.4) are obtained as the roots of the n’th degree polynomial det (A − ΛB) = 0. We denote these roots (which must be positive and real, by the symmetry of A, B) as 0 < Λ1 ≤ Λ2 ≤ · · · ≤ Λn , with points repeated as needed according to multiplicity. Thus (16.2.2) amounts to λ1 ≤ Λ1 λn ≤ Λn (16.2.6) Similar inequalities can be proved for all of the intermediate eigenvalues as well, we refer to [37] for the proof. Theorem 16.6. We have λk ≤ Λk k = 1, . . . n (16.2.7) As in the case of a single eigenvalue, a good choice of trial functions {v1 , . . . vn } will typically result in values of Λ1 , . . . Λk which are good approximations to λ1 , . . . λn . 16.3 The Euler-Lagrange equation eul-lag-sec In Section 16.1 we observed that the problem of minimizing the nonlinear functional J in (16.1.3), or more generally finding any critical point of J, leads to the eigenvalue problem for T defined in (16.1.2). This corresponds to the situation found even in elementary calculus, where to solve an optimization problem, we look for points where a derivative is equal to zero. In the Calculus of Variations, we continue to extend this kind of thinking from finite dimensional to infinite dimensional situations. Suppose X is a vector space, X ⊂ X, J : X → R is a functional, nonlinear in general, and consider the problem min J(x) (16.3.1) x∈X 282 uncon There may also be constraints to be satisfied, for example in the form H(x) = C, where H : X → R, so that the problem may be given as min J(x) (16.3.2) con x∈X H(x)=C We refer to (16.3.1) and (16.3.2) as the unconstrained and constrained cases respectively.1 In the unconstrained case, if x is a solution of (16.3.1) which is also an interior point of X , then α → J(x + αy) has a minimum at α = 0, and so d J(x + αy)α=0 = 0 dα ∀y ∈ X (16.3.3) 16-3-3 must be satisfied. In the constrained case, a solution must instead have the property that there exists a constant λ such that d (J(x + αy) − λH(x + αy)) α=0 = 0 dα ∀y ∈ X (16.3.4) This condition may be motivated in several ways, here is one of them. Suppose we can find a constant λ such that the unconstrained problem of minimizing J − λH has a solution x for which H(x) = C, i.e. J(z) − λH(z) ≥ J(x) − λH(x) for all z. But if we require z to satisfy the constraint, then H(z) = H(x), and so J(z) ≥ J(x) for all z for which H(z) = C. Thus the constrained minimization problem may be regarded as that of solving (16.3.4) simultaneously with the constraint H(x) = C. The special value of λ is called a Lagrange multiplier for the problem. In either the constrained or unconstrained case, the equation which results from (16.3.3) or (16.3.4) is called the Euler-Lagrange equation. The same conditions would be satisfied if we were seeking a maximum rather than a minimum, and may also be satisfied at critical points which are neither. 
16.3 The Euler-Lagrange equation

In Section 16.1 we observed that the problem of minimizing the nonlinear functional $J$ in (16.1.3), or more generally finding any critical point of $J$, leads to the eigenvalue problem for $T$ defined in (16.1.2). This corresponds to the situation found even in elementary calculus, where to solve an optimization problem, we look for points where a derivative is equal to zero. In the Calculus of Variations, we continue to extend this kind of thinking from finite dimensional to infinite dimensional situations.

Suppose $\mathbb{X}$ is a vector space, $\mathcal{X} \subset \mathbb{X}$, $J : \mathcal{X} \to \mathbb{R}$ is a functional, nonlinear in general, and consider the problem
$$\min_{x\in\mathcal{X}}J(x) \qquad (16.3.1)$$
There may also be constraints to be satisfied, for example in the form $H(x) = C$, where $H : \mathbb{X} \to \mathbb{R}$, so that the problem may be given as
$$\min_{x\in\mathcal{X},\ H(x)=C}J(x) \qquad (16.3.2)$$
We refer to (16.3.1) and (16.3.2) as the unconstrained and constrained cases respectively (even though the definition of $\mathcal{X}$ itself will often amount to the imposition of certain constraints).

In the unconstrained case, if $x$ is a solution of (16.3.1) which is also an interior point of $\mathcal{X}$, then $\alpha \to J(x+\alpha y)$ has a minimum at $\alpha = 0$, and so
$$\frac{d}{d\alpha}J(x+\alpha y)\Big|_{\alpha=0} = 0 \quad \forall y \in \mathbb{X} \qquad (16.3.3)$$
must be satisfied. In the constrained case, a solution must instead have the property that there exists a constant $\lambda$ such that
$$\frac{d}{d\alpha}\big(J(x+\alpha y) - \lambda H(x+\alpha y)\big)\Big|_{\alpha=0} = 0 \quad \forall y \in \mathbb{X} \qquad (16.3.4)$$
This condition may be motivated in several ways; here is one of them. Suppose we can find a constant $\lambda$ such that the unconstrained problem of minimizing $J - \lambda H$ has a solution $x$ for which $H(x) = C$, i.e. $J(z) - \lambda H(z) \ge J(x) - \lambda H(x)$ for all $z$. But if we require $z$ to satisfy the constraint, then $H(z) = H(x)$, and so $J(z) \ge J(x)$ for all $z$ for which $H(z) = C$. Thus the constrained minimization problem may be regarded as that of solving (16.3.4) simultaneously with the constraint $H(x) = C$. The special value of $\lambda$ is called a Lagrange multiplier for the problem.

In either the constrained or unconstrained case, the equation which results from (16.3.3) or (16.3.4) is called the Euler-Lagrange equation. The same conditions would be satisfied if we were seeking a maximum rather than a minimum, and may also be satisfied at critical points which are neither. The Euler-Lagrange equation must be viewed as a necessary condition for a solution: it does not follow that any solution of the Euler-Lagrange equation must also be a solution of the original optimization problem. Just as in elementary calculus, we only obtain candidates for the solution in this way, and some further argument will in general be needed to complete the solution.

16.4 Variational methods for elliptic boundary value problems

We now present the application of variational methods, and obtain the Euler-Lagrange equation in explicit form, for several important PDE problems.

Example 16.1. Let $J$ denote the Dirichlet quotient defined in (16.1.3), which we regard as defined on $\mathcal{X} = \{u \in H_0^1(\Omega) : u \ne 0\} \subset H_0^1(\Omega)$. Precisely as in (13.3.12) we find that
$$\frac{d}{d\alpha}J(u+\alpha v)\Big|_{\alpha=0} = 2\,\frac{\big(\int_\Omega u^2\,dx\big)\big(\int_\Omega\nabla u\cdot\nabla v\,dx\big) - \big(\int_\Omega|\nabla u|^2\,dx\big)\big(\int_\Omega uv\,dx\big)}{\big(\int_\Omega u^2\,dx\big)^2} \qquad (16.4.1)$$
The condition (16.3.3) for an unconstrained minimum of $J$ over $\mathcal{X}$ then amounts to
$$u \ne 0 \qquad \int_\Omega\nabla u\cdot\nabla v\,dx - \lambda\int_\Omega uv\,dx = 0 \quad \forall v \in H_0^1(\Omega) \qquad (16.4.2)$$
with $\lambda = J(u)$. Thus the Euler-Lagrange equation for this problem is precisely the equation for a Dirichlet eigenfunction in $\Omega$.

Example 16.2. Let
$$J(u) = \int_\Omega|\nabla u|^2\,dx \qquad H(u) = \int_\Omega u^2\,dx \qquad (16.4.3)$$
both regarded as functionals on $\mathcal{X} = \mathbb{X} = H_0^1(\Omega)$. By elementary calculations,
$$\frac{d}{d\alpha}J(u+\alpha v)\Big|_{\alpha=0} = 2\int_\Omega\nabla u\cdot\nabla v\,dx \qquad \frac{d}{d\alpha}H(u+\alpha v)\Big|_{\alpha=0} = 2\int_\Omega uv\,dx \qquad (16.4.4)$$
The condition (16.3.4) for a constrained minimum of $J$ subject to the constraint $H(u) = 1$ then amounts to (16.4.2) again, except now the solution is automatically normalized in $L^2$. Thus we can regard the problem of finding eigenvalues as coming from either a constrained or an unconstrained optimization problem.

Example 16.3. Define $J$ as in Example 16.1, except replace $H_0^1(\Omega)$ by $H^1(\Omega)$. The condition for a solution of the unconstrained problem is then
$$u \ne 0 \qquad \int_\Omega\nabla u\cdot\nabla v\,dx - \lambda\int_\Omega uv\,dx = 0 \quad \forall v \in H^1(\Omega) \qquad (16.4.5)$$
Since we are still free to choose $v \in C_0^\infty(\Omega)$, it again follows that $-\Delta u = \lambda u$ for $\lambda = J(u)$, but there is no longer an evident boundary condition for $u$ to be satisfied. We observe, however, that if we choose $v$ to be, say, in $C^1(\overline\Omega)$ in (16.4.5), then an integration by parts yields
$$-\int_\Omega v\Delta u\,dx + \int_{\partial\Omega}v\frac{\partial u}{\partial n}\,ds = \lambda\int_\Omega uv\,dx \qquad (16.4.6)$$
and since the $\Omega$ integrals must cancel, we get
$$\int_{\partial\Omega}v\frac{\partial u}{\partial n}\,ds = 0 \quad \forall v \in C^1(\overline\Omega) \qquad (16.4.7)$$
Since $v$ is otherwise arbitrary, we conclude that $\frac{\partial u}{\partial n} = 0$ on $\partial\Omega$ should hold. Thus, by looking for critical points of the Dirichlet quotient over the larger space $H^1(\Omega)$, we get eigenfunctions of $-\Delta$ subject to the homogeneous Neumann condition, in place of the Dirichlet condition. Since this condition was not imposed explicitly, but rather followed from the choice of space we used, it is often referred to in this context as the natural boundary condition. Note that the actual minimum in this case is clearly $J = 0$, achieved for any constant function $u$. Thus it is the fact that infinitely many other critical points can be shown to exist which makes this of interest.
Example 16.4. Let $f \in L^2(\Omega)$, and set
$$J(u) = \frac12\int_\Omega|\nabla u|^2\,dx - \int_\Omega fu\,dx \qquad u \in H_0^1(\Omega) \qquad (16.4.8)$$
The condition for an unconstrained critical point is readily seen to be
$$u \in H_0^1(\Omega) \qquad \int_\Omega\nabla u\cdot\nabla v\,dx - \int_\Omega fv\,dx = 0 \quad \forall v \in H_0^1(\Omega) \qquad (16.4.9)$$
Thus, in the distributional sense at least, a minimizer is a solution of the Poisson problem
$$-\Delta u = f \quad x \in \Omega \qquad u = 0 \quad x \in \partial\Omega \qquad (16.4.10)$$
The existence of a unique solution is already known from Proposition 14.2, and it is explicitly given by the integral operator $S$ appearing in (14.4.23). The main interest here is that we have obtained a variational characterization of it. Furthermore, we can give a direct proof of the existence of a unique solution of (16.4.9), which is of interest because it is easily adaptable to some other situations, even if it does not provide a new result in this particular case. The proof illustrates the so-called direct method of the Calculus of Variations.

Theorem 16.7. The problem of minimizing the functional $J$ defined in (16.4.8) has a unique solution, which also satisfies (16.4.9).

Proof: If $C$ denotes any constant for which the Poincaré inequality (14.4.10) is valid, we obtain
$$\int_\Omega fu\,dx \le \|f\|_{L^2}\|u\|_{L^2} \le C\|f\|_{L^2}\|u\|_{H_0^1} \le \frac14\|u\|_{H_0^1}^2 + C^2\|f\|_{L^2}^2 \qquad (16.4.11)$$
so that
$$J(u) \ge \frac14\|u\|_{H_0^1}^2 - C^2\|f\|_{L^2}^2 \qquad (16.4.12)$$
In particular, $J$ is bounded below, so
$$d := \inf_{u\in H_0^1(\Omega)}J(u) \qquad (16.4.13)$$
is finite and there exists a sequence $u_n \in H_0^1(\Omega)$ such that $J(u_n) \to d$. Also, since
$$\|u_n\|_{H_0^1}^2 \le 4\big(J(u_n) + C^2\|f\|_{L^2}^2\big) \qquad (16.4.14)$$
the sequence $\{u_n\}$ is bounded in $H_0^1(\Omega)$. By Theorem 13.1 there exists $u \in H_0^1(\Omega)$ and a weakly convergent subsequence, $u_{n_k} \rightharpoonup u$, which is therefore strongly convergent in $L^2(\Omega)$ by Theorem 14.4. Finally,
$$d \le J(u) = \frac12\int_\Omega|\nabla u|^2\,dx - \int_\Omega fu\,dx \qquad (16.4.15)$$
$$\le \liminf_{n_k\to\infty}\left(\frac12\int_\Omega|\nabla u_{n_k}|^2\,dx - \int_\Omega fu_{n_k}\,dx\right) \qquad (16.4.16)$$
$$= \liminf_{n_k\to\infty}J(u_{n_k}) = d \qquad (16.4.17)$$
Here, the inequality on the second line follows from the first part of Proposition 13.2 and the fact that the $\int_\Omega fu_{n_k}\,dx$ term is convergent. We conclude that $J(u) = d$, so $J$ achieves its minimum value. If two such solutions $u_1, u_2$ exist, then the difference $u = u_1 - u_2$ must satisfy
$$\int_\Omega\nabla u\cdot\nabla v\,dx = 0 \quad \forall v \in H_0^1(\Omega) \qquad (16.4.18)$$
Choosing $v = u$ we get $\|u\|_{H_0^1} = 0$, so $u_1 = u_2$. $\square$

Here is one immediate generalization about the solvability of (16.4.10) which is easy to obtain by the above method. Suppose that there exists $p \in [1,2)$ such that the inequality
$$\int_\Omega fu\,dx \le C\|f\|_{L^p}\|u\|_{H_0^1} \qquad (16.4.19)$$
holds. Then the remainder of the proof remains valid, establishing the existence of a solution for all $f \in L^p(\Omega)$ for this choice of $p$, corresponding to a class of $f$'s which is larger than $L^2(\Omega)$. It can in fact be shown that (16.4.19) is correct for $p = \frac{2N}{N+2}$, see Exercise 16.
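The finite dimensional version of this minimization is worth seeing once explicitly. In the sketch below (an added illustration: $\Omega = (0,1)$, $f \equiv 1$, and piecewise linear 'hat' trial functions are assumptions of the example, not part of the text), minimizing (16.4.8) over the span of the hat functions reduces (16.4.9) to a symmetric tridiagonal linear system, and the computed minimizer matches the exact solution of $-u'' = 1$, $u(0) = u(1) = 0$ at the nodes.

```python
import numpy as np

# Variational (Galerkin) solution of -u'' = f, u(0) = u(1) = 0:
# minimizing (16.4.8) over piecewise linear hat functions gives a
# tridiagonal version of (16.4.9).
n = 100                         # number of interior nodes
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h     # stiffness: int v_k' v_j' dx
b = h * np.ones(n)                          # load: int f v_k dx with f = 1
u = np.linalg.solve(A, b)

exact = 0.5 * x * (1 - x)                   # solution of -u'' = 1
print(np.max(np.abs(u - exact)))            # ~ machine precision at the nodes
```

The coercivity estimate (16.4.12) is what guarantees, in the infinite dimensional setting, that this minimization does not degenerate; in the discrete setting it corresponds to the positive definiteness of the stiffness matrix.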
Example 16.5. Next consider the functional $J$ in (16.4.8), except now regarded as defined on all of $H^1(\Omega)$, in which case the critical point condition is
$$u \in H^1(\Omega) \qquad \int_\Omega\nabla u\cdot\nabla v\,dx - \int_\Omega fv\,dx = 0 \quad \forall v \in H^1(\Omega) \qquad (16.4.20)$$
It still follows that $u$ must be a weak solution of $-\Delta u = f$, and by the same argument as in Example 16.3, $\frac{\partial u}{\partial n} = 0$ on $\partial\Omega$. Thus critical points of $J$ over $H^1(\Omega)$ provide us with solutions of
$$-\Delta u = f \quad x \in \Omega \qquad \frac{\partial u}{\partial n} = 0 \quad x \in \partial\Omega \qquad (16.4.21)$$
We must first recognize that we can no longer expect a solution to exist for arbitrary choices of $f \in L^2(\Omega)$, since if we choose $v \equiv 1$ we obtain the condition
$$\int_\Omega f\,dx = 0 \qquad (16.4.22)$$
which is thus a necessary condition for solvability. Likewise, if a solution exists it will not be unique, since any constant could be added to it. From another point of view, if we examine the proof of Theorem 16.7, we see that the infimum of $J$ is clearly equal to $-\infty$ unless $\int_\Omega f\,dx = 0$, since we can choose $u$ to be an arbitrary constant function. In that case the minimum of $J$ cannot be achieved by any function $u \in H^1(\Omega)$.

To work around this difficulty, we make use of the closed subspace of zero mean functions in $H^1(\Omega)$, namely
$$H_*^1(\Omega) = \{u \in H^1(\Omega) : \int_\Omega u\,dx = 0\} \qquad (16.4.23)$$
where the inner product and norm will simply be the restriction of the usual ones in $H^1$ to $H_*^1$. Analogous to the Poincaré inequality, Proposition 14.1, we have

Proposition 16.1. If $\Omega$ is a bounded open set in $\mathbb{R}^N$ with sufficiently smooth boundary then there exists a constant $C$, depending only on $\Omega$, such that
$$\|u\|_{L^2(\Omega)} \le C\|\nabla u\|_{L^2(\Omega)} \quad \forall u \in H_*^1(\Omega) \qquad (16.4.24)$$

See Exercise 6 for the proof. The key point is that $H_*^1$ contains no constant functions other than zero. Now if we regard the functional $J$ in (16.4.8) as defined only on the Hilbert space $H_*^1(\Omega)$, then the proof of Theorem 16.7 can be modified in an obvious way to obtain that for any $f \in L^2(\Omega)$ there exists
$$u \in H_*^1(\Omega) \qquad \int_\Omega\nabla u\cdot\nabla v\,dx - \int_\Omega fv\,dx = 0 \quad \forall v \in H_*^1(\Omega) \qquad (16.4.25)$$
For any $v \in H^1(\Omega)$ let $\mu = \frac{1}{m(\Omega)}\int_\Omega v\,dx$ be the mean value of $v$, so that $v - \mu \in H_*^1(\Omega)$. If in addition we assume that the necessary condition (16.4.22) holds, it follows that
$$\int_\Omega\nabla u\cdot\nabla v\,dx = \int_\Omega\nabla u\cdot\nabla(v-\mu)\,dx = \int_\Omega f(v-\mu)\,dx = \int_\Omega fv\,dx \qquad (16.4.26)$$
for any $v \in H^1(\Omega)$. Thus $u$ satisfies (16.4.20) and so is a weak solution of (16.4.21). It is unique within the subspace $H_*^1(\Omega)$, but by adding any constant we obtain the general solution $u(x) + C$ in $H^1(\Omega)$.

16.5 Other problems in the calculus of variations

Let $L = L(x,u,p)$ be a sufficiently smooth function on the domain $\{(x,u,p) : x \in \Omega, u \in \mathbb{R}, p \in \mathbb{R}^N\}$, where as usual $\Omega \subset \mathbb{R}^N$, and set
$$J(u) = \int_\Omega L(x,u(x),\nabla u(x))\,dx \qquad (16.5.1)$$
The function $L$ is called the Lagrangian in this context. We consider the problem of finding critical points of $J$, and for the moment proceed formally, without regard to the precise spaces of functions involved. Expanding $J(u+\alpha v)$ in powers of $\alpha$, we get
$$J(u+\alpha v) = \int_\Omega L(x,u(x)+\alpha v(x),\nabla u(x)+\alpha\nabla v(x))\,dx \qquad (16.5.2)$$
$$= \int_\Omega L(x,u(x),\nabla u(x))\,dx + \alpha\int_\Omega\frac{\partial L}{\partial u}(x,u(x),\nabla u(x))v(x)\,dx \qquad (16.5.3)$$
$$+ \alpha\int_\Omega\sum_{j=1}^N\frac{\partial L}{\partial p_j}(x,u(x),\nabla u(x))\frac{\partial v}{\partial x_j}(x)\,dx + o(\alpha) \qquad (16.5.4)$$
Thus the critical point condition reduces to
$$0 = \int_\Omega\left[\frac{\partial L}{\partial u}(\cdot,u,\nabla u)v + \sum_{j=1}^N\frac{\partial L}{\partial p_j}(\cdot,u,\nabla u)\frac{\partial v}{\partial x_j}\right]dx \qquad (16.5.5)$$
for all suitable $v$'s. Among the choices of $v$ we can make, we certainly expect to find those which satisfy $v = 0$ on $\partial\Omega$. By an integration by parts we then get
$$0 = \int_\Omega\left[\frac{\partial L}{\partial u}(\cdot,u,\nabla u) - \sum_{j=1}^N\frac{\partial}{\partial x_j}\frac{\partial L}{\partial p_j}(\cdot,u,\nabla u)\right]v\,dx \qquad (16.5.6)$$
Since $v$ is otherwise arbitrary, we conclude that
$$\frac{\partial L}{\partial u} - \sum_{j=1}^N\frac{\partial}{\partial x_j}\frac{\partial L}{\partial p_j} = 0 \qquad (16.5.7)$$
is a necessary condition for a critical point of $J$. That is to say, (16.5.7) is the Euler-Lagrange equation corresponding to the functional $J$. Typically it amounts to a partial differential equation for $u$, or an ordinary differential equation if $N = 1$. The fact that (16.5.6) leads to (16.5.7) is often referred to as the Fundamental Lemma of the Calculus of Variations, resulting formally from the intuition that we may (approximately) choose $v$ to be equal to the bracketed term in (16.5.6) which it multiplies, forcing that term to have $L^2$ norm equal to zero.
Despite using the term 'lemma', it is not a precise statement of anything unless some specific assumptions are made on $L$ and the function spaces involved.

Example 16.6. The functional $J$ in (16.4.8) comes from the Lagrangian
$$L(x,u,p) = \frac12|p|^2 - f(x)u \qquad (16.5.8)$$
Thus $\frac{\partial L}{\partial u} = -f(x)$ and $\frac{\partial L}{\partial p_j} = p_j$, so (16.5.7) becomes, upon substituting $p = \nabla u$,
$$-f(x) - \sum_{j=1}^N\frac{\partial}{\partial x_j}\frac{\partial u}{\partial x_j} = 0 \qquad (16.5.9)$$
which is obviously the same as (16.4.10).

Example 16.7. A very classical problem in the calculus of variations is that of finding the shape of a hanging uniform chain, given fixed locations for its two endpoints. The physical principle which we invoke is that the shape must be such that the potential energy is minimized. To find an expression for the potential energy, let the shape be given by a function $h = u(x)$, $a < x < b$. Observe that the contribution to the total potential energy from a short segment of the chain is $gh\Delta m$, where $g$ is the gravitational constant and $\Delta m$ is the mass of the segment, and so may be given as $\rho\Delta s$ where $\rho$ is the (constant) density and $\Delta s$ is the length of the segment. Since $\Delta s = \sqrt{1+u'(x)^2}\,\Delta x$, we are led in the usual way to the potential energy functional
$$J(u) = \int_a^b u(x)\sqrt{1+u'(x)^2}\,dx \qquad (16.5.10)$$
to minimize. Applying (16.5.7) with $L(x,u,p) = u\sqrt{1+p^2}$ gives the Euler-Lagrange equation
$$\frac{\partial L}{\partial u} - \frac{d}{dx}\frac{\partial L}{\partial p} = \sqrt{1+u'^2} - \frac{d}{dx}\frac{uu'}{\sqrt{1+u'^2}} = 0 \qquad (16.5.11)$$
To solve this nonlinear ODE, we first multiply the equation through by $\frac{uu'}{\sqrt{1+u'^2}}$ to get
$$\frac12\frac{d}{dx}\left[u^2 - \left(\frac{uu'}{\sqrt{1+u'^2}}\right)^2\right] = 0 \qquad (16.5.12)$$
so
$$u^2 - \left(\frac{uu'}{\sqrt{1+u'^2}}\right)^2 = C^2 \qquad (16.5.13)$$
for some constant $C$. After some obvious algebra we get the separable first order ODE
$$u' = \pm\sqrt{\frac{u^2}{C^2} - 1} \qquad (16.5.14)$$
which is readily integrated to obtain the general solution
$$u(x) = C\cosh\left(\frac{x}{C} + D\right) \qquad (16.5.15)$$
The two constants $C, D$ are determined by the values of $u(a)$ and $u(b)$, so that in all cases the hanging chain is seen to assume the 'catenary' shape, determined by the hyperbolic cosine function.

Example 16.8. Another important class of examples comes from the theory of minimal surfaces. A function $u = u(x)$ defined on a domain $\Omega \subset \mathbb{R}^2$ may be regarded as defining a surface in $\mathbb{R}^3$, and the corresponding surface area is
$$J(u) = \int_\Omega\sqrt{1+|\nabla u|^2}\,dx \qquad (16.5.16)$$
Suppose we seek the surface of least possible area, subject to the requirement that $u(x) = g(x)$ on $\partial\Omega$, where $g$ is a prescribed function. Such a surface is said to span the bounding curve $\Gamma = \{(x_1,x_2,g(x_1,x_2)) : (x_1,x_2) \in \partial\Omega\}$. The problem of finding a minimal surface with a given boundary curve is known as Plateau's problem. For this discussion we assume that $g$ is the restriction to $\partial\Omega$ of some function in $H^1(\Omega)$ and then let $\mathcal{X} = \{u \in H^1(\Omega) : u - g \in H_0^1(\Omega)\}$. Thus in looking at $J(u+\alpha v)$ we should always assume that $v \in H_0^1(\Omega)$, as in the discussion leading to (16.5.6). With $L(x,u,p) = \sqrt{1+|p|^2}$ we obtain
$$\frac{\partial L}{\partial p_j} = \frac{p_j}{\sqrt{1+|p|^2}} \qquad (16.5.17)$$
The resulting Euler-Lagrange equation is then the minimal surface equation
$$\sum_{j=1}^2\left(\frac{u_{x_j}}{\sqrt{1+|\nabla u|^2}}\right)_{x_j} = 0 \qquad (16.5.18)$$
It turns out that the expression on the left hand side is the so-called mean curvature (the average of the principal curvatures) of the surface defined by $u$, so a minimal surface always has zero mean curvature.
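The computation in Example 16.7 can be checked symbolically. The following sympy sketch (added here; the symbol assumptions and the simplification route are incidental choices of the sketch) substitutes the catenary (16.5.15) into the left side of (16.5.11) and confirms that it vanishes.

```python
import sympy as sp

x, C, D = sp.symbols('x C D', real=True, positive=True)
u = C * sp.cosh(x / C + D)              # candidate catenary (16.5.15)
up = sp.diff(u, x)

L_u = sp.sqrt(1 + up**2)                # dL/du  for L(x,u,p) = u*sqrt(1+p^2)
L_p = u * up / sp.sqrt(1 + up**2)       # dL/dp
el = L_u - sp.diff(L_p, x)              # left side of (16.5.11)

print(sp.simplify(el))                                    # expected: 0
print(el.subs({x: 0.7, C: 1.3, D: 0.2}).evalf())          # numeric spot check: ~0
```

The same pattern (form $\partial L/\partial u - \frac{d}{dx}\partial L/\partial p$, substitute a candidate, simplify) works for any one dimensional Lagrangian, and is a convenient way to catch algebra slips in hand derivations like (16.5.12)-(16.5.14).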
Let us finally consider an example in the case of constrained optimization,
$$\min_{H(u)=C}J(u) \qquad (16.5.19)$$
where $J$ is defined as in (16.5.1) and $H$ is another functional of the same sort, say
$$H(u) = \int_\Omega N(x,u(x),\nabla u(x))\,dx \qquad (16.5.20)$$
As discussed in Section 16.3, we should seek critical points of $J - \lambda H$, which we may regard as coming from the augmented Lagrangian $M := L - \lambda N$. The Euler-Lagrange equations for a solution will then be
$$\frac{\partial M}{\partial u} - \sum_{j=1}^N\frac{\partial}{\partial x_j}\frac{\partial M}{\partial p_j} = 0 \qquad \int_\Omega N(x,u(x),\nabla u(x))\,dx = C \qquad (16.5.21)$$

Example 16.9. (Dido's problem, named for the founder and first queen of the ancient city of Carthage.) Consider the area $A$ in the $(x,y)$ plane between $y = 0$ and $y = u(x)$, where $u(x) \ge 0$, $u(0) = u(1) = 0$. If the curve $y = u(x)$ is fixed to have length $L$, how should we choose the shape of the curve to maximize the area $A$? This is an example of a so-called isoperimetric problem, because the total perimeter of the boundary of $A$ is fixed to be $1 + L$. Clearly the mathematical expression of this problem may be written in the form (16.5.19) with
$$J(u) = \int_0^1 u(x)\,dx \qquad H(u) = \int_0^1\sqrt{1+u'(x)^2}\,dx \qquad C = L \qquad (16.5.22)$$
so that
$$M = u - \lambda\sqrt{1+p^2} \qquad (16.5.23)$$
The first equation in (16.5.21) thus gives
$$\left(\lambda\frac{u'}{\sqrt{1+u'^2}}\right)' = 1 \qquad (16.5.24)$$
From straightforward algebra and integration we obtain
$$u' = \pm\frac{x-x_0}{\sqrt{\lambda^2-(x-x_0)^2}} \qquad (16.5.25)$$
for some $x_0$, which subsequently leads to the expected result that the curve must be an arc of a circle,
$$(u-u_0)^2 + (x-x_0)^2 = \lambda^2 \qquad (16.5.26)$$
for some $x_0, u_0$. From the boundary conditions $u(0) = u(1) = 0$ it is easy to see that $x_0 = 1/2$, and the length constraint implies
$$L = \int_0^1\sqrt{1+u'^2}\,dx = \int_0^1\frac{\lambda\,dx}{\sqrt{\lambda^2-(x-\frac12)^2}} = \lambda\sin^{-1}\left(\frac{x-\frac12}{\lambda}\right)\Big|_0^1 = 2\lambda\sin^{-1}\frac{1}{2\lambda} \qquad (16.5.27)$$
By elementary calculus techniques we may verify that a unique $\lambda \ge 1/2$ exists for any $L \in (1,\frac{\pi}{2}]$. The restriction $L > 1$ is of course a necessary one for the curve to connect the two endpoints and enclose a positive area, but $L \le \frac{\pi}{2}$ is only an artifact due to our requiring that the curve be given in the form $y = u(x)$. If instead we allow more general curves (e.g. given parametrically) then any $L > 1$ is possible, see Exercise 18.

16.6 The existence of minimizers

We turn now to some discussion of conditions which guarantee the existence of a solution of a minimization problem. We emphasize that (16.5.7) is only a necessary condition for a solution, and some different kind of argument is needed to establish that a given minimization problem actually has a solution.

Let $H$ be a Hilbert space, $\mathcal{X} \subset H$ an admissible subset of $H$, $J : \mathcal{X} \to \mathbb{R}$, and consider the problem
$$\min_{x\in\mathcal{X}}J(x) \qquad (16.6.1)$$
One result which is immediate from applying Theorem 4.4 to $-J$ is that a solution exists provided $\mathcal{X}$ is compact and $J$ is continuous. It is unfortunately the case for many interesting problems that one or both of these conditions fails to be true, thus some other considerations are needed. We'll use the following definitions.

Definition 16.1. $J$ is coercive if $J(x) \to +\infty$ as $\|x\| \to \infty$, $x \in \mathcal{X}$.

Definition 16.2. $J$ is lower semicontinuous if $J(x) \le \liminf_{n\to\infty}J(x_n)$ whenever $x_n \in \mathcal{X}$, $x_n \to x$, and weakly lower semicontinuous if $J(x) \le \liminf_{n\to\infty}J(x_n)$ whenever $x_n \in \mathcal{X}$, $x_n \rightharpoonup x$.

Definition 16.3. $J$ is convex if $J(tx+(1-t)y) \le tJ(x) + (1-t)J(y)$ whenever $0 \le t \le 1$ and $x, y \in \mathcal{X}$.

Recall also that $\mathcal{X}$ is weakly closed if $x_n \in \mathcal{X}$, $x_n \rightharpoonup x$ implies that $x \in \mathcal{X}$.

Theorem 16.8. If $J : \mathcal{X} \to \mathbb{R}$ is coercive and weakly lower semicontinuous, and $\mathcal{X} \subset H$ is weakly closed, then there exists a solution of (16.6.1). If $J$ is convex then it is only necessary to assume that $J$ is lower semicontinuous rather than weakly lower semicontinuous.

Proof: Let $d = \inf_{x\in\mathcal{X}}J(x)$.
16.6 The existence of minimizers

We turn now to some discussion of conditions which guarantee the existence of a solution of a minimization problem. We emphasize that (16.5.7) is only a necessary condition for a solution, and some different kind of argument is needed to establish that a given minimization problem actually has a solution.

Let $H$ be a Hilbert space, $\mathcal X\subset H$ an admissible subset of $H$, $J : \mathcal X\to\mathbb{R}$, and consider the problem
\[ \min_{x\in\mathcal X} J(x) \tag{16.6.1} \]
One result which is immediate from applying Theorem 4.4 to $-J$ is that a solution exists provided $\mathcal X$ is compact and $J$ is continuous. It is unfortunately the case for many interesting problems that one or both of these conditions fails to be true, thus some other considerations are needed. We'll use the following definitions.

Definition 16.1. $J$ is coercive if $J(x)\to+\infty$ as $\|x\|\to\infty$, $x\in\mathcal X$.

Definition 16.2. $J$ is lower semicontinuous if $J(x)\le\liminf_{n\to\infty}J(x_n)$ whenever $x_n\in\mathcal X$, $x_n\to x$, and weakly lower semicontinuous if $J(x)\le\liminf_{n\to\infty}J(x_n)$ whenever $x_n\in\mathcal X$, $x_n\overset{w}{\to}x$.

Definition 16.3. $J$ is convex if $J(tx+(1-t)y)\le tJ(x)+(1-t)J(y)$ whenever $0\le t\le 1$ and $x,y\in\mathcal X$.

Recall also that $\mathcal X$ is weakly closed if $x_n\in\mathcal X$, $x_n\overset{w}{\to}x$ implies that $x\in\mathcal X$.

Theorem 16.8. If $J : \mathcal X\to\mathbb{R}$ is coercive and weakly lower semicontinuous, and $\mathcal X\subset H$ is weakly closed, then there exists a solution of (16.6.1). If $J$ is convex then it is only necessary to assume that $J$ is lower semicontinuous rather than weakly lower semicontinuous.

Proof: Let $d = \inf_{x\in\mathcal X}J(x)$. If $d\ne-\infty$ then by coercivity there exists $R > 0$ such that $J(x)\ge d+1$ if $x\in\mathcal X$, $\|x\| > R$, while if $d = -\infty$ there exists $R > 0$ such that $J(x)\ge 0$ if $x\in\mathcal X$, $\|x\| > R$. Either way, the infimum of $J$ over $\mathcal X$ must be the same as the infimum over $\{x\in\mathcal X : \|x\|\le R\}$. Thus there must exist a sequence $x_n\in\mathcal X$, $\|x_n\|\le R$, such that $J(x_n)\to d$. By the second part of Theorem 13.1 and the weak closedness of $\mathcal X$, it follows that there is a subsequence $\{x_{n_k}\}$ and a point $x\in\mathcal X$ such that $x_{n_k}\overset{w}{\to}x$. In particular $J(x) = d$ must hold, since
\[ d \le J(x) \le \liminf_{k\to\infty} J(x_{n_k}) = d \tag{16.6.2} \]
Thus $d$ must be finite, and the infimum of $J$ is achieved at $x$, so $x$ is a solution of (16.6.1). The final statement is a consequence of the lemma below, which is of independent interest.

Lemma 16.1. If $J$ is convex and lower semicontinuous then it is weakly lower semicontinuous.

Proof: If
\[ E_\alpha = \{x\in H : J(x)\le\alpha\} \tag{16.6.3} \]
then $E_\alpha$ is closed, since $x_n\in E_\alpha$, $x_n\to x$ implies that $J(x)\le\liminf_{n\to\infty}J(x_n)\le\alpha$. Also, $E_\alpha$ is convex, since if $x,y\in E_\alpha$ and $t\in[0,1]$, then $J(tx+(1-t)y)\le tJ(x)+(1-t)J(y)\le t\alpha+(1-t)\alpha = \alpha$. Now by part 3 of Theorem 13.1 (Mazur's theorem) we get that $E_\alpha$ is weakly closed. Thus, if $x_n\overset{w}{\to}x$ and $\alpha = \liminf_{n\to\infty}J(x_n)$, we may find $n_k\to\infty$ such that $J(x_{n_k})\to\alpha$. If $\alpha\ne-\infty$ and $\epsilon > 0$, we must have $x_{n_k}\in E_{\alpha+\epsilon}$ for sufficiently large $n_k$, and so $x\in E_{\alpha+\epsilon}$ by the weak closedness. Since $\epsilon$ is arbitrary, we must have $J(x)\le\alpha$, as needed. The proof is similar if $\alpha = -\infty$.

16.7 The Fréchet derivative

In this final section we discuss some notions which are often used in formalizing the general ideas already used in this chapter. Let $X, Y$ be Banach spaces and $F : D(F)\subset X\to Y$ be a mapping, nonlinear in general, and let $x_0$ be an interior point of $D(F)$.

Definition 16.4. If there exists a linear operator $A\in B(X,Y)$ such that
\[ \lim_{x\to x_0}\frac{\|F(x)-F(x_0)-A(x-x_0)\|}{\|x-x_0\|} = 0 \tag{16.7.1} \]
then we say $F$ is Fréchet differentiable at $x_0$, and $A =: DF(x_0)$ is the Fréchet derivative of $F$ at $x_0$.

It is easy to see that there is at most one such operator $A$, see Exercise 21. It is also immediate that if $DF(x_0)$ exists then $F$ must be continuous at $x_0$. Note that (16.7.1) is equivalent to
\[ F(x) = F(x_0) + DF(x_0)(x-x_0) + o(\|x-x_0\|) \qquad x\in D(F) \tag{16.7.2} \]
This general concept of differentiability of a mapping at a given point amounts to the property that the mapping may be approximated in a precise sense by a linear map in the vicinity of the given point $x_0$. (Here we temporarily use the word linear to refer to what might more properly be called an affine function, $F(x_0)+A(x-x_0)$, which differs from the linear function $x\to Ax$ by the constant $F(x_0)-Ax_0$.) The difference
\[ E(x,x_0) := F(x) - F(x_0) - DF(x_0)(x-x_0) = o(\|x-x_0\|) \tag{16.7.3} \]
will be referred to as the linearization error, and approximating $F(x)$ by $F(x_0)+DF(x_0)(x-x_0)$ as linearization of $F$ at $x_0$.

Example 16.10. If $F : X\to\mathbb{R}$ is defined by $F(x) = \|x\|^2$ on a real Hilbert space $X$ then
\[ F(x) - F(x_0) = \|x_0+(x-x_0)\|^2 - \|x_0\|^2 = 2\langle x_0, x-x_0\rangle + \|x-x_0\|^2 \tag{16.7.4} \]
It follows that (16.7.2) holds with $DF(x_0) = A\in B(X,\mathbb{R}) = X^*$ given by
\[ Az = 2\langle x_0, z\rangle \tag{16.7.5} \]
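To make the defining limit (16.7.1) concrete, here is a minimal numerical sketch (ours; the dimension and step sizes are illustrative) observing that for $F(x) = \|x\|^2$ in $\mathbb{R}^N$ the linearization error of Example 16.10 is not merely $o(\|x-x_0\|)$ but exactly quadratic:

\begin{verbatim}
# Minimal sketch (illustrative): for F(x) = ||x||^2 the linearization error
# at x0 is E(x, x0) = ||x - x0||^2 exactly, consistent with (16.7.3)-(16.7.5).
import numpy as np

rng = np.random.default_rng(0)
N = 5
x0 = rng.standard_normal(N)
v = rng.standard_normal(N)              # a fixed direction

F = lambda x: np.dot(x, x)
DF = lambda z: 2*np.dot(x0, z)          # the derivative from (16.7.5)

for h in (1e-1, 1e-2, 1e-3):
    x = x0 + h*v
    E = F(x) - F(x0) - DF(x - x0)       # the linearization error
    print(h, E, np.dot(x - x0, x - x0)) # the last two columns agree
\end{verbatim}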
Example 16.11. Let $F : \mathbb{R}^N\to\mathbb{R}^M$ be defined as
\[ F(x) = F(x_1,\dots,x_N) = \begin{pmatrix} f_1(x_1,\dots,x_N) \\ \vdots \\ f_M(x_1,\dots,x_N) \end{pmatrix} \tag{16.7.6} \]
If the component functions $f_1,\dots,f_M$ are continuously differentiable on some open set containing $x_0$, then
\[ f_k(x) = f_k(x_0) + \sum_{j=1}^N \frac{\partial f_k}{\partial x_j}(x_0)(x_j-x_{0j}) + o(\|x-x_0\|) \tag{16.7.7} \]
Therefore
\[ F(x) = F(x_0) + A(x_0)(x-x_0) + o(\|x-x_0\|) \tag{16.7.8} \]
with $A(x_0)\in B(\mathbb{R}^N,\mathbb{R}^M)$ given by the Jacobian matrix of the transformation $F$ at $x_0$, i.e. the $M\times N$ matrix whose $k,j$ entry is $\frac{\partial f_k}{\partial x_j}(x_0)$. It follows that $DF(x_0)$ is the linear mapping defined by the matrix $A(x_0)$, or more informally $DF(x_0) = A(x_0)$.

Example 16.12. If $A\in B(X,Y)$ and $F(x) = Ax$ then $F(x) = F(x_0) + A(x-x_0)$, so $DF(x_0) = A$, i.e. the derivative of a linear map is itself.

Example 16.13. If $J : X\to\mathbb{R}$ is a functional on $X$, and if $DJ(x_0)$ exists, then
\[ DJ(x_0)y = \frac{d}{d\alpha}J(x_0+\alpha y)\Big|_{\alpha=0} \tag{16.7.9} \]
since
\[ J(x_0+\alpha y) - J(x_0) = DJ(x_0)(\alpha y) + E(x_0+\alpha y, x_0) \tag{16.7.10} \]
Dividing both sides by $\alpha$ and letting $\alpha\to 0$, we get (16.7.9). The right hand side of (16.7.9) has the interpretation of being the directional derivative of $J$ at $x_0$ in the $y$ direction, and in this context is often referred to as the Gateaux derivative. The above observation is simply that the Gateaux derivative coincides with the Fréchet derivative if the latter exists. From another point of view, it says that if the Fréchet derivative exists, a formula for it may be found by computing the Gateaux derivative. It is, however, possible that $J$ has a derivative in the Gateaux sense, but not in the Fréchet sense, see Exercise 22. In any case we see that if $J$ is differentiable in the Fréchet sense, then the Euler-Lagrange equation for a critical point of $J$ amounts to $DJ(x_0) = 0$.

With a notion of derivative at hand, we can introduce several additional useful concepts. We denote by $C(X,Y)$ the vector space of continuous mappings from $X$ to $Y$. The mapping $DF : x_0\to DF(x_0)$ is evidently itself a mapping between Banach spaces, namely $DF : X\to B(X,Y)$, and we say $F\in C^1(X,Y)$ if this map is continuous with respect to the usual metrics. Furthermore, we then denote by $D^2F(x_0)$ the Fréchet derivative of $DF$ at $x_0$, if it exists, in which case $D^2F(x_0)\in B(X,B(X,Y))$. There is a natural isomorphism between $B(X,B(X,Y))$ and $B(X\times X,Y)$: if $A\in B(X,B(X,Y))$ there is an associated $\tilde A\in B(X\times X,Y)$ related by
\[ \tilde A(x,z) = A(x)z \qquad x,z\in X \tag{16.7.11} \]
Thus it is natural to regard $D^2F(x_0)$ as a continuous bilinear map, and the action of the map will be denoted as $D^2F(x_0)(x,z)\in Y$. We say $F\in C^2(X,Y)$ if $x_0\to D^2F(x_0)$ is continuous. It can be shown that $D^2F(x_0)$ must be symmetric if $F\in C^2(X,Y)$.

In general, we may inductively define $D^kF(x_0)$ to be the Fréchet derivative of $D^{k-1}F$ at $x_0$, if it exists, which will then be a $k$-linear mapping of $X\times\cdots\times X$ ($k$ times) into $Y$.

Example 16.14. If $X$ is a real Hilbert space and $F(x) = \|x\|^2$, recall we have seen that $DF(x_0)z = 2\langle x_0,z\rangle$. Thus
\[ DF(x)z - DF(x_0)z = 2\langle x-x_0, z\rangle = D^2F(x_0)(x-x_0,z) + o(\|x-x_0\|) \tag{16.7.12} \]
provided $D^2F(x_0)(x,z) = 2\langle x,z\rangle$, and obviously the error term is exactly zero.

Example 16.15. If $F : \mathbb{R}^N\to\mathbb{R}$ then by Example 16.11 $DF(x_0)$ is given by the gradient of $F$, that is,
\[ DF(x_0)\in B(\mathbb{R}^N,\mathbb{R}) \qquad DF(x_0)z = \sum_{j=1}^N \frac{\partial F}{\partial x_j}(x_0)z_j \tag{16.7.13} \]
Therefore we may regard $DF : \mathbb{R}^N\to\mathbb{R}^N$, and so $D^2F(x_0)\in B(\mathbb{R}^N,\mathbb{R}^N)$ is given (now using Example 16.11 in the case $M = N$) by the Jacobian of the gradient of $F$, that is,
\[ D^2F(x_0)(z,w) = \sum_{j,k=1}^N H_{jk}(x_0)z_jw_k = \sum_{j,k=1}^N \frac{\partial^2 F}{\partial x_k\partial x_j}(x_0)z_jw_k \tag{16.7.14} \]
where $H$ is the usual Hessian matrix.

Certain calculus rules are valid and may be proved in essentially the same way as in the finite dimensional case.
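In the setting of Example 16.15, the identification of $D^2F$ with the Hessian, and its symmetry for $C^2$ maps, can be spot-checked by finite differences. A minimal sketch (ours; the test function is an arbitrary illustrative choice):

\begin{verbatim}
# Minimal sketch (illustrative): approximate the Hessian of a smooth
# F: R^N -> R by central differences and check (i) symmetry, as expected
# for a C^2 map, and (ii) the bilinear action D^2F(x0)(z,w) = z^T H w.
import numpy as np

F = lambda x: x[0]**2 * x[1] + np.sin(x[1]*x[2])   # an arbitrary C^2 test map
x0 = np.array([0.7, -0.3, 1.1])
N, h = 3, 1e-4

H = np.zeros((N, N))
for j in range(N):
    for k in range(N):
        ej, ek = np.eye(N)[j], np.eye(N)[k]
        H[j, k] = (F(x0 + h*ej + h*ek) - F(x0 + h*ej - h*ek)
                   - F(x0 - h*ej + h*ek) + F(x0 - h*ej - h*ek)) / (4*h*h)

print(np.max(np.abs(H - H.T)))          # symmetry, up to rounding error
z, w = np.ones(N), np.arange(1.0, N+1)
print(z @ H @ w)                        # the value D^2F(x0)(z,w), cf. (16.7.14)
\end{verbatim}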
Theorem 16.9. (Chain rule for Fréchet derivatives) Assume that $X, Y, Z$ are Banach spaces and
\[ F : D(F)\subset X\to Y \qquad G : D(G)\subset Y\to Z \tag{16.7.15} \]
Assume that $x_0$ is an interior point of $D(F)$, $DF(x_0)$ exists, $y_0 = F(x_0)$ is an interior point of $D(G)$, and $DG(y_0)$ exists. Then $G\circ F : X\to Z$ is Fréchet differentiable at $x_0$ and
\[ D(G\circ F)(x_0) = DG(y_0)DF(x_0) \tag{16.7.16} \]

Proof: Let
\[ E_F(x,x_0) = F(x)-F(x_0)-DF(x_0)(x-x_0) \qquad E_G(y,y_0) = G(y)-G(y_0)-DG(y_0)(y-y_0) \tag{16.7.17} \]
so that
\[ G(F(x)) - G(F(x_0)) = DG(y_0)DF(x_0)(x-x_0) + DG(y_0)E_F(x,x_0) + E_G(F(x),y_0) \tag{16.7.18} \]
for $x$ sufficiently close to $x_0$. By the differentiability of $F, G$ we have
\[ \|E_F(x,x_0)\| = o(\|x-x_0\|) \qquad \|E_G(F(x),y_0)\| = o(\|F(x)-F(x_0)\|) = o(\|x-x_0\|) \tag{16.7.19} \]
Since also $DG(y_0)$ is bounded, the conclusion follows.

It is a familiar fact in one space dimension that a bound on the derivative of a function implies Lipschitz continuity. Here is an analogue for maps on a Banach space.

Theorem 16.10. Let $X, Y$ be Banach spaces, $F : D(F)\subset X\to Y$, and let $x, x_0\in D(F)$ be such that $tx+(1-t)x_0\in D(F)$ for $t\in[0,1]$. If
\[ M := \sup_{0\le t\le 1}\|DF(tx+(1-t)x_0)\| \tag{16.7.20} \]
then
\[ \|F(x)-F(x_0)\| \le M\|x-x_0\| \tag{16.7.21} \]

Theorem 16.11. (Second derivative test) Let $X$ be a Banach space and $J\in C^2(X,\mathbb{R})$. If $J$ achieves its minimum at $x_0\in X$ then $D^2J(x_0)$ must be positive semidefinite, that is, $D^2J(x_0)(z,z)\ge 0$ for all $z\in X$. Conversely, if $x_0$ is a critical point of $J$ at which $D^2J$ is positive definite, $D^2J(x_0)(z,z) > 0$ for $z\ne 0$, then $x_0$ is a local minimum of $J$.
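For maps between Euclidean spaces, the composition formula (16.7.16) is just the matrix product of Jacobians (Example 16.11). A minimal sketch (ours; $F$ and $G$ are arbitrary illustrative choices with closed-form Jacobians) comparing the product against a directional difference quotient:

\begin{verbatim}
# Minimal sketch (illustrative): check D(G o F)(x0) = DG(F(x0)) DF(x0)
# for F: R^2 -> R^2 and G: R^2 -> R.
import numpy as np

F  = lambda x: np.array([x[0]*x[1], np.exp(x[0])])
JF = lambda x: np.array([[x[1], x[0]], [np.exp(x[0]), 0.0]])   # DF(x)
G  = lambda y: y[0]**2 + np.sin(y[1])
JG = lambda y: np.array([2*y[0], np.cos(y[1])])                # DG(y), a row

x0 = np.array([0.4, -1.2])
chain = JG(F(x0)) @ JF(x0)       # the right hand side of (16.7.16)

v, h = np.array([1.0, 2.0]), 1e-6
dq = (G(F(x0 + h*v)) - G(F(x0 - h*v))) / (2*h)   # directional derivative
print(chain @ v, dq)             # the two numbers agree closely
\end{verbatim}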
16.8 Exercises

1. Using the trial function
\[ \phi(x) = 1 - \frac{|x|^2}{R^2} \]
compute an upper bound for the first Dirichlet eigenvalue of $-\Delta$ in the ball $B(0,R)$ of $\mathbb{R}^N$. Compare to the exact value of $\lambda_1$ in dimensions 2 and 3. (Zeros of Bessel functions can be found, for example, in tables, or by means of a root finding routine in Matlab; a sketch of this comparison appears after Exercise 25 below.)

2. Consider the Sturm-Liouville problem
\[ u'' + \lambda u = 0 \quad 0<x<1 \qquad u'(0) = u(1) = 0 \]
It can be shown that the eigenvalues are the critical points of
\[ J(u) = \frac{\int_0^1 u'(x)^2\,dx}{\int_0^1 u(x)^2\,dx} \]
on the space $H = \{u\in H^1(0,1) : u(1) = 0\}$. Use the Rayleigh-Ritz method to estimate the first two eigenvalues, and compare to the exact values. Choose polynomial trial functions which resemble what the first two eigenfunctions should look like.

3. Use the result of Exercise 13 in Chapter 14 to give an alternate derivation of the fact that the Dirichlet quotient achieves its minimum at $\psi_1$. (Hint: For $u\in H_0^1(\Omega)$ compute $\|u\|^2_{H_0^1(\Omega)}$ and $\|u\|^2_{L^2(\Omega)}$ by expanding in the eigenfunction basis.)

4. Let $T$ be the integral operator
\[ Tu(x) = \int_0^1 |x-y|u(y)\,dy \]
on $L^2(0,1)$. Show that
\[ \frac{1}{3} \le \|T\| \le \frac{1}{\sqrt{6}} \]
(Suggestion: the lower bound can be obtained using a simple choice of trial function in the corresponding Rayleigh quotient.)

5. Let $A$ be an $m\times n$ real matrix, $b\in\mathbb{R}^m$, and define $J(x) = \|Ax-b\|_2^2$ for $x\in\mathbb{R}^n$. (Here $\|\cdot\|_2$ denotes the 2 norm, the usual Euclidean distance on $\mathbb{R}^m$.)
a) What is the Euler-Lagrange equation for the problem of minimizing $J$?
b) Under what circumstances does the Euler-Lagrange equation have a unique solution?
c) Under what circumstances will the solution of the Euler-Lagrange equation also be a solution of $Ax = b$?

6. Prove the version of the Poincaré inequality stated in Proposition 16.1. (Suggestions: If no such $C$ exists, show that we can find a sequence $u_k\in H^1_*(\Omega)$ with $\|u_k\|_{L^2(\Omega)} = 1$ such that $\|\nabla u_k\|_{L^2(\Omega)}\le\frac1k$. Using Rellich's theorem obtain a convergent subsequence whose limit must have contradictory properties.)

7. Fill in the details of the following alternate proof that there exists a weak solution of the Neumann problem
\[ -\Delta u = f \quad x\in\Omega \qquad \frac{\partial u}{\partial n} = 0 \quad x\in\partial\Omega \tag{NP} \]
(as usual, $\Omega$ is a bounded open set in $\mathbb{R}^N$) provided $f\in L^2(\Omega)$ and $\int_\Omega f(x)\,dx = 0$:
a) Show that for any $\epsilon > 0$ there exists a (suitably defined) unique weak solution $u_\epsilon$ of
\[ -\Delta u_\epsilon + \epsilon u_\epsilon = f \quad x\in\Omega \qquad \frac{\partial u_\epsilon}{\partial n} = 0 \quad x\in\partial\Omega \]
b) Show that $\int_\Omega u_\epsilon(x)\,dx = 0$ for any such $\epsilon$.
c) Show that there exists $u\in H^1(\Omega)$ such that $u_\epsilon\to u$ weakly in $H^1(\Omega)$ as $\epsilon\to 0$, and $u$ is a weak solution of (NP).

8. Consider a Lagrangian of the form $L = L(u,p)$ (i.e. it happens not to depend on the space variable $x$) when $N = 1$. Show that if $u$ is a solution of the Euler-Lagrange equation then
\[ L(u,u') - u'\frac{\partial L}{\partial p}(u,u') = C \]
for some constant $C$. In this way we are able to achieve a reduction of order from a second order ODE to a first order ODE. Use this observation to redo the derivation of the solution of the hanging chain problem.

9. Find the function $u(x)$ which minimizes
\[ J(u) = \int_0^1 (u'(x)-u(x))^2\,dx \]
among all functions $u\in H^1(0,1)$ satisfying $u(0) = 0$, $u(1) = 1$.

10. The area of a surface obtained by revolving the graph of $y = u(x)$, $0<x<1$, about the $x$ axis is
\[ J(u) = 2\pi\int_0^1 u(x)\sqrt{1+u'(x)^2}\,dx \]
Assume that $u$ is required to satisfy $u(0) = a$, $u(1) = b$, where $0 < a < b$.
a) Find the Euler-Lagrange equation for the problem of minimizing this surface area.
b) Show that
\[ \frac{u(u')^2}{\sqrt{1+(u')^2}} - u\sqrt{1+(u')^2} \]
is a constant function for any such minimal surface. (Hint: use Exercise 8.)
c) Solve the first order ODE in part b) to find the minimal surface. Make sure to compute all constants of integration.

11. Find a functional on $H^1(\Omega)$ for which the Euler-Lagrange equation is
\[ -\Delta u = f \quad x\in\Omega \qquad -\frac{\partial u}{\partial n} = k(x)u \quad x\in\partial\Omega \]

12. Find the Euler-Lagrange equation for minimizing
\[ J(u) = \int_\Omega |\nabla u(x)|^q\,dx \]
subject to the constraint
\[ H(u) = \int_\Omega |u(x)|^r\,dx = 1 \]
where $q, r > 1$.

13. Let $\Omega\subset\mathbb{R}^N$ be a bounded open set, $q\in C(\overline\Omega)$, $q(x) > 0$ in $\overline\Omega$, and
\[ J(u) = \frac{\int_\Omega |\nabla u(x)|^2\,dx}{\int_\Omega q(x)u(x)^2\,dx} \]
a) Show that any nonzero critical point $u\in H_0^1(\Omega)$ of $J$ is a solution of the eigenvalue problem
\[ -\Delta u = \lambda q(x)u \quad x\in\Omega \qquad u = 0 \quad x\in\partial\Omega \]
b) Show that all eigenvalues are positive.
c) If $q(x)\ge 1$ in $\Omega$ and $\lambda_1$ denotes the smallest eigenvalue, show that $\lambda_1 < \lambda_1^*$, where $\lambda_1^*$ is the corresponding first eigenvalue of $-\Delta$ in $\Omega$.

14. Define
\[ J(u) = \frac{1}{2}\int_\Omega (\Delta u)^2\,dx + \int_\Omega fu\,dx \]
What PDE problem is satisfied by a critical point of $J$ over $\mathcal X = H^2(\Omega)\cap H_0^1(\Omega)$? Make sure to specify any relevant boundary conditions. What is different if instead we let $\mathcal X = H_0^2(\Omega)$?

15. Let $H$ be a Hilbert space and $J : H\to\mathbb{R}$. Recall that $J$ is lower semicontinuous if $J(x)\le\liminf_{n\to\infty}J(x_n)$ whenever $x_n\to x$, and is weakly lower semicontinuous if the same is true whenever $x_n\overset{w}{\to}x$. We say $J$ is coercive if $\lim_{\|x\|\to\infty}J(x) = +\infty$.
a) If $J$ is weakly lower semicontinuous and coercive, show that $\inf_{x\in H}J(x)$ is finite.
b) If $J$ is weakly lower semicontinuous and coercive, show that $\min_{x\in H}J(x)$ has a solution.
c) Show that if $f\in L^2(\Omega)$ then
\[ J(u) = \frac{1}{2}\int_\Omega |\nabla u(x)|^2\,dx - \int_\Omega f(x)u(x)\,dx \]
is weakly lower semicontinuous and coercive on $H_0^1(\Omega)$.

16. Let $\Omega$ be a bounded open set in $\mathbb{R}^N$. If $p < N$, a special case of the Sobolev embedding theorem states that there exists a constant $C = C(\Omega,p,q)$ such that
\[ \|u\|_{L^q(\Omega)} \le C\|u\|_{W^{1,p}(\Omega)} \qquad 1\le q\le\frac{Np}{N-p} \tag{16.8.1} \]
Use this to show that (16.4.19) holds for $N\ge 3$, $p = \frac{2N}{N+2}$, and so the problem (16.4.10) has a solution obtainable by the variational method, for all $f$ in this $L^p$ space.
17. Formulate and derive a replacement for (16.5.7) for the case that $u$ is a vector function.

18. Redo Dido's problem (Example 16.9) but allowing for an arbitrary curve $(x(t),y(t))$ in the plane connecting the points $(0,0)$ and $(1,0)$. Since there are now 2 unknown functions, the result of Exercise 17 will be relevant.

19. Show that if $\Omega$ is a bounded domain in $\mathbb{R}^N$ and $f\in L^2(\Omega)$, then the problem of minimizing
\[ J(u) = \frac{1}{2}\int_\Omega |\nabla u|^2\,dx - \int_\Omega fu\,dx \]
over $H_0^1(\Omega)$ satisfies all of the conditions of Theorem 16.8. What goes wrong if we replace $H_0^1(\Omega)$ by $H^1(\Omega)$?

20. We say that $J : \mathcal X\to\mathbb{R}$ is strictly convex if
\[ J(tx+(1-t)y) < tJ(x)+(1-t)J(y) \qquad x,y\in\mathcal X \quad x\ne y \quad 0<t<1 \]
If $J$ is strictly convex, show that the minimization problem (16.6.1) has at most one solution.

21. Show that the Fréchet derivative, if it exists, must be unique.

22. If $F : \mathbb{R}^2\to\mathbb{R}$ is defined by
\[ F(x,y) = \begin{cases} \dfrac{xy^2}{x^2+y^4} & (x,y)\ne(0,0) \\[1mm] 0 & (x,y)=(0,0) \end{cases} \]
show that $F$ is Gateaux differentiable but not Fréchet differentiable at the origin.

23. Let $F$ be a $C^1$ mapping of a Banach space $X$ into itself. Give a formal derivation of Newton's method
\[ x_{n+1} = x_n - DF(x_n)^{-1}(F(x_n)-y) \]
for solving $F(x) = y$.

24. If $A$ is a bounded linear operator on a Banach space $X$, discuss the differentiability of the map $t\to e^{tA}$, regarded as a mapping from $\mathbb{R}$ into $B(X)$. (Recall that the exponential of a bounded linear operator was defined in Exercise 9 of Chapter 5.)

25. Prove the second derivative test, Theorem 16.11.
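The comparison requested in Exercise 1 can be carried out as follows. This sketch is ours (Python rather than Matlab), and the closed form $N(N+4)/(2R^2)$ for the Rayleigh quotient of the trial function $\phi$ is obtained by an elementary radial integration using (18.3.1); the exact value is $j^2_{N/2-1,1}/R^2$, where $j_{\nu,1}$ denotes the first positive zero of the Bessel function $J_\nu$.

\begin{verbatim}
# Sketch for Exercise 1 (ours): Rayleigh quotient of phi(x) = 1 - |x|^2/R^2,
# which works out to N(N+4)/(2 R^2), versus the exact first Dirichlet
# eigenvalue j_{N/2-1,1}^2 / R^2 of -Laplacian in B(0,R).
import numpy as np
from scipy.optimize import brentq
from scipy.special import jv

R = 1.0
for N in (2, 3):
    upper = N*(N + 4)/(2*R**2)              # the trial-function upper bound
    nu = N/2 - 1
    j1 = brentq(lambda x: jv(nu, x), 0.1, nu + 4.0)  # first zero of J_nu
    exact = (j1/R)**2
    print(N, upper, exact)                  # e.g. N = 2: 6.0 vs ~5.7832
\end{verbatim}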
Chapter 17

Weak solutions of partial differential equations

17.1 Lax-Milgram theorem

The main goal of this final chapter is to develop further tools which will allow us to answer basic questions about second order linear PDEs with variable coefficients. Beginning our discussion with the elliptic case, there are actually two natural ways to write such an equation, namely
\[ Lu := -\sum_{j,k=1}^N a_{jk}(x)\frac{\partial^2 u}{\partial x_j\partial x_k} + \sum_{j=1}^N b_j(x)\frac{\partial u}{\partial x_j} + c(x)u = f(x) \quad x\in\Omega \tag{17.1.1} \]
and
\[ Lu := -\sum_{j,k=1}^N \frac{\partial}{\partial x_j}\left(a_{jk}(x)\frac{\partial u}{\partial x_k}\right) + \sum_{j=1}^N b_j(x)\frac{\partial u}{\partial x_j} + c(x)u = f(x) \quad x\in\Omega \tag{17.1.2} \]
A second order PDE is said to be elliptic if it can be written in one of the forms (17.1.1), (17.1.2) for which there exists a constant $\theta > 0$ such that
\[ \sum_{j,k=1}^N a_{jk}(x)\xi_j\xi_k \ge \theta|\xi|^2 \qquad \forall\xi\in\mathbb{R}^N \tag{17.1.3} \]
That is to say, the matrix with entries $a_{jk}(x)$ is uniformly positive definite on $\Omega$. It is easy to verify that this use of the term 'elliptic' is consistent with all previous usages. We will in addition always assume that the coefficients $a_{jk}, b_j, c$ belong to $L^\infty(\Omega)$.

The structures of these two equations are referred to respectively as non-divergence form and divergence form, since in the second case the leading order sum could be written as $\nabla\cdot\mathbf v$ if $\mathbf v$ is the vector field with components $v_j = \sum_{k=1}^N a_{jk}u_{x_k}$. The minus sign in the leading order term is included for later convenience, for the same reason that Poisson's equation is typically written as $-\Delta u = f$. Also for notational simplicity we will from here on adopt the summation convention, that is, repeated indices are summed.

Thus the two forms of the PDE may be written instead as
\[ -a_{jk}(x)u_{x_kx_j} + b_j(x)u_{x_j} + c(x)u = f(x) \quad x\in\Omega \tag{17.1.4} \]
\[ -(a_{jk}(x)u_{x_k})_{x_j} + b_j(x)u_{x_j} + c(x)u = f(x) \quad x\in\Omega \tag{17.1.5} \]
There is obviously an equivalence between the two forms provided the leading coefficients $a_{jk}$ are differentiable in an appropriate sense, so that
\[ (a_{jk}(x)u_{x_k})_{x_j} = a_{jk}(x)u_{x_kx_j} + (a_{jk})_{x_j}u_{x_k} \tag{17.1.6} \]
is valid, but one of the main reasons to maintain the distinction is that there may be situations where we do not want to make any such differentiability assumption. In such a case we cannot expect classical solutions to exist, and will rely instead on a notion of weak solution, which generalizes (14.4.8) for the case of the Poisson equation. A second reason, therefore, for direct consideration of the PDE in divergence form is that a suitable definition of weak solution arises in a very natural way. The formal result of multiplying the equation by a test function $v$ and integrating over $\Omega$ is that
\[ \int_\Omega [a_{jk}(x)u_{x_k}(x)v_{x_j}(x) + b_j(x)u_{x_j}(x)v(x) + c(x)u(x)v(x)]\,dx = \int_\Omega f(x)v(x)\,dx \tag{17.1.7} \]
If we also wish to impose the Dirichlet boundary condition $u = 0$ for $x\in\partial\Omega$ then, as in the case of the Laplace equation, we interpret this as the requirement that $u\in H_0^1(\Omega)$. Assuming that $f\in L^2(\Omega)$, the integrals in (17.1.7) are all defined and finite for $v\in H_0^1(\Omega)$, and so we are motivated to make the following definition.

Definition 17.1. If $f\in L^2(\Omega)$ we say that $u$ is a weak solution of the Dirichlet problem
\[ -(a_{jk}(x)u_{x_k})_{x_j} + b_j(x)u_{x_j} + c(x)u = f(x) \quad x\in\Omega \tag{17.1.8} \]
\[ u = 0 \quad x\in\partial\Omega \tag{17.1.9} \]
if $u\in H_0^1(\Omega)$ and (17.1.7) holds for every $v\in H_0^1(\Omega)$.

In deciding whether a certain definition of weak solution for a PDE is an appropriate one, the following considerations should be borne in mind:
• If the definition is too narrow, then a solution need not exist.
• If the definition is too broad, then many solutions may exist.
Thus if both existence and uniqueness can be proved, it is an indication that the balance is just right, i.e. the requirements for a weak solution are neither too narrow nor too broad, so that the definition is suitable. Here is a special case for which uniqueness is simple to prove.

Proposition 17.1. Let $\Omega$ be a bounded domain in $\mathbb{R}^N$. There exists $\epsilon > 0$, depending only on the domain $\Omega$ and the ellipticity constant $\theta$ in (17.1.3), such that if
\[ c(x)\ge 0 \quad x\in\Omega \qquad\text{and}\qquad \max_j\|b_j\|_{L^\infty(\Omega)} < \epsilon \tag{17.1.10} \]
then there is at most one weak solution of the Dirichlet problem (17.1.8)-(17.1.9).

Proof: If $u_1, u_2$ are both weak solutions then $u = u_1-u_2$ is a weak solution with $f\equiv 0$. We may then choose $v = u$ in (17.1.7) to get
\[ \int_\Omega [a_{jk}(x)u_{x_k}(x)u_{x_j}(x) + b_j(x)u_{x_j}(x)u(x) + c(x)u(x)^2]\,dx = 0 \tag{17.1.11} \]
By the ellipticity assumption we have $a_{jk}u_{x_k}u_{x_j}\ge\theta|\nabla u|^2$, and recalling that $c\ge 0$ there results
\[ \theta\|u\|^2_{H_0^1(\Omega)} \le \epsilon\|u\|_{L^2(\Omega)}\|u\|_{H_0^1(\Omega)} \tag{17.1.12} \]
Now if $C = C(\Omega)$ denotes a constant for which Poincaré's inequality (14.4.10) holds, we obtain either $u\equiv 0$ or $\theta\le\epsilon C$. Thus any $\epsilon < \theta/C$ has the required properties.

The smallness restriction on the $b_j$'s can be weakened considerably, but the nonnegativity assumption on $c(x)$ is more essential. For example, in the case of
\[ -\Delta u + c(x)u = 0 \quad x\in\Omega \qquad u = 0 \quad x\in\partial\Omega \tag{17.1.13} \]
uniqueness fails if $c(x) = -\lambda_n$, where $\lambda_n$ is any Dirichlet eigenvalue of $-\Delta$, since then any corresponding eigenfunction is a nontrivial solution.
Now turning to the question of the existence of weak solutions, our strategy will be to adapt the argument that occurs in Proposition 14.2 showing that the operator $T$ is onto. Consider first the special case
\[ -(a_{jk}(x)u_{x_k})_{x_j} = f(x) \quad x\in\Omega \qquad u = 0 \quad x\in\partial\Omega \tag{17.1.14} \]
where as before we assume the ellipticity property (17.1.3), $a_{jk}\in L^\infty(\Omega)$, $f\in L^2(\Omega)$, and in addition the symmetry property $a_{jk} = a_{kj}$ for all $j, k$. Define
\[ A[u,v] = \int_\Omega a_{jk}(x)u_{x_j}(x)v_{x_k}(x)\,dx \tag{17.1.15} \]
We claim that $A$ is a valid inner product on the real Hilbert space $H_0^1(\Omega)$. Note that
\[ A[u,v] \le C\|u\|_{H_0^1(\Omega)}\|v\|_{H_0^1(\Omega)} \tag{17.1.16} \]
for some constant $C$ depending on $\max_{j,k}\|a_{jk}\|_{L^\infty(\Omega)}$, so $A[u,v]$ is defined for all $u,v\in H_0^1(\Omega)$, and
\[ A[u,u] \ge \theta\|u\|^2_{H_0^1(\Omega)} \tag{17.1.17} \]
by the ellipticity assumption. Thus the inner product axioms [H1] and [H2] hold. The symmetry axiom [H4] follows from the assumed symmetry of $a_{jk}$, and the remaining inner product axioms are obvious. If we let $\psi(v) = \int_\Omega fv\,dx$ then, just as in the proof of Proposition 14.2, we have that $\psi$ is a continuous linear functional on $H_0^1(\Omega)$. We conclude that there exists $u\in H_0^1(\Omega)$ such that $A[u,v] = \psi(v)$ for every $v\in H_0^1(\Omega)$, which is precisely the definition of weak solution of (17.1.14).

The argument just given seems to rely in an essential way on the symmetry assumption, but it turns out that with a somewhat different proof we can eliminate that hypothesis. This result, in its most abstract form, is the so-called Lax-Milgram theorem. Note that even if we had no objection to the symmetry assumption on $a_{jk}$, it would still not be possible to allow for the presence of first order terms in any obvious way in the above argument. For simplicity, and because it is all that is needed in most applications, we will from now on assume that all abstract and function spaces are real, that is, only real valued functions and scalars are allowed.

Definition 17.2. If $H$ is a Hilbert space and $A : H\times H\to\mathbb{R}$, we say $A$ is
• bilinear if it is linear in each argument separately,
• bounded if there exists a constant $M$ such that $A[u,v]\le M\|u\|\,\|v\|$ for all $u,v\in H$,
• coercive if there exists $\gamma > 0$ such that $A[u,u]\ge\gamma\|u\|^2$ for all $u\in H$.

Theorem 17.1. (Lax-Milgram) Assume that $A$ is bilinear, bounded and coercive on the Hilbert space $H$, and $\psi$ belongs to the dual space $H^*$. Then there exists a unique $w\in H$ such that
\[ A[x,w] = \psi(x) \qquad \forall x\in H \tag{17.1.18} \]

Proof: Let
\[ E = \{y\in H : \exists w\in H \text{ such that } A[x,w] = \langle x,y\rangle\ \forall x\in H\} \tag{17.1.19} \]
If $w$ is the element corresponding to some $y\in E$ we then have
\[ \gamma\|w\|^2 \le A[w,w] = \langle w,y\rangle \le \|w\|\,\|y\| \tag{17.1.20} \]
so $\gamma\|w\|\le\|y\|$. In particular $w$ is uniquely determined by $y$, and $E$ is closed. We claim that $E = H$. If not, then there exists $z\in E^\perp$, $z\ne 0$. If we let $\phi(x) = A[x,z]$ then $\phi\in H^*$, so by the Riesz Representation Theorem 6.6 there exists $u\in H$ such that $\phi(x) = \langle x,u\rangle$, or $A[x,z] = \langle x,u\rangle$, for all $x$. Thus $u\in E$, but since $z\in E^\perp$ we find $\gamma\|z\|^2\le A[z,z] = \langle z,u\rangle = 0$, a contradiction.

Finally, if $\psi\in H^*$, using Theorem 6.6 again we obtain $y\in H$ such that $\psi(x) = \langle x,y\rangle$ for every $x$, and since $y\in E = H$ there exists $w\in H$ such that $\psi(x) = A[x,w]$, as needed. The element $w$ is unique, since if $A[x,w_1] = A[x,w_2]$ for all $x\in H$ then choosing $x = w_1-w_2$ we get $A[x,x] = 0$ and consequently $x = w_1-w_2 = 0$.

Since there is no need for any assumption of symmetry, we can use the Lax-Milgram theorem to prove a more general result about the existence of weak solutions, under the same assumptions we used to prove uniqueness above.
Theorem 17.2. Let $\Omega$ be a bounded domain in $\mathbb{R}^N$. There exists $\epsilon > 0$, depending only on $\Omega$ and the ellipticity constant $\theta$ in (17.1.3), such that if $c(x)\ge 0$ in $\Omega$ and $\max_j\|b_j\|_{L^\infty(\Omega)} < \epsilon$, then there exists a unique weak solution of the Dirichlet problem (17.1.8)-(17.1.9) for any $f\in L^2(\Omega)$.

Proof: In the real Hilbert space $H = H_0^1(\Omega)$ let
\[ A[u,v] = \int_\Omega [a_{jk}(x)u_{x_k}(x)v_{x_j}(x) + b_j(x)u_{x_j}(x)v(x) + c(x)u(x)v(x)]\,dx \tag{17.1.21} \]
for $u,v\in H_0^1(\Omega)$. It is immediate that $A$ is bilinear and bounded. By the ellipticity and other assumptions made on the coefficients we get
\[ A[u,u] = \int_\Omega [a_{jk}(x)u_{x_k}(x)u_{x_j}(x) + b_j(x)u_{x_j}(x)u(x) + c(x)u(x)^2]\,dx \tag{17.1.22} \]
\[ \ge \theta\|u\|^2_{H_0^1(\Omega)} - \epsilon\|u\|_{L^2(\Omega)}\|u\|_{H_0^1(\Omega)} \tag{17.1.23} \]
\[ \ge \gamma\|u\|^2_{H_0^1(\Omega)} \tag{17.1.24} \]
if $\gamma = \theta/2$ and $\epsilon = \gamma/C$, where $C = C(\Omega)$ is a constant for which the Poincaré inequality (14.4.10) is valid. Finally, since $\psi(u) = \int_\Omega fu\,dx$ defines an element of $H^*$, the conclusion follows from the Lax-Milgram theorem.

As another application of the Lax-Milgram theorem, we can establish the existence of eigenvalues and eigenfunctions of more general elliptic operators. Let
\[ Lu = -(a_{jk}u_{x_k})_{x_j} \tag{17.1.25} \]
Here we will assume the ellipticity condition (17.1.3), $a_{jk}\in L^\infty(\Omega)$, and the symmetry property $a_{jk} = a_{kj}$. For $f\in L^2(\Omega)$ let $v = Sf$ be the unique weak solution $v\in H_0^1(\Omega)$ of
\[ Lv = f \quad x\in\Omega \qquad v = 0 \quad x\in\partial\Omega \tag{17.1.26} \]
whose existence is guaranteed by Theorem 17.2, i.e. $v\in H_0^1(\Omega)$ and $A[v,w] = \int_\Omega fw\,dx$ for all $w\in H_0^1(\Omega)$, where
\[ A[v,w] = \int_\Omega a_{jk}v_{x_k}w_{x_j}\,dx \tag{17.1.27} \]
Choosing $w = v$, using the ellipticity and the Poincaré inequality gives
\[ \theta\|v\|^2_{H_0^1(\Omega)} \le C\|f\|_{L^2(\Omega)}\|v\|_{H_0^1(\Omega)} \tag{17.1.28} \]
Thus $S : L^2(\Omega)\to H_0^1(\Omega)$ is bounded, and consequently compact as a linear operator on $L^2(\Omega)$ by Rellich's theorem. We claim next that $S$ is self-adjoint on $L^2(\Omega)$. To see this, suppose $f,g\in L^2(\Omega)$, $v = Sf$ and $w = Sg$. Then
\[ \langle Sf,g\rangle = \langle v,g\rangle = \langle g,v\rangle = A[w,v] \tag{17.1.29} \]
\[ \langle f,Sg\rangle = \langle f,w\rangle = A[v,w] \tag{17.1.30} \]
But $A[w,v] = A[v,w]$ by our symmetry assumption, so it follows that $S$ is self-adjoint. It then follows from Theorem 13.10 that there exists a basis $\{u_n\}_{n=1}^\infty$ of $L^2(\Omega)$ consisting of eigenfunctions of $S$, corresponding to real eigenvalues $\{\mu_n\}_{n=1}^\infty$, $\mu_n\to 0$. The eigenvalues of $S$ are all strictly positive: if $Su = \mu u$ with $u\ne 0$ then $\mu\ne 0$ (since $\mu = 0$ would give $\langle u,w\rangle = A[Su,w] = 0$ for all $w$, i.e. $u = 0$), and choosing $w = Su = \mu u$ in the definition of $S$ gives
\[ \mu^2 A[u,u] = A[\mu u,\mu u] = \langle u,\mu u\rangle = \mu\int_\Omega u^2\,dx \]
so that $\mu A[u,u] = \|u\|^2_{L^2(\Omega)} > 0$, and hence $\mu > 0$. If $\lambda_n = \mu_n^{-1}$ then $u_n$ is evidently a weak solution of
\[ Lu_n = \lambda_n u_n \quad x\in\Omega \qquad u_n = 0 \quad x\in\partial\Omega \tag{17.1.32} \]
and we may assume the ordering
\[ 0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \to +\infty \tag{17.1.33} \]
The existence of an orthonormal basis of eigenfunctions now follows from Theorem 13.10. To summarize, we have obtained the following generalization of Theorem 14.5.

Theorem 17.3. Assume that the ellipticity condition (17.1.3) holds, $a_{jk} = a_{kj}$, and $a_{jk}\in L^\infty(\Omega)$ for all $j,k$. Then the operator
\[ Tu = -(a_{jk}(x)u_{x_j})_{x_k} \qquad D(T) = \{u\in H_0^1(\Omega) : (a_{jk}(x)u_{x_j})_{x_k}\in L^2(\Omega)\} \tag{17.1.34} \]
has an infinite sequence of real eigenvalues of finite multiplicity,
\[ 0 < \lambda_1 \le \lambda_2 \le \lambda_3 \le \dots \qquad \lambda_n\to+\infty \tag{17.1.35} \]
and corresponding eigenfunctions $\{\psi_n\}_{n=1}^\infty$ which may be chosen as an orthonormal basis of $L^2(\Omega)$.
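In one space dimension the conclusion of Theorem 17.3 is easy to observe numerically. The following sketch (ours; the coefficient $a$ is an arbitrary illustrative choice with $a\ge 1/2$) discretizes $-(a(x)u')'$ on $(0,1)$ with zero boundary values in a symmetry-preserving way and checks that the computed eigenvalues are positive, increasing, and reduce to $n^2\pi^2$ when $a\equiv 1$:

\begin{verbatim}
# Minimal sketch (illustrative): finite-difference eigenvalues of the
# divergence-form operator T u = -(a(x) u')' on (0,1), u(0) = u(1) = 0.
# The flux form with midpoint values a(x_{i+1/2}) keeps the matrix symmetric
# positive definite, mirroring the symmetry assumption a_jk = a_kj.
import numpy as np

a = lambda x: 1.0 + 0.5*np.sin(2*np.pi*x)     # elliptic: a(x) >= 1/2
n = 400
h = 1.0/n
am = a((np.arange(n) + 0.5)*h)                # midpoint values a(x_{i+1/2})

T = np.zeros((n-1, n-1))
for i in range(n-1):
    T[i, i] = (am[i] + am[i+1])/h**2
    if i+1 < n-1:
        T[i, i+1] = T[i+1, i] = -am[i+1]/h**2

lam = np.linalg.eigvalsh(T)
print(lam[:4])                       # positive and increasing, cf. (17.1.35)
print((np.pi*np.arange(1, 5))**2)    # the a = 1 values n^2 pi^2, for comparison
\end{verbatim}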
As an immediate application, we can derive a formal series solution for the parabolic problem with time independent coefficients
\[ u_t - (a_{jk}(x)u_{x_j})_{x_k} = 0 \quad x\in\Omega \quad t>0 \tag{17.1.36} \]
\[ u(x,t) = 0 \quad x\in\partial\Omega \quad t>0 \tag{17.1.37} \]
\[ u(x,0) = f(x) \quad x\in\Omega \tag{17.1.38} \]
Making the same assumptions on $a_{jk}$ as in the Theorem, so that an orthonormal basis $\{\psi_n\}_{n=1}^\infty$ of eigenfunctions exists in $L^2(\Omega)$, we can obtain the solution in the form
\[ u(x,t) = \sum_{n=1}^\infty \langle f,\psi_n\rangle e^{-\lambda_n t}\psi_n(x) \tag{17.1.39} \]
in precisely the same way as was done to derive (14.4.35) for the heat equation. The smallest eigenvalue $\lambda_1$ again plays a distinguished role in determining the overall decay rate for typical solutions.

17.2 More function spaces

In this section we will introduce some more useful function spaces. Recall that the Sobolev space $W_0^{k,p}(\Omega)$ is the closure of $C_0^\infty(\Omega)$ in the norm of $W^{k,p}(\Omega)$.

Definition 17.3. We define the negative order Sobolev space $W^{-k,p'}(\Omega)$ to be the dual space of $W_0^{k,p}(\Omega)$. That is to say,
\[ W^{-k,p'}(\Omega) = \{T\in\mathcal D'(\Omega) : \exists C \text{ such that } |T\phi|\le C\|\phi\|_{W^{k,p}(\Omega)}\ \forall\phi\in C_0^\infty(\Omega)\} \tag{17.2.1} \]
We emphasize that we are defining the dual of $W_0^{k,p}(\Omega)$, not $W^{k,p}(\Omega)$. The notation suggests that $T$ is the '$-k$'th derivative (i.e. a $k$-fold integral) of a function in $L^{p'}(\Omega)$, where $p'$ is the usual Hölder conjugate exponent, and we will make some more precise statement along these lines below. When $p = 2$ the alternative notation $H^{-k}(\Omega)$ is commonly used. The same notation was also used in the case $\Omega = \mathbb{R}^N$, in which case a definition using the Fourier transform was given; one can check that the definitions are equivalent. The norm of an element in $W^{-k,p'}(\Omega)$ is defined in the usual way for dual spaces, namely
\[ \|T\|_{W^{-k,p'}(\Omega)} = \sup_{\phi\ne 0}\frac{|T\phi|}{\|\phi\|_{W^{k,p}(\Omega)}} \tag{17.2.2} \]
If $\phi\in W_0^{k,p}(\Omega)$ and $T\in W^{-k,p'}(\Omega)$ then it is common to use the 'inner product-like' notation $\langle T,\phi\rangle$ in place of $T\phi$, and we may refer to this value as the duality pairing of $T$ and $\phi$.

Example 17.1. If $x_0\in(a,b)$ and $T\phi = \phi(x_0)$, i.e. $T = \delta_{x_0}$, then $T\in H^{-1}(a,b)$. To see this, observe that for $\phi\in C_0^\infty(a,b)$ we have obviously
\[ |T\phi| = |\phi(x_0)| = \left|\int_a^{x_0}\phi'(x)\,dx\right| \le \sqrt{b-a}\,\|\phi'\|_{L^2(a,b)} \le \sqrt{b-a}\,\|\phi\|_{H^1(a,b)} \tag{17.2.3} \]
It is essential that $\Omega = (a,b)$ is one dimensional here. If $\Omega\subset\mathbb{R}^N$ and $x_0\in\Omega$ it can be shown that $\delta_{x_0}\in W^{-k,p'}(\Omega)$ if $k > N/p$, see Exercise ( ).

Let us next observe that in the proof of Theorem 17.2, the only property of $f$ which we actually used was that $\psi(u) = \int_\Omega fu\,dx$ defines an element in the dual space of $H_0^1(\Omega)$. Thus it should be possible to obtain similar conclusions if we replace the assumption $f\in L^2(\Omega)$ by $f\in H^{-1}(\Omega)$. To make this precise, we will first make the obvious definition that for $T\in H^{-1}(\Omega)$ and $L$ a divergence form operator as in (17.1.2), with associated bilinear form (17.1.21), $u$ is a weak solution of
\[ Lu = T \quad x\in\Omega \qquad u = 0 \quad x\in\partial\Omega \tag{17.2.4} \]
provided
\[ u\in H_0^1(\Omega) \qquad A[u,v] = Tv \quad \forall v\in H_0^1(\Omega) \tag{17.2.5} \]
We then have

Theorem 17.4. There exists $\epsilon > 0$ such that if $c(x)\ge 0$ in $\Omega$ and $\max_j\|b_j\|_{L^\infty(\Omega)} < \epsilon$, then there exists a unique weak solution of the Dirichlet problem (17.2.4) for any $T\in H^{-1}(\Omega)$.

Corollary 17.1. If $T\in H^{-1}(\Omega)$ and $u\in H_0^1(\Omega)$ is the corresponding weak solution of
\[ -\Delta u = T \quad x\in\Omega \qquad u = 0 \quad x\in\partial\Omega \tag{17.2.6} \]
then
\[ \|u\|_{H_0^1(\Omega)} = \|T\|_{H^{-1}(\Omega)} \tag{17.2.7} \]

Proof: The definition of weak solution here is
\[ \int_\Omega \nabla u\cdot\nabla v\,dx = Tv \qquad \forall v\in H_0^1(\Omega) \tag{17.2.8} \]
so it follows that
\[ |Tv| \le \|u\|_{H_0^1(\Omega)}\|v\|_{H_0^1(\Omega)} \tag{17.2.9} \]
and therefore $\|T\|_{H^{-1}(\Omega)}\le\|u\|_{H_0^1(\Omega)}$.
But choosing $v = u$ in the same identity gives
\[ \|u\|^2_{H_0^1(\Omega)} = Tu \le \|T\|_{H^{-1}(\Omega)}\|u\|_{H_0^1(\Omega)} \tag{17.2.10} \]
and the conclusion follows.

In particular we see that the map $T\to u$, which we will denote by $(-\Delta)^{-1}$, is an isometric isomorphism of $H^{-1}(\Omega)$ onto $H_0^1(\Omega)$, and thus is a specific example of the correspondence between a Hilbert space and its dual space, as is guaranteed by Theorem 6.6. Using this map we can also give a convenient characterization of $H^{-1}(\Omega)$.

Corollary 17.2. $T\in H^{-1}(\Omega)$ if and only if there exist $f_1,\dots,f_N\in L^2(\Omega)$ such that
\[ T = \sum_{j=1}^N \frac{\partial f_j}{\partial x_j} \tag{17.2.11} \]
in the sense of distributions on $\Omega$.

Proof: Given $T\in H^{-1}(\Omega)$ we let $u = (-\Delta)^{-1}T\in H_0^1(\Omega)$, in which case $f_j := -u_{x_j}$ has the required properties. Conversely, if $f_1,\dots,f_N\in L^2(\Omega)$ are given and $T$ is defined as a distribution by (17.2.11), it follows that
\[ T\phi = -\sum_{j=1}^N \int_\Omega f_j\phi_{x_j}\,dx \tag{17.2.12} \]
for any test function $\phi$. Therefore
\[ |T\phi| \le \sum_{j=1}^N \|f_j\|_{L^2(\Omega)}\|\phi_{x_j}\|_{L^2(\Omega)} \le C\|\phi\|_{H_0^1(\Omega)} \tag{17.2.13} \]
which implies that $T\in H^{-1}(\Omega)$.

The spaces $W^{-k,p'}$ for finite $p\ne 2$ can be characterized in a similar way, see Theorem 3.10 of [1].

A second kind of space we introduce arises very naturally in cases when there is a distinguished variable, such as time $t$ in the heat equation or wave equation. If $X$ is any Banach space and $[a,b]\subset\mathbb{R}$, we denote
\[ C([a,b]:X) = \{f : [a,b]\to X : f \text{ is continuous on } [a,b]\} \tag{17.2.14} \]
Continuity here is with respect to the obvious topologies, i.e. for any $\epsilon > 0$ there exists $\delta > 0$ such that $\|f(t)-f(t_0)\|_X\le\epsilon$ if $|t-t_0| < \delta$, $t,t_0\in[a,b]$. One can readily verify that
\[ \|f\|_{C([a,b]:X)} = \max_{a\le t\le b}\|f(t)\|_X \tag{17.2.15} \]
defines a norm with respect to which $C([a,b]:X)$ is a Banach space. The definition may be modified in the usual way for the case that $[a,b]$ is replaced by an open, semi-open or infinite interval, although of course it need not then be a Banach space. A related collection of spaces is defined by means of the norm
\[ \|f\|_{L^p([a,b]:X)} := \left(\int_a^b \|f(t)\|_X^p\,dt\right)^{\frac1p} \tag{17.2.16} \]
for $1\le p<\infty$. To avoid questions of measurability we will simply define $L^p([a,b]:X)$ to be the closure of $C([a,b]:X)$ with respect to this norm. See, for example, Section 5.9.2 of [10] or Section 39 of [36] for more details, and for the case $p = \infty$.

If $X$ is a space of functions and $u = u(x,t)$ is a function for which $u(\cdot,t)\in X$ for every (or almost every) $t\in[a,b]$, then we will often regard $u$ as being the map $u : [a,b]\to X$ defined by $u(t)(x) = u(x,t)$. Thus $u$ may be viewed as a 'curve' in the space $X$. The following example illustrates a typical use of such spaces in a PDE problem. According to the discussion of Example 14.4, if $\Omega$ is a bounded open set in $\mathbb{R}^N$ and $f\in L^2(\Omega)$, then the unique solution $u = u(x,t)$ of
\[ u_t - \Delta u = 0 \quad x\in\Omega \quad t>0 \tag{17.2.17} \]
\[ u(x,t) = 0 \quad x\in\partial\Omega \quad t>0 \tag{17.2.18} \]
\[ u(x,0) = f(x) \quad x\in\Omega \tag{17.2.19} \]
is given by
\[ u(x,t) = \sum_{n=1}^\infty c_n e^{-\lambda_n t}\psi_n(x) \tag{17.2.20} \]
Here $\lambda_n > 0$ is the $n$'th Dirichlet eigenvalue of $-\Delta$ in $\Omega$, $\{\psi_n\}_{n=1}^\infty$ is a corresponding orthonormal eigenfunction basis of $L^2(\Omega)$, and $c_n = \langle f,\psi_n\rangle$.
Theorem 17.5. For any $T > 0$ the solution $u$ satisfies
\[ u(\cdot,t)\in H_0^1(\Omega) \quad \forall t>0 \tag{17.2.21} \]
\[ u\in C([0,T]:L^2(\Omega)) \cap L^2([0,T]:H_0^1(\Omega)) \tag{17.2.22} \]

Proof: Pick $0\le t<t'\le T$ and observe by Bessel's equality that
\[ \|u(\cdot,t)-u(\cdot,t')\|^2_{L^2(\Omega)} = \sum_{n=1}^\infty |c_n|^2(e^{-\lambda_n t}-e^{-\lambda_n t'})^2 \le \sum_{n=1}^\infty |c_n|^2(1-e^{-\lambda_n(t'-t)})^2 \tag{17.2.23} \]
Since $f\in L^2(\Omega)$ we know that $\{c_n\}\in\ell^2$, so for given $\epsilon > 0$ we may pick an integer $N$ such that
\[ \sum_{n=N+1}^\infty |c_n|^2 < \frac{\epsilon}{2} \tag{17.2.24} \]
Next, pick $M > 0$ such that $|c_n|^2\le M$ for all $n$, and then $\delta > 0$ such that
\[ |e^{-\lambda_n\delta}-1|^2 \le \frac{\epsilon}{2NM} \tag{17.2.25} \]
for $n = 1,\dots,N$. If $0\le t<t'\le t+\delta$ we then have
\[ \|u(\cdot,t)-u(\cdot,t')\|^2_{L^2(\Omega)} \le \sum_{n=1}^\infty |c_n|^2(1-e^{-\lambda_n\delta})^2 \tag{17.2.26} \]
\[ \le \sum_{n=1}^N |c_n|^2(1-e^{-\lambda_n\delta})^2 + \sum_{n=N+1}^\infty |c_n|^2 \tag{17.2.27} \]
\[ \le \sum_{n=1}^N M\frac{\epsilon}{2NM} + \sum_{n=N+1}^\infty |c_n|^2 < \epsilon \tag{17.2.28} \]
This completes the proof that $u\in C([0,T]:L^2(\Omega))$.

To verify (17.2.21) we use the fact that
\[ \|v\|^2_{H_0^1(\Omega)} = \sum_{n=1}^\infty \lambda_n|\langle v,\psi_n\rangle|^2 \tag{17.2.29} \]
for $v\in H_0^1(\Omega)$, see Exercise 13 of Chapter 14. Thus it is enough to show that
\[ \sum_{n=1}^\infty \lambda_n|\langle u(\cdot,t),\psi_n\rangle|^2 = \sum_{n=1}^\infty \lambda_n|\langle f,\psi_n\rangle|^2 e^{-2\lambda_n t} < \infty \tag{17.2.30} \]
By means of elementary calculus it is easy to check that $se^{-s}\le e^{-1}$ for $s\ge 0$, hence
\[ \lambda_n e^{-2\lambda_n t} \le \frac{1}{2et} \qquad n = 1,2,\dots \tag{17.2.31} \]
Thus
\[ \sum_{n=1}^\infty \lambda_n|\langle f,\psi_n\rangle|^2 e^{-2\lambda_n t} \le \frac{\sum_{n=1}^\infty |\langle f,\psi_n\rangle|^2}{2et} = \frac{\|f\|^2_{L^2(\Omega)}}{2et} < \infty \tag{17.2.32} \]
as needed, as long as $t > 0$. Finally,
\[ \|u\|^2_{L^2([0,T]:H_0^1(\Omega))} = \int_0^T \|u(\cdot,t)\|^2_{H_0^1(\Omega)}\,dt = \int_0^T \sum_{n=1}^\infty \lambda_n e^{-2\lambda_n t}|\langle f,\psi_n\rangle|^2\,dt \tag{17.2.33} \]
\[ = \sum_{n=1}^\infty \lambda_n\left(\int_0^T e^{-2\lambda_n t}\,dt\right)|\langle f,\psi_n\rangle|^2 \tag{17.2.34} \]
\[ = \sum_{n=1}^\infty \frac{1-e^{-2\lambda_n T}}{2}|\langle f,\psi_n\rangle|^2 \le \|f\|^2_{L^2(\Omega)} \tag{17.2.35} \]
This completes the proof.

Note that the proof actually establishes the quantitative estimates
\[ \|u(\cdot,t)\|_{H_0^1(\Omega)} \le \frac{\|f\|_{L^2(\Omega)}}{\sqrt{2et}} \quad \forall t>0 \tag{17.2.36} \]
\[ \|u\|_{L^2([0,T]:H_0^1(\Omega))} \le \|f\|_{L^2(\Omega)} \quad \forall T>0 \tag{17.2.37} \]
The fact that $u(\cdot,t)\in H_0^1(\Omega)$ for $t > 0$, even though $f$ is only assumed to belong to $L^2(\Omega)$, is sometimes referred to as a regularizing effect: the solution becomes instantaneously smoother than it starts out being. With more advanced methods one can actually show that $u$ is infinitely differentiable, with respect to both $x$ and $t$, for $t > 0$. The conclusion $u(\cdot,t)\in H_0^1(\Omega)$ for $t > 0$ also gives a precise meaning for the boundary condition (17.2.18), and similarly $u\in C([0,T]:L^2(\Omega))$ provides a specific sense in which the initial condition (17.2.19) holds, namely $u(\cdot,t)\to f$ in $L^2(\Omega)$ as $t\to 0+$. The above discussion is very specific to the heat equation; on physical grounds alone one may expect rather different behavior for solutions of the wave equation. See Exercise ( ).
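For $\Omega = (0,\pi)$ everything in the preceding example is explicit ($\lambda_n = n^2$, $\psi_n(x) = \sqrt{2/\pi}\sin nx$), so the series (17.2.20) and the bound (17.2.36) can be observed directly. A minimal sketch (ours; the initial datum and truncation level are illustrative choices):

\begin{verbatim}
# Minimal sketch (illustrative): the series solution (17.2.20) on (0, pi),
# where lambda_n = n^2 and psi_n(x) = sqrt(2/pi) sin(n x), truncated at n_max.
# We also check the regularizing bound ||u(.,t)||_{H_0^1} <= ||f||/sqrt(2 e t).
import numpy as np

n_max = 2000
x = np.linspace(0, np.pi, 1001)
n = np.arange(1, n_max + 1)

# f = indicator of (0, pi/2), a discontinuous datum in L^2; its coefficients
# c_n = <f, psi_n> in closed form:
cn = np.sqrt(2/np.pi) * (1 - np.cos(n*np.pi/2)) / n

for t in (0.01, 0.1, 1.0):
    u = np.sqrt(2/np.pi) * (cn * np.exp(-n**2 * t)) @ np.sin(np.outer(n, x))
    h1 = np.sqrt(np.sum(n**2 * cn**2 * np.exp(-2*n**2*t)))   # via (17.2.29)
    bound = np.sqrt(np.sum(cn**2)) / np.sqrt(2*np.e*t)       # via (17.2.36)
    print(t, h1, bound, u.max())       # h1 <= bound in each case
\end{verbatim}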
17.3 Galerkin's method

For PDE problems of the form $Lu = f$, $u_t = Lu$ or $u_{tt} = Lu$, we can obtain very explicit solution formulas involving the eigenvalues and eigenfunctions of a suitable operator $T$ corresponding to $L$, provided there exist such eigenvalues and eigenfunctions. But there are situations of interest when this is not the case, for example if $T$ is not symmetric. Another case which may arise for time dependent problems is when the expression for $L$, and hence the corresponding $T$, is itself $t$ dependent. Even if the symmetry property were assumed to hold for each fixed $t$, it would still not be possible to obtain solution formulas by means of a suitable eigenvalue/eigenfunction series.

An alternative, but closely related, method which will allow for such generalizations is Galerkin's method, which we will now discuss in the context of the abstract problem
\[ u\in H \qquad A[v,u] = \psi(v) \quad \forall v\in H \tag{17.3.1} \]
under the same assumptions as in the Lax-Milgram theorem, Theorem 17.1. Recall this means we assume that $A$ is bilinear, bounded and coercive on the Hilbert space $H$ and $\psi\in H^*$.

We start by choosing an arbitrary basis $\{v_k\}$ of $H$, and look for an approximate solution (the Galerkin approximation) in the form
\[ u_n = \sum_{k=1}^n c_kv_k \tag{17.3.2} \]
If $u_n$ happened to be the exact solution we would have $A[v,u_n] = \psi(v)$ for any $v\in H$, and in particular
\[ A[v_j,u_n] = \sum_{k=1}^n c_kA[v_j,v_k] = \psi(v_j) \quad \forall j \tag{17.3.3} \]
However this amounts to infinitely many equations for $c_1,\dots,c_n$, so it can't be satisfied in general. Instead we require it only for $j = 1,\dots,n$, and so obtain an $n\times n$ linear system for these unknowns. The resulting system
\[ \sum_{k=1}^n c_kA[v_j,v_k] = \psi(v_j) \qquad j = 1,\dots,n \tag{17.3.4} \]
is guaranteed nonsingular under our assumptions. Indeed, if
\[ \sum_{k=1}^n d_kA[v_j,v_k] = 0 \qquad j = 1,\dots,n \tag{17.3.5} \]
and $w = \sum_{k=1}^n d_kv_k$, then
\[ A[v_j,w] = 0 \qquad j = 1,\dots,n \tag{17.3.6} \]
and so multiplying the $j$'th equation by $d_j$ and summing we get $A[w,w] = 0$. By the coercivity assumption it follows that $w = 0$, and so $d_1 = \dots = d_n = 0$ by the linear independence of the $v_k$'s.

If we set $E_n = \operatorname{span}\{v_1,\dots,v_n\}$ then the previous discussion amounts to defining $u_n$ to be the unique solution of
\[ u_n\in E_n \qquad A[v,u_n] = \psi(v) \quad \forall v\in E_n \tag{17.3.7} \]
which may be obtained by solving the finite system (17.3.4).

It now remains to study the behavior of $u_n$ as $n\to\infty$. The identity $A[u_n,u_n] = \psi(u_n)$, obtained by choosing $v = u_n$ in (17.3.7), together with the coercivity assumption, gives
\[ \gamma\|u_n\|^2 \le \|\psi\|\,\|u_n\| \tag{17.3.8} \]
Thus the sequence $u_n$ is bounded in $H$ and so has a weakly convergent subsequence $u_{n_l}\overset{w}{\to}u$ in $H$. We may now pass to the limit as $n_l\to\infty$, taking into account the meaning of weak convergence, in the relation
\[ A[v_k,u_{n_l}] = \psi(v_k) \tag{17.3.9} \]
for any fixed $k$, obtaining $A[v_k,u] = \psi(v_k)$ for every $k$. It then follows that (17.3.1) holds, because finite linear combinations of the $v_k$'s are dense in $H$. Also, since $u$ is the unique solution of (17.3.1), the entire sequence $u_n$ must be weakly convergent to $u$.

We remark that in a situation like (17.1.14) in which, at least formally, $A[v,u] = \langle v,Lu\rangle_{H_1}$ and $\psi(v) = \langle f,v\rangle_{H_1}$ for some second Hilbert space $H_1\supset H$, the system (17.3.4) amounts to the requirement that $Lu_n - f = L(u_n-u)\in E_n^\perp$, where the orthogonality is with respect to the $H_1$ inner product. If also the embedding of $H$ into $H_1$ is compact (think of $H = H_0^1(\Omega)$ and $H_1 = L^2(\Omega)$) then we also obtain immediately that $u_n\to u$ strongly in $H_1$.

The Galerkin approximation technique can become a very powerful and effective computational technique if the basis $\{v_n\}$ is chosen in a good way, and in particular much more specific and refined convergence results can be proved for special choices of the basis. For example, in the finite element method, approximations to solutions of PDE problems are obtained in the form (17.3.2) by solving (17.3.4), where the $v_n$'s are chosen to be certain piecewise polynomial functions.
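As a concrete instance, the following sketch (ours; the problem and basis are illustrative choices) carries out (17.3.2)-(17.3.4) for the nonsymmetric one-dimensional problem $-u'' + bu' = f$ on $(0,1)$ with $u(0) = u(1) = 0$, using the basis $v_k(x) = \sin(k\pi x)$ of $H_0^1(0,1)$. The form $A[v,u] = \int_0^1 (u'v' + bu'v)\,dx$ is bounded and, for constant $b$, coercive (since $\int_0^1 u'u\,dx = 0$), so Lax-Milgram applies even though $A$ is not symmetric.

\begin{verbatim}
# Minimal sketch (illustrative): Galerkin's method (17.3.2)-(17.3.4) for
#   -u'' + b u' = f on (0,1), u(0) = u(1) = 0, basis v_k(x) = sin(k pi x).
import numpy as np

b, nmax = 10.0, 20
x = np.linspace(0.0, 1.0, 4001)

def trapz(g):                         # simple trapezoid rule on the grid x
    return np.sum((g[1:] + g[:-1])/2 * np.diff(x))

k = np.arange(1, nmax + 1)
V  = np.sin(np.outer(k, x)*np.pi)                      # v_k on the grid
dV = (k[:, None]*np.pi)*np.cos(np.outer(k, x)*np.pi)   # v_k'

A   = np.zeros((nmax, nmax))
rhs = np.zeros(nmax)
f = np.ones_like(x)
for j in range(nmax):
    rhs[j] = trapz(f*V[j])                             # psi(v_j)
    for kk in range(nmax):
        A[j, kk] = trapz(dV[kk]*dV[j] + b*dV[kk]*V[j]) # A[v_j, v_k]

c = np.linalg.solve(A, rhs)
u_n = c @ V                                            # Galerkin approximation
u_exact = x/b - (np.exp(b*x) - 1)/(b*(np.exp(b) - 1))
print(np.max(np.abs(u_n - u_exact)))   # small, and decreasing as nmax grows
\end{verbatim}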
17.4 PDEs with variable coefficients

The Galerkin approach can also be adapted to the case of time dependent problems. We illustrate by consideration of the parabolic problem
\[ u_t = (a_{jk}(x,t)u_{x_k})_{x_j} + h(x,t) \quad x\in\Omega \quad 0<t<T \tag{17.4.1} \]
\[ u(x,t) = 0 \quad x\in\partial\Omega \quad 0<t<T \tag{17.4.2} \]
\[ u(x,0) = f(x) \quad x\in\Omega \tag{17.4.3} \]
Here we assume that
• $\Omega$ is a bounded open set in $\mathbb{R}^N$.
• $a_{jk}\in L^\infty(\Omega\times(0,T))$ for all $j,k$, and there exists a constant $\theta > 0$ such that $a_{jk}(x,t)\xi_j\xi_k\ge\theta|\xi|^2$ for all $\xi\in\mathbb{R}^N$, $(x,t)\in\Omega\times(0,T)$.
• $h\in L^2((0,T):L^2(\Omega))$ and $f\in L^2(\Omega)$.

By a weak solution of (17.4.1) we will mean a function $u\in L^\infty([0,T]:L^2(\Omega))\cap L^2((0,T):H_0^1(\Omega))$ such that
\[ \int_\Omega u(x,t)\psi(x,t)\,dx - \int_0^t\!\!\int_\Omega u(x,s)\psi_t(x,s)\,dx\,ds \tag{17.4.4} \]
\[ + \int_0^t\!\!\int_\Omega a_{jk}(x,s)u_{x_j}(x,s)\psi_{x_k}(x,s)\,dx\,ds \tag{17.4.5} \]
\[ = \int_0^t\!\!\int_\Omega h(x,s)\psi(x,s)\,dx\,ds + \int_\Omega f(x)\psi(x,0)\,dx \tag{17.4.6} \]
for almost every $t\in[0,T]$ and every $\psi\in C^1([0,T]\times\overline\Omega)$. We mention here that once time dependence is allowed, several different reasonable definitions of weak solution become possible; see for example Section 7.1.1 of [10], or Section 9.2.d of [22], for other definitions. Roughly speaking, if the class of test functions $\psi$ is larger then proving existence becomes harder and proving uniqueness becomes easier. For simple parabolic problems of this type, however, all such definitions turn out in the end to be equivalent.

We now sketch how the Galerkin method may be adapted to this problem. Choose any basis $\{v_k\}_{k=1}^\infty$ of $H_0^1(\Omega)$ which is orthonormal in $L^2(\Omega)$, for example the Dirichlet eigenfunctions of $-\Delta$. We seek an approximate solution
\[ u_n(x,t) = \sum_{k=1}^n c_k(t)v_k(x) \tag{17.4.7} \]

17.5 Exercises

1. Verify that the definition of ellipticity (17.1.3) is consistent with the one given for the special case (2.3.39), i.e. for such an equation the two definitions are equivalent.

2. Let $\lambda_1$ be the smallest Dirichlet eigenvalue for $-\Delta$ in $\Omega$, assume that $c\in C(\overline\Omega)$ and $c(x) > -\lambda_1$ in $\overline\Omega$. If $f\in L^2(\Omega)$ prove the existence of a solution of
\[ -\Delta u + c(x)u = f \quad x\in\Omega \qquad u = 0 \quad x\in\partial\Omega \tag{17.5.1} \]
3. Let $\lambda > 0$ and define
\[ A[u,v] = \int_\Omega a_{jk}(x)u_{x_k}(x)v_{x_j}(x)\,dx + \lambda\int_\Omega uv\,dx \tag{17.5.2} \]
for $u,v\in H^1(\Omega)$. Assume the ellipticity property (17.1.3) and that $a_{jk}\in L^\infty(\Omega)$. If $f\in L^2(\Omega)$ show that there exists a unique solution of
\[ u\in H^1(\Omega) \qquad A[u,v] = \int_\Omega fv\,dx \quad \forall v\in H^1(\Omega) \tag{17.5.3} \]
Justify that $u$ may be regarded as the weak solution of
\[ -(a_{jk}u_{x_k})_{x_j} + \lambda u = f(x) \quad x\in\Omega \qquad a_{jk}u_{x_k}n_j = 0 \quad x\in\partial\Omega \tag{17.5.4} \]
The above boundary condition is said to be of conormal type.

4. If $f\in L^2(0,1)$ we say that $u$ is a weak solution of the fourth order problem
\[ u'''' + u = f \quad 0<x<1 \qquad u''(0) = u'''(0) = u''(1) = u'''(1) = 0 \]
if $u\in H^2(0,1)$ and
\[ \int_0^1 (u''(x)\zeta''(x) + u(x)\zeta(x))\,dx = \int_0^1 f(x)\zeta(x)\,dx \quad \text{for all } \zeta\in H^2(0,1) \]
Discuss why this is a reasonable definition and use the Lax-Milgram Theorem to prove that there exists a weak solution. The following fact may be useful here: there exists a finite constant $C$ such that
\[ \|\phi'\|^2_{L^2(0,1)} \le C\left(\|\phi\|^2_{L^2(0,1)} + \|\phi''\|^2_{L^2(0,1)}\right) \quad \forall\phi\in H^2(0,1) \]
see for example Lemma 4.10 of [1] or equation 12.1 in Chapter I of [24].

5. Let $\Omega\subset\mathbb{R}^N$ be a bounded open set containing the origin. Show that $\delta\in H^{-1}(\Omega)$ if and only if $N = 1$.

6. Let $f$ and $g$ be in $L^2(0,1)$. Use the Lax-Milgram Theorem to prove there is a unique weak solution $\{u,v\}\in H_0^1(0,1)\times H_0^1(0,1)$ to
\[ -u'' + u + v' = f \qquad -v'' + v + u' = g \]
where $u(0) = v(0) = 0$, $u(1) = v(1) = 0$. (Hint: Start by defining the bilinear form
\[ A[(u,v),(\phi,\psi)] = \int_0^1 (u'\phi' + u\phi + v'\phi + v'\psi' + v\psi + u'\psi)\,dx \]
on $H_0^1(0,1)\times H_0^1(0,1)$.)

7. If $X$ is a Banach space, prove that $C([a,b]:X)$ is also a Banach space with norm defined in (17.2.15).

8. Let $L$ be the divergence form elliptic operator $Lv = -(a_{jk}(x)v_{x_j})_{x_k}$ in a bounded open set $\Omega\subset\mathbb{R}^N$, and let $u$ be a solution of the parabolic problem
\[ u_t + Lu = 0 \quad x\in\Omega,\ t>0 \qquad u(x,t) = 0 \quad x\in\partial\Omega,\ t>0 \qquad u(x,0) = u_0(x) \quad x\in\Omega \]
Let $\phi$ be a $C^2$ convex function on $\mathbb{R}$ with $\phi'(0) = 0$.
a) Show that
\[ \int_\Omega \phi(u(x,t))\,dx \le \int_\Omega \phi(u_0(x))\,dx \]
for any $t > 0$.
b) By choosing $\phi(s) = |s|^p$ and letting $p\to\infty$, show that $\|u(\cdot,t)\|_{L^\infty}\le\|u_0\|_{L^\infty}$.

9. What is the dual space of $L^p((a,b):L^q(\Omega))$ for $p,q\in(1,\infty)$?

Chapter 18

Appendices

18.1 Inequalities

In this section we state and prove a number of useful inequalities for numbers and functions. A function $\phi$ on an interval $(a,b)\subset\mathbb{R}$ is convex if
\[ \phi(\lambda x_1+(1-\lambda)x_2) \le \lambda\phi(x_1)+(1-\lambda)\phi(x_2) \tag{18.1.1} \]
for all $x_1,x_2\in(a,b)$ and $\lambda\in[0,1]$. A convex function is necessarily continuous (see Theorem 3.2 of [30]). If $\phi$ is such a function and $c\in(a,b)$ then there always exists a supporting line for $\phi$ at $c$; more precisely, there exists $m\in\mathbb{R}$ such that if we let $\psi(x) = m(x-c)+\phi(c)$, then $\psi(x)\le\phi(x)$ for all $x\in(a,b)$. If $\phi$ is differentiable at $x = c$ then $m = \phi'(c)$; otherwise it may be defined in terms of a certain supremum (or infimum) of slopes. If in addition $\phi$ is twice differentiable, then $\phi$ is convex if and only if $\phi''\ge 0$.

Proposition 18.1. (Young's inequality) If $a,b\ge 0$, $1<p,q<\infty$ and $\frac1p+\frac1q=1$, then
\[ ab \le \frac{a^p}{p} + \frac{b^q}{q} \tag{18.1.2} \]

Proof: If $a$ or $b$ is zero the conclusion is obvious; otherwise, since the exponential function is convex and $\frac1p+\frac1q=1$, we get
\[ ab = e^{\log a+\log b} = e^{\frac1p\log a^p + \frac1q\log b^q} \le \frac{e^{\log a^p}}{p} + \frac{e^{\log b^q}}{q} = \frac{a^p}{p} + \frac{b^q}{q} \tag{18.1.3} \]

In the special case that $p = q = 2$, (18.1.2) can be proved in an even more elementary way, just by rearranging the obvious inequality $a^2-2ab+b^2 = (a-b)^2\ge 0$.

Corollary 18.1. If $a,b\ge 0$, $1<p,q<\infty$, $\frac1p+\frac1q=1$, and $\epsilon > 0$, there holds
\[ ab \le \frac{\epsilon a^p}{p} + \frac{b^q}{\epsilon^{q/p}q} \tag{18.1.4} \]

Proof: We can write
\[ ab = (\epsilon^{\frac1p}a)\left(\frac{b}{\epsilon^{\frac1p}}\right) \tag{18.1.5} \]
and then apply Proposition 18.1.

Proposition 18.2. (Hölder's inequality) If $u,v$ are measurable functions on $\Omega\subset\mathbb{R}^N$, $1\le p,q\le\infty$, and $\frac1p+\frac1q=1$, then
\[ \|uv\|_{L^1(\Omega)} \le \|u\|_{L^p(\Omega)}\|v\|_{L^q(\Omega)} \tag{18.1.6} \]

Proof: We may assume that $\|u\|_{L^p(\Omega)}, \|v\|_{L^q(\Omega)}$ are finite and nonzero, since otherwise (18.1.6) is obvious. When $p,q = 1$ or $\infty$, the proof of the inequality is elementary, so assume first that $1<p,q<\infty$. Using (18.1.4) with $a = |u(x)|$ and $b = |v(x)|$, and integrating with respect to $x$ over $\Omega$, gives
\[ \int_\Omega |u(x)v(x)|\,dx \le \frac{\epsilon}{p}\int_\Omega |u(x)|^p\,dx + \frac{1}{\epsilon^{q/p}q}\int_\Omega |v(x)|^q\,dx \tag{18.1.7} \]
By choosing
\[ \epsilon = \left(\frac{\int_\Omega |v(x)|^q\,dx}{\int_\Omega |u(x)|^p\,dx}\right)^{\frac1q} \tag{18.1.8} \]
the right hand side of this inequality simplifies to
\[ \left(\int_\Omega |u(x)|^p\,dx\right)^{\frac1p}\left(\int_\Omega |v(x)|^q\,dx\right)^{\frac1q}\left(\frac1p+\frac1q\right) = \|u\|_{L^p(\Omega)}\|v\|_{L^q(\Omega)} \tag{18.1.9} \]
as needed.

The special case of Hölder's inequality when $p = q = 2$ is commonly called the Schwarz, or Cauchy-Schwarz, inequality. Whenever $p,q$ are related, as in Young's or Hölder's inequality, via $\frac1p+\frac1q=1$, it is common to refer to $q = p/(p-1) =: p'$ as the Hölder conjugate exponent of $p$.
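A throwaway numerical spot check (ours) of Young's inequality (18.1.2) and of the discrete form of Hölder's inequality (stated as (18.1.14) below) on random data:

\begin{verbatim}
# Throwaway sketch (ours): spot-check Young's inequality (18.1.2) and the
# discrete Hoelder inequality on random nonnegative data.
import numpy as np

rng = np.random.default_rng(1)
p = 3.0
q = p/(p - 1)                      # the Hoelder conjugate exponent p'

a, b = rng.random(), rng.random()
print(a*b <= a**p/p + b**q/q)      # Young: always True

x, y = rng.random(100), rng.random(100)
lhs = np.sum(np.abs(x*y))
rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)
print(lhs <= rhs)                  # Hoelder: always True
\end{verbatim}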
Proposition 18.3. (Minkowski's inequality) If $u,v$ are measurable functions on $\Omega\subset\mathbb{R}^N$ and $1\le p\le\infty$, then
\[ \|u+v\|_{L^p(\Omega)} \le \|u\|_{L^p(\Omega)} + \|v\|_{L^p(\Omega)} \tag{18.1.10} \]

Proof: We may assume that $\|u\|_{L^p(\Omega)}, \|v\|_{L^p(\Omega)}$ are finite and that $\|u+v\|_{L^p(\Omega)}\ne 0$, since otherwise there is nothing to prove. We have earlier noted in Section 3.1 that $L^p(\Omega)$ is a vector space, so $u+v\in L^p(\Omega)$ also. In the case $1<p<\infty$ we write
\[ \int_\Omega |u(x)+v(x)|^p\,dx \le \int_\Omega |u(x)|\,|u(x)+v(x)|^{p-1}\,dx + \int_\Omega |v(x)|\,|u(x)+v(x)|^{p-1}\,dx \tag{18.1.11} \]
By Hölder's inequality,
\[ \int_\Omega |u(x)|\,|u(x)+v(x)|^{p-1}\,dx \le \left(\int_\Omega |u(x)|^p\,dx\right)^{\frac1p}\left(\int_\Omega |u(x)+v(x)|^{(p-1)q}\,dx\right)^{\frac1q} \tag{18.1.12} \]
where $\frac1q+\frac1p=1$, so that $(p-1)q = p$. Estimating the second term on the right of (18.1.11) in the same way, we get
\[ \int_\Omega |u(x)+v(x)|^p\,dx \le \left(\|u\|_{L^p(\Omega)}+\|v\|_{L^p(\Omega)}\right)\left(\int_\Omega |u(x)+v(x)|^p\,dx\right)^{\frac1q} \tag{18.1.13} \]
from which the conclusion (18.1.10) follows by obvious algebra. The two limiting cases $p = 1,\infty$ may be handled in a more elementary manner, and we leave these cases to the reader.

Both the Hölder and Minkowski inequalities have counterparts
\[ \sum_k |a_kb_k| \le \left(\sum_k |a_k|^p\right)^{\frac1p}\left(\sum_k |b_k|^q\right)^{\frac1q} \qquad 1<p,q<\infty \quad \frac1p+\frac1q=1 \tag{18.1.14} \]
\[ \left(\sum_k |a_k+b_k|^p\right)^{\frac1p} \le \left(\sum_k |a_k|^p\right)^{\frac1p} + \left(\sum_k |b_k|^p\right)^{\frac1p} \qquad 1\le p<\infty \tag{18.1.15} \]
(with suitable modification for the case of $p$ or $q$ being $\infty$) in which the integrals are replaced by finite or infinite sums of real or complex constants; the proofs are otherwise identical. (Or, from the point of view of abstract measure theory, the proofs are identical because a sum is just a certain kind of integral.)

18.2 Integration by parts

In the elementary integration by parts formula from calculus,
\[ \int_a^b u(x)v'(x)\,dx = -\int_a^b u'(x)v(x)\,dx + u(x)v(x)\Big|_a^b \tag{18.2.1} \]
one integral is shown to be equal to another integral plus a 'boundary term', where in this case the boundary consists of the two points $a,b$, namely the boundary of the interval $[a,b]$ over which the integration takes place. In higher dimensional situations we refer to any identity of this general character as being an integration by parts formula. There are a number of such formulas, all more or less equivalent to each other, which are frequently used in applied mathematics, and which we review here.

We will take as a known basic integration by parts formula the divergence theorem
\[ \int_\Omega \nabla\cdot\mathbf F(x)\,dx = \int_{\partial\Omega} \mathbf F\cdot\mathbf n(x)\,dS(x) \tag{18.2.2} \]
valid for a $C^1$ vector field $\mathbf F$ and bounded open set $\Omega\subset\mathbb{R}^N$, $N\ge 2$, with $C^1$ boundary $\partial\Omega$, see for example Theorem 10.51 of [29]. Here $\mathbf n(x)$ is the unit outward normal to $\partial\Omega$ at $x\in\partial\Omega$. If we now choose the vector field $\mathbf F$ to be zero except for the $j$'th component $F_j(x) = u(x)v(x)$, there results
\[ \int_\Omega u(x)\frac{\partial v}{\partial x_j}(x)\,dx = -\int_\Omega \frac{\partial u}{\partial x_j}(x)v(x)\,dx + \int_{\partial\Omega} u(x)v(x)n_j(x)\,dS(x) \tag{18.2.3} \]
Replacing $v$ by $v_j$, the $j$'th component of a vector function $\mathbf v$, and summing on $j$, we next obtain
\[ \int_\Omega u(x)(\nabla\cdot\mathbf v)(x)\,dx = -\int_\Omega \nabla u(x)\cdot\mathbf v(x)\,dx + \int_{\partial\Omega} u(x)(\mathbf v\cdot\mathbf n)(x)\,dS(x) \tag{18.2.4} \]
Now choosing $\mathbf v = \nabla w$, the gradient of some scalar function $w$, and noting that $\nabla\cdot(\nabla w) = \Delta w$, we find
\[ \int_\Omega u(x)\Delta w(x)\,dx = -\int_\Omega (\nabla u\cdot\nabla w)(x)\,dx + \int_{\partial\Omega} u(x)\frac{\partial w}{\partial n}(x)\,dS(x) \tag{18.2.5} \]
where as usual $\frac{\partial w}{\partial n} = \nabla w\cdot\mathbf n$ is the outer normal derivative of $w$ on $\partial\Omega$. Reversing the roles of $u$ and $w$, and subtracting the resulting expressions, we may then obtain Green's identity
\[ \int_\Omega (u(x)\Delta w(x) - w(x)\Delta u(x))\,dx = \int_{\partial\Omega}\left(u(x)\frac{\partial w}{\partial n}(x) - w(x)\frac{\partial u}{\partial n}(x)\right)dS(x) \tag{18.2.6} \]
The special case of (18.2.6) when $u(x)\equiv 1$, namely
\[ \int_\Omega \Delta w(x)\,dx = \int_{\partial\Omega}\frac{\partial w}{\partial n}(x)\,dS(x) \tag{18.2.7} \]
is also of interest. Finally we mention that the classical Green's theorem in the plane,
\[ \oint_{\partial A} P\,dx + Q\,dy = \iint_A \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dx\,dy \tag{18.2.8} \]
is also a special case of (18.2.2), obtained by choosing the vector field $\mathbf F = \langle Q,-P\rangle$ in $\mathbb{R}^2$.
Here is how it works: We denote SN −1 = {x ∈ RN : |x| = 1} the unit sphere2 in RN . Every point x ∈ RN may be expressed as x = rω where r = |x| ≥ 0 and ω ∈ SN −1 , and the representation is unique except for x = 0. We then 2 We try to use the terminology ’unit ball’ for {x ∈ RN : |x| < 1}, but sometimes ’sphere’ and ’ball’ are used interchangeably. Also, SN is sometimes used as notation for the unit sphere, but SN −1 is more common since it is a surface of dimension N − 1. 326 greensid may parametrize SN −1 by N − 1 angle variables θ1 , θ2 , . . . θN −1 , where x1 = r sin θ1 sin θ2 . . . sin θN −2 sin θN −1 x2 = r sin θ1 sin θ2 . . . sin θN −2 cos θN −1 . . . xN −1 = r sin θ1 cos θ2 xN = r cos θ1 Here 0 ≤ θj ≤ π for j = 1, . . . N − 2 and 0 ≤ θN −1 ≤ 2π. Thus (r, θ1 , θ2 , . . . θN −1 ) are spherical coordinates on RN . The Jacobian of the transformation (x1 , . . . xN ) → (r, θ1 , θ2 , . . . θN −1 ), needed for integration in spherical coordinates is rN −1 sinN −2 θ1 sinN −3 θ2 . . . sin θN −2 Integration of a function f over SN −1 may expressed by Z Z π Z π Z 2π f (ω) dω = ... f (θ1 , . . . θN −1 )dσ SN −1 0 0 0 where dσ = sinN −2 θ1 sinN −3 θ2 . . . sin θN −2 dθN −1 . . . dθ1 Likewise integration of a function f over RN is Z Z ∞Z π Z ∞Z Z ... f (x) dx = f (rω) dωdr = RN 0 SN −1 0 0 0 π Z 2π f (r, θ1 , . . . θN −1 )rN −1 dσ dr 0 In particular if f is radially symmetric, f (x) = f (|x|), we get Z Z ∞ f (x) dx = ΩN −1 f (r)rN −1 dr RN 0 where Z ΩN −1 = dω SN −1 is the surface area of SN −1 . 327 (18.3.1) intradfn Chapter 19 Bibliography 328 Bibliography Ad75 [1] Robert A. Adams. Sobolev spaces. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, 1975. Pure and Applied Mathematics, Vol. 65. AG93 [2] N. I. Akhiezer and I. M. Glazman. Theory of linear operators in Hilbert space. Dover Publications, Inc., New York, 1993. Translated from the Russian and with a preface by Merlynd Nestell, Reprint of the 1961 and 1963 translations, Two volumes bound as one. Bl84 [3] Norman Bleistein. Mathematical methods for wave phenomena. Computer Science and Applied Mathematics. Academic Press, Inc., Orlando, FL, 1984. BN69 [4] F. Brauer and John A. Nohel. The Qualitative theory of ordinary differential equations, an introduction. W. A. Benjamin Inc., Menlo Park, CA, 1969. Br11 [5] Haim Brezis. Functional analysis, Sobolev spaces and partial differential equations. Universitext. Springer, New York, 2011. Ca66 [6] Lennart Carleson. On convergence and growth of partial sums of Fourier series. Acta Math., 116:135–157, 1966. CL55 [7] Earl A. Coddington and Norman Levinson. Theory of ordinary differential equations. McGraw-Hill Book Company, Inc., New York-Toronto-London, 1955. CH53 [8] R. Courant and D. Hilbert. Methods of mathematical physics. Vol. I. Interscience Publishers, Inc., New York, N.Y., 1953. DM72 [9] H. Dym and H. P. McKean. Fourier series and integrals. Academic Press, New York-London, 1972. Probability and Mathematical Statistics, No. 14. 329 Ev10 [10] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, second edition, 2010. Fo95 [11] Gerald B. Folland. Introduction to partial differential equations. Princeton University Press, Princeton, NJ, second edition, 1995. Fr44 [12] K. O. Friedrichs. The identity of weak and strong extensions of differential operators. Trans. Amer. Math. Soc., 55:132–151, 1944. Ga64 [13] P. R. Garabedian. Partial differential equations. 
Chapter 19

Bibliography

[1] Robert A. Adams. Sobolev Spaces. Academic Press, New York-London, 1975. Pure and Applied Mathematics, Vol. 65.
[2] N. I. Akhiezer and I. M. Glazman. Theory of Linear Operators in Hilbert Space. Dover Publications, New York, 1993. Translated from the Russian, reprint of the 1961 and 1963 translations, two volumes bound as one.
[3] Norman Bleistein. Mathematical Methods for Wave Phenomena. Academic Press, Orlando, FL, 1984.
[4] F. Brauer and John A. Nohel. The Qualitative Theory of Ordinary Differential Equations, an Introduction. W. A. Benjamin, Menlo Park, CA, 1969.
[5] Haim Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext. Springer, New York, 2011.
[6] Lennart Carleson. On convergence and growth of partial sums of Fourier series. Acta Math., 116:135-157, 1966.
[7] Earl A. Coddington and Norman Levinson. Theory of Ordinary Differential Equations. McGraw-Hill, New York-Toronto-London, 1955.
[8] R. Courant and D. Hilbert. Methods of Mathematical Physics, Vol. I. Interscience, New York, 1953.
[9] H. Dym and H. P. McKean. Fourier Series and Integrals. Academic Press, New York-London, 1972. Probability and Mathematical Statistics, No. 14.
[10] Lawrence C. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, second edition, 2010.
[11] Gerald B. Folland. Introduction to Partial Differential Equations. Princeton University Press, Princeton, NJ, second edition, 1995.
[12] K. O. Friedrichs. The identity of weak and strong extensions of differential operators. Trans. Amer. Math. Soc., 55:132-151, 1944.
[13] P. R. Garabedian. Partial Differential Equations. John Wiley & Sons, New York-London-Sydney, 1964.
[14] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.
[15] Harry Hochstadt. Integral Equations. John Wiley & Sons, New York-London-Sydney, 1973. Pure and Applied Mathematics.
[16] Lars Hörmander. The Analysis of Linear Partial Differential Operators II, volume 256 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 1983. Distribution theory and Fourier analysis.
[17] John K. Hunter and Bruno Nachtergaele. Applied Analysis. World Scientific, River Edge, NJ, 2001.
[18] Fritz John. Partial Differential Equations, volume 1 of Applied Mathematical Sciences. Springer-Verlag, New York, fourth edition, 1982.
[19] R. K. Juberg. Finite Hilbert transforms in $L^p$. Bull. Amer. Math. Soc., 78:435-438, 1972.
[20] Oliver Dimon Kellogg. Foundations of Potential Theory. Springer-Verlag, Berlin-New York, 1967. Reprint of the first edition of 1929.
[21] Rainer Kress. Linear Integral Equations, volume 82 of Applied Mathematical Sciences. Springer-Verlag, Berlin, 1989.
[22] R. C. McOwen. Partial Differential Equations: Methods and Applications. Prentice-Hall, Upper Saddle River, NJ, second edition, 2003.
[23] Norman G. Meyers and James Serrin. H = W. Proc. Nat. Acad. Sci. U.S.A., 51:1055-1056, 1964.
[24] D. S. Mitrinović, J. E. Pečarić, and A. M. Fink. Inequalities Involving Functions and Their Integrals and Derivatives, volume 53 of Mathematics and its Applications (East European Series). Kluwer Academic Publishers, Dordrecht, 1991.
[25] L. E. Payne. Improperly Posed Problems in Partial Differential Equations. SIAM, Philadelphia, PA, 1975. Regional Conference Series in Applied Mathematics, No. 22.
[26] Mark A. Pinsky. Introduction to Fourier Analysis and Wavelets. Brooks/Cole, Pacific Grove, CA, 2002.
[27] Jeffrey Rauch. Partial Differential Equations, volume 128 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1991.
[28] H. L. Royden and P. M. Fitzpatrick. Real Analysis. Prentice Hall, New York, fourth edition, 2010.
[29] Walter Rudin. Principles of Mathematical Analysis. McGraw-Hill, New York, third edition, 1976.
[30] Walter Rudin. Real and Complex Analysis. McGraw-Hill, New York, third edition, 1987.
[31] Walter Rudin. Functional Analysis. McGraw-Hill, New York, second edition, 1991.
[32] Laurent Schwartz. Mathematics for the Physical Sciences. Hermann, Paris; Addison-Wesley, Reading, MA, 1966.
[33] Ivar Stakgold and Michael Holst. Green's Functions and Boundary Value Problems. John Wiley & Sons, Hoboken, NJ, third edition, 2011.
[34] Elias M. Stein. Singular Integrals and Differentiability Properties of Functions. Princeton Mathematical Series, No. 30. Princeton University Press, Princeton, NJ, 1970.
[35] Elias M. Stein and Guido Weiss. Introduction to Fourier Analysis on Euclidean Spaces. Princeton Mathematical Series, No. 32. Princeton University Press, Princeton, NJ, 1971.
[36] François Trèves. Basic Linear Partial Differential Equations. Academic Press, New York-London, 1975. Pure and Applied Mathematics, Vol. 62.
[37] Hans F. Weinberger. Variational Methods for Eigenvalue Approximation. SIAM, Philadelphia, PA, 1974. CBMS Regional Conference Series in Applied Mathematics, No. 15.
[38] Richard L. Wheeden and Antoni Zygmund. Measure and Integral: An Introduction to Real Analysis. Marcel Dekker, New York-Basel, 1977.
[39] Robert M. Young. An Introduction to Nonharmonic Fourier Series. Academic Press, San Diego, CA, first edition, 2001.