II. Extensions
© 2015, Philip D. Loewen
File “extns”, version of 23 January 2015, page 1. Typeset at 12:29 January 23, 2015.

A. Vector-Valued Arcs

Everything works just like it does for scalar-valued arcs. With careful management of notation, the main results even look the same. So let us regard $\mathbb{R}^n$ as the vector space of $n\times 1$ matrices (column vectors), and write $(\mathbb{R}^n)^*$ for the vector space of $1\times n$ matrices (row vectors). We will use square brackets when being completely precise, and allow round ones to simplify the expression of points in $\mathbb{R}^n$:
$$ (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n \quad\text{encodes}\quad \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} \in \mathbb{R}^n. $$
We will use no decorations on elements $x\in\mathbb{R}^n$ (avoiding $\mathbf{x}$, $\vec{x}$, etc.), and keep using the hat accent just for identification, not as a symbol for a unit vector. Use $C^1([a,b],\mathbb{R}^n)$ as our notation for the basic arc space when $n\ge 2$, reverting to $C^1[a,b]$ when $n=1$. Define VMN, DLM, etc., just as in the case $n=1$.

Gradients versus Derivatives. For any scalar-valued function $\Phi\colon X\to\mathbb{R}$ and point of interest $x\in X$, the symbol $D\Phi[x]$ denotes an operator intended to approximate the action of $\Phi$ near $x$. That is, $D\Phi[x]$ is itself a function that takes inputs from the space $X$ and gives out real numbers. In abstract notation, evaluating $D\Phi[x]$ at some point $h\in X$ looks (and is defined) like this:
$$ D\Phi[x](h) = \Phi'[x;h] \overset{\text{def}}{=} \lim_{\lambda\to 0}\frac{\Phi[x+\lambda h]-\Phi[x]}{\lambda}. $$
In calculus, $X=\mathbb{R}^n$ and the name $f$ is more common than $\Phi$. Linearization for $f$ near $x$ looks like this:
$$ f(x+h) \approx f(x) + \frac{\partial f}{\partial x_1}h_1 + \cdots + \frac{\partial f}{\partial x_n}h_n = f(x) + \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} \begin{bmatrix} h_1\\ h_2\\ \vdots\\ h_n \end{bmatrix}, \qquad h\approx 0 \text{ in } \mathbb{R}^n. $$
To write $f(x+h)\approx f(x)+Df(x)h$ requires interpreting $Df(x)$ as the row vector shown above. Our subscript notation $f_x(x)=Df(x)$ will match this.

The Gradient. To say, as in calculus, that the vector $v=\nabla f(x)$ tells the direction to move from $x$ in order to increase $f$-values most rapidly, the expression $x+\lambda v$ must make sense for real scalars $\lambda$. So $v$ must have the same shape as $x$; that is, $v=\nabla f(x)$ must be a column vector. Hence our $f_x(x)=Df[x]$ is not quite identical to $\nabla f(x)$: to be completely precise,
$$ Df[x] = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} = \left(\nabla f(x)\right)^T. $$

Basic Problem. A real interval $[a,b]$ is given, along with a $C^1$ function $L\colon[a,b]\times\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ and points $A,B\in\mathbb{R}^n$:
$$ \min\left\{ \Lambda[x] = \int_a^b L(t,x(t),\dot x(t))\,dt \;:\; x(a)=A,\ x(b)=B \right\}. $$

Theorem. Suppose the arc $\hat x\in C^1([a,b],\mathbb{R}^n)$ gives a DLM (rel. $V_{II}$) above. Then
(a) There is a constant $c\in(\mathbb{R}^n)^*$ such that
$$ \widehat L_v(t) = c + \int_a^t \widehat L_x(r)\,dr \qquad \forall t\in[a,b]. \tag{IEL} $$
(b) So $\widehat L_v$ is a function in $C^1\big([a,b];(\mathbb{R}^n)^*\big)$ satisfying
$$ \frac{d}{dt}\widehat L_v(t) = \widehat L_x(t) \qquad \forall t\in[a,b]. \tag{DEL} $$
(c) If $L\in C^2$ on some open set containing $(t_0,\hat x(t_0),\dot{\hat x}(t_0))$ and the $n\times n$ matrix $\widehat L_{vv}(t_0)$ is invertible, then $\hat x$ is $C^2$ on some open interval containing $t_0$.
(d) If $\hat x\in C^2([a,b],\mathbb{R}^n)$ and $L\in C^2$, then
$$ \frac{d}{dt}\left[\widehat L(t) - \widehat L_v(t)\dot{\hat x}(t)\right] = \widehat L_t(t) \qquad \forall t\in[a,b]. \tag{WE2} $$
Here (DEL) is an equation between row vectors of length $n$. Writing out the component equations gives a system of $n$ ODEs in $n$ unknown functions. However, (WE2) is just a single scalar differential equation, which is certain to be inadequate to determine a unique solution when $n\ge 2$.

Kepler's Problem. For central-force motion in polar coordinates, a particle of mass $m$, generalized position $(r,\theta)$, and generalized velocity $(\dot r,\dot\theta)$ has kinetic and potential energies given by
$$ \text{KE: } T = \tfrac12 m v^2 = m\left(\tfrac12\dot r^2 + \tfrac12 r^2\dot\theta^2\right), \qquad \text{PE: } V = -\frac{Km}{r}. $$
The Principle of Least Action says that real objects move along paths in space that minimize (at least over small time intervals) the "Action Integral" below, in which the input arcs have the $\mathbb{R}^2$-valued form $x(t)=\begin{bmatrix} r(t)\\ \theta(t)\end{bmatrix}$:
$$ A[x] = \int_a^b (T-V)\,dt = \int_a^b m\left[\tfrac12\dot r(t)^2 + \tfrac12 r(t)^2\dot\theta(t)^2 + \frac{K}{r(t)}\right]dt. $$
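The next paragraph works out the Euler equations for this Lagrangian by hand (with $m$ dropped): $\ddot r = r\dot\theta^2 - K/r^2$ and $(r^2\dot\theta)' = 0$, with $\tfrac12\dot r^2+\tfrac12 r^2\dot\theta^2 - K/r$ also conserved. As a numerical companion, here is a minimal Python sketch that integrates that system with a hand-written classical Runge–Kutta step and measures how far the two conserved quantities drift. The value $K=1$, the initial state, the step size, and all function names are illustrative assumptions, not part of the notes.

```python
K = 1.0  # hypothetical force constant; the notes keep K general


def rhs(state):
    """Time derivative of (r, theta, r', theta') for the per-unit-mass
    Kepler Lagrangian L = u^2/2 + r^2 w^2/2 + K/r.

    Uses r'' = r theta'^2 - K/r^2 and (r^2 theta')' = 0, which expands to
    theta'' = -2 r' theta' / r.
    """
    r, _theta, rdot, thdot = state
    return (rdot, thdot, r * thdot**2 - K / r**2, -2.0 * rdot * thdot / r)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shift(base, slope, c):
        return tuple(b + c * s for b, s in zip(base, slope))

    k1 = rhs(state)
    k2 = rhs(shift(state, k1, 0.5 * dt))
    k3 = rhs(shift(state, k2, 0.5 * dt))
    k4 = rhs(shift(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def angular_momentum(state):
    """r^2 theta' (per unit mass)."""
    r, _theta, _rdot, thdot = state
    return r**2 * thdot


def energy(state):
    """(T + V)/m = r'^2/2 + r^2 theta'^2/2 - K/r."""
    r, _theta, rdot, thdot = state
    return 0.5 * rdot**2 + 0.5 * r**2 * thdot**2 - K / r


def simulate(state, dt=1e-3, steps=20000):
    """Integrate forward and return every visited state."""
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, dt)
        path.append(state)
    return path


if __name__ == "__main__":
    start = (1.0, 0.0, 0.0, 1.2)  # illustrative bound (elliptical) orbit
    path = simulate(start)
    print("max drift in r^2 theta':",
          max(abs(angular_momentum(s) - angular_momentum(start)) for s in path))
    print("max drift in energy:    ",
          max(abs(energy(s) - energy(start)) for s in path))
```

Both printed drifts should be at the level of the integrator's discretization error; a visibly growing drift would signal a mistake in the equations of motion rather than a failure of the conservation laws.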
Ignoring the constant $m>0$, write $x=(r,\theta)\in\mathbb{R}^2$ and $v=(u,\omega)\in\mathbb{R}^2$. Then the integrand above is built by evaluating this function along arcs in $\mathbb{R}^2$:
$$ L(t,x,v) = \tfrac12 u^2 + \tfrac12 r^2\omega^2 + \frac{K}{r}. $$
Note that $L_t(t,x,v)=0$,
$$ L_x(t,x,v) = \begin{bmatrix} \frac{\partial L}{\partial r} & \frac{\partial L}{\partial \theta} \end{bmatrix} = \begin{bmatrix} r\omega^2 - \frac{K}{r^2} & 0 \end{bmatrix}, \qquad L_v(t,x,v) = \begin{bmatrix} \frac{\partial L}{\partial u} & \frac{\partial L}{\partial \omega} \end{bmatrix} = \begin{bmatrix} u & r^2\omega \end{bmatrix}. $$
Any action-minimizing trajectory $x(t)=(r(t),\theta(t))$ must satisfy (DEL), namely,
$$ \frac{d}{dt}\begin{bmatrix} \dot r & r^2\dot\theta \end{bmatrix} = \begin{bmatrix} r\dot\theta^2 - \frac{K}{r^2} & 0 \end{bmatrix}. $$
This leads to the system of 2 second-order equations in 2 unknowns,
$$ \ddot r(t) = r\dot\theta^2 - \frac{K}{r^2}, \qquad r^2\dot\theta = \text{const}. $$
The second equation expresses conservation of angular momentum (and leads to Kepler's Second Law). Combining it with the first leads to Kepler's other two laws of planetary motion; physics courses show how.
In this problem we have $L_t=0$, so (WE2) implies that the following quantity is constant:
$$ \widehat L(t) - \widehat L_v(t)\dot{\hat x}(t) = \tfrac12\dot r^2 + \tfrac12 r^2\dot\theta^2 + \frac{K}{r} - \begin{bmatrix} \dot r & r^2\dot\theta \end{bmatrix}\begin{bmatrix} \dot r\\ \dot\theta \end{bmatrix} = -\tfrac12\dot r^2 - \tfrac12 r^2\dot\theta^2 + \frac{K}{r} = -\frac{1}{m}(T+V). $$
Again the conservation of total energy arises from a variational principle. ////

B. Parametric Curves

Sometimes the independent variable $t$ is a physically meaningful parameter (as in Kepler's problem above). However, the same letter is often used as a generic parameter name in geometric problems where only the shape of the curve in $\mathbb{R}^n$ is of interest, and the details of its parametrization make no difference. In general, we want to deal with parametric curves $x\colon[t_0,t_1]\to\mathbb{R}^n$ in $C^1([t_0,t_1];\mathbb{R}^n)$; in what follows we enforce smoothness by requiring
$$ 0 \ne |\dot x(t)|^2 = \left(\frac{dx_1}{dt}\right)^2 + \left(\frac{dx_2}{dt}\right)^2 + \cdots + \left(\frac{dx_n}{dt}\right)^2 \qquad \forall t\in[t_0,t_1]. $$
To illustrate, suppose $n=2$. At any parameter value $\tau\in[t_0,t_1]$ the nonzero vector $\dot x(\tau)$ is tangent to the curve at the point $x(\tau)$; when $\dot x_1(\tau)\ne 0$, the slope of the curve at that point equals the slope of $\dot x(\tau)=(\dot x_1(\tau),\dot x_2(\tau))$, namely,
$$ \left.\frac{dx_2}{dx_1}\right|_{t=\tau} = \frac{\dot x_2(\tau)}{\dot x_1(\tau)}. $$
(Picture.)
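The slope formula above can be checked on a concrete curve. Here is a minimal Python sketch, assuming the upper unit semicircle $x(t)=(\cos t,\sin t)$, $0<t<\pi$, as an illustrative example (the curve and all helper names are our own choices): it compares $\dot x_2/\dot x_1$ with the derivative of the same arc written as a graph $x_2=\sqrt{1-x_1^2}$.

```python
import math


def parametric_slope(xdot1, xdot2, t):
    """Slope dx2/dx1 of a planar C^1 curve at parameter t, computed as
    xdot2(t) / xdot1(t).  A zero xdot1(t) means a vertical tangent."""
    d1 = xdot1(t)
    if d1 == 0.0:
        raise ValueError("dx1/dt vanishes: tangent is vertical")
    return xdot2(t) / d1


# Illustrative curve: the upper unit semicircle x(t) = (cos t, sin t), 0 < t < pi.
def xdot1(t):
    return -math.sin(t)


def xdot2(t):
    return math.cos(t)


def graph_slope(x1):
    """Slope of the same arc written as a graph, x2 = sqrt(1 - x1^2)."""
    return -x1 / math.sqrt(1.0 - x1**2)


if __name__ == "__main__":
    for t in (0.3, 1.0, 2.5):
        print(t, parametric_slope(xdot1, xdot2, t), graph_slope(math.cos(t)))
```

The parametric form never needs the square root, and the guard makes explicit that the slope is undefined exactly where the tangent is vertical, even though $\dot x(t)\ne 0$ there.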
In a parametric description where $x(t_0)=a$ and $x(t_1)=b$, the ordinary variational integral $\int_a^b L_0(x,y(x),y'(x))\,dx$ would correspond to
$$ I \overset{\text{def}}{=} \int_{t_0}^{t_1} L_0\!\left(x(t),\, y(t),\, \frac{\dot y(t)}{\dot x(t)}\right)\dot x(t)\,dt. $$
(For calculation purposes, this looks just like integration by substitution, where we "substitute $x=x(t)$".) To prove that this expression is unaffected by changes in parametrization of the curve, suppose $\varphi\colon[r_0,r_1]\to\mathbb{R}$ is differentiable with $\varphi'(r)>0$ always and $\varphi(r_0)=t_0$, $\varphi(r_1)=t_1$. Then the mapping
$$ r \mapsto (\tilde x(r),\tilde y(r)) \overset{\text{def}}{=} \big(x(\varphi(r)),\, y(\varphi(r))\big), \qquad r\in[r_0,r_1], $$
traces the same points of $\mathbb{R}^2$ in the same order as the original curve, and its integral value is
$$ \tilde I \overset{\text{def}}{=} \int_{r_0}^{r_1} L_0\!\left(\tilde x(r),\, \tilde y(r),\, \frac{\dot{\tilde y}(r)}{\dot{\tilde x}(r)}\right)\dot{\tilde x}(r)\,dr. $$
Let $t=\varphi(r)$ in $I$: since
$$ \dot{\tilde x}(r) = \dot x(\varphi(r))\,\dot\varphi(r), \qquad \dot{\tilde y}(r) = \dot y(\varphi(r))\,\dot\varphi(r), \qquad r\in[r_0,r_1], $$
we get
$$ I = \int_{r_0}^{r_1} L_0\!\left(x(\varphi(r)),\, y(\varphi(r)),\, \frac{\dot y(\varphi(r))}{\dot x(\varphi(r))}\right)\dot x(\varphi(r))\,\dot\varphi(r)\,dr = \int_{r_0}^{r_1} L_0\!\left(\tilde x(r),\, \tilde y(r),\, \frac{\dot{\tilde y}(r)}{\dot{\tilde x}(r)}\right)\dot{\tilde x}(r)\,dr = \tilde I. $$
This calculation shows that the integrand
$$ L\left(\begin{bmatrix} x\\ y \end{bmatrix}, \begin{bmatrix} v\\ w \end{bmatrix}\right) = L_0(x,y,w/v)\,v \tag{0} $$
leads to a functional
$$ \Lambda\begin{bmatrix} x\\ y \end{bmatrix} = \int_{t_0}^{t_1} L\left(\begin{bmatrix} x(t)\\ y(t) \end{bmatrix}, \begin{bmatrix} \dot x(t)\\ \dot y(t) \end{bmatrix}\right)dt $$
that assigns the same numerical value to a given geometric curve regardless of its parametric description.
In more general situations, where $x$ denotes a parametric arc in $\mathbb{R}^n$, any integrand $L\colon\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$ with the property
$$ L(x,rv) = rL(x,v) \qquad \forall r>0,\ \forall x,v\in\mathbb{R}^n \tag{1} $$
will give a functional $\Lambda[x]=\int_{t_0}^{t_1}L(x(t),\dot x(t))\,dt$ whose value is the same for every parametrization of the input curve. (Notice that property (1) holds for any integrand defined using (0).) The converse also holds, giving the following result.

Theorem. Let $L=L(t,x,v)$ be of class $C^1$ on $\mathbb{R}\times\mathbb{R}^n\times\mathbb{R}^n$.
The corresponding functional $\Lambda[x]=\int_a^b L(t,x(t),\dot x(t))\,dt$ gives the same value to every parametric description of the same arc if and only if both (i) $L$ has no direct $t$-dependence, i.e., $L_t(t,x,v)=0$ everywhere, and (ii) $L$ satisfies the homogeneity condition (1).
Proof. Easy, but details are given by Bliss, Lectures on the COV. ////

Euler's Theorem on Homogeneous Functions. Taking $\partial/\partial r$ in (1) gives $L_v(x,rv)v=L(x,v)$. Substituting $r=1$ yields a useful identity, which can be differentiated further: for all $x,v$,
$$ L_v(x,v)v = L(x,v), \qquad v^T L_{vx}(x,v) = L_x(x,v), \qquad v^T L_{vv}(x,v) = 0. \tag{2} $$
Hence condition (WE2) is completely redundant in parametric problems (don't waste your time), and the matrix $L_{vv}(x,v)$ is guaranteed to be singular at every point with $v\ne 0$ (so the Weierstrass/Hilbert Theorem can't be applied directly as written). Furthermore, the equations making up (DEL) are linearly dependent: in vector notation, any arbitrary parametric curve $x$ will obey
$$ \left[\frac{d}{dt}L_v(x(t),\dot x(t)) - L_x(x(t),\dot x(t))\right]\dot x(t) = 0 \qquad \text{for all } t. $$
(Practice: Derive this from (2).) This is to be expected: since the integral shows no preference for one parametrization over another, the system of Euler equations for the solution arc should leave us one degree of freedom to choose whichever parametrization is convenient.

C. Piecewise Smooth Arcs

Overview. Allowing arcs with corners to compete for the minimum in a COV problem increases the chances that a minimizer exists.

Infimum versus Minimum. Let $S\subseteq\mathbb{R}$, $S\ne\emptyset$. The notation $m=\min S=\min(S)$ is reserved for the unique real number $m$ with these two properties: (i) $m\le s$ for all $s\in S$; (ii) $m\in S$. Strictly speaking, then, the following symbol is undefined:
$$ \min\{t\in\mathbb{R} : t>0\}. $$
(The set $S=(0,+\infty)$ contains no $m$ with properties (i)–(ii).)
The notation $\mu=\inf S=\inf(S)$ is reserved for the unique element of $\mathbb{R}\cup\{-\infty\}$ with these two properties: (i) $\mu\le s$ for all $s\in S$, and (ii) $S$ contains a sequence $(s_n)$ such that $s_n\to\mu$. E.g., $\inf\{t\in\mathbb{R} : t>0\}=0$. When $S=\{\Phi(x) : x\in X\}$ for some function $\Phi\colon X\to\mathbb{R}$, we write
$$ \inf(S) = \inf_{x\in X}\Phi(x). $$
Any sequence $(s_n)$ as in (ii) must have the form $s_n=\Phi(x_n)$ for some $x_n\in X$; we then call $(x_n)$ a minimizing sequence for $\Phi$. If
$$ m = \min(S) = \min_{x\in X}\Phi(x) $$
happens to exist, then the condition $m\in S$ forces $m=\Phi(\hat x)$ for some $\hat x\in X$: then $\hat x$ is a minimizing point for $\Phi$.

Example. Let $S=\{x\in C^1[-1,1] : x(-1)=0,\ x(1)=1\}$, and solve
$$ \min_{x\in S}\Lambda[x] := \int_{-1}^{1} x(t)^2\big(\dot x(t)-1\big)^2\,dt. $$
Clearly $\Lambda[x]\ge 0$ for all arcs $x$. Indeed, to get $\Lambda[x]=0$ would require either $x(t)=0$ or $\dot x(t)=1$ at each $t$, and there is no $x\in S$ with this property, so $\Lambda[x]>0$ for all $x\in S$. Of course, $\Lambda[\hat x]=0$ for
$$ \hat x(t) = \begin{cases} 0, & -1\le t<0,\\ t, & 0\le t\le 1, \end{cases} $$
but $\hat x\notin S$ because $\dot{\hat x}(0)$ does not exist. A sequence of smooth arcs approximating $\hat x$ is obtained by defining
$$ v_k(t) = \begin{cases} 0, & -1\le t<-\tfrac1k,\\ \tfrac k2\left(t+\tfrac1k\right), & -\tfrac1k\le t<\tfrac1k,\\ 1, & \tfrac1k\le t\le 1, \end{cases} \qquad x_k(t) = \int_{-1}^{t} v_k(r)\,dr = \begin{cases} 0, & -1\le t<-\tfrac1k,\\ \tfrac k4\left(t+\tfrac1k\right)^2, & -\tfrac1k\le t<\tfrac1k,\\ t, & \tfrac1k\le t\le 1. \end{cases} $$
(Pictures.) It's easy to check that each $x_k\in S$, and that
$$ \Lambda[x_k] = \int_{-1/k}^{1/k} x_k(t)^2\,\big[v_k(t)-1\big]^2\,dt \to 0 \qquad \text{as } k\to\infty. $$
It follows that $\inf_S\Lambda=0$, while $\min_S\Lambda$ does not exist. If we could allow inputs like $\hat x$, the minimum would exist and equal 0. ////

Use "smooth" as a synonym for "continuously differentiable (once)." Then call a function $x\colon[a,b]\to\mathbb{R}$ piecewise smooth if it is a finite end-to-end concatenation of smooth arcs. Formally, say $x\in PWS[a,b]$ if and only if $x$ is continuous on $[a,b]$ and there exist $N\in\mathbb{N}$ and numbers $a=t_0<t_1<t_2<\cdots<t_N=b$ such that $x\in C^1[t_{i-1},t_i]$ for each $i=1,2,\ldots,N$.
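The minimizing sequence above is piecewise defined, so its cost is naturally evaluated one piece at a time. Here is a minimal Python sketch that does exactly that, applying a composite Simpson rule separately on $[-1,-\tfrac1k]$, $[-\tfrac1k,\tfrac1k]$, $[\tfrac1k,1]$. For comparison it uses the closed form $\Lambda[x_k]=2/(105k^3)$, which is our own computation, not stated in the notes: substituting $t+\tfrac1k=2u/k$ reduces the middle integral to $(2/k^3)\int_0^1 u^4(1-u)^2\,du$ and $\int_0^1 u^4(1-u)^2\,du=1/105$. The helper names and the panel count are illustrative choices.

```python
def simpson(f, lo, hi, n=200):
    """Composite Simpson rule for f on [lo, hi] with an even number n of panels."""
    if n % 2:
        raise ValueError("n must be even")
    step = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * step)
    return total * step / 3.0


def smoothed_arc(k):
    """The arcs x_k and v_k = x_k' from the example, for k >= 1."""
    def v(t):
        if t < -1.0 / k:
            return 0.0
        if t < 1.0 / k:
            return 0.5 * k * (t + 1.0 / k)
        return 1.0

    def x(t):
        if t < -1.0 / k:
            return 0.0
        if t < 1.0 / k:
            return 0.25 * k * (t + 1.0 / k) ** 2
        return t

    return x, v


def cost(x, v, breaks):
    """Lambda[x] = integral of x^2 (x' - 1)^2, summed over consecutive
    breakpoints so that each Simpson rule sees a single formula."""
    def integrand(t):
        return x(t) ** 2 * (v(t) - 1.0) ** 2

    return sum(simpson(integrand, a, b) for a, b in zip(breaks, breaks[1:]))


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        x, v = smoothed_arc(k)
        breaks = [-1.0, -1.0 / k, 1.0 / k, 1.0]
        print(k, cost(x, v, breaks), 2.0 / (105.0 * k**3))
```

The outer pieces contribute exactly zero (there $x_k=0$ or $\dot x_k=1$), and the middle contribution shrinks like $k^{-3}$, which is the quantitative form of $\inf_S\Lambda=0$.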
As before, $x\in C^1[t_{i-1},t_i]$ requires these two one-sided limits to exist finitely for each $i=1,\ldots,N$:
$$ \dot x(t_{i-1}^+) = \lim_{t\to t_{i-1}^+}\dot x(t), \qquad \dot x(t_i^-) = \lim_{t\to t_i^-}\dot x(t). $$
Thus the job of the partition points $t_0,\ldots,t_N$ is to cover endpoints and corner points in the graph of $x$. Write $PWS([a,b];\mathbb{R}^n)$ for the set of arcs for which each component is piecewise smooth. For any such arc there will be a finite subset of $[a,b]$ covering all points where any component has a jump in its derivative.

Terminology Upgrade. From now on, "arc" means "function in PWS".

For any continuous Lagrangian $L=L(t,x,v)$ and arc $x$ on $[a,b]$, the function $t\mapsto L(t,x(t),\dot x(t))$ is piecewise continuous. Such a function is certainly (Riemann) integrable. Indeed, if the partition $t_0<\cdots<t_N$ covers the corners of $x$, then
$$ \Lambda[x] = \int_a^b L(t,x(t),\dot x(t))\,dt = \sum_{i=1}^{N}\int_{t_{i-1}}^{t_i} L(t,x(t),\dot x(t))\,dt $$
shows how to split the evaluation of $\Lambda$ into a sum of integrals involving continuous functions.

Note: If $x\in PWS[a,b]$ obeys $x(a)=A$ and $x(b)=B$, then we still get
$$ \int_a^b \dot x(t)\,dt = x(b)-x(a) = B-A. $$
Indeed, choose any partition $t_0<\cdots<t_N$ that covers the corners of $x$, and telescope the sum:
$$ \int_a^b \dot x(t)\,dt = \sum_{i=1}^{N}\int_{t_{i-1}}^{t_i}\dot x(t)\,dt = \sum_{i=1}^{N}\big[x(t_i)-x(t_{i-1})\big] = x(t_N)-x(t_0) = x(b)-x(a). $$

Theorem (The Fairing Theorem; Troutman Prop 7.6). Under the conditions stated above,
$$ \inf_{x\in PWS}\{\Lambda[x] : x(a)=A,\ x(b)=B\} = \inf_{x\in C^1}\{\Lambda[x] : x(a)=A,\ x(b)=B\}. \tag{$*$} $$
Proof. (Sketch.) Inequality $\le$ in ($*$) is obvious, since the arcs competing for minimality on the left include all the arcs allowed on the right and more besides. To prove the reverse inequality, choose an arbitrary $\varepsilon>0$. For any $z\in PWS$ satisfying the endpoint conditions, surround each corner point of $z$ in $(a,b)$ with a small open interval; write $\Omega$ for the union of these intervals.
Carefully modify $z$ inside $\Omega$ to smooth out its corners, thereby producing an arc $y\in C^1$ such that
$$ \dot y(t) = \dot z(t) \quad \forall t\in[a,b]\setminus\Omega, \qquad \sup_{\Omega\setminus\{\text{corners}\}}|\dot y-\dot z| < +\infty. $$
By controlling both the length of $\Omega$ and the worst-case discrepancy $\sup_{[a,b]}|y-z|$, we can arrange that
$$ y(a)=z(a), \quad y(b)=z(b), \quad\text{and}\quad \Lambda[z] \ge \Lambda[y]-\varepsilon \ge \inf_{y\in C^1}\Lambda[y]-\varepsilon. $$
Here the rightmost expression is independent of $z$, and the same argument works for any $z$, so
$$ \inf_{z\in PWS}\Lambda[z] \ge \inf_{y\in C^1}\Lambda[y]-\varepsilon. $$
Since this last inequality holds for arbitrary $\varepsilon>0$, we have
$$ \inf_{z\in PWS}\Lambda[z] \ge \inf_{y\in C^1}\Lambda[y], $$
as required. ////

Costs and Benefits of Using PWS.
1. Allowing corners makes existence of minimizers more likely. (Benefit.)
2. Allowing corners doesn't affect the infimum value. (Benefit.)
3. Cornered variations are technically convenient. (Benefit.)
4. Familiar necessary conditions can be extended to allow corners, but this takes work and careful interpretation. (Cost.)

D. Piecewise Smooth Directional Local Minimizers

Cornered Variations. Upgrade variation spaces with notation like
$$ V_{II} = \{h\in PWS[a,b] : h(a)=0=h(b)\}. $$
There are lots of piecewise linear arcs in this space that are easier to calculate with than their smooth approximants.

Principle of Optimality. Suppose $\hat x$ gives a DLM relative to $V_{II}$ in
$$ \min\left\{\int_a^b L(t,x(t),\dot x(t))\,dt : x\in PWS,\ x(a)=A,\ x(b)=B\right\}. \tag{P} $$
Pick any subinterval $[\alpha,\beta]\subseteq[a,b]$ and use the corresponding points on $\operatorname{gph}(\hat x)$ as endpoint targets in a new problem:
$$ \min\left\{\int_\alpha^\beta L(t,x(t),\dot x(t))\,dt : x\in PWS,\ x(\alpha)=\hat x(\alpha),\ x(\beta)=\hat x(\beta)\right\}. $$
Clearly, the restriction of $\hat x$ to $[\alpha,\beta]$ gives a DLM in this new problem. (Suppose not: if some $\hat y$ provides a lower cost over $[\alpha,\beta]$, then splicing $\hat x$ to $\hat y$ to $\hat x$ would produce a lower cost over the full interval $[a,b]$. End-to-end splicing preserves the defining property of PWS.)

Necessary Conditions. Let $\hat x, h\in PWS[a,b]$.
By splitting the whole calculation into subintervals where both $\hat x$ and $h$ are $C^1$, the same calculation done before will prove
$$ \Lambda'[\hat x;h] = \left(\int_a^b \widehat L_x(r)\,dr\right)h(b) + \int_a^b\left[\widehat L_v(t) - \int_a^t \widehat L_x(r)\,dr\right]\dot h(t)\,dt. $$
Now suppose $\hat x$ gives a DLM in problem (P) and has exactly one corner point, say at $\theta\in(a,b)$. Apply the Principle of Optimality on $[a,\theta]$ to see that $\hat x$ solves a version of (P) there, so (IEL) holds:
$$ \exists c_1:\quad \widehat L_v(t) = c_1 + \int_a^t \widehat L_x(r)\,dr \qquad \forall t\in[a,\theta). \tag{$\dagger$} $$
Likewise for $[\theta,b]$:
$$ \exists d_1:\quad \widehat L_v(t) = d_1 + \int_\theta^t \widehat L_x(r)\,dr \qquad \forall t\in(\theta,b]. $$
Define $c_2 = d_1 - \int_a^\theta \widehat L_x(r)\,dr$ to get
$$ \widehat L_v(t) = c_2 + \int_a^t \widehat L_x(r)\,dr \qquad \forall t\in(\theta,b]. \tag{$\ddagger$} $$
Combine ($\dagger$)–($\ddagger$) to get
$$ \widehat L_v(t) - \int_a^t \widehat L_x(r)\,dr = \begin{cases} c_1, & t\in[a,\theta),\\ c_2, & t\in(\theta,b]. \end{cases} \tag{$**$} $$
Now $\hat x$ gives a DLM in (P), so our general theory says that for every $h\in V_{II}$,
$$ 0 = \Lambda'(\hat x;h) = \int_a^b\left[\widehat L_v(t) - \int_a^t \widehat L_x(r)\,dr\right]\dot h(t)\,dt = \int_a^\theta c_1\,\dot h(t)\,dt + \int_\theta^b c_2\,\dot h(t)\,dt = [c_1-c_2]\,h(\theta). $$
Since $h\in V_{II}$ is arbitrary, this forces $c_1=c_2$. Hence, by ($**$), the constant $c=c_1=c_2$ obeys
$$ \widehat L_v(t) = c + \int_a^t \widehat L_x(r)\,dr \qquad \forall t\in[a,\theta)\cup(\theta,b]. $$
Similar arguments can be made for any finite number of corner points, with the following result.

Theorem. If $L\in C^1$ and $\hat x\in PWS[a,b]$ gives a DLM in problem (P), then there is a constant $c$ such that for all $t\in[a,b]$ where $\dot{\hat x}$ is continuous,
$$ \widehat L_v(t) = c + \int_a^t \widehat L_x(r)\,dr. \tag{IEL} $$

Regularity Bonus. The function of $t$ on the right side of (IEL) is continuous on $[a,b]$, even if $\hat x$ has corners. Consequently all discontinuities of the function $\widehat L_v(t)$ are removable: the one-sided limits obey
$$ \widehat L_v(t^-) = \widehat L_v(t^+) \qquad \forall t\in(a,b). \tag{WE1} $$

Example. In the motivating example of Section C above, $\hat x(t)=\max\{0,t\}$ and $L(t,x,v)=x^2(v-1)^2$. Calculation gives $L_v=2x^2(v-1)$ and
$$ \dot{\hat x}(t) = \begin{cases} 0, & -1\le t<0,\\ \text{undefined}, & t=0,\\ 1, & 0<t\le 1. \end{cases} $$
Hence
$$ \widehat L_v(t) = 2\hat x(t)^2\big(\dot{\hat x}(t)-1\big) = \begin{cases} 0, & -1\le t<0,\\ \text{undefined}, & t=0,\\ 0, & 0<t\le 1. \end{cases} $$
The graph of $t\mapsto\widehat L_v(t)$ looks like a continuous function punctuated by at most finitely many holes where the function is undefined. ////

Extremality. Any $\hat x\in PWS$ obeying (IEL) on an open interval, with finitely many exceptions, is called an extremal for $L$. At the non-exceptional points, differentiation gives
$$ \frac{d}{dt}\widehat L_v(t) = \widehat L_x(t). \tag{DEL} $$
But splicing solutions of (DEL) across points where $\hat x$ has corners must be done with care: for arcs in PWS, we have
$$ \text{(IEL)} \iff \text{(DEL) and (WE1)}. $$

Geometrical Interpretation. Suppose $L\in C^1$ and some extremal $\hat x$ has a corner point at $(t_0,x_0)$. Let $u=\dot{\hat x}(t_0^-)$ and $w=\dot{\hat x}(t_0^+)$. Then $u\ne w$, but by (WE1),
$$ L_v(t_0,x_0,u) = \lim_{t\to t_0^-}L_v\big(t,\hat x(t),\dot{\hat x}(t)\big) = \lim_{t\to t_0^+}L_v\big(t,\hat x(t),\dot{\hat x}(t)\big) = L_v(t_0,x_0,w). \tag{$*$} $$
This means that the function $v\mapsto L_v(t_0,x_0,v)$ is not one-to-one: $v=u$ and $v=w$ are different inputs that give the same output. Illustrate with $L(t,x,v)=(v^2-1)^2$, for which
$$ L = v^4-2v^2+1 \implies L_v = 4v^3-4v = 4v(v^2-1). $$
A sketch shows that the choices $u=1$, $w=-1$ are compatible, since both give $L_v=0$: hence all zig-zag arcs of slope $\pm1$ are extremals for $L$.

Theorem (Regularity). Fix $(t_0,x_0)\in\mathbb{R}\times\mathbb{R}^n$, and suppose $L=L(t,x,v)$ is $C^1$ on some open set containing $\{(t_0,x_0)\}\times\mathbb{R}^n$. Suppose $\hat x$ is an extremal for $L$ obeying $\hat x(t_0)=x_0$.
(a) If $v\mapsto L_v(t_0,x_0,v)$ is one-to-one on $\mathbb{R}^n$, then $\hat x$ must be $C^1$ on some open interval containing $t_0$.
(b) If $L$ is $C^2$ on the open set above and $L_{vv}(t_0,x_0,v)>0$ for all $v\in\mathbb{R}^n$, then $\hat x$ must be $C^2$ on some open interval containing $t_0$. [Likewise if $L_{vv}(t_0,x_0,v)<0$ for all $v\in\mathbb{R}^n$.]
Proof. (a) Define $u=\dot{\hat x}(t_0^-)$ and $w=\dot{\hat x}(t_0^+)$.
As in ($*$) above, (WE1) gives
$$ L_v(t_0,x_0,u) = \lim_{t\to t_0^-}L_v\big(t,\hat x(t),\dot{\hat x}(t)\big) = \lim_{t\to t_0^+}L_v\big(t,\hat x(t),\dot{\hat x}(t)\big) = L_v(t_0,x_0,w). $$
Since $v\mapsto L_v(t_0,x_0,v)$ is one-to-one, this forces $u=w$. By L'Hospital's Rule, $\dot{\hat x}(t_0)$ exists and equals $u=w$. [Give details here.] In particular, $t_0$ is not a corner point for $\hat x$. Since there are only finitely many corner points, the nearest one is some positive distance (say $r$) from $t_0$: then $\hat x$ is $C^1$ on $(t_0-r,t_0+r)$.
(b) The hypothesis implies that $v\mapsto L_v(t_0,x_0,v)$ is strictly monotonic, hence one-to-one. So $\hat x$ is $C^1$ near $t_0$ by (a). Hence $\hat x$ is $C^2$ near $t_0$ by the Weierstrass/Hilbert Theorem. ////

Natural Boundary Conditions. These work just the same as they did for smooth extremals. Students derived a rather general formulation on HW02, and solutions have been distributed.

E. Multiple Integrals

Suppose a bounded open set $\Omega$ in $\mathbb{R}^3$ is given, together with a scalar-valued function $g$ defined and continuous on some open set that contains the boundary surface of $\Omega$. Use the symbol $\partial\Omega$ for this boundary surface, and write $\overline\Omega$ for the set $\Omega\cup\partial\Omega$. (The surface $\partial\Omega$ is "closed" in two senses: (1) it completely encloses a finite three-dimensional volume, (2) its complement in $\mathbb{R}^3$ is an open set. Topologically, $\overline\Omega$ is the "closure of $\Omega$", a compact set in $\mathbb{R}^3$.) We are interested in scalar-valued functions $u\colon\overline\Omega\to\mathbb{R}$ that agree with $g$ on the boundary, i.e.,
$$ u(x) = g(x) \qquad \forall x\in\partial\Omega. \tag{DBC} $$
[DBC stands for Dirichlet Boundary Condition.] Every such function $u$ is assigned a number by the functional
$$ \Lambda[u] \overset{\text{def}}{=} \iiint_\Omega L\big(x,u(x),\nabla u(x)\big)\,dV(x), $$
where $L\colon\overline\Omega\times\mathbb{R}\times\mathbb{R}^3\to\mathbb{R}$ is a given function. The basic multidimensional problem in the COV is to minimize $\Lambda[u]$ subject to (DBC). In full detail,
$$ \text{minimize } \Lambda[u] \overset{\text{def}}{=} \iiint_\Omega L\big(x,u(x),\nabla u(x)\big)\,dV(x) \quad\text{among all } u\colon\overline\Omega\to\mathbb{R} \text{ subject to } u(x)=g(x)\ \ \forall x\in\partial\Omega. $$

Background.
Under suitable smoothness hypotheses on the set $\Omega$ and a vector field $\mathbf F\colon\overline\Omega\to\mathbb{R}^3$, Gauss's Divergence Theorem states
$$ \iiint_\Omega (\nabla\cdot\mathbf F)\,dV = \iint_{\partial\Omega}\mathbf F\cdot\widehat{\mathbf N}\,dS. $$
This is a form of the Fundamental Theorem of Calculus: it shows how a suitable combination of first derivatives of $\mathbf F$ will "cancel" one integral in a triple-integral setup, leaving only a double integral over the boundary of the original integration domain. The boundary integral on the right is a "flux integral": $\widehat{\mathbf N}$ represents the outward unit normal to the solid $\Omega$, which typically varies from point to point.
A typical $\mathbf F\colon\overline\Omega\to\mathbb{R}^3$ will have input $x=(x_1,x_2,x_3)$ and values as shown here:
$$ \mathbf F(x_1,x_2,x_3) = \begin{bmatrix} P(x_1,x_2,x_3)\\ Q(x_1,x_2,x_3)\\ R(x_1,x_2,x_3) \end{bmatrix}. $$
The divergence of $\mathbf F$ is this scalar-valued function of position:
$$ \operatorname{div}(\mathbf F) = \nabla\cdot\mathbf F = \frac{\partial P}{\partial x_1} + \frac{\partial Q}{\partial x_2} + \frac{\partial R}{\partial x_3}. $$
For any smooth scalar-valued function $y\colon\overline\Omega\to\mathbb{R}$, one form of the product rule says
$$ \nabla\cdot(y\mathbf F) = (\nabla y)\cdot\mathbf F + y(\nabla\cdot\mathbf F). $$
Necessary conditions for optimality in problem (P) can be derived following the same abstract path we took in earlier studies. First, pick an arbitrary function $u$ obeying (DBC) and an arbitrary $y\colon\overline\Omega\to\mathbb{R}$ satisfying $y(x)=0$ for each $x\in\partial\Omega$. Consider
$$ \begin{aligned} \Lambda'[u;y] &= \lim_{\lambda\to0^+}\frac{\Lambda[u+\lambda y]-\Lambda[u]}{\lambda} = \left.\frac{d}{d\lambda}\Lambda[u+\lambda y]\right|_{\lambda=0}\\ &= \left.\frac{d}{d\lambda}\iiint_\Omega L\big(x,\,u(x)+\lambda y(x),\,\nabla u(x)+\lambda\nabla y(x)\big)\,dV(x)\right|_{\lambda=0}\\ &= \iiint_\Omega \left.\frac{d}{d\lambda}L\big(x,\,u(x)+\lambda y(x),\,\nabla u(x)+\lambda\nabla y(x)\big)\right|_{\lambda=0}dV(x)\\ &= \iiint_\Omega \Big[L_u\big(x,u(x),\nabla u(x)\big)\,y(x) + L_w\big(x,u(x),\nabla u(x)\big)\,\nabla y(x)\Big]\,dV(x). \end{aligned} $$
Now we apply the product rule above with $\mathbf F(x)=L_w(x,u(x),\nabla u(x))$: it gives
$$ L_w\big(x,u(x),\nabla u(x)\big)\,\nabla y(x) = \nabla\cdot\Big(y(x)\,L_w\big(x,u(x),\nabla u(x)\big)\Big) - y(x)\Big(\nabla\cdot L_w\big(x,u(x),\nabla u(x)\big)\Big). $$
Integrating this and applying the Divergence Theorem gives
$$ \begin{aligned} \iiint_\Omega L_w\big(x,u(x),\nabla u(x)\big)\,\nabla y(x)\,dV &= \iiint_\Omega \nabla\cdot\Big(y(x)\,L_w\big(x,u(x),\nabla u(x)\big)\Big)\,dV - \iiint_\Omega y(x)\Big(\nabla\cdot L_w\big(x,u(x),\nabla u(x)\big)\Big)\,dV\\ &= \iint_{\partial\Omega} y(x)\,L_w\big(x,u(x),\nabla u(x)\big)\cdot\widehat{\mathbf N}\,dS - \iiint_\Omega y(x)\Big(\nabla\cdot L_w\big(x,u(x),\nabla u(x)\big)\Big)\,dV. \end{aligned} $$
In summary, we have
$$ \Lambda'[u;y] = \iiint_\Omega \Big[L_u\big(x,u(x),\nabla u(x)\big) - \nabla\cdot L_w\big(x,u(x),\nabla u(x)\big)\Big]\,y(x)\,dV(x) + \iint_{\partial\Omega} y(x)\,L_w\big(x,u(x),\nabla u(x)\big)\cdot\widehat{\mathbf N}\,dS. $$
This formula is valid for any smooth $y$. If, in addition, $y(x)=0$ for each $x$ on $\partial\Omega$, then the second integral equals 0. A simple argument involving perturbations with bump-shaped graphs leads to the following necessary condition for $u$ to solve problem (P):
$$ \nabla\cdot L_w\big(x,u(x),\nabla u(x)\big) = L_u\big(x,u(x),\nabla u(x)\big), \qquad x\in\Omega. \tag{ELPDE} $$

Example. Suppose
$$ \Lambda[u] = \iiint_\Omega \tfrac12|\nabla u(x)|^2\,dV = \frac12\iiint_\Omega\left[\left(\frac{\partial u}{\partial x_1}\right)^2 + \left(\frac{\partial u}{\partial x_2}\right)^2 + \left(\frac{\partial u}{\partial x_3}\right)^2\right]dV. $$
Here we have
$$ L\left(\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}, u, \begin{bmatrix} w_1\\ w_2\\ w_3 \end{bmatrix}\right) = \tfrac12 w_1^2 + \tfrac12 w_2^2 + \tfrac12 w_3^2, $$
so $L_u\equiv 0$ and
$$ L_w(x,u,w) = D_wL(x,u,w) = \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix}. $$
So a function $u$ is an extremal [i.e., a solution of (ELPDE)] exactly when
$$ 0 = \nabla\cdot L_w\big(x,u(x),\nabla u(x)\big) = \nabla\cdot\begin{bmatrix} u_{,1}(x) & u_{,2}(x) & u_{,3}(x) \end{bmatrix} = u_{,11}+u_{,22}+u_{,33}. $$
This is Laplace's Equation for the function $u$. ////

There are analogous results in all dimensions. For the case of two independent variables, where $\Omega\subseteq\mathbb{R}^2$, $u=u(x,y)$ and $L=L((x,y),u,(v,w))$, (ELPDE) says
$$ \frac{\partial}{\partial x}L_v\big((x,y),u(x,y),(u_x(x,y),u_y(x,y))\big) + \frac{\partial}{\partial y}L_w\big((x,y),u(x,y),(u_x(x,y),u_y(x,y))\big) = L_u\big((x,y),u(x,y),(u_x(x,y),u_y(x,y))\big), \qquad (x,y)\in\Omega. $$

F. Local Minima

(Basic Problem)
$$ \min\{\Lambda[x] : x\in PWS[a,b],\ x(a)=A,\ x(b)=B\}. \tag{P} $$
Recall the space of variations $V_{II}=\{h\in PWS[a,b] : h(a)=0=h(b)\}$. We're looking for arcs $\hat x$ that obey
$$ \Lambda[\hat x] \le \Lambda[\hat x+\lambda h] \tag{$*$} $$
for a good selection of variations $h\in V_{II}$ and $\lambda>0$.
$\hat x$ gives a (Global) Minimum if ($*$) holds for all $\lambda>0$ and all $h\in V_{II}$.
$\hat x$ gives a Strong Local Minimum if there exists $\rho>0$ such that ($*$) holds for any combination of $\lambda>0$ and $h\in V_{II}$ such that $\max_{t\in[a,b]}|\lambda h(t)|<\rho$.
$\hat x$ gives a Weak Local Minimum if there exists $\rho>0$ such that ($*$) holds for any combination of $\lambda>0$ and $h\in V_{II}$ such that both
$$ \max_{t\in[a,b]}|\lambda h(t)| < \rho \quad\text{and}\quad \sup_{t\in[a,b]}|\lambda\dot h(t)| < \rho. $$
$\hat x$ gives a Directional Local Minimum if for each fixed $h\in V_{II}$, there exists $\rho>0$ such that ($*$) holds for all $\lambda\in(0,\rho)$. I.e., for each $h$, the function $\lambda\mapsto\Lambda[\hat x+\lambda h]$ has a local minimum over $[0,+\infty)$ at $\lambda=0$.
Each definition is different, and the sets of arcs in the various categories may differ. Writing $\Sigma(P)$ for the set of global minimizers, and adding descriptive superscripts as suggested above, we have
$$ \Sigma(P) \subseteq \Sigma^{\mathrm{SLM}}(P) \subseteq \Sigma^{\mathrm{WLM}}(P) \subseteq \Sigma^{\mathrm{DLM}}(P). $$

Short Segments. Combining each definition above with the Principle of Optimality leads to a "short segment" version of the criterion. Instead of the full vector space $V_{II}$, we use the subset of variations $h$ for which all nonzero values occur for $t$ in some short open interval. For example, an arc $\hat x$ gives a Directional Local Minimum on Short Segments if there exists $\rho>0$ such that for each fixed $h\in V_{II}$ whose nonzero values can all be covered by some open interval of length $\rho$, the function $\lambda\mapsto\Lambda[\hat x+\lambda h]$ has a local minimum over $[0,+\infty)$ at $\lambda=0$. Clearly, $\Sigma^{\mathrm{DLM\,SS}}$ contains $\Sigma^{\mathrm{DLM}}$. There must be more. Our proofs of IEL, NBC, WE1, WE2 (when $\hat x\in C^2$), and Hilbert's theorem apply to every arc in the biggest class, $\Sigma^{\mathrm{DLM\,SS}}$. (For the DLM class this is obvious; add SS by using the Principle of Optimality.) Assuming a more robust type of local minimum is a stronger hypothesis, and should give stronger conclusions.

Intuition-Builder. Consider $f\colon\mathbb{R}^2\to\mathbb{R}$ defined by
$$ f(x) = \begin{cases} -1, & \text{if } x_2=x_1^2,\ x_1>0,\\ 0, & \text{otherwise}, \end{cases} $$
near the point $\hat x=0$. Pick any nonzero $h\in\mathbb{R}^2$ and observe that there exists $\rho>0$ such that
$$ f(\lambda h) = 0 = f(0,0) \qquad \forall\lambda\in(0,\rho). $$
Hence the point $\hat x=0$ gives a directional local min for $f$.
Detail: Fix $h=(h_1,h_2)\in\mathbb{R}^2$. If either $h_1\le 0$ or $h_2\le 0$, then $f(\lambda h)=0$ for all $\lambda\ge 0$.
If both $h_1>0$ and $h_2>0$, then for $\lambda>0$,
$$ f(\lambda h)\ne 0 \iff f(\lambda h)=-1 \iff \lambda h_2=(\lambda h_1)^2 \iff \lambda=\frac{h_2}{h_1^2}. $$
Thus $f(\lambda h)=0$ for all $\lambda\in[0,\rho)$, where $\rho=\rho(h)\overset{\text{def}}{=}h_2/h_1^2$. Note, however, that there are points arbitrarily near $\hat x$ with smaller values. (E.g., $x^{(n)}=(1/n,1/n^2)$ for large $n$.)

G. Strong Local Minimizers; the Weierstrass Condition

Example. Consider the extremal $\hat x(t)=t$ for the problem
$$ \min\left\{\int_0^1 \dot x(t)^3\,dt : x(0)=0,\ x(1)=1\right\}. $$
Given any $\varepsilon>0$, consider the variation $h_\lambda$ defined as follows:
$$ h_\lambda(t) = \begin{cases} -\dfrac{\varepsilon}{\lambda}\,t, & 0\le t\le\lambda,\\[6pt] -\dfrac{\varepsilon}{\lambda-1}\,(t-1), & \lambda<t\le 1. \end{cases} $$
Observe that $\|h_\lambda\|_\infty=\varepsilon$. Calculation reveals
$$ \begin{aligned} \Lambda[\hat x+h_\lambda] &= \int_0^\lambda\left(1-\frac{\varepsilon}{\lambda}\right)^3 dt + \int_\lambda^1\left(1-\frac{\varepsilon}{\lambda-1}\right)^3 dt\\ &= \lambda\left(1-\frac{\varepsilon}{\lambda}\right)^3 + (1-\lambda)\left(1-\frac{\varepsilon}{\lambda-1}\right)^3\\ &= -\frac{(\varepsilon-\lambda)^3}{\lambda^2} + \frac{(1-\lambda+\varepsilon)^3}{(1-\lambda)^2}. \end{aligned} $$
Notice that as $\lambda\to0^+$, the second term here converges to the finite quantity $(1+\varepsilon)^3$, while the first term diverges to $-\infty$. Thus we have $\Lambda[\hat x+h_\lambda]<\Lambda[\hat x]$ for all $\lambda>0$ sufficiently small, and since this is true for every $\varepsilon>0$, the arc $\hat x$ fails to provide a strong local minimum here. In fact, our analysis shows that for every $\varepsilon>0$,
$$ \inf\{\Lambda[x] : x(0)=0,\ x(1)=1,\ \|x-\hat x\|_\infty\le\varepsilon\} \le \inf_{0<\lambda<1}\left[-\frac{(\varepsilon-\lambda)^3}{\lambda^2} + \frac{(1-\lambda+\varepsilon)^3}{(1-\lambda)^2}\right] = -\infty. $$
Thus the problem above has no minimum at all. ////

The method of this example works well in other contexts, too. The key observation is that large negative derivatives will make a large negative contribution to the objective integral. The triangular variations constructed here allow these derivatives to make a large negative contribution in the first very small interval, leading to the divergent first term in the sum above; in contrast, on the remainder of the interval the variations have small slopes (of size about $\varepsilon$), so the integral there stays close to the original objective value $\Lambda[\hat x]$ when $\varepsilon$ is small.
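Because $\hat x+h_\lambda$ is piecewise linear, each piece of the integral is simply (length of piece) times (slope)$^3$, which gives an independent check of the closed form above. Here is a minimal Python sketch that compares the two expressions, estimates $\|h_\lambda\|_\infty$ on a grid, and shows the values dropping below $\Lambda[\hat x]=1$ as $\lambda\to0^+$. The choice $\varepsilon=0.1$, the grid size, and the function names are illustrative assumptions.

```python
def cost_closed_form(lam, eps):
    """Lambda[xhat + h_lambda] for L = v^3, xhat(t) = t, from the closed form."""
    return -(eps - lam) ** 3 / lam**2 + (1.0 - lam + eps) ** 3 / (1.0 - lam) ** 2


def cost_direct(lam, eps):
    """The same value computed piece by piece: xhat + h_lambda is piecewise
    linear, so each integral is (length of piece) * slope^3."""
    slope_left = 1.0 - eps / lam            # on [0, lam]
    slope_right = 1.0 + eps / (1.0 - lam)   # on (lam, 1]
    return lam * slope_left**3 + (1.0 - lam) * slope_right**3


def sup_norm_of_h(lam, eps, samples=10001):
    """Grid estimate of max |h_lambda(t)| over [0, 1]."""
    def h(t):
        if t <= lam:
            return -eps * t / lam
        return -eps * (t - 1.0) / (lam - 1.0)

    return max(abs(h(i / (samples - 1))) for i in range(samples))


if __name__ == "__main__":
    eps = 0.1
    for lam in (0.05, 0.01, 0.001, 0.0001):
        print(lam, cost_closed_form(lam, eps), cost_direct(lam, eps))
```

Note that only $\lambda<\varepsilon$ makes the first slope negative; the divergence to $-\infty$ comes entirely from that steep initial segment, while the uniform distance from $\hat x$ stays fixed at $\varepsilon$.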
It is helpful to imagine changing $\varepsilon$ to $-\varepsilon$ in the definition of $h_\lambda$, thus producing a triangular variation with positive values. In this case the first term of the objective sum above is positive, but the second term diverges to $-\infty$ as one takes the limit $\lambda\to1^-$.

Theorem (Weierstrass, 1879). If $L\in C^1$ and $\hat x$ gives a weak local minimum in the basic problem, then there exists $\varepsilon>0$ such that for all $t\in(a,b)$,
$$ L\big(t,\hat x(t),\dot{\hat x}(t)\big) + L_v\big(t,\hat x(t),\dot{\hat x}(t)\big)\big[w-\dot{\hat x}(t)\big] \le L\big(t,\hat x(t),w\big) \tag{$*$} $$
whenever $|w-\dot{\hat x}(t)|<\varepsilon$. [Interpretation: If $t\in(a,b)$ is a corner point of $\hat x$, ($*$) holds if we write $\dot{\hat x}(t^-)$ or $\dot{\hat x}(t^+)$ instead of $\dot{\hat x}(t)$ throughout.] Moreover, if $\hat x$ gives a strong local minimum, then
(i) ($*$) holds for all $w$ without restriction;
(ii) the function $t\mapsto\widehat L(t)-\widehat L_v(t)\dot{\hat x}(t)$ has only removable discontinuities in $[a,b]$.

Notation. Define the Weierstrass Excess Function
$$ \mathcal E(t,x,v,w) = L(t,x,w) - L(t,x,v) - L_v(t,x,v)[w-v]. $$
Then the Weierstrass Necessary Condition (for strong local minimizers) says
$$ \mathcal E\big(t,\hat x(t),\dot{\hat x}(t),w\big) \ge 0 \qquad \forall w. \tag{W} $$

Geometry. Condition (W) asserts the subgradient inequality for the function $v\mapsto L(t,\hat x(t),v)$ for the specific tangent erected using $v=\dot{\hat x}(t)$. (Draw a picture.) If this function is convex, the subgradient inequality holds for all tangents at all base points; so a simple sufficient condition for (W) [and indeed for (W$^+$) below] is the requirement $L_{vv}(t,\hat x(t),v)\ge 0$ for all $v$.

Proof. (Graves.) Choose $\varepsilon>0$ as in the definition of WLM, and let $t\in(a,b)$ be a non-corner point of $\hat x$. Given any $w$ where $|w-\dot{\hat x}(t)|<\varepsilon$, let $v=w-\dot{\hat x}(t)$. Then $|v|<\varepsilon$ and $w=\dot{\hat x}(t)+v$. For $h>0$ and $\alpha>0$ small enough that $t-\alpha>0$ and $t+\alpha/h<1$, define a variation $y$ by taking $y(0)=0$ and
$$ \dot y(r) = \begin{cases} 0, & 0<r<t-\alpha,\\ v, & t-\alpha<r<t,\\ -hv, & t<r<t+\alpha/h,\\ 0, & t+\alpha/h<r<1. \end{cases} $$
Note that $y$ is small enough, in both $|y|$ and $|\dot y|$, for the inequality defining a weak local minimum to apply, so $\Lambda[\hat x+y]\ge\Lambda[\hat x]$, and consequently
$$ \begin{aligned} 0 &\le \lim_{\alpha\to0^+}\alpha^{-1}\big[\Lambda[\hat x+y]-\Lambda[\hat x]\big]\\ &= \lim_{\alpha\to0^+}\alpha^{-1}\int_{t-\alpha}^{t}\Big[L\big(\hat x(r)+y(r),\,\dot{\hat x}(r)+v\big) - L\big(\hat x(r),\dot{\hat x}(r)\big)\Big]dr\\ &\quad + \lim_{\alpha\to0^+}\alpha^{-1}\int_{t}^{t+\alpha/h}\Big[L\big(\hat x(r)+y(r),\,\dot{\hat x}(r)-hv\big) - L\big(\hat x(r),\dot{\hat x}(r)\big)\Big]dr\\ &= L\big(\hat x(t),\dot{\hat x}(t)+v\big) - L\big(\hat x(t),\dot{\hat x}(t)\big) + \frac1h\Big[L\big(\hat x(t),\dot{\hat x}(t)-hv\big) - L\big(\hat x(t),\dot{\hat x}(t)\big)\Big]. \end{aligned} $$
Rearranging this inequality gives, for all $h>0$ sufficiently small,
$$ L\big(\hat x(t),\dot{\hat x}(t)\big) + \frac{1}{(-h)}\Big[L\big(\hat x(t),\dot{\hat x}(t)-hv\big) - L\big(\hat x(t),\dot{\hat x}(t)\big)\Big] \le L\big(\hat x(t),\dot{\hat x}(t)+v\big). $$
Now ($*$) follows by taking the limit as $h\to0^+$.
Both sides in ($*$) depend continuously on $t$ in any open interval where $\hat x$ has no corners. Hence we can take one-sided limits as $t$ approaches a corner point and retain the validity of the inequality.
(i) If $\hat x$ gives a strong local minimum, then the key inequality above remains valid for arbitrary $v$, so the results above hold for arbitrary $w$.
(ii) Define $p(t):=L_v\big(t,\hat x(t),\dot{\hat x}(t)\big)$. If $\hat x$ is a SLM, it is certainly a DLM, so it must obey (WE1). This means that $p(t^-)=p(t^+)$ holds at each $t\in(a,b)$. Rearranging ($*$) gives
$$ L\big(t,\hat x(t),\dot{\hat x}(t)\big) - p(t)\dot{\hat x}(t) \le L\big(t,\hat x(t),w\big) - p(t)w \qquad \forall w. $$
Now fix $t$. The function of $w$ on the right side is unchanged if we swap $t$ for either $t^-$ or $t^+$, so its minimum value over $w$ is unambiguous. Choosing $t^-$ on the left side gives one expression for this minimum (attained when $w=\dot{\hat x}(t^-)$), while choosing $t^+$ gives another (attained when $w=\dot{\hat x}(t^+)$). Matching these expressions gives the stated result:
$$ H(t^-) = H(t^+), \qquad\text{where } H(t) \overset{\text{def}}{=} \widehat L(t) - \widehat L_v(t)\dot{\hat x}(t). \tag{WE2} $$
////
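For the $\dot x^3$ example of Section G, $L=v^3$ and the extremal $\hat x(t)=t$ has velocity $v=1$, so the excess function reduces to $\mathcal E(w)=w^3-1-3(w-1)=(w-1)^2(w+2)$ (this factorization is our own computation). It is nonnegative for $w\ge-2$, which is consistent with the local form of ($*$) with $\varepsilon=3$, but negative for $w<-2$, so (W) fails; this matches the example's conclusion that $\hat x$ is not a strong local minimizer. Here is a minimal Python sketch checking this, together with the convex comparison $L=v^2$, where $\mathcal E=(w-v)^2\ge0$. The helper names are hypothetical, and $t$ and $x$ are suppressed because both Lagrangians depend on the velocity only.

```python
def excess(L, L_v, v, w):
    """Weierstrass excess E = L(w) - L(v) - L_v(v) (w - v) for a scalar
    Lagrangian that depends on the velocity only (t and x suppressed)."""
    return L(w) - L(v) - L_v(v) * (w - v)


def cube(v):
    return v**3


def cube_v(v):
    return 3.0 * v**2


def square(v):
    return v**2


def square_v(v):
    return 2.0 * v


if __name__ == "__main__":
    # Along xhat(t) = t the velocity is v = 1; for L = v^3, E(w) = (w - 1)^2 (w + 2).
    for w in (-3.0, -2.0, 0.0, 1.0, 4.0):
        print(w, excess(cube, cube_v, 1.0, w), excess(square, square_v, 1.0, w))
```

The sign change at $w=-2$ is exactly the "large negative derivative" that the triangular variations exploit: slopes far below $-2$ on a short interval lower the cost, while small slope changes near $v=1$ never do.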