Chapter I. First Variations    © 2015, Philip D. Loewen

In abstract terms, the Calculus of Variations is a subject concerned with max/min problems for a real-valued function of several variables. Given a vector space V and a function Φ: V → R, we explore the theory and practice of minimizing Φ[x] over x ∈ V. Additional interest and power come from allowing
• dim(V) = +∞,
• constrained minimization, where the choice variable x must lie in some preassigned subset S of V.
We'll investigate and generalize familiar facts and new issues, including . . .
- necessary conditions: if x minimizes Φ over V, then Φ′[x] = 0 and Φ′′[x] ≥ 0;
- existence/regularity: what spaces V are appropriate?
- sufficient conditions: if x obeys Φ′[x] = 0 and Φ′′[x] > 0, then x gives a local minimum;
- applications, calculations, etc.

A. The Basic Problem

Choice Variables. A closed real interval [a, b] of finite length is given. The set X = C¹[a, b] denotes the collection of all functions x: [a, b] → R whose derivative ẋ is continuous on [a, b]. These will be the "choice variables" in a minimization problem: we call them "arcs" for now. Typically x = x(t).

Detail: Continuity of ẋ on [a, b] requires ẋ(a) and ẋ(b) to be defined. We use one-sided definitions at a and b to arrange this. For example,

    ẋ(a) = lim_{r→0⁺} [x(a + r) − x(a)] / r.

Examples: (i) The function x(t) = t|t| lies in C¹[−1, 1], so it's an arc (check this). However, x ∉ C²[−1, 1]: there's trouble at 0.
(ii) The function x(t) = √t does not lie in C¹[0, 1]. (It's too steep at the left end.)

Objective Functional. A function L ∈ C¹([a, b] × R × R) ("the Lagrangian") is given. Typically L = L(t, x, v): t for "time", x for "position", v for "velocity". The Lagrangian helps assign a number to every arc x in X, namely

    Λ[x] := ∫_a^b L(t, x(t), ẋ(t)) dt.

A Minimization Problem.
Given real numbers A, B, the basic problem in the Calculus of Variations is this:

    minimize  Λ[x] = ∫_a^b L(t, x(t), ẋ(t)) dt  over x ∈ X,  subject to x(a) = A, x(b) = B.

File "var1", version of 08 January 2015, page 1. Typeset at 14:51 January 9, 2015.

Shorthand:

    min_{x ∈ C¹[a,b]} { ∫_a^b L(t, x(t), ẋ(t)) dt : x(a) = A, x(b) = B }.   (P)

Example: Geodesics in the Plane. An arc defined on [a, b] has total length

    s = ∫ ds = ∫_a^b √(1 + ẋ(t)²) dt.

So finding the shortest arc joining (a, A) to (b, B) is an instance of the basic problem in which L(t, x, v) = √(1 + v²).

Example: Soap Film in Zero-Gravity. Wire rings of radii A > 0 and B > 0 are perpendicular to an axis through both their centres; the centres are 1 unit apart. A soap film stretches between them, forming a surface of revolution relative to the axis shown below. Surface tension acts to minimize the area of that surface, which we can calculate: the infinitesimal ring at position t with horizontal slice dt has slant length

    ds = √(dt² + dx²) = √(1 + ẋ(t)²) dt,

perimeter 2πx(t), hence area

    dS = 2πx(t)√(1 + ẋ(t)²) dt.

Total area is the "sum" of these contributions, i.e.,

    S = ∫ dS = ∫_0^1 2πx(t)√(1 + ẋ(t)²) dt.

The problem of minimizing S subject to x(0) = A, x(1) = B fits the pattern above, with [a, b] = [0, 1] and

    L(t, x, v) = 2πx√(1 + v²).

Example: Brachistochrone. The birth announcement of our subject came just over 310 years ago:

"I, Johann Bernoulli, greet the most clever mathematicians in the world. Nothing is more attractive to intelligent people than an honest, challenging problem whose possible solution will bestow fame and remain as a lasting monument. Following the example set by Pascal, Fermat, etc., I hope to earn the gratitude of the entire scientific community by placing before the finest mathematicians of our time a problem which will test their methods and the strength of their intellect.
If someone communicates to me the solution of the proposed problem, I shall then publicly declare him worthy of praise." (Groningen, 1 January 1697)

Here is a statement of Bernoulli's problem in modern terms: Given two points α and β in a vertical plane, find the curve joining α to β down which a bead—sliding from rest without friction—will fall in least time. The Greek words for "least" and "time" give the unknown curve its impressive title: the brachistochrone. To set up, install a Cartesian coordinate system with its origin at point α and the y-axis pointing downward. Then B ≥ 0, for the bead to "fall".

Now speed is the rate of change of distance relative to time: v = ds/dt. Along a curve in the (x, y)-plane, ds = √((dx)² + (dy)²) = √(1 + (y′(x))²) dx, so the infinitesimal time taken to travel along the segment of curve corresponding to a horizontal distance dx is

    dt = ds / v = √(1 + (y′(x))²) dx / v.

Here v is the speed of the bead, given from conservation of energy as

    PE + KE = const:  −mgy + ½mv² = ½mv₀².

(Here v₀ is the bead's initial speed: v₀ ≥ 0 seems reasonable.) This gives v = √(v₀² + 2gy), leading to

    dt = [√(1 + (y′(x))²) / √(v₀² + 2gy(x))] dx.

The total travel time is then

    T = ∫ dt = ∫_{x=0}^{b} √[ (1 + (y′(x))²) / (v₀² + 2gy(x)) ] dx.

The problem of minimum fall time fits our pattern, with integration interval [0, b], endpoint values A = 0 and B as specified above, and integrand

    L(x, y, w) = √[ (1 + w²) / (v₀² + 2gy) ].

Or, after changing to standard letter-names (spoiling the meaning completely!),

    L(t, x, v) = √[ (1 + v²) / (v₀² + 2gx) ].

Comparing this integrand to the one for minimum distance makes a straight line seem unlikely to provide the minimum. More details later.

Analogy/Crystal Ball. Calculus deals with minimization at every level.
For unconstrained local minima, the only possible minimizers are critical points:
• When solving min_{x∈R} f(x), solve f′(x) = 0 . . . an algebraic equation for the unknown number x.
• When solving min_{x∈Rⁿ} F(x), solve ∇F(x) = 0 . . . a system of n algebraic equations for the unknown vector x.
• When solving min_{x∈C¹[a,b]} Λ(x), solve DΛ[x] = 0 . . . a differential equation for the unknown arc x.
Before deriving that differential equation, it will be instructive to try some ad-hoc methods directly on the given problem.

A′. Ad-hoc Methods

Let a pair of endpoints (a, A), (b, B) with a < b and a smooth function L = L(t, x, v) be given. We call any piecewise-smooth function defined on [a, b] an "arc", and distinguish two special families of arcs:
• an arc x: [a, b] → R is "admissible" if x(a) = A and x(b) = B;
• an arc h: [a, b] → R is "a variation" if h(a) = 0 and h(b) = 0.
Our ultimate goal is to minimize

    Λ[x] := ∫_a^b L(t, x(t), ẋ(t)) dt

among all admissible arcs x. In this "basic problem", a simple admissible arc is always available—the straight line between the given endpoints:

    x₀(t) = A + [(t − a)/(b − a)][B − A],  a ≤ t ≤ b.

The number Λ[x₀] provides a reference value for the cost we hope to minimize: clearly the minimum value must be less than or equal to Λ[x₀]. The arc x₀ also provides a reference input for our problem, and we can search for preferable inputs by making wise adjustments to x₀. For any specific variation h, consider the one-parameter family of arcs

    x_λ(t) = x₀(t) + λh(t),  a ≤ t ≤ b;  λ ∈ R.

Each of these arcs is admissible; the sign and magnitude of the parameter λ determine the strength by which a perturbation of "shape" h is added to the reference shape x₀. To assess how much improvement from the reference value we can achieve, we define the function

    φ(λ) := Λ[x_λ] = ∫_a^b L(t, x_λ(t), ẋ_λ(t)) dt.
Here φ(0) = Λ[x₀] is our reference value, and if φ′(0) < 0 we know that φ(λ) < φ(0) for small λ > 0: that is, a small perturbation of shape h will improve our reference arc. (If φ′(0) > 0, then φ(λ) < φ(0) for small λ < 0, so an improvement is still possible—it is obtained by adding a small positive multiple of −h to x₀.)

To get the most possible improvement out of this idea, we could somehow solve the single-variable problem of choosing the scalar λ* that minimizes the function φ. The resulting perturbed arc x_{λ*} is certain to be preferable to x₀ whenever φ′(0) ≠ 0. In some practical problems, good choices of the reference arc x₀ and the variation h might make x_{λ*} a usable improvement. Alternatively, one could build an iterative-improvement scheme by declaring x_{λ*} as the new reference arc (change its name to x₀), choosing a new variation, and repeating the process above to generate further improvements. (Note: This approach is both conceptually attractive and technically feasible, but it is neither efficient nor effective. Training computers to find approximate solutions for the basic problem is an ongoing area of research, and the best known methods are quite different from the one outlined above.)

Example (Soap Film). When L(t, x, v) = x√(1 + v²), (a, A) = (0, 1), and (b, B) = (1, 2), the unique admissible linear arc is

    x₀(t) = 1 + t,  0 ≤ t ≤ 1.

Calculation gives

    Λ[x₀] = ∫_0^1 x₀(t)√(1 + ẋ₀(t)²) dt = ∫_0^1 (1 + t)√2 dt = 3√2/2 ≈ 2.1213.

Selecting the variation h(t) = sin(πt) leads to

    x_λ(t) = 1 + t + λ sin(πt),  ẋ_λ(t) = 1 + πλ cos(πt),
    φ(λ) = ∫_0^1 (1 + t + λ sin(πt)) √(1 + (1 + πλ cos(πt))²) dt.

The computer-algebra system "Maple" suggests that the choice λ ≈ −0.1886 minimizes φ(λ), with φ(−0.1886 . . .) ≈ 2.0796. The arc x_{(−0.1886)} has a Λ-value that is 1.96% lower than Λ[x₀].
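The Maple computation above is easy to reproduce. Here is a small Python sketch (ours, not part of the original notes) that evaluates φ(λ) with Simpson's rule and grid-searches for the minimizing λ; the numbers it reports agree with Λ[x₀] = 3√2/2 ≈ 2.1213 and the quoted λ ≈ −0.1886, φ ≈ 2.0796.

```python
import math

def simpson(f, a, b, n=800):
    # Composite Simpson rule on [a, b]; n must be even.
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def phi(lam):
    # phi(lam) = Lambda[x_lam] for x_lam(t) = 1 + t + lam*sin(pi*t)
    # and the soap-film Lagrangian L(t, x, v) = x*sqrt(1 + v^2).
    def integrand(t):
        x = 1 + t + lam * math.sin(math.pi * t)
        v = 1 + lam * math.pi * math.cos(math.pi * t)
        return x * math.sqrt(1 + v * v)
    return simpson(integrand, 0.0, 1.0)

print(phi(0.0))  # reference value 3*sqrt(2)/2, about 2.1213

# Crude grid search on [-0.3, 0] for the minimizing lambda.
best_val, best_lam = min((phi(k / 2000.0), k / 2000.0) for k in range(-600, 1))
print(best_lam, best_val)  # about -0.1886 and 2.0796
```

A grid search is enough here because φ is a smooth function of the single scalar λ; any one-dimensional minimizer would do.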
Selecting the variation h(t) = t(1 − t) instead leads to

    x_λ(t) = 1 + t + λt(1 − t),  ẋ_λ(t) = 1 + λ(1 − 2t),
    φ(λ) = ∫_0^1 (1 + t + λt(1 − t)) √(1 + (1 + λ(1 − 2t))²) dt.

Numerical minimization using "Maple" suggests that λ ≈ −0.7252 minimizes φ(λ), with φ(−0.725 . . .) ≈ 2.0792. The arc x_{(−0.725)} produces a Λ-value that is 1.99% lower than Λ[x₀].

Notice that the definitions of φ, x_λ, and the minimizing λ-value all depend implicitly on the chosen variation h. Some variations produce better results than others. Theoretical methods we will develop below reveal that the minimizing arc in this problem is

    x̂(t) = cosh(α + t cosh α) / cosh α,

where α ≈ 0.323074. The minimum value is Λ[x̂] ≈ 2.0788, only 2.00% lower than Λ[x₀].

Here are two plots to illustrate these results. The first shows the four admissible arcs generated above: the straight line in blue, the trigonometric perturbation in red, the quadratic perturbation in green, and the true minimizer in black. The nonlinear arcs are so close together that they look identical when printed. To display their differences, the second plot shows the differences x − x_opt for the trigonometric perturbation x in red, the quadratic perturbation x in green, and the true minimizer x = x_opt in black. (So the black line coincides with the t-axis.)

[Two plots: the four admissible arcs on 0 ≤ t ≤ 1, and the differences x − x_opt, scaled by 10⁻³.]

////

Descent Directions. Every time we choose a reference arc x₀ and a variation h, we can define the single-variable function

    φ(λ) := Λ[x₀ + λh].

For small λ, the linear approximation φ(λ) = φ(0) + φ′(0)λ + o(λ) reveals the scalar φ′(0) as the rate of change of the objective value Λ with respect to a variation in direction h, locally near the reference arc x₀. It seems natural to say that h provides a "descent direction for Λ at x₀" when φ′(0) < 0.
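The claimed minimizer can itself be checked directly: with α ≈ 0.323074, the arc x̂ satisfies the boundary conditions x̂(0) = 1, x̂(1) = 2 and yields the stated area. A small Python verification (ours, reusing the Simpson quadrature idea):

```python
import math

alpha = 0.323074
c = math.cosh(alpha)

def xhat(t):
    # Candidate minimizer: cosh(alpha + t*cosh(alpha)) / cosh(alpha).
    return math.cosh(alpha + t * c) / c

def xhat_dot(t):
    # Chain rule: d/dt cosh(alpha + t*c)/c = sinh(alpha + t*c).
    return math.sinh(alpha + t * c)

def simpson(f, a, b, n=800):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

print(xhat(0.0), xhat(1.0))  # boundary values: 1.0 and about 2.0
area = simpson(lambda t: xhat(t) * math.sqrt(1 + xhat_dot(t) ** 2), 0.0, 1.0)
print(area)  # about 2.0788, below both ad-hoc values 2.0796 and 2.0792
```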
Let's calculate φ′(0), holding tight to a single fixed variation h throughout.

    φ′(0) = lim_{λ→0} [φ(λ) − φ(0)] / λ
          = lim_{λ→0} ∫_a^b [L(t, x₀(t) + λh(t), ẋ₀(t) + λḣ(t)) − L(t, x₀(t), ẋ₀(t))] / λ dt
          = ∫_a^b lim_{λ→0} [L(t, x₀(t) + λh(t), ẋ₀(t) + λḣ(t)) − L(t, x₀(t), ẋ₀(t))] / λ dt
          = ∫_a^b (d/dλ)|_{λ=0} L(t, x₀(t) + λh(t), ẋ₀(t) + λḣ(t)) dt
          = ∫_a^b [L_x(t, x₀(t), ẋ₀(t)) h(t) + L_v(t, x₀(t), ẋ₀(t)) ḣ(t)] dt
          = ∫_a^b [L_x(t, x₀(t), ẋ₀(t)) − (d/dt) L_v(t, x₀(t), ẋ₀(t))] h(t) dt   (int by parts).

In summary, we have

    φ′(0) = ∫_a^b R(t) h(t) dt,  where  R(t) = L_x(t, x₀(t), ẋ₀(t)) − (d/dt) L_v(t, x₀(t), ẋ₀(t)).   (∗∗)

A key observation: φ and φ′(0) depend on our choice of h, but R does not. Formula (∗∗) is valid for any smooth admissible x₀ and variation h, and is full of useful information.

Viewpoint 1. For an admissible arc x₀, we can use (∗∗) to guide a search for descent directions. It suffices to choose a variation h whose pointwise product with R is large and negative. The choice h = −R is particularly tempting, but the requirement that h(a) = 0 = h(b) sometimes requires a modification of this selection.

Example. Suppose (a, A) = (0, 0), (b, B) = (π/2, π/2), and L(t, x, v) = v² − x². Consider the linear reference arc x₀(t) = t. Its objective value is

    Λ[x₀] = ∫_0^{π/2} [ẋ₀(t)² − x₀(t)²] dt = ∫_0^{π/2} (1 − t²) dt = π/2 − (1/3)(π/2)³ ≈ 0.2789.

To improve on this, calculate

    L_x(t, x, v) = −2x,  L_v(t, x, v) = 2v,

so

    L_x(t, x₀(t), ẋ₀(t)) = −2t,  L_v(t, x₀(t), ẋ₀(t)) = 2.

We get R(t) = [−2t] − (d/dt)[2] = −2t. Here −R(t) = 2t does not vanish at both endpoints, but it is positive everywhere, so we try a variation that is positive everywhere: h(t) = sin(2t).
Calculation gives

    φ(λ) = ∫_0^{π/2} ( [1 + 2λ cos(2t)]² − [t + λ sin(2t)]² ) dt
         = ∫_0^{π/2} ( [1 + 4λ cos(2t) + 4λ² cos²(2t)] − [t² + 2λt sin(2t) + λ² sin²(2t)] ) dt
         = (3π/4)λ² − (π/2)λ + π/2 − π³/24.

This is minimized when λ = 1/3, and the minimum value provides a 94% discount from the reference value Λ[x₀]:

    Λ[x_{1/3}] = φ(1/3) = 5π/12 − π³/24 ≈ 0.01707.

Viewpoint 2. If our minimization problem has a solution, we don't need to know it in detail to assign it the name "x₀". If the solution happens to be smooth, then the derivation above applies and conclusion (∗∗) is available. But now the minimality property of x₀ makes it impossible to improve upon: we must have φ′(0) = 0 for every possible variation h, i.e.,

    0 = ∫_a^b R(t) h(t) dt  for every h: [a, b] → R obeying h(a) = 0 = h(b).   (†)

This forces R(t) = 0 for all t. To see why any other outcome is impossible, imagine that some θ ∈ (a, b) makes R(θ) < 0. Our smoothness hypotheses guarantee that R is continuous on [a, b]. Recall

    R(t) = L_x(t, x₀(t), ẋ₀(t)) − (d/dt)[L_v(t, x₀(t), ẋ₀(t))],  t ∈ [a, b].

So if R(θ) < 0, there must be some open interval (α, β) containing θ such that [α, β] ⊆ [a, b] and R(t) < 0 for all t in (α, β). The variation

    h(t) = { 1 − cos(2π(t − α)/(β − α)),  for α ≤ t ≤ β;   0, otherwise }

is continuously differentiable, everywhere nonnegative, and has h(t) > 0 if and only if t ∈ (α, β). Using this variation in (∗∗) would give

    φ′(0) = ∫_α^β R(t) h(t) dt < 0.

This contradicts (†). We have shown that if x₀ is a smooth minimizer, then R cannot take on any negative values. Positive R-values are impossible for similar reasons. Our conclusion, R(t) = 0 for all t ∈ [a, b], is usually written as

    L_x(t, x₀(t), ẋ₀(t)) = (d/dt)[L_v(t, x₀(t), ẋ₀(t))],  t ∈ [a, b].   (DEL)

This is the renowned Euler-Lagrange Equation ("EL") in differentiated form ("D", hence "DEL"). If a smooth arc x₀ gives the minimum in the basic problem, it must obey (DEL). Solutions of (DEL) are called extremal arcs.
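The closed form found in the example earlier in this section, φ(λ) = (3π/4)λ² − (π/2)λ + π/2 − π³/24 for L = v² − x², x₀(t) = t, h(t) = sin(2t), can be double-checked against direct numerical integration. A quick sketch (ours):

```python
import math

def simpson(f, a, b, n=1000):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def phi_numeric(lam):
    # Lambda[x0 + lam*h] with x0(t) = t, h(t) = sin(2t), L = v^2 - x^2.
    def integrand(t):
        x = t + lam * math.sin(2 * t)
        v = 1 + 2 * lam * math.cos(2 * t)
        return v * v - x * x
    return simpson(integrand, 0.0, math.pi / 2)

def phi_closed(lam):
    return (3 * math.pi / 4) * lam ** 2 - (math.pi / 2) * lam \
           + math.pi / 2 - math.pi ** 3 / 24

for lam in (0.0, 1 / 3, -0.25):
    assert abs(phi_numeric(lam) - phi_closed(lam)) < 1e-9

print(phi_closed(1 / 3))  # vertex value 5*pi/12 - pi^3/24, about 0.01707
```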
Example. When L(t, x, v) = v² − x², we have L_x(t, x, v) = −2x and L_v(t, x, v) = 2v, so equation (DEL) for an unknown arc x(·) says

    −2x(t) = (d/dt)[2ẋ(t)],  i.e.,  ẍ(t) + x(t) = 0.

A complete list of smooth solutions for this equation is

    x(t) = c₁ cos(t) + c₂ sin(t),  c₁, c₂ ∈ R.

In a previous example, the prescribed endpoints were (a, A) = (0, 0) and (b, B) = (π/2, π/2). Only one solution joins these points: substitution gives

    0 = x(0) = c₁,  π/2 = x(π/2) = c₂ sin(π/2) = c₂,

so x(t) = (π/2) sin(t) is the only smooth contender for optimality in the corresponding problem. Its objective value is

    Λ[(π/2) sin] = ∫_0^{π/2} (π/2)² [cos²(t) − sin²(t)] dt = (π²/4) ∫_0^{π/2} cos(2t) dt = (π²/8) [sin(2t)]_{t=0}^{π/2} = 0.   ////

Discussion. It seems natural to call R the residual in equation (DEL), and to record the following interpretation. A given arc x₀ is extremal if and only if it makes R ≡ 0. If x₀ makes R(θ) < 0 at some instant θ, then perturbing x₀ with a smooth upward bump centred at θ will give a preferable arc (i.e., an arc with lower Λ-value). A smooth downward bump is advantageous near any point where R(θ) > 0.

B. Abstract Minimization in Vector Spaces

Throughout this section, V is a real vector space, and Φ: V → R is given.

Derivatives. The directional derivative of Φ at base point x̂ in direction h is this number (or "undefined"):

    Φ′[x̂; h] := lim_{λ→0⁺} [Φ[x̂ + λh] − Φ[x̂]] / λ.

Note: Φ′[x̂; 0] = 0, and for all r > 0,

    Φ′[x̂; rh] = lim_{λ→0⁺} [Φ[x̂ + λrh] − Φ[x̂]] / λ = r lim_{λ→0⁺} [Φ[x̂ + (λr)h] − Φ[x̂]] / (λr) = r Φ′[x̂; h].

When x̂ and Φ are such that Φ′[x̂; h] is defined for every h ∈ V, we say Φ is directionally differentiable at x̂, and define the derivative of Φ at x̂ as the operator DΦ[x̂]: V → R for which

    DΦ[x̂](h) = Φ′[x̂; h]  ∀h ∈ V.
Descent. If Φ′[x̂; h] < 0, then h ≠ 0, and h provides a first-order descent direction for Φ at x̂. That is, for 0 < λ ≪ 1,

    Φ′[x̂; h] ≈ [Φ[x̂ + λh] − Φ[x̂]] / λ  ⟹  Φ[x̂ + λh] ≈ Φ[x̂] + λΦ′[x̂; h] < Φ[x̂].   (∗)

Directional Local Minima. A point x̂ in V provides a Directional Local Minimum (DLM) for Φ over V exactly when, for every h ∈ V, there exists ε = ε(h) > 0 so small that

    ∀λ ∈ (0, ε),  Φ[x̂] ≤ Φ[x̂ + λh].

Intuitively, x̂ is a DLM for Φ if it provides an ordinary local minimum in the one-variable sense along every line through x̂ in the space V.

Proposition. If x̂ gives a DLM for Φ over V, then

    ∀h ∈ V,  Φ′[x̂; h] ≥ 0 (or Φ′[x̂; h] is undefined).   (∗∗)

In particular, if Φ is directionally differentiable at x̂ and DΦ[x̂] is linear, then DΦ[x̂] = 0 ("the zero operator").

Proof. If (∗∗) is false, then Φ′[x̂; h] < 0 for some h ∈ V, and the DLM definition is contradicted by (∗). So (∗∗) must hold. Now if DΦ[x̂] is linear, then for arbitrary h ∈ V two applications of (∗∗) give

    DΦ[x̂](h) = Φ′[x̂; h] ≥ 0,
    −DΦ[x̂](h) = DΦ[x̂](−h) = Φ′[x̂; −h] ≥ 0.

Thus 0 ≤ DΦ[x̂](h) ≤ 0, giving DΦ[x̂](h) = 0. Since this holds for arbitrary h ∈ V, DΦ[x̂] must be the zero operator.   ////

Application. When V = Rⁿ and Φ: Rⁿ → R has a gradient at x̂ ∈ Rⁿ, we calculate

    Φ′[x̂; h] = lim_{λ→0⁺} [Φ[x̂ + λh] − Φ[x̂]] / λ = lim_{λ→0⁺} [φ(λ) − φ(0)] / λ  (for φ(λ) := Φ[x̂ + λh])
             = φ′(0) = ∇Φ(x̂) h.

Here DΦ[x̂] is the operator "left matrix multiplication by the 1 × n matrix ∇Φ(x̂)". This is linear, so the proposition above gives a familiar result: if x̂ minimizes Φ over Rⁿ, then ∇Φ(x̂) must be the zero vector.

Affine Constraints. In many practical minimization problems, the domain of the objective function is confined to a proper subset of a real vector space X. The simplest such case arises when the subset S is affine, i.e., when

    S = x₀ + V = {x₀ + h : h ∈ V}

for some element x₀ ∈ X and subspace V of X.
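The formula Φ′[x̂; h] = ∇Φ(x̂)h from the Application above is easy to watch numerically: a one-sided difference quotient in the direction h converges to the gradient paired with h. A small sketch (the function Φ here is our own invented example):

```python
import math

def Phi(x):
    # A sample smooth function on R^2 (chosen for illustration only).
    return x[0] ** 2 + 3 * x[0] * x[1] + math.sin(x[1])

def grad_Phi(x):
    # Hand-computed gradient of the sample function.
    return (2 * x[0] + 3 * x[1], 3 * x[0] + math.cos(x[1]))

def dir_deriv(F, x, h, lam=1e-6):
    # One-sided difference quotient approximating Phi'[x; h].
    x_shift = tuple(xi + lam * hi for xi, hi in zip(x, h))
    return (F(x_shift) - F(x)) / lam

xb = (1.0, 0.5)
h = (0.2, -1.0)
exact = sum(g * hi for g, hi in zip(grad_Phi(xb), h))
approx = dir_deriv(Phi, xb, h)
print(exact, approx)  # the two values agree to about five decimal places
```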
Lemma. For S as above, one has S = z + V for every z ∈ S.

Proof. Since z ∈ S, there exists h₀ ∈ V such that z = x₀ + h₀. Then

    y ∈ S ⇔ y = x₀ + h for some h ∈ V
          ⇔ y = (z − h₀) + h for some h ∈ V
          ⇔ y = z + (h − h₀) for some h ∈ V
          ⇔ y = z + H for some H ∈ V  (H = h − h₀).   ////

Definition. Given a vector space X containing an element x₀ and a subspace V, let S = x₀ + V and suppose Φ: S → R. A point x̂ ∈ S gives a directional local minimum (DLM) for Φ relative to V if and only if

    ∀h ∈ V, ∃ε = ε(h) > 0 : ∀λ ∈ [0, ε),  Φ[x̂] ≤ Φ[x̂ + λh].

Geometrically, x̂ provides a DLM relative to V if x̂ gives a local minimum in every one-dimensional problem obtained by restricting Φ to a line passing through x̂ and parallel to V. To harness our previous results, we translate (shift) our reference point from the origin of X to the point x̂, defining Ψ: V → R via

    Ψ[h] := Φ[x̂ + h],  h ∈ V.

Note that

    Ψ′[0; h] = lim_{λ→0⁺} [Ψ[λh] − Ψ[0]] / λ = lim_{λ→0⁺} [Φ[x̂ + λh] − Φ[x̂]] / λ = Φ′[x̂; h].

If x̂ ∈ S gives a DLM for Φ relative to V, then 0 gives a DLM for Ψ over V, and our previous result applies to Ψ, giving

    ∀h ∈ V,  Ψ′[0; h] ≥ 0 (or Ψ′[0; h] is undefined).

Summary. If x̂ ∈ S = x₀ + V solves the constrained problem

    min_{x ∈ X} {Φ[x] : x ∈ x₀ + V},

then

    ∀h ∈ V,  Φ′[x̂; h] ≥ 0 (or Φ′[x̂; h] is undefined).

If, in addition, Φ is differentiable on X and DΦ[x̂]: X → R is linear, then DΦ[x̂](h) = 0 for all h ∈ V.

Application. Consider X = Rⁿ. Given a nonzero normal vector N ∈ Rⁿ and a point x₀ ∈ Rⁿ, consider the hyperplane

    S = {x ∈ Rⁿ : N • (x − x₀) = 0}.

Observe that x ∈ S iff x − x₀ ∈ V, where

    V = {h ∈ Rⁿ : N • h = 0}.

This V is a subspace of Rⁿ: it's the set of all vectors perpendicular to N. Now suppose some x̂ in Rⁿ minimizes Φ over S.
We have DΦ[x̂](h) = 0 for all h ∈ V, i.e., ∇Φ(x̂)h = 0 for all h ∈ V, i.e., ∇Φ(x̂) ⊥ V, i.e., ∇Φ(x̂) = −λN for some λ ∈ R. In summary, if x̂ minimizes Φ: Rⁿ → R subject to the constraint N • (x − x₀) = 0, then

    0 = ∇[ Φ(x) + λN • (x − x₀) ] |_{x = x̂}.

That's the Lagrange Multiplier Rule for the case of a single (linear) constraint!

C. Derivatives of Integral Functionals

Our "basic problem" involves minimization of Λ: C¹[a, b] → R defined by Λ[x] = ∫_a^b L(t, x(t), ẋ(t)) dt for some given Lagrangian L ∈ C¹([a, b] × R × R). So we pick arbitrary x̂, h ∈ C¹[a, b] and calculate

    Λ′[x̂; h] = lim_{λ→0⁺} (1/λ)( Λ[x̂ + λh] − Λ[x̂] )                                                    (1)
             = lim_{λ→0⁺} (1/λ) ∫_a^b [ L(t, x̂(t) + λh(t), ẋ̂(t) + λḣ(t)) − L(t, x̂(t), ẋ̂(t)) ] dt      (2)
             = ∫_a^b lim_{λ→0⁺} (1/λ)[ L(t, x̂(t) + λh(t), ẋ̂(t) + λḣ(t)) − L(t, x̂(t), ẋ̂(t)) ] dt      (3)
             = ∫_a^b (∂/∂λ)|_{λ=0} L(t, x̂(t) + λh(t), ẋ̂(t) + λḣ(t)) dt                                 (4)
             = ∫_a^b [ L_x(t, x̂(t), ẋ̂(t)) h(t) + L_v(t, x̂(t), ẋ̂(t)) ḣ(t) ] dt.                        (5)

Here line (1) is the definition of the directional derivative, and line (2) comes from the definition of Λ. Passing from (2) to (3) requires that we interchange the limit and the integral. In our case this is justified because the limit is approached uniformly in t, a consequence of L ∈ C¹. See, e.g., Walter Rudin, Real and Complex Analysis, page 223. Existence of the derivative in (4) and its evaluation in (5) also follow from our assumption that L ∈ C¹; the definition of this derivative allows it to be evaluated as shown inside the integral in (3).

Using the notation L̂(t) = L(t, x̂(t), ẋ̂(t)), and likewise defining L̂_x(t) and L̂_v(t), we summarize: if L ∈ C¹ and x̂ ∈ C¹[a, b], then

    Λ′[x̂; h] = ∫_a^b [ L̂_x(t) h(t) + L̂_v(t) ḣ(t) ] dt  ∀h ∈ C¹[a, b].   (∗)

Integration by parts gives an equivalent expression. Let q(t) = ∫_a^t L̂_x(r) dr. Then q(a) = 0 and

    dq(t) = q̇(t) dt = L̂_x(t) dt  ∀t ∈ [a, b].
(The latter follows from the Fundamental Theorem of Calculus, since L̂_x is continuous on [a, b].) Hence the first term on the right in (∗) is

    ∫_a^b L̂_x(t) h(t) dt = ∫_a^b h(t) dq = [q(t)h(t)]_{t=a}^b − ∫_a^b q(t) dh(t) = q(b)h(b) − ∫_a^b q(t) ḣ(t) dt.

We conclude that for all h ∈ C¹[a, b],

    Λ′[x̂; h] = [ ∫_a^b L̂_x(t) dt ] h(b) + ∫_a^b [ L̂_v(t) − ∫_a^t L̂_x(r) dr ] ḣ(t) dt.   (∗∗)

Note that this expression is well-defined for all h ∈ C¹[a, b], and linear in h.

D. Fundamental Lemma; Euler-Lagrange Equation

Given real numbers a < b, consider these subspaces of C¹[a, b]:

    V_II = {h ∈ C¹[a, b] : h(a) = 0, h(b) = 0},
    V_I0 = {h ∈ C¹[a, b] : h(a) = 0},
    V_0I = {h ∈ C¹[a, b] : h(b) = 0},
    V_MN = {h ∈ C¹[a, b] : M h(a) = 0, N h(b) = 0},
    V_00 = C¹[a, b].

Recall the basic problem of the COV:

    min_{x ∈ C¹[a,b]} { Λ[x] := ∫_a^b L(t, x(t), ẋ(t)) dt : x(a) = A, x(b) = B }.

With X = C¹[a, b], the arcs competing for minimality in (P) are those in the set

    S = {x ∈ X : x(a) = A, x(b) = B}.

An arc is admissible for (P) if it lies in S. This set is affine: pick any x₀ in S (like the straight-line arc x₀(t) = A + (t − a)(B − A)/(b − a)) to see S = x₀ + V_II. [Proof: One has x ∈ S if and only if x − x₀ ∈ V_II. This is easy.] Our general discussion above shows that if some arc x̂ in S gives a directional local minimum for Λ relative to V_II, then every h ∈ V_II must satisfy

    0 = Λ′[x̂; h] = ∫_a^b N̂(t) ḣ(t) dt,  where  N̂(t) = L̂_v(t) − ∫_a^t L̂_x(r) dr.

This situation explains our interest in the Fundamental Lemma below.

Lemma (duBois-Reymond). If N: [a, b] → R is integrable, TFAE:
(a) ∫_a^b N(t) ḣ(t) dt = 0 for all h ∈ V_II.
(b) The function N is essentially constant.

Proof. (b⇒a): If N(t) = c for all t in [a, b] (allowing finitely many exceptions), then each h ∈ V_II obeys
    ∫_a^b c ḣ(t) dt = [c h(t)]_{t=a}^b = 0.   (∗)

(a⇒b): Consider any piecewise continuous function N. Use the average value of N, namely the constant

    N̄ = (1/(b − a)) ∫_a^b N(r) dr,

to define

    h(t) = ∫_a^t [N̄ − N(r)] dr.

Obviously h ∈ C¹[a, b] with h(a) = 0, but also (by definition of N̄)

    h(b) = ∫_a^b [N̄ − N(r)] dr = (b − a)N̄ − ∫_a^b N(r) dr = 0.

Therefore h ∈ V_II; in particular, using c = N̄ in line (∗) gives

    ∫_a^b N̄ ḣ(t) dt = 0.

It follows that

    ∫_a^b N(t) ḣ(t) dt = ∫_a^b [N(t) − N̄] ḣ(t) dt = ∫_a^b [N(t) − N̄][N̄ − N(t)] dt = −∫_a^b [N(t) − N̄]² dt.

If (a) holds, this integral equals 0, and therefore N(t) = N̄ for all t in [a, b] (with at most finitely many exceptions). This proves (b).   ////

Theorem (Euler-Lagrange Equation—Integral Form). If x̂ is a directional local minimizer in the basic problem (P), then there is a constant c such that

    L̂_v(t) = c + ∫_a^t L̂_x(r) dr  ∀t ∈ [a, b].   (IEL)

Proof. As discussed above, directional local minimality implies that every h in V_II obeys

    0 = Λ′[x̂; h] = ∫_a^b N̂(t) ḣ(t) dt,  where  N̂(t) = L̂_v(t) − ∫_a^t L̂_x(r) dr.

Thanks to the Fundamental Lemma, it follows that N̂ is essentially constant. But since both L and x̂ are C¹, the function N̂ is continuous on [a, b], so (IEL) holds at all points without exception.   ////

Definition. An arc that satisfies (IEL) on some interval [a, b] is called an extremal for L on that interval.

Remark. Note that (IEL) is the same for the function −L as it is for L, so it must hold also for a directional local maximizer in the Basic Problem. An extremal arc in the COV is analogous to a "critical point" in ordinary calculus: the set of extremal arcs includes every arc that provides a (directional) local minimum or maximum, and possibly some arcs that provide neither.
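Equation (IEL) is easy to test numerically along a candidate arc. For L(t, x, v) = v² − x² and the extremal x̂(t) = (π/2) sin t from the (DEL) example above, L̂_v(t) = π cos t and L̂_x(t) = −π sin t, so N̂(t) = L̂_v(t) − ∫₀ᵗ L̂_x(r) dr should be constant (in fact ≡ π). A quick Python sketch (ours):

```python
import math

def simpson(f, a, b, n=1000):
    # Composite Simpson rule; returns 0 on a degenerate interval.
    if b == a:
        return 0.0
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# Extremal xhat(t) = (pi/2) sin t for L = v^2 - x^2.
Lv_hat = lambda t: 2 * (math.pi / 2) * math.cos(t)   # L_v = 2v along xhat
Lx_hat = lambda t: -2 * (math.pi / 2) * math.sin(t)  # L_x = -2x along xhat

def N_hat(t):
    return Lv_hat(t) - simpson(Lx_hat, 0.0, t)

samples = [N_hat(t) for t in (0.0, 0.4, 1.0, math.pi / 2)]
print(samples)  # every value is the constant c = pi, up to quadrature error
```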
Remark. For any admissible arc x̂ that fails to satisfy (IEL), the continuous function N̂ defined above will be nonconstant, and the proof of the Fundamental Lemma shows that

    h(t) = ∫_a^t [N̄ − N̂(r)] dr,  where  N̄ = (1/(b − a)) ∫_a^b N̂(r) dr,

makes Λ′[x̂; h] < 0. That is, this h is a descent direction for Λ[·] relative to the nominal arc x̂.   ////

Regularity Bonus. The integrand L(t, x, v) = ½v², for which L_v(t, x, v) = v, gives L_v(t, x(t), ẋ(t)) = ẋ(t). So for typical arcs x ∈ C¹[a, b], we can be sure that the mapping t ↦ L_v(t, x(t), ẋ(t)) is continuous, but perhaps no better. However, minimality exerts a little bias in favour of smoothness: in (IEL), the integrand on the RHS is continuous, so the RHS is a differentiable function of t. This means that t ↦ L̂_v(t) is guaranteed to be differentiable, with

    (d/dt) L̂_v(t) = L̂_x(t)  ∀t ∈ [a, b].   (DEL)

In fact, since the right side here is continuous, we have the following.

Corollary. If x̂ gives a directional local minimum in problem (P), then L̂_v is continuously differentiable, and (DEL) holds.

We discuss consequences of this in the next section.

Example. Find candidates for minimality in (P) with L(t, x, v) = v² − x² and (a, A) = (0, 0), in cases (i) (b₁, B₁) = (π/2, 1), (ii) (b₂, B₂) = (3π/2, 1). Note L_v(t, x, v) = 2v and L_x(t, x, v) = −2x. If x̂ minimizes Λ among all C¹ curves from (0, 0) to (b₁, B₁), then (DEL) says

    (d/dt)[2ẋ̂(t)] = −2x̂(t)  ∀t.

That is, ẍ̂(t) = −x̂(t) for all t. This shows x̂ ∈ C², and gives the general solution

    x̂(t) = c₁ cos(t) + c₂ sin(t),  c₁, c₂ ∈ R.

(i) Here the boundary conditions give c₁ = 0, c₂ = 1. Unique candidate: x̂₁(t) = sin(t). Later we'll show that x̂₁ gives a true [global] minimum:

    Λ₁[x̂₁] = min { Λ₁[x] = ∫_0^{π/2} [ẋ(t)² − x(t)²] dt : x(0) = 0, x(π/2) = 1 }.

(ii) Here the BC's identify the unique candidate x̂₂(t) = −sin(t).
Later we'll show that this does not give even a directional local minimum; moreover,

    inf { Λ₂[x] = ∫_0^{3π/2} [ẋ(t)² − x(t)²] dt : x(0) = 0, x(3π/2) = 1 } = −∞.   ////

Special Case 1: L = L(t, v) is independent of x. Here (IEL) reduces to a first-order ODE for x̂, involving an unknown constant:

    L̂_v(t) = const.

Consider these subcases, where L = L(v) is also independent of t:

    L = √(1 + v²),    L = v²,    L = [(v² − 1)⁺]².

In these three cases, every admissible extremal x̂ is a global minimizer. To see this, let c = L_v(ẋ̂(t)) (a constant, by (IEL)) and define f(v) = L(v) − cv. Then f′(v) = L_v(v) − c is nondecreasing, with f′(ẋ̂(t)) = 0, so that point gives a global minimum. In other words,

    f(v) ≥ f(ẋ̂(t))  ∀v ∈ R, ∀t ∈ [a, b].   (∗)

Now every arc x obeying the BC's has ∫_a^b c ẋ(t) dt = c[x(b) − x(a)] = c[B − A], so

    ∫_a^b L(ẋ(t)) dt − c[B − A] = ∫_a^b f(ẋ(t)) dt ≥ ∫_a^b f(ẋ̂(t)) dt = ∫_a^b L(ẋ̂(t)) dt − c[B − A],

i.e., Λ[x] ≥ Λ[x̂]. (For L = √(1 + v²), this proves that the arc of shortest length from (a, A) to (b, B) is the straight line. The technical definition of the term "arc" here leaves room for some improvement in this well-known conclusion.)

An Optimistic Calculation: Suppose x̂ solves (IEL) and is in fact C². (See "Regularity Bonus" above, and Section D below.) Then the Chain Rule and (DEL) together give

    (d/dt) L(t, x̂(t), ẋ̂(t)) = L̂_t(t) + L̂_x(t)ẋ̂(t) + L̂_v(t)ẍ̂(t)
                             = L̂_t(t) + (d/dt)[L̂_v(t) ẋ̂(t)]   by (DEL).

We rearrange this to get

    (d/dt)[ L̂(t) − L̂_v(t)ẋ̂(t) ] = L̂_t(t)  ∀t ∈ [a, b].   (WE2)

Special Case 2: L = L(x, v) is independent of t ("autonomous"). For every extremal x̂ of class C², (WE2) implies that

    L̂(t) − L̂_v(t)ẋ̂(t) = C  ∀t ∈ [a, b],

for some constant C. In other words, the following function of 2 variables is constant along every C² arc solving (IEL):

    L(x, v) − L_v(x, v) · v.
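Special Case 2 can be watched in action. Along the extremal x̂(t) = (π/2) sin t of the autonomous Lagrangian L(x, v) = v² − x², the quantity L − vL_v simplifies to −(v² + x²), which should evaluate to the constant −(π/2)². A quick check (ours):

```python
import math

def conserved(t):
    # L(x, v) - v * L_v(x, v) along xhat(t) = (pi/2) sin(t), L = v^2 - x^2.
    x = (math.pi / 2) * math.sin(t)
    v = (math.pi / 2) * math.cos(t)
    L, Lv = v * v - x * x, 2 * v
    return L - v * Lv  # algebraically equal to -(v^2 + x^2)

values = [conserved(t) for t in (0.0, 0.3, 0.9, 1.5)]
print(values)  # each equals -(pi/2)**2, about -2.4674
```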
In Physics, a famous Lagrangian is L(x, v) = ½mv² − ½kx² (that's KE minus PE for a simple mass-spring system). Here L_v = mv, and the function above works out to

    ½mv² − ½kx² − (mv)v = −(½mv² + ½kx²),

the negative of the total energy. For other Lagrangians, condition (WE2) expresses conservation of energy along real motions.   ////

Caution. (WE2) and (IEL) are not quite equivalent, even for very smooth arcs. The Lagrangian L(x, v) = ½mv² − ½kx² illustrates this. As shown above, (WE2) holds along an arc x̂ if and only if

    ½m ẋ̂(t)² + ½k x̂(t)² = const,

and this is true for any constant function x̂. However, (IEL) holds along x̂ iff m ẍ̂(t) + k x̂(t) = 0, and the only constant solution of this equation is x̂(t) = 0. The upshot: Every smooth solution of (IEL) must obey (WE2), but (WE2) may have some spurious solutions as well.

Home practice: Show that if x ∈ C² obeys (WE2) and satisfies ẋ(t) ≠ 0 for all t, then x obeys (IEL) also. (Thus, many solutions of (WE2) also obey (IEL), and we can predict which they are.)   ////

D. Smoothness of Extremals

Here we explore the "regularity bonus" mentioned above in detail. The situation is especially good when L is quadratic in v.

Proposition. Suppose L(t, x, v) = ½A(t, x)v² + B(t, x)v + C(t, x) for C¹ functions A, B, C. Then for any x̂ obeying (IEL), with A(t, x̂(t)) ≠ 0 for all t ∈ [a, b], we have x̂ ∈ C²[a, b].

Proof. Here L_v(t, x, v) = A(t, x)v + B(t, x). Along the arc x̂, this identity gives

    ẋ̂(t) = [ L̂_v(t) − B(t, x̂(t)) ] / A(t, x̂(t)),  t ∈ [a, b].

The function of t on the RHS is C¹, so ẋ̂ ∈ C¹[a, b], giving x̂ ∈ C²[a, b].   ////

The previous result covers a surprising wealth of examples, but can still be generalized.
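The Caution can be made concrete numerically: along a true solution of m ẍ + k x = 0 the energy is constant, while the constant arc x ≡ 1 also has constant energy yet leaves a nonzero residual in the Euler-Lagrange equation. A sketch (the sample values m = 2, k = 8 are our own):

```python
import math

m, k = 2.0, 8.0
w = math.sqrt(k / m)  # natural frequency of m*x'' + k*x = 0

def energy(x, v):
    return 0.5 * m * v * v + 0.5 * k * x * x

# True extremal: x(t) = cos(w t), so v(t) = -w sin(w t).
E = [energy(math.cos(w * t), -w * math.sin(w * t)) for t in (0.0, 0.4, 1.1)]
print(E)  # constant k/2 = 4.0 at every sample time

# Spurious (WE2)-solution: the constant arc x(t) = 1 conserves energy...
print(energy(1.0, 0.0))  # 4.0, independent of t
# ...but fails (DEL): m*x'' + k*x = 0 + k*1 is nonzero.
residual = m * 0.0 + k * 1.0
print(residual)  # 8.0
```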
Recall the quadratic approximation:

$$ f(v) \approx f(\hat v) + \nabla f(\hat v)(v - \hat v) + \tfrac12 (v - \hat v)^T D^2 f(\hat v)(v - \hat v), \qquad v \approx \hat v. $$

Use this on the function $f(v) = L(t,x,v)$ near $\hat v = \dot{\hat x}(t)$:

$$ L(t,x,v) \approx L(t,x,\hat v) + L_v(t,x,\hat v)(v - \hat v) + \tfrac12 L_{vv}(t,x,\hat v)(v - \hat v)^2. $$

Here the coefficient of $\tfrac12 v^2$ is $A(t,x) = L_{vv}(t,x,\hat v)$. So we might expect the proposition above to hold for general $L$, under the assumption that $\hat L_{vv}(t) \ne 0$ for all $t \in [a,b]$. This turns out to be correct, but the reasons are not simple.

Classic M100 problem: Assuming the relation $v^3 - v - t = 0$ defines $v$ as a function of $t$ near the point $(t,v) = (0,0)$, find $dv/dt$ there.

Solution: Differentiation gives

$$ 3v^2 \frac{dv}{dt} - \frac{dv}{dt} - 1 = 0 \implies \frac{dv}{dt} = \frac{1}{3v^2 - 1}. $$

At the point $(t,v) = (0,0)$, substitution gives

$$ \left. \frac{dv}{dt} \right|_{(t,v)=(0,0)} = -1. $$

The curve $t = v^3 - v$ is easy to draw in the $(t,v)$-plane. As the following sketch shows, there are three points on the curve satisfying $t = 0$, and the calculation above finds the slope at just one of them:

[Figure 1: The curve $v^3 - v - t = 0$ in $(t,v)$-space. (The shaded rectangle will be described later.)]

Classic M200 reformulation: Assuming the relation $F(t,v) = 0$ defines $v$ as a function of $t$ near the point $(t_0, v_0)$, find $dv/dt$ at this point.

Solution:

$$ 0 = \frac{d}{dt} F(t, v(t)) = F_t(t, v(t)) + F_v(t, v(t))\frac{dv}{dt} \implies \frac{dv}{dt} = -\frac{F_t(t, v(t))}{F_v(t, v(t))}. $$

Classic M321 theorem (Rudin, Principles, Thm. 9.28):

Implicit Function Theorem. An open set $U \subseteq \mathbb{R}^2$ is given, along with $(t_0, v_0) \in U$ and a function $F: U \to \mathbb{R}$ ($F = F(t,v)$) such that both $F_t$ and $F_v$ exist and are continuous at each point of $U$. Suppose $F(t_0, v_0) = 0$. If $F_v(t_0, v_0) \ne 0$, then there are open intervals $(\alpha, \beta)$ containing $t_0$ and $(p, q)$ containing $v_0$ such that

(i) For each $t \in (\alpha, \beta)$, the equation $F(t,v) = 0$ holds for a unique point $v \in (p,q)$.
(ii) If we write $\psi(t)$ for the unique $v$ in (i), so that $F(t, \psi(t)) = 0$ for all $t \in (\alpha, \beta)$, then $\psi \in C^1(\alpha, \beta)$, with

$$ \dot\psi(t) = -\frac{F_t(t, \psi(t))}{F_v(t, \psi(t))} \qquad \forall t \in (\alpha, \beta). $$

Illustrations. The equation $z = F(t,v)$ defines a surface (the graph of $F$) in $\mathbb{R}^3$ lying above the open set $U$. The $(t,v)$-plane in $\mathbb{R}^3$ is defined by the equation $z = 0$. This plane slices the graph of $F$ in the same curve we see in the M100 example above.

[Figure 2: The plane $z = 0$ slicing the surface $z = F(t,v)$.]

The Theorem presents conditions under which some open rectangle $(\alpha, \beta) \times (p, q)$ centred at $(t_0, v_0)$ contains a piece of the curve that coincides with the graph of a $C^1$ function on $(\alpha, \beta)$. A rectangle consistent with this conclusion is shown in Figure 1.

We now prove the famous regularity theorem of Weierstrass/Hilbert.

Theorem (Weierstrass/Hilbert). Suppose $L = L(t,x,v)$ is $C^2$ on some open set containing the point $(t_0, x_0, v_0) \in [a,b] \times \mathbb{R} \times \mathbb{R}$. Assume $L_{vv}(t_0, x_0, v_0) \ne 0$. Then any arc $\hat x$ obeying (IEL) on some interval $I$ containing $t_0$, and satisfying $(t_0, \hat x(t_0), \dot{\hat x}(t_0)) = (t_0, x_0, v_0)$, must be $C^2$ on some relatively open subinterval of $I$ containing $t_0$. In particular, if $L \in C^2$ everywhere and $\hat x \in C^1[a,b]$ is extremal for $L$, then $\hat x \in C^2[a,b]$.

Proof. Since $\hat x$ is an extremal, there is a constant $c$ so that the function

$$ F(t,v) := L_v(t, \hat x(t), v) - \int_a^t \hat L_x(r)\,dr - c $$

obeys $F(t, \dot{\hat x}(t)) = 0$ for all $t \in [a,b]$. In particular, $F(t_0, v_0) = 0$. Now the function $F$ is jointly $C^1$ near $(t_0, v_0)$, and $F_v(t_0, v_0) = L_{vv}(t_0, x_0, v_0) \ne 0$ by hypothesis. Hence the implicit function theorem cited above gives an open interval $(\alpha, \beta)$ around $t_0$ and another open set $U$ around $v_0$ such that the conditions $F(t, \psi(t)) = 0$, $\psi(t) \in U$ implicitly define a unique $\psi \in C^1(\alpha, \beta)$.
Since $\dot{\hat x}$ is continuous at $t_0$, with $\dot{\hat x}(t_0) = v_0 \in U$, we may shrink $(\alpha, \beta)$ if necessary to guarantee that $\dot{\hat x}(t) \in U$ for all $t \in I \cap (\alpha, \beta)$. We already know $F(t, \dot{\hat x}(t)) = 0$ in $I \cap (\alpha, \beta)$, so uniqueness gives $\psi(t) = \dot{\hat x}(t)$ for all $t$ in this interval. But since $\psi \in C^1$, this gives $\dot{\hat x} \in C^1$, i.e., $\hat x \in C^2(I \cap (\alpha, \beta))$. ////

E. Natural Boundary Conditions

Our abstract theory is easily extended to problems where one or both of the endpoint values are unconstrained. A missing boundary condition in the problem formulation allows a larger class of admissible variations, and this gives an extra conclusion in the first-order analysis. The extra conclusion is called a "Natural Boundary Condition" because it arises naturally through minimization, rather than artificially in the formulation of the problem.

Consider, for example, this problem where the right endpoint is unconstrained:

$$ \min\left\{ \Lambda[x] := \int_a^b L(t, x(t), \dot x(t))\,dt \;:\; x(a) = A,\; x(b) \in \mathbb{R} \right\}. $$

For any arc $\hat x$ with $\hat x(a) = A$, the arc $\hat x + h$ is admissible for every variation $h \in V_{I0} = \left\{ h \in C^1[a,b] : h(a) = 0 \right\}$. If $\hat x$ happens to give a DLM relative to $V_{I0}$, then every $h \in V_{I0}$ obeys

$$ 0 = D\Lambda[\hat x](h) = \left[ \int_a^b \hat L_x(t)\,dt \right] h(b) + \int_a^b \left[ \hat L_v(t) - \int_a^t \hat L_x(r)\,dr \right] \dot h(t)\,dt. \tag{$*$} $$

Now $V_{II} \subseteq V_{I0}$, and knowing $(*)$ for $h \in V_{II}$ is enough to establish (IEL), as shown above: thus there exists some constant $c$ such that

$$ \hat L_v(t) = c + \int_a^t \hat L_x(r)\,dr \qquad \forall t \in [a,b]. $$

This equation is independent of $h$: it describes a property of $\hat x$ that can be used anywhere, including in $(*)$ above. Hence, for all $h \in V_{I0}$,

$$ 0 = D\Lambda[\hat x](h) = \left[ \hat L_v(b) - c \right] h(b) + \int_a^b c\,\dot h(t)\,dt = \left[ \hat L_v(b) - c \right] h(b) + c\left[ h(b) - h(a) \right] = \hat L_v(b)\,h(b). $$

Since $h(b)$ is arbitrary, we get the natural boundary condition $\hat L_v(b) = 0$.
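A discretized computation makes the natural boundary condition visible. The sketch below uses hypothetical problem data, not an example from the notes: for $L(t,x,v) = v^2/2 + x$ on $[0,1]$ with $x(0) = 0$ and $x(1)$ free, (IEL) gives $\ddot x = 1$ and the natural BC gives $\dot x(1) = 0$, so $\hat x(t) = t^2/2 - t$. Plain gradient descent on the discretized functional, with no constraint imposed at the right end, finds both features on its own:

```python
# Discretized free-right-endpoint problem (hypothetical instance, not from the notes):
#   minimize  J[x] = integral_0^1 ( xdot(t)**2/2 + x(t) ) dt,  x(0) = 0,  x(1) free.
# (IEL) forces x'' = 1 and the natural BC forces xdot(1) = 0, so x(t) = t**2/2 - t.

n = 20
dt = 1.0 / n
x = [0.0] * (n + 1)              # x[0] = 0 stays fixed: the constrained left endpoint

eta = 0.01                       # descent step, small enough for stability
for _ in range(20000):
    g = [0.0] * (n + 1)
    for j in range(1, n):        # gradient of the discretized functional, interior nodes
        g[j] = (x[j] - x[j-1]) / dt - (x[j+1] - x[j]) / dt + dt
    g[n] = (x[n] - x[n-1]) / dt  # right endpoint is a free variable: no constraint term
    for j in range(1, n + 1):
        x[j] -= eta * g[j]

print((x[n] - x[n-1]) / dt)      # ~ 0: the natural boundary condition xdot(1) = 0
print(x[n])                      # ~ -0.5 up to O(dt), matching x(1) from t**2/2 - t
```

Setting the gradient at the free node to zero is exactly the discrete statement $\dot x(1) = 0$: the minimization itself generates the boundary condition, just as in the derivation above.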
A similar argument, with a similar outcome, applies to problems where $x(a)$ is free to vary in $\mathbb{R}$, but $x(b) = B$ is required. When both endpoints are free, both natural boundary conditions are in force. The table below summarizes these results:

  BC's in Prob Stmt          | Admissible Variations   | Natural BC's
  $x(a)=A$, $x(b)=B$         | $V_{II}$                | (none)
  $x(a)=A$                   | $V_{I0}$                | $\hat L_v(b)=0$
  $x(b)=B$                   | $V_{0I}$                | $\hat L_v(a)=0$
  (none)                     | $V_{00}=C^1[a,b]$       | $\hat L_v(a)=0$, $\hat L_v(b)=0$

Example. Consider the Brachistochrone Problem with a free right endpoint:

$$ \min\left\{ \Lambda[x] = \int_0^b \sqrt{\frac{1 + \dot x(t)^2}{v_0^2 + 2gx(t)}}\,dt \;:\; x(0) = 0 \right\}. $$

Here $L_v = v\left[ (1+v^2)(v_0^2 + 2gx) \right]^{-1/2}$, and the natural boundary condition at $t = b$ says a minimizing curve must obey

$$ 0 = \hat L_v(b) = \left. v\left[ (1+v^2)(v_0^2 + 2gx) \right]^{-1/2} \right|_{(x,v) = (\hat x(b),\, \dot{\hat x}(b))}. $$

It follows that $\dot{\hat x}(b) = 0$: the minimizing curve must be horizontal at its right end.

[Troutman, page 156: "In 1696, Jakob Bernoulli publicly challenged his younger brother Johann to find the solutions to several problems in optimization including [this one] (thereby initiating a long, bitter, and pointless rivalry between two representatives of the best minds of their era)."] ////

HW02. Modify the derivation above to treat free-endpoint problems where the cost to be minimized includes endpoint terms, so it looks like this:

$$ k(x(a)) + \ell(x(b)) + \int_a^b L(t, x(t), \dot x(t))\,dt. $$

Future Considerations. Later, we'll study problems in which one or both endpoints are allowed to vary along given curves in the $(t,x)$-plane. The analysis above handles only rather special curves ... namely, the vertical lines $t = a$ and $t = b$.
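As a closing numerical aside, the Implicit Function Theorem that powered Section D can be exercised directly on the M100 example. The helper below is hypothetical code, not from the notes: it tracks the branch of $v^3 - v - t = 0$ through $(0,0)$ by Newton's method and recovers the slope $dv/dt = -1$ predicted by implicit differentiation.

```python
# Track the branch of v**3 - v - t = 0 through (0, 0) with Newton's method
# (hypothetical helper, not from the notes), then compare the slope dv/dt at
# t = 0 with the implicit-differentiation formula 1/(3*v**2 - 1) = -1.

def F(t, v):  return v**3 - v - t
def Fv(t, v): return 3*v**2 - 1      # F_v is nonzero near (0, 0), so the IFT applies

def psi(t, v0=0.0, iters=50):
    """Newton iteration for F(t, v) = 0, started on the middle branch."""
    v = v0
    for _ in range(iters):
        v -= F(t, v) / Fv(t, v)
    return v

h = 1e-6
slope = (psi(h) - psi(-h)) / (2*h)   # central-difference approximation of dv/dt at t = 0
print(slope)                          # ~ -1, as the implicit differentiation predicts
```

Starting Newton at a different $v_0$ (near $1$ or $-1$) would land on one of the other two branches crossing $t = 0$ in Figure 1, each with its own slope; the IFT only ever speaks about one branch at a time.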