ANALYSIS AND OPTIMIZATION 1. Lecture 1: Elements of Linear Algebra Definition 1.1. An m◊n matrix is a rectangular array with m rows and n columns: Q R a11 a12 · · · a1n c a21 a22 · · · a2n d c d A = (aij )m◊n = c . .. .. d . .. a .. . . . b am1 am2 ··· amn Here aij = the entry in the ith row and jth column. Also, i œ {1, . . . , m} and j œ {1, . . . , n}. Let us recall some basic properties or facts about matrices. 1. Addition: If two matrices A and B have the same size, we can add them. A + B = (aij + bij )m◊n . 2. Multiplication: If A and B are two matrices and if the number of columns of A matches the number of rows of B, we can multiply them. 3ÿ 4 n AB = aik bkj k=1 m◊p where A = (aij )m◊n and B = (bij )n◊p . 3. Scalar Multiplication: If A = (aij )m◊n is a matrix and c œ R, is a constant, then we can (scalar) multiply A by c. cA = (caij )m◊n 4. Interaction of Basic Operations: If A, B, and C have the appropriate size, then we have that i. (AB)C = A(BC); ii. A(B + C) = AB + AC; and iii. (A + B)C = AC + BC. 5. Curious Facts: In general, i. AB ”= BA. E.g., (aij )2◊3 (bij )3◊1 = a 2 ◊ 1 matrix; yet (bij )3◊1 (aij )2◊3 is not defined; ii. AB = 0 ”∆ A = 0 or B = 0; and iii. AB = AC ”∆ B = C or B = 0. 6. Transpose: Given a matrix A = (aij )m◊n we can take it’s transpose, defined as AÕ = At = AT = (aji )n◊m . E.g., Q R 3 4 0 1 0 2 4 A = a2 3b ∆ At = . 1 3 5 4 5 1 2 ANALYSIS AND OPTIMIZATION Definition 1.2. A column vector is an m ◊ 1 matrix. A row vector is a 1 ◊ n matrix. Definition 1.3. A set of vectors {a1 , . . . , an } is linearly dependent if a set of scalars {c1 , . . . , cn } exists with ci ”= 0 for some i œ {1, . . . , n} such that n ÿ ci ai = 0. i=1 Otherwise, the set {a1 , . . . , an } is called linearly independent. For example, 1. Parallel vectors are linearly dependent, by definition. Indeed, a1 Î a2 if a1 = ca2 for some c ”= 0. 2. If three vectors live in a plane, then they are linearly dependent. Conversely, if three vectors are linearly dependent, then they live in a plane. Remark 1.4. Suppose {a1 , . . . , an } is linearly dependent and Then, c1 a1 + · · · + cn an = 0 with cn ”= 0. an = ≠ n≠1 ÿ i=1 ci ai . cn In other words, an can be written as a linear combination of the other vectors. Definition 1.5. The span of a set of vectors {a1 , . . . , an } is the collection of all linear combinations of these vectors. span{a1 , . . . , an } = {c1 a1 + · · · + cn an : ci œ R} For example, 1. The span of a single vector is the collection of all vectors parallel to that vector and the 0 vector. 2. If we have two linearly independent vectors, their span is a plane. If we have two linearly dependent vectors, then they span a line. Definition 1.6. The rank of a matrix A, rank(A), is the maximal number of linearly independent column vectors of A. For example, 1. A = 0 ∆ rank(A) = 0. Theorem 1.7. rank(A) = rank(At ). Corollary 1.8. rank(A) = the maximal number of linearly independent rows of A. How do we find the rank of a given matrix? Let’s Q 2 0 A = (a1 | a2 | a3 | a4 ) = a1 1 0 1 try to compute the rank of R 4 1 3 0b . 1 1 Here ai , for i = 1, . . . 4, are the columns of A. To do this, we need to recall the elementary row operations on matrices. Let A = (aij )m◊n . ANALYSIS AND OPTIMIZATION 3 1. Add (row i) to c times (row j). This row operation is equivalent to multiplying A on the left by E = (eij )m◊m where E has a c in the (i, j)th position, 1s on the diagonal, and 0s everywhere else. 2. Swapping (row i) with (row j). 
This row operation is equivalent to multiplying A on the left by E = (eij )m◊m where E has 1s on all the diagonal entries except for the (i, i)th and (j, j)th positions, where it has 0s, and has 0s in the remaining positions except for the (i, j)th and (j, i)th positions, where it has 1s. 3. Multiplication of (row i) by c ”= 0. This row operation is equivalent to multiplying A on the left by E = (eij )m◊m where E has 1s on along the diagonal entries except for the (i, i)th position, where it has c. The rest of the positions are 0s. Theorem 1.9. rank(EA) = rank(A). So and Hence, Q 2 a1 0 Q 1 a1 0 0 1 1 ≠2 1 1 4 3 1 R Q 1 2 r 0b ‘≠≠≠1≠æ a1 r1 ≠r3 1 0 ≠1 1 1 R Q 0 0 1 r2 3 0b ‘≠≠≠≠æ a0 r2 ≠r1 1 1 0 ≠2 3 1 R Q 3 0 1 r 3 0b ‘≠≠≠1≠æ a1 r1 ≠r2 1 1 0 R Q 0 0 1 r3 3 0b ‘≠≠≠≠æ a0 3r3 ≠r2 1 1 0 rank(A) = 3. ≠2 1 1 ≠2 3 0 R 0 0 3 0b , 1 1 R 0 0 3 0b . 0 1 4 ANALYSIS AND OPTIMIZATION 2. Lecture 2: Elements of Linear Algebra Cont’d 2.1. Systems of Linear Equations. A system of linear equations takes the form a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b1 .. .. .=. am1 x1 + am2 x2 + · · · + amn xn = bm . Equivalently, or compactly, any such system can be written as Ax = b for A = (aij )m◊n , b = (bi )m , and x = (xj )n◊1 . Here we think of A and b as given and x as an unknown. Given A and b, does an x exits such that Ax = b? This question is about linear dependence and independence. Theorem 2.1. Ax = b has a solution if and only if rank(A) = rank(Ab ) where Q R a11 · · · a1n b1 c .. .. d . .. Ab = (A | b) = a ... . . . b am1 · · · amn bm In words, a system of linear equations has a (at least one) solution if and only if b can be written as a linear combination of the columns of A. Remark 2.2. rank(A) Æ min{m, n}. Remark 2.3. rank(Ab ) is at most rank(A)+1 and at least rank(A), i.e., rank(A) Æ rank(Ab ) Æ rank(A) + 1. Once we’ve established whether or not a solution exists, by understanding if rank(Ab ) and rank(A) are the same or different, we can move to understanding if we have a unique solution or infinitely many solutions. Heuristically, to solve a system of linear equations, we need enough equations. So we can also ask, assuming we have a solution, if we have more equations than we need. Theorem 2.4. Consider a system of linear equations Ax = b. Assume it has a solution, and suppose that rank(A) = rank(Ab ) = k. 1. If k = m = n, then the solution is unique. 2. If k < n, the number of unknowns, then some n ≠ k variables can be chosen freely. The remaining k variables are determined uniquely. Thus, we have infinitely many solutions. The system has n ≠ k degrees of freedom. 3. If k < m, then m ≠ k equations can be thrown away. In this case, we may have infinitely many solutions or just one unique solution, depending on the relationship between k and n. Let’s consider the system determined by Q R Q R 1 1 1 0 A = a2 1 0 b and b = a1b , 1 0 ≠1 1 ANALYSIS AND OPTIMIZATION 5 i.e., x1 + x2 + x3 = 0 2x1 + x2 = 1 Observe that x1 ≠ x3 = 1. rank(A) ≠ rank(Ab ) = 2, since r3 = r2 ≠ r1 . So our system is equivalent to the reduced system x1 + x2 + x3 = 0 2x1 + x2 = 1, i.e., Âx = b̂ with  = 3 1 2 1 1 4 3 4 1 0 and b̂ = . 0 1 (This is an example of part 3. of the theorem.) Notice that x3 is a free variable. That is, for any x3 œ R, we can find a unique pair (x1 , x2 ) such that x1 + x2 = ≠x3 2x1 + x2 = 1. Thus, our system has infinitely many solutions and an extra equation. 2.2. Determinants and Inverses. Let A be an n ◊ n matrix, i.e., square. Definition 2.5. 
An n ◊ n matrix B is the inverse of A if AB = BA = I = diag(1, . . . , 1). If an inverse exists, it is unique, and we denote it by A≠1 . When does A≠1 exit? 1. Well, we can try to compute it, if we knew how. 2. Recall, from our theorem on systems of linear equation, rank(A) = n implies that a unique solution exists to any system Ax = b. Hence, explicitly, x = A≠1 b. So full rank, square matrices are invertible. 3. We can use the determinant test. Case 1: n = 2. Let A= Then, 3 4 a b . c d -a b - = ad ≠ bc. det(A) = c d- Geometrically, | det(A)| = the volume of the parallelogram determined by the rows of A. Case 2: n = 3. Let Q a11 A = aa21 a31 a12 a22 a32 R a13 a23 b . a33 6 ANALYSIS AND OPTIMIZATION Then, -a11 det(A) = --a21 -a31 a12 a22 a32 a13 --a a23 - . = a11 -- 22 a32 a33 - -a a23 -≠ a12 -- 21 a33 a31 -a a23 -+ a13 -- 21 a33 a31 a22 -. a32 - Geometrically, | det(A)| = the volume of the parallelopiped determined by the rows of A. Case 3: n > 3. Definition 2.6. The cofactor Aij of a matrix A is equal to (≠1)i+j times the determinant of the n ≠ 1 by n ≠ 1 matrix found by removing the ith row and jth column of A. Then, det(A) = n ÿ j=1 aij Aij = n ÿ aij Aij for all i, j. i=1 This is called the cofactor expansion of the determinant. For example, 1. A = diag(a1 , . . . , an ) ∆ det(A) = a1 a2 · · · an . 2. Q R a11 0 ··· 0 c a21 a22 0 0 d c d A=c . d ∆ det(A) = a1 a2 · · · an . . .. .. a .. . 0 b am1 am2 · · · ann Matrices of this form are called lower triangular. The transpose of a lower triangular is upper triangular. Let us recall some properties of determinants. 1. det(At ) = det(A). 2. det(AB) = det(A) det(B). 3. Swap two rows ∆ determinant changes sign. 4. det(cA) = cn det(A). 5. If ai denote the columns of A, then det(a1 | · · · | cai | · · · | an ) = c det(A). 6. If ai denote the columns of A and b is some vector, then det(a1 | · · · | ai + bi | · · · | an ) = det(A) + det(a1 | · · · | ai≠1 | bi | ai+1 | · · · | an ). 7. If the columns (rows) of A are linearly dependent, then det(A) = 0. 8. det(A) ”= 0 … rank(A) = n. 9. det(a1 | · · · | ai + caj | · · · | an ) = det(A). Properties 5. and 6. together say that the det as a function on vectors is multilinear. Let’s go back to inverses. Theorem 2.7. A≠1 exists if and only if det(A) ”= 0. Also, det(A≠1 ) = 1/ det(A). If a matrix has 0 determinant, it is called singular. Let us now recall some properties of inverses. 1. (A≠1 )≠1 = A. 2. (At )≠1 = (A≠1 )t . 3. (AB)≠1 = B ≠1 A≠1 . ANALYSIS AND OPTIMIZATION 7 4. (cA)≠1 = c≠1 A≠1 , for c ”= 0. Theorem 2.8. A≠1 = det(A)≠1 (Aij )t , where Aij is the (i, j)th cofactor of A. Definition 2.9. Given a square matrix A, the adjoint of A, adj(A), is the transpose of the matrix of cofactors of A, i.e., (Aij )t . So that when A is invertible, A≠1 = det(A)≠1 adj(A). If we go back to our original question, we had a question within it: how do we compute A≠1 ? These two theorems give us an answer. Yet we might ask, is there another way? Indeed there is. We can use Gauss–Jordan elimination (row reduction) on the matrix (A | I) to transform it into (I | B). Then, B = A≠1 . For example, set Q R 1 3 3 A = a1 4 3b . 1 3 4 Let us use this direct method to compute A≠1 . Q R Q R 1 3 3 1 0 0 1 3 3 1 0 0 r2 ;r3 a1 4 3 0 1 0b ‘≠≠≠≠≠≠≠≠æ a0 1 0 ≠1 1 0b r2 ≠r1 ;r3 ≠r1 1 3 4 0 0 1 0 0 1 ≠1 0 1 Q R Q R 1 0 3 4 ≠3 0 1 0 0 7 ≠3 ≠3 r r ≠1 1 0b ‘≠≠≠1≠æ a0 1 0 ≠1 1 0 b. ‘≠≠≠1≠æ a0 1 0 r1 ≠3r2 r1 ≠3r3 0 0 1 ≠1 0 1 0 0 1 ≠1 0 1 Hence, Q R 7 ≠3 ≠3 0 b. A≠1 = a≠1 1 ≠1 0 1 Finally, let us compute some cofactors. 
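As a quick numerical cross-check of the Gauss–Jordan computation above, here is a minimal sketch using numpy (not part of the original notes; the matrix entries are copied from the example, and the cofactor computation promised above continues right after this aside):

```python
import numpy as np

# The matrix from the Gauss-Jordan example above and the inverse found by row reduction.
A = np.array([[1.0, 3.0, 3.0],
              [1.0, 4.0, 3.0],
              [1.0, 3.0, 4.0]])
A_inv_by_hand = np.array([[ 7.0, -3.0, -3.0],
                          [-1.0,  1.0,  0.0],
                          [-1.0,  0.0,  1.0]])

print(np.allclose(np.linalg.inv(A), A_inv_by_hand))  # expected: True
print(np.allclose(A @ A_inv_by_hand, np.eye(3)))     # A A^{-1} = I, expected: True
print(np.linalg.det(A))                              # approximately 1, so A is nonsingular
```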
-4 3- = 16 ≠ 9 = 7, A11 = 3 4- 1 3- = (≠1)1+2 (4 ≠ 3) = ≠1, A12 = -1 4and -3 3- = (≠1)1+2 (12 ≠ 9) = ≠3. A21 = 3 4If we continue in this way and also compute the determinant of A, we will recover what we found as A≠1 using our theorem above. 8 ANALYSIS AND OPTIMIZATION 3. Lecture 3: Elements of Linear Algebra Cont’d 3.1. Eigenvalues and Eigenvectors. Let A be an n ◊ n (square) matrix. What does A represent geometrically? Well, we can always ask, on what does A act and how does A act? The simplest object A might act on is an n ◊ 1 (column) vector, i.e., an element in Rn . For example, if A = diag(2, 1/2), then A takes every element in R2 and doubles its length in the e1 direction and halves its length in the e2 direction, where ei = the vector with a 1 in the ith position and 0s in all the other positions. We call the set {ei : i = 1, . . . , n} the canonical basis of Rn . In other words, for A, the fundamental building blocks are the two vectors e1 and e2 and the constants ⁄1 = 2 and ⁄2 = 1/2. These are examples of eigenvectors and eigenvalues. Definition 3.1. A nonzero x œ Rn (x ”= 0) is an eigenvector of A if Ax = ⁄x for some ⁄ œ R. The constant ⁄ is the called an eigenvalue of A. Eigenvalues and eigenvectors are the geometric building blocks of a matrix. For example, consider an ellipsoid in Rn , which is just a sphere stretched or shrunk in n different directions. It turns out that we can identify any ellipsoid with a (square) matrix. Moreover, the directions along which we change a sphere to get our ellipsoid are the eigenvectors of this matrix, and the amount we stretch or shrink our sphere in these directions are the eigenvalues of this matrix. How do we find the eigenvectors and eigenvectors of a given A? From the definition, we see that we need to solve (A ≠ ⁄I)x = 0, i.e., a system of linear equations, which has a solution if and only if det(A ≠ ⁄I) ”= 0. Notice that, as a function of ⁄, det(A ≠ ⁄I) is a polynomial of degree n. Definition 3.2. The polynomial p(⁄) = det(A ≠ ⁄I) is called the characteristic polynomial of A. The roots of the characteristic polynomial of A are the eigenvalues of A (there are n, but they may not be distinct). For example, 1. Let A = diag(a1 , . . . , an ). Then, det(A ≠ ⁄I) = (a1 ≠ ⁄) · · · (an ≠ ⁄). So the eigenvalues of A are ⁄i = ai , and corresponding eigenvectors (found by inspection) are ei respectively: Aei = ai ei . 2. If Q R 2 4 0 A = a4 8 0 b , 0 0 10 then to find the eigenvalues of A, we solve -2 ≠ ⁄ 4 0 -8≠⁄ 0 -- = (10≠⁄)[(2≠⁄)(8≠⁄)≠16] = ⁄(10≠⁄)2 . 0 = det(A≠⁄I) = -- 4 - 0 0 10 ≠ ⁄Hence, ⁄1 = 0, ⁄2 = 10, and ⁄3 = 10. ANALYSIS AND OPTIMIZATION 9 In this case, we’d prefer to say that A has two eigenvalues ⁄1 = 0 and ⁄2 = 10, and the second eigenvalue has multiplicity two. Now let’s move onto the eigenvectors. Associated to ⁄1 = 0, we get the system i.e., 0 = (A ≠ 0I)x = Ax, 2x1 + 4x2 = 0 4x1 + 8x2 = 0 x3 = 0 … x1 = ≠2x2 x3 = 0. (Note r2 = 2r1 on the left.) Clearly, x2 is a free variable, so an eigenvectors associated to ⁄1 = 0 is Q R ≠2 x1 = a 1 b . 0 Associated to ⁄2 = ⁄3 = 10, we get the system i.e., 0 = (A ≠ 10I)x, ≠8x1 + 4x2 = 0 4x1 ≠ 2x2 = 0 … x2 = 2x1 . So eigenvectors associated to ⁄2 = ⁄3 = 10 are Q R Q R 1 0 x2 = a2b and x3 = a0b . 0 1 We found the last one by inspection. Remark 3.3. Eigenvectors are not unique. Indeed, if x is an eigenvector of A, so is cx for all c ”= 0 (and cx corresponds to the same eigenvalue as x). Definition 3.4. 
The eigenspace associated to a distinct eigenvalue is the span of all the eigenvectors associated to it. For example, if we go back to our second example above. The eigenspace associated to ⁄1 is span{x1 }, and the eigenspace associated to ⁄2 is span{x2 , x3 }. Definition 3.5. The trace of a (square) matrix is given by tr(A) = a11 + · · · + ann , i.e., it is the sum of the diagonal entries of A. Theorem 3.6. Let A have eigenvalues ⁄i , for i = 1, . . . , n. Then, det(A) = ⁄1 · · · ⁄n and tr(A) = ⁄1 + · · · + ⁄n . Corollary 3.7. A is invertible if and only if the eigenvalues of A are all nonzero. 10 ANALYSIS AND OPTIMIZATION 3.2. Diagonalization. As I said before, we think of the eigenvalues and eigenvectors of a matrix as its geometric building blocks. Can we represent A in terms of its eigenvalues and eigenvectors? Definition 3.8. A is called diagonalizable if it can be written as A = P P ≠1 for some invertible matrix P and some diagonal matrix . Theorem 3.9. An n ◊ n matrix is diagonalizable if and only if it has a set of n linearly independent eigenvectors. In this case, P = (x1 | · · · | xn ) and = diag(⁄1 , . . . , ⁄n ) where xi is an eigenvector associated to ⁄i . Definition 3.10. A is symmetric if A = At . Definition 3.11. Two vectors x and y are orthogonal, x ‹ y, if x · y = xt y = 0. Definition 3.12. A matrix P is orthogonal if P ≠1 = P t . Theorem 3.13. Let A be symmetric. 1. The eigenvalues ⁄i , for i = 1, . . . , n, of A are real, i.e., live in R. 2. Eigenvectors corresponding to distinct eigenvalues are orthogonal. 3. An orthogonal matrix P exists such that P ≠1 AP = diag(⁄1 , . . . , ⁄n ). In particular, the ith column of P is an eigenvector of A corresponding to the ith eigenvalue ⁄i of A. by For example, let’s go back to ellipsoids. In particular, consider the ellipse defined 5x21 + 8x1 x2 + 5x22 = 1. This equation is equivalent to 3 4 5 4 xt Ax = 1 for A = . 4 5 Note that the eigenvalues of A are ⁄1 = 9 and ⁄2 = 1, and some corresponding eigenvectors are 3 4 3 4 1 ≠1 x1 = and x2 = . 1 1 Now let us make x1 and x2 unit length, i.e., Îxi Î = 1. Recall that the length of a vector x is 1 ÎxÎ = (x21 + · · · + x2n ) 2 . In turn, we can instead consider the eigenvectors 1 Ô xi for i = 1, 2. 2 Then, if we set 1 P = Ô (x1 | x2 ), 2 we find a rotation matrix (by angle fi/4), i.e., an orthogonal matrix. Hence, considering = diag(9, 1), we find all the pieces of our theorem on the diagonalizability of symmetric matrices: A = P P ≠1 = P P t . ANALYSIS AND OPTIMIZATION The matrix P describes tilting, while the matrix 1 = x Ax = x P P x = z t t t t 11 describes stretching. Note z with z = P x. So after we rotate by P (i.e., by fi/4), we can think we live in z space as opposed to x space. In z space, our ellipse is just a stretched circle. If we look at the picture of xt Ax = 1, it is a 45 degree rotation of the picture of xt x = 1. 3.3. Quadratic Forms. Our ellipse is also the 1-level set of a quadratic form. Definition 3.14. A quadratic form is a function Q(x) = xt Ax for some matrix A. Remark 3.15. In the definition of quadratic form, we can assume A is symmetric. Indeed, since xt Ax is a scalar, xt Ax = (xt Ax)t = xt At x. So 1 1 1 1 xt (A ≠ At )x = (xt (A ≠ At )x) = xt (Ax ≠ At x) = (xt Ax ≠ xt At x) = 0. 2 2 2 2 1 t Thus, the skew-symmetric part 2 (A ≠ A ) of A does not contribute to the form. What is left over is the symmetric part 12 (A + At ) of A. Note 1 1 A = (A ≠ At ) + (A + At ). 2 2 Definition 3.16. Let A = (aij )n◊n . - A is positive definite if xt Ax > 0 for all x ”= 0. 
- A is positive semidefinite if xt Ax Ø 0 for all x ”= 0. - A is negative definite if xt Ax < 0 for all x ”= 0. - A is negative semidefinite if xt Ax Æ 0 for all x ”= 0. - Otherwise, we call A indefinite. Definition 3.17. Let Q = Q(x) be a quadratic form, for x œ Rn . - Q is positive definite if Q(x) > 0 for all x ”= 0. - Q is positive semidefinite if Q(x) Ø 0 for all x ”= 0. - Q is negative definite if Q(x) < 0 for all x ”= 0. - Q is negative semidefinite if Q(x) Æ 0 for all x ”= 0. - Otherwise, we call Q indefinite: Q(y) < 0 and Q(z) > 0 for some y, z œ Rn . For example, 1. Q(x) = 5x21 + 8x1 x2 + 5x22 is positive definite, as we saw. 2. Q(x) = 4x21 + 9x22 is positive definite. 3. Q(x) = 4x21 ≠ 12x2 x2 + 9x22 = (2x1 ≠ 3x2 )2 is positive semidefinite. 4. Q(x) = 4x21 ≠ 9x22 is indefinite. 12 ANALYSIS AND OPTIMIZATION 4. Lecture 4: Elements of Linear Algebra Cont’d and Multivariable Calculus 4.1. Quadratic Forms Cont’d. How do we determine definiteness? Theorem 4.1. Let Q be a quadratic form determined by the symmetric matrix A = (aij )n◊n , and let ⁄i , for i = i, . . . , n, be the eigenvalues of A. Then, - Q is positive definite … ⁄i > 0 for all i = 1, . . . , n. - Q is positive semidefinite … ⁄i Ø 0 for all i = 1, . . . , n. - Q is negative definite … ⁄i < 0 for all i = 1, . . . , n. - Q is positive semidefinite … ⁄i Æ 0 for all i = 1, . . . , n. - Q is in definite … ⁄i > 0 and ⁄j < 0 for some i ”= j. In the definition of a quadratic form, we considered xt Ax for any matrix A. When determining the definiteness of a given quadratic form in terms of its defining matrix, we always have to consider the symmetric matrix which defines it: 12 (At + A). Why? One reason is, in general, non-symmetric matrices may have complex roots, which do not have a sign. So we cannot talk about their definiteness from looking at the signs of their eigenvalues. Let’s look at a simple example. Consider the quadratic form Q(x) = x21 + x22 + 2x1 x2 . Two matrices to consider are A1 = 3 1 0 2 1 4 and A2 = 3 1 1 4 1 . 1 These both determine Q. Let’s look at characteristic polynomials of A1 and A2 : pA1 (⁄) = (1 ≠ ⁄)2 while pA2 (⁄) = (1 ≠ ⁄)2 ≠ 1 = ⁄(⁄ ≠ 2) These are different and have different roots. A1 has eigenvalues 1 and 1, while A2 has eigenvalues 0 and 2. Even though pA1 has real roots and both are positive, A1 is not positive definite. Indeed, (≠1, 1)t A1 (≠1, 1) = Q(≠1, 1) = 1 + 1 + 2(≠1)1 = 0, but (≠1, 1) is not the zero vector. So that, by the definition of positive definiteness of a quadratic form (or a matrix), Q (or A1 ) cannot be positive definite. Clearly, the eigenvalues of A1 cannot then be used to determine the definiteness of Q. For example, let f = f (x1 , x2 ) be a smooth function on R2 . If we consider the Taylor expansion of f at z: 1 f (x) = f (z) + Òf (z) · (x ≠ z) + (x ≠ z)t D2 f (z)(x ≠ z) + · · · 2 we see that the second order part 1 (x ≠ z)t D2 f (z)(x ≠ z) 2 is a quadratic form. Here D2 f is the Hessian (matrix) of f : 3 4 ˆ11 f ˆ12 f D2 f = . ˆ21 f ˆ22 f Determining the definiteness of D2 f (z) can help us understand if z is a local maximum, a local minimum, or a saddle point of f . ANALYSIS AND OPTIMIZATION 13 Definition 4.2. Given an m ◊ n matrix A, the leading principal minors of A, Di for i = 1, . . . , k = min{m, n}, are defined as the determinant of the i ◊ i submatrix of A determined by the first i rows and columns. Explicitly, -a11 a12 a13 -a11 a12 - , D3 = -a21 a22 a23 - , . . . 
D1 = a11 , D2 = -a21 a22 -a31 a32 a33 Note that the leading principle minors are defined of all matrices, not just square ones. In the square case, k = m = n and Dn = det(A). Theorem 4.3. Let Q be a quadratic form. - Q is positive definite if and only if Di > 0 for all i = 1, . . . , n. - Q is negative definite if and only if (≠1)i Di > 0 for all i = 1, . . . , n. For example, let Q(x) = 5x21 + 2x1 x3 + 2x22 + 2x2 x3 + 4x23 to which we have the defining matrix Q 5 A = a0 1 Observe that -5 D1 = 5 > 0, D2 = -0 0 2 1 R 1 1b . 4 0-= 10 > 0, and D3 = det(A) = 40 ≠ 2 ≠ 5 = 33 > 0. 2- Hence, Q (or equivalently A) is positive definite. 4.2. Differentiation. We start by recalling some notation and definitions. Definition 4.4. If f : Rn æ R is a differentiable function, we can define the directional derivative of f in a direction e by the limit ˆe f (x) = lim+ tæ0 f (x + te) ≠ f (x) . t d It is also equal to the derivative ds f (x + se)|s=0 . If e = ei , one of the canonical (or n standard) basis vectors of R , then we will often write ˆi f (x) instead of ˆei f (x); i.e., we set ˆi f (x) = ˆei f (x). Definition 4.5. The gradient of a differentiable function f : Rn æ R is the vector that collects all the directional derivatives of f in the standard directions: Note that Clearly, then Òf (x) = (ˆi f (x), . . . , ˆn f (x)). Òf (x) · e = ˆe f (x). |ˆe f | = |Òf · e| = ÎÒÎÎeÎ cos ◊. This is maximized when the angle between Òf and e is 0, i.e., when Òf points in the same direction as e. So Òf is the direction of maximal increase of f . Definition 4.6. The c-level set of f is the set {x : f (x) = c}. 14 ANALYSIS AND OPTIMIZATION Let n = 2, and suppose that the c-level set of f is the graph of some function of one variable in a small neighborhood around z œ {x : f (x) = c}: for all x1 œ (z1 ≠‘, z1 +‘) (with ‘ > 0), f (x1 , y(x1 )) = c. Then, in particular, Òf (z) · (1, “ Õ (z)) = ˆ1 f (z1 , y(z1 )) + ˆ2 f (z1 , “(z1 ))“ Õ (z1 ) = 0. And so, provided ˆ2 f (z) ”= 0, ˆ1 f (z1 , “(z1 )) . ˆ2 f (z1 , “(z1 )) In other words, this specific ratio of the components of the gradient of f at z gives us the perpendicular slope to the slope of the tangent line to “ at z1 . Recalling the general equation for a line through a point (y ≠ y0 = m(x ≠ x0 )), we see that “ Õ (z1 ) = ≠ ˆ1 f (z)(x1 ≠ z1 ) + ˆ2 f (z)(x2 ≠ z2 ) = ˆ1 f (z)(x1 ≠ z1 ) + ˆ2 f (z)(x2 ≠ “(z1 )) = 0. In turn, the tangent plane to the c-level set of f at z is {x : Òf (z) · (x ≠ z) = 0}. Geometrically, we have that Òf (z) for z œ {x : f (x) = c} is orthogonal to the tangent plane to the c-level set of f at z. An important theorem is the mean value theorem: Theorem 4.7. Suppose that f : Rn æ R is C 1 . Then, for some t œ (0, 1). f (x) ≠ f (y) = Òf (tx + (1 ≠ t)y) · (x ≠ y) Definition 4.8. Let f : Rn æ R be a twice differentiable function. The Hessian of f is the matrix of second derivatives of f : D2 f (x) = (ˆij f (x))n◊n . Theorem 4.9. If f : Rn æ R is C 2 , then D2 f (x) is symmetric. For example, let f : R2 æ R be defined by I x x (x2 ≠x2 ) f (x) = 1 0 2 1 x21 +x22 2 for x ”= 0 for x = 0. Note that ˆ12 f (0) ”= ˆ21 f (0). Definition 4.10. Let f : Rn æ Rm be a differentiable function, i.e., f = (f1 , · · · , fm ) for differentiable functions fi : Rn æ R with i = 1, . . . , m. The differential (or gradient) of f is the m ◊ n matrix whose ith row is the gradient of fi : Q R Q R Òf1 (x) ˆ1 f1 (z) · · · ˆn f1 (z) c d c d .. .. .. .. Df (x) = a b=a b. . . . . Òfm (x) ˆ1 fm (z) ··· ˆn fm (z) ANALYSIS AND OPTIMIZATION 15 5. 
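Before moving on to the next lecture, here is a small numerical aside (a sketch using numpy, not part of the original notes) that re-checks the definiteness test for the quadratic form example of Section 4.1, both through the leading principal minors (Theorem 4.3) and through the eigenvalue criterion (Theorem 4.1):

```python
import numpy as np

# Symmetric matrix of Q(x) = 5x1^2 + 2x1x3 + 2x2^2 + 2x2x3 + 4x3^2 (example above).
A = np.array([[5.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 4.0]])

# Leading principal minors D1, D2, D3: all positive <=> positive definite (Theorem 4.3).
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
print(minors)  # expected: approximately [5.0, 10.0, 33.0]

# Eigenvalue criterion (Theorem 4.1): all eigenvalues positive <=> positive definite.
eigenvalues = np.linalg.eigvalsh(A)   # eigvalsh is the routine for symmetric matrices
print(eigenvalues, np.all(eigenvalues > 0))  # expected: three positive values, True
```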
Lecture 5: Elements of Multivariable Calculus Cont’d 5.1. Convexity (and Concavity). Definition 5.1. A set S is convex if tx + (1 ≠ t)y œ S for all x, y œ S and for all t œ [0, 1]. For example, 1. {x : p · x = c} for some p œ Rn and c œ R. 2. {x : p · x < c} for some p œ Rn and c œ R. 3. {x : Îx ≠ zÎ Æ r} for some r Ø 0 and some z œ Rn . 4. The empty set is convex. 5. Rn is convex. Let’s show that 2. holds: Let z, y œ {x : p · x < c}. We need to check that tz + (1 ≠ t)y is in {x : p · x < c} for all t œ [0, 1]. To that end, observe (by the linearity of the dot product) p · (tz + (1 ≠ t)y) = p · tz + p · (1 ≠ t)y = t p · z + (1 ≠ t) p · y < tc + (1 ≠ t)c = c for any t œ [0, 1]. And so, tz + (1 ≠ t)y œ {x : p · x < c} for any t œ [0, 1], as desired. Theorem 5.2. The intersection of two convex sets is convex. For example, let S = {x : Ax Æ b} = flm i=1 Si where Si = {x : ai · x Æ bi }. Here ai is the ith row of A. Since each Si is convex, S is convex. Remark 5.3. The union of two convex sets may not be convex. Definition 5.4. Let S µ Rn be convex and let f : S æ R. - f is convex if tf (x) + (1 ≠ t)f (y) Ø f (tx + (1 ≠ t)y) for all x, y œ S and for all t œ [0, 1]. - f : S æ R is strictly convex if we replace Ø with > above. - f : S æ R is concave if we replace Ø with Æ above. - f : S æ R is strictly concave if we replace Ø with < above. For example, 1. f (x) = a · x + b is both convex and concave. 2. f (x) = ÎxÎ is convex, by the triangle inequality, but not strictly convex. 3. Let S µ Rn be convex. The epigraph of f , i.e., {(x, y) œ S ◊ R : f (x) Æ y}, is convex if and only if f is convex. 4. Let S µ Rn be convex. The hypograph of f , i.e., {(x, y) œ S ◊R : f (x) Ø y}, is convex if and only if f is concave. 5. The max of two convex functions is convex. 6. The min of two concave functions is concave. Note that f is (strictly) convex if and only if ≠f is (strictly) concave. Theorem 5.5. Let S µ Rn be open and convex and f : S æ R be a C 1 function. Then, f is convex if and only if f (x) Ø f (z) + Òf (z) · (x ≠ z) for all x, z œ S. Similarly, f is strictly convex if have the same statement with > replacing Ø. 16 ANALYSIS AND OPTIMIZATION Theorem 5.6. Let S µ Rn be open and convex and f : S æ R be a C 2 function. - D2 f (x) is positive semidefinite for all x œ S if and only if f is convex. - If D2 f (x) is positive definite for all x œ S, then f is strictly convex. - D2 f (x) is negative semidefinite for all x œ S if and only if f is concave. - If D2 f (x) is negative definite for all x œ S, then f is strictly concave. For example, positive definite quadratic forms are strictly convex. Remark 5.7. The converse statements in this theorem are not true: f (x) = x4 is strictly convex, yet f ÕÕ (0) = 0. Theorem 5.8. Nonnegative linear combinations of convex (concave) functions are convex (concave). In particular, let S µ Rn be convex and fi : S æ R be convex (concave) for i œ {1, . . . , k}. Then f = a1 f1 + · · · ak fk is convex (concave) if ai Ø 0 for all i = 1, . . . , k. Theorem 5.9. Let S µ Rn convex, f : S æ R, and g : R æ R. - If f is convex and g is convex and increasing, then g ¶ f is convex. - If f is convex and g is concave and decreasing, then g ¶ f is concave. - If f is concave and g is convex and decreasing, then g ¶ f is convex. - If f is concave and g is concave and increasing, then g ¶ f is concave. An important inequality is Jensen’s inequality. Theorem 5.10. 
A function f is convex on a convex subset S of Rn if and only if for all {xi }ki=1 f (t1 x1 + · · · + tk xk ) Æ t1 f (x1 ) + · · · + tk f (xk ) µ S, k œ N, and t1 , . . . , tk Ø 0 such that t1 + · · · + tk = 1. Definition 5.11. A convex combination of a set of points {xi }ki=1 is the point for a set {ti }ki=1 such that t1 x1 + · · · tk xk ti Ø 0 for all i œ {1, · · · , k} and t1 + · · · + tk = 1. ANALYSIS AND OPTIMIZATION 17 6. Lecture 6: Elements of Multivariable Calculus Cont’d 6.1. Taylor’s Theorem. Often times it is useful to be able to approximate a function at or near a given point. Taylor’s theorem gives us a way to do this. Theorem 6.1. Let f : R æ R be k times differentiable at a point z. Then, for some t œ (0, 1), f (x) = k ÿ i=1 1 1 f (i≠1) (z)(x ≠ z)i≠1 + f (k) ((1 ≠ t)x + tz)(x ≠ z)k . (i ≠ 1)! k! In addition, if the kth order derivative of f is continuous at z, then f (x) = k+1 ÿ i=1 where 1 f (i) (z)(x ≠ z)i≠1 + Rk (x, z) (i ≠ 1)! Rk (x, z) æ 0 as x æ z. |x ≠ z|k For example, let f (x) = ex , then, Taylor expanding at 0, 1 1 f (x) = f (0) + f Õ (0)x + f ÕÕ (0)x2 + f ÕÕÕ (0)x3 + · · · 2 6 . 1 2 1 3 = 1 + x + x + x + ··· 2 6 There are other versions of Taylor’s theorem as well as higher dimensional analogues of Taylor’s theorem. For now, we will only state one higher dimensional analogue, when k = 2. Theorem 6.2. Let f : Rn æ R be twice differentiable at a point z. Then, for some t œ (0, 1), 1 f (x) = f (z) + Òf (z) · (x ≠ z) + (x ≠ z)t D2 f ((1 ≠ t)x + tz)(x ≠ z). 2 In addition, if the second derivatives of f are continuous at z, then 1 f (x) = f (z) + Òf (z) · (x ≠ z) + (x ≠ z)t D2 f (z)(x ≠ z) + R2 (x, z) 2 where R2 (x, z) æ 0 as x æ z. Îx ≠ zÎ2 Definition 6.3. The polynomials and T1,z (x) = f (z) + Òf (z) · (x ≠ z) 1 T2,z (x) = f (z) + Òf (z) · (x ≠ z) + (x ≠ z)t D2 f (z)(x ≠ z) 2 are called the first and second order Taylor polynomials of f centered at z respectively. For example, let f : R5 æ R2 be defined by 3 4 x1 x22 + x1 x3 x4 + x2 x25 ≠ 3x1 f (x) = . x34 x2 x3 + 2x1 x4 ≠ x4 x25 ≠ 2x5 18 ANALYSIS AND OPTIMIZATION Then, Df (x) = 3 2 x2 + x3 x4 ≠ 3 2x4 2x1 x2 + x25 x34 x3 x1 x4 x34 x2 x1 x3 3x24 x2 x3 + 2x1 ≠ x25 4 2x2 x5 . ≠2x4 x5 ≠ 2 Using Taylor’s theorem that at z = 0 and x = (0.1, 0.1, 0, 0, 0.1), we can approximate f at x: 3 4 3 4 3 4 0 ≠3 0 0 0 0 ≠0.3 f (x) ≥ f (0) + Df (0)x = + ·x= . 0 0 0 0 0 ≠2 ≠0.2 Note that f (x) = 3 0.001 ≠ 0.3 ≠0.2 4 = 3 4 ≠0.299 . ≠0.2 6.2. The Implicit Function Theorem. Suppose we have the equation f (x, y) = 0 where x is given and y is unknown. When can we solve this? We have already understood the answer in the linear case: when f (x, y) = Ay ≠ x for some matrix A. Indeed, if we first rename x, and call it b, and second rename y, can call it x, we have Ax = b, our familiar system of linear equations. But what happens when f is nonlinear? Theorem 6.4. Let f = f (x, y) : R2 æ R be a C 1 function in some neighborhood of (x0 , y0 ). Suppose that f (x0 , y0 ) = 0. If ˆy f (x0 , y0 ) = ˆ2 f (x0 , y0 ) ”= 0, then two intervals I1 = (x0 ≠ ”, x0 + ”) and I2 = (y0 ≠ ‘, y0 + ‘) exist such that for every x œ I1 , the equation f (x, y) = 0 has a unique solution in I2 : this defines y as a function of x in I1 , i.e., y = Ï(x) in I1 . Moreover, Ï is C 1 in I1 and ÏÕ (x) = ≠ ˆ1 f (x, Ï(x)) ˆx f (x, Ï(x)) =≠ ˆ2 f (x, Ï(x)) ˆy f (x, Ï(x)) for all x œ I1 . For example, let f (x, y) = y 3 + x2 ≠ 3xy ≠ 7 and consider the equation f (x, y) = 0. Note f (4, 3) = 0 and ˆ2 f (4, 3) = ˆy f (4, 3) = (3y 2 ≠ 3x)|(x,y)=(4,3) = 36 ”= 0. 
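As a sanity check on these hypotheses, here is a sketch using sympy (not part of the original notes; the hand computation of y as a function of x continues after this block):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = y**3 + x**2 - 3*x*y - 7

# Hypotheses of the implicit function theorem at (x0, y0) = (4, 3):
print(f.subs({x: 4, y: 3}))        # f(4, 3) = 0
fy = sp.diff(f, y)                 # 3*y**2 - 3*x
print(fy.subs({x: 4, y: 3}))       # evaluates to 15, which is nonzero, as the theorem requires

# Implicit derivative y'(x) = -f_x / f_y evaluated at (4, 3).
fx = sp.diff(f, x)
print((-fx / fy).subs({x: 4, y: 3}))  # expected: 1/15
```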
So we can indeed solve for y near (4, 3). If we apply our theorem, we see that y Õ (x) = ≠ At (x, y) = (4, 3), we then find ˆ1 f (x, y)) 3y ≠ 2x =≠ . ˆ2 f (x, y) 3(y 2 ≠ x) 1 . 15 This is wonderful, because, by Taylor’s theorem, we can at least approximate y = y(x) near x = 4. Indeed, y Õ (4) = y≠3≥ 1 1 41 (x ≠ 4) … y ≥ x+ . 15 15 15 Theorem 6.5. Let f = (f1 , . . . , fm ) be a C 1 function of (x, y) œ Rn ◊ Rm = Rn+m in a neighborhood of (x0 , y0 ). Suppose that f (x0 , y0 ) = 0. If the Jacobian determinant of f with respect to y is ”= 0 at (x0 , y0 ), then we can find neighborhoods B1 and B2 around x0 and y0 in Rn and Rm respectively such that the Jacobian determinant of f with respect to y is ”= 0 for all (x, y) œ B1 ◊ B2 . Moreover, for every x œ B1 , a ANALYSIS AND OPTIMIZATION 19 unique y œ B2 exists such that f (x, y) = 0. In this way, y is a function of x in B1 . In particular, y is C 1 in B1 and the differential of y (as a function of x) is Recall that Dy(x) = ≠(Dy f (x, y))≠1 Dx f (x, y). Q R Q Òf1 (z) ˆ1 f1 (z) c d c . .. .. Df (z) = a b=a . Òfm (z) So, letting z = (x, y) œ R m+n ˆ1 fm (z) , ··· .. . ··· R ˆn+m f1 (z) d .. b. . ˆn+m fm (z) Df (x, y) = (Dx f (x, y) | Dy f (x, y))m◊(m+n) where Dx f (x, y) is an m ◊ n matrix and Dy f (x, y) is and m ◊ m matrix. Notice that if we differentiate f (x, y(x)) = 0 in x, we find that Dx f (x, y) + Dy f (x, y)Dx y(x) = 0 So if det(Dy f (x, y)) ”= 0, then Dy f (x, y) is invertible, and Dy(x) = Dx y(x) = ≠(Dy f (x, y))≠1 Dx f (x, y). Definition 6.6. Let f : Rn+m æ Rm be differentiable. The Jacobian determinant of f with respect to y is det(Dy f (x, y)). For example, let f : R5 æ R2 be defined by 3 4 x1 x22 + x1 x3 y1 + x2 y22 ≠ 3 f (x, y) = . y13 x2 x3 + 2x1 y2 ≠ y1 y22 ≠ 2 Then, where Df (x, y) = (Dx f (x, y) | Dy f (x, y))m◊(m+n) Dx f (x, y) = and 3 x22 + x3 y1 2y2 2x1 x2 + y22 y13 x3 x1 y1 y13 x2 4 x1 x3 2x2 y2 . 3y12 x2 x3 ≠ y22 2x1 ≠ 2y1 y2 Note that f (1, 1, 1, 1, 1) = 0. The Jacobian determinant of f with respect to y is x1 x3 2x2 y2 -det(Dy f (x, y)) = - 2 , 3y1 x2 x3 ≠ y22 2x1 ≠ 2y1 y2 Dy f (x, y) = 3 4 which at (1, 1, 1, 1, 1) is -1 2-2 0- = ≠4. So, again, we can indeed solve for y as a function of x near (1, 1, 1, 1, 1), by the Implicit Function Theorem. Also, by our theorem, 3 43 4 3 4 1 0 ≠2 1 ≠4 ≠2 ≠2 2 3 1 Dy(1, 1, 1, 1, 1) = =≠ . 2 1 1 4 ≠2 1 4 ≠2 ≠5 ≠1 And, near (1, 1, 1, 1, 1), 3 y1 ≠ 1 y2 ≠ 1 4 1 ≥ 4 3 4 2 2 5 Q R 4 x1 ≠ 1 2 a x2 ≠ 1b . 1 x3 ≠ 1 20 ANALYSIS AND OPTIMIZATION In particular, we can approximate y = y(x) near x = (1, 1, 1). 6.3. The Inverse Function Theorem. Definition 6.7. Let f : A æ B be a function with A µ Rn and B µ Rm . - f is injective (one-to-one) if f (x) = f (z) implies that x = z. - f is surjective (onto) if for every y œ B, there is an x œ A such that f (x) = y. - f is bijective if it is injective and surjective. In this case, f is invertible, and the inverse f ≠1 : B æ A of f satisfies f (f ≠1 (y)) = y and f ≠1 (f (x)) = x. Theorem 6.8. Let f : Rn æ Rn be C 1 near a point x0 , and suppose that det Df (x0 ) ”= 0, i.e., the Jacobian determinant of f at x0 is nonzero. Then, f ≠1 exists in a neighborhood B around y0 and Df ≠1 (y0 ) = Df (x0 )≠1 where f (x0 ) = y0 . If f is C k near x0 , then f ≠1 is C k in the image B = f (A), for some neighborhood A of x0 . For example, let f : R2 æ R2 be defined by 3 4 x1 + x22 ≠ 2 f (x) = . x31 + x2 ≠ x1 x22 ≠ 1 Note that f (1, 1) = (0, 0) and Df (x) = Hence, 3 1 3x21 ≠ x22 4 2x2 . 
1 ≠ 2x1 x2 det(Df (1, 1)) = 1(≠1) ≠ 2(2) = ≠5 ”= 0, and, by the Inverse Function Theorem, f is invertible in a neighborhood of (1, 1). Moreover, 3 4≠1 3 4 1 2 1/5 2/5 ≠1 Df (0, 0) = = . 2 ≠1 2/5 ≠1/5 We may not know the exact form of the inverse, but we do know that g(y) = Df ≠1 (0, 0)y is a good approximation of f ≠1 near 0. ANALYSIS AND OPTIMIZATION 21 7. Lecture 7: Unconstrained Optimization 7.1. Extreme and Stationary Points. Definition 7.1. Let f : A µ Rn æ R. - If f (x0 ) Ø f (x) for all x œ A, then x0 is a global maximum for f . - If f (x0 ) Æ f (x) for all x œ A, then x0 is a global minimum for f . - If f (x0 ) Ø f (x) for all x œ B µ A, then x0 is a local maximum for f . - If f (x0 ) Æ f (x) for all x œ B µ A, then x0 is a local minimum for f . Collectively, maxima and mimima are called extreme points. Theorem 7.2. Let f : A µ Rn æ R be differentiable. If x0 œ A is an extreme point, then Òf (x0 ) = 0. Definition 7.3. Let f : Rn æ R be differentiable. If Òf (x0 ) = 0, then x0 is called a stationary point of f . Stationary points may not be extreme points: (0, 0) for f (x, y) = x2 ≠ y 2 is stationary but not extreme. Extreme points are stationary provided f is differentiable. Non-differentiable functions can have extreme points that are not stationary: 0 for f (x) = |x| is an extreme point, but not a stationary point. Definition 7.4. A stationary but non-extreme point is called a saddle point. Theorem 7.5. Let S µ Rn be an open convex set, and x0 œ S. Let f : S æ R be C 1 in an neighborhood of x0 . - Suppose f is convex. Then, x0 is a minimum point for f in S if and only if x0 is a stationary point (for f ). - Suppose f is concave. Then, x0 is a maximum point for f in S if and only if x0 is a stationary point (for f ). Theorem 7.6. Let f : A µ Rn æ R. If f is continuous and A is closed and bounded, then f achieves a maximum and minimum point on A. Theorem 7.7. Let f : Rn æ R be twice differentiable in a neighborhood around x0 , and suppose that Òf (x0 ) = 0, i.e., x0 is a stationary point of f . - If D2 f (x0 ) is positive definite, then x0 is a local minimum of f . - If D2 f (x0 ) is negative definite, then x0 is a local maximum of f . - If det(D2 f (x0 )) ”= 0 and D2 f (x0 ) is neither positive definite nor negative definite, then x0 is a saddle point of f . - If x0 is a local minimum of f , then D2 f (x0 ) is positive semidefinite. - If x0 is a local maximum of f , then D2 f (x0 ) is negative semidefinite. These cases are not all the possibilities. Remember the function f (x) = x3 in R: f (0) = 0, f ÕÕ (0) = 0, and 0 is a saddle point of f . For example, let Õ f (x, y) = 3x2 y + y 3 ≠ 3x2 ≠ 3y 2 + 2. Let’s find and classify all stationary points. A stationary point satisfies 0 = Òf (x, y) = (6xy ≠ 6x, 3x2 + 3y 2 ≠ 6y). So we have the equations 6x(y ≠ 1) = 0 and 3(x2 + y 2 ≠ 2y) = 0. 22 ANALYSIS AND OPTIMIZATION In turn, there are few cases. From the first equation, we see that either x = 0 or y = 1. Inputting this information into the second equation, we find the following stationary points: 1. (x, y) = (0, 0), 2. (x, y) = (0, 2), 3. (x, y) = (1, 1), and 4. (x, y) = (≠1, 1). Now we compute the Hessian of f at these four points. Note, in general, 3 4 6y ≠ 6 6x D2 f (x, y) = . 6x 6y ≠ 6 Hence, at our four stationary points: 3 4 3 4 3 4 ≠6 0 6 0 0 6 2 2 2 D f (0, 0) = , D f (0, 2) = , D f (1, 1) = , 0 ≠6 0 6 6 0 and 4 0 ≠6 D f (≠1, 1) = . ≠6 0 So after checking the definiteness of these matrices we see that point 1. is a local max, 2. is a local min, 3. is a saddle point, and 4. 
is a saddle point.

8. Lecture 8: Review of Midterm & Unconstrained Optimization Cont'd

Let us consider the function
f*(r) = max_{x ∈ S} f(x, r)
for a given f : R^{n+m} → R and S ⊂ R^n. Here x ∈ R^n and r ∈ R^m. f*(r) is the maximum of f when we fix a particular r, which we know how to find. (Take the derivative of f in x, set it equal to 0, and solve for the stationary point x* = x*(r).) The question now is: how does f* change as r changes?

Let x* = x*(r) be the vector that maximizes f for a fixed r. Then,
f*(r) = f(x*(r), r).
So, differentiating in r, if we consider the gradients as column vectors,
∇f*(r) = (Dx*(r))^t ∇_x f(x*(r), r) + ∇_r f(x*(r), r).
If we consider the gradients as row vectors,
∇f*(r) = ∇_x f(x*(r), r) Dx*(r) + ∇_r f(x*(r), r).
But since x*(r) = x* is a maximum of f(x, r) for r fixed, ∇_x f(x*(r), r) = ∇_x f(x*, r) = 0. In turn,
∇f*(r) = ∇_r f(x*(r), r).
This equality is called the envelope result.

For example, let
f*(r) = max_{x,y ∈ R} f(x, y, r) with f(x, y, r) = −x² − xy − 2y² + 2rx + 2ry.
First, we differentiate in x and y and set that gradient to (0, 0), to determine the stationary points of f with r fixed:
∇_{x,y} f(x, y, r) = (−2x − y + 2r, −x − 4y + 2r) = (0, 0).
This holds if and only if 2x + y = 2r and x + 4y = 2r; that is, if and only if x = 3y. So
x* = x*(r) = 6r/7 and y* = y*(r) = 2r/7
maximize f for r fixed. In turn,
f*(r) = 8r²/7.
Finally,
df*/dr (r) = 16r/7.
On the other hand,
∂_r f(x*, y*, r) = 2x* + 2y* = 16r/7,
which verifies the envelope result. In other words, to verify the envelope result, we compute the derivative of f with respect to r, plug in (x*(r), y*(r), r), and check that this equals the derivative of f* with respect to r.

9. Lecture 9: Constrained Optimization

9.1. Equality Constraints. Let us consider the general optimization problem
max_x f(x) (or min_x f(x)) subject to g(x) = b.
Here f : R^n → R, g : R^n → R^m with m ≤ n, and b ∈ R^m are all given. The function g and the vector b are constraints; hence, we are in the world of constrained optimization.

Remark 9.1. Most often, we will be in the setting where m < n, so that we have at least one degree of freedom.

To solve this problem, we introduce the method of Lagrange multipliers. In particular, from our given data, we construct a Lagrangian, i.e., a function L : R^n × R^m = R^{n+m} → R defined by
L(x, λ) = f(x) − λ · (g(x) − b) = f(x) − Σ_{j=1}^m λ_j (g_j(x) − b_j),
where g = (g_1, g_2, . . . , g_m), b = (b_1, b_2, . . . , b_m), and λ = (λ_1, λ_2, . . . , λ_m) ∈ R^m. Each λ_i, for i = 1, . . . , m, is called a Lagrange multiplier. A necessary condition for optimality, i.e., maximization (or minimization), is ∇_x L(x, λ) = 0; that is, for all i = 1, . . . , n,
∂L/∂x_i (x, λ) = ∂_i f(x) − Σ_{j=1}^m λ_j ∂_i g_j(x) = 0.
In other words, we need to find a pair (x*, λ*) such that ∇_x L(x*, λ*) = 0, i.e., ∂_i L(x*, λ*) = 0 for all i = 1, . . . , n, and then check that x* is indeed optimal.

Remark 9.2. Since
− max_x {f(x)} = min_x {−f(x)},
any minimization problem can be turned into a maximization problem and vice versa. So, for simplicity, we will often just talk about the maximization case.

Is there a way to understand λ? Observe that we can construct the (optimal) value function f* from our data as follows:
f*(b) = max_x {f(x) : g(x) = b}.
For a fixed b ∈ R^m, let x* = x*(b) be a vector that maximizes f subject to g(x) = b. Then, f(x*) = f*(b).
Obviously, for all x œ Rn , f (x) Æ f ú (g(x)). m ú ú 26 ANALYSIS AND OPTIMIZATION Hence, if we define the function Ï : Rn æ R by Ï(x) = f (x) ≠ f ú (g(x)), then Ï is maximized at xú . So, assuming that f ú is differentiable, 0 = ÒÏ(xú ) = Òf (xú ) ≠ Òf ú (g(xú ))Dg(xú ) = Òf (xú ) ≠ Òf ú (b)Dg(xú ). Recall we fixed b œ Rm and xú = xú (b). In other words, for all i = 1, . . . , n, 0 = ˆi f (xú ) ≠ If we set m ÿ ˆj f ú (b)ˆi gj (xú ). j=1 ⁄új = ⁄új (b) = ˆj f ú (b), then our equation ˆi L(xú , ⁄ú ) = 0 for all i = 1, . . . , n is exactly what we have. For example, consider the optimization problem max f (x, y, z) with f (x, y, z) = x + 2z subject to 7 . 4 Here f : R3 æ R, g = (g1 , g2 ) = (x + y + z, x2 + y 2 + z) : R3 æ R2 , and b = (1, 7/4). (So m = 2 and n = 3.) x + y + z = 1 and x2 + y 2 + z = Remark 9.3. We can always make b = 0 by simply considering g(x) ≠ b instead of g(x). This won’t affect anything as the optimality condition is a condition derivatives, which don’t see the addition or subtraction of constants. So we can reformulate this problem as max f (x, y, z) with f (x, y, z) = x + 2z subject to 7 =0 4 Here f : R3 æ R, g = (g1 , g2 ) = (x + y + z ≠ 1, x2 + y 2 + z ≠ 74 ) : R3 æ R2 , and b = (0, 0). Now let’s construct the Lagrangian: 3 4 7 2 2 L(x, y, z, ⁄) = x + 2z ≠ ⁄1 (x + y + z ≠ 1) ≠ ⁄2 x + y + z ≠ . 4 x + y + z ≠ 1 = 0 and x2 + y 2 + z ≠ The optimality conditions are (1) ˆx L(x, y, z, ⁄) = 1 ≠ ⁄1 ≠ 2⁄2 x = 0 (2) ˆy L(x, y, z, ⁄) = ≠⁄1 ≠ 2⁄2 y = 0 subject to (3) ˆz L(x, y, z, ⁄) = 2 ≠ ⁄1 ≠ ⁄2 = 0 (–) x + y + z ≠ 1 = 0 and (—) x2 + y 2 + z ≠ Combing this information, we find that 7 = 0. 4 (xú , y ú , z ú ) = (0, ≠1/2, 3/2) or (xú , y ú , z ú ) = (1, 3/2, ≠3/2) ANALYSIS AND OPTIMIZATION 27 and ⁄ 1 = 1 = ⁄2 . In particular, (3) implies that ⁄2 = 2 ≠ ⁄1 . Putting this into (2), we find that ⁄1 (2y ≠ 1) = 4y. So y ”= 1/2 and ⁄1 = 4y/(2y ≠ 1). Let’s call this relation between ⁄1 and y (4). Using (1), (2), and (4), we find that (5) y = 2x ≠ 1/2. Plugging (5) into (–) and (—), we see that 5x(x ≠ 1) = 0, so that x = 0 or x = 1. From here, we find our possibilities for (xú , y ú , z ú , ⁄ú ). Computing f at these points, we see that f (0, ≠1/2, 3/2) = 3 and f (1, 3/2, ≠3/2) = ≠2. Thus, the only candidate is (0, ≠1/2, 3/2). But we aren’t done. We need to check how our Lagrange multipliers affects our Lagrangian. If we input ⁄1 = 1 = ⁄2 into our Lagrangian, we find that 11 L(x, y, z, 1, 1) = ≠x2 ≠ y 2 ≠ y + , 4 which is concave, as a function of (x, y, z). So (0, ≠1/2, 3/2) is indeed a maximizer. Theorem 9.4. Let (xú , ⁄ú ) be such that Òx L(xú , ⁄ú ) = 0. If L(x, ⁄ú ) is concave (convex) as a function of x, then xú solves the maximization (minimization) problem. Why is this? If L(x) = L(x, ⁄ú ) is concave and xú is a stationary point, then L(xú ) Ø L(x) for all x œ Rn . (Recall what we have learned about concave functions.) Then, using the constraint, g(x) = b = g(xú ), we see that f (xú ) Ø f (x) for all x œ Rn such that g(x) = b. In an earlier lecture, we considered optimization problems in all of Rn of the form maximize f , minimize f , or find and classify all stationary point of f . These are unconstrained problems. The Lagrange multiplier method allows us to tackle optimization problem over subsets of Rn . We will work over convex subset of Rn . For example, consider f (x, y) = 2x2 + 3y 2 ≠ 4x ≠ 5 and S = {(x, y) œ R2 : x2 + y 2 Æ 16}. What are the extrema of f subject to (x, y) œ S? In other words, find max f (z) and min f (z) zœS zœS with z = (x, y). 
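The two-step analysis that follows solves this by hand. As a purely numerical cross-check, here is a sketch using scipy (not part of the original notes; the starting points and solver choice are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda z: 2 * z[0]**2 + 3 * z[1]**2 - 4 * z[0] - 5
# The admissible set S is the disk x^2 + y^2 <= 16 ('ineq' means the value must be >= 0).
disk = {'type': 'ineq', 'fun': lambda z: 16.0 - z[0]**2 - z[1]**2}

# Minimize f and -f from several feasible starting points; keep the best values found.
starts = [np.array([3.0, 2.0]), np.array([-3.0, 2.0]),
          np.array([-3.0, -2.0]), np.array([1.0, 0.0])]
fmin = min(minimize(f, s, constraints=[disk], method='SLSQP').fun for s in starts)
fmax = -min(minimize(lambda z: -f(z), s, constraints=[disk], method='SLSQP').fun
            for s in starts)
print(round(fmin, 4), round(fmax, 4))  # expected: about -7 and 47
```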
To find these extrema, we have to take two steps. First, we find all the stationary points of f . That is, the points (x, y) such that 0 = Òf (x, y) = (4x ≠ 4, 6y). Thus, the only stationary point is (1, 0). Now we check whether or not this point is in S. Well, 12 + 02 = 1 Æ 16. So (1, 0) œ S. If it were not in S, we would discard it. To continue, we try to classify this stationary point. We compute 3 4 3 4 4 0 4 0 D2 f (x, y) = , so that D2 f (1, 0) = . 0 6 0 6 In turn, (1, 0) is a local minimum (D2 f (1, 0) is positive definite). Moreover, f (1, 0) = ≠7. Second, we have to understand what happens along the boundary of S, the set {(x, y) œ R2 : x2 + y 2 = 16}. This is an equality constraint, and so we introduce our Lagrangian, L(x, y, ⁄) = f (x, y) ≠ ⁄(x2 + y 2 ≠ 16). 28 ANALYSIS AND OPTIMIZATION As before, we need to understand when subject to Òx,y L(x, y, ⁄) = (4x ≠ 4 ≠ 2⁄x, 6y ≠ 2⁄y) = (0, 0) x2 + y 2 ≠ 16 = 0. From the second equation, we see that y = 0 or ⁄ =Ô 3. When ⁄ = 3, we see that x = ≠2. Then, y 2 = 16 ≠ 4 = 12, so that y = ±2 3. When y = 0, from our constraint equation, we see that x = ±4. Thus, we have four more cases to consider: Ô f (≠2, ±2 3) = 47, f (4, 0) = 11, and f (≠4, 0) = 43. So, considering Steps 1 and 2 together, the max of f over S is 47, and the min of f over S is ≠7. ANALYSIS AND OPTIMIZATION 29 10. Lecture 10: Constrained Optimization Cont’d 10.1. Inequality Constraints. Let us consider the general optimization problem max f (x) x subject to g(x) Æ b. Here f : R æ R, g : R æ R , and b œ Rm are all given. Also, n n m g(x) Æ b … gj (x) Æ bj for all j = 1, . . . , m. Remark 10.1. Unlike the equality constraints case, we do not typically make the assumption that m < n. In fact, it could be that m > n. Just as a minimization problem can be turned into a maximization problem (cf. last lecture), inequality constraints of the form gj (x) Ø bj can be turned into the form we wrote above, gj (x) Æ bj , by replacing gj with ≠gj and bj with ≠bj for which ever j we need to get the whole vector inequality. In particular, gj (x) Ø bj … ≠gj (x) Æ ≠bj . Definition 10.2. The set {x œ Rn : g(x) Æ b} is called the admissible or feasible set. Like before we construct a Lagrangian L : Rn ◊ Rm = Rn+m æ R defined by L(x, ⁄) = f (x) ≠ ⁄ · (g(x) ≠ b) where g = (g1 , g2 , . . . , gm ), b = (b1 , b2 , . . . , bm ), and ⁄ = (⁄1 , ⁄2 , . . . , ⁄m ) œ Rm . The vector ⁄ like before is called a vector of Lagrange multipliers. Again, we have an optimality condition Òx L(x, ⁄) = Òf (x) ≠ ⁄Dg(x) = 0; that is, for all i = 1, . . . , n, m ÿ ˆL (x, ⁄) = ˆi f (x) ≠ ⁄j ˆi gj (x) = 0. ˆxi j=1 The new feature here is a set of conditions called the complementary slackness conditions ⁄j Ø 0 with ⁄j = 0 if gj (x) < bj for any j = 1, . . . , m. In other words, ⁄ Ø 0 and ⁄ · (g(x) ≠ b) = 0. When gj (x) = bj , we say that this constraint is active. Why do we call these complementary slackness conditions? Because at most one of the two, for each j = 1, . . . , m, inequalities gj (x) Æ bj or ⁄j Ø 0 can be strict. Equivalently, at least one must be an equality. The optimality condition and the complementary slackness conditions together are called the Kuhn–Tucker conditions. Let’s analyze these conditions a bit by studying the (optimal) value function f ú associated to our problem: f ú (b) = max{f (x) : g(x) Æ b}. x Let’s assume that f ú is differentiable. 30 ANALYSIS AND OPTIMIZATION First, notice that f ú is nondecreasing in each argument bj . Why? 
Because if we increase bj while keeping bi for all i ”= j fixed, the admissible set increases. So f ú cannot decrease. For a fixed b œ Rm , let xú = xú (b) be a vector at which f is maximized, i.e., f ú (b) = f (xú ). Clearly, for any x œ Rn , f (x) Æ f ú (g(x)), because x satisfies the constraint when we replace b with g(x), i.e., g(x) Æ g(x) always. Then, since g(xú ) Æ b and f ú is nondecreasing, f ú (g(x)) Æ f ú (g(x) + b ≠ g(xú )). So if we define Ï : Rn æ R by we find that Ï(x) = f (x) ≠ f ú (g(x) + b ≠ g(xú )), Ï(x) Æ 0 for all x œ Rn . As Ï(xú ) = 0, xú is a maximum of Ï. So And if we set 0 = ÒÏ(xú ) = Òf (xú ) ≠ Òf ú (b)Dg(xú ). ⁄ú = ⁄ú (b) = Òf ú (b), we find our optimality condition. Now let’s look at our complementary slackness conditions again. First, since f ú is nondecreasing in each argument, ˆj f ú (b) Ø 0 for all b œ Rm . So as desired. Second, let’s prove that ⁄új Ø 0, gj (xú ) < bj for some j = 1, . . . , m ∆ ⁄új = 0. By the continuity of gj , if gj (xú ) < bj , then gj (x) < bj in a neighborhood of xú . So that since f ú is nondecreasing, f ú (g(xú )) Æ f ú (bÕ ) Æ f ú (b) where bÕ = (b1 , . . . , bj≠1 , bÕj , bj≠1 , . . . , bm ) and bÕj œ (gj (xú ), bj ). But xú being a maximizer implies that f ú (g(xú )) = f ú (b). So f ú is constant along this interval. Hence, ˆj f ú (b) = 0. In turn, ⁄új = 0. Theorem 10.3. Suppose that xú solves our inequality constraints problem. Let the constraint qualification CQ be satisfied: CQ: The vectors Ògj (xú ) for the j such that gj (xú ) = bj are linearly independent. Then, there is a unique ⁄ú such that the Kuhn–Tucker conditions holds at (xú , ⁄ú ). How do we solve our problem in general? Step 1: Find all the points (xú , ⁄ú ) at which the KT conditions hold. Step 2: Find all the points xú at which the CQ fails. Step 3: Check among your choices which are admissible, and then from those determine which is the maximizer. To gather the constraint qualification points, if any, you should do the following. ANALYSIS AND OPTIMIZATION 31 1. Gather all your constraint functions. Let’s say we have two: gi = gi (x, y) for i = 1, 2. 2. Take their gradients, to get a set of vectors, one vector for each constraint. These vectors may be variable, i.e., depend on x and y in this case. 3. Find all the points at which the constraints are active AND at which the gradients of the constraints are linearly dependent: (i). Find all pairs (a, b) such that when you plug (a, b) into g1 and g2 , they are active. Check if, at any of these (a, b), the gradients of g1 and g2 , as a set of two vectors, are linearly dependent. If so, keep these (a, b). You can do things in the reverse order as well: Find all pairs (a, b) at which the gradients of g1 and g2 , as a set of two vectors, are linearly dependent. Keep those (a, b) that when plugged into g1 and g2 both g1 and g2 are active. (ii). Find all pairs (a, b) such that when you plug (a, b) into g1 it is active. If the gradient of g1 at any of these (a, b) is the zero vector, then you have to keep it. A single vector is linearly dependent if and only if it is the 0 vector. So, in this case, the only (a, b) you can possibly add are those at which the gradient of g1 is the zero vector. (iii). Redo (ii) with g2 instead of g1 . NOTE: If you have more than two constraints, you’ll have more cases to consider. You’ll have to check all the constraints together, like in 1. You’ll have to check each constraint on its own, like in 2. and 3. And you’ll have to check all pairs, and triplets, etc. 
(For the three constraint case, you have each of the three to consider individually, all pairs to consider, and the three together to consider.) For example, consider subject to max f (x, y) with f (x, y) = ≠(x ≠ 2)2 ≠ 2(y ≠ 1)2 g1 (x, y) = x + 4y Æ 3 and g2 (x, y) = ≠x + y Æ 0. Our Lagrangian is L(x, y, ⁄) = f (x, y) ≠ ⁄1 (g1 (x, y) ≠ 3) ≠ ⁄2 g2 (x, y). Let’s check the KT conditions: ˆx L(x, y, ⁄) = ≠2(x ≠ 2) ≠ ⁄1 + ⁄2 = 0 and ˆy L(x, y, ⁄) = ≠4(y ≠ 1) ≠ 4⁄1 ≠ ⁄2 = 0 subject to ⁄1 (x + 4y ≠ 3) = 0 and ⁄2 (y ≠ x) = 0 with ⁄1 , ⁄2 Ø 0. We need to consider 4 cases: 1. when both constraints are inactive; 2. when one is active and the other is inactive; 3. the reverse of 2.; and 4. when both constraints are active. 1. If ⁄1 = ⁄2 = 0, then (x, y) = (2, 1). 2. If ⁄1 = 0, ⁄2 ”= 0, then (x, y) = (4/3, 4/3) and ⁄2 = ≠4/3. 3. If ⁄1 ”= 0, ⁄2 = 0, then (x, y) = (5/3, 1/3) and ⁄1 = 2/3. 4. If ⁄1 ”= 0, ⁄2 ”= 0, then (x, y) = (3/5, 3/5) and (⁄1 , ⁄2 ) = (22/25, ≠48/25). Now let’s check when the CQ fail. Observe that Òg1 (x, y) = (1, 4) and Òg2 (x, y) = (≠1, 1). These are linearly independent. So the CQ always holds, and there are no additional points to consider. First, (2, 1) does not satisfy the first constraint. So it is inadmissible. Throw it away. Similarly, (4/3, 4/3) does not satisfy the first constraint. So it is inadmissible. 32 ANALYSIS AND OPTIMIZATION Moreover, ⁄2 < 0 in this case, which violates our complementary slackness conditions. Throw (4/3, 4/3) away too. Third, while (3/5, 3/5) satisfies the constraints, ⁄2 < 0 in this case. So we throw it away. Fourth, (5/3, 1/3) is admissible and ⁄ú in this case is nonnegative. So we keep it. Since L(x, y, ⁄ú ) in this fourth case is concave, we know that (5/3, 1/3) is a maximizer (by the following theorem). Theorem 10.4. Let (xú , ⁄ú ) satisfy the KT conditions. If xú is admissible and L(x, ⁄ú ) is concave as a function of x, then xú solves the maximization problem. Proof. If L(x, ⁄ú ) is concave and Òx L(xú , ⁄ú ) = 0, then xú maximizes L(x, ⁄ú ). This implies that f (xú ) ≠ ⁄ú · (g(xú ) ≠ b) Ø f (x) ≠ ⁄ú · (g(x) ≠ b) for all admissible x. Rearranging, we find that f (xú ) ≠ f (x) Ø ⁄ú · (g(xú ) ≠ g(x)) for all admissible x. Now if gj (xú ) < bj , then ⁄új = 0, by our complementary slackness conditions. On the other hand, if gj (xú ) = bj , then ⁄új · (gj (xú ) ≠ gj (x)) = ⁄új · (bj ≠ gj (x)) Ø 0 since x is admissible and ⁄j Ø 0. So ⁄ú · (g(xú ) ≠ g(x)) Ø 0 if x is admissible. Using this information above, f (xú ) Ø f (x) for all admissible x. So xú is indeed our solution. ⇤ For example, let’s reconsider (cf. last lecture) subject to max f (x, y) with f (x, y) = 2x2 + 3y 2 ≠ 4x ≠ 5 x2 + y 2 Æ 16. This time, let’s use the KT conditions to find our solution. Our Lagrangian is L(x, y, ⁄) = 2x2 + 3y 2 ≠ 4x ≠ 5 ≠ ⁄(x2 + y 2 ≠ 16). So our KT conditions are 0 = ˆx L(x, y, ⁄) = 4x ≠ 4 ≠ 2⁄x 0 = ˆy L(x, y, ⁄) = 6y ≠ 2⁄y 0Æ⁄ 0 = ⁄(x2 + y 2 ≠ 16). If our constraint is active, then ⁄ ”= 0. In this case, x2 + y 2 = 16. From our second equation, we see that either y = 0 or ⁄ = 3. If y = 0, then x = ±4, by our constraint. Also, ⁄ = 3/2 and ⁄ = 5/2, when x = 4 and x = ≠4 respectively. Ô If ⁄ = 3, then using our first equation, we find that x = ≠2, and then y = ±2 3, by our constraint. If our constraint is inactive, then ⁄ = 0. So x = 1 and y = 0. Now we check when the CQ fail. Note that the gradient of our constraint is (2x, 2y). The only time (2x, 2y) is linearly dependent is if it is the 0 vector, i.e, when (x, y) = (0, 0). 
But (0, 0) does not activate our constraint. So we don’t consider it. Finally, when ⁄ú = 0, 3/2, L(x, y, ⁄ú ) is convex. When ⁄ú = 5/2, L(x, y, ⁄ú ) is neither convex nor concave. On the other hand, when ⁄ú = 3, L(x, y, ⁄ú ) is concave. ANALYSIS AND OPTIMIZATION 33 How do we handle this scenario? We can only apply our theorem in the last case (if L(x, y, ⁄ú ) is concave then xú is a maximizer, provided (xú , ⁄ú ) satisfies the KT conditions). To deal with this, we note that f is continuous and the admissible region is closed and bounded. So by the Extreme Value Theorem, a maximizer exists in this region. Thus, while the case that ⁄ = 3 is one in which we have a maximizer (by our theorem), we can’t ignore the other points. We have to test them all by checking they are admissible, their corresponding ⁄úÔØ 0, and finally plugging them into f . We’ll find, like in last lecture, that (≠2, ±2 3) are maximizers. Theorem 10.5. Let f : A µ Rn æ R be continuous. If A is closed and bounded, then f achieves its maximum and minimum in A. Definition 10.6. A subset A µ Rn is closed if it contains all of its limit points. Definition 10.7. A subset A µ Rn is open if for every point x œ A, we can find a radius r > 0 such that ball of radius r centered at x is in A, i.e., {z : Îz≠xÎ < r} µ A. In particular, the complement of an open set is closed and visa-versa, where the complement of A, denoted by Ac , is equal to the set of points not in / outside of A. Definition 10.8. A subset A µ Rn is bounded if for every x œ A, the length of x is bounded by some constant: for every x œ A, ÎxÎ Æ C for some finite C Ø 0. For example, 1. (0, 1) µ R is open and bounded 2. [0, 1] µ R is closed and bounded. 3. (0, 1] µ R is neither closed nor open, but it is bounded. 34 ANALYSIS AND OPTIMIZATION 11. Lecture 11: Constrained Optimization Cont’d and ODEs 11.1. Mixed Constraints. Let us consider the general optimization problem max f (x) x subject to g(x) = b and h(x) Æ c Here f : R æ R, g : R æ R , h : Rn æ Rk , b œ Rm , and c œ Rk are all given. In this case, we define the Lagrangian as follows: L : Rn ◊ Rm ◊ Rk = Rn+m+k æ R defined by L(x, ⁄, µ) = f (x) ≠ ⁄ · (g(x) ≠ b) ≠ µ · (h(x) ≠ c). n n m Theorem 11.1. Suppose xú solves our mixed constraints problem. Suppose that CQ holds for g and h. Then, there are unique vectors ⁄ú and µú such that Òx L(xú , ⁄ú , µú ) = 0, µú Ø 0, and µú · (h(xú ) ≠ c) = 0. Theorem 11.2. Suppose (xú , ⁄ú , µú ) satisfy the KT conditions and xú is admissible. If L(x, ⁄ú , µú ) is concave, as a function of x, then xú solves the maximization problem. Remark 11.3. The equality constraint has to be considered in every case when you are checking for CQ failure, because as an equality constraint, it is always active. Let’s do an example. Consider the maximization problem max f (x, y, z) with f (x, y, z) = x2 + y 2 + z 2 subject to g(x, y, z) = x + y + z = 0 and h(x, y, z) = x2 + y 2 + 3z 2 Æ 9. In this case, our Lagrangian is L(x, y, z, ⁄, µ) = x2 + y 2 + z 2 ≠ ⁄(x + y + z) ≠ µ(x2 + y 2 + 3z 2 ≠ 9). The KT conditions are 0 = ˆx L(x, y, z, ⁄, µ) = 2x ≠ ⁄ ≠ 2µx 0 = ˆy L(x, y, z, ⁄, µ) = 2y ≠ ⁄ ≠ 2µy 0 = ˆy L(x, y, z, ⁄, µ) = 2z ≠ ⁄ ≠ 6µz 0Ƶ 0 = µ(x2 + y 2 + 3z 2 ≠ 9) 0 = x + y + z. Note that Òg(x, y, z) = (1, 1, 1) and Òh(x, y, z) = (2x, 2y, 6z). The points at which Òg(x, y, z) and Òh(x, y, z) are linearly dependent are (a, a, a/3) for any a œ R. But since (a, a, a/3) cannot activate both g and h, we do not add any of them as a CQ failure point. 
To continue with the CQ, we have to consider the cases when only one constraint is active. Since an equality constraint is always active, the only choice we have is to consider it alone. Since Òg(x, y, z) = (1, 1, 1) is linearly independent, CQ always holds in this case. So, again, there are no CQ failure points to add. ANALYSIS AND OPTIMIZATION 35 Since the admissible region is closed and bounded, by the EVT, we look for our solution among the points which satisfy the KT conditions (the CQ, as we showed, does not fail). Notice that ⁄ is always active, as it is an equality constraint. If µ is inactive, µ = 0. In this case, by our optimality conditions, x = y = z = ⁄/2. By our equality constraint g, we then see that ⁄ = 0, which implies that x = y = z = 0. If µ is active, i.e, µ ”= 0, then, from our first two optimality conditions (subtract them), we find that 2(x ≠ y)(1 ≠ µ) = 0. So either µ = 1 or x = y. If x = y, then, by ourÔequality constraint, Ô ≠2x = z. Hence, by our inequality constraint, x = y = ±3/ 14 and z = û6/ 14. If µ = 1, then, by our first optimality condition, ⁄ = 0. Thus, z = 0, considering our Ô third optimalityÔcondition. Inputting this in our constraints, we see that x = ±3/ 2 and y = û3/ 2. In summary, we have the following points to consider: Ô Ô Ô Ô Ô (0, 0, 0), (±3/ 2, û3/ 2, 0), and (±3/ 14, ±3/ 14, û6/ 14). Ô Ô Plugging these into f , we find that (±3/ 2, û3/ 2, 0), which are admissible, both maximize f , and max f subject to g = 0 and h Æ 9 is equal to 9. As a final example, let’s throw away the inequality constraint in the last example, and consider max f (x, y, z) with f (x, y, z) = x2 + y 2 + z 2 subject to g(x, y, z) = x + y + z = 0. Note that the set of points that satisfy g = 0 is a plane, which is unbounded. Since f (x, y, z) æ +Œ as Î(x, y, z)Î æ Œ, we see that the max of f subject to g is plus infinity. 11.2. ODEs. What is an ordinary differential equation (ODE)? It is an equation where you consider relations between a function of one variable and its derivatives. In other words, here, the unknown is a function (of one variable), and not a number (or vector). And the equation includes one or more derivatives of the unknown. In general, an ODE looks like F (t, x, x(1) , . . . , x(n) ) = 0 for some given function F : Rn+2 æ R. Here t œ R and x = x(t) : R æ R. The function x is the unknown / solution to our ODE. We use dx d2 x (t) = xÕ (t) = ẋ(t) = x(1) (t), 2 (t) = xÕÕ (t) = ẍ(t) = x(2) (t), etc. dt dt to denote the derivatives of x. For example, for a, b œ R and f : R æ R, 1. 2. 3. 4. 5. ẋ = ax ẋ + ax = b ẋ + ax = bx2 ẋ = x + t ẋ = af (x) + bx 36 ANALYSIS AND OPTIMIZATION are all ODEs. We often use t to represent the variable on which x depends, and call it time, x is often called space. Definition 11.4. The graph (t, x(t)) of a solution to an ODE is called an integral curve or solution curve. Let’s consider the specific example, ẋ = x + t. In this case, we can use the general notation for an ODE as follows. Let F : R3 æ R be defined by F (t, x, ẋ) = ẋ ≠ x ≠ t. Then, F (t, x, ẋ) = 0 is equivalent to ẋ = x + t. Observe that x1 (t) = ≠t ≠ 1 and x2 (t) = et ≠ t ≠ 1 are both solutions to our ODE. Indeed, and Also, ẋ1 = ≠1 = (≠t ≠ 1) + t = x1 + t ẋ2 = et ≠ 1 = (et ≠ t ≠ 1) + t = x2 + t. x3 (t) = Cet ≠ t ≠ 1 for any C œ R is a solution. In general, an ODE may have many solutions, typically, infinitely many. Of course, if we add a constraint to our ODE, we might find that only one solution exits (or maybe no solution exists). 
For instance, if we restrict our solution curve to go through the point (t, x) = (0, 1), then C must be 2 (in the expression for x3 ). So this constraint forces us to have a unique solution. In other words, uniquely solves x(t) = 2et ≠ t ≠ 1 ẋ = x + t with x(0) = 1. ANALYSIS AND OPTIMIZATION 37 12. Lecture 12: ODEs Cont’d Definition 12.1. The general form of a solution to a given ODE is called a general solution. While a specific solution is called a particular solution. In our example from last lecture, Cet ≠ t ≠ 1 for any C œ R is the general solution, while x1 (t) = ≠t ≠ 1 and x2 (t) = et ≠ t ≠ 1 are particular solutions to ẋ = x + t. If we only look to solve our ODE on some time interval, e.g., (t0 , T ), then we call t0 the initial time and the problem I F (t, x, x(1) , . . . , x(n) ) = 0 in (t0 , T ) x(t0 ) = x0 the initial value problem on (t0 , T ) with initial condition x(t0 ) = x0 . The endpoint T can be plus infinity. A first order ODE is an ODE of the form F (t, x, ẋ) = 0, i.e. the minimal and maximal number of derivatives taken is 1. The highest order of an ODE is the maximal number of derivatives which shows up. The lower order terms are those which depend only on no or fewer derivatives of the solution. For example, in the general setting F (t, x, x(1) , . . . , x(n) ) = 0, the highest order is n. Anything that involves strictly fewer than n derivatives is lower order. A first order ODE is called separable if the lower order terms can be written as the product of two functions, one depending on space only and the other on time only: F (t, x, ẋ) = ẋ ≠ f (t)g(x) for some given f, g : R æ R. How do we solve a first order separable ODE? Step 1: Rewrite the equation: dx = f (t)g(x). dt Step 2: Formally, gather like variables: dx = f (t) dt. g(x) Step 3: Integrate: ˆ dx = f (t) dt. g(x) Step 4: Evaluate the integrals (if possible). Step 5: Solve for x (if possible). In this step, we may need to leave our expression as one that is implicit in x. ˆ 38 ANALYSIS AND OPTIMIZATION Step 6: Note that every constant function x(t) © a for any a œ R such that g(a) = 0 is a solution. Incorporate these constant solutions into your general solution if not already there. For example, let’s solve the IVP ≠t ẋ = with x(0) = 1. x≠3 Step 1: Rewrite the equation: dx ≠t = . dt x≠3 Step 2: Formally, gather like variables: (x ≠ 3)dx = ≠tdt. Step 3: Integrate: Step 4: ˆ (x ≠ 3) dx = ˆ ≠t dt. x2 t2 ≠ 3x = ≠ + C. 2 2 Step 5: Solve for x (if possible): (x ≠ 3)2 = 9 ≠ t2 + 2C ∆ x = 3 ± (9 ≠ t2 + 2C)1/2 . (This would be our general solution, after we add in the possible constant solutions.) Step 5’: Use the initial conditions: 5 x(0) = 1 ∆ C = ≠ . 2 But, 3 + (9 ≠ 02 ≠ 5)1/2 = 3 + 2 = 5 ”= 1. And 3 ≠ (9 ≠ 02 ≠ 5)1/2 = 3 ≠ 2 = 1. So, we see our solution is x = 3 ≠ (4 ≠ t2 )1/2 for t œ (≠2, 2). Step 6: There are no constant solutions in this problem because 1/(x ≠ 3) = 0 if and only if x = ±Œ, and we don’t permit x to take the value plus or minus infinity where it solves the ODE. In particular, for this initial condition, our solution only exists for a finite time interval t œ (≠2, 2). Let’s do another example: solve ẋ = 2t(x ≠ 5). Steps 1 - 4: We have ˆ ˆ dx = 2t dt. x≠5 So ln |x ≠ 5| = t2 + C; that is, 2 2 |x ≠ 5| = et +C = Aet with A = eC > 0. Thus, 2 x = 5 ± Aet with A > 0. ANALYSIS AND OPTIMIZATION 39 Step 6: We add the constant solution x(t) © 5, to find the general solution 2 x = 5 + Aet with A œ R. 
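The two separable examples above can also be checked symbolically. The following is a minimal sketch using sympy (assuming the library is available; checkodesol simply substitutes the claimed solution into the ODE and verifies that the residual simplifies to zero).

\begin{verbatim}
import sympy as sp

t, A = sp.symbols('t A')
x = sp.Function('x')

# Example 1: x' = -t/(x - 3) with x(0) = 1; claimed solution x = 3 - sqrt(4 - t^2).
ode1 = sp.Eq(x(t).diff(t), -t / (x(t) - 3))
sol1 = sp.Eq(x(t), 3 - sp.sqrt(4 - t**2))
print(sp.checkodesol(ode1, sol1))   # expect (True, 0): the ODE is satisfied
print(sol1.rhs.subs(t, 0))          # expect 1: the initial condition holds

# Example 2: x' = 2t(x - 5); claimed general solution x = 5 + A*exp(t^2).
ode2 = sp.Eq(x(t).diff(t), 2*t*(x(t) - 5))
sol2 = sp.Eq(x(t), 5 + A*sp.exp(t**2))
print(sp.checkodesol(ode2, sol2))   # expect (True, 0) for every value of A
\end{verbatim}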
A first order linear ODE is of the form
\[ \dot x + a(t)x = b(t), \]
or $F(t,x,\dot x) = 0$ with $F(t,x,\dot x) = \dot x + a(t)x - b(t)$. It is called linear because the expression $\dot x + a(t)x$ is linear in $x$. How do we solve this type of ODE? Well, let's make an observation:
\[ \dot x + a(t)x = b(t) \iff e^{A(t)}\dot x + e^{A(t)}a(t)x = e^{A(t)}b(t) \]
for any $A = A(t) : \mathbb R \to \mathbb R$, since $e^{A(t)} > 0$. Now if $A$ is an antiderivative of $a$,
\[ A(t) = \int a(t)\,dt, \]
we see that $e^{A(t)}\dot x + e^{A(t)}a(t)x = e^{A(t)}b(t)$ is equivalent to
\[ \frac{d}{dt}\bigl(xe^{A(t)}\bigr) = b(t)e^{A(t)}. \]
Indeed,
\[ \frac{d}{dt}\bigl(xe^{A(t)}\bigr) = \dot x e^{A(t)} + x\dot A(t)e^{A(t)} = \dot x e^{A(t)} + xa(t)e^{A(t)}. \]
So
\[ xe^{A(t)} = \int b(t)e^{A(t)}\,dt + C, \quad\text{i.e.,}\quad x = e^{-A(t)}\int b(t)e^{A(t)}\,dt + Ce^{-A(t)} \quad\text{with } A(t) = \int a(t)\,dt. \]
In summary,
\[ \dot x + a(t)x = b(t) \iff x = e^{-\int a(t)\,dt}\Bigl(C + \int b(t)e^{\int a(t)\,dt}\,dt\Bigr). \]
The function $e^{\int a(t)\,dt}$ is called an integrating factor. If we want to incorporate an initial condition $x(t_0) = x_0$, note that our solution can be rewritten as
\[ x(t) = Ce^{-A(t)} + e^{-A(t)}G(t), \]
where $G(t)$ is the indefinite integral $G(t) = \int b(t)e^{A(t)}\,dt$. Thus,
\[ C = x(t_0)e^{A(t_0)} - G(t_0). \]
Note that
\[ A(t) - A(t_0) = \int_{t_0}^t a(r)\,dr. \]
And so,
\[ x(t) = x(t_0)e^{-(A(t)-A(t_0))} + e^{-A(t)}\bigl(G(t) - G(t_0)\bigr). \]
Since $G(t) - G(t_0) = \int_{t_0}^t b(s)e^{A(s)}\,ds$, we find that
\[ e^{-A(t)}\bigl(G(t) - G(t_0)\bigr) = e^{-A(t)}\int_{t_0}^t b(s)e^{A(s)}\,ds = \int_{t_0}^t b(s)e^{-(A(t)-A(s))}\,ds. \]
In other words,
\[ \begin{cases} \dot x + a(t)x = b(t) \\ x(t_0) = x_0 \end{cases} \iff x = x_0\, e^{-\int_{t_0}^t a(r)\,dr} + \int_{t_0}^t b(s)\,e^{-\int_s^t a(r)\,dr}\,ds. \]
For example, consider the IVP
\[ t^2\dot x + tx = 1 \ \text{ in } (1,\infty) \quad\text{with } x(1) = 2. \]
We start by rewriting our equation:
\[ \dot x + \frac{x}{t} = \frac{1}{t^2}. \]
Now we define the integrating factor
\[ e^{A(t)} = e^{\int \frac{dt}{t}} = e^{\ln|t|} = t, \]
since $t > 0$. (Here we choose $C = 0$ when computing the indefinite integral $\int \frac{dt}{t}$; we have this freedom.) Thus,
\[ x(t) = \frac{1}{e^{A(t)}}\Bigl(\int b(t)e^{A(t)}\,dt + C\Bigr) = \frac{1}{t}\Bigl(\int \frac{dt}{t} + C\Bigr) = \frac{1}{t}\bigl(\ln|t| + C\bigr) \]
for some $C \in \mathbb R$ that we need to determine. Since $x(1) = 2$, we find that $C = 2$, and (since $t > 0$) our solution is
\[ x(t) = \frac{\ln t + 2}{t}. \]
Let's go back to the first ODE we analyzed: $\dot x = x + t$. Our analysis gave us an infinite family of solutions, but we could not conclude that we had found everything. Now we can. In this case, $a(t) \equiv -1$ and $b(t) = t$. So
\[ xe^{A(t)} = \int te^{A(t)}\,dt + C \quad\text{with } A(t) = \int (-1)\,dt = -t. \]
Thus, integrating by parts,
\[ xe^{-t} = \int te^{-t}\,dt + C = -te^{-t} + \int e^{-t}\,dt + C. \]
In conclusion, $x = -t - 1 + Ce^{t}$ for any $C \in \mathbb R$, as desired.
A more general first order ODE we might encounter looks like
\[ g(t,x)\dot x + f(t,x) = 0, \]
where $f$ and $g$ are $C^1$ functions of two variables. This is an example of a nonlinear ODE. In a specific case, this ODE is rather easily solved. Let's see that case now.
Suppose that a function $h$ exists such that $h(t,x) = C$. Then, differentiating in $t$,
\[ \frac{d}{dt}h(t,x) = \partial_t h(t,x) + \partial_x h(t,x)\dot x = 0. \]
So if $\partial_t h = f$ and $\partial_x h = g$, then we'd be in business.
Theorem 12.2. Let $f, g : \mathbb R^2 \to \mathbb R$ be $C^1$. An $h : \mathbb R^2 \to \mathbb R$ exists such that $\partial_t h = f$ and $\partial_x h = g$ if and only if $\partial_x f = \partial_t g$. In that case,
\[ h(t,x) = \int_{t_0}^t f(s,x)\,ds + \int_{x_0}^x g(t_0,y)\,dy \quad\text{with } x(t_0) = x_0. \]
If we have an ODE of the form $g(t,x)\dot x + f(t,x) = 0$ and $\partial_x f = \partial_t g$, we call the ODE exact. For example, let's consider the ODE
\[ 1 + t^2 x\dot x + tx^2 = 0 \ \text{ in } (0,T). \]
Then, letting $g(t,x) = t^2 x$ and $f(t,x) = tx^2 + 1$, we find that $\partial_t g = 2tx = \partial_x f$. So this ODE is exact. In turn, our solutions $x$ are those such that $h(t,x) = C$ for some $C \in \mathbb R$. Now let's look at
\[ h(t,x) = \frac{t^2 x^2}{2} + t. \]
This implies that
\[ \frac{t^2 x^2}{2} + t = C. \]
So
\[ x = \pm\frac{\sqrt{2(C-t)}}{t} \quad\text{and}\quad C \ge t. \]
Hence,
\[ x_T = \pm\frac{\sqrt{2(T-t)}}{t} \]
solves $1 + t^2 x\dot x + tx^2 = 0$ in $(0,T)$.
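The same kind of symbolic check applies to the linear IVP and the exact equation above; here is a minimal sympy sketch (assuming, as in the examples, that $t > 0$ and $C \ge t$).

\begin{verbatim}
import sympy as sp

t, C = sp.symbols('t C', positive=True)
x = sp.Function('x')

# Linear IVP: t^2 x' + t x = 1 on (1, oo) with x(1) = 2; claimed solution (ln t + 2)/t.
ode_lin = sp.Eq(t**2 * x(t).diff(t) + t * x(t), 1)
cand = sp.Eq(x(t), (sp.log(t) + 2) / t)
print(sp.checkodesol(ode_lin, cand))   # expect (True, 0)
print(cand.rhs.subs(t, 1))             # expect 2: the initial condition holds

# Exact equation: 1 + t^2 x x' + t x^2 = 0.  Check the exactness condition d_x f = d_t g.
X = sp.symbols('X')                    # treat x as an independent variable here
f = t * X**2 + 1
g = t**2 * X
print(sp.simplify(sp.diff(f, X) - sp.diff(g, t)))   # expect 0, so the ODE is exact

# Check that x = sqrt(2(C - t))/t satisfies the ODE (for 0 < t < C).
ode_ex = sp.Eq(1 + t**2 * x(t) * x(t).diff(t) + t * x(t)**2, 0)
cand2 = sp.Eq(x(t), sp.sqrt(2*(C - t)) / t)
print(sp.checkodesol(ode_ex, cand2))   # expect (True, 0)
\end{verbatim}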
Let’s state some general theorems about the existence and uniqueness of solutions to first order ODEs of the general nonlinear variety ẋ = f (t, x). Theorem 12.3. Let f = f (t, x) : R2 æ R. 1. If f and ˆx f are continuous in an open set A µ R2 , then, for every pair (t0 , x0 ) œ A, a unique local solution to ẋ = f (t, x) exists such that x(t0 ) = x0 . 2. If f and ˆx f are continuous in R2 and |f (t, x)| Æ a(t)|x| + b(t) for continuous functions a, b : R æ R, then, for every pair (t0 , x0 ) œ R2 , a unique global, i.e., in all of R, solution to ẋ = f (t, x) exists such that x(t0 ) = x0 . 42 ANALYSIS AND OPTIMIZATION Remark 12.4. By local solution, we mean a function x and a time interval (s, T ) exist such that t0 œ (s, T ), (t, x(t)) œ A for all t œ (s, T ), and (of course) ẋ = f (t, x) with x(t0 ) = x0 . Remark 12.5. Regarding uniqueness, we mean that if x1 and x2 are local solutions with x1 (t0 ) = x2 (t0 ), then x1 (t) = x2 (t) for all times t at which both solutions exit simultaneously. ANALYSIS AND OPTIMIZATION 43 13. Lecture 13: ODEs Cont’d 13.1. Bernoulli’s Equation. Bernoulli’s equation is a family of first order nonlinear ODEs. This family is parameterized by a constant r œ R, and takes the form ẋ + a(t)x = b(t)xr . Again, r œ R is a fixed constant. There are three cases to consider. Case 1: r = 0. When r = 0, our equation is linear. Indeed, we get ẋ + a(t)x = b(t). So we can apply what we know about first order linear ODEs to solve it. Case 2: r = 1. When r = 1, we can rearrange our equation is so that it is separable. Indeed, ẋ + a(t)x = b(t)x is equivalent to ẋ = f (t)g(x) with f (t) = b(t) ≠ a(t) and g(x) = x. So we can apply what we know about first order separable ODEs to solve it. Case 3: r ”= 0 and r ”= 1. When r ”= 0 and r ”= 1, we have to do a change of variables. Remark 13.1. In this case, we typically only look for positive solution. Otherwise, xr may not be well-defined. That said, given a specific r œ R, you can analyze when xr is well-defined to say something about solutions which are not just positive. Remark 13.2. Don’t forget to check if there is a constant solution, like x(t) = 0 for all t œ R (also written as x © 0) to the ODE. What do we mean by a change of variables, also known as a transformation of variables? A change of variables is a process of defining a new ODE from our original ODE by defining a new function from x. Let’s see this explicitly in this case. Here we define the function z = x1≠r , and see that x solves Bernoulli’s equation if and only if z solves the linear ODE 1 ż + a(t)z = b(t). 1≠r Indeed, ż = (1 ≠ r)x≠r ẋ. So after rewriting our original ODE as x≠r ẋ + a(t)x1≠r = b(t) (multiply by x≠r ) and using the formulas for z and ż, we find the linear ODE above. We know how to solve first order linear ODEs. Therefore, after we solve for z, we then determine x, by letting 1 x = z 1≠r . For example, let’s solve ẋ ≠ x = et x2 . 44 ANALYSIS AND OPTIMIZATION First, note that x © 0 is a solution. If x(t) ”= 0 for any t œ R, then we have ẋ 1 ≠ = et . x2 x Set z= Then, 1 . x ż = ≠ Hence, we find the ODE ẋ . x2 ≠ż ≠ z = et … ż + z = ≠et , which is linear. Thus, observe that (zet )Õ = z Õ et + zet = ≠e2t if and only if (multiply by et ). In turn, or ż + z = ≠et 1 zet = ≠ e2t + C, 2 1 z = ≠ et + Ce≠t . 2 Changing variables back, we find that x= 1 2et = on (≠Œ, 12 ln 2C) and ( 12 ln 2C, Œ) z 2C ≠ e2t also solves our ODE (on the intervals defined above). 13.2. Riccati’s Equation. Riccati’s equation is a first order nonlinear ODE. 
It takes the form ẋ = P (t) + Q(t)x + R(t)x2 . In general, this ODE can only be solved through numerical methods. But if we already know a particular solution, we can use a change of variables to find the general solution. Indeed, if we know that u = u(t) is a particular solution, we can define z implicitly by the equation 1 x=u+ . z Substituting u + z ≠1 into our ODE and using that u is a solution, we find an ODE for z. Indeed, notice that ẋ = u̇ ≠ żz ≠2 and Also, observe that u̇ = P (t) + Q(t)u + R(t)u2 . ẋ = P (t) + Q(t)x + R(t)x2 ANALYSIS AND OPTIMIZATION 45 if and only if u̇ ≠ żz ≠2 = P (t) + Q(t)(u + z ≠1 ) + R(t)(u + z ≠1 )2 = P (t) + Q(t)u + R(t)u2 + Q(t)z ≠1 + R(t)(2uz ≠1 + z ≠2 ), which, using that u is a solution, is equivalent to ≠żz ≠2 = Q(t)z ≠1 + R(t)(2uz ≠1 + z ≠2 ). Multiplying across by ≠z 2 and rearranging, we find the ODE ż + (Q(t) + 2R(t)u(t))z = ≠R(t). Letting a(t) = Q(t) + 2R(t)u(t) and b(t) = ≠R(t), we see that the ODE for z we found is linear, which we know how to solve. After we solve for z, we use the definition of z to determine x. Remark 13.3. This whole process assumed we knew a particular solution already. Sometimes we can guess a solution, and then use this procedure. For example, consider ẋ ≠ x2 + 2et x = e2t + et . Observe that u(t) = et is a particular solution (found by guessing). So let’s do our change of variables: 1 x=u+ . z In this case, the ODE for z we find is ż + (Q(t) + 2R(t)u(t))z = ≠R(t) with Q(t) = ≠2e and R(t) = 1, or t We then see that from which we determine that ż = ≠1. z = ≠t + C, x = et + (Note that x is only define when t ”= C.) 1 . C ≠t 13.3. Equations with Homogeneous Terms. Definition 13.4. Let k be an integer and suppose that f : Rn æ R is such that f (⁄x) = ⁄k f (x) for all ⁄ œ R. Then, we say f is k-homogeneous or homogeneous of degree k. 46 ANALYSIS AND OPTIMIZATION Suppose that we have the ODE P (t, x)ẋ + Q(t, x) = 0 with P and Q homogeneous of degree k, i.e., for all ⁄ œ R, P (⁄t, ⁄x) = ⁄k P (t, x) and Q(⁄t, ⁄x) = ⁄k Q(t, x). In this case, we can again preform a change of variables to simplify our ODE into something we know how to solve. Indeed, if we let x = tz, then our ODE is equivalent to P (t, tz)(tz)Õ + Q(t, tz) = 0. Using the k-homogeneity of P and Q and the product rule, we find the equivalent ODE tk P (1, z)z + tk P (1, z)tż + tk Q(1, z) = 0. In other words, 1 P (1, z)z + Q(1, z) ż = ≠ . t P (1, z) This is a separable ODE. So after solving it, we can determine x very easily. Remark 13.5. These computations are formal. Since we cannot divide by 0, we have to be careful about the expressions we have on the right-hand side of the ODE for z. For example, consider If we set tẋ = x ≠ tex/t on (0, Œ). P (t, x) = t and Q(t, x) = x ≠ tex/t , we see that P and Q are 1-homogeneous. So setting x = tz, we find the following ODE for z, tẋ = x ≠ tex/t … t(tz)Õ = tz ≠ tetz/t ; that is, if and only if tz + t2 ż = tz ≠ tez , which is equivalent to ż = ≠t≠1 ez . Using what we know about separable ODEs, we look at ˆ ˆ dz dt = ≠ . ez t Hence, we see that ≠e≠z = ≠ ln |t| + C = ≠ ln t + C, since t > 0. Thus, z = ≠ ln(ln t + C). Finally, x = tz = ≠t ln(ln t + C). ANALYSIS AND OPTIMIZATION 47 14. Lecture 14: ODEs Cont’d Recall that a second order ODE looks like F (t, x, ẋ, ẍ) = 0 4 for some given F : R æ R. Again, x = x(t) is the unknown, and dx d2 x = xÕ and ẍ = 2 = xÕÕ . dt dt In general, it is very difficult to find solutions to second order ODEs. But there are some specific cases we can understand very well. 
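Before working through those cases, here is a quick symbolic check of the Bernoulli and Riccati examples from the previous lecture, with the claimed solutions taken from the worked examples above (a minimal sympy sketch).

\begin{verbatim}
import sympy as sp

t, C = sp.symbols('t C')
x = sp.Function('x')

# Bernoulli: x' - x = e^t x^2; claimed solution x = 2 e^t / (2C - e^{2t}).
bern = sp.Eq(x(t).diff(t) - x(t), sp.exp(t) * x(t)**2)
sol_b = sp.Eq(x(t), 2*sp.exp(t) / (2*C - sp.exp(2*t)))
print(sp.checkodesol(bern, sol_b))   # expect (True, 0)

# Riccati: x' - x^2 + 2 e^t x = e^{2t} + e^t; claimed solution x = e^t + 1/(C - t).
ricc = sp.Eq(x(t).diff(t) - x(t)**2 + 2*sp.exp(t)*x(t), sp.exp(2*t) + sp.exp(t))
sol_r = sp.Eq(x(t), sp.exp(t) + 1/(C - t))
print(sp.checkodesol(ricc, sol_r))   # expect (True, 0)
\end{verbatim}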
First, we will consider three cases that can be reduced / transformed into first order equations. Then, we will move on to some other second order ODEs, and learn some new techniques. Case 1: We start with the ODE ẋ = ẍ = f (t). To solve this ODE, we just integrate twice. For example, consider ẍ = k for some fixed k œ R. If we integrate once, we find that ẋ = kt + A, for any A œ R. Integrating again, we see that kt2 + At + B for any A, B œ R. 2 Case 2: Now let’s consider the ODE x= ẍ = f (t, ẋ). Notice that x on its own is missing. So if we let z = ẋ, we have the ODE ż = f (t, z). This change of variables reduces our ODE to a first order ODE, which we can try to analyze and solve with what we have already learned. If we can solve for z, then we simply integrate z = ẋ to determine x. For example, consider the ODE ẍ = ẋ + t. If we let z = ẋ, then we find the first order ODE ż = z + t, which we have already solved. In particular, Integrating, we find that z = Aet ≠ t ≠ 1 for any A œ R. x = Aet ≠ t2 ≠ t + B for any A, B œ R. 2 48 ANALYSIS AND OPTIMIZATION For a second order ODE, we typically need to prescribe two condition for an initial value problem. An initial value problem in the second order setting looks like Y _ ]F (t, x, ẋ, ẍ) = 0 on (t0 , T ) x(t0 ) = x0 _ [ ẋ(t0 ) = ẋ0 where x0 , ẋ0 œ R are two given constants. For example, if we want to solve the IVP Y _ ]ẍ = ẋ + t on (0, Œ) x(0) = 1 _ [ ẋ(0) = 2, we plug in x(0) = 1 and ẋ(0) = 2 into what we just obtained: and 1 = x(0) = Ae0 ≠ 02 ≠0+B =A+B 2 2 = ẋ(0) = Ae0 ≠ 0 ≠ 1 = A ≠ 1. So A = 3 and B = ≠2, and our solution is t2 ≠ t ≠ 2. 2 Let’s do another example. Consider the ODE x = 3et ≠ ẋ . t To solve this ODE, we consider the change of variables z = ẋ, and find the equivalent ODE, for z, z ż = cos t ≠ … tż + z = t cos t … (tz)Õ = t cos t. t So, integrating by parts, ˆ zt = t cos t dt = t sin t + cos t + A with A œ R. ẍ = cos t ≠ Multiplying by t≠1 and integrating again, we find that ˆ ˆ x = z dt = ≠ cos t + t≠1 cos t dt + A ln |t|. Remark 14.1. Since the indefinite integral of t≠1 cos t does not have a nice closed form solution, we leave it as an indefinite integral. Also, notice that the general solution here only has one free constant A, whereas, in all of our other examples, we’ve seen two free constants. The second constant lives inside the indefinite integral of t≠1 cos t. Case 3: Now let’s consider the ODE ẍ = f (x, ẋ). Notice that an explicit dependence on t is missing. The standard method to try to solve this type of ODE is to think of t as a function that depends on x, i.e., think ANALYSIS AND OPTIMIZATION 49 of x as the independent variable, and find and solve an ODE for t. In particular, let tÕ and tÕÕ denote the first and second derivatives of t with respect to x: formally ẋ = and ẍ = 1 1 dx = = Õ dt dt/dx t d dx dx d 1 1 ≠tÕÕ ≠tÕÕ = = = . dt dt dt dx tÕ tÕ (tÕ )2 (tÕ )3 So replacing ẍ and ẋ in our ODE, we find the equivalent ODE ≠tÕÕ = f (x, (tÕ )≠1 ) … tÕÕ = ≠(tÕ )3 f (x, (tÕ )≠1 ). (tÕ )3 Observe that this ODE, as an ODE for t = t(x), does not depend explicitly on t. So this ODE is one that falls into Case 2, above, which we know how to solve. For example, consider the ODE ẍ = ≠xẋ, which, from our procedure above, is equivalent to tÕÕ = x(tÕ )2 . Using what we learned in Case 2, we do the the change of variables z = z(x) = tÕ (x), and find the ODE z Õ = xz 2 . This is separable, and so we find the integral equality ˆ ˆ ≠1 x2 z ≠2 dx = x dx … = + C. z 2 Solving for z and then integrating, we find that ˆ ˆ 1 t = z dx = ≠2 dx. 
\[ t = \int z\,dx = -2\int \frac{dx}{x^2 + 2C}. \]
This integral has three cases to consider: $C > 0$, $C < 0$, and $C = 0$.
When $C > 0$, we see that
\[ -2\int \frac{dx}{x^2 + 2C} = -\frac{1}{C}\int \frac{dx}{\bigl(x/\sqrt{2C}\bigr)^2 + 1} = -\sqrt{\tfrac{2}{C}}\int \frac{dy}{y^2 + 1} = -\sqrt{\tfrac{2}{C}}\bigl(\arctan y + A\bigr) = -\sqrt{\tfrac{2}{C}}\Bigl(\arctan\bigl(x/\sqrt{2C}\bigr) + A\Bigr). \]
Thus, solving for $x$, we find that
\[ x = \sqrt{2C}\,\tan\Bigl(-\bigl(\sqrt{\tfrac{C}{2}}\,t + A\bigr)\Bigr). \]
When $C < 0$, we see that
\[ -2\int \frac{dx}{x^2 + 2C} = -2\int \frac{dx}{x^2 - (-2C)} = \frac{-1}{\sqrt{-2C}}\,\ln\Bigl|\frac{x - \sqrt{-2C}}{x + \sqrt{-2C}}\Bigr| + A. \]
We cannot solve for $x$ explicitly, and so we leave the solution in the implicit form
\[ t = \frac{-1}{\sqrt{-2C}}\,\ln\Bigl|\frac{x - \sqrt{-2C}}{x + \sqrt{-2C}}\Bigr| + A. \]
When $C = 0$, we see that
\[ -2\int \frac{dx}{x^2 + 2C} = -2\int \frac{dx}{x^2} = \frac{2}{x} + A. \]
And so,
\[ x = \frac{2}{t - A}. \]
Notice that, in these last two cases, the solution $x$ is not defined on all of $\mathbb R = (-\infty, \infty)$. For instance, when $C = 0$, we need $t \neq A$.

14.1. Second Order Linear ODEs. A second order linear ODE takes the following form:
\[ \ddot x + a(t)\dot x + b(t)x = f(t), \]
where $a$, $b$, and $f$ are given functions. While this type of ODE is very difficult to solve by hand for most $a$, $b$, and $f$, we can say something about the general structure of solutions, even if we can't find them explicitly. When $f(t) = 0$ for all $t$, we call the ODE homogeneous; otherwise, we call it non-homogeneous.

Definition 14.2. Two functions $u_1$ and $u_2$ are non-proportional if neither is a constant multiple of the other. (This is linear independence for functions.)

Theorem 14.3. Consider the non-homogeneous ODE $\ddot x + a(t)\dot x + b(t)x = f(t)$ and its homogeneous counterpart $\ddot x + a(t)\dot x + b(t)x = 0$.
1. The general solution to the homogeneous ODE is given by $x_h = C_1 u_1 + C_2 u_2$, where $C_1, C_2 \in \mathbb R$ are arbitrary constants and $u_1$ and $u_2$ are any two particular solutions that are non-proportional.
2. The general solution to the non-homogeneous ODE is given by $x = u^* + x_h$, where $u^*$ is any particular solution of the ODE and $x_h$ is the general solution of the homogeneous counterpart.

Remark 14.4. Notice that if $u_1$ and $u_2$ solve a homogeneous equation, then $C_1 u_1 + C_2 u_2$, where $C_1, C_2 \in \mathbb R$ are arbitrary constants, solves the same homogeneous equation. This is why we call this type of ODE linear. (Compare this with first order linear ODEs.) If $u^*$ and $x$ solve a non-homogeneous equation, then $x - u^*$ and $u^* - x$ solve the homogeneous counterpart.

In general, this theorem is the best we can do, but when $a$ and $b$ are constant functions, we can do better.

Theorem 14.5. Consider the ODE $\ddot x + a\dot x + bx = 0$, where $a, b \in \mathbb R$ are arbitrary constants.
1. If $D = a^2 - 4b > 0$, then the general solution is $x = C_1 e^{r_1 t} + C_2 e^{r_2 t}$, where $r_1$ and $r_2$ are the two real roots of the characteristic equation $r^2 + ar + b = 0$.
2. If $D = a^2 - 4b = 0$, then the general solution is $x = C_1 e^{rt} + C_2 t e^{rt}$, where $r$ is the (repeated) real root of the characteristic equation $r^2 + ar + b = 0$.
3. If $D = a^2 - 4b < 0$, then the general solution is $x = e^{\alpha t}\bigl(C_1 \cos\beta t + C_2 \sin\beta t\bigr)$, where $\alpha = -a/2$ and $\beta = \sqrt{-D}/2$.
Here $C_1, C_2 \in \mathbb R$ are arbitrary constants.

Remark 14.6. Where did the characteristic equation come from? Well, if we plug $x = e^{rt}$ into our equation $\ddot x + a\dot x + bx = 0$, we see that $(r^2 + ar + b)e^{rt} = 0$, which is true if and only if $r^2 + ar + b = 0$.

What happens if we add a right hand side $f = f(t)$ and consider $\ddot x + a\dot x + bx = f(t)$, where $a, b \in \mathbb R$ are arbitrary constants? The general algorithm to solve a linear non-homogeneous constant coefficient ODE is as follows.
Step 1: Solve the homogeneous counterpart $\ddot x + a\dot x + bx = 0$; call the solution $x_h$.
Step 2: Find any particular solution to the ODE; call the solution $x_{nh}$.
Step 3: Add these solutions together: x = xh + xnh . Of course, Step 2 seems to be putting us in a chicken vs. egg scenario. But we will see a method to deal with this. That said, there are three east cases to consider first. Case 1: If f is a polynomial of degree n, then you should look for an xnh that is also a polynomial of degree n. Case 2: If f = peqt , then guess xnh = Aeqt . Case 3: If f = p sin rt + q cos rt, then guess xnh = A sin rt + B cos rt. 52 ANALYSIS AND OPTIMIZATION Let’s do an example to see all this in action. Let’s solve the ODE ẍ ≠ 3ẋ + 2x = 2t2 + 2t + 4. Step 1 is to solve the homogeneous counterpart ẍ ≠ 3ẋ + 2x = 0, which has characteristic equation r2 ≠ 3r + 2 = 0. The roots of the equation are r1 = 1 and r2 = 2. So xh = C1 et + C2 e2t . Now we deal with Step 2. From Case 1, we expect xnh to look like a degree two polynomial At2 + Bt + C. Let’s see what A, B, and C need to be in order for this to be a solution. If xnh = At2 + Bt + C, then ẋnh = 2At + B and ẍnh = 2A. Hence, we need 2A ≠ 3(2At + B) + 2(At2 + Bt + C) = 2t2 + 2t + 4. This happens if and only if In other words, Thus, and 2A = 2, ≠6A + 2B = 2, and 2A ≠ 3B + 2C = 4. A = 1, B = 4, and C = 7. xnh = t2 + 4t + 7, x = C1 et + C2 e2t + t2 + 4t + 7. ANALYSIS AND OPTIMIZATION 53 15. Lecture 15: ODEs Cont’d and the Calculus of Variations 15.1. ODEs: Variation of Parameters. Until now we have only been able to solve second order linear constant coefficient ODEs with very particular right hand sides f = f (t). Through an example, we will now illustrate a method that, in principle, will allow you to deal with any f = f (t). Consider the ODE ẍ ≠ 2ẋ ≠ 3x = te≠t . ≠t Since f = f (t) = te is neither a polynomial, a function of the form peqt , nor a function of the form p sin(rt) + q cos(rt), the methods we’ve developed so far won’t help here. Instead, we employ a technique called the variation of parameters. Step 1 in solving this ODE is the same as before: solve the homogeneous counterpart ẍ ≠ 2ẋ ≠ 3x = 0. We see that xh = C1 e≠t + C2 e3t . Step 2 is the variation of parameters. Let x1 = x1 (t) = e≠t and x2 = x2 (t) = e3t be the two functions which make up the solution to the homogeneous equation. Now we solve the system of equations ċ1 x1 + ċ2 x2 = 0 and ċ1 ẋ1 + ċ2 ẋ2 = f (t) for two functions ċ1 = ċ1 (t) and ċ2 = ċ2 (t). Once we find ċ1 and ċ2 , we integrate them to get two function c1 and c2 . The solution to the ODE is then x = xh + xnh = C1 x1 + C2 x2 + c1 x1 + c2 x2 . In this case, specifically, we find the system ċ1 e≠t + ċ2 e3t = 0 and ≠ ċ1 e≠t + 3ċ2 e3t = te≠t . Adding these two equations together, we find that ċ2 e3t + 3ċ2 e3t = te≠t . So that te≠4t . 4 Plugging this into the first equation, we find that t ċ1 = ≠ . 4 Now we integrate to determine that ˆ t2 1 te≠4t e≠4t c1 (t) = ≠ and c2 (t) = te≠4t dt = ≠ ≠ , 8 4 16 64 after integrating by parts. So we find that 3 4 t2 t 1 xnh = e≠t ≠ ≠ ≠ . 8 16 64 ċ2 = Thus, the solution is x = C1 e≠t + C2 e3t ≠ e≠t 2 (8t + 4t + 1). 64 54 ANALYSIS AND OPTIMIZATION 15.2. Calculus of Variations. We are now going to move on to optimization problems where our optimizer will be a function rather than a point. The standard form of these problems is ˆ t1 ˆ t1 max f (t, x, ẋ) dt or min f (t, x, ẋ) dt x subject to x t0 t0 x(t0 ) = x0 and x(t1 ) = x1 . Here f : R3 æ R is a C 2 function and x = x(t) is C 1 . In general, x will end up being at least C 2 as well. 
These problems are called variational problems, and the methods used to solve them come from the theory of the calculus of variations. Sometimes the calculus of variations is called dynamic programing. Theorem 15.1. If xú = xú (t) is a solution of our variational problem, then ˆx f (t, xú , ẋú ) = d (ˆẋ f (t, xú , ẋú )). dt The equation in the theorem d (ˆẋ f (t, xú , ẋú )) dt is called the Euler–Lagrange equation associated to the variational problem determined by f . Let’s unpack this equation. The right had side is a total derivative, and if xú is C 2 , we can distribute the d/dt, and find ˆx f (t, xú , ẋú ) = d (ˆẋ f (t, xú , ẋú )) = ˆtẋ f (t, xú , ẋú ) + ˆxẋ f (t, xú , ẋú )ẋ + ˆẋẋ f (t, xú , ẋú )ẍ dt = ˆ13 f (t, xú , ẋú ) + ˆ23 f (t, xú , ẋú )ẋ + ˆ33 f (t, xú , ẋú )ẍ. The left hand side ˆx f (t, xú , ẋú ) = ˆ2 f (t, xú , ẋú ). In turn, when xú is C 2 , the EL equation is a second order ODE: ˆ13 f (t, xú , ẋú ) + ˆ23 f (t, xú , ẋú )ẋ + ˆ33 f (t, xú , ẋú )ẍ ≠ ˆ2 f (t, xú , ẋú ) = 0. Theorem 15.2. In the maximization (minimization) case, if xú solves the EL equation associated to a variational problem and f is concave (convex) in (x, ẋ) for every fixed t, then xú is a solution to the variational problem. Theorem 15.3. In the maximization (minimization) case, if xú solves the EL equation associated to a variational problem and f is strictly concave (convex) in (x, ẋ) for every fixed t, then xú is the unique solution to the variational problem. For example, consider the problem max x subject to ˆ t1 f (ẋ) dt t0 x(t0 ) = x0 and x(t1 ) = x1 . The EL equation is this case is d Õ (f (ẋ)) = 0 ∆ f Õ (ẋ(t)) = constant (as a function of t). dt ANALYSIS AND OPTIMIZATION 55 Notice that if x = x(t) is the line joining (t0 , x0 ) and (t1 , x1 ), then ẋ is constant, and so it would solve the EL equation. Indeed, the line joining (t0 , x0 ) and (t1 , x1 ) is x0 ≠ x1 t0 x1 ≠ t1 x0 x(t) = t+ . t 0 ≠ t1 t 0 ≠ t1 Then, x0 ≠ x1 ẋ(t) = = s for all t œ R, t0 ≠ t 1 i.e., it is constant. So f Õ (ẋ(t)) = f Õ (s) = constant, as desired. By our theorem, if f is concave, then this line would solve the variational problem. Let’s do another example: ˆ 1 max (1 ≠ x ≠ 3ẋ ≠ 2ẋ2 )et dt x 0 subject to x(0) = 0 and x(1) = 3. To solve this, we first identify the integrand f (t, x, ẋ) = (1 ≠ x ≠ 3ẋ ≠ 2ẋ2 )et . Second, we compute the EL equation. To do this we need ˆx f = ≠et , ˆẋ f = ≠(3 + 4ẋ)et and Thus, the EL equation is d (ˆẋ f ) = ≠(3 + 4ẋ + 4ẍ)et . dt 1 ≠et = ≠et (3 + 4ẋ + 4ẍ) … ẍ + x = ≠ . 2 This is a linear 2nd order ODE with constant coefficients, which we know how to solve. Its solution is t x = C1 + C2 e≠t ≠ . 2 Third, we use our boundary conditions x(0) = 0 and x(1) = 3 to determine C1 and C2 . In particular, we find that 7 C1 = ≠C2 = . 2(e ≠ 1) To conclude, we need to check if f is concave in (x, ẋ) for all t. To this end, we compute the Hessian of f with respect to (x, ẋ): 2 t Dx, ẋ f (t, x, ẋ) = diag(0, ≠4)e . This is symmetric and its eigenvalues are 0 and ≠4et , which are both less than or equal to 0. So f is indeed concave in (x, ẋ) for all t. Hence, x(t) = solves the problem. Let’s do another example: min x subject to 7 7 t ≠ e≠t ≠ 2(e ≠ 1) 2(e ≠ 1) 2 ˆ 0 T x2 + cẋ2 dt with c > 0 x(0) = x0 and x(T ) = 0. 56 ANALYSIS AND OPTIMIZATION Again, we compute the EL equation 1 2cẍ = 2x … ẍ ≠ x = 0. c The general solution to this ODE is 1 x = Aert + Be≠rt with r = Ô . c Using the boundary conditions x(0) = x0 and x(T ) = 0, we find that A + B = x0 and AerT + Be≠rT = 0. Thus, ≠x0 e≠rT x0 erT and B = . 
erT ≠ e≠rT erT ≠ e≠rT A= Notice that 2 Dx, ẋ f (t, x, ẋ) = diag(2, 2c). Since c > 0, the eigenvalues of this symmetric matrix are both positive. So f is strictly convex, which tells us that the solution to the EL we found subject to the boundary conditions is the unique minimizer. Question: Where does the EL equation come from? Well, let’s derive it in the maximization case. The minimization case is essentially the same. To do this, let ˆ t1 I(x) = f (t, x, ẋ) dt. t0 Also, let Ï : R æ R be C 1 and such that Ï Ø 0 and Ï(t0 ) = 0 = Ï(t1 ). Now suppose that xú is a maximizer. Then, x = xú + ⁄Ï is an admissible function, i.e., a competitor in the maximization problem. So In turn, the function I(xú + ⁄Ï) Æ I(xú ) for all ⁄ œ R. g(⁄) = I(xú + ⁄Ï) has a maximum at ⁄ = 0, from which it follows that g Õ (0) = 0. But what is g Õ , ˆ g Õ (⁄) = t1 ˆx f (t, xú + ⁄Ï, ẋú + ⁄Ï̇)Ï + ˆẋ f (t, xú + ⁄Ï, ẋú + ⁄Ï̇)Ï̇ dt. t0 Hence, 0 = g Õ (0) = ˆ t1 ˆx f (t, xú , ẋú )Ï dt + t0 ˆ t1 ˆẋ f (t, xú , ẋú )Ï̇ dt. t0 Let’s integrate the second integral by parts: ˆ t1 ˆ 1 ˆẋ f (t, xú , ẋú )Ï̇ dt = ˆẋ f (t, xú , ẋú )Ï|t=t ≠ t=t0 t0 t1 t0 =≠ ˆ t1 t0 d (ˆẋ f (t, xú , ẋú ))Ï dt dt d (ˆẋ f (t, xú , ẋú ))Ï dt dt ANALYSIS AND OPTIMIZATION 57 since Ï(t0 ) = 0 = Ï(t1 ), by assumption. Putting this back into what we had, we deduce that 6 ˆ t1 5 d 0= ˆx f (t, xú , ẋú ) ≠ (ˆẋ f (t, xú , ẋú )) Ï dt. dt t0 By a fact in calculus (see the text if you are interested, Chapter 8, Section 3, Theorem 8.3.2), this implies that d ˆx f (t, xú , ẋú ) ≠ (ˆẋ f (t, xú , ẋú )) = 0, dt which is exactly our EL equation. 58 ANALYSIS AND OPTIMIZATION 16. Lecture 16: Calculus of Variations Cont’d Let’s start by answering the question, “How does the concavity of f in (x, ẋ) imply an xú which solves the EL equation is a maximizer?” To see this, let y be an arbitrary admissible function. Then, ˆ t1 I(y) ≠ I(xú ) = (f (t, y, ẏ) ≠ f (t, xú , ẋú )) dt t0 t1 Æ ˆ Òx,ẋ f (t, xú , ẋú ) · (y ≠ xú , ẏ ≠ ẋú ) dt = ˆ [ˆx f (t, xú , ẋú ) ≠ t0 t1 t0 d (ˆẋ f (t, xú , ẋú )](y ≠ xú ) dt dt = 0. The inequality is a consequence of the concavity of f in (x, ẋ). The equality after follows from integrating by parts, and the last equality follows since xú solves the EL equation. In turn, I(xú ) Ø I(y) for all y admissible, as desired. In other words, xú is a solution to the variational problem. Now let’s see an example of how and where a variational problem shows up in real life: the optimal savings problem. Consider an economy evolving over time, where k = k(t) represents the capital stock, c = c(t) represents consumption, and y = y(t) represents net national product. Suppose that y(t) = g(k(t)) where g Õ (k) > 0 and g ÕÕ (k) Æ 0. That is, g is concave and increasing, i.e., net national product is concave and increasing as a function of stock. Also, assume that net national product = y(t) = c(t) + k̇(t) = consumption + investment and k(0) = k0 is the given capital stock existing today at time t = 0. Furthermore, assume that society has a utility function u = u(c), depending on consumption in the following way uÕ (c) > 0 and uÕÕ (c) < 0. An interpretation of this dependence is that high levels of consumption lead to a lower increase in satisfaction from a given increase in consumption compared to low levels of consumption. Finally, let r Ø 0 be the discount factor, which describes how much more the present matters than the future. 
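Before writing down the savings problem itself, it is worth noting that the Euler–Lagrange equation of a concrete integrand can also be generated mechanically. The sketch below uses sympy's euler_equations on the integrand $x^2 + c\dot x^2$ from the minimization example above (illustrative only; the exact printed form of the output may differ).

\begin{verbatim}
import sympy as sp
from sympy.calculus.euler import euler_equations

t, c = sp.symbols('t c', positive=True)
x = sp.Function('x')

# Integrand of the minimization example: f(t, x, xdot) = x^2 + c*xdot^2.
L = x(t)**2 + c * sp.Derivative(x(t), t)**2

# euler_equations returns d_x f - d/dt(d_xdot f) = 0, i.e. 2x - 2c x'' = 0,
# which is the EL equation x'' = x/c derived above.
print(euler_equations(L, [x(t)], [t]))
\end{verbatim}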
The optimal savings problem is then ˆ T ˆ T ≠rt max u(c)e dt = max u(g(k) ≠ k̇)e≠rt dt, k 0 k 0 ANALYSIS AND OPTIMIZATION 59 recalling that c = y ≠ k̇ and y = g(k). What is the solution of the optimal savings problem? Well, let’s use what we have learned. First, we derive the EL equation. We compute So f (t, k, k̇) = u(g(k) ≠ k̇)e≠rt . ˆk f = uÕ (g(k) ≠ k̇)g Õ (k)e≠rt = uÕ (c)g Õ (k)e≠rt Hence, ˆk̇ f = ≠uÕ (g(k) ≠ k̇)e≠rt = ≠uÕ (c)e≠rt . d (ˆ f ) = ≠uÕÕ (c)cÕ e≠rt + ruÕ (c)e≠rt , dt k̇ and the EL equation is Now set uÕ (c)(g Õ (k) ≠ r) + uÕÕ (c)cÕ = 0. cuÕÕ (c) , uÕ (c) which represent the elasticity of marginal utility with respect to consumption. We then find (rearranging the EL equation) that Ec (uÕ ) = cÕ uÕ (c) r ≠ g Õ (k) = ÕÕ (r ≠ g Õ (k)) = . c cu (c) Ec (uÕ ) Notice that Ec (uÕ ) < 0 by our assumptions on utility. In turn, cÕ cÕ > 0 … g Õ (k) > r and < 0 … g Õ (k) < r. c c This tells us that consumption increases if and only if the marginal productivity of capital trumps the discount factor. Since the EL equation tells is that cÕ = g Õ (k)k̇ ≠ k̈, k̈ ≠ g Õ (k)k̇ + uÕ (c) (r ≠ g Õ (k)) = 0, uÕÕ (c) which is an ODE for k. As g is concave, g(k) ≠ k̇ is concave in (k, k̇). Moreover, as u is increasing and concave, u(g(k) ≠ k̇)e≠rt is also concave in (k, k̇). So any solution to our ODE above will solve our optimal savings problem. 16.1. More General / Other Terminal Conditions. So far we have only considered the boundary conditions x(t0 ) = x0 and (1) x(t1 ) = x1 . Now we will consider the following four cases: (a) x(t1 ) is free; (b) x(t1 ) Ø x1 ; (c) x(t1 ) Æ x2 ; and (d) x1 Æ x(t1 ) Æ x2 . The new feature, when dealing with (a) - (d) rather than (1), is called a transversality condition. So we consider the problem ˆ t1 max x f (t, x, ẋ) dt t0 60 ANALYSIS AND OPTIMIZATION subject to x(t0 ) = x0 and any one of (a) - (d). Theorem 16.1 (Necessary Condition). If xú solves the variational problem with one of the terminal conditions (a), (b), (c), or (d), then xú satisfies the EL equation and the transversality condition (a) ˆẋ f (t, xú , ẋú )|t=t1 = 0. (b) ˆẋ f (t, xú , ẋú )|t=t1 = 0 if x(t1 ) > x1 and ˆẋ f (t, xú , ẋú )|t=t1 Æ 0 if x(t1 ) = x1 . (c) ˆẋ f (t, xú , ẋú )|t=t1 = 0 if x(t1 ) < x2 and ˆẋ f (t, xú , ẋú )|t=t1 Ø 0 if x(t1 ) = x2 . (d) ˆẋ f (t, xú , ẋú )|t=t1 = 0 if x(t1 ) œ (x1 , x2 ); ˆẋ f (t, xú , ẋú )|t=t1 Æ 0 if x(t1 ) = x1 ; and ˆẋ f (t, xú , ẋú )|t=t1 Ø 0 if x(t1 ) = x2 . Theorem 16.2 (Sufficient Condition). Suppose f is concave in (x, ẋ) and xú satisfies the EL equation and the transversality condition ˆẋ f (t, xú , ẋú )|t=t1 = 0. (1) If xú (t1 ) œ [x1 , x2 ], then xú solves the variational problem. (2) If xú (t1 ) < x1 , then replace the transversality condition with the terminal condition x(t1 ) = x1 . The solution to the EL equation subject to x(t0 ) = x0 and x(t1 ) = x1 will solve the variational problem. (3) If xú (t1 ) > x2 , then replace the transversality condition with the terminal condition x(t1 ) = x2 . The solution to the EL equation subject to x(t0 ) = x0 and x(t1 ) = x2 will solve the variational problem. Why do we have the transversality condition? Let’s look at (a) and (b), for example. For (a), let ˆ t1 I(x) = f (t, x, ẋ) dt. t0 Consider any Ï : R æ R such that Ï Ø 0 and Ï(t0 ) = 0. Define g(⁄) = I(xú + ⁄Ï). Like before, when we derived the EL equation, we know that g Õ (0) = 0 if xú is a maximizer. So ˆ t1 0= ˆx f (t, xú , ẋú )Ï + ˆẋ f (t, xú , ẋú )Ï̇ dt t0 1 = ˆẋ f (t, xú , ẋú )Ï|t=t t=t0 + = ˆẋ f (t, xú , ẋú )|t=t1 . 
ˆ t1 t0 [ˆx f (t, xú , ẋú ) ≠ d (ˆẋ f (t, xú , ẋú )]Ï dt dt Here we integrated by parts, used that xú satisfies the EL equation, and that Ï(t0 ) = 0. For (b), we have to assume that xú (t1 ) + ⁄Ï(t1 ) Ø x1 to be admissible. If ú x (t1 ) > x1 , then the exact same argument gives us the same transversality condition. Why? Because xú (t1 ) > x1 allows us to consider both positive and negative ⁄. If xú (t1 ) = x1 , then ⁄Ï(t1 ) Ø 0. In other words, we can only consider ⁄ Ø 0. So we only have the inequality g Õ (0) Æ 0, ANALYSIS AND OPTIMIZATION which implies 61 ˆẋ f (t, xú , ẋú )|t=t1 Æ 0, by a similar argument. For example, let’s look at the problem ˆ ln 2 min ẋ2 + (x ≠ 2)2 dt x subject to 0 2 Æ x(0) Æ 3 and x(ln 2) = 1. To solve this, first, we put it into our standard form: ˆ 0 max ≠ẏ 2 ≠ (y ≠ 2)2 ds y subject to ≠ ln 2 2 Æ y(0) Æ 3 and y(≠ ln 2) = 1, with y(s) = x(≠s). How did we do this? We did the change of variables in the integral t = ≠s and renaming y(s) = x(≠s). Recall, the EL equation in general is d ˆy f = (ˆẏ f ). ds In this case, we find ≠2(y ≠ 2) = ≠2ÿ … ÿ ≠ y = 2. By what we know about ODEs, y = C1 es + C2 e≠s + 2. Using the initial condition y(≠ ln 2) = 1, we find the equation 1 = y(≠ ln 2) = which is true if and only if C1 + 2C2 + 2, 2 C1 = ≠2(1 + 2C2 ). Now we deal with the transversality condition Note that 0 = ˆẏ f |s=0 = ≠2ẏ(0). ẏ = C1 es ≠ C2 e≠s = ≠2(1 + 2C2 )es ≠ C2 e≠s . So ẏ(0) = 0 if and only if 2 C2 = ≠ = C1 . 5 We need to see what y(0) is: 2 2 6 y(0) = ≠ ≠ + 2 = < 2. 5 5 5 Therefore, this cannot be the solution to our variational problem. Instead, the solution will be the solution to the EL equation with the conditions y(≠ ln 2) = 1 and y(0) = 2, once we verify that the integrand is concave. Using y(0) = 2, we find that 2 y = ≠ (es ≠ e≠s ) + 2. 3 62 ANALYSIS AND OPTIMIZATION Since ≠ẏ 2 ≠ (y ≠ 2)2 is concave in (y, ẏ), this y solves the problem. Going back to x, we see that 2 x = ≠ (e≠t ≠ et ) + 2 3 solves the original minimization problem. ANALYSIS AND OPTIMIZATION 63 17. Lecture 17: Calculus of Variations Cont’d and Control Theory Let’s do another example. Consider ˆ 1 max (1 ≠ 4x2 ≠ ẋ2 ) dt x 0 subject to (a) x(1) Ø 0 and (d) ≠ 2 Æ x(1) Æ 0. We start by deriving the EL equation. Since f = 1 ≠ 4x2 ≠ ẋ2 , ˆx f = ≠8x, ˆẋ f = ≠2ẋ, and d (ˆẋ f ) = ≠2ẍ. dt So the EL equation is ẍ ≠ 4x = 0 with x(0) = 1. From this, we find the function x = Ce2t + (1 ≠ C)e≠2t . The transversality condition is This tells us that i.e., ≠2ẋ(1) = ˆẋ f |t=1 = 0. 0 = ẋ(1) = (2Ce2t ≠ 2(1 ≠ C)e≠2t )|t=1 , C= 1 . 1 + e4 Now we need to see what x(1) is, now that we have found C. x(1) = 2e2 = 0.2658... 1 + e4 Thus, for (a), we find that x(1) Ø 0, and x= 1 (e2t + e4≠2t ) 1 + e4 is the solution since f is concave in (x, ẋ). For (d), we see that x(1) œ / [≠2, 0], so we need to replace the transversality condition with the terminal point condition x(1) = 0. (since 0.2568... is closer to 0 than ≠2; if x(1) were closer to ≠2, we’d have impose the terminal point condition x(1) = ≠2.) In other words, for x = Ce2t + (1 ≠ C)e≠2t , we need to find C using x(1) = 0. In turn, we find that x= 1 (e2t ≠ e4≠2t ) 1 ≠ e4 is the solution, again since f is concave in (x, ẋ). 64 ANALYSIS AND OPTIMIZATION 17.1. Integral Constraints. When we were studying constrained optimization, we considered the method of Lagrange multipliers. If we introduce integral constraints, then this method will be useful again. 
Now we consider the general problem ˆ t1 max f (t, x, ẋ) dt x subject to t0 x(t0 ) = x0 , ˆ t1 ˆ t1 t0 and hi (t, x, ẋ) dt Æ ai for i = 1, . . . , n, gj (t, x, ẋ) dt = bj for j = 1, . . . , m. t0 Sometimes, it will be convenient to write these last two constraints in vector form ˆ t1 ˆ t1 h(t, x, ẋ) dt Æ a and g(t, x, ẋ) dt = b t0 t0 with h = (h1 , . . . , hn ), a = (a1 , . . . , an ), g = (g1 , . . . , hm ), and b = (b1 , . . . , bm ). Case 1: Equality Constraints. Let’s first consider the case where we only have g. In this case, like the Lagrangian, we define the augmented integrand f˜(t, x, ẋ) = f (t, x, ẋ) ≠ ⁄ · g(t, x, ẋ), and study it to determine solutions to the variational problem ˆ t1 max f (t, x, ẋ) dt x t0 subject to x(t0 ) = x0 and ˆ t1 g(t, x, ẋ) dt = b. t0 Theorem 17.1. Let xú be a solution of the EL equation d ˆx f˜ = (ˆẋ )f˜ with x(t0 ) = x0 . dt If f˜ is concave in (x, ẋ), then xú is a solution to the variational problem. For example, consider max x subject to ˆ 0 1 ≠ẋ2 dt x(0) = x(1) = 0 and ˆ 0 1 x dt = 1. Remark 17.2. By the Fundamental Theorem of Calculus, ˆ t1 x(t0 ) = x0 and x(t1 ) = t1 … ẋ dt = x1 ≠ x0 with x(t0 ) = x0 . t0 ANALYSIS AND OPTIMIZATION 65 To solve this, we first use our remark to rewrite the first two constraints and put them in standard form, i.e., an initial condition and integral conditions; no terminal condition: ˆ x(0) = 0 and 1 0 ẋ dt = 0. Now let’s construct our augmented integrand f˜ = ≠ẋ2 ≠ ⁄1 ẋ ≠ ⁄2 x, and observe that d ˆx f˜ = ≠⁄2 , ˆẋ f˜ = ≠2ẋ ≠ ⁄1 , and (ˆẋ f˜) = ≠2ẍ. dt Therefore, the EL equation is 2ẍ = ⁄2 . In turn, ⁄ 2 t2 x= + C1 t + C2 . 4 Since x(0) = 0, we see that C2 = 0. We need to determine ⁄2 and C1 . To this end, we need 3 4-t=1 ˆ 1 ⁄ 2 t2 ⁄2 0= ẋ dt = + C1 t -= + C1 . 4 4 0 t=0 We also need ˆ 1 ⁄2 C1 1= x dt = + . 12 2 0 Hence, ⁄2 = ≠24 and C1 = 6. The solution to the EL equation subject to the constraints is x = 6t(1 ≠ t). ˜ Since f is concave for any ⁄1 and ⁄2 , we then, by our theorem, conclude that this x is a maximizer. Case 2: Mixed Constraints. In this case, we define the augmented integrand as f˜(t, x, ẋ) = f (t, x, ẋ) ≠ ⁄ · g(t, x, ẋ) ≠ µ · h(t, x, ẋ) with µ Ø 0. Theorem 17.3. Let xú be a solution of the EL equation d ˆx f˜ = (ˆẋ f˜) with x(t0 ) = x0 . dt If f˜ is concave in (x, ẋ), then xú is a solution to the variational problem with 3 ˆ t1 4 µ· h(t, xú , ẋú ) ≠ a dt = 0. t0 For example, consider max x subject to ˆ 0 1 ≠ẋ2 dt x(0) = 0, x(1) = 1, and ˆ 0 1 x2 dt Æ a. 66 ANALYSIS AND OPTIMIZATION To solve this, we first construct our augmented integrand f˜ = ≠ẋ2 ≠ ⁄ẋ ≠ µx2 with µ Ø 0. (Again, we use the FTC to put our constraints in standard form.) The EL equation is ẍ = µx since ˆx f˜ = ≠2µx, ˆẋ f˜ = ≠2ẋ ≠ ⁄, and Using that x(0) = 0 and x(1) = 1, we find that d (ˆẋ f˜) = ≠2ẍ. dt x(t) = t if µ = 0 and Ô Ô Ô sinh( µt) e µt ≠ e≠ µt Ô x(t) = Ôµ = if µ > 0. Ô sinh( µ) e ≠ e≠ µ So if µ = 0, then x(t) = t solves the problem provided that ˆ 1 1 aØ t2 dt = . 3 0 If µ > 0, then x(t) = Ô sinh( µt) Ô sinh( µ) solves the problem with µ implicitly determined by a= ˆ 1 0 3 Ô 4 sinh( µt) 2 dt. Ô sinh( µ) 17.2. Control Theory. In control theory, we study problems of the form ˆ t1 max f (t, x, u) dt, x,u t0 where x = x(t) is called the state and u = u(t) is called the control, subject to and ẋ = g(t, x, u), u(t) œ U µ R, x(t0 ) = x0 , (a) x(t1 ) = x1 , (b) x(t1 ) Ø x1 , (c) x(t1 ) = free, or (d) x(t1 ) Æ x1 . The set U is called the control region. Definition 17.4. 
A pair (x, u) that satisfies the conditions ẋ = g(t, x, u), u(t) œ U µ R, x(t0 ) = x0 , and, depending on the problem, (a) x(t1 ) = x1 , (b) x(t1 ) Ø x1 , (c) x(t1 ) = free, or (d) x(t1 ) Æ x1 is a called an admissible pair. An optimal pair is an admissible pair that maximizes the integral ˆ t1 f (t, x, u) dt, among all admissible pairs. t0 ANALYSIS AND OPTIMIZATION 67 As a motivating example, let’s consider an economy evolving over time, where k = k(t) represents the capital stock, f = f (k) represents production, and s = s(t) represents the fraction of production set aside for investment. Then, (1 ≠ s(t))f (k(t)) represents consumption per unit time. If we wanted to maximize consumption over a given period, we would have the control problem ˆ T max (1 ≠ s)f (k) dt, k,s 0 where k = k(t) is our state and s = s(t) is our control, subject to k̇ = sf (k), s(t) œ [0, 1], k(0) = k0 , and k(T ) Ø kT . Here, k0 represents the amount of stock we start with, and kT represents the amount of stock we want to ensure exists at time T . 68 ANALYSIS AND OPTIMIZATION 18. Lecture 18: Control Theory Cont’d Like a Lagrangian, we introduce an auxiliary function called a Hamiltonian to help us solve control theory problems, also known as optimal control problems. Definition 18.1. For p0 œ R and p = p(t), we define the Hamiltonian H(t, x, u, p) = p0 f (t, x, u) + p(t)g(t, x, u). The function p = p(t) is called the adjoint function. Remark 18.2. We can assume that p0 is either equal to 0 or 1. Indeed, if p0 ”= 0, then divide everything by p0 , and redefine the adjoint function as p/p0 . Theorem 18.3 (The Maximum Principle). Let (xú , uú ) be an optimal pair. Then, there is a continuous function pú and a number p0 œ {0, 1}, i.e, p0 = 0 or p0 = 1, such that 1. (p0 , pú (t)) ”= (0, 0) for all t œ [t0 , t1 ]. 2. uú maximizes H with respect to u, i.e., H(t, xú (t), u, pú (t)) Æ H(t, xú (t), uú (t), pú (t)) for all u œ U. 3. ṗú = ≠ˆx H(t, xú (t), uú (t), pú (t)). 4. pú satisfies the following transversality condition at t = t1 , depending on the problem (a) nothing, (b) pú (t1 ) Ø 0 and pú (t1 ) = 0 if xú (t1 ) > x1 . (c) pú (t1 ) = 0, (d) pú (t1 ) Æ 0 and pú (t1 ) = 0 if xú (t1 ) < x1 . Theorem 18.4 (Mangasarian). Let (xú , uú ) be an admissible pair. Suppose that 1. through 4. of the maximum principle are satisfied for some pú with p0 = 1. If U , the control region, is convex and H(t, x, u, pú (t)) is concave in (x, u) for all t œ [t0 , t1 ], then (xú , uú ) is an optimal pair. Given these two theorems, how do we solve an optimal control problem? Step 0: Make sure the problem is in standard form; typically this just involves turning a minimization problem into a maximization problem. Step 1: (a) Identify the Hamiltonian, and set p0 = 1. (b) For each triplet (t, x, p), maximize H(t, x, u, p) with respect to u over U , and set û = û(t, x, p) to be the maximizer. Step 2: Find particular solutions to the ODEs, for x = x(t) and p = p(t), ẋ = g(t, x, û(t, x, p)) and ṗ = ≠ˆx H(t, x, û(t, x, p), p) using the (1) boundary conditions and (2) the transversality condition to go from the general solution to a particular solution. Call these particular solutions xú and pú . Then, set uú = uú (t) = û(t, xú (t), pú (t)). Step 3: Check the sufficient conditions from Mangasarian hold, i.e., (1) U is convex and (2) H(t, x, u, pú (t)) is concave in (x, u) for all t œ [t0 , t1 ]. If these two things hold, (xú , uú ) is an optimal pair. 
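When the two ODEs in Step 2 cannot be solved in closed form, the state–adjoint system can be handled numerically as a two-point boundary value problem. Below is a minimal sketch using scipy's solve_bvp for a problem of the family treated in the examples that follow (maximize the integral of $-u^2$ subject to $\dot x = u + ax$, $x(0) = 1$, $x(1) = 0$). The choice $a = 1$, the grid, and the zero initial guess are assumptions made only for illustration; for this Hamiltonian the inner maximization gives $u = p/2$, as derived in the example below.

\begin{verbatim}
import numpy as np
from scipy.integrate import solve_bvp

a = 1.0   # arbitrary illustrative choice

# Maximum-principle system: x' = p/2 + a*x (state), p' = -a*p (adjoint),
# with boundary conditions x(0) = 1 and x(1) = 0.
def rhs(t, y):
    x, p = y
    return np.vstack((p / 2 + a * x, -a * p))

def bc(y_left, y_right):
    return np.array([y_left[0] - 1.0, y_right[0]])

t_grid = np.linspace(0.0, 1.0, 11)
y_guess = np.zeros((2, t_grid.size))
sol = solve_bvp(rhs, bc, t_grid, y_guess)

print(sol.status)            # 0 means the solver converged
print(sol.sol(1.0)[0])       # x(1), should be close to 0
print(sol.sol(0.0)[1] / 2)   # u(0) = p(0)/2, comparable with the closed form below
\end{verbatim}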
For example, let’s consider the optimal control problem ˆ 1 min u2 dt x,u 0 ANALYSIS AND OPTIMIZATION 69 subject to ẋ = u + ax, x(0) = 1, x(1) = free, and u œ U = R. Here a œ R is a fixed but arbitrary constant. Step 0: In standard form, our problem is ˆ 1 max ≠u2 dt x,u subject to 0 ẋ = u + ax, x(0) = 1, x(1) = free, and u œ U = R. Step 1: The Hamiltonian with p0 = 1 is H(t, x, u, p) = ≠u2 + p(u + ax). To maximize H(t, x, u, p) with respect to u œ U = R, we note that H(t, x, u, p) is concave in u. So stationary points with respect to u are maxima. Therefore, we compute 0 = ˆu H = ≠2u + p. And we find that p û = . 2 Step 2: Our ODEs are p ẋ = + ax and ṗ = ≠pa. 2 ≠at So p = C1 e . Using the transversality condition, p(1) = 0 (since x(1) = free), we find that C1 = 0. Thus, pú (t) = 0 for all t. Plugging this into the ODE for x, we get the simpler ODE ẋ = ax, which is solved by x = C2 eat . Using the boundary condition, x(0) = 1, we see that C2 = 1. And so, xú (t) = eat . In turn, pú (t) uú (t) = û(t, xú (t), pú (t)) = = 0 for all t. 2 So our candidate solution is (xú , uú ) = (eat , 0). Step 3: Since R is convex, U = R is convex. In addition, H(t, x, u, pú (t)) = ≠u2 , which is concave in (x, u) for all t œ [t0 , t1 ]. So the hypothesis of Mangasarian are satisfied, which implies that our candidate solution is an optimal solution, i.e., (xú , uú ) = (eat , 0) is an optimal pair / maximizing pair. Now, let’s redo the same problem, but with a different terminal boundary condition. Let’s consider the optimal control problem ˆ 1 max ≠u2 dt x,u subject to 0 ẋ = u + ax, x(0) = 1, x(1) = 0, and u œ U = R. 70 ANALYSIS AND OPTIMIZATION Here, again, a œ R is a fixed but arbitrary constant. Step 0: Since the problem is already in standard form, there is nothing to do. Step 1: Notice that Step 1 doesn’t consider the boundary conditions. So Step 1 is the same as before, and we have that û = p/2, again. Step 2: Here is the first place things change. While the ODEs are the same, the particular solutions we get will change because our boundary conditions and, thus, our transversality condition has changed. Recall that our ODEs are p ẋ = + ax and ṗ = ≠pa. 2 So p = C1 e≠at . Using this p, we compute that C1 x(t) = C2 + t if a = 0 2 and C1 ≠at x(t) = C2 eat ≠ e if a ”= 0. 2a Now we use our boundary conditions x(0) = 1 and x(1) = 0. Since we have a prescribed terminal value, there is no transversality condition. From the boundary conditions, we find that C2 = 1 and C1 = ≠2 if a = 0 and 2aea e≠a and C2 = ≠ if a ”= 0. sinh(a) 2 sinh(a) Recall sinh(a) = (ea ≠ e≠a )/2. So C1 = ≠ pú (t) = ≠2 if a = 0 and pú (t) = ≠ 2aea(1≠t) if a ”= 0. sinh(a) Moreover, xú (t) = 1 ≠ t if a = 0 and xú (t) = sinh(a(1 ≠ t)) if a ”= 0. sinh(a) Finally, aea(1≠t) if a ”= 0. sinh(a) Step 3: Since pú has changed, we need to check that H(t, x, u, pú (t)) is still concave in (x, u) for all t œ [t0 , t1 ]. When a = 0, uú (t) = ≠1 if a = 0 and uú (t) = ≠ When a ”= 0, H(t, x, u, pú (t)) = ≠u2 ≠ 2u. 2aea(1≠t) (u + ax). sinh(a) In (x, u) both of these functions are downward pointing parabolas, and so concave. Thus, the hypothesis of Mangasarian are satisfied, which implies that our candidate pair is an optimal pair. H(t, x, u, pú (t)) = ≠u2 ≠