A&O Notes

ANALYSIS AND OPTIMIZATION
1. Lecture 1: Elements of Linear Algebra
Definition 1.1. An m×n matrix is a rectangular array with m rows and n columns:
$$A = (a_{ij})_{m\times n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$
Here aij = the entry in the ith row and jth column. Also, i ∈ {1, . . . , m} and
j ∈ {1, . . . , n}.
Let us recall some basic properties or facts about matrices.
1. Addition: If two matrices A and B have the same size, we can add them.
A + B = (aij + bij)m×n.
2. Multiplication: If A and B are two matrices and if the number of columns
of A matches the number of rows of B, we can multiply them.
$$AB = \Big(\sum_{k=1}^{n} a_{ik} b_{kj}\Big)_{m\times p}$$
where
A = (aij)m×n and B = (bij)n×p.
3. Scalar Multiplication: If A = (aij)m×n is a matrix and c ∈ R is a constant,
then we can (scalar) multiply A by c:
cA = (caij)m×n.
4. Interaction of Basic Operations: If A, B, and C have the appropriate size,
then we have that
i. (AB)C = A(BC);
ii. A(B + C) = AB + AC; and
iii. (A + B)C = AC + BC.
5. Curious Facts: In general,
i. AB ≠ BA. E.g., (aij)2×3 (bij)3×1 = a 2×1 matrix; yet (bij)3×1 (aij)2×3
is not defined;
ii. AB = 0 ⇏ A = 0 or B = 0; and
iii. AB = AC ⇏ B = C.
6. Transpose: Given a matrix A = (aij)m×n we can take its transpose, defined
as
A′ = Aᵗ = Aᵀ = (aji)n×m.
E.g.,
$$A = \begin{pmatrix} 0 & 1 \\ 2 & 3 \\ 4 & 5 \end{pmatrix} \;\Rightarrow\; A^{t} = \begin{pmatrix} 0 & 2 & 4 \\ 1 & 3 & 5 \end{pmatrix}.$$
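To make these rules concrete, here is a small NumPy sketch (not part of the notes; the second matrix is an arbitrary example) checking the basic operations and the failure of commutativity:

```python
import numpy as np

A = np.array([[0, 1], [2, 3], [4, 5]])      # the 3x2 matrix from the example above
B = np.array([[1, 0, 2], [0, 1, 1]])        # an arbitrary 2x3 matrix

print(A.T)            # transpose, a 2x3 matrix
print(A @ B)          # defined: (3x2)(2x3) -> 3x3
print(B @ A)          # also defined here: (2x3)(3x2) -> 2x2, and not equal to A @ B
print(2 * A)          # scalar multiplication, entrywise
print(A + A)          # addition of matrices of the same size
```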
Definition 1.2. A column vector is an m×1 matrix. A row vector is a 1×n
matrix.
Definition 1.3. A set of vectors {a1, . . . , an} is linearly dependent if a set of scalars
{c1, . . . , cn} exists with ci ≠ 0 for some i ∈ {1, . . . , n} such that
$$\sum_{i=1}^{n} c_i a_i = 0.$$
Otherwise, the set {a1, . . . , an} is called linearly independent.
For example,
1. Parallel vectors are linearly dependent, by definition. Indeed, a1 ∥ a2 if
a1 = c a2 for some c ≠ 0.
2. If three vectors live in a plane, then they are linearly dependent. Conversely,
if three vectors are linearly dependent, then they live in a plane.
Remark 1.4. Suppose {a1, . . . , an} is linearly dependent and
c1 a1 + · · · + cn an = 0 with cn ≠ 0.
Then,
$$a_n = -\sum_{i=1}^{n-1} \frac{c_i}{c_n}\, a_i.$$
In other words, an can be written as a linear combination of the other vectors.
Definition 1.5. The span of a set of vectors {a1 , . . . , an } is the collection of all
linear combinations of these vectors.
span{a1, . . . , an} = {c1 a1 + · · · + cn an : ci ∈ R}.
For example,
1. The span of a single vector is the collection of all vectors parallel to that
vector and the 0 vector.
2. If we have two linearly independent vectors, their span is a plane. If we
have two linearly dependent vectors, then they span a line.
Definition 1.6. The rank of a matrix A, rank(A), is the maximal number of linearly
independent column vectors of A.
For example,
1. A = 0 ⇒ rank(A) = 0.
Theorem 1.7. rank(A) = rank(At ).
Corollary 1.8. rank(A) = the maximal number of linearly independent rows of A.
How do we find the rank of a given matrix? Let's try to compute the rank of
$$A = (a_1 \,|\, a_2 \,|\, a_3 \,|\, a_4) = \begin{pmatrix} 2 & 0 & 4 & 1 \\ 1 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}.$$
Here ai, for i = 1, . . . , 4, are the columns of A.
To do this, we need to recall the elementary row operations on matrices. Let
A = (aij)m×n.
1. Add (row i) to c times (row j). This row operation is equivalent to multiplying A on the left by E = (eij)m×m where E has a c in the (i, j)th position,
1s on the diagonal, and 0s everywhere else.
2. Swapping (row i) with (row j). This row operation is equivalent to multiplying A on the left by E = (eij)m×m where E has 1s on all the diagonal
entries except for the (i, i)th and (j, j)th positions, where it has 0s, and has
0s in the remaining positions except for the (i, j)th and (j, i)th positions,
where it has 1s.
3. Multiplication of (row i) by c ≠ 0. This row operation is equivalent to
multiplying A on the left by E = (eij)m×m where E has 1s along the
diagonal entries except for the (i, i)th position, where it has c. The rest of
the positions are 0s.
Theorem 1.9. rank(EA) = rank(A).
So
$$\begin{pmatrix} 2 & 0 & 4 & 1 \\ 1 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}
\xmapsto{r_1 - r_2}
\begin{pmatrix} 1 & -1 & 1 & 1 \\ 1 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}
\xmapsto{r_1 - r_3}
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 1 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}$$
and
$$\begin{pmatrix} 1 & -2 & 0 & 0 \\ 1 & 1 & 3 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}
\xmapsto{r_2 - r_1}
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 0 & 3 & 3 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}
\xmapsto{3r_3 - r_2}
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 0 & 3 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.$$
The last matrix is in echelon form with three nonzero rows. Hence,
rank(A) = 3.
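As a quick numerical check (a sketch, not part of the notes), NumPy computes the same rank directly:

```python
import numpy as np

A = np.array([[2, 0, 4, 1],
              [1, 1, 3, 0],
              [0, 1, 1, 1]])

print(np.linalg.matrix_rank(A))        # 3
print(np.linalg.matrix_rank(A.T))      # 3, illustrating rank(A) = rank(A^t)
```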
2. Lecture 2: Elements of Linear Algebra Cont’d
2.1. Systems of Linear Equations. A system of linear equations takes the form
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm.
Equivalently, or compactly, any such system can be written as
Ax = b
for A = (aij)m×n, b = (bi)m×1, and x = (xj)n×1. Here we think of A and b as given
and x as an unknown.
Given A and b, does an x exist such that Ax = b?
This question is about linear dependence and independence.
Theorem 2.1. Ax = b has a solution if and only if rank(A) = rank(Ab) where
$$A_b = (A \,|\, b) = \begin{pmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & \ddots & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{pmatrix}.$$
In words, a system of linear equations has a (at least one) solution if and only if
b can be written as a linear combination of the columns of A.
Remark 2.2. rank(A) ≤ min{m, n}.
Remark 2.3. rank(Ab) is at most rank(A) + 1 and at least rank(A), i.e., rank(A) ≤
rank(Ab) ≤ rank(A) + 1.
Once we’ve established whether or not a solution exists, by understanding if
rank(Ab ) and rank(A) are the same or different, we can move to understanding if
we have a unique solution or infinitely many solutions. Heuristically, to solve a
system of linear equations, we need enough equations. So we can also ask, assuming
we have a solution, if we have more equations than we need.
Theorem 2.4. Consider a system of linear equations Ax = b. Assume it has a
solution, and suppose that rank(A) = rank(Ab ) = k.
1. If k = m = n, then the solution is unique.
2. If k < n, the number of unknowns, then some n − k variables can be chosen
freely. The remaining k variables are determined uniquely. Thus, we have
infinitely many solutions. The system has n − k degrees of freedom.
3. If k < m, then m − k equations can be thrown away. In this case, we may
have infinitely many solutions or just one unique solution, depending on the
relationship between k and n.
Let's consider the system determined by
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},$$
i.e.,
x1 + x2 + x3 = 0
2x1 + x2 = 1
x1 − x3 = 1.
Observe that
rank(A) = rank(Ab) = 2,
since r3 = r2 − r1. So our system is equivalent to the reduced system
x1 + x2 + x3 = 0
2x1 + x2 = 1,
i.e.,
Âx = b̂
with
$$\hat{A} = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & 0 \end{pmatrix} \quad\text{and}\quad \hat{b} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
(This is an example of part 3. of the theorem.) Notice that x3 is a free variable.
That is, for any x3 ∈ R, we can find a unique pair (x1, x2) such that
x1 + x2 = −x3
2x1 + x2 = 1.
Thus, our system has infinitely many solutions and an extra equation.
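A short NumPy sketch (an illustration, not part of the notes) reproduces this analysis: compare rank(A) with rank(Ab), then solve the reduced system for any chosen value of the free variable x3.

```python
import numpy as np

A = np.array([[1, 1, 1], [2, 1, 0], [1, 0, -1]])
b = np.array([0.0, 1.0, 1.0])
Ab = np.column_stack([A, b])

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(Ab))   # 2 2 -> solvable, one degree of freedom

x3 = 5.0                                                     # pick the free variable arbitrarily
M = np.array([[1.0, 1.0], [2.0, 1.0]])                       # reduced system in (x1, x2)
rhs = np.array([-x3, 1.0])
x1, x2 = np.linalg.solve(M, rhs)
print(x1, x2, np.allclose(A @ np.array([x1, x2, x3]), b))    # each x3 gives a solution
```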
2.2. Determinants and Inverses. Let A be an n×n matrix, i.e., square.
Definition 2.5. An n×n matrix B is the inverse of A if
AB = BA = I = diag(1, . . . , 1).
If an inverse exists, it is unique, and we denote it by A⁻¹.
When does A⁻¹ exist? 1. Well, we can try to compute it, if we knew how. 2.
Recall, from our theorem on systems of linear equations, rank(A) = n implies that a
unique solution exists to any system Ax = b. Hence, explicitly, x = A⁻¹b. So full
rank, square matrices are invertible. 3. We can use the determinant test.
Case 1: n = 2. Let
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$
Then,
$$\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$
Geometrically, |det(A)| = the volume of the parallelogram determined by the rows
of A.
Case 2: n = 3. Let
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
Then,
$$\det(A) = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$
Geometrically, |det(A)| = the volume of the parallelepiped determined by the rows
of A.
Case 3: n > 3.
Definition 2.6. The cofactor Aij of a matrix A is equal to (−1)^(i+j) times the
determinant of the (n−1)×(n−1) matrix found by removing the ith row and jth
column of A.
Then,
$$\det(A) = \sum_{j=1}^{n} a_{ij} A_{ij} = \sum_{i=1}^{n} a_{ij} A_{ij} \quad\text{for all } i, j.$$
This is called the cofactor expansion of the determinant.
For example,
1. A = diag(a1, . . . , an) ⇒ det(A) = a1 a2 · · · an.
2.
$$A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \ddots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \;\Rightarrow\; \det(A) = a_{11} a_{22} \cdots a_{nn}.$$
Matrices of this form are called lower triangular. The transpose of a lower
triangular matrix is upper triangular.
Let us recall some properties of determinants.
1. det(Aᵗ) = det(A).
2. det(AB) = det(A) det(B).
3. Swap two rows ⇒ determinant changes sign.
4. det(cA) = cⁿ det(A).
5. If ai denote the columns of A, then det(a1 | · · · | c ai | · · · | an) = c det(A).
6. If ai denote the columns of A and b is some vector, then
det(a1 | · · · | ai + b | · · · | an) = det(A) + det(a1 | · · · | a_{i−1} | b | a_{i+1} | · · · | an).
7. If the columns (rows) of A are linearly dependent, then det(A) = 0.
8. det(A) ≠ 0 ⇔ rank(A) = n.
9. det(a1 | · · · | ai + c aj | · · · | an) = det(A).
Properties 5. and 6. together say that the det as a function on vectors is
multilinear.
Let's go back to inverses.
Theorem 2.7. A⁻¹ exists if and only if det(A) ≠ 0. Also, det(A⁻¹) = 1/det(A).
If a matrix has 0 determinant, it is called singular.
Let us now recall some properties of inverses.
1. (A⁻¹)⁻¹ = A.
2. (Aᵗ)⁻¹ = (A⁻¹)ᵗ.
3. (AB)⁻¹ = B⁻¹A⁻¹.
4. (cA)⁻¹ = c⁻¹A⁻¹, for c ≠ 0.
Theorem 2.8. A⁻¹ = det(A)⁻¹ (Aij)ᵗ, where Aij is the (i, j)th cofactor of A.
Definition 2.9. Given a square matrix A, the adjoint of A, adj(A), is the transpose
of the matrix of cofactors of A, i.e., (Aij)ᵗ. So that when A is invertible, A⁻¹ =
det(A)⁻¹ adj(A).
If we go back to our original question, we had a question within it: how do we
compute A⁻¹? These two theorems give us an answer. Yet we might ask, is there
another way? Indeed there is. We can use Gauss–Jordan elimination (row reduction)
on the matrix (A | I) to transform it into (I | B). Then, B = A⁻¹.
For example, set
$$A = \begin{pmatrix} 1 & 3 & 3 \\ 1 & 4 & 3 \\ 1 & 3 & 4 \end{pmatrix}.$$
Let us use this direct method to compute A⁻¹.
$$\left(\begin{array}{ccc|ccc} 1 & 3 & 3 & 1 & 0 & 0 \\ 1 & 4 & 3 & 0 & 1 & 0 \\ 1 & 3 & 4 & 0 & 0 & 1 \end{array}\right)
\xmapsto{r_2 - r_1;\; r_3 - r_1}
\left(\begin{array}{ccc|ccc} 1 & 3 & 3 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 & 1 \end{array}\right)$$
$$\xmapsto{r_1 - 3r_2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 3 & 4 & -3 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 & 1 \end{array}\right)
\xmapsto{r_1 - 3r_3}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 7 & -3 & -3 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 & 1 \end{array}\right).$$
Hence,
$$A^{-1} = \begin{pmatrix} 7 & -3 & -3 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix}.$$
Finally, let us compute some cofactors.
$$A_{11} = \begin{vmatrix} 4 & 3 \\ 3 & 4 \end{vmatrix} = 16 - 9 = 7, \qquad
A_{12} = (-1)^{1+2}\begin{vmatrix} 1 & 3 \\ 1 & 4 \end{vmatrix} = -(4 - 3) = -1,$$
and
$$A_{21} = (-1)^{2+1}\begin{vmatrix} 3 & 3 \\ 3 & 4 \end{vmatrix} = -(12 - 9) = -3.$$
If we continue in this way and also compute the determinant of A, we will recover
what we found as A⁻¹ using our theorem above.
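A NumPy sketch (for illustration only) confirms both routes, Gauss–Jordan and the cofactor formula:

```python
import numpy as np

A = np.array([[1, 3, 3], [1, 4, 3], [1, 3, 4]], dtype=float)

A_inv = np.linalg.inv(A)
print(A_inv)                                  # [[ 7 -3 -3], [-1 1 0], [-1 0 1]]
print(np.linalg.det(A))                       # 1.0, so here A^{-1} = adj(A)
print(np.allclose(A @ A_inv, np.eye(3)))      # True
```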
3. Lecture 3: Elements of Linear Algebra Cont’d
3.1. Eigenvalues and Eigenvectors. Let A be an n×n (square) matrix. What
does A represent geometrically? Well, we can always ask, on what does A act and
how does A act? The simplest object A might act on is an n×1 (column) vector,
i.e., an element in Rⁿ. For example, if A = diag(2, 1/2), then A takes every element
in R² and doubles its length in the e1 direction and halves its length in the e2
direction, where ei = the vector with a 1 in the ith position and 0s in all the other
positions. We call the set {ei : i = 1, . . . , n} the canonical basis of Rⁿ. In other
words, for A, the fundamental building blocks are the two vectors e1 and e2 and the
constants λ1 = 2 and λ2 = 1/2. These are examples of eigenvectors and eigenvalues.
Definition 3.1. A nonzero x ∈ Rⁿ (x ≠ 0) is an eigenvector of A if Ax = λx for
some λ ∈ R. The constant λ is then called an eigenvalue of A.
Eigenvalues and eigenvectors are the geometric building blocks of a matrix. For
example, consider an ellipsoid in Rn , which is just a sphere stretched or shrunk in n
different directions. It turns out that we can identify any ellipsoid with a (square)
matrix. Moreover, the directions along which we change a sphere to get our ellipsoid
are the eigenvectors of this matrix, and the amount we stretch or shrink our sphere
in these directions are the eigenvalues of this matrix.
How do we find the eigenvalues and eigenvectors of a given A? From the
definition, we see that we need to solve
(A − λI)x = 0,
i.e., a system of linear equations, which has a nonzero solution if and only if
det(A − λI) = 0.
Notice that, as a function of λ, det(A − λI) is a polynomial of degree n.
Definition 3.2. The polynomial
p(λ) = det(A − λI)
is called the characteristic polynomial of A.
The roots of the characteristic polynomial of A are the eigenvalues of A (there
are n, but they may not be distinct).
For example,
1. Let A = diag(a1, . . . , an). Then, det(A − λI) = (a1 − λ) · · · (an − λ). So
the eigenvalues of A are λi = ai, and corresponding eigenvectors (found by
inspection) are ei respectively: Aei = ai ei.
2. If
$$A = \begin{pmatrix} 2 & 4 & 0 \\ 4 & 8 & 0 \\ 0 & 0 & 10 \end{pmatrix},$$
then to find the eigenvalues of A, we solve
$$0 = \det(A - \lambda I) = \begin{vmatrix} 2-\lambda & 4 & 0 \\ 4 & 8-\lambda & 0 \\ 0 & 0 & 10-\lambda \end{vmatrix} = (10-\lambda)\big[(2-\lambda)(8-\lambda) - 16\big] = -\lambda(10-\lambda)^2.$$
Hence,
λ1 = 0, λ2 = 10, and λ3 = 10.
In this case, we'd prefer to say that A has two eigenvalues λ1 = 0 and
λ2 = 10, and the second eigenvalue has multiplicity two.
Now let's move onto the eigenvectors. Associated to λ1 = 0, we get the
system
0 = (A − 0I)x = Ax,
i.e.,
2x1 + 4x2 = 0
4x1 + 8x2 = 0
x3 = 0
⇔
x1 = −2x2
x3 = 0.
(Note r2 = 2r1 on the left.) Clearly, x2 is a free variable, so an eigenvector
associated to λ1 = 0 is
$$x_1 = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}.$$
Associated to λ2 = λ3 = 10, we get the system
0 = (A − 10I)x,
i.e.,
−8x1 + 4x2 = 0
4x1 − 2x2 = 0
⇔
x2 = 2x1.
So eigenvectors associated to λ2 = λ3 = 10 are
$$x_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \quad\text{and}\quad x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$
We found the last one by inspection.
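Numerically (a NumPy sketch, not from the notes), the same eigenvalues and eigenspaces appear, up to scaling of the eigenvectors:

```python
import numpy as np

A = np.array([[2, 4, 0], [4, 8, 0], [0, 0, 10]], dtype=float)

vals, vecs = np.linalg.eig(A)
print(vals)                         # approximately [0, 10, 10]
print(vecs)                         # columns are eigenvectors (normalized, so scaled versions of ours)
print(A @ vecs - vecs * vals)       # approximately zero: A v = lambda v column by column
```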
Remark 3.3. Eigenvectors are not unique. Indeed, if x is an eigenvector of A, so
is cx for all c ≠ 0 (and cx corresponds to the same eigenvalue as x).
Definition 3.4. The eigenspace associated to a distinct eigenvalue is the span of
all the eigenvectors associated to it.
For example, if we go back to our second example above, the eigenspace
associated to λ1 is span{x1}, and the eigenspace associated to λ2 is span{x2, x3}.
Definition 3.5. The trace of a (square) matrix is given by
tr(A) = a11 + · · · + ann,
i.e., it is the sum of the diagonal entries of A.
Theorem 3.6. Let A have eigenvalues λi, for i = 1, . . . , n. Then,
det(A) = λ1 · · · λn and tr(A) = λ1 + · · · + λn.
3.2. Diagonalization. As I said before, we think of the eigenvalues and eigenvectors
of a matrix as its geometric building blocks.
Can we represent A in terms of its eigenvalues and eigenvectors?
Definition 3.8. A is called diagonalizable if it can be written as A = PΛP⁻¹ for
some invertible matrix P and some diagonal matrix Λ.
Theorem 3.9. An n×n matrix is diagonalizable if and only if it has a set of
n linearly independent eigenvectors. In this case, P = (x1 | · · · | xn) and Λ =
diag(λ1, . . . , λn) where xi is an eigenvector associated to λi.
Definition 3.10. A is symmetric if A = Aᵗ.
Definition 3.11. Two vectors x and y are orthogonal, x ⊥ y, if x · y = xᵗy = 0.
Definition 3.12. A matrix P is orthogonal if P⁻¹ = Pᵗ.
Theorem 3.13. Let A be symmetric.
1. The eigenvalues λi, for i = 1, . . . , n, of A are real, i.e., live in R.
2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
3. An orthogonal matrix P exists such that P⁻¹AP = diag(λ1, . . . , λn). In
particular, the ith column of P is an eigenvector of A corresponding to the
ith eigenvalue λi of A.
For example, let's go back to ellipsoids. In particular, consider the ellipse defined
by
5x1² + 8x1x2 + 5x2² = 1.
This equation is equivalent to
$$x^t A x = 1 \quad\text{for}\quad A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}.$$
Note that the eigenvalues of A are
λ1 = 9 and λ2 = 1,
and some corresponding eigenvectors are
$$x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad\text{and}\quad x_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
Now let us make x1 and x2 unit length, i.e., ‖xi‖ = 1. Recall that the length of a
vector x is
‖x‖ = (x1² + · · · + xn²)^(1/2).
In turn, we can instead consider the eigenvectors
(1/√2) xi for i = 1, 2.
Then, if we set
P = (1/√2)(x1 | x2),
we find a rotation matrix (by angle π/4), i.e., an orthogonal matrix. Hence,
considering
Λ = diag(9, 1),
we find all the pieces of our theorem on the diagonalizability of symmetric matrices:
A = PΛP⁻¹ = PΛPᵗ.
The matrix P describes tilting, while the matrix Λ describes stretching. Note
1 = xᵗAx = xᵗPΛPᵗx = zᵗΛz with z = Pᵗx.
So after we rotate by P (i.e., by π/4), we can think we live in z space as opposed to
x space. In z space, our ellipse is just a stretched circle. If we look at the picture of
xᵗAx = 1, it is a 45 degree rotation of the picture of xᵗx = 1.
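A small NumPy sketch (illustrative only) recovers this decomposition with numpy.linalg.eigh, the routine for symmetric matrices:

```python
import numpy as np

A = np.array([[5.0, 4.0], [4.0, 5.0]])

vals, P = np.linalg.eigh(A)                 # vals ascending: [1, 9]; columns of P are orthonormal eigenvectors
Lam = np.diag(vals)
print(np.allclose(A, P @ Lam @ P.T))        # True: A = P Lambda P^t
print(np.allclose(P.T @ P, np.eye(2)))      # True: P is orthogonal
```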
3.3. Quadratic Forms. Our ellipse is also the 1-level set of a quadratic form.
Definition 3.14. A quadratic form is a function Q(x) = xᵗAx for some matrix A.
Remark 3.15. In the definition of quadratic form, we can assume A is symmetric.
Indeed, since xᵗAx is a scalar, xᵗAx = (xᵗAx)ᵗ = xᵗAᵗx. So
(1/2) xᵗ(A − Aᵗ)x = (1/2)(xᵗAx − xᵗAᵗx) = 0.
Thus, the skew-symmetric part (1/2)(A − Aᵗ) of A does not contribute to the form.
What is left over is the symmetric part (1/2)(A + Aᵗ) of A. Note
A = (1/2)(A − Aᵗ) + (1/2)(A + Aᵗ).
Definition 3.16. Let A = (aij)n×n.
- A is positive definite if xᵗAx > 0 for all x ≠ 0.
- A is positive semidefinite if xᵗAx ≥ 0 for all x ≠ 0.
- A is negative definite if xᵗAx < 0 for all x ≠ 0.
- A is negative semidefinite if xᵗAx ≤ 0 for all x ≠ 0.
- Otherwise, we call A indefinite.
Definition 3.17. Let Q = Q(x) be a quadratic form, for x ∈ Rⁿ.
- Q is positive definite if Q(x) > 0 for all x ≠ 0.
- Q is positive semidefinite if Q(x) ≥ 0 for all x ≠ 0.
- Q is negative definite if Q(x) < 0 for all x ≠ 0.
- Q is negative semidefinite if Q(x) ≤ 0 for all x ≠ 0.
- Otherwise, we call Q indefinite: Q(y) < 0 and Q(z) > 0 for some y, z ∈ Rⁿ.
For example,
1. Q(x) = 5x1² + 8x1x2 + 5x2² is positive definite, as we saw.
2. Q(x) = 4x1² + 9x2² is positive definite.
3. Q(x) = 4x1² − 12x1x2 + 9x2² = (2x1 − 3x2)² is positive semidefinite.
4. Q(x) = 4x1² − 9x2² is indefinite.
4. Lecture 4: Elements of Linear Algebra Cont’d and Multivariable
Calculus
4.1. Quadratic Forms Cont’d. How do we determine definiteness?
Theorem 4.1. Let Q be a quadratic form determined by the symmetric matrix
A = (aij)n×n, and let λi, for i = 1, . . . , n, be the eigenvalues of A. Then,
- Q is positive definite ⇔ λi > 0 for all i = 1, . . . , n.
- Q is positive semidefinite ⇔ λi ≥ 0 for all i = 1, . . . , n.
- Q is negative definite ⇔ λi < 0 for all i = 1, . . . , n.
- Q is negative semidefinite ⇔ λi ≤ 0 for all i = 1, . . . , n.
- Q is indefinite ⇔ λi > 0 and λj < 0 for some i ≠ j.
In the definition of a quadratic form, we considered xᵗAx for any matrix A. When
determining the definiteness of a given quadratic form in terms of its defining matrix,
we always have to consider the symmetric matrix which defines it: (1/2)(Aᵗ + A). Why?
One reason is, in general, non-symmetric matrices may have complex eigenvalues, which
do not have a sign. So we cannot talk about their definiteness from looking at the
signs of their eigenvalues.
Let's look at a simple example. Consider the quadratic form
Q(x) = x1² + x2² + 2x1x2.
Two matrices to consider are
$$A_1 = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad A_2 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$
These both determine Q. Let's look at the characteristic polynomials of A1 and A2:
pA1(λ) = (1 − λ)² while pA2(λ) = (1 − λ)² − 1 = λ(λ − 2).
These are different and have different roots. A1 has eigenvalues 1 and 1, while A2
has eigenvalues 0 and 2. Even though pA1 has real roots and both are positive, A1
is not positive definite. Indeed,
(−1, 1)ᵗ A1 (−1, 1) = Q(−1, 1) = 1 + 1 + 2(−1)(1) = 0,
but (−1, 1) is not the zero vector. So that, by the definition of positive definiteness
of a quadratic form (or a matrix), Q (or A1) cannot be positive definite. Clearly,
the eigenvalues of A1 cannot then be used to determine the definiteness of Q.
For example, let f = f(x1, x2) be a smooth function on R². If we consider the
Taylor expansion of f at z:
f(x) = f(z) + ∇f(z) · (x − z) + (1/2)(x − z)ᵗ D²f(z)(x − z) + · · ·
we see that the second order part
(1/2)(x − z)ᵗ D²f(z)(x − z)
is a quadratic form. Here D²f is the Hessian (matrix) of f:
$$D^2 f = \begin{pmatrix} \partial_{11} f & \partial_{12} f \\ \partial_{21} f & \partial_{22} f \end{pmatrix}.$$
Determining the definiteness of D2 f (z) can help us understand if z is a local
maximum, a local minimum, or a saddle point of f .
Definition 4.2. Given an m×n matrix A, the leading principal minors of A, Di
for i = 1, . . . , k = min{m, n}, are defined as the determinant of the i×i submatrix
of A determined by the first i rows and columns. Explicitly,
$$D_1 = a_{11}, \quad D_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad D_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}, \quad \ldots$$
Note that the leading principal minors are defined for all matrices, not just square
ones. In the square case, k = m = n and Dn = det(A).
Theorem 4.3. Let Q be a quadratic form.
- Q is positive definite if and only if Di > 0 for all i = 1, . . . , n.
- Q is negative definite if and only if (−1)ⁱ Di > 0 for all i = 1, . . . , n.
For example, let
Q(x) = 5x1² + 2x1x3 + 2x2² + 2x2x3 + 4x3²,
to which we have the defining matrix
$$A = \begin{pmatrix} 5 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 4 \end{pmatrix}.$$
Observe that
$$D_1 = 5 > 0, \quad D_2 = \begin{vmatrix} 5 & 0 \\ 0 & 2 \end{vmatrix} = 10 > 0, \quad\text{and}\quad D_3 = \det(A) = 40 - 2 - 5 = 33 > 0.$$
Hence, Q (or equivalently A) is positive definite.
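Both tests, eigenvalues (Theorem 4.1) and leading principal minors (Theorem 4.3), are easy to run numerically; here is a NumPy sketch (illustrative, not part of the notes):

```python
import numpy as np

A = np.array([[5.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 4.0]])

print(np.linalg.eigvalsh(A))                                   # all eigenvalues > 0
minors = [np.linalg.det(A[:i, :i]) for i in range(1, 4)]       # leading principal minors D1, D2, D3
print(minors)                                                  # approximately [5, 10, 33]
print(all(d > 0 for d in minors))                              # True -> positive definite
```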
4.2. Differentiation. We start by recalling some notation and definitions.
Definition 4.4. If f : Rⁿ → R is a differentiable function, we can define the
directional derivative of f in a direction e by the limit
∂_e f(x) = lim_{t→0⁺} (f(x + te) − f(x))/t.
It is also equal to the derivative (d/ds) f(x + se)|_{s=0}. If e = ei, one of the canonical (or
standard) basis vectors of Rⁿ, then we will often write ∂_i f(x) instead of ∂_{e_i} f(x);
i.e., we set
∂_i f(x) = ∂_{e_i} f(x).
Definition 4.5. The gradient of a differentiable function f : Rⁿ → R is the vector
that collects all the directional derivatives of f in the standard directions:
∇f(x) = (∂_1 f(x), . . . , ∂_n f(x)).
Note that
∇f(x) · e = ∂_e f(x).
Clearly, then
|∂_e f| = |∇f · e| = ‖∇f‖ ‖e‖ |cos θ|.
This is maximized when the angle between ∇f and e is 0, i.e., when ∇f points in
the same direction as e. So ∇f is the direction of maximal increase of f.
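As an illustration (a sketch with an arbitrary f, not taken from the notes), we can compare a finite-difference directional derivative with ∇f · e:

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]          # an arbitrary smooth function

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([1.0, 2.0])
e = np.array([1.0, 1.0]) / np.sqrt(2.0)          # a unit direction

t = 1e-6
directional = (f(x + t * e) - f(x)) / t          # finite-difference approximation of the directional derivative
print(directional, grad_f(x) @ e)                # both approximately (8 + 3)/sqrt(2)
```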
Definition 4.6. The c-level set of f is the set {x : f (x) = c}.
Let n = 2, and suppose that the c-level set of f is the graph of some function of one
variable in a small neighborhood around z ∈ {x : f(x) = c}: for all x1 ∈ (z1 − ε, z1 + ε)
(with ε > 0),
f(x1, γ(x1)) = c.
Then, in particular,
∇f(z) · (1, γ′(z1)) = ∂_1 f(z1, γ(z1)) + ∂_2 f(z1, γ(z1)) γ′(z1) = 0.
And so, provided ∂_2 f(z) ≠ 0,
γ′(z1) = − ∂_1 f(z1, γ(z1)) / ∂_2 f(z1, γ(z1)).
In other words, this specific ratio of the components of the gradient of f at z gives
us the perpendicular slope to the slope of the tangent line to γ at z1. Recalling the
general equation for a line through a point (y − y0 = m(x − x0)), we see that
∂_1 f(z)(x1 − z1) + ∂_2 f(z)(x2 − z2) = ∂_1 f(z)(x1 − z1) + ∂_2 f(z)(x2 − γ(z1)) = 0.
In turn, the tangent plane to the c-level set of f at z is
{x : ∇f(z) · (x − z) = 0}.
Geometrically, we have that ∇f(z) for z ∈ {x : f(x) = c} is orthogonal to the
tangent plane to the c-level set of f at z.
An important theorem is the mean value theorem:
Theorem 4.7. Suppose that f : Rⁿ → R is C¹. Then,
f(x) − f(y) = ∇f(tx + (1 − t)y) · (x − y)
for some t ∈ (0, 1).
Definition 4.8. Let f : Rⁿ → R be a twice differentiable function. The Hessian of
f is the matrix of second derivatives of f: D²f(x) = (∂_{ij} f(x))n×n.
Theorem 4.9. If f : Rⁿ → R is C², then D²f(x) is symmetric.
For example, let f : R² → R be defined by
$$f(x) = \begin{cases} \dfrac{x_1 x_2 (x_1^2 - x_2^2)}{x_1^2 + x_2^2} & \text{for } x \neq 0 \\[4pt] 0 & \text{for } x = 0. \end{cases}$$
Note that ∂_{12} f(0) ≠ ∂_{21} f(0).
Definition 4.10. Let f : Rⁿ → Rᵐ be a differentiable function, i.e., f =
(f1, . . . , fm) for differentiable functions fi : Rⁿ → R with i = 1, . . . , m. The
differential (or gradient) of f is the m×n matrix whose ith row is the gradient of
fi:
$$Df(x) = \begin{pmatrix} \nabla f_1(x) \\ \vdots \\ \nabla f_m(x) \end{pmatrix} = \begin{pmatrix} \partial_1 f_1(x) & \cdots & \partial_n f_1(x) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(x) & \cdots & \partial_n f_m(x) \end{pmatrix}.$$
5. Lecture 5: Elements of Multivariable Calculus Cont’d
5.1. Convexity (and Concavity).
Definition 5.1. A set S is convex if tx + (1 − t)y ∈ S for all x, y ∈ S and for all
t ∈ [0, 1].
For example,
1. {x : p · x = c} for some p ∈ Rⁿ and c ∈ R.
2. {x : p · x < c} for some p ∈ Rⁿ and c ∈ R.
3. {x : ‖x − z‖ ≤ r} for some r ≥ 0 and some z ∈ Rⁿ.
4. The empty set is convex.
5. Rⁿ is convex.
Let's show that 2. holds: Let z, y ∈ {x : p · x < c}. We need to check that
tz + (1 − t)y is in {x : p · x < c} for all t ∈ [0, 1]. To that end, observe (by the
linearity of the dot product)
p · (tz + (1 − t)y) = p · tz + p · (1 − t)y = t p · z + (1 − t) p · y < tc + (1 − t)c = c
for any t ∈ [0, 1]. And so, tz + (1 − t)y ∈ {x : p · x < c} for any t ∈ [0, 1], as desired.
Theorem 5.2. The intersection of two convex sets is convex.
For example, let S = {x : Ax ≤ b} = ∩_{i=1}^m Si where Si = {x : ai · x ≤ bi}. Here
ai is the ith row of A. Since each Si is convex, S is convex.
Remark 5.3. The union of two convex sets may not be convex.
Definition 5.4. Let S ⊂ Rⁿ be convex and let f : S → R.
- f is convex if
t f(x) + (1 − t) f(y) ≥ f(tx + (1 − t)y)
for all x, y ∈ S and for all t ∈ [0, 1].
- f : S → R is strictly convex if we replace ≥ with > above.
- f : S → R is concave if we replace ≥ with ≤ above.
- f : S → R is strictly concave if we replace ≥ with < above.
For example,
1. f(x) = a · x + b is both convex and concave.
2. f(x) = ‖x‖ is convex, by the triangle inequality, but not strictly convex.
3. Let S ⊂ Rⁿ be convex. The epigraph of f, i.e., {(x, y) ∈ S × R : f(x) ≤ y},
is convex if and only if f is convex.
4. Let S ⊂ Rⁿ be convex. The hypograph of f, i.e., {(x, y) ∈ S × R : f(x) ≥ y},
is convex if and only if f is concave.
5. The max of two convex functions is convex.
6. The min of two concave functions is concave.
Note that f is (strictly) convex if and only if −f is (strictly) concave.
Theorem 5.5. Let S ⊂ Rⁿ be open and convex and f : S → R be a C¹ function.
Then, f is convex if and only if
f(x) ≥ f(z) + ∇f(z) · (x − z)
for all x, z ∈ S. Similarly, f is strictly convex if we have the same statement with >
replacing ≥ (for x ≠ z).
Theorem 5.6. Let S ⊂ Rⁿ be open and convex and f : S → R be a C² function.
- D²f(x) is positive semidefinite for all x ∈ S if and only if f is convex.
- If D²f(x) is positive definite for all x ∈ S, then f is strictly convex.
- D²f(x) is negative semidefinite for all x ∈ S if and only if f is concave.
- If D²f(x) is negative definite for all x ∈ S, then f is strictly concave.
For example, positive definite quadratic forms are strictly convex.
Remark 5.7. The converse statements in this theorem are not true: f(x) = x⁴ is
strictly convex, yet f″(0) = 0.
Theorem 5.8. Nonnegative linear combinations of convex (concave) functions are
convex (concave). In particular, let S ⊂ Rⁿ be convex and fi : S → R be convex
(concave) for i ∈ {1, . . . , k}. Then f = a1f1 + · · · + akfk is convex (concave) if ai ≥ 0
for all i = 1, . . . , k.
Theorem 5.9. Let S ⊂ Rⁿ be convex, f : S → R, and g : R → R.
- If f is convex and g is convex and increasing, then g ∘ f is convex.
- If f is convex and g is concave and decreasing, then g ∘ f is concave.
- If f is concave and g is convex and decreasing, then g ∘ f is convex.
- If f is concave and g is concave and increasing, then g ∘ f is concave.
An important inequality is Jensen's inequality.
Theorem 5.10. A function f is convex on a convex subset S of Rⁿ if and only if
f(t1x1 + · · · + tkxk) ≤ t1f(x1) + · · · + tkf(xk)
for all {x1, . . . , xk} ⊂ S, k ∈ N, and t1, . . . , tk ≥ 0 such that t1 + · · · + tk = 1.
Definition 5.11. A convex combination of a set of points {x1, . . . , xk} is the point
t1x1 + · · · + tkxk
for a set {t1, . . . , tk} such that
ti ≥ 0 for all i ∈ {1, . . . , k} and t1 + · · · + tk = 1.
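A quick numerical illustration of Jensen's inequality (a sketch with an arbitrary convex function, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sum(x**2)                       # a convex function on R^2

xs = rng.normal(size=(5, 2))                  # five arbitrary points in R^2
t = rng.random(5)
t = t / t.sum()                               # convex-combination weights: t_i >= 0, sum = 1

lhs = f(t @ xs)                               # f(t1 x1 + ... + tk xk)
rhs = sum(ti * f(xi) for ti, xi in zip(t, xs))
print(lhs <= rhs + 1e-12)                     # True, as Jensen's inequality predicts
```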
6. Lecture 6: Elements of Multivariable Calculus Cont’d
6.1. Taylor's Theorem. Oftentimes it is useful to be able to approximate a
function at or near a given point. Taylor's theorem gives us a way to do this.
Theorem 6.1. Let f : R → R be k times differentiable at a point z. Then, for
some t ∈ (0, 1),
$$f(x) = \sum_{i=1}^{k} \frac{1}{(i-1)!} f^{(i-1)}(z)(x - z)^{i-1} + \frac{1}{k!} f^{(k)}\big((1-t)x + tz\big)(x - z)^{k}.$$
In addition, if the kth order derivative of f is continuous at z, then
$$f(x) = \sum_{i=1}^{k+1} \frac{1}{(i-1)!} f^{(i-1)}(z)(x - z)^{i-1} + R_k(x, z)$$
where
R_k(x, z)/|x − z|ᵏ → 0 as x → z.
For example, let f(x) = eˣ; then, Taylor expanding at 0,
f(x) = f(0) + f′(0)x + (1/2)f″(0)x² + (1/6)f‴(0)x³ + · · ·
     = 1 + x + (1/2)x² + (1/6)x³ + · · · .
There are other versions of Taylor’s theorem as well as higher dimensional
analogues of Taylor’s theorem. For now, we will only state one higher dimensional
analogue, when k = 2.
Theorem 6.2. Let f : Rⁿ → R be twice differentiable at a point z. Then, for some
t ∈ (0, 1),
f(x) = f(z) + ∇f(z) · (x − z) + (1/2)(x − z)ᵗ D²f((1 − t)x + tz)(x − z).
In addition, if the second derivatives of f are continuous at z, then
f(x) = f(z) + ∇f(z) · (x − z) + (1/2)(x − z)ᵗ D²f(z)(x − z) + R₂(x, z)
where
R₂(x, z)/‖x − z‖² → 0 as x → z.
Definition 6.3. The polynomials
T_{1,z}(x) = f(z) + ∇f(z) · (x − z)
and
T_{2,z}(x) = f(z) + ∇f(z) · (x − z) + (1/2)(x − z)ᵗ D²f(z)(x − z)
are called the first and second order Taylor polynomials of f centered at z, respectively.
For example, let f : R⁵ → R² be defined by
$$f(x) = \begin{pmatrix} x_1 x_2^2 + x_1 x_3 x_4 + x_2 x_5^2 - 3x_1 \\ x_4^3 x_2 x_3 + 2x_1 x_4 - x_4 x_5^2 - 2x_5 \end{pmatrix}.$$
Then,
$$Df(x) = \begin{pmatrix} x_2^2 + x_3 x_4 - 3 & 2x_1 x_2 + x_5^2 & x_1 x_4 & x_1 x_3 & 2x_2 x_5 \\ 2x_4 & x_4^3 x_3 & x_4^3 x_2 & 3x_4^2 x_2 x_3 + 2x_1 - x_5^2 & -2x_4 x_5 - 2 \end{pmatrix}.$$
Using Taylor's theorem at z = 0 and x = (0.1, 0.1, 0, 0, 0.1), we can approximate
f at x:
$$f(x) \sim f(0) + Df(0)x = \begin{pmatrix} 0 \\ 0 \end{pmatrix} + \begin{pmatrix} -3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -2 \end{pmatrix} x = \begin{pmatrix} -0.3 \\ -0.2 \end{pmatrix}.$$
Note that
$$f(x) = \begin{pmatrix} 0.002 - 0.3 \\ -0.2 \end{pmatrix} = \begin{pmatrix} -0.298 \\ -0.2 \end{pmatrix}.$$
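The comparison is easy to reproduce (a NumPy sketch, not part of the notes):

```python
import numpy as np

def f(x):
    x1, x2, x3, x4, x5 = x
    return np.array([x1*x2**2 + x1*x3*x4 + x2*x5**2 - 3*x1,
                     x4**3*x2*x3 + 2*x1*x4 - x4*x5**2 - 2*x5])

Df0 = np.array([[-3.0, 0.0, 0.0, 0.0, 0.0],
                [ 0.0, 0.0, 0.0, 0.0, -2.0]])     # Df evaluated at z = 0

x = np.array([0.1, 0.1, 0.0, 0.0, 0.1])
print(Df0 @ x)      # first order approximation: [-0.3, -0.2]
print(f(x))         # actual value: [-0.298, -0.2]
```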
6.2. The Implicit Function Theorem. Suppose we have the equation f(x, y) = 0
where x is given and y is unknown. When can we solve this? We have already
understood the answer in the linear case: when f(x, y) = Ay − x for some matrix
A. Indeed, if we first rename x, and call it b, and second rename y, and call it x, we
have Ax = b, our familiar system of linear equations. But what happens when f is
nonlinear?
Theorem 6.4. Let f = f(x, y) : R² → R be a C¹ function in some neighborhood
of (x0, y0). Suppose that f(x0, y0) = 0. If ∂_y f(x0, y0) = ∂_2 f(x0, y0) ≠ 0, then two
intervals I1 = (x0 − δ, x0 + δ) and I2 = (y0 − ε, y0 + ε) exist such that for every
x ∈ I1, the equation f(x, y) = 0 has a unique solution in I2: this defines y as a
function of x in I1, i.e., y = φ(x) in I1. Moreover, φ is C¹ in I1 and
φ′(x) = − ∂_1 f(x, φ(x)) / ∂_2 f(x, φ(x)) = − ∂_x f(x, φ(x)) / ∂_y f(x, φ(x))
for all x ∈ I1.
For example, let f(x, y) = y³ + x² − 3xy − 7 and consider the equation f(x, y) = 0.
Note
f(4, 3) = 0 and ∂_2 f(4, 3) = ∂_y f(4, 3) = (3y² − 3x)|_{(x,y)=(4,3)} = 15 ≠ 0.
So we can indeed solve for y near (4, 3). If we apply our theorem, we see that
y′(x) = − ∂_1 f(x, y) / ∂_2 f(x, y) = (3y − 2x) / (3(y² − x)).
At (x, y) = (4, 3), we then find
y′(4) = 1/15.
This is wonderful, because, by Taylor's theorem, we can at least approximate
y = y(x) near x = 4. Indeed,
y − 3 ∼ (1/15)(x − 4) ⇔ y ∼ (1/15)x + 41/15.
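We can sanity-check this numerically (a sketch, not from the notes) by solving f(x, y) = 0 for y at two nearby values of x and differencing:

```python
import numpy as np

def f(x, y):
    return y**3 + x**2 - 3*x*y - 7

def solve_for_y(x, y0=3.0, iters=50):
    # Newton's method in y, starting near the known solution y = 3
    y = y0
    for _ in range(iters):
        y -= f(x, y) / (3*y**2 - 3*x)
    return y

h = 1e-5
dydx = (solve_for_y(4 + h) - solve_for_y(4 - h)) / (2*h)
print(dydx)          # approximately 1/15 = 0.0667
```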
Theorem 6.5. Let f = (f1, . . . , fm) be a C¹ function of (x, y) ∈ Rⁿ × Rᵐ = Rⁿ⁺ᵐ
in a neighborhood of (x0, y0). Suppose that f(x0, y0) = 0. If the Jacobian determinant
of f with respect to y is ≠ 0 at (x0, y0), then we can find neighborhoods B1 and B2
around x0 and y0 in Rⁿ and Rᵐ respectively such that the Jacobian determinant of
f with respect to y is ≠ 0 for all (x, y) ∈ B1 × B2. Moreover, for every x ∈ B1, a
unique y ∈ B2 exists such that f(x, y) = 0. In this way, y is a function of x in B1.
In particular, y is C¹ in B1 and the differential of y (as a function of x) is
Dy(x) = −(D_y f(x, y))⁻¹ D_x f(x, y).
Recall that
$$Df(z) = \begin{pmatrix} \nabla f_1(z) \\ \vdots \\ \nabla f_m(z) \end{pmatrix} = \begin{pmatrix} \partial_1 f_1(z) & \cdots & \partial_{n+m} f_1(z) \\ \vdots & \ddots & \vdots \\ \partial_1 f_m(z) & \cdots & \partial_{n+m} f_m(z) \end{pmatrix}.$$
So, letting z = (x, y) ∈ Rⁿ⁺ᵐ,
Df(x, y) = (D_x f(x, y) | D_y f(x, y))_{m×(m+n)}
where D_x f(x, y) is an m×n matrix and D_y f(x, y) is an m×m matrix.
Notice that if we differentiate f(x, y(x)) = 0 in x, we find that
D_x f(x, y) + D_y f(x, y) D_x y(x) = 0.
So if det(D_y f(x, y)) ≠ 0, then D_y f(x, y) is invertible, and
Dy(x) = D_x y(x) = −(D_y f(x, y))⁻¹ D_x f(x, y).
Definition 6.6. Let f : Rⁿ⁺ᵐ → Rᵐ be differentiable. The Jacobian determinant
of f with respect to y is det(D_y f(x, y)).
For example, let f : R⁵ → R² be defined by
$$f(x, y) = \begin{pmatrix} x_1 x_2^2 + x_1 x_3 y_1 + x_2 y_2^2 - 3 \\ y_1^3 x_2 x_3 + 2x_1 y_2 - y_1 y_2^2 - 2 \end{pmatrix}.$$
Then,
Df(x, y) = (D_x f(x, y) | D_y f(x, y))_{m×(m+n)}
where
$$D_x f(x, y) = \begin{pmatrix} x_2^2 + x_3 y_1 & 2x_1 x_2 + y_2^2 & x_1 y_1 \\ 2y_2 & y_1^3 x_3 & y_1^3 x_2 \end{pmatrix}$$
and
$$D_y f(x, y) = \begin{pmatrix} x_1 x_3 & 2x_2 y_2 \\ 3y_1^2 x_2 x_3 - y_2^2 & 2x_1 - 2y_1 y_2 \end{pmatrix}.$$
Note that f(1, 1, 1, 1, 1) = 0. The Jacobian determinant of f with respect to y is
$$\det(D_y f(x, y)) = \begin{vmatrix} x_1 x_3 & 2x_2 y_2 \\ 3y_1^2 x_2 x_3 - y_2^2 & 2x_1 - 2y_1 y_2 \end{vmatrix},$$
which at (1, 1, 1, 1, 1) is
$$\begin{vmatrix} 1 & 2 \\ 2 & 0 \end{vmatrix} = -4.$$
So, again, we can indeed solve for y as a function of x near (1, 1, 1, 1, 1), by the
Implicit Function Theorem. Also, by our theorem,
$$Dy(1, 1, 1, 1, 1) = -\begin{pmatrix} 1 & 2 \\ 2 & 0 \end{pmatrix}^{-1}\begin{pmatrix} 2 & 3 & 1 \\ 2 & 1 & 1 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} -4 & -2 & -2 \\ -2 & -5 & -1 \end{pmatrix}.$$
And, near (1, 1, 1, 1, 1),
$$\begin{pmatrix} y_1 - 1 \\ y_2 - 1 \end{pmatrix} \sim -\frac{1}{4}\begin{pmatrix} 4 & 2 & 2 \\ 2 & 5 & 1 \end{pmatrix}\begin{pmatrix} x_1 - 1 \\ x_2 - 1 \\ x_3 - 1 \end{pmatrix}.$$
In particular, we can approximate y = y(x) near x = (1, 1, 1).
6.3. The Inverse Function Theorem.
Definition 6.7. Let f : A → B be a function with A ⊂ Rⁿ and B ⊂ Rᵐ.
- f is injective (one-to-one) if f(x) = f(z) implies that x = z.
- f is surjective (onto) if for every y ∈ B, there is an x ∈ A such that f(x) = y.
- f is bijective if it is injective and surjective. In this case, f is invertible, and
the inverse f⁻¹ : B → A of f satisfies f(f⁻¹(y)) = y and f⁻¹(f(x)) = x.
Theorem 6.8. Let f : Rⁿ → Rⁿ be C¹ near a point x0, and suppose that
det Df(x0) ≠ 0, i.e., the Jacobian determinant of f at x0 is nonzero. Then,
f⁻¹ exists in a neighborhood B around y0 and
Df⁻¹(y0) = Df(x0)⁻¹ where f(x0) = y0.
If f is Cᵏ near x0, then f⁻¹ is Cᵏ in the image B = f(A), for some neighborhood
A of x0.
For example, let f : R² → R² be defined by
$$f(x) = \begin{pmatrix} x_1 + x_2^2 - 2 \\ x_1^3 + x_2 - x_1 x_2^2 - 1 \end{pmatrix}.$$
Note that f(1, 1) = (0, 0) and
$$Df(x) = \begin{pmatrix} 1 & 2x_2 \\ 3x_1^2 - x_2^2 & 1 - 2x_1 x_2 \end{pmatrix}.$$
Hence,
det(Df(1, 1)) = 1(−1) − 2(2) = −5 ≠ 0,
and, by the Inverse Function Theorem, f is invertible in a neighborhood of (1, 1).
Moreover,
$$Df^{-1}(0, 0) = \begin{pmatrix} 1 & 2 \\ 2 & -1 \end{pmatrix}^{-1} = \begin{pmatrix} 1/5 & 2/5 \\ 2/5 & -1/5 \end{pmatrix}.$$
We may not know the exact form of the inverse, but we do know that g(y) =
Df⁻¹(0, 0)y is a good approximation of f⁻¹ near 0.
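Numerically (a sketch, not part of the notes), Newton's method inverts f near (1, 1), and the Jacobian of the inverse matches Df(1, 1)⁻¹:

```python
import numpy as np

def f(x):
    x1, x2 = x
    return np.array([x1 + x2**2 - 2, x1**3 + x2 - x1*x2**2 - 1])

def Df(x):
    x1, x2 = x
    return np.array([[1.0, 2*x2], [3*x1**2 - x2**2, 1 - 2*x1*x2]])

def f_inv(y, x0=np.array([1.0, 1.0]), iters=20):
    # Newton's method: solve f(x) = y starting near the known point (1, 1)
    x = x0.copy()
    for _ in range(iters):
        x = x - np.linalg.solve(Df(x), f(x) - y)
    return x

y = np.array([0.01, -0.02])                      # a point near f(1, 1) = (0, 0)
print(f(f_inv(y)), y)                            # agree
print(np.linalg.inv(Df(np.array([1.0, 1.0]))))   # [[0.2, 0.4], [0.4, -0.2]]
```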
7. Lecture 7: Unconstrained Optimization
7.1. Extreme and Stationary Points.
Definition 7.1. Let f : A ⊂ Rⁿ → R.
- If f(x0) ≥ f(x) for all x ∈ A, then x0 is a global maximum for f.
- If f(x0) ≤ f(x) for all x ∈ A, then x0 is a global minimum for f.
- If f(x0) ≥ f(x) for all x ∈ B ⊂ A, then x0 is a local maximum for f.
- If f(x0) ≤ f(x) for all x ∈ B ⊂ A, then x0 is a local minimum for f.
Collectively, maxima and minima are called extreme points.
Theorem 7.2. Let f : A ⊂ Rⁿ → R be differentiable. If x0 ∈ A is an extreme
point, then ∇f(x0) = 0.
Definition 7.3. Let f : Rⁿ → R be differentiable. If ∇f(x0) = 0, then x0 is called
a stationary point of f.
Stationary points may not be extreme points: (0, 0) for f(x, y) = x² − y² is stationary but not extreme. Extreme points are stationary provided f is differentiable.
Non-differentiable functions can have extreme points that are not stationary: 0 for
f(x) = |x| is an extreme point, but not a stationary point.
Definition 7.4. A stationary but non-extreme point is called a saddle point.
Theorem 7.5. Let S ⊂ Rⁿ be an open convex set, and x0 ∈ S. Let f : S → R be
C¹ in a neighborhood of x0.
- Suppose f is convex. Then, x0 is a minimum point for f in S if and only
if x0 is a stationary point (for f).
- Suppose f is concave. Then, x0 is a maximum point for f in S if and only
if x0 is a stationary point (for f).
Theorem 7.6. Let f : A ⊂ Rⁿ → R. If f is continuous and A is closed and
bounded, then f achieves a maximum and minimum point on A.
Theorem 7.7. Let f : Rⁿ → R be twice differentiable in a neighborhood around x0,
and suppose that ∇f(x0) = 0, i.e., x0 is a stationary point of f.
- If D²f(x0) is positive definite, then x0 is a local minimum of f.
- If D²f(x0) is negative definite, then x0 is a local maximum of f.
- If det(D²f(x0)) ≠ 0 and D²f(x0) is neither positive definite nor negative
definite, then x0 is a saddle point of f.
- If x0 is a local minimum of f, then D²f(x0) is positive semidefinite.
- If x0 is a local maximum of f, then D²f(x0) is negative semidefinite.
These cases are not all the possibilities. Remember the function f(x) = x³ in R:
f′(0) = 0, f″(0) = 0, and 0 is a saddle point of f.
For example, let
f(x, y) = 3x²y + y³ − 3x² − 3y² + 2.
Let's find and classify all stationary points. A stationary point satisfies
0 = ∇f(x, y) = (6xy − 6x, 3x² + 3y² − 6y).
So we have the equations
6x(y − 1) = 0 and 3(x² + y² − 2y) = 0.
In turn, there are a few cases. From the first equation, we see that either x = 0 or
y = 1. Inputting this information into the second equation, we find the following
stationary points: 1. (x, y) = (0, 0), 2. (x, y) = (0, 2), 3. (x, y) = (1, 1), and 4.
(x, y) = (−1, 1). Now we compute the Hessian of f at these four points. Note, in
general,
$$D^2 f(x, y) = \begin{pmatrix} 6y - 6 & 6x \\ 6x & 6y - 6 \end{pmatrix}.$$
Hence, at our four stationary points:
$$D^2 f(0, 0) = \begin{pmatrix} -6 & 0 \\ 0 & -6 \end{pmatrix}, \quad D^2 f(0, 2) = \begin{pmatrix} 6 & 0 \\ 0 & 6 \end{pmatrix}, \quad D^2 f(1, 1) = \begin{pmatrix} 0 & 6 \\ 6 & 0 \end{pmatrix},$$
and
$$D^2 f(-1, 1) = \begin{pmatrix} 0 & -6 \\ -6 & 0 \end{pmatrix}.$$
So after checking the definiteness of these matrices we see that point 1. is a local
max, 2. is a local min, 3. is a saddle point, and 4. is a saddle point.
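A NumPy sketch (illustrative only) classifies the same points by the signs of the Hessian's eigenvalues:

```python
import numpy as np

def hessian(x, y):
    return np.array([[6*y - 6, 6*x], [6*x, 6*y - 6]])

for point in [(0, 0), (0, 2), (1, 1), (-1, 1)]:
    vals = np.linalg.eigvalsh(hessian(*point))
    if np.all(vals > 0):
        label = "local min"
    elif np.all(vals < 0):
        label = "local max"
    else:
        label = "saddle"
    print(point, vals, label)
```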
8. Lecture 8: Review of Midterm & Unconstrained Optimization
Cont’d
Let us consider the function
f*(r) = max_{x∈S} f(x, r)
for a given f : Rⁿ⁺ᵐ → R and S ⊂ Rⁿ. Here x ∈ Rⁿ and r ∈ Rᵐ. f*(r) is the
maximum of f when we fix a particular r, which we know how to find. (Take the
derivative of f in x, set it equal to 0, and solve for x* = x*(r), the stationary point.)
The question now is, how does f* change as r changes? Let x* = x*(r) be the
vector that maximizes f for a fixed r. Then,
f*(r) = f(x*(r), r).
So differentiating in r, if we consider the gradients as column vectors,
∇f*(r) = (Dx*(r))ᵗ ∇_x f(x*(r), r) + ∇_r f(x*(r), r).
If we consider the gradients as row vectors,
∇f*(r) = ∇_x f(x*(r), r) Dx*(r) + ∇_r f(x*(r), r).
But since x*(r) = x* is a maximum of f(x, r) for r fixed,
∇_x f(x*(r), r) = ∇_x f(x*, r) = 0.
In turn,
∇f*(r) = ∇_r f(x*(r), r).
This equality is called the envelope result.
For example, let
f*(r) = max_{x,y∈R} f(x, y, r)
with
f(x, y, r) = −x² − xy − 2y² + 2rx + 2ry.
First, we differentiate in x and y and set that gradient in x and y to (0, 0), to
determine the stationary points of f with r fixed:
∇_{x,y} f(x, y, r) = (−2x − y + 2r, −x − 4y + 2r) = (0, 0).
This holds if and only if
2x + y = 2r and x + 4y = 2r;
that is, if and only if
x = 3y.
So
x* = x*(r) = 6r/7 and y* = y*(r) = 2r/7
maximize f for r fixed. In turn,
f*(r) = 8r²/7.
Finally,
df*/dr (r) = 16r/7.
On the other hand,
∂_r f(x*, y*, r) = 2x* + 2y* = 16r/7,
which verifies the envelope result. In other words, to verify the envelope result, we
compute the derivative of f with respect to r, plug in (x*(r), y*(r), r), and check
that this equals the derivative of f* with respect to r.
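Here is a small numerical check of the envelope result for this example (a sketch, not part of the notes):

```python
import numpy as np

def f(x, y, r):
    return -x**2 - x*y - 2*y**2 + 2*r*x + 2*r*y

def x_star(r):
    return 6*r/7, 2*r/7                    # the maximizer found above

r, h = 2.0, 1e-5

# derivative of f*(r) by central finite differences
f_star = lambda s: f(*x_star(s), s)
df_star = (f_star(r + h) - f_star(r - h)) / (2*h)

# partial derivative of f in r, holding (x, y) fixed at the maximizer
x, y = x_star(r)
dr_f = (f(x, y, r + h) - f(x, y, r - h)) / (2*h)

print(df_star, dr_f, 16*r/7)               # all approximately equal
```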
9. Lecture 9: Constrained Optimization
9.1. Equality Constraints. Let us consider the general optimization problem
max_x f(x) (or min_x f(x))
subject to
g(x) = b.
Here f : Rⁿ → R, g : Rⁿ → Rᵐ with m ≤ n, and b ∈ Rᵐ are all given. The
function g and the vector b are constraints; hence, we are in the world of constrained
optimization.
Remark 9.1. Most often, we will be in the setting where m < n, so that we have
at least one degree of freedom.
To solve this problem, we introduce the method of Lagrange multipliers.
In particular, from our given data, we construct a Lagrangian, i.e., a function
L : Rⁿ × Rᵐ = Rⁿ⁺ᵐ → R defined by
$$L(x, \lambda) = f(x) - \lambda\cdot(g(x) - b) = f(x) - \sum_{j=1}^{m} \lambda_j (g_j(x) - b_j),$$
where g = (g1, g2, . . . , gm), b = (b1, b2, . . . , bm) and λ = (λ1, λ2, . . . , λm) ∈ Rᵐ.
Each λi, for i = 1, . . . , m, is called a Lagrange multiplier.
A necessary condition for optimality, i.e., maximization (or minimization), is
∇_x L(x, λ) = 0;
that is, for all i = 1, . . . , n,
$$\frac{\partial L}{\partial x_i}(x, \lambda) = \partial_i f(x) - \sum_{j=1}^{m} \lambda_j \partial_i g_j(x) = 0.$$
In other words, we need to find a pair (x*, λ*) such that
∇_x L(x*, λ*) = 0 or ∂_i L(x*, λ*) = 0 for all i = 1, . . . , n,
and then check that x* is indeed optimal.
Remark 9.2. Since
−max_x{f(x)} = min_x{−f(x)},
any minimization problem can be turned into a maximization problem and vice versa.
So, for simplicity, we will often just talk about the maximization case.
Is there a way to understand λ? Observe that we can construct the (optimal)
value function f* from our data as follows:
f*(b) = max_x{f(x) : g(x) = b}.
For a fixed b ∈ Rᵐ, let x* = x*(b) be a vector that maximizes f, subject to g(x) = b.
Then,
f(x*) = f*(b).
Obviously, for all x ∈ Rⁿ,
f(x) ≤ f*(g(x)).
Hence, if we define the function φ : Rⁿ → R by
φ(x) = f(x) − f*(g(x)),
then φ is maximized at x*. So, assuming that f* is differentiable,
0 = ∇φ(x*)
  = ∇f(x*) − ∇f*(g(x*)) Dg(x*)
  = ∇f(x*) − ∇f*(b) Dg(x*).
Recall we fixed b ∈ Rᵐ and x* = x*(b). In other words, for all i = 1, . . . , n,
$$0 = \partial_i f(x^*) - \sum_{j=1}^{m} \partial_j f^*(b)\, \partial_i g_j(x^*).$$
If we set
λ*_j = λ*_j(b) = ∂_j f*(b),
then our equation ∂_i L(x*, λ*) = 0 for all i = 1, . . . , n is exactly what we have.
For example, consider the optimization problem
max f(x, y, z) with f(x, y, z) = x + 2z
subject to
x + y + z = 1 and x² + y² + z = 7/4.
Here f : R³ → R, g = (g1, g2) = (x + y + z, x² + y² + z) : R³ → R², and b = (1, 7/4).
(So m = 2 and n = 3.)
Remark 9.3. We can always make b = 0 by simply considering g(x) − b instead
of g(x). This won't affect anything as the optimality condition is a condition on
derivatives, which don't see the addition or subtraction of constants.
So we can reformulate this problem as
max f(x, y, z) with f(x, y, z) = x + 2z
subject to
x + y + z − 1 = 0 and x² + y² + z − 7/4 = 0.
Here f : R³ → R, g = (g1, g2) = (x + y + z − 1, x² + y² + z − 7/4) : R³ → R², and
b = (0, 0). Now let's construct the Lagrangian:
L(x, y, z, λ) = x + 2z − λ1(x + y + z − 1) − λ2(x² + y² + z − 7/4).
The optimality conditions are
(1) ∂_x L(x, y, z, λ) = 1 − λ1 − 2λ2 x = 0
(2) ∂_y L(x, y, z, λ) = −λ1 − 2λ2 y = 0
(3) ∂_z L(x, y, z, λ) = 2 − λ1 − λ2 = 0
subject to
(α) x + y + z − 1 = 0 and (β) x² + y² + z − 7/4 = 0.
Combining this information, we find that
(x*, y*, z*) = (0, −1/2, 3/2) or (x*, y*, z*) = (1, 3/2, −3/2)
and
λ1 = 1 = λ2.
In particular, (3) implies that λ2 = 2 − λ1. Putting this into (2), we find that
λ1(2y − 1) = 4y. So y ≠ 1/2 and λ1 = 4y/(2y − 1). Let's call this relation between
λ1 and y (4). Using (1), (2), and (4), we find that (5) y = 2x − 1/2. Plugging (5)
into (α) and (β), we see that 5x(x − 1) = 0, so that x = 0 or x = 1. From here, we
find our possibilities for (x*, y*, z*, λ*).
Computing f at these points, we see that
f(0, −1/2, 3/2) = 3 and f(1, 3/2, −3/2) = −2.
Thus, the only candidate is (0, −1/2, 3/2).
But we aren't done. We need to check how our Lagrange multipliers affect our
Lagrangian. If we input λ1 = 1 = λ2 into our Lagrangian, we find that
L(x, y, z, 1, 1) = −x² − y² − y + 11/4,
which is concave, as a function of (x, y, z). So (0, −1/2, 3/2) is indeed a maximizer.
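We can corroborate this numerically (a sketch, not from the notes). Substituting z = 1 − x − y into the second constraint reduces the feasible set to the circle (x − 1/2)² + (y − 1/2)² = 5/4, which we can sample:

```python
import numpy as np

theta = np.linspace(0, 2*np.pi, 100001)
x = 0.5 + np.sqrt(1.25)*np.cos(theta)       # the two constraints reduce to a circle in (x, y)
y = 0.5 + np.sqrt(1.25)*np.sin(theta)
z = 1 - x - y                               # the equality constraint determines z

vals = x + 2*z                              # the objective f = x + 2z on the feasible set
i = np.argmax(vals)
print(vals[i], x[i], y[i], z[i])            # approximately 3 at (0, -1/2, 3/2)
```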
Theorem 9.4. Let (x*, λ*) be such that ∇_x L(x*, λ*) = 0. If L(x, λ*) is concave
(convex) as a function of x, then x* solves the maximization (minimization) problem.
Why is this? If L(x) = L(x, λ*) is concave and x* is a stationary point, then
L(x*) ≥ L(x) for all x ∈ Rⁿ. (Recall what we have learned about concave functions.)
Then, using the constraint, g(x) = b = g(x*), we see that f(x*) ≥ f(x) for all
x ∈ Rⁿ such that g(x) = b.
In an earlier lecture, we considered optimization problems in all of Rⁿ of the
form maximize f, minimize f, or find and classify all stationary points of f. These
are unconstrained problems. The Lagrange multiplier method allows us to tackle
optimization problems over subsets of Rⁿ. We will work over convex subsets of Rⁿ.
For example, consider
f(x, y) = 2x² + 3y² − 4x − 5 and S = {(x, y) ∈ R² : x² + y² ≤ 16}.
What are the extrema of f subject to (x, y) ∈ S? In other words, find
max_{z∈S} f(z) and min_{z∈S} f(z)
with z = (x, y). To find these extrema, we have to take two steps.
First, we find all the stationary points of f. That is, the points (x, y) such that
0 = ∇f(x, y) = (4x − 4, 6y).
Thus, the only stationary point is (1, 0). Now we check whether or not this point is
in S. Well, 1² + 0² = 1 ≤ 16. So (1, 0) ∈ S. If it were not in S, we would discard it.
To continue, we try to classify this stationary point. We compute
$$D^2 f(x, y) = \begin{pmatrix} 4 & 0 \\ 0 & 6 \end{pmatrix}, \quad\text{so that}\quad D^2 f(1, 0) = \begin{pmatrix} 4 & 0 \\ 0 & 6 \end{pmatrix}.$$
In turn, (1, 0) is a local minimum (D²f(1, 0) is positive definite). Moreover,
f(1, 0) = −7.
Second, we have to understand what happens along the boundary of S, the set
{(x, y) ∈ R² : x² + y² = 16}. This is an equality constraint, and so we introduce
our Lagrangian,
L(x, y, λ) = f(x, y) − λ(x² + y² − 16).
As before, we need to understand when
∇_{x,y} L(x, y, λ) = (4x − 4 − 2λx, 6y − 2λy) = (0, 0)
subject to
x² + y² − 16 = 0.
From the second equation, we see that y = 0 or λ = 3. When λ = 3, we see that
x = −2. Then, y² = 16 − 4 = 12, so that y = ±2√3. When y = 0, from our
constraint equation, we see that x = ±4. Thus, we have four more cases to consider:
f(−2, ±2√3) = 47, f(4, 0) = 11, and f(−4, 0) = 43.
So, considering Steps 1 and 2 together, the max of f over S is 47, and the min of
f over S is −7.
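A brute-force check over the disk (a NumPy sketch, for illustration only) gives the same extreme values:

```python
import numpy as np

def f(x, y):
    return 2*x**2 + 3*y**2 - 4*x - 5

xs = np.linspace(-4, 4, 801)
X, Y = np.meshgrid(xs, xs)
inside = X**2 + Y**2 <= 16                   # the admissible disk S

vals = f(X, Y)[inside]
print(vals.max(), vals.min())                # approximately 47 and -7
```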
10. Lecture 10: Constrained Optimization Cont’d
10.1. Inequality Constraints. Let us consider the general optimization problem
max_x f(x)
subject to
g(x) ≤ b.
Here f : Rⁿ → R, g : Rⁿ → Rᵐ, and b ∈ Rᵐ are all given. Also,
g(x) ≤ b ⇔ g_j(x) ≤ b_j for all j = 1, . . . , m.
Remark 10.1. Unlike the equality constraints case, we do not typically make the
assumption that m < n. In fact, it could be that m > n.
Just as a minimization problem can be turned into a maximization problem (cf.
last lecture), inequality constraints of the form g_j(x) ≥ b_j can be turned into the
form we wrote above, g_j(x) ≤ b_j, by replacing g_j with −g_j and b_j with −b_j for
whichever j we need to get the whole vector inequality. In particular,
g_j(x) ≥ b_j ⇔ −g_j(x) ≤ −b_j.
Definition 10.2. The set {x ∈ Rⁿ : g(x) ≤ b} is called the admissible or feasible
set.
Like before, we construct a Lagrangian L : Rⁿ × Rᵐ = Rⁿ⁺ᵐ → R defined by
L(x, λ) = f(x) − λ · (g(x) − b)
where g = (g1, g2, . . . , gm), b = (b1, b2, . . . , bm), and λ = (λ1, λ2, . . . , λm) ∈ Rᵐ. The
vector λ, like before, is called a vector of Lagrange multipliers. Again, we have an
optimality condition
∇_x L(x, λ) = ∇f(x) − λ Dg(x) = 0;
that is, for all i = 1, . . . , n,
$$\frac{\partial L}{\partial x_i}(x, \lambda) = \partial_i f(x) - \sum_{j=1}^{m} \lambda_j \partial_i g_j(x) = 0.$$
The new feature here is a set of conditions called the complementary slackness
conditions:
λ_j ≥ 0 with λ_j = 0 if g_j(x) < b_j, for any j = 1, . . . , m.
In other words,
λ ≥ 0 and λ · (g(x) − b) = 0.
When g_j(x) = b_j, we say that this constraint is active.
Why do we call these complementary slackness conditions? Because, for each
j = 1, . . . , m, at most one of the two inequalities g_j(x) ≤ b_j or λ_j ≥ 0 can be strict.
Equivalently, at least one must be an equality.
The optimality condition and the complementary slackness conditions together
are called the Kuhn–Tucker conditions.
Let's analyze these conditions a bit by studying the (optimal) value function f*
associated to our problem:
f*(b) = max_x{f(x) : g(x) ≤ b}.
Let's assume that f* is differentiable.
First, notice that f* is nondecreasing in each argument b_j. Why? Because if we
increase b_j while keeping b_i for all i ≠ j fixed, the admissible set increases. So f*
cannot decrease.
For a fixed b ∈ Rᵐ, let x* = x*(b) be a vector at which f is maximized, i.e.,
f*(b) = f(x*).
Clearly, for any x ∈ Rⁿ,
f(x) ≤ f*(g(x)),
because x satisfies the constraint when we replace b with g(x), i.e., g(x) ≤ g(x)
always. Then, since g(x*) ≤ b and f* is nondecreasing,
f*(g(x)) ≤ f*(g(x) + b − g(x*)).
So if we define φ : Rⁿ → R by
φ(x) = f(x) − f*(g(x) + b − g(x*)),
we find that
φ(x) ≤ 0 for all x ∈ Rⁿ.
As φ(x*) = 0, x* is a maximum of φ. So
0 = ∇φ(x*) = ∇f(x*) − ∇f*(b) Dg(x*).
And if we set
λ* = λ*(b) = ∇f*(b),
we find our optimality condition.
Now let's look at our complementary slackness conditions again. First, since f*
is nondecreasing in each argument, ∂_j f*(b) ≥ 0 for all b ∈ Rᵐ. So
λ*_j ≥ 0,
as desired. Second, let's prove that
g_j(x*) < b_j for some j = 1, . . . , m ⇒ λ*_j = 0.
By the continuity of g_j, if g_j(x*) < b_j, then g_j(x) < b_j in a neighborhood of x*. So
that, since f* is nondecreasing,
f*(g(x*)) ≤ f*(b′) ≤ f*(b)
where b′ = (b1, . . . , b_{j−1}, b′_j, b_{j+1}, . . . , bm) and b′_j ∈ (g_j(x*), b_j). But x* being a
maximizer implies that f*(g(x*)) = f*(b). So f* is constant along this interval.
Hence, ∂_j f*(b) = 0. In turn,
λ*_j = 0.
Theorem 10.3. Suppose that x* solves our inequality constraints problem. Let the
constraint qualification CQ be satisfied:
CQ: The vectors ∇g_j(x*) for the j such that g_j(x*) = b_j are linearly independent.
Then, there is a unique λ* such that the Kuhn–Tucker conditions hold at (x*, λ*).
How do we solve our problem in general?
Step 1: Find all the points (x*, λ*) at which the KT conditions hold.
Step 2: Find all the points x* at which the CQ fails.
Step 3: Check among your choices which are admissible, and then from those
determine which is the maximizer.
To gather the constraint qualification points, if any, you should do the following.
1. Gather all your constraint functions. Let’s say we have two: gi = gi (x, y) for
i = 1, 2.
2. Take their gradients, to get a set of vectors, one vector for each constraint.
These vectors may be variable, i.e., depend on x and y in this case.
3. Find all the points at which the constraints are active AND at which the
gradients of the constraints are linearly dependent:
(i). Find all pairs (a, b) such that when you plug (a, b) into g1 and g2 , they are
active. Check if, at any of these (a, b), the gradients of g1 and g2 , as a set of two
vectors, are linearly dependent. If so, keep these (a, b). You can do things in the
reverse order as well: Find all pairs (a, b) at which the gradients of g1 and g2 , as a
set of two vectors, are linearly dependent. Keep those (a, b) that when plugged into
g1 and g2 both g1 and g2 are active.
(ii). Find all pairs (a, b) such that when you plug (a, b) into g1 it is active. If the
gradient of g1 at any of these (a, b) is the zero vector, then you have to keep it. A
single vector is linearly dependent if and only if it is the 0 vector. So, in this case,
the only (a, b) you can possibly add are those at which the gradient of g1 is the zero
vector.
(iii). Redo (ii) with g2 instead of g1 .
NOTE: If you have more than two constraints, you’ll have more cases to consider.
You’ll have to check all the constraints together, like in 1. You’ll have to check
each constraint on its own, like in 2. and 3. And you’ll have to check all pairs, and
triplets, etc. (For the three constraint case, you have each of the three to consider
individually, all pairs to consider, and the three together to consider.)
For example, consider
max f(x, y) with f(x, y) = −(x − 2)² − 2(y − 1)²
subject to
g1(x, y) = x + 4y ≤ 3 and g2(x, y) = −x + y ≤ 0.
Our Lagrangian is
L(x, y, λ) = f(x, y) − λ1(g1(x, y) − 3) − λ2 g2(x, y).
Let's check the KT conditions:
∂_x L(x, y, λ) = −2(x − 2) − λ1 + λ2 = 0 and ∂_y L(x, y, λ) = −4(y − 1) − 4λ1 − λ2 = 0
subject to
λ1(x + 4y − 3) = 0 and λ2(y − x) = 0 with λ1, λ2 ≥ 0.
We need to consider 4 cases: 1. when both constraints are inactive; 2. when one is
active and the other is inactive; 3. the reverse of 2.; and 4. when both constraints
are active.
1. If λ1 = λ2 = 0, then (x, y) = (2, 1).
2. If λ1 = 0, λ2 ≠ 0, then (x, y) = (4/3, 4/3) and λ2 = −4/3.
3. If λ1 ≠ 0, λ2 = 0, then (x, y) = (5/3, 1/3) and λ1 = 2/3.
4. If λ1 ≠ 0, λ2 ≠ 0, then (x, y) = (3/5, 3/5) and (λ1, λ2) = (22/25, −48/25).
Now let's check when the CQ fails. Observe that ∇g1(x, y) = (1, 4) and ∇g2(x, y) =
(−1, 1). These are linearly independent. So the CQ always holds, and there are no
additional points to consider.
First, (2, 1) does not satisfy the first constraint. So it is inadmissible. Throw it
away. Similarly, (4/3, 4/3) does not satisfy the first constraint. So it is inadmissible.
Moreover, λ2 < 0 in this case, which violates our complementary slackness conditions.
Throw (4/3, 4/3) away too. Third, while (3/5, 3/5) satisfies the constraints, λ2 < 0
in this case. So we throw it away. Fourth, (5/3, 1/3) is admissible and λ* in this
case is nonnegative. So we keep it. Since L(x, y, λ*) in this fourth case is concave,
we know that (5/3, 1/3) is a maximizer (by the following theorem).
Theorem 10.4. Let (x*, λ*) satisfy the KT conditions. If x* is admissible and
L(x, λ*) is concave as a function of x, then x* solves the maximization problem.
Proof. If L(x, λ*) is concave and ∇_x L(x*, λ*) = 0, then x* maximizes L(x, λ*).
This implies that
f(x*) − λ* · (g(x*) − b) ≥ f(x) − λ* · (g(x) − b)
for all admissible x. Rearranging, we find that
f(x*) − f(x) ≥ λ* · (g(x*) − g(x))
for all admissible x. Now if g_j(x*) < b_j, then λ*_j = 0, by our complementary
slackness conditions. On the other hand, if g_j(x*) = b_j, then
λ*_j (g_j(x*) − g_j(x)) = λ*_j (b_j − g_j(x)) ≥ 0
since x is admissible and λ*_j ≥ 0. So λ* · (g(x*) − g(x)) ≥ 0 if x is admissible. Using
this information above,
f(x*) ≥ f(x)
for all admissible x. So x* is indeed our solution. □
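A quick numerical check (a sketch, not from the notes) that (5/3, 1/3) maximizes f over the feasible region of the previous example:

```python
import numpy as np

def f(x, y):
    return -(x - 2)**2 - 2*(y - 1)**2

xs = np.linspace(-3, 3, 601)
X, Y = np.meshgrid(xs, xs)
feasible = (X + 4*Y <= 3) & (-X + Y <= 0)

vals = np.where(feasible, f(X, Y), -np.inf)
i, j = np.unravel_index(np.argmax(vals), vals.shape)
print(X[i, j], Y[i, j], vals[i, j])          # approximately (5/3, 1/3) with f = -1
```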
For example, let's reconsider (cf. last lecture)
max f(x, y) with f(x, y) = 2x² + 3y² − 4x − 5
subject to
x² + y² ≤ 16.
This time, let's use the KT conditions to find our solution. Our Lagrangian is
L(x, y, λ) = 2x² + 3y² − 4x − 5 − λ(x² + y² − 16).
So our KT conditions are
0 = ∂_x L(x, y, λ) = 4x − 4 − 2λx
0 = ∂_y L(x, y, λ) = 6y − 2λy
0 ≤ λ
0 = λ(x² + y² − 16).
If our constraint is active, then λ ≠ 0. In this case, x² + y² = 16. From our
second equation, we see that either y = 0 or λ = 3. If y = 0, then x = ±4, by our
constraint. Also, λ = 3/2 and λ = 5/2, when x = 4 and x = −4 respectively. If
λ = 3, then using our first equation, we find that x = −2, and then y = ±2√3, by
our constraint.
If our constraint is inactive, then λ = 0. So x = 1 and y = 0.
Now we check when the CQ fails. Note that the gradient of our constraint is
(2x, 2y). The only time (2x, 2y) is linearly dependent is if it is the 0 vector, i.e., when
(x, y) = (0, 0). But (0, 0) does not activate our constraint. So we don't consider it.
Finally, when λ* = 0, 3/2, L(x, y, λ*) is convex. When λ* = 5/2, L(x, y, λ*) is
neither convex nor concave. On the other hand, when λ* = 3, L(x, y, λ*) is concave.
How do we handle this scenario? We can only apply our theorem in the last case
(if L(x, y, λ*) is concave then x* is a maximizer, provided (x*, λ*) satisfies the KT
conditions). To deal with this, we note that f is continuous and the admissible
region is closed and bounded. So by the Extreme Value Theorem, a maximizer
exists in this region. Thus, while the case that λ = 3 is one in which we have a
maximizer (by our theorem), we can't ignore the other points. We have to test them
all by checking they are admissible, their corresponding λ* ≥ 0, and finally plugging
them into f. We'll find, like in last lecture, that (−2, ±2√3) are maximizers.
Theorem 10.5. Let f : A µ Rn æ R be continuous. If A is closed and bounded,
then f achieves its maximum and minimum in A.
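To make the candidate-testing procedure above concrete, here is a small Python sketch (not part of the original notes; it assumes NumPy is available): it takes the (x, y, λ) triples produced by the KT conditions, discards the inadmissible ones and those with a negative multiplier, and compares the values of f.

```python
import numpy as np

# A minimal sketch: test the KT candidates for
# max 2x^2 + 3y^2 - 4x - 5 subject to x^2 + y^2 <= 16.
f = lambda x, y: 2*x**2 + 3*y**2 - 4*x - 5
g = lambda x, y: x**2 + y**2          # constraint function, g <= 16

# (x, y, lambda) triples found from the KT conditions above
candidates = [
    (1.0, 0.0, 0.0),                   # inactive constraint
    (4.0, 0.0, 1.5), (-4.0, 0.0, 2.5), # y = 0, active constraint
    (-2.0,  2*np.sqrt(3), 3.0),
    (-2.0, -2*np.sqrt(3), 3.0),
]

best = None
for x, y, lam in candidates:
    admissible = g(x, y) <= 16 + 1e-9  # feasibility check
    if admissible and lam >= 0:        # keep only valid KT points
        value = f(x, y)
        print(f"({x:+.3f}, {y:+.3f})  lambda = {lam:.2f}  f = {value:.3f}")
        if best is None or value > best[0]:
            best = (value, (x, y))
print("maximizer:", best)
```

Running this confirms that (−2, ±2√3), with value 47, beat the other candidates.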
Definition 10.6. A subset A ⊂ Rⁿ is closed if it contains all of its limit points.
Definition 10.7. A subset A ⊂ Rⁿ is open if for every point x ∈ A, we can find a radius r > 0 such that the ball of radius r centered at x is in A, i.e., {z : ‖z − x‖ < r} ⊂ A.
In particular, the complement of an open set is closed and vice versa, where the complement of A, denoted by Aᶜ, is equal to the set of points not in / outside of A.
Definition 10.8. A subset A ⊂ Rⁿ is bounded if the lengths of its elements are bounded by a single constant: there is a finite C ≥ 0 such that ‖x‖ ≤ C for every x ∈ A.
For example,
1. (0, 1) ⊂ R is open and bounded.
2. [0, 1] ⊂ R is closed and bounded.
3. (0, 1] ⊂ R is neither closed nor open, but it is bounded.
11. Lecture 11: Constrained Optimization Cont’d and ODEs
11.1. Mixed Constraints. Let us consider the general optimization problem
max_x f(x)
subject to
g(x) = b and h(x) ≤ c.
Here f : Rⁿ → R, g : Rⁿ → Rᵐ, h : Rⁿ → Rᵏ, b ∈ Rᵐ, and c ∈ Rᵏ are all given. In this case, we define the Lagrangian as follows: L : Rⁿ × Rᵐ × Rᵏ = Rⁿ⁺ᵐ⁺ᵏ → R defined by
L(x, λ, μ) = f(x) − λ · (g(x) − b) − μ · (h(x) − c).
Theorem 11.1. Suppose x* solves our mixed constraints problem. Suppose that the CQ holds for g and h. Then, there are unique vectors λ* and μ* such that
∇_x L(x*, λ*, μ*) = 0, μ* ≥ 0, and μ* · (h(x*) − c) = 0.
Theorem 11.2. Suppose (x*, λ*, μ*) satisfy the KT conditions and x* is admissible. If L(x, λ*, μ*) is concave, as a function of x, then x* solves the maximization problem.
Remark 11.3. The equality constraint has to be considered in every case when you are checking for CQ failure, because, as an equality constraint, it is always active.
Let's do an example. Consider the maximization problem
max f(x, y, z) with f(x, y, z) = x² + y² + z²
subject to
g(x, y, z) = x + y + z = 0 and h(x, y, z) = x² + y² + 3z² ≤ 9.
In this case, our Lagrangian is
L(x, y, z, λ, μ) = x² + y² + z² − λ(x + y + z) − μ(x² + y² + 3z² − 9).
The KT conditions are
0 = ∂_x L(x, y, z, λ, μ) = 2x − λ − 2μx
0 = ∂_y L(x, y, z, λ, μ) = 2y − λ − 2μy
0 = ∂_z L(x, y, z, λ, μ) = 2z − λ − 6μz
0 ≤ μ
0 = μ(x² + y² + 3z² − 9)
0 = x + y + z.
Note that
∇g(x, y, z) = (1, 1, 1) and ∇h(x, y, z) = (2x, 2y, 6z).
The points at which ∇g(x, y, z) and ∇h(x, y, z) are linearly dependent are (a, a, a/3) for any a ∈ R. But since (a, a, a/3) cannot activate both g and h, we do not add any of them as a CQ failure point.
To continue with the CQ, we have to consider the cases when only one constraint is active. Since an equality constraint is always active, the only choice we have is to consider it alone. Since ∇g(x, y, z) = (1, 1, 1) is linearly independent, the CQ always holds in this case. So, again, there are no CQ failure points to add.
Since the admissible region is closed and bounded, by the EVT, we look for our solution among the points which satisfy the KT conditions (the CQ, as we showed, does not fail).
Notice that λ is always active, as it corresponds to an equality constraint.
If μ is inactive, μ = 0. In this case, by our optimality conditions, x = y = z = λ/2. By our equality constraint g, we then see that λ = 0, which implies that x = y = z = 0.
If μ is active, i.e., μ ≠ 0, then, from our first two optimality conditions (subtract them), we find that 2(x − y)(1 − μ) = 0. So either μ = 1 or x = y. If x = y, then, by our equality constraint, −2x = z. Hence, by our inequality constraint, x = y = ±3/√14 and z = ∓6/√14. If μ = 1, then, by our first optimality condition, λ = 0. Thus, z = 0, considering our third optimality condition. Inputting this in our constraints, we see that x = ±3/√2 and y = ∓3/√2.
In summary, we have the following points to consider:
(0, 0, 0), (±3/√2, ∓3/√2, 0), and (±3/√14, ±3/√14, ∓6/√14).
Plugging these into f, we find that (±3/√2, ∓3/√2, 0), which are admissible, both maximize f, and max f subject to g = 0 and h ≤ 9 is equal to 9.
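As a cross-check (a sketch only, assuming SciPy's SLSQP solver; not part of the original notes), we can hand the same mixed-constraint problem to a numerical solver and see that it lands on one of the KT points found above.

```python
import numpy as np
from scipy.optimize import minimize

# Cross-check: max x^2 + y^2 + z^2 s.t. x + y + z = 0 and x^2 + y^2 + 3z^2 <= 9.
obj = lambda v: -(v[0]**2 + v[1]**2 + v[2]**2)   # negate: the solver minimizes
cons = [
    {"type": "eq",   "fun": lambda v: v[0] + v[1] + v[2]},
    {"type": "ineq", "fun": lambda v: 9 - (v[0]**2 + v[1]**2 + 3*v[2]**2)},
]
res = minimize(obj, x0=[2.0, -2.0, 0.0], constraints=cons)
# starting near (3/sqrt(2), -3/sqrt(2), 0), the solver should return a KT
# point of the problem, with objective value close to 9
print(res.x, -res.fun)
```

Since the objective is convex rather than concave, a local solver is only guaranteed to return a KT point, which is why the hand analysis above is still needed to identify the maximizer.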
As a final example, let's throw away the inequality constraint in the last example, and consider
max f(x, y, z) with f(x, y, z) = x² + y² + z²
subject to
g(x, y, z) = x + y + z = 0.
Note that the set of points that satisfy g = 0 is a plane, which is unbounded. Since f(x, y, z) → +∞ as ‖(x, y, z)‖ → ∞, we see that the max of f subject to g is plus infinity.
11.2. ODEs. What is an ordinary differential equation (ODE)? It is an equation where you consider relations between a function of one variable and its derivatives. In other words, here, the unknown is a function (of one variable), and not a number (or vector). And the equation includes one or more derivatives of the unknown.
In general, an ODE looks like
F(t, x, x⁽¹⁾, . . . , x⁽ⁿ⁾) = 0
for some given function F : Rⁿ⁺² → R. Here t ∈ R and x = x(t) : R → R. The function x is the unknown / solution to our ODE.
We use
dx/dt (t) = x′(t) = ẋ(t) = x⁽¹⁾(t), d²x/dt² (t) = x″(t) = ẍ(t) = x⁽²⁾(t), etc.
to denote the derivatives of x.
For example, for a, b ∈ R and f : R → R,
1. ẋ = ax
2. ẋ + ax = b
3. ẋ + ax = bx²
4. ẋ = x + t
5. ẋ = af(x) + bx
are all ODEs.
We often use t to represent the variable on which x depends, and call it time; x is often called space.
Definition 11.4. The graph (t, x(t)) of a solution to an ODE is called an integral curve or solution curve.
Let's consider the specific example
ẋ = x + t.
In this case, we can use the general notation for an ODE as follows. Let F : R³ → R be defined by F(t, x, ẋ) = ẋ − x − t. Then, F(t, x, ẋ) = 0 is equivalent to ẋ = x + t.
Observe that
x₁(t) = −t − 1 and x₂(t) = e^t − t − 1
are both solutions to our ODE. Indeed,
ẋ₁ = −1 = (−t − 1) + t = x₁ + t
and
ẋ₂ = e^t − 1 = (e^t − t − 1) + t = x₂ + t.
Also,
x₃(t) = Ce^t − t − 1 for any C ∈ R
is a solution.
In general, an ODE may have many solutions, typically, infinitely many.
Of course, if we add a constraint to our ODE, we might find that only one solution exists (or maybe no solution exists). For instance, if we restrict our solution curve to go through the point (t, x) = (0, 1), then C must be 2 (in the expression for x₃). So this constraint forces us to have a unique solution. In other words,
x(t) = 2e^t − t − 1
uniquely solves
ẋ = x + t with x(0) = 1.
12. Lecture 12: ODEs Cont'd
Definition 12.1. The general form of a solution to a given ODE is called a general solution, while a specific solution is called a particular solution.
In our example from last lecture,
Ce^t − t − 1 for any C ∈ R
is the general solution, while
x₁(t) = −t − 1 and x₂(t) = e^t − t − 1
are particular solutions to ẋ = x + t.
If we only look to solve our ODE on some time interval, e.g., (t₀, T), then we call t₀ the initial time and the problem
F(t, x, x⁽¹⁾, . . . , x⁽ⁿ⁾) = 0 in (t₀, T) with x(t₀) = x₀
the initial value problem on (t₀, T) with initial condition x(t₀) = x₀. The endpoint T can be plus infinity.
A first order ODE is an ODE of the form
F(t, x, ẋ) = 0,
i.e., the minimal and maximal number of derivatives taken is 1.
The order of an ODE is the maximal number of derivatives which shows up. The lower order terms are those which depend on no or fewer derivatives of the solution. For example, in the general setting
F(t, x, x⁽¹⁾, . . . , x⁽ⁿ⁾) = 0,
the order is n. Anything that involves strictly fewer than n derivatives is lower order.
A first order ODE is called separable if the lower order terms can be written as the product of two functions, one depending on space only and the other on time only:
F(t, x, ẋ) = ẋ − f(t)g(x)
for some given f, g : R → R.
How do we solve a first order separable ODE?
Step 1: Rewrite the equation:
dx/dt = f(t)g(x).
Step 2: Formally, gather like variables:
dx/g(x) = f(t) dt.
Step 3: Integrate:
∫ dx/g(x) = ∫ f(t) dt.
Step 4: Evaluate the integrals (if possible).
Step 5: Solve for x (if possible). In this step, we may need to leave our expression as one that is implicit in x.
Step 6: Note that every constant function x(t) ≡ a, for any a ∈ R such that g(a) = 0, is a solution. Incorporate these constant solutions into your general solution if not already there.
For example, let's solve the IVP
ẋ = −t/(x − 3) with x(0) = 1.
Step 1: Rewrite the equation:
dx/dt = −t/(x − 3).
Step 2: Formally, gather like variables:
(x − 3) dx = −t dt.
Step 3: Integrate:
∫ (x − 3) dx = ∫ −t dt.
Step 4:
x²/2 − 3x = −t²/2 + C.
Step 5: Solve for x (if possible):
(x − 3)² = 9 − t² + 2C ⇒ x = 3 ± (9 − t² + 2C)^{1/2}.
(This would be our general solution, after we add in the possible constant solutions.)
Step 5′: Use the initial conditions:
x(0) = 1 ⇒ C = −5/2.
But
3 + (9 − 0² − 5)^{1/2} = 3 + 2 = 5 ≠ 1,
and
3 − (9 − 0² − 5)^{1/2} = 3 − 2 = 1.
So, we see our solution is
x = 3 − (4 − t²)^{1/2} for t ∈ (−2, 2).
Step 6: There are no constant solutions in this problem because 1/(x − 3) = 0 if and only if x = ±∞, and we don't permit x to take the value plus or minus infinity where it solves the ODE. In particular, for this initial condition, our solution only exists for a finite time interval t ∈ (−2, 2).
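As a quick sanity check (a sketch, assuming SymPy is available; not part of the original notes), we can substitute the formula we found back into the ODE and the initial condition.

```python
import sympy as sp

# Check that x(t) = 3 - sqrt(4 - t^2) satisfies x' = -t/(x - 3) and x(0) = 1.
t = sp.symbols('t')
x = 3 - sp.sqrt(4 - t**2)
print(sp.simplify(x.diff(t) + t/(x - 3)))   # expect 0
print(x.subs(t, 0))                          # expect 1
```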
Let's do another example: solve ẋ = 2t(x − 5).
Steps 1–4: We have
∫ dx/(x − 5) = ∫ 2t dt.
So
ln |x − 5| = t² + C;
that is,
|x − 5| = e^{t² + C} = Ae^{t²} with A = e^C > 0.
Thus,
x = 5 ± Ae^{t²} with A > 0.
Step 6: We add the constant solution x(t) ≡ 5, to find the general solution
x = 5 + Ae^{t²} with A ∈ R.
A first order linear ODE is of the form
ẋ + a(t)x = b(t),
or
F(t, x, ẋ) = 0 with F(t, x, ẋ) = ẋ + a(t)x − b(t).
It is called linear because the expression ẋ + a(t)x is linear in x.
How do we solve this type of ODE?
Well, let's make an observation:
ẋ + a(t)x = b(t) ⇔ e^{A(t)}ẋ + e^{A(t)}a(t)x = e^{A(t)}b(t)
for any A = A(t) : R → R, since e^{A(t)} > 0. Now if A is the anti-derivative of a,
A(t) = ∫ a(t) dt,
we see that e^{A(t)}ẋ + e^{A(t)}a(t)x = e^{A(t)}b(t) is equivalent to
d/dt (x e^{A(t)}) = b(t) e^{A(t)}.
Indeed,
d/dt (x e^{A(t)}) = ẋ e^{A(t)} + x Ȧ(t) e^{A(t)} = ẋ e^{A(t)} + x a(t) e^{A(t)}.
So
x e^{A(t)} = ∫ b(t) e^{A(t)} dt + C,
i.e.,
x = e^{−A(t)} ∫ b(t) e^{A(t)} dt + C e^{−A(t)} with A(t) = ∫ a(t) dt.
In summary,
ẋ + a(t)x = b(t) ⇔ x = e^{−∫ a(t) dt} ( C + ∫ b(t) e^{∫ a(t) dt} dt ).
The function
e^{∫ a(t) dt}
is called an integrating factor.
If we want to incorporate an initial condition x(t₀) = x₀, note that our solution can be rewritten as
x(t) = C e^{−A(t)} + e^{−A(t)} G(t)
where G(t) is the indefinite integral
G(t) = ∫ b(t) e^{A(t)} dt.
Thus,
C = x(t₀) e^{A(t₀)} − G(t₀).
Note that
A(t) − A(t₀) = ∫_{t₀}^{t} a(r) dr.
And so,
x(t) = x(t₀) e^{−(A(t)−A(t₀))} + e^{−A(t)} (G(t) − G(t₀)).
Since
G(t) − G(t₀) = ∫_{t₀}^{t} b(s) e^{A(s)} ds,
we find that
e^{−A(t)} (G(t) − G(t₀)) = e^{−A(t)} ∫_{t₀}^{t} b(s) e^{A(s)} ds = ∫_{t₀}^{t} b(s) e^{−(A(t)−A(s))} ds.
In other words,
ẋ + a(t)x = b(t) with x(t₀) = x₀ ⇔ x = x₀ e^{−∫_{t₀}^{t} a(r) dr} + ∫_{t₀}^{t} b(s) e^{−∫_{s}^{t} a(r) dr} ds.
For example, consider the IVP
t²ẋ + tx = 1 in (1, ∞) with x(1) = 2.
We start by rewriting our equation as
ẋ + x/t = 1/t².
Now we define the integrating factor
e^{A(t)} = e^{∫ dt/t} = e^{ln |t|} = t,
since t > 0. Here we choose C = 0 when computing the indefinite integral ∫ dt/t. We have this freedom. Thus,
x(t) = (1/e^{A(t)}) ( ∫ b(t) e^{A(t)} dt + C ) = (1/t) ( ∫ dt/t + C ) = (1/t)(ln |t| + C)
for some C ∈ R that we need to determine. Since x(1) = 2, we find that C = 2, and (since t > 0) our solution is
x(t) = (1/t)(ln t + 2).
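As a numerical check (a sketch, assuming NumPy and SciPy are installed; not part of the original notes), we can integrate the IVP numerically and compare with the closed form just obtained.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Compare the closed form x(t) = (ln t + 2)/t with a numerical solution of
# t^2 x' + t x = 1, x(1) = 2, on [1, 5].
rhs = lambda t, x: [(1 - t * x[0]) / t**2]   # x' = (1 - t x)/t^2
ts = np.linspace(1.0, 5.0, 50)
num = solve_ivp(rhs, (1.0, 5.0), [2.0], t_eval=ts, rtol=1e-8)
exact = (np.log(ts) + 2) / ts
print(np.max(np.abs(num.y[0] - exact)))      # should be very small
```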
Let's go back to the first ODE we analyzed:
ẋ = x + t.
Our analysis gave us an infinite family of solutions, but we could not conclude that we had found everything. Now we can. In this case, we see that a(t) ≡ −1 and b(t) = t. So
x e^{A(t)} = ∫ t e^{A(t)} dt + C with A(t) = ∫ −1 dt = −t.
Thus, integrating by parts,
x e^{−t} = ∫ t e^{−t} dt + C = −t e^{−t} + ∫ e^{−t} dt + C.
In conclusion,
x = −t − 1 + Ce^t
for any C ∈ R, as desired.
A more general first order ODE we might find looks like
g(t, x)ẋ + f(t, x) = 0,
where f and g are C¹ functions of two variables. This is an example of a nonlinear ODE. In a specific case, this ODE is rather easily solved. Let's see that case now.
Suppose that a function h exists such that
h(t, x) = C.
Then, differentiating in t,
d/dt h(t, x) = ∂_t h(t, x) + ∂_x h(t, x)ẋ = 0.
So if
∂_t h = f and ∂_x h = g,
then we'd be in business.
Theorem 12.2. Let f, g : R² → R be C¹. An h : R² → R exists such that ∂_t h = f and ∂_x h = g if and only if ∂_x f = ∂_t g. And
h(t, x) = ∫_{t₀}^{t} f(s, x) ds + ∫_{x₀}^{x} g(t₀, y) dy with x(t₀) = x₀.
If we have an ODE of the form g(t, x)ẋ + f(t, x) = 0 and ∂_x f = ∂_t g, we call the ODE exact.
For example, let's consider the ODE
1 + t²xẋ + tx² = 0 in (0, T).
Then, letting g(t, x) = t²x and f(t, x) = tx² + 1, we find that ∂_t g = 2tx = ∂_x f. So this ODE is exact. In turn, our solutions x are those such that
h(t, x) = t²x²/2 + t = C
for some C ∈ R. This implies that
x = ± √(2(C − t))/t with C ≥ t.
So, taking C = T,
x_T = ± √(2(T − t))/t
solves 1 + t²xẋ + tx² = 0 in (0, T).
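The exactness test and the formula for h are easy to automate. Here is a sketch (assuming SymPy; not part of the original notes) for the example just worked, taking t₀ = 0 and x₀ = 0.

```python
import sympy as sp

# Verify exactness of g(t,x) x' + f(t,x) = 0 for f = t x^2 + 1, g = t^2 x,
# and recover h with dh/dt = f, dh/dx = g.
t, x = sp.symbols('t x')
f = t*x**2 + 1
g = t**2 * x
print(sp.simplify(sp.diff(f, x) - sp.diff(g, t)))        # 0  -> the ODE is exact
h = sp.integrate(f, (t, 0, t)) + sp.integrate(g.subs(t, 0), (x, 0, x))
print(sp.expand(h))                                      # t**2*x**2/2 + t
```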
Let's state some general theorems about the existence and uniqueness of solutions to first order ODEs of the general nonlinear variety
ẋ = f(t, x).
Theorem 12.3. Let f = f(t, x) : R² → R.
1. If f and ∂_x f are continuous in an open set A ⊂ R², then, for every pair (t₀, x₀) ∈ A, a unique local solution to ẋ = f(t, x) exists such that x(t₀) = x₀.
2. If f and ∂_x f are continuous in R² and
|f(t, x)| ≤ a(t)|x| + b(t)
for continuous functions a, b : R → R, then, for every pair (t₀, x₀) ∈ R², a unique global solution (i.e., defined on all of R) to ẋ = f(t, x) exists such that x(t₀) = x₀.
Remark 12.4. By local solution, we mean a function x and a time interval (s, T) exist such that t₀ ∈ (s, T), (t, x(t)) ∈ A for all t ∈ (s, T), and (of course) ẋ = f(t, x) with x(t₀) = x₀.
Remark 12.5. Regarding uniqueness, we mean that if x₁ and x₂ are local solutions with x₁(t₀) = x₂(t₀), then x₁(t) = x₂(t) for all times t at which both solutions exist simultaneously.
13. Lecture 13: ODEs Cont'd
13.1. Bernoulli's Equation. Bernoulli's equation is a family of first order nonlinear ODEs. This family is parameterized by a constant r ∈ R, and takes the form
ẋ + a(t)x = b(t)xʳ.
Again, r ∈ R is a fixed constant.
There are three cases to consider.
Case 1: r = 0. When r = 0, our equation is linear. Indeed, we get
ẋ + a(t)x = b(t).
So we can apply what we know about first order linear ODEs to solve it.
Case 2: r = 1. When r = 1, we can rearrange our equation so that it is separable. Indeed,
ẋ + a(t)x = b(t)x
is equivalent to
ẋ = f(t)g(x) with f(t) = b(t) − a(t) and g(x) = x.
So we can apply what we know about first order separable ODEs to solve it.
Case 3: r ≠ 0 and r ≠ 1. When r ≠ 0 and r ≠ 1, we have to do a change of variables.
Remark 13.1. In this case, we typically only look for positive solutions. Otherwise, xʳ may not be well-defined. That said, given a specific r ∈ R, you can analyze when xʳ is well-defined to say something about solutions which are not just positive.
Remark 13.2. Don't forget to check if there is a constant solution, like x(t) = 0 for all t ∈ R (also written as x ≡ 0), to the ODE.
What do we mean by a change of variables, also known as a transformation of variables? A change of variables is a process of defining a new ODE from our original ODE by defining a new function from x. Let's see this explicitly in this case. Here we define the function
z = x^{1−r},
and see that x solves Bernoulli's equation if and only if z solves the linear ODE
(1/(1 − r)) ż + a(t)z = b(t).
Indeed,
ż = (1 − r)x^{−r}ẋ.
So after rewriting our original ODE as
x^{−r}ẋ + a(t)x^{1−r} = b(t)
(multiply by x^{−r}) and using the formulas for z and ż, we find the linear ODE above. We know how to solve first order linear ODEs. Therefore, after we solve for z, we then determine x by letting
x = z^{1/(1−r)}.
For example, let's solve
ẋ − x = e^t x².
First, note that x ≡ 0 is a solution. If x(t) ≠ 0 for every t ∈ R, then we have
ẋ/x² − 1/x = e^t.
Set
z = 1/x.
Then,
ż = −ẋ/x².
Hence, we find the ODE
−ż − z = e^t ⇔ ż + z = −e^t,
which is linear. Thus, observe that
(ze^t)′ = że^t + ze^t = −e^{2t}
if and only if
ż + z = −e^t
(multiply by e^t). In turn,
ze^t = −e^{2t}/2 + C,
or
z = −e^t/2 + Ce^{−t}.
Changing variables back, we find that
x = 1/z = 2e^t/(2C − e^{2t}) on (−∞, ½ ln 2C) and (½ ln 2C, ∞)
also solves our ODE (on the intervals defined above).
13.2. Riccati's Equation. Riccati's equation is a first order nonlinear ODE. It takes the form
ẋ = P(t) + Q(t)x + R(t)x².
In general, this ODE can only be solved through numerical methods. But if we already know a particular solution, we can use a change of variables to find the general solution. Indeed, if we know that
u = u(t)
is a particular solution, we can define z implicitly by the equation
x = u + 1/z.
Substituting u + z⁻¹ into our ODE and using that u is a solution, we find an ODE for z. Indeed, notice that
ẋ = u̇ − żz⁻²
and
u̇ = P(t) + Q(t)u + R(t)u².
Also, observe that
ẋ = P(t) + Q(t)x + R(t)x²
if and only if
u̇ − żz⁻² = P(t) + Q(t)(u + z⁻¹) + R(t)(u + z⁻¹)²
= P(t) + Q(t)u + R(t)u² + Q(t)z⁻¹ + R(t)(2uz⁻¹ + z⁻²),
which, using that u is a solution, is equivalent to
−żz⁻² = Q(t)z⁻¹ + R(t)(2uz⁻¹ + z⁻²).
Multiplying across by −z² and rearranging, we find the ODE
ż + (Q(t) + 2R(t)u(t))z = −R(t).
Letting
a(t) = Q(t) + 2R(t)u(t) and b(t) = −R(t),
we see that the ODE for z we found is linear, which we know how to solve. After we solve for z, we use the definition of z to determine x.
Remark 13.3. This whole process assumed we knew a particular solution already. Sometimes we can guess a solution, and then use this procedure.
For example, consider
ẋ − x² + 2e^t x = e^{2t} + e^t.
Observe that
u(t) = e^t
is a particular solution (found by guessing). So let's do our change of variables:
x = u + 1/z.
In this case, the ODE for z we find is
ż + (Q(t) + 2R(t)u(t))z = −R(t)
with Q(t) = −2e^t and R(t) = 1, or
ż = −1.
We then see that
z = −t + C,
from which we determine that
x = e^t + 1/(C − t).
(Note that x is only defined when t ≠ C.)
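The final formula is easy to verify by substitution. Here is a sketch (assuming SymPy; not part of the original notes).

```python
import sympy as sp

# Check that x = e^t + 1/(C - t) solves x' - x^2 + 2 e^t x = e^{2t} + e^t.
t, C = sp.symbols('t C')
x = sp.exp(t) + 1/(C - t)
residual = sp.diff(x, t) - x**2 + 2*sp.exp(t)*x - (sp.exp(2*t) + sp.exp(t))
print(sp.simplify(residual))   # expect 0
```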
13.3. Equations with Homogeneous Terms.
Definition 13.4. Let k be an integer and suppose that f : Rⁿ → R is such that
f(λx) = λᵏ f(x) for all λ ∈ R.
Then, we say f is k-homogeneous or homogeneous of degree k.
Suppose that we have the ODE
P(t, x)ẋ + Q(t, x) = 0
with P and Q homogeneous of degree k, i.e., for all λ ∈ R,
P(λt, λx) = λᵏ P(t, x) and Q(λt, λx) = λᵏ Q(t, x).
In this case, we can again perform a change of variables to simplify our ODE into something we know how to solve. Indeed, if we let
x = tz,
then our ODE is equivalent to
P(t, tz)(tz)′ + Q(t, tz) = 0.
Using the k-homogeneity of P and Q and the product rule, we find the equivalent ODE
tᵏ P(1, z)z + tᵏ P(1, z)tż + tᵏ Q(1, z) = 0.
In other words,
ż = −(1/t) · (P(1, z)z + Q(1, z))/P(1, z).
This is a separable ODE. So after solving it, we can determine x very easily.
Remark 13.5. These computations are formal. Since we cannot divide by 0, we have to be careful about the expressions we have on the right-hand side of the ODE for z.
For example, consider
tẋ = x − te^{x/t} on (0, ∞).
If we set
P(t, x) = t and Q(t, x) = x − te^{x/t},
we see that P and Q are 1-homogeneous. So setting x = tz, we find the following ODE for z:
tẋ = x − te^{x/t} ⇔ t(tz)′ = tz − te^{tz/t};
that is, if and only if
tz + t²ż = tz − te^z,
which is equivalent to
ż = −t⁻¹ e^z.
Using what we know about separable ODEs, we look at
∫ e^{−z} dz = ∫ −dt/t.
Hence, we see that
−e^{−z} = −ln |t| + C = −ln t + C,
since t > 0. Thus,
z = −ln(ln t + C).
Finally,
x = tz = −t ln(ln t + C).
14. Lecture 14: ODEs Cont'd
Recall that a second order ODE looks like
F(t, x, ẋ, ẍ) = 0
for some given F : R⁴ → R. Again, x = x(t) is the unknown, and
ẋ = dx/dt = x′ and ẍ = d²x/dt² = x″.
In general, it is very difficult to find solutions to second order ODEs. But there are some specific cases we can understand very well. First, we will consider three cases that can be reduced / transformed into first order equations. Then, we will move on to some other second order ODEs, and learn some new techniques.
Case 1: We start with the ODE
ẍ = f(t).
To solve this ODE, we just integrate twice.
For example, consider
ẍ = k for some fixed k ∈ R.
If we integrate once, we find that
ẋ = kt + A
for any A ∈ R. Integrating again, we see that
x = kt²/2 + At + B for any A, B ∈ R.
Case 2: Now let's consider the ODE
ẍ = f(t, ẋ).
Notice that x on its own is missing. So if we let
z = ẋ,
we have the ODE
ż = f(t, z).
This change of variables reduces our ODE to a first order ODE, which we can try to analyze and solve with what we have already learned. If we can solve for z, then we simply integrate
z = ẋ
to determine x.
For example, consider the ODE
ẍ = ẋ + t.
If we let z = ẋ, then we find the first order ODE
ż = z + t,
which we have already solved. In particular,
z = Ae^t − t − 1 for any A ∈ R.
Integrating, we find that
x = Ae^t − t²/2 − t + B for any A, B ∈ R.
For a second order ODE, we typically need to prescribe two conditions for an initial value problem. An initial value problem in the second order setting looks like
F(t, x, ẋ, ẍ) = 0 on (t₀, T), x(t₀) = x₀, ẋ(t₀) = ẋ₀,
where x₀, ẋ₀ ∈ R are two given constants.
For example, if we want to solve the IVP
ẍ = ẋ + t on (0, ∞) with x(0) = 1 and ẋ(0) = 2,
we plug x(0) = 1 and ẋ(0) = 2 into what we just obtained:
1 = x(0) = Ae⁰ − 0²/2 − 0 + B = A + B
and
2 = ẋ(0) = Ae⁰ − 0 − 1 = A − 1.
So A = 3 and B = −2, and our solution is
x = 3e^t − t²/2 − t − 2.
Let's do another example. Consider the ODE
ẍ = cos t − ẋ/t.
To solve this ODE, we consider the change of variables z = ẋ, and find the equivalent ODE, for z,
ż = cos t − z/t ⇔ tż + z = t cos t ⇔ (tz)′ = t cos t.
So, integrating by parts,
zt = ∫ t cos t dt = t sin t + cos t + A with A ∈ R.
Multiplying by t⁻¹ and integrating again, we find that
x = ∫ z dt = −cos t + ∫ t⁻¹ cos t dt + A ln |t|.
Remark 14.1. Since the indefinite integral of t⁻¹ cos t does not have a nice closed form, we leave it as an indefinite integral. Also, notice that the general solution here only has one free constant A, whereas, in all of our other examples, we've seen two free constants. The second constant lives inside the indefinite integral of t⁻¹ cos t.
Case 3: Now let's consider the ODE
ẍ = f(x, ẋ).
Notice that an explicit dependence on t is missing. The standard method to try to solve this type of ODE is to think of t as a function that depends on x, i.e., think of x as the independent variable, and find and solve an ODE for t. In particular, let t′ and t″ denote the first and second derivatives of t with respect to x: formally,
ẋ = dx/dt = 1/(dt/dx) = 1/t′
and
ẍ = d/dt (dx/dt) = (dx/dt) d/dx (1/t′) = (1/t′)(−t″/(t′)²) = −t″/(t′)³.
So replacing ẍ and ẋ in our ODE, we find the equivalent ODE
−t″/(t′)³ = f(x, (t′)⁻¹) ⇔ t″ = −(t′)³ f(x, (t′)⁻¹).
Observe that this ODE, as an ODE for t = t(x), does not depend explicitly on t. So this ODE is one that falls into Case 2, above, which we know how to solve.
For example, consider the ODE
ẍ = −xẋ,
which, from our procedure above, is equivalent to
t″ = x(t′)².
Using what we learned in Case 2, we do the change of variables
z = z(x) = t′(x),
and find the ODE
z′ = xz².
This is separable, and so we find the integral equality
∫ z⁻² dz = ∫ x dx ⇔ −1/z = x²/2 + C.
Solving for z and then integrating, we find that
t = ∫ z dx = −2 ∫ dx/(x² + 2C).
This integral has three cases to consider: C = 0, C > 0, and C < 0.
When C > 0, we see that
−2 ∫ dx/(x² + 2C) = −(1/C) ∫ dx/((x/√(2C))² + 1) = −√(2/C) ∫ dy/(y² + 1) = −√(2/C) (arctan y + A) = −√(2/C) (arctan(x/√(2C)) + A).
Thus, solving for x, we find that
x = √(2C) tan(−(√(C/2) t + A)).
When C < 0, we see that
−2 ∫ dx/(x² + 2C) = −2 ∫ dx/(x² − (√(−2C))²) = (1/√(−2C)) ( ln |(x + √(−2C))/(x − √(−2C))| + A ).
We cannot solve for x explicitly, and so we leave the solution in an implicit form
t = (1/√(−2C)) ( ln |(x + √(−2C))/(x − √(−2C))| + A ).
When C = 0, we see that
−2 ∫ dx/(x² + 2C) = −2 ∫ dx/x² = 2/x + A.
And so,
x = 2/(t − A).
Notice that, in these last two cases, the solution x is not defined on all of R = (−∞, ∞). For instance, when C = 0, we see that t ≠ A.
14.1. Second Order Linear ODEs. A second order linear ODE takes the following form:
ẍ + a(t)ẋ + b(t)x = f(t),
where a, b, and f are given functions. While this type of ODE is very difficult to solve by hand for most a, b, and f, we can say something about the general structure of solutions, even if we can't find them.
When f(t) = 0 for all t, we call this ODE homogeneous. Otherwise, we call this ODE non-homogeneous.
Definition 14.2. Two functions u₁ and u₂ are non-proportional if neither is a constant multiple of the other. (This is linear independence for functions.)
Theorem 14.3. Consider the non-homogeneous ODE
ẍ + a(t)ẋ + b(t)x = f(t)
and its homogeneous counterpart
ẍ + a(t)ẋ + b(t)x = 0.
1. The general solution to the homogeneous ODE is given by
x_h = C₁u₁ + C₂u₂
where C₁, C₂ ∈ R are arbitrary constants and u₁ and u₂ are any two particular solutions that are non-proportional.
2. The general solution to the non-homogeneous ODE is given by
x = u* + x_h
where u* is any particular solution of the ODE and x_h is the general solution of the homogeneous counterpart.
Remark 14.4. Notice that if u₁ and u₂ solve a homogeneous equation, then C₁u₁ + C₂u₂, where C₁, C₂ ∈ R are arbitrary constants, solves the same homogeneous equation. This is why we call this type of ODE linear. (Compare this with first order linear ODEs.)
If u* and x solve a non-homogeneous equation, then x − u* and u* − x solve the homogeneous counterpart.
In general, this theorem is the best we can do, but when a and b are constant functions, we can do better.
Theorem 14.5. Consider the ODE
ẍ + aẋ + bx = 0,
where a, b ∈ R are arbitrary constants.
1. If D = a² − 4b > 0, then the general solution is
x = C₁e^{r₁t} + C₂e^{r₂t}
where r₁ and r₂ are the two real roots of the characteristic equation r² + ar + b = 0.
2. If D = a² − 4b = 0, then the general solution is
x = C₁e^{rt} + C₂te^{rt}
where r is the real root of the characteristic equation r² + ar + b = 0.
3. If D = a² − 4b < 0, then the general solution is
x = e^{αt}(C₁ cos βt + C₂ sin βt)
where α = −a/2 and β = √(−D)/2.
Here C₁, C₂ ∈ R are arbitrary constants.
Remark 14.6. Where did the characteristic equation come from? Well, if we plug x = e^{rt} into our equation ẍ + aẋ + bx = 0, we see that
(r² + ar + b)e^{rt} = 0,
which is true if and only if r² + ar + b = 0.
What happens if we add a right hand side f = f(t), and consider
ẍ + aẋ + bx = f(t),
where a, b ∈ R are arbitrary constants?
The general algorithm to solve a linear non-homogeneous constant coefficient ODE is as follows.
Step 1: Solve the homogeneous counterpart ẍ + aẋ + bx = 0; call the solution x_h.
Step 2: Find any particular solution to the ODE; call the solution x_nh.
Step 3: Add these solutions together: x = x_h + x_nh.
Of course, Step 2 seems to be putting us in a chicken vs. egg scenario. But we will see a method to deal with this. That said, there are three easy cases to consider first.
Case 1: If f is a polynomial of degree n, then you should look for an x_nh that is also a polynomial of degree n.
Case 2: If f = pe^{qt}, then guess x_nh = Ae^{qt}.
Case 3: If f = p sin rt + q cos rt, then guess x_nh = A sin rt + B cos rt.
Let's do an example to see all this in action. Let's solve the ODE
ẍ − 3ẋ + 2x = 2t² + 2t + 4.
Step 1 is to solve the homogeneous counterpart
ẍ − 3ẋ + 2x = 0,
which has characteristic equation r² − 3r + 2 = 0. The roots of the equation are r₁ = 1 and r₂ = 2. So
x_h = C₁e^t + C₂e^{2t}.
Now we deal with Step 2. From Case 1, we expect x_nh to look like a degree two polynomial At² + Bt + C. Let's see what A, B, and C need to be in order for this to be a solution. If x_nh = At² + Bt + C, then
ẋ_nh = 2At + B and ẍ_nh = 2A.
Hence, we need
2A − 3(2At + B) + 2(At² + Bt + C) = 2t² + 2t + 4.
This happens if and only if
2A = 2, −6A + 2B = 2, and 2A − 3B + 2C = 4.
In other words,
A = 1, B = 4, and C = 7.
Thus,
x_nh = t² + 4t + 7,
and
x = C₁e^t + C₂e^{2t} + t² + 4t + 7.
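A computer algebra system reproduces this answer directly. Here is a sketch (assuming SymPy; not part of the original notes).

```python
import sympy as sp

# Solve x'' - 3x' + 2x = 2t^2 + 2t + 4 and compare with the hand computation.
t = sp.symbols('t')
x = sp.Function('x')
ode = sp.Eq(x(t).diff(t, 2) - 3*x(t).diff(t) + 2*x(t), 2*t**2 + 2*t + 4)
print(sp.dsolve(ode, x(t)))   # expect C1*exp(t) + C2*exp(2*t) + t**2 + 4*t + 7
```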
15. Lecture 15: ODEs Cont'd and the Calculus of Variations
15.1. ODEs: Variation of Parameters. Until now we have only been able to solve second order linear constant coefficient ODEs with very particular right hand sides f = f(t). Through an example, we will now illustrate a method that, in principle, will allow you to deal with any f = f(t).
Consider the ODE
ẍ − 2ẋ − 3x = te^{−t}.
Since f = f(t) = te^{−t} is neither a polynomial, a function of the form pe^{qt}, nor a function of the form p sin(rt) + q cos(rt), the methods we've developed so far won't help here. Instead, we employ a technique called the variation of parameters.
Step 1 in solving this ODE is the same as before: solve the homogeneous counterpart
ẍ − 2ẋ − 3x = 0.
We see that
x_h = C₁e^{−t} + C₂e^{3t}.
Step 2 is the variation of parameters. Let
x₁ = x₁(t) = e^{−t} and x₂ = x₂(t) = e^{3t}
be the two functions which make up the solution to the homogeneous equation. Now we solve the system of equations
ċ₁x₁ + ċ₂x₂ = 0 and ċ₁ẋ₁ + ċ₂ẋ₂ = f(t)
for two functions ċ₁ = ċ₁(t) and ċ₂ = ċ₂(t). Once we find ċ₁ and ċ₂, we integrate them to get two functions c₁ and c₂. The solution to the ODE is then
x = x_h + x_nh = C₁x₁ + C₂x₂ + c₁x₁ + c₂x₂.
In this case, specifically, we find the system
ċ₁e^{−t} + ċ₂e^{3t} = 0 and −ċ₁e^{−t} + 3ċ₂e^{3t} = te^{−t}.
Adding these two equations together, we find that
ċ₂e^{3t} + 3ċ₂e^{3t} = te^{−t}.
So
ċ₂ = te^{−4t}/4.
Plugging this into the first equation, we find that
ċ₁ = −t/4.
Now we integrate to determine that
c₁(t) = −t²/8 and c₂(t) = (1/4) ∫ te^{−4t} dt = −te^{−4t}/16 − e^{−4t}/64,
after integrating by parts. So we find that
x_nh = e^{−t} ( −t²/8 − t/16 − 1/64 ).
Thus, the solution is
x = C₁e^{−t} + C₂e^{3t} − (e^{−t}/64)(8t² + 4t + 1).
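The particular solution can be double-checked by substitution. Here is a sketch (assuming SymPy; not part of the original notes).

```python
import sympy as sp

# Verify the particular solution found by variation of parameters
# for x'' - 2x' - 3x = t e^{-t}.
t = sp.symbols('t')
xnh = -sp.exp(-t) * (8*t**2 + 4*t + 1) / 64
residual = xnh.diff(t, 2) - 2*xnh.diff(t) - 3*xnh - t*sp.exp(-t)
print(sp.simplify(residual))   # expect 0
```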
15.2. Calculus of Variations. We are now going to move on to optimization problems where our optimizer will be a function rather than a point. The standard form of these problems is
max_x ∫_{t₀}^{t₁} f(t, x, ẋ) dt or min_x ∫_{t₀}^{t₁} f(t, x, ẋ) dt
subject to
x(t₀) = x₀ and x(t₁) = x₁.
Here f : R³ → R is a C² function and x = x(t) is C¹. In general, x will end up being at least C² as well.
These problems are called variational problems, and the methods used to solve them come from the theory of the calculus of variations. Sometimes the calculus of variations is called dynamic programming.
Theorem 15.1. If x* = x*(t) is a solution of our variational problem, then
∂_x f(t, x*, ẋ*) = d/dt (∂_ẋ f(t, x*, ẋ*)).
The equation in the theorem,
∂_x f(t, x*, ẋ*) = d/dt (∂_ẋ f(t, x*, ẋ*)),
is called the Euler–Lagrange equation associated to the variational problem determined by f. Let's unpack this equation. The right hand side is a total derivative, and if x* is C², we can distribute the d/dt, and find
d/dt (∂_ẋ f(t, x*, ẋ*)) = ∂_{tẋ}f(t, x*, ẋ*) + ∂_{xẋ}f(t, x*, ẋ*)ẋ + ∂_{ẋẋ}f(t, x*, ẋ*)ẍ
= ∂₁₃f(t, x*, ẋ*) + ∂₂₃f(t, x*, ẋ*)ẋ + ∂₃₃f(t, x*, ẋ*)ẍ.
The left hand side is
∂_x f(t, x*, ẋ*) = ∂₂f(t, x*, ẋ*).
In turn, when x* is C², the EL equation is a second order ODE:
∂₁₃f(t, x*, ẋ*) + ∂₂₃f(t, x*, ẋ*)ẋ + ∂₃₃f(t, x*, ẋ*)ẍ − ∂₂f(t, x*, ẋ*) = 0.
Theorem 15.2. In the maximization (minimization) case, if x* solves the EL equation associated to a variational problem and f is concave (convex) in (x, ẋ) for every fixed t, then x* is a solution to the variational problem.
Theorem 15.3. In the maximization (minimization) case, if x* solves the EL equation associated to a variational problem and f is strictly concave (convex) in (x, ẋ) for every fixed t, then x* is the unique solution to the variational problem.
For example, consider the problem
max_x ∫_{t₀}^{t₁} f(ẋ) dt
subject to
x(t₀) = x₀ and x(t₁) = x₁.
The EL equation in this case is
d/dt (f′(ẋ)) = 0 ⇒ f′(ẋ(t)) = constant (as a function of t).
Notice that if x = x(t) is the line joining (t₀, x₀) and (t₁, x₁), then ẋ is constant, and so it would solve the EL equation. Indeed, the line joining (t₀, x₀) and (t₁, x₁) is
x(t) = ((x₀ − x₁)/(t₀ − t₁)) t + (t₀x₁ − t₁x₀)/(t₀ − t₁).
Then,
ẋ(t) = (x₀ − x₁)/(t₀ − t₁) = s for all t ∈ R,
i.e., it is constant. So f′(ẋ(t)) = f′(s) = constant, as desired. By our theorem, if f is concave, then this line would solve the variational problem.
Let's do another example:
max_x ∫₀¹ (1 − x − 3ẋ − 2ẋ²)e^t dt
subject to
x(0) = 0 and x(1) = 3.
To solve this, we first identify the integrand f(t, x, ẋ) = (1 − x − 3ẋ − 2ẋ²)e^t. Second, we compute the EL equation. To do this we need
∂_x f = −e^t, ∂_ẋ f = −(3 + 4ẋ)e^t, and d/dt(∂_ẋ f) = −(3 + 4ẋ + 4ẍ)e^t.
Thus, the EL equation is
−e^t = −e^t(3 + 4ẋ + 4ẍ) ⇔ ẍ + ẋ = −1/2.
This is a linear 2nd order ODE with constant coefficients, which we know how to solve. Its solution is
x = C₁ + C₂e^{−t} − t/2.
Third, we use our boundary conditions x(0) = 0 and x(1) = 3 to determine C₁ and C₂. In particular, we find that
C₁ = −C₂ = 7e/(2(e − 1)).
To conclude, we need to check if f is concave in (x, ẋ) for all t. To this end, we compute the Hessian of f with respect to (x, ẋ):
D²_{x,ẋ} f(t, x, ẋ) = diag(0, −4)e^t.
This is symmetric and its eigenvalues are 0 and −4e^t, which are both less than or equal to 0. So f is indeed concave in (x, ẋ) for all t. Hence,
x(t) = 7e/(2(e − 1)) − (7e/(2(e − 1)))e^{−t} − t/2
solves the problem.
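As an illustration (a numerical sketch, assuming NumPy and SciPy; not part of the original notes), we can compare the value of the functional at this candidate with its value at a perturbed admissible curve having the same endpoints; by concavity, the candidate should do at least as well.

```python
import numpy as np
from scipy.integrate import quad

# Compare I(x) = ∫_0^1 (1 - x - 3x' - 2x'^2) e^t dt at the EL solution with
# its value at a perturbed admissible curve (same endpoints x(0)=0, x(1)=3).
C1 = 7*np.e / (2*(np.e - 1))
x_star  = lambda t: C1 - C1*np.exp(-t) - t/2
dx_star = lambda t: C1*np.exp(-t) - 0.5
# perturbation vanishing at t = 0 and t = 1
x_pert  = lambda t: x_star(t) + 0.3*np.sin(np.pi*t)
dx_pert = lambda t: dx_star(t) + 0.3*np.pi*np.cos(np.pi*t)

I = lambda x, dx: quad(lambda t: (1 - x(t) - 3*dx(t) - 2*dx(t)**2)*np.exp(t), 0, 1)[0]
print(I(x_star, dx_star), I(x_pert, dx_pert))   # the first value should be larger
```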
Let's do another example:
min_x ∫₀ᵀ (x² + cẋ²) dt with c > 0
subject to
x(0) = x₀ and x(T) = 0.
Again, we compute the EL equation:
2cẍ = 2x ⇔ ẍ − x/c = 0.
The general solution to this ODE is
x = Ae^{rt} + Be^{−rt} with r = 1/√c.
Using the boundary conditions x(0) = x₀ and x(T) = 0, we find that
A + B = x₀ and Ae^{rT} + Be^{−rT} = 0.
Thus,
A = −x₀e^{−rT}/(e^{rT} − e^{−rT}) and B = x₀e^{rT}/(e^{rT} − e^{−rT}).
Notice that
D²_{x,ẋ} f(t, x, ẋ) = diag(2, 2c).
Since c > 0, the eigenvalues of this symmetric matrix are both positive. So f is strictly convex, which tells us that the solution to the EL equation we found, subject to the boundary conditions, is the unique minimizer.
Question: Where does the EL equation come from? Well, let's derive it in the maximization case. The minimization case is essentially the same. To do this, let
I(x) = ∫_{t₀}^{t₁} f(t, x, ẋ) dt.
Also, let φ : R → R be C¹ and such that φ ≥ 0 and φ(t₀) = 0 = φ(t₁). Now suppose that x* is a maximizer. Then, x = x* + λφ is an admissible function, i.e., a competitor in the maximization problem. So
I(x* + λφ) ≤ I(x*) for all λ ∈ R.
In turn, the function
g(λ) = I(x* + λφ)
has a maximum at λ = 0, from which it follows that
g′(0) = 0.
But what is g′?
g′(λ) = ∫_{t₀}^{t₁} [ ∂_x f(t, x* + λφ, ẋ* + λφ̇)φ + ∂_ẋ f(t, x* + λφ, ẋ* + λφ̇)φ̇ ] dt.
Hence,
0 = g′(0) = ∫_{t₀}^{t₁} ∂_x f(t, x*, ẋ*)φ dt + ∫_{t₀}^{t₁} ∂_ẋ f(t, x*, ẋ*)φ̇ dt.
Let's integrate the second integral by parts:
∫_{t₀}^{t₁} ∂_ẋ f(t, x*, ẋ*)φ̇ dt = ∂_ẋ f(t, x*, ẋ*)φ |_{t=t₀}^{t=t₁} − ∫_{t₀}^{t₁} d/dt(∂_ẋ f(t, x*, ẋ*))φ dt
= −∫_{t₀}^{t₁} d/dt(∂_ẋ f(t, x*, ẋ*))φ dt
since φ(t₀) = 0 = φ(t₁), by assumption. Putting this back into what we had, we deduce that
0 = ∫_{t₀}^{t₁} [ ∂_x f(t, x*, ẋ*) − d/dt(∂_ẋ f(t, x*, ẋ*)) ] φ dt.
By a fact in calculus (see the text if you are interested, Chapter 8, Section 3, Theorem 8.3.2), this implies that
∂_x f(t, x*, ẋ*) − d/dt(∂_ẋ f(t, x*, ẋ*)) = 0,
which is exactly our EL equation.
16. Lecture 16: Calculus of Variations Cont'd
Let's start by answering the question, "How does the concavity of f in (x, ẋ) imply that an x* which solves the EL equation is a maximizer?" To see this, let y be an arbitrary admissible function. Then,
I(y) − I(x*) = ∫_{t₀}^{t₁} ( f(t, y, ẏ) − f(t, x*, ẋ*) ) dt
≤ ∫_{t₀}^{t₁} ∇_{x,ẋ} f(t, x*, ẋ*) · (y − x*, ẏ − ẋ*) dt
= ∫_{t₀}^{t₁} [ ∂_x f(t, x*, ẋ*) − d/dt(∂_ẋ f(t, x*, ẋ*)) ] (y − x*) dt
= 0.
The inequality is a consequence of the concavity of f in (x, ẋ). The equality after it follows from integrating by parts, and the last equality follows since x* solves the EL equation. In turn,
I(x*) ≥ I(y) for all admissible y,
as desired. In other words, x* is a solution to the variational problem.
Now let's see an example of how and where a variational problem shows up in real life: the optimal savings problem. Consider an economy evolving over time, where
k = k(t) represents the capital stock,
c = c(t) represents consumption, and
y = y(t) represents net national product.
Suppose that
y(t) = g(k(t)) where g′(k) > 0 and g″(k) ≤ 0.
That is, g is concave and increasing, i.e., net national product is concave and increasing as a function of stock. Also, assume that
net national product = y(t) = c(t) + k̇(t) = consumption + investment
and
k(0) = k₀ is the given capital stock existing today at time t = 0.
Furthermore, assume that society has a utility function
u = u(c),
depending on consumption in the following way:
u′(c) > 0 and u″(c) < 0.
An interpretation of this dependence is that high levels of consumption lead to a lower increase in satisfaction from a given increase in consumption compared to low levels of consumption. Finally, let
r ≥ 0 be the discount factor,
which describes how much more the present matters than the future. The optimal savings problem is then
max_k ∫₀ᵀ u(c)e^{−rt} dt = max_k ∫₀ᵀ u(g(k) − k̇)e^{−rt} dt,
recalling that
c = y − k̇ and y = g(k).
What is the solution of the optimal savings problem? Well, let's use what we have learned. First, we derive the EL equation. We compute
f(t, k, k̇) = u(g(k) − k̇)e^{−rt}.
So
∂_k f = u′(g(k) − k̇)g′(k)e^{−rt} = u′(c)g′(k)e^{−rt}
and
∂_k̇ f = −u′(g(k) − k̇)e^{−rt} = −u′(c)e^{−rt}.
Hence,
d/dt(∂_k̇ f) = −u″(c)c′e^{−rt} + ru′(c)e^{−rt},
and the EL equation is
u′(c)(g′(k) − r) + u″(c)c′ = 0.
Now set
E_c(u′) = cu″(c)/u′(c),
which represents the elasticity of marginal utility with respect to consumption. We then find (rearranging the EL equation) that
c′/c = (u′(c)/(cu″(c)))(r − g′(k)) = (r − g′(k))/E_c(u′).
Notice that E_c(u′) < 0 by our assumptions on utility. In turn,
c′/c > 0 ⇔ g′(k) > r and c′/c < 0 ⇔ g′(k) < r.
This tells us that consumption increases if and only if the marginal productivity of capital trumps the discount factor. Since
c′ = g′(k)k̇ − k̈,
the EL equation tells us that
k̈ − g′(k)k̇ + (u′(c)/u″(c))(r − g′(k)) = 0,
which is an ODE for k. As g is concave, g(k) − k̇ is concave in (k, k̇). Moreover, as u is increasing and concave, u(g(k) − k̇)e^{−rt} is also concave in (k, k̇). So any solution to our ODE above will solve our optimal savings problem.
16.1. More General / Other Terminal Conditions. So far we have only considered the boundary conditions
x(t₀) = x₀ and (1) x(t₁) = x₁.
Now we will consider the following four cases: (a) x(t₁) is free; (b) x(t₁) ≥ x₁; (c) x(t₁) ≤ x₂; and (d) x₁ ≤ x(t₁) ≤ x₂.
The new feature, when dealing with (a)–(d) rather than (1), is called a transversality condition. So we consider the problem
max_x ∫_{t₀}^{t₁} f(t, x, ẋ) dt
subject to
x(t₀) = x₀ and any one of (a)–(d).
Theorem 16.1 (Necessary Condition). If x* solves the variational problem with one of the terminal conditions (a), (b), (c), or (d), then x* satisfies the EL equation and the transversality condition
(a) ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} = 0.
(b) ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} = 0 if x(t₁) > x₁, and ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} ≤ 0 if x(t₁) = x₁.
(c) ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} = 0 if x(t₁) < x₂, and ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} ≥ 0 if x(t₁) = x₂.
(d) ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} = 0 if x(t₁) ∈ (x₁, x₂); ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} ≤ 0 if x(t₁) = x₁; and ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} ≥ 0 if x(t₁) = x₂.
Theorem 16.2 (Sufficient Condition). Suppose f is concave in (x, ẋ) and x* satisfies the EL equation and the transversality condition
∂_ẋ f(t, x*, ẋ*)|_{t=t₁} = 0.
(1) If x*(t₁) ∈ [x₁, x₂], then x* solves the variational problem.
(2) If x*(t₁) < x₁, then replace the transversality condition with the terminal condition x(t₁) = x₁. The solution to the EL equation subject to x(t₀) = x₀ and x(t₁) = x₁ will solve the variational problem.
(3) If x*(t₁) > x₂, then replace the transversality condition with the terminal condition x(t₁) = x₂. The solution to the EL equation subject to x(t₀) = x₀ and x(t₁) = x₂ will solve the variational problem.
Why do we have the transversality condition? Let's look at (a) and (b), for example. For (a), let
I(x) = ∫_{t₀}^{t₁} f(t, x, ẋ) dt.
Consider any φ : R → R such that φ ≥ 0 and φ(t₀) = 0. Define
g(λ) = I(x* + λφ).
Like before, when we derived the EL equation, we know that
g′(0) = 0
if x* is a maximizer. So
0 = ∫_{t₀}^{t₁} [ ∂_x f(t, x*, ẋ*)φ + ∂_ẋ f(t, x*, ẋ*)φ̇ ] dt
= ∂_ẋ f(t, x*, ẋ*)φ |_{t=t₀}^{t=t₁} + ∫_{t₀}^{t₁} [ ∂_x f(t, x*, ẋ*) − d/dt(∂_ẋ f(t, x*, ẋ*)) ] φ dt
= ∂_ẋ f(t, x*, ẋ*)|_{t=t₁} φ(t₁).
Here we integrated by parts, used that x* satisfies the EL equation, and that φ(t₀) = 0. Since we may take φ(t₁) > 0, this forces the transversality condition (a).
For (b), we have to assume that x*(t₁) + λφ(t₁) ≥ x₁ to be admissible. If x*(t₁) > x₁, then the exact same argument gives us the same transversality condition. Why? Because x*(t₁) > x₁ allows us to consider both positive and negative λ. If x*(t₁) = x₁, then λφ(t₁) ≥ 0. In other words, we can only consider λ ≥ 0. So we only have the inequality
g′(0) ≤ 0,
which implies
∂_ẋ f(t, x*, ẋ*)|_{t=t₁} ≤ 0,
by a similar argument.
For example, let's look at the problem
min_x ∫₀^{ln 2} ( ẋ² + (x − 2)² ) dt
subject to
2 ≤ x(0) ≤ 3 and x(ln 2) = 1.
To solve this, first, we put it into our standard form:
max_y ∫_{−ln 2}^{0} ( −ẏ² − (y − 2)² ) ds
subject to
2 ≤ y(0) ≤ 3 and y(−ln 2) = 1,
with y(s) = x(−s). How did we do this? We did the change of variables t = −s in the integral and renamed y(s) = x(−s).
Recall, the EL equation in general is
∂_y f = d/ds (∂_ẏ f).
In this case, we find
−2(y − 2) = −2ÿ ⇔ ÿ − y = −2.
By what we know about ODEs,
y = C₁e^s + C₂e^{−s} + 2.
Using the initial condition y(−ln 2) = 1, we find the equation
1 = y(−ln 2) = C₁/2 + 2C₂ + 2,
which is true if and only if
C₁ = −2(1 + 2C₂).
Now we deal with the transversality condition
0 = ∂_ẏ f |_{s=0} = −2ẏ(0).
Note that
ẏ = C₁e^s − C₂e^{−s} = −2(1 + 2C₂)e^s − C₂e^{−s}.
So ẏ(0) = 0 if and only if
C₂ = −2/5 = C₁.
We need to see what y(0) is:
y(0) = −2/5 − 2/5 + 2 = 6/5 < 2.
Therefore, this cannot be the solution to our variational problem. Instead, the solution will be the solution to the EL equation with the conditions y(−ln 2) = 1 and y(0) = 2, once we verify that the integrand is concave. Using y(0) = 2 together with y(−ln 2) = 1, we find that
y = (2/3)(e^s − e^{−s}) + 2.
Since −ẏ² − (y − 2)² is concave in (y, ẏ), this y solves the problem. Going back to x, we see that
x = (2/3)(e^{−t} − e^t) + 2
solves the original minimization problem.
17. Lecture 17: Calculus of Variations Cont'd and Control Theory
Let's do another example. Consider
max_x ∫₀¹ (1 − 4x² − ẋ²) dt
subject to
x(0) = 1 and either (a) x(1) ≥ 0 or (d) −2 ≤ x(1) ≤ 0.
We start by deriving the EL equation. Since f = 1 − 4x² − ẋ²,
∂_x f = −8x, ∂_ẋ f = −2ẋ, and d/dt(∂_ẋ f) = −2ẍ.
So the EL equation is
ẍ − 4x = 0 with x(0) = 1.
From this, we find the function
x = Ce^{2t} + (1 − C)e^{−2t}.
The transversality condition is
−2ẋ(1) = ∂_ẋ f |_{t=1} = 0.
This tells us that
0 = ẋ(1) = (2Ce^{2t} − 2(1 − C)e^{−2t})|_{t=1},
i.e.,
C = 1/(1 + e⁴).
Now we need to see what x(1) is, now that we have found C:
x(1) = 2e²/(1 + e⁴) = 0.2658...
Thus, for (a), we find that x(1) ≥ 0, and
x = (1/(1 + e⁴))(e^{2t} + e^{4−2t})
is the solution, since f is concave in (x, ẋ). For (d), we see that x(1) ∉ [−2, 0], so we need to replace the transversality condition with the terminal point condition x(1) = 0 (since x(1) = 0.2658... lies above the upper endpoint 0; if instead x(1) were below −2, we'd impose the terminal point condition x(1) = −2). In other words, for
x = Ce^{2t} + (1 − C)e^{−2t},
we need to find C using x(1) = 0. In turn, we find that
x = (1/(1 − e⁴))(e^{2t} − e^{4−2t})
is the solution, again since f is concave in (x, ẋ).
17.1. Integral Constraints. When we were studying constrained optimization, we considered the method of Lagrange multipliers. If we introduce integral constraints, then this method will be useful again. Now we consider the general problem
max_x ∫_{t₀}^{t₁} f(t, x, ẋ) dt
subject to
x(t₀) = x₀,
∫_{t₀}^{t₁} h_i(t, x, ẋ) dt ≤ a_i for i = 1, . . . , n,
and
∫_{t₀}^{t₁} g_j(t, x, ẋ) dt = b_j for j = 1, . . . , m.
Sometimes, it will be convenient to write these last two constraints in vector form
∫_{t₀}^{t₁} h(t, x, ẋ) dt ≤ a and ∫_{t₀}^{t₁} g(t, x, ẋ) dt = b
with
h = (h₁, . . . , hₙ), a = (a₁, . . . , aₙ), g = (g₁, . . . , gₘ), and b = (b₁, . . . , bₘ).
Case 1: Equality Constraints. Let's first consider the case where we only have g. In this case, like the Lagrangian, we define the augmented integrand
f̃(t, x, ẋ) = f(t, x, ẋ) − λ · g(t, x, ẋ),
and study it to determine solutions to the variational problem
max_x ∫_{t₀}^{t₁} f(t, x, ẋ) dt
subject to
x(t₀) = x₀ and ∫_{t₀}^{t₁} g(t, x, ẋ) dt = b.
Theorem 17.1. Let x* be a solution of the EL equation
∂_x f̃ = d/dt(∂_ẋ f̃) with x(t₀) = x₀.
If f̃ is concave in (x, ẋ), then x* is a solution to the variational problem.
For example, consider
max_x ∫₀¹ −ẋ² dt
subject to
x(0) = x(1) = 0 and ∫₀¹ x dt = 1.
Remark 17.2. By the Fundamental Theorem of Calculus,
x(t₀) = x₀ and x(t₁) = x₁ ⇔ ∫_{t₀}^{t₁} ẋ dt = x₁ − x₀ with x(t₀) = x₀.
To solve this, we first use our remark to rewrite the first two constraints and put them in standard form, i.e., an initial condition and integral conditions, with no terminal condition:
x(0) = 0 and ∫₀¹ ẋ dt = 0.
Now let's construct our augmented integrand
f̃ = −ẋ² − λ₁ẋ − λ₂x,
and observe that
∂_x f̃ = −λ₂, ∂_ẋ f̃ = −2ẋ − λ₁, and d/dt(∂_ẋ f̃) = −2ẍ.
Therefore, the EL equation is
2ẍ = λ₂.
In turn,
x = λ₂t²/4 + C₁t + C₂.
Since x(0) = 0, we see that C₂ = 0. We need to determine λ₂ and C₁. To this end, we need
0 = ∫₀¹ ẋ dt = ( λ₂t²/4 + C₁t )|_{t=0}^{t=1} = λ₂/4 + C₁.
We also need
1 = ∫₀¹ x dt = λ₂/12 + C₁/2.
Hence,
λ₂ = −24 and C₁ = 6.
The solution to the EL equation subject to the constraints is
x = 6t(1 − t).
Since f̃ is concave for any λ₁ and λ₂, we conclude, by our theorem, that this x is a maximizer.
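To see the maximality numerically (a sketch, assuming NumPy and SciPy; not part of the original notes), compare the objective at x = 6t(1 − t) with its value at another curve satisfying the same endpoint and integral constraints, for instance (π/2) sin(πt).

```python
import numpy as np
from scipy.integrate import quad

# Compare J(x) = -∫_0^1 x'(t)^2 dt at the candidate x = 6t(1-t) with another
# admissible curve (x(0) = x(1) = 0 and ∫_0^1 x dt = 1).
J = lambda dx: -quad(lambda t: dx(t)**2, 0, 1)[0]
dx_star = lambda t: 6 - 12*t                       # derivative of 6t(1-t)
dx_alt  = lambda t: (np.pi**2/2)*np.cos(np.pi*t)   # derivative of (pi/2) sin(pi t)
print(J(dx_star), J(dx_alt))   # -12.0 vs roughly -12.18: the candidate wins
```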
Case 2: Mixed Constraints. In this case, we define the augmented integrand as
f̃(t, x, ẋ) = f(t, x, ẋ) − λ · g(t, x, ẋ) − μ · h(t, x, ẋ) with μ ≥ 0.
Theorem 17.3. Let x* be a solution of the EL equation
∂_x f̃ = d/dt(∂_ẋ f̃) with x(t₀) = x₀.
If f̃ is concave in (x, ẋ), then x* is a solution to the variational problem with
μ · ( ∫_{t₀}^{t₁} h(t, x*, ẋ*) dt − a ) = 0.
For example, consider
max_x ∫₀¹ −ẋ² dt
subject to
x(0) = 0, x(1) = 1, and ∫₀¹ x² dt ≤ a.
To solve this, we first construct our augmented integrand
f̃ = −ẋ² − λẋ − μx² with μ ≥ 0.
(Again, we use the FTC to put our constraints in standard form.) The EL equation is
ẍ = μx
since
∂_x f̃ = −2μx, ∂_ẋ f̃ = −2ẋ − λ, and d/dt(∂_ẋ f̃) = −2ẍ.
Using that x(0) = 0 and x(1) = 1, we find that
x(t) = t if μ = 0
and
x(t) = (e^{√μ t} − e^{−√μ t})/(e^{√μ} − e^{−√μ}) = sinh(√μ t)/sinh(√μ) if μ > 0.
So if μ = 0, then x(t) = t solves the problem provided that
a ≥ ∫₀¹ t² dt = 1/3.
If μ > 0, then x(t) = sinh(√μ t)/sinh(√μ) solves the problem, with μ implicitly determined by
a = ∫₀¹ ( sinh(√μ t)/sinh(√μ) )² dt.
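The multiplier μ cannot be written in closed form, but it is straightforward to solve for it numerically. Here is a sketch (assuming SciPy; the value a = 0.25 below is a hypothetical choice, not from the notes).

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Given a < 1/3, find mu > 0 with ∫_0^1 (sinh(sqrt(mu) t)/sinh(sqrt(mu)))^2 dt = a.
def constraint_integral(mu):
    s = np.sqrt(mu)
    return quad(lambda t: (np.sinh(s*t)/np.sinh(s))**2, 0, 1)[0]

a = 0.25   # hypothetical value of the bound a
mu = brentq(lambda m: constraint_integral(m) - a, 1e-6, 100.0)
print(mu, constraint_integral(mu))   # the active-constraint multiplier and check
```

The integral decreases from 1/3 (at μ = 0⁺) toward 0 as μ grows, so a root exists for any a in (0, 1/3).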
17.2. Control Theory. In control theory, we study problems of the form
max_{x,u} ∫_{t₀}^{t₁} f(t, x, u) dt,
where x = x(t) is called the state and u = u(t) is called the control, subject to
ẋ = g(t, x, u), u(t) ∈ U ⊂ R, x(t₀) = x₀,
and
(a) x(t₁) = x₁, (b) x(t₁) ≥ x₁, (c) x(t₁) free, or (d) x(t₁) ≤ x₁.
The set U is called the control region.
Definition 17.4. A pair (x, u) that satisfies the conditions
ẋ = g(t, x, u), u(t) ∈ U ⊂ R, x(t₀) = x₀,
and, depending on the problem,
(a) x(t₁) = x₁, (b) x(t₁) ≥ x₁, (c) x(t₁) free, or (d) x(t₁) ≤ x₁
is called an admissible pair. An optimal pair is an admissible pair that maximizes the integral
∫_{t₀}^{t₁} f(t, x, u) dt
among all admissible pairs.
As a motivating example, let's consider an economy evolving over time, where
k = k(t) represents the capital stock,
f = f(k) represents production, and
s = s(t) represents the fraction of production set aside for investment.
Then,
(1 − s(t))f(k(t)) represents consumption per unit time.
If we wanted to maximize consumption over a given period, we would have the control problem
max_{k,s} ∫₀ᵀ (1 − s)f(k) dt,
where k = k(t) is our state and s = s(t) is our control, subject to
k̇ = sf(k), s(t) ∈ [0, 1], k(0) = k₀, and k(T) ≥ k_T.
Here, k₀ represents the amount of stock we start with, and k_T represents the amount of stock we want to ensure exists at time T.
18. Lecture 18: Control Theory Cont'd
Like a Lagrangian, we introduce an auxiliary function called a Hamiltonian to help us solve control theory problems, also known as optimal control problems.
Definition 18.1. For p₀ ∈ R and p = p(t), we define the Hamiltonian
H(t, x, u, p) = p₀f(t, x, u) + p(t)g(t, x, u).
The function p = p(t) is called the adjoint function.
Remark 18.2. We can assume that p₀ is either equal to 0 or 1. Indeed, if p₀ ≠ 0, then divide everything by p₀, and redefine the adjoint function as p/p₀.
Theorem 18.3 (The Maximum Principle). Let (x*, u*) be an optimal pair. Then, there is a continuous function p* and a number p₀ ∈ {0, 1}, i.e., p₀ = 0 or p₀ = 1, such that
1. (p₀, p*(t)) ≠ (0, 0) for all t ∈ [t₀, t₁].
2. u* maximizes H with respect to u, i.e.,
H(t, x*(t), u, p*(t)) ≤ H(t, x*(t), u*(t), p*(t)) for all u ∈ U.
3. ṗ* = −∂_x H(t, x*(t), u*(t), p*(t)).
4. p* satisfies the following transversality condition at t = t₁, depending on the problem:
(a) nothing; (b) p*(t₁) ≥ 0, and p*(t₁) = 0 if x*(t₁) > x₁;
(c) p*(t₁) = 0; (d) p*(t₁) ≤ 0, and p*(t₁) = 0 if x*(t₁) < x₁.
Theorem 18.4 (Mangasarian). Let (x*, u*) be an admissible pair. Suppose that 1. through 4. of the maximum principle are satisfied for some p* with p₀ = 1. If U, the control region, is convex and H(t, x, u, p*(t)) is concave in (x, u) for all t ∈ [t₀, t₁], then (x*, u*) is an optimal pair.
Given these two theorems, how do we solve an optimal control problem?
Step 0: Make sure the problem is in standard form; typically this just involves turning a minimization problem into a maximization problem.
Step 1: (a) Identify the Hamiltonian, and set p₀ = 1. (b) For each triplet (t, x, p), maximize H(t, x, u, p) with respect to u over U, and set û = û(t, x, p) to be the maximizer.
Step 2: Find particular solutions to the ODEs, for x = x(t) and p = p(t),
ẋ = g(t, x, û(t, x, p))
and
ṗ = −∂_x H(t, x, û(t, x, p), p),
using (1) the boundary conditions and (2) the transversality condition to go from the general solution to a particular solution. Call these particular solutions x* and p*. Then, set u* = u*(t) = û(t, x*(t), p*(t)).
Step 3: Check that the sufficient conditions from Mangasarian hold, i.e., (1) U is convex and (2) H(t, x, u, p*(t)) is concave in (x, u) for all t ∈ [t₀, t₁]. If these two things hold, (x*, u*) is an optimal pair.
For example, let's consider the optimal control problem
min_{x,u} ∫₀¹ u² dt
subject to
ẋ = u + ax, x(0) = 1, x(1) free, and u ∈ U = R.
Here a ∈ R is a fixed but arbitrary constant.
Step 0: In standard form, our problem is
max_{x,u} ∫₀¹ −u² dt
subject to
ẋ = u + ax, x(0) = 1, x(1) free, and u ∈ U = R.
Step 1: The Hamiltonian with p₀ = 1 is
H(t, x, u, p) = −u² + p(u + ax).
To maximize H(t, x, u, p) with respect to u ∈ U = R, we note that H(t, x, u, p) is concave in u. So stationary points with respect to u are maxima. Therefore, we compute
0 = ∂_u H = −2u + p,
and we find that
û = p/2.
Step 2: Our ODEs are
ẋ = p/2 + ax and ṗ = −pa.
So p = C₁e^{−at}. Using the transversality condition p(1) = 0 (since x(1) is free), we find that C₁ = 0. Thus,
p*(t) = 0 for all t.
Plugging this into the ODE for x, we get the simpler ODE
ẋ = ax,
which is solved by x = C₂e^{at}. Using the boundary condition x(0) = 1, we see that C₂ = 1. And so,
x*(t) = e^{at}.
In turn,
u*(t) = û(t, x*(t), p*(t)) = p*(t)/2 = 0 for all t.
So our candidate solution is
(x*, u*) = (e^{at}, 0).
Step 3: Since R is convex, U = R is convex. In addition, H(t, x, u, p*(t)) = −u², which is concave in (x, u) for all t ∈ [t₀, t₁]. So the hypotheses of Mangasarian are satisfied, which implies that our candidate solution is an optimal solution, i.e.,
(x*, u*) = (e^{at}, 0)
is an optimal pair / maximizing pair.
Now, let's redo the same problem, but with a different terminal boundary condition. Let's consider the optimal control problem
max_{x,u} ∫₀¹ −u² dt
subject to
ẋ = u + ax, x(0) = 1, x(1) = 0, and u ∈ U = R.
Here, again, a ∈ R is a fixed but arbitrary constant.
Step 0: Since the problem is already in standard form, there is nothing to do.
Step 1: Notice that Step 1 doesn't consider the boundary conditions. So Step 1 is the same as before, and we have that û = p/2, again.
Step 2: Here is the first place things change. While the ODEs are the same, the particular solutions we get will change because our boundary conditions and, thus, our transversality condition have changed. Recall that our ODEs are
ẋ = p/2 + ax and ṗ = −pa.
So
p = C₁e^{−at}.
Using this p, we compute that
x(t) = C₂ + (C₁/2)t if a = 0
and
x(t) = C₂e^{at} − (C₁/(4a))e^{−at} if a ≠ 0.
Now we use our boundary conditions x(0) = 1 and x(1) = 0. Since we have a prescribed terminal value, there is no transversality condition. From the boundary conditions, we find that
C₂ = 1 and C₁ = −2 if a = 0,
and
C₁ = −2ae^a/sinh(a) and C₂ = −e^{−a}/(2 sinh(a)) if a ≠ 0.
Recall sinh(a) = (e^a − e^{−a})/2. So
p*(t) = −2 if a = 0 and p*(t) = −2ae^{a(1−t)}/sinh(a) if a ≠ 0.
Moreover,
x*(t) = 1 − t if a = 0 and x*(t) = sinh(a(1 − t))/sinh(a) if a ≠ 0.
Finally,
u*(t) = −1 if a = 0 and u*(t) = −ae^{a(1−t)}/sinh(a) if a ≠ 0.
Step 3: Since p* has changed, we need to check that H(t, x, u, p*(t)) is still concave in (x, u) for all t ∈ [t₀, t₁]. When a = 0,
H(t, x, u, p*(t)) = −u² − 2u.
When a ≠ 0,
H(t, x, u, p*(t)) = −u² − (2ae^{a(1−t)}/sinh(a))(u + ax).
In (x, u), each of these functions is a downward-opening parabola in u plus a term that is linear in (x, u), and so each is concave. Thus, the hypotheses of Mangasarian are satisfied, which implies that our candidate pair is an optimal pair.
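As a final check (a sketch, assuming SciPy; a = 2 below is a hypothetical choice, not from the notes), we can solve the coupled state/adjoint boundary value problem numerically and compare the state with the closed form sinh(a(1 − t))/sinh(a).

```python
import numpy as np
from scipy.integrate import solve_bvp

# Solve x' = p/2 + a x, p' = -a p with x(0) = 1, x(1) = 0 and compare the
# numerical state with the closed form derived above.
a = 2.0
def rhs(t, y):            # y[0] = x, y[1] = p
    return np.vstack([y[1]/2 + a*y[0], -a*y[1]])
def bc(y0, y1):
    return np.array([y0[0] - 1.0, y1[0]])   # x(0) = 1, x(1) = 0

ts = np.linspace(0, 1, 50)
sol = solve_bvp(rhs, bc, ts, np.zeros((2, ts.size)))
exact = np.sinh(a*(1 - ts)) / np.sinh(a)
print(np.max(np.abs(sol.sol(ts)[0] - exact)))   # should be small
```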