AN INTRODUCTION TO
OPTIMIZATION
SOLUTIONS MANUAL
Fourth Edition
Edwin K. P. Chong and Stanislaw H. Żak
A JOHN WILEY & SONS, INC., PUBLICATION
1. Methods of Proof and Some Notation
1.1
A   B   not A   not B   A⇒B   (not B)⇒(not A)
F   F   T       T       T     T
F   T   T       F       T     T
T   F   F       T       F     F
T   T   F       F       T     T
1.2
A   B   not A   not B   A⇒B   not (A and (not B))
F   F   T       T       T     T
F   T   T       F       T     T
T   F   F       T       F     F
T   T   F       F       T     T
1.3
A   B   not (A and B)   not A   not B   (not A) or (not B)
F   F   T               T       T       T
F   T   T               T       F       T
T   F   T               F       T       T
T   T   F               F       F       F
1.4
A   B   A and B   A and (not B)   (A and B) or (A and (not B))
F   F   F         F               F
F   T   F         F               F
T   F   F         T               T
T   T   T         F               T
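The four truth tables above can also be checked mechanically. The following is a minimal sketch, assuming that each exercise asks for the equivalence of the two statements in its last columns (for 1.4, the equivalence of A with the final column); the helper names `implies` and `is_equivalent` are illustrative and not from the text.

```python
from itertools import product

def implies(p, q):
    """Material implication: p => q is false only when p is true and q is false."""
    return (not p) or q

# Each entry pairs the two statements compared in one exercise (assumed pairing).
equivalences = {
    "1.1": (lambda a, b: implies(a, b), lambda a, b: implies(not b, not a)),
    "1.2": (lambda a, b: implies(a, b), lambda a, b: not (a and (not b))),
    "1.3": (lambda a, b: not (a and b), lambda a, b: (not a) or (not b)),
    "1.4": (lambda a, b: a, lambda a, b: (a and b) or (a and (not b))),
}

def is_equivalent(lhs, rhs):
    """Compare two statements on all four truth assignments of (A, B)."""
    return all(lhs(a, b) == rhs(a, b) for a, b in product([False, True], repeat=2))

results = {name: is_equivalent(lhs, rhs) for name, (lhs, rhs) in equivalences.items()}
print(results)
```

Enumerating all assignments is exactly what a truth table does, so the check is exhaustive for two variables.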
1.5
The cards that you should turn over are 3 and A. The remaining cards are irrelevant to ascertaining the
truth or falsity of the rule. The card with S is irrelevant because S is not a vowel. The card with 8 is not
relevant because the rule does not say that if a card has an even number on one side, then it has a vowel on
the other side.
Turning over the A card directly verifies the rule, while turning over the 3 card verifies the contrapositive.
2. Vector Spaces and Matrices
2.1
We show this by contradiction. Suppose n < m. Then, the number of columns of A is n. Since rank A is
the maximum number of linearly independent columns of A, then rank A cannot be greater than n < m,
which contradicts the assumption that rank A = m.
2.2
⇒: Since there exists a solution, then by Theorem 2.1, $\mathrm{rank}\,A = \mathrm{rank}[A \,\vdots\, b]$. So, it remains to prove that $\mathrm{rank}\,A = n$. For this, suppose that $\mathrm{rank}\,A < n$ (note that it is impossible for $\mathrm{rank}\,A > n$ since $A$ has only $n$ columns). Hence, there exists $y \in \mathbb{R}^n$, $y \neq 0$, such that $Ay = 0$ (this is because the columns of $A$ are linearly dependent, and $Ay$ is a linear combination of the columns of $A$). Let $x$ be a solution to $Ax = b$. Then clearly $x + y \neq x$ is also a solution. This contradicts the uniqueness of the solution. Hence, $\mathrm{rank}\,A = n$.
⇐: By Theorem 2.1, a solution exists. It remains to prove that it is unique. For this, let x and y be
solutions, i.e., Ax = b and Ay = b. Subtracting, we get A(x − y) = 0. Since rank A = n and A has n
columns, then x − y = 0 and hence x = y, which shows that the solution is unique.
2.3
Consider the vectors $\bar{a}_i = [1, a_i^\top]^\top \in \mathbb{R}^{n+1}$, $i = 1, \ldots, k$. Since $k \geq n + 2$, the vectors $\bar{a}_1, \ldots, \bar{a}_k$ must be linearly dependent in $\mathbb{R}^{n+1}$. Hence, there exist $\alpha_1, \ldots, \alpha_k$, not all zero, such that
\[ \sum_{i=1}^{k} \alpha_i \bar{a}_i = 0. \]
The first component of the above vector equation is $\sum_{i=1}^{k} \alpha_i = 0$, while the last $n$ components have the form $\sum_{i=1}^{k} \alpha_i a_i = 0$, completing the proof.
2.4
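a. We first postmultiply $M$ by the matrix
\[ \begin{bmatrix} I_k & O \\ -M_{m-k,k} & I_{m-k} \end{bmatrix} \]
to obtain
\[ \begin{bmatrix} M_{m-k,k} & I_{m-k} \\ M_{k,k} & O \end{bmatrix} \begin{bmatrix} I_k & O \\ -M_{m-k,k} & I_{m-k} \end{bmatrix} = \begin{bmatrix} O & I_{m-k} \\ M_{k,k} & O \end{bmatrix}. \]
Note that the determinant of the postmultiplying matrix is 1. Next we postmultiply the resulting product by
\[ \begin{bmatrix} O & I_k \\ I_{m-k} & O \end{bmatrix} \]
to obtain
\[ \begin{bmatrix} O & I_{m-k} \\ M_{k,k} & O \end{bmatrix} \begin{bmatrix} O & I_k \\ I_{m-k} & O \end{bmatrix} = \begin{bmatrix} I_{m-k} & O \\ O & M_{k,k} \end{bmatrix}. \]
Notice that
\[ \det M = \det\left( \begin{bmatrix} I_{m-k} & O \\ O & M_{k,k} \end{bmatrix} \right) \det\left( \begin{bmatrix} O & I_k \\ I_{m-k} & O \end{bmatrix} \right), \]
where
\[ \det\left( \begin{bmatrix} O & I_k \\ I_{m-k} & O \end{bmatrix} \right) = \pm 1. \]
The above easily follows from the fact that the determinant changes its sign if we interchange columns, as discussed in Section 2.2. Moreover,
\[ \det\left( \begin{bmatrix} I_{m-k} & O \\ O & M_{k,k} \end{bmatrix} \right) = \det(I_{m-k}) \det(M_{k,k}) = \det(M_{k,k}). \]
Hence,
\[ \det M = \pm \det M_{k,k}. \]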
b. We can see this on the following examples. We assume, without loss of generality, that $M_{m-k,k} = O$ and let $M_{k,k} = 2$. Thus $k = 1$. First consider the case when $m = 2$. Then we have
\[ M = \begin{bmatrix} O & I_{m-k} \\ M_{k,k} & O \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 2 & 0 \end{bmatrix}. \]
Thus,
\[ \det M = -2 = \det(-M_{k,k}). \]
Next consider the case when $m = 3$. Then
\[ \det \begin{bmatrix} O & I_{m-k} \\ M_{k,k} & O \end{bmatrix} = \det \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & 0 & 0 \end{bmatrix} = 2 \neq \det(-M_{k,k}). \]
Therefore, in general,
\[ \det M \neq \det(-M_{k,k}). \]
However, when $k = m/2$, that is, when all sub-matrices are square and of the same dimension, then it is true that
\[ \det M = \det(-M_{k,k}). \]
See [121].
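The two numerical cases in part b are easy to reproduce. The sketch below is illustrative only: `det` is a plain cofactor expansion and `build_m` is a hypothetical helper that assembles $M$ for $k = 1$ and $M_{m-k,k} = O$, as assumed in the solution.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def build_m(m_size, mkk):
    """Assemble M = [[O, I_{m-k}], [M_kk, O]] with k = 1 and M_{m-k,k} = O."""
    k = 1
    M = [[0] * m_size for _ in range(m_size)]
    for i in range(m_size - k):
        M[i][k + i] = 1          # identity block in the upper-right corner
    M[m_size - 1][0] = mkk       # scalar block M_kk in the lower-left corner
    return M

det_m2 = det(build_m(2, 2))      # case m = 2
det_m3 = det(build_m(3, 2))      # case m = 3
print(det_m2, det_m3)
```

With $M_{k,k} = 2$, the value $\det(-M_{k,k}) = -2$ agrees with the first case and differs in sign from the second, which is the point of the counterexample.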
2.5
Let
\[ M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \]
and suppose that each block is $k \times k$. John R. Silvester [121] showed that if at least one of the blocks is equal to $O$ (zero matrix), then the desired formula holds. Indeed, if a row or column block is zero, then the determinant is equal to zero, as follows from the determinant's properties discussed in Section 2.2. That is, if $A = B = O$, or $A = C = O$, and so on, then obviously $\det M = 0$. This includes the case when any three or all four block matrices are zero matrices.
If $B = O$ or $C = O$ then
\[ \det M = \det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det(AD). \]
The only case left to analyze is when $A = O$ or $D = O$. We will show that in either case,
\[ \det M = \det(-BC). \]
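Without loss of generality suppose that $D = O$. Following arguments of John R. Silvester [121], we premultiply $M$ by the product of three matrices whose determinants are unity:
\[ \begin{bmatrix} I_k & -I_k \\ O & I_k \end{bmatrix} \begin{bmatrix} I_k & O \\ I_k & I_k \end{bmatrix} \begin{bmatrix} I_k & -I_k \\ O & I_k \end{bmatrix} \begin{bmatrix} A & B \\ C & O \end{bmatrix} = \begin{bmatrix} -C & O \\ A & B \end{bmatrix}. \]
Hence,
\begin{align*}
\det \begin{bmatrix} A & B \\ C & O \end{bmatrix} &= \det \begin{bmatrix} -C & O \\ A & B \end{bmatrix} \\
&= \det(-C) \det B \\
&= \det(-I_k) \det C \det B.
\end{align*}
Thus we have
\[ \det \begin{bmatrix} A & B \\ C & O \end{bmatrix} = \det(-BC) = \det(-CB). \]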
2.6
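We represent the given system of equations in the form $Ax = b$, where
\[ A = \begin{bmatrix} 1 & 1 & 2 & 1 \\ 1 & -2 & 0 & -1 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}, \quad \text{and} \quad b = \begin{bmatrix} 1 \\ -2 \end{bmatrix}. \]
Using elementary row operations yields
\[ A = \begin{bmatrix} 1 & 1 & 2 & 1 \\ 1 & -2 & 0 & -1 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 1 \\ 0 & -3 & -2 & -2 \end{bmatrix}, \]
and
\[ [A, b] = \begin{bmatrix} 1 & 1 & 2 & 1 & 1 \\ 1 & -2 & 0 & -1 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 1 & 1 \\ 0 & -3 & -2 & -2 & -3 \end{bmatrix}, \]
from which $\mathrm{rank}\,A = 2$ and $\mathrm{rank}[A, b] = 2$. Therefore, by Theorem 2.1, the system has a solution.
We next represent the system of equations as
\[ \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 - 2x_3 - x_4 \\ -2 + x_4 \end{bmatrix}. \]
Assigning arbitrary values to $x_3$ and $x_4$ ($x_3 = d_3$, $x_4 = d_4$), we get
\begin{align*}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} &= \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}^{-1} \begin{bmatrix} 1 - 2x_3 - x_4 \\ -2 + x_4 \end{bmatrix} \\
&= -\frac{1}{3} \begin{bmatrix} -2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 - 2x_3 - x_4 \\ -2 + x_4 \end{bmatrix} \\
&= \begin{bmatrix} -\frac{4}{3} d_3 - \frac{1}{3} d_4 \\ 1 - \frac{2}{3} d_3 - \frac{2}{3} d_4 \end{bmatrix}.
\end{align*}
Therefore, a general solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -\frac{4}{3} d_3 - \frac{1}{3} d_4 \\ 1 - \frac{2}{3} d_3 - \frac{2}{3} d_4 \\ d_3 \\ d_4 \end{bmatrix} = \begin{bmatrix} -\frac{4}{3} \\ -\frac{2}{3} \\ 1 \\ 0 \end{bmatrix} d_3 + \begin{bmatrix} -\frac{1}{3} \\ -\frac{2}{3} \\ 0 \\ 1 \end{bmatrix} d_4 + \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \]
where $d_3$ and $d_4$ are arbitrary values.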
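As a sanity check, the general solution can be substituted back into $Ax = b$ for a few choices of $d_3$ and $d_4$. This is a small sketch using exact rational arithmetic; the function names and the sample values are illustrative and not part of the original solution.

```python
from fractions import Fraction as Fr

A = [[1, 1, 2, 1], [1, -2, 0, -1]]
b = [1, -2]

def general_solution(d3, d4):
    """General solution of Ax = b from Exercise 2.6, parametrized by d3 and d4."""
    d3, d4 = Fr(d3), Fr(d4)
    return [-Fr(4, 3) * d3 - Fr(1, 3) * d4,
            1 - Fr(2, 3) * d3 - Fr(2, 3) * d4,
            d3,
            d4]

def residual(x):
    """Return Ax - b; a zero vector means x solves the system."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

# Arbitrary sample parameters (hypothetical values chosen for the check).
samples = [(0, 0), (1, 0), (0, 1), (3, -5), (Fr(1, 2), Fr(7, 3))]
residuals = [residual(general_solution(d3, d4)) for d3, d4 in samples]
print(residuals)
```

Using `Fraction` avoids floating-point round-off, so a zero residual here is exact.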
2.7
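1. Apply the definition of $|-a|$:
\[ |-a| = \begin{cases} -a & \text{if } -a > 0 \\ 0 & \text{if } -a = 0 \\ -(-a) & \text{if } -a < 0 \end{cases} = \begin{cases} -a & \text{if } a < 0 \\ 0 & \text{if } a = 0 \\ a & \text{if } a > 0 \end{cases} = |a|. \]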
2. If a ≥ 0, then |a| = a. If a < 0, then |a| = −a > 0 > a. Hence |a| ≥ a. On the other hand, |−a| ≥ −a (by the above). Hence, a ≥ −|−a| = −|a| (by property 1).
3. We have four cases to consider. First, if a, b ≥ 0, then a + b ≥ 0. Hence, |a + b| = a + b = |a| + |b|.
Second, if a, b ≤ 0, then a + b ≤ 0. Hence |a + b| = −(a + b) = −a − b = |a| + |b|.
Third, if a ≥ 0 and b ≤ 0, then we have two further subcases:
1. If a + b ≥ 0, then |a + b| = a + b ≤ |a| + |b|.
2. If a + b ≤ 0, then |a + b| = −a − b ≤ |a| + |b|.
The fourth case, a ≤ 0 and b ≥ 0, is identical to the third case, with a and b interchanged.
4. We first show |a − b| ≤ |a| + |b|. We have
|a − b| = |a + (−b)|
        ≤ |a| + |−b|    (by property 3)
        = |a| + |b|     (by property 1).
To show ||a| − |b|| ≤ |a − b|, we note that |a| = |a − b + b| ≤ |a − b| + |b|, which implies |a| − |b| ≤ |a − b|. On the
other hand, from the above we have |b| − |a| ≤ |b − a| = |a − b| by property 1. Therefore, ||a| − |b|| ≤ |a − b|.
5. We have four cases. First, if a, b ≥ 0, we have ab ≥ 0 and hence |ab| = ab = |a||b|. Second, if a, b ≤ 0, we have ab ≥ 0 and hence |ab| = ab = (−a)(−b) = |a||b|. Third, if a ≥ 0, b ≤ 0, we have ab ≤ 0 and hence |ab| = −ab = a(−b) = |a||b|. The fourth case, a ≤ 0 and b ≥ 0, is identical to the third case, with a and b interchanged.
6. We have
|a + b| ≤ |a| + |b|    (by property 3)
        ≤ c + d.
7. ⇒: By property 2, −a ≤ |a| and a ≤ |a|. Therefore, |a| < b implies −a ≤ |a| < b and a ≤ |a| < b.
⇐: If a ≥ 0, then |a| = a < b. If a < 0, then |a| = −a < b.
For the case when “<” is replaced by “≤”, we simply repeat the above proof with “<” replaced by “≤”.
8. This is simply the negation of property 7 (apply DeMorgan’s Law).
2.8
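Observe that we can represent $\langle x, y \rangle_2$ as
\[ \langle x, y \rangle_2 = x^\top \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} y = (Qx)^\top (Qy) = x^\top Q^2 y, \]
where
\[ Q = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}. \]
Note that the matrix $Q = Q^\top$ is nonsingular.
1. Now, $\langle x, x \rangle_2 = (Qx)^\top (Qx) = \|Qx\|^2 \geq 0$, and
\[ \langle x, x \rangle_2 = 0 \iff \|Qx\|^2 = 0 \iff Qx = 0 \iff x = 0 \]
since $Q$ is nonsingular.
2. $\langle x, y \rangle_2 = (Qx)^\top (Qy) = (Qy)^\top (Qx) = \langle y, x \rangle_2$.
3. We have
\begin{align*}
\langle x + y, z \rangle_2 &= (x + y)^\top Q^2 z \\
&= x^\top Q^2 z + y^\top Q^2 z \\
&= \langle x, z \rangle_2 + \langle y, z \rangle_2.
\end{align*}
4. $\langle rx, y \rangle_2 = (rx)^\top Q^2 y = r x^\top Q^2 y = r \langle x, y \rangle_2$.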
2.9
We have $\|x\| = \|(x - y) + y\| \leq \|x - y\| + \|y\|$ by the Triangle Inequality. Hence, $\|x\| - \|y\| \leq \|x - y\|$. On the other hand, from the above we have $\|y\| - \|x\| \leq \|y - x\| = \|x - y\|$. Combining the two inequalities, we obtain $|\|x\| - \|y\|| \leq \|x - y\|$.
2.10
Let $\varepsilon > 0$ be given. Set $\delta = \varepsilon$. Hence, if $\|x - y\| < \delta$, then by Exercise 2.9, $|\|x\| - \|y\|| \leq \|x - y\| < \delta = \varepsilon$.
3. Transformations
3.1
Let $v$ be the vector such that $x$ are the coordinates of $v$ with respect to $\{e_1, e_2, \ldots, e_n\}$, and $x'$ are the coordinates of $v$ with respect to $\{e_1', e_2', \ldots, e_n'\}$. Then,
\[ v = x_1 e_1 + \cdots + x_n e_n = [e_1, \ldots, e_n] x, \]
and
\[ v = x_1' e_1' + \cdots + x_n' e_n' = [e_1', \ldots, e_n'] x'. \]
Hence,
\[ [e_1, \ldots, e_n] x = [e_1', \ldots, e_n'] x' \]
which implies
\[ x' = [e_1', \ldots, e_n']^{-1} [e_1, \ldots, e_n] x = T x. \]
3.2
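a. We have
\[ [e_1', e_2', e_3'] = [e_1, e_2, e_3] \begin{bmatrix} 1 & 2 & 4 \\ 3 & -1 & 5 \\ -4 & 5 & 3 \end{bmatrix}. \]
Therefore,
\[ T = [e_1', e_2', e_3']^{-1} [e_1, e_2, e_3] = \begin{bmatrix} 1 & 2 & 4 \\ 3 & -1 & 5 \\ -4 & 5 & 3 \end{bmatrix}^{-1} = \frac{1}{42} \begin{bmatrix} 28 & -14 & -14 \\ 29 & -19 & -7 \\ -11 & 13 & 7 \end{bmatrix}. \]
b. We have
\[ [e_1, e_2, e_3] = [e_1', e_2', e_3'] \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 0 \\ 3 & 4 & 5 \end{bmatrix}. \]
Therefore,
\[ T = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 0 \\ 3 & 4 & 5 \end{bmatrix}. \]
3.3
We have
\[ [e_1, e_2, e_3] = [e_1', e_2', e_3'] \begin{bmatrix} 2 & 2 & 3 \\ 1 & -1 & 0 \\ -1 & 2 & 1 \end{bmatrix}. \]
Therefore, the transformation matrix from $\{e_1, e_2, e_3\}$ to $\{e_1', e_2', e_3'\}$ is
\[ T = \begin{bmatrix} 2 & 2 & 3 \\ 1 & -1 & 0 \\ -1 & 2 & 1 \end{bmatrix}. \]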
Now, consider a linear transformation $L : \mathbb{R}^3 \to \mathbb{R}^3$, and let $A$ be its representation with respect to $\{e_1, e_2, e_3\}$, and $B$ its representation with respect to $\{e_1', e_2', e_3'\}$. Let $y = Ax$ and $y' = Bx'$. Then,
\[ y' = Ty = T(Ax) = TA(T^{-1}x') = (TAT^{-1})x'. \]
Hence, the representation of the linear transformation with respect to $\{e_1', e_2', e_3'\}$ is
\[ B = TAT^{-1} = \begin{bmatrix} 3 & -10 & -8 \\ -1 & 8 & 4 \\ 2 & -13 & -7 \end{bmatrix}. \]
3.4
We have
\[ [e_1', e_2', e_3', e_4'] = [e_1, e_2, e_3, e_4] \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]
Therefore, the transformation matrix from $\{e_1, e_2, e_3, e_4\}$ to $\{e_1', e_2', e_3', e_4'\}$ is
\[ T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]
Now, consider a linear transformation $L : \mathbb{R}^4 \to \mathbb{R}^4$, and let $A$ be its representation with respect to $\{e_1, e_2, e_3, e_4\}$, and $B$ its representation with respect to $\{e_1', e_2', e_3', e_4'\}$. Let $y = Ax$ and $y' = Bx'$. Then,
\[ y' = Ty = T(Ax) = TA(T^{-1}x') = (TAT^{-1})x'. \]
Therefore,
\[ B = TAT^{-1} = \begin{bmatrix} 5 & 3 & 4 & 3 \\ -3 & -2 & -1 & -2 \\ -1 & 0 & -1 & -2 \\ 1 & 1 & 1 & 4 \end{bmatrix}. \]
3.5
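Let $\{v_1, v_2, v_3, v_4\}$ be a set of linearly independent eigenvectors of $A$ corresponding to the eigenvalues $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$. Let $T = [v_1, v_2, v_3, v_4]$. Then,
\begin{align*}
AT &= A[v_1, v_2, v_3, v_4] = [Av_1, Av_2, Av_3, Av_4] \\
&= [\lambda_1 v_1, \lambda_2 v_2, \lambda_3 v_3, \lambda_4 v_4] = [v_1, v_2, v_3, v_4] \begin{bmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \lambda_3 & 0 \\ 0 & 0 & 0 & \lambda_4 \end{bmatrix}.
\end{align*}
Hence,
\[ AT = T \begin{bmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \lambda_3 & 0 \\ 0 & 0 & 0 & \lambda_4 \end{bmatrix}, \]
or
\[ T^{-1}AT = \begin{bmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \lambda_3 & 0 \\ 0 & 0 & 0 & \lambda_4 \end{bmatrix}. \]
Therefore, the linear transformation has a diagonal matrix form with respect to the basis formed by a linearly independent set of eigenvectors.
Because
\[ \det(\lambda I - A) = (\lambda - 2)(\lambda - 3)(\lambda - 1)(\lambda + 1), \]
the eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 3$, $\lambda_3 = 1$, and $\lambda_4 = -1$.
From $Av_i = \lambda_i v_i$, where $v_i \neq 0$ ($i = 1, 2, 3, 4$), the corresponding eigenvectors are
\[ v_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 2 \\ -9 \\ 1 \end{bmatrix}, \quad \text{and} \quad v_4 = \begin{bmatrix} 24 \\ -12 \\ 1 \\ 9 \end{bmatrix}. \]
Therefore, the basis we are interested in is
\[ \{v_1, v_2, v_3, v_4\} = \left\{ \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \\ -9 \\ 1 \end{bmatrix}, \begin{bmatrix} 24 \\ -12 \\ 1 \\ 9 \end{bmatrix} \right\}. \]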
3.6
Suppose $v_1, \ldots, v_n$ are eigenvectors of $A$ corresponding to $\lambda_1, \ldots, \lambda_n$, respectively. Then, for each $i = 1, \ldots, n$, we have
\[ (I_n - A)v_i = v_i - Av_i = v_i - \lambda_i v_i = (1 - \lambda_i)v_i \]
which shows that $1 - \lambda_1, \ldots, 1 - \lambda_n$ are the eigenvalues of $I_n - A$.
Alternatively, we may write the characteristic polynomial of $I_n - A$ as
\[ \pi_{I_n - A}(1 - \lambda) = \det((1 - \lambda)I_n - (I_n - A)) = \det(-[\lambda I_n - A]) = (-1)^n \pi_A(\lambda), \]
which shows the desired result.
3.7
Let x, y ∈ V ⊥ , and α, β ∈ R. To show that V ⊥ is a subspace, we need to show that αx + βy ∈ V ⊥ . For this,
let v be any vector in V. Then,
v > (αx + βy) = αv > x + βv > y = 0,
since v > x = v > y = 0 by definition.
3.8
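The null space of $A$ is $\mathcal{N}(A) = \{x \in \mathbb{R}^3 : Ax = 0\}$. Using elementary row operations and back-substitution, we can solve the system of equations:
\[ \begin{bmatrix} 4 & -2 & 0 \\ 2 & 1 & -1 \\ 2 & -3 & 1 \end{bmatrix} \to \begin{bmatrix} 4 & -2 & 0 \\ 0 & 2 & -1 \\ 0 & -2 & 1 \end{bmatrix} \to \begin{bmatrix} 4 & -2 & 0 \\ 0 & 2 & -1 \\ 0 & 0 & 0 \end{bmatrix} \implies \begin{aligned} 4x_1 - 2x_2 &= 0 \\ 2x_2 - x_3 &= 0 \end{aligned} \]
\[ \implies x_2 = \frac{1}{2} x_3, \quad x_1 = \frac{1}{2} x_2 = \frac{1}{4} x_3 \implies x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \frac{1}{4} \\ \frac{1}{2} \\ 1 \end{bmatrix} x_3. \]
Therefore,
\[ \mathcal{N}(A) = \left\{ \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} c : c \in \mathbb{R} \right\}. \]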
3.9
Let x, y ∈ R(A), and α, β ∈ R. Then, there exists v, u such that x = Av and y = Au. Thus,
αx + βy = αAv + βAu = A(αv + βu).
Hence, αx + βy ∈ R(A), which shows that R(A) is a subspace.
Let x, y ∈ N (A), and α, β ∈ R. Then, Ax = 0 and Ay = 0. Thus,
A(αx + βy) = αAx + βAy = 0.
Hence, αx + βy ∈ N (A), which shows that N (A) is a subspace.
3.10
Let v ∈ R(B), i.e., v = Bx for some x. Consider the matrix [A v]. Then, N (A> ) = N ([A v]> ), since if
u ∈ N (A> ), then u ∈ N (B > ) by assumption, and hence u> v = u> Bx = x> B > u = 0. Now,
dim R(A) + dim N (A> ) = m
and
dim R([A v]) + dim N ([A v]> ) = m.
Since dim N (A> ) = dim N ([A v]> ), then we have dim R(A) = dim R([A v]). Hence, v is a linear combination of the columns of A, i.e., v ∈ R(A), which completes the proof.
3.11
We first show V ⊂ (V ⊥ )⊥ . Let v ∈ V , and u any element of V ⊥ . Then u> v = v > u = 0. Therefore,
v ∈ (V ⊥ )⊥ .
We now show (V ⊥ )⊥ ⊂ V . Let {a1 , . . . , ak } be a basis for V , and {b1 , . . . , bl } a basis for (V ⊥ )⊥ . Define
A = [a1 · · · ak ] and B = [b1 · · · bl ], so that V = R(A) and (V ⊥ )⊥ = R(B). Hence, it remains to show
that R(B) ⊂ R(A). Using the result of Exercise 3.10, it suffices to show that N (A> ) ⊂ N (B > ). So let
x ∈ N (A> ), which implies that x ∈ R(A)⊥ = V ⊥ , since R(A)⊥ = N (A> ). Hence, for all y, we have
(By)> x = 0 = y > B > x, which implies that B > x = 0. Therefore, x ∈ N (B > ), which completes the proof.
3.12
Let w ∈ W ⊥ , and y be any element of V. Since V ⊂ W, then y ∈ W. Therefore, by definition of w, we have
w> y = 0. Therefore, w ∈ V ⊥ .
3.13
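Let $r = \dim \mathcal{V}$. Let $v_1, \ldots, v_r$ be a basis for $\mathcal{V}$, and $V$ the matrix whose $i$th column is $v_i$. Then, clearly $\mathcal{V} = \mathcal{R}(V)$.
Let $u_1, \ldots, u_{n-r}$ be a basis for $\mathcal{V}^\perp$, and $U$ the matrix whose $i$th row is $u_i^\top$. Then, $\mathcal{V}^\perp = \mathcal{R}(U^\top)$, and $\mathcal{V} = (\mathcal{V}^\perp)^\perp = \mathcal{R}(U^\top)^\perp = \mathcal{N}(U)$ (by Exercise 3.11 and Theorem 3.4).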
3.14
a. Let x ∈ V. Then, x = P x + (I − P )x. Note that P x ∈ V, and (I − P )x ∈ V ⊥ . Therefore,
x = P x + (I − P )x is an orthogonal decomposition of x with respect to V. However, x = x + 0 is also an
orthogonal decomposition of x with respect to V. Since the orthogonal decomposition is unique, we must
have x = P x.
b. Suppose P is an orthogonal projector onto V. Clearly, R(P ) ⊂ V by definition. However, from part a,
x = P x for all x ∈ V, and hence V ⊂ R(P ). Therefore, R(P ) = V.
3.15
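To answer the question, we have to represent the quadratic form with a symmetric matrix as
\[ x^\top \left( \frac{1}{2} \begin{bmatrix} 1 & -8 \\ 1 & 1 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} 1 & 1 \\ -8 & 1 \end{bmatrix} \right) x = x^\top \begin{bmatrix} 1 & -7/2 \\ -7/2 & 1 \end{bmatrix} x. \]
The leading principal minors are $\Delta_1 = 1$ and $\Delta_2 = -45/4$. Therefore, the quadratic form is indefinite.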
3.16
The leading principal minors are $\Delta_1 = 2$, $\Delta_2 = 0$, $\Delta_3 = 0$, which are all nonnegative. However, the eigenvalues of $A$ are $0$, $-1.4641$, $5.4641$ (for example, use Matlab to quickly check this). This implies that the matrix $A$ is indefinite (by Theorem 3.7). An alternative way to show that $A$ is not positive semidefinite is to find a vector $x$ such that $x^\top A x < 0$. So, let $x$ be an eigenvector of $A$ corresponding to its negative eigenvalue $\lambda = -1.4641$. Then, $x^\top A x = x^\top (\lambda x) = \lambda x^\top x = \lambda \|x\|^2 < 0$. For this example, we can take $x = [0.3251, 0.3251, -0.8881]^\top$, for which we can verify that $x^\top A x = -1.4643$.
3.17
a. The matrix Q is indefinite, since ∆2 = −1 and ∆3 = 2.
b. Let $x \in \mathcal{M}$. Then, $x_2 + x_3 = -x_1$, $x_1 + x_3 = -x_2$, and $x_1 + x_2 = -x_3$. Therefore,
\[ x^\top Q x = x_1(x_2 + x_3) + x_2(x_1 + x_3) + x_3(x_1 + x_2) = -(x_1^2 + x_2^2 + x_3^2). \]
This implies that the matrix $Q$ is negative definite on the subspace $\mathcal{M}$.
3.18
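a. We have
\[ f(x_1, x_2, x_3) = x_2^2 = [x_1, x_2, x_3] \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \]
Then,
\[ Q = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]
and the eigenvalues of $Q$ are $\lambda_1 = 0$, $\lambda_2 = 1$, and $\lambda_3 = 0$. Therefore, the quadratic form is positive semidefinite.
b. We have
\[ f(x_1, x_2, x_3) = x_1^2 + 2x_2^2 - x_1 x_3 = [x_1, x_2, x_3] \begin{bmatrix} 1 & 0 & -\frac{1}{2} \\ 0 & 2 & 0 \\ -\frac{1}{2} & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \]
Then,
\[ Q = \begin{bmatrix} 1 & 0 & -\frac{1}{2} \\ 0 & 2 & 0 \\ -\frac{1}{2} & 0 & 0 \end{bmatrix} \]
and the eigenvalues of $Q$ are $\lambda_1 = 2$, $\lambda_2 = (1 - \sqrt{2})/2$, and $\lambda_3 = (1 + \sqrt{2})/2$. Therefore, the quadratic form is indefinite.
c. We have
\[ f(x_1, x_2, x_3) = x_1^2 + x_3^2 + 2x_1 x_2 + 2x_1 x_3 + 2x_2 x_3 = [x_1, x_2, x_3] \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \]
Then,
\[ Q = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]
and the eigenvalues of $Q$ are $\lambda_1 = 0$, $\lambda_2 = 1 - \sqrt{3}$, and $\lambda_3 = 1 + \sqrt{3}$. Therefore, the quadratic form is indefinite.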
3.19
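We have
\begin{align*}
f(x_1, x_2, x_3) &= 4x_1^2 + x_2^2 + 9x_3^2 - 4x_1 x_2 - 6x_2 x_3 + 12x_1 x_3 \\
&= [x_1, x_2, x_3] \begin{bmatrix} 4 & -2 & 6 \\ -2 & 1 & -3 \\ 6 & -3 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.
\end{align*}
Let
\[ Q = \begin{bmatrix} 4 & -2 & 6 \\ -2 & 1 & -3 \\ 6 & -3 & 9 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 e_1 + x_2 e_2 + x_3 e_3, \]
where $e_1$, $e_2$, and $e_3$ form the natural basis for $\mathbb{R}^3$.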
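Let $v_1$, $v_2$, and $v_3$ be another basis for $\mathbb{R}^3$. Then, the vector $x$ is represented in the new basis as $\tilde{x}$, where $x = [v_1, v_2, v_3]\tilde{x} = V\tilde{x}$.
Now, $f(x) = x^\top Q x = (V\tilde{x})^\top Q (V\tilde{x}) = \tilde{x}^\top (V^\top Q V)\tilde{x} = \tilde{x}^\top \tilde{Q} \tilde{x}$, where
\[ \tilde{Q} = \begin{bmatrix} \tilde{q}_{11} & \tilde{q}_{12} & \tilde{q}_{13} \\ \tilde{q}_{21} & \tilde{q}_{22} & \tilde{q}_{23} \\ \tilde{q}_{31} & \tilde{q}_{32} & \tilde{q}_{33} \end{bmatrix} \]
and $\tilde{q}_{ij} = v_i^\top Q v_j$ for $i, j = 1, 2, 3$.
We will find a basis $\{v_1, v_2, v_3\}$ such that $\tilde{q}_{ij} = 0$ for $i \neq j$, and is of the form
\begin{align*}
v_1 &= \alpha_{11} e_1 \\
v_2 &= \alpha_{21} e_1 + \alpha_{22} e_2 \\
v_3 &= \alpha_{31} e_1 + \alpha_{32} e_2 + \alpha_{33} e_3.
\end{align*}
Because
\[ \tilde{q}_{ij} = v_i^\top Q v_j = v_i^\top Q(\alpha_{j1} e_1 + \cdots + \alpha_{jj} e_j) = \alpha_{j1}(v_i^\top Q e_1) + \cdots + \alpha_{jj}(v_i^\top Q e_j), \]
we deduce that if $v_i^\top Q e_j = 0$ for $j < i$, then $v_i^\top Q v_j = 0$. In this case,
\[ \tilde{q}_{ii} = v_i^\top Q v_i = v_i^\top Q(\alpha_{i1} e_1 + \cdots + \alpha_{ii} e_i) = \alpha_{i1}(v_i^\top Q e_1) + \cdots + \alpha_{ii}(v_i^\top Q e_i) = \alpha_{ii}(v_i^\top Q e_i). \]
Our task therefore is to find $v_i$ ($i = 1, 2, 3$) such that
\begin{align*}
v_i^\top Q e_j &= 0, \quad j < i \\
v_i^\top Q e_i &= 1,
\end{align*}
and, in this case, we get
\[ \tilde{Q} = \begin{bmatrix} \alpha_{11} & 0 & 0 \\ 0 & \alpha_{22} & 0 \\ 0 & 0 & \alpha_{33} \end{bmatrix}. \]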
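Case of $i = 1$.
From $v_1^\top Q e_1 = 1$,
\[ (\alpha_{11} e_1)^\top Q e_1 = \alpha_{11}(e_1^\top Q e_1) = \alpha_{11} q_{11} = 1. \]
Therefore,
\[ \alpha_{11} = \frac{1}{q_{11}} = \frac{1}{\Delta_1} = \frac{1}{4} \implies v_1 = \alpha_{11} e_1 = \begin{bmatrix} \frac{1}{4} \\ 0 \\ 0 \end{bmatrix}. \]
Case of $i = 2$.
From $v_2^\top Q e_1 = 0$,
\[ (\alpha_{21} e_1 + \alpha_{22} e_2)^\top Q e_1 = \alpha_{21}(e_1^\top Q e_1) + \alpha_{22}(e_2^\top Q e_1) = \alpha_{21} q_{11} + \alpha_{22} q_{21} = 0. \]
From $v_2^\top Q e_2 = 1$,
\[ (\alpha_{21} e_1 + \alpha_{22} e_2)^\top Q e_2 = \alpha_{21}(e_1^\top Q e_2) + \alpha_{22}(e_2^\top Q e_2) = \alpha_{21} q_{12} + \alpha_{22} q_{22} = 1. \]
Therefore,
\[ \begin{bmatrix} q_{11} & q_{21} \\ q_{12} & q_{22} \end{bmatrix} \begin{bmatrix} \alpha_{21} \\ \alpha_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]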
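But, since $\Delta_2 = 0$, this system of equations is inconsistent. Hence, in this problem $v_2^\top Q e_2 = 0$ should be satisfied instead of $v_2^\top Q e_2 = 1$ so that the system can have a solution. In this case, the diagonal matrix becomes
\[ \tilde{Q} = \begin{bmatrix} \alpha_{11} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \alpha_{33} \end{bmatrix}, \]
and the system of equations becomes
\[ \begin{bmatrix} q_{11} & q_{21} \\ q_{12} & q_{22} \end{bmatrix} \begin{bmatrix} \alpha_{21} \\ \alpha_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \implies \begin{bmatrix} \alpha_{21} \\ \alpha_{22} \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix} \alpha_{22}, \]
where $\alpha_{22}$ is an arbitrary real number. Thus,
\[ v_2 = \alpha_{21} e_1 + \alpha_{22} e_2 = \begin{bmatrix} \frac{1}{2} \\ 1 \\ 0 \end{bmatrix} a, \]
where $a$ is an arbitrary real number.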
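Case of $i = 3$.
Since in this case $\Delta_3 = \det(Q) = 0$, we will have to apply the same reasoning of the previous case and use the condition $v_3^\top Q e_3 = 0$ instead of $v_3^\top Q e_3 = 1$. In this way the diagonal matrix becomes
\[ \tilde{Q} = \begin{bmatrix} \alpha_{11} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
Thus, from $v_3^\top Q e_1 = 0$, $v_3^\top Q e_2 = 0$ and $v_3^\top Q e_3 = 0$,
\begin{align*}
\begin{bmatrix} q_{11} & q_{21} & q_{31} \\ q_{12} & q_{22} & q_{32} \\ q_{13} & q_{23} & q_{33} \end{bmatrix} \begin{bmatrix} \alpha_{31} \\ \alpha_{32} \\ \alpha_{33} \end{bmatrix} &= Q^\top \begin{bmatrix} \alpha_{31} \\ \alpha_{32} \\ \alpha_{33} \end{bmatrix} = Q \begin{bmatrix} \alpha_{31} \\ \alpha_{32} \\ \alpha_{33} \end{bmatrix} \\
&= \begin{bmatrix} 4 & -2 & 6 \\ -2 & 1 & -3 \\ 6 & -3 & 9 \end{bmatrix} \begin{bmatrix} \alpha_{31} \\ \alpha_{32} \\ \alpha_{33} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\end{align*}
Therefore,
\[ \begin{bmatrix} \alpha_{31} \\ \alpha_{32} \\ \alpha_{33} \end{bmatrix} = \begin{bmatrix} \alpha_{31} \\ 2\alpha_{31} + 3\alpha_{33} \\ \alpha_{33} \end{bmatrix}, \]
where $\alpha_{31}$ and $\alpha_{33}$ are arbitrary real numbers. Thus,
\[ v_3 = \alpha_{31} e_1 + \alpha_{32} e_2 + \alpha_{33} e_3 = \begin{bmatrix} b \\ 2b + 3c \\ c \end{bmatrix}, \]
where $b$ and $c$ are arbitrary real numbers.
Finally,
\[ V = [v_1, v_2, v_3] = \begin{bmatrix} \frac{1}{4} & \frac{a}{2} & b \\ 0 & a & 2b + 3c \\ 0 & 0 & c \end{bmatrix}, \]
where $a$, $b$, and $c$ are arbitrary real numbers.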
3.20
We represent this quadratic form as $f(x) = x^\top Q x$, where
\[ Q = \begin{bmatrix} 1 & \xi & -1 \\ \xi & 1 & 2 \\ -1 & 2 & 5 \end{bmatrix}. \]
The leading principal minors of $Q$ are $\Delta_1 = 1$, $\Delta_2 = 1 - \xi^2$, $\Delta_3 = -5\xi^2 - 4\xi$. For the quadratic form to be positive definite, all the leading principal minors of $Q$ must be positive. This is the case if and only if $\xi \in (-4/5, 0)$.
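The interval can be spot-checked numerically by evaluating the three leading principal minors at a few values of $\xi$. The sketch below hard-codes the matrix $Q$ from this exercise; the function names and the sample values are illustrative choices, not part of the solution.

```python
def leading_principal_minors(xi):
    """Leading principal minors of Q = [[1, xi, -1], [xi, 1, 2], [-1, 2, 5]]."""
    d1 = 1.0
    d2 = 1.0 - xi * xi
    # 3x3 determinant expanded along the first row.
    d3 = (1.0 * (1 * 5 - 2 * 2)
          - xi * (xi * 5 - 2 * (-1))
          + (-1) * (xi * 2 - 1 * (-1)))
    return d1, d2, d3

def is_positive_definite(xi):
    """Sylvester's criterion: all leading principal minors are positive."""
    return all(d > 0 for d in leading_principal_minors(xi))

# Sample points strictly inside and strictly outside (-4/5, 0).
inside = [is_positive_definite(xi) for xi in (-0.79, -0.4, -0.01)]
outside = [is_positive_definite(xi) for xi in (-1.0, -0.9, 0.1, 0.5)]
print(inside, outside)
```

The endpoints are deliberately avoided, since $\Delta_3$ vanishes there and floating-point round-off could flip the sign.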
3.21
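The matrix $Q = Q^\top > 0$ can be represented as $Q = Q^{1/2} Q^{1/2}$, where $Q^{1/2} = (Q^{1/2})^\top > 0$.
1. Now, $\langle x, x \rangle_Q = (Q^{1/2} x)^\top (Q^{1/2} x) = \|Q^{1/2} x\|^2 \geq 0$, and
\[ \langle x, x \rangle_Q = 0 \iff \|Q^{1/2} x\|^2 = 0 \iff Q^{1/2} x = 0 \iff x = 0 \]
since $Q^{1/2}$ is nonsingular.
2. $\langle x, y \rangle_Q = x^\top Q y = y^\top Q^\top x = y^\top Q x = \langle y, x \rangle_Q$.
3. We have
\begin{align*}
\langle x + y, z \rangle_Q &= (x + y)^\top Q z \\
&= x^\top Q z + y^\top Q z \\
&= \langle x, z \rangle_Q + \langle y, z \rangle_Q.
\end{align*}
4. $\langle rx, y \rangle_Q = (rx)^\top Q y = r x^\top Q y = r \langle x, y \rangle_Q$.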
3.22
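We have
\[ \|A\|_\infty = \max\{\|Ax\|_\infty : \|x\|_\infty = 1\}. \]
We first show that $\|A\|_\infty \leq \max_i \sum_{k=1}^{n} |a_{ik}|$. For this, note that for each $x$ such that $\|x\|_\infty = 1$, we have
\begin{align*}
\|Ax\|_\infty &= \max_i \left| \sum_{k=1}^{n} a_{ik} x_k \right| \\
&\leq \max_i \sum_{k=1}^{n} |a_{ik}||x_k| \\
&\leq \max_i \sum_{k=1}^{n} |a_{ik}|,
\end{align*}
since $|x_k| \leq \max_k |x_k| = \|x\|_\infty = 1$. Therefore,
\[ \|A\|_\infty \leq \max_i \sum_{k=1}^{n} |a_{ik}|. \]
To show that $\|A\|_\infty = \max_i \sum_{k=1}^{n} |a_{ik}|$, it remains to find $\tilde{x} \in \mathbb{R}^n$, $\|\tilde{x}\|_\infty = 1$, such that $\|A\tilde{x}\|_\infty = \max_i \sum_{k=1}^{n} |a_{ik}|$. So, let $j$ be such that
\[ \sum_{k=1}^{n} |a_{jk}| = \max_i \sum_{k=1}^{n} |a_{ik}|. \]
Define $\tilde{x}$ by
\[ \tilde{x}_k = \begin{cases} |a_{jk}|/a_{jk} & \text{if } a_{jk} \neq 0 \\ 1 & \text{otherwise.} \end{cases} \]
Clearly $\|\tilde{x}\|_\infty = 1$. Furthermore, for $i \neq j$,
\[ \left| \sum_{k=1}^{n} a_{ik} \tilde{x}_k \right| \leq \sum_{k=1}^{n} |a_{ik}| \leq \max_i \sum_{k=1}^{n} |a_{ik}| = \sum_{k=1}^{n} |a_{jk}| \]
and
\[ \left| \sum_{k=1}^{n} a_{jk} \tilde{x}_k \right| = \sum_{k=1}^{n} |a_{jk}|. \]
Therefore,
\[ \|A\tilde{x}\|_\infty = \max_i \left| \sum_{k=1}^{n} a_{ik} \tilde{x}_k \right| = \sum_{k=1}^{n} |a_{jk}| = \max_i \sum_{k=1}^{n} |a_{ik}|. \]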
3.23
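We have
\[ \|A\|_1 = \max\{\|Ax\|_1 : \|x\|_1 = 1\}. \]
We first show that $\|A\|_1 \leq \max_k \sum_{i=1}^{m} |a_{ik}|$. For this, note that for each $x$ such that $\|x\|_1 = 1$, we have
\begin{align*}
\|Ax\|_1 &= \sum_{i=1}^{m} \left| \sum_{k=1}^{n} a_{ik} x_k \right| \\
&\leq \sum_{i=1}^{m} \sum_{k=1}^{n} |a_{ik}||x_k| \\
&\leq \sum_{k=1}^{n} |x_k| \left( \sum_{i=1}^{m} |a_{ik}| \right) \\
&\leq \left( \max_k \sum_{i=1}^{m} |a_{ik}| \right) \sum_{k=1}^{n} |x_k| \\
&\leq \max_k \sum_{i=1}^{m} |a_{ik}|,
\end{align*}
since $\sum_{k=1}^{n} |x_k| = \|x\|_1 = 1$. Therefore,
\[ \|A\|_1 \leq \max_k \sum_{i=1}^{m} |a_{ik}|. \]
To show that $\|A\|_1 = \max_k \sum_{i=1}^{m} |a_{ik}|$, it remains to find $\tilde{x} \in \mathbb{R}^n$, $\|\tilde{x}\|_1 = 1$, such that $\|A\tilde{x}\|_1 = \max_k \sum_{i=1}^{m} |a_{ik}|$. So, let $j$ be such that
\[ \sum_{i=1}^{m} |a_{ij}| = \max_k \sum_{i=1}^{m} |a_{ik}|. \]
Define $\tilde{x}$ by
\[ \tilde{x}_k = \begin{cases} 1 & \text{if } k = j \\ 0 & \text{otherwise.} \end{cases} \]
Clearly $\|\tilde{x}\|_1 = 1$. Furthermore,
\[ \|A\tilde{x}\|_1 = \sum_{i=1}^{m} \left| \sum_{k=1}^{n} a_{ik} \tilde{x}_k \right| = \sum_{i=1}^{m} |a_{ij}| = \max_k \sum_{i=1}^{m} |a_{ik}|. \]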
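The constructions in Exercises 3.22 and 3.23 translate directly into code: the induced norms are the maximum absolute row sum and column sum, and the vectors $\tilde{x}$ defined in the proofs attain them. The following sketch uses a small hypothetical matrix that does not appear in the text, and all function names are illustrative.

```python
def inf_norm(A):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)

def one_norm(A):
    """Induced 1-norm: maximum absolute column sum."""
    return max(sum(abs(row[k]) for row in A) for k in range(len(A[0])))

def matvec(A, x):
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

def maximizer_inf(A):
    """x~ from Exercise 3.22: signs of the row with the largest absolute sum."""
    j = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
    return [abs(a) / a if a != 0 else 1 for a in A[j]]

def maximizer_one(A):
    """x~ from Exercise 3.23: unit vector picking the column with the largest absolute sum."""
    n = len(A[0])
    j = max(range(n), key=lambda k: sum(abs(row[k]) for row in A))
    return [1 if k == j else 0 for k in range(n)]

A = [[1, -2, 3], [-4, 0, 2]]  # hypothetical 2x3 example, not from the text
x_inf = maximizer_inf(A)
x_one = maximizer_one(A)
attained_inf = max(abs(v) for v in matvec(A, x_inf))   # ||A x~||_inf
attained_one = sum(abs(v) for v in matvec(A, x_one))   # ||A x~||_1
print(inf_norm(A), attained_inf, one_norm(A), attained_one)
```

For this matrix both row sums equal 6 and the largest column sum is 5, so the attained values match the formulas.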
4. Concepts from Geometry
4.1
⇒: Let S = {x : Ax = b} be a linear variety. Let x, y ∈ S and α ∈ R. Then,
A(αx + (1 − α)y) = αAx + (1 − α)Ay = αb + (1 − α)b = b.
Therefore, αx + (1 − α)y ∈ S.
⇐: If $S$ is empty, we are done. So, suppose $x_0 \in S$. Consider the set $S_0 = S - x_0 = \{x - x_0 : x \in S\}$. Clearly, for all $x, y \in S_0$ and $\alpha \in \mathbb{R}$, we have $\alpha x + (1 - \alpha)y \in S_0$. Note that $0 \in S_0$. We claim that $S_0$ is a subspace. To see this, let $x, y \in S_0$, and $\alpha \in \mathbb{R}$. Then, $\alpha x = \alpha x + (1 - \alpha)0 \in S_0$. Furthermore, $\frac{1}{2}x + \frac{1}{2}y \in S_0$, and therefore $x + y \in S_0$ by the previous argument. Hence, $S_0$ is a subspace. Therefore, by Exercise 3.13, there exists $A$ such that $S_0 = \mathcal{N}(A) = \{x : Ax = 0\}$. Define $b = Ax_0$. Then,
\begin{align*}
S &= S_0 + x_0 = \{y + x_0 : y \in \mathcal{N}(A)\} \\
&= \{y + x_0 : Ay = 0\} \\
&= \{y + x_0 : A(y + x_0) = b\} \\
&= \{x : Ax = b\}.
\end{align*}
4.2
Let $u, v \in \Theta = \{x \in \mathbb{R}^n : \|x\| \leq r\}$, and $\alpha \in [0, 1]$. Suppose $z = \alpha u + (1 - \alpha)v$. To show that $\Theta$ is convex, we need to show that $z \in \Theta$, i.e., $\|z\| \leq r$. To this end,
\begin{align*}
\|z\|^2 &= (\alpha u^\top + (1 - \alpha)v^\top)(\alpha u + (1 - \alpha)v) \\
&= \alpha^2 \|u\|^2 + 2\alpha(1 - \alpha)u^\top v + (1 - \alpha)^2 \|v\|^2.
\end{align*}
Since $u, v \in \Theta$, then $\|u\|^2 \leq r^2$ and $\|v\|^2 \leq r^2$. Furthermore, by the Cauchy-Schwarz Inequality, we have $u^\top v \leq \|u\|\|v\| \leq r^2$. Therefore,
\[ \|z\|^2 \leq \alpha^2 r^2 + 2\alpha(1 - \alpha)r^2 + (1 - \alpha)^2 r^2 = r^2. \]
Hence, $z \in \Theta$, which implies that $\Theta$ is a convex set, i.e., any point on the line segment joining $u$ and $v$ is also in $\Theta$.
4.3
Let u, v ∈ Θ = {x ∈ Rn : Ax = b}, and α ∈ [0, 1]. Suppose z = αu + (1 − α)v. To show that Θ is convex,
we need to show that z ∈ Θ, i.e., Az = b. To this end,
Az
= A(αu + (1 − α)v)
= αAu + (1 − α)Av.
Since u, v ∈ Θ, then Au = b and Av = b. Therefore,
Az = αb + (1 − α)b = b,
and hence z ∈ Θ.
4.4
Let $u, v \in \Theta = \{x \in \mathbb{R}^n : x \geq 0\}$, and $\alpha \in [0, 1]$. Suppose $z = \alpha u + (1 - \alpha)v$. To show that $\Theta$ is convex, we need to show that $z \in \Theta$, i.e., $z \geq 0$. To this end, write $u = [u_1, \ldots, u_n]^\top$, $v = [v_1, \ldots, v_n]^\top$, and $z = [z_1, \ldots, z_n]^\top$. Then, $z_i = \alpha u_i + (1 - \alpha)v_i$, $i = 1, \ldots, n$. Since $u_i, v_i \geq 0$, and $\alpha, 1 - \alpha \geq 0$, we have $z_i \geq 0$. Therefore, $z \geq 0$, and hence $z \in \Theta$.
5. Elements of Calculus
5.1
Observe that
\[ \|A^k\| \leq \|A^{k-1}\|\|A\| \leq \|A^{k-2}\|\|A\|^2 \leq \cdots \leq \|A\|^k. \]
Therefore, if $\|A\| < 1$, then $\lim_{k \to \infty} \|A^k\| = 0$, which implies that $\lim_{k \to \infty} A^k = O$.
5.2
For the case when $A$ has all real eigenvalues, the proof is simple. Let $\lambda$ be the eigenvalue of $A$ with largest absolute value, and $x$ the corresponding (normalized) eigenvector, i.e., $Ax = \lambda x$ and $\|x\| = 1$. Then,
\[ \|A\| \geq \|Ax\| = \|\lambda x\| = |\lambda|\|x\| = |\lambda|, \]
which completes the proof for this case.
In general, the eigenvalues of $A$ and the corresponding eigenvectors may be complex. In this case, we proceed as follows (see [41]). Consider the matrix
\[ B = \frac{A}{\|A\| + \varepsilon}, \]
where $\varepsilon$ is a positive real number. We have
\[ \|B\| = \frac{\|A\|}{\|A\| + \varepsilon} < 1. \]
By Exercise 5.1, $B^k \to O$ as $k \to \infty$, and thus by Lemma 5.1, $|\lambda_i(B)| < 1$, $i = 1, \ldots, n$. On the other hand, for each $i = 1, \ldots, n$,
\[ \lambda_i(B) = \frac{\lambda_i(A)}{\|A\| + \varepsilon}, \]
and thus
\[ |\lambda_i(B)| = \frac{|\lambda_i(A)|}{\|A\| + \varepsilon} < 1, \]
which gives
\[ |\lambda_i(A)| < \|A\| + \varepsilon. \]
Since the above arguments hold for any $\varepsilon > 0$, we have $|\lambda_i(A)| \leq \|A\|$.
5.3
a. $\nabla f(x) = (ab^\top + ba^\top)x$.
b. $F(x) = ab^\top + ba^\top$.
5.4
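We have
\[ Df(x) = [x_1/3, \; x_2/2], \]
and
\[ \frac{dg}{dt}(t) = \begin{bmatrix} 3 \\ 2 \end{bmatrix}. \]
By the chain rule,
\begin{align*}
\frac{d}{dt} F(t) &= Df(g(t)) \frac{dg}{dt}(t) \\
&= [(3t + 5)/3, \; (2t - 6)/2] \begin{bmatrix} 3 \\ 2 \end{bmatrix} \\
&= 5t - 1.
\end{align*}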
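A finite-difference check gives a quick confirmation of the chain-rule result. The sketch below assumes $f(x) = x_1^2/6 + x_2^2/4$ and $g(t) = [3t + 5, 2t - 6]^\top$, which are consistent with the derivatives stated in the solution but are not quoted from the exercise itself; the helper names are illustrative.

```python
def f(x1, x2):
    # Assumed form of f, chosen so that Df(x) = [x1/3, x2/2] as in the solution.
    return x1 ** 2 / 6 + x2 ** 2 / 4

def g(t):
    # Assumed form of g, chosen so that dg/dt = [3, 2] and g(t) = [3t + 5, 2t - 6].
    return 3 * t + 5, 2 * t - 6

def F(t):
    """Composite function F(t) = f(g(t))."""
    return f(*g(t))

def central_difference(func, t, h=1e-5):
    """Symmetric finite-difference approximation of the derivative at t."""
    return (func(t + h) - func(t - h)) / (2 * h)

# Compare the numerical derivative with the closed form 5t - 1 at a few points.
checks = [(t, central_difference(F, t), 5 * t - 1) for t in (-2.0, 0.0, 1.5, 4.0)]
for t, numeric, exact in checks:
    print(t, numeric, exact)
```

Because $F$ is quadratic under these assumptions, the central difference agrees with $5t - 1$ up to round-off.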
5.5
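We have
\[ Df(x) = [x_2/2, \; x_1/2], \]
and
\[ \frac{\partial g}{\partial s}(s, t) = \begin{bmatrix} 4 \\ 2 \end{bmatrix}, \quad \frac{\partial g}{\partial t}(s, t) = \begin{bmatrix} 3 \\ 1 \end{bmatrix}. \]
By the chain rule,
\begin{align*}
\frac{\partial}{\partial s} f(g(s, t)) &= Df(g(s, t)) \frac{\partial g}{\partial s}(s, t) \\
&= \frac{1}{2} [2s + t, \; 4s + 3t] \begin{bmatrix} 4 \\ 2 \end{bmatrix} \\
&= 8s + 5t,
\end{align*}
and
\begin{align*}
\frac{\partial}{\partial t} f(g(s, t)) &= Df(g(s, t)) \frac{\partial g}{\partial t}(s, t) \\
&= \frac{1}{2} [2s + t, \; 4s + 3t] \begin{bmatrix} 3 \\ 1 \end{bmatrix} \\
&= 5s + 3t.
\end{align*}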
5.6
We have
\[ Df(x) = [3x_1^2 x_2 x_3^2 + x_2, \; x_1^3 x_3^2 + x_1, \; 2x_1^3 x_2 x_3 + 1] \]
and
\[ \frac{dx}{dt}(t) = \begin{bmatrix} e^t + 3t^2 \\ 2t \\ 1 \end{bmatrix}. \]
By the chain rule,
\begin{align*}
\frac{d}{dt} f(x(t)) &= Df(x(t)) \frac{dx}{dt}(t) \\
&= [3x_1(t)^2 x_2(t) x_3(t)^2 + x_2(t), \; x_1(t)^3 x_3(t)^2 + x_1(t), \; 2x_1(t)^3 x_2(t) x_3(t) + 1] \begin{bmatrix} e^t + 3t^2 \\ 2t \\ 1 \end{bmatrix} \\
&= 12t(e^t + 3t^2)^3 + 2te^t + 6t^2 + 2t + 1.
\end{align*}
5.7
Let $\varepsilon > 0$ be given. Since $f(x) = o(g(x))$, then
\[ \lim_{x \to 0} \frac{\|f(x)\|}{g(x)} = 0. \]
Hence, there exists $\delta > 0$ such that if $\|x\| < \delta$, then
\[ \frac{\|f(x)\|}{g(x)} < \varepsilon, \]
which can be rewritten as
\[ \|f(x)\| \leq \varepsilon g(x). \]
5.8
By Exercise 5.7, there exists $\delta > 0$ such that if $\|x\| < \delta$, then $|o(g(x))| < g(x)/2$. Hence, if $\|x\| < \delta$, $x \neq 0$, then
\[ f(x) \leq -g(x) + |o(g(x))| < -g(x) + g(x)/2 = -\frac{1}{2} g(x) < 0. \]
5.9
We have that
\[ \{x : f_1(x) = 12\} = \{x : x_1^2 - x_2^2 = 12\}, \]
and
\[ \{x : f_2(x) = 16\} = \{x : x_2 = 8/x_1\}. \]
To find the intersection points, we substitute $x_2 = 8/x_1$ into $x_1^2 - x_2^2 = 12$ to get $x_1^4 - 12x_1^2 - 64 = 0$. Solving gives $x_1^2 = 16, -4$. Clearly, the only two possibilities for $x_1$ are $x_1 = +4, -4$, from which we obtain $x_2 = +2, -2$. Hence, the intersection points are located at $[4, 2]^\top$ and $[-4, -2]^\top$.
The level sets associated with f1 (x1 , x2 ) = 12 and f2 (x1 , x2 ) = 16 are shown as follows.
[Figure: level sets f1(x1, x2) = 12 and f2(x1, x2) = 16 in the (x1, x2)-plane, intersecting at (4, 2) and (−4, −2).]
5.10
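a. We have
\[ f(x) = f(x_o) + Df(x_o)(x - x_o) + \frac{1}{2}(x - x_o)^\top D^2 f(x_o)(x - x_o) + \cdots. \]
We compute
\begin{align*}
Df(x) &= [e^{-x_2}, \; -x_1 e^{-x_2} + 1], \\
D^2 f(x) &= \begin{bmatrix} 0 & -e^{-x_2} \\ -e^{-x_2} & x_1 e^{-x_2} \end{bmatrix}.
\end{align*}
Hence,
\begin{align*}
f(x) &= 2 + [1, 0] \begin{bmatrix} x_1 - 1 \\ x_2 \end{bmatrix} + \frac{1}{2} [x_1 - 1, \; x_2] \begin{bmatrix} 0 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 - 1 \\ x_2 \end{bmatrix} + \cdots \\
&= 1 + x_1 + x_2 - x_1 x_2 + \frac{1}{2} x_2^2 + \cdots.
\end{align*}
b. We compute
\begin{align*}
Df(x) &= [4x_1^3 + 4x_1 x_2^2, \; 4x_1^2 x_2 + 4x_2^3], \\
D^2 f(x) &= \begin{bmatrix} 12x_1^2 + 4x_2^2 & 8x_1 x_2 \\ 8x_1 x_2 & 4x_1^2 + 12x_2^2 \end{bmatrix}.
\end{align*}
Expanding $f$ about the point $x_o$ yields
\begin{align*}
f(x) &= 4 + [8, 8] \begin{bmatrix} x_1 - 1 \\ x_2 - 1 \end{bmatrix} + \frac{1}{2} [x_1 - 1, \; x_2 - 1] \begin{bmatrix} 16 & 8 \\ 8 & 16 \end{bmatrix} \begin{bmatrix} x_1 - 1 \\ x_2 - 1 \end{bmatrix} + \cdots \\
&= 8x_1^2 + 8x_2^2 - 16x_1 - 16x_2 + 8x_1 x_2 + 12 + \cdots.
\end{align*}
c. We compute
\begin{align*}
Df(x) &= [e^{x_1 - x_2} + e^{x_1 + x_2} + 1, \; -e^{x_1 - x_2} + e^{x_1 + x_2} + 1], \\
D^2 f(x) &= \begin{bmatrix} e^{x_1 - x_2} + e^{x_1 + x_2} & -e^{x_1 - x_2} + e^{x_1 + x_2} \\ -e^{x_1 - x_2} + e^{x_1 + x_2} & e^{x_1 - x_2} + e^{x_1 + x_2} \end{bmatrix}.
\end{align*}
Expanding $f$ about the point $x_o$ yields
\begin{align*}
f(x) &= 2 + 2e + [2e + 1, \; 1] \begin{bmatrix} x_1 - 1 \\ x_2 \end{bmatrix} + \frac{1}{2} [x_1 - 1, \; x_2] \begin{bmatrix} 2e & 0 \\ 0 & 2e \end{bmatrix} \begin{bmatrix} x_1 - 1 \\ x_2 \end{bmatrix} + \cdots \\
&= 1 + x_1 + x_2 + e(1 + x_1^2 + x_2^2) + \cdots.
\end{align*}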
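One way to gain confidence in a second-order Taylor polynomial is to check that its error shrinks like the cube of the distance to the expansion point. The sketch below does this for part a, assuming $f(x) = x_1 e^{-x_2} + x_2 + 1$ and $x_o = [1, 0]^\top$; this form of $f$ is inferred from the stated derivative and the value $f(x_o) = 2$, not quoted from the exercise.

```python
import math

def f(x1, x2):
    # Assumed f, recovered from Df(x) = [e^{-x2}, -x1 e^{-x2} + 1] and f(xo) = 2.
    return x1 * math.exp(-x2) + x2 + 1

def quadratic_model(x1, x2):
    # Second-order Taylor polynomial about xo = [1, 0] from part a.
    return 1 + x1 + x2 - x1 * x2 + 0.5 * x2 ** 2

def error_at(step):
    """Model error at the point xo + [step, step]."""
    return abs(f(1 + step, step) - quadratic_model(1 + step, step))

# Halving the step should reduce a third-order error by roughly a factor of 8.
errors = [error_at(h) for h in (0.1, 0.05, 0.025)]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors, ratios)
```

Ratios near 8 indicate that the neglected terms are of third order, as expected for a correct quadratic expansion.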
6. Basics of Unconstrained Optimization
6.1
a. In this case, $x^*$ is definitely not a local minimizer. To see this, note that $d = [1, -2]^\top$ is a feasible direction at $x^*$. However, $d^\top \nabla f(x^*) = -1$, which violates the FONC.
b. In this case, $x^*$ satisfies the FONC, and thus is possibly a local minimizer, but it is impossible to be definite based on the given information.
c. In this case, $x^*$ satisfies the SOSC, and thus is definitely a (strict) local minimizer.
d. In this case, $x^*$ is definitely not a local minimizer. To see this, note that $d = [0, 1]^\top$ is a feasible direction at $x^*$, and $d^\top \nabla f(x^*) = 0$. However, $d^\top F(x^*) d = -1$, which violates the SONC.
6.2
Because there are no constraints on x1 or x2 , we can utilize conditions for unconstrained optimization. To
proceed, we first compute the function gradient and find the critical points, that is, the points that satisfy
the FONC,
∇f (x1 , x2 ) = 0.
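The components of the gradient $\nabla f(x_1, x_2)$ are
\[ \frac{\partial f}{\partial x_1} = x_1^2 - 4 \quad \text{and} \quad \frac{\partial f}{\partial x_2} = x_2^2 - 16. \]
Thus there are four critical points:
\[ x^{(1)} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}, \quad x^{(2)} = \begin{bmatrix} 2 \\ -4 \end{bmatrix}, \quad x^{(3)} = \begin{bmatrix} -2 \\ 4 \end{bmatrix}, \quad \text{and} \quad x^{(4)} = \begin{bmatrix} -2 \\ -4 \end{bmatrix}. \]
We next compute the Hessian matrix of the function $f$:
\[ F(x) = \begin{bmatrix} 2x_1 & 0 \\ 0 & 2x_2 \end{bmatrix}. \]
Note that $F(x^{(1)}) > 0$ and therefore $x^{(1)}$ is a strict local minimizer. Next, $F(x^{(4)}) < 0$ and therefore $x^{(4)}$ is a strict local maximizer. The Hessian is indefinite at $x^{(2)}$ and $x^{(3)}$, and so these points are neither maximizers nor minimizers.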
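The classification above can be reproduced with a few lines of code that use only the gradient and Hessian stated in the solution. Because the Hessian is diagonal, its eigenvalues are its diagonal entries; the function names and the label strings are illustrative.

```python
from itertools import product

def gradient(x1, x2):
    """Gradient components from the solution of Exercise 6.2."""
    return (x1 ** 2 - 4, x2 ** 2 - 16)

def hessian_diagonal(x1, x2):
    # The Hessian is diag(2*x1, 2*x2), so its eigenvalues are the diagonal entries.
    return (2 * x1, 2 * x2)

def classify(x1, x2):
    """Second-order classification of a critical point."""
    eigs = hessian_diagonal(x1, x2)
    if all(e > 0 for e in eigs):
        return "strict local minimizer"
    if all(e < 0 for e in eigs):
        return "strict local maximizer"
    if any(e > 0 for e in eigs) and any(e < 0 for e in eigs):
        return "saddle point"
    return "inconclusive"

# Candidate points are the sign combinations of (2, 4); keep those with zero gradient.
critical_points = [p for p in product((2, -2), (4, -4)) if gradient(*p) == (0, 0)]
labels = {p: classify(*p) for p in critical_points}
print(labels)
```

The two points with an indefinite Hessian are reported as saddle points, which matches the statement that they are neither maximizers nor minimizers.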
6.3
Suppose $x^*$ is a global minimizer of $f$ over $\Omega$, and $x^* \in \Omega' \subset \Omega$. Let $x \in \Omega'$. Then, $x \in \Omega$ and therefore $f(x^*) \leq f(x)$. Hence, $x^*$ is a global minimizer of $f$ over $\Omega'$.
6.4
Suppose $x^*$ is an interior point of $\Omega$. Therefore, there exists $\varepsilon > 0$ such that $\{y : \|y - x^*\| < \varepsilon\} \subset \Omega$. Since $x^*$ is a local minimizer of $f$ over $\Omega$, there exists $\varepsilon' > 0$ such that $f(x^*) \leq f(x)$ for all $x \in \{y : \|y - x^*\| < \varepsilon'\}$. Take $\varepsilon'' = \min(\varepsilon, \varepsilon')$. Then, $\{y : \|y - x^*\| < \varepsilon''\} \subset \Omega'$, and $f(x^*) \leq f(x)$ for all $x \in \{y : \|y - x^*\| < \varepsilon''\}$. Thus, $x^*$ is a local minimizer of $f$ over $\Omega'$.
To show that we cannot make the same conclusion if $x^*$ is not an interior point, let $\Omega = \{0\}$, $\Omega' = [-1, 1]$, and $f(x) = x$. Clearly, $0 \in \Omega$ is a local minimizer of $f$ over $\Omega$. However, $0 \in \Omega'$ is not a local minimizer of $f$ over $\Omega'$.
6.5
a. The TONC is: if $f''(0) = 0$, then $f'''(0) = 0$. To prove this, suppose $f''(0) = 0$. Now, by the FONC, we also have $f'(0) = 0$. Hence, by Taylor's theorem,
\[ f(x) = f(0) + \frac{f'''(0)}{3!} x^3 + o(x^3). \]
Since 0 is a local minimizer, $f(x) \geq f(0)$ for all $x$ sufficiently close to 0. Hence, for all such $x$,
\[ \frac{f'''(0)}{3!} x^3 \geq o(x^3). \]
Now, if $x > 0$, then
\[ f'''(0) \geq 3! \frac{o(x^3)}{x^3}, \]
which implies that $f'''(0) \geq 0$. On the other hand, if $x < 0$, then
\[ f'''(0) \leq 3! \frac{o(x^3)}{x^3}, \]
which implies that $f'''(0) \leq 0$. This implies that $f'''(0) = 0$, as required.
b. Let $f(x) = -x^4$. Then, $f'(0) = 0$, $f''(0) = 0$, and $f'''(0) = 0$, which means that the FONC, SONC, and TONC are all satisfied. However, 0 is not a local minimizer: $f(x) < 0$ for all $x \neq 0$.
c. The answer is yes. To see this, we first write
\[ f(x) = f(0) + f'(0)x + \frac{f''(0)}{2} x^2 + \frac{f'''(0)}{3!} x^3. \]
Now, if the FONC is satisfied, then
\[ f(x) = f(0) + \frac{f''(0)}{2} x^2 + \frac{f'''(0)}{3!} x^3. \]
Moreover, if the SONC is satisfied, then either (i) $f''(0) > 0$ or (ii) $f''(0) = 0$. In the case (i), it is clear from the above equation that $f(x) \geq f(0)$ for all $x$ sufficiently close to 0 (because the third term on the right-hand side is $o(x^2)$). In the case (ii), the TONC implies that $f(x) = f(0)$ for all $x$. In either case, $f(x) \geq f(0)$ for all $x$ sufficiently close to 0. This shows that 0 is a local minimizer.
6.6
a. The TONC is: if $f'(0) = 0$ and $f''(0) = 0$, then $f'''(0) \geq 0$. To prove this, suppose $f'(0) = 0$ and $f''(0) = 0$. By Taylor's theorem, for $x \geq 0$,
\[ f(x) = f(0) + \frac{f'''(0)}{3!} x^3 + o(x^3). \]
Since 0 is a local minimizer, $f(x) \geq f(0)$ for sufficiently small $x \geq 0$. Hence, for all $x > 0$ sufficiently small,
\[ f'''(0) \geq 3! \frac{o(x^3)}{x^3}. \]
This implies that $f'''(0) \geq 0$, as required.
b. Let $f(x) = -x^4$. Then, $f'(0) = 0$, $f''(0) = 0$, and $f'''(0) = 0$, which means that the FONC, SONC, and TONC are all satisfied. However, 0 is not a local minimizer: $f(x) < 0$ for all $x > 0$.
6.7
For convenience, let $z_0 = x_0 + \arg\min_{x \in \Omega} f(x)$. Thus we want to show that $z_0 = \arg\min_{y \in \Omega'} f(y - x_0)$; i.e., for
all $y \in \Omega'$, $f(y - x_0) \ge f(z_0 - x_0)$. So fix $y \in \Omega'$. Then, $y - x_0 \in \Omega$. Hence,
$$f(y - x_0) \ge \min_{x \in \Omega} f(x) = f\Big(\arg\min_{x \in \Omega} f(x)\Big) = f(z_0 - x_0),$$
which completes the proof.
6.8
a. The gradient and Hessian of $f$ are
$$\nabla f(x) = 2\begin{bmatrix} 1 & 3 \\ 3 & 7 \end{bmatrix} x + \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \qquad F(x) = 2\begin{bmatrix} 1 & 3 \\ 3 & 7 \end{bmatrix}.$$
Hence, $\nabla f([1, 1]^\top) = [11, 25]^\top$, and $F([1, 1]^\top)$ is as shown above.
b. The direction of maximal rate of increase is the direction of the gradient. Hence, the directional derivative
with respect to a unit vector in this direction is
$$\left(\frac{\nabla f(x)}{\|\nabla f(x)\|}\right)^{\top} \nabla f(x) = \frac{(\nabla f(x))^\top \nabla f(x)}{\|\nabla f(x)\|} = \|\nabla f(x)\|.$$
At $x = [1, 1]^\top$, we have $\|\nabla f([1, 1]^\top)\| = \sqrt{11^2 + 25^2} = 27.31$.
c. The FONC in this case is $\nabla f(x) = 0$. Solving, we get
$$x = \begin{bmatrix} 3/2 \\ -1 \end{bmatrix}.$$
The point above does not satisfy the SONC because the Hessian is not positive semidefinite (its determinant
is negative).
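The numbers in 6.8 can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original solution; it uses only the gradient and Hessian stated in part a, and the names `H`, `g0`, `grad`, and `x_fonc` are hypothetical helpers introduced here for illustration.

```python
import math

# Hessian F(x) = 2*[[1, 3], [3, 7]] and linear term [3, 5] from part a.
H = [[2.0, 6.0], [6.0, 14.0]]
g0 = [3.0, 5.0]

def grad(x):
    """Gradient of f at x: H x + g0 (x is a length-2 list)."""
    return [H[i][0] * x[0] + H[i][1] * x[1] + g0[i] for i in range(2)]

def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(t * t for t in v))

g_at_11 = grad([1.0, 1.0])      # part a: gradient at [1, 1]
max_rate = norm(g_at_11)        # part b: maximal rate of increase
det_H = H[0][0] * H[1][1] - H[0][1] * H[1][0]

# Part c: solve H x = -g0 by Cramer's rule.
x_fonc = [(-g0[0] * H[1][1] + g0[1] * H[0][1]) / det_H,
          (-g0[1] * H[0][0] + g0[0] * H[1][0]) / det_H]
```

A negative `det_H` for a 2x2 symmetric matrix means the eigenvalues have opposite signs, which is why the FONC point fails the SONC.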
6.9
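a. A differentiable function $f$ decreases most rapidly in the direction of the negative gradient. In our problem,
$$\nabla f(x) = \left[\frac{\partial f}{\partial x_1},\ \frac{\partial f}{\partial x_2}\right]^\top = \left[2x_1x_2 + x_2^3,\ \ x_1^2 + 3x_1x_2^2\right]^\top.$$
Hence, the direction of most rapid decrease is
$$-\nabla f(x^{(0)}) = -[5,\ 10]^\top.$$
b. The rate of increase of $f$ at $x^{(0)}$ in the direction $-\nabla f(x^{(0)})$ is
$$\nabla f(x^{(0)})^\top \frac{-\nabla f(x^{(0)})}{\|\nabla f(x^{(0)})\|} = -\|\nabla f(x^{(0)})\| = -\sqrt{125} = -5\sqrt{5}.$$
c. The rate of increase of $f$ at $x^{(0)}$ in the direction $d$ is
$$\nabla f(x^{(0)})^\top \frac{d}{\|d\|} = [5,\ 10]\,\frac{1}{5}\begin{bmatrix} 3 \\ 4 \end{bmatrix} = 11.$$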
6.10
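a. We can rewrite $f$ as
$$f(x) = \frac{1}{2} x^\top \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix} x + x^\top \begin{bmatrix} 3 \\ 4 \end{bmatrix} + 7.$$
The gradient and Hessian of $f$ are
$$\nabla f(x) = \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix} x + \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \qquad F(x) = \begin{bmatrix} 4 & 4 \\ 4 & 2 \end{bmatrix}.$$
Hence $\nabla f([0, 1]^\top) = [7, 6]^\top$. The directional derivative is
$$[1, 0]\,\nabla f([0, 1]^\top) = 7.$$
b. The FONC in this case is $\nabla f(x) = 0$. The only point satisfying the FONC is
$$x^* = \frac{1}{4}\begin{bmatrix} -5 \\ 2 \end{bmatrix}.$$
The point above does not satisfy the SONC because the Hessian is not positive semidefinite (its determinant
is negative). Therefore, $f$ does not have a minimizer.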
6.11
a. Write the objective function as $f(x) = -x_2^2$. In this problem the only feasible directions at 0 are of the
form $d = [d_1, 0]^\top$. Hence, $d^\top \nabla f(0) = 0$ for all feasible directions $d$ at 0.
b. The point 0 is a local maximizer, because $f(0) = 0$, while any feasible point $x$ satisfies $f(x) \le 0$.
The point 0 is not a strict local maximizer because for any $x$ of the form $x = [x_1, 0]^\top$, we have $f(x) =
0 = f(0)$, and there are such points in any neighborhood of 0.
The point 0 is not a local minimizer because for any point $x$ of the form $x = [x_1, x_1^2]^\top$ with $x_1 > 0$, we
have $f(x) = -x_1^4 < 0$, and there are such points in any neighborhood of 0. Since 0 is not a local minimizer,
it is also not a strict local minimizer.
6.12
a. We have $\nabla f(x^*) = [0, 5]^\top$. The only feasible directions at $x^*$ are of the form $d = [d_1, d_2]^\top$ with $d_2 \ge 0$.
Therefore, for such feasible directions, $d^\top \nabla f(x^*) = 5d_2 \ge 0$. Hence, $x^* = [0, 1]^\top$ satisfies the first order
necessary condition.
b. We have $F(x^*) = O$. Therefore, for any $d$, $d^\top F(x^*) d \ge 0$. Hence, $x^* = [0, 1]^\top$ satisfies the second order
necessary condition.
c. Consider points of the form $x = [x_1, -x_1^2 + 1]^\top$, $x_1 \in \mathbb{R}$. Such points are in $\Omega$, and are arbitrarily close to
$x^*$. However, for such points $x \ne x^*$,
$$f(x) = 5(-x_1^2 + 1) = 5 - 5x_1^2 < 5 = f(x^*).$$
Hence, $x^*$ is not a local minimizer.
6.13
a. We have $\nabla f(x^*) = -[3, 0]^\top$. The only feasible directions at $x^*$ are of the form $d = [d_1, d_2]^\top$ with $d_1 \le 0$.
Therefore, for such feasible directions, $d^\top \nabla f(x^*) = -3d_1 \ge 0$. Hence, $x^* = [2, 0]^\top$ satisfies the first order
necessary condition.
b. We have $F(x^*) = O$. Therefore, for any $d$, $d^\top F(x^*) d \ge 0$. Hence, $x^* = [2, 0]^\top$ satisfies the second order
necessary condition.
c. Yes, $x^*$ is a local minimizer. To see this, notice that any feasible point $x = [x_1, x_2]^\top \ne x^*$ is such that
$x_1 < 2$. Hence, for such points $x \ne x^*$,
$$f(x) = -3x_1 > -6 = f(x^*).$$
In fact, $x^*$ is a strict local minimizer.
6.14
a. We have $\nabla f(x) = [0, 1]^\top$, which is nonzero everywhere. Hence, no interior point satisfies the FONC.
Moreover, any boundary point with a feasible direction $d$ such that $d_2 < 0$ cannot satisfy the FONC,
because for such a $d$, $d^\top \nabla f(x) = d_2 < 0$. By drawing a picture, it is easy to see that the only boundary
point remaining is $x^* = [0, 1]^\top$. For this point, any feasible direction satisfies $d_2 \ge 0$. Hence, for any feasible
direction, $d^\top \nabla f(x^*) = d_2 \ge 0$. Hence, $x^* = [0, 1]^\top$ satisfies the FONC, and is the only such point.
b. We have $F(x) = O$. So any point (and in particular $x^* = [0, 1]^\top$) satisfies the SONC.
c. The point $x^* = [0, 1]^\top$ is not a local minimizer. To see this, consider points of the form $x = [\sqrt{1 - x_2^2},\ x_2]^\top$
where $x_2 \in [1/2, 1)$. It is clear that such points are feasible, and are arbitrarily close to $x^* = [0, 1]^\top$. However,
for such points, $f(x) = x_2 < 1 = f(x^*)$.
6.15
a. We have $\nabla f(x^*) = [3, 0]^\top$. The only feasible directions at $x^*$ are of the form $d = [d_1, d_2]^\top$ with $d_1 \ge 0$.
Therefore, for such feasible directions, $d^\top \nabla f(x^*) = 3d_1 \ge 0$. Hence, $x^* = [2, 0]^\top$ satisfies the first order
necessary condition.
b. We have $F(x^*) = O$. Therefore, for any $d$, $d^\top F(x^*) d \ge 0$. Hence, $x^* = [2, 0]^\top$ satisfies the second order
necessary condition.
c. Consider points of the form $x = [-x_2^2 + 2,\ x_2]^\top$, $x_2 \in \mathbb{R}$. Such points are in $\Omega$, and could be arbitrarily
close to $x^*$. However, for such points $x \ne x^*$,
$$f(x) = 3(-x_2^2 + 2) = 6 - 3x_2^2 < 6 = f(x^*).$$
Hence, $x^*$ is not a local minimizer.
6.16
a. We have $\nabla f(x^*) = 0$. Therefore, for any feasible direction $d$ at $x^*$, we have $d^\top \nabla f(x^*) = 0$. Hence, $x^*$
satisfies the first-order necessary condition.
b. We have
$$F(x^*) = \begin{bmatrix} 8 & 0 \\ 0 & -2 \end{bmatrix}.$$
Any feasible direction $d$ at $x^*$ has the form $d = [d_1, d_2]^\top$ where $d_2 \le 2d_1$, $d_1, d_2 \ge 0$. Therefore, for any
feasible direction $d$ at $x^*$, we have
$$d^\top F(x^*) d = 8d_1^2 - 2d_2^2 \ge 8d_1^2 - 2(2d_1)^2 = 0.$$
Hence, $x^*$ satisfies the second-order necessary condition.
c. We have $f(x^*) = 0$. Any point of the form $x = [x_1,\ x_1^2 + 2x_1]^\top$, $x_1 > 0$, is feasible and has objective
function value given by
$$f(x) = 4x_1^2 - (x_1^2 + 2x_1)^2 = -(x_1^4 + 4x_1^3) < 0 = f(x^*).$$
Moreover, there are such points in any neighborhood of $x^*$. Therefore, the point $x^*$ is not a local minimizer.
6.17
a. We have $\nabla f(x^*) = [1/x_1^*,\ 1/x_2^*]^\top$. If $x^*$ were an interior point, then $\nabla f(x^*) = 0$. But this is clearly
impossible. Therefore, $x^*$ cannot possibly be an interior point.
b. We have $F(x) = -\operatorname{diag}[1/x_1^2,\ 1/x_2^2]$, which is negative definite everywhere. Therefore, the second-order
necessary condition is satisfied everywhere. (Note that because we have a maximization problem, negative
definiteness is the relevant condition.)
6.18
Given $x \in \mathbb{R}$, let
$$f(x) = \sum_{i=1}^{n} (x - x_i)^2,$$
so that $\bar{x}$ is the minimizer of $f$. By the FONC,
$$f'(\bar{x}) = 0,$$
and hence
$$\sum_{i=1}^{n} 2(\bar{x} - x_i) = 0,$$
which on solving gives
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
6.19
Let $\theta_1$ be the angle from the horizontal to the bottom of the picture, and $\theta_2$ the angle from the horizontal
to the top of the picture. Then, $\tan(\theta) = (\tan(\theta_2) - \tan(\theta_1))/(1 + \tan(\theta_2)\tan(\theta_1))$. Now, $\tan(\theta_1) = b/x$ and
$\tan(\theta_2) = (a + b)/x$. Hence, the objective function that we wish to maximize is
$$f(x) = \frac{(a + b)/x - b/x}{1 + b(a + b)/x^2} = \frac{a}{x + b(a + b)/x}.$$
We have
$$f'(x) = -\frac{a}{(x + b(a + b)/x)^2}\left(1 - \frac{b(a + b)}{x^2}\right).$$
Let $x^*$ be the optimal distance. Then, by the FONC, we have $f'(x^*) = 0$, which gives
$$1 - \frac{b(a + b)}{(x^*)^2} = 0 \quad\Rightarrow\quad x^* = \sqrt{b(a + b)}.$$
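As a sanity check on the FONC solution of 6.19, the sketch below compares $x^* = \sqrt{b(a+b)}$ with a brute-force search over a grid of distances. It is illustrative only: the values of `a` and `b`, the grid, and the helper name `tan_theta` are assumptions made here, not data from the problem.

```python
import math

def tan_theta(x, a, b):
    """Objective from 6.19: tangent of the viewing angle at distance x > 0."""
    return a / (x + b * (a + b) / x)

# Illustrative picture height a and offset b (hypothetical values).
a, b = 1.0, 0.5
x_star = math.sqrt(b * (a + b))              # FONC solution

# Brute-force comparison on the grid 0.01, 0.02, ..., 10.00.
grid = [0.01 * k for k in range(1, 1001)]
best_on_grid = max(grid, key=lambda x: tan_theta(x, a, b))
```

The grid maximizer should land within one grid step of `x_star`, since the objective has a single stationary point on $x > 0$.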
6.20
The squared distance from the sensor to the baby's heart is $1 + x^2$, while the squared distance from the
sensor to the mother's heart is $1 + (2 - x)^2$. Therefore, the signal to noise ratio is
$$f(x) = \frac{1 + (2 - x)^2}{1 + x^2}.$$
We have
$$f'(x) = \frac{-2(2 - x)(1 + x^2) - 2x(1 + (2 - x)^2)}{(1 + x^2)^2} = \frac{4(x^2 - 2x - 1)}{(1 + x^2)^2}.$$
By the FONC, at the optimal position $x^*$, we have $f'(x^*) = 0$. Hence, either $x^* = 1 - \sqrt{2}$ or $x^* = 1 + \sqrt{2}$.
From the figure, it is easy to see that $x^* = 1 - \sqrt{2}$ is the optimal position.
6.21
a. Let $x$ be the decision variable. Write the total travel time as $f(x)$, which is given by
$$f(x) = \frac{\sqrt{1 + x^2}}{v_1} + \frac{\sqrt{1 + (d - x)^2}}{v_2}.$$
Differentiating the above expression, we get
$$f'(x) = \frac{x}{v_1\sqrt{1 + x^2}} - \frac{d - x}{v_2\sqrt{1 + (d - x)^2}}.$$
By the first order necessary condition, the optimal path satisfies $f'(x^*) = 0$, which corresponds to
$$\frac{x^*}{v_1\sqrt{1 + (x^*)^2}} = \frac{d - x^*}{v_2\sqrt{1 + (d - x^*)^2}},$$
or $\sin\theta_1/v_1 = \sin\theta_2/v_2$. Upon rearranging, we obtain the desired equation.
b. The second derivative of $f$ is given by
$$f''(x) = \frac{1}{v_1(1 + x^2)^{3/2}} + \frac{1}{v_2(1 + (d - x)^2)^{3/2}}.$$
Hence, $f''(x^*) > 0$, which shows that the second order sufficient condition holds.
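The optimality condition of 6.21 has no closed-form solution in general, but it is easy to check numerically. The sketch below is an illustration under assumed values of $d$, $v_1$, $v_2$ (none are given in the solution); it finds the root of $f'$ by bisection, which is valid because $f'' > 0$ makes $f'$ strictly increasing, and then compares the two sides of $\sin\theta_1/v_1 = \sin\theta_2/v_2$. The function names are hypothetical.

```python
import math

def dtime(x, d, v1, v2):
    """Derivative f'(x) of the total travel time from 6.21."""
    return (x / (v1 * math.sqrt(1 + x * x))
            - (d - x) / (v2 * math.sqrt(1 + (d - x) ** 2)))

def optimal_crossing(d, v1, v2, tol=1e-12):
    """Bisection on f' over [0, d]; f'(0) < 0 < f'(d) whenever d > 0."""
    lo, hi = 0.0, d
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dtime(mid, d, v1, v2) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical geometry and speeds, chosen only for illustration.
d, v1, v2 = 3.0, 1.0, 2.0
x_star = optimal_crossing(d, v1, v2)
sin1 = x_star / math.sqrt(1 + x_star ** 2)
sin2 = (d - x_star) / math.sqrt(1 + (d - x_star) ** 2)
```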
6.22
a. We have $f(x) = U_1(x_1) + U_2(x_2)$ and $\Omega = \{x : x_1, x_2 \ge 0,\ x_1 + x_2 \le 1\}$. A picture of $\Omega$ looks like:
[Figure: the set $\Omega$ in the $(x_1, x_2)$-plane, a triangle with vertices $(0, 0)$, $(1, 0)$, and $(0, 1)$.]
b. We have $\nabla f(x) = [a_1, a_2]^\top$. Because $\nabla f(x) \ne 0$ for all $x$, we conclude that no interior point satisfies
the FONC. Next, consider any feasible point $x$ for which $x_2 > 0$. At such a point, the vector $d = [1, -1]^\top$
is a feasible direction. But then $d^\top \nabla f(x) = a_1 - a_2 > 0$, which means that the FONC is violated (recall that
the problem is to maximize $f$). So clearly the remaining candidates are those $x$ for which $x_2 = 0$. Among
these, if $x_1 < 1$, then $d = [0, 1]^\top$ is a feasible direction, in which case we have $d^\top \nabla f(x) = a_2 > 0$. This
leaves the point $x = [1, 0]^\top$. At this point, any feasible direction $d$ satisfies $d_1 \le 0$ and $d_2 \le -d_1$. Hence,
for any feasible direction $d$, we have
$$d^\top \nabla f(x) = d_1 a_1 + d_2 a_2 \le d_1 a_1 + (-d_1) a_2 = d_1(a_1 - a_2) \le 0.$$
So, the only feasible point that satisfies the FONC is $[1, 0]^\top$.
c. We have $F(x) = O \le 0$. Hence, any point satisfies the SONC (again, recall that the problem is to
maximize $f$).
6.23
We have
$$\nabla f(x) = \begin{bmatrix} 4(x_1 - x_2)^3 + 2x_1 - 2 \\ -4(x_1 - x_2)^3 - 2x_2 + 2 \end{bmatrix}.$$
Setting $\nabla f(x) = 0$ we get
$$\begin{aligned} 4(x_1 - x_2)^3 + 2x_1 - 2 &= 0 \\ -4(x_1 - x_2)^3 - 2x_2 + 2 &= 0. \end{aligned}$$
Adding the two equations, we obtain $x_1 = x_2$, and substituting back yields
$$x_1 = x_2 = 1.$$
Hence, the only point satisfying the FONC is $[1, 1]^\top$.
We have
$$F(x) = \begin{bmatrix} 12(x_1 - x_2)^2 + 2 & -12(x_1 - x_2)^2 \\ -12(x_1 - x_2)^2 & 12(x_1 - x_2)^2 - 2 \end{bmatrix}.$$
Hence
$$F([1, 1]^\top) = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}.$$
Since $F([1, 1]^\top)$ is not positive semidefinite, the point $[1, 1]^\top$ does not satisfy the SONC.
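The FONC and SONC checks in 6.23 can be replayed numerically. This is a small illustrative sketch that uses only the gradient and Hessian expressions stated in the solution; the function names `grad`, `hessian`, and `quad_form` are hypothetical.

```python
def grad(x1, x2):
    """Gradient of f from 6.23, returned as a tuple."""
    c = 4 * (x1 - x2) ** 3
    return (c + 2 * x1 - 2, -c - 2 * x2 + 2)

def hessian(x1, x2):
    """Hessian F(x) from 6.23 as a 2x2 nested tuple."""
    c = 12 * (x1 - x2) ** 2
    return ((c + 2, -c), (-c, c - 2))

def quad_form(F, d):
    """Return d^T F d for a 2x2 matrix F and a 2-vector d."""
    return sum(d[i] * F[i][j] * d[j] for i in range(2) for j in range(2))

g = grad(1.0, 1.0)                       # FONC: should vanish at [1, 1]
F11 = hessian(1.0, 1.0)
curvature_along_x2 = quad_form(F11, (0.0, 1.0))   # negative, so SONC fails
```

A single direction with negative curvature, here $d = [0, 1]^\top$, is enough to show that the Hessian is not positive semidefinite.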
6.24
Suppose d is a feasible direction at x. Then, there exists α0 > 0 such that x + αd ∈ Ω for all α ∈ [0, α0 ].
Let β > 0 be given. Then, x + α(βd) ∈ Ω for all α ∈ [0, α0 /β]. Since α0 /β > 0, by definition βd is also a
feasible direction at x.
6.25
⇒: Suppose $d$ is feasible at $x \in \Omega$. Then, there exists $\alpha > 0$ such that $x + \alpha d \in \Omega$, that is, $A(x + \alpha d) = b$.
Since $Ax = b$ and $\alpha \ne 0$, we conclude that $Ad = 0$.
⇐: Suppose Ad = 0. Then, for any α ∈ [0, 1], we have αAd = 0. Adding this equation to Ax = b, we
obtain A(x + αd) = b, that is, x + αd ∈ Ω for all α ∈ [0, 1]. Therefore, d is a feasible direction at x.
6.26
The vector $d = [1, 1]^\top$ is a feasible direction at 0. Now,
$$d^\top \nabla f(0) = \frac{\partial f}{\partial x_1}(0) + \frac{\partial f}{\partial x_2}(0).$$
Since $\nabla f(0) \le 0$ and $\nabla f(0) \ne 0$, then
$$d^\top \nabla f(0) < 0.$$
Hence, by the FONC, 0 is not a local minimizer.
6.27
We have $\nabla f(x) = c \ne 0$. Therefore, for any $x \in \overset{\circ}{\Omega}$ (the interior of $\Omega$), we have $\nabla f(x) \ne 0$. Hence, by Corollary 6.1, $x \in \overset{\circ}{\Omega}$
cannot be a local minimizer (and therefore it cannot be a solution).
6.28
The objective function is $f(x) = -c_1x_1 - c_2x_2$. Therefore, $\nabla f(x) = [-c_1, -c_2]^\top \ne 0$ for all $x$. Thus, by
FONC, the optimal solution $x^*$ cannot lie in the interior of the feasible set. Next, for all $x \in L_1 \cup L_2$,
$d = [1, 1]^\top$ is a feasible direction.
Therefore, $d^\top \nabla f(x) = -c_1 - c_2 < 0$. Hence, by FONC, the optimal
solution $x^*$ cannot lie in $L_1 \cup L_2$. Lastly, for all $x \in L_3$, $d = [1, -1]^\top$ is a feasible direction. Therefore,
$d^\top \nabla f(x) = c_2 - c_1 < 0$. Hence, by FONC, the optimal solution $x^*$ cannot lie in $L_3$. Therefore, by
elimination, the unique optimal feasible solution must be $[1, 0]^\top$.
6.29
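a. We write
$$\begin{aligned}
f(a, b) &= \frac{1}{n}\sum_{i=1}^{n}\left(a^2x_i^2 + b^2 + y_i^2 + 2x_iab - 2x_iy_ia - 2y_ib\right) \\
&= a^2\left(\frac{1}{n}\sum_{i=1}^{n}x_i^2\right) + b^2 + 2\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right)ab - 2\left(\frac{1}{n}\sum_{i=1}^{n}x_iy_i\right)a - 2\left(\frac{1}{n}\sum_{i=1}^{n}y_i\right)b + \frac{1}{n}\sum_{i=1}^{n}y_i^2 \\
&= [a\ \ b]\begin{bmatrix} \frac{1}{n}\sum_{i=1}^{n}x_i^2 & \frac{1}{n}\sum_{i=1}^{n}x_i \\ \frac{1}{n}\sum_{i=1}^{n}x_i & 1 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} - 2\left[\frac{1}{n}\sum_{i=1}^{n}x_iy_i,\ \ \frac{1}{n}\sum_{i=1}^{n}y_i\right]\begin{bmatrix} a \\ b \end{bmatrix} + \frac{1}{n}\sum_{i=1}^{n}y_i^2 \\
&= z^\top Qz - 2c^\top z + d,
\end{aligned}$$
where $z$, $Q$, $c$ and $d$ are defined in the obvious way.
b. If the point $z^* = [a^*, b^*]^\top$ is a solution, then by the FONC, we have $\nabla f(z^*) = 2Qz^* - 2c = 0$,
which means $Qz^* = c$. Now, since $\overline{X^2} - (\overline{X})^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \overline{X})^2$, and the $x_i$ are not all equal, then
$\det Q = \overline{X^2} - (\overline{X})^2 \ne 0$. Hence, $Q$ is nonsingular, and hence
$$z^* = Q^{-1}c = \frac{1}{\overline{X^2} - (\overline{X})^2}\begin{bmatrix} 1 & -\overline{X} \\ -\overline{X} & \overline{X^2} \end{bmatrix}\begin{bmatrix} \overline{XY} \\ \overline{Y} \end{bmatrix} = \begin{bmatrix} \dfrac{\overline{XY} - (\overline{X})(\overline{Y})}{\overline{X^2} - (\overline{X})^2} \\[2ex] \dfrac{(\overline{X^2})(\overline{Y}) - (\overline{X})(\overline{XY})}{\overline{X^2} - (\overline{X})^2} \end{bmatrix}.$$
Since $Q > 0$, then by the SOSC, the point $z^*$ is a strict local minimizer. Since $z^*$ is the only point satisfying
the FONC, then $z^*$ is the only local minimizer.
c. We have
$$a^*\overline{X} + b^* = \frac{\overline{XY} - (\overline{X})(\overline{Y})}{\overline{X^2} - (\overline{X})^2}\,\overline{X} + \frac{(\overline{X^2})(\overline{Y}) - (\overline{X})(\overline{XY})}{\overline{X^2} - (\overline{X})^2} = \overline{Y}.$$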
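The closed-form solution of 6.29 translates directly into code. The sketch below is illustrative, not part of the manual: the function name `fit_line` and the sample data are assumptions, and the variables `mx`, `my`, `mxx`, `mxy` stand for the sample averages of $x_i$, $y_i$, $x_i^2$, and $x_iy_i$.

```python
def fit_line(xs, ys):
    """Least-squares line y = a x + b via z* = Q^{-1} c from part b."""
    n = len(xs)
    mx = sum(xs) / n                                  # average of x_i
    my = sum(ys) / n                                  # average of y_i
    mxx = sum(x * x for x in xs) / n                  # average of x_i^2
    mxy = sum(x * y for x, y in zip(xs, ys)) / n      # average of x_i y_i
    det = mxx - mx * mx          # det Q; zero only if all x_i are equal
    if det == 0:
        raise ValueError("all x_i are equal; the fitted line is not unique")
    a = (mxy - mx * my) / det
    b = (mxx * my - mx * mxy) / det
    return a, b

# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a_star, b_star = fit_line(xs, ys)
```

Part c says the fitted line passes through the point of averages, so `a_star * mean(xs) + b_star` should equal `mean(ys)` for any data set, not only this one.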
6.30
Given $x \in \mathbb{R}^n$, let
$$f(x) = \frac{1}{p}\sum_{i=1}^{p}\|x - x^{(i)}\|^2$$
be the average squared error between $x$ and $x^{(1)}, \ldots, x^{(p)}$. We can rewrite $f$ as
$$f(x) = \frac{1}{p}\sum_{i=1}^{p}(x - x^{(i)})^\top(x - x^{(i)}) = x^\top x - 2\left(\frac{1}{p}\sum_{i=1}^{p}x^{(i)}\right)^{\top}x + \frac{1}{p}\sum_{i=1}^{p}\|x^{(i)}\|^2.$$
So $f$ is a quadratic function. Since $\bar{x}$ is the minimizer of $f$, then by the FONC, $\nabla f(\bar{x}) = 0$, i.e.,
$$2\bar{x} - 2\,\frac{1}{p}\sum_{i=1}^{p}x^{(i)} = 0.$$
Hence, we get
$$\bar{x} = \frac{1}{p}\sum_{i=1}^{p}x^{(i)},$$
i.e., $\bar{x}$ is just the average, or centroid, or center of gravity, of $x^{(1)}, \ldots, x^{(p)}$.
The Hessian of $f$ at $\bar{x}$ is
$$F(\bar{x}) = 2I_n,$$
which is positive definite. Hence, by the SOSC, $\bar{x}$ is a strict local minimizer of $f$ (in fact, it is a strict global
minimizer because $f$ is a convex quadratic function).
6.31
Fix any $x \in \Omega$. The vector $d = x - x^*$ is feasible at $x^*$ (by convexity of $\Omega$). By Taylor's formula, we have
$$f(x) = f(x^*) + d^\top \nabla f(x^*) + o(\|d\|) \ge f(x^*) + c\|d\| + o(\|d\|).$$
Therefore, for all $x$ sufficiently close to $x^*$, we have $f(x) > f(x^*)$. Hence, $x^*$ is a strict local minimizer.
6.32
Since $f \in C^2$, $F(x^*) = F^\top(x^*)$. Let $d \ne 0$ be a feasible direction at $x^*$. By Taylor's theorem,
$$f(x^* + d) - f(x^*) = d^\top \nabla f(x^*) + \frac{1}{2} d^\top F(x^*) d + o(\|d\|^2).$$
Using conditions a and b, we get
$$f(x^* + d) - f(x^*) \ge c\|d\|^2 + o(\|d\|^2).$$
Therefore, for all $d$ such that $\|d\|$ is sufficiently small,
$$f(x^* + d) > f(x^*),$$
and the proof is completed.
6.33
Necessity follows from the FONC. To prove sufficiency, we write $f$ as
$$f(x) = \frac{1}{2}(x - x^*)^\top Q(x - x^*) - \frac{1}{2}x^{*\top}Qx^*$$
where $x^* = Q^{-1}b$ is the unique vector satisfying the FONC. Clearly, since $\frac{1}{2}x^{*\top}Qx^*$ is a constant, and
$Q > 0$, then
$$f(x) \ge f(x^*) = -\frac{1}{2}x^{*\top}Qx^*,$$
and $f(x) = f(x^*)$ if and only if $x = x^*$.
6.34
Write $u = [u_1, \ldots, u_n]^\top$. We have
$$\begin{aligned}
x_n &= ax_{n-1} + bu_n \\
&= a(ax_{n-2} + bu_{n-1}) + bu_n \\
&= a^2x_{n-2} + abu_{n-1} + bu_n \\
&\ \ \vdots \\
&= a^nx_0 + a^{n-1}bu_1 + \cdots + abu_{n-1} + bu_n \\
&= c^\top u,
\end{aligned}$$
where $c = [a^{n-1}b, \ldots, ab, b]^\top$. Therefore, the problem can be written as
$$\text{minimize } \ ru^\top u - qc^\top u,$$
which is a positive definite quadratic in $u$. The solution is therefore
$$u = \frac{q}{2r}c,$$
or, equivalently, $u_i = qa^{n-i}b/(2r)$, $i = 1, \ldots, n$.
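The closed form in 6.34 can be compared against a direct simulation of the recursion $x_k = ax_{k-1} + bu_k$. The sketch below assumes $x_0 = 0$, which is what the last step $x_n = c^\top u$ of the derivation implies; the parameter values and the function names are hypothetical and chosen only for illustration.

```python
def optimal_controls(a, b, q, r, n):
    """Closed form from 6.34: u_i = q a^(n-i) b / (2r), i = 1..n."""
    return [q * a ** (n - i) * b / (2 * r) for i in range(1, n + 1)]

def final_state(a, b, u, x0=0.0):
    """Run x_k = a x_{k-1} + b u_k and return x_n (x0 = 0 assumed)."""
    x = x0
    for ui in u:
        x = a * x + b * ui
    return x

def cost(a, b, q, r, u):
    """Objective r u^T u - q x_n that the solution minimizes."""
    return r * sum(ui * ui for ui in u) - q * final_state(a, b, u)

# Hypothetical parameters.
a, b, q, r, n = 0.5, 2.0, 3.0, 1.0, 4
u_star = optimal_controls(a, b, q, r, n)
best_cost = cost(a, b, q, r, u_star)
```

Because the objective is a positive definite quadratic in $u$, any perturbation of `u_star` should give a strictly larger cost.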
7. One Dimensional Search Methods
7.1
The range reduction factor for 3 iterations of the Golden Section method is $((\sqrt{5} - 1)/2)^3 = 0.236$, while that of
the Fibonacci method (with $\varepsilon = 0$) is $1/F_{3+1} = 0.2$. Hence, if the desired range reduction factor is anywhere
between 0.2 and 0.236 (e.g., 0.21), then the Golden Section method requires at least 4 iterations, while the
Fibonacci method requires only 3. So, an example of a desired final uncertainty range is $0.21 \times (8 - 5) = 0.63$.
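The comparison in 7.1 can be reproduced by computing both reduction factors and counting steps. This is a minimal sketch with hypothetical function names; it uses the indexing $F_0 = F_1 = 1$, $F_2 = 2$, $F_3 = 3$, $F_4 = 5$, which is the one consistent with $1/F_{3+1} = 0.2$ above.

```python
import math

def golden_factor(n):
    """Range reduction factor after n Golden Section iterations."""
    return ((math.sqrt(5) - 1) / 2) ** n

def fibonacci_factor(n, eps=0.0):
    """Range reduction factor (1 + 2 eps) / F_{n+1} after n Fibonacci iterations."""
    f_prev, f_curr = 1, 1                 # F_0, F_1
    for _ in range(n):
        f_prev, f_curr = f_curr, f_prev + f_curr
    return (1 + 2 * eps) / f_curr         # f_curr is now F_{n+1}

def steps_needed(factor_fn, target):
    """Smallest n with factor_fn(n) <= target."""
    n = 1
    while factor_fn(n) > target:
        n += 1
    return n

golden_steps = steps_needed(golden_factor, 0.21)
fibonacci_steps = steps_needed(fibonacci_factor, 0.21)
```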
7.2
a. The plot of $f(x)$ versus $x$ is as below:
[Figure: plot of $f(x)$ versus $x$ for $x \in [1, 2]$; the vertical axis runs from about 2.3 to 3.2.]
b. The number of steps needed for the Golden Section method is computed from the inequality:
$$0.61803^N \le \frac{0.2}{2 - 1} \quad\Rightarrow\quad N \ge 3.34.$$
Therefore, the fewest possible number of steps is 4. Applying 4 steps of the Golden Section method, we end
up with an uncertainty interval of $[a_4, b_0] = [1.8541, 2.000]$. The table with the results of the intermediate
steps is displayed below:

Iteration k   a_k      b_k      f(a_k)   f(b_k)   New uncertainty interval
1             1.3820   1.6180   2.6607   2.4292   [1.3820, 2]
2             1.6180   1.7639   2.4292   2.3437   [1.6180, 2]
3             1.7639   1.8541   2.3437   2.3196   [1.7639, 2]
4             1.8541   1.9098   2.3196   2.3171   [1.8541, 2]

c. The number of steps needed for the Fibonacci method is computed from the inequality:
$$\frac{1 + 2\varepsilon}{F_{N+1}} \le \frac{0.2}{2 - 1} \quad\Rightarrow\quad N \ge 4.$$
Therefore, the fewest possible number of steps is 4. Applying 4 steps of the Fibonacci method, we end up
with an uncertainty interval of $[a_4, b_0] = [1.8750, 2.000]$. The table with the results of the intermediate steps
is displayed below:

Iteration k   rho_k    a_k      b_k      f(a_k)   f(b_k)   New unc. int.
1             0.3750   1.3750   1.6250   2.6688   2.4239   [1.3750, 2]
2             0.4      1.6250   1.7500   2.4239   2.3495   [1.6250, 2]
3             0.3333   1.7500   1.8750   2.3495   2.3175   [1.7500, 2]
4             0.45     1.8750   1.8875   2.3175   2.3169   [1.8750, 2]

d. We have $f'(x) = 2x - 4\sin x$, $f''(x) = 2 - 4\cos x$. Hence, Newton's algorithm takes the form:
$$x^{(k+1)} = x^{(k)} - \frac{x^{(k)} - 2\sin x^{(k)}}{1 - 2\cos x^{(k)}}.$$
Applying 4 iterations with $x^{(0)} = 1$, we get $x^{(1)} = -7.4727$, $x^{(2)} = 14.4785$, $x^{(3)} = 6.9351$, $x^{(4)} = 16.6354$.
Apparently, Newton's method is not effective in this case.
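The Golden Section table of part b can be reproduced with a short routine. The sketch below is illustrative only: it assumes $f(x) = x^2 + 4\cos x$, which is inferred from the derivative $f'(x) = 2x - 4\sin x$ in part d and from the tabulated function values rather than stated in this solution, and the function name `golden_section` is hypothetical.

```python
import math

def f(x):
    """Assumed objective, consistent with f'(x) = 2x - 4 sin x."""
    return x * x + 4 * math.cos(x)

def golden_section(func, left, right, n_steps):
    """Shrink [left, right] n_steps times; returns the final interval."""
    rho = (3 - math.sqrt(5)) / 2
    a = left + rho * (right - left)
    b = right - rho * (right - left)
    fa, fb = func(a), func(b)
    for _ in range(n_steps):
        if fa < fb:
            # Minimizer lies in [left, b]; old a becomes the new b.
            right, b, fb = b, a, fa
            a = left + rho * (right - left)
            fa = func(a)
        else:
            # Minimizer lies in [a, right]; old b becomes the new a.
            left, a, fa = a, b, fb
            b = right - rho * (right - left)
            fb = func(b)
    return left, right

final_left, final_right = golden_section(f, 1.0, 2.0, 4)
```

Each pass reuses one of the two interior points, so only one new function evaluation is needed per step, which is the point of the golden ratio choice of `rho`.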
7.3
a. We first create the M-file f.m as follows:
% f.m
function y=f(x)
y=8*exp(1-x)+7*log(x);
The MATLAB commands to plot the function are:
fplot('f',[1 2]);
xlabel('x');
ylabel('f(x)');
The resulting plot is as follows:
[Figure: plot of f(x) versus x for x in [1, 2]; the vertical axis runs from about 7.65 to 8.]
b. The MATLAB routine for the Golden Section method is:
%Matlab routine for Golden Section Search
left=1;
right=2;
uncert=0.23;
rho=(3-sqrt(5))/2;
N=ceil(log(uncert/(right-left))/log(1-rho)) %print N
lower='a';
a=left+(1-rho)*(right-left);
f_a=f(a);
for i=1:N,
    if lower=='a'
        b=a
        f_b=f_a
        a=left+rho*(right-left)
        f_a=f(a)
    else
        a=b
        f_a=f_b
        b=left+(1-rho)*(right-left)
        f_b=f(b)
    end %if
    if f_a<f_b
        right=b;
        lower='a'
    else
        left=a;
        lower='b'
    end %if
    New_Interval = [left,right]
end %for i
%------------------------------------------------------------------
Using the above routine, we obtain N = 4 and a final interval of [1.528, 1.674]. The table with the results
of the intermediate steps is displayed below:

Iteration k   a_k     b_k     f(a_k)   f(b_k)   New uncertainty interval
1             1.382   1.618   7.7247   7.6805   [1.382, 2]
2             1.618   1.764   7.6805   7.6995   [1.382, 1.764]
3             1.528   1.618   7.6860   7.6805   [1.528, 1.764]
4             1.618   1.674   7.6805   7.6838   [1.528, 1.674]

c. The MATLAB routine for the Fibonacci method is:
%Matlab routine for Fibonacci Search technique
left=1;
right=2;
uncert=0.23;
epsilon=0.05;
F(1)=1;
F(2)=1;
N=0;
while F(N+2) < (1+2*epsilon)*(right-left)/uncert
    F(N+3)=F(N+2)+F(N+1);
    N=N+1;
end %while
N %print N
lower='a';
a=left+(F(N+1)/F(N+2))*(right-left);
f_a=f(a);
for i=1:N,
    if i~=N
        rho=1-F(N+2-i)/F(N+3-i)
    else
        rho=0.5-epsilon
    end %if
    if lower=='a'
        b=a
        f_b=f_a
        a=left+rho*(right-left)
        f_a=f(a)
    else
        a=b
        f_a=f_b
        b=left+(1-rho)*(right-left)
        f_b=f(b)
    end %if
    if f_a<f_b
        right=b;
        lower='a'
    else
        left=a;
        lower='b'
    end %if
    New_Interval = [left,right]
end %for i
%------------------------------------------------------------------
Using the above routine, we obtain N = 3 and a final interval of [1.58, 1.8]. The table with the results of
the intermediate steps is displayed below:

Iteration k   rho_k   a_k    b_k   f(a_k)   f(b_k)   New uncertainty interval
1             0.4     1.4    1.6   7.7179   7.6805   [1.4, 2]
2             0.333   1.6    1.8   7.6805   7.7091   [1.4, 1.8]
3             0.45    1.58   1.6   7.6812   7.6805   [1.58, 1.8]

7.4
Now, $\rho_k = 1 - \dfrac{F_{N-k+1}}{F_{N-k+2}}$. Hence,
$$1 - \frac{\rho_k}{1 - \rho_k} = 1 - \frac{1 - F_{N-k+1}/F_{N-k+2}}{F_{N-k+1}/F_{N-k+2}} = 1 - \frac{F_{N-k+2} - F_{N-k+1}}{F_{N-k+1}} = 1 - \frac{F_{N-k}}{F_{N-k+1}} = \rho_{k+1}.$$
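To show that $0 \le \rho_k \le 1/2$, we proceed by induction. Clearly $\rho_1 = 1/2$ satisfies $0 \le \rho_1 \le 1/2$. Suppose
$0 \le \rho_k \le 1/2$, where $k \in \{1, \ldots, N - 1\}$. Then,
$$\frac{1}{2} \le 1 - \rho_k \le 1$$
and hence
$$1 \le \frac{1}{1 - \rho_k} \le 2.$$
Therefore,
$$\frac{1}{2} \le \frac{\rho_k}{1 - \rho_k} \le 1.$$
Since $\rho_{k+1} = 1 - \dfrac{\rho_k}{1 - \rho_k}$, then
$$0 \le \rho_{k+1} \le \frac{1}{2}$$
as required.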
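The recursion $\rho_{k+1} = 1 - \rho_k/(1 - \rho_k)$ and the bounds $0 \le \rho_k \le 1/2$ can be checked in exact arithmetic. The sketch below is illustrative: it assumes the indexing $F_0 = F_1 = 1$, uses Python's `fractions.Fraction` to avoid rounding, and the function name `fibonacci_rhos` is hypothetical.

```python
from fractions import Fraction

def fibonacci_rhos(N):
    """Return [rho_1, ..., rho_N] with rho_k = 1 - F_{N-k+1} / F_{N-k+2}."""
    F = [1, 1]                            # F_0, F_1
    while len(F) < N + 2:                 # extend up to F_{N+1}
        F.append(F[-1] + F[-2])
    return [1 - Fraction(F[N - k + 1], F[N - k + 2]) for k in range(1, N + 1)]

rhos = fibonacci_rhos(9)

# Check the recursion for k = 1, ..., N-1 (list index k-1).
recursion_holds = all(
    rhos[k + 1] == 1 - rhos[k] / (1 - rhos[k]) for k in range(len(rhos) - 1)
)
bounds_hold = all(0 <= rho <= Fraction(1, 2) for rho in rhos)
```

For $N = 9$ the first value is $\rho_1 = 1 - 55/89 = 34/89 \approx 0.3820$ and the last is $\rho_9 = 1/2$.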
7.5
We proceed by induction. For $k = 2$, we have $F_0F_3 - F_1F_2 = (1)(3) - (1)(2) = 1 = (-1)^2$. Suppose
$F_{k-2}F_{k+1} - F_{k-1}F_k = (-1)^k$. Then,
$$\begin{aligned}
F_{k-1}F_{k+2} - F_kF_{k+1} &= F_{k-1}(F_{k+1} + F_k) - (F_{k-1} + F_{k-2})F_{k+1} \\
&= F_{k-1}F_k - F_{k-2}F_{k+1} \\
&= -(-1)^k \\
&= (-1)^{k+1}.
\end{aligned}$$
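The identity of 7.5 is easy to spot-check for many values of $k$ at once. This is a minimal sketch, assuming the indexing $F_0 = F_1 = 1$ used in the base case above; the helper name `fib_book` is hypothetical.

```python
def fib_book(n_max):
    """Return [F_0, ..., F_{n_max}] with F_0 = F_1 = 1."""
    F = [1, 1]
    while len(F) <= n_max:
        F.append(F[-1] + F[-2])
    return F[: n_max + 1]

F = fib_book(30)

# F_{k-2} F_{k+1} - F_{k-1} F_k = (-1)^k for k = 2, ..., 29.
identity_holds = all(
    F[k - 2] * F[k + 1] - F[k - 1] * F[k] == (-1) ** k for k in range(2, 30)
)
```

Python integers are arbitrary precision, so the check is exact rather than approximate.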
7.6
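Define $y_k = F_k$ and $z_k = F_{k-1}$. Then, we have
$$\begin{bmatrix} y_{k+1} \\ z_{k+1} \end{bmatrix} = A\begin{bmatrix} y_k \\ z_k \end{bmatrix},$$
where
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix},$$
with initial condition
$$\begin{bmatrix} y_0 \\ z_0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
We can write
$$F_n = y_n = [1, 0]\begin{bmatrix} y_n \\ z_n \end{bmatrix} = [1, 0]A^n\begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
Since $A$ is symmetric, it can be diagonalized as
$$A = \begin{bmatrix} u^\top \\ v^\top \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} u & v \end{bmatrix}$$
where
$$\lambda_1 = \frac{1 + \sqrt{5}}{2}, \qquad \lambda_2 = \frac{1 - \sqrt{5}}{2},$$
and
$$u = -\frac{1}{5^{1/4}}\begin{bmatrix} \sqrt{2/(\sqrt{5} - 1)} \\ \sqrt{(\sqrt{5} - 1)/2} \end{bmatrix}, \qquad v = \frac{1}{5^{1/4}}\begin{bmatrix} \sqrt{(\sqrt{5} - 1)/2} \\ -\sqrt{2/(\sqrt{5} - 1)} \end{bmatrix}.$$
Therefore, we have
$$\begin{aligned}
F_n &= u^\top\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}^n u \\
&= u_1^2\lambda_1^n + u_2^2\lambda_2^n \\
&= \frac{1}{\sqrt{5}}\left(\left(\frac{1 + \sqrt{5}}{2}\right)^{n+1} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n+1}\right).
\end{aligned}$$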
7.7
The number $\log(2)$ is the root of the equation $g(x) = 0$, where $g(x) = \exp(x) - 2$. The derivative of $g$ is
$g'(x) = \exp(x)$. Newton's method applied to this root finding problem is
$$x^{(k+1)} = x^{(k)} - \frac{\exp(x^{(k)}) - 2}{\exp(x^{(k)})} = x^{(k)} - 1 + 2\exp(-x^{(k)}).$$
Performing two iterations, we get $x^{(1)} = 0.7358$ and $x^{(2)} = 0.6940$.
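The two iterates quoted in 7.7 can be reproduced with the update $x^{(k+1)} = x^{(k)} - 1 + 2\exp(-x^{(k)})$. The sketch below assumes the starting point $x^{(0)} = 1$, which is inferred from the value $x^{(1)} = 2/e \approx 0.7358$ and is not stated in this solution; the function name `newton_log2` is hypothetical.

```python
import math

def newton_log2(x0, n_iter):
    """Newton iterates for g(x) = exp(x) - 2, returned as a list."""
    xs = [x0]
    for _ in range(n_iter):
        x = xs[-1]
        xs.append(x - 1 + 2 * math.exp(-x))
    return xs

# Starting point x0 = 1 is an assumption inferred from the quoted iterates.
iterates = newton_log2(1.0, 5)
```

After five steps the iterate agrees with `math.log(2)` to many digits, reflecting the quadratic convergence of Newton's method near a simple root.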
7.8
a. We compute $g'(x) = 2e^x/(e^x + 1)^2$. Therefore Newton's method of tangents for this problem takes the
form
$$\begin{aligned}
x^{(k+1)} &= x^{(k)} - \frac{(e^{x^{(k)}} - 1)/(e^{x^{(k)}} + 1)}{2e^{x^{(k)}}/(e^{x^{(k)}} + 1)^2} \\
&= x^{(k)} - \frac{e^{2x^{(k)}} - 1}{2e^{x^{(k)}}} \\
&= x^{(k)} - \sinh x^{(k)}.
\end{aligned}$$
b. By symmetry, we need $x^{(1)} = -x^{(0)}$ for cycling. Therefore, $x^{(0)}$ must satisfy
$$-x^{(0)} = x^{(0)} - \sinh x^{(0)}.$$
The algorithm cycles if $x^{(0)} = c$, where $c > 0$ is the solution to $2c = \sinh c$.
c. The algorithm converges to 0 if and only if $|x^{(0)}| < c$, where $c$ is from part b.
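The cycling point $c$ of 7.8 has no closed form, but it can be located numerically. The sketch below is illustrative: it brackets the positive root of $\sinh c - 2c$ in $[2, 3]$ (the function is negative at 2 and positive at 3) and refines it by bisection; the function names are hypothetical.

```python
import math

def newton_step(x):
    """One step of the iteration from part a: x - sinh(x)."""
    return x - math.sinh(x)

def cycle_point(tol=1e-12):
    """Positive solution of 2c = sinh(c), found by bisection on [2, 3]."""
    lo, hi = 2.0, 3.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.sinh(mid) - 2 * mid > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

c = cycle_point()

# Starting strictly inside (-c, c), the iterates shrink toward 0.
x = 2.0
for _ in range(10):
    x = newton_step(x)
```

Starting exactly at `c` the iteration jumps to `-c` and back, which is the cycle of part b; starting at 2.0, inside the interval, it collapses to 0 as part c predicts.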
7.9
The quadratic function that matches the given data $x^{(k)}$, $x^{(k-1)}$, $x^{(k-2)}$, $f(x^{(k)})$, $f(x^{(k-1)})$, and $f(x^{(k-2)})$
can be computed by solving the following three linear equations for the parameters $a$, $b$, and $c$:
$$a(x^{(k-i)})^2 + bx^{(k-i)} + c = f(x^{(k-i)}), \qquad i = 0, 1, 2.$$
Then, the algorithm is given by $x^{(k+1)} = -b/(2a)$ (so, in fact, we only need to find the ratio of $a$ and $b$).
With some elementary algebra (e.g., using Cramer's rule without needing to calculate the determinant in
the denominator), the algorithm can be written as:
$$x^{(k+1)} = \frac{\sigma_{12}f(x^{(k)}) + \sigma_{20}f(x^{(k-1)}) + \sigma_{01}f(x^{(k-2)})}{2\left(\delta_{12}f(x^{(k)}) + \delta_{20}f(x^{(k-1)}) + \delta_{01}f(x^{(k-2)})\right)}$$
where $\sigma_{ij} = (x^{(k-i)})^2 - (x^{(k-j)})^2$ and $\delta_{ij} = x^{(k-i)} - x^{(k-j)}$.
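The quadratic-fit update of 7.9 is short enough to code directly from the formula. The sketch below is illustrative and the function name `quadratic_fit_step` is hypothetical; the test function $(x - 3)^2 + 1$ is an assumed example, chosen because a single step must then land exactly on its minimizer.

```python
def quadratic_fit_step(xs, fs):
    """One quadratic-fit step.

    xs = (x_k, x_{k-1}, x_{k-2}) and fs holds the matching function values.
    Returns the vertex -b/(2a) of the interpolating parabola.
    """
    def sigma(i, j):
        return xs[i] ** 2 - xs[j] ** 2

    def delta(i, j):
        return xs[i] - xs[j]

    num = sigma(1, 2) * fs[0] + sigma(2, 0) * fs[1] + sigma(0, 1) * fs[2]
    den = 2 * (delta(1, 2) * fs[0] + delta(2, 0) * fs[1] + delta(0, 1) * fs[2])
    if den == 0:
        raise ZeroDivisionError("the three points do not define a parabola")
    return num / den

def example(x):
    """Assumed test function with minimizer at x = 3."""
    return (x - 3.0) ** 2 + 1.0

points = (2.0, 1.0, 0.0)
x_next = quadratic_fit_step(points, tuple(example(x) for x in points))
```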
7.10
a. A MATLAB routine for implementing the secant method is as follows.
function [x,v] = secant(g,xcurr,xnew,uncert);
%Matlab routine for finding root of g(x) using secant method
%
% secant;
% secant(g);
% secant(g,xcurr,xnew);
% secant(g,xcurr,xnew,uncert);
%
% x=secant;
% x=secant(g);
% x=secant(g,xcurr,xnew);
% x=secant(g,xcurr,xnew,uncert);
%
% [x,v]=secant;
% [x,v]=secant(g);
% [x,v]=secant(g,xcurr,xnew);
% [x,v]=secant(g,xcurr,xnew,uncert);
%
%The first variant finds the root of g(x) in the M-file g.m, with
%initial conditions 0 and 1, and uncertainty 10^(-5).
%The second variant finds the root of the function in the M-file specified
%by the string g, with initial conditions 0 and 1, and uncertainty 10^(-5).
%The third variant finds the root of the function in the M-file specified
%by the string g, with initial conditions specified by xcurr and xnew, and
%uncertainty 10^(-5).
%The fourth variant finds the root of the function in the M-file specified
%by the string g, with initial conditions specified by xcurr and xnew, and
%uncertainty specified by uncert.
%
%The next four variants return the final value of the root as x.
%The last four variants return the final value of the root as x, and
%the value of the function at the final value as v.
if nargin < 4
    uncert=10^(-5);
    if nargin < 3
        if nargin == 1
            xcurr=0;
            xnew=1;
        elseif nargin == 0
            g='g';
            xcurr=0; %default initial conditions, as documented above
            xnew=1;
        else
            disp('Cannot have 2 arguments.');
            return;
        end
    end
end
g_curr=feval(g,xcurr);
%stop when the relative change falls below uncert
while abs(xnew-xcurr)>abs(xcurr)*uncert,
    xold=xcurr;
    xcurr=xnew;
    g_old=g_curr;
    g_curr=feval(g,xcurr);
    xnew=(g_curr*xold-g_old*xcurr)/(g_curr-g_old);
end %while
%print out solution and value of g(x)
if nargout >= 1
    x=xnew;
    if nargout == 2
        v=feval(g,xnew);
    end
else
    final_point=xnew
    value=feval(g,xnew)
end %if
%------------------------------------------------------------------
b. We get a solution of $x = 0.0039671$, with corresponding value $g(x) = -9.908 \times 10^{-8}$.
7.11
function alpha=linesearch_secant(grad,x,d)
%Line search using secant method
epsilon=10^(-4); %line search tolerance
max_iter = 100; %maximum number of iterations (avoids shadowing built-in max)
alpha_curr=0;
alpha=0.001;
dphi_zero=feval(grad,x)'*d;
dphi_curr=dphi_zero;
i=0;
while abs(dphi_curr)>epsilon*abs(dphi_zero),
    alpha_old=alpha_curr;
    alpha_curr=alpha;
    dphi_old=dphi_curr;
    dphi_curr=feval(grad,x+alpha_curr*d)'*d;
    alpha=(dphi_curr*alpha_old-dphi_old*alpha_curr)/(dphi_curr-dphi_old);
    i=i+1;
    if (i >= max_iter) && (abs(dphi_curr)>epsilon*abs(dphi_zero)),
        disp('Line search terminating with number of iterations:');
        disp(i);
        break;
    end
end %while
%------------------------------------------------------------------
7.12
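a. We could carry out the bracketing using the one-dimensional function $\phi_0(\alpha) = f(x^{(0)} + \alpha d^{(0)})$, where
$d^{(0)}$ is the negative gradient at $x^{(0)}$, as described in Section 7.8. The decision variable would be $\alpha$. However,
here we will directly represent the points in $\mathbb{R}^2$ (which is equivalent, though unnecessary in general).
The uncertainty interval is calculated by the following procedure:
$$f(x) = \frac{1}{2}x^\top\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}x, \qquad \nabla f(x) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}x.$$
Therefore,
$$d = -\nabla f(x^{(0)}) = -\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix} = \begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}.$$
As the problem requires, we use $\varepsilon = 0.075$.
First, we begin calculating $f(x^{(0)})$ and $x^{(1)}$:
$$f(x^{(0)}) = f\left(\begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix}\right) = 0.5025,$$
$$x^{(1)} = x^{(0)} + \varepsilon d = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix} + 0.075\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix} = \begin{bmatrix} 0.6987 \\ -0.2725 \end{bmatrix}.$$
Then, we proceed as follows to find the uncertainty interval:
$$f(x^{(1)}) = f\left(\begin{bmatrix} 0.6987 \\ -0.2725 \end{bmatrix}\right) = 0.3721,$$
$$x^{(2)} = x^{(0)} + 2\varepsilon d = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix} + 0.15\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix} = \begin{bmatrix} 0.5975 \\ -0.2950 \end{bmatrix}, \qquad f(x^{(2)}) = 0.2678,$$
$$x^{(3)} = x^{(0)} + 4\varepsilon d = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix} + 0.3\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix} = \begin{bmatrix} 0.3950 \\ -0.3400 \end{bmatrix}, \qquad f(x^{(3)}) = 0.1373,$$
$$x^{(4)} = x^{(0)} + 8\varepsilon d = \begin{bmatrix} 0.8 \\ -0.25 \end{bmatrix} + 0.6\begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix} = \begin{bmatrix} -0.0100 \\ -0.4300 \end{bmatrix}, \qquad f(x^{(4)}) = 0.1893.$$
Between $f(x^{(3)})$ and $f(x^{(4)})$ the function increases, which means that the minimizer must occur on the
interval
$$[x^{(2)}, x^{(4)}] = \left[\begin{bmatrix} 0.5975 \\ -0.2950 \end{bmatrix}, \begin{bmatrix} -0.0100 \\ -0.4300 \end{bmatrix}\right], \quad\text{with } d = \begin{bmatrix} -1.35 \\ -0.3 \end{bmatrix}.$$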
MATLAB code to solve the problem is listed next.
% Coded by David Schvartzman
% In our case we have:
Q = [2 1; 1 2];
x0 = [0.8; -.25];
e = 0.075;
f = zeros(1,10);
X = zeros(2,10);
x1 = x0;
d = -Q*x1;
X(:,1) = x1; % store the starting point as well
f(1) = 0.5*(x1')*Q*x1;
for i=2:10
    X(:,i) = x1+e*d;
    f(i) = 0.5*X(:,i)'*Q*X(:,i);
    e = 2*e; % double the step size
    if(f(i) > f(i-1))
        break;
    end
end
% The interval is defined by:
a = X(:,max(i-2,1));
b = X(:,i);
str = sprintf(['The minimizer is located in: [a, b], where a = [%.4f; %.4f] ' ...
    'and b = [%.4f; %.4f]'], a(1,1), a(2,1), b(1,1), b(2,1));
disp(str);
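For readers without MATLAB, the bracketing procedure of part a can be sketched in Python as follows. This is an illustration rather than a translation of the script above: it follows the same step-doubling scheme ($x^{(0)} + \varepsilon d$, $x^{(0)} + 2\varepsilon d$, $x^{(0)} + 4\varepsilon d$, and so on), and the function names `f` and `bracket` are hypothetical.

```python
def f(x):
    """f(x) = (1/2) x^T Q x with Q = [[2, 1], [1, 2]], expanded by hand."""
    return x[0] ** 2 + x[0] * x[1] + x[1] ** 2

def bracket(x0, d, eps, max_steps=50):
    """Double the step along d until f increases; return the bracketing points."""
    pts = [list(x0)]
    step = eps
    for _ in range(max_steps):
        pts.append([x0[0] + step * d[0], x0[1] + step * d[1]])
        if f(pts[-1]) > f(pts[-2]):
            # The minimizer lies between the point two back and the newest point.
            lower = pts[-3] if len(pts) >= 3 else pts[0]
            return lower, pts[-1]
        step *= 2
    raise RuntimeError("no increase found; the minimizer was not bracketed")

x0 = [0.8, -0.25]
d = [-1.35, -0.3]        # negative gradient at x0, from part a
a, b = bracket(x0, d, 0.075)
```

With these inputs the routine stops after the fourth trial point and returns $x^{(2)}$ and $x^{(4)}$ from the hand computation.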
b. First, we determine the number of necessary iterations.
The initial uncertainty interval width is 0.6223. This width will be $(0.6223)(0.618)^N$ after $N$ stages. We
choose $N$ so that
$$(0.618)^N \le \frac{0.01}{0.6223} \quad\Rightarrow\quad N = 9.$$
We show the first iteration of the algorithm; the rest are analogous and shown in the following table.
From part a, we know that $[a_0, b_0] = [x^{(2)}, x^{(4)}]$, then:
$$[a_0, b_0] = \left[\begin{bmatrix} 0.5975 \\ -0.2950 \end{bmatrix}, \begin{bmatrix} -0.0100 \\ -0.4300 \end{bmatrix}\right], \quad\text{with } f(a_0) = 0.2678,\ f(b_0) = 0.1893.$$
$$a_1 = a_0 + \rho(b_0 - a_0) = [0.3655,\ -0.3466]^\top$$
$$b_1 = a_0 + (1 - \rho)(b_0 - a_0) = [0.2220,\ -0.3784]^\top$$
$$f(a_1) = 0.1270, \qquad f(b_1) = 0.1085$$
We can see $f(a_1) > f(b_1)$, hence the uncertainty interval is reduced to:
$$[a_1, b_0] = \left[\begin{bmatrix} 0.3655 \\ -0.3466 \end{bmatrix}, \begin{bmatrix} -0.0100 \\ -0.4300 \end{bmatrix}\right].$$
So, calculating the norm of $(b_0 - a_1)$, we see that the uncertainty region width is now 0.38461.
In the table below, each point is written as [x1, x2].

Iteration   a_k                 b_k                 f(a_k)   f(b_k)   New uncertainty interval
1           [0.3655, -0.3466]   [0.2220, -0.3784]   0.1270   0.1085   [[0.3655, -0.3466], [-0.0100, -0.4300]]
2           [0.2220, -0.3784]   [0.1334, -0.3981]   0.1085   0.1232   [[0.3655, -0.3466], [0.1334, -0.3981]]
3           [0.2768, -0.3663]   [0.2220, -0.3784]   0.1094   0.1085   [[0.2768, -0.3663], [0.1334, -0.3981]]
4           [0.2220, -0.3784]   [0.1882, -0.3860]   0.1085   0.1117   [[0.2768, -0.3663], [0.1882, -0.3860]]
5           [0.2430, -0.3738]   [0.2220, -0.3784]   0.1079   0.1085   [[0.2768, -0.3663], [0.2220, -0.3784]]
6           [0.2559, -0.3709]   [0.2430, -0.3738]   0.1081   0.1079   [[0.2559, -0.3709], [0.2220, -0.3784]]
7           [0.2430, -0.3738]   [0.2350, -0.3756]   0.1079   0.1080   [[0.2559, -0.3709], [0.2350, -0.3756]]
8           [0.2479, -0.3727]   [0.2430, -0.3738]   0.1080   0.1079   [[0.2479, -0.3727], [0.2350, -0.3756]]
9           [0.2430, -0.3738]   [0.2399, -0.3745]   0.1079   0.1079   [[0.2479, -0.3727], [0.2399, -0.3745]]

We can now see that the minimizer is located within [[0.2479, -0.3727], [0.2399, -0.3745]], and its uncertainty
interval width is 0.00819.
MATLAB code used to perform the calculations is listed next.
% Coded by David Schvartzman
% To successfully run this program, run
% the previous script to obtain a and b.
e = 0.01;
Q = [2 1; 1 2];
ro = 0.5*(3-sqrt(5));
% First we determine the number of necessary iterations.
d = norm(a-b);
N = ceil( (log(e/d))/(log(1-ro)));
fa = 0.5*a'*Q*a;
fb = 0.5*b'*Q*b;
str1 = sprintf('Initial values:');
str2 = sprintf('a0 = [%.4f,%.4f].', a(1), a(2));
str3 = sprintf('b0 = [%.4f,%.4f].', b(1), b(2));
str4 = sprintf('f(a0) = %.4f.', fa);
str5 = sprintf('f(b0) = %.4f.', fb);
strn = sprintf('\n');
disp(strn);
disp(str1);disp(str2);
disp(str3);disp(str4);
disp(str5);
s = a + ro*(b-a);
t = a + (1-ro)*(b-a);
fs = 0.5*s'*Q*s;
ft = 0.5*t'*Q*t;
for i=1:N
    str1 = sprintf('Iteration number: %d', i);
    str2 = sprintf('a%d = [%.4f,%.4f].', i, s(1), s(2));
    str3 = sprintf('b%d = [%.4f,%.4f].', i, t(1), t(2));
    str4 = sprintf('f(a%d) = %.4f.', i, fs);
    str5 = sprintf('f(b%d) = %.4f.', i, ft);
    if (ft>fs)
        b = t;
        %fb = ft;
        t = s;
        ft = fs;
        s = a + ro*(b-a);
        fs = 0.5*s'*Q*s;
    else
        a = s;
        %fa = fs;
        s = t;
        fs = ft;
        t = a + (1-ro)*(b-a);
        ft = 0.5*t'*Q*t;
    end
    str6 = sprintf(['New uncertainty interval: a%d = [%.4f,%.4f], ' ...
        'b%d = [%.4f,%.4f].'], i, a(1), a(2), i, b(1), b(2));
    disp(strn);
    disp(str1)
    disp(str2)
    disp(str3)
    disp(str4)
    disp(str5)
    disp(str6)
end
% The interval where the minimizer is boxed in is given by:
an = a;
bn = b;
% We can return (an+bn)/2 as the minimizer.
xmin = (an+bn)/2; % named xmin to avoid shadowing the built-in min
disp(strn);
str = sprintf('The minimizer x* is: [%.4f; %.4f]', xmin(1,1), xmin(2,1));
disp(str);
c. We need to determine the number of necessary iterations.
The initial uncertainty interval width is 0.6223. This width will be $(0.6223)\dfrac{1 + 2\varepsilon}{F_{N+1}}$, where $F_k$ is the $k$-th
element of the Fibonacci sequence. We choose $N$ so that
$$\frac{1 + 2\varepsilon}{F_{N+1}} \le \frac{0.01}{0.6223} = 0.0161 \quad\Rightarrow\quad F_{N+1} \ge \frac{1 + 2\varepsilon}{0.0161}.$$
For $\varepsilon = 0.05$, we require $F_{N+1} > 68.32$, thus $F_{10} = 89$ is enough, and we have $N = 10 - 1 = 9$ iterations.
We show the first iteration of the algorithm; the rest are analogous and shown in the following table.
From part a, we know that $[a_0, b_0] = [x^{(2)}, x^{(4)}]$, then:
$$[a_0, b_0] = \left[\begin{bmatrix} 0.5975 \\ -0.2950 \end{bmatrix}, \begin{bmatrix} -0.0100 \\ -0.4300 \end{bmatrix}\right], \quad\text{with } f(a_0) = 0.2678,\ f(b_0) = 0.1893.$$
Recall that in the Fibonacci method, $\rho_1 = 1 - \dfrac{F_N}{F_{N+1}} = 1 - \dfrac{55}{89} = 0.3820$.
$$a_1 = a_0 + \rho_1(b_0 - a_0) = [0.3654,\ -0.3466]^\top$$
$$b_1 = a_0 + (1 - \rho_1)(b_0 - a_0) = [0.2221,\ -0.3784]^\top$$
$$f(a_1) = 0.1270, \qquad f(b_1) = 0.1085$$
We can see $f(a_1) > f(b_1)$, hence the uncertainty interval is reduced to:
$$[a_1, b_0] = \left[\begin{bmatrix} 0.3654 \\ -0.3466 \end{bmatrix}, \begin{bmatrix} -0.0100 \\ -0.4300 \end{bmatrix}\right].$$
So, calculating the norm of $(b_0 - a_1)$, we see that the uncertainty region width is now 0.38458.
k   ρk       ak                    bk                    f(ak)    f(bk)    New Uncertainty Interval
1   0.3820   [0.3654, −0.3466]     [0.2221, −0.3784]     0.1270   0.1085   [0.3654, −0.3466], [−0.0100, −0.4300]
2   0.3818   [0.2221, −0.3784]     [0.1333, −0.3981]     0.1085   0.1232   [0.3654, −0.3466], [0.1333, −0.3981]
3   0.3824   [0.2767, −0.3663]     [0.2221, −0.3784]     0.1094   0.1085   [0.2767, −0.3663], [0.1333, −0.3981]
4   0.3810   [0.2221, −0.3784]     [0.1879, −0.3860]     0.1085   0.1118   [0.2767, −0.3663], [0.1879, −0.3860]
5   0.3846   [0.2426, −0.3739]     [0.2221, −0.3784]     0.1079   0.1085   [0.2767, −0.3663], [0.2221, −0.3784]
6   0.3750   [0.2562, −0.3708]     [0.2426, −0.3739]     0.1082   0.1079   [0.2562, −0.3708], [0.2221, −0.3784]
7   0.4000   [0.2426, −0.3739]     [0.2357, −0.3754]     0.1079   0.1080   [0.2562, −0.3708], [0.2357, −0.3754]
8   0.3333   [0.2494, −0.3724]     [0.2426, −0.3739]     0.1080   0.1079   [0.2494, −0.3724], [0.2357, −0.3754]
9   0.4500   [0.2426, −0.3739]     [0.2419, −0.3740]     0.1079   0.1079   [0.2494, −0.3724], [0.2419, −0.3740]
(All vectors in the table are column vectors, written here as rows.)
We can now see that the minimizer is located within
$$\left[ \begin{bmatrix} 0.2494 \\ -0.3724 \end{bmatrix}, \begin{bmatrix} 0.2419 \\ -0.3740 \end{bmatrix} \right],$$
and the width of this uncertainty interval is 0.00769.
The MATLAB code used to perform the calculations is listed next.
% Coded by David Schvartzman
% To successfully run this program, run the first of the above scripts
% to obtain a and b.
% We take
e = 0.01;
Q = [2 1; 1 2];
% First determine the number of necessary iterations.
d = norm(a-b);
F_N_1 = (2*0.05+1)/(e/d);
F = zeros(1,22);
F(1) = 0;
F(2) = 1;
for i=1:20
    % Fibonacci recursion: each term is the sum of the previous two.
    F(i+2) = F(i)+F(i+1);
    if (F(i+2) >= F_N_1)
        break;
    end
end
N = i-1;
ro = zeros(1, N+1);
for i = 1:N
    ro(i) = 1 - F(N+3-i)/F(N+4-i);
end
ro(N) = ro(N) - 0.05;
fa = 0.5*a'*Q*a;
fb = 0.5*b'*Q*b;
str1 = sprintf('Initial values:');
str2 = sprintf('a0 = [%.4f,%.4f].', a(1), a(2));
str3 = sprintf('b0 = [%.4f,%.4f].', b(1), b(2));
str4 = sprintf('f(a0) = %.4f.', fa);
str5 = sprintf('f(b0) = %.4f.', fb);
strn = sprintf('\n');
disp(strn);
disp(str1);
disp(str2);
disp(str3);
disp(str4);
disp(str5);
s = a + ro(1)*(b-a);
t = a + (1-ro(1))*(b-a);
fs = 0.5*s'*Q*s;
ft = 0.5*t'*Q*t;
for i=1:N
    str1 = sprintf('Iteration number: %d', i);
    str2 = sprintf('a%d = [%.4f,%.4f].', i, s(1), s(2));
    str3 = sprintf('b%d = [%.4f,%.4f].', i, t(1), t(2));
    str4 = sprintf('f(a%d) = %.4f.', i, fs);
    str5 = sprintf('f(b%d) = %.4f.', i, ft);
    if (ft>fs)
        % The minimizer lies in [a, t]: shrink from the right.
        b = t;
        t = s;
        ft = fs;
        s = a + ro(i+1)*(b-a);
        fs = 0.5*s'*Q*s;
    else
        % The minimizer lies in [s, b]: shrink from the left.
        a = s;
        s = t;
        fs = ft;
        t = a + (1-ro(i+1))*(b-a);
        ft = 0.5*t'*Q*t;
    end
    str6 = sprintf(['New uncertainty interval: a%d = [%.4f,%.4f], ', ...
        'b%d = [%.4f,%.4f].'], i, a(1), a(2), i, b(1), b(2));
    str7 = sprintf('Uncertainty interval width: %.5f', norm(a-b));
    disp(strn);
    disp(str1)
    disp(str2)
    disp(str3)
    disp(str4)
    disp(str5)
    disp(str6)
    disp(str7)
end
% The minimizer is boxed in the interval:
an = a;
bn = b;
% We can return (an+bn)/2 as the minimizer.
x_min = (an+bn)/2;   % renamed from "min" to avoid shadowing the built-in function
disp(strn);
str = sprintf('The minimizer x* is: [%.4f; %.4f]', x_min(1,1), x_min(2,1));
disp(str);
8. Gradient Methods
8.1
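The function $f$ is a quadratic and so we can represent it in standard form as
$$f = \frac{1}{2} x^\top \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} x - x^\top \begin{bmatrix} -1 \\ -1/2 \end{bmatrix} + 3 = \frac{1}{2} x^\top Q x - x^\top b + c.$$
The first iteration is
$$x^{(1)} = x^{(0)} - \alpha_0 \nabla f(x^{(0)}).$$
To find $x^{(1)}$, we need to compute $\nabla f(x^{(0)}) = g^{(0)}$. We have
$$g^{(0)} = Qx^{(0)} - b = [1, 1/2]^\top.$$
The step size, $\alpha_0$, can be computed as
$$\alpha_0 = \frac{g^{(0)\top} g^{(0)}}{g^{(0)\top} Q g^{(0)}} = \frac{5}{6}.$$
Hence,
$$x^{(1)} = -\alpha_0 g^{(0)} = -\frac{5}{6} \begin{bmatrix} 1 \\ 1/2 \end{bmatrix} = -\begin{bmatrix} 5/6 \\ 5/12 \end{bmatrix}.$$
The second iteration is
$$x^{(2)} = x^{(1)} - \alpha_1 \nabla f(x^{(1)}),$$
where
$$\nabla f(x^{(1)}) = g^{(1)} = Qx^{(1)} - b = \begin{bmatrix} 1/6 \\ -1/3 \end{bmatrix}$$
and
$$\alpha_1 = \frac{g^{(1)\top} g^{(1)}}{g^{(1)\top} Q g^{(1)}} = \frac{5}{9}.$$
Hence,
$$x^{(2)} = x^{(1)} - \alpha_1 g^{(1)} = \begin{bmatrix} -5/6 \\ -5/12 \end{bmatrix} - \frac{5}{9} \begin{bmatrix} 1/6 \\ -1/3 \end{bmatrix} = \begin{bmatrix} -25/27 \\ -25/108 \end{bmatrix}.$$
The optimal solution is $x^* = [-1, -1/4]^\top$, obtained by solving the equation $Qx = b$.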
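The two iterations above can be reproduced numerically. The following is a minimal sketch, assuming the starting point $x^{(0)} = 0$ (implied by $g^{(0)} = -b$ above); the variable names are illustrative.

```matlab
Q = [1 0; 0 2];
b = [-1; -1/2];
x = [0; 0];                       % x^(0), implied by g^(0) = -b above
for k = 1:2
    g = Q*x - b;                  % gradient of the quadratic at x
    alpha = (g'*g) / (g'*Q*g);    % exact line-search step size
    x = x - alpha*g;              % steepest descent update
    fprintf('k = %d: alpha = %.4f, x = [%.4f; %.4f]\n', k, alpha, x(1), x(2));
end
x_star = Q \ b;                   % minimizer, solves Q*x = b
fprintf('x* = [%.4f; %.4f]\n', x_star(1), x_star(2));
```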
8.2
Let $s$ be the order of convergence of $\{x^{(k)}\}$. Suppose there exists $c > 0$ such that for all $k$ sufficiently large,
$$\|x^{(k+1)} - x^*\| \ge c\,\|x^{(k)} - x^*\|^p.$$
Hence, for all $k$ sufficiently large,
$$\frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^s} = \frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^p} \cdot \frac{1}{\|x^{(k)} - x^*\|^{s-p}} \ge \frac{c}{\|x^{(k)} - x^*\|^{s-p}}.$$
Taking limits yields
$$\lim_{k\to\infty} \frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^s} \ge \frac{c}{\lim_{k\to\infty} \|x^{(k)} - x^*\|^{s-p}}.$$
Since by definition $s$ is the order of convergence,
$$\lim_{k\to\infty} \frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^s} < \infty.$$
Combining the above two inequalities, we get
$$\frac{c}{\lim_{k\to\infty} \|x^{(k)} - x^*\|^{s-p}} < \infty.$$
Therefore, since $\lim_{k\to\infty} \|x^{(k)} - x^*\| = 0$, we conclude that $s \le p$, i.e., the order of convergence is at most $p$.
8.3
We use contradiction. Suppose $x^{(k)} \to x^*$ and
$$\lim_{k\to\infty} \frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^p} > 0$$
for some $p < 1$. We may assume that $x^{(k)} \ne x^*$ for an infinite number of $k$ (for otherwise, by convention, the ratio above is eventually 0). Because the limit is positive, we can fix $\varepsilon > 0$ smaller than the limit. Then, there exists $K_1$ such that for all $k \ge K_1$,
$$\frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^p} > \varepsilon.$$
Dividing both sides by $\|x^{(k)} - x^*\|^{1-p}$, we obtain
$$\frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|} > \frac{\varepsilon}{\|x^{(k)} - x^*\|^{1-p}}.$$
Because $x^{(k)} \to x^*$ and $p < 1$, we have $\|x^{(k)} - x^*\|^{1-p} \to 0$. Hence, there exists $K_2$ such that for all $k \ge K_2$, $\|x^{(k)} - x^*\|^{1-p} < \varepsilon$. Combining this inequality with the previous one yields
$$\frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|} > 1$$
for all $k \ge \max(K_1, K_2)$; i.e.,
$$\|x^{(k+1)} - x^*\| > \|x^{(k)} - x^*\|,$$
which contradicts the assumption that $x^{(k)} \to x^*$.
8.4
a. The sequence converges to 0, because the exponent $-2^{k^2}$ grows unboundedly negative as $k \to \infty$.
b. The order of convergence of $\{x^{(k)}\}$ is $\infty$. To see this, we first write, for $p \ge 1$,
$$\frac{|x^{(k+1)}|}{|x^{(k)}|^p} = \frac{2^{-2^{(k+1)^2}}}{\left(2^{-2^{k^2}}\right)^p} = 2^{-2^{k^2+2k+1} + p\,2^{k^2}} = 2^{-2^{k^2}\left(2^{2k+1} - p\right)}.$$
But notice that the exponent $-2^{k^2}(2^{2k+1} - p)$ grows unboundedly negative as $k \to \infty$, regardless of the value of $p$. Therefore, for any $p$,
$$\lim_{k\to\infty} \frac{|x^{(k+1)}|}{|x^{(k)}|^p} = 0,$$
which means that the order of convergence is $\infty$.
8.5
a. We have
x(k)
= ax(k−1)
= a · ax(k−2)
= a2 x(k−2)
..
.
= ak x(0) .
Because 0 < a < 1, we have ak → 0, and hence x(k) → 0.
b. Similarly, we have
$$y^{(k)} = (y^{(k-1)})^b = ((y^{(k-2)})^b)^b = (y^{(k-2)})^{b^2} = \cdots = (y^{(0)})^{b^k}.$$
Because $|y^{(0)}| < 1$ and $b > 1$, we have $b^k \to \infty$ and hence $y^{(k)} \to 0$.
c. The order of convergence of $\{x^{(k)}\}$ is 1 because
$$\lim_{k\to\infty} \frac{|x^{(k+1)}|}{|x^{(k)}|} = \lim_{k\to\infty} a = a,$$
and $0 < a < \infty$.
The order of convergence of $\{y^{(k)}\}$ is $b$ because
$$\lim_{k\to\infty} \frac{|y^{(k+1)}|}{|y^{(k)}|^b} = \lim_{k\to\infty} 1 = 1,$$
and $0 < 1 < \infty$.
d. Suppose $|x^{(k)}| \le c|x^{(0)}|$. Using part a, we have $a^k \le c$, which implies that $k \ge \log(c)/\log(a)$. So the smallest number of iterations $k$ such that $|x^{(k)}| \le c|x^{(0)}|$ is $\lceil \log(c)/\log(a) \rceil$ (the smallest integer not smaller than $\log(c)/\log(a)$).
e. Suppose $|y^{(k)}| \le c|y^{(0)}|$. Using part b, we have $|y^{(0)}|^{b^k} \le c|y^{(0)}|$. Taking logs (twice) and rearranging, we have
$$k \ge \frac{1}{\log(b)} \log\left(1 + \frac{\log(c)}{\log(|y^{(0)}|)}\right).$$
Denote the right-hand side by $z$. So the smallest number of iterations $k$ such that $|y^{(k)}| \le c|y^{(0)}|$ is $\lceil z \rceil$.
f. Comparing the answer in part e with that of part d, we can see that as c → 0, the answer in part d is
Ω(log(c)), whereas the answer in part e is O(log log(c)). Hence, in the regime where c is very small, the
number of iterations in part d (linear convergence) is (at least) exponentially larger than that in part e
(superlinear convergence).
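To make the comparison in part f concrete, the sketch below evaluates the two iteration counts from parts d and e and checks them by running both recursions. The values of a, b, y0, and c are illustrative assumptions, not data from the exercise.

```matlab
% Illustrative values (assumptions, not taken from the exercise):
a  = 0.5;      % linear rate, 0 < a < 1
b  = 2;        % order of the second sequence, b > 1
y0 = 0.5;      % initial value with |y0| < 1
c  = 1e-12;    % target reduction factor

% Closed-form iteration counts from parts d and e.
k_lin = ceil(log(c) / log(a));
k_sup = ceil(log(1 + log(c) / log(abs(y0))) / log(b));

% Brute-force check: run x <- a*x (with x0 = 1) and y <- y^b.
x = 1; k_lin_sim = 0;
while abs(x) > c
    x = a * x;
    k_lin_sim = k_lin_sim + 1;
end
y = y0; k_sup_sim = 0;
while abs(y) > c * abs(y0)
    y = y ^ b;
    k_sup_sim = k_sup_sim + 1;
end
fprintf('linear:      %d (formula), %d (simulated)\n', k_lin, k_lin_sim);
fprintf('superlinear: %d (formula), %d (simulated)\n', k_sup, k_sup_sim);
```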
8.6
We have $u_{k+1} = (1 - \rho)u_k$, and $u_k \to 0$. Therefore,
$$\lim_{k\to\infty} \frac{|u_{k+1}|}{|u_k|} = 1 - \rho > 0$$
and thus the order of convergence is 1.
8.7
a. The value of $x^*$ (in terms of $a$, $b$, and $c$) that minimizes $f$ is $x^* = b/a$.
b. We have $f'(x) = ax - b$. Therefore, the recursive equation for the DDS algorithm is
$$x^{(k+1)} = x^{(k)} - \alpha(ax^{(k)} - b) = (1 - \alpha a)x^{(k)} + \alpha b.$$
c. Let $\tilde{x} = \lim_{k\to\infty} x^{(k)}$. Taking limits of both sides of $x^{(k+1)} = x^{(k)} - \alpha(ax^{(k)} - b)$ (from part b), we get
$$\tilde{x} = \tilde{x} - \alpha(a\tilde{x} - b).$$
Hence, we get $\tilde{x} = b/a = x^*$.
d. To find the order of convergence, we compute
$$\frac{|x^{(k+1)} - b/a|}{|x^{(k)} - b/a|^p} = \frac{|(1 - \alpha a)x^{(k)} + \alpha b - b/a|}{|x^{(k)} - b/a|^p} = \frac{|(1 - \alpha a)x^{(k)} - (1 - \alpha a)b/a|}{|x^{(k)} - b/a|^p} = |1 - \alpha a|\,|x^{(k)} - b/a|^{1-p}.$$
Let $z^{(k)} = |1 - \alpha a|\,|x^{(k)} - b/a|^{1-p}$. Note that $z^{(k)}$ converges to a finite nonzero number if and only if $p = 1$ (if $p < 1$, then $z^{(k)} \to 0$, and if $p > 1$, then $z^{(k)} \to \infty$). Therefore, the order of convergence of $\{x^{(k)}\}$ is 1.
e. Let $y^{(k)} = |x^{(k)} - b/a|$. From part d, after some manipulation we obtain
$$y^{(k+1)} = |1 - \alpha a|\,y^{(k)} = |1 - \alpha a|^{k+1} y^{(0)}.$$
The sequence $\{x^{(k)}\}$ converges (to $b/a$) if and only if $y^{(k)} \to 0$. This holds if and only if $|1 - \alpha a| < 1$, which is equivalent to $0 < \alpha < 2/a$.
8.8
We rewrite $f$ as $f(x) = \frac{1}{2} x^\top Q x - b^\top x$, where
$$Q = \begin{bmatrix} 6 & 4 \\ 4 & 6 \end{bmatrix}.$$
The characteristic polynomial of $Q$ is $\lambda^2 - 12\lambda + 20$. Hence, the eigenvalues of $Q$ are 2 and 10. Therefore, the largest range of values of $\alpha$ for which the algorithm is globally convergent is $0 < \alpha < 2/10$.
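The bound can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it tracks the error $e^{(k)} = x^{(k)} - x^*$, which obeys $e^{(k+1)} = (I - \alpha Q)e^{(k)}$ regardless of $b$, and the two trial step sizes (0.19 and 0.21) and the initial error are chosen here for demonstration.

```matlab
Q = [6 4; 4 6];
lambda = eig(Q);                 % eigenvalues of Q (2 and 10)
alpha_max = 2 / max(lambda);     % upper end of the step-size range
alphas = [0.19, 0.21];           % one step size inside the range, one outside
err = zeros(size(alphas));
for j = 1:numel(alphas)
    e = [1; 1];                  % initial error along the eigenvector for lambda = 10
    for k = 1:200
        e = e - alphas(j) * (Q*e);   % error recursion e <- (I - alpha*Q) e
    end
    err(j) = norm(e);
end
fprintf('alpha_max = %.2f\n', alpha_max);
fprintf('alpha = %.2f: final error norm %.3e\n', alphas(1), err(1));
fprintf('alpha = %.2f: final error norm %.3e\n', alphas(2), err(2));
```

With the step size inside the range the error norm shrinks toward zero, while just outside the range it grows without bound along the eigenvector of the largest eigenvalue.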
8.9
a. We can write $h(x) = Qx - b$, where $b = [-4, -1]^\top$ and
$$Q = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}$$
is positive definite. Hence, the solution is
$$Q^{-1}b = \frac{1}{5} \begin{bmatrix} 3 & -2 \\ -2 & 3 \end{bmatrix} \begin{bmatrix} -4 \\ -1 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}.$$
b. By part a, the algorithm is a fixed-step-size gradient algorithm for a problem with gradient $h$. The eigenvalues of $Q$ are 1 and 5. Hence, the largest range of values of $\alpha$ such that the algorithm is globally convergent to the solution is $0 < \alpha < 2/5$.
c. The eigenvectors of $Q$ corresponding to eigenvalue 5 have the form $c[1, 1]^\top$, where $c \in \mathbb{R}$. Hence, to violate the descent property, we pick
$$x^{(0)} = Q^{-1}b + c \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -3 \\ 0 \end{bmatrix},$$
where we choose $c = -1$ so that $x^{(0)}$ has the specified form.
8.10
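a. We have
$$f(x) = \frac{1}{2} x^\top \begin{bmatrix} 3 & 1+a \\ 1+a & 3 \end{bmatrix} x - x^\top \begin{bmatrix} 1 \\ 1 \end{bmatrix} + b.$$
b. The unique global minimizer exists if and only if the Hessian is positive definite, which holds if and only if $(1 + a)^2 < 9$ (by Sylvester's criterion). Hence, the largest set of values of $a$ and $b$ such that the global minimizer of $f$ exists is given by $-4 < a < 2$ and $b \in \mathbb{R}$ (unrestricted).
The minimizer is given by
$$x^* = \frac{1}{9 - (1+a)^2} \begin{bmatrix} 3 & -(1+a) \\ -(1+a) & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{3 - (1+a)}{9 - (1+a)^2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{4+a} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$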
c. The algorithm is a gradient algorithm with fixed step size α = 2/5. The eigenvalues of the Hessian are
(after some calculations) 4 + a and 2 − a. For global convergence, we need 2/5 < 2/λmax , or λmax < 5,
where λmax = max(4 + a, 2 − a). From this we deduce that −3 < a < 1. Hence, the largest set of values of
a and b such that the algorithm is globally convergent is given by −3 < a < 1 and b ∈ R (unrestricted).
8.11
a. We have
f (x(k+1) )
=
(x(k+1) − c)2 /2
=
(x(k) − αk (x(k) − c) − c)2 /2
=
(1 − αk )2 (x(k) − c)2 /2
=
(1 − αk )2 f (x(k) ).
b. We have $x^{(k)} \to c$ if and only if $f(x^{(k)}) \to 0$. Hence, the algorithm is globally convergent if and only if $f(x^{(k)}) \to 0$ for any $x^{(0)}$. From part a, we deduce that $f(x^{(k)}) \to 0$ for any $x^{(0)}$ if and only if $\prod_{k=0}^{\infty} (1 - \alpha_k)^2 = 0$. Because $0 < \alpha_k < 1$, this condition is equivalent to $\prod_{k=0}^{\infty} (1 - \alpha_k) = 0$, which holds if and only if
$$\sum_{k=0}^{\infty} \alpha_k = \infty.$$
8.12
The only local minimizer of $f$ is $x^* = 1/\sqrt{3}$. Indeed, we have $f'(x^*) = 0$ and $f''(x^*) = 2\sqrt{3}$. To find the largest range of values of $\alpha$ such that the algorithm is locally convergent, we use a "linearization" argument: The algorithm is locally convergent if and only if the "linearized" algorithm $x^{(k+1)} = x^{(k)} - \alpha f''(x^*)(x^{(k)} - x^*)$ is globally convergent. But the linearized algorithm is just a fixed step size algorithm applied to a quadratic with second derivative $f''(x^*)$. Therefore, the largest range of values of $\alpha$ such that the algorithm is locally convergent is $0 < \alpha < 2/f''(x^*) = 1/\sqrt{3}$.
8.13
We use the formula from Lemma 8.1:
$$f(x^{(k+1)}) = (1 - \gamma_k) f(x^{(k)})$$
(we have $V = f$ in this case). Using the expression for $\gamma_k$, we get, assuming $x^{(k)} \ne 1$,
$$\gamma_k = 4\alpha\, 2^{-k} (1 - \alpha 2^{-k}).$$
Hence, $\gamma_k > 0$, which means that $f(x^{(k+1)}) < f(x^{(k)})$ if $x^{(k)} \ne 1$ for $k \ge 0$. This implies that the algorithm has the descent property (for $k \ge 0$).
We also note that
$$\sum_{k=0}^{\infty} \gamma_k = 4\alpha \left( \sum_{k=0}^{\infty} 2^{-k} - \alpha \sum_{k=0}^{\infty} 4^{-k} \right) = 4\alpha \left( 2 - \frac{4\alpha}{3} \right) < \infty.$$
Since γk > 0 for all k ≥ 0, we can apply the theorem given in class to deduce that the algorithm is not
globally convergent.
8.14
We have
$$|x^{(k+1)} - x^*| = |x^{(k)} - x^* - f'(x^{(k)})/f''(x^*)|.$$
By Taylor's Theorem,
$$f'(x^{(k)}) = f'(x^*) + f''(x^*)(x^{(k)} - x^*) + O(|x^{(k)} - x^*|^2).$$
Since $f'(x^*) = 0$ by the FONC, we get
$$x^{(k)} - x^* - f'(x^{(k)})/f''(x^*) = O(|x^{(k)} - x^*|^2).$$
Combining the above with the first equation, we get
$$|x^{(k+1)} - x^*| = O(|x^{(k)} - x^*|^2),$$
which implies that the order of convergence is at least 2.
8.15
a. The objective function is a quadratic that can be written as
$$f(x) = (ax - b)^\top (ax - b) = \|a\|^2 x^2 - 2a^\top b\,x + \|b\|^2.$$
Hence, the minimizer is $x^* = a^\top b/\|a\|^2$.
b. Note that $f''(x) = 2\|a\|^2$. Thus, by the result for fixed step size gradient algorithms, the required largest range for $\alpha$ is $(0, 1/\|a\|^2)$.
8.16
a. We have
$$f(x) = \|Ax - b\|^2 = (Ax - b)^\top (Ax - b) = (x^\top A^\top - b^\top)(Ax - b) = x^\top (A^\top A)x - 2(A^\top b)^\top x + b^\top b,$$
which is a quadratic function. The gradient is given by $\nabla f(x) = 2(A^\top A)x - 2(A^\top b)$ and the Hessian is given by $F(x) = 2(A^\top A)$.
b. The fixed step size gradient algorithm for solving the above optimization problem is given by
$$x^{(k+1)} = x^{(k)} - \alpha(2(A^\top A)x^{(k)} - 2A^\top b) = x^{(k)} - 2\alpha A^\top (Ax^{(k)} - b).$$
c. The largest range of values for $\alpha$ such that the algorithm in part b converges to the solution of the problem is given by
$$0 < \alpha < \frac{2}{\lambda_{\max}(2A^\top A)} = \frac{1}{4}.$$
8.17
a. We use contraposition. Suppose an eigenvalue of $A$ is negative: $Av = \lambda v$, where $\lambda < 0$ and $v$ is a corresponding eigenvector. Choose $x^{(0)} = v + x^*$. Then,
$$x^{(1)} = v + x^* - \alpha(Av + Ax^* + b) = v + x^* - \alpha\lambda v,$$
and hence
$$x^{(1)} - x^* = (1 - \alpha\lambda)(x^{(0)} - x^*).$$
Since $1 - \alpha\lambda > 1$, we conclude that the algorithm is not globally monotone.
b. Note that the algorithm is identical to a fixed step size gradient algorithm applied to a quadratic with
Hessian A. The eigenvalues of A are 1 and 5. Therefore, the largest range of values of α for which the
algorithm is globally convergent is 0 < α < 2/5.
8.18
The steepest descent algorithm applied to the quadratic function $f$ has the form
$$x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} = x^{(k)} - \frac{g^{(k)\top} g^{(k)}}{g^{(k)\top} Q g^{(k)}}\, g^{(k)}.$$
⇒: If $x^{(1)} = Q^{-1}b$, then
$$Q^{-1}b = x^{(0)} - \alpha_0 g^{(0)}.$$
Rearranging the above yields
$$Qx^{(0)} - b = \alpha_0 Q g^{(0)}.$$
Since $g^{(0)} = Qx^{(0)} - b \ne 0$, we have
$$Qg^{(0)} = \frac{1}{\alpha_0} g^{(0)},$$
which means that $g^{(0)}$ is an eigenvector of $Q$ (with corresponding eigenvalue $1/\alpha_0$).
⇐: By assumption, $Qg^{(0)} = \lambda g^{(0)}$, where $\lambda \in \mathbb{R}$. We want to show that $Qx^{(1)} = b$. We have
$$Qx^{(1)} = Q\left( x^{(0)} - \frac{g^{(0)\top} g^{(0)}}{g^{(0)\top} Q g^{(0)}}\, g^{(0)} \right) = Qx^{(0)} - \frac{1}{\lambda} \frac{g^{(0)\top} g^{(0)}}{g^{(0)\top} g^{(0)}}\, Qg^{(0)} = Qx^{(0)} - g^{(0)} = b.$$
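The ⇐ direction can be checked on a small example. This is a sketch under stated assumptions: the matrix Q, the vector b, and the starting point are illustrative choices made here so that $g^{(0)}$ is an eigenvector of $Q$; they are not taken from the exercise.

```matlab
Q = [3 2; 2 3];            % illustrative symmetric positive definite matrix
b = [1; 0];                % illustrative right-hand side
x_star = Q \ b;            % minimizer Q^{-1} b
v  = [1; 1];               % eigenvector of Q (eigenvalue 5)
x0 = x_star + v;           % then g0 = Q*x0 - b = Q*v = 5*v
g0 = Q*x0 - b;
alpha0 = (g0'*g0) / (g0'*Q*g0);   % steepest descent step size
x1 = x0 - alpha0*g0;              % one steepest descent step
fprintf('alpha0 = %.4f, ||x1 - x*|| = %.2e\n', alpha0, norm(x1 - x_star));
```

Here the step size equals the reciprocal of the eigenvalue, as in the proof, and the first iterate coincides with $Q^{-1}b$ up to rounding.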
8.19
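a. Possible. Pick $f$ such that $\lambda_{\max} \ge 2\lambda_{\min}$ and $x^{(0)}$ such that $g^{(0)}$ is an eigenvector of $Q$ with eigenvalue $\lambda_{\min}$. Then,
$$\alpha_0 = \frac{g^{(0)\top} g^{(0)}}{g^{(0)\top} Q g^{(0)}} = \frac{1}{\lambda_{\min}} \ge \frac{2}{\lambda_{\max}}.$$
b. Not possible. Indeed, using Rayleigh's inequality,
$$\alpha_0 = \frac{g^{(0)\top} g^{(0)}}{g^{(0)\top} Q g^{(0)}} \le \frac{1}{\lambda_{\min}}.$$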
8.20
a. We rewrite $f$ as
$$f(x) = \frac{1}{2} x^\top Q x - b^\top x - 22,$$
where
$$Q = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} -3 \\ 1 \end{bmatrix}.$$
The eigenvalues of $Q$ are 1 and 5. Therefore, the range of values of the step size for which the algorithm converges to the minimizer is $0 < \alpha < 2/5$.
b. An eigenvector of $Q$ corresponding to the eigenvalue 5 is $v = [1, 1]^\top/5$. We have $x^* = Q^{-1}b = [-11, 9]^\top/5$. Hence, an initial condition that results in the algorithm diverging is
$$x^{(0)} = x^* + v = \begin{bmatrix} -2 \\ 2 \end{bmatrix}.$$
8.21
In both cases, we compute the Hessian Q of f , and find its largest eigenvalue λmax . Then the range we seek
is 0 < α < 2/λmax .
a. In this case,
$$Q = \begin{bmatrix} 6 & 4 \\ 4 & 6 \end{bmatrix},$$
with eigenvalues 2 and 10. Hence, the answer is $0 < \alpha < 1/5$.
b. In this case, again we have
$$Q = \begin{bmatrix} 6 & 4 \\ 4 & 6 \end{bmatrix},$$
with eigenvalues 2 and 10. Hence, the answer is $0 < \alpha < 1/5$.
8.22
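For the given algorithm we have
$$\gamma_k = \beta(2 - \beta) \frac{(g^{(k)\top} g^{(k)})^2}{(g^{(k)\top} Q g^{(k)})(g^{(k)\top} Q^{-1} g^{(k)})}.$$
If $0 < \beta < 2$, then $\beta(2 - \beta) > 0$, and by Lemma 8.2,
$$\gamma_k \ge \beta(2 - \beta) \frac{\lambda_{\min}(Q)}{\lambda_{\max}(Q)} > 0,$$
which implies that $\sum_{k=0}^{\infty} \gamma_k = \infty$. Hence, by Theorem 8.1, $x^{(k)} \to x^*$ for any $x^{(0)}$.
If $\beta \le 0$ or $\beta \ge 2$, then $\beta(2 - \beta) \le 0$, and by Lemma 8.2 (multiplying the lower bound of the ratio by a nonpositive factor reverses the inequality),
$$\gamma_k \le \beta(2 - \beta) \frac{\lambda_{\min}(Q)}{\lambda_{\max}(Q)} \le 0.$$
By Lemma 8.1, $V(x^{(k)}) \ge V(x^{(0)})$. Hence, if $x^{(0)} \ne x^*$, then $\{V(x^{(k)})\}$ does not converge to 0, and consequently $x^{(k)}$ does not converge to $x^*$.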
8.23
By Lemma 8.1, $V(x^{(k+1)}) = (1 - \gamma_k)V(x^{(k)})$ for all $k$. Note that the algorithm has a descent property if and only if $V(x^{(k+1)}) < V(x^{(k)})$ whenever $g^{(k)} \ne 0$. Clearly, whenever $g^{(k)} \ne 0$, $V(x^{(k+1)}) < V(x^{(k)})$ if and only if $1 - \gamma_k < 1$. The desired result follows immediately.
8.24
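We have
$$x^{(k+1)} - x^{(k)} = \alpha_k d^{(k)}$$
and hence
$$\langle x^{(k+1)} - x^{(k)}, \nabla f(x^{(k+1)}) \rangle = \alpha_k \langle d^{(k)}, \nabla f(x^{(k+1)}) \rangle.$$
Now, let $\phi_k(\alpha) = f(x^{(k)} + \alpha d^{(k)})$. Since $\alpha_k$ minimizes $\phi_k$, then by the FONC, $\phi_k'(\alpha_k) = 0$. By the chain rule, $\phi_k'(\alpha) = d^{(k)\top} \nabla f(x^{(k)} + \alpha d^{(k)})$. Hence,
$$0 = \phi_k'(\alpha_k) = d^{(k)\top} \nabla f(x^{(k)} + \alpha_k d^{(k)}) = \langle d^{(k)}, \nabla f(x^{(k+1)}) \rangle,$$
and so
$$\langle x^{(k+1)} - x^{(k)}, \nabla f(x^{(k+1)}) \rangle = 0.$$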
8.25
A simple MATLAB routine for implementing the steepest descent method is as follows.
function [x,N]=steep_desc(grad,xnew,options);
%
STEEP_DESC(’grad’,x0);
%
STEEP_DESC(’grad’,x0,OPTIONS);
%
%
x = STEEP_DESC(’grad’,x0);
%
x = STEEP_DESC(’grad’,x0,OPTIONS);
%
%
[x,N] = STEEP_DESC(’grad’,x0);
%
[x,N] = STEEP_DESC(’grad’,x0,OPTIONS);
%
%The first variant finds the minimizer of a function whose gradient
%is described in grad (usually an M-file: grad.m), using a gradient
%descent algorithm with initial point x0. The line search used is the
%secant method.
%The second variant allows a vector of optional parameters to be
%defined. OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results, (default is no display: 0).
%OPTIONS(2) is a measure of the precision required for the final point.
%OPTIONS(3) is a measure of the precision required of the gradient.
%OPTIONS(14) is the maximum number of iterations.
%For more information type HELP FOPTIONS.
%
%The next two variants return the value of the final point.
%The last two variants return a vector of the final point and the
%number of iterations.
if nargin ~= 3
options = [];
if nargin ~= 2
disp(’Wrong number of arguments.’);
return;
end
end
if length(options) >= 14
if options(14)==0
options(14)=1000*length(xnew);
end
else
options(14)=1000*length(xnew);
end
clc;
format compact;
format short e;
options = foptions(options);
print = options(1);
epsilon_x = options(2);
epsilon_g = options(3);
max_iter=options(14);
for k = 1:max_iter,
xcurr=xnew;
g_curr=feval(grad,xcurr);
if norm(g_curr) <= epsilon_g
disp(’Terminating: Norm of gradient less than’);
disp(epsilon_g);
k=k-1;
break;
end %if
alpha=linesearch_secant(grad,xcurr,-g_curr);
xnew = xcurr-alpha*g_curr;
if print,
disp(’Iteration number k =’)
disp(k); %print iteration index k
disp(’alpha =’);
disp(alpha); %print alpha
disp(’Gradient = ’);
disp(g_curr’); %print gradient
disp(’New point =’);
disp(xnew’); %print new point
end %if
if norm(xnew-xcurr) <= epsilon_x*norm(xcurr)
disp(’Terminating: Norm of difference between iterates less than’);
disp(epsilon_x);
break;
end %if
if k == max_iter
disp(’Terminating with maximum number of iterations’);
end %if
end %for
if nargout >= 1
x=xnew;
if nargout == 2
N=k;
end
else
disp(’Final point =’);
disp(xnew’);
disp(’Number of iterations =’);
disp(k);
end %if
%------------------------------------------------------------------
To apply the above MATLAB routine to the function in Example 8.1, we need the following M-file to
specify the gradient.
function y=g(x)
y=[4*(x(1)-4).^3; 2*(x(2)-3); 16*(x(3)+5).^3];
We applied the algorithm as follows:
>> options(2) = 10^(-6);
>> options(3) = 10^(-6);
>> steep_desc(’g’,[-4;5;1],options)
Terminating: Norm of gradient less than
1.0000e-06
Final point =
4.0022e+00
3.0000e+00
-4.9962e+00
Number of iterations =
25
As we can see above, we obtained the final point $[4.002, 3.000, -4.996]^\top$ after 25 iterations. The value of the objective function at the final point is $7.2 \times 10^{-10}$.
8.26
The algorithm terminated after 9127 iterations. The final point was $[0.99992, 0.99983]^\top$.
9. Newton’s Method
9.1
a. We have $f'(x) = 4(x - x_0)^3$ and $f''(x) = 12(x - x_0)^2$. Hence, Newton's method is represented as
$$x^{(k+1)} = x^{(k)} - \frac{x^{(k)} - x_0}{3},$$
which upon rewriting becomes
$$x^{(k+1)} - x_0 = \frac{2}{3}\left(x^{(k)} - x_0\right).$$
b. From part a, $y^{(k)} = |x^{(k)} - x_0| = (2/3)|x^{(k-1)} - x_0| = (2/3)y^{(k-1)}$.
c. From part b, we see that $y^{(k)} = (2/3)^k y^{(0)}$ and therefore $y^{(k)} \to 0$. Hence $x^{(k)} \to x_0$ for any $x^{(0)}$.
d. From part b, we have
$$\lim_{k\to\infty} \frac{|x^{(k+1)} - x_0|}{|x^{(k)} - x_0|} = \lim_{k\to\infty} \frac{2}{3} = \frac{2}{3} > 0$$
and hence the order of convergence is 1.
e. The theorem assumes that $f''(x^*) \ne 0$. However, in this problem, $x^* = x_0$, and $f''(x^*) = 0$.
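The linear rate 2/3 can be observed directly. The sketch below is an illustration only: it applies Newton's method using the derivatives from part a, with an assumed value of x0 and an assumed starting point.

```matlab
x0 = 1;                    % illustrative minimizer location (assumption)
x  = 3;                    % illustrative starting point (assumption)
ratios = zeros(1, 10);
for k = 1:10
    err_old = abs(x - x0);
    % Newton step using f'(x) = 4(x - x0)^3 and f''(x) = 12(x - x0)^2
    x = x - (4*(x - x0)^3) / (12*(x - x0)^2);
    ratios(k) = abs(x - x0) / err_old;   % successive error ratio
end
disp(ratios);              % each entry should be close to 2/3
```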
9.2
a. We have
$$|x^{(k+1)} - x^*| = |x^{(k)} - x^* - \alpha_k f'(x^{(k)})|.$$
By Taylor's theorem applied to $f'$,
$$f'(x^{(k)}) = f'(x^*) + f''(x^*)(x^{(k)} - x^*) + o(|x^{(k)} - x^*|).$$
Since $f'(x^*) = 0$ by the FONC, we get
$$x^{(k)} - x^* - \alpha_k f'(x^{(k)}) = (1 - \alpha_k f''(x^*))(x^{(k)} - x^*) + \alpha_k\, o(|x^{(k)} - x^*|)$$
$$= o(|x^{(k)} - x^*|) + \alpha_k\, o(|x^{(k)} - x^*|) = (1 + \alpha_k)\, o(|x^{(k)} - x^*|).$$
Because $\{\alpha_k\}$ converges, it is bounded, and so $(1 + \alpha_k)\, o(|x^{(k)} - x^*|) = o(|x^{(k)} - x^*|)$. Combining the above with the first equation, we get
$$|x^{(k+1)} - x^*| = o(|x^{(k)} - x^*|),$$
which implies that the order of convergence is superlinear.
b. In the secant algorithm, if $x^{(k)} \to x^*$, then $(f'(x^{(k)}) - f'(x^{(k-1)}))/(x^{(k)} - x^{(k-1)}) \to f''(x^*)$. Since the secant algorithm has the form $x^{(k+1)} = x^{(k)} - \alpha_k f'(x^{(k)})$ with $\alpha_k = (x^{(k)} - x^{(k-1)})/(f'(x^{(k)}) - f'(x^{(k-1)}))$, we deduce that $\alpha_k \to 1/f''(x^*)$. Hence, if we apply the secant algorithm to a function $f \in C^2$, and it converges to a local minimizer $x^*$ such that $f''(x^*) \ne 0$, then the order of convergence is superlinear.
9.3
a. We compute $f'(x) = 4x^{1/3}/3$ and $f''(x) = 4x^{-2/3}/9$. Therefore Newton's algorithm for this problem takes the form
$$x^{(k+1)} = x^{(k)} - \frac{4(x^{(k)})^{1/3}/3}{4(x^{(k)})^{-2/3}/9} = x^{(k)} - 3x^{(k)} = -2x^{(k)}.$$
b. From part a, we have $x^{(k)} = (-2)^k x^{(0)}$, so that $|x^{(k)}| = 2^k |x^{(0)}|$. Therefore, as long as $x^{(0)} \ne 0$, the sequence $\{x^{(k)}\}$ does not converge to 0.
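The doubling behavior is easy to observe. In the sketch below, which assumes $f(x) = x^{4/3}$ (consistent with the derivatives in part a) and an illustrative starting point, the real cube root is computed with nthroot so that negative iterates stay real.

```matlab
x = 1;                                  % illustrative nonzero starting point (assumption)
for k = 1:5
    fp  = (4/3) * nthroot(x, 3);        % f'(x) with the real cube root
    fpp = (4/9) * nthroot(x, 3)^(-2);   % f''(x)
    x = x - fp / fpp;                   % Newton step, equal to -2*x
    fprintf('k = %d: x = %g\n', k, x);
end
```

The iterates alternate in sign and double in magnitude at every step, so they move away from the minimizer at 0.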
9.4
a. Clearly $f(x) \ge 0$ for all $x$. We have
$$f(x) = 0 \quad\Leftrightarrow\quad x_2 - x_1^2 = 0 \ \text{ and } \ 1 - x_1 = 0 \quad\Leftrightarrow\quad x = [1, 1]^\top.$$
Hence, $f(x) > f([1, 1]^\top)$ for all $x \ne [1, 1]^\top$, and therefore $[1, 1]^\top$ is the unique global minimizer.
b. We compute
$$\nabla f(x) = \begin{bmatrix} 400x_1^3 - 400x_1 x_2 + 2x_1 - 2 \\ 200(x_2 - x_1^2) \end{bmatrix}, \qquad F(x) = \begin{bmatrix} 1200x_1^2 - 400x_2 + 2 & -400x_1 \\ -400x_1 & 200 \end{bmatrix}.$$
To apply Newton's method we use the inverse of the Hessian, which is
$$F(x)^{-1} = \frac{1}{80000(x_1^2 - x_2) + 400} \begin{bmatrix} 200 & 400x_1 \\ 400x_1 & 1200x_1^2 - 400x_2 + 2 \end{bmatrix}.$$
Applying two iterations of Newton's method, we have $x^{(1)} = [1, 0]^\top$, $x^{(2)} = [1, 1]^\top$. Therefore, in this particular case, the method converges in two steps! We emphasize, however, that this fortuitous situation is by no means typical, and is highly dependent on the initial condition.
c. Applying the gradient algorithm $x^{(k+1)} = x^{(k)} - \alpha_k \nabla f(x^{(k)})$ with a fixed step size of $\alpha_k = 0.05$, we obtain $x^{(1)} = [0.1, 0]^\top$, $x^{(2)} = [0.17, 0.1]^\top$.
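Both sets of iterates can be reproduced with a few lines. This is a minimal sketch, assuming the starting point $x^{(0)} = [0, 0]^\top$ (which is consistent with the iterates reported in parts b and c); the function handles are illustrative names.

```matlab
% Gradient and Hessian as computed in part b.
grad = @(x) [400*x(1)^3 - 400*x(1)*x(2) + 2*x(1) - 2; 200*(x(2) - x(1)^2)];
hess = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1); -400*x(1), 200];

xN = [0; 0];                       % assumed starting point for Newton's method
for k = 1:2
    xN = xN - hess(xN) \ grad(xN); % Newton step (solve instead of explicit inverse)
    fprintf('Newton   k = %d: [%.4f; %.4f]\n', k, xN(1), xN(2));
end

xG = [0; 0];                       % same assumed starting point
for k = 1:2
    xG = xG - 0.05 * grad(xG);     % fixed-step gradient step
    fprintf('gradient k = %d: [%.4f; %.4f]\n', k, xG(1), xG(2));
end
```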
9.5
If $x^{(0)} = x^*$, we are done. So, assume $x^{(0)} \ne x^*$. Since the standard Newton's method reaches the point $x^*$ in one step, we have
$$f(x^*) = f(x^{(0)} - Q^{-1}g^{(0)}) = \min_x f(x) \le f(x^{(0)} - \alpha Q^{-1}g^{(0)})$$
for any $\alpha \ge 0$. Hence,
$$\alpha_0 = \arg\min_{\alpha \ge 0} f(x^{(0)} - \alpha Q^{-1}g^{(0)}) = 1.$$
Hence, in this case, the modified Newton's algorithm is equivalent to the standard Newton's algorithm, and thus $x^{(1)} = x^*$.
10. Conjugate Direction Methods
10.1
We proceed by induction to show that for $k = 0, \ldots, n - 1$, the set $\{d^{(0)}, \ldots, d^{(k)}\}$ is $Q$-conjugate. We assume that $d^{(i)} \ne 0$, $i = 1, \ldots, k$, so that $d^{(i)\top} Q d^{(i)} \ne 0$ and the algorithm is well defined.
For $k = 0$, the statement trivially holds. So, assume that the statement is true for $k < n - 1$, i.e., $\{d^{(0)}, \ldots, d^{(k)}\}$ is $Q$-conjugate. We now show that $\{d^{(0)}, \ldots, d^{(k+1)}\}$ is $Q$-conjugate. For this, we need only to show that for each $j = 0, \ldots, k$, we have $d^{(k+1)\top} Q d^{(j)} = 0$. To this end,
$$d^{(k+1)\top} Q d^{(j)} = \left( p^{(k+1)\top} - \sum_{i=0}^{k} \frac{p^{(k+1)\top} Q d^{(i)}}{d^{(i)\top} Q d^{(i)}}\, d^{(i)\top} \right) Q d^{(j)} = p^{(k+1)\top} Q d^{(j)} - \sum_{i=0}^{k} \frac{p^{(k+1)\top} Q d^{(i)}}{d^{(i)\top} Q d^{(i)}}\, d^{(i)\top} Q d^{(j)}.$$
By the induction hypothesis, $d^{(i)\top} Q d^{(j)} = 0$ for $i \ne j$. Therefore
$$d^{(k+1)\top} Q d^{(j)} = p^{(k+1)\top} Q d^{(j)} - \frac{p^{(k+1)\top} Q d^{(j)}}{d^{(j)\top} Q d^{(j)}}\, d^{(j)\top} Q d^{(j)} = 0.$$
In the above, we have assumed that the vectors $d^{(k)}$ are nonzero (so that $d^{(k)\top} Q d^{(k)} \ne 0$ and the algorithm is well defined). To prove that this assumption holds, we use induction to show that $d^{(k)}$ is a (nonzero) linear combination of $p^{(0)}, \ldots, p^{(k)}$ (which immediately implies that $d^{(k)}$ is nonzero because of the linear independence of $p^{(0)}, \ldots, p^{(k)}$).
For $k = 0$, we have $d^{(0)} = p^{(0)}$ by definition. Assume that the result holds for $k < n - 1$; i.e., $d^{(k)} = \sum_{j=0}^{k} \alpha_j^{(k)} p^{(j)}$, where the coefficients $\alpha_j^{(k)}$ are not all zero. Consider $d^{(k+1)}$:
$$d^{(k+1)} = p^{(k+1)} - \sum_{i=0}^{k} \beta_i d^{(i)} = p^{(k+1)} - \sum_{i=0}^{k} \beta_i \sum_{j=0}^{i} \alpha_j^{(i)} p^{(j)} = p^{(k+1)} - \sum_{j=0}^{k} \sum_{i=j}^{k} \beta_i \alpha_j^{(i)} p^{(j)}.$$
So, clearly $d^{(k+1)}$ is a nonzero linear combination of $p^{(0)}, \ldots, p^{(k+1)}$.
10.2
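Let $k \in \{0, \ldots, n - 1\}$ and $\phi_k(\alpha) = f(x^{(k)} + \alpha d^{(k)})$. By the chain rule, we have
$$\phi_k'(\alpha_k) = \nabla f(x^{(k)} + \alpha_k d^{(k)})^\top d^{(k)} = g^{(k+1)\top} d^{(k)}.$$
Since $g^{(k+1)\top} d^{(k)} = 0$, we have $\phi_k'(\alpha_k) = 0$. Note that
$$\phi_k(\alpha) = \frac{1}{2}\left( d^{(k)\top} Q d^{(k)} \right)\alpha^2 + \left( g^{(k)\top} d^{(k)} \right)\alpha + \text{constant}.$$
As $\phi_k$ is a quadratic function of $\alpha$ (with positive coefficient in the quadratic term), we conclude that $\alpha_k = \arg\min_\alpha f(x^{(k)} + \alpha d^{(k)})$.
Note that since $g^{(k)\top} d^{(k)} \ne 0$ is the coefficient of the linear term in $\phi_k$, we have $\alpha_k \ne 0$. For $i \in \{0, \ldots, k - 1\}$, we have
$$d^{(k)\top} Q d^{(i)} = \frac{1}{\alpha_k}(x^{(k+1)} - x^{(k)})^\top Q d^{(i)} = \frac{1}{\alpha_k}(g^{(k+1)} - g^{(k)})^\top d^{(i)} = \frac{1}{\alpha_k}\left( g^{(k+1)\top} d^{(i)} - g^{(k)\top} d^{(i)} \right) = 0$$
by assumption, which completes the proof.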
10.3
From the conjugate gradient algorithm we have
$$d^{(k)} = -g^{(k)} + \frac{g^{(k)\top} Q d^{(k-1)}}{d^{(k-1)\top} Q d^{(k-1)}}\, d^{(k-1)}.$$
Premultiplying the above by $d^{(k)\top} Q$ and using the fact that $d^{(k)}$ and $d^{(k-1)}$ are $Q$-conjugate yields
$$d^{(k)\top} Q d^{(k)} = -d^{(k)\top} Q g^{(k)} + \frac{g^{(k)\top} Q d^{(k-1)}}{d^{(k-1)\top} Q d^{(k-1)}}\, d^{(k)\top} Q d^{(k-1)} = -d^{(k)\top} Q g^{(k)}.$$
10.4
a. Since $Q$ is symmetric, there exists a set of vectors $\{d^{(1)}, \ldots, d^{(n)}\}$ such that $Qd^{(i)} = \lambda_i d^{(i)}$, $i = 1, \ldots, n$, and $d^{(i)\top} d^{(j)} = 0$, $j \ne i$, where the $\lambda_i$ are (real) eigenvalues of $Q$. Therefore, if $i \ne j$, we have $d^{(i)\top} Q d^{(j)} = d^{(i)\top}(\lambda_j d^{(j)}) = \lambda_j (d^{(i)\top} d^{(j)}) = 0$. Hence the set $\{d^{(1)}, \ldots, d^{(n)}\}$ is $Q$-conjugate.
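Part a can be checked numerically for a specific matrix. The sketch below is an illustration under an assumption: the symmetric matrix Q is chosen here and is not part of the exercise. For a symmetric input, eig returns orthonormal eigenvectors, so the off-diagonal entries of $V^\top Q V$ should vanish up to rounding.

```matlab
Q = [5 -3; -3 2];          % illustrative symmetric matrix (assumption)
[V, L] = eig(Q);           % columns of V: orthonormal eigenvectors; L: eigenvalues
C = V' * Q * V;            % entry (i,j) equals d_i' * Q * d_j
off_diag = C - diag(diag(C));
fprintf('largest off-diagonal magnitude: %.2e\n', max(abs(off_diag(:))));
```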
b. Define $\lambda_i = (d^{(i)\top} Q d^{(i)})/(d^{(i)\top} d^{(i)})$. Let
$$D = \begin{bmatrix} d^{(1)\top} \\ \vdots \\ d^{(n)\top} \end{bmatrix}.$$
Since $Q$ is positive definite and the set $\{d^{(1)}, \ldots, d^{(n)}\}$ is $Q$-conjugate, then by Lemma 10.1, the set is also linearly independent. Hence, $D$ is nonsingular. By $Q$-conjugacy, we have that for all $i \ne j$, $d^{(i)\top} Q d^{(j)} = 0$. By assumption, we have $d^{(i)\top} \lambda_j d^{(j)} = \lambda_j d^{(i)\top} d^{(j)} = 0$. Hence, $d^{(i)\top} Q d^{(j)} = \lambda_j d^{(i)\top} d^{(j)}$. Moreover, for each $i = 1, \ldots, n$, we have $d^{(i)\top} Q d^{(i)} = d^{(i)\top} \lambda_i d^{(i)} = \lambda_i d^{(i)\top} d^{(i)}$. We can write the above conditions in matrix form:
$$DQd^{(i)} = D(\lambda_i d^{(i)}).$$
Since $D$ is nonsingular, we have
$$Qd^{(i)} = \lambda_i d^{(i)},$$
which completes the proof.
10.5
We have
$$d^{(k)\top} Q d^{(k+1)} = \gamma_k d^{(k)\top} Q g^{(k+1)} + d^{(k)\top} Q d^{(k)}.$$
Hence, in order to have $d^{(k)\top} Q d^{(k+1)} = 0$, we need
$$\gamma_k = -\frac{d^{(k)\top} Q d^{(k)}}{d^{(k)\top} Q g^{(k+1)}}.$$
10.6
a. We use induction. For $k = 0$, we have
$$d^{(0)} = a_0 g^{(0)} = -a_0 b \in V_1.$$
Moreover, $x^{(0)} = 0 \in V_0$. Hence, the proposition is true at $k = 0$. Assume it is true at $k$. To show that it is also true at $k + 1$, note first that
$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}.$$
Because $x^{(k)} \in V_k \subset V_{k+1}$ and $d^{(k)} \in V_{k+1}$ by the induction hypothesis, we deduce that $x^{(k+1)} \in V_{k+1}$. Moreover,
$$d^{(k+1)} = a_k g^{(k+1)} + b_k d^{(k)} = a_k (Qx^{(k+1)} - b) + b_k d^{(k)}.$$
But because $x^{(k+1)} \in V_{k+1}$, $Qx^{(k+1)} - b \in V_{k+2}$. Moreover, $d^{(k)} \in V_{k+1} \subset V_{k+2}$. Hence, $d^{(k+1)} \in V_{k+2}$. This completes the induction proof.
b. The conjugate gradient algorithm is an instance of the algorithm given in the question. By the "expanding subspace" theorem, we can say that in the conjugate gradient algorithm (with $x^{(0)} = 0$), at each $k$, $x^{(k)}$ is the global minimizer of $f$ on the Krylov subspace $V_k$. Note that for all $k \ge n$, $V_{k+1} = V_k$, because of the Cayley-Hamilton theorem, which allows us to express $Q^n$ as a linear combination of $I, Q, \ldots, Q^{n-1}$.
10.7
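Expanding $\phi(a)$ yields
$$\phi(a) = \frac{1}{2}(x_0 + Da)^\top Q (x_0 + Da) - (x_0 + Da)^\top b = \frac{1}{2} a^\top \left( D^\top Q D \right) a + a^\top \left( D^\top Q x_0 - D^\top b \right) + \left( \frac{1}{2} x_0^\top Q x_0 - x_0^\top b \right).$$
Clearly $\phi$ is a quadratic function on $\mathbb{R}^r$. It remains to show that the matrix in the quadratic term, $D^\top Q D$, is positive definite. Since $Q > 0$, for any $a \in \mathbb{R}^r$, we have
$$a^\top \left( D^\top Q D \right) a = (Da)^\top Q (Da) \ge 0$$
and
$$a^\top \left( D^\top Q D \right) a = (Da)^\top Q (Da) = 0$$
if and only if $Da = 0$. Since $\operatorname{rank} D = r$, $Da = 0$ if and only if $a = 0$. Hence, the matrix $D^\top Q D$ is positive definite.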
10.8
a. Let $0 \le k \le n - 1$ and $0 \le i \le k$. Then,
$$g^{(k+1)\top} g^{(i)} = g^{(k+1)\top}(\beta_{i-1} d^{(i-1)} - d^{(i)}) = \beta_{i-1} g^{(k+1)\top} d^{(i-1)} - g^{(k+1)\top} d^{(i)} = 0$$
by Lemma 10.2.
b. Let $0 \le k \le n - 1$ and $0 \le i \le k - 1$. Then,
$$g^{(k+1)\top} Q g^{(i)} = (\beta_k d^{(k)} - d^{(k+1)})^\top Q (\beta_{i-1} d^{(i-1)} - d^{(i)})$$
$$= \beta_k \beta_{i-1} d^{(k)\top} Q d^{(i-1)} - \beta_k d^{(k)\top} Q d^{(i)} - \beta_{i-1} d^{(k+1)\top} Q d^{(i-1)} + d^{(k+1)\top} Q d^{(i)} = 0$$
by $Q$-conjugacy of $d^{(k+1)}$, $d^{(k)}$, $d^{(i)}$ and $d^{(i-1)}$ (note that the iteration indices here are all distinct).
10.9
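We represent $f$ as
$$f(x) = \frac{1}{2} x^\top \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix} x - x^\top \begin{bmatrix} 0 \\ 1 \end{bmatrix} - 7.$$
The conjugate gradient algorithm is based on the following formulas:
$$x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}, \qquad \alpha_k = -\frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}},$$
$$d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}, \qquad \beta_k = \frac{g^{(k+1)\top} Q d^{(k)}}{d^{(k)\top} Q d^{(k)}}.$$
We have
$$d^{(0)} = g^{(0)} = Qx^{(0)} - b = -b = \begin{bmatrix} 0 \\ -1 \end{bmatrix}.$$
We then proceed to compute
$$\alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}} = -\frac{\begin{bmatrix} 0 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ -1 \end{bmatrix}}{\begin{bmatrix} 0 & -1 \end{bmatrix} \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ -1 \end{bmatrix}} = -\frac{1}{2}.$$
Hence,
$$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1/2 \end{bmatrix}.$$
We next proceed by evaluating the gradient of the objective function at $x^{(1)}$,
$$g^{(1)} = Qx^{(1)} - b = \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1/2 \end{bmatrix} - \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -3/2 \\ 0 \end{bmatrix}.$$
Because the gradient is nonzero, we can proceed with the next step where we compute
$$\beta_0 = \frac{g^{(1)\top} Q d^{(0)}}{d^{(0)\top} Q d^{(0)}} = \frac{\begin{bmatrix} -3/2 & 0 \end{bmatrix} \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ -1 \end{bmatrix}}{\begin{bmatrix} 0 & -1 \end{bmatrix} \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ -1 \end{bmatrix}} = -\frac{9}{4}.$$
Hence, the direction $d^{(1)}$ is
$$d^{(1)} = -g^{(1)} + \beta_0 d^{(0)} = \begin{bmatrix} 3/2 \\ 0 \end{bmatrix} - \frac{9}{4} \begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 3/2 \\ 9/4 \end{bmatrix}.$$
It is easy to verify that the directions $d^{(0)}$ and $d^{(1)}$ are $Q$-conjugate. Indeed,
$$d^{(0)\top} Q d^{(1)} = \begin{bmatrix} 0 & -1 \end{bmatrix} \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 3/2 \\ 9/4 \end{bmatrix} = 0.$$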
10.10
a. We have $f(x) = \frac{1}{2} x^\top Q x - b^\top x$ where
$$Q = \begin{bmatrix} 5 & 2 \\ 2 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 3 \\ 1 \end{bmatrix}.$$
b. Since $f$ is a quadratic function on $\mathbb{R}^2$, we need to perform only two iterations. For the first iteration we compute
$$d^{(0)} = -g^{(0)} = [3, 1]^\top,$$
$$\alpha_0 = \frac{5}{29},$$
$$x^{(1)} = [0.51724, 0.17241]^\top,$$
$$g^{(1)} = [-0.06897, 0.20690]^\top.$$
For the second iteration we compute
$$\beta_0 = 0.0047534,$$
$$d^{(1)} = [0.08324, -0.20214]^\top,$$
$$\alpha_1 = 5.7952,$$
$$x^{(2)} = [1.000, -1.000]^\top.$$
c. The minimizer is given by $x^* = Q^{-1}b = [1, -1]^\top$, which agrees with part b.
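The two iterations in part b can be reproduced with the conjugate gradient formulas for a quadratic. This is a minimal sketch, assuming the starting point $x^{(0)} = 0$ (implied by $d^{(0)} = -g^{(0)} = b$ above) and the standard choice $d^{(0)} = -g^{(0)}$; variable names are illustrative.

```matlab
Q = [5 2; 2 1];
b = [3; 1];
x = [0; 0];                          % x^(0), implied by d^(0) = b above
g = Q*x - b;                         % gradient at the starting point
d = -g;                              % initial direction
for k = 1:2
    alpha = -(g'*d) / (d'*Q*d);      % exact step size along d
    x = x + alpha*d;
    g = Q*x - b;                     % gradient at the new point
    beta = (g'*Q*d) / (d'*Q*d);      % conjugacy coefficient
    d = -g + beta*d;                 % next Q-conjugate direction
    fprintf('k = %d: alpha = %.5f, x = [%.5f; %.5f]\n', k, alpha, x(1), x(2));
end
x_star = Q \ b;                      % direct solution for comparison
```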
10.11
A MATLAB routine for the conjugate gradient algorithm with options for different formulas of βk is:
function [x,N]=conj_grad(grad,xnew,options);
%
CONJ_GRAD(’grad’,x0);
%
CONJ_GRAD(’grad’,x0,OPTIONS);
%
%
x = CONJ_GRAD(’grad’,x0);
%
x = CONJ_GRAD(’grad’,x0,OPTIONS);
%
%
[x,N] = CONJ_GRAD(’grad’,x0);
%
[x,N] = CONJ_GRAD(’grad’,x0,OPTIONS);
%
%The first variant finds the minimizer of a function whose gradient
%is described in grad (usually an M-file: grad.m), using initial point
%x0.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results, (default is no display: 0).
%OPTIONS(2) is a measure of the precision required for the final point.
%OPTIONS(3) is a measure of the precision required of the gradient.
%OPTIONS(5) specifies the formula for beta:
%
0=Powell;
%
1=Fletcher-Reeves;
%
2=Polak-Ribiere;
%
3=Hestenes-Stiefel.
%OPTIONS(14) is the maximum number of iterations.
%For more information type HELP FOPTIONS.
%
%The next two variants return the value of the final point.
%The last two variants return a vector of the final point and the
%number of iterations.
if nargin ~= 3
options = [];
if nargin ~= 2
disp(’Wrong number of arguments.’);
return;
end
end
numvars = length(xnew);
if length(options) >= 14
if options(14)==0
options(14)=1000*numvars;
end
else
options(14)=1000*numvars;
end
clc;
format compact;
format short e;
options = foptions(options);
print = options(1);
epsilon_x = options(2);
epsilon_g = options(3);
max_iter=options(14);
g_curr=feval(grad,xnew);
if norm(g_curr) <= epsilon_g
disp(’Terminating: Norm of initial gradient less than’);
disp(epsilon_g);
return;
end %if
d=-g_curr;
reset_cnt = 0;
for k = 1:max_iter,
xcurr=xnew;
alpha=linesearch_secant(grad,xcurr,d);
%alpha=-(d’*g_curr)/(d’*Q*d);
xnew = xcurr+alpha*d;
if print,
disp(’Iteration number k =’)
disp(k); %print iteration index k
disp(’alpha =’);
disp(alpha); %print alpha
disp(’Gradient = ’);
disp(g_curr’); %print gradient
disp(’New point =’);
disp(xnew’); %print new point
end %if
if norm(xnew-xcurr) <= epsilon_x*norm(xcurr)
disp(’Terminating: Norm of difference between iterates less than’);
disp(epsilon_x);
break;
end %if
g_old=g_curr;
g_curr=feval(grad,xnew);
if norm(g_curr) <= epsilon_g
disp(’Terminating: Norm of gradient less than’);
disp(epsilon_g);
break;
end %if
reset_cnt = reset_cnt+1;
if reset_cnt == 3*numvars
d=-g_curr;
reset_cnt = 0;
else
if options(5)==0 %Powell
beta = max(0,(g_curr’*(g_curr-g_old))/(g_old’*g_old));
elseif options(5)==1 %Fletcher-Reeves
beta = (g_curr’*g_curr)/(g_old’*g_old);
elseif options(5)==2 %Polak-Ribiere
beta = (g_curr’*(g_curr-g_old))/(g_old’*g_old);
else %Hestenes-Stiefel
beta = (g_curr’*(g_curr-g_old))/(d’*(g_curr-g_old));
end %if
d=-g_curr+beta*d;
end
if print,
disp(’New beta =’);
disp(beta);
disp(’New d =’);
disp(d);
end
if k == max_iter
disp(’Terminating with maximum number of iterations’);
end %if
end %for
if nargout >= 1
x=xnew;
if nargout == 2
N=k;
end
else
disp(’Final point =’);
disp(xnew’);
disp(’Number of iterations =’);
disp(k);
end %if
%------------------------------------------------------------------
We created the following M-file, g.m, for the gradient of Rosenbrock's function:
function y=g(x)
y=[-400*(x(2)-x(1).^2)*x(1)-2*(1-x(1)), 200*(x(2)-x(1).^2)]';
We tested the above routine as follows:
>> options(2)=10^(-7);
>> options(3)=10^(-7);
>> options(14)=100;
>> options(5)=0;
>> conj_grad('g',[-2;2],options);
Terminating: Norm of difference between iterates less than
1.0000e-07
Final_Point =
1.0000e+00
1.0000e+00
Number_of_iteration =
8
>> options(5)=1;
>> conj_grad('g',[-2;2],options);
Terminating: Norm of difference between iterates less than
1.0000e-07
Final_Point =
1.0000e+00
1.0000e+00
Number_of_iteration =
10
>> options(5)=2;
>> conj_grad('g',[-2;2],options);
Terminating: Norm of difference between iterates less than
1.0000e-07
Final_Point =
1.0000e+00
1.0000e+00
Number_of_iteration =
8
>> options(5)=3;
>> conj_grad('g',[-2;2],options);
Terminating: Norm of difference between iterates less than
1.0000e-07
Final_Point =
1.0000e+00
1.0000e+00
Number_of_iteration =
8
The reader is cautioned not to draw any conclusions about the superiority or inferiority of any of the
formulas for βk based only on the above single numerical experiment.
11. Quasi-Newton Methods
11.1
a. Let
$$\phi(\alpha) = f(x^{(k)} + \alpha d^{(k)}).$$
Then, using the chain rule, we obtain
$$\phi'(\alpha) = d^{(k)\top} \nabla f(x^{(k)} + \alpha d^{(k)}).$$
Hence
$$\phi'(0) = d^{(k)\top} g^{(k)}.$$
Since $\phi'$ is continuous, if $d^{(k)\top} g^{(k)} < 0$, then there exists $\bar{\alpha} > 0$ such that for all $\alpha \in (0, \bar{\alpha}]$, $\phi(\alpha) < \phi(0)$, i.e., $f(x^{(k)} + \alpha d^{(k)}) < f(x^{(k)})$.
b. By part a, $\phi(\alpha) < \phi(0)$ for all $\alpha \in (0, \bar{\alpha}]$. Hence,
$$\alpha_k = \arg\min_{\alpha \ge 0} \phi(\alpha) \ne 0,$$
which implies that $\alpha_k > 0$.
c. Now,
$$d^{(k)\top} g^{(k+1)} = d^{(k)\top} \nabla f(x^{(k)} + \alpha_k d^{(k)}) = \phi_k'(\alpha_k).$$
Since $\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)}) > 0$, we have $\phi_k'(\alpha_k) = 0$. Hence, $g^{(k+1)\top} d^{(k)} = 0$.
d.
i. We have $d^{(k)} = -g^{(k)}$. Hence, $d^{(k)\top} g^{(k)} = -\|g^{(k)}\|^2$. If $g^{(k)} \ne 0$, then $\|g^{(k)}\|^2 > 0$, and hence $d^{(k)\top} g^{(k)} < 0$.
ii. We have $d^{(k)} = -F(x^{(k)})^{-1} g^{(k)}$. Since $F(x^{(k)}) > 0$, we also have $F(x^{(k)})^{-1} > 0$. Therefore,
$$d^{(k)\top} g^{(k)} = -g^{(k)\top} F(x^{(k)})^{-1} g^{(k)} < 0 \quad \text{if } g^{(k)} \ne 0.$$
iii. We have
$$d^{(k)} = -g^{(k)} + \beta_{k-1} d^{(k-1)}.$$
Hence,
$$d^{(k)\top} g^{(k)} = -\|g^{(k)}\|^2 + \beta_{k-1} d^{(k-1)\top} g^{(k)}.$$
By part c, $d^{(k-1)\top} g^{(k)} = 0$. Hence, if $g^{(k)} \ne 0$, then $\|g^{(k)}\|^2 > 0$, and
$$d^{(k)\top} g^{(k)} = -\|g^{(k)}\|^2 < 0.$$
iv. We have $d^{(k)} = -H_k g^{(k)}$. Therefore, if $H_k > 0$ and $g^{(k)} \ne 0$, then $d^{(k)\top} g^{(k)} = -g^{(k)\top} H_k g^{(k)} < 0$.
e. Using the equation $\nabla f(x) = Qx - b$, we get
$$\begin{aligned}
d^{(k)\top} g^{(k+1)} &= d^{(k)\top}(Qx^{(k+1)} - b) \\
&= d^{(k)\top}(Q(x^{(k)} + \alpha_k d^{(k)}) - b) \\
&= \alpha_k d^{(k)\top} Q d^{(k)} + d^{(k)\top}(Qx^{(k)} - b) \\
&= \alpha_k d^{(k)\top} Q d^{(k)} + d^{(k)\top} g^{(k)}.
\end{aligned}$$
By part c, $d^{(k)\top} g^{(k+1)} = 0$, which implies
$$\alpha_k = -\frac{d^{(k)\top} g^{(k)}}{d^{(k)\top} Q d^{(k)}}.$$
11.2
Yes, because:
1. The search direction is of the form $d^{(k)} = -H_k \nabla f(x^{(k)})$ for the matrix $H_k = F(x^{(k)})^{-1}$;
2. The matrix $H_k = F(x^{(k)})^{-1}$ is symmetric for $f \in C^2$;
3. If $f$ is quadratic, then the quasi-Newton condition is satisfied: $H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}$, $0 \le i \le k$. To see this, note that if the Hessian is $Q$, then $Q \Delta x^{(i)} = \Delta g^{(i)}$. Multiplying both sides by $H_k = Q^{-1}$ (the same matrix for every $k$), we obtain the desired result.
11.3
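a. We have
$$f(x^{(k)} + \alpha d^{(k)}) = \frac{1}{2}(x^{(k)} + \alpha d^{(k)})^\top Q (x^{(k)} + \alpha d^{(k)}) - (x^{(k)} + \alpha d^{(k)})^\top b + c.$$
Using the chain rule, we obtain
$$\frac{d}{d\alpha} f(x^{(k)} + \alpha d^{(k)}) = (x^{(k)} + \alpha d^{(k)})^\top Q d^{(k)} - d^{(k)\top} b.$$
Equating the above to zero and solving for $\alpha$ gives
$$(x^{(k)\top} Q - b^\top) d^{(k)} = -\alpha d^{(k)\top} Q d^{(k)}.$$
Taking into account that $g^{(k)\top} = x^{(k)\top} Q - b^\top$ and that $d^{(k)\top} Q d^{(k)} > 0$ for $g^{(k)} \ne 0$, we obtain
$$\alpha_k = -\frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}} = \frac{g^{(k)\top} H_k g^{(k)}}{d^{(k)\top} Q d^{(k)}}.$$
b. The matrix $Q$ is symmetric and positive definite; hence $\alpha_k > 0$ if $H_k = H_k^\top > 0$.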
11.4
a. The appropriate choice is H = F (x∗ )−1 . To show this, we can apply the same argument as in the proof
of the theorem on the convergence of Newton’s method. (We won’t repeat it here.)
b. Yes (provided we incorporate the usual step size). Indeed, if we apply the algorithm with the choice of H
in part a, then when applied to a quadratic with Hessian Q, the algorithm uses H = Q−1 , which definitely
satisfies the quasi-Newton condition. In fact, the algorithm then behaves just like Newton’s algorithm.
11.5
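Our objective is to minimize the quadratic
$$f(x) = \frac{1}{2} x^\top Q x - x^\top b + c.$$
We first compute the gradient $\nabla f$ and evaluate it at $x^{(0)}$,
$$\nabla f(x^{(0)}) = g^{(0)} = Qx^{(0)} - b = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
It is a nonzero vector, so we proceed with the first iteration. Let $H_0 = I_2$. Then,
$$d^{(0)} = -H_0 g^{(0)} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
The step size $\alpha_0$ is
$$\alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}} = -\frac{\begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix}}{\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix}} = \frac{2}{3}.$$
Hence,
$$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = \begin{bmatrix} 2/3 \\ -2/3 \end{bmatrix}.$$
We next evaluate the gradient at $x^{(1)}$ to obtain
$$\nabla f(x^{(1)}) = g^{(1)} = Qx^{(1)} - b = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2/3 \\ -2/3 \end{bmatrix} - \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1/3 \\ -1/3 \end{bmatrix}.$$
It is a nonzero vector, so we proceed with the second iteration. We compute $H_1$, where
$$H_1 = H_0 + \frac{(\Delta x^{(0)} - H_0 \Delta g^{(0)})(\Delta x^{(0)} - H_0 \Delta g^{(0)})^\top}{\Delta g^{(0)\top}(\Delta x^{(0)} - H_0 \Delta g^{(0)})}.$$
To find $H_1$ we need to compute
$$\Delta x^{(0)} = x^{(1)} - x^{(0)} = \begin{bmatrix} 2/3 \\ -2/3 \end{bmatrix} \quad \text{and} \quad \Delta g^{(0)} = g^{(1)} - g^{(0)} = \begin{bmatrix} 2/3 \\ -4/3 \end{bmatrix}.$$
Using the above, we determine
$$\Delta x^{(0)} - H_0 \Delta g^{(0)} = \begin{bmatrix} 0 \\ 2/3 \end{bmatrix} \quad \text{and} \quad \Delta g^{(0)\top}(\Delta x^{(0)} - H_0 \Delta g^{(0)}) = -\frac{8}{9}.$$
Then, we obtain
$$H_1 = H_0 + \frac{(\Delta x^{(0)} - H_0 \Delta g^{(0)})(\Delta x^{(0)} - H_0 \Delta g^{(0)})^\top}{\Delta g^{(0)\top}(\Delta x^{(0)} - H_0 \Delta g^{(0)})} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{-8/9} \begin{bmatrix} 0 & 0 \\ 0 & 4/9 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}$$
and
$$d^{(1)} = -H_1 g^{(1)} = \begin{bmatrix} 1/3 \\ 1/6 \end{bmatrix}.$$
We next compute
$$\alpha_1 = -\frac{g^{(1)\top} d^{(1)}}{d^{(1)\top} Q d^{(1)}} = 1.$$
Therefore,
$$x^{(2)} = x^* = x^{(1)} + \alpha_1 d^{(1)} = \begin{bmatrix} 1 \\ -1/2 \end{bmatrix}.$$
Note that $g^{(2)} = Qx^{(2)} - b = 0$ as expected.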
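The two iterations above can be reproduced numerically. The following is only a sketch: the matrix Q and vector b are read off the displayed products, while the starting point x = [0; 0] is an assumption inferred from the values of g(0) and x(1), and the guard on the update denominator is our own addition.

```matlab
% Rank-one quasi-Newton iteration on f(x) = (1/2)x'Qx - x'b.
% Q and b are read off the worked solution; the starting point is an assumption.
Q = [1 0; 0 2];
b = [1; -1];
x = [0; 0];                 % assumed initial point
H = eye(2);                 % H_0 = I_2
g = Q*x - b;
for k = 1:2
    d = -H*g;                        % search direction
    alpha = -(g'*d)/(d'*Q*d);        % exact step size for a quadratic
    xnew = x + alpha*d;
    gnew = Q*xnew - b;
    dx = xnew - x;                   % Delta x
    dg = gnew - g;                   % Delta g
    r = dx - H*dg;
    if abs(dg'*r) > 1e-12            % skip the update when the denominator vanishes
        H = H + (r*r')/(dg'*r);      % rank-one update
    end
    x = xnew;
    g = gnew;
end
disp(x');
disp(H);
```

Under these assumptions the final H should coincide with the inverse of Q, which is consistent with the quasi-Newton condition for a quadratic.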
11.6
We are guaranteed that the step size satisfies αk > 0 if the search direction is in the descent direction,
i.e., the search direction d(k) = −M k ∇f (x(k) ) has strictly positive inner product with −∇f (x(k) ) (see
Exercise 11.1). Thus, the condition on M k that guarantees αk > 0 is ∇f (x(k) )> M k ∇f (x(k) ) > 0, which
corresponds to 1 + a > 0, or a > −1. (Note that if a ≤ −1, the search direction is not in the descent
direction, and thus we cannot guarantee that αk > 0.)
11.7
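Let $x \in \mathbb{R}^n$, $x \ne 0$. Then
$$\begin{aligned}
x^\top H_{k+1} x &= x^\top H_k x + x^\top \left( \frac{(\Delta x^{(k)} - H_k \Delta g^{(k)})(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top}{\Delta g^{(k)\top}(\Delta x^{(k)} - H_k \Delta g^{(k)})} \right) x \\
&= x^\top H_k x + \frac{(x^\top(\Delta x^{(k)} - H_k \Delta g^{(k)}))^2}{\Delta g^{(k)\top}(\Delta x^{(k)} - H_k \Delta g^{(k)})}.
\end{aligned}$$
Note that since $H_k > 0$, we have $x^\top H_k x > 0$. Hence, if $\Delta g^{(k)\top}(\Delta x^{(k)} - H_k \Delta g^{(k)}) > 0$, then $x^\top H_{k+1} x > 0$.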
11.8
The complement of the Rank One update equation is
$$B_{k+1} = B_k + \frac{(\Delta g^{(k)} - B_k \Delta x^{(k)})(\Delta g^{(k)} - B_k \Delta x^{(k)})^\top}{\Delta x^{(k)\top}(\Delta g^{(k)} - B_k \Delta x^{(k)})}.$$
Using the matrix inverse formula, we get
$$\begin{aligned}
B_{k+1}^{-1} &= B_k^{-1} - \frac{B_k^{-1}(\Delta g^{(k)} - B_k \Delta x^{(k)})(\Delta g^{(k)} - B_k \Delta x^{(k)})^\top B_k^{-1}}{\Delta x^{(k)\top}(\Delta g^{(k)} - B_k \Delta x^{(k)}) + (\Delta g^{(k)} - B_k \Delta x^{(k)})^\top B_k^{-1}(\Delta g^{(k)} - B_k \Delta x^{(k)})} \\
&= B_k^{-1} + \frac{(\Delta x^{(k)} - B_k^{-1} \Delta g^{(k)})(\Delta x^{(k)} - B_k^{-1} \Delta g^{(k)})^\top}{\Delta g^{(k)\top}(\Delta x^{(k)} - B_k^{-1} \Delta g^{(k)})}.
\end{aligned}$$
Substituting $H_k$ for $B_k^{-1}$, we get a formula identical to the Rank One update equation. This should not be surprising, since there is only one update equation involving a rank one correction that satisfies the quasi-Newton condition.
11.9
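We first compute the gradient $\nabla f$ and evaluate it at $x^{(0)}$,
$$\nabla f(x^{(0)}) = g^{(0)} = Qx^{(0)} - b = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
It is a nonzero vector, so we proceed with the first iteration. Let $H_0 = I_2$. Then,
$$d^{(0)} = -H_0 g^{(0)} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
The step size $\alpha_0$ is
$$\alpha_0 = -\frac{g^{(0)\top} d^{(0)}}{d^{(0)\top} Q d^{(0)}} = -\frac{\begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix}}{\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix}} = \frac{2}{3}.$$
Hence,
$$x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = \begin{bmatrix} 2/3 \\ -2/3 \end{bmatrix}.$$
We next evaluate the gradient at $x^{(1)}$ to obtain
$$\nabla f(x^{(1)}) = g^{(1)} = Qx^{(1)} - b = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 2/3 \\ -2/3 \end{bmatrix} - \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1/3 \\ -1/3 \end{bmatrix}.$$
It is a nonzero vector, so we proceed with the second iteration. We compute $H_1$, where
$$H_1 = H_0 + \frac{(\Delta x^{(0)} - H_0 \Delta g^{(0)})(\Delta x^{(0)} - H_0 \Delta g^{(0)})^\top}{\Delta g^{(0)\top}(\Delta x^{(0)} - H_0 \Delta g^{(0)})}.$$
To find $H_1$ we need
$$\Delta x^{(0)} = x^{(1)} - x^{(0)} = \begin{bmatrix} 2/3 \\ -2/3 \end{bmatrix} \quad \text{and} \quad \Delta g^{(0)} = g^{(1)} - g^{(0)} = \begin{bmatrix} 2/3 \\ -4/3 \end{bmatrix}.$$
Using the above, we determine
$$\Delta x^{(0)} - H_0 \Delta g^{(0)} = \begin{bmatrix} 0 \\ 2/3 \end{bmatrix} \quad \text{and} \quad \Delta g^{(0)\top}(\Delta x^{(0)} - H_0 \Delta g^{(0)}) = -\frac{8}{9}.$$
Then, we obtain
$$H_1 = H_0 + \frac{(\Delta x^{(0)} - H_0 \Delta g^{(0)})(\Delta x^{(0)} - H_0 \Delta g^{(0)})^\top}{\Delta g^{(0)\top}(\Delta x^{(0)} - H_0 \Delta g^{(0)})} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{-8/9} \begin{bmatrix} 0 & 0 \\ 0 & 4/9 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1/2 \end{bmatrix}$$
and
$$d^{(1)} = -H_1 g^{(1)} = \begin{bmatrix} 1/3 \\ 1/6 \end{bmatrix}.$$
Note that $d^{(0)\top} Q d^{(1)} = 0$, that is, $d^{(0)}$ and $d^{(1)}$ are $Q$-conjugate.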
11.10
The calculations are similar until we get to the second step:
$$H_1 = \begin{bmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{bmatrix}, \qquad d^{(0)} = 0.$$
So the algorithm gets stuck at this point, which illustrates that it does not work.
11.11
a. Since $f$ is quadratic, and $\alpha_k = \arg\min_{\alpha \ge 0} f(x^{(k)} + \alpha d^{(k)})$, then
$$\alpha_k = -\frac{g^{(k)\top} d^{(k)}}{d^{(k)\top} Q d^{(k)}}.$$
b. Now, $d^{(k)} = -H_k g^{(k)}$, where $H_k = H_k^\top > 0$. Substituting this into the formula for $\alpha_k$ in part a yields
$$\alpha_k = \frac{g^{(k)\top} H_k g^{(k)}}{d^{(k)\top} Q d^{(k)}} > 0.$$
11.12
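(Our solution to this problem is based on a solution that was furnished to us by Michael Mera, a student in ECE 580 at Purdue in Spring 2005.) To proceed, we recall the formula of Lemma 11.1,
$$(A + uv^\top)^{-1} = A^{-1} - \frac{(A^{-1}u)(v^\top A^{-1})}{1 + v^\top A^{-1} u}$$
for $1 + v^\top A^{-1} u \ne 0$. Recall the definitions from the hint,
$$A_0 = B_k, \qquad u_0 = \frac{\Delta g^{(k)}}{\Delta g^{(k)\top} \Delta x^{(k)}}, \qquad v_0^\top = \Delta g^{(k)\top},$$
and
$$A_1 = B_k + \frac{\Delta g^{(k)} \Delta g^{(k)\top}}{\Delta g^{(k)\top} \Delta x^{(k)}} = A_0 + u_0 v_0^\top, \qquad u_1 = -\frac{B_k \Delta x^{(k)}}{\Delta x^{(k)\top} B_k \Delta x^{(k)}}, \qquad v_1^\top = \Delta x^{(k)\top} B_k.$$
Using the above notation, we represent $B_{k+1}$ as
$$B_{k+1} = A_0 + u_0 v_0^\top + u_1 v_1^\top = A_1 + u_1 v_1^\top.$$
Applying Lemma 11.1 to the above gives
$$H_{k+1}^{BFGS} = (B_{k+1})^{-1} = (A_1 + u_1 v_1^\top)^{-1} = A_1^{-1} - \frac{A_1^{-1} u_1 v_1^\top A_1^{-1}}{1 + v_1^\top A_1^{-1} u_1}.$$
Substituting into the above the expression for $A_1^{-1}$ yields
$$H_{k+1}^{BFGS} = A_0^{-1} - \frac{A_0^{-1} u_0 v_0^\top A_0^{-1}}{1 + v_0^\top A_0^{-1} u_0} - \frac{\left( A_0^{-1} - \frac{A_0^{-1} u_0 v_0^\top A_0^{-1}}{1 + v_0^\top A_0^{-1} u_0} \right) u_1 v_1^\top \left( A_0^{-1} - \frac{A_0^{-1} u_0 v_0^\top A_0^{-1}}{1 + v_0^\top A_0^{-1} u_0} \right)}{1 + v_1^\top \left( A_0^{-1} - \frac{A_0^{-1} u_0 v_0^\top A_0^{-1}}{1 + v_0^\top A_0^{-1} u_0} \right) u_1}.$$
Note that $A_0 = B_k$. Hence, $A_0^{-1} = B_k^{-1} = H_k$. Using this and the notation introduced at the beginning of the solution, we obtain
$$H_{k+1}^{BFGS} = H_k - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} - \frac{\left( H_k - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right) \frac{-B_k \Delta x^{(k)} \Delta x^{(k)\top} B_k}{\Delta x^{(k)\top} B_k \Delta x^{(k)}} \left( H_k - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right)}{1 + \Delta x^{(k)\top} B_k \left( H_k - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right) \frac{-B_k \Delta x^{(k)}}{\Delta x^{(k)\top} B_k \Delta x^{(k)}}}.$$
We next perform some multiplications taking into account that $H_k = B_k^{-1}$ and hence
$$H_k B_k = B_k H_k = I_n.$$
We obtain
$$H_{k+1}^{BFGS} = H_k - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} - \frac{\left( I_n - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top}}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right) (-\Delta x^{(k)} \Delta x^{(k)\top}) \left( I_n - \frac{\Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right)}{\Delta x^{(k)\top} B_k \Delta x^{(k)} + \Delta x^{(k)\top} \left( B_k - \frac{\Delta g^{(k)} \Delta g^{(k)\top}}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right) (-\Delta x^{(k)})}.$$
We proceed with our manipulations. We first perform multiplications by $\Delta x^{(k)}$ and $\Delta x^{(k)\top}$ to obtain
$$H_{k+1}^{BFGS} = H_k - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} - \frac{-\left( \Delta x^{(k)} - \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} \Delta x^{(k)}}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right) \left( \Delta x^{(k)\top} - \frac{\Delta x^{(k)\top} \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \right)}{\Delta x^{(k)\top} B_k \Delta x^{(k)} - \Delta x^{(k)\top} B_k \Delta x^{(k)} + \frac{\Delta x^{(k)\top} \Delta g^{(k)} \Delta g^{(k)\top} \Delta x^{(k)}}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}}}.$$
Cancelling the terms in the denominator of the last term above and performing further multiplications gives
$$\begin{aligned}
H_{k+1}^{BFGS} = H_k &- \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \\
&+ \frac{\dfrac{H_k \Delta g^{(k)} (\Delta g^{(k)\top} \Delta x^{(k)})(\Delta x^{(k)\top} \Delta g^{(k)}) \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}}}{(\Delta x^{(k)\top} \Delta g^{(k)})(\Delta g^{(k)\top} \Delta x^{(k)})} \\
&+ \frac{\Delta x^{(k)} \Delta x^{(k)\top} \left( \Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)} \right)}{(\Delta x^{(k)\top} \Delta g^{(k)})(\Delta g^{(k)\top} \Delta x^{(k)})} \\
&- \frac{H_k \Delta g^{(k)} (\Delta g^{(k)\top} \Delta x^{(k)}) \Delta x^{(k)\top} + \Delta x^{(k)} (\Delta x^{(k)\top} \Delta g^{(k)}) \Delta g^{(k)\top} H_k}{(\Delta x^{(k)\top} \Delta g^{(k)})(\Delta g^{(k)\top} \Delta x^{(k)})}.
\end{aligned}$$
Further simplification of the third and the fifth terms on the right-hand side of the above equation gives
$$\begin{aligned}
H_{k+1}^{BFGS} = H_k &- \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} + \frac{H_k \Delta g^{(k)} \Delta g^{(k)\top} H_k}{\Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)}} \\
&+ \frac{\Delta x^{(k)} \Delta x^{(k)\top} \left( \Delta g^{(k)\top} \Delta x^{(k)} + \Delta g^{(k)\top} H_k \Delta g^{(k)} \right)}{(\Delta x^{(k)\top} \Delta g^{(k)})(\Delta g^{(k)\top} \Delta x^{(k)})} - \frac{H_k \Delta g^{(k)} \Delta x^{(k)\top} + \Delta x^{(k)} \Delta g^{(k)\top} H_k}{\Delta x^{(k)\top} \Delta g^{(k)}}.
\end{aligned}$$
Note that the second and the third terms cancel each other. We then represent the fourth term in an alternative manner to obtain
$$H_{k+1}^{BFGS} = H_k + \left( 1 + \frac{\Delta g^{(k)\top} H_k \Delta g^{(k)}}{\Delta g^{(k)\top} \Delta x^{(k)}} \right) \frac{\Delta x^{(k)} \Delta x^{(k)\top}}{\Delta x^{(k)\top} \Delta g^{(k)}} - \frac{H_k \Delta g^{(k)} \Delta x^{(k)\top} + \Delta x^{(k)} \Delta g^{(k)\top} H_k}{\Delta x^{(k)\top} \Delta g^{(k)}},$$
which is the desired BFGS update formula.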
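As a numerical sanity check of the derivation, one can verify that the BFGS update of the inverse approximation is indeed the inverse of the BFGS update of the Hessian approximation. The sketch below uses arbitrary illustrative data (the matrix B and the vectors dx and dg are our own choices, not from the text), chosen so that B is symmetric positive definite and dg'*dx > 0.

```matlab
% Numerical sanity check of the BFGS inverse-update formula.
% All data below are arbitrary illustrative values, not taken from the text.
B  = [4 1 0; 1 3 1; 0 1 2];      % symmetric positive definite B_k
H  = inv(B);                     % H_k = B_k^{-1}
dx = [1; -2; 0.5];               % plays the role of Delta x
dg = [2; -1; 1];                 % plays the role of Delta g (dg'*dx > 0)
% BFGS update of the Hessian approximation
Bnew = B + (dg*dg')/(dg'*dx) - (B*dx)*(dx'*B)/(dx'*B*dx);
% BFGS update of the inverse approximation (the formula derived above)
Hnew = H + (1 + (dg'*H*dg)/(dg'*dx))*(dx*dx')/(dx'*dg) ...
         - (H*dg*dx' + dx*dg'*H)/(dx'*dg);
err = norm(Hnew*Bnew - eye(3));  % should be at the level of rounding error
disp(err);
```

The product Hnew*Bnew should equal the identity up to rounding, and Hnew*dg should reproduce dx (the quasi-Newton condition).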
11.13
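The first step for both algorithms is clearly the same, since in either case we have
$$x^{(1)} = x^{(0)} - \alpha_0 g^{(0)}.$$
For the second step,
$$\begin{aligned}
d^{(1)} &= -H_1 g^{(1)} \\
&= -\left( I_n + \left( 1 + \frac{\Delta g^{(0)\top} \Delta g^{(0)}}{\Delta g^{(0)\top} \Delta x^{(0)}} \right) \frac{\Delta x^{(0)} \Delta x^{(0)\top}}{\Delta x^{(0)\top} \Delta g^{(0)}} - \frac{\Delta g^{(0)} \Delta x^{(0)\top} + (\Delta g^{(0)} \Delta x^{(0)\top})^\top}{\Delta g^{(0)\top} \Delta x^{(0)}} \right) g^{(1)} \\
&= -g^{(1)} - \left( 1 + \frac{\Delta g^{(0)\top} \Delta g^{(0)}}{\Delta g^{(0)\top} \Delta x^{(0)}} \right) \frac{\Delta x^{(0)} \Delta x^{(0)\top} g^{(1)}}{\Delta x^{(0)\top} \Delta g^{(0)}} + \frac{\Delta g^{(0)} \Delta x^{(0)\top} g^{(1)} + \Delta x^{(0)} \Delta g^{(0)\top} g^{(1)}}{\Delta g^{(0)\top} \Delta x^{(0)}}.
\end{aligned}$$
Since the line search is exact, we have
$$\Delta x^{(0)\top} g^{(1)} = \alpha_0 d^{(0)\top} g^{(1)} = 0.$$
Hence,
$$\begin{aligned}
d^{(1)} &= -g^{(1)} + \left( \frac{\Delta g^{(0)\top} g^{(1)}}{\Delta g^{(0)\top} \Delta x^{(0)}} \right) \Delta x^{(0)} \\
&= -g^{(1)} + \left( \frac{g^{(1)\top} \Delta g^{(0)}}{\Delta g^{(0)\top} d^{(0)}} \right) d^{(0)} \\
&= -g^{(1)} + \beta_0 d^{(0)},
\end{aligned}$$
where
$$\beta_0 = \frac{g^{(1)\top} \Delta g^{(0)}}{d^{(0)\top} \Delta g^{(0)}} = \frac{g^{(1)\top}(g^{(1)} - g^{(0)})}{d^{(0)\top}(g^{(1)} - g^{(0)})}$$
is the Hestenes-Stiefel update formula for $\beta_0$. Since $d^{(0)} = -g^{(0)}$ and $g^{(1)\top} g^{(0)} = 0$, we have
$$\beta_0 = \frac{g^{(1)\top}(g^{(1)} - g^{(0)})}{g^{(0)\top} g^{(0)}},$$
which is the Polak-Ribiere formula. Applying $g^{(1)\top} g^{(0)} = 0$ again, we get
$$\beta_0 = \frac{g^{(1)\top} g^{(1)}}{g^{(0)\top} g^{(0)}},$$
which is the Fletcher-Reeves formula.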
11.14
a. Suppose the three conditions hold whenever applied to a quadratic. We need to show that when applied to a quadratic, for $k = 0, \ldots, n-1$ and $i = 0, \ldots, k$, $H_{k+1} \Delta g^{(i)} = \Delta x^{(i)}$. For $i = k$, we have
$$\begin{aligned}
H_{k+1} \Delta g^{(k)} &= H_k \Delta g^{(k)} + U_k \Delta g^{(k)} && \text{by condition 1} \\
&= H_k \Delta g^{(k)} + \Delta x^{(k)} - H_k \Delta g^{(k)} && \text{by condition 2} \\
&= \Delta x^{(k)},
\end{aligned}$$
as required. For the rest of the proof ($i = 0, \ldots, k-1$), we use induction on $k$.
For $k = 0$, there is nothing to prove (covered by the $i = k$ case). So suppose the result holds for $k - 1$. To show the result for $k$, first fix $i \in \{0, \ldots, k-1\}$. We have
$$\begin{aligned}
H_{k+1} \Delta g^{(i)} &= H_k \Delta g^{(i)} + U_k \Delta g^{(i)} \\
&= \Delta x^{(i)} + U_k \Delta g^{(i)} && \text{by the induction hypothesis} \\
&= \Delta x^{(i)} + a^{(k)} \Delta x^{(k)\top} \Delta g^{(i)} + b^{(k)} \Delta g^{(k)\top} H_k \Delta g^{(i)} && \text{by condition 3.}
\end{aligned}$$
So it suffices to show that the second and third terms are both 0. For the second term,
$$\Delta x^{(k)\top} \Delta g^{(i)} = \Delta x^{(k)\top} Q \Delta x^{(i)} = \alpha_k \alpha_i d^{(k)\top} Q d^{(i)} = 0$$
because of the induction hypothesis, which implies $Q$-conjugacy (where $Q$ is the Hessian of the given quadratic). Similarly, for the third term,
$$\begin{aligned}
\Delta g^{(k)\top} H_k \Delta g^{(i)} &= \Delta g^{(k)\top} \Delta x^{(i)} && \text{by the induction hypothesis} \\
&= \Delta x^{(k)\top} Q \Delta x^{(i)} \\
&= \alpha_k \alpha_i d^{(k)\top} Q d^{(i)} \\
&= 0,
\end{aligned}$$
again because of the induction hypothesis, which implies $Q$-conjugacy. This completes the proof.
b. All three algorithms satisfy the conditions in part a. Condition 1 holds, as described in class. Condition 2 is straightforward to check for all three algorithms. For the rank-one and DFP algorithms, this is shown in the book. For BFGS, some simple matrix algebra establishes that it holds. Condition 3 holds by appropriate definition of the vectors $a^{(k)}$ and $b^{(k)}$. In particular, for the rank-one algorithm,
$$a^{(k)} = \frac{\Delta x^{(k)} - H_k \Delta g^{(k)}}{(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top \Delta g^{(k)}}, \qquad b^{(k)} = -\frac{\Delta x^{(k)} - H_k \Delta g^{(k)}}{(\Delta x^{(k)} - H_k \Delta g^{(k)})^\top \Delta g^{(k)}}.$$
For the DFP algorithm,
$$a^{(k)} = \frac{\Delta x^{(k)}}{\Delta x^{(k)\top} \Delta g^{(k)}}, \qquad b^{(k)} = -\frac{H_k \Delta g^{(k)}}{\Delta g^{(k)\top} H_k \Delta g^{(k)}}.$$
Finally, for the BFGS algorithm,
$$a^{(k)} = \left( 1 + \frac{\Delta g^{(k)\top} H_k \Delta g^{(k)}}{\Delta g^{(k)\top} \Delta x^{(k)}} \right) \frac{\Delta x^{(k)}}{\Delta x^{(k)\top} \Delta g^{(k)}} - \frac{H_k \Delta g^{(k)}}{\Delta g^{(k)\top} \Delta x^{(k)}}, \qquad b^{(k)} = -\frac{\Delta x^{(k)}}{\Delta g^{(k)\top} \Delta x^{(k)}}.$$
11.15
a. Suppose we apply the algorithm to a quadratic. Then, by the quasi-Newton property of DFP, we have $H_{k+1}^{DFP} \Delta g^{(i)} = \Delta x^{(i)}$, $0 \le i \le k$. The same holds for BFGS. Thus, for the given $H_k$, we have for $0 \le i \le k$,
$$\begin{aligned}
H_{k+1} \Delta g^{(i)} &= \phi H_{k+1}^{DFP} \Delta g^{(i)} + (1 - \phi) H_{k+1}^{BFGS} \Delta g^{(i)} \\
&= \phi \Delta x^{(i)} + (1 - \phi) \Delta x^{(i)} \\
&= \Delta x^{(i)},
\end{aligned}$$
which shows that the above algorithm is a quasi-Newton algorithm (and hence also a conjugate direction algorithm).
b. By Theorem 11.4 and the discussion on BFGS, we have $H_k^{DFP} > 0$ and $H_k^{BFGS} > 0$. Hence, for any $x \ne 0$,
$$x^\top H_k x = \phi x^\top H_k^{DFP} x + (1 - \phi) x^\top H_k^{BFGS} x > 0$$
since $\phi$ and $1 - \phi$ are nonnegative. Hence, $H_k > 0$, from which we conclude that the algorithm has the descent property if $\alpha_k$ is computed by line search (by Proposition 11.1).
11.16
To show the result, we will prove the following precise statement: In the quadratic case (with Hessian $Q$), suppose that $H_{k+1} \Delta g^{(i)} = \rho_i \Delta x^{(i)}$, $0 \le i \le k$, $k \le n - 1$. If $\alpha_i \ne 0$, $0 \le i \le k$, then $d^{(0)}, \ldots, d^{(k+1)}$ are $Q$-conjugate.
We proceed by induction. We begin with the $k = 0$ case: that $d^{(0)}$ and $d^{(1)}$ are $Q$-conjugate. Because $\alpha_0 \ne 0$, we can write $d^{(0)} = \Delta x^{(0)}/\alpha_0$. Hence,
$$\begin{aligned}
d^{(1)\top} Q d^{(0)} &= -g^{(1)\top} H_1 Q d^{(0)} \\
&= -g^{(1)\top} H_1 \frac{Q \Delta x^{(0)}}{\alpha_0} \\
&= -g^{(1)\top} \frac{H_1 \Delta g^{(0)}}{\alpha_0} \\
&= -g^{(1)\top} \frac{\rho_0 \Delta x^{(0)}}{\alpha_0} \\
&= -\rho_0 g^{(1)\top} d^{(0)}.
\end{aligned}$$
But $g^{(1)\top} d^{(0)} = 0$ as a consequence of $\alpha_0 > 0$ being the minimizer of $\phi(\alpha) = f(x^{(0)} + \alpha d^{(0)})$. Hence, $d^{(1)\top} Q d^{(0)} = 0$.
Assume that the result is true for $k - 1$ (where $k < n - 1$). We now prove the result for $k$, that is, that $d^{(0)}, \ldots, d^{(k+1)}$ are $Q$-conjugate. It suffices to show that $d^{(k+1)\top} Q d^{(i)} = 0$, $0 \le i \le k$. Given $i$, $0 \le i \le k$, using the same algebraic steps as in the $k = 0$ case, and using the assumption that $\alpha_i \ne 0$, we obtain
$$\begin{aligned}
d^{(k+1)\top} Q d^{(i)} &= -g^{(k+1)\top} H_{k+1} Q d^{(i)} \\
&\;\;\vdots \\
&= -\rho_i g^{(k+1)\top} d^{(i)}.
\end{aligned}$$
Because $d^{(0)}, \ldots, d^{(k)}$ are $Q$-conjugate by assumption, we conclude from the expanding subspace lemma (Lemma 10.2) that $g^{(k+1)\top} d^{(i)} = 0$. Hence, $d^{(k+1)\top} Q d^{(i)} = 0$, which completes the proof.
11.17
A MATLAB routine for the quasi-Newton algorithm with options for different formulas of $H_k$ is:
function [x,N]=quasi_newton(grad,xnew,H,options);
%    QUASI_NEWTON('grad',x0,H0);
%    QUASI_NEWTON('grad',x0,H0,OPTIONS);
%
%    x = QUASI_NEWTON('grad',x0,H0);
%    x = QUASI_NEWTON('grad',x0,H0,OPTIONS);
%
%    [x,N] = QUASI_NEWTON('grad',x0,H0);
%    [x,N] = QUASI_NEWTON('grad',x0,H0,OPTIONS);
%
%The first variant finds the minimizer of a function whose gradient
%is described in grad (usually an M-file: grad.m), using initial point
%x0 and initial inverse Hessian approximation H0.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results, (default is no display: 0).
%OPTIONS(2) is a measure of the precision required for the final point.
%OPTIONS(3) is a measure of the precision required of the gradient.
%OPTIONS(5) specifies the formula for the inverse Hessian update:
%   0=Rank One;
%   1=DFP;
%   2=BFGS;
%OPTIONS(14) is the maximum number of iterations.
%For more information type HELP FOPTIONS.
%
%The next two variants return the value of the final point.
%The last two variants return a vector of the final point and the
%number of iterations.
if nargin ~= 4
  options = [];
  if nargin ~= 3
    disp('Wrong number of arguments.');
    return;
  end
end
numvars = length(xnew);
if length(options) >= 14
  if options(14)==0
    options(14)=1000*numvars;
  end
else
  options(14)=1000*numvars;
end
clc;
format compact;
format short e;
options = foptions(options);
print = options(1);
epsilon_x = options(2);
epsilon_g = options(3);
max_iter=options(14);
reset_cnt = 0;
g_curr=feval(grad,xnew);
if norm(g_curr) <= epsilon_g
  disp('Terminating: Norm of initial gradient less than');
  disp(epsilon_g);
  return;
end %if
d=-H*g_curr;
for k = 1:max_iter,
  xcurr=xnew;
  alpha=linesearch_secant(grad,xcurr,d);
  xnew = xcurr+alpha*d;
  if print,
    disp('Iteration number k =')
    disp(k); %print iteration index k
    disp('alpha =');
    disp(alpha); %print alpha
    disp('Gradient = ');
    disp(g_curr'); %print gradient
    disp('New point =');
    disp(xnew'); %print new point
  end %if
  if norm(xnew-xcurr) <= epsilon_x*norm(xcurr)
    disp('Terminating: Norm of difference between iterates less than');
    disp(epsilon_x);
    break;
  end %if
  g_old=g_curr;
  g_curr=feval(grad,xnew);
  if norm(g_curr) <= epsilon_g
    disp('Terminating: Norm of gradient less than');
    disp(epsilon_g);
    break;
  end %if
  p=alpha*d;
  q=g_curr-g_old;
  reset_cnt = reset_cnt+1;
  if reset_cnt == 3*numvars
    d=-g_curr;
    reset_cnt = 0;
  else
    if options(5)==0 %Rank One
      %q'*(p-H*q) %uncomment to display the rank-one denominator
      H = H+(p-H*q)*(p-H*q)'/(q'*(p-H*q));
    elseif options(5)==1 %DFP
      H = H+p*p'/(p'*q)-(H*q)*(H*q)'/(q'*H*q);
    else %BFGS
      H = H+(1+q'*H*q/(q'*p))*p*p'/(p'*q)-(H*q*p'+(H*q*p')')/(q'*p);
    end %if
    d=-H*g_curr;
  end
  if print,
    disp('New H =');
    disp(H);
    disp('New d =');
    disp(d);
  end
  if k == max_iter
    disp('Terminating with maximum number of iterations');
  end %if
end %for
if nargout >= 1
  x=xnew;
  if nargout == 2
    N=k;
  end
else
  disp('Final point =');
  disp(xnew');
  disp('Number of iterations =');
  disp(k);
end %if
%------------------------------------------------------------------
We created the following M-file, g.m, for the gradient of Rosenbrock's function:
function y=g(x)
y=[-400*(x(2)-x(1).^2)*x(1)-2*(1-x(1)), 200*(x(2)-x(1).^2)]';
We tested the above routine as follows:
>> options(2)=10^(-7);
>> options(3)=10^(-7);
>> options(14)=100;
>> x0=[-2;2];
>> H0=eye(2);
>> options(5)=0;
>> quasi_newton('g',x0,H0,options);
Terminating: Norm of difference between iterates less than
1.0000e-07
Final point =
1.0000e+00
1.0000e+00
Number of iterations =
8
>> options(5)=1;
>> quasi_newton('g',x0,H0,options);
Terminating: Norm of difference between iterates less than
1.0000e-07
Final point =
1.0000e+00
1.0000e+00
Number of iterations =
8
>> options(5)=2;
>> quasi_newton('g',x0,H0,options);
Terminating: Norm of difference between iterates less than
1.0000e-07
Final point =
1.0000e+00
1.0000e+00
Number of iterations =
8
The reader is again cautioned not to draw any conclusions about the superiority or inferiority of any of the formulas for $H_k$ based only on the above single numerical experiment.
11.18
a. The plot of the level sets of $f$ was obtained using the following MATLAB commands:
>> [X,Y]=meshdom(-2:0.1:2, -1:0.1:3);
>> Z=X.^4/4+Y.^2/2 -X.*Y +X-Y;
>> V=[-0.72, -0.6, -0.2, 0.5, 2];
>> contour(X,Y,Z,V)
The plot is depicted below:
[Figure: level sets of f; horizontal axis x_1 from -2 to 2, vertical axis x_2 from -1 to 3.]
b. With the initial condition $[0, 0]^\top$, the algorithm converges to $[-1, 0]^\top$, while with the initial condition $[1.5, 1]^\top$, the algorithm converges to $[1, 2]^\top$. These two points are the two strict local minimizers of $f$ (as can be checked using the SOSC). The algorithm apparently converges to the minimizer "closer" to the initial point.
12. Solving Ax = b
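12.1
Write the least squares cost in the usual notation $\|Ax - b\|^2$ where
$$A = \begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}, \qquad x = \begin{bmatrix} m \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
The least squares estimate of the mass is
$$m^* = (A^\top A)^{-1} A^\top b = \frac{31}{70}.$$
12.2
Write the least squares cost in the usual notation $\|Ax - b\|^2$ where
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 4 \end{bmatrix}, \qquad x = \begin{bmatrix} a \\ b \end{bmatrix}, \qquad b = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}.$$
The least squares estimate for $[a, b]^\top$ is
$$\begin{aligned}
\begin{bmatrix} a^* \\ b^* \end{bmatrix} &= (A^\top A)^{-1} A^\top b \\
&= \begin{bmatrix} 3 & 7 \\ 7 & 21 \end{bmatrix}^{-1} \begin{bmatrix} 12 \\ 31 \end{bmatrix} \\
&= \frac{1}{14} \begin{bmatrix} 21 & -7 \\ -7 & 3 \end{bmatrix} \begin{bmatrix} 12 \\ 31 \end{bmatrix} \\
&= \frac{1}{14} \begin{bmatrix} 35 \\ 9 \end{bmatrix} \\
&= \begin{bmatrix} 5/2 \\ 9/14 \end{bmatrix}.
\end{aligned}$$
12.3
a. We form
$$A = \begin{bmatrix} 1^2/2 \\ 2^2/2 \\ 3^2/2 \end{bmatrix}, \qquad b = \begin{bmatrix} 5.00 \\ 19.5 \\ 44.0 \end{bmatrix}.$$
The least squares estimate of $g$ is then given by
$$g = (A^\top A)^{-1} A^\top b = 9.776.$$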
b. We start with $P_0 = 0.040816$ and $x^{(0)} = 9.776$. We have $a_1 = 4^2/2 = 8$ and $b^{(1)} = 78.5$. Using the RLS formula, we get $x^{(1)} = 9.802$, which is our updated estimate of $g$.
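The numbers in parts a and b can be checked with a few lines of MATLAB. This is a sketch only: the data are those quoted in the solution, the variable names are ours, and the update is written in the scalar form of the standard RLS recursion for a single new measurement.

```matlab
% Batch least squares followed by one recursive least squares (RLS) step.
% Data are those quoted in the solution; variable names are illustrative.
A0 = [1^2/2; 2^2/2; 3^2/2];
b0 = [5.00; 19.5; 44.0];
P0 = 1/(A0'*A0);                         % P_0 = (A'A)^{-1}, a scalar here
x0 = P0*(A0'*b0);                        % batch estimate of g
a1 = 4^2/2;                              % regressor of the new measurement
b1 = 78.5;                               % new measurement
P1 = P0 - (P0*a1*a1*P0)/(1 + a1*P0*a1);  % RLS update of P
x1 = x0 + P1*a1*(b1 - a1*x0);            % RLS update of the estimate
xbatch = ([A0; a1]'*[b0; b1])/([A0; a1]'*[A0; a1]);  % refit using all data
fprintf('%.6f %.3f %.3f %.3f\n', P0, x0, x1, xbatch);
```

The recursive estimate x1 should agree with the batch refit xbatch, which is the point of the RLS formula.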
12.4
Let $x = [x_1, x_2, \ldots, x_n]^\top$ and $y = [y_1, y_2, \ldots, y_n]^\top$. This least-squares estimation problem can be expressed as
$$\text{minimize} \quad \|\alpha x - y\|^2,$$
with $\alpha$ as the decision variable. Assuming that $x \ne 0$, the solution is unique and is given by
$$\alpha^* = (x^\top x)^{-1} x^\top y = \frac{x^\top y}{x^\top x} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}.$$
12.5
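The least squares estimate of $R$ is the least squares solution to
$$\begin{aligned}
1 \cdot R &= V_1 \\
&\;\;\vdots \\
1 \cdot R &= V_n.
\end{aligned}$$
Therefore, the least squares solution is
$$R^* = \left( [1, \ldots, 1] \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \right)^{-1} [1, \ldots, 1] \begin{bmatrix} V_1 \\ \vdots \\ V_n \end{bmatrix} = \frac{V_1 + \cdots + V_n}{n}.$$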
12.6
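We represent the data in the table and the decision variables $a$ and $b$ using the usual least squares matrix notation:
$$A = \begin{bmatrix} 1 & 2 \\ 1 & 1 \\ 3 & 2 \end{bmatrix}, \qquad b = \begin{bmatrix} 6 \\ 4 \\ 5 \end{bmatrix}, \qquad x = \begin{bmatrix} a \\ b \end{bmatrix}.$$
The least squares estimate is given by
$$x^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (A^\top A)^{-1} A^\top b = \begin{bmatrix} 11 & 9 \\ 9 & 9 \end{bmatrix}^{-1} \begin{bmatrix} 25 \\ 26 \end{bmatrix} = \frac{1}{18} \begin{bmatrix} 9 & -9 \\ -9 & 11 \end{bmatrix} \begin{bmatrix} 25 \\ 26 \end{bmatrix} = \begin{bmatrix} -1/2 \\ 61/18 \end{bmatrix}.$$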
12.7
The problem can be formulated as a least-squares problem with
$$A = \begin{bmatrix} 0.3 & 0.1 \\ 0.4 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}, \qquad b = \begin{bmatrix} 5 \\ 3 \\ 4 \end{bmatrix},$$
where the decision variable is $x = [x_1, x_2]^\top$, and $x_1$ and $x_2$ are the amounts of A and B, respectively. After some algebra, we obtain the solution:
$$x^* = (A^\top A)^{-1} A^\top b = \frac{1}{(0.34)(0.54) - (0.32)^2} \begin{bmatrix} 0.54 & -0.32 \\ -0.32 & 0.34 \end{bmatrix} \begin{bmatrix} 3.9 \\ 3.9 \end{bmatrix}.$$
Since we are only interested in the ratio of the first component of $x^*$ to the second component, we need only explicitly compute:
$$\text{Ratio} = \frac{0.54 - 0.32}{-0.32 + 0.34} = \frac{0.22}{0.02} = 11.$$
12.8
For each $k$, we can write
$$\begin{aligned}
y_k &= a y_{k-1} + b u_k + v_k \\
&= a^2 y_{k-2} + ab u_{k-1} + a v_{k-1} + b u_k + v_k \\
&\;\;\vdots \\
&= a^{k-1} b u_1 + a^{k-2} b u_2 + \cdots + b u_k + a^{k-1} v_1 + a^{k-2} v_2 + \cdots + v_k.
\end{aligned}$$
Write $u = [u_1, \ldots, u_n]^\top$, $v = [v_1, \ldots, v_n]^\top$, and $y = [y_1, \ldots, y_n]^\top$. Then, $y = Cu + Dv$, where
$$C = \begin{bmatrix} b & 0 & \cdots & 0 \\ ab & b & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ a^{n-1} b & \cdots & ab & b \end{bmatrix}, \qquad D = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ a & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ a^{n-1} & a^{n-2} & \cdots & 1 \end{bmatrix}.$$
Write $\boldsymbol{b} = D^{-1} y$ and $A = D^{-1} C$ so that $\boldsymbol{b} = Au + v$. Therefore, the linear least-squares estimate of $u$ given $y$ is
$$u^* = (A^\top A)^{-1} A^\top \boldsymbol{b} = (C^\top D^{-\top} D^{-1} C)^{-1} C^\top D^{-\top} D^{-1} y.$$
But $C = bD$. Hence,
$$u^* = \frac{1}{b} D^{-1} y = \frac{1}{b} \begin{bmatrix} 1 & 0 & \cdots & 0 \\ -a & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & -a & 1 \end{bmatrix} y.$$
(Notice that $D^{-1}$ has the simple form shown above.)
An alternative solution is first to define $z = [z_1, \ldots, z_n]^\top$ by $z_k = y_k - a y_{k-1}$. Then, we have $z = bu + v$. Therefore, the linear least-squares estimate of $u$ given $y$ (or, equivalently, $z$) is
$$u^* = \frac{1}{b} z = \frac{1}{b} \begin{bmatrix} 1 & 0 & \cdots & 0 \\ -a & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & -a & 1 \end{bmatrix} y.$$
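The claim that the inverse of D is bidiagonal is easy to confirm numerically. The sketch below uses arbitrary illustrative values for a and n (they are our assumptions, not data from the exercise).

```matlab
% Check that D^{-1} is bidiagonal for the lower-triangular Toeplitz matrix D.
% The values of a and n are arbitrary illustrative choices.
a = 0.7;
n = 5;
D = toeplitz(a.^(0:n-1), [1 zeros(1, n-1)]);  % D(i,j) = a^(i-j) for i >= j
Dinv = eye(n) - a*diag(ones(n-1, 1), -1);     % 1 on the diagonal, -a just below it
err = norm(D*Dinv - eye(n));                  % should be at the level of rounding error
disp(err);
```

Multiplying y by Dinv computes exactly the differences y_k - a*y_(k-1) used in the alternative solution.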
12.9
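Define
$$X = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_p & 1 \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix}.$$
Since the $x_i$ are not all equal, we have $\operatorname{rank} X = 2$. The objective function can be written as
$$f(a, b) = \left\| X \begin{bmatrix} a \\ b \end{bmatrix} - y \right\|^2.$$
Therefore, by Theorem 12.1 there exists a unique minimizer $[a^*, b^*]^\top$ given by
$$\begin{aligned}
\begin{bmatrix} a^* \\ b^* \end{bmatrix} &= (X^\top X)^{-1} X^\top y \\
&= \begin{bmatrix} \sum_{i=1}^p x_i^2 & \sum_{i=1}^p x_i \\ \sum_{i=1}^p x_i & p \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^p x_i y_i \\ \sum_{i=1}^p y_i \end{bmatrix} \\
&= \begin{bmatrix} \overline{X^2} & \overline{X} \\ \overline{X} & 1 \end{bmatrix}^{-1} \begin{bmatrix} \overline{XY} \\ \overline{Y} \end{bmatrix} \\
&= \frac{1}{\overline{X^2} - (\overline{X})^2} \begin{bmatrix} 1 & -\overline{X} \\ -\overline{X} & \overline{X^2} \end{bmatrix} \begin{bmatrix} \overline{XY} \\ \overline{Y} \end{bmatrix} \\
&= \begin{bmatrix} \dfrac{\overline{XY} - (\overline{X})(\overline{Y})}{\overline{X^2} - (\overline{X})^2} \\[2ex] \dfrac{(\overline{X^2})(\overline{Y}) - (\overline{X})(\overline{XY})}{\overline{X^2} - (\overline{X})^2} \end{bmatrix}.
\end{aligned}$$
As we can see, the solution does not depend on $\overline{Y^2}$.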
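The closed-form expressions can be compared against a direct least-squares solve. The sample data below are arbitrary illustrative values (an assumption of this sketch), and the bar quantities are computed as sample means.

```matlab
% Compare the closed-form line fit with a direct least-squares solve.
% The sample data are arbitrary illustrative values.
xs = [0; 1; 2; 4];
ys = [1; 3; 2; 6];
Xb  = mean(xs);      Yb  = mean(ys);          % sample means of x and y
X2b = mean(xs.^2);   XYb = mean(xs.*ys);      % sample means of x^2 and x*y
a = (XYb - Xb*Yb)/(X2b - Xb^2);               % slope from the closed form
b = (X2b*Yb - Xb*XYb)/(X2b - Xb^2);           % intercept from the closed form
ab = [xs ones(size(xs))] \ ys;                % direct least-squares solution
disp([a b; ab']);
```

Both rows of the displayed matrix should coincide; note that no mean of y squared is needed anywhere.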
12.10
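a. We wish to find $\omega$ and $\theta$ such that
$$\begin{aligned}
\sin(\omega t_1 + \theta) &= y_1 \\
&\;\;\vdots \\
\sin(\omega t_p + \theta) &= y_p.
\end{aligned}$$
Taking arcsin, we get the following system of linear equations:
$$\begin{aligned}
\omega t_1 + \theta &= \arcsin y_1 \\
&\;\;\vdots \\
\omega t_p + \theta &= \arcsin y_p.
\end{aligned}$$
b. We may write the system of linear equations in part a as $Ax = b$, where
$$A = \begin{bmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_p & 1 \end{bmatrix}, \qquad x = \begin{bmatrix} \omega \\ \theta \end{bmatrix}, \qquad b = \begin{bmatrix} \arcsin y_1 \\ \vdots \\ \arcsin y_p \end{bmatrix}.$$
Since the $t_i$ are not all equal, the first column of $A$ is not a scalar multiple of the second column. Therefore, $\operatorname{rank} A = 2$. Hence, the least squares solution is
$$\begin{aligned}
x &= (A^\top A)^{-1} A^\top b \\
&= \begin{bmatrix} \sum_{i=1}^p t_i^2 & \sum_{i=1}^p t_i \\ \sum_{i=1}^p t_i & p \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^p t_i \arcsin y_i \\ \sum_{i=1}^p \arcsin y_i \end{bmatrix} \\
&= \begin{bmatrix} \overline{T^2} & \overline{T} \\ \overline{T} & 1 \end{bmatrix}^{-1} \begin{bmatrix} \overline{TY} \\ \overline{Y} \end{bmatrix} \\
&= \frac{1}{\overline{T^2} - (\overline{T})^2} \begin{bmatrix} 1 & -\overline{T} \\ -\overline{T} & \overline{T^2} \end{bmatrix} \begin{bmatrix} \overline{TY} \\ \overline{Y} \end{bmatrix} \\
&= \frac{1}{\overline{T^2} - (\overline{T})^2} \begin{bmatrix} \overline{TY} - (\overline{T})(\overline{Y}) \\ -(\overline{T})(\overline{TY}) + (\overline{T^2})(\overline{Y}) \end{bmatrix}.
\end{aligned}$$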
12.11
The given line can be expressed as the range of the matrix $A = [1, m]^\top$. Let $b = [x_0, y_0]^\top$ be the given point. Therefore, the problem is a linear least squares problem of minimizing $\|Ax - b\|^2$. The solution is given by
$$x^* = (A^\top A)^{-1} A^\top b = \frac{x_0 + m y_0}{1 + m^2}.$$
Therefore, the point on the straight line that is closest to the given point $[x_0, y_0]^\top$ is given by $[x^*, m x^*]^\top$.
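A quick numerical illustration of this projection follows. The slope and the point are arbitrary illustrative values (assumptions of this sketch); the check is that the residual is orthogonal to the direction of the line.

```matlab
% Closest point on the line y = m*x to a given point, via least squares.
% The slope and the point are arbitrary illustrative values.
m  = 2;
p0 = [3; 1];                         % the given point [x0; y0]
A  = [1; m];                         % the line is the range of A
xs = (p0(1) + m*p0(2))/(1 + m^2);    % closed-form least-squares solution
closest  = A*xs;                     % the point [x*; m*x*] on the line
residual = p0 - closest;             % should be orthogonal to A
disp(closest');
disp(A'*residual);
```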
12.12
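a. Write
$$A = \begin{bmatrix} x_1^\top & 1 \\ \vdots & \vdots \\ x_p^\top & 1 \end{bmatrix} \in \mathbb{R}^{p \times (n+1)}, \qquad z = \begin{bmatrix} a \\ c \end{bmatrix} \in \mathbb{R}^{n+1}, \qquad b = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix} \in \mathbb{R}^p.$$
The objective function can then be written as $\|Az - b\|^2$.
b. Let $X = [x_1, \ldots, x_p]^\top \in \mathbb{R}^{p \times n}$, and $e = [1, \ldots, 1]^\top \in \mathbb{R}^p$. Then we may write $A = [X \; e]$. The solution to the problem is $(A^\top A)^{-1} A^\top b$. But
$$A^\top A = \begin{bmatrix} X^\top X & X^\top e \\ e^\top X & p \end{bmatrix} = \begin{bmatrix} X^\top X & 0 \\ 0^\top & p \end{bmatrix}$$
since $X^\top e = x_1 + \cdots + x_p = 0$ by assumption. Also, writing $y = b$,
$$A^\top y = \begin{bmatrix} X^\top y \\ e^\top y \end{bmatrix} = \begin{bmatrix} 0 \\ e^\top y \end{bmatrix}$$
since $X^\top y = y_1 x_1 + \cdots + y_p x_p = 0$ by assumption. Therefore, the solution is given by
$$z^* = (A^\top A)^{-1} A^\top b = \begin{bmatrix} (X^\top X)^{-1} & 0 \\ 0^\top & 1/p \end{bmatrix} \begin{bmatrix} 0 \\ e^\top y \end{bmatrix} = \begin{bmatrix} 0 \\ \frac{1}{p} e^\top y \end{bmatrix}.$$
The affine function of best fit is the constant function $f(x) = c$, where
$$c = \frac{1}{p} \sum_{i=1}^p y_i.$$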
12.13
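a. Using the least squares formula, we have
$$\hat{\theta}_n = \left( [u_1, \ldots, u_n] \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \right)^{-1} [u_1, \ldots, u_n] \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \frac{\sum_{k=1}^n u_k y_k}{\sum_{k=1}^n u_k^2}.$$
b. Given $u_k = 1$ for all $k$, we have
$$\hat{\theta}_n = \frac{1}{n} \sum_{k=1}^n y_k = \frac{1}{n} \sum_{k=1}^n (\theta + e_k) = \theta + \frac{1}{n} \sum_{k=1}^n e_k.$$
Hence, $\hat{\theta}_n \to \theta$ if and only if $\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n e_k = 0$.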
12.14
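We pose the problem as a least squares problem: minimize $\|Ax - b\|^2$ where $x = [a, b]^\top$, and
$$A = \begin{bmatrix} x_0 & 1 \\ x_1 & 1 \\ x_2 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
We have
$$A^\top A = \begin{bmatrix} \sum_{i=0}^2 x_i^2 & \sum_{i=0}^2 x_i \\ \sum_{i=0}^2 x_i & 3 \end{bmatrix}, \qquad A^\top b = \begin{bmatrix} \sum_{i=0}^2 x_i x_{i+1} \\ \sum_{i=0}^2 x_{i+1} \end{bmatrix}.$$
Therefore, the least squares solution is
$$\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=0}^2 x_i^2 & \sum_{i=0}^2 x_i \\ \sum_{i=0}^2 x_i & 3 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=0}^2 x_i x_{i+1} \\ \sum_{i=0}^2 x_{i+1} \end{bmatrix} = \begin{bmatrix} 5 & 3 \\ 3 & 3 \end{bmatrix}^{-1} \begin{bmatrix} 18 \\ 11 \end{bmatrix} = \begin{bmatrix} 7/2 \\ 1/6 \end{bmatrix}.$$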
12.15
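We pose the problem as a least squares problem: minimize $\|Ax - b\|^2$ where $x = [a, b]^\top$, and
$$A = \begin{bmatrix} 0 & 1 \\ h_1 & 0 \\ \vdots & \vdots \\ h_{n-1} & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix}$$
(note that $h_0 = 0$). We have
$$A^\top A = \begin{bmatrix} \sum_{i=1}^{n-1} h_i^2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad A^\top b = \begin{bmatrix} \sum_{i=1}^{n-1} h_i h_{i+1} \\ h_1 \end{bmatrix}.$$
The matrix $A^\top A$ is nonsingular because we assume that at least one $h_k$ is nonzero. Therefore, the least squares solution is
$$\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n-1} h_i^2 & 0 \\ 0 & 1 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^{n-1} h_i h_{i+1} \\ h_1 \end{bmatrix} = \begin{bmatrix} \left( \sum_{i=1}^{n-1} h_i h_{i+1} \right) / \left( \sum_{i=1}^{n-1} h_i^2 \right) \\ h_1 \end{bmatrix}.$$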
12.16
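We pose the problem as a least squares problem: minimize $\|Ax - b\|^2$ where $x = [a, b]^\top$, and
$$A = \begin{bmatrix} 0 & 1 \\ s_1 & 1 \\ \vdots & \vdots \\ s_{n-1} & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix}$$
(where we use $s_0 = 0$). We have
$$A^\top A = \begin{bmatrix} \sum_{i=1}^{n-1} s_i^2 & \sum_{i=1}^{n-1} s_i \\ \sum_{i=1}^{n-1} s_i & n \end{bmatrix}, \qquad A^\top b = \begin{bmatrix} \sum_{i=1}^{n-1} s_i s_{i+1} \\ \sum_{i=1}^{n} s_i \end{bmatrix}.$$
The matrix $A^\top A$ is nonsingular because we assume that at least one $s_k$ is nonzero. Therefore, the least squares solution is
$$\begin{aligned}
\begin{bmatrix} a \\ b \end{bmatrix} &= \begin{bmatrix} \sum_{i=1}^{n-1} s_i^2 & \sum_{i=1}^{n-1} s_i \\ \sum_{i=1}^{n-1} s_i & n \end{bmatrix}^{-1} \begin{bmatrix} \sum_{i=1}^{n-1} s_i s_{i+1} \\ \sum_{i=1}^{n} s_i \end{bmatrix} \\
&= \frac{1}{n \sum_{i=1}^{n-1} s_i^2 - \left( \sum_{i=1}^{n-1} s_i \right)^2} \begin{bmatrix} n \sum_{i=1}^{n-1} s_i s_{i+1} - \sum_{i=1}^{n-1} s_i \sum_{i=1}^{n} s_i \\ -\sum_{i=1}^{n-1} s_i \sum_{i=1}^{n-1} s_i s_{i+1} + \sum_{i=1}^{n-1} s_i^2 \sum_{i=1}^{n} s_i \end{bmatrix}.
\end{aligned}$$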
12.17
This least-squares estimation problem can be expressed as
$$\text{minimize} \quad \|ax - y\|^2.$$
If $x = 0$, then the problem has an infinite number of solutions: any $a$ solves the problem. Assuming that $x \ne 0$, the solution is unique and is given by
$$a^* = (x^\top x)^{-1} x^\top y = \frac{x^\top y}{x^\top x}.$$
12.18
The solution to this problem is the same as the solution to:
$$\begin{aligned}
\text{minimize} \quad & \tfrac{1}{2} \|x - b\|^2 \\
\text{subject to} \quad & x \in \mathcal{R}(A).
\end{aligned}$$
Substituting $x = Ay$, we see that this is simply a linear least squares problem with decision variable $y$. The solution to the least squares problem is $y^* = (A^\top A)^{-1} A^\top b$, which implies that the solution to the given problem is $x^* = A(A^\top A)^{-1} A^\top b$.
12.19
We solve the problem using two different methods. The first method would be to use the Lagrange multiplier
technique to solve the equivalent problem,
kx − x0 k2
h
subject to h(x) = 1
minimize
1
i
1 x − 1 = 0,
The lagrangian for the above problem has the form,
l(x, λ) = x21 + (x2 + 3)2 + x23 + λ(x1 + x2 + x3 − 1).
Applying the FONC gives


2x1 + λ


∇x l =  2x2 + 6 + λ
2x3 + λ
and x1 + x2 + x3 − 1 = 0.
Solving the above yields

x∗ =

4
 35 
− 3  .
4
3
The second approach is to use the well-known solution to the minimum norm problem. We first derive a
general solution formula for the problem,
minimize
kx − x0 k
subject to Ax = b,
where A ∈ Rm×n , m ≤ n, and rank A = m. To proceed, we first transform the above problem from the x
coordinates into the z = x − x0 coordinates to obtain,
minimize kzk
subject to Az = b − Ax0 .
The solution to the above problem has the form,
−1
z ∗ = A> AA>
(b − Ax0 )
−1
−1
= A> AA>
b − A> AA>
Ax0 .
Therefore, the solution to the original problem is
−1
x∗ = A> AA>
(b − Ax0 ) + x0
−1
−1
= A> AA>
b − A> AA>
Ax0 + x0
−1
−1 = A> AA>
b + I n − A> AA>
A x0 .
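We substitute into the above formula the given numerical data to obtain
$$
x^* = \begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix}
+ \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\ -\frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}
\begin{bmatrix} 0 \\ -3 \\ 0 \end{bmatrix}
= \begin{bmatrix} \frac{4}{3} \\ -\frac{5}{3} \\ \frac{4}{3} \end{bmatrix}.
$$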
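As a numerical cross-check of the closed-form expression, the following minimal sketch evaluates $x^* = A^\top(AA^\top)^{-1}b + (I_n - A^\top(AA^\top)^{-1}A)x_0$ in MATLAB. The data $A = [1\ 1\ 1]$, $b = 1$, $x_0 = [0, -3, 0]^\top$ are read off the constraint and the Lagrangian of this exercise; the variable names are illustrative only.

```matlab
% Minimum-distance point to x0 on the set {x : A*x = b}
% (data taken from the Lagrangian and constraint of Exercise 12.19)
A  = [1 1 1];
b  = 1;
x0 = [0; -3; 0];
n  = length(x0);

% x* = A'(AA')^{-1} b + (I - A'(AA')^{-1} A) x0
xstar = A'*((A*A')\b) + (eye(n) - A'*((A*A')\A))*x0;

disp(xstar');            % candidate solution
disp(A*xstar - b);       % constraint residual, should be (numerically) zero
```

Both approaches in the solution should agree with the vector displayed by this script up to rounding.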
12.20
For each $x \in \mathbb{R}^n$, let $y = x - x_0$. Then, the original problem is equivalent to
$$
\begin{array}{ll}
\text{minimize} & \|y\| \\
\text{subject to} & Ay = b - Ax_0,
\end{array}
$$
in the sense that $y^*$ is a solution to the above problem if and only if $x^* = y^* + x_0$ is a solution to the original problem. By Theorem 12.2, the above problem has a unique solution given by
$$y^* = A^\top (AA^\top)^{-1}(b - Ax_0) = A^\top (AA^\top)^{-1} b - A^\top (AA^\top)^{-1} A x_0.$$
Therefore, the solution to the original problem is
$$x^* = A^\top (AA^\top)^{-1} b - A^\top (AA^\top)^{-1} A x_0 + x_0 = A^\top (AA^\top)^{-1} b + \left(I_n - A^\top (AA^\top)^{-1} A\right) x_0.$$
Note that
$$
\begin{aligned}
\|x^* - x_0\| &= \|y^*\| \\
&= \|A^\top (AA^\top)^{-1}(b - Ax_0)\| \\
&= \|A^\top (AA^\top)^{-1} b - A^\top (AA^\top)^{-1} A x_0\|.
\end{aligned}
$$
12.21
The objective function of the given problem can be written as
$$f(x) = \|Bx - c\|^2,$$
where
$$
B = \begin{bmatrix} A \\ \vdots \\ A \end{bmatrix}, \qquad
c = \begin{bmatrix} b_1 \\ \vdots \\ b_p \end{bmatrix}.
$$
The solution is therefore
$$
x^* = (B^\top B)^{-1} B^\top c = (pA^\top A)^{-1} A^\top (b_1 + \cdots + b_p)
= \frac{1}{p} \sum_{i=1}^{p} (A^\top A)^{-1} A^\top b_i = \frac{1}{p} \sum_{i=1}^{p} x_i^*.
$$
Alternatively: Write
$$\|Ax - b_i\|^2 = x^\top A^\top A x - 2x^\top A^\top b_i + \|b_i\|^2.$$
Therefore, the given objective function can be written as
$$p\, x^\top A^\top A x - 2x^\top A^\top (b_1 + \cdots + b_p) + \|b_1\|^2 + \cdots + \|b_p\|^2.$$
The solution is therefore
$$x^* = (pA^\top A)^{-1} A^\top (b_1 + \cdots + b_p) = \frac{1}{p} \sum_{i=1}^{p} x_i^*.$$
Note that the original problem can be written as the least squares problem
$$\text{minimize} \quad \|Ax - b\|^2,$$
where
$$b = \frac{b_1 + \cdots + b_p}{p}.$$
12.22
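Write
$$\|Ax - b_i\|^2 = x^\top A^\top A x - 2x^\top A^\top b_i + \|b_i\|^2.$$
Therefore, the given objective function can be written as
$$(\alpha_1 + \cdots + \alpha_p)\, x^\top A^\top A x - 2x^\top A^\top (\alpha_1 b_1 + \cdots + \alpha_p b_p) + \alpha_1 \|b_1\|^2 + \cdots + \alpha_p \|b_p\|^2.$$
The solution is therefore (by inspection)
$$
x^* = \left((\alpha_1 + \cdots + \alpha_p) A^\top A\right)^{-1} A^\top (\alpha_1 b_1 + \cdots + \alpha_p b_p)
= \frac{1}{\alpha_1 + \cdots + \alpha_p} \sum_{i=1}^{p} \alpha_i x_i^* = \sum_{i=1}^{p} \beta_i x_i^*,
$$
where $\beta_i = \alpha_i / (\alpha_1 + \cdots + \alpha_p)$.
Note that the original problem can be written as the least squares problem
$$\text{minimize} \quad \|Ax - b\|^2,$$
where
$$b = \frac{\alpha_1 b_1 + \cdots + \alpha_p b_p}{\alpha_1 + \cdots + \alpha_p}.$$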
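The identity $x^* = \sum_i \beta_i x_i^*$ can be checked numerically. The sketch below uses small made-up data (the matrix `A`, the vectors `b1`, `b2`, and the weights `alpha` are hypothetical, not taken from the exercise) and compares the weighted-average formula against a direct solve of the stacked weighted least squares problem.

```matlab
% Hypothetical data: 3 measurements, 2 unknowns, p = 2 right-hand sides
A     = [1 0; 1 1; 1 2];
b1    = [1; 2; 2];
b2    = [0; 1; 3];
alpha = [1 3];                       % positive weights (assumed)

% Individual least squares solutions x_i^*
x1 = (A'*A)\(A'*b1);
x2 = (A'*A)\(A'*b2);

% Weighted average with beta_i = alpha_i / sum(alpha)
beta  = alpha/sum(alpha);
x_avg = beta(1)*x1 + beta(2)*x2;

% Direct solve: minimize alpha1*||Ax-b1||^2 + alpha2*||Ax-b2||^2
B = [sqrt(alpha(1))*A; sqrt(alpha(2))*A];
c = [sqrt(alpha(1))*b1; sqrt(alpha(2))*b2];
x_direct = (B'*B)\(B'*c);

disp(norm(x_avg - x_direct));        % should be close to zero
```

Setting all weights equal reduces this to the unweighted case of Exercise 12.21.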
12.23
Let $x^* = A^\top (AA^\top)^{-1} b$. Suppose $y^*$ is a point in $\mathcal{R}(A^\top)$ that satisfies $Ay^* = b$. Then, there exists $z^* \in \mathbb{R}^m$ such that $y^* = A^\top z^*$. Then, subtracting the equation $A(A^\top (AA^\top)^{-1} b) = b$ from the equation $A(A^\top z^*) = b$, we get
$$(AA^\top)(z^* - (AA^\top)^{-1} b) = 0.$$
Since $\operatorname{rank} A = m$, $AA^\top$ is nonsingular. Therefore, $z^* - (AA^\top)^{-1} b = 0$, which implies that
$$y^* = A^\top z^* = A^\top (AA^\top)^{-1} b = x^*.$$
Hence, $x^* = A^\top (AA^\top)^{-1} b$ is the only vector in $\mathcal{R}(A^\top)$ that satisfies $Ax^* = b$.
12.24
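a. We have
$$x^{(0)} = (A_0^\top A_0)^{-1} A_0^\top b^{(0)} = G_0^{-1} A_0^\top b^{(0)}.$$
Similarly,
$$x^{(1)} = (A_1^\top A_1)^{-1} A_1^\top b^{(1)} = G_1^{-1} A_1^\top b^{(1)}.$$
b. Now,
$$
G_0 = \begin{bmatrix} A_1^\top & a_1 \end{bmatrix} \begin{bmatrix} A_1 \\ a_1^\top \end{bmatrix}
= A_1^\top A_1 + a_1 a_1^\top = G_1 + a_1 a_1^\top.
$$
Hence,
$$G_1 = G_0 - a_1 a_1^\top.$$
c. Using the Sherman-Morrison formula,
$$
\begin{aligned}
P_1 &= G_1^{-1} \\
&= (G_0 - a_1 a_1^\top)^{-1} \\
&= G_0^{-1} - \frac{G_0^{-1}(-a_1) a_1^\top G_0^{-1}}{1 + (-a_1)^\top G_0^{-1} a_1} \\
&= P_0 + \frac{P_0 a_1 a_1^\top P_0}{1 - a_1^\top P_0 a_1}.
\end{aligned}
$$
d. We have
$$
\begin{aligned}
A_0^\top b^{(0)} &= G_0 G_0^{-1} A_0^\top b^{(0)} \\
&= G_0 x^{(0)} \\
&= (G_1 + a_1 a_1^\top) x^{(0)} \\
&= G_1 x^{(0)} + a_1 a_1^\top x^{(0)}.
\end{aligned}
$$
e. Finally,
$$
\begin{aligned}
x^{(1)} &= G_1^{-1} A_1^\top b^{(1)} \\
&= G_1^{-1}\left(A_1^\top b^{(1)} + a_1 b_1 - a_1 b_1\right) \\
&= G_1^{-1}\left(A_0^\top b^{(0)} - a_1 b_1\right) \\
&= G_1^{-1}\left(G_1 x^{(0)} + a_1 a_1^\top x^{(0)} - a_1 b_1\right) \\
&= x^{(0)} - G_1^{-1} a_1 \left(b_1 - a_1^\top x^{(0)}\right) \\
&= x^{(0)} - P_1 a_1 \left(b_1 - a_1^\top x^{(0)}\right).
\end{aligned}
$$
The general RLS algorithm for removals of rows is:
$$
\begin{aligned}
P_{k+1} &= P_k + \frac{P_k a_{k+1} a_{k+1}^\top P_k}{1 - a_{k+1}^\top P_k a_{k+1}}, \\
x^{(k+1)} &= x^{(k)} - P_{k+1} a_{k+1} \left(b_{k+1} - a_{k+1}^\top x^{(k)}\right).
\end{aligned}
$$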
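The row-removal update can be sanity-checked against a direct least squares solve. The following is a minimal sketch with hypothetical data (`A0`, `b0` are made up for illustration); it removes the last row of `A0`, applies the update for $P_1$ and $x^{(1)}$ derived above, and compares with the solution computed from the reduced system.

```matlab
% Hypothetical data: A0 = [A1; a1'], b0 = [b^(1); b1]
A0 = [1 0; 0 1; 1 1; 1 2];
b0 = [1; 2; 3; 4];

a1  = A0(end,:)';        % row being removed (as a column vector)
b1  = b0(end);
A1  = A0(1:end-1,:);     % remaining rows
bb1 = b0(1:end-1);

% Solution before removal
P0 = inv(A0'*A0);
x0 = P0*(A0'*b0);

% RLS update for removal of one row
P1 = P0 + (P0*a1)*(a1'*P0)/(1 - a1'*P0*a1);
x1 = x0 - P1*a1*(b1 - a1'*x0);

% Direct solve on the reduced system for comparison
x1_direct = (A1'*A1)\(A1'*bb1);
disp(norm(x1 - x1_direct));   % should be close to zero
```

The update requires $1 - a_1^\top P_0 a_1 \neq 0$, which holds here because the reduced matrix still has full column rank.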
12.25
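Using the notation of the proof of Theorem 12.3, we can write
$$
x^{(k+1)} = x^{(k)} + \mu \left(b_{R(k)+1} - a_{R(k)+1}^\top x^{(k)}\right) \frac{a_{R(k)+1}}{\|a_{R(k)+1}\|^2}.
$$
Hence, since $x^{(0)} = 0$,
$$
x^{(k)} = \sum_{i=0}^{k-1} \mu \left(b_{R(i)+1} - a_{R(i)+1}^\top x^{(i)}\right) \frac{a_{R(i)+1}}{\|a_{R(i)+1}\|^2},
$$
which means that $x^{(k)}$ is in $\operatorname{span}[a_1, \ldots, a_m] = \mathcal{R}(A^\top)$.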
12.26
a. We claim that $x^*$ minimizes $\|x - x^{(0)}\|$ subject to $\{x : Ax = b\}$ if and only if $y^* = x^* - x^{(0)}$ minimizes $\|y\|$ subject to $\{Ay = b - Ax^{(0)}\}$.
To prove sufficiency, suppose $y^*$ minimizes $\|y\|$ subject to $\{Ay = b - Ax^{(0)}\}$. Let $x^* = y^* + x^{(0)}$. Consider any point $x_1 \in \{x : Ax = b\}$. Now,
$$A(x_1 - x^{(0)}) = b - Ax^{(0)}.$$
Hence, by definition of $y^*$,
$$\|x_1 - x^{(0)}\| \geq \|y^*\| = \|x^* - x^{(0)}\|.$$
Therefore $x^*$ minimizes $\|x - x^{(0)}\|$ subject to $\{x : Ax = b\}$.
To prove necessity, suppose $x^*$ minimizes $\|x - x^{(0)}\|$ subject to $\{x : Ax = b\}$. Let $y^* = x^* - x^{(0)}$. Consider any point $y_1 \in \{y : Ay = b - Ax^{(0)}\}$. Now,
$$A(y_1 + x^{(0)}) = b.$$
Hence, by definition of $x^*$,
$$\|y_1\| = \|(y_1 + x^{(0)}) - x^{(0)}\| \geq \|x^* - x^{(0)}\| = \|y^*\|.$$
Therefore, $y^*$ minimizes $\|y\|$ subject to $\{Ay = b - Ax^{(0)}\}$.
By Theorem 12.2, there exists a unique vector $y^*$ minimizing $\|y\|$ subject to $\{Ay = b - Ax^{(0)}\}$. Hence, by the above claim, there exists a unique $x^*$ minimizing $\|x - x^{(0)}\|$ subject to $\{x : Ax = b\}$.
b. Using the notation of the proof of Theorem 12.3, Kaczmarz's algorithm is given by
$$x^{(k+1)} = x^{(k)} + \mu \left(b_{R(k)+1} - a_{R(k)+1}^\top x^{(k)}\right) a_{R(k)+1}.$$
Subtract $x^{(0)}$ from each side to give
$$
(x^{(k+1)} - x^{(0)}) = (x^{(k)} - x^{(0)}) + \mu \left( \left(b_{R(k)+1} - a_{R(k)+1}^\top x^{(0)}\right) - a_{R(k)+1}^\top (x^{(k)} - x^{(0)}) \right) a_{R(k)+1}.
$$
Writing $y^{(k)} = x^{(k)} - x^{(0)}$, we get
$$
y^{(k+1)} = y^{(k)} + \mu \left( \left(b_{R(k)+1} - a_{R(k)+1}^\top x^{(0)}\right) - a_{R(k)+1}^\top y^{(k)} \right) a_{R(k)+1}.
$$
Note that $y^{(0)} = 0$. By Theorem 12.3, the sequence $\{y^{(k)}\}$ converges to the unique point $y^*$ that minimizes $\|y\|$ subject to $\{Ay = b - Ax^{(0)}\}$. Hence $\{x^{(k)}\}$ converges to $y^* + x^{(0)}$. From the proof of part a, $x^* = y^* + x^{(0)}$ minimizes $\|x - x^{(0)}\|$ subject to $\{x : Ax = b\}$. This completes the proof.
12.27
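Following the proof of Theorem 12.3, assuming $\|a\| = 1$ without loss of generality, we arrive at
$$\|x^{(k+1)} - x^*\|^2 = \|x^{(k)} - x^*\|^2 - \mu(2 - \mu)\left(a^\top (x^{(k)} - x^*)\right)^2.$$
Since $x^{(k)}, x^* \in \mathcal{R}(A^\top) = \mathcal{R}(a)$ by Exercise 12.25, the difference $x^{(k)} - x^*$ is a scalar multiple of $a$. Hence, the Cauchy-Schwarz inequality holds with equality:
$$\left(a^\top (x^{(k)} - x^*)\right)^2 = \|a\|^2 \|x^{(k)} - x^*\|^2 = \|x^{(k)} - x^*\|^2,$$
since $\|a\| = 1$ by assumption. Thus, we obtain
$$\|x^{(k+1)} - x^*\|^2 = (1 - \mu(2 - \mu)) \|x^{(k)} - x^*\|^2 = \gamma^2 \|x^{(k)} - x^*\|^2,$$
where $\gamma = \sqrt{1 - \mu(2 - \mu)}$. Because $1 - \mu(2 - \mu) = (1 - \mu)^2$, it is easy to check that $0 \leq 1 - \mu(2 - \mu) < 1$ for all $\mu \in (0, 2)$. Hence, $0 \leq \gamma < 1$.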
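The contraction factor can be observed directly for a single-row system. The sketch below is illustrative only: the row `a`, the scalar `b`, and the step size `mu` are hypothetical values, and the update is written with the normalization by $\|a\|^2$ so that $a$ need not have unit norm.

```matlab
% Kaczmarz's algorithm for one equation a'*x = b (hypothetical data)
a  = [3; 4];
b  = 10;
mu = 0.5;

xstar = a*b/(a'*a);                 % minimum-norm solution A'(AA')^{-1} b
gamma = sqrt(1 - mu*(2 - mu));      % predicted error ratio

x = [0; 0];                         % x^(0) = 0
ratios = zeros(5,1);
for k = 1:5
    err_old = norm(x - xstar);
    x = x + mu*(b - a'*x)*a/(a'*a); % Kaczmarz update
    ratios(k) = norm(x - xstar)/err_old;
end

disp([ratios, gamma*ones(5,1)]);    % observed vs. predicted ratio
```

With these values each iteration should shrink the error by the same factor $\gamma$, which equals $|1 - \mu|$.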
12.28
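In Kaczmarz's algorithm with $\mu = 1$, we may write
$$
x^{(k+1)} = x^{(k)} + \left(b_{R(k)+1} - a_{R(k)+1}^\top x^{(k)}\right) \frac{a_{R(k)+1}}{\|a_{R(k)+1}\|^2}.
$$
Subtracting $x^*$ and premultiplying both sides by $a_{R(k)+1}^\top$ yields
$$
\begin{aligned}
a_{R(k)+1}^\top (x^{(k+1)} - x^*)
&= a_{R(k)+1}^\top \left( x^{(k)} - x^* + \left(b_{R(k)+1} - a_{R(k)+1}^\top x^{(k)}\right) \frac{a_{R(k)+1}}{\|a_{R(k)+1}\|^2} \right) \\
&= a_{R(k)+1}^\top x^{(k)} - a_{R(k)+1}^\top x^* + \left(b_{R(k)+1} - a_{R(k)+1}^\top x^{(k)}\right) \\
&= b_{R(k)+1} - a_{R(k)+1}^\top x^* \\
&= 0.
\end{aligned}
$$
Substituting $a_{R(k)+1}^\top x^* = b_{R(k)+1}$ yields the desired result.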
12.29
We will prove this by contradiction. Suppose $Cx^*$ is not the minimizer of $\|By - b\|^2$ over $\mathbb{R}^r$. Let $\hat{y}$ be the minimizer of $\|By - b\|^2$ over $\mathbb{R}^r$. Then, $\|B\hat{y} - b\|^2 < \|BCx^* - b\|^2 = \|Ax^* - b\|^2$. Since $C$ is of full rank, there exists $\hat{x} \in \mathbb{R}^n$ such that $\hat{y} = C\hat{x}$. Therefore,
$$\|A\hat{x} - b\|^2 = \|BC\hat{x} - b\|^2 = \|B\hat{y} - b\|^2 < \|Ax^* - b\|^2,$$
which contradicts the assumption that $x^*$ is a minimizer of $\|Ax - b\|^2$ over $\mathbb{R}^n$.
12.30
a. Let $A = BC$ be a full rank factorization of $A$. Now, we have $A^\dagger = C^\dagger B^\dagger$, where $B^\dagger = (B^\top B)^{-1} B^\top$ and $C^\dagger = C^\top (CC^\top)^{-1}$. On the other hand, $(A^\top)^\dagger = (C^\top B^\top)^\dagger$. Since $A^\top = C^\top B^\top$ is a full rank factorization of $A^\top$, we have $(A^\top)^\dagger = (C^\top B^\top)^\dagger = (B^\top)^\dagger (C^\top)^\dagger$. Therefore, to show that $(A^\top)^\dagger = (A^\dagger)^\top$, it is enough to show that
$$(B^\top)^\dagger = (B^\dagger)^\top, \qquad (C^\top)^\dagger = (C^\dagger)^\top.$$
To this end, note that $(B^\top)^\dagger = B(B^\top B)^{-1}$, and $(C^\top)^\dagger = (CC^\top)^{-1} C$. On the other hand, $(B^\dagger)^\top = ((B^\top B)^{-1} B^\top)^\top = B(B^\top B)^{-1}$, and $(C^\dagger)^\top = (C^\top (CC^\top)^{-1})^\top = (CC^\top)^{-1} C$, which completes the proof.
b. Note that $A^\dagger = C^\dagger B^\dagger$, which is a full rank factorization of $A^\dagger$. Therefore, $(A^\dagger)^\dagger = (B^\dagger)^\dagger (C^\dagger)^\dagger$. Hence, to show that $(A^\dagger)^\dagger = A$, it is enough to show that
$$(B^\dagger)^\dagger = B, \qquad (C^\dagger)^\dagger = C.$$
To this end, note that $(B^\dagger)^\dagger = ((B^\top B)^{-1} B^\top)^\dagger = B$ since $B^\dagger$ is a full rank matrix. Similarly, $(C^\dagger)^\dagger = (C^\top (CC^\top)^{-1})^\dagger = C$ since $C^\dagger$ is a full rank matrix. This completes the proof.
12.31
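⇒: We prove properties 1–4 in turn.
1. This is immediate.
2. Let $A = BC$ be a full rank factorization of $A$. We have $A^\dagger = C^\dagger B^\dagger$, where $B^\dagger = (B^\top B)^{-1} B^\top$ and $C^\dagger = C^\top (CC^\top)^{-1}$. Note that $B^\dagger B = I$ and $CC^\dagger = I$. Now,
$$A^\dagger A A^\dagger = C^\dagger B^\dagger B C C^\dagger B^\dagger = C^\dagger B^\dagger = A^\dagger.$$
3. We have
$$
\begin{aligned}
(AA^\dagger)^\top &= (BCC^\dagger B^\dagger)^\top \\
&= (BB^\dagger)^\top \\
&= (B^\dagger)^\top B^\top \\
&= ((B^\top B)^{-1} B^\top)^\top B^\top \\
&= B(B^\top B)^{-1} B^\top \\
&= BB^\dagger \\
&= BCC^\dagger B^\dagger \\
&= AA^\dagger.
\end{aligned}
$$
4. We have
$$
\begin{aligned}
(A^\dagger A)^\top &= (C^\dagger B^\dagger B C)^\top \\
&= (C^\dagger C)^\top \\
&= C^\top (C^\dagger)^\top \\
&= C^\top (C^\top (CC^\top)^{-1})^\top \\
&= C^\top (CC^\top)^{-1} C \\
&= C^\dagger C \\
&= C^\dagger B^\dagger B C \\
&= A^\dagger A.
\end{aligned}
$$
⇐: By property 1, we immediately have $AA^\dagger A = A$. Therefore, it remains to show that there exist matrices $U$ and $V$ such that $A^\dagger = UA^\top$ and $A^\dagger = A^\top V$.
For this, we note from property 2 that $A^\dagger = A^\dagger A A^\dagger$. But from property 3, $AA^\dagger = (AA^\dagger)^\top = (A^\dagger)^\top A^\top$. Hence, $A^\dagger = A^\dagger (A^\dagger)^\top A^\top$. Setting $U = A^\dagger (A^\dagger)^\top$, we get that $A^\dagger = UA^\top$.
Similarly, we note from property 4 that $A^\dagger A = (A^\dagger A)^\top = A^\top (A^\dagger)^\top$. Substituting this back into property 2 yields $A^\dagger = A^\dagger A A^\dagger = A^\top (A^\dagger)^\top A^\dagger$. Setting $V = (A^\dagger)^\top A^\dagger$ yields $A^\dagger = A^\top V$. This completes the proof.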
12.32
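(Taken from [23, p. 24]) Let
$$
A_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
A_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
$$
We compute
$$
A_1^\dagger = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & -1 \end{bmatrix}, \qquad
A_2^\dagger = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = A_2.
$$
We have
$$
A_1 A_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \end{bmatrix},
$$
which is a full rank factorization. Therefore,
$$
(A_1 A_2)^\dagger = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1/2 & 1/2 \\ 0 & 0 & 0 \end{bmatrix}.
$$
But
$$
A_2^\dagger A_1^\dagger = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
$$
Hence, $(A_1 A_2)^\dagger \neq A_2^\dagger A_1^\dagger$.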
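The counterexample can be reproduced numerically with MATLAB's built-in `pinv`. The sketch below simply evaluates both sides for the matrices $A_1$ and $A_2$ of this exercise; the variable names `lhs` and `rhs` are illustrative.

```matlab
% Matrices from Exercise 12.32
A1 = [0 0 0; 0 1 1; 0 1 0];
A2 = [1 0 0; 0 1 0; 0 0 0];

lhs = pinv(A1*A2);         % (A1*A2)^dagger
rhs = pinv(A2)*pinv(A1);   % A2^dagger * A1^dagger

disp(lhs);
disp(rhs);
disp(norm(lhs - rhs));     % nonzero: the two matrices differ
```

The reverse-order rule does hold when the product is a full rank factorization, which is why it can be used for $A = BC$ in Exercises 12.30 and 12.31.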
13. Unconstrained Optimization and Feedforward Neural Networks
13.1
a. The gradient of $f$ is given by
$$\nabla f(w) = -X_d (y_d - X_d^\top w).$$
b. The Conjugate Gradient algorithm applied to our training problem is:
1. Set $k := 0$; select the initial point $w^{(0)}$.
2. $g^{(0)} = -X_d (y_d - X_d^\top w^{(0)})$. If $g^{(0)} = 0$, stop, else set $d^{(0)} = -g^{(0)}$.
3. $\alpha_k = -\dfrac{g^{(k)\top} d^{(k)}}{d^{(k)\top} X_d X_d^\top d^{(k)}}$
4. $w^{(k+1)} = w^{(k)} + \alpha_k d^{(k)}$
5. $g^{(k+1)} = -X_d (y_d - X_d^\top w^{(k+1)})$. If $g^{(k+1)} = 0$, stop.
6. $\beta_k = \dfrac{g^{(k+1)\top} X_d X_d^\top d^{(k)}}{d^{(k)\top} X_d X_d^\top d^{(k)}}$
7. $d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}$
8. Set $k := k + 1$; go to 3.
c. We form the matrix $X_d$ as
$$
X_d = \begin{bmatrix}
-0.5 & -0.5 & -0.5 & 0 & 0 & 0 & 0.5 & 0.5 & 0.5 \\
-0.5 & 0 & 0.5 & -0.5 & 0 & 0.5 & -0.5 & 0 & 0.5
\end{bmatrix}
$$
and the vector $y_d$ as
$$y_d = [-0.42074, -0.47943, -0.42074, 0, 0, 0, 0.42074, 0.47943, 0.42074]^\top.$$
Running the Conjugate Gradient algorithm, we get a solution of $w^* = [0.8806, 0.000]^\top$.
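The steps of part b can be sketched in a few lines of MATLAB, using the data of part c. This is an illustrative script rather than the routine used to produce the reported result: the starting point `w = [0; 0]`, the iteration cap, and the stopping tolerance are assumptions.

```matlab
% Training data from part c
Xd = [-0.5 -0.5 -0.5  0   0  0    0.5 0.5 0.5;
      -0.5  0    0.5 -0.5 0  0.5 -0.5 0   0.5];
yd = [-0.42074 -0.47943 -0.42074 0 0 0 0.42074 0.47943 0.42074]';

Q = Xd*Xd';                    % Hessian of the quadratic objective
w = [0; 0];                    % assumed initial point
g = -Xd*(yd - Xd'*w);          % gradient at w
d = -g;                        % first search direction

for k = 1:10                   % assumed iteration cap
    if norm(g) < 1e-10         % assumed stopping tolerance
        break;
    end
    alpha = -(g'*d)/(d'*Q*d);  % exact step size for a quadratic
    w = w + alpha*d;
    g = -Xd*(yd - Xd'*w);      % new gradient
    beta = (g'*Q*d)/(d'*Q*d);  % conjugacy coefficient
    d = -g + beta*d;           % new search direction
end

disp(w');
```

For this data set $X_d X_d^\top$ is a multiple of the identity, so the method should terminate after a single step.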
d. The level sets are shown in the figure below.
[Figure: level sets of the objective function in the $(w_1, w_2)$ plane.]
The solution in part c agrees with the level sets.
e. The plot of the error function is depicted below.
[Figure: plot of the error as a function of $x_1$ and $x_2$.]
13.2
a. The expression we seek is
$$e_{k+1} = (1 - \mu) e_k.$$
To derive the above, we write
$$
\begin{aligned}
e_{k+1} - e_k &= \left(y_d - x_d^\top w^{(k+1)}\right) - \left(y_d - x_d^\top w^{(k)}\right) \\
&= -x_d^\top \left(w^{(k+1)} - w^{(k)}\right).
\end{aligned}
$$
Substituting for $w^{(k+1)} - w^{(k)}$ from the Widrow-Hoff algorithm yields
$$e_{k+1} - e_k = -\mu x_d^\top \frac{e_k x_d}{x_d^\top x_d} = -\mu e_k.$$
Hence, $e_{k+1} = (1 - \mu) e_k$.
b. For $e_k \to 0$, it is necessary and sufficient that $|1 - \mu| < 1$, which is equivalent to $0 < \mu < 2$.
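The recursion $e_{k+1} = (1 - \mu) e_k$ is easy to observe numerically. The sketch below runs the Widrow-Hoff update for a single training pair; the input vector `xd`, the target `yd`, and the step size `mu` are hypothetical values chosen for illustration.

```matlab
% Widrow-Hoff update for a single training pair (hypothetical data)
xd = [1; 2; -1];
yd = 0.7;
mu = 0.4;

w = zeros(3,1);                    % initial weights
e = zeros(6,1);
for k = 1:6
    e(k) = yd - xd'*w;             % error before the update
    w = w + mu*e(k)*xd/(xd'*xd);   % Widrow-Hoff step
end

disp(e(2:end)./e(1:end-1));        % each ratio should equal 1 - mu
```

Choosing `mu` outside the interval (0, 2) makes the ratio exceed one in magnitude, so the error no longer decays.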
13.3
a. The error satisfies
$$e^{(k+1)} = (I_p - \mu) e^{(k)}.$$
To derive the above expression, we write
$$
\begin{aligned}
e^{(k+1)} - e^{(k)} &= \left(y_d - X_d^\top w^{(k+1)}\right) - \left(y_d - X_d^\top w^{(k)}\right) \\
&= -X_d^\top \left(w^{(k+1)} - w^{(k)}\right).
\end{aligned}
$$
Substituting for $w^{(k+1)} - w^{(k)}$ from the algorithm yields
$$e^{(k+1)} - e^{(k)} = -X_d^\top X_d (X_d^\top X_d)^{-1} \mu e^{(k)} = -\mu e^{(k)}.$$
Hence, $e^{(k+1)} = (I_p - \mu) e^{(k)}$.
b. From part a, we see that $e^{(k)} = (I_p - \mu)^k e^{(0)}$. Hence, by Lemma 5.1, a necessary and sufficient condition for $e^{(k)} \to 0$ for any $e^{(0)}$ is that all the eigenvalues of $I_p - \mu$ must be located in the open unit circle. From Exercise 3.6, it follows that the above condition holds if and only if $|1 - \lambda_i(\mu)| < 1$ for each eigenvalue $\lambda_i(\mu)$ of $\mu$. This is true if and only if $0 < |\lambda_i(\mu)| < 2$ for each eigenvalue $\lambda_i(\mu)$ of $\mu$.
13.4
We modified the MATLAB routine of Exercise 8.25, by fixing the step size at a value η = 100. We need the
following M-file for the gradient:
function y=Dfbp(w);
wh11=w(1);
wh21=w(2);
wh12=w(3);
wh22=w(4);
wo11=w(5);
wo12=w(6);
xd1=0; xd2=1; yd=1;
v1=wh11*xd1+wh12*xd2;
v2=wh21*xd1+wh22*xd2;
z1=sigmoid(v1);
z2=sigmoid(v2);
y1=sigmoid(wo11*z1+wo12*z2)
d1=(yd-y1)*y1*(1-y1);
y(1)=-d1*wo11*z1*(1-z1)*xd1;
y(2)=-d1*wo12*z2*(1-z2)*xd1;
y(3)=-d1*wo11*z1*(1-z1)*xd2;
y(4)=-d1*wo12*z2*(1-z2)*xd2;
y(5)=-d1*z1;
y(6)=-d1*z2;
y=y';
After 20 iterations of the backpropagation algorithm, we get the following weights:
$$
\begin{aligned}
w_{11}^{o(20)} &= 2.883 \\
w_{12}^{o(20)} &= 3.194 \\
w_{11}^{h(20)} &= 0.1000 \\
w_{12}^{h(20)} &= 0.8179 \\
w_{21}^{h(20)} &= 0.3000 \\
w_{22}^{h(20)} &= 1.106.
\end{aligned}
$$
The corresponding output of the network is $y_1^{(20)} = 0.9879$.
13.5
We used the following MATLAB routine:
function [x,N]=backprop(grad,xnew,options);
%    BACKPROP('grad',x0);
%    BACKPROP('grad',x0,OPTIONS);
%
%    x = BACKPROP('grad',x0);
%    x = BACKPROP('grad',x0,OPTIONS);
%
%    [x,N] = BACKPROP('grad',x0);
%    [x,N] = BACKPROP('grad',x0,OPTIONS);
%
%The first variant trains a net whose gradient
%is described in grad (usually an M-file: grad.m), using a backprop
%algorithm with initial point x0.
%The second variant allows a vector of optional parameters to be
%defined. OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results, (default is no display: 0).
%OPTIONS(2) is a measure of the precision required for the final point.
%OPTIONS(3) is a measure of the precision required of the gradient.
%OPTIONS(14) is the maximum number of iterations.
%For more information type HELP FOPTIONS.
%
%The next two variants return the value of the final point.
%The last two variants return a vector of the final point and the
%number of iterations.
if nargin ~= 3
  options = [];
  if nargin ~= 2
    disp('Wrong number of arguments.');
    return;
  end
end
if length(options) >= 14
  if options(14)==0
    options(14)=1000*length(xnew);
  end
else
  options(14)=1000*length(xnew);
end
clc;
format compact;
format short e;
options = foptions(options);
print = options(1);
epsilon_x = options(2);
epsilon_g = options(3);
max_iter=options(14);
for k = 1:max_iter,
  xcurr=xnew;
  g_curr=feval(grad,xcurr);
  if norm(g_curr) <= epsilon_g
    disp('Terminating: Norm of gradient less than');
    disp(epsilon_g);
    k=k-1;
    break;
  end %if
  alpha=10.0; %fixed step size
  xnew = xcurr-alpha*g_curr;
  if print,
    disp('Iteration number k =')
    disp(k); %print iteration index k
    disp('alpha =');
    disp(alpha); %print alpha
    disp('Gradient = ');
    disp(g_curr'); %print gradient
    disp('New point =');
    disp(xnew'); %print new point
  end %if
  if norm(xnew-xcurr) <= epsilon_x*norm(xcurr)
    disp('Terminating: Norm of difference between iterates less than');
    disp(epsilon_x);
    break;
  end %if
  if k == max_iter
    disp('Terminating with maximum number of iterations');
  end %if
end %for
if nargout >= 1
  x=xnew;
  if nargout == 2
    N=k;
  end
else
  disp('Final point =');
  disp(xnew');
  disp('Number of iterations =');
  disp(k);
end %if
%------------------------------------------------------------------
To apply the above routine, we need the following M-file for the gradient.
function y=grad(w,xd,yd);
wh11=w(1);
wh21=w(2);
wh12=w(3);
wh22=w(4);
wo11=w(5);
wo12=w(6);
t1=w(7);
t2=w(8);
t3=w(9);
xd1=xd(1); xd2=xd(2);
v1=wh11*xd1+wh12*xd2-t1;
v2=wh21*xd1+wh22*xd2-t2;
z1=sigmoid(v1);
z2=sigmoid(v2);
y1=sigmoid(wo11*z1+wo12*z2-t3);
d1=(yd-y1)*y1*(1-y1);
y(1)=-d1*wo11*z1*(1-z1)*xd1;
y(2)=-d1*wo12*z2*(1-z2)*xd1;
y(3)=-d1*wo11*z1*(1-z1)*xd2;
y(4)=-d1*wo12*z2*(1-z2)*xd2;
y(5)=-d1*z1;
y(6)=-d1*z2;
y(7)=d1*wo11*z1*(1-z1);
y(8)=d1*wo12*z2*(1-z2);
y(9)=d1;
y=y';
We applied our MATLAB routine as follows.
>> options(2)=10^(-7);
>> options(3)=10^(-7);
>> options(14)=10000;
>> w0=[0.1,0.3,0.3,0.4,0.4,0.6,0.1,0.1,-0.1]';
>> [wstar,N]=backprop('grad',w0,options)
Terminating with maximum number of iterations
wstar =
-7.7771e+00
-5.5932e+00
-8.4027e+00
-5.6384e+00
-1.1010e+01
1.0918e+01
-3.2773e+00
-8.3565e+00
5.2606e+00
N =
10000
As we can see from the above, the results coincide with Example 13.3. The table of the outputs of the
trained network corresponding to the training input data is shown in Table 13.2.
14. Global Search Algorithms
14.1
The MATLAB program is as follows.
function [ output_args ] = nm_simplex( input_args )
%Nelder-Mead simplex method
%Based on the program by the Spring 2007 ECE580 student, Hengzhou Ding
disp('We minimize a function using the Nelder-Mead method.')
disp('There are two initial conditions.')
disp('You can enter your own starting point.')
disp('---------------------------------------------')
% disp('Select one of the starting points')
% disp ('[0.55;0.7] or [-0.9;-0.5]')
% x0=input('')
disp(' ')
clear
close all;
disp('Select one of the starting points, or enter your own point')
disp('[0.55;0.7] or [-0.9;-0.5]')
disp('(Copy one of the above points and paste it at the prompt)')
x0=input('')
hold on
axis square
%Plot the contours of the objective function
[X1,X2]=meshgrid(-1:0.01:1);
Y=(X2-X1).^4+12.*X1.*X2-X1+X2-3;
[C,h] = contour(X1,X2,Y,20);
clabel(C,h);
% Initialize all parameters
lambda=0.1;
rho=1;
chi=2;
gamma=1/2;
sigma=1/2;
e1=[1 0]';
e2=[0 1]';
%x0=[0.55 0.7]';
%x0=[-0.9 -0.5]';
% Plot initial point and initialize the simplex
plot(x0(1),x0(2),'--*');
x(:,3)=x0;
x(:,1)=x0+lambda*e1;
x(:,2)=x0+lambda*e2;
while 1
  % Check the size of simplex for stopping criterion
  simpsize=norm(x(:,1)-x(:,2))+norm(x(:,2)-x(:,3))+norm(x(:,3)-x(:,1));
  if(simpsize<1e-6)
    break;
  end
  lastpt=x(:,3);
  % Sort the simplex (best vertex first, worst vertex last)
  x=sort_points(x,3);
  % Reflection
  centro=1/2*(x(:,1)+x(:,2));
  xr=centro+rho*(centro-x(:,3));
  % Accept condition
  if(obj_fun(xr)>=obj_fun(x(:,1)) && obj_fun(xr)<obj_fun(x(:,2)))
    x(:,3)=xr;
  % Expand condition
  elseif(obj_fun(xr)<obj_fun(x(:,1)))
    xe=centro+rho*chi*(centro-x(:,3));
    if(obj_fun(xe)<obj_fun(xr))
      x(:,3)=xe;
    else
      x(:,3)=xr;
    end
  % Outside contraction or shrink
  elseif(obj_fun(xr)>=obj_fun(x(:,2)) && obj_fun(xr)<obj_fun(x(:,3)))
    xc=centro+gamma*rho*(centro-x(:,3));
    if(obj_fun(xc)<obj_fun(x(:,3)))
      x(:,3)=xc;
    else
      x=shrink(x,sigma);
    end
  % Inside contraction or shrink
  else
    xcc=centro-gamma*(centro-x(:,3));
    if(obj_fun(xcc)<obj_fun(x(:,3)))
      x(:,3)=xcc;
    else
      x=shrink(x,sigma);
    end
  end
  % Plot the new point and connect
  plot([lastpt(1),x(1,3)],[lastpt(2),x(2,3)],'--*');
end
% Output the final simplex (minimizer)
x(:,1)
% obj_fun
function y = obj_fun(x)
y=(x(1)-x(2))^4+12*x(1)*x(2)-x(1)+x(2)-3;
% sort_points (bubble sort of the vertices by objective value)
function y = sort_points(x,N)
for i=1:(N-1)
  for j=1:(N-i)
    if(obj_fun(x(:,j))>obj_fun(x(:,j+1)))
      tmp=x(:,j);
      x(:,j)=x(:,j+1);
      x(:,j+1)=tmp;
    end
  end
end
y=x;
% shrink (move the two worst vertices toward the best one)
function y = shrink(x,sigma)
x(:,2)=x(:,1)+sigma*(x(:,2)-x(:,1));
x(:,3)=x(:,1)+sigma*(x(:,3)-x(:,1));
y=x;
When we run the MATLAB code above with initial condition $[0.55, 0.7]^\top$, we obtain the following plot:
[Figure: labeled contours of the objective function with the trajectory of the simplex started from $[0.55, 0.7]^\top$.]
When we run the MATLAB code above with initial condition $[-0.9, -0.5]^\top$, we obtain the following plot:
[Figure: labeled contours of the objective function with the trajectory of the simplex started from $[-0.9, -0.5]^\top$.]
Note that this function has two local minimizers. The algorithm terminates at these two minimizers with
the two different initial conditions. This behavior depends on the value of lambda, which determines the
initial simplex. It is possible to reach both minimizers starting from the same initial point by using different
values of lambda. In the solution above, the initial simplex is “small”—lambda is just 0.1.
14.2
A MATLAB routine for a naive random search algorithm is given by the M-file rs_demo shown below:
function [x,N]=random_search(funcname,xnew,options);
%Naive random search demo
% [x,N]=random_search(funcname,xnew,options);
% print = options(1);
% alpha = options(18);
if nargin ~= 3
  options = [];
  if nargin ~= 2
    disp('Wrong number of arguments.');
    return;
  end
end
if length(options) >= 14
  if options(14)==0
    options(14)=1000*length(xnew);
  end
else
  options(14)=1000*length(xnew);
end
if length(options) < 18
  options(18)=1.0; %optional step size
end
format compact;
format short e;
options = foptions(options);
print = options(1);
epsilon_x = options(2);
epsilon_g = options(3);
max_iter=options(14);
alpha0 = options(18);
% strcmp is used because == fails for strings of different lengths
if strcmp(funcname,'f_r'),
  ros_cnt
elseif strcmp(funcname,'f_p'),
  pks_cnt;
end %if
if length(xnew) == 2
  plot(xnew(1),xnew(2),'o')
  text(xnew(1),xnew(2),'Start Point')
  xlower = [-2;-1];
  xupper = [2;3];
end
f_0=feval(funcname,xnew);
xbestcurr = xnew;
xbestold = xnew;
f_best=feval(funcname,xnew);
f_best=10^(sign(f_best))*f_best;
for k = 1:max_iter,
  xcurr=xbestcurr;
  f_curr=feval(funcname,xcurr);
  alpha = alpha0;
  % candidate point drawn uniformly from a box around xcurr
  xnew = xcurr + alpha*(2*rand(length(xcurr),1)-1);
  % project the candidate onto the feasible box
  for i=1:length(xnew),
    xnew(i) = max(xnew(i),xlower(i));
    xnew(i) = min(xnew(i),xupper(i));
  end %for
  f_new=feval(funcname,xnew);
  if f_new < f_best,
    xbestold = xbestcurr;
    xbestcurr = xnew;
    f_best = f_new;
  end
  if print,
    disp('Iteration number k =')
    disp(k); %print iteration index k
    disp('alpha =');
    disp(alpha); %print alpha
    disp('New point =');
    disp(xnew'); %print new point
    disp('Function value =');
    disp(f_new); %print func value at new point
  end %if
  if norm(xnew-xbestold) <= epsilon_x*norm(xbestold)
    disp('Terminating: Norm of difference between iterates less than');
    disp(epsilon_x);
    break;
  end %if
  pltpts(xbestcurr,xbestold);
  if k == max_iter
    disp('Terminating with maximum number of iterations');
  end %if
end %for
if nargout >= 1
  x=xnew;
  if nargout == 2
    N=k;
  end
else
  disp('Final point =');
  disp(xbestcurr');
  disp('Number of iterations =');
  disp(k);
end %if
A MATLAB routine for a simulated annealing algorithm is given by the M-file sa_demo shown below:
function [x,N]=simulated_annealing(funcname,xnew,options);
%Simulated annealing demo
% [x,N]=simulated_annealing(funcname,xnew,options);
% print = options(1);
% gamma = options(15);
% alpha = options(18);
if nargin ~= 3
  options = [];
  if nargin ~= 2
    disp('Wrong number of arguments.');
    return;
  end
end
if length(options) >= 14
  if options(14)==0
    options(14)=1000*length(xnew);
  end
else
  options(14)=1000*length(xnew);
end
if length(options) < 15
  options(15)=5.0; %default cooling parameter gamma
end
if options(15)==0
  options(15)=5.0; %default cooling parameter gamma
end
if length(options) < 18
  options(18)=0.5; %optional step size
end
format compact;
format short e;
options = foptions(options);
print = options(1);
epsilon_x = options(2);
epsilon_g = options(3);
max_iter=options(14);
alpha = options(18);
gamma = options(15);
k0=2;
% strcmp is used because == fails for strings of different lengths
if strcmp(funcname,'f_r'),
  ros_cnt
elseif strcmp(funcname,'f_p'),
  pks_cnt;
end %if
if length(xnew) == 2
  plot(xnew(1),xnew(2),'o')
  text(xnew(1),xnew(2),'Start Point')
  xlower = [-2;-1];
  xupper = [2;3];
end
f_0=feval(funcname,xnew);
xbestcurr = xnew;
xbestold = xnew;
xcurr = xnew;
f_best=feval(funcname,xnew);
f_best=10^(sign(f_best))*f_best;
for k = 1:max_iter,
  f_curr=feval(funcname,xcurr);
  xnew = xcurr + alpha*(2*rand(length(xcurr),1)-1);
  for i=1:length(xnew),
    xnew(i) = max(xnew(i),xlower(i));
    xnew(i) = min(xnew(i),xupper(i));
  end %for
  f_new=feval(funcname,xnew);
  if f_new < f_curr,
    xcurr = xnew;
    f_curr = f_new;
  else
    % accept a worse point with probability exp(-(f_new-f_curr)/Temp)
    cointoss = rand(1);
    Temp = gamma/log(k+k0);
    Prob = exp(-(f_new-f_curr)/Temp);
    if cointoss < Prob,
      xcurr = xnew;
      f_curr = f_new;
    end
  end
  if f_new < f_best,
    xbestold = xbestcurr;
    xbestcurr = xnew;
    f_best = f_new;
  end
  if print,
    disp('Iteration number k =')
    disp(k); %print iteration index k
    disp('alpha =');
    disp(alpha); %print alpha
    disp('New point =');
    disp(xnew'); %print new point
    disp('Function value =');
    disp(f_new); %print func value at new point
  end %if
  if norm(xnew-xbestold) <= epsilon_x*norm(xbestold)
    disp('Terminating: Norm of difference between iterates less than');
    disp(epsilon_x);
    break;
  end %if
  pltpts(xbestcurr,xbestold);
  if k == max_iter
    disp('Terminating with maximum number of iterations');
  end %if
end %for
if nargout >= 1
  x=xnew;
  if nargout == 2
    N=k;
  end
else
  disp('Final point =');
  disp(xbestcurr');
  disp('Objective function value =');
  disp(f_best);
  disp('Number of iterations =');
  disp(k);
end %if
To use the above routines, we also need the following M-files:
pltpts.m:
function out=pltpts(xnew,xcurr)
plot([xcurr(1),xnew(1)],[xcurr(2),xnew(2)],'r-',xnew(1),xnew(2),'o','Erasemode',...
  'none');
drawnow; % Draws current graph now
%pause(1)
out = [];
f_p.m:
function y=f_p(x);
y=3*(1-x(1)).^2.*exp(-(x(1).^2)-(x(2)+1).^2) ...
  - 10.*(x(1)/5-x(1).^3-x(2).^5).*exp(-x(1).^2-x(2).^2) ...
  - exp(-(x(1)+1).^2-x(2).^2)/3;
y=-y;
pks_cnt.m:
echo off
X = [-3:0.2:3]';
Y = [-3:0.2:3]';
[x,y]=meshgrid(X',Y') ;
func = 3*(1-x).^2.*exp(-x.^2-(y+1).^2) ...
  - 10.*(x/5-x.^3-y.^5).*exp(-x.^2-y.^2) ...
  - exp(-(x+1).^2-y.^2)/3;
func = -func;
clf
levels = exp(-5:10);
levels = [-5:0.9:10];
contour(X,Y,func,levels,'k--')
xlabel('x_1')
ylabel('x_2')
title('Minimization of Peaks function')
drawnow;
hold on
plot(-0.0303,1.5455,'o')
text(-0.0303,1.5455,'Solution')
To run the naive random search algorithms, we first pick a value of α = 0.5, which involves setting
options(18)=0.5. We then use the command rs_demo(’f_p’,[0;-2],options). The resulting plot of the
algorithm trajectory is given below. As we can see, the algorithm is stuck at a local minimizer. (By running
the algorithm several times, the reader can verify that this non-convergent behavior is typical.)
[Figure: "Minimization of Peaks function". Trajectory of the naive random search with α = 0.5 in the $(x_1, x_2)$ plane, with the start point and the solution marked.]
Next, we try α = 1.5, which involves setting options(18)=1.5. We then use the command
rs_demo(’f_p’,[0;-2],options) again, to obtain the plot shown below. This time, the algorithm reaches
the global minimizer.
[Figure: "Minimization of Peaks function". Trajectory of the naive random search with α = 1.5 in the $(x_1, x_2)$ plane, with the start point and the solution marked.]
Finally, we again set α = 0.5, using options(18)=0.5. We then run the simulated annealing code using
sa_demo(’f_p’,[0;-2],options). The algorithm can be seen to converge to the global minimizer, as
plotted below.
[Figure: "Minimization of Peaks function". Trajectory of the simulated annealing algorithm with α = 0.5 in the $(x_1, x_2)$ plane, with the start point and the solution marked.]
14.3
A MATLAB routine for a particle swarm algorithm is:
% A particle swarm optimizer
% to find the minimum/maximum of MATLAB's peaks function
% D: number of inputs to the function (dimension of problem)
clear
%Parameters
ps=10;
D=2;
ps_lb=-3;
ps_ub=3;
vel_lb=-1;
vel_ub=1;
iteration_n = 50;
range = [-3, 3; -3, 3]; % Range of the input variables
% Plot contours of peaks function
[x, y, z] = peaks;
%pcolor(x,y,z); shading interp; hold on;
%contour(x, y, z, 20, 'r');
mesh(x,y,z)
% hold off;
%colormap(gray);
set(gca,'Fontsize',14)
axis([-3 3 -3 3 -9 9])
%axis square;
xlabel('x_1','Fontsize',14);
ylabel('x_2','Fontsize',14);
zlabel('f(x_1,x_2)','Fontsize',14);
hold on
upper = zeros(iteration_n, 1);
average = zeros(iteration_n, 1);
lower = zeros(iteration_n, 1);
% initialize population of particles and their velocities at time zero,
% format of pos = (particle#, dimension)
% construct random population positions bounded by VR
% need to bound positions
ps_pos=ps_lb + (ps_ub-ps_lb).*rand(ps,D);
% need to bound velocities between -mv,mv
ps_vel=vel_lb + (vel_ub-vel_lb).*rand(ps,D);
% initial pbest positions
p_best = ps_pos;
% returns column of cost values (1 for each particle)
f1='3*(1-ps_pos(i,1))^2*exp(-ps_pos(i,1)^2-(ps_pos(i,2)+1)^2)';
f2='-10*(ps_pos(i,1)/5-ps_pos(i,1)^3-ps_pos(i,2)^5)*exp(-ps_pos(i,1)^2-ps_pos(i,2)^2)';
f3='-(1/3)*exp(-(ps_pos(i,1)+1)^2-ps_pos(i,2)^2)';
p_best_fit=zeros(ps,1);
for i=1:ps
  g1(i)=3*(1-ps_pos(i,1))^2*exp(-ps_pos(i,1)^2-(ps_pos(i,2)+1)^2);
  g2(i)=-10*(ps_pos(i,1)/5-ps_pos(i,1)^3-ps_pos(i,2)^5)*exp(-ps_pos(i,1)^2-ps_pos(i,2)^2);
  g3(i)=-(1/3)*exp(-(ps_pos(i,1)+1)^2-ps_pos(i,2)^2);
  p_best_fit(i)=g1(i)+g2(i)+g3(i);
end
p_best_fit;
hand_p3=plot3(ps_pos(:,1),ps_pos(:,2),p_best_fit','*k','markersize',15,'erase','xor');
% initial g_best
[g_best_val,g_best_idx] = max(p_best_fit);
%[g_best_val,g_best_idx] = min(p_best_fit); this is to minimize
g_best=ps_pos(g_best_idx,:);
% get new velocities, positions (this is the heart of the PSO
% algorithm)
for k=1:iteration_n
  for count=1:ps
    ps_vel(count,:) = 0.729*ps_vel(count,:)...              % prev vel
      +1.494*rand*(p_best(count,:)-ps_pos(count,:))...      % independent
      +1.494*rand*(g_best-ps_pos(count,:));                 % social
  end
  ps_vel;
  % update new position
  ps_pos = ps_pos + ps_vel;
  %update p_best
  for i=1:ps
    g1(i)=3*(1-ps_pos(i,1))^2*exp(-ps_pos(i,1)^2-(ps_pos(i,2)+1)^2);
    g2(i)=-10*(ps_pos(i,1)/5-ps_pos(i,1)^3-ps_pos(i,2)^5)*exp(-ps_pos(i,1)^2-ps_pos(i,2)^2);
    g3(i)=-(1/3)*exp(-(ps_pos(i,1)+1)^2-ps_pos(i,2)^2);
    ps_current_fit(i)=g1(i)+g2(i)+g3(i);
    if ps_current_fit(i)>p_best_fit(i)
      p_best_fit(i)=ps_current_fit(i);
      p_best(i,:)=ps_pos(i,:);
    end
  end
  p_best_fit;
  %update g_best (taken from p_best, which matches p_best_fit)
  [g_best_val,g_best_idx] = max(p_best_fit);
  g_best=p_best(g_best_idx,:);
  % Fill objective function vectors
  upper(k) = max(p_best_fit);
  average(k) = mean(p_best_fit);
  lower(k) = min(p_best_fit);
  set(hand_p3,'xdata',ps_pos(:,1),'ydata',ps_pos(:,2),'zdata',ps_current_fit');
  drawnow
  pause
end
g_best
g_best_val
figure;
x = 1:iteration_n;
plot(x, upper, 'o', x, average, 'x', x, lower, '*');
hold on;
plot(x, [upper average lower]);
hold off;
legend('Best', 'Average', 'Poorest');
xlabel('Iterations'); ylabel('Objective function value');
When we run the MATLAB code above, we obtain a plot of the initial set of particles, as shown below.
[Figure: initial set of particles on the surface $f(x_1, x_2)$.]
Then, after 30 iterations, we obtain:
[Figure: particles on the surface $f(x_1, x_2)$ after 30 iterations.]
Finally, after 50 iterations, we obtain:
[Figure: particles on the surface $f(x_1, x_2)$ after 50 iterations.]
A plot of the objective function values (best, average, and poorest) is shown below.
[Figure: objective function value (best, average, and poorest) versus iterations.]
14.4
a. Expanding the right hand side of the second expression gives the desired result.
b. Applying the algorithm, we get a binary representation of 11111001011, i.e.,
$$1995 = 2^{10} + 2^9 + 2^8 + 2^7 + 2^6 + 2^3 + 2^1 + 2^0.$$
c. Applying the algorithm, we get a binary representation of 0.1011101, i.e.,
$$0.7265625 = 2^{-1} + 2^{-3} + 2^{-4} + 2^{-5} + 2^{-7}.$$
d. We have $19 = 2^4 + 2^1 + 2^0$, i.e., the binary representation for 19 is 10011. For the fractional part, we need at least 7 bits to keep at least the same accuracy. We have $0.95 = 2^{-1} + 2^{-2} + 2^{-3} + 2^{-4} + 2^{-7} + \cdots$, i.e., the binary representation is $0.1111001\cdots$. Therefore, the binary representation of 19.95 with at least the same degree of accuracy is 10011.1111001.
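The conversions in parts b and c can be reproduced with a short script. This is a sketch of the usual repeated-division and repeated-doubling procedures, which is assumed (not confirmed by the text above) to be the algorithm the exercise refers to; the variable names are illustrative.

```matlab
% Integer part: repeated division by 2, collecting remainders
n = 1995;
int_bits = '';
while n > 0
    int_bits = [num2str(mod(n,2)) int_bits];  % prepend the remainder
    n = floor(n/2);
end

% Fractional part: repeated doubling, collecting integer parts
f = 0.7265625;
frac_bits = '';
for k = 1:7                                   % 7 bits, as in part c
    f = 2*f;
    d = floor(f);
    frac_bits = [frac_bits num2str(d)];       % append the next bit
    f = f - d;
end

disp(int_bits);
disp(['0.' frac_bits]);
```

For a fraction such as 0.95 the doubling loop does not terminate with a zero remainder, so the number of bits has to be chosen from the required accuracy, as in part d.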
14.5
It suffices to prove the result for the case where only one symbol is swapped, since the general case is
obtained by repeating the argument. We have two scenarios. First, suppose the symbol swapped is at a
position corresponding to a don’t care symbol in H. Clearly, after the swap, both chromosomes will still be
in H. Second, suppose the symbol swapped is at a position corresponding to a fixed symbol in H. Since
both chromosomes are in H, their symbols at that position must be identical. Hence, the swap does not
change the chromosomes. This completes the proof.
14.6
Consider a given chromosome in $M(k) \cap H$. The probability that it is chosen for crossover is $q_c$. If neither of its offspring is in $H$, then at least one of the crossover points must be between the corresponding first and last fixed symbols of $H$. The probability of this is $1 - (1 - \delta(H)/(L - 1))^2$. To see this, note that the probability that each crossover point is not between the corresponding first and last fixed symbols is $1 - \delta(H)/(L - 1)$, and thus the probability that both crossover points are not between the corresponding first and last fixed symbols of $H$ is $(1 - \delta(H)/(L - 1))^2$. Hence, the probability that the given chromosome is chosen for crossover and neither of its offspring is in $H$ is bounded above by
$$q_c \left( 1 - \left( 1 - \frac{\delta(H)}{L - 1} \right)^2 \right).$$
14.7
As for two-point crossover, the $n$-point crossover operation is a composition of $n$ one-point crossover operations (i.e., $n$ one-point crossover operations in succession). The required result for this case is as follows.
Lemma: Given a chromosome in $M(k) \cap H$, the probability that it is chosen for crossover and neither of its offspring is in $H$ is bounded above by
$$q_c \left( 1 - \left( 1 - \frac{\delta(H)}{L - 1} \right)^n \right).$$
For the proof, proceed as in the solution of Exercise 14.6, replacing 2 by $n$.
14.8
function M=roulette_wheel(fitness)
%function M=roulette_wheel(fitness)
%fitness = vector of fitness values of chromosomes in population
%M = vector of indices indicating which chromosome in the
%    given population should appear in the mating pool
fitness = fitness - min(fitness); % shift so that all fitness values are nonnegative
if sum(fitness) == 0,
disp('Population has identical chromosomes -- STOP');
M = []; % no selection is possible
return; % "break" is valid only inside a loop
else
fitness = fitness/sum(fitness);
end
cum_fitness = cumsum(fitness);
for i = 1:length(fitness),
tmp = find(cum_fitness-rand>0); % first slot whose cumulative fitness exceeds the random draw
M(i) = tmp(1);
end
14.9
% parent1, parent2 = two binary parent chromosomes (row vectors)
L = length(parent1);
crossover_pt = ceil(rand*(L-1));
offspring1 = [parent1(1:crossover_pt) parent2(crossover_pt+1:L)];
offspring2 = [parent2(1:crossover_pt) parent1(crossover_pt+1:L)];
14.10
% mating_pool = matrix of 0-1 elements; each row represents a chromosome
% p_m = probability of mutation
N = size(mating_pool,1);
L = size(mating_pool,2);
mutation_points = rand(N,L) < p_m;
new_population = xor(mating_pool,mutation_points);
14.11
A MATLAB routine for a genetic algorithm with binary encoding is:
function [winner,bestfitness] = ga(L,N,fit_func,options)
% function winner = GA(L,N,fit_func)
% Function call: GA(L,N,'f')
% L = length of chromosomes
% N = population size (must be an even number)
% f = name of fitness value function
%
%Options:
%print = options(1);
%selection = options(5);
%max_iter=options(14);
%p_c = options(18);
%p_m = p_c/100;
%
%Selection:
% options(5) = 0 for roulette wheel, 1 for tournament
clf;
if nargin ~= 4
options = [];
if nargin ~= 3
disp('Wrong number of arguments.');
return;
end
end
if length(options) >= 14
if options(14)==0
options(14)=3*N;
end
else
options(14)=3*N;
end
if length(options) < 18
options(18)=0.75; %optional crossover rate
end
%format compact;
%format short e;
options = foptions(options);
print = options(1);
selection = options(5);
max_iter=options(14);
p_c = options(18);
p_m = p_c/100;
P = rand(N,L)>0.5;
bestvaluesofar = -Inf; % ensures bestsofar is set even if all fitness values are <= 0
%Initial evaluation
for i = 1:N,
fitness(i) = feval(fit_func,P(i,:));
end
[bestvalue,best] = max(fitness);
if bestvalue > bestvaluesofar,
bestsofar = P(best,:);
bestvaluesofar = bestvalue;
end
for k = 1:max_iter,
%Selection
fitness = fitness - min(fitness); % shift so that all fitness values are nonnegative
if sum(fitness) == 0,
disp('Population has identical chromosomes -- STOP');
disp('Number of iterations:');
disp(k);
for i = k:max_iter, % pad the remaining history; assumes k > 1
upper(i)=upper(i-1);
average(i)=average(i-1);
lower(i)=lower(i-1);
end
break;
else
fitness = fitness/sum(fitness);
end
if selection == 0,
%roulette-wheel
cum_fitness = cumsum(fitness);
for i = 1:N,
tmp = find(cum_fitness-rand>0);
m(i) = tmp(1);
end
else
%tournament
for i = 1:N,
fighter1=ceil(rand*N);
fighter2=ceil(rand*N);
if fitness(fighter1)>fitness(fighter2),
m(i) = fighter1;
else
m(i) = fighter2;
end
end
end
M = zeros(N,L);
for i = 1:N,
M(i,:) = P(m(i),:);
end
%Crossover
Mnew = M;
for i = 1:N/2
ind1 = ceil(rand*N);
ind2 = ceil(rand*N);
parent1 = M(ind1,:);
parent2 = M(ind2,:);
if rand < p_c
crossover_pt = ceil(rand*(L-1));
offspring1 = [parent1(1:crossover_pt) parent2(crossover_pt+1:L)];
offspring2 = [parent2(1:crossover_pt) parent1(crossover_pt+1:L)];
Mnew(ind1,:) = offspring1;
Mnew(ind2,:) = offspring2;
end
end
%Mutation
mutation_points = rand(N,L) < p_m;
P = xor(Mnew,mutation_points);
%Evaluation
for i = 1:N,
fitness(i) = feval(fit_func,P(i,:));
end
[bestvalue,best] = max(fitness);
if bestvalue > bestvaluesofar,
bestsofar = P(best,:);
bestvaluesofar = bestvalue;
end
upper(k) = bestvalue;
average(k) = mean(fitness);
lower(k) = min(fitness);
end %for
if k == max_iter,
disp('Algorithm terminated after maximum number of iterations:');
disp(max_iter);
end
winner = bestsofar;
bestfitness = bestvaluesofar;
if print,
iter = [1:max_iter]';
plot(iter,upper,'o:',iter,average,'x-',iter,lower,'*--');
legend('Best', 'Average', 'Worst');
xlabel('Generations','Fontsize',14);
ylabel('Objective Function Value','Fontsize',14);
set(gca,'Fontsize',14);
hold off;
end
a. To run the routine, we create the following M-files.
function dec = bin2dec(bin,range);
%function dec = bin2dec(bin,range);
%Function to convert from binary (bin) to decimal (dec) in a given range
index = polyval(bin,2);
dec = index*((range(2)-range(1))/(2^length(bin)-1)) + range(1);
function y=f_manymax(x);
y=-15*(sin(2*x))^2-(x-2)^2+160;
function y=fit_func1(binchrom);
%1-D fitness function
f='f_manymax';
range=[-10,10];
x=bin2dec(binchrom,range);
y=feval(f,x);
We use the following script to run the algorithm:
clear;
options(1)=1;
[x,y]=ga(8,10,'fit_func1',options);
f='f_manymax';
range=[-10,10];
disp('GA Solution:');
disp(bin2dec(x,range));
disp('Objective function value:');
disp(y);
Running the above algorithm, we obtain a solution of $x^* = 1.6078$, and an objective function value of
159.7640. The figure below shows a plot of the best, average, and worst solution from each generation of the
population.
[Figure: best, average, and worst objective function values versus generation.]
b. To run the routine, we create the following M-files (we also use the routine bin2dec from part a).
function y=f_peaks(x);
y=3*(1-x(1)).^2.*exp(-(x(1).^2)-(x(2)+1).^2) ...
- 10.*(x(1)/5-x(1).^3-x(2).^5).*exp(-x(1).^2-x(2).^2) ...
- exp(-(x(1)+1).^2-x(2).^2)/3;
function y=fit_func2(binchrom);
%2-D fitness function
f='f_peaks';
xrange=[-3,3];
yrange=[-3,3];
L=length(binchrom);
x1=bin2dec(binchrom(1:L/2),xrange); % first half of the chromosome encodes x1
x2=bin2dec(binchrom(L/2+1:L),yrange); % second half encodes x2
y=feval(f,[x1,x2]);
We use the following script to run the algorithm:
clear;
options(1)=1;
[x,y]=ga(16,20,'fit_func2',options);
f='f_peaks';
xrange=[-3,3];
yrange=[-3,3];
L=length(x);
x1=bin2dec(x(1:L/2),xrange);
x2=bin2dec(x(L/2+1:L),yrange);
disp('GA Solution:');
disp([x1,x2]);
disp('Objective function value:');
disp(y);
A plot of the objective function is shown below.
[Figure: plot of f(x) versus x over the interval [−10, 10].]
Running the above algorithm, we obtain a solution of [−0.0353, 1.4941]> , and an x∗ = [−0.0588, 1.5412]> ,
and an objective function value of 7.9815. (Compare this solution with that of Example 14.3.) The figure
below shows a plot of the best, average, and worst solution from each generation of the population.
[Figure: best, average, and worst objective function values versus generation.]
14.12
A MATLAB routine for a real-number genetic algorithm:
function [winner,bestfitness] = gar(Domain,N,fit_func,options)
% function winner = GAR(Domain,N,fit_func)
% Function call: GAR(Domain,N,'f')
% Domain = search space; e.g., [-2,2;-3,3] for the space [-2,2]x[-3,3]
% (number of rows of Domain = dimension of search space)
% N = population size (must be an even number)
% f = name of fitness value function
%
%Options:
%print = options(1);
%selection = options(5);
%max_iter=options(14);
%p_c = options(18);
%p_m = p_c/100;
%
%Selection:
% options(5) = 0 for roulette wheel, 1 for tournament
clf;
if nargin ~= 4
options = [];
if nargin ~= 3
disp('Wrong number of arguments.');
return;
end
end
if length(options) >= 14
if options(14)==0
options(14)=3*N;
end
else
options(14)=3*N;
end
if length(options) < 18
options(18)=0.75; %optional crossover rate
end
%format compact;
%format short e;
options = foptions(options);
print = options(1);
selection = options(5);
max_iter=options(14);
p_c = options(18);
p_m = p_c/100;
n = size(Domain,1);
lowb = Domain(:,1)';
upb = Domain(:,2)';
bestvaluesofar = -Inf; % ensures bestsofar is set even if all fitness values are <= 0
for i = 1:N,
P(i,:) = lowb + rand(1,n).*(upb-lowb);
%Initial evaluation
fitness(i) = feval(fit_func,P(i,:));
end
[bestvalue,best] = max(fitness);
if bestvalue > bestvaluesofar,
bestsofar = P(best,:);
bestvaluesofar = bestvalue;
end
for k = 1:max_iter,
%Selection
fitness = fitness - min(fitness); % shift so that all fitness values are nonnegative
if sum(fitness) == 0,
disp('Population has identical chromosomes -- STOP');
disp('Number of iterations:');
disp(k);
for i = k:max_iter, % pad the remaining history; assumes k > 1
upper(i)=upper(i-1);
average(i)=average(i-1);
lower(i)=lower(i-1);
end
break;
else
fitness = fitness/sum(fitness);
end
if selection == 0,
%roulette-wheel
cum_fitness = cumsum(fitness);
for i = 1:N,
tmp = find(cum_fitness-rand>0);
m(i) = tmp(1);
end
else
%tournament
for i = 1:N,
fighter1=ceil(rand*N);
fighter2=ceil(rand*N);
if fitness(fighter1)>fitness(fighter2),
m(i) = fighter1;
else
m(i) = fighter2;
end
end
end
M = zeros(N,n);
for i = 1:N,
M(i,:) = P(m(i),:);
end
%Crossover
Mnew = M;
for i = 1:N/2
ind1 = ceil(rand*N);
ind2 = ceil(rand*N);
parent1 = M(ind1,:);
parent2 = M(ind2,:);
if rand < p_c
a = rand;
offspring1 = a*parent1+(1-a)*parent2+(rand(1,n)-0.5).*(upb-lowb)/10;
offspring2 = a*parent2+(1-a)*parent1+(rand(1,n)-0.5).*(upb-lowb)/10;
%do projection
for j = 1:n,
if offspring1(j)<lowb(j),
offspring1(j)=lowb(j);
elseif offspring1(j)>upb(j),
offspring1(j)=upb(j);
end
if offspring2(j)<lowb(j),
offspring2(j)=lowb(j);
elseif offspring2(j)>upb(j),
offspring2(j)=upb(j);
end
end
Mnew(ind1,:) = offspring1;
Mnew(ind2,:) = offspring2;
end
end
%Mutation
for i = 1:N,
if rand < p_m,
a = rand;
Mnew(i,:) = a*Mnew(i,:) + (1-a)*(lowb + rand(1,n).*(upb-lowb));
end
end
P = Mnew;
%Evaluation
for i = 1:N,
fitness(i) = feval(fit_func,P(i,:));
end
[bestvalue,best] = max(fitness);
if bestvalue > bestvaluesofar,
bestsofar = P(best,:);
bestvaluesofar = bestvalue;
end
upper(k) = bestvalue;
average(k) = mean(fitness);
lower(k) = min(fitness);
end %for
if k == max_iter,
disp('Algorithm terminated after maximum number of iterations:');
disp(max_iter);
end
winner = bestsofar;
bestfitness = bestvaluesofar;
if print,
iter = [1:max_iter]';
plot(iter,upper,'o:',iter,average,'x-',iter,lower,'*--');
legend('Best', 'Average', 'Worst');
xlabel('Generations','Fontsize',14);
ylabel('Objective Function Value','Fontsize',14);
set(gca,'Fontsize',14);
hold off;
end
To run the routine, we create the following M-file for the given function.
function y=f_wave(x);
y=x(1)*sin(x(1)) + x(2)*sin(5*x(2));
We use the following script to run the algorithm:
options(1)=1;
options(14)=50;
[x,y]=gar([0,10;4,6],20,'f_wave',options);
disp('GA Solution:');
disp(x);
disp('Objective function value:');
disp(y);
Running the above algorithm, we obtain a solution of $x^* = [7.9711, 5.3462]^\top$, and an objective function
value of 13.2607. The figure below shows a plot of the best, average, and worst solution from each generation
of the population.
[Figure: best, average, and worst objective function values versus generation.]
Using the MATLAB function fminunc (from the Optimization Toolbox), we found the optimal point to
be $[7.9787, 5.3482]^\top$, with objective function value 13.2612. We can see that this solution agrees with the
solution obtained using our real-number genetic algorithm.
15. Introduction to Linear Programming
15.1
$$
\begin{array}{ll}
\text{minimize} & -2x_1 - x_2 \\
\text{subject to} & x_1 + x_3 = 2 \\
& x_1 + x_2 + x_4 = 3 \\
& x_1 + 2x_2 + x_5 = 5 \\
& x_1, \ldots, x_5 \ge 0.
\end{array}
$$
15.2
We have
$$
x_2 = a x_1 + b u_1 = a^2 x_0 + ab u_0 + b u_1 = a^2 + [ab, b]u,
$$
where $u = [u_0, u_1]^\top$ is the decision variable. We can write the constraint as $u_i \le 1$ and $u_i \ge -1$. Hence, the
problem is:
$$
\begin{array}{ll}
\text{minimize} & a^2 + [ab, b]u \\
\text{subject to} & -1 \le u_i \le 1, \quad i = 0, 1.
\end{array}
$$
Since $a^2$ is a constant, we can remove it from the objective function without changing the solution. Introducing slack variables $v_1, v_2, v_3, v_4$, we obtain the standard form problem
$$
\begin{array}{ll}
\text{minimize} & [ab, b]u \\
\text{subject to} & u_0 + v_1 = 1 \\
& -u_0 + v_2 = 1 \\
& u_1 + v_3 = 1 \\
& -u_1 + v_4 = 1.
\end{array}
$$
15.3
Let $x_i^+, x_i^- \ge 0$ be such that $|x_i| = x_i^+ + x_i^-$ and $x_i = x_i^+ - x_i^-$. Substituting into the original problem, we have
$$
\begin{array}{ll}
\text{minimize} & c_1(x_1^+ + x_1^-) + c_2(x_2^+ + x_2^-) + \cdots + c_n(x_n^+ + x_n^-) \\
\text{subject to} & A(x^+ - x^-) = b \\
& x^+, x^- \ge 0,
\end{array}
$$
where $x^+ = [x_1^+, \ldots, x_n^+]^\top$ and $x^- = [x_1^-, \ldots, x_n^-]^\top$. Rewriting with $z = [(x^+)^\top, (x^-)^\top]^\top$, we get
$$
\begin{array}{ll}
\text{minimize} & [c^\top, c^\top]z \\
\text{subject to} & [A, -A]z = b \\
& z \ge 0,
\end{array}
$$
which is an equivalent linear programming problem in standard form.
Note that although the variables $x_i^+$ and $x_i^-$ in the solution are required to satisfy $x_i^+ x_i^- = 0$, we do
not need to explicitly include this in the constraint because any optimal solution to the above transformed
problem automatically satisfies the condition $x_i^+ x_i^- = 0$. To see this, suppose we have an optimal solution
with both $x_i^+ > 0$ and $x_i^- > 0$. In this case, note that $c_i > 0$ (for otherwise we can add any arbitrary
constant to both $x_i^+$ and $x_i^-$ and still satisfy feasibility, but decrease the objective function value). Then,
by subtracting $\min(x_i^+, x_i^-)$ from $x_i^+$ and $x_i^-$, we have a new feasible point with lower objective function
value, contradicting the optimality assumption. [See also M. A. Dahleh and I. J. Diaz-Bobillo, Control of
Uncertain Systems: A Linear Programming Approach, Prentice Hall, 1995, pp. 189–190.]
15.4
Not every linear programming problem in standard form has a nonempty feasible set. Example:
$$
\begin{array}{ll}
\text{minimize} & x_1 \\
\text{subject to} & -x_1 = 1 \\
& x_1 \ge 0.
\end{array}
$$
Not every linear programming problem in standard form (even assuming a nonempty feasible set) has an
optimal solution. Example:
$$
\begin{array}{ll}
\text{minimize} & -x_1 \\
\text{subject to} & x_2 = 1 \\
& x_1, x_2 \ge 0.
\end{array}
$$
15.5
Let $x_1$ and $x_2$ represent the number of units to be shipped from A to C and to D, respectively, and $x_3$ and
$x_4$ represent the number of units to be shipped from B to C and to D, respectively. Then, the given problem
can be formulated as the following linear program:
$$
\begin{array}{ll}
\text{minimize} & x_1 + 2x_2 + 3x_3 + 4x_4 \\
\text{subject to} & x_1 + x_3 = 50 \\
& x_2 + x_4 = 60 \\
& x_1 + x_2 \le 70 \\
& x_3 + x_4 \le 80 \\
& x_1, x_2, x_3, x_4 \ge 0.
\end{array}
$$
Introducing slack variables $x_5$ and $x_6$, we have the standard form problem
$$
\begin{array}{ll}
\text{minimize} & x_1 + 2x_2 + 3x_3 + 4x_4 \\
\text{subject to} & x_1 + x_3 = 50 \\
& x_2 + x_4 = 60 \\
& x_1 + x_2 + x_5 = 70 \\
& x_3 + x_4 + x_6 = 80 \\
& x_1, x_2, x_3, x_4, x_5, x_6 \ge 0.
\end{array}
$$
15.6
We can see that there are two paths from A to E (ACDE and ACBFDE), and two paths from B to F
(BCDF and BF). Let $x_1$ and $x_2$ be the data rates for the two paths from A to E, respectively, and $x_3$ and $x_4$
the data rates for the two paths from B to F, respectively. The total revenue is then $2(x_1 + x_2) + 3(x_3 + x_4)$.
For each link, we have a data rate constraint on the sum of all $x_i$'s passing through that link. For example,
for link BC, there are two paths passing through it, with total data rate $x_2 + x_3$. Hence, the constraint for
link BC is $x_2 + x_3 \le 7$. The optimization problem is therefore the following linear programming problem:
$$
\begin{array}{ll}
\text{maximize} & 2(x_1 + x_2) + 3(x_3 + x_4) \\
\text{subject to} & x_1 + x_2 \le 10 \\
& x_1 + x_2 \le 12 \\
& x_1 + x_3 \le 8 \\
& x_2 + x_3 \le 7 \\
& x_2 + x_3 \le 4 \\
& x_2 + x_4 \le 3 \\
& x_1, \ldots, x_4 \ge 0.
\end{array}
$$
Converting this to standard form:
$$
\begin{array}{ll}
\text{minimize} & -2(x_1 + x_2) - 3(x_3 + x_4) \\
\text{subject to} & x_1 + x_2 + x_5 = 10 \\
& x_1 + x_2 + x_6 = 12 \\
& x_1 + x_3 + x_7 = 8 \\
& x_2 + x_3 + x_8 = 7 \\
& x_2 + x_3 + x_9 = 4 \\
& x_2 + x_4 + x_{10} = 3 \\
& x_1, \ldots, x_{10} \ge 0.
\end{array}
$$
15.7
Let $x_i \ge 0$, $i = 1, \ldots, 4$, be the weight in pounds of item $i$ to be used. Then, the total weight is $x_1 + x_2 + x_3 + x_4$.
To satisfy the percentage content of fiber, fat, and sugar, and the total weight of 1000, we need
$$
\begin{array}{rcl}
3x_1 + 8x_2 + 16x_3 + 4x_4 &=& 10(x_1 + x_2 + x_3 + x_4) \\
6x_1 + 46x_2 + 9x_3 + 9x_4 &=& 2(x_1 + x_2 + x_3 + x_4) \\
20x_1 + 5x_2 + 4x_3 + 0x_4 &=& 5(x_1 + x_2 + x_3 + x_4) \\
x_1 + x_2 + x_3 + x_4 &=& 1000.
\end{array}
$$
The total cost is $2x_1 + 4x_2 + x_3 + 2x_4$. Therefore, the problem is:
$$
\begin{array}{ll}
\text{minimize} & 2x_1 + 4x_2 + x_3 + 2x_4 \\
\text{subject to} & -7x_1 - 2x_2 + 6x_3 - 6x_4 = 0 \\
& 4x_1 + 44x_2 + 7x_3 + 7x_4 = 0 \\
& 15x_1 - x_3 - 5x_4 = 0 \\
& x_1 + x_2 + x_3 + x_4 = 1000 \\
& x_1, x_2, x_3, x_4 \ge 0.
\end{array}
$$
Alternatively, we could have simply replaced $x_1 + x_2 + x_3 + x_4$ in the first three equality constraints above
by 1000, to obtain:
$$
\begin{array}{rcl}
3x_1 + 8x_2 + 16x_3 + 4x_4 &=& 10000 \\
6x_1 + 46x_2 + 9x_3 + 9x_4 &=& 2000 \\
20x_1 + 5x_2 + 4x_3 + 0x_4 &=& 5000 \\
x_1 + x_2 + x_3 + x_4 &=& 1000.
\end{array}
$$
Note that the only vector satisfying the above linear equations is (after rounding) $[179, -175, 573, 422]^\top$, which is not
feasible because its second component is negative. Therefore, the constraint set does not have any feasible points, which means that the problem does
not have a solution.
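The infeasibility claim can be checked by solving the four equations exactly. The following Python sketch is an illustration only (the helper `solve_linear_system` is hypothetical); it uses Gauss-Jordan elimination over rational numbers and then inspects the signs of the components.

```python
from fractions import Fraction

def solve_linear_system(A, b):
    """Solve a square nonsingular system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot_row = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

A = [[3, 8, 16, 4],
     [6, 46, 9, 9],
     [20, 5, 4, 0],
     [1, 1, 1, 1]]
b = [10000, 2000, 5000, 1000]

x = solve_linear_system(A, b)
print([round(float(v), 2) for v in x])
print("feasible:", all(v >= 0 for v in x))
```

Because the system has a unique solution, a single negative component is enough to conclude that no nonnegative solution exists.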
15.8
The objective function is $p_1 + \cdots + p_n$. The constraint for the $i$th location is: $g_{i,1} p_1 + \cdots + g_{i,n} p_n \ge P$.
Hence, the optimization problem is:
$$
\begin{array}{ll}
\text{minimize} & p_1 + \cdots + p_n \\
\text{subject to} & g_{i,1} p_1 + \cdots + g_{i,n} p_n \ge P, \quad i = 1, \ldots, m \\
& p_1, \ldots, p_n \ge 0.
\end{array}
$$
By defining the notation $G = [g_{i,j}]$ ($m \times n$), $e_n = [1, \ldots, 1]^\top$ (with $n$ components), and $p = [p_1, \ldots, p_n]^\top$,
we can rewrite the problem as
$$
\begin{array}{ll}
\text{minimize} & e_n^\top p \\
\text{subject to} & Gp \ge P e_m \\
& p \ge 0.
\end{array}
$$
15.9
It is easy to check (using MATLAB, for example) that the matrix
$$
A = \begin{bmatrix} 2 & -1 & 2 & -1 & 3 \\ 1 & 2 & 3 & 1 & 0 \\ 1 & 0 & -2 & 0 & -5 \end{bmatrix}
$$
is of full rank (i.e., rank $A = 3$). Therefore, the system has basic solutions. To find the basic solutions, we
first select bases. Each basis consists of three linearly independent columns of $A$. These columns correspond
to basic variables of the basic solution. The remaining variables are nonbasic and are set to 0. The matrix
$A$ has 5 columns; therefore, we have $\binom{5}{3} = 10$ possible candidate basic solutions (corresponding to the 10
combinations of 3 columns out of 5). It turns out that all 10 combinations of 3 columns of $A$ are linearly
independent. Therefore, we have 10 basic solutions. These are tabulated as follows:
Columns     Basic Solutions
1, 2, 3     $[-4/17, -80/17, 83/17, 0, 0]^\top$
1, 2, 4     $[-10, 49, 0, -83, 0]^\top$
1, 2, 5     $[105/31, 25/31, 0, 0, 83/31]^\top$
1, 3, 4     $[-12/11, 0, 49/11, -80/11, 0]^\top$
1, 3, 5     $[100/35, 0, 25/35, 0, 80/35]^\top$
1, 4, 5     $[65/18, 0, 0, 25/18, 49/18]^\top$
2, 3, 4     $[0, -6, 5, 2, 0]^\top$
2, 3, 5     $[0, -100/23, 105/23, 0, 4/23]^\top$
2, 4, 5     $[0, 13, 0, -21, 2]^\top$
3, 4, 5     $[0, 0, 65/19, -100/19, 12/19]^\top$
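The enumeration above can be sketched in Python as follows. This is an illustration only: the right-hand side `b = [14, 5, -10]` is not stated in this solution and is an assumption inferred from the tabulated vectors (each of them satisfies $Ax = b$ for this $b$), and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

A = [[2, -1, 2, -1, 3],
     [1, 2, 3, 1, 0],
     [1, 0, -2, 0, -5]]
b = [14, 5, -10]  # assumption: inferred from the tabulated basic solutions

def solve_square(M, rhs):
    """Solve M x = rhs exactly; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot_row = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot_row is None:
            return None
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def basic_solutions(A, b):
    """Map each set of linearly independent columns (1-based) to its basic solution."""
    m, n = len(A), len(A[0])
    result = {}
    for cols in combinations(range(n), m):
        B = [[A[i][j] for j in cols] for i in range(m)]
        xB = solve_square(B, b)
        if xB is None:
            continue  # these columns do not form a basis
        x = [Fraction(0)] * n  # nonbasic variables stay at zero
        for j, value in zip(cols, xB):
            x[j] = value
        result[tuple(j + 1 for j in cols)] = x
    return result

solutions = basic_solutions(A, b)
for cols, x in solutions.items():
    print(cols, [str(v) for v in x])
```

Fractions are printed in lowest terms, so an entry such as 100/35 in the table appears as 20/7.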
15.10
In the figure below, the shaded region corresponds to the feasible set. We then translate the line $2x_1 + 5x_2 = 0$
across the shaded region until the line just touches the region at one point, and the line is as far as possible
from the origin. The point of contact is the solution to the problem. In this case, the solution is $[2, 6]^\top$, and
the corresponding cost is 34.
[Figure: feasible set (shaded) in the $(x_1, x_2)$ plane, with the lines $2x_1 + 5x_2 = 0$ and $2x_1 + 5x_2 = 34$.]
15.11
We use the following MATLAB commands:
>> f=[0,-10,0,-6,-20];
>> A=[1,-1,-1,0,0; 0,0,1,-1,-1];
>> b=[0;0];
>> vlb=zeros(5,1);
>> vub=[4;3;3;2;2];
>> x0=zeros(5,1);
>> neqcstr=2;
>> x=linprog(f,A,b,vlb,vub,x0,neqcstr)
x =
4.0000
2.0000
2.0000
0.0000
2.0000
The solution is $[4, 2, 2, 0, 2]^\top$.
16. The Simplex Method
16.1
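a. Performing a sequence of elementary row operations, we obtain
$$
A = \begin{bmatrix} 1 & 2 & -1 & 3 & 2 \\ 2 & -1 & 3 & 0 & 1 \\ 3 & 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 1 & 1 \end{bmatrix}
\to \begin{bmatrix} 1 & 2 & -1 & 3 & 2 \\ 0 & -5 & 5 & -6 & -3 \\ 0 & -5 & 5 & -6 & -3 \\ 0 & 0 & 4 & -2 & -1 \end{bmatrix}
\to \begin{bmatrix} 1 & 2 & -1 & 3 & 2 \\ 0 & -5 & 5 & -6 & -3 \\ 0 & 0 & 4 & -2 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = B.
$$
Because elementary row operations do not change the rank of a matrix, rank $A$ = rank $B$. Therefore
rank $A = 3$.
b. Performing a sequence of elementary operations, we obtain
$$
A = \begin{bmatrix} 1 & \gamma & -1 & 2 \\ 2 & -1 & \gamma & 5 \\ 1 & 10 & -6 & 1 \end{bmatrix}
\to \begin{bmatrix} 1 & 10 & -6 & 1 \\ 1 & \gamma & -1 & 2 \\ 2 & -1 & \gamma & 5 \end{bmatrix}
\to \begin{bmatrix} 1 & 10 & -6 & 1 \\ 0 & \gamma - 10 & 5 & 1 \\ 0 & -21 & \gamma + 12 & 3 \end{bmatrix}
$$
$$
\to \begin{bmatrix} 1 & 1 & 10 & -6 \\ 0 & 1 & \gamma - 10 & 5 \\ 0 & 3 & -21 & \gamma + 12 \end{bmatrix}
\to \begin{bmatrix} 1 & 1 & 10 & -6 \\ 0 & 1 & \gamma - 10 & 5 \\ 0 & 0 & -3(\gamma - 3) & \gamma - 3 \end{bmatrix} = B.
$$
(The third step moves the last column into the second position; like the row operations, this does not change the rank.)
Because these operations do not change the rank of a matrix, rank $A$ = rank $B$. Therefore
rank $A = 3$ if $\gamma \ne 3$ and rank $A = 2$ if $\gamma = 3$.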
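The rank computations can be double-checked with exact row reduction. The Python sketch below is an illustration only (the helper `matrix_rank` and the sample values of $\gamma$ are hypothetical choices); the matrices are those of parts a and b as read from the solution above.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix via exact forward elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Look for a usable pivot at or below the current row.
        pivot_row = next((r for r in range(rank, n_rows) if M[r][col] != 0), None)
        if pivot_row is None:
            continue
        M[rank], M[pivot_row] = M[pivot_row], M[rank]
        for r in range(rank + 1, n_rows):
            factor = M[r][col] / M[rank][col]
            M[r] = [a - factor * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

A_part_a = [[1, 2, -1, 3, 2],
            [2, -1, 3, 0, 1],
            [3, 1, 2, 3, 3],
            [1, 2, 3, 1, 1]]

def A_part_b(gamma):
    return [[1, gamma, -1, 2],
            [2, -1, gamma, 5],
            [1, 10, -6, 1]]

print(matrix_rank(A_part_a))
print(matrix_rank(A_part_b(3)), matrix_rank(A_part_b(0)))
```

Trying a few values of $\gamma$ other than 3 is a quick way to see that the rank drops only at $\gamma = 3$.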
16.2
a.
$$
A = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 6 & 2 & 1 & 1 \end{bmatrix}, \quad
b = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \quad
c = [2, -1, -1, 0]^\top.
$$
b. Pivoting the problem tableau about the elements (1, 4) and (2, 3), we obtain
$$
\begin{array}{rrrrr}
3 & 1 & 0 & 1 & 4 \\
3 & 1 & 1 & 0 & 1 \\
5 & 0 & 0 & 0 & 1
\end{array}
$$
c. Basic feasible solution: $x = [0, 0, 1, 4]^\top$, $c^\top x = -1$.
d. $[r_1, r_2, r_3, r_4] = [5, 0, 0, 0]$.
e. Since the reduced cost coefficients are all $\ge 0$, the basic feasible solution in part c is optimal.
f. The original problem does indeed have a feasible solution, because the artificial problem has an optimal
feasible solution with objective function value 0, as shown in the final phase I tableau.
g. Extract the submatrices corresponding to $A$ and $b$, append the last row $[c^\top, 0]$, and pivot about the
(2, 1)th element to obtain
$$
\begin{array}{rrrrr}
0 & 0 & -1 & 1 & 3 \\
1 & 1/3 & 1/3 & 0 & 1/3 \\
0 & -5/3 & -5/3 & 0 & -2/3
\end{array}
$$
16.3
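The problem in standard form is:
$$
\begin{array}{ll}
\text{minimize} & -x_1 - x_2 - 3x_3 \\
\text{subject to} & x_1 + x_3 = 1 \\
& x_2 + x_3 = 2 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
$$
We form the tableau for the problem:
$$
\begin{array}{rrrr}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 2 \\
-1 & -1 & -3 & 0
\end{array}
$$
Performing necessary row operations, we obtain a tableau in canonical form:
$$
\begin{array}{rrrr}
1 & 0 & 1 & 1 \\
0 & 1 & 1 & 2 \\
0 & 0 & -1 & 3
\end{array}
$$
We pivot about the (1, 3)th element to get:
$$
\begin{array}{rrrr}
1 & 0 & 1 & 1 \\
-1 & 1 & 0 & 1 \\
1 & 0 & 0 & 4
\end{array}
$$
The reduced cost coefficients are all nonnegative. Hence, the current basic feasible solution is optimal:
$[0, 1, 1]^\top$. The optimal cost is 4.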
16.4
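The problem in standard form is:
$$
\begin{array}{ll}
\text{minimize} & -2x_1 - x_2 \\
\text{subject to} & x_1 + x_3 = 5 \\
& x_2 + x_4 = 7 \\
& x_1 + x_2 + x_5 = 9 \\
& x_1, \ldots, x_5 \ge 0.
\end{array}
$$
We form the tableau for the problem:
$$
\begin{array}{rrrrrr}
1 & 0 & 1 & 0 & 0 & 5 \\
0 & 1 & 0 & 1 & 0 & 7 \\
1 & 1 & 0 & 0 & 1 & 9 \\
-2 & -1 & 0 & 0 & 0 & 0
\end{array}
$$
The above tableau is already in canonical form, and therefore we can proceed with the simplex procedure.
We first pivot about the (1, 1)th element, to get
$$
\begin{array}{rrrrrr}
1 & 0 & 1 & 0 & 0 & 5 \\
0 & 1 & 0 & 1 & 0 & 7 \\
0 & 1 & -1 & 0 & 1 & 4 \\
0 & -1 & 2 & 0 & 0 & 10
\end{array}
$$
Next, we pivot about the (3, 2)th element to get
$$
\begin{array}{rrrrrr}
1 & 0 & 1 & 0 & 0 & 5 \\
0 & 0 & 1 & 1 & -1 & 3 \\
0 & 1 & -1 & 0 & 1 & 4 \\
0 & 0 & 1 & 0 & 1 & 14
\end{array}
$$
The reduced cost coefficients are all nonnegative. Hence, the optimal solution to the problem in standard
form is $[5, 4, 0, 3, 0]^\top$. The corresponding optimal cost is $-14$.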
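The two pivots can be reproduced with a few lines of Python. This is a minimal sketch for illustration (the helper `pivot` is hypothetical and uses 1-based indices to match the text); it applies the standard pivot operation with exact fractions to the tableau of this exercise.

```python
from fractions import Fraction

def pivot(tableau, p, q):
    """Return a new tableau after pivoting about entry (p, q), 1-based as in the text."""
    T = [[Fraction(v) for v in row] for row in tableau]
    p, q = p - 1, q - 1
    pivot_value = T[p][q]
    T[p] = [v / pivot_value for v in T[p]]   # scale the pivot row
    for i in range(len(T)):
        if i != p:
            factor = T[i][q]                 # eliminate column q from every other row
            T[i] = [a - factor * b for a, b in zip(T[i], T[p])]
    return T

tableau = [[1, 0, 1, 0, 0, 5],
           [0, 1, 0, 1, 0, 7],
           [1, 1, 0, 0, 1, 9],
           [-2, -1, 0, 0, 0, 0]]   # last row holds the cost coefficients

step1 = pivot(tableau, 1, 1)
step2 = pivot(step1, 3, 2)
for row in step2:
    print([str(v) for v in row])
```

The last row of the final tableau contains the reduced cost coefficients, and its last entry is the negative of the optimal cost.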
16.5
a. Let $B = [a_2, a_1]$ represent the first two columns of $A$ ordered according to the basis corresponding to the
given canonical tableau, and $D$ the second two columns. Then,
$$
B^{-1} D = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.
$$
Hence,
$$
B = D \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} 3/2 & -1/2 \\ -2 & 1 \end{bmatrix}.
$$
Hence,
$$
A = \begin{bmatrix} -1/2 & 3/2 & 0 & 1 \\ 1 & -2 & 1 & 0 \end{bmatrix}.
$$
An alternative approach is to realize that the canonical tableau is obtained from the problem tableau via
elementary row operations. Therefore, we can obtain the entries of $A$ from the $2 \times 4$ upper-left submatrix
of the canonical tableau via elementary row operations also. Specifically, start with
$$
\begin{bmatrix} 0 & 1 & 1 & 2 \\ 1 & 0 & 3 & 4 \end{bmatrix}
$$
and then do two pivoting operations, one about (1, 4) and the other about (2, 3).
b. The right-half of $c$ is given by
$$
c_D^\top = r_D^\top + c_B^\top B^{-1} D = [-1, 1] + [7, 8] \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = [-1, 1] + [31, 46] = [30, 47].
$$
So $c = [8, 7, 30, 47]^\top$.
c. First we calculate $B^{-1} b$, giving us the basic variable values:
$$
B^{-1} b = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix} \begin{bmatrix} 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 16 \\ 38 \end{bmatrix}.
$$
Hence, the BFS is $[38, 16, 0, 0]^\top$.
d. The first two entries are 16 and 38, respectively. The last component is $-c_B^\top B^{-1} b = -(7(16) + 8(38)) = -416$.
Hence, the last column is the vector $[16, 38, -416]^\top$.
16.6
The columns in the constraint matrix $A$ corresponding to $x_i^+$ and $x_i^-$ are linearly dependent. Hence they
cannot both enter a basis at the same time. This means that only one variable, $x_i^+$ or $x_i^-$, can assume a
nonzero value; the nonbasic variable is necessarily zero.
16.7
a. From the given information, we have the $4 \times 6$ canonical tableau
$$
\begin{array}{rrrrrr}
1 & * & 0 & 0 & 1/2 & 1 \\
0 & * & 1 & 0 & 0 & 2 \\
0 & * & 0 & 1 & 0 & 3 \\
0 & 1 & 0 & 0 & -1 & -6
\end{array}
$$
Explanations:
The given vector $x$ indicates that $A$ is $3 \times 5$.
In the above tableau, we assume that the basis is $[a_1, a_3, a_4]$, in this order. Other permutations of
orders will result in interchanging rows among the first three rows of the tableau.
The fifth column represents the coordinates of $a_5$ with respect to the basis $[a_1, a_3, a_4]$. Because
$[-2, 0, 0, 0, 4]^\top$ lies in the nullspace of $A$, we deduce that $-2a_1 + 4a_5 = 0$, which can be rewritten
as $a_5 = (1/2)a_1 + 0a_3 + 0a_4$, and hence the coordinate vector is $[1/2, 0, 0]^\top$.
b. Let $d_0 = [-2, 0, 0, 0, 4]^\top$. Then, $Ad_0 = 0$. Therefore, the vector $x' = x + \varepsilon d_0$ also satisfies $Ax' = b$. Now,
$x' = x + \varepsilon d_0 = [1 - 2\varepsilon, 0, 2, 3, 4\varepsilon]^\top$. For $x'$ to be feasible, we must have $\varepsilon \le 1/2$. Moreover, the objective
function value of $x'$ is $c^\top x' = z_0 + r_5 x'_5 = 6 - 4\varepsilon$, where $z_0$ is the objective function value of $x$. So, if we
pick any $\varepsilon \in (0, 1/2]$, then $x'$ will be a feasible solution with objective function value strictly less than 6.
For example, with $\varepsilon = 1/2$, $x' = [0, 0, 2, 3, 2]^\top$ is such a point. (We could also have obtained this solution by
pivoting about the element (1, 5) in the tableau of part a.)
16.8
a. The BFS is $[6, 0, 7, 5, 0]^\top$, with objective function value $-8$.
b. $r = [0, 4, 0, 0, -4]^\top$.
c. Yes, because the 5th column has all negative entries.
d. We pivot about the element (3, 2). The new canonical tableau is:
$$
\begin{array}{rrrrrr}
0 & 0 & -1/3 & 1 & 0 & 8/3 \\
1 & 0 & -2/3 & 0 & 0 & 4/3 \\
0 & 1 & 1/3 & 0 & -1 & 7/3 \\
0 & 0 & -4/3 & 0 & 0 & -4/3
\end{array}
$$
e. First note that based on the 5th column, the following point is feasible for any $\varepsilon \ge 0$:
$$
x = \begin{bmatrix} 6 \\ 0 \\ 7 \\ 5 \\ 0 \end{bmatrix} + \varepsilon \begin{bmatrix} 2 \\ 0 \\ 3 \\ 1 \\ 1 \end{bmatrix}.
$$
Note that $x_5 = \varepsilon$. Now, any solution of the form $x = [*, 0, *, *, \varepsilon]^\top$ has an objective function value given by
$z = z_0 + r_5 \varepsilon$, where $z_0 = -8$ and $r_5 = -4$ (from parts a and b). If $z = -100$, then $\varepsilon = 23$. Hence, the following point has
objective function value $z = -100$:
$$
x = \begin{bmatrix} 6 \\ 0 \\ 7 \\ 5 \\ 0 \end{bmatrix} + 23 \begin{bmatrix} 2 \\ 0 \\ 3 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 52 \\ 0 \\ 76 \\ 28 \\ 23 \end{bmatrix}.
$$
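The computation in part e amounts to solving $z_0 + r_5 \varepsilon = z$ for $\varepsilon$ and stepping along the direction read from the 5th column. A small Python sketch of this arithmetic (illustration only; the variable names are hypothetical) is:

```python
from fractions import Fraction

x_bfs = [6, 0, 7, 5, 0]      # basic feasible solution from part a
direction = [2, 0, 3, 1, 1]  # feasible direction read from the 5th column
z0, r5 = -8, -4              # objective value at the BFS and reduced cost of x5
target = -100

# Solve z0 + r5 * eps = target for the step length eps.
eps = Fraction(target - z0, r5)
point = [xi + eps * di for xi, di in zip(x_bfs, direction)]
print(eps, [str(v) for v in point])
```

Any smaller target value can be reached the same way, which is why the problem is unbounded.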
f. The entries of the 2nd column of the given canonical tableau are the coordinates of $a_2$ with respect to the
basis $\{a_4, a_1, a_3\}$. Therefore,
$$
a_2 = a_4 + 2a_1 + 3a_3.
$$
Therefore, the vector $[2, -1, 3, 1, 0]^\top$ lies in the nullspace of $A$. Similarly, using the entries of the 5th
column, we deduce that $[-2, 0, -3, -1, -1]^\top$ also lies in the nullspace of $A$. These two vectors are linearly
independent. Because $A$ has rank 3, the dimension of the nullspace of $A$ is 2. Hence, these two vectors form
a basis for the nullspace of $A$.
16.9
a. We can convert the problem to standard form by multiplying the objective function by $-1$ and introducing
a surplus variable $x_3$. We obtain:
$$
\begin{array}{ll}
\text{minimize} & x_1 + 2x_2 \\
\text{subject to} & x_2 - x_3 = 1 \\
& x_1, x_2, x_3 \ge 0.
\end{array}
$$
Note that we do not need to deal with the absence of the constraint $x_2 \ge 0$ in the original problem, since
$x_2 \ge 1$ implies that $x_2 \ge 0$ also. Had we used the rule of writing $x_2 = u - v$ with $u, v \ge 0$, we would have obtained the
standard form problem:
$$
\begin{array}{ll}
\text{minimize} & x_1 + 2u - 2v \\
\text{subject to} & u - v - x_3 = 1 \\
& x_1, u, v, x_3 \ge 0.
\end{array}
$$
b. For phase I, we set up the artificial problem tableau as:
$$
\begin{array}{rrrrr}
0 & 1 & -1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0
\end{array}
$$
Pivoting about element (1, 4), we obtain the canonical tableau:
$$
\begin{array}{rrrrr}
0 & 1 & -1 & 1 & 1 \\
0 & -1 & 1 & 0 & -1
\end{array}
$$
Pivoting now about element (1, 2), we obtain the next canonical tableau:
$$
\begin{array}{rrrrr}
0 & 1 & -1 & 1 & 1 \\
0 & 0 & 0 & 1 & 0
\end{array}
$$
Hence, phase I terminates, and we use $x_2$ as our initial basic variable for phase II.
For phase II, we set up the problem tableau as:
$$
\begin{array}{rrrr}
0 & 1 & -1 & 1 \\
1 & 2 & 0 & 0
\end{array}
$$
Pivoting about element (1, 2), we obtain
$$
\begin{array}{rrrr}
0 & 1 & -1 & 1 \\
1 & 0 & 2 & -2
\end{array}
$$
Hence, the BFS $[0, 1, 0]^\top$ is optimal, with objective function value 2. Therefore, the optimal solution to the
original problem is $[0, 1]^\top$ with objective function value $-2$.
16.10
a. $[1, 0]^\top$
b. $\begin{bmatrix} 1 & -1 & 1 \end{bmatrix}$
Note that the answer is not
$$
\begin{bmatrix} 1 & -1 & 1 \\ 0 & -1 & 1 \end{bmatrix},
$$
which is the canonical tableau.
c. We choose $q = 2$ because the only negative RCC value is $r_2$. However, $y_{1,2} < 0$. Therefore, the simplex
algorithm terminates with the condition that the problem is unbounded.
d. Any vector of the form $[x_1, x_1 - 1]^\top$, $x_1 \ge 1$, is feasible. Therefore the first component can take arbitrarily
large (positive) values. Hence, the objective function, which is $-x_1$, can take arbitrarily negative values.
16.11
The problem in standard form is:
$$
\begin{array}{ll}
\text{minimize} & x_1 + x_2 \\
\text{subject to} & x_1 + 2x_2 - x_3 = 3 \\
& 2x_1 + x_2 - x_4 = 3 \\
& x_1, x_2, x_3, x_4 \ge 0.
\end{array}
$$
We will use $x_1$ and $x_2$ as initial basic variables. Therefore, Phase I is not needed, and we immediately
proceed with Phase II. The tableau for the problem is:
$$
\begin{array}{c|rrrr|r}
 & a_1 & a_2 & a_3 & a_4 & b \\ \hline
 & 1 & 2 & -1 & 0 & 3 \\
 & 2 & 1 & 0 & -1 & 3 \\ \hline
c^\top & 1 & 1 & 0 & 0 & 0
\end{array}
$$
We compute
$$
\lambda^\top = c_B^\top B^{-1} = [1, 1] \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}^{-1} = [1/3, 1/3],
$$
$$
r_D^\top = c_D^\top - \lambda^\top D = [0, 0] - [1/3, 1/3] \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = [1/3, 1/3] = [r_3, r_4].
$$
The reduced cost coefficients are all nonnegative. Hence, the solution to the standard form problem is
$[1, 1, 0, 0]^\top$. Therefore, the solution to the original problem is $[1, 1]^\top$, and the corresponding cost is 2.
16.12
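a. The problem in standard form is:
$$
\begin{array}{ll}
\text{minimize} & 4x_1 + 3x_2 \\
\text{subject to} & 5x_1 + x_2 - x_3 = 11 \\
& 2x_1 + x_2 - x_4 = 8 \\
& x_1 + 2x_2 - x_5 = 7 \\
& x_1, \ldots, x_5 \ge 0.
\end{array}
$$
We do not have an apparent basic feasible solution. Therefore, we will need to use the two phase method.
Phase I: We introduce artificial variables $x_6, x_7, x_8$ and form the following tableau.
$$
\begin{array}{c|rrrrrrrr|r}
 & a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 & a_8 & b \\ \hline
 & 5 & 1 & -1 & 0 & 0 & 1 & 0 & 0 & 11 \\
 & 2 & 1 & 0 & -1 & 0 & 0 & 1 & 0 & 8 \\
 & 1 & 2 & 0 & 0 & -1 & 0 & 0 & 1 & 7 \\ \hline
c^\top & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0
\end{array}
$$
We then form the following revised tableau:
$$
\begin{array}{c|rrr|r}
\text{Variable} & & B^{-1} & & y_0 \\ \hline
x_6 & 1 & 0 & 0 & 11 \\
x_7 & 0 & 1 & 0 & 8 \\
x_8 & 0 & 0 & 1 & 7
\end{array}
$$
We compute:
$$
\lambda^\top = [1, 1, 1], \qquad r_D^\top = [r_1, r_2, r_3, r_4, r_5] = [-8, -4, 1, 1, 1].
$$
We form the augmented revised tableau by introducing $y_1 = B^{-1} a_1 = a_1$:
$$
\begin{array}{c|rrr|r|r}
\text{Variable} & & B^{-1} & & y_0 & y_1 \\ \hline
x_6 & 1 & 0 & 0 & 11 & 5 \\
x_7 & 0 & 1 & 0 & 8 & 2 \\
x_8 & 0 & 0 & 1 & 7 & 1
\end{array}
$$
We now pivot about the first component of $y_1$ to get
$$
\begin{array}{c|rrr|r}
\text{Variable} & & B^{-1} & & y_0 \\ \hline
x_1 & 1/5 & 0 & 0 & 11/5 \\
x_7 & -2/5 & 1 & 0 & 18/5 \\
x_8 & -1/5 & 0 & 1 & 24/5
\end{array}
$$
We compute
$$
\lambda^\top = [-3/5, 1, 1], \qquad r_D^\top = [r_2, r_3, r_4, r_5, r_6] = [-12/5, -3/5, 1, 1, 8/5].
$$
We bring $y_2 = B^{-1} a_2$ into the basis to get
$$
\begin{array}{c|rrr|r|r}
\text{Variable} & & B^{-1} & & y_0 & y_2 \\ \hline
x_1 & 1/5 & 0 & 0 & 11/5 & 1/5 \\
x_7 & -2/5 & 1 & 0 & 18/5 & 3/5 \\
x_8 & -1/5 & 0 & 1 & 24/5 & 9/5
\end{array}
$$
We pivot about the third component of $y_2$ to get
$$
\begin{array}{c|rrr|r}
\text{Variable} & & B^{-1} & & y_0 \\ \hline
x_1 & 2/9 & 0 & -1/9 & 5/3 \\
x_7 & -1/3 & 1 & -1/3 & 2 \\
x_2 & -1/9 & 0 & 5/9 & 8/3
\end{array}
$$
We compute
$$
\lambda^\top = [-1/3, 1, -1/3], \qquad r_D^\top = [r_3, r_4, r_5, r_6, r_8] = [-1/3, 1, -1/3, 4/3, 4/3].
$$
We bring $y_3 = B^{-1} a_3$ into the basis to get
$$
\begin{array}{c|rrr|r|r}
\text{Variable} & & B^{-1} & & y_0 & y_3 \\ \hline
x_1 & 2/9 & 0 & -1/9 & 5/3 & -2/9 \\
x_7 & -1/3 & 1 & -1/3 & 2 & 1/3 \\
x_2 & -1/9 & 0 & 5/9 & 8/3 & 1/9
\end{array}
$$
We pivot about the second component of $y_3$ to obtain
$$
\begin{array}{c|rrr|r}
\text{Variable} & & B^{-1} & & y_0 \\ \hline
x_1 & 0 & 2/3 & -1/3 & 3 \\
x_3 & -1 & 3 & -1 & 6 \\
x_2 & 0 & -1/3 & 2/3 & 2
\end{array}
$$
We compute
$$
\lambda^\top = [0, 0, 0], \qquad r_D^\top = [r_4, r_5, r_6, r_7, r_8] = [0, 0, 1, 1, 1] \ge 0^\top.
$$
Thus, Phase I is complete, and the initial basic feasible solution is $[3, 2, 6, 0, 0]^\top$.
Phase II: We form the tableau for the original problem:
$$
\begin{array}{c|rrrrr|r}
 & a_1 & a_2 & a_3 & a_4 & a_5 & b \\ \hline
 & 5 & 1 & -1 & 0 & 0 & 11 \\
 & 2 & 1 & 0 & -1 & 0 & 8 \\
 & 1 & 2 & 0 & 0 & -1 & 7 \\ \hline
c^\top & 4 & 3 & 0 & 0 & 0 & 0
\end{array}
$$
The initial revised tableau for Phase II is the final revised tableau for Phase I. We compute
$$
\lambda^\top = [0, 5/3, 2/3], \qquad r_D^\top = [r_4, r_5] = [5/3, 2/3] > 0^\top.
$$
Hence, the optimal solution to the original problem is $[3, 2]^\top$.
b. The problem in standard form is:
$$
\begin{array}{ll}
\text{minimize} & -6x_1 - 4x_2 - 7x_3 - 5x_4 \\
\text{subject to} & x_1 + 2x_2 + x_3 + 2x_4 + x_5 = 20 \\
& 6x_1 + 5x_2 + 3x_3 + 2x_4 + x_6 = 100 \\
& 3x_1 + 4x_2 + 9x_3 + 12x_4 + x_7 = 75 \\
& x_1, \ldots, x_7 \ge 0.
\end{array}
$$
We have an apparent basic feasible solution: $[0, 0, 0, 0, 20, 100, 75]^\top$, corresponding to $B = I_3$. We form the
revised tableau corresponding to this basic feasible solution:
$$
\begin{array}{c|rrr|r}
\text{Variable} & & B^{-1} & & y_0 \\ \hline
x_5 & 1 & 0 & 0 & 20 \\
x_6 & 0 & 1 & 0 & 100 \\
x_7 & 0 & 0 & 1 & 75
\end{array}
$$
We compute
$$
\lambda^\top = [0, 0, 0], \qquad r_D^\top = [r_1, r_2, r_3, r_4] = [-6, -4, -7, -5].
$$
We bring $y_3 = B^{-1} a_3 = a_3$ into the basis to obtain
$$
\begin{array}{c|rrr|r|r}
\text{Variable} & & B^{-1} & & y_0 & y_3 \\ \hline
x_5 & 1 & 0 & 0 & 20 & 1 \\
x_6 & 0 & 1 & 0 & 100 & 3 \\
x_7 & 0 & 0 & 1 & 75 & 9
\end{array}
$$
We pivot about the third component of $y_3$ to get
$$
\begin{array}{c|rrr|r}
\text{Variable} & & B^{-1} & & y_0 \\ \hline
x_5 & 1 & 0 & -1/9 & 35/3 \\
x_6 & 0 & 1 & -1/3 & 75 \\
x_3 & 0 & 0 & 1/9 & 25/3
\end{array}
$$
We compute
$$
\lambda^\top = [0, 0, -7/9], \qquad r_D^\top = [r_1, r_2, r_4, r_7] = [-11/3, -8/9, 13/3, 7/9].
$$
We bring $y_1 = B^{-1} a_1$ into the basis to obtain
$$
\begin{array}{c|rrr|r|r}
\text{Variable} & & B^{-1} & & y_0 & y_1 \\ \hline
x_5 & 1 & 0 & -1/9 & 35/3 & 2/3 \\
x_6 & 0 & 1 & -1/3 & 75 & 5 \\
x_3 & 0 & 0 & 1/9 & 25/3 & 1/3
\end{array}
$$
We pivot about the second component of $y_1$ to obtain
$$
\begin{array}{c|rrr|r}
\text{Variable} & & B^{-1} & & y_0 \\ \hline
x_5 & 1 & -2/15 & -1/15 & 5/3 \\
x_1 & 0 & 1/5 & -1/15 & 15 \\
x_3 & 0 & -1/15 & 2/15 & 10/3
\end{array}
$$
We compute
$$
\lambda^\top = [0, -11/15, -8/15], \qquad r_D^\top = [r_2, r_4, r_6, r_7] = [27/15, 43/15, 11/15, 8/15] > 0^\top.
$$
The optimal solution to the original problem is therefore $[15, 0, 10/3, 0]^\top$.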
16.13
a. By inspection of $r^\top$, we conclude that the basic variables are $x_1, x_3, x_4$, and the basis matrix is
$$
B = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.
$$
Since $r^\top \ge 0^\top$, the basic feasible solution corresponding to the basis $B$ is optimal. This optimal basic
feasible solution is $[8, 0, 9, 7]^\top$.
b. An optimal solution to the dual is given by
$$
\lambda^\top = c_B^\top B^{-1},
$$
where $c_B^\top = [6, 4, 5]$, and
$$
B^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.
$$
We obtain $\lambda^\top = [5, 6, 4]$.
c. We have $r_D^\top = c_D^\top - \lambda^\top D$, where $r_D^\top = [1]$, $c_D^\top = [c_2]$, $\lambda^\top = [5, 6, 4]$, and $D = [2, 1, 3]^\top$. We get
$1 = c_2 - 10 - 6 - 12$, which yields $c_2 = 29$.
16.14
a. There are two basic feasible solutions: $[1, 0]^\top$ and $[0, 2]^\top$.
b. The feasible set in $\mathbb{R}^2$ for this problem is the line segment joining the two basic feasible solutions $[1, 0]^\top$
and $[0, 2]^\top$. Therefore, if the problem has an optimal feasible solution that is not basic, then all points in
the feasible set are optimal. For this, we need
$$
\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = \alpha \begin{bmatrix} 2 \\ 1 \end{bmatrix},
$$
where $\alpha \in \mathbb{R}$.
c. Since all basic feasible solutions are optimal, the relative cost coefficients are all zero.
16.15
a. 2 − α < 0, β ≤ 0, γ ≤ 0, and δ anything.
b. 2 − α ≥ 0, δ = −7, β and γ anything.
c. 2 − α < 0, γ > 0, either β ≤ 0 or 5/γ ≤ 4/β, and δ anything.
16.16
a. The value of α must be 0, because the objective function value is 0 (lower right corner), and α is the
value of an artificial variable.
The value of β must be 0, because it is the RCC value corresponding to a basic column.
The value of γ must be 2, because it must be a positive value. Otherwise, there is a feasible solution to
the artificial problem with objective function value smaller than 0, which is impossible.
The value of δ must be 0, because we must be able to bring the fourth column into the basis without
changing the objective function value.
b. The given linear programming problem does indeed have a feasible solution: $[0, 5, 6, 0]^\top$. We obtain
this by noticing that the right-most column is a linear combination of the second and third columns, with
coefficients 5 and 6.
16.17
First, we convert the inequality constraint $Ax \ge b$ into standard form. To do this, we introduce a vector
$w \in \mathbb{R}^m$ of surplus variables to convert the inequality constraint into the following equivalent constraint:
$$[A, -I] \begin{bmatrix} x \\ w \end{bmatrix} = b, \quad w \ge 0.$$
Next, we introduce variables $u, v \in \mathbb{R}^n$ to replace the free variable $x$ by $u - v$. We then obtain the following
equivalent constraint:
$$[A, -A, -I] \begin{bmatrix} u \\ v \\ w \end{bmatrix} = b, \quad u, v, w \ge 0.$$
The constraint is now in standard form, so we can use Phase I of the simplex method to
implement an algorithm that finds vectors u, v, and w satisfying the above constraint, if such vectors exist, or
declares that none exist. If they exist, we output x = u − v; otherwise, we declare that no x exists such
that $Ax \ge b$. By construction, this algorithm is guaranteed to behave in the way specified by the question.
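The conversion above can be illustrated with a small sketch. It only builds the matrix $[A, -A, -I]$ and maps a given $x$ with $Ax \ge b$ to nonnegative $(u, v, w)$; it does not run Phase I. The function names and the sample data are hypothetical.

```python
def to_standard_form(A):
    """Return the matrix [A, -A, -I] as a list of rows."""
    m = len(A)
    return [row + [-a for a in row] + [-1 if i == j else 0 for j in range(m)]
            for i, row in enumerate(A)]

def split_solution(A, b, x):
    """Map x with Ax >= b to (u, v, w) with x = u - v and w = Ax - b."""
    u = [max(xi, 0) for xi in x]
    v = [max(-xi, 0) for xi in x]
    w = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    return u, v, w

# Hypothetical data: x has a negative component and satisfies Ax >= b.
A = [[1, -2], [3, 1]]
b = [3, 1]
x = [2, -1]

M = to_standard_form(A)
u, v, w = split_solution(A, b, x)
z = u + v + w                                   # stacked vector [u; v; w]
Mz = [sum(a * zi for a, zi in zip(row, z)) for row in M]
print(M, z, Mz)
```

Here `Mz` equals `b` and every entry of `z` is nonnegative, which is what the standard-form constraint requires.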
16.18
a. We form the tableau for the problem:
1   0   0    1/4    −8    −1     9   0
0   1   0    1/2   −12   −1/2    3   0
0   0   1    0       0    1      0   1
0   0   0   −3/4    20   −1/2    6   0
The above tableau is already in canonical form, and therefore we can proceed with the simplex procedure.
We first pivot about the (1, 4)th element, to get
 4   0   0   1   −32   −4      36   0
−2   1   0   0     4    3/2   −15   0
 0   0   1   0     0    1       0   1
 3   0   0   0    −4   −7/2    33   0
Pivoting about the (2, 5)th element, we get
−12     8     0   1   0    8     −84      0
−1/2    1/4   0   0   1    3/8   −15/4    0
  0     0     1   0   0    1       0      1
  1     1     0   0   0   −2      18      0
Pivoting about the (1, 6)th element, we get
−3/2     1     0    1/8    0   1   −21/2   0
 1/16   −1/8   0   −3/64   1   0    3/16   0
 3/2    −1     1   −1/8    0   0    21/2   1
−2       3     0    1/4    0   0   −3      0
Pivoting about the (2, 7)th element, we get
 2     −6     0   −5/2    56     1   0   0
 1/3   −2/3   0   −1/4    16/3   0   1   0
−2      6     1    5/2   −56     0   0   1
−1      1     0   −1/2    16     0   0   0
Pivoting about the (1, 1)th element, we get
1   −3     0   −5/4   28    1/2   0   0
0    1/3   0    1/6   −4   −1/6   1   0
0    0     1    0      0    1     0   1
0   −2     0   −7/4   44    1/2   0   0
Pivoting about the (2, 2)th element, we get
1   0   0    1/4    −8    −1     9   0
0   1   0    1/2   −12   −1/2    3   0
0   0   1    0       0    1      0   1
0   0   0   −3/4    20   −1/2    6   0
which is identical to the initial tableau. Therefore, cycling occurs.
b. We start with the initial tableau of part a, and pivot about the (1, 4)th element to obtain
 4   0   0   1   −32   −4      36   0
−2   1   0   0     4    3/2   −15   0
 0   0   1   0     0    1       0   1
 3   0   0   0    −4   −7/2    33   0
Pivoting about the (2, 5)th element, we get
−12     8     0   1   0    8     −84      0
−1/2    1/4   0   0   1    3/8   −15/4    0
  0     0     1   0   0    1       0      1
  1     1     0   0   0   −2      18      0
Pivoting about the (1, 6)th element, we get
−3/2     1     0    1/8    0   1   −21/2   0
 1/16   −1/8   0   −3/64   1   0    3/16   0
 3/2    −1     1   −1/8    0   0    21/2   1
−2       3     0    1/4    0   0   −3      0
Pivoting about the (2, 1)th element, we get
0   −2   0   −1      24   1   −6   0
1   −2   0   −3/4    16   0    3   0
0    2   1    1     −24   0    6   1
0   −1   0   −5/4    32   0    3   0
Pivoting about the (3, 2)th element, we get
0   0   1      0       0    1   0   1
1   0   1      1/4    −8    0   9   1
0   1   1/2    1/2   −12    0   3   1/2
0   0   1/2   −3/4    20    0   6   1/2
Pivoting about the (3, 4)th element, we get
0    0     1     0     0    1    0      1
1   −1/2   3/4   0    −2    0   15/2    3/4
0    2     1     1   −24    0    6      1
0    3/2   5/4   0     2    0   21/2    5/4
The reduced cost coefficients are all nonnegative. Hence, the optimal solution to the problem is
$[3/4, 0, 0, 1, 0, 1, 0]^\top$. The corresponding optimal cost is −5/4.
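The two pivot sequences in parts a and b can be replayed with exact fractions. The sketch below is an illustration only, not the manual's MATLAB routine: it assumes ties in the ratio test are broken by taking the first row, and its `bland` flag selects the first column with a negative relative cost coefficient, mirroring the two choices of entering column used above. All names are hypothetical.

```python
from fractions import Fraction as F

def pivot(T, p, q):
    """Pivot the tableau T (a list of rows) about element (p, q), 0-based."""
    piv_row = [x / T[p][q] for x in T[p]]
    return [piv_row if i == p else [a - row[q] * b for a, b in zip(row, piv_row)]
            for i, row in enumerate(T)]

def simplex(T, bland, max_pivots=50):
    """Return (status, list of 1-based pivots, final tableau)."""
    m, n = len(T) - 1, len(T[0]) - 1
    seen, pivots = [T], []
    while len(pivots) < max_pivots:
        r = T[m][:n]
        if all(x >= 0 for x in r):
            return "optimal", pivots, T
        # Entering column: first negative r_j, or the most negative r_j.
        q = next(j for j in range(n) if r[j] < 0) if bland else r.index(min(r))
        rows = [i for i in range(m) if T[i][q] > 0]
        if not rows:
            return "unbounded", pivots, T
        # Ratio test; min() keeps the first row among ties.
        p = min(rows, key=lambda i: T[i][n] / T[i][q])
        T = pivot(T, p, q)
        pivots.append((p + 1, q + 1))
        if T in seen:
            return "cycle", pivots, T
        seen.append(T)
    return "gave up", pivots, T

# Initial tableau of part a (last row: relative cost coefficients).
T0 = [[F(s) for s in row] for row in [
    ["1", "0", "0", "1/4",  "-8",  "-1",   "9", "0"],
    ["0", "1", "0", "1/2",  "-12", "-1/2", "3", "0"],
    ["0", "0", "1", "0",    "0",   "1",    "0", "1"],
    ["0", "0", "0", "-3/4", "20",  "-1/2", "6", "0"],
]]

status_a, pivots_a, _ = simplex(T0, bland=False)
status_b, pivots_b, T_b = simplex(T0, bland=True)
print(status_a, pivots_a)
print(status_b, pivots_b, T_b[3][7])
```

With the most negative coefficient rule the sketch returns to the initial tableau after six pivots, while the smallest-index choice terminates with lower-right entry 5/4, that is, optimal cost −5/4.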
16.19
a. We have
$$A d^{(0)} = A(x^{(1)} - x^{(0)})/\alpha_0 = (b - b)/\alpha_0 = 0.$$
Hence, $d^{(0)} \in \mathcal{N}(A)$.
b. From our discussion of moving from one BFS to an adjacent BFS, we deduce that
$$d^{(0)} = \begin{bmatrix} -y_q \\ e_{q-m} \end{bmatrix}.$$
In other words, the first m components of $d^{(0)}$ are $-y_{1q}, \ldots, -y_{mq}$, and all the other components are 0 except
the qth component, which is 1.
16.20
The following is a MATLAB function that implements the simplex algorithm.
function [x,v]=simplex(c,A,b,v,options)
%   SIMPLEX(c,A,b,v);
%   SIMPLEX(c,A,b,v,options);
%
%   x = SIMPLEX(c,A,b,v);
%   x = SIMPLEX(c,A,b,v,options);
%
%   [x,v] = SIMPLEX(c,A,b,v);
%   [x,v] = SIMPLEX(c,A,b,v,options);
%
%SIMPLEX(c,A,b,v) solves the following linear program using the
%Simplex Method:
%   min c'x subject to Ax=b, x>=0,
%where [A b] is in canonical form, and v is the vector of indices of
%basic columns. Specifically, the v(i)-th column of A is the i-th
%standard basis vector.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results (default is no display: 0).
%OPTIONS(5) specifies how the pivot element is selected;
%   0=choose the most negative relative cost coefficient;
%   1=use Bland's rule.
if nargin ~= 5
  options = [];
  if nargin ~= 4
    disp('Wrong number of arguments.');
    return;
  end
end
format compact;
%format short e;
options = foptions(options);
print = options(1);
n=length(c);
m=length(b);
cB=c(v(:));
r = c'-cB'*A; %row vector of relative cost coefficients
cost = -cB'*b;
tabl=[A b;r cost];
if print,
  disp(' ');
  disp('Initial tableau:');
  disp(tabl);
end %if
while ones(1,n)*(r' >= zeros(n,1)) ~= n
  if options(5) == 0;
    [r_q,q] = min(r);
  else
    %Bland's rule
    q=1;
    while r(q) >= 0
      q=q+1;
    end
  end %if
  min_ratio = inf;
  p=0;
  for i=1:m,
    if tabl(i,q)>0
      if tabl(i,n+1)/tabl(i,q) < min_ratio
        min_ratio = tabl(i,n+1)/tabl(i,q);
        p = i;
      end %if
    end %if
  end %for
  if p == 0
    disp('Problem unbounded');
    break;
  end %if
  tabl=pivot(tabl,p,q);
  if print,
    disp('Pivot point:');
    disp([p,q]);
    disp('New tableau:');
    disp(tabl);
  end %if
  v(p) = q;
  r = tabl(m+1,1:n);
end %while
x=zeros(n,1);
x(v(:))=tabl(1:m,n+1);
The above function makes use of the following function that implements pivoting:
function Mnew=pivot(M,p,q)
%Mnew=pivot(M,p,q)
%Returns the matrix Mnew resulting from pivoting about the
%(p,q)th element of the given matrix M.
for i=1:size(M,1),
  if i==p
    Mnew(p,:)=M(p,:)/M(p,q);
  else
    Mnew(i,:)=M(i,:)-M(p,:)*(M(i,q)/M(p,q));
  end %if
end %for
%----------------------------------------------------------------
We now apply the simplex algorithm to the problem in Example 16.2, as follows:
>> A=[1 0 1 0 0; 0 1 0 1 0; 1 1 0 0 1];
>> b=[4;6;8];
>> c=[-2;-5;0;0;0];
>> v=[3;4;5];
>> options(1)=1;
>> [x,v]=simplex(c,A,b,v,options);
Initial Tableau:
     1     0     1     0     0     4
     0     1     0     1     0     6
     1     1     0     0     1     8
    -2    -5     0     0     0     0
Pivot point:
     2     2
New tableau:
     1     0     1     0     0     4
     0     1     0     1     0     6
     1     0     0    -1     1     2
    -2     0     0     5     0    30
Pivot point:
     3     1
New tableau:
     0     0     1     1    -1     2
     0     1     0     1     0     6
     1     0     0    -1     1     2
     0     0     0     3     2    34
>> disp(x');
     2     6     2     0     0
>> disp(v');
     3     2     1
As indicated above, the solution to the problem in standard form is $[2, 6, 2, 0, 0]^\top$, and the objective
function value is −34. The optimal cost for the original maximization problem is 34.
16.21
The following is a MATLAB routine that implements the two-phase simplex method, using the MATLAB
function from Exercise 16.20.
function [x,v]=tpsimplex(c,A,b,options)
%   TPSIMPLEX(c,A,b);
%   TPSIMPLEX(c,A,b,options);
%
%   x = TPSIMPLEX(c,A,b);
%   x = TPSIMPLEX(c,A,b,options);
%
%   [x,v] = TPSIMPLEX(c,A,b);
%   [x,v] = TPSIMPLEX(c,A,b,options);
%
%TPSIMPLEX(c,A,b) solves the following linear program using the
%two-phase simplex method:
%   min c'x subject to Ax=b, x>=0.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results (default is no display: 0).
%OPTIONS(5) specifies how the pivot element is selected;
%   0=choose the most negative relative cost coefficient;
%   1=use Bland's rule.
if nargin ~= 4
  options = [];
  if nargin ~= 3
    disp('Wrong number of arguments.');
    return;
  end
end
clc;
format compact;
%format short e;
options = foptions(options);
print = options(1);
n=length(c);
m=length(b);
%Phase I
if print,
  disp(' ');
  disp('Phase I');
  disp('-------');
end
v=n*ones(m,1);
for i=1:m
  v(i)=v(i)+i;
end
[x,v]=simplex([zeros(n,1);ones(m,1)],[A eye(m)],b,v,options);
if all(v<=n),
  %Phase II
  if print
    disp(' ');
    disp('Phase II');
    disp('--------');
    disp('Basic columns:')
    disp(v')
  end
  %Convert [A b] into canonical augmented matrix
  Binv=inv(A(:,[v]));
  A=Binv*A;
  b=Binv*b;
  [x,v]=simplex(c,A,b,v,options);
  if print
    disp(' ');
    disp('Final solution:');
    disp(x');
  end
else
  %assumes nondegeneracy
  disp('Terminating: problem has no feasible solution.');
end
%----------------------------------------------------------------
We now apply the above MATLAB routine to the problem in Example 16.5, as follows:
>> A=[1 1 1 0; 5 3 0 -1];
>> b=[4;8];
>> c=[-3;-5;0;0];
>> options(1)=1;
>> format rat;
>> tpsimplex(c,A,b,options);
Phase I
-------
Initial Tableau:
     1      1      1      0      1      0      4
     5      3      0     -1      0      1      8
    -6     -4     -1      1      0      0    -12
Pivot point:
     2      1
New tableau:
     0     2/5     1     1/5     1    -1/5    12/5
     1     3/5     0    -1/5     0     1/5     8/5
     0    -2/5    -1    -1/5     0     6/5   -12/5
Pivot point:
     1      3
New tableau:
     0     2/5     1     1/5     1    -1/5    12/5
     1     3/5     0    -1/5     0     1/5     8/5
     0      *      0      *      1      1       *
Pivot point:
     2      2
New tableau:
   -2/3     0      1     1/3     1    -1/3     4/3
    5/3     1      0    -1/3     0     1/3     8/3
     *      0      0      *      1      1       *
Pivot point:
     1      4
New tableau:
    -2      0      3      1      3     -1      4
     1      1      1      0      1      0      4
     *      0      *      0      1      1      *
Phase II
--------
Basic columns:
     4      2
Initial Tableau:
    -2      0      3      1      4
     1      1      1      0      4
     2      0      5      0     20
Final solution:
     0      4      0      4
16.22
The following is a MATLAB function that implements the revised simplex algorithm.
function [x,v,Binv]=revsimp(c,A,b,v,Binv,options)
%   REVSIMP(c,A,b,v,Binv);
%   REVSIMP(c,A,b,v,Binv,options);
%
%   x = REVSIMP(c,A,b,v,Binv);
%   x = REVSIMP(c,A,b,v,Binv,options);
%
%   [x,v,Binv] = REVSIMP(c,A,b,v,Binv);
%   [x,v,Binv] = REVSIMP(c,A,b,v,Binv,options);
%
%REVSIMP(c,A,b,v,Binv) solves the following linear program using the
%revised simplex method:
%   min c'x subject to Ax=b, x>=0,
%where v is the vector of indices of basic columns, and Binv is the
%inverse of the basis matrix. Specifically, the v(i)-th column of
%A is the i-th column of the basis matrix.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results (default is no display: 0).
%OPTIONS(5) specifies how the pivot element is selected;
%   0=choose the most negative relative cost coefficient;
%   1=use Bland's rule.
if nargin ~= 6
  options = [];
  if nargin ~= 5
    disp('Wrong number of arguments.');
    return;
  end
end
format compact;
%format short e;
options = foptions(options);
print = options(1);
n=length(c);
m=length(b);
cB=c(v(:));
y0 = Binv*b;
lambdaT=cB'*Binv;
r = c'-lambdaT*A; %row vector of relative cost coefficients
if print,
  disp(' ');
  disp('Initial revised tableau [v B^(-1) y0]:');
  disp([v Binv y0]);
  disp('Relative cost coefficients:');
  disp(r);
end %if
while ones(1,n)*(r' >= zeros(n,1)) ~= n
  if options(5) == 0;
    [r_q,q] = min(r);
  else
    %Bland's rule
    q=1;
    while r(q) >= 0
      q=q+1;
    end
  end %if
  yq = Binv*A(:,q);
  min_ratio = inf;
  p=0;
  for i=1:m,
    if yq(i)>0
      if y0(i)/yq(i) < min_ratio
        min_ratio = y0(i)/yq(i);
        p = i;
      end %if
    end %if
  end %for
  if p == 0
    disp('Problem unbounded');
    break;
  end %if
  if print,
    disp('Augmented revised tableau [v B^(-1) y0 yq]:')
    disp([v Binv y0 yq]);
    disp('(p,q):');
    disp([p,q]);
  end
  augrevtabl=pivot([Binv y0 yq],p,m+2);
  Binv=augrevtabl(:,1:m);
  y0=augrevtabl(:,m+1);
  v(p) = q;
  cB=c(v(:));
  lambdaT=cB'*Binv;
  r = c'-lambdaT*A; %row vector of relative cost coefficients
  if print,
    disp('New revised tableau [v B^(-1) y0]:');
    disp([v Binv y0]);
    disp('Relative cost coefficients:');
    disp(r);
  end %if
end %while
x=zeros(n,1);
x(v(:))=y0;
The function makes use of the pivoting function in Exercise 16.20.
We now apply the revised simplex algorithm to the problem in Example 16.2, as follows:
>> A=[1 0 1 0 0; 0 1 0 1 0; 1 1 0 0 1];
>> b=[4;6;8];
>> c=[-2;-5;0;0;0];
>> v=[3;4;5];
>> Binv=eye(3);
>> options(1)=1;
>> [x,v,Binv]=revsimp(c,A,b,v,Binv,options);
Initial revised tableau [v B^(-1) y0]:
     3     1     0     0     4
     4     0     1     0     6
     5     0     0     1     8
Relative cost coefficients:
    -2    -5     0     0     0
Augmented revised tableau [v B^(-1) y0 yq]:
     3     1     0     0     4     0
     4     0     1     0     6     1
     5     0     0     1     8     1
(p,q):
     2     2
New revised tableau [v B^(-1) y0]:
     3     1     0     0     4
     2     0     1     0     6
     5     0    -1     1     2
Relative cost coefficients:
    -2     0     0     5     0
Augmented revised tableau [v B^(-1) y0 yq]:
     3     1     0     0     4     1
     2     0     1     0     6     0
     5     0    -1     1     2     1
(p,q):
     3     1
New revised tableau [v B^(-1) y0]:
     3     1     1    -1     2
     2     0     1     0     6
     1     0    -1     1     2
Relative cost coefficients:
     0     0     0     3     2
>> disp(x');
     2     6     2     0     0
>> disp(v');
     3     2     1
>> disp(Binv);
     1     1    -1
     0     1     0
     0    -1     1
16.23
The following is a MATLAB routine that implements the two-phase revised simplex method, using the
MATLAB function from Exercise 16.22.
function [x,v]=tprevsimp(c,A,b,options)
%   TPREVSIMP(c,A,b);
%   TPREVSIMP(c,A,b,options);
%
%   x = TPREVSIMP(c,A,b);
%   x = TPREVSIMP(c,A,b,options);
%
%   [x,v] = TPREVSIMP(c,A,b);
%   [x,v] = TPREVSIMP(c,A,b,options);
%
%TPREVSIMP(c,A,b) solves the following linear program using the
%two-phase revised simplex method:
%   min c'x subject to Ax=b, x>=0.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results (default is no display: 0).
%OPTIONS(5) specifies how the pivot element is selected;
%   0=choose the most negative relative cost coefficient;
%   1=use Bland's rule.
if nargin ~= 4
  options = [];
  if nargin ~= 3
    disp('Wrong number of arguments.');
    return;
  end
end
clc;
format compact;
%format short e;
options = foptions(options);
print = options(1);
n=length(c);
m=length(b);
%Phase I
if print,
  disp(' ');
  disp('Phase I');
  disp('-------');
end
v=n*ones(m,1);
for i=1:m
  v(i)=v(i)+i;
end
[x,v,Binv]=revsimp([zeros(n,1);ones(m,1)],[A eye(m)],b,v,eye(m),options);
%Phase II
if print
  disp(' ');
  disp('Phase II');
  disp('--------');
end
[x,v,Binv]=revsimp(c,A,b,v,Binv,options);
if print
  disp(' ');
  disp('Final solution:');
  disp(x');
end
%----------------------------------------------------------------
We now apply the above MATLAB routine to the problem in Example 16.5, as follows:
>> A=[4 2 -1 0; 1 4 0 -1];
>> b=[12;6];
>> c=[2;3;0;0];
>> options(1)=1;
>> format rat;
>> tprevsimp(c,A,b,options);
Phase I
-------
Initial revised tableau [v B^(-1) y0]:
     5      1      0     12
     6      0      1      6
Relative cost coefficients:
    -5     -6      1      1      0      0
Augmented revised tableau [v B^(-1) y0 yq]:
     5      1      0     12      2
     6      0      1      6      4
(p,q):
     2      2
New revised tableau [v B^(-1) y0]:
     5      1    -1/2     9
     2      0     1/4    3/2
Relative cost coefficients:
   -7/2     0      1    -1/2     0     3/2
Augmented revised tableau [v B^(-1) y0 yq]:
     5      1    -1/2     9     7/2
     2      0     1/4    3/2    1/4
(p,q):
     1      1
New revised tableau [v B^(-1) y0]:
     1     2/7   -1/7   18/7
     2    -1/14   2/7    6/7
Relative cost coefficients:
     0      0      0      0      1      1
Phase II
--------
Initial revised tableau [v B^(-1) y0]:
     1     2/7   -1/7   18/7
     2    -1/14   2/7    6/7
Relative cost coefficients:
     *      0     5/14   4/7
Final solution:
   18/7    6/7     0      0
17. Duality
17.1
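Since x and λ are feasible, we have $Ax \ge b$, $x \ge 0$, and $\lambda^\top A \le c^\top$, $\lambda \ge 0$. Postmultiplying both sides of
$\lambda^\top A \le c^\top$ by $x \ge 0$ yields
$$\lambda^\top A x \le c^\top x.$$
Since $Ax \ge b$ and $\lambda^\top \ge 0^\top$, we have $\lambda^\top A x \ge \lambda^\top b$. Hence, $\lambda^\top b \le c^\top x$.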
17.2
The primal problem is:
minimize    $e_n^\top p$
subject to  $Gp \ge P e_m$
            $p \ge 0$,
where $G = [g_{i,j}]$, $e_n = [1, \ldots, 1]^\top$ (with n components), and $p = [p_1, \ldots, p_n]^\top$. The dual of the problem is
(using symmetric duality):
maximize    $P \lambda^\top e_m$
subject to  $\lambda^\top G \le e_n^\top$
            $\lambda \ge 0$.
17.3
a. We first transform the problem into standard form:
minimize    $-2x_1 - 3x_2$
subject to  $x_1 + 2x_2 + x_3 = 4$
            $2x_1 + x_2 + x_4 = 5$
            $x_1, x_2, x_3, x_4 \ge 0$.
The initial tableau is:
 1    2   1   0   4
 2    1   0   1   5
−2   −3   0   0   0
We now pivot about the (1, 2)th element to get:
 1/2   1    1/2   0   2
 3/2   0   −1/2   1   3
−1/2   0    3/2   0   6
Pivoting now about the (2, 1)th element gives:
0   1    2/3   −1/3   1
1   0   −1/3    2/3   2
0   0    4/3    1/3   7
Thus, the solution to the standard form problem is $x_1 = 2$, $x_2 = 1$, $x_3 = 0$, $x_4 = 0$. The solution to the
original problem is $x_1 = 2$, $x_2 = 1$.
b. The dual to the standard form problem is
maximize    $4\lambda_1 + 5\lambda_2$
subject to  $\lambda_1 + 2\lambda_2 \le -2$
            $2\lambda_1 + \lambda_2 \le -3$
            $\lambda_1, \lambda_2 \le 0$.
From the discussion before Example 17.6, it follows that the solution to the dual is $\lambda^\top = c_I^\top - r_I^\top = [-4/3, -1/3]$.
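As a sanity check on parts a and b, the primal and dual solutions can be tested for feasibility and equal objective values. The sketch below uses exact fractions; the variable names are chosen for illustration and are not from the manual.

```python
from fractions import Fraction as F

# Standard-form data from part a.
A = [[1, 2, 1, 0],
     [2, 1, 0, 1]]
b = [4, 5]
c = [-2, -3, 0, 0]

x = [2, 1, 0, 0]               # primal solution from the final tableau
lam = [F(-4, 3), F(-1, 3)]     # dual solution from part b

Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
lamA = [sum(lam[i] * A[i][j] for i in range(2)) for j in range(4)]

primal_feasible = Ax == b and all(xi >= 0 for xi in x)
dual_feasible = all(l <= cj for l, cj in zip(lamA, c))
primal_cost = sum(cj * xj for cj, xj in zip(c, x))
dual_cost = sum(l * bi for l, bi in zip(lam, b))
print(primal_feasible, dual_feasible, primal_cost, dual_cost)
```

Both points are feasible and both objective values equal −7, as the duality theorem requires.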
17.4
The dual problem is
maximize    $11\lambda_1 + 8\lambda_2 + 7\lambda_3$
subject to  $5\lambda_1 + 2\lambda_2 + \lambda_3 \le 4$
            $\lambda_1 + \lambda_2 + 2\lambda_3 \le 3$
            $\lambda_1, \lambda_2, \lambda_3 \ge 0$.
Note that we may arrive at the above in one of two ways: by applying the asymmetric form of duality, or
by applying the symmetric form of duality to the original problem in standard form. From the solution of
Exercise 16.11a, we have that the solution to the dual is $\lambda^\top = c_B^\top B^{-1} = [0, 5/3, 2/3]$ (using the proof of the
duality theorem).
17.5
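We represent the primal in the form
minimize    $c^\top x$
subject to  $Ax = b$
            $x \ge 0$.
The corresponding dual is
maximize    $\lambda^\top b$
subject to  $\lambda^\top A \le c^\top$,
that is,
maximize    $2\lambda_1 + 7\lambda_2 + 3\lambda_3$
subject to  $\begin{bmatrix} \lambda_1 & \lambda_2 & \lambda_3 \end{bmatrix} \begin{bmatrix} -2 & 1 & 1 & 0 & 0 \\ -1 & 2 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix} \le \begin{bmatrix} -1 & -2 & 0 & 0 & 0 \end{bmatrix}$.
The solution to the dual can be obtained using the formula $\lambda^{*\top} = c_B^\top B^{-1}$, where
$$c_B^\top = \begin{bmatrix} -1 & -2 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} -2 & 1 & 1 \\ -1 & 2 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$
Note that because the last element in $c_B^\top$ is zero, we do not need to calculate the last row of $B^{-1}$ when
computing $\lambda^{*\top}$; that is, these elements are “don’t care” elements that we denote using the asterisk. Hence,
$$\lambda^{*\top} = c_B^\top B^{-1} = \begin{bmatrix} -1 & -2 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1/2 & 1/2 \\ * & * & * \end{bmatrix} = \begin{bmatrix} 0 & -1 & -2 \end{bmatrix}.$$
Note that
$$c^\top x^* = \lambda^{*\top} b = -13,$$
as expected.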
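The dual solution above can be verified without inverting $B$: it suffices to check that $\lambda^{*\top} B = c_B^\top$, that $\lambda^{*\top} A \le c^\top$, and that $\lambda^{*\top} b = -13$. In the sketch below, the primal point `x` is an assumption: it was obtained by solving $B x_B = b$ by hand and is not listed in the solution text.

```python
# Data of the primal in standard form (basic columns are the first three).
A = [[-2, 1, 1, 0, 0],
     [-1, 2, 0, 1, 0],
     [ 1, 0, 0, 0, 1]]
b = [2, 7, 3]
c = [-1, -2, 0, 0, 0]
lam = [0, -1, -2]          # dual solution computed above
x = [3, 5, 3, 0, 0]        # assumed primal point, from solving B x_B = b by hand

lamA = [sum(lam[i] * A[i][j] for i in range(3)) for j in range(5)]
Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]

basis_ok = lamA[:3] == c[:3]                         # lambda^T B = c_B^T
dual_feasible = all(l <= cj for l, cj in zip(lamA, c))
dual_cost = sum(l * bi for l, bi in zip(lam, b))
primal_cost = sum(cj * xj for cj, xj in zip(c, x))
print(lamA, Ax, dual_cost, primal_cost)
```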
17.6
a. Multiplying the objective function by −1, we see that the problem is of the form of the dual in the
asymmetric form of duality. Therefore, the dual to the problem is of the form of the primal in the asymmetric
form:
minimize    $\lambda^\top b$
subject to  $\lambda^\top A = -c^\top$
            $\lambda \ge 0$.
b. The given vector y is feasible in the dual. Since b = 0, any feasible point in the dual is optimal. Thus, y
is optimal in the dual, and the objective function value for y is 0. Therefore, by the Duality Theorem, the
primal also has an optimal feasible solution, and the corresponding objective function value is 0. Since the
vector 0 is feasible in the primal and has objective function value 0, the vector 0 is a solution to the primal.
17.7
We introduce two sets of nonnegative variables: $x_i^+ \ge 0$, $x_i^- \ge 0$, $i = 1, 2, 3$. We can then represent the
optimization problem in the form
minimize    $(x_1^+ + x_1^-) + (x_2^+ + x_2^-) + (x_3^+ + x_3^-)$
subject to  $\begin{bmatrix} 1 & 1 & -1 & -1 & -1 & 1 \\ 0 & -1 & 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1^+ \\ x_2^+ \\ x_3^+ \\ x_1^- \\ x_2^- \\ x_3^- \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$
            $x_i^+ \ge 0$, $x_i^- \ge 0$.
We form the initial tableau,
         x1+   x2+   x3+   x1−   x2−   x3−    b
          1     1    −1    −1    −1     1     2
          0    −1     0     0     1     0     1
c^T       1     1     1     1     1     1     0
There is no apparent basic feasible solution. We add the second row to the first one to obtain,
         x1+   x2+   x3+   x1−   x2−   x3−    b
          1     0    −1    −1     0     1     3
          0    −1     0     0     1     0     1
c^T       1     1     1     1     1     1     0
We next calculate the reduced cost coefficients,
         x1+   x2+   x3+   x1−   x2−   x3−    b
          1     0    −1    −1     0     1     3
          0    −1     0     0     1     0     1
c^T       0     2     2     2     0     0    −4
We have zeros under the basic columns. The reduced cost coefficients are all nonnegative. The optimal
solution is,
$$x^* = \begin{bmatrix} 3 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}^\top.$$
The optimal solution to the original problem is $x^* = [3, -1, 0]^\top$.
The dual of the above linear program is
maximize    $2\lambda_1 + \lambda_2$
subject to  $\begin{bmatrix} \lambda_1 & \lambda_2 \end{bmatrix} \begin{bmatrix} 1 & 1 & -1 & -1 & -1 & 1 \\ 0 & -1 & 0 & 0 & 1 & 0 \end{bmatrix} \le \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}$.
The optimal solution to the dual is
$$\lambda^{*\top} = c_B^\top B^{-1} = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 2 \end{bmatrix}.$$
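The primal and dual solutions of this exercise can be cross-checked numerically. The sketch below assumes, as the split into $x_i^+$ and $x_i^-$ suggests, that the original objective is $|x_1| + |x_2| + |x_3|$; the variable names are illustrative.

```python
# Constraint data in the split variables [x+; x-] and in the original x.
A_split = [[1, 1, -1, -1, -1, 1],
           [0, -1, 0, 0, 1, 0]]
A = [row[:3] for row in A_split]       # coefficients of the original x
b = [2, 1]

x = [3, -1, 0]                         # solution to the original problem
lam = [1, 2]                           # dual solution

Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
lamA = [sum(lam[i] * A_split[i][j] for i in range(2)) for j in range(6)]

primal_cost = sum(abs(xi) for xi in x)             # assumed objective: l1 norm
dual_cost = sum(l * bi for l, bi in zip(lam, b))
dual_feasible = all(v <= 1 for v in lamA)
print(Ax, lamA, primal_cost, dual_cost)
```

The point satisfies the equality constraint, the dual vector is feasible, and both objective values equal 4.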
17.8
a. The dual (asymmetric form) is
maximize    $\lambda$
subject to  $\lambda a_i \le 1$, $i = 1, \ldots, n$.
We can write the constraint as
$$\lambda \le \min\{1/a_i : i = 1, \ldots, n\} = 1/a_n.$$
Therefore, the solution to the dual problem is
$$\lambda = 1/a_n.$$
b. Duality Theorem: If the primal problem has an optimal solution, then so does the dual, and the optimal
values of their respective objective functions are equal.
By the duality theorem, the primal has an optimal solution, and the optimal value of the objective function
is $1/a_n$. The only feasible point in the primal with this objective function value is the basic feasible solution
$[0, \ldots, 0, 1/a_n]^\top$.
c. Suppose we start at a nonoptimal initial basic feasible solution, $[0, \ldots, 1/a_i, \ldots, 0]^\top$, where $1 \le i \le n - 1$.
The relative cost coefficient for the qth column, $q \ne i$, is
$$r_q = 1 - \frac{a_q}{a_i}.$$
Since $a_n > a_j$ for any $j \ne n$, $r_q$ is the most negative relative cost coefficient if and only if $q = n$.
17.9
a. By asymmetric duality, the dual is given by
minimize    $\lambda$
subject to  $\lambda \ge c_i$, $i = 1, \ldots, n$.
b. The constraint in part a implies that λ is feasible if and only if $\lambda \ge c_4$. Hence, the solution is $\lambda^* = c_4$.
c. By the duality theorem, the optimal objective function value for the given problem is $c_4$. The only solution
that achieves this value is $x_4^* = 1$ and $x_i^* = 0$ for all $i \ne 4$.
17.10
a. The dual is
minimize    $\lambda^\top 0$
subject to  $\lambda^\top A \ge c^\top$
            $\lambda \ge 0$.
b. By the duality theorem, we conclude that the optimal value of the objective function is 0. The only
vector satisfying x ≥ 0 that has an objective function value of 0 is x = 0. Therefore, the solution is x = 0.
c. The constraint set contains only the vector 0. Any other vector x satisfying x ≥ 0 has at least one
positive component, and consequently has a positive objective function value. But this contradicts the fact
that the optimal solution has an objective function value of 0.
17.11
a. The artificial problem is:
minimize    $[0^\top, e^\top] z$
subject to  $[A, I] z = b$
            $z \ge 0$,
where $e = [1, \ldots, 1]^\top$ and $z = [x^\top, y^\top]^\top$.
b. The dual to the artificial problem is:
maximize    $\lambda^\top b$
subject to  $\lambda^\top A \le 0^\top$
            $\lambda^\top \le e^\top$.
c. Suppose the given original linear programming problem has a feasible solution. By the FTLP, the original
LP problem has a BFS. Then, by a theorem given in class, the artificial problem has an optimal feasible
solution with y = 0. Hence, by the Duality Theorem, the dual of the artificial problem also has an optimal
feasible solution.
17.12
a. Possible. This situation arises if the primal is unbounded, which by the Weak Duality Lemma implies
that the dual has no feasible solution.
b. Impossible, because the Duality Theorem requires that if the primal has an optimal feasible solution,
then so does the dual.
c. Impossible, because the Duality Theorem requires that if the dual has an optimal feasible solution, then
so does the primal. Also, the Weak Duality Lemma requires that if the primal is unbounded (i.e., has a feasible
solution but no optimal feasible solution), then the dual must have no feasible solution.
17.13
To prove the result, we use Theorem 17.3 (Complementary Slackness). Since $\mu \ge 0$, we have $A^\top \lambda = c - \mu \le c$.
Hence, λ is a feasible solution to the dual. Now, $(c - A^\top \lambda)^\top x = \mu^\top x = 0$. Therefore, by Theorem 17.3,
x and λ are optimal for their respective problems.
17.14
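To use the symmetric form of duality, we need to rewrite the problem as
minimize    $-c^\top (u - v)$
subject to  $-A(u - v) \ge -b$
            $u, v \ge 0$,
which we represent in the form
minimize    $[-c^\top \ \ c^\top] \begin{bmatrix} u \\ v \end{bmatrix}$
subject to  $[-A \ \ A] \begin{bmatrix} u \\ v \end{bmatrix} \ge -b$
            $\begin{bmatrix} u \\ v \end{bmatrix} \ge 0$.
By the symmetric form of duality, the dual is:
maximize    $\lambda^\top (-b)$
subject to  $\lambda^\top [-A \ \ A] \le [-c^\top \ \ c^\top]$
            $\lambda \ge 0$.
Note that for the constraint involving A, we have
$$\lambda^\top [-A \ \ A] \le [-c^\top \ \ c^\top] \;\Leftrightarrow\; -\lambda^\top A \le -c^\top \text{ and } \lambda^\top A \le c^\top \;\Leftrightarrow\; \lambda^\top A = c^\top.$$
Therefore, we can represent the dual as
minimize    $\lambda^\top b$
subject to  $\lambda^\top A = c^\top$
            $\lambda \ge 0$.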
17.15
The corresponding dual can be written as:
maximize    $3\lambda_1 + 3\lambda_2$
subject to  $\lambda_1 + 2\lambda_2 \le 1$
            $2\lambda_1 + \lambda_2 \le 1$
            $\lambda_1, \lambda_2 \ge 0$.
To solve the dual, we refer back to the solution of Exercise 16.11. Using the idea of the proof of the duality
theorem (Theorem 17.2), we obtain the solution to the dual as $\lambda^\top = c_B^\top B^{-1} = [1/3, 1/3]$. The cost of the
dual problem is 2, which verifies the duality theorem.
17.16
The dual to the above linear program (asymmetric form) is
maximize    $\lambda^\top 0$
subject to  $\lambda^\top O \le c^\top$.
The above dual problem has a feasible solution if and only if $c \ge 0$. Since any feasible solution to the dual
is also optimal, the dual has an optimal solution if and only if $c \ge 0$. Therefore, by the duality theorem, the
primal problem has a solution if and only if $c \ge 0$.
If the solution to the dual exists, then the optimal value of the objective function in the primal is equal
to that of the dual, which is clearly 0. In this case, 0 is optimal, since $c^\top 0 = 0$.
17.17
Consider the primal problem
minimize    $0^\top x$
subject to  $Ax \ge b$
            $x \ge 0$,
and its corresponding dual
maximize    $y^\top b$
subject to  $y^\top A \le 0^\top$
            $y \ge 0$.
⇒: By assumption, there exists a feasible solution to the primal problem. Note that any feasible solution
is also optimal, and has objective function value 0. Suppose y satisfies $A^\top y \le 0$ and $y \ge 0$. Then, y is a
feasible solution to the dual. Therefore, by the Weak Duality Lemma, $b^\top y \le 0$.
⇐: Note that the feasible region for the dual is nonempty, since 0 is a feasible point. Also, by assumption,
0 is an optimal solution, since any other feasible point y satisfies $b^\top y \le b^\top 0 = 0$. Hence, by the duality
theorem, the primal problem has an (optimal) feasible solution.
17.18
a. The dual is
maximize    $y^\top b$
subject to  $y^\top A \le 0^\top$.
b. The feasible set of the dual problem is always nonempty, because 0 is clearly guaranteed to be feasible.
c. Suppose y is feasible in the dual. Then, by assumption, $b^\top y \le 0$. But the point 0 is feasible and has
objective function value 0. Hence, 0 is optimal in the dual.
d. By parts b and c, the dual has an optimal feasible solution. Hence, by the duality theorem, the primal
problem also has an optimal feasible solution.
e. By assumption, there exists a feasible solution to the primal problem. Note that any feasible solution
in the primal has objective function value 0 (and hence so does the given solution). Suppose y satisfies
$A^\top y \le 0$. Then, y is a feasible solution to the dual. Therefore, by weak duality, $b^\top y \le 0$.
17.19
Consider the primal problem
minimize    $y^\top b$
subject to  $y^\top A = 0^\top$
            $y \ge 0$
and its corresponding dual
maximize    $0^\top x$
subject to  $Ax \le b$.
⇒: By assumption, there exists a feasible solution to the dual problem. Note that any feasible solution
is also optimal, and has objective function value 0. Suppose y satisfies $A^\top y = 0$ and $y \ge 0$. Then, y is a
feasible solution to the primal. Therefore, by the Weak Duality Lemma, $b^\top y \ge 0$.
⇐: Note that the feasible region for the primal is nonempty, since 0 is a feasible point. Also, by assumption,
0 is an optimal solution, since any other feasible point y satisfies $b^\top y \ge b^\top 0 = 0$. Hence, by the
duality theorem, the dual problem has an (optimal) feasible solution.
17.20
Let $e = [1, \ldots, 1]^\top$. Consider the primal problem
minimize    $0^\top x$
subject to  $Ax \le -e$
and its corresponding dual
maximize    $e^\top y$
subject to  $y^\top A = 0^\top$
            $y \ge 0$.
⇒: Suppose there exists x such that $Ax < 0$. Then, the vector $x' = x / \min_i\{|(Ax)_i|\}$ is a feasible solution to the
primal problem. Note that any feasible solution is also optimal, and has objective function value 0. Suppose
y satisfies $A^\top y = 0$, $y \ge 0$. Then, y is a feasible solution to the dual. Therefore, by the Weak Duality
Lemma, $e^\top y \le 0$. Since $y \ge 0$, we conclude that y = 0.
⇐: Suppose 0 is the only feasible solution to the dual. Then, 0 is clearly also optimal. Hence, by the
duality theorem, the primal problem has an (optimal) feasible solution x. Since $Ax \le -e$ and $-e < 0$, we
get $Ax < 0$.
17.21
a. Rewrite the primal as
minimize    $(-e)^\top x$
subject to  $(P - I)^\top x = 0$
            $x \ge 0$.
By asymmetric duality, the dual is
maximize    $\lambda^\top 0$
subject to  $\lambda^\top (P - I)^\top \le -e^\top$.
b. To make the notation simpler, we rewrite the dual as:
maximize    $0$
subject to  $(P - I) y \ge e$.
Suppose the dual is feasible. Then, there exists a y such that $Py \ge y + e > y$. Let $y_i$ be the largest
element of y, and $p_{(i)}^\top$ the ith row of P. Then, $p_{(i)}^\top y > y_i$. But, by definition of $y_i$, $y \le y_i e$. Hence,
$p_{(i)}^\top y \le y_i p_{(i)}^\top e = y_i$, which contradicts the inequality $p_{(i)}^\top y > y_i$. Hence, the dual is not feasible.
c. The primal is certainly feasible, because 0 is a feasible point. Therefore, by part b and strong duality, the
primal must also be unbounded.
d. Because 0 is an achievable objective function value (it is the objective function value of 0), and the problem
is unbounded, we deduce that 1 is also achievable. Hence, there exists a feasible x such that $x^\top e = 1$. This
proves the desired result.
17.22
Write the LP problem
minimize    $c^\top x$
subject to  $Ax \ge b$
            $x \ge 0$
and the corresponding dual problem
maximize    $\lambda^\top b$
subject to  $\lambda^\top A \le c^\top$
            $\lambda \ge 0$.
By a theorem on duality, if we can find feasible points x and λ for the primal and dual, respectively, such
that $c^\top x = \lambda^\top b$, then x and λ are optimal for their respective problems. We can rewrite the previous set
of relations as
$$\begin{bmatrix} -c^\top & b^\top \\ c^\top & -b^\top \\ A & 0 \\ I_n & 0 \\ 0 & -A^\top \\ 0 & I_m \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} \ge \begin{bmatrix} 0 \\ 0 \\ b \\ 0 \\ -c \\ 0 \end{bmatrix}.$$
Therefore, writing the above as $\hat{A} y \ge \hat{b}$, where $\hat{A} \in \mathbb{R}^{(2m+2n+2) \times (m+n)}$ and $\hat{b} \in \mathbb{R}^{2m+2n+2}$, we have that
the first n components of $\varphi((2m + 2n + 2), (m + n), \hat{A}, \hat{b})$ form a solution to the given linear programming
problem.
17.23
a. Consider the dual; b does not appear in the constraint (but it does appear in the dual objective function).
Thus, provided the level sets of the dual objective function do not exactly align with one of the faces of the
constraint set (polyhedron), the optimal dual vector will not change if we perturb b very slightly. Now, by
the duality theorem, $z(b) = \lambda^\top b$. Because λ is constant in a neighborhood of b, we deduce that $\nabla z(b) = \lambda$.
b. By part a, we deduce that the optimal objective function value will change by $3\Delta b_1$.
17.24
a. Weak duality lemma: if $x_0$ and $y_0$ are feasible points in the primal and dual, respectively, then $f_1(x_0) \ge f_2(y_0)$.
Proof: Because $y_0 \ge 0$ and $Ax_0 - b \le 0$, we have $y_0^\top (Ax_0 - b) \le 0$. Therefore,
$$f_1(x_0) \ge f_1(x_0) + y_0^\top (Ax_0 - b) = \frac{1}{2} x_0^\top x_0 + y_0^\top A x_0 - y_0^\top b.$$
Now, we know that
$$\frac{1}{2} x_0^\top x_0 + y_0^\top A x_0 \ge \frac{1}{2} x^{*\top} x^* + y_0^\top A x^*,$$
where $x^* = -A^\top y_0$. Hence,
$$\frac{1}{2} x_0^\top x_0 + y_0^\top A x_0 \ge \frac{1}{2} y_0^\top A A^\top y_0 - y_0^\top A A^\top y_0 = -\frac{1}{2} y_0^\top A A^\top y_0.$$
Combining this with the above, we have
$$f_1(x_0) \ge -\frac{1}{2} y_0^\top A A^\top y_0 - y_0^\top b = f_2(y_0).$$
Alternatively, notice that
$$f_1(x_0) - f_2(y_0) = \frac{1}{2} x_0^\top x_0 + \frac{1}{2} y_0^\top A A^\top y_0 + b^\top y_0 \ge \frac{1}{2} x_0^\top x_0 + \frac{1}{2} y_0^\top A A^\top y_0 + x_0^\top A^\top y_0 = \frac{1}{2} \| x_0 + A^\top y_0 \|^2 \ge 0.$$
b. Suppose $f_1(x_0) = f_2(y_0)$ for feasible points $x_0$ and $y_0$. Let x be any feasible point in the primal. Then,
by part a, $f_1(x) \ge f_2(y_0) = f_1(x_0)$. Hence $x_0$ is optimal in the primal.
Similarly, let y be any feasible point in the dual. Then, by part a, $f_2(y) \le f_1(x_0) = f_2(y_0)$. Hence $y_0$ is
optimal in the dual.
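The chain of inequalities in part a can be tried on a small instance. The sketch below assumes, as the proof indicates, that $f_1(x) = \frac{1}{2} x^\top x$ with constraint $Ax \le b$ and $f_2(y) = -\frac{1}{2} y^\top A A^\top y - b^\top y$ with $y \ge 0$; the data and helper names are hypothetical.

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def At_times(A, y):
    """Return A^T y for A given as a list of rows."""
    return [dot(col, y) for col in zip(*A)]

def f1(x):
    return F(1, 2) * dot(x, x)

def f2(A, b, y):
    Aty = At_times(A, y)
    return -F(1, 2) * dot(Aty, Aty) - dot(b, y)

# Hypothetical instance: Ax <= b encodes x1 >= 1 and x2 >= 2.
A = [[-1, 0], [0, -1]]
b = [-1, -2]

# A feasible (non-optimal) pair: f1 - f2 >= (1/2)||x + A^T y||^2 >= 0.
x, y = [2, 3], [F(1, 2), 0]
gap = f1(x) - f2(A, b, y)
s = [xi + ai for xi, ai in zip(x, At_times(A, y))]
lower = F(1, 2) * dot(s, s)

# A pair with equal objective values, hence optimal by part b.
x_opt, y_opt = [1, 2], [1, 2]
print(gap, lower, f1(x_opt), f2(A, b, y_opt))
```

For the first pair the gap exceeds the norm bound, and for the second pair the two objective values coincide.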
18. Non-Simplex Methods
18.1
The following is a MATLAB function that implements the affine scaling algorithm.
function [x,N] = affscale(c,A,b,u,options);
%   AFFSCALE(c,A,b,u);
%   AFFSCALE(c,A,b,u,options);
%
%   x = AFFSCALE(c,A,b,u);
%   x = AFFSCALE(c,A,b,u,options);
%
%   [x,N] = AFFSCALE(c,A,b,u);
%   [x,N] = AFFSCALE(c,A,b,u,options);
%
%AFFSCALE(c,A,b,u) solves the following linear program using the
%affine scaling Method:
%   min c'x subject to Ax=b, x>=0,
%where u is a strictly feasible initial solution.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results (default is no display: 0).
%OPTIONS(2) is a measure of the precision required for the final point.
%OPTIONS(3) is a measure of the precision required cost value.
%OPTIONS(14) = max number of iterations.
%OPTIONS(18) = alpha.
if nargin ~= 5
  options = [];
  if nargin ~= 4
    disp('Wrong number of arguments.');
    return;
  end
end
xnew=u;
if length(options) >= 14
  if options(14)==0
    options(14)=1000*length(xnew);
  end
else
  options(14)=1000*length(xnew);
end
%if length(options) < 18
options(18)=0.99; %optional step size
%end
%clc;
format compact;
format short e;
options = foptions(options);
print = options(1);
epsilon_x = options(2);
epsilon_f = options(3);
max_iter=options(14);
alpha=options(18);
n=length(c);
m=length(b);
for k = 1:max_iter,
  xcurr=xnew;
  D = diag(xcurr);
  Abar = A*D;
  Pbar = eye(n) - Abar'*inv(Abar*Abar')*Abar;
  d = -D*Pbar*D*c;
  if d ~= zeros(n,1),
    nonzd = find(d<0);
    r = min(-xcurr(nonzd)./d(nonzd));
  else
    disp('Terminating: d = 0');
    break;
  end
  xnew = xcurr+alpha*r*d;
  if print,
    disp('Iteration number k =')
    disp(k); %print iteration index k
    disp('alpha_k =');
    disp(alpha*r); %print alpha_k
    disp('New point =');
    disp(xnew'); %print new point
  end %if
  if norm(xnew-xcurr) <= epsilon_x*norm(xcurr)
    disp('Terminating: Relative difference between iterates <');
    disp(epsilon_x);
    break;
  end %if
  if abs(c'*(xnew-xcurr)) < epsilon_f*abs(c'*xcurr),
    disp('Terminating: Relative change in objective function < ');
    disp(epsilon_f);
    break;
  end %if
  if k == max_iter
    disp('Terminating with maximum number of iterations');
  end %if
end %for
if nargout >= 1
  x=xnew;
  if nargout == 2
    N=k;
  end
else
  disp('Final point =');
  disp(xnew');
  disp('Number of iterations =');
  disp(k);
end %if
%----------------------------------------------------------------
We now apply the affine scaling algorithm to the problem in Example 16.2, as follows:
>> A=[1 0 1 0 0; 0 1 0 1 0; 1 1 0 0 1];
>> b=[4;6;8];
>> c=[-2;-5;0;0;0];
>> u=[2;3;2;3;3];
>> options(1)=0;
>> options(2)=10^(-7);
>> options(3)=10^(-7);
>> affscale(c,A,b,u,options);
Terminating: Relative difference between iterates <
1.0000e-07
Final point =
2.0000e+00 6.0000e+00 2.0000e+00 1.0837e-09 1.7257e-08
Number of iterations =
8
The result obtained after 8 iterations, as indicated above, agrees with the solution in Example 16.2: $[2, 6, 2, 0, 0]^\top$.
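As a cross-check that does not depend on MATLAB, the same iteration can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the names `solve` and `affine_scaling` are hypothetical, the projection is computed by solving a small linear system rather than by forming an explicit inverse, and only the relative-step stopping test is kept.

```python
def solve(M, rhs):
    """Solve M w = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * w[k] for k in range(r + 1, n))
        w[r] = (aug[r][n] - s) / aug[r][r]
    return w


def affine_scaling(c, A, x, alpha=0.99, eps=1e-7, max_iter=1000):
    """Return (x, iterations), starting from a strictly feasible x."""
    m, n = len(A), len(c)
    k = 0
    for k in range(1, max_iter + 1):
        # Scaled data: Abar = A*D and D*c, with D = diag(x).
        Abar = [[A[i][j] * x[j] for j in range(n)] for i in range(m)]
        Dc = [x[j] * c[j] for j in range(n)]
        # Project D*c onto the null space of Abar:
        # p = Dc - Abar' w, where (Abar Abar') w = Abar Dc.
        G = [[sum(Abar[i][j] * Abar[l][j] for j in range(n))
              for l in range(m)] for i in range(m)]
        rhs = [sum(Abar[i][j] * Dc[j] for j in range(n)) for i in range(m)]
        w = solve(G, rhs)
        p = [Dc[j] - sum(Abar[i][j] * w[i] for i in range(m))
             for j in range(n)]
        d = [-x[j] * p[j] for j in range(n)]      # d = -D*Pbar*D*c
        neg = [j for j in range(n) if d[j] < 0]
        if not neg:
            break                                  # d = 0, or unbounded
        r = min(-x[j] / d[j] for j in neg)         # largest feasible step
        x_new = [x[j] + alpha * r * d[j] for j in range(n)]
        step = sum((x_new[j] - x[j]) ** 2 for j in range(n)) ** 0.5
        size = sum(v * v for v in x) ** 0.5
        x = x_new
        if step <= eps * size:
            break
    return x, k


# Data of the run above (Example 16.2 in standard form).
A = [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 1, 0, 0, 1]]
c = [-2, -5, 0, 0, 0]
x_opt, iters = affine_scaling(c, A, [2.0, 3.0, 2.0, 3.0, 3.0])
print([round(v, 6) for v in x_opt], iters)
```

Solving the system with `solve` instead of inverting `Abar*Abar'` is a design choice of this sketch; the search direction is the same one the MATLAB function computes.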
18.2
The following is a MATLAB routine that implements the two-phase affine scaling method, using the MATLAB function from Exercise 18.1.
function [x,N]=tpaffscale(c,A,b,options)
% March 28, 2000
%
%   TPAFFSCALE(c,A,b);
%   TPAFFSCALE(c,A,b,options);
%
%   x = TPAFFSCALE(c,A,b);
%   x = TPAFFSCALE(c,A,b,options);
%
%   [x,N] = TPAFFSCALE(c,A,b);
%   [x,N] = TPAFFSCALE(c,A,b,options);
%
%TPAFFSCALE(c,A,b) solves the following linear program using the
%Two-Phase Affine Scaling Method:
%   min c'x subject to Ax=b, x>=0.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results (default is no display: 0).
%OPTIONS(2) is a measure of the precision required for the final point.
%OPTIONS(3) is a measure of the precision required for the cost value.
%OPTIONS(14) = max number of iterations.
%OPTIONS(18) = alpha.
if nargin ~= 4
  options = [];
  if nargin ~= 3
    disp('Wrong number of arguments.');
    return;
  end
end
%clc;
format compact;
format short e;
options = foptions(options);
print = options(1);
n=length(c);
m=length(b);
%Phase I
if print,
  disp(' ');
  disp('Phase I');
  disp('-------');
end
u = rand(n,1);
v = b-A*u;
if any(v ~= 0),
  %minimize the artificial variable, starting from the strictly
  %feasible point [u; 1] of the problem [A v][x; y] = b
  u = affscale([zeros(1,n),1]',[A v],b,[u' 1]',options);
else
  u(n+1) = 0; %u is already feasible, so the artificial variable is zero
end
%u(n+1) is removed only after it has been tested below
if u(n+1) < options(2),
  %Phase II
  u(n+1) = [];
  if print
    disp(' ');
    disp('Phase II');
    disp('--------');
    disp('Initial condition for Phase II:');
    disp(u);
  end
  [x,N]=affscale(c,A,b,u,options);
  if nargout == 0
    disp('Final point =');
    disp(x');
    disp('Number of iterations =');
    disp(N);
  end %if
else
  disp('Terminating: problem has no feasible solution.');
end
%----------------------------------------------------------------
We now apply the above MATLAB routine to the problem in Example 16.5, as follows:
>> A=[1 1 1 0; 5 3 0 -1];
>> b=[4;8];
>> c=[-3;-5;0;0];
>> options(1)=0;
>> tpaffscale(c,A,b,options);
Terminating: Relative difference between iterates <
1.0000e-07
Terminating: Relative difference between iterates <
1.0000e-07
Final point =
4.0934e-09 4.0000e+00 9.4280e-09 4.0000e+00
Number of iterations =
7
The result obtained above agrees with the solution in Example 16.5: $[0, 4, 0, 4]^\top$.
18.3
The following is a MATLAB routine that implements the affine scaling method applied to LP problems of
the form given in the question by converting the given problem into Karmarkar's artificial form and then using
the MATLAB function from Exercise 18.1.
function [x,N]=karaffscale(c,A,b,options)
%   KARAFFSCALE(c,A,b);
%   KARAFFSCALE(c,A,b,options);
%
%   x = KARAFFSCALE(c,A,b);
%   x = KARAFFSCALE(c,A,b,options);
%
%   [x,N] = KARAFFSCALE(c,A,b);
%   [x,N] = KARAFFSCALE(c,A,b,options);
%
%KARAFFSCALE(c,A,b) solves the following linear program using the
%Affine Scaling Method:
%   min c'x subject to Ax>=b, x>=0.
%We use Karmarkar's artificial problem to convert the above problem into
%a form usable by the affine scaling method.
%The second variant allows a vector of optional parameters to be
%defined:
%OPTIONS(1) controls how much display output is given; set
%to 1 for a tabular display of results (default is no display: 0).
%OPTIONS(2) is a measure of the precision required for the final point.
%OPTIONS(3) is a measure of the precision required for the cost value.
%OPTIONS(14) = max number of iterations.
%OPTIONS(18) = alpha.
if nargin ~= 4
  options = [];
  if nargin ~= 3
    disp('Wrong number of arguments.');
    return;
  end
end
%clc;
format compact;
format short e;
options = foptions(options);
print = options(1);
n=length(c);
m=length(b);
%Convert to Karmarkar's artificial problem
x0 = ones(n,1);
l0 = ones(m,1);
u0 = ones(n,1);
v0 = ones(m,1);
%Note: with these choices AA*y0 = [0; b; c+u0], so the starting point y0
%satisfies the first two block rows exactly but not the third; the third
%block of the last column would have to be (c-A'*l0-u0) for AA*y0 = bb.
AA = [
  c' -b' zeros(1,n) zeros(1,m) (-c'*x0+b'*l0);
  A zeros(m,m) zeros(m,n) -eye(m) (b-A*x0+v0);
  zeros(n,n) A' eye(n) zeros(n,m) (c-A'*l0)
];
bb = [0; b; c];
cc = [zeros(2*m+2*n,1); 1];
y0 = [x0; l0; u0; v0; 1];
[y,N]=affscale(cc,AA,bb,y0,options);
if cc'*y <= options(3),
  x = y(1:n);
  if nargout == 0
    disp('Final point =');
    disp(x');
    disp('Final cost =');
    disp(c'*x);
    disp('Number of iterations =');
    disp(N);
  end %if
else
  disp('Terminating: problem has no optimal feasible solution.');
end
We now apply the above MATLAB routine to the problem in Example 15.15, as follows:
>> c=[-3;-5];
>> A=[-1 -5; -2 -1; -1 -1];
>> b=[-40;-20;-12];
>> options(2)=10^(-4);
>> karaffscale(c,A,b,options);
Terminating: Relative difference between iterates <
1.0000e-04
Final point =
5.1992e+00
6.5959e+00
Final cost =
-4.8577e+01
Number of iterations =
3
The solution from Example 15.15 is $[5, 7]^\top$. The accuracy of the result obtained above is disappointing.
We believe that the inaccuracy here may be caused by our particularly simple numerical implementation of
the affine scaling method. This illustrates the numerical issues that must be dealt with in any practically
useful implementation of the affine scaling method.
18.4
a. Suppose $T(x) = T(y)$. Then, $T_i(x) = T_i(y)$ for $i = 1, \dots, n+1$. Note that for $i = 1, \dots, n$, $T_i(x) = (x_i/a_i)T_{n+1}(x)$ and $T_i(y) = (y_i/a_i)T_{n+1}(y)$. Therefore,
\[
T_i(x) = (x_i/a_i)T_{n+1}(x) = T_i(y) = (y_i/a_i)T_{n+1}(y) = (y_i/a_i)T_{n+1}(x),
\]
which implies that $x_i = y_i$, $i = 1, \dots, n$. Hence $x = y$.
b. Let $y \in \{x \in \Delta : x_{n+1} > 0\}$. Hence $y_{n+1} > 0$. Define $x = [x_1, \dots, x_n]^\top$ by $x_i = a_i y_i / y_{n+1}$, $i = 1, \dots, n$.
Then, $T(x) = y$. To see this, note that
\[
T_{n+1}(x) = \frac{1}{y_1/y_{n+1} + \cdots + y_n/y_{n+1} + 1} = \frac{y_{n+1}}{y_1 + \cdots + y_n + y_{n+1}} = y_{n+1}.
\]
Also, for $i = 1, \dots, n$,
\[
T_i(x) = (y_i/y_{n+1})T_{n+1}(x) = y_i.
\]
c. An immediate consequence of the solution to part b.
d. We have
\[
T_{n+1}(a) = \frac{1}{a_1/a_1 + \cdots + a_n/a_n + 1} = \frac{1}{n+1},
\]
and, for $i = 1, \dots, n$,
\[
T_i(a) = (a_i/a_i)T_{n+1}(a) = \frac{1}{n+1}.
\]
e. Since $y = T(x)$, we have that for $i = 1, \dots, n$, $y_i = (x_i/a_i)y_{n+1}$. Therefore, $x_i' = y_i a_i = x_i y_{n+1}$, which
implies that $x' = y_{n+1} x$. Hence, $Ax' = y_{n+1} Ax = b y_{n+1}$.
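The properties in parts b and d can be checked numerically with a short Python sketch. The function names `T` and `T_inv` and the sample vectors are assumptions made for illustration; the formulas are the ones used in the solution above.

```python
def T(x, a):
    """Map x in R^n (x >= 0) to a point of the simplex in R^(n+1)."""
    s = sum(xi / ai for xi, ai in zip(x, a)) + 1.0
    return [xi / ai / s for xi, ai in zip(x, a)] + [1.0 / s]


def T_inv(y, a):
    """Inverse on {y in simplex : y_{n+1} > 0}: x_i = a_i * y_i / y_{n+1}."""
    return [ai * yi / y[-1] for ai, yi in zip(a, y[:-1])]


a = [2.0, 0.5, 4.0]      # hypothetical strictly positive point a
x = [1.0, 3.0, 0.25]     # hypothetical point with x >= 0
y = T(x, a)
print(sum(y))            # the components of T(x) sum to 1
print(T(a, a))           # part d: every entry equals 1/(n+1)
print(T_inv(y, a))       # part b: recovers x
```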
18.5
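Let $x \in \mathbb{R}^n$, and $y = T(x)$. Let $\boldsymbol{a}_i$ be the $i$th column of $A$, $i = 1, \dots, n$. As in the hint, let $A'$ be given by
\[
A' = [a_1 \boldsymbol{a}_1, \dots, a_n \boldsymbol{a}_n, -b].
\]
Then,
\[
\begin{aligned}
Ax = b
&\;\Leftrightarrow\; Ax - b = 0 \\
&\;\Leftrightarrow\; [\boldsymbol{a}_1, \dots, \boldsymbol{a}_n, -b]
\begin{bmatrix} x_1 \\ \vdots \\ x_n \\ 1 \end{bmatrix} = 0 \\
&\;\Leftrightarrow\; [a_1 \boldsymbol{a}_1, \dots, a_n \boldsymbol{a}_n, -b]
\begin{bmatrix} x_1/a_1 \\ \vdots \\ x_n/a_n \\ 1 \end{bmatrix} = 0 \\
&\;\Leftrightarrow\; A'
\begin{bmatrix} (x_1/a_1)y_{n+1} \\ \vdots \\ (x_n/a_n)y_{n+1} \\ y_{n+1} \end{bmatrix} = 0 \\
&\;\Leftrightarrow\; A'y = 0.
\end{aligned}
\]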
18.6
The result follows from Exercise 18.5 by setting $A := c^\top$ and $b := 0$.
18.7
Consider the set $\{x \in \mathbb{R}^n : e^\top x = 1,\ x \ge 0,\ x_1 = 0\}$, which can be written as $\{x \in \mathbb{R}^n : Ax = b,\ x \ge 0\}$,
where
\[
A = \begin{bmatrix} e^\top \\ e_1^\top \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
\]
with $e = [1, \dots, 1]^\top$, $e_1 = [1, 0, \dots, 0]^\top$. Let $a_0 = e/n$. By Exercise 12.20, the closest point on the set
$\{x : Ax = b\}$ to the point $a_0$ is
\[
x^* = A^\top (AA^\top)^{-1}(b - Aa_0) + a_0 = \left[0, \frac{1}{n-1}, \dots, \frac{1}{n-1}\right]^\top.
\]
Since $x^* \in \{x : Ax = b,\ x \ge 0\} \subset \{x : Ax = b\}$, the point $x^*$ is also the closest point on the set
$\{x : Ax = b,\ x \ge 0\}$ to the point $a_0$.
Let $r = \|a_0 - x^*\|$. Then, the sphere of radius $r$ is inscribed in $\Delta$. Note that
\[
r = \|a_0 - x^*\| = \frac{1}{\sqrt{n(n-1)}}.
\]
Hence, the radius of the largest sphere inscribed in $\Delta$ is larger than or equal to $1/\sqrt{n(n-1)}$. It remains to
show that this largest radius is less than or equal to $1/\sqrt{n(n-1)}$. To this end, we show that this largest
radius is less than $1/\sqrt{n(n-1)} + \varepsilon$ for any $\varepsilon > 0$. For this, it suffices to show that the sphere of
radius $1/\sqrt{n(n-1)} + \varepsilon$ is not inscribed in $\Delta$. To show this, consider the point
\[
\begin{aligned}
x &= x^* + \varepsilon \frac{x^* - a_0}{\|x^* - a_0\|} \\
  &= x^* + \varepsilon \sqrt{n(n-1)}\,(x^* - a_0) \\
  &= x^* + \varepsilon \left[ -\sqrt{\frac{n-1}{n}}, \frac{1}{\sqrt{n(n-1)}}, \dots, \frac{1}{\sqrt{n(n-1)}} \right]^\top.
\end{aligned}
\]
It is easy to verify that the point $x$ above is on the sphere of radius $1/\sqrt{n(n-1)} + \varepsilon$ centered at $a_0$. However, clearly the
first component of $x$ is negative. Therefore, the sphere of radius $1/\sqrt{n(n-1)} + \varepsilon$ is not inscribed in $\Delta$. Our
proof is thus completed.
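The closed-form radius can be sanity-checked numerically. The following Python sketch (the helper name `inscribed_radius` is an assumption) measures the distance between the center $a_0 = e/n$ and the closest point $x^*$ found above.

```python
import math


def inscribed_radius(n):
    """Distance from a0 = e/n to x* = [0, 1/(n-1), ..., 1/(n-1)]."""
    a0 = [1.0 / n] * n
    x_star = [0.0] + [1.0 / (n - 1)] * (n - 1)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a0, x_star)))


for n in range(2, 7):
    print(n, inscribed_radius(n), 1.0 / math.sqrt(n * (n - 1)))
```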
18.8
We first consider the constraints. We claim that $x \in \Delta \Leftrightarrow \bar{x} \in \Delta$. To see this, note that if $x \in \Delta$, then
$e^\top x = 1$ and hence
\[
e^\top \bar{x} = e^\top D^{-1} x / e^\top D^{-1} x = 1,
\]
which means that $\bar{x} \in \Delta$. The same argument can be used for the converse. Next, we claim $Ax = 0 \Leftrightarrow AD\bar{x} = 0$. To see this, write
\[
Ax = ADD^{-1}x = AD\bar{x}\,(e^\top D^{-1} x).
\]
Since $e^\top D^{-1} x > 0$, we have $Ax = 0 \Leftrightarrow AD\bar{x} = 0$. Finally, we claim that if $x^*$ is an optimal solution to the
original problem, then $\bar{x}^* = U(x^*)$ is an optimal solution to the transformed problem. To see this, recall
that the problem is a Karmarkar's restricted problem, and hence by Assumption (B) we have $c^\top x^* = 0$. We
now note that the minimum value of the objective function $c^\top D\bar{x}$ in the transformed problem is zero. This
is because $c^\top D\bar{x} = c^\top x / e^\top D^{-1} x$, and $e^\top D^{-1} x > 0$. Finally, we observe that at the point $\bar{x}^* = U(x^*)$
the objective function value for the transformed problem is zero. Indeed,
\[
c^\top D\bar{x}^* = c^\top DD^{-1}x^* / e^\top D^{-1} x^* = 0.
\]
Therefore, the two problems are equivalent.
18.9
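Let $v \in \mathbb{R}^{m+1}$ be such that $v^\top B = 0^\top$. We will show that
\[
v^\top \begin{bmatrix} A \\ e^\top \end{bmatrix} = 0^\top
\]
and hence $v = 0$ by virtue of the assumption that
\[
\operatorname{rank} \begin{bmatrix} A \\ e^\top \end{bmatrix} = m + 1.
\]
This in turn gives us the desired result.
To proceed, write $v$ as
\[
v = \begin{bmatrix} u \\ v_{m+1} \end{bmatrix},
\]
where $u \in \mathbb{R}^m$ constitutes the first $m$ components of $v$. Then,
\[
v^\top B = u^\top AD + v_{m+1} e^\top = 0^\top.
\]
Postmultiplying the above by $e$, and using the facts that $De = x_0$, $Ax_0 = 0$, and $e^\top e = n$, we get
\[
u^\top A x_0 + v_{m+1} n = v_{m+1} n = 0,
\]
which implies that $v_{m+1} = 0$. Hence, $u^\top AD = 0^\top$, which after postmultiplying by $D^{-1}$ gives $u^\top A = 0^\top$.
Hence,
\[
v^\top \begin{bmatrix} A \\ e^\top \end{bmatrix} = 0^\top,
\]
which implies that $v = 0$. Hence, $\operatorname{rank} B = m + 1$.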
18.10
We proceed by induction. For $k = 0$, the result is true because $x^{(0)} = a_0$. Now suppose that $x^{(k)}$ is a strictly
interior point of $\Delta$. We first show that $\bar{x}^{(k+1)}$ is a strictly interior point. Now,
\[
\bar{x}^{(k+1)} = a_0 - \alpha r \hat{c}^{(k)}.
\]
Then, since $\alpha \in (0, 1)$ and $\|\hat{c}^{(k)}\| = 1$, we have
\[
\|\bar{x}^{(k+1)} - a_0\| \le |\alpha r| \|\hat{c}^{(k)}\| < r.
\]
Since $r$ is the radius of the largest sphere inscribed in $\Delta$, $\bar{x}^{(k+1)}$ is a strictly interior point of $\Delta$. To complete
the proof, we write
\[
x^{(k+1)} = U_k^{-1}(\bar{x}^{(k+1)}) = \frac{D_k \bar{x}^{(k+1)}}{e^\top D_k \bar{x}^{(k+1)}}.
\]
We already know that $x^{(k+1)} \in \Delta$. It therefore remains to show that it is strictly interior, i.e., $x^{(k+1)} > 0$.
To see this, note that $e^\top D_k \bar{x}^{(k+1)} > 0$. Furthermore, we can write
\[
D_k \bar{x}^{(k+1)} = \begin{bmatrix} x_1^{(k)} \bar{x}_1^{(k+1)} \\ \vdots \\ x_n^{(k)} \bar{x}_n^{(k+1)} \end{bmatrix}.
\]
Since $x^{(k)} = [x_1^{(k)}, \dots, x_n^{(k)}]^\top > 0$ by the induction hypothesis, and $\bar{x}^{(k+1)} = [\bar{x}_1^{(k+1)}, \dots, \bar{x}_n^{(k+1)}]^\top > 0$ by the
above, $x^{(k+1)} > 0$ and hence it is a strictly interior point of $\Delta$.
19. Integer Linear Programming
19.1
The result follows from the simple observation that if M is a submatrix of A, then any submatrix of M is
also a submatrix of A. Therefore, any property involving all submatrices of A also applies to all submatrices
of M .
19.2
The result follows from the simple observation that any submatrix of $A^\top$ is the transpose of a submatrix of
$A$, and that the determinant of the transpose of a matrix equals the determinant of the original matrix.
19.3
The claim that A is totally unimodular if [A, I] is totally unimodular follows from Exercise 19.1. To show
the converse, suppose that A is totally unimodular. We will show that any p × p invertible submatrix of
[A, I], p ≤ min(m, n), has determinant ±1. We first note that any p × p invertible submatrix of [A, I] that
consists only of columns of A has determinant ±1 because A is totally unimodular. Moreover, any p × p
invertible submatrix of I has determinant 1.
Consider now a p × p invertible submatrix of [A, I] composed of k columns of A and p − k columns of I.
Without loss of generality, suppose that this submatrix is composed of the first p rows of [A, I], the last k
columns of A, and the first p − k columns of I. (This choice of rows and columns is without loss of generality
because we can exchange rows and columns to arrive at this form, and each exchange only changes the sign
of the determinant.) We now proceed as in the proof of Proposition 19.1.
19.4
The result follows from these properties of determinants: (1) exchanging columns only changes the sign
of the determinant; (2) the determinant of a block triangular matrix is the product of the determinants of
the diagonal blocks; and (3) the determinant of the identity matrix is 1. See also Exercise 2.4.
19.5
The vectors x and z together satisfy
Ax + z = b,
which means that z = b − Ax. Because the right-hand side involves only integers, z is an integer vector.
19.6
The following MATLAB code generates the figures.
%-----------------------------------------------
% The vertices of the feasible set
x = [2/5 1; 2/5 -2/5]\[3; 1];
X = [0 0 x(1) 2.5];
Y = [0 3 x(2) 0];
fs=16; % Fontsize
% Draw the feasible set for x1, x2 \in R.
vi = convhull(X,Y);
plot(X,Y, 'o');
axis on; axis equal;
axis([-0.2 4.2 -0.2 3.2]);
hold on
fill(X(vi), Y(vi), 'b','facealpha', 0.2);
text(.1,.5,['\fontsize{48}\Omega'],'position', [1.5 1.25])
set(gca,'Fontsize',fs)
hold off
% The optimal solution has to be one of the extreme points
c = [-3 -4]';
% Draw the feasible set for the integer problem
figure
axis on; axis equal;
x = [-0.5:0.1:x(1)];
y1 = -x*0.4+3;
y2 = x-2.5;
fs=16; % Fontsize
plot(x,y1,'--b',x,y2,'--b','LineWidth',2);
axis([-0.2 4.2 -0.2 3.2]);
set(gca,'Fontsize',fs)
hold on
% Integer points inside the feasible set
X = [zeros(1,4) ones(1,3) 2*ones(1,3) 3];
Y = [0:max(floor(Y)) 0:max(floor(Y)-1) 0:max(floor(Y)-1) 1];
% 'bs' = blue squares (the line specification 'bls' is not valid)
plot(X,Y,'bs','LineWidth',2,...
  'MarkerEdgeColor','k',...
  'MarkerFaceColor','g',...
  'MarkerSize',10)
% Plot of the cost function
xc = [-1:0.5:5];
yc = -0.75*xc+(14/4)*ones(1,length(xc));
yc0 = -0.75*xc+(17.5/4)*ones(1,length(xc));
fs=16; % Fontsize
plot(xc, yc, 'r', xc, yc0, 'o-k', 'LineWidth',2);
set(gca,'Fontsize',fs)
%text(.1,.5,['\fontsize{48}\Omega'],'position', [1.5 1.25])
[~,Xmin] = min(c'*[X; Y]);
str = sprintf('The maximizer is [%d, %d]'' and the maximum is %.4f',...
  X(Xmin), Y(Xmin), -c'*[X(Xmin); Y(Xmin)]);
disp(str);
%-----------------------------------------------
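The integer optimum that the script reads off the plot can also be confirmed by direct enumeration. The Python sketch below assumes the problem data implied by the script (the vertex computation and the vector `c`): maximize $3x_1 + 4x_2$ subject to $\frac{2}{5}x_1 + x_2 \le 3$, $\frac{2}{5}x_1 - \frac{2}{5}x_2 \le 1$, with $x_1, x_2$ nonnegative integers. The constraints are multiplied by 5 so that only integer arithmetic is needed.

```python
# Enumerate nonnegative integer points; the bounds 0..9 are a generous
# box that contains the (bounded) feasible set of the assumed problem.
candidates = [
    (3 * x1 + 4 * x2, x1, x2)
    for x1 in range(10)
    for x2 in range(10)
    if 2 * x1 + 5 * x2 <= 15 and 2 * x1 - 2 * x2 <= 5
]
best_value, best_x1, best_x2 = max(candidates)
print(best_value, (best_x1, best_x2))
```

The best value found corresponds to the cost line at level 14 drawn by the script, while the relaxed problem attains 17.5 at a non-integer vertex.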
19.7
It suffices to show the following claim: If we introduce the equation
\[
x_i + \sum_{j=m+1}^{n} \lfloor y_{ij} \rfloor x_j + x_{n+1} = \lfloor y_{i0} \rfloor
\]
into the original constraint, then the result holds. The reason this suffices is that the Gomory cut is obtained
by subtracting this equation from an equation obtained by elementary row operations on $[A, b]$ (hence is
equivalent to premultiplication by an invertible matrix).
To show the above claim, let $x_{n+1}$ satisfy this constraint with an integer vector $x$. Then,
\[
x_i + \sum_{j=m+1}^{n} \lfloor y_{ij} \rfloor x_j + x_{n+1} = \lfloor y_{i0} \rfloor,
\]
which implies that
\[
x_{n+1} = \lfloor y_{i0} \rfloor - x_i - \sum_{j=m+1}^{n} \lfloor y_{ij} \rfloor x_j.
\]
Because the right-hand side involves only integers, $x_{n+1}$ is an integer.
19.8
If there is only one Gomory cut, then the result follows directly from Exercise 19.7. The general result
follows by induction on the number of Gomory cuts, using Exercise 19.7 at each inductive step.
19.9
The result follows from Exercises 19.5 and 19.8.
19.10
The dual problem is:
\[
\begin{array}{ll}
\text{minimize} & 3\lambda_1 + \lambda_2 \\
\text{subject to} & \frac{2}{5}\lambda_1 + \frac{2}{5}\lambda_2 \ge 3 \\
 & \lambda_1 - \frac{2}{5}\lambda_2 \ge 4 \\
 & \lambda_1, \lambda_2 \ge 0 \\
 & \lambda_1, \lambda_2 \in \mathbb{Z}.
\end{array}
\]
The problem is solved graphically using the same approach as in Example 19.5. We proceed by calculating
the extreme points of the feasible set. We first assume that $\lambda_1, \lambda_2 \in \mathbb{R}$. The extreme points are calculated
by intersecting the given constraints, and they are:
\[
\lambda^{(1)} = \left[5, \tfrac{5}{2}\right]^\top, \qquad \lambda^{(2)} = \left[\tfrac{15}{2}, 0\right]^\top.
\]
In Figure 24.10, we show the feasible set $\Omega$ for the case when $\lambda_1, \lambda_2 \in \mathbb{R}$.
Next we sketch the feasible set for the case when $\lambda_1, \lambda_2 \in \mathbb{Z}$ and solve the problem graphically. The
graphical solution is depicted in Figure 24.11. We can see in Figure 24.11 that the optimal integer solution
is
\[
\lambda^* = [6, 2]^\top.
\]
The following MATLAB code generates the figures.
%--------------------------------------------------------
% The vertices of the feasible set are:
x = [2/5 2/5;1 -2/5]\[3; 4];
X = [7.5 x(1) 20 20];
Y = [0 x(2) 0 40];
fs=16; % Fontsize
% Now we draw the set Omega, supposing x1, x2 \in R.
vi = convhull(X,Y);
plot(X(1:2),Y(1:2), 'rX', 'LineWidth',4);
axis on; axis equal;
axis([-.2 18 -0.2 18]);
set(gca,'Fontsize',fs)
%title('Feasible set supposing x1, x2 in R.','Fontsize',14,'Fontname','Avant-garde');
hold on
fill(X(vi), Y(vi), 'b','facealpha', 0.2);
text(.1,.5,['\fontsize{48}\Omega'],'position', [12 5])
hold off
% We know the optimal solution has to be one of the extreme points.
c = [3 1]';
% Now we draw the real feasible set for the problem.
figure
axis on; axis equal;
axis([-.2 18 -0.2 18]);
set(gca,'Fontsize',fs)
%title('Feasible set and cost function','Fontsize',14,'Fontname','Avant-garde');
hold on
% Collect the integer points that satisfy both constraints
X = [];
Y = [];
for i=1:18
  j=0;
  while ((j<=((i-4)*2.5)) && (j<18))
    if((j>=(7.5-i)))
      X = [X i];
      Y = [Y j];
    end
    j=j+1;
  end
end
% 'bs' = blue squares (the line specification 'bls' is not valid)
plot(X,Y,'bs','LineWidth',1,...
  'MarkerEdgeColor','k',...
  'MarkerFaceColor','g',...
  'MarkerSize',10)
x = [5:0.1:18];
y1 = (x-4)*2.5;
y2 = (7.5-x);
plot(x,y1,'--b',x,y2,'--b','LineWidth',2);
set(gca,'Fontsize',fs)
%text(.1,.5,['\fontsize{48}\Omega'],'position', [12 5])
% Plot of the cost function at level 17.5
xc = [-1:0.5:18];
yc = (35/2)*ones(1,length(xc))-3*xc;
plot(xc, yc, 'dk', 'LineWidth',2);
% Plot of the cost function at level 20
xc = [-1:0.5:18];
yc = (20)*ones(1,length(xc))-3*xc;
plot(xc, yc, 'r', 'LineWidth',2);
[~,Xmin] = min(c'*[X; Y]);
str = sprintf('The minimizer is [%d, %d]'' and the minimum is %.4f',...
  X(Xmin), Y(Xmin), c'*[X(Xmin); Y(Xmin)]);
disp(str);
%--------------------------------------------------------
Figure 24.10: Feasible set Ω for the case when λ1, λ2 ∈ R in Example 19.5.
Figure 24.11: Real feasible set with λ1, λ2 ∈ Z.
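The graphical answer can be double-checked by enumerating integer points of the dual problem stated above. In this Python sketch the two inequality constraints are multiplied by 5 to stay in integer arithmetic; the search box 0..20 is an assumption that is wide enough because the cost $3\lambda_1 + \lambda_2$ only grows as the variables increase.

```python
# Dual problem: minimize 3*l1 + l2
# subject to (2/5)*l1 + (2/5)*l2 >= 3 and l1 - (2/5)*l2 >= 4, l1, l2 >= 0 integer.
feasible = [
    (3 * l1 + l2, l1, l2)
    for l1 in range(21)
    for l2 in range(21)
    if 2 * l1 + 2 * l2 >= 15 and 5 * l1 - 2 * l2 >= 20
]
min_cost, opt_l1, opt_l2 = min(feasible)
print(min_cost, (opt_l1, opt_l2))
```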
20. Problems with Equality Constraints
20.1
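The feasible set consists of the points
\[
x = \begin{bmatrix} 2 \\ a \end{bmatrix}, \qquad a \le -1.
\]
We next find the gradients:
\[
\nabla h(x) = \begin{bmatrix} 2(x_1 - 2) \\ 0 \end{bmatrix}
\quad \text{and} \quad
\nabla g(x) = \begin{bmatrix} 0 \\ 3(x_2 + 1)^2 \end{bmatrix}.
\]
No feasible point is regular: at every feasible point we have $x_1 = 2$, so $\nabla h(x) = 0$ and the gradients of $h$ and $g$ are not linearly
independent. There are no regular points of the constraints.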
20.2
a. As usual, let $f$ be the objective function, and $h$ the constraint function. We form the Lagrangian
$l(x, \lambda) = f(x) + \lambda^\top h(x)$, and then find critical points by solving the following equations (Lagrange condition):
\[
D_x l(x, \lambda) = 0^\top, \qquad D_\lambda l(x, \lambda) = 0^\top.
\]
We obtain
\[
\begin{bmatrix}
2 & 2 & 0 & 1 & 4 \\
2 & 6 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 5 \\
1 & 2 & 0 & 0 & 0 \\
4 & 0 & 5 & 0 & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \lambda_1 \\ \lambda_2 \end{bmatrix}
=
\begin{bmatrix} -4 \\ -5 \\ -6 \\ 3 \\ 6 \end{bmatrix}.
\]
The unique solution to the above system is
\[
x^* = [16/5, -1/10, -34/25]^\top, \qquad \lambda^* = [-27/5, -6/5]^\top.
\]
Note that $x^*$ is a regular point. We now apply the SOSC. We compute
\[
L(x^*, \lambda^*) = F(x^*) + [\lambda^* H(x^*)] =
\begin{bmatrix} 2 & 2 & 0 \\ 2 & 6 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
The tangent plane is
\[
T(x^*) = \left\{ y \in \mathbb{R}^3 : \begin{bmatrix} 1 & 2 & 0 \\ 4 & 0 & 5 \end{bmatrix} y = 0 \right\}
= \{ a[-5/4, 5/8, 1]^\top : a \in \mathbb{R} \}.
\]
Let $y = a[-5/4, 5/8, 1]^\top \in T(x^*)$, $a \ne 0$. We have
\[
y^\top L(x^*, \lambda^*) y = \frac{75}{32} a^2 > 0.
\]
Therefore, $x^*$ is a strict local minimizer.
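The solution of the 5-by-5 Lagrange system and the value of the quadratic form can be verified exactly with rational arithmetic. The Python sketch below uses a small Gauss-Jordan routine (the name `solve_exact` is an assumption) on the system for $[x_1, x_2, x_3, \lambda_1, \lambda_2]^\top$ described in part a.

```python
from fractions import Fraction as F


def solve_exact(M, rhs):
    """Gauss-Jordan elimination over the rationals (M must be nonsingular)."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


M = [[2, 2, 0, 1, 4],
     [2, 6, 0, 2, 0],
     [0, 0, 0, 0, 5],
     [1, 2, 0, 0, 0],
     [4, 0, 5, 0, 0]]
sol = solve_exact(M, [-4, -5, -6, 3, 6])
print(sol)

# Quadratic form y' L y on the tangent direction y = [-5/4, 5/8, 1]'.
L = [[2, 2, 0], [2, 6, 0], [0, 0, 0]]
y = [F(-5, 4), F(5, 8), F(1)]
quad = sum(y[i] * L[i][j] * y[j] for i in range(3) for j in range(3))
print(quad)
```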
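b. The Lagrange condition for this problem is
\[
\begin{aligned}
4 + 2\lambda x_1 &= 0 \\
2x_2 + 2\lambda x_2 &= 0 \\
x_1^2 + x_2^2 - 9 &= 0.
\end{aligned}
\]
We have four points satisfying the Lagrange condition:
\[
\begin{aligned}
x^{(1)} &= [3, 0]^\top, & \lambda^{(1)} &= -2/3 \\
x^{(2)} &= [-3, 0]^\top, & \lambda^{(2)} &= 2/3 \\
x^{(3)} &= [2, \sqrt{5}]^\top, & \lambda^{(3)} &= -1 \\
x^{(4)} &= [2, -\sqrt{5}]^\top, & \lambda^{(4)} &= -1.
\end{aligned}
\]
Note that all four points $x^{(1)}, \dots, x^{(4)}$ are regular. We now apply the SOSC. We have
\[
L(x, \lambda) = \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix} + \lambda \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix},
\qquad
T(x) = \{ y : [2x_1, 2x_2] y = 0 \}.
\]
For the first point, we have
\[
L(x^{(1)}, \lambda^{(1)}) = \begin{bmatrix} -4/3 & 0 \\ 0 & 2/3 \end{bmatrix},
\qquad
T(x^{(1)}) = \{ a[0, 1]^\top : a \in \mathbb{R} \}.
\]
Let $y = a[0, 1]^\top \in T(x^{(1)})$, $a \ne 0$. Then
\[
y^\top L(x^{(1)}, \lambda^{(1)}) y = \frac{2}{3} a^2 > 0.
\]
Hence, $x^{(1)}$ is a strict local minimizer.
For the second point, we have
\[
L(x^{(2)}, \lambda^{(2)}) = \begin{bmatrix} 4/3 & 0 \\ 0 & 10/3 \end{bmatrix} > 0.
\]
Hence, $x^{(2)}$ is a strict local minimizer.
For the third point, we have
\[
L(x^{(3)}, \lambda^{(3)}) = \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix},
\qquad
T(x^{(3)}) = \{ a[-\sqrt{5}, 2]^\top : a \in \mathbb{R} \}.
\]
Let $y = a[-\sqrt{5}, 2]^\top \in T(x^{(3)})$, $a \ne 0$. Then
\[
y^\top L(x^{(3)}, \lambda^{(3)}) y = -10a^2 < 0.
\]
Hence, $x^{(3)}$ is a strict local maximizer.
For the fourth point, we have
\[
L(x^{(4)}, \lambda^{(4)}) = \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix},
\qquad
T(x^{(4)}) = \{ a[\sqrt{5}, 2]^\top : a \in \mathbb{R} \}.
\]
Let $y = a[\sqrt{5}, 2]^\top \in T(x^{(4)})$, $a \ne 0$. Then
\[
y^\top L(x^{(4)}, \lambda^{(4)}) y = -10a^2 < 0.
\]
Hence, $x^{(4)}$ is a strict local maximizer.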
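c. The Lagrange condition for this problem is
\[
\begin{aligned}
x_2 + 2\lambda x_1 &= 0 \\
x_1 + 8\lambda x_2 &= 0 \\
x_1^2 + 4x_2^2 - 1 &= 0.
\end{aligned}
\]
We have four points satisfying the Lagrange condition:
\[
\begin{aligned}
x^{(1)} &= [1/\sqrt{2}, -1/(2\sqrt{2})]^\top, & \lambda^{(1)} &= 1/4 \\
x^{(2)} &= [-1/\sqrt{2}, 1/(2\sqrt{2})]^\top, & \lambda^{(2)} &= 1/4 \\
x^{(3)} &= [1/\sqrt{2}, 1/(2\sqrt{2})]^\top, & \lambda^{(3)} &= -1/4 \\
x^{(4)} &= [-1/\sqrt{2}, -1/(2\sqrt{2})]^\top, & \lambda^{(4)} &= -1/4.
\end{aligned}
\]
Note that all four points $x^{(1)}, \dots, x^{(4)}$ are regular. We now apply the SOSC. We have
\[
L(x, \lambda) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \lambda \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix},
\qquad
T(x) = \{ y : [2x_1, 8x_2] y = 0 \}.
\]
Note that
\[
L(x, -1/4) = \begin{bmatrix} -1/2 & 1 \\ 1 & -2 \end{bmatrix},
\qquad
L(x, 1/4) = \begin{bmatrix} 1/2 & 1 \\ 1 & 2 \end{bmatrix}.
\]
At the first two points, $T(x) = \{a[2, 1]^\top : a \in \mathbb{R}\}$ and $y^\top L(x, 1/4) y = 8a^2 > 0$ for $a \ne 0$; at the last two points, $T(x) = \{a[-2, 1]^\top : a \in \mathbb{R}\}$ and $y^\top L(x, -1/4) y = -8a^2 < 0$ for $a \ne 0$. Hence, for the objective function used in the Lagrange condition above (whose Hessian is the first matrix in $L(x, \lambda)$), the first two points are strict local minimizers, while the
last two points are strict local maximizers.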
20.3
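We form the Lagrangian
\[
l(x, \lambda) = (a^\top x)(b^\top x) + \lambda_1 (x_1 + x_2) + \lambda_2 (x_2 + x_3).
\]
The Lagrange conditions take the form
\[
\begin{aligned}
\nabla_x l &= (ab^\top + ba^\top) x + \begin{bmatrix} \nabla_x h_1(x) & \nabla_x h_2(x) \end{bmatrix} \lambda \\
&= \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} x + \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} \lambda \\
&= \begin{bmatrix} x_2 + \lambda_1 \\ x_1 + x_3 + \lambda_1 + \lambda_2 \\ x_2 + \lambda_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \\
h(x) &= \begin{bmatrix} x_1 + x_2 \\ x_2 + x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\end{aligned}
\]
It is easy to see that $x^* = 0$ and $\lambda^* = 0$ satisfy the Lagrange (FONC) conditions.
The Hessian of the Lagrangian is
\[
L(x^*, \lambda^*) = ab^\top + ba^\top = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\]
and the tangent space is
\[
T(x^*) = \left\{ y : y = a \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix},\ a \in \mathbb{R} \right\}.
\]
To verify if the critical point satisfies the SOSC, we evaluate, for $a \ne 0$,
\[
y^\top L(x^*, \lambda^*) y = -4a^2 < 0.
\]
Thus the critical point is a strict local maximizer.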
20.4
By the Lagrange condition, $x^* = [x_1, x_2]^\top$ satisfies
\[
\begin{aligned}
x_1 + \lambda &= 0 \\
x_1 + 4 + 4\lambda &= 0.
\end{aligned}
\]
Eliminating $\lambda$ we get
\[
3x_1 - 4 = 0,
\]
which implies that $x_1 = 4/3$. Therefore, $\nabla f(x^*) = [4/3, 16/3]^\top$.
20.5
a. The Lagrange condition for this problem is:
\[
\begin{aligned}
2(x^* - x_0) + 2\lambda^* x^* &= 0 \\
\|x^*\|^2 &= 9,
\end{aligned}
\]
where $\lambda^* \in \mathbb{R}$. Rewriting the first equation we get $(1 + \lambda^*) x^* = x_0$, which when combined with the second
equation gives two values for $1 + \lambda^*$: $1 + \lambda_1^* = 2/3$ and $1 + \lambda_2^* = -2/3$. Hence there are two solutions to the
Lagrange condition: $x^{*(1)} = (3/2)[1, \sqrt{3}]^\top$ and $x^{*(2)} = -(3/2)[1, \sqrt{3}]^\top$.
b. We have L(x∗(i) , λ∗i ) = (1 + λ∗i )I. To apply the SONC Theorem, we need to check regularity. This is
easy, since the gradient of the constraint function at any point x is 2x, which is nonzero at both the points
in part a.
For the second point, 1 + λ∗2 = −2/3, which implies that the point is not a local minimizer because the
SONC does not hold.
On the other hand, the first point satisfies the SOSC (since 1 + λ∗1 = 2/3), which implies that it is a strict
local minimizer.
20.6
a. Let $x_1$, $x_2$, and $x_3$ be the dimensions of the closed box. The problem is
\[
\begin{array}{ll}
\text{minimize} & 2(x_1 x_2 + x_2 x_3 + x_3 x_1) \\
\text{subject to} & x_1 x_2 x_3 = V.
\end{array}
\]
We denote $f(x) = 2(x_1 x_2 + x_2 x_3 + x_3 x_1)$, and $h(x) = x_1 x_2 x_3 - V$. We have $\nabla f(x) = 2[x_2 + x_3, x_1 +
x_3, x_1 + x_2]^\top$ and $\nabla h(x) = [x_2 x_3, x_1 x_3, x_1 x_2]^\top$. By the Lagrange condition, the dimensions $x^* = [a, b, c]^\top$ of the box with
minimum surface area satisfy
\[
\begin{aligned}
2(b + c) + \lambda bc &= 0 \\
2(a + c) + \lambda ac &= 0 \\
2(a + b) + \lambda ab &= 0 \\
abc &= V,
\end{aligned}
\]
where $\lambda \in \mathbb{R}$.
b. Regularity of $x^*$ means $\nabla h(x^*) \ne 0$ (since there is only one scalar equality constraint in this case). Since
$x^* = [a, b, c]^\top$ is a feasible point, we must have $a, b, c \ne 0$ (for otherwise the volume will be 0). Hence,
$\nabla h(x^*) \ne 0$, which implies that $x^*$ is regular.
c. Multiplying the first equation by $a$ and the second equation by $b$, and then subtracting the first from the
second, we obtain:
\[
c(a - b) = 0.
\]
Since $c \ne 0$ (see part b), we conclude that $a = b$. By a similar procedure on the second and third equations,
we conclude that $b = c$. Hence, substituting into the fourth (constraint) equation, we obtain
\[
a = b = c = V^{1/3},
\]
with $\lambda = -4V^{-1/3}$.
d. The Hessian of the Lagrangian is given by
\[
L(x^*, \lambda) =
\begin{bmatrix} 0 & 2 + \lambda c & 2 + \lambda b \\ 2 + \lambda c & 0 & 2 + \lambda a \\ 2 + \lambda b & 2 + \lambda a & 0 \end{bmatrix}
= \begin{bmatrix} 0 & -2 & -2 \\ -2 & 0 & -2 \\ -2 & -2 & 0 \end{bmatrix}
= -2 \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
\]
The matrix $L(x^*, \lambda)$ is not positive definite (there are several ways to check this: we could use Sylvester's
criterion, or we could compute the eigenvalues of $L(x^*, \lambda)$, which are $2, 2, -4$). Therefore, we need to compute
the tangent space $T(x^*)$. Note that
\[
Dh(x^*) = \nabla h(x^*)^\top = [bc, ac, ab] = V^{2/3}[1, 1, 1].
\]
Hence,
\[
T(x^*) = \{y : Dh(x^*) y = 0\} = \{y : [1, 1, 1] y = 0\} = \{y : y_3 = -(y_1 + y_2)\}.
\]
Let $y \in T(x^*)$, $y \ne 0$. Note that either $y_1 \ne 0$ or $y_2 \ne 0$. We have
\[
y^\top L(x^*, \lambda) y = -2 y^\top \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} y = -4(y_1 y_2 + y_1 y_3 + y_2 y_3).
\]
Substituting $y_3 = -(y_1 + y_2)$, we obtain
\[
y^\top L(x^*, \lambda) y = -4(y_1 y_2 - y_1(y_1 + y_2) - y_2(y_1 + y_2)) = 4(y_1^2 + y_1 y_2 + y_2^2) = 4 z^\top Q z,
\]
where $z = [y_1, y_2]^\top \ne 0$ and
\[
Q = \begin{bmatrix} 1 & 1/2 \\ 1/2 & 1 \end{bmatrix} > 0.
\]
Therefore, $y^\top L(x^*, \lambda) y > 0$, which shows that the SOSC is satisfied.
An alternative (simpler) calculation:
\[
y^\top L(x^*, \lambda) y = -2 y^\top \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} y = -2(y_1(y_2 + y_3) + y_2(y_1 + y_3) + y_3(y_1 + y_2)).
\]
Substituting $y_1 = -(y_2 + y_3)$, $y_2 = -(y_1 + y_3)$, and $y_3 = -(y_1 + y_2)$ in the first, second, and third terms,
respectively, we obtain
\[
y^\top L(x^*, \lambda) y = 2(y_1^2 + y_2^2 + y_3^2) > 0.
\]
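A quick numerical check of parts c and d can be sketched in Python. The volume `V = 8.0` and the grid of comparison boxes are arbitrary choices made for illustration; the formulas for the candidate point and the multiplier are those derived above.

```python
def surface(a, b, c):
    """Surface area of a closed box with sides a, b, c."""
    return 2 * (a * b + b * c + c * a)


V = 8.0                   # hypothetical volume
s = V ** (1.0 / 3.0)      # candidate a = b = c = V^(1/3)
lam = -4.0 / s            # multiplier from part c

# Residual of the Lagrange equation 2(b + c) + lam*b*c = 0; by symmetry
# the other two equations give the same residual when a = b = c.
residual = 2 * (s + s) + lam * s * s

# Other boxes of the same volume: choose a and b, then c = V/(a*b).
others = [surface(a, b, V / (a * b))
          for a in (1.0, 1.5, 2.0, 2.5, 3.0)
          for b in (1.0, 1.5, 2.0, 2.5, 3.0)]
print(residual, surface(s, s, s), min(others))
```

None of the sampled boxes has a smaller surface area than the cube, which is consistent with the SOSC conclusion.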
20.7
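a. We first compute critical points by applying the Lagrange conditions. These are:
\[
\begin{aligned}
2x_1 + 2\lambda x_1 &= 0 \\
6x_2 + 2\lambda x_2 &= 0 \\
1 + 2\lambda x_3 &= 0 \\
x_1^2 + x_2^2 + x_3^2 - 16 &= 0.
\end{aligned}
\]
There are six points satisfying the Lagrange condition:
\[
\begin{aligned}
x^{(1)} &= [\sqrt{63}/2, 0, 1/2]^\top, & \lambda^{(1)} &= -1 \\
x^{(2)} &= [-\sqrt{63}/2, 0, 1/2]^\top, & \lambda^{(2)} &= -1 \\
x^{(3)} &= [0, 0, 4]^\top, & \lambda^{(3)} &= -1/8 \\
x^{(4)} &= [0, 0, -4]^\top, & \lambda^{(4)} &= 1/8 \\
x^{(5)} &= [0, \sqrt{575}/6, 1/6]^\top, & \lambda^{(5)} &= -3 \\
x^{(6)} &= [0, -\sqrt{575}/6, 1/6]^\top, & \lambda^{(6)} &= -3.
\end{aligned}
\]
All the above points are regular. We now apply second-order conditions to establish their nature. For this,
we compute
\[
F(x) = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\qquad
H(x) = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix},
\]
and
\[
T(x^*) = \{y \in \mathbb{R}^3 : [2x_1, 2x_2, 2x_3] y = 0\}.
\]
For the first point, we have
\[
L(x^{(1)}, \lambda^{(1)}) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & -2 \end{bmatrix},
\qquad
T(x^{(1)}) = \{[-a/\sqrt{63}, b, a]^\top : a, b \in \mathbb{R}\}.
\]
Let $y = [-a/\sqrt{63}, b, a]^\top \in T(x^{(1)})$, where $a$ and $b$ are not both zero. Then
\[
y^\top L(x^{(1)}, \lambda^{(1)}) y = 4b^2 - 2a^2
\begin{cases}
> 0 & \text{if } |a| < |b|\sqrt{2} \\
= 0 & \text{if } |a| = |b|\sqrt{2} \\
< 0 & \text{if } |a| > |b|\sqrt{2}.
\end{cases}
\]
From the above, we see that the quadratic form takes both signs on $T(x^{(1)})$, so $x^{(1)}$ satisfies the SONC neither for a minimizer nor for a maximizer. Therefore, $x^{(1)}$ cannot be an extremizer.
Performing similar calculations for $x^{(2)}$, we conclude that $x^{(2)}$ cannot be an extremizer either.
For the third point, we have
\[
L(x^{(3)}, \lambda^{(3)}) = \begin{bmatrix} 7/4 & 0 & 0 \\ 0 & 23/4 & 0 \\ 0 & 0 & -1/4 \end{bmatrix},
\qquad
T(x^{(3)}) = \{[a, b, 0]^\top : a, b \in \mathbb{R}\}.
\]
Let $y = [a, b, 0]^\top \in T(x^{(3)})$, where $a$ and $b$ are not both zero. Then
\[
y^\top L(x^{(3)}, \lambda^{(3)}) y = \frac{7}{4} a^2 + \frac{23}{4} b^2 > 0.
\]
Hence, $x^{(3)}$ is a strict local minimizer. Performing similar calculations for the remaining points, we conclude
that $x^{(4)}$ is a strict local minimizer, and $x^{(5)}$ and $x^{(6)}$ are both strict local maximizers.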
b. The Lagrange condition for the problem is:
\[
\begin{aligned}
2x_1 + \lambda(6x_1 + 4x_2) &= 0 \\
2x_2 + \lambda(4x_1 + 12x_2) &= 0 \\
3x_1^2 + 4x_1 x_2 + 6x_2^2 - 140 &= 0.
\end{aligned}
\]
We represent the first two equations as
\[
\begin{bmatrix} 2 + 6\lambda & 4\lambda \\ 4\lambda & 2 + 12\lambda \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
From the constraint equation, we note that $x = [0, 0]^\top$ cannot satisfy the Lagrange condition. Therefore,
the determinant of the above matrix must be zero. Solving for $\lambda$ yields two possible values: $-1/7$ and $-1/2$.
We then have four points satisfying the Lagrange condition:
\[
\begin{aligned}
x^{(1)} &= [2, 4]^\top, & \lambda^{(1)} &= -1/7 \\
x^{(2)} &= [-2, -4]^\top, & \lambda^{(2)} &= -1/7 \\
x^{(3)} &= [-2\sqrt{14}, \sqrt{14}]^\top, & \lambda^{(3)} &= -1/2 \\
x^{(4)} &= [2\sqrt{14}, -\sqrt{14}]^\top, & \lambda^{(4)} &= -1/2.
\end{aligned}
\]
Applying the SOSC, we conclude that $x^{(1)}$ and $x^{(2)}$ are strict local minimizers, and $x^{(3)}$ and $x^{(4)}$ are strict
local maximizers.
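The four candidate points can be checked by substituting them back into the Lagrange condition. In the Python sketch below, `lagrange_residual` is a hypothetical helper; the objective $x_1^2 + x_2^2$ used for the comparison is inferred from the gradient terms $2x_1$ and $2x_2$ in the condition and is therefore an assumption.

```python
import math


def lagrange_residual(x1, x2, lam):
    """Left-hand sides of the three Lagrange equations of part b."""
    return (2 * x1 + lam * (6 * x1 + 4 * x2),
            2 * x2 + lam * (4 * x1 + 12 * x2),
            3 * x1 ** 2 + 4 * x1 * x2 + 6 * x2 ** 2 - 140)


r14 = math.sqrt(14)
points = [((2.0, 4.0), -1 / 7), ((-2.0, -4.0), -1 / 7),
          ((-2 * r14, r14), -1 / 2), ((2 * r14, -r14), -1 / 2)]

worst = max(abs(v) for (x, lam) in points
            for v in lagrange_residual(x[0], x[1], lam))
values = [x[0] ** 2 + x[1] ** 2 for (x, _) in points]  # assumed objective
print(worst, values)
```

The two points with the smaller objective value are the ones classified as minimizers, and the two with the larger value are the ones classified as maximizers.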
20.8
a. We can represent the problem as
\[
\begin{array}{ll}
\text{minimize} & f(x) \\
\text{subject to} & h(x) = 0,
\end{array}
\]
where $f(x) = 2x_1 + 3x_2 - 4$, and $h(x) = x_1 x_2 - 6$. We have $Df(x) = [2, 3]$, and $Dh(x) = [x_2, x_1]$. Note
that $0$ is not a feasible point. Therefore, any feasible point is regular. If $x^*$ is a local extremizer, then by
the Lagrange multiplier theorem, there exists $\lambda^* \in \mathbb{R}$ such that $Df(x^*) + \lambda^* Dh(x^*) = 0^\top$, or
\[
\begin{aligned}
2 + \lambda^* x_2^* &= 0 \\
3 + \lambda^* x_1^* &= 0.
\end{aligned}
\]
Solving, we get two possible extremizers: $x^{(1)} = [3, 2]^\top$, with corresponding Lagrange multiplier $\lambda^{(1)} = -1$,
and $x^{(2)} = -[3, 2]^\top$, with corresponding Lagrange multiplier $\lambda^{(2)} = 1$.
b. We have $F(x) = O$, and
\[
H(x) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]
First, consider the point $x^{(1)} = [3, 2]^\top$, with corresponding Lagrange multiplier $\lambda^{(1)} = -1$. We have
\[
L(x^{(1)}, \lambda^{(1)}) = -\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
\]
and
\[
T(x^{(1)}) = \{y : [2, 3] y = 0\} = \{\alpha[-3, 2]^\top : \alpha \in \mathbb{R}\}.
\]
Let $y = \alpha[-3, 2]^\top \in T(x^{(1)})$, $\alpha \ne 0$. We have
\[
y^\top L(x^{(1)}, \lambda^{(1)}) y = 12\alpha^2 > 0.
\]
Therefore, by the SOSC, $x^{(1)} = [3, 2]^\top$ is a strict local minimizer.
Next, consider the point $x^{(2)} = -[3, 2]^\top$, with corresponding Lagrange multiplier $\lambda^{(2)} = 1$. We have
\[
L(x^{(2)}, \lambda^{(2)}) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
\]
and
\[
T(x^{(2)}) = \{y : -[2, 3] y = 0\} = \{\alpha[-3, 2]^\top : \alpha \in \mathbb{R}\} = T(x^{(1)}).
\]
Let $y = \alpha[-3, 2]^\top \in T(x^{(2)})$, $\alpha \ne 0$. We have
\[
y^\top L(x^{(2)}, \lambda^{(2)}) y = -12\alpha^2 < 0.
\]
Therefore, by the SOSC, $x^{(2)} = -[3, 2]^\top$ is a strict local maximizer.
c. Note that $f(x^{(1)}) = 8$, while $f(x^{(2)}) = -16$. Therefore, $x^{(1)}$, although a strict local minimizer, is not a
global minimizer. Likewise, $x^{(2)}$, although a strict local maximizer, is not a global maximizer.
20.9
We observe that $f(x_1, x_2)$ is a ratio of two quadratic functions, that is, we can represent $f(x_1, x_2)$ as
\[
f(x_1, x_2) = \frac{x^\top Q x}{x^\top P x}.
\]
Therefore, if a point $x$ is a maximizer of $f(x_1, x_2)$ then so is any nonzero multiple of this point because
\[
\frac{(tx)^\top Q (tx)}{(tx)^\top P (tx)} = \frac{t^2 x^\top Q x}{t^2 x^\top P x} = \frac{x^\top Q x}{x^\top P x}.
\]
Thus any nonzero multiple of a solution is also a solution. To proceed, represent the original problem in an
equivalent form,
\[
\begin{array}{ll}
\text{maximize} & x^\top Q x = 18x_1^2 - 8x_1 x_2 + 12x_2^2 \\
\text{subject to} & x^\top P x = 2x_1^2 + 2x_2^2 = 1.
\end{array}
\]
Thus, we wish to maximize $f(x_1, x_2) = 18x_1^2 - 8x_1 x_2 + 12x_2^2$ subject to the equality constraint $h(x_1, x_2) =
1 - 2x_1^2 - 2x_2^2 = 0$. We apply Lagrange's method to solve the problem. We form the Lagrangian function,
\[
l(x, \lambda) = f + \lambda h,
\]
compute its gradient and find critical points. We have
\[
\begin{aligned}
\nabla_x l &= \nabla_x \left( x^\top \begin{bmatrix} 18 & -4 \\ -4 & 12 \end{bmatrix} x + \lambda \left( 1 - x^\top \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} x \right) \right) \\
&= 2 \begin{bmatrix} 18 & -4 \\ -4 & 12 \end{bmatrix} x - 2\lambda \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} x \\
&= 0.
\end{aligned}
\]
We represent the above in an equivalent form,
\[
\left( \lambda I_2 - \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}^{-1} \begin{bmatrix} 18 & -4 \\ -4 & 12 \end{bmatrix} \right) x = 0.
\]
That is, solving the problem is reduced to solving an eigenvalue-eigenvector problem,
\[
\left( \lambda I_2 - \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix} \right) x = \begin{bmatrix} \lambda - 9 & 2 \\ 2 & \lambda - 6 \end{bmatrix} x = 0.
\]
The characteristic polynomial is
\[
\det \begin{bmatrix} \lambda - 9 & 2 \\ 2 & \lambda - 6 \end{bmatrix} = \lambda^2 - 15\lambda + 50 = (\lambda - 5)(\lambda - 10).
\]
The eigenvalues are 5 and 10. Because we are interested in finding a maximizer, we conclude that the value
of the maximized function is 10, while the corresponding maximizer is an eigenvector of this eigenvalue, scaled
appropriately to satisfy the constraint. An eigenvector can easily be found by taking any
nonzero column of the adjoint matrix of
\[
10 I_2 - \begin{bmatrix} 9 & -2 \\ -2 & 6 \end{bmatrix}.
\]
Performing simple manipulations gives
\[
\operatorname{adj} \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix}.
\]
Thus,
\[
\sqrt{0.1} \begin{bmatrix} -2 \\ 1 \end{bmatrix}
\]
is a maximizer for the equivalent problem. Any nonzero multiple of the above vector is a solution of the original
maximization problem.
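The eigenvalue computation can be confirmed with a few lines of Python. The function name `ratio` and the sampling grid are assumptions made for illustration; the quadratic forms and the candidate maximizer are those derived above.

```python
import math


def ratio(x1, x2):
    """The quotient x'Qx / x'Px with Q = [18 -4; -4 12] and P = 2I."""
    return ((18 * x1 ** 2 - 8 * x1 * x2 + 12 * x2 ** 2)
            / (2 * x1 ** 2 + 2 * x2 ** 2))


# Eigenvalues of [9 -2; -2 6] from lambda^2 - 15*lambda + 50 = 0.
tr, det = 15.0, 50.0
lam_min = (tr - math.sqrt(tr * tr - 4 * det)) / 2
lam_max = (tr + math.sqrt(tr * tr - 4 * det)) / 2

# Candidate maximizer sqrt(0.1) * [-2, 1]'.
s = math.sqrt(0.1)
x_star = (-2 * s, s)

# Sample the quotient on directions around the unit circle.
samples = [ratio(math.cos(0.01 * t), math.sin(0.01 * t)) for t in range(629)]
print(lam_min, lam_max, ratio(*x_star), max(samples))
```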
20.10
We use the technique of Example 20.8. First, we write the objective function in the form $x^\top Q x$, where
\[
Q = Q^\top = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}.
\]
The characteristic polynomial of $Q$ is $\lambda^2 - 6\lambda + 5$, and the eigenvalues of $Q$ are 1 and 5. The solutions to
the problem are the unit length eigenvectors of $Q$ corresponding to the eigenvalue 5, which are $\pm[1, 1]^\top/\sqrt{2}$.
20.11
Consider the problem
\[
\begin{array}{ll}
\text{minimize} & \|Ax\|^2 \\
\text{subject to} & \|x\|^2 = 1.
\end{array}
\]
The optimal objective function value of this problem is the smallest value that $\|y\|^2$ can take. The above
can be solved easily using Lagrange multipliers. The Lagrange conditions are
\[
\begin{aligned}
x^\top A^\top A - \lambda x^\top &= 0^\top \\
1 - x^\top x &= 0.
\end{aligned}
\]
The first equation can be rewritten as $A^\top A x = \lambda x$, which implies that $\lambda$ is an eigenvalue of $A^\top A$. Moreover,
premultiplying by $x^\top$ yields $x^\top A^\top A x = \lambda x^\top x = \lambda$, which indicates that the Lagrange multiplier is equal
to the optimal objective function value. Hence, the range of values that $\|y\| = \|Ax\|$ can take is 1 to $\sqrt{20}$.
20.12
Consider the following optimization problem (we need to use squared norms to make the functions differentiable):
\[ \begin{aligned} \text{minimize} \quad & -\|Ax\|^2 \\ \text{subject to} \quad & \|x\|^2 = 1. \end{aligned} \]
As usual, write $f(x) = -\|Ax\|^2$ and $h(x) = \|x\|^2 - 1$. We have $\nabla f(x) = -2A^\top A x$ and $\nabla h(x) = 2x$. Note that all feasible solutions are regular. Let $x^*$ be an optimal solution. Note that the optimal value of the objective function is $f(x^*) = -\|A\|_2^2$. The Lagrange condition for the above problem is:
\[ \begin{aligned} -2A^\top A x^* + \lambda^* (2x^*) &= 0 \\ \|x^*\|^2 &= 1. \end{aligned} \]
From the first equation, we see that
\[ A^\top A x^* = \lambda^* x^*, \]
which implies that $\lambda^*$ is an eigenvalue of $A^\top A$, and $x^*$ is the corresponding eigenvector. Premultiplying the above equation by $x^{*\top}$ and combining the result with the constraint equation, we obtain
\[ \lambda^* = x^{*\top} A^\top A x^* = \|Ax^*\|^2 = -f(x^*) = \|A\|_2^2. \]
Therefore, because $x^*$ minimizes $f$, we deduce that $\lambda^*$ must be the largest eigenvalue of $A^\top A$; i.e., $\lambda^* = \lambda_1$. Therefore,
\[ \|A\|_2 = \sqrt{\lambda_1}. \]
20.13
Let $h(x) = 1 - x^\top P x = 0$. Let $x_0$ be such that $h(x_0) = 0$. Then, $x_0 \neq 0$. For $x_0$ to be a regular point, we need to show that $\{\nabla h(x_0)\}$ is a linearly independent set, i.e., $\nabla h(x_0) \neq 0$. Now, $\nabla h(x) = -2Px$. Since $P$ is nonsingular and $x_0 \neq 0$, then $\nabla h(x_0) = -2Px_0 \neq 0$.
20.14
Note that the point $[1, 1]^\top$ is a regular point. Applying the Lagrange multiplier theorem gives
\[ \begin{aligned} a + 2\lambda^* &= 0 \\ b + 2\lambda^* &= 0. \end{aligned} \]
Hence, $a = b$.
20.15
a. Denote the solution by $[x_1^*, x_2^*]^\top$. The Lagrange condition for this problem has the form
\[ \begin{aligned} x_2^* - 2 + 2\lambda^* x_1^* &= 0 \\ x_1^* - 2\lambda^* x_2^* &= 0 \\ (x_1^*)^2 - (x_2^*)^2 &= 0. \end{aligned} \]
From the first and third equations it follows that $x_1^*, x_2^* \neq 0$. Then, combining the first and second equations, we obtain
\[ \lambda^* = \frac{2 - x_2^*}{2x_1^*} = \frac{x_1^*}{2x_2^*}, \]
which implies that $2x_2^* - (x_2^*)^2 = (x_1^*)^2$. Hence, $x_2^* = 1$, and by the third Lagrange equation, $(x_1^*)^2 = 1$. Thus, the only two points satisfying the Lagrange condition are $[1, 1]^\top$ and $[-1, 1]^\top$. Note that both points are regular.
b. Consider the point $x^* = [-1, 1]^\top$. The corresponding Lagrange multiplier is $\lambda^* = -1/2$. The Hessian of the Lagrangian is
\[ L(x^*, \lambda^*) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}. \]
The tangent plane is given by
\[ T(x^*) = \{y : [-2, -2]y = 0\} = \{[a, -a]^\top : a \in \mathbb{R}\}. \]
Let $y \in T(x^*)$, $y \neq 0$. Then, $y = [a, -a]^\top$ for some $a \neq 0$. We have $y^\top L(x^*, \lambda^*) y = -2a^2 < 0$. Hence, the SONC does not hold in this case, and therefore $x^* = [-1, 1]^\top$ cannot be a local minimizer. In fact, the point is a strict local maximizer.
c. Consider the point $x^* = [1, 1]^\top$. The corresponding Lagrange multiplier is $\lambda^* = 1/2$. The Hessian of the Lagrangian is
\[ L(x^*, \lambda^*) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]
The tangent plane is given by
\[ T(x^*) = \{y : [2, -2]y = 0\} = \{[a, a]^\top : a \in \mathbb{R}\}. \]
Let $y \in T(x^*)$, $y \neq 0$. Then, $y = [a, a]^\top$ for some $a \neq 0$. We have $y^\top L(x^*, \lambda^*) y = 2a^2 > 0$. Hence, by the SOSC, the point $x^* = [1, 1]^\top$ is a strict local minimizer.
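The second-order tests in parts b and c can be reproduced numerically. The sketch below assumes $f(x) = x_1 x_2 - 2x_1$ and $h(x) = x_1^2 - x_2^2$, which is what the Lagrange condition in part a implies (the problem statement itself is not reproduced here); the function names are ours.

```python
# Hessians inferred from the Lagrange condition in part a (an assumption):
# f(x) = x1*x2 - 2*x1  and  h(x) = x1^2 - x2^2.
F = [[0.0, 1.0], [1.0, 0.0]]
H = [[2.0, 0.0], [0.0, -2.0]]

def grad_lagrangian(x, lam):
    """Gradient of l = f + lam*h with respect to x."""
    return [x[1] - 2.0 + 2.0 * lam * x[0], x[0] - 2.0 * lam * x[1]]

def lagrangian_hessian(lam):
    """L = F + lam*H."""
    return [[F[i][j] + lam * H[i][j] for j in range(2)] for i in range(2)]

def curvature_on_tangent(x, lam):
    """y^T L y for the tangent direction y = [x2, x1], orthogonal to grad h = [2*x1, -2*x2]."""
    L = lagrangian_hessian(lam)
    y = [x[1], x[0]]
    return sum(y[i] * L[i][j] * y[j] for i in range(2) for j in range(2))

point_b, lam_b = [-1.0, 1.0], -0.5   # part b
point_c, lam_c = [1.0, 1.0], 0.5     # part c

curv_b = curvature_on_tangent(point_b, lam_b)  # negative: SONC fails
curv_c = curvature_on_tangent(point_c, lam_c)  # positive: SOSC holds
print(grad_lagrangian(point_b, lam_b), curv_b)
print(grad_lagrangian(point_c, lam_c), curv_c)
```

Both gradients vanish, and the sign of the curvature along the tangent direction separates the maximizer from the minimizer.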
20.16
a. The point $x^*$ is the solution to the optimization problem
\[ \begin{aligned} \text{minimize} \quad & \tfrac{1}{2} \|x - x_0\|^2 \\ \text{subject to} \quad & Ax = 0. \end{aligned} \]
Since $\operatorname{rank} A = m$, any feasible point is regular. By the Lagrange multiplier theorem, there exists $\lambda^* \in \mathbb{R}^m$ such that
\[ (x^* - x_0)^\top - \lambda^{*\top} A = 0^\top. \]
Postmultiplying both sides by $x^*$ and using the fact that $Ax^* = 0$, we get
\[ (x^* - x_0)^\top x^* = 0. \]
b. From part a, we have
\[ x^* - x_0 = A^\top \lambda^*. \]
Premultiplying both sides by $A$ we get
\[ -Ax_0 = (AA^\top) \lambda^*, \]
from which we conclude that $\lambda^* = -(AA^\top)^{-1} A x_0$. Hence,
\[ x^* = x_0 + A^\top \lambda^* = x_0 - A^\top (AA^\top)^{-1} A x_0 = (I_n - A^\top (AA^\top)^{-1} A) x_0. \]
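As an illustration of the projection formula, the sketch below treats the special case $m = 1$, where $A = a^\top$ is a single row and $(AA^\top)^{-1}$ is a scalar. The vectors `a` and `x0` are hypothetical data chosen for clean arithmetic; they do not come from the problem.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_nullspace_single_row(a, x0):
    """Project x0 onto {x : a^T x = 0}.

    This is x* = (I - A^T (A A^T)^{-1} A) x0 specialized to A = a^T (m = 1).
    """
    scale = dot(a, x0) / dot(a, a)  # (A A^T)^{-1} A x0 is a scalar here
    return [x0_i - scale * a_i for x0_i, a_i in zip(x0, a)]

a = [1.0, 2.0, 2.0]    # hypothetical constraint row
x0 = [3.0, 0.0, 3.0]   # hypothetical point to project

x_star = project_onto_nullspace_single_row(a, x0)
feasibility = dot(a, x_star)                                    # A x* = 0
orthogonality = dot([s - p for s, p in zip(x_star, x0)], x_star)  # part a
print(x_star, feasibility, orthogonality)
```

The last quantity is the identity $(x^* - x_0)^\top x^* = 0$ from part a.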
20.17
a. The Lagrange condition is (omitting all superscript-∗ for convenience):
\[ \begin{aligned} (Ax - b)^\top A + \lambda^\top C &= 0^\top \\ Cx &= d. \end{aligned} \]
For simplicity, write $Q = A^\top A$, which is positive definite. From the first equation, we have
\[ x = Q^{-1} A^\top b - Q^{-1} C^\top \lambda. \]
Multiplying both sides by $C$ and using the second equation, we have
\[ d = CQ^{-1} A^\top b - CQ^{-1} C^\top \lambda, \]
from which we obtain
\[ \lambda = (CQ^{-1} C^\top)^{-1} (CQ^{-1} A^\top b - d). \]
Substituting back into the equation for $x$, we obtain
\[ x = Q^{-1} A^\top b - Q^{-1} C^\top (CQ^{-1} C^\top)^{-1} (CQ^{-1} A^\top b - d). \]
b. Rewrite the objective function as
\[ \frac{1}{2} x^\top A^\top A x - b^\top A x + \frac{1}{2} \|b\|^2. \]
As before, write $Q = A^\top A$. Completing the squares and setting $y = x - Q^{-1} A^\top b$, the objective function can be written as
\[ \frac{1}{2} y^\top Q y + \text{const}. \]
Hence, the problem can be converted to the equivalent QP:
\[ \begin{aligned} \text{minimize} \quad & \tfrac{1}{2} y^\top Q y \\ \text{subject to} \quad & Cy = d - CQ^{-1} A^\top b. \end{aligned} \]
The solution to this QP is
\[ y^* = Q^{-1} C^\top (CQ^{-1} C^\top)^{-1} (d - CQ^{-1} A^\top b). \]
Hence, the solution to the original problem is:
\[ \begin{aligned} x^* &= Q^{-1} A^\top b + Q^{-1} C^\top (CQ^{-1} C^\top)^{-1} (d - CQ^{-1} A^\top b) \\ &= (A^\top A)^{-1} A^\top b + (A^\top A)^{-1} C^\top (C (A^\top A)^{-1} C^\top)^{-1} (d - C (A^\top A)^{-1} A^\top b), \end{aligned} \]
which agrees with the solution obtained in part a.
20.18
Write $f(x) = \frac{1}{2} x^\top Q x - c^\top x + d$ (actually, we could have ignored $d$) and $h(x) = b - Ax$. We have
\[ Df(x) = x^\top Q - c^\top, \qquad Dh(x) = -A. \]
The Lagrange condition is
\[ \begin{aligned} x^{*\top} Q - c^\top - \lambda^{*\top} A &= 0^\top \\ b - Ax^* &= 0. \end{aligned} \]
From the first equation we get
\[ x^* = Q^{-1} (A^\top \lambda^* + c). \]
Multiplying both sides by $A$ and using the second equation (constraint), we get
\[ (AQ^{-1} A^\top) \lambda^* + AQ^{-1} c = b. \]
Since $Q > 0$ and $A$ is of full rank, we can write
\[ \lambda^* = (AQ^{-1} A^\top)^{-1} (b - AQ^{-1} c). \]
Hence,
\[ x^* = Q^{-1} c + Q^{-1} A^\top (AQ^{-1} A^\top)^{-1} (b - AQ^{-1} c). \]
Alternatively, we could have rewritten the given problem in our usual quadratic programming form with variable $y = x - Q^{-1} c$.
20.19
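Clearly, we have $M = \mathcal{R}(B)$, i.e., $y \in M$ if and only if there exists $x \in \mathbb{R}^m$ such that $y = Bx$. Hence
\[ \begin{aligned} L \text{ is positive semidefinite on } M \quad &\Leftrightarrow \quad \text{for all } y \in M, \ y^\top L y \geq 0 \\ &\Leftrightarrow \quad \text{for all } x \in \mathbb{R}^m, \ (Bx)^\top L (Bx) \geq 0 \\ &\Leftrightarrow \quad \text{for all } x \in \mathbb{R}^m, \ x^\top (B^\top L B) x \geq 0 \\ &\Leftrightarrow \quad \text{for all } x \in \mathbb{R}^m, \ x^\top L_M x \geq 0 \\ &\Leftrightarrow \quad L_M \geq 0. \end{aligned} \]
For positive definiteness, the same argument applies, with $\geq$ replaced by $>$.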
20.20
a. By simple manipulations, we can write
\[ x_2 = a^2 x_0 + ab u_0 + b u_1. \]
Therefore, the problem is
\[ \begin{aligned} \text{minimize} \quad & \tfrac{1}{2} (u_0^2 + u_1^2) \\ \text{subject to} \quad & a^2 x_0 + ab u_0 + b u_1 = 0. \end{aligned} \]
Alternatively, we may use a vector notation: writing $u = [u_0, u_1]^\top$, we have
\[ \begin{aligned} \text{minimize} \quad & f(u) \\ \text{subject to} \quad & h(u) = 0, \end{aligned} \]
where $f(u) = \frac{1}{2} \|u\|^2$, and $h(u) = a^2 x_0 + [ab, b] u$. Since the vector $\nabla h(u) = [ab, b]^\top$ is nonzero for any $u$, any feasible point is regular. Therefore, by the Lagrange multiplier theorem, there exists $\lambda^* \in \mathbb{R}$ such that
\[ \begin{aligned} u_0^* + \lambda^* ab &= 0 \\ u_1^* + \lambda^* b &= 0 \\ a^2 x_0 + ab u_0^* + b u_1^* &= 0. \end{aligned} \]
We have three linear equations in three unknowns, which upon solving yield
\[ u_0^* = -\frac{a^3 x_0}{b(1 + a^2)}, \qquad u_1^* = -\frac{a^2 x_0}{b(1 + a^2)}. \]
b. The Hessians of $f$ and $h$ are $F(u) = I_2$ ($2 \times 2$ identity matrix) and $H(u) = O$, respectively. Hence, the Hessian of the Lagrangian is $L(u^*, \lambda^*) = I_2$, which is positive definite. Therefore, $u^*$ satisfies the SOSC, and is therefore a strict local minimizer.
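The closed-form controls can be sanity-checked with exact rational arithmetic. The following sketch uses hypothetical values $a = 2$, $b = 1$, $x_0 = 5$ (not given in the problem) and compares the cost against one other feasible control pair.

```python
from fractions import Fraction

def optimal_controls(a, b, x0):
    """Closed-form u0*, u1* from part a."""
    denom = b * (1 + a * a)
    return (-a**3 * x0 / denom, -a**2 * x0 / denom)

def final_state(a, b, x0, u0, u1):
    """x2 = a^2 x0 + a b u0 + b u1, as derived in part a."""
    return a * a * x0 + a * b * u0 + b * u1

def cost(u0, u1):
    """Objective (u0^2 + u1^2) / 2."""
    return (u0 * u0 + u1 * u1) / 2

# Hypothetical data, chosen only for illustration.
a, b, x0 = Fraction(2), Fraction(1), Fraction(5)
u0, u1 = optimal_controls(a, b, x0)

# Another feasible pair: pick u0 freely, then solve the constraint for u1.
alt_u0 = Fraction(-10)
alt_u1 = -(a * a * x0 + a * b * alt_u0) / b

print(u0, u1, final_state(a, b, x0, u0, u1), cost(u0, u1), cost(alt_u0, alt_u1))
```

The optimal pair drives $x_2$ to zero at a lower cost than the alternative, consistent with part b.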
20.21
Letting $z = [x_2, u_1, u_2]^\top$, the objective function is $z^\top Q z$, where
\[ Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1/3 \end{bmatrix}. \]
The linear constraint on $z$ is obtained by writing
\[ x_2 = 2x_1 + u_2 = 2(2 + u_1) + u_2, \]
which can be written as $Az = b$, where
\[ A = [1, -2, -1], \qquad b = 4. \]
Hence, using the method of Section 20.6, the solution is
\[ z^* = Q^{-1} A^\top (AQ^{-1} A^\top)^{-1} b = \begin{bmatrix} 1 \\ -4 \\ -3 \end{bmatrix} (12)^{-1} \cdot 4 = \begin{bmatrix} 1/3 \\ -4/3 \\ -1 \end{bmatrix}. \]
Thus, the optimal controls are $u_1^* = -4/3$ and $u_2^* = -1$.
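Because $Q$ is diagonal and $A$ has a single row, the formula reduces to elementwise operations. A minimal sketch with exact fractions, using only the data stated above (variable names are ours):

```python
from fractions import Fraction as Fr

Q_diag = [Fr(1), Fr(1, 2), Fr(1, 3)]   # diagonal of Q
A = [Fr(1), Fr(-2), Fr(-1)]            # single constraint row
b = Fr(4)

# Q^{-1} A^T: divide each entry of A by the matching diagonal entry of Q.
Qinv_At = [a_i / q_i for a_i, q_i in zip(A, Q_diag)]

# A Q^{-1} A^T is a scalar because A is 1 x 3.
A_Qinv_At = sum(a_i * v_i for a_i, v_i in zip(A, Qinv_At))

# z* = Q^{-1} A^T (A Q^{-1} A^T)^{-1} b
z_star = [v_i * b / A_Qinv_At for v_i in Qinv_At]
constraint_lhs = sum(a_i * z_i for a_i, z_i in zip(A, z_star))
print(z_star, constraint_lhs)
```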
20.22
The composite input vector is
\[ u = \begin{bmatrix} u_0 & u_1 & u_2 \end{bmatrix}^\top. \]
The performance index $J$ is $J = \frac{1}{2} u^\top u$. To obtain the constraint $Au = b$, where $A \in \mathbb{R}^{1 \times 3}$, we proceed as follows. First, we write
\[ x_2 = x_1 + 2u_1 = x_0 + 2u_0 + 2u_1. \]
Using the above, we obtain
\[ x_3 = 9 = x_2 + 2u_2 = x_0 + 2u_0 + 2u_1 + 2u_2. \]
We represent the above in the format $Au = b$ as follows:
\[ \begin{bmatrix} 2 & 2 & 2 \end{bmatrix} \begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix} = 6. \]
Thus we have formulated the problem of finding the optimal control sequence as a constrained optimization problem
\[ \begin{aligned} \text{minimize} \quad & \tfrac{1}{2} u^\top u \\ \text{subject to} \quad & Au = b. \end{aligned} \]
To solve the above problem, we form the Lagrangian
\[ l(u, \lambda) = \frac{1}{2} u^\top u + \lambda (Au - b), \]
where $\lambda$ is the Lagrange multiplier. Applying the Lagrange first-order condition yields
\[ u + A^\top \lambda = 0 \quad \text{and} \quad Au = b. \]
From the first of the above conditions, we calculate $u = -A^\top \lambda$. Substituting this into the second of the Lagrange conditions gives
\[ \lambda = -\left( AA^\top \right)^{-1} b. \]
Combining the last two equations, we obtain a closed-form formula for the optimal input sequence
\[ u = A^\top \left( AA^\top \right)^{-1} b. \]
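The closed-form formula can be evaluated for this problem's data and checked against the state recursion. The sketch below assumes the dynamics $x_{k+1} = x_k + 2u_k$ (read off from the expressions for $x_2$ and $x_3$) and $x_0 = 3$ (inferred from $b = 9 - x_0 = 6$); neither is restated explicitly in this solution.

```python
from fractions import Fraction

A = [Fraction(2), Fraction(2), Fraction(2)]  # constraint row from the text
b = Fraction(6)

# u = A^T (A A^T)^{-1} b; A A^T is a scalar because A is 1 x 3.
AAt = sum(a_i * a_i for a_i in A)
u = [a_i * b / AAt for a_i in A]

def simulate(x_init, controls):
    """Roll out x_{k+1} = x_k + 2*u_k (assumed dynamics) and return the trajectory."""
    x = x_init
    trajectory = [x]
    for u_k in controls:
        x = x + 2 * u_k
        trajectory.append(x)
    return trajectory

x0 = Fraction(9) - b          # inferred initial state, x0 = 3
trajectory = simulate(x0, u)  # last entry should be the target x3 = 9
print(u, trajectory)
```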
In our problem,
\[ u = \begin{bmatrix} u_0 \\ u_1 \\ u_2 \end{bmatrix} = A^\top \frac{b}{AA^\top} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \]
21. Problems With Inequality Constraints
21.1
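a. We form the Lagrangian function,
\[ l(x, \mu) = x_1^2 + 4x_2^2 + \mu (4 - x_1^2 - 2x_2^2). \]
The KKT conditions take the form
\[ \begin{aligned} D_x l(x, \mu) = \begin{bmatrix} 2x_1 - 2\mu x_1 & 8x_2 - 4\mu x_2 \end{bmatrix} &= 0^\top \\ \mu (4 - x_1^2 - 2x_2^2) &= 0 \\ \mu &\geq 0 \\ 4 - x_1^2 - 2x_2^2 &\leq 0. \end{aligned} \]
From the first of the above equalities, we obtain
\[ \begin{aligned} (1 - \mu) x_1 &= 0 \\ (2 - \mu) x_2 &= 0. \end{aligned} \]
We first consider the case when $\mu = 0$. Then, we obtain the point $x^{(1)} = 0$, which does not satisfy the constraints.
The next case is when $\mu = 1$. Then we must have $x_2 = 0$, and using $\mu (4 - x_1^2 - 2x_2^2) = 0$ gives
\[ x^{(2)} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad \text{and} \quad x^{(3)} = -\begin{bmatrix} 2 \\ 0 \end{bmatrix}. \]
For the case when $\mu = 2$, we must have $x_1 = 0$, and we get
\[ x^{(4)} = \begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix} \quad \text{and} \quad x^{(5)} = -\begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix}. \]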
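b. The Hessian of $l$ is
\[ L = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix} + \mu \begin{bmatrix} -2 & 0 \\ 0 & -4 \end{bmatrix}. \]
When $\mu = 1$,
\[ L = \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix}. \]
We next find the subspace
\[ \tilde{T} = T = \{y : \begin{bmatrix} \pm 4 & 0 \end{bmatrix} y = 0\} = \{y = a \begin{bmatrix} 0 & 1 \end{bmatrix}^\top : a \in \mathbb{R}\}. \]
We then check for positive definiteness of $L$ on $\tilde{T}$,
\[ y^\top L y = a^2 \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 4a^2 > 0. \]
Hence, $x^{(2)}$ and $x^{(3)}$ satisfy the SOSC to be strict local minimizers.
When $\mu = 2$,
\[ L = \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix}, \]
and
\[ T = \{y = a \begin{bmatrix} 1 & 0 \end{bmatrix}^\top : a \in \mathbb{R}\}. \]
We have
\[ y^\top L y = -2a^2 < 0. \]
Thus, $x^{(4)}$ and $x^{(5)}$ do not satisfy the SONC to be minimizers.
In summary, only $x^{(2)}$ and $x^{(3)}$ are strict local minimizers.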
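The four nonzero KKT candidates can be checked directly against the conditions in part a. A minimal sketch (function names are ours) that evaluates stationarity, the active constraint, and the objective at each candidate:

```python
import math

def f(x):
    """Objective x1^2 + 4*x2^2."""
    return x[0] ** 2 + 4.0 * x[1] ** 2

def g(x):
    """Constraint function 4 - x1^2 - 2*x2^2 (feasible when g <= 0)."""
    return 4.0 - x[0] ** 2 - 2.0 * x[1] ** 2

def grad_lagrangian(x, mu):
    """D_x l(x, mu) from part a."""
    return [2.0 * x[0] - 2.0 * mu * x[0], 8.0 * x[1] - 4.0 * mu * x[1]]

# (point, multiplier) pairs from part a.
candidates = [
    ([2.0, 0.0], 1.0),
    ([-2.0, 0.0], 1.0),
    ([0.0, math.sqrt(2.0)], 2.0),
    ([0.0, -math.sqrt(2.0)], 2.0),
]

report = [(x, mu, grad_lagrangian(x, mu), g(x), f(x)) for x, mu in candidates]
for row in report:
    print(row)
```

The objective values (4 at the first two points, 8 at the last two) agree with the conclusion that only the $\mu = 1$ points are minimizers.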
21.2
a. We first find critical points by applying the Karush-Kuhn-Tucker conditions, which are
\[ \begin{aligned} 2x_1 - 2 - 2\mu_1 x_1 + 5\mu_2 &= 0 \\ 2x_2 - 10 + \tfrac{1}{5} \mu_1 + \tfrac{1}{2} \mu_2 &= 0 \\ \mu_1 \left( \tfrac{1}{5} x_2 - x_1^2 \right) + \mu_2 \left( 5x_1 + \tfrac{1}{2} x_2 - 5 \right) &= 0 \\ \mu &\geq 0. \end{aligned} \]
We have to check four possible combinations.
Case 1: ($\mu_1 = 0$, $\mu_2 = 0$) Solving the first and second Karush-Kuhn-Tucker equations yields $x^{(1)} = [1, 5]^\top$. However, this point is not feasible and is therefore not a candidate minimizer.
Case 2: ($\mu_1 > 0$, $\mu_2 = 0$) We have two possible solutions:
\[ \begin{aligned} x^{(2)} &= [-0.98, 4.8]^\top, & \mu_1^{(2)} &= 2.02 \\ x^{(3)} &= [-0.02, 0]^\top, & \mu_1^{(3)} &= 50. \end{aligned} \]
Both $x^{(2)}$ and $x^{(3)}$ satisfy the constraints, and are therefore candidate minimizers.
Case 3: ($\mu_1 = 0$, $\mu_2 > 0$) Solving the corresponding equation yields:
\[ x^{(4)} = [0.5050, 4.9505]^\top, \qquad \mu_2^{(4)} = 0.198. \]
The point $x^{(4)}$ is not feasible, and hence is not a candidate minimizer.
Case 4: ($\mu_1 > 0$, $\mu_2 > 0$) We have two solutions:
\[ \begin{aligned} x^{(5)} &= [0.732, 2.679]^\top, & \mu^{(5)} &= [13.246, 3.986]^\top \\ x^{(6)} &= [-2.73, 37.32]^\top, & \mu^{(6)} &= [188.8, -204]^\top. \end{aligned} \]
The point $x^{(5)}$ is feasible, but $x^{(6)}$ is not.
We are left with three candidate minimizers: $x^{(2)}$, $x^{(3)}$, and $x^{(5)}$. It is easy to check that they are regular. We now check if each satisfies the second-order conditions. For this, we compute
\[ L(x, \mu) = \begin{bmatrix} 2 - 2\mu_1 & 0 \\ 0 & 2 \end{bmatrix}. \]
For $x^{(2)}$, we have
\[ \begin{aligned} L(x^{(2)}, \mu^{(2)}) &= \begin{bmatrix} -2.04 & 0 \\ 0 & 2 \end{bmatrix} \\ \tilde{T}(x^{(2)}) &= \{a[-0.1021, 1]^\top : a \in \mathbb{R}\}. \end{aligned} \]
Let $y = a[-0.1021, 1]^\top \in \tilde{T}(x^{(2)})$ with $a \neq 0$. Then
\[ y^\top L(x^{(2)}, \mu^{(2)}) y = 1.979a^2 > 0. \]
Thus, by the SOSC, $x^{(2)}$ is a strict local minimizer.
For $x^{(3)}$, we have
\[ \begin{aligned} L(x^{(3)}, \mu^{(3)}) &= \begin{bmatrix} -97.958 & 0 \\ 0 & 2 \end{bmatrix} \\ \tilde{T}(x^{(3)}) &= \{a[-4.898, 1]^\top : a \in \mathbb{R}\}. \end{aligned} \]
Let $y = a[-4.898, 1]^\top \in \tilde{T}(x^{(3)})$ with $a \neq 0$. Then,
\[ y^\top L(x^{(3)}, \mu^{(3)}) y = -2347.9a^2 < 0. \]
Thus, $x^{(3)}$ does not satisfy the SOSC. In fact, in this case, we have $T(x^{(3)}) = \tilde{T}(x^{(3)})$, and hence $x^{(3)}$ does not satisfy the SONC either. We conclude that $x^{(3)}$ is not a local minimizer. We can easily check that $x^{(3)}$ is not a local maximizer either.
For $x^{(5)}$, we have
\[ \begin{aligned} L(x^{(5)}, \mu^{(5)}) &= \begin{bmatrix} -24.4919 & 0 \\ 0 & 2 \end{bmatrix} \\ \tilde{T}(x^{(5)}) &= \{0\}. \end{aligned} \]
The SOSC is trivially satisfied, and therefore $x^{(5)}$ is a strict local minimizer.
b. The Karush-Kuhn-Tucker conditions are:
\[ \begin{aligned} 2x_1 - \mu_1 - \mu_3 &= 0 \\ 2x_2 - \mu_2 - \mu_3 &= 0 \\ -x_1 &\leq 0 \\ -x_2 &\leq 0 \\ -x_1 - x_2 + 5 &\leq 0 \\ -\mu_1 x_1 - \mu_2 x_2 + \mu_3 (-x_1 - x_2 + 5) &= 0 \\ \mu_1, \mu_2, \mu_3 &\geq 0. \end{aligned} \]
It is easy to verify that the only combination of Karush-Kuhn-Tucker multipliers resulting in a feasible point is $\mu_1 = \mu_2 = 0$, $\mu_3 > 0$. For this case, we obtain $x^* = [2.5, 2.5]^\top$, $\mu^* = [0, 0, 5]^\top$. We have
\[ L(x^*, \mu^*) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} > 0. \]
Hence, $x^*$ is a strict local minimizer (in fact, the only one for this problem).
c. The Karush-Kuhn-Tucker conditions are:
\[ \begin{aligned} 2x_1 + 6x_2 - 4 + 2\mu_1 x_1 + 2\mu_2 &= 0 \\ 6x_1 - 2 + 2\mu_1 - 2\mu_2 &= 0 \\ x_1^2 + 2x_2 - 1 &\leq 0 \\ 2x_1 - 2x_2 - 1 &\leq 0 \\ \mu_1 (x_1^2 + 2x_2 - 1) + \mu_2 (2x_1 - 2x_2 - 1) &= 0 \\ \mu_1, \mu_2 &\geq 0. \end{aligned} \]
It is easy to verify that the only combination of Karush-Kuhn-Tucker multipliers resulting in a feasible point is $\mu_1 = 0$, $\mu_2 > 0$. For this case, we obtain $x^* = [9/14, 2/14]^\top$, $\mu^* = [0, 13/14]^\top$. We have
\[ \begin{aligned} L(x^*, \mu^*) &= \begin{bmatrix} 2 & 6 \\ 6 & 0 \end{bmatrix} \\ \tilde{T}(x^*) &= \{a[1, 1]^\top : a \in \mathbb{R}\}. \end{aligned} \]
Let $y = a[1, 1]^\top \in \tilde{T}(x^*)$ with $a \neq 0$. Then
\[ y^\top L(x^*, \mu^*) y = 14a^2 > 0. \]
Hence, $x^*$ is a strict local minimizer (in fact, the only one for this problem).
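The solution of part c can be verified exactly with rational arithmetic. The sketch below plugs $x^* = [9/14, 2/14]^\top$ and $\mu^* = [0, 13/14]^\top$ into the KKT conditions listed above and evaluates the curvature along $[1, 1]^\top$; the variable names are ours.

```python
from fractions import Fraction as Fr

x1, x2 = Fr(9, 14), Fr(2, 14)
mu1, mu2 = Fr(0), Fr(13, 14)

# Stationarity equations from part c.
stationarity = [
    2 * x1 + 6 * x2 - 4 + 2 * mu1 * x1 + 2 * mu2,
    6 * x1 - 2 + 2 * mu1 - 2 * mu2,
]

# Constraint values (feasible when <= 0).
g1 = x1 ** 2 + 2 * x2 - 1
g2 = 2 * x1 - 2 * x2 - 1
complementarity = mu1 * g1 + mu2 * g2

# Curvature y^T L y with L = [[2, 6], [6, 0]] and y = [1, 1].
L = [[2, 6], [6, 0]]
y = [1, 1]
curvature = sum(y[i] * L[i][j] * y[j] for i in range(2) for j in range(2))

print(stationarity, g1, g2, complementarity, curvature)
```

The first constraint is inactive and the second is active, which matches the multiplier pattern $\mu_1 = 0$, $\mu_2 > 0$.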
21.3
The Karush-Kuhn-Tucker conditions are:
\[ \begin{aligned} 2x_1 + 2\lambda x_1 + 2\lambda x_2 + 2\mu x_1 &= 0 \\ 2x_2 + 2\lambda x_1 + 2\lambda x_2 - \mu &= 0 \\ x_1^2 + 2x_1 x_2 + x_2^2 - 1 &= 0 \\ x_1^2 - x_2 &\leq 0 \\ \mu (x_1^2 - x_2) &= 0 \\ \mu &\geq 0. \end{aligned} \]
We have two cases to consider.
Case 1: ($\mu > 0$) Substituting $x_2 = x_1^2$ into the third equation and combining the result with the first two yields two possible points:
\[ \begin{aligned} x^{(1)} &= [-1.618, 2.618]^\top, & \mu^{(1)} &= -3.7889 \\ x^{(2)} &= [0.618, 0.382]^\top, & \mu^{(2)} &= -0.2111. \end{aligned} \]
Note that the resulting $\mu$ values violate the condition $\mu > 0$. Hence, neither point is a minimizer (although they are candidates for maximizers).
Case 2: ($\mu = 0$) Subtracting the second equation from the first yields $x_1 = x_2$, which upon substituting into the third equation gives two possible points:
\[ x^{(3)} = [1/2, 1/2]^\top, \qquad x^{(4)} = [-1/2, -1/2]^\top. \]
Note that $x^{(4)}$ is not a feasible point, and is therefore not a candidate minimizer.
Therefore, the only remaining candidate is $x^{(3)}$, with corresponding $\lambda^{(3)} = -1/2$ (and $\mu = 0$). We now check second-order conditions. We have
\[ \begin{aligned} L(x^{(3)}, 0, \lambda^{(3)}) &= \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \\ \tilde{T}(x^{(3)}) &= \{a[1, -1]^\top : a \in \mathbb{R}\}. \end{aligned} \]
Let $y = a[1, -1]^\top \in \tilde{T}(x^{(3)})$ with $a \neq 0$. Then
\[ y^\top L(x^{(3)}, 0, \lambda^{(3)}) y = 4a^2 > 0. \]
Therefore, by the SOSC, $x^{(3)}$ is a strict local minimizer.
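The conclusion for $x^{(3)}$ can be confirmed with exact arithmetic. The sketch below evaluates the stationarity equations, both constraints, and the curvature along $[1, -1]^\top$; the Hessian entries assume $f = x_1^2 + x_2^2$, $h = (x_1 + x_2)^2 - 1$ and $g = x_1^2 - x_2$, which is what the KKT conditions above imply.

```python
from fractions import Fraction as Fr

x = [Fr(1, 2), Fr(1, 2)]
lam, mu = Fr(-1, 2), Fr(0)

# Stationarity equations from the KKT conditions.
stationarity = [
    2 * x[0] + 2 * lam * (x[0] + x[1]) + 2 * mu * x[0],
    2 * x[1] + 2 * lam * (x[0] + x[1]) - mu,
]

h_val = (x[0] + x[1]) ** 2 - 1   # equality constraint, must be 0
g_val = x[0] ** 2 - x[1]         # inequality constraint, must be <= 0

# Hessian of the Lagrangian: 2*I + lam*[[2, 2], [2, 2]] + mu*[[2, 0], [0, 0]]
# (assumes the f, h, g named in the lead-in).
L = [
    [2 + 2 * lam + 2 * mu, 2 * lam],
    [2 * lam, 2 + 2 * lam],
]
y = [1, -1]  # tangent direction, orthogonal to grad h = 2*(x1 + x2)*[1, 1]
curvature = sum(y[i] * L[i][j] * y[j] for i in range(2) for j in range(2))

print(stationarity, h_val, g_val, L, curvature)
```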
21.4
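The optimization problem is:
\[ \begin{aligned} \text{minimize} \quad & e_n^\top p \\ \text{subject to} \quad & Gp \geq P e_m \\ & p \geq 0, \end{aligned} \]
where $G = [g_{i,j}]$, $e_n = [1, \ldots, 1]^\top$ (with $n$ components), and $p = [p_1, \ldots, p_n]^\top$. The KKT condition for this problem is:
\[ \begin{aligned} e_n^\top - \mu_1^\top G - \mu_2^\top &= 0^\top \\ \mu_1^\top (P e_m - Gp) - \mu_2^\top p &= 0 \\ Gp &\geq P e_m \\ \mu_1, \mu_2, p &\geq 0. \end{aligned} \]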
21.5
a. We have $f(x) = x_2 - (x_1 - 2)^3 + 3$ and $g(x) = 1 - x_2$. Hence, $\nabla f(x) = [-3(x_1 - 2)^2, 1]^\top$ and $\nabla g(x) = [0, -1]^\top$. The KKT condition is
\[ \begin{aligned} \mu &\geq 0 \\ -3(x_1 - 2)^2 &= 0 \\ 1 - \mu &= 0 \\ \mu (1 - x_2) &= 0 \\ 1 - x_2 &\leq 0. \end{aligned} \]
The only solution to the above conditions is $x^* = [2, 1]^\top$, $\mu^* = 1$.
To check if $x^*$ is regular, we note that the constraint is active. We have $\nabla g(x^*) = [0, -1]^\top$, which is nonzero. Hence, $x^*$ is regular.
b. We have
\[ L(x^*, \mu^*) = F(x^*) + \mu^* G(x^*) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]
Hence, the point $x^*$ satisfies the SONC.
c. Since $\mu^* > 0$, we have $\tilde{T}(x^*, \mu^*) = T(x^*) = \{y : [0, -1]y = 0\} = \{y : y_2 = 0\}$, which means that $\tilde{T}$ contains nonzero vectors. Hence, the SOSC does not hold at $x^*$.
21.6
a. Write $f(x) = x_2$, $g(x) = -(x_2 + (x_1 - 1)^2 - 3)$. We have $\nabla f(x) = [0, 1]^\top$ and $\nabla g(x) = [-2(x_1 - 1), -1]^\top$. The KKT conditions are:
\[ \begin{aligned} \mu &\geq 0 \\ -2\mu (x_1 - 1) &= 0 \\ 1 - \mu &= 0 \\ \mu (x_2 + (x_1 - 1)^2 - 3) &= 0 \\ -(x_2 + (x_1 - 1)^2 - 3) &\leq 0. \end{aligned} \]
From the third equation we get $\mu = 1$. The second equation then gives $x_1 = 1$, and the fourth equation gives $x_2 = 3$. Therefore, the only point that satisfies the KKT condition is $x^* = [1, 3]^\top$, with a KKT multiplier of $\mu^* = 1$.
b. Note that the constraint $x_2 + (x_1 - 1)^2 - 3 \geq 0$ is active at $x^*$. We have $T(x^*) = \{y : [0, -1]y = 0\} = \{y : y_2 = 0\}$, and $N(x^*) = \{y : y = [0, -1]^\top z, z \in \mathbb{R}\} = \{y : y_1 = 0\}$. Because $\mu^* > 0$, we have $\tilde{T}(x^*) = T(x^*) = \{y : y_2 = 0\}$.
c. We have
\[ L(x^*, \mu^*) = O + 1 \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix}. \]
From part b, $T(x^*) = \{y : y_2 = 0\}$. Therefore, for any $y \in T(x^*)$, $y^\top L(x^*, \mu^*) y = -2y_1^2 \leq 0$, with strict inequality whenever $y_1 \neq 0$, which means that $x^*$ does not satisfy the SONC.
21.7
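a. We need to consider two optimization problems. We first consider the minimization problem
\[ \begin{aligned} \text{minimize} \quad & (x_1 - 2)^2 + (x_2 - 1)^2 \\ \text{subject to} \quad & x_1^2 - x_2 \leq 0 \\ & x_1 + x_2 - 2 \leq 0 \\ & -x_1 \leq 0. \end{aligned} \]
Then, we form the Lagrangian function
\[ l(x, \mu) = (x_1 - 2)^2 + (x_2 - 1)^2 + \mu_1 (x_1^2 - x_2) + \mu_2 (x_1 + x_2 - 2) + \mu_3 (-x_1). \]
The KKT condition takes the form
\[ \begin{aligned} \nabla_x l(x, \mu)^\top = \begin{bmatrix} 2(x_1 - 2) + 2\mu_1 x_1 + \mu_2 - \mu_3 & 2(x_2 - 1) - \mu_1 + \mu_2 \end{bmatrix} &= 0^\top \\ \mu_1 (x_1^2 - x_2) &= 0 \\ \mu_2 (x_1 + x_2 - 2) &= 0 \\ \mu_3 (-x_1) &= 0 \\ \mu_i &\geq 0. \end{aligned} \]
The point $x^* = 0$ satisfies the equality conditions above only for $\mu_1 = -2$, $\mu_2 = 0$, and $\mu_3 = -4$, which violates $\mu_i \geq 0$. Thus the point $x^*$ does not satisfy the KKT conditions for a minimum.
We next consider the maximization problem, written as
\[ \begin{aligned} \text{minimize} \quad & -(x_1 - 2)^2 - (x_2 - 1)^2 \\ \text{subject to} \quad & x_1^2 - x_2 \leq 0 \\ & x_1 + x_2 - 2 \leq 0 \\ & -x_1 \leq 0. \end{aligned} \]
The Lagrangian function for the above problem is
\[ l(x, \mu) = -(x_1 - 2)^2 - (x_2 - 1)^2 + \mu_1 (x_1^2 - x_2) + \mu_2 (x_1 + x_2 - 2) + \mu_3 (-x_1). \]
The KKT condition takes the form
\[ \begin{aligned} \nabla_x l(x, \mu)^\top = \begin{bmatrix} -2(x_1 - 2) + 2\mu_1 x_1 + \mu_2 - \mu_3 & -2(x_2 - 1) - \mu_1 + \mu_2 \end{bmatrix} &= 0^\top \\ \mu_1 (x_1^2 - x_2) &= 0 \\ \mu_2 (x_1 + x_2 - 2) &= 0 \\ \mu_3 (-x_1) &= 0 \\ \mu_i &\geq 0. \end{aligned} \]
The point $x^* = 0$ satisfies the above conditions for $\mu_1 = 2$, $\mu_2 = 0$, and $\mu_3 = 4$. Hence, the point $x^*$ satisfies the KKT conditions for a maximum.
b. We next compute the Hessian, with respect to $x$, of the Lagrangian to obtain
\[ L = F + \mu_1^* G_1 = \begin{bmatrix} -2 & 0 \\ 0 & -2 \end{bmatrix} + \begin{bmatrix} 4 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}, \]
which is indefinite on $\mathbb{R}^2$. We next find the subspace
\[ \tilde{T} = \left\{ y : \begin{bmatrix} \nabla g_1(x^*)^\top \\ \nabla g_3(x^*)^\top \end{bmatrix} y = 0 \right\} = \left\{ y : \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} y = 0 \right\} = \{0\}. \]
That is, $\tilde{T}$ is a trivial subspace that consists only of the zero vector. Thus the SOSC for $x^*$ to be a strict local maximizer is trivially satisfied.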
21.8
a. Write $h(x) = x_1 - x_2$, $g(x) = -x_1$. We have $Df(x) = [x_2^2, 2x_1 x_2]$, $Dh(x) = [1, -1]$, $Dg(x) = [-1, 0]$.
Note that all feasible points are regular. The KKT condition is:
\[ \begin{aligned} x_2^2 + \lambda - \mu &= 0 \\ 2x_1 x_2 - \lambda &= 0 \\ \mu x_1 &= 0 \\ \mu &\geq 0 \\ x_1 - x_2 &= 0 \\ x_1 &\geq 0. \end{aligned} \]
We first try $x_1 = x_1^* = 0$ (active inequality constraint). Substituting and manipulating, we have the solution $x_1^* = x_2^* = 0$ with $\mu^* = 0$, which is a legitimate solution. If we then try $x_1 = x_1^* > 0$ (inactive inequality constraint), we find that there is no consistent solution to the KKT condition. Thus, there is only one point satisfying the KKT condition: $x^* = 0$.
c. The tangent space at $x^* = 0$ is given by
\[ T(0) = \{y : [1, -1]y = 0, \ [-1, 0]y = 0\} = \{0\}. \]
Therefore, the SONC holds for the solution in part a.
d. We have
\[ L(x, \lambda, \mu) = \begin{bmatrix} 0 & 2x_2 \\ 2x_2 & 2x_1 \end{bmatrix}. \]
Hence, at $x^* = 0$, we have $L(0, 0, 0) = O$. Since the active constraint at $x^* = 0$ is degenerate, we have
\[ \tilde{T}(0, 0) = \{y : [1, -1]y = 0\}, \]
which is nontrivial. Hence, for any nonzero vector $y \in \tilde{T}(0, 0)$, we have $y^\top L(0, 0, 0) y = 0 \not> 0$. Thus, the SOSC does not hold for the solution in part a.
21.9
a. The KKT condition for the problem is:
\[ \begin{aligned} (Ax - b)^\top A + \lambda e^\top - \mu^\top &= 0^\top \\ \mu^\top x &= 0 \\ \mu &\geq 0 \\ e^\top x - 1 &= 0 \\ x &\geq 0, \end{aligned} \]
where $e = [1, \ldots, 1]^\top$.
b. A feasible point $x^*$ is regular in this problem if the vectors $e$, $e_i$, $i \in J(x^*)$ are linearly independent, where $J(x^*) = \{i : x_i^* = 0\}$ and $e_i$ is the vector with 0 in all components except the $i$th component, which is 1.
In this problem, all feasible points are regular. To see this, note that 0 is not feasible. Therefore, any feasible point results in the set $J(x^*)$ having fewer than $n$ elements, which implies that the vectors $e$, $e_i$, $i \in J(x^*)$ are linearly independent.
21.10
By the KKT Theorem, there exists $\mu^* \geq 0$ such that
\[ \begin{aligned} (x^* - x_0) + \mu^* \nabla g(x^*) &= 0 \\ \mu^* g(x^*) &= 0. \end{aligned} \]
Premultiplying both sides of the first equation by $(x^* - x_0)^\top$, we obtain
\[ \|x^* - x_0\|^2 + \mu^* (x^* - x_0)^\top \nabla g(x^*) = 0. \]
Since $\|x^* - x_0\|^2 > 0$ (because $g(x_0) > 0$) and $\mu^* \geq 0$, we deduce that $(x^* - x_0)^\top \nabla g(x^*) < 0$ and $\mu^* > 0$. From the second KKT condition above, we conclude that $g(x^*) = 0$.
21.11
a. By inspection, we guess the point $[2, 2]^\top$ (drawing a picture may help).
b. We write $f(x) = (x_1 - 3)^2 + (x_2 - 4)^2$, $g_1(x) = -x_1$, $g_2(x) = -x_2$, $g_3(x) = x_1 - 2$, $g_4(x) = x_2 - 2$, $g = [g_1, g_2, g_3, g_4]^\top$. The problem becomes
\[ \begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0. \end{aligned} \]
We now check the SOSC for the point $x^* = [2, 2]^\top$. We have two active constraints: $g_3$, $g_4$. Regularity holds, since $\nabla g_3(x^*) = [1, 0]^\top$ and $\nabla g_4(x^*) = [0, 1]^\top$. We have $\nabla f(x^*) = [-2, -4]^\top$. We need to find a $\mu^* \in \mathbb{R}^4$, $\mu^* \geq 0$, satisfying the FONC. From the condition $\mu^{*\top} g(x^*) = 0$, we deduce that $\mu_1^* = \mu_2^* = 0$. Hence, $Df(x^*) + \mu^{*\top} Dg(x^*) = 0^\top$ if and only if $\mu^* = [0, 0, 2, 4]^\top$. Now,
\[ F(x^*) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \qquad [\mu^* G(x^*)] = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]
Hence
\[ L(x^*, \mu^*) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \]
which is positive definite on $\mathbb{R}^2$. Hence, the SOSC is satisfied, and $x^*$ is a strict local minimizer.
21.12
The KKT condition is
\[ \begin{aligned} x^\top Q + \mu^\top A &= 0^\top \\ \mu^\top (Ax - b) &= 0 \\ \mu &\geq 0 \\ Ax - b &\leq 0. \end{aligned} \]
Postmultiplying the first equation by $x$ gives
\[ x^\top Q x + \mu^\top A x = 0. \]
We note from the second equation that $\mu^\top A x = \mu^\top b$. Hence,
\[ x^\top Q x + \mu^\top b = 0. \]
Since $Q > 0$, the first term is nonnegative. Also, the second term is nonnegative because $\mu \geq 0$ and $b \geq 0$. Hence, we conclude that both terms must be zero. Because $Q > 0$, we must have $x = 0$.
Aside: Actually, we can deduce that the only solution to the KKT condition must be 0, as follows. The problem is convex; thus, the only points satisfying the KKT condition are global minimizers. However, we see that 0 is a feasible point, and is the only point for which the objective function value is 0. Further, the objective function is bounded below by 0. Hence, 0 is the only global minimizer.
21.13
a. We have one scalar equality constraint with $h(x) = [c, d]^\top x - e$ and two scalar inequality constraints with $g(x) = -x$. Hence, there exist $\mu^* \in \mathbb{R}^2$ and $\lambda^* \in \mathbb{R}$ such that
\[ \begin{aligned} \mu^* &\geq 0 \\ a + c\lambda^* - \mu_1^* &= 0 \\ b + d\lambda^* - \mu_2^* &= 0 \\ \mu^{*\top} x^* &= 0 \\ c x_1^* + d x_2^* &= e \\ x^* &\geq 0. \end{aligned} \]
b. Because $x^*$ is a basic feasible solution, and the equality constraint precludes the point 0, exactly one of the inequality constraints is active. The vectors $\nabla h(x^*) = [c, d]^\top$ and $\nabla g_1 = [1, 0]^\top$ are linearly independent. Similarly, the vectors $\nabla h(x^*) = [c, d]^\top$ and $\nabla g_2 = [0, 1]^\top$ are linearly independent. Hence, $x^*$ must be regular.
c. The tangent space is given by
\[ \begin{aligned} T(x^*) &= \{y \in \mathbb{R}^n : Dh(x^*) y = 0, \ Dg_j(x^*) y = 0, \ j \in J(x^*)\} \\ &= \mathcal{N}(M), \end{aligned} \]
where $M$ is a matrix with the first row equal to $Dh(x^*) = [c, d]$, and the second row is either $Dg_1 = [1, 0]$ or $Dg_2 = [0, 1]$. But, as we have seen in part b, $\operatorname{rank} M = 2$. Hence, $T(x^*) = \{0\}$.
d. Recall that we can take $\mu^*$ to be the relative cost coefficient vector (i.e., the KKT conditions are satisfied with $\mu^*$ being the relative cost coefficient vector). If the relative cost coefficients of all nonbasic variables are strictly positive, then $\mu_j^* > 0$ for all $j \in J(x^*)$. Hence, $\tilde{T}(x^*, \mu^*) = T(x^*) = \{0\}$, which implies that $L(x^*, \lambda^*, \mu^*) > 0$ on $\tilde{T}(x^*, \mu^*)$. Hence, the SOSC is satisfied.
21.14
Let $x^*$ be a solution. Since $A$ is of full rank, $x^*$ is regular. The KKT Theorem states that $x^*$ satisfies:
\[ \begin{aligned} \mu^* &\geq 0 \\ c^\top + \mu^{*\top} A &= 0^\top \\ \mu^{*\top} A x^* &= 0. \end{aligned} \]
If we postmultiply the second equation by $x^*$ and subtract the third from the result, we get
\[ c^\top x^* = 0. \]
21.15
a. We can write the LP as
\[ \begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & h(x) = 0, \\ & g(x) \leq 0, \end{aligned} \]
where $f(x) = c^\top x$, $h(x) = Ax - b$, and $g(x) = -x$. Thus, we have $Df(x) = c^\top$, $Dh(x) = A$, and $Dg(x) = -I$. The Karush-Kuhn-Tucker conditions for the above problem have the form: if $x^*$ is a local minimizer, then there exist $\lambda^*$ and $\mu^*$ such that
\[ \begin{aligned} \mu^* &\geq 0 \\ c^\top + \lambda^{*\top} A - \mu^{*\top} &= 0^\top \\ \mu^{*\top} x^* &= 0. \end{aligned} \]
b. Let $x^*$ be an optimal feasible solution. Then, $x^*$ satisfies the Karush-Kuhn-Tucker conditions listed in part a. Since $\mu^* \geq 0$, then from the second condition in part a, we obtain $(-\lambda^*)^\top A \leq c^\top$. Hence, $\bar{\lambda} = -\lambda^*$ is a feasible solution to the dual (see Chapter 17). Postmultiplying the second condition in part a by $x^*$, we have
\[ 0 = c^\top x^* + \lambda^{*\top} A x^* - \mu^{*\top} x^* = c^\top x^* + \lambda^{*\top} b, \]
which gives
\[ c^\top x^* = \bar{\lambda}^\top b. \]
Hence, $\bar{\lambda}$ achieves the same objective function value for the dual as $x^*$ for the primal.
c. From part a, we have $\mu^{*\top} = c^\top - \bar{\lambda}^\top A$. Substituting this into $\mu^{*\top} x^* = 0$ yields the desired result.
21.16
By definition of $J(x^*)$, we have $g_i(x^*) < 0$ for all $i \notin J(x^*)$. Since by assumption $g_i$ is continuous for all $i$, there exists $\varepsilon > 0$ such that $g_i(x) < 0$ for all $i \notin J(x^*)$ and all $x$ in the set $B = \{x : \|x - x^*\| < \varepsilon\}$. Let $S_1 = \{x : h(x) = 0, \ g_j(x) \leq 0, \ j \in J(x^*)\}$. We claim that $S \cap B = S_1 \cap B$. To see this, note that clearly $S \cap B \subset S_1 \cap B$. To show that $S_1 \cap B \subset S \cap B$, suppose $x \in S_1 \cap B$. Then, by definition of $S_1$ and $B$, we have $h(x) = 0$, $g_j(x) \leq 0$ for all $j \in J(x^*)$, and $g_i(x) < 0$ for all $i \notin J(x^*)$. Hence, $x \in S \cap B$.
Since $x^*$ is a local minimizer of $f$ over $S$, and $S \cap B \subset S$, $x^*$ is also a local minimizer of $f$ over $S \cap B = S_1 \cap B$. Hence, we conclude that $x^*$ is a regular local minimizer of $f$ on $S_1$. Note that $S' \subset S_1$, and $x^* \in S'$. Therefore, $x^*$ is a regular local minimizer of $f$ on $S'$.
21.17
Write $f(x) = x_1^2 + x_2^2$, $g_1(x) = x_1^2 - x_2 - 4$, $g_2(x) = x_2 - x_1 - 2$, and $g = [g_1, g_2]^\top$. We have $\nabla f(x) = [2x_1, 2x_2]^\top$, $\nabla g_1(x) = [2x_1, -1]^\top$, $\nabla g_2(x) = [-1, 1]^\top$, and $D^2 f(x) = \operatorname{diag}[2, 2]$. We compute
\[ \nabla f(x) + \mu^\top \nabla g(x) = [2x_1 + 2\mu_1 x_1 - \mu_2, \ 2x_2 - \mu_1 + \mu_2]^\top. \]
We use the FONC to find critical points. Rewriting $\nabla f(x) + \mu^\top \nabla g(x) = 0$, we obtain
\[ x_1 = \frac{\mu_2}{2 + 2\mu_1}, \qquad x_2 = \frac{\mu_1 - \mu_2}{2}. \]
We also use $\mu^\top g(x) = 0$ and $\mu \geq 0$, giving
\[ \mu_1 (x_1^2 - x_2 - 4) = 0, \qquad \mu_2 (x_2 - x_1 - 2) = 0. \]
The vector $\mu$ has two components; therefore, we try four different cases.
Case 1: ($\mu_1 > 0$, $\mu_2 > 0$) We have
\[ x_1^2 - x_2 - 4 = 0, \qquad x_2 - x_1 - 2 = 0. \]
We obtain two solutions: $x^{(1)} = [-2, 0]^\top$ and $x^{(2)} = [3, 5]^\top$. For $x^{(1)}$, the two FONC equations give $\mu_1 = \mu_2$ and $-2(2 + 2\mu_1) = \mu_1$, which yield $\mu_1 = \mu_2 = -4/5$. This is not a legitimate solution since we require $\mu \geq 0$. For $x^{(2)}$, the two FONC equations give $\mu_1 - \mu_2 = 10$ and $3(2 + 2\mu_1) = \mu_2$, which yield $\mu = [-16/5, -66/5]^\top$. Again, this is not a legitimate solution.
Case 2: ($\mu_1 = 0$, $\mu_2 > 0$) We have
\[ x_2 - x_1 - 2 = 0, \qquad x_1 = \frac{\mu_2}{2}, \qquad x_2 = -\frac{\mu_2}{2}. \]
Hence, $x_1 = -x_2$, and thus $x = [-1, 1]^\top$, $\mu_2 = -2$. This is not a legitimate solution since we require $\mu \geq 0$.
Case 3: ($\mu_1 > 0$, $\mu_2 = 0$) We have
\[ x_1^2 - x_2 - 4 = 0, \qquad x_1 = 0, \qquad x_2 = \frac{\mu_1}{2}. \]
Therefore, $x_2 = -4$, $\mu_1 = -8$, and again we don't have a legitimate solution.
Case 4: ($\mu_1 = 0$, $\mu_2 = 0$) We have $x_1 = x_2 = 0$, and all constraints are inactive. This is a legitimate candidate for the minimizer. We now apply the SOSC. Note that since the candidate is an interior point of the constraint set, the SOSC for the problem is equivalent to the SOSC for unconstrained optimization. The Hessian matrix $D^2 f(x) = \operatorname{diag}[2, 2]$ is symmetric and positive definite. Hence, by the SOSC, the point $x^* = [0, 0]^\top$ is the strict local minimizer (in fact, it is easy to see that it is a global minimizer).
21.18
Write $f(x) = x_1^2 + x_2^2$, $g_1(x) = -x_1 + x_2^2 + 4$, $g_2(x) = x_1 - 10$, and $g = [g_1, g_2]^\top$. We have $\nabla f(x) = [2x_1, 2x_2]^\top$, $\nabla g_1(x) = [-1, 2x_2]^\top$, $\nabla g_2(x) = [1, 0]^\top$, $D^2 f(x) = \operatorname{diag}[2, 2]$, $D^2 g_1(x) = \operatorname{diag}[0, 2]$, and $D^2 g_2(x) = O$. We compute
\[ \nabla f(x) + \mu^\top \nabla g(x) = [2x_1 - \mu_1 + \mu_2, \ 2x_2 + 2\mu_1 x_2]^\top. \]
We use the FONC to find critical points. Rewriting $\nabla f(x) + \mu^\top \nabla g(x) = 0$, we obtain
\[ x_1 = \frac{\mu_1 - \mu_2}{2}, \qquad x_2 (1 + \mu_1) = 0. \]
Since we require $\mu \geq 0$, we deduce that $x_2 = 0$. Using $\mu^\top g(x) = 0$ gives
\[ \mu_1 (-x_1 + 4) = 0, \qquad \mu_2 = 0. \]
We are left with two cases.
Case 1: ($\mu_1 > 0$, $\mu_2 = 0$) We have $-x_1 + 4 = 0$, and $\mu_1 = 8$, which is a legitimate candidate.
Case 2: ($\mu_1 = 0$, $\mu_2 = 0$) We have $x_1 = x_2 = 0$, which is not a legitimate candidate, since it is not a feasible point.
We now apply the SOSC to our candidate $x = [4, 0]^\top$, $\mu = [8, 0]^\top$. Now,
\[ L([4, 0]^\top, [8, 0]^\top) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} + 8 \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 18 \end{bmatrix}, \]
which is positive definite on all of $\mathbb{R}^2$. The point $[4, 0]^\top$ is clearly regular. Hence, by the SOSC, $x^* = [4, 0]^\top$ is a strict local minimizer.
21.19
Write $f(x) = x_1^2 + x_2^2$, $g_1(x) = -x_1 - x_2^2 + 4$, $g_2(x) = 3x_2 - x_1$, $g_3(x) = -3x_2 - x_1$, and $g = [g_1, g_2, g_3]^\top$. We have $\nabla f(x) = [2x_1, 2x_2]^\top$, $\nabla g_1(x) = [-1, -2x_2]^\top$, $\nabla g_2(x) = [-1, 3]^\top$, $\nabla g_3(x) = [-1, -3]^\top$, $D^2 f(x) = \operatorname{diag}[2, 2]$, $D^2 g_1(x) = \operatorname{diag}[0, -2]$, and $D^2 g_2(x) = D^2 g_3(x) = O$.
From the figure, we see that the two candidates are $x^{(1)} = [3, 1]^\top$ and $x^{(2)} = [3, -1]^\top$. Both points are easily verified to be regular.
For $x^{(1)}$, we have $\mu_3 = 0$. Now,
\[ Df(x^{(1)}) + \mu^\top Dg(x^{(1)}) = [6 - \mu_1 - \mu_2, \ 2 - 2\mu_1 + 3\mu_2] = 0^\top, \]
which yields $\mu_1 = 4$, $\mu_2 = 2$. Now, $\tilde{T}(x^{(1)}) = \{0\}$. Therefore, any matrix is positive definite on $\tilde{T}(x^{(1)})$. Hence, by the SOSC, $x^{(1)}$ is a strict local minimizer.
For $x^{(2)}$, we have $\mu_2 = 0$. Now,
\[ Df(x^{(2)}) + \mu^\top Dg(x^{(2)}) = [6 - \mu_1 - \mu_3, \ -2 + 2\mu_1 - 3\mu_3] = 0^\top, \]
which yields $\mu_1 = 4$, $\mu_3 = 2$. Now, again we have $\tilde{T}(x^{(2)}) = \{0\}$. Therefore, any matrix is positive definite on $\tilde{T}(x^{(2)})$. Hence, by the SOSC, $x^{(2)}$ is a strict local minimizer.
21.20
a. Write $f(x) = 3x_1$ and $g(x) = 2 - x_1 - x_2^2$. We have $\nabla f(x^*) = [3, 0]^\top$ and $\nabla g(x^*) = [-1, 0]^\top$. Hence, letting $\mu^* = 3$, we have $\nabla f(x^*) + \mu^* \nabla g(x^*) = 0$. Note also that $\mu^* \geq 0$ and $\mu^* g(x^*) = 0$. Hence, $x^* = [2, 0]^\top$ satisfies the KKT (first-order necessary) condition.
b. We have $F(x^*) = O$ and $G(x^*) = \operatorname{diag}[0, -2]$. Hence, $L(x^*, \mu^*) = O + 3 \operatorname{diag}[0, -2] = \operatorname{diag}[0, -6]$. Also, $T(x^*) = \{y : [-1, 0]y = 0\} = \{y : y_1 = 0\}$. For any nonzero $y \in T(x^*)$ we get $y^\top L(x^*, \mu^*) y = -6y_2^2 < 0$. Hence, $x^* = [2, 0]^\top$ does not satisfy the second-order necessary condition.
c. No. Consider points of the form $x = [-x_2^2 + 2, x_2]^\top$, $x_2 \in \mathbb{R}$. Such points are feasible, and could be arbitrarily close to $x^*$. However, for such points $x \neq x^*$,
\[ f(x) = 3(-x_2^2 + 2) = 6 - 3x_2^2 < 6 = f(x^*). \]
Hence, $x^*$ is not a local minimizer.
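The argument in part c is easy to see numerically: moving along the constraint boundary away from $x^*$ keeps the point feasible and lowers $f$. A minimal sketch, with the step sizes chosen arbitrarily for illustration:

```python
def f(x):
    """Objective 3*x1."""
    return 3.0 * x[0]

def g(x):
    """Constraint function 2 - x1 - x2^2 (feasible when g <= 0)."""
    return 2.0 - x[0] - x[1] ** 2

def boundary_point(t):
    """Point [2 - t^2, t] on the curve g(x) = 0, as used in part c."""
    return [2.0 - t * t, t]

x_star = [2.0, 0.0]
steps = [0.1, 0.01, 0.001]  # arbitrary small displacements along the boundary

# For each step: (constraint value, decrease in f relative to f(x*)).
samples = [(g(boundary_point(t)), f(x_star) - f(boundary_point(t))) for t in steps]
for t, (g_val, decrease) in zip(steps, samples):
    print(t, g_val, decrease)
```

Every sampled point has a strictly smaller objective value than $f(x^*) = 6$, however small the step.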
21.21
The KKT condition for the problem is
\[ \begin{aligned} \mu^* &\geq 0 \\ x^* + \lambda^* a - \mu^* &= 0 \\ \mu^{*\top} x^* &= 0. \end{aligned} \]
Premultiplying the second KKT condition above by $\mu^{*\top}$ and using the third condition, we get
\[ \lambda^* \mu^{*\top} a = \|\mu^*\|^2. \]
Also, premultiplying the second KKT condition above by $x^{*\top}$ and using the feasibility condition $a^\top x^* = b$, we get
\[ \lambda^* = -\frac{\|x^*\|^2}{b} < 0. \]
We conclude that $\mu^* = 0$. For if not, the equation $\lambda^* \mu^{*\top} a = \|\mu^*\|^2$ implies that $\mu^{*\top} a < 0$, which contradicts $\mu^* \geq 0$ and $a \geq 0$.
Rewriting the second KKT condition with $\mu^* = 0$ yields
\[ x^* = -\lambda^* a. \]
Using the feasibility condition $a^\top x^* = b$, we get
\[ x^* = a \frac{b}{\|a\|^2}. \]
21.22
a. Suppose $(x_1^*)^2 + (x_2^*)^2 < 1$. Then, the point $x^* = [x_1^*, x_2^*]^\top$ lies in the interior of the constraint set $\Omega = \{x : \|x\|^2 \leq 1\}$. Hence, by the FONC for unconstrained optimization, we have that $\nabla f(x^*) = 0$, where $f(x) = \|x - [a, b]^\top\|^2$ is the objective function. Now, $\nabla f(x^*) = 2(x^* - [a, b]^\top) = 0$, which implies that $x^* = [a, b]^\top$, which violates the assumption $(x_1^*)^2 + (x_2^*)^2 < 1$.
b. First, we need to show that $x^*$ is a regular point. For this, note that if we write the constraint as $g(x) = \|x\|^2 - 1 \leq 0$, then $\nabla g(x^*) = 2x^* \neq 0$. Therefore, $x^*$ is a regular point. Hence, by the Karush-Kuhn-Tucker theorem, there exists $\mu \in \mathbb{R}$, $\mu \geq 0$, such that
\[ \nabla f(x^*) + \mu \nabla g(x^*) = 0, \]
which gives
\[ x^* = \frac{1}{1 + \mu} \begin{bmatrix} a \\ b \end{bmatrix}. \]
Hence, $x^*$ is unique, and we can write $x_1^* = \alpha a$, $x_2^* = \alpha b$, where $\alpha = 1/(1 + \mu) \geq 0$.
c. Using part b and the fact that $\|x^*\| = 1$, we get $\|x^*\|^2 = \alpha^2 \|[a, b]^\top\|^2 = 1$, which gives $\alpha = 1/\|[a, b]^\top\| = 1/\sqrt{a^2 + b^2}$.
21.23
a. The Karush-Kuhn-Tucker conditions for this problem are
\[ \begin{aligned} 2x_1^* + \mu^* \exp(x_1^*) &= 0 \\ 2(x_2^* + 1) - \mu^* &= 0 \\ \mu^* (\exp(x_1^*) - x_2^*) &= 0 \\ \exp(x_1^*) &\leq x_2^* \\ \mu^* &\geq 0. \end{aligned} \]
b. From the second equation in part a, we obtain $\mu^* = 2(x_2^* + 1)$. Since $x_2^* \geq \exp(x_1^*) > 0$, then $\mu^* > 0$. Hence, by the third equation in part a, we obtain $x_2^* = \exp(x_1^*)$.
c. Since $\mu^* = 2(x_2^* + 1) = 2(\exp(x_1^*) + 1)$, then by the first equation in part a, we have
\[ 2x_1^* + 2(\exp(x_1^*) + 1) \exp(x_1^*) = 0, \]
which implies
\[ x_1^* = -(\exp(2x_1^*) + \exp(x_1^*)). \]
Since $\exp(x_1^*), \exp(2x_1^*) > 0$, then $x_1^* < 0$, and hence $\exp(x_1^*), \exp(2x_1^*) < 1$. Therefore, $x_1^* > -2$.
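The bound $-2 < x_1^* < 0$ from part c brackets the root of $x_1 + \exp(2x_1) + \exp(x_1) = 0$, so the point can be located numerically. The sketch below uses plain bisection, which is our choice of method (the solution itself only establishes the bracket); the left-hand side is increasing in $x_1$, so the root in the bracket is unique.

```python
import math

def residual(x1):
    """x1 + exp(2*x1) + exp(x1); zero exactly when the equation in part c holds."""
    return x1 + math.exp(2.0 * x1) + math.exp(x1)

def bisect(lo, hi, tol=1e-12):
    """Bisection on an increasing function with residual(lo) < 0 < residual(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

x1 = bisect(-2.0, 0.0)    # bracket from part c
x2 = math.exp(x1)         # active constraint, part b
mu = 2.0 * (x2 + 1.0)     # second KKT equation

first_kkt = 2.0 * x1 + mu * math.exp(x1)  # first KKT equation, should be ~0
print(x1, x2, mu, first_kkt)
```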
21.24
a. We rewrite the problem as
\[ \begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0, \end{aligned} \]
where $f(x) = c^\top x$ and $g(x) = \frac{1}{2} \|x\|^2 - 1$. Hence, $\nabla f(x) = c$ and $\nabla g(x) = x$. Note that $x^* \neq 0$ (for otherwise it would not be feasible), and therefore it is a regular point. By the KKT theorem, there exists $\mu^* \leq 0$ such that $c = \mu^* x^*$ and $\mu^* g(x^*) = 0$. Since $c \neq 0$, we must have $\mu^* \neq 0$. Therefore, $g(x^*) = 0$, which implies that $\|x^*\|^2 = 2$.
b. From part a, we have $\alpha^2 \|e\|^2 = 2$. Since $\|e\|^2 = n$, we have $\alpha = \sqrt{2/n}$.
To find $c$, we use
\[ 4 = c^\top x^* + 8 = \mu^* \|x^*\|^2 + 8 = 2\mu^* + 8, \]
and thus $\mu^* = -2$. Hence, $c = -2\alpha e = -2\sqrt{2/n}\, e$.
21.25
We can represent the equivalent problem as
\[ \begin{aligned} \text{minimize} \quad & f(x) \\ \text{subject to} \quad & g(x) \leq 0, \end{aligned} \]
where $g(x) = \frac{1}{2} \|h(x)\|^2$. Note that
\[ \nabla g(x) = Dh(x)^\top h(x). \]
Therefore, the KKT condition is:
\[ \begin{aligned} \mu^* &\geq 0 \\ \nabla f(x^*) + \mu^* Dh(x^*)^\top h(x^*) &= 0 \\ \mu^* \|h(x^*)\| &= 0. \end{aligned} \]
Note that for a feasible point $x^*$, we have $h(x^*) = 0$. Therefore, the KKT condition becomes
\[ \begin{aligned} \mu^* &\geq 0 \\ \nabla f(x^*) &= 0. \end{aligned} \]
Note that $\nabla g(x^*) = 0$. Therefore, any feasible point $x^*$ is not regular. Hence, the KKT theorem cannot be applied in this case. This should be clear, since obviously $\nabla f(x^*) = 0$ is not necessary for optimality in general.
22. Convex Optimization Problems
22.1
The given function is a quadratic, which we represent in the form
$$f = x^\top \begin{bmatrix} -1 & -\alpha & 1 \\ -\alpha & -1 & -2 \\ 1 & -2 & -5 \end{bmatrix} x.$$
A quadratic form is concave if and only if its matrix is negative semidefinite or, equivalently, if and only if the matrix of its negative is positive semidefinite. On the other hand, a symmetric matrix is positive semidefinite if and only if all its principal minors, not just the leading principal minors, are nonnegative. Thus we will determine the range of the parameter $\alpha$ for which
$$-f = x^\top F x, \qquad F = \begin{bmatrix} 1 & \alpha & -1 \\ \alpha & 1 & 2 \\ -1 & 2 & 5 \end{bmatrix},$$
is positive semidefinite. It is easy to see that the three first-order principal minors (diagonal elements of $F$) are all positive. There are three second-order principal minors. Only one of them, the leading principal minor, is a function of the parameter $\alpha$:
$$\det F(1{:}2, 1{:}2) = \det \begin{bmatrix} 1 & \alpha \\ \alpha & 1 \end{bmatrix} = 1 - \alpha^2.$$
The above second-order leading principal minor is nonnegative if and only if
$$\alpha \in [-1, 1].$$
The other second-order principal minors are
$$\det F([1\ 3], [1\ 3]) = 4 \quad \text{and} \quad \det F(2{:}3, 2{:}3) = 1,$$
and they are positive. There is only one third-order principal minor, $\det F$, where
$$\begin{aligned}
\det F &= \det \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} - \alpha \det \begin{bmatrix} \alpha & -1 \\ 2 & 5 \end{bmatrix} - \det \begin{bmatrix} \alpha & -1 \\ 1 & 2 \end{bmatrix} \\
&= 1 - \alpha(5\alpha + 2) - (2\alpha + 1) \\
&= 1 - 5\alpha^2 - 2\alpha - 2\alpha - 1 \\
&= -5\alpha^2 - 4\alpha.
\end{aligned}$$
The third-order principal minor is nonnegative if and only if $-\alpha(5\alpha + 4) \ge 0$, that is, if and only if
$$\alpha \in [-4/5, 0].$$
Combining this with $\alpha \in [-1, 1]$ from above, we conclude that the matrix of $f$ is negative semidefinite, or equivalently, the quadratic function $f$ is concave, if and only if
$$\alpha \in [-4/5, 0].$$
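The principal-minor test used in 22.1 can be checked numerically. The following is a minimal sketch in Python (standard library only); the function names `det`, `principal_minors`, `neg_f_matrix`, and `is_concave` are illustrative and not part of the original solution, and the tolerance `tol` is an assumption added to absorb rounding error.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (adequate for 3x3)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def principal_minors(m):
    """All principal minors: same index set chosen for rows and columns."""
    n = len(m)
    values = []
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[m[i][j] for j in idx] for i in idx]
            values.append(det(sub))
    return values

def neg_f_matrix(alpha):
    """Matrix F of the quadratic form -f from Exercise 22.1."""
    return [[1.0, alpha, -1.0], [alpha, 1.0, 2.0], [-1.0, 2.0, 5.0]]

def is_concave(alpha, tol=1e-12):
    """f is concave iff every principal minor of F is nonnegative."""
    return all(v >= -tol for v in principal_minors(neg_f_matrix(alpha)))

for alpha in (-0.9, -0.4, 0.0, 0.5):
    print(alpha, is_concave(alpha))
```

Values of $\alpha$ inside $[-4/5, 0]$ should report `True`, and values outside should report `False`.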
22.2
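We have
$$\begin{aligned}
\phi(\alpha) &= \frac{1}{2}(x + \alpha d)^\top Q (x + \alpha d) - (x + \alpha d)^\top b \\
&= \frac{1}{2}(d^\top Q d)\alpha^2 + d^\top (Qx - b)\alpha + \left( \frac{1}{2} x^\top Q x - x^\top b \right).
\end{aligned}$$
This is a quadratic function of $\alpha$. Since $Q > 0$, then
$$\frac{d^2 \phi}{d\alpha^2}(\alpha) = d^\top Q d > 0$$
and hence by Theorem 22.5, $\phi$ is strictly convex.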
22.3
Write $f(x) = x^\top Q x$, where
$$Q = \frac{1}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
Let x, y ∈ Ω. Then, x = [a1 , ma1 ]> and y = [a2 , ma2 ]> for some a1 , a2 ∈ R. By Proposition 22.1, it is
enough to show that (y − x)> Q(y − x) ≥ 0. By substitution,
(y − x)> Q(y − x) = m(a2 − a1 )2 ≥ 0,
which completes the proof.
22.4
Let x, y ∈ Ω and α ∈ (0, 1). Then, h(x) = h(y) = c. By convexity of Ω, h(αx + (1 − α)y) = c. Therefore,
h(αx + (1 − α)y) = αh(x) + (1 − α)h(y)
and so h is convex over Ω. We also have
−h(αx + (1 − α)y) = α(−h(x)) + (1 − α)(−h(y)),
which shows that −h is convex, and thus h is concave.
22.5
At x = 0, for ξ ∈ [−1, 1], we have, for all y ∈ R,
|y| ≥ |0| + ξ(y − 0) = ξy.
Thus, in this case any ξ in the interval [−1, 1] is a subgradient of f at x = 0.
At x = 1, ξ = 1 is the only subgradient of f , because, for all y ∈ R,
|y| ≥ 1 + ξ(y − 1) = y.
22.6
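Let $x, y \in \Omega$ and $\alpha \in (0, 1)$. For convenience, write $\bar{f} = \max\{f_1, \ldots, f_\ell\} = \max_i f_i$. We have
$$\begin{aligned}
\bar{f}(\alpha x + (1 - \alpha) y) &= \max_i f_i(\alpha x + (1 - \alpha) y) \\
&\le \max_i \left( \alpha f_i(x) + (1 - \alpha) f_i(y) \right) && \text{by convexity of each } f_i \\
&\le \alpha \max_i f_i(x) + (1 - \alpha) \max_i f_i(y) && \text{by property of max} \\
&= \alpha \bar{f}(x) + (1 - \alpha) \bar{f}(y),
\end{aligned}$$
which implies that $\bar{f}$ is convex.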
22.7
⇒: This is true by definition.
⇐: Let $d \in \mathbb{R}^n$ be given. We want to show that $d^\top Q d \ge 0$. Now, fix some vector $x \in \Omega$. Since $\Omega$ is open, there exists $\alpha \ne 0$ such that $y = x - \alpha d \in \Omega$. By assumption,
$$0 \le (y - x)^\top Q (y - x) = \alpha^2 d^\top Q d,$$
which implies that $d^\top Q d \ge 0$.
22.8
Yes, the problem is a convex optimization problem.
First we show that the objective function $f(x) = \frac{1}{2}\|Ax - b\|^2$ is convex. We write
$$f(x) = \frac{1}{2} x^\top (A^\top A) x - (b^\top A) x + \text{constant},$$
which is a quadratic function with Hessian $A^\top A$. Since the Hessian $A^\top A$ is positive semidefinite, the objective function $f$ is convex.
Next we show that the constraint set is convex. Consider two feasible points x and y, and let λ ∈ (0, 1).
Then, x and y satisfy e> x = 1, x ≥ 0 and e> y = 1, y ≥ 0, respectively. We have
e> (λx + (1 − λ)y) = λe> x + (1 − λ)e> y = λ + (1 − λ) = 1.
Moreover, each component of λx + (1 − λ)y is given by λxi + (1 − λ)yi , which is nonnegative because every
term here is nonnegative. Hence, λx + (1 − λ)y is a feasible point, which shows that the constraint set is
convex.
22.9
We need to show that Ω is a convex set, and f is a convex function on Ω.
To show that Ω is a convex set, we need to show that for any y, z ∈ Ω and α ∈ (0, 1), we have αy+(1−α)z ∈
Ω. Let y, z ∈ Ω and α ∈ (0, 1). Thus, y1 = y2 ≥ 0 and z1 = z2 ≥ 0. Hence,
"
#
αy1 + (1 − α)z1
x = αy + (1 − α)z =
αy2 + (1 − α)z2
Now,
x1 = αy1 + (1 − α)z1 = αy2 + (1 − α)z2 = x2 ,
and since α, 1 − α ≥ 0,
x1 ≥ 0.
Hence, x ∈ Ω and therefore Ω is convex.
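To show that $f$ is convex on $\Omega$, we need to show that for any $y, z \in \Omega$ and $\alpha \in [0, 1]$, $f(\alpha y + (1 - \alpha) z) \le \alpha f(y) + (1 - \alpha) f(z)$. Let $y, z \in \Omega$ and $\alpha \in [0, 1]$. Thus, $y_1 = y_2 \ge 0$ and $z_1 = z_2 \ge 0$, so that $f(y) = y_1^3$ and $f(z) = z_1^3$. We have
$$\begin{aligned}
f(\alpha y + (1 - \alpha) z) &= (\alpha y_1 + (1 - \alpha) z_1)^3 \\
&= \alpha^3 y_1^3 + (1 - \alpha)^3 z_1^3 + 3\alpha^2 (1 - \alpha) y_1^2 z_1 + 3\alpha (1 - \alpha)^2 y_1 z_1^2.
\end{aligned}$$
Subtracting $\alpha y_1^3 + (1 - \alpha) z_1^3$ and factoring, we obtain
$$f(\alpha y + (1 - \alpha) z) - \alpha f(y) - (1 - \alpha) f(z) = -\alpha (1 - \alpha)(y_1 - z_1)^2 \left( (1 + \alpha) y_1 + (2 - \alpha) z_1 \right) \le 0,$$
because $\alpha$, $1 - \alpha$, $y_1$, and $z_1$ are all nonnegative. Therefore,
$$f(\alpha y + (1 - \alpha) z) \le \alpha f(y) + (1 - \alpha) f(z).$$
Hence, $f$ is convex.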
22.10
Since the problem is a convex optimization problem, we know for sure that any point of the form αy+(1−α)z,
α ∈ (0, 1), is a global minimizer. However, any other point may or may not be a minimizer. Hence, the
largest set of points G ⊂ Ω for which we can be sure that every point in G is a global minimizer, is given by
G = {αy + (1 − α)z : 0 ≤ α ≤ 1}.
22.11
a. Let f be the objective function and Ω the constraint set. Consider the set Γ = {x ∈ Ω : f (x) ≤ 1}.
This set contains all three of the given points. Moreover, by Lemma 22.1, Γ is convex. Now, if we take the
average of the first two points (which is a convex combination of them), the resulting point (1/2)[1, 0, 0]> +
(1/2)[0, 1, 0]> = (1/2)[1, 1, 0]> is in Γ, because Γ is convex. Similarly, the point (2/3)(1/2)[1, 1, 0]> +
(1/3)[0, 0, 1]> = (1/3)[1, 1, 1]> is also in Γ, because Γ is convex. Hence, the objective function value of
(1/3)[1, 1, 1]> must be ≤ 1.
b. If the three points are all global minimizers, then the point $(1/3)[1, 1, 1]^\top$, which cannot have a higher objective function value than the given three points (by part a), must also be a global minimizer.
22.12
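a. The Lagrange condition for the problem is given by:
$$\begin{aligned}
x^\top Q + \lambda^\top A &= 0^\top \\
Ax &= b.
\end{aligned}$$
From the first equation above, we obtain
$$x = -Q^{-1} A^\top \lambda.$$
Applying the second equation (constraint on $x$), we have
$$-(A Q^{-1} A^\top)\lambda = b.$$
Since $\operatorname{rank} A = m$, the matrix $A Q^{-1} A^\top$ is invertible, so $\lambda = -(A Q^{-1} A^\top)^{-1} b$. Therefore, the only solution to the Lagrange condition is
$$x = Q^{-1} A^\top (A Q^{-1} A^\top)^{-1} b.$$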
b. The point in part a above is a global minimizer because the problem is a convex optimization problem (by
problem 1, the constraint set is convex; the objective function is convex because its Hessian, Q, is positive
definite).
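As a small numerical illustration of the closed form $x = Q^{-1} A^\top (A Q^{-1} A^\top)^{-1} b$ in 22.12, the sketch below treats the special case of a diagonal $Q$ and a single constraint row $a^\top x = b$, so that $A Q^{-1} A^\top$ is a scalar. The restriction to this special case, the function name `min_quadratic_one_constraint`, and the sample data are assumptions made only to keep the example self-contained.

```python
def min_quadratic_one_constraint(q_diag, a, b):
    """Closed-form minimizer of (1/2) x^T Q x subject to a^T x = b.

    Assumes Q = diag(q_diag) with positive entries and a single constraint,
    so A Q^{-1} A^T reduces to the scalar sum(a_i^2 / q_i).
    """
    qinv_a = [ai / qi for ai, qi in zip(a, q_diag)]      # Q^{-1} A^T
    scale = b / sum(ai * v for ai, v in zip(a, qinv_a))  # (A Q^{-1} A^T)^{-1} b
    return [scale * v for v in qinv_a]

# Sample data (hypothetical): Q = diag(1, 2), constraint x1 + x2 = 3.
x = min_quadratic_one_constraint([1.0, 2.0], [1.0, 1.0], 3.0)
print(x)
```

For this sample data the constraint is satisfied and $Qx$ is a multiple of $a$, which is exactly the Lagrange condition.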
22.13
By Theorem 22.4, for all $x \in \Omega$, we have
$$f(x) \ge f(x^*) + Df(x^*)(x - x^*).$$
Substituting $Df(x^*)$ from the equation $Df(x^*) + \sum_{j \in J(x^*)} \mu_j^* a_j^\top = 0^\top$ into the above inequality yields
$$f(x) \ge f(x^*) - \sum_{j \in J(x^*)} \mu_j^* a_j^\top (x - x^*).$$
Observe that for each $j \in J(x^*)$,
$$a_j^\top x^* + b_j = 0,$$
and for each $x \in \Omega$,
$$a_j^\top x + b_j \ge 0.$$
Hence, for each $j \in J(x^*)$,
$$a_j^\top (x - x^*) \ge 0.$$
Since $\mu_j^* \le 0$, we get
$$f(x) \ge f(x^*) - \sum_{j \in J(x^*)} \mu_j^* a_j^\top (x - x^*) \ge f(x^*),$$
and the proof is completed.
22.14
a. Let Ω = {x ∈ Rn : a> x ≥ b}, x1 , x2 ∈ Ω, and λ ∈ [0, 1]. Then, a> x1 ≥ b and a> x2 ≥ b. Therefore,
a> (λx1 + (1 − λ)x2 )
= λa> x1 + (1 − λ)a> x2
≥ λb + (1 − λ)b
= b
which means that λx1 + (1 − λ)x2 ∈ Ω. Hence, Ω is a convex set.
b. Rewrite the problem as
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & g(x) \le 0, \end{array}$$
where $f(x) = \|x\|^2$ and $g(x) = b - a^\top x$. Now, $\nabla g(x) = -a \ne 0$. Therefore, any feasible point is regular. By the Karush-Kuhn-Tucker theorem, there exists $\mu^* \ge 0$ such that
$$\begin{aligned}
2x^* - \mu^* a &= 0 \\
\mu^* (b - a^\top x^*) &= 0.
\end{aligned}$$
Since $x^*$ is a feasible point, then $x^* \ne 0$. Therefore, by the first equation, we see that $\mu^* \ne 0$. The second equation then implies that $b - a^\top x^* = 0$.
c. By the first Karush-Kuhn-Tucker equation, we have $x^* = \mu^* a/2$. Since $a^\top x^* = b$, then $\mu^* a^\top a/2 = a^\top x^* = b$, and therefore $\mu^* = 2b/\|a\|^2$. Since $x^* = \mu^* a/2$, then $x^*$ is uniquely given by $x^* = b a/\|a\|^2$.
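The formulas $x^* = b a/\|a\|^2$ and $\mu^* = 2b/\|a\|^2$ from 22.14 can be checked on a concrete case. This is a minimal sketch; the function name `min_norm_point` and the sample values $a = [3, 4]^\top$, $b = 5$ are hypothetical choices, not taken from the exercise.

```python
def min_norm_point(a, b):
    """Return (x_star, mu_star) for: minimize ||x||^2 subject to a^T x >= b.

    Uses the closed form from part c; assumes b > 0 so the constraint is active.
    """
    norm_sq = sum(ai * ai for ai in a)
    x_star = [b * ai / norm_sq for ai in a]
    mu_star = 2.0 * b / norm_sq
    return x_star, mu_star

a, b = [3.0, 4.0], 5.0
x_star, mu_star = min_norm_point(a, b)
# First KKT equation: 2 x* - mu* a = 0, checked componentwise.
residual = [2.0 * xi - mu_star * ai for xi, ai in zip(x_star, a)]
print(x_star, mu_star, residual)
```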
22.15
a. Let f (x) = c> x and Ω = {x : x ≥ 0}. Suppose x, y ∈ Ω, and α ∈ (0, 1). Then, x, y ≥ 0. Hence,
αx + (1 − α)y ≥ 0, which means αx + (1 − α)y ∈ Ω. Furthermore,
c> (αx + (1 − α)y) = αc> x + (1 − α)c> y.
Therefore, f is convex. Hence, the problem is a convex programming problem.
b. ⇒: We use contraposition. Suppose ci < 0 for some i. Let d = [0, . . . , 1, . . . , 0], where 1 appears in the
ith component. Clearly d is a feasible direction for any point x∗ ≥ 0. However, d> ∇f (x∗ ) = d> c = ci < 0.
Therefore, the FONC does not hold, and any point x∗ ≥ 0 cannot be a minimizer.
⇐: Suppose c ≥ 0. Let x∗ = 0, and d a feasible direction at x∗ . Then, d ≥ 0. Hence, d> ∇f (x∗ ) ≥ 0.
Therefore, by Theorem 22.7, x∗ is a solution.
The above also proves that if a solution exists, then 0 is a solution.
c. Write g(x) = −x so that the constraint can be expressed as g(x) ≤ 0.
⇒: We have Dg(x) = −I, which has full rank. Therefore, any point is regular. Suppose a solution x∗
exists. Then, by the KKT theorem, there exists µ∗ ≥ 0 such that c> − µ∗> = 0> and µ∗> x∗ = 0. Hence,
c = µ∗ ≥ 0.
⇐: Suppose c ≥ 0. Let x∗ = 0 and µ∗ = c. Then, µ∗ ≥ 0, c> − µ∗> = 0> , and µ∗> x∗ = 0, i.e., the
KKT condition is satisfied. By part a, x∗ is a solution to the problem.
The above also proves that if a solution exists, then 0 is a solution.
22.16
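a. The standard form problem is
$$\begin{array}{ll} \text{minimize} & c^\top x \\ \text{subject to} & Ax = b \\ & x \ge 0, \end{array}$$
which can be written as
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & h(x) = 0 \\ & g(x) \le 0, \end{array}$$
where $f(x) = c^\top x$, $h(x) = Ax - b$, and $g(x) = -x$. Thus, we have $Df(x) = c^\top$, $Dh(x) = A$, and $Dg(x) = -I$. The Karush-Kuhn-Tucker conditions for the above problem have the form:
$$\begin{aligned}
\mu^* &\ge 0 \\
c^\top + \lambda^{*\top} A - \mu^{*\top} &= 0^\top \\
\mu^{*\top} x^* &= 0.
\end{aligned}$$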
b. The Karush-Kuhn-Tucker conditions are sufficient for optimality in this case because the problem is a
convex optimization problem, i.e., the objective function is a convex function, and the feasible set is a convex
set.
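c. The dual problem is
$$\begin{array}{ll} \text{maximize} & \lambda^\top b \\ \text{subject to} & \lambda^\top A \le c^\top. \end{array}$$
d. Let
$$\mu^{*\top} = c^\top - \lambda^{*\top} A.$$
Since $\lambda^*$ is feasible for the dual, we have $\mu^* \ge 0$. Rewriting the above equation, we get
$$c^\top - \lambda^{*\top} A - \mu^{*\top} = 0^\top.$$
The complementary slackness condition $(c^\top - \lambda^{*\top} A) x^* = 0$ can be written as $\mu^{*\top} x^* = 0$. Therefore, the Karush-Kuhn-Tucker conditions hold (with $-\lambda^*$ playing the role of the Lagrange multiplier vector in part a). By part b, $x^*$ is optimal.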
22.17
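a. We can treat $s^{(1)}$ and $s^{(2)}$ as vectors in $\mathbb{R}^n$. We have
$$S_a = \{ s : s = x_1 s^{(1)} + x_2 s^{(2)},\ x_1, x_2 \in \mathbb{R},\ s_i \ge a,\ i = 1, \ldots, n \}.$$
Let $\mathbf{a} = [a, \ldots, a]^\top$. The optimization problem is:
$$\begin{array}{ll} \text{minimize} & \frac{1}{2}(x_1^2 + x_2^2) \\ \text{subject to} & x_1 s^{(1)} + x_2 s^{(2)} \ge \mathbf{a}. \end{array}$$
b. The KKT conditions are:
$$\begin{aligned}
x_1 - \mu^\top s^{(1)} &= 0 \\
x_2 - \mu^\top s^{(2)} &= 0 \\
\mu^\top (x_1 s^{(1)} + x_2 s^{(2)} - \mathbf{a}) &= 0 \\
\mu &\ge 0 \\
x_1 s^{(1)} + x_2 s^{(2)} &\ge \mathbf{a}.
\end{aligned}$$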
c. Yes, because the Hessian of the Lagrangian is I (identity), which is positive definite.
d. Yes, this is a convex optimization problem. The objective function is quadratic, with identity Hessian (hence positive definite). The constraint set is of the form $Ax \ge \mathbf{a}$, that is, an intersection of half-spaces, and hence is convex.
22.18
a. We first show that the set of probability vectors
Ω = {q ∈ Rn : q1 + · · · + qn = 1, qi > 0, i = 1, . . . , n}
is a convex set. Let y, z ∈ Ω, so y1 + · · · + yn = 1, yi > 0, z1 + · · · + zn = 1, and zi > 0. Let α ∈ (0, 1) and
x = αy + (1 − α)z. We have
x1 + · · · + xn
= αy1 + (1 − α)z1 + · · · + αyn + (1 − α)zn
= α(y1 + · · · + yn ) + (1 − α)(z1 + · · · + zn )
= α + (1 − α)
= 1.
Also, because yi > 0, zi > 0, α > 0, and 1 − α > 0, we conclude that xi > 0. Thus, x ∈ Ω, which shows that
Ω is convex.
b. We next show that the function $f$ is a convex function on $\Omega$. For this, we compute
$$F(q) = \begin{bmatrix} \dfrac{p_1}{q_1^2} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \dfrac{p_n}{q_n^2} \end{bmatrix},$$
which shows that $F(q) > 0$ for all $q$ in the open set $\{q : q_i > 0,\ i = 1, \ldots, n\}$, which contains $\Omega$. Therefore, $f$ is convex on $\Omega$.
c. Fix a probability vector $p$. Consider the optimization problem
$$\begin{array}{ll} \text{minimize} & p_1 \log \dfrac{p_1}{x_1} + \cdots + p_n \log \dfrac{p_n}{x_n} \\ \text{subject to} & x_1 + \cdots + x_n = 1 \\ & x_i > 0,\ i = 1, \ldots, n. \end{array}$$
By parts a and b, the problem is a convex optimization problem. We ignore the constraint $x_i > 0$ and write down the Lagrange conditions for the equality-constraint problem:
$$\begin{aligned}
-\frac{p_i}{x_i^*} + \lambda^* &= 0, \quad i = 1, \ldots, n \\
x_1^* + \cdots + x_n^* &= 1.
\end{aligned}$$
Rewrite the first set of equations as $x_i^* = p_i/\lambda^*$. Combining this with the constraint and the fact that $p_1 + \cdots + p_n = 1$, we obtain $\lambda^* = 1$, which means that $x_i^* = p_i$. Therefore, the unique global minimizer is $x^* = p$.
Note that $f(x^*) = 0$. Hence, we conclude that $f(x) \ge 0$ for all $x \in \Omega$. Moreover, $f(x) = 0$ if and only if $x = p$. This proves the required result.
d. Given two probability vectors $p$ and $q$, the number
$$D(p, q) = p_1 \log \frac{p_1}{q_1} + \cdots + p_n \log \frac{p_n}{q_n}$$
is called the relative entropy (or Kullback-Leibler divergence) between $p$ and $q$. It is used in information theory to measure the "distance" between two probability vectors. The result of part c justifies the use of $D$ as a measure of "distance" (although $D$ is not a metric because it is not symmetric).
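The properties of $D$ established in 22.18 (nonnegativity, $D(p, p) = 0$, and lack of symmetry) are easy to observe numerically. The following is a minimal sketch; the function name `relative_entropy` and the sample vectors are illustrative assumptions, and all entries are taken strictly positive as in the exercise.

```python
import math

def relative_entropy(p, q):
    """D(p, q) = sum_i p_i * log(p_i / q_i) for strictly positive probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(relative_entropy(p, p))  # zero when the two vectors coincide
print(relative_entropy(p, q))  # positive
print(relative_entropy(q, p))  # positive, but a different value: D is not symmetric
```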
22.19
We claim that a solution exists, and that it is unique. To prove the first claim, choose $\varepsilon > 0$ such that there exists $x \in \Omega$ satisfying $\|x - z\| < \varepsilon$. Consider the modified problem
$$\begin{array}{ll} \text{minimize} & \|x - z\| \\ \text{subject to} & x \in \Omega \cap \{y : \|y - z\| \le \varepsilon\}. \end{array}$$
If this modified problem has a solution, then clearly so does the original problem. The objective function here is continuous, and the constraint set is closed and bounded. Hence, by Weierstrass's Theorem, a solution to the problem exists.
Let f be the objective function. Next, we show that f is convex (and hence the problem is a convex
optimization problem). Let x, y ∈ Ω and α ∈ (0, 1). Then,
f (αx + (1 − α)y)
= kαx + (1 − α)y − zk
= kα(x − z) + (1 − α)(y − z)k
≤ αkx − zk + (1 − α)ky − zk
= αf (x) + (1 − α)f (y),
which shows that f is convex.
To prove uniqueness, let $x_1$ and $x_2$ be solutions to the problem. Then, by convexity, $x_3 = (x_1 + x_2)/2$ is also a solution. But
$$\begin{aligned}
\|x_3 - z\| &= \left\| \frac{x_1 + x_2}{2} - z \right\| \\
&= \left\| \frac{x_1 - z}{2} + \frac{x_2 - z}{2} \right\| \\
&\le \frac{1}{2}\left( \|x_1 - z\| + \|x_2 - z\| \right) \\
&= \|x_3 - z\|,
\end{aligned}$$
from which we conclude that the triangle inequality above holds with equality, implying that $x_1 - z = \alpha(x_2 - z)$ for some $\alpha \ge 0$. Because $\|x_1 - z\| = \|x_2 - z\|$ and $\|x_1 - z\| = \alpha\|x_2 - z\|$, we have $\alpha = 1$. From this, we obtain $x_1 = x_2$, which proves uniqueness.
22.20
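a. Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$ be symmetric with $A \ge 0$, $B \ge 0$. Fix $\alpha \in (0, 1)$, $x \in \mathbb{R}^n$, and let $C = \alpha A + (1 - \alpha) B$. Then,
$$x^\top C x = x^\top [\alpha A + (1 - \alpha) B] x = \alpha x^\top A x + (1 - \alpha) x^\top B x.$$
Since $x^\top A x \ge 0$, $x^\top B x \ge 0$, and $\alpha, (1 - \alpha) > 0$ by assumption, then $x^\top C x \ge 0$, which proves the required result.
b. We first show that the constraint set $\Omega = \{x : F_0 + \sum_{j=1}^n x_j F_j \ge 0\}$ is convex. So, let $x, y \in \Omega$ and $\alpha \in (0, 1)$. Let $z = \alpha x + (1 - \alpha) y$. Then,
$$\begin{aligned}
F_0 + \sum_{j=1}^n z_j F_j &= F_0 + \sum_{j=1}^n [\alpha x_j + (1 - \alpha) y_j] F_j \\
&= F_0 + \alpha \sum_{j=1}^n x_j F_j + (1 - \alpha) \sum_{j=1}^n y_j F_j \\
&= \alpha \Big[ F_0 + \sum_{j=1}^n x_j F_j \Big] + (1 - \alpha) \Big[ F_0 + \sum_{j=1}^n y_j F_j \Big].
\end{aligned}$$
By assumption, we have
$$\begin{aligned}
F_0 + \sum_{j=1}^n x_j F_j &\ge 0 \\
F_0 + \sum_{j=1}^n y_j F_j &\ge 0.
\end{aligned}$$
By part a, we conclude that
$$F_0 + \sum_{j=1}^n z_j F_j \ge 0,$$
which implies that $z \in \Omega$.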
To show that the objective function f (x) = c> x is convex on Ω, let x, y ∈ Ω and α ∈ (0, 1). Then,
f (αx + (1 − α)y)
= c> (αx + (1 − α)y)
= αc> x + (1 − α)c> y
= αf (x) + (1 − α)f (y)
which shows that f is convex.
c. The objective function is already in the required form. To rewrite the constraint, let $a_{i,j}$ be the $(i, j)$th entry of $A$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. Then, the constraint $Ax \ge b$ can be written as
$$a_{i,1} x_1 + a_{i,2} x_2 + \cdots + a_{i,n} x_n \ge b_i, \quad i = 1, \ldots, m.$$
Now form the diagonal matrices
$$\begin{aligned}
F_0 &= \operatorname{diag}\{-b_1, \ldots, -b_m\} \\
F_j &= \operatorname{diag}\{a_{1,j}, \ldots, a_{m,j}\}, \quad j = 1, \ldots, n.
\end{aligned}$$
Note that a diagonal matrix is positive semidefinite if and only if every diagonal element is nonnegative. Hence, the constraint $Ax \ge b$ can be written as $F_0 + \sum_{j=1}^n x_j F_j \ge 0$. The left-hand side is a diagonal matrix, and the $i$th diagonal element is simply $-b_i + a_{i,1} x_1 + a_{i,2} x_2 + \cdots + a_{i,n} x_n$.
22.21
a. We have
Ω = {x : x1 + . . . + xn = 1; x1 , . . . , xn > 0; x1 ≥ 2xi , i = 2, . . . , n}.
So let x, y ∈ Ω and α ∈ (0, 1). Consider z = αx + (1 − α)y. We have
z1 + · · · + zn
= αx1 + (1 − α)y1 + · · · + αxn + (1 − α)yn
= α(x1 + · · · + xn ) + (1 − α)(y1 + · · · + yn )
= α+1−α
= 1.
Moreover, for each i, because xi > 0, yi > 0, α > 0 and 1 − α > 0, we have zi > 0. Finally, for each i,
z1 = αx1 + (1 − α)y1 ≥ α2xi + (1 − α)2yi = 2zi .
Hence, z ∈ Ω, which implies that Ω is convex.
b. We first show that the negative of the objective function is convex. For this, we will compute its Hessian,
which turns out to be a diagonal matrix with ith diagonal entry 1/x2i , which is strictly positive. Hence, the
Hessian is positive definite, which implies that the negative of the objective function is convex.
Combining the above with part a, we conclude that the problem is a convex optimization problem. Hence, the FONC (for set constraints) is necessary and sufficient. Let $x$ be a given allocation. The FONC at $x$ is $d^\top \nabla f(x) \ge 0$ for all feasible directions $d$ at $x$. But because $\Omega$ is convex, the FONC can be written as $(y - x)^\top \nabla f(x) \ge 0$ for all $y \in \Omega$. Computing $\nabla f(x)$ for $f(x) = -\sum_{i=1}^n \log(x_i)$, we get the proportional fairness condition.
22.22
a. We rewrite the problem into a minimization problem by multiplying the objective function by −1. Thus,
the new objective function is the sum of the functions −Ui . Because each Ui is concave, −Ui is convex, and
hence their sum is convex.
To show that the constraint set Ω = {x : e> x ≤ C} (where e = [1, . . . , 1]> ) is convex, let x1 , x2 ∈ Ω, and
λ ∈ [0, 1]. Then, e> x1 ≤ C and e> x2 ≤ C. Therefore,
e> (λx1 + (1 − λ)x2 )
= λe> x1 + (1 − λ)e> x2
≤ λC + (1 − λ)C
= C
which means that λx1 + (1 − λ)x2 ∈ Ω. Hence, Ω is a convex set.
b. Because the problem is a convex optimization problem, the following KKT condition is necessary and sufficient for $x^*$ to be a global minimizer:
$$\begin{aligned}
\mu^* &\ge 0 \\
-U_i'(x_i^*) + \mu^* &= 0, \quad i = 1, \ldots, n \\
\mu^* \left( \sum_{i=1}^n x_i^* - C \right) &= 0 \\
\sum_{i=1}^n x_i^* &\le C.
\end{aligned}$$
Note that because $-U_i(x_i) + \mu^* x_i$ is a convex function of $x_i$, the second line above can be written as $x_i^* = \arg\max_x (U_i(x) - \mu^* x)$.
c. Because each $U_i$ is concave and increasing, we conclude that $\sum_{i=1}^n x_i^* = C$; for otherwise we could increase some $x_i^*$ and hence $U_i(x_i^*)$ and also $\sum_{i=1}^n U_i(x_i^*)$, contradicting the optimality of $x^*$.
22.23
First note that the optimization problem that you construct cannot be a convex problem (for otherwise, the
FONC implies that x∗ is a global minimizer, which then implies that the SONC holds). Let f (x) = x2 ,
g(x) = −(x2 + x21 ), and x∗ = 0. Then, ∇f (x∗ ) = [0, 1]> . Any feasible direction d at x∗ is of the form
d = [d1 , d2 ]> with d2 ≥ 0. Hence, d> ∇f (x∗ ) ≥ 0, which shows that the FONC holds.
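Because $\nabla g(x^*) = -[0, 1]^\top$ (so $x^*$ is regular), we see that if $\mu^* = 1$, then $\nabla f(x^*) + \mu^* \nabla g(x^*) = 0$, and so the KKT condition holds.
Because $F(x^*) = O$, the SONC for set constraint $\Omega$ holds. However,
$$L(x^*, \mu^*) = O + \mu^* \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} -2 & 0 \\ 0 & 0 \end{bmatrix}$$
and $T(x^*) = \{y : y_2 = 0\}$, so that $y^\top L(x^*, \mu^*) y = -2y_1^2 < 0$ for every $y \in T(x^*)$ with $y_1 \ne 0$. This shows that the SONC for inequality constraint $g(x) \le 0$ does not hold.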
22.24
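a. Let $x_0$ and $\mu_0$ be feasible points in the primal and dual, respectively. Then, $g(x_0) \le 0$ and $\mu_0 \ge 0$, and so $\mu_0^\top g(x_0) \le 0$. Hence,
$$\begin{aligned}
f(x_0) &\ge f(x_0) + \mu_0^\top g(x_0) \\
&= l(x_0, \mu_0) \\
&\ge \min_{x \in \mathbb{R}^n} l(x, \mu_0) \\
&= q(\mu_0).
\end{aligned}$$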
b. Suppose f (x0 ) = q(µ0 ) for feasible points x0 and µ0 . Let x be any feasible point in the primal. Then,
by part a, f (x) ≥ q(µ0 ) = f (x0 ). Hence x0 is optimal in the primal.
Similarly, let µ be any feasible point in the dual. Then, by part a, q(µ) ≤ f (x0 ) = q(µ0 ). Hence µ0 is
optimal in the dual.
c. Let $x^*$ be optimal in the primal. Then, by the KKT Theorem, there exists $\mu^* \in \mathbb{R}^m$ such that
$$\begin{aligned}
\nabla_x l(x^*, \mu^*) = (Df(x^*) + \mu^{*\top} Dg(x^*))^\top &= 0 \\
\mu^{*\top} g(x^*) &= 0 \\
\mu^* &\ge 0.
\end{aligned}$$
Therefore, $\mu^*$ is feasible in the dual. Further, we note that $l(\cdot, \mu^*)$ is a convex function (because $f$ is convex, $\mu^{*\top} g$ is convex being the sum of convex functions, and hence $l$ is the sum of two convex functions). Hence, we have $l(x^*, \mu^*) = \min_{x \in \mathbb{R}^n} l(x, \mu^*)$. Therefore,
$$\begin{aligned}
q(\mu^*) &= \min_{x \in \mathbb{R}^n} l(x, \mu^*) \\
&= l(x^*, \mu^*) \\
&= f(x^*) + \mu^{*\top} g(x^*) \\
&= f(x^*).
\end{aligned}$$
By part b, $\mu^*$ is optimal in the dual.
22.25
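a. The Schur complement of $M(1, 1)$ is
$$\begin{aligned}
\Delta_{11} &= M(2{:}3, 2{:}3) - M(2{:}3, 1) M(1, 1)^{-1} M(1, 2{:}3) \\
&= \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} - \begin{bmatrix} \gamma \\ -1 \end{bmatrix} \begin{bmatrix} \gamma & -1 \end{bmatrix} \\
&= \begin{bmatrix} 1 - \gamma^2 & 2 + \gamma \\ 2 + \gamma & 4 \end{bmatrix}.
\end{aligned}$$
b. The Schur complement of $M(2{:}3, 2{:}3)$ is
$$\begin{aligned}
\Delta_{22} &= M(1, 1) - M(1, 2{:}3) M(2{:}3, 2{:}3)^{-1} M(2{:}3, 1) \\
&= 1 - \begin{bmatrix} \gamma & -1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}^{-1} \begin{bmatrix} \gamma \\ -1 \end{bmatrix} \\
&= 1 - \begin{bmatrix} \gamma & -1 \end{bmatrix} \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} \gamma \\ -1 \end{bmatrix} \\
&= -\gamma(5\gamma + 4).
\end{aligned}$$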
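The closed form $\Delta_{22} = -\gamma(5\gamma + 4)$ in 22.25 can be spot-checked by evaluating the quadratic form directly. This is a minimal sketch; the function name `schur_22` is illustrative, and the inverse of the lower-right block is hard-coded from the derivation above (its determinant is 1).

```python
def schur_22(gamma):
    """Schur complement of the block [[1, 2], [2, 5]] in M, evaluated numerically."""
    v = (gamma, -1.0)                  # M(1, 2:3), which equals M(2:3, 1) transposed
    inv = ((5.0, -2.0), (-2.0, 1.0))   # inverse of [[1, 2], [2, 5]]
    quad = sum(v[i] * inv[i][j] * v[j] for i in range(2) for j in range(2))
    return 1.0 - quad

for gamma in (-1.0, -0.8, -0.4, 0.0, 0.5):
    print(gamma, schur_22(gamma), -gamma * (5.0 * gamma + 4.0))
```

The two printed columns should agree up to rounding; the expression is nonnegative only for $\gamma \in [-4/5, 0]$, the same interval found for $\alpha$ in Exercise 22.1.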
22.26
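Let
$$P = \begin{bmatrix} x_1 & x_2 \\ x_2 & x_3 \end{bmatrix}$$
and let
$$P_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad P_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \text{and} \quad P_3 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$
Then, we can represent the Lyapunov inequality $A^\top P + PA < 0$ as
$$\begin{aligned}
A^\top P + PA &= x_1 \left( A^\top P_1 + P_1 A \right) + x_2 \left( A^\top P_2 + P_2 A \right) + x_3 \left( A^\top P_3 + P_3 A \right) \\
&= -x_1 F_1 - x_2 F_2 - x_3 F_3 \\
&< 0,
\end{aligned}$$
where
$$F_i = -A^\top P_i - P_i A, \quad i = 1, 2, 3.$$
Equivalently,
$$P = P^\top \quad \text{and} \quad A^\top P + PA < 0$$
if and only if
$$F(x) = x_1 F_1 + x_2 F_2 + x_3 F_3 > 0.$$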
22.27
The quadratic inequality
$$A^\top P + PA + PBR^{-1}B^\top P < 0$$
can be equivalently represented by the following LMI:
$$\begin{bmatrix} -R & B^\top P \\ PB & A^\top P + PA \end{bmatrix} < 0,$$
or as the following LMI:
$$\begin{bmatrix} A^\top P + PA & PB \\ B^\top P & -R \end{bmatrix} < 0.$$
It is easy to verify, using Schur complements, that the above two LMIs are equivalent to the following quadratic inequality:
$$\begin{bmatrix} A^\top P + PA + PBR^{-1}B^\top P & O \\ O & -R \end{bmatrix} < 0.$$
22.28
The MATLAB code is as follows:
A = [-0.9501 -0.4860 -0.4565
     -0.2311 -0.8913 -0.0185
     -0.6068 -0.7621 -0.8214];
setlmis([]);
P=lmivar(1,[3 1]);
lmiterm([1 1 1 P],1,A,'s')
lmiterm([2 1 1 0],0.1)
lmiterm([-2 1 1 P],1,1)
lmiterm([3 1 1 P],1,1)
lmiterm([-3 1 1 0],1)
lmis=getlmis;
[tmin,xfeas]=feasp(lmis);
P=dec2mat(lmis,xfeas,P)
23. Algorithms for Constrained Optimization
23.1
a. By drawing a simple picture, it is easy to see that $\Pi[x] = x/\|x\|$, provided $x \ne 0$.
b. By inspection, we see that the solutions are $[0, 1]^\top$ and $[0, -1]^\top$. (Or use Rayleigh's inequality.)
c. Now,
$$x^{(k+1)} = \Pi[x^{(k)} + \alpha \nabla f(x^{(k)})] = \beta_k (x^{(k)} + \alpha Q x^{(k)}) = \beta_k (I + \alpha Q) x^{(k)},$$
where $\beta_k = 1/\|(I + \alpha Q) x^{(k)}\|$. For the particular given form of $Q$, we have
$$\begin{aligned}
x_1^{(k+1)} &= \beta_k (1 + \alpha) x_1^{(k)} \\
x_2^{(k+1)} &= \beta_k (1 + 2\alpha) x_2^{(k)}.
\end{aligned}$$
Hence,
$$y^{(k+1)} = \left( \frac{1 + \alpha}{1 + 2\alpha} \right) y^{(k)}.$$
d. Assuming $x_2^{(0)} \ne 0$, $y^{(0)}$ is well defined. Hence, by part c, we can write
$$y^{(k)} = \left( \frac{1 + \alpha}{1 + 2\alpha} \right)^k y^{(0)}.$$
Because $\alpha > 0$,
$$\frac{1 + \alpha}{1 + 2\alpha} < 1,$$
which implies that $y^{(k)} \to 0$. But
$$\begin{aligned}
1 &= \|x^{(k)}\| \\
&= \sqrt{(x_1^{(k)})^2 + (x_2^{(k)})^2} \\
&= \sqrt{(x_2^{(k)})^2 \left( \frac{(x_1^{(k)})^2}{(x_2^{(k)})^2} + 1 \right)} \\
&= |x_2^{(k)}| \sqrt{(y^{(k)})^2 + 1},
\end{aligned}$$
which implies that
$$|x_2^{(k)}| = \frac{1}{\sqrt{(y^{(k)})^2 + 1}}.$$
Because $y^{(k)} \to 0$, we have $|x_2^{(k)}| \to 1$. By the expression for $x_2^{(k+1)}$ in part c, we see that the sign of $x_2^{(k)}$ does not change with $k$. Hence, we deduce that either $x_2^{(k)} \to 1$ or $x_2^{(k)} \to -1$. This also implies that $x_1^{(k)} \to 0$. Hence, $x^{(k)}$ converges to a solution to the problem.
e. If $x_2^{(0)} = 0$, then $x_2^{(k)} = 0$ for all $k$, which means that $x_1^{(k)} = 1$ or $-1$ for all $k$. In this case, the algorithm is stuck at the initial condition $[1, 0]^\top$ or $[-1, 0]^\top$ (which are in fact the minimizers).
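The behavior derived in 23.1 can be reproduced with a few lines of code. The sketch below iterates $x^{(k+1)} = \Pi[(I + \alpha Q) x^{(k)}]$ for the diagonal matrix $Q = \operatorname{diag}(1, 2)$ implied by part c; the function names `project` and `iterate`, the step size, and the starting points are hypothetical choices for illustration.

```python
import math

def project(x):
    """Projection onto the unit circle: x / ||x|| (x must be nonzero)."""
    norm = math.hypot(x[0], x[1])
    return [x[0] / norm, x[1] / norm]

def iterate(x0, alpha, steps):
    """Run x <- Pi[(I + alpha*Q) x] with Q = diag(1, 2), starting from Pi[x0]."""
    x = project(x0)
    for _ in range(steps):
        x = project([(1.0 + alpha) * x[0], (1.0 + 2.0 * alpha) * x[1]])
    return x

print(iterate([1.0, 0.1], 0.5, 200))   # approaches [0, 1]
print(iterate([1.0, -0.1], 0.5, 200))  # approaches [0, -1]: the sign of x2 is preserved
print(iterate([1.0, 0.0], 0.5, 200))   # stays at [1, 0], as in part e
```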
23.2
a. Yes. To show this, suppose that $x^{(k)}$ is a global minimizer of the given problem. Then, for all $x \in \Omega$, $x \ne x^{(k)}$, we have $c^\top x \ge c^\top x^{(k)}$. Rewriting, we obtain $c^\top (x - x^{(k)}) \ge 0$. Recall that
$$\begin{aligned}
\Pi[x^{(k)} - \nabla f(x^{(k)})] &= \arg\min_{x \in \Omega} \|x - (x^{(k)} - \nabla f(x^{(k)}))\|^2 \\
&= \arg\min_{x \in \Omega} \|x - x^{(k)} + c\|^2.
\end{aligned}$$
But, for any $x \in \Omega$, $x \ne x^{(k)}$,
$$\begin{aligned}
\|x - x^{(k)} + c\|^2 &= \|x - x^{(k)}\|^2 + \|c\|^2 + 2c^\top (x - x^{(k)}) \\
&> \|c\|^2,
\end{aligned}$$
where we used the facts that $\|x - x^{(k)}\|^2 > 0$ and $c^\top (x - x^{(k)}) \ge 0$. On the other hand, $\|x^{(k)} - x^{(k)} + c\|^2 = \|c\|^2$. Hence,
$$x^{(k+1)} = \Pi[x^{(k)} - \nabla f(x^{(k)})] = x^{(k)}.$$
b. No. Counterexample:
23.3
a. Suppose $x^{(k)}$ satisfies the FONC. Then, $\nabla f(x^{(k)}) = 0$. Hence, $x^{(k+1)} = x^{(k)}$. Conversely, suppose $x^{(k)}$ does not satisfy the FONC. Then, $\nabla f(x^{(k)}) \ne 0$. Hence, $\alpha_k > 0$, and so $x^{(k+1)} \ne x^{(k)}$.
b. Case (i): Suppose $x^{(k)}$ is a corner point. Without loss of generality, take $x^{(k)} = [1, 1]^\top$. (We can do this because any other corner point can be mapped to this point by changing variables $x_i$ to $-x_i$ as appropriate.) Note that any feasible direction $d$ at $x^{(k)} = [1, 1]^\top$ satisfies $d \le 0$. Therefore,
$$\begin{aligned}
x^{(k+1)} = x^{(k)} &\Leftrightarrow -\nabla f(x^{(k)}) \ge 0 \\
&\Leftrightarrow d^\top \nabla f(x^{(k)}) \ge 0 \text{ for all feasible } d \text{ at } x^{(k)} \\
&\Leftrightarrow x^{(k)} \text{ satisfies FONC.}
\end{aligned}$$
Case (ii): Suppose $x^{(k)}$ is not a corner point (i.e., is an edge point). Without loss of generality, take $x^{(k)} \in \{x : x_1 = 1, -1 < x_2 < 1\}$. (We can do this because any other edge point can be mapped to this point by changing variables $x_i$ to $-x_i$ as appropriate.) Note that any feasible direction $d$ at $x^{(k)} \in \{x : x_1 = 1, -1 < x_2 < 1\}$ satisfies $d_1 \le 0$. Therefore,
$$\begin{aligned}
x^{(k+1)} = x^{(k)} &\Leftrightarrow -\nabla f(x^{(k)}) = [a, 0]^\top,\ a > 0 \\
&\Leftrightarrow d^\top \nabla f(x^{(k)}) \ge 0 \text{ for all feasible } d \text{ at } x^{(k)} \\
&\Leftrightarrow x^{(k)} \text{ satisfies FONC.}
\end{aligned}$$
23.4
By definition of $\Pi$, we have
$$\begin{aligned}
\Pi[x_0 + y] &= \arg\min_{x \in \Omega} \|x - (x_0 + y)\| \\
&= \arg\min_{x \in \Omega} \|(x - x_0) - y\|.
\end{aligned}$$
By Exercise 6.7, we can write
$$\arg\min_{x \in \Omega} \|(x - x_0) - y\| = x_0 + \arg\min_{z \in N(A)} \|z - y\|.$$
The term $\arg\min_{z \in N(A)} \|z - y\|$ is simply the orthogonal projection of $y$ onto $N(A)$. By Exercise 6.7, we have
$$\arg\min_{z \in N(A)} \|z - y\| = P y,$$
where $P = I - A^\top (A A^\top)^{-1} A$. Hence,
$$\Pi[x_0 + y] = x_0 + P y.$$
23.5
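Since $\alpha_k \ge 0$ is a minimizer of $\phi_k(\alpha) = f(x^{(k)} - \alpha P g^{(k)})$, we apply the FONC to $\phi_k(\alpha)$ to obtain
$$\phi_k'(\alpha) = (x^{(k)} - \alpha P g^{(k)})^\top Q (-P g^{(k)}) - b^\top (-P g^{(k)}).$$
Therefore, $\phi_k'(\alpha) = 0$ if $\alpha\, g^{(k)\top} P Q P g^{(k)} = (x^{(k)\top} Q - b^\top) P g^{(k)}$. But
$$x^{(k)\top} Q - b^\top = g^{(k)\top}.$$
Hence
$$\alpha_k = \frac{g^{(k)\top} P g^{(k)}}{g^{(k)\top} P Q P g^{(k)}}.$$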
23.6
By Exercise 23.5, the projected steepest descent algorithm applied to this problem takes the form
$$\begin{aligned}
x^{(k+1)} &= x^{(k)} - P x^{(k)} \\
&= (I_n - P) x^{(k)} \\
&= A^\top (A A^\top)^{-1} A x^{(k)}.
\end{aligned}$$
If $x^{(0)} \in \{x : Ax = b\}$, then $A x^{(0)} = b$, and hence
$$x^{(1)} = A^\top (A A^\top)^{-1} b,$$
which solves the problem (see Section 12.3).
23.7
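a. Define
$$\phi_k(\alpha) = f(x^{(k)} - \alpha P \nabla f(x^{(k)})).$$
By the Chain Rule,
$$\begin{aligned}
\phi_k'(\alpha_k) &= \frac{d\phi_k}{d\alpha}(\alpha_k) \\
&= -(\nabla f(x^{(k)} - \alpha_k P \nabla f(x^{(k)})))^\top P \nabla f(x^{(k)}) \\
&= -(\nabla f(x^{(k+1)}))^\top P \nabla f(x^{(k)}).
\end{aligned}$$
Since $\alpha_k$ minimizes $\phi_k$, $\phi_k'(\alpha_k) = 0$, and thus $g^{(k+1)\top} P g^{(k)} = 0$.
b. We have $x^{(k+1)} - x^{(k)} = -\alpha_k P g^{(k)}$ and $x^{(k+2)} - x^{(k+1)} = -\alpha_{k+1} P g^{(k+1)}$. Therefore,
$$\begin{aligned}
(x^{(k+2)} - x^{(k+1)})^\top (x^{(k+1)} - x^{(k)}) &= \alpha_{k+1} \alpha_k\, g^{(k+1)\top} P^\top P g^{(k)} \\
&= \alpha_{k+1} \alpha_k\, g^{(k+1)\top} P g^{(k)} \\
&= 0
\end{aligned}$$
by part a, and the fact that $P = P^\top = P^2$.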
23.8
a. minimize $f(x) + \gamma P(x)$.
b. Suppose $x_\gamma \notin \Omega$. Then, $P(x_\gamma) > 0$ by definition of $P$. Because $x_\gamma$ is a global minimizer of the unconstrained problem, we have
$$f(x_\gamma) + \gamma P(x_\gamma) \le f(x^*) + \gamma P(x^*) = f(x^*),$$
which implies that
$$f(x_\gamma) \le f(x^*) - \gamma P(x_\gamma) < f(x^*).$$
23.9
We use the penalty method. First, we construct the unconstrained objective function with penalty parameter
γ:
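$$f(x) = x_1^2 + 2x_2^2 + \gamma (x_1 + x_2 - 3)^2.$$
Because $f$ is a quadratic with positive definite quadratic term, it is easy to find its minimizer:
$$x_\gamma = \frac{1}{1 + 2/(3\gamma)} \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$
For example, we can obtain the above by solving the FONC:
$$\begin{aligned}
2(1 + \gamma) x_1 + 2\gamma x_2 - 6\gamma &= 0 \\
2\gamma x_1 + 2(2 + \gamma) x_2 - 6\gamma &= 0.
\end{aligned}$$
Now letting $\gamma \to \infty$, we obtain
$$x^* = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$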
(It is easy to verify, using other means, that this is indeed the correct solution.)
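As a quick check of 23.9, the sketch below substitutes the closed-form minimizer $x_\gamma$ into the two FONC equations and shows how $x_\gamma$ approaches $[2, 1]^\top$ as $\gamma$ grows. The function names `penalty_minimizer` and `fonc_residual` and the sample values of $\gamma$ are illustrative assumptions.

```python
def penalty_minimizer(gamma):
    """Closed-form minimizer x_gamma = [2, 1] / (1 + 2/(3*gamma))."""
    scale = 1.0 / (1.0 + 2.0 / (3.0 * gamma))
    return [2.0 * scale, 1.0 * scale]

def fonc_residual(x, gamma):
    """Left-hand sides of the two FONC equations; both should be zero at x_gamma."""
    r1 = 2.0 * (1.0 + gamma) * x[0] + 2.0 * gamma * x[1] - 6.0 * gamma
    r2 = 2.0 * gamma * x[0] + 2.0 * (2.0 + gamma) * x[1] - 6.0 * gamma
    return r1, r2

for gamma in (1.0, 10.0, 1e3, 1e6):
    x = penalty_minimizer(gamma)
    print(gamma, x, fonc_residual(x, gamma))
```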
23.10
Using the penalty method, we construct the unconstrained problem
minimize x + γ(max(a − x, 0))2
To find the solution to the above problem, we use the FONC. It is easy to see that the solution x∗ satisfies
x∗ < a. The derivative of the above objective function in the region x < a is 1 + 2γ(x − a). Thus, by the
FONC, we have x∗ = a − 1/(2γ). Since the true solution is at a, the difference is 1/(2γ). Therefore, for
1/(2γ) ≤ ε, we need γ ≥ 1/(2ε). The smallest such γ is 1/(2ε).
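The relationship between the penalty parameter and the accuracy in 23.10 can be illustrated numerically. This is a minimal sketch under the assumption $a = 1$ and $\varepsilon = 0.01$ (both hypothetical sample values); the function names are likewise illustrative.

```python
def penalized_objective(x, a, gamma):
    """Objective x + gamma * (max(a - x, 0))^2 of the unconstrained problem."""
    return x + gamma * max(a - x, 0.0) ** 2

def penalized_solution(a, gamma):
    """Minimizer obtained from the FONC: x* = a - 1/(2*gamma)."""
    return a - 1.0 / (2.0 * gamma)

def smallest_gamma(eps):
    """Smallest penalty parameter for which the error 1/(2*gamma) is at most eps."""
    return 1.0 / (2.0 * eps)

a, eps = 1.0, 0.01
gamma = smallest_gamma(eps)
x_star = penalized_solution(a, gamma)
print(gamma, x_star, a - x_star)
# Nearby points give a larger objective value, consistent with x_star being the minimizer.
print(penalized_objective(x_star, a, gamma),
      penalized_objective(x_star - 1e-3, a, gamma),
      penalized_objective(x_star + 1e-3, a, gamma))
```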
23.11
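a. We have
$$\frac{1}{2}\|x\|^2 + \gamma \|Ax - b\|^2 = \frac{1}{2} x^\top \begin{bmatrix} 1 + 2\gamma & 2\gamma \\ 2\gamma & 1 + 2\gamma \end{bmatrix} x - x^\top \begin{bmatrix} 2\gamma \\ 2\gamma \end{bmatrix} + \gamma.$$
The above is a quadratic with positive definite Hessian. Therefore, the minimizer is
$$x_\gamma^* = \begin{bmatrix} 1 + 2\gamma & 2\gamma \\ 2\gamma & 1 + 2\gamma \end{bmatrix}^{-1} \begin{bmatrix} 2\gamma \\ 2\gamma \end{bmatrix} = \frac{1}{2 + 1/(2\gamma)} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Hence,
$$\lim_{\gamma \to \infty} x_\gamma^* = \frac{1}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
The solution to the original constrained problem is (see Section 12.3)
$$x^* = A^\top (A A^\top)^{-1} b = \frac{1}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$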
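b. We represent the objective function of the associated unconstrained problem as
$$\frac{1}{2}\|x\|^2 + \gamma \|Ax - b\|^2 = \frac{1}{2} x^\top \left( I_n + 2\gamma A^\top A \right) x - x^\top \left( 2\gamma A^\top b \right) + \gamma b^\top b.$$
The above is a quadratic with positive definite Hessian. Therefore, the minimizer is
$$x_\gamma^* = \left( I_n + 2\gamma A^\top A \right)^{-1} 2\gamma A^\top b = \left( \frac{1}{2\gamma} I_n + A^\top A \right)^{-1} A^\top b.$$
Let $A = U [S\ \ O] V^\top$ be the singular value decomposition of $A$. For simplicity, denote $\varepsilon = 1/(2\gamma)$. We have
$$\begin{aligned}
x_\gamma^* &= \left( \varepsilon I_n + A^\top A \right)^{-1} A^\top b \\
&= \left( \varepsilon I_n + V \begin{bmatrix} S \\ O \end{bmatrix} U^\top U [S\ \ O] V^\top \right)^{-1} A^\top b \\
&= \left( \varepsilon I_n + V \begin{bmatrix} S^2 & O \\ O & O \end{bmatrix} V^\top \right)^{-1} A^\top b \\
&= V \begin{bmatrix} \varepsilon I_m + S^2 & O \\ O & \varepsilon I_{n-m} \end{bmatrix}^{-1} V^\top A^\top b.
\end{aligned}$$
Note that
$$V^\top A^\top = \begin{bmatrix} S \\ O \end{bmatrix} U^\top.$$
Also,
$$\begin{bmatrix} \varepsilon I_m + S^2 & O \\ O & \varepsilon I_{n-m} \end{bmatrix}^{-1} = \begin{bmatrix} (\varepsilon I_m + S^2)^{-1} & O \\ O & \frac{1}{\varepsilon} I_{n-m} \end{bmatrix},$$
where $(\varepsilon I_m + S^2)^{-1}$ is diagonal. Hence,
$$\begin{aligned}
x_\gamma^* &= \left( \varepsilon I_n + A^\top A \right)^{-1} A^\top b \\
&= V \begin{bmatrix} (\varepsilon I_m + S^2)^{-1} & O \\ O & \frac{1}{\varepsilon} I_{n-m} \end{bmatrix} \begin{bmatrix} S \\ O \end{bmatrix} U^\top b \\
&= V \begin{bmatrix} S \\ O \end{bmatrix} (\varepsilon I_m + S^2)^{-1} U^\top b \\
&= V \begin{bmatrix} S \\ O \end{bmatrix} U^\top U (\varepsilon I_m + S^2)^{-1} U^\top b \\
&= A^\top U (\varepsilon I_m + S^2)^{-1} U^\top b.
\end{aligned}$$
Note that as $\gamma \to \infty$, $\varepsilon \to 0$, and
$$U (\varepsilon I_m + S^2)^{-1} U^\top \to U (S^2)^{-1} U^\top.$$
But,
$$U (S^2)^{-1} U^\top = \left( U S^2 U^\top \right)^{-1} = (A A^\top)^{-1}.$$
Therefore,
$$x_\gamma^* \to A^\top (A A^\top)^{-1} b = x^*.$$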
24. Multi-Objective Optimization
24.1
The MATLAB code is as follows:
function multi_op
%MULTI_OP, illustrates multi-objective optimization.
clear
clc
disp (' ')
disp ('This is a demo illustrating multi-objective optimization.')
disp ('The numerical example is a modification of the example')
disp ('from the 2002 book by A. Osyczka,')
disp ('Example 5.1 on pages 101--105')
disp ('------------------------------------------------------------')
disp ('Select the population size denoted POPSIZE, for example, 50.')
disp (' ')
POPSIZE=input('Population size POPSIZE = ');
disp ('------------------------------------------------------------')
disp ('Select the number of iterations denoted NUMITER; e.g., 10.')
disp (' ')
NUMITER=input('Number of iterations NUMITER = ');
disp (' ')
disp ('------------------------------------------------------------')
% Main
for i = 1:NUMITER
    fprintf('Working on Iteration %.0f...\n',i)
    xmat = genxmat(POPSIZE);
    if i~=1
        for j = 1:length(xR)
            xmat = [xmat;xR{j}];
        end
    end
    [xR,fR] = Select_P(xmat);
    fprintf('Number of Pareto solutions: %.0f\n',length(fR))
end
disp (' ')
disp ('------------------------------------------------------------')
fprintf(' Pareto solutions \n')
celldisp(xR)
disp (' ')
disp ('------------------------------------------------------------')
fprintf(' Objective vector values \n')
celldisp(fR)
xlabel('f_1','Fontsize',16)
ylabel('f_2','Fontsize',16)
title('Pareto optimal front','Fontsize',16)
set(gca,'Fontsize',16)
grid
for i=1:length(xR)
    xx(i)=xR{i}(1);
    yy(i)=xR{i}(2);
end
XX=[xx; yy];
figure
axis([1 7 5 10])
hold on
for i=1:size(XX,2)
    plot(XX(1,i),XX(2,i),'marker','o','markersize',6)
end
xlabel('x_1','Fontsize',16)
ylabel('x_2','Fontsize',16)
title('Pareto optimal solutions','Fontsize',16)
set(gca,'Fontsize',16)
grid
hold off
figure
axis([-2 10 2 13])
hold on
plot([2 6],[5 5],'marker','o','markersize',6)
plot([6 6],[5 9],'marker','o','markersize',6)
plot([2 6],[9 9],'marker','o','markersize',6)
plot([2 2],[5 9],'marker','o','markersize',6)
for i=1:size(XX,2)
    plot(XX(1,i),XX(2,i),'marker','x','markersize',10)
end
x1=-2:.2:10;
x2=2:.2:13;
[X1, X2]=meshgrid(x1,x2);
Z1=-X1.^2 - X2;
v=[0 -5 -7 -10 -15 -20 -30 -40 -60];
cs1=contour(X1,X2,Z1,v);
clabel(cs1)
Z2=X1+X2.^2;
v2=[20 25 35 40 60 80 100 120];
cs2=contour(X1,X2,Z2,v2);
clabel(cs2)
xlabel('x_1','Fontsize',16)
ylabel('x_2','Fontsize',16)
title('Level sets of f_1 and f_2, and Pareto optimal points','Fontsize',16)
set(gca,'Fontsize',16)
grid
hold off
function xmat0 = genxmat(POPSIZE)
% Random population in the box [2,6] x [5,9].
xmat0 = rand(POPSIZE,2);
xmat0(:,1) = xmat0(:,1)*4+2;
xmat0(:,2) = xmat0(:,2)*4+5;

function [xR,fR] = Select_P(xmat)
% Declaration
J = size(xmat,1);
% Init
Rset = [1];
j = 1;
isstep7 = 0;
% Step 1
x{1} = xmat(1,:);
f{1} = evalfcn(x{1});
% Step 2
while j < J
    j = j+1;
    % Step 3
    r = 1;
    rdel = [];
    q = 0;
    R = length(Rset);
    for k = 1:size(xmat,1)
        x{k} = xmat(k,:);
        f{k} = evalfcn(x{k});
    end
    % Step 4
    while 1
        %for r=1:R
        if all(f{j}<f{Rset(r)})
            q = q+1;
            rdel = [rdel r];
        else
            % Step 5
            if all(f{j}>=f{Rset(r)})
                break
            end
        end
        % Step 6
        r=r+1;
        if r > R
            isstep7 = 1;
            break
        end
    end
    % Step 7
    if isstep7 == 1
        isstep7 = 0;
        if (q~=0)
            Rset(rdel) =[];
            Rset = [Rset j];
        else
            %Step 8
            Rset = [Rset j];
        end
    end
    for k = 1:size(xmat,1)
        x{k} = xmat(k,:);
        f{k} = evalfcn(x{k});
    end
    R = length(Rset);
end
% Return the Pareto solution.
for i = 1:length(Rset)
    xR{i} = x{Rset(i)};
    fR{i} = f{Rset(i)};
end
% Split objective values into Pareto (x1,y1) and non-Pareto (x2,y2) points.
x1 = [];
y1 = [];
x2 = [];
y2 = [];
for k = 1:size(xmat,1)
    if ismember(k,Rset)
        x1 = [x1 f{k}(1)];
        y1 = [y1 f{k}(2)];
    else
        x2 = [x2 f{k}(1)];
        y2 = [y2 f{k}(2)];
    end
end
%newplot
plot(x1,y1,'xr',x2,y2,'.b')
drawnow

function y = f1(x)
% y = x(1)^2+x(2);
% The above function is the original function in Osyczka's 2002 book
% (Example 5.1, page 101).
% Its negative makes a much more interesting example.
y = -(x(1)^2+x(2));

function y = f2(x)
y = x(1)+x(2)^2;

function y = evalfcn(x)
y(1) = f1(x);
y(2) = f2(x);
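
As a cross-check on the output of Select_P outside MATLAB, the following is a minimal Python sketch of a brute-force nondominated filter for the same two objectives. It is not a translation of the listing above: the helper names dominates and pareto_filter are hypothetical, and the sketch uses the usual dominance test (no worse in every objective, strictly better in at least one) rather than the componentwise strict comparison used in Step 4, so the two may disagree on ties.

```python
import random


def f1(x):
    # Negated objective, as in the MATLAB function f1
    return -(x[0] ** 2 + x[1])


def f2(x):
    return x[0] + x[1] ** 2


def evalfcn(x):
    return (f1(x), f2(x))


def dominates(fa, fb):
    """True if fa is no worse than fb everywhere and strictly better somewhere."""
    no_worse = all(a <= b for a, b in zip(fa, fb))
    better = any(a < b for a, b in zip(fa, fb))
    return no_worse and better


def pareto_filter(points):
    """Return the points whose objective vectors no other point dominates."""
    values = [evalfcn(p) for p in points]
    keep = []
    for i, fi in enumerate(values):
        if not any(dominates(fj, fi) for j, fj in enumerate(values) if j != i):
            keep.append(points[i])
    return keep


def genxmat(popsize, rng):
    # Uniform samples from the box [2, 6] x [5, 9], as in the MATLAB genxmat
    return [(rng.random() * 4 + 2, rng.random() * 4 + 5) for _ in range(popsize)]


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
population = genxmat(50, rng)
front = pareto_filter(population)
print(len(front), "nondominated points out of", len(population))
```

The filter compares every pair of points, which is quadratic in the population size; that is adequate for populations of the size used here.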
24.2
a. We proceed using contraposition. Assume that $x^*$ is not Pareto optimal. Therefore, there exists a point $\hat{x} \in \Omega$ such that
\[ f_i(\hat{x}) \le f_i(x^*) \quad \text{for all } i = 1, 2, \dots, \ell, \]
and for some $j$, $f_j(\hat{x}) < f_j(x^*)$. Since $c > 0$,
\[ c^\top \mathbf{f}(x^*) > c^\top \mathbf{f}(\hat{x}), \]
which implies that $x^*$ is not a global minimizer for the weighted-sum problem.
For the converse, consider the following counterexample: $\Omega = \{x \in \mathbb{R}^2 : \|x\| \ge 1,\ x \ge 0\}$ and $\mathbf{f}(x) = [x_1, x_2]^\top$. It is easy to see that the Pareto front is $\{x : \|x\| = 1,\ x \ge 0\}$ (i.e., the part of the unit circle in the nonnegative quadrant). So $x^* = (1/\sqrt{2})[1, 1]^\top$ is a Pareto minimizer. However, there is no $c > 0$ such that $x^*$ is a global minimizer of the weighted-sum problem. To see this, fix $c > 0$ (assuming $c_1 \le c_2$ without loss of generality) and consider the objective function value $c^\top \mathbf{f}(x^*) = (c_1 + c_2)/\sqrt{2}$ for the weighted-sum problem. Now, the point $x' = [1, 0]^\top$ is also a feasible point. Moreover,
\[ c^\top \mathbf{f}(x') = c_1 \le \frac{c_1 + c_2}{2} < \frac{c_1 + c_2}{\sqrt{2}} = c^\top \mathbf{f}(x^*), \]
where the strict inequality holds because $c_1 + c_2 > 0$. So $x^*$ is not a global minimizer of the weighted-sum problem.
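
The counterexample can be spot-checked numerically. The following Python sketch (an illustration only; the weight grid and the names weighted_sum and corners are arbitrary choices, not part of the exercise) evaluates the weighted-sum objective at the Pareto point (1/√2)[1, 1] and at the two feasible points [1, 0] and [0, 1], for a grid of positive weights. Checking both corner points removes the need for the assumption c1 ≤ c2.

```python
import math


def weighted_sum(c, x):
    # The objectives are f(x) = [x1, x2], so the weighted sum is c1*x1 + c2*x2
    return c[0] * x[0] + c[1] * x[1]


x_star = (1 / math.sqrt(2), 1 / math.sqrt(2))  # Pareto point on the unit circle
corners = [(1.0, 0.0), (0.0, 1.0)]             # feasible points with norm 1

# Grid of strictly positive weight vectors (c1, c2) with entries in 0.1, ..., 3.0
weights = [(a / 10, b / 10) for a in range(1, 31) for b in range(1, 31)]

# For each weight vector, the better of the two corners beats x_star strictly
beaten = all(
    min(weighted_sum(c, x) for x in corners) < weighted_sum(c, x_star)
    for c in weights
)
print(beaten)
```

A finite grid cannot replace the proof, but it confirms the strict inequality min{c1, c2} < (c1 + c2)/√2 on every sampled weight vector.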
b. We proceed using contraposition. Assume that $x^*$ is not Pareto optimal. Therefore, there exists a point $\hat{x} \in \Omega$ such that
\[ f_i(\hat{x}) \le f_i(x^*) \quad \text{for all } i = 1, 2, \dots, \ell, \]
and for some $j$, $f_j(\hat{x}) < f_j(x^*)$. By assumption, for all $i = 1, \dots, \ell$, $f_i(x^*) \ge 0$, which implies that
\[ \sum_{i=1}^{\ell} (f_i(x^*))^p > \sum_{i=1}^{\ell} (f_i(\hat{x}))^p \]
(because $p > 0$). Hence, $x^*$ is not a global minimizer for the minimum-norm problem.
For the converse, consider the following counterexample: $\Omega = \{x \in \mathbb{R}^2 : x_1 + 2x_2 \ge 2,\ x \ge 0\}$ and $\mathbf{f}(x) = [x_1, x_2]^\top$. It is easy to see that the Pareto front is $\{x : x_1 + 2x_2 = 2,\ x \ge 0\}$. So $x^* = [1, 1/2]^\top$ is a Pareto minimizer. However, there is no $p > 0$ such that $x^*$ is a global minimizer of the minimum-norm problem. To see this, fix $p > 0$ and consider the objective function value $(f_1(x^*))^p + (f_2(x^*))^p = 1 + (1/2)^p$ for the minimum-norm problem. Now, the point $x' = [0, 1]^\top$ is also a feasible point. Moreover, its objective function value is $0^p + 1^p = 1 < 1 + (1/2)^p$. So $x^*$ is not a global minimizer of the minimum-norm problem.
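
A quick numerical illustration of this counterexample, as a Python sketch under the assumption that the minimum-norm objective is the sum of p-th powers used in the argument above (the name power_sum and the list of exponents are hypothetical choices):

```python
def power_sum(x, p):
    # The objectives are f(x) = [x1, x2], so the objective is x1^p + x2^p
    return x[0] ** p + x[1] ** p


def feasible(x):
    # Constraint set of the counterexample: x1 + 2*x2 >= 2 and x >= 0
    return x[0] + 2 * x[1] >= 2 and x[0] >= 0 and x[1] >= 0


x_star = (1.0, 0.5)   # the Pareto point [1, 1/2]
x_prime = (0.0, 1.0)  # the competing feasible point [0, 1]

exponents = [0.25, 0.5, 1, 2, 3, 10]
gaps = [power_sum(x_star, p) - power_sum(x_prime, p) for p in exponents]
print(gaps)
```

Each gap equals (1/2)^p, which is positive for every p > 0, so the point [0, 1] has the strictly smaller objective value in every sampled case.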
c. For the first part, consider the following counterexample: $\Omega = \{x \in \mathbb{R}^2 : x_1 + x_2 \ge 2,\ x \ge 0\}$ and $\mathbf{f}(x) = [x_1, x_2]^\top$. The Pareto front is $\{x : x_1 + x_2 = 2,\ x \ge 0\}$, and $x^* = [1/2, 3/2]^\top$ is a Pareto minimizer. But
\[ \max\{f_1(x^*), f_2(x^*)\} = \max\{1/2, 3/2\} = 3/2. \]
However, $x' = [1, 1]^\top$ is also a feasible point, and $\max\{f_1(x'), f_2(x')\} = 1 < 3/2$. Hence, $x^*$ is not a global minimizer of the minimax problem.
For the second part, suppose $\Omega = \{x \in \mathbb{R}^2 : x_1 \le 2\}$ and $\mathbf{f}(x) = [x_1, 2]^\top$. Then, for any $x \in \Omega$, $\max\{f_1(x), f_2(x)\} = 2$. So any $x^* \in \Omega$ is a global minimizer of the minimax (single-objective) problem. However, consider another point $\hat{x} \in \Omega$ such that $\hat{x}_1 < x_1^*$. Then, $f_1(\hat{x}) < f_1(x^*)$ and $f_2(\hat{x}) = f_2(x^*)$. Hence, $x^*$ is not a Pareto minimizer.
In fact, in the above example, no Pareto minimizer exists. However, if we set $\Omega = \{x \in \mathbb{R}^2 : 1 \le x_1 \le 2\}$, then the counterexample is still valid, but in this case any point of the form $[1, x_2]^\top$ is a Pareto minimizer.
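
The second counterexample (objective vector [x1, 2] on the set x1 ≤ 2) can be illustrated with a short Python sketch. The sample grid and the particular points x_star and x_hat are arbitrary choices made here for illustration.

```python
def f(x):
    # Objective vector [x1, 2] from the second counterexample
    return (x[0], 2.0)


def minimax_value(x):
    return max(f(x))


def dominates(fa, fb):
    """True if fa is no worse than fb everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))


# Sample points with x1 <= 2 (x2 is unconstrained, so a few values suffice)
samples = [(x1 / 4, x2) for x1 in range(-20, 9) for x2 in (-1.0, 0.0, 3.0)]
values = {minimax_value(x) for x in samples}

x_star = (1.5, 0.0)  # a minimax minimizer, since every feasible point is one
x_hat = (1.0, 0.0)   # smaller first coordinate, same second objective
print(values, dominates(f(x_hat), f(x_star)))
```

Every sampled point attains the same minimax value, yet x_hat dominates x_star, which is the situation described in the second part.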
24.3
Let
\[ f(x) = c^{*\top} \mathbf{f}(x), \]
where $\mathbf{f} = [f_1, \dots, f_\ell]^\top$ and $x \in \Omega = \{x : h(x) = 0\}$. The function $f$ is convex because all the functions $f_i$ are convex and $c_i^* > 0$, $i = 1, 2, \dots, \ell$. We can represent the given first-order condition in the following form: for any feasible direction $d$ at $x^*$, we have
\[ d^\top \nabla f(x^*) \ge 0. \]
By Theorem 22.7, the point $x^*$ is a global minimizer of $f$ over $\Omega$. Therefore,
\[ f(x^*) \le f(x) \quad \text{for all } x \in \Omega. \]
That is,
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x) \quad \text{for all } x \in \Omega. \]
To finish the proof, we now assume that $x^*$ is not Pareto optimal and the above condition holds, and we derive a contradiction. Because, by assumption, $x^*$ is not Pareto optimal, there exists a point $\hat{x} \in \Omega$ such that
\[ f_i(\hat{x}) \le f_i(x^*) \quad \text{for all } i = 1, 2, \dots, \ell, \]
and for some $j$, $f_j(\hat{x}) < f_j(x^*)$. Since for all $i = 1, 2, \dots, \ell$, $c_i^* > 0$, we must have
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) > \sum_{i=1}^{\ell} c_i^* f_i(\hat{x}), \]
which contradicts the above condition, $\sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x)$ for all $x \in \Omega$. This completes the proof. (See also Exercise 24.2, part a.)
24.4
Let
\[ f(x) = c^{*\top} \mathbf{f}(x), \]
where $\mathbf{f} = [f_1, \dots, f_\ell]^\top$ and $x \in \Omega = \{x : h(x) = 0\}$. The function $f$ is convex because all the functions $f_i$ are convex and $c_i^* > 0$, $i = 1, 2, \dots, \ell$. We can represent the given Lagrange condition in the form
\[ Df(x^*) + \lambda^{*\top} Dh(x^*) = 0^\top, \]
\[ h(x^*) = 0. \]
By Theorem 22.8, the point $x^*$ is a global minimizer of $f$ over $\Omega$. Therefore,
\[ f(x^*) \le f(x) \quad \text{for all } x \in \Omega. \]
That is,
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x) \quad \text{for all } x \in \Omega. \]
To finish the proof, we now assume that $x^*$ is not Pareto optimal and the above condition holds, and we derive a contradiction. Because, by assumption, $x^*$ is not Pareto optimal, there exists a point $\hat{x} \in \Omega$ such that
\[ f_i(\hat{x}) \le f_i(x^*) \quad \text{for all } i = 1, 2, \dots, \ell, \]
and for some $j$, $f_j(\hat{x}) < f_j(x^*)$. Since for all $i = 1, 2, \dots, \ell$, $c_i^* > 0$, we must have
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) > \sum_{i=1}^{\ell} c_i^* f_i(\hat{x}), \]
which contradicts the above condition, $\sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x)$ for all $x \in \Omega$. This completes the proof. (See also Exercise 24.2, part a.)
24.5
Let
\[ f(x) = c^{*\top} \mathbf{f}(x), \]
where $\mathbf{f} = [f_1, \dots, f_\ell]^\top$ and $x \in \Omega = \{x : g(x) \le 0\}$. The function $f$ is convex because all the functions $f_i$ are convex and $c_i^* > 0$, $i = 1, 2, \dots, \ell$. We can represent the given KKT condition in the form
\[ \mu^* \ge 0, \]
\[ Df(x^*) + \mu^{*\top} Dg(x^*) = 0^\top, \]
\[ \mu^{*\top} g(x^*) = 0, \]
\[ g(x^*) \le 0. \]
By Theorem 22.9, the point $x^*$ is a global minimizer of $f$ over $\Omega$. Therefore,
\[ f(x^*) \le f(x) \quad \text{for all } x \in \Omega. \]
That is,
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x) \quad \text{for all } x \in \Omega. \]
To finish the proof, we now assume that $x^*$ is not Pareto optimal and the above condition holds, and we derive a contradiction. Because, by assumption, $x^*$ is not Pareto optimal, there exists a point $\hat{x} \in \Omega$ such that
\[ f_i(\hat{x}) \le f_i(x^*) \quad \text{for all } i = 1, 2, \dots, \ell, \]
and for some $j$, $f_j(\hat{x}) < f_j(x^*)$. Since for all $i = 1, 2, \dots, \ell$, $c_i^* > 0$, we must have
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) > \sum_{i=1}^{\ell} c_i^* f_i(\hat{x}), \]
which contradicts the above condition, $\sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x)$ for all $x \in \Omega$. This completes the proof. (See also Exercise 24.2, part a.)
24.6
Let
\[ f(x) = c^{*\top} \mathbf{f}(x), \]
where $\mathbf{f} = [f_1, \dots, f_\ell]^\top$ and $x \in \Omega = \{x : h(x) = 0,\ g(x) \le 0\}$. The function $f$ is convex because all the functions $f_i$ are convex and $c_i^* > 0$, $i = 1, 2, \dots, \ell$. We can represent the given KKT-type condition in the form
\[ \mu^* \ge 0, \]
\[ Df(x^*) + \lambda^{*\top} Dh(x^*) + \mu^{*\top} Dg(x^*) = 0^\top, \]
\[ \mu^{*\top} g(x^*) = 0, \]
\[ h(x^*) = 0, \]
\[ g(x^*) \le 0. \]
By Theorem 22.9, the point $x^*$ is a global minimizer of $f$ over $\Omega$. Therefore,
\[ f(x^*) \le f(x) \quad \text{for all } x \in \Omega. \]
That is,
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x) \quad \text{for all } x \in \Omega. \]
To finish the proof, we now assume that $x^*$ is not Pareto optimal and the above condition holds, and we derive a contradiction. Because, by assumption, $x^*$ is not Pareto optimal, there exists a point $\hat{x} \in \Omega$ such that
\[ f_i(\hat{x}) \le f_i(x^*) \quad \text{for all } i = 1, 2, \dots, \ell, \]
and for some $j$, $f_j(\hat{x}) < f_j(x^*)$. Since for all $i = 1, 2, \dots, \ell$, $c_i^* > 0$, we must have
\[ \sum_{i=1}^{\ell} c_i^* f_i(x^*) > \sum_{i=1}^{\ell} c_i^* f_i(\hat{x}), \]
which contradicts the above condition, $\sum_{i=1}^{\ell} c_i^* f_i(x^*) \le \sum_{i=1}^{\ell} c_i^* f_i(x)$ for all $x \in \Omega$. This completes the proof. (See also Exercise 24.2, part a.)
24.7
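The given minimax problem is equivalent to the problem given in the hint:
\[ \begin{array}{ll} \text{minimize} & z \\ \text{subject to} & f_i(x) - z \le 0, \quad i = 1, 2. \end{array} \]
Suppose $[x^{*\top}, z^*]^\top$ is a local minimizer for the above problem (which is equivalent to $x^*$ being a local minimizer for the original problem). Then, by the KKT Theorem, there exists $\mu^* \ge 0$, where $\mu^* \in \mathbb{R}^2$, such that
\[ [0^\top, 1] + \mu^{*\top} \begin{bmatrix} \nabla f_1(x^*)^\top & -1 \\ \nabla f_2(x^*)^\top & -1 \end{bmatrix} = 0^\top, \]
\[ \mu^{*\top} \begin{bmatrix} f_1(x^*) - z^* \\ f_2(x^*) - z^* \end{bmatrix} = 0. \]
Rewriting the first equation above, we get
\[ \mu_1^* \nabla f_1(x^*) + \mu_2^* \nabla f_2(x^*) = 0, \qquad \mu_1^* + \mu_2^* = 1. \]
Rewriting the second equation, and noting that each term $\mu_i^* (f_i(x^*) - z^*)$ is nonpositive so that a zero sum forces every term to vanish, we get
\[ \mu_i^* (f_i(x^*) - z^*) = 0, \quad i = 1, 2. \]
Suppose $f_i(x^*) < \max\{f_1(x^*), f_2(x^*)\}$, where $i \in \{1, 2\}$. Then, $z^* > f_i(x^*)$. Hence, by the above equation we conclude that $\mu_i^* = 0$.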
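
To make the three conditions concrete, here is a small Python sketch that evaluates them for two hand-picked scalar examples. Both examples, the multiplier values, and the helper name kkt_residuals are assumptions introduced here for illustration; they are not taken from the exercise.

```python
def kkt_residuals(grads, fvals, z, mu):
    """Residuals of the conditions derived above, for a scalar variable x.

    grads: derivatives of f1 and f2 at x*; fvals: values of f1 and f2 at x*.
    """
    stationarity = sum(m * g for m, g in zip(mu, grads))    # mu1*f1' + mu2*f2'
    normalization = sum(mu) - 1.0                           # mu1 + mu2 - 1
    slackness = [m * (fv - z) for m, fv in zip(mu, fvals)]  # mu_i*(f_i - z)
    return stationarity, normalization, slackness


# Case 1 (both objectives active): f1(x) = (x-1)^2, f2(x) = (x+1)^2.
# max{f1, f2} = (|x|+1)^2 is minimized at x* = 0 with z* = 1.
case1 = kkt_residuals(grads=[-2.0, 2.0], fvals=[1.0, 1.0], z=1.0, mu=[0.5, 0.5])

# Case 2 (f2 inactive): f1(x) = x^2, f2(x) = x^2 - 1.
# max{f1, f2} = x^2 is minimized at x* = 0 with z* = 0, and f2(0) < z*.
case2 = kkt_residuals(grads=[0.0, 0.0], fvals=[0.0, -1.0], z=0.0, mu=[1.0, 0.0])

print(case1)
print(case2)
```

In the second case the objective f2 lies strictly below the maximum at x* = 0, and the complementary slackness condition is met only with a zero multiplier on f2, in line with the conclusion above.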