Primal-Dual Interior-Point Algorithms for Semidefinite Optimization Based on a Simple Kernel Function ∗

G. Q. Wang†,  Y. Q. Bai‡,  C. Roos‡

May 22, 2004

† Department of Mathematics, College of Science, Shanghai University, Shanghai, 200436, China
‡ Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
e-mail: guoq_wang@hotmail.com, [y.bai, c.roos]@ewi.tudelft.nl

∗ The second author is on leave from the Department of Mathematics, Shanghai University, Shanghai 200436, China. She kindly acknowledges the support of the Dutch Organization for Scientific Research (NWO grant 613.000.110). Corresponding author.
Abstract

Interior-point methods (IPMs) for semidefinite optimization (SDO) have been studied intensively, due to their polynomial complexity and practical efficiency. Recently, J. Peng et al. [14, 15] introduced so-called self-regular kernel (and barrier) functions and designed primal-dual interior-point algorithms based on self-regular proximities for linear optimization (LO) problems. They also extended the approach for LO to SDO. In this paper we present a primal-dual interior-point algorithm for SDO problems based on a simple kernel function which was first introduced in [3]. The kernel function in this paper is not self-regular, because its growth term is only linear. We derive the complexity analysis for algorithms with large- and small-update methods. The complexity bounds are O(qn log(n/ε)) and O(q^2 √n log(n/ε)), respectively, which are as good as those in the linear case.

Keywords: semidefinite optimization, interior-point methods, primal-dual methods, large- and small-update methods, polynomial complexity.

AMS Subject Classification: 90C22, 90C31
1 Introduction
Semidefinite optimization problems (SDO) are convex optimization problems over the intersection of an
affine set and the cone of positive semidefinite matrices. In the past decade, SDO has been one of the most
active research areas in mathematical programming. There are two major factors that are responsible
for this increased interest in SDO. Firstly, SDO has found numerous applications in various fields, such
as statistics, structural design, electrical engineering and combinatorial optimization. Secondly, efficient
new algorithms, interior-point methods (IPMs), have led to increased interest both in the application and the research of SDO. In this paper we deal with so-called primal-dual IPMs. It is generally agreed that these IPMs are the most efficient from a computational point of view [2].
Many researchers have studied SDO and achieved plentiful and beautiful results. For an overview of these
results we refer to [18]. Several interior-point methods designed for LO have been successfully extended
to SDO. In particular, primal-dual interior-point algorithms are highly efficient both in theory and
in practice. An important work in this direction is the paper of Nesterov and Todd [13] who showed that
the primal-dual algorithm maintains its theoretical efficiency when the nonnegativity constraints in LO
are replaced by a convex cone, as long as the cone is homogeneous and self-dual, or in the terminology of
Nesterov and Todd, as long as the cone is self-scaled. Recently, J.Peng et al. [14, 15] introduced so-called
self-regular kernel functions. The prototype self-regular function in [14, 15] is given by

    Υp,q(t) = (t^{p+1} − 1)/(p(p + 1)) + (t^{1−q} − 1)/(q(q − 1)) + ((p − q)/(pq)) (t − 1),    (1)

where p ≥ 1 and q > 1. The parameter p is called the growth degree and q the barrier degree of the kernel function. Based on self-regular functions, they designed primal-dual interior-point algorithms for LO problems and also extended the approach to SDO. The complexity bounds obtained by them are O(√n log(n/ε)) and O(√n (log n) log(n/ε)), for small-update methods and large-update methods, respectively, which are currently the best known bounds. Motivated by their work, in this paper we present a primal-dual interior-point algorithm for SDO based on the kernel function

    ψ(t) = t − 1 + (t^{1−q} − 1)/(q − 1),    t > 0,    (2)

where q > 1 is a parameter. This kernel function determines a matrix-valued function ψ(V), which is defined in the usual way (see Section 2.1), and a real-valued function Ψ(V) defined as follows:

    Ψ(V) := Tr(ψ(V)),    V ∈ S^n_{++}.    (3)
The kernel function ψ(t) was introduced in [3] for LO; see also [4]. This kernel function does not belong to the family of self-regular functions, and as a consequence a large part of the analysis tools in [14, 15] no longer apply. In this paper we develop some new tools for the analysis of primal-dual interior-point algorithms based on ψ(t). Thanks to the special definition of the matrix functions ψ(V) and Ψ(V) and the nice properties of ψ(t), the analysis in this paper is much simpler than that in [14, 15]. We derive the complexity analysis for algorithms with large- and small-update methods. The complexity bounds are as good as the bounds for the linear case.
The outline of the paper is as follows. In Section 2 we first describe special matrix functions used in later
sections. Then we briefly recall the basic concepts of interior-point methods for SDO, such as central
path and NT search directions. In Section 3, we describe a primal-dual interior-point algorithm based on
Ψ(V ) for SDO. In Section 4, we present the properties of ψ(t) and study the matrix functions ψ(V ) and
Ψ(V ). We analyze the algorithm to derive the complexity bound with large- and small-update methods
in Section 5. Finally, some concluding remarks follow in Section 6.
Some notation used throughout the paper is as follows. R^n, R^n_+ and R^n_{++} denote the set of vectors with n components, the set of nonnegative vectors and the set of positive vectors, respectively. ∥·∥ denotes the Frobenius norm for matrices, and the 2-norm for vectors. R^{m×n} is the space of all m × n matrices. S^n, S^n_+ and S^n_{++} denote the cone of symmetric, symmetric positive semidefinite and symmetric positive definite n × n matrices, respectively. E denotes the n × n identity matrix. The Löwner partial order "⪰" on positive semidefinite (or positive definite) matrices means A ⪰ B (A ≻ B) if A − B is positive semidefinite (or positive definite). We use the matrix inner product A • B = Tr(A^T B). For any symmetric positive definite matrix Q ∈ S^n_{++}, the expression Q^{1/2} denotes its symmetric square root; similarly we can define the power Q^α for any Q ≻ 0 and α ∈ R. When λ is a vector we denote the diagonal matrix Λ with entries λi by diag(λ). For any V ∈ S^n_{++}, we denote by λ(V) the vector of eigenvalues of V arranged in non-increasing order, that is, λ1(V) ≥ λ2(V) ≥ ... ≥ λn(V). For any matrix M, we denote by σ1(M) ≥ σ2(M) ≥ ... ≥ σn(M) the singular values of M. In particular, if M is symmetric, then σi(M) = |λi(M)|, i = 1, 2, ..., n.
2 Preliminaries

2.1 Special matrix functions
To introduce the special matrix functions that will be useful for designing the primal-dual interior-point algorithm, we first recall some known facts from linear algebra.
Theorem 2.1 [Spectral theorem for symmetric matrices, see [18]] The real n × n matrix A is symmetric if and only if there exists an orthogonal basis with respect to which A is real and diagonal, i.e., if and only if there exists a matrix U ∈ R^{n×n} such that U^T U = E and U^T AU = Λ, where Λ is a diagonal matrix.  □

The columns ui of U are the eigenvectors of A, satisfying

    Aui = λi ui,    i = 1, ..., n,

where λi is the i-th diagonal entry of Λ.
Now we are ready to show how a matrix function can be obtained from ψ(t).
Definition 2.2 Let V ∈ S^n_{++} be any symmetric positive definite n × n matrix and let

    V = Q^T diag(λ(V)) Q,

where Q is any orthonormal matrix that diagonalizes V. Let ψ(t) be defined as in (2). The matrix function ψ(V) : S^n_{++} → S^n is defined by

    ψ(V) = Q^T diag(ψ(λ1(V)), ψ(λ2(V)), ..., ψ(λn(V))) Q.    (4)
Then we define the real-valued function Ψ(V) : S^n_{++} → R+ as follows.

Definition 2.3

    Ψ(V) = Tr(ψ(V)) = Σ_{i=1}^{n} ψ(λi(V)),    (5)

where ψ(V) is given by (4).
Since ψ(t) is three times differentiable, the derivatives ψ'(t), ψ''(t) and ψ'''(t) are well-defined for t > 0. Hence, replacing ψ(λi(V)) in (4) by ψ'(λi(V)), ψ''(λi(V)) and ψ'''(λi(V)), respectively, we obtain the matrix functions ψ'(V), ψ''(V) and ψ'''(V). Note that ψ(V) depends only on the restriction of ψ(t) to the spectrum (the set of eigenvalues) of V (see p. 278 in [5]).
Just as in the linear case, we call ψ(t) the kernel function for the matrix functions ψ(V ) and Ψ(V ).
Moreover, we call both ψ(V ) and Ψ(V ) matrix barrier functions.
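To make Definitions 2.2 and 2.3 concrete, the following sketch (Python with NumPy; the helper names psi, psi_matrix and Psi are ours, not the paper's) evaluates ψ(V) and Ψ(V) through an eigendecomposition. NumPy's eigh returns V = QΛQ^T rather than Q^TΛQ, so the roles of Q and Q^T are interchanged with respect to Definition 2.2; this does not affect the result.

```python
import numpy as np

def psi(t, q=2.0):
    # kernel function (2): psi(t) = t - 1 + (t^(1-q) - 1)/(q - 1), for t > 0
    return t - 1.0 + (t**(1.0 - q) - 1.0) / (q - 1.0)

def psi_matrix(V, q=2.0):
    # Definition 2.2: apply psi to the eigenvalues of V and transform back
    lam, Q = np.linalg.eigh(V)          # V = Q diag(lam) Q^T, all lam > 0 assumed
    return Q @ np.diag(psi(lam, q)) @ Q.T

def Psi(V, q=2.0):
    # Definition 2.3: Psi(V) = Tr(psi(V)) = sum_i psi(lambda_i(V))
    return float(np.sum(psi(np.linalg.eigvalsh(V), q)))
```

Both functions vanish at V = E, since ψ(1) = 0.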
The following lemma can be found in [15]. For completeness’ sake we include the proof below.
Lemma 2.4 Ψ(V) is strictly convex with respect to V ≻ 0 and vanishes at its global minimal point V = E, i.e., ψ(E) = ψ'(E) = 0_{n×n}. Moreover, Ψ(E) = 0.
Proof: Firstly, we prove that Ψ(V) is strictly convex for V ≻ 0, i.e., for any V1 ≠ V2 with V1, V2 ≻ 0, the following inequality holds:

    Ψ((V1 + V2)/2) < (1/2)(Ψ(V1) + Ψ(V2)).    (6)

Since both V1 and V2 are positive definite, so is the matrix (1/2)(V1 + V2). Using Theorem 2.1, there exist orthogonal matrices Q, Q̃1, Q̃2 and diagonal matrices Λ, Λ1, Λ2 such that

    Λ = diag(λ) = (1/2) Q(V1 + V2)Q^T = (1/2) QV1Q^T + (1/2) QV2Q^T = (1/2) QQ̃1Λ1Q̃1^T Q^T + (1/2) QQ̃2Λ2Q̃2^T Q^T.

Denoting Q1 = QQ̃1 and Q2 = QQ̃2, we have

    Λ = (1/2)(Q1Λ1Q1^T + Q2Λ2Q2^T),    (7)

where Λ1 = diag(λ^1) and Λ2 = diag(λ^2) are positive diagonal matrices, and λ, λ^1, λ^2 are vectors whose components are the eigenvalues of (1/2)(V1 + V2), V1 and V2, respectively. We denote

    M = ([Q1]^2_{ij})_{n×n}    and    N = ([Q2]^2_{ij})_{n×n}.

From (7) one easily gets

    λ = (1/2)(M λ^1 + N λ^2).
Using the orthogonality of the matrices Q1 and Q2, one has

    Σ_{i=1}^{n} Mij = Σ_{i=1}^{n} Nij = 1, j = 1, 2, ..., n,    Σ_{j=1}^{n} Mij = Σ_{j=1}^{n} Nij = 1, i = 1, 2, ..., n.
So M and N are doubly stochastic matrices. From the strict convexity of ψ(t) and the fact that V1 ≠ V2 ≻ 0, we readily obtain

    Ψ((V1 + V2)/2) = Σ_{i=1}^{n} ψ(λi) = Σ_{i=1}^{n} ψ((Mi λ^1 + Ni λ^2)/2) < (1/2) (Σ_{i=1}^{n} ψ(Mi λ^1) + Σ_{i=1}^{n} ψ(Ni λ^2)),

where Mi (Ni) denotes the i-th row of M (N), using the orthogonality of the matrices Q1 and Q2 and the convexity of ψ(t) again. Since M is doubly stochastic, we have
    Σ_{i=1}^{n} ψ(Mi λ^1) ≤ Σ_{i=1}^{n} Σ_{j=1}^{n} Mij ψ(λ^1_j) = Σ_{j=1}^{n} ψ(λ^1_j) = Ψ(V1);

similarly, one has

    Σ_{i=1}^{n} ψ(Ni λ^2) ≤ Ψ(V2).
The above two inequalities yield the desired relation (6). Since ψ(1) = ψ'(1) = 0, by using (4) one can easily verify the other statements.  □
Moreover, we need another concept related to matrix functions. The relevant results on matrices of functions can be found in [7, 9].
Definition 2.5 A matrix M (t) is said to be a matrix of functions if each entry of M (t) is a function of
t, i.e., M (t) = [Mij (t)].
First, we state two inequalities, which are used in the proof of Lemma 5.2. If M, N ∈ S^n, then

    |Tr(MN)| ≤ |λ1(M)| Σ_{i=1}^{n} |λi(N)|.    (8)

Furthermore, if M1 ⪯ M2 and N ⪰ 0, then

    Tr(M1 N) ≤ Tr(M2 N).    (9)
The usual concepts of continuity, differentiability and integrability can be naturally extended to matrices of functions, by interpreting them entry-wise. Let M(t) and N(t) be two matrices of functions. Then one easily verifies that

    (d/dt) M(t) = [(d/dt) Mij(t)] = M'(t),    (10)

    (d/dt) Tr(M(t)) = Tr((d/dt) M(t)) = Tr(M'(t)),    (11)

    (d/dt) Tr(ψ(M(t))) = Tr(ψ'(M(t)) M'(t)),    (12)

    (d/dt) (M(t)N(t)) = [(d/dt) M(t)] N(t) + M(t) [(d/dt) N(t)] = M'(t)N(t) + M(t)N'(t).    (13)
Remark 2.6 In the rest of the paper, when we use the function ψ(·) and its derivatives ψ'(·) and ψ''(·), these denote matrix functions if the argument is a matrix, and univariate functions if the argument is in R+.
2.2 The central path for SDO

We consider the standard form for SDO problems:

    (P)    minimize C • X
           subject to Ai • X = bi, i = 1, 2, ..., m,  X ⪰ 0,    (14)

and its dual:

    (D)    maximize b^T y
           subject to Σ_{i=1}^{m} yi Ai + S = C,  S ⪰ 0,    (15)

where each Ai ∈ S^n, b ∈ R^m, and C ∈ S^n. The matrices Ai are linearly independent.
We assume that (P) and (D) satisfy the interior-point condition (IPC), i.e., there exist X ∈ P and S ∈ D with X ≻ 0 and S ≻ 0, where P and D denote the feasible sets of (P) and (D), respectively. Under the assumption of the IPC, the optimality conditions for (P) and (D) can be written as follows:

    Ai • X = bi, i = 1, 2, ..., m,  X ⪰ 0,
    Σ_{i=1}^{m} yi Ai + S = C,  S ⪰ 0,    (16)
    XS = 0.
We modify the above system by relaxing the third equation as follows:

    Ai • X = bi, i = 1, 2, ..., m,  X ⪰ 0,
    Σ_{i=1}^{m} yi Ai + S = C,  S ⪰ 0,    (17)
    XS = µE,

where µ > 0 and E is the n × n identity matrix. Under the assumption that (P) and (D) satisfy the IPC (this can be achieved via the so-called self-dual embedding technique introduced by E. de Klerk et al., see [18]), the system (17) has a unique solution, denoted by (X(µ), y(µ), S(µ)). We call X(µ) the µ-center of (P) and (y(µ), S(µ)) the µ-center of (D). The set of µ-centers (with µ running through the positive real numbers) forms a homotopy path, which is called the central path of (P) and (D). If µ → 0 then the limit of the central path exists, and since the limit points satisfy the complementarity condition, the limit yields optimal solutions for (P) and (D). The relevance of the central path for LO and SDO has been discussed in many papers; see, e.g., [18], [10], and [17].
2.3 Search directions determined by kernel functions

The core idea of a primal-dual interior-point algorithm is to follow the central path and to approach the optimal set of the SDO problem by letting µ go to zero. A direct application of Newton's method to (17) produces the following equations for the search direction (∆X, ∆y, ∆S):

    Ai • ∆X = 0, i = 1, 2, ..., m,
    Σ_{i=1}^{m} ∆yi Ai + ∆S = 0,    (18)
    X∆S + ∆XS = µE − XS.
It can be shown that this system has a unique solution, see [18]. We can rewrite system (18) as follows:

    Ai • ∆X = 0, i = 1, 2, ..., m,
    Σ_{i=1}^{m} ∆yi Ai + ∆S = 0,    (19)
    ∆X + X∆SS^{−1} = µS^{−1} − X.
It is obvious that ∆S is symmetric, due to the second equation in (19). However, a crucial observation is that ∆X is not necessarily symmetric, because X∆SS^{−1} may not be symmetric. Many researchers have proposed methods for symmetrizing the third equation in the above Newton system such that the resulting new system has a unique symmetric solution. Among these, the following three directions are the most popular ones: the direction introduced by Alizadeh, Haeberly and Overton (AHO) in [1], the direction introduced by Helmberg et al., Kojima et al. and Monteiro (HKM) in [6, 8, 11], and the direction introduced by Nesterov and Todd (NT) in [12, 13]; they are called the AHO, HKM, and NT directions, respectively. In this paper we use the symmetrization scheme from which the NT direction [12] is derived. One important reason for this is that the NT scaling technique transfers the primal variable X and the dual variable S into the same space, the so-called V-space. In the NT symmetrization scheme the term X∆SS^{−1} in the third equation is replaced by P∆SP^T, where

    P := X^{1/2}(X^{1/2}SX^{1/2})^{−1/2}X^{1/2} = S^{−1/2}(S^{1/2}XS^{1/2})^{1/2}S^{−1/2}.
Then the above system is replaced by the system

    Ai • ∆X = 0, i = 1, 2, ..., m,
    Σ_{i=1}^{m} ∆yi Ai + ∆S = 0,    (20)
    ∆X + P∆SP^T = µS^{−1} − X.

Obviously, now ∆X is a symmetric matrix, and system (20) still has a unique solution (see [18]). Let D = P^{1/2}. Then the matrix D can be used to scale X and S to the same matrix V, because

    V := (1/√µ) D^{−1}XD^{−1} = (1/√µ) DSD.    (21)

Therefore we have

    V^2 := (1/µ) D^{−1}XSD.    (22)

Note that the matrices D and V are symmetric and positive definite. Let us further define

    Āi := DAiD, i = 1, 2, ..., m;    DX := (1/√µ) D^{−1}∆XD^{−1};    DS := (1/√µ) D∆SD.    (23)
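As an illustration only, the NT scaling and the matrix V of (21) can be computed numerically as in the following sketch (the function name nt_scaling is ours, and SciPy's sqrtm is assumed for matrix square roots):

```python
import numpy as np
from scipy.linalg import sqrtm

def nt_scaling(X, S, mu):
    # P := X^(1/2) (X^(1/2) S X^(1/2))^(-1/2) X^(1/2), the NT scaling matrix
    Xh = np.real(sqrtm(X))
    P = Xh @ np.linalg.inv(np.real(sqrtm(Xh @ S @ Xh))) @ Xh
    D = np.real(sqrtm(P))                   # D = P^(1/2)
    Dinv = np.linalg.inv(D)
    V = Dinv @ X @ Dinv / np.sqrt(mu)       # (21); up to round-off equals D S D / sqrt(mu)
    return P, D, V
```

Checking that D^{−1}XD^{−1} and DSD agree up to round-off is a convenient consistency test for an implementation.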
Then it follows from (20) that the (scaled) NT search direction (DX, ∆y, DS) satisfies the system

    Āi • DX = 0, i = 1, 2, ..., m,
    Σ_{i=1}^{m} ∆yi Āi + DS = 0,    (24)
    DX + DS = V^{−1} − V.
So far we have described the scheme that defines the classical NT direction. Now, following [15], we change to the new approach of this paper. Given the kernel function ψ(t) and the associated ψ(V) and ψ'(V) as defined in Definition 2.2, we replace the right-hand side V^{−1} − V in the third equation of (24) by −ψ'(V). Thus we consider the following system:

    Āi • DX = 0, i = 1, 2, ..., m,
    Σ_{i=1}^{m} ∆yi Āi + DS = 0,    (25)
    DX + DS = −ψ'(V).
It is straightforward to verify that the system (25) has a unique solution. Having DX and DS, we can calculate ∆X and ∆S from (23). We will argue below that the matrix function ψ(V) determines in a natural way an interior-point algorithm. Since DX and DS are orthogonal, that is,

    Tr(DX DS) = Tr(DS DX) = 0,    (26)

we have

    DX = DS = 0_{n×n} ⇔ ψ'(V) = 0_{n×n} ⇔ V = E ⇔ Ψ(V) = 0,

i.e., if and only if XS = µE, which holds if and only if X = X(µ) and S = S(µ), as it should be. Otherwise Ψ(V) > 0. Hence, if (X, y, S) ≠ (X(µ), y(µ), S(µ)) then (∆X, ∆y, ∆S) is nonzero. By taking a step along the search direction, with step size α defined by some line search rule, one constructs a new triple (X, y, S) according to

    X+ = X + α∆X,    y+ = y + α∆y,    S+ = S + α∆S.    (27)

3 A primal-dual algorithm for SDO
We can now describe the algorithm in a more formal way. The generic form of this algorithm is shown
in Figure 1. It is clear from this description that closeness of (X, y, S) to (X(µ), y(µ), S(µ)) is measured
by the value of Ψ(V ), with τ as a threshold value: if Ψ(V ) ≤ τ then we start a new outer iteration by
performing a µ-update, otherwise we enter an inner iteration by computing the search directions at the
current iterates with respect to the current value of µ and apply (27) to get new iterates.
The parameters τ, θ and the step size α should be chosen in such a way that the algorithm is 'optimized' in the sense that the number of iterations required by the algorithm is as small as possible.
Primal-Dual Algorithm for SDO

Input:
  a threshold parameter τ ≥ 1;
  an accuracy parameter ε > 0;
  a fixed barrier update parameter θ, 0 < θ < 1;
  a strictly feasible pair (X^0, S^0) and µ^0 = 1 such that Ψ(X^0, S^0, µ^0) ≤ τ.
begin
  X := X^0; S := S^0; µ := µ^0;
  while nµ ≥ ε do
  begin
    µ := (1 − θ)µ;
    while Ψ(X, S, µ) > τ do
    begin
      solve system (25) and use (23) to obtain ∆X, ∆y, ∆S;
      determine a step size α;
      X := X + α∆X;
      S := S + α∆S;
      y := y + α∆y;
      V := (1/√µ)(D^{−1}XSD)^{1/2};
    end
  end
end

Figure 1: Algorithm
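A minimal sketch of the loop structure of Figure 1 follows (Python; solve_system, step_size and Psi are placeholder callables of our own, standing in for the solution of system (25) together with (23), the step-size rule analyzed in Section 5, and the proximity Ψ computed from (X, S, µ) via (21)):

```python
def primal_dual_ipm(X, y, S, solve_system, step_size, Psi, n,
                    tau=3.0, eps=1e-8, theta=0.5, q=2.0):
    # generic primal-dual algorithm of Figure 1; (X, S) strictly feasible, mu0 = 1
    mu = 1.0
    while n * mu >= eps:
        mu *= (1.0 - theta)                  # outer iteration: mu-update
        while Psi(X, S, mu, q) > tau:        # inner iterations
            dX, dy, dS = solve_system(X, y, S, mu, q)   # system (25) plus (23)
            alpha = step_size(X, S, mu, q)
            X = X + alpha * dX
            y = y + alpha * dy
            S = S + alpha * dS
    return X, y, S
```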
The choice of the so-called barrier update parameter θ plays an important role both in the theory and practice of IPMs. Usually, if θ is a constant independent of the dimension n of the problem, for instance θ = 1/2, then we call the algorithm a large-update (or long-step) method. If θ depends on the dimension of the problem, such as θ = 1/√n, then the algorithm is called a small-update (or short-step) method.
The choice of the step size α (0 ≤ α ≤ 1) is another crucial issue in the analysis of the algorithm. It
has to be taken such that the closeness of the iterates to the current µ-center improves by a sufficient
amount. In the theoretical analysis the step size α is usually given a value that depends on the closeness
of the current iterates to the µ-center.
4 Properties of the kernel (barrier) function

In this section, we study properties of the kernel function ψ(t) and the barrier function Ψ(V).

4.1 Properties of ψ(t)
In this subsection we recall some properties of ψ(t) from [3] and [4]. The first three derivatives of ψ(t) are

    ψ'(t) = 1 − 1/t^q,    ψ''(t) = q/t^{q+1} > 0,    ψ'''(t) = −q(q + 1)/t^{q+2}.    (28)

It is quite straightforward to verify that

    ψ(1) = ψ'(1) = 0,    ψ''(t) > 0 for t > 0,    lim_{t→0} ψ(t) = lim_{t→∞} ψ(t) = +∞.

Moreover, from (28), ψ(t) is strictly convex and ψ''(t) is monotonically decreasing in t ∈ (0, +∞).
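As a quick symbolic check of (28) and of ψ(1) = ψ'(1) = 0 (a verification aid only, using SymPy):

```python
import sympy as sp

t, q = sp.symbols('t q', positive=True)
psi = t - 1 + (t**(1 - q) - 1) / (q - 1)      # kernel function (2)

d1, d2, d3 = (sp.simplify(sp.diff(psi, t, k)) for k in (1, 2, 3))
print(d1)                                     # expected: 1 - t**(-q)
print(d2)                                     # expected: q*t**(-q - 1)
print(d3)                                     # expected: -q*(q + 1)*t**(-q - 2)
print(psi.subs(t, 1), d1.subs(t, 1))          # both should be 0
```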
Lemma 4.1 [Lemma 2.3 in [3]] Let t1 > 0 and t2 > 0. Then

    ψ(√(t1 t2)) ≤ (1/2)(ψ(t1) + ψ(t2)).  □
Lemma 4.2 One has

    ψ'(t)^2 ≥ ψ'(1/t)^2,    0 < t ≤ 1.    (29)

Proof: When 0 < t ≤ 1, we have

    ψ'(t)^2 − ψ'(1/t)^2 = (1 − 1/t^q)^2 − (1 − t^q)^2 = (1/t^q + t^q − 2)(1/t^q − t^q) ≥ 0,

which implies the lemma.  □
Lemma 4.3 [Lemma 2.6 in [4]] One has

    ψ(t) < (1/2) ψ''(1)(t − 1)^2,    if t ≥ 1.  □

Lemma 4.4 [Lemma 3.1 in [4]] Suppose that ψ(t1) = ψ(t2) with t1 ≤ 1 ≤ t2 and β ≥ 1. Then ψ(βt1) ≤ ψ(βt2). Equality holds if and only if β = 1 or t1 = t2 = 1.  □
Lemma 4.5 Let % : [0, ∞) → [1, ∞) be the inverse function of ψ(t) for t ≥ 1. If q > 1, one has

    1 + s ≤ %(s) ≤ s + q/(q − 1).    (30)

If q ≥ 2, one has

    %(s) ≤ 1 + √(s^2 + (q/(q − 1)) s).    (31)

Proof: When q > 1, we have

    t = %(s)  ⇔  ψ(t) = t − 1 + (t^{1−q} − 1)/(q − 1) = s,    t ≥ 1.

Using t ≥ 1 one easily sees that

    1 + s ≤ %(s) ≤ s + q/(q − 1).

Now we consider the case q ≥ 2. We first establish that tψ(t) ≥ (t − 1)^2 for t ≥ 1. This goes as follows. Defining f(t) = tψ(t) − (t − 1)^2, one has f(1) = 0 and f'(t) = ψ(t) + tψ'(t) − 2(t − 1). Hence f'(1) = 0 and f''(t) = 2ψ'(t) + tψ''(t) − 2. Since f''(t) = (q − 2)t^{−q} ≥ 0, the claim follows. Hence we have

    t ≤ 1 + √(tψ(t)).

Now substituting t ≤ ψ(t) + q/(q − 1), which follows from (30), we obtain

    %(s) ≤ 1 + √(s^2 + (q/(q − 1)) s).

This completes the proof of the lemma.  □
Remark 4.6 (31) is tighter than (30) at s = 0. This will help us in getting a sharp iteration bound for
small-update methods.
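For intuition, %(s) can also be evaluated numerically by inverting ψ on [1, ∞), e.g. by bisection, and compared with the bounds (30) and (31); the sketch below (helper names are ours) does this for q = 2 and s = 1.5.

```python
import numpy as np

def psi(t, q=2.0):
    return t - 1.0 + (t**(1.0 - q) - 1.0) / (q - 1.0)

def varrho(s, q=2.0, tol=1e-12):
    # the unique t >= 1 with psi(t) = s, computed by bisection
    lo, hi = 1.0, 2.0
    while psi(hi, q) < s:                     # bracket the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if psi(mid, q) >= s else (mid, hi)
    return 0.5 * (lo + hi)

q, s = 2.0, 1.5
print(1 + s, varrho(s, q), s + q / (q - 1))                 # lower and upper bound (30)
print(varrho(s, q), 1 + np.sqrt(s**2 + q / (q - 1) * s))    # upper bound (31)
```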
4.2 Properties of Ψ(V)
The following lemmas are crucial for the analysis of the SDO algorithm to be stated below.
We need a result from [7] that we state without proof.
Lemma 4.7 [Lemma 3.3.14(c) in [7]] Let M, N ∈ S^n be two nonsingular matrices and let f(t) be a given real-valued function such that f(e^t) is convex. One has

    Σ_{i=1}^{n} f(σi(MN)) ≤ Σ_{i=1}^{n} f(σi(M)σi(N)),    (32)

where σi(M), i = 1, 2, ..., n, denote the singular values of M.  □

We can apply the above lemma to ψ(t), because it is easily verified that ψ(e^t) is a convex function.
Lemma 4.8 [Proposition 5.2.6 in [15]] Suppose that the matrices V1 and V2 are symmetric positive definite. Then

    Ψ([V1^{1/2} V2 V1^{1/2}]^{1/2}) ≤ (1/2)(Ψ(V1) + Ψ(V2)).    (33)

Proof: For any nonsingular matrix V,

    σi(V) = λi((V^T V)^{1/2}) = λi((V V^T)^{1/2}),    i = 1, 2, ..., n.

From the above equality, we have

    σi(V1^{1/2} V2^{1/2}) = λi((V1^{1/2} V2 V1^{1/2})^{1/2}) = λi([V1^{1/2} V2 V1^{1/2}]^{1/2}),    i = 1, 2, ..., n.

Since V1 and V2 are symmetric positive definite, one has

    σi(V1) = λi(V1),    σi(V2) = λi(V2),    i = 1, 2, ..., n.

Using the definition of Ψ(V), Lemma 4.7 and Lemma 4.1, one has

    Ψ([V1^{1/2} V2 V1^{1/2}]^{1/2}) = Σ_{i=1}^{n} ψ(σi(V1^{1/2} V2^{1/2})) ≤ Σ_{i=1}^{n} ψ(σi(V1^{1/2}) σi(V2^{1/2}))
        ≤ (1/2) Σ_{i=1}^{n} (ψ(σi^2(V1^{1/2})) + ψ(σi^2(V2^{1/2}))) = (1/2) Σ_{i=1}^{n} (ψ(σi(V1)) + ψ(σi(V2))) = (1/2)(Ψ(V1) + Ψ(V2)).

This completes the proof of the lemma.  □
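A quick numerical check of (33) on random positive definite matrices can build confidence in the inequality (a sketch; the inline Psi below is the barrier function of Definition 2.3 and sqrtm is SciPy's matrix square root):

```python
import numpy as np
from scipy.linalg import sqrtm

def Psi(V, q=2.0):
    # barrier function (5) applied to the eigenvalues of V
    lam = np.linalg.eigvalsh(V)
    return float(np.sum(lam - 1.0 + (lam**(1.0 - q) - 1.0) / (q - 1.0)))

rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((2, 5, 5))
V1 = A1 @ A1.T + np.eye(5)                  # two random symmetric positive definite matrices
V2 = A2 @ A2.T + np.eye(5)

V1h = np.real(sqrtm(V1))
W = np.real(sqrtm(V1h @ V2 @ V1h))          # [V1^(1/2) V2 V1^(1/2)]^(1/2)
print(Psi(W) <= 0.5 * (Psi(V1) + Psi(V2)))  # inequality (33); should print True
```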
In the analysis of the algorithm we also use the norm-based proximity measure δ(V), defined by

    δ(V) := (1/2)∥ψ'(V)∥ = (1/2)√(Σ_{i=1}^{n} ψ'(λi(V))^2) = (1/2)∥DX + DS∥.    (34)

Obviously, since DX ⊥ DS, we have

    ∥DX + DS∥^2 = ∥DX∥^2 + ∥DS∥^2.    (35)

Moreover, recall that Ψ(V) is strictly convex and minimal at V = E, where the minimal value is zero. So we have

    Ψ(V) = 0 ⇔ δ(V) = 0 ⇔ V = E.
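The proximity measure (34) is straightforward to evaluate from the eigenvalues of V; a minimal sketch (function names are ours) follows.

```python
import numpy as np

def d_psi(t, q=2.0):
    # psi'(t) = 1 - t^(-q), see (28)
    return 1.0 - t**(-q)

def delta(V, q=2.0):
    # norm-based proximity measure (34): one half the Frobenius norm of psi'(V)
    lam = np.linalg.eigvalsh(V)
    return 0.5 * np.sqrt(float(np.sum(d_psi(lam, q)**2)))
```

At V = E all eigenvalues equal 1 and ψ'(1) = 0, so delta returns 0, in agreement with the equivalences above.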
The following lemma gives a bound of δ(V ) in terms of Ψ(V ).
Lemma 4.9 One has

    δ(V) ≥ (1/2)(1 − 1/(1 + Ψ(V))^q),    V ≻ 0.    (36)
Proof: The statement in the lemma is obvious if V = E, since then δ(V) = Ψ(V) = 0; otherwise we have δ(V) > 0 and Ψ(V) > 0. To deal with the nontrivial case we consider, for K > 0, the problem

    z_K = min{ δ(V)^2 = (1/4) Σ_{i=1}^{n} ψ'(λi(V))^2 : Ψ(V) = K }.

The first order optimality conditions become

    (1/2) ψ'(λi(V)) ψ''(λi(V)) = γ ψ'(λi(V)),    i = 1, 2, ..., n,

where γ ∈ R is the Lagrange multiplier. Since K > 0 we have λi(V) ≠ 1 for at least one i. Therefore γ ≠ 0. It follows that, for each i, either ψ'(λi(V)) = 0 or ψ''(λi(V)) = 2γ. Since ψ''(t) is monotonically decreasing, this implies that all λi(V) for which ψ''(λi(V)) = 2γ have the same value. Denoting this value as η, and observing that all other eigenvalues have value 1, we conclude that, after reordering the eigenvalues, V has the form

    V = Q^T diag(η, ..., η, 1, ..., 1) Q.

Now Ψ(V) = K implies kψ(η) = K, where k is the number of eigenvalues equal to η. Given k, this uniquely determines ψ(η), and we have

    4δ(V)^2 = kψ'(η)^2 = k(1 − 1/η^q)^2,    ψ(η) = K/k.

Note that the equation ψ(η) = K/k has two solutions, one smaller than 1 and one larger than 1, which can be written as η = λ1(V) and η = λ2(V) with 0 < λ1(V) < 1 and λ2(V) > 1. By Lemma 4.2 we have ψ'(λ1(V))^2 ≥ ψ'(1/λ1(V))^2. Since we are minimizing δ(V)^2, we conclude that η > 1. From the definition of ψ(t) we deduce that ψ(η) ≤ η − 1 for η > 1. Hence we obtain K/k ≤ η − 1, whence η ≥ 1 + K/k. Since ψ'(η) is monotonically increasing, we conclude that

    4δ(V)^2 = kψ'(η)^2 = k(1 − 1/η^q)^2 ≥ k(1 − 1/(1 + K/k)^q)^2 ≥ (1 − 1/(1 + K)^q)^2.

The last inequality is due to the fact that the middle expression is increasing in k. Thus we have

    δ(V) ≥ (1/2)(1 − 1/(1 + K)^q).

Substituting Ψ(V) for K, the inequality of the lemma follows.  □
Note that during the course of the algorithm the largest values of Ψ(V ) occur just after the update of
µ. So next we derive an estimate for the effect of a µ-update on the value of Ψ(V ). We start with an
important lemma.
Lemma 4.10 Let % : [0, ∞) → [1, ∞) be the inverse function of ψ(t) for t ≥ 1. Then we have, for any positive definite matrix V and any β ≥ 1,

    Ψ(βV) ≤ nψ(β %(Ψ(V)/n)).    (37)

Proof: The inequality in the lemma is obvious if β = 1 or V = E. So we may assume below that β > 1 and V ≠ E. We consider the following maximization problem:

    max_V { Ψ(βV) : Ψ(V) = K },

where K is any nonnegative number. The first order optimality conditions for this problem are

    βψ'(βλi(V)) = γψ'(λi(V)),    i = 1, ..., n,

where γ denotes the Lagrange multiplier. Since ψ'(1) = 0 and βψ'(β) > 0, we must have λi(V) ≠ 1 for all i. We may even assume that λi(V) > 1 for all i. To see this, let ki be such that ψ(λi(V)) = ki. Given ki, this equation has two solutions: λi(V)^{(1)} < 1 and λi(V)^{(2)} > 1. As a consequence of Lemma 4.4 we have ψ(βλi(V)^{(1)}) ≤ ψ(βλi(V)^{(2)}). Since we are maximizing Ψ(βV), we conclude that λi(V) = λi(V)^{(2)} > 1. Thus we have, for all i,

    β(1 − 1/(βλi(V))^q) = γ(1 − 1/λi(V)^q),    λi(V) > 1.

Using this and β > 1, we easily obtain that γ > β > 1 and that all eigenvalues λi(V) are equal and given by

    λi(V) = ((γβ^{q−1} − 1)/(β^{q−1}(γ − β)))^{1/q},    i = 1, 2, ..., n.

Denoting their common value as t, we deduce from Ψ(V) = K that nψ(t) = K. This implies t = %(K/n). Hence the maximal value that Ψ(βV) can attain is given by

    Ψ(βtE) = nψ(βt) = nψ(β%(K/n)) = nψ(β%(Ψ(V)/n)).

This proves the lemma.  □
Corollary 4.11 Let 0 ≤ θ < 1 and V+ = V/√(1 − θ). If Ψ(V) ≤ τ, then

    Ψ(V+) ≤ nψ(%(τ/n)/√(1 − θ)).    (38)

Proof: With β = 1/√(1 − θ) ≥ 1 and Ψ(V) ≤ τ, the inequality follows from Lemma 4.10.  □
We use the two upper bounds for %(s) given by (30) and (31) to obtain two upper bounds for Ψ(V). As we will show in the next section, each subsequent inner iteration decreases the value of Ψ(V). Hence, we may already conclude that the numbers

    L1 := nψ((τ/n + q/(q − 1))/√(1 − θ)),    q > 1,    (39)

and

    L2 := nψ((1 + √(τ^2/n^2 + (q/(q − 1))(τ/n)))/√(1 − θ)),    q ≥ 2,    (40)

are two upper bounds for Ψ(V) during the course of the algorithm.
5 Derivation of the iteration bound

5.1 Decrease of the value of Ψ(V) and selection of the step size
In each inner iteration we first compute the search directions ∆X, ∆y, and ∆S from the system (25), using (23). After a step of size α the new iterates are X+ = X + α∆X, y+ = y + α∆y and S+ = S + α∆S, respectively. Note that, by (23), we have

    X+ = X + α∆X = X + α√µ D DX D = √µ D(V + αDX)D

and

    S+ = S + α∆S = S + α√µ D^{−1} DS D^{−1} = √µ D^{−1}(V + αDS)D^{−1}.

Thus we obtain

    V+ = (1/√µ)(D^{−1}X+S+D)^{1/2}.

One can verify that V+^2 is similar to the matrix (1/µ) X+^{1/2}S+X+^{1/2} and thus to (V + αDX)^{1/2}(V + αDS)(V + αDX)^{1/2}. This implies that the eigenvalues of V+ are precisely the same as those of the matrix

    V̄+ := [(V + αDX)^{1/2}(V + αDS)(V + αDX)^{1/2}]^{1/2}.    (41)
By the definition of Ψ(V), we obtain Ψ(V+) = Ψ(V̄+). Hence, by Lemma 4.8, we have

    Ψ(V+) = Ψ(V̄+) ≤ (1/2)(Ψ(V + αDX) + Ψ(V + αDS)).

Defining

    f(α) := Ψ(V+) − Ψ(V) = Ψ(V̄+) − Ψ(V),

we have f(α) ≤ f1(α), where

    f1(α) := (1/2)(Ψ(V + αDX) + Ψ(V + αDS)) − Ψ(V).

Obviously,

    f(0) = f1(0) = 0.
By using (11), (12) and (13), we get

    f1'(α) = (1/2) Tr(ψ'(V + αDX) DX + ψ'(V + αDS) DS)    (42)

and

    f1''(α) = (1/2) Tr(ψ''(V + αDX) DX^2 + ψ''(V + αDS) DS^2).    (43)

Using the third equation of system (25), we obtain

    f1'(0) = (1/2) Tr(ψ'(V)(DX + DS)) = (1/2) Tr(−ψ'(V)^2) = −2δ(V)^2.    (44)
The following lemma is a slight modification of the Weyl theorem (see [18]).

Lemma 5.1 Let A, A + B ∈ S^n_+. Then one has

    λi(A + B) ≥ λn(A) − |λ1(B)|,    i = 1, 2, ..., n.    (45)

Proof: By the Rayleigh-Ritz theorem (see [7]), there exists X0 ∈ R^n, X0 ≠ 0, such that

    λi(A + B) ≥ λn(A + B) = X0^T(A + B)X0/(X0^T X0) = X0^T A X0/(X0^T X0) + X0^T B X0/(X0^T X0)
        ≥ X0^T A X0/(X0^T X0) − |X0^T B X0/(X0^T X0)| ≥ min_{X≠0} X^T A X/(X^T X) − max_{X≠0} |X^T B X/(X^T X)| = λn(A) − |λ1(B)|.

This completes the proof of the lemma.  □
Below we use the following notation:

    δ := δ(V).

One of the main results in this section is the following lemma.

Lemma 5.2 One has

    f1''(α) ≤ 2δ^2 ψ''(λn(V) − 2αδ).    (46)
Proof: By using (34) and (35), we have ∥DX + DS∥^2 = ∥DX∥^2 + ∥DS∥^2 = 4δ^2. This implies that |λ1(DX)| ≤ 2δ and |λ1(DS)| ≤ 2δ. Using Lemma 5.1 and V + αDX ⪰ 0, we have

    λi(V + αDX) ≥ λn(V) − α|λ1(DX)| ≥ λn(V) − 2αδ,    i = 1, 2, ..., n.

Since ψ''(t) is monotonically decreasing in t ∈ (0, ∞), we obtain

    ψ''(λi(V + αDX)) ≤ ψ''(λn(V) − 2αδ),

hence

    ψ''(V + αDX) ⪯ ψ''(λn(V) − 2αδ) E.

Since DX^2 ∈ S^n_+, by using (9) and (8) we obtain

    Tr(ψ''(V + αDX) DX^2) ≤ Tr(ψ''(λn(V) − 2αδ) E DX^2) ≤ ψ''(λn(V) − 2αδ) Σ_{i=1}^{n} λi(DX^2).

Similarly,

    Tr(ψ''(V + αDS) DS^2) ≤ Tr(ψ''(λn(V) − 2αδ) E DS^2) ≤ ψ''(λn(V) − 2αδ) Σ_{i=1}^{n} λi(DS^2).

Using (43), the above two inequalities and ∥DX∥^2 + ∥DS∥^2 = 4δ^2, we have

    f1''(α) ≤ (1/2) ψ''(λn(V) − 2αδ) Σ_{i=1}^{n} (λi(DX^2) + λi(DS^2)) = 2δ^2 ψ''(λn(V) − 2αδ).

This completes the proof.  □
Next we choose a suitable step size for the algorithm. It should be chosen such that X+ and S+ are feasible and such that Ψ(V+) − Ψ(V) decreases sufficiently. The following strategy for selecting the step size is almost a word-by-word extension of the LO case in [3]; therefore, for the proofs of the following lemmas we refer to [3].

Lemma 5.3 [Lemma 3.2 in [3]] If the step size α satisfies

    −ψ'(λn(V) − 2αδ) + ψ'(λn(V)) ≤ 2δ,    (47)

then f1'(α) ≤ 0.  □
Let ρ : [0, ∞) → (0, 1] denote the inverse function of the restriction of −(1/2)ψ'(t) to the interval (0, 1]. One easily verifies that

    ρ(y) = 1/(2y + 1)^{1/q},    y ∈ [0, ∞).    (48)

Lemma 5.4 [Lemma 3.3 in [3]] The largest possible value of the step size α satisfying (47) is given by

    ᾱ := (1/(2δ))(ρ(δ) − ρ(2δ)).    (49)  □
By using (49), Lemma 5.3 and the well-known Bernoulli inequality (1 + x)^α ≤ 1 + αx, valid for x ≥ −1 and 0 ≤ α ≤ 1, we get

    ᾱ ≥ α̃ := 1/(q(2δ + 1)^{1/q}(4δ + 1)).    (50)

We define our default step size as α̃.
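In an implementation, ρ, ᾱ and the default step size α̃ are simple formulas in δ and q; a sketch (function names are ours) follows.

```python
def rho(y, q=2.0):
    # inverse of -psi'(t)/2 restricted to (0, 1], see (48)
    return (2.0 * y + 1.0)**(-1.0 / q)

def alpha_bar(delta, q=2.0):
    # largest step size satisfying (47), see (49)
    return (rho(delta, q) - rho(2.0 * delta, q)) / (2.0 * delta)

def alpha_default(delta, q=2.0):
    # lower bound (50) on alpha_bar, used as the default step size
    return 1.0 / (q * (2.0 * delta + 1.0)**(1.0 / q) * (4.0 * delta + 1.0))
```

For instance, with q = 2 and δ = 1/4 one obtains ᾱ ≈ 0.219 and α̃ ≈ 0.204, consistent with (50).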
Lemma 5.5 [Lemma 3.12 in [14]] Let h(t) be a twice differentiable convex function with h(0) = 0 and h'(0) < 0, and let h(t) attain its (global) minimum at t* > 0. If h''(t) is increasing for t ∈ [0, t*], then

    h(t) ≤ t h'(0)/2,    0 ≤ t ≤ t*.  □

Via the above lemma, we have the following result.

Lemma 5.6 [Lemma 3.6 in [3]] If the step size α is such that α ≤ ᾱ, then

    f(α) ≤ −αδ^2.    (51)  □
Since the default step size α̃ satisfies α̃ ≤ ᾱ, Lemma 5.6 gives the following upper bound for f(α̃):

    f(α̃) ≤ −δ^2/(q(2δ + 1)^{1/q}(4δ + 1)).
By Lemma 4.9, assuming Ψ(V) ≥ τ ≥ 1, we obtain

    δ = δ(V) ≥ (1/2)(1 − 1/(1 + Ψ(V))^q) ≥ (1/2)(1 − 1/(1 + τ)^q) ≥ (1/2)(1 − 1/(1 + qτ)) ≥ 1/4.

Since the decrease depends monotonically on δ, substitution yields

    f(α̃) ≤ −(1/16)/(2q(1.5)^{1/q}) ≤ −1/(48q).
5.2 Iteration bounds
We need to count how many inner iterations are required to return to the situation where Ψ(V) ≤ τ. We denote the value of Ψ(V) after the µ-update as Ψ_0; the subsequent values in the same outer iteration are denoted as Ψ_k, k = 1, 2, ..., K, where K denotes the total number of inner iterations in the outer iteration. Using (39), we have

    Ψ_0 ≤ nψ((τ/n + q/(q − 1))/√(1 − θ)).

Since ψ(t) ≤ t − 1 when t ≥ 1, and 1 − √(1 − θ) = θ/(1 + √(1 − θ)) ≤ θ, we have

    Ψ_0 ≤ (θn + τ + n/(q − 1))/√(1 − θ).    (52)
According to the decrease of f(α̃) established above, we get

    Ψ_{k+1} ≤ Ψ_k − 1/(48q),    k = 0, 1, ..., K − 1.    (53)

Lemma 5.7 One has

    K ≤ 48q (θn + τ + n/(q − 1))/√(1 − θ).    (54)

Proof: Using (52) and (53), the proof is trivial.  □
The number of outer iterations is bounded above by (1/θ) log(n/ε) (see [16], II.17, page 116). By multiplying the number of outer iterations by the number of inner iterations, we get an upper bound for the total number of iterations, namely,

    48q (θn + τ + n/(q − 1))/(θ√(1 − θ)) · log(n/ε).

In large-update methods one takes for θ a constant (independent of n), namely θ = Θ(1), and τ = O(n). The iteration bound then becomes

    O(qn log(n/ε)).    (55)

Obviously, the bound suggests taking q as small as possible, i.e., q = 2. Note that the bound is then exactly the same as the bound obtained in [3] for LO with the same kernel function, namely O(n log(n/ε)).
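As a concrete illustration, with parameter values of our own choosing (not prescribed by the paper): for θ = 1/2, τ = n and q = 2, the bound (54) gives K ≤ 96(n/2 + n + n)/√(1/2) = 240√2 n inner iterations per outer iteration, and with at most (1/θ) log(n/ε) = 2 log(n/ε) outer iterations the total number of iterations is at most 480√2 n log(n/ε), which is indeed of order O(qn log(n/ε)).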
5.3 Complexity of small-update methods

It may be noted that the above iteration bound is not as good as it can be for small-update methods, due to the fact that the upper bound (30) used for %(s) is not tight at s = 0: it should equal %(0) = 1 when s = 0. Using Lemma 4.5, we will see below that an upper bound that is tight at s = 0 leads to the correct iteration bound.

We recall from the previous section that the number K of inner iterations is bounded above by

    K ≤ 48q Ψ_0.

To estimate Ψ_0, we use (40) and Lemma 4.3, with ψ''(1) = q. We then obtain

    Ψ_0 ≤ nψ((1 + √(τ^2/n^2 + (q/(q − 1))(τ/n)))/√(1 − θ)) ≤ (qn/2)((1 + √(τ^2/n^2 + (q/(q − 1))(τ/n)))/√(1 − θ) − 1)^2.

Using 1 − √(1 − θ) = θ/(1 + √(1 − θ)) ≤ θ again, and q/(q − 1) ≤ 2, this can be simplified to

    Ψ_0 ≤ (qn/2)(θ + √(τ^2/n^2 + 2τ/n))^2/(1 − θ) = q(θ√n + √(τ^2/n + 2τ))^2/(2(1 − θ)).

We conclude that the total number of iterations is bounded above by

    (K/θ) log(n/ε) ≤ 24 q^2 (θ√n + √(τ^2/n + 2τ))^2/(θ(1 − θ)) · log(n/ε).

For small-update methods, namely θ = Θ(1/√n) and τ = O(1), this clearly yields the iteration bound O(q^2 √n log(n/ε)).
6 Conclusions and remarks

We have extended the approach to primal-dual interior-point algorithms for LO developed in [3] to SDO. We derive the same complexity bounds for large- and small-update methods for SDO as for the LO case.

We have developed some new analysis tools that can deal with non-self-regular kernel functions (for self-regular functions, see [14, 15]). Moreover, our analysis is simple and straightforward.

Some interesting topics remain for further research. Firstly, the search direction used in this paper is based on the NT symmetrization scheme, and it is natural to ask whether other symmetrization schemes can be used. Secondly, numerical experiments are necessary to compare the behavior of the algorithm of this paper with existing methods. Thirdly, it is an interesting question whether similar algorithms can be designed by using the kernel function Υp,q(t) with 0 ≤ p ≤ 1 and q > 1. Note that if p ≥ 1 then Υp,q(t) is self-regular, and that case has been considered in [14, 15].
References

[1] Alizadeh, F., Haeberly, J.A., and Overton, M.: A new primal-dual interior-point method for semidefinite programming. In Lewis, J.G., Ed., Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, SIAM, 1994, 113-117.

[2] Andersen, E.D., Gondzio, J., Mészáros, Cs., and Xu, X.: Implementation of interior point methods for large scale linear programming. In Terlaky, T., Ed., Interior Point Methods of Mathematical Programming, Kluwer Academic Publishers, The Netherlands, 1996, 189-252.

[3] Bai, Y.Q. and Roos, C.: A primal-dual interior-point method based on a new kernel function with linear growth rate. Proceedings of Industrial Optimization Symposium and Optimization Day, Australia, November 2002.

[4] Bai, Y.Q., Ghami, M. El, and Roos, C.: A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. Accepted by SIAM Journal on Optimization, 2004.

[5] Ben-Tal, A. and Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization, Vol. 02, SIAM, Philadelphia, PA, 2001.

[6] Helmberg, C., Rendl, F., Vanderbei, R.J., and Wolkowicz, H.: An interior-point method for semidefinite programming. SIAM Journal on Optimization, 6, 1996, 342-361.

[7] Horn, R.A. and Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, 1991.

[8] Kojima, M., Mizuno, S., and Yoshise, A.: Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM Journal on Optimization, 7, 1997, 342-361.

[9] Lütkepohl, H.: Handbook of Matrices. John Wiley & Sons, 1996.

[10] Megiddo, N.: Pathways to the optimal set in linear programming. In Megiddo, N., Ed., Progress in Mathematical Programming: Interior Point and Related Methods, Springer Verlag, New York, 1989, 131-158. Identical version in: Proceedings of the 6th Mathematical Programming Symposium of Japan, Nagoya, Japan, 1986, 1-35.

[11] Monteiro, R.D.C.: Primal-dual path-following algorithms for semidefinite programming. SIAM Journal on Optimization, 7, 1997, 663-678.

[12] Nesterov, Yu.E. and Todd, M.J.: Self-scaled barriers and interior-point methods for convex programming. Mathematics of Operations Research, 22(1), 1997, 1-42.

[13] Nesterov, Yu.E. and Todd, M.J.: Primal-dual interior-point methods for self-scaled cones. SIAM Journal on Optimization, 8(2), 1998, 324-364.

[14] Peng, J., Roos, C., and Terlaky, T.: Self-regular functions and new search directions for linear and semidefinite optimization. Mathematical Programming, 93, 2002, 129-171.

[15] Peng, J., Roos, C., and Terlaky, T.: Self-Regularity: A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton University Press, 2002.

[16] Roos, C., Terlaky, T., and Vial, J.-Ph.: Theory and Algorithms for Linear Optimization. An Interior-Point Approach. John Wiley & Sons, Chichester, UK, 1997.

[17] Sonnevend, G.: An "analytic center" for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming. In Prékopa, A., Szelezsán, J., and Strazicky, B., Eds., System Modelling and Optimization: Proceedings of the 12th IFIP Conference held in Budapest, Hungary, September 1985, volume 84 of Lecture Notes in Control and Information Sciences, Springer Verlag, Berlin, 1986, 866-876.

[18] Wolkowicz, H., Saigal, R., and Vandenberghe, L.: Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Kluwer Academic Publishers, 2000.