Primal-Dual Interior-Point Algorithms for Semidefinite Optimization Based on a Simple Kernel Function∗

G. Q. Wang†  Y. Q. Bai‡  C. Roos‡

May 22, 2004

† Department of Mathematics, College of Science, Shanghai University, Shanghai, 200436.
‡ Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
e-mail: guoq_wang@hotmail.com, [y.bai, c.roos]@ewi.tudelft.nl

∗ The second author is on leave from the Department of Mathematics, Shanghai University, Shanghai 200436, China. She kindly acknowledges the support of the Dutch Organisation for Scientific Research (NWO grant 613.000.110). Corresponding author.

Abstract

Interior-point methods (IPMs) for semidefinite optimization (SDO) have been studied intensively, due to their polynomial complexity and practical efficiency. Recently, J. Peng et al. [14, 15] introduced so-called self-regular kernel (and barrier) functions and designed primal-dual interior-point algorithms based on self-regular proximities for linear optimization (LO) problems. They also extended this approach from LO to SDO. In this paper we present a primal-dual interior-point algorithm for SDO problems based on a simple kernel function which was first introduced in [3]. This kernel function is not self-regular, because its growth term is only linear. We derive the complexity analysis for algorithms with large- and small-update methods. The complexity bounds are O(qn log(n/ε)) and O(q²√n log(n/ε)), respectively, which are as good as the bounds in the LO case.

Keywords: semidefinite optimization, interior-point methods, primal-dual methods, large- and small-update methods, polynomial complexity.

AMS Subject Classification: 90C22, 90C31

1 Introduction

Semidefinite optimization (SDO) problems are convex optimization problems over the intersection of an affine set and the cone of positive semidefinite matrices. In the past decade, SDO has been one of the most active research areas in mathematical programming. Two major factors are responsible for this increased interest in SDO. Firstly, SDO has found numerous applications in various fields, such as statistics, structural design, electrical engineering and combinatorial optimization. Secondly, efficient new algorithms, in particular interior-point methods (IPMs), have led to increased interest both in the applications and in the research of SDO. In this paper we deal with so-called primal-dual IPMs. It is generally agreed that these IPMs are the most efficient from a computational point of view [2]. Many researchers have studied SDO and obtained a wealth of results; for an overview of these results we refer to [18]. Several interior-point methods designed for LO have been successfully extended to SDO. In particular, primal-dual interior-point algorithms are highly efficient both in theory and in practice. An important work in this direction is the paper of Nesterov and Todd [13], who showed that the primal-dual algorithm maintains its theoretical efficiency when the nonnegativity constraints in LO are replaced by a convex cone, as long as the cone is homogeneous and self-dual, or, in the terminology of Nesterov and Todd, as long as the cone is self-scaled. Recently, J. Peng et al. [14, 15] introduced so-called self-regular kernel functions. The prototype self-regular function in [14, 15] is given by

Υp,q(t) = (t^{p+1} − 1)/(p(p + 1)) + (t^{1−q} − 1)/(q(q − 1)) + ((p − q)/(pq)) (t − 1),   (1)

where p ≥ 1 and q > 1.
The parameter p is called the growth degree and q the barrier degree of the kernel function. Based on self-regular functions, they designed primal-dual interior-point algorithms for LO problems and also extended the approach to SDO. The complexity bounds obtained by them are O(√n log(n/ε)) and O(√n log n log(n/ε)), for small-update and large-update methods, respectively, which are currently the best known bounds. Motivated by their work, in this paper we present a primal-dual interior-point algorithm for SDO based on the kernel function

ψ(t) = t − 1 + (t^{1−q} − 1)/(q − 1),   t > 0,   (2)

where q > 1 is a parameter. This kernel function determines a matrix-valued function ψ(V ), which is defined in the usual way (see Section 2.1), and a real-valued function Ψ(V ) defined as follows:

Ψ(V ) := Tr(ψ(V )),   V ∈ S^n_{++}.   (3)

The kernel function ψ(t) was introduced in [3] for LO; see also [4]. It does not belong to the family of self-regular functions, and as a consequence a large part of the analysis tools in [14, 15] no longer applies. In this paper we develop some new tools for the analysis of primal-dual interior-point algorithms based on ψ(t). Thanks to the special definition of the matrix functions ψ(V ) and Ψ(V ) and the nice properties of ψ(t), the analysis in this paper is much simpler than in [14, 15]. We derive the complexity analysis for algorithms with large- and small-update methods. The complexity bounds are as good as the bounds for the linear case.

The outline of the paper is as follows. In Section 2 we first describe the special matrix functions used in later sections. Then we briefly recall the basic concepts of interior-point methods for SDO, such as the central path, the NT search direction, etc. In Section 3, we describe a primal-dual interior-point algorithm based on Ψ(V ) for SDO. In Section 4, we present the properties of ψ(t) and study the matrix functions ψ(V ) and Ψ(V ). We analyze the algorithm and derive the complexity bounds for large- and small-update methods in Section 5. Finally, some concluding remarks follow in Section 6.

Some notations used throughout the paper are as follows. R^n, R^n_+ and R^n_{++} denote the set of vectors with n components, the set of nonnegative vectors and the set of positive vectors, respectively. ‖·‖ denotes the Frobenius norm for matrices and the 2-norm for vectors. R^{m×n} is the space of all m × n matrices. S^n, S^n_+ and S^n_{++} denote the cone of symmetric, symmetric positive semidefinite and symmetric positive definite n × n matrices, respectively. E denotes the n × n identity matrix. The Löwner partial order "º" on positive semidefinite (or positive definite) matrices means A º B (A ≻ B) if A − B is positive semidefinite (or positive definite). We use the matrix inner product A • B = Tr(A^T B). For any symmetric positive definite matrix Q ∈ S^n_{++}, the expression Q^{1/2} denotes its symmetric square root. Similarly we can define the power Q^α for any Q ≻ 0 and α ∈ R. When λ is a vector we denote by diag(λ) the diagonal matrix Λ with entries λi. For any V ∈ S^n_{++}, we denote by λ(V ) the vector of eigenvalues of V arranged in non-increasing order, that is, λ1(V ) ≥ λ2(V ) ≥ ... ≥ λn(V ). For any matrix M, we denote by σ1(M ) ≥ σ2(M ) ≥ ... ≥ σn(M ) the singular values of M. In particular, if M is symmetric, then σi(M ) = |λi(M )|, i = 1, 2, ..., n.
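Before turning to the formal treatment of matrix functions in Section 2, a small computational sketch may help fix ideas. The snippet below is our own illustration and not part of the original paper; the helper names, the choice q = 2, and the eigenvalue-based evaluation are assumptions. It evaluates the kernel (2) and the induced barrier (3) by applying ψ(t) to the eigenvalues of a symmetric positive definite matrix V.

```python
import numpy as np

def psi(t, q=2.0):
    """Kernel function (2): psi(t) = t - 1 + (t^(1-q) - 1)/(q - 1), for t > 0 and q > 1."""
    t = np.asarray(t, dtype=float)
    return t - 1.0 + (t ** (1.0 - q) - 1.0) / (q - 1.0)

def Psi(V, q=2.0):
    """Barrier (3): Psi(V) = Tr(psi(V)) = sum_i psi(lambda_i(V)) for V in S^n_++."""
    eigvals = np.linalg.eigvalsh(V)   # eigenvalues of the symmetric matrix V
    if np.any(eigvals <= 0):
        raise ValueError("V must be positive definite")
    return float(np.sum(psi(eigvals, q)))

if __name__ == "__main__":
    # psi vanishes at t = 1 and blows up as t -> 0 or t -> infinity.
    print(psi(1.0), psi(0.1), psi(10.0))
    # Psi(E) = 0 at the identity, and Psi(V) > 0 for any other positive definite V.
    print(Psi(np.eye(3)), Psi(np.diag([0.5, 1.0, 2.0])))
```

In this sketch the barrier is computed from the eigenvalues only, which mirrors the fact (recalled in Section 2.1) that ψ(V ) depends only on the restriction of ψ(t) to the spectrum of V.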
2 2 2.1 Preliminaries Special matrix functions To introduce special matrix functions which will be useful for designing primal-dual interior-point algorithm, first of all, let us recall some known facts from linear algebra. Theorem 2.1 [Spectral theorem for symmetric matrices in [18]] The real n × n matrix A is symmetric if and only if there exists an orthogonal basis with respect to which A is real and diagonal, i.e. if and only if there exits a matrix U ∈ Rn×n such that U T U = E and U T AU = Λ where Λ is a diagonal matrix. 2 The columns ui of U are the eigenvectors of A, satisfying Aui = λi ui , i = 1, ..., n, where λi is the i-th diagonal entry of Λ. Now we are ready to show how a matrix function can be obtained from ψ(t). Definition 2.2 Let V ∈ Sn++ be any symmetric n × n matrix and V = QT diag (λ(V ))Q where Q is any orthonormal matrix that diagonalizes V . Let ψ(t) be defined as in (2). The matrix function ψ(V ) : Sn++ → Sn is defined by ψ(V ) = QT diag (ψ(λ1 (V )), ψ(λ2 (V )), · · ·, ψ(λn (V )))Q. (4) Then we define a matrix function Ψ(V ): Sn++ → R+ as follows. Definition 2.3 Ψ(V ) = Tr(ψ(V )) = n X ψ(λi (V )) (5) i=1 where ψ(V ) is given by (4). Since ψ(t) is triple differentiable, the derivatives ψ 0 (t), ψ 00 (t) and ψ 000 (t) are well-defined for t > 0. Hence, replacing ψ(λi (V )) in (4) by ψ 0 (λi (V )), ψ 00 (λi (V )) and ψ 000 (λi (V )), respectively, for each matrix, we obtain that the matrix functions ψ 0 (V ), ψ 00 (V ) and ψ 000 (V ) already are defined. ψ(V ) depends only on the restriction of ψ(t) on the spectrum (the set of eigenvalues) of V (See p.278 in [5]). Just as in the linear case, we call ψ(t) the kernel function for the matrix functions ψ(V ) and Ψ(V ). Moreover, we call both ψ(V ) and Ψ(V ) matrix barrier functions. The following lemma can be found in [15]. For completeness’ sake we include the proof below. Lemma 2.4 Ψ(V ) is strictly convex with respect to V  0 and vanishes at its global minimal point V = E, i.e., ψ(E) = ψ 0 (E) = 0n×n . Moreover, Ψ(E) = 0. Proof: Firstly, we prove Ψ(V ) is strictly convex for V  0, i.e., for any V1 6= V2  0, the following inequality holds ¶ µ 1 V1 + V2 < (Ψ(V1 ) + Ψ(V2 )) . (6) Ψ 2 2 Since both V1 , V2 are positive definite, so is the matrix 12 (V1 + V2 ). Using Theorem 2.1, there exists orthogonal matrices Q, Q̃1 , Q̃2 and diagonal matrices Λ, Λ1 , Λ2 such that Λ = diag (λ) = 1 1 1 1 1 T T Q(V1 + V2 )QT = QV1 QT + QV2 QT2 = QQ̃1 Λ1 Q̃1 QT + QQ̃2 Λ2 Q̃2 QT . 2 2 2 2 2 3 Denote that Q1 = QQ̃1 and Q2 = QQ̃2 , we have Λ= 1 (Q1 Λ1 QT1 + Q2 Λ1 QT2 ), 2 (7) where Λ1 = diag (λ1 ) and Λ2 = diag (λ2 ) are positive diagonal matrices, and λ, λ1 , λ2 are vectors whose components are the eigenvalues of 21 (V1 + V2 ), V1 and V2 , respectively. We denote M = ([Q1 ]2ij )n×n From (7) one can easily get λ= and N = ([Q2 ]2ij )n×n . 1 (M λ1 + N λ2 ). 2 Using the orthogonality of the matrices Q1 and Q2 , one has n X Mij = i=1 n X n X Nij = 1, j = 1, 2, ..., n, i=1 Mij = j=1 n X Nij = 1, i = 1, 2, ..., n. j=1 So M and N are double stochastic matrices. From the strict convexity of ψ(t) and the fact that V1 6= V2  0. We readily obtain ! à n µ ¶ X ¶ µ n n n X X V1 + V2 1 X M i λ1 + N i λ2 1 2 Ψ ψ(Mi λ ) + ψ(Ni λ ) , = < ψ(λi ) = ψ 2 2 2 i=1 i=1 i=1 i=1 where Mi (Ni ) denotes the i-th row of the M(N), using the orthogonality of the matrices Q1 and Q2 and the convexity of ψ(t) again . Since M is double stochastic, we have n X i=1 similarly, one has ψ(Mi λ1 ) ≤ n X n X Mij ψ(λ1j ) = i=1 j=1 n X n X ψ(λ1j ) = Ψ(V1 ), j=1 ψ(Ni λ2 ) ≤ Ψ(V2 ). 
i=1 The above two inequalities yield the desired relation (6). Since ψ(1) = ψ 0 (1) = 0, by using (4) one can easily verify other statements. 2 Moreover, we need to deal with another concepts relevant to matrix functions in matrix theory. The related results of matrix of function can be found in [7, 9]. Definition 2.5 A matrix M (t) is said to be a matrix of functions if each entry of M (t) is a function of t, i.e., M (t) = [Mij (t)]. First, we state two inequalities, which are used in the proof of Lemma 5.2. If M, N ∈ Sn , then |Tr(M N )| ≤ |λ1 (M )| n X |λi (N )|. (8) i=1 Furthermore, if M1 ¹ M2 and N º 0, then Tr(M1 N ) ≤ Tr(M2 N ). 4 (9) The usual concepts of continuity, differentiability and integrability can be naturally extended to matrices of functions, by interpreting them entry-wise. Let M (t) and N (t) be two matrices of functions. Then it can easily be understood that · ¸ d d M (t) = Mij (t) = M 0 (t), (10) dt dt µ ¶ d d Tr(M (t)) = Tr M (t) = Tr(M 0 (t)), (11) dt dt d Tr(ψ(M (t))) = Tr(ψ 0 (M (t))M 0 (t), dt · ¸ · ¸ d d d (M (t)N (t)) = M (t) N (t) + M (t) N (t) = M 0 (t)N (t) + M (t)N 0 (t). dt dt dt (12) (13) Remark 2.6 In the rest of the section, when we use the function ψ(·) and its derivatives ψ 0 (·) and ψ 00 (·), these denote matrix functions if the argument is a matrix and a univariate function if the argument is in R+ . 2.2 The central path for SDO We consider the standard form for SDO problems: (P ) minimize subject to C •X Ai • X = bi , i = 1, 2, ..., m, X º 0. (14) and its dual: (D) maximize subject to bT y Pm i=1 yi Ai + S = C, S º 0. (15) where each Ai ∈ Sn , b ∈ Rm , and C ∈ Sn . The matrices Ai are linearly independent. We assume that (P ) and (D) satisfy the interior-point condition IPC, i.e., there exists X ∈ P, S ∈ D with X  0, S  0, where P and D denote the feasible set of problem (P) and (D), respectively. Under the assumption of IPC, the optimality conditions for (P ) and (D) can be written as follows. Ai • X = bi , i = 1, 2, ..., m, X º 0, m X yi Ai + S = C, S º 0, (16) i=1 XS = 0. We modify the above system by relaxing the third equation as follows. Ai • X = bi , i = 1, 2, ..., m, X º 0, m X yi Ai + S = C, S º 0, (17) i=1 XS = µE, with µ > 0 and E is the n × n unit matrix. Under the assumption that (P ) and (D) satisfy the IPC ( this can be achieved via the so-called self-dual embedding technique introduced by E. de Klerk et al. in 5 [18]), the system (17) has a unique solution, denoted by (X(µ), y(µ), S(µ)). We call X(µ) the µ-center of (P ) and (y(µ), S(µ)) the µ-center of (D). The set of µ-centers (with µ running through positive real numbers) gives a homotopy path, which is called the central path of (P ) and (D). If µ → 0 then the limit of the central path exists and since the limit points satisfy the complementarity condition, the limit yields optimal solutions for (P ) and (D). The relevance of the central path for LO and SDO has been discussed in many papers, see, e.g. [18], [10], and [17], etc.. 2.3 Search directions determined by kernel functions The core idea of a primal-dual interior-point algorithm is to follow the central path and to approach the optimal set of SDO problems by letting µ go to zero. A direct application of Newton’s method to (17) produces the following equations for the search direction ∆X, 4y and ∆S: Ai • ∆X = 0, i = 1, 2, ..., m, m X ∆yi Ai + ∆S = 0, (18) i=1 X∆S + ∆XS = µE − XS. It can be showed that this system has a unique solution, see [18]. We can rewrite system (18) as follows. 
Ai • ∆X = 0,   i = 1, 2, ..., m,
Σ_{i=1}^{m} ∆yi Ai + ∆S = 0,   (19)
∆X + X∆S S^{−1} = µS^{−1} − X.

It is obvious that ∆S is symmetric, due to the second equation in (19). However, a crucial observation is that ∆X is not necessarily symmetric, because X∆S S^{−1} may not be symmetric. Many researchers have proposed methods for symmetrizing the third equation in the above Newton system such that the resulting new system has a unique symmetric solution. Among them, the following three directions are the most popular ones: the direction introduced by Alizadeh, Haeberly and Overton in [1], the direction introduced by Helmberg et al., Kojima et al. and Monteiro in [6, 8, 11], and the direction introduced by Nesterov and Todd in [12, 13], called the AHO, HKM and NT directions, respectively. In this paper we use the symmetrization scheme from which the NT direction [12] is derived. One important reason for this is that the NT scaling technique transfers the primal variable X and the dual variable S into the same space: the so-called V-space. If we apply the NT symmetrization scheme, the term X∆S S^{−1} in the third equation is replaced by P∆SP^T, where

P := X^{1/2}(X^{1/2}SX^{1/2})^{−1/2}X^{1/2} = S^{−1/2}(S^{1/2}XS^{1/2})^{1/2}S^{−1/2}.

Then the above system is replaced by the system

Ai • ∆X = 0,   i = 1, 2, ..., m,
Σ_{i=1}^{m} ∆yi Ai + ∆S = 0,   (20)
∆X + P∆SP^T = µS^{−1} − X.

Obviously, now ∆X is a symmetric matrix, and system (20) still has a unique solution (see [18]). Let D = P^{1/2}. Then the matrix D can be used to scale X and S to the same matrix V , because

V := (1/√µ) D^{−1}XD^{−1} = (1/√µ) DSD.   (21)

Therefore we have

V² = (1/µ) D^{−1}XSD.   (22)

Note that the matrices D and V are symmetric and positive definite. Let us further define

Āi := DAiD,  i = 1, 2, ..., m;   DX := (1/√µ) D^{−1}∆XD^{−1};   DS := (1/√µ) D∆SD.   (23)

Then it follows from (20) that the (scaled) NT search direction (DX, ∆y, DS) satisfies the system

Āi • DX = 0,   i = 1, 2, ..., m,
Σ_{i=1}^{m} ∆yi Āi + DS = 0,   (24)
DX + DS = V^{−1} − V.

So far we have described the scheme that defines the classical NT direction. Now, following [15], we change to the new approach of this paper. Given the kernel function ψ(t) and the associated matrix functions ψ(V ) and ψ'(V ) as defined in Definition 2.2, we replace the right-hand side V^{−1} − V of the third equation in (24) by −ψ'(V ). Thus we consider the following system:

Āi • DX = 0,   i = 1, 2, ..., m,
Σ_{i=1}^{m} ∆yi Āi + DS = 0,   (25)
DX + DS = −ψ'(V ).

It is straightforward to verify that system (25) has a unique solution. Having DX and DS, we can compute ∆X and ∆S from (23). We will argue below that the matrix function ψ(V ) determines in a natural way an interior-point algorithm. Since DX and DS are orthogonal, that is, Tr(DX DS) = Tr(DS DX) = 0, we have

DX = DS = 0n×n ⇔ ψ'(V ) = 0n×n ⇔ V = E ⇔ Ψ(V ) = 0,   (26)

i.e., if and only if XS = µE, that is, if and only if X = X(µ) and S = S(µ), as it should be. Otherwise Ψ(V ) > 0. Hence, if (X, y, S) ≠ (X(µ), y(µ), S(µ)), then (∆X, ∆y, ∆S) is nonzero. By taking a step along the search direction, with the step size α defined by some line search rule, one constructs a new triple (X, y, S) according to

X+ = X + α∆X,   y+ = y + α∆y,   S+ = S + α∆S.   (27)
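To make the V-space construction above concrete, the following sketch is our own illustration and not part of the original paper; the helper names and the default q = 2 are assumptions. It computes the NT scaling matrix D and the scaled matrix V of (21)-(22) for a given strictly feasible pair (X, S), checks the two expressions for V numerically, and forms the right-hand side −ψ'(V ) of the third equation in (25).

```python
import numpy as np

def sym_sqrt(M, inv=False):
    """Symmetric (inverse) square root of a symmetric positive definite matrix."""
    w, Q = np.linalg.eigh(M)
    d = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
    return Q @ np.diag(d) @ Q.T

def nt_scaling(X, S, mu):
    """NT scaling matrix D = P^{1/2} and scaled matrix V, as in (21)."""
    Xh = sym_sqrt(X)                          # X^{1/2}
    inner = sym_sqrt(Xh @ S @ Xh, inv=True)   # (X^{1/2} S X^{1/2})^{-1/2}
    P = Xh @ inner @ Xh                       # NT scaling matrix P
    D = sym_sqrt(P)
    Dinv = np.linalg.inv(D)
    V = (Dinv @ X @ Dinv) / np.sqrt(mu)
    return D, V

def dpsi(t, q=2.0):
    """Derivative psi'(t) = 1 - t^{-q} of the kernel (2)."""
    return 1.0 - t ** (-q)

def scaled_rhs(V, q=2.0):
    """Right-hand side -psi'(V) of the third equation in (25)."""
    w, Q = np.linalg.eigh(V)
    return -(Q @ np.diag(dpsi(w, q)) @ Q.T)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, mu = 4, 0.5
    A = rng.standard_normal((n, n)); X = A @ A.T + n * np.eye(n)
    B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)
    D, V = nt_scaling(X, S, mu)
    # Both expressions in (21) give the same V, and V^2 = D^{-1} X S D / mu as in (22).
    print(np.allclose(V, D @ S @ D / np.sqrt(mu)))
    print(np.allclose(V @ V, np.linalg.inv(D) @ X @ S @ D / mu))
    print(scaled_rhs(V))
```

The two printed checks confirm numerically that D scales X and S to the same matrix V and that V² agrees with (22); solving (25) itself additionally requires the scaled constraint matrices Āi and is not shown here.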
3 A primal-dual algorithm for SDO

We can now describe the algorithm in a more formal way. The generic form of this algorithm is shown in Figure 1.

Primal-Dual Algorithm for SDO
Input:
  a threshold parameter τ ≥ 1;
  an accuracy parameter ε > 0;
  a fixed barrier update parameter θ, 0 < θ < 1;
  a strictly feasible pair (X⁰, S⁰) and µ⁰ = 1 such that Ψ(X⁰, S⁰, µ⁰) ≤ τ.
begin
  X := X⁰; S := S⁰; µ := µ⁰;
  while nµ ≥ ε do
  begin
    µ := (1 − θ)µ;
    while Ψ(X, S, µ) > τ do
    begin
      solve system (25) and use (23) to compute ∆X, ∆y, ∆S;
      determine a step size α;
      X := X + α∆X; S := S + α∆S; y := y + α∆y;
      V := (1/√µ)(D^{−1}XSD)^{1/2};
    end
  end
end

Figure 1: Algorithm

It is clear from this description that closeness of (X, y, S) to (X(µ), y(µ), S(µ)) is measured by the value of Ψ(V ), with τ as a threshold value: if Ψ(V ) ≤ τ then we start a new outer iteration by performing a µ-update; otherwise we enter an inner iteration by computing the search directions at the current iterates with respect to the current value of µ and applying (27) to get new iterates. The parameters τ, θ and the step size α should be chosen in such a way that the algorithm is 'optimized', in the sense that the number of iterations required by the algorithm is as small as possible. The choice of the so-called barrier update parameter θ plays an important role both in the theory and in the practice of IPMs. Usually, if θ is a constant independent of the dimension n of the problem, for instance θ = 1/2, then we call the algorithm a large-update (or long-step) method. If θ depends on the dimension of the problem, such as θ = 1/√n, then the algorithm is named a small-update (or short-step) method. The choice of the step size α (0 ≤ α ≤ 1) is another crucial issue in the analysis of the algorithm. It has to be taken such that the closeness of the iterates to the current µ-center improves by a sufficient amount. In the theoretical analysis the step size α is usually given a value that depends on the closeness of the current iterates to the µ-center.

4 Properties of the kernel (barrier) function

In this section, we study properties of the kernel function ψ(t) and the barrier function Ψ(V ).

4.1 Properties of ψ(t)

In this subsection we recall some properties of ψ(t) from [3] and [4]. The first three derivatives of ψ(t) are

ψ'(t) = 1 − 1/t^q,   ψ''(t) = q/t^{q+1} > 0,   ψ'''(t) = −q(q + 1)/t^{q+2}.   (28)

It is quite straightforward to verify that

ψ(1) = ψ'(1) = 0,   ψ''(t) > 0 for t > 0,   lim_{t→0} ψ(t) = lim_{t→∞} ψ(t) = +∞.

Moreover, by (28), ψ(t) is strictly convex and ψ''(t) is monotonically decreasing on (0, +∞).

Lemma 4.1 [Lemma 2.3 in [3]] Let t1 > 0 and t2 > 0. Then

ψ(√(t1 t2)) ≤ (1/2)(ψ(t1) + ψ(t2)). □

Lemma 4.2 One has

ψ'(t)² ≥ ψ'(1/t)²,   0 < t ≤ 1.   (29)

Proof: When 0 < t ≤ 1, we have

ψ'(t)² − ψ'(1/t)² = (1 − 1/t^q)² − (1 − t^q)² = (1/t^q + t^q − 2)(1/t^q − t^q) ≥ 0,

which implies the lemma. □

Lemma 4.3 [Lemma 2.6 in [4]] One has

ψ(t) < (1/2)ψ''(1)(t − 1)²,   if t ≥ 1. □

Lemma 4.4 [Lemma 3.1 in [4]] Suppose that ψ(t1) = ψ(t2) with t1 ≤ 1 ≤ t2 and β ≥ 1. Then ψ(βt1) ≤ ψ(βt2). Equality holds if and only if β = 1 or t1 = t2 = 1. □

Lemma 4.5 Let % : [0, ∞) → [1, ∞) be the inverse function of ψ(t) for t ≥ 1. If q > 1, one has

1 + s ≤ %(s) ≤ s + q/(q − 1).   (30)

If q ≥ 2, one has

%(s) ≤ 1 + √(s² + (q/(q − 1)) s).   (31)

Proof: When q > 1, we have

t = %(s) ⇔ ψ(t) = t − 1 + (t^{1−q} − 1)/(q − 1) = s,   t ≥ 1.

Using that t ≥ 1 one easily sees that 1 + s ≤ %(s) ≤ s + q/(q − 1). Now we consider the case q ≥ 2. We first establish that tψ(t) ≥ (t − 1)² for t ≥ 1. This goes as follows. Defining f(t) = tψ(t) − (t − 1)², one has f(1) = 0 and f'(t) = ψ(t) + tψ'(t) − 2(t − 1).
Hence f 0 (1) = 0 and f 00 (t) = 2ψ 0 (t) + tψ 00 (t) − 2. Since f 00 (t) = (q − 2)t−q ≥ 0 the claim follows. Hence we have p t ≤ 1 + tψ(t). 9 Now substituting t ≤ ψ(t) + q q−1 we obtain r %(s) ≤ 1 + s2 + q s. q−1 This completes the lemma. 2 Remark 4.6 (31) is tighter than (30) at s = 0. This will help us in getting a sharp iteration bound for small-update methods. 4.2 Properties of Ψ(V ) The following lemmas are crucial for the analysis of the SDO algorithm to be stated below. We need a result from [7] that we state without proof. Lemma 4.7 [ Lemma 3.3.14 (c) in [7]] Let M, N ∈ Sn be two nonsingular matrices and f (t) be given real-valued function such that f (et ) is a convex function. One has n X f (σi (M N )) ≤ i=1 n X f (σi (M )σi (N )) , (32) i=1 where σi (M ), i = 1, 2, ..., n denote the singular values of M . 2 We can apply the above lemma to ψ(t) because it is easily verify that ψ(et ) is convex function. Lemma 4.8 [ Proposition 5.2.6 in [15]] Suppose that matrices V1 and V2 are symmetric positive definite, then 1 1 1 1 Ψ([V12 V2 V12 ] 2 ) ≤ (Ψ(V1 ) + Ψ(V2 )). (33) 2 Proof: For any nonsingular matrix V ∈ Sn , ¡ ¢1 ¡ ¢1 σi (V ) = λi (V T V ) 2 = λi (V V T ) 2 , i = 1, 2, ..., n. ¿From the above equality, we have ³ 1 1´ ³ ´ 12 ³ 1 ´ 1 1 1 1 σi V12 V22 = λi (V12 V2 V12 ) = λi [V12 V2 V12 ] 2 , i = 1, 2, ..., n. Since V1 and V2 are symmetric positive definite, one has σi (V1 ) = λi (V1 ), σi (V2 ) = λi (V2 ), i = 1, 2, ..., n. Using the definition of Ψ(V ), Lemma 4.7 and Lemma 4.1, one has n ´ P ³ ³ 1 ´ X ³ ´ 1 1 1 1 1 1 n Ψ [V12 V2 V12 ] 2 = i=1 ψ σi (V12 V22 ) ≤ ψ σi (V12 )σi (V22 ) i=1 n ³ ´´ 1 1 X ³ ³ 2 12 ´ ψ σi (V1 ) + ψ σi2 (V22 ) ≤ 2 i=1 n 1X 1 = (ψ(σi (V1 )) + ψ(σi (V2 ))) = (Ψ(V1 ) + Ψ(V2 )). 2 i=1 2 This completes the lemma. 2 10 In the analysis of the algorithm we also use the norm-based proximity measure δ(V ) defined by v u n X 1 0 1u 1 δ(V ) := kψ (V )k = t ψ 0 (λi (V ))2 = kDX + DS k. 2 2 i=1 2 (34) Obviously, since DX ⊥DS , we have kDX + DS k2 = kDX k2 + kDS k2 . (35) Moreover, recall that Ψ(V ) is strictly convex and minimal at V = E whereas the minimal value is zero. So we have Ψ(V ) = 0 ⇔ δ(V ) = 0 ⇔ V = E. The following lemma gives a bound of δ(V ) in terms of Ψ(V ). Lemma 4.9 One has 1 δ(V ) ≥ 2 µ 1− 1 (1 + Ψ(V ))q ¶ , V º 0. (36) Proof: The statement in the lemma is obvious if V = E, since then δ(V ) = Ψ(V ) = 0, otherwise we have δ(V ) > 0 and Ψ(V ) > 0. To deal with the nontrivial case we consider, for K > 0, the problem n z K = min{δ(V )2 = 1X 0 ψ (λi (V ))2 : Ψ(V ) = K}. 4 i=1 The first order optimality condition becomes 1 0 ψ (λi (V ))ψ 00 (λi (V )) = γψ 0 (λi (V )), i = 1, 2, ..., n. 2 where γ ∈ R is the Lagrange multiplier. Since K > 0 we have λi (V ) 6= 1 for at least one i. Therefor, γ 6= 0. It follows that either ψ(λ0i (V )) = 0 or ψ 00 (λi (V )) = 2γ, for each i. Since ψ 00 (t) is monotonically decreasing, this implies that all λi (V ) for which ψ 00 (λi (V )) = 2γ have the same value. Denoting this value as η, and observing that all other eigenvalues have value 1, we conclude that, after reordering the eigenvalues, V has the form V = QT diag (η, ...η, 1..., 1)Q. Now Ψ(V ) = K implies kψ(η) = K. Given k, this uniquely determines ψ(η), where we have 4δ(V )2 = kψ 0 (η)2 = k(1 − 1 2 ) , ηq ψ(η) = K . k Noth that the equation ψ(η) = K k has two solutions, one smaller than 1 and one larger than 1, that can be written as η = λ1 (V ) and η = λ2 (V ) with 0 < λ1 (V ) < 1 and λ2 (V ) > 1. By Lemma 4.2 we have 1 2 2 ψ 0 (λ1 (V ))2 ≥ ψ 0 ( λ1 (V ) ) . 
Since we are minimizing δ(V ) , we conclude that η > 1. From the definition K of ψ(t) we deduce that ψ(η) ≤ η − 1, for η > 1. Hence we obtain that K k ≤ η − 1, whence η ≥ 1 + k . 0 Since ψ (η) is monotonically increasing, we conclude that 4δ(V )2 = kψ 0 (η)2 = k(1 − 1 2 1 1 ) ≥ k(1 − )2 ≥ (1 − )2 . K q q η (1 + K)q (1 + k ) The last inequality is due to the fact that the middle expression is increasing in k. Thus we µ ¶ 1 1 δ(V ) ≥ 1− . 2 (1 + K)q Substituting Ψ(V ) for K, this inequality of the lemma follows. 2 Note that during the course of the algorithm the largest values of Ψ(V ) occur just after the update of µ. So next we derive an estimate for the effect of a µ-update on the value of Ψ(V ). We start with an important lemma. 11 Lemma 4.10 Let % : [0, ∞) → [1, ∞) be the inverse function of ψ(t) for t ≥ 1. Then we have for any positive definite matrix V and any β ≥ 1: ¶¶ µ µ Ψ(V ) Ψ(βV ) ≤ nψ β% . (37) n Proof: The inequality in the theorem is obvious if β = 1, or when V = E. So we may assume below that β > 1 and V 6= E. We consider the following maximization problem: max {Ψ(βV ) : Ψ(V ) = K} , V where K is any nonnegative number. The first order optimality conditions for this problem are βψ 0 (βλi (V )) = γψ 0 (λi (V )), i = 1, . . . , n, where γ denotes the Lagrange multiplier. Since ψ 0 (1) = 0 and βψ 0 (β) > 0, we must have λi (V ) 6= 1 for all i. We even may assume that λi (V ) > 1 for all i. To see this, let ki be such that ψ(Vi ) = ki . Given ki , this equation has two solutions: λi (V )(1) < 1 and λi (V )(2) > 1. As a consequence of Lemma 4.4 we have ψ(βλi (V )(1) ) ≤ ψ(βλi (V )(2) ). Since we are maximizing Ψ(βV ), we conclude that λi (V ) = λi (V )(2) > 1. Thus we have, for all i, µ ¶ µ ¶ 1 1 β 1− = γ 1 − , λi (V ) > 1. (βλi (V ))q λi (V )q Using this and β > 1 we easily obtain that γ > β > 1, and that all coordinates λi (V ) are equal and given by µ ¶1 γβ q−1 − 1 q λi (V ) = , i = 1, 2, ..., n. β q−1 (γ − β) Denoting their common values as t we deduce from Ψ(V ) = K that nψ(t) = K. This implies t = %(K/n). Hence the maximal value that Ψ(βV ) can attain is given by Ψ(βtE) = nψ(βt) = nψ(β%( K Ψ(V ) )) = nψ(β%( )). n n This proves the lemma. 2 V . If Ψ(V ) ≤ τ , then 1−θ µ ¶ %(τ /n) √ Ψ(V+ ) ≤ nψ . 1−θ Corollary 4.11 Let 0 ≤ θ ≤ 1 and V+ = √ (38) Proof: With β ≥ 1 and Ψ(V ) ≤ τ the inequality follows from Lemma 4.10. We use two upper bounds for %(s) which are formed by (30) and (31) to make two upper bounds for Ψ(V ). As we will show in the next section, each subsequent inner iteration will give rise to a decrease of value of Ψ(V ). Hence, we may already conclude that the numbers ! Ãτ q n + q−1 , q > 1, (39) L1 := nψ √ 1−θ and L2 := nψ q 1+ τ2 n2 √ + q τ q−1 n 1−θ , q ≥ 2, are two upper bounds for Ψ(V ) during the course of the algorithm. 12 (40) 5 5.1 Derivation of iteration bound Decrease of the value of Ψ(V ) and selection of step size In each inner iteration we first compute the search direction ∆X, ∆y, and ∆S from the system (25), also using (23). After a step with size α the new iterations are X+ = X + α∆X, y+ = y + α∆y and S+ = S + α∆S, respectively. Note that by (23), we have √ √ X+ = X + α∆X = X + α µDDX D = µD(V + αDX )D and √ √ S+ = S + α∆S = S + α µD−1 DS D−1 = µD−1 (V + αDS )D−1 . Thus we obtain ¢1 1 ¡ V+ = √ D−1 X+ S+ D 2 . µ 1 1 1 We can verify that V+2 is unitarily similar to the matrix X+2 S+ X+2 and thus to (V +αDX ) 2 (V +αDS )(V + 1 αDX ) 2 . This implies that the eigenvalues of V+ are precisely the same as those of the matrix ³ ´ 12 1 1 . 
V + := (V + αDX ) 2 (V + αDS )(V + αDX ) 2 (41) By the definition of Ψ(V ), we obtain Ψ(V+ ) = Ψ(V + ). Hence, by Lemma 4.8, we have Ψ(V+ ) = Ψ(V + ) ≤ 1 (Ψ(V + αDX ) + Ψ(V + αDS )) . 2 Defining, f (α) := Ψ(V+ ) − Ψ(V ) = Ψ(V + ) − Ψ(V ), we have f (α) ≤ f1 (α), where f1 (α) := 1 (Ψ(V + αDX ) + Ψ(V + αDS )) − Ψ(V ). 2 Obviously, f (0) = f1 (0) = 0. By using (11), (12) and (13), we get f10 (α) = 1 Tr (ψ 0 (V + αDX )DX + ψ 0 (V + αDS )DS ) , 2 (42) and ¢ 1 ¡ 00 2 Tr ψ (V + αDX )DX + ψ 00 (V + αDS )DS2 . 2 Using the third inequality of system (25). Obviously, f100 (α) = f1 (0) = 1 1 Tr(ψ 0 (V )(DX + DS )) = Tr(−ψ 0 (V )2 ) = −2δ(V )2 . 2 2 (43) (44) The following lemma is a slight modification of the Weyl Theorem (see [18]). Lemma 5.1 Let A, A + B ∈ Sn+ , one has λi (A + B) ≥ λn (A) − |λ1 (B)|, i = 1, 2, ..., n. 13 (45) By the Rayleigh-Ritz theorem (see [7]), for any i = 1, 2, ..., n, there exists X0 ∈ Rn , such that Proof: X0T (A + B)X0 X0T AX0 X0T BX0 = + X0T X0 X0T X0 X0T X0 ¯ ¯ ¯ T ¯ ¯ X BX ¯ X T AX0 ¯¯ X0T BX0 ¯¯ X T AX ¯ ¯ = λn (A) − |λ1 (B)|. ≥ 0T −¯ T − max ≥ min X6=0 ¯ X T X ¯ X0 X0 X0 X0 ¯ X6=0 X T X λi (A + B) ≥ λn (A + B) = This completes the lemma. 2 Below we use the following notation: δ := δ(V ). One of the main results in this section is below. Lemma 5.2 One has f100 (α) ≤ 2δ 2 ψ 00 (λn (V ) − 2αδ). (46) Proof: By using (34) and (35), we have kDX + DS k2 = kDX k2 + kDS k2 = 4δ 2 . It implies that |λ1 (DX )| ≤ 2δ and |λ1 (DS )| ≤ 2δ. Using Lemma 5.1 and V + αDX º 0, we have λi (V + αDX ) ≥ λn (V ) − α | λ1 (DX ) |≥ λn (V ) − 2αδ, i = 1, 2, ..., n. Since ψ 00 (t) is monotonically decreasing in t ∈ (0, ∞), we obtain ψ 00 (λi (V + αDX )) ≤ ψ 00 (λn (V ) − 2αδ), hence, we have ψ 00 (V + αDX ) ¹ ψ 00 (λn (V ) − 2αδ)E. 2 n Since DX ∈ S+ , by using (9) and (8), we obtain 2 2 ) ≤ ψ 00 (λn (V ) − 2αδ) ) ≤ Tr(ψ 00 ((λn (V ) − 2αδ)EDX Tr(ψ 00 (V + αDX )DX n X 2 ). λi (DX i=1 Similarly, Tr(ψ 00 (V + αDS )DS2 ) ≤ Tr(ψ 00 ((λn (V ) − 2αδ)EDS2 ) ≤ ψ 00 (λn (V ) − 2αδ) n X λi (DS2 ). i=1 2 2 2 Using (43), from the above two inequalities and kDX k + kDS k = 4δ , we have f100 (α) ≤ n X 1 00 2 ψ (λn (V ) − 2αδ) (λi (DX ) + λi (DS2 ) = 2δ 2 ψ 00 (λn (V ) − 2αδ). 2 i=1 This completes the proof. 2 Next we will choose a suitable step size for the algorithm. This should be chosen such that X+ and S+ are feasible and such that Ψ(V+ ) − Ψ(V ) decreases sufficiently. The following strategy for selecting the step size is almost a ”word-by-word” extension of the LO case in [3]. Therefore, for the proof of the following lemmas we refer to [3]. Lemma 5.3 [Lemma 3.2 in [3]] If the step size α satisfies −ψ 0 (λn (V ) − 2αδ) + ψ 0 (λn (V )) ≤ 2δ, then f10 (α) ≤ 0. 14 (47) 2 Let ρ : [0, ∞) → (0, 1] denote the inverse function of the restriction of − 21 ψ 0 (t) on the interval (0, 1], one may easily verify that 1 ρ(y) = y ∈ [0, ∞). (48) 1 , (2y + 1) q Lemma 5.4 [Lemma 3.3 in [3]] The largest possible value of the step size of α satisfying (48) is given by ᾱ := 1 (ρ(δ) − ρ(2δ)). 2δ (49) 2 By using (49), Lemma 5.3 and the well-known Bernoulli inequality (1 + x)α ≤ 1 + αx, x ≥ −1 and 0 ≤ α ≤ 1, we get 1 ᾱ ≥ α̃ = . (50) 1 q(2δ + 1) q (4δ + 1) We define our default step size as α̃. Lemma 5.5 (Lemma 3.12 in [14]) Let h(t) be a twice differentiable convex function with h(0) = 0, h0 (0) < 0 and let h(t) attain its (global) minimum at t∗ > 0. If h00 (t) is increasing for t ∈ [0, t∗ ] then h(t) ≤ th0 (0) , 2 0 ≤ t ≤ t∗ . 2 Via the above lemma, we have the following lemma. 
Lemma 5.6 [Lemma 3.6 in [3]] If the step size α is such that α ≤ ᾱ, then f (α) ≤ −αδ 2 . (51) 2 By Lemma 5.6, the default size α̃ satisfies α̃ ≤ α, we get the following upper bound for f (α̃): f (α̃) ≤ − δ2 1 q(2δ + 1) q (4δ + 1) . By Lemma 4.9, assuming Ψ(V ) ≥ τ ≥ 1, we obtain µ ¶ µ ¶ µ ¶ 1 1 1 1 1 1 1 δ = δ(V ) ≥ 1− ≥ 1 − ≥ 1 − ≥ . q q 2 (1 + Ψ(V )) 2 (1 + τ ) 2 1 + qτ 4 Since the decrease depends monotonically on δ, substitution yields f (α̃) ≤ − 1 16 2q(1.5) 15 1 q ≤− 1 . 48q 5.2 Iteration bounds We need to count how many inner iterations are required to return to the situation where Ψ(V ) ≤ τ . We denote the value of Ψ(V ) after the µ-update as Ψ0 , the subsequent values in the same outer iteration are denoted as Ψk , k = 1, 2, ..., K, where K denotes the total number of inner iterations in the outer iteration. Using (39), we have Ãτ ! q n + q−1 Ψ0 ≤ nψ √ . 1−θ √ Since ψ(t) ≤ t − 1 when t ≥ 1, and 1 − 1 − θ = 1+√θ1−θ ≤ θ, we have Ψ0 ≤ n θn + τ + q−1 √ . 1−θ (52) According to decrease of f (α̃), we get Ψk+1 ≤ Ψk − 1 , 48q Lemma 5.7 One has K ≤ 48q Proof: k = 0, 1, ..., K − 1. n θn + τ + q−1 √ . 1−θ Using (52) and (53), the proof is trivial. (53) (54) 2 1 n The number of outer iterations is bounded above by log , (seeing [16] Π.17, page 116). By multiplying θ ² the number of outer iterations and the number of inner iterations we get an upper bound for the total number of iterations, namely, n θn + τ + q−1 n √ 48q log . ² θ 1−θ In large-update methods one takes for θ a constant (independent on n), namely θ = Θ(1), and τ = O(n). The iteration bound then becomes n O(qn) log . (55) ² Obviously, the bound suggest to take q as small as possible, i.e., q = 2. Note that the bound is exactly the same as the bound obtained in [3] for LO, with the same kernel function, namely O(n log n² ). 5.3 Complexity of small-update methods It may be noted that in the above case the worse iteration bound is not as good as it can be for smallupdate methods due to the fact that the used upper bound for % (s) is not tight at s = 0: it should be equal to % (0) = 1 when s = 0. Using Lemma 4.5, we will see below that an upper bound that is tight at s = 0 will lead to the correct iteration bound. We recall from previous section that the number K of inner iterations is bound above by K ≤ 48qΨ0 . To estimate Ψ0 , we use (40) and Lemma 4.3, with ψ 00 (1) = q. We then obtain Ψ0 ≤ nψ q 1+ τ2 n2 √ + q τ q−1 n 1−θ q 2 2 q τ 1 + nτ 2 + q−1 n qn ≤ √ − 1 . 2 1−θ 16 Using 1 − √ 1−θ = √θ 1+ 1−θ ≤ θ again and Ψ0 ≤ q q q−1 τ2 n2 + qn θ + √ 2 1−θ ≤ 2 this can be simplified to 2τ n 2 = µ ¶2 q √ 2 q θ n + τn + 2τ . 2(1 − θ) We conclude that the total number of iterations is bounded above by K n log ≤ 24 θ ² µ ¶2 q √ 2 q 2 θ n + τn + 2τ θ(1 − θ) log n . ε For small-update methods, namely, θ = Θ( √1n ) and τ = O(1), it shows clearly that the iteration bound ¡ √ ¢ is O q 2 n log nε . 6 Conclusions and remarks We have extended the approach to primal-dual interior-point algorithms for LO, as developed in [3] to SDO. We derive the same complexity bounds for large- and small- update methods for SDO as for the LO case. We developed some new analysis tools that can deal with non-self-regular kernel functions ( for self-regular function, see [14, 15]). Moreover, our analysis is simple and straightforward. Some interesting topics remain for further research. Firstly, the search direction used in this paper is based on the NT-symmetrization scheme and it is natural to ask if other symmetrization schemes can be used. 
Secondly, numerical experiments are necessary to compare the behavior of the algorithm of this paper with existing methods. Thirdly, it is an interesting question whether we can design similar algorithms by using the kernel function Υp,q(t) with 0 ≤ p ≤ 1 and q > 1. Note that if p ≥ 1 then Υp,q(t) is self-regular, and that case has been considered in [14, 15].

References

[1] Alizadeh, F., Haeberly, J.A. and Overton, M.: A new primal-dual interior-point method for semidefinite programming. In Lewis, J.G., Ed., Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, SIAM, 1994, 113-117.
[2] Andersen, E.D., Gondzio, J., Mészáros, Cs. and Xu, X.: Implementation of interior point methods for large scale linear programming. In Terlaky, T., Ed., Interior Point Methods of Mathematical Programming, Kluwer Academic Publishers, The Netherlands, 1996, 189-252.
[3] Bai, Y.Q. and Roos, C.: A primal-dual interior-point method based on a new kernel function with linear growth rate. Proceedings of Industrial Optimization Symposium and Optimization Day, Australia, November 2002.
[4] Bai, Y.Q., El Ghami, M. and Roos, C.: A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. Accepted by SIAM Journal on Optimization, 2004.
[5] Ben-Tal, A. and Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. MPS-SIAM Series on Optimization, Vol. 02, SIAM, Philadelphia, PA, 2001.
[6] Helmberg, C., Rendl, F., Vanderbei, R.J. and Wolkowicz, H.: An interior-point method for semidefinite programming. SIAM Journal on Optimization, 6, 1996, 342-361.
[7] Horn, R.A. and Johnson, C.R.: Topics in Matrix Analysis. Cambridge University Press, 1991.
[8] Kojima, M., Mizuno, S. and Yoshise, A.: Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM Journal on Optimization, 7, 1997, 342-361.
[9] Lütkepohl, H.: Handbook of Matrices. John Wiley & Sons, 1996.
[10] Megiddo, N.: Pathways to the optimal set in linear programming. In Megiddo, N., Ed., Progress in Mathematical Programming: Interior Point and Related Methods, Springer Verlag, New York, 1989, 131-158. Identical version in: Proceedings of the 6th Mathematical Programming Symposium of Japan, Nagoya, Japan, 1986, 1-35.
[11] Monteiro, R.D.C.: Primal-dual path-following algorithms for semidefinite programming. SIAM Journal on Optimization, 7, 1997, 663-678.
[12] Nesterov, Yu.E. and Todd, M.J.: Self-scaled barriers and interior-point methods for convex programming. Mathematics of Operations Research, 22(1), 1997, 1-42.
[13] Nesterov, Yu.E. and Todd, M.J.: Primal-dual interior-point methods for self-scaled cones. SIAM Journal on Optimization, 8(2), 1998, 324-364 (electronic).
[14] Peng, J., Roos, C. and Terlaky, T.: Self-regular functions and new search directions for linear and semidefinite optimization. Mathematical Programming, 93, 2002, 129-171.
[15] Peng, J., Roos, C. and Terlaky, T.: Self-Regularity: A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton University Press, 2002.
[16] Roos, C., Terlaky, T. and Vial, J.-Ph.: Theory and Algorithms for Linear Optimization. An Interior-Point Approach. John Wiley & Sons, Chichester, UK, 1997.
[17] Sonnevend, G.: An “analytic center” for polyhedrons and new classes of global algorithms for linear (smooth, convex) programming. In Prékopa, A., Szelezsán, J. and Strazicky, B., Eds.,
System Modelling and Optimization: Proceedings of the 12th IFIP Conference held in Budapest, Hungary, September 1985, volume 84 of Lecture Notes in Control and Information Sciences, Springer Verlag, Berlin, West-Germany, 1986, 866-876.
[18] Wolkowicz, H., Saigal, R. and Vandenberghe, L.: Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Kluwer Academic Publishers, 2000.