Regularization methods for semidefinite
programming∗
Jérôme Malick†
Janez Povh‡
Franz Rendl§
Angelika Wiegele§
September 1, 2008
Abstract
We introduce a new class of algorithms for solving linear semidefinite
programming (SDP) problems. Our approach is based on classical tools
from convex optimization such as quadratic regularization and augmented
Lagrangian techniques. We study the theoretical properties and we show
that practical implementations behave very well on some instances of SDP
having a large number of constraints. We also show that the “boundary
point method” from [PRW06] is an instance of this class.
Key words: semidefinite programming, regularization methods, augmented
Lagrangian method, large scale semidefinite problems
1 Introduction

1.1 Motivations
Semidefinite programming (SDP) has been a very active research area in optimization for more than a decade. This activity was motivated by important
applications, especially in combinatorial optimization and in control theory. We
refer to the reference book [WSV00] for theory and applications.
The key object in semidefinite programming is the set of positive semidefinite
matrices, denoted by Sn+, which constitutes a closed convex cone in Sn, the space
of n-by-n symmetric matrices. Denoting by ⟨X, Y⟩ = trace(XY) the standard
inner product in Sn, one writes the standard form of a (linear) semidefinite
program as

    min ⟨C, X⟩
    s.t. AX = b, X ⪰ 0,                                            (1)
∗ Supported in part by the EU project Algorithmic Discrete Optimization (ADONET),
MRTN-CT-2003-504438.
† CNRS, Lab. J. Kuntzmann, University of Grenoble, France
‡ University of Maribor, Faculty of logistics, Slovenia
§ Institut für Mathematik, Alpen-Adria-Universität Klagenfurt, Austria
where b ∈ Rm, A : Sn → Rm is a linear operator and X ⪰ 0 stands for X ∈ Sn+.
The problem dual to (1) is

    max b⊤y
    s.t. C − A∗y = Z, Z ⪰ 0,                                       (2)

where the adjoint A∗ : Rm → Sn satisfies

    ∀y ∈ Rm, ∀X ∈ Sn,    y⊤AX = ⟨A∗y, X⟩.                          (3)
The matrix X ∈ Sn is called the primal variable and the pair (y, Z) ∈ Rm × Sn
forms the dual variable.
The success of semidefinite programming was also spurred by the development of efficient algorithms to solve the pair of dual problems (1),(2). A pleasant
situation to develop algorithms is when strong duality holds, for instance under
the classical technical Slater constraint qualification: if we assume that both
primal and dual problems satisfy the Slater condition (meaning that there exists a positive definite matrix X with AX = b and a vector y such that C − A∗ y
is positive definite), then there is no duality gap and there exist primal and dual
optimal solutions; moreover (X, y, Z) is optimal if and only if
    AX = b,   C − A∗y = Z,
    X ⪰ 0,   Z ⪰ 0,   ⟨X, Z⟩ = 0.                                  (4)
It is widely accepted that interior-point based approaches are among the most
efficient methods for solving general SDP problems. The primal-dual interior-point path-following methods are based on solving the optimality conditions (4)
with hX, Zi = 0 replaced by a parameterized matrix equation ZX = µI, or
some variant of this equation. As µ > 0 approaches 0, optimality holds with
increasing accuracy. The key operation consists in applying Newton’s method
to these equations, resulting in the following linear system for ∆X, ∆y, ∆Z:
    A(Z⁻¹ A∗(∆y) X) = b − µA(Z⁻¹),
    ∆Z = −A∗(∆y),
    ∆X = µZ⁻¹ − X − Z⁻¹ ∆Z X.
The specific form of the first equation, often called the Schur complement equation, depends on how exactly ZX − µI is symmetrized. We refer to [Tod01] for
a thorough treatment of this issue. The main effort consists in setting up and
solving this Schur equation. Several public domain packages based on this approach are available. The Mittelmann website (http://plato.asu.edu/ftp/sdplib.html) reports benchmark computations using several implementations
of this idea. Looking at the results, it becomes clear that all of these methods
have their limits once the matrix dimension n goes beyond 1000 or once the
number m of constraints is larger than, say, 5000. These limitations are caused
by the dense linear algebra operations with matrices of order n and by setting
up and solving the Schur complement equation of order m. By using iterative
methods to solve this equation, [Toh04] manages to solve certain types of SDP
problems with m up to 100,000. Similarly, [KS07] combines an iterative solver
with a modified barrier formulation of the dual SDP and also reports computational results with the code PENNON for m up to 100,000. Another approach
to solve the pair of dual SDP problems (1),(2) is to use nonlinear optimization
techniques. For instance, [HR00], [BM03], [Mon03] or [BV06] solve (1),(2) after
rewriting them as nonlinear programs. Strong computational results on large
problems with medium accuracy have been reported for these algorithms. We
also mention the mirror-prox method recently proposed in [LNM07] as a (first
order) method tailored for large scale structured SDP.
In this paper, we study alternatives to all these methods, using quadratic regularization of SDP problems. Being linear problems, the primal (1) and the
dual (2) may admit several solutions, which can moreover be very
sensitive to small perturbations of the data C, A and b. A classical and fruitful idea in non-smooth or constrained optimization (which can be traced back
to [BKL66], [Mar70]) is to stabilize the problems by adding quadratic terms:
this will ensure existence, uniqueness and stability of solutions. Augmented
Lagrangian methods [PT72], [Roc76a] are well-known important regularization
techniques. Regarding SDP, an augmented Lagrangian on the primal problem
(1) was considered in [BV06] and an augmented Lagrangian on the dual (2)
in [Ren05] and [PRW06], introducing the so-called “boundary point method”.
More recently, [JR08] propose an augmented primal-dual method which is similar to the present approach, and report computational results for large-scale
random SDP. The idea to apply a proximal regularization to SDP is mentioned
in [Mal05].
The main contributions of this paper are the following. We introduce and
study a new class of regularization methods for SDP and we show that our new
methods are efficient on several classes of large-scale SDP problems (n not too
large, say n ≤ 1000, but with a large number of constraints). We also show
that the boundary point method [PRW06] has a natural interpretation as one
particular instantiation of our general class of methods, thus putting this method
into perspective.
Here is a brief outline of the paper. In the following subsection, we finish
the introduction by presenting two quadratic regularizations for SDP problems.
They will eventually result in a general regularization algorithm for SDP (Algorithm 4.3, studied in Section 4). The connection of this algorithm with the
boundary point method will be made explicit by Proposition 4.4 (and Proposition 3.4). Before this, Sections 2 and 3 are devoted to the study of a particular
type of nonlinear SDP problem - the “inner problem” - appearing in the two
regularizations. Section 2 studies optimality conditions for this type of nonlinear
SDP problems, while Section 3 presents a general approach to solve them. Section 4 then uses these elements to set up the regularization algorithm. Finally
in Section 5 we report computational experiments on randomly generated instances as well as instances from publicly available libraries. A simple instance
of the regularization algorithm compares favorably on these instances with the
best SDP solvers.
1.2 Quadratic regularizations
We focus on two quadratic regularizations: Moreau-Yosida regularization for the
primal problem (1) and augmented Lagrangian method for the dual problem (2).
It is known that these two regularizations are actually equivalent, as primal and
dual views of the same process. We recall them briefly as we will constantly
draw connections between the primal and the dual point of view.
Primal Moreau-Yosida regularization. We denote by ‖ · ‖ the norms associated with the standard inner products of both spaces Sn and Rm. We begin by
considering, for given t > 0, the following problem
    min ⟨C, X⟩ + (1/2t)‖X − Y‖²
    s.t. AX = b, X ⪰ 0, Y ∈ Sn,                                    (5)
which is clearly equivalent to (1) (by minimizing first with respect to Y and then
with respect to X). The idea is then to solve (5) in two steps: For Y fixed, we
minimize first with respect to X and the result is then minimized with respect
to Y . Thus we consider the so-called Moreau-Yosida regularization Ft : Sn → R
defined as the optimal value
    Ft(Y) = min_{X⪰0, AX=b} ⟨C, X⟩ + (1/2t)‖X − Y‖²;               (6)
and therefore we have
    min_{Y∈Sn} Ft(Y) = min_{X⪰0, AX=b} ⟨C, X⟩.                     (7)
By the strong convexity of the objective function of (6), the point that achieves
the minimum is unique; it is called the proximal point of Y (with parameter t)
and it is denoted by Pt (Y ). The next proposition recalls the well-known properties of the Moreau-Yosida regularization Ft (see the section XV.4 of [HUL93]
for instance).
Proposition 1.1 (Properties of the regularization). The function Ft is finite
everywhere, convex and differentiable. Its gradient at Y ∈ Sn is
    ∇Ft(Y) = (1/t)(Y − Pt(Y)).                                     (8)
The functions ∇Ft (·) and Pt (·) are Lipschitz continuous.
Dual regularization by augmented Lagrangian. The augmented Lagrangian technique to solve (2) (going back to [Hes69], [Pow69] and [Roc76a])
introduces the augmented Lagrangian function Lσ with parameter σ > 0:
    Lσ(y, Z; Y) = b⊤y − ⟨Y, A∗y + Z − C⟩ − (σ/2)‖A∗y + Z − C‖².
This is just the usual Lagrangian for the problem

    max b⊤y − (σ/2)‖A∗y + Z − C‖²
    s.t. C − A∗y = Z, Z ⪰ 0,                                       (9)
that is, (2) with an additional redundant quadratic term. The (bi)dual function
is in this case

    Θσ(Y) = max_{y, Z⪰0} Lσ(y, Z; Y).                              (10)
A motivation for this approach is that Θσ is differentiable everywhere, in contrast to the dual function associated with (2). Solving this latter problem by
the augmented Lagrangian method then consists in minimizing Θσ .
Connections. The bridge between the primal and the dual regularizations
is formalized by the following proposition. It is a known result (see [HUL93,
XII.5.1.1] for the general case), and it will be a straightforward corollary of the
forthcoming Proposition 2.1.
Proposition 1.2 (Outer connection). If t = σ, then Θσ (Y ) = Ft (Y ) for all
Y ∈ Sn .
The above two approaches thus correspond to the same quadratic regularization process viewed either on the primal problem (1) or on the dual (2). The
idea to solve the pair of SDP problems is to use the differentiability of Ft (or
Θσ ). This can be seen from the primal point of view: the constrained program
(1) is replaced by (7), leading to the unconstrained minimization of the convex
differentiable function Ft . The proximal algorithm [Roc76b] consists in applying
a descent method to minimize Ft , for instance the gradient method with fixed
step size t. In view of (8), this gives the simple iteration Ynew = Pt (Y ).
In summary, the solution of the semidefinite program (1) by quadratic regularization requires an iterative scheme (outer algorithm) to minimize Ft or Θσ .
Evaluating Ft (Y ) or Θσ (Y ) is itself an optimization problem, which we call the
“inner problem”. From a practical point of view, we are interested in efficient
methods that yield approximate solutions of the inner problem. In the following
section we therefore investigate the optimality conditions of (6) and (10). These
are then used to formulate algorithms for the inner problem. We will then be
in position to describe the overall algorithm.
2 Inner problem: optimality conditions
In this section and the next one, we look at the problem of evaluating Ft (Y ) for
some given Y . Since Ft (Y ) is itself the result of the minimization problem,
    min ⟨C, X⟩ + (1/2t)‖X − Y‖²
    s.t. AX = b, X ⪰ 0,                                            (11)
we consider various techniques to carry out this minimization at least approximately. In this section we study Lagrangian duality of (11) and we take a
closer look at the optimality conditions. The next section will be devoted to
algorithms based on these elements.
We start by introducing the following notation that will be used extensively
in the sequel. The projection of a matrix A ∈ Sn onto the (closed convex) cone
Sn+ and onto its polar cone Sn− are denoted respectively by

    A+ = argmin_{X⪰0} ‖X − A‖    and    A− = argmin_{X⪯0} ‖X − A‖.

Theorem III.3.2.5 of [HUL93] for instance implies that

    A = A+ + A−.                                                   (12)
In fact we have the decomposition explicitly. Let A = Σᵢ λᵢ pᵢpᵢ⊤ be the spectral decomposition of A with eigenvalues λᵢ and eigenvectors pᵢ, which may be
assumed to be pairwise orthogonal and of length one. Then it is well known
that

    A+ = Σ_{λᵢ>0} λᵢ pᵢpᵢ⊤    and    A− = Σ_{λᵢ<0} λᵢ pᵢpᵢ⊤.
Observe also that we have, for any A ∈ Sn and t > 0,

    (tA)+ = tA+    and    (−A)+ = −(A−).                           (13)
We dualize in (11) the two constraints by introducing the Lagrangian

    Lt(X; y, Z) = ⟨C, X⟩ + (1/2t)‖X − Y‖² − ⟨y, AX − b⟩ − ⟨Z, X⟩,

a function of the primal variable X ∈ Sn and the dual (y, Z) ∈ Rm × Sn+. The
dual function defined by

    θt(y, Z) = min_{X∈Sn} Lt(X; y, Z)                              (14)

is then described as follows.
Proposition 2.1 (Inner dual function). The minimum in (14) is attained at

    X(y, Z) = t(Z + A∗y − C) + Y.                                  (15)

The dual function θt is equal to

    θt(y, Z) = b⊤y − ⟨Y, Z + A∗y − C⟩ − (t/2)‖Z + A∗y − C‖².       (16)

Moreover θt is differentiable: its gradient with respect to y is

    ∇y θt(y, Z) = b − A(t(Z + A∗y − C) + Y),                       (17)

and its gradient with respect to Z is

    ∇Z θt(y, Z) = −(t(Z + A∗y − C) + Y).
Proof. Let (y, Z) ∈ Rm × Sn+ be fixed. The function X ↦ Lt(X; y, Z) is strongly
convex and differentiable. It therefore admits a unique minimum point X(y, Z)
satisfying

    0 = ∇X Lt(X(y, Z); y, Z) = C + (1/t)(X(y, Z) − Y) − A∗y − Z,

which gives (15). Thus the dual function can be rewritten as

    θt(y, Z) = Lt(X(y, Z); y, Z)
             = b⊤y + ⟨C − A∗y − Z, t(Z + A∗y − C) + Y⟩ + (t/2)‖Z + A∗y − C‖²
             = b⊤y − ⟨Y, Z + A∗y − C⟩ − (t/2)‖Z + A∗y − C‖².

This function is differentiable with respect to (y, Z). For fixed Z, the gradient
of y ↦ θt(y, Z) is

    ∇y θt(y, Z) = b − AY − tA(Z + A∗y − C) = b − A(t(Z + A∗y − C) + Y),

and the one of Z ↦ θt(y, Z) is

    ∇Z θt(y, Z) = −(t(Z + A∗y − C) + Y).

This completes the proof.
In view of (16), the dual problem of (11) can be formulated as

    max_{y∈Rm, Z⪰0}  b⊤y − ⟨Y, Z + A∗y − C⟩ − (t/2)‖Z + A∗y − C‖².  (18)
Observe that (18) is exactly (10). Proposition 1.2 now becomes obvious, and is
formalized in the following remark.
Remark 2.2 (Proof of Proposition 1.2). Proposition 2.1 and [HUL93, XII.2.3.6]
imply that there is no duality gap. Thus for t = σ we have Ft (Y ) = Θσ (Y ) by
equations (6) and (10).
We also get the expression of the proximal point and of the gradient of the
Moreau regularization Ft in terms of solutions of (18).
Corollary 2.3 (Gradient of Ft). Let (ȳ, Z̄) be a solution of (18). Then

    Pt(Y) = t(Z̄ + A∗ȳ − C) + Y    and    ∇Ft(Y) = −(Z̄ + A∗ȳ − C).
Proof. Given a solution (ȳ, Z̄) of (18), the general duality theory (see for example [HUL93, XII.2.3.6]) yields that X(ȳ, Z̄) = t(Z̄ + A∗ ȳ − C) + Y is the
unique solution of the primal inner problem (11), that is Pt (Y ) by definition.
Moreover, (8) gives the desired expression for the gradient.
The following technical lemma specifies the role of the primal Slater assumption.
Lemma 2.4 (Coercivity). Assume that there exists X̄ ≻ 0 such that AX̄ = b
and that A is surjective. Then θt is coercive; in other terms, θt(y, Z) → −∞
when ‖(y, Z)‖ → +∞ with Z ⪰ 0.
Proof. By (14) defining θt, we have for all (X, y, Z) ∈ Sn+ × Rm × Sn+

    θt(y, Z) ≤ ⟨C, X⟩ + (1/2t)‖X − Y‖² − ⟨y, AX − b⟩ − ⟨Z, X⟩.     (19)
By surjectivity of A, there exist r > 0 and R > 0 such that for all γ ∈ Rm with
‖γ‖ < 2r, there exists Xγ ≻ 0 with ‖Xγ − X̄‖ ≤ R satisfying AXγ − b = γ.
Then set, for y ∈ Rm,

    γ̄ = r y/‖y‖,

and plug the associated Xγ̄ into (19) to get

    θt(y, Z) ≤ ⟨C, Xγ̄⟩ + (1/2t)‖Xγ̄ − Y‖² − ⟨y, γ̄⟩ − ⟨Z, Xγ̄⟩
             = ⟨C, Xγ̄⟩ + (1/2t)‖Xγ̄ − Y‖² − r‖y‖ − ⟨Z, Xγ̄⟩.
Observe that the minimum of ⟨Z, Xγ⟩ over the compact set {(γ, Z) : ‖γ‖ ≤ r,
Z ⪰ 0, ‖Z‖ = 1} is attained. Call this minimum M and the points achieving
it Z̃ ⪰ 0 and X̃ ≻ 0, so that M > 0. Then we can derive the bounds, for all
(y, Z),

    ⟨Z, Xγ̄⟩ = ⟨Z/‖Z‖, Xγ̄⟩ ‖Z‖ ≥ M‖Z‖

and

    θt(y, Z) ≤ ⟨C, Xγ̄⟩ + (1/2t)‖Xγ̄ − Y‖² − r‖y‖ − M‖Z‖.

To conclude, note that the quantity ⟨C, Xγ̄⟩ + (1/2t)‖Xγ̄ − Y‖² is bounded, since
Xγ̄ stays in a ball centered at X̄. So we see that θt(y, Z) → −∞ when ‖y‖ → +∞
or ‖Z‖ → +∞.
Remark 2.5 (A simple example showing that Lemma 2.4 is wrong without a
Slater point). Let J be the matrix of all ones, and consider the problem

    min ⟨C, X⟩
    s.t. ⟨J, X⟩ = 0, ⟨I, X⟩ = 1, X ⪰ 0,

and its dual

    max y₂
    s.t. C − y₁J − y₂I = Z, Z ⪰ 0.
We first observe that the primal problem has no Slater point. To see this, take
a feasible X, and write

    λmin(X) = min_{z≠0} (z⊤Xz)/(z⊤z) ≤ (e⊤Xe)/(e⊤e) = ⟨J, X⟩/n = 0.

Together with X ⪰ 0, it follows that λmin(X) = 0, hence X is singular, and thus
there is no Slater point.
Now we show that the dual function is not coercive. By (16), there holds

    θt(y, Z) = y₂ − ⟨Y, Z + y₁J + y₂I − C⟩ − (t/2)‖Z + y₁J + y₂I − C‖².

Choosing Z = −y₁J and y₂ = 0 with y₁ < 0 and y₁ → −∞, we have (y, Z) with
‖(y, Z)‖ → +∞. However, substituting in θt, we obtain

    θt(y, Z) = ⟨Y, C⟩ − (t/2)‖C‖²,

which is constant.
We end this subsection by summarizing the optimality conditions for (11) and
(18) under the primal Slater assumption.
Proposition 2.6 (Optimality conditions). If there exists a positive definite
matrix X̄ satisfying AX̄ = b, then for any Y ∈ Sn there exist primal and dual
optimal solutions (X, y, Z) for (11),(18), and there is no duality gap. In this
case, the following statements are equivalent:
(i) (X, y, Z) is optimal for (11),(18).
(ii) (X, y, Z) satisfies

    AX = b,   X = t(Z + A∗y − C) + Y,
    X ⪰ 0,   Z ⪰ 0,   ⟨X, Z⟩ = 0.                                  (20)

(iii) (X, y, Z) satisfies

    X = t(Y/t + A∗y − C)+,
    Z = −(Y/t + A∗y − C)−,                                         (21)
    AA∗y + A(Z − C) = (b − AY)/t.
Proof. The Slater assumption implies, first, that the intersection of Sn+ with
{X : AX = b} is non-empty, and therefore that there exists a solution to (11), and
second, that there exist dual solutions (thanks to Lemma 2.4).

Observe now that (ii) gives the KKT conditions of (11), whence the
equivalence between (i) and (ii), since the problems are convex. The equivalence between (ii) and (iii) comes essentially from (12) (precisely from [HUL93,
Theorem III.3.2.5]), which ensures that

    X/t − Z = Y/t + A∗y − C,   X ⪰ 0,   Z ⪰ 0,   ⟨X, Z⟩ = 0

is equivalent to

    X/t = (Y/t + A∗y − C)+,   −Z = (Y/t + A∗y − C)−.

Using the equality X/t = Y/t + A∗y + Z − C, we can also replace the variable
X in AX = b to obtain exactly (21).
3 Inner problem: algorithms
We have just seen that the optimality conditions for the inner problem have a
rich structure. We are going to exploit it and consider several approaches to get
approximate solutions of the inner problem.
3.1 Using the optimality conditions
A simple method to solve the inner problem (18) exploits the fact (look at the
optimality conditions (21)) that for fixed Z, the vector y can be determined
from a linear system of equations, and for fixed y, the matrix Z is obtained
by projection. This is the idea used in the boundary point method [Ren05],
[PRW06], whose corresponding inner problem is solved by the following two-step process.
Algorithm 3.1 (Two-step iterative method for inner problem).
Given t > 0 and Y ∈ Sn.
Choose y ∈ Rm and set Z = −(Y/t + A∗y − C)−.
Repeat until ‖AX − b‖ is small:
    Step 1: Compute the solution y of AA∗y = A(C − Z) + (b − AY)/t.
    Step 2: Update Z = −(Y/t + A∗y − C)− and X = t(Y/t + A∗y − C)+.
By construction, each iteration of this two-step process guarantees that

    X ⪰ 0,   Z ⪰ 0,   ⟨X, Z⟩ = 0.
This explains the name “boundary point method”. Observe also that Step 1 of
Algorithm 3.1 amounts to solving an m × m linear system of the form AA∗ y =
rhs, which can be expensive if m is large. However, in contrast to interior point
methods, the matrix AA∗ is constant throughout the algorithm. So it can be
decomposed at the beginning of the process once and for all. Moreover, the
matrix structure can be exploited to speed up calculation.
To see that the stopping condition makes sense, observe that after Step 2
of the algorithm is executed, the only possibly violated optimality condition in (21) is
AA∗y + A(Z − C) = (b − AY)/t. Indeed, after Step 2, X and Z satisfy
X = t(Z + A∗y − C) + Y; given this identity, it is clear that AX = b if
and only if AA∗y + A(Z − C) = (b − AY)/t. We will come back to convergence
issues in more detail in Section 4.2.
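For concreteness, here is a minimal dense numpy sketch of Algorithm 3.1, under the assumption that the operator A is given by constraint matrices A_1, ..., A_m; the data layout, the starting point y = 0, and the iteration cap are our illustrative choices, not prescribed by the paper:

```python
import numpy as np

def proj_psd(M):
    """Projection onto the PSD cone: keep the nonnegative spectrum."""
    lam, P = np.linalg.eigh(M)
    return (P * np.maximum(lam, 0.0)) @ P.T

def inner_two_step(Amats, b, C, Y, t, tol=1e-8, itmax=5000):
    """Sketch of Algorithm 3.1.  Amats has shape (m, n, n) and holds
    the constraint matrices A_i, so that (AX)_i = <A_i, X>."""
    Aop  = lambda X: np.einsum('ijk,jk->i', Amats, X)   # X -> AX
    Aadj = lambda y: np.einsum('i,ijk->jk', y, Amats)   # y -> A*y
    G = np.einsum('ijk,ljk->il', Amats, Amats)          # AA*: constant,
    L = np.linalg.cholesky(G)                           # factored once
    solve = lambda r: np.linalg.solve(L.T, np.linalg.solve(L, r))
    Z = proj_psd(C - Y / t)                             # i.e. y = 0 start
    for _ in range(itmax):
        # Step 1: solve AA* y = A(C - Z) + (b - AY)/t for y
        y = solve(Aop(C - Z) + (b - Aop(Y)) / t)
        # Step 2: split Y/t + A*y - C into X/t (PSD part) and -Z (negative part)
        M = Y / t + Aadj(y) - C
        X, Z = t * proj_psd(M), proj_psd(-M)
        if np.linalg.norm(Aop(X) - b) <= tol:
            break
    return X, y, Z
```

By construction each iterate satisfies X ⪰ 0, Z ⪰ 0 and ⟨X, Z⟩ = 0, since X and Z come from complementary parts of the spectrum.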
3.2 Using the dual function
We consider again the dual inner problem (18). We observe that the maximization with respect to Z, with y held constant, leads to a projection onto Sn+ and
results in the dual function

    θ̃t(y) = max_{Z⪰0} θt(y, Z)
depending on y only. The differentiability properties of this function are now
summarized.
Proposition 3.2 (Dual function). The function θ̃t is a differentiable function
that can be expressed (up to a constant) as

    θ̃t(y) = b⊤y − (t/2)‖(A∗y − C + Y/t)+‖²                        (22)

and whose gradient is ∇θ̃t(y) = b − tA((A∗y − C + Y/t)+).
Proof. With the help of (16), express θ̃t(y) as the minimum

    θ̃t(y) = −min_{Z⪰0} −θt(y, Z) = b⊤y − min_{Z⪰0} ⟨Y, Z + A∗y − C⟩ + (t/2)‖Z + A∗y − C‖².

Rewrite this objective function as

    ⟨Y, Z + A∗y − C⟩ + (t/2)‖Z + A∗y − C‖² = (t/2)‖Z − (C − Y/t − A∗y)‖² − (1/2t)‖Y‖².

Observe that the minimum is attained at the projection of C − Y/t − A∗y onto
Sn+, that is, at the matrix

    Z(y) = (C − Y/t − A∗y)+ = −(A∗y − C + Y/t)−.                   (23)

Observe also that the function Z : y ↦ (C − Y/t − A∗y)+ is continuous (since
the projection X ↦ X+ is Lipschitz continuous). Equation (12) enables us to write

    θ̃t(y) = b⊤y − (t/2)‖(Y/t − C + A∗y)+‖² + (1/2t)‖Y‖².          (24)
We now want to use a theorem of differentiability of a max-function (as [HUL93,
VI.4.4.5] for example) to compute the derivative of θ̃t. So we need to ensure the
compactness of the index set on which the maximum is taken. Let ȳ ∈ Rm and V
be a compact neighborhood of ȳ in Rm. By continuity of Z(·) defined by (23), the
set U = Z(V) is compact and we can write

    θ̃t(y) = max_{Z∈U∩Sn+} θt(y, Z)

for all y ∈ V. Since the maximum is taken over a compact set, [HUL93, VI.4.4.5]
gives that θ̃t is differentiable at ȳ with

    ∇θ̃t(ȳ) = ∇y θt(ȳ, Z(ȳ)),

for which we have an expression (recall (17)). Since this can be done around any
ȳ ∈ Rm, we conclude that θ̃t is differentiable on Rm. Using (23), we compute,
for y ∈ Rm,

    ∇θ̃t(y) = b − tA(Z(y) + A∗y − C + Y/t) = b − tA((A∗y − C + Y/t)+),

and the proof is complete.
The dual problem (18) can thus be cast as the following concave differentiable
problem

    max_{y∈Rm}  b⊤y − (t/2)‖(A∗y − C + Y/t)+‖²,                    (25)
up to a constant, which is explicitly ‖Y‖²/2t (see (24)). So we can use this
formulation to solve the inner problem (11) through its dual (25) (when there
is no duality gap, see Proposition 2.6). The key point is that the objective function
θ̃t is concave and differentiable, with an explicit expression for its gradient (Proposition 3.2). Thus we can use any classical algorithm of unconstrained convex
programming to solve (25). In particular we can use
• gradient-like strategies (e.g. gradient, conjugate gradient or Nesterov method),

• Newton-like strategies (e.g. quasi-Newton or inexact generalized Newton).
For generality and simplicity, we consider the following variable metric method
to solve (25), which generalizes many of the above classical strategies.
Algorithm 3.3 (Dual variable metric method for the inner problem).
Given t > 0 and Y ∈ Sn. Choose y ∈ Rm.
Repeat until ‖b − AX‖ is small:
    Compute X = t(A∗y − C + Y/t)+.
    Set g = b − AX.
    Update y ← y + τ W g with appropriate W and τ.
The stopping condition is as before, but now it is motivated by the fact that
∇θ̃t(y) = b − tA((A∗y − C + Y/t)+) = b − AX. In fact the connections between
Algorithm 3.3 and the previous inner method (Algorithm 3.1) are strong; they
are precisely stated by the following proposition.
Proposition 3.4 (Inner connection). If A is surjective, Algorithm 3.1 generates
the same sequence as Algorithm 3.3 when both algorithms start from the same
dual iterate y, and when W and τ are kept constant for all iterations, equal to

    W = [AA∗]⁻¹    and    τ = 1/t.
Proof. The surjectivity of A implies that AA∗ is invertible, and then that the
sequence (Xk, yk, Zk)k generated by Algorithm 3.1 is well defined, as well as
the sequence (X̃k, ỹk)k generated by Algorithm 3.3 when W = [AA∗]⁻¹ and
τ = 1/t. Let us prove by induction that yk = ỹk and Xk = X̃k for all k ≥ 0.

We assume that the two algorithms start from the same dual iterate y0 = ỹ0.
It holds that X0 = X̃0 = t(Y/t + A∗y0 − C)+ by construction. Assume now
that we have yk = ỹk and Xk = X̃k. To prove that ỹk+1 = yk+1, we check if
ỹk+1 defined by

    ỹk+1 = ỹk + (1/t)[AA∗]⁻¹(b − AX̃k)

satisfies the equation defining yk+1, that is, if

    ∆k = AA∗ỹk+1 − A(C − Zk) − (b − AY)/t

is null. By construction of ỹk+1, we have

    ∆k = AA∗ỹk + (b − AX̃k)/t − (b − AY)/t − A(C − Zk)
       = −A( X̃k/t − Zk − (Y/t + A∗ỹk − C) ).

Since X̃k = Xk and ỹk = yk, we get ∆k = 0 (by construction of Xk and Zk in
Step 2 of Algorithm 3.1). Hence yk+1 = ỹk+1, and it yields

    X̃k+1 = t(A∗yk+1 − C + Y/t)+ = Xk+1,

and the proof is completed by induction. Note also that the two algorithms
have the same stopping rule.
A direct calculation shows that the primal inner problem (11) can be cast as

    min (1/2t)‖X − (Y − tC)‖²
    s.t. AX = b, X ⪰ 0.                                            (26)
This problem is a so-called semidefinite least-squares problem: we want to compute the matrix nearest to (Y − tC) in C, the intersection of Sn+ with
the affine subspace {X : AX = b}. In other words, we want to project the matrix
Y − tC onto this intersection. This problem has recently received considerable interest
(see [Mal04] for the general case, and [Hig02], [QS06] for the important special
case of correlation matrices).
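For intuition, one classical way to compute such a projection when a dual method is not at hand is Dykstra's alternating projections scheme, used for instance in the correlation-matrix literature cited above. The sketch below (with our own dense data layout, not from the paper) alternates the PSD projection with the projection onto the affine subspace; since the second set is affine, only the cone step needs a Dykstra correction:

```python
import numpy as np

def proj_psd(M):
    """Projection onto the PSD cone: keep the nonnegative spectrum."""
    lam, P = np.linalg.eigh(M)
    return (P * np.maximum(lam, 0.0)) @ P.T

def project_intersection(Amats, b, B, iters=2000):
    """Dykstra's alternating projections onto Sn+ ∩ {X : AX = b},
    i.e. a solver for (26) with B = Y - tC.  Amats has shape
    (m, n, n) with (AX)_i = <A_i, X>."""
    Aop  = lambda X: np.einsum('ijk,jk->i', Amats, X)   # X -> AX
    Aadj = lambda y: np.einsum('i,ijk->jk', y, Amats)   # y -> A*y
    G = np.einsum('ijk,ljk->il', Amats, Amats)          # AA*
    X, P = B.copy(), np.zeros_like(B)
    for _ in range(iters):
        R = proj_psd(X + P)          # project onto the PSD cone
        P = X + P - R                # Dykstra correction for the cone step
        # project R onto the affine subspace {AX = b}
        X = R - Aadj(np.linalg.solve(G, Aop(R) - b))
    return X
```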
4 Outer algorithm
In the previous section we investigated how the inner problem, that is, evaluating Ft(Y) or Θσ(Y) for some given Y, can be solved approximately. These
methods are the backbone of the overall algorithm, which we describe in this section.
4.1 Conceptual outer algorithm
We start with the primal point of view. The Moreau-Yosida quadratic regularization of (1) leads us to a conceptual proximal algorithm, which can be written
as follows.
Algorithm 4.1 (Conceptual proximal algorithm).
Initialization: Choose t > 0 and Y ∈ Sn.
Repeat until (1/t)‖Y − Pt(Y)‖ is small:
    Step 1: Solve the inner problem (11) to get X = Pt(Y).
    Step 2: Set Y = X and update t.
The stopping condition is based on the gradient (8); but obviously, this
“algorithm” is only conceptual since it requires Pt (Y ). Moving now to the dual
point of view, we propose to apply the augmented Lagrangian method to solve
(2) leading to the following algorithm.
Algorithm 4.2 (Conceptual “boundary point”).
Initialization: Choose σ > 0 and Y ∈ Sn.
Repeat until ‖Z + A∗y − C‖ is small:
    Step 1: Solve the inner problem (18) to get (y, Z).
    Step 2: Compute X = Y + σ(Z + A∗y − C), set Y = X and update σ.
If the two inner steps (Step 1 just above and Step 1 of Algorithm 4.1) were
solved exactly, then the expression of the gradient of the regularization Ft (recall
Corollary 2.3) would show that the previous algorithm produces the same iterates as Algorithm 4.1. In other words, the conceptual boundary point method is
equivalent to the conceptual proximal algorithm. Proposition 4.4 below shows
that this correspondence property also holds when the inner problems are solved
approximately. Note that we implicitly assume that the two regularization parameters are equal (t = σ); for clarity, we use only t for the rest of the paper.
Implementable versions of the above algorithms require the computation of
Step 1. In view of the previous sections, we use the general Algorithm 3.3
inside of Algorithms 4.1 and 4.2 to solve Step 1 (inner problem), and we introduce a tolerance εinner for the inner error and another tolerance εouter for the
outer stopping condition. We thus obtain the following regularization algorithm
for SDP.
Algorithm 4.3 (Regularization algorithm for SDP).
Initialization: Choose initial iterates Y, y, and tolerances εinner, εouter.
Repeat until ‖Z + A∗y − C‖ ≤ εouter:
    Repeat until ‖b − AX‖ ≤ εinner:
        Compute X = t(A∗y − C + Y/t)+ and Z = −(A∗y − C + Y/t)−.
        Set g = b − AX and update y ← y + τ W g with appropriate τ and W.
    end (inner repeat)
    Y ← X.
end (outer repeat)
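Putting the pieces together, a minimal dense numpy sketch of Algorithm 4.3 with the boundary-point choices W = [AA∗]⁻¹ and τ = 1/t (see Proposition 4.4 below) could look as follows; the data layout, the fixed t, and the iteration caps are illustrative assumptions of ours:

```python
import numpy as np

def proj_psd(M):
    """Projection onto the PSD cone: keep the nonnegative spectrum."""
    lam, P = np.linalg.eigh(M)
    return (P * np.maximum(lam, 0.0)) @ P.T

def regularization_sdp(Amats, b, C, t=1.0, eps_inner=1e-7, eps_outer=1e-6,
                       outer_max=500, inner_max=1000):
    """Sketch of Algorithm 4.3.  Amats has shape (m, n, n) with
    (AX)_i = <A_i, X>."""
    m, n, _ = Amats.shape
    Aop  = lambda X: np.einsum('ijk,jk->i', Amats, X)   # X -> AX
    Aadj = lambda y: np.einsum('i,ijk->jk', y, Amats)   # y -> A*y
    G = np.einsum('ijk,ljk->il', Amats, Amats)          # AA*: constant,
    L = np.linalg.cholesky(G)                           # factored once
    solve = lambda r: np.linalg.solve(L.T, np.linalg.solve(L, r))
    Y, y = np.zeros((n, n)), np.zeros(m)
    for _ in range(outer_max):
        for _ in range(inner_max):                      # inner loop (Algorithm 3.1)
            M = Y / t + Aadj(y) - C
            X, Z = t * proj_psd(M), proj_psd(-M)        # X >= 0, Z >= 0, <X,Z> = 0
            g = b - Aop(X)
            if np.linalg.norm(g) <= eps_inner:
                break
            y = y + solve(g) / t                        # y <- y + tau*W*g
        if np.linalg.norm(Z + Aadj(y) - C) <= eps_outer:
            break
        Y = X                                           # outer (proximal) update Y <- X
    return X, y, Z
```

As in the text, complementarity and semidefiniteness hold at every iterate, and the loops drive the primal and dual feasibility residuals to zero.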
We note that Algorithm 4.3 has the following particular feature. It is “orthogonal” to interior point methods in the sense that it works to enforce the primal and dual feasibilities while the complementarity and semidefiniteness conditions are guaranteed throughout. In contrast, interior-point methods maintain
primal and dual feasibility and semidefiniteness and try to reach complementarity. We now establish the connection to the boundary point method from
[PRW06].
Proposition 4.4 (Outer connections). If A is surjective and W and τ are fixed at

    W = [AA∗]⁻¹    and    τ = 1/t,

then Algorithm 4.3 generates the same sequence as the boundary point method
(see Table 2 of [PRW06]) when starting from the same initial iterates Y0 = 0
and y0 such that A∗y0 − C ⪰ 0, and with the same stopping thresholds εinner and
εouter.
Proof. The result follows easily from Proposition 3.4 by induction. Note first
that Y0 and y0 give Z0 = 0, and therefore that the initializations coincide. (For
the boundary point method we refer to Table 2 of [PRW06].) Proposition 3.4
then shows that the two inner algorithms generate the same iterates, and they
have the same stopping condition if εinner and εouter correspond to the thresholds
of Table 2 of [PRW06]. Observe finally that the two outer iterations are also
identical: the expressions of X and Z (Step 1 in Algorithm 4.3) give the update
of the boundary point (Step 2 in Algorithm 4.2).
4.2 Elements of convergence
The convergence behaviour of Algorithm 4.3 is at present much less well understood than, for example, the convergence properties of interior point methods.
This lack of theory also leaves many degrees of freedom in tuning the parameters of the algorithm. Section 5 briefly discusses this question and
presents the practical algorithm used in the experiments.
For the sake of completeness we include here a first convergence theorem
with an elementary proof. The theorem's assumptions and proof are inspired
by classical references on the proximal and augmented Lagrangian methods
([PT72], [Roc76a], [Ber82]; see also [BV06]).
Theorem 4.5 (Convergence). Denote by (Yp, yp, Zp)p the sequence of (outer)
iterates generated by Algorithm 4.3, and by εp the associated inner errors (the
value of εinner used at outer iteration p). Make three assumptions in Algorithm 4.3:

• t is constant,

• Σp εp < ∞,

• (yp)p is bounded.
If the primal and dual problems satisfy the Slater assumption, then the sequence
(Yp, yp, Zp)p is asymptotically feasible:

    Zp + A∗yp − C → 0    and    AYp − b → 0.

Thus any accumulation point of the sequence gives a primal-dual solution.
Proof. Assume that the primal and dual problems satisfy the Slater assumption, and let (Ȳ, ȳ, Z̄) be an optimal solution, that is, one satisfying (4). The
assumption that the series of εp converges yields that AYp − b → 0.
We want to prove that the dual residual Gp = −(Zp + A∗yp − C) vanishes as
well. Recall that Yp = Yp−1 − tGp. So we develop

    ‖Yp − Ȳ‖² = ‖Yp−1 − Ȳ‖² − ‖Yp−1 − Ȳ‖² + ‖Yp − Ȳ‖²
              = ‖Yp−1 − Ȳ‖² − ‖Yp + tGp − Ȳ‖² + ‖Yp − Ȳ‖²
              = ‖Yp−1 − Ȳ‖² − 2⟨Yp − Ȳ, tGp⟩ − ‖tGp‖²
              = ‖Yp−1 − Ȳ‖² − t²‖Gp‖² − 2t⟨Yp − Ȳ, Gp⟩.
Let us focus on the last term hYp − Ȳ , Gp i. Using the optimality conditions (4),
we write
Gp = −(Zp + A∗ yp − C) = −(Zp − Z̄) + A∗ (yp − ȳ).
and then
⟨Yp − Ȳ, Gp⟩ = −⟨Yp − Ȳ, Zp − Z̄ + A∗(yp − ȳ)⟩
             = −⟨Yp − Ȳ, A∗(yp − ȳ)⟩ − ⟨Yp, Zp⟩ − ⟨Ȳ, Z̄⟩ + ⟨Yp, Z̄⟩ + ⟨Ȳ, Zp⟩
             ≥ −⟨Yp − Ȳ, A∗(yp − ȳ)⟩
             = −⟨A(Yp − Ȳ), yp − ȳ⟩
             ≥ −εp‖yp − ȳ‖,
the first inequality coming from ⟨Ȳ, Z̄⟩ = ⟨Yp, Zp⟩ = 0 and Ȳ, Z̄, Yp, Zp ⪰ 0, and the second from the Cauchy–Schwarz inequality together with ‖A(Yp − Ȳ)‖ = ‖AYp − b‖ ≤ εp. So we get

t²‖Gp‖² ≤ −‖Yp − Ȳ‖² + ‖Yp−1 − Ȳ‖² + 2tεp‖yp − ȳ‖.
Summing these inequalities over the first N outer iterations, we finally get

t² Σ_{p=1}^N ‖Gp‖² ≤ −‖YN − Ȳ‖² + ‖Y0 − Ȳ‖² + 2t Σ_{p=1}^N εp‖yp − ȳ‖.
Let M be a bound on the sequence ‖yp − ȳ‖ and write

Σ_{p=1}^N ‖Gp‖² ≤ (1/t²) ‖Y0 − Ȳ‖² + (2M/t) Σ_{p=1}^N εp.

By assumption, Σp εp is finite, so Σp ‖Gp‖² is finite as well; hence Gp → 0 and the proof is complete.
4.3 Min-max interpretation and stopping rules
We end the presentation of the overall algorithm by drawing connections with methods for solving min-max problems. Since we have an interpretation of the inner algorithm via duality, it indeed makes sense to cast the primal SDP problem as a min-max problem. With the help of (7) and (25) we can write (1) as
min_{Y∈Sn} max_{y∈Rm} ϕ(Y, y)

with

ϕ(Y, y) = b⊤y − (t/2) ‖(A∗y − C + Y/t)+‖² + (1/(2t)) ‖Y‖².    (27)
Our approach can thus be interpreted as solving the primal and dual SDP problems by computing a saddle-point of ϕ.

With this point of view, the choice of stopping conditions appears even more crucial, because the inner and outer loops are antagonistic: the first minimizes while the second maximizes. In this context, there are two opposite strategies. First, solving the inner maximization with respect to y with high accuracy (using a differentiable optimization algorithm) and then updating Y would amount to following the conceptual proximal algorithm as closely as possible. Alternatively, we can do one gradient-like iteration with respect to each variable successively: this is essentially the idea of the "Arrow–Hurwicz" approach (see for instance [AHU59]), an algorithm that goes back to the 1950s and that has been applied, for example, to the saddle-point problems appearing in finite element discretization of Stokes-type and elasticity equations and in mixed finite element discretization.
We tried both strategies and observed that in practice the second option gives much better numerical results: for the numerical experiments of the next section, we thus always fix the number of inner iterations to one. Besides, an adequate stopping rule in this situation still has to be studied theoretically; this is a general question, beyond the scope of this paper. Note also that since there is no distinction between the inner and the outer iteration in this approach, we use a pragmatic stopping condition on both the relative primal and dual errors; see the next section.
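To make the Arrow–Hurwicz idea concrete, here is a minimal sketch (ours, not the authors' code) on a toy convex-concave problem: one gradient step per variable, alternating. The function, the matrix A and the step size are illustrative stand-ins, not the function ϕ of (27); we take the Lagrangian of min ½‖x‖² subject to Ax = b.

```python
import numpy as np

# Toy saddle-point function: L(x, y) = 0.5*||x||^2 + y.T @ (A @ x - b),
# convex in x, concave (affine) in y. Illustrative data only.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
A = Q.T                              # 3x5 matrix with orthonormal rows
b = rng.standard_normal(3)

x, y = np.zeros(5), np.zeros(3)
step = 0.2
for _ in range(500):
    x = x - step * (x + A.T @ y)     # one descent step in the min variable
    y = y + step * (A @ x - b)       # one ascent step in the max variable

# at the saddle point, x is feasible and x + A.T y = 0 (stationarity)
primal_res = np.linalg.norm(A @ x - b)
dual_res = np.linalg.norm(x + A.T @ y)
```

With one inner iteration, the regularization algorithm of the next section follows exactly this alternating pattern, with the y-step replaced by an exact solve.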
5 Numerical illustrations

5.1 Simple regularization algorithm
Up to now we have seen that a practical implementation of a regularization approach for SDP can be derived from both the primal and the dual viewpoints, resulting in the same algorithmic framework. The choice of the regularization parameter t (or σ), the tolerance levels for the inner problem, and the strategy for W have a strong impact on the actual performance. We tested various prototype algorithms and decided to emphasize the following simple instance of the general algorithm, which often yields the overall best practical performance on our test problems.
• Errors. We use a relative error measure following [Mit03]. The relative primal and dual infeasibilities are:

δp := ‖A(X) − b‖ / (1 + ‖b‖),   δd := ‖C − Z − A∗y‖ / (1 + ‖C‖),   δ := max {δp, δd} .

Recall that all other optimality conditions hold by construction of the algorithm.
• Inner iterations. In our experiments we noticed that the overall performance is best if we execute the inner loop only once. As explained in
Section 4.3, this is a plausible alternative in our context. With one inner
iteration, the choice of W (and τ ) is natural: we take W = (AA∗ )−1 and
τ = 1/t.
• Normalization. In order to simplify the choice for the internal parameters, we assume without loss of generality that the data are scaled so that
both b and C have norm one.
• Initializing t. We select t large enough, so that the relative dual infeasibility is smaller than the relative primal infeasibility. If b and C have
norm one, then a value of t in the range 0.1 ≤ t ≤ 10 turned out to be a
robust choice.
• Updating t. We use the following simple update strategy for t. As a
general rule we try to change t as little as possible. Therefore we leave
t untouched for a fixed number it of iterations. (Our choice is it = 10.)
Every it-th iteration we check whether δp ≤ δd . If this is not the case,
we reduce t by a constant multiplicative factor, which we have set to 0.9.
Otherwise we leave t unchanged.
• Stopping condition. We stop the algorithm once the overall error δ falls below a desired tolerance ε, which we set to ε = 10−7. We also stop after a maximum of 300 iterations in case we do not reach the required level of accuracy.
These specifications lead to the following algorithm, which we use for numerical testing. Its MATLAB source code is available online at [Web].
Algorithm 5.1.
Initialization: Choose t > 0, Y ∈ Sn and ε. Set Z = 0.
Repeat until δ < ε:
Compute the solution y of AA∗ y = A(C − Z) + (b − AY )/t.
Compute Z = −(Y /t + A∗ y − C)− and X = t(Y /t + A∗ y − C)+ .
Update Y ← X.
Compute δ.
Update t.
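The loop above can be sketched in a few lines. The following is a minimal NumPy transcription (our illustration, not the MATLAB code from [Web]): the operator A is stored as a list of symmetric matrices Ai, the parameter t is kept fixed, and the projections come from one eigenvalue decomposition per iteration.

```python
import numpy as np

def solve_sdp(A_mats, b, C, t=1.0, eps=1e-7, max_iter=300):
    """Sketch of Algorithm 5.1 with fixed regularization parameter t."""
    n = C.shape[0]
    Aop = np.array([Ai.ravel() for Ai in A_mats])   # row i is vec(A_i)
    AAt = Aop @ Aop.T                               # Gram matrix AA*
    Y = np.zeros((n, n))
    Z = np.zeros((n, n))
    for _ in range(max_iter):
        # solve AA* y = A(C - Z) + (b - A Y)/t
        rhs = Aop @ (C - Z).ravel() + (b - Aop @ Y.ravel()) / t
        y = np.linalg.solve(AAt, rhs)
        # split M = Y/t + A*y - C into its positive and negative parts
        M = Y / t + (y @ Aop).reshape(n, n) - C
        w, U = np.linalg.eigh(M)
        X = t * (U * np.maximum(w, 0.0)) @ U.T      # X = t (M)_+
        Z = -(U * np.minimum(w, 0.0)) @ U.T         # Z = -(M)_-
        Y = X
        # relative errors delta_p, delta_d of Section 5.1
        delta_p = np.linalg.norm(Aop @ X.ravel() - b) / (1 + np.linalg.norm(b))
        delta_d = np.linalg.norm(C - Z - (y @ Aop).reshape(n, n)) \
                  / (1 + np.linalg.norm(C))
        if max(delta_p, delta_d) < eps:             # stopping condition
            break
        # (the actual Algorithm 5.1 also updates t here; omitted for brevity)
    return X, y, Z

# tiny example: min <C, X> s.t. trace X = 1, X >= 0; optimum is X = e1 e1^T
C = np.diag([1.0, 2.0])
X, y, Z = solve_sdp([np.eye(2)], np.array([1.0]), C)
```

In a serious implementation one would of course factor AA∗ once (Cholesky) instead of calling a dense solver at every iteration, as described below.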
Note that the computational effort of each iteration is essentially determined by the eigenvalue decomposition of the matrix Y/t + A∗y − C of order n, needed to compute the projections X and Z, and by the solution of the linear system of order m with matrix AA∗. However, AA∗ and its Cholesky factorization are computed only once, at the beginning of the algorithm. Hence when n is not too big, the cost of each iteration is moderate, and we can target problems with very large m.

The following section provides some computational results with this algorithm. All computations are done under Linux on a Pentium 4 running at 3.2 GHz with 2.5 GB of RAM.
5.2 Pseudo-random SDP
Our algorithm should be considered as an alternative to interior-point methods in situations where the number m of equations prohibits solving the Schur complement equations by direct methods. In order to test our method systematically, we first consider 'unstructured' SDPs. The currently available libraries of SDP instances do not contain enough instances of this type with sizes suitable for our purposes, so we consider randomly generated problems. Given the parameters n, m, p and seed, we generate pseudo-random instances as follows.
First generator. The main focus lies on A, which should be such that the Cholesky factor of AA∗ can be computed with reasonable effort; this means we must control the sparsity of the matrix. The linear operator A is defined through m symmetric matrices Ai with (AX)i = ⟨Ai, X⟩. The matrix AA∗ therefore has entries (AA∗)ij = ⟨Ai, Aj⟩. This means that the m matrices Ai forming A should be sparse, and the rows of A should be reordered to reduce the fill-in in the Cholesky factor. In order to control the sparsity of A, we generate the matrices Ai defining A to be nonzero only on a submatrix of order p whose support is selected randomly. Then we reorder the rows of A using the MATLAB command symamd to reduce the fill-in. The fill-in increases with m when n and p are fixed, and decreases with n when m and p are fixed.
Having A, we generate a positive definite X and set b = AX. Similarly, we
select a positive definite Z and a vector y and set C = Z + A∗ y, so we have
strong duality. The generator is also available on [Web]. To make the data
reproducible, we initialize the random number generator with the parameter
seed.
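The construction above can be sketched as follows. This is a hypothetical re-implementation (the actual generator is the MATLAB code at [Web], and we omit the symamd reordering step): each Ai is symmetric with support on a random p × p principal submatrix, and the data are completed from strictly feasible primal and dual points.

```python
import numpy as np

def random_sdp(n, m, p, seed):
    """Sketch of the first generator: sparse A_i, then b = A(X), C = Z + A*y."""
    rng = np.random.default_rng(seed)
    A_mats = []
    for _ in range(m):
        idx = rng.choice(n, size=p, replace=False)   # random support
        B = rng.standard_normal((p, p))
        Ai = np.zeros((n, n))
        Ai[np.ix_(idx, idx)] = B + B.T               # symmetric p x p block
        A_mats.append(Ai)
    # primal strictly feasible point: X positive definite, b = A(X)
    G = rng.standard_normal((n, n))
    X = G @ G.T + n * np.eye(n)
    b = np.array([np.sum(Ai * X) for Ai in A_mats])  # entries <A_i, X>
    # dual strictly feasible point: Z positive definite, C = Z + A* y
    H = rng.standard_normal((n, n))
    Z = H @ H.T + n * np.eye(n)
    y = rng.standard_normal(m)
    C = Z + sum(yi * Ai for yi, Ai in zip(y, A_mats))
    # both problems then satisfy the Slater condition
    return A_mats, b, C

A_mats, b, C = random_sdp(6, 4, 3, seed=0)
```

A real implementation would additionally reorder the rows of A (as with MATLAB's symamd) before factoring AA∗.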
In Table 1, we provide a collection of instances. Aside from the parameters
n, m, p, seed used to generate the instances, we include:
• the time (in seconds) to compute the Cholesky factor of AA∗ ,
• the time (in seconds) to reach the required relative accuracy ε = 10−7 ,
• the optimal value of the SDP problems.
Observe that the computing time for the Cholesky factorization cannot be neglected in general, and that it varies a lot according to the construction and
sparsity of the matrix A (see comments above).
Second generator. When both n and m get larger, the bottleneck of our approach is the computation of the Cholesky factor of AA∗. In order to experiment with larger instances as well, we have set up another generator which produces A through the QR decomposition A∗ = QR. Here we select orthogonal columns in Q and select an upper triangular matrix R at random, with a prespecified density (the proportion of nonzero entries in the upper triangle). Again this generator is available online at [Web].
In Table 2 we collect some representative results. We ran the algorithm for 300 iterations and report the accuracy level reached; it was always below 10−6. We manage to solve these instances with reasonable computational effort. We emphasize, however, that these results are only achievable because the Cholesky factorization AA∗ = R⊤R is given as part of the input. Computing the Cholesky factor of the smallest instance (seed=4004010) failed on our machine due to lack of memory.
5.3 Comparison with other methods
There are only a few methods available which are capable of dealing with large-scale SDP. The website http://plato.asu.edu/bench.html, maintained by Hans Mittelmann, gives an overview of the current state of the art in software for solving large SDP problems.
In Table 3, we compare our method with the low-rank approach SDPLR of
[BM03]. Since this is essentially a first order nonlinear optimization approach,
it does not easily reach the same level of accuracy as our method. We have
set a time limit of 3600 seconds and report the final relative accuracy. We also
compare with the code of Kim Toh, described in [Toh04]. The results were
provided to us by Kim Toh; the timings for this experiment were obtained on a
Pentium 4 with 3.0 GHz, a machine that is slightly slower than ours. We first
note that SDPLR is clearly outperformed by both other methods. We also see
that as m gets larger, our method is substantially faster than the code of Toh.
At the time of final revisions of this paper, the preprint [ZST08] became
publicly available, showing that a refined instantiation of our regularization
algorithm leads to more powerful computational results.
5.4 The Lovász theta number
In [PRW06], the boundary point method was applied to the SDP problem underlying the Lovász theta number of a graph. We now come back to this type of SDP with the present approach and compute challenging instances.

Given a graph G (and its complement Ḡ), with vertex set V(G), edge set E(G) and n = |V(G)|, the Lovász theta number ϑ(G) of G is defined as the optimal value of the following SDP problem
ϑ(G) :=  max  ⟨J, X⟩
         s.t. Xij = 0, ∀ij ∈ E(Ḡ),
              trace X = 1, X ⪰ 0,
n     m       p  seed       Cholesky time  Algorithm time  obj-value
300   20000   3  3002030    37             63              761.352
300   25000   3  3002530    217            127             73.838
300   10000   4  3001040    83             97              165.974
400   30000   3  4003030    35             118             1072.139
400   40000   3  4004030    672            202             805.768
400   15000   4  4001540    222            195             -655.000
500   30000   3  5003030    1              201             1107.626
500   40000   3  5004030    19             198             816.610
500   50000   3  5005030    277            254             364.945
500   20000   4  5002040    276            323             328.004
600   40000   3  6004030    1              395             306.617
600   50000   3  6005030    7              372             -386.413
600   60000   3  6006030    93             345             641.736
600   20000   4  6002040    60             485             1045.269
700   50000   3  7005030    1              591             313.202
700   70000   3  7007030    27             507             -369.559
700   90000   3  7009030    749            601             -26.755
800   70000   3  8007030    1              793             2331.395
800   100000  3  80010030   219            805             2259.288
800   110000  3  80011030   739            836             1857.920
900   100000  3  90010030   7              1047            954.222
900   140000  3  90014030   1672           1340            2319.830
1000  100000  3  100010030  1              1600            3096.361
1000  150000  3  100015030  420            1851            1052.887

Table 1: Randomly generated SDPs. Columns 1 to 4 provide the parameters to generate the data. The columns "time" give the time (in seconds) to compute the Cholesky factor and the time of Algorithm 5.1. The last column gives the computed optimal value. The stopping criterion is the error δ falling below 10−7.
n    m       dens  seed      time  error  obj-value
400  40000   10/m  4004010   200   1e-6   8018.86
500  60000   10/m  5006010   382   1e-6   10201.85
600  80000   10/m  8008010   714   1e-6   12465.07
700  100000  10/m  70010010  1176  .5e-6  13063.73
800  130000  10/m  80013010  1550  1e-6   13707.81
900  150000  10/m  90015010  2800  .5e-6  15964.01

Table 2: Randomly generated SDPs with known Cholesky factor of AA∗. We stop the algorithm after 300 iterations. We provide in columns 1 to 4 the parameters to generate the data, and in the last three columns the time (in seconds) to solve the problem, the error and the optimal value.
seed     SDPLR time  SDPLR error  Toh time  Toh error  our time
3002030  686         1.1e-5       204       1.5e-4     63
3002530  1079        1.1e-5       958       2.5e-7     127
3001040  123         1.2e-5       159       3.7e-8     97
4003030  1880        7.3e-6       1000      0.6e-7     118
4004030  3055        5.0e-6       425       0.2e-5     202
4001540  299         1.0e-5       372       0.4e-7     195
5003030  2165        7.8e-6       1309      0.5e-7     201
5004030  3600        8.8e-6       1668      0.4e-7     198
5005030  3600        1.6e-5       1207      0.1e-5     254
5002040  347         1.1e-5       644       0.9e-7     323
6004030  3499        5.8e-6       1658      0.7e-7     395
6005030  3600        3.8e-5       2009      0.5e-5     372
6006030  3600        4.1e-5       1630      0.5e-6     345
6002040  600         4.3e-5       838       0.9e-7     485
7005030  3600        3.4e-5       2696      0.8e-7     591
7007030  3600        6.3e-5       4065      0.3e-7     507
8007030  3600        4.0e-5       2951      1.0e-7     793

Table 3: Comparison on randomly generated SDPs. We compare SDPLR [BM03] and the code of Toh [Toh04] against our code on data sets from Table 1.
graph name   n     |E(Ḡ)|   ϑ(G)     ω(G)  |E(G)|   ϑ(Ḡ)
keller5      776   74710    31.000   27    225990   31.000
keller6      3361  1026582  63.000   ≥ 59  4619898  63.000
san1000      1000  249000   15.000   15    250500   67.000
san400-07.3  400   23940    22.000   22    55860    19.000
brock400-1   400   20077    39.702   27    59723    10.388
brock800-1   800   112095   42.222   23    207505   19.233
p-hat500-1   500   93181    13.074   9     31569    58.036
p-hat1000-3  1000  127754   84.80    ≥ 68  371746   18.23
p-hat1500-3  1500  277006   115.44   ≥ 94  847244   21.52

Table 4: DIMACS graphs and theta numbers
where J is again the matrix of all ones. Lovász [Lov79] showed that the polynomially computable number ϑ(G) separates the clique number ω(G) from the chromatic number χ(G), i.e., it holds

ω(G) ≤ ϑ(G) ≤ χ(G).

Both numbers ω(G) and χ(G) are NP-hard to compute, and in fact even hard to approximate.
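Following the convention above (where ϑ(G) upper-bounds ω(G), so the zero constraints run over the edges of the complement), the theta SDP fits the standard form (1) directly. The encoding below is our own illustration, not the authors' code: one equality constraint per complement edge plus the trace constraint, with C = −J since (1) minimizes.

```python
import numpy as np
from itertools import combinations

def theta_sdp_data(n, edges):
    """Data (A_mats, b, C) of  min <-J, X>, A X = b, X >= 0,
    whose optimal value is -theta(G) for G = ({0,...,n-1}, edges)."""
    edge_set = {frozenset(e) for e in edges}
    A_mats, b = [], []
    # one constraint X_ij = 0 per edge of the complement graph
    for i, j in combinations(range(n), 2):
        if frozenset((i, j)) not in edge_set:
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0
            A_mats.append(E)
            b.append(0.0)
    A_mats.append(np.eye(n))       # trace X = 1
    b.append(1.0)
    C = -np.ones((n, n))           # maximizing <J, X> = minimizing <-J, X>
    return A_mats, np.array(b), C

# example: the 5-cycle, whose theta number is sqrt(5) (Lovász [Lov79])
A_mats, b, C = theta_sdp_data(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
```

The number of constraints is thus |E(Ḡ)| + 1 for ϑ(G), which is exactly the m that governs the cost of each iteration of the algorithm.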
Let us take some graphs from the DIMACS collection [JT96] (available at ftp://dimacs.rutgers.edu/pub/challenge/graph/benchmarks/clique). The clique number of most of these instances is known; we compute their theta number. We do not consider instances having special structure, like Hamming or Johnson graphs, because these allow computational simplifications leading to purely linear programs, see [Sch79]. In [DR07] it is observed that ϑ(G) can be computed efficiently by interior point methods when either |E(G)| or |E(Ḡ)| is not too large. Hence we consider only instances where |V(G)| > 300 and both |E(G)| and |E(Ḡ)| exceed 10000.
For the collection of instances of Table 4, ϑ had not been computed before. We provide for them approximate values of both ϑ(G) and ϑ(Ḡ). The number of outer iterations ranges from 200 to 2000. The computation times are some minutes for the smallest graphs (n = 400) and several hours for the graphs with sizes from n = 500 to n = 1500; for the graph keller6 (n = 3361) one has to allow days in order to reach the desired tolerance.
It turned out that our method worked fine in most cases. We nevertheless noticed extremely slow convergence for the Sanchis graphs (e.g., san1000, san400-07.3). One reason for this may lie in the fact that the resulting SDPs have rank-one optimal solutions.
5.5 Conclusions and perspectives on experiments
We have proposed a class of regularization methods for solving linear SDP problems, having in mind SDP problems with a large number of constraints. In the previous sections we presented these methods and studied their theoretical properties; in this last section we presented numerical experiments showing that in practice our approach compares favourably, on several classes of SDPs, with the most efficient currently available SDP solvers. Our approach has nevertheless not yet reached the level of refinement necessary to be competitive on a very large range of problems. The main issues are the choice of the classical optimization algorithm for the inner problem (in other words, with our notation, the choice of W and τ), and moreover the stopping rule for this inner algorithm. As usual with regularization methods, an important issue (still rather mysterious) is the proper choice of the regularization parameter and the way to update it. The clarification and generalization carried out in this paper thus open the way for improvements: there is ample room for further research in this area, in both theory and practice. Moreover, in order to be widely applicable, the method needs to be set up so that it can handle several blocks of semidefinite matrices, as well as inequality constraints. All this is also the topic of further research and development.
Acknowledgement
We thank Claude Lemaréchal for many fruitful discussions on the approach
discussed in this paper. We also thank Kim Toh and Sam Burer for making
their codes available to us. Moreover, Kim Toh was kind enough to experiment
with our random number generators and to run some of the instances independently on his machine. Finally, we appreciate the constructive comments of two
anonymous referees, leading to the present version of the paper.
References

[AHU59] K. Arrow, L. Hurwicz, and H. Uzawa. Studies in Linear and Nonlinear Programming. Stanford University Press, 1959.

[Ber82] D.P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, 1982.

[BKL66] R. Bellman, R. Kalaba, and J. Lockett. Numerical Inversion of the Laplace Transform. Elsevier, 1966.

[BM03] S. Burer and R.D.C. Monteiro. A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Mathematical Programming (Series B), 95:329-357, 2003.

[BV06] S. Burer and D. Vandenbussche. Solving lift-and-project relaxations of binary integer programs. SIAM Journal on Optimization, 16(3):726-750, 2006.

[DR07] I. Dukanovic and F. Rendl. Semidefinite programming relaxations for graph coloring and maximal clique problems. Mathematical Programming, 109:345-365, 2007.

[Hes69] M.R. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4:303-320, 1969.

[Hig02] N. Higham. Computing a nearest symmetric correlation matrix - a problem from finance. IMA Journal of Numerical Analysis, 22(3):329-343, 2002.

[HR00] C. Helmberg and F. Rendl. A spectral bundle method for semidefinite programming. SIAM Journal on Optimization, 10(3):673-696, 2000.

[HUL93] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms. Springer Verlag, Heidelberg, 1993. Two volumes.

[JR08] F. Jarre and F. Rendl. An augmented primal-dual method for linear conic problems. SIAM Journal on Optimization, 2008. To appear.

[JT96] D.J. Johnson and M.A. Trick, editors. Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, Workshop, October 11-13, 1993. American Mathematical Society, Boston, MA, USA, 1996.

[KS07] M. Kočvara and M. Stingl. On the solution of large-scale SDP problems by the modified barrier method using iterative solvers. Mathematical Programming, 109:413-444, 2007.

[LNM07] Z. Lu, A. Nemirovski, and R.D.C. Monteiro. Large-scale semidefinite programming via a saddle point Mirror-Prox algorithm. Mathematical Programming, 109(2-3, Ser. B):211-237, 2007.

[Lov79] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25:1-7, 1979.

[Mal04] J. Malick. A dual approach to semidefinite least-squares problems. SIAM Journal on Matrix Analysis and Applications, 26(1):272-284, 2004.

[Mal05] J. Malick. Algorithmique en analyse variationnelle et optimisation semidéfinie. PhD thesis, Université Joseph Fourier de Grenoble (France), November 2005.

[Mar70] B. Martinet. Régularisation d'inéquations variationnelles par approximations successives. Revue Française d'Informatique et Recherche Opérationnelle, R-3:154-179, 1970.

[Mit03] H.D. Mittelmann. An independent benchmarking of SDP and SOCP solvers. Mathematical Programming (Series B), 95:407-430, 2003.

[Mon03] R.D.C. Monteiro. First- and second-order methods for semidefinite programming. Mathematical Programming, 97:663-678, 2003.

[Pow69] M.J.D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization. Academic Press, London, New York, 1969.

[PRW06] J. Povh, F. Rendl, and A. Wiegele. A boundary point method to solve semidefinite programs. Computing, 78:277-286, 2006.

[PT72] B.T. Polyak and N.V. Tretjakov. On an iterative method for linear programming and its economical interpretations. Èkonom. i Mat. Metody, 8:740-751, 1972.

[QS06] H. Qi and D. Sun. Quadratic convergence and numerical experiments of Newton's method for computing the nearest correlation matrix. SIAM Journal on Matrix Analysis and Applications, 28:360-385, 2006.

[Ren05] F. Rendl. A boundary point method to solve semidefinite programs. Report 2/2005, Annals of Oberwolfach, Springer, 2005.

[Roc76a] R.T. Rockafellar. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research, 1:97-116, 1976.

[Roc76b] R.T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14:877-898, 1976.

[Sch79] A. Schrijver. A comparison of the Delsarte and Lovász bounds. IEEE Transactions on Information Theory, IT-25:425-429, 1979.

[Tod01] M.J. Todd. Semidefinite optimization. Acta Numerica, 10:515-560, 2001.

[Toh04] K.C. Toh. Solving large scale semidefinite programs via an iterative solver on the augmented systems. SIAM Journal on Optimization, 14:670-698, 2004.

[Web] Website. http://www.math.uni-klu.ac.at/or/Software.

[WSV00] H. Wolkowicz, R. Saigal, and L. Vandenberghe. Handbook of Semidefinite Programming. Kluwer, 2000.

[ZST08] X. Zhao, D. Sun, and K. Toh. A Newton-CG augmented Lagrangian method for semidefinite programming. Technical report, National University of Singapore, March 2008.