Lagrangian Relaxations and Randomization

• Semidefinite relaxations
• Lagrangian relaxations for general QCQPs
• Randomization
• Bounds on suboptimality (MAXCUT)
Easy & Hard Problems
classical view on complexity:
• linear is easy
• nonlinear is hard(er)
Easy & Hard Problems
... EE364 (and correct) view:
• convex is easy
• nonconvex is hard(er)
Convex Optimization
minimize f0(x)
subject to f1(x) ≤ 0, . . . , fm(x) ≤ 0
x ∈ Rn is optimization variable; fi : Rn → R are convex:
fi(λx + (1 − λ)y) ≤ λfi(x) + (1 − λ)fi(y)
for all x, y, 0 ≤ λ ≤ 1
• includes LS, LP, QP, and many others
• like LS, LP, and QP, convex problems are fundamentally tractable
Nonconvex Problems
nonconvexity makes problems essentially intractable...
• sometimes the result of bad problem formulation
• however, often arises because of some natural limitation: fixed
transaction costs, binary communications, ...
Example: Robust LP
minimize cT x
subject to Prob(aiT x ≤ bi) ≥ η, i = 1, . . . , m
coefficient vectors ai IID, N(āi, Σi); η is the required reliability
• for fixed x, aiT x is N(āiT x, xT Σix)
• so for η = 50%, the robust LP reduces to the LP
minimize cT x
subject to āiT x ≤ bi, i = 1, . . . , m
and so is easily solved
• what about other values of η, e.g., η = 10%? η = 90%?
Hint
{x | Prob(aiT x ≤ bi) ≥ η, i = 1, . . . , m}
(figure: this feasible set plotted for η = 10%, η = 50%, and η = 90%)
That’s right
robust LP with reliability η = 90% is convex, and very easily solved
robust LP with reliability η = 10% is not convex, and extremely difficult
moral: very difficult and very easy problems can look quite similar
(to the untrained eye)
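to see why: for fixed x, aiT x is Gaussian with mean āiT x and standard deviation ‖Σi^(1/2) x‖, so with Φ the standard normal CDF,

Prob(aiT x ≤ bi) ≥ η  ⟺  āiT x + Φ⁻¹(η) ‖Σi^(1/2) x‖ ≤ bi

for η ≥ 50%, Φ⁻¹(η) ≥ 0 and this is a second-order cone constraint (convex); for η < 50%, Φ⁻¹(η) < 0 and the constraint is nonconvex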
Nonconvex Problems
what can be done?... we will use convex optimization results to:
• find bounds on the optimal value, by relaxation
• get "good" feasible points via randomization
Basic Problem
• focus here on a specific class of problems: general QCQPs
• vast range of applications...
the generic QCQP can be written:
minimize xT P0x + q0T x + r0
subject to xT Pix + qiT x + ri ≤ 0, i = 1, . . . , m
• if all Pi are p.s.d., convex problem, use EE364...
• here, we suppose at least one Pi is not p.s.d.
Example: Boolean Least Squares
Boolean least-squares problem:
minimize ‖Ax − b‖²
subject to xi² = 1, i = 1, . . . , n
• basic problem in digital communications
• could check all 2n possible values of x . . .
• an NP-hard problem, and very hard in practice
• many heuristics for approximate solution
Example: Partitioning Problem
two-way partitioning problem described in §5.1.4 of the 364 reader:
minimize xT W x
subject to xi² = 1, i = 1, . . . , n
where W ∈ Sn, with Wii = 0.
• a feasible x corresponds to the partition
{1, . . . , n} = {i | xi = −1} ∪ {i | xi = 1}
• the matrix coefficient Wij can be interpreted as the cost of having the
elements i and j in the same partition.
• the objective is to find the partition with least total cost
• classic particular instance: MAXCUT (Wij ≥ 0)
Convex Relaxation
the original QCQP:
minimize xT P0x + q0T x + r0
subject to xT Pix + qiT x + ri ≤ 0, i = 1, . . . , m
can be rewritten:
minimize Tr(XP0) + q0T x + r0
subject to Tr(XPi) + qiT x + ri ≤ 0, i = 1, . . . , m
           X ⪰ xxT
           Rank(X) = 1
the only nonconvex constraint is now Rank(X) = 1...
Convex Relaxation: Semidefinite Relaxation
• we can directly relax this last constraint, i.e., drop the nonconvex Rank(X) = 1 and keep only X ⪰ xxT
• the resulting program gives a lower bound on the optimal value
minimize Tr(XP0) + q0T x + r0
subject to Tr(XPi) + qiT x + ri ≤ 0, i = 1, . . . , m
           X ⪰ xxT
Tricky. . . can this relaxation be improved?
Lagrangian Relaxation
from the original problem:
minimize xT P0x + q0T x + r0
subject to xT Pix + qiT x + ri ≤ 0, i = 1, . . . , m
we can form the Lagrangian:
L(x, λ) = xT (P0 + Σi λiPi) x + (q0 + Σi λiqi)T x + (r0 + Σi λiri)

in the variables x ∈ Rn and λ ∈ Rm+, with all sums taken over i = 1, . . . , m...
Lagrangian Relaxation: Lagrangian
the dual can be computed explicitly as an (unconstrained) quadratic
minimization problem, with:
infx (xT P x + qT x + r) = r − (1/4) qT P†q   if P ⪰ 0 and q ∈ R(P)
                         = −∞                 otherwise
we have:
infx L(x, λ) = −(1/4) (q0 + Σi λiqi)T (P0 + Σi λiPi)† (q0 + Σi λiqi) + Σi λiri + r0
where we recognize a Schur complement...
Lagrangian Relaxation: Dual
the dual of the QCQP is then given by:
maximize   γ + Σi λiri + r0
subject to [ P0 + Σi λiPi        (q0 + Σi λiqi)/2 ]
           [ (q0 + Σi λiqi)T/2   −γ               ]  ⪰ 0
           λi ≥ 0, i = 1, . . . , m
which is a semidefinite program in the variables λ ∈ Rm and γ ∈ R, and can be solved efficiently
let us look at what happens when we use semidefinite duality to compute
the dual of this last program...
Lagrangian Relaxation: Bidual
Taking the dual again, we get the SDP:
minimize   Tr(XP0) + q0T x + r0
subject to Tr(XPi) + qiT x + ri ≤ 0, i = 1, . . . , m
           [ X    x ]
           [ xT   1 ]  ⪰ 0
in the variables X ∈ Sn and x ∈ Rn
• this is a convexification of the original program
• we have recovered the semidefinite relaxation in an “automatic” way
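as a quick illustration, a minimal cvxpy sketch of this relaxation (the data P0, q0, r0 and the lists Ps, qs, rs are assumed given as numpy arrays):

import numpy as np
import cvxpy as cp

# Shor relaxation of the QCQP, written in the Schur-complement form above
n = P0.shape[0]
M = cp.Variable((n + 1, n + 1), symmetric=True)   # M = [[X, x], [x^T, 1]]
X, x = M[:n, :n], M[:n, n]
constraints = [M >> 0, M[n, n] == 1]
constraints += [cp.trace(Pi @ X) + qi @ x + ri <= 0
                for Pi, qi, ri in zip(Ps, qs, rs)]
relaxation = cp.Problem(cp.Minimize(cp.trace(P0 @ X) + q0 @ x + r0), constraints)
lower_bound = relaxation.solve()   # lower bound on the optimal value of the QCQP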
Lagrangian Relaxation: Boolean LS
using the previous technique, we can take the original Boolean LS problem:
minimize ‖Ax − b‖²
subject to xi² = 1, i = 1, . . . , n
and relax it as an SDP:
minimize   Tr(AT AX) − 2bT Ax + bT b
subject to [ X    x ]
           [ xT   1 ]  ⪰ 0
           Xii = 1, i = 1, . . . , n
this program then produces a lower bound on the optimal value of the
original Boolean LS program
Lagrangian Relaxation: Partitioning
the partitioning problem defined above is:
minimize xT W x
subject to xi² = 1, i = 1, . . . , n
the variable x disappears from the relaxation, which becomes:
minimize Tr(W X)
subject to X ⪰ 0
Xii = 1, i = 1, . . . , n
Feasible points?
• Lagrangian relaxations only provide lower bounds on the optimal value
• how can we compute good feasible points?
• can we measure how suboptimal this lower bound is?
Randomization
the original QCQP:
minimize xT P0x + q0T x + r0
subject to xT Pix + qiT x + ri ≤ 0, i = 1, . . . , m
was relaxed into:
minimize Tr(XP0) + q0T x + r0
subject to Tr(XPi) + qiT x + ri ≤ 0, i = 1, . . . , m
           [ X    x ]
           [ xT   1 ]  ⪰ 0
• the last (Schur complement) constraint is equivalent to X − xxT ⪰ 0
• hence, if x and X are the solution to the relaxed program, then
X − xxT is a covariance matrix...
Randomization
• pick x as a Gaussian variable with x ∼ N(x, X − xxT), using the x and X from the relaxed program
• x will solve the QCQP "on average" over this distribution
in other words:
minimize E[xT P0x + q0T x + r0]
subject to E[xT Pix + qiT x + ri] ≤ 0, i = 1, . . . , m
a good feasible point can then be obtained by sampling enough x. . .
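a minimal numpy sketch of this sampling step, assuming x_star, X_star hold the (numpy) solution of the relaxed program and P0, q0, r0, Ps, qs, rs are the problem data as before:

import numpy as np

rng = np.random.default_rng(0)
cov = X_star - np.outer(x_star, x_star)   # covariance from the relaxation solution
best, best_val = None, np.inf
for _ in range(1000):
    z = rng.multivariate_normal(x_star, cov)
    # keep z only if it satisfies every quadratic constraint
    if all(z @ Pi @ z + qi @ z + ri <= 0 for Pi, qi, ri in zip(Ps, qs, rs)):
        val = z @ P0 @ z + q0 @ z + r0
        if val < best_val:
            best, best_val = z, val       # best feasible point found so far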
Linearization
Consider the constraint:
xT P x + q T x + r ≤ 0
we decompose the matrix P into its positive and negative parts
P = P+ − P−,    P+, P− ⪰ 0
and the original constraint becomes
xT P+x + qT x + r ≤ xT P−x
Linearization
both sides of the inequality are now convex quadratic functions. We linearize the right hand side around an initial feasible point x(0) to obtain
xT P+x + qT x + r ≤ x(0)T P−x(0) + 2x(0)T P−(x − x(0))
• the right hand side is now an affine lower bound on the original
function xT P−x (see §3.1.3 in the 364 reader).
• the resulting constraint is convex and more conservative than the
original one, hence the feasible set of the new problem will be a convex
subset of the original feasible set
• we form a convex restriction of the problem
We can then solve the convex restriction to get a better feasible point
x(1) and iterate. . .
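a sketch of one linearization step in numpy/cvxpy, assuming P, q, r describe the nonconvex constraint and x0 is the current feasible point:

import numpy as np
import cvxpy as cp

# split P into positive and negative parts via an eigendecomposition
w, V = np.linalg.eigh(P)
P_plus = V @ np.diag(np.maximum(w, 0.0)) @ V.T
P_minus = V @ np.diag(np.maximum(-w, 0.0)) @ V.T
P_plus = (P_plus + P_plus.T) / 2      # symmetrize against round-off
P_minus = (P_minus + P_minus.T) / 2

x = cp.Variable(P.shape[0])
# affine lower bound on x^T P_minus x, linearized around x0
rhs = x0 @ P_minus @ x0 + 2 * (P_minus @ x0) @ (x - x0)
restricted = cp.quad_form(x, P_plus) + q @ x + r <= rhs
# use `restricted` in place of the original constraint and re-solve to get x(1)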
Bounds on suboptimality
• in certain particular cases, it is possible to get a hard bound on the
gap between the optimal value and the relaxation result
• a classic example is that of the MAXCUT bound
The MAXCUT problem is a particular case of the partitioning problem:
maximize xT W x
subject to xi² = 1, i = 1, . . . , n
its Lagrangian relaxation is computed as:
maximize Tr(W X)
subject to X ⪰ 0
Xii = 1, i = 1, . . . , n
Bounds on suboptimality: MAXCUT
Let X be a solution to this program
• we look for a feasible point by sampling a normal distribution N (0, X)
• we convert each sample point x to a feasible point by rounding it to
the nearest value in {−1, 1}, i.e. taking
x̂ = sgn(x)
crucially, when x̂ is sampled using this procedure, the expected value of the objective E[x̂T W x̂] can be computed explicitly:

E[x̂T W x̂] = (2/π) Σi,j Wij arcsin(Xij) = (2/π) Tr(W arcsin(X))

(with arcsin applied elementwise to X)
Bounds on suboptimality: MAXCUT
• we are guaranteed to reach this expected value 2/π Tr(W arcsin(X))
after sampling a few (feasible) points x̂
• hence we know that the optimal value OPT of the MAXCUT problem is between (2/π) Tr(W arcsin(X)) and Tr(W X)
furthermore, with arcsin(X) ⪰ X, we can simplify (and relax) the above expression to get:

(2/π) Tr(W X) ≤ OPT ≤ Tr(W X)
the procedure detailed above thus guarantees that we can find a feasible point whose value is within a factor 2/π of the optimal value
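numerically, both ends of this sandwich are one line each, assuming X_opt holds the (numpy) solution of the MAXCUT relaxation:

import numpy as np

# np.arcsin acts elementwise; clipping guards against round-off outside [-1, 1]
expected_value = (2 / np.pi) * np.trace(W @ np.arcsin(np.clip(X_opt, -1.0, 1.0)))
upper_bound = np.trace(W @ X_opt)   # relaxation bound: expected_value ≤ OPT ≤ upper_bound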
Numerical Example: Boolean LS
Boolean least-squares problem:
minimize ‖Ax − b‖²
subject to xi² = 1, i = 1, . . . , n
with
‖Ax − b‖² = xT AT Ax − 2bT Ax + bT b
          = Tr(AT AX) − 2bT Ax + bT b
where X = xxT , hence can express BLS as
minimize Tr(AT AX) − 2bT Ax + bT b
subject to Xii = 1,  X ⪰ xxT,  rank(X) = 1
. . . still a very hard problem
SDP relaxation for BLS
using Lagrangian relaxation, remember:
X ⪰ xxT   ⟺   [ X    x ]
              [ xT   1 ]  ⪰ 0
we obtained the SDP relaxation (with variables X, x)
minimize   Tr(AT AX) − 2bT Ax + bT b
subject to Xii = 1, i = 1, . . . , n
           [ X    x ]
           [ xT   1 ]  ⪰ 0
• optimal value of SDP gives lower bound for BLS
• if optimal matrix is rank one, we’re done
Interpretation via randomization
• can think of variables X, x in SDP relaxation as defining a normal
distribution z ∼ N(x, X − xxT), with E zi² = 1
• SDP objective is E ‖Az − b‖²
suggests randomized method for BLS:
• find X opt, xopt, optimal for SDP relaxation
• generate z from N(xopt, X opt − xopt xoptT)
• take x = sgn(z) as approximate solution of BLS
(can repeat many times and take best one)
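a compact cvxpy/numpy sketch of this procedure (A, b assumed given):

import numpy as np
import cvxpy as cp

def boolean_ls_sdp(A, b, n_samples=1000, seed=0):
    m, n = A.shape
    M = cp.Variable((n + 1, n + 1), symmetric=True)       # M = [[X, x], [x^T, 1]]
    X, x = M[:n, :n], M[:n, n]
    objective = cp.trace(A.T @ A @ X) - 2 * (A.T @ b) @ x + b @ b
    prob = cp.Problem(cp.Minimize(objective),
                      [M >> 0, M[n, n] == 1, cp.diag(X) == 1])
    lower_bound = prob.solve()

    # randomization: sample z ~ N(x_opt, X_opt - x_opt x_opt^T), round with sgn
    rng = np.random.default_rng(seed)
    x_opt, X_opt = x.value, X.value
    cov = X_opt - np.outer(x_opt, x_opt) + 1e-9 * np.eye(n)   # small jitter for PSD-ness
    best, best_val = None, np.inf
    for _ in range(n_samples):
        z = rng.multivariate_normal(x_opt, cov)
        x_hat = np.where(z >= 0, 1.0, -1.0)
        val = np.linalg.norm(A @ x_hat - b) ** 2
        if val < best_val:
            best, best_val = x_hat, val
    return lower_bound, best, best_val

comparing best_val to lower_bound gives the "percent over SDP bound" figures reported on the next slides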
Example
• (randomly chosen) parameters A ∈ R150×100, b ∈ R150
• x ∈ R100, so the feasible set has 2¹⁰⁰ ≈ 10³⁰ points
LS approximate solution: minimize ‖Ax − b‖ s.t. ‖x‖² = n, then round;
yields objective 8.7% over SDP relaxation bound
randomized method: (using SDP optimal distribution)
• best of 20 samples: 3.1% over SDP bound
• best of 1000 samples: 2.6% over SDP bound
(figure: histogram of ‖Ax − b‖/(SDP bound) over the randomized samples, with the SDP bound and the LS solution marked)
Example: Partitioning Problem
we go back now to the two-way partitioning problem considered in
exercise 5.39 of the reader:
minimize xT W x
subject to xi² = 1, i = 1, . . . , n
the Lagrange dual of this problem is given by the SDP:
maximize −1T ν
subject to W + diag(ν) ⪰ 0
Partitioning: Lagrangian relaxation
the dual of this SDP is the new SDP:
minimize Tr(W X)
subject to X ⪰ 0
           Xii = 1, i = 1, . . . , n
the optimal value of this SDP, Tr(W X opt), gives a lower bound on the optimal value popt of the partitioning problem
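in cvxpy, this relaxation is a few lines (W assumed given as a symmetric numpy array):

import numpy as np
import cvxpy as cp

n = W.shape[0]
X = cp.Variable((n, n), symmetric=True)
relaxation = cp.Problem(cp.Minimize(cp.trace(W @ X)),
                        [X >> 0, cp.diag(X) == 1])
lower_bound = relaxation.solve()   # lower bound on popt
X_opt = X.value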
Partitioning: simple heuristic
• solve the previous SDP to find X opt (and the associated lower bound on popt)
• let v denote an eigenvector of X opt associated with its largest
eigenvalue
• now let
x̂ = sgn(v)
the vector x̂ is our guess for a good partition
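a numpy sketch of this heuristic, using X_opt from the relaxation above:

import numpy as np

eigvals, eigvecs = np.linalg.eigh(X_opt)    # eigh returns eigenvalues in ascending order
v = eigvecs[:, -1]                          # eigenvector for the largest eigenvalue
x_hat = np.where(v >= 0, 1.0, -1.0)         # x_hat = sgn(v)
cost = x_hat @ W @ x_hat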
Partitioning: Randomization
• we generate independent samples x(1), . . . , x(K) from a normal
distribution with zero mean and covariance X opt
• for each sample we consider the heuristic approximate solution
x̂(k) = sgn(x(k))
• we then take the one with lowest cost
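a numpy sketch of this randomization, with X_opt and W as before:

import numpy as np

K = 1000
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(W.shape[0]), X_opt, size=K)
candidates = np.where(samples >= 0, 1.0, -1.0)               # x_hat(k) = sgn(x(k))
costs = np.einsum('ki,ij,kj->k', candidates, W, candidates)  # x_hat(k)^T W x_hat(k)
best = candidates[np.argmin(costs)]                          # lowest-cost partition found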
Partitioning: Numerical Example
we compare the performance of these methods on a randomly chosen
problem
• the SDP relaxation gives a lower bound on popt equal to −1641
• the simple sign(x) heuristic gives a partition with total cost −1280
exactly what the optimal value is, we can’t say; all we can say at this
point is that it is between −1641 and −1280
Partitioning
histogram of the objective obtained by the randomized heuristic, over
1000 samples: the minimum value reached here is −1328
(figure: objective value versus number of samples, for up to 1000 samples)
we’re not sure what the optimal cost is, but now we know it’s between
−1641 and −1328
Greedy method
we can improve these results a little bit using the following simple greedy
heuristic
• let Y be the matrix with entries yij = x̂i Wij x̂j, so that the objective is x̂T W x̂ = Σi,j yij
• if some column j of Y has a positive sum Σi yij, switching x̂j to −x̂j decreases the objective by 4 Σi yij (the entries in row j and column j of Y change sign)
• if we pick the column j0 with the largest sum, switch x̂j0, and repeat until no column sum is positive, we decrease the objective at each step
if we apply this simple method to the solution of the randomized method
we get an objective value of −1356, while applying it to the SDP
heuristic gives an objective value of −1372, our best partition yet...
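a numpy sketch of this greedy pass, starting from any partition vector x_hat in {−1, 1}^n:

import numpy as np

def greedy_improve(x_hat, W):
    x = x_hat.copy()
    while True:
        col_sums = x * (W @ x)       # col_sums[j] = sum_i Wij xi xj (column sums of Y)
        j = np.argmax(col_sums)
        if col_sums[j] <= 0:         # no column sum is positive: stop
            return x
        x[j] = -x[j]                 # flip the entry with the largest positive column sum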