Lagrangian Relaxations and Randomization

• Semidefinite relaxations
• Lagrangian relaxations for general QCQPs
• Randomization
• Bounds on suboptimality (MAXCUT)

Easy & Hard Problems

classical view on complexity:
• linear is easy
• nonlinear is hard(er)

Easy & Hard Problems

... the EE364 (and correct) view:
• convex is easy
• nonconvex is hard(er)

Convex Optimization

minimize    f_0(x)
subject to  f_1(x) ≤ 0, ..., f_m(x) ≤ 0

x ∈ R^n is the optimization variable; the f_i : R^n → R are convex:

f_i(λx + (1 − λ)y) ≤ λ f_i(x) + (1 − λ) f_i(y)   for all x, y, 0 ≤ λ ≤ 1

• includes LS, LP, QP, and many others
• like LS, LP, and QP, convex problems are fundamentally tractable

Nonconvex Problems

nonconvexity makes problems essentially intractable...
• sometimes the result of bad problem formulation
• however, often arises because of some natural limitation: fixed transaction costs, binary communications, ...

Example: Robust LP

minimize    c^T x
subject to  Prob(a_i^T x ≤ b_i) ≥ η,   i = 1, ..., m

coefficient vectors a_i IID, a_i ∼ N(ā_i, Σ_i); η is the required reliability

• for fixed x, a_i^T x is N(ā_i^T x, x^T Σ_i x)
• so for η = 50%, the robust LP reduces to the LP

minimize    c^T x
subject to  ā_i^T x ≤ b_i,   i = 1, ..., m

and so is easily solved
• what about other values of η, e.g., η = 10%? η = 90%?

Hint

[figure: the set {x | Prob(a_i^T x ≤ b_i) ≥ η, i = 1, ..., m} for η = 10%, η = 50%, and η = 90%]

That's right

robust LP with reliability η = 90% is convex, and very easily solved

robust LP with reliability η = 10% is not convex, and extremely difficult

moral: very difficult and very easy problems can look quite similar (to the untrained eye)

Nonconvex Problems

what can be done?... we will use convex optimization results to:
• find bounds on the optimal value, by relaxation
• get "good" feasible points via randomization

Basic Problem

• focus here on a specific class of problems: general QCQPs
• vast range of applications...

the generic QCQP can be written:

minimize    x^T P_0 x + q_0^T x + r_0
subject to  x^T P_i x + q_i^T x + r_i ≤ 0,   i = 1, ..., m

• if all P_i are p.s.d., this is a convex problem, use EE364...
• here, we suppose at least one P_i is not p.s.d.

Example: Boolean Least Squares

Boolean least-squares problem:

minimize    ‖Ax − b‖^2
subject to  x_i^2 = 1,   i = 1, ..., n

• basic problem in digital communications
• could check all 2^n possible values of x ...
• an NP-hard problem, and very hard in practice
• many heuristics for approximate solution

Example: Partitioning Problem

two-way partitioning problem described in §5.1.4 of the 364 reader:

minimize    x^T W x
subject to  x_i^2 = 1,   i = 1, ..., n

where W ∈ S^n, with W_ii = 0.

• a feasible x corresponds to the partition {1, ..., n} = {i | x_i = −1} ∪ {i | x_i = 1}
• the matrix coefficient W_ij can be interpreted as the cost of having elements i and j in the same set of the partition
• the objective is to find the partition with least total cost
• classic particular instance: MAXCUT (W_ij ≥ 0)

Convex Relaxation

the original QCQP:

minimize    x^T P_0 x + q_0^T x + r_0
subject to  x^T P_i x + q_i^T x + r_i ≤ 0,   i = 1, ..., m

can be rewritten:

minimize    Tr(XP_0) + q_0^T x + r_0
subject to  Tr(XP_i) + q_i^T x + r_i ≤ 0,   i = 1, ..., m
            X ⪰ xx^T
            Rank(X) = 1

the only nonconvex constraint is now Rank(X) = 1...
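as a quick sanity check of this lifting, here is a minimal numpy sketch (the data P_0, q_0, r_0 and the point x below are arbitrary, invented purely for illustration): for X = xx^T, the quadratic objective coincides with the form that is linear in (X, x)

```python
import numpy as np

# sanity check of the lifting X = x x^T on arbitrary data
rng = np.random.default_rng(0)
n = 4
P0 = rng.standard_normal((n, n))
P0 = (P0 + P0.T) / 2            # symmetric, possibly indefinite
q0 = rng.standard_normal(n)
r0 = 1.3
x = rng.standard_normal(n)
X = np.outer(x, x)              # rank-one lifting of x

quad = x @ P0 @ x + q0 @ x + r0          # original quadratic objective
lifted = np.trace(X @ P0) + q0 @ x + r0  # objective, linear in (X, x)
print(np.isclose(quad, lifted))          # True
```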
Convex Relaxation: Semidefinite Relaxation

• we can directly relax this last constraint, i.e. drop the nonconvex Rank(X) = 1 and keep only X ⪰ xx^T
• the resulting program gives a lower bound on the optimal value

minimize    Tr(XP_0) + q_0^T x + r_0
subject to  Tr(XP_i) + q_i^T x + r_i ≤ 0,   i = 1, ..., m
            X ⪰ xx^T

tricky... can this be improved?

Lagrangian Relaxation

from the original problem:

minimize    x^T P_0 x + q_0^T x + r_0
subject to  x^T P_i x + q_i^T x + r_i ≤ 0,   i = 1, ..., m

we can form the Lagrangian:

L(x, λ) = x^T (P_0 + Σ_{i=1}^m λ_i P_i) x + (q_0 + Σ_{i=1}^m λ_i q_i)^T x + r_0 + Σ_{i=1}^m λ_i r_i

in the variables x ∈ R^n and λ ∈ R^m_+ ...

Lagrangian Relaxation: Lagrangian

the dual function can be computed explicitly as an (unconstrained) quadratic minimization problem, using:

inf_x  x^T P x + q^T x + r  =  r − (1/4) q^T P† q,   if P ⪰ 0 and q ∈ R(P)
                               −∞,                    otherwise

we get:

inf_x L(x, λ) = −(1/4) (q_0 + Σ_{i=1}^m λ_i q_i)^T (P_0 + Σ_{i=1}^m λ_i P_i)† (q_0 + Σ_{i=1}^m λ_i q_i) + Σ_{i=1}^m λ_i r_i + r_0

where we recognize a Schur complement...

Lagrangian Relaxation: Dual

the dual of the QCQP is then given by:

maximize    γ + Σ_{i=1}^m λ_i r_i + r_0
subject to  [P_0 + Σ_{i=1}^m λ_i P_i,  (q_0 + Σ_{i=1}^m λ_i q_i)/2;  (q_0 + Σ_{i=1}^m λ_i q_i)^T/2,  −γ] ⪰ 0
            λ_i ≥ 0,   i = 1, ..., m

which is a semidefinite program in the variables λ ∈ R^m and γ ∈ R, and can be solved efficiently

let us look at what happens when we use semidefinite duality to compute the dual of this last program...

Lagrangian Relaxation: Bidual

taking the dual again, we get the SDP:

minimize    Tr(XP_0) + q_0^T x + r_0
subject to  Tr(XP_i) + q_i^T x + r_i ≤ 0,   i = 1, ..., m
            [X, x; x^T, 1] ⪰ 0

in the variables X ∈ S^n and x ∈ R^n

• this is a convexification of the original program
• we have recovered the semidefinite relaxation in an "automatic" way

Lagrangian Relaxation: Boolean LS

using the previous technique, we can take the original Boolean LS problem:

minimize    ‖Ax − b‖^2
subject to  x_i^2 = 1,   i = 1, ..., n

and relax it as the SDP:

minimize    Tr(A^T A X) − 2 b^T A x + b^T b
subject to  [X, x; x^T, 1] ⪰ 0
            X_ii = 1,   i = 1, ..., n

this program then produces a lower bound on the optimal value of the original Boolean LS program

Lagrangian Relaxation: Partitioning

the partitioning problem defined above is:

minimize    x^T W x
subject to  x_i^2 = 1,   i = 1, ..., n

there is no linear term, so the variable x disappears from the relaxation, which becomes:

minimize    Tr(W X)
subject to  X ⪰ 0
            X_ii = 1,   i = 1, ..., n

Feasible points?

• Lagrangian relaxations only provide lower bounds on the optimal value
• how can we compute good feasible points?
• can we measure how suboptimal this lower bound is?

Randomization

the original QCQP:

minimize    x^T P_0 x + q_0^T x + r_0
subject to  x^T P_i x + q_i^T x + r_i ≤ 0,   i = 1, ..., m

was relaxed into:

minimize    Tr(XP_0) + q_0^T x + r_0
subject to  Tr(XP_i) + q_i^T x + r_i ≤ 0,   i = 1, ..., m
            [X, x; x^T, 1] ⪰ 0

• the last (Schur complement) constraint is equivalent to X − xx^T ⪰ 0
• hence, if x and X are the solution to the relaxed program, then X − xx^T is a covariance matrix...

Randomization

• pick x as a Gaussian variable with x ∼ N(x, X − xx^T), where (x, X) on the right is the solution of the relaxation
• x then solves the QCQP "on average" over this distribution

in other words, it solves:

minimize    E[x^T P_0 x + q_0^T x + r_0]
subject to  E[x^T P_i x + q_i^T x + r_i] ≤ 0,   i = 1, ..., m

a good feasible point can then be obtained by sampling enough points x...
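a minimal numpy sketch of this sampling scheme (the function names, feasibility tolerance and data layout are my own; it assumes the relaxation has already been solved and simply keeps the best sample that happens to satisfy the constraints — for structured constraints such as x_i^2 = 1 one would round each sample instead, as in the examples that follow):

```python
import numpy as np

def quad(P, q, r, x):
    """Evaluate x^T P x + q^T x + r."""
    return x @ P @ x + q @ x + r

def randomized_search(xbar, X, obj, constraints, K=1000, tol=1e-6, seed=0):
    """Sample z ~ N(xbar, X - xbar xbar^T), where (xbar, X) solve the
    relaxation, and return the best sample satisfying every constraint.
    obj and each constraint are (P, q, r) triples for x^T P x + q^T x + r."""
    rng = np.random.default_rng(seed)
    cov = X - np.outer(xbar, xbar)                        # covariance from the relaxation
    cov = (cov + cov.T) / 2 + 1e-9 * np.eye(len(xbar))    # guard against round-off
    samples = rng.multivariate_normal(xbar, cov, size=K)
    best_x, best_val = None, np.inf
    for z in samples:
        if all(quad(P, q, r, z) <= tol for (P, q, r) in constraints):
            val = quad(*obj, z)
            if val < best_val:
                best_x, best_val = z, val
    return best_x, best_val
```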
Linearization

consider the constraint:

x^T P x + q^T x + r ≤ 0

we decompose the matrix P into its positive and negative parts:

P = P_+ − P_−,   P_+, P_− ⪰ 0

and the original constraint becomes:

x^T P_+ x + q^T x + r ≤ x^T P_− x

Linearization

both sides of the inequality are now convex quadratic functions. we linearize the right-hand side around an initial feasible point x^(0) to obtain:

x^T P_+ x + q^T x + r ≤ x^(0)T P_− x^(0) + 2 x^(0)T P_− (x − x^(0))

• the right-hand side is now an affine lower bound on the original function x^T P_− x (see §3.1.3 in the 364 reader)
• the resulting constraint is convex and more conservative than the original one, hence the feasible set of the new problem is a convex subset of the original feasible set
• we form a convex restriction of the problem

we can then solve the convex restriction to get a better feasible point x^(1) and iterate...

Bounds on suboptimality

• in certain particular cases, it is possible to get a hard bound on the gap between the optimal value and the relaxation result
• a classic example is that of the MAXCUT bound

the MAXCUT problem is a particular case of the partitioning problem:

maximize    x^T W x
subject to  x_i^2 = 1,   i = 1, ..., n

its Lagrangian relaxation is computed as:

maximize    Tr(W X)
subject to  X ⪰ 0
            X_ii = 1,   i = 1, ..., n

Bounds on suboptimality: MAXCUT

let X be a solution to this program

• we look for a feasible point by sampling a normal distribution N(0, X)
• we convert each sample point x to a feasible point by rounding it to the nearest value in {−1, 1}, i.e. taking x̂ = sgn(x)

crucially, when x̂ is sampled using that procedure, the expected value of the objective E[x̂^T W x̂] can be computed explicitly:

E[x̂^T W x̂] = (2/π) Σ_{i,j=1}^n W_ij arcsin(X_ij) = (2/π) Tr(W arcsin(X))

Bounds on suboptimality: MAXCUT

• after sampling a few (feasible) points x̂, we are guaranteed to reach this expected value (2/π) Tr(W arcsin(X))
• hence we know that the optimal value OPT of the MAXCUT problem is between (2/π) Tr(W arcsin(X)) and Tr(W X)

furthermore, using arcsin(X) ⪰ X, we can simplify (and relax) the above expression to get:

(2/π) Tr(W X) ≤ OPT ≤ Tr(W X)

the procedure detailed above hence guarantees a feasible point whose value is within a factor 2/π of the optimal value
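a small end-to-end sketch of this bound in Python, assuming cvxpy with an SDP-capable solver (e.g. SCS) and numpy are available; the weight matrix W is random nonnegative data invented for illustration, not the lecture's example:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 20
W = rng.uniform(0, 1, (n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)                      # W in S^n, W_ii = 0, W_ij >= 0

# Lagrangian (SDP) relaxation of MAXCUT: maximize Tr(WX), X psd, X_ii = 1
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(W @ X)),
                  [X >> 0, cp.diag(X) == 1])
upper = prob.solve()                        # upper bound Tr(W X)

# randomized rounding: sample from N(0, X) and keep the signs
Xv = (X.value + X.value.T) / 2 + 1e-9 * np.eye(n)   # guard against round-off
samples = rng.multivariate_normal(np.zeros(n), Xv, size=200)
cuts = np.where(samples >= 0, 1.0, -1.0)    # x_hat = sgn(x)
best = max(c @ W @ c for c in cuts)

print(f"2/pi * upper bound: {2 / np.pi * upper:.1f}")
print(f"best rounded value: {best:.1f}")
print(f"upper bound Tr(WX): {upper:.1f}")
```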
Numerical Example: Boolean LS

Boolean least-squares problem:

minimize    ‖Ax − b‖^2
subject to  x_i^2 = 1,   i = 1, ..., n

with

‖Ax − b‖^2 = x^T A^T A x − 2 b^T A x + b^T b = Tr(A^T A X) − 2 b^T A x + b^T b

where X = xx^T, hence we can express BLS as:

minimize    Tr(A^T A X) − 2 b^T A x + b^T b
subject to  X_ii = 1,   X ⪰ xx^T,   Rank(X) = 1

... still a very hard problem

SDP relaxation for BLS

using Lagrangian relaxation, remember:

X ⪰ xx^T  ⟺  [X, x; x^T, 1] ⪰ 0

we obtained the SDP relaxation (with variables X, x):

minimize    Tr(A^T A X) − 2 b^T A x + b^T b
subject to  X_ii = 1,   [X, x; x^T, 1] ⪰ 0

• optimal value of the SDP gives a lower bound for BLS
• if the optimal matrix is rank one, we're done

Interpretation via randomization

• can think of the variables X, x in the SDP relaxation as defining a normal distribution z ∼ N(x, X − xx^T), with E[z_i^2] = X_ii = 1
• the SDP objective is E‖Az − b‖^2

this suggests a randomized method for BLS:
• find X^opt, x^opt optimal for the SDP relaxation
• generate z from N(x^opt, X^opt − x^opt x^optT)
• take x = sgn(z) as an approximate solution of BLS

(can repeat many times and take the best one)

Example

• (randomly chosen) parameters A ∈ R^150×100, b ∈ R^150
• x ∈ R^100, so the feasible set has 2^100 ≈ 10^30 points

LS approximate solution: minimize ‖Ax − b‖ s.t. ‖x‖^2 = n, then round; yields an objective 8.7% over the SDP relaxation bound

randomized method (using the SDP optimal distribution):
• best of 20 samples: 3.1% over SDP bound
• best of 1000 samples: 2.6% over SDP bound

[figure: histogram of ‖Ax − b‖/(SDP bound) over the random samples, with the SDP bound and the LS solution marked]
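a compact cvxpy/numpy sketch of this relaxation-plus-rounding pipeline (a much smaller random instance than the 150 × 100 one above, so the numbers will differ from the lecture's; for convenience the variables are lifted into a single block Z = [X, x; x^T, 1], which is an implementation choice, not part of the lecture):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 30, 20                                # small stand-in instance
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# SDP relaxation of Boolean LS, with Z = [X, x; x^T, 1] >= 0
Z = cp.Variable((n + 1, n + 1), symmetric=True)
X, x = Z[:n, :n], Z[:n, n]
objective = cp.trace(A.T @ A @ X) - 2 * (A.T @ b) @ x + b @ b
prob = cp.Problem(cp.Minimize(objective),
                  [Z >> 0, Z[n, n] == 1, cp.diag(X) == 1])
lower = prob.solve()                         # lower bound on the BLS value

# randomized rounding: z ~ N(x, X - x x^T), take sgn(z)
xv, Xv = x.value, X.value
cov = (Xv - np.outer(xv, xv))
cov = (cov + cov.T) / 2 + 1e-9 * np.eye(n)   # guard against round-off
samples = rng.multivariate_normal(xv, cov, size=100)
signs = np.where(samples >= 0, 1.0, -1.0)
best = min(np.linalg.norm(A @ s - b) ** 2 for s in signs)
print(f"SDP lower bound: {lower:.1f}, best rounded value: {best:.1f}")
```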
Example: Partitioning Problem

we go back now to the two-way partitioning problem considered in exercise 5.39 of the reader:

minimize    x^T W x
subject to  x_i^2 = 1,   i = 1, ..., n

the Lagrange dual of this problem is given by the SDP:

maximize    −1^T ν
subject to  W + diag(ν) ⪰ 0

Partitioning: Lagrangian relaxation

the dual of this SDP is the new SDP:

minimize    Tr(W X)
subject to  X ⪰ 0
            X_ii = 1,   i = 1, ..., n

the optimal value Tr(W X^opt) of this SDP gives a lower bound on the optimal value p^opt of the partitioning problem

Partitioning: simple heuristic

• solve the previous SDP to find X^opt (and the lower bound)
• let v denote an eigenvector of X^opt associated with its largest eigenvalue
• now let x̂ = sgn(v)

the vector x̂ is our guess for a good partition

Partitioning: Randomization

• we generate independent samples x^(1), ..., x^(K) from a normal distribution with zero mean and covariance X^opt
• for each sample we consider the heuristic approximate solution x̂^(k) = sgn(x^(k))
• we then take the one with lowest cost

Partitioning: Numerical Example

we compare the performance of these methods on a randomly chosen problem

• the SDP lower bound is equal to −1641
• the simple eigenvector heuristic x̂ = sgn(v) gives a partition with total cost −1280

exactly what the optimal value is, we can't say; all we can say at this point is that it is between −1641 and −1280

[figure: histogram of the objective obtained by the randomized heuristic, over 1000 samples; the minimum value reached here is −1328]

[figure: best objective value found by the randomized heuristic versus the number of samples]

we're not sure what the optimal cost is, but now we know it's between −1641 and −1328

Greedy method

we can improve these results a little bit using the following simple greedy heuristic

• define the matrix Y with entries y_ij = x̂_i W_ij x̂_j, and suppose Y has a column j whose sum Σ_{i=1}^n y_ij is positive
• switching x̂_j to −x̂_j will then decrease the objective by 2 Σ_{i=1}^n (y_ij + y_ji) = 4 Σ_{i=1}^n y_ij
• if we pick the column j_0 with largest sum, switch x̂_{j_0}, and repeat until no column sum is positive, we keep decreasing the objective

if we apply this simple method to the solution of the randomized method we get an objective value of −1356, while applying it to the SDP eigenvector heuristic gives an objective value of −1372, our best partition yet...
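a minimal numpy sketch of this greedy flipping step (the function name is mine; it assumes W is symmetric with zero diagonal and uses the fact that flipping x̂_j changes the objective by −4 Σ_i x̂_i W_ij x̂_j):

```python
import numpy as np

def greedy_improve(W, x_hat):
    """1-opt greedy descent for  minimize x^T W x  over x_i in {-1, +1}.
    W is symmetric with zero diagonal. While some column sum
    sum_i x_i W_ij x_j is positive, flip the coordinate with the largest
    such sum, which decreases the objective by 4 * (that sum)."""
    x = x_hat.astype(float).copy()
    while True:
        col_sums = x * (W @ x)          # col_sums[j] = sum_i x_i W_ij x_j
        j = int(np.argmax(col_sums))
        if col_sums[j] <= 0:            # no single flip improves the objective
            return x
        x[j] = -x[j]

# e.g. polish the eigenvector-sign or randomized partitions:
#   x_better = greedy_improve(W, np.sign(v))
```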