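Convex Optimization — Boyd & Vandenberghe

5. Duality

• Lagrange dual problem
• weak and strong duality
• geometric interpretation
• optimality conditions
• perturbation and sensitivity analysis
• examples
• generalized inequalities

5–1

Lagrangian

standard form problem (not necessarily convex)

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p
\end{array}
\]

variable $x \in \mathbf{R}^n$, domain $\mathcal{D}$, optimal value $p^\star$

Lagrangian: $L : \mathbf{R}^n \times \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$, with $\operatorname{dom} L = \mathcal{D} \times \mathbf{R}^m \times \mathbf{R}^p$,

\[
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)
\]

• weighted sum of objective and constraint functions
• $\lambda_i$ is Lagrange multiplier associated with $f_i(x) \le 0$
• $\nu_i$ is Lagrange multiplier associated with $h_i(x) = 0$

Duality 5–2

Lagrange dual function

Lagrange dual function: $g : \mathbf{R}^m \times \mathbf{R}^p \to \mathbf{R}$,

\[
g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu)
= \inf_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right)
\]

$g$ is concave, can be $-\infty$ for some $\lambda$, $\nu$

lower bound property: if $\lambda \succeq 0$, then $g(\lambda, \nu) \le p^\star$

proof: if $\tilde{x}$ is feasible and $\lambda \succeq 0$, then

\[
f_0(\tilde{x}) \ge L(\tilde{x}, \lambda, \nu) \ge \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) = g(\lambda, \nu)
\]

minimizing over all feasible $\tilde{x}$ gives $p^\star \ge g(\lambda, \nu)$

Duality 5–3

Least-norm solution of linear equations

\[
\begin{array}{ll}
\text{minimize} & x^T x \\
\text{subject to} & Ax = b
\end{array}
\]

dual function

• Lagrangian is $L(x, \nu) = x^T x + \nu^T (Ax - b)$
• to minimize $L$ over $x$, set gradient equal to zero:
\[
\nabla_x L(x, \nu) = 2x + A^T \nu = 0 \quad \Longrightarrow \quad x = -(1/2) A^T \nu
\]
• plug into $L$ to obtain $g$:
\[
g(\nu) = L((-1/2) A^T \nu, \nu) = -\frac{1}{4} \nu^T A A^T \nu - b^T \nu
\]
a concave function of $\nu$

lower bound property: $p^\star \ge -(1/4) \nu^T A A^T \nu - b^T \nu$ for all $\nu$

Duality 5–4

Standard form LP

\[
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax = b, \quad x \succeq 0
\end{array}
\]

dual function

• Lagrangian is
\[
L(x, \lambda, \nu) = c^T x + \nu^T (Ax - b) - \lambda^T x = -b^T \nu + (c + A^T \nu - \lambda)^T x
\]
• $L$ is affine in $x$, hence
\[
g(\lambda, \nu) = \inf_x L(x, \lambda, \nu) =
\begin{cases}
-b^T \nu & A^T \nu - \lambda + c = 0 \\
-\infty & \text{otherwise}
\end{cases}
\]

$g$ is linear on affine domain $\{(\lambda, \nu) \mid A^T \nu - \lambda + c = 0\}$, hence concave

lower bound property: $p^\star \ge -b^T \nu$ if $A^T \nu + c \succeq 0$

Duality 5–5

Equality constrained norm minimization

\[
\begin{array}{ll}
\text{minimize} & \|x\| \\
\text{subject to} & Ax = b
\end{array}
\]

dual function

\[
g(\nu) = \inf_x \left( \|x\| - \nu^T A x + b^T \nu \right) =
\begin{cases}
b^T \nu & \|A^T \nu\|_* \le 1 \\
-\infty & \text{otherwise}
\end{cases}
\]

where $\|v\|_* = \sup_{\|u\| \le 1} u^T v$ is dual norm of $\|\cdot\|$

proof: follows from $\inf_x (\|x\| - y^T x) = 0$ if $\|y\|_* \le 1$, $-\infty$ otherwise

• if $\|y\|_* \le 1$, then $\|x\| - y^T x \ge 0$ for all $x$, with equality if $x = 0$
• if $\|y\|_* > 1$, choose $x = tu$ where $\|u\| \le 1$, $u^T y = \|y\|_* > 1$:
\[
\|x\| - y^T x = t(\|u\| - \|y\|_*) \to -\infty \quad \text{as } t \to \infty
\]

lower bound property: $p^\star \ge b^T \nu$ if $\|A^T \nu\|_* \le 1$

Duality 5–6

Two-way partitioning

\[
\begin{array}{ll}
\text{minimize} & x^T W x \\
\text{subject to} & x_i^2 = 1, \quad i = 1, \ldots, n
\end{array}
\]

• a nonconvex problem; feasible set contains $2^n$ discrete points
• interpretation: partition $\{1, \ldots, n\}$ in two sets; $W_{ij}$ is cost of assigning $i$, $j$ to the same set; $-W_{ij}$ is cost of assigning to different sets

dual function

\[
g(\nu) = \inf_x \left( x^T W x + \sum_i \nu_i (x_i^2 - 1) \right)
= \inf_x \, x^T (W + \operatorname{diag}(\nu)) x - \mathbf{1}^T \nu
= \begin{cases}
-\mathbf{1}^T \nu & W + \operatorname{diag}(\nu) \succeq 0 \\
-\infty & \text{otherwise}
\end{cases}
\]

lower bound property: $p^\star \ge -\mathbf{1}^T \nu$ if $W + \operatorname{diag}(\nu) \succeq 0$

example: $\nu = -\lambda_{\min}(W) \mathbf{1}$ gives bound $p^\star \ge n \lambda_{\min}(W)$

Duality 5–7

Lagrange dual and conjugate function

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & Ax \preceq b, \quad Cx = d
\end{array}
\]

dual function

\[
g(\lambda, \nu) = \inf_{x \in \operatorname{dom} f_0} \left( f_0(x) + (A^T \lambda + C^T \nu)^T x - b^T \lambda - d^T \nu \right)
= -f_0^*(-A^T \lambda - C^T \nu) - b^T \lambda - d^T \nu
\]

• recall definition of conjugate $f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))$
• simplifies derivation of dual if conjugate of $f_0$ is known

example: entropy maximization

\[
f_0(x) = \sum_{i=1}^n x_i \log x_i, \qquad f_0^*(y) = \sum_{i=1}^n e^{y_i - 1}
\]

Duality 5–8

The dual problem

Lagrange dual problem

\[
\begin{array}{ll}
\text{maximize} & g(\lambda, \nu) \\
\text{subject to} & \lambda \succeq 0
\end{array}
\]

• finds best lower bound on $p^\star$, obtained from Lagrange dual function
• a convex optimization problem; optimal value denoted $d^\star$
• $\lambda$, $\nu$ are dual feasible if $\lambda \succeq 0$, $(\lambda, \nu) \in \operatorname{dom} g$
• often simplified by making implicit constraint $(\lambda, \nu) \in \operatorname{dom} g$ explicit

example: standard form LP and its dual (page 5–5)

\[
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax = b \\
& x \succeq 0
\end{array}
\qquad\qquad
\begin{array}{ll}
\text{maximize} & -b^T \nu \\
\text{subject to} & A^T \nu + c \succeq 0
\end{array}
\]

Duality 5–9

Weak and strong duality

weak duality: $d^\star \le p^\star$

• always holds (for convex and nonconvex problems)
• can be used to find nontrivial lower bounds for difficult problems

for example, solving the SDP

\[
\begin{array}{ll}
\text{maximize} & -\mathbf{1}^T \nu \\
\text{subject to} & W + \operatorname{diag}(\nu) \succeq 0
\end{array}
\]

gives a lower bound for the two-way partitioning problem on page 5–7

strong duality: $d^\star = p^\star$

• does not hold in general
• (usually) holds for convex problems
• conditions that guarantee strong duality in convex problems are called constraint qualifications

Duality 5–10

Slater's constraint qualification

strong duality holds for a convex problem

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& Ax = b
\end{array}
\]

if it is strictly feasible, i.e.,

\[
\exists x \in \operatorname{int} \mathcal{D} : \quad f_i(x) < 0, \quad i = 1, \ldots, m, \qquad Ax = b
\]

• also guarantees that the dual optimum is attained (if $p^\star > -\infty$)
• can be sharpened: e.g., can replace $\operatorname{int} \mathcal{D}$ with $\operatorname{relint} \mathcal{D}$ (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .
• there exist many other types of constraint qualifications

Duality 5–11

Inequality form LP

primal problem

\[
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax \preceq b
\end{array}
\]

dual function

\[
g(\lambda) = \inf_x \left( (c + A^T \lambda)^T x - b^T \lambda \right) =
\begin{cases}
-b^T \lambda & A^T \lambda + c = 0 \\
-\infty & \text{otherwise}
\end{cases}
\]

dual problem

\[
\begin{array}{ll}
\text{maximize} & -b^T \lambda \\
\text{subject to} & A^T \lambda + c = 0, \quad \lambda \succeq 0
\end{array}
\]

• from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
• in fact, $p^\star = d^\star$ except when primal and dual are infeasible

Duality 5–12

Quadratic program

primal problem (assume $P \in \mathbf{S}^n_{++}$)

\[
\begin{array}{ll}
\text{minimize} & x^T P x \\
\text{subject to} & Ax \preceq b
\end{array}
\]

dual function

\[
g(\lambda) = \inf_x \left( x^T P x + \lambda^T (Ax - b) \right) = -\frac{1}{4} \lambda^T A P^{-1} A^T \lambda - b^T \lambda
\]

dual problem

\[
\begin{array}{ll}
\text{maximize} & -(1/4) \lambda^T A P^{-1} A^T \lambda - b^T \lambda \\
\text{subject to} & \lambda \succeq 0
\end{array}
\]

• from Slater's condition: $p^\star = d^\star$ if $A\tilde{x} \prec b$ for some $\tilde{x}$
• in fact, $p^\star = d^\star$ always

Duality 5–13

A nonconvex problem with strong duality

\[
\begin{array}{ll}
\text{minimize} & x^T A x + 2 b^T x \\
\text{subject to} & x^T x \le 1
\end{array}
\]

$A \not\succeq 0$, hence nonconvex

dual function: $g(\lambda) = \inf_x \left( x^T (A + \lambda I) x + 2 b^T x - \lambda \right)$

• unbounded below if $A + \lambda I \not\succeq 0$ or if $A + \lambda I \succeq 0$ and $b \notin \mathcal{R}(A + \lambda I)$
• minimized by $x = -(A + \lambda I)^\dagger b$ otherwise: $g(\lambda) = -b^T (A + \lambda I)^\dagger b - \lambda$

dual problem and equivalent SDP:

\[
\begin{array}{ll}
\text{maximize} & -b^T (A + \lambda I)^\dagger b - \lambda \\
\text{subject to} & A + \lambda I \succeq 0 \\
& b \in \mathcal{R}(A + \lambda I)
\end{array}
\qquad\qquad
\begin{array}{ll}
\text{maximize} & -t - \lambda \\
\text{subject to} & \begin{bmatrix} A + \lambda I & b \\ b^T & t \end{bmatrix} \succeq 0
\end{array}
\]

strong duality although primal problem is not convex (not easy to show)

Duality 5–14

Geometric interpretation

for simplicity, consider problem with one constraint $f_1(x) \le 0$

interpretation of dual function:

\[
g(\lambda) = \inf_{(u,t) \in \mathcal{G}} (t + \lambda u), \qquad \text{where} \quad \mathcal{G} = \{(f_1(x), f_0(x)) \mid x \in \mathcal{D}\}
\]

[Figure: two plots of the set $\mathcal{G}$ in the $(u, t)$-plane, marked with $p^\star$, $d^\star$, $g(\lambda)$, and the line $\lambda u + t = g(\lambda)$.]

• $\lambda u + t = g(\lambda)$ is (non-vertical) supporting hyperplane to $\mathcal{G}$
• hyperplane intersects $t$-axis at $t = g(\lambda)$

Duality 5–15

epigraph variation: same interpretation if $\mathcal{G}$ is replaced with

\[
\mathcal{A} = \{(u, t) \mid f_1(x) \le u, \; f_0(x) \le t \text{ for some } x \in \mathcal{D}\}
\]

[Figure: the set $\mathcal{A}$ in the $(u, t)$-plane, marked with $p^\star$, $g(\lambda)$, and the line $\lambda u + t = g(\lambda)$.]

strong duality

• holds if there is a non-vertical supporting hyperplane to $\mathcal{A}$ at $(0, p^\star)$
• for convex problem, $\mathcal{A}$ is convex, hence has supporting hyperplane at $(0, p^\star)$
• Slater's condition: if there exist $(\tilde{u}, \tilde{t}) \in \mathcal{A}$ with $\tilde{u} < 0$, then supporting hyperplanes at $(0, p^\star)$ must be non-vertical

Duality 5–16

Complementary slackness

assume strong duality holds, $x^\star$ is primal optimal, $(\lambda^\star, \nu^\star)$ is dual optimal

\[
\begin{aligned}
f_0(x^\star) = g(\lambda^\star, \nu^\star)
&= \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^\star f_i(x) + \sum_{i=1}^p \nu_i^\star h_i(x) \right) \\
&\le f_0(x^\star) + \sum_{i=1}^m \lambda_i^\star f_i(x^\star) + \sum_{i=1}^p \nu_i^\star h_i(x^\star) \\
&\le f_0(x^\star)
\end{aligned}
\]

hence, the two inequalities hold with equality

• $x^\star$ minimizes $L(x, \lambda^\star, \nu^\star)$
• $\lambda_i^\star f_i(x^\star) = 0$ for $i = 1, \ldots, m$ (known as complementary slackness):
\[
\lambda_i^\star > 0 \Longrightarrow f_i(x^\star) = 0, \qquad f_i(x^\star) < 0 \Longrightarrow \lambda_i^\star = 0
\]

Duality 5–17

Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem with differentiable $f_i$, $h_i$):

1. primal constraints: $f_i(x) \le 0$, $i = 1, \ldots, m$, $h_i(x) = 0$, $i = 1, \ldots, p$
2. dual constraints: $\lambda \succeq 0$
3. complementary slackness: $\lambda_i f_i(x) = 0$, $i = 1, \ldots, m$
4. gradient of Lagrangian with respect to $x$ vanishes:
\[
\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0
\]

from page 5–17: if strong duality holds and $x$, $\lambda$, $\nu$ are optimal, then they must satisfy the KKT conditions

Duality 5–18

KKT conditions for convex problem

if $\tilde{x}$, $\tilde{\lambda}$, $\tilde{\nu}$ satisfy KKT for a convex problem, then they are optimal:

• from complementary slackness: $f_0(\tilde{x}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$
• from 4th condition (and convexity): $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$

hence, $f_0(\tilde{x}) = g(\tilde{\lambda}, \tilde{\nu})$

if Slater's condition is satisfied: $x$ is optimal if and only if there exist $\lambda$, $\nu$ that satisfy KKT conditions

• recall that Slater implies strong duality, and dual optimum is attained
• generalizes optimality condition $\nabla f_0(x) = 0$ for unconstrained problem

Duality 5–19

example: water-filling (assume $\alpha_i > 0$)

\[
\begin{array}{ll}
\text{minimize} & -\sum_{i=1}^n \log(x_i + \alpha_i) \\
\text{subject to} & x \succeq 0, \quad \mathbf{1}^T x = 1
\end{array}
\]

$x$ is optimal iff $x \succeq 0$, $\mathbf{1}^T x = 1$, and there exist $\lambda \in \mathbf{R}^n$, $\nu \in \mathbf{R}$ such that

\[
\lambda \succeq 0, \qquad \lambda_i x_i = 0, \qquad \frac{1}{x_i + \alpha_i} + \lambda_i = \nu
\]

• if $\nu < 1/\alpha_i$: $\lambda_i = 0$ and $x_i = 1/\nu - \alpha_i$
• if $\nu \ge 1/\alpha_i$: $\lambda_i = \nu - 1/\alpha_i$ and $x_i = 0$
• determine $\nu$ from $\mathbf{1}^T x = \sum_{i=1}^n \max\{0, 1/\nu - \alpha_i\} = 1$

interpretation

• $n$ patches; level of patch $i$ is at height $\alpha_i$
• flood area with unit amount of water
• resulting level is $1/\nu^\star$

[Figure: water-filling diagram over patch index $i$, marked with patch heights $\alpha_i$, water depths $x_i$, and water level $1/\nu^\star$.]

Duality 5–20

Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p
\end{array}
\qquad\qquad
\begin{array}{ll}
\text{maximize} & g(\lambda, \nu) \\
\text{subject to} & \lambda \succeq 0
\end{array}
\]

perturbed problem and its dual

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le u_i, \quad i = 1, \ldots, m \\
& h_i(x) = v_i, \quad i = 1, \ldots, p
\end{array}
\qquad\qquad
\begin{array}{ll}
\text{maximize} & g(\lambda, \nu) - u^T \lambda - v^T \nu \\
\text{subject to} & \lambda \succeq 0
\end{array}
\]

• $x$ is primal variable; $u$, $v$ are parameters
• $p^\star(u, v)$ is optimal value as a function of $u$, $v$
• we are interested in information about $p^\star(u, v)$ that we can obtain from the solution of the unperturbed problem and its dual

Duality 5–21

global sensitivity result

assume strong duality holds for unperturbed problem, and that $\lambda^\star$, $\nu^\star$ are dual optimal for unperturbed problem

apply weak duality to perturbed problem:

\[
p^\star(u, v) \ge g(\lambda^\star, \nu^\star) - u^T \lambda^\star - v^T \nu^\star = p^\star(0, 0) - u^T \lambda^\star - v^T \nu^\star
\]

sensitivity interpretation

• if $\lambda_i^\star$ large: $p^\star$ increases greatly if we tighten constraint $i$ ($u_i < 0$)
• if $\lambda_i^\star$ small: $p^\star$ does not decrease much if we loosen constraint $i$ ($u_i > 0$)
• if $\nu_i^\star$ large and positive: $p^\star$ increases greatly if we take $v_i < 0$; if $\nu_i^\star$ large and negative: $p^\star$ increases greatly if we take $v_i > 0$
• if $\nu_i^\star$ small and positive: $p^\star$ does not decrease much if we take $v_i > 0$; if $\nu_i^\star$ small and negative: $p^\star$ does not decrease much if we take $v_i < 0$

Duality 5–22

local sensitivity: if (in addition) $p^\star(u, v)$ is differentiable at $(0, 0)$, then

\[
\lambda_i^\star = -\frac{\partial p^\star(0, 0)}{\partial u_i}, \qquad \nu_i^\star = -\frac{\partial p^\star(0, 0)}{\partial v_i}
\]

proof (for $\lambda_i^\star$): from global sensitivity result,

\[
\frac{\partial p^\star(0, 0)}{\partial u_i} = \lim_{t \searrow 0} \frac{p^\star(t e_i, 0) - p^\star(0, 0)}{t} \ge -\lambda_i^\star
\]

\[
\frac{\partial p^\star(0, 0)}{\partial u_i} = \lim_{t \nearrow 0} \frac{p^\star(t e_i, 0) - p^\star(0, 0)}{t} \le -\lambda_i^\star
\]

hence, equality

$p^\star(u)$ for a problem with one (inequality) constraint:

[Figure: $p^\star(u)$ versus $u$, with the point $u = 0$ and the line $p^\star(0) - \lambda^\star u$ marked.]

Duality 5–23

Duality and problem reformulations

• equivalent formulations of a problem can lead to very different duals
• reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations

• introduce new variables and equality constraints
• make explicit constraints implicit or vice-versa
• transform objective or constraint functions, e.g., replace $f_0(x)$ by $\phi(f_0(x))$ with $\phi$ convex, increasing

Duality 5–24

Introducing new variables and equality constraints

\[
\text{minimize} \quad f_0(Ax + b)
\]

• dual function is constant: $g = \inf_x L(x) = \inf_x f_0(Ax + b) = p^\star$
• we have strong duality, but dual is quite useless

reformulated problem and its dual

\[
\begin{array}{ll}
\text{minimize} & f_0(y) \\
\text{subject to} & Ax + b - y = 0
\end{array}
\qquad\qquad
\begin{array}{ll}
\text{maximize} & b^T \nu - f_0^*(\nu) \\
\text{subject to} & A^T \nu = 0
\end{array}
\]

dual function follows from

\[
g(\nu) = \inf_{x, y} \left( f_0(y) - \nu^T y + \nu^T A x + b^T \nu \right) =
\begin{cases}
-f_0^*(\nu) + b^T \nu & A^T \nu = 0 \\
-\infty & \text{otherwise}
\end{cases}
\]

Duality 5–25

norm approximation problem: minimize $\|Ax - b\|$

\[
\begin{array}{ll}
\text{minimize} & \|y\| \\
\text{subject to} & y = Ax - b
\end{array}
\]

can look up conjugate of $\|\cdot\|$, or derive dual directly

\[
\begin{aligned}
g(\nu) &= \inf_{x, y} \left( \|y\| + \nu^T y - \nu^T A x + b^T \nu \right) \\
&= \begin{cases}
b^T \nu + \inf_y (\|y\| + \nu^T y) & A^T \nu = 0 \\
-\infty & \text{otherwise}
\end{cases} \\
&= \begin{cases}
b^T \nu & A^T \nu = 0, \; \|\nu\|_* \le 1 \\
-\infty & \text{otherwise}
\end{cases}
\end{aligned}
\]

(see page 5–6)

dual of norm approximation problem

\[
\begin{array}{ll}
\text{maximize} & b^T \nu \\
\text{subject to} & A^T \nu = 0, \quad \|\nu\|_* \le 1
\end{array}
\]

Duality 5–26

Implicit constraints

LP with box constraints: primal and dual problem

\[
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & Ax = b \\
& -\mathbf{1} \preceq x \preceq \mathbf{1}
\end{array}
\qquad\qquad
\begin{array}{ll}
\text{maximize} & -b^T \nu - \mathbf{1}^T \lambda_1 - \mathbf{1}^T \lambda_2 \\
\text{subject to} & c + A^T \nu + \lambda_1 - \lambda_2 = 0 \\
& \lambda_1 \succeq 0, \quad \lambda_2 \succeq 0
\end{array}
\]

reformulation with box constraints made implicit

\[
\begin{array}{ll}
\text{minimize} & f_0(x) = \begin{cases} c^T x & -\mathbf{1} \preceq x \preceq \mathbf{1} \\ \infty & \text{otherwise} \end{cases} \\
\text{subject to} & Ax = b
\end{array}
\]

dual function

\[
g(\nu) = \inf_{-\mathbf{1} \preceq x \preceq \mathbf{1}} \left( c^T x + \nu^T (Ax - b) \right) = -b^T \nu - \|A^T \nu + c\|_1
\]

dual problem: maximize $-b^T \nu - \|A^T \nu + c\|_1$

Duality 5–27

Problems with generalized inequalities

\[
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \preceq_{K_i} 0, \quad i = 1, \ldots, m \\
& h_i(x) = 0, \quad i = 1, \ldots, p
\end{array}
\]

$\preceq_{K_i}$ is generalized inequality on $\mathbf{R}^{k_i}$

definitions are parallel to scalar case:

• Lagrange multiplier for $f_i(x) \preceq_{K_i} 0$ is vector $\lambda_i \in \mathbf{R}^{k_i}$
• Lagrangian $L : \mathbf{R}^n \times \mathbf{R}^{k_1} \times \cdots \times \mathbf{R}^{k_m} \times \mathbf{R}^p \to \mathbf{R}$ is defined as
\[
L(x, \lambda_1, \ldots, \lambda_m, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i^T f_i(x) + \sum_{i=1}^p \nu_i h_i(x)
\]
• dual function $g : \mathbf{R}^{k_1} \times \cdots \times \mathbf{R}^{k_m} \times \mathbf{R}^p \to \mathbf{R}$ is defined as
\[
g(\lambda_1, \ldots, \lambda_m, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda_1, \ldots, \lambda_m, \nu)
\]

Duality 5–28

lower bound property: if $\lambda_i \succeq_{K_i^*} 0$, then $g(\lambda_1, \ldots, \lambda_m, \nu) \le p^\star$

proof: if $\tilde{x}$ is feasible and $\lambda_i \succeq_{K_i^*} 0$, then

\[
\begin{aligned}
f_0(\tilde{x}) &\ge f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i^T f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \\
&\ge \inf_{x \in \mathcal{D}} L(x, \lambda_1, \ldots, \lambda_m, \nu) \\
&= g(\lambda_1, \ldots, \lambda_m, \nu)
\end{aligned}
\]

minimizing over all feasible $\tilde{x}$ gives $p^\star \ge g(\lambda_1, \ldots, \lambda_m, \nu)$

dual problem

\[
\begin{array}{ll}
\text{maximize} & g(\lambda_1, \ldots, \lambda_m, \nu) \\
\text{subject to} & \lambda_i \succeq_{K_i^*} 0, \quad i = 1, \ldots, m
\end{array}
\]

• weak duality: $p^\star \ge d^\star$ always
• strong duality: $p^\star = d^\star$ for convex problem with constraint qualification (for example, Slater's: primal problem is strictly feasible)

Duality 5–29

Semidefinite program

primal SDP ($F_i, G \in \mathbf{S}^k$)

\[
\begin{array}{ll}
\text{minimize} & c^T x \\
\text{subject to} & x_1 F_1 + \cdots + x_n F_n \preceq G
\end{array}
\]

• Lagrange multiplier is matrix $Z \in \mathbf{S}^k$
• Lagrangian $L(x, Z) = c^T x + \operatorname{tr}\left( Z (x_1 F_1 + \cdots + x_n F_n - G) \right)$
• dual function
\[
g(Z) = \inf_x L(x, Z) =
\begin{cases}
-\operatorname{tr}(GZ) & \operatorname{tr}(F_i Z) + c_i = 0, \quad i = 1, \ldots, n \\
-\infty & \text{otherwise}
\end{cases}
\]

dual SDP

\[
\begin{array}{ll}
\text{maximize} & -\operatorname{tr}(GZ) \\
\text{subject to} & Z \succeq 0, \quad \operatorname{tr}(F_i Z) + c_i = 0, \quad i = 1, \ldots, n
\end{array}
\]

$p^\star = d^\star$ if primal SDP is strictly feasible ($\exists x$ with $x_1 F_1 + \cdots + x_n F_n \prec G$)

Duality 5–30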
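
numerical check: least-norm dual (page 5–4)

The lower bound property on page 5–4 can be checked on a tiny instance. The sketch below is only an illustration: it assumes a single equality constraint $a^T x = b$ (so $AA^T$ is the scalar $a^T a$), and the data `a`, `b` and all function names are made up for this example.

```python
# Least-norm problem with ONE equality constraint a^T x = b (an assumption
# made here so that A A^T reduces to the scalar a^T a).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_function(a, b, nu):
    """g(nu) = -(1/4) nu^2 (a^T a) - b nu, the dual function for this instance."""
    return -0.25 * nu * nu * dot(a, a) - b * nu

def primal_optimum(a, b):
    """Least-norm solution x = a b / (a^T a) and optimal value p* = x^T x."""
    scale = b / dot(a, a)
    x = [scale * ai for ai in a]
    return x, dot(x, x)

a = [1.0, 2.0, 2.0]   # illustrative data
b = 3.0
x_star, p_star = primal_optimum(a, b)

# Weak duality: every nu gives a lower bound on p*.
bounds = [dual_function(a, b, nu) for nu in (-2.0, -1.0, 0.0, 0.5, 1.0)]

# g is a concave quadratic; setting g'(nu) = 0 gives nu* = -2 b / (a^T a).
nu_star = -2.0 * b / dot(a, a)
d_star = dual_function(a, b, nu_star)
```

For this data every entry of `bounds` stays below `p_star`, and `d_star` matches `p_star`, as expected for a convex problem with a feasible linear equality constraint. The minimizer recovered from $x = -(1/2) a \nu^\star$ also agrees with `x_star`.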
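
numerical check: two-way partitioning bound (page 5–7)

Any $\nu$ with $W + \operatorname{diag}(\nu) \succeq 0$ gives the bound $p^\star \ge -\mathbf{1}^T \nu$. The sketch below does not use the $\lambda_{\min}(W)$ choice from page 5–7. Instead it assumes a cruder dual feasible point, $\nu_i = \sum_{j \ne i} |W_{ij}| - W_{ii}$, which makes $W + \operatorname{diag}(\nu)$ symmetric and diagonally dominant with nonnegative diagonal, hence positive semidefinite. The matrix `W` and the function names are hypothetical.

```python
from itertools import product

def quad_form(W, x):
    """x^T W x for a square matrix given as a list of rows."""
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def brute_force_optimum(W):
    """p*: minimum of x^T W x over all 2^n sign vectors (small n only)."""
    return min(quad_form(W, x) for x in product((-1, 1), repeat=len(W)))

def diagonally_dominant_nu(W):
    """A dual feasible nu (not the lambda_min choice): it makes
    W + diag(nu) diagonally dominant with nonnegative diagonal, hence PSD."""
    n = len(W)
    return [sum(abs(W[i][j]) for j in range(n) if j != i) - W[i][i]
            for i in range(n)]

W = [[0.0, 2.0, 1.0],    # illustrative symmetric cost matrix
     [2.0, 0.0, 3.0],
     [1.0, 3.0, 0.0]]

p_star = brute_force_optimum(W)
nu = diagonally_dominant_nu(W)
lower_bound = -sum(nu)   # -1^T nu <= p* by the lower bound property
```

On this instance the bound is valid but not tight. Solving the SDP on page 5–10 would give the best bound of this form.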
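
numerical sketch: water-filling (page 5–20)

The condition $\sum_i \max\{0, 1/\nu - \alpha_i\} = 1$ on page 5–20 determines $\nu$ because the left-hand side is nondecreasing in the water level $1/\nu$. One way to solve it, assumed here for illustration, is bisection on the level. The function name `water_fill`, the tolerance, and the data are not from the slides.

```python
def water_fill(alpha, total=1.0, tol=1e-12):
    """Return (x, nu) with x_i = max(0, 1/nu - alpha_i) and sum(x) = total.

    Bisection on the water level w = 1/nu; assumes all alpha_i > 0.
    """
    lo = min(alpha)           # at this level no patch holds water
    hi = min(alpha) + total   # at this level the lowest patch alone holds `total`
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(max(0.0, mid - a) for a in alpha) > total:
            hi = mid          # too much water: lower the level
        else:
            lo = mid          # not enough water: raise the level
    level = 0.5 * (lo + hi)
    x = [max(0.0, level - a) for a in alpha]
    return x, 1.0 / level

alpha = [0.2, 0.5, 1.5]       # illustrative patch heights
x, nu = water_fill(alpha)

# Multipliers from the KKT conditions: lambda_i = nu - 1/(x_i + alpha_i).
lam = [nu - 1.0 / (xi + ai) for xi, ai in zip(x, alpha)]
```

With these heights the first two patches are flooded and the third stays dry, so `lam` is approximately zero on the first two entries and positive on the third, which is the complementary slackness pattern $\lambda_i x_i = 0$.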
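
numerical check: global and local sensitivity (pages 5–22 and 5–23)

The sensitivity results can be seen on a one-variable toy problem, chosen here only for illustration: minimize $x^2$ subject to $1 - x \le u$. Its optimal value is $p^\star(u) = \max\{0, 1 - u\}^2$, and the unperturbed dual function is $g(\lambda) = \lambda - \lambda^2/4$, maximized at $\lambda^\star = 2$. The problem and all names below are assumptions of this sketch.

```python
def p_star(u):
    """Optimal value of: minimize x^2 subject to 1 - x <= u."""
    return max(0.0, 1.0 - u) ** 2

def g(lam):
    """Dual function of the unperturbed problem: inf_x x^2 + lam * (1 - x)."""
    return lam - 0.25 * lam * lam

lam_star = 2.0  # maximizer of g over lam >= 0, from g'(lam) = 1 - lam/2 = 0

# Global result: p*(u) >= p*(0) - lam* u for every u (gaps must be >= 0).
us = [-1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 2.0]
gaps = [p_star(u) - (p_star(0.0) - lam_star * u) for u in us]

# Local result: dp*/du at u = 0 equals -lam*, checked by central differences.
h = 1e-6
slope = (p_star(h) - p_star(-h)) / (2 * h)
```

Here `g(lam_star)` equals `p_star(0.0)`, so strong duality holds for the unperturbed problem, all entries of `gaps` are nonnegative, and `slope` is close to $-\lambda^\star$.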
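
numerical check: box-constrained LP dual function (page 5–27)

The closed form $g(\nu) = -b^T \nu - \|A^T \nu + c\|_1$ on page 5–27 comes from minimizing a linear function of $x$ over the box $-\mathbf{1} \preceq x \preceq \mathbf{1}$, where the infimum is attained at a vertex. The sketch below compares the formula with a brute-force minimum over the vertices. The data `A`, `b`, `c`, `nu`, the feasible point, and the function names are all hypothetical.

```python
from itertools import product

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_by_formula(A, b, c, nu):
    """g(nu) = -b^T nu - ||A^T nu + c||_1."""
    r = [c[j] + sum(A[i][j] * nu[i] for i in range(len(A)))
         for j in range(len(c))]
    return -dot(b, nu) - sum(abs(rj) for rj in r)

def dual_by_vertices(A, b, c, nu):
    """inf over the box of c^T x + nu^T (A x - b), evaluated at all vertices
    (valid because the expression is linear in x)."""
    best = float("inf")
    for x in product((-1.0, 1.0), repeat=len(c)):
        residual = [dot(row, x) - bi for row, bi in zip(A, b)]
        best = min(best, dot(c, x) + dot(nu, residual))
    return best

A = [[1.0, 1.0, 1.0]]          # illustrative data
b = [0.5]
c = [1.0, -2.0, 0.5]
nu = [0.3]

g_formula = dual_by_formula(A, b, c, nu)
g_vertices = dual_by_vertices(A, b, c, nu)

# A primal feasible point (A x = b, -1 <= x <= 1) for the weak duality check.
x_feasible = [-1.0, 1.0, 0.5]
primal_value = dot(c, x_feasible)
```

The two evaluations of $g(\nu)$ agree on this instance, and both lie below `primal_value`, consistent with the lower bound property.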