VISUAL GEOMETRY GROUP

Exploiting Duality (Particularly the dual of SVM)
M. Pawan Kumar

PART I : General duality theory
• Basics of Mathematical Optimization
• The algebra
• The geometry
• Examples

PART II : Solving the SVM dual
• General Decomposition Algorithm
• Good Working Set
• Implementation Details

Mathematical Optimization
min $f_0(x)$ (objective function)
s.t. $f_i(x) \le 0$ (inequality constraints)
     $h_i(x) = 0$ (equality constraints)
• $x$ is a feasible point if $f_i(x) \le 0$, $h_i(x) = 0$
• $x$ is a strictly feasible point if $f_i(x) < 0$, $h_i(x) = 0$
• Feasible region: set of all feasible points

Convex Optimization
min $f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$
• Feasible region is convex
• Objective function is convex
Convex set??? Convex function???

Convex Set
Line segment with endpoints $x_1$, $x_2$: $c\,x_1 + (1 - c)\,x_2$, $c \in [0, 1]$
A set is convex if, for all line segments with endpoints in the set, all points on the line segment lie within the set.
[Figure: a non-convex set, with points $x_1$, $x_2$]

Examples of Convex Sets
• Line segment
• Line
• Hyperplane: $a^T x - b = 0$
• Halfspace: $a^T x - b \le 0$
• Second-order cone: $\|x\| \le t$

Operations that Preserve Convexity
• Intersection (e.g. polyhedron / polytope)
• Affine transformation: $x \mapsto Ax + b$

Convex Function
[Figure: $f(x)$ against $x$, with points $x_1$, $x_2$] Blue point always lies above red point
$f(c\,x_1 + (1 - c)\,x_2) \le c\,f(x_1) + (1 - c)\,f(x_2)$
Domain of $f(\cdot)$ has to be convex
$-f(\cdot)$
is concave

Convex Function
Once-differentiable functions: $f(y) + \nabla f(y)^T (x - y) \le f(x)$
[Figure: $f(x)$ and its tangent $f(y) + \nabla f(y)^T (x - y)$ at the point $(y, f(y))$]
Twice-differentiable functions: $\nabla^2 f(x) \succeq 0$

Convex Function and Convex Sets
Epigraph of a convex function is a convex set

Examples of Convex Functions
• Linear function: $a^T x$
• p-Norm functions: $(|x_1|^p + |x_2|^p + \dots + |x_n|^p)^{1/p}$, $p \ge 1$
• Quadratic functions: $x^T Q x$, $Q \succeq 0$

Operations that Preserve Convexity
• Non-negative weighted sum: $w_1 f_1(x) + w_2 f_2(x) + \dots$, e.g. $x^T Q x + a^T x + b$, $Q \succeq 0$
• Pointwise maximum: $\max\{f_1(x), f_2(x)\}$
Pointwise minimum of concave functions is concave

Convex Optimization
min $f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$
• Feasible region is convex
• Objective function is convex

PART I : General duality theory
• The algebra

Lagrangian
min $f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$
$L(x, \lambda, \nu) = f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, $\lambda_i \ge 0$

Lagrangian Dual
$g(\lambda, \nu) = \min_{x \in D} L(x, \lambda, \nu) = \min_{x \in D} f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$, $\lambda_i \ge 0$
$x \in D$: $x$ belongs to the intersection of the domains of $f_0$, $f_i$ and $h_i$
Pointwise minimum of affine (concave) functions, so the dual function is concave
$p^* \ge g(\lambda, \nu)$ for all $(\lambda, \nu)$ with $\lambda_i \ge 0$, where $p^* = \min f_0(x)$ s.t. $f_i(x) \le 0$, $h_i(x) = 0$

The Dual Problem
The lower bound could be far from $p^*$
Best lower bound? Easy to obtain
$d^* = \max_{\lambda, \nu} \min_x f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$ s.t. $\lambda_i \ge 0$
$p^* - d^* \ge 0$: Duality Gap

The Geometric Interpretation
$(u, v, t) = (f_i(x), h_i(x), f_0(x)) \in G$, $x \in D$
[Figure: the set $G$ in the $(u, t)$ plane, with $p^*$ marked]
$(\lambda, \nu, 1)^T (u, v, t) \ge g(\lambda, \nu)$
[Figure: the set $G$ in the $(u, t)$ plane, with $p^*$, $d^*$ and $g(\lambda)$ marked]

The Duality Gap
$p^* = \min f_0(x)$ s.t.
$f_i(x) \le 0$, $h_i(x) = 0$
$\ge$
$d^* = \max_{\lambda, \nu} \min_x f_0(x) + \sum_i \lambda_i f_i(x) + \sum_i \nu_i h_i(x)$ s.t. $\lambda_i \ge 0$

The Duality Gap
• $p^* - d^*$: Duality Gap
• $p^* - d^* \ge 0$: Weak Duality
• $p^* - d^* = 0$: Strong Duality

Strong Duality
Slater's Condition:
• Problem is convex
• There exists a strictly feasible point
Taken care of by most solvers

At Strong Duality
$f_0(x^*) = g(\lambda^*, \nu^*)$
$= \min_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right)$
$\le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*)$
$\le f_0(x^*)$
Inequalities hold with equality:
• $x^*$ minimizes the Lagrangian at $(\lambda^*, \nu^*)$
• $\lambda_i^* f_i(x^*) = 0$

KKT Conditions
• Primal feasible: $f_i(x^*) \le 0$, $h_i(x^*) = 0$
• Dual feasible: $\lambda_i^* \ge 0$
• Complementary slackness: $\lambda_i^* f_i(x^*) = 0$
• $\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$
Necessary conditions for strong duality
Necessary and sufficient for convex problems

PART I : General duality theory
• Examples

Linear Program
min $c^T x$ s.t. $Ax = b$, $x \ge 0$

QCQP
min $(1/2)\,x^T P_0 x + q_0^T x + r_0$ s.t. $(1/2)\,x^T P_i x + q_i^T x + r_i \le 0$

Entropy Maximization
min $\sum_i x_i \log(x_i)$ s.t. $Ax \le b$, $\sum_i x_i = 1$

The SVM Framework
Points $X = \{x_i\}$, labels $y = \{y_i\}$, $y_i \in \{-1, +1\}$
[Figure: separating hyperplane $w^T x + b = 0$ with margin $2/\|w\|$]
min $(1/2)\,w^T w + C \sum_i \xi_i$ s.t. $y_i (w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
Convex Quadratic Program

The SVM Dual
min $(1/2)\,\alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C\mathbf{1}$
$Q_{ij} = y_i y_j x_i^T x_j = y_i y_j k(x_i, x_j)$

PART II : Solving the SVM dual
• General Decomposition Algorithm

The SVM Dual
min $(1/2)\,\alpha^T Q \alpha - \alpha^T \mathbf{1}$ s.t. $\alpha^T y = 0$, $0 \le \alpha \le C\mathbf{1}$
• Choose 'q' variables. Fix the rest. Best set B?
• Change unfixed variables, satisfying constraints, to decrease objective function (small problem).
• Repeat.
Minimum 'q' ??? Till When ???

KKT Conditions
min $(1/2)\,\alpha^T Q \alpha - \alpha^T \mathbf{1}$
s.t. $\alpha^T y = 0$ (multiplier $\lambda^{eq}$)
     $0 \le \alpha$ (multipliers $\lambda_i^{lo}$)
     $\alpha \le C\mathbf{1}$ (multipliers $\lambda_i^{up}$)
$-\mathbf{1} + Q\alpha + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$, with $g(\alpha) = Q\alpha$
$\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{lo} \ge 0$
$\lambda_i^{up} (\alpha_i - C) = 0$, $\lambda_i^{up} \ge 0$

KKT Conditions
$-\mathbf{1} + g(\alpha) + \lambda^{eq} y - \lambda^{lo} + \lambda^{up} = 0$
$\lambda_i^{lo} \alpha_i = 0$, $\lambda_i^{lo} \ge 0$
$\lambda_i^{up} (\alpha_i - C) = 0$, $\lambda_i^{up} \ge 0$
• For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$
• For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$
• For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$

KKT Conditions
$g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$
$g_i(\alpha^t) = g_i(\alpha^{t-1}) + y_i \sum_{j \in B} (\alpha_j^t - \alpha_j^{t-1})\, y_j k(x_i, x_j)$
Best set of 'q' variables (Working set)

PART II : Solving the SVM dual
• Good Working Set

Working Set
$g_i(\alpha) = y_i \sum_j \alpha_j y_j k(x_i, x_j)$
$d$: feasible direction of descent, $\alpha^t = \alpha^{t-1} + d$
Choose steepest descent direction
First order approximation of objective: $(-\mathbf{1} + g(\alpha^{t-1}))^T d$

Working Set
$\min_d\ (-\mathbf{1} + g(\alpha^{t-1}))^T d$
s.t. $y^T d = 0$
     $d_i \ge 0$ if $\alpha_i^{t-1} = 0$
     $d_i \le 0$ if $\alpha_i^{t-1} = C$
     $\mathrm{Card}\{d\} = q$ (number of non-zero $d_i$)
     $-1 \le d_i \le 1$

Working Set
$s_i = y_i (-1 + g_i(\alpha^{t-1}))$
Sort according to decreasing values of $s_i$
• Choose q/2 from top if $0 < \alpha_i^{t-1} < C$, or $d_i = -y_i$ satisfies feasibility of direction
• Choose q/2 from bottom if $0 < \alpha_i^{t-1} < C$, or $d_i = y_i$ satisfies feasibility of direction

Working Set
$\min_d\ (-\mathbf{1} + g(\alpha^{t-1}))^T d$
s.t. $y^T d = 0$,
     $d_i \ge 0$ if $\alpha_i^{t-1} = 0$
     $d_i \le 0$ if $\alpha_i^{t-1} = C$
     $\mathrm{Card}\{d\} = q$
     $-1 \le d_i \le 1$

PART II : Solving the SVM dual
• Implementation Details

Shrinking
• For all $0 < \alpha_i < C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i = 0$
• For all $\alpha_i = 0$: $-1 + g_i(\alpha) + \lambda^{eq} y_i - \lambda_i^{lo} = 0$
• For all $\alpha_i = C$: $-1 + g_i(\alpha) + \lambda^{eq} y_i + \lambda_i^{up} = 0$
If $\lambda_i^{lo} > 0$ or $\lambda_i^{up} > 0$ for n consecutive iterations, drop $i$ from the problem (temporarily)

Caching
• Kernel evaluation can be expensive
• Cache them in a least-recently-used manner
• Choose q' variables where cache available

Results
• Those who have used SVMlight: You know that it works very well.
• Those who haven't used SVMlight: It works very well.
See paper. Download.

Questions???
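
Sketch: decomposition with q = 2
The decomposition loop of Part II can be sketched in a few lines for the smallest working set, q = 2. This is an illustration only and not SVMlight's implementation: it assumes a dense, precomputed kernel matrix, both classes present in the labels, no shrinking and no caching. The names `solve_svm_dual` and `bias_from_kkt` are hypothetical. The pair is picked with the $s_i$ ordering from the Working Set slides, the two-variable problem is solved in closed form along $d$, and the gradient is updated incrementally as on the KKT slide.

```python
import numpy as np


def solve_svm_dual(K, y, C=1.0, tol=1e-6, max_iter=10_000):
    """Sketch: min 0.5 a^T Q a - 1^T a  s.t.  y^T a = 0, 0 <= a <= C, with q = 2.

    K is the kernel matrix k(x_i, x_j); y holds labels in {-1.0, +1.0}.
    Returns alpha and the gradient (-1 + g(alpha)) at alpha.
    """
    n = len(y)
    Q = (y[:, None] * y[None, :]) * K   # Q_ij = y_i y_j k(x_i, x_j)
    alpha = np.zeros(n)                 # alpha = 0 satisfies all constraints
    grad = -np.ones(n)                  # -1 + g(alpha), with g(alpha) = Q alpha

    for _ in range(max_iter):
        s = y * grad                    # s_i = y_i (-1 + g_i(alpha))
        # "Top" candidates: moving along d_i = -y_i keeps alpha_i inside [0, C].
        can_top = ((y > 0) & (alpha > 0)) | ((y < 0) & (alpha < C))
        # "Bottom" candidates: moving along d_i = +y_i keeps alpha_i inside [0, C].
        can_bot = ((y > 0) & (alpha < C)) | ((y < 0) & (alpha > 0))
        i = int(np.argmax(np.where(can_top, s, -np.inf)))
        j = int(np.argmin(np.where(can_bot, s, np.inf)))
        if s[i] - s[j] < tol:           # no feasible descent direction remains
            break

        d_i, d_j = -y[i], y[j]          # y_i d_i + y_j d_j = 0 by construction
        # Curvature of the objective along d: d^T Q d = K_ii + K_jj - 2 K_ij.
        eta = K[i, i] + K[j, j] - 2.0 * K[i, j]
        room_i = alpha[i] if d_i < 0 else C - alpha[i]
        room_j = alpha[j] if d_j < 0 else C - alpha[j]
        tau_max = min(room_i, room_j)   # largest step that stays in the box
        tau = tau_max if eta <= 1e-12 else min(tau_max, (s[i] - s[j]) / eta)

        alpha[i] += tau * d_i
        alpha[j] += tau * d_j
        # Incremental update: only the two changed variables contribute.
        grad += tau * (Q[:, i] * d_i + Q[:, j] * d_j)

    return alpha, grad


def bias_from_kkt(alpha, grad, y, C, eps=1e-8):
    """For 0 < alpha_i < C the KKT conditions give -1 + g_i + lam_eq * y_i = 0,
    so lam_eq = -y_i * grad_i; under the usual primal form this equals b."""
    free = (alpha > eps) & (alpha < C - eps)
    if not free.any():
        raise ValueError("no free variable; this sketch does not handle that case")
    return -float(np.mean(y[free] * grad[free]))
```

With a linear kernel, `K = X @ X.T` and the weight vector is recovered as `(alpha * y) @ X`. Choosing q = 2 keeps the small problem analytic; a larger q needs a QP solver for the subproblem.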