Convex Optimization: Part 1 of Chapter 7 Discussion
Presenter: Brian Quanz

About today's discussion…
• Chapter 7 has no separate discussion of convex optimization; it is discussed only in the context of SVM problems
• Instead:
  – Today: discuss convex optimization
  – Next week: discuss some specific convex optimization problems from the text, e.g. SVMs
• Mostly follows an alternate text: Convex Optimization, Stephen Boyd and Lieven Vandenberghe
  – Material borrowed from the book and related course notes; some figures and equations shown here
  – Available online: http://www.stanford.edu/~boyd/cvxbook/
  – Nice course lecture videos from Stephen Boyd: http://www.stanford.edu/class/ee364a/
  – Corresponding convex optimization tool (discussed later), CVX: http://www.stanford.edu/~boyd/cvx/

Overview
• Why convex? What is convex?
• Key examples: linear and quadratic programming
• Key mathematical ideas to discuss:
  – Lagrange duality
  – KKT conditions
• Brief concept of interior point methods
• CVX – convex optimization made easy

Mathematical Optimization
• All learning is some optimization problem -> stick to the canonical form:
    minimize   f0(x)
    subject to fi(x) <= 0,  i = 1, …, m
• x = (x1, x2, …, xp) – optimization variables; x* denotes an optimal point
• f0 : R^p -> R – objective function
• fi : R^p -> R – constraint functions

Optimization Example
• Well familiar with: regularized regression
• Least squares
• Add some constraints: ridge, lasso

Why convex optimization?
• We can't solve most OPs
  – E.g. NP-hard; even high-degree polynomial time is too slow
• Convex OPs:
  – (Generally) no analytic solution
  – Efficient algorithms to find the (global) solution
  – Interior point methods (basically iterated Newton) can be used: roughly 10–100 iterations, each costing on the order of max{p^3, p^2·m, F} operations, where F is the cost of evaluating the objective and constraint functions
  – At worst, solve with general IP methods (CVX); specialized solvers are faster

What is Convex Optimization?
• An OP with convex objective and constraint functions
• If f0, …, fm are convex, it is a convex OP, and it has an efficient solution!

Convex Function
• Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of the two points:
    f(θx + (1−θ)y) <= θ f(x) + (1−θ) f(y)   for all x, y and all θ in [0, 1]
• What does the definition mean?
  – Pick any two points x, y and evaluate the function there: f(x), f(y)
  – Draw the line (chord) through the two points (x, f(x)) and (y, f(y))
  – The function is convex if, at every point between x and y, it lies on or below that chord
• (Figures: an example of a convex function and an example of a non-convex function)
• It is easy to see why convexity allows an efficient solution: just "slide" down the objective function as far as possible and you will reach a minimum
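As a quick numerical illustration (not from the slides; the two test functions and the sampling range are arbitrary choices), a few lines of MATLAB can sample the chord inequality above for a convex function and a non-convex one:

    % Sample the definition f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) at random points.
    f_convex  = @(z) z.^2;       % convex on all of R
    f_nonconv = @(z) sin(3*z);   % not convex
    rng(0);
    for fc = {f_convex, f_nonconv}
        f = fc{1};
        holds = true;
        for k = 1:1000
            x = 4*rand - 2;  y = 4*rand - 2;  t = rand;
            if f(t*x + (1-t)*y) > t*f(x) + (1-t)*f(y) + 1e-12
                holds = false;
                break;
            end
        end
        fprintf('chord inequality held at all sampled points: %d\n', holds);
    end

The first function passes every sampled check, while the second is quickly caught violating the inequality, which is the distinction the next slides illustrate.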
Local Optima are Global (simple proof)
• (A sketch of the argument is given at the end, under Extra proof)

Convex vs. Non-convex Examples
• Affine – border case of convexity
• Convex: the minimum is easy to find
• Non-convex: easy to get stuck in a local minimum
• Can't rely on local search techniques alone

Non-convex
• Some non-convex problems are highly multi-modal, or NP-hard
• We could be forced to search all solutions, or hope that stochastic search is successful
• Cannot guarantee the best solution; inefficient
• Harder to make performance guarantees with approximate solutions

Determine/Prove Convexity
• Can use the definition (prove that it holds)
• If the function restricted to any line is convex, the function is convex
• If twice differentiable, show that the Hessian is positive semidefinite
• Often easier to:
  – Convert to a known convex OP, e.g. QP, LP, SOCP, SDP, often of a more general form
  – Combine known convex functions (building blocks) using operations that preserve convexity – a similar idea to building kernels

Some common convex OPs
• Of particular interest for this book and chapter: linear programming (LP) and quadratic programming (QP)
• LP: affine objective function, affine constraints – e.g. LP SVM, portfolio management

LP Visualization
• Note: the constraints form the feasible set – for an LP, a polyhedron

Quadratic Program
• QP: quadratic objective, affine constraints
• LP is a special case
• Many SVM problems result in a QP, as does regression
• If the constraint functions are also quadratic, we get a Quadratically Constrained Quadratic Program (QCQP)

QP Visualization

Second Order Cone Program
• SOCP: minimize f'x subject to ||A_i x + b_i||_2 <= c_i'x + d_i, i = 1, …, m
• A_i = 0 results in an LP
• c_i = 0 results in a QCQP
• Each constraint requires the affine function (A_i x + b_i, c_i'x + d_i) to lie in a second-order cone

Second Order Cone (Boundary) in R^3

Semidefinite Programming
• Linear matrix inequality (LMI) constraints
• Many problems can be expressed using LMIs
• LP and SOCP are special cases

Building Convex Functions
• From simple convex functions to complex ones: some operations that preserve convexity
  – Nonnegative weighted sum
  – Composition with an affine function
  – Pointwise maximum and supremum
  – Composition
  – Minimization (over some of the variables)
  – Perspective: g(x, t) = t f(x/t)

Verifying Convexity Remarks
• For more detail and expansion, consult the referenced text, Convex Optimization
• Geometric programs are also convex (after a change of variables) and can be handled with a series of SDPs (details skipped here)
• CVX converts the problem to an SOCP or SDP (or a series of them) and uses an efficient solver

Lagrangian
• Standard form:
    minimize   f0(x)
    subject to fi(x) <= 0,  i = 1, …, m
               hj(x) = 0,   j = 1, …, q
• Lagrangian L:
    L(x, λ, ν) = f0(x) + Σ_i λ_i fi(x) + Σ_j ν_j hj(x)
• λ, ν: Lagrange multipliers (dual variables)

Lagrange Dual Function
• The Lagrange dual function is found by minimizing L with respect to the primal variables:
    g(λ, ν) = inf_x L(x, λ, ν)
• Often we can take the gradient of L with respect to the primal variables and set it to 0 (as in the SVM derivation)
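To make this concrete, here is a standard worked example (it appears in the Boyd–Vandenberghe text): minimize x'x subject to Ax = b. Setting the gradient of the Lagrangian to zero gives the dual function in closed form; the MATLAB sketch below (with made-up A and b) evaluates it at random ν and checks that it never exceeds the primal optimal value, the lower-bound property discussed on the next slides:

    % Problem: minimize x'*x subject to A*x = b.
    % Lagrangian: L(x, nu) = x'*x + nu'*(A*x - b).
    % grad_x L = 2*x + A'*nu = 0  =>  x = -A'*nu/2, so
    % g(nu) = -(1/4)*nu'*(A*A')*nu - b'*nu  (a concave quadratic).
    rng(1);
    A = randn(3, 6);  b = randn(3, 1);
    x_star = pinv(A) * b;            % primal optimal point (minimum-norm solution)
    p_star = x_star' * x_star;
    g = @(nu) -(1/4) * nu' * (A * A') * nu - b' * nu;
    for k = 1:5
        nu = randn(3, 1);
        fprintf('g(nu) = %8.4f  <=  p* = %8.4f\n', g(nu), p_star);
    end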
Lagrange Dual Function
• Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of (λ, ν)
• Thus, g is concave even if the problem is not convex
• The Lagrange dual provides a lower bound on the objective value at the solution: g(λ, ν) <= p* whenever λ >= 0

Lagrangian as Linear Approximation, Lower Bound
• Simple interpretation of the Lagrangian
• Can incorporate the constraints into the objective as indicator functions: infinity if the constraint is violated, 0 otherwise
• In the Lagrangian we use a "soft" linear approximation to the indicator functions; it is an under-estimator, since λ_i u <= I_−(u) for all u when λ_i >= 0

Lagrange Dual Problem
• Why not make the lower bound the best possible?
• Dual problem: maximize g(λ, ν) subject to λ >= 0
• Always a convex optimization problem (even when the primal is non-convex)
• Weak duality: d* <= p* (we have already seen this)

Strong Duality
• If d* = p*, strong duality holds
• Does not hold in general
• Slater's theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (proof too involved; refer to the text)
• => For convex problems, we can use the dual problem to find the solution

Complementary Slackness
• When strong duality holds,
    f0(x*) = g(λ*, ν*) = inf_x L(x, λ*, ν*) <= L(x*, λ*, ν*) = f0(x*) + Σ_i λ*_i fi(x*) + Σ_j ν*_j hj(x*) <= f0(x*)
  (the last step holds because the constraints are satisfied at x*: fi(x*) <= 0, λ*_i >= 0, hj(x*) = 0)
• The chain is sandwiched between f0(x*) on both ends, so the last two inequalities are in fact equalities – simple!
• Which means Σ_i λ*_i fi(x*) = 0; since each term is non-positive, we have complementary slackness: λ*_i fi(x*) = 0 for every i
• Whenever a constraint is non-active, the corresponding multiplier is zero
• This can also be described by: λ*_i > 0 => fi(x*) = 0, and fi(x*) < 0 => λ*_i = 0
• Since usually only a few constraints are active at the solution (see the geometry), the dual variable λ is often sparse
• Note: in general there is no guarantee
• As we will see, this is why support vector machines result in a solution involving only a few key support vectors
• These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the "active" points are kept
• However, avoid a common misconception when it comes to SVMs and complementary slackness!
• E.g. if a Lagrange multiplier is 0, the constraint could still be active! (the implication goes one way only)
• This means: λ*_i = 0 does not imply fi(x*) < 0; the constraint may still hold with equality

KKT Conditions
• The KKT conditions are just the name we give to this set of conditions required at the solution (basically a list of what we know)
• KKT conditions play an important role
  – Can sometimes be used to find the solution analytically
  – Otherwise, many methods can be thought of as ways of solving the KKT conditions
• Again, given strong duality and assuming differentiability, since x* minimizes L(x, λ*, ν*), the gradient must be 0 at x*:
    ∇f0(x*) + Σ_i λ*_i ∇fi(x*) + Σ_j ν*_j ∇hj(x*) = 0
• Thus, putting it all together, for non-convex problems we have the following

KKT Conditions – non-convex
• Necessary conditions:
  1. fi(x*) <= 0 (primal feasibility, inequalities)
  2. hj(x*) = 0 (primal feasibility, equalities)
  3. λ*_i >= 0 (dual feasibility)
  4. λ*_i fi(x*) = 0 (complementary slackness)
  5. ∇f0(x*) + Σ_i λ*_i ∇fi(x*) + Σ_j ν*_j ∇hj(x*) = 0 (stationarity)

KKT Conditions – convex
• Also sufficient conditions:
  – 1 + 2 -> x̃ is feasible
  – 3 -> L(x, λ̃, ν̃) is convex in x
  – 5 -> x̃ minimizes L(x, λ̃, ν̃), so g(λ̃, ν̃) = L(x̃, λ̃, ν̃); with 2 and 4 this equals f0(x̃), so x̃ and (λ̃, ν̃) are primal and dual optimal
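As a small illustration of using the KKT conditions to find a solution analytically (a sketch with made-up data, not an example from the slides): for an equality-constrained convex QP, minimize (1/2)x'Px + q'x subject to Ax = b, there are no inequality constraints, so the multiplier λ and complementary slackness drop out, and the remaining conditions (stationarity and primal feasibility) form a single linear system:

    % KKT conditions for: minimize (1/2)*x'*P*x + q'*x  subject to  A*x = b
    %   stationarity:       P*x + q + A'*nu = 0
    %   primal feasibility: A*x = b
    rng(2);
    n = 5;  m = 2;
    M = randn(n);  P = M'*M + eye(n);     % positive definite, so the QP is convex
    q = randn(n, 1);  A = randn(m, n);  b = randn(m, 1);
    KKT = [P, A'; A, zeros(m)];           % stack both conditions into one system
    sol = KKT \ [-q; b];
    x_star  = sol(1:n);
    nu_star = sol(n+1:end);
    fprintf('stationarity residual %.2e, feasibility residual %.2e\n', ...
            norm(P*x_star + q + A'*nu_star), norm(A*x_star - b));

Solving this one linear system recovers both the primal solution x* and the dual variable ν*, which is the sense in which many solvers can be viewed as machinery for solving the KKT conditions.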
Brief description of interior point methods
• Solve a series of equality-constrained problems with Newton's method
• Approximate the inequality constraints with a log-barrier (an approximation of the indicator function): −(1/t)·log(−u) in place of I_−(u)
• As t gets larger, the approximation becomes better

Central Path Idea

CVX: Convex Optimization Made Easy
• CVX is a Matlab toolbox
• Allows you to flexibly express convex optimization problems
• Translates them to a general form (an SOCP, an SDP, or a series of these) and uses an efficient solver
• http://www.stanford.edu/~boyd/cvx/
• All you have to do is design the convex optimization problem; plug it into CVX and you have a first version of the algorithm implemented
• A more specialized solver may be necessary for some applications

CVX – Examples
• Quadratic program: given H, f, A, and b

    cvx_begin
        variable x(n)
        minimize( x'*H*x + f'*x )
        subject to
            A*x >= b
    cvx_end

• SVM-type formulation with an L1-norm penalty (L, I, C, l1_lambda, X, train_label, a, and ec are given)

    cvx_begin
        variable w(p)
        variable b(1)
        variable e(n)
        expression by(n)
        by = train_label.*b;
        minimize( w'*(L + I)*w + C*sum(e) + l1_lambda*norm(w,1) )
        subject to
            X*w + by >= a - e;
            e >= ec;
    cvx_end

• More complicated terms can be built with expressions

    cvx_begin
        variable w(p+1+n);
        expression q(ec);
        ct = 1;                       % index into q
        for i = 1:p
            for j = i:p
                if (A(i,j) == 1)
                    q(ct) = max(abs(w(i))/d(i), abs(w(j))/d(j));
                    ct = ct + 1;
                end
            end
        end
        minimize( f'*w + lambda*sum(q) )
        subject to
            X*w >= a;
    cvx_end

Questions
• Questions, comments?

Extra proof
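For reference, a sketch of the standard argument (following the referenced text; notation as in the canonical form above) that any local optimum of a convex problem is a global optimum: suppose x is locally optimal but not globally optimal, so some feasible y has f0(y) < f0(x). Local optimality means f0(x) = inf{ f0(z) : z feasible, ||z − x|| <= R } for some R > 0, which forces ||y − x|| > R. Take z = θy + (1 − θ)x with θ = R / (2||y − x||). Then z is feasible (the feasible set is convex), ||z − x|| = R/2 < R, and convexity of f0 gives f0(z) <= θ f0(y) + (1 − θ) f0(x) < f0(x), contradicting the local optimality of x. Hence x must be globally optimal.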