ITERATIVE METHODS AND REGULARIZATION IN THE DESIGN OF FAST ALGORITHMS

A unified framework for optimization and online learning beyond Multiplicative Weight Updates
Lorenzo Orecchia, MIT Math

Talk Outline: A Tale of Two Halves

PART 1: REGULARIZATION AND ITERATIVE TECHNIQUES FOR ONLINE LEARNING
• Online Linear Optimization
• Online Linear Optimization over Simplex and Multiplicative Weight Updates (MWUs)
• A Regularization Framework to generalize MWUs: Follow the Regularized Leader
MESSAGE: REGULARIZATION IS A POWERFUL ALGORITHMIC TECHNIQUE
Online Learning: Multiplicative Weight Updates (MWUs)
Optimization: Regularized Updates

PART 2: NON-SMOOTH OPTIMIZATION AND FAST ALGORITHMS FOR MAXFLOW
• Non-smooth vs Smooth Convex Optimization
• Non-smooth Convex Optimization reduces to Online Linear Optimization
• Application: Understanding Undirected Maxflow algorithms based on MWUs
MESSAGE: FASTEST ALGORITHMS REQUIRE PRIMAL-DUAL APPROACH

Applications of MWUs
• Fast Algorithms for solving specific LPs and SDPs:
    • Maximum Flow problems [PST], [GK], [F], [CKMST]
    • Covering-packing problems [PST]
    • Oblivious routing [R], [M]
• Fast Approximation Algorithms based on LP and SDP relaxations:
    • Maxcut [AK]
    • Graph Partitioning Problems [AK], [S], [OSV]
• Proof Technique:
    • Hardcore Lemma [BHK]
    • QIP = PSPACE [W]
    • Derandomization [Y]
… and more

Machine Learning meets Optimization meets TCS
These techniques have been rediscovered multiple times in different fields: Machine Learning, Convex Optimization, TCS.
Three surveys emphasizing the different viewpoints and literatures:
1) ML: Prediction, Learning, and Games by Cesa-Bianchi and Lugosi
2) Optimization: Lectures on Modern Convex Optimization by Ben-Tal and Nemirovski
3) TCS: The Multiplicative Weights Update Method: a Meta-Algorithm and Applications by Arora, Hazan and Kale

REGULARIZATION 101

What is Regularization?
Regularization is a fundamental technique in optimization.
OPTIMIZATION PROBLEM → WELL-BEHAVED OPTIMIZATION PROBLEM, via a regularizer $F$ with parameter $\lambda > 0$.
A well-behaved problem has:
• Stable optimum
• Unique optimal solution
• Smoothness conditions
…
Benefits of Regularization in Learning and Statistics:
• Prevents overfitting
• Increases stability
• Decreases sensitivity to random noise

Example: Regularization Helps Stability
Consider a convex set $S \subset \mathbb{R}^n$ and a linear optimization problem:
$f(c) = \arg\min_{x \in S} c^T x$
The optimal solution $f(c)$ may be very unstable under perturbation of $c$:
$\|c' - c\| \le \delta$ and $\|f(c') - f(c)\| \gg \delta$
[Figure: the set $S$ with cost vectors $c$, $c'$ and the far-apart optima $f(c)$, $f(c')$.]

Example: Regularization Helps Stability
Consider a convex set $S \subset \mathbb{R}^n$ and a regularized linear optimization problem:
$f(c) = \arg\min_{x \in S} c^T x + F(x)$
where $F$ is $\sigma$-strongly convex. Then:
$\|c' - c\| \le \delta$ implies $\|f(c') - f(c)\| \le \delta / \sigma$
[Figure: the objectives $c^T x + F(x)$ and $c'^T x + F(x)$ with nearby minimizers $f(c)$, $f(c')$.]
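The stability claim can be seen in one dimension. The sketch below assumes $S = [-1, 1]$ and the regularizer $F(x) = (\sigma/2) x^2$; both are choices made here for illustration and do not come from the slides.

```python
def linear_argmin(c):
    """Minimizer of c*x over S = [-1, 1]; ties are broken toward -1 (arbitrary choice)."""
    return -1.0 if c >= 0 else 1.0


def regularized_argmin(c, sigma):
    """Minimizer of c*x + (sigma/2)*x**2 over [-1, 1]: the point -c/sigma, clipped to S."""
    return max(-1.0, min(1.0, -c / sigma))


delta, sigma = 1e-3, 1.0
c, c_prime = delta / 2, -delta / 2           # |c' - c| = delta

jump_plain = abs(linear_argmin(c_prime) - linear_argmin(c))
jump_regularized = abs(regularized_argmin(c_prime, sigma) - regularized_argmin(c, sigma))

print(jump_plain)                             # the optimum jumps across the whole set
print(jump_regularized, "<=", delta / sigma)  # the regularized optimum barely moves
```

A perturbation of size $\delta$ flips the unregularized optimum from one end of $S$ to the other, while the regularized optimum moves by at most $\delta / \sigma$.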
ONLINE LINEAR OPTIMIZATION AND MULTIPLICATIVE WEIGHT UPDATES

Online Linear Minimization
SETUP: Convex set $X \subseteq \mathbb{R}^n$, generic norm $\|\cdot\|$, repeated game over $T$ rounds. At round $t$:
• ALGORITHM plays the current solution $x^{(t)} \in X$.
• ADVERSARY reveals the current linear objective, a loss vector $\ell^{(t)} \in \mathbb{R}^n$ with $\|\ell^{(t)}\|_* \le \rho$.
• The algorithm's loss is ${\ell^{(t)}}^T x^{(t)}$.
• ALGORITHM moves to an updated solution $x^{(t+1)} \in X$; ADVERSARY picks a new loss vector $\ell^{(t+1)} \in \mathbb{R}^n$ with $\|\ell^{(t+1)}\|_* \le \rho$.
GOAL: update $x^{(t)}$ to minimize the regret
$\frac{1}{T} \sum_{t=1}^{T} {\ell^{(t)}}^T x^{(t)} - \min_{x \in X} \frac{1}{T} \sum_{t=1}^{T} {\ell^{(t)}}^T x$
The first term is the average algorithm's loss $\hat{L}$; the second term is the a posteriori optimum $L^*$.
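As a concrete reading of the regret definition, here is a minimal sketch that evaluates it when $X$ is the probability simplex (an assumption made here so that the a posteriori optimum has a closed form). The helper name `average_regret` and the toy data are hypothetical.

```python
def average_regret(plays, losses):
    """Average regret of `plays` against `losses` when X is the probability simplex.

    plays[t] and losses[t] are length-n lists. A linear function over the simplex
    is minimized at a vertex, so the a posteriori optimum is the smallest
    coordinate of the cumulative loss vector.
    """
    T, n = len(losses), len(losses[0])
    algorithm_loss = sum(
        sum(x_i * l_i for x_i, l_i in zip(x, l)) for x, l in zip(plays, losses)
    )
    cumulative = [sum(l[i] for l in losses) for i in range(n)]
    return (algorithm_loss - min(cumulative)) / T


# Two rounds, two coordinates, always playing the uniform point.
losses = [[1.0, 0.0], [1.0, 0.0]]
plays = [[0.5, 0.5], [0.5, 0.5]]
regret = average_regret(plays, losses)
print(regret)
```

Here the algorithm loses 0.5 per round while the second coordinate loses nothing, so the average regret is 0.5.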
Simplex Case: Learning with Experts
SETUP: Simplex $X \subseteq \mathbb{R}^n$ under the $\ell_1$ norm. At round $t$:
• ALGORITHM plays $p^{(t)}$, a distribution over experts.
• ADVERSARY reveals the experts' losses $\ell^{(t)}$ with $\|\ell^{(t)}\|_\infty \le \rho$.
• The algorithm's loss is $\mathbb{E}_{i \leftarrow p^{(t)}} [\ell_i^{(t)}] = {p^{(t)}}^T \ell^{(t)}$.
• ALGORITHM updates the distribution to $p^{(t+1)}$.

Simplex Case: Multiplicative Weight Updates
Weights: $w_i^{(t+1)} = (1 - \epsilon)^{\ell_i^{(t)}} \cdot w_i^{(t)}$, with $w^{(1)} = \vec{1}$
Distribution: $p_i^{(t+1)} = w_i^{(t+1)} / \sum_{j=1}^{n} w_j^{(t+1)}$
MULTIPLICATIVE WEIGHT UPDATE
The parameter $\epsilon \in (0, 1)$ ranges from CONSERVATIVE (near 0) to AGGRESSIVE (near 1).

MWUs: Unraveling the Update
Update: $p_i^{(t+1)} \propto w_i^{(t+1)} = (1 - \epsilon)^{\ell_i^{(t)}} \cdot w_i^{(t)}$
Unraveled: $w_i^{(t+1)} = (1 - \epsilon)^{\sum_{s=1}^{t} \ell_i^{(s)}}$
The WEIGHT of expert $i$ is determined by its CUMULATIVE LOSS $\sum_{s=1}^{t} \ell_i^{(s)}$.

MWUs: Regret Bound
Update: $p_i^{(t+1)} \propto w_i^{(t+1)} = (1 - \epsilon)^{\ell_i^{(t)}} \cdot w_i^{(t)}$
For $\epsilon < 1/2$ and $\|\ell^{(t)}\|_\infty \le \rho$:
$\hat{L} - L^* \le \frac{\rho \log n}{\epsilon T} + \rho \epsilon$
The left-hand side is the algorithm's regret; the first term on the right is the start-up penalty and the second is the penalty for being greedy.
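The MWU update and its regret bound can be exercised on a small instance. This is a minimal sketch, assuming losses in $[0, 1]$ (so $\rho = 1$); the function name `mwu_plays` and the deterministic loss sequence are made up for illustration.

```python
import math


def mwu_plays(losses, eps):
    """Run Multiplicative Weight Updates; return the distributions p(1), ..., p(T)."""
    n = len(losses[0])
    weights = [1.0] * n                      # w(1) is the all-ones vector
    plays = []
    for loss in losses:
        total = sum(weights)
        plays.append([w / total for w in weights])
        # w_i(t+1) = (1 - eps)^{loss_i(t)} * w_i(t)
        weights = [w * (1.0 - eps) ** l for w, l in zip(weights, loss)]
    return plays


# Hypothetical deterministic loss sequence with entries in [0, 1], so rho = 1.
n, T, eps = 4, 200, 0.1
losses = [[((7 * i + 3 * t) % 5) / 4.0 for i in range(n)] for t in range(T)]

plays = mwu_plays(losses, eps)
algorithm_loss = sum(
    sum(p_i * l_i for p_i, l_i in zip(p, l)) for p, l in zip(plays, losses)
)
best_expert_loss = min(sum(l[i] for l in losses) for i in range(n))

average_regret = (algorithm_loss - best_expert_loss) / T
bound = math.log(n) / (eps * T) + eps        # rho*log(n)/(eps*T) + rho*eps with rho = 1
print(average_regret, "<=", bound)
```

Shrinking `eps` lowers the greedy penalty but raises the start-up penalty, which is the trade-off the bound expresses.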
ONLINE LINEAR OPTIMIZATION BEYOND MWUs: A REGULARIZATION FRAMEWORK

MWUs: Proof Sketch of Regret Bound
Update: $p_i^{(t+1)} \propto w_i^{(t+1)} = (1 - \epsilon)^{\sum_{s=1}^{t} \ell_i^{(s)}}$
• The proof is a potential function argument:
$\Phi^{(t+1)} = \log_{1-\epsilon} \sum_{i=1}^{n} w_i^{(t+1)}$
• The potential function bounds the loss of the best expert (since $\log_{1-\epsilon}$ is decreasing and the largest weight belongs to the expert with the smallest cumulative loss):
$\Phi^{(t+1)} \le \log_{1-\epsilon} \max_{i=1}^{n} w_i^{(t+1)} = \min_{i=1}^{n} \left( \sum_{s=1}^{t} \ell_i^{(s)} \right)$
• The potential function is related to the algorithm's performance:
$\Phi^{(t+1)} - \Phi^{(t)} \ge {\ell^{(t)}}^T p^{(t)} - \epsilon$
DOES THIS PROOF TECHNIQUE GENERALIZE BEYOND THE SIMPLEX CASE?

Designing a Regularized Update
GOAL: Design an update and its potential function analysis.
QUESTION: Choice of potential function?
DESIDERATA:
1) lower bounds best expert's loss
2) tracks algorithm's performance
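Both desiderata can be checked numerically for the MWU potential $\Phi = \log_{1-\epsilon} \sum_i w_i$. The sketch below assumes losses in $[0, 1]$ and $\epsilon \le 1/2$; the helper name `mwu_potential_trace` and the loss sequence are hypothetical.

```python
import math


def mwu_potential_trace(losses, eps):
    """Return (potentials, plays): potentials[t] is log_{1-eps}(sum_i w_i) before round t+1."""
    n = len(losses[0])
    weights = [1.0] * n
    log_base = math.log(1.0 - eps)           # negative, so log_{1-eps} is decreasing
    potentials = [math.log(sum(weights)) / log_base]
    plays = []
    for loss in losses:
        total = sum(weights)
        plays.append([w / total for w in weights])
        weights = [w * (1.0 - eps) ** l for w, l in zip(weights, loss)]
        potentials.append(math.log(sum(weights)) / log_base)
    return potentials, plays


n, T, eps = 3, 50, 0.2
losses = [[((2 * i + t) % 4) / 3.0 for i in range(n)] for t in range(T)]
potentials, plays = mwu_potential_trace(losses, eps)

# Desideratum 1: the final potential lower-bounds the best expert's loss.
best_expert_loss = min(sum(l[i] for l in losses) for i in range(n))
lower_bounds_best = potentials[-1] <= best_expert_loss + 1e-9

# Desideratum 2: each increase of the potential tracks the algorithm's loss up to eps.
tracks_algorithm = all(
    potentials[t + 1] - potentials[t]
    >= sum(p_i * l_i for p_i, l_i in zip(plays[t], losses[t])) - eps - 1e-9
    for t in range(T)
)
print(lower_bounds_best, tracks_algorithm)
```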
Designing a Regularized Update
QUESTION: Choice of potential function?
Attempt 1 – FOLLOW THE LEADER:
Cumulative loss: $L^{(t)} = \sum_{s=1}^{t} \ell^{(s)}$
Update (pick the best current solution): $x^{(t+1)} = \arg\min_{x \in X} x^T L^{(t)}$
Potential (the current best loss): $\Phi^{(t+1)} = \min_{x \in X} x^T L^{(t)}$
Fails if the best expert changes drastically from round to round.
How to make the update more stable?

Regularized Update: Definition
QUESTION: Choice of potential function?
DESIDERATA:
1) lower bounds best expert's loss
2) tracks algorithm's performance
Attempt 2 – FOLLOW THE REGULARIZED LEADER:
$x^{(t+1)} = \arg\min_{x \in X} x^T L^{(t)} + \eta \cdot F(x)$
$\Phi^{(t+1)} = \min_{x \in X} x^T L^{(t)} + \eta \cdot F(x)$
Properties of the regularizer $F(x)$:
1. Convex, differentiable
2. $\sigma$-strongly convex w.r.t. the norm
Parameter $\eta \ge 0$, to be determined.
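The contrast between the two attempts shows up on a two-expert instance where the leader flips every round. This is a sketch under stated assumptions: the regularizer is negative entropy (for which the regularized leader has a closed form), and the loss sequence and the value of `eta` are illustrative choices, not taken from the slides.

```python
import math


def follow_the_leader(cumulative):
    """Vertex of the simplex minimizing x^T L (first index on ties)."""
    leader = min(range(len(cumulative)), key=lambda i: cumulative[i])
    return [1.0 if i == leader else 0.0 for i in range(len(cumulative))]


def follow_the_regularized_leader(cumulative, eta):
    """Minimizer over the simplex of x^T L + eta * sum_i x_i log x_i (closed form)."""
    shift = min(cumulative)                  # subtract the minimum for numerical stability
    scores = [math.exp(-(c - shift) / eta) for c in cumulative]
    total = sum(scores)
    return [s / total for s in scores]


def total_regret(update, losses):
    """Play update(L(t-1)) at round t and return the total regret after all rounds."""
    cumulative = [0.0] * len(losses[0])
    algorithm_loss = 0.0
    for loss in losses:
        x = update(cumulative)
        algorithm_loss += sum(x_i * l_i for x_i, l_i in zip(x, loss))
        cumulative = [c + l for c, l in zip(cumulative, loss)]
    return algorithm_loss - min(cumulative)


# Adversarial sequence for two experts: after round 1 the leader flips every round.
T = 100
losses = [[0.5, 0.0]] + [
    [0.0, 1.0] if t % 2 == 0 else [1.0, 0.0] for t in range(2, T + 1)
]

eta = math.sqrt(T)                           # illustrative choice, not tuned
ftl_regret = total_regret(follow_the_leader, losses)
ftrl_regret = total_regret(lambda L: follow_the_regularized_leader(L, eta), losses)
print(ftl_regret, ftrl_regret)
```

Follow the Leader always sits on the expert that is about to be penalized, so its regret grows linearly in $T$; the regularized update hedges between the two experts and its regret stays small.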
The two properties of the regularizer $F$ (convex and differentiable; $\sigma$-strongly convex w.r.t. the norm) are actually sufficient to get a regret bound.

Regularized Update: Analysis
QUESTION: Choice of potential function?
Attempt 2 – FOLLOW THE REGULARIZED LEADER:
$x^{(t+1)} = \arg\min_{x \in X} f^{(t+1)}(x)$ and $\Phi^{(t+1)} = \min_{x \in X} f^{(t+1)}(x)$, where $f^{(t+1)}(x) = x^T L^{(t)} + \eta \cdot F(x)$.
DESIDERATA:
1) lower bounds best expert's loss, up to the regularization error $\eta \cdot \max_{x \in X} F(x)$:
$\Phi^{(t+1)} \le \min_{x \in X} {L^{(t)}}^T x + \eta \cdot \max_{x \in X} F(x)$
2) tracks algorithm's performance: ?
Tracking the Algorithm: Proof by Picture
[Figure: the curves $f^{(t)}(x)$ and $f^{(t+1)}(x)$ with minimizers $x^{(t)}$, $x^{(t+1)}$, minimum values $\Phi^{(t)}$, $\Phi^{(t+1)}$, and the gap ${\ell^{(t)}}^T x^{(t)}$ marked at $x^{(t)}$.]
Define: $f^{(t+1)}(x) = x^T L^{(t)} + \eta \cdot F(x)$
Notice: $f^{(t+1)}(x) - f^{(t)}(x) = {\ell^{(t)}}^T x$, where $\ell^{(t)}$ is the latest loss vector.
Compare: ${\ell^{(t)}}^T x^{(t)}$ and $\Phi^{(t+1)} - \Phi^{(t)}$
$\Phi^{(t+1)} - \Phi^{(t)} = f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) + {\ell^{(t)}}^T x^{(t)}$
Want: $f^{(t+1)}(x^{(t)}) \approx f^{(t+1)}(x^{(t+1)})$

Regularization in Action
[Figure: same picture, with a quadratic lower bound to $f^{(t+1)}$.]
REGULARIZATION: $f^{(t+1)}(x) = {L^{(t)}}^T x + \eta \cdot F(x)$, so $f^{(t)}$ is $(\eta \cdot \sigma)$-strongly convex.
The difference $f^{(t+1)} - f^{(t)}$ is the linear function ${\ell^{(t)}}^T x$, so $\|f^{(t+1)} - f^{(t)}\|_* = \|\ell^{(t)}\|_*$.
STABILITY: $\|x^{(t+1)} - x^{(t)}\| \le \frac{\|\ell^{(t)}\|_*}{\eta \cdot \sigma}$
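The stability property can be observed directly. The sketch below assumes $X$ is the unit Euclidean ball and $F(x) = \frac{1}{2}\|x\|_2^2$ (so $\sigma = 1$ and the dual norm is again $\ell_2$); these choices, the function name `ftrl_euclidean_ball`, and the loss sequence are illustrative assumptions.

```python
import math


def ftrl_euclidean_ball(cumulative, eta):
    """argmin over the unit Euclidean ball of x^T L + (eta/2) * ||x||_2^2.

    The objective equals (eta/2) * ||x + L/eta||^2 up to a constant, so the
    constrained minimizer is the projection of -L/eta onto the ball.
    """
    x = [-c / eta for c in cumulative]
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= 1.0 else [v / norm for v in x]


eta, T = 0.5, 30
losses = [[math.cos(0.7 * t), math.sin(1.3 * t)] for t in range(1, T + 1)]

cumulative = [0.0, 0.0]
x = ftrl_euclidean_ball(cumulative, eta)
stable = True
for loss in losses:
    cumulative = [c + l for c, l in zip(cumulative, loss)]
    x_next = ftrl_euclidean_ball(cumulative, eta)
    # Stability: ||x(t+1) - x(t)|| <= ||loss(t)||_* / (eta * sigma), with sigma = 1.
    step = math.dist(x_next, x)
    stable = stable and step <= math.hypot(*loss) / eta + 1e-12
    x = x_next
print(stable)
```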
Analysis: Progress in One Iteration
$\Phi^{(t+1)} - \Phi^{(t)} = f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) + {\ell^{(t)}}^T x^{(t)}$
Facts used:
• $\nabla f^{(t+1)}(x^{(t)}) = \ell^{(t)}$
• $\|x^{(t+1)} - x^{(t)}\| \le \frac{\|\ell^{(t)}\|_*}{\eta \cdot \sigma}$
• $f^{(t+1)}$ is $(\eta \cdot \sigma)$-strongly convex
Hence:
$f^{(t+1)}(x^{(t+1)}) - f^{(t+1)}(x^{(t)}) \ge {\ell^{(t)}}^T (x^{(t+1)} - x^{(t)}) + \frac{\eta \cdot \sigma}{2} \|x^{(t+1)} - x^{(t)}\|^2$
$\ge -\|\ell^{(t)}\|_* \|x^{(t+1)} - x^{(t)}\| + \frac{\eta \cdot \sigma}{2} \|x^{(t+1)} - x^{(t)}\|^2$
$\ge -\frac{\|\ell^{(t)}\|_*^2}{2 \eta \cdot \sigma}$
The last step minimizes the quadratic over the value of $\|x^{(t+1)} - x^{(t)}\|$.

Completing the Analysis
Progress in one iteration:
$\Phi^{(t+1)} - \Phi^{(t)} \ge {\ell^{(t)}}^T x^{(t)} - \frac{\|\ell^{(t)}\|_*^2}{2 \sigma \eta}$
The last term is the regret at iteration $t$.
Telescopic sum, with $\|\ell^{(t)}\|_* \le \rho$:
$\Phi^{(T+1)} \ge \sum_{t=1}^{T} {\ell^{(t)}}^T x^{(t)} + \Phi^{(1)} - T \cdot \frac{\rho^2}{2 \eta \cdot \sigma}$
Final regret bound, with regularizer $F$ and $\|\ell^{(t)}\|_* \le \rho$:
$\frac{1}{T} \left( \sum_{t=1}^{T} {\ell^{(t)}}^T x^{(t)} - \min_{x \in X} \sum_{t=1}^{T} {\ell^{(t)}}^T x \right) \le \frac{\eta}{T} \cdot \left( \max_{x \in X} F(x) - \min_{x \in X} F(x) \right) + \frac{\rho^2}{2 \sigma \eta}$
The first term is the start-up penalty; the second is the penalty for being greedy.
SAME TYPE OF BOUND AS FOR MWUs
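The final regret bound can be checked on a one-dimensional instance. This sketch assumes $X = [-1, 1]$ and $F(x) = x^2/2$ (so $\sigma = 1$ and $\max F - \min F = 1/2$), and picks $\eta$ to balance the two terms of the bound; the loss sequence and the name `ftrl_interval` are hypothetical.

```python
import math


def ftrl_interval(cumulative, eta):
    """argmin over X = [-1, 1] of x*L + (eta/2)*x**2, i.e. -L/eta clipped to X."""
    return max(-1.0, min(1.0, -cumulative / eta))


T, rho, sigma = 400, 1.0, 1.0
F_range = 0.5                                # max F - min F for F(x) = x^2/2 on [-1, 1]
eta = rho * math.sqrt(T / (2.0 * sigma * F_range))   # balances the two terms of the bound

# Hypothetical loss sequence with |loss| <= rho: blocks of five +1 then five -1.
losses = [1.0 if (t // 5) % 2 == 0 else -1.0 for t in range(T)]

cumulative, algorithm_loss = 0.0, 0.0
for loss in losses:
    algorithm_loss += loss * ftrl_interval(cumulative, eta)
    cumulative += loss

best_fixed_loss = -abs(cumulative)           # min over [-1, 1] of x * L(T)
average_regret = (algorithm_loss - best_fixed_loss) / T
bound = eta * F_range / T + rho ** 2 / (2.0 * sigma * eta)
print(average_regret, "<=", bound)
```

With this choice of $\eta$ both terms of the bound equal $\rho / (2\sqrt{T})$, so the bound decays as $1/\sqrt{T}$.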
Reinterpreting MWUs
Potential function (SOFT-MAX):
$\Phi^{(t+1)} = \min_{p \ge 0, \sum_i p_i = 1} p^T L^{(t)} + \eta \cdot \sum_{i=1}^{n} p_i \log p_i$
Regularizer: $F(p) = \sum_{i=1}^{n} p_i \log p_i$ is negative entropy.
$F(p)$ is 1-strongly-convex w.r.t. $\|\cdot\|_1$.
Update:
$p^{(t+1)} = \arg\min_{p \ge 0, \sum_i p_i = 1} p^T L^{(t)} + \eta \cdot \sum_{i=1}^{n} p_i \log p_i$
$p_i^{(t+1)} = \frac{e^{-\frac{1}{\eta} L_i^{(t)}}}{\sum_{j=1}^{n} e^{-\frac{1}{\eta} L_j^{(t)}}} = \frac{(1 - \epsilon)^{L_i^{(t)}}}{\sum_{j=1}^{n} (1 - \epsilon)^{L_j^{(t)}}}$

Beyond MWUs: which regularizer?
Regret bound, optimizing over $\eta$:
$\frac{1}{T} \left( \sum_{t=1}^{T} {\ell^{(t)}}^T x^{(t)} - \min_{x \in X} \sum_{t=1}^{T} {\ell^{(t)}}^T x \right) \le \frac{\rho \sqrt{2 \cdot (\max_{x \in X} F(x) - \min_{x \in X} F(x))}}{\sqrt{\sigma T}}$
The best choice of regularizer and norm minimizes
$\frac{\max_t \|\ell^{(t)}\|_*^2 \cdot (\max_{x \in X} F(x) - \min_{x \in X} F(x))}{\sigma}$
Negative entropy with the $\ell_1$-norm is approximately optimal for the simplex.
QUESTION: are other regularizers ever useful?

Different Regularizers in Algorithm Design
QUESTION 1: Are other regularizers, besides entropy, ever useful? YES!
Applications:
• Graph Partitioning and Random Walks
    • Spectral algorithms for balanced separator running in time $\tilde{O}(m)$
    • Uses random-walk framework and SDP MWUs
    • Different walks correspond to different regularizers for the eigenvector problem:
        • Entropy, $F(X) = \mathrm{Tr}(X \log X)$ (SDP MWU): Heat Kernel Random Walk
        • $p$-norm, $1 \le p \le \infty$, $F(X) = \mathrm{Tr}(X^p)$: Lazy Random Walk
        • NEW REGULARIZER $F(X) = \mathrm{Tr}(X^{1/2})$: Personalized PageRank
[Mahoney, Orecchia, Vishnoi 2011], [Orecchia, Sachdeva, Vishnoi 2012]
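Returning to the "Reinterpreting MWUs" slide: the claim that the entropy-regularized update coincides with the MWU distribution can be verified numerically. This is a minimal sketch; the relation $e^{-1/\eta} = 1 - \epsilon$ between the two parameters is read off the update formula, and the cumulative loss vector is made up.

```python
import math


def entropy_ftrl(cumulative, eta):
    """Closed-form minimizer over the simplex of p^T L + eta * sum_i p_i log p_i."""
    scores = [math.exp(-c / eta) for c in cumulative]
    total = sum(scores)
    return [s / total for s in scores]


def mwu_distribution(cumulative, eps):
    """MWU distribution: weights (1 - eps)^{L_i}, normalized."""
    weights = [(1.0 - eps) ** c for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


eps = 0.1
eta = -1.0 / math.log(1.0 - eps)             # chosen so that exp(-1/eta) = 1 - eps
cumulative = [3.0, 1.5, 0.0, 2.25]           # hypothetical cumulative losses L(t)

p_ftrl = entropy_ftrl(cumulative, eta)
p_mwu = mwu_distribution(cumulative, eps)
max_gap = max(abs(a - b) for a, b in zip(p_ftrl, p_mwu))
print(max_gap)
```

Small $\epsilon$ corresponds to large $\eta$, that is, heavy regularization and a conservative update.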
Different Regularizers in Algorithm Design
QUESTION 1: Are other regularizers, besides entropy, ever useful? YES!
Applications:
• Graph Partitioning and Random Walks
• Sparsification
    • $\epsilon$-spectral-sparsifiers with $O\left(\frac{n \log n}{\epsilon^2}\right)$ edges. Uses a matrix concentration bound equivalent to SDP MWUs. [Spielman, Srivastava 2008]
    • $\epsilon$-spectral-sparsifiers with $O\left(\frac{n}{\epsilon^2}\right)$ edges. Can be interpreted as a different regularizer: $F(X) = \mathrm{Tr}(X^{1/2})$. [Batson, Spielman, Srivastava 2009]
• Many more in Online Learning
    • Bandit Online Learning [AHR], …

NON-SMOOTH CONVEX OPTIMIZATION REDUCES TO ONLINE LINEAR OPTIMIZATION

Convex Optimization Setup
$\min_{x \in X} f(x)$, with $f$ convex, differentiable and $X \subseteq \mathbb{R}^n$ a closed, convex set.
NON-SMOOTH:
• $\forall x \in X$: $\|\nabla f(x)\|_* \le \rho$ ($f$ is $\rho$-Lipschitz continuous)
• NO GRADIENT STEP GUARANTEE
• ONLY DUAL GUARANTEE
SMOOTH:
• $\forall x, y \in X$: $\|\nabla f(y) - \nabla f(x)\|_* \le L \|y - x\|$ ($L$-Lipschitz continuous gradient)
• A gradient step is guaranteed to decrease the function value:
$f(x^{(t+1)}) \le f(x^{(t)}) - \frac{\|\nabla f(x^{(t)})\|_*^2}{2L}$
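The difference between the two regimes can be seen on two toy functions. The sketch below assumes the Euclidean norm, a diagonal quadratic as the smooth example, and $f(x) = |x|$ as the non-smooth one; all of these are illustrative choices.

```python
def quadratic(x, a):
    """Smooth example f(x) = 0.5 * sum_i a_i x_i^2; its gradient is L-Lipschitz with L = max_i a_i."""
    return 0.5 * sum(a_i * x_i * x_i for a_i, x_i in zip(a, x))


a = [1.0, 4.0]
L = max(a)
x = [2.0, -1.0]
gradient = [a_i * x_i for a_i, x_i in zip(a, x)]
x_next = [x_i - g_i / L for x_i, g_i in zip(x, gradient)]    # gradient step of size 1/L

decrease = quadratic(x, a) - quadratic(x_next, a)
guaranteed = sum(g * g for g in gradient) / (2.0 * L)         # ||grad||_2^2 / (2L)
print(decrease, ">=", guaranteed)

# Non-smooth example f(x) = |x|: a subgradient step can increase the function value.
y, step = 0.1, 1.0
y_next = y - step * (1.0 if y > 0 else -1.0)
print(abs(y_next), ">", abs(y))
```

For the quadratic the step decreases the value by at least the guaranteed amount; for $|x|$ the same kind of step overshoots the minimizer and makes the value worse.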
Non-Smooth Setup: Dual Approach
$\min_{x \in X} f(x)$, with $f$ convex, differentiable, $X \subseteq \mathbb{R}^n$ a closed, convex set, and $\forall x \in X$: $\|\nabla f(x)\|_* \le \rho$ ($\rho$-Lipschitz continuous).
CAN WEAKEN DIFFERENTIABILITY ASSUMPTION: SUBGRADIENTS SUFFICE
APPROACH: Each iterate $x^{(t)}$ provides a lower bound and an upper bound on the optimum $f(x^*)$:
LOWER: $f(x^*) \ge f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)})$
UPPER: $f(x^{(t)}) \ge f(x^*)$
[Figure: iterates $x^{(t)}$, $x^{(t+1)}$, $x^{(t+2)}$ with their upper and lower bounds.]
Take a convex combination of both upper bounds and lower bounds with weights $\gamma_t$:
UPPER: $\frac{1}{\sum_{t=1}^{T} \gamma_t} \left( \sum_{t=1}^{T} \gamma_t f(x^{(t)}) \right) \ge f(x^*)$
LOWER: $f(x^*) \ge \frac{1}{\sum_{t=1}^{T} \gamma_t} \left[ \sum_{t=1}^{T} \gamma_t \left( f(x^{(t)}) + \nabla f(x^{(t)})^T (x^* - x^{(t)}) \right) \right]$
HOW TO UPDATE ITERATES? HOW TO CHOOSE WEIGHTS?
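The two bounds can be computed from a handful of iterates. In this sketch the objective $f(x) = |x - 0.3|$ on $X = [0, 1]$, the iterates, and the uniform weights are all hypothetical. One step goes beyond the slide: since $x^*$ is unknown, the averaged linearization is minimized over all of $X$, which can only weaken the lower bound.

```python
def f(x):
    """Hypothetical non-smooth objective on X = [0, 1]; its minimum value is 0 at x = 0.3."""
    return abs(x - 0.3)


def subgradient(x):
    """A subgradient of f at x."""
    return 1.0 if x >= 0.3 else -1.0


iterates = [0.0, 1.0, 0.2, 0.5]
gammas = [1.0, 1.0, 1.0, 1.0]                # uniform weights
total_weight = sum(gammas)

# UPPER: weighted average of the function values at the iterates.
upper = sum(g * f(x) for g, x in zip(gammas, iterates)) / total_weight


def averaged_linearization(z):
    """Weighted average of the linear lower bounds f(x_t) + f'(x_t) * (z - x_t)."""
    return sum(
        g * (f(x) + subgradient(x) * (z - x)) for g, x in zip(gammas, iterates)
    ) / total_weight


# LOWER: the averaged linearization is linear in z, so over [0, 1] it is
# minimized at an endpoint; this lower-bounds f(x*) without knowing x*.
lower = min(averaged_linearization(0.0), averaged_linearization(1.0))
print(lower, "<= f(x*) = 0 <=", upper)
```

The difference `upper - lower` is a computable duality gap: it certifies how far the iterates are from optimal without reference to $x^*$.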
Reduction to Online Linear Minimization
Subtracting the LOWER bound from the UPPER bound gives the DUALITY GAP:
$\frac{\sum_{t=1}^{T} \gamma_t f(x^{(t)})}{\sum_{t=1}^{T} \gamma_t} - f(x^*) \le \frac{\sum_{t=1}^{T} \gamma_t \left( -\nabla f(x^{(t)})^T (x^* - x^{(t)}) \right)}{\sum_{t=1}^{T} \gamma_t}$
Each term on the right is a LINEAR FUNCTION of $x^{(t)}$.
Fix the weights $\gamma_t$ to be uniform for simplicity:
$\frac{1}{T} \left[ \sum_{t=1}^{T} f(x^{(t)}) \right] - f(x^*) \le \frac{1}{T} \cdot \sum_{t=1}^{T} -\nabla f(x^{(t)})^T (x^* - x^{(t)})$
ONLINE SETUP:
• ALGORITHM plays $x^{(t)} \in X$.
• ADVERSARY reveals the loss vector $\ell^{(t)} = \nabla f(x^{(t)})$: the loss vector is the gradient.
Recall that by assumption: $\|\ell^{(t)}\|_* = \|\nabla f(x^{(t)})\|_* \le \rho$
With this choice of loss vector, the right-hand side is exactly the average regret against $x^*$:
$\frac{1}{T} \cdot \sum_{t=1}^{T} -\nabla f(x^{(t)})^T (x^* - x^{(t)}) = \frac{1}{T} \cdot \sum_{t=1}^{T} {\ell^{(t)}}^T (x^{(t)} - x^*) = \text{REGRET}$
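The reduction can be run end to end on a small non-smooth problem over the simplex. This is a sketch under stated assumptions: the objective $f(p) = \max_j (Ap)_j$, the matrix `A`, and the entropy-regularized update with its closed form are choices made here for illustration; the loss vector fed to the online algorithm is a subgradient of $f$ at the current iterate, and the output is the average iterate.

```python
import math


def objective(p, A):
    """f(p) = max_j (A p)_j, a non-smooth convex function of p."""
    return max(sum(a * q for a, q in zip(row, p)) for row in A)


def subgradient(p, A):
    """A row of A attaining the maximum is a subgradient of f at p."""
    return max(A, key=lambda row: sum(a * q for a, q in zip(row, p)))


def minimize_over_simplex(A, T, rho):
    """Average iterate of entropy-regularized online minimization on subgradient losses."""
    n = len(A[0])
    eta = rho * math.sqrt(T / (2.0 * math.log(n)))   # sigma = 1, max F - min F = log n
    cumulative = [0.0] * n
    average = [0.0] * n
    for _ in range(T):
        shift = min(cumulative)
        scores = [math.exp(-(c - shift) / eta) for c in cumulative]
        total = sum(scores)
        p = [s / total for s in scores]              # entropy-regularized leader
        loss = subgradient(p, A)                     # loss vector = subgradient at p
        cumulative = [c + l for c, l in zip(cumulative, loss)]
        average = [m + q / T for m, q in zip(average, p)]
    return average


A = [[1.0, -1.0], [-1.0, 1.0]]               # hypothetical instance; the minimum over the simplex is 0
T, rho = 400, 1.0
p_avg = minimize_over_simplex(A, T, rho)
error = objective(p_avg, A) - 0.0
bound = rho * math.sqrt(2.0 * math.log(2) / T)
print(error, "<=", bound)
```

By convexity the error of the average iterate is at most the average regret, which is why the regret bound of the online algorithm carries over to the optimization error.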
Final Bound
ONLINE SETUP: ALGORITHM plays $x^{(t)} \in X$; ADVERSARY reveals $\ell^{(t)} = \nabla f(x^{(t)})$, with $\|\ell^{(t)}\|_* = \|\nabla f(x^{(t)})\|_* \le \rho$, so that
$\sum_{t=1}^{T} -\nabla f(x^{(t)})^T (x^* - x^{(t)}) = \text{REGRET}$
RESULTING ALGORITHM: MIRROR DESCENT
Error bound with a $\sigma$-strongly-convex regularizer $F$:
$\epsilon_{MD} \le \frac{\rho \sqrt{2 \cdot (\max_{x \in X} F(x) - \min_{x \in X} F(x))}}{\sqrt{\sigma T}}$
ASYMPTOTICALLY OPTIMAL BY INFORMATION COMPLEXITY LOWER BOUND

Non-Smooth Optimization over Simplex
RESULTING ALGORITHM: MIRROR DESCENT OVER SIMPLEX = MWU
The regularizer $F$ is negative entropy, with $\|\nabla f(x^{(t)})\|_\infty \le \rho$:
$\epsilon_{MD} \le \frac{\rho \sqrt{2 \cdot \log n}}{\sqrt{T}}$

APPLICATIONS IN ALGORITHM DESIGN

Warm-up Example: Linear Programming
LP feasibility problem: given $A \in \mathbb{R}^{m \times n}$, does there exist $x \in X$ with $b - Ax \ge 0$?
• Easy constraints ($x \in X$): maintained feasible throughout.
• Hard constraints ($b - Ax \ge 0$): require fixing.
Convert into a non-smooth optimization problem over the simplex:
$\min_{p \in \Delta_m} \max_{x \in X} p^T (b - Ax)$
Non-differentiable objective:
$f(p) = \max_{x \in X} p^T (b - Ax)$
The maximizer is the best response to the dual solution $p$.
The objective admits subgradients. For all $p$:
$x_p : p^T (b - A x_p) \ge 0$, and $(b - A x_p) \in \partial f(p)$
The subgradient is the slack in the constraints.
If we can pick $x_p$ such that $\|b - A x_p\|_\infty \le \rho$, then
$\epsilon_{MD} \le \frac{\rho \sqrt{2 \cdot \log n}}{\sqrt{T}}$, i.e. $T \le \frac{2 \cdot \rho^2 \cdot \log n}{\epsilon^2}$

MWU and s-t Maxflow
Maximum flow feasibility for value $F$ over an undirected graph $G$ with incidence matrix $B$:
$\forall e \in E$: $\frac{F \cdot |f_e|}{c_e} \le 1$
$B^T f = e_s - e_t$ (will enforce this)
Turn into a non-smooth minimization problem over the simplex:
$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} p_e \cdot \left( \frac{F \cdot |f_e|}{c_e} - 1 \right)$
The best response $f_p$ is the shortest s-t path with lengths $p_e / c_e$.
For any $p$, if $f_p$ has length $> 1$, there is no subgradient, i.e. the problem is infeasible. Otherwise, the following is a subgradient:
$\partial f(p)_e = \frac{F \cdot |(f_p)_e|}{c_e} - 1$
Unfortunately, the width can be large:
$\|\partial f(p)\|_\infty \le \frac{F}{c_{\min}}$
[PST 91]: $T = O\left( \frac{F \log n}{\epsilon^2 c_{\min}} \right)$

Width Reduction: make primal nicer
NEED PRIMAL ARGUMENT
PROBLEM: The bound above is optimal for this specific formulation: $\|\partial f(p)\|_\infty \le \frac{F}{c_{\min}}$
SOLUTION: Regularize the primal:
$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} \frac{F \cdot |f_e|}{c_e} \left( p_e + \frac{\epsilon}{m} \right) - 1$
REGULARIZATION ERROR: $\epsilon F$
NEW WIDTH: $\|\partial f(p)\|_\infty \le \frac{m}{\epsilon}$
ITERATION BOUND: $T = O\left( \frac{m \log n}{\epsilon^2} \right)$ [GK 98]

Electrical Flow Approach [CKMST]
A different formulation yields the basis for the CKMST algorithm:
$\forall e \in E$: $\frac{F \cdot f_e^2}{c_e^2} \le 1$
$B^T f = e_s - e_t$ (will enforce this)
Non-smooth optimization problem:
$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} p_e \cdot \left( \frac{F \cdot f_e^2}{c_e^2} - 1 \right)$
The best response is the electrical flow $f_p$.
Original width: $\|\partial f(p)\|_\infty \le m$
Regularize the primal:
$f(p) = \min_{B^T f = e_s - e_t} \sum_{e \in E} \frac{F \cdot f_e^2}{c_e^2} \left( p_e + \frac{\epsilon}{m} \right) - 1$
New width: $\|\partial f(p)\|_\infty \le \sqrt{\frac{m}{\epsilon}}$

Conclusion: Take-away messages
• Regularization is a powerful tool for the design of fast algorithms.
• Most iterative algorithms can be understood as regularized updates: MWUs, Width Reduction, Interior Point, Gradient descent, …
• These methods perform well in practice. Regularization also helps eliminate noise.
• ULTIMATE GOAL: Development of a library of iterative methods for fast graph algorithms. Regularization plays a fundamental role in this effort.

THE END – THANK YOU
