Primal Estimated sub-GrAdient Solver for SVM
Ming TIAN 04-20-2012
References
[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML, 807-814. Extended version in Mathematical Programming, Series B, 127(1):3-30, 2011.
[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-class Pegasos on a budget. ICML.
[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further work
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further work
Review of SVM optimization
Q1: Given a training set S = {(x_i, y_i)}_{i=1}^m, how do we efficiently solve the SVM primal problem
  min_w (λ/2)‖w‖² + (1/m) Σ_{(x,y)∈S} ℓ(w; (x,y)),
where ℓ(w; (x,y)) = max{0, 1 − y⟨w, x⟩}?
(first term: regularization term; second term: empirical loss)
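To make the objective concrete, here is a minimal NumPy sketch that evaluates it (the function name and array layout are illustrative, not from the paper):

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """f(w) = (lam/2)*||w||^2 + mean hinge loss over the training set.

    X: (m, n) examples, y: (m,) labels in {-1, +1}, lam: regularization.
    """
    margins = y * (X @ w)                    # y_i * <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)   # per-example hinge losses
    return 0.5 * lam * np.dot(w, w) + hinge.mean()
```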
Review of SVM optimization
Dual-based methods:
- Interior-point methods. Memory: O(m²); time: O(m³ log log(1/ε)).
- Decomposition methods. Memory: O(m); time: super-linear in m.
Online learning & stochastic gradient:
- Memory: O(1); time: O(1/ε²) (linear kernel).
- Typically, online learning algorithms do not converge to the optimal solution of the SVM.
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further work
PEGASOS
Initialize w_1 with ‖w_1‖ ≤ 1/√λ. On iteration t = 1, …, T:
1. Choose a random subset A_t ⊆ S and set η_t = 1/(λt).
2. Subgradient step: with A_t⁺ = {(x, y) ∈ A_t : y⟨w_t, x⟩ < 1},
   w_{t+½} = (1 − η_t λ) w_t + (η_t/|A_t|) Σ_{(x,y)∈A_t⁺} y·x.
3. Projection onto the ball of radius 1/√λ:
   w_{t+1} = min{1, (1/√λ)/‖w_{t+½}‖} · w_{t+½}.
Special cases: A_t = S gives the deterministic subgradient method; |A_t| = 1 gives stochastic gradient.
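The update above translates almost line-for-line into code. A minimal NumPy sketch of the mini-batch variant (function and variable names are mine; sampling with replacement is an assumption):

```python
import numpy as np

def pegasos(X, y, lam, T, k=1, seed=0):
    """Mini-batch Pegasos for a linear SVM (a sketch).

    X: (m, n) examples, y: (m,) labels in {-1, +1},
    lam: regularization parameter, T: iterations, k: mini-batch size |A_t|.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)                          # ||w_1|| <= 1/sqrt(lam) holds trivially
    for t in range(1, T + 1):
        eta = 1.0 / (lam * t)                # step size eta_t = 1/(lambda*t)
        idx = rng.integers(0, m, size=k)     # A_t: k random examples
        Xb, yb = X[idx], y[idx]
        viol = yb * (Xb @ w) < 1.0           # A_t^+: the margin violators
        grad = lam * w - (yb[viol][:, None] * Xb[viol]).sum(axis=0) / k
        w = w - eta * grad                   # subgradient step
        norm = np.linalg.norm(w)
        w *= min(1.0, 1.0 / (np.sqrt(lam) * norm + 1e-12))  # project onto ball
    return w
```

With k = 1 this is the stochastic-gradient variant analyzed next; larger k moves it toward the full subgradient method.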
Run-Time of Pegasos
Choosing |A_t| = 1 and a linear kernel over R^n, the run-time required for Pegasos to find an ε-accurate solution with probability ≥ 1 − δ is Õ(n/(λε)).
Run-time does not depend on the number of examples.
It depends only on the "difficulty" of the problem (λ and ε).
Formal Properties
Definition: w is ε-accurate if f(w) ≤ min_{w'} f(w') + ε.
Theorem 1: Pegasos finds an ε-accurate solution w.p. ≥ 1 − δ after at most Õ(1/(δλε)) iterations.
Theorem 2: Pegasos finds log(1/δ) candidate solutions s.t., w.p. ≥ 1 − δ, at least one of them is ε-accurate after Õ(log(1/δ)/(λε)) iterations.
Proof Sketch
A second look at the update step: with η_t = 1/(λt), each Pegasos iteration is exactly an online subgradient step on the instantaneous objective f(w; A_t) = (λ/2)‖w‖² + (1/|A_t|) Σ_{(x,y)∈A_t} ℓ(w; (x,y)), followed by the projection.
Proof Sketch
Denote f_t(w) = f(w; A_t). Each f_t is λ-strongly convex, so the logarithmic-regret bound for online convex programming (OCP) gives
  (1/T) Σ_t f_t(w_t) − min_w (1/T) Σ_t f_t(w) ≤ O(log T / (λT)).
Take expectation over the random draws of A_t: for an iterate w_r chosen at random from w_1, …, w_T,
  E[f(w_r)] − f(w*) ≤ O(log T / (λT)).
Since f(w_r) − f(w*) ≥ 0, Markov's inequality gives that, w.p. ≥ 1 − δ,
  f(w_r) − f(w*) ≤ O(log T / (δλT)).
Amplify the confidence by running log(1/δ) independent copies.
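Spelling out the last two steps in display form (a standard argument; the constants are illustrative):

```latex
% Markov's inequality applied to the nonnegative variable
% Delta := f(w_r) - f(w^*):
\Pr\left[\Delta \geq \tfrac{1}{\delta}\,\mathbb{E}[\Delta]\right] \leq \delta
\quad\Longrightarrow\quad
\Pr\left[\Delta \leq O\!\left(\tfrac{\log T}{\delta\lambda T}\right)\right] \geq 1 - \delta.
% Amplification: run k = ceil(log2(1/delta)) independent copies, each
% epsilon-accurate w.p. >= 1/2; all of them fail w.p. <= 2^{-k} <= delta.
```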
Experiments
3 datasets (provided by Joachims)
Reuters CCAT (800K examples, 47k features)
Physics ArXiv (62k examples, 100k features)
Covertype (581k examples, 54 features)
4 competing algorithms
SVM-light (Joachims)
SVMPerf (Joachims’06)
Norma (Kivinen, Smola, Williamson ’02)
Zhang’04 (stochastic gradient descent)
Training Time (in seconds)

             Reuters   Covertype   Astro-Physics
Pegasos            2           6               2
SVM-Perf          77          85               5
SVM-Light     20,075      25,514              80
Compare to Norma (on Physics)
[figures: objective value and test error]
Compare to Zhang (on Physics)
But tuning the step-size parameter is more expensive than the learning itself …
Effect of k = |A_t| when T is fixed
Effect of k = |A_t| when kT is fixed
Bias term
Popular approach: increase the dimension of x by appending a constant feature, folding b into w (a sketch follows this list).
  Cons: we "pay" for b in the regularization term.
Calculate subgradients w.r.t. w and w.r.t. b.
  Cons: the convergence rate degrades to 1/ε².
Define the loss on A_t with b chosen optimally for the mini-batch.
  Cons: |A_t| needs to be large.
Search for b in an outer loop.
  Cons: each evaluation of the objective costs 1/ε².
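A sketch of the popular first option, which appends a constant feature so that b is folded into w (helper name and the constant c are mine):

```python
import numpy as np

def add_bias_feature(X, c=1.0):
    """Append a constant feature c to every example; the extra weight
    coordinate then plays the role of b, but is also regularized."""
    return np.hstack([X, np.full((X.shape[0], 1), c)])
```

After training on add_bias_feature(X), the learned vector splits as w = w_aug[:-1] and b = c * w_aug[-1].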
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further work
multi-class SVM (Crammer & Singer, 2001)
Multi-class model: with one weight vector w_i per class, collected as W = (w_1, …, w_k), predict
  ŷ(x) = argmax_{i∈{1,…,k}} ⟨w_i, x⟩.
multi-class SVM (Crammer & Singer, 2001)
Multi-class SVM objective function:
  min_W f(W) = (λ/2)‖W‖² + (1/m) Σ_{(x,y)∈S} ℓ(W; (x,y)),
where ‖W‖² = Σ_{i=1}^k ‖w_i‖², and the multi-class hinge-loss function is defined as
  ℓ(W; (x,y)) = max{0, 1 + max_{r≠y} ⟨w_r, x⟩ − ⟨w_y, x⟩},
where y is the correct class of x.
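A small NumPy sketch of this loss for a single example (names are illustrative):

```python
import numpy as np

def multiclass_hinge(W, x, y):
    """Crammer-Singer multi-class hinge loss for one example.

    W: (k, n) matrix with one weight vector per class,
    x: (n,) feature vector, y: index of the correct class.
    """
    scores = W @ x                   # <w_i, x> for every class i
    wrong = np.delete(scores, y)     # scores of all incorrect classes
    return max(0.0, 1.0 + wrong.max() - scores[y])
```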
multi-class Pegasos
Use the instantaneous objective function:
  f(W; (x_t, y_t)) = (λ/2)‖W‖² + ℓ(W; (x_t, y_t)).
Multi-class Pegasos works by iteratively executing a two-step update.
Step 1: subgradient step W_{t+½} = W_t − η_t ∇_t,
where η_t = 1/(λt) and ∇_t is a subgradient of f(·; (x_t, y_t)) at W_t.
multi-class Pegasos
If the loss is equal to zero, then:
  w_{i,t+½} = (1 − η_t λ) w_{i,t} for every class i.
Else, with r = argmax_{r'≠y_t} ⟨w_{r',t}, x_t⟩ the most violating class:
  w_{y_t,t+½} = (1 − η_t λ) w_{y_t,t} + η_t x_t,
  w_{r,t+½} = (1 − η_t λ) w_{r,t} − η_t x_t,
  and all other class vectors are only shrunk by the factor (1 − η_t λ).
Step 2: project the weights W_{t+½} onto the closed convex set B = {W : ‖W‖ ≤ 1/√λ}:
  W_{t+1} = min{1, (1/√λ)/‖W_{t+½}‖} · W_{t+½}.
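A minimal sketch of one such two-step update in NumPy (linear version; the function name and tie handling are mine):

```python
import numpy as np

def mc_pegasos_step(W, x, y, lam, t):
    """One multi-class Pegasos update on example (x, y) at step t.

    W: (k, n) class weight matrix, lam: regularization parameter.
    """
    eta = 1.0 / (lam * t)
    scores = W @ x
    wrong = scores.copy()
    wrong[y] = -np.inf
    r = int(np.argmax(wrong))               # most violating wrong class
    W = (1.0 - eta * lam) * W               # shrink every class vector
    if 1.0 + scores[r] - scores[y] > 0.0:   # nonzero multi-class hinge loss
        W[y] += eta * x                     # pull correct class toward x
        W[r] -= eta * x                     # push violator away from x
    norm = np.linalg.norm(W)                # project onto ||W|| <= 1/sqrt(lam)
    W *= min(1.0, 1.0 / (np.sqrt(lam) * norm + 1e-12))
    return W
```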
Budgeted Multi-Class Pegasos
Budget Maintenance Strategies
Budget maintenance through removal:
  under the analysis in [2], the optimal removal always selects the oldest SV (a sketch follows this list).
Budget maintenance through projection:
  project the discarded SV onto all the remaining SVs, which results in smaller weight degradation.
Budget maintenance through merging:
  merge two SVs into a newly created one; the total cost of finding the optimal merging of the n-th and m-th SVs is O(1).
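A minimal sketch of the removal strategy for a budgeted set of support vectors (the SV record layout and class name are mine, not from [2]):

```python
from collections import deque

class BudgetedSVSet:
    """Keep at most `budget` support vectors; on overflow drop the oldest,
    which is the optimal removal under the analysis in [2]."""

    def __init__(self, budget):
        self.budget = budget
        self.svs = deque()                 # (x, alpha) pairs, oldest first

    def add(self, x, alpha):
        self.svs.append((x, alpha))
        if len(self.svs) > self.budget:
            self.svs.popleft()             # budget maintenance by removal
```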
Experiments
Outline
Review of SVM optimization
The Pegasos algorithm
Multi-Class Pegasos on a Budget
Further work
Distribution-aware Pegasos?
Online structurally regularized SVM?