
PEGASOS

Primal Estimated sub-GrAdient SOlver for SVM

Ming TIAN 04-20-2012


References

[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal estimated sub-gradient solver for SVM. ICML, 807-814. Journal version: Mathematical Programming, Series B, 127(1):3-30, 2011.

[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-class Pegasos on a budget. ICML.

[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.

[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.

Outline

Review of SVM optimization

The Pegasos algorithm

Multi-Class Pegasos on a Budget

Further works



Review of SVM optimization

Q1: the SVM primal problem

$$\min_{\mathbf{w}} \;\; \underbrace{\frac{\lambda}{2}\|\mathbf{w}\|^2}_{\text{regularization term}} \;+\; \underbrace{\frac{1}{m}\sum_{(\mathbf{x},y)\in S} \ell(\mathbf{w};(\mathbf{x},y))}_{\text{empirical loss}}$$

where $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^m$ and $\ell(\mathbf{w};(\mathbf{x},y)) = \max\{0,\; 1 - y\langle \mathbf{w},\mathbf{x}\rangle\}$ is the hinge loss.
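A minimal NumPy sketch of this objective; the names `svm_objective` and `lam` (for $\lambda$) are illustrative, not from the paper:

    import numpy as np

    def svm_objective(w, X, y, lam):
        """(lam/2)*||w||^2 + average hinge loss over the sample (X, y)."""
        hinge = np.maximum(0.0, 1.0 - y * (X @ w))  # per-example max{0, 1 - y<w,x>}
        return 0.5 * lam * np.dot(w, w) + hinge.mean()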


Review of SVM optimization

• Dual-based methods
  • Interior Point methods
    Memory: $m^2$, time: $m^3 \log(\log(1/\epsilon))$
  • Decomposition methods
    Memory: $m$, time: super-linear in $m$
• Online learning & stochastic gradient
  Memory: $O(1)$, time: $1/\epsilon^2$ (linear kernel)
  Typically, online learning algorithms do not converge to the optimal solution of SVM

Outline

Review of SVM optimization

The Pegasos algorithm

Multi-Class Pegasos on a Budget

Further works


PEGASOS

Initialize $\mathbf{w}_1 = \mathbf{0}$; for $t = 1, \dots, T$:

• Choose $A_t \subseteq S$: taking $A_t = S$ recovers the subgradient method; $|A_t| = 1$ gives stochastic gradient.
• Subgradient step: with $\eta_t = \frac{1}{\lambda t}$ and $A_t^+ = \{(\mathbf{x},y) \in A_t : y\langle \mathbf{w}_t, \mathbf{x}\rangle < 1\}$,
$$\mathbf{w}_{t+\frac{1}{2}} = (1 - \eta_t \lambda)\,\mathbf{w}_t + \frac{\eta_t}{|A_t|} \sum_{(\mathbf{x},y)\in A_t^+} y\,\mathbf{x}$$
• Projection onto the ball $B = \{\mathbf{w} : \|\mathbf{w}\| \le 1/\sqrt{\lambda}\}$:
$$\mathbf{w}_{t+1} = \min\left\{1,\; \frac{1/\sqrt{\lambda}}{\|\mathbf{w}_{t+\frac{1}{2}}\|}\right\} \mathbf{w}_{t+\frac{1}{2}}$$
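A minimal NumPy sketch of the $|A_t| = 1$ variant; `pegasos` and `lam` are illustrative names, and this is a reading of the update above, not the authors' code:

    import numpy as np

    def pegasos(X, y, lam, T, seed=0):
        """Pegasos with |A_t| = 1: stochastic subgradient step + projection."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        w = np.zeros(n)
        for t in range(1, T + 1):
            i = rng.integers(m)                          # draw one example uniformly
            eta = 1.0 / (lam * t)                        # step size 1/(lambda t)
            margin_violated = y[i] * X[i].dot(w) < 1.0   # check the margin at w_t
            w = (1.0 - eta * lam) * w                    # shrink: gradient of regularizer
            if margin_violated:                          # hinge subgradient contributes
                w += eta * y[i] * X[i]
            radius = 1.0 / np.sqrt(lam)                  # project onto ||w|| <= 1/sqrt(lam)
            norm = np.linalg.norm(w)
            if norm > radius:
                w *= radius / norm
        return w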

Run-Time of Pegasos

Choosing $|A_t| = 1$ and a linear kernel over $\mathbb{R}^n$, the run-time required for Pegasos to find an $\epsilon$-accurate solution with probability $1-\delta$ is

$$\tilde{O}\!\left(\frac{n}{\lambda\,\epsilon}\right)$$

• Run-time does not depend on the number of examples.
• It depends only on the "difficulty" of the problem ($\lambda$ and $\epsilon$).

Formal Properties

• Definition: $\mathbf{w}$ is $\epsilon$-accurate if $f(\mathbf{w}) \le \min_{\mathbf{w}'} f(\mathbf{w}') + \epsilon$.

• Theorem 1: Pegasos finds an $\epsilon$-accurate solution w.p. $\ge 1-\delta$ after at most $\tilde{O}\!\left(\frac{1}{\delta\,\lambda\,\epsilon}\right)$ iterations.

• Theorem 2: Pegasos finds $\log(1/\delta)$ solutions such that, w.p. $\ge 1-\delta$, at least one of them is $\epsilon$-accurate after $\tilde{O}\!\left(\frac{\log(1/\delta)}{\lambda\,\epsilon}\right)$ iterations.

Proof Sketch

A second look at the update step: with $\eta_t = \frac{1}{\lambda t}$, the two steps are exactly one projected subgradient step on the instantaneous objective $f(\mathbf{w}; A_t) = \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{|A_t|}\sum_{(\mathbf{x},y)\in A_t} \ell(\mathbf{w};(\mathbf{x},y))$:

$$\mathbf{w}_{t+1} = \Pi_B\!\left(\mathbf{w}_t - \eta_t \nabla_t\right), \qquad \nabla_t = \lambda\,\mathbf{w}_t - \frac{1}{|A_t|}\sum_{(\mathbf{x},y)\in A_t^+} y\,\mathbf{x} \;\in\; \partial f(\mathbf{w}_t; A_t)$$
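For intuition, a step the deck leaves implicit: unrolling this recursion with $\mathbf{w}_1 = \mathbf{0}$ (and ignoring the projection) shows that each iterate is a weighted sum of the margin-violating examples seen so far, which is what makes a kernelized variant possible:

$$\mathbf{w}_{t+1} = \frac{1}{\lambda t} \sum_{i=1}^{t} \frac{1}{|A_i|} \sum_{(\mathbf{x},y)\in A_i^+} y\,\mathbf{x}$$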

Proof Sketch

• Denote by $\mathbf{w}^* = \arg\min_{\mathbf{w}\in B} f(\mathbf{w})$ the optimum and by $\mathbf{w}_r$ the solution Pegasos outputs.
• Logarithmic regret for OCP (online convex programming): each $f(\cdot\,; A_t)$ is $\lambda$-strongly convex, so
$$\frac{1}{T}\sum_{t=1}^T f(\mathbf{w}_t; A_t) - \min_{\mathbf{w}\in B}\frac{1}{T}\sum_{t=1}^T f(\mathbf{w}; A_t) \le O\!\left(\frac{\log T}{\lambda\,T}\right)$$
• Take expectation over the random choices of $A_t$: $\mathbb{E}\left[f(\mathbf{w}_r)\right] - f(\mathbf{w}^*) \le O\!\left(\frac{\log T}{\lambda\,T}\right)$.
• Since $f(\mathbf{w}_r) - f(\mathbf{w}^*) \ge 0$, Markov's inequality gives that w.p. $\ge 1-\delta$, $f(\mathbf{w}_r) - f(\mathbf{w}^*) \le O\!\left(\frac{\log T}{\delta\,\lambda\,T}\right)$.
• Amplify the confidence: run $k = \lceil \log_2(1/\delta) \rceil$ independent copies, each at confidence $1/2$; all of them fail with probability at most $2^{-k} \le \delta$, so keeping the best solution succeeds w.p. $\ge 1-\delta$.


Experiments

• 3 datasets (provided by Joachims)
  • Reuters CCAT (800k examples, 47k features)
  • Physics ArXiv (62k examples, 100k features)
  • Covertype (581k examples, 54 features)
• 4 competing algorithms
  • SVM-Light (Joachims)
  • SVM-Perf (Joachims '06)
  • Norma (Kivinen, Smola, Williamson '02)
  • Zhang '04 (stochastic gradient descent)

Training Time (in seconds)

             Reuters   Covertype   Astro-Physics
Pegasos            2           6               2
SVM-Perf          77          85               5
SVM-Light     20,075      25,514              80

Compare to Norma (on Physics)

[plots: objective value and test error]

Compare to Zhang (on Physics)

But tuning the learning-rate parameter is more expensive than the learning itself…

Effect of $k = |A_t|$ when $T$ is fixed

Effect of $k = |A_t|$ when $kT$ is fixed
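For reference, one mini-batch step with $k = |A_t|$, as a minimal NumPy sketch (`pegasos_minibatch_step` is a hypothetical helper, not code from the paper):

    import numpy as np

    def pegasos_minibatch_step(w, X, y, lam, t, k, rng):
        """One Pegasos step on a random mini-batch A_t of size k."""
        idx = rng.choice(len(y), size=k, replace=False)   # draw A_t
        Xb, yb = X[idx], y[idx]
        eta = 1.0 / (lam * t)
        viol = yb * (Xb @ w) < 1.0                        # A_t^+: margin violators
        grad = lam * w - (yb[viol][:, None] * Xb[viol]).sum(axis=0) / k
        w = w - eta * grad
        radius = 1.0 / np.sqrt(lam)                       # projection onto ||w|| <= radius
        norm = np.linalg.norm(w)
        return w * min(1.0, radius / max(norm, 1e-12))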

Bias term

• Popular approach: increase the dimension of $\mathbf{x}$ (append a constant feature, so $b$ becomes one more coordinate of $\mathbf{w}$)
  Cons: we "pay" for $b$ in the regularization term
• Calculate subgradients w.r.t. $\mathbf{w}$ and w.r.t. $b$
  Cons: convergence rate becomes $1/\epsilon^2$
• Define the loss on $A_t$ jointly over $(\mathbf{w}, b)$
  Cons: $|A_t|$ needs to be large
• Search for $b$ in an outer loop
  Cons: evaluating the objective costs $1/\epsilon^2$
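A sketch of the first option, under the stated assumption that the bias simply becomes an extra (regularized) weight; `add_bias_feature` is an illustrative name:

    import numpy as np

    def add_bias_feature(X, c=1.0):
        """Append a constant feature so the last coordinate of w acts as b.

        Note: b is now shrunk by the regularizer like any other weight,
        which is the "pay for b in the regularization term" drawback above.
        """
        return np.hstack([X, np.full((X.shape[0], 1), c)])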

Outline

Review of SVM optimization

The Pegasos algorithm

Multi-Class Pegasos on a Budget

Further works


Multi-class SVM (Crammer & Singer, 2001)

Multi-class model: one weight vector $\mathbf{w}^{(i)}$ per class $i \in \mathcal{Y} = \{1, \dots, c\}$; predict

$$\hat{y} = \arg\max_{i \in \mathcal{Y}} \langle \mathbf{w}^{(i)}, \mathbf{x} \rangle$$

Multi-class SVM (Crammer & Singer, 2001)

Multi-class SVM objective function:

$$\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2 + \frac{1}{m}\sum_{t=1}^{m} \ell(\mathbf{w}; (\mathbf{x}_t, y_t))$$

where $\mathbf{w} = (\mathbf{w}^{(1)}, \dots, \mathbf{w}^{(c)})$ is the concatenation of the per-class weight vectors, and the multi-class hinge-loss function is defined as

$$\ell(\mathbf{w}; (\mathbf{x}, y)) = \max\left\{0,\; 1 + \max_{i \ne y} \langle \mathbf{w}^{(i)}, \mathbf{x} \rangle - \langle \mathbf{w}^{(y)}, \mathbf{x} \rangle\right\}$$
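This loss in a few lines of NumPy, as a sketch (`multiclass_hinge_loss` is an illustrative name; `W` stacks the $\mathbf{w}^{(i)}$ as rows):

    import numpy as np

    def multiclass_hinge_loss(W, x, y):
        """max{0, 1 + max_{i != y} <w^(i), x> - <w^(y), x>} for W of shape (c, n)."""
        scores = W @ x
        runner_up = np.max(np.delete(scores, y))  # best score among classes i != y
        return max(0.0, 1.0 + runner_up - scores[y])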

Multi-class Pegasos

Use the instantaneous objective function:

$$f(\mathbf{w}; (\mathbf{x}_t, y_t)) = \frac{\lambda}{2}\|\mathbf{w}\|^2 + \ell(\mathbf{w}; (\mathbf{x}_t, y_t))$$

Multi-class Pegasos works by iteratively executing two-step updates.

Step 1: $\mathbf{w}_{t+1} \leftarrow \mathbf{w}_t - \eta_t \nabla_t$, where $\eta_t = \frac{1}{\lambda t}$, $\nabla_t \in \partial f(\mathbf{w}_t; (\mathbf{x}_t, y_t))$, and the most violating class is $r_t = \arg\max_{i \ne y_t} \langle \mathbf{w}_t^{(i)}, \mathbf{x}_t \rangle$.

Multi-class Pegasos

If the loss is equal to zero, then:

$$\mathbf{w}_{t+1}^{(i)} = (1 - \eta_t \lambda)\,\mathbf{w}_t^{(i)} \quad \text{for all } i$$

Else:

$$\mathbf{w}_{t+1}^{(y_t)} = (1 - \eta_t \lambda)\,\mathbf{w}_t^{(y_t)} + \eta_t \mathbf{x}_t, \qquad \mathbf{w}_{t+1}^{(r_t)} = (1 - \eta_t \lambda)\,\mathbf{w}_t^{(r_t)} - \eta_t \mathbf{x}_t$$

and all remaining $\mathbf{w}^{(i)}$ are only scaled by $(1 - \eta_t \lambda)$.

Step 2: project the weight $\mathbf{w}_{t+1}$ onto the closed convex set $B = \{\mathbf{w} : \|\mathbf{w}\| \le 1/\sqrt{\lambda}\}$:

$$\mathbf{w}_{t+1} \leftarrow \min\left\{1,\; \frac{1/\sqrt{\lambda}}{\|\mathbf{w}_{t+1}\|}\right\}\mathbf{w}_{t+1}$$
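Both steps in one function, as a minimal NumPy sketch (`multiclass_pegasos_step` is an illustrative name, not the authors' code):

    import numpy as np

    def multiclass_pegasos_step(W, x, y, lam, t):
        """One multi-class Pegasos update; W has shape (num_classes, dim)."""
        eta = 1.0 / (lam * t)
        scores = W @ x                            # scores under w_t
        others = scores.copy()
        others[y] = -np.inf
        r = int(np.argmax(others))                # most violating class r_t
        loss = max(0.0, 1.0 + scores[r] - scores[y])
        W = (1.0 - eta * lam) * W                 # scale every class vector
        if loss > 0.0:                            # move w^(y) toward x, w^(r) away
            W[y] += eta * x
            W[r] -= eta * x
        radius = 1.0 / np.sqrt(lam)               # Step 2: projection
        norm = np.linalg.norm(W)
        return W * min(1.0, radius / max(norm, 1e-12))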

Budgeted Multi-Class Pegasos

Kernelized multi-class Pegasos kept under a budget: the model stores at most $B$ support vectors, and whenever an update would exceed the budget, a budget maintenance step reduces the number of SVs by one [2].

Budget Maintenance Strategies

• Budget maintenance through removal
  • The optimal removal always selects the oldest SV: every update shrinks all coefficients by $(1 - \eta_t \lambda)$, so the oldest SV carries the smallest weight (see the sketch after this list).
• Budget maintenance through projection
  • Project the discarded SV onto all the remaining SVs, which results in smaller weight degradation than plain removal.
• Budget maintenance through merging
  • Merge two SVs into a newly created one.
  • The total cost of finding the optimal merging of the m-th and n-th SV is O(1).
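A minimal sketch of the removal strategy, assuming the kernelized model is stored as a time-ordered list of (coefficient, example, label) triples; `budget_maintain_removal` is a hypothetical helper:

    def budget_maintain_removal(support_vectors, budget):
        """Keep at most `budget` SVs by dropping the oldest one.

        `support_vectors` is ordered by insertion time; since every Pegasos
        step multiplies all coefficients by (1 - eta*lam), the oldest SV
        carries the smallest weight and is the cheapest to drop.
        """
        while len(support_vectors) > budget:
            support_vectors.pop(0)  # oldest SV sits at the front
        return support_vectors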

Experiments


Outline

Review of SVM optimization

The Pegasos algorithm

Multi-Class Pegasos on a Budget

Further works


Further works

• Distribution-aware Pegasos?
• Online structural regularized SVM?

Thanks! Q&A
