Support Vector Machines
Yihui Saw
Massachusetts Institute of Technology
yihui@mit.edu
April 30, 2013
Overview
1. Background/Motivation
2. The SVM Model
3. Support Vector Machines with Errors
4. Non-linear problems
5. Kernels
6. Applications
Background/Motivation
Given two classes and a new data point, we want to be able to classify the new point as belonging to one of the two classes.
Background/Motivation
Goal: Find a linear separator.
Background/Motivation
But there are many possible linear separators.
The SVM Model
Choose the separator that maximizes the gap (margin) between the support vectors of the two classes.
Formal Definition
Input: a set of samples $S$, where each sample $x_i \in \mathbb{R}^d$ is a vector of $d$ variables and $y_i \in \{+1, -1\}$ is its class.

Goal: find $\Theta, \Theta_0$ such that $y_i(\Theta \cdot x_i + \Theta_0) \geq 1$ for all $i$, while maximizing the gap $\frac{1}{|\Theta|}$.
The quadratic program
Primal:
$$\min \; \frac{1}{2} |\Theta|^2 \quad \text{subject to} \quad y_i(\Theta \cdot x_i + \Theta_0) \geq 1, \quad i = 1, \ldots, n$$
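As a concrete illustration, here is a minimal sketch that solves this primal QP on a toy dataset with cvxpy (the data and the names theta/theta0 are our own illustrative choices, not from the slides):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data in R^2: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

d = X.shape[1]
theta = cp.Variable(d)   # Θ
theta0 = cp.Variable()   # Θ_0

# min (1/2)|Θ|^2  subject to  y_i (Θ·x_i + Θ_0) >= 1
constraints = [cp.multiply(y, X @ theta + theta0) >= 1]
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(theta)), constraints)
problem.solve()

print("Θ  =", theta.value)
print("Θ0 =", float(theta0.value))
print("gap 1/|Θ| =", 1.0 / np.linalg.norm(theta.value))
```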
The dual problem
Dual:
$$\max \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{subject to} \quad \alpha_i \geq 0, \; i = 1, 2, \ldots, n$$

The solution satisfies:
(support vector) $\alpha_i > 0$: $y_i \left( \sum_{j=1}^{n} \alpha_j y_j x_j \right) \cdot x_i = 1$
(non-support vector) $\alpha_i = 0$: $y_i \left( \sum_{j=1}^{n} \alpha_j y_j x_j \right) \cdot x_i \geq 1$
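A companion sketch, under the same assumptions as before, that solves this dual with cvxpy and recovers $\Theta = \sum_i \alpha_i y_i x_i$; the quadratic term is rewritten as $\frac{1}{2} |\sum_i \alpha_i y_i x_i|^2$ to keep it in a form the solver accepts:

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

a = cp.Variable(n)  # α
# (1/2) ΣΣ α_i α_j y_i y_j (x_i · x_j) equals (1/2) |Σ α_i y_i x_i|^2
objective = cp.Maximize(cp.sum(a) - 0.5 * cp.sum_squares(X.T @ cp.multiply(a, y)))
cp.Problem(objective, [a >= 0]).solve()

alpha = a.value
theta = X.T @ (alpha * y)            # Θ = Σ α_i y_i x_i
support = np.where(alpha > 1e-6)[0]  # α_i > 0 (up to solver tolerance)
print("support vectors:", support, " Θ =", theta)
```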
Support Vector Machines with Errors
Sometimes our examples contain errors.
Solution: introduce "slack" variables $\xi_i \geq 0$ into our optimization problem.
Support Vector Machines with Errors
(primal)
$$\min \; \frac{\lambda}{2} |\Theta|^2 + \frac{1}{n} \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\Theta \cdot x_i + \Theta_0) \geq 1 - \xi_i, \;\; \xi_i \geq 0, \; i = 1, \ldots, n$$
λ is the regularization parameter. It balances how much we favor increasing the margin over satisfying the classification constraints.
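Extending the earlier primal sketch with slack variables (toy data and the value of λ are illustrative assumptions):

```python
import cvxpy as cp
import numpy as np

# Same kind of toy data plus one mislabeled point, so slack is needed.
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.5, 1.5],
              [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0])
n, d = X.shape
lam = 0.1  # regularization parameter λ (assumed value; tune in practice)

theta = cp.Variable(d)
theta0 = cp.Variable()
xi = cp.Variable(n)  # slack variables ξ_i

objective = cp.Minimize(0.5 * lam * cp.sum_squares(theta) + cp.sum(xi) / n)
constraints = [cp.multiply(y, X @ theta + theta0) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("Θ =", theta.value)
print("slacks ξ =", np.round(xi.value, 3))  # nonzero ξ_i marks margin violations
```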
Support Vector Machines with Errors
The effect of slack when examples are still linearly separable
Support Vector Machines with Errors
The effect of slack when examples are no longer linearly separable
Non-linear problems
Problems that are not linearly separable
Non-linear problems
The idea is to gain linear separability by mapping the data to a higher-dimensional space.
Non-linear problems
Recall the dual of the problem:
$$\max \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$

In the non-linear case, we replace $x_i \cdot x_j$ with $\phi(x_i) \cdot \phi(x_j)$.

So we don't need to know $\phi$ explicitly. We instead compute a kernel function $K$, where
$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$$
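A sketch of this kernelized dual with a radial basis kernel, on data that is not linearly separable (the data, the kernel bandwidth, and the eigen-factorization of K used to keep the objective solver-friendly are all our own choices):

```python
import cvxpy as cp
import numpy as np

# Toy data that is not linearly separable: inner vs. outer points.
X = np.array([[0.1, 0.0], [-0.1, 0.1], [0.0, -0.1],
              [2.0, 0.0], [-2.0, 0.5], [0.0, 2.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
n = len(y)

# RBF kernel matrix K_ij = exp(-|x_i - x_j|^2 / 2)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2)

# Factor K = L Lᵀ so the quadratic term is a plain sum of squares.
w, V = np.linalg.eigh(K)
L = V * np.sqrt(np.clip(w, 0, None))

a = cp.Variable(n)
obj = cp.Maximize(cp.sum(a) - 0.5 * cp.sum_squares(L.T @ cp.multiply(a, y)))
cp.Problem(obj, [a >= 0]).solve()

# Decision value for a new point x: Σ α_i y_i K(x_i, x)
x_new = np.array([0.05, 0.05])
k = np.exp(-((X - x_new) ** 2).sum(-1) / 2)
print("score:", float((a.value * y) @ k))  # > 0 → class +1
```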
Kernels
With Kernels, we can implicitly work with very high (or even infinite) dimensional feature vectors.
Example: the radial basis kernel
$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) = e^{-|x_i - x_j|^2 / 2}$$
corresponds to an infinite-dimensional feature map $\phi$.
Kernel function
A kernel function is valid if and only if there exists some map $\phi(x)$ such that
$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$$
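One consequence of this definition (Mercer's condition) is that a valid kernel must produce a symmetric positive semidefinite Gram matrix on any set of points. A quick numerical check for the radial basis kernel, on sample points of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # arbitrary sample points

# Gram matrix of the radial basis kernel K(x_i, x_j) = exp(-|x_i - x_j|^2 / 2)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2)

# A valid kernel's Gram matrix is symmetric positive semidefinite.
eigvals = np.linalg.eigvalsh(K)
print("min eigenvalue:", eigvals.min())  # ≥ 0 up to numerical error
```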
Kernels
Rules
1. $K(x_i, x_j) = 1$ is a kernel function.
2. Let $f : \mathbb{R}^d \to \mathbb{R}$ be any real-valued function of $x$. Then, if $K(x_i, x_j)$ is a kernel function, so is $\tilde{K}(x_i, x_j) = f(x_i) \, K(x_i, x_j) \, f(x_j)$.
3. If $K_1(x_i, x_j)$ and $K_2(x_i, x_j)$ are kernels, then so is their sum.
4. If $K_1(x_i, x_j)$ and $K_2(x_i, x_j)$ are kernels, then so is their product.
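A short sketch composing kernels with rules 2-4 and checking that each resulting Gram matrix stays positive semidefinite (the base kernels, the function f, and the points are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))

def gram(kernel, X):
    return np.array([[kernel(a, b) for b in X] for a in X])

k_lin = lambda a, b: a @ b                              # linear kernel
k_rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2)  # radial basis kernel
f = lambda x: 1.0 + np.sum(x ** 2)                      # any real-valued f

# Rules 2-4: f(x_i) K f(x_j), K1 + K2, and K1 * K2 are all kernels.
candidates = {
    "f*K*f": lambda a, b: f(a) * k_rbf(a, b) * f(b),
    "sum":   lambda a, b: k_lin(a, b) + k_rbf(a, b),
    "prod":  lambda a, b: k_lin(a, b) * k_rbf(a, b),
}
for name, k in candidates.items():
    w = np.linalg.eigvalsh(gram(k, X))
    print(name, "min eigenvalue:", round(w.min(), 10))  # ≥ 0 up to roundoff
```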
Applications
Text (and hypertext) categorization
Image classification
Bioinformatics (protein classification, cancer classification), etc.