Introduction to Machine Learning
CSE474/574: Maximum Margin Methods
Varun Chandola <chandola@buffalo.edu>

Contents
1 Maximum Margin Classifiers
  1.1 Linear Classification via Hyperplanes
  1.2 Concept of Margin

1 Maximum Margin Classifiers

1.1 Linear Classification via Hyperplanes

[Figure: a separating hyperplane w^T x + b = 0, with the region w^T x + b > 0 on the side that w points towards, the region w^T x + b < 0 on the other side, and the hyperplane at distance b/‖w‖ from the origin.]

y = w^T x + b

• Separates a D-dimensional space into two half-spaces
• Defined by w ∈ R^D
  – Orthogonal to the hyperplane
  – This w goes through the origin
• Remember the Perceptron!
  – How do you check if a point lies “above” or “below” w?
  – What happens for points on w?
• Add a bias b
  – b > 0 - move along w
  – b < 0 - move opposite to w
• Decision boundary represented by the hyperplane w^T x + b = 0
• For binary classification, w points towards the positive class

For a hyperplane that passes through the origin, a point x will lie above the hyperplane if w^T x > 0 and below it if w^T x < 0. This can be seen by noting that w^T x is equal to ‖w‖ ‖x‖ cos θ, where θ is the angle between w and x.

[Figure: the line w^T x = −b in the (x1, x2) plane, with +1 points on one side and −1 points on the other; the unit normal is n = w/‖w‖ and the line's offset from the origin along w is −b/‖w‖.]

• How to check if a point lies above or below w?
  – If w^T x + b > 0 then x is above
  – Else, below

Decision Rule

y = sign(w^T x + b)

• w^T x + b > 0 ⇒ y = +1
• w^T x + b < 0 ⇒ y = −1

(A short numerical sketch of this decision rule is given at the end of this subsection.)

• Perceptron can find a hyperplane that separates the data
  – . . . if the data is linearly separable
• If data is linearly separable
  – Perceptron training guarantees learning the decision boundary
• There can be other boundaries
  – Depends on initial value for w
• But what is the best boundary?
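
Before moving on to the notion of margin, here is a minimal sketch of the decision rule above. It is not part of the original notes: it assumes NumPy, the values of w, b, and the test points are arbitrary made-up choices, and the helper name predict is hypothetical.

```python
import numpy as np

# Hypothetical hyperplane: w is the normal vector, b is the bias (made-up values).
w = np.array([2.0, 1.0])
b = -1.0

def predict(x, w, b):
    """Decision rule from the notes: y = sign(w^T x + b)."""
    score = w @ x + b
    # A point with score exactly 0 lies on the hyperplane itself; this sketch
    # arbitrarily assigns such points to the negative class.
    return 1 if score > 0 else -1

for x in [np.array([1.0, 1.0]), np.array([0.0, 0.0])]:
    print(x, w @ x + b, predict(x, w, b))
# [1. 1.]  ->  score  2.0  ->  +1  (above the hyperplane)
# [0. 0.]  ->  score -1.0  ->  -1  (below the hyperplane)
```
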
1.2 Concept of Margin

[Figure: two classes of points separated by the hyperplane w^T x + b = 0, with the parallel lines w^T x + b = +1 and w^T x + b = −1 passing through the closest points on either side; the gap between the two lines is 2/‖w‖ and the hyperplane's offset from the origin is b/‖w‖.]

• But there can be many choices!
• Find the one with best separability (largest margin)
• Gives better generalization performance
  1. Intuitive reason
  2. Theoretical foundations

• Margin is the distance between an example and the decision line
• Denoted by γ
• For a positive point:
  γ = (w^T x + b)/‖w‖
• For a negative point:
  γ = −(w^T x + b)/‖w‖
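
As a concrete example of these two cases, the sketch below evaluates γ for one point from each class. It is only an illustration: it assumes NumPy, and the values of w, b, and the points are made up rather than taken from the notes.

```python
import numpy as np

w = np.array([2.0, 1.0])   # made-up normal vector
b = -1.0                   # made-up bias

def margin_pos(x):
    """Margin of a positive-class point: (w^T x + b) / ||w||."""
    return (w @ x + b) / np.linalg.norm(w)

def margin_neg(x):
    """Margin of a negative-class point: -(w^T x + b) / ||w||."""
    return -(w @ x + b) / np.linalg.norm(w)

x_pos = np.array([2.0, 1.0])    # w^T x + b =  4 > 0: lies on the positive side
x_neg = np.array([-1.0, 0.0])   # w^T x + b = -3 < 0: lies on the negative side

print(margin_pos(x_pos))   # 4/sqrt(5): positive, i.e. correctly classified
print(margin_neg(x_neg))   # 3/sqrt(5): positive, i.e. correctly classified
```

If a point fell on the wrong side of the boundary, the corresponding value would come out negative, which matches the functional interpretation discussed below.
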
To understand the margin from a geometric perspective, consider the projection of a point x onto the decision boundary, i.e., the foot of the perpendicular from x to the hyperplane. Let this projected point be denoted x0. The vector r connecting x0 and x is given by:

r = γ w/‖w‖

if x lies on the positive side of w. But the same vector can also be computed as:

r = x − x0

Equating the above two expressions gives x0 as:

x0 = x − γ w/‖w‖

Since x0 lies on the hyperplane, it satisfies:

w^T x0 + b = 0

Substituting x0 from above:

w^T (x − γ w/‖w‖) + b = 0

Noting that w^T w/‖w‖ = ‖w‖, we can solve for γ:

γ = (w^T x + b)/‖w‖    (1)

A similar analysis can be done for a point on the negative side of the hyperplane. In general, one can write the expression for the margin as:

γ = y (w^T x + b)/‖w‖    (2)

where y ∈ {−1, +1}.

Functional Interpretation
• Margin is positive if the prediction is correct; negative if the prediction is incorrect

From the figure one can note that the size of the margin is 2/‖w‖. We can show this as follows. Since the data is separable, we can rescale w and b so that the closest points from the two classes lie on the parallel lines w^T x + b = +1 and w^T x + b = −1. Using the result from (1) and (2), a point on either of these lines has margin 1/‖w‖, so the distance between the two lines is 2γ = 2/‖w‖.
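
The derivation and the 2/‖w‖ margin width can be checked numerically. The following sketch is again only illustrative (NumPy is assumed, and w, b, and the point x are arbitrary made-up values): it recovers the foot of the perpendicular x0 = x − γ w/‖w‖, verifies that x0 lies on the hyperplane, and verifies that the two lines w^T x + b = ±1 are 2/‖w‖ apart.

```python
import numpy as np

w = np.array([2.0, 1.0])         # made-up normal vector
b = -1.0                         # made-up bias
norm_w = np.linalg.norm(w)

# Geometric margin of a point on the positive side, as in Equation (1).
x = np.array([2.0, 1.0])
gamma = (w @ x + b) / norm_w

# Foot of the perpendicular from x onto the hyperplane: x0 = x - gamma * w/||w||.
x0 = x - gamma * w / norm_w
print(w @ x0 + b)                # ~0.0, so x0 indeed satisfies w^T x0 + b = 0

# Points on the parallel lines w^T x + b = +1 and w^T x + b = -1, reached
# from x0 by moving +/- 1/||w|| along the unit normal w/||w||.
x_plus = x0 + (w / norm_w) / norm_w
x_minus = x0 - (w / norm_w) / norm_w
print(w @ x_plus + b, w @ x_minus + b)   # +1.0 and -1.0
print(np.linalg.norm(x_plus - x_minus))  # equals 2/||w||
print(2 / norm_w)
```
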