Introduction to Machine Learning
CSE474/574: Maximum Margin Methods
Varun Chandola <chandola@buffalo.edu>

Contents
1 Maximum Margin Classifiers
  1.1 Linear Classification via Hyperplanes
  1.2 Concept of Margin

1 Maximum Margin Classifiers

1.1 Linear Classification via Hyperplanes

[Figure: a separating hyperplane w^T x + b = 0, with the region w^T x + b > 0 on the side that w points towards, the region w^T x + b < 0 on the other side, and the hyperplane at distance b/‖w‖ from the origin.]

y = w^T x + b

• Separates a D-dimensional space into two half-spaces
• Defined by w ∈ R^D
  – Orthogonal to the hyperplane
  – This w goes through the origin
• Remember the Perceptron!
  – How do you check if a point lies “above” or “below” w?
  – What happens for points on w?
• Add a bias b
  – b > 0 - move along w
  – b < 0 - move opposite to w
• Decision boundary represented by the hyperplane w^T x + b = 0
• For binary classification, w points towards the positive class

For a hyperplane that passes through the origin, a point x will lie above the hyperplane if w^T x > 0 and below it if w^T x < 0. This can be seen by noting that w^T x is equal to ‖w‖ ‖x‖ cos θ, where θ is the angle between w and x.

[Figure: the line w^T x = −b in the (x1, x2) plane, with +1 points on one side and −1 points on the other; the unit normal is n = w/‖w‖ and the line's offset from the origin along w is −b/‖w‖.]

• How to check if a point lies above or below w?
  – If w^T x + b > 0 then x is above
  – Else, below

Decision Rule

y = sign(w^T x + b)

• w^T x + b > 0 ⇒ y = +1
• w^T x + b < 0 ⇒ y = −1

(A short numerical sketch of this decision rule is given at the end of this subsection.)

• Perceptron can find a hyperplane that separates the data
  – . . . if the data is linearly separable
• If data is linearly separable
  – Perceptron training guarantees learning the decision boundary
• There can be other boundaries
  – Depends on initial value for w
• But what is the best boundary?
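
Before moving on to the notion of margin, here is a minimal sketch of the decision rule above. It is not part of the original notes: it assumes NumPy, the values of w, b, and the test points are arbitrary made-up choices, and the helper name predict is hypothetical.

```python
import numpy as np

# Hypothetical hyperplane: w is the normal vector, b is the bias (made-up values).
w = np.array([2.0, 1.0])
b = -1.0

def predict(x, w, b):
    """Decision rule from the notes: y = sign(w^T x + b)."""
    score = w @ x + b
    # A point with score exactly 0 lies on the hyperplane itself; this sketch
    # arbitrarily assigns such points to the negative class.
    return 1 if score > 0 else -1

for x in [np.array([1.0, 1.0]), np.array([0.0, 0.0])]:
    print(x, w @ x + b, predict(x, w, b))
# [1. 1.]  ->  score  2.0  ->  +1  (above the hyperplane)
# [0. 0.]  ->  score -1.0  ->  -1  (below the hyperplane)
```
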
1.2 Concept of Margin

[Figure: two classes of points separated by the hyperplane w^T x + b = 0, with the parallel lines w^T x + b = +1 and w^T x + b = −1 passing through the closest points on either side; the gap between the two lines is 2/‖w‖ and the hyperplane's offset from the origin is b/‖w‖.]

• But there can be many choices!
• Find the one with best separability (largest margin)
• Gives better generalization performance
  1. Intuitive reason
  2. Theoretical foundations

• Margin is the distance between an example and the decision line
• Denoted by γ
• For a positive point:
  γ = (w^T x + b)/‖w‖
• For a negative point:
  γ = −(w^T x + b)/‖w‖
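
As a concrete example of these two cases, the sketch below evaluates γ for one point from each class. It is only an illustration: it assumes NumPy, and the values of w, b, and the points are made up rather than taken from the notes.

```python
import numpy as np

w = np.array([2.0, 1.0])   # made-up normal vector
b = -1.0                   # made-up bias

def margin_pos(x):
    """Margin of a positive-class point: (w^T x + b) / ||w||."""
    return (w @ x + b) / np.linalg.norm(w)

def margin_neg(x):
    """Margin of a negative-class point: -(w^T x + b) / ||w||."""
    return -(w @ x + b) / np.linalg.norm(w)

x_pos = np.array([2.0, 1.0])    # w^T x + b =  4 > 0: lies on the positive side
x_neg = np.array([-1.0, 0.0])   # w^T x + b = -3 < 0: lies on the negative side

print(margin_pos(x_pos))   # 4/sqrt(5): positive, i.e. correctly classified
print(margin_neg(x_neg))   # 3/sqrt(5): positive, i.e. correctly classified
```

If a point fell on the wrong side of the boundary, the corresponding value would come out negative, which matches the functional interpretation discussed below.
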
To understand the margin from a geometric perspective, consider the projection of a point x onto the decision boundary, i.e., the foot of the perpendicular from x to the hyperplane. Let this projected point be denoted x0. The vector r connecting x0 and x is given by:

r = γ w/‖w‖

if x lies on the positive side of w. But the same vector can also be computed as:

r = x − x0

Equating the above two expressions gives x0 as:

x0 = x − γ w/‖w‖

Since x0 lies on the hyperplane, it satisfies:

w^T x0 + b = 0

Substituting x0 from above:

w^T (x − γ w/‖w‖) + b = 0

Noting that w^T w/‖w‖ = ‖w‖, we can solve for γ:

γ = (w^T x + b)/‖w‖    (1)

A similar analysis can be done for a point on the negative side of the hyperplane. In general, one can write the expression for the margin as:

γ = y (w^T x + b)/‖w‖    (2)

where y ∈ {−1, +1}.

Functional Interpretation
• Margin is positive if the prediction is correct; negative if the prediction is incorrect

From the figure one can note that the size of the margin is 2/‖w‖. We can show this as follows. Since the data is separable, we can rescale w and b so that the closest points from the two classes lie on the parallel lines w^T x + b = +1 and w^T x + b = −1. Using the result from (1) and (2), a point on either of these lines has margin 1/‖w‖, so the distance between the two lines is 2γ = 2/‖w‖.
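
The derivation and the 2/‖w‖ margin width can be checked numerically. The following sketch is again only illustrative (NumPy is assumed, and w, b, and the point x are arbitrary made-up values): it recovers the foot of the perpendicular x0 = x − γ w/‖w‖, verifies that x0 lies on the hyperplane, and verifies that the two lines w^T x + b = ±1 are 2/‖w‖ apart.

```python
import numpy as np

w = np.array([2.0, 1.0])         # made-up normal vector
b = -1.0                         # made-up bias
norm_w = np.linalg.norm(w)

# Geometric margin of a point on the positive side, as in Equation (1).
x = np.array([2.0, 1.0])
gamma = (w @ x + b) / norm_w

# Foot of the perpendicular from x onto the hyperplane: x0 = x - gamma * w/||w||.
x0 = x - gamma * w / norm_w
print(w @ x0 + b)                # ~0.0, so x0 indeed satisfies w^T x0 + b = 0

# Points on the parallel lines w^T x + b = +1 and w^T x + b = -1, reached
# from x0 by moving +/- 1/||w|| along the unit normal w/||w||.
x_plus = x0 + (w / norm_w) / norm_w
x_minus = x0 - (w / norm_w) / norm_w
print(w @ x_plus + b, w @ x_minus + b)   # +1.0 and -1.0
print(np.linalg.norm(x_plus - x_minus))  # equals 2/||w||
print(2 / norm_w)
```
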