EE 6885 Statistical Pattern Recognition
Fall 2005
Prof. Shih-Fu Chang
http://www.ee.columbia.edu/~sfchang
Lecture 11 (10/17/05)
Reading
- Distance Metrics: DHS Chap. 4.6
- Linear Discriminant Functions: DHS Chap. 5.1-5.4

Midterm Exam
- Oct. 24th 2005, Monday, 1pm-2:30pm (90 mins)
- Open books/notes, no computer

Review Class
- Oct. 21st, Friday, 4pm. Location TBA
kn-Nearest-Neighbor
- For classification, estimate p_n(ω_i | x) for each class ω_i:

    p_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{k_i}{k}

- Performance bound of the 1-nearest-neighbor rule (Cover & Hart '67):

    P^* \le \lim_{n \to \infty} P_n(e) \le P^* \left( 2 - \frac{c}{c-1} P^* \right)

- Combine k-NN with clustering (K-Means, LVQ, GMM) to reduce complexity
- When k increases: complexity? Smoother decision boundaries
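A minimal sketch of the k_i / k posterior estimate above, assuming a Euclidean metric and integer class labels 0, ..., c-1 (the name knn_posteriors and its signature are illustrative):

```python
import numpy as np

def knn_posteriors(X_train, y_train, x, k, num_classes):
    """Estimate p_n(w_i | x) = k_i / k from the k nearest training samples."""
    # Euclidean distance from x to every training sample
    dists = np.linalg.norm(X_train - x, axis=1)
    # Class labels of the k nearest neighbors
    nearest = y_train[np.argsort(dists)[:k]]
    # k_i / k for each class
    return np.bincount(nearest, minlength=num_classes) / k

# Usage: assign x to the class with the largest estimated posterior
# posteriors = knn_posteriors(X_train, y_train, x, k=5, num_classes=3)
# label = int(np.argmax(posteriors))
```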
Distance Metrics
- Nearest-neighbor rules need distance metrics
- Required properties of a metric:
  1. non-negativity: D(a, b) ≥ 0
  2. reflexivity: D(a, b) = 0 iff a = b
  3. symmetry: D(a, b) = D(b, a)
  4. triangle inequality: D(a, b) + D(b, c) ≥ D(c, a), equivalently D(a, b) ≥ D(c, a) - D(b, c)
- Minkowski metric:

    L_k(a, b) = \left( \sum_{i=1}^{d} |a_i - b_i|^k \right)^{1/k}

  - Euclidean (k = 2)
  - Manhattan (k = 1)
  - L_∞ (k → ∞): useful in indexing
- Tanimoto metric, for sets of elements S_1, S_2 with sizes n_1, n_2 and intersection size n_12
  (point-to-point distances are not useful for set-valued data):

    D_{Tanimoto}(S_1, S_2) = \frac{n_1 + n_2 - 2 n_{12}}{n_1 + n_2 - n_{12}} = \frac{(n_1 - n_{12}) + (n_2 - n_{12})}{n_1 + n_2 - n_{12}}
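A sketch of these metrics in NumPy; it assumes numeric vectors for Minkowski and Python sets with a non-empty union for Tanimoto:

```python
import numpy as np

def minkowski(a, b, k):
    """L_k(a, b) = (sum_i |a_i - b_i|^k)^(1/k); k=1 Manhattan, k=2 Euclidean."""
    return np.sum(np.abs(a - b) ** k) ** (1.0 / k)

def l_infinity(a, b):
    """Limit k -> infinity: the largest per-coordinate difference."""
    return float(np.max(np.abs(a - b)))

def tanimoto(s1, s2):
    """D(S1, S2) = (n1 + n2 - 2*n12) / (n1 + n2 - n12) on sets of elements."""
    n1, n2, n12 = len(s1), len(s2), len(s1 & s2)
    return (n1 + n2 - 2 * n12) / (n1 + n2 - n12)
```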
Discriminant Functions Revisited
- Define discriminant function g_i(x) for class ω_i;
  map x to class ω_i if g_i(x) ≥ g_j(x) ∀ j ≠ i
- e.g., g_i(x) = ln P(x | ω_i) + ln P(ω_i): the MAP classifier
- Gaussian case: P(x | ω_i) = N(μ_i, Σ_i)

    P(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) \right)

    g_i(x) = -\frac{1}{2} (x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)

- Case I: Σ_i = Σ
  g_i(x) = w_i^t x + w_i0, linear in x with bias w_i0; decision boundaries are hyperplanes
- Case II: Σ_i arbitrary
  g_i(x) = x^t W_i x + w_i^t x + w_i0
  Decision boundaries may be hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids
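A minimal sketch of the Gaussian discriminant g_i(x) above (the function name and the use of solve/slogdet for numerical stability are implementation choices, not from the slides):

```python
import numpy as np

def gaussian_discriminant(x, mu, sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^-1 (x-mu) - d/2 ln(2pi) - 1/2 ln|Sigma| + ln P(w_i)."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(sigma, diff)   # (x-mu)^t Sigma^-1 (x-mu)
    _, logdet = np.linalg.slogdet(sigma)         # ln|Sigma| without forming the determinant
    return -0.5 * maha - 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)

# Usage: evaluate g_i(x) for every class (mu_i, Sigma_i, P(w_i)) and take the argmax
```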
Discriminant Functions (Chap. 5)
- Directly define discriminant functions
  - Without assuming parametric forms for P(x | ω_i)
  - Easy to derive useful classifiers
- Linear functions: g(x) = w^t x + w_0, w: weight vector, w_0: bias
- Two-category case:
  map x to class ω_1 if g(x) > 0, otherwise class ω_2
  Decision surface H: g(x) = 0

    x = x_p + r \frac{w}{\|w\|}

  x_p: projection of x onto H, so g(x_p) = 0; r: signed distance from x to H

    g(x) = g\left( x_p + r \frac{w}{\|w\|} \right) = r \frac{w^t w}{\|w\|} = r \|w\| \;\Rightarrow\; r = \frac{g(x)}{\|w\|}
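In code, the signed distance r = g(x) / ||w|| is a one-liner (a sketch; the helper name is illustrative):

```python
import numpy as np

def signed_distance(x, w, w0):
    """r = g(x) / ||w||: positive on the class-w1 side of H, negative on the other."""
    return (w @ x + w0) / np.linalg.norm(w)

# Two-category rule: class w1 if g(x) > 0, i.e., if signed_distance(x, w, w0) > 0
```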
Multi-category Case
- c categories: ω_1, ω_2, ..., ω_c; number of classifiers needed?
- Approaches
  - Use a two-class discriminant for each class
    ⇒ x belongs to class ω_i or not? (c classifiers)
  - Use a two-class discriminant for each pair of classes
    ⇒ x belongs to class ω_i or ω_j? (c(c-1)/2 classifiers)
- General approach: one function for each class, g_i(x) = w_i^t x + w_i0
  map x to class ω_i if g_i(x) ≥ g_j(x) ∀ j ≠ i
  decision boundary H_ij: g_i(x) = g_j(x), i.e.,

    H_{ij}: (w_i - w_j)^t x + (w_{i0} - w_{j0}) = 0

- Each decision region is convex and singly connected
- Good for unimodal distributions
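A sketch of the general approach (one linear function per class, decision by argmax); the shapes and the name linear_machine are assumptions:

```python
import numpy as np

def linear_machine(x, W, w0):
    """W: (c, d) weight matrix, w0: (c,) biases; returns argmax_i g_i(x)."""
    return int(np.argmax(W @ x + w0))
```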
Method for Searching Decision Boundaries
- g_i(x) = w_i^t x + w_i0 ⇒ find weight w_i and bias w_i0
- Augmented vectors:

    y = \begin{bmatrix} 1 \\ x \end{bmatrix} = \begin{bmatrix} 1 \\ x_1 \\ \vdots \\ x_d \end{bmatrix}, \qquad a_i = \begin{bmatrix} w_{i0} \\ w_i \end{bmatrix} = \begin{bmatrix} w_{i0} \\ w_{i1} \\ \vdots \\ w_{id} \end{bmatrix}

    \Rightarrow g_i(x) = g_i(y) = a_i^t y

- Decision boundary
  H_ij: (w_i - w_j)^t x + (w_i0 - w_j0) = 0 ⇒ H_ij: (a_i - a_j)^t y = 0
- 2-category case
  H: w^t x + w_0 = 0 ⇒ H: a^t y = 0
  A hyperplane in the augmented y space, passing through the origin, with normal vector a
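Augmentation itself is trivial; a sketch, assuming 1-D NumPy feature vectors:

```python
import numpy as np

def augment(x):
    """y = [1, x_1, ..., x_d]^t: absorbs the bias into the weight vector."""
    return np.concatenate(([1.0], x))

# With a = [w0, w1, ..., wd], g(x) = w^t x + w0 becomes the single product a^t y:
# g = a @ augment(x)
```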
Search Method for Linear Discriminant
- All sample points reside in the y_1 = 1 subspace
- Distance from x to the boundary in x space: r = g(x) / ||w||
- Distance from y = [1, x]^t to the boundary in y space: r' = a^t y / ||a||
  Since a^t y = g(x) and ||a|| ≥ ||w||, we have r' ≤ r: r' and r have the same sign, and r' is a lower bound for r
- Design objective for finding a
  - Find a that correctly classifies each sample:
    ∀ y_i in class ω_1: a^t y_i > 0; ∀ y_i in class ω_2: a^t y_i < 0
- Normalization: ∀ y_i in class ω_2, replace y_i ← -y_i
- New design objective: ∀ y_i in class ω_1 or ω_2, a^t y_i > 0
- Solution region: intersection of the positive sides of all sample hyperplanes
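A sketch of the normalization step and the solution-region test (names are illustrative; labels are assumed to be 1 or 2):

```python
import numpy as np

def normalize_samples(Y, labels):
    """Flip class-2 samples so the single objective a^t y_i > 0 covers both classes."""
    signs = np.where(labels == 1, 1.0, -1.0)
    return Y * signs[:, None]        # Y: (n, d+1) augmented samples

def in_solution_region(a, Y_norm):
    """a is a solution iff every normalized sample lies on its positive side."""
    return bool(np.all(Y_norm @ a > 0))
```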
Searching Linear Discriminant Solutions
- Stricter criterion: solution region with margin
  ∀ y_i in class ω_1 or ω_2, a^t y_i > b, with margin b > 0 (vs. b = 0 above)
- Search approaches
  - Gradient descent methods to find a solution in the solution region
  - Maximize margin
  - Mapping to a high-dimensional space
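The slide leaves the search procedures themselves for later. One minimal sketch of such a gradient-descent search is the fixed-increment perceptron rule (DHS Chap. 5), applied to the normalized samples above; this is an illustrative choice, not a method spelled out on the slide:

```python
import numpy as np

def perceptron_train(Y_norm, lr=1.0, max_epochs=100):
    """Gradient descent on the perceptron criterion J(a) = sum(-a^t y) over
    misclassified samples; converges if the samples are linearly separable."""
    a = np.zeros(Y_norm.shape[1])
    for _ in range(max_epochs):
        updated = False
        for y in Y_norm:
            if a @ y <= 0:      # y misclassified (wrong side of, or on, the boundary)
                a += lr * y     # move a toward the positive side of y
                updated = True
        if not updated:
            return a            # a lies in the solution region
    return a
```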