
Pattern Classification (Duda, Hart, Stork)
Nearest Neighbor Pattern Classification (Cover and Hart)
Roberto Souza
DCA/FEEC/UNICAMP
March 16, 2012
Agenda
Introduction
Supervised Classification Backbone
1-NN
k-NN
Conclusions
Introduction
Developed in the 1960s;
Non-parametric, sub-optimal classifiers;
Often provide competitive results;
Simple to understand and implement.
Supervised Classification Problem Formulation
M classes;
N i.i.d. labeled samples Z = {(X1, θ(X1)), (X2, θ(X2)), ..., (XN, θ(XN))}, with θ(Xi) ∈ {ω1, ω2, ..., ωM};
Goal: assign each new sample X to one of the M possible classes so as to minimize the misclassification error.
Error expression:
\[
p(\text{error}) = \int_{-\infty}^{+\infty} p(\text{error}, X)\, dX
               = \int_{-\infty}^{+\infty} p(\text{error} \mid X)\, p(X)\, dX .
\]
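As a concrete (made-up) illustration of this expression, the Python sketch below estimates p(error) by Monte Carlo for a hypothetical classifier. The equal priors, the Gaussian class-conditional densities N(-1, 1) and N(+1, 1), and the threshold-at-zero decision rule are all assumptions chosen purely for illustration; they are not part of the original slides.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical two-class problem: equal priors, class-conditional
    # densities N(-1, 1) and N(+1, 1).
    means = np.array([-1.0, 1.0])
    labels = rng.choice(2, size=200_000, p=[0.5, 0.5])
    X = rng.normal(means[labels], 1.0)

    # Any fixed classifier; here a simple threshold at zero.
    decision = (X > 0).astype(int)

    # p(error) = integral of p(error | X) p(X) dX is the expected
    # misclassification probability, so averaging the 0/1 errors over
    # draws from the joint distribution gives a Monte Carlo estimate.
    print("estimated p(error):", np.mean(decision != labels))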
Bayesian Decision Rule
Provides the optimal solution to the classification problem;
Assumes the priors p(ωi) and the class-conditional densities p(X | ωi) are known;
From Bayes' theorem, the posterior p(ωi | X) can be written as:
\[
p(\omega_i \mid X) = \frac{p(X \mid \omega_i)\, p(\omega_i)}{p(X)},
\qquad \text{where} \qquad
p(X) = \sum_{i=1}^{M} p(X \mid \omega_i)\, p(\omega_i).
\]
Under the Bayesian Decision Rule (BDR), X is assigned to the class with the largest posterior, so the conditional error is
\[
p(\text{error} \mid X) = 1 - \max\bigl[\, p(\omega_1 \mid X), \ldots, p(\omega_M \mid X) \,\bigr].
\]
The Bayes Error Rate (BER) is achieved by using the BDR (a small numerical sketch follows below).
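A minimal Python sketch of the BDR on a hypothetical two-class, one-dimensional Gaussian problem (the equal priors and the N(±1, 1) class conditionals are assumptions for illustration): it computes the posteriors via Bayes' theorem, decides by the maximum posterior, and estimates the BER both from 1 − max posterior and from the empirical error rate.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical two-class setup: equal priors, unit-variance Gaussian
    # class conditionals centred at -1 and +1.
    priors = np.array([0.5, 0.5])
    means = np.array([-1.0, 1.0])
    labels = rng.choice(2, size=200_000, p=priors)
    X = rng.normal(means[labels], 1.0)

    # Bayes' theorem: p(w_i | X) = p(X | w_i) p(w_i) / p(X).
    dens = np.exp(-0.5 * (X[:, None] - means) ** 2) / np.sqrt(2 * np.pi)
    joint = priors * dens                      # shape (n, 2)
    posteriors = joint / joint.sum(axis=1, keepdims=True)

    # BDR: assign each X to the class with the largest posterior.
    bdr_decision = posteriors.argmax(axis=1)

    # p(error | X) = 1 - max_i p(w_i | X); its average over X estimates the BER.
    print("BER via 1 - max posterior:", np.mean(1.0 - posteriors.max(axis=1)))
    print("BER via empirical errors: ", np.mean(bdr_decision != labels))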
1-NN Overview
Figure: Illustration of 1-NN operation.
1-NN Mathematical Formulation
The 1-NN classifier can be formulated in terms of mathematical equations. The label θ(X) of each new sample X is given by
\[
\theta(X) = \theta(X_{NN}), \tag{1}
\]
where X_NN is given by
\[
X_{NN} = \arg\min_{X_i \in Z} \{\, d_X(X_i) \,\}, \tag{2}
\]
and d_X(X_i) is the distance between X and X_i in the chosen metric.
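Equations (1) and (2) translate almost directly into code. The sketch below is a minimal NumPy implementation, with the Euclidean distance assumed as the metric d (the slides do not fix a particular metric); the tiny usage example uses made-up points.

    import numpy as np

    def one_nn_predict(Z_samples, Z_labels, X_new):
        """Label each row of X_new with the label of its nearest sample in Z.

        Z_samples : (N, d) array of stored samples X_i
        Z_labels  : (N,)   array of labels theta(X_i)
        X_new     : (m, d) array of samples to classify
        """
        # d_X(X_i): Euclidean distance from every new sample to every X_i.
        dists = np.linalg.norm(X_new[:, None, :] - Z_samples[None, :, :], axis=2)
        # Equation (2): X_NN = argmin over Z of d_X(X_i).
        nn_index = dists.argmin(axis=1)
        # Equation (1): theta(X) = theta(X_NN).
        return Z_labels[nn_index]

    Z_samples = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
    Z_labels = np.array([0, 0, 1])
    print(one_nn_predict(Z_samples, Z_labels,
                         np.array([[0.2, 0.1], [2.6, 2.9]])))  # -> [0 1]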
1-NN Error Bounds
As the number of labeled samples N tends to infinity in an M-class classification problem, the 1-Nearest Neighbor Error Rate (1NNER) is bounded by the following expression:
\[
\text{BER} \;\le\; \text{1NNER} \;\le\; \text{BER} \left( 2 - \frac{M}{M-1}\,\text{BER} \right). \tag{3}
\]
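As a quick numerical reading of (3), with made-up numbers: for M = 2 classes and BER = 0.1, the bound gives 0.1 ≤ 1NNER ≤ 0.1 × (2 − 2 × 0.1) = 0.18. In other words, asymptotically the 1-NN error rate is at most about twice the Bayes error rate, and it collapses to zero whenever BER = 0.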
1-NN Weaknesses
1-NN is sensitive to noise and outliers;
The bound (3) on the 1NNER only holds asymptotically, i.e. as the number of labeled samples tends to infinity;
Its computational complexity increases with N.
k-NN Overview
k-NN is a natural extension of the 1-NN classifier: it classifies X by assigning it the label that appears most frequently among its k nearest neighbors. Because it takes k neighbors into account, it is less sensitive to noise than 1-NN (a minimal implementation is sketched below);
It can be shown that, for an infinite number of samples N, as k tends to infinity (with k/N → 0) the k-NN Error Rate (kNNER) tends to the BER;
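A minimal Python sketch of k-NN, extending the 1-NN code above. The Euclidean distance and a simple majority vote are assumed, and ties are broken towards the smallest label, which is only one of several possible conventions.

    import numpy as np

    def knn_predict(Z_samples, Z_labels, X_new, k=3):
        """Assign each row of X_new the most frequent label among its
        k nearest samples in the labeled set Z."""
        dists = np.linalg.norm(X_new[:, None, :] - Z_samples[None, :, :], axis=2)
        # Indices of the k closest stored samples for every new sample.
        knn_index = np.argsort(dists, axis=1)[:, :k]
        knn_labels = Z_labels[knn_index]
        # Majority vote; bincount + argmax breaks ties towards the smallest label.
        return np.array([np.bincount(row).argmax() for row in knn_labels])

    Z_samples = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [3.2, 2.8]])
    Z_labels = np.array([0, 0, 1, 1])
    print(knn_predict(Z_samples, Z_labels,
                      np.array([[0.2, 0.1], [2.9, 3.0]]), k=3))  # -> [0 1]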
Anomalous 3-NN Case
Figure: Illustration of a 3-NN anomalous case.
Conclusions
Although k-NN, for k > 1, is theoretically a better classifier than 1-NN, this may not hold in practice if the number of training samples is not large enough;
To avoid this anomalous behaviour of k-NN, a parameter d is introduced, which means k-NN is no longer a purely non-parametric classifier.
Thanks for your attention!