Pattern Recognition
Nearest neighbors
2014. 10. 31
Hyunki Hong
Contents
• Nearest neighbors density estimation
• The k nearest neighbors classification rule
• k-NN as a lazy learner
• Characteristics of the kNN classifier
• Optimizing the kNN classifier
Non-parametric density estimation: review
• The general expression for non-parametric density estimation is

  p(x) ≈ k / (N·V)

  where V is the volume surrounding x, N is the total number of examples, and k is the number of examples inside V.
• At that time, we mentioned that this estimate could be computed in two ways:
  – Fixing V and determining the number k of data points inside V.
    → This is the approach used in kernel density estimation.
  – Fixing k and determining the minimum volume V that encompasses k points in the dataset.
    → This gives rise to the k-nearest-neighbors (kNN) approach.
K-NN density estimation
• Approach
– In k-NN we grow the volume surrounding the estimation point x until it encloses a total of k data points.
– The density estimate then becomes

  p(x) ≈ k / (N·V) = k / (N · c_D · R_k^D(x))

  where R_k(x) is the distance between the estimation point x and its k-th closest neighbor, and c_D is the volume of the unit sphere in D dimensions, which is equal to

  c_D = π^(D/2) / Γ(D/2 + 1)

  Thus c_1 = 2, c_2 = π, c_3 = 4π/3, and so on.
  cf. For z > −1 the Gamma function satisfies Γ(z+1) = z·Γ(z); in particular, for half-integer arguments, Γ(n + 1/2) = ((2n)! / (4^n·n!))·√π.
  (A MATLAB sketch of this estimator follows.)
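As a minimal illustration of the formula above, here is a one-dimensional MATLAB sketch (our own; the function name knn_density and its arguments are not from the slides):

% k-NN density estimate at query points xq from 1-D samples x:
% p(xq) = k / (N * c_1 * R_k), with c_1 = 2 in one dimension.
function p = knn_density(xq, x, k)
    N = numel(x);
    p = zeros(size(xq));
    for i = 1:numel(xq)
        R = sort(abs(x(:) - xq(i)));   % distances to all samples, ascending
        p(i) = k / (N * 2 * R(k));     % k / (N * c_1 * R_k^1)
    end
end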
• In general, the estimates that can be obtained with the k-NN method are not very satisfactory:
– The estimates are prone to local noise.
– The method produces estimates with very heavy tails.
– Since the function R_k(x) is not differentiable, the density estimate will have discontinuities.
– The resulting density is not a true probability density, since its integral over the whole sample space diverges.
• To illustrate the behavior of kNN we generated several density estimates for a bimodal Gaussian: p(x) = ½·N(0,1) + ½·N(10,4); a sketch of such an experiment follows.
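A sketch of how such estimates could be generated, reusing knn_density above (our own reconstruction; the sample size and k values are assumptions, and N(10,4) is read as mean 10, variance 4):

% Draw samples from p(x) = 0.5*N(0,1) + 0.5*N(10,4) and overlay
% k-NN density estimates for several values of k.
N  = 1000;
u  = rand(N,1) < 0.5;                              % mixture component indicator
x  = u .* randn(N,1) + ~u .* (10 + 2*randn(N,1));  % std 2 <-> variance 4
xq = linspace(-5, 20, 500)';
hold on;
for k = [10 50 200]
    plot(xq, knn_density(xq, x, k));
end
legend('k = 10', 'k = 50', 'k = 200');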
kNN in action
• Example 1
– Three-class 2D problem with non-linearly separable, multimodal likelihoods.
– We use the kNN rule (k = 5) and the Euclidean distance.
– The resulting decision boundaries and decision
regions are shown below.
kNN in action
• Example 2
– A 2D three-class problem with unimodal likelihoods that share a common mean; these classes are also not linearly separable.
– We used the kNN rule (k = 5) and the Euclidean distance as the metric.
Characteristics of the kNN classifier
• Advantages
– Analytically tractable
– Simple implementation
– Uses local information, which can yield highly adaptive behavior
– Lends itself very easily to parallel implementations
• Disadvantages
– Large storage requirements
– Computationally intensive recall
– Highly susceptible to the curse of dimensionality
• 1NN versus kNN
– The use of large values of k has two main advantages:
  1. It yields smoother decision regions.
  2. It provides probabilistic information: the ratio of examples from each class gives information about the ambiguity of the decision.
– However, too large a value of k is detrimental:
  1. It destroys the locality of the estimation, since farther examples are taken into account.
  2. In addition, it increases the computational burden.
Improving the NN search procedure
• The NN search procedure can be stated as follows.
– Given a set of N points in D-dimensional space and an unlabeled example x_u ∈ ℝ^D, find the point that minimizes the distance to x_u.
– The naive approach of computing N distances and then finding the k smallest becomes impractical for large values of N and D.
• Two classical algorithms can be used to speed up the NN
search.
– Bucketing [Welch 1971]
1. The space is divided into identical cells; for each cell,
the data points inside it are stored in a list.
2. Cells are examined in order of increasing distance from
the query point; for each cell, the distance is computed between
its internal data points and the query point.
3. The search terminates when the distance from the query point to
the cell exceeds the distance to the closest point already visited.
Improving the NN search procedure
– k-d trees [Bentley, 1975; Friedman et al., 1977]
1. A k-d tree is a generalization of the binary search tree to high dimensions.
a. Each internal node in a k-d tree is associated with a hyper-rectangle and a hyper-plane orthogonal to one of the coordinate axes.
b. The hyper-plane splits the hyper-rectangle into two parts, which are associated with the child nodes.
c. The partitioning process goes on until the number of data points in the hyper-rectangle falls below some given threshold.
2. k-d trees partition the sample space according to the underlying
distribution of the data: the partitioning being finer in regions where
the density of data points is higher.
a. For a given query point, the algorithm first descends the tree to find the data points lying in the cell that contains the query point.
b. It then examines surrounding cells if they overlap the ball centered at the query point whose radius is the distance to the closest data point found so far. (A usage sketch follows.)
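A minimal usage sketch in MATLAB, assuming the Statistics and Machine Learning Toolbox is available (KDTreeSearcher and knnsearch are toolbox functions; the data here are made up):

% Build a k-d tree over reference points and query the 5 nearest
% neighbors of a test point.
X    = randn(1000, 2);                       % reference points (N x D)
xq   = [0.5, -0.2];                          % query point
tree = KDTreeSearcher(X);                    % k-d tree index over X
[idx, dist] = knnsearch(tree, xq, 'K', 5);   % neighbor indices and distances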
k-d tree example
• Data structure (3D case)
• Partitioning (2D case)
K-๊ทผ์ ‘์ด์›ƒ ๋ถ„๋ฅ˜๊ธฐ
K-๊ทผ์ ‘์ด์›ƒ ๋ถ„๋ฅ˜๊ธฐ
๋น„๋ชจ์ˆ˜์  ํ™•๋ฅ ๋ฐ€๋„ ์ถ”์ •๊ณผ ์ตœ๊ทผ์ ‘์ด์›ƒ ๋ถ„๋ฅ˜๊ธฐ
K-๊ทผ์ ‘์ด์›ƒ(K-NN) ๋ถ„๋ฅ˜๊ธฐ
K-NN ๋ถ„๋ฅ˜๊ธฐ์™€ ๋น„๋ชจ์ˆ˜์  ๋ฐ€๋„ ์ถ”์ •
K-๊ทผ์ ‘์ด์›ƒ ๋ถ„๋ฅ˜๊ธฐ์˜ ํŠน
์„ฑ ๊ฒฐ์ • ๊ฒฝ๊ณ„ ์„ค๊ณ„ ๊ณ ๋ ค์‚ฌํ•ญ
๋งคํŠธ๋žฉ์„ ์ด์šฉํ•œ K-NN ๋ถ„๋ฅ˜๊ธฐ ์‹ค
ํ—˜
K-๊ทผ์ ‘์ด์›ƒ ๋ถ„๋ฅ˜๊ธฐ
1) Non-parametric density estimation
• Probability density function for each class Ci (the volume V is treated as a function of x, with the number of enclosed data points fixed at K):

  p(x|Ci) ≈ K / (Ni · Vi(x)),  Vi(x) = vn · ri(x)^n

  N: total number of samples (Ni of them belonging to class Ci)
  Vi(x): the hypersphere whose radius ri(x) = d(x, xiK) is the distance from x to the K-th closest data point xiK among those belonging to class Ci
  vn: the volume of the unit sphere in the n-dimensional input space
• Decision rule: assign x to the class with the largest posterior probability.
2) The nearest-neighbor classifier
• For K = 1: given a data point x, compute for each class the distance to its closest data point.
  [Figure: a query point x with distance r1 to the nearest sample of class C1 and distance r2 to the nearest sample of class C2.]
• Assign x to the class of the data point with the smallest distance among all the data, regardless of class.
The Nearest-Neighbor Classifier
3) Steps of the nearest-neighbor classifier
1. Compute the distance between the given data point x and every training data point in {x1, x2, …, xN}.
2. Find the training point with the smallest distance and call it xmin.
3. Assign x to the class of xmin, i.e., set y(x) to the same value as y(xmin).
(A MATLAB sketch of these steps follows.)
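The three steps translate almost line for line into MATLAB; a minimal sketch (Xtr, ytr, and x are our own names for the training matrix, its labels, and the query):

% 1-NN classification of a query row vector x against training data
% Xtr (N x D) with labels ytr (N x 1). Uses implicit expansion (R2016b+).
d = sqrt(sum((Xtr - x).^2, 2));   % step 1: distances to all training points
[~, imin] = min(d);               % step 2: index of the nearest neighbor
y = ytr(imin);                    % step 3: inherit its class label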
4) A problem with the nearest-neighbor classifier: overfitting
[Figure: training and test results on the same data. The Bayesian classifier reaches a 6.5% test error, while the nearest-neighbor classifier produces an overfitted decision boundary and a 13.5% test error.]
→ remedy: the "K-nearest-neighbor classifier"
5) The K-nearest-neighbor classifier
Steps of the K-NN classifier:
1. Compute the distance between the given data point x and all training data {x1, x2, …, xN}.
2. Find the K data points with the smallest distances, in increasing order, and form the candidate set N(x) = {x1, x2, …, xK}.
3. Look up the class labels y(x1), y(x2), …, y(xK) of the elements of the candidate set.
4. Find the class with the highest frequency among these labels and assign x to that class. (A compact sketch follows.)
6) The K-NN classifier and non-parametric density estimation
• For K = 1, the K-NN classifier behaves exactly like the nearest-neighbor classifier.
• Given a data point x, find the volume VK(x) of the region that contains K data points, regardless of class.
  VK(x): the volume of the region containing K data points
  K1, K2, …, KM: the number of data points of each class contained in VK(x)
[Figure: example with K = 5; the region V5(x) around x contains K1 = 2 points of class C1 and K2 = 3 points of class C2.]

  y(x) = argmax{K1(x), K2(x)} → C2
2. Characteristics of the K-Nearest-Neighbor Classifier
1) Decision boundaries of the K-NN classifier
– Non-linear decision boundaries.
– Since it is based on a non-parametric method, its performance is little affected by the shape of the data distribution.
[Figure: a two-class example (C1, C2) comparing the minimum-distance classifier with the K-nearest-neighbor classifier.]
2) Decision boundaries of the K-nearest-neighbor classifier
• When the data distribution has a complex non-linear structure:
[Figure: two-class data (C1, C2) with a complex non-linear structure. The Gaussian Bayes classifier fails to separate the classes, while the K-nearest-neighbor classifier succeeds.]
3) ๋ถ„๋ฅ˜๊ธฐ์˜ ๋น„๊ต
๏ƒผ ๊ฐ€์šฐ์‹œ์•ˆ ๋ฒ ์ด์ฆˆ ๋ถ„๋ฅ˜๊ธฐ
๏ƒ˜ ๋ชจ์ˆ˜์  ๋ฐ€๋„ ์ถ”์ • ๋ฐฉ๋ฒ•์— ๊ธฐ๋ฐ˜
๏ƒ˜ ํ•™์Šต ๋ฐ์ดํ„ฐ๋ฅผ ํ†ตํ•ด ํ‰๊ท ๊ณผ ํ‘œ์ค€ํŽธ์ฐจ ๊ณ„์‚ฐ
๏‚ง ๋ถ„๋ฅ˜ ๊ณผ์ •์—์„œ ํ•™์Šต ๋ฐ์ดํ„ฐ๊ฐ€ ๋ถˆํ•„์š”
๏ƒผ K-๊ทผ์ ‘์ด์›ƒ ๋ถ„๋ฅ˜๊ธฐ
๏ƒ˜ ๋น„๋ชจ์ˆ˜์  ๋ฐ€๋„ ์ถ”์ • ๋ฐฉ๋ฒ•์— ๊ธฐ๋ฐ˜
๏ƒ˜ ์ƒˆ ๋ฐ์ดํ„ฐ๊ฐ€ ์ฃผ์–ด์งˆ ๋•Œ๋งˆ๋‹ค ํ•™์Šต ๋ฐ์ดํ„ฐ ์ „์ฒด์™€์˜ ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ์œผ
๋กœ K ๊ฐœ์˜ ์ด์›ƒ ๋ฐ์ดํ„ฐ ์„ ์ • ํ•„์š”ํ•จ.
๏‚ง ํ•ญ์ƒ ํ•™์Šต ๋ฐ์ดํ„ฐ ์ €์žฅ ๏ƒ  ๋น„์šฉ(๊ณ„์‚ฐ๋Ÿ‰, ๋ฉ”๋ชจ๋ฆฌ) ๋ฌธ์ œ ์ดˆ๋ž˜
4) Design considerations for the K-NN classifier
[Figure: change of the decision boundary with K, for K = 1 (overfitted to the noise), K = 5, K = 20, and K = 100.]
5) Design considerations for the K-NN classifier
• Choosing an appropriate value of K
  – K = 1 → the class is decided by the single nearest neighbor alone; sensitive to noise.
  – K ≫ 1 → the decision depends on the proportion of each class over the whole data region (the prior probability) rather than on the neighborhood of the given point.
  – The appropriate value depends on the distribution of the given data.
  – In practice, select the value that gives the best classification performance on the data. (A validation sketch follows.)
6) Design considerations for the K-NN classifier
• Distance function: computes the distance between the given data point and the training data. Common choices:
  – Euclidean distance (2-norm): d(x, y) = (Σi (xi − yi)²)^(1/2)
  – 1-norm: d(x, y) = Σi |xi − yi|
  – p-norm: d(x, y) = (Σi |xi − yi|^p)^(1/p)
  – Inner product: xᵀy
  – Cosine distance: based on xᵀy / (‖x‖·‖y‖)
  – Normalized Euclidean distance
  – Mahalanobis distance: reflects the difference in variance along each coordinate axis (through the covariance matrix).
A sketch of a few of these metrics follows.
K-NN ๋ถ„๋ฅ˜๊ธฐ ์‹คํ—˜
1) ๋ฐ์ดํ„ฐ ์ƒ์„ฑ ๋ฐ ์‹คํ—˜ ๊ฒฐ๊ณผ
5
4
C3
C2
3
2
1
0
๏ƒผํ•™์Šต ๋ฐ์ดํ„ฐ: 100๊ฐœ/ํด๋ž˜์Šค
-1
C1
-2
-2
๏ƒผํ…Œ์ŠคํŠธ ๋ฐ์ดํ„ฐ: 105๊ฐœ/ํด๋ž˜์Šค
-1
0
1
2
3
4
5
K-๊ทผ์ ‘์ด์›ƒ ๋ถ„๋ฅ˜๊ธฐ์˜ ๋ถ„๋ฅ˜ ๊ฒฐ๊ณผ (%)
K=1
K=5
K=10
K=50
ํ•™์Šต์˜ค์ฐจ
0.00
9.67
11.67
11.67
ํ…Œ์ŠคํŠธ ์˜ค์ฐจ
14.74
11.11
10.46
10.52
2) K-NN classifier
load dataCh4_7                         % load the training data (X1, X2, X3)
X = [X1; X2; X3];
Etrain = 0;
N = size(X, 1);
for i = 1:N                            % classify each data point in turn
    x = X(i,:);
    for j = 1:N                        % compute the distance to every data point
        d(j,1) = norm(x - X(j,:));
    end
    [sx, si] = sort(d);                % sort by distance
    K = 5; c = zeros(3,1);             % set K = 5
    for j = 1:K                        % inspect the labels of the K nearest
        if si(j) <= 100,                    c(1) = c(1) + 1; end   % neighbors
        if (si(j) > 100) && (si(j) <= 200), c(2) = c(2) + 1; end   % and vote
        if si(j) > 200,                     c(3) = c(3) + 1; end
    end
    [maxv, maxi] = max(c);             % assign to the class with the most votes
    if maxi ~= (floor((i-1)/100) + 1)  % if it differs from the true class label,
        Etrain = Etrain + 1;           % increment the error count
    end
end
Error_rate = Etrain / N                % report the misclassification rate
3) Change of the decision boundary with K
[Figure: decision boundaries for K = 1, K = 5, K = 10, and K = 50.]
For the Bayes classifier the key is to estimate an appropriate probability model; for K-NN, it is to find an appropriate K.
4) Drawing the decision boundary
load dataCh4_7                                    % load the training data
X = [X1; X2; X3];
[x, y] = meshgrid(-2.5:0.1:5.5, -2.5:0.1:5.5);    % grid over the whole input space
XY = [x(:), y(:)];
plot(X1(:,1), X1(:,2), '*'); hold on;             % plot the training data
plot(X2(:,1), X2(:,2), 'ro'); plot(X3(:,1), X3(:,2), 'kd');
for i = 1:size(XY,1)                              % decide the class label of
    xt = XY(i,:);                                 % every grid point
    for j = 1:size(X,1)
        d(j,1) = norm(xt - X(j,:));
    end
    [sx, si] = sort(d);
    K = 1; c = zeros(3,1);
    for j = 1:K
        if si(j) <= 100,                    c(1) = c(1) + 1; end
        if (si(j) > 100) && (si(j) <= 200), c(2) = c(2) + 1; end
        if si(j) > 200,                     c(3) = c(3) + 1; end
    end
    [maxv, maxi] = max(c);
    rxy1(i,1) = maxi;
end
rxy1 = reshape(rxy1, size(x));
contour(x, y, rxy1);                              % contours between class labels
axis([-2.5 5.5 -2.5 5.5]); grid on
K-NNR (Nearest Neighbor Rule) Density Estimation
• Uses the k nearest neighbors.
  – Classifies an arbitrary unlabeled input x_u into one of the learned classes:
    • find the k closest samples among the given labeled training data, then
    • assign x_u to the class that occurs most frequently within that k-subset.
  – What k-NNR requires:
    • the constant k
    • a set of labeled training samples
    • a distance metric
[Figure: example with k = 5; x_u is assigned to ω1.]
Density Estimation with k-NNR
• Decision-boundary estimation using k-NNR
  – Result of estimating the boundaries between three classes with K = 5: non-linear decision boundaries are possible.
Density Estimation with k-NNR
• Example of density estimation using k-NNR:

  p(x) ≈ k / (N · c_D · R_k^D(x))

  R_k(x): the distance between the reference point x and its k-th nearest data point
  c_D: the volume of the unit sphere in D dimensions (see the expression for c_D above)
  cf. For z > −1, Γ(z+1) = z·Γ(z); in particular, for half-integer arguments, Γ(n + 1/2) = ((2n)! / (4^n·n!))·√π.
Density Estimation with k-NNR
• Example of density estimation using k-NNR [figure]
Pattern Recognition Using Non-parametric Density Estimation
– Estimate the likelihood of each class and classify test data with the Bayes decision rule → select the class with the largest posterior probability.
– Advantages of the non-parametric approach:
  • easy to apply to pattern recognition, no prior knowledge required, applicable to a wide range of distributions (unimodal, bimodal, …)
– Disadvantages:
  • a very large number of samples is needed for an accurate estimate
  • large computation time and memory requirements
– Example of pattern recognition using k-NNR:
  • Assume the N training samples {x1, x2, …, xN} are distributed over c subsets D1, D2, …, Dc, where each subset Dj belongs to class ωj.
  • Recognition determines which class an input sample x is classified into.
Pattern Recognition Using Non-parametric Density Estimation
– Example of pattern recognition using non-parametric density estimation (k-NNR):
  • Assume the N training samples {x1, x2, …, xN} are distributed over c subsets D1, D2, …, Dc, where each subset Dj belongs to class ωj.
  • Recognition determines which class an input sample x belongs to, i.e., it computes the posterior P(ωj | x).
1) Likelihood estimation
  – When kj of the samples satisfying the k-NNR condition belong to class ωj, the general form of the joint probability p(x, ωj) gives the likelihood:

    p_N(x, ωj) = kj / (N·V)   →   p_N(x | ωj) = kj / (Nj·V)

    (Nj: the number of training samples in class ωj)
2) Posterior estimation

    P(ωj | x) = p_N(x, ωj) / Σi p_N(x, ωi) = kj / Σi ki = kj / k   (sums over i = 1, …, c)

3) Final class decision

    P(ωp | x) = max_j P(ωj | x)  →  assign x to class ωp

(A worked example follows.)
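As a quick worked check of this rule (our own numbers, echoing the K = 5 figure earlier): if the k = 5 nearest neighbors of x consist of k1 = 2 samples of ω1 and k2 = 3 samples of ω2, then P(ω1 | x) = 2/5 and P(ω2 | x) = 3/5, so x is assigned to ω2.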