Statistical clustering

Introduction to Pattern Recognition for Human ICT
2014. 11. 7
Hyunki Hong
Contents
• Similarity measures
• Criterion functions
• Cluster validity
• k-means
• Hierarchical clustering algorithms
Non-parametric unsupervised learning
• The concept of unsupervised learning
– A collection of pattern recognition methods that “learn
without a teacher”
– Two types of clustering methods were mentioned:
parametric and non-parametric
• Parametric unsupervised learning
– Equivalent to density estimation with a mixture of (Gaussian)
components
– Through the use of EM, the identity of the component that
originated each data point was treated as a missing feature.
Non-parametric unsupervised learning
• Non-parametric unsupervised learning
– No density functions are considered in these methods.
– Instead, we are concerned with finding natural groupings
(clusters) in a dataset.
• Non-parametric clustering involves three steps.
– Defining a measure of (dis)similarity between examples
– Defining a criterion function for clustering
– Defining an algorithm to minimize (or maximize) the criterion
function
Proximity measures
• Definition of metric
– A measuring rule d(x, y) for the distance between two
vectors x and y is considered a metric if it satisfies the
following properties: non-negativity (d(x, y) ≥ 0, with
equality iff x = y), symmetry (d(x, y) = d(y, x)), and the
triangle inequality (d(x, z) ≤ d(x, y) + d(y, z)).
– If the metric has the property d(ax, ay) = |a|d(x, y), then it
is called a norm and denoted d(x, y) = ||x − y||.
(norm: a function that assigns a strictly positive length or
size to each vector in a vector space; p-norm: a norm defined
for a real number p ≥ 1)
• The most general form of distance metric is the power norm:

d(x, y) = (Σ_k |x_k − y_k|^p)^(1/r)

– p controls the weight placed on each dimension's
dissimilarity, whereas r controls the distance growth of
patterns that are farther apart.
• The most commonly used metrics are derived from the power
norm (see the sketch after this list).
– Minkowski metric (L_k norm): d(x, y) = (Σ_i |x_i − y_i|^k)^(1/k)
The choice of an appropriate value of k depends on the amount of
emphasis that you would like to give to the larger differences
between dimensions.
– Manhattan or city-block distance (L_1 norm): d(x, y) = Σ_i |x_i − y_i|
When used with binary vectors, the L_1 norm
is known as the Hamming distance.
– Euclidean norm (L_2 norm): d(x, y) = (Σ_i (x_i − y_i)^2)^(1/2)
– Chebyshev distance (L_∞ norm): d(x, y) = max_i |x_i − y_i|
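As a quick illustration, these metrics reduce to one-liners in MATLAB (a minimal sketch; the vectors x and y are made-up examples):

x = [1 2 3]; y = [4 0 3];             % made-up example vectors
k = 3;                                % Minkowski order
d_minkowski = sum(abs(x - y).^k)^(1/k);
d_manhattan = sum(abs(x - y));        % L1; Hamming distance for binary vectors
d_euclidean = sqrt(sum((x - y).^2));  % L2
d_chebyshev = max(abs(x - y));        % L-infinity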
• Other metrics are also popular.
– Quadratic distance: d(x, y) = (x − y)^T B (x − y), where B is a
positive definite matrix.
The Mahalanobis distance is a particular case of this distance
(B equal to the inverse covariance matrix).
– Canberra metric (for non-negative features):
d(x, y) = Σ_i |x_i − y_i| / (x_i + y_i)
– Non-linear distance: d(x, y) = H if d_E(x, y) > T, and 0 otherwise,
where T is a threshold and H is a constant distance.
An appropriate choice of H and T for feature selection is one
where T satisfies the unbiasedness and consistency conditions
of the Parzen estimator.
• The above distance metrics are measures of dissimilarity.
• Some measures of similarity also exist (see the sketch below).
– Inner product: s(x, y) = x^T y
The inner product is used when the vectors x and y are
normalized, so that they have the same length.
– Correlation coefficient:
s(x, y) = Σ_i (x_i − x̄)(y_i − ȳ) / (Σ_i (x_i − x̄)^2 Σ_i (y_i − ȳ)^2)^(1/2)
– Tanimoto measure (for binary-valued vectors):
s(x, y) = x^T y / (||x||^2 + ||y||^2 − x^T y)
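A minimal sketch of these similarity measures (the vectors are made-up binary examples):

x = [1 0 1 1]; y = [1 1 0 1];                    % made-up binary vectors
s_inner = x * y';                                % inner product
cx = x - mean(x); cy = y - mean(y);
s_corr = (cx * cy') / (norm(cx) * norm(cy));     % correlation coefficient
s_tanimoto = (x * y') / (x*x' + y*y' - x*y');    % Tanimoto measure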
Criterion function for clustering
• Once a (dis)similarity measure has been determined, we
need to define a criterion function to be optimized.
– The most widely used clustering criterion is the
sum-of-square error (see the sketch after this list):

J_MSE = Σ_{i=1..C} Σ_{x∈ω_i} ||x − μ_i||^2

1. This criterion measures how well the data set X = {x^(1), …, x^(N)} is
represented by the cluster centers μ = {μ^(1), …, μ^(C)} (C < N).
2. Clustering methods that use this criterion are called minimum
variance.
– Other criterion functions exist, based on the scatter matrices
used in Linear Discriminant Analysis.
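A minimal sketch of evaluating J_MSE for a given assignment (the data X and the labels are made up; C is the number of clusters):

X = [randn(30,2); randn(30,2)+3];            % two loose synthetic groups
labels = [ones(30,1); 2*ones(30,1)];         % an assumed cluster assignment
C = 2; J = 0;
for i = 1:C
    Xi = X(labels == i, :);
    mu_i = mean(Xi, 1);                      % cluster center
    J = J + sum(sum((Xi - repmat(mu_i, size(Xi,1), 1)).^2, 2));
end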
Cluster validity
• The validity of the final cluster solution is highly
subjective
– This is in contrast with supervised training, where a clear
objective function is known: Bayes risk
– Note that the choice of (dis)similarity measure and criterion
function will have a major impact on the final clustering
produced by the algorithms.
• Example
– Which are the meaningful clusters in these cases?
– How many clusters should be considered?
Iterative optimization
• Once a criterion function has been defined, we must find
a partition of the data set that minimizes the criterion.
– Exhaustive enumeration of all partitions, which guarantees
the optimal solution, is infeasible.
For example, a problem with 5 clusters and 100 examples
yields on the order of 10^67 partitionings.
• The common approach is to proceed in an iterative
fashion.
- Find some reasonable initial partition and then
- Move samples from one cluster to another in order to
reduce the criterion function.
Iterative optimization
• These iterative methods produce sub-optimal solutions
but are computationally tractable.
• We will consider two groups of iterative methods.
– Flat clustering algorithms
1. These algorithms produce a set of disjoint clusters.
2. Two algorithms are widely used: k-means and ISODATA.
– Hierarchical clustering algorithms
1. The result is a hierarchy of nested clusters.
2. These algorithms can be broadly divided into agglomerative and
divisive approaches.
The k-means algorithm
• Method
– a simple clustering procedure that attempts to minimize the
criterion function J_MSE in an iterative fashion
1. Define the number of clusters.
2. Initialize clusters by
a. an arbitrary assignment of examples to clusters or
b. an arbitrary set of cluster centers (some examples used as
centers).
3. Compute the sample mean of each cluster.
4. Reassign each example to the cluster with the nearest mean.
5. If the classification of all samples has not changed, stop; else go
to step 3.
– k-means is a particular case of the EM algorithm for mixture models
(a minimal sketch follows below).
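As a minimal sketch, the five steps above correspond to what MATLAB's built-in kmeans performs (assuming the Statistics Toolbox is available):

X = [randn(50,2); randn(50,2)+3];    % synthetic data with two clusters
K = 2;                               % step 1: define the number of clusters
[labels, centers] = kmeans(X, K);    % steps 2-5: initialize, assign, update, repeat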
• Vector quantization
– An application of k-means to signal
processing and communication
– Univariate signal values are usually
quantized into a number of levels,
typically a power of 2 so the signal can
be transmitted in binary format.
– The same idea can be extended for
multiple channels.
1. We could quantize each separate
channel.
2. Instead, we can obtain a more
efficient coding if we quantize the
overall multidimensional vector by
finding a number of multidimensional
prototypes (cluster centers).
3. The set of cluster centers is called a
codebook, and the problem of finding
this codebook is normally solved using
the k-means algorithm (see the sketch below).
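A minimal sketch of vector quantization with a k-means codebook (assumes the Statistics Toolbox; the signal vectors are synthetic):

X = randn(1000, 2);                  % multidimensional signal vectors
K = 16;                              % codebook size: a power of 2 (4 bits/vector)
[code, codebook] = kmeans(X, K);     % code(n) is the index that gets transmitted
reconstructed = codebook(code, :);   % the decoder looks prototypes up by index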
Clustering
– Classification and clustering
– The K-means clustering algorithm
– Properties of the K-means algorithm
– Hierarchical clustering
– Hierarchical clustering algorithms
– Properties of hierarchical clustering
– Clustering experiments using MATLAB
1) Classification vs. clustering
– Classification: a class label (target output) is given for
each data point → supervised learning.
– Clustering: no class-label information is provided; only the
data themselves are given → unsupervised learning.
2) ๊ตฐ์ง‘ํ™”์˜ ์˜ˆ
ํ‘œ๋ณธ(๋ฐ์ดํ„ฐ) ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์•„ ๊ฐ ๋ฐ์ดํ„ฐ๋งˆ๋‹ค ์ผ์ผ์ด ํŠน์ • ํด๋ž˜์Šค๋กœ
๋ผ๋ฒจ๋งํ•˜๋Š” ๋น„์šฉ์ด ๋งŽ์€ ๊ฒฝ์šฐ์—๋„ ์œ ์šฉ
3) Examples of data to which clustering can be applied
(a) Clustering of scene images (e.g., city, sky, nature)
(b) Clustering of image pixels (image segmentation)
4) The K-means clustering algorithm
– An algorithm that partitions a given data set into K groups.
– The representative vector of each of the K groups: the mean of
the data belonging to that group.
① Initialization
② Data grouping
③ Representative-vector update
④ Decide whether to iterate again
5) Applying the K-means algorithm ①
[Figure: (a) initial state; (b) iteration = 1 (data grouping)]
5) Applying the K-means algorithm ②
[Figure: (c) iteration = 1 (representative-vector update); (d) iteration = 2]
5) Applying the K-means algorithm ③
[Figure: (e) iteration = 3; (f) iteration = 4]
6) Properties of the K-means algorithm
✓ Points to consider when applying it to real problems
1. Is it guaranteed to find a good clustering?
2. How does the choice of initial representative vectors affect performance?
3. How should the value of K be selected?
7) Properties of the K-means algorithm
– The meaning of the iterative procedure
– The objective function J of the K-means algorithm
→ the iterations proceed in the direction that minimizes J.
– J is the sum of the variances of all the clusters.
J↑: the data within each cluster are not tightly grouped.
J↓: the data within each cluster are well clustered.
7) Properties of the K-means algorithm
– Learning proceeds so that the value of J decreases with every iteration.
– The parameters that determine the value of J:
the representative vectors m_i, and
the cluster label r_ni of each data point.
1. When the m_i have been fixed at arbitrary values and the r_ni are
then determined:
J is minimized by choosing each r_ni so that every data point is
assigned to its nearest m_i = the "grouping step" of the K-means
algorithm.
2. When the r_ni are fixed and the m_i are determined again:
setting the partial derivative of J with respect to m_i to zero
(∂J/∂m_i = 0) gives
m_i = (sum of the data belonging to the i-th cluster) /
(number of data belonging to the i-th cluster)
= the "representative-vector update rule" of the K-means algorithm.
⇒ This procedure is guaranteed to find a local minimum of the
objective function J (see the worked equations below).
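In symbols, the objective and the two alternating updates described above can be written as follows (a sketch using the slide's notation r_ni and m_i; the derivation is the standard one, not copied from the original figure):

J = \sum_{n=1}^{N}\sum_{i=1}^{K} r_{ni}\,\lVert x_n - m_i \rVert^2 ,
\qquad
r_{ni} =
\begin{cases}
1 & \text{if } i = \arg\min_j \lVert x_n - m_j \rVert^2 \\
0 & \text{otherwise,}
\end{cases}

\frac{\partial J}{\partial m_i}
  = -2\sum_{n=1}^{N} r_{ni}\,(x_n - m_i) = 0
\;\Longrightarrow\;
m_i = \frac{\sum_{n} r_{ni}\, x_n}{\sum_{n} r_{ni}} .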
7) Properties of the K-means algorithm
[Figure: the change of J with the number of iterations — J decreases
monotonically as the iterations proceed]
7) Properties of the K-means algorithm
– The problem of dependence on the initial values
– The final solution depends on the representative vectors chosen
arbitrarily at the start.
[Figure: (a), (b) convergence after 2 iterations; (c) convergence
after 7 iterations]
7) Properties of the K-means algorithm
– The problem of selecting an appropriate value of K
– Problem-dependent.
– Cluster for various values of K and choose among the results?
– Or use a hierarchical clustering algorithm?
Hierarchical clustering algorithms
– Instead of dividing the whole data set into a few exclusive groups,
clustering is performed so that large clusters contain smaller
clusters, forming a hierarchy whose structure can then be examined.
➢ Agglomerative methods
• Start from minimal clusters, each containing a single data point,
and merge nearby clusters step by step into larger ones.
• N − 1 merge steps are required.
➢ Divisive methods
• Start from the maximal cluster, to which all the data belong, and
split clusters according to some criterion.
• The number of possible splits is 2^(N−1) − 1 → impractical.
An example run of a hierarchical algorithm
– Dendrogram: presents the result of hierarchical clustering.
K = 2:
C_AB = {A, B}
C_CD = {C, D}
Steps of a hierarchical algorithm
Properties of hierarchical clustering
Consideration 1: how to compute the distance between clusters
(see the sketch after this list)
– Single linkage (minimum distance)
– Complete linkage (maximum distance)
– Centroid linkage
: to reduce the influence of outliers, compute the mean vector of
each cluster and use the distance between the means.
– Average linkage
: compute the distances over all possible pairs of data points and
use their average.
– Ward's method
: when clusters are merged, the size (variance) of each cluster is
taken into account.
5) Properties of hierarchical clustering
Consideration 2: deciding an appropriate number of clusters
→ dendrogram analysis
– Find the point where the number of clusters stops growing and stays
constant over a long interval while the between-cluster distance in
the dendrogram increases.
[Figure: dendrogram cut giving K = 2 clusters]
MATLAB example
% Generate the training data ---------------------------- (the case K = 3)
X=[randn(50,2);
   randn(50,2)+repmat([3 0],50,1); randn(50,2)+repmat([0 3],50,1)];
save data7_8 X;
% Initialize the representative vectors ----------------------------------
figure(1);
plot(X(:,1),X(:,2),'*'); hold on;
N=size(X,1); K=3; m=zeros(K,2);
Xlabel=zeros(N,1);
% Step 1 -- choose three representative vectors
i=1;
while(i<=K)
    t=floor(rand*N)+1;                % pick a data point at random
    if ~ismember(X(t,:), m, 'rows')   % skip points already chosen as centers
        m(i,:)=X(t,:);                % set the chosen point as a center
        plot(m(i,1), m(i,2),'ro');    % mark the center on the plot
        i=i+1;
    end
end
MATLAB example
% Start of the k-means iteration --------------------------- (continued)
cmode=['gd'; 'b*'; 'mo'];            % plot symbol for each cluster
for iteration=1:10                   % step 4 (repeat steps 2 and 3)
    figure(iteration+1); hold on;
    % Step 2 -- assign each data point to the nearest cluster
    for i=1:N
        for j=1:K
            d(j)=(X(i,:)-m(j,:))*(X(i,:)-m(j,:))';   % squared distance
        end
        [minv, Xlabel(i)]=min(d);
        plot(X(i,1),X(i,2),cmode(Xlabel(i),:));      % show the assignment
    end
    % Step 3 -- recompute the representative vectors
    oldm=m;
    for i=1:K
        I=find(Xlabel==i);
        m(i,:)=mean(X(I,:),1);
    end
    for i=1:K
        plot(m(i,1), m(i,2),'ks');   % show the updated centers
    end
    % Termination test: stop when the centers have stopped moving
    if sum(sum(abs(oldm-m)))<1e-3, break; end
end
K-means์˜ ๊ตฐ์ง‘ํ™” ๊ณผ์ •
์ดˆ๊ธฐ ๋Œ€ํ‘œ๋ฒกํ„ฐ
๋Œ€ํ‘œ๋ฒกํ„ฐ์™€์˜ ๊ฑฐ๋ฆฌ ๊ณ„์‚ฐ์„
ํ†ตํ•ด ๊ฐ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ทธ๋ฃนํ•‘๋œ
๊ฒฐ๊ณผ
๋Œ€ํ‘œ๋ฒกํ„ฐ์˜ ์ด๋™(์ˆ˜์ •) ๋ฐฉ
ํ–ฅ
4) K๊ฐ’์˜ ๋ณ€ํ™”์— ๋”ฐ๋ฅธ ๊ตฐ์ง‘ํ™” ๊ฒฐ๊ณผ
(a) Data set
(b) K=2
(c) K=3
(d) K=5
01_๊ต์‚ฌ์™€ ๋น„๊ต์‚ฌ ํ•™์Šต
02_๋น„๊ต์‚ฌ ํ•™์Šต์˜ ๋‘ ๊ฐ€์ง€ ์ ‘๊ทผ๋ฒ•
03_๋ฒกํ„ฐ ์–‘์žํ™”์™€ ํด๋Ÿฌ์Šคํ„ฐ๋ง
04_์ตœ์ ํ™” ๊ทœ์ค€
05_k-means ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ EM ์•Œ๊ณ ๋ฆฌ์ฆ˜
06_๋น„๊ท ์ผ ์ด์ง„ ๋ถ„ํ• 
07_k-means์™€ ์ด์ง„ ๋ถ„ํ• ์˜ ๋น„๊ต์™€ ๊ฐœ์„ : LBG ์•Œ๊ณ ๋ฆฌ์ฆ˜
01_๊ต์‚ฌ์™€ ๋น„๊ต์‚ฌ ํ•™์Šต
๏ถ ๊ต์‚ฌ ํ•™์Šต (supervised training)
๏‚ง ๊ด€์ธก๋œ ๋ฐ์ดํ„ฐ๊ฐ€ ํŠน์ง• ๋ฒกํ„ฐ x์™€ ๊ด€์ธก๊ฐ’์ด ์†ํ•ด์žˆ๋Š” ํด๋ž˜์Šค ω๋กœ ์ด๋ฃจ์–ด์ง„ ๋ณ€์ˆ˜ ์Œ
{x, ω}์œผ๋กœ ๊ตฌ์„ฑ๋˜๋Š” ๊ฒฝ์šฐ
๏ถ ๋น„๊ต์‚ฌ ํ•™์Šต (unsupervised training)
๏‚ง ํด๋ž˜์Šค ๋ผ๋ฒจ ωi๊ฐ€ ์ฃผ์–ด์ง€์ง€ ์•Š๊ณ  ํŠน์ง• ๋ฒกํ„ฐ x={x1, x2, ..., xN} ๋งŒ์œผ๋กœ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์ด
๊ตฌ์„ฑ๋˜๋Š” ๊ฒฝ์šฐ์˜ ํ•™์Šต
๏‚ง ๋ฐ์ดํ„ฐ๋งˆ์ด๋‹(data mining)์˜ ํด๋Ÿฌ์Šคํ„ฐ๋ง: ๋น„๊ต์‚ฌ ํ•™์Šต ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์œ ์‚ฌํ•œ
๊ฒƒ๋ผ๋ฆฌ ๋ชจ์Œ. ์ •ํ•ด์ง„ ๊ธฐ์ค€์— ๋งž๋Š” ๋ฐ์ดํ„ฐ๋ผ๋ฆฌ ๋ถ„๋ฅ˜ ๋ฐ ์ฒด๊ณ„ํ™” ํ•˜๋Š” ๊ณผ์ •
: ๋ฐ์ดํ„ฐ ๊ฐ„์˜ ํŒจํ„ด, ๊ทœ์น™, ๊ด€๊ณ„ ๋“ฑ์˜ ์ •๋ณด๋ฅผ ์ถ”์ •ํ•˜์—ฌ ๊ด€๊ณ„๋ฅผ ์ฒด๊ณ„ํ™”, ์ •๋Ÿ‰ํ™”ํ•˜๋Š” ๊ณผ์ •
์˜ˆ) ๋ฒกํ„ฐ ์–‘์žํ™” (Vector Quantization) ํ˜น์€ ํด๋Ÿฌ์Šคํ„ฐ๋ง(Clustering) ๋“ฑ
๏‚ง ๊ต์‚ฌํ•™์Šต์— ๋น„ํ•ด์„œ ์„ฑ๋Šฅ์ด ์ œํ•œ๋  ์ˆ˜ ์žˆ์œผ๋‚˜ ๋ฐ์ดํ„ฐ์˜ ๋ถ„์„ ๋ฐ ์ฒ˜๋ฆฌ ๊ณผ์ •์ด
์ƒ๋Œ€์ ์œผ๋กœ ๋‹จ์ˆœํ•ด์ง€๋Š” ๊ฒฝํ–ฅ์ด ์žˆ์Œ.
40
01_๊ต์‚ฌ์™€ ๋น„๊ต์‚ฌ ํ•™์Šต
๏ถ ๋น„๊ต์‚ฌ ํ•™์Šต์˜ ์ด์šฉ
๏‚ง ํ‘œ๋ณธ์˜ ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์€ ๊ฒฝ์šฐ
๏‚ง ๊ฐ ํ‘œ๋ณธ๋งˆ๋‹ค ์ผ์ผ์ด ๋ผ๋ฒจ๋งํ•˜๋Š” ๊ฒƒ์ด ๋น„์šฉ์ด ๋งŽ์ด ๋“œ๋Š” ๋ณต์žกํ•œ ๊ณผ์ •์ผ ๊ฒฝ์šฐ
๏‚ง ์›์ฒœ์ ์œผ๋กœ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ํด๋ž˜์Šค ๋ผ๋ฒจ ๋“ฑ๊ณผ ๊ฐ™์€ ์ •๋ณด๋ฅผ ๋ฏธ๋ฆฌ ์•Œ ์ˆ˜ ์—†๋Š” ๊ฒฝ์šฐ
๏‚ง ๊ฐ€์žฅ ์ผ๋ฐ˜์ ์ธ ๊ฒฝ์šฐ
๏‚ง ๋งŽ์€ ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ๋“ค์„ ์ž‘์€ ํ”„๋กœํ† ํƒ€์ž…(์›ํ˜•) ์ง‘ํ•ฉ์œผ๋กœ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ
๏‚ง Ex: K-NNR(The K-Nearest Neighbor Rule)
41
02_๋น„๊ต์‚ฌ ํ•™์Šต์˜ ๋‘๊ฐ€์ง€ ์ ‘๊ทผ๋ฒ•
๏ถ ๋ชจ์ˆ˜์  ํ˜ผํ•ฉ ๋ชจ๋ธ(Parametric mixture model) ๊ตฌ์ถ• ํ†ตํ•œ ๋ฐฉ๋ฒ•
๏‚ง ๋‹ค์ˆ˜์˜ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•˜์—ฌ ์ฃผ์–ด์ง„ ํด๋ž˜์Šค์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ชจ๋ธ๋งํ•˜๋Š” ๋ฐฉ๋ฒ•
๏‚ง ๋ชจ์ˆ˜์  ํ˜ผํ•ฉ ๋ชจ๋ธ
๏ถ ๋น„๋ชจ์ˆ˜์  ๋ฐฉ๋ฒ• (non-parametric method)
๏‚ง ์ฃผ์–ด์ง„ ํด๋ž˜์Šค์˜ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๊ฐ€์ • ์—†์ด, ๋ฐ์ดํ„ฐ๋ฅผ ๋‚˜๋ˆ”. ํด๋Ÿฌ์Šคํ„ฐ๋ง
๏‚ง ํ†ต๊ณ„์  ๋ชจ์ˆ˜๋ฅผ ์ด์šฉํ•˜์ง€ ์•Š๊ณ  ํ•™์Šต ๋ฐ์ดํ„ฐ์˜ ๋ชจ๋ธ์„ ๊ตฌ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ• (k-means,
LBG ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋“ฑ)
42
03_Vector quantization and clustering
❖ Vector quantization (= clustering)
▪ The mapping (or classification) of a set of N feature vectors
x = {x1, x2, ..., xN} onto a set of K vectors y = {y1, y2, ..., yK}
▪ In general, N >> K.
▪ Mapping function: yi = c(xi), i = 1, ..., K, where c(·) is the
quantization operator.
▪ yi: code vectors (= code words), one per cluster
▪ In general, an element of a finite set
▪ The set y: the codebook
▪ Codebook size: the number of code vectors
▪ The feature space of the N vectors x is partitioned into K regions
(clusters), and each cluster is associated with a code vector
→ each code vector lies at the center of its region.
03_Vector quantization and clustering
❖ Vector quantization (clustering)
▪ Quantization error, distortion
▪ The error that inevitably arises when the vector set x is mapped to y
➢ i.e., the difference between x and y
➢ A distance measure (metric) between feature vectors: d(x, y)
▪ A quantization error suited to the clustering method is defined
separately for each method.
➢ e.g., the distance between each data point and the center of its
data set
04_์ตœ์ ํ™” ๊ทœ์ค€
๏ถ ์–‘์žํ™” ์ˆ˜ํ–‰์‹œ ์ผ๋ฐ˜์ ์ธ ์™œ๊ณก ์ฒ™๋„
๏‚ง ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ
→ ์ง‘ํ•ฉ ๋‚ด์˜ ๋ฉค๋ฒ„ ๊ฐ„ ๊ฑฐ๋ฆฌ์˜ ํ•ฉ์„ ์ตœ์†Œํ™”ํ•˜๋Š” ์  ๊ณ„์‚ฐ์— ํ™œ์šฉ
๏‚ง ์ค‘์‹ฌํ‰๊ท 
๏‚ง ์œ ํด๋ฆฌ๋“œ ๊ฑฐ๋ฆฌ ์ด์šฉํ•˜๋Š” ๊ฒฝ์šฐ, ์ฝ”๋“œ ๋ฒกํ„ฐ์˜ ํ‰๊ท ๊ฐ’์ด ์ค‘์‹ฌ๊ฐ’์œผ๋กœ ์ด์šฉ๋จ.
๏‚ง N๊ฐœ์˜ ๋ฒกํ„ฐ์—์„œ ํ‰๊ท ์™œ๊ณก
D ๏€ฝ
1
N
N
๏ƒฅ d (x
n
, y i(n) )
n ๏€ฝ1
ํ‰๊ท ์™œ๊ณก์„ ์ตœ์†Œํ™”ํ•˜๋Š” ์ฝ”๋“œ ๋ฒกํ„ฐ ์ง‘ํ•ฉ y = [y1, …, yK] ๋ฅผ ์–ด๋–ป๊ฒŒ ์ฐพ๋Š”๊ฐ€?
: ์ธ์ฝ”๋”ฉ๋˜๋Š” ํ›„๋ณด ๋ฒกํ„ฐ ์ˆ˜ N์€ ํ•ญ์ƒ ์ฝ”๋“œ๋ฒกํ„ฐ ์ˆ˜ K๋ณด๋‹ค ํ›จ์”ฌ ๋งŽ๊ณ , D์™€ ์ฝ”๋“œ ๋ฒกํ„ฐ๊ฐ„
๊ด€๊ณ„๋Š” ๋น„์„ ํ˜•์  → ๋ฐ˜๋ณต์  ์ตœ์ ํ™” ๊ณผ์ • ํ•„์š”
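A minimal sketch of computing the average distortion D (squared Euclidean distance is assumed as d, and the codebook Y is made up):

X = randn(200, 2);                        % N candidate vectors
Y = randn(8, 2);                          % an assumed codebook of K code vectors
D = 0;
for n = 1:size(X,1)
    dists = sum((Y - repmat(X(n,:), size(Y,1), 1)).^2, 2);
    D = D + min(dists);                   % distortion to the nearest code vector
end
D = D / size(X,1);                        % average distortion D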
04_์ตœ์ ํ™” ๊ทœ์ค€
๏ถ ๋ฒกํ„ฐ ์–‘์žํ™”(VQ)์˜ ์„ธ๊ฐ€์ง€ ๋ฌธ์ œ
47
04_์ตœ์ ํ™” ๊ทœ์ค€
๏ถ ๋Œ€ํ‘œ์ ์ธ ํด๋Ÿฌ์Šคํ„ฐ๋ง ์•Œ๊ณ ๋ฆฌ์ฆ˜
๏‚ง EM ์•Œ๊ณ ๋ฆฌ์ฆ˜(Expectation–Maximization) : ๋ฌธ์ œ ์†์— ์ •๋ณด๊ฐ€ ์ˆจ์–ด ์žˆ๋Š” ๊ฒฝ์šฐ์—์„œ ์ตœ์ ํ•ด?
๏‚ง 1) ์ฃผ์–ด์ง„ ๋ฐ์ดํ„ฐ์—์„œ ์ถ”์ •๊ฐ’ ๊ณ„์‚ฐ (Expectation)
๏‚ง 2) ์ถ”์ •๋œ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์˜ค์ฐจ๊ฐ’ ์ตœ์ ํ™” (Maximization)
๏‚ง ์˜ค์ฐจ๊ฐ€ ์ง€์—ญ์ ์ธ ์ตœ์ ๊ฐ’์— ์ˆ˜๋ ดํ•  ๋•Œ๊นŒ์ง€ 1)๊ณผ 2)๋ฅผ ๋ฐ˜๋ณต(iteration)
๏‚ง ์ดˆ๊ธฐ๊ฐ’ ์„ ํƒ์ด ์ตœ์ ํ™” ์„ฑ๊ณต์˜ ๊ฒฐ์ •์ ์ธ ์š”์†Œ๊ฐ€ ๋จ.
๏‚ง K-Means ์•Œ๊ณ ๋ฆฌ์ฆ˜
๏‚ง ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ํ˜•ํƒœ์˜ EM ์•Œ๊ณ ๋ฆฌ์ฆ˜
๏‚ง 1) ์ž„์˜์˜ ์ดˆ๊ธฐ๊ฐ’ ์ค‘์‹ฌ์œผ๋กœ ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ์„ ํƒํ•˜์—ฌ ๊ฒฐ์ •
๏‚ง 2) ๊ตฌ์„ฑ๋œ ํด๋Ÿฌ์Šคํ„ฐ ๋‚ด์—์„œ ๋ฒกํ„ฐ ์ง‘ํ•ฉ์˜ ์ค‘์‹ฌ์  ๊ณ„์‚ฐ
๏‚ง ์–‘์žํ™” ์˜ค์ฐจ๊ฐ€ ์ˆ˜๋ ดํ•  ๋•Œ๊นŒ์ง€ 1)๊ณผ 2)๋ฅผ ๋ฐ˜๋ณต
48
04_์ตœ์ ํ™” ๊ทœ์ค€
๏ถ k-means ์•Œ๊ณ ๋ฆฌ์ฆ˜
๏‚ง ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ฃผ์–ด์ง€๋Š” criterion function์„ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ตœ์†Œํ™”์‹œํ‚ค๋Š” ๊ตฐ์ง‘ํ™”
(clustring) ๊ณผ์ •.
C
1
2
J MSE ๏€ฝ ๏ƒฅ ๏ƒฅ x ๏€ญ ๏ญ i
where μ i ๏€ฝ
๏ƒฅx
i ๏€ฝ1 x ๏‚ป ๏ท i
N
x ๏‚ป๏ท i
๏‚ง ํด๋Ÿฌ์Šคํ„ฐ์˜ ๊ฐœ์ˆ˜๋ฅผ ๊ฒฐ์ •
๏‚ง ์ž„์˜์˜ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ์„ ํ• ๋‹นํ•˜์—ฌ ํด๋Ÿฌ์Šคํ„ฐ ์ดˆ๊ธฐํ™”
๏‚ง ๊ฐ๊ฐ์˜ ํด๋Ÿฌ์Šคํ„ฐ์— ๋Œ€ํ•˜์—ฌ ํ‘œ๋ณธ ํ‰๊ท ์„ ๊ตฌํ•จ
๏‚ง ๊ฐ๊ฐ์˜ ํ‘œ๋ณธ์„ ๊ฐ€์žฅ ๊ทผ์ ‘ํ•œ ํ‰๊ท ์„ ๊ฐ–๋Š” ํด๋Ÿฌ์Šคํ„ฐ๋กœ ๋‹ค์‹œ ํ• ๋‹นํ•จ
๏‚ง ๋ชจ๋“  ํ‘œ๋ณธ์— ๋ณ€ํ™”๊ฐ€ ์—†์œผ๋ฉด ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ค‘์ง€ ๋˜๋Š” 3๋ฒˆ ๋‹จ๊ณ„๋กœ ์ด๋™ํ•˜์—ฌ ๊ณ„์† ์ˆ˜ํ–‰
49
05_The k-means algorithm and the EM algorithm
❖ The K-means algorithm
05_The k-means algorithm and the EM algorithm
❖ Strengths and weaknesses of the K-means algorithm
▪ Very sensitive to the initial values of the center vectors assigned
to the clusters.
▪ The reliability of the generated codebook is comparatively high, but
the running time can be long.
▪ The distortion must be computed against all K initial code vectors.
06_๋น„๊ท ์ผ ์ด์ง„ ๋ถ„ํ•  (Non-Uniform Binary Split)
๏ถ ๋น„๊ท ์ผ ์ด์ง„ ๋ถ„ํ•  (Non-Uniform Binary Split)
๏‚ง ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์„ ์ˆœ์ฐจ์ ์œผ๋กœ ํด๋Ÿฌ์Šคํ„ฐ์™€ ๋ถ€ํด๋Ÿฌ์Šคํ„ฐ(sub-cluster)๋กœ ๋ถ„ํ• 
๏‚ง ์ดˆ๊ธฐ์— ๋ฐ์ดํ„ฐ ์ง‘ํ•ฉ์„ ๋‘ ๊ฐœ๋กœ ๋‚˜๋ˆ„๋ฉฐ, K๊ฐœ๊ฐ€ ์ƒ์„ฑ๋  ๋•Œ๊นŒ์ง€ ๋ฐ˜์œผ๋กœ์˜ ๋ถ„ํ• ๊ณผ์ • ๋ฐ˜๋ณต
๏‚ง ๊ณ„์ธต์  ๋ถ„ํ•  ๊ธฐ๋ฒ•
๏‚ง ์ด์ง„๋ถ„ํ• ๊ธฐ๋ฒ• (binary split)
๏‚ง ํด๋Ÿฌ์Šคํ„ฐ ์ค‘ ์™œ๊ณก(distortion) ๊ฐ’์ด ๊ฐ€์žฅ ํฐ ํด๋Ÿฌ์Šคํ„ฐ๋งŒ ์„ ํƒํ•ด์„œ ๋ถ„ํ• 
๏ƒ˜๊ท ์ผ ์ด์ง„๋ถ„ํ• ์€ ์™œ๊ณก๊ฐ’์— ๊ด€๊ณ„์—†์ด ๋ฌด์กฐ๊ฑด ๋ชจ๋“  ํด๋Ÿฌ์Šคํ„ฐ๋ฅผ ๋ถ„ํ• ํ•˜๋Š” ๊ธฐ๋ฒ•.
52
06_๋น„๊ท ์ผ ์ด์ง„ ๋ถ„ํ• 
๏ถ ์ด์ง„ ๋ถ„ํ•  (binary split) ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ˆ˜ํ–‰ ์ ˆ์ฐจ
๏‚ง ๋‘ ๊ฐœ์˜ ํด๋Ÿฌ์Šคํ„ฐ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•ด์„œ K ๊ฐœ์˜ ํด๋Ÿฌ์Šคํ„ฐ ์ง‘ํ•ฉ์ด ์ƒ์„ฑ๋  ๋•Œ๊นŒ์ง€ ์ˆ˜ํ–‰
๊ฐ ํด๋Ÿฌ์Šคํ„ฐ ๋‚ด์˜ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ์™€ ์ค‘์‹ฌ์ ๊นŒ์ง€์˜ ๊ฑฐ๋ฆฌ์˜ ํ‰๊ท ์ด ๊ฐ€์žฅ ํฐ
(์ •ํ™•๋„๋Š” ๋–จ์–ด์ง€์ง€๋งŒ ์ˆ˜ํ–‰์‹œ๊ฐ„์€ ํ›จ์”ฌ ์งง์Œ.)
y i ๏€ฝ c ( X a ),
y k ๏€ซ1 ๏€ฝ c ( X b )
53
06_๋น„๊ท ์ผ ์ด์ง„ ๋ถ„ํ•  (Non-Uniform Binary Split)
๏ถ ๋น„๊ท ์ผ ์ด์ง„ ๋ถ„ํ•  (Non-Uniform Binary Split)
๏‚ง ํด๋Ÿฌ์Šคํ„ฐ ์ค‘ ์™œ๊ณก(distortion) ๊ฐ’์ด ๊ฐ€์žฅ ํฐ ํด๋Ÿฌ์Šคํ„ฐ๋งŒ ์„ ํƒํ•ด์„œ ๋ถ„ํ• 
๏‚ง log K ๋ฒˆ์˜ ์ด์ง„ ๊ฒ€์ƒ‰ ํ†ตํ•ด ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ์ค‘์‹ฌ ์ฐพ์Œ.
๏‚ง ์ฆ‰, k-means๋ณด๋‹ค ์ˆ˜ํ–‰์†๋„๊ฐ€ ๋น ๋ฆ„.
๏‚ง ํด๋Ÿฌ์Šคํ„ฐ ๊ฐ„ ๊ฒฝ๊ณ„(boundary)๊ฐ€ ์ผ๋‹จ ๊ฒฐ์ •๋˜๋ฉด ํด๋Ÿฌ์Šคํ„ฐ ๊ฒฝ๊ณ„๊ฐ€ ๋ฐ”๋€Œ์ง€ ์•Š์Œ.
๏‚ง k-means์˜ ๊ฒฝ์šฐ ๋งค๋ฒˆ ์ˆ˜ํ–‰ ์‹œ ๊ฐฑ์‹ ๋จ.
๏‚ง ์ˆ˜ํ–‰ ์‹œ๊ฐ„์€ ๋น ๋ฅด์ง€๋งŒ ์ƒ์„ฑ๋˜๋Š” ์ฝ”๋“œ๋ถ์˜ ์‹ ๋ขฐ๋„๊ฐ€ ๋‹ค์†Œ ๋‚ฎ์•„์ง€๋Š” ๋‹จ์ ์ด ์žˆ์Œ.
k-mans ํ†ตํ•ด ํด๋Ÿฌ์Šคํ„ฐ ์ฐพ๊ณ  ๋‘ ํด๋ž˜์Šค์˜ ์ค‘์‹ฌ์„ ์—ฐ๊ฒฐํ•˜๋ฉฐ, ์ด ์ง์„ ์„ ์ˆ˜์ง ์ด๋“ฑ๋ถ„ํ•˜๋Š” ์„ ๋ถ„์ด
์ตœ์  ๊ฒฝ๊ณ„. ๊ฐ ํด๋Ÿฌ์Šคํ„ฐ ์ค‘์‹ฌ์— ๊ฐ€๊นŒ์šด ์ ์„ ํ•ด๋‹น ํด๋Ÿฌ์Šคํ„ฐ์— ํฌํ•จ
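A minimal sketch of the non-uniform binary split under these assumptions: squared Euclidean distortion, and each split performed by 2-means (matching the perpendicular-bisector boundary described above; assumes the Statistics Toolbox kmeans):

X = randn(300, 2); K = 8;                % data and target codebook size
labels = ones(size(X,1), 1); nclust = 1; % start with a single cluster
while nclust < K
    dist = zeros(nclust, 1);             % total distortion of each cluster
    for c = 1:nclust
        Xc = X(labels == c, :);
        mu = mean(Xc, 1);
        dist(c) = sum(sum((Xc - repmat(mu, size(Xc,1), 1)).^2, 2));
    end
    [~, worst] = max(dist);              % only the worst cluster is split
    idx = find(labels == worst);
    sub = kmeans(X(idx,:), 2);           % binary split via 2-means
    nclust = nclust + 1;
    labels(idx(sub == 2)) = nclust;      % one half becomes a new cluster
end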
06_๋น„๊ท ์ผ ์ด์ง„ ๋ถ„ํ• 
55
07_Comparing and improving k-means and binary split: the LBG algorithm
❖ Using the binary-split result as the initial values of k-means
▪ Using the centers obtained from the binary split as the initial values
of k-means improves performance somewhat over standard k-means.
▪ The algorithm that combines binary split and k-means: the LBG
algorithm (a short sketch follows below).
08_MATLAB practice