Linear discriminant analysis

Introduction to Pattern Recognition for Human ICT
Linear discriminant analysis
2014. 11. 14
Hyunki Hong
Contents
• Linear discriminant analysis, two classes
• Linear discriminant analysis, C classes
• LDA vs. PCA
• Limitations of LDA
Linear discriminant analysis, two classes
• Objective
– LDA seeks to reduce dimensionality while preserving as
much of the class discriminatory information as possible.
– Assume we have a set of D-dimensional samples {x(1), x(2), …, x(N)}, N1 of which belong to class ω1 and N2 to class ω2.
– We seek to obtain a scalar y by projecting the samples x onto a line: y = w^T x.
– Of all the possible lines we would like to select the one that maximizes the separability of the projected scalars.
– In order to find a good projection vector, we need to define a measure of separation.
– The mean vector of each class in x-space and y-space is
  μi = (1/Ni) Σ_{x∈ωi} x   and   μ̃i = (1/Ni) Σ_{y∈ωi} y = w^T μi
– We could then choose the distance between the projected means as our objective function:
  J(w) = |μ̃1 − μ̃2| = |w^T (μ1 − μ2)|
– However, the distance between projected means is not a good measure, since it does not account for the standard deviation within the classes.
  Figure: of two candidate projection axes, one yields better class separability even though the other has a larger distance between the projected means.
• Fisher's solution
– Fisher suggested maximizing the difference between the means, normalized by a measure of the within-class scatter.
– For each class we define the scatter, an equivalent of the variance, as
  s̃i² = Σ_{y∈ωi} (y − μ̃i)²
  where the quantity (s̃1² + s̃2²) is called the within-class scatter of the projected examples.
– The Fisher linear discriminant is defined as the linear function w^T x that maximizes the criterion function
  J(w) = |μ̃1 − μ̃2|² / (s̃1² + s̃2²)
– Therefore, we are looking for a projection where examples from the same class are projected very close to each other and, at the same time, the projected means are as far apart as possible.
• To find the optimum w*, we must express J(w) as a function of w.
– First, we define a measure of the scatter in feature space x:
  Si = Σ_{x∈ωi} (x − μi)(x − μi)^T,   S_W = S1 + S2
  where S_W is called the within-class scatter matrix.
– The scatter of the projection y can then be expressed as a function of the scatter matrix in feature space x:
  s̃i² = Σ_{y∈ωi} (y − μ̃i)² = Σ_{x∈ωi} (w^T x − w^T μi)² = w^T Si w  ⇒  s̃1² + s̃2² = w^T S_W w
– Similarly, the difference between the projected means can be expressed in terms of the means in the original feature space:
  (μ̃1 − μ̃2)² = (w^T μ1 − w^T μ2)² = w^T (μ1 − μ2)(μ1 − μ2)^T w = w^T S_B w
  The matrix S_B is called the between-class scatter. Note that, since S_B is the outer product of two vectors, its rank is at most one.
– We can finally express the Fisher criterion in terms of S_W and S_B as
  J(w) = (w^T S_B w) / (w^T S_W w)
– To find the maximum of J(w) we differentiate with respect to w and equate to zero. Using the quotient rule
  d/dx (u/v) = ( v·(du/dx) − u·(dv/dx) ) / v²
  this gives
  (w^T S_W w)(2 S_B w) − (w^T S_B w)(2 S_W w) = 0
– Dividing by w^T S_W w:
  S_B w − J(w) S_W w = 0  ⇒  S_W^{-1} S_B w − J(w) w = 0
– Solving the generalized eigenvalue problem (S_W^{-1} S_B w = J w) yields
  w* = argmax_w J(w) = S_W^{-1} (μ1 − μ2)
– This is known as Fisher's linear discriminant (1936), although it is not a discriminant but rather a specific choice of direction for the projection of the data down to one dimension.
Example
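As a small illustration of the two-class solution above (a sketch added here; it is not the example from the original slide, and all variable names are illustrative), the MATLAB lines below compute Fisher's direction w* = S_W^-1(μ1 − μ2) for two synthetic Gaussian classes and evaluate the criterion J(w).

% Two-class Fisher LDA sketch (illustrative; 2-D Gaussian toy data)
N  = 100;
X1 = randn(N,2) + repmat([0 0],N,1);        % class 1 samples (one per row)
X2 = randn(N,2) + repmat([4 2],N,1);        % class 2 samples
mu1 = mean(X1)'; mu2 = mean(X2)';           % class mean vectors
Sw  = (N-1)*cov(X1) + (N-1)*cov(X2);        % within-class scatter S_W = S1 + S2
w   = Sw \ (mu1 - mu2);                     % Fisher direction w* = Sw^-1 (mu1 - mu2)
w   = w / norm(w);                          % normalize (the scale of w does not matter)
y1  = X1*w; y2 = X2*w;                      % 1-D projections y = w' * x
J   = (mean(y1)-mean(y2))^2 / ((N-1)*var(y1) + (N-1)*var(y2));  % Fisher criterion J(w)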
LDA, C classes
• Fisher's LDA generalizes gracefully for C-class problems
– Instead of one projection y, we will now seek (C−1) projections [y1, y2, …, y_{C−1}] by means of (C−1) projection vectors w_i arranged by columns into a projection matrix W = [w1 | w2 | … | w_{C−1}]:
  y_i = w_i^T x  ⇒  y = W^T x
• Derivation
– The within-class scatter generalizes as
  S_W = Σ_{i=1}^{C} Si,   where  Si = Σ_{x∈ωi} (x − μi)(x − μi)^T  and  μi = (1/Ni) Σ_{x∈ωi} x
– And the between-class scatter becomes
  S_B = Σ_{i=1}^{C} Ni (μi − μ)(μi − μ)^T,   where  μ = (1/N) Σ_x x  is the overall mean
– Similarly, we define the mean vectors and scatter matrices for the projected samples as
  μ̃i = (1/Ni) Σ_{y∈ωi} y,   S̃_W = Σ_{i=1}^{C} Σ_{y∈ωi} (y − μ̃i)(y − μ̃i)^T,   S̃_B = Σ_{i=1}^{C} Ni (μ̃i − μ̃)(μ̃i − μ̃)^T
- From our derivation for the two-class problem, we can write
  S̃_W = W^T S_W W   and   S̃_B = W^T S_B W
- Recall that we are looking for a projection that maximizes the ratio of between-class to within-class scatter. Since the projection is no longer a scalar (it has C−1 dimensions), we use the determinant of the scatter matrices to obtain a scalar objective function:
  J(W) = |S̃_B| / |S̃_W| = |W^T S_B W| / |W^T S_W W|
- And we will seek the projection matrix W* that maximizes this ratio.
- It can be shown that the optimal projection matrix W* is the one whose columns are the eigenvectors corresponding to the largest eigenvalues of the generalized eigenvalue problem
  S_B w_i* = λ_i S_W w_i*   (i.e., the leading eigenvectors of S_W^{-1} S_B); a small sketch follows below.
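The following MATLAB sketch (added here for illustration, not part of the original slides) carries out this C-class recipe: it accumulates S_W and S_B from labeled data and keeps the C−1 leading generalized eigenvectors. The inputs `X` (an N-by-D sample matrix, one sample per row) and `labels` (an N-by-1 vector of class indices) are assumed.

% C-class LDA sketch: W spans the leading eigenvectors of the pair (Sb, Sw) (illustrative)
C  = max(labels); D = size(X,2);
mu = mean(X);                                % overall mean
Sw = zeros(D); Sb = zeros(D);
for i = 1:C
    Xi  = X(labels==i,:);                    % samples of class i
    Ni  = size(Xi,1);
    mui = mean(Xi);
    Sw  = Sw + (Ni-1)*cov(Xi);               % within-class scatter
    Sb  = Sb + Ni*(mui-mu)'*(mui-mu);        % between-class scatter
end
[V,L] = eig(Sb, Sw);                         % generalized eigenvalue problem Sb*v = lambda*Sw*v
[~,idx] = sort(diag(L),'descend');           % order directions by eigenvalue
W = V(:,idx(1:C-1));                         % at most C-1 useful projections
Y = X*W;                                     % projected features, one sample per row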
Limitations of LDA
• LDA produces at most ๐ถ − 1 feature projections.
– If the classification error estimates establish that more
features are needed, some other method must be employed
to provide those additional features.
• LDA is a parametric method (it assumes unimodal
Gaussian likelihoods).
– If the distributions are significantly non-Gaussian, the LDA
projections may not preserve complex structure in the data
needed for classification.
LDA will also fail if discriminatory
information is not in the mean but in the
variance of the data.
Lecture contents
• Feature extraction
  - Feature extraction by linear transformation
• Principal component analysis (PCA)
  - Covariance analysis and the whitening transform
  - PCA algorithm, mathematical derivation, properties and limitations
• Linear discriminant analysis (LDA)
  - LDA algorithm
  - LDA properties and limitations
• Experiments using MATLAB
1. Feature extraction by linear transformation
• Objective 1: extract only the information that is essential for recognition → remove unnecessary information.
• Objective 2: the curse of dimensionality (a cause of degraded recognition rates) → reduce the dimensionality of the data ("dimensionality reduction").
• n-dimensional input data x → mapping by a linear transformation W → m-dimensional feature vector y (m ≪ n): subspace analysis.
• Feature extraction is a key factor that determines the performance of a recognizer.
1. Feature extraction by linear transformation
Figure: (a) projecting x = [2, 2]^T with W = [1, 0]^T gives y = 2, the coordinate of x along the direction of W; (b) projecting x = [1, 2]^T onto a general direction w gives the magnitude of x along w, obtained with the transformation matrix w.
1. Feature extraction by linear transformation
✓ Feature extraction for the whole data set X: X is an n×N matrix (one sample per column) and Y = [y1, …, yN] is the resulting m×N feature matrix.
Feature extraction from a 2-D data set → 1-D features.
Figure: the same 2-D data set X projected onto one dimension by (a) W = [1, 0]^T and (b) W = [2, 1]^T; the resulting features W^T X have different distributions.
Feature extraction by linear transformation: obtaining low-dimensional feature values by projecting the given data in the directions determined by the transformation matrix W. Each column of W is a basis vector of the low-dimensional subspace onto which the data are projected, and together these columns define that subspace.
✓ The distribution of the extracted features depends on the transformation matrix.
Feature extraction from a 3-D data set → features obtained by projection onto a 2-D subspace.
Figure: (a) a 3-D data set (axes x1, x2, x3) and (b)–(d) the 2-D feature distributions obtained by projecting it onto three different 2-D subspaces.
์ข‹์€ ํŠน์ง• ์ถ”์ถœ → ๋ชฉ์ ์— ๋งž๋Š” ๋ถ€๋ถ„๊ณต๊ฐ„ ์ฐพ๊ธฐ: ๋ณ€ํ™˜ํ–‰๋ ฌ W (๋˜๋Š” ๋ณ€ํ™˜ํ–‰๋ ฌ์˜ ๊ฐ ์—ด๋กœ
๋‚˜ํƒ€๋‚ด๋Š” ๋ถ€๋ถ„๊ณต๊ฐ„์˜ ๊ธฐ์ €)๋ฅผ ์ ์ ˆํžˆ ์กฐ์ •ํ•˜์—ฌ ๋ถ„์„ ๋ชฉ์ ์— ๋งž๋Š” ํŠน์ง• ์ถ”์ถœ
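As a small illustration of this idea (a sketch added here, not from the original slides), the following MATLAB lines project one 2-D data set onto two different 1-D subspaces, similar to the W = [1, 0]^T and W = [2, 1]^T directions in the figure above, to show that the choice of W changes the feature distribution.

% Projecting a 2-D data set onto two different 1-D subspaces (illustrative)
X  = randn(2,200);                 % 2-by-N data matrix, one sample per column
W1 = [1; 0];                       % basis of the first subspace
W2 = [2; 1]/norm([2; 1]);          % basis of the second subspace (unit length)
Y1 = W1'*X;                        % 1-D features obtained with W1
Y2 = W2'*X;                        % 1-D features obtained with W2
% hist(Y1) and hist(Y2) show different feature distributions for the same data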
2. Principal Component Analysis (PCA)
Analyze the covariance of the data and use its properties to choose the linear transformation.
✓ Statistics of the feature vectors
  - Mean: assumed to be the zero vector (i.e., the data are centered).
  - Covariance: instead of using the input data X directly, the mean is subtracted from each sample xi (centering: the matrix with the mean removed) and the centered data are used.
✓ Relation between the covariance of the data and the covariance of the linearly transformed features: the covariance of the feature data set is obtained by transforming the covariance of the input data with W, i.e. Σ_Y = W^T Σ_X W (a numerical check follows below).
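The relation Σ_Y = W^T Σ_X W can be checked numerically; the MATLAB lines below are a small sketch added for that purpose (the data and W are arbitrary).

% Numerical check of Sigma_Y = W' * Sigma_X * W (illustrative)
X  = randn(500,3)*[2 0 0; 0.5 1 0; 0 0 0.3];   % 500 samples of 3-D data, one per row
W  = orth(randn(3,2));                          % an arbitrary 3-by-2 transformation matrix
Y  = X*W;                                       % transformed features
SX = cov(X); SY = cov(Y);
disp(norm(SY - W'*SX*W));                       % close to zero up to sampling noise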
2-1. Covariance and classifiers
✓ When the covariance is the identity matrix, the Bayesian classifier reduces to minimum-distance (Euclidean-distance) classification.
✓ When the covariance is a diagonal matrix, the normalized Euclidean distance applies: there is no correlation between the components of the data and only the variance of each component matters, so each component is divided by its standard deviation before the Euclidean distance is computed (a small numeric example follows below).
⇒ If W is constructed so that the transformed covariance becomes the identity matrix (or a diagonal matrix), the computations become simple.
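A small numeric example of the normalized Euclidean distance (added here for illustration; the numbers are arbitrary):

% Normalized Euclidean distance: divide each component by its standard deviation
x  = [1.0 2.0 3.0]; y = [2.0 0.0 3.5];     % two sample points
sd = [2.0 4.0 0.5];                        % per-component standard deviations
d_euclid = norm(x - y);                    % plain Euclidean distance
d_norm   = norm((x - y)./sd);              % normalized Euclidean distance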
2-2. Diagonalizing transform
Φ: the matrix whose columns are the eigenvectors of the covariance Σ_X. (Given a matrix A_{N×N}, we say that v is an eigenvector if there exists a scalar λ, the eigenvalue, such that Av = λv.)
Λ: the matrix having the eigenvalues of Σ_X as its diagonal elements (for the case where the eigenvectors are mutually orthogonal and of unit length).
When Y = Φ^T X, the covariance of Y is Σ_Y = Φ^T Σ_X Φ = Λ, a diagonal matrix.
Diagonalization transform: when eigen-analysis of the covariance matrix of the input data X yields mutually orthogonal, unit-length eigenvectors, the data Y obtained by the linear transformation with the matrix Φ whose columns are those eigenvectors have a diagonal covariance matrix.
2-3. Whitening transform
: an additional transformation is applied so that the covariance matrix becomes the identity matrix.
Since the covariance matrix of Y is the diagonal matrix Λ, a new feature vector Z is obtained by a further linear transformation with the matrix Λ^{-1/2}, whose diagonal elements are the reciprocals of the square roots of the diagonal elements of Λ.
Overall, W = Φ Λ^{-1/2} (using the relation Σ_Z = Λ^{-1/2} Φ^T Σ_X Φ Λ^{-1/2} = Λ^{-1/2} Λ Λ^{-1/2} = I, the identity matrix).
Whitening transform:
▪ the complicated correlation structure of the distribution disappears;
▪ minimum-error classification can then be obtained with a minimum-distance classifier.
: the linear transformation built from the eigenvalue matrix Λ and the eigenvector matrix Φ of the input data X yields a feature data set whose covariance is the identity matrix, as in the sketch below.
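A minimal MATLAB sketch of the diagonalizing and whitening transforms just described (added for illustration; the data are synthetic and centered first):

% Diagonalizing and whitening transforms (illustrative sketch)
X  = randn(1000,2)*[2 1; 0 1];             % correlated 2-D data, one sample per row
Xc = X - repmat(mean(X),size(X,1),1);      % centering
[Phi,Lambda] = eig(cov(Xc));               % eigenvectors Phi and eigenvalues Lambda
Y = Xc*Phi;                                % diagonalizing transform: cov(Y) is diagonal
Z = Xc*Phi*diag(1./sqrt(diag(Lambda)));    % whitening transform: cov(Z) is (nearly) the identity
disp(cov(Y)); disp(cov(Z));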
2-3. Example of the diagonalizing and whitening transforms
Figure: (a) the original data X; (b) the diagonalizing transform Y = Φ^T X, which expresses X in the basis of its eigenvectors (the direction in which the data variance is largest and the direction perpendicular to it); (c) the whitening transform Z = Λ^{-1/2} Φ^T X, which expresses X in the basis of the eigenvectors scaled by the square roots of the eigenvalues.
2-4. The principal component analysis (PCA) algorithm
(input data: n dimensions → transformed data: m dimensions)
How can the information carried by the original data x be preserved as much as possible after the dimensionality is reduced?
→ It is desirable to project in the direction along which the data set spreads as widely as possible.
→ Perform the linear transformation toward the directions in which the data variance is largest.
→ The largest eigenvector and eigenvalue of the covariance matrix give the direction of largest variance and the variance itself.
⇒ Find the eigenvalues and eigenvectors of the data covariance matrix, take the m eigenvectors corresponding to the largest eigenvalues in decreasing order, and build the matrix W from them.
2-5. Steps of the PCA algorithm
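The step-by-step list of the original slide is not reproduced in this text; as a stand-in (under that caveat), the following MATLAB sketch walks through the standard PCA procedure described above: centering, eigen-analysis of the covariance, selection of the m leading eigenvectors, and projection.

% Standard PCA procedure (illustrative sketch; X is N-by-n, one sample per row)
X  = randn(200,5);                         % example input data
m  = 2;                                    % target feature dimension
Xc = X - repmat(mean(X),size(X,1),1);      % 1) center the data
S  = cov(Xc);                              % 2) covariance matrix
[V,D] = eig(S);                            % 3) eigenvalues and eigenvectors
[~,idx] = sort(diag(D),'descend');         % 4) sort the eigenvalues in decreasing order
W  = V(:,idx(1:m));                        % 5) keep the m leading eigenvectors as columns of W
Y  = Xc*W;                                 % 6) project the data: m-dimensional features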
2-6. PCA results for 3-D data
Figure: (a) the input data; (b) feature extraction with the first principal component vector; (c) feature extraction with the first and second principal component vectors; (d) feature extraction with the first, second, and third principal component vectors ("the diagonalizing transform").
2-7. Mathematical derivation of PCA
• Representation of the input data in terms of a basis vector set {u1, u2, …, un} (an orthonormal basis: mutually orthogonal, unit length).
• Approximation of the data using only m (m < n) of the basis vectors: dimensionality reduction.
• S: assuming the data are centered to zero mean, S is the covariance matrix.
• The amount of information lost by the choice of basis vectors defines the objective function J; the basis vectors that minimize J are found with a Lagrange multiplier.
• Differentiating the objective function with respect to u and setting the result to zero shows that u and λ are an eigenvector and an eigenvalue of the matrix S.
• An eigenvalue of the covariance matrix is the variance of the feature values obtained by projecting the data onto the corresponding eigenvector (written out below).
Lagrange multiplier
• The method of Lagrange multipliers is used when one wishes to maximize or minimize a function subject to fixed outside conditions or constraints.
• In mathematical optimization, the Lagrange multiplier method is a strategy for finding the local maxima and minima of a function subject to equality constraints.
• For instance, consider the optimization problem maximize f(x,
y) subject to g(x, y) = c.
Find x and y to maximize f(x, y) subject to a
constraint (shown in red) g(x, y) = c.
Contour map. The red line shows the constraint
g(x, y) = c. The blue lines are contours of f(x, y).
The point where the red line tangentially touches a
blue contour is our solution. Since d1 > d2, the
solution is a maximization of f(x, y).
Lagrange multiplier
• We need both f and g to have continuous first partial derivatives.
• Introduce a new variable (λ), called a Lagrange multiplier, and study the Lagrange function (or Lagrangian) defined by
  Λ(x, y, λ) = f(x, y) + λ · (g(x, y) − c)
  where the λ term may be either added or subtracted. If f(x0, y0) is a maximum of f(x, y) for the original constrained problem, then there exists λ0 such that (x0, y0, λ0) is a stationary point for the Lagrange function (stationary points are those points where the partial derivatives of Λ are zero, i.e. ∇Λ = 0).
A stationary point or critical point is a point of the domain
of a differentiable function, where the derivative is zero.
Lagrange multiplier
• consider the 2D problem: maximize f(x, y) subject to g(x, y) = c.
• Lagrange multipliers relies on the intuition that at a maximum f(x, y)
cannot be increasing in the direction of any neighboring point where g
= c.
• Suppose we walk along the contour line with g = c. We are interested
in finding points where f does not change as we walk, since these
points might be a maximum.
→ we could be following a contour line of f, since by definition f does
not change as we walk along its contour lines. This would mean
that the contour lines of f and g are parallel here
: ∇f = −λ ∇g for some λ, where ∇f and ∇g are the respective gradients of f and g.
• Auxiliary (Lagrange) function: Λ(x, y, λ) = f(x, y) + λ · (g(x, y) − c), and solve ∇Λ = 0.
• Example 1: Suppose we wish to maximize f(x, y) = x + y subject to the constraint x² + y² = 1. The feasible set is the unit circle, and the level sets of f are diagonal lines (with slope −1), so we can see graphically that the maximum occurs at (√2/2, √2/2) and that the minimum occurs at (−√2/2, −√2/2).
1. Using the method of Lagrange multipliers, g(x, y) − c = x² + y² − 1, so
   Λ(x, y, λ) = f(x, y) + λ·(g(x, y) − c) = x + y + λ·(x² + y² − 1).
2. Setting the gradient ∇Λ = 0 yields the system of equations:
   1 + 2λx = 0,   1 + 2λy = 0,   x² + y² − 1 = 0.
   Substituting the first two equations (x = y = −1/(2λ)) into the constraint gives λ = ±1/√2, so the stationary points are (√2/2, √2/2) and (−√2/2, −√2/2).
3. Evaluating the objective function f at these points yields the maximum √2 and the minimum −√2.
2-7. Mathematical derivation of PCA (continued)
• The proportion of information that can be represented by m features (equivalently, the weight of the information that is lost) is measured by the ratio of the eigenvalue sums, θ(m) = (λ1 + … + λm)/(λ1 + … + λn); this determines the number of eigenvectors to select (the feature dimension m).
• If θ = 0.98, the information loss is at most 2% of the total.
2-8. Properties and limitations of PCA
✓ PCA → the most reasonable criterion for dimensionality reduction when there is no specific goal for the data analysis.
✓ PCA is unsupervised learning.
▪ It does not use class label information → it can lose the information that is essential for classification.
  Figure: after projection onto the first principal component, the histograms of the data of the two classes overlap almost completely.
2-8. Properties and limitations of PCA
✓ Limits of linear transformations
▪ A linear transformation cannot capture nonlinear structure in the data.
  Figure: (a)–(b) a data set with nonlinear structure and its principal components 1 and 2; no projection direction properly represents the 2-D structure of the data → kernel functions and nonlinear learning methods are needed.
3. Linear Discriminant Analysis (LDA)
✓ Purpose of LDA
▪ When defining the objective function for optimizing the transformation matrix W, actively use the class label information.
▪ Reduce the dimensionality along directions in which the classes are well discriminated.
As with PCA, the features are extracted by a linear transformation.
Figure: features projected along direction (a) give better classification performance than features projected along direction (b).
✓ What is a feature direction suitable for classification?
▪ One that keeps the classes as far apart from each other as possible.
3-1. LDA: the two-class problem
• Objective function based on the distance between classes: the larger the numerator (the farther apart the classes) and the smaller the denominator (the more tightly the samples of each class are clustered), the easier the classification becomes.
• The mean of the 1-D feature values obtained by projecting the data of class Ck onto the vector w equals the projection of the mean vector mk of class Ck onto the direction of w.
• Using the covariance of the transformed feature data, the objective function can be written in terms of the between-class scatter matrix and the within-class scatter matrix.
• Optimizing the objective function shows that the vector w is proportional to S_W^{-1} times the difference of the two class mean vectors (see the earlier figure).
3-2. LDA: the multi-class problem
• Objective function: a trace criterion built from the between-class and within-class scatter matrices, where Nk denotes the number of data belonging to class Ck:
  Tr{(W^T S_W W)^{-1} (W^T S_B W)} = Tr{W^{-1} S_W^{-1} (W^T)^{-1} W^T S_B W} = Tr{W^{-1} S_W^{-1} S_B W}
• The transformation matrix W that maximizes the objective function is obtained from the eigenvectors of S_W^{-1} S_B.
(An orthogonal matrix is a square matrix X whose row vectors (or column vectors) are mutually orthogonal unit vectors (orthonormal vectors): XX^T = X^T X = I, i.e. X^{-1} = X^T.)
3-3. Steps of the LDA algorithm
3-4. Properties and limitations of LDA
✓ Supervised learning: LDA exploits the class label information.
✓ Limits of the linear transformation
▪ When the data have a complicated nonlinear structure, no suitable linear transformation exists.
▪ Approaches based on kernel methods, nonlinear manifold learning, etc. are needed.
✓ The number of eigenvectors to select (the dimension of the reduced features)
▪ It is decided from the classification rate obtained by actually classifying the data.
▪ The number of eigenvectors that can be found from the matrix S_W^{-1} S_B is limited: when the number of classes is M, the rank of S_B is M−1
  → in an M-class problem at most M−1 features can be extracted.
✓ The small sample set problem
▪ When the number of input data is smaller than the input dimension, the inverse of S_W does not exist.
▪ Reduce the dimensionality with PCA first and then apply LDA.
4. Exercises using MATLAB
4-1. Generating the training data
% Generate the training data -------------------------------
N=100;
m1=[0 0]; s1=[9 0;0 1];
m2=[0 4]; s2=[9 0;0 1];
X1=randn(N,2)*sqrtm(s1)+repmat(m1,N,1);    % generate class-1 data
X2=randn(N,2)*sqrtm(s2)+repmat(m2,N,1);    % generate class-2 data
figure(1);
plot(X2(:,1),X2(:,2),'ro'); hold on
plot(X1(:,1),X1(:,2),'*');
axis([-10 10 -5 10]);
save data8_9 X1 X2
4-2. Analysis by PCA
% Analysis by PCA -------------------------------
X=[X1;X2];
M=mean(X); S=cov(X);                       % compute the mean and covariance
[V,D]=eig(S);                              % eigen-analysis (V: eigenvector matrix, D: eigenvalue matrix)
w1=V(:,2);                                 % first principal component vector
figure(1);                                 % draw the first principal component vector in the data space
line([0 w1(1)*D(2,2)]+M(1),[0 w1(2)*D(2,2)]+M(2));
YX1=w1'*X1'; YX2=w1'*X2';                  % project onto the first principal component vector
pYX1=w1*YX1; pYX2=w1*YX2;                  % map the projected data back into the 2-D space
figure(2);                                 % display the back-projected data in the 2-D space
plot(pYX1(1,:), pYX1(2,:)+M(2),'*');
hold on; axis([-10 10 -5 10]);
plot(pYX2(1,:), pYX2(2,:)+M(2),'ro');
4-3. Analysis by LDA
% Analysis by LDA -------------------------------
m1=mean(X1); m2=mean(X2);
Sw=N*cov(X1)+N*cov(X2);                    % compute the within-class scatter
Sb=(m1-m2)'*(m1-m2);                       % compute the between-class scatter
[V,D]=eig(inv(Sw)*Sb);                     % eigen-analysis of Sw^-1 * Sb
w=V(:,2);                                  % the discriminant direction that was found
figure(1);                                 % draw the found vector in the data space
line([0 w(1)*-8]+M(1),[0 w(2)*-8]+M(2));
YX1=w'*X1'; YX2=w'*X2';                    % project onto the direction of w
pYX1=w*YX1; pYX2=w*YX2;                    % map the projected data back into the 2-D space
figure(2);                                 % display the back-projected data in the 2-D space
plot(pYX1(1,:)+M(1), pYX1(2,:),'*');
plot(pYX2(1,:)+M(1), pYX2(2,:),'ro');
4-4. Experimental results
Figure: the generated data, and the PCA and LDA results.
Designing a pattern recognition system
• The face recognition problem
• Training the feature extractor
  - PCA for image data
  - Feature extraction by PCA: eigenfaces
  - Feature extraction by LDA: Fisherfaces
• Designing the classifier
• Implementing the recognizer in MATLAB
1. The face recognition problem
1-1. Face images with various variations
• Angle (pose) variation, expression variation, illumination variation
• Face recognition, expression recognition, pose recognition, …
1-2. Example of the training data
Figure: FERET data; example poses at +60°, 0°, and −60°.
▪ 9 images per person
▪ the pose varies from −60° to +60° in 15° steps
• Training data: 150 images — 3 of the 9 images per person (+60°, 0°, −60°); 50 people in total = 50 classes.
• Test data: 300 images — the remaining 6 pose variations per person; 50 people in total.
1-3. The resized data
1-4. Image resizing
Figure: the 70×50 images (3,500 dimensions) are resized to 40×50 (2,000 dimensions).
Training data set: 150×2000; test data set: 300×2000.
Processing pipeline: X → PCA (dimensionality reduction) → Y → LDA (feature extraction) → Z → nearest-neighbor classifier → result.
2. Training the feature extractor
2-1. PCA for image data
✓ Compute the covariance matrix Σ (2000×2000) from the 150 training images and then perform the eigen-analysis.
⇒ Apply a modified principal component extraction that uses the transpose X^T of the data matrix X: instead of the 2000×2000 covariance of the 150×2000 data matrix X, perform the eigen-analysis of the small 150×150 matrix (X − M)(X − M)^T / N built from the mean-removed data (N is the number of data). Multiplying both sides of its eigenvalue equation by (X − M)^T shows that each of its eigenvectors gives an eigenvector of the covariance matrix Σ with the same eigenvalue, so W is obtained as W = (X − M)^T V.
The resulting W does not satisfy the orthogonal-matrix condition; it is a 2000×150 matrix. The identity behind this trick is spelled out below.
2-2. PCA for image data
✓ Determining the dimension of the feature vector
→ How many eigenvectors, in decreasing order of eigenvalue, should be selected? An empirical approach:
Figure: (a) the eigenvalues plotted against the number of eigenvalues; (b) the ratio of the cumulative eigenvalue sum against the number of eigenvalues; the ratio reaches 0.95 with 69 eigenvalues and 0.98 with 101.
2-3. PCA results: eigenfaces
(The case where W is an orthogonal transformation.)
Eigenface: an eigenvector found by applying PCA to face recognition, displayed as an image.
A face image is then approximately reconstructed as a weighted sum of the eigenfaces,
  x ≈ y1·u1 + y2·u2 + y3·u3 + … + y69·u69,
where the ui denote the eigenface basis images. Since W is not a square matrix but an n×m matrix, it cannot satisfy the orthogonal-matrix condition WW^T = W^T W = I → the amount of information lost depends on the number of feature dimensions (the number of basis images).
2-4. Eigenfaces
Figure: the 70 eigenvectors (eigenfaces) with the largest eigenvalues, in decreasing order of eigenvalue.
2-5. LDA results: Fisherfaces
✓ LDA is applied to the 69-dimensional feature data set Y obtained by PCA; using the class label information, Y is divided into the 50 classes.
After computing S_W and S_B, the eigenvector matrix U (69×49) of S_W^{-1} S_B is computed.
(In the PCA stage W was originally 2000×150, reduced here to 2000×69.)
"Fisherface": the basis images obtained from Fisher's linear discriminant.
3. Implementing the recognizer in MATLAB
3-1. The classifier
✓ Nearest-neighbor classifier with the plain Euclidean distance
▪ For each of the 300 test images, compute the distances to the 150 stored training images and assign the class of the nearest neighbor.
✓ Feature data Y: 69 dimensions; feature data Z: 49 dimensions
▪ Test every possible feature dimension and then select the most suitable one.
✓ Classification is performed separately with the feature data Y and Z.
3-2. Feature extraction (by PCA & LDA)
load data_person_trim              % load the data
Ntrn=size(train_trim,1);           % number of training samples
Ntst=size(test_trim,1);            % number of test samples
Isize=size(train_trim,2);          % dimension of the input vectors
Mcls = 50;                         % number of classes

%%%%% Feature extraction by PCA
X=train_trim;                      % training data matrix
meanfig = mean(X);                 % mean of the training data matrix
M=repmat(meanfig,size(X,1),1);     % the mean vector repeated once per sample
S=(X-M)*(X-M)'/Ntrn;               % the small (150x150) matrix used for the eigen-analysis
[V,D,U]=svd(S);                    % eigen-analysis (via SVD)
W=(X-M)'*V;                        % recover the eigenvectors of the full covariance
3-2. Feature extraction (by PCA & LDA) (continued)
%%%%% Feature extraction by LDA
eval=diag(D);                      % eigenvalues, used to choose the PCA dimension
for i=1:150
    if (sum(eval(1:i))/sum(eval)) > 0.95, break; end
end
Ydim=i;                            % number of PCA dimensions to keep
Wo=orth(W(:,1:Ydim));              % orthogonalize the transformation matrix
Y=(Wo'*(X)')';                     % dimensionality reduction by PCA (training data)
Yt=(Wo'*(test_trim)')';            % dimensionality reduction by PCA (test data)
Sw=zeros(Ydim);                    % compute the within-class scatter matrix
for i=1:Mcls
    C=Y((i-1)*3+1:i*3,:);          % the 3 training samples of class i
    Sw = Sw+3*cov(C);
    m(i,:)=mean(C);
end
Sb=Mcls*cov(m);                    % compute the between-class scatter matrix
[Vf,Df,Uf]=svd(inv(Sw)*Sb);        % find the LDA transformation matrix
Z=(Vf'*Y')';                       % feature extraction by LDA (training data)
Zt=(Vf'*Yt')';                     % feature extraction by LDA (test data)
save feature_person W Vf Y Yt Z Zt;    % save the extracted features
3-3. Face recognition by PCA & LDA
load data_person_trim              % load the data
load feature_person                % load the saved features
Ntrn=size(train_trim,1);           % number of training samples
Ntst=size(test_trim,1);            % number of test samples
Isize=size(train_trim,2);          % dimension of the input vectors
Mcls = 50;                         % number of classes
3-4. Face recognition by PCA & LDA (continued 1)
%%%%% Classification using the PCA features and the LDA features
for dim=1:Mcls-1                       % repeat for every possible feature dimension
  for i=1:Ntst                         % classify each test sample
    yt=Yt(i,1:dim);                    % PCA features of the test sample
    zt=Zt(i,1:dim);                    % LDA features of the test sample
    for j=1:Ntrn                       % compute the distances to the training samples
      dy(j)=norm(yt-Y(j,1:dim)); dz(j)=norm(zt-Z(j,1:dim));
    end
    [minvy, miniy]=min(dy);            % find the nearest neighbor (PCA features)
    [minvz, miniz]=min(dz);            % find the nearest neighbor (LDA features)
    min_labely(i)=train_label(miniy);  % assign the class of the nearest neighbor (PCA)
    min_labelz(i)=train_label(miniz);  % assign the class of the nearest neighbor (LDA)
  end
3-5. Face recognition by PCA & LDA (continued 2)
  error_labely=find(min_labely-test_label);    % count the misclassifications (PCA)
  correcty=Ntst-size(error_labely,2);
  classification_ratey(dim)=correcty/Ntst;     % classification rate (PCA)
  error_labelz=find(min_labelz-test_label);    % count the misclassifications (LDA)
  correctz=Ntst-size(error_labelz,2);
  classification_ratez(dim)=correctz/Ntst;     % classification rate (LDA)
  [classification_ratey(dim), classification_ratez(dim)]
end                                            % end of the loop over the feature dimensions

% Plot the classification rate as a function of the number of features
figure(1);
plot([1:49], classification_ratey); hold on
plot([1:49], classification_ratez,'r');
3-6. Classification rate as a function of the number of features
Figure: classification rate (vertical axis) versus the dimension of the feature vector (horizontal axis), for the LDA features and the PCA features.