Support Vector Machines (SVM)

Introduction to Pattern Recognition for Human ICT
Support Vector Machines
2014. 11. 28
Hyunki Hong
Contents
• Introduction
• Optimal separating hyperplanes
• Kuhn-Tucker Theorem
• Support Vectors
• Non-linear SVMs and kernel methods
Introduction
• Consider the familiar problem of learning a binary classifier from data.
– Assume we are given a dataset (𝑋,π‘Œ) = {(π‘₯1, 𝑦1), (π‘₯2, 𝑦2), …, (π‘₯𝑁, 𝑦𝑁)}, where the goal is to learn a function 𝑦 = 𝑓(π‘₯) that will correctly classify unseen examples.
• How do we find such a function?
– By optimizing some measure of performance of the learned model
• What is a good measure of performance?
– A good measure is the expected risk
R(f) = ∫ C(f(x), y) p(x, y) dx dy
where 𝐢(𝑓, 𝑦) is a suitable cost function, e.g., 𝐢(𝑓, 𝑦) = (𝑓(π‘₯) − 𝑦)Β².
– Unfortunately, the risk cannot be measured directly since the underlying pdf is unknown. Instead, we typically use the risk over the training set, also known as the empirical risk.
• Empirical Risk Minimization
– A formal term for a simple concept: find the function 𝑓(π‘₯) that
minimizes the average risk on the training set.
– Minimizing the empirical risk is not a bad thing to do, provided that sufficient training data is available: the law of large numbers ensures that the empirical risk asymptotically converges to the expected risk as 𝑁→∞ (see the formulas below).
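For reference, a minimal worked form of the two risks (a standard formulation, not shown explicitly on the slide; $p(x, y)$ denotes the unknown joint density):
\[
R(f) = \int C(f(x), y)\, p(x, y)\, dx\, dy,
\qquad
R_{\mathrm{emp}}(f) = \frac{1}{N}\sum_{i=1}^{N} C(f(x_i), y_i),
\]
and the law of large numbers gives $R_{\mathrm{emp}}(f) \to R(f)$ as $N \to \infty$.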
• How do we avoid overfitting?
– By controlling model complexity. Intuitively, we should prefer the simplest model that explains the data.
(Note on "asymptotically": in applied mathematics, asymptotic analysis is used to build numerical methods that approximate equation solutions.)
Optimal separating hyperplanes
• Problem statement
– Consider the problem of finding a separating hyperplane for a linearly separable dataset {(π‘₯1, 𝑦1), (π‘₯2, 𝑦2), …, (π‘₯𝑁, 𝑦𝑁)}, π‘₯ ∈ 𝑅𝐷, 𝑦 ∈ {−1, +1}.
– Which of the infinite hyperplanes should we choose?
1. Intuitively, a hyperplane that passes too close to the
training examples will be sensitive to noise and,
therefore, less likely to generalize well for data outside
the training set.
2. Instead, it seems reasonable to expect that a hyperplane that is farthest from all training examples will have better generalization capabilities.
– Therefore, the optimal separating hyperplane will be the
one with the largest margin, which is defined as the
minimum distance of an example to the decision surface.
• Solution
- Since we want to maximize the margin, let’s express it as a function of
the weight vector and bias of the separating hyperplane.
– From trigonometry, the distance between a point π‘₯ and a plane (𝑀, 𝑏) is d(x) = |w^T x + b| / β€–wβ€–.
– Since the optimal hyperplane admits an infinite number of equivalent solutions (obtained by scaling the weight vector and bias), we choose the solution for which the discriminant function becomes one for the training examples closest to the boundary: |w^T x + b| = 1.
This is known as the canonical hyperplane.
– Therefore, the distance from the closest example to the boundary is 1/β€–wβ€–.
– And the margin becomes 2/β€–wβ€–.
• Therefore, the problem of maximizing the margin is equivalent to minimizing J(w) = Β½β€–wβ€–Β² subject to y_i(w^T x_i + b) β‰₯ 1, i = 1, …, N.
– Notice that 𝐽(𝑀) is a quadratic function, which means that there
exists a single global minimum and no local minima.
• To solve this problem, we will use classical Lagrangian
optimization techniques.
– We first present the Kuhn-Tucker Theorem, which provides an
essential result for the interpretation of Support Vector Machines.
Kuhn-Tucker Theorem
• Given an optimization problem with convex domain 𝛀⊆𝑅𝑁:
minimize f(z), z ∈ 𝛀, subject to g_i(z) ≀ 0 (i = 1, …, k) and h_i(z) = 0 (i = 1, …, m).
– With 𝑓∈𝐢1 convex and 𝑔𝑖, β„Žπ‘– affine, necessary and sufficient conditions for a normal point 𝑧∗ to be an optimum are the existence of 𝛼∗, 𝛽∗ such that
βˆ‚L(z*, Ξ±*, Ξ²*)/βˆ‚z = 0
βˆ‚L(z*, Ξ±*, Ξ²*)/βˆ‚Ξ² = 0
Ξ±_i* g_i(z*) = 0, g_i(z*) ≀ 0, Ξ±_i* β‰₯ 0, for i = 1, …, k
– 𝐿(𝑧, 𝛼, 𝛽) is known as a generalized Lagrangian function.
– The third condition is known as the Karush-Kuhn-Tucker (KKT)
complementary condition.
1. It implies that for active constraints (g_i(z*) = 0) Ξ±_i β‰₯ 0, while for inactive constraints (g_i(z*) < 0) Ξ±_i = 0.
2. As we will see in a minute, the KKT condition allows us to identify the training examples that define the largest-margin hyperplane. These examples will be known as Support Vectors.
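For reference in the derivation that follows, the primal Lagrangian $L_P$ referred to below can be written in its standard form (the original slide showed it as an image; one multiplier $\alpha_i \ge 0$ is introduced per constraint):
\[
L_P(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i\big[\,y_i(w^T x_i + b) - 1\,\big].
\]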
• Solution
– To simplify the primal problem, we eliminate the primal variables (𝑀, 𝑏) using the first Kuhn-Tucker condition βˆ‚πΏπ‘ƒ/βˆ‚π‘§ = 0.
– Differentiating 𝐿𝑃(𝑀, 𝑏, 𝛼) with respect to 𝑀 and 𝑏, and setting to zero, yields
w = Ξ£_{i=1..N} Ξ±_i y_i x_i  and  Ξ£_{i=1..N} Ξ±_i y_i = 0
– Expansion of 𝐿𝑃 yields
𝐿𝑃 = Β½ w^T w βˆ’ Ξ£_i Ξ±_i y_i w^T x_i βˆ’ b Ξ£_i Ξ±_i y_i + Ξ£_i Ξ±_i
– Using the optimality condition βˆ‚πΏπ‘ƒ/βˆ‚π‘€ = 0, the first term in 𝐿𝑃 can be expressed as
Β½ w^T w = Β½ Ξ£_i Ξ£_j Ξ±_i Ξ±_j y_i y_j x_i^T x_j
– The second term in 𝐿𝑃 can be expressed in the same way.
– The third term in 𝐿𝑃 is zero by virtue of the optimality condition βˆ‚πΏπ‘ƒ/βˆ‚π‘ = 0.
– Merging these expressions together, we obtain
𝐿𝐷(𝛼) = Ξ£_i Ξ±_i βˆ’ Β½ Ξ£_i Ξ£_j Ξ±_i Ξ±_j y_i y_j x_i^T x_j
– subject to the (simpler) constraints Ξ±_i β‰₯ 0 and Ξ£_i Ξ±_i y_i = 0.
– This is known as the Lagrangian dual problem.
• Comments
– We have transformed the problem of finding a saddle point for 𝐿𝑃(𝑀, 𝑏, 𝛼) into the easier one of maximizing 𝐿𝐷(𝛼).
– Notice that 𝐿𝐷(𝛼) depends on the Lagrange multipliers 𝛼, not on (𝑀, 𝑏).
– The primal problem scales with dimensionality (𝑀 has one
coefficient for each dimension), whereas the dual problem scales
with the amount of training data (there is one Lagrange multiplier
per example).
– Moreover, in 𝐿𝐷(𝛼) training data appears only as dot products π‘₯𝑖𝑇π‘₯𝑗.
Support Vectors
• The KKT complementary condition states that, for every point in the training set, the following equality must hold:
Ξ±_i [y_i(w^T x_i + b) βˆ’ 1] = 0, i = 1, …, N
– Therefore, for every π‘₯𝑖, either 𝛼𝑖 = 0 or 𝑦𝑖(𝑀𝑇π‘₯𝑖 + 𝑏) − 1 = 0 must hold.
1. Those points for which 𝛼𝑖 > 0 must then lie on one of the two hyperplanes that define the largest margin (the term 𝑦𝑖(𝑀𝑇π‘₯𝑖 + 𝑏) − 1 becomes zero only at these hyperplanes).
2. These points are known as the Support Vectors.
3. All the other points must have 𝛼𝑖 = 0.
– Note that only the SVs contribute to defining the optimal hyperplane.
1. NOTE: the bias term 𝑏 is found from the KKT
complementary condition on the support vectors.
2. Therefore, the complete dataset could be replaced by only
the support vectors, and the separating hyperplane would
be the same.
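A sketch of how the bias is recovered in practice, consistent with the KKT condition above and with the MATLAB code later in the slides (which averages w0 over the support vectors):
\[
b = y_s - w^T x_s = y_s - \sum_{i \in SV} \alpha_i y_i\, x_i^T x_s
\quad \text{for any support vector } x_s,
\]
averaged over all support vectors for numerical stability.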
κ°•μ˜ λ‚΄μš©
μ„ ν˜• μ΄ˆν‰λ©΄μ— μ˜ν•œ λΆ„λ₯˜
SVM, Support Vector Machine
μŠ¬λž™λ³€μˆ˜λ₯Ό 가진 SVM
컀널법
λ§€νŠΈλž©μ„ μ΄μš©ν•œ μ‹€ν—˜
과닀적합과 μΌλ°˜ν™” 였차
ν•™μŠ΅ 였차 οƒ  ν•™μŠ΅ 데이터에 λŒ€ν•œ 좜λ ₯κ³Ό μ›ν•˜λŠ” 좜λ ₯과의 차이
μΌλ°˜ν™” 였차 οƒ  ν•™μŠ΅μ— μ‚¬μš©λ˜μ§€ μ•Šμ€ 데이터에 λŒ€ν•œ 였차
ν•™μŠ΅ μ‹œμŠ€ν…œμ˜ λ³΅μž‘λ„μ™€ μΌλ°˜ν™” 였차의 관계
[Figure (a)–(c): training error vs. generalization error for a linear and a non-linear classifier. On the training data, their training errors are compared; on data not used for training, the generalization error of the non-linear classifier may be either smaller or larger than that of the linear one, e.g., when the training data does not properly represent the overall data distribution.]
κ³Όλ‹€ν•™μŠ΅μ„ ν”Όν•˜κ³  μΌλ°˜ν™” 였차λ₯Ό 쀄이기 μœ„ν•΄μ„œλŠ”
ν•™μŠ΅ μ‹œμŠ€ν…œμ˜ λ³΅μž‘λ„λ₯Ό 적절히 μ‘°μ •ν•˜λŠ” 것이 μ€‘μš”
과닀적합
μ„ ν˜• μ΄ˆν‰λ©΄ λΆ„λ₯˜κΈ°
μž…λ ₯ x에 λŒ€ν•œ μ„ ν˜• μ΄ˆν‰λ©΄ 결정경계 ν•¨μˆ˜
νŒλ³„ν•¨μˆ˜
f(x) = 1 οƒ  x ∈ C1
f(x) = -1 οƒ  x ∈ C2
μ΅œμ†Œ ν•™μŠ΅ 였차λ₯Ό λ§Œμ‘±ν•˜λŠ” μ—¬λŸ¬ 가지
μ„ ν˜• 결정경계
Which one?
SVMμ—μ„œ μ—¬λŸ¬ μ„ ν˜• κ²°μ •κ²½μ œ 쀑 μΌλ°˜ν™” 였차λ₯Ό μ΅œμ†Œλ‘œ ν•˜λŠ” 졜적의 경계λ₯Ό
μ°ΎκΈ° μœ„ν•΄ λ§ˆμ§„(margin)의 κ°œλ…μ„ λ„μž…ν•˜μ—¬ ν•™μŠ΅μ˜ λͺ©μ ν•¨μˆ˜λ₯Ό μ •μ˜
Maximum margin classifier
Margin: the distance from the training data closest to the decision boundary to the decision boundary.
Support vector: a training data point located closest to the decision boundary.
[Figure (a), (b): margins and support vectors for two different decision boundaries — the difference in margin depending on the decision boundary]
μΌλ°˜ν™” 였차λ₯Ό μž‘κ²Œ οƒ  ν΄λž˜μŠ€κ°„μ˜ 간격을 크게 οƒ  λ§ˆμ§„μ„ μ΅œλŒ€ !!
 “μ΅œλŒ€ λ§ˆμ§„ λΆ„λ₯˜κΈ°”, “SVM”
Maximum margin classifier
Linear decision boundary (hyperplane): w^T x + w0 = 0
Distance from a point x to the decision boundary: d = |w^T x + w0| / β€–wβ€–
w^T x + w0 > 0 β†’ C1
w^T x + w0 < 0 β†’ C2
μ„œν¬νŠΈλ²‘ν„° χ
• 졜적 λΆ„λ₯˜ μ΄ˆν‰λ©΄ (OSH)
ν•œμ κ³Ό 이차원 평면과 μ΅œμ†Œκ±°λ¦¬
μ΅œλŒ€ λ§ˆμ§„μ˜ μˆ˜μ‹ν™”
• SVM은 μž…λ ₯이 m차원일 경우λ₯Ό ν¬ν•¨ν•˜μ—¬ 졜적 λΆ„λ₯˜μ΄ˆν‰λ©΄
인 결정경계와 λ§ˆμ§„μ„ μ΅œλŒ€ν™”ν•˜λŠ” 졜적 νŒŒλΌλ―Έν„°(w, b)λ₯Ό
찾아냄
κ°€μ€‘μΉ˜ 벑터, λ°”μ΄μ–΄μŠ€
μ΅œλŒ€ λ§ˆμ§„μ˜ μˆ˜μ‹ν™”
: λ§ˆμ§„
μ΅œλŒ€ λ§ˆμ§„μ˜ μˆ˜μ‹ν™”
Maximum margin classifier
[Figure: the plus plane (class C1) and the minus plane (class C2) pass through the support vectors, each at distance 1/β€–wβ€– from the decision boundary, so the margin M = 2/β€–wβ€–.]
To maximize the margin β†’ minimize β€–wβ€–.
ν•™μŠ΅ 데이터 μ΄μš©ν•΄
μ„ ν˜• λΆ„λ₯˜κΈ° νŒŒλΌλ―Έν„° μ΅œμ ν™”
SVM ν•™μŠ΅
ν•™μŠ΅ 데이터 집합
μΆ”μ •ν•΄μ•Ό ν•  νŒŒλΌλ―Έν„° w, woκ°€ λ§Œμ‘±ν•΄μ•Ό ν•  쑰건
μ΅œμ†Œν™”ν•  λͺ©μ ν•¨μˆ˜
+
“λΌκ·Έλž‘μ§€μ•ˆ ν•¨μˆ˜”
νŒŒλΌλ―Έν„° w, wo에 λŒ€ν•΄ κ·Ήμ†Œν™”ν•˜κ³ , αi에 λŒ€ν•΄ κ·ΉλŒ€ν™”
J ( w , w0 ,  ) ο€½
SVM training
Dual problem of J(w, w0, Ξ±): instead of optimizing J(w, w0, Ξ±), optimize Q(Ξ±).
Setting the derivatives of J(w, w0, Ξ±) with respect to w and w0 to zero gives
w = Ξ£_{i=1..N} Ξ±_i y_i x_i  and  Ξ£_{i=1..N} Ξ±_i y_i = 0
Re-expressing the original Lagrangian with these results gives
Q(Ξ±) = Ξ£_{i=1..N} Ξ±_i βˆ’ Β½ Ξ£_{i=1..N} Ξ£_{j=1..N} Ξ±_i Ξ±_j y_i y_j x_i^T x_j,
to be maximized subject to Ξ£_{i=1..N} Ξ±_i y_i = 0 and Ξ±_i β‰₯ 0.
Q(Ξ±) is a quadratic function of the Ξ±_i, so the solution Ξ±Μ‚_i can be obtained simply by quadratic programming β†’ estimate w, w0 from w = Ξ£_{i=1..N} Ξ±Μ‚_i y_i x_i.
SVM training
Most data points lie away from the decision boundary and therefore satisfy the condition y_i(w^T x_i + w0) > 1
β†’ for these points, the only non-negative Ξ±Μ‚_i that maximizes J(w, w0, Ξ±) is 0.
β†’ A drastic reduction in the number of data points that must be stored for classification and in the amount of computation.
SVM에 μ˜ν•œ λΆ„λ₯˜
κ²°μ •ν•¨μˆ˜
λŒ€μž…ν•˜λ©΄,
f(x) = 1 οƒ  C1
f(x) = -1 οƒ  C2
SVM에 μ˜ν•œ ν•™μŠ΅κ³Ό λΆ„λ₯˜
SVM에 μ˜ν•œ ν•™μŠ΅κ³Ό λΆ„λ₯˜
SVM with slack variables
To handle data that are not linearly separable, slack variables ΞΎ are introduced.
ΞΎ_i: the distance from a misclassified data point to the decision boundary of its own class.
Classification condition: y_i(w^T x_i + w0) β‰₯ 1 βˆ’ ΞΎ_i, ΞΎ_i β‰₯ 0, i = 1, …, N
The larger the value of the slack variable, the more severe the misclassification that is allowed!
SVM with slack variables
C: determines how much misclassification is tolerated. As C becomes larger, ΞΎ is prevented from growing, so the misclassification error becomes smaller.
Minimize the objective function (sketched after the figures below).
Lagrangian function: differentiate with respect to w and w0; a new Lagrange multiplier Ξ²_i is added to keep the slack variables positive.
Once the Ξ±_i are determined, the values of w and w0 are exactly the same as in the case without slack variables!
[Figures: the linearly separable case and the linearly non-separable case]
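A sketch of the soft-margin problem described above (standard formulation; the slack variables $\xi_i$ and the constant $C$ are as in the slides, and the resulting box constraint is what the MATLAB code later enforces through lb and ub):
\[
\min_{w,\,w_0,\,\xi}\; \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i
\quad \text{s.t.} \quad y_i(w^T x_i + w_0) \ge 1 - \xi_i,\;\; \xi_i \ge 0,
\]
whose dual is the same $Q(\alpha)$ as before with the additional box constraint $0 \le \alpha_i \le C$.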
Kernel methods
Non-linear classification problem β†’ transform the low-dimensional input x into a value Ξ¦(x) in a higher-dimensional space.
[Figure (a), (b): mapping the data into a 2-D feature space]
Once linearized in the higher-dimensional space, classification with a simple linear classifier becomes possible.
However, side effects such as an increased amount of computation arise β†’ solved by the "kernel method" (see the example below).
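A standard textbook example (not from the original slides) of why the feature-space inner product can be computed without forming $\Phi(x)$ explicitly: for a 2-D input $x = (x_1, x_2)$, take
\[
\Phi(x) = \big(x_1^2,\; \sqrt{2}\,x_1 x_2,\; x_2^2\big)
\quad\Longrightarrow\quad
\Phi(x) \cdot \Phi(y) = (x_1 y_1 + x_2 y_2)^2 = (x^T y)^2,
\]
so the degree-2 polynomial kernel evaluates a 3-D inner product at the cost of a 2-D dot product.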
Kernel methods and SVM
Suppose the n-dimensional input data x are transformed into m-dimensional (m >> 1) feature data Ξ¦(x) and then classified with an SVM.
The SVM computations use not the individual Ξ¦(x) but the inner product Ξ¦(x) βˆ™ Ξ¦(y) of two vectors.
Instead of the high-dimensional mapping Ξ¦(x), Ξ¦(x) βˆ™ Ξ¦(y) is defined and used as a single function k(x, y)
β†’ the "kernel function".
In the Lagrangian function for parameter estimation, Ξ¦(x) is used as the input instead of x; expressed as the dual problem, this gives the kernelized form sketched after the next slide.
Kernel methods and SVM
The classification function can be expressed using the estimated parameters.
Through the high-dimensional mapping, the non-linear problem is linearized, while the kernel function keeps the amount of computation small.
λŒ€ν‘œμ μΈ 컀널 ν•¨μˆ˜
SVM with slack variables and a kernel
맀트랩 μ΄μš©ν•œ μ‹€ν—˜
• SVM μ΄μš©ν•œ ν•™μŠ΅κ³Ό νŒ¨ν„΄ λΆ„λ₯˜ μ‹€ν—˜
데이터 개수 = 30
C1
C2
맀트랩 μ΄μš©ν•œ μ‹€ν—˜
맀트랩 ν•¨μˆ˜ quadprog()
λͺ©μ ν•¨μˆ˜
μ΄μ°¨κ³„νšλ²•μ— μ˜ν•΄ λͺ©μ ν•¨μˆ˜λ₯Ό μ΅œμ†Œν™”ν•˜λ©΄μ„œ νŠΉμ •
쑰건을 λ§Œμ‘±ν•˜λŠ” νŒŒλΌλ―Έν„° αλ₯Ό μ°ΎλŠ” ν•¨μˆ˜
λ§Œμ‘±ν•  쑰건
μž…λ ₯으둜 μ‚¬μš©λ  ꡬ체적인 ν–‰λ ¬κ³Ό 벑터λ₯Ό 찾으면
이용
이 값듀을 ν•¨μˆ˜μ˜ μž…λ ₯으둜 μ œκ³΅ν•˜μ—¬ αλ₯Ό μ°Ύκ³ , 이 값이 0보닀
큰 것을 μ„ νƒν•¨μœΌλ‘œμ¨ λŒ€μ‘λ˜λŠ” μ„œν¬νŠΈλ²‘ν„°λ₯Ό μ°ΎλŠ”λ‹€.
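A sketch of how the dual maps onto quadprog's standard form, mirroring the script below (quadprog minimizes, so the dual $Q(\alpha)$ is negated):
\[
\min_{\alpha}\; \tfrac{1}{2}\alpha^T H \alpha + f^T \alpha
\quad \text{s.t.} \quad y^T \alpha = 0,\;\; 0 \le \alpha_i \le C,
\qquad
H_{ij} = y_i y_j\, k(x_i, x_j),\;\; f = -\mathbf{1},
\]
which equals $-Q(\alpha)$; in the code, H = diag(Y)*KXX*diag(Y), f = -ones(N,1), Aeq = Y', beq = 0, lb = 0, and ub plays the role of C.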
맀트랩 μ΄μš©ν•œ κ΅¬ν˜„
컀널 ν•¨μˆ˜ SVM_kernel
function [out] = SVM_kernel(X1, X2, fmode, hyp)
if (fmode == 1)
    % Linear kernel
    out = X1*X2';
end
if (fmode == 2)
    % Polynomial kernel
    out = (X1*X2' + hyp(1)).^hyp(2);
end
if (fmode == 3)
    % Gaussian kernel
    for i = 1:size(X1,1)
        for j = 1:size(X2,1)
            x = X1(i,:); y = X2(j,:);
            out(i,j) = exp(-(x-y)*(x-y)'/(2*hyp(1)*hyp(1)));
        end
    end
end
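As a quick usage sketch (hypothetical example, not part of the original slides; it assumes the SVM_kernel function above is on the MATLAB path), the Gram matrix of two 2-D points under each kernel could be computed as:

X = [1 0; 0 1];                      % two 2-D sample points
Klin   = SVM_kernel(X, X, 1, []);    % linear kernel: X*X'
Kpoly  = SVM_kernel(X, X, 2, [1 2]); % polynomial kernel: (X*X' + 1).^2
Kgauss = SVM_kernel(X, X, 3, 1);     % Gaussian kernel with sigma = 1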
λ§€νŠΈλž©μ„ μ΄μš©ν•œ κ΅¬ν˜„
SVM λΆ„λ₯˜κΈ°
% 2D 데이터 λΆ„λ₯˜λ₯Ό μœ„ν•΄ 컀널을 μ‚¬μš©ν•œ SVM λΆ„λ₯˜κΈ° 생성 및 λΆ„λ₯˜
load data12_8
% 데이터 뢈러옴
N=size(Y,1);
% N: 데이터 크기
kmode=2; hyp(1)=1.0; hyp(2)=2.0;
% 컀널 ν•¨μˆ˜μ™€ νŒŒλΌλ―Έν„° μ„€μ •
%%%%% μ΄μ°¨κ³„νšλ²• λͺ©μ ν•¨μˆ˜ μ •μ˜
%% μ΅œμ†Œν™” ν•¨μˆ˜: 0.5*a'*H*a - f'*a
%% 만쑱 쑰건 1: Aeq'*a = beq
%% 만쑱 λ²”μœ„: lb <= a <= ub
KXX=SVM_kernel(X,X,kmode,hyp);
% 컀널 ν–‰λ ¬ 계산
H=diag(Y)*KXX*diag(Y)+1e-10*eye(N);
% μ΅œμ†Œν™” ν•¨μˆ˜μ˜ ν–‰λ ¬
f=(-1)*ones(N,1);
% μ΅œμ†Œν™” ν•¨μˆ˜μ˜ 벑터
Aeq=Y'; beq=0;
% 만쑱 쑰건 1
lb=zeros(N,1); ub=zeros(N,1)+10^3;
% 만쑱 λ²”μœ„
λ§€νŠΈλž©μ„ μ΄μš©ν•œ κ΅¬ν˜„
SVM λΆ„λ₯˜κΈ°(계속)
%%%%% μ΄μ°¨κ³„νšλ²•μ— μ˜ν•œ μ΅œμ ν™” ν•¨μˆ˜ 호좜 --> μ•ŒνŒŒκ°’, w0 μΆ”μ •
[alpha, fval, exitflag, output]=quadprog(H,f,[],[],Aeq,beq,lb,ub)
svi=find(abs(alpha)>10^(-3));
% alpha>0 인 μ„œν¬νŠΈλ²‘ν„° μ°ΎκΈ°
svx = X(svi,:);
% μ„œν¬νŠΈλ²‘ν„° μ €μž₯
ksvx = SVM_kernel(svx,svx,kmode,hyp);
% μ„œν¬νŠΈλ²‘ν„° 컀널 계산
for i=1:size(svi,1)
% λ°”μ΄μ–΄μŠ€(w0) 계산
svw(i)=Y(svi(i))-sum(alpha(svi).*Y(svi).*ksvx(:,i));
end
w0=mean(svw);
λ§€νŠΈλž©μ„ μ΄μš©ν•œ κ΅¬ν˜„
SVM λΆ„λ₯˜κΈ°(계속)
%%%%% λΆ„λ₯˜ κ²°κ³Ό 계산
for i=1:N
xt=X(i,:);
ksvxt = SVM_kernel(svx,xt,kmode,hyp);
fx(i,1)=sign(sum(alpha(svi).*Y(svi).*ksvxt)+w0);
end
Cerr= sum(abs(Y-fx))/(2*N);
%%%%% κ²°κ³Ό 좜λ ₯ 및 μ €μž₯
fprintf(1,'Number of Support Vector: %d\n',size(svi,1));
fprintf(1,'Classificiation error rate: %.3f\n',Cerr);
save SVMres12_8 X Y posId negId svi svx alpha w0 kmode
λ§€νŠΈλž©μ„ μ΄μš©ν•œ κ΅¬ν˜„
결정경계λ₯Ό κ·Έλ¦¬λŠ” ν”„λ‘œκ·Έλž¨
load SVMres12_8
xM = max(X); xm= min(X);
S1=[floor(xm(1)):0.1:ceil(xM(1))];
S2=[floor(xm(2)):0.1:ceil(xM(2))];
for i=1:size(S1,2)
for j=1:size(S2,2)
xt=[S1(i),S2( j)];
ksvxt = SVM_kernel(svx,xt,kmode,hyp);
G(i,j)=(sum(alpha(svi).*Y(svi).*ksvxt)+w0);
F(i,j)=sign(G(i,j));
end
end
[X1, X2]=meshgrid(S1, S2);
figure(kmode); contour(X1',X2',F);
hold oncontour(X1',X2',G);
posId=find(Y>0); negId=find(Y<0);
plot(X(posId,1), X(posId,2),'d'); hold on
plot(X(negId,1), X(negId,2),'o');
plot(X(svi,1), X(svi,2),'g+');
% 데이터와 νŒŒλΌλ―Έν„° 뢈러였기
% 결정경계 그릴 μ˜μ—­ μ„€μ •
% μ˜μ—­λ‚΄μ˜ 각 μž…λ ₯에 λŒ€ν•œ 좜λ ₯ 계산
% κ²°μ • 경계 그리기
% νŒλ³„ν•¨μˆ˜μ˜ λ“±κ³ μ„  그리기
% 데이터 그리기
% μ„œν¬νŠΈλ²‘ν„° ν‘œμ‹œ
λ§€νŠΈλž©μ„ μ΄μš©ν•œ κ΅¬ν˜„
μ„ ν˜• 컀널 ν•¨μˆ˜λ₯Ό μ‚¬μš©ν•œ 경우 결정경계와 μ„œν¬νŠΈλ²‘ν„° (7개)
λ§€νŠΈλž©μ„ μ΄μš©ν•œ κ΅¬ν˜„
(a) Polynomial kernel
b=1,d=2, 5개
(b) Gaussian kernel
σ=1, 9개
닀항식 컀널과 κ°€μš°μ‹œμ•ˆ 컀널을 μ‚¬μš©ν•œ 경우 결정경계와 μ„œν¬νŠΈλ²‘ν„°
λ§€νŠΈλž©μ„ μ΄μš©ν•œ κ΅¬ν˜„
(a) σ=0.2
(b) σ=0.5
(c) σ=2.0
(d) σ=5.0
κ°€μš°μ‹œμ•ˆ μ»€λ„μ˜ σ에 λ”°λ₯Έ κ²°μ •κ²½κ³„μ˜ λ³€ν™”
Karush-Kuhn-Tucker (KKT) conditions
• In mathematical optimization, the Karush–Kuhn–Tucker
(KKT) conditions are first order necessary conditions for a
solution in nonlinear programming to be optimal, provided that
some regularity conditions are satisfied.
• Allowing inequality constraints, the KKT approach to
nonlinear programming generalizes the method of Lagrange
multipliers, which allows only equality constraints.
• The system of equations corresponding to the KKT conditions
is usually not solved directly, except in the few special cases
where a closed-form solution can be derived analytically. In
general, many optimization algorithms can be interpreted as
methods for numerically solving the KKT system of equations.
Margin-maximization conditions
• Applying the KKT conditions to the SVM optimization problem
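A standard statement of the resulting conditions (the original slide showed them as an image); they are the ones already used in the Support Vectors discussion earlier:
\[
\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N}\alpha_i y_i x_i,
\qquad
\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0,
\]
\[
\alpha_i\big[y_i(w^T x_i + b) - 1\big] = 0,
\qquad
y_i(w^T x_i + b) - 1 \ge 0,
\qquad
\alpha_i \ge 0.
\]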