Parameter estimation

Introduction to Pattern Recognition for Human ICT
Parameter estimation
2014. 10. 17
Hyunki Hong
Contents
• Introduction
• Parameter estimation
• Maximum likelihood
• Bayesian estimation
• In most situations, the true distributions are unknown and
must be estimated from data.
– Two approaches are commonplace.
1. Parameter Estimation
2. Non-parametric Density Estimation
• Parameter estimation
– Assume a particular form for the density (e.g. Gaussian), so
only the parameters (e.g., mean and variance) need to be
estimated.
1. Maximum Likelihood
2. Bayesian Estimation
• Non-parametric density estimation
– Assume NO knowledge about the density
1. Kernel Density Estimation
2. Nearest Neighbor Rule
ML vs. Bayesian parameter estimation
• Maximum Likelihood
– The parameters are assumed to be FIXED but unknown.
– The ML solution seeks the value of θ that “best” explains the dataset X:
  \hat{\theta}_{ML} = \arg\max_\theta \, p(X|\theta)
• Bayesian estimation
– Parameters are assumed to be random variables with some (assumed) known a priori distribution.
– Bayesian methods seek to estimate the posterior density p(θ|X).
– The final density p(x|X) is obtained by integrating out the parameters:
  p(x|X) = \int p(x|\theta)\, p(\theta|X)\, d\theta
Maximum Likelihood
• Problem definition
– Assume we seek to estimate a density p(x) that is known to depend on a number of parameters θ = [θ1, θ2, …, θM]^T.
1. For a Gaussian pdf, θ1 = μ, θ2 = σ and p(x) = N(μ, σ).
2. To make the dependence explicit, we write p(x|θ).
– Assume we have a dataset X = {x^(1), x^(2), …, x^(N)} drawn independently from the distribution p(x|θ) (an i.i.d. set: independent, identically distributed).
1. Then we can write
  p(X|\theta) = \prod_{k=1}^{N} p(x^{(k)}|\theta)
2. The ML estimate of θ is the value that maximizes the likelihood p(X|θ):
  \hat{\theta} = \arg\max_\theta \, p(X|\theta)
3. This corresponds to the intuitive idea of choosing the value of θ that is most likely to give rise to the data.
• For convenience, we will work with the log likelihood l(θ) = log p(X|θ).
– Because the log is a monotonic function,
  \arg\max_\theta \, p(X|\theta) = \arg\max_\theta \, \log p(X|\theta)
– Hence, the ML estimate of θ can be written as:
  \hat{\theta} = \arg\max_\theta \sum_{k=1}^{N} \log p(x^{(k)}|\theta)
1. This simplifies the problem, since now we have to maximize a sum of terms rather than a long product of terms.
2. An added advantage of taking logs will become very clear when the distribution is Gaussian.
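As an illustration (ours, not from the slides), the following MATLAB sketch evaluates the Gaussian log likelihood as a sum of log terms over a grid of candidate means and confirms that its maximizer matches the sample mean; the data vector reuses the histogram example later in the lecture, and the variable names are illustrative.
x     = [2.1 2.4 2.4 2.47 2.7 2.6 2.65 3.3 3.39 3.8 3.87];  % sample data
sigma = 0.5;                     % assumed known standard deviation
mu_grid = linspace(1, 5, 1000);  % candidate values of the mean
loglik  = zeros(size(mu_grid));
for i = 1:numel(mu_grid)
    % log p(X|mu) = sum_k [ -0.5*log(2*pi*sigma^2) - (x_k - mu)^2/(2*sigma^2) ]
    loglik(i) = sum(-0.5*log(2*pi*sigma^2) - (x - mu_grid(i)).^2/(2*sigma^2));
end
[~, idx] = max(loglik);
fprintf('grid argmax = %.3f, sample mean = %.3f\n', mu_grid(idx), mean(x));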
Example: Gaussian case, μ unknown
• Problem statement
– Assume a dataset X = {x^(1), x^(2), …, x^(N)} and a density of the form p(x) = N(μ, σ) where σ is known.
– What is the ML estimate of the mean?
– The maxima of a function are defined by the zeros of its derivative:
  \frac{\partial}{\partial\mu} \sum_{k=1}^{N} \log p(x^{(k)}|\mu) = \sum_{k=1}^{N} \frac{x^{(k)} - \mu}{\sigma^2} = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_{k=1}^{N} x^{(k)}
– So the ML estimate of the mean is the average value of the training data, a very intuitive result!
Example: Gaussian case, both μ and σ unknown
• A more general case, when neither μ nor σ is known.
– Fortunately, the problem can be solved in the same fashion.
– The derivative becomes a gradient, since we have two variables θ = (θ1, θ2) = (μ, σ²). (See the reference slide that follows.)
– Solving ∇_θ l(θ) = 0 for θ1 and θ2 yields
  \hat{\mu} = \frac{1}{N} \sum_{k=1}^{N} x^{(k)}, \qquad \hat{\sigma}^2 = \frac{1}{N} \sum_{k=1}^{N} \left(x^{(k)} - \hat{\mu}\right)^2
– Therefore, the ML estimate of the variance is the sample variance of the dataset, again a very pleasing result.
– Similarly, it can be shown that the ML estimates for the multivariate Gaussian are the sample mean vector and the sample covariance matrix.
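A minimal MATLAB sketch of these closed-form ML estimates (our variable names):
x      = [2.1 2.4 2.4 2.47 2.7 2.6 2.65 3.3 3.39 3.8 3.87];
mu_ml  = mean(x);               % ML estimate of the mean: sample average
var_ml = mean((x - mu_ml).^2);  % ML estimate of the variance: 1/N normalization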
์ฐธ๊ณ ์ž๋ฃŒ
• ๊ฐ€์šฐ์‹œ์•ˆ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜์— ๋Œ€ํ•œ ์ตœ๋Œ€์šฐ๋„์ถ”์ •
– x = (x1, …, xR)์ด x ~ N(μ, σ2)์ธ ์ƒํ˜ธ ๋…๋ฆฝ ๋ฐ ๋™์ผ ํ™•๋ฅ ๋ถ„ํฌ ์กฐ๊ฑด์—์„œ ์ƒ์„ฑ
๋œ ํ‘œ๋ณธ์ด๋ผ ๊ฐ€์ •ํ•˜๋ฉด, θ = (θ1, θ2) = (μ, σ) ?
• ๋กœ๊ทธ ์šฐ๋„:
θ1, θ2 ์— ๋Œ€ํ•ด ๊ฐ๊ฐ ๋ฏธ๋ถ„
• l(θ)์˜ ๊ทธ๋ž˜๋””์–ธํŠธ:
d
1 du
(log u ) ๏€ฝ
dx
u dx
์„ค์ •
: MLE ๊ฒฐ๊ณผ๋Š” ๊ฐ๊ฐ ํ‘œ์ค€๊ณผ ๋ถ„์‚ฐ
Bias and variance
• How good are these estimates?
– Two measures of “goodness” are used for statistical estimates.
– BIAS: how close is the estimate to the true value?
– VARIANCE: how much does it change for different datasets?
– The bias-variance tradeoff
: In most cases, you can only decrease one of them at the expense of
the other.
• What is the bias of the ML estimate of the mean?
  E[\hat{\mu}] = E\left[\frac{1}{N}\sum_{k=1}^{N} x^{(k)}\right] = \frac{1}{N}\sum_{k=1}^{N} E[x^{(k)}] = \mu
- Therefore the mean is an unbiased estimate.
• What is the bias of the ML estimate of the variance?
  E[\hat{\sigma}^2] = \frac{N-1}{N}\,\sigma^2 \neq \sigma^2
- Thus, the ML estimate of the variance is BIASED. This is because the ML estimate of the variance uses \hat{\mu} instead of the true mean μ.
– How “bad” is this bias?
1. For N → ∞ the bias becomes zero asymptotically.
2. The bias is only noticeable when we have very few samples, in which case we should not be doing statistics in the first place!
– Notice that MATLAB uses an unbiased estimate of the covariance.
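A quick MATLAB check (ours, not from the slides) that averages both estimators over many simulated datasets; var(x, 1) normalizes by N (the ML estimate) while var(x) normalizes by N − 1:
true_var = 1; N = 5; trials = 10000;
v_ml = zeros(trials, 1); v_ub = zeros(trials, 1);
for t = 1:trials
    x = sqrt(true_var)*randn(N, 1);   % a fresh dataset of N samples
    v_ml(t) = var(x, 1);              % biased ML estimate (1/N)
    v_ub(t) = var(x);                 % unbiased estimate (1/(N-1))
end
fprintf('E[var_ML] ~ %.3f, E[var_unbiased] ~ %.3f (true = %.1f)\n', ...
        mean(v_ml), mean(v_ub), true_var);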
Bayesian estimation
• In the Bayesian approach, our uncertainty about the parameters is represented by a pdf.
– Before we observe the data, the parameters are described by a prior density p(θ), which is typically very broad to reflect the fact that we know little about their true value.
– Once we obtain data, we make use of Bayes theorem to find the posterior p(θ|X).
: Ideally we want the data to sharpen the posterior p(θ|X), that is, reduce our uncertainty about the parameters.
– Remember, though, that our goal is to estimate p(x) or, more exactly, p(x|X), the density given the evidence provided by the dataset X.
Joint probability: the probability that two events occur together, P(A, B).
If A and B are independent, P(A, B) = P(A)P(B) = P(A∩B); otherwise P(A, B) = P(A|B)P(B).
• Let us derive the expression of a Bayesian estimate.
– From the definition of conditional probability,
  p(x, \theta|X) = p(x|\theta, X)\, p(\theta|X)
– p(x|θ, X) is independent of X, since knowledge of θ completely specifies the (parametric) density. Therefore
  p(x, \theta|X) = p(x|\theta)\, p(\theta|X)
and, using the theorem of total probability, we can integrate θ out:
  p(x|X) = \int p(x|\theta)\, p(\theta|X)\, d\theta
1. The only unknown in this expression is p(θ|X); using Bayes rule,
  p(\theta|X) = \frac{p(X|\theta)\, p(\theta)}{\int p(X|\theta)\, p(\theta)\, d\theta}
where p(X|θ) can be computed using the i.i.d. assumption, p(X|\theta) = \prod_{k=1}^{N} p(x^{(k)}|\theta).
2. NOTE: The last three expressions suggest a procedure to estimate p(x|X). This is not to say that integration of these expressions is easy!
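A minimal numerical sketch of this procedure (ours; Gaussian likelihood with known σ, a grid over θ, all names illustrative): tabulate the posterior p(θ|X) and integrate θ out to get p(x|X).
x_data = [0.7 0.9 0.85];                 % a small i.i.d. dataset
sigma  = 0.3;                            % known std of p(x|theta)
theta  = linspace(-1, 2, 601);           % grid over the unknown mean
prior  = exp(-0.5*((theta - 0)/0.3).^2); % broad N(0, 0.3^2) prior, unnormalized
lik    = ones(size(theta));
for k = 1:numel(x_data)                  % i.i.d.: product of per-sample likelihoods
    lik = lik .* exp(-0.5*((x_data(k) - theta)/sigma).^2);
end
post = prior .* lik;
post = post / trapz(theta, post);        % Bayes rule: normalize the posterior
xq = 0.8;                                % query point for p(x|X)
px = trapz(theta, exp(-0.5*((xq - theta)/sigma).^2)/(sqrt(2*pi)*sigma) .* post);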
• Example
– Assume a univariate density where our random variable x is generated from a normal distribution with known standard deviation σ.
– Our goal is to find the mean μ of the distribution given some i.i.d. data points X = {x^(1), x^(2), …, x^(N)}.
– To capture our knowledge about θ = μ, we assume that it also follows a normal density with mean μ0 and standard deviation σ0.
– We use Bayes rule to develop an expression for the posterior p(θ|X):
  p(\theta|X) \propto p(\theta) \prod_{k=1}^{N} p(x^{(k)}|\theta)
– To understand how Bayesian estimation changes the posterior as more data becomes available, we will find the maximum of p(θ|X).
– Setting the partial derivative of log p(θ|X) with respect to θ = μ to zero and solving, after some algebraic manipulation, yields
  \mu_N = \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\bar{x} + \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0, \qquad \bar{x} = \frac{1}{N} \sum_{k=1}^{N} x^{(k)}
– Therefore, as N increases, the estimate of the mean μ_N moves from the initial prior μ0 to the ML solution (the sample mean x̄).
– Similarly, the standard deviation σ_N can be found to be
  \sigma_N^2 = \frac{\sigma_0^2\, \sigma^2}{N\sigma_0^2 + \sigma^2}
Example
• Assume that the true mean of the distribution p(x) is μ = 0.8, with standard deviation σ = 0.3.
– In reality we would not know the true mean.
– We generate a number of examples from this distribution.
– To capture our lack of knowledge about the mean, we assume a normal prior p(θ) = N(μ0, σ0²), with μ0 = 0.0 and σ0 = 0.3.
– The figure below shows the posterior p(μ|X).
1. As N increases, the estimate μ_N approaches its true value (μ = 0.8) and the spread σ_N (our uncertainty in the estimate) shrinks.
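A minimal MATLAB sketch of this example using the closed-form expressions for μ_N and σ_N above (our code; parameter values as in the text):
mu_true = 0.8; sigma = 0.3; mu0 = 0.0; sigma0 = 0.3;
for N = [1 5 10 50 100]
    x    = mu_true + sigma*randn(1, N);  % generate N examples
    muN  = (N*sigma0^2*mean(x) + sigma^2*mu0) / (N*sigma0^2 + sigma^2);
    sigN = sqrt(sigma0^2*sigma^2 / (N*sigma0^2 + sigma^2));
    fprintf('N = %3d: mu_N = %6.3f, sigma_N = %.3f\n', N, muN, sigN);
end
As N grows, mu_N approaches 0.8 and sigma_N shrinks toward zero, matching the figure.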
ML vs. Bayesian estimation
• What is the relationship between these two estimates?
– By definition, p(X|θ) peaks at the ML estimate.
– If this peak is relatively sharp and the prior is broad, then the integral
  p(x|X) = \int p(x|\theta)\, p(\theta|X)\, d\theta
will be dominated by the region around the ML estimate. Therefore, the Bayesian estimate will approximate the ML solution.
– As we have seen in the previous example, when the number of available data points increases, the posterior p(θ|X) tends to sharpen.
1. Thus, the Bayesian estimate of p(x) will approach the ML solution as N → ∞.
2. In practice, only when we have a limited number of observations will the two approaches yield different results.
๋น„๋ชจ์ˆ˜ ๋ฐ€๋„ ์ถ”์ •๋ฒ•
01_๋น„๋ชจ์ˆ˜ ๋ฐ€๋„ ์ถ”์ •
02_ํžˆ์Šคํ† ๊ทธ๋žจ
03_์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
05_์Šค๋ฌด๋“œ ์ปค๋„์„ ์ด์šฉํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
06_k-NNR์„ ์ด์šฉํ•œ ๋ฐ€๋„ ์ถ”์ • (๋‹ค์Œ์ฃผ)
01_๋น„๋ชจ์ˆ˜ ๋ฐ€๋„ ์ถ”์ •
์ฃผ์–ด์ง„ ์œ ํ•œ ๊ฐœ์˜ ๋ฐ์ดํ„ฐ x1, x2, ..., xN๋กœ๋ถ€ํ„ฐ ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜ P(x|Ck) ์ถ”์ • ๋ฌธ์ œ
• ๋น„๋ชจ์ˆ˜ ๋ฐ€๋„ ์ถ”์ •(Non-parametric Density Estimation)
– ํŒŒ๋ผ๋ฏธํ„ฐ(๋ชจ์ˆ˜)๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  ํ‘œ๋ณธ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๋ฐ€๋„ํ•จ์ˆ˜ ์ถ”์ •
– ํ™•๋ฅ ๋ฐ€๋„ํ•จ์ˆ˜ P(x|Ck)๋ฅผ ์ถ”์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•
• ํžˆ์Šคํ† ๊ทธ๋žจ
• ์ปค๋„๋ฐ€๋„์ถ”์ •(KDE)
– Parzen์ฐฝ ์ด์šฉํ•œ ์ปค๋„๋ฐ€๋„ ์ถ”์ •
– ์Šค๋ฌด์Šค ์ปค๋„ ์ด์šฉํ•œ ๋ฐ€๋„ ์ถ”์ •
• K-NNR์„ ์ด์šฉํ•œ ๋ฐ€๋„ ์ถ”์ •
02_Histograms
• Express the density of the data in a simple form.
– Divide the given data range into bins of equal width and count the number of samples falling into each bin; the resulting graph expresses the density of the data.
• Normalizing so that the total frequency sums to 1 gives a probability density.
• Drawbacks
– The final shape of the density estimate depends on the origin and the bin width.
– When the bin width is large, the discontinuity of the estimated density values increases, yielding a highly discontinuous, non-smooth density estimate.
02_Histograms
Y = [2.1, 2.4, 2.4, 2.47, 2.7, 2.6, 2.65, 3.3, 3.39, 3.8, 3.87];
X = [0.25:0.5:5];
N = hist(Y, X);
N = 0 0 0 0 4 3 2 2 0 0
Bin width: 0.5; origin: bin edges at n.0 and n.5.
02_Histograms
Bin width: 0.5; origin shifted by 0.25.
Y = [2.1, 2.4, 2.4, 2.47, 2.7, 2.6, 2.65, 3.3, 3.39, 3.8, 3.87];
X = [0.5:0.5:5];
N = hist(Y, X);
N = 0 0 0 1 6 0 2 2 0 0
Drawbacks:
1. The estimated density depends on the origin and the bin width.
2. Multivariate data are likewise affected by the origin.
3. The estimated density is discontinuous.
03_์ปค๋„ ๋ฐ€๋„ ์ถ”์ •์˜ ๊ธฐ๋ณธ ๊ฐœ๋…
• Histogram vs. Generic KDE
– 1D Density Estimation
Histogram
KDE
03_์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ๋น„๋ชจ์ˆ˜์  ๋ฐ€๋„์ถ”์ • ์ผ๋ฐ˜์‹
k
p ( x) ๏€
NV
• ์ปค๋„๋ฐ€๋„์ถ”์ • ๋‘ ๊ฐ€์ง€ ๋ฐฉ๋ฒ•
V : x๋ฅผ ๋‘˜๋Ÿฌ์‹ผ ์˜์—ญ R์˜ ์ฒด์ (๋ฉด์ )
N : ํ‘œ๋ณธ์˜ ์ด์ˆ˜
k : ์˜์—ญ R๋‚ด์˜ ํ‘œ๋ณธ์˜ ์ˆ˜
– KDE (Kernel Density Estimation)
• ์ฒด์  V๋ฅผ ๊ณ ์ •์‹œํ‚ค๊ณ , ์˜์—ญ๋‚ด์˜ ํ‘œ๋ณธ ์ˆ˜ k๋ฅผ ๊ฐ€๋ณ€ํ•˜์—ฌ ๋ฐ€๋„ํ•จ์ˆ˜ ์ถ”์ •
• Parzen ์ฐฝ(window) ์ถ”์ •๋ฒ• (๋˜๋Š” Parzen–Rosenblatt window)์ด ๋Œ€ํ‘œ์ 
– VN = 1
๊ณผ ๊ฐ™์€ N์˜ ํ•จ์ˆ˜๋กœ ์ฒด์  VN ์„ ์ง€์ •ํ•˜์—ฌ ์˜์—ญ์„ ์ค„์—ฌ๊ฐ€๋ฉด์„œ ์ตœ์ ๋ฐ€๋„ ์ถ”์ •
N
– k-NNR ์ถ”์ •๋ฒ•
• ์˜์—ญ ๋‚ด์˜ ํ‘œ๋ณธ ์ˆ˜ k ๊ฐ’์„ ๊ณ ์ •๋œ ๊ฐ’์œผ๋กœ ์„ ํƒํ•˜๊ณ  ์ฒด์  V๋ฅผ ๋ณ€๊ฒฝ์‹œ์ผœ ์ ‘๊ทผ
• ์–ด๋–ค ํ‘œ๋ณธ์ ์„ ํฌํ•จํ•˜๋Š” ์ฒด์ ์ด k N ๏€ฝ N ๊ฐœ์˜ ํ‘œ๋ณธ์„ ํฌํ•จํ•˜๋„๋ก ์ฒด์  ์ค„์—ฌ
๋‚˜๊ฐ€๋ฉด์„œ ์ตœ์  ๋ฐ€๋„ ์ถ”์ •
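The general form p(x) ≈ k/(NV) follows from a short standard argument (our sketch, not verbatim from the slides). The probability mass captured by a small region R is
  P = \int_R p(x')\,dx' \;\approx\; p(x)\,V
since p is roughly constant over a small R, while the fraction of the N i.i.d. samples expected to fall in R is P ≈ k/N. Equating the two approximations gives
  p(x) \;\simeq\; \frac{k}{NV}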
03_์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
n์˜ ํ•จ์ˆ˜๋กœ ์ฒด์  Vn ์ง€์ •
• ์ปค๋„๋ฐ€๋„์ถ”์ •์˜ ๋‘๊ฐ€์ง€ ๋ฐฉ๋ฒ•
kn๊ฐœ์˜ ํ‘œ๋ณธ ํฌํ•จํ•˜๋„๋ก ์ฒด์  Vn ์ค„์ž„.
- KDE, K-NNR ๋ชจ๋‘ N์ด ์ปค์งˆ์ˆ˜๋ก ์‹ค์ œ ํ™•๋ฅ ๋ฐ€๋„์— ์ˆ˜๋ ดํ•จ.
- V์™€ k๋Š” N์— ๋”ฐ๋ผ ๊ฐ€๋ณ€์ ์œผ๋กœ ๊ฒฐ์ •๋จ.
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ๋น„๋ชจ์ˆ˜์  ๋ฐ€๋„ํ•จ์ˆ˜ ์ถ”์ • ์ผ๋ฐ˜์‹: V ๊ณ ์ •, k ๊ฐ€๋ณ€
k
p ( x) ๏€
NV
V : x๋ฅผ ๋‘˜๋Ÿฌ์‹ผ ์˜์—ญ ์ฒด์  ๋ฒ”์œ„
N : ํ‘œ๋ณธ์˜ ์ด์ˆ˜
k : V๋‚ด์˜ ํ‘œ๋ณธ์˜ ์ˆ˜
– ์ค‘์‹ฌ์  x๊ฐ€ ์ž…๋ฐฉ์ฒด์˜ ์ค‘์‹ฌ, ํ•œ ๋ณ€์˜ ๊ธธ์ด h์ธ ์ดˆ์ž…๋ฐฉ์ฒด๋ฅผ ์˜์—ญ R,
์—ฌ๊ธฐ์— k๊ฐœ์˜ ํ‘œ๋ณธ์„ ํฌํ•จํ•œ๋‹ค๋ฉด
์ฒด์ : V ๏€ฝ h
D
D:
์ฐจ์›์˜ ์ˆ˜
Parzen ์ฐฝ(window) : ์ตœ์ดˆ๊ฐ’์ด ์ค‘์‹ฌ์ด ๋˜๋„๋ก ํ•˜๋Š” ๋‹จ์œ„ ์ดˆ์ž…๋ฐฉ์ฒด(hypercube)
๋ฒ”์œ„ ๋‚ด์— ํฌํ•จ์‹œํ‚ฌ ํ‘œ๋ณธ ์ˆ˜๋ฅผ ๊ฒฐ์ •ํ•  ํ•จ์ˆ˜(kernel)
๏ƒฌ
๏ƒฏ1, u j ๏ƒก 1 2
K (u ) ๏€ฝ ๏ƒญ
๏ƒฏ
๏ƒฎ0, ๊ทธ์™ธ
j ๏€ฝ 1,..., D
(Uniform kernel)
๏ƒฆ x ๏€ญ xn
k ๏€ฝ ๏ƒฅ K ๏ƒง๏ƒง
n ๏€ฝ1
๏ƒจ h
N
๋ฒ”์œ„ ๋‚ด์˜ ์ ์˜ ์ด์ˆ˜
๏ƒถ
๏ƒท
๏ƒท
๏ƒธ
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• Parzen–Rosenblatt ์œˆ๋„์šฐ ๋ฐ€๋„ํ•จ์ˆ˜ ์ถ”์ •์‹์˜ ์ผ๋ฐ˜ํ™”
– ์˜์—ญ ๋‚ด์˜ ํ‘œ๋ณธ์˜ ์ˆ˜๋Š” ์ค‘์‹ฌ์ด x์ด๊ณ  ๊ธธ์ด๊ฐ€ h์ธ ๋ฒ”์œ„ ๋‚ด๋ถ€์— ์  Xn
์ด ์†ํ•  ๋•Œ๋งŒ “1”์ด๊ณ , ๋‚˜๋จธ์ง€๋Š” “0”์ด ๋จ
– ์˜์—ญ V๋‚ด์— ํฌํ•จ๋˜๋Š” ํ‘œ๋ณธ์˜ ์ด ์ˆ˜๋ฅผ ๋ฐ€๋„์ถ”์ • ์ผ๋ฐ˜์‹์— ์น˜ํ™˜ํ•˜๋ฉด,
๏ƒฌ
๏ƒฏ1, u j ๏ƒก1 2
K (u ) ๏€ฝ ๏ƒญ
๏ƒฏ
๏ƒฎ0, ๊ทธ์™ธ
h
์ผ๋ฐ˜์ ์ธ ์ปค๋„ ๋ฐ€๋„ํ•จ์ˆ˜ ์ถ”์ •๊ธฐ
1
PKDE ( x) ๏€ฝ
Nh D
๏ƒฆX ๏€ญXn
K ๏ƒง๏ƒง
๏ƒฅ
h
n ๏€ฝ1
๏ƒจ
N
๏ƒถ
๏ƒท๏ƒท
๏ƒธ
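A minimal MATLAB sketch of this estimator in 1D (D = 1) with the uniform kernel; the data vector reuses the histogram example and the names are ours:
data = [2.1 2.4 2.4 2.47 2.7 2.6 2.65 3.3 3.39 3.8 3.87];
h    = 0.5;                            % window (hypercube) side length
N    = numel(data);
K    = @(u) double(abs(u) <= 0.5);     % uniform kernel: 1 inside |u| <= 1/2
xq   = linspace(1.5, 4.5, 301);        % query points
p    = zeros(size(xq));
for i = 1:numel(xq)
    p(i) = sum(K((xq(i) - data)/h)) / (N*h);   % P_KDE(x) = k/(N h)
end
plot(xq, p);   % note the piecewise-constant, discontinuous estimate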
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• Parzen–Rosenblatt ์œˆ๋„์šฐ
– Kernel functions in common use
for KDE
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์ปค๋„ํ•จ์ˆ˜์˜ ์—ญํ• 
PKDE ( x)์˜๊ธฐ๋Œ€๊ฐ’
– ์ถ”์ •๋œ ๋ฐ€๋„ PKDE ( x) ์˜ ๊ธฐ๋Œ€๊ฐ’: ์ปค๋„ ํ•จ์ˆ˜์™€ ์‹ค์ œ ํ™•๋ฅ ๋ฐ€๋„์˜ convolution
– ์ปค๋„์˜ ํญ h๋Š” ์Šค๋ฌด๋”ฉ ํŒŒ๋ผ๋ฏธํ„ฐ ์—ญํ• 
• ์ปค๋„ ํ•จ์ˆ˜๊ฐ€ ๋„“์œผ๋ฉด ๋„“์„์ˆ˜๋ก ์ถ”์ •์น˜๋Š” ๋” ๋ถ€๋“œ๋Ÿฌ์›Œ์ง
• h ๏ƒ 0 : ์ปค๋„ ํ•จ์ˆ˜์˜ ํญ ๏ƒ  0, ํ•จ์ˆ˜ ๊ฐ’ ๏ƒ  ๋ฌดํ•œ๋Œ€
์ฆ‰, h ๏ƒ 0 ์ด๋ฉด ์ปค๋„ ํ•จ์ˆ˜๊ฐ€ delta function์— ์ˆ˜๋ ดํ•จ.
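In symbols (a standard identity, stated here for completeness rather than copied from the slides):
  E\big[P_{KDE}(x)\big] = \frac{1}{h^{D}} \int K\!\left(\frac{x - x'}{h}\right) p(x')\,dx' = (K_h * p)(x), \qquad K_h(u) = \frac{1}{h^{D}}\, K\!\left(\frac{u}{h}\right)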
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์ปค๋„ํ•จ์ˆ˜์˜ ์—ญํ• 
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์ปค๋„ํ•จ์ˆ˜์˜ ์—ญํ• 
๏ƒฌ
๏ƒฏ1, u j ๏ƒก1 2
K (u ) ๏€ฝ ๏ƒญ
๏ƒฏ
๏ƒฎ0, ๊ทธ์™ธ
(Uniform kernel)
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์ปค๋„ํ•จ์ˆ˜์—์„œ bandwidth h์˜ ์—ญํ• 
1
pn ( x) ๏€ฝ
Nh D
๏ƒฆ X ๏€ญXn ๏ƒถ
๏ƒท๏ƒท
K ๏ƒง๏ƒง
๏ƒฅ
n ๏€ฝ1
๏ƒจ h ๏ƒธ
N
– ์ปค๋„ ํ•จ์ˆ˜๊ฐ€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ delta function์ด๋ผ๊ณ  ๊ฐ€์ •ํ•˜๋ฉด
1 ๏ƒฆX๏ƒถ
๏ค n ( X ) ๏€ฝ K ๏ƒง๏ƒง ๏ƒท๏ƒท
Vn ๏ƒจ hn ๏ƒธ
1 n
pn ( x ) ๏€ฝ ๏ƒฅ ๏ค n ๏€จ X ๏€ญ X
n j ๏€ฝ1
j
๏€ฉ
• h ๊ฐ€ ์ปค์งˆ์ˆ˜๋ก
– delta function์˜ ํฌ๊ธฐ ๊ฐ์†Œ
– ๋™์‹œ์— ํฌํ•จ๋˜๋Š” ํ‘œ๋ณธ ๋ฐ์ดํ„ฐ ์ˆ˜ ์ฆ๊ฐ€
– ๋„“๊ฒŒ ํผ์ง„ ํ‰ํ™œํ•œ ํ˜•ํƒœ์˜ ํ•จ์ˆ˜
• h ๊ฐ€ ์ž‘์•„์งˆ์ˆ˜๋ก
– delta function์˜ ํฌ๊ธฐ ์ฆ๊ฐ€
– Bandwith์— ํฌํ•จ๋˜๋Š” ํ‘œ๋ณธ ๋ฐ์ดํ„ฐ ์ˆ˜ ๊ฐ์†Œ
– ํ”ผํฌ๋ชจ์–‘์— ๊ฐ€๊นŒ์šด ํ˜•ํƒœ์˜ ๋ฐ€๋„ํ•จ์ˆ˜
๏ค n (X )
04_Parzen ์ฐฝ์— ์˜ํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์ปค๋„ํ•จ์ˆ˜์˜ ์—ญํ• 
05_์Šค๋ฌด๋“œ ์ปค๋„์„ ์ด์šฉํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
– ์—ฐ์†์ ์ธ ํ˜•ํƒœ์˜ ์ปค๋„ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ
ํ•ญ์ƒ ์–‘์ธ ์Šค๋ฌด๋“œ ์ปค๋„ํ•จ์ˆ˜ K(x)๋กœ
ํŒŒ์  ์ฐฝ ์ผ๋ฐ˜ํ™”ํ•ด์„œ ๋ถˆ์—ฐ์† ํ˜„์ƒ ํ•ด๊ฒฐ
h: ์Šค๋ฌด๋“œ ํŒŒ๋ผ๋ฏธํ„ฐ ๋˜๋Š” ๋Œ€์—ญํญ
– Parzen ์ฐฝ์„ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ์ถ”์ •๋œ ์ „์ฒด ๋ฐ€๋„์˜ ๋ชจ์–‘์ด
๋ถˆ์—ฐ์†์ ์ธ ๋ชจ์–‘์„ ๊ฐ–๋Š”
๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ์ถ”์ •๋ฒ•
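A minimal MATLAB sketch with a smooth (Gaussian) kernel, comparing a narrow and a wide bandwidth; the data and names are ours, and the bandwidth values are illustrative:
data = [2.1 2.4 2.4 2.47 2.7 2.6 2.65 3.3 3.39 3.8 3.87];
N  = numel(data);
K  = @(u) exp(-0.5*u.^2)/sqrt(2*pi);   % Gaussian kernel, always positive
xq = linspace(1.5, 4.5, 301);
hold on;
for h = [0.1 1.0]                      % narrow vs. wide bandwidth
    p = zeros(size(xq));
    for i = 1:numel(xq)
        p(i) = sum(K((xq(i) - data)/h)) / (N*h);
    end
    plot(xq, p);                       % smooth, continuous estimates
end
legend('h = 0.1 (undersmoothed)', 'h = 1.0 (oversmoothed)');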
05_์Šค๋ฌด์Šค ์ปค๋„์„ ์ด์šฉํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์ฐฝํ•จ์ˆ˜ ํšจ๊ณผ
05_์Šค๋ฌด์Šค ์ปค๋„์„ ์ด์šฉํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์ฐฝํ•จ์ˆ˜ ํšจ๊ณผ
- ๋Œ€์—ญํญ์ด ์ง€๋‚˜์น˜๊ฒŒ ์ปค์ง€๋ฉด:
• ๋ฐ€๋„ํ•จ์ˆ˜์˜ ๊ณผ๋„ํ•œ ํ‰ํ™œํ™”
» Oversmooth
• ์ •ํ™•ํ•œ ํ˜•ํƒœ ์ถ”์ •์˜ ์–ด๋ ค์›€
- ๋Œ€์—ญํญ์ด ์ง€๋‚˜์น˜๊ฒŒ ์ข์•„์ง€
๋ฉด:
• ์ „์ฒด์  ์ถ”์„ธ๊ฐ€ ์•„๋‹Œ ๊ฐœ๋ณ„ ๋ฐ
์ดํ„ฐ์— ๋ฏผ๊ฐํ•ด์ง
» Undersmooth
• ๋ถ„์„์ด๋‚˜ ํด๋ž˜์Šคํ™” ํž˜๋“ค์–ด์ง
05_์Šค๋ฌด์Šค ์ปค๋„์„ ์ด์šฉํ•œ ์ปค๋„ ๋ฐ€๋„ ์ถ”์ •
• ์–ธ๋”์Šค๋ฌด๋“œvs. ์˜ค๋ฒ„์Šค๋ฌด๋“œ
Download