Introduction to Pattern Recognition for Human ICT
Review of probability and statistics
2014. 9. 19
Hyunki Hong
Contents
• Probability
• Random variables
• Random vectors
• The Gaussian random variable
Review of probability theory
• Definitions (informal)
1. Probabilities are numbers assigned to events that indicate
“how likely” it is that the event will occur when a random
experiment is performed.
2. A probability law for a random experiment is a rule that
assigns probabilities to the events in the experiment.
3. The sample space S of a random experiment is the set of all
possible outcomes.
• Axioms of probability
1. Axiom I: 𝑃[𝐴𝑖] ≥ 0
2. Axiom II: 𝑃[𝑆] = 1
3. Axiom III: if A_i ∩ A_j = ∅, then P[A_i ∪ A_j] = P[A_i] + P[A_j]
More properties of probability
Example: for N = 3, the inclusion-exclusion rule gives
P[A1∪A2∪A3] = P[A1] + P[A2] + P[A3] − P[A1∩A2] − P[A1∩A3] − P[A2∩A3] + P[A1∩A2∩A3]
Conditional probability
• If A and B are two events, the probability of event A when we
already know that event B has occurred is
P[A|B] = P[A∩B] / P[B],  for P[B] > 0
1. This conditional probability P[A|B] is read:
– the “conditional probability of A conditioned on B”, or simply
– the “probability of A given B”
Interpretation
1. The new evidence “B has occurred”
has the following effects.
- The original sample space S (the square) becomes B (the rightmost circle).
- The event A becomes A∩B.
2. P[B] simply re-normalizes the probability of events that occur jointly with B.
Theorem of total probability
• Let B_1, B_2, …, B_N be a partition of S (mutually exclusive events whose union is S).
• Any event 𝐴 can be represented as
A = A∩S = A∩(B_1 ∪ B_2 ∪ … ∪ B_N) = (A∩B_1) ∪ (A∩B_2) ∪ … ∪ (A∩B_N)
• Since B_1, B_2, …, B_N are mutually exclusive, then
P[A] = P[A∩B_1] + P[A∩B_2] + ⋯ + P[A∩B_N]
and, therefore
P[A] = P[A|B_1]P[B_1] + ⋯ + P[A|B_N]P[B_N] = Σ_{k=1}^{N} P[A|B_k]P[B_k]
Bayes theorem
• Assume {B_1, B_2, …, B_N} is a partition of S
• Suppose that event A occurs
• What is the probability of event B_j?
• Using the definition of conditional probability and the
theorem of total probability we obtain
P[B_j | A] = P[A ∩ B_j] / P[A] = P[A|B_j] P[B_j] / Σ_{k=1}^{N} P[A|B_k] P[B_k]
• This is known as Bayes Theorem or Bayes Rule, and is
(one of) the most useful relations in probability and
statistics.
Bayes theorem & statistical pattern recognition
• When used for pattern classification, Bayes theorem is generally
expressed as
P[ω_j | x] = p[x|ω_j] P[ω_j] / Σ_{k=1}^{N} p[x|ω_k] P[ω_k] = p[x|ω_j] P[ω_j] / p[x]
Mel-frequency cepstrum (MFC): a representation of the short-term power
spectrum of a signal, obtained by applying a cosine transform to the log
power spectrum on the nonlinear Mel scale of frequency. Mel-frequency
cepstral coefficients (MFCCs) are the coefficients that together make up
an MFC. (speech signal processing)
where ω_j is the j-th class (e.g., phoneme) and x is the
feature/observation vector (e.g., vector of MFCCs)
• A typical decision rule is to choose the class ω_j with the highest P[ω_j|x].
Intuitively, we choose the class that is more “likely” given the observation x.
• Each term in the Bayes Theorem has a special name
1. P[ω_j]: prior probability (of class ω_j)
2. P[ω_j|x]: posterior probability (of class ω_j given the observation x)
3. p[x|ω_j]: likelihood (probability of observation x given class ω_j)
4. p[x]: normalization constant (does not affect the decision)
• Example
1. Consider a clinical problem where we need to decide if a patient has a
particular medical condition on the basis of an imperfect test.
- Someone with the condition may go undetected (false-negative).
- Someone free of the condition may yield a positive result (false-positive).
2. Nomenclature
TN / (TN+FP) X 100 νŠΉμ΄λ„
- The true-negative rate P(NEG|-COND) of a test is called its
SPECIFICITY
TPcalled
/ (TP+FN)
X 100 민감도
- The true-positive rate P(POS|COND) of a test is
its SENSITIVITY
3. Problem
- Assume a population of 10,000 with a 1% prevalence for the condition
- Assume that we design a test with 98% specificity and 90% sensitivity
- Assume you take the test, and the result comes out POSITIVE
- What is the probability that you have the condition?
4. Solution
- Fill in the joint frequency table: of the 9,900 condition-free subjects, 98% test negative (TN = 9,702, FP = 198); of the 100 with the condition, 90% test positive (TP = 90, FN = 10), or
- Apply Bayes rule:
P[COND|POS] = P[POS|COND]P[COND] / (P[POS|COND]P[COND] + P[POS|¬COND]P[¬COND])
= (0.90 × 0.01) / (0.90 × 0.01 + 0.02 × 0.99) = 0.009/0.0288 ≈ 0.3125
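The same computation, as a MATLAB sketch using only the numbers stated in the problem:

    prev = 0.01; sens = 0.90; spec = 0.98;            % prevalence, sensitivity, specificity
    pPos = sens*prev + (1-spec)*(1-prev);             % total probability of a positive result
    pCondGivenPos = sens*prev / pPos;                 % Bayes rule
    fprintf('P[COND|POS] = %.4f\n', pCondGivenPos);   % prints 0.3125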
Random variables
• When we perform a random experiment, we are usually
interested in some measurement or numerical attribute of
the outcome.
ex) weights in a population of subjects, execution times when
benchmarking CPUs, shape parameters when performing
ATR
• These examples lead to the concept of random variable.
1. A random variable 𝑋 is a function that assigns a real number
𝑋(πœ‰) to each outcome πœ‰ in the sample space of
a random experiment.
2. 𝑋(πœ‰) maps from all possible outcomes
in sample space onto the real line.
Random variables
• The function that assigns values to each outcome is fixed
and deterministic, i.e., as in the rule “count the number of
heads in three coin tosses”
1. Randomness in 𝑋 is due to the underlying randomness
of the outcome πœ‰ of the experiment
• Random variables can be
1. Discrete, e.g., the resulting number after rolling a die
2. Continuous, e.g., the weight of a sampled
individual
Cumulative distribution function (cdf)
• The cumulative distribution function 𝐹𝑋(π‘₯) of a random
variable 𝑋 is defined as the probability of the event {𝑋 ≤
π‘₯}
𝐹𝑋(π‘₯) = 𝑃[𝑋 ≤ π‘₯] − ∞ < π‘₯ < ∞
• Intuitively, 𝐹𝑋(𝑏) is the long-term proportion
of times when 𝑋(πœ‰) ≤ 𝑏.
• Properties of the cdf: 0 ≤ F_X(x) ≤ 1; F_X(x) is non-decreasing in x; lim_{x→−∞} F_X(x) = 0 and lim_{x→+∞} F_X(x) = 1
Probability density function (pdf)
• The probability density function f_X(x) of a continuous
random variable X, if it exists, is defined as the derivative
of F_X(x):
f_X(x) = dF_X(x)/dx
• For discrete random variables, the equivalent
to the pdf is the probability mass function p_X(x) = P[X = x]
• Properties: f_X(x) ≥ 0; P[a ≤ X ≤ b] = ∫_a^b f_X(x) dx; ∫_{−∞}^{∞} f_X(x) dx = 1
Statistical characterization of random
variables
• The cdf or the pdf are SUFFICIENT to fully characterize
a r.v.
• However, a r.v. can be PARTIALLY characterized with
other measures
• Expectation (center of mass of a density): E[X] = μ = ∫ x f_X(x) dx
• Variance (spread about the mean): var[X] = σ² = E[(X − μ)²]
• Standard deviation: σ = (var[X])^{1/2}
• N-th moment: E[X^N] = ∫ x^N f_X(x) dx
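As a sketch of how these partial descriptors are estimated from data in MATLAB (the sample itself is assumed, drawn here from a Gaussian with mean 5 and standard deviation 2):

    x  = 5 + 2*randn(1, 1000);   % assumed sample
    mu = mean(x);                % estimate of the expectation
    v  = var(x);                 % estimate of the variance
    sd = std(x);                 % standard deviation
    m3 = mean(x.^3);             % estimate of the 3rd moment E[X^3]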
Random vectors
• An extension of the concept of a random variable
1. A random vector 𝑋 is a function that assigns a vector of real
numbers to each outcome πœ‰ in sample space 𝑆
2. We generally denote a random vector by a column vector.
• The notions of cdf and pdf are replaced by ‘joint cdf’ and
‘joint pdf ’.
1. Given random vector X = [x_1, x_2, …, x_N]^T we define the joint
cdf as
F_X(x) = P[X_1 ≤ x_1, X_2 ≤ x_2, …, X_N ≤ x_N]
2. and the joint pdf as
f_X(x) = ∂^N F_X(x) / (∂x_1 ∂x_2 … ∂x_N)
• The term marginal pdf is used to represent the pdf of a
subset of all the random vector dimensions
1. A marginal pdf is obtained by integrating out variables
that are of no interest
ex) for a 2D random vector X = [x_1, x_2]^T, the marginal pdf of x_1 is
f_{x1}(x_1) = ∫ f_X(x_1, x_2) dx_2
Statistical characterization of random vectors
• A random vector is also fully characterized by its joint
cdf or joint pdf.
• Alternatively, we can (partially) describe a random
vector with measures similar to those defined for scalar
random variables.
• Mean vector: μ = E[X] = [E[x_1], …, E[x_N]]^T
• Covariance matrix: Σ = E[(X − μ)(X − μ)^T]
• The covariance matrix indicates the tendency of each
pair of features (dimensions in a random vector) to
vary together, i.e., to co-vary
1. The covariance has several important properties
– If x_i and x_k tend to increase together, then c_ik > 0
– If x_i tends to decrease when x_k increases, then c_ik < 0
– If x_i and x_k are uncorrelated, then c_ik = 0
– |c_ik| ≤ σ_i σ_k, where σ_i is the standard deviation of x_i
– c_ii = σ_i² = var[x_i]
2. The covariance terms can be expressed as c_ii = σ_i² and c_ik = ρ_ik σ_i σ_k,
– where ρ_ik is called the correlation coefficient.
A numerical example
• Given the following samples from a 3D distribution
1. Compute the covariance matrix.
2. Generate scatter plots for every pair of vars.
3. Can you observe any relationships between the covariance
and the scatter plots?
4. You may work your solution in
the templates below.
x_1: 2 3 5 6 4
x_2: 2 4 4 6 4
x_3: 4 6 2 4 4
(each variable has mean 4; the deviations are x_1: −2 −1 1 2 0, x_2: −2 0 0 2 0, x_3: 0 2 −2 0 0)
Resulting covariance matrix:
Σ = [ 2.5   2  −1
       2    2   0
      −1    0   2 ]
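The result can be checked in MATLAB (a sketch added here): cov() normalizes by 1/(N−1), which reproduces the matrix above, and plotmatrix() draws the pairwise scatter plots:

    X = [2 2 4; 3 4 6; 5 4 2; 6 6 4; 4 4 4];   % one 3D sample per row
    C = cov(X)                                 % returns [2.5 2 -1; 2 2 0; -1 0 2]
    plotmatrix(X);                             % scatter plots for every pair of variables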
The Normal or Gaussian distribution
• The multivariate Normal distribution N(μ, Σ) is defined as
f_X(x) = 1/((2π)^{N/2} |Σ|^{1/2}) exp(−(1/2)(x − μ)^T Σ^{-1} (x − μ))
• For a single dimension, this expression reduces to
f_X(x) = 1/((2π)^{1/2} σ) exp(−(x − μ)²/(2σ²))
• Gaussian distributions are very popular since
1. Parameters πœ‡,Σ uniquely characterize the normal distribution
2. If all variables π‘₯𝑖 are uncorrelated (𝐸[π‘₯𝑖π‘₯π‘˜] = 𝐸[π‘₯𝑖]𝐸[π‘₯π‘˜]), then
– Variables are also independent (𝑃[π‘₯𝑖π‘₯π‘˜] = 𝑃[π‘₯𝑖]𝑃[π‘₯π‘˜]), and
– Σ is diagonal, with the individual variances in the main diagonal.
• Central Limit Theorem (next slide)
• The marginal and conditional densities are also Gaussian.
• Any linear transformation of any 𝑁 jointly Gaussian r.v.’s results
in 𝑁 r.v.’s that are also Gaussian.
1. For 𝑋 = [𝑋1𝑋2…𝑋𝑁]𝑇 jointly Gaussian, and 𝐴𝑁×𝑁 invertible,
then π‘Œ = 𝐴𝑋 is also jointly Gaussian.
• Central Limit Theorem
1. Given any distribution with mean μ and variance σ², the sampling
distribution of the mean approaches a normal distribution with mean μ
and variance σ²/N as the sample size N increases
- No matter what the shape of the original distribution is, the sampling
distribution of the mean approaches a normal distribution.
- 𝑁 is the sample size used to compute the mean, not the overall number of
samples in the data.
2. Example: 500 experiments are performed using a uniform distribution.
- 𝑁=1
1) One sample is drawn from the distribution and its mean is recorded (500 times).
2) The histogram resembles a uniform distribution, as one would expect.
- 𝑁=4
1) Four samples are drawn and the mean of the four samples
is recorded (500 times)
2) The histogram starts to look more Gaussian
- As 𝑁 grows, the shape of the histograms resembles
a Normal distribution more closely.
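A minimal MATLAB sketch of this experiment (the 500 repetitions follow the slide; the largest N is an arbitrary illustration):

    for N = [1 4 32]
        means = mean(rand(N, 500), 1);   % 500 experiments, each the mean of N uniform draws
        figure; hist(means, 20);         % histogram of the 500 sample means
        title(sprintf('N = %d', N));
    end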
01_Basic statistics
• Statistics
  - The discipline of turning data into information and, based on it, studying objective decision-making.
  - The problem of estimating uncertain states.
  - Uses various probabilistic techniques to discriminate and analyze uncertain states.
    Ex: modeling and analyzing very noisy data.
  - In pattern recognition, statistical analysis of a given state using statistical techniques is essential.
  - State estimation using probability values.
  - Probability: plausibility, reliability, frequency of occurrence, etc.
01_Basic statistics
• Statistical terms
  - Estimation: deriving particular facts from summarized data.
  - Reliability of data analysis: if data are collected repeatedly and independently, is the same result always obtained? That is, is universal reproducibility possible?
  - Population: the entire set of objects of interest in the data analysis.
  - Sample: individual data items, a subset of the population, collected to understand the population's characteristics.
  - Sampling distribution: the distribution of statistics obtained from all possible samples of the same size drawn from the same population.
01_Basic statistics
• Statistical parameters
  - Mean: the sum of the data divided by the number of data points; corresponds to the center of mass of the data distribution.
  - Variance: the degree to which the data are scattered about the mean; the mean squared difference between the data and the mean.
  - Standard deviation: the square root of the variance, bringing the measure back to the (first-order) units of the data.
01_Basic statistics
• Bias: indicates the degree to which the data are skewed to one side.
01_Basic statistics
• Covariance
  - A measure of the relationship between two variables when two or more kinds of data are given.
  - Sample covariance computed for bivariate data.
    Ex: x_i: the number of people passing the art center's main gate between 2 and 3 o'clock; y_i: the average temperature between 2 and 3 o'clock.
  - Indicates only whether the linear relationship between the variables is positive (+) or negative (−):
    Cov(x, y) > 0: positive linear relationship
    Cov(x, y) < 0: negative linear relationship
    Cov(x, y) = 0: no linear relationship
01_Basic statistics
• Correlation coefficient
  - A number indicating the strength of the correlation between two variables X and Y.
  - Difference from covariance:
    Covariance: indicates only whether the linear relationship between the variables is positive or negative.
    Correlation coefficient: indicates how strong the linear relationship is, independent of the scales of the variables' distributions.
01_Basic statistics
• Ex: correlation among library time, grades, and absences
  x_1: hours per week spent in the library
  x_2: score in the pattern recognition course
  x_3: hours absent per week
  - A small correlation coefficient indicates that two components have almost no linear relationship.
  - A negative correlation coefficient means the two components vary in opposite directions.
01_Basic statistics
• Skewness (skew)
  - A statistical measure of how far a distribution leans to one side (asymmetry).
  - Positive if the right tail is longer, negative if the left tail is longer, and 0 if the distribution is symmetric.
01_Basic statistics
• Kurtosis
  - A statistical measure of the peakedness of a distribution.
Generating 2D data
• Generate data stochastically according to a specific probability distribution:
  - define a data probability distribution that suits the research purpose
  - draw data at random from this distribution
• Uniform distribution: rand()
• Gaussian distribution: randn() → in the general case a transformation by the desired mean/covariance is needed
Generating 2D data

    N = 25;
    m1 = repmat([3,5], N, 1);          % repmat: replicate and tile an array
    m2 = repmat([5,3], N, 1);
    s1 = [1 1; 1 2];
    s2 = [1 1; 1 2];
    X1 = randn(N,2)*sqrtm(s1) + m1;    % sqrtm: matrix square root
    X2 = randn(N,2)*sqrtm(s2) + m2;    % randn: normally distributed pseudorandom numbers
    plot(X1(:,1), X1(:,2), '+');
    hold on;
    plot(X2(:,1), X2(:,2), 'd');
    save data2_1 X1 X2;

ex) Generate values from a normal distribution with mean 1 and standard deviation 2:
    r = 1 + 2.*randn(100,1)
Generating 2D data
[Figure: scatter plots of the generated data for classes 1 and 2, shown separately for the training data and the test data]
Learning: analyzing the distribution characteristics of the data
• Distribution analysis → decision boundary

    load data2_1;
    m1 = mean(X1);    % class-1 mean
    m2 = mean(X2);    % class-2 mean
    s1 = cov(X1);     % class-1 covariance
    s2 = cov(X2);     % class-2 covariance
    save mean2_1 m1 m2 s1 s2;

(The means and covariances are estimated from the training data.)
Classification: setting the decision boundary
• Discriminant function: d(m1, xnew) − d(m2, xnew) = 0
• Class label: a new sample xnew is assigned to class C1 when it is closer to m1 (d(m1, xnew) < d(m2, xnew)) and to class C2 otherwise.
[Figure: the two class means m1 and m2, a query point xnew, and its distances d(m1, xnew) and d(m2, xnew)]
Performance evaluation
• Program to compute the training error

    load data2_1;
    load mean2_1;
    Etrain = 0;
    N = size(X1,1);
    for i = 1:N
        d1 = norm(X1(i,:) - m1);    % distance to the class-1 mean
        d2 = norm(X1(i,:) - m2);    % distance to the class-2 mean
        if (d1 - d2) > 0, Etrain = Etrain + 1; end    % class-1 sample misclassified
        d1 = norm(X2(i,:) - m1);
        d2 = norm(X2(i,:) - m2);
        if (d1 - d2) < 0, Etrain = Etrain + 1; end    % class-2 sample misclassified
    end
    fprintf(1, 'Training Error = %.3f\n', Etrain/50);
Performance evaluation
[Figure: decision boundary with the data. Training data: error 2/50; test data: error 3/50]
λ°μ΄ν„°μ˜ μˆ˜μ§‘κ³Ό μ „μ²˜λ¦¬
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
ν…ŒμŠ€νŠΈ 데이터
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
0 1 2 3 4 5 6 7 8 9
μ „μ²˜λ¦¬λ₯Ό 거친 데이터 집합
(크기 μ •κ·œν™”, 이진화)
ν•™μŠ΅ 데이터
μˆ˜μ§‘λœ 데이터 집합
41
Data transformation
Read the image data, represent each image as a data vector, and store the vectors.
• (Preprocessed) image data (20×16) → vector (320-dimensional)
• Training data: 320×50; test data: 320×20

    for i = 0:1:9
        for j = 3:7                                 % 5 training images per digit
            fn = sprintf('digit%d_%d.bmp', i, j);
            xi = imread(fn);
            x = reshape(double(xi), 20*16, 1);      % reshape array: 320 by 1
            Xtrain(:, i*5+j-2) = x;                 % training set: 320 by 50
            Ttrain(i*5+j-2, 1) = i;                 % class labels: 50 by 1
        end
        for j = 1:2                                 % 2 test images per digit
            fn = sprintf('digit%d_%d.bmp', i, j);
            xi = imread(fn);
            x = reshape(double(xi), 20*16, 1);
            Xtest(:, i*2+j) = x;                    % test set: 320 by 20
            Ttest(i*2+j, 1) = i;                    % class labels: 20 by 1
        end
    end
    save digitdata Xtrain Ttrain Xtest Ttest
Learning and the decision boundary
• Compute the mean vector of the training data for each digit

    load digitdata
    for i = 0:1:9
        mX(:, i+1) = mean(Xtrain(:, i*5+1:i*5+5)')';   % mean image of digit i
    end
    save digitMean mX;

    mXi = uint8(mX*255);    % convert to unsigned 8-bit integer
    for i = 0:1:9
        subplot(1, 10, i+1);
        imshow(reshape(mXi(:, i+1), 20, 16));          % display the 10 mean images
    end

Decision rule: assign an input to the digit whose mean image is nearest (next slide).
Classification and performance evaluation
• Decision rule: nearest mean image

    load digitdata; load digitMean
    Ntrain = 50; Ntest = 20;
    Etrain = 0; Etest = 0;
    for i = 1:50                                  % training data
        x = Xtrain(:, i);
        for j = 1:10
            dist(j) = norm(x - mX(:, j));
        end
        % minv: minimum value; minc_train: label of the nearest mean image
        [minv, minc_train(i)] = min(dist);
        if (Ttrain(i) ~= (minc_train(i) - 1)), Etrain = Etrain + 1; end
    end
    for i = 1:20                                  % test data
        x = Xtest(:, i);
        for j = 1:10
            dist(j) = norm(x - mX(:, j));
        end
        [minv, minc_test(i)] = min(dist);
        if (Ttest(i) ~= (minc_test(i) - 1)), Etest = Etest + 1; end
    end
Classification and performance evaluation
[Figure: digit classification results. Training data: error 2%; test data: error 15%]
01_Basic statistics
• Regression analysis
  - Regression line: the process of finding the line that best represents a set of samples,
    i.e., the line for which the sum of squared distances from the samples to the line is smallest
    (the line minimizing the sum of squared lengths of the vertical segments dropped from each sample to the line).
  - The 'method of least squares' (here called the Method of Least Mean Squares, LMS).
E = Σ_i d_i² = Σ_i [y_i − (α x_i + β)]²

Setting ∂E/∂α = 0 and ∂E/∂β = 0 gives

α = (N Σ x_i y_i − Σ x_i Σ y_i) / (N Σ x_i² − (Σ x_i)²)
β = (Σ x_i² Σ y_i − Σ x_i Σ x_i y_i) / (N Σ x_i² − (Σ x_i)²)
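A MATLAB sketch of these closed-form formulas, using the data points of the worked example that appears a few slides below:

    x = [0 1 2 3]; y = [1 3 4 4]; N = numel(x);
    den   = N*sum(x.^2) - sum(x)^2;
    alpha = (N*sum(x.*y) - sum(x)*sum(y)) / den;           % slope: 1
    beta  = (sum(x.^2)*sum(y) - sum(x)*sum(x.*y)) / den;   % intercept: 1.5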
Least squares line fitting
• Data: (x_1, y_1), …, (x_n, y_n)
• Line equation: y_i = m x_i + b
• Find (m, b) to minimize

E = Σ_{i=1}^{n} (y_i − m x_i − b)²

[Figure: data points (x_i, y_i) and the fitted line y = m x + b]

In matrix form, with Y = [y_1 … y_n]^T, X = [x_1 1; … ; x_n 1], and B = [m b]^T:

E = ‖Y − XB‖² = (Y − XB)^T (Y − XB) = Y^T Y − 2(XB)^T Y + (XB)^T (XB)

dE/dB = 2 X^T X B − 2 X^T Y = 0
X^T X B = X^T Y

Normal equations: least squares solution to XB = Y
Example (linear algebra)
• Data: (x_1, y_1), …, (x_n, y_n)
• Line equation: y_i = m x_i + b, i.e., y_1 = m x_1 + b, …, y_n = m x_n + b
• In matrix form y = M v, with M = [x_1 1; … ; x_n 1] and v = [m b]^T.
  With measurement error, the measured points do not lie exactly on the line.
• M^T y = M^T M v  ⇒  v = (M^T M)^{-1} M^T y
  The line y = m x + b obtained from this equation is the least squares line of best fit, or regression line.
 (0, 1), (1, 3), (2, 4), (3, 4)의 μ΅œμ†Œμ œκ³±μ§μ„ ?
0
οƒͺ
1
M ο€½ οƒͺ
οƒͺ2
οƒͺ
3
1οƒΉ
οƒΊ
1
οƒΊ,
1οƒΊ
οƒΊ
1
1 οƒΉ
οƒͺ οƒΊ
3
y ο€½ οƒͺ οƒΊ
οƒͺ4οƒΊ
οƒͺ οƒΊ
4
0
T
M M ο€½ οƒͺ
1
1
2
1
1
0
οƒͺ
3οƒΉ 1
οƒͺ
οƒΊ
1 οƒͺ2
οƒͺ
3
1οƒΉ
οƒΊ
1
14
οƒΊο€½
οƒͺ
1οƒΊ  6
οƒΊ
1
 2
m
οƒͺ
 οƒΉ
T
-1
T
v ο€½ οƒͺ οƒΊ ο€½ ( M M ) M y ο€½ οƒͺ 10
ο€­3
b 
οƒͺ
 10
6οƒΉ
οƒΊ,
4
ο€­ 3οƒΉ
10 οƒΊ  0
7 οƒΊ οƒͺ 1
οƒΊ
10 
 2
οƒͺ
T
-1
( M M ) ο€½ οƒͺ 10
ο€­3
οƒͺ
 10
1
2
1
1
ο€­ 3οƒΉ
10 οƒΊ
7 οƒΊ
οƒΊ
10 
1 οƒΉ
οƒͺ οƒΊ
3οƒΉ 3
 1 οƒΉ
οƒͺ οƒΊο€½
οƒΊ
οƒͺ οƒΊ
1  οƒͺ 4 οƒΊ 1 . 5 
οƒͺ οƒΊ
4
48
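The worked example can be verified in MATLAB, where the backslash operator solves the same normal equations:

    M = [0 1; 1 1; 2 1; 3 1];
    y = [1; 3; 4; 4];
    v  = (M'*M) \ (M'*y)   % normal equations: returns [1; 1.5]
    v2 = M \ y             % backslash gives the same least squares solution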
Total least squares
• Distance between point (x_i, y_i) and the line ax + by = d (with a² + b² = 1): |a x_i + b y_i − d|
• Find (a, b, d) to minimize the sum of squared perpendicular distances

E = Σ_{i=1}^{n} (a x_i + b y_i − d)²

∂E/∂d = Σ_{i=1}^{n} −2(a x_i + b y_i − d) = 0
⇒ d = (a/n) Σ x_i + (b/n) Σ y_i = a x̄ + b ȳ

Substituting d back:

E = Σ_{i=1}^{n} (a(x_i − x̄) + b(y_i − ȳ))² = ‖UN‖² = (UN)^T (UN)

where U = [x_1 − x̄  y_1 − ȳ; … ; x_n − x̄  y_n − ȳ] and N = (a, b) is the unit normal to the line.

dE/dN = 2 (U^T U) N = 0
Total least squares
E = Σ_{i=1}^{n} (a(x_i − x̄) + b(y_i − ȳ))² = (UN)^T (UN),   dE/dN = 2 (U^T U) N = 0

Solution to (U^T U) N = 0, subject to ‖N‖² = 1: the eigenvector of U^T U
associated with the smallest eigenvalue (the least squares solution to the
homogeneous linear system UN = 0), where

U = [x_1 − x̄  y_1 − ȳ; … ; x_n − x̄  y_n − ȳ]

U^T U = [ Σ (x_i − x̄)²           Σ (x_i − x̄)(y_i − ȳ) ]
        [ Σ (x_i − x̄)(y_i − ȳ)   Σ (y_i − ȳ)²         ]

(the second moment matrix)
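A MATLAB sketch of this eigenvector solution (the data points are assumed, reusing the earlier least squares example):

    x = [0 1 2 3]'; y = [1 3 4 4]';
    U = [x - mean(x), y - mean(y)];    % centered data matrix
    [V, D] = eig(U'*U);                % eigen-decomposition of the second moment matrix
    [dmin, k] = min(diag(D));          % index of the smallest eigenvalue
    n = V(:, k);                       % unit normal N = (a, b)
    a = n(1); b = n(2);
    d = a*mean(x) + b*mean(y);         % the total least squares line is a*x + b*y = d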
01_Basic statistics
• Regression curve, curve fitting
  - The process of mapping a 2nd-order function to the sample data (higher-order functions can be mapped as well).
  - A regression equation expressed as a curve is called a 'regression curve'.
[Figure: data and a 4th-order polynomial fit, y = −6E−12·x⁴ + 1E−07·x³ + 2E−05·x² − 0.0461·x − 0.1637]
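A sketch of curve fitting in MATLAB with polyfit/polyval, here for an assumed noisy quadratic (order 2, as described above):

    x = 0:10;
    y = 0.5*x.^2 - 2*x + 1 + randn(size(x));   % assumed noisy quadratic data
    p = polyfit(x, y, 2);                      % fit a 2nd-order regression curve
    plot(x, y, 'o', x, polyval(p, x), '-');    % data and fitted curve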
02_Probability theory
• Probability terms
  - Statistical phenomenon: a phenomenon whose intrinsic laws and characteristics can be discovered by repeatedly observing an uncertain phenomenon, or by drawing a large number of samples from it.
  - Characteristics of a random experiment:
    It can be repeated under identical conditions.
    The outcome of each trial cannot be predicted, but the set of all possible outcomes is known.
    Individual outcomes occur irregularly, but as the number of repetitions grows, a statistical regularity emerges.
02_Probability theory
• Probability
  - A measure of the degree of certainty of a statistical phenomenon.
  - Numbers assigned to events that indicate how likely each event is to occur in a random trial.
  - Probability law: the rule that assigns probabilities to the events of a random trial; the sample space S of the random trial is the set of all possible outcomes.
02_Probability theory
• Mathematical probability
  - For a sample space S, n(A)/n(S) is called the probability of event A
    (when the number of all possible outcomes and the sample set are given in advance).
• Statistical probability
  - The process of empirically estimating the probability of a target event from its relative frequency over many repeated trials.
    In most natural or social phenomena, the possibilities and counts cannot be known in advance.
  - If an event occurs r times in n trials, the relative frequency is r/n.
  - A probability estimated this way is called a statistical probability.
• Events and mutually exclusive events
  - An experiment whose outcome is due to chance is called a trial.
  - The outcome of a random trial is called an 'event'.
  - Two events A and B that cannot occur at the same time are said to be mutually exclusive.
  - Two events A and B that are entirely unrelated to each other are called independent events.
02_Probability theory
• Sample space and probability space
  - After specifying the conditions of observation and the range of possible outcomes, the set that assigns a symbol to each outcome within that range is called the 'sample space'.
    An element of the sample space is called a 'sample point'.
    An event consisting of a single sample point is called an 'elementary event'.
    A subset of the sample space is called an 'event'.
  - The set with probabilities assigned to the sample space is called the 'probability space': the set of all possible outcomes of the random experiment together with their probabilities.
  - Example: rolling a die once and observing an even number:
    the sample space is S = {1, 2, 3, 4, 5, 6}, the elementary events are {1}, {2}, …, {6}, and the event is E = {2, 4, 6}.
02_Probability theory
• Theorems about probability
• Properties of probability
02_Probability theory
• Marginal probability
  - When two or more events are given, the simple probability that one of them occurs,
    i.e., the probability that does not take the other events into account.
• Conditional probability
  - Given two events A and B, the probability that A occurs when it is already known that B has occurred:
    P[A|B], the probability of A given B.
  - The individual probabilities P[A] and P[B] are called prior probabilities.
  - The conditional probabilities P[A|B] and P[B|A] are called posterior probabilities.
  - If A and B are independent events, then P[A|B] = P[A].
02_Probability theory
• Joint probability
  - The probability that events A and B occur simultaneously, derived from the conditional probability:
    P[B] × P[A|B] = P(A∩B),   P[A] × P[B|A] = P(A∩B)
  - If events A and B are independent:
    P[A|B] = P(A), P[B|A] = P(B)
    P[A|B] = P(A) → P[B] × P[A] = P(A∩B) holds.
• Chain rule
  - Expressing the probability of a sequence of events as a chained product of conditional probabilities, using conditional and joint probabilities; this is called the 'chain rule'.
02_Probability theory
• Chain rule
  - Joint distribution: the probability distribution of several random variables considered together
    1. discrete case
    2. continuous case
  - The probability of a sequence of events expressed as a chained product of conditional probabilities, using conditional and joint probabilities.
02_Probability theory
• Total probability
  - Let B1, B2, …, Bn be mutually exclusive events whose union is the sample space.
    These sets partition the sample space S, and any event A can then be expressed as
    A = (A∩B1) ∪ (A∩B2) ∪ … ∪ (A∩Bn)
  - Since B1, B2, …, Bn are mutually exclusive,
    P[A] = P[A∩B1] + P[A∩B2] + ⋯ + P[A∩Bn] = Σ_k P[A|Bk]P[Bk]
    This is called the total probability of event A.
02_Probability theory
• Total probability
  - Example: a factory produces goods with three kinds of machines A, B, and C, which make 50%, 30%, and 20% of total production, respectively. The defect rates of their products are 1%, 2%, and 3%, respectively. If one item is picked at random from stock and inspected, what is the probability that it is defective?
    P[defective] = P[A∩defective] + P[B∩defective] + P[C∩defective]
                 = P[defective|A]P[A] + P[defective|B]P[B] + P[defective|C]P[C]
                 = (0.01×0.5) + (0.02×0.3) + (0.03×0.2) = 0.017
02_Probability theory
• Bayes' rule
  - Given a partition B1, B2, …, BN of the sample space S, suppose event A has occurred. What is the probability that it arose from region Bj?
  - When a feature vector x is observed, the problem is to find the class ω_j to which the feature vector belongs
    → choose the class ω_i with the largest P[ω_i|x].
  - In the previous example, if the item picked from the warehouse turns out to be defective, what is the probability that it is a product of machine A?
    P[ω_A|x] = P[x|ω_A]P[ω_A] / Σ_{k=A,B,C} P[x|ω_k]P[ω_k]
             = P[x|ω_A]P[ω_A] / (P[x|ω_A]P[ω_A] + P[x|ω_B]P[ω_B] + P[x|ω_C]P[ω_C])
             = (0.01×0.5) / [(0.01×0.5) + (0.02×0.3) + (0.03×0.2)] = 5/17
01_Random variables
• Random variable
  - A function that maps each individual trial outcome of a random experiment to a number is called a 'random variable'.
  - Ex: tossing two coins, each landing heads (H) or tails (T):
    sample space S = {(H, H), (H, T), (T, H), (T, T)}
    → probability of at least one head: 3/4
  - When the trial is complex, as with tossing ten coins, computing probabilities directly on the sample space is hard; it is easier to map the experimental outcomes to numbers and carry out the statistical analysis there.
  - Example: the random variable X defined by the rule 'the number of heads when tossing two coins' takes the values x = {0, 1, 2}.
  - The random variable X defined by the rule 'the sum of the spots of two dice' takes the values x = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
  cf. The meaning of 'random': the uncertainty of not knowing, before the trial, which value will occur, or that every value in the sample space is equally likely.
  Random variables are written with uppercase letters; the values they can take are written in lowercase.
02_Probability distribution
• Probability distribution
  - The distribution of the probability values taken by a numerically mapped random variable:
    the probability that random variable X takes the value x, written P(X = x) or p(x).
  - Ex 1) the number of heads when tossing two coins (head = 1, tail = 0)
  - Ex 2) the sum of the spots when rolling two dice
  cf. The function that assigns a probability in the probability space to each concrete value the random variable can take is the probability distribution function, or probability function.
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 λˆ„μ λΆ„ν¬ν•¨μˆ˜
65
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 λˆ„μ λΆ„ν¬ν•¨μˆ˜μ˜ μ„±μ§ˆ (cdf)
66
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 ν™•λ₯ λ°€λ„ν•¨μˆ˜(pdf), ν™•λ₯ μ§ˆλŸ‰ν•¨μˆ˜(pmf)
cf. ν™•λ₯ λ°€λ„ν•¨μˆ˜λŠ” ν™•λ₯ μ˜ λ°€λ„λ§Œμ„ μ •μ˜. μ‹€μ œ ν™•λ₯ μ„ μ–»μœΌλ €λ©΄ ν™•λ₯ λ°€λ„ν•¨μˆ˜λ₯Ό 일정 κ΅¬κ°„μ—μ„œ 적뢄
67
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 κ°€μš°μ‹œμ•ˆ ν™•λ₯ λ°€λ„ν•¨μˆ˜
 ν™•λ₯ λΆ„ν¬ν•¨μˆ˜ μ€‘μ—μ„œ κ°€μž₯ 많이 μ΄μš©λ˜λŠ” 뢄포가 κ°€μš°μŠ€(Gaussian)
ν™•λ₯ λ°€λ„ν•¨μˆ˜λ‹€.
 1μ°¨μ›μ˜ νŠΉμ§• 벑터일 κ²½μš°μ—λŠ” 두 개의 νŒŒλΌλ―Έν„°, 평균 μ 와 ν‘œμ€€νŽΈμ°¨ σ 만
정해지면 κ°€μš°μ‹œμ•ˆ ν™•λ₯ λ°€λ„ν•¨μˆ˜λŠ” λ‹€μŒκ³Ό 같이 ν‘œν˜„ν•œλ‹€.
68
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 κ°€μš°μ‹œμ•ˆ ν™•λ₯ λ°€λ„ν•¨μˆ˜
 κ°€μš°μŠ€(Gaussian) ν™•λ₯ λ°€λ„ν•¨μˆ˜
Gaussian PDF
Gaussian CDF
69
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 ν™•λ₯ λ°€λ„ν•¨μˆ˜μ˜ μ„±μ§ˆ
70
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 ν™•λ₯ λ°€λ„ν•¨μˆ˜μ™€ ν™•λ₯ 
71
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 ν™•λ₯ μ§ˆλŸ‰ν•¨μˆ˜
72
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 κΈ°λŒ€κ°’ : ν™•λ₯ λ³€μˆ˜μ˜ 평균
 일반적인 ν‘œλ³Έ 평균
 ν™•λ₯ λ³€μˆ˜μ˜ 평균과 λΆ„μ‚°
 일반 λ°μ΄ν„°μ˜ 평균(x) 및 λΆ„μ‚° (s)κ³Ό κ΅¬λ³„ν•˜κΈ° μœ„ν•΄μ„œ 평균은 μ, 뢄산은 σ둜 λ‚˜νƒ€λƒ„.
 ο€½
1
οƒ₯
n
x
nx
n
nxx ο€½
οƒ₯
x
nx
x
n
: μƒλŒ€λ„μˆ˜, λΉˆλ„μˆ˜
E[X ] ο€½  ο€½
οƒ₯x
p(x )
all x
: X의 κΈ°λŒ€κ°’(expectation)
= 각 κ°’μ˜ 가쀑 μ‚°μˆ ν‰κ· μ˜ ν™•λ₯ μ  μš©μ–΄ ν‘œν˜„
(연속확λ₯  λ³€μˆ˜μΈ 경우)
73
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 κΈ°λŒ€κ°’ : ν™•λ₯ λ³€μˆ˜μ˜ 평균
 ν™•λ₯ λ³€μˆ˜μ˜ 평균과 λΆ„μ‚°:
 ο€½
1
οƒ₯
n
x
nxx ο€½
οƒ₯
x
nx
x
n
An illustration of the convergence of
sequence averages of rolls of a die to the
expected value of 3.5 as the number of
trials grows
74
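A minimal MATLAB sketch of the convergence illustrated in the figure:

    rolls  = randi(6, 1, 1000);            % 1000 fair die rolls
    runavg = cumsum(rolls) ./ (1:1000);    % running average after each trial
    plot(runavg); hold on;
    plot([1 1000], [3.5 3.5], 'r--');      % the expected value 3.5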
03_ν™•λ₯ ν•¨μˆ˜μ˜ μ’…λ₯˜
 ν™•λ₯ λ³€μˆ˜μ˜ λΆ„μ‚°κ³Ό ν‘œμ€€νŽΈμ°¨
(variance, 2th central moment)
: ν™•λ₯ λ³€μˆ˜μ˜ 뢄산도 λ°μ΄ν„°μ˜ λΆ„μ‚°μ‹μ—μ„œ μœ λ„
(연속확λ₯  λ³€μˆ˜μΈ 경우)
75
04_Vector random variables
• Random-variable vector
  - A vector composed of random variables, used when two or more random variables are considered simultaneously; defined as a column vector of stacked random variables.
  - The case of two random variables is called bivariate.
• Bivariate random variable
  - For two random variables X and Y defined on a sample space S,
    the pair takes values x and y, and the ordered pair (x, y) corresponds to a random point in the 2-dimensional sample space.
  - The notions of cdf and pdf are extended to the 'joint cdf' and the 'joint pdf'.
04_Vector random variables
• Vector random variable
04_Vector random variables
• Gaussian probability density function
  - Single-variable vs. bivariate Gaussian probability density
[Figure: a single-variable PDF over x, and a bivariate PDF over the (x, y) plane (ex: x: height, y: weight)]
04_Vector random variables
• Representation of the cumulative distribution function (cdf) of a single random variable
• Representation of the joint cdf of the bivariate random variables X and Y
• The case where a random vector is given
04_Vector random variables
• Marginal probability density function (marginal pdf)
  - Marginal probability: when two or more events are given, the simple probability of one event occurring, ignoring the other events.
  - The term denotes the individual pdf of each dimension (component) of a random vector.
  - A marginal pdf can be obtained by integrating over the variables that are to be ignored.
05_λžœλ€λ²‘ν„°μ˜ 톡계적 νŠΉμ§•
 λ‹€λ³€μˆ˜ 랜덀 λ²‘ν„°μ˜ 톡계적 νŠΉμ§•
 κ²°ν•© λˆ„μ λ°€λ„ν•¨μˆ˜(joint cdf) ν˜Ήμ€ κ²°ν•© ν™•λ₯ λ°€λ„ν•¨μˆ˜ (joint pdf) λ₯Ό μ΄μš©ν•˜μ—¬ μ •μ˜λ¨.
 랜덀 벑터λ₯Ό 슀칼라 ν™•λ₯ λ³€μˆ˜μ—μ„œ μ •μ˜ν•œ 것과 같은 λ°©μ‹μœΌλ‘œ ν‘œν˜„ν•  수 μžˆλ‹€.
82
06_Covariance matrix
• Covariance matrix
  - Expresses the relationship between each pair of variables (features) in a multivariate random vector:

    Cov(X) = Σ = [ σ_1²  ⋯  c_ik ]
                 [  ⋮    ⋱   ⋮  ]
                 [ c_ki  ⋯  σ_N² ]

    c_ik = E[(X_i − μ_i)(X_k − μ_k)],   σ_n² = E[(X_n − μ_n)²]

• Properties of the covariance matrix
06_Covariance matrix
• Covariance matrix

    Cov(X) = Σ = [ σ_1²  ⋯  c_ik ]
                 [  ⋮    ⋱   ⋮  ]
                 [ c_ki  ⋯  σ_N² ]
06_Covariance matrix
• Covariance matrix
  - Auto-covariance matrix:
    C_ik = E[(X_i − μ_i)(X_k − μ_k)],   σ_n² = E[(X_n − μ_n)²]
  - Cross-covariance matrix:
    C_ik = E[(X_i − μ_xi)(Y_k − μ_yk)],   σ_n² = E[(X_n − μ_xn)(Y_n − μ_yn)]
06_Covariance matrix
• Expressions related to the covariance matrix (with X̄ = E[X]):
    σ_X² = E[(X − X̄)²] = E[X² − 2XX̄ + (X̄)²]
         = E[X²] − 2E[X]·X̄ + (X̄)² = E[X²] − (X̄)²
  - S: auto-correlation matrix; correlation matrix
07_Gaussian distribution
• Univariate Gaussian probability density function
  - Of all probability distributions, the Gaussian pdf is the most widely used.
  - For a 1-dimensional feature, once the two parameters, mean μ and standard deviation σ, are fixed, the Gaussian pdf is expressed as
    f(x) = 1/(√(2π) σ) · exp(−(x − μ)²/(2σ²))
07_Gaussian distribution
• Multidimensional (multivariate) Gaussian probability density function
  - x ~ N(μ, Σ) means that x has this distribution:
    p(x) = 1/((2π)^{d/2} |Σ|^{1/2}) · exp(−(1/2)(x − μ)^T Σ^{-1} (x − μ))
    where μ is the (d×1) mean vector at the center of the Gaussian and Σ is the (d×d) covariance matrix:
    μ = E[x],   Σ = E[(x − μ)(x − μ)^T]
  - Single-dimensional Gaussian distribution: the 1-D formula of the previous slide.
07_Gaussian distribution
• Example of a multivariate Gaussian pdf
  - The heights and weights of men and women in some population, taken as feature variables and plotted in a 2-dimensional space.
  - These data are assumed to follow a bivariate normal (Gaussian) distribution.
  - Given an unknown feature vector P, does it belong to the male class or to the female class?
07_Gaussian distribution
• Multivariate Gaussian pdf
  - The pdf of the bivariate data (male height and weight) forms a bell-shaped 3-dimensional distribution, as in [Fig 4-9]: a 'Gaussian hill'.
[Figure: Gaussian hill with the query point P]
07_Gaussian distribution
• Central limit theorem
  - The sample mean computed from a randomly drawn sample of size n
    approximately follows a normal distribution when n is large.
  - That is, the larger the samples drawn, the closer the distribution of their means is to a normal distribution.
07_Gaussian distribution
• Full covariance Gaussian: correlation exists across the feature dimensions (components).
• Diagonal covariance Gaussian: an elliptical distribution aligned with the axes; correlation across feature dimensions is ignored.
• Spherical covariance Gaussian: models the data as a spherical distribution.
08_MATLAB practice
• Generating data according to a probability density function and drawing pdf contours
  - The plotgaus m-file is a function that draws Gaussian contours.
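The plotgaus m-file itself is not reproduced in these slides, so its exact signature is unknown; a minimal stand-in that draws the contours of N(mu, Sigma) might look like this (saved as plotgaus_sketch.m):

    function plotgaus_sketch(mu, Sigma)
    % Draw contours of the 2D Gaussian pdf N(mu, Sigma).
    g = linspace(mu(1)-4, mu(1)+4, 100);
    h = linspace(mu(2)-4, mu(2)+4, 100);
    [X, Y] = meshgrid(g, h);
    P = zeros(size(X));
    for i = 1:numel(X)
        v = [X(i); Y(i)] - mu(:);
        P(i) = exp(-0.5 * v' * (Sigma \ v)) / (2*pi*sqrt(det(Sigma)));
    end
    contour(X, Y, P);
    end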