Introduction to Pattern Recognition for Human ICT
Review of Probability and Statistics
2014. 9. 19, Hyunki Hong

Contents
• Probability
• Random variables
• Random vectors
• The Gaussian random variable

Review of probability theory
• Definitions (informal)
1. Probabilities are numbers assigned to events that indicate "how likely" it is that the event will occur when a random experiment is performed.
2. A probability law for a random experiment is a rule that assigns probabilities to the events of the experiment.
3. The sample space S of a random experiment is the set of all possible outcomes.
• Axioms of probability
1. Axiom I: P[A_i] ≥ 0
2. Axiom II: P[S] = 1
3. Axiom III: A_i ∩ A_j = ∅ ⇒ P[A_i ∪ A_j] = P[A_i] + P[A_j]

More properties of probability
ex) For N = 3, the inclusion-exclusion formula gives
P[A1 ∪ A2 ∪ A3] = P[A1] + P[A2] + P[A3] − P[A1∩A2] − P[A1∩A3] − P[A2∩A3] + P[A1∩A2∩A3]

Conditional probability
• If A and B are two events, the probability of event A when we already know that event B has occurred is
P[A|B] = P[A∩B] / P[B],  for P[B] > 0
1. This conditional probability P[A|B] is read:
– the "conditional probability of A conditioned on B", or simply
– the "probability of A given B".
• Interpretation
1. The new evidence "B has occurred" has the following effects:
- the original sample space S (the square) becomes B (the rightmost circle);
- the event A becomes A∩B.
2. P[B] simply re-normalizes the probability of events that occur jointly with B.

Theorem of total probability
• Let B1, B2, …, BN be a partition of S (mutually exclusive events that add up to S).
• Any event A can be represented as
A = A∩S = A∩(B1∪B2∪…∪BN) = (A∩B1)∪(A∩B2)∪…∪(A∩BN)
• Since B1, B2, …, BN are mutually exclusive,
P[A] = P[A∩B1] + P[A∩B2] + ⋯ + P[A∩BN]
and therefore
P[A] = P[A|B1]P[B1] + ⋯ + P[A|BN]P[BN] = Σ_{k=1}^{N} P[A|Bk]P[Bk]

Bayes theorem
• Assume {B1, B2, …, BN} is a partition of S.
• Suppose that event A occurs.
• What is the probability of event Bj?
• Using the definition of conditional probability and the theorem of total probability, we obtain
P[Bj|A] = P[A∩Bj] / P[A] = P[A|Bj]P[Bj] / Σ_{k=1}^{N} P[A|Bk]P[Bk]
• This is known as Bayes Theorem or Bayes Rule, and it is (one of) the most useful relations in probability and statistics.

Bayes theorem & statistical pattern recognition
• When used for pattern classification, Bayes Theorem is generally expressed as
P[ωj|x] = p[x|ωj]P[ωj] / Σ_{k=1}^{N} p[x|ωk]P[ωk] = p[x|ωj]P[ωj] / p[x]
Mel-frequency cepstrum (MFC): a way of representing the power spectrum of a short-time signal, obtained by taking the cosine transform of the log power spectrum in the nonlinear Mel-scale frequency domain. Mel-frequency cepstral coefficients (MFCCs) are the coefficients that together make up an MFC. (speech signal processing)
where ωj is the j-th class (e.g., phoneme) and x is the feature/observation vector (e.g., a vector of MFCCs).
• A typical decision rule is to choose the class ωj with the highest P[ωj|x]; intuitively, we choose the class that is most "likely" given the observation x (a minimal sketch of this rule follows the list below).
• Each term in Bayes Theorem has a special name:
1. P[ωj]: prior probability (of class ωj)
2. P[ωj|x]: posterior probability (of class ωj given the observation x)
3. p[x|ωj]: likelihood (probability of observation x given class ωj)
4. p[x]: normalization constant (does not affect the decision)
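The decision rule is easy to state in code. The following minimal MATLAB sketch is not from the slides: it assumes hypothetical likelihood and prior values for three classes and picks the class with the largest posterior. Since p[x] rescales all posteriors equally, the argmax is unchanged if the normalization is omitted.

% Minimal sketch of the Bayes decision rule (hypothetical example values).
lik   = [0.20 0.05 0.10];         % p[x|w_k] for classes k = 1..3 (assumed)
prior = [0.30 0.50 0.20];         % P[w_k] (assumed)
post  = lik .* prior;             % unnormalized posteriors p[x|w_k]P[w_k]
post  = post / sum(post);         % divide by p[x] = sum_k p[x|w_k]P[w_k]
[~, khat] = max(post);            % decision: class with highest posterior
fprintf('chosen class = %d\n', khat);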
• Example
1. Consider a clinical problem where we need to decide whether a patient has a particular medical condition on the basis of an imperfect test.
- Someone with the condition may go undetected (a false negative).
- Someone free of the condition may yield a positive result (a false positive).
2. Nomenclature
- The true-negative rate P[NEG|¬COND] of a test is called its SPECIFICITY: TN / (TN+FP) × 100
- The true-positive rate P[POS|COND] of a test is called its SENSITIVITY: TP / (TP+FN) × 100
3. Problem
- Assume a population of 10,000 with a 1% prevalence of the condition.
- Assume that we design a test with 98% specificity and 90% sensitivity.
- Assume you take the test, and the result comes out POSITIVE.
- What is the probability that you have the condition?
4. Solution
- Fill in the joint frequency table below, or
- apply Bayes rule.

             COND    ¬COND    total
   POS         90      198      288
   NEG         10    9,702    9,712
   total      100    9,900   10,000

(100 people have the condition, of whom 90% = 90 test positive; 9,900 do not, of whom 98% = 9,702 test negative, leaving 198 false positives.)
• Applying Bayes rule:
P[COND|POS] = P[POS|COND]P[COND] / (P[POS|COND]P[COND] + P[POS|¬COND]P[¬COND])
            = (0.90 × 0.01) / (0.90 × 0.01 + 0.02 × 0.99)
            = 0.009 / 0.0288 ≈ 0.31
Equivalently, from the table: 90 / 288 ≈ 31%. The short check below reproduces this number.
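As a quick check of the arithmetic above, here is a short MATLAB sketch of the same Bayes-rule computation; all numbers are the ones stated in the problem.

% Bayes-rule check for the clinical-test example.
prev = 0.01;                              % prevalence P[COND]
sens = 0.90;                              % sensitivity P[POS|COND]
spec = 0.98;                              % specificity P[NEG|~COND]
pPos = sens*prev + (1-spec)*(1-prev);     % total probability P[POS]
post = sens*prev / pPos;                  % posterior P[COND|POS]
fprintf('P[COND|POS] = %.4f\n', post);    % prints 0.3125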
Random variables
• When we perform a random experiment, we are usually interested in some measurement or numerical attribute of the outcome.
ex) weights in a population of subjects, execution times when benchmarking CPUs, shape parameters when performing ATR
• These examples lead to the concept of a random variable.
1. A random variable X is a function that assigns a real number X(ξ) to each outcome ξ in the sample space of a random experiment.
2. X(ξ) maps all possible outcomes in the sample space onto the real line.
• The function that assigns values to each outcome is fixed and deterministic, as in the rule "count the number of heads in three coin tosses".
1. The randomness in X is due to the underlying randomness of the outcome ξ of the experiment.
• Random variables can be
1. discrete, e.g., the resulting number after rolling a die, or
2. continuous, e.g., the weight of a sampled individual.

Cumulative distribution function (cdf)
• The cumulative distribution function F_X(x) of a random variable X is defined as the probability of the event {X ≤ x}:
F_X(x) = P[X ≤ x],  −∞ < x < ∞
• Intuitively, F_X(b) is the long-term proportion of times when X(ξ) ≤ b.
• Properties of the cdf

Probability density function (pdf)
• The probability density function f_X(x) of a continuous random variable X, if it exists, is defined as the derivative of F_X(x):
f_X(x) = dF_X(x)/dx
• For discrete random variables, the equivalent of the pdf is the probability mass function
p_X(x) = P[X = x]
• Properties

Statistical characterization of random variables
• The cdf or the pdf is SUFFICIENT to fully characterize a random variable.
• However, a random variable can be PARTIALLY characterized with other measures:
• Expectation (center of mass of a density): E[X] = μ = ∫ x f_X(x) dx
• Variance (spread about the mean): var[X] = σ² = E[(X − μ)²]
• Standard deviation: std[X] = σ = (var[X])^(1/2)
• N-th moment: E[X^N] = ∫ x^N f_X(x) dx

Random vectors
• An extension of the concept of a random variable:
1. A random vector X is a function that assigns a vector of real numbers to each outcome ξ in the sample space S.
2. We generally denote a random vector by a column vector.
• The notions of cdf and pdf are replaced by the 'joint cdf' and the 'joint pdf'.
1. Given a random vector X = [x1, x2, …, xN]^T, the joint cdf is
F_X(x) = P[X1 ≤ x1, X2 ≤ x2, …, XN ≤ xN]
2. and the joint pdf is
f_X(x) = ∂^N F_X(x) / (∂x1 ∂x2 ⋯ ∂xN)
• The term marginal pdf is used for the pdf of a subset of the random-vector dimensions.
1. A marginal pdf is obtained by integrating out the variables that are of no interest.
ex) For a 2D random vector X = [x1, x2]^T, the marginal pdf of x1 is
f_X1(x1) = ∫ f_X1X2(x1, x2) dx2

Statistical characterization of random vectors
• A random vector is also fully characterized by its joint cdf or joint pdf.
• Alternatively, we can (partially) describe a random vector with measures similar to those defined for scalar random variables.
• Mean vector: μ = E[X] = [μ1, μ2, …, μN]^T
• Covariance matrix: Σ = cov[X] = E[(X − μ)(X − μ)^T]
• The covariance matrix indicates the tendency of each pair of features (dimensions in a random vector) to vary together, i.e., to co-vary.
1. The covariance has several important properties:
– if x_i and x_k tend to increase together, then c_ik > 0;
– if x_i tends to decrease when x_k increases, then c_ik < 0;
– if x_i and x_k are uncorrelated, then c_ik = 0;
– |c_ik| ≤ σ_i σ_k, where σ_i is the standard deviation of x_i;
– c_ii = σ_i² = var[x_i].
2. The covariance terms can be expressed as c_ii = σ_i² and c_ik = ρ_ik σ_i σ_k,
– where ρ_ik is called the correlation coefficient.

A numerical example
• Given the following five samples from a 3D distribution:

   sample   1  2  3  4  5
   x1       2  3  5  6  4
   x2       2  4  4  6  4
   x3       4  6  2  4  4

1. Compute the covariance matrix.
2. Generate scatter plots for every pair of variables.
3. Can you observe any relationship between the covariance and the scatter plots?
4. You may work your solution in the template below.
• Each variable has mean 4, so the deviations from the mean are

   x1 − μ1   -2  -1   1   2   0
   x2 − μ2   -2   0   0   2   0
   x3 − μ3    0   2  -2   0   0

and the (sample) covariance matrix is

   Σ = [ 2.5   2  -1
          2    2   0
         -1    0   2 ]

(see the MATLAB check below).
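A quick MATLAB check of this example. Rows of X are samples and columns are variables; cov uses the unbiased 1/(N−1) normalization, which matches the matrix above.

% Verify the covariance matrix of the 3D numerical example.
X = [2 2 4;                 % each row is one sample [x1 x2 x3]
     3 4 6;
     5 4 2;
     6 6 4;
     4 4 4];
C = cov(X)                  % returns [2.5 2 -1; 2 2 0; -1 0 2]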
The Normal or Gaussian distribution
• The multivariate Normal distribution N(μ, Σ) is defined as
f_X(x) = (2π)^(−N/2) |Σ|^(−1/2) exp( −(1/2)(x − μ)^T Σ^(−1) (x − μ) )
• For a single dimension, this expression reduces to
f_X(x) = (1 / (√(2π) σ)) exp( −(x − μ)² / (2σ²) )
• Gaussian distributions are very popular since:
1. the parameters μ, Σ uniquely characterize the normal distribution;
2. if all variables x_i are uncorrelated (E[x_i x_k] = E[x_i]E[x_k]), then
– the variables are also independent (P[x_i x_k] = P[x_i]P[x_k]), and
– Σ is diagonal, with the individual variances on the main diagonal;
3. Central Limit Theorem (below);
4. the marginal and conditional densities are also Gaussian;
5. any linear transformation of N jointly Gaussian random variables results in N random variables that are also Gaussian: for X = [X1 X2 … XN]^T jointly Gaussian and A (N×N) invertible, Y = AX is also jointly Gaussian.

• Central Limit Theorem
1. Given any distribution with mean μ and variance σ², the sampling distribution of the mean approaches a normal distribution with mean μ and variance σ²/N as the sample size N increases.
- No matter what the shape of the original distribution is, the sampling distribution of the mean approaches a normal distribution.
- N is the sample size used to compute each mean, not the overall number of samples in the data.
2. Example: 500 experiments are performed using a uniform distribution (a demo follows this list).
- N = 1:
1) One sample is drawn from the distribution and its mean is recorded (500 times).
2) The histogram resembles a uniform distribution, as one would expect.
- N = 4:
1) Four samples are drawn and the mean of the four samples is recorded (500 times).
2) The histogram starts to look more Gaussian.
- As N grows, the shape of the histograms resembles a Normal distribution more closely.
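A minimal MATLAB sketch of this experiment, using rand() as in the slides' other examples: for each N it records 500 sample means of N uniform draws and plots their histograms.

% Central Limit Theorem demo: 500 sample means of N uniform draws.
for k = 1:4
    N = 2^(k-1);                     % N = 1, 2, 4, 8
    means = mean(rand(N, 500), 1);   % each column: one experiment of size N
    subplot(1, 4, k);
    hist(means, 20);                 % looks more Gaussian as N grows
    title(sprintf('N = %d', N));
end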
01_Basic Statistics
▪ Statistics
▪ The discipline that studies how to organize data into information and, based on it, make objective decisions.
▪ The problem of estimating uncertain states: various probabilistic techniques are used as methods for discriminating and analyzing uncertain states.
ex) modeling and analysis of heavily noisy data
▪ Pattern recognition requires statistical techniques to analyze the given samples statistically.
▪ State estimation using probability values.
▪ Probability: plausibility, degree of confidence, frequency of occurrence, etc.

❖ Statistical terms
▪ Estimation: deriving a particular fact from summarized data.
▪ Reliability of data analysis: if data are collected repeatedly and independently, is the same result always obtained? That is, can the conclusion be generalized?
▪ Population: the entire set of objects that the data analysis is concerned with.
▪ Sample: the individual data, collected from part of the population, used to understand the characteristics of the population.
▪ Sampling distribution: the distribution of the statistics obtained from all possible samples of the same size drawn from the same population.

❖ Statistical parameters
▪ Mean: the sum of the data divided by the number of data points; it corresponds to the center of mass of the data distribution.
▪ Variance: the degree to which the data are spread around the mean; the average of the squared differences between the data and the mean.
▪ Standard deviation: the square root of the variance, which brings the measure of spread back to the (first-order) scale of the data.
▪ Bias: indicates the degree to which the data are skewed to one side.
▪ Covariance: when two or more kinds of data are given, a measure of the association between the two variables; for bivariate sample data it averages the products of the deviations (x_i − x̄)(y_i − ȳ).
ex) x_i: the number of people passing through the main entrance of a shopping center between 2 and 3 o'clock; y_i: the average temperature between 2 and 3 o'clock.
▪ It indicates only whether the linear relationship between the variables is positive (+) or negative (−):
Cov(x, y) > 0 : positive linear relationship
Cov(x, y) < 0 : negative linear relationship
Cov(x, y) = 0 : no linear relationship
▪ Correlation coefficient: a number (coefficient) indicating the strength of the correlation between two variables X and Y.
▪ Difference from the covariance:
- covariance: indicates only whether the linear relationship is positive or negative;
- correlation coefficient: indicates how close the relationship is to linear, and is unrelated to the distribution characteristics of the variables.
ex) Correlations among library hours, grades, and absences:
x1: hours spent in the library per week
x2: score in the pattern recognition course
x3: hours of absence per week
▪ A small correlation coefficient indicates that two components have almost no linear relationship.
▪ A negative correlation coefficient means that the two components have opposing characteristics.
▪ Skewness (skew): a statistical measure of how far a distribution leans to one side (asymmetry). It is positive when the distribution has a longer right tail, negative when it has a longer left tail, and 0 when the distribution is left-right symmetric.
▪ Kurtosis: a statistical measure of the peakedness of a distribution.

Generating 2D data
✓ Generate data probabilistically, following a specific probability distribution:
→ define a data probability distribution that suits the purpose of the study;
→ draw data randomly from this distribution.
▪ uniform distribution: rand()
▪ Gaussian distribution: randn()
→ in the general case, the draws must be transformed according to the desired mean/covariance.

% repmat: replicate and tile an array; sqrtm: matrix square root
% randn: normally distributed pseudorandom numbers
N = 25;
m1 = repmat([3, 5], N, 1);   m2 = repmat([5, 3], N, 1);
s1 = [1 1; 1 2];             s2 = [1 1; 1 2];
X1 = randn(N, 2)*sqrtm(s1) + m1;
X2 = randn(N, 2)*sqrtm(s2) + m2;
plot(X1(:,1), X1(:,2), '+'); hold on;
plot(X2(:,1), X2(:,2), 'd');
save data2_1 X1 X2;

ex) To generate values from a normal distribution with mean 1 and standard deviation 2:
r = 1 + 2.*randn(100, 1)

(figure: training data and test data for classes ω1 and ω2)

Learning: analyzing the distribution characteristics of the data
✓ distribution analysis → decision boundary

load data2_1;
m1 = mean(X1);   m2 = mean(X2);    % means estimated from the training data
s1 = cov(X1);    s2 = cov(X2);     % covariances estimated from the training data
save mean2_1 m1 m2 s1 s2;

Classification: setting the decision boundary
✓ Discriminant function: d(m1, xnew) − d(m2, xnew) = 0
✓ Class label: assign xnew to C1 if it is closer to m1, and to C2 if it is closer to m2.

Performance evaluation
✓ Program to compute the training error:

load data2_1; load mean2_1
Etrain = 0;
N = size(X1, 1);
for i = 1:N
    d1 = norm(X1(i,:) - m1);  d2 = norm(X1(i,:) - m2);
    if (d1 - d2) > 0, Etrain = Etrain + 1; end
    d1 = norm(X2(i,:) - m1);  d2 = norm(X2(i,:) - m2);
    if (d1 - d2) < 0, Etrain = Etrain + 1; end
end
fprintf(1, 'Training Error = %.3f\n', Etrain/50);

Results: training data 2/50 errors, test data 3/50 errors. (A sketch of the corresponding test-error computation follows.)
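The slides report the test error (3/50) but do not show the corresponding program. Below is a minimal sketch under the assumption that test sets X1t and X2t (hypothetical names, 25 points per class) were generated and saved the same way as the training data.

% Test-error sketch; assumes test data X1t, X2t exist (assumed names).
load mean2_1                          % m1, m2 estimated from training data
Etest = 0;
N = size(X1t, 1);
for i = 1:N
    if norm(X1t(i,:) - m1) > norm(X1t(i,:) - m2), Etest = Etest + 1; end
    if norm(X2t(i,:) - m2) > norm(X2t(i,:) - m1), Etest = Etest + 1; end
end
fprintf(1, 'Test Error = %.3f\n', Etest/(2*N));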
Data collection and preprocessing
✓ Handwritten digit images 0–9: the collected data set is preprocessed (size normalization, binarization) to obtain the training data and the test data.
(figure: rows of digit images 0–9 for the test set, the preprocessed set, and the training set)

Data transformation
Read the digit images, represent them as data vectors, and store them.
✓ (preprocessed) digit image (20×16) → vector (320 dimensions)
✓ training data: 320×50, test data: 320×20

for i = 0:1:9
    for j = 3:7                     % images 3..7 of each digit: training set
        fn = sprintf('digit%d_%d.bmp', i, j);
        xi = imread(fn);
        x = reshape(double(xi), 20*16, 1);   % reshape array: 320 by 1
        Xtrain(:, i*5+j-2) = x;
        Ttrain(i*5+j-2, 1) = i;
    end
    for j = 1:2                     % images 1..2 of each digit: test set
        fn = sprintf('digit%d_%d.bmp', i, j);
        xi = imread(fn);
        x = reshape(double(xi), 20*16, 1);
        Xtest(:, i*2+j) = x;
        Ttest(i*2+j, 1) = i;
    end
end
save digitdata Xtrain Ttrain Xtest Ttest

(training data set Xtrain: 320 by 50; class labels Ttrain: 50 by 1)

Learning and the decision boundary
✓ Compute the mean vector of the training data for each class (decision rule: mean images):

load digitdata
for i = 0:1:9
    mX(:, i+1) = mean(Xtrain(:, i*5+1:i*5+5)')';
end
save digitMean mX;
mXi = uint8(mX*255);                % convert to unsigned 8-bit integer
for i = 0:1:9
    subplot(1, 10, i+1);
    imshow(reshape(mXi(:, i+1), 20, 16));
end

Classification and performance evaluation
Decision rule: assign each vector to the class whose mean image is nearest.

load digitdata; load digitMean
Ntrain = 50; Ntest = 20;
Etrain = 0; Etest = 0;
for i = 1:Ntrain                    % training data
    x = Xtrain(:, i);
    for j = 1:10
        dist(j) = norm(x - mX(:, j));
    end
    % minv: minimum value; minc_train: index of the nearest mean image
    [minv, minc_train(i)] = min(dist);
    if (Ttrain(i) ~= (minc_train(i) - 1)), Etrain = Etrain + 1; end
end
for i = 1:Ntest                     % test data
    x = Xtest(:, i);
    for j = 1:10
        dist(j) = norm(x - mX(:, j));
    end
    [minv, minc_test(i)] = min(dist);
    if (Ttest(i) ~= (minc_test(i) - 1)), Etest = Etest + 1; end
end

Results: error 2% on the training data, 15% on the test data.

01_Basic Statistics ❖ Regression analysis
▪ Regression line
▪ The process of finding the best line representing a set of samples: the line minimizing the sum of the distances from each sample to the line, i.e., the sum of the squared lengths of the perpendicular segments dropped from each sample onto the line (here measured vertically).
▪ 'Method of Least Mean Squares (LMS)':
E = Σ_i d_i² = Σ_{i=1}^{N} [y_i − (αx_i + β)]²
∂E/∂α = 0,  ∂E/∂β = 0
α = ( N Σ x_i y_i − Σ x_i Σ y_i ) / ( N Σ x_i² − (Σ x_i)² )
β = ( Σ x_i² Σ y_i − Σ x_i Σ x_i y_i ) / ( N Σ x_i² − (Σ x_i)² )

Least squares line fitting
✓ Data: (x1, y1), …, (xn, yn)
✓ Line equation: y_i = m x_i + b
✓ Find (m, b) to minimize
E = Σ_{i=1}^{n} (y_i − m x_i − b)²
In matrix form, with Y = [y1; …; yn], X = [x1 1; …; xn 1], and B = [m; b],
E = ||Y − XB||² = (Y − XB)^T(Y − XB) = Y^T Y − 2(XB)^T Y + (XB)^T(XB)
dE/dB = 2X^T XB − 2X^T Y = 0
X^T XB = X^T Y
Normal equations: the least squares solution to XB = Y.

Example (linear algebra)
✓ Data: (x1, y1), …, (xn, yn); line equation y_i = m x_i + b, i.e., y1 = m x1 + b, …, yn = m xn + b.
Writing y = Mv with y = [y1; …; yn], M = [x1 1; …; xn 1], v = [m; b]:
if there is measurement error, the measured points do not lie exactly on a line, so we take
v = (M^T M)^(−1) M^T y
: the line y = m x + b obtained this way is the least squares line of best fit, or regression line.
✓ The least squares line through (0, 1), (1, 3), (2, 4), (3, 4)?
M = [0 1; 1 1; 2 1; 3 1],  y = [1; 3; 4; 4]
M^T M = [14 6; 6 4],  (M^T M)^(−1) = [2/10 −3/10; −3/10 7/10]
v = [m; b] = (M^T M)^(−1) M^T y = [1; 1.5]
so the least squares line is y = x + 1.5 (checked in the sketch below).
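A short MATLAB check of this worked example; the backslash operator solves the normal equations in a numerically stable way.

% Least squares line through (0,1), (1,3), (2,4), (3,4).
x = [0 1 2 3]';  y = [1 3 4 4]';
M = [x ones(4,1)];
v = M \ y                      % returns [1.0000; 1.5000], i.e., y = x + 1.5
% equivalently: v = inv(M'*M)*(M'*y)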
Total least squares
✓ Distance between a point (x_i, y_i) and the line ax + by = d (with a² + b² = 1): |a x_i + b y_i − d|
✓ Unit normal: N = (a, b)
✓ Find (a, b, d) to minimize the sum of squared perpendicular distances
E = Σ_{i=1}^{n} (a x_i + b y_i − d)²
∂E/∂d = Σ_{i=1}^{n} −2(a x_i + b y_i − d) = 0
d = (a/n) Σ x_i + (b/n) Σ y_i = a x̄ + b ȳ
Substituting back,
E = Σ_{i=1}^{n} ( a(x_i − x̄) + b(y_i − ȳ) )² = ||UN||² = (UN)^T(UN),
where U = [x1 − x̄  y1 − ȳ; … ; xn − x̄  yn − ȳ], and
dE/dN = 2(U^T U)N = 0
Solution to (U^T U)N = 0 subject to ||N||² = 1: the eigenvector of U^T U associated with the smallest eigenvalue (the least squares solution to the homogeneous linear system UN = 0), where
U^T U = [ Σ(x_i − x̄)²           Σ(x_i − x̄)(y_i − ȳ)
          Σ(x_i − x̄)(y_i − ȳ)   Σ(y_i − ȳ)²        ]
is the second moment matrix. (A small MATLAB sketch follows.)
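A minimal MATLAB sketch of total least squares fitting as derived above: center the data, take the eigenvector of UᵀU with the smallest eigenvalue as the unit normal (a, b), and recover d = a x̄ + b ȳ. For concreteness it reuses the four points of the previous example; the variable names are my own.

% Total least squares line fit: ax + by = d with a^2 + b^2 = 1.
x = [0 1 2 3]';  y = [1 3 4 4]';        % same example points as above
U = [x - mean(x), y - mean(y)];         % centered data matrix
[V, D] = eig(U'*U);                     % 2x2 second moment matrix
[~, k] = min(diag(D));                  % smallest eigenvalue
n = V(:, k);                            % unit normal N = (a, b)
d = n(1)*mean(x) + n(2)*mean(y);
fprintf('%.3f x + %.3f y = %.3f\n', n(1), n(2), d);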
01_Basic Statistics ❖ Regression curve
▪ Regression curve, curve fitting: the process of fitting a quadratic function to the sample data (higher-order functions can be fitted as well); the resulting regression equation, expressed as a curve, is called a 'regression curve'.
ex) fitted fourth-order polynomial: y = −6E-12x⁴ + 1E-07x³ + 2E-05x² − 0.0461x − 0.1637

02_Probability Theory
❖ Probability terms
▪ Statistical phenomenon: a phenomenon whose underlying laws and characteristics can be found by observing an uncertain phenomenon repeatedly, or by drawing a large sample of it.
▪ Characteristics of a random experiment:
▪ it can be repeated under identical conditions;
▪ the outcome of each trial differs and cannot be predicted, but the set of all possible outcomes is known;
▪ individual outcomes appear irregular as the experiment is repeated, but as the number of repetitions grows, a statistical regularity emerges.
▪ Probability: a measure of the degree of certainty of a statistical phenomenon; the numbers assigned to events that indicate how likely each event is to occur in a random experiment.
▪ Probability law: the rule that assigns probabilities to the events of a random experiment; the sample space S of the random experiment is the set of all possible outcomes.
▪ Mathematical probability: for a sample space S, the probability of an event A is n(A)/n(S) — applicable when the set of all possible cases and the sample set are given in advance.
▪ Statistical probability: estimating the probability of an event empirically, from its relative frequency over many repetitions of an experiment — for most natural or social phenomena, the possible outcomes and their chances are not known in advance. If an event occurs r times in n trials, the relative frequency is r/n; a probability estimated in this way is called a statistical probability.
▪ Events and mutually exclusive events
▪ An experiment whose outcome is left to chance is called a trial; the outcome of a random trial is called an 'event'.
▪ Two events A and B that cannot occur at the same time are said to be mutually exclusive.
▪ Two events A and B that are entirely unrelated to each other are called independent events.
▪ Sample space and probability space
▪ The 'sample space' is the set obtained by specifying the conditions of observation and the range of possible outcomes, and assigning a symbol to each outcome in that range.
- An element of the sample space is called a 'sample point'.
- An event consisting of a single sample point is called an 'elementary event'.
- A subset of the sample space is called an 'event'.
▪ The set of events to which probabilities are assigned is called the 'probability space': the set of all possible outcomes of the probability experiment.
▪ Example: a single die is rolled once and an even number comes up.
- The sample space is S = {1, 2, 3, 4, 5, 6}, the elementary events are Ep = {1, 2, 3, 4, 5, 6}, and the event is E = {2, 4, 6}.

▪ Theorems and properties of probability
▪ Marginal probability: when two or more events are given, the simple probability that one event occurs — that is, the probability that does not consider the other events.
▪ Conditional probability: given two events A and B, the probability that A occurs when it is already known that B has occurred — P[A|B]: "probability of A given B".
▪ In conditional probability expressions, the individual probabilities P[A], P[B] are called prior probabilities, and the conditional probabilities P[A|B], P[B|A] are called posterior probabilities.
▪ If A and B are independent events, then P[A|B] = P[A].
▪ Joint probability: the probability that events A and B occur together, derived from the conditional probability:
P[B] × P[A|B] = P(A∩B),  P[A] × P[B|A] = P(A∩B)
▪ If A and B are independent, P[A|B] = P[A], so P[B] × P[A] = P(A∩B) holds.
▪ Chain rule: expressing the probability of a sequence of events as a product of successive (chained) conditional probabilities, using the conditional and joint probabilities.
▪ Joint distribution: the probability distribution that considers several random variables together, in both the discrete and the continuous case.

▪ Total probability
▪ Let B1, B2, …, Bn be mutually exclusive events whose union is the sample space S, i.e., a partition of S. Any event A can then be written
A = (A∩B1) ∪ (A∩B2) ∪ … ∪ (A∩Bn)
▪ Since B1, B2, …, Bn are mutually exclusive,
P[A] = Σ_k P[A|Bk]P[Bk]
which is called the total probability of event A.
▪ Example: a factory produces goods on three machine types A, B, and C, which account for 50%, 30%, and 20% of total production, respectively; their defect rates are 1%, 2%, and 3%. If one product is drawn at random from the stock and inspected, what is the probability that it is defective?
P[defect] = P[A∩defect] + P[B∩defect] + P[C∩defect]
          = P[defect|A]P[A] + P[defect|B]P[B] + P[defect|C]P[C]
          = (0.01×0.5) + (0.02×0.3) + (0.03×0.2) = 0.017

▪ Bayes' rule
▪ Given a partition B1, B2, …, BN of the sample space S, suppose event A has occurred. What is the probability that it arose from element Bj of the partition?
▪ When a feature vector x is observed, the problem of finding the class ωj to which it belongs → choose the class ωi with the largest P[ωi|x].
▪ In the example above, if one product drawn from the stock turns out to be defective, what is the probability that it was made on machine A?
P[ω_A|x] = P[x|ω_A]P[ω_A] / Σ_{k=A,B,C} P[x|ω_k]P[ω_k]
         = (0.01×0.5) / [(0.01×0.5) + (0.02×0.3) + (0.03×0.2)] = 5/17

01_Random Variables
❖ Random variable
▪ A function that maps each outcome of a probability experiment to a number is defined as a 'random variable'.
▪ ex) Tossing two coins, with head (H) or tail (T):
- sample space S = {(H,H), (H,T), (T,H), (T,T)} → probability that at least one head comes up: 3/4
- When an experiment is complicated, e.g., tossing several coins, it is hard to compute probabilities directly on the sample space; it is easier to map the experimental outcomes to numbers and carry out the statistical analysis there.
- ex) The random variable X built from the rule 'the number of heads in a toss of two coins' takes the values x = {0, 1, 2}.
- The random variable X built from the rule 'the sum of the pips of two dice' takes the values x = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
cf. The meaning of 'probabilistic (random)': the uncertainty of not knowing, before the trial, which value will occur, or that every value in the sample space is equally likely to occur.
Random variables are written with upper-case letters; the values they can take are written with lower-case letters.

02_Probability Distribution
❖ Probability distribution
▪ The distribution of the probability values carried by a numerically mapped random variable: the probability that the random variable X takes the value x is written P(X = x) or p(x).
▪ ex1) the number of heads in a toss of two coins (head = 1, tail = 0)
▪ ex2) the sum of the pips when two dice are rolled
cf. The function that assigns the concrete values a random variable can take to probability values in the probability space is called the probability distribution function, or probability function.

03_Types of Probability Functions
❖ Cumulative distribution function (cdf) and its properties
❖ Probability density function (pdf) and probability mass function (pmf)
cf. A probability density function gives only the density of probability; to obtain an actual probability, the pdf must be integrated over an interval.
❖ Gaussian probability density function
▪ The most widely used of all probability distribution functions is the Gaussian probability density function.
▪ For a one-dimensional feature vector, the Gaussian pdf is fully determined once its two parameters, the mean μ and the standard deviation σ, are given:
f(x) = (1/(√(2π)σ)) exp(−(x − μ)²/(2σ²))
(figure: Gaussian PDF and Gaussian CDF)
❖ Properties of the probability density function; probability from the probability density function; the probability mass function
❖ Expectation: the mean of a random variable
▪ Ordinary sample mean:
μ = (1/n) Σ_x n_x·x = Σ_x x·(n_x/n),  where n is the total number of data points and n_x the frequency of the value x
▪ To distinguish them from the mean (x̄) and variance (s) of ordinary data, the mean of a random variable is written μ and its variance σ.
▪ E[X] = μ = Σ_{all x} x p(x) : the expectation of X — the weighted arithmetic mean of the values, expressed in probability terms (an integral instead of a sum for a continuous random variable).
▪ Illustration: the sequence average of rolls of a die converges to the expected value 3.5 as the number of trials grows (reproduced in the sketch below).
❖ Variance and standard deviation of a random variable (variance; the 2nd central moment)
▪ The variance of a random variable is derived from the variance of data:
var[X] = σ² = Σ_{all x} (x − μ)² p(x)  (an integral for a continuous random variable)
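A minimal MATLAB sketch of the die-roll illustration: the running average of fair-die rolls converges to E[X] = (1+2+⋯+6)/6 = 3.5 as the number of trials grows. randi is assumed available (R2008b+); otherwise ceil(6*rand(T,1)) does the same job.

% Running average of die rolls converging to the expectation 3.5.
T = 1000;                          % number of trials
rolls = randi(6, T, 1);            % fair-die outcomes 1..6
runavg = cumsum(rolls) ./ (1:T)';  % running (sequence) average
plot(1:T, runavg); hold on;
plot([1 T], [3.5 3.5], '--');      % expected value E[X] = 3.5
xlabel('number of trials'); ylabel('running average');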
04_Vector Random Variables
❖ Random vectors
▪ A vector composed of two or more random variables, used when two or more random variables are considered simultaneously; the stacked random variables are defined as a column vector.
▪ The case where two random variables are considered is called the bivariate case.
▪ Bivariate random variables: two random variables X and Y defined on a sample space S take values x and y, and each pair corresponds to a random point (x, y) in a two-dimensional sample space.
▪ The notions of the cumulative distribution function and the probability density function extend to the 'joint cdf' and the 'joint pdf'.
❖ Gaussian probability density
▪ Univariate vs. bivariate Gaussian densities.
(figure: univariate PDF, ex: weight; bivariate PDF, ex: height and weight)
❖ Forms of the cdf of a single random variable, of the joint cdf of the bivariate random vector (X, Y), and of the cdf for a given random vector.
❖ Marginal probability density function (marginal pdf)
▪ Marginal probability: when two or more events are given, the simple probability of one event occurring — the probability that does not consider the other events.
▪ The marginal pdf is the term for the individual pdf of each dimension (component) of a random vector; it is obtained by integrating the joint pdf over the variables that are of no interest.

05_Statistical Characteristics of Random Vectors
▪ The statistical characteristics of a multivariate random vector are defined through its joint cumulative distribution function (joint cdf) or joint probability density function (joint pdf).
▪ A random vector can be described in the same way as scalar random variables are.

06_Covariance Matrix
❖ Covariance matrix
▪ Expresses the relationships between the individual variables (features) of a multivariate random vector:

Cov(X) = Σ = [ σ1²  …  c_1N
               ⋮    ⋱   ⋮
               c_N1 …  σN² ],  c_ik = c_ki

c_ik = E[(X_i − μ_i)(X_k − μ_k)],  σ_n² = E[(X_n − μ_n)(X_n − μ_n)]
▪ Properties of the covariance matrix: it is symmetric (c_ik = c_ki), with the individual variances on the main diagonal.
▪ Auto-covariance matrix: C_ik = E[(X_i − μ_i)(X_k − μ_k)]
▪ Cross-covariance matrix: C_ik = E[(X_i − μ_xi)(Y_k − μ_yk)]
❖ Forms of the covariance matrix
σ_X² = E[(X − X̄)²] = E[X² − 2XX̄ + (X̄)²] = E[X²] − 2X̄X̄ + (X̄)² = E[X²] − (X̄)²
and, in matrix form, Σ = S − μμ^T, where S = E[XX^T] is the autocorrelation (correlation) matrix.

07_Gaussian Distribution
❖ Univariate Gaussian probability density function
▪ The most widely used probability distribution function is the Gaussian pdf; for a one-dimensional feature vector it is fully determined once the two parameters, the mean μ and the standard deviation σ, are given:
f(x) = (1/(√(2π)σ)) exp(−(x − μ)²/(2σ²))
❖ Multivariate (multidimensional) Gaussian probability density function
x ~ N(μ, Σ)   (read: x follows the distribution N(μ, Σ))
p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)^T Σ^(−1)(x − μ))
where μ = E[x] is the (d×1) mean vector at the center of the Gaussian and Σ = E[(x − μ)(x − μ)^T] is the (d×d) covariance matrix.
❖ Example of a multivariate Gaussian pdf
▪ A plot of the heights and weights of the men in some population, taken as feature variables in a 2D space; assume these data follow a bivariate normal (Gaussian) distribution.
▪ Given an unknown feature vector P, does it belong to the men's class or to the women's class?
▪ The pdf of the bivariate data (men's height and weight) forms a bell-shaped 3D surface: a 'Gaussian hill'.
❖ Central limit theorem
▪ The sample mean computed from a randomly drawn sample of size n approximately follows a normal distribution when n is large.
▪ That is, the more samples are drawn, the closer the distribution of the sample means is to a normal distribution.
❖ Full covariance Gaussian: correlations exist between the feature dimensions (components).
❖ Diagonal covariance Gaussian: the data form an axis-parallel elliptical distribution; the feature dimensions are uncorrelated.
❖ Spherical covariance Gaussian: the data are modeled as a spherical distribution.

08_MATLAB Practice
❖ Generating data according to a probability density function and drawing contours of the pdf.
▪ plotgaus is an m-file function for drawing the contours of a Gaussian (a sketch of such a function follows below).
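The slides reference the plotgaus m-file but do not list it. Below is a minimal sketch of such a function, under the assumption that it takes a mean vector, a covariance matrix, and a line spec; the actual course file may differ. It draws a constant-density (2-sigma) contour of N(mu, sigma), using the same matrix square root as the data-generation code.

function plotgaus(mu, sigma, spec)
% PLOTGAUS  Draw a constant-density contour of a 2D Gaussian N(mu, sigma).
%   mu: 1x2 mean vector, sigma: 2x2 covariance, spec: line spec, e.g. 'r-'.
%   (A sketch; the plotgaus file used in the course is not shown on the slides.)
t = linspace(0, 2*pi, 100);
circle = [cos(t); sin(t)];                          % unit circle
A = sqrtm(sigma);                                   % matrix square root
ellipse = 2 * A * circle + repmat(mu(:), 1, 100);   % 2-sigma contour
plot(ellipse(1,:), ellipse(2,:), spec);
end

Usage example with the data generated earlier:

load data2_1; load mean2_1
plot(X1(:,1), X1(:,2), '+'); hold on;
plot(X2(:,1), X2(:,2), 'd');
plotgaus(m1, s1, 'r-');  plotgaus(m2, s2, 'b-');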