Lecture Slides for INTRODUCTION TO MACHINE LEARNING
Ethem Alpaydın © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 5: Multivariate Methods

Multivariate Data

Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$

Multivariate Parameters

Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \operatorname{Cov}(X_i, X_j)$

Correlation: $\operatorname{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\boldsymbol{\Sigma} \equiv \operatorname{Cov}(\mathbf{x}) = E\big[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\big] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$

Parameter Estimation

Sample mean $\mathbf{m}$: $\quad m_i = \dfrac{\sum_{t=1}^N x_i^t}{N}, \quad i = 1, \ldots, d$

Covariance matrix $\mathbf{S}$: $\quad s_{ij} = \dfrac{\sum_{t=1}^N (x_i^t - m_i)(x_j^t - m_j)}{N}$

Correlation matrix $\mathbf{R}$: $\quad r_{ij} = \dfrac{s_{ij}}{s_i s_j}$

Estimation of Missing Values

What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small.
Use 'missing' as an attribute: may give information.
Imputation: fill in the missing value.
Mean imputation: use the most likely value (e.g., the mean).
Imputation by regression: predict the missing value from the other attributes.

Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$$

Mahalanobis distance: $(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$ (it normalizes for differences in variances and for correlations).

Bivariate case ($d = 2$):

$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\big(z_1^2 - 2\rho z_1 z_2 + z_2^2\big)\right], \qquad z_i = \frac{x_i - \mu_i}{\sigma_i}$$

[Figures: bivariate normal density and its iso-probability contours]

Independent Inputs: Naive Bayes

If the $x_i$ are independent, the off-diagonals of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$p(\mathbf{x}) = \prod_{i=1}^d p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^d \sigma_i} \exp\left[-\frac{1}{2}\sum_{i=1}^d \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$

If the variances are also equal, this reduces to the Euclidean distance.

Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right]$$

The discriminant functions are

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \log P(C_i)$$

Estimation of Parameters

With $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise,

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N} \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t} \qquad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

Dropping the constant $-\frac{d}{2}\log 2\pi$,

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$$

Different S_i: Quadratic Discriminant

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}\big(\mathbf{x}^T \mathbf{S}_i^{-1}\mathbf{x} - 2\mathbf{x}^T \mathbf{S}_i^{-1}\mathbf{m}_i + \mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i\big) + \log \hat{P}(C_i) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1} \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)$$

[Figure: class likelihoods, the posterior for C1, and the discriminant where P(C1 | x) = 0.5]
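To make the last two slides concrete, here is a minimal NumPy sketch (my own illustration, not code from the book) of estimating $\hat{P}(C_i)$, $\mathbf{m}_i$, $\mathbf{S}_i$ and evaluating the quadratic discriminant. The function names, and the assumption that `y` holds class indices $0, \ldots, K-1$ (so that $r_i^t = 1$ iff $y^t = i$), are mine.

```python
import numpy as np

def fit_gaussian_classes(X, y, K):
    """Per-class MLEs: priors P_hat(C_i), means m_i, covariances S_i."""
    N = X.shape[0]
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = X[y == i]                     # instances with r_i^t = 1
        priors.append(len(Xi) / N)         # P_hat(C_i) = sum_t r_i^t / N
        m = Xi.mean(axis=0)                # m_i
        Z = Xi - m
        covs.append(Z.T @ Z / len(Xi))     # S_i
        means.append(m)
    return priors, means, covs

def g(x, prior, m, S):
    """Quadratic discriminant g_i(x), constant term dropped."""
    diff = x - m
    _, logdet = np.linalg.slogdet(S)       # numerically stable log|S_i|
    return -0.5 * logdet - 0.5 * diff @ np.linalg.solve(S, diff) + np.log(prior)

def predict(x, priors, means, covs):
    """Choose the class with the highest discriminant value."""
    return int(np.argmax([g(x, p, m, S) for p, m, S in zip(priors, means, covs)]))
```

Note that `np.linalg.solve(S, diff)` computes $\mathbf{S}_i^{-1}(\mathbf{x}-\mathbf{m}_i)$ without forming the inverse explicitly, which is the usual numerically preferred route.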
Common Covariance Matrix S

Shared common sample covariance:

$$\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$$

The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0} \qquad \text{where} \quad \mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i \quad\text{and}\quad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$$

[Figure: classes with a common covariance matrix S]

Diagonal S

When the $x_j$, $j = 1, \ldots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal:

$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(Naive Bayes' assumption)}$$

$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^d \left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean.

[Figure: diagonal S; the variances may be different]

Diagonal S, Equal Variances

Nearest mean classifier: classify based on Euclidean distance to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^d (x_j - m_{ij})^2 + \log \hat{P}(C_i)$$

Each mean can be considered a prototype or template, and this is template matching.

[Figure: diagonal S, equal variances]

Model Selection

Assumption                     Covariance matrix          No. of parameters
Shared, hyperspheric           S_i = S = s^2 I            1
Shared, axis-aligned           S_i = S, with s_ij = 0     d
Shared, hyperellipsoidal       S_i = S                    d(d+1)/2
Different, hyperellipsoidal    S_i                        K d(d+1)/2

As we increase complexity (a less restricted S), bias decreases and variance increases. Assume simple models (allow some bias) to control variance (regularization).

Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$. If the $x_j$ are independent (Naive Bayes'),

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^d p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}$$

and the discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \big[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\big] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$, with

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

If the $x_j$ are independent,

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^d \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}} \qquad g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i) \qquad \hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$

Multivariate Regression

$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \big[r^t - (w_0 + w_1 x_1^t + \cdots + w_d x_d^t)\big]^2$$

Multivariate polynomial model: define new higher-order variables

$$z_1 = x_1 \quad z_2 = x_2 \quad z_3 = x_1^2 \quad z_4 = x_2^2 \quad z_5 = x_1 x_2$$

and use the linear model in this new z space (basis functions, kernel trick, SVM: Chapter 10).
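As a companion to the regression slides, here is a minimal NumPy sketch (my own illustration, not code from the book) of fitting the multivariate linear model by least squares and of mapping two inputs into the higher-order z space; the helper names and the synthetic data are assumptions.

```python
import numpy as np

def fit_linear(X, r):
    """Least-squares fit of r ~ w0 + w1*x1 + ... + wd*xd.

    Minimizing E(w | X) = 1/2 sum_t (r^t - w0 - w^T x^t)^2 leads to the
    normal equations, solved here via a least-squares routine.
    """
    A = np.column_stack([np.ones(len(X)), X])   # prepend a bias column for w0
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w                                     # w[0] = w0, w[1:] = w1..wd

def z_space(X):
    """Map (x1, x2) to z = (x1, x2, x1^2, x2^2, x1*x2) as on the slide."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# A quadratic target is linear in z, so the same solver applies unchanged.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
r = 3 + 2 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
w = fit_linear(z_space(X), r)                    # recovers the coefficients
```

This is the "basis function" idea the slide points to: the model stays linear in the parameters, so the same closed-form least-squares solution works unchanged in the transformed space.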