Lecture Slides for
INTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml
CHAPTER 5:
Multivariate Methods
Multivariate Data



Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples
$$\mathbf{X} =
\begin{bmatrix}
X_1^1 & X_2^1 & \cdots & X_d^1 \\
X_1^2 & X_2^2 & \cdots & X_d^2 \\
\vdots & \vdots &        & \vdots \\
X_1^N & X_2^N & \cdots & X_d^N
\end{bmatrix}$$
Multivariate Parameters
Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$

Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\!\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] =
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2
\end{bmatrix}$$
Parameter Estimation
Sample mean $\mathbf{m}$:
$$m_i = \frac{\sum_{t=1}^{N} x_i^t}{N}, \quad i = 1, \ldots, d$$

Covariance matrix $\mathbf{S}$:
$$s_{ij} = \frac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$$

Correlation matrix $\mathbf{R}$:
$$r_{ij} = \frac{s_{ij}}{s_i s_j}$$
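A minimal NumPy sketch of these estimators (my addition, not part of the original slides); `X` is assumed to be an N×d data matrix, and the covariance uses the 1/N normalization shown above rather than 1/(N-1):

```python
import numpy as np

def estimate_parameters(X):
    """Sample mean m, covariance matrix S, and correlation matrix R of an N x d data matrix."""
    N, d = X.shape
    m = X.mean(axis=0)            # sample mean, length d
    Xc = X - m                    # centered data
    S = Xc.T @ Xc / N             # covariance matrix, 1/N normalization as on the slide
    s = np.sqrt(np.diag(S))       # per-feature standard deviations
    R = S / np.outer(s, s)        # correlation r_ij = s_ij / (s_i s_j)
    return m, S, R
```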
Estimation of Missing Values




What to do if certain instances have missing
attributes?
Ignore those instances: not a good idea if the
sample is small
Use ‘missing’ as an attribute: may give information
Imputation: Fill in the missing value


Mean imputation: Use the most likely value (e.g., mean)
Imputation by regression: Predict based on other
attributes
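A minimal sketch of mean imputation (my addition, not from the slides), assuming missing entries are marked as NaN in a float N×d NumPy array:

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values of that attribute."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)                  # per-attribute mean, ignoring NaNs
    missing = np.isnan(X)
    X[missing] = np.take(col_means, np.where(missing)[1])
    return X
```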
Multivariate Normal Distribution
$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$
Multivariate Normal Distribution


Mahalanobis distance: $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations).

Bivariate: d = 2,
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$

$$z_i = \frac{x_i - \mu_i}{\sigma_i}$$
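A minimal sketch (my addition) of the squared Mahalanobis distance; it uses a linear solve rather than an explicit matrix inverse, which is numerically preferable but otherwise matches the definition above:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))
```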
Bivariate Normal
[Two figure slides: plots of bivariate normal densities]
Independent Inputs: Naive Bayes

If the $x_i$ are independent, the off-diagonal entries of Σ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$
If variances are also equal, reduces to Euclidean
distance
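A quick numerical check (my addition) that, with a diagonal Σ, the multivariate density factors into a product of univariate normals; `scipy.stats.norm` is assumed available:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu = np.array([0.0, 2.0, -1.0])
sigma = np.array([1.0, 0.5, 2.0])
x = np.array([0.3, 1.8, -0.5])

full = multivariate_normal(mu, np.diag(sigma**2)).pdf(x)   # diagonal-covariance d-variate normal
product = np.prod(norm(mu, sigma).pdf(x))                  # product of d univariate normals
print(full, product)                                       # the two values agree
```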
Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

Discriminant functions are
gi x   log p x | Ci   log P Ci 
d
1
1
T
1
  log2  log Σi  x  μi  Σi x  μi   log P Ci 
2
2
2
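A minimal sketch (my addition) of this discriminant; `mu_i`, `Sigma_i`, and `prior_i` are assumed to have been estimated already (as on the next slide), and the class-independent (d/2) log 2π term is dropped since it does not affect the argmax:

```python
import numpy as np

def g_i(x, mu_i, Sigma_i, prior_i):
    """Log-posterior discriminant for class i, up to an additive constant."""
    diff = x - mu_i
    maha = diff @ np.linalg.solve(Sigma_i, diff)
    _, logdet = np.linalg.slogdet(Sigma_i)       # log |Sigma_i|, computed stably
    return -0.5 * logdet - 0.5 * maha + np.log(prior_i)

# Choose the class with the largest discriminant:
# predicted = max(range(K), key=lambda i: g_i(x, mu[i], Sigma[i], prior[i]))
```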
Estimation of Parameters
$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N} \qquad
\mathbf{m}_i = \frac{\sum_t r_i^t\, \mathbf{x}^t}{\sum_t r_i^t} \qquad
\mathbf{S}_i = \frac{\sum_t r_i^t\, (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise.

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log\hat{P}(C_i)$$
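A minimal sketch (my addition) of these maximum-likelihood estimates, assuming `X` is an N×d array and `y` holds integer class labels 0..K-1:

```python
import numpy as np

def estimate_class_parameters(X, y, K):
    """Per-class priors, means, and covariance matrices (ML estimates)."""
    N, d = X.shape
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = X[y == i]                        # instances with r_i^t = 1
        Ni = len(Xi)
        mi = Xi.mean(axis=0)
        Si = (Xi - mi).T @ (Xi - mi) / Ni     # divides by the class count, as on the slide
        priors.append(Ni / N)
        means.append(mi)
        covs.append(Si)
    return np.array(priors), np.array(means), np.array(covs)
```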
Different Si

Quadratic discriminant:

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}\left(\mathbf{x}^T \mathbf{S}_i^{-1}\mathbf{x} - 2\mathbf{x}^T \mathbf{S}_i^{-1}\mathbf{m}_i + \mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i\right) + \log\hat{P}(C_i)$$
$$= \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1} \qquad
\mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i \qquad
w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)$$
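A small sketch (my addition) that builds these quadratic-form coefficients; evaluating x @ W_i @ x + w_i @ x + w_i0 then matches the discriminant of the previous slide, up to the dropped (d/2) log 2π constant:

```python
import numpy as np

def quadratic_discriminant_params(m_i, S_i, prior_i):
    """W_i, w_i, w_i0 for g_i(x) = x^T W_i x + w_i^T x + w_i0."""
    S_inv = np.linalg.inv(S_i)
    W_i = -0.5 * S_inv
    w_i = S_inv @ m_i
    _, logdet = np.linalg.slogdet(S_i)
    w_i0 = -0.5 * m_i @ S_inv @ m_i - 0.5 * logdet + np.log(prior_i)
    return W_i, w_i, w_i0
```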
[Figure: class likelihoods p(x|Ci), the discriminant where P(C1|x) = 0.5, and the posterior for C1]
Common Covariance Matrix S

Shared common sample covariance S
$$\mathbf{S} = \sum_i \hat{P}(C_i)\, \mathbf{S}_i$$

Discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{m}_i) + \log\hat{P}(C_i)$$

which is a linear discriminant

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i \qquad
w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log\hat{P}(C_i)$$
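A minimal sketch (my addition) of the shared-covariance linear discriminant, reusing the per-class estimates from the earlier sketch:

```python
import numpy as np

def linear_discriminant_params(means, covs, priors):
    """w_i and w_i0 for each class under a pooled covariance S = sum_i P(C_i) S_i."""
    S = sum(p * Si for p, Si in zip(priors, covs))   # pooled covariance matrix
    S_inv = np.linalg.inv(S)
    ws = [S_inv @ m for m in means]
    w0s = [-0.5 * m @ S_inv @ m + np.log(p) for m, p in zip(means, priors)]
    return ws, w0s

# Classify x by argmax_i (ws[i] @ x + w0s[i]).
```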
Common Covariance Matrix S
[Figure slide]
Diagonal S

When the $x_j$, $j = 1, \ldots, d$, are independent, Σ is diagonal and
$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \qquad \text{(the naive Bayes assumption)}$$

$$g_i(\mathbf{x}^t) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log\hat{P}(C_i)$$


Classify based on weighted Euclidean distance (in sj
units) to the nearest mean
Diagonal S
[Figure slide: variances may be different]
Diagonal S, equal variances

Nearest mean classifier: Classify based on
Euclidean distance to the nearest mean
gi x   
x  mi
2s 2

2
 log ˆ
P Ci 
1
  2  x tj  mij
2s j 1
d

2

 log ˆ
P Ci 
Each mean can be considered a prototype or template, and this is template matching.
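A minimal sketch (my addition) of the nearest mean classifier, ignoring the priors (i.e., assuming equal P̂(Ci)):

```python
import numpy as np

def nearest_mean_predict(x, means):
    """Return the index of the class whose mean is closest in Euclidean distance."""
    dists = [np.sum((x - m) ** 2) for m in means]
    return int(np.argmin(dists))
```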
Diagonal S, equal variances
[Figure slide]
Model Selection
Assumption                    Covariance matrix         No. of parameters
Shared, hyperspheric          S_i = S = s^2 I           1
Shared, axis-aligned          S_i = S, with s_ij = 0    d
Shared, hyperellipsoidal      S_i = S                   d(d+1)/2
Different, hyperellipsoidal   S_i                       K d(d+1)/2
As we increase complexity (less restricted S), bias
decreases and variance increases
Assume simple models (allow some bias) to control
variance (regularization)
Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the $x_j$ are independent (naive Bayes'),

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}\,(1 - p_{ij})^{(1 - x_j)}$$

and the discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i)
= \sum_j \left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
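A minimal sketch (my addition) of these estimates and the resulting discriminant for binary features; `X` is assumed to be an N×d 0/1 array and `y` integer labels, with a small clipping constant to avoid log(0), which the slide does not include:

```python
import numpy as np

def bernoulli_nb_fit(X, y, K, eps=1e-9):
    """Estimate p_ij = p(x_j = 1 | C_i) and the class priors."""
    priors = np.array([(y == i).mean() for i in range(K)])
    p = np.array([X[y == i].mean(axis=0) for i in range(K)])
    return np.clip(p, eps, 1 - eps), priors

def bernoulli_nb_discriminants(x, p, priors):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i), for all classes."""
    return (x * np.log(p) + (1 - x) * np.log(1 - p)).sum(axis=1) + np.log(priors)
```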
Discrete Features

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

If the $x_j$ are independent,

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$
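A minimal sketch (my addition) of estimating p̂_ijk for one multinomial attribute; `xj` is assumed to hold integer value indices 0..n_j-1 per instance (the 1-of-n_j coding collapsed to an index):

```python
import numpy as np

def estimate_p_ijk(xj, y, K, n_j):
    """p_hat[i, k] = fraction of class-i instances with x_j = v_k."""
    p_hat = np.zeros((K, n_j))
    for i in range(K):
        vals = xj[y == i]
        counts = np.bincount(vals, minlength=n_j)
        p_hat[i] = counts / max(len(vals), 1)
    return p_hat
```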
Multivariate Regression
$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \epsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2$$
Multivariate polynomial model:
Define new higher-order variables
z1 = x1, z2 = x2, z3 = x1^2, z4 = x2^2, z5 = x1·x2
and use the linear model in this new z space
(basis functions, kernel trick, SVM: Chapter 10)
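A minimal sketch (my addition) of fitting the multivariate linear model by least squares with NumPy's `lstsq` solver; `X` is assumed to be the N×d input matrix and `r` the length-N target vector:

```python
import numpy as np

def fit_linear_regression(X, r):
    """Minimize (1/2) sum_t (r^t - w0 - w^T x^t)^2; returns (w0, w)."""
    N = X.shape[0]
    X1 = np.hstack([np.ones((N, 1)), X])             # prepend a column of 1s for the bias w0
    w_all, *_ = np.linalg.lstsq(X1, r, rcond=None)   # least-squares solution
    return w_all[0], w_all[1:]
```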