
Lecture Slides for
INTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml
CHAPTER 12:
Local Models
Introduction

Divide the input space into local regions and learn simple (constant/linear) models in each patch

Unsupervised: competitive, online clustering
Supervised: radial-basis functions, mixture of experts

Competitive Learning

$$E\left(\{m_i\}_{i=1}^{k} \mid X\right) = \sum_t \sum_i b_i^t \, \|x^t - m_i\|^2$$

$$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_l \|x^t - m_l\| \\ 0 & \text{otherwise} \end{cases}$$

Batch k-means:

$$m_i = \frac{\sum_t b_i^t x^t}{\sum_t b_i^t}$$

Online k-means (stochastic gradient descent on E):

$$\Delta m_{ij} = -\eta \frac{\partial E^t}{\partial m_{ij}} = \eta \, b_i^t \left(x_j^t - m_{ij}\right)$$
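A minimal sketch of the online k-means update above in Python, assuming NumPy and a fixed learning rate; the function name `online_kmeans` and the toy data are illustrative, not from the slides.

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Competitive (online) k-means: only the winning center moves toward each sample."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].copy()  # initialize centers from data
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            i = np.argmin(np.linalg.norm(x - m, axis=1))     # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                         # Delta m_i = eta b_i^t (x^t - m_i)
    return m

# Toy usage: two well-separated 2-D blobs should yield two nearby centers
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4.0])
print(online_kmeans(X, k=2))
```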
Winner-take-all network
Adaptive Resonance Theory

Incremental; add a new cluster if not covered; defined by vigilance, ρ

$$b_i^t = \|x^t - m_i\| = \min_{l=1}^{k} \|x^t - m_l\|$$

$$\begin{cases} m_{k+1} \leftarrow x^t & \text{if } b_i^t > \rho \\ \Delta m_i = \eta \left(x^t - m_i\right) & \text{otherwise} \end{cases}$$

(Carpenter and Grossberg, 1988)
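A sketch of the ART-like incremental rule on this slide, assuming Euclidean distance and a vigilance threshold `rho`; it keeps only the add-or-update logic and omits the resonance/reset machinery of full ART.

```python
import numpy as np

def art_like_clustering(X, rho=1.5, eta=0.5):
    """Add a new cluster when the nearest center is farther than the vigilance rho."""
    X = np.asarray(X, dtype=float)
    centers = [X[0].copy()]                          # the first sample starts the first cluster
    for x in X[1:]:
        d = np.array([np.linalg.norm(x - m) for m in centers])
        i = int(np.argmin(d))
        if d[i] > rho:
            centers.append(x.copy())                 # b_i > rho: create m_{k+1} = x^t
        else:
            centers[i] += eta * (x - centers[i])     # otherwise Delta m_i = eta (x^t - m_i)
    return np.array(centers)
```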
Self-Organizing Maps

Units have a neighborhood defined; m_i is “between” m_{i-1} and m_{i+1}, and are all updated together

One-dim map (Kohonen, 1990):

$$\Delta m_l = \eta \, e(l, i) \left(x^t - m_l\right)$$

$$e(l, i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(l - i)^2}{2\sigma^2}\right]$$
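A sketch of the one-dimensional SOM update, assuming unit indices 0..k-1 along a line and a Gaussian neighborhood of width `sigma` (the constant normalizer of e(l, i) is absorbed into the learning rate); function names are illustrative.

```python
import numpy as np

def som_update(m, x, winner, eta=0.1, sigma=1.0):
    """All units move; neighbors of the winner (on the 1-D map) move the most."""
    idx = np.arange(len(m))
    e = np.exp(-((idx - winner) ** 2) / (2 * sigma ** 2))   # neighborhood e(l, i), unnormalized
    return m + eta * e[:, None] * (x - m)                   # Delta m_l = eta e(l, i) (x^t - m_l)

def train_som_1d(X, k=5, epochs=20, eta=0.1, sigma=1.0, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(x - m, axis=1))
            m = som_update(m, x, winner, eta, sigma)
    return m
```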
Radial-Basis Functions

Locally-tuned units:

$$p_h^t = \exp\left[-\frac{\|x^t - m_h\|^2}{2 s_h^2}\right]$$

$$y^t = \sum_{h=1}^{H} w_h \, p_h^t + w_0$$
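A sketch of the RBF forward pass defined above, assuming given centers `m`, spreads `s`, and second-layer weights `w`, `w0`, with a single output for simplicity.

```python
import numpy as np

def rbf_activations(X, m, s):
    """p_h^t = exp(-||x^t - m_h||^2 / (2 s_h^2)); X is (N, D), m is (H, D), s is (H,)."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)   # squared distances, shape (N, H)
    return np.exp(-d2 / (2 * s ** 2))

def rbf_predict(X, m, s, w, w0):
    """y^t = sum_h w_h p_h^t + w_0."""
    return rbf_activations(X, m, s) @ w + w0
```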
Local vs Distributed Representation
Training RBF

Hybrid learning:
- First layer centers and spreads: unsupervised k-means
- Second layer weights: supervised gradient descent
Fully supervised
(Broomhead and Lowe, 1988; Moody and Darken, 1989)
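A sketch of the hybrid scheme, assuming a single spread shared across units and, as a common shortcut, a closed-form least-squares solution for the second layer instead of the gradient descent named on the slide; all names are illustrative.

```python
import numpy as np

def hybrid_rbf_fit(X, r, H=5, iters=20, seed=0):
    """First layer: unsupervised (batch) k-means. Second layer: supervised least squares."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=H, replace=False)].copy()
    for _ in range(iters):                                        # batch k-means for the centers
        labels = np.argmin(((X[:, None] - m[None]) ** 2).sum(2), axis=1)
        for h in range(H):
            if np.any(labels == h):
                m[h] = X[labels == h].mean(axis=0)
    s = np.full(H, X.std())                                       # crude shared spread
    P = np.exp(-((X[:, None] - m[None]) ** 2).sum(2) / (2 * s ** 2))
    P1 = np.hstack([P, np.ones((len(X), 1))])                     # bias column for w_0
    w = np.linalg.lstsq(P1, r, rcond=None)[0]                     # supervised second layer
    return m, s, w
```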
Regression
E m , s , w 
h
y it 
h
ih i ,h

1
| X    rit  y it
2 t i

2
H
t
w
p
 ih h  w i 0
h 1


w ih   rit  y it p ht
t
mhj



 t
t
t
   ri  y i w ih p h
t  i

x
t
j
 mhj
s h2
t

 t x  mh
t
t
s h    ri  y i w ih p h
3
s
t  i

h



2
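A sketch of one batch step of the fully supervised updates above, assuming a single output; the gradients follow the Δw, Δm, and Δs expressions on the slide, with the error term computed once and reused.

```python
import numpy as np

def rbf_regression_step(X, r, m, s, w, w0, eta=0.01):
    """One batch gradient step on E = 1/2 sum_t (r^t - y^t)^2 for a single-output RBF net."""
    d = X[:, None, :] - m[None, :, :]                      # x^t - m_h, shape (N, H, D)
    p = np.exp(-(d ** 2).sum(2) / (2 * s ** 2))            # p_h^t, shape (N, H)
    y = p @ w + w0                                         # y^t
    err = r - y                                            # r^t - y^t, shape (N,)
    g = (err[:, None] * w[None, :]) * p                    # (r - y) w_h p_h, reused for m and s
    w = w + eta * p.T @ err                                # Delta w_h = eta sum_t (r - y) p_h
    w0 = w0 + eta * err.sum()                              # Delta w_0
    m = m + eta * (g[:, :, None] * d / s[None, :, None] ** 2).sum(0)   # Delta m_hj
    s = s + eta * (g * (d ** 2).sum(2) / s ** 3).sum(0)    # Delta s_h
    return m, s, w, w0
```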
Classification
E mh , sh , wih i ,h | X    rit log y it
t
i
y 

t
e xp h wihpht  wi 0


i
t
e
xp
w
p
k h kh h  wk 0

Rules and Exceptions
$$y^t = \sum_{h=1}^{H} w_h \, p_h^t + v^T x^t + v_0$$

The Gaussian units model the exceptions; the linear term is the default rule.
Rule-Based Knowledge
IF x 1  a  A ND x 2  b  OR x 3  c  THEN y  0.1
 x 1  a 2 
 x 2  b 2 
p1  e xp
  e xp
 with w 1  0.1
2
2
2s1 
2s2 


 x 3  c 2 
p 2  e xp
 with w 2  0.1
2
2s3 




Incorporation of prior knowledge (before training)
Rule extraction (after training) (Tresp et al., 1997)
Fuzzy membership functions and fuzzy rules
Normalized Basis Functions
$$g_h^t = \frac{p_h^t}{\sum_{l=1}^{H} p_l^t} = \frac{\exp\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$

$$y_i^t = \sum_{h=1}^{H} w_{ih} \, g_h^t$$

$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) g_h^t$$

$$\Delta m_{hj} = \eta \sum_t \sum_i \left(r_i^t - y_i^t\right) \left(w_{ih} - y_i^t\right) g_h^t \, \frac{x_j^t - m_{hj}}{s_h^2}$$
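A sketch of the normalized activations g_h^t above, assuming equal unit priors; numerically this is just a softmax over the negative scaled squared distances, so the maximum is subtracted before exponentiating.

```python
import numpy as np

def normalized_rbf(X, m, s):
    """g_h^t = p_h^t / sum_l p_l^t, i.e. a softmax over RBF activations."""
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    a = -d2 / (2 * s ** 2)
    a = a - a.max(axis=1, keepdims=True)        # stabilize the softmax
    p = np.exp(a)
    return p / p.sum(axis=1, keepdims=True)

def normalized_rbf_predict(X, m, s, W):
    """y_i^t = sum_h w_ih g_h^t; W has shape (outputs, H). No bias: the g_h sum to 1."""
    return normalized_rbf(X, m, s) @ W.T
```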
Competitive Basis Functions

Mixture model:

$$p\left(r^t \mid x^t\right) = \sum_{h=1}^{H} p\left(h \mid x^t\right) p\left(r^t \mid h, x^t\right)$$

$$p\left(h \mid x^t\right) = \frac{p\left(x^t \mid h\right) p(h)}{\sum_l p\left(x^t \mid l\right) p(l)}$$

$$g_h^t \equiv p\left(h \mid x^t\right) = \frac{a_h \exp\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l a_l \exp\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$
Regression

$$L\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \sum_t \log \sum_h g_h^t \exp\left[-\frac{1}{2} \sum_i \left(r_i^t - y_{ih}^t\right)^2\right]$$

assuming Gaussian noise, $p\left(r_i^t \mid h, x^t\right) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{\left(r_i^t - y_{ih}^t\right)^2}{2}\right]$

Posterior (responsibility) of unit h:

$$f_h^t = p\left(h \mid r^t, x^t\right) = \frac{p\left(h \mid x^t\right) p\left(r^t \mid h, x^t\right)}{\sum_l p\left(l \mid x^t\right) p\left(r^t \mid l, x^t\right)} = \frac{g_h^t \exp\left[-\frac{1}{2} \sum_i \left(r_i^t - y_{ih}^t\right)^2\right]}{\sum_l g_l^t \exp\left[-\frac{1}{2} \sum_i \left(r_i^t - y_{il}^t\right)^2\right]}$$

Update equations, where y_{ih}^t = w_{ih} is the constant fit:

$$\Delta w_{ih} = \eta \sum_t \left(r_i^t - y_{ih}^t\right) f_h^t$$

$$\Delta m_{hj} = \eta \sum_t \left(f_h^t - g_h^t\right) \frac{x_j^t - m_{hj}}{s_h^2}$$
Classification
L mh , sh , wih i ,h | X    log g
t
h
t
h
 y 
t
t ri
ih
i

t
t 
  log g e xp ri log y ih 
t
h
 i

e xpwih

k e xpwkh
t
h
t
y ih
t
h
f 

t
ght e xp i rit log y ih


t
t
t
g
e
xp
r
log
y
l l i i
il

EM for RBF (Supervised EM)

E-step:

$$f_h^t = p\left(h \mid r^t, x^t\right)$$

M-step:

$$m_h = \frac{\sum_t f_h^t \, x^t}{\sum_t f_h^t} \qquad
S_h = \frac{\sum_t f_h^t \left(x^t - m_h\right) \left(x^t - m_h\right)^T}{\sum_t f_h^t} \qquad
w_{ih} = \frac{\sum_t f_h^t \, r_i^t}{\sum_t f_h^t}$$
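A sketch of the M-step above, assuming the responsibilities f_h^t are already available as an (N, H) array and using a single isotropic spread per unit instead of the full covariance S_h; names are illustrative.

```python
import numpy as np

def em_rbf_m_step(X, R, F):
    """Responsibility-weighted re-estimation: X is (N, D), R is (N, K), F is (N, H)."""
    Fsum = F.sum(axis=0)                                     # sum_t f_h^t, shape (H,)
    m = (F.T @ X) / Fsum[:, None]                            # m_h = sum f x / sum f
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # ||x^t - m_h||^2, shape (N, H)
    s = np.sqrt((F * d2).sum(axis=0) / (X.shape[1] * Fsum))  # isotropic spread per unit
    W = (F.T @ R) / Fsum[:, None]                            # W[h, i] = w_ih = sum f r_i / sum f
    return m, s, W
```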
Learning Vector Quantization

H units per class, prelabeled (Kohonen, 1990)
Given x^t, m_i is the closest:

$$\Delta m_i = \begin{cases} \eta \left(x^t - m_i\right) & \text{if } x^t \text{ and } m_i \text{ have the same class label} \\ -\eta \left(x^t - m_i\right) & \text{otherwise} \end{cases}$$
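A sketch of the LVQ step above, assuming float prototypes `m` with class labels `m_labels`; only the prototype closest to the sample moves.

```python
import numpy as np

def lvq_step(m, m_labels, x, x_label, eta=0.05):
    """Move the closest prototype toward x if the labels match, away from x otherwise."""
    i = np.argmin(np.linalg.norm(x - m, axis=1))    # index of the closest prototype
    sign = 1.0 if m_labels[i] == x_label else -1.0
    m[i] += sign * eta * (x - m[i])                 # Delta m_i = +/- eta (x^t - m_i)
    return m
```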
Mixture of Experts

In RBF, each local fit is a constant, w_ih, the second-layer weight.
In MoE, each local fit is a linear function of x, a local expert:

$$w_{ih}^t = v_{ih}^T \, x^t$$

(Jacobs et al., 1991)
MoE as Models Combined

Radial gating:

$$g_h^t = \frac{\exp\left[-\|x^t - m_h\|^2 / 2 s_h^2\right]}{\sum_l \exp\left[-\|x^t - m_l\|^2 / 2 s_l^2\right]}$$

Softmax gating:

$$g_h^t = \frac{\exp\left[m_h^T \, x^t\right]}{\sum_l \exp\left[m_l^T \, x^t\right]}$$
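A sketch of a mixture-of-experts forward pass with the softmax gating above, assuming linear experts and a single output; `M` holds the gating parameters m_h and `V` the expert parameters v_h, both illustrative names.

```python
import numpy as np

def moe_predict(X, M, V):
    """y^t = sum_h g_h^t (v_h^T x^t), with softmax gating over m_h^T x^t.

    X: (N, D) inputs, M: (H, D) gating weights, V: (H, D) expert weights.
    """
    a = X @ M.T                                             # gating scores m_h^T x^t, (N, H)
    a = a - a.max(axis=1, keepdims=True)                    # stabilize the softmax
    g = np.exp(a) / np.exp(a).sum(axis=1, keepdims=True)    # gating g_h^t
    experts = X @ V.T                                       # local linear fits w_h^t = v_h^T x^t
    return (g * experts).sum(axis=1)                        # gated combination
```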
Cooperative MoE

Regression:

$$E\left(\{m_h, s_h, w_{ih}\}_{i,h} \mid X\right) = \frac{1}{2} \sum_t \sum_i \left(r_i^t - y_i^t\right)^2$$

$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) g_h^t \, x^t$$

$$\Delta m_{hj} = \eta \sum_t \sum_i \left(r_i^t - y_i^t\right) \left(y_{ih}^t - y_i^t\right) g_h^t \, x_j^t$$
Competitive MoE: Regression
L mh , s h , w ih i ,h

 1
t
| X    log g e xp   rit  y ih
t
h
 2 i
t
y ih
 w ih  v ih x t
t
h

  f

x

2



t
v ih   rit  y ih
f ht x t
t
m h
t
h
 ght
t
t
Competitive MoE: Classification
L mh , sh , w ih i ,h | X    log g
t
t
y ih
h
t
h
 y 
t
t ri
ih
i

t 
  log ght e xp  rit log y ih

t
h
 i

e xpw ih
e xpv ih x t


t
e
xp
w
e
xp
v
x
k
k
kh
kh
Hierarchical Mixture of Experts

Tree of MoE where each MoE is an expert in a higher-level MoE
Soft decision tree: takes a weighted (gating) average of all leaves (experts), as opposed to using a single path and a single leaf
Can be trained using EM (Jordan and Jacobs, 1994)