Lecture Slides for
INTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN
© The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml
CHAPTER 13: Hidden Markov Models
Introduction

- Modeling dependencies in the input; no longer iid
- Sequences:
  - Temporal: in speech, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language); in handwriting, pen movements
  - Spatial: in a DNA sequence, base pairs
Discrete Markov Process

- $N$ states: $S_1, S_2, \ldots, S_N$; the state at "time" $t$ is $q_t = S_i$
- First-order Markov property:
  $$P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, \ldots) = P(q_{t+1} = S_j \mid q_t = S_i)$$
- Transition probabilities:
  $$a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i), \qquad a_{ij} \ge 0 \text{ and } \sum_{j=1}^{N} a_{ij} = 1$$
- Initial probabilities:
  $$\pi_i \equiv P(q_1 = S_i), \qquad \sum_{i=1}^{N} \pi_i = 1$$
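A minimal sketch of how such a chain generates a state sequence from $\Pi$ and $A$ (Python with numpy; the function name and seeding are illustrative, not from the slides):

    import numpy as np

    def sample_markov_chain(pi, A, T, seed=0):
        """Sample q_1..q_T given initial probabilities pi and
        transition matrix A, where A[i, j] = P(q_{t+1}=S_j | q_t=S_i)."""
        rng = np.random.default_rng(seed)
        q = [rng.choice(len(pi), p=pi)]                 # q_1 ~ pi
        for _ in range(T - 1):
            q.append(rng.choice(len(pi), p=A[q[-1]]))   # q_{t+1} ~ A[q_t, :]
        return q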
Stochastic Automaton

$$P(O = Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$$
Example: Balls and Urns

- Three urns, each full of balls of one color: $S_1$: red, $S_2$: blue, $S_3$: green

$$\Pi = \begin{bmatrix} 0.5 & 0.2 & 0.3 \end{bmatrix}^T \qquad A = \begin{bmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{bmatrix}$$

$$O = \{S_1, S_1, S_3, S_3\}$$

$$P(O \mid A, \Pi) = P(S_1)\, P(S_1 \mid S_1)\, P(S_3 \mid S_1)\, P(S_3 \mid S_3) = \pi_1 \cdot a_{11} \cdot a_{13} \cdot a_{33} = 0.5 \cdot 0.4 \cdot 0.3 \cdot 0.8 = 0.048$$
Balls and Urns: Learning

- Given $K$ example state sequences of length $T$:

$$\hat{\pi}_i = \frac{\#\{\text{sequences starting with } S_i\}}{\#\{\text{sequences}\}} = \frac{\sum_k 1(q_1^k = S_i)}{K}$$

$$\hat{a}_{ij} = \frac{\#\{\text{transitions from } S_i \text{ to } S_j\}}{\#\{\text{transitions from } S_i\}} = \frac{\sum_k \sum_{t=1}^{T-1} 1(q_t^k = S_i \text{ and } q_{t+1}^k = S_j)}{\sum_k \sum_{t=1}^{T-1} 1(q_t^k = S_i)}$$
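A minimal sketch of these counting estimates, assuming the training set is a list of 0-indexed state sequences (no smoothing for states that never occur):

    import numpy as np

    def estimate_markov(sequences, N):
        """ML estimates of pi and A from fully observed state sequences."""
        pi = np.zeros(N)
        trans = np.zeros((N, N))
        for q in sequences:
            pi[q[0]] += 1                       # count sequences starting with S_i
            for t in range(len(q) - 1):
                trans[q[t], q[t + 1]] += 1      # count transitions S_i -> S_j
        return pi / len(sequences), trans / trans.sum(axis=1, keepdims=True)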
Hidden Markov Models

- States are not observable
- Discrete observations $\{v_1, v_2, \ldots, v_M\}$ are recorded; they are a probabilistic function of the state
- Emission probabilities: $b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$
- Example: in each urn, there are balls of different colors, but with different probabilities
- For each observation sequence, there are multiple possible state sequences
HMM Unfolded in Time
Elements of an HMM

- $N$: number of states
- $M$: number of observation symbols
- $A = [a_{ij}]$: $N \times N$ state transition probability matrix
- $B = [b_j(m)]$: $N \times M$ observation probability matrix
- $\Pi = [\pi_i]$: $N \times 1$ initial state probability vector
- $\lambda = (A, B, \Pi)$: the parameter set of the HMM
Three Basic Problems of HMMs

1. Evaluation: given $\lambda$ and $O$, calculate $P(O \mid \lambda)$
2. State sequence: given $\lambda$ and $O$, find $Q^*$ such that $P(Q^* \mid O, \lambda) = \max_Q P(Q \mid O, \lambda)$
3. Learning: given $X = \{O^k\}_k$, find $\lambda^*$ such that $P(X \mid \lambda^*) = \max_\lambda P(X \mid \lambda)$

(Rabiner, 1989)
Evaluation

- Forward variable: $\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$
- Initialization: $\alpha_1(i) = \pi_i\, b_i(O_1)$
- Recursion: $\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1})$
- $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$

- Backward variable: $\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$
- Initialization: $\beta_T(i) = 1$
- Recursion: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$
Finding the State Sequence

$$\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}$$

- Choose the state that has the highest probability, for each time step: $q_t^* = \arg\max_i \gamma_t(i)$
- No! The individually most likely states need not form a feasible path; consecutive choices may be joined by a transition with $a_{ij} = 0$.
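Computing $\gamma$ from the forward and backward sketches above (illustrative only, since the per-step argmax can still produce an impossible path):

    import numpy as np

    def gamma(alpha, beta):
        """gamma[t, i] = P(q_t = S_i | O, lambda)."""
        g = alpha * beta
        return g / g.sum(axis=1, keepdims=True)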
Viterbi's Algorithm

$$\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} p(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 \cdots O_t \mid \lambda)$$

- Initialization: $\delta_1(i) = \pi_i\, b_i(O_1)$, $\psi_1(i) = 0$
- Recursion: $\delta_t(j) = \max_i \delta_{t-1}(i)\, a_{ij} \cdot b_j(O_t)$, $\psi_t(j) = \arg\max_i \delta_{t-1}(i)\, a_{ij}$
- Termination: $p^* = \max_i \delta_T(i)$, $q_T^* = \arg\max_i \delta_T(i)$
- Path backtracking: $q_t^* = \psi_{t+1}(q_{t+1}^*)$, $t = T-1, T-2, \ldots, 1$
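A compact Viterbi sketch under the same indexing assumptions as the forward code:

    import numpy as np

    def viterbi(pi, A, B, O):
        """Most likely state sequence q* and its joint probability p*."""
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, O[0]]                        # initialization
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A            # delta_{t-1}(i) a_ij
            psi[t] = scores.argmax(axis=0)                # best predecessor of each j
            delta[t] = scores.max(axis=0) * B[:, O[t]]    # recursion
        q = np.empty(T, dtype=int)
        q[-1] = delta[-1].argmax()                        # termination
        for t in range(T - 2, -1, -1):
            q[t] = psi[t + 1, q[t + 1]]                   # backtracking
        return q, delta[-1].max()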
Learning

$$\xi_t(i, j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_k \sum_l \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}$$

Baum-Welch algorithm (EM), with indicator variables

$$z_i^t = \begin{cases} 1 & \text{if } q_t = S_i \\ 0 & \text{otherwise} \end{cases} \qquad z_{ij}^t = \begin{cases} 1 & \text{if } q_t = S_i \text{ and } q_{t+1} = S_j \\ 0 & \text{otherwise} \end{cases}$$
Baum-Welch (EM)

E-step:
$$E[z_i^t] = \gamma_t(i) \qquad E[z_{ij}^t] = \xi_t(i, j)$$

M-step:
$$\hat{\pi}_i = \frac{\sum_{k=1}^{K} \gamma_1^k(i)}{K}$$

$$\hat{a}_{ij} = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \xi_t^k(i, j)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(i)}$$

$$\hat{b}_j(m) = \frac{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(j)\, 1(O_t^k = v_m)}{\sum_{k=1}^{K} \sum_{t=1}^{T_k - 1} \gamma_t^k(j)}$$
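A single-sequence EM step as a sketch ($K = 1$), reusing the forward, backward, and gamma functions defined above; a full implementation would accumulate these statistics over all $K$ sequences and iterate until the likelihood converges:

    import numpy as np

    def baum_welch_step(pi, A, B, O):
        """One EM re-estimation of (pi, A, B) from a single observation sequence O."""
        T, N, M = len(O), len(pi), B.shape[1]
        alpha, _ = forward(pi, A, B, O)
        beta = backward(A, B, O, N)
        gam = gamma(alpha, beta)                                  # E[z_i^t]
        xi = np.zeros((T - 1, N, N))                              # E[z_ij^t]
        for t in range(T - 1):
            x = alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1]
            xi[t] = x / x.sum()
        pi_new = gam[0]                                           # gamma_1(i) / K, with K = 1
        A_new = xi.sum(axis=0) / gam[:-1].sum(axis=0)[:, None]
        B_new = np.zeros((N, M))
        obs = np.asarray(O[:-1])
        for m in range(M):
            B_new[:, m] = gam[:-1][obs == m].sum(axis=0)          # gamma weighted by 1(O_t = v_m)
        B_new /= gam[:-1].sum(axis=0)[:, None]
        return pi_new, A_new, B_new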
Continuous Observations

- Discrete observations:
$$P(O_t \mid q_t = S_j, \lambda) = \prod_{m=1}^{M} b_j(m)^{r_t^m} \qquad \text{where } r_t^m = \begin{cases} 1 & \text{if } O_t = v_m \\ 0 & \text{otherwise} \end{cases}$$

- Gaussian mixture (discretize using k-means):
$$P(O_t \mid q_t = S_j, \lambda) = \sum_{l=1}^{L} P(G_{jl})\, p(O_t \mid q_t = S_j, G_{jl}, \lambda), \qquad \text{components} \sim \mathcal{N}(\mu_l, \Sigma_l)$$

- Continuous observations:
$$P(O_t \mid q_t = S_j, \lambda) \sim \mathcal{N}(\mu_j, \sigma_j^2)$$

- Use EM to learn the parameters, e.g.,
$$\hat{\mu}_j = \frac{\sum_t \gamma_t(j)\, O_t}{\sum_t \gamma_t(j)}$$
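A small sketch of a univariate Gaussian emission density that could stand in for the discrete lookup B[:, O[t]] in the forward and Viterbi sketches above (per-state mu and sigma are assumed parameters):

    import numpy as np

    def gaussian_emission(o_t, mu, sigma):
        """b_j(O_t) for continuous observations: N(mu_j, sigma_j^2) density at o_t,
        evaluated for all states at once when mu and sigma are length-N arrays."""
        return np.exp(-0.5 * ((o_t - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))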
HMM with Input

- Input-dependent observations:
$$P(O_t \mid q_t = S_j, x^t, \lambda) \sim \mathcal{N}\left(g_j(x^t \mid \theta_j), \sigma_j^2\right)$$

- Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996):
$$P(q_{t+1} = S_j \mid q_t = S_i, x^t)$$

- Time-delay input:
$$x^t = f(O_{t-\tau}, \ldots, O_{t-1})$$
Model Selection in HMM

- Left-to-right HMMs:
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$$

- In classification, for each class $C_i$, estimate $P(O \mid \lambda_i)$ with a separate HMM and use Bayes' rule:
$$P(\lambda_i \mid O) = \frac{P(O \mid \lambda_i)\, P(\lambda_i)}{\sum_j P(O \mid \lambda_j)\, P(\lambda_j)}$$