Handling Uncertain Observations in Unsupervised Topic-Mixture Language Model Adaptation
Ekapol Chuangsuwanich1, Shinji Watanabe2,
Takaaki Hori2, Tomoharu Iwata2, James Glass1
1MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts, USA
2NTT Communication Science Laboratories, NTT Corporation, Japan
ICASSP 2012
Presenter: 郝柏翰
2013/03/05
Outline
• Introduction
• Topic Tracking Language Model (TTLM)
• TTLM Using Confusion Network Inputs (TTLMCN)
• Experiments
• Conclusion
Introduction
• In a real environment, acoustic and language features
often vary depending on the speakers, speaking styles
and topic changes.
• To accommodate these changes, speech recognition
approaches that include the incremental tracking of
changing environments have attracted attention.
• This paper proposes a topic tracking language model
that can adaptively track changes in topics based on
current text information and previously estimated topic
models in an on-line manner.
TTLM
• Tracking temporal changes in language environments
TTLM
• A long session of speech input is divided into chunks t = 1, 2, ..., T.
• Each chunk t is modeled by its own topic distribution θ_t = {θ_tk}, k = 1, ..., K.
• The current topic distribution depends on the topic
distribution of the past H chunks and precision
parameters α as follows:
P(\theta_t \mid \{\hat{\theta}_{t-h}, \alpha_h\}_{h=1}^{H}) \propto \prod_{k=1}^{K} \theta_{tk}^{\left(\sum_{h=1}^{H} \alpha_h \hat{\theta}_{(t-h)k}\right) - 1}
TTLM
• With the topic distribution, the unigram probability of a word w_m in chunk t can be recovered from the topic and word probabilities:

P(w_m) = \sum_{k=1}^{K} \hat{\theta}_{tk} \, \phi_{k w_m}

• where φ_{k w_m} is the unigram probability of word w_m in topic k.
• The adapted n-gram can be used for a second-pass recognition to obtain better results; a small sketch of the mixture follows below.
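A minimal sketch of this topic-weighted unigram mixture with hypothetical topic and word probabilities; the interpolation with a baseline LM at the end is a common rescoring choice, not necessarily the paper's exact scheme:

    import numpy as np

    # Hypothetical topic-word unigrams phi[k][w] for K = 2 topics over a 3-word vocabulary.
    phi = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6]])
    theta_t = np.array([0.8, 0.2])    # estimated topic mixture for the current chunk

    # Adapted unigram: P(w) = sum_k theta_t[k] * phi[k][w]
    p_unigram = theta_t @ phi
    print(p_unigram)                  # [0.58, 0.22, 0.20]

    # Interpolate with a baseline LM probability before second-pass rescoring
    # (an assumed usage, with a made-up interpolation weight).
    lam = 0.3
    p_baseline = np.array([0.5, 0.3, 0.2])
    p_adapted = lam * p_unigram + (1 - lam) * p_baseline
    print(p_adapted)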
TTLMCN
[Figure: a confusion network split into chunks (chunk 1-3) and word slots (slot 1-3); e.g. slot 1 contains A_1 = 3 arcs]
• Consider a confusion network with M word slots.
• Each word slot m can contain a different number of arcs A_m, with each arc a holding a word w_ma and a corresponding arc posterior d_ma.
• s_ma is a binary selection parameter, where s_ma = 1 indicates that arc a of slot m is selected (see the data-structure sketch below).
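One possible in-memory representation of such a chunk of confusion-network slots; the words and posteriors below are illustrative, not the paper's data:

    from dataclasses import dataclass

    @dataclass
    class Arc:
        word: str         # w_ma
        posterior: float  # d_ma

    # One chunk = a list of word slots; each slot holds its competing arcs.
    # Toy example with three slots; slot 1 has A_1 = 3 arcs.
    chunk = [
        [Arc("the", 0.6), Arc("a", 0.3), Arc("<eps>", 0.1)],
        [Arc("topic", 0.9), Arc("top", 0.1)],
        [Arc("model", 1.0)],
    ]

    for m, slot in enumerate(chunk, 1):
        # Arc posteriors within a slot sum to one.
        assert abs(sum(arc.posterior for arc in slot) - 1.0) < 1e-6
        print(f"slot {m}:", [(arc.word, arc.posterior) for arc in slot])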
TTLMCN
• For each chunk t, we can write the joint distribution of
words, latent topics and arc selections conditioned on the
topic probabilities, unigram probabilities, and arc posteriors
as follows:
P(W_t, Z_t, S_t \mid \theta_e, \Phi, \theta_t, D_t) = \prod_{m=1}^{M} \prod_{a=1}^{A_m} \left( \theta_{t z_m} \, \phi_{z_m w_{ma}} \, d_{ma} \right)^{s_{ma}} \left( \theta_{e w_{ma}} \right)^{1 - s_{ma}}
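Assuming the reconstruction above, a toy evaluation of this joint probability for a single slot with two arcs could look like the following (all probabilities are made up):

    import numpy as np

    # One word slot with two arcs; s_ma selects which arc the topic model explains.
    d = np.array([0.7, 0.3])            # arc posteriors d_ma
    words = [0, 1]                      # word ids w_ma
    z_m = 0                             # latent topic of the slot
    theta_t = np.array([0.8, 0.2])      # chunk topic mixture
    phi = np.array([[0.6, 0.4],         # phi[z][w]: topic-dependent word probabilities
                    [0.2, 0.8]])
    theta_e = np.array([0.5, 0.5])      # background/error word distribution
    s = np.array([1, 0])                # selection: arc 0 selected, arc 1 not

    # prod_a (theta_t[z_m] * phi[z_m][w] * d[a])^s[a] * theta_e[w]^(1 - s[a])
    p = np.prod([(theta_t[z_m] * phi[z_m][w] * d[a]) ** s[a]
                 * theta_e[w] ** (1 - s[a])
                 for a, w in enumerate(words)])
    print(p)                            # 0.8 * 0.6 * 0.7 * 0.5 = 0.168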
TTLMCN
• Graphical representation of TTLMCN
Experiments (MIT-OCW)
• MIT-OCW is mainly composed of lectures given at MIT. Each lecture is typically two hours long. We segmented the lectures into utterances averaging two seconds each using a voice activity detector.
Comparison of TTLM and TTLMCN
• We can see that the topic probability of TTLMCN is more similar to that of the oracle experiment than TTLM's, especially in the low-probability regions.
• The KL divergence between TTLM and the oracle was 3.3, while for TTLMCN it was 1.3 (a minimal KL sketch follows below).
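For reference, a minimal sketch of the KL divergence between a tracked topic mixture and an oracle one; the distributions below are hypothetical, not the paper's:

    import numpy as np

    def kl(p, q, eps=1e-12):
        # KL(p || q) for discrete topic distributions; eps guards against log(0).
        p = np.asarray(p) + eps
        q = np.asarray(q) + eps
        return float(np.sum(p * np.log(p / q)))

    oracle  = [0.70, 0.20, 0.10]   # hypothetical oracle topic mixture
    tracked = [0.55, 0.30, 0.15]   # hypothetical tracked mixture
    print(kl(oracle, tracked))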
Conclusion
• We described an extension of the TTLM that handles errors in speech recognition. The proposed model uses a confusion network as input instead of a single ASR hypothesis, which improved performance even in high-WER conditions.
• The gain in word error rate was not very large, since the language model typically contributes relatively little to overall LVCSR performance.
Significance Test (T-Test)
H0: The experimental and control groups follow the same normal distribution.
H1: The experimental and control groups follow different normal distributions.
Significance Test (T-Test)
• The t statistic:

t = \frac{\bar{X} - \mu}{S / \sqrt{n}}
• Example: T.TEST(A1:A10, B1:B10, 1, 2)
X: 5, 7, 5, 3, 5, 3, 3, 9    Σx = 40, M_x = 5, S_x² = 4.571
Y: 8, 1, 4, 6, 6, 4, 1, 2    Σy = 32, M_y = 4, S_y² = 6.571

t = \frac{5 - 4}{\sqrt{\frac{4.571}{8} + \frac{6.571}{8}}} \approx 0.85
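The same computation in a short Python sketch: the first part evaluates the formula above directly, and the second cross-checks it with scipy's equal-variance two-sample t-test (the analogue of Excel's T.TEST with type 2):

    import math
    from scipy import stats

    x = [5, 7, 5, 3, 5, 3, 3, 9]
    y = [8, 1, 4, 6, 6, 4, 1, 2]

    # Direct evaluation of the formula on the slide (unpooled sample variances).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx2 = sum((v - mx) ** 2 for v in x) / (n - 1)
    sy2 = sum((v - my) ** 2 for v in y) / (n - 1)
    t = (mx - my) / math.sqrt(sx2 / n + sy2 / n)
    print(round(t, 3))                # ~0.847

    # Cross-check with the equal-variance two-sample t-test
    # (with equal sample sizes the t statistic is identical).
    t_sp, p_sp = stats.ttest_ind(x, y, equal_var=True)
    print(round(float(t_sp), 3), round(float(p_sp), 3))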