RESPITE progress report

advertisement
RESPITE progress report
Dan Ellis
International Computer Science Institute, Berkeley CA
<dpwe@icsi.berkeley.edu>
Outline
1
Hybrid AURORA system
2
Using hybrid results with HTK
3
Multifeature design
4
Multistream pronunciation modeling
ICSI: RESPITE progress - Dan Ellis
1999sep13 - 1
Hybrid AURORA system
1
•
AURORA noisy digits task
- TIDIGITS + 4 kinds of noise x 7 SNR levels
- standard HTK back-end provided
- objective: standard features for mobile phones
•
ICSI’s small-vocab techniques
- modulation-filtered spectrogram (MSG) features
- posterior probability combination (multistream)
•
Can we combine them?
- hybrid NN-HMM baseline system for AURORA
- use a TIDIGITS lexicon & phone models
- bootstrap labels from NUMBERS95 network
- use 480 hidden-unit net as N95
ICSI: RESPITE progress - Dan Ellis
1999sep13 - 2
Baseline AURORA results
System
•
AURORA test has 28 numbers...
•
...report just a few
- mean WER % for ∞, 15, 5, -5 dB SNR
+ overall mean ratio to HTK MFCC baseline
Feature
Clean
SNR15
SNR5
SNR-5
Avg. ratio
HTK
MFCC+d
1.4%
3.7%
15.9%
68.0%
100.0%
Hybrid
MFCC+d
2.2%
2.6%
9.9%
49.1%
82.1%
Hybrid
plp12N+d
2.6%
2.8%
10.6%
47.9%
89.6%
Hybrid
msg3N
2.1%
2.9%
11.6%
49.2%
87.1%
HTK
msg3NKG
5.6%
6.4%
21.5%
66.8%
184.5%
2
WERR%
10
1
10
HTK MFCC
Hybrid MFCC
Hybrid plp
Hybrid MSG
HTK MSG
0
10
clean
15dB
5dB
-5dB
SNR
ICSI: RESPITE progress - Dan Ellis
1999sep13 - 3
Combination systems
•
Posterior combination has worked well
Feature 1
calculation
Input
sound
Acoustic
classifier
Feature 2
calculation
HMM
decoder
Posterior
combination
Speech
features
Acoustic
classifier
Word
hypotheses
Phone
probabilities
P(qi|X1,X2) ∝ P(qi|X1)·P(qi|X2) / P(qi) ... if X1⊥X2|q
•
But it depends on features
Features
Clean
SNR15
SNR5
SNR-5
Avg. ratio
plp12Nd
2.6%
2.8%
10.6%
47.9%
89.6%
msg3N
2.1%
2.9%
11.6%
49.2%
87.1%
plp12Nd-msg3N
1.7%
2.4%
9.5%
47.3%
74.1%
plp12N-msg3aN
• dplp12N-msg3bN
1.7%
2.1%
8.8%
46.9%
70.1%
plp12Nd • msg3N
1.5%
1.9%
8.2%
43.0%
63.0%
ICSI: RESPITE progress - Dan Ellis
1999sep13 - 4
Using hybrid results with HTK
2
•
AURORA specification: use HTK recognizer
•
How to put combinations into HTK
- feature combination (with LDA?)
- posteriors as features (only 24 phone classes)
plp
calculation
Neural
net model
Noway
decoder
x
msg
calculation
Input
sound
Word
hypotheses
Subword
likelihoods
Phone
probabilities
Speech
features
•
System
Neural
net model
HTK
GM model
HTK
decoder
HTK handles it!
Feature
Clean
SNR15
SNR5
SNR-5
Avg. ratio
Hybrid
plp • msg
1.5%
1.9%
8.2%
43.0%
63.0%
HTK
posteriors
1.1%
1.9%
8.2%
46.1%
59.1%
ICSI: RESPITE progress - Dan Ellis
1999sep13 - 5
Tailoring posteriors for HTK
4
10
x 10
•
Posteriors are very un-Gaussian
- log-transform doesn’t help much
•
A linear output layer helps a lot
- remove softmax: yi = exp(xi)/Σj(exp(xj))
Histograms for elements 1, 2 and 23 (=h#) of lna1L (logprob) feature set
Histograms for elements 1,2,3 & 23 of lin out plp12Nd ftrs lin1
4000
2000
5
0
4
x 10
−12
10
−10
−8
−6
−4
−2
0
0
−20
4000
5
0
−20
4000
0
4
x 10
−12
5
2000
−10
−8
−6
−4
−2
0
−15
−10
−5
0
5
10
15
20
25
30
−15
−10
−5
0
5
10
15
20
25
30
2000
0
−20
4000
−15
−10
−5
0
5
10
15
20
25
2000
0
−12
−10
−8
−6
•
System
−4
−2
0
0
−15
−10
−5
0
5
10
15
Do combinations by summing linear outputs
Feature
Clean
SNR15
SNR5
SNR-5
Avg. ratio
HTK
posteriors
1.1%
1.9%
8.2%
46.1%
59.1%
HTK
log(p)
0.9%
1.8%
8.9%
48.8%
58.6%
HTK
Σ(lin. o/p)
0.9%
1.6%
7.7%
44.1%
51.6%
ICSI: RESPITE progress - Dan Ellis
1999sep13 - 6
20
Multifeature design
3
(Mike Shire)
•
‘Optimal’ features for different conditions
- subband envelope domain
- linear-discriminant analysis (LDA) for filter coeffs
•
Modulation-frequency domain responses
for clean, reverb, mixture:
1st Discriminant Filter
0
-5
dB
-10
-15
Clean
Light Reverb
Severe Reverb
Clean+Severe Reverb
-20
-25
-30
2nd Discriminant Filter
0
-5
dB
-10
-15
-20
-25
-30
0
10
ICSI: RESPITE progress - Dan Ellis
1
10
Hz
1999sep13 - 7
4
Multistream pronunciation models
(Barry Chen)
•
Combine streams in the decoder
- ‘HMM combination’
- separate state assignment for each stream
- constrain (disallow?) asynchrony
•
Are particular asynchronies important?
- between certain bands?
- between certain sounds?
- in particular directions?
•
Re-estimate transition probabilities
in 1-state asynchrony 4-band models
- no improvement yet
ICSI: RESPITE progress - Dan Ellis
1999sep13 - 8
Download