THISL & RESPITE progress reports

advertisement
THISL & RESPITE progress reports
Dan Ellis
International Computer Science Institute, Berkeley CA
<dpwe@icsi.berkeley.edu>
Outline
1
Thisl progress:
GUI, SQI evaluation, exotic data
2
Respite progress:
Choosing streams, NN feature extraction,
Aurora evaluation results
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 1
Thisl: GUI
1
•
Automatic voice detection
- option to trigger on any utterance
•
Thomson NLP?
•
Ship to Thomson/BBC?
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 2
Spoken Query Interface (SQI) Evaluation
•
Full IR processing from spoken queries
- 1-best hypotheses
- all terms from (smallish) lattice (KW prec=16%)
- (syntax not covered by Thomson NLP)
Spoken Query Performance by Speaker
0.6
Average Precision
0.5
0.4
1-best
lattice
0.3
0.2
0.1
0
0
10
20
30
40
50
Word Error Rate %
•
Spoken queries work OK
- dumb lattice processing does not
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 3
Exotic data evaluation
•
Range of non-news programmes from BBC
- natural history
- interviews
- features
•
Evaluation plan
- basic WER%: (not so bad)
Data set
WER
OOV
Avg.Fr. Entropy
Euro99Eval
(6 news shows, 31k words)
29.2 ± 7.6 %
0.84%
1.14 ± 0.11
1999dec Exotic
(13 varied files, 44k words)
38.9 ± 8.4 %
0.70%
1.25 ± 0.09
- separate acoustic & language model scores
(for recognized & forced alignments)
- Chase-style blame assignment
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 4
Combining feature streams
2
•
How to allocate feature dimensions to models?
- lower-dimension models train more quickly
- higher-dimension models find more interactions
Feature 1
calculation
Feature 1
calculation
Feature
concatenate
Input
sound
Feature 2
calculation
Acoustic
classifier
Speech
features
Acoustic
classifier
Posterior
multiply
Speech
features
Phone
probabilities
to decoder
Input
sound
Feature 2
calculation
•
^
•
Phone
probabilities
to decoder
Acoustic
classifier
Variations of PLP & MSG for Aurora:
Features
Parameters
baseline WER ratio
plp12•dplp12
136k
97.6%
plp12^dplp12
124k
89.6%
msg3a•msg3b
145k
101.1%
msg3a^msg3b
133k
85.8%
plp12•dplp12•msg3a•msg3b
281k
76.5%
plp12^dplp12^msg3a^msg3b
245k
74.1%
plp12^dplp12•msg3a^msg3b
257k
63.0%
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 5
Choosing streams & combinations
•
Which combination methods?
- structural co-dependence is better
modeled in a single feature space
- orthogonal variability generalizes better
with later combination
•
Which feature streams?
- best pairwise system was plp12^msg3b
i.e. best single system plus worst!
- combine streams with complementary
information...
... look at conditional mutual information?
... of statistical model outputs?
... compensating for baseline performance?
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 6
Tandem connectionist models
•
Posterior combination for HTK systems?
•
Answer: use posteriors as HTK input features
Feature
calculation
Input
sound
Neural
net model
Speech
features
(Hybrid system
output)
(Posterior
decoder)
(Phone
probabilities)
Pre-nonlinearity
outputs
PCA
orthogn'n
Subword
likelihoods
Othogonal
features
HTK
GM model
Tandem system
output
HTK
decoder
- (GMM system does not know they are phones)
•
Result: better performance than either alone!
- neural net has trained discriminatively
- GMM HMMs learn context-dependent structure
→extract complementary info from training data
System-features
baseline WER ratio
HTK-mfcc
100.0%
Hybrid-mfcc
84.6%
Tandem-mfcc
64.5%
Tandem-plp+msg
47.2%
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 7
Aurora “Distributed SR” evaluation
•
7 telecoms company submissions:
Aurora DSR Evaluation 1999 Results
Avg. WER -20-0dB
Baseline improvement
100.00%
80.00%
60.00%
40.00%
20.00%
Ta
nd
em
2
S6
S5
S4
S3
Ta
nd
em
1
-20.00%
S2
S1
Ba
se
lin
e
0.00%
- Tandem systems from OGI-ICSI-Qualcomm
•
Best features for transmission?
- (filtered) subband energies may be sufficient
ICSI: Thisl & Respite progress - Dan Ellis
2000-01-23 - 8
Download