Some aspects of the ICSI 1998 Broadcast News effort

Dan Ellis
International Computer Science Institute, Berkeley CA
<dpwe@icsi.berkeley.edu>
Outline
1. Overview
2. Feature choice
3. Net size and training size
4. Whole-utterance features
5. Gender-dependence
ICSI Broadcast News - Dan Ellis
1998nov25 - 1
Overview
• 6 months, scores of trainings, 100s of decodes
[Figure: BN error rate vs. time (N=523), 1998nov08. x-axis: days since 1998jun01 (0 to 160); y-axis: WER% (0 to 100)]
• Variables: feature, netsize, trnset, labels, testset, pruning, phone models, lexicon, LM, ...
Feature choice
• Cambridge: normalized PLP
• Band-limited to 4kHz for ‘difference’
• Rasta performed poorly (16ms windows)
• Search over deltas, context window
• plp12N-8k best alone
• msg1N-8k best for combination with RNN
Feature        Elements   WER% alone   WER% RNN combo
RNN baseline   -          33.2         -
plp12N         13         36.7         31.1
ras12+dN       26         44.4         32.5
msg1N          28         39.4         29.9
(2000HU, 37h trainset, align2 labels, 7hyp decode)
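The "deltas" and "context window" searched over above can be illustrated with a minimal sketch. The slides do not give the delta estimator or the window width, so the central-difference delta, the edge handling, and the width of 9 frames are all assumptions made for illustration; `add_deltas` and `stack_context` are hypothetical helper names, not ICSI tools. The 13-to-26 element count matches the table only on the assumption that "+d" denotes appended deltas.

```python
def add_deltas(frames):
    """Append a delta vector to each frame.

    frames: list of equal-length lists of floats, one list per frame.
    The central difference (next - previous) / 2 is an assumed estimator;
    edge frames reuse their nearest neighbour.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        delta = [(b - a) / 2.0 for a, b in zip(prev, nxt)]
        out.append(list(frames[t]) + delta)
    return out


def stack_context(frames, width):
    """Concatenate `width` consecutive frames (odd width) centred on each
    frame, repeating the first/last frame at the utterance edges."""
    if width % 2 != 1:
        raise ValueError("width must be odd")
    half = width // 2
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for k in range(t - half, t + half + 1):
            row.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(row)
    return stacked


# Toy utterance: 5 frames of 13 base elements each (synthetic values).
base = [[float(t + d) for d in range(13)] for t in range(5)]
with_deltas = add_deltas(base)                    # 13 + 13 = 26 elements
net_input = stack_context(with_deltas, width=9)   # width 9 is an assumed value
```

With these assumed settings each net input vector would have 26 x 9 = 234 values; the actual window used for each feature is not stated in the slides.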
Net and training set: Size matters
• Huge amount of training data available
- 74h (16M training patterns @ 16ms) → 142h
• Search over net size / training set size
[Figure: WER for PLP12N nets vs. net size & training data. Axes: "Hidden layer size (log2 re 500 un" (label truncated), "Training set size (log2 re 9.25 hours)", WER% (32 to 44)]
HUs          2000     4000     8000
Trnset       37h      74h      142h
Labels       align2   align2   align4
Trn time     4 days   7 days   21 days
WER%         38.6     35.3     31.6
Combo WER%   30.4     29.3     26.8
(msg1N, 7hyp decode)
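As a quick check of the arithmetic on this slide, the sketch below converts hours of audio into training patterns at one pattern per 16 ms, and expresses sizes on the log2 scales named in the figure labels (relative to 500 and to 9.25 hours). The function names are hypothetical, and reading the 500 reference as "hidden units" is an assumption because the axis label is cut off.

```python
import math

FRAME_STEP_S = 0.016  # one training pattern every 16 ms, as stated on the slide


def patterns(hours):
    """Number of training patterns in `hours` of audio at a 16 ms step."""
    return hours * 3600 / FRAME_STEP_S


def log2_re(value, reference):
    """Position of `value` on a log2 axis relative to `reference`."""
    return math.log2(value / reference)


# 74 h of audio gives roughly 16.65 million patterns, i.e. the "16M" on the slide.
patterns_74h = patterns(74)

# Table configurations on the two log2 axes (assumed reference: 500 hidden units).
hu_axis = {hu: log2_re(hu, 500) for hu in (2000, 4000, 8000)}
trn_axis = {h: log2_re(h, 9.25) for h in (37, 74, 142)}
```

On these scales 2000 HUs and 37 h both sit at 2.0, 4000 HUs and 74 h at 3.0, and 142 h falls just short of 4 (about 3.94), so each step in the table is close to a doubling of both net size and data.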
Whole-utterance features
• BN groups have focussed on adaptation & normalization
- VTLN, MLLR, SAT
• Maybe do similar thing with extra net inputs
→ Whole-utterance pre-normalization feature-dimension variances as constant inputs to net
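One way to read the idea above: measure each feature dimension's variance over the whole utterance before normalization, normalize the frames as usual, then append the same variance vector to every frame so the net sees it as a constant input. The sketch below is only an illustration under that reading; the use of raw (not log or scaled) variances, mean/variance normalization, and the helper names are assumptions, not details from the slides.

```python
import math


def utterance_stats(frames):
    """Per-dimension mean and (population) variance over one utterance.

    frames: list of equal-length lists of floats, one list per frame.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dims)
    ]
    return means, variances


def normalize_with_variance_inputs(frames, eps=1e-8):
    """Normalize each dimension over the utterance, then append the
    pre-normalization variances to every frame as constant extra inputs."""
    means, variances = utterance_stats(frames)
    dims = len(means)
    out = []
    for f in frames:
        normed = [
            (f[d] - means[d]) / math.sqrt(variances[d] + eps)
            for d in range(dims)
        ]
        out.append(normed + variances)  # same variance vector on every frame
    return out


# Toy utterance: 2 frames, 2 feature dimensions (synthetic values).
toy = [[0.0, 2.0], [2.0, 6.0]]
augmented = normalize_with_variance_inputs(toy)
```

For the toy input the per-dimension variances are 1.0 and 4.0, so every output frame has four values: two normalized features followed by those two constants.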
[Figure: Wacky ftr benefit vs. utt.dur threshold; - = h4e-97-sub, -- = h4ee-97. Axes: WER% (36 to 38), utterance duration threshold, secs (0 to 60)]
Gender dependence (GD)
• Train separate nets on Female/Male data
- males represented 2:1 in BN
- oracle labels?
• Encouraging results:
Net                  F% (2224)   M% (3711)   WER% (5938)
2000HU/25h UF        28.3        53.1        43.8
2000HU/25h UM        42.1        33.3        36.6
4000HU/50h U         29.6        33.6        32.1
Oracle best          25.7        30.5        28.7
Combination scheme   27.7        32.2        30.5
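The overall WER column can be reproduced as a count-weighted average of the F and M columns, assuming the bracketed numbers in the header are per-gender word counts. This sketch uses the sum of the two per-gender counts (5935) as the denominator rather than the 5938 shown in the last header; `pooled_wer` is a hypothetical helper name.

```python
F_WORDS = 2224  # bracketed count in the F% header (assumed to be words)
M_WORDS = 3711  # bracketed count in the M% header (assumed to be words)


def pooled_wer(f_wer, m_wer):
    """Overall WER% as the word-count-weighted mean of per-gender WER%."""
    return (f_wer * F_WORDS + m_wer * M_WORDS) / (F_WORDS + M_WORDS)


# (F%, M%, reported overall WER%) copied from the table.
rows = {
    "2000HU/25h UF": (28.3, 53.1, 43.8),
    "2000HU/25h UM": (42.1, 33.3, 36.6),
    "4000HU/50h U": (29.6, 33.6, 32.1),
    "Oracle best": (25.7, 30.5, 28.7),
    "Combination scheme": (27.7, 32.2, 30.5),
}

recomputed = {
    name: round(pooled_wer(f, m), 1) for name, (f, m, _) in rows.items()
}
```

Under this assumption every recomputed value agrees with the reported overall figure to one decimal place, which also supports the row and column layout of the table.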
• Best practical scheme
- use classifier entropy to choose M or F net
- use decoder likelihood to choose GD or GI
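The first step of the scheme above, choosing between the M and F nets by classifier entropy, might look like the sketch below. The slides do not say how the entropy is computed or compared, so averaging per-frame posterior entropy over the utterance and picking the net with the lower value (the more confident one) are assumptions, and the function names are hypothetical. The second step, comparing decoder likelihoods for GD vs. GI, is not sketched here.

```python
import math


def mean_frame_entropy(posteriors):
    """Average entropy (in nats) of per-frame phone posterior distributions.

    posteriors: list of frames, each a list of probabilities summing to 1.
    """
    total = 0.0
    for frame in posteriors:
        total += -sum(p * math.log(p) for p in frame if p > 0.0)
    return total / len(posteriors)


def choose_gender_net(post_f, post_m):
    """Return ("F" or "M", entropy) for the net with the lower mean entropy.

    Lower entropy is taken as "more confident"; this direction is assumed.
    """
    h_f = mean_frame_entropy(post_f)
    h_m = mean_frame_entropy(post_m)
    return ("F", h_f) if h_f <= h_m else ("M", h_m)


# Synthetic posteriors over 3 classes for a 2-frame utterance.
post_f = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]]   # sharp, confident
post_m = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]   # nearly flat
choice, entropy = choose_gender_net(post_f, post_m)
```

In this toy case the F net's posteriors are much sharper, so it is the one selected.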
Eval results
Group          Full 1   Full 2   10x 1   10x 2
BBN            15.2     14.2     17.5    16.7
CU-HTK         14.2     13.5     16.8    15.5
Dragon         15.7     13.4     18.0    15.5
IBM            14.5     12.4     21.2    17.8
LIMSI          13.8     13.3     -       -
OGI/Fonix      27.9     23.6     -       -
Philips/RWTH   18.5     16.8     -       -
SPRACH         21.7     20.0     26.2    23.8
SRI            22.1     20.1     23.4    22.2