Some aspects of the ICSI 1998 Broadcast News effort

Some aspects of the ICSI 1998 Broadcast News effort Dan Ellis International Computer Science Institute, Berkeley CA <dpwe@icsi.berkeley.edu> Outline 1. Overview 2. Feature choice 3. Net size and training size 4. Whole-utterance features 5. Gender-dependence ICSI Broadcast News - Dan Ellis 1998nov25 - 1 Overview • 6 months, scores of trainings, 100s of decodes BN Error rate vs time (N=523) - 1998nov08 100 90 80 WER% 70 60 50 40 30 20 10 0 0 • 20 40 60 80 100 Days since 1998jun01 120 140 160 Variables: feature, netsize, trnset, labels, testset, pruning, phone models, lexicon, LM, ... ICSI Broadcast News - Dan Ellis 1998nov25 - 2 Feature choice • Cambridge: normalized PLP • Band-limited to 4kHz for ‘difference’ • Rasta performed poorly (16ms windows) • Search over deltas, context window • plp12N-8k best alone • msg1N-8k best for combination with RNN Feature Elements WER% alone RNN baseline WER% RNN combo 33.2 plp12N 13 36.7 31.1 ras12+dN 26 44.4 32.5 msg1N 28 39.4 29.9 (2000HU, 37h trainset, align2 labels, 7hyp decode) ICSI Broadcast News - Dan Ellis 1998nov25 - 3 Net and training set: Size matters • Huge amount of training data available - 74h (16M training patterns @ 16ms) → 142h • Search over net size / training set size WER for PLP12N nets vs. net size & training data 44 42 WER% 40 38 36 34 32 0 0 1 1 2 2 3 3 Hidden layer size (log2 re 500 un Training set size (log2 re 9.25 hours) HUs 2000 4000 8000 Trnset 37h 74h 142h Labels align2 align2 align4 Trn time 4 days 7 days 21 days WER% 38.6 35.3 31.6 ComboWER % 30.4 29.3 26.8 (msg1N, 7hyp decode) ICSI Broadcast News - Dan Ellis 1998nov25 - 4 Whole-utterance features • BN groups have focussed on adaptation & normalization - VTLN, MLLR, SAT • Maybe do similar thing with extra net inputs → Whole-utterance pre-normalization featuredimension variances as constant inputs to net Wacky ftr benefit vs. utt.dur threshold. − = h4e−97−sub, −− = h4ee−97 60 Utterance duration threshold, secs 50 40 30 20 10 0 36 ICSI Broadcast News - Dan Ellis 36.2 36.4 36.6 36.8 37 WER% 37.2 37.4 37.6 37.8 38 1998nov25 - 5 Gender dependence (GD) • Train separate nets on Female/Male data - males represented 2:1 in BN - oracle labels? • Encouraging results: Net F% (2224) M% (3711) WER%(5938) 2000HU/25h UF 28.3 53.1 43.8 2000HU/25h UM 42.1 33.3 36.6 4000HU/50h U 29.6 33.6 32.1 Oracle best 25.7 30.5 28.7 Combination scheme 27.7 32.2 30.5 • Best practical scheme - use classifier entropy to choose M or F net - use decoder likelihood to choose GD or GI ICSI Broadcast News - Dan Ellis 1998nov25 - 6 Eval results Group Full 1 Full 2 10x 1 10x 2 BBN 15.2 14.2 17.5 16.7 CU-HTK 14.2 13.5 16.8 15.5 Dragon 15.7 13.4 18.0 15.5 IBM 14.5 12.4 21.2 17.8 LIMSI 13.8 13.3 OGI/Fonix 27.9 23.6 Philips/RWTH 18.5 16.8 SPRACH 21.7 20.0 26.2 23.8 SRI 22.1 20.1 23.4 22.2 ICSI Broadcast News - Dan Ellis 1998nov25 - 7

Some aspects of the ICSI 1998 Broadcast News effort

Related documents

Products

Support

Some aspects of the ICSI 1998 Broadcast News effort

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib