Dan’s slides for EARS PI mtg
• 4 slides on novel features based on linear predictor coefficients for the frequency (not time) domain
  - basic signal model accepted at ICASSP03
• A couple of slides on a very new idea to look for data-derived (ICA?) articulatory-style features
2002-01-17

Temporal envelope features (Columbia)
• Temporal fine structure is lost (deliberately) in STFT features:
  [Figure: mpgr1-sx419, waveform (top) and spectrogram with 10 ms windows (bottom); time / sec from 0.65 to 0.9, frequency 0-8000 Hz, level 0 to -60 dB]
• Need a compact, parametric description...

Frequency-Domain Linear Prediction (FDLP)
• Extend LPC with LP model of spectrum
  TD-LP: $y[n] = \sum_i a_i\, y[n-i]$
    (DFT)
  FD-LP: $Y[k] = \sum_i b_i\, Y[k-i]$
• ‘Poles’ represent temporal peaks:
  [Figure: mpgr1-sx419: TDLPC env (60 poles / 300 ms); waveform with envelope, time 0.65 to 0.9 s]
• Features ~ pole bandwidth, ‘frequency’
http://www.ee.columbia.edu/~marios/ctflp/ctflp.html

FDLP features for speech
• LP algorithm distributes fixed pole set within ~ 200 ms time window
  - automatic selection of ‘significant’ times
• Pole bandwidth ~ transient sharpness
  - $1 - \max_i |\lambda_i|$ in several bands as feature
  - help with classification of stop bursts etc.
• Pole frequency ~ timing within window
  - $f_n - f_{n-1}$ as robust periodicity feature?
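
A minimal sketch of the FDLP idea, not the authors' implementation: linear prediction is fitted to a DCT of the time signal, so the resulting pole describes a temporal peak. The DCT as the transform, the order-2 model (chosen so the pole has a closed form; the slides use e.g. 60 poles), the synthetic click and burst signals, and all function names are assumptions made for illustration.

```python
import cmath
import math

def dct2(x):
    """Unnormalised DCT-II: turns time samples into a real 'spectrum'."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def autocorr(c, max_lag):
    """Autocorrelation of the coefficient sequence for lags 0..max_lag."""
    return [sum(c[k] * c[k + lag] for k in range(len(c) - lag))
            for lag in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion. Returns ([1, a1..ap], residual energy)
    for A(z) = 1 + sum_j a_j z^-j, i.e. b_i = -a_i in the slide notation."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(a, gain, n):
    """Evaluate gain / |A|^2 on the unit circle; with a DCT front end,
    angle pi*(t+0.5)/n corresponds to time sample t."""
    env = []
    for t in range(n):
        w = math.pi * (t + 0.5) / n
        resp = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        env.append(gain / abs(resp) ** 2)
    return env

def dominant_pole(a):
    """Largest-magnitude root of z^2 + a1*z + a2 (order-2 model only)."""
    _, a1, a2 = a
    d = cmath.sqrt(a1 * a1 - 4 * a2)
    return max(((-a1 + d) / 2, (-a1 - d) / 2), key=abs)

def fdlp_features(x):
    """Return (1 - |pole|, pole time in samples, temporal envelope)."""
    n = len(x)
    a, err = levinson(autocorr(dct2(x), 2), 2)
    pole = dominant_pole(a)
    sharpness = 1.0 - abs(pole)                       # small = sharp transient
    time = abs(cmath.phase(pole)) * n / math.pi - 0.5  # pole angle -> time
    return sharpness, time, fdlp_envelope(a, err, n)

N, N0 = 400, 120  # hypothetical window length and transient position
click = [1.0 if t == N0 else 0.0 for t in range(N)]
burst = [math.exp(-0.5 * ((t - N0) / 10.0) ** 2) for t in range(N)]

click_sharp, click_time, click_env = fdlp_features(click)
burst_sharp, burst_time, burst_env = fdlp_features(burst)
print("click: 1-|pole| = %.4f, time = %.1f" % (click_sharp, click_time))
print("burst: 1-|pole| = %.4f, time = %.1f" % (burst_sharp, burst_time))
```

In this toy setting the pole angle recovers where the click sits in the window, and the sharper click yields a pole closer to the unit circle than the smoother burst. A higher-order model would need a general polynomial root finder, which is left out here.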

FDLP preliminary results
• Distribution of pole magnitudes for different phone classes (in 4 bands):
  [Figure: distributions of $-\log(1-|\lambda|)$ for /ah/ and /p/ in four bands: 0-500 Hz, 500-1000 Hz, 1-2 kHz, 2-4 kHz]
• NN Classifier Frame Accuracies:
    plp12N          57.0%
    plp12N+FDLP4    58.4%

Data-derived phonetic features (Columbia)
• Find a set of independent attributes to account for phonetic (lexical) distinctions
  - phones replaced by feature streams
• Will require new pronunciation models
  - asynchronous feature transitions (no phones)
  - mapping from phonetics (for unseen words)
Joint work with Eric Fosler-Lussier

ICA for feature bases
• PCA finds decorrelated bases; ICA finds independent bases
  [Figure: test/dr1/faks0/sa2; basis vectors against frequency / Bark, and coefficients against time / labels (d ow n ae s m iy t ix k eh r iy ix n oy l iy r ae g l ay k dh ae tcl)]
• Lexically-sufficient ICA basis set?

Extra Slides

Speech Fragment Recognition (Columbia)
• Model match for missing features:
  [Figure: Observation Y(f), Source X(f) and Segregation S against freq]
  $P(M,S|Y) = P(M) \int P(X|M) \cdot \frac{P(X|Y,S)}{P(X)}\, dX \cdot P(S|Y)$
  (term labels: joint prob. of model & seg.; likelihood ‘boost’; segregation likelihood)
• .. for partial observations in noise
• .. or integrating partially-seen streams

Missing speech information
• Noise is not our primary concern; casual pronunciation is a big issue
  - not missing Spectral information, but missing Phonetic information
• Can we model this as:
  - ‘missing’ (i.e. non-articulated)
  - ‘features’ (i.e. phonetic-style features) ... ?
• Need to locate information...

Class-dependent information
• Locate information per subword unit
• Mutual Information on time-frequency plane over different phone classes
  [Figure: panels "All phones" and "Vowels only"]
• ±250 ms / 19 Bark, TIMIT phone ctrs

ICA for feature bases
• PCA finds decorrelated bases; ICA finds independent bases
  [Figure: panels "PCA bases" and "ICA bases", plotted against frequency / Bark]
• Find lexically-sufficient ICA basis set?

ICA for feature bases
• ICA coefficients ~ more independent:
  [Figure: test/dr1/faks0/sa2; coefficient tracks against phone labels (d ow n ae s m iy t ix k eh r iy ix n oy l iy r ae g l ay k dh ae tcl)]
• Looking for orthogonal subword features
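
To make the PCA/ICA contrast concrete, here is a toy sketch on synthetic two-dimensional data rather than Bark-spectrum frames. PCA is done as a closed-form 2x2 rotation; ICA is approximated by whitening followed by a brute-force search for the rotation that maximises non-Gaussianity (absolute excess kurtosis). This search is a stand-in chosen for brevity, not necessarily the ICA algorithm behind the slides, and the sources, mixing weights and helper names are all hypothetical.

```python
import math
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    """Mean product, i.e. (co)variance for zero-mean inputs."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def corr(u, v):
    u, v = center(u), center(v)
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def rotate(u, v, angle):
    c, s = math.cos(angle), math.sin(angle)
    return ([c * a + s * b for a, b in zip(u, v)],
            [-s * a + c * b for a, b in zip(u, v)])

def excess_kurtosis(v):
    """Assumes v is zero-mean with unit variance."""
    return sum(a ** 4 for a in v) / len(v) - 3.0

# Two independent (uniform) sources and a non-orthogonal mixture of them.
rng = random.Random(0)
n = 2000
s1 = [rng.uniform(-1, 1) for _ in range(n)]
s2 = [rng.uniform(-1, 1) for _ in range(n)]
x1 = center([a + 0.6 * b for a, b in zip(s1, s2)])
x2 = center([0.4 * a + b for a, b in zip(s1, s2)])

# PCA: rotate so the two components are uncorrelated.
phi = 0.5 * math.atan2(2 * dot(x1, x2), dot(x1, x1) - dot(x2, x2))
p1, p2 = rotate(x1, x2, phi)

# Whitening: scale each principal component to unit variance.
z1 = [a / math.sqrt(dot(p1, p1)) for a in p1]
z2 = [a / math.sqrt(dot(p2, p2)) for a in p2]

# ICA stand-in: pick the rotation (1 degree steps) with the largest
# total absolute excess kurtosis, i.e. the least Gaussian components.
best = max((math.radians(d) for d in range(90)),
           key=lambda t: sum(abs(excess_kurtosis(u)) for u in rotate(z1, z2, t)))
u1, u2 = rotate(z1, z2, best)

def best_match(comp):
    """Strongest absolute correlation of a component with either source."""
    return max(abs(corr(comp, s1)), abs(corr(comp, s2)))

pca_match = [best_match(p1), best_match(p2)]
ica_match = [best_match(u1), best_match(u2)]
print("PCA component/source match:", [round(m, 3) for m in pca_match])
print("ICA component/source match:", [round(m, 3) for m in ica_match])
```

Both sets of components are uncorrelated, but only the rotated (ICA-style) components line up with the individual sources; the PCA components remain mixtures of the two.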