Dan ’ s slides for EARS PI mtg

advertisement
Dan’s slides for EARS PI mtg
• 4 slides on novel features based on
linear predictor coefficients for the
frequency (not time) domain
ÿbasic signal model accepted at ICASSP03
• A couple of slides on a very new idea to
look for data-derived (ICA?)
articulatory-style features
2002-01-17
Temporal envelope features
(Columbia)
• Temporal fine structure is lost
(deliberately) in STFT features:
mpgr1-sx419
0.15
0.1
0.05
0
-0.05
0.65
0.7
0.75
0.8
0.85
0.9
8000
10 ms
windows
0
6000
-20
4000
-40
2000
0
0.65
0.7
0.75
0.8
time / sec
0.85
0.9
-60
dB
• Need a compact, parametric description...
Frequency-Domain
Linear Prediction (FDLP)
• Extend LPC with LP model of spectrum
TD-LP
y[n] = Siaiy[n-i]
FD-LP
Y[k] = SibiY[k-i]
DFT
• ‘Poles’ represent temporal peaks:
mpgr1-sx419: TDLPC env (60 poles / 30 0 ms)
0.1
0.05
0
-0.05
0.65
0.7
0.75
0.8
0.85
0.9
• Features ~ pole bandwidth, ‘frequency’
http://www.ee.columbia.edu/~marios/ctflp/ctflp.html
FDLP features for speech
• LP algorithm distributes fixed pole set
within ~ 200 ms time window
ÿautomatic selection of ‘significant’ times
• Pole bandwidth ª transient sharpness
ÿ1 - max(|li|) in several bands as feature
ÿhelp with classification of stop bursts etc.
• Pole frequency ª timing within window
ÿfn - fn-1 as robust periodicity feature?
FDLP preliminary results
• Distribution of pole magnitudes for
different phone classes (in 4 bands):
0.1
0-500 Hz band
500-1000 Hz band
2-4 kHz band
1-2 kHz band
/ah/
/p/
0.08
0.06
0.04
0.02
0
-2
0
2
4
6
-2
0
2
4
6
-2
0
2
4
6
-2
0
2
-log(1-|l|)
• NN Classifier Frame Accuracies:
plp12N
plp12N+FDLP4
57.0%
58.4%
4
6
Data-derived phonetic
features (Columbia)
• Find a set of independent attributes to
account for phonetic (lexical) distinctions
ÿphones replaced by feature streams
• Will require new pronunciation models
ÿasynchronous feature transitions (no phones)
ÿmapping from phonetics (for unseen words)
Joint work with Eric Fosler-Lussier
ICA for feature bases
• PCA finds decorrelated bases;
ICA finds independent bases
test/dr1/faks0/sa2
15
10
5
Basis vectors
1
8
6
0
4
2
0
0
1
2
3
4
-1
8
2
6
4
1
2
0
0
d ow n ae
s
m iy t ix k eh r iy ix n
oy
l iy r
ae
g l ay k dh ae
tcl
-1
0
5
time / labels
• Lexically-sufficient ICA basis set?
10
15
20
frequency / Bark
Extra Slides
Speech Fragment Recognition
(Columbia)
• Model match for missing features:
Observation
Y(f)
Source
X(f)
Segregation S
freq
P(M,S|Y) = P(M)∫P(X|M)·P(X|Y,S)dX·P(S|Y)
P(X)
joint prob.
segregation
of model & seg.
likelihood
‘boost’
likelihood
• .. for partial observations in noise
• .. or integrating partially-seen streams
Missing speech information
• Noise is not our primary concern;
casual pronunciation is a big issue
ÿnot missing Spectral information,
but missing Phonetic information
• Can we model this as:
ÿ ‘missing’ (i.e. non-articulated)
ÿ ‘features’ (i.e. phonetic-style features) ... ?
• Need to locate information... P(S|Y)
Class-dependent information
• Locate information per subword unit
• Mutual Information on time-frequency
plane over different phone classes
All phones
Vowels only
• ±250ms / 19 bark, TIMIT phone ctrs
ICA for feature bases
• PCA finds decorrelated bases;
ICA finds independent bases
PCA bases
ICA bases
1
3
0.5
2
0
1
-0.5
0
-1
0.5
-1
3
0
1
2
3
4
2
0
1
0
-0.5
0
5
10
15
20
-1
0
5
10
15
20
frequency / Bark
• Find lexically-sufficient ICA basis set?
ICA for feature bases
• ICA coefficients ~ more independent:
test/dr1/faks0/sa2
15
10
5
8
6
4
2
0
8
6
4
2
0
d ow n ae
s
m iy t ix k eh r iy ix n
oy
l iy r
ae
g l ay k dh ae
tcl
labels
• Looking for orthogonal subword features
Download