Integrating CASA with other approaches Dan Ellis

advertisement
Integrating CASA
with other approaches
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu
1. CASA cues
2. CASA in the Speech Fragment Decoder
3. Other approaches
Speech Separation: Integration - Dan Ellis
2004-11-06
1. CASA Cues
• Auditory Scene Analysis
.. is based on ‘grouping cues’
.. which reflect real-world sound characteristics
• Computational ASA features
common onset
harmonicity (common fundamental)
common modulation / fate
spatial cues (ITD/ILD i.e. perceived origin)
continuity
sequential similarity
learned patterns (schema)
Speech Separation: Integration - Dan Ellis
2004-11-06
Grouping cues
freq / Hz
• Onset & Harmonicity: Pierce example
8000
6000
4000
2000
0
0
1
2
3
4
5
6
7
8
9 time / s
• Multiple cues are not always consistent!
Speech Separation: Integration - Dan Ellis
2004-11-06
Pure CASA System (Brown’92)
input
mixture
Front end
signal
features
(maps)
Object
formation
discrete
objects
Grouping
rules
Source
groups
freq
onset
time
• Problems
period
frq.mod
hand-built rules
Harmoncity as principle cue:
how to integrate other factors?
one continuously-voiced utterance
Speech Separation: Integration - Dan Ellis
2004-11-06
M ! = argmax P " M X # = argmax P " X M
M
M
+-%(2%&2*34%//'!3%-'"(*35""/,/*6,-7,,(*
2. CASA within
1 .':-8&,/;*"6/,&<,2*9,%-8&,/*Y=*/,)
P" M #
!
M =Speech
argmax P " MFragment
X # = argmax P " X
M # ! -------------The
Decoder
%44*&,4%-,2*6>*P(
X | Y,S
P " X)*;#
M
M
1
#"2,4/*M*-"*#%-35*/"8&3,*9,%-8&,/*X
Barker, Cooke, Ellis ’04
• Match ‘uncorrupt’
1
.':-8&,/;*"6/,&<,2*9,%-8&,/*Y=*/,)&,)%-'"(*S=*
Observation
%44*&,4%-,2*6>*P( X | Y,S )*;
Y(f )
spectrum to ASR
models using
missing data
•
Observation
Y(f )
Source
X(f )
Source
X(f ) Segregation S
freq
Joint search for model M and freq
segregation S
1 ?"'(-*34%//'!3%-'"(*"9*#"2,4*%(2*/
to maximize:
1 ?"'(-*34%//'!3%-'"(*"9*#"2,4*%(2*/,)&,)%-'"(;
P" X Y $ S #
P" X Y $ S #
P " M $ S Y #P=" M
P "$MS# %YP#" X=M P
# !"-----------------------! P#" S! -----------------------Y#
-d
MP#"%XP# "-XdX M
P
"
X
#
Isolated Source Model
Segregation Model
Segregation S
<$P(X)$#*$&*#521$=*#(0"#0$>
<$P(X)$#*$&*#521$=*#(0"#0$>
!"#$%&&'(
)*+#,-$.'/0+12($3$42"1#'#5
Speech
Separation: Integration
- Dan Ellis
!"#$%&&'(
2004-11-06
67789::976$9$::$;$68
)*+#,-$.'/0+12($3$42"1#'#5
67789
Segregation S
1
freq
Using CASA cues
?"'(-*34%//'!3%-'"(*"9*#"2,4*%(2*/,)&,)%-'"(;
P" X Y $ S #
!"#$%&'()(&*+,-./+"
P " M $ S Y # = P " M # % P " X M # ! ------------------------- dX ! P " S Y #
P" X #
0 P(S|Y)&1#$2"&,34."-#3&#$*4/5,-#4$&-4&"+%/+%,-#4$
• CASA helps search
9<$P(X)$#*$&*#521$=*#(0"#0$>
'($0<'($(25125"0'*#$=*10<$>*#(',21'#5?
9 <*=$&'@2&A$'($'0?
consider
only segregations
made from
CASA
!"#$%&&'(
)*+#,-$.'/0+12($3$42"1#'#5
67789::976$9$::$;$68
0 67+$#$%&*4/&'()(8"-91+&143,1&*+,-./+"
chunks 9 B21'*,'>'0A;<"1C*#'>'0AD
• CASA rates segregation
E12F+2#>A$G"#,($G2&*#5$0*520<21
9 *#(20;>*#0'#+'0AD
construct0'C29E12F+2#>A$125'*#$C+(0$G2$=<*&2
P(S|Y) to reward CASA qualities:
Frequency Proximity
Common Onset
Speech Separation: Integration - Dan Ellis
!"#$%&&'(
)*+#,-$.'/0+12($3$42"1#'#5
Harmonicity
67789::976$9$:8$;$68
2004-11-06
3. CASA + Other Approaches
• Put CASA in the control for any engine?
• CASA + ICA?
Te-Won’s 5-year project
• CASA + Clustering (Francis Bach)
optimize ‘affinity’ based on auditory+... cues
• Nonnegative parts?
parts: local cues: local
Speech Separation: Integration - Dan Ellis
2004-11-06
Can Machine Learning Subsume CASA?
• ASA grouping cues describe real sounds
.. “anecdotally”
• Machine Learning is another way to find
regularities in large datasets
can, e.g., Roweis templates subsume harmonicity,
onset, etc.?
... and handle schema at the same time?
“cut out the (grouping cue) middleman”
• Trick is how to represent/generalize
listeners can organize novel sounds
Speech Separation: Integration - Dan Ellis
2004-11-06
Conclusions
• CASA cues:
describe attributes of real-world sources
• Integrating CASA:
contribute to a ‘reasonableness’ metric (co.theory)
help constrain the search (algorithm)
• Integration in general:
how much ‘different’ information is available?
Session: Is there integration? Can there be?
Speech Separation: Integration - Dan Ellis
2004-11-06
Download