Integrating CASA with other approaches Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu 1. CASA cues 2. CASA in the Speech Fragment Decoder 3. Other approaches Speech Separation: Integration - Dan Ellis 2004-11-06 1. CASA Cues • Auditory Scene Analysis .. is based on ‘grouping cues’ .. which reflect real-world sound characteristics • Computational ASA features common onset harmonicity (common fundamental) common modulation / fate spatial cues (ITD/ILD i.e. perceived origin) continuity sequential similarity learned patterns (schema) Speech Separation: Integration - Dan Ellis 2004-11-06 Grouping cues freq / Hz • Onset & Harmonicity: Pierce example 8000 6000 4000 2000 0 0 1 2 3 4 5 6 7 8 9 time / s • Multiple cues are not always consistent! Speech Separation: Integration - Dan Ellis 2004-11-06 Pure CASA System (Brown’92) input mixture Front end signal features (maps) Object formation discrete objects Grouping rules Source groups freq onset time • Problems period frq.mod hand-built rules Harmoncity as principle cue: how to integrate other factors? one continuously-voiced utterance Speech Separation: Integration - Dan Ellis 2004-11-06 M ! = argmax P " M X # = argmax P " X M M M +-%(2%&2*34%//'!3%-'"(*35""/,/*6,-7,,(* 2. CASA within 1 .':-8&,/;*"6/,&<,2*9,%-8&,/*Y=*/,) P" M # ! M =Speech argmax P " MFragment X # = argmax P " X M # ! -------------The Decoder %44*&,4%-,2*6>*P( X | Y,S P " X)*;# M M 1 #"2,4/*M*-"*#%-35*/"8&3,*9,%-8&,/*X Barker, Cooke, Ellis ’04 • Match ‘uncorrupt’ 1 .':-8&,/;*"6/,&<,2*9,%-8&,/*Y=*/,)&,)%-'"(*S=* Observation %44*&,4%-,2*6>*P( X | Y,S )*; Y(f ) spectrum to ASR models using missing data • Observation Y(f ) Source X(f ) Source X(f ) Segregation S freq Joint search for model M and freq segregation S 1 ?"'(-*34%//'!3%-'"(*"9*#"2,4*%(2*/ to maximize: 1 ?"'(-*34%//'!3%-'"(*"9*#"2,4*%(2*/,)&,)%-'"(; P" X Y $ S # P" X Y $ S # P " M $ S Y #P=" M P "$MS# %YP#" X=M P # !"-----------------------! P#" S! -----------------------Y# -d MP#"%XP# "-XdX M P " X # Isolated Source Model Segregation Model Segregation S <$P(X)$#*$&*#521$=*#(0"#0$> <$P(X)$#*$&*#521$=*#(0"#0$> !"#$%&&'( )*+#,-$.'/0+12($3$42"1#'#5 Speech Separation: Integration - Dan Ellis !"#$%&&'( 2004-11-06 67789::976$9$::$;$68 )*+#,-$.'/0+12($3$42"1#'#5 67789 Segregation S 1 freq Using CASA cues ?"'(-*34%//'!3%-'"(*"9*#"2,4*%(2*/,)&,)%-'"(; P" X Y $ S # !"#$%&'()(&*+,-./+" P " M $ S Y # = P " M # % P " X M # ! ------------------------- dX ! P " S Y # P" X # 0 P(S|Y)&1#$2"&,34."-#3&#$*4/5,-#4$&-4&"+%/+%,-#4$ • CASA helps search 9<$P(X)$#*$&*#521$=*#(0"#0$> '($0<'($(25125"0'*#$=*10<$>*#(',21'#5? 9 <*=$&'@2&A$'($'0? consider only segregations made from CASA !"#$%&&'( )*+#,-$.'/0+12($3$42"1#'#5 67789::976$9$::$;$68 0 67+$#$%&*4/&'()(8"-91+&143,1&*+,-./+" chunks 9 B21'*,'>'0A;<"1C*#'>'0AD • CASA rates segregation E12F+2#>A$G"#,($G2&*#5$0*520<21 9 *#(20;>*#0'#+'0AD construct0'C29E12F+2#>A$125'*#$C+(0$G2$=<*&2 P(S|Y) to reward CASA qualities: Frequency Proximity Common Onset Speech Separation: Integration - Dan Ellis !"#$%&&'( )*+#,-$.'/0+12($3$42"1#'#5 Harmonicity 67789::976$9$:8$;$68 2004-11-06 3. CASA + Other Approaches • Put CASA in the control for any engine? • CASA + ICA? Te-Won’s 5-year project • CASA + Clustering (Francis Bach) optimize ‘affinity’ based on auditory+... cues • Nonnegative parts? parts: local cues: local Speech Separation: Integration - Dan Ellis 2004-11-06 Can Machine Learning Subsume CASA? • ASA grouping cues describe real sounds .. “anecdotally” • Machine Learning is another way to find regularities in large datasets can, e.g., Roweis templates subsume harmonicity, onset, etc.? ... and handle schema at the same time? “cut out the (grouping cue) middleman” • Trick is how to represent/generalize listeners can organize novel sounds Speech Separation: Integration - Dan Ellis 2004-11-06 Conclusions • CASA cues: describe attributes of real-world sources • Integrating CASA: contribute to a ‘reasonableness’ metric (co.theory) help constrain the search (algorithm) • Integration in general: how much ‘different’ information is available? Session: Is there integration? Can there be? Speech Separation: Integration - Dan Ellis 2004-11-06