EE E6820: Speech & Audio Processing & Recognition
Lecture 10: Signal Separation
Dan Ellis <dpwe@ee.columbia.edu>
Michael Mandel <mim@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering
http://www.ee.columbia.edu/~dpwe/e6820
April 14, 2009
E6820 (Ellis & Mandel) L10: Signal separation April 14, 2009 1 / 45

Outline
1 Sound mixture organization
2 Computational auditory scene analysis
3 Independent component analysis
4 Model-based separation

Sound Mixture Organization

[Figure: spectrogram of a 12 s scene (0-4000 Hz) analyzed into labeled sources: Voice (evil), Voice (pleasant), Stab, Rumble, Choir, Strings]

Auditory Scene Analysis: describing a complex sound in terms of high-level sources / events
  . . . like listeners do
Hearing is ecologically grounded
- reflects 'natural scene' properties
- subjective, not absolute

Sound, mixtures, and learning

[Figure: spectrograms of Speech, Noise, and Speech + Noise]

Sound
- carries useful information about the world
- complements vision
Mixtures . . . are the rule, not the exception
- medium is 'transparent', sources are many
- must be handled!
Learning
- the 'speech recognition' lesson: let the data do the work
- like listeners

The problem with recognizing mixtures

"Imagine two narrow channels dug up from the edge of a lake, with handkerchiefs stretched across each one. Looking only at the motion of the handkerchiefs, you are to answer questions such as: How many boats are there on the lake and where are they?" (after Bregman, 1990)

Received waveform is a mixture
- two sensors, N signals . . .
  underconstrained

Disentangling mixtures as the primary goal?
- a perfect solution is not possible
- need experience-based constraints

Approaches to sound mixture recognition

Separate signals, then recognize
  e.g. Computational Auditory Scene Analysis (CASA), Independent Component Analysis (ICA)
- nice, if you can do it
Recognize the combined signal
- 'multicondition training'
- combinatorics. . .
Recognize with parallel models
- full joint-state space?
- divide the signal into fragments, then use missing-data recognition

What is the goal of sound mixture analysis?

Separate signals?
- output is unmixed waveforms
- underconstrained, very hard . . . too hard? not required?
Source classification?
- output is a set of event names
- listeners do more than this. . .
Something in between? Identify independent sources + their characteristics
- standard task, results?

Segregation vs. Inference

Source separation requires attribute separation
- sources are characterized by attributes (pitch, loudness, timbre, and finer details)
- need to identify and gather different attributes for different sources. . .
Need a representation that segregates attributes
- spectral decomposition
- periodicity decomposition
Sometimes values can't be separated, e.g. unvoiced speech
- maybe infer the factors from a probabilistic model? p(O, x, y) → p(x, y | O)
- or: just skip those values & infer from higher-level context

Outline: 2 Computational auditory scene analysis

Auditory Scene Analysis (Bregman, 1990)

How do people analyze sound mixtures?
- break the mixture into small elements (in time-frequency)
- elements are grouped into sources using cues
- sources have aggregate attributes

Grouping 'rules' (Darwin and Carlyon, 1995)
- cues: common onset/offset/modulation, harmonicity, spatial location, . . .

[Figure: sound → frequency analysis → elements → onset / harmonicity / spatial maps → grouping mechanism → sources → source properties]

Cues to simultaneous grouping

[Figure: spectrogram of elements + attributes, 0-8000 Hz over 9 s]

Common onset
- simultaneous energy has a common source
Periodicity
- energy in different bands with the same cycle
Other cues
- spatial (ITD/IID), familiarity, . . .

The effect of context

Context can create an 'expectation', i.e. a bias towards a particular interpretation
- a change in a signal will be interpreted as an added source whenever possible

e.g. Bregman's "old-plus-new" principle:

[Figure: the same spectral energy divided differently depending on what preceded it]

Computational Auditory Scene Analysis (CASA)

[Figure: sound mixture → object 1 / object 2 / object 3 percepts]

Goal: automatic sound organization
- systems to 'pick out' sounds in a mixture . . . like people do
  e.g. a voice against a noisy background
- to improve speech recognition
Approach: psychoacoustics describes grouping 'rules'
  . . . just implement them?
CASA front-end processing

Correlogram: loosely based on known/possible physiology

[Figure: sound → cochlea filterbank → per-channel envelope follower → short-time autocorrelation → correlogram volume (frequency × lag × time)]

- linear filterbank as a cochlear approximation
- static nonlinearity
- the zero-delay slice is like a spectrogram
- periodicity from delay-and-multiply detectors

Bottom-Up Approach (Brown and Cooke, 1994)

Implement psychoacoustic theory

[Figure: input mixture → front end (signal feature maps: frequency, onset time, period, frequency modulation) → object formation (discrete objects) → grouping rules → source groups]

- left-to-right processing
- uses common onset & periodicity cues

Able to extract voiced speech

[Figure: spectrograms of utterance v3n7, the mixture, and the extracted voice (Hu & Wang 02)]

Problems with 'bottom-up' CASA

Circumscribing time-frequency elements
- need to have 'regions', but they are hard to find
Periodicity is the primary cue
- how to handle aperiodic energy?
Resynthesis via masked filtering
- cannot separate within a single t-f element
Bottom-up leaves no ambiguity or context
- how to model illusions?
Restoration in sound perception

Auditory 'illusions' = hearing what's not there

The continuity illusion & Sinewave Speech (SWS)

[Figure: spectrograms illustrating the continuity illusion and sinewave speech]

- duplex perception

What kind of model accounts for this?
- is it an important part of hearing?

Adding top-down constraints: Prediction-Driven CASA

Perception is not direct, but a search for plausible hypotheses

Data-driven (bottom-up):
  input mixture → front end (signal features) → object formation (discrete objects) → grouping rules → source groups
- objects irresistibly appear

vs. prediction-driven (top-down):
  input mixture → front end → compare & reconcile → hypothesis management (noise components, periodic components) → predict & combine → predicted features, fed back as prediction errors
- match observations with a 'world-model'
- need world-model constraints. . .

Generic sound elements for PDCASA (Ellis, 1996)

Goal is a representational space that
- covers real-world perceptual sounds
- has a minimal parameterization (sparseness)
- keeps separate attributes in separate parameters

Three element types (magnitude models over time frame n and frequency channel k):
- NoiseObject: spectral profile H[k] times a magnitude contour A[n]: X_N[n,k] = H[k]·A[n]
- ClickObject: profile H[k] with per-channel decay times T[k] from onset n0: X_C[n,k] = H[k]·exp(−(n − n0) / T[k])
- WeftObject: time-varying envelope H_W[n,k] times a periodic excitation P[n,k] (from correlogram/periodogram analysis): X_W[n,k] = H_W[n,k]·P[n,k]

Object hierarchies built on top. . .
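The three element models above can be written out directly. A minimal numpy sketch; the array sizes, profiles, and decay times are arbitrary illustrations, not values from the thesis:

```python
import numpy as np

N_FRAMES, N_CHANS = 50, 32   # illustrative sizes only

def noise_object(H, A):
    """NoiseObject: fixed spectral profile H[k] scaled by a slowly
    varying magnitude contour A[n]:  X_N[n,k] = H[k] * A[n]."""
    return np.outer(A, H)

def click_object(H, T, n0, n_frames=N_FRAMES):
    """ClickObject: onset at frame n0, then per-channel exponential
    decay with time constants T[k] (sign chosen so the element decays):
    X_C[n,k] = H[k] * exp(-(n - n0) / T[k])."""
    n = np.arange(n_frames)[:, None] - n0
    X = H[None, :] * np.exp(-n / T[None, :])
    X[n[:, 0] < 0, :] = 0.0   # nothing before the onset
    return X

def weft_object(Hw, P):
    """WeftObject: time-varying spectral envelope times a periodic
    excitation from the correlogram:  X_W[n,k] = H_W[n,k] * P[n,k]."""
    return Hw * P

H = np.exp(-np.arange(N_CHANS) / 10.0)              # smooth spectral profile
X_noise = noise_object(H, np.linspace(1.0, 0.5, N_FRAMES))
X_click = click_object(H, T=5.0 + np.arange(N_CHANS), n0=10)
print(X_noise.shape, X_click.shape)
```

Each element has only a handful of parameters (a profile, a contour, decay times), which is exactly the sparseness the representation is after.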
PDCASA for old-plus-new

Incremental analysis of the input signal:
- time t1: initial element created
- time t2: additional element required
- time t3: second element finished

PDCASA for the continuity illusion

Subjects hear the tone as continuous
  . . . if the noise is a plausible masker

[Figure: tone bursts alternating with noise bursts, 0-4000 Hz over 1.4 s]

Data-driven analysis gives just the visible portions; prediction-driven analysis can infer the masking.

Prediction-Driven CASA

Explain a complex sound with basic elements

[Figure: a 9 s 'City' recording decomposed into weft, noise, and click elements, with listener agreement ratings: Horn1 (10/10), Horn2 (5/10), Horn3 (5/10), Horn4 (8/10), Horn5 (10/10), Crash (10/10), Squeal (6/10), Truck (7/10)]

Aside: Ground Truth

What do people hear in sound mixtures?
- do interpretations match?
→ listening tests to collect 'perceived events'

Aside: Evaluation

Evaluation is a big problem for CASA
- what is the goal, really?
- what is a good test domain?
- how do you measure performance?
SNR improvement
- tricky to derive from before/after signals: the correspondence problem
- can do with a fixed filtering mask
- differentiate removing signal from adding noise
Speech recognition (ASR) improvement
- recognizers are often sensitive to artifacts
'Real' task?
- a mixture corpus with specific sound events. . .
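The 'fixed filtering mask' idea for SNR measurement can be made concrete. A toy numpy sketch, with entirely synthetic magnitudes and an oracle binary mask (my illustration, not the course's evaluation code): applying one fixed mask to the known target and noise components separately sidesteps the correspondence problem, and also lets 'removing signal' be measured apart from 'adding noise'.

```python
import numpy as np

def snr_db(target, error):
    """SNR in dB between retained target energy and error energy."""
    return 10 * np.log10(np.sum(target ** 2) / np.sum(error ** 2))

rng = np.random.default_rng(0)
S = np.abs(rng.normal(size=(64, 100)))    # known target t-f magnitudes
Nz = np.abs(rng.normal(size=(64, 100)))   # known noise t-f magnitudes
mask = (S > Nz).astype(float)             # oracle binary t-f mask

# The SAME fixed mask applied to target and noise separately, so the
# before/after comparison is unambiguous:
snr_before = snr_db(S, Nz)
snr_after = snr_db(mask * S, mask * Nz)

# ...and the two kinds of damage can be reported separately:
residual_noise = np.sum((mask * Nz) ** 2)    # noise the mask let through
lost_target = np.sum(((1 - mask) * S) ** 2)  # target the mask deleted
print(round(snr_after - snr_before, 1), "dB improvement")
```

With real separations the mask would come from the system under test, and the ground-truth S and Nz from the pre-mixing components.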
Outline: 3 Independent component analysis

Independent Component Analysis (ICA) (Bell and Sejnowski, 1995, etc.)

If mixing is like matrix multiplication, then separation is a search for the inverse matrix:

  [x1; x2] = [a11 a12; a21 a22] [s1; s2]
  [ŝ1; ŝ2] = [w11 w12; w21 w22] [x1; x2],   with wij adapted by −δMutInfo/δwij

i.e. W ≈ A⁻¹
- with N different versions of the mixed signals (microphones), we can find N different input contributions (sources)
- how to rate the quality of the outputs? i.e. when do the outputs look separate?

Gaussianity, Kurtosis, & Independence

A signal can be characterized by its PDF p(x)
- i.e. as if successive time values are drawn from a random variable (RV)
- the Gaussian PDF is the 'least interesting'
- sums of independent RVs (PDFs convolved) tend to a Gaussian PDF (central limit theorem)

Measures of deviation from Gaussianity: the 4th moment is kurtosis ("bulging"):

  kurt(y) = E[((y − µ)/σ)⁴] − 3

[Figure: example PDFs: exp(−|x|), kurt = 2.5; Gaussian, kurt = 0; exp(−x⁴), kurt = −1]

- the kurtosis of a Gaussian is zero (with this definition)
- 'heavy tails' → kurt > 0
- closer to a uniform distribution → kurt < 0
- directly related to the KL divergence from a Gaussian PDF

Independence in Mixtures

Scatter plots & kurtosis values

[Figure: scatter plots of s1 vs. s2 and of x1 vs. x2; kurtosis of p(s1) = 27.90 and p(s2) = 53.85, but of the mixtures p(x1) = 18.50 and p(x2) = 27.90: mixing moves the distributions toward Gaussian]

Finding Independent Components

Sums of independent RVs are more Gaussian
→ minimize Gaussianity to undo the sums, i.e.
  search over the wij terms of the inverse matrix

[Figure: scatter plot of the mixture, and kurtosis as a function of unmixing angle θ, with maxima at the source directions]

Solve by gradient descent or Newton-Raphson:

  w⁺ = E[x g(wᵀx)] − E[g′(wᵀx)] w,   w = w⁺ / ‖w⁺‖

"Fast ICA" (Hyvärinen and Oja, 2000)
http://www.cis.hut.fi/projects/ica/fastica/

Limitations of ICA

Assumes instantaneous mixing
- real-world mixtures have delays & reflections
- STFT domain? x1(t) = a11(t) ∗ s1(t) + a12(t) ∗ s2(t) ⇒ X1(ω) = A11(ω)S1(ω) + A12(ω)S2(ω)
- solve the ω subbands separately, then match up the answers
Searching for the best possible inverse matrix
- cannot find more than N outputs from N inputs
- but: "projection pursuit" ideas + time-frequency masking. . .
Cancellation is inherently fragile
- ŝ1 = w11 x1 + w12 x2 relies on cancelling out s2
- sensitive to noise in the x channels
- time-varying mixtures are a problem

Outline: 4 Model-based separation

Model-Based Separation: HMM decomposition (e.g.
Varga and Moore, 1990; Gales and Young, 1993)

Independent state sequences for two or more component source models

[Figure: two HMM state sequences (model 1, model 2) jointly explaining one observation sequence over time]

New combined state space q′ = q1 × q2
- need pdfs for the combinations, p(X | q1, q2)

One-channel Separation: Masked Filtering

Multichannel → ICA: inverse filter and cancel the interference

[Figure: mix of target s and interference n → processing → estimates ŝ and n̂]

One channel: find a time-frequency mask

[Figure: mix s+n → STFT → spectrogram → mask → ISTFT → ŝ]

Cannot remove overlapping noise within t-f cells, but surprisingly effective (psychoacoustic masking?)

[Figure: noise-mix and mask-resynthesis spectrograms at 0, −12, and −24 dB SNR]

"One microphone source separation" (Roweis, 2001)

State sequences → t-f estimates → mask

[Figure: spectrograms of the original voices, the mixture, the state means, and the resynthesis masks for speakers 1 and 2]

- 1000 states/model (→ 10⁶ transition probabilities)
- simplify by subbands (coupled HMM)?

Speech Fragment Recognition (Barker et al., 2005)

Signal separation is too hard! Instead:
- segregate the features into partially-observed sources
- then classify

Made possible by missing-data recognition
- integrate over the uncertainty in the observations for a true posterior distribution

Goal: relate clean speech models p(X | M) to speech-plus-noise mixture observations
  . . . and make it tractable

Missing Data Recognition

Speech models p(x | m) are multidimensional. . .
- i.e. means and variances for every freq.
channel
- need values for all dimensions to get p(·)

But: can evaluate over a subset of dimensions xk:

  p(xk | m) = ∫ p(xk, xu | m) dxu

e.g., with independent (diagonal-covariance) dimensions,

  P(x | q) = P(x1 | q) · P(x2 | q) · P(x3 | q) · P(x4 | q) · P(x5 | q) · P(x6 | q)

and the missing dimensions simply drop out of the product.

Hence, missing-data recognition:
- evaluate against only the 'present' data mask
- the hard part is finding the mask (segregation)

Missing Data Results

Estimate the static background noise level N(f); cells with energy close to the background are considered "missing"

[Figure: "1754" + factory noise → SNR mask → missing-data classifier → "1754"]

[Figure: digit recognition accuracy vs. SNR for a priori masks, missing-data masks, and an MFCC+CMN baseline]

- must use spectral features!
But: nonstationary noise → spurious mask bits
- can we try removing parts of the mask?
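The marginalization behind missing-data recognition is especially simple for the diagonal-covariance Gaussians used above: integrating p(xk, xu | m) over the missing dimensions xu just drops their factors from the product. A minimal sketch with hypothetical model values:

```python
import math

def log_lik_present(x, present, mean, var):
    """Diagonal-Gaussian log-likelihood over only the 'present'
    dimensions x_k: marginalizing the missing dims x_u of a diagonal
    Gaussian leaves the product over the observed dims."""
    ll = 0.0
    for xi, p, mu, v in zip(x, present, mean, var):
        if p:  # dimension marked reliable by the mask
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return ll

mean = [0.0, 1.0, 2.0, 3.0]    # model m: per-channel means (illustrative)
var = [1.0, 1.0, 1.0, 1.0]     # ... and variances
x = [0.0, 1.0, 99.0, 3.0]      # channel 2 swamped by noise

full = log_lik_present(x, [1, 1, 1, 1], mean, var)
masked = log_lik_present(x, [1, 1, 0, 1], mean, var)
print(full < masked)           # dropping the corrupted channel restores the match
```

This is why the mask matters so much: with the corrupted channel marked missing, the clean-speech model scores the noisy observation almost as well as clean data.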
Comparing different segregations

Standard classification chooses between models M to match the source features X:

  M* = argmax_M p(M | X) = argmax_M p(X | M) p(M)

Mixtures: observed features Y and segregation S, all related by p(X | Y, S)

[Figure: observation Y(f), source X(f), and segregation mask S across frequency]

Joint classification of model and segregation:

  p(M, S | Y) = p(M) ∫ p(X | M) · (p(X | Y, S) / p(X)) dX · p(S | Y)

- p(X) is no longer constant

Calculating fragment matches

  p(M, S | Y) = p(M) ∫ p(X | M) · (p(X | Y, S) / p(X)) dX · p(S | Y)

- p(X | M): the clean-signal feature model
- p(X | Y, S) / p(X): is X 'visible' given the segregation? Integration collapses some bands. . .
- p(S | Y): segregation inferred from the observation
  - just assume uniform, and find the S for the most likely M
  - or: use extra information in Y to distinguish different Ss. . .

Result:
- a probabilistically correct relation between clean-source models p(X | M)
- and an inferred, recognized source + segregation p(M, S | Y)

Using CASA features

p(S | Y) links acoustic information to segregation
- is this segregation worth considering? how likely is it?
Opportunity for CASA-style information to contribute
- periodicity/harmonicity: these different frequency bands belong together
- onset/continuity: this time-frequency region must be whole

[Figure: frequency proximity, common onset, and harmonicity cues]

Fragment decoding

Limiting S to whole fragments makes the hypothesis search tractable:

[Figure: time-frequency fragments and the word hypotheses w1-w8 they support over times T1-T6]

- the choice of fragments reflects p(S | Y) p(X | M), i.e. the best combination of segregation and match to the speech models

Merging hypotheses limits space demands . . .
  but erases the specific history

Speech fragment decoder results

A simple p(S | Y) model forces contiguous regions to stay together
- big efficiency gain when searching the S space

[Figure: "1754" + noise → SNR mask → fragments → fragment decoder → "1754"]

Clean-models-based recognition rivals trained-in-noise recognition

Multi-source decoding

Search for more than one source

[Figure: two state sequences q1(t), q2(t) with masks S1(t), S2(t) jointly explaining the observations Y(t)]

Mutually-dependent data masks
- disjoint subsets of cells for each source
- each model match p(Mx | Sx, Y) is independent
- the masks are mutually dependent: p(S1, S2 | Y)
Huge practical advantage over a full search

Summary

Auditory Scene Analysis:
- hearing: partially understood, very successful
Independent Component Analysis:
- simple and powerful, some practical limits
Model-based separation:
- real-world constraints, implementation tricks

Parting thought: mixture separation is the main obstacle in many applications, e.g. soundtrack recognition

References

Albert S. Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press, 1990. ISBN 0262521954.
C. J. Darwin and R. P. Carlyon. Auditory grouping. In B. C. J. Moore, editor, The Handbook of Perception and Cognition, Vol. 6, Hearing, pages 387–424. Academic Press, 1995.
G. J. Brown and M. P. Cooke. Computational auditory scene analysis. Computer Speech and Language, 8:297–336, 1994.
D. P. W. Ellis. Prediction-driven computational auditory scene analysis. PhD thesis, Department of Electrical Engineering and Computer Science, M.I.T., 1996.
Anthony J. Bell and Terrence J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, 1995. ftp://ftp.cnl.salk.edu/pub/tony/bell.blind.ps.Z.
A.
Hyvärinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000. http://www.cis.hut.fi/aapo/papers/IJCNN99_tutorialweb/.
A. Varga and R. Moore. Hidden Markov model decomposition of speech and noise. In Proceedings of the 1990 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 845–848, 1990.
M. J. F. Gales and S. J. Young. HMM recognition in noise using parallel model combination. In Proc. Eurospeech-93, volume 2, pages 837–840, 1993.
S. Roweis. One-microphone source separation. In NIPS, volume 11, pages 609–616. MIT Press, Cambridge, MA, 2001.
Jon Barker, Martin Cooke, and Daniel P. W. Ellis. Decoding speech in the presence of other sources. Speech Communication, 45(1):5–25, 2005. http://www.ee.columbia.edu/~dpwe/pubs/BarkCE05-sfd.pdf.