SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS
Emad M. Grais, Hakan Erdogan
17th International Conference on Digital Signal Processing, 2011
Jain-De, Lee

Outline
– INTRODUCTION
– NON-NEGATIVE MATRIX FACTORIZATION
– SIGNAL SEPARATION AND MASKING
– EXPERIMENTS AND DISCUSSION
– CONCLUSION

Introduction
There are two main stages in this work
– Training stage
– Separation stage
NMF is used with different types of masks to improve the separation process
– The separation process becomes faster
– NMF needs fewer iterations

Problem formulation
– We observe a signal x(t), which is the mixture of two sources s(t) and m(t)
$X(t,f) = S(t,f) + M(t,f)$
$|X(t,f)| e^{j\phi_X(t,f)} = |S(t,f)| e^{j\phi_S(t,f)} + |M(t,f)| e^{j\phi_M(t,f)}$
where X(t,f) is the STFT of x(t)
– Assume the sources have the same phase angle as the mixed signal, so the magnitude spectrograms satisfy
$X = S + M$

Non-negative Matrix Factorization
Non-negative matrix factorization algorithm
$[V]_{n \times m} \approx [B]_{n \times d} [W]_{d \times m}$
Minimization problem
$\min_{B,W} C(V, BW)$ subject to the elements of $B, W \geq 0$
Different cost functions C of NMF
– Euclidean distance
– KL divergence

Euclidean distance cost function
$\min_{B,W} C(V, BW)$, where $C(V, BW) = \sum_{i,j} \left( V_{i,j} - (BW)_{i,j} \right)^2$
KL divergence cost function
$\min_{B,W} C(V, BW)$, where $C(V, BW) = \sum_{i,j} \left( V_{i,j} \log \frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$
Multiplicative Update Algorithm (for the KL divergence cost)
$W \leftarrow W \otimes \frac{B^T \frac{V}{BW}}{B^T \mathbf{1}}$
$B \leftarrow B \otimes \frac{\frac{V}{BW} W^T}{\mathbf{1} W^T}$
where $\mathbf{1}$ is a matrix of ones the same size as V, $\otimes$ is element-wise multiplication, and the divisions are element-wise

The magnitude spectrograms of the speech and music training data are factorized by NMF
$S_{train} \approx B_{speech} W_{speech}$
$M_{train} \approx B_{music} W_{music}$
Larger number of basis vectors
– Lower approximation error
– Redundant set of basis vectors
– Requires more computation time

Signal Separation and Masking
NMF is used to decompose the magnitude spectrogram matrix X of the mixture
$X \approx [B_{speech} \; B_{music}] W$
The initial spectrogram estimates for the speech and music signals are respectively calculated as follows
$\tilde{S} = B_{speech} W_S$
$\tilde{M} = B_{music} W_M$
where $W_S$ and $W_M$ are submatrices of the matrix W

Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows
$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$
Source signals reconstruction
$\hat{S} = H \otimes X$
$\hat{M} = (\mathbf{1} - H) \otimes X$
where $\mathbf{1}$ is a matrix of ones and $\otimes$ is element-wise multiplication

Two specific values of p correspond to special masks
– Wiener filter (soft mask), p = 2
$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$
– Hard mask, p = \infty
$H_{hard} = \mathrm{round}\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right)$

Figure: The value of the mask versus the linear ratio for different values of p

Experiments and Discussion
Simulation
– 16 kHz sampling rate
– Speech
  • Training speech data: 540 short utterances
  • Testing speech data: 20 utterances
– Music
  • 38 pieces for training
  • 1 piece for testing
– Hamming window: 512 points
– FFT size: 512 points

Performance measurement of the separation

Conclusion
– The family of masks has a parameter that controls the saturation level
– The proposed algorithm gives better results and helps speed up the separation process
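
The training and separation stages described in the slides can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The function names (`nmf_kl`, `spectral_mask`, `hard_mask`, `separate`), the random initialization, the small constant `EPS`, the iteration counts, and the choice to keep the trained bases fixed during separation are all assumptions made for this example. The STFT and the resynthesis with the mixture phase are left out, so the inputs are assumed to be nonnegative magnitude matrices.

```python
import numpy as np

EPS = 1e-12  # assumption: small constant to avoid division by zero


def nmf_kl(V, d, n_iter=100, B=None, seed=0):
    """Approximate V (n x m) by B (n x d) @ W (d x m) with the
    multiplicative updates for the KL divergence cost.

    If B is supplied it is kept fixed and only W is updated
    (d is then taken from B).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    update_B = B is None
    if update_B:
        B = rng.random((n, d)) + EPS
    W = rng.random((B.shape[1], m)) + EPS
    ones = np.ones((n, m))
    for _ in range(n_iter):
        # W <- W * (B^T (V / BW)) / (B^T 1)
        W *= (B.T @ (V / (B @ W + EPS))) / (B.T @ ones + EPS)
        if update_B:
            # B <- B * ((V / BW) W^T) / (1 W^T)
            B *= ((V / (B @ W + EPS)) @ W.T) / (ones @ W.T + EPS)
    return B, W


def spectral_mask(S_tilde, M_tilde, p=2.0):
    """H = S^p / (S^p + M^p), element-wise; p = 2 is the Wiener-type mask."""
    Sp = S_tilde ** p
    Mp = M_tilde ** p
    return Sp / (Sp + Mp + EPS)


def hard_mask(S_tilde, M_tilde):
    """Binary mask obtained by rounding the p = 2 mask to 0 or 1."""
    return np.round(spectral_mask(S_tilde, M_tilde, p=2.0))


def separate(X, B_speech, B_music, p=2.0, n_iter=50):
    """Decompose the mixture X on the stacked bases, then mask it."""
    B = np.hstack([B_speech, B_music])
    _, W = nmf_kl(X, B.shape[1], n_iter=n_iter, B=B)
    d_s = B_speech.shape[1]
    S_tilde = B_speech @ W[:d_s]   # initial speech estimate
    M_tilde = B_music @ W[d_s:]    # initial music estimate
    H = spectral_mask(S_tilde, M_tilde, p)
    return H * X, (1.0 - H) * X    # S_hat, M_hat
```

In use, `nmf_kl` would be run once on each training spectrogram to obtain `B_speech` and `B_music`, and `separate` would then be called on the mixture spectrogram. Because the two masks `H` and `1 - H` sum to one, the two estimates always add up to the mixture magnitude `X`.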