single channel speech music separation using nonnegative matrix

SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS

Emad M. Grais, Hakan Erdogan
17th International Conference on Digital Signal Processing, 2011

Jain-De Lee

Outline
- Introduction
- Non-negative Matrix Factorization
- Signal Separation and Masking
- Experiments and Discussion
- Conclusion

Introduction
- The work has two main stages:
  - Training stage
  - Separation stage
- NMF is combined with different types of masks to improve the separation process:
  - The separation process is faster
  - NMF needs fewer iterations

Introduction: problem formulation
- We observe a signal x(t), which is the mixture of two sources s(t) and m(t). In the STFT domain:

$$X(t,f) = S(t,f) + M(t,f)$$

$$|X(t,f)|\, e^{j\phi_X(t,f)} = |S(t,f)|\, e^{j\phi_S(t,f)} + |M(t,f)|\, e^{j\phi_M(t,f)}$$

  where X(t,f) is the STFT of x(t).
- Assume the sources have the same phase angle as the mixture. The magnitude spectrograms then satisfy

$$X = S + M$$

Non-negative Matrix Factorization
- The NMF algorithm approximates a non-negative matrix V by the product of two non-negative matrices:

$$[V]_{n \times m} \approx [B]_{n \times d}\,[W]_{d \times m}$$

- Minimization problem:

$$\min_{B,W} C(V, BW) \quad \text{subject to all elements of } B, W \ge 0$$

- Different cost functions C for NMF:
  - Euclidean distance
  - KL divergence

Non-negative Matrix Factorization: cost functions and updates
- Euclidean distance cost function:

$$\min_{B,W} C(V, BW) = \sum_{i,j} \left( V_{i,j} - (BW)_{i,j} \right)^2$$

- KL divergence cost function:

$$\min_{B,W} C(V, BW) = \sum_{i,j} \left( V_{i,j} \log \frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$$

- Multiplicative update algorithm (the form for the KL divergence cost):

$$B \leftarrow B \otimes \frac{\frac{V}{BW}\, W^T}{\mathbf{1}\, W^T}, \qquad W \leftarrow W \otimes \frac{B^T \frac{V}{BW}}{B^T\, \mathbf{1}}$$

  where 1 is a matrix of ones with the same size as V, and the multiplications ⊗ and all divisions are element-wise.

Non-negative Matrix Factorization: training
- The training magnitude spectrograms of speech and music are each factorized by NMF:

$$S_{train} \approx B_{speech} W_{speech}$$

$$M_{train} \approx B_{music} W_{music}$$

- A larger number of basis vectors gives:
  - Lower approximation error
  - A redundant set of basis vectors
  - More computation time

Signal Separation and Masking
- NMF is used to decompose the magnitude spectrogram matrix X of the mixture over the trained bases:

$$X \approx [B_{speech} \; B_{music}]\, W$$

- The initial spectrogram estimates for the speech and music signals are, respectively:

$$\tilde{S} = B_{speech} W_S$$

$$\tilde{M} = B_{music} W_M$$

  where W_S and W_M are the submatrices of W that multiply B_speech and B_music.

Signal Separation and Masking: building the mask
- The initial estimated spectrograms are used to build a mask:

$$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$$

- Source signal reconstruction:

$$\hat{S} = H \otimes X$$

$$\hat{M} = (\mathbf{1} - H) \otimes X$$

  where 1 is a matrix of ones and ⊗ is element-wise multiplication.

Signal Separation and Masking: special masks
- Two specific values of p correspond to special masks:
  - Wiener filter (soft mask), p = 2:

$$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$$

  - Hard mask, the limit p → ∞:

$$H_{hard} = \mathrm{round}\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right)$$

Signal Separation and Masking
- Figure: the value of the mask versus the linear ratio for different values of p

Experiments and Discussion
- Simulation setup:
  - 16 kHz sampling rate
  - Speech:
    - Training speech data: 540 short utterances
    - Testing speech data: 20 utterances
  - Music:
    - 38 pieces for training
    - 1 piece for testing
  - Hamming window: 512 points
  - FFT size: 512 points

Experiments and Discussion
- Performance measurement of the separation

Conclusion
- The family of masks has a parameter that controls the saturation level
- The proposed algorithm gives better results and helps speed up the separation process
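The training and separation stages described in the slides can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation: the function names `nmf_kl` and `separate`, the `EPS` guard, the iteration counts, the number of basis vectors, and the random matrices standing in for magnitude spectrograms are all assumptions made for the example. It uses the KL divergence multiplicative updates and the mask with parameter p.

```python
import numpy as np

EPS = 1e-12  # guard against division by zero (an assumption, not from the slides)


def nmf_kl(V, d, n_iter=200, B=None, seed=0):
    """Approximate V (n x m) by B (n x d) @ W (d x m) with KL multiplicative updates.

    If B is supplied it is kept fixed and only W is updated, which is what
    the separation stage needs. In that case d is ignored.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    update_B = B is None
    if update_B:
        B = rng.random((n, d)) + EPS
    W = rng.random((B.shape[1], m)) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # W <- W * (B^T (V / BW)) / (B^T 1), all element-wise
        W *= (B.T @ (V / (B @ W + EPS))) / (B.T @ ones + EPS)
        if update_B:
            # B <- B * ((V / BW) W^T) / (1 W^T), all element-wise
            B *= ((V / (B @ W + EPS)) @ W.T) / (ones @ W.T + EPS)
    return B, W


def separate(X, B_speech, B_music, p=2.0, n_iter=100):
    """Split the mixture magnitude spectrogram X with the trained bases and a mask."""
    B = np.hstack([B_speech, B_music])           # [B_speech B_music]
    _, W = nmf_kl(X, B.shape[1], n_iter=n_iter, B=B)
    d_s = B_speech.shape[1]
    S_tilde = B_speech @ W[:d_s]                 # initial speech estimate
    M_tilde = B_music @ W[d_s:]                  # initial music estimate
    H = S_tilde**p / (S_tilde**p + M_tilde**p + EPS)
    return H * X, (1.0 - H) * X                  # S_hat, M_hat


# Toy data: random non-negative matrices in place of real magnitude spectrograms.
rng = np.random.default_rng(1)
S_train = rng.random((16, 40))
M_train = rng.random((16, 40))
B_speech, _ = nmf_kl(S_train, d=4)               # training stage
B_music, _ = nmf_kl(M_train, d=4, seed=1)
X = rng.random((16, 10))                         # stand-in mixture spectrogram
S_hat, M_hat = separate(X, B_speech, B_music, p=2.0)  # separation stage
```

With p = 2 the mask is the Wiener filter from the slides. Applying `np.round` to that mask would give the hard mask. Because the two masks H and 1 - H sum to one, the two estimates always add back up to X. To recover time-domain signals, each estimate would be combined with the phase of the mixture before an inverse STFT, following the same-phase assumption in the problem formulation.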
