SINGLE CHANNEL SPEECH MUSIC SEPARATION USING
NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS
Emad M. Grais
Hakan Erdogan
17th International Conference on Digital Signal Processing, 2011
Jain-De Lee
Outline
 INTRODUCTION
 NON-NEGATIVE MATRIX FACTORIZATION
 SIGNAL SEPARATION AND MASKING
 EXPERIMENTS AND DISCUSSION
 CONCLUSION
Introduction
 There are two main stages in this work
– Training stage
– Separation stage
 NMF is used with different types of masks to improve the separation process
– The separation process becomes faster
– NMF needs fewer iterations
Introduction
 Problem formulation
– We observe a signal x(t), which is the mixture of two sources s(t) and m(t)
$X(t,f) = S(t,f) + M(t,f)$
$|X(t,f)|\, e^{j\phi_X(t,f)} = |S(t,f)|\, e^{j\phi_S(t,f)} + |M(t,f)|\, e^{j\phi_M(t,f)}$
where X(t, f), S(t, f) and M(t, f) are the STFTs of x(t), s(t) and m(t)
– Assume the sources have the same phase angle as the mixed signal, so the magnitude spectrograms add
$X = S + M$
where X, S and M now denote the magnitude spectrograms
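The same-phase assumption can be checked numerically. The following is a minimal NumPy sketch, not the authors' code; the array sizes and random values are hypothetical. It shows that magnitudes add exactly when both sources share one phase per time-frequency bin, and that otherwise the sum of magnitudes is only an upper bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical magnitude spectrograms (frequency bins x frames).
S_mag = rng.random((4, 3))
M_mag = rng.random((4, 3))

# Assumption from the slide: both sources share one phase angle per bin.
phase = rng.uniform(-np.pi, np.pi, size=(4, 3))
S_complex = S_mag * np.exp(1j * phase)
M_complex = M_mag * np.exp(1j * phase)
X_complex = S_complex + M_complex

# With a shared phase, |X| equals |S| + |M| up to rounding error.
same_phase_gap = np.max(np.abs(np.abs(X_complex) - (S_mag + M_mag)))

# With unrelated phases, |X| can only be smaller than or equal to |S| + |M|.
other_phase = rng.uniform(-np.pi, np.pi, size=(4, 3))
X_general = S_complex + M_mag * np.exp(1j * other_phase)
general_is_bounded = np.all(np.abs(X_general) <= S_mag + M_mag + 1e-12)
```

In practice the phases differ, so X = S + M on magnitudes is an approximation that makes the problem tractable for NMF.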
Non-negative Matrix Factorization
 Non-negative matrix factorization algorithm
$[V]_{n \times m} \approx [B]_{n \times d}\,[W]_{d \times m}$
 Minimization problem
$\min_{B,W} C(V, BW)$ subject to elements of $B, W \ge 0$
 Different cost functions C of NMF
– Euclidean distance
– KL divergence
Non-negative Matrix Factorization
 Euclidean distance cost function
$\min_{B,W} C(V, BW), \quad C(V, BW) = \sum_{i,j} \left( V_{i,j} - (BW)_{i,j} \right)^2$
 KL divergence cost function
$\min_{B,W} C(V, BW), \quad C(V, BW) = \sum_{i,j} \left( V_{i,j} \log \frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$
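As a rough illustration of the two cost functions, the sketch below evaluates both for a given V, B and W. The function names and the small `eps` guard against log(0) and division by zero are assumptions added for this example.

```python
import numpy as np

def euclidean_cost(V, B, W):
    """Sum of squared element-wise differences between V and BW."""
    residual = V - B @ W
    return float(np.sum(residual ** 2))

def kl_cost(V, B, W, eps=1e-12):
    """Generalized KL divergence between V and BW (eps avoids log(0))."""
    BW = B @ W
    return float(np.sum(V * np.log((V + eps) / (BW + eps)) - V + BW))

rng = np.random.default_rng(0)
B = rng.random((6, 2))
W = rng.random((2, 5))

V_exact = B @ W                 # perfectly factorizable matrix
V_noisy = V_exact + 0.1         # same matrix with a constant offset

exact_costs = (euclidean_cost(V_exact, B, W), kl_cost(V_exact, B, W))
noisy_costs = (euclidean_cost(V_noisy, B, W), kl_cost(V_noisy, B, W))
```

Both costs are zero when V equals BW and grow as the approximation gets worse.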
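 Multiplicative Update Algorithm
$B \leftarrow B \otimes \frac{\frac{V}{BW}\, W^T}{\mathbf{1}\, W^T}$
$W \leftarrow W \otimes \frac{B^T\, \frac{V}{BW}}{B^T\, \mathbf{1}}$
where $\mathbf{1}$ is a matrix of ones with the same size as V, $\otimes$ is element-wise multiplication, and the divisions are element-wise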
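The multiplicative updates for the KL divergence cost can be sketched in a few lines of NumPy. This is an illustrative implementation under assumptions, not the authors' code: the random initialization, the iteration count and the `eps` guard are choices made for the example.

```python
import numpy as np

def kl_divergence(V, BW, eps=1e-12):
    """Generalized KL divergence between V and its approximation BW."""
    return float(np.sum(V * np.log((V + eps) / (BW + eps)) - V + BW))

def nmf_kl(V, d, n_iter=200, eps=1e-12, seed=0):
    """Factor V (n x m) into B (n x d) and W (d x m) with KL updates."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    B = rng.random((n, d)) + eps
    W = rng.random((d, m)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # B <- B * ((V / BW) W^T) / (1 W^T), all products element-wise.
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        # W <- W * (B^T (V / BW)) / (B^T 1)
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
    return B, W

# Hypothetical nonnegative data of rank 3.
rng = np.random.default_rng(0)
V = rng.random((12, 3)) @ rng.random((3, 15))

B_early, W_early = nmf_kl(V, d=3, n_iter=1)
B, W = nmf_kl(V, d=3, n_iter=200)
cost_early = kl_divergence(V, B_early @ W_early)
cost_late = kl_divergence(V, B @ W)
```

Because every update multiplies by a nonnegative ratio, B and W stay nonnegative without any explicit projection step.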
Non-negative Matrix Factorization
 The training magnitude spectrograms of speech and music are factorized by NMF
$S_{Train} \approx B_{speech} W_{speech}$
$M_{Train} \approx B_{music} W_{music}$
 Larger number of basis vectors
– Lower approximation error
– Redundant set of basis vectors
– Requires more computation time
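The trade-off in the number of basis vectors can be illustrated on synthetic data. In this sketch `S_train` is a hypothetical stand-in for a real training spectrogram, and the basis counts 1 and 4 are chosen only for the demonstration; the NMF routine uses KL multiplicative updates.

```python
import numpy as np

def nmf_kl(V, d, n_iter=300, eps=1e-12, seed=0):
    """KL-divergence NMF with multiplicative updates; returns (B, W)."""
    rng = np.random.default_rng(seed)
    B = rng.random((V.shape[0], d)) + eps
    W = rng.random((d, V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
    return B, W

def relative_error(V, B, W):
    """Frobenius norm of the residual relative to the norm of V."""
    return float(np.linalg.norm(V - B @ W) / np.linalg.norm(V))

# Hypothetical training spectrogram: 30 frequency bins, 50 frames, rank 4.
rng = np.random.default_rng(42)
S_train = rng.random((30, 4)) @ rng.random((4, 50))

errors = {}
for d in (1, 4):
    B_speech, W_speech = nmf_kl(S_train, d)
    errors[d] = relative_error(S_train, B_speech, W_speech)
```

Each iteration costs on the order of n·d·m operations, so the lower error of a larger d is paid for in computation time.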
Signal Separation and Masking
 NMF is used to decompose the magnitude spectrogram matrix X of the mixed signal
$X \approx [B_{speech} \; B_{music}]\, W$
 The initial spectrogram estimates for the speech and music signals are calculated as follows
$\tilde{S} = B_{speech} W_S$
$\tilde{M} = B_{music} W_M$
where $W_S$ and $W_M$ are submatrices of W
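A minimal sketch of this decomposition step follows. It assumes the trained bases are kept fixed and only W is updated with the KL multiplicative rule, which the slide does not state explicitly. The bases here are random stand-ins, and `solve_weights` is a hypothetical helper name.

```python
import numpy as np

def solve_weights(X, B, n_iter=100, eps=1e-12, seed=0):
    """Find nonnegative W with X ~ BW while keeping the basis B fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X)
    for _ in range(n_iter):
        W *= (B.T @ (X / (B @ W + eps))) / (B.T @ ones + eps)
    return W

rng = np.random.default_rng(1)
n_bins, d_speech, d_music, n_frames = 20, 3, 4, 10

# Random stand-ins for bases that would come from the training stage.
B_speech = rng.random((n_bins, d_speech))
B_music = rng.random((n_bins, d_music))

# Hypothetical mixture magnitude spectrogram.
X = (B_speech @ rng.random((d_speech, n_frames))
     + B_music @ rng.random((d_music, n_frames)))

# Concatenate the bases column-wise and solve for the joint weights.
B = np.hstack([B_speech, B_music])
W = solve_weights(X, B)

# The first d_speech rows of W belong to speech, the remaining rows to music.
W_S, W_M = W[:d_speech], W[d_speech:]
S_init = B_speech @ W_S
M_init = B_music @ W_M
```

By construction the two initial estimates sum to BW, the NMF approximation of the mixture.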
Signal Separation and Masking
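 Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows
$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$
 Source signals reconstruction
$\hat{S} = H \otimes X$
$\hat{M} = (1 - H) \otimes X$
where 1 is a matrix of ones, $\otimes$ is element-wise multiplication, and the powers and division in H are element-wise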
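The mask and the reconstruction are simple element-wise operations. The sketch below uses hypothetical random spectrograms and adds a small `eps` to the denominator as a guard against division by zero; neither is part of the slides.

```python
import numpy as np

def spectral_mask(S_init, M_init, p, eps=1e-12):
    """H = S^p / (S^p + M^p), computed element-wise."""
    S_p = S_init ** p
    M_p = M_init ** p
    return S_p / (S_p + M_p + eps)

def apply_mask(X, H):
    """Return the speech and music magnitude estimates H*X and (1-H)*X."""
    return H * X, (1.0 - H) * X

rng = np.random.default_rng(0)
S_init = rng.random((5, 4))     # hypothetical initial speech estimate
M_init = rng.random((5, 4))     # hypothetical initial music estimate
X = S_init + M_init             # hypothetical mixture magnitude

H = spectral_mask(S_init, M_init, p=3)
S_hat, M_hat = apply_mask(X, H)
```

Since H and 1 − H sum to one in every bin, the two estimates always add up to the mixture spectrogram X, even when the initial NMF estimates do not.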
Signal Separation and Masking
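 Two specific values of p correspond to special masks
– Wiener filter (soft mask), p = 2
$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$
– Hard mask, equivalent to the limit p → ∞
$H_{hard} = \mathrm{round}\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right)$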
Signal Separation and Masking
[Figure: the value of the mask versus the linear ratio for different values of p]
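The curves in this figure can be regenerated numerically. The sketch assumes that "linear ratio" means r = S̃ / (S̃ + M̃), that is, the mask for p = 1. The mask depends only on this ratio, so S̃ = r and M̃ = 1 − r can be substituted. The values of p below are illustrative, not necessarily the ones plotted on the slide.

```python
import numpy as np

def mask_from_linear_ratio(r, p):
    """Mask value for linear ratio r = S / (S + M) and exponent p."""
    return r ** p / (r ** p + (1.0 - r) ** p)

# Linear ratios from 0.05 to 0.95 in steps of 0.05.
r = np.linspace(0.05, 0.95, 19)

# One curve per (illustrative) value of p.
curves = {p: mask_from_linear_ratio(r, p) for p in (1, 2, 4, 8)}
```

Every curve passes through 0.5 at r = 0.5. Larger p makes the curve steeper around that point, so the mask saturates towards 0 or 1 more quickly.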
Experiments and Discussion
 Simulation
– 16 kHz sampling rate
– Speech
• Training speech data: 540 short utterances
• Testing speech data: 20 utterances
– Music
• 38 pieces for training
• 1 piece for testing
– Hamming window: 512 points
– FFT size: 512 points
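The analysis settings listed above translate into a short STFT routine. This is a NumPy-only sketch: the hop size of 256 samples is an assumption, since the slides do not give the overlap, and the 1 kHz test tone stands in for real audio.

```python
import numpy as np

FS = 16_000      # sampling rate in Hz (from the slide)
WIN_LEN = 512    # Hamming window length in samples (from the slide)
N_FFT = 512      # FFT size (from the slide)
HOP = 256        # hop size: an assumption, not stated in the slides

def magnitude_and_phase(x):
    """Return magnitude and phase spectrograms (frequency bins x frames)."""
    window = np.hamming(WIN_LEN)
    n_frames = 1 + (len(x) - WIN_LEN) // HOP
    frames = np.stack(
        [x[i * HOP: i * HOP + WIN_LEN] * window for i in range(n_frames)],
        axis=1,
    )
    spec = np.fft.rfft(frames, n=N_FFT, axis=0)
    return np.abs(spec), np.angle(spec)

# One second of a 1 kHz tone as a stand-in for a real recording.
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 1000 * t)
mag, phase = magnitude_and_phase(x)
```

With a 512-point FFT at 16 kHz, each of the 257 frequency bins is 31.25 Hz wide, so the 1 kHz tone falls in bin 32.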
Experiments and Discussion
 Performance measurement of the separation
Conclusion
 The family of masks has a parameter p that controls the saturation level
 The proposed algorithm gives better results and helps speed up the separation process