SINGLE CHANNEL SPEECH MUSIC SEPARATION USING
NONNEGATIVE MATRIX FACTORIZATION AND SPECTRAL MASKS
Emad M. Grais
Hakan Erdogan
17th International Conference on Digital Signal Processing, 2011
Jain-De,Lee
Outline
INTRODUCTION
NON-NEGATIVE MATRIX FACTORIZATION
SIGNAL SEPARATION AND MASKING
EXPERIMENTS AND DISCUSSION
CONCLUSION
Introduction
There are two main stages in this work
– Training stage
– Separation stage
NMF is used with different types of masks to improve the separation process
– The separation process becomes faster
– NMF needs fewer iterations
Introduction
Problem formulation
– We observe a signal x(t), which is the mixture of two sources s(t) and m(t)
$$X(t,f) = S(t,f) + M(t,f)$$
$$|X(t,f)| e^{j\phi_X(t,f)} = |S(t,f)| e^{j\phi_S(t,f)} + |M(t,f)| e^{j\phi_M(t,f)}$$
where X(t,f) is the STFT of x(t)
– Assume the sources have the same phase angle as the mixture, so the magnitudes add
$$X = S + M$$
where X, S and M now denote the magnitude spectrograms
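The equal-phase assumption can be checked numerically. The sketch below is only an illustration: the array shape, random data and variable names are made up for the example and do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (4, 5)  # hypothetical (frequency bins, time frames)

# Made-up magnitudes for the two sources and one shared phase.
mag_S = rng.random(shape)
mag_M = rng.random(shape)
phase = rng.uniform(-np.pi, np.pi, shape)

S = mag_S * np.exp(1j * phase)
M = mag_M * np.exp(1j * phase)
X = S + M

# With a common phase, the magnitude of the mixture is the sum of magnitudes.
same_phase_ok = np.allclose(np.abs(X), mag_S + mag_M)

# With unrelated phases only the triangle inequality is guaranteed.
other_phase = rng.uniform(-np.pi, np.pi, shape)
X_diff = S + mag_M * np.exp(1j * other_phase)
triangle_ok = bool(np.all(np.abs(X_diff) <= mag_S + mag_M + 1e-12))

print(same_phase_ok, triangle_ok)
```

For real speech and music the phases differ, so X = S + M on magnitudes is an approximation, not an identity.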
Non-negative Matrix Factorization
Non-negative matrix factorization algorithm
$$V_{n \times m} \approx B_{n \times d} W_{d \times m}$$
Minimization problem
$$\min_{B,W} C(V, BW) \quad \text{subject to elements of } B, W \ge 0$$
Different cost functions C of NMF
– Euclidean distance
– KL divergence
Non-negative Matrix Factorization
Euclidean distance cost function
$$\min_{B,W} C(V, BW), \quad C(V, BW) = \sum_{i,j} \left( V_{i,j} - (BW)_{i,j} \right)^2$$
KL divergence cost function
$$\min_{B,W} C(V, BW), \quad C(V, BW) = \sum_{i,j} \left( V_{i,j} \log \frac{V_{i,j}}{(BW)_{i,j}} - V_{i,j} + (BW)_{i,j} \right)$$
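The two cost functions can be written directly in NumPy. This is a minimal sketch: the function names and the small constant `eps` (added to avoid log(0) and division by zero) are assumptions of the example, not part of the slides.

```python
import numpy as np

def euclidean_cost(V, B, W):
    """Sum over i, j of (V_ij - (BW)_ij)^2."""
    return np.sum((V - B @ W) ** 2)

def kl_cost(V, B, W, eps=1e-12):
    """Sum over i, j of V_ij * log(V_ij / (BW)_ij) - V_ij + (BW)_ij.

    eps is an assumption of this sketch; it keeps the log and the
    division finite when entries are zero.
    """
    R = B @ W
    return np.sum(V * np.log((V + eps) / (R + eps)) - V + R)

rng = np.random.default_rng(0)
B = rng.random((6, 2))
W = rng.random((2, 5))
V = B @ W  # exact factorization, so both costs vanish

print(euclidean_cost(V, B, W), kl_cost(V, B, W))
```

Both costs are zero only when V equals BW and grow as the approximation gets worse.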
Multiplicative Update Algorithm
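$$W \leftarrow W \otimes \frac{B^T \frac{V}{BW}}{B^T \mathbf{1}}$$
$$B \leftarrow B \otimes \frac{\frac{V}{BW} W^T}{\mathbf{1} W^T}$$
where 1 is a matrix of ones of the same size as V, and ⊗ and all divisions are element-wise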
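The update rules above (the KL-divergence form) can be sketched in NumPy as follows. The random initialization, the iteration count, the guard constant `eps` and the function names are assumptions of the example; the slides do not specify them.

```python
import numpy as np

def kl_divergence(V, R, eps=1e-12):
    """Generalized KL divergence between V and its approximation R."""
    return np.sum(V * np.log((V + eps) / (R + eps)) - V + R)

def nmf_kl(V, d, n_iter=200, eps=1e-12, seed=0):
    """Factorize V (n x m) as B (n x d) times W (d x m) with multiplicative updates.

    Initialization, n_iter and eps are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    B = rng.random((n, d)) + eps
    W = rng.random((d, m)) + eps
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        # W <- W * (B^T (V / BW)) / (B^T 1), all element-wise
        W *= (B.T @ (V / (B @ W + eps))) / (B.T @ ones + eps)
        # B <- B * ((V / BW) W^T) / (1 W^T), all element-wise
        B *= ((V / (B @ W + eps)) @ W.T) / (ones @ W.T + eps)
    return B, W

rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))  # synthetic non-negative data
B, W = nmf_kl(V, d=3)
print(kl_divergence(V, B @ W))
```

Because every update multiplies by a non-negative ratio, B and W stay non-negative without any explicit projection step.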
Non-negative Matrix Factorization
The training magnitude spectrograms S_train and M_train are factorized by NMF
$$S_{train} \approx B_{speech} W_{speech}$$
$$M_{train} \approx B_{music} W_{music}$$
Larger number of basis vectors
– Lower approximation error
– Redundant set of basis vectors
– Requires more computation time
Signal Separation and Masking
NMF is used to decompose the magnitude spectrogram matrix X of the mixture
$$X \approx [B_{speech} \; B_{music}] W$$
The initial spectrogram estimates for the speech and music signals are calculated, respectively, as
$$\tilde{S} = B_{speech} W_S$$
$$\tilde{M} = B_{music} W_M$$
where W_S and W_M are the submatrices of W (the rows that multiply B_speech and B_music, respectively)
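A minimal sketch of this step is shown below. It assumes that the trained bases are held fixed and that only W is updated, using the KL-type multiplicative rule; the function name, iteration count and `eps` are also assumptions of the example.

```python
import numpy as np

def initial_estimates(X, B_speech, B_music, n_iter=200, eps=1e-12, seed=0):
    """Return the initial estimates (S_tilde, M_tilde) of the two magnitude spectrograms.

    Assumption of this sketch: the bases stay fixed and only the gains W
    are updated with the KL multiplicative rule.
    """
    B = np.hstack([B_speech, B_music])  # [B_speech B_music]
    d_s = B_speech.shape[1]             # number of speech basis vectors
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    ones = np.ones_like(X, dtype=float)
    for _ in range(n_iter):
        W *= (B.T @ (X / (B @ W + eps))) / (B.T @ ones + eps)
    W_S, W_M = W[:d_s], W[d_s:]         # submatrices of W
    return B_speech @ W_S, B_music @ W_M
```

Usage: `S_tilde, M_tilde = initial_estimates(X, B_speech, B_music)`, where `X` has one column per time frame and both bases have one row per frequency bin.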
Signal Separation and Masking
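Use the initial estimated spectrograms $\tilde{S}$ and $\tilde{M}$ to build a mask as follows
$$H = \frac{\tilde{S}^p}{\tilde{S}^p + \tilde{M}^p}$$
Source signals reconstruction
$$\hat{S} = H \otimes X$$
$$\hat{M} = (\mathbf{1} - H) \otimes X$$
where 1 is a matrix of ones and ⊗ is element-wise multiplication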
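The mask and the reconstruction can be sketched as below, treating the powers and the division in H as element-wise operations. The function names and the guard constant `eps` are assumptions of the example.

```python
import numpy as np

def spectral_mask(S_tilde, M_tilde, p, eps=1e-12):
    """H = S_tilde^p / (S_tilde^p + M_tilde^p), computed element-wise.

    eps is an assumption of this sketch; it avoids 0/0 in silent bins.
    """
    Sp = S_tilde ** p
    Mp = M_tilde ** p
    return Sp / (Sp + Mp + eps)

def apply_mask(X, H):
    """Return (S_hat, M_hat) = (H * X, (1 - H) * X), element-wise."""
    return H * X, (1.0 - H) * X

rng = np.random.default_rng(0)
S_tilde = rng.random((8, 10)) + 0.1
M_tilde = rng.random((8, 10)) + 0.1
X = S_tilde + M_tilde  # toy mixture magnitude

H = spectral_mask(S_tilde, M_tilde, p=2)
S_hat, M_hat = apply_mask(X, H)
print(np.allclose(S_hat + M_hat, X))
```

Whatever the value of p, the two masked estimates sum back to X, so the mask only decides how the energy of each time-frequency bin is shared between the sources.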
Signal Separation and Masking
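Two specific values of p correspond to special masks
– Wiener filter (soft mask), p = 2
$$H_{Wiener} = \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2}$$
– Hard mask, p → ∞
$$H_{hard} = \mathrm{round}\left( \frac{\tilde{S}^2}{\tilde{S}^2 + \tilde{M}^2} \right)$$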
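The two special masks can be compared on a small hand-made example. The input values, the function names and `eps` are made up for illustration.

```python
import numpy as np

def wiener_mask(S_tilde, M_tilde, eps=1e-12):
    """Soft mask: S_tilde^2 / (S_tilde^2 + M_tilde^2), element-wise."""
    S2, M2 = S_tilde ** 2, M_tilde ** 2
    return S2 / (S2 + M2 + eps)

def hard_mask(S_tilde, M_tilde, eps=1e-12):
    """Binary mask: the Wiener mask rounded to 0 or 1."""
    return np.round(wiener_mask(S_tilde, M_tilde, eps))

# Made-up initial estimates (2 frequency bins x 2 frames).
S_tilde = np.array([[3.0, 1.0], [0.5, 2.0]])
M_tilde = np.array([[1.0, 3.0], [2.0, 0.5]])

H_w = wiener_mask(S_tilde, M_tilde)
H_h = hard_mask(S_tilde, M_tilde)
print(H_w)
print(H_h)
```

The hard mask assigns each time-frequency bin entirely to whichever source has the larger initial estimate, while the Wiener mask shares the bin in proportion to the squared estimates.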
Signal Separation and Masking
The value of the mask versus the linear ratio for different values of p
Experiments and Discussion
Simulation
– 16 kHz sampling rate
– Speech
• Training speech data: 540 short utterances
• Testing speech data: 20 utterances
– Music
• 38 pieces for training
• 1 piece for testing
– Hamming window: 512 points
– FFT size: 512 points
Experiments and Discussion
Performance measurement of the separation
Conclusion
The family of masks has a parameter p that controls the saturation level
The proposed algorithm gives better results and helps speed up the separation process