Blind Separation of Speech Mixtures
Vaninirappuputhenpurayil Gopalan REJU
School of Electrical and Electronic Engineering
Nanyang Technological University
6:09 PM

Introduction
Blind Convolutive Source Separation
• Mixing process:
$$x_1(t) = \sum_{l \ge 0} h_{11}(l)\, s_1(t-l) + \sum_{l \ge 0} h_{12}(l)\, s_2(t-l)$$
$$x_2(t) = \sum_{l \ge 0} h_{21}(l)\, s_1(t-l) + \sum_{l \ge 0} h_{22}(l)\, s_2(t-l)$$
• Unmixing process:
$$y_1(t) = \sum_{l=0}^{L} w_{11}(l)\, x_1(t-l) + \sum_{l=0}^{L} w_{12}(l)\, x_2(t-l)$$
$$y_2(t) = \sum_{l=0}^{L} w_{21}(l)\, x_1(t-l) + \sum_{l=0}^{L} w_{22}(l)\, x_2(t-l)$$
[Figure: two sources s1 and s2.]

Introduction
[Figures: Convolutive Blind Source Separation; Instantaneous Blind Source Separation.]

Introduction
Convolutive Blind Source Separation (* denotes convolution):
$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} * \begin{bmatrix} s_1(t) \\ s_2(t) \end{bmatrix}, \qquad \mathbf{X}(t) = \mathbf{H} * \mathbf{S}(t)$$
Instantaneous Blind Source Separation:
$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} s_1(t) \\ s_2(t) \end{bmatrix}, \qquad \mathbf{X}(t) = \mathbf{H}\,\mathbf{S}(t)$$
• In frequency domain:
$$\begin{bmatrix} X_1(f,t) \\ X_2(f,t) \end{bmatrix} = \begin{bmatrix} H_{11}(f) & H_{12}(f) \\ H_{21}(f) & H_{22}(f) \end{bmatrix} \begin{bmatrix} S_1(f,t) \\ S_2(f,t) \end{bmatrix}, \qquad \mathbf{X}(f,t) = \mathbf{H}(f)\,\mathbf{S}(f,t)$$

Introduction
[Figures: sources s1, s2, … recorded by sensors x1, x2, x3.]
• No. of sources < No. of sensors: Overdetermined mixing
• No. of sources = No. of sensors: Determined mixing
• No. of sources > No. of sensors: Underdetermined mixing

Approaches for BSS of Speech Signals
Types of mixing
• Instantaneous mixing
• Convolutive mixing

Approaches for BSS of Speech Signals
Instantaneous mixing
Step 1: Selection of cost function
Step 2: Minimization or maximization of the cost function
[Figure: sources S1, S2 → mixing H → mixtures X1, X2 → unmixing W → outputs Y1, Y2. Separated?]
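The two mixing models from the Introduction slides can be sketched as follows. This is only an illustration under stated assumptions: the function names, the filter values and the random test signals are hypothetical, and signals are taken to be zero before t = 0.

```python
import random

def instantaneous_mix(H, sources):
    """x_p(t) = sum_q H[p][q] * s_q(t): one gain per source/sensor pair."""
    n = len(sources[0])
    return [[sum(H[p][q] * sources[q][t] for q in range(len(sources)))
             for t in range(n)]
            for p in range(len(H))]

def convolutive_mix(h, sources):
    """x_p(t) = sum_q sum_l h[p][q][l] * s_q(t - l), with s_q(t) = 0 for t < 0."""
    n = len(sources[0])
    mixtures = []
    for p in range(len(h)):
        x = [0.0] * n
        for q, s in enumerate(sources):
            for l, tap in enumerate(h[p][q]):
                for t in range(l, n):
                    x[t] += tap * s[t - l]
        mixtures.append(x)
    return mixtures

# Hypothetical example: two sources, two sensors, eight samples.
random.seed(0)
s = [[random.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(2)]
H = [[1.0, 0.5], [0.3, 1.0]]                       # instantaneous gains
h = [[[1.0, 0.0], [0.5, 0.2]],                     # two-tap filters h[p][q][l]
     [[0.3, 0.1], [1.0, 0.0]]]
x_inst = instantaneous_mix(H, s)
x_conv = convolutive_mix(h, s)
```

With single-tap filters the convolutive model reduces to the instantaneous one, which is why instantaneous mixing can be treated as a special case.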
Approaches for BSS of Speech Signals
Instantaneous mixing
Selection of cost function
• Statistical independence: signals from two different sources are independent,
$$p(\mathbf{y}) = \prod_i p_i(y_i)$$
  (information theoretic)
• Non-Gaussianity. Basic idea, from the central limit theorem: a mixture of two or more sources will be more Gaussian than its individual components.
  Non-Gaussianity measures:
  Kurtosis: $$\mathrm{kurt}(y) = E\{y^4\} - 3\left(E\{y^2\}\right)^2$$
  Negentropy: $$J(y) = H(y_{\mathrm{gauss}}) - H(y)$$
  Entropy: $$H(y) = -\int p(y) \log p(y)\, dy$$
• Nonlinear cross moments: $$E\{f(y_i)\, g(y_j)\} = 0$$
• Temporal structure of speech: second order statistics can be used, e.g., diagonalize the output correlation simultaneously for different time lags.
• Non-stationarity of speech: signals are divided into blocks and the correlation matrices are simultaneously diagonalized.

Approaches for BSS of Speech Signals
Instantaneous mixing
Minimization or maximization of the cost function
• Simple gradient method
• Natural gradient method, e.g. Infomax ICA algorithm
• Newton's method, e.g. FastICA

Approaches for BSS of Speech Signals
Convolutive Mixing
Time Domain:
$$y_q(t) = \sum_{p=1}^{P} \sum_{l=0}^{L-1} w_{qp}(l)\, x_p(t-l), \qquad q = 1, \ldots, Q$$
• Advantage: no permutation problem
• Disadvantage: slow convergence; high computational cost for long filter taps
Frequency Domain:
$$\mathbf{X}(f,t) = \mathbf{H}(f)\,\mathbf{S}(f,t), \qquad \mathbf{Y}(f,t) = \mathbf{W}(f)\,\mathbf{X}(f,t)$$
• Advantage: low computational cost; fast convergence
• Disadvantage: permutation problem
[Figure: S1, S2 → H → X1, X2 → W → outputs (Y1, Y2) or (Y2, Y1).]

Permutation Problem in Frequency Domain BSS
[Figure: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm (BSS) in each frequency bin f1, f2, …, fK → solving permutation problem → K-point IFFT → separated signals y1, y2, y3.]
• Due to the permutation problem, the outputs y1, y2, y3 in different frequency bins correspond to different sources, so the signals are still mixed until the permutation is solved.

Motivation
[Diagram: taxonomy of BSS problems.]
• Determined/overdetermined (# mixtures ≥ # sources), instantaneous
• Determined/overdetermined, convolutive: frequency domain (frequency bin-wise separation, permutation problem) or time domain
• Underdetermined (# mixtures < # sources), instantaneous: mixing matrix estimation, source estimation
• Underdetermined, convolutive: frequency domain (frequency bin-wise separation, permutation problem) or time domain; automatic detection of no. of sources
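The kurtosis measure and the central-limit-theorem argument listed under "Selection of cost function" can be checked numerically. The sketch below is an illustration only: it uses uniformly distributed stand-in sources (an assumption made for numerical stability; speech is not uniform), and the mixing weights 0.6 and 0.8 are hypothetical.

```python
import random

def central_moments(y):
    """Second and fourth central moments of a sample."""
    n = len(y)
    mean = sum(y) / n
    m2 = sum((v - mean) ** 2 for v in y) / n
    m4 = sum((v - mean) ** 4 for v in y) / n
    return m2, m4

def kurtosis(y):
    """kurt(y) = E{y^4} - 3 (E{y^2})^2, computed after removing the mean."""
    m2, m4 = central_moments(y)
    return m4 - 3.0 * m2 * m2

def normalised_kurtosis(y):
    """Kurtosis divided by the squared variance, so that scale does not matter."""
    m2, m4 = central_moments(y)
    return m4 / (m2 * m2) - 3.0

random.seed(1)
n = 20000
s1 = [random.uniform(-1.0, 1.0) for _ in range(n)]   # stand-in source 1
s2 = [random.uniform(-1.0, 1.0) for _ in range(n)]   # stand-in source 2
mix = [0.6 * a + 0.8 * b for a, b in zip(s1, s2)]    # hypothetical mixture

k_s1 = normalised_kurtosis(s1)
k_s2 = normalised_kurtosis(s2)
k_mix = normalised_kurtosis(mix)   # closer to 0 (the Gaussian value) than k_s1, k_s2
```

Maximizing the magnitude of the kurtosis of an output therefore pushes it away from the mixture and towards a single source, which is the idea behind kurtosis-based cost functions.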
My Contribution - I
[Diagram: the BSS taxonomy of the Motivation slide (determined/overdetermined vs. underdetermined; instantaneous vs. convolutive; frequency domain vs. time domain).]

Algorithm for Solving the Permutation Problem
[Figure: mixed signals x1, x2, x3 → K-point FFT → instantaneous ICA algorithm (BSS) in each frequency bin f1, f2, …, fK (permutation problem) → solving permutation problem (permutation problem solved) → K-point IFFT → separated signals y1, y2, y3.]

Existing Method for Solving the Permutation Problem
Direction Of Arrival (DOA) method:
$$\begin{bmatrix} Y_1(f_k) \\ Y_2(f_k) \end{bmatrix} = \begin{bmatrix} W_{11}(f_k) & W_{12}(f_k) \\ W_{21}(f_k) & W_{22}(f_k) \end{bmatrix} \begin{bmatrix} X_1(f_k) \\ X_2(f_k) \end{bmatrix}$$
$$U_q(f_k, \theta) = \sum_{p=1}^{2} W_{qp}(f_k)\, e^{\,j 2\pi f_k c^{-1} d_p \sin(\theta)}$$
• d_p: position of the pth sensor; c: velocity of sound
• Example: direction of y1 = −30°, direction of y2 = 20°

Existing Method for Solving the Permutation Problem
Direction Of Arrival (DOA) method:
Disadvantages:
• Fails at lower frequencies.
• Fails when sources are near.
• Room reverberation.
• Sensor positions must be known.
Reasons for failure at lower frequencies:
• Lower spacing causes error in phase difference measurement.
• The relation is approximated for a plane wavefront under anechoic conditions.

Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
[Figure: the same FFT → bin-wise BSS → permutation solving → IFFT pipeline; outputs of adjacent bins that belong to the same source show high correlation, outputs that belong to different sources show low correlation.]

Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
[Figure: correlations r11, r12, r21, r22 between the outputs for s1 and s2 in adjacent bins …, K−1, K, K+1, K+2, K+3, ….]
Correlation matrix:
$$\begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix}$$
With confidence, examples:
$$\begin{bmatrix} 0.1 & 0.8 \\ 0.9 & 0.2 \end{bmatrix} \quad \text{(}r_{12}\text{ and }r_{21}\text{ dominate } r_{11}\text{ and }r_{22}\text{: change permutation)}$$
$$\begin{bmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{bmatrix} \quad \text{(}r_{11}\text{ and }r_{22}\text{ dominate } r_{12}\text{ and }r_{21}\text{: no change)}$$
Without confidence, examples:
$$\begin{bmatrix} 0.4 & 0.1 \\ 0.9 & 0.2 \end{bmatrix}, \qquad \begin{bmatrix} 0.9 & 0.2 \\ 0.4 & 0.1 \end{bmatrix}$$

Existing Method for Solving the Permutation Problem
Adjacent bands correlation method:
[Figure: correlation matrix entries r11, r12, r21, r22 between the outputs for s1 and s2 across bins …, K−1, K, K+1, K+2, K+3, ….]
Disadvantage: The method is not robust.

Existing Method for Solving the Permutation Problem
Combination of DOA and correlation methods:
DOA + Harmonic correlation + Adjacent bands correlation
Advantage: Increased robustness

Proposed algorithm: Partial separation method (Parallel configuration)
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, "Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals," Neurocomputing, Vol. 71, No. 10–12, June 2008, pp. 2098–2112.
[Figure: mixtures x1, x2 feed a time domain stage (outputs y1, y2) and a frequency domain stage (outputs ŝ1, ŝ2).]

Partial separation method (Parallel configuration)
[Figure: time domain stage and frequency domain stage.]

Partial separation method (Cascade configuration)
[Figure: parallel configuration compared with the cascade of a time domain stage and a frequency domain stage.]

Advantages of Partial Separation method
• Robustness

Comparison with Adjacent Bands Correlation Method
[Figure.]

Comparison with DOA method
[Figure.] Legend:
• PS: Partial separation method with confidence check
• C1: Correlation between the adjacent bins without confidence check
• C2: Correlation between adjacent bins with confidence check
• Ha: Correlation between the harmonic components with confidence check
• PS1: Partial separation method alone without confidence check
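The adjacent bands correlation decision, including the confidence check, can be sketched for two sources as follows. This is a minimal illustration, not the method as implemented in the cited work: reading the slide's example values as row-major 2 × 2 matrices, and defining "confidence" as agreement between the two rows, are both assumptions.

```python
def align_decision(R):
    """Decide whether to swap the outputs of bin K, given the 2x2 correlation
    matrix R with the outputs of the neighbouring bin.

    Returns (swap, confident). The confidence rule used here is an assumption:
    the decision is confident only if both rows prefer the same assignment.
    """
    (r11, r12), (r21, r22) = R
    row1_keeps = r11 > r12          # output 1 matches neighbour's output 1
    row2_keeps = r22 > r21          # output 2 matches neighbour's output 2
    confident = row1_keeps == row2_keeps
    swap = (r12 + r21) > (r11 + r22)
    return swap, confident

# Example values from the slide, read row by row (assumed layout).
examples = [
    [[0.1, 0.8], [0.9, 0.2]],   # with confidence: change permutation
    [[0.9, 0.1], [0.2, 0.8]],   # with confidence: no change
    [[0.4, 0.1], [0.9, 0.2]],   # without confidence
    [[0.9, 0.2], [0.4, 0.1]],   # without confidence
]
decisions = [align_decision(R) for R in examples]
```

Bins whose decision is not confident are the ones where a chain of adjacent-bin decisions can go wrong and propagate the error, which is consistent with the lack of robustness noted on the slide.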
My Contribution - II
[Diagram: the BSS taxonomy of the Motivation slide, repeated.]

Underdetermined Blind Source Separation of Instantaneous Mixtures
Mixture in time domain → Time to TF domain → Detection of SSPs (single source points) → Mixing matrix estimation → Estimation of sources
[Figure: TF representations X1(k, t) and X2(k, t) of the mixtures x1 and x2 (frequency bin k, time frame t), with two points marked: at (k1, t1), S1(k1, t1) ≠ 0 and S2(k1, t1) = 0; at (k2, t2), S1(k2, t2) = 0 and S2(k2, t2) ≠ 0.]

Mathematical Representation of Instantaneous Mixing
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, "An algorithm for mixing matrix estimation in instantaneous blind source separation," Signal Processing, Vol. 89, Issue 9, September 2009, pp. 1762–1773.
Time domain:
$$\begin{bmatrix} x_1(t) \\ \vdots \\ x_P(t) \end{bmatrix} = \begin{bmatrix} h_{11} & \cdots & h_{1Q} \\ \vdots & \ddots & \vdots \\ h_{P1} & \cdots & h_{PQ} \end{bmatrix} \begin{bmatrix} s_1(t) \\ \vdots \\ s_Q(t) \end{bmatrix}$$
Time-Frequency domain:
$$\begin{bmatrix} X_1(k,t) \\ \vdots \\ X_P(k,t) \end{bmatrix} = \begin{bmatrix} h_{11} & \cdots & h_{1Q} \\ \vdots & \ddots & \vdots \\ h_{P1} & \cdots & h_{PQ} \end{bmatrix} \begin{bmatrix} S_1(k,t) \\ \vdots \\ S_Q(k,t) \end{bmatrix}$$
P: no. of mixtures; Q: no.
of sources

Written column by column:
$$\mathbf{X}(k,t) = \begin{bmatrix} h_{11} \\ \vdots \\ h_{P1} \end{bmatrix} S_1(k,t) + \cdots + \begin{bmatrix} h_{1q} \\ \vdots \\ h_{Pq} \end{bmatrix} S_q(k,t) + \cdots + \begin{bmatrix} h_{1Q} \\ \vdots \\ h_{PQ} \end{bmatrix} S_Q(k,t)$$
$$\mathbf{X}(k,t) = \mathbf{h}_1 S_1(k,t) + \mathbf{h}_2 S_2(k,t) + \cdots + \mathbf{h}_Q S_Q(k,t)$$

Single Source Points in Time-Frequency domain
$$\mathbf{X}(k,t) = \sum_{q=1}^{Q} \mathbf{h}_q S_q(k,t)$$
Let Q = 2.
• Case 1: at point (k1, t1), S1(k1, t1) ≠ 0 and S2(k1, t1) = 0: single source point 1
$$\mathbf{X}(k_1,t_1) = \mathbf{h}_1 S_1(k_1,t_1) + \mathbf{h}_2 S_2(k_1,t_1) = \mathbf{h}_1 S_1(k_1,t_1)$$
$$R\{\mathbf{X}(k_1,t_1)\} = \mathbf{h}_1 R\{S_1(k_1,t_1)\}, \qquad I\{\mathbf{X}(k_1,t_1)\} = \mathbf{h}_1 I\{S_1(k_1,t_1)\}$$
• Case 2: at point (k2, t2), S1(k2, t2) = 0 and S2(k2, t2) ≠ 0: single source point 2
$$\mathbf{X}(k_2,t_2) = \mathbf{h}_1 S_1(k_2,t_2) + \mathbf{h}_2 S_2(k_2,t_2) = \mathbf{h}_2 S_2(k_2,t_2)$$
$$R\{\mathbf{X}(k_2,t_2)\} = \mathbf{h}_2 R\{S_2(k_2,t_2)\}, \qquad I\{\mathbf{X}(k_2,t_2)\} = \mathbf{h}_2 I\{S_2(k_2,t_2)\}$$
Here R{·} and I{·} denote the real and imaginary parts.
∴ At single source point 1:
Direction of R{X(k1, t1)} = Direction of I{X(k1, t1)} = Direction of h1
∴ At single source point 2:
Direction of R{X(k2, t2)} = Direction of I{X(k2, t2)} = Direction of h2

Scatter Diagram of the Mixtures When Sources are Perfectly Sparse
Example:
$$\begin{bmatrix} X_1(k,1) & X_1(k,2) & X_1(k,3) & X_1(k,4) & X_1(k,5) \\ X_2(k,1) & X_2(k,2) & X_2(k,3) & X_2(k,4) & X_2(k,5) \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix} \begin{bmatrix} S_1(k,1) & S_1(k,2) & S_1(k,3) & S_1(k,4) & S_1(k,5) \\ S_2(k,1) & S_2(k,2) & S_2(k,3) & S_2(k,4) & S_2(k,5) \end{bmatrix}$$
$$\mathbf{h}_1 = \begin{bmatrix} h_{11} \\ h_{21} \end{bmatrix}, \qquad \mathbf{h}_2 = \begin{bmatrix} h_{12} \\ h_{22} \end{bmatrix}$$
Some entries of the source matrix are marked 0 on the slide.
[Figure: scatter diagram with the directions of h1 and h2.]

Scatter Diagram of the Mixtures When Sources are Not Perfectly Sparse
Example: the same 2 × 5 mixing equation, with h1 and h2 as above.
[Figure: scatter diagram with the directions of h1 and h2.]

Scatter Diagram of the Mixtures when Sources are Sparse
No. of sources = 6, No. of mixtures = 2
[Figure: scatter diagram with mixing directions h1, h2, h3, h4, h5, h6.]

Scatter Diagram of the Mixtures when Sources are Sparse, After Clustering
No. of sources = 6, No. of mixtures = 2
[Figure: clustered scatter diagram with mixing directions h1, h2, h3, h4, h5, h6.]

Scatter Diagram of the Mixtures when Sources are Not Perfectly Sparse
Objective: Estimation of the single source points.
No. of sources = 6, No. of mixtures = 2
[Figure: scatter diagram with mixing directions h1, h2, h3, h4, h5, h6.]
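The perfectly sparse case in the scatter-diagram slides can be reproduced in a few lines: when only one source is active at each point, every mixture vector lies along one column of the mixing matrix. The sketch below is an illustration under assumptions: the mixing columns, the real-valued amplitudes and the number of points are hypothetical.

```python
import math
import random

def direction(v):
    """Angle of a 2-D vector folded into [0, pi), so v and -v share a direction."""
    return math.atan2(v[1], v[0]) % math.pi

# Hypothetical mixing columns h1 and h2 (two mixtures, two sources).
h = [(1.0, 0.5), (0.3, 1.0)]

random.seed(2)
points = []
for _ in range(50):
    q = random.randrange(2)                                   # the only active source
    amplitude = random.choice([-1.0, 1.0]) * random.uniform(0.1, 1.0)
    points.append((q, (h[q][0] * amplitude, h[q][1] * amplitude)))

column_dirs = [direction(col) for col in h]
# Deviation of each mixture point from the direction of its active column.
errors = [abs(direction(x) - column_dirs[q]) for q, x in points]
```

Clustering the directions of such points therefore recovers the columns of the mixing matrix up to scale, even with more sources than mixtures; when the sources are not perfectly sparse, the multi-source points blur the lines, which motivates detecting single source points first.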
Principle of the Proposed Algorithm for the Detection of Single Source Points
• Case 1: at point (k1, t1), S1(k1, t1) ≠ 0 and S2(k1, t1) = 0: single source point 1
$$R\{\mathbf{X}(k_1,t_1)\} = \mathbf{h}_1 R\{S_1(k_1,t_1)\}, \qquad I\{\mathbf{X}(k_1,t_1)\} = \mathbf{h}_1 I\{S_1(k_1,t_1)\}$$
• Case 2: at point (k2, t2), S1(k2, t2) = 0 and S2(k2, t2) ≠ 0: single source point 2
$$R\{\mathbf{X}(k_2,t_2)\} = \mathbf{h}_2 R\{S_2(k_2,t_2)\}, \qquad I\{\mathbf{X}(k_2,t_2)\} = \mathbf{h}_2 I\{S_2(k_2,t_2)\}$$
• Case 3: at point (k3, t3), S1(k3, t3) ≠ 0 and S2(k3, t3) ≠ 0: multi source point
$$\mathbf{X}(k_3,t_3) = \mathbf{h}_1 S_1(k_3,t_3) + \mathbf{h}_2 S_2(k_3,t_3)$$
$$R\{\mathbf{X}(k_3,t_3)\} = \mathbf{h}_1 R\{S_1(k_3,t_3)\} + \mathbf{h}_2 R\{S_2(k_3,t_3)\}$$
$$I\{\mathbf{X}(k_3,t_3)\} = \mathbf{h}_1 I\{S_1(k_3,t_3)\} + \mathbf{h}_2 I\{S_2(k_3,t_3)\}$$
Direction of R{X(k3, t3)} = Direction of I{X(k3, t3)} if and only if
$$\frac{R\{S_1(k_3,t_3)\}}{I\{S_1(k_3,t_3)\}} = \frac{R\{S_2(k_3,t_3)\}}{I\{S_2(k_3,t_3)\}}$$

Principle of the Proposed Algorithm for the Detection of Single Source Points
[Figure: average of 15 pairs of speech utterances of length 10 s each, comparing points where the direction of R{X(k, t)} equals the direction of I{X(k, t)} with points where the two directions differ.]

Proposed Algorithm for the Detection of Single Source Points
[Figure: TF representations X1(k, t) and X2(k, t) of the mixtures x1 and x2, with the point (k1, t1) marked.]
$$\mathbf{X}(k_1,t_1) = \begin{bmatrix} X_1(k_1,t_1) \\ X_2(k_1,t_1) \end{bmatrix} = \begin{bmatrix} R\{X_1(k_1,t_1)\} + j\, I\{X_1(k_1,t_1)\} \\ R\{X_2(k_1,t_1)\} + j\, I\{X_2(k_1,t_1)\} \end{bmatrix}$$
$$R\{\mathbf{X}(k_1,t_1)\} = \begin{bmatrix} R\{X_1(k_1,t_1)\} \\ R\{X_2(k_1,t_1)\} \end{bmatrix}, \qquad I\{\mathbf{X}(k_1,t_1)\} = \begin{bmatrix} I\{X_1(k_1,t_1)\} \\ I\{X_2(k_1,t_1)\} \end{bmatrix}$$
Condition for a single source point:
$$\left| \frac{R\{\mathbf{X}(k,t)\}^T\, I\{\mathbf{X}(k,t)\}}{\left\| R\{\mathbf{X}(k,t)\} \right\| \left\| I\{\mathbf{X}(k,t)\} \right\|} \right| > \cos(\Delta\theta)$$
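The detection condition above compares the directions of the real and imaginary parts of the mixture vector at one TF point. A minimal sketch follows, assuming a hypothetical threshold Δθ = 0.01 rad, hypothetical mixing columns, and the convention (an assumption of this sketch) that a point with a zero real or imaginary part is rejected because its direction is undefined.

```python
import math

def is_single_source_point(X, delta_theta):
    """Check |Re{X}^T Im{X}| / (||Re{X}|| ||Im{X}||) > cos(delta_theta).

    X is the list of complex mixture values [X_1(k,t), ..., X_P(k,t)]
    at one TF point.
    """
    re = [z.real for z in X]
    im = [z.imag for z in X]
    norm_re = math.sqrt(sum(v * v for v in re))
    norm_im = math.sqrt(sum(v * v for v in im))
    if norm_re == 0.0 or norm_im == 0.0:
        return False   # direction undefined; rejecting is an assumption here
    cos_angle = abs(sum(a * b for a, b in zip(re, im))) / (norm_re * norm_im)
    return cos_angle > math.cos(delta_theta)

# Hypothetical real mixing columns and complex source values.
h1, h2 = (1.0, 0.5), (0.3, 1.0)
S1, S2 = 0.7 + 0.2j, -0.4 + 0.9j

ssp = [h1[0] * S1, h1[1] * S1]                              # only source 1 active
msp = [h1[0] * S1 + h2[0] * S2, h1[1] * S1 + h2[1] * S2]    # both sources active

ssp_detected = is_single_source_point(ssp, 0.01)
msp_detected = is_single_source_point(msp, 0.01)
```

At the single source point the real and imaginary parts are both multiples of h1, so the cosine is 1; at the multi source point they point in different directions unless the two sources happen to have the same ratio of real to imaginary part.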
Elimination of Outliers
[Figure panels: SSPs detection; outlier elimination; clustering.]

Experimental Results
NMSE 47.67 dB
No. of mixtures = 2, No. of sources = 6
[Figure.]

Detected Single Source Points, Three mixtures
No. of mixtures = 3, No. of sources = 6
[Figure.]

Comparison with Classical Algorithms for Determined Case
Average of 500 experimental results
No. of mixtures = 2, No. of sources = 2
[Figure.]

Comparison with Method Proposed in [1], Underdetermined case
[Figure: normalized mean square error (NMSE) in mixing matrix estimation (dB) versus order of the mixing matrices (P × Q); P: no. of mixtures, Q: no. of sources.]
[1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, "Underdetermined blind source separation based on sparse representation," IEEE Transactions on Signal Processing, vol. 54, pp. 423–437, Feb. 2006.

Advantages of the Proposed algorithm
1) Much simpler constraint: the algorithm does not require a "single source zone".
2) Separation performance is better.
3) The algorithm is extremely simple but effective:
Step 1: Convert x in the time domain to the TF domain to get X.
Step 2: Check the condition
$$\left| \frac{R\{\mathbf{X}(k,t)\}^T\, I\{\mathbf{X}(k,t)\}}{\left\| R\{\mathbf{X}(k,t)\} \right\| \left\| I\{\mathbf{X}(k,t)\} \right\|} \right| > \cos(\Delta\theta)$$
Step 3: If the condition is satisfied, then X(k, t) is a sample at an SSP, and this sample is kept for mixing matrix estimation; otherwise, discard the point.
Step 4: Repeat Steps 2 to 3 for all the points in the TF plane or until a sufficient number of SSPs are obtained.

My Contributions - III, IV and V
[Diagram: the BSS taxonomy of the Motivation slide, repeated.]
Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking
Reference: V. G. Reju, S. N. Koh and I. Y. Soon, "Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking," IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, Jan. 2010, pp. 101–116.
[Figure: Mic 1, …, Mic P → STFT → mixtures in TF domain X1(k, t), …, XP(k, t) → mask estimation → apply mask → separated signals in TF domain Y1(k, t), …, YQ(k, t); P < Q.]

Mathematical Representation
Time domain:
$$x_p(n) = \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(n-l), \qquad p = 1, \ldots, P$$
P: no. of mixtures; Q: no. of sources
Frequency domain:
$$\begin{bmatrix} X_1(k,t) \\ \vdots \\ X_P(k,t) \end{bmatrix} = \begin{bmatrix} H_{11}(k) & \cdots & H_{1Q}(k) \\ \vdots & \ddots & \vdots \\ H_{P1}(k) & \cdots & H_{PQ}(k) \end{bmatrix} \begin{bmatrix} S_1(k,t) \\ \vdots \\ S_Q(k,t) \end{bmatrix}$$
$$\mathbf{X}(k,t) = \begin{bmatrix} H_{11}(k) \\ \vdots \\ H_{P1}(k) \end{bmatrix} S_1(k,t) + \cdots + \begin{bmatrix} H_{1q}(k) \\ \vdots \\ H_{Pq}(k) \end{bmatrix} S_q(k,t) + \cdots + \begin{bmatrix} H_{1Q}(k) \\ \vdots \\ H_{PQ}(k) \end{bmatrix} S_Q(k,t)$$
$$\mathbf{X}(k,t) = \mathbf{H}_1(k) S_1(k,t) + \cdots + \mathbf{H}_q(k) S_q(k,t) + \cdots + \mathbf{H}_Q(k) S_Q(k,t)$$

Single source points
Instantaneous mixing
• Case 1: at point (k1, t1), S1(k1, t1) ≠ 0 and S2(k1, t1) = 0: single source point 1
$$R\{\mathbf{X}(k_1,t_1)\} = \mathbf{H}_1(k_1)\, R\{S_1(k_1,t_1)\}, \qquad I\{\mathbf{X}(k_1,t_1)\} = \mathbf{H}_1(k_1)\, I\{S_1(k_1,t_1)\}$$
• Case 2: at point (k2, t2), S1(k2, t2) = 0 and S2(k2, t2) ≠ 0: single source point 2
$$R\{\mathbf{X}(k_2,t_2)\} = \mathbf{H}_2(k_2)\, R\{S_2(k_2,t_2)\}, \qquad I\{\mathbf{X}(k_2,t_2)\} = \mathbf{H}_2(k_2)\, I\{S_2(k_2,t_2)\}$$
These relations rely on the mixing vectors being real, as they are for instantaneous mixing.
Convolutive mixing
• Case 1: at point (k1, t1), S1(k1, t1) ≠ 0 and S2(k1, t1) = 0: single source point 1
$$\mathbf{X}(k_1,t_1) = \mathbf{H}_1(k_1)\, S_1(k_1,t_1)$$
• Case 2: at point (k2, t2), S1(k2, t2) = 0 and S2(k2, t2) ≠ 0: single source point 2
$$\mathbf{X}(k_2,t_2) = \mathbf{H}_2(k_2)\, S_2(k_2,t_2)$$

Basic Principle of Single Source Points Detection
Convolutive mixing
• Case 1: at point (k1, t1), S1(k1, t1) ≠ 0 and S2(k1, t1) = 0: single source point 1, X(k1, t1) = H1(k1) S1(k1, t1)
• Case 2: at point (k2, t2), S1(k2, t2) = 0 and S2(k2, t2) ≠ 0: single source point 2, X(k2, t2) = H2(k2) S2(k2, t2)
Angle between two complex vectors u1 and u2:
$$\cos(\theta_C) = \frac{\mathbf{u}_1^H \mathbf{u}_2}{\|\mathbf{u}_1\|\, \|\mathbf{u}_2\|} = \rho\, e^{j\varphi}, \qquad \cos(\theta_H) = \left|\cos(\theta_C)\right| = \rho$$
θ_H is called the Hermitian angle, 0 ≤ θ_H ≤ π/2, and φ is called the pseudo angle.
The Hermitian angle between the complex vectors u1 and u2 will remain the same even if the vectors are multiplied by any complex scalars, whereas the pseudo angle will change.

Algorithm for Single Source Points Detection
[Figure: TF representations X1(k, t) and X2(k, t) of the mixtures x1 and x2, with the point (k1, t1) marked.]
$$\mathbf{X}(k_1,t_1) = \begin{bmatrix} X_1(k_1,t_1) \\ X_2(k_1,t_1) \end{bmatrix} = \begin{bmatrix} H_{11}(k_1) \\ H_{21}(k_1) \end{bmatrix} S_1(k_1,t_1) \quad \text{OR} \quad \begin{bmatrix} H_{12}(k_1) \\ H_{22}(k_1) \end{bmatrix} S_2(k_1,t_1)$$
The first alternative gives the Hermitian angle θ_H1 and the second gives θ_H2, where
$$\theta_H = \cos^{-1}\left( \frac{\left| \mathbf{X}(k_1,t_1)^H\, \mathbf{r} \right|}{\left\| \mathbf{X}(k_1,t_1) \right\| \left\| \mathbf{r} \right\|} \right), \qquad \mathbf{r} = \begin{bmatrix} 1 + j1 \\ 1 + j1 \end{bmatrix}$$

Mask Estimation by k-means (KM)
$$Y_q(k,t) = M_q(k,t)\, X_p(k,t) \quad \forall t, \qquad q = 1, \ldots, Q$$
[Figure: clean and estimated signals.]

Mask Estimation by Fuzzy c-means (FCM)
$$Y_q(k,t) = M_q(k,t)\, X_p(k,t) \quad \forall t, \qquad q = 1, \ldots, Q$$
[Figure: clean and estimated signals.]

Automatic Detection of Number of Sources
Cluster validation technique:
For c = 2 to c_max
    Cluster the data into c clusters.
    Calculate the cluster validation index.
End
Take c corresponding to the best cluster as the number of sources.

Elimination of Low Energy Points
[Figure.]
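The invariance property of the Hermitian angle stated on the "Basic Principle of Single Source Points Detection" slide can be verified numerically. The sketch below is illustrative only: the vector u2 and the complex scalar are hypothetical values, and u1 is taken to be the reference vector r = [1 + j1, 1 + j1]ᵀ from the detection slide.

```python
import cmath
import math

def complex_cosine(u1, u2):
    """cos(theta_C) = u1^H u2 / (||u1|| ||u2||), a complex number."""
    inner = sum(a.conjugate() * b for a, b in zip(u1, u2))
    n1 = math.sqrt(sum(abs(a) ** 2 for a in u1))
    n2 = math.sqrt(sum(abs(b) ** 2 for b in u2))
    return inner / (n1 * n2)

def hermitian_angle(u1, u2):
    """theta_H = arccos(|cos(theta_C)|), in [0, pi/2]."""
    return math.acos(min(1.0, abs(complex_cosine(u1, u2))))

def pseudo_angle(u1, u2):
    """Phase of cos(theta_C)."""
    return cmath.phase(complex_cosine(u1, u2))

r = [1 + 1j, 1 + 1j]                  # reference vector from the slide
u2 = [0.8 + 0.1j, 0.2 - 0.5j]         # hypothetical mixing vector H_q(k)
scale = cmath.rect(2.0, 1.0)          # hypothetical complex scalar S_q(k, t)
u2_scaled = [scale * z for z in u2]

theta_before = hermitian_angle(r, u2)
theta_after = hermitian_angle(r, u2_scaled)     # unchanged by the scaling
phi_before = pseudo_angle(r, u2)
phi_after = pseudo_angle(r, u2_scaled)          # shifted by the phase of `scale`
```

Because X(k, t) = H_q(k) S_q(k, t) at a single source point, the Hermitian angle between X(k, t) and r depends only on H_q(k) and not on the complex source value, so points of the same source in the same bin share one angle and can be clustered.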