AN ADAPTIVE ALGORITHM FOR ICA

Joby Joseph and K. V. S. Hari
Indian Institute of Science, Bangalore, India, 560012

ABSTRACT

A new adaptive algorithm for Independent Component Analysis (ICA) is developed. It applies stochastic gradient descent directly to the independence criterion. The cost surface is seen to be fairly smooth, and the algorithm is always stable. It converges faster than an earlier algorithm, and it is developed for the general system case. Experiments are given to verify these claims.

1. INTRODUCTION

Independent Component Analysis (ICA) refers to methods by which statistically independent features are recovered from mixtures; in the current paper, linear mixtures are considered. Given

    $x(k) = A s(k)$,   (1)

where $s(k) = [s_1(k), \ldots, s_n(k)]^T$, $x(k) = [x_1(k), \ldots, x_n(k)]^T$, and $A$ is the mixing matrix, the problem is the estimation of the mixing system $A$ and the original signal $s(k)$. Principal Component Analysis (PCA) is the first step in the signal recovery [1], and it is assumed that PCA has already been done for the original problem of signal separation: a pre-whitening matrix $W$ has been estimated such that

    $v(k) = W x(k)$   (2)

has uncorrelated, unit-variance elements,

    $E[\, v(k)\, v(k)^T \,] = I$.   (3)

Thus, after the PCA stage,

    $v(k) = Q s(k)$,   (4)

where $Q$ is unitary and has to be estimated in order to recover $s(k)$. Equivalently, this can be stated as the estimation of a unitary matrix $U$ such that the elements of

    $y(k) = U v(k)$   (5)

are independent.

A good cost function for estimating $U$ such that $y$ has independent components is the minimization of the mutual information of the elements of $y$; $y$ is then an estimate of $s$, with ambiguity in the permutation of the elements. Under the whiteness constraint, this is the same as minimizing the sum of the entropies of the elements of $y$,

    $C(U) = \sum_{i=1}^{n} H(y_i) = -\sum_{i=1}^{n} \int p_{y_i}(u) \log p_{y_i}(u)\, du$,   (6)

where $p_{y_i}$ is the density function of $y_i$. Here we deal only with identically distributed sources; if the sources are differently distributed, then the density is the signature used for identifying the recovered sources. This cost function can be used for estimating the sources, within scaling and permutation, provided the sources are not Gaussian. This much is evident from the works [2], [3], [1].

There exist certain adaptive algorithms which estimate $U$. Some assume the source densities to be known, in which case the estimates converge faster; in other attempts these densities are assumed to be of a certain form, or the densities themselves are also estimated during the process. These are reviewed in [2]. Here we initially assume the source densities to be known and develop the adaptive algorithm from there.
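As a concrete illustration of the setup in (1)–(4), the following Python sketch mixes independent unit-variance uniform sources and estimates a pre-whitening matrix $W$ satisfying (3). The matrix names $A$, $W$, $v$ follow the text above; the random seed, sample count, and the eigendecomposition-based construction of $W$ are illustrative choices and not taken from the paper.

```python
import numpy as np

# Minimal sketch of the model (1) and the PCA pre-whitening stage (2)-(4).
rng = np.random.default_rng(0)
n, N = 2, 10_000

# s: independent, uniformly distributed sources with unit variance
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, N))

A = rng.standard_normal((n, n))      # arbitrary mixing matrix, eq. (1)
x = A @ s                            # observed mixtures

# Pre-whitening: W = D^{-1/2} E^T from the eigendecomposition of the
# sample covariance of x, so that v = W x satisfies E[v v^T] = I, eq. (3).
Rxx = (x @ x.T) / N
d, E = np.linalg.eigh(Rxx)
W = np.diag(1.0 / np.sqrt(d)) @ E.T
v = W @ x                            # v = Q s for some unitary Q, eq. (4)

print(np.round((v @ v.T) / N, 2))    # approximately the identity matrix
```

After this stage only the unitary factor $Q$ remains unknown, which is what the adaptive algorithm below estimates.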
2. ADAPTIVE ALGORITHM

We have to find a unitary $U$ such that the elements of $y$ in (5) are independent; since $Q$ in (4) is unitary, such a $U$ is a rotation matrix. In the adaptive algorithm we start with an initial value $U(0)$ and adapt it, using the samples $v(k)$, till it converges to the optimal $U$. This is therefore a stochastic gradient descent formulation: we need to find a sequence of unitary updates $\Phi(k)$ such that $U(k)$ converges to the optimal $U$,

    $y(k) = U(k)\, v(k)$,   (7)

    $U(k+1) = \Phi(k+1)\, U(k)$.   (8)

Pairwise rotation of the vectors forming $U$ is enough to achieve this [1]. So we develop an algorithm which sequentially operates rotations on each pair so that the cost in (6) is minimized. $U$ and $\Phi$ are of size $n \times n$, and in an iteration the $i$-th and $j$-th elements of $y$ form the pair. During processing, $\Phi(k)$ is therefore a Givens rotation, equal to the identity except in the $i$-th and $j$-th rows and columns:

    $\Phi_{ii}(k) = \cos\theta_{ij}(k)$, $\Phi_{ij}(k) = \sin\theta_{ij}(k)$, $\Phi_{ji}(k) = -\sin\theta_{ij}(k)$, $\Phi_{jj}(k) = \cos\theta_{ij}(k)$.   (9)

Thus the problem is reduced to finding a sequence $\theta_{ij}(0), \theta_{ij}(1), \ldots, \theta_{ij}(k)$ for each pair $(y_i, y_j)$ such that $\theta_{ij}(k)$ tends to the optimal $\theta_{ij}$. Now we make use of the ergodicity and stationarity of $s$ and adapt with the instantaneous optimal $\Delta\theta_{ij}(k)$, where optimality at instant $k$ is with respect to $-\log p_{y_h}(y_h(k))$, so that the expected value in (6) is reached as $k \to \infty$. The instantaneous optimal rotation $\Delta\theta_{ij}$ is given by the solution to

    $\frac{\partial}{\partial\, \Delta\theta_{ij}} \sum_{h \in \{i,j\}} -\log p_{y_h}(y_h(k)) = 0$,   (10)

    $\frac{\partial}{\partial\, \Delta\theta_{ij}} \sum_{h=1}^{n} -\log p_{y_h}(y_h(k)) = 0$,   (11)

where the score terms

    $\varphi_h(u) = \frac{p'_{y_h}(u)}{p_{y_h}(u)}$   (12)

appear in the derivatives. Notice that (10) arises from (11) because in (11) the terms other than $h \in \{i, j\}$ are zero when the pair $(i, j)$ is being considered. Substituting (9) and (8) in (11) gives, with $\tilde{y}_i = \cos(\Delta\theta_{ij})\, y_i + \sin(\Delta\theta_{ij})\, y_j$ and $\tilde{y}_j = -\sin(\Delta\theta_{ij})\, y_i + \cos(\Delta\theta_{ij})\, y_j$,

    $\varphi_i(\tilde{y}_i)\,\big(-\sin(\Delta\theta_{ij})\, y_i + \cos(\Delta\theta_{ij})\, y_j\big) - \varphi_j(\tilde{y}_j)\,\big(\cos(\Delta\theta_{ij})\, y_i + \sin(\Delta\theta_{ij})\, y_j\big) = 0$.   (13)

Solving for $\Delta\theta_{ij}$,

    $u = \frac{\varphi_i(y_i)\, y_j - \varphi_j(y_j)\, y_i}{\varphi_i(y_i)\, y_i + \varphi_j(y_j)\, y_j}$,   (14)

    $\Delta\theta_{ij} = \tan^{-1}(u)$.   (15)

Approximating $\tan^{-1}$ piecewise linearly and denoting the sign of $u$ by $\mathrm{sgn}(u)$,

    $\Delta\theta_{ij} \approx \begin{cases} u, & |u| \le \pi/2 \\ \mathrm{sgn}(u)\, \pi/2, & |u| > \pi/2. \end{cases}$   (16)

Hence the adaptation equation can be written as (8) and (9), with $\theta_{ij}(k+1) = \theta_{ij}(k) + \mu\, \Delta\theta_{ij}(k)$, where $\mu > 0$ is the step size parameter.

The algorithm is then:

1. Initialize $U(0) = I$.
2. Calculate $y(k) = U(k)\, v(k)$.
3. Obtain $\Delta\theta_{ij}(k)$ as in (16).
4. Form $\Phi(k)$ as in (9) with rotation angle $\mu\, \Delta\theta_{ij}(k)$.
5. Update $U$ using (8).
6. Do steps 3 to 5 for each $(i, j)$ pair.
7. Increment $k$ and go to step 2.
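To make the sweep structure of steps 1–7 concrete, here is a minimal Python sketch of one adaptation step. It assumes the source density is known, as the derivation above does; the score function `phi` used here, $\varphi(u) = -u^3$ (the score of a density proportional to $\exp(-u^4/4)$, a sub-Gaussian stand-in), is an illustrative assumption and not the paper's choice, and the $\Delta\theta$ rule follows the reconstructed (14)–(16). Treat it as a sketch, not the authors' implementation.

```python
import numpy as np

def phi(u):
    """Illustrative stand-in for the score p'(u)/p(u) of eq. (12)."""
    return -u**3

def givens(n, i, j, theta):
    """Givens rotation of eq. (9): identity except in rows/columns i, j."""
    Phi = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    Phi[i, i] = Phi[j, j] = c
    Phi[i, j] = s
    Phi[j, i] = -s
    return Phi

def adapt_step(U, v, mu=0.07):
    """Process one whitened sample v (steps 2-6); returns the updated U."""
    n = U.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            y = U @ v                                    # eq. (7); recomputed
            num = phi(y[i]) * y[j] - phi(y[j]) * y[i]    # eq. (14), numerator
            den = phi(y[i]) * y[i] + phi(y[j]) * y[j]    # eq. (14), denominator
            u = num / den if den != 0 else 0.0
            d_theta = np.clip(u, -np.pi / 2, np.pi / 2)  # clipped rule, eq. (16)
            U = givens(n, i, j, mu * d_theta) @ U        # steps 4-5, eq. (8)
    return U

# Usage: U = np.eye(n) (step 1), then U = adapt_step(U, v[:, k]) per sample k.
```

Note that each $\Phi$ is exactly unitary, so $U$ remains unitary after every update; this is the stability property discussed in the next section.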
3. STABILITY AND CONVERGENCE

Since $\Phi(k)$ is always unitary, $U(k)$ remains unitary and the algorithm is always stable. As in general stochastic gradient descent algorithms, the solution will converge if a minimum exists; this may be a local minimum.

To get an idea of the topology of the cost function, it can be plotted for some typical cases by estimating (6) from the recovered sources, with $U$ varied over different rotation angles. To do this, choose a unitary mixing matrix of the form

    $A = \begin{bmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{bmatrix}$.   (17)

Then generate $s$ of the required distribution and in turn obtain $v$, with (4). Generate $U$ of the same form as $A$ in (17), but with angle $\psi$ instead of $\phi$. Vary $\psi$ from $-\pi/2$ to $\pi/2$ and compute $y$ as in (5) for each $\psi$. The estimate of (6) can be obtained from these samples of $y$ by finding their normalized histogram and using it as the estimate of the probability mass function, this probability mass function being the sampled version of the density function $p_y$.

4. SIMULATIONS

Example 1. This example demonstrates the convergence of the algorithm for a $4 \times 4$ system. The $s_i$ are independent and uniformly distributed sources with unit variance, and the mixing matrix chosen was a fixed $4 \times 4$ unitary matrix (18). The error measure for the estimated $U$ is the one described in [1]: it checks how close $U Q$ is to a permutation matrix. For the given case the error with sample number is plotted in figure (1); the curve is an ensemble average of 100 runs, and the step size parameter was 0.07. Also given in figure (2) is an example of the recovered signals compared to the original and the mixed signals.

Fig. 1. The error measure for the estimate of the separating matrix, plotted against sample number.

Fig. 2. A set of the source, the mixture, and the recovered signals.

Example 2. In this example the cost surface for the stochastic gradient algorithm is examined when the conditions are as described in (3). For the $2 \times 2$ system the cost surface is plotted in figure (3), with the source density uniform and of unit variance. If this density function is sampled to the same number of points as the bins in the histogram used for the computation from samples, the entropy of the resulting probability mass function gives the cost of the original unmixed signals. Here the plotted cost is an average over 100 runs for each $\psi$, with equally spaced values of $\psi$ chosen between $-\pi/2$ and $\pi/2$. Notice that there are two local minima with identical cost values. As seen from figure (3), these minimum values correspond to the cost of the original unmixed signals, implying that both minima are equally valid solutions for the separation matrix $U$.

Fig. 3. The cost function for the $2 \times 2$ system.

Example 3. In this example the convergence of the new algorithm is compared with that in [3]. Identical mixing matrices were used, and the sources were uniform with unit variance. The step size was $\mu = 0.07$ for the new algorithm and 0.002 for the old algorithm [3]. The convergence curves are given in figure (4); these plots are averaged over 100 runs of the algorithm. It is observed that the new algorithm converges faster to the same steady-state error.

Fig. 4. The rate of convergence of the present algorithm compared with that of the earlier algorithm.

5. CONCLUSION

If the source density functions are known, this adaptive algorithm is well suited for recovery in the stage after PCA. The algorithm is stable irrespective of the parameters, and it optimizes a cost function that is optimal for ICA. Its convergence is faster than that of the earlier algorithms, but it is computationally more expensive. As further work, we are looking into dynamic estimation of the density function, to avoid the requirement of a priori knowledge of the densities. Initial tests show that this is possible, but the convergence is poor.

6. REFERENCES

[1] Pierre Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, pp. 287–314, 1994.

[2] Jean-François Cardoso, "Blind signal separation: Statistical principles," Proceedings of the IEEE, vol. 86, no. 10, pp. 2009–2025, October 1998.

[3] S. C. Douglas and S. Y. Kung, "Design of estimation/deflation approaches to independent component analysis," in Proc. Asilomar Conference on Signals, Systems, and Computers, pp. 707–711, 1998.