AN ADAPTIVE ALGORITHM FOR ICA
Joby Joseph and K. V. S. Hari
Indian Institute of Science, Bangalore, India, 560012
ABSTRACT
A new adaptive algorithm for Independent Component Analysis (ICA) has been developed. It applies stochastic gradient descent directly to the independence criterion. The cost surface is seen to be fairly smooth, and the developed algorithm is always stable. It converges faster than earlier algorithms. The algorithm is developed for the general n x n system case. Experiments are given to verify these claims.
1. INTRODUCTION
Independent Component Analysis (ICA) refers to methods by which statistically independent features are recovered from mixtures. In the current paper linear mixtures are considered. Given

x(t) = A\,s(t)   (1)

where x(t) = [x_1(t), \ldots, x_n(t)]^T, s(t) = [s_1(t), \ldots, s_n(t)]^T and A is the mixing matrix, the problem is the estimation of the mixing system A and of the original signal s(t).

Principal Component Analysis (PCA) is the first step in the signal recovery [1]. It is assumed that PCA has been done for the original problem of signal separation. Thus s(t) is taken to have independent elements, and a pre-whitening matrix M has been estimated such that

z(t) = M\,x(t)   (2)

E\{z(t)\,z^T(t)\} = I   (3)

Thus after the PCA stage

z(t) = U\,s(t)   (4)

where U is unitary and has to be estimated in order to recover s(t). Equivalently, this can be stated as the estimation of a unitary Q such that the elements of y(t) are independent in

y(t) = Q\,z(t)   (5)

A good cost function for estimating Q such that y(t) has independent components is the minimization of the mutual information of the elements of y(t); y(t) is then an estimate of s(t), but with ambiguity in the permutation of the elements. Under the whiteness constraint this is the same as minimizing the sum of the entropies of the elements of y(t),

C(Q) = -\sum_{i=1}^{n} \int f_{y_i}(u) \log f_{y_i}(u)\,du   (6)

where f_{y_i} is the density function of y_i. Here we deal only with identically distributed sources. If the sources are differently distributed, then the distribution is the signature used for identifying the recovered sources. This cost function can be used for estimating the sources, within scaling and permutation, if the sources are not Gaussian; this much is evident from the works [2], [3], [1]. There exist certain adaptive algorithms which estimate Q. Some assume the source densities to be known, in which case the estimates converge faster. In some other attempts these densities are assumed to be of a certain form, or the densities themselves are also estimated during the process. These are reviewed in [2]. Here we initially assume the source densities to be known and develop the adaptive algorithm.
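To make the PCA stage concrete, a pre-whitening matrix satisfying (2) and (3) can be estimated from the sample covariance. The following Python sketch shows one standard way of doing this; it is an illustration only (the helper name, data sizes and random mixing matrix are not from the paper).

```python
import numpy as np

def whitening_matrix(x):
    """Estimate a pre-whitening matrix M from mixture samples.

    x : (n, T) array, one column per time instant.
    Returns M such that z = M @ x has approximately identity
    covariance, as required by (2) and (3).
    """
    x = x - x.mean(axis=1, keepdims=True)      # remove the sample mean
    C = (x @ x.T) / x.shape[1]                 # sample covariance of x
    d, E = np.linalg.eigh(C)                   # C = E diag(d) E^T
    return np.diag(1.0 / np.sqrt(d)) @ E.T     # M = diag(d)^(-1/2) E^T

# Two independent uniform sources with unit variance, mixed by a random A:
rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 10000))
x = rng.standard_normal((2, 2)) @ s
M = whitening_matrix(x)
z = M @ x
print(np.round((z @ z.T) / z.shape[1], 2))     # ~ identity, cf. (3)
```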
2. ADAPTIVE ALGORITHM

We have to find a unitary Q such that (5) is satisfied. Thus Q\,Q^T = I and Q is a rotation matrix. In an adaptive algorithm we start with an initial value and adapt it, using the samples y(t), till it converges to the optimal Q. This is thus a stochastic gradient descent formulation: we need to find a sequence of rotations R(t) such that Q(t) converges to the optimal Q,

y(t) = Q(t)\,z(t)   (7)

Q(t+1) = R(t+1)\,Q(t)   (8)

Pairwise rotation of the vectors forming Q is enough to achieve this [1]. So we develop an algorithm which sequentially operates a rotation on each pair so that the cost (6) is minimized. Q and R are of size n \times n. In an iteration where the i-th and j-th elements of y(t) form the pair chosen for processing, R is of the form below, with \cos\theta(t+1) and \sin\theta(t+1) appearing in the i-th and j-th rows and columns and all remaining diagonal entries equal to one:

R(t+1) = \begin{bmatrix}
1 & & & & \\
& \cos\theta(t+1) & \cdots & -\sin\theta(t+1) & \\
& \vdots & \ddots & \vdots & \\
& \sin\theta(t+1) & \cdots & \cos\theta(t+1) & \\
& & & & 1
\end{bmatrix}   (9)
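The rotation (9) is a plane (Givens) rotation. A minimal Python sketch of its construction (the helper name is illustrative):

```python
import numpy as np

def pair_rotation(n, i, j, theta):
    """Return the n x n matrix R of (9): identity everywhere except
    rows/columns i and j, which carry a plane rotation by theta."""
    R = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i], R[i, j] = c, -s
    R[j, i], R[j, j] = s, c
    return R

R = pair_rotation(3, 0, 2, np.pi / 6)   # rotate the (0, 2) pair by 30 degrees
print(np.allclose(R @ R.T, np.eye(3)))  # True: R is unitary
```

Because every factor R(t) is unitary, the product Q(t) remains unitary at every step, which is what makes the algorithm unconditionally stable (Section 3).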
Thus the problem is reduced to finding a sequence

\theta_{ij}(t+1) = \theta_{ij}(t) + \Delta\theta_{ij}(t)

for each pair (y_i(t), y_j(t)) such that \theta_{ij}(t) tends to the optimal \theta_{ij}^{opt}. Now we make use of the ergodicity and stationarity of s(t) and adapt with the instantaneous optimal \Delta\theta_{ij}^{opt}(t), where optimality is with respect to -\sum_i \log f_{y_i}(y_i(t)) at instant t, so that the expected value in (6) is reached as t \to \infty. The instantaneous optimal rotation \Delta\theta_{ij}^{opt} is given by the solution to

\frac{\partial}{\partial\,\Delta\theta_{ij}} \sum_{k=1}^{n} \log f_{y_k}(y_k) = 0   (10)

which reduces to

\sum_{k \in \{i,j\}} \frac{\partial}{\partial\,\Delta\theta_{ij}} \log f_{y_k}(y_k) = 0   (11)

since in (10) the terms other than k = i, j are zero when the pair (i, j) is being considered. Here

\psi_k(u) = \frac{f_{y_k}'(u)}{f_{y_k}(u)}   (12)

denotes the score function of the k-th density. Substituting (9) and (8) in (11), the pair after rotation is

\bar{y}_i = \cos(\Delta\theta_{ij})\,y_i - \sin(\Delta\theta_{ij})\,y_j, \qquad \bar{y}_j = \sin(\Delta\theta_{ij})\,y_i + \cos(\Delta\theta_{ij})\,y_j

and (11) becomes

\psi_j(\bar{y}_j)\,\bar{y}_i - \psi_i(\bar{y}_i)\,\bar{y}_j = 0   (13)
Solving (13) for the rotation angle gives

\tan(\Delta\theta_{ij}^{opt}) = \frac{\psi_j(y_j)\,y_i - \psi_i(y_i)\,y_j}{\psi_i(y_i)\,y_i + \psi_j(y_j)\,y_j} = u   (14)

\Delta\theta_{ij}^{opt} = \tan^{-1}(u)   (15)

Approximating \tan^{-1} piecewise linearly and denoting the sign of u by \mathrm{sgn}(u),

\Delta\theta_{ij}^{opt} = \begin{cases} u, & |u| < \pi/2 \\ \mathrm{sgn}(u)\,\pi/2, & |u| \ge \pi/2 \end{cases}   (16)

Hence the adaptation equation can be written as (8) and (9), with \theta(t+1) = \mu\,\Delta\theta^{opt}(t), where \mu > 0 is the step size parameter.
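With (14)-(16) in this form, a single instantaneous pair update can be sketched in Python as follows (an illustration under the stated assumptions: the score functions of the source densities are known, and the clamping follows (16); the helper name is not from the paper).

```python
import numpy as np

def pair_angle_update(yi, yj, psi_i, psi_j, mu):
    """One instantaneous update angle for the pair (yi, yj).

    psi_i, psi_j : score functions f'/f of the assumed source densities
    mu : step size parameter (mu > 0)
    Returns mu * delta_theta_opt, the rotation angle used in (9).
    """
    num = psi_j(yj) * yi - psi_i(yi) * yj     # numerator of (14)
    den = psi_i(yi) * yi + psi_j(yj) * yj     # denominator of (14)
    if abs(den) < 1e-12:                      # degenerate sample: no rotation
        return 0.0
    u = num / den
    delta = np.sign(u) * min(abs(u), np.pi / 2)   # piecewise-linear (16)
    return mu * delta
```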
Fig. 2. A set of the source, the mixture and the recovered
signals are plotted in this figure.
Algorithm:

1. Initialize Q(0) = I.
2. Calculate y(t) = Q(t)\,z(t).
3. Obtain \Delta\theta^{opt}(t) as in (16).
4. Form R(t+1) as in (9) with \theta(t+1) = \mu\,\Delta\theta^{opt}(t).
5. Update Q(t+1) using (8).
6. Do steps 3 to 5 for each (i, j) pair.
7. Increment t and go to step 2.

A self-contained sketch of this loop is given below.
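The following Python sketch puts steps 1-7 together (an illustration only: the function name is not from the paper, the score function psi is assumed known and common to all sources, and the default step size 0.07 is the value quoted for Example 1).

```python
import numpy as np

def adaptive_ica(z, psi, mu=0.07, sweeps=1):
    """Adaptive pairwise-rotation ICA on pre-whitened samples.

    z : (n, T) array of whitened mixtures (output of the PCA stage)
    psi : score function f'/f of the assumed common source density
    mu : step size parameter
    """
    n, T = z.shape
    Q = np.eye(n)                                   # step 1: Q(0) = I
    for _ in range(sweeps):
        for t in range(T):
            y = Q @ z[:, t]                         # step 2: y(t) = Q(t) z(t)
            for i in range(n - 1):                  # step 6: all (i, j) pairs
                for j in range(i + 1, n):
                    num = psi(y[j]) * y[i] - psi(y[i]) * y[j]
                    den = psi(y[i]) * y[i] + psi(y[j]) * y[j]
                    if abs(den) < 1e-12:
                        continue
                    u = num / den                   # step 3: (14)-(16)
                    th = mu * np.sign(u) * min(abs(u), np.pi / 2)
                    c, s = np.cos(th), np.sin(th)
                    R = np.eye(n)                   # step 4: form R as in (9)
                    R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
                    Q = R @ Q                       # step 5: update Q via (8)
                    y = Q @ z[:, t]
    return Q                                        # step 7 is the t loop

# Possible use with a cubic score surrogate for sub-Gaussian (e.g. uniform)
# sources -- a common substitute when the exact score is degenerate:
# Q = adaptive_ica(z, psi=lambda u: u ** 3)
```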
Fig. 1. The error measure for the estimated separating matrix, plotted against the iteration number.
3. STABILITY AND CONVERGENCE

Since R is always unitary, the algorithm is always stable. As in general stochastic gradient descent algorithms, the solution will converge if a minimum exists; this may be a local minimum. To get an idea of the topology of the cost function, it can be plotted for some typical cases. This can be done by estimating (6) from the recovered sources, with Q varying over different rotation angles. To do this, choose U of the form

U = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}   (17)

Then generate s(t) of the required distribution and in turn obtain z(t) with (4). Generate Q of the same form as U in (17), but with angle \psi instead of \phi. Vary \psi from -\pi/2 to \pi/2 and compute y(t) as in (5) for each \psi. The cost (6) can then be estimated from these samples of y(t) by finding their normalized histogram and using it as the estimate of a probability mass function, this probability mass function being a sampled version of the density function f_y.

The error measure for the estimated separating matrix Q is the one described in [1]; it checks how close Q\,U is to a permutation matrix.
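The histogram-based estimate of (6) used here can be sketched as follows (illustrative Python; the bin count is arbitrary). The entropy of a normalized histogram differs from the differential entropy by the logarithm of the bin width, which is why, in Example 2 below, the true density is sampled to the same number of points as the histogram bins before the two are compared.

```python
import numpy as np

def cost_estimate(y, bins=50):
    """Estimate the cost (6): sum of marginal entropies of the rows of y,
    each computed from a normalized histogram used as a probability
    mass function."""
    total = 0.0
    for row in y:                       # one recovered component per row
        p, _ = np.histogram(row, bins=bins)
        p = p / p.sum()                 # normalized histogram (a pmf)
        p = p[p > 0]                    # drop empty bins to avoid log(0)
        total += -np.sum(p * np.log(p))
    return total
```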
Fig. 3. The cost function for the 2 \times 2 system of (17), plotted against the rotation angle \psi.

Fig. 4. The rate of convergence of the present algorithm compared with that of the earlier algorithm.
4. SIMULATIONS
Example 1. This example demonstrates the convergence of the algorithm for a 4 \times 4 system. The sources are independent and uniformly distributed with unit variance. For this case the error measure against the iteration number is plotted in figure (1); the curve is an ensemble average over 100 runs. The step size parameter was 0.07. Also given in figure (2) is an example of the recovered signals compared to the original and the mixed signals.

Example 2. In this example the cost surface of the stochastic gradient algorithm is examined under the conditions described in Section 3. For the 2 \times 2 system U of (17) the cost surface is plotted in figure (3), with the source density uniform with unit variance, i.e. supported on [-\sqrt{3}, \sqrt{3}]. If this density function is sampled to the same number of points as the bins in the histogram used for the computation from samples, and the entropy of the resulting probability mass function is calculated, it agrees with the minimum of the plotted cost. The plotted cost is an average over 100 runs for each \psi, with 100 equally spaced values of \psi between -\pi/2 and \pi/2. Notice that there are two local minima with identical cost values. As seen from figure (3), these minimum values correspond to the cost of the original unmixed signals, implying that both minima are equally valid solutions for the separation matrix Q.
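The sweep of Example 2 can be reproduced in outline with the sketch below (illustrative Python: the mixing angle, sample count and bin count are arbitrary choices, not values from the paper, and cost_estimate is the histogram-based sketch from Section 3).

```python
import numpy as np

def rot(a):
    """2 x 2 rotation of the form (17) with angle a."""
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

def cost_estimate(y, bins=50):
    total = 0.0
    for row in y:
        p, _ = np.histogram(row, bins=bins)
        p = p / p.sum()
        p = p[p > 0]
        total += -np.sum(p * np.log(p))
    return total

rng = np.random.default_rng(1)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))  # unit variance
phi = np.pi / 5                                           # arbitrary angle
z = rot(phi) @ s                                          # cf. (4) and (17)

angles = np.linspace(-np.pi / 2, np.pi / 2, 100)          # sweep of psi
costs = [cost_estimate(rot(a) @ z) for a in angles]       # cf. (5) and (6)
# The two minima sit where a + phi is a multiple of pi/2, i.e. where the
# overall rotation maps sources to (permuted, possibly negated) sources.
print(angles[np.argmin(costs)])
```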
Example 3. In this example the convergence of the new algorithm is compared with that of [3]. Identical 2 \times 2 mixing matrices were used, and the sources were uniform with unit variance. Suitable step size parameters were chosen for the new algorithm and for the old algorithm [3]. The convergence curves are given in figure (4); these plots are averaged over 100 runs of each algorithm. It is observed that the new algorithm converges faster to the same steady-state error.
5. CONCLUSION

If the source density functions are known, then this adaptive algorithm is well suited for recovery in the stage after PCA. The algorithm is stable irrespective of the parameters, and it optimizes a cost function that is optimal for ICA. Its convergence is faster than that of the earlier algorithms, but it is computationally more expensive. As further work, we are looking into dynamic estimation of the density functions, to avoid the requirement of a priori knowledge of them. Initial tests show that this is possible, but the convergence is poor.
6. REFERENCES
[1] Pierre Comon, “Independent component analysis, a
new concept?,” Signal Processing, vol. 36, pp. 287–
314, 1994.
[2] Jean-François Cardoso, “Blind signal separation: Statistical principles,” Proceedings of the IEEE, vol. 86, no. 10, pp. 2009–2025, October 1998.
[3] S. C. Douglas and S. Y. Kung, “Design of estimation/deflation approaches to independent component analysis,” in Proc. Asilomar Conference on Signals, Systems, and Computers, pp. 707–711, 1998.