Building Soft Clusters of Data via the Information Bottleneck Method
Russell J. Ricker¹, Albert E. Parker¹, Tomáš Gedeon¹, Alexander G. Dimitrov²
¹ Montana State University, Department of Mathematical Sciences
² Washington State University, Department of Mathematics
Abstract
Information-based distortion methods have been used successfully to analyze the relationship between stimulus and response spaces, for example in neural coding and text classification. Distortion methods make few assumptions about the correspondence between the two spaces, providing maximally informative relationships between them. I used the Information Bottleneck method to compute a soft clustering of a synthetic data set of 50 stimuli and 50 neural responses, drawn from a multivariate Gaussian distribution with four well-separated modes ("blobs") that models their hypothetical relationship. The algorithm uses an annealing method to solve the high-dimensional nonlinear problem and was implemented in Matlab®. As the annealing parameter increased, the solution underwent a series of phase transitions, or bifurcations, that eventually stabilized to a nearly deterministic clustering. By computing the matrix of second derivatives (the Hessian), we can determine when the bifurcations occur. By computing arrays of third and fourth derivatives, we can determine whether the bifurcations are subcritical or supercritical. The existence of subcritical branches implies that there are solutions the method of annealing does not find. Because the method of annealing is guaranteed to converge, a subcritical branch must turn at a later bifurcation and become optimal. I show that we can obtain nearly optimal information preservation for N = 4 classes, and provide numerical evidence that subcritical bifurcations exist.
Figure 1. A 52-by-52 class Gaussian distribution with well-separated peaks, used as the test distribution for the results. The vertical axis is the input space, the horizontal axis is the output space, and dark areas correspond to high-probability regions.
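For readers who want to reproduce a test distribution of this kind, below is a minimal sketch in Python. The 52-by-52 grid matches the caption; the mode centers and width (`centers`, `sigma`) are illustrative assumptions, not the exact parameters behind Figure 1.

```python
import numpy as np

def four_blob_joint(n=52, centers=((13, 13), (13, 39), (39, 13), (39, 39)),
                    sigma=3.0):
    """Discrete joint distribution p(x, y) on an n-by-n grid, built as a
    mixture of four well-separated Gaussian bumps ("blobs").  The centers
    and sigma are illustrative guesses, not the poster's exact values."""
    xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    p = np.zeros((n, n))
    for cx, cy in centers:
        p += np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return p / p.sum()  # normalize so the entries sum to 1

pxy = four_blob_joint()  # rows index the input space X, columns the output Y
```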
Figure 4. Bifurcation diagram for N = 4. The panels illustrate the sequence of symmetry-breaking bifurcations from the branch $(q_{1/N}, \lambda, \beta)$ with symmetry $S_4$, to symmetry $S_3$ (blue), to symmetry $S_2$ (black), to symmetry $S_1$ (cyan). From [2], p. 17.
Theoretical Background
The Information Bottleneck Method clusters the response space, Y, into a much smaller space, T. To cluster the response space informatively, the mutual information between Y and T is minimized subject to the constraint that the mutual information between the stimulus space, X, and T is greater than or equal to some information rate, $I_0$.
The Basic Optimization Problem [2,4]:

$$R_I(I_0):\quad \min_{q \in \Delta} I(Y;T) \quad \text{subject to} \quad I(X;T) \ge I_0,$$

where $I_0$ is an information rate and $q \coloneqq q(T|Y)$ is a conditional probability that maps Y to T.
Figure 2. The annealing parameter β versus the value of the mutual information between the input space X and the clustered output space T for the Information Bottleneck Method. The dotted line is the value of the mutual information between X and Y.
In the above, $\Delta$ is the set of valid soft clusterings,

$$\Delta \coloneqq \left\{ q(T|Y) \;:\; \sum_{t} q(t \mid y) = 1,\ \ q(t \mid y) \ge 0 \ \ \forall\, t, y \right\},$$

and the two mutual informations, written as functions of $q$, are

$$I(X;T) \coloneqq \sum_{x,y,t} q(t \mid y)\, p(x,y)\, \log_2 \frac{\sum_{y'} q(t \mid y')\, p(x,y')}{p(x) \sum_{y'} p(y')\, q(t \mid y')},$$

$$I(Y;T) \coloneqq \sum_{y,t} q(t \mid y)\, p(y)\, \log_2 \frac{q(t \mid y)}{\sum_{y'} p(y')\, q(t \mid y')}.$$
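As a sanity check on these formulas, here is a small numerical sketch that evaluates both quantities for a joint distribution `pxy` (for example the four-blob distribution above) and a soft clustering array `q` with `q[t, y] = q(t|y)`. It uses the equivalent marginalized forms $p(x,t) = \sum_y q(t \mid y)\, p(x,y)$ and $p(t) = \sum_y q(t \mid y)\, p(y)$ rather than the expanded sums.

```python
import numpy as np

def mutual_informations(pxy, q, eps=1e-15):
    """Return (I(X;T), I(Y;T)) for a joint pxy[x, y] = p(x, y) and a soft
    clustering q[t, y] = q(t|y); eps guards the logarithms at zeros."""
    px = pxy.sum(axis=1)                 # p(x)
    py = pxy.sum(axis=0)                 # p(y)
    pxt = pxy @ q.T                      # p(x,t) = sum_y q(t|y) p(x,y)
    pt = q @ py                          # p(t)   = sum_y q(t|y) p(y)
    I_XT = np.sum(pxt * np.log2(pxt / (np.outer(px, pt) + eps) + eps))
    pyt = q * py                         # pyt[t, y] = q(t|y) p(y)
    I_YT = np.sum(pyt * np.log2(q / (pt[:, None] + eps) + eps))
    return I_XT, I_YT
```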
Results

1. The Information Bottleneck Method provides nearly deterministic clusters of the Gaussian four-blob distribution.
2. Evaluating a bifurcation discriminator, we have numerical evidence that subcritical bifurcations exist, contrary to the hypotheses of others working with this method.
To solve the basic optimization problem, we utilize the method of annealing. The algorithm for this method is [2,3] (a runnable sketch follows the list):

1. Perform a β-step: let $\beta_{j+1} = \beta_j + s_j$, where $s_j > 0$.
2. Take $q^{(0)}_{j+1} = q_j + \eta$, for a small perturbation $\eta$, as an initial guess for the solution $q_{j+1}$ at $\beta_{j+1}$.
3. Solve $\max_{q \in \Delta} \big( -I(Y;T) + \beta_{j+1}\, I(X;T) \big)$ to obtain the maximizer $q_{j+1}$.
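Below is a sketch of this loop in Python. The poster does not specify the inner solver for step 3; as a stand-in, the sketch uses the standard self-consistent Information Bottleneck update $q(t \mid y) \propto p(t)\, e^{-\beta\, KL(p(x|y)\,\|\,p(x|t))}$, a common choice in the deterministic-annealing literature [3]. The step rule `s`, perturbation scale `eta`, and iteration counts are illustrative assumptions.

```python
import numpy as np

def ib_fixed_point(pxy, q, beta, n_iter=300, eps=1e-30):
    """Stand-in inner solver for max_q (-I(Y;T) + beta*I(X;T)): iterate the
    self-consistent update q(t|y) ~ p(t) * exp(-beta * KL(p(x|y)||p(x|t)))."""
    py = pxy.sum(axis=0)                               # p(y)
    px_y = pxy / (py[None, :] + eps)                   # p(x|y), one column per y
    h = np.sum(px_y * np.log(px_y + eps), axis=0)      # sum_x p(x|y) log p(x|y)
    for _ in range(n_iter):
        pt = q @ py                                    # p(t)
        px_t = (pxy @ q.T) / (pt[None, :] + eps)       # p(x|t), one column per t
        kl = h[None, :] - np.log(px_t + eps).T @ px_y  # kl[t, y] >= 0
        logits = np.log(pt + eps)[:, None] - beta * kl
        logits -= logits.max(axis=0)                   # stabilize the exponentials
        q = np.exp(logits)
        q /= q.sum(axis=0)                             # project back onto Delta
    return q

def anneal(pxy, N=4, beta0=0.1, beta_max=500.0, s=0.05, eta=1e-4, seed=0):
    """The annealing loop: beta-step, perturbed restart, re-solve."""
    rng = np.random.default_rng(seed)
    q = np.full((N, pxy.shape[1]), 1.0 / N)  # start on the uniform branch q_{1/N}
    beta, path = beta0, []
    while beta < beta_max:
        beta += s * beta                     # step 1: beta_{j+1} = beta_j + s_j
        # step 2: the perturbation eta lets the solver leave the symmetric branch
        q0 = np.abs(q + eta * rng.standard_normal(q.shape))
        q = ib_fixed_point(pxy, q0 / q0.sum(axis=0), beta)  # step 3: maximize
        path.append((beta, q.copy()))
    return path
```

Plotting $\beta$ against $I(X;T)$ for the pairs on the returned path produces a curve of the kind shown in Figure 2.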
How to Determine Solutions [2,4]:

The Lagrangian:

$$L(q, \lambda, \beta) = -I(Y;T) + \beta\, I(X;T) + \sum_{k=1}^{|Y|} \lambda_k \left( \sum_{\nu=1}^{|T|} q(t_\nu \mid y_k) - 1 \right)$$

The Hessian: $d_q^2\big( -I(Y;T) + \beta\, I(X;T) \big)$, with

$$J_1 \coloneqq d_q L, \qquad J_2 \coloneqq d_{q,\lambda} L.$$

If the Hessian is negative definite on $\ker(J_1)$ or $\ker(J_2)$ for a certain $q^*$, then that $q^*$ is a solution to the initial problem.
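For small problems this condition can be probed numerically. The sketch below builds a finite-difference Hessian of $F(q) = -I(Y;T) + \beta\, I(X;T)$ (reusing `mutual_informations` from above), projects it onto the kernel of the Jacobian of the constraints $\sum_t q(t \mid y) = 1$, and tests negative definiteness. The poster works with analytic derivatives, so this is only an illustrative stand-in, and the finite-difference step `h` is an assumption.

```python
import numpy as np

def negative_definite_on_kernel(pxy, q, beta, h=1e-5):
    """Rough check, for small problems, that the Hessian of
    F(q) = -I(Y;T) + beta*I(X;T) is negative definite on the kernel of the
    Jacobian of the equality constraints sum_t q(t|y) = 1."""
    T, ny = q.shape
    n = T * ny

    def F(x):
        I_XT, I_YT = mutual_informations(pxy, x.reshape(T, ny))
        return -I_YT + beta * I_XT

    x0, f0 = q.ravel(), F(q.ravel())
    H = np.zeros((n, n))
    for i in range(n):                   # O(n^2) evaluations: keep n small
        for j in range(i, n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = H[j, i] = (F(x0 + ei + ej) - F(x0 + ei)
                                 - F(x0 + ej) + f0) / h ** 2

    A = np.zeros((ny, n))                # constraint Jacobian, one row per y
    for y in range(ny):
        A[y, y::ny] = 1.0                # selects column y of the flattened q
    K = np.linalg.svd(A)[2][ny:].T       # orthonormal basis of ker(A)
    return np.linalg.eigvalsh(K.T @ H @ K).max() < 0
```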
How to Determine Bifurcations [2]:

The Gradient Flow:

$$\begin{pmatrix} \dot{q} \\ \dot{\lambda} \end{pmatrix} = \nabla L(q, \lambda, \beta)$$

If $d_q^2\, I(X;T)$ is either positive or negative definite on $\ker\big( d_q^2 ( -I(Y;T) + \beta\, I(X;T) ) \big)$ at a point $(q^*, \lambda^*, \beta^*)$, then $(q^*, \lambda^*, \beta^*)$ is a singularity of the gradient flow if and only if it is a bifurcation point.
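Combining the sketches above gives a crude bifurcation monitor: run the annealing loop and note where the projected Hessian loses definiteness between consecutive values of β. This monitor is my illustration only; the poster locates bifurcations from the analytic Hessian and classifies them as subcritical or supercritical with a discriminator built from the third and fourth derivatives. The small 12-by-12 grid and blob placement are assumptions chosen to keep the finite-difference check fast.

```python
# A switch from "negative definite on the kernel" to "not" between
# consecutive beta values brackets a candidate singularity (q*, lambda*, beta*)
# of the gradient flow, i.e. a candidate bifurcation point.
pxy_small = four_blob_joint(n=12, centers=((3, 3), (3, 8), (8, 3), (8, 8)),
                            sigma=1.0)
for beta, q in anneal(pxy_small, N=4, beta0=0.5, beta_max=60.0, s=0.2):
    nd = negative_definite_on_kernel(pxy_small, q, beta)
    I_XT, _ = mutual_informations(pxy_small, q)
    print(f"beta = {beta:7.2f}  I(X;T) = {I_XT:.4f}  neg. definite: {nd}")
```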
Figure 5. This image, from "Symmetry Breaking in Soft Clustering Decoding of Neural Codes" [2], illustrates the bifurcating branch. The figure was derived from the Information Distortion method.
Figure 3. Optimal clusterings for N = 2, 3, 4, and 5 clusters. N = 2 and N = 3 display mismatched clusters; N = 5 shows a cluster that has not yet resolved; N = 4 displays well-divided and nearly deterministic clusters.
Bifurcation discriminator values for N = 2 through 5 clusters:

N    Discriminator
2    -1.026e-5
3    -10.38
4    -6.62e-15
5    -58.09
Conclusions
The Information Bottleneck Method provides clusters of the information that accurately model the original system. However, we have numerical evidence that subcritical branches exist; in other words, optimal clustering schemes not found by the method of annealing may exist. We will explore this further by annealing along branches predicted to be subcritical, using continuation techniques.
References
[1] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley Series in Communication, 1991.
[2] A. E. Parker, A. G. Dimitrov, and T. Gedeon, "Symmetry Breaking in Soft Clustering Decoding of Neural Codes," IEEE Trans. Info. Theory, 56.2 (2010), pp. 901-927.
[3] K. Rose, "Deterministic annealing for clustering, compression, classification, regression, and related optimization problems," Proc. IEEE, 86.11 (1998), pp. 2210-2239.
[4] T. Gedeon, A. E. Parker, and A. G. Dimitrov, "The Mathematical Structure of Information Bottleneck Methods," Entropy, 14.3 (2012), pp. 456-479.