Symmetry breaking clusters when deciphering the neural code Albert E. Parker

Symmetry breaking clusters
when deciphering the neural code
September 12, 2005
Albert E. Parker
Department of Mathematical Sciences
Center for Computational Biology
Montana State University
Collaborators: Tomas Gedeon, Alex Dimitrov, John Miller, and Zane Aldworth
Talk Outline
Deciphering the Neural Code
A Clustering Problem
The Dynamical System
Bifurcations
Theoretical Results
Numerical Results
Deciphering the Neural Code:
How does neural activity represent information about environmental stimuli?
“The little fly sitting in the fly’s brain trying to fly the fly”
Looking for the dictionary to the neural code …
[Diagram: encoding maps the inputs (stimuli X) to the outputs (neural response Y); decoding maps Y back to X]
… but the dictionary is not deterministic!
Given a stimulus, an experimenter observes many different neural responses:
X
Yi| X
i = 1, 2, 3, 4
Neural encoding is stochastic!!
Similarly, neural decoding is stochastic:
Y
Xi|Y
i = 1, 2, … , 9
Probability Framework
[Diagram: environmental stimuli X and neural responses Y, linked by the encoder P(Y|X) and the decoder P(X|Y)]
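The encoder and decoder are tied together by Bayes' rule. A minimal sketch, assuming discrete stimuli and responses and a known prior P(X); the function name and the toy numbers are hypothetical and not from the talk:

```python
def decoder_from_encoder(p_x, p_y_given_x):
    """Return P(X|Y) as rows indexed by y, from a prior P(X) and an encoder P(Y|X)."""
    n_x, n_y = len(p_x), len(p_y_given_x[0])
    # Joint p(x, y) = p(x) * p(y|x), then the marginal p(y).
    joint = [[p_x[i] * p_y_given_x[i][j] for j in range(n_y)] for i in range(n_x)]
    p_y = [sum(joint[i][j] for i in range(n_x)) for j in range(n_y)]
    # Bayes' rule; responses with p(y) = 0 are given a zero row.
    return [[joint[i][j] / p_y[j] if p_y[j] > 0 else 0.0 for i in range(n_x)]
            for j in range(n_y)]

# Hypothetical toy encoder: 2 stimuli, 2 responses.
p_x = [0.5, 0.5]
encoder = [[0.8, 0.2], [0.3, 0.7]]
decoder = decoder_from_encoder(p_x, encoder)
```

Each row of `decoder` is a distribution over stimuli for one response, which is the stochastic dictionary read in the decoding direction.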
Deciphering the Neural Code = Determining the encoder P(Y|X) or the decoder P(X|Y)
Common Approaches: parametric estimations, linear methods
Difficulty: There is never enough data.
One Approach: Cluster the responses
[Diagram: Stimuli X (L objects {xi}) and Responses Y (K objects {yi}) are related by p(X,Y), with encoder P(Y|X) and decoder P(X|Y). A clustering q(Z|Y) maps Y to the Clustered Responses Z (N objects {zi}), which gives P(Z|X) and P(X|Z).]
• q(Z|Y) is a stochastic clustering of the responses
• The outputs Y are clustered in Z so that the information that one can learn about X by observing Z, I(X;Z), is as close as possible to the mutual information I(X;Y)
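To make the comparison of I(X;Z) with I(X;Y) concrete, here is a small sketch that evaluates both for a given clustering q(Z|Y). The helper names and the toy joint distribution are assumptions made for illustration only:

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as a nested list joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for a, row in enumerate(joint) for b, p in enumerate(row) if p > 0)

def cluster_joint(p_xy, q_z_given_y):
    """p(x, z) = sum_y p(x, y) q(z|y), with q_z_given_y[y][z]."""
    n_y, n_z = len(q_z_given_y), len(q_z_given_y[0])
    return [[sum(row[y] * q_z_given_y[y][z] for y in range(n_y)) for z in range(n_z)]
            for row in p_xy]

# Hypothetical toy data: 2 stimuli, 4 responses, 2 clusters.
p_xy = [[0.20, 0.20, 0.05, 0.05],
        [0.05, 0.05, 0.20, 0.20]]
q_hard = [[1, 0], [1, 0], [0, 1], [0, 1]]      # responses grouped in pairs
q_uniform = [[0.5, 0.5]] * 4                   # every response spread evenly

i_xy = mutual_information(p_xy)
i_xz_hard = mutual_information(cluster_joint(p_xy, q_hard))
i_xz_uniform = mutual_information(cluster_joint(p_xy, q_uniform))
```

In this toy case the paired clustering keeps all of I(X;Y) because the merged responses carry identical information about X, while the uniform clustering keeps none.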
Two optimization problems which use this approach
optimizing at a distortion level D(Y,Z) ≤ D0
• Rate Distortion Theory (Shannon 1950s)
Minimal Informative Compression
min_q I(X,Z) constrained by D(X,Z) ≤ D0
• Deterministic Annealing (Rose 1990s)
A Clustering Algorithm
max_q H(Z|X) constrained by D(X,Z) ≤ D0
Relationship between these formulations:
I(X,Z) = H(Z) − H(Z|X)
Examples:
• Information Bottleneck Method (Tishby, Pereira, Bialek 1999)
min_q I(Y,Z) constrained by I(X;Z) ≥ I0
max_q −I(Y,Z) + β I(X;Z)
• Information Distortion Method (Dimitrov and Miller 2001)
max_q H(Z|Y) constrained by I(X;Z) ≥ I0
max_q H(Z|Y) + β I(X;Z)
A basic annealing algorithm to solve
max_q (G(q) + β D(q))
Let q0 be the maximizer of max_q G(q), and let β0 = 0. For k ≥ 0, let (qk, βk) be a solution to max_q G(q) + β D(q). Iterate the following steps until βK = βmax for some K.
1. Perform β-step: Let βk+1 = βk + dk where dk > 0.
2. The initial guess for qk+1 at βk+1 is qk+1(0) = qk + η for some small perturbation η.
3. Optimization: solve max_q (G(q) + βk+1 D(q)) to get the maximizer qk+1, using initial guess qk+1(0).
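The three steps can be sketched as a short loop. This is only an illustration under stated assumptions: the optimizer is plain gradient ascent, the probability constraints on q are ignored, and the toy G and D are hypothetical stand-ins chosen so that the maximizer is known in closed form.

```python
import random

def anneal(grad_G, grad_D, q0, beta_max, d_beta=0.1, eta=1e-3,
           step=0.05, iters=2000, seed=0):
    """Track a maximizer of G(q) + beta * D(q) as beta grows from 0 to beta_max."""
    rng = random.Random(seed)
    q, beta = list(q0), 0.0
    path = [(beta, list(q))]
    while beta < beta_max - 1e-12:
        beta = min(beta + d_beta, beta_max)                  # 1. beta-step
        q = [qi + eta * rng.uniform(-1, 1) for qi in q]      # 2. perturbed initial guess
        for _ in range(iters):                               # 3. optimize from that guess
            g_G, g_D = grad_G(q), grad_D(q)
            q = [qi + step * (a + beta * b) for qi, a, b in zip(q, g_G, g_D)]
        path.append((beta, list(q)))
    return path

# Hypothetical toy problem: G(q) = -q^2, D(q) = -(q - 1)^2,
# whose maximizer at each beta is q = beta / (1 + beta).
path = anneal(lambda q: [-2 * q[0]], lambda q: [-2 * (q[0] - 1)], [0.0], beta_max=2.0)
beta_final, q_final = path[-1]
```

The perturbation in step 2 matters at a bifurcation: without it, an optimizer started exactly on a symmetric solution can stay there even after that solution has stopped being a maximizer.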
Application of the annealing method to the Information Distortion problem
max_q (H(Z|X) + β I(X;Z))
when p(X,Y) is defined by four Gaussian blobs
[Figure: the joint distribution p(X,Y) with L=52 inputs (Y, Inputs) and K=52 outputs (X, Outputs), and the clustering q(Z|X) of the K=52 outputs into N=4 clustered outputs (Z, Clustered Outputs)]
Evolution of the optimal clustering:
Observed Bifurcations for the Four Blob problem:
[Figure: observed bifurcation diagram, I(Y,Z) in bits]
We just saw the optimal clusterings q* at some β* = βmax. What do the clusterings look like for β < βmax?
Observed Bifurcations for the 4 Blob Problem
Conceptual Bifurcation Structure
[Figure: I(Y,Z) in bits for the observed bifurcations, beside a conceptual diagram of q* with the unknown structure marked "??????"]
Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating branches are there?
What do the bifurcating branches look like? Are they subcritical or supercritical?
What is the stability of the bifurcating branches? Is there always a bifurcating branch which contains solutions of the optimization problem?
Are there bifurcations which alter the classes after all of the classes have resolved?
A General Problem:
To determine the bifurcations of solutions to clustering problems of the form
max_q G(q) constrained by D(q) ≥ I0
where
[Diagram: the clustering q(Z|X) maps X (K objects) to Z (N clusters)]
• q ∈ Δ is a vector of conditional probabilities in R^NK.
• G = Σ g(qi) and D = Σ d(qi) are sufficiently smooth on Δ, and q = (q1ᵀ … qNᵀ)ᵀ where qi ∈ R^K. This implies that:
1. G and D have symmetry: they are invariant to re-labeling of the classes of Z;
2. The Hessians d²G and d²D are block diagonal.
• The Hessians d²G and d²D satisfy a set of generic regularity conditions at bifurcation.
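The relabeling symmetry follows directly from the sum-over-classes form of G and D, and it is easy to check numerically. A minimal sketch; the block functions `g` and `d` below are hypothetical smooth functions chosen for illustration, not the ones used in the talk:

```python
import itertools
import math

def F(q_blocks, beta, g, d):
    """F(q) = sum_i [ g(q_i) + beta * d(q_i) ] over the N class blocks q_i."""
    return sum(g(b) + beta * d(b) for b in q_blocks)

# Hypothetical smooth block functions, used only to illustrate the invariance.
g = lambda b: -sum(x * math.log(x) for x in b if x > 0)
d = lambda b: sum(x * x for x in b)

# N = 3 classes, K = 3 objects; column k holds q(z|k) and sums to 1 over the classes.
q = [[0.7, 0.1, 0.2],
     [0.2, 0.3, 0.5],
     [0.1, 0.6, 0.3]]

base = F(q, 2.0, g, d)
invariant = all(math.isclose(F([q[i] for i in perm], 2.0, g, d), base)
                for perm in itertools.permutations(range(3)))
```

Permuting the blocks only reorders the terms of the sum, so `invariant` holds for any choice of `g` and `d`.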
A similar formulation:
Using the method of Lagrange multipliers, the goal of determining the bifurcation structure of solutions of the optimization problem can be rephrased as finding the bifurcation structure of stationary points of the problem
max_q (G(q) + β D(q))
where
[Diagram: the clustering q(Z|X) maps X (K objects) to Z (N clusters)]
• β ∈ [0,∞).
• q ∈ Δ is a vector of conditional probabilities in R^NK.
• G = Σ g(qi) and D = Σ d(qi) are sufficiently smooth on Δ, and q = (q1ᵀ … qNᵀ)ᵀ where qi ∈ R^K.
• The Hessians d²G and d²D satisfy a set of generic regularity conditions at bifurcation.
The Dynamical System
Goal: To solve max_{q∈Δ} (G(q) + β D(q)) for each β, incremented in sufficiently small steps, as β → ∞.
Method: Study the equilibria of the gradient flow
\[ \begin{pmatrix} \dot q \\ \dot\lambda \end{pmatrix} = \nabla_{q,\lambda} \mathcal{L}(q,\lambda,\beta) := \nabla_{q,\lambda}\Big( G(q) + \beta D(q) + \sum_{y \in Y} \lambda_y \Big( \sum_z q(z|y) - 1 \Big) \Big) \]
• Equilibria of this system are possible solutions of the maximization problem (they satisfy the necessary conditions of constrained optimality).
• If wᵀ d²_q (G(q*) + β D(q*)) w < 0 for every w ∈ ker J, then q* is a maximizer of the problem.
• The Jacobian d²_{q,λ}L(q*,β*) is symmetric, and so only bifurcations of equilibria can occur.
• The first equilibrium is q*(β0 = 0) ≡ 1/N.
•
The Symmetries:
To better understand the bifurcation structure, we capitalize on the symmetries of the function G(q) + β D(q)
[Figure: a clustering q(Z|X) from X (K objects {xi}) to Z (N objects {zi}), drawn once with two classes labeled class 1 and class 3 and once with the two labels swapped]
The symmetry group of all permutations on N symbols is SN.
Equivariant Branching Lemma:
The subgroups of SN with 1D fixed point spaces determine the Bifurcation Structure
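For the subgroups isomorphic to S(M-1) inside SM, the one-dimensional fixed point spaces are spanned by directions with (M-1)v in one class block and -v in the others. The sketch below builds these directions and checks the two properties that matter; `v` is a hypothetical null vector and the Lagrange multiplier component is left out:

```python
import itertools

def bifurcating_directions(M, v):
    """Directions with (M-1)*v in one class block and -v in the other M-1 blocks."""
    return [[[(M - 1) * x for x in v] if i == j else [-x for x in v]
             for i in range(M)]
            for j in range(M)]

def fixed_by_stabilizer(u, j):
    """True if u is unchanged by every permutation of the blocks that fixes block j."""
    M = len(u)
    return all([u[p[i]] for i in range(M)] == u
               for p in itertools.permutations(range(M)) if p[j] == j)

v = [1.0, -2.0, 0.5]                      # hypothetical null vector, K = 3
dirs = bifurcating_directions(4, v)

# Blocks sum to zero, so moving along u keeps sum_z q(z|y) = 1.
blocks_sum_to_zero = all(abs(sum(u[i][k] for i in range(4))) < 1e-12
                         for u in dirs for k in range(len(v)))
# Direction j is fixed by the copy of S3 that permutes the other three classes.
fixed = all(fixed_by_stabilizer(dirs[j], j) for j in range(4))
```

For M = 4 this yields four directions, one for each subgroup isomorphic to S3.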
A partial subgroup lattice for S4 and the corresponding bifurcating directions
S4
• S3: (3v, −v, −v, −v, 0)ᵀ
  S2: (0, 2v, −v, −v, 0)ᵀ, (0, −v, 2v, −v, 0)ᵀ, (0, −v, −v, 2v, 0)ᵀ
• S3: (−v, 3v, −v, −v, 0)ᵀ
  S2: (2v, 0, −v, −v, 0)ᵀ, (−v, 0, 2v, −v, 0)ᵀ, (−v, 0, −v, 2v, 0)ᵀ
• S3: (−v, −v, 3v, −v, 0)ᵀ
  S2: (2v, −v, 0, −v, 0)ᵀ, (−v, 2v, 0, −v, 0)ᵀ, (−v, −v, 0, 2v, 0)ᵀ
• S3: (−v, −v, −v, 3v, 0)ᵀ
  S2: (2v, −v, −v, 0, 0)ᵀ, (−v, 2v, −v, 0, 0)ᵀ, (−v, −v, 2v, 0, 0)ᵀ
1
A partial subgroup lattice for S4 and the bifurcating directions corresponding to subgroups isomorphic to S2 × S2.
S4
• ⟨(12),(34)⟩: (v, v, −v, −v)ᵀ
• ⟨(13),(24)⟩: (v, −v, v, −v)ᵀ
• ⟨(14),(23)⟩: (v, −v, −v, v)ᵀ
Symmetry Breaking Bifurcations
[Figure: bifurcation diagram of q* against β, built up step by step beside the subgroup lattice of S4]
• q1/N ≡ 1/N = 1/4 is fixed by SN = S4
• q* is fixed by SN−1 = S3
• q* is fixed by SN−2 = S2
• q* is fixed by S2 × S2 ≅ ⟨(12),(34)⟩
Group structure: S4, four subgroups S3, three subgroups S2 under each S3, and the trivial group 1; separately, S4 and the subgroups ⟨(12),(34)⟩, ⟨(13),(24)⟩, ⟨(14),(23)⟩.
Observed Bifurcation Structure and Group Structure
[Figure: observed bifurcation diagram of q* against β, beside the subgroup lattice S4, S3, S2, 1]
The Equivariant Branching Lemma shows that the bifurcation structure contains the branches …
[Figure: the same diagram beside the lattice S4, ⟨(12),(34)⟩, ⟨(13),(24)⟩, ⟨(14),(23)⟩]
The subgroups {S2 × S2} give additional structure …
Theorem: There are exactly K bifurcations on the branch (q1/N, β) whenever d²G(q1/N) is nonsingular.
Observed Bifurcation Structure
[Figure: q* against β. There are K=52 bifurcations on the first branch.]
Conceptual Bifurcation Structure
Observed Bifurcations for the 4 Blob Problem
[Figure: observed bifurcations beside the conceptual diagram of q* against β]
Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations? There are N-1 symmetry breaking bifurcations from SM to SM−1 for M ≤ N.
What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
How many bifurcating solutions are there? There are at least N from the first bifurcation (SN → SN−1), at least N-1 from the next one (SN−1 → SN−2), etc., as well as branches with symmetry breaking from SM → Sm × Sn for all (m,n) where m + n = M.
What do the bifurcating branches look like? They are subcritical or supercritical depending on the sign of the bifurcation discriminator ζ(q*,β*,m,n).
What is the stability of the bifurcating branches? Is there always a bifurcating branch which contains solutions of the optimization problem? Yes for the constrained problem, no for the annealing problem.
Are there bifurcations which alter the classes after all of the classes have resolved? Generically, no.
d 2 q, L is the constraine d Hessian
d 2 q , L is singular
d 2 q F is singular
N M
B  R  MI K
i 1
1
i
is singular
N M
d 2 q F is nonsingula r
N M
N M
B  R  MI K
B  R  MI K
B  Ri1  MI K
is nonsingula r
is singular
is nonsingula r
i 1
1
i
M 1
Non-generic
d 2 q F is unconstrai ned Hessian
F  G  D
Symmetry breaking
pitchfork-like
bifurcation
i 1
1
i
i 1
M 1
Impossible
scenario
Saddle-node
bifurcation
Impossible
scenario
Continuation techniques provide numerical confirmation of the theory
[Figure: I(Y,Z) in bits and q* against β]
Bifurcating branches with symmetry S2 × S2 = ⟨(12),(34)⟩
[Figure: I(Y,Z) in bits and q* against β]
A closer look …
[Figure: q* and I(Y,Z) in bits against β]
Bifurcation from S4 to S3 …
[Figure: q* and I(Y,Z) in bits against β]
The bifurcation from S4 to S3 is subcritical … (the theory predicted this since the bifurcation discriminator ζ(q1/4,β*,m,n) < 0)
Theorem: The bifurcation discriminator of the pitchfork-like branch (q*,λ*,β*) + (tu, 0, β(t)) with symmetry Sm × Sn is
ζ(q*,β*,m,n) = 3Ξ − (m² − mn + n²) d⁴f[v,v,v,v]
where
• F = Σ f(qi) = Σ (g(qi) + β d(qi))
• v is the null vector of B, the singular block of d²F
• b = d³f[v,v,v]
• A = B Σ Ri⁻¹ + M IK
• Ξ is an expression in b, B, A⁻¹ and mn(m + n) …
If ζ(q*,β*,m,n) < 0, then the branch is subcritical. If ζ(q*,β*,m,n) > 0, then the branch is supercritical.
Additional structure!!
[Figures: I(Y,Z) in bits against β]
Conclusions …
• We have a complete theoretical picture of how the clusterings evolve for any problem of the form
max_q (G(q) + β D(q))
subject to the assumptions stated earlier.
• SO WHAT??
• There are theoretical consequences for the “Rate Distortion Curve” …
• This yields a new and improved algorithm for solving the neural coding problem …
A numerical algorithm to solve max_q (G(q) + β D(q))
Let q0 be the maximizer of max_q G(q), β0 = 1 and Δs > 0. For k ≥ 0, let (qk, βk) be a solution to max_q G(q) + β D(q). Iterate the following steps until βK = βmax for some K.
1. Perform β-step: solve
\[ d^2_{q,\lambda}\mathcal{L}(q_k,\lambda_k,\beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda}\mathcal{L}(q_k,\lambda_k,\beta_k) \]
for (∂β qk, ∂β λk) and select βk+1 = βk + dk where
dk = (Δs sgn(cos θ)) / (||∂β qk||² + ||∂β λk||² + 1)^(1/2).
2. The initial guess for (qk+1, λk+1) at βk+1 is
(qk+1(0), λk+1(0)) = (qk, λk) + dk (∂β qk, ∂β λk).
3. Optimization: solve max_q (G(q) + βk+1 D(q)) using pseudoarclength continuation to get the maximizer qk+1 and the vector of Lagrange multipliers λk+1, using initial guess (qk+1(0), λk+1(0)).
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of d²_q [G(qk) + βk D(qk)] and d²_q [G(qk+1) + βk+1 D(qk+1)]. If a bifurcation is detected, then set qk+1(0) = qk + dk u, where u is a bifurcating direction, and repeat step 3.
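The bifurcation check in step 4 amounts to watching for a sign change in a block determinant between consecutive continuation steps. A minimal sketch of that test; the determinant routine is generic and the Hessian block `block(beta)` is a hypothetical stand-in, not one computed from G and D:

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, sign, prod = len(a), 1.0, 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            sign = -sign
        prod *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return sign * prod

def bifurcation_between(block_k, block_k1):
    """Flag a singularity between two steps: the block determinant changes sign."""
    return det(block_k) * det(block_k1) < 0

# Hypothetical Hessian block whose determinant, beta - 1.04, vanishes at beta = 1.04.
block = lambda beta: [[1.0 - beta, 0.2], [0.2, -1.0]]

crossed = bifurcation_between(block(1.0), block(1.1))
not_crossed = bifurcation_between(block(0.5), block(0.6))
```

A sign test of this kind only sees an odd number of eigenvalues crossing zero between the two steps, so small steps in β are still needed.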
Application to cricket sensory data
[Figure: Y, the neural responses; Z, the optimal clustering; E(Y|Z), the stimulus means conditioned on each of the classes]
More about Bifurcations
Theorem: All symmetry breaking bifurcations are pitchfork-like.
Outline of proof: β’(0) = 0 since ∂²xx r(0,0) = 0.
Theorem: Generically, bifurcations which alter the classes do not occur after all of the classes have resolved. That is, only saddle-node bifurcations are possible, which do not alter class structure due to the explicit bifurcating direction.
Theorem: If d²D(q*) is positive definite on ker d²F(q*,β*), then the singularity (q*,λ*,β*) is a bifurcation. In particular, if d²G(q*) is negative definite on ker d²F(q*,β*), then d²D(q*) is positive definite on ker d²F(q*,β*).
Theorem: A symmetry breaking bifurcating direction u is an eigenvector of d²_{q,λ}L((q*,λ*) + tu, β* + β(t)) for small t. If the corresponding eigenvalue is positive, then the branch consists of stationary points which are not solutions of the optimization problem.
Theorem: Subcritical bifurcating branches may be solutions of either the constrained problem or the annealing problem. Solutions of the constrained problem need not be solutions of the annealing problem. Solutions of the annealing problem are always solutions of the constrained problem.
Theorem: If there exists a saddle-node bifurcation of solutions to the Information Bottleneck problem at I0 = I*, then RI(I0) is neither concave nor convex in any neighborhood of I*. Similarly, the existence of a saddle-node bifurcation of solutions to the Information Distortion problem at I0 = I* implies that RH(I0) is neither concave nor convex in any neighborhood of I*.
Continuation
• A local maximum qk*(βk) of the maximization problem is an equilibrium of the gradient flow.
• Initial condition qk+1(0)(βk+1(0)) is sought in tangent direction ∂β qk, which is found by solving the matrix system
\[ d^2_{q,\lambda}\mathcal{L}(q_k,\lambda_k,\beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda}\mathcal{L}(q_k,\lambda_k,\beta_k) \]
• The continuation algorithm used to find qk+1*(βk+1) is based on Newton’s method.
• Parameter continuation follows the dashed (---) path; pseudoarclength continuation follows the dotted (…) path.
[Figure: solution curve in the ((q,λ), β) plane through (qk, λk, βk), showing the tangent predictor (qk+1(0), λk+1(0), βk+1(0)) and the corrected point (qk+1, λk+1, βk+1) reached along each of the two paths]
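The difference between the two paths shows up at a fold, where the curve turns back in β and stepping in β alone has nothing to converge to. Below is a minimal pseudoarclength sketch on a hypothetical scalar curve g(x, β) = x² + β − 1 = 0, which has a fold at β = 1. It is an illustration of the predictor and corrector steps, not the talk's implementation:

```python
import math

def g(x, b):   return x * x + b - 1.0     # hypothetical toy curve with a fold at b = 1
def g_x(x, b): return 2.0 * x
def g_b(x, b): return 1.0

def pseudo_arclength(x, b, tangent, ds=0.1, steps=30, tol=1e-12):
    """Follow g(x, b) = 0 using a tangent predictor and a Newton corrector."""
    tx, tb = tangent
    points = [(x, b)]
    for _ in range(steps):
        # The tangent spans the null space of [g_x, g_b]; keep the previous orientation.
        nx, nb = -g_b(x, b), g_x(x, b)
        norm = math.hypot(nx, nb)
        nx, nb = nx / norm, nb / norm
        if nx * tx + nb * tb < 0:
            nx, nb = -nx, -nb
        tx, tb = nx, nb
        # Predictor: step of length ds along the tangent.
        xp, bp = x + ds * tx, b + ds * tb
        xn, bn = xp, bp
        # Corrector: Newton on g = 0 together with t . (u - u_pred) = 0.
        for _ in range(50):
            r1 = g(xn, bn)
            r2 = tx * (xn - xp) + tb * (bn - bp)
            if abs(r1) < tol and abs(r2) < tol:
                break
            a, c = g_x(xn, bn), g_b(xn, bn)
            det = a * tb - c * tx
            xn += (-r1 * tb + c * r2) / det
            bn += (-a * r2 + tx * r1) / det
        x, b = xn, bn
        points.append((x, b))
    return points

# Start at (x, b) = (-1, 0), oriented toward increasing b.
points = pseudo_arclength(-1.0, 0.0, (1 / math.sqrt(5), 2 / math.sqrt(5)))
```

The corrector's extra equation keeps the Newton system nonsingular at the fold, which is why the curve can be followed around the turning point.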

The Groups
• Let P be the finite group of n × n “block” permutation matrices which represents the action of SN on q and F(q,β). For example, if N=3,
\[ \rho = \begin{pmatrix} 0 & I_K & 0 \\ I_K & 0 & 0 \\ 0 & 0 & I_K \end{pmatrix} \in P \]
permutes q(Z1|y) with q(Z2|y) for every y.
• F(q,β) is P-invariant means that for every ρ ∈ P,
F(ρq, β) = F(q, β)
• Let Γ be the finite group of (n+K) × (n+K) block permutation matrices which represents the action of SN on (q, λ) and ∇_{q,λ}L(q,λ,β):
\[ \Gamma := \left\{ \begin{pmatrix} \rho & 0_{n \times K} \\ 0_{K \times n} & I_{K \times K} \end{pmatrix} \;\middle|\; \rho \in P \right\} \]
(the Lagrange multipliers and constraints are fixed!)
• ∇_{q,λ}L(q,λ,β) is Γ-equivariant means that for every γ ∈ Γ,
γ ∇_{q,λ}L(q,λ,β) = ∇_{q,λ}L(γ(q, λ), β)
Notation and Definitions
• The symmetry of (q, λ) is measured by its isotropy subgroup
Γ_{q,λ} := { γ ∈ Γ | γ(q, λ) = (q, λ) }
• An isotropy subgroup Σ is a maximal isotropy subgroup of Γ if there does not exist an isotropy subgroup Σ′ of Γ such that Σ ⊊ Σ′ ⊊ Γ.
• At bifurcation (q*, λ*, β*), the fixed point subspace of Γ_{q*,λ*} is
Fix(Γ_{q*,λ*}) = { w ∈ ker d²_{q,λ}L(q*,λ*,β*) | γw = w for every γ ∈ Γ_{q*,λ*} }
Equivariant Branching Lemma
One of the Existence Theorems we use to describe a bifurcation in the presence of symmetries is the Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1).
Idea: The bifurcation structure of local solutions is described by the isotropy subgroups Σ of Γ which have dim Fix(Σ) = 1.
• System: ẋ = r(x, β), r : R^m × R → R^m
• r(x,β) is G-equivariant for some compact Lie Group G
• r(0,0) = 0, ∂x r(0,0) = 0
• Fix(G) = {0}
• Let H be an isotropy subgroup of G such that dim Fix(H) = 1.
• Assume ∂β ∂x r(0,0) x0 ≠ 0 (crossing condition).
Then there is a unique smooth solution branch (t x0, β(t)) to r = 0 such that x0 ∈ Fix(H) and the isotropy subgroup of each solution is H.
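A standard one-dimensional example may help fix the hypotheses. It is not from the talk; it is the textbook pitchfork, with the group taken to be $\mathbb{Z}_2 = \{\pm 1\}$ acting on $\mathbb{R}$ by $x \mapsto -x$:

```latex
\begin{align*}
&\text{System: } \dot x = r(x,\beta) = \beta x - x^3,
  \qquad r(-x,\beta) = -r(x,\beta) \quad (\mathbb{Z}_2\text{-equivariance}) \\
&r(0,0) = 0, \qquad \partial_x r(0,0) = \left.(\beta - 3x^2)\right|_{(0,0)} = 0 \\
&\mathrm{Fix}(\mathbb{Z}_2) = \{0\}, \qquad
  H = \{1\} \text{ is the isotropy subgroup of every } x \neq 0, \quad
  \dim \mathrm{Fix}(H) = \dim \mathbb{R} = 1 \\
&\text{Crossing condition: } \partial_\beta \partial_x r(0,0)\, x_0 = x_0 \neq 0
  \text{ for } x_0 = 1 \\
&\text{Branch: } r = 0,\ x \neq 0 \iff \beta = x^2, \text{ that is }
  (t x_0, \beta(t)) = (t, t^2), \qquad \beta'(0) = 0 .
\end{align*}
```

Here $\beta'(0) = 0$ makes the branch pitchfork-like, and $\beta(t) = t^2 > 0$ means the branch exists only for $\beta > 0$.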
Smoller-Wasserman Theorem
Another Existence Theorem: Smoller-Wasserman Theorem (1985-6)
For variational problems where
r(x, β) = ∇x f(x, β)
there is a bifurcating solution tangential to Fix(H) for every maximal isotropy subgroup H, not only those with dim Fix(H) = 1.
• dim Fix(H) = 1 implies that H is a maximal isotropy subgroup