Symmetry breaking clusters when deciphering the neural code
September 12, 2005
Albert E. Parker
Department of Mathematical Sciences, Center for Computational Biology, Montana State University
Collaborators: Tomas Gedeon, Alex Dimitrov, John Miller, and Zane Aldworth

Talk Outline
• Deciphering the Neural Code
• A Clustering Problem
• The Dynamical System
• Bifurcations
• Theoretical Results
• Numerical Results

Deciphering the Neural Code
How does neural activity represent information about environmental stimuli?
“The little fly sitting in the fly’s brain trying to fly the fly”

Looking for the dictionary to the neural code …
Encoding maps inputs (stimuli X) to outputs (neural responses Y); decoding maps Y back to X.

… but the dictionary is not deterministic!
Given a stimulus X, an experimenter observes many different neural responses $Y_i \mid X$, $i = 1, 2, 3, 4$. Neural encoding is stochastic!!
Similarly, neural decoding is stochastic: given a response Y, many stimuli are possible, $X_i \mid Y$, $i = 1, 2, \dots, 9$.

Probability Framework
• encoder: P(Y|X), from environmental stimuli X to neural responses Y
• decoder: P(X|Y), from neural responses Y to environmental stimuli X
Deciphering the Neural Code = Determining the encoder P(Y|X) or the decoder P(X|Y)
Common Approaches: parametric estimations, linear methods
Difficulty: There is never enough data.
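The encoder and decoder in the probability framework above are related by Bayes' rule. A minimal sketch, assuming a small made-up encoder table and stimulus prior (the function name `decoder_from_encoder` and all numbers are hypothetical, chosen only for illustration):

```python
import numpy as np

def decoder_from_encoder(p_y_given_x, p_x):
    """Return P(X|Y) given an encoder P(Y|X) and a stimulus prior P(X).

    p_y_given_x[x, y] = P(y|x); p_x[x] = P(x).
    The result is indexed as decoder[y, x] = P(x|y).
    """
    joint = p_y_given_x * p_x[:, None]   # joint[x, y] = P(y|x) P(x)
    p_y = joint.sum(axis=0)              # marginal over stimuli
    return (joint / p_y).T               # Bayes' rule, one row per response

# Illustrative toy numbers (not from the talk): 2 stimuli, 3 responses.
p_x = np.array([0.5, 0.5])
encoder = np.array([[0.8, 0.2, 0.0],
                    [0.1, 0.3, 0.6]])
decoder = decoder_from_encoder(encoder, p_x)
print(decoder)
```

Each row of `decoder` is a distribution over stimuli for one observed response, which is the stochastic dictionary described above. With real data the encoder table itself has to be estimated, which is where the shortage of data bites.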
One Approach: Cluster the responses
• Stimuli X: L objects $\{x_i\}$
• Responses Y: K objects $\{y_i\}$, related to the stimuli by p(X,Y), i.e. by the encoder P(Y|X) and the decoder P(X|Y)
• Clustered Responses Z: N objects $\{z_i\}$, obtained from the responses by q(Z|Y); this induces P(Z|X) and P(X|Z)
• q(Z|Y) is a stochastic clustering of the responses
• The outputs Y are clustered in Z so that the information that one can learn about X by observing Z, I(X;Z), is as close as possible to the mutual information I(X;Y)

Two optimization problems which use this approach, optimizing at a distortion level $D(Y,Z) \le D_0$:
• Rate Distortion Theory (Shannon 1950’s), Minimal Informative Compression: $\min_q I(Y;Z)$ constrained by $D(Y,Z) \le D_0$
• Deterministic Annealing (Rose 1990’s), A Clustering Algorithm: $\max_q H(Z|Y)$ constrained by $D(Y,Z) \le D_0$
Relationship between these formulations: $I(Y;Z) = H(Z) - H(Z|Y)$

Examples:
• Information Bottleneck Method (Tishby, Pereira, Bialek 1999): $\min_q I(Y;Z)$ constrained by $I(X;Z) \ge I_0$; in annealing form, $\max_q\, -I(Y;Z) + \beta I(X;Z)$
• Information Distortion Method (Dimitrov and Miller 2001): $\max_q H(Z|Y)$ constrained by $I(X;Z) \ge I_0$; in annealing form, $\max_q\, H(Z|Y) + \beta I(X;Z)$

A basic annealing algorithm to solve $\max_q (G(q) + \beta D(q))$
Let $q_0$ be the maximizer of $\max_q G(q)$, and let $\beta_0 = 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some K.
1. Perform β-step: Let $\beta_{k+1} = \beta_k + d_k$ where $d_k > 0$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + \eta$ for some small perturbation $\eta$.
3. Optimization: solve $\max_q (G(q) + \beta_{k+1} D(q))$ to get the maximizer $q_{k+1}$, using initial guess $q_{k+1}^{(0)}$.

Application of the annealing method to the Information Distortion problem $\max_q (H(Z|Y) + \beta I(X;Z))$ when p(X,Y) is defined by four Gaussian blobs
[Figure: the joint distribution p(X,Y) with L = 52 inputs and K = 52 outputs, and the optimal clustering q of the K = 52 outputs into N = 4 clustered outputs Z]

Evolution of the optimal clustering: Observed Bifurcations for the Four Blob problem
[Figure: observed bifurcation diagram; axis label: I(Y,Z) in bits]
We just saw the optimal clusterings $q^*$ at some $\beta^* = \beta_{\max}$. What do the clusterings look like for $\beta < \beta_{\max}$?

Observed Bifurcations for the 4 Blob Problem: Conceptual Bifurcation Structure
• Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
• What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
• How many bifurcating branches are there?
• What do the bifurcating branches look like? Are they subcritical or supercritical?
• What is the stability of the bifurcating branches?
• Is there always a bifurcating branch which contains solutions of the optimization problem?
• Are there bifurcations which alter the classes after all of the classes have resolved?

A General Problem
To determine the bifurcations of solutions to clustering problems of the form $\max_{q \in \Delta} G(q)$ constrained by $D(q) \ge I_0$, where
• q is a vector of conditional probabilities q(Z|Y) in $\Delta \subset \mathbb{R}^{NK}$ (Y has K objects, Z has N clusters).
• $G = \sum_i g(q_i)$ and $D = \sum_i d(q_i)$ are sufficiently smooth on $\Delta$, and $q = (q_1^T \dots q_N^T)^T$ where $q_i \in \mathbb{R}^K$. This implies that:
1. G and D have symmetry: they are invariant to re-labeling of the classes of Z;
2. The Hessians $d^2G$ and $d^2D$ are block diagonal.
• The Hessians $d^2G$ and $d^2D$ satisfy a set of generic regularity conditions at bifurcation.
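The three steps of the basic annealing algorithm stated earlier (β-step, perturbed initial guess, re-optimization) can be sketched for the Information Distortion objective H(Z|Y) + βI(X;Z). This is a minimal sketch under stated assumptions, not the implementation used in the talk: the inner optimizer is a swapped-in fixed-point iteration on the stationarity condition q(z|y) ∝ exp(β Σ_x p(x|y) log p(x|z)), the function names are hypothetical, and the 4 × 4 joint distribution is made-up toy data with two obvious response groups.

```python
import numpy as np

def fixed_point_step(q, p_xy, beta):
    """One update q(z|y) <- softmax_z(beta * sum_x p(x|y) log p(x|z)).

    q[z, y] = q(z|y); p_xy[x, y] = p(x, y).
    Assumption: this fixed-point map stands in for the optimizer of step 3.
    """
    eps = 1e-12
    p_x_given_y = p_xy / p_xy.sum(axis=0)        # columns are p(x|y)
    p_xz = p_xy @ q.T                            # p(x,z) = sum_y p(x,y) q(z|y)
    p_x_given_z = p_xz / p_xz.sum(axis=0)        # columns are p(x|z)
    logits = beta * (np.log(p_x_given_z + eps).T @ p_x_given_y)
    logits -= logits.max(axis=0)                 # numerical stability
    new_q = np.exp(logits)
    return new_q / new_q.sum(axis=0)

def anneal(p_xy, n_classes, beta_max, d_beta=0.1, noise=1e-3,
           inner_iters=200, seed=0):
    """Basic annealing: start at the uniform clustering and increase beta."""
    rng = np.random.default_rng(seed)
    n_y = p_xy.shape[1]
    q = np.full((n_classes, n_y), 1.0 / n_classes)  # q_0 maximizes H(Z|Y)
    beta = 0.0
    while beta < beta_max:
        beta = min(beta + d_beta, beta_max)         # step 1: beta-step
        q = q + noise * rng.random(q.shape)         # step 2: perturbed guess
        q /= q.sum(axis=0)
        for _ in range(inner_iters):                # step 3: re-optimize
            q = fixed_point_step(q, p_xy, beta)
    return q

# Toy joint distribution (illustrative only): responses 0,1 and 2,3 pair up.
p_xy = np.array([[4, 4, 1, 1],
                 [4, 4, 1, 1],
                 [1, 1, 4, 4],
                 [1, 1, 4, 4]], dtype=float)
p_xy /= p_xy.sum()

q_opt = anneal(p_xy, n_classes=2, beta_max=5.0)
labels = q_opt.argmax(axis=0)
print(np.round(q_opt, 3), labels)
```

The perturbation in step 2 matters: without it the iteration stays at the symmetric clustering q ≡ 1/N even after that solution stops being a maximizer, so the classes would never resolve.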
A similar formulation
Using the method of Lagrange multipliers, the goal of determining the bifurcation structure of solutions of the optimization problem can be rephrased as finding the bifurcation structure of stationary points of the problem $\max_{q \in \Delta} (G(q) + \beta D(q))$, where
• $\beta \in [0, \infty)$.
• q is a vector of conditional probabilities q(Z|Y) in $\Delta \subset \mathbb{R}^{NK}$ (Y has K objects, Z has N clusters).
• $G = \sum_i g(q_i)$ and $D = \sum_i d(q_i)$ are sufficiently smooth on $\Delta$, and $q = (q_1^T \dots q_N^T)^T$ where $q_i \in \mathbb{R}^K$.
• The Hessians $d^2G$ and $d^2D$ satisfy a set of generic regularity conditions at bifurcation.

The Dynamical System
Goal: To solve $\max_{q \in \Delta} (G(q) + \beta D(q))$ for each β, incremented in sufficiently small steps, as $\beta \to \infty$.
Method: Study the equilibria of the gradient flow
$$\begin{pmatrix} \dot q \\ \dot\lambda \end{pmatrix} = \nabla_{q,\lambda} L(q, \lambda, \beta), \qquad L(q, \lambda, \beta) := G(q) + \beta D(q) + \sum_{y \in Y} \lambda_y \Big( \sum_z q(z|y) - 1 \Big)$$
• Equilibria of this system are possible solutions of the maximization problem (they satisfy the necessary conditions of constrained optimality).
• If $w^T d^2_q (G(q^*) + \beta D(q^*)) w < 0$ for every $w \in \ker J$ (J the Jacobian of the constraints), then $q^*$ is a maximizer of the problem.
• The Jacobian $d^2_{q,\lambda} L(q^*, \lambda^*, \beta^*)$ of the flow is symmetric, and so only bifurcations of equilibria can occur.
• The first equilibrium is $q^*(\beta_0 = 0) \equiv 1/N$.

The Symmetries
To better understand the bifurcation structure, we capitalize on the symmetries of the function $G(q) + \beta D(q)$.
[Figure: a clustering q(Z|Y) of K objects $\{y_i\}$ into N objects $\{z_i\}$; swapping the labels of class 1 and class 3 gives an equivalent clustering]
The symmetry group of all permutations on N symbols is $S_N$.
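The relabeling symmetry can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the Information Distortion form F = H(Z|Y) + βI(X;Z) as the example objective, uses random made-up distributions, and represents a relabeling of the classes by the block permutation matrix kron(P, I_K) acting on the stacked vector q = (q_1^T … q_N^T)^T. The names `objective` and `block_permutation` are hypothetical.

```python
import numpy as np
from itertools import permutations

def objective(q_vec, p_xy, beta, n_classes):
    """F(q) = H(Z|Y) + beta * I(X;Z) in nats, for q stacked as (q_1, ..., q_N)."""
    eps = 1e-300
    q = q_vec.reshape(n_classes, -1)          # q[z, y] = q(z|y)
    p_y = p_xy.sum(axis=0)
    p_x = p_xy.sum(axis=1)
    p_xz = p_xy @ q.T                         # p(x,z)
    p_z = p_xz.sum(axis=0)
    h_z_given_y = -np.sum(p_y * q * np.log(q + eps))
    i_xz = np.sum(p_xz * np.log(p_xz / np.outer(p_x, p_z) + eps))
    return h_z_given_y + beta * i_xz

def block_permutation(perm, k):
    """NK x NK matrix that permutes the K-dimensional blocks of q."""
    p = np.eye(len(perm))[list(perm)]
    return np.kron(p, np.eye(k))

rng = np.random.default_rng(0)
N, K, L = 3, 6, 5
p_xy = rng.random((L, K))
p_xy /= p_xy.sum()
q = rng.random((N, K))
q /= q.sum(axis=0)                            # each column is a distribution
q_vec = q.reshape(-1)

values = [objective(block_permutation(perm, K) @ q_vec, p_xy, 1.7, N)
          for perm in permutations(range(N))]
invariant = np.allclose(values, values[0])
print(invariant)
```

All N! relabelings give the same value of F, which is the $S_N$ invariance that the rest of the analysis exploits.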
Equivariant Branching Lemma: The subgroups of $S_N$ with 1D fixed point spaces determine the Bifurcation Structure

A partial subgroup lattice for $S_4$ and the corresponding bifurcating directions
Each direction lists the four class blocks followed by the Lagrange multiplier component, which is 0; the blocks sum to zero, and a direction is determined up to a scalar multiple.
• $S_4 \to S_3$ (class 1 split off): $(3v, -v, -v, -v, 0)^T$; then $S_3 \to S_2$: $(0, 2v, -v, -v, 0)^T$, $(0, -v, 2v, -v, 0)^T$, $(0, -v, -v, 2v, 0)^T$
• $S_4 \to S_3$ (class 2 split off): $(-v, 3v, -v, -v, 0)^T$; then $S_3 \to S_2$: $(2v, 0, -v, -v, 0)^T$, $(-v, 0, 2v, -v, 0)^T$, $(-v, 0, -v, 2v, 0)^T$
• $S_4 \to S_3$ (class 3 split off): $(-v, -v, 3v, -v, 0)^T$; then $S_3 \to S_2$: $(2v, -v, 0, -v, 0)^T$, $(-v, 2v, 0, -v, 0)^T$, $(-v, -v, 0, 2v, 0)^T$
• $S_4 \to S_3$ (class 4 split off): $(-v, -v, -v, 3v, 0)^T$; then $S_3 \to S_2$: $(2v, -v, -v, 0, 0)^T$, $(-v, 2v, -v, 0, 0)^T$, $(-v, -v, 2v, 0, 0)^T$
• Every chain ends in the trivial group 1.

A partial subgroup lattice for $S_4$ and the corresponding bifurcating directions corresponding to subgroups isomorphic to $S_2 \times S_2$
• $\langle (12), (34) \rangle$: $(v, v, -v, -v)^T$
• $\langle (13), (24) \rangle$: $(v, -v, v, -v)^T$
• $\langle (14), (23) \rangle$: $(v, -v, -v, v)^T$

Symmetry Breaking Bifurcations
• $q^* = q_{1/N}$ is fixed by $S_N = S_4$ (here $q_{1/N} = q_{1/4}$).
• After the first bifurcation, $q^*$ is fixed by $S_{N-1} = S_3$.
• After the second bifurcation, $q^*$ is fixed by $S_{N-2} = S_2$.
• On the additional branches, $q^*$ is fixed by $S_2 \times S_2 = \langle (12), (34) \rangle$.

Observed Bifurcation Structure and Group Structure
• The Equivariant Branching Lemma shows that the bifurcation structure contains the branches $S_4 \to S_3 \to S_2 \to 1$ …
• The subgroups $\{S_2 \times S_2\}$ give additional structure …
[Figure: observed bifurcation diagram for $q^*$ shown next to the subgroup lattices above]

Theorem: There are exactly K bifurcations on the branch $(q_{1/N}, \beta)$ whenever the Hessian $d^2G(q_{1/N})$ is nonsingular.
Observed Bifurcation Structure: There are K = 52 bifurcations on the first branch.
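The one-dimensional fixed point spaces behind the bifurcating directions above can be checked numerically. The sketch below makes simplifying assumptions that are not in the talk: each class block is collapsed to a single coordinate (a stand-in for the multiple of the null vector v in that block), the zero-sum condition on the blocks is imposed by projecting onto sum-zero vectors, and classes are numbered from 0. The helper names are hypothetical.

```python
import numpy as np

def generate_group(generators, n):
    """Closure of a set of permutations (tuples of length n) under composition."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = tuple(g[h[i]] for i in range(n))
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

def fixed_space(generators, n):
    """Dimension and orthonormal basis of Fix(H) inside the sum-zero vectors of R^n."""
    group = generate_group(generators, n)
    # Averaging the permutation matrices of H projects onto the H-fixed vectors.
    avg = sum(np.eye(n)[list(g)] for g in group) / len(group)
    center = np.eye(n) - np.ones((n, n)) / n      # projector onto sum-zero vectors
    u, s, _ = np.linalg.svd(avg @ center)
    rank = int(np.sum(s > 1e-10))
    return rank, u[:, :rank]

# S3 inside S4, permuting classes 1, 2, 3 and fixing class 0.
dim_s3, basis_s3 = fixed_space([(0, 2, 1, 3), (0, 1, 3, 2)], 4)
# S2 x S2 = <(01), (23)> inside S4.
dim_s2s2, basis_s2s2 = fixed_space([(1, 0, 2, 3), (0, 1, 3, 2)], 4)
# S2 = <(23)> alone inside S4: its fixed space is too large.
dim_s2, _ = fixed_space([(0, 1, 3, 2)], 4)

print(dim_s3, basis_s3[:, 0] / basis_s3[0, 0] * 3)
print(dim_s2s2, basis_s2s2[:, 0] / basis_s2s2[0, 0])
print(dim_s2)
```

In this collapsed setting the $S_3$ and $S_2 \times S_2$ subgroups each have a one-dimensional fixed space, spanned by multiples of (3, -1, -1, -1) and (1, 1, -1, -1), while $S_2$ alone does not; this is consistent with $S_2$ appearing only after an $S_3$ branch, where one class has already split off.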
Observed Bifurcations for the 4 Blob Problem: Conceptual Bifurcation Structure, with answers
• Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
  There are N-1 symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ for $M \le N$.
• What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
• How many bifurcating solutions are there?
  There are at least N from the first bifurcation ($S_N \to S_{N-1}$), at least N-1 from the next one ($S_{N-1} \to S_{N-2}$), etc., as well as branches with symmetry breaking from $S_M \to S_m \times S_n$ for all (m,n) where m + n = M.
• What do the bifurcating branches look like?
  They are subcritical or supercritical depending on the sign of the bifurcation discriminator $\zeta(q^*, \beta^*, m, n)$.
• What is the stability of the bifurcating branches?
• Is there always a bifurcating branch which contains solutions of the optimization problem?
  Yes for the constrained problem, no for the annealing problem.
• Are there bifurcations which alter the classes after all of the classes have resolved?
  Generically, no.
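The branch counts quoted in the answers can be illustrated by enumerating the ways M unresolved classes split into two groups of sizes m and n, one split per subgroup conjugate to $S_m \times S_n$. This is only a counting sketch: the function name is hypothetical, and it counts subgroups rather than proving anything about the branches.

```python
from itertools import combinations

def two_block_partitions(M, m):
    """All ways to split classes 0..M-1 into blocks of sizes m and M - m."""
    classes = range(M)
    seen = set()
    for block in combinations(classes, m):
        rest = tuple(c for c in classes if c not in block)
        seen.add(frozenset([block, rest]))   # unordered pair of blocks
    return sorted(tuple(sorted(pair)) for pair in seen)

splits_1_3 = two_block_partitions(4, 1)   # S4 -> S3 (one class splits off)
splits_2_2 = two_block_partitions(4, 2)   # S4 -> S2 x S2
print(len(splits_1_3), len(splits_2_2))
```

For M = 4 this gives four splits of type (1, 3) and three of type (2, 2), matching the four $S_3$ subgroups and the three $S_2 \times S_2$ subgroups in the lattices shown earlier.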
[Flowchart: classification of the singularities of the constrained Hessian $d^2_{q,\lambda}L$, where $d^2_qF$ is the unconstrained Hessian and $F = G + \beta D$. Starting from "$d^2_{q,\lambda}L$ is singular", the cases are split by whether $d^2_qF$ is singular or nonsingular and by whether $B\sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular or nonsingular. Outcomes shown: non-generic, symmetry breaking pitchfork-like bifurcation, saddle-node bifurcation, and impossible scenario.]

Continuation techniques provide numerical confirmation of the theory
[Figure: bifurcation diagram for $q^*$; axis label: I(Y,Z) in bits]
Bifurcating branches with symmetry $S_2 \times S_2 = \langle (12), (34) \rangle$
[Figure: bifurcation diagram with the $S_2 \times S_2$ branches; axis label: I(Y,Z) in bits]
A closer look … Bifurcation from $S_4$ to $S_3$ …
[Figure: close-up of the first bifurcation; axis label: I(Y,Z) in bits]
The bifurcation from $S_4$ to $S_3$ is subcritical … (the theory predicted this since the bifurcation discriminator $\zeta(q_{1/4}, \beta^*, m, n) < 0$)

Theorem: The bifurcation discriminator $\zeta(q^*, \beta^*, m, n)$ of the pitchfork-like branch $(q^*, \lambda^*, \beta^*) + (tu, 0, \beta(t))$ with symmetry $S_m \times S_n$ is built from the following quantities:
• $F = \sum_i f(q_i) = \sum_i (g(q_i) + \beta d(q_i))$
• v is the null vector of B, the singular block of $d^2F$
• $b = d^3f[v, v, v]$ and the fourth-order term $d^4f[v, v, v, v]$
• $A = B\sum_i R_i^{-1} + M I_K$
• the coefficient $\dfrac{mn(m+n)}{m^2 - mn + n^2}$
If $\zeta(q^*, \beta^*, m, n) < 0$, then the branch is subcritical. If $\zeta(q^*, \beta^*, m, n) > 0$, then the branch is supercritical.
[Figure: additional structure in the bifurcation diagram; axis label: I(Y,Z) in bits]

Conclusions …
• We have a complete theoretical picture of how the clusterings evolve for any problem of the form $\max_q (G(q) + \beta D(q))$ subject to the assumptions stated earlier.
SO WHAT??
• There are theoretical consequences for the “Rate Distortion Curve” …
• This yields a new and improved algorithm for solving the neural coding problem …

A numerical algorithm to solve $\max_q (G(q) + \beta D(q))$
Let $q_0$ be the maximizer of $\max_q G(q)$, $\beta_0 = 1$ and $\Delta s > 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some K.
1. Perform β-step: solve
$$d^2_{q,\lambda} L(q_k, \lambda_k, \beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} L(q_k, \lambda_k, \beta_k)$$
for $(\partial_\beta q_k, \partial_\beta \lambda_k)$ and select $\beta_{k+1} = \beta_k + d_k$ where $d_k = \Delta s \,\mathrm{sgn}(\cos\theta) / (\|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1)^{1/2}$.
2. The initial guess for $(q_{k+1}, \lambda_{k+1})$ at $\beta_{k+1}$ is $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)}) = (q_k, \lambda_k) + d_k (\partial_\beta q_k, \partial_\beta \lambda_k)$.
3. Optimization: solve $\max_q (G(q) + \beta_{k+1} D(q))$ using pseudoarclength continuation to get the maximizer $q_{k+1}$ and the vector of Lagrange multipliers $\lambda_{k+1}$, using initial guess $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)})$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $d^2_q [G(q_k) + \beta_k D(q_k)]$ and $d^2_q [G(q_{k+1}) + \beta_{k+1} D(q_{k+1})]$. If a bifurcation is detected, then set $q_{k+1}^{(0)} = q_k + d_k u$ where u is a bifurcating direction and repeat step 3.

Application to cricket sensory data
[Figure: E(X|Z), the stimulus means conditioned on each of the classes; Y: neural responses; Z: optimal clustering]

More about Bifurcations
Theorem: All symmetry breaking bifurcations are pitchfork-like.
Outline of proof: $\beta'(0) = 0$ since $\partial^2_{xx} r(0,0) = 0$.
Theorem: Generically, bifurcations which alter the classes do not occur after all of the classes have resolved. That is, only saddle-node bifurcations are possible, and these do not alter the class structure because of their explicit bifurcating direction.
Theorem: If $d^2D(q^*)$ is positive definite on $\ker d^2F(q^*, \beta^*)$, then the singularity $(q^*, \lambda^*, \beta^*)$ is a bifurcation. In particular, if $d^2G(q^*)$ is negative definite on $\ker d^2F(q^*, \beta^*)$, then $d^2D(q^*)$ is positive definite on $\ker d^2F(q^*, \beta^*)$.
Theorem: A symmetry breaking bifurcating direction u is an eigenvector of $d^2_{q,\lambda} L((q^*, \lambda^*) + tu, \beta^* + \beta(t))$ for small t. If the corresponding eigenvalue is positive, then the branch consists of stationary points which are not solutions of the optimization problem.
Theorem: Subcritical bifurcating branches may be solutions of either the constrained problem or the annealing problem. Solutions of the constrained problem need not be solutions of the annealing problem. Solutions of the annealing problem are always solutions of the constrained problem.
Theorem: If there exists a saddle-node bifurcation of solutions to the Information Bottleneck problem at $I_0 = I^*$, then $R_I(I_0)$ is neither concave nor convex in any neighborhood of $I^*$.
Similarly, the existence of a saddle-node bifurcation of solutions to the Information Distortion problem at $I_0 = I^*$ implies that $R_H(I_0)$ is neither concave nor convex in any neighborhood of $I^*$.

Continuation
• A local maximum $q_k^*(\beta_k)$ of the optimization problem is an equilibrium of the gradient flow.
• Initial condition $q_{k+1}^{(0)}(\beta_{k+1}^{(0)})$ is sought in the tangent direction $\partial_\beta q_k$, which is found by solving the matrix system
$$d^2_{q,\lambda} L(q_k, \lambda_k, \beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} L(q_k, \lambda_k, \beta_k)$$
• The continuation algorithm used to find $q_{k+1}^*(\beta_{k+1})$ is based on Newton’s method.
• Parameter continuation follows the dashed (---) path, pseudoarclength continuation follows the dotted (…) path.
[Figure: schematic of one continuation step from $(q_k, \lambda_k, \beta_k)$ through the initial guess $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)}, \beta_{k+1}^{(0)})$ to $(q_{k+1}, \lambda_{k+1}, \beta_{k+1})$]

The Groups
• Let P be the finite group of n × n “block” permutation matrices (n = NK) which represents the action of $S_N$ on q and $F(q, \beta)$. For example, if N = 3,
$$\rho = \begin{pmatrix} 0 & I_K & 0 \\ I_K & 0 & 0 \\ 0 & 0 & I_K \end{pmatrix} \in P$$
permutes $q(Z_1|y)$ with $q(Z_2|y)$ for every y.
• $F(q, \beta)$ is P-invariant means that for every $\rho \in P$, $F(\rho q, \beta) = F(q, \beta)$.
• Let $\Gamma$ be the finite group of (n+K) × (n+K) block permutation matrices which represents the action of $S_N$ on $(q, \lambda)$ and $\nabla_{q,\lambda} L(q, \lambda, \beta)$:
$$\Gamma := \left\{ \begin{pmatrix} \rho & 0_{n \times K} \\ 0_{K \times n} & I_K \end{pmatrix} \;\middle|\; \rho \in P \right\}$$
The Lagrange multipliers and constraints are fixed!
• $\nabla_{q,\lambda} L(q, \lambda, \beta)$ is $\Gamma$-equivariant means that for every $\gamma \in \Gamma$, $\nabla_{q,\lambda} L(\gamma (q, \lambda), \beta) = \gamma \nabla_{q,\lambda} L(q, \lambda, \beta)$.

Notation and Definitions
• The symmetry of $(q, \lambda)$ is measured by its isotropy subgroup $\Gamma_{q,\lambda} = \{ \gamma \in \Gamma \mid \gamma (q, \lambda) = (q, \lambda) \}$.
• An isotropy subgroup $\Sigma$ is a maximal isotropy subgroup of $\Gamma$ if there does not exist an isotropy subgroup $\Omega$ of $\Gamma$ such that $\Sigma \subsetneq \Omega \subsetneq \Gamma$.
• At bifurcation $(q^*, \lambda^*, \beta^*)$, the fixed point subspace of $\Gamma_{q^*,\lambda^*}$ is
$$\mathrm{Fix}(\Gamma_{q^*,\lambda^*}) = \{ w \in \ker d^2_{q,\lambda} L(q^*, \lambda^*, \beta^*) \mid \gamma w = w \text{ for every } \gamma \in \Gamma_{q^*,\lambda^*} \}$$

Equivariant Branching Lemma
One of the existence theorems we use to describe a bifurcation in the presence of symmetries is the Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1).
Idea: The bifurcation structure of local solutions is described by the isotropy subgroups $\Sigma$ of $\Gamma$ which have dim Fix($\Sigma$) = 1.
• System: $\dot x = r(x, \beta)$, $r: \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}^m$
• $r(x, \beta)$ is G-equivariant for some compact Lie group G
• $r(0,0) = 0$, $\partial_x r(0,0) = 0$
• Fix(G) = {0}
• Let H be an isotropy subgroup of G such that dim Fix(H) = 1.
• Assume $\partial^2_{x\beta} r(0,0) \ne 0$ (crossing condition).
Then there is a unique smooth solution branch $(t x_0, \beta(t))$ to r = 0 such that $x_0 \in$ Fix(H) and the isotropy subgroup of each solution is H.

Smoller-Wasserman Theorem
Another existence theorem: the Smoller-Wasserman Theorem (1985-6).
For variational problems where $r(x, \beta) = \nabla_x f(x, \beta)$, there is a bifurcating solution tangential to Fix(H) for every maximal isotropy subgroup H, not only those with dim Fix(H) = 1.
• dim Fix(H) = 1 implies that H is a maximal isotropy subgroup.
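The hypotheses of the Equivariant Branching Lemma can be illustrated on the smallest possible example. The sketch below is a toy system chosen for illustration, not one from the talk: $r(x, \beta) = \beta x - x^3$ on the real line with the group $\{+1, -1\}$ acting by sign change, so that Fix(G) = {0}, the trivial subgroup has a one-dimensional fixed space, and $\partial^2_{x\beta} r(0,0) = 1 \ne 0$. The branch $(t, \beta(t))$ with $\beta(t) = t^2$ then solves r = 0.

```python
import numpy as np

def r(x, beta):
    """Toy vector field (an assumption for illustration): r(x, beta) = beta*x - x^3."""
    return beta * x - x**3

t = np.linspace(-0.5, 0.5, 11)

# Equivariance under x -> -x: r(-x, beta) = -r(x, beta).
equivariant = np.allclose(r(-t, 0.3), -r(t, 0.3))

# The bifurcating branch (t * x0, beta(t)) with x0 = 1 and beta(t) = t^2.
beta_of_t = t**2
residual = r(t, beta_of_t)

# beta'(0) = 0, estimated by a centered difference: the branch is pitchfork-like.
h = 1e-4
beta_prime_at_0 = ((h**2) - ((-h)**2)) / (2 * h)

print(equivariant, np.max(np.abs(residual)), beta_prime_at_0)
```

Here the branch exists only for β ≥ 0; replacing $-x^3$ by $+x^3$ flips the branch to β ≤ 0. The sign of the curvature of β(t) at t = 0 is what separates the two cases in this toy, which is the kind of distinction the subcritical/supercritical classification in the talk draws.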