Continuation and Symmetry Breaking Bifurcation of the Information Distortion Function

September 19, 2002

Albert E. Parker
Complex Biological Systems
Department of Mathematical Sciences
Center for Computational Biology
Montana State University

Collaborators: Tomas Gedeon, Alexander Dimitrov, John P. Miller, Zane Aldworth, Bryan Roosien

Outline

• Our Problem
• A Class of Problems
• Continuation
• Bifurcation with Symmetries
• How we can efficiently solve the Class of Problems

The Neural Coding Problem

We want to understand the neural code. We seek an answer to the question: How does neural activity represent information about environmental stimuli?

“The little fly sitting in the fly’s brain trying to fly the fly”

The mathematical problem: Optimizing the Information Distortion Function

$$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} \big( H(Z|Y) + \beta\, I(X,Z) \big)$$

• We are really just interested in $\max_{q \in \Delta} I(X,Z)$.
• environmental stimuli $X$ → $Q(Y|X)$ → neural responses $Y$ → $q(Z|Y)$ → neural responses in $N$ clusters $Z$
• $\Delta := \{\, q(z|y) \in \mathbb{R}^n \;|\; \sum_{z \in Z} q(z|y) = 1 \ \ \forall y \in Y \,\}$
• $H(Z|Y)$ is the conditional entropy of $Z|Y$.
• $I(X,Z)$ is the mutual information between $X$ and $Z$.

Annealing

• At $\beta = 0$: $\max_{q \in \Delta} H(Z|Y)$
• At $\beta = \beta^* \approx 1$, we observe bifurcation: $\max_{q \in \Delta} H(Z|Y) + \beta^* I(X,Z)$
• At $\beta = \infty$: $\max_{q \in \Delta} I(X,Z)$

Application of the method to 4 Gaussian blobs

[Figure: clustering of the 4 Gaussian blobs data set.]

Random clusters

[Figure: random clusters.]

Similar Problems

• Information Bottleneck Method (Tishby, Pereira, Bialek 2000)
  $\max_q \; -I(Y,Z) + \beta\, I(X,Z)$
  Used for document classification, gene expression, neural coding and spectral analysis.
• Deterministic Annealing (Rose 1998)
  $\max_q \; H(Z|Y) + \beta\, D(Y,Z)$
  Clustering algorithm.
• Rate Distortion Theory (Shannon ~1950)
  $\max_q \; -I(Y,Z) + \beta\, D(Y,Z)$

The Class of Problems

$$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} \big( G(q) + \beta\, D(q) \big)$$

To apply the bifurcation theory, the above problem must satisfy:

• $G$ and $D$ are infinitely differentiable in $\Delta$.
• $G$ is strictly concave.
• $G$ and $D$ must be invariant under relabeling of the classes.
• The Hessian of $F$ is block diagonal with $N$ blocks $\{B_\nu\}$, and $B_\nu = B_\eta$ if $q(z_\nu|y) = q(z_\eta|y)$ for every $y \in Y$.
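$$\Delta_q F = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & B_N \end{pmatrix}$$

The Dynamical System

Goal: To efficiently solve $\max_{q \in \Delta} \big( G(q) + \beta D(q) \big)$ for each $\beta$, incremented in sufficiently small steps, as $\beta \to \infty$.

Method: Study the equilibria of the flow

$$\begin{pmatrix} \dot q \\ \dot \lambda \end{pmatrix} = \nabla_{q,\lambda} L(q,\lambda,\beta) := \nabla_{q,\lambda} \Big( G(q) + \beta D(q) + \sum_{y \in Y} \lambda_y \big( \sum_{z} q(z|y) - 1 \big) \Big)$$

• $\nabla_{q,\lambda} L : \mathbb{R}^{n+K} \times \mathbb{R} \to \mathbb{R}^{n+K}$
• The Jacobian with respect to $q$ of the $K$ constraints $\{\sum_z q(z|y) - 1\}$ is $J = (I_K \ I_K \ \cdots \ I_K)$.
• If $w^T \Delta_q F(q^*,\beta)\, w < 0$ for every $w \in \ker J$, then $q^*(\beta)$ is a maximizer of the constrained problem.
• The first equilibrium is $q^*(\beta_0 = 0) \equiv 1/N$.

Investigating the Dynamical System

How:
• Use numerical continuation in a constrained system to choose $\beta$ and to choose an initial guess to find the equilibria $q^*(\beta)$.
• Use bifurcation theory with symmetries to understand bifurcations of the equilibria.

Properties of the Dynamical System

• In our dynamical system $\begin{pmatrix} \dot q \\ \dot \lambda \end{pmatrix} = \nabla_{q,\lambda} L(q,\lambda,\beta)$, the Hessian

$$\Delta_{q,\lambda} L(q,\lambda,\beta) = \begin{pmatrix} \Delta_q F & J^T \\ J & 0 \end{pmatrix}$$

  determines the stability of equilibria and the location of bifurcation.

Theorem: $(q^*,\lambda^*,\beta^*)$ is a bifurcation of equilibria of the flow if and only if $\Delta_{q,\lambda} L(q^*,\lambda^*,\beta^*)$ is singular.

Theorem: Let $(q^*,\beta^*)$ be a local solution to the optimization problem. If $\Delta_{q,\lambda} L(q^*,\lambda^*,\beta^*)$ is singular, then $\Delta_q F(q^*,\beta^*)$ is singular.

Continuation

• A local maximum $q_k^*(\beta_k)$ of the optimization problem is an equilibrium of the gradient flow.
• The initial condition $q_{k+1}^{(0)}(\beta_{k+1}^{(0)})$ is sought in the tangent direction $\partial_\beta q_k$, which is found by solving the matrix system

$$\Delta_{q,\lambda} L(q_k,\lambda_k,\beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} L(q_k,\lambda_k,\beta_k)$$

• The continuation algorithm used to find $q_{k+1}^*(\beta_{k+1})$ is based on Newton’s method.
• We cannot use Newton’s method directly, since we have a constrained problem.
• Instead, we use an Augmented Lagrangian or an implicit solution method.

[Figure: one continuation step in the $(\beta, q)$ plane, from $(q_k^*, \beta_k)$ along the tangent $\partial_\beta q_k$ to the initial guess $(q_{k+1}^{(0)}, \beta_{k+1}^{(0)})$, then corrected to $(q_{k+1}^*, \beta_{k+1})$.]

Bifurcation! What to do?

How to continue after bifurcation? We do not want to get stuck on a local maximum rather than the global maximum.

[Figure: branches $q^*$ emerging from $q^* \equiv 1/N$, labeled "local max" and "global max".]

Conceptual Bifurcation Structure

[Figure: bifurcation diagram running from $q^* \equiv 1/N$ to $q^*(Y_N|Y)$.]

Bifurcations of $q^*(\beta)$

[Figure: Observed Bifurcations for the 4 Blob Problem.]

Questions …

• How to detect bifurcation?
• What kinds of bifurcations do we expect?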
  And how many bifurcating solutions are there?
• How to choose a direction to take after bifurcation is detected?
• Explain the nice bifurcation structure that is observed numerically.
  1. Why are there only $N-1$ bifurcations observed?
  2. Why are there no bifurcations observed after all $N$ classes have resolved?

Bifurcations with symmetry

• To better understand the bifurcation structure, we capitalize on the symmetries of the optimization function $F(q,\beta)$.
• The “obvious” symmetry is that $F(q,\beta)$ is invariant to relabeling of the $N$ classes of $Z$.
• The symmetry group of all permutations on $N$ symbols is $S_N$.
• We need to define the action of $S_N$ on $q$ and $F(q,\beta)$ …

The Groups

• Let $P$ be the finite group of $n \times n$ “block” permutation matrices which represents the action of $S_N$ on $q$ and $F(q,\beta)$. For example, if $N = 3$,

$$\rho = \begin{pmatrix} 0 & I_K & 0 \\ I_K & 0 & 0 \\ 0 & 0 & I_K \end{pmatrix} \in P$$

  permutes $q(z_1|y)$ with $q(z_2|y)$ for every $y$.
• $F(q,\beta)$ is $P$-invariant means that for every $\rho \in P$, $F(\rho q,\beta) = F(q,\beta)$.
• Let $\Gamma$ be the finite group of $(n+K) \times (n+K)$ block permutation matrices which represents the action of $S_N$ on $\begin{pmatrix} q \\ \lambda \end{pmatrix}$ and $\nabla_{q,\lambda} L(q,\lambda,\beta)$:

$$\Gamma := \left\{ \begin{pmatrix} \rho & 0_{n \times K} \\ 0_{K \times n} & I_K \end{pmatrix} \;\middle|\; \rho \in P \right\}$$

  The Lagrange multipliers and constraints are fixed!
• $\nabla_{q,\lambda} L(q,\lambda,\beta)$ is $\Gamma$-equivariant means that for every $\gamma \in \Gamma$,

$$\gamma\, \nabla_{q,\lambda} L(q,\lambda,\beta) = \nabla_{q,\lambda} L\Big( \gamma \begin{pmatrix} q \\ \lambda \end{pmatrix}, \beta \Big)$$

Bifurcations with symmetry

• The symmetry of $\begin{pmatrix} q \\ \lambda \end{pmatrix}$ is measured by its isotropy subgroup

$$\Gamma_{q,\lambda} := \left\{ \gamma \in \Gamma \;\middle|\; \gamma \begin{pmatrix} q \\ \lambda \end{pmatrix} = \begin{pmatrix} q \\ \lambda \end{pmatrix} \right\}$$

• An isotropy subgroup $\Sigma$ is a maximal isotropy subgroup of $\Gamma$ if there does not exist an isotropy subgroup $\Omega$ of $\Gamma$ such that $\Sigma \subsetneq \Omega \subsetneq \Gamma$.
• At bifurcation $(q^*,\lambda^*,\beta^*)$, the fixed point subspace of $\Gamma_{q^*,\lambda^*}$ is

$$\mathrm{Fix}(\Gamma_{q^*,\lambda^*}) = \{\, w \in \ker \Delta_{q,\lambda} L(q^*,\lambda^*,\beta^*) \;|\; \gamma w = w \ \ \forall \gamma \in \Gamma_{q^*,\lambda^*} \,\}$$

Bifurcations with symmetry

One of the tools we use to describe a bifurcation in the presence of symmetries is the Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1).

Idea: The bifurcation structure of local solutions is described by the isotropy subgroups $\Sigma$ of $\Gamma$ which have $\dim \mathrm{Fix}(\Sigma) = 1$.

• System: $\dot x = r(x,\beta)$, $r : \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}^m$
• $r(x,\beta)$ is $G$-equivariant for some compact Lie group $G$
• $r(0,0) = 0$, $\partial_x r(0,0) = 0$
• $\mathrm{Fix}(G) = \{0\}$
• Let $H$ be an isotropy subgroup of $G$ such that $\dim \mathrm{Fix}(H) = 1$.
• Assume $\partial_\beta \partial_x r(0,0)\, x_0 \neq 0$ for nontrivial $x_0 \in \mathrm{Fix}(H)$ (crossing condition).

Then there is a unique smooth solution branch $(t x_0, \beta(t))$ to $r = 0$ such that $x_0 \in \mathrm{Fix}(H)$ and the isotropy subgroup of each solution is $H$.

Bifurcations with symmetry

The other tool: Smoller-Wasserman Theorem (1985-6)

For variational problems where $r(x,\beta) = \nabla_x f(x,\beta)$, there is a bifurcating solution tangential to $\mathrm{Fix}(H)$ for every maximal isotropy subgroup $H$, not only those with $\dim \mathrm{Fix}(H) = 1$.

• $\dim \mathrm{Fix}(H) = 1$ implies that $H$ is a maximal isotropy subgroup.

What do the bifurcations look like?

From bifurcation, the Equivariant Branching Lemma shows that the following solutions emerge:

• An equilibrium $q^*$ is called M-uniform if $\Delta_q F(q^*,\beta)$ has $M$ blocks that are identical.
• The $M$ classes of $Z$ corresponding to these $M$ identical blocks are called unresolved classes.
• The classes of $Z$ that are not unresolved are called resolved.
• The first equilibrium, $q^* \equiv 1/N$, is N-uniform.

Theorem: $q^*$ is M-uniform if and only if $q^*$ is fixed by $S_M$.

What do the bifurcations look like?

Theorem: $\dim \ker \Delta_q F(q^*,\beta) = M$ with basis vectors $\{\mathbf{v}_i\}_{i=1}^{M}$,

$$[\mathbf{v}_i]_\nu = \begin{cases} v & \text{if } \nu \text{ is the } i\text{th unresolved class} \\ 0 & \text{otherwise} \end{cases}$$

Theorem: $\dim \ker \Delta_{q,\lambda} L(q^*,\lambda,\beta) = M-1$ with basis vectors

$$\begin{pmatrix} \mathbf{v}_i \\ 0 \end{pmatrix} - \begin{pmatrix} \mathbf{v}_M \\ 0 \end{pmatrix}, \qquad i = 1, \dots, M-1$$

Point: Since the bifurcating solutions whose existence is guaranteed by the EBL and the SW Theorem are tangential to $\ker \Delta_{q,\lambda} L(q^*,\lambda,\beta)$, we know the explicit form of the bifurcating directions.

What do the bifurcations look like?

Assumptions:

• Let $q^*$ be M-uniform.
• Call the $M$ identical blocks of $\Delta_q F(q^*,\beta)$: $B$. Call the other $N-M$ blocks of $\Delta_q F(q^*,\beta)$: $\{R_\nu\}$. We assume that $B$ has a single nullvector $v$ and that $R_\nu$ is nonsingular for every $\nu$.
• If $M < N$, then $B \sum_\nu R_\nu^{-1} + M I_K$ is nonsingular.

Theorem: Let $(q^*,\lambda^*,\beta^*)$ be a singular point of the flow $\begin{pmatrix} \dot q \\ \dot \lambda \end{pmatrix} = \nabla_{q,\lambda} L(q,\lambda,\beta)$ such that $q^*$ is M-uniform.
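Then there exist $M$ bifurcating (M-1)-uniform solutions $(q^*,\lambda^*,\beta^*) + (t u_k, 0, \beta(t))$, where

$$[u_k]_\nu = \begin{cases} (M-1)\,v & \text{if } \nu \text{ is the } k\text{th unresolved class} \\ -v & \text{if } \nu \neq k \text{ is any other unresolved class} \\ 0 & \text{otherwise} \end{cases}$$

Example: Some of the bifurcating branches when $N = 4$ are given by the following isotropy subgroup lattice for $S_4$. Each direction lists the four class blocks followed by the multiplier block.

$S_4$
  $S_3$ along $(3v, -v, -v, -v, 0)^T$
    $S_2$ along $(0, 2v, -v, -v, 0)^T$, $(0, -v, 2v, -v, 0)^T$, $(0, -v, -v, 2v, 0)^T$
  $S_3$ along $(-v, 3v, -v, -v, 0)^T$
    $S_2$ along $(2v, 0, -v, -v, 0)^T$, $(-v, 0, 2v, -v, 0)^T$, $(-v, 0, -v, 2v, 0)^T$
  $S_3$ along $(-v, -v, 3v, -v, 0)^T$
    $S_2$ along $(2v, -v, 0, -v, 0)^T$, $(-v, 2v, 0, -v, 0)^T$, $(-v, -v, 0, 2v, 0)^T$
  $S_3$ along $(-v, -v, -v, 3v, 0)^T$
    $S_2$ along $(2v, -v, -v, 0, 0)^T$, $(-v, 2v, -v, 0, 0)^T$, $(-v, -v, 2v, 0, 0)^T$
The bottom of the lattice is the trivial subgroup 1.

For the 4 Blob problem: the isotropy subgroups and bifurcating directions of the observed bifurcating branches

isotropy group        bifurcating direction
$S_4$ to $S_3$        $(-v, -v, 3v, -v, 0)^T$
$S_3$ to $S_2$        $(-v, 2v, 0, -v, 0)^T$
$S_2$ to 1            $(-v, 0, 0, v, 0)^T$
1                     … no more bifurcations!

Are there other branches?

The Smoller-Wasserman Theorem shows that (under the same assumptions as before) if $M$ is composite, then there exist bifurcating solutions with isotropy group $\langle \gamma^p \rangle$ for every element $\gamma$ of order $M$ in $\Gamma$ and every prime $p \,|\, M$. Furthermore, $\dim \mathrm{Fix}\langle \gamma^p \rangle = p - 1$.

We have never numerically observed solutions fixed by $\langle \gamma^p \rangle$, and so perhaps they are unstable.

Bifurcating branches from a 4-uniform solution are given by the following isotropy subgroup lattice for $S_4$

[Figure: subgroup lattice below $S_4$ containing $A_4$ with $\mathrm{Fix}(A_4) = \{0\}$, the cyclic subgroup $\langle (1234) \rangle$ with $\mathrm{Fix}\langle (1234) \rangle = \{0\}$, $\langle (1324) \rangle$, and the subgroups $\langle (12),(34) \rangle$, $\langle (13),(24) \rangle$, $\langle (14),(23) \rangle$ with fixed directions $(v, v, -v, -v)^T$, $(v, -v, v, -v)^T$, $(v, -v, -v, v)^T$ respectively.]

Maximal isotropy subgroups for $S_4$

[Figure: lattice with $S_4$ at the top, above four copies of $S_3$, $A_4$, and $\langle (12),(34) \rangle$, $\langle (13),(24) \rangle$, $\langle (14),(23) \rangle$.]

Issues: $S_M$

• The full lattice of subgroups of the group $S_M$ is not known for arbitrary $M$.

Bifurcation Type?

Pitchfork?
Conjecture: There are only pitchfork bifurcations. (Show that for $\beta'(t)$, which depends on $\partial^3_{qqq} F(q,\beta)$, $\beta'(0) = 0$ for any $M$; we have this result for $M = 2, 3$.)

Subcritical or Supercritical?
If not pitchfork, $\beta'(0) > 0$ or $\beta'(0) < 0$ answers this question.
If pitchfork, one needs to examine $\beta''(0)$, which depends on $\partial^4_{qqqq} F(q,\beta)$.

Stability? (Is the bifurcating solution a maximum of the optimization problem?)

Answered Questions

• How to detect bifurcation?
  Look for singularity of $B$.
• What kinds of bifurcations do we expect?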
  And how many bifurcating solutions are there?
  $M$ bifurcating (M-1)-uniform solutions.
• How to choose a direction to take after bifurcation is detected?
  $((M-1)v, -v, -v, -v, \dots, -v, 0)^T$
• Explain the nice bifurcation structure that is observed numerically.
  1. Why are there only $N-1$ bifurcations observed?
     There are only $N$ different types of M-uniform solutions for $M \le N$.
  2. Why are there no bifurcations observed after all $N$ classes have resolved?
     For 1-uniform solutions, $\Delta_{q,\lambda} L(q^*,\lambda,\beta)$ is nonsingular.

The efficient algorithm

Let $q_0$ be the maximizer of $\max_q G(q)$, $\beta_0 = 1$ and $s > 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_q G(q) + \beta_k D(q)$. Iterate the following steps until $\beta_K = \beta_{\max}$ for some $K$.

1. Perform $\beta$-step: solve

$$\Delta_{q,\lambda} L(q_k,\lambda_k,\beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} L(q_k,\lambda_k,\beta_k)$$

   for $\begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix}$ and select $\beta_{k+1} = \beta_k + d_k$, where $d_k = s / \big( \|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1 \big)^{1/2}$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + d_k\, \partial_\beta q_k$.
3. Optimization: solve $\max_q G(q) + \beta_{k+1} D(q)$ to get the maximizer $q_{k+1}$, using the initial guess $q_{k+1}^{(0)}$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $\Delta_q [G(q_k) + \beta_k D(q_k)]$ and $\Delta_q [G(q_{k+1}) + \beta_{k+1} D(q_{k+1})]$. If a bifurcation is detected, then set $q_{k+1}^{(0)} = q_k + d_k u$, where $u$ is in $\mathrm{Fix}(H)$, and repeat step 3.
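
The linear-algebra pieces of steps 1, 2 and 4 can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used for the results in this talk: the function names (`tangent`, `beta_step`, `initial_guess`, `bifurcation_detected`, `bifurcating_direction`) are hypothetical, the Hessian $\Delta_{q,\lambda} L$, the vector $\partial_\beta \nabla_{q,\lambda} L$ and the identical Hessian block are assumed to be supplied by the caller, and the constrained optimizer of step 3 is left out entirely.

```python
import numpy as np

def tangent(hess_L, dbeta_grad_L):
    """Step 1: solve hess_L @ t = -dbeta_grad_L for the tangent t = (dq, dlam)."""
    return np.linalg.solve(hess_L, -dbeta_grad_L)

def beta_step(dq, dlam, s):
    """Step 1: step length d_k = s / sqrt(||dq||^2 + ||dlam||^2 + 1)."""
    return s / np.sqrt(dq @ dq + dlam @ dlam + 1.0)

def initial_guess(q_k, dq, d_k):
    """Step 2: initial guess q_{k+1}^{(0)} = q_k + d_k * dq."""
    return q_k + d_k * dq

def bifurcation_detected(block_k, block_k1):
    """Step 4: True if the determinant of the identical Hessian block
    changes sign (or vanishes) between two consecutive solutions."""
    sign_k, _ = np.linalg.slogdet(block_k)
    sign_k1, _ = np.linalg.slogdet(block_k1)
    return sign_k * sign_k1 <= 0

def bifurcating_direction(v, unresolved, k, N):
    """Direction (u_k, 0): (M-1)v in class k, -v in the other unresolved
    classes, 0 in resolved classes, followed by K zeros for the multipliers.
    Classes are indexed 0..N-1 here (an assumption of this sketch)."""
    v = np.asarray(v, dtype=float)
    M, K = len(unresolved), len(v)
    u = np.zeros((N, K))
    for nu in unresolved:
        u[nu] = (M - 1) * v if nu == k else -v
    return np.concatenate([u.reshape(-1), np.zeros(K)])
```

For example, with $N = 4$, all four classes unresolved and `k=2` (the third class), `bifurcating_direction` returns the blocks $(-v, -v, 3v, -v, 0)$, which is the $S_4$ to $S_3$ direction listed for the 4 Blob problem. Using `slogdet` rather than `det` is a design choice of this sketch: it returns the sign directly and avoids overflow for large blocks.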