Symmetry Breaking Bifurcations of the Information Distortion
Dissertation Defense, April 8, 2003
Albert E. Parker III
Complex Biological Systems, Department of Mathematical Sciences, Center for Computational Biology, Montana State University

Goal: Solve the Information Distortion Problem
The goal of my thesis is to solve the Information Distortion problem, an optimization problem of the form
$\max_{q \in \Delta} G(q)$ constrained by $D(q) \ge D_0$
where
• $\Delta$ is a subset of $\mathbb{R}^n$.
• G and D are sufficiently smooth in $\Delta$.
• G and D have symmetry: they are invariant to some group action.
Problems of this form arise in the study of clustering problems or optimal source coding systems.

Goal: Another Formulation
Using the method of Lagrange multipliers, the goal of finding solutions of the optimization problem can be rephrased as finding stationary points of the problem
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
where
• $\beta \in [0,\infty)$.
• $\Delta$ is a subset of $\mathbb{R}^{NK}$.
• G and D are sufficiently smooth in $\Delta$.
• G and D have symmetry: they are invariant to some group action.

How: Determine the Bifurcation Structure
We have described the bifurcation structure of stationary points to any problem of the form
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
where
• $\beta \in [0,\infty)$.
• $\Delta$ is a linear subset of $\mathbb{R}^{NK}$.
• G and D are sufficiently smooth in $\Delta$.
• G and D have symmetry: they are invariant to some group action.
Thesis Topics
• The Data Clustering Problem
• The Neural Coding Problem
• Information Theory / Probability Theory
• Optimization Theory
• Dynamical Systems
• Bifurcation Theory with Symmetries
• Group Theory
• Continuation Techniques

Outline of this talk
• The Data Clustering Problem
• A Class of Optimization Problems
• Bifurcation with Symmetries
• Numerical Results

The Data Clustering Problem
[Figure: a clustering $q(Y_N|Y)$ maps Y (K objects $\{y_i\}$) to $Y_N$ (N objects $\{y_{N_i}\}$).]
• Data Classification: identifying all of the books printed in 2002 which address the martial art Kempo
• Data Compression: converting a bitmap file to a jpeg file

A Symmetry: invariance to relabelling of the clusters of $Y_N$
[Figure: the same clustering $q(Y_N|Y)$ of Y (K objects $\{y_i\}$) into $Y_N$ (N objects $\{y_{N_i}\}$), shown twice with the labels "class 1" and "class 2" exchanged.]

Requirements of a Clustering Method
• The original data is represented reasonably well by the clusters
– Choosing a cost function, $D(Y,Y_N)$, called a distortion function, rigorously defines what we mean by "the data is represented reasonably well".
• Fast implementation

Examples optimizing at a distortion level $D(Y,Y_N) \le D_0$
• Deterministic Annealing (Rose 1998), A Fast Clustering Algorithm:
$\max_{q \in \Delta} H(Y_N|Y)$ constrained by $D(Y,Y_N) \le D_0$
• Rate Distortion Theory (Shannon ~1950), Minimum Informative Compression:
$\min_{q \in \Delta} I(Y;Y_N)$ constrained by $D(Y,Y_N) \le D_0$
$\Delta := \{ q(y_N|y) \mid \sum_{y_N \in Y_N} q(y_N|y) = 1 \ \forall y \in Y \} \subset \mathbb{R}^{NK}$

Inputs and Outputs and Clustered Outputs
[Figure: Inputs X (L objects $\{x_i\}$) are related to Outputs Y (K objects $\{y_i\}$) by $p(X,Y)$; the clustering $q(Y_N|Y)$ maps Y to Clusters $Y_N$ (N objects $\{y_{N_i}\}$).]
• The Information Distortion method clusters the outputs Y into clusters $Y_N$ so that the information that one can learn about X by observing $Y_N$, $I(X;Y_N)$, is as close as possible to the mutual information $I(X;Y)$
• The corresponding information distortion function is $D_I(Y,Y_N) = I(X;Y) - I(X;Y_N)$

Two optimization problems which use the information distortion function
• Information Distortion Method (Dimitrov and Miller 2001)
$\max_{q \in \Delta} H(Y_N|Y)$ constrained by $D_I(Y,Y_N) \le D_0$
$\max_{q \in \Delta} H(Y_N|Y) + \beta I(X;Y_N)$
• Information Bottleneck Method (Tishby, Pereira, Bialek 1999)
$\min_{q \in \Delta} I(Y;Y_N)$ constrained by $D_I(Y,Y_N) \le D_0$
$\max_{q \in \Delta} -I(Y;Y_N) + \beta I(X;Y_N)$

An annealing algorithm to solve $\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
Let $q_0$ be the maximizer of $\max_{q \in \Delta} G(q)$, and let $\beta_0 = 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_{q \in \Delta} G(q) + \beta_k D(q)$. Iterate the following steps until $\beta_K = \beta_{max}$ for some K.
1. Perform $\beta$-step: Let $\beta_{k+1} = \beta_k + d_k$ where $d_k > 0$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + \eta$ for some small perturbation $\eta$.
3. Optimization: solve $\max_{q \in \Delta} (G(q) + \beta_{k+1} D(q))$ to get the maximizer $q_{k+1}$, using initial guess $q_{k+1}^{(0)}$.

Application of the annealing method to the Information Distortion problem $\max_{q \in \Delta} (H(Y_N|Y) + \beta I(X;Y_N))$ when $p(X,Y)$ is defined by four gaussian blobs
[Figure: $p(X,Y)$ for Inputs X (52 objects) and Outputs Y (52 objects), and clusterings $q(Y_N|Y)$ of Y (52 objects) into $Y_N$ (N objects).]
$I(X;Y_N) = D(q(Y_N|Y))$

Observed Bifurcations for the Four Blob problem:
We just saw the optimal clusterings $q^*$ at some $\beta^* = \beta_{max}$. What do the clusterings look like for $\beta < \beta_{max}$?
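The three annealing steps above can be sketched in code for the Information Distortion cost $F(q,\beta) = H(Y_N|Y) + \beta I(X;Y_N)$. This is a minimal sketch under stated assumptions, not the solver used in the thesis. The slides do not say how step 3 is solved, so a fixed-point iteration on the stationarity condition $q(\nu|y) \propto \exp(\beta \sum_x p(x|y) \log p(x|\nu))$ is swapped in here, with no claim that it always converges to the maximizer. The names `anneal`, `fixed_point_step`, `d_beta`, `noise` and the toy distribution `P_XY` are hypothetical.

```python
import math
import random

def normalize_rows(rows, floor=1e-12):
    """Clamp entries away from zero and rescale each row to sum to 1."""
    out = []
    for row in rows:
        row = [max(v, floor) for v in row]
        total = sum(row)
        out.append([v / total for v in row])
    return out

def joint_x_yn(q, pxy):
    """p(x, nu) = sum_y q(nu|y) p(x, y); q[y][nu] holds q(nu|y)."""
    return [[sum(q[y][nu] * px_row[y] for y in range(len(q)))
             for nu in range(len(q[0]))] for px_row in pxy]

def info_x_yn(q, pxy):
    """Mutual information I(X;Y_N) in nats."""
    pxn = joint_x_yn(q, pxy)
    px = [sum(row) for row in pxy]
    pn = [sum(col) for col in zip(*pxn)]
    return sum(pxn[x][nu] * math.log(pxn[x][nu] / (px[x] * pn[nu]))
               for x in range(len(pxn)) for nu in range(len(pn)))

def fixed_point_step(q, pxy, beta):
    """One pass of q(nu|y) <- exp(beta * sum_x p(x|y) log p(x|nu)) / Z(y)."""
    pxn = joint_x_yn(q, pxy)
    pn = [sum(col) for col in zip(*pxn)]
    new_q = []
    for y in range(len(q)):
        py = sum(row[y] for row in pxy)
        scores = [beta * sum(pxy[x][y] / py * math.log(pxn[x][nu] / pn[nu])
                             for x in range(len(pxy)))
                  for nu in range(len(pn))]
        top = max(scores)  # subtract the max for numerical stability
        new_q.append([math.exp(s - top) for s in scores])
    return normalize_rows(new_q)

def anneal(pxy, n_classes, beta_max, d_beta=0.1, inner_iters=50,
           noise=1e-3, seed=0):
    """Annealing in beta with an assumed fixed-point optimizer for step 3."""
    rng = random.Random(seed)
    # q_0 = 1/N maximizes H(Y_N|Y) at beta_0 = 0.
    q = [[1.0 / n_classes] * n_classes for _ in pxy[0]]
    beta = 0.0
    while beta < beta_max:
        beta = min(beta + d_beta, beta_max)            # step 1: beta-step
        q = normalize_rows([[v + noise * rng.random() for v in row]
                            for row in q])             # step 2: perturbed guess
        for _ in range(inner_iters):                   # step 3: optimization
            q = fixed_point_step(q, pxy, beta)
    return q, beta

# Toy joint distribution p(x, y): rows index x, columns index y (assumed data).
P_XY = [[0.10, 0.10, 0.02, 0.03],
        [0.10, 0.10, 0.03, 0.02],
        [0.02, 0.03, 0.10, 0.10],
        [0.03, 0.02, 0.10, 0.10]]
```

Calling `anneal(P_XY, 2, 3.0)` returns a row-stochastic clustering together with the final value of beta. The small perturbation in step 2 is what lets the iteration leave the uniform solution once that solution stops being a maximizer.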
[Figure: Observed Bifurcations for the 4 Blob Problem, shown beside the Conceptual Bifurcation Structure of $q^*$.]
• Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations?
• What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
• How many bifurcating solutions are there?
• What do the bifurcating branches look like? Are they subcritical or supercritical?
• What is the stability of the bifurcating branches?
• Is there always a bifurcating branch which contains solutions of the optimization problem?
• Are there bifurcations after all of the classes have resolved?

Bifurcations with symmetry
• To better understand the bifurcation structure, we capitalize on the symmetries of the function $G(q) + \beta D(q)$
• The "obvious" symmetry is that $G(q) + \beta D(q)$ is invariant to relabelling of the N classes of $Y_N$ [Figure: switch labels 1 and 3]
• The symmetry group of all permutations on N symbols is $S_N$.

Symmetry Breaking Bifurcations
[Figure, built up over several slides: the first branch $q^* = q_{1/N}$ (here $q_{1/4}$) is fixed by $S_N = S_4$; after the first bifurcation $q^*$ is fixed by $S_{N-1} = S_3$; after the second bifurcation $q^*$ is fixed by $S_{N-2} = S_2$.]
[Figure: a further branch on which $q^*$ is fixed by $\langle (N\text{-cycle})^p \rangle = \langle (1324)^2 \rangle = \langle (12)(34) \rangle$.]

Existence Theorems for Bifurcating Branches
Given a bifurcation at a point fixed by $S_N$,
• Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1)
• There are N bifurcating branches, each of which has symmetry $S_{N-1}$.
• The Smoller-Wasserman Theorem (Smoller and Wasserman 1985-6)
• There are bifurcating branches which have symmetry $\langle (N\text{-cycle})^p \rangle$ for every prime $p \mid N$, $p < N$.
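The subgroups named by the Smoller-Wasserman statement above can be enumerated mechanically. The following is a small illustrative sketch, assuming only the statement on the slide: it lists the primes p with p | N and p < N, and writes the p-th power of an N-cycle as a product of disjoint cycles. The function names are hypothetical.

```python
def proper_prime_divisors(n):
    """Primes p with p | n and p < n."""
    return [p for p in range(2, n)
            if n % p == 0
            and all(p % d for d in range(2, int(p ** 0.5) + 1))]

def cycle_power(cycle, p):
    """Disjoint-cycle form of cycle**p, for a cycle given as a tuple.

    The cycle (1, 3, 2, 4) sends 1 -> 3 -> 2 -> 4 -> 1.
    Fixed points are omitted from the result.
    """
    n = len(cycle)
    image = {cycle[i]: cycle[(i + p) % n] for i in range(n)}
    seen, cycles = set(), []
    for start in sorted(image):
        if start in seen:
            continue
        orbit = [start]
        seen.add(start)
        nxt = image[start]
        while nxt != start:
            orbit.append(nxt)
            seen.add(nxt)
            nxt = image[nxt]
        if len(orbit) > 1:
            cycles.append(tuple(orbit))
    return cycles

# For N = 4 the only admissible prime is 2, and (1324)^2 = (12)(34).
for p in proper_prime_divisors(4):
    print(p, cycle_power((1, 3, 2, 4), p))
```

For N = 3 the list of admissible primes is empty, which matches the remark in the talk that the theorem gives no additional branches when bifurcating from a point fixed by $S_3$.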
Existence Theorems for Bifurcating Branches
Given a bifurcation at a point fixed by $S_{N-1}$,
• Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1)
• Gives N-1 bifurcating branches which have symmetry $S_{N-2}$.
• The Smoller-Wasserman Theorem (Smoller and Wasserman 1985-6)
• Gives bifurcating branches which have symmetry $\langle (M\text{-cycle})^p \rangle$ for every prime $p \mid N-1$, $p < N-1$.
When N = 4, N-1 = 3, there are no bifurcating branches given by the SW Theorem.

Bifurcation Structure corresponds with Group Structure
A partial subgroup lattice for $S_4$ and the corresponding bifurcating directions given by the Equivariant Branching Lemma:
$S_4$
• $S_3$: $(3v, -v, -v, -v, 0)^T$, with three $S_2$ below it: $(0, 2v, -v, -v, 0)^T$, $(0, -v, 2v, -v, 0)^T$, $(0, -v, -v, 2v, 0)^T$
• $S_3$: $(-v, 3v, -v, -v, 0)^T$, with three $S_2$ below it: $(2v, 0, -v, -v, 0)^T$, $(-v, 0, 2v, -v, 0)^T$, $(-v, 0, -v, 2v, 0)^T$
• $S_3$: $(-v, -v, 3v, -v, 0)^T$, with three $S_2$ below it: $(2v, -v, 0, -v, 0)^T$, $(-v, 2v, 0, -v, 0)^T$, $(-v, -v, 0, 2v, 0)^T$
• $S_3$: $(-v, -v, -v, 3v, 0)^T$, with three $S_2$ below it: $(2v, -v, -v, 0, 0)^T$, $(-v, 2v, -v, 0, 0)^T$, $(-v, -v, 2v, 0, 0)^T$
The trivial group 1 lies below every $S_2$.

A partial subgroup lattice for $S_4$ and the corresponding bifurcating directions given by the Smoller-Wasserman Theorem:
$S_4$
• $\langle (1234) \rangle$ with $Fix(\langle (1234) \rangle) = \{0\}$, $A_4$ with $Fix(A_4) = \{0\}$, and $\langle (1324) \rangle$
• $\langle (12)(34) \rangle$: $(v, v, -v, -v)^T$
• $\langle (13)(24) \rangle$: $(v, -v, v, -v)^T$
• $\langle (14)(23) \rangle$: $(v, -v, -v, v)^T$

Conceptual Bifurcation Structure
[Figure: the conceptual bifurcation diagram of $q^*$ beside the group structure $S_4$, four copies of $S_3$, their $S_2$ subgroups, and 1.]
The Equivariant Branching Lemma shows that the bifurcation structure from $S_M$ to $S_{M-1}$ is …
[Figure: the conceptual bifurcation diagram of $q^*$ beside the group structure $S_4$, $\langle (1324) \rangle$, $A_4$, $\langle (12)(34) \rangle$, $\langle (13)(24) \rangle$, $\langle (14)(23) \rangle$.]
The Smoller-Wasserman Theorem shows additional structure … 3 branches from the $S_4$ to $S_3$ bifurcation only.
If we stay on a branch which is fixed by $S_M$, how many bifurcations are there?
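The directions in the Equivariant Branching Lemma lattice follow one pattern, stated later in the talk: the block for the k-th unresolved class is (M-1)v, the blocks for the other unresolved classes are -v, and the blocks for resolved classes are 0. A minimal sketch that generates the q-part of these directions is given below. The function name `ebl_direction`, the 0-based class indices, and the stand-in vector v = [1.0] are assumptions made for illustration; the trailing 0 for the Lagrange multiplier component is omitted.

```python
def ebl_direction(k, unresolved, n_classes, v):
    """Blocks of u_k for an M-uniform solution, M = len(unresolved).

    Class k gets (M - 1) * v, every other unresolved class gets -v,
    and every resolved class gets the zero vector.
    """
    m = len(unresolved)
    blocks = []
    for nu in range(n_classes):
        if nu == k:
            coeff = m - 1
        elif nu in unresolved:
            coeff = -1
        else:
            coeff = 0
        blocks.append([coeff * x for x in v])
    return blocks

v = [1.0]  # stand-in for the K x 1 nullvector of the repeated Hessian block

# S_4 -> S_3: all four classes unresolved, class 0 splits off.
print(ebl_direction(0, {0, 1, 2, 3}, 4, v))
# S_3 -> S_2: classes 0, 1, 3 unresolved, class 1 splits off.
print(ebl_direction(1, {0, 1, 3}, 4, v))
```

The blocks of each direction sum to zero, so the direction is compatible with the normalization constraints on q, and permuting the remaining unresolved classes leaves it unchanged.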
Conceptual Bifurcation Structure
Theorem: There are exactly K/N bifurcations on the branch $(q_{1/N}, \beta)$ for the Information Distortion problem.
[Figure: the conceptual bifurcation diagram of $q^*$ beside the group structure $S_4$, $\langle (1324) \rangle$, $A_4$, $\langle (12)(34) \rangle$, $\langle (13)(24) \rangle$, $\langle (14)(23) \rangle$.]
There are 13 bifurcations on the first branch.
Bifurcation theory in the presence of symmetries enables us to answer the questions previously posed …

[Figure: Observed Bifurcations for the 4 Blob Problem, shown beside the Conceptual Bifurcation Structure of $q^*$.]
• Why are there only 3 bifurcations observed? In general, are there only N-1 bifurcations? There are N-1 symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ for $M \le N$.
• What kinds of bifurcations do we expect: pitchfork-like, transcritical, saddle-node, or some other type?
• How many bifurcating solutions are there? There are at least N from the first bifurcation, at least N-1 from the next one, etc.
• What do the bifurcating branches look like? They are subcritical or supercritical depending on the sign of the bifurcation discriminator $\zeta(q^*, \beta^*, u_k)$.
• What is the stability of the bifurcating branches?
• Is there always a bifurcating branch which contains solutions of the optimization problem? No.
• Are there bifurcations after all of the classes have resolved? In general, no.

We can explain the bifurcation structure of all problems of the form
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
where
• $\beta \in [0,\infty)$.
• $\Delta$ is a subset of $\mathbb{R}^{NK}$.
• G and D are sufficiently smooth in $\Delta$.
• G and D are invariant to relabelling of the classes of $Y_N$
• The blocks of the Hessian $\Delta_q (G + \beta D)$ at bifurcation satisfy a set of generic conditions.
This class of problems includes the Information Distortion problem.

[Diagram: singularity structure. $\Delta_{q,\lambda} \mathcal{L}$ is the constrained Hessian and $\Delta_q F$ is the unconstrained Hessian. The diagram cites chapters 4, 6 and 8 of the thesis.]
When $\Delta_{q,\lambda} \mathcal{L}$ is singular:
• $\Delta_q F$ is singular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular: Non-generic
• $\Delta_q F$ is singular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is nonsingular: Symmetry breaking bifurcation if $M > 1$; Impossible scenario if $M = 1$
• $\Delta_q F$ is nonsingular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular: Saddle-node bifurcation
• $\Delta_q F$ is nonsingular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is nonsingular: Impossible scenario

Continuation techniques provide numerical confirmation of the theory
[Figure: Previously Observed Bifurcation Structure for the Four Blob problem.]

Equivariant Branching Lemma: Previous vs. Actual Bifurcation Structure
[Figure: previous results beside the actual structure, with markers for singularity of $\Delta_q F$ (*) and singularity of $\Delta_{q,\lambda} \mathcal{L}$.]
We used Continuation Techniques and the Theory of Bifurcations with Symmetries on the 4 Blob Problem using the Information Distortion method to get this picture.

Smoller-Wasserman Theorem: there are bifurcating branches with symmetry $\langle (1324)^2 \rangle = \langle (12)(34) \rangle$
[Figure: bifurcation diagram of $q^*$ including these branches.]

A closer look …
[Figure: Bifurcation from $S_4$ to $S_3$ …]
The bifurcation from $S_4$ to $S_3$ is subcritical … (the theory predicted this since the bifurcation discriminator $\zeta(q_{1/4}, \beta^*, u) < 0$)
[Figure: Bifurcation from $S_3$ to $S_2$ …]
The bifurcation from $S_3$ to $S_2$ is subcritical …
[Figure: Bifurcation from $S_2$ to $S_1$ …]
The bifurcation from $S_2$ to $S_1$ …
What are these branches ???

Conclusions
Theorem: In general, either symmetry breaking bifurcations or saddle-node bifurcations can occur.
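Outline of proof: The Equivariant Branching Lemma, Smoller-Wasserman Theorem, and the following singularity structure when $\Delta_{q,\lambda} \mathcal{L}$ is singular:
• $\Delta_q F$ is singular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular: Non-generic
• $\Delta_q F$ is singular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is non-singular: Symmetry breaking bifurcation if $M > 1$; Impossible scenario if $M = 1$
• $\Delta_q F$ is non-singular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is singular: Saddle-node bifurcation
• $\Delta_q F$ is non-singular and $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is non-singular: Impossible scenario

Conclusions
Theorem: All symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ are pitchfork-like, and there exist M bifurcating branches, for which we have explicit directions.

Conclusions
Theorem: The bifurcation discriminator $\zeta(q^*, \beta^*, u)$ of the pitchfork-like branch $(q^*, \lambda^*, \beta^*) + (tu, 0, \beta(t))$ is
[Equation garbled in extraction: an expression in the derivatives of $\mathcal{L}$ along u and the fourth derivative of F along v, with an M-dependent coefficient.]
If $\zeta(q^*, \beta^*, u_k) < 0$, then the branch is subcritical. If $\zeta(q^*, \beta^*, u_k) > 0$, then the branch is supercritical.

Conclusions
Theorem: Solutions of the optimization problem do not always persist from bifurcation.
Theorem: In general, bifurcations do not occur after all of the classes have resolved.

A numerical algorithm to solve $\max_{q \in \Delta} (G(q) + \beta D(q))$
Let $q_0$ be the maximizer of $\max_{q \in \Delta} G(q)$, $\beta_0 = 1$ and $\Delta s > 0$. For $k \ge 0$, let $(q_k, \beta_k)$ be a solution to $\max_{q \in \Delta} G(q) + \beta_k D(q)$. Iterate the following steps until $\beta_K = \beta_{max}$ for some K.
1. Perform $\beta$-step: solve
$\Delta_{q,\lambda} \mathcal{L}(q_k, \lambda_k, \beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} \mathcal{L}(q_k, \lambda_k, \beta_k)$
for $(\partial_\beta q_k, \partial_\beta \lambda_k)$ and select $\beta_{k+1} = \beta_k + d_k$ where
$d_k = (\Delta s \, \mathrm{sgn}(\cos\theta)) / (\|\partial_\beta q_k\|^2 + \|\partial_\beta \lambda_k\|^2 + 1)^{1/2}$.
2. The initial guess for $(q_{k+1}, \lambda_{k+1})$ at $\beta_{k+1}$ is $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)}) = (q_k, \lambda_k) + d_k (\partial_\beta q_k, \partial_\beta \lambda_k)$.
3. Optimization: solve $\max_{q \in \Delta} (G(q) + \beta_{k+1} D(q))$ using pseudoarclength continuation to get the maximizer $q_{k+1}$ and the vector of Lagrange multipliers $\lambda_{k+1}$, using initial guess $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)})$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $\Delta_q [G(q_k) + \beta_k D(q_k)]$ and $\Delta_q [G(q_{k+1}) + \beta_{k+1} D(q_{k+1})]$. If a bifurcation is detected, then set $q_{k+1}^{(0)} = q_k + d_k u$ where u is a bifurcating direction and repeat step 3.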
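Two pieces of the numerical algorithm above are simple enough to sketch directly: the step length $d_k$ from step 1 and the determinant sign test from step 4. This is an illustrative sketch under stated assumptions rather than the thesis implementation: the helper names are hypothetical, the tangent vector and the Hessian blocks are assumed to have been computed elsewhere and are passed in as plain lists, and the determinant uses ordinary Gaussian elimination.

```python
import math

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def beta_step(ds, cos_theta, dq, dlam):
    """d_k = ds * sgn(cos theta) / sqrt(||dq||^2 + ||dlam||^2 + 1)."""
    sign = (cos_theta > 0) - (cos_theta < 0)
    norm2 = sum(x * x for x in dq) + sum(x * x for x in dlam)
    return ds * sign / math.sqrt(norm2 + 1.0)

def bifurcation_detected(block_prev, block_next):
    """True when the determinant of the Hessian block changes sign."""
    return determinant(block_prev) * determinant(block_next) < 0

# Toy blocks (assumed data): one eigenvalue crosses zero between two steps.
B_k = [[-1.0, 0.0], [0.0, -2.0]]
B_k1 = [[0.5, 0.0], [0.0, -2.0]]
print(bifurcation_detected(B_k, B_k1))
```

A sign change of the determinant signals that an odd number of eigenvalues of the block crossed zero between the two steps, which fits the assumption made in the talk that the block B has a single nullvector at bifurcation.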
Details …
• The Dynamical System
• Types of Singularities
• Continuation Techniques
• The Explicit Group of Symmetries
• Explicit Existence Theorems for bifurcating branches

A Class of Problems
$\max_{q \in \Delta} F(q,\beta) = \max_{q \in \Delta} (G(q) + \beta D(q))$
• G and D are sufficiently smooth in $\Delta$.
• G and D must be invariant under relabelling of the classes.

The Dynamical System
Goal: To determine the bifurcation structure of solutions to $\max_{q \in \Delta} (G(q) + \beta D(q))$ for $\beta \in [0,\infty)$.
Method: Study the equilibria of the flow
$\begin{pmatrix} \dot q \\ \dot \lambda \end{pmatrix} = \nabla_{q,\lambda} \mathcal{L}(q, \lambda, \beta) := \nabla_{q,\lambda} \left( G(q) + \beta D(q) + \sum_{y \in Y} \lambda_y \left( \sum_z q(z|y) - 1 \right) \right)$
• $\nabla_{q,\lambda} \mathcal{L} : \mathbb{R}^{n+K} \times \mathbb{R} \to \mathbb{R}^{n+K}$
• The Jacobian with respect to q of the K constraints $\{\sum_{Y_N} q(Y_N|y) - 1\}$ is $J = (I_K \ I_K \ \dots \ I_K)$.
• If $w^T \Delta_q F(q^*, \beta) w < 0$ for every $w \in \ker J$, then $q^*(\beta)$ is a maximizer of $\max_{q \in \Delta} F(q,\beta)$.
• The first equilibrium is $q^*(\beta_0 = 0) \equiv 1/N$.

Properties of the Dynamical System
• In our dynamical system $(\dot q, \dot \lambda) = \nabla_{q,\lambda} \mathcal{L}(q, \lambda, \beta)$ the Hessian
$\Delta_{q,\lambda} \mathcal{L}(q, \lambda, \beta) = \begin{pmatrix} \Delta_q F & J^T \\ J & 0 \end{pmatrix}$
determines the stability of equilibria and the location of bifurcation.

[Diagram repeated from earlier in the talk: the singularity structure of the constrained Hessian $\Delta_{q,\lambda} \mathcal{L}$ and the unconstrained Hessian $\Delta_q F$, with outcomes Non-generic, Symmetry breaking bifurcation, Saddle-node bifurcation and Impossible scenario.]

Investigating the Dynamical System
How: Use numerical continuation in a constrained system to choose $\beta$ and to choose an initial guess to find the equilibria $q^*(\beta)$. Use bifurcation theory with symmetries to understand bifurcations of the equilibria.

Continuation
• A local maximum $q_k^*(\beta_k)$ of $\max_{q \in \Delta} F(q, \beta_k)$ is an equilibrium of the gradient flow.
• The initial condition $q_{k+1}^{(0)}(\beta_{k+1}^{(0)})$ is sought in the tangent direction $\partial_\beta q_k$, which is found by solving the matrix system
$\Delta_{q,\lambda} \mathcal{L}(q_k, \lambda_k, \beta_k) \begin{pmatrix} \partial_\beta q_k \\ \partial_\beta \lambda_k \end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} \mathcal{L}(q_k, \lambda_k, \beta_k)$
• The continuation algorithm used to find $q_{k+1}^*(\beta_{k+1})$ is based on Newton's method.
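• Parameter continuation follows the dashed (---) path, pseudoarclength continuation follows the dotted (…) path
[Figure: $(q, \lambda)$ plotted against $\beta$, showing the point $(q_k, \lambda_k, \beta_k)$, the initial guess $(q_{k+1}^{(0)}, \lambda_{k+1}^{(0)})$ at $\beta_{k+1}^{(0)}$, and the solution $(q_{k+1}, \lambda_{k+1}, \beta_{k+1})$ reached along each path.]

The Groups
• Let $\mathcal{P}$ be the finite group of n × n "block" permutation matrices which represents the action of $S_N$ on q and $F(q,\beta)$. For example, if N = 3,
$\rho = \begin{pmatrix} 0 & I_K & 0 \\ I_K & 0 & 0 \\ 0 & 0 & I_K \end{pmatrix} \in \mathcal{P}$
permutes $q(Y_{N_1}|y)$ with $q(Y_{N_2}|y)$ for every y.
• $F(q,\beta)$ is $\mathcal{P}$-invariant means that for every $\rho \in \mathcal{P}$, $F(\rho q, \beta) = F(q, \beta)$
• Let $\Gamma$ be the finite group of (n+K) × (n+K) block permutation matrices which represents the action of $S_N$ on $(q, \lambda)$ and $\nabla_{q,\lambda} \mathcal{L}(q, \lambda, \beta)$:
$\Gamma := \left\{ \begin{pmatrix} \rho & 0 \\ 0 & I_K \end{pmatrix} \ \middle| \ \rho \in \mathcal{P} \right\}$
The Lagrange multipliers $\lambda$ and the constraints are fixed!
• $\nabla_{q,\lambda} \mathcal{L}(q, \lambda, \beta)$ is $\Gamma$-equivariant means that for every $\gamma \in \Gamma$, $\nabla_{q,\lambda} \mathcal{L}(\gamma (q, \lambda), \beta) = \gamma \nabla_{q,\lambda} \mathcal{L}(q, \lambda, \beta)$

Notation and Definitions
• The symmetry of $(q, \lambda)$ is measured by its isotropy subgroup
$\Gamma_{q,\lambda} := \{ \gamma \in \Gamma \mid \gamma (q, \lambda) = (q, \lambda) \}$
• An isotropy subgroup $\Gamma'$ is a maximal isotropy subgroup of $\Gamma$ if there does not exist an isotropy subgroup $\Gamma''$ of $\Gamma$ such that $\Gamma' < \Gamma'' < \Gamma$.
• At bifurcation $(q^*, \lambda^*, \beta^*)$, the fixed point subspace of $\Gamma_{q^*,\lambda^*}$ is
$Fix(\Gamma_{q^*,\lambda^*}) = \{ w \in \ker \Delta_{q,\lambda} \mathcal{L}(q^*, \lambda^*, \beta^*) \mid \gamma w = w \ \forall \gamma \in \Gamma_{q^*,\lambda^*} \}$

Equivariant Branching Lemma
One of the Existence Theorems we use to describe a bifurcation in the presence of symmetries is the Equivariant Branching Lemma (Vanderbauwhede and Cicogna 1980-1).
Idea: The bifurcation structure of local solutions is described by those isotropy subgroups of $\Gamma$ whose fixed point subspace has dimension 1.
• System: $\dot x = r(x, \beta)$, $r : \mathbb{R}^m \times \mathbb{R} \to \mathbb{R}^m$
• $r(x, \beta)$ is G-equivariant for some compact Lie Group G
• $r(0,0) = 0$, $\partial_x r(0,0) = 0$
• $Fix(G) = \{0\}$
• Let H be an isotropy subgroup of G such that dim Fix(H) = 1.
• Assume $\partial_{x\beta} r(0,0) \neq 0$ (crossing condition).
Then there is a unique smooth solution branch $(t x_0, \beta(t))$ to r = 0 such that $x_0 \in Fix(H)$ and the isotropy subgroup of each solution is H.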
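The condition dim Fix(H) = 1 in the lemma above can be checked by direct computation for small groups. The sketch below rests on an assumption taken from later in the talk: at an M-uniform bifurcation the kernel of the constrained Hessian is spanned by the vectors $v_i - v_M$, so a kernel element can be identified with a coefficient vector in $\mathbb{R}^M$ whose entries sum to zero, and $S_M$ acts by permuting the entries. Averaging over a subgroup projects onto its fixed point subspace, and the rank of the projected basis is the dimension. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def rank(rows):
    """Rank of a matrix of Fractions by Gauss-Jordan elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows))
                      if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                factor = rows[i][col] / rows[rk][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def fix_dimension(group, m):
    """dim Fix(group) inside the sum-zero subspace of R^m.

    `group` is a list of permutations g (tuples) forming a subgroup of S_m,
    acting on a coefficient vector c by (g c)[g[j]] = c[j].
    """
    # Basis e_i - e_m of the sum-zero subspace (mirrors v_i - v_M).
    basis = [[Fraction((j == i) - (j == m - 1)) for j in range(m)]
             for i in range(m - 1)]
    averaged = []
    for b in basis:
        avg = [Fraction(0)] * m
        for g in group:
            for j in range(m):
                avg[g[j]] += b[j] / len(group)  # group average of g b
        averaged.append(avg)
    return rank(averaged)

S4 = list(permutations(range(4)))
S3 = [g for g in S4 if g[0] == 0]          # permutations fixing class 0
Z2 = [(0, 1, 2, 3), (1, 0, 3, 2)]          # generated by (12)(34), 0-based

print(fix_dimension(S4, 4), fix_dimension(S3, 4), fix_dimension(Z2, 4))
```

In this model the full group $S_4$ has a trivial fixed point subspace, matching the hypothesis Fix(G) = {0}, while $S_3$ and the subgroup generated by (12)(34) each fix a one-dimensional subspace.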
Symmetry Breaking from $S_M$ to $S_{M-1}$
From bifurcation, the Equivariant Branching Lemma shows that the following solutions emerge:
A stationary point $q^*$ is M-uniform if there exists $1 \le M \le N$ and a K × 1 vector P such that $q(y_{N_i}|Y) = P$ for M and only M of the classes $\{y_{N_i}\}_{i=1}^{N}$ of $Y_N$. These M classes of $Y_N$ are unresolved classes. The classes of $Y_N$ that are not unresolved are called resolved.
The first equilibrium, $q^* \equiv 1/N$, is N-uniform.
Theorem: $q^*$ is M-uniform if and only if $q^*$ is fixed by $S_M$.

Kernel of the Hessian at Symmetry Breaking Bifurcation
Theorem: dim ker $\Delta_q F(q^*, \beta) = M$ with basis vectors $\{v_i\}_{i=1}^{M}$, where
$[v_i]_\nu = v$ if $\nu$ is the i-th unresolved class, and $[v_i]_\nu = 0$ otherwise.
Theorem: dim ker $\Delta_{q,\lambda} \mathcal{L}(q^*, \lambda, \beta) = M - 1$ with basis vectors
$\begin{pmatrix} v_i - v_M \\ 0 \end{pmatrix}$
Point: Since the bifurcating solutions whose existence is guaranteed by the EBL and the SW Theorem are tangential to ker $\Delta_{q,\lambda} \mathcal{L}(q^*, \lambda, \beta)$, we know the explicit form of the bifurcating directions.

Symmetry Breaking Bifurcation from M-uniform solutions
Assumptions:
• Let $q^*$ be M-uniform.
• Call the M identical blocks of $\Delta_q F(q^*, \beta)$: B. Call the other N-M blocks of $\Delta_q F(q^*, \beta)$: $\{R_i\}$. We assume that B has a single nullvector v and that $R_i$ is nonsingular for every i.
• If M < N, then $B \sum_{i=1}^{N-M} R_i^{-1} + M I_K$ is nonsingular.
Theorem: Let $(q^*, \lambda^*, \beta^*)$ be a singular point of the flow $(\dot q, \dot \lambda) = \nabla_{q,\lambda} \mathcal{L}(q, \lambda, \beta)$ such that $q^*$ is M-uniform.
Then there exist M bifurcating (M-1)-uniform solutions $(q^*, \lambda^*, \beta^*) + (t u_k, 0, \beta(t))$, where
$[u_k]_\nu = (M-1)v$ if $\nu$ is the k-th unresolved class, $[u_k]_\nu = -v$ if $\nu \neq k$ is any other unresolved class, and $[u_k]_\nu = 0$ otherwise.

[Diagram repeated from earlier in the talk: the singularity structure of the constrained Hessian $\Delta_{q,\lambda} \mathcal{L}$ and the unconstrained Hessian $\Delta_q F$, with outcomes Non-generic, Symmetry breaking bifurcation, Saddle-node bifurcation and Impossible scenario.]

Some of the bifurcating branches when N = 4 are given by the following isotropy subgroup lattice for $S_4$:
$S_4$
• $S_3$: $(3v, -v, -v, -v, 0)^T$, with three $S_2$ below it: $(0, 2v, -v, -v, 0)^T$, $(0, -v, 2v, -v, 0)^T$, $(0, -v, -v, 2v, 0)^T$
• $S_3$: $(-v, 3v, -v, -v, 0)^T$, with three $S_2$ below it: $(2v, 0, -v, -v, 0)^T$, $(-v, 0, 2v, -v, 0)^T$, $(-v, 0, -v, 2v, 0)^T$
• $S_3$: $(-v, -v, 3v, -v, 0)^T$, with three $S_2$ below it: $(2v, -v, 0, -v, 0)^T$, $(-v, 2v, 0, -v, 0)^T$, $(-v, -v, 0, 2v, 0)^T$
• $S_3$: $(-v, -v, -v, 3v, 0)^T$, with three $S_2$ below it: $(2v, -v, -v, 0, 0)^T$, $(-v, 2v, -v, 0, 0)^T$, $(-v, -v, 2v, 0, 0)^T$
The trivial group 1 lies below every $S_2$.

For the 4 Blob problem: The isotropy subgroups and bifurcating directions of the observed bifurcating branches
• Starting isotropy group: $S_4$
• Isotropy group $S_3$, bifurcating direction $(-v, -v, 3v, -v, 0)^T$
• Isotropy group $S_2$, bifurcating direction $(-v, 2v, 0, -v, 0)^T$
• Isotropy group 1, bifurcating direction $(-v, 0, 0, v, 0)^T$
• … No more bifurcations!

Smoller-Wasserman Theorem
The other Existence Theorem: Smoller-Wasserman Theorem (1985-6)
For variational problems where $r(x, \beta) = \nabla_x f(x, \beta)$ there is a bifurcating solution tangential to Fix(H) for every maximal isotropy subgroup H, not only those with dim Fix(H) = 1.
• dim Fix(H) = 1 implies that H is a maximal isotropy subgroup

Other branches
The Smoller-Wasserman Theorem shows that (under the same assumptions as before) if M is composite, then there exist bifurcating solutions with isotropy group $\langle \gamma^p \rangle$ for every element $\gamma$ of order M in $\Gamma$ and every prime $p \mid M$, $p < M$.
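Furthermore, dim $Fix(\langle \gamma^p \rangle) = p - 1$.

Bifurcating branches from a 4-uniform solution are given by the following isotropy subgroup lattice for $S_4$:
$S_4$
• $\langle (1234) \rangle$ with $Fix(\langle (1234) \rangle) = \{0\}$, $A_4$ with $Fix(A_4) = \{0\}$, and $\langle (1324) \rangle$
• $\langle (12)(34) \rangle$: $(v, v, -v, -v)^T$
• $\langle (13)(24) \rangle$: $(v, -v, v, -v)^T$
• $\langle (14)(23) \rangle$: $(v, -v, -v, v)^T$

Maximal isotropy subgroups for $S_4$
[Lattice: $S_4$ at the top; below it four copies of $S_3$ and $A_4$; then $\langle (12)(34) \rangle$, $\langle (13)(24) \rangle$, $\langle (14)(23) \rangle$.]

Issues: $S_M$
• The full lattice of subgroups of the group $S_M$ is not known for arbitrary M.
• The lattice of maximal subgroups of the group $S_M$ is not known for arbitrary M.

More about the Bifurcation Structure
Theorem: All symmetry breaking bifurcations from $S_M$ to $S_{M-1}$ are pitchfork-like.
Outline of proof: $\beta'(0) = 0$ since $\partial^2_{xx} r(0,0) = 0$.
Theorem: The bifurcation discriminator $\zeta(q^*, \beta^*, u_k)$ of the pitchfork-like branch $(q^*, \lambda^*, \beta^*) + (t u_k, 0, \beta(t))$ is
[Equation garbled in extraction: an expression in the derivatives of $\mathcal{L}$ along u and the fourth derivative of F along v, with an M-dependent coefficient.]
If $\zeta(q^*, \beta^*, u_k) < 0$, then the branch is subcritical. If $\zeta(q^*, \beta^*, u_k) > 0$, then the branch is supercritical.
Theorem: Generically, bifurcations do not occur after all of the classes have resolved.
Theorem: If dim (ker $\Delta_{q,\lambda} \mathcal{L}(q^*, \lambda, \beta)$) = 1, and if a crossing condition is satisfied, then saddle-node bifurcation must occur.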