A Class of Problems
We use
• Numerical continuation
• Bifurcation theory with symmetries
to analyze a class of optimization problems of the form
$$\max_{q\in\Delta} F(q,\beta) = \max_{q\in\Delta}\big(G(q) + \beta D(q)\big).$$
The goal is to solve this problem for $\beta = B \in (0,\infty)$, where:
• $\Delta := \{\, q(z|y) \mid \sum_{z\in Z} q(z|y) = 1 \ \ \forall y \in Y \,\} \subset \mathbb{R}^n$.
• $G$ and $D$ are infinitely differentiable in the interior of $\Delta$.
• $G$ has a known local maximum.
• $G$ and $D$ must be invariant under relabeling of the classes.
Problems in this Class
• Deterministic Annealing (Rose 1998): $\max\ H(Z|Y) - \beta D(Y,Z)$
  Clustering algorithm.
• Rate Distortion Theory (Shannon ~1950): $\max\ -I(Y,Z) - \beta D(Y,Z)$
  Optimal source coding.
• Information Distortion (Dimitrov and Miller 2001): $\max\ H(Z|Y) + \beta I(X,Z)$
  Used in neural coding.
• Information Bottleneck Method (Tishby, Pereira, Bialek 2000): $\max\ -I(Y,Z) + \beta I(X,Z)$
  Used for document classification, gene expression, neural coding and spectral analysis.
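For finite alphabets, the information-theoretic terms in these objectives can be evaluated directly. The sketch below assumes p(X,Y) is stored as a matrix `pxy[x, y]` and the quantizer as an N×|Y| array `q[z, y]` = q(z|y) whose columns sum to 1. The function names are hypothetical and entropies are in bits. The distortion D(Y,Z) of the first two problems depends on a chosen distortion measure and is not computed here.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a probability array (0 log 0 taken as 0)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_info_bits(pab):
    """I(A,B) in bits from a joint probability matrix pab[a, b]."""
    return (entropy_bits(pab.sum(axis=1)) + entropy_bits(pab.sum(axis=0))
            - entropy_bits(pab))

def costs(pxy, q):
    """H(Z|Y), I(Y,Z) and I(X,Z) for a quantizer q[z, y] = q(z|y).

    Assumptions (for this sketch): pxy[x, y] is the joint p(X,Y) on finite
    alphabets, and every column of q sums to 1.
    """
    py = pxy.sum(axis=0)           # p(y)
    pyz = (q * py).T               # p(y, z) = q(z|y) p(y), shape (|Y|, N)
    pxz = pxy @ q.T                # p(x, z) = sum_y p(x, y) q(z|y)
    return {
        "H(Z|Y)": entropy_bits(pyz) - entropy_bits(py),   # H(Y,Z) - H(Y)
        "I(Y,Z)": mutual_info_bits(pyz),
        "I(X,Z)": mutual_info_bits(pxz),
    }
```

With `c = costs(pxy, q)`, the Information Distortion cost is `c["H(Z|Y)"] + beta * c["I(X,Z)"]` and the Information Bottleneck cost is `-c["I(Y,Z)"] + beta * c["I(X,Z)"]`. Take the diagonal joint distribution `pxy = [[0.5, 0], [0, 0.5]]`. The uniform quantizer gives H(Z|Y) = 1 bit and I(X,Z) = 0. The identity quantizer gives H(Z|Y) = 0 and I(X,Z) = 1 bit.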
Rate Distortion
How well is the source X represented by Z?
[Figure: source distribution p(X) over X]
Z is a representation of X using N symbols (or clusters).
Information Distortion
A good communication system has p(X,Y) like this:
• $2^{H(X)}$ input sequences and $2^{H(Y)}$ output sequences;
• $2^{I(X,Y)}$ distinguishable input/output classes of (x,y) pairs;
• each input/output class contains $2^{H(X|Y)+H(Y|X)}$ pairs.
[Figure: input source X → P(Y|X) → output source Y → q*(Z|Y) → clustered outputs Z, together with Q*(Z|X) from X to Z; four input/output classes are labeled 1 to 4.]
Goal: Determine the input/output classes of (x,y) pairs.
Idea: We seek to quantize (X,Y) into clusters which correspond to the input/output classes.
Method: We determine a quantizer, Q*, between X and Z, a representation of Y using N elements, such that the cost function F(Q*,B) is a maximum for some $B \in (0,\infty)$.
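Here is an idealized worked example. It is an assumption chosen for illustration, not the 4 blob data shown later. Let p(X,Y) be uniform on 4 disjoint diagonal blocks of m × m pairs, so each input block is sent only to the matching output block. Then

$$H(X) = H(Y) = \log_2 4m, \qquad H(X|Y) = H(Y|X) = \log_2 m, \qquad I(X,Y) = H(X) - H(X|Y) = \log_2 4 = 2 \text{ bits},$$

so there are $2^{I(X,Y)} = 4$ input/output classes, each containing $2^{H(X|Y)+H(Y|X)} = m^2$ pairs. Together they account for all $4m^2 = 2^{H(X,Y)}$ pairs. Consider the quantizer with $q^*(z|y) = 1$ exactly when z is the block containing y. It recovers the classes, since Z is then a function of X:

$$I(X,Z) = H(Z) - H(Z|X) = \log_2 4 - 0 = 2 \text{ bits} = I(X,Y).$$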
Some nice properties of the problem
• The feasible region Δ, a product of simplices, is nice.
Lemma: Δ is the convex hull of vertices(Δ).
• When D is convex, the optimal quantizer q* is DETERMINISTIC.
Theorem: The extrema of $\max_{q\in\Delta} D(q)$ lie generically on the vertices of Δ.
Corollary: The optimal quantizer is invariant to small perturbations in the model.
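Δ is a product of K simplices, one per y ∈ Y. This takes q(z|y) ≥ 0, as the phrase "product of simplices" implies. Its vertices are then exactly the N^K deterministic quantizers. The plain-Python sketch below uses hypothetical helper names. It enumerates the vertices for small K and N and evaluates a simple convex function, the sum of squares of the entries. This illustrates that a convex function on Δ is largest at a vertex.

```python
from itertools import product

def deterministic_quantizers(K, N):
    """All vertices of Delta for |Y| = K and |Z| = N.

    A vertex assigns each y to exactly one class z, so q(z|y) is 0 or 1.
    Each quantizer is returned as N rows (one per class) of K entries.
    """
    vertices = []
    for assignment in product(range(N), repeat=K):  # assignment[y] = class of y
        q = [[1.0 if assignment[y] == z else 0.0 for y in range(K)]
             for z in range(N)]
        vertices.append(q)
    return vertices

def in_delta(q, tol=1e-12):
    """Check q(z|y) >= 0 (assumed part of Delta) and sum_z q(z|y) = 1 for all y."""
    K = len(q[0])
    columns_ok = all(abs(sum(row[y] for row in q) - 1.0) < tol for y in range(K))
    return columns_ok and all(x >= 0 for row in q for x in row)

def sum_of_squares(q):
    """A convex function on Delta, used only to illustrate where maxima occur."""
    return sum(x * x for row in q for x in row)
```

For K = 3 and N = 2 there are 2^3 = 8 vertices. The convex `sum_of_squares` equals K = 3 at every vertex but only K/N = 1.5 at the uniform quantizer q ≡ 1/2.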
Solution of the problem when p(X,Y) := 4 Gaussian blobs
[Figure: p(X,Y); I(X,Z) vs. N]
The Dynamical System
Goal: To efficiently solve $\max_q \big(G(q) + \beta D(q)\big)$ for each $\beta$, incremented in sufficiently small steps, as $\beta \to B$.
Method: Study the equilibria of the flow
$$\begin{pmatrix}\dot q\\ \dot\lambda\end{pmatrix} = \nabla_{q,\lambda} L(q,\lambda,\beta), \qquad L(q,\lambda,\beta) := G(q) + \beta D(q) + \sum_{y\in Y} \lambda_y\Big(\sum_{z\in Z} q(z|y) - 1\Big).$$
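• The Jacobian with respect to q of the K constraints $\{\sum_z q(z|y) - 1\}_{y\in Y}$ is $J = (I_K\ I_K\ \cdots\ I_K)$.
• The equilibrium at $\beta = 0$ is $q^*(0) \equiv 1/N$.
• The Hessian
$$\Delta_{q,\lambda} L(q,\lambda,\beta) = \begin{pmatrix}\Delta_q F & J^T\\ J & 0\end{pmatrix}$$
determines the stability and location of bifurcations.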
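The constrained structure is easy to set up numerically. The NumPy sketch below assumes q is stacked class by class, so that J = (I_K I_K ⋯ I_K) holds as written. The caller supplies the Hessian Δ_q F. `constraint_jacobian` and `bordered_hessian` are hypothetical names.

```python
import numpy as np

def constraint_jacobian(K, N):
    """J = (I_K I_K ... I_K), the Jacobian of {sum_z q(z|y) - 1 : y in Y}.

    Assumption: q is stacked class by class, q = (q(z_1|.), ..., q(z_N|.)),
    with each block of length K = |Y|.
    """
    return np.hstack([np.eye(K)] * N)

def bordered_hessian(hess_F, K, N):
    """Hessian of L with respect to (q, lambda): [[Hess_q F, J^T], [J, 0]].

    The constraints are linear, so Hess_q L = Hess_q F.
    """
    J = constraint_jacobian(K, N)
    return np.block([[hess_F, J.T], [J, np.zeros((K, K))]])
```

For K = 3 and N = 2, `constraint_jacobian(3, 2) @ np.full(6, 0.5)` is a vector of ones, so q ≡ 1/N satisfies the constraints.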
Assumptions:
• Let $q^*$ be a local solution of $\max_{q\in\Delta} F(q,\beta)$ that is fixed by $S_M$.
• Call the M identical blocks of $\Delta_q F(q^*,\beta)$: $B$. Call the other $N-M$ blocks of $\Delta_q F(q^*,\beta)$: $\{R_\nu\}$.
• At a singularity $(q^*,\lambda^*,\beta^*)$, $B$ has a single nullvector $v$ and $R_\nu$ is nonsingular for every $\nu$.
• If $M < N$, then $B\sum_\nu R_\nu^{-1} + M I_K$ is nonsingular.
Theorem: If $(q^*,\lambda^*,\beta^*)$ is a bifurcation of equilibria of the flow, then $\beta^* \ge 1$.
For the 4 blob problem with N > 2, the first bifurcation is subcritical (a first-order phase transition).
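Once the blocks of Δ_q F(q*,β) are available, the assumptions above can be checked numerically. This is a sketch with hypothetical names. The tolerance used to decide singularity is an arbitrary choice, and the blocks in the usage example are made up.

```python
import numpy as np

def check_assumptions(B, Rs, M, tol=1e-10):
    """Check the nondegeneracy assumptions at a candidate singular point.

    B  : K x K block shared by the M identical (unresolved) classes
    Rs : list of the other N - M blocks R_nu, each K x K (empty when M = N)
    Returns a tuple of booleans:
      (B has exactly one zero singular value,
       every R_nu is nonsingular,
       B sum_nu R_nu^{-1} + M I_K is nonsingular, or True when M = N).
    The tolerance `tol` is an illustrative choice, not part of the theory.
    """
    K = B.shape[0]
    single_null = int(np.sum(np.linalg.svd(B, compute_uv=False) < tol)) == 1
    rs_ok = all(np.linalg.svd(R, compute_uv=False).min() > tol for R in Rs)
    if not Rs:                      # M = N: the last condition does not apply
        return single_null, rs_ok, True
    S = B @ sum(np.linalg.inv(R) for R in Rs) + M * np.eye(K)
    return single_null, rs_ok, bool(np.linalg.svd(S, compute_uv=False).min() > tol)
```

Take B = [[1, −1], [−1, 1]] (nullvector v = (1, 1)) and a single block R = −I_2. The last condition then holds for M = 3 but fails for M = 2, because B R^{-1} + 2 I_2 = [[1, 1], [1, 1]] is singular.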
Investigating the Dynamical System
How:
• Use numerical continuation in a constrained system to choose β and to choose an initial guess to find the equilibria q*(β).
• Use bifurcation theory with symmetries to understand bifurcations of the equilibria.
Continuation
[Figure: one continuation step from $(q_k^*,\beta_k)$ along the tangent to the initial guess $(q_{k+1}^{(0)},\beta_{k+1}^{(0)})$, followed by correction to $(q_{k+1}^*,\beta_{k+1})$]
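• A local maximum $q_k^*(\beta_k)$ of $\max_{q\in\Delta} F(q,\beta_k)$ is an equilibrium of the gradient flow.
• The initial condition $q_{k+1}^{(0)}(\beta_{k+1}^{(0)})$ is sought in the tangent direction $\partial_\beta q_k$, which is found by solving the matrix system
$$\Delta_{q,\lambda} L(q_k,\lambda_k,\beta_k)\begin{pmatrix}\partial_\beta q_k\\ \partial_\beta\lambda_k\end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} L(q_k,\lambda_k,\beta_k).$$
• The continuation algorithm used to find $q_{k+1}^*(\beta_{k+1})$ is based on Newton's method.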
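Because $L = G + \beta D + \sum_y \lambda_y(\sum_z q(z|y) - 1)$, the right-hand side of the tangent system is just $-(\nabla D(q), 0)$. A minimal NumPy sketch follows. It assumes q is stacked class by class and that the caller supplies Δ_q F and ∇D at $(q_k,\beta_k)$. `tangent` is a hypothetical name.

```python
import numpy as np

def tangent(hess_F, grad_D, K, N):
    """Solve Hess_{q,lambda} L [dq; dlam] = -d/dbeta grad_{q,lambda} L.

    With L = G + beta*D + sum_y lambda_y (sum_z q(z|y) - 1), the right-hand
    side is -(grad D(q), 0).  Assumption: q is stacked class by class.
    """
    J = np.hstack([np.eye(K)] * N)
    A = np.block([[hess_F, J.T], [J, np.zeros((K, K))]])
    rhs = -np.concatenate([grad_D, np.zeros(K)])
    sol = np.linalg.solve(A, rhs)
    return sol[:N * K], sol[N * K:]     # (dq/dbeta, dlambda/dbeta)
```

As a check, consider the toy cost F = −½‖q‖² + β cᵀq, which has no symmetry. The exact tangent is dq/dβ = c − JᵀJc/N with dλ/dβ = −Jc/N. It satisfies J dq/dβ = 0, so the predicted point stays on the constraint set.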
Conceptual Bifurcation Structure
[Figure: conceptual plot of q*(Y_N|Y) vs. β, starting from q* ≡ 1/N; panel: Bifurcations of q*(β)]
Observed Bifurcations for the 4 Blob Problem
Bifurcations with symmetry
To better understand the bifurcation structure, we use the symmetries of the cost function $F(q,\beta)$.
The symmetry is that $F(q,\beta)$ is invariant to relabeling of the N classes of Z.
The symmetry group of all permutations on N symbols is $S_N$.
The action of $S_N$ on $(q,\lambda)^T$ and $\Delta_{q,\lambda} L(q,\lambda,\beta)$ is represented by the finite Lie group
$$\Gamma := \left\{ \begin{pmatrix} P & 0_{n\times K} \\ 0_{K\times n} & I_K \end{pmatrix} \right\},$$
where P ranges over the "block permutation" matrices.
The symmetry of $(q,\lambda)^T$ is measured by its isotropy group, the subgroup of $\Gamma$ which fixes it.
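The matrices in Γ can be built explicitly. The sketch below assumes q is stacked class by class (N blocks of length K). It represents a relabeling as a tuple `perm`, where `perm[z]` is the new label of class z. `gamma` and `is_fixed` are hypothetical names. Running `is_fixed` over every permutation of the classes gives the isotropy group of a point.

```python
import numpy as np

def gamma(perm, K):
    """Matrix in Gamma for a relabeling of the classes.

    perm[z] is the new label of class z; P moves block z of q to block
    perm[z], and the lambda part (length K) is left unchanged.
    Assumption: q is stacked class by class, N blocks of length K.
    """
    N = len(perm)
    P = np.zeros((N * K, N * K))
    for z, w in enumerate(perm):
        P[w * K:(w + 1) * K, z * K:(z + 1) * K] = np.eye(K)
    return np.block([[P, np.zeros((N * K, K))],
                     [np.zeros((K, N * K)), np.eye(K)]])

def is_fixed(perm, x, K, tol=1e-12):
    """True if the relabeling perm leaves x = (q, lambda) unchanged."""
    return bool(np.allclose(gamma(perm, K) @ x, x, atol=tol))
```

Take K = 2 and N = 3, and the point with class blocks (0.2, 0.2), (0.6, 0.6), (0.2, 0.2). It is fixed by swapping classes 0 and 2 but not by swapping classes 0 and 1, so its isotropy group is a copy of S2.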
What do the bifurcations look like?
The Equivariant Branching Lemma gives the existence of bifurcating solutions for every isotropy subgroup which fixes a one-dimensional subspace of $\ker \Delta_{q,\lambda} L(q^*,\lambda^*,\beta^*)$.
Theorem: Let $(q^*,\lambda^*,\beta^*)$ be a singular point of the flow
$$\begin{pmatrix}\dot q\\ \dot\lambda\end{pmatrix} = \nabla_{q,\lambda} L(q,\lambda,\beta)$$
such that $q^*$ is fixed by $S_M$. Then there exist M bifurcating solutions, $(q^*,\lambda^*,\beta^*) + (t u_k, 0, \beta(t))$, each with isotropy group $S_{M-1}$, where
$$[u_k]_\nu = \begin{cases} (M-1)v & \text{if } \nu \text{ is the } k\text{th unresolved class},\\ -v & \text{if } \nu \ne k \text{ is any other unresolved class},\\ 0 & \text{otherwise,}\end{cases}$$
and v is a nullvector of an unresolved block of the Hessian.
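The direction u_k can be assembled block by block. The NumPy sketch below uses a hypothetical name `u_k` and 0-based class indices, and assumes q is stacked class by class. It returns only the q-part of the direction. The λ-part is 0, which is presumably the trailing 0 in the directions listed for S4 below.

```python
import numpy as np

def u_k(v, unresolved, k, N):
    """q-part of the bifurcating direction u_k (0-based class indices).

    v          : nullvector (length K) of an unresolved block of the Hessian
    unresolved : indices of the M unresolved classes (those fixed by S_M)
    k          : the unresolved class that splits off
    N          : total number of classes
    """
    v = np.asarray(v, dtype=float)
    M = len(unresolved)
    blocks = [np.zeros_like(v) for _ in range(N)]   # resolved classes get 0
    for nu in unresolved:
        blocks[nu] = (M - 1) * v if nu == k else -v
    return np.concatenate(blocks)
```

The unresolved blocks sum to (M−1)v − (M−1)v = 0, so J u_k = 0 and moving along u_k preserves the constraints Σ_z q(z|y) = 1. With K = 1 and v = 1, `u_k([1.0], [0, 1, 2, 3], 2, 4)` gives (−1, −1, 3, −1), the pattern of (−v, −v, 3v, −v).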
Bifurcation Structure
Let
$$T(q^*,\beta^*) = 3\,\big\langle u_k,\ \partial^3 L\big[u_k,\ P_L L^- P_L\, \partial^3 L[u_k,u_k]\big]\big\rangle - (M^2 - 3M + 3)\,\big\langle u_k,\ \partial^4 F[u_k,u_k,u_k]\big\rangle.$$
Pitchfork-like Bifurcations
Theorem: All bifurcations are "pitchfork-like".
Branch Orientation?
Theorem: If $T(q^*,\beta^*) > 0$, then the branch is supercritical. If $T(q^*,\beta^*) < 0$, then the branch is subcritical.
Branch Stability?
Theorem: If $T(q^*,\beta^*) < 0$, then all branches fixed by $S_{M-1}$ are unstable.
Partial lattice of the isotropy subgroups of S4
(and associated bifurcating directions)
[Figure: S4 at the top; four copies of S3 below it, with bifurcating directions (3v,-v,-v,-v,0)^T, (-v,3v,-v,-v,0)^T, (-v,-v,3v,-v,0)^T and (-v,-v,-v,3v,0)^T; below each S3, three copies of S2, with directions such as (0,2v,-v,-v,0)^T that are 0 in the class already split off, 2v in one remaining class and -v in the other two; the trivial group 1 at the bottom.]
For the 4 blob problem, the isotropy subgroups and bifurcating directions of the observed bifurcating branches are:
• S4 → S3: direction (-v,-v,3v,-v,0)^T
• S3 → S2: direction (-v,2v,0,-v,0)^T
• S2 → 1: direction (-v,0,0,v,0)^T
No more bifurcations!
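One can check the observed chain S4 → S3 → S2 → 1 by computing which relabelings of the four classes leave a branch unchanged. The plain-Python sketch below uses a hypothetical name and assumes the branch starts at q* ≡ 1/4. As a simplification, it tracks each class only through the coefficients of v it picks up along the directions listed above. In general the nullvectors at different bifurcations are different vectors.

```python
from itertools import permutations

def isotropy(*patterns):
    """Relabelings of the classes that fix every given block pattern.

    patterns[j][z] is the coefficient of v in class z for the j-th bifurcating
    direction along the branch, e.g. (-1, -1, 3, -1) for (-v, -v, 3v, -v).
    Simplifying assumption: two classes are equivalent exactly when they have
    received the same coefficients at every bifurcation so far.
    """
    N = len(patterns[0])
    history = [tuple(pat[z] for pat in patterns) for z in range(N)]
    return [p for p in permutations(range(N))
            if all(history[p[z]] == history[z] for z in range(N))]
```

Applying the three directions in turn gives group sizes 24, 6, 2 and 1. These match |S4|, |S3|, |S2| and the trivial group.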
Other Branches
The Smoller-Wasserman Theorem guarantees the existence of bifurcating branches for every maximal isotropy subgroup.
Theorem: If M is a composite number, then there exist bifurcating solutions with isotropy group $\langle\gamma^p\rangle$ for every element $\gamma$ of order M in $\Gamma$ and every prime $p \mid M$. The bifurcating direction is in the $(p-1)$-dimensional subspace of $\ker \Delta_{q,\lambda} L(q^*,\lambda^*,\beta^*)$ which is fixed by $\langle\gamma^p\rangle$.
Lattice of the maximal isotropy subgroups $\langle\gamma^p\rangle$ in S4
[Figure: S4 at the top, with A4, ⟨(1423)⟩, ⟨(1324)⟩ and the subgroups ⟨(1234)^2⟩, ⟨(1324)^2⟩, ⟨(1243)^2⟩ below it; each ⟨γ^2⟩ is labeled with a bifurcating direction built from v.]
The above theorem states that there are bifurcating solutions from $q^* \equiv 1/4$ with symmetry $\langle(1234)^2\rangle$, $\langle(1243)^2\rangle$ and $\langle(1324)^2\rangle$.
The full lattice of subgroups of the group SM is not known for
arbitrary M.
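The dimension count in the theorem can be checked by hand. Assume, as for the direction u_k above, that the kernel at an S_M-symmetric point with a single nullvector v per block consists of the vectors (c_1 v, ..., c_M v, 0) with Σ c_ν = 0. Such a vector is fixed by ⟨σ⟩ exactly when c is constant on each orbit of σ. The fixed subspace therefore has dimension (number of orbits of σ) − 1. If γ is an M-cycle and p | M, then γ^p splits into p cycles of length M/p, which gives dimension p − 1. The plain-Python sketch below computes this count; its names are hypothetical and labels are 0-based.

```python
def cycle_power(cycle, p):
    """p-th power of the cycle (cycle[0] cycle[1] ...), as a list perm[i].

    Assumption: the cycle contains every label 0..M-1 exactly once.
    """
    M = len(cycle)
    perm = list(range(M))
    for i, a in enumerate(cycle):
        perm[a] = cycle[(i + p) % M]   # a_i -> a_{i+p}
    return perm

def orbit_count(perm):
    """Number of orbits (cycles, including fixed points) of a permutation."""
    seen, count = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            count += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = perm[x]
    return count

def fixed_kernel_dim(perm):
    """Dimension of {(c_1 v, ..., c_M v) : sum(c) = 0} fixed by <perm>.

    c must be constant on each orbit, and sum(c) = 0 removes one dimension.
    """
    return orbit_count(perm) - 1
```

For γ = (1234) and p = 2, γ² = (13)(24). Its fixed vectors have c = (a, −a, a, −a), which is the direction (v, −v, v, −v).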
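A numerical algorithm to solve $\max F(q,\beta)$
Let $q_0$ be the maximizer of $\max_q G(q)$, $\beta_0 = 1$ and $s > 0$. For $k \ge 0$, let $(q_k,\beta_k)$ be a solution to $\max_q \big(G(q) + \beta_k D(q)\big)$. Iterate the following steps until $\beta_K = B$ for some $K$.
1. Perform the β-step: solve
$$\Delta_{q,\lambda} L(q_k,\lambda_k,\beta_k)\begin{pmatrix}\partial_\beta q_k\\ \partial_\beta\lambda_k\end{pmatrix} = -\partial_\beta \nabla_{q,\lambda} L(q_k,\lambda_k,\beta_k)$$
for $\partial_\beta q_k$ and $\partial_\beta \lambda_k$, and select $\beta_{k+1} = \beta_k + d_k$, where $d_k = s\,/\,\big(\|\partial_\beta q_k\|^2 + \|\partial_\beta\lambda_k\|^2 + 1\big)^{1/2}$.
2. The initial guess for $q_{k+1}$ at $\beta_{k+1}$ is $q_{k+1}^{(0)} = q_k + d_k\,\partial_\beta q_k$.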
3. Optimization: solve $\max_q \big(G(q) + \beta_{k+1} D(q)\big)$ to get the maximizer $q_{k+1}^*$, using the initial guess $q_{k+1}^{(0)}$.
4. Check for bifurcation: compare the sign of the determinant of an identical block of each of $\Delta_q\big[G(q_k) + \beta_k D(q_k)\big]$ and $\Delta_q\big[G(q_{k+1}) + \beta_{k+1} D(q_{k+1})\big]$. If a bifurcation is detected, set $q_{k+1}^{(0)} = q_k + d_k u$, where u is a bifurcating direction $u_k$ as defined above, and repeat step 3.
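The four steps can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' implementation:
• the function names are hypothetical;
• the user supplies `grad_F(q, beta)`, `hess_F(q, beta)` and `grad_D(q)`;
• q is stacked class by class, and the multipliers λ_k are carried along with q_k;
• step 3 uses Newton's method on ∇_{q,λ}L = 0, which finds an equilibrium, not necessarily a maximum;
• the last step is shortened so that β lands exactly on B;
• step 4 only records β values where the determinant sign of the first K×K block changes (that block is assumed to be one of the identical blocks), and restarting step 3 from q_k + d_k u is left out.

```python
import numpy as np

def bordered(hess, J):
    """[[Hess_q F, J^T], [J, 0]]: Hessian of L with respect to (q, lambda)."""
    K = J.shape[0]
    return np.block([[hess, J.T], [J, np.zeros((K, K))]])

def newton(q, lam, beta, grad_F, hess_F, J, iters=50, tol=1e-12):
    """Step 3 (sketch): Newton's method on grad_{q,lambda} L = 0 at fixed beta."""
    for _ in range(iters):
        r = np.concatenate([grad_F(q, beta) + J.T @ lam, J @ q - 1.0])
        if np.linalg.norm(r) < tol:
            break
        step = np.linalg.solve(bordered(hess_F(q, beta), J), -r)
        q, lam = q + step[:q.size], lam + step[q.size:]
    return q, lam

def anneal(q, lam, grad_F, hess_F, grad_D, K, N, B, s=0.1, beta=1.0,
           max_steps=10000):
    """Steps 1-4 from beta_0 = beta up to B; returns (q, lam, beta, flags)."""
    J = np.hstack([np.eye(K)] * N)       # assumption: q stacked class by class
    flags = []                           # beta values where step 4 saw a sign change
    for _ in range(max_steps):
        if beta >= B:
            break
        # Step 1: tangent (dq, dlam) from the bordered system, then step length d_k.
        rhs = -np.concatenate([grad_D(q), np.zeros(K)])
        t = np.linalg.solve(bordered(hess_F(q, beta), J), rhs)
        dq, dlam = t[:N * K], t[N * K:]
        d = min(s / np.sqrt(dq @ dq + dlam @ dlam + 1.0), B - beta)
        beta_next = B if d == B - beta else beta + d   # land exactly on B
        # Step 2: initial guess along the tangent.
        q0, lam0 = q + d * dq, lam + d * dlam
        # Step 3: correct with Newton's method at beta_{k+1}.
        q1, lam1 = newton(q0, lam0, beta_next, grad_F, hess_F, J)
        # Step 4: sign of det of the first K x K block (assumed to be unresolved).
        before = np.sign(np.linalg.det(hess_F(q, beta)[:K, :K]))
        after = np.sign(np.linalg.det(hess_F(q1, beta_next)[:K, :K]))
        if before != after:
            flags.append(beta_next)      # branch switching along u_k is omitted here
        q, lam, beta = q1, lam1, beta_next
    return q, lam, beta, flags
```

As a check, use the toy cost F = −½‖q‖² + β cᵀq, which has no symmetry and no bifurcations, and start from its exact solution at β = 1. The sketch should reach β = B with q = Bc + Jᵀ(1 − BJc)/N and no flagged β values.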