The second-round discussion* on
Geometric diffusions as a tool for
harmonic analysis and structure definition
of data
By R. R. Coifman et al.
* The first-round discussion was led by Xuejun;
* The third-round discussion is to be led by Nilanjan.
Diffusion Maps
• Purpose
- finding meaningful structures and geometric descriptions of a
data set X.
- dimensionality reduction
• Why?
High-dimensional data are often subject to
a large number of constraints (e.g., physical laws)
that reduce the number of degrees of freedom.
Diffusion Maps
• Markov Random Walk
$$A = [a_{ij}] = [p(x_j \mid x_i)] = \left[\frac{K(x_i, x_j)}{\sum_k K(x_i, x_k)}\right] = D^{-1}K,$$
where $D = \operatorname{diag}(d_i)$ and $d_i = \sum_k K(x_i, x_k)$.
Many works propose using the first few eigenvectors of A as a low-dimensional representation
of the data (without rigorous justification).
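A minimal NumPy sketch of the Markov normalization. The slides do not fix a kernel, so this assumes a Gaussian kernel $K(x, y) = \exp(-\|x - y\|^2/\varepsilon)$; `gaussian_kernel`, `markov_matrix` and `eps` are illustrative names. Each row of A is the transition distribution $p(\cdot \mid x_i)$.

```python
import numpy as np

def gaussian_kernel(X, eps):
    """Hypothetical kernel choice: K(x_i, x_j) = exp(-||x_i - x_j||^2 / eps)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / eps)

def markov_matrix(K):
    """Row-normalize K: a_ij = K(x_i, x_j) / sum_k K(x_i, x_k), i.e. A = D^{-1} K."""
    d = K.sum(axis=1)                 # degrees d_i = sum_k K(x_i, x_k)
    return K / d[:, None], d

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 synthetic points in R^3
K = gaussian_kernel(X, eps=2.0)
A, d = markov_matrix(K)
print(np.allclose(A.sum(axis=1), 1.0))  # True: each row is a probability distribution
```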
• Symmetric Kernel
$$\tilde a_{ij} = \frac{K(x_i, x_j)}{\sqrt{\sum_k K(x_i, x_k)}\,\sqrt{\sum_k K(x_j, x_k)}} = \tilde a_{ji}$$
• Relationship
$$\tilde A = D^{1/2} A D^{-1/2}$$
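A quick numerical check of the symmetric kernel and its relationship to A, again assuming a Gaussian kernel with $\varepsilon = 1$ on synthetic points (both hypothetical choices).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))                        # synthetic data
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 1.0)                               # hypothetical Gaussian kernel, eps = 1.0
d = K.sum(axis=1)                                   # d_i = sum_k K(x_i, x_k)

A = K / d[:, None]                                  # Markov matrix A = D^{-1} K
A_sym = K / np.sqrt(np.outer(d, d))                 # a~_ij = K(x_i, x_j) / sqrt(d_i d_j)

# Relationship: A~ = D^{1/2} A D^{-1/2}
D_half = np.diag(np.sqrt(d))
D_minus_half = np.diag(1.0 / np.sqrt(d))
print(np.allclose(A_sym, D_half @ A @ D_minus_half))  # True
print(np.allclose(A_sym, A_sym.T))                     # True: a~_ij = a~_ji
```

Because $\tilde A$ is symmetric and similar to A, the two share eigenvalues, and $\tilde A$ admits an orthonormal eigenbasis.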
Diffusion Maps
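• Spectral Decomposition of $\tilde A$
$$\tilde a_{ij} = \sum_n \lambda_n \phi_n(x_i)\phi_n(x_j)$$
where $1 = \lambda_0 \ge \lambda_1 \ge \lambda_2 \ge \cdots \ge 0$
• Spectral Decomposition of $\tilde A^m$
$$\tilde a^{(m)}_{ij} = \sum_n \lambda_n^m \phi_n(x_i)\phi_n(x_j)$$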
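The decomposition can be checked with `numpy.linalg.eigh`, which returns eigenvalues in ascending order, so they are flipped to match $\lambda_0 \ge \lambda_1 \ge \cdots$. Gaussian kernel, $\varepsilon = 1$ and $m = 4$ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 1.0)                      # hypothetical Gaussian kernel
d = K.sum(axis=1)
A_sym = K / np.sqrt(np.outer(d, d))        # symmetric kernel A~

lam, phi = np.linalg.eigh(A_sym)           # ascending eigenvalues, orthonormal eigenvectors
lam, phi = lam[::-1], phi[:, ::-1]         # column n of phi is phi_n evaluated at the data points

m = 4
A_sym_m = np.linalg.matrix_power(A_sym, m)
# a~^(m)_ij = sum_n lambda_n^m phi_n(x_i) phi_n(x_j)
recon = (phi * lam**m) @ phi.T
print(np.isclose(lam[0], 1.0))             # True: the top eigenvalue of A~ is 1
print(np.allclose(A_sym_m, recon))         # True
```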
• Diffusion maps
$$\Psi_m(x) = \begin{pmatrix} \lambda_0^m \phi_0(x) \\ \lambda_1^m \phi_1(x) \\ \vdots \end{pmatrix}$$
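A sketch of the diffusion-map coordinates, one row per data point. It assumes a positive semidefinite kernel (true for the Gaussian kernel used here), so tiny negative eigenvalues from round-off are clipped to zero; `diffusion_map` is an illustrative name. Inner products of the coordinates reproduce the powers of $\tilde A$.

```python
import numpy as np

def diffusion_map(A_sym, m):
    """Return an array whose i-th row is Psi_m(x_i) = (lambda_0^m phi_0(x_i), lambda_1^m phi_1(x_i), ...)."""
    lam, phi = np.linalg.eigh(A_sym)           # ascending eigenvalues, orthonormal eigenvectors
    lam, phi = lam[::-1], phi[:, ::-1]         # reorder so lambda_0 >= lambda_1 >= ...
    lam = np.clip(lam, 0.0, None)              # remove tiny negative round-off (kernel assumed PSD)
    return phi * lam**m                         # scale column n by lambda_n^m

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 1.0)                           # hypothetical Gaussian kernel
d = K.sum(axis=1)
A_sym = K / np.sqrt(np.outer(d, d))

Psi = diffusion_map(A_sym, m=3)
# <Psi_m(x_i), Psi_m(x_j)> = sum_n lambda_n^{2m} phi_n(x_i) phi_n(x_j) = a~^(2m)_ij
print(np.allclose(Psi @ Psi.T, np.linalg.matrix_power(A_sym, 6)))  # True
```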
Diffusion Distance
• Diffusion distance of m steps
$$D_m^2(x_i, x_j) = \tilde a^{(m)}_{ii} + \tilde a^{(m)}_{jj} - 2\tilde a^{(m)}_{ij} = \left\|\Psi_{m/2}(x_i) - \Psi_{m/2}(x_j)\right\|^2$$
• Interpretation
$$D_m^2(x_i, x_j) = \int \left(\tilde a^{(m/2)}(x_i, z) - \tilde a^{(m/2)}(x_j, z)\right)^2 dz$$
The diffusion distance measures the rate of connectivity between $x_i$ and $x_j$ by paths of length $m$ in the data.
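The three expressions for $D_m^2$ can be compared on a finite data set, where the integral over $z$ becomes a sum over the data points. This assumes a Gaussian kernel and an even $m$ (here $m = 4$) so that $\tilde A^{m/2}$ is an integer matrix power.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 1.0)                                   # hypothetical Gaussian kernel
d = K.sum(axis=1)
A_sym = K / np.sqrt(np.outer(d, d))

m = 4
A_m = np.linalg.matrix_power(A_sym, m)
diag = np.diag(A_m)
# D_m^2(x_i, x_j) = a~^(m)_ii + a~^(m)_jj - 2 a~^(m)_ij
D2 = diag[:, None] + diag[None, :] - 2 * A_m

# Interpretation: sum over z of (a~^(m/2)(x_i, z) - a~^(m/2)(x_j, z))^2
A_half = np.linalg.matrix_power(A_sym, m // 2)
D2_paths = np.sum((A_half[:, None, :] - A_half[None, :, :]) ** 2, axis=-1)

# Diffusion-map form: squared Euclidean distance between Psi_{m/2} coordinates
lam, phi = np.linalg.eigh(A_sym)
Psi_half = phi * np.clip(lam, 0.0, None) ** (m / 2)
D2_map = np.sum((Psi_half[:, None, :] - Psi_half[None, :, :]) ** 2, axis=-1)

print(np.allclose(D2, D2_paths), np.allclose(D2, D2_map))  # True True
```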
Diffusion vs. Geodesic Distance
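[Figure: data set with points A, B and C, comparing the geodesic distances D_geod(A, B) and D_geod(C, B) with the diffusion distances D_m(A, B) and D_m(C, B).]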
Data Embedding

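• By mapping the original data $x \in \mathbb{R}^d$ into $\tilde x \in \mathbb{R}^{d_0+1}$ (often $d \gg d_0$),
$$x \mapsto \tilde x = \left(\lambda_0^{m/2}\phi_0(x),\ \lambda_1^{m/2}\phi_1(x),\ \ldots,\ \lambda_{d_0}^{m/2}\phi_{d_0}(x)\right)$$
• the diffusion distance can be accurately approximated by the Euclidean distance in the embedding:
$$D_m^2(x, y) = \|\tilde x - \tilde y\|^2\left(1 + O(e^{-\alpha m})\right) \quad \text{for some } \alpha > 0$$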
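A sketch of the truncated embedding, assuming a Gaussian kernel and $d_0 = 3$ (hypothetical choices). It measures the share of the total squared diffusion distance lost by keeping only $d_0 + 1$ coordinates, and checks that this share shrinks as $m$ grows, because the discarded terms carry the smaller eigenvalues.

```python
import numpy as np

def embed(A_sym, m, d0):
    """Rows are x~ = (lambda_0^{m/2} phi_0(x), ..., lambda_{d0}^{m/2} phi_{d0}(x))."""
    lam, phi = np.linalg.eigh(A_sym)
    lam, phi = lam[::-1], phi[:, ::-1]               # decreasing eigenvalues
    lam = np.clip(lam, 0.0, None)                    # round-off guard, kernel assumed PSD
    return phi[:, : d0 + 1] * lam[: d0 + 1] ** (m / 2)

def sq_dists(Y):
    return np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 2))
K = np.exp(-sq_dists(X) / 1.0)                       # hypothetical Gaussian kernel
d = K.sum(axis=1)
A_sym = K / np.sqrt(np.outer(d, d))

n, d0 = X.shape[0], 3
rel_err = {}
for m in (2, 16):
    full = sq_dists(embed(A_sym, m, n - 1))          # all n coordinates: exact D_m^2
    trunc = sq_dists(embed(A_sym, m, d0))            # only d0 + 1 coordinates
    rel_err[m] = 1.0 - trunc.sum() / full.sum()      # share of D_m^2 lost by truncation
print(rel_err[16] < rel_err[2])                      # True: truncation error shrinks as m grows
```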
Example: curves
UMIST face database: 36 pictures (92 × 112 pixels) of the
same person, randomly permuted.
Goal: recover the geometry of the data set.
[Figure panels: original ordering; re-ordering]
The second eigenfunction $\phi_1$ assigns a real number
to each image. When these numbers are sorted,
one obtains a graph very similar to $\cos(t)$ on $[0, \pi]$.
The natural parameter (the angle of the head) is recovered: the data points are
re-organized, and the structure is identified as a curve with 2 endpoints.
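The re-ordering idea can be tried on synthetic data instead of the UMIST images. This sketch assumes points on a one-dimensional curve (a straight segment in $\mathbb{R}^3$, for simplicity), a Gaussian kernel with $\varepsilon = 0.05$, and sorts by $\phi_1/\phi_0$. Since $\phi_0 \propto \sqrt{d}$, this ratio is the second eigenvector of $A = D^{-1}K$ itself, which removes the density weighting introduced by $\tilde A$. In this setting, sorting recovers the hidden order up to reversal.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
t = np.linspace(0.0, 1.0, n)                  # hidden natural parameter (like the head angle)
direction = np.array([1.0, 2.0, 2.0]) / 3.0   # unit vector: the curve is a segment in R^3
perm = rng.permutation(n)
X = t[perm, None] * direction                 # observed points, randomly permuted

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 0.05)                        # hypothetical Gaussian kernel, eps = 0.05
d = K.sum(axis=1)
A_sym = K / np.sqrt(np.outer(d, d))

lam, phi = np.linalg.eigh(A_sym)
phi0, phi1 = phi[:, -1], phi[:, -2]           # eigenvectors of the two largest eigenvalues
score = phi1 / phi0                           # second eigenvector of A = D^{-1} K
order = np.argsort(score)                     # re-ordering of the data points

recovered = t[perm][order]
print(np.all(np.diff(recovered) > 0) or np.all(np.diff(recovered) < 0))  # True
```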
Example: surface
Original set: 1275 images (75x81 pixels) of the word “3D”.
Diffusion Wavelet
• A function f defined on the data admits a multiscale
representation of the form:
s0 1


f  
A s0 f   A s  A s 1 f

s 0
coarsest scale
wavelet decomposition
• We need a method to compute and efficiently represent
the powers $A^m$.
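The decomposition holds for any square matrix A because the sum telescopes. A quick numerical check, assuming a random row-stochastic A and a random f as stand-ins for a diffusion operator and a function on the data; `multiscale_terms` is an illustrative helper that uses consecutive powers exactly as in the formula.

```python
import numpy as np

def multiscale_terms(A, f, s0):
    """Split f into the coarse part A^{s0} f and detail terms (A^s - A^{s+1}) f, s = 0..s0-1."""
    details = []
    g = f.copy()                          # g holds A^s f
    for _ in range(s0):
        g_next = A @ g                    # A^{s+1} f
        details.append(g - g_next)        # (A^s - A^{s+1}) f
        g = g_next
    return g, details                     # g is now A^{s0} f

rng = np.random.default_rng(7)
P = rng.random((20, 20))
A = P / P.sum(axis=1, keepdims=True)      # any row-stochastic matrix works for this check
f = rng.normal(size=20)                   # a function on 20 data points

coarse, details = multiscale_terms(A, f, s0=5)
print(np.allclose(coarse + sum(details), f))   # True: the sum telescopes back to f
```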
Diffusion Wavelet
• Multi-scale analysis of diffusion
Discretize the semigroup $\{A^t : t > 0\}$ of the powers of A at a
logarithmic scale
which satisfy
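A minimal sketch of sampling the powers of A at the dyadic times $t = 2^j$, assuming each level is obtained by squaring the previous one; `dyadic_powers` and `J` are illustrative names, and the exact discretization used in the paper may differ.

```python
import numpy as np

def dyadic_powers(A, J):
    """Return [A^(2^0), A^(2^1), ..., A^(2^J)], each obtained by squaring the previous one."""
    powers = [A]
    for _ in range(J):
        powers.append(powers[-1] @ powers[-1])    # A^(2^(j+1)) = (A^(2^j))^2
    return powers

rng = np.random.default_rng(8)
P = rng.random((16, 16))
A = P / P.sum(axis=1, keepdims=True)              # a row-stochastic diffusion operator

powers = dyadic_powers(A, J=4)
# 4 squarings reach A^16, instead of 15 successive multiplications by A
print(np.allclose(powers[4], np.linalg.matrix_power(A, 16)))  # True
```

Only $J$ matrix products are needed to reach time $2^J$, which is what makes a logarithmic scale of times cheap to traverse.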
Diffusion Wavelet
• The detail subspaces
• Downsampling, orthogonalization, and operator
compression
A: diffusion operator; G: Gram-Schmidt orthonormalization; M: AG
$\Phi$: diffusion maps; X is the data set
$\Phi_j = \{\phi_{j,k} : k \in X_j\}$
$\Phi_0 = \{\delta_k : k \in X\}$
Diffusion multiresolution analysis on
the circle. Consider 256 points on the
unit circle, starting with $\phi_{0,k} = \delta_k$ and with
the standard diffusion. We plot several
scaling functions in each
approximation space $V_j$.
Diffusion multiresolution analysis on the
circle. We plot the compressed matrices
representing powers of the diffusion
operator. Note how the matrices shrink
as they are compressed at successive
scales.
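The shrinking matrix sizes can be reproduced qualitatively by tracking the numerical rank of $A^{2^j}$ at a fixed precision, which approximates how many basis functions a compressed matrix at scale $j$ needs. This sketch assumes that the "standard diffusion" on 256 points of the circle is a lazy random walk (half the mass stays, a quarter moves to each neighbour) and uses `eps = 1e-6`; both choices are hypothetical.

```python
import numpy as np

n = 256
# Hypothetical "standard diffusion" on 256 points of the unit circle: a lazy random walk
# that keeps half of the mass in place and sends a quarter to each neighbour.
A = 0.5 * np.eye(n) + 0.25 * (np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1))

eps = 1e-6
P = A.copy()
sizes = []
for j in range(9):                      # P = A^(2^j), j = 0..8
    # numerical rank of A^(2^j): roughly the size of the compressed matrix at scale j
    sizes.append(np.linalg.matrix_rank(P, tol=eps, hermitian=True))
    P = P @ P
print(sizes)                            # non-increasing: high frequencies die out at coarse scales
```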
Multiscale Analysis of MDPs
[1] S. Mahadevan, “Proto-value Functions: Developmental Reinforcement Learning”, ICML 2005.
[2] S. Mahadevan and M. Maggioni, “Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions”, NIPS 2005.
[3] M. Maggioni and S. Mahadevan, “Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes”, ICML 2006.
To be discussed in the third-round discussion, led by Nilanjan.
Thanks!