Self Organizing Maps

DIMENSIONALITY REDUCTION
June 2013
Computer Graphics Course
What is high dimensional data?
Images
Videos
Documents
Most data, actually!
How many dimensions?
Images – dimension 3·X·Y
This is the number of bytes in a raw (uncompressed) RGB image
We can treat each byte as a dimension
Each image is a point in high dimensional space
Which space?
“space of images of size X·Y”
How many dimensions?
But we can describe an image using fewer bytes!
“Blue sky, green grass, yellow road…”
“Drawing of a kung-fu rat”
Why do Dimensionality Reduction?
Visualization: Understanding the structure of data
Fewer dimensions make it easier to describe the data and find correlations (rules)
Compression of data for efficiency
Clustering
Discovering similarities between elements
Why do Dimensionality Reduction?
Curse of dimensionality
100000000000
010000000000
001000000000
000100000000
…
All these vectors are the same
Euclidean distance from each other
But some dimensions could be “worth more”
Can you work with 1,000 images of 1,000,000
dimensions?
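A quick NumPy check of this point (a sketch, shrinking the 12-D one-hot vectors above to 4-D):

import numpy as np

# One-hot vectors like the ones above (shortened to 4-D for brevity)
vectors = np.eye(4)

# Every pair is at Euclidean distance sqrt(2) from every other pair,
# even though some dimensions could be "worth more" than others.
for i in range(4):
    for j in range(i + 1, 4):
        print(i, j, np.linalg.norm(vectors[i] - vectors[j]))  # all ~1.4142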
How to reduce dimensions?

Image features:
  Average colors
  Histograms
  FFT-based features (frequency space)
  More…
Video features
Document features
Etc.
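As an illustration, a minimal sketch of such hand-crafted image features; the bin count and normalization are arbitrary choices here, not taken from any particular system:

import numpy as np

def image_features(img):
    # img: H x W x 3 uint8 array; returns a short feature vector
    mean_color = img.reshape(-1, 3).mean(axis=0)               # 3 dims
    gray = img.mean(axis=2)                                    # grayscale
    hist, _ = np.histogram(gray, bins=8, range=(0, 256), density=True)
    return np.concatenate([mean_color, hist])                  # 11 dims

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stand-in image
print(image_features(img).shape)  # (11,) instead of 3*640*480 = 921,600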
How to reduce dimensions?


Feature dimension is still quite high (512, 1024, etc)
What now?
Linear Dimensionality Reduction
Simplest way: project all points onto a plane (2D) or a lower-dimensional subspace
Only one question: which plane is the best?
PCA (SVD)
For specific applications:
  CCA (correlation)
  LDA (data with labels)
  NMF (non-negative components)
  ICA (multiple sources)
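A minimal sketch of the PCA (SVD) projection, assuming the points are the rows of a NumPy array:

import numpy as np

def pca_project(X, d=2):
    # Project the rows of X onto the best-fitting d-dimensional subspace
    Xc = X - X.mean(axis=0)                           # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                              # coordinates in that plane

points = np.random.rand(100, 512)   # e.g. 100 feature vectors of dimension 512
print(pca_project(points).shape)    # (100, 2)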
Non-Linear Dimensionality Reduction

What if data is not linear?

No plane will work here
Non-Linear Dimensionality Reduction
MDS – MultiDimensional Scaling
Use only the distances between elements
Try to reconstruct element positions so that the given pairwise distances are preserved as well as possible
Reconstruction can happen in 1D, 2D, 3D, …
More dimensions = less error
Non-Linear Dimensionality Reduction

MDS – MultiDimensional Scaling

Classical MDS: an algebraic solution
  Construct a squared proximity matrix using some normalization (“double centering”)
  Extract the d largest eigenvectors / eigenvalues
  Multiply each eigenvector by sqrt(eigenvalue)
  Each row is the coordinates of its corresponding point
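The four steps translate almost line by line into NumPy; a sketch, assuming D is a symmetric n×n matrix of pairwise distances:

import numpy as np

def classical_mds(D, d=2):
    n = D.shape[0]
    # Step 1: square the distances and double-center them
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Step 2: extract the d largest eigenvectors / eigenvalues
    vals, vecs = np.linalg.eigh(B)                    # ascending order
    vals, vecs = vals[::-1][:d], vecs[:, ::-1][:, :d]
    # Steps 3-4: scale each eigenvector by sqrt(eigenvalue);
    # each row of the result is the coordinates of one point
    return vecs * np.sqrt(np.maximum(vals, 0))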
Non-Linear Dimensionality Reduction
MDS – MultiDimensional Scaling
Classical MDS: an algebraic solution
[Figure: the matrix of scaled eigenvectors; one row per point x1…x5, one column per eigenvector e1…e5, and each eigenvector adds a dimension to the mapping]
Non-Linear Dimensionality Reduction

Non-metric MDS: an optimization problem
  Example: Sammon’s projection
  Start from random positions for each element
  Define the stress of the system; Sammon’s stress is $E = \frac{1}{\sum_{i<j} d^*_{ij}} \sum_{i<j} \frac{(d^*_{ij} - d_{ij})^2}{d^*_{ij}}$, where $d^*_{ij}$ are the original distances and $d_{ij}$ the distances in the projection
  In each step, move towards positions that reduce the stress (gradient descent)
  Continue until convergence
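A rough NumPy sketch of this loop; the learning rate and iteration count are arbitrary choices, and Sammon's original scheme uses a more careful step size:

import numpy as np

def sammon(D, d=2, iters=500, lr=0.3, eps=1e-9):
    # D: n x n matrix of original pairwise distances d*_ij
    n = D.shape[0]
    Y = np.random.randn(n, d) * 0.1                # random initial positions
    c = D[np.triu_indices(n, 1)].sum()             # normalizing constant
    Ds = D + eps
    for _ in range(iters):
        diff = Y[:, None, :] - Y[None, :, :]       # y_i - y_j, shape n x n x d
        dist = np.linalg.norm(diff, axis=2) + eps  # current distances d_ij
        w = (Ds - dist) / (Ds * dist)              # stress gradient weights
        np.fill_diagonal(w, 0.0)
        grad = (-2.0 / c) * (w[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                             # move to reduce the stress
    return Y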
Non-Linear Dimensionality Reduction
Spectral embedding:
Create a graph of nearest neighbors
Compute the graph Laplacian (it relates to the probability of traversing each edge in a random walk)
Compute eigenvalues – why?
Computing eigenvalues is like multiplying the matrix by itself many times (towards infinity), which is like performing random walks over and over until we reach a stable point
Again, the eigenvectors are the coordinates
Does not preserve distances like MDS – instead it groups together points that are likely neighbors
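A minimal sketch using the plain (unnormalized) graph Laplacian; the random-walk story above corresponds to the normalized variant:

import numpy as np

def spectral_embedding(W, d=2):
    # W: symmetric n x n adjacency matrix of the nearest-neighbor graph
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    # Skip the trivial constant eigenvector (eigenvalue 0); the next d
    # eigenvectors give one coordinate per point, just like in MDS.
    return vecs[:, 1:d + 1]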
Non-Linear Dimensionality Reduction




Other non-linear methods
Locally Linear Embedding (LLE): express each point as a linear combination of its neighbors
Isomap: takes the adjacency graph as input and computes MDS on the geodesic distances (distances along the graph); see the sketch below
Self Organizing Maps (SOM): next part…
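A sketch of Isomap along these lines, reusing the classical_mds sketch from earlier; the brute-force distance matrix and the choice of k are naive, and a connected k-NN graph is assumed:

import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, k=6, d=2):
    n = len(X)
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # Euclidean distances
    W = np.full((n, n), np.inf)                          # inf = no edge
    nn = np.argsort(D, axis=1)[:, 1:k + 1]               # k nearest neighbors
    for i in range(n):
        W[i, nn[i]] = D[i, nn[i]]                        # keep only k-NN edges
    G = shortest_path(W, directed=False)                 # geodesic distances
    return classical_mds(G, d)             # MDS on the geodesic distances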
SELF ORGANIZING MAPS
& RECENT APPLICATIONS
June 2013
Computer Graphics Course
Self Organizing Maps (SOM)
Originated from neural networks
Created by Kohonen, 1982
Also known as Kohonen Maps
Teuvo Kohonen: a Finnish researcher in learning and neural networks
Thanks to SOM, he became the most cited Finnish scientist!
More than 8,000 citations
So what is it?
What is a SOM?
A type of neural network
What is a neuron?
A function with several inputs and one output
In this case – usually a linear combination of the inputs according to weights
What is a SOM?
[Figure: a grid of neurons, each holding a weight vector (m_ik) connected to the input (x_k); there are no feedback/feed-forward connections between the neurons]
Training a SOM
Start from random weights
For each input X(t) at iteration t:
  Find the Best Matching Cell (BMC, also called Best Matching Unit or BMU) for X(t)
  Update the weights of each neuron close to the BMC
  Weights are updated according to a decaying learning rate and radius
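A minimal NumPy sketch of this training loop; the decay schedules mirror the motion-map example a few slides ahead and are just one possible choice:

import numpy as np

def train_som(X, H=10, W=10, iters=2000, seed=0):
    # X: n_samples x n_features; returns an H x W grid of weight vectors
    rng = np.random.default_rng(seed)
    weights = rng.random((H, W, X.shape[1]))      # start from random weights
    grid = np.dstack(np.mgrid[0:H, 0:W])          # (row, col) of each neuron
    for t in range(iters):
        x = X[rng.integers(len(X))]               # input x(t)
        # Best Matching Cell: the neuron whose weights are closest to x(t)
        bmc = np.unravel_index(
            np.linalg.norm(weights - x, axis=2).argmin(), (H, W))
        # Decaying learning rate and neighborhood radius
        alpha = 0.8 * (1.0 - t / iters)
        sigma = 0.25 * (H + W) * (1.0 - t / iters) + 1e-3
        # Gaussian neighborhood h_i(t) around the BMC on the grid
        g2 = ((grid - np.array(bmc)) ** 2).sum(axis=2)
        h = alpha * np.exp(-g2 / (2 * sigma ** 2))
        # m_i(t+1) = m_i(t) + h_i(t) [x(t) - m_i(t)]
        weights += h[:, :, None] * (x - weights)
    return weights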
Training a SOM
[Figure: two training steps on a grid of neurons (mi); inputs X(1) and X(2) pull their Best Matching Cells BMC(1) and BMC(2), and the neurons near them, toward the inputs]
Training a SOM – The Math
Best Matching Cell: the $m_c$ for which $\|m_c(t) - x(t)\|$ is minimal
Another option for the BMC: maximal dot product $x(t)^T m_c(t)$
Weight adaptation: $m_i(t+1) = m_i(t) + h_i(t)\,[x(t) - m_i(t)]$
$h_i(t)$ is a learning rate dependent on both the time and the distance of $m_i$ from the BMC $m_c$
Training a SOM – The Math

Example (motion map): $m_i(t+1) = m_i(t) + h_i(t)\,[x(t) - m_i(t)]$
$h_i(t) = \alpha(t) \cdot e^{-d(m_i, m_c)^2 / (2\sigma^2)}$, where $\alpha(t)$ is the learning rate, $d(m_i, m_c)$ is the distance between the BMC and $m_i$, and $\sigma$ is the kernel width
$\alpha(t) = 0.8 \cdot \left(1.0 - \frac{t}{6\,t_{Max}}\right)$, where $t_{Max}$ is the maximum number of iterations
$\sigma(t) = 0.25 \cdot (H + W) \cdot \left(1.0 - \frac{t}{t_{Max}}\right)$, where $H$ and $W$ are the height and width of the neuron map
Presenting a SOM



Option 1: at each node, present the data that corresponds to vector mi (3D data, colors, continuous spaces)
For a color map with 3 inputs: if a neuron’s weights are (0.7, 0.2, 0.3), we would show a reddish color with red component 0.7, green component 0.2 and blue component 0.3
For a map of points on the plane with 2 inputs, we would draw a point for each neuron at position (Wx, Wy)
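For the color-map case, Option 1 amounts to showing the weight grid as an image; a sketch reusing the train_som function from the earlier slide (matplotlib assumed):

import numpy as np
import matplotlib.pyplot as plt

colors = np.random.rand(500, 3)   # training set: random RGB colors in [0, 1]
weights = train_som(colors)       # H x W x 3 grid of learned weights
plt.imshow(weights)               # each neuron's weights shown as its color
plt.axis("off")
plt.show()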
Presenting a SOM

Option 2: represent each neuron by the element of the training set X that is closest to its vector mi
More Examples
Motion Map
Motion Map: Image-based Retrieval and Segmentation of Motion Data
  Sakamoto, Kuriyama, Kaneko
  SCA: Symposium on Computer Animation 2004
Goal: present the user with a grid of postures in order to select a clip of motion data from a large database
Perform clustering on the SOM instead of on the abstract data
Motion Map

Example results: 436 posture samples from 55K frames of 51
motion files
Motion Map

Example results: Clustering based on SOM
Motion Map - Details
A map of posture samples is created from all the motion files together
Postures are subsampled so that each sample’s distance to its closest sample exceeds a given threshold, which reduces computation time
A standard SOM is calculated
Each posture is then connected to a hash table of the motion files that contain similar postures
Clustering the SOM enables display of a simplified map to the user (next slide)
Motion Map - Details

Simplified map after SOM clustering: 17 dance styles
Procedural Texture Preview
Eurographics 2012
Goal: present the user with a single image which shows all the possibilities of a procedural texture
Method overview:
  Selecting candidate vectors of parameters which optimize completeness, variety and smoothness
  Organizing the candidates in a SOM
  Synthesizing a continuous map
Procedural Texture Preview

Results
[Figure: thumbnails of random texture parameters next to the texture preview, which combines the whole parameter space in a single image]
Procedural Texture Preview - Details
Selecting candidates for the parameter map using the following optimizations:
C = a set of dense samples
X = the candidates in the parameter map
Completeness: minimize $E_C(X) = \max_{p \in C} \min_{q \in X} M(p, q)$
Variety: maximize $E_V(X) = \min_{r \in X} \min_{q \in X \setminus \{r\}} M(r, q)$
Smoothness: minimize $E_S(X) = \sum_{p \in X} \sum_{q \in Near(p)} M(p, q)$
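Transcribed directly into Python, with M (the distance between parameter vectors) and Near (the map-neighbors of a candidate) assumed to be given:

def completeness(C, X, M):
    # E_C: worst-case distance from any dense sample to its nearest candidate
    return max(min(M(p, q) for q in X) for p in C)

def variety(X, M):
    # E_V: distance between the two closest candidates on the map
    return min(M(r, q) for r in X for q in X if q is not r)

def smoothness(X, M, near):
    # E_S: total distance between candidates that are neighbors on the map
    return sum(M(p, q) for p in X for q in near(p))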
Procedural Texture Preview - Details
A standard SOM jointly optimizes the completeness and the smoothness
To optimize the variety as well, the SOM implementation alternates between minimizing $E_C$ and maximizing $E_V$
Instead of the regular learning-rate update, at each step the candidates (weight vectors) are replaced by new candidates according to the above optimizations
Procedural Texture Preview - Details
After the candidate selection, an image is synthesized which smoothly combines all the selected candidates
Stitching is done using standard patch-based texture synthesis methods (Graphcut Textures, Kwatra et al., TOG 2003)
Procedural Texture Preview

Some more results
That’s all folks!

Questions?