DIMENSIONALITY REDUCTION
June 2013, Computer Graphics Course

What is high-dimensional data?
Images, videos, documents... most data, actually!

How many dimensions?
Images: dimension 3·X·Y, which is the number of bytes in an uncompressed 24-bit RGB image of X·Y pixels.
We can treat each byte as a dimension, so each image is a point in a high-dimensional space.
Which space? The "space of images of size X·Y".
But we can describe an image using far fewer bytes: "Blue sky, green grass, yellow road…", "Drawing of a kung-fu rat".

Why Do Dimensionality Reduction?
Visualization: understanding the structure of the data. Fewer dimensions are easier to describe, and correlations (rules) are easier to find.
Compression of data for efficiency.
Clustering: discovering similarities between elements.

Why Do Dimensionality Reduction? The curse of dimensionality
100000000000
010000000000
001000000000
000100000000
…
All these vectors are the same Euclidean distance (√2) from each other, but some dimensions could be "worth more" than others.
Can you work with 1,000 images of 1,000,000 dimensions each?

How to Reduce Dimensions?
Image features: average colors, histograms, FFT-based features (frequency space), and more. Likewise video features, document features, etc.
The feature dimension is still quite high (512, 1024, etc.). What now?

Linear Dimensionality Reduction
Simplest way: project all points onto a plane (2D) or a lower-dimensional subspace.
Only one question: which plane is best? PCA (SVD).
For specific applications: CCA (correlation), LDA (labeled data), NMF (non-negative components), ICA (multiple sources).

Non-Linear Dimensionality Reduction
What if the data is not linear?
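To make the curse-of-dimensionality slide concrete, here is a small check in plain Python that every pair of the one-hot vectors shown (12 components each) is exactly √2 apart, so Euclidean distance alone cannot tell any of them apart. The helper names `one_hot` and `euclidean` are illustrative, not from the course material.

```python
import itertools
import math

def one_hot(i, n):
    """Return the n-dimensional vector with a 1 in position i and 0 elsewhere."""
    return [1.0 if k == i else 0.0 for k in range(n)]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

n = 12  # the slide's vectors have 12 components
vectors = [one_hot(i, n) for i in range(n)]
# Collect the distinct pairwise distances (rounded to absorb float noise)
distances = {round(euclidean(a, b), 9) for a, b in itertools.combinations(vectors, 2)}
print(distances)  # a single value: sqrt(2) ≈ 1.414213562
```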
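The image-feature idea can be sketched as follows, assuming an H×W×3 image with 0..255 values and using NumPy. This is a minimal illustration of "average colors + histograms", not a specific published feature: the function name `color_features` and the choice of 8 bins per channel are assumptions. It turns a 30,000-dimensional raw image into a 27-dimensional feature vector.

```python
import numpy as np

def color_features(image, bins=8):
    """Feature vector for an H x W x 3 image with 0..255 values:
    the average RGB color (scaled to [0, 1]) followed by a normalized
    histogram of each color channel."""
    img = np.asarray(image, dtype=float)
    n_pixels = img.shape[0] * img.shape[1]
    mean_color = img.reshape(-1, 3).mean(axis=0) / 255.0
    # One histogram per channel, each normalized to sum to 1
    hists = [np.histogram(img[..., ch], bins=bins, range=(0, 256))[0] / n_pixels
             for ch in range(3)]
    return np.concatenate([mean_color, *hists])

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100, 3))  # a hypothetical random 100 x 100 image
f = color_features(image)
print(image.size, "->", f.size)  # 30000 -> 27
```

As the slide notes, real feature vectors are usually much longer (512, 1024, ...), which is why a second reduction step is still needed.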
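"Which plane is best? PCA (SVD)" can be sketched in a few lines of NumPy: center the data, take the SVD, and project onto the first d right singular vectors (the directions of largest variance). The data below is hypothetical: 200 points lying close to a 2D plane inside a 10D space, so a 2D projection should lose almost nothing.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X (n samples x D features) onto the d principal axes.

    Returns the n x d coordinates and the d x D matrix of principal axes.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center the data
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:d]
    return Xc @ axes.T, axes

rng = np.random.default_rng(0)
# Hypothetical data: 200 points near a 2D plane embedded in 10D space
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 10))
X = latent @ basis + 0.01 * rng.normal(size=(200, 10))

Y, axes = pca_project(X, 2)
print(Y.shape)  # (200, 2)
```

Multiplying `Y @ axes` maps the 2D coordinates back into the 10D space; the gap between that and the centered data is the information lost by the projection.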
No plane will work here.

Non-Linear Dimensionality Reduction: MDS (MultiDimensional Scaling)
Use only the distances between elements.
Try to reconstruct element positions from the distances, such that the distances between the reconstructed positions match the given distances as closely as possible: $\|x_i - x_j\| \approx d_{ij}$.
The reconstruction can happen in 1D, 2D, 3D, …; more dimensions means less error.

Classical MDS: an algebraic solution
1. Construct the matrix of squared distances and normalize it by "double centering": $B = -\tfrac{1}{2} J D^{(2)} J$, where $D^{(2)}_{ij} = d_{ij}^2$ and $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$.
2. Extract the d largest eigenvalues and their eigenvectors.
3. Multiply each eigenvector by the square root of its eigenvalue.
4. Row i of the resulting n×d matrix gives the coordinates of point i.
[Figure: points x1, x2, x3, x4, x5, … mapped by eigenvectors e1, e2, e3, e4, e5, …; each eigenvector adds a dimension to the mapping.]

Non-metric MDS: an optimization problem
Example: Sammon's projection
Start from random positions for each element.
Define the stress of the system: $E = \frac{1}{\sum_{i<j} d^*_{ij}} \sum_{i<j} \frac{(d^*_{ij} - d_{ij})^2}{d^*_{ij}}$, where $d^*_{ij}$ are the original distances and $d_{ij}$ are the distances in the projection.
In each step, move toward positions that reduce the stress (gradient descent).
Continue until convergence.

Spectral embedding
Create a graph of nearest neighbors.
Compute the graph Laplacian (related to the probability of walking along each edge in a random walk).
Compute the eigenvalues and eigenvectors. Why?
Computing the eigenvectors is like multiplying the matrix by itself many, many times (toward infinity), which is like performing random walks over and over until we reach a stable point.
Again, the eigenvectors are the coordinates.
Unlike MDS, this does not preserve distances; instead, it groups together points that are likely to be neighbors.

Non-Linear Dimensionality Reduction: other non-linear methods
Locally Linear Embedding (LLE): express each point as a linear combination of its neighbors.
Isomap: take the adjacency (nearest-neighbor) graph as input and compute MDS on the geodesic distances (shortest-path distances on the graph).
Self-Organizing Maps (SOM): next part…

SELF-ORGANIZING MAPS & RECENT APPLICATIONS

Self-Organizing Maps (SOM)
Originated from neural networks.
Created by Kohonen in 1982; also known as Kohonen maps.
Teuvo Kohonen: a Finnish researcher in learning and neural networks. Thanks to SOM, he became the most cited Finnish scientist, with more than 8,000 citations!
So what is it?

What is a SOM?
A type of neural network.
What is a neuron? A function with several inputs and one output; here, usually a linear combination of the inputs according to weights.
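Returning to the classical MDS procedure from the first part, the four steps (double centering, top-d eigenvectors, scaling by the square root of the eigenvalue, rows as coordinates) can be sketched with NumPy. The example distances come from the corners of a hypothetical 3×4 rectangle; since these are true 2D Euclidean distances, the 2D reconstruction should reproduce them exactly, up to rotation and reflection.

```python
import numpy as np

def classical_mds(D, d):
    """Classical MDS: embed n points in d dimensions from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double centering of squared distances
    eigvals, eigvecs = np.linalg.eigh(B)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:d]  # keep the d largest
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale      # each column scaled by sqrt(eigenvalue)

# Distances between the corners of a 3 x 4 rectangle
P = np.array([[0, 0], [3, 0], [3, 4], [0, 4]], dtype=float)
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Y = classical_mds(D, 2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D, D_rec))  # True: distances are recovered
```

Negative eigenvalues (which appear when the input distances are not Euclidean) are clipped to zero here; that is one common choice, not the only one.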
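The Sammon's projection loop from the first part (random start, stress, gradient steps) can be sketched as below. This follows the slide's plain gradient descent; Sammon's original method uses a pseudo-Newton step instead, and the step size `lr`, the iteration count `steps`, and the rectangle test data are assumptions chosen for this small example.

```python
import numpy as np

def sammon_stress(D_star, Y):
    """Sammon's stress between target distances D_star and the layout Y."""
    iu = np.triu_indices(len(Y), k=1)
    d = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)[iu]
    t = D_star[iu]
    return np.sum((t - d) ** 2 / t) / np.sum(t)

def sammon(D_star, dim=2, steps=2000, lr=1.0, seed=0):
    """Plain gradient descent on Sammon's stress, starting from random positions."""
    D_star = np.asarray(D_star, dtype=float)
    n = len(D_star)
    rng = np.random.default_rng(seed)
    Y = rng.normal(size=(n, dim))            # random initial positions
    c = np.sum(np.triu(D_star, k=1))         # normalization constant
    T = D_star + np.eye(n)                   # avoid division by zero on the diagonal
    for _ in range(steps):
        diff = Y[:, None] - Y[None, :]       # (n, n, dim) pairwise differences
        d = np.linalg.norm(diff, axis=-1) + np.eye(n)
        W = (d - T) / (T * d)                # dE/dd_ij divided by d_ij, up to 2/c
        np.fill_diagonal(W, 0.0)
        grad = (2.0 / c) * np.sum(W[:, :, None] * diff, axis=1)
        Y -= lr * grad                       # move toward lower stress
    return Y

P = np.array([[0, 0], [3, 0], [3, 4], [0, 4]], dtype=float)
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
Y = sammon(D)
print(sammon_stress(D, Y))  # stress after optimization
```

Because each pair's term is divided by its original distance, Sammon's stress weights errors on small distances more heavily, which tends to preserve local structure.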
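The spectral embedding steps can be sketched as follows. Several choices here are assumptions, since the slides do not fix them: an unweighted k-nearest-neighbor graph made symmetric, and the symmetric normalized Laplacian $I - D^{-1/2} W D^{-1/2}$ for numerical stability. Its eigenvectors are mapped back to those of the random-walk matrix $D^{-1} W$, whose repeated multiplication is the random-walk intuition described above. The first (constant) eigenvector is dropped.

```python
import numpy as np

def spectral_embedding(X, k=5, d=2):
    """Embed the rows of X using eigenvectors of a k-nearest-neighbor graph."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros((n, n))
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]  # skip column 0 (the point itself)
    for i in range(n):
        W[i, nn[i]] = 1.0
    W = np.maximum(W, W.T)                     # make the graph undirected
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^-1/2 W D^-1/2
    L = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                # ascending eigenvalues
    # Convert to eigenvectors of the random-walk matrix D^-1 W and drop
    # the first one, which is constant and carries no information
    V = vecs * inv_sqrt[:, None]
    return V[:, 1:d + 1]

# Hypothetical data: 50 points along a half circle in 3D
t = np.linspace(0, np.pi, 50)
X = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
Y = spectral_embedding(X, k=4, d=1)
```

With `d=1`, the single coordinate should follow the position of each point along the curve.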
There are no connections (feedback or feed-forward) between the neurons; neuron i holds weights $m_{ik}$ for the inputs $x_k$.
[Figure: neurons with weights m_ik, connected to inputs x_k]

Training a SOM
Start from random weights.
For each input x(t) at iteration t:
Find the Best Matching Cell (BMC), also called the Best Matching Unit (BMU), for x(t).
Update the weights of each neuron close to the BMC.
The weights are updated according to a decaying learning rate and radius.
[Figure: inputs x(1), x(2) and their best matching cells BMC(1), BMC(2) among the neurons m_i]

Training a SOM: the math
Best Matching Cell: the $m_c$ for which $\|m_c(t) - x(t)\|$ is minimal.
Another option for the BMC: the maximal dot product $x(t)^T m_c(t)$.
Weight adaptation: $m_i(t+1) = m_i(t) + h_i(t)\,[x(t) - m_i(t)]$, where $h_i(t)$ is a learning rate that depends on both the time and the distance of $m_i$ from the BMC $m_c$.

Training a SOM: the math, example (Motion Map)
$h_i(t) = \alpha(t)\,\exp\left(-\frac{d(m_i, m_c)^2}{2\,\sigma(t)^2}\right)$, where $\alpha(t)$ is the learning rate, $d(m_i, m_c)$ is the distance between the BMC and $m_i$, and $\sigma(t)$ is the kernel width.
$\alpha(t) = 0.8\,(1.0 - t/t_{\max})$, where $t_{\max}$ is the maximum number of iterations.
$\sigma(t) = 0.25\,(H + W)\,(1.0 - t/t_{\max})$, where H and W are the height and width of the neuron map.

Presenting a SOM
Option 1: at each node, present the data that corresponds to the vector $m_i$ (3D data, colors, continuous spaces).
For a color map with 3 inputs, a neuron with weights (0.7, 0.2, 0.3) is shown as a reddish color with a red component of 0.7, a green component of 0.2, and a blue component of 0.3.
For a map of points in the plane with 2 inputs, we draw a point for each neuron at position $(W_x, W_y)$, i.e. at its two weights.
Option 2: give each neuron the representative from the training set X that is closest to its vector $m_i$.
[Figures: more examples]

Motion Map
Motion Map: Image-based Retrieval and Segmentation of Motion Data. Sakamoto, Kuriyama, Kaneko. SCA: Symposium on Computer Animation 2004.
Goal: present the user with a grid of postures for selecting a clip of motion data from a large database.
Clustering is performed on the SOM instead of on the abstract data.
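The training loop and the Motion Map schedule can be sketched with NumPy as below. Assumptions to note: inputs are drawn at random from the training set, $d(m_i, m_c)$ is measured between grid positions on the neuron map, and both $\alpha(t)$ and $\sigma(t)$ decay linearly to zero. The $\alpha(t)$ schedule is our reading of the slide's formula, not a confirmed detail of the paper. The name `train_som` and the random-color data are hypothetical.

```python
import numpy as np

def train_som(X, H, W, t_max=2000, alpha0=0.8, seed=0):
    """Train an H x W SOM on the rows of X (assumed schedule, see lead-in).

    h_i(t) = alpha(t) * exp(-d(m_i, m_c)^2 / (2 sigma(t)^2)),
    alpha(t) = alpha0 (1 - t/t_max), sigma(t) = 0.25 (H + W)(1 - t/t_max).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    weights = rng.random((H * W, X.shape[1]))  # random initial m_i
    grid = np.array([(r, c) for r in range(H) for c in range(W)], dtype=float)
    for t in range(t_max):
        x = X[rng.integers(len(X))]            # input x(t)
        bmc = np.argmin(np.linalg.norm(weights - x, axis=1))  # best matching cell
        decay = 1.0 - t / t_max
        alpha = alpha0 * decay
        sigma = max(0.25 * (H + W) * decay, 1e-3)  # keep the kernel width positive
        d2 = np.sum((grid - grid[bmc]) ** 2, axis=1)  # squared map distance to BMC
        h = alpha * np.exp(-d2 / (2 * sigma ** 2))
        weights += h[:, None] * (x - weights)  # m_i += h_i (x - m_i)
    return weights.reshape(H, W, -1)

# Hypothetical color map: 500 random RGB colors organized on a 10 x 10 map
colors = np.random.default_rng(1).random((500, 3))
M = train_som(colors, 10, 10)
```

Displaying `M` directly as a 10×10 image is Option 1 from the "Presenting a SOM" slide: each neuron's weights are shown as an RGB color.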
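Option 2 for presenting a SOM (showing each neuron as its closest training sample, as in the Motion Map's grid of postures) amounts to a nearest-neighbor lookup from each weight vector into the training set. A minimal sketch with Euclidean distance; `nearest_samples` and the toy 2×2 map are illustrative only.

```python
import numpy as np

def nearest_samples(weights, X):
    """For each neuron weight vector, return the index of the closest training sample.

    weights: H x W x k array of neuron weights; X: n x k training samples.
    """
    X = np.asarray(X, dtype=float)
    M = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(M[:, None, :] - X[None, :, :], axis=-1)
    return d.argmin(axis=1).reshape(weights.shape[:-1])

# A toy 2 x 2 map with 2D weights, and four training samples
weights = np.array([[[0, 0], [0, 1]], [[1, 0], [1, 1]]], dtype=float)
X = np.array([[0.1, 0.1], [0.9, 0.9], [0.1, 0.9], [0.9, 0.1]])
idx = nearest_samples(weights, X)
print(idx.tolist())  # [[0, 2], [3, 1]]
```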
Motion Map: example results
436 posture samples from 55K frames of 51 motion files.
[Figure: clustering based on the SOM]

Motion Map: details
A map of posture samples is created from all motion files together.
To reduce computation time, samples are kept only if they differ from their closest sample by more than a given threshold.
A standard SOM is computed.
Each posture is then connected to a hash table of the motion files that contain similar postures.
Clustering the SOM enables displaying a simplified map to the user.
[Figure: simplified map after SOM clustering: 17 dance styles]

Procedural Texture Preview
Eurographics 2012
Goal: present the user with a single image that shows all the possibilities of a procedural texture.
Method overview:
Select candidate parameter vectors that maximize completeness, variety, and smoothness.
Organize the candidates in a SOM.
Synthesize a continuous map.
[Figure: texture parameters; thumbnails of random parameters; texture preview in a single image]

Procedural Texture Preview: details
Candidates for the parameter map are selected using the following optimizations, where C is a set of dense samples, X is the set of candidates in the parameter map, and M(p, q) is the distance between parameter vectors p and q:
Completeness (minimize): $E_C(X) = \max_{p \in C} \min_{q \in X} M(p, q)$
Variety (maximize): $E_V(X) = \min_{r \in X} \min_{q \in X,\, q \neq r} M(r, q)$
Smoothness (minimize): $E_S(X) = \sum_{p \in X} \sum_{q \in \mathrm{Near}(p)} M(p, q)$
A standard SOM jointly optimizes completeness and smoothness. To optimize variety as well, the SOM implementation alternates between maximizing $E_V$ and minimizing $E_C$.
Instead of using a regular learning rate, at each step the candidates (weight vectors) are replaced by new candidates according to the above optimizations.
After candidate selection, an image is synthesized that smoothly combines all the selected candidates.
Stitching is done using standard patch-based texture synthesis methods (Graphcut Textures, Kwatra et al., TOG
2003).

Procedural Texture Preview: some more results
[Figures: more results]

That's all folks! Questions?
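As a worked example of the candidate-selection energies from the Procedural Texture Preview slides, the sketch below evaluates $E_C$, $E_V$, and $E_S$ for a hypothetical 3×3 map of 2D parameter vectors against a dense 21×21 sampling of the unit square. Assumptions: M is plain Euclidean distance (the paper's actual measure may differ), and Near(p) is the 4-neighborhood on the map, so each neighbor pair is counted from both sides. This only scores a given candidate set; it does not perform the SOM-based selection.

```python
import numpy as np

def pairwise(A, B):
    """Matrix of M(a, b) for all a in A, b in B (assumed: Euclidean distance)."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def completeness(C, X):
    """E_C(X) = max over p in C of min over q in X of M(p, q); lower is better."""
    return pairwise(C, X).min(axis=1).max()

def variety(X):
    """E_V(X) = min over distinct r, q in X of M(r, q); higher is better."""
    D = pairwise(X, X)
    np.fill_diagonal(D, np.inf)  # exclude q == r
    return D.min()

def smoothness(grid):
    """E_S(X) = sum over p of sum over 4-neighbors q of M(p, q), for an H x W x k map."""
    right = np.linalg.norm(grid[:, 1:] - grid[:, :-1], axis=-1).sum()
    down = np.linalg.norm(grid[1:, :] - grid[:-1, :], axis=-1).sum()
    return 2.0 * (right + down)  # each neighbor pair is counted from both sides

g = np.linspace(0, 1, 3)
grid = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1)  # 3 x 3 x 2 candidate map
X = grid.reshape(-1, 2)
c = np.linspace(0, 1, 21)
C = np.stack(np.meshgrid(c, c, indexing="ij"), axis=-1).reshape(-1, 2)  # dense samples
print(round(completeness(C, X), 4), variety(X), smoothness(grid))  # 0.3536 0.5 12.0
```

The three terms pull in different directions: spreading candidates apart raises variety and improves completeness, while the smoothness term keeps map neighbors close, which is why the method alternates between them.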