Image classification by sparse coding
Andrew Ng

Feature learning problem
• Given a 14x14 image patch x, we can represent it using 196 real numbers (its pixel values).
• Problem: Can we learn a better representation for it?

Unsupervised feature learning
Given a set of images, learn a better way to represent images than raw pixels.

First stage of visual processing in the brain: V1
The first stage of visual processing in the brain (V1) does "edge detection."
[Figures: schematic of a simple cell and an actual simple cell's receptive field, resembling "Gabor functions." Images from DeAngelis, Ohzawa & Freeman, 1995]

Learning an image representation
Sparse coding (Olshausen & Field, 1996)
Input: Images x(1), x(2), …, x(m) (each in R^(n x n)).
Learn: Dictionary of bases f1, f2, …, fk (also in R^(n x n)), so that each input x can be approximately decomposed as
    x ≈ sum_j aj fj
s.t. the aj's are mostly zero ("sparse").

Sparse coding illustration
[Figure: natural images, and the 64 bases (f1, …, f64) learned from them; the learned bases look like "edges."]
Test example:
    x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63
Feature representation: [a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
Compact & easily interpretable.

More examples
    x ≈ 0.6 * f15 + 0.8 * f28 + 0.4 * f37
    Represent as: [0, 0, …, 0, 0.6, 0, …, 0, 0.8, 0, …, 0, 0.4, …]
    x ≈ 1.3 * f5 + 0.9 * f18 + 0.3 * f29
    Represent as: [0, 0, …, 0, 1.3, 0, …, 0, 0.9, 0, …, 0, 0.3, …]
• The method hypothesizes that edge-like patches are the most "basic" elements of a scene, and represents an image in terms of the edges that appear in it.
• Use it to obtain a more compact, higher-level representation of the scene than pixels.

Digression: Sparse coding applied to audio
[Figures from Evan Smith & Mike Lewicki, 2006]

Sparse coding details
Input: Images x(1), x(2), …, x(m) (each in R^(n x n)).
Minimize over the fj's and the aj(i)'s:
    sum_i || x(i) - sum_j aj(i) fj ||^2 + λ sum_{i,j} |aj(i)|
The second term is an L1 sparsity term (it causes most of the a's to be 0).
Alternating minimization: alternately minimize with respect to the fj's (easy) and the a's (harder).

Solving for bases
Early versions of sparse coding were used to learn only about 32 bases. How do we scale this algorithm up?

Feature-sign search (solve for the ai's)
Goal: Minimize the objective with respect to the ai's.
Simplified example:
• Suppose I tell you the sign (+, -, or 0) of each ai.
• The problem then simplifies to a quadratic function of the ai's, which can be solved efficiently in closed form.
Algorithm:
• Repeatedly guess the sign (+, -, or 0) of each of the ai's.
• Solve for the ai's in closed form. Refine the guess for the signs.
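The closed-form step above can be made concrete with a short NumPy sketch. This is not the actual feature-sign search implementation; it only shows (a) the objective from the slides and (b) the linear system solved once the signs of the active ai's have been guessed, plus the least-squares basis update used in the "easy" half of the alternating minimization. The function names, the tiny ridge term added for numerical stability, and the omission of any norm constraint on the bases are simplifications of my own.

```python
import numpy as np

def sparse_coding_objective(X, F, A, lam):
    """Objective from the slides, with patches flattened to columns of X:
        sum_i ||x(i) - sum_j a_j(i) f_j||^2 + lam * sum_{i,j} |a_j(i)|
    X: (n*n, m) inputs, F: (n*n, k) bases, A: (k, m) codes."""
    residual = X - F @ A
    return np.sum(residual ** 2) + lam * np.sum(np.abs(A))

def solve_codes_given_signs(x, F, signs, lam):
    """Closed-form step inside feature-sign search: once the sign (+1, -1, or 0)
    of every a_j is guessed, |a_j| becomes sign_j * a_j, the objective is
    quadratic in the active a_j's, and setting its gradient to zero gives a
    small linear system."""
    a = np.zeros(F.shape[1])
    active = np.flatnonzero(signs)                 # coordinates guessed nonzero
    if active.size == 0:
        return a
    Fa, s = F[:, active], signs[active]
    # d/da [ ||x - Fa a||^2 + lam * s.a ] = 0  =>  (Fa^T Fa) a = Fa^T x - lam*s/2
    gram = Fa.T @ Fa + 1e-8 * np.eye(active.size)  # tiny ridge for stability (my addition)
    a[active] = np.linalg.solve(gram, Fa.T @ x - 0.5 * lam * s)
    return a

def update_bases(X, A):
    """The 'easy' half of the alternating minimization: with the codes A fixed,
    the bases are an ordinary least-squares fit (norm constraints omitted)."""
    gram = A @ A.T + 1e-8 * np.eye(A.shape[0])
    return X @ A.T @ np.linalg.inv(gram)
```

A full feature-sign search additionally decides which coordinate to activate next from the gradient of the smooth term, and checks that the solved coefficients are consistent with the guessed signs before accepting the step.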
The feature-sign search algorithm: Visualization
Starting from zero (the default), with current guess a1 = 0, a2 = 0:
1: Activate a2 with "+" sign. Active set = {a2}.
2: Update a2 (closed form).
3: Activate a1 with "+" sign. Active set = {a1, a2}.
4: Update a1 & a2 (closed form).

Before feature-sign search: about 32 learned bases.
With feature-sign search: [Figure: a much larger set of learned bases.]

Recap of sparse coding for feature learning
Training time
Input: Images x(1), x(2), …, x(m) (each in R^(n x n)).
Learn: Dictionary of bases f1, f2, …, fk (also in R^(n x n)).
Test time
Input: Novel image x (in R^(n x n)) and the previously learned fi's.
Output: Representation [a1, a2, …, ak] of image x.
    x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63
    Represent as: [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]

Sparse coding recap
    x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63  →  [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
Much better than the pixel representation, but still not competitive with SIFT, etc.
Three ways to make it competitive:
• Combine this with SIFT.
• Advanced versions of sparse coding (e.g., LCC).
• Deep learning.

Combining sparse coding with SIFT
Input: SIFT descriptors x(1), x(2), …, x(m) (each in R^128).
Learn: Dictionary of bases f1, f2, …, fk (also in R^128).
Test time: Given a novel SIFT descriptor x (in R^128), represent it as [a1, a2, …, ak].

Putting it together
Suppose you've already learned the bases f1, f2, …, fk. To represent an image, sparse-code its inputs x(1), x(2), x(3), … into feature representations a(1), a(2), a(3), … and feed those into the learning algorithm.
E.g., 73-75% accuracy on Caltech 101 (Yang et al., 2009; Boureau et al., 2009).

K-means vs. sparse coding
K-means: learn centroids (Centroid 1, Centroid 2, Centroid 3) and represent each input by its single nearest centroid.
Sparse coding: learn bases (f1, f2, f3) and represent each input as a weighted combination of a few of them.
Intuition: sparse coding is a "soft" version of k-means (membership in multiple clusters).

K-means vs. sparse coding
Rule of thumb: Whenever you use k-means to get a dictionary, replacing it with sparse coding will often work better.
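To make the "soft version of k-means" intuition concrete, here is a minimal NumPy sketch that encodes the same vector both ways. The dictionary and input are synthetic, and the sparse codes are computed with a generic ISTA (proximal-gradient) solver rather than the feature-sign search described earlier; kmeans_code, sparse_code, and the parameter choices are illustrative only.

```python
import numpy as np

def kmeans_code(x, D):
    """Hard assignment: a one-hot vector marking the nearest centroid.
    D holds one centroid per column."""
    dists = np.linalg.norm(D - x[:, None], axis=0)
    code = np.zeros(D.shape[1])
    code[np.argmin(dists)] = 1.0
    return code

def sparse_code(x, D, lam=0.1, n_iter=200):
    """Sparse coding of x against dictionary D by ISTA on
    ||x - D a||^2 + lam * ||a||_1 (a generic solver, not feature-sign search)."""
    a = np.zeros(D.shape[1])
    step = 1.0 / (2 * np.linalg.norm(D, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2 * D.T @ (D @ a - x)
        a = a - step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # soft threshold
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((128, 10))             # e.g. 10 bases over SIFT-sized vectors
    D /= np.linalg.norm(D, axis=0)                 # unit-norm bases
    x = 0.8 * D[:, 3] + 0.3 * D[:, 7]              # input built from two bases
    print(kmeans_code(x, D))                       # exactly one nonzero entry
    print(np.round(sparse_code(x, D), 2))          # a few nonzero entries
```

The k-means code always has exactly one nonzero entry (hard cluster membership), while the sparse code spreads a few nonzero weights over the bases that best explain the input, which is the property the rule of thumb above exploits.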