Toyota Technological Institute at Chicago http://ttic.uchicago.edu/~gpapan Visual Dictionaries George Papandreou CVPR 2014 Tutorial on BASIS Additive Image Patch Modeling The patch-based image modeling approach. How to span the space of all 8x8 image patches? α1 Σ α2 α3 K D 2 Additive Image Patch Modeling The patch-based image modeling approach. How to span the space of all 8x8 image patches? α1 Σ α2 α3 K D 3 Two Modeling Goals Image reconstruction Use dictionary to build image prior Tasks: Compression, denoising, deblurring, inpainting,… Image interpretation Use dictionary for feature extraction Tasks: Classification, recognition,… 4 Three Modeling Regimes Two inter-related properties: How big is the dictionary? Over-completeness: How many non-zero components? sparsity PCA Sparse Coding Clustering 5 Where Does the Dictionary Come From? (1) Dictionary is fixed, e.g., basis or union of bases DCT JPEG image compression Wavelets 6 Where Does the Dictionary Come From? (2) Learn generic dictionary from a collection of images Many algorithms possible (see later) 7 Where Does the Dictionary Come From? (3) Learn an image-specific (image-adapted) dictionary Many algorithms possible (see later) 8 Where Does the Dictionary Come From? (4) Non-parametric: Dictionary is the set of all overlapping image patches (one or many images) Non-local means, patch transform, etc. 9 Beyond Bases: Hierarchical Dictionaries (1) Multi-scale image modeling Apply same dictionary to image at different scales Gaussian+Laplacian pyramids, wavelets, … (2) Recursive hierarchical models Build recursive dictionaries Deep learning 10 Key Problems Coding Find the expansion coefficients given the dictionary Dictionary learning Given data, learn a dictionary Hierarchical modeling K D 11 Image Coding Problem: Least Squares Least squares criterion. Equivalent formulations: Solution (Tikhonov regularization, Wiener filtering): Columns of V are the dual filters (dual dictionary). Fast processing (inner products). Yields dense code. 12 Image Coding Problem: Vector Quantization Equivalent formulations: Solution: Exact O(DK): one inner product for each basis Approximate O(D logK): ANN search 13 Sparse Coding Problem Assume only L non-zero coefficients: This is a much harder combinatorial problem. In the worst case there are possible active sets. If we knew the active set of coefs, then LS problem. Two very effective families of approximate algorithms: Greedy algorithms Relaxation algorithms 14 Greedy Sparse Coding: Matching Pursuit Greedily add T terms one by one Algorithm (Basic Matching Pursuit): 1. Initialize the residual r = x 2. Find atom that best explains the residual VQ problem at each iteration 3. Update the residual 4. Return if stopping criterion met, otherwise go to 2. Many variants (e.g., OMP). Efficient implementations. Mallat (2009) SPAMS 15 Basic Matching Pursuit Convergence Analysis Exponential convergence (recall VQ analysis): Dictionary coherence: Note that if spans . Basic matching pursuit costs T times more than VQ. 16 Relaxed Sparse Coding Continuous relaxation of the combinatorial problem Prominent case: p = 1 (L1 convex optimization) 17 Basis Pursuit Coding L1-penalized problem (a.k.a. basis pursuit, LASSO) Global optimum (convex optimization) Huge literature: Algorithms for large-scale problems Recovery guarantees: compressed sensing Extensions: TV minimization, ADMM Extensions: Re-weighted L1 Non-convex relaxations: 0 < p < 1 Mallat (2009), Elad (2010) SPAMS 18 Thresholding Algorithms Lp-optimization with orthonormal basis Decompose into separable problem: L2 invariant to rotation Lp norm is separable Look-up table 1-D optimization: L0 / L1: hard/soft thresholding L2: linear shrinkage Elad (2010) 19 Recap: (Sparse) Coding Problem: Find the expansion coefficients given the dictionary Exact methods p = 2 (Fourier, PCA, etc): Linear system p = 0 and = 1 (VQ): Fast search Orthonormal dictionary: Separable 1-D optimization Approximate methods for sparse coding p = 0: Greedy matching pursuit p = 1: Convex relaxation 20 Dictionary Learning Find a dictionary W that best fits a dataset Exact solution for L2 norm via the SVD (PCA) For sparse norms this is a hard non-convex problem even if the coding problem is convex Main approach: alternating minimization Recent advances in theory 21 Alternating Minimization Methods Update codes given dictionary Use any greedy/ relaxation sparse coding algorithm Update dictionary , given codes Least squares Method converges to local minimum K-SVD: Updates dictionaries sequentially Online version much faster for large datasets Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010) 22 K-Means as Dictionary Learning Method Update codes such that Update dictionary given dictionary , given codes Special case of K-SVD using OMP-1 for coding Extremely fast Aharon+ (2006); Coates, Lee, Ng (2011) 23 Learned Dictionaries Generic KSVD Aharon+ (2006) Barbara KSVD 24 Learned Dictionaries Generic KSVD Generic K-Means Aharon+ (2006), Coates+ (2011), Papandreou+ (2014) 25 Image Denoising with Learned Dictionaries Noisy 22.1dB Denoised KSVD 30.8dB Aharon+ (2006) 26 Image Inpainting with Learned Dictionaries Joint dictionary learning and image inpainting Mairal+ (2010) 27 K-SVD vs K-Means Dictionaries in Denoising Replace K-SVD with K-Means in dictionary learning step of the denoising algorithm. Noisy 22.12 dB KSVD 32.43 dB OMP-32, 84 sec K-Means 32.25 dB OMP-1, 22 sec 28 Recap: Dictionary Learning Find a dictionary W that best fits a dataset Non-convex problem Greedy alternating optimization methods The K-means algorithm is very fast and works well for small image patches 29 Image Patch Dictionaries in Visual Recognition SIFT-based Bag-of-Words classification pipeline Dictionary >10K words Patches SIFT Classifier 30 Patch Dictionaries in Image Classification Image classification without SIFT Key insights: K-means works well Whitening is crucial Using larger dictionaries boosts recognition rate Encoding has a huge effect on performance Promising results on CIFAR but not on large image datasets Varma, Zisserman (2003); Coates+ (2011) 31 Histograms of Sparse Codes for Object Detection Key idea: Build a HOG-like descriptor on top of K-SVD learned patch dictionary instead of gradients, then DPM Ren, Ramanan (2013); Also see Dikmen, Hoiem, Huang (2012) 32 Hierarchical Modeling and Dictionary Learning So far: Modeling the appearance of small image patches, say 8x8 pixels. How about dictionaries of larger visual patterns? 1. Multiscale modeling Work with image pyramids 2. Hierarchical modeling Model higher order statistics of feature responses Recursively compose complex visual patterns Use unsupervised or supervised objectives 33 Hierarchical Models of Objects Fidler & Leonardis (2007); Zhu+ (2010) 34 Hierarchical Matching Pursuit (K-SVD) Bo, Ren, Fox (2013) 35 Deep Convolutional Networks LeCun+ (1998); Krizhevsky+ (2012) 36 Transformation Aware Dictionaries How to span the space of all 8x8 image patches? α1 Σ α2 α3 K D 37 Sources of Redundancy in Patch Dictionaries Same pattern, different position Same pattern, opposite polarity (x2 redundancy) Same pattern, different contrast How to build less redundant dictionaries? 38 The Epitome Data Structure Patch Epitome Epitomes: Jojic, Frey, Kannan, ICCV-03 39 Generating Patches from an Epitome 40 Generating Patches from an Epitome A single epitome essentially is a large collection of translated copies of a visual pattern. 41 Position and Appearance Transformations 42 Epitomic Image Matching Epitomes: Jojic, Frey, Kannan, ICCV-03 43 Dictionary of Mini-Epitomes Papandreou, Chen, Yuille, CVPR-14 44 Coding and Learning with Epitomic Dictionaries Patch coding in epitomic dictionaries: Epitomic dictionary equivalent to standard dictionary with patches at all possible positions in epitome: Dictionary learning: Variational inference on GMM model (Jojic+ '01) Sparse dictionary learning (Aharon, Elad '08; Mairal+ '11) Epitomic K-Means (Papandreou+ '14) 45 K-Means for the Mini-Epitome Model Generative model: 1. Select mini-epitome k with probabilityz 2. Select position p within epitome uniformly 3. Generate the patch Epitomic K-means (hard-EM) 1. Epitomic matching (hard assignment) 2. Epitome update 3. Diverse initialization with K-means++ (optional) Papandreou, Chen, Yuille, CVPR-14 46 K-Means for the Mini-Epitome Model • Generative model: 1. Select mini-epitome k with probability 2. Select position p within epitome uniformly 3. Generate the patch Max likelihood, hard EM – essentially epitomic adaptation of K-Means. Faster convergence using diverse initialization of miniepitomes by epitomic adaptation of K-Means++. 47 A Generic Mini-Epitome Dictionary Epitomic dictionary 256 mini-epitomes (16x16) Non-Epitomic dictionary 1024 elements (8x8) Both trained on 10,000 Pascal images 48 Evaluation on Image Reconstruction Original image Epitome reconstr. PSNR: 29.2 dB Improvement over nonepitome 49 Evaluation on Image Reconstruction 50 Evaluation on VOC-07 Image Classification 51 Max-Pooling vs. Epitomic Convolution Max-pooling Epitomic convolution 52 Deep Epitomic Convolutional Nets Convolution+ max-pooling Epitomic convolution Imagenet top-5 error: 14.2(max-pool) 13.6 (epitome) Papandreou arXiv-14 53 Recap: Transformation Aware Dictionaries Reduce dictionary redundancy by explicitly modeling nuisance variables Compact dictionaries for image reconstruction and recognition Epitomes as translation aware data structurez Epitomic convolution as alternative to a pair of consecutive convolution and max-pooling layers in deep networks. 56