pptx

Toyota Technological Institute at Chicago http://ttic.uchicago.edu/~gpapan Visual Dictionaries George Papandreou CVPR 2014 Tutorial on BASIS Additive Image Patch Modeling  The patch-based image modeling approach.  How to span the space of all 8x8 image patches? α1 Σ α2 α3 K  D 2 Additive Image Patch Modeling  The patch-based image modeling approach.  How to span the space of all 8x8 image patches? α1 Σ α2 α3  K D 3 Two Modeling Goals Image reconstruction  Use dictionary to build image prior  Tasks: Compression, denoising, deblurring, inpainting,… Image interpretation  Use dictionary for feature extraction  Tasks: Classification, recognition,… 4 Three Modeling Regimes Two inter-related properties:  How big is the dictionary? Over-completeness:  How many non-zero components? sparsity PCA Sparse Coding Clustering 5 Where Does the Dictionary Come From? (1) Dictionary is fixed, e.g., basis or union of bases DCT JPEG image compression Wavelets 6 Where Does the Dictionary Come From? (2) Learn generic dictionary from a collection of images Many algorithms possible (see later) 7 Where Does the Dictionary Come From? (3) Learn an image-specific (image-adapted) dictionary Many algorithms possible (see later) 8 Where Does the Dictionary Come From? (4) Non-parametric: Dictionary is the set of all overlapping image patches (one or many images) Non-local means, patch transform, etc. 9 Beyond Bases: Hierarchical Dictionaries (1) Multi-scale image modeling  Apply same dictionary to image at different scales  Gaussian+Laplacian pyramids, wavelets, … (2) Recursive hierarchical models  Build recursive dictionaries  Deep learning 10 Key Problems  Coding  Find the expansion coefficients given the dictionary  Dictionary learning  Given data, learn a dictionary  Hierarchical modeling K  D 11 Image Coding Problem: Least Squares  Least squares criterion. Equivalent formulations:  Solution (Tikhonov regularization, Wiener filtering):  Columns of V are the dual filters (dual dictionary).  Fast processing (inner products). Yields dense code. 12 Image Coding Problem: Vector Quantization  Equivalent formulations:  Solution:  Exact O(DK): one inner product for each basis  Approximate O(D logK): ANN search 13 Sparse Coding Problem  Assume only L non-zero coefficients:  This is a much harder combinatorial problem. In the worst case there are possible active sets.  If we knew the active set of coefs, then LS problem. Two very effective families of approximate algorithms:  Greedy algorithms  Relaxation algorithms 14 Greedy Sparse Coding: Matching Pursuit  Greedily add T terms one by one Algorithm (Basic Matching Pursuit): 1. Initialize the residual r = x 2. Find atom that best explains the residual VQ problem at each iteration 3. Update the residual 4. Return if stopping criterion met, otherwise go to 2.  Many variants (e.g., OMP). Efficient implementations. Mallat (2009) SPAMS 15 Basic Matching Pursuit Convergence Analysis  Exponential convergence (recall VQ analysis):  Dictionary coherence:  Note that if spans .  Basic matching pursuit costs T times more than VQ. 16 Relaxed Sparse Coding  Continuous relaxation of the combinatorial problem  Prominent case: p = 1 (L1 convex optimization) 17 Basis Pursuit Coding  L1-penalized problem (a.k.a. basis pursuit, LASSO)  Global optimum (convex optimization)  Huge literature:  Algorithms for large-scale problems  Recovery guarantees: compressed sensing  Extensions: TV minimization, ADMM  Extensions:  Re-weighted L1  Non-convex relaxations: 0 < p < 1 Mallat (2009), Elad (2010) SPAMS 18 Thresholding Algorithms  Lp-optimization with orthonormal basis  Decompose into separable problem: L2 invariant to rotation Lp norm is separable  Look-up table 1-D optimization:  L0 / L1: hard/soft thresholding  L2: linear shrinkage Elad (2010) 19 Recap: (Sparse) Coding  Problem:  Find the expansion coefficients given the dictionary  Exact methods  p = 2 (Fourier, PCA, etc): Linear system  p = 0 and = 1 (VQ): Fast search  Orthonormal dictionary: Separable 1-D optimization  Approximate methods for sparse coding  p = 0: Greedy matching pursuit  p = 1: Convex relaxation 20 Dictionary Learning  Find a dictionary W that best fits a dataset  Exact solution for L2 norm via the SVD (PCA)  For sparse norms this is a hard non-convex problem even if the coding problem is convex  Main approach: alternating minimization  Recent advances in theory 21 Alternating Minimization Methods  Update codes given dictionary  Use any greedy/ relaxation sparse coding algorithm  Update dictionary , given codes Least squares  Method converges to local minimum  K-SVD: Updates dictionaries sequentially  Online version much faster for large datasets Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010) 22 K-Means as Dictionary Learning Method  Update codes such that  Update dictionary given dictionary , given codes  Special case of K-SVD using OMP-1 for coding  Extremely fast Aharon+ (2006); Coates, Lee, Ng (2011) 23 Learned Dictionaries Generic KSVD Aharon+ (2006) Barbara KSVD 24 Learned Dictionaries Generic KSVD Generic K-Means Aharon+ (2006), Coates+ (2011), Papandreou+ (2014) 25 Image Denoising with Learned Dictionaries Noisy 22.1dB Denoised KSVD 30.8dB Aharon+ (2006) 26 Image Inpainting with Learned Dictionaries  Joint dictionary learning and image inpainting Mairal+ (2010) 27 K-SVD vs K-Means Dictionaries in Denoising  Replace K-SVD with K-Means in dictionary learning step of the denoising algorithm. Noisy 22.12 dB KSVD 32.43 dB OMP-32, 84 sec K-Means 32.25 dB OMP-1, 22 sec 28 Recap: Dictionary Learning  Find a dictionary W that best fits a dataset  Non-convex problem  Greedy alternating optimization methods  The K-means algorithm is very fast and works well for small image patches 29 Image Patch Dictionaries in Visual Recognition  SIFT-based Bag-of-Words classification pipeline Dictionary >10K words Patches SIFT Classifier 30 Patch Dictionaries in Image Classification  Image classification without SIFT  Key insights:  K-means works well  Whitening is crucial  Using larger dictionaries boosts recognition rate  Encoding has a huge effect on performance  Promising results on CIFAR but not on large image datasets Varma, Zisserman (2003); Coates+ (2011) 31 Histograms of Sparse Codes for Object Detection  Key idea: Build a HOG-like descriptor on top of K-SVD learned patch dictionary instead of gradients, then DPM Ren, Ramanan (2013); Also see Dikmen, Hoiem, Huang (2012) 32 Hierarchical Modeling and Dictionary Learning  So far: Modeling the appearance of small image patches, say 8x8 pixels.  How about dictionaries of larger visual patterns? 1. Multiscale modeling  Work with image pyramids 2. Hierarchical modeling  Model higher order statistics of feature responses  Recursively compose complex visual patterns  Use unsupervised or supervised objectives 33 Hierarchical Models of Objects Fidler & Leonardis (2007); Zhu+ (2010) 34 Hierarchical Matching Pursuit (K-SVD) Bo, Ren, Fox (2013) 35 Deep Convolutional Networks LeCun+ (1998); Krizhevsky+ (2012) 36 Transformation Aware Dictionaries  How to span the space of all 8x8 image patches? α1 Σ α2 α3 K  D 37 Sources of Redundancy in Patch Dictionaries  Same pattern, different position  Same pattern, opposite polarity (x2 redundancy)  Same pattern, different contrast  How to build less redundant dictionaries? 38 The Epitome Data Structure Patch Epitome Epitomes: Jojic, Frey, Kannan, ICCV-03 39 Generating Patches from an Epitome 40 Generating Patches from an Epitome  A single epitome essentially is a large collection of translated copies of a visual pattern. 41 Position and Appearance Transformations 42 Epitomic Image Matching Epitomes: Jojic, Frey, Kannan, ICCV-03 43 Dictionary of Mini-Epitomes Papandreou, Chen, Yuille, CVPR-14 44 Coding and Learning with Epitomic Dictionaries Patch coding in epitomic dictionaries:  Epitomic dictionary equivalent to standard dictionary with patches at all possible positions in epitome: Dictionary learning:  Variational inference on GMM model (Jojic+ '01)  Sparse dictionary learning (Aharon, Elad '08; Mairal+ '11)  Epitomic K-Means (Papandreou+ '14) 45 K-Means for the Mini-Epitome Model Generative model: 1. Select mini-epitome k with probabilityz 2. Select position p within epitome uniformly 3. Generate the patch Epitomic K-means (hard-EM) 1. Epitomic matching (hard assignment) 2. Epitome update 3. Diverse initialization with K-means++ (optional) Papandreou, Chen, Yuille, CVPR-14 46 K-Means for the Mini-Epitome Model • Generative model: 1. Select mini-epitome k with probability 2. Select position p within epitome uniformly 3. Generate the patch Max likelihood, hard EM – essentially epitomic adaptation of K-Means. Faster convergence using diverse initialization of miniepitomes by epitomic adaptation of K-Means++. 47 A Generic Mini-Epitome Dictionary Epitomic dictionary 256 mini-epitomes (16x16) Non-Epitomic dictionary 1024 elements (8x8) Both trained on 10,000 Pascal images 48 Evaluation on Image Reconstruction Original image Epitome reconstr. PSNR: 29.2 dB Improvement over nonepitome 49 Evaluation on Image Reconstruction 50 Evaluation on VOC-07 Image Classification 51 Max-Pooling vs. Epitomic Convolution Max-pooling Epitomic convolution 52 Deep Epitomic Convolutional Nets Convolution+ max-pooling Epitomic convolution  Imagenet top-5 error: 14.2(max-pool)  13.6 (epitome) Papandreou arXiv-14 53 Recap: Transformation Aware Dictionaries  Reduce dictionary redundancy by explicitly modeling nuisance variables  Compact dictionaries for image reconstruction and recognition  Epitomes as translation aware data structurez  Epitomic convolution as alternative to a pair of consecutive convolution and max-pooling layers in deep networks. 56

pptx

Related documents

Products

Support

pptx

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib