
Visual Dictionaries
George Papandreou
Toyota Technological Institute at Chicago
http://ttic.uchicago.edu/~gpapan
CVPR 2014 Tutorial on BASIS
Additive Image Patch Modeling
 The patch-based image modeling approach.
 How to span the space of all 8x8 image patches?
[Figure: an 8x8 patch x ∈ R^D expressed additively as x ≈ α1·w1 + α2·w2 + α3·w3 + … over a dictionary of K atoms w_k ∈ R^D]
2
Two Modeling Goals
Image reconstruction
 Use dictionary to build image prior
 Tasks: Compression, denoising,
deblurring, inpainting,…
Image interpretation
 Use dictionary for feature extraction
 Tasks: Classification, recognition,…
4
Three Modeling Regimes
Two inter-related properties:
 Over-completeness: how big is the dictionary?
 Sparsity: how many non-zero components?
[Figure: modeling regimes along these two axes, ranging from PCA to Sparse Coding to Clustering]
5
Where Does the Dictionary Come From?
(1) Dictionary is fixed, e.g., basis or union of bases
[Figure: fixed dictionaries — DCT (JPEG image compression) and wavelets]
6
Where Does the Dictionary Come From?
(2) Learn generic dictionary from a collection of images
Many algorithms
possible (see later)
7
Where Does the Dictionary Come From?
(3) Learn an image-specific (image-adapted) dictionary
Many algorithms
possible (see later)
8
Where Does the Dictionary Come From?
(4) Non-parametric: Dictionary is the set of all
overlapping image patches (one or many images)
Non-local means,
patch transform, etc.
9
Beyond Bases: Hierarchical Dictionaries
(1) Multi-scale image modeling
 Apply same dictionary to image at different scales
 Gaussian+Laplacian pyramids, wavelets, …
(2) Recursive hierarchical models
 Build recursive dictionaries
 Deep learning
10
Key Problems
 Coding
 Find the expansion coefficients α given the dictionary W
 Dictionary learning
 Given data, learn a dictionary W
 Hierarchical modeling
[Figure: patch x ∈ R^D expanded over a dictionary of K atoms]
11
Image Coding Problem: Least Squares
 Least squares criterion (equivalent constrained / penalized formulations): min_α ||x − Wα||² + λ||α||²
 Solution (Tikhonov regularization, Wiener filtering): α = (W^T W + λI)^-1 W^T x = V^T x
 Columns of V are the dual filters (dual dictionary).
 Fast processing (inner products). Yields dense code.
12
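A minimal numpy sketch of the regularized least-squares coding on this slide (a hedged illustration, not the tutorial's code): W is the D x K dictionary, x a patch vector, lam the regularization weight; ridge_code and dual_dictionary are illustrative names.

import numpy as np

def ridge_code(x, W, lam=0.1):
    """Dense code alpha minimizing ||x - W @ alpha||^2 + lam * ||alpha||^2."""
    K = W.shape[1]
    # Tikhonov / ridge solution: alpha = (W^T W + lam I)^-1 W^T x
    return np.linalg.solve(W.T @ W + lam * np.eye(K), W.T @ x)

def dual_dictionary(W, lam=0.1):
    """Columns of V are the dual filters, so that alpha = V.T @ x."""
    K = W.shape[1]
    return W @ np.linalg.inv(W.T @ W + lam * np.eye(K))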
Image Coding Problem: Vector Quantization
 Equivalent formulations: code x with a single scaled atom, min over k and α of ||x − α w_k||²
 Solution: pick the atom with the largest-magnitude inner product with x (for unit-norm atoms)
 Exact O(DK): one inner product for each basis
 Approximate O(D log K): ANN search
13
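A sketch of the exact O(DK) vector-quantization coding described above, assuming unit-norm dictionary columns; the approximate O(D log K) variant would replace the exhaustive scan with an approximate-nearest-neighbor index (not shown).

import numpy as np

def vq_code(x, W):
    """Pick the single atom that best explains x (W has unit-norm columns)."""
    scores = W.T @ x                      # one inner product per atom, O(DK)
    k = int(np.argmax(np.abs(scores)))    # best-matching atom
    alpha = np.zeros(W.shape[1])
    alpha[k] = scores[k]                  # optimal coefficient for that atom
    return k, alpha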
Sparse Coding Problem
 Assume only L non-zero coefficients: ||α||_0 ≤ L
 This is a much harder combinatorial problem. In the
worst case there are (K choose L) possible active sets.
 If we knew the active set of coefficients, it would reduce to a least-squares problem.
Two very effective families of approximate algorithms:
 Greedy algorithms
 Relaxation algorithms
14
Greedy Sparse Coding: Matching Pursuit
 Greedily add T terms one by one
Algorithm (Basic Matching Pursuit):
1. Initialize the residual r = x
2. Find the atom that best explains the residual (a VQ problem at each iteration)
3. Update the residual
4. Return if stopping criterion met, otherwise go to 2.
 Many variants (e.g., OMP). Efficient implementations.
Mallat (2009)
SPAMS
15
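A compact sketch of basic matching pursuit as listed in the algorithm above (not the OMP variant and not the SPAMS implementation); it assumes unit-norm columns in W and a fixed number of terms T.

import numpy as np

def matching_pursuit(x, W, T=8):
    """Greedy sparse code with T terms; W has unit-norm columns."""
    r = x.copy()                   # 1. initialize the residual
    alpha = np.zeros(W.shape[1])
    for _ in range(T):
        scores = W.T @ r           # 2. VQ step: atom that best explains residual
        k = int(np.argmax(np.abs(scores)))
        alpha[k] += scores[k]
        r -= scores[k] * W[:, k]   # 3. update the residual
    return alpha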
Basic Matching Pursuit Convergence Analysis
 Exponential convergence (recall VQ analysis):
 Dictionary coherence:
 Note that this quantity is strictly positive if the dictionary spans R^D.
 Basic matching pursuit costs T times more than VQ.
16
Relaxed Sparse Coding
 Continuous relaxation of the combinatorial problem
 Prominent case: p = 1 (L1 convex optimization)
17
Basis Pursuit Coding
 L1-penalized problem (a.k.a. basis pursuit, LASSO)
 Global optimum (convex optimization)
 Huge literature:
 Algorithms for large-scale problems
 Recovery guarantees: compressed sensing
 Extensions: TV minimization, ADMM
 Extensions:
 Re-weighted L1
 Non-convex relaxations: 0 < p < 1
Mallat (2009), Elad (2010)
SPAMS
18
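Many solvers exist for this L1 problem; as one concrete illustration, here is a minimal ISTA (proximal-gradient / iterative soft-thresholding) sketch in numpy, with the step size set from the spectral norm of W. It is a didactic sketch, not a large-scale solver.

import numpy as np

def ista(x, W, lam=0.1, n_iter=200):
    """Minimize 0.5*||x - W a||^2 + lam*||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(W, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ a - x)           # gradient of the quadratic term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a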
Thresholding Algorithms
 Lp-optimization with orthonormal basis
 Decompose into a separable problem (the L2 data term is invariant to rotation; the Lp norm is separable):
 Look-up table 1-D optimization:
 L0 / L1: hard/soft thresholding
 L2: linear shrinkage
Elad (2010)
19
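A sketch of the separable look-up rules above for an orthonormal dictionary (W^T W = I): rotate into the basis, then apply hard thresholding (L0), soft thresholding (L1), or linear shrinkage (L2) per coefficient; the exact thresholds match the penalty scaling noted in the comments.

import numpy as np

def orthonormal_code(x, W, lam=0.1, penalty="l1"):
    """Closed-form coding when W^T W = I: rotate, then threshold per coefficient."""
    c = W.T @ x                                   # exact coefficients (rotation)
    if penalty == "l0":                           # hard threshold: keep if 0.5*c^2 > lam
        return np.where(c**2 > 2 * lam, c, 0.0)
    if penalty == "l1":                           # soft threshold at lam
        return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
    return c / (1.0 + 2 * lam)                    # L2 penalty lam*a^2: linear shrinkage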
Recap: (Sparse) Coding
 Problem: find the expansion coefficients α given the dictionary W
 Exact methods
 p = 2 (Fourier, PCA, etc): Linear system
 p = 0 and L = 1 (VQ): Fast search
 Orthonormal dictionary: Separable 1-D optimization
 Approximate methods for sparse coding
 p = 0: Greedy matching pursuit
 p = 1: Convex relaxation
20
Dictionary Learning
 Find a dictionary W that best fits a dataset
 Exact solution for L2 norm via the SVD (PCA)
 For sparse norms this is a hard non-convex problem
even if the coding problem is convex
 Main approach: alternating minimization
 Recent advances in theory
21
Alternating Minimization Methods
 Update codes α given dictionary W
 Use any greedy / relaxation sparse coding algorithm
 Update dictionary W, given codes α: least squares
 Method converges to local minimum
 K-SVD: Updates dictionaries sequentially
 Online version much faster for large datasets
Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010)
22
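A batch alternating-minimization sketch of the two steps above, reusing the matching_pursuit routine sketched earlier for the coding step; X holds one patch per column, and the dictionary update is a least-squares fit followed by column renormalization. This illustrates the basic scheme, not K-SVD's sequential atom updates or the online variant.

import numpy as np

def learn_dictionary(X, K=256, T=4, n_iter=20, seed=0):
    """Alternate between sparse coding and a least-squares dictionary update."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    W = X[:, rng.choice(N, K, replace=False)].astype(float)  # init with data patches
    W /= np.linalg.norm(W, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # 1. update codes given dictionary (matching_pursuit from the earlier sketch;
        #    any greedy / relaxation sparse coder could be used here)
        A = np.stack([matching_pursuit(X[:, i], W, T) for i in range(N)], axis=1)
        # 2. update dictionary given codes: least squares W = X A^T (A A^T)^-1
        W = X @ A.T @ np.linalg.pinv(A @ A.T)
        W /= np.linalg.norm(W, axis=0, keepdims=True) + 1e-12
    return W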
K-Means as Dictionary Learning Method
 Update codes α, given dictionary W, such that each code has a single non-zero entry
 Update dictionary W, given codes α
 Special case of K-SVD using OMP-1 for coding
 Extremely fast
Aharon+ (2006); Coates, Lee, Ng (2011)
23
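In the K-means special case the coding step reduces to assigning each patch to a single atom and the dictionary update to averaging the assigned patches; a minimal sketch is below (plain Euclidean assignment; OMP-1 coding would instead pick the atom with the largest-magnitude inner product on unit-norm atoms).

import numpy as np

def kmeans_dictionary(X, K=256, n_iter=20, seed=0):
    """Dictionary learning with one non-zero coefficient per patch (K-means style)."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    W = X[:, rng.choice(N, K, replace=False)].astype(float)   # init with data patches
    for _ in range(n_iter):
        # code update: assign every patch to its nearest atom
        d2 = ((X[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)  # K x N distances
        assign = d2.argmin(axis=0)
        # dictionary update: each atom becomes the mean of its assigned patches
        for k in range(K):
            members = X[:, assign == k]
            if members.shape[1] > 0:
                W[:, k] = members.mean(axis=1)
    return W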
Learned Dictionaries
[Figure: learned dictionaries — generic K-SVD vs. K-SVD adapted to the Barbara image; Aharon+ (2006)]
24
Learned Dictionaries
[Figure: generic K-SVD dictionary vs. generic K-Means dictionary; Aharon+ (2006), Coates+ (2011), Papandreou+ (2014)]
25
Image Denoising with Learned Dictionaries
[Figure: noisy image (22.1 dB) and K-SVD denoised result (30.8 dB); Aharon+ (2006)]
26
Image Inpainting with Learned Dictionaries
 Joint dictionary learning and image inpainting
Mairal+ (2010)
27
K-SVD vs K-Means Dictionaries in Denoising
 Replace K-SVD with K-Means in dictionary learning
step of the denoising algorithm.
[Figure: Noisy 22.12 dB; K-SVD 32.43 dB (OMP-32, 84 sec); K-Means 32.25 dB (OMP-1, 22 sec)]
28
Recap: Dictionary Learning
 Find a dictionary W that best fits a dataset
 Non-convex problem
 Greedy alternating optimization methods
 The K-means algorithm is very fast and works well
for small image patches
29
Image Patch Dictionaries in Visual Recognition
 SIFT-based Bag-of-Words classification pipeline
[Figure: pipeline — patches → SIFT → dictionary (>10K words) → classifier]
30
Patch Dictionaries in Image Classification
 Image classification without SIFT
 Key insights:
 K-means works well
 Whitening is crucial
 Using larger dictionaries boosts
recognition rate
 Encoding has a huge effect on
performance
 Promising results on CIFAR but
not on large image datasets
Varma, Zisserman (2003); Coates+ (2011)
31
Histograms of Sparse Codes for Object Detection
 Key idea: Build a HOG-like descriptor on top of K-SVD
learned patch dictionary instead of gradients, then DPM
Ren, Ramanan (2013); Also see Dikmen, Hoiem, Huang (2012)
32
Hierarchical Modeling and Dictionary Learning
 So far: Modeling the appearance of small image
patches, say 8x8 pixels.
 How about dictionaries of larger visual patterns?
1. Multiscale modeling
 Work with image pyramids
2. Hierarchical modeling
 Model higher order statistics of feature responses
 Recursively compose complex visual patterns
 Use unsupervised or supervised objectives
33
Hierarchical Models of Objects
Fidler & Leonardis (2007); Zhu+ (2010)
34
Hierarchical Matching Pursuit (K-SVD)
Bo, Ren, Fox (2013)
35
Deep Convolutional Networks
LeCun+ (1998); Krizhevsky+ (2012)
36
Transformation Aware Dictionaries
 How to span the space of all 8x8 image patches?
[Figure: patch x expressed as a weighted sum α1·w1 + α2·w2 + α3·w3 + … over a dictionary of K atoms in R^D]
37
Sources of Redundancy in Patch Dictionaries
 Same pattern, different position
 Same pattern, opposite polarity (x2 redundancy)
 Same pattern, different contrast
 How to build less redundant dictionaries?
38
The Epitome Data Structure
[Figure: an image patch and the larger epitome from which it can be extracted]
Epitomes: Jojic, Frey, Kannan, ICCV-03
39
Generating Patches from an Epitome
40
Generating Patches from an Epitome
 A single epitome is essentially a large collection of translated copies of a visual pattern.
41
Position and Appearance Transformations
42
Epitomic Image Matching
Epitomes: Jojic, Frey, Kannan, ICCV-03
43
Dictionary of Mini-Epitomes
Papandreou, Chen, Yuille, CVPR-14
44
Coding and Learning with Epitomic Dictionaries
Patch coding in epitomic dictionaries:
 Epitomic dictionary equivalent to standard dictionary
with patches at all possible positions in epitome:
Dictionary learning:
 Variational inference on GMM model (Jojic+ '01)
 Sparse dictionary learning (Aharon, Elad '08; Mairal+ '11)
 Epitomic K-Means (Papandreou+ '14)
45
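The equivalence stated above suggests a simple, if memory-hungry, way to code with an epitomic dictionary: expand each mini-epitome into all of its overlapping sub-patches and run ordinary nearest-atom matching on the expanded set. A sketch under that reading (16x16 mini-epitomes, 8x8 patches; function names are illustrative, not from the paper's code):

import numpy as np

def expand_epitomes(epitomes, patch=8):
    """Turn a list of (e x e) mini-epitomes into a matrix whose columns are all
    overlapping (patch x patch) sub-patches, i.e. an equivalent standard dictionary."""
    cols = []
    for E in epitomes:
        e = E.shape[0]
        for i in range(e - patch + 1):
            for j in range(e - patch + 1):
                cols.append(E[i:i + patch, j:j + patch].ravel())
    W = np.stack(cols, axis=1).astype(float)
    W /= np.linalg.norm(W, axis=0, keepdims=True) + 1e-12
    return W

def epitomic_match(x, W):
    """Epitomic matching = VQ over all sub-patch positions of all mini-epitomes."""
    scores = W.T @ x
    return int(np.argmax(np.abs(scores)))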
K-Means for the Mini-Epitome Model
Generative model:
1. Select mini-epitome k with probability π_k
2. Select position p within epitome uniformly
3. Generate the patch
Epitomic K-means (hard-EM)
1. Epitomic matching (hard assignment)
2. Epitome update
3. Diverse initialization with K-means++ (optional)
Papandreou, Chen, Yuille, CVPR-14
46
K-Means for the Mini-Epitome Model
Max likelihood, hard EM – essentially an epitomic adaptation of K-Means.
Faster convergence using diverse initialization of mini-epitomes by an epitomic adaptation of K-Means++.
47
A Generic Mini-Epitome Dictionary
[Figure: epitomic dictionary of 256 mini-epitomes (16x16) vs. non-epitomic dictionary of 1024 elements (8x8), both trained on 10,000 Pascal images]
48
Evaluation on Image Reconstruction
[Figure: original image; epitomic reconstruction (PSNR 29.2 dB); improvement over the non-epitomic dictionary]
49
Evaluation on Image Reconstruction
50
Evaluation on VOC-07 Image Classification
51
Max-Pooling vs. Epitomic Convolution
[Figure: max-pooling vs. epitomic convolution]
52
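To contrast the two operations, here is a toy single-channel sketch: with max-pooling the filter response is maximized over nearby input positions, whereas epitomic convolution maximizes, at each output position, over the positions of a smaller filter inside a larger epitome. This is a schematic illustration of the idea, not the paper's implementation.

import numpy as np

def responses(image, filt):
    """Valid correlation of a 2-D image with a 2-D filter."""
    f = filt.shape[0]
    H, W = image.shape
    out = np.empty((H - f + 1, W - f + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f, j:j + f] * filt)
    return out

def conv_maxpool(image, filt, pool=2):
    """Standard route: convolve, then max over non-overlapping pool x pool blocks."""
    r = responses(image, filt)
    h, w = (r.shape[0] // pool) * pool, (r.shape[1] // pool) * pool
    r = r[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return r.max(axis=(1, 3))

def epitomic_conv(image, epitome, patch=8):
    """Epitomic route: at each position, max response over sub-filters of the epitome."""
    e = epitome.shape[0]
    subs = [epitome[i:i + patch, j:j + patch]
            for i in range(e - patch + 1) for j in range(e - patch + 1)]
    return np.max(np.stack([responses(image, s) for s in subs]), axis=0)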
Deep Epitomic Convolutional Nets
[Figure: standard convolution + max-pooling layer vs. epitomic convolution layer]
 ImageNet top-5 error: 14.2% (max-pool) → 13.6% (epitome)
Papandreou arXiv-14
53
Recap: Transformation Aware Dictionaries
 Reduce dictionary redundancy by explicitly
modeling nuisance variables
 Compact dictionaries for image reconstruction
and recognition
 Epitomes as translation-aware data structures
 Epitomic convolution as an alternative to a pair of consecutive convolution and max-pooling layers in deep networks.
56