Exploring Image Data with Quantization-Based Techniques
Svetlana Lazebnik (lazebnik@cs.unc.edu)
University of North Carolina at Chapel Hill
Joint work with Maxim Raginsky (m.raginsky@duke.edu)
Duke University
Overview
• Motivation: working with high-dimensional image data
– Learning lower-dimensional structure (unsupervised)
– Learning posterior class distributions (supervised)
• Vector quantization framework
– Can learn a lot of useful structure by compressing the data
– Nearest-neighbor quantizers are convenient and familiar
– VQ theory and practice are extremely well-developed
• Outline of talk
– Dimensionality estimation and separation (NIPS 2005, work in progress)
– Learning quantizer codebooks by information loss minimization (AISTATS
2007)
1. Intrinsic dimensionality estimation
• Many natural signal classes have a low-dimensional smooth structure; what we observe is a noisy embedding of the signal into a high-dimensional space
• Goal: estimate the intrinsic dimensionality d of a manifold M embedded in
D-dimensional space (d < D) given a set of samples from M
Intrinsic dimensionality estimation: The basics
• Key idea (regularity assumption): For data distributed on a d-dimensional manifold, the probability of a small ball of radius ε around any point on the manifold is proportional to ε^d
• Regression estimate: Plot log N(ε) vs. log ε and compute the slope of the linear part of the curve (a code sketch follows below)
• Previous work:
– Bennett (1969), Grassberger & Procaccia (1983), Camastra & Vinciarelli (2002),
Brand (2003), Kegl (2003), Costa & Hero (2004), Levina & Bickel (2005)
• Disadvantages:
– Estimates are local (defined for neighborhood of each point)
– Nearest-neighbor computations needed
– Negative bias in high dimensions
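A minimal sketch of the regression estimate described above, assuming N(ε) denotes the average number of neighbors within radius ε; the radii, sample size, and the noisy-circle test case are illustrative choices, and NumPy/scikit-learn are assumed available:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def regression_dim_estimate(X, radii):
    """Estimate intrinsic dimension from the slope of log N(eps) vs. log eps,
    where N(eps) is the average number of neighbors within radius eps."""
    nn = NearestNeighbors().fit(X)
    avg_counts = []
    for eps in radii:
        # neighbor lists include the query point itself, hence the "- 1"
        neighbors = nn.radius_neighbors(X, radius=eps, return_distance=False)
        avg_counts.append(max(np.mean([len(ind) - 1 for ind in neighbors]), 1e-12))
    # slope of the log-log curve (fit over the whole range here for simplicity)
    slope, _ = np.polyfit(np.log(radii), np.log(avg_counts), 1)
    return slope

# Example: a noisy circle (intrinsic dimension 1) embedded in 3-D
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)] + 0.01 * rng.standard_normal((2000, 3))
print(regression_dim_estimate(X, radii=np.logspace(-1.3, -0.3, 8)))  # close to 1
```

Note the per-point, per-radius neighbor queries; this is the kind of nearest-neighbor computation that the quantization-based estimator below reduces.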
Quantization approach
• High-rate approximation: when data lying on a d-dimensional manifold are optimally vector-quantized with a large number k of codevectors, the r.m.s. quantizer error scales as ε(k) ≈ c·k^(–1/d), where the constant c depends on the data distribution and on d
• Quantization dimension: d_Q = lim_{k→∞} log k / (–log ε(k))
(Zador, 1982; Graf & Luschgy, 2000)
• Under the high-rate approximation, log k / (–log ε(k)) → d, so the quantization dimension recovers the intrinsic dimension
• VQ literature: assume d is known and study the asymptotics of ε(k)
• Our work: observe the empirical VQ error and estimate d
Dimensionality estimation by quantization
• Estimation algorithm:
– Quantize the data with a range of codebook sizes k and track the error ε(k)
– Plot log k vs. –log ε(k) and find the slope of the linear part of the curve
• In theory:
– The distribution µ of the data on the manifold is known
– ε(k) is the error of the optimal k-point quantizer for µ
– The dimension estimate is obtained in the limit k → ∞
• In practice:
– Approximate µ by the empirical distribution of the data
– Learn a good quantizer on a training sequence and track ε(k) on an independent test sequence
– Estimation breaks down as k approaches the size of the training set
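A minimal sketch of this estimation procedure, assuming plain k-means (scikit-learn) as the quantizer and a fixed range of codebook sizes; for simplicity the slope is fit over the whole range of k rather than only the linear part of the curve:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantization_dim_estimate(train, test, ks):
    """Fit k-point quantizers on training data, track the r.m.s. error eps(k) on an
    independent test set, and return the slope of log k vs. -log eps(k)."""
    log_k, neg_log_eps = [], []
    for k in ks:
        km = KMeans(n_clusters=k, n_init=5, random_state=0).fit(train)
        # r.m.s. quantization error on the held-out data
        d2 = ((test - km.cluster_centers_[km.predict(test)]) ** 2).sum(axis=1)
        log_k.append(np.log(k))
        neg_log_eps.append(-np.log(np.sqrt(d2.mean())))
    slope, _ = np.polyfit(neg_log_eps, log_k, 1)  # slope of log k vs. -log eps(k)
    return slope

# Example: a 2-D plane embedded in 10-D (intrinsic dimension 2)
rng = np.random.default_rng(1)
X = rng.uniform(size=(20000, 2)) @ rng.standard_normal((2, 10))
train, test = X[:15000], X[15000:]
print(quantization_dim_estimate(train, test, ks=[8, 16, 32, 64, 128, 256]))  # close to 2
```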
Example: Swiss roll
(figure: 20,000 samples; error curve with parametric fit)
Example: Toroidal spiral
(figure: 20,000 samples)
Real data
(figure)
Visualization of digit estimates
(figure)
Dimensionality separation
• Myth: high-dimensional data is homogeneous
• Reality: high-dimensional data is heterogeneous
(illustration © Maxim Raginsky)
Quantization at multiple levels
(figures: codebooks for k = 1 through 5)
• To find the quantizer for the next level:
– Initialize by splitting clusters from the previous level
– Refine by running k-means with the new centers
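A minimal sketch of this level-by-level construction, assuming the cluster chosen for splitting is the one contributing the most squared error and that the split simply adds a slightly perturbed copy of its center; these are illustrative choices, not necessarily the exact recipe used in the talk:

```python
import numpy as np
from sklearn.cluster import KMeans

def grow_codebook(X, k_max, seed=0):
    """Grow codebooks for k = 1, ..., k_max by splitting and refining with k-means.
    Returns the final centers and the r.m.s. error recorded at every level."""
    rng = np.random.default_rng(seed)
    centers = X.mean(axis=0, keepdims=True)                       # level k = 1
    errors = {1: np.sqrt(((X - centers[0]) ** 2).sum(axis=1).mean())}
    for k in range(2, k_max + 1):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # split the cluster with the largest squared-error contribution
        sse = [((X[labels == j] - centers[j]) ** 2).sum() for j in range(len(centers))]
        j = int(np.argmax(sse))
        jitter = 1e-3 * X.std() * rng.standard_normal(X.shape[1])
        centers = np.vstack([centers, centers[j] + jitter])
        # refine all centers with k-means, warm-started at the split configuration
        km = KMeans(n_clusters=k, init=centers, n_init=1).fit(X)
        centers = km.cluster_centers_
        errors[k] = np.sqrt(km.inertia_ / len(X))                 # r.m.s. error at level k
    return centers, errors
```

The resulting error curve can be fed directly to the slope fit sketched earlier to obtain a dimension estimate.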
Tree-structured VQ
(figures: recursive splits for k = 1 through 5)
• Form the tree by recursively splitting clusters
– Greedy strategy: choose splits that cause the greatest reduction in error
– Tradeoff: lose optimality, gain efficiency and a recursive partitioning scheme
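A minimal sketch of the greedy construction, assuming binary splits via 2-means and a minimum leaf size as the stopping rule (the experiments below use branching factor b = 3; b = 2 is used here for brevity, and candidate splits are recomputed each round rather than cached):

```python
import numpy as np
from sklearn.cluster import KMeans

def greedy_tsvq(X, n_leaves, min_leaf=50):
    """Greedily grow a tree-structured codebook; each leaf is (indices, center, SSE)."""
    idx0 = np.arange(len(X))
    leaves = [(idx0, X.mean(axis=0), ((X - X.mean(axis=0)) ** 2).sum())]
    while len(leaves) < n_leaves:
        best = None
        for li, (idx, center, sse) in enumerate(leaves):
            if len(idx) < min_leaf:
                continue
            km = KMeans(n_clusters=2, n_init=3, random_state=0).fit(X[idx])
            gain = sse - km.inertia_                # reduction in error from this split
            if best is None or gain > best[0]:
                best = (gain, li, km)
        if best is None:                            # no leaf is large enough to split
            break
        _, li, km = best
        idx, _, _ = leaves.pop(li)
        for c in range(2):                          # replace the leaf by its two children
            child = idx[km.labels_ == c]
            ctr = X[child].mean(axis=0)
            leaves.append((child, ctr, ((X[child] - ctr) ** 2).sum()))
    return leaves
```

Each leaf keeps its own sample indices, so per-subset error curves and dimension estimates can be computed later, as in the dimensionality-separation slides that follow.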
Dimensionality separation
• Each node in the tree-structured codebook represents a
subset of the data
• We can compute a dimension estimate for that subset using
the subtree rooted at that node (provided the subtree is large
enough)
• Use test error to find out whether the estimate for a given
subtree is reliable
Choosing the branching factor
(figure: training and test error curves on the MNIST 1's dataset, 7877 samples)
• A larger branching factor b yields a quantizer closer to optimal, but fewer points on the rate curve (the increment of k is b – 1)
• Increasing b from 2 to 4 decreases the error only slightly, but b = 2 is prone to artifacts, so we choose b = 3
Comparison of estimates
(figure: k-means estimates vs. TSVQ estimates)
• Correlation coefficient for training estimates: 0.994; for test estimates: 0.988
Comparison of running times
(figure: running times on random subsets of the MNIST dataset; each data point is the time to build a complete error curve)
• Platform: Pentium IV desktop, 4.3 GHz processor, 3 GB of RAM
Dimensionality separation experiments
• MNIST digits
– 70,000 samples
– Extrinsic dimensionality: 784 (28x28)
• Grayscale image patches
– 750,000 samples (from the 15 scene categories dataset)
– Extrinsic dimensionality: 256 (16x16)
Digits: codebook tree
(figure)
Patches: codebook tree
(figure)
Patches: dimensionality separation
(figure: 305 leaves; lowest-dimensional and highest-dimensional subsets)
Patches: subtree visualization
(figure: low-dimensional subtrees and a high-dimensional subtree)
Future experiments
• Tiny images (Fergus & Torralba, 2007)
– Extrinsic dimension: 3072 (32x32x3)
– Millions of samples collected from the Web
• Human motion data
– Extrinsic dimension: 150+ per frame
– Hundreds of thousands of frames recorded?
Summary: Intrinsic Dimensionality Estimation
• Our method relates the intrinsic dimension of a manifold (a
topological property) to the asymptotic optimal quantization
error for distributions on the manifold (an operational
property)
• Advantage over regression methods: fewer nearest-neighbor computations
• By using TSVQ, framework can naturally be extended to
dimensionality separation of heterogeneous data
2. Learning quantizer codebooks by minimizing information loss
• Problem formulation
– X (continuous feature vector) → quantization → K (codeword index) → prediction → Y (class label)
– Given: a training set of feature vectors X with class labels Y
– Goal: learn a codebook such that the codeword index K assigned to X preserves as much information as possible about its class label Y
– Requirement: the encoding of X to K should not depend on Y
• Information-theoretic approach
– K should approximate a sufficient statistic of X for Y: I(K;Y) = I(X;Y)
– Practical strategy: minimize the information loss L = I(X;Y) – I(K;Y)
Lazebnik & Raginsky (AISTATS 2007)
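The strategy above connects to the next slides through a standard identity: because K is a deterministic function of X, the information loss equals an expected Kullback-Leibler divergence between posteriors. Stated for reference, in the notation of these slides:

```latex
L \;=\; I(X;Y) - I(K;Y)
  \;=\; \mathbb{E}_{X}\Big[\, D\big( P(Y \mid X) \,\big\|\, P(Y \mid K) \big) \Big],
\qquad K = K(X).
```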
Information loss minimization
Dhillon et al. (2003), Banerjee et al. (2005)
• Representing a feature vector Xi by a quantizer index k = 1, …, C replaces Pi = P(Y|Xi) by πk = P(Y|k), the posterior class distribution associated with k; both Pi and πk are points in the probability simplex P(Y)
• Goal: find a class distribution codebook П = {π1, …, πC} that minimizes the total Kullback-Leibler divergence Σi D(Pi || πk(i)), where k(i) is the index assigned to Xi
• Iterative minimization directly in the probability simplex is possible, but:
– P(Y|X) is needed for encoding, and there is no spatial coherence in the feature space
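A minimal sketch of this simplex-only baseline, i.e., iterative Kullback-Leibler clustering of the posteriors Pi, assuming NumPy; the initialization and iteration count are illustrative. It also makes the drawback above concrete: the learned codebook lives purely in the probability simplex, so encoding a new point still requires knowing P(Y|X):

```python
import numpy as np

def kl_simplex_clustering(P, C, n_iter=50, eps=1e-12, seed=0):
    """P: (N, |Y|) array whose rows are the posteriors P(Y|Xi).
    Returns a codebook of C class distributions and the assignment of each row."""
    rng = np.random.default_rng(seed)
    pi = P[rng.choice(len(P), C, replace=False)]        # initialize codebook from the data
    for _ in range(n_iter):
        # assignment step: nearest codeword under D(P_i || pi_k)
        div = (P[:, None, :] * (np.log(P[:, None, :] + eps) - np.log(pi[None] + eps))).sum(-1)
        k = div.argmin(axis=1)
        # update step: the minimizer of the summed KL divergence is the cluster mean
        for c in range(C):
            if np.any(k == c):
                pi[c] = P[k == c].mean(axis=0)
    return pi, k
```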
Our approach
• Simultaneous quantization of the feature space and the probability simplex P(Y)
• Encode data by the nearest-neighbor rule w.r.t. a codebook M = {m1, …, mC} in the feature space
• New objective function: the Kullback-Leibler divergence D(Pi || πk(i)), summed over the training set, where k(i) is now the index of the feature-space codevector mk nearest to Xi
• Optimizing over Voronoi partitions is still a difficult combinatorial problem
• We obtain a continuous optimization problem by using soft clustering
– Soft cluster assignment: qk(Xi) ∝ exp(–β ||Xi – mk||²), where β is a softness parameter
– Differentiable objective function: the same KL cost with the hard assignments replaced by the soft weights qk(Xi)
Alternating Minimization
• Fix П and update M by gradient descent, with learning rate α found by line search
• Fix M and update П in closed form
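A minimal sketch of the alternating scheme, under hedged reconstructions of the formulas that do not survive in this text version: soft assignments qk(xi) proportional to exp(–β ||xi – mk||²) and the soft objective Σi Σk qk(xi)·D(Pi || πk); a fixed step size α replaces the line search. This is a sketch of the idea, not the paper's exact implementation:

```python
import numpy as np

def info_loss_codebook(X, P, C, beta=1.0, alpha=1e-3, n_iter=100, eps=1e-12, seed=0):
    """X: (N, D) features; P: (N, |Y|) posteriors P(Y|Xi); C: codebook size.
    Returns the feature-space codebook M and the class-distribution codebook Pi."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), C, replace=False)].astype(float)   # feature-space codebook
    Pi = np.full((C, P.shape[1]), 1.0 / P.shape[1])             # class-distribution codebook
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - M[None]) ** 2).sum(-1)           # (N, C) squared distances
        A = -beta * d2
        Q = np.exp(A - A.max(axis=1, keepdims=True))
        Q /= Q.sum(axis=1, keepdims=True)                       # soft assignments q_k(x_i)
        KL = (P[:, None, :] * (np.log(P[:, None, :] + eps)
                               - np.log(Pi[None] + eps))).sum(-1)  # D(P_i || pi_k), (N, C)
        # gradient step on M with Pi fixed (alpha may need tuning for your data scale)
        W = Q * (KL - (Q * KL).sum(axis=1, keepdims=True))
        grad = 2 * beta * (W.T @ X - W.sum(axis=0)[:, None] * M)
        M -= alpha * grad
        # closed-form update of Pi with M fixed: q-weighted mean of the posteriors
        Pi = (Q.T @ P + eps) / (Q.sum(axis=0)[:, None] + eps * P.shape[1])
    return M, Pi
```

After training, a new feature vector is encoded by the plain nearest-neighbor rule against M, and its predicted class posterior is the πk of its codeword; no class information is needed at encoding time, as required.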
Example 1
(figures: data, k-means, our method (info-loss))
Example 2
(figures: data, k-means, our method (info-loss))
Classifying real image patches
• Texture: 11 classes, 5500 samples, 40 dimensions
• Satimage: 6 classes, 6435 samples, 36 dimensions
• USPS digits: 10 classes, 9298 samples, 256 dimensions
Datasets from the UCI Machine Learning Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
Application 1: Building better codebooks for bag-of-features image classification
• Dataset: 15 scene categories
Fei-Fei & Perona (2005), Oliva & Torralba (2001), Lazebnik et al. (2006)
• Feature extraction: sample 16x16 patches on a regular grid and compute 128-dimensional SIFT descriptors
Lowe (1999, 2004)
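For context, a minimal sketch of how a codebook (whether from k-means or from info-loss minimization) is used to build the bag-of-features representation: each descriptor is assigned to its nearest codeword and the image is summarized by a normalized histogram of codeword counts (NumPy; the SIFT descriptors are assumed to be given):

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """descriptors: (n, 128) array for one image; codebook: (C, 128) array of codewords.
    Returns the normalized histogram of codeword occurrences used as the image feature."""
    d2 = ((descriptors[:, None, :] - codebook[None]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                                   # nearest-neighbor encoding
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```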
Classification results
(figure: classification rates for k-means vs. info-loss codebooks; for comparison, Fei-Fei & Perona (2005): 65.2%)
Visual codebooks (C = 32)
(figure: codebook visualizations for k-means and info-loss)
Application 2: Image Segmentation
• Let's take another look at our objective function, reinterpreting its ingredients:
– mk: centers of C image regions
– πk: appearance distributions for the regions (the appearance "centroid" of region k)
– qk(Xi): assignment of pixel i to region k
– Pi: distribution of appearance attributes at pixel i
• Related work: Heiler & Schnorr (2005)
– Variational curve evolution framework
– Limited to two regions
Segmentation example 1 (figure)
Segmentation example 2 (figure)
More segmentation examples (figures)
Summary: Learning quantizer codebooks by minimizing
information loss
• Quantization is a key operation for forming visual
representations
• Information-theoretic formulation is discriminative,
yet independent of choice of classifier
• Diverse applications: construction of visual codebooks,
image segmentation
Summary of talk
• Intrinsic dimensionality estimation
– Theoretical notion of quantization dimension leads to a practical estimation procedure
– TSVQ for partitioning of heterogeneous datasets
• Information loss minimization
– Simultaneous quantization of feature space and posterior class distributions
– Applications to visual codebook design, image segmentation
• Future work: more fun with quantizers
– VQ techniques for density estimation in the space of image patches
– Applications to modeling of saliency and visual codebook construction