Robust Classification of Objects, Faces, and Flowers Using Natural Image Statistics
Presenter: 王崇秀
Outline
- Authors
- Abstract
- Background
- Framework and Implementation
- Experiments and Results
- Conclusions
2015/4/13
Authors
- Christopher Kanan:
  - Ph.D. student at the University of California, San Diego (UCSD); intends to graduate in 2012.
  - Research interests: fuses findings and methods from computer vision, machine learning, psychology, and computational neuroscience.
  - Homepage: http://cseweb.ucsd.edu/~ckanan/index.html
  - Email: ckanan@cs.ucsd.edu
Authors
- Garrison Cottrell:
  - Professor in the Computer Science & Engineering Department at UCSD.
  - Research: His research is strongly interdisciplinary. It concerns using neural networks as a computational model applied to problems in cognitive science, artificial intelligence, engineering, and biology. He has had success in using them for such disparate tasks as modeling how children acquire words, studying how lobsters chew, and nonlinear data compression.
Outline
- Authors
- Abstract
- Background
- Framework and Implementation
- Experiments and Results
- Conclusions
Abstract
Classification of images in many-category datasets has rapidly improved in recent years. However, systems that perform well on particular datasets typically have one or more limitations, such as a failure to generalize across visual tasks (e.g., requiring a face detector or extensive retuning of parameters), insufficient translation invariance, an inability to cope with partial views and occlusion, or significant performance degradation as the number of classes is increased.
Here we attempt to overcome these challenges using a model that combines sequential visual attention using fixations with sparse coding. The model's biologically inspired filters are acquired using unsupervised learning applied to natural image patches. Using only a single feature type, our approach achieves 78.5% accuracy on Caltech-101 and 75.2% on the 102 Flowers dataset when trained on 30 instances per class, and it achieves 92.7% accuracy on the AR Face database with 1 training instance per person. The same features and parameters are used across these datasets to illustrate its robust performance.
Outline
- Authors
- Abstract
- Background
- Framework and Implementation
- Experiments and Results
- Conclusions
Background: Using Natural Image Statistics
- Hand-designed features:
  - Haar, DoG, Gabor, HOG, SIFT, and so on.
- Self-taught learning:
  - Applied to unlabeled natural images to learn basis vectors/filters that are good for representing natural images.
  - The training data is generally distinct from the datasets the system will be evaluated on.
  - Self-taught learning works well because it represents natural scenes efficiently while not overfitting to a particular dataset.
- Sparse coding
Background: Visual Attention
- A saliency map is a topologically organized map that indicates interesting regions in an image, based on the spatial organization of the features and an agent's current goal.
- Computational models: there are many computational models; they typically produce maps that assign high saliency to regions with rare features.
Background: Sequential Object Recognition
- Many algorithms for saliency maps have been used to predict the locations of human eye movements, but little work has been done on how they can be used to recognize individual objects.
- There are a few notable exceptions [1, 23, 27, 15], and these approaches share several similarities.
- Framework: extract features -> compute saliency maps based on the features -> extract small windows representing fixations and classify these fixations to guide subsequent fixations -> combine information across fixations.
- NIMBLE framework
Outline
- Authors
- Abstract
- Background
- Framework and Implementation
- Experiments and Results
- Conclusions
Framework and Implementation
- High-level description of the model:
  - Pre-process the image to cope with luminance variation.
  - Extract sparse ICA features from the image.
  - Use the sparse ICA features to compute a saliency map, which is treated as a probability distribution; locations are randomly sampled from the map.
  - Extract fixations from the feature maps at the sampled locations, followed by probabilistic classification.
Framework and Implementation
- Image pre-processing:
  - Resize so the smallest dimension is 128, with the other dimension scaled accordingly to maintain the aspect ratio.
  - Grayscale images are converted to color.
  - RGB → LMS: LMS is a color space represented by the responses of the three types of cones of the human eye, named for their responsivity (sensitivity) at long, medium, and short wavelengths.
  - Normalization to [0, 1]: a pointwise nonlinearity is applied, where ε = 0.05 and r_linear(z) ∈ [0, 1] is a pixel of the image in LMS color space at location z. Note that r_nonlinear(z) ∈ [0, 1] as well.
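The pre-processing steps above can be sketched as follows. Two parts are assumptions, not from the slides: the RGB → LMS matrix (taken from Reinhard et al.'s color-transfer work) and the saturating r/(r + ε) form of the normalization; the slide specifies only ε = 0.05 and that both r_linear and r_nonlinear lie in [0, 1].

```python
import numpy as np

# RGB -> LMS matrix from Reinhard et al. (an assumption; the slide does
# not state which transform the paper uses).
RGB_TO_LMS = np.array([[0.3811, 0.5783, 0.0402],
                       [0.1967, 0.7244, 0.0782],
                       [0.0241, 0.1288, 0.8444]])

def resized_shape(h, w, smallest=128):
    """Target (height, width) so the smallest dimension becomes `smallest`
    while the aspect ratio is preserved."""
    scale = smallest / min(h, w)
    return round(h * scale), round(w * scale)

def preprocess(img, eps=0.05):
    """img: H x W x 3 RGB array in [0, 1], or H x W grayscale."""
    if img.ndim == 2:                 # grayscale -> color by channel replication
        img = np.stack([img] * 3, axis=-1)
    lms = img @ RGB_TO_LMS.T          # per-pixel RGB -> LMS
    # Assumed saturating nonlinearity r/(r + eps): maps [0, 1] into [0, 1).
    return lms / (lms + eps)
```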
Framework and Implementation
- Image pre-processing (illustration).
Framework and Implementation
- Feature learning:
  - To learn the ICA filters, we preprocess 584 images from the McGill color image dataset.
  - From each image, 100 b×b×3 patches are extracted at random locations.
  - The channel means (L, M, and S), computed across images, are subtracted from each patch. Each patch is then treated as a 3b^2-dimensional vector.
  - PCA is applied to the patch collection to reduce the dimensionality (the first principal component is discarded; the next d principal components are retained).
  - FastICA is applied → d ICA filters.
  - m×n×3 images → m×n×d filter responses: a sparse representation.
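A minimal numpy-only sketch of this recipe. Random images stand in for the McGill dataset, and b = 8, d = 16 are illustrative sizes (the slide does not fix them); `fastica` here is a bare-bones symmetric FastICA with a tanh nonlinearity, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_patches(images, b=8, per_image=100):
    """Random b x b x 3 patches from each image, flattened to 3*b^2 vectors,
    with the per-channel (L, M, S) mean subtracted."""
    patches = []
    for img in images:
        H, W, _ = img.shape
        for _ in range(per_image):
            y = rng.integers(0, H - b + 1)
            x = rng.integers(0, W - b + 1)
            patches.append(img[y:y + b, x:x + b, :])
    P = np.stack(patches)
    P -= P.mean(axis=(0, 1, 2), keepdims=True)   # subtract per-channel mean
    return P.reshape(len(P), -1)

def pca_reduce(X, d):
    """Whitened projection onto principal components 2..d+1
    (the first principal component is discarded)."""
    C = X.T @ X / len(X)
    evals, evecs = np.linalg.eigh(C)             # ascending eigenvalues
    order = np.argsort(evals)[::-1][1:d + 1]     # drop the top component
    return (X @ evecs[:, order]) / np.sqrt(evals[order])

def fastica(Z, n_iter=200):
    """Minimal symmetric FastICA (tanh nonlinearity) on whitened data Z."""
    k = Z.shape[1]
    W = np.random.default_rng(1).standard_normal((k, k))
    for _ in range(n_iter):
        G = np.tanh(W @ Z.T)                     # k x N projections
        W_new = G @ Z / len(Z) - (1 - G ** 2).mean(axis=1)[:, None] * W
        U, _, Vt = np.linalg.svd(W_new, full_matrices=False)
        W = U @ Vt                               # symmetric decorrelation
    return W                                     # rows = ICA filters (in PCA space)

images = [rng.random((64, 64, 3)) for _ in range(5)]
Z = pca_reduce(extract_patches(images, b=8), d=16)
W = fastica(Z)
```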
Framework and Implementation
- Feature learning: (figure of the learned ICA filters)
Framework and Implementation
- Saliency Maps:
  - Use the SUN model to generate the saliency map.
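SUN defines bottom-up saliency as feature rarity: s(z) ∝ 1/p(f_z), i.e. log s(z) = -log p(f_z). A rough sketch of that idea on the ICA filter responses; note SUN fits generalized Gaussians to filter-response statistics over many natural images, whereas this simplification fits a Laplace distribution per channel to the current image only.

```python
import numpy as np

def sun_like_saliency(responses, eps=1e-8):
    """responses: H x W x d ICA filter responses for one image.
    SUN-style rarity with channels assumed independent:
    log s(z) = -sum_i log p_i(f_i(z)).
    Each channel is modeled with a Laplace fit to this image
    (a simplification of SUN's generalized-Gaussian fits)."""
    H, W, d = responses.shape
    log_p = np.zeros((H, W))
    for i in range(d):
        f = responses[:, :, i]
        mu = np.median(f)
        scale = np.abs(f - mu).mean() + eps      # Laplace scale estimate
        log_p += -np.abs(f - mu) / scale - np.log(2.0 * scale)
    sal = -log_p
    sal -= sal.min()                 # shift so saliency is non-negative
    return sal / (sal.sum() + eps)   # a probability map over locations
```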
Framework and Implementation
- Spatial Pooling:
  - Normalize the saliency map to sum to one; it is then treated as a probability distribution.
  - Sample from it T times; during each fixation t, a location l_t is chosen according to the saliency map.
  - A w×w×d (w = 51) stack of filter responses is extracted around l_t.
  - Reduce the dimensionality of the stack by spatially subsampling it with a spatial pyramid, which divides each w×w filter response map into 1×1, 2×2, and 4×4 grids; the mean filter response in each grid cell is computed, concatenated into a vector, and normalized to unit length.
  - This reduces the dimensionality from w×w×d (51^2·d) to 21d. l_t is normalized by the height and width of the image and stored along with the corresponding features.
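The sampling and pooling steps above can be sketched as follows. The w = 51 window and the 1×1/2×2/4×4 pyramid come from the slide; the rounding of grid-cell boundaries when 51 is not divisible by 4 is a guess.

```python
import numpy as np

def sample_fixations(sal, T, rng):
    """Treat the normalized saliency map as p(l) and draw T locations."""
    p = sal.ravel() / sal.sum()
    idx = rng.choice(p.size, size=T, p=p)
    return np.column_stack(np.unravel_index(idx, sal.shape))

def pyramid_pool(stack):
    """stack: w x w x d filter responses around one fixation.
    Mean-pool over 1x1, 2x2, and 4x4 grids (1 + 4 + 16 = 21 cells),
    concatenate, and normalize to unit length -> a 21d-dim descriptor."""
    w, _, d = stack.shape
    feats = []
    for g in (1, 2, 4):
        edges = np.linspace(0, w, g + 1).astype(int)   # cell boundaries
        for i in range(g):
            for j in range(g):
                cell = stack[edges[i]:edges[i + 1], edges[j]:edges[j + 1], :]
                feats.append(cell.mean(axis=(0, 1)))   # mean response per cell
    v = np.concatenate(feats)
    return v / (np.linalg.norm(v) + 1e-12)             # unit length
```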
Framework and Implementation
- Spatial Pooling:
  - After acquiring T fixations from every training image, PCA is applied to the collected feature vectors.
  - The first 500 principal components are retained and then whitened.
  - Finally, the post-PCA fixation features, denoted w_{k,i}, are each made unit length.
Framework and Implementation
- Training and Classification:
  - Naïve Bayes' assumption, with g_t the vector of features at fixation t:
    P(g_1, …, g_T | C = k) = ∏_{t=1}^{T} P(g_t | C = k)
  - Bayes' rule:
    P(C = k | g_1, …, g_T) ∝ P(C = k) ∏_{t=1}^{T} P(g_t | C = k)
  - P(C = k) is uniform, and we fix T = 100, which would be about 30 s of viewing time for a person, assuming 3 fixations/second.
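Under these equations, classification reduces to summing log-likelihoods over fixations and taking the argmax. A sketch in which P(g_t | C = k) is estimated by a Gaussian kernel density over each class's stored training fixations; this is a stand-in for the paper's exemplar-based density estimator, and the bandwidth is an arbitrary illustrative value.

```python
import numpy as np

def classify(fixations, exemplars, bandwidth=0.1):
    """fixations: T x D features of the test image's fixations.
    exemplars: dict mapping class -> (N_k x D) stored training fixations.
    Returns argmax_k sum_t log P(g_t | C=k), with a uniform prior P(C=k)."""
    scores = {}
    for k, E in exemplars.items():
        # squared distances between each fixation and each exemplar: T x N_k
        d2 = ((fixations[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
        # log kernel-density estimate, dropping class-independent constants
        log_p = np.logaddexp.reduce(-d2 / (2 * bandwidth ** 2), axis=1)
        log_p -= np.log(len(E))
        scores[k] = log_p.sum()      # Naive Bayes: sum over the T fixations
    return max(scores, key=scores.get)
```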
Outline
- Authors
- Abstract
- Background
- Framework and Implementation
- Experiments and Results
- Conclusions
Experiments and Results
- Caltech-101 results (figure)
Experiments and Results
- Caltech-256 results (figure)
Experiments and Results
- AR face database results (figure)
Experiments and Results
- 102 Flowers database results (figure)
Outline
- Authors
- Abstract
- Background
- Framework and Implementation
- Experiments and Results
- Conclusions
Conclusions
- One of the reasons we think our approach works well is that it employs a nonparametric exemplar-based classifier.
- The Naïve Bayes' assumption is obviously false, and learning a more flexible model could lead to performance improvements.