The Role of Features, Algorithms and Data in Visual Recognition

advertisement
The Role of Features, Algorithms
and Data in Visual Recognition
Reporter: WuBin
Data: 2011.04.29
Outline






Authors
Abstract
Problem
Experiment & Result
Discussion
Conclusion
Outline






Authors
Abstract
Problem
Experiment & Result
Discussion
Conclusion
Authors (Devi Parikh)


Research Assistant Professor at Toyota
Technological Institute (TTI) at Chicago
Research interests




Education




computer vision
machine learning
pattern recognition
Ph.D. : Carnegie Mellon University (2009)
MS.: Carnegie Mellon University (2007)
BS.: Rowan University (2005)
Publications

CVPR’11(2), CVPR’10(1), CVPR’09(1),
CVPR’08(1), ECCV,08(1), ICCV’07(1),
CVPR’07(Best Paper )…
Cornell University (2009)
Microsoft Research (twice)
Intel Research (once)
Authors (C. Lawrence Zitnick)


Microsoft Research, Redmond
Research Interests




Education


Color Image
Single Image
Stereo Matching
PhD student, Robotics Institute Carnegie Mellon University
Publications

CVPR’10(2), ACM Trans. Graph(2), ECCV’10(2), CVPR’09(2),
CVPR’08(2), ECCV,08(2), PAMI’08(1), IJCV’07(1), CVPR’06(1),
ECCV,06(1), ICCV’05(1), PAMI’00(1)…
Outline






Authors
Abstract
Problem
Experiment & Result
Discussion
Conclusion
Abstract


Which factors is critical to humans’ superior performance on visual
(scene and object) recognition
 Learning algorithm
 Amount of training data
 Features representations
In this work we take a small step towards this goal by performing a
series of human studies and machine experiments
 We find no evidence that human pattern matching algorithms are better
than standard machine learning algorithms
 We find that humans don’t leverage increased amounts of training data
 Through statistical analysis on the machine experiments and supporting
human studies, we find that the main factor impacting accuracies is the
choice of features
摘要


在视觉识别(场景以及物体)领域存在很多基于计算机视觉的相
关算法。为了取得更好的识别效果,有的系统着眼于复杂的学习
算法,一些则利用大量的训练数据,还有一些考虑对更有效的特
征进行建模。然而遗憾的是,所有这些系统都远远无法达到人类
的识别能力。如果我们了解了人类在视觉识别上的响应方式,那
么就能对上述三种方式的有效性产生更深刻的认识,从而发现究
竟是什么造就了人类优越的识别能力。
本文通过对人类学习和机器学习的一系列实验,朝着这个方向前
进了一小步。我们发现,没有任何证据证明人类的学习算法要优
于标准的机器学习算法。另外,人类也不依赖于增加大量的训练
样本来提高识别能力。在本文实验的基础上,通过统计分析发现,
影响识别精度的最重要的因素在于特征的选择。
Outline






Authors
Abstract
Problem
Experiment & Result
Discussion
Conclusion
Problem
What makes
humans so much
better at these
tasks than today’s
machines?
Outline






Authors
Abstract
Problem
Experiment & Result
Discussion
Conclusion
Experiment

Machine experiments







Classifiers
Datasets
Feature types
Dimensionality
Proportion of noisy features
Number of training instances
Human studies
Machine experiments (1)

Classifiers
 NN: nearest neighbor
 NCM: nearest class-mean
 LSVM: linear SVM
 QSVM: SVM with a quadratic polynomial kernel
 CSVM: SVM with a cubic polynomial kernel
 RBFSVM: SVM with an Radial Basis Function (RBF) kernel
 DT: decision tree
 NET: a multi-layer perceptron neural network with 1 hidden layer and
20 hidden layer nodes
 BOOST: boosting with linear SVM on individual features as the simple
learners
 LDASVM: Principal Component Analysis (PCA) then Linear
Discriminant Analysis (LDA) followed by a linear SVM classifier
Machine experiments (2)

Datasets
 OSR: eight categories (coast, forest, highway, inside-city, mountain,
open-country, street, tall-building) of outdoor scene recognition dataset
 ISR: eight categories (bathroom, bedroom, dining room, gym, kitchen,
living room, movie theater and stairs) from the indoor scene recognition
dataset
 PA1: eight categories (bird, bottle, cat, dog, horse, person, pottedplant,
sheep) from the PASCAL object recognition dataset
 PA2: eight other categories (aeroplane, bicycle, boat, chair, car, dining
table, motorbike, sofa) from the same PASCAL object recognition
dataset
 CAL: six categories (aeroplane, car-rear, face, ketch, motorbike, watch)
from the Caltech-101 object categories dataset
Machine experiments (3)

Feature types





CH: color histogram computed by assigning all pixels in an
image to a pre-computed universal color dictionary computed
using k-means
TH: texture histogram computed over a discretization of multiscale edge orientations in the image
GIST: gist descriptor
BOW: bag-of-words feature descriptor for the CAL dataset
ATT: binary attributes, indicate whether the objects have certain
higher-level attributes such as being round, or furry, or having a
head, etc.
Machine experiments (4)

Dimensionality





CH: {4; 8; 16; 32; 64; 128; 256}
GIST: vary the number of edge-orientations (osi ) at each of the
three scales ([s1; s2; s3]), as well as the number of spatial blocks
(n x n) the image is divided into.
TH: vary the number of orientation bins across the three scales to
obtain descriptors of different dimensionality
BOW: the dimensionality was kept fixed at 200 by using a
dictionary with 200 SIFT codewords
ATT: the dimensionality was also kept fixed at 64 for PA2, while
PA1 used a 32 bit version in addition to the 64 bit one by
dropping the attributes that were almost always set to zero across
the dataset
Machine experiments (5)

Proportion of noisy features



{0%; 25%; 50%; 100%; 200%}
200% indicates that twice the number of original
features are added as noisy features
Gaussian distribution with the same mean and standard
deviation
Machine experiments (6)

Number of training instances

Vary the number of training instances used per category
in the range {2; 4; 8; 16; 32; 64; 100(88 for CAL)}
Human Studies (1)

To prevent the use of prior knowledge about images by the subjects,
we do not display to them any direct image information such as
texture patches or color. Instead, we use abstracted visual patterns as
stimuli.

subjects performed at 34% for Figure 3(a), 47% for (b), 50% for (c)
and 47% for (d)
Human Studies (2)
Result



The Role of Algorithms
The Role of Data
The Role of Features
The Role of Algorithms

Fix the set of input features and training data for each set of
experiments

The learning algorithm used by humans is not superior to state-of-theart techniques on these types of problems
The Role of Data (1)




The machine experiments show consistent improvement as the number of training
instances increase
It is unlikely that machine accuracies will match those of humans when given the
original images
Humans may not be as capable at leveraging large amounts of training data for pattern
matching
Humans are very capable of generalizing from a small number of training examples
The Role of Data (2)



For machine experiments
significantly more training
examples are needed to
achieve similar levels of
accuracy
Linear SVMs and NCM are
less sensitive to noise
Humans are also
susceptible to data noise
The Role of Features (1)




Edge and gradient based features typically out perform color based
features across the various datasets
Humans are also known to be very sensitive to edge or contour
information
We perform human studies on recognition using the same features and
training sets as the machine experiments, and they show similar treads
The feature set is critical for recognition
The Role of Features (2)



Humans are shown natural images from the outdoor scene recognition
dataset under different transformations and asked to select a category
name from a list
Don’t provide subjects with any training data
Two transformations
 Block test



The image is divided into non-overlapping blocks, and the pixels in each
block are randomly shuffled
Maintains the global layout of the scene, but the local statistics are lost
Puzzle-test


The image is divided into non-overlapping blocks, but the blocks are
randomly shuffled in the image while maintaining the pixels’ relative
locations in the block
Local regions of the image are preserved while the global layout is not.
The Role of Features (3)


In both high and low resolution images, human recognition is robust to
a significant loss of local statistics. This indicates that humans rely on
the global layout of the scene for scene recognition
In high resolution images, human recognition rates are also very robust
even when the global layout of the scene is drastically altered, which
indicates that humans can also rely on local regions of images
The Role of Features (4)

Humans do not rely on a fixed set of features.
Depending on the information available to them,
humans can adaptively rely on different sets of
features during testing. This is true even if similar
instances have never been seen before. This
ability to adapt during testing is not seen in
standard machine learning algorithms
Outline






Authors
Abstract
Problem
Experiment & Result
Discussion
Conclusion
Discussion
Discussion



The notion of features goes beyond the choice between
colors, texture, the need for spatial information, etc. It
includes the concepts of incorporating semantic attributes
that are shared across categories
Perhaps what makes the human feature representation so
powerful is that these feature representations are tuned for
high performance at a variety of tasks
It is important to note that in addition to visual features,
humans leverage prior knowledge from several non-visual
higher-level and semantic features about how the world
we live in functions
Outline






Authors
Abstract
Problem
Experiment & Result
Discussion
Conclusion
Conclusion


In this paper we study human responses on visual recognition problems
as posed to machines, to gain insight into which of the three factors is
critical to humans’ superior performance.
 learning algorithm
 amount of training data
 features
We find no evidence that human pattern matching algorithms are better
than standard machine learning algorithms. Moreover, we find that
humans don’t leverage increased amounts of training data. We thus
hypothesize with the aid of ANOVA analysis that features are the main
factor contributing to superior human performance. Future work
involves extensive studies to identify which visual features humans rely
on to aid in the development of novel machine recognition algorithms.
Download