A Framework of Extracting Multi-scale Features Using Multiple Convolutional Neural Networks
Kuan-Chuan Peng
Tsuhan Chen
Introduction
• Breakthrough progress in object classification.
[Figure: example images labeled cat, dog, tiger, and lion.]
O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
Introduction
• Humans are interested in more than objects.
• For example, aesthetic quality.
N. Murray et al. AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR12.
How do machines describe images?
• Examples from a state-of-the-art algorithm:
“man in black shirt is playing guitar.”
“woman is holding bunch of bananas.”
A. Karpathy and F.-F. Li. Deep visual-semantic alignments for generating image descriptions. CVPR15.
http://cs.stanford.edu/people/karpathy/deepimagesent/
How do experts describe images?
• Examples by Pulitzer Prize winners:
“At bath times, Danielle appears serene. But no one knows what lies beyond those eyes.” (by Lane DeGregory)
http://www.pulitzer.org/archives/8417
“The surgery has dragged on for hours with little progress, and Mulliken, taking a breather next to an array of Sam's CAT scans, is feeling the frustration and exhaustion.” (by Tom Hallman Jr.)
http://www.pulitzer.org/archives/6451
How do experts describe images?
• Images convey more than objects.
Beyond Objects
• Abstract attributes matter.
– Attributes relating to or involving general ideas or
qualities rather than specific people, objects, or
actions. [Merriam-Webster dictionary]
• Bridge the gap between machines and humans:
– Teach machines to solve abstract tasks (tasks
involving abstract attributes).
http://www.merriam-webster.com/dictionary/abstract
Goal
• A general framework to achieve better
performance in abstract tasks.
– Multi-scale features by using convolutional neural
networks (CNN).
Why CNN?
speech recognition
L. Deng et al. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. ICASSP13.
object classification
O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
video classification
A. Karpathy et al. Large-scale video classification with convolutional neural networks. CVPR14.
Existing Abstract Tasks
• More and more abstract tasks are proposed.
• Artistic Style & Artist Style Classification [F. S. Khan et al. MVA14.]
• Architectural Style Classification [Z. Xu et al. ECCV14.]
• Emotion Classification [J. Machajdik et al. ACMMM10.]
– Labels shown: amusement, anger, awe, contentment, disgust, excitement, fear, sad.
• Aesthetic Classification [N. Murray et al. CVPR12.]
– Labels shown: high aesthetic quality, low aesthetic quality.
• Fashion Style Classification [M. H. Kiapour et al. ECCV14.]
– Labels shown: Bohemian, Hipster.
• Memorability Prediction [P. Isola et al. CVPR11.]
• Interestingness Prediction [M. Gygli et al. ICCV13.]
Inspiration
• It is tricky to describe abstract attributes as
objects.
– Not easy to “locate” abstract attributes.
• What if abstract attributes prevail everywhere?
– Label-inheritable (LI) property.
[Figure: an example image labeled “contentment” next to a “?” (J. Machajdik et al. ACMMM10).]
Label-Inheritable (LI) Property
Dataset           Task                                 Label                  Label-inheritable
Painting-91 [1]   Artist style classification          Picasso                Yes
arcDataset [2]    Architectural style classification   Baroque Architecture   Partial
Caltech-101 [3]   Object classification                Faces                  Mostly No
[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
[3] F.-F. Li et al. Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. CVPRW04.
Multi-Scale CNN
• Assume LI property holds for each image and
the associated label.
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
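One way to picture the LI assumption is that every patch cut from an image can be trained on with the label of the whole image. The sketch below illustrates that idea only. The grid layout, the 512 x 512 input size, and the function name `label_inheriting_patches` are assumptions made for illustration; the slides do not state how patches are drawn at each scale.

```python
import numpy as np

def label_inheriting_patches(image, label, scale):
    """Split `image` into a scale x scale grid; every patch inherits `label`.

    The grid layout is an illustrative assumption, not taken from the slides.
    """
    height, width = image.shape[:2]
    patch_h, patch_w = height // scale, width // scale
    patches = []
    for row in range(scale):
        for col in range(scale):
            patch = image[row * patch_h:(row + 1) * patch_h,
                          col * patch_w:(col + 1) * patch_w]
            # Under the LI assumption the patch keeps the image-level label.
            patches.append((patch, label))
    return patches

# Hypothetical 512 x 512 RGB painting labeled "Picasso".
image = np.zeros((512, 512, 3), dtype=np.uint8)
scale_1 = label_inheriting_patches(image, "Picasso", scale=1)
scale_2 = label_inheriting_patches(image, "Picasso", scale=2)
print(len(scale_1), len(scale_2))  # 1 4
```

Under this reading, each scale yields its own training set, which could then be given to its own CNN.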
AlexNet
• The number of nodes in the output layer is changed
to the number of classes in each task.
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
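A minimal NumPy sketch of this change follows. It models only the final fully connected layer: AlexNet's last hidden layer has 4096 units and its original output layer has 1000 ImageNet classes. The helper name `replace_output_layer`, the Gaussian initialization, and its standard deviation are assumptions for illustration, and 25 is the larger class count quoted for the architectural style task.

```python
import numpy as np

FC7_DIM = 4096           # width of AlexNet's last hidden layer
IMAGENET_CLASSES = 1000  # size of the original output layer

def replace_output_layer(num_classes, rng, in_dim=FC7_DIM, std=0.01):
    """Return freshly initialized (weights, bias) for a task-specific output layer.

    The initialization scheme is an assumption, not taken from the slides.
    """
    weights = rng.normal(0.0, std, size=(in_dim, num_classes))
    bias = np.zeros(num_classes)
    return weights, bias

rng = np.random.default_rng(0)
weights, bias = replace_output_layer(num_classes=25, rng=rng)

# Two hypothetical 4096-dimensional feature vectors from the layer below.
features = rng.normal(size=(2, FC7_DIM))
logits = features @ weights + bias
print(logits.shape)  # (2, 25)
```

The rest of the network is untouched in this sketch; only the output dimension follows the task.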
Experimental Results
classification accuracy (%)
Method \ Task                 Artist style     Artistic style   Caltech-101 object    Architectural style
                              classification   classification   classification        classification
                                                                (15 / 30 training     (10 / 25 classes)
                                                                examples per class)
Previous work (baseline)      53.10 [1]        62.20 [1]        83.80 / 86.50 [2]     69.17 / 46.21 [3]
Single-scale CNN (baseline)   55.15            67.37            83.45 / 88.19         70.64 / 54.84
2-scale CNN (ours)            58.11            69.67            80.19 / 87.58         74.82 / 58.89
3-scale CNN (ours)            57.91            70.96            N/A                   75.32 / 59.13
Label-inheritable             Yes              Yes              Mostly No             Partial
[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
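To make the comparison with the single-scale baseline explicit, the short script below subtracts the single-scale accuracies from the 2-scale accuracies. All numbers are copied from the table above; the dictionary layout and the function name `accuracy_gains` are illustrative choices.

```python
# Accuracies (%) copied from the results table. Two-element lists hold the two
# settings of a task (15 / 30 training examples, or 10 / 25 classes).
single_scale = {
    "artist style": [55.15],
    "artistic style": [67.37],
    "Caltech-101 object": [83.45, 88.19],
    "architectural style": [70.64, 54.84],
}
two_scale = {
    "artist style": [58.11],
    "artistic style": [69.67],
    "Caltech-101 object": [80.19, 87.58],
    "architectural style": [74.82, 58.89],
}

def accuracy_gains(multi, baseline):
    """Per-task accuracy difference (multi-scale minus baseline), in points."""
    return {
        task: [round(m - b, 2) for m, b in zip(multi[task], baseline[task])]
        for task in baseline
    }

gains = accuracy_gains(two_scale, single_scale)
for task, values in gains.items():
    print(task, " / ".join(f"{v:+.2f}" for v in values))
```

The gains are positive for the three style tasks and negative for Caltech-101, the one task marked "Mostly No" in the label-inheritable row.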
Is it because of more training data?
• What if we train one CNN with images in
different scales?
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
Additional Results
classification accuracy (%)
Method \ Task                 Artist style     Artistic style   Caltech-101 object    Architectural style
                              classification   classification   classification        classification
                                                                (15 / 30 training     (10 / 25 classes)
                                                                examples per class)
Previous work (baseline)      53.10 [1]        62.20 [1]        83.80 / 86.50 [2]     69.17 / 46.21 [3]
Single-scale CNN (baseline)   55.15            67.37            83.45 / 88.19         70.64 / 54.84
2-scale CNN (ours)            58.11            69.67            80.19 / 87.58         74.82 / 58.89
1 CNN + 2-scale images        46.86            61.95            N/A                   67.93 / 49.06
Label-inheritable             Yes              Yes              Mostly No             Partial
[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
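The control row can be checked directly against the other two CNN rows. The sketch below uses only numbers from the table above and skips Caltech-101, where the control is N/A. The tuple layout and the function name `pooled_is_worst` are illustrative.

```python
# (task, single-scale CNN, 2-scale CNN, 1 CNN + 2-scale images), accuracy in %.
rows = [
    ("artist style", 55.15, 58.11, 46.86),
    ("artistic style", 67.37, 69.67, 61.95),
    ("architectural style, 10 classes", 70.64, 74.82, 67.93),
    ("architectural style, 25 classes", 54.84, 58.89, 49.06),
]

def pooled_is_worst(single_scale, two_scale_cnn, pooled):
    """True if one CNN trained on pooled 2-scale images scores below both others."""
    return pooled < single_scale and pooled < two_scale_cnn

results = {task: pooled_is_worst(s, m, p) for task, s, m, p in rows}
for task, s, m, p in rows:
    print(f"{task}: pooled minus single-scale = {p - s:+.2f}")
```

In every reported setting the single CNN trained on images from both scales scores below the single-scale baseline, which suggests that the gain of the 2-scale CNN does not come from the extra training data alone.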
Conclusion
• We proposed Multi-Scale Convolutional
Neural Networks (MSCNN) based on the Label-Inheritable (LI) property.
– Multi-scale features.
• MSCNN can outperform the state of the art
on datasets where the LI property
holds or even partially holds.
Towards Solving Abstract Tasks
• More CNN features to achieve better
performance in abstract tasks.
– Multi-scale features (ICME15).
– Multi-depth features (ICIP15).
– Multi-task features (submitted to ICCV15).
K.-C. Peng and T. Chen. A Framework of extracting multi-scale features using multiple
convolutional neural networks. ICME15.
K.-C. Peng and T. Chen. Cross-layer features in convolutional neural networks for generic
classification tasks. ICIP15.
K.-C. Peng and T. Chen. Toward correlating and solving abstract tasks using convolutional neural
networks. Submitted to ICCV15.
Q&A