Introduction
Problem: Classifying attributes and actions in still images
Model:




Collection of part templates
Specific scale space locations (human centric)
Discriminative learning
Sparse Activation
Motivation
Train
Test
Train
Test
Overview
Mining Parts
&
Learning Templates
Image
Scoring
Formulation
Dataset:
Model:
fractional multiples of
width and height
Objective:
Model
fractional multiples of
width and height
d = 1000
Part 1
Part 2
Part 3
parts
...
Model
Model & Scoring
Image Scoring
Model
sparse activation
overlap constraint
Optimization: Greedy selection of
0.33 overlap constraint
Model Initialization
1) randomly sample the positive training images for patch positions:
2) Initialize model parts:
perfect case:
3) BoF features
worst case:
normalized 105 patches.
3) Prunning: remove unused parts
Learning
k=4
Experiments
Willow 7 Human actions
27 Human Attributes (HAT)
Stanford 40 Human Actions
Implementation
Features:
– VLFeat - Dense SIFT,
• step size: 4 pixels
• square patches (8 to 40 pixels)
– k-means - vocabulary 1000
– explicit feature map + Bhattacharyya (Hellinger – Square root) kernel
Baseline: 4 level spatial pyramid
Immediate context:
– expand the human bounding boxes by 50% in both width and height
Full image context:
– full image classifier uses 4 level SPM with an exponential kernel
2
Qualitative Results
Willow Actions
Database of Human Attributes (HAT)
Stanford 40 Actions
Learned Parts - I
In each row, the first image is the patch used to initialize
the part and the remaining images are its top scoring patches
Learned Parts - II
In each row, the first image is the patch used to initialize
the part and the remaining images are its top scoring patches
Learned Parts - III
In each row, the first image is the patch used to initialize
the part and the remaining images are its top scoring patches