Lubor BMVC Practice Talk - Oxford Brookes University

Latent SVMs for Human Detection with
a Locally Affine Deformation Field
Ľubor Ladický¹, Phil Torr², Andrew Zisserman¹
¹ University of Oxford
² Oxford Brookes University
Object Detection
• Find all objects of interest
• Enclose them tightly in a bounding box
HOG Detector
• Sliding window using learnt HOG template
• Post-processing using non-maxima suppression
Dalal & Triggs CVPR05
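The post-processing step can be illustrated with a minimal sketch of greedy, overlap-based non-maxima suppression. This is a common variant and an assumption here: the slide does not say which suppression scheme is used, and the names `iou`, `non_maxima_suppression` and the 0.5 overlap threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maxima_suppression(boxes: Sequence[Box],
                           scores: Sequence[float],
                           overlap: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap an already kept,
    higher-scoring box by more than `overlap` (assumed threshold).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap for j in keep):
            keep.append(i)
    return keep
```

In a sliding-window detector, `boxes` and `scores` would be the windows whose classifier response exceeds a threshold.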
HOG Detector
Classifier response:
The weights w* and the bias b* are learnt using a linear SVM as:
Dalal & Triggs CVPR05
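As a rough illustration, the response of a linear template and the usual regularised hinge-loss objective of a linear SVM can be sketched as follows. The exact form and constants used on the slide are not shown here, so the regularisation weight `lam` and the function names are assumptions.

```python
from typing import Sequence


def response(w: Sequence[float], b: float, h: Sequence[float]) -> float:
    """Linear classifier response: dot product of the template weights w
    with the HOG feature vector h of a window, plus the bias b."""
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def svm_objective(w: Sequence[float], b: float,
                  feats: Sequence[Sequence[float]],
                  labels: Sequence[int],
                  lam: float = 0.01) -> float:
    """Standard linear SVM objective (assumed form):
    lam * ||w||^2 + sum_k max(0, 1 - z_k * response(x_k)),
    with labels z_k in {+1, -1}."""
    reg = lam * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - z * response(w, b, h))
                for h, z in zip(feats, labels))
    return reg + hinge
```

The learnt (w*, b*) would be the minimiser of `svm_objective` over the training windows.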
HOG Detector
Does not fit well!
Deformable Part-based Model
• Allows parts to move relative to the centre
• Effectively allows the template to deform
• Multiple models based on an aspect ratio
Felzenszwalb et al. CVPR08
Our approach
Cells c displaced by the deformation field d:
Classifier response:
Regularisation takes the form of a pairwise MRF:
The weights w* and the bias b* are learnt using a latent linear SVM as:
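To make the idea concrete, here is a minimal sketch of a template response in which every cell is displaced by its own deformation vector and neighbouring displacements are tied together by a pairwise term. The quadratic penalty, its weight `smooth`, the dictionary-based feature map and all names are assumptions for illustration; the slides do not give the exact form of the regulariser.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def deformed_response(w: Dict[Cell, List[float]],
                      b: float,
                      feat: Dict[Cell, List[float]],
                      d: Dict[Cell, Cell],
                      origin: Cell = (0, 0),
                      smooth: float = 0.1) -> float:
    """Sum over template cells c of w_c . h(origin + c + d_c), plus bias,
    minus an assumed quadratic pairwise penalty on neighbouring cells."""
    data = 0.0
    for (i, j), wc in w.items():
        di, dj = d[(i, j)]
        data += dot(wc, feat[(origin[0] + i + di, origin[1] + j + dj)])
    penalty = 0.0
    for (i, j) in w:
        for nb in ((i + 1, j), (i, j + 1)):  # 4-connected grid, each edge once
            if nb in w:
                penalty += smooth * ((d[(i, j)][0] - d[nb][0]) ** 2 +
                                     (d[(i, j)][1] - d[nb][1]) ** 2)
    return data + b - penalty
```

With all displacements set to (0, 0) this reduces to the rigid HOG template response.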
Comparison with our approach
• HOG template (no deformation)
• Part-based model (rigid movable parts)
• Our model (deformation field)
Our approach
Why hasn’t anyone tried it before?
• Latent models with many latent variables tend to overfit
• Inference not feasible for a sliding window
• Deformation fields have previously been used only for:
  • classification (Duchenne et al. ICCV11)
  • rescoring of detections (Ladický, PhD thesis)
Our approach
We restrict the deformation field to be locally affine:
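One reading of "locally affine" that is consistent with the later optimisation slides (the first row and first column determine every cell) is that each 2x2 block of displaced cells forms a parallelogram, i.e. d(i,j) + d(i+1,j+1) = d(i+1,j) + d(i,j+1). The check below is a sketch under that assumption; the function name and the row-major list layout are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[int, int]


def is_locally_affine(d: List[List[Vec]]) -> bool:
    """d[j][i] is the displacement (dx, dy) of the cell in row j, column i.

    Returns True if every 2x2 block satisfies the parallelogram
    condition d[j][i] + d[j+1][i+1] == d[j+1][i] + d[j][i+1].
    """
    for j in range(len(d) - 1):
        for i in range(len(d[0]) - 1):
            for k in (0, 1):  # x and y components
                if d[j][i][k] + d[j + 1][i + 1][k] != d[j + 1][i][k] + d[j][i + 1][k]:
                    return False
    return True
```

Because the undeformed grid itself satisfies the same condition, checking displacements is equivalent to checking the displaced cell positions.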
Optimisation
The weights and bias (w*, b*) and the deformation fields d_k are estimated iteratively
Given the deformation fields, the problem is a standard linear SVM:
Given (w*, b*), the problem is a constrained MRF optimisation:
The last term can be decomposed as:
By defining
the optimisation becomes:
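The alternation between the two steps can be sketched on a toy problem. Here each training sample has a small list of candidate feature vectors, and the index of the chosen candidate stands in for the deformation field; the weights are fitted by plain subgradient descent on the hinge loss. Both simplifications, along with all names and hyperparameters, are assumptions for illustration and not the solver used in the talk.

```python
from typing import List, Sequence, Tuple

Sample = List[List[float]]  # candidate feature vectors, one per latent choice


def score(w: Sequence[float], b: float, h: Sequence[float]) -> float:
    return sum(wi * hi for wi, hi in zip(w, h)) + b


def fit_linear_svm(feats, labels, w, b, lam=0.01, lr=0.1, steps=200):
    """Step 1: latent variables fixed, batch subgradient descent on
    lam * ||w||^2 + sum of hinge losses."""
    w = list(w)
    for _ in range(steps):
        grad_w = [2.0 * lam * wi for wi in w]
        grad_b = 0.0
        for h, z in zip(feats, labels):
            if z * score(w, b, h) < 1.0:  # margin violated
                grad_w = [g - z * hi for g, hi in zip(grad_w, h)]
                grad_b -= z
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def best_latent(w, b, cands: Sample) -> int:
    """Step 2: weights fixed, pick the highest-scoring latent choice."""
    return max(range(len(cands)), key=lambda k: score(w, b, cands[k]))


def train_latent_svm(samples: List[Sample], labels: List[int],
                     dim: int, rounds: int = 5) -> Tuple[List[float], float, List[int]]:
    w, b = [0.0] * dim, 0.0
    latent = [0] * len(samples)  # start from the first candidate ("no deformation")
    for _ in range(rounds):
        feats = [c[k] for c, k in zip(samples, latent)]
        w, b = fit_linear_svm(feats, labels, w, b)
        latent = [best_latent(w, b, c) for c in samples]
    return w, b, latent
```

In the detector itself, step 2 would be the constrained MRF inference over deformation fields rather than a maximum over a short list.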
Optimisation
• The locations of the cells in the first row and in the first column fully determine the location of every cell
• Any locally affine deformation field can be reached by two moves:
  • each column i moves by (Δ^c d_i^x, Δ^c d_i^y)
  • each row j moves by (Δ^r d_j^x, Δ^r d_j^y)
• Such moves preserve the locally affine property
• Both moves can be solved quickly using dynamic programming
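The two ingredients above can be sketched as follows: a field built as the sum of one move per column and one move per row, and a generic chain dynamic program that picks one label (for example, one candidate move) per column. That each move reduces to a chain with unary and pairwise terms is an assumption made for this sketch, as are the names `compose_field` and `chain_dp`.

```python
from typing import Callable, List, Tuple

Vec = Tuple[int, int]


def compose_field(col_moves: List[Vec], row_moves: List[Vec]) -> List[List[Vec]]:
    """d[j][i] = move of column i + move of row j."""
    return [[(cx + rx, cy + ry) for (cx, cy) in col_moves]
            for (rx, ry) in row_moves]


def chain_dp(unary: List[List[float]],
             pairwise: Callable[[int, int], float]) -> List[int]:
    """Maximise sum_i unary[i][s_i] - sum_i pairwise(s_i, s_{i+1})
    over label sequences s, in O(n * k^2) time."""
    n, k = len(unary), len(unary[0])
    best = list(unary[0])          # best[s]: best score of a prefix ending in label s
    back: List[List[int]] = []     # back-pointers for recovering the labels
    for i in range(1, n):
        new, arg = [], []
        for s in range(k):
            p = max(range(k), key=lambda t: best[t] - pairwise(t, s))
            new.append(best[p] - pairwise(p, s) + unary[i][s])
            arg.append(p)
        best = new
        back.append(arg)
    s = max(range(k), key=lambda t: best[t])
    labels = [s]
    for arg in reversed(back):
        s = arg[s]
        labels.append(s)
    return labels[::-1]
```

Here `unary[i][s]` would hold the data term of column (or row) i under candidate move s, and `pairwise` the cost of neighbouring columns choosing different moves.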
Learning multiple poses / viewpoints
• We define a similarity measure between two training samples as:
where
• K-medoid clustering on the similarity matrix S partitions the data into multiple models
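A minimal sketch of k-medoid clustering driven by a precomputed similarity matrix is given below. It assumes that larger entries of S mean more similar samples and that the initial medoids are supplied by the caller; the alternating assign/update scheme and the name `k_medoids` are illustrative choices, not necessarily the variant used in the talk.

```python
from typing import List, Sequence, Tuple


def k_medoids(S: Sequence[Sequence[float]],
              init: Sequence[int],
              iters: int = 20) -> Tuple[List[int], List[int]]:
    """Cluster n samples given an n x n similarity matrix S.

    Returns (medoids, clusters), where clusters[i] is the index into
    `medoids` of the medoid most similar to sample i.
    """
    n, k = len(S), len(init)
    medoids = list(init)

    def assign(meds: List[int]) -> List[int]:
        return [max(range(k), key=lambda c: S[i][meds[c]]) for i in range(n)]

    for _ in range(iters):
        clusters = assign(medoids)
        new = []
        for c in range(k):
            members = [i for i in range(n) if clusters[i] == c]
            if not members:        # keep the old medoid of an empty cluster
                new.append(medoids[c])
                continue
            # the new medoid is the member most similar to the rest of its cluster
            new.append(max(members, key=lambda m: sum(S[m][i] for i in members)))
        if new == medoids:
            break
        medoids = new
    return medoids, assign(medoids)
```

Each resulting cluster would then be used to train one model of the multi-model detector.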
Experiments
• Buffy dataset (typically used for pose estimation)
• Contains a large variety of poses, viewpoints and aspect ratios
• Consists of 748 images
• Episode s5e3 used for training
• Episode s5e4 used for validation
• Episodes s5e2, s5e5 and s5e6 used for testing
Ferrari et al. CVPR08
Clustering of training samples
Each row corresponds to one model (out of 10 models)
Qualitative results
Quantitative results
Conclusion
• We propose:
  • a novel inference method for the locally affine deformation field (LADF)
  • an object detector using LADF
  • a clustering method using LADF
Thank you
Questions?