Pedestrian Detection in Crowded Scenes Dhruv Batra ECE CMU

1. Pedestrian Detection in Crowded Scenes. Bastian Leibe, Edgar Seemann, and Bernt Schiele. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, June 2005.
2. An Evaluation of Local Shape-Based Features for Pedestrian Detection. Edgar Seemann, Bastian Leibe, Krystian Mikolajczyk, and Bernt Schiele. In British Machine Vision Conference (BMVC'05), Oxford, UK, September 2005.
3. Combined Object Categorization and Segmentation with an Implicit Shape Model. Bastian Leibe, Ales Leonardis, and Bernt Schiele. In ECCV'04 Workshop on Statistical Learning in Computer Vision, Prague, May 2004.
Theme of the Paper

Probabilistic top-down/bottom-up formulation of
segmentation/recognition

Basic Premise: “[Such a] problem is too difficult for any type of feature
or model alone”
Theme of the Paper

Open Question: How would you do pedestrian detection/segmentation?
[Figure: original image; support of segmentation and segmentation from local features; support of segmentation and segmentation from global features (Chamfer matching)]

Solution: integrate as many cues as possible from many sources
Theme of the Paper

Goal: Localize AND count pedestrians in a given image

Datasets
Training Set: 35 people walking parallel to the image plane
Testing Set (much harder!): 209 images of 595 annotated pedestrians
Evaluation Criteria

Criterion 1: Relative Distance
Threshold: d < 0.5
Fixed aspect ratio: 11:15
Evaluation Criteria

Criteria 2 & 3: Cover and Overlap
Thresholds: cover > 50%, overlap > 50%
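The three criteria can be sketched as follows. The slides give only the thresholds, so the definitions here are assumptions: relative distance is taken as the distance between box centers normalized by the annotation box's half-width and half-height, cover as the fraction of the annotation covered by the detection, and overlap as the fraction of the detection covered by the annotation. The `Box` type and all function names are hypothetical, and requiring all three criteria at once is also an assumption.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes (0.0 if they do not intersect)."""
    iw = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    ih = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(iw, 0.0) * max(ih, 0.0)


def relative_distance(det: Box, ann: Box) -> float:
    """Center distance, normalized by the annotation half-extents (assumed)."""
    dx = (det.x + det.w / 2) - (ann.x + ann.w / 2)
    dy = (det.y + det.h / 2) - (ann.y + ann.h / 2)
    return hypot(dx / (ann.w / 2), dy / (ann.h / 2))


def cover(det: Box, ann: Box) -> float:
    """Fraction of the annotation covered by the detection (assumed)."""
    return intersection_area(det, ann) / ann.area


def overlap(det: Box, ann: Box) -> float:
    """Fraction of the detection covered by the annotation (assumed)."""
    return intersection_area(det, ann) / det.area


def is_correct(det: Box, ann: Box) -> bool:
    """Apply the slide thresholds: d < 0.5, cover > 50%, overlap > 50%."""
    return (relative_distance(det, ann) < 0.5
            and cover(det, ann) > 0.5
            and overlap(det, ann) > 0.5)
```

For example, a detection shifted by 10 pixels in both directions against a 110 x 150 annotation passes all three tests, while a detection that does not touch the annotation fails.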
Initial Recognition Approach



First Step: Generate hypotheses from local features (Implicit Shape Model, ISM)
Training:
Codebook approach (with spatial information)
Training (patch extraction):
Lowe's DoG detector
3s x 3s patches around each interest point (s: detected scale)
Resize to 25 x 25 pixels
Training (codebook generation):
Agglomerative clustering of the extracted patches
The resulting clusters are the codebook entries
Store figure-ground masks for these entries
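A minimal sketch of how agglomerative clustering could produce codebook entries. The slides do not state the similarity measure, linkage, or stopping threshold, so normalized cross-correlation, average linkage, and the value `0.7` are all assumptions here, and the function names are hypothetical. Patches are flattened to plain lists of numbers.

```python
from math import sqrt


def ncc(a, b):
    """Normalized cross-correlation of two equal-length patch vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0


def cluster_similarity(c1, c2):
    """Average-link similarity: mean NCC over all cross-cluster pairs."""
    return sum(ncc(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerative_codebook(patches, threshold=0.7):
    """Merge the most similar pair of clusters until none exceeds `threshold`.

    Returns a list of (center, members); the center is the mean of the members.
    """
    clusters = [[p] for p in patches]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cluster_similarity(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:
            break  # no remaining pair is similar enough to merge
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    codebook = []
    for members in clusters:
        center = [sum(col) / len(members) for col in zip(*members)]
        codebook.append((center, members))
    return codebook
```

The quadratic pair search is only practical for small patch sets; it is written for readability rather than speed. Keeping the member list per cluster is what later allows figure-ground masks to be stored alongside each entry.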
Training (spatial distribution):
But wait! We just lost spatial information … Run again:
Lowe's DoG detector
3s x 3s patches
Resize to 25 x 25
Find matching codebook patches
Learn spatial distribution
Testing:
Initial Hypothesis: Overall
Initial Hypothesis (Probabilistic Hough Voting Procedure)
[Equation not preserved; slide annotations on its terms:]
Learnt from spatial distributions of codebook entries
Measuring similarity between patch and codebook entry
Search for maximum in probability space
Using a fixed size search window
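The voting step can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the data layout, the uniform vote weights, the bin sizes, and the names `cast_votes` and `strongest_bin` are all hypothetical. Each test patch is assumed to have already been matched to codebook entries, and each entry stores the offsets to the object center observed during training. A coarse binned accumulator stands in for the fixed size search window.

```python
from collections import defaultdict


def cast_votes(features, matches, occurrences):
    """Collect weighted votes for (x, y, scale) object hypotheses.

    features:    list of (x, y, scale) for patches found in the test image
    matches:     matches[k] is the list of codebook ids matching feature k
    occurrences: occurrences[i] is a list of (dx, dy, rel_scale) from training:
                 offset to the object center in units of the patch scale, and
                 object scale relative to the patch scale
    """
    votes = []
    for (fx, fy, fs), entry_ids in zip(features, matches):
        if not entry_ids:
            continue
        p_entry = 1.0 / len(entry_ids)   # patch-to-entry weight, assumed uniform
        for i in entry_ids:
            occ = occurrences[i]
            p_pos = 1.0 / len(occ)       # entry-to-position weight, assumed uniform
            for dx, dy, rel_scale in occ:
                votes.append((fx + dx * fs, fy + dy * fs,
                              fs * rel_scale, p_entry * p_pos))
    return votes


def strongest_bin(votes, bin_xy=10.0, bin_scale=0.25):
    """Accumulate votes in coarse bins and return (bin, weight) of the best one."""
    if not votes:
        return None
    acc = defaultdict(float)
    for x, y, s, w in votes:
        acc[(int(x // bin_xy), int(y // bin_xy), int(s // bin_scale))] += w
    return max(acc.items(), key=lambda kv: kv[1])
```

Two patches that agree on the object center reinforce each other in the accumulator, which is what makes the maximum a usable initial hypothesis.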
Initial Hypothesis: found as maxima in the 3D voting space (x, y, scale)
Maxima computed using Mean Shift Mode Estimation over this balloon density estimator
Uniform cubical kernel
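A minimal sketch of mean shift mode estimation with a uniform cubical kernel over weighted votes. One simplification is assumed: the window half-widths in `bandwidth` are fixed, whereas a balloon estimator would let them depend on the point where the density is evaluated. Function and parameter names are hypothetical.

```python
def window_weight(votes, center, bandwidth):
    """Total vote weight inside the cubical window around `center`."""
    return sum(w for *pos, w in votes
               if all(abs(pos[d] - center[d]) <= bandwidth[d] for d in range(3)))


def mean_shift_mode(votes, start, bandwidth, max_iter=100, tol=1e-6):
    """Climb from `start` to a local maximum of the vote density.

    votes:     list of (x, y, s, weight)
    start:     initial (x, y, s), e.g. the center of a strong accumulator bin
    bandwidth: (bx, by, bs) half-widths of the uniform cubical kernel
    Returns (mode, score), where score is the vote weight inside the window.
    """
    center = tuple(float(v) for v in start)
    for _ in range(max_iter):
        inside = [v for v in votes
                  if all(abs(v[d] - center[d]) <= bandwidth[d] for d in range(3))]
        total = sum(v[3] for v in inside)
        if total == 0:
            break  # empty window: nothing to climb towards
        # With a uniform kernel the update is the weighted mean of the window.
        new_center = tuple(sum(v[d] * v[3] for v in inside) / total
                           for d in range(3))
        shift = max(abs(new_center[d] - center[d]) for d in range(3))
        center = new_center
        if shift < tol:
            break
    return center, window_weight(votes, center, bandwidth)
```

Starting the search from several strong accumulator bins and keeping the distinct modes would give one hypothesis per pedestrian candidate.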
Testing:
Probabilistic top down segmentation
[Equation not preserved; slide annotations on its terms:]
Start here
Intermediate Goal: find this
Estimate from training data
From similarity measure
Assumption: uniform priors
Probabilistic top down segmentation (continued)
[Equation not preserved; slide annotations:]
Substitute this here
Marginalize over all patches in the image to get this
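Since the equations themselves did not survive, the marginalization can only be sketched under assumptions. The idea illustrated here is that every patch that voted for a hypothesis projects its stored figure-ground mask back into the image, weighted by its share of the hypothesis score, and the per-pixel sums give figure and ground probabilities. The input layout and the name `top_down_segmentation` are hypothetical.

```python
def top_down_segmentation(height, width, contributions):
    """Accumulate per-pixel figure and ground probabilities for one hypothesis.

    contributions: list of (top, left, mask, weight), where `mask` is a 2D list
                   of figure probabilities in [0, 1] for a patch placed with its
                   top-left corner at (top, left), and `weight` is that patch's
                   vote weight for the hypothesis.
    Returns (figure, ground) as 2D lists of size height x width.
    """
    figure = [[0.0] * width for _ in range(height)]
    ground = [[0.0] * width for _ in range(height)]
    total = sum(weight for _, _, _, weight in contributions)
    if total == 0:
        return figure, ground
    for top, left, mask, weight in contributions:
        for r, row in enumerate(mask):
            for c, m in enumerate(row):
                y, x = top + r, left + c
                if 0 <= y < height and 0 <= x < width:
                    figure[y][x] += weight * m / total
                    ground[y][x] += weight * (1.0 - m) / total
    return figure, ground


def segment(figure, ground):
    """Label a pixel as figure where its figure probability exceeds ground."""
    return [[f > g for f, g in zip(frow, grow)]
            for frow, grow in zip(figure, ground)]
```

Pixels not covered by any contributing patch keep zero probability for both figure and ground, so they are never labeled as figure.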
Initial Recognition Approach

Second Step: Handling overlapping detections
Second Step: Segmentation based Verification (Minimum Description Length)

Savings that can be achieved by explaining part of the image by a particular hypothesis h
[Equation not preserved; slide annotations on its terms:]
Number N of pixels explained by h
Model complexity
Cost of describing the error made by hypothesis h
Error cost: sum over all pixels hypothesized as figure of the probability of being background
Two constants weight the terms:
Bias term
Relative importance assigned to support of hypothesis
With this framework we can resolve conflicts between overlapping hypotheses
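The savings of a single hypothesis can be sketched from the annotated terms. The exact form of the equation is not preserved on the slides, so the combination below (pixels explained, minus a weighted model cost, minus a weighted error cost) is an assumption, as are the parameter names and default values.

```python
def mdl_savings(figure_probs, model_cost=1.0, k_model=1.0, k_error=1.0):
    """Savings obtained by explaining part of the image with one hypothesis.

    figure_probs: figure probability for every pixel the hypothesis claims
                  as figure
    model_cost:   model complexity term (acts as a constant bias)
    k_model:      weight of the model complexity term (assumed)
    k_error:      weight of the error term (assumed)
    """
    n_pixels = len(figure_probs)                  # pixels explained by h
    # Error cost: summed probability of being background over figure pixels.
    s_error = sum(1.0 - p for p in figure_probs)
    return n_pixels - k_model * model_cost - k_error * s_error
```

A hypothesis whose claimed pixels are mostly uncertain gets low or negative savings, which is how weakly supported detections lose out against well supported ones.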
Voila! It works

Caveat: it leads to another set of problems
ISM doesn't know a person doesn't have three legs! Or four legs and three arms
Global cues are needed
Assimilation of Global Cues

Distance Transform, Chamfer Matching
Get feature image by an edge detector
Get DT image by computing distance to nearest feature point
Chamfer distance between template and DT image
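The two steps can be sketched on small binary images. This is a brute-force illustration, far too slow for real images, and the function names are hypothetical. Taking the Chamfer distance as the mean distance transform value under the template's edge points is a common definition and is assumed here.

```python
from math import hypot


def distance_transform(edges):
    """Euclidean distance from every pixel to the nearest edge pixel.

    edges: 2D list with truthy values at feature (edge) points.
    """
    points = [(r, c) for r, row in enumerate(edges)
              for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("feature image contains no edge points")
    return [[min(hypot(r - pr, c - pc) for pr, pc in points)
             for c in range(len(row))]
            for r, row in enumerate(edges)]


def chamfer_distance(dt, template_points, top, left):
    """Mean DT value under the template placed at (top, left).

    template_points: (row, col) coordinates of the template's edge points.
    Returns infinity if any template point falls outside the DT image.
    """
    values = []
    for r, c in template_points:
        y, x = top + r, left + c
        if not (0 <= y < len(dt) and 0 <= x < len(dt[0])):
            return float("inf")
        values.append(dt[y][x])
    return sum(values) / len(values)
```

A distance of zero means every template point lies exactly on an image edge; larger values mean a worse fit.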
Assimilation of Global Cues (Attempt 1)

Distance Transform, Chamfer Matching
Initial hypothesis generated by local features
Use scale estimate to cut out surrounding region
Apply Canny detector and compute DT
Chamfer distance based matching
Yellow is highest Chamfer score
Assimilation of Global Cues (Attempt 2)

Maximize Chamfer score AND overlap with the hypothesized segmentation instead of pure Chamfer score
Overlap expressed as Bhattacharyya coefficient
Joint score is a linear combination of the two
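A minimal sketch of the joint score. The Bhattacharyya coefficient is computed between the flattened segmentation probabilities and the flattened template silhouette, each normalized to sum to one. The mixing weight `alpha` is hypothetical, since the slides do not give the weights, and the Chamfer score is assumed to be already mapped to a value where higher is better.

```python
from math import sqrt


def bhattacharyya_coefficient(p, q):
    """Overlap of two non-negative vectors after normalizing each to sum 1."""
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        return 0.0
    return sum(sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))


def joint_score(chamfer_score, seg_probs, template_mask, alpha=0.5):
    """Linear combination of Chamfer score and segmentation overlap.

    chamfer_score: similarity in [0, 1], higher is better (assumed)
    seg_probs:     flattened figure probabilities from the hypothesis
    template_mask: flattened silhouette template, same length as seg_probs
    alpha:         weight of the Chamfer term (hypothetical value)
    """
    overlap = bhattacharyya_coefficient(seg_probs, template_mask)
    return alpha * chamfer_score + (1.0 - alpha) * overlap
```

Identical distributions give a coefficient of 1 and disjoint ones give 0, so the overlap term rewards templates that agree with the top-down segmentation.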
Assimilation of Global Cues (Attempt 3)


Apply the MDL hypothesis-savings method again
Boolean quadratic formulation
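One way to read the Boolean quadratic formulation is as choosing an indicator vector m that maximizes m^T Q m, where diagonal entries of Q hold the savings of individual hypotheses and off-diagonal entries hold (negative) interaction costs for overlapping pairs. The greedy search below is an assumption for illustration, not necessarily the optimization used in the paper, and `select_hypotheses` is a hypothetical name.

```python
def select_hypotheses(q):
    """Greedily pick hypotheses to maximize m^T Q m over boolean vectors m.

    q: square matrix as a list of lists; q[i][i] is the savings of hypothesis i,
       q[i][j] (i != j) is the interaction cost with hypothesis j.
    Returns the sorted indices of the selected hypotheses.
    """
    selected = []
    remaining = list(range(len(q)))

    def gain(i):
        # Change in m^T Q m when hypothesis i is added to the current selection.
        return q[i][i] + sum(q[i][j] + q[j][i] for j in selected)

    while remaining:
        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # no remaining hypothesis improves the objective
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)
```

In the usage below, hypotheses 0 and 1 overlap heavily, so only the stronger one survives, while the independent hypothesis 2 is kept: `select_hypotheses([[10, -4.5, 0], [-4.5, 8, 0], [0, 0, 3]])` selects indices 0 and 2.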
Results