Pedestrian Detection in Crowded Scenes
Dhruv Batra, ECE, CMU

  1. Pedestrian Detection in Crowded Scenes. Bastian Leibe, Edgar Seemann, and Bernt Schiele. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, June 2005.
  2. An Evaluation of Local Shape-Based Features for Pedestrian Detection. Edgar Seemann, Bastian Leibe, Krystian Mikolajczyk, and Bernt Schiele. In British Machine Vision Conference (BMVC'05), Oxford, UK, September 2005.
  3. Combined Object Categorization and Segmentation with an Implicit Shape Model. Bastian Leibe, Ales Leonardis, and Bernt Schiele. In ECCV'04 Workshop on Statistical Learning in Computer Vision, Prague, May 2004.

Theme of the Paper
  Probabilistic top-down/bottom-up formulation of segmentation/recognition
  Basic Premise: “[Such a] problem is too difficult for any type of feature or model alone”
  Open Question: How would you do pedestrian detection/segmentation?
  [Figure panels: original image; support of segmentation from local features; segmentation from local features; support of segmentation from global features (Chamfer Matching); segmentation from global features (Chamfer Matching)]
  Solution: integrate as many cues as possible from many sources
  Goal: Localize AND count pedestrians in a given image

Datasets
  Training Set: 35 people walking parallel to the image plane
  Testing Set (much harder!): 209 images with 595 annotated pedestrians

Evaluation Criteria
  Criterion 1: Relative Distance
    Threshold: d < 0.5
    Fixed aspect ratio: 11:15
  Criteria 2 & 3: Cover and Overlap
    Threshold: cover > 50%, overlap > 50%

Initial Recognition Approach
First Step: Generate hypotheses from local features (Implicit Shape Models)
  Training: Codebook approach (with spatial information)
    Lowe’s DoG detector
    3s x 3s patches
    Resize to 25 x 25
    Agglomerative clustering
    Figure-ground masks are stored with the codebook entries
  Training: But wait!
  We just lost spatial information …
  Run again:
    Lowe’s DoG detector
    3s x 3s patches
    Resize to 25 x 25
    Find codebook patches
    Learn spatial distribution

Initial Recognition Approach
First Step: Generate hypotheses from local features (Implicit Shape Models)
  Testing: Initial Hypothesis (Probabilistic Hough Voting Procedure)
    [Equation not preserved; its annotations follow]
    One term is learnt from the spatial distributions of codebook entries
    One term measures the similarity between patch and codebook entry
    Search for maximum in probability space
    Using a fixed-size search window
  Testing: Initial Hypothesis
    Found as maxima in the 3D voting space
    Maxima computed using Mean Shift Mode Estimation over a balloon density estimator
    Uniform cubical kernel
  Testing: Probabilistic top-down segmentation
    [Equations not preserved; their annotations follow]
    Intermediate goal: find this
    Start here
    Estimate from training data
    From similarity measure
    Assumption: uniform priors
    Substitute this here
    Marginalized over all patches in image to get this
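The voting and mode-finding steps of the first stage can be illustrated with a small sketch. This is a simplified stand-in, not the authors' implementation: the names `cast_votes`, `votes_in_window`, and `mean_shift_mode` are hypothetical, each stored occurrence is assumed to be an offset to the object centre at unit scale plus a weight, and the box window has a fixed size rather than the scale-adaptive (balloon) window mentioned on the slides.

```python
def cast_votes(matches):
    """Turn codebook matches into weighted votes in (x, y, scale) space.

    Each match is (patch_x, patch_y, patch_scale, match_weight, occurrences).
    `occurrences` lists (dx, dy, occurrence_weight) for the matched codebook
    entry: the assumed offset from patch to object centre at unit scale.
    """
    votes = []
    for px, py, ps, match_weight, occurrences in matches:
        for dx, dy, occ_weight in occurrences:
            # Scale the learnt offset by the patch scale; the vote weight is
            # the product of the similarity term and the spatial term.
            votes.append((px + ps * dx, py + ps * dy, ps, match_weight * occ_weight))
    return votes


def votes_in_window(votes, centre, half_width):
    """Votes falling inside an axis-aligned box (uniform kernel) around centre."""
    return [
        v for v in votes
        if all(abs(v[i] - centre[i]) <= half_width[i] for i in range(3))
    ]


def mean_shift_mode(votes, start, half_width=(10.0, 10.0, 0.25), max_iter=50, tol=1e-6):
    """Move the window to the weighted mean of its votes until it stops moving."""
    centre = start
    for _ in range(max_iter):
        inside = votes_in_window(votes, centre, half_width)
        total = sum(v[3] for v in inside)
        if total <= 0.0:
            break  # empty window: nothing to climb towards
        new_centre = tuple(sum(v[i] * v[3] for v in inside) / total for i in range(3))
        shift = max(abs(a - b) for a, b in zip(new_centre, centre))
        centre = new_centre
        if shift < tol:
            break
    # Hypothesis score: total vote weight inside the final window.
    score = sum(v[3] for v in votes_in_window(votes, centre, half_width))
    return centre, score


# Toy data: two patches agree on a centre near (100, 50); one is an outlier.
matches = [
    (90.0, 40.0, 1.0, 1.0, [(10.0, 10.0, 1.0)]),
    (110.0, 60.0, 1.0, 0.5, [(-10.0, -10.0, 1.0)]),
    (300.0, 300.0, 2.0, 1.0, [(5.0, 5.0, 1.0)]),
]
votes = cast_votes(matches)
mode, score = mean_shift_mode(votes, start=(98.0, 52.0, 1.0))
print(mode, score)
```

In practice the search would be started from many seeds (for example, the cells of a coarse vote histogram), and each converged mode would become one initial hypothesis.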
Initial Recognition Approach
Second Step: Handling overlapping detections
  Segmentation-based Verification (Minimum Description Length)
    [Equation not preserved; its annotations follow]
    Savings that can be achieved by explaining part of the image by a particular hypothesis h
    Number of pixels N explained by h
    Model complexity
    Cost of describing the error made by hypothesis h
    Probability of being background
    Sum over all pixels hypothesized as figure
    Bias term
    Relative importance assigned to support of hypothesis
    With this framework we can resolve conflicts between overlapping hypotheses
    Voila! It works
    Caveat: it leads to another set of problems
      ISM doesn’t know a person doesn’t have three legs! Or four legs and three arms
      Global cues are needed

Assimilation of Global Cues
  Distance Transform, Chamfer Matching
    Get feature image by an edge detector
    Get DT image by computing distance to nearest feature point
    Chamfer distance between template and DT image

Assimilation of Global Cues (Attempt 1)
  Distance Transform, Chamfer Matching
    Initial hypothesis generated by local features
    Use scale estimate to cut out surrounding region
    Apply Canny detector and compute DT
    Chamfer distance based matching
    Yellow is highest Chamfer score

Assimilation of Global Cues (Attempt 2)
  Maximize Chamfer score AND overlap with hypothesized segmentation, instead of pure Chamfer score
  Overlap expressed as Bhattacharyya coefficient
  Joint score is a linear combination of the two

Assimilation of Global Cues (Attempt 3)
  Apply the hypothesis-savings MDL method again
  Boolean quadratic formulation

Results
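As a rough illustration of the distance transform and Chamfer matching steps from the global-cue slides, the sketch below computes a brute-force distance transform of a binary edge map and scores a template at candidate offsets. The function names and the toy edge map are assumptions made for illustration; a real system would take the edge map from an edge detector such as Canny and use an efficient distance transform.

```python
import math


def distance_transform(edges):
    """Brute-force Euclidean distance from each pixel to the nearest edge pixel."""
    height, width = len(edges), len(edges[0])
    edge_points = [(r, c) for r in range(height) for c in range(width) if edges[r][c]]
    if not edge_points:
        raise ValueError("edge map contains no edge pixels")
    return [
        [min(math.hypot(r - er, c - ec) for er, ec in edge_points) for c in range(width)]
        for r in range(height)
    ]


def chamfer_distance(dt, template_points, offset):
    """Mean DT value under the template points after shifting by (row, col) offset."""
    height, width = len(dt), len(dt[0])
    total = 0.0
    for tr, tc in template_points:
        r, c = tr + offset[0], tc + offset[1]
        if not (0 <= r < height and 0 <= c < width):
            return math.inf  # template falls outside the image at this offset
        total += dt[r][c]
    return total / len(template_points)


def best_match(dt, template_points, offsets):
    """Offset with the smallest Chamfer distance (lower distance is a better match)."""
    return min(offsets, key=lambda off: chamfer_distance(dt, template_points, off))


# Toy edge map: a short vertical stroke in column 3, rows 1 to 3.
edges = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
template = [(0, 0), (1, 0), (2, 0)]  # a vertical stroke of three points
dt = distance_transform(edges)
offsets = [(r, c) for r in range(3) for c in range(5)]
print(best_match(dt, template, offsets))
```

The functions return a distance, so a score to maximise (as in the joint score of Attempt 2) could be taken as a decreasing function of it, for example its negation.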