Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching
Zhe Lin, Member, IEEE
Larry S. Davis, Fellow, IEEE
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, APRIL 2010

Overview
• Introduction
• Previous Work
• Proposed Approach
  – Hierarchical Part-Template Matching
  – Pose-Adaptive Descriptors
  – Combining With Calibration And Background Subtraction
• Experiment Result
• Conclusion

Introduction
• Robust human tracking and identification depend heavily on reliable human detection and segmentation.
• Detection remains challenging because of variations in body posture, illumination, occlusion, and viewpoint.
• Goal: develop a robust and efficient approach to human detection and segmentation.
• Method: shape-based part-template matching.

Previous Work
• Shape feature extraction schemes
  – Model human shapes globally [1],[2],[3]
  – Model shapes using sparse local features [9],[10],[11]
• Learning perspective
  – Generative approach: tree-based data structures [6],[7],[8]
  – Discriminative approach: SVMs as the test classifiers [3]
• Surveillance scenarios
  – Motion blob information [35],[36]

Proposed Approach
• A hierarchical part-template matching approach combined with discriminative learning.

Hierarchical Part-Template Matching
• Generating the part-template tree model
  – Synthesizing global shape models
  – Generating parts by decomposition
  – Constructing an initial tree model using parts
• Learning the part-template tree
• Hierarchical part-template matching

Synthesizing Global Shape Models
• The articulation of the human body is analyzed as six regions
  – Head, torso, pair of upper legs, pair of lower legs
  – The parameters of these regions are quantized into {3,2,3,3,3,3} levels

Generating Parts by Decomposition
• Binarize (a) to obtain (b), then extract the silhouette boundaries to get (c).
• Silhouettes are decomposed into three parts (head-torso, upper legs, and lower legs).
• The silhouette parameters are denoted by θj, which consists of an index and a location.

Constructing an Initial Tree Model Using Parts
• A part-template tree is constructed by placing the decomposed part regions or fragments into a tree.
• Four layers, L0 to L3, denote the root, head-torso, upper legs, and lower legs, respectively.
• The tree consists of 186 part-templates (6 ht models, 18 ul models, and 162 ll models).
• A much larger template set improves performance only slightly.
• A fast hierarchical shape matching scheme is applied.

Learning the Part-Template Tree
• The initial tree does not contain any prior statistics from real human silhouettes.
• Learning is performed by matching the tree to a set of real human silhouette images.
• The goal is to explicitly estimate the branching probability distributions (conditional probability distributions).
• Learning method:
  – Each training silhouette is passed through the tree from the root to estimate the matching score and find the optimal path.
  – Based on the set of optimal paths, a branching probability distribution is estimated for each node.
  – Each node contains a binary image of the part-template, its sample point coordinates, and a branching probability.

Hierarchical Part-Template Matching
• Matching uses the same model as tree learning.
• The overall matching score for a detection window is modeled as the sum of the scores of all nodes along the path.
• The score of a node is the product of its part-template matching score and the branching probability of the node.
• The matching method is similar to Chamfer matching [6].
  – The matching score of a sample point on the contour is measured by edge-orientation matching to find the optimal human pose.
[6] D.M. Gavrila and V. Philomin, “Real-Time Object Detection for SMART Vehicles,” Proc. IEEE

Pose-Adaptive Descriptors
• Introduces a pose-adaptive feature computation method for detecting humans in images using an SVM.
• Candidate detection windows are obtained in a way similar to the HOG descriptor approach [3].
• Given a candidate detection window, hierarchical part-template matching is performed to estimate the optimal pose.
• After the pose is estimated, block
[3] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf.

Low-Level Features
• Similar to [3].
• Given an image, calculate the gradient magnitudes |G| and edge orientations O.
• Divide the image into non-overlapping 8x8 cells, each represented by a histogram of edge orientations.

Pose Inference on the Low-Level Features
• An optimal tree path is estimated based on the matching score.
• Within the matching score, the part-template score is measured by an average of gradient magnitudes.
• Matching score: equation (1), where B(t) = [O(t)/(π/9)] quantizes the edge orientation O(t) into bins of width π/9 and h is the orientation histogram.
• The average score of the part-template is

Representation Using Pose-Adaptive Descriptors
• The global shape models are represented as a set of boundary points with corresponding edge orientations.

Scene-to-Camera Calibration
• To obtain a mapping between head points and foot points in the image, estimate the homography between the head plane and the foot plane in the image.
• The head point is then ph = f(pf), where pf is an arbitrary foot point.

Combining With Background Subtraction
• Find foot regions Rfoot = {x | ϒx ≥ ξ}.
• Use part-template matching to find regions that may be legs.
• Given the estimated human vertical axis vx and an adaptive rectangular window W(x,(w0,h0)), obtain the human detection.
• Obtain the human segmentation.

Experiment Result
• Presents results of the human detector on two public pedestrian data sets (INRIA and MIT-CBCL).
• Presents results of the multiple occluded human detector on three crowded image and video data sets.
• Compares with other approaches using DET curves.

Experiment of Detection Result
• Better performance than HOG-SVM.
• Not only detects humans but also segments human poses.
• Can be further improved because the model can be extended to cover more poses or articulations.
• Successfully detects difficult poses that the HOG-based detector misses.
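
The DET comparison mentioned under Experiment Result can be illustrated with a small sketch. It assumes the evaluation convention of the HOG work [3], where a DET curve plots miss rate against false positives per window as the classifier threshold varies. The function name `det_points`, the score lists, and the thresholds are hypothetical and are not taken from the paper.

```python
def det_points(pos_scores, neg_scores, thresholds):
    """Return one (false positives per window, miss rate) pair per threshold.

    pos_scores: classifier scores of windows that contain a human.
    neg_scores: classifier scores of background windows.
    A window is accepted as a detection when its score >= threshold.
    """
    points = []
    for thr in thresholds:
        # Miss rate: fraction of human windows rejected at this threshold.
        missed = sum(1 for s in pos_scores if s < thr)
        miss_rate = missed / len(pos_scores)
        # FPPW: fraction of background windows accepted at this threshold.
        false_pos = sum(1 for s in neg_scores if s >= thr)
        fppw = false_pos / len(neg_scores)
        points.append((fppw, miss_rate))
    return points


# Hypothetical scores, for illustration only.
pos = [0.9, 0.8, 0.4, 0.7]
neg = [0.1, 0.5, 0.2, 0.3, 0.05]
curve = det_points(pos, neg, thresholds=[0.3, 0.6])
print(curve)
```

Raising the threshold moves along the curve toward fewer false positives and more misses; a detector whose curve lies lower at the same false-positive rate performs better under this convention.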

Experiment of Segmentation Result
• The pose model and the probabilistic hierarchical part-template matching algorithm give very accurate segmentation on the MIT-CBCL and INRIA data sets.

Experiment Without Subtraction

Experiment With Subtraction
• Data sets
  – CAVIAR benchmark data set
  – Munich Airport data set collected by Siemens Corporate Research
• Good results are obtained even with poor and inaccurate background subtraction.

Conclusion
• A hierarchical part-template matching approach is employed to match human shapes with images, detecting and segmenting humans simultaneously.
• Many misdetections are due to pose estimation failures.
• Future work
  – Investigating the addition of color and texture statistics to the local contextual