Multiple-shot Person Reidentification by HPE signature

Loris Bazzani*, Marco Cristani*†, Alessandro Perina*, Michela Farenzena*, Vittorio Murino*†
*Computer Science Department, University of Verona, Italy
†Istituto Italiano di Tecnologia (IIT), Genova, Italy

This research is funded by the EU-Project FP7 SAMURAI, grant FP7-SEC-2007-01 No. 217899.

Analysis of the problem (1)
• Person re-identification: recognizing an individual in diverse locations over different (non-)overlapping camera views
[Figure: example frames at T = 1, T = 23, T = 145 and T = 222, labelled "Different cameras" and "Same camera"]

Analysis of the problem (2)
• We focus on the problem with non-overlapping cameras
• Problems in real scenarios:
  – Very low resolution
  – Severe occlusions
  – Illumination variations
  – Pedestrians with very similar clothes
  – Pose and view-point changes
  – No geometry of the environment
• Solution:
  - Histogram Plus Epitome (HPE) descriptor, and
  - Multiple-shot approach

Outline
• Overview of the proposed method
• Pre-processing: Background Subtraction
• “Images selection” for Multiple-shot
• HPE descriptor
  - Global descriptor
  - Local descriptors
• HPEs’ Matching
• Results
• Conclusions

Overview of the proposed method
• Employing global and local appearance-based features
• Exploiting the temporal consistency to make the descriptor robust

Background Subtraction
• We employ a novel generative model: STEL [Jojic et al. 2009]
• It captures the structure of an image class as a mixture of component segmentations
• It isolates meaningful parts that exhibit tight feature distributions
[Figure: learned mixture components]

“Images selection” for Multiple-shot
• Objective: discard redundant information and images with occlusions
• Gaussian Mixture Model clustering [Figueiredo and Jain 2002] of HSV histograms
• Automatic model selection employing the Bayesian Information Criterion [Figueiredo and Jain 2002]
• Discard the clusters with a low number of instances
• Keep a random instance for each remaining cluster
[Figure: examples of discarded images]

HPE descriptor: Global feature
• Captures global chromatic information
• 36-dimensional HSV histogram (H=16, S=16, V=4)
• Average the histograms of the multiple instances
• Robust to illumination and pose variations, keeping the predominant chromatic information only
[Figure annotation: "Caused by illumination changes"]

HPE descriptor: Local feature (1)
• Epitome [Jojic et al. 2003]: a generative model that analyzes the presence of recurrent, structured local patterns
[Figure: Generic Epitome and Local Epitome]

HPE descriptor: Local feature (2)
• Generic Epitome: 36-dimensional HSV histogram of the epitome
• Local Epitome:
  - Keep the patches with a high [symbol missing in source], the probability that the epitome patch whose upper-left corner is (i, j) represents several ingredient patches
  - Discard the patches with low entropy
  - Extract a 36-dimensional HSV histogram of the “survived” patches

HPEs’ Matching
• Re-identification: associating each element in the probe set B with the corresponding element in the gallery set A
• Minimize the following distance: [equation missing in source]
  where [symbol missing in source] is the Bhattacharyya distance and [remainder missing in source]

Results (1)
iLIDS dataset:
- Multiple images of 119 pedestrians, 128x64 pixels
- Comparison with the Context-based method [Zheng et al. 2009]
- Cross-validation: SvsS 10 trials, MvsS/MvsM 100 trials

Results (2)
ETHZ dataset:
- Three datasets of 83, 35 and 28 pedestrians, 64x32 pixels
- Comparison with the Partial Least Squares (PLS) method [Schwartz and Davis 2009]
- Cross-validation: settings as for iLIDS

Results (3)
• How many images do we need to perform a “good” person re-identification?
• N = number of images for the multi-shot approach
• N = 5 seems to be the best trade-off

Conclusions
• We proposed a novel descriptor for the person re-identification problem: the HPE descriptor
• The descriptor is robust to low resolution, occlusions, illumination variations, pedestrians with very similar clothes, and pose changes
• It is based on the accumulation of images to gain robustness
• The person re-identification problem is still far from being solved
• The results suggest that further improvements can be reached

References
[Jojic et al. 2009] N. Jojic, A. Perina, M. Cristani, V. Murino, and B. Frey, “Stel component analysis: Modeling spatial correlations in image class structure,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 2044–2051, 2009.
[Figueiredo and Jain 2002] M. Figueiredo and A. Jain, “Unsupervised learning of finite mixture models,” IEEE Trans. PAMI, vol. 24, no. 3, pp. 381–396, 2002.
[Jojic et al. 2003] N. Jojic, B. J. Frey, and A. Kannan, “Epitomic analysis of appearance and shape,” in IEEE International Conference on Computer Vision. Washington, DC, USA: IEEE Computer Society, 2003, p. 34.
[Schwartz and Davis 2009] W. Schwartz and L. Davis, “Learning discriminative appearance-based models using partial least squares,” in XXII SIBGRAPI, 2009.
[Zheng et al. 2009] W. Zheng, S. Gong, and T. Xiang, “Associating groups of people,” in BMVC, 2009.
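
Illustrative sketch (not part of the original slides)
The global feature and the Bhattacharyya-based matching described in the slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The 36 bins are assumed to be three per-channel histograms concatenated (16 + 16 + 4). The Bhattacharyya distance is assumed to take the common form sqrt(1 - sum_i sqrt(p_i q_i)). Matching here uses the global feature only, because the full HPE distance is not recoverable from the slides. All function names are hypothetical.

```python
import colorsys
import math

# H, S, V bin counts from the slide; concatenating the three per-channel
# histograms (16 + 16 + 4 = 36 bins) is an assumption of this sketch.
BINS = (16, 16, 4)


def hsv_histogram(pixels):
    """36-bin HSV histogram of an iterable of (r, g, b) pixels in [0, 255]."""
    hist = [0.0] * sum(BINS)
    offsets = (0, BINS[0], BINS[0] + BINS[1])
    for r, g, b in pixels:
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for value, bins, offset in zip(hsv, BINS, offsets):
            index = min(int(value * bins), bins - 1)  # 1.0 falls in the last bin
            hist[offset + index] += 1.0
    total = sum(hist)
    # L1-normalise so that the whole 36-bin vector sums to 1.
    return [h / total for h in hist] if total else hist


def average_histogram(histograms):
    """Element-wise mean over the histograms of the multiple instances."""
    n = len(histograms)
    return [sum(column) / n for column in zip(*histograms)]


def bhattacharyya_distance(p, q):
    """Assumed form: sqrt(1 - BC), with BC the Bhattacharyya coefficient."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors


def match(probe, gallery):
    """Return the gallery identity whose histogram is closest to the probe."""
    return min(gallery, key=lambda pid: bhattacharyya_distance(probe, gallery[pid]))


# Toy usage: two gallery identities, one probe seen in two images.
gallery = {
    "person_a": hsv_histogram([(255, 0, 0)] * 4),  # pure red
    "person_b": hsv_histogram([(0, 0, 255)] * 4),  # pure blue
}
probe = average_histogram([
    hsv_histogram([(250, 10, 10)] * 4),
    hsv_histogram([(255, 0, 0)] * 4),
])
best = match(probe, gallery)
```

In the toy usage, `best` is the gallery identity with the smallest distance to the averaged probe histogram. Extending the sketch to the generic and local epitome histograms would require the combination rule from the missing equation, so it is left out here.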