Spatio-temporal Embedding for Statistical Face Recognition from Video

Wei Liu†, Zhifeng Li†, and Xiaoou Tang‡
† MMLAB, Department of Information Engineering, The Chinese University of Hong Kong
‡ Visual Computing Group, Microsoft Research Asia

Introduction

This paper addresses the problem of how to learn an appropriate representation from video to benefit video-based face recognition. We pose it as learning a spatio-temporal embedding (STE) from raw video.
• The STE of a video sequence is defined as its condensed version, capturing the essence of the space-time characteristics of the video.
• Relying on co-occurrence statistics of the training videos, Bayesian keyframe learning yields the temporal embedding (the keyframes) of each video.
• Given supervised signatures of face videos, nonparametric discriminant embedding (NDE) learned from the keyframes makes up the STE.
• A statistical formulation of the video-based recognition problem is given in terms of STEs.

Framework

Fig. 1. The framework of our video-based face recognition approach. (a) Training stage: learn keyframes from video sequences and then arrange them into K groups consisting of homogeneous frames, whose low-dimensional spatial embeddings are STEs via NDE; (b) testing stage: construct a statistical classifier in terms of learned STEs.

Temporal Embedding

• Only image information is used, exploiting the co-occurrence statistics of video sequences.
• Based on the spatio-temporal correlation between frames, a synchronized frame clustering method applies K-means to incrementally cluster frames across all videos.
• Given K clusters Ck with sub-clusters Ck(i) in each training video sequence V(i), the optimal keyframe e in Ck(i) is chosen by a criterion called Bayesian keyframe learning [equation not preserved in this text].
• Collect the top-m keyframes within each Ck(i) to span the temporal embedding T(i) for sequence V(i), and denote its k-th component in cluster Ck(i) as Tk(i).

Spatial Embedding: NDE

Fig. 2. (a) The two half-sphere toy data points with 2 labels "*" and "o"; (b) PCA embedding; (c) LDA embedding; (d) NDE.

• NDE is the multi-class nonparametric variant of conventional LDA, based on improved within-class and between-class scatter matrices [equations not preserved in this text].
• NDE is a generalized version of LDA and usually provides more than c-1 projections.
• NDE effectively captures the boundary structures between different classes.
• Fig. 2 demonstrates that NDE can find a better subspace than LDA or PCA when training data are abundant, so we bring the merits of NDE to video-based classification.

Statistical Recognition

• To run NDE on the K slices slice_k = {Tk(i)} (collected over all training sequences i), we group keyframes carrying the same human label into one class in each slice. Thus K NDE projections Wk are acquired along with the STEs.
• For any test video frame x, we compute its statistical correlation to the learned STE L(i) of each video sequence V(i) in the gallery, and model the correlation as the posterior probability p(L(i)|x).
• Using a probabilistic fusion scheme, we construct a MAP classifier in terms of the learned STEs to realize image-to-video recognition, which may also be extended to video-to-video recognition.

Experimental results

Fig. 3. Keyframes learned from one video sequence in XM2VTS. (a) Top-1 keyframes, each of which stands for a cluster in the sequence; (b) 10 keyframes shown along the temporal axis, compared with the speech signal.

Fig. 4. NDE versus LDA: the cumulative matching score is used as the performance measure.

• Keyframes learned by our method closely approximate those obtained by audio-video frame synchronization [13].
• NDE outperforms conventional LDA in most cases.
• Our recognition framework integrating STEs and statistical classification achieves the best recognition accuracy.

Recognition results

Approach                                                Recognition rate
Mutual subspace                                         79.3 %
Nearest frame                                           81.7 %
Nearest frame using LDA                                 90.9 %
Nearest frame using unified subspace analysis           93.2 %
Temporal embeddings + multi-level subspace analysis     98.0 %
Spatio-temporal embeddings + statistical classifier     99.3 %

For questions about this paper, please contact me (wliu5@ie.cuhk.edu.hk). Visit http://mmlab.ie.cuhk.edu.hk/~face/ for more information about my other work.
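
The statistical recognition step (project a test frame x with each slice projection Wk, score it against every gallery STE, fuse over the K slices, take the MAP identity) can be illustrated with a minimal Python sketch. The poster does not spell out the likelihood model or the fusion rule, so everything specific here is an assumption for illustration: a Gaussian kernel density over each identity's projected keyframes, a product (sum of logs) fusion across slices, a uniform prior by default, and the hypothetical names slice_log_likelihood, map_classify, and sigma. The projections in the toy data are simple coordinate selections, not learned NDE projections.

```python
import numpy as np


def slice_log_likelihood(x, W_k, keyframes, sigma=1.0):
    """Log-likelihood of frame x given one identity's keyframes in one slice.

    Assumption: a Gaussian kernel density over the projected keyframes.
    x: (d,) frame, W_k: (d, r) projection, keyframes: (m, d) array.
    """
    y = W_k.T @ x                      # project the test frame
    Y = keyframes @ W_k                # project the keyframes, shape (m, r)
    sq_dist = np.sum((Y - y) ** 2, axis=1)
    log_kernels = -sq_dist / (2.0 * sigma ** 2)
    # log of the mean kernel value, computed stably in the log domain
    return np.logaddexp.reduce(log_kernels) - np.log(len(log_kernels))


def map_classify(x, projections, gallery, log_prior=None, sigma=1.0):
    """Return (MAP identity, posterior dict) for a single test frame.

    projections: list of K arrays W_k.
    gallery: dict identity -> list of K keyframe arrays (one per slice).
    Assumption: slices are fused by summing log-likelihoods (product rule).
    """
    identities = sorted(gallery)
    if log_prior is None:
        log_prior = {i: 0.0 for i in identities}  # uniform prior
    log_post = np.array([
        log_prior[i] + sum(
            slice_log_likelihood(x, W_k, gallery[i][k], sigma)
            for k, W_k in enumerate(projections)
        )
        for i in identities
    ])
    log_post -= np.logaddexp.reduce(log_post)     # normalize to a posterior
    posterior = dict(zip(identities, np.exp(log_post)))
    return max(posterior, key=posterior.get), posterior


# Toy data: two identities, K = 3 slices, m = 4 keyframes per slice.
rng = np.random.default_rng(0)
d, r, K, m = 6, 2, 3, 4
projections = [np.eye(d)[:, k:k + r] for k in range(K)]  # stand-ins for Wk
gallery = {
    "A": [rng.normal(0.0, 0.1, size=(m, d)) for _ in range(K)],
    "B": [rng.normal(3.0, 0.1, size=(m, d)) for _ in range(K)],
}
x = np.zeros(d)                                   # test frame close to "A"
label, posterior = map_classify(x, projections, gallery)
print(label, posterior)
```

Working in the log domain keeps the fusion numerically stable when many slices are combined. A video-to-video variant could, under the same assumptions, accumulate the per-frame log-posteriors over all frames of the test video before taking the maximum.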
