IJCAI 2007
Wei Liu, Xiaoou Tang, and Jianzhuang Liu
Dept. of Information Engineering, The Chinese University of Hong Kong

Outline
- What is sketch-based facial photo hallucination
- Related works
- Our approach
  - Tensor model
  - TensorPatches
  - Bayesian Tensor Inference
- Experimental results

Definition
Sketch-based facial photo hallucination: hallucinate (imagine) photorealistic faces from sketches, i.e., the backward transform from sketches to photos.
Bidirectional transforms on photo-sketch pairs: (a) forward transform: synthesizing a sketch image from a photo image; (b) backward transform: hallucinating a photorealistic image from a sketch image.

Related Works
- Face sketching
  - H. Chen et al. "Example-based facial sketch generation with non-parametric sampling", in Proc. of ICCV, 2001.
  - X. Tang and X. Wang. "Face sketch recognition". IEEE Trans. on CSVT, 14(1):50-57, 2004.
  - Q. Liu et al. "A nonlinear approach for face sketch synthesis and recognition", in Proc. of CVPR, 2005.
- Ideas
  - Pixel-wise non-parametric sampling
  - Global linear model: PCA
  - Local linear model: LLE

Motivations
- Consider the complexity of image spaces and the conspicuous distinction between photos and sketches.
- Extract the local relations by explicitly establishing the connection between the two feature spaces (sketch and photo) formed by a patch-based tensor model.
- Formulate a Bayesian approach accounting for the statistical inference from sketches to their corresponding photos in terms of the learned tensor model.

Tensor Model
- We make use of a novel tensor model to exclusively account for the representation of images with two styles: photo-style and sketch-style.
- As small image patches can account for high-level statistics of images, we take patches as the constitutive elements of the tensor model.
- Based on a patch corpus with photo and sketch styles, we arrange the patches into a high-order tensor that discloses the latent connection between the two styles.

Patch-based Tensor Models
- Generic images
  - Model a 3rd-order tensor resulting from the confluence of 3 modes: patch examples, patch styles, and patch features.
  - Tensor transfer can learn the hidden relations between the photo patch space and the sketch patch space.
- Face images
  - Model a 4th-order tensor resulting from the confluence of 4 modes: people, patches, styles, and features (see the data-arrangement sketch after the next slide).
  - Utilize the PCA prior of face images.
  - Bayesian Tensor Inference incorporates the relations that the learned tensor model entails into a Bayesian framework.

Multilinear Analysis
- Why use multilinear analysis?
  - It unifies multiple factors in one framework.
  - It explicitly models the interaction between these factors.
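To make the patch arrangement above concrete, here is a minimal NumPy sketch of how a paired photo/sketch patch corpus could be stacked into the 4th-order (people x positions x styles x features) tensor; the function name, array sizes, and random data are hypothetical placeholders, not the authors' code.

    import numpy as np

    # Hypothetical sizes: 100 subjects, 64 patch positions per face,
    # 2 styles (photo, sketch), and 8x8 patches flattened to 64 features.
    NUM_PEOPLE, NUM_POSITIONS, NUM_STYLES, NUM_FEATURES = 100, 64, 2, 64

    def build_patch_tensor(photo_patches, sketch_patches):
        """Stack paired photo/sketch patches into a 4th-order tensor.

        Both inputs have shape (NUM_PEOPLE, NUM_POSITIONS, NUM_FEATURES);
        the result D has shape (NUM_PEOPLE, NUM_POSITIONS, NUM_STYLES,
        NUM_FEATURES), i.e. the people x positions x styles x features
        arrangement that the multilinear analysis decomposes.
        """
        return np.stack([photo_patches, sketch_patches], axis=2)

    # Toy usage with random data standing in for real patch vectors.
    photos = np.random.rand(NUM_PEOPLE, NUM_POSITIONS, NUM_FEATURES)
    sketches = np.random.rand(NUM_PEOPLE, NUM_POSITIONS, NUM_FEATURES)
    D = build_patch_tensor(photos, sketches)
    print(D.shape)  # (100, 64, 2, 64)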
Theoretical Foundation: Tensor Algebra
- Tensor: a multidimensional array $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_n}$.
- Tensor (mode-$k$) product:
  $(\mathcal{A} \times_k U)_{i_1 \cdots i_{k-1}\, j_k\, i_{k+1} \cdots i_n} = \sum_{i_k=1}^{I_k} (\mathcal{A})_{i_1 \cdots i_{k-1}\, i_k\, i_{k+1} \cdots i_n}\, (U)_{j_k i_k}$.
- High-Order Singular Value Decomposition (HOSVD):
  $\mathcal{A} = \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 \cdots \times_n U_n$.

How Multilinear Analysis Works: Ensemble Representation
- Ensemble tensor: arrange the samples according to their factors:
  $\mathcal{D} = \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 \cdots \times_{n-1} U_{n-1} \times_n U_n$.
- Core tensor $\mathcal{C}$: controls the interaction between the factors.
- Mode matrices $U_1, \ldots, U_{n-1}$: capture the variations of each factor.
- Base matrix $U_n$: spans the space of the self variations.

How Multilinear Analysis Works: Individual Sample Representation
- $x = \mathcal{C} \times_1 u_1^T \times_2 u_2^T \times_3 \cdots \times_{n-1} u_{n-1}^T \times_n U_n$.
- Core tensor $\mathcal{C}$: coordinates the interaction between the factors.
- $u_k^T$: the vector representation of factor $k$.
- $U_n$: the basis; the preceding mode products yield the coefficients over this basis.

TensorPatches
- Formulation of multi-style patch ensembles:
  $\mathcal{D} = \mathcal{C} \times_1 U_{people} \times_2 U_{positions} \times_3 U_{styles} \times_4 U_{features}$.
  - $\mathcal{D}$: the patch ensemble.
  - $\mathcal{C}$: the core tensor coordinating the interaction between the factors.
  - $U_{people}$: encodes people-related information.
  - $U_{positions}$: encodes position-related information.
  - $U_{styles}$: encodes style-related information.
  - $U_{features}$: the basis spanning the variation subspace of patches.
- The tensor decomposition can be done by HOSVD.

Bidirectional Mapping/Inferring
- Forward transform: map the photo patch space to the common variation space, from which the sketch patch space is inferred: $y \approx A_y B_x x$.
- Backward transform: map the sketch patch space to the common variation space, from which the photo patch space is inferred: $x \approx A_x B_y y$.
Illustration of the relations among the common variation space, the photo patch space, and the sketch patch space.

Hidden Relations
- From the common variation space to the photo/sketch image spaces:
  $I_x = A_x w$, $\quad I_y = A_y w$.
- From the photo/sketch image spaces to the common variation space:
  $w = (A_x^T A_x)^{-1} A_x^T I_x = B_x I_x$, $\quad w = (A_y^T A_y)^{-1} A_y^T I_y = B_y I_y$.
- The people parameter vector $w$ remains to be solved for new face images.

Bayesian Tensor Inference
- We fulfill the backward transform $I_y \rightarrow I_x$ through $w$ by taking these quantities as a whole in a global optimization formulation.
- Our inference approach is deduced from canonical Bayesian statistics, exploiting PCA to represent the photo feature vector $I_x$ (via the latent variable $a$) to be hallucinated.
- The advantage of our approach is that it takes into account the statistics among $a$, $w$, and $I_y$.

Bayesian Tensor Inference (cont.)
- Perform PCA on the training photo vectors $\{I_x\}$:
  $I_x \approx U a + \mu$, $\quad p(a) \propto \exp\{-a^T \Lambda^{-1} a\}$.
- Using the learned relations $I_y = A_y w$ and $w = B_x I_x$, we have
  $p(I_y \mid w) \propto \exp\{-\|I_y - A_y w\|^2 / \lambda_1\}$, $\quad p(w \mid a) \propto \exp\{-\|w - B_x(U a + \mu)\|^2 / \lambda_2\}$.
- We find the MAP solution $a^*$ for hallucinating the optimal $I_x^*$ as follows (a toy least-squares sketch of this step follows the architecture below):
  $a^* = \arg\max_{w,a} p(w, a \mid I_y) = \arg\max_{w,a} p(I_y \mid w, a)\, p(w, a) = \arg\max_{w,a} p(I_y \mid w)\, p(w \mid a)\, p(a)$.

Architecture of our sketch-based facial photo hallucination approach:
(1) Learn the TensorPatches model, taking image derivatives as features, in the training phase.
(2) Obtain the initial result by applying the local geometry preserving method.
(3) Infer the image derivatives of the target photo using the Bayesian Tensor Inference method, given the input image derivatives extracted from the test sketch face.
(4) Conduct gradient correction to hallucinate the final result.
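Since all three factors $p(I_y \mid w)$, $p(w \mid a)$, and $p(a)$ above are Gaussian, the MAP problem is jointly quadratic in $(w, a)$ and can be solved as one stacked linear least-squares system. The NumPy sketch below illustrates this under the simplifying assumption of a diagonal PCA covariance $\Lambda$; the function name and the weights lam1/lam2 are illustrative, and this is not the authors' implementation.

    import numpy as np

    def bayesian_tensor_inference(I_y, A_y, B_x, U, mu, Lambda_diag,
                                  lam1=1.0, lam2=1.0):
        """MAP estimate of (w, a) for the quadratic objective
            ||I_y - A_y w||^2/lam1 + ||w - B_x(U a + mu)||^2/lam2 + a^T diag(Lambda_diag)^{-1} a,
        solved as one stacked linear least-squares problem.
        Returns the hallucinated photo feature vector I_x* = U a* + mu.
        """
        d_y = A_y.shape[0]        # dimension of the sketch feature vector I_y
        p = A_y.shape[1]          # dimension of the people parameter w
        q = U.shape[1]            # dimension of the PCA coefficients a
        BU = B_x @ U
        Bmu = B_x @ mu

        # Stack the three weighted residual blocks into M [w; a] ~= r.
        M = np.zeros((d_y + p + q, p + q))
        r = np.zeros(d_y + p + q)

        # Block 1: likelihood term ||I_y - A_y w||^2 / lam1.
        M[:d_y, :p] = A_y / np.sqrt(lam1)
        r[:d_y] = I_y / np.sqrt(lam1)

        # Block 2: coupling term ||w - B_x(U a + mu)||^2 / lam2.
        M[d_y:d_y + p, :p] = np.eye(p) / np.sqrt(lam2)
        M[d_y:d_y + p, p:] = -BU / np.sqrt(lam2)
        r[d_y:d_y + p] = Bmu / np.sqrt(lam2)

        # Block 3: PCA prior a^T Lambda^{-1} a (diagonal Lambda assumed).
        M[d_y + p:, p:] = np.diag(1.0 / np.sqrt(Lambda_diag))

        z, *_ = np.linalg.lstsq(M, r, rcond=None)
        a_star = z[p:]
        return U @ a_star + mu

    # Toy usage with random matrices standing in for the learned quantities.
    rng = np.random.default_rng(0)
    d_y, d_x, p, q = 32, 32, 10, 8
    A_y = rng.standard_normal((d_y, p))
    B_x = rng.standard_normal((p, d_x))
    U = rng.standard_normal((d_x, q))
    mu = rng.standard_normal(d_x)
    Lambda_diag = np.ones(q)
    I_y = rng.standard_normal(d_y)
    I_x_star = bayesian_tensor_inference(I_y, A_y, B_x, U, mu, Lambda_diag)
    print(I_x_star.shape)  # (32,)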
Experimental Results
Photo hallucination results for Asian and European faces: (a) input sketch images, (b) eigentransform method, (c) local geometry preserving method, (d) our method, (e) ground-truth face photos.

Thanks!
Jan 2007
If you have any questions about this paper, feel free to contact me via wliu5@ie.cuhk.edu.hk.
Visit http://mmlab.ie.cuhk.edu.hk/~face/ for more information about my other works.