Pedestrian Re-identification by Layered Pseudo-3D Pictorial Model Matching
Yuanlu Xu, SYSU, China
merayxu@gmail.com
2012.8.2

Episode 1
Difficulties, Empirical Studies, Intuitions, and Framework

Problem
Matching

Difficulties: Non-overlapping Camera Views
Irrelevant negative samples make classifiers difficult to train.

Difficulties: View Changes

Difficulties: Occlusions
Carried objects block the appearance.

Difficulties: Illumination Changes
Need illumination-invariant features or a light-amending process.

Difficulties: Large Intra-class Variations & Limited Samples for Learning

Difficulties: Poses in VIPeR
[Pie chart] Frontal: 401 (35%); 45°: 188 (17%); 90°: 343 (30%); 180°: 209 (18%).

Difficulties: Occlusions in VIPeR
[Pie chart] No carried object: 586 (47%); backpack: 404 (33%); satchel: 206 (17%); handbag: 44 (3%).

Difficulties: Poses in VIPeR
[Table: pose states of the upper body (arms and torso) and the lower body at frontal, 45°, 90° and 180° views. Upper-body entries: normal, one arm, arm occluding torso. Lower-body entries: normal, scissor shape, X shape, inverted-Y shape, occlusion between legs, thigh occluded, partially occluded, fully occluded. The cell layout is not recoverable.]

Difficulties: Occlusions in VIPeR
[Table: body regions occluded by a backpack, satchel or handbag at frontal, 45°, 90° and 180° views. Entries: upper body, back, lower half of the torso, lower body, upper half of the legs, legs. The cell layout is not recoverable.]

Difficulties: View Difference in VIPeR
[Pie chart] 0°: 2 (0%); 45°: 68 (11%); 90°: 362 (58%); 135°: 95 (15%); 180°: 103 (16%).

Framework
[Diagram] Pedestrian Modeling: Annotated Parts, Prior 3D Pose Templates, Learned Part Classifiers, Pedestrian Image, Part Detection, Pose and View Estimation, Occluded Parts Recovery, Pseudo-3D Pictorial Model.
Matching Inference: Computing Part Signature, Ranking by Coarse Model Comparison, Re-ranking by Layered Graph Matching, Matching Results.

Episode 2
Pedestrian Modeling

Framework: Pedestrian Modeling
[Diagram] Model Learning: Annotated Parts, Prior 3D Pose Templates, Learned Part Classifiers.
Model Inference: Pedestrian Image, Learned Part Classifiers, Part Detection, Prior 3D Pose Templates, Pose and View Estimation, Occluded Parts Recovery, Pseudo-3D Pictorial Model.

Pictorial model
A part-based appearance model to represent non-rigid objects.
D.S. Cheng, M.
Cristani, et al., "Custom pictorial structures for re-identification," BMVC 2011

Pictorial model: characteristics
1. The body is decomposed into a set of parts with configuration $L = \{l_0, l_1, \ldots, l_N\}$.
2. Each part is $l_i = \{x_i, y_i, \theta_i, s_i\}$: position $(x_i, y_i)$, orientation $\theta_i$ and scale $s_i$.
M. Andriluka et al., "Pictorial Structures Revisited: People Detection and Articulated Pose Estimation," CVPR 2009

Pictorial model: posterior
Given the image evidence $D$ extracted from a pedestrian image $I$, the posterior of $L$ is modeled as
$p(L \mid D) \propto p(L) \cdot p(D \mid L)$
$p(L)$: pictorial model prior, formed as a directed tree.
$p(D \mid L)$: likelihood of the image evidence given a pictorial model, obtained from a discriminative appearance model.

Pictorial model: parts
The body model is decomposed into N = 6 parts: chest, head, two thighs and two legs.

Pictorial model: kinematic tree
The kinematic dependencies between parts are represented by a directed tree:
$p(L) = p(l_0) \prod_{(i,j) \in E} p(l_i \mid l_j)$
$E$ denotes the set of all directed edges in the kinematic tree, and $l_0$ is assigned to be the root node (torso).

Pictorial model: prior
The prior for the root part configuration $p(l_0)$ is simply assumed to be uniform.
The part relations are modeled using Gaussian distributions. Although part relations are intuitively not Gaussian in image coordinates, they can be modeled as Gaussian after a transformation to a different space.

Pictorial model: part relations
To model $p(l_i \mid l_j)$, we transform the part configuration $l_i = (x_i, y_i, \theta_i, s_i)$ into the coordinate system of the joint between the two parts. The part relation is then modeled as a Gaussian in the transformed space.

Pictorial model: likelihood
To estimate the likelihood $p(D \mid L)$, two assumptions are made:
1. Different part evidence maps are conditionally independent given the configuration $L$.
2. The part map $d_i$ for part $i$ depends only on its own configuration $l_i$.
The likelihood then simplifies to
$p(D \mid L) = \prod_{i=0}^{N} p(d_i \mid l_i)$
This is justifiable as long as parts do not occlude each other significantly.
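As a rough illustration of how a tree-structured prior and independent per-part likelihoods combine into one score, the sketch below evaluates an unnormalised log-posterior for a toy configuration. It is a minimal sketch under stated assumptions, not the model from the slides: parts carry only a position (no orientation or scale), the pairwise term is a Gaussian on the raw child-parent offset rather than on the joint-space transformation, and the names `log_posterior`, `expected_offset` and the variance value are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_posterior(parts, unary_loglik, edges, expected_offset, var=25.0):
    """Unnormalised log p(L | D) for a tree of parts.

    parts:           part name -> (x, y) position (assumption: no angle/scale)
    unary_loglik:    part name -> log p(d_i | l_i), e.g. from a part classifier
    edges:           list of (child, parent) pairs of the kinematic tree
    expected_offset: (child, parent) -> mean (dx, dy) of the Gaussian relation
    """
    # The uniform root prior is a constant and is dropped.
    score = sum(unary_loglik[name] for name in parts)
    for child, parent in edges:
        (cx, cy), (px, py) = parts[child], parts[parent]
        mx, my = expected_offset[(child, parent)]
        score += log_gaussian(cx - px, mx, var)
        score += log_gaussian(cy - py, my, var)
    return score


# Toy example: torso is the root, head and one thigh hang off it.
edges = [("head", "torso"), ("thigh", "torso")]
expected_offset = {("head", "torso"): (0.0, -40.0), ("thigh", "torso"): (0.0, 50.0)}
unary = {"torso": -1.0, "head": -1.2, "thigh": -0.8}

plausible = {"torso": (50.0, 100.0), "head": (50.0, 60.0), "thigh": (50.0, 150.0)}
implausible = {"torso": (50.0, 100.0), "head": (90.0, 140.0), "thigh": (50.0, 150.0)}

score_plausible = log_posterior(plausible, unary, edges, expected_offset)
score_implausible = log_posterior(implausible, unary, edges, expected_offset)
```

With identical unary terms, the configuration whose head sits above the torso scores higher than the one whose head is displaced below it, which is the behaviour the Gaussian part relations are meant to encode.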
Pictorial model: part classifiers
Train an AdaBoost classifier with simple decision stumps:
$h_t(x) = \operatorname{sign}(\xi_t (x_{n(t)} - \phi_t))$

Pictorial model: posterior factorization
With the discriminative classifiers integrated into the generative probabilistic framework described above, the posterior over the configuration of parts factorizes as:
$p(L \mid D) \propto p(l_0) \cdot \prod_{i=0}^{N} p(d_i \mid l_i) \cdot \prod_{(i,j) \in E} p(l_i \mid l_j)$

Episode 3
Matching Inference

Framework: Matching Inference
[Diagram] Pseudo-3D Pictorial Model, Computing Part Signature, Ranking by Coarse Model Comparison, Re-ranking by Layered Graph Matching, Matching Results.

Part Signature
Color Histograms: HSV characterization, where hue and saturation are taken jointly in a 2D histogram to retain much of the chromatic specificity.
Maximally Stable Color Region (MSCR): detects a set of blob regions by looking at successive steps of an agglomerative clustering of image pixels.
[Figure: source image and its MSCR regions]
M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani, "Person Re-Identification by Symmetry-Driven Accumulation of Local Features," CVPR 2010.

Part Signature: Distance Measures
Given two part signatures $g_i = (CL_i, MSCR_i)$ and $g_j = (CL_j, MSCR_j)$, the distance between $g_i$ and $g_j$ is defined as
$E(g_i, g_j) = \alpha E_{CL}(CL_i, CL_j) + \beta E_{MSCR}(MSCR_i, MSCR_j)$,
where $E_{CL}$ is the Bhattacharyya distance,
$E_{CL}(CL_i, CL_j) = -\ln \sum_{x \in X} \sqrt{CL_i(x) \cdot CL_j(x)}$.

Coarse matching: Distance Measures
$E_{MSCR}$ is defined as
$E_{MSCR}(MSCR_i, MSCR_j) = \min_{x \in MSCR_i,\, y \in MSCR_j} \left[ \gamma \cdot E_y(x, y) + (1 - \gamma) \cdot E_c(x, y) \right]$.
$E_y(x, y)$ measures the Euclidean distance between MSCR centroids; $E_c(x, y)$ measures the Euclidean distance between their mean colors.

Coarse ranking
For each pseudo-3D pictorial model, the part signatures are concatenated and normalized into a single feature vector.
To account for parts of different size and depth, the part signatures are multiplied by a set of weights $W = \{w_i\},\ i = 1, \ldots, N$ (large, front parts receive large weights and vice versa), which gives a coarse model signature $G$.
Computing the distance between model signatures gives a coarse ranking.
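The distance measures and the coarse ranking can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a part signature is represented as a pair `(colour_histogram, blobs)` where each blob is `(centroid, mean_colour)`, the model distance is taken as a weighted sum of per-part distances (a simplification of concatenating the weighted signatures into one vector), and all function names and default values for `alpha`, `beta` and `gamma` are hypothetical.

```python
import math


def bhattacharyya_distance(h1, h2, eps=1e-12):
    """-ln of the Bhattacharyya coefficient of two normalised histograms."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    return -math.log(max(coefficient, eps))  # eps guards against log(0)


def mscr_distance(blobs_i, blobs_j, gamma=0.5):
    """Minimum over blob pairs of the blended centroid and colour distances."""
    return min(
        gamma * math.dist(centroid_i, centroid_j)
        + (1.0 - gamma) * math.dist(colour_i, colour_j)
        for centroid_i, colour_i in blobs_i
        for centroid_j, colour_j in blobs_j
    )


def part_distance(sig_i, sig_j, alpha=0.5, beta=0.5, gamma=0.5):
    """E(g_i, g_j) = alpha * E_CL + beta * E_MSCR."""
    (hist_i, blobs_i), (hist_j, blobs_j) = sig_i, sig_j
    return (alpha * bhattacharyya_distance(hist_i, hist_j)
            + beta * mscr_distance(blobs_i, blobs_j, gamma))


def coarse_rank(query, gallery, weights):
    """Sort gallery identities by weighted sum of part distances to the query.

    query:   list of part signatures, one per body part
    gallery: identity -> list of part signatures in the same part order
    weights: one weight per part (larger for large, front-facing parts)
    """
    def model_distance(model):
        return sum(w * part_distance(p, q)
                   for w, p, q in zip(weights, query, model))
    return sorted(gallery, key=lambda identity: model_distance(gallery[identity]))


# Toy example with a single part per pedestrian.
query = [([0.5, 0.5], [((10.0, 20.0), (0.2, 0.3, 0.4))])]
gallery = {
    "different": [([0.9, 0.1], [((40.0, 60.0), (0.9, 0.1, 0.1))])],
    "same": [([0.5, 0.5], [((10.0, 20.0), (0.2, 0.3, 0.4))])],
}
ranking = coarse_rank(query, gallery, weights=[1.0])
```

The returned list is ordered from most to least similar, so the top entries are the candidates that would be passed on to the fine re-ranking stage.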
Fine re-ranking by layered graph matching
To further improve the matching results, a composite cluster sampling approach is employed.
Given a query pedestrian $G = \{g_1, \ldots, g_n\}$, to find the best match $G'$, define a candidate graph $\Omega = (\mathcal{C}, \mathcal{E})$.
Liang Lin, Xiaobai Liu, and Song-Chun Zhu, "Layered Graph Matching with Composite Cluster Sampling," TPAMI, 2010

Layered graph matching
Input: source graph $G^S$ and target graph $G^T$
Output: layered matching configuration $W$
1. Construct the candidate graph.
2. Sample composite clusters.
   a. Generate a composite cluster.
   b. Re-assign colors to the composite cluster.
   c. Convert to a new state.

Layered graph matching: construct candidate graph (vertices)
1. Start $u$ with a linelet and find the set $V(u)$ of matching candidates.
2. Grow $u$, which reduces the matching candidates.
3. Repeat 1 and 2 until fewer than $k$ matching candidates remain.
Each matching pair $(u_i, v_i)$ becomes a vertex $c_i$ in the candidate graph.

Layered graph matching: construct candidate graph (edges)
Establish the negative and positive edges between vertices and calculate their edge probabilities.
$e = \langle c_i, c_j \rangle$ is a negative edge in two cases:
1. The two candidates are mutually exclusive: $u_i = u_j$.
2. The two candidates overlap: $v_i \cap v_j \neq \emptyset$.
$e = \langle c_i, c_j \rangle$ is a positive edge, scored by the similarity transformation that aligns $c_i$ and $c_j$.

Layered graph matching: generate a composite cluster
CCP: candidates connected by positive "on" edges form a CCP.
(blue lines)
Composite Cluster: a few CCPs connected by negative "on" edges form a composite cluster. (red lines)

Layered graph matching: re-assign color
• Primitives connected by positive edges receive the same color. Those connected by negative edges receive different colors.

Layered graph matching: convert to a new state
• Employ MCMC with a reversible jump between states A and B.
• Let $q(A \to B)$ be the proposal probability for moving from state A to state B.
• The acceptance rate of the move from A to B is the product of the proposal probability ratio and the posterior probability ratio, capped at 1:
$\alpha(A \to B) = \min\left(1,\ \frac{q(B \to A)}{q(A \to B)} \cdot \frac{p(W = B \mid G^S, G^T)}{p(W = A \mid G^S, G^T)}\right)$

Proposal probability ratio:
$\frac{q(B \to A)}{q(A \to B)} = \frac{q(V_{cc} \mid B) \cdot q(\mathrm{coloring}(V_{cc}) = A \mid V_{cc}, B)}{q(V_{cc} \mid A) \cdot q(\mathrm{coloring}(V_{cc}) = B \mid V_{cc}, A)}$
• $q(V_{cc} \mid A)$: the probability of generating $V_{cc}$ at state A.
• $q(\mathrm{coloring}(V_{cc}) = B \mid V_{cc}, A)$: the probability of recoloring the CCPs to state B.

Ratio of generating $V_{cc}$: [equation not preserved in the slide text]

Posterior probability ratio (prior ratio times likelihood ratio):
$\frac{p(W = B \mid G^S, G^T)}{p(W = A \mid G^S, G^T)} = \frac{p(W = B) \cdot p(G^S, G^T \mid W = B)}{p(W = A) \cdot p(G^S, G^T \mid W = A)}$
$p(W) \propto \exp(-\alpha_K K - \alpha_N N) \cdot p(\mathcal{L})$, where $p(\mathcal{L})$ is a Potts model for the label $\mathcal{L}$ that punishes inconsistent assignments.
$p(G^S, G^T \mid W) = \prod_{k=1}^{K} p(g_k, g'_k \mid \Psi_k, \Phi_k) \propto \prod_{k=1}^{K} \exp(-E(g_k, g'_k))$
The computation of the posterior probability ratio only involves the recoloring of candidates in $V_{cc}$.

Layered graph matching: occlusion constraints
[Table: occlusion states of the upper body (arms and torso) at frontal, 45°, 90° and 180° views. Entries: arm occluding torso, normal. The cell layout is not recoverable.]
[Table: occlusion states of the lower body at frontal, 45°, 90° and 180° views. Entries: normal, scissor shape, X shape, inverted-Y shape, occlusion between legs, thigh occluded, partially occluded, fully occluded. The cell layout is not recoverable.]

Layered graph matching: symmetric constraints
Left and right limbs share the same appearance and features.

Layered graph matching: coordination constraints
• When people walk, their diagonal limbs share the same movement tendency.
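The accept/reject step described under "convert to a new state" follows the usual Metropolis-Hastings pattern, which can be sketched as below. This is a minimal sketch under stated assumptions: the proposal probabilities are supplied by the caller as log values, the Potts term $p(\mathcal{L})$ is omitted from the log-posterior, and the function names and default values of `alpha_k` and `alpha_n` are hypothetical.

```python
import math
import random


def log_posterior(k, n, pair_energies, alpha_k=1.0, alpha_n=1.0):
    """Unnormalised log posterior of a matching state.

    k, n:          the K and N counts that appear in the prior term
    pair_energies: E(g_k, g'_k) for each matched layer pair
    Assumption: the Potts label prior is left out of this sketch.
    """
    return -alpha_k * k - alpha_n * n - sum(pair_energies)


def acceptance_probability(log_q_a_to_b, log_q_b_to_a, log_post_a, log_post_b):
    """min(1, proposal ratio * posterior ratio), evaluated in the log domain."""
    log_ratio = (log_q_b_to_a - log_q_a_to_b) + (log_post_b - log_post_a)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def accept_move(log_q_a_to_b, log_q_b_to_a, log_post_a, log_post_b, rng=random):
    """Draw a uniform number and decide whether to jump from state A to B."""
    alpha = acceptance_probability(log_q_a_to_b, log_q_b_to_a,
                                   log_post_a, log_post_b)
    return rng.random() < alpha


# Toy example: state B lowers the matching energy, so the move is always taken
# when the proposal is symmetric.
post_a = log_posterior(k=2, n=1, pair_energies=[0.5, 0.5])
post_b = log_posterior(k=2, n=1, pair_energies=[0.2, 0.2])
alpha_improving = acceptance_probability(0.0, 0.0, post_a, post_b)
alpha_worsening = acceptance_probability(0.0, 0.0, post_b, post_a)
```

Working in the log domain avoids underflow when the energies are large, and because only the candidates in $V_{cc}$ change colour, in practice only their terms would need to be re-evaluated between the two states.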
Episode 4
Summary

Framework Revisit
[Diagram] Pedestrian Modeling: Annotated Parts, Prior 3D Pose Templates, Learned Part Classifiers, Pedestrian Image, Part Detection, Pose and View Estimation, Occluded Parts Recovery, Pseudo-3D Pictorial Model.
Matching Inference: Computing Part Signature, Ranking by Coarse Model Comparison, Re-ranking by Layered Graph Matching, Matching Results.

Contributions
1. A modified pseudo-3D pictorial model
   I. Modeling occlusions explicitly by incorporating 3D space information among different parts.
   II. Decreasing intra-class variations by reproducing view-invariant appearances of pedestrians.
   III. A novel learning approach for modeling pedestrians.
2. A novel graph-matching-based inference method
   I. A coarse-to-fine matching optimization process.
   II. A fine bottom-up matching algorithm, which fits the pseudo-3D pictorial model.

QUESTIONS?

Empirical studies
How do humans find the two matching pedestrians?
Human vs. Machine
Cues humans employ:
1. The color of the parts
2. The "type" of clothing worn
3. The gender, the skin, the presence of discriminant particulars
Generalization:
1. A part-based appearance model
2-3. Existing semantic gaps
D.S. Cheng, M. Cristani, et al., "Custom pictorial structures for re-identification," BMVC 2011.

Empirical studies: objective of pedestrian matching
Decrease the intra-class variations.

Empirical studies: motivation
1. We try to rebuild the missing parts that are blocked by other objects in the graphs.
2. We try to find stable features that resist view changes and occlusions for matching, and to redraw the parts that increase the intra-class variations.

Empirical studies: view changes
Which parts are usually missing under view changes? It depends on the angle from which the photo is taken.

Empirical studies: view changes, missing parts
Front: the left leg is easily mistaken for the right leg, or vice versa.
90°: shoulder information is partially lost; the legs tend to occlude each other.
180°: the left leg is easily mistaken for the right leg, or vice versa.

Empirical studies: view changes, missing parts
Front and back: the legs need layered reconstruction, using the uncovered layer to rebuild the covered layer.
90°: shoulder information needs to be rebuilt, and the legs occasionally need to be rebuilt as well.

Empirical studies: view changes
What stable features can we find under view changes? The options are:
1. Features on clothes: front, sleeves and back.
2. Features on pants.

Empirical studies: view changes, appearance
Are all the features mentioned stable?
Front to 90°: front information is lost.
Front to 180°: front information is lost; there is no previous back information.
90° to 180°: there is no previous back information.

Empirical studies: view changes, features
Conclusion: features on sleeves and pants are stable under view changes.

Empirical studies
Next: what can we do about occlusions? Carried objects may block the appearance and disturb the detection process in person modeling.

Empirical studies: occlusions
The problem again needs to be addressed from two aspects:
1. Use the person model (rebuild missing parts).
2. Find stable features that resist the occlusions.
The three main barrier objects: packs, satchels, handbags.

Empirical studies: occlusions
[Figure: example images of packs, satchels and handbags at front, 90° and 180° views]

Intuition: occlusions
Since a carried object blocks part of a person's body in certain views, we can rely on particular texture features at particular angles. For example, we can use the texture of the sleeves, since a backpack blocks the person's back when the picture is taken from behind.
Separate barrier objects from the body, and redraw the appearance of the blocked parts.
Use the barrier object as a cue and find it in the target graph.
(color)

Conclusion: person model rebuilding, missing parts
Body: front, one leg may be blocked; 90°, one arm is blocked and one leg is blocked; 180°, one leg may be blocked.
Pack: front, no; 90°, no; 180°, back is blocked.
Satchel: front, no; 90°, torso is blocked; 180°, no.
Handbag: front, no; 90°, legs are blocked; 180°, no.

Conclusion: stable appearance
Use the appearance of pants or clothes, especially the features of sleeves, which may remain more stable than the others.

Empirical studies: illumination changes
Illumination changes can influence the appearance of the target person, such as the color of clothes and skin.

Layered graph matching
• The center positions of linelets, which consist of primitives, are denoted $x_0, x_1$ and so on.
• Select $x_0$ and find its matches in $\{y : v_0, v_1, \ldots\}$.
• The standard for matching is [equation not preserved in the slide text]

Layered graph matching
1. All vertices in a layer receive that layer's unique color.
2. A matching function is used to match layers in $G^S$ with layers in $G^T$.
3. If matched, $(g_k, g'_k)$ becomes a couple and receives the same color.