defense slides - UCLA Computer Science

Moving Object Segmentation by Pursuing Local Spatio-Temporal Manifolds
Yuanlu Xu, 2012

Problem
Segmenting the moving foreground in a video.

Related work & intuitions
Dynamic background ~ dynamic textures
Dynamic textures: image sequences of certain textures moving and changing under certain properties.
S. Soatto, G. Doretto, and Y. Wu. “Dynamic textures”. IJCV 2003.

Related work & intuitions
Dynamic background ~ dynamic textures
How to model them? As the output of a linear dynamic system driven by IID Gaussian noise.
Intuition for moving object segmentation: a complex scene containing dynamic background is composed of several independent dynamic textures.

Related work & intuitions
Illumination changes ~ modeling illumination
[Figure: eigenvalue curves of bricks in different states: (a) background, (b) foreground occlusion.]
Y. Zhao et al. “Spatiotemporal patches for night background modeling by subspace learning”. ICPR 2008.

Related work & intuitions
Illumination changes ~ modeling illumination
Intuition for handling illumination changes: the set of bricks of a given background location under various lighting conditions lies in a low-dimensional manifold.

Related work & intuitions
Indistinctive changes
Similar appearance → incorporating extra information.
Intuition for distinguishing indistinctive moving objects: model the background appearance variations, estimate the next state, and single out moving objects that do not follow similar changes.

Intuitions & assumptions
Intuitions:
1. A complex scene containing dynamic background is composed of several independent dynamic textures.
2. The set of bricks of a given background location under various lighting conditions lies in a low-dimensional manifold.
3. Modeling background appearance variations.
Assumptions:
1. Given a background location, the sequence of bricks (under dynamic changes and illumination changes) lies in a low-dimensional manifold, and the variations are locally linear.
2. Bricks with indistinctive and distinctive foreground occlusions can be well separated from the background by distinguishing differences in both appearance and variations.

Representation
Segmenting bricks in video: each frame is divided into patches of size $h \cdot w$. At each location, $t$ patches are combined to form a brick.

Representation
Center Symmetric Spatio-Temporal LTP (CS-STLTP) descriptor
[Figure: CS-STLTP coding example. A 3x3x3 cube of pixel values is coded over 4 spatio-temporal planes (axes X, Y, T) with scale threshold t = 0.2, giving a feature vector with entries in {1, 0, -1}.]

Mathematical formulation
Given a brick sequence $\mathbf{V} = \{v_1, v_2, \dots, v_n\} \in \mathbb{R}^{m \times n}$ of a background location, we assume that the manifold in which $\mathbf{V}$ lies has dimension $d$. The structure of this manifold is
$$ v_i = \sum_{j=1}^{d} z_{i,j} C_j + \omega $$
$\mathbf{C} = [C_1, C_2, \dots, C_d]$: bases of the manifold.
$z_{i,j}$: coefficient of basis $C_j$ given $v_i$.
$\omega$: structural residual.

Mathematical formulation
Given the corresponding coding $\mathbf{Z} = \{z_1, z_2, \dots, z_n\} \in \mathbb{R}^{d \times n}$ for $\mathbf{V} = \{v_1, v_2, \dots, v_n\}$, the coding variation is locally linear, according to the assumption. The coding variation within this manifold is
$$ z_{i+1} = A z_i + \epsilon_i $$
$z_{i+1}, z_i$: two successive states.
$A \in \mathbb{R}^{d \times d}$: description of the coding variation.
$\epsilon_i$: state residual.

Mathematical formulation
The problem of pursuing the structure of a manifold and the variation within it is formulated as minimizing the empirical energy function
$$ \min\; f_n(\mathbf{C}, A) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \lVert v_i - \mathbf{C} z_i \rVert_2^2 + \frac{1}{2} \lVert z_i - A z_{i-1} \rVert_2^2 \right) $$
with $\mathbf{V} = \{v_1, v_2, \dots, v_n\} \in \mathbb{R}^{m \times n}$, $\mathbf{Z} \in \mathbb{R}^{d \times n}$, $\mathbf{C} \in \mathbb{R}^{m \times d}$, $A \in \mathbb{R}^{d \times d}$.
The first term minimizes the structural residual; the second minimizes the state residual.

Mathematical formulation
Because $\mathbf{Z}$ is unknown, we rewrite the problem as a joint optimization problem over $\mathbf{C}$, $\mathbf{Z}$, $A$:
$$ \min\; f(\mathbf{C}, \mathbf{Z}, A) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \lVert v_i - \mathbf{C} z_i \rVert_2^2 + \frac{1}{2} \lVert z_i - A z_{i-1} \rVert_2^2 \right) $$
The problem is not jointly convex, but it is convex with respect to $(\mathbf{C}, A)$ when $\mathbf{Z}$ is fixed, and with respect to $\mathbf{Z}$ when $(\mathbf{C}, A)$ is fixed.
A numerical solution: alternate between the two groups of variables, $(\mathbf{C}, A)$ and $\mathbf{Z}$, minimizing over one while keeping the other fixed.

Representation
$$ \min\; f(\mathbf{C}, \mathbf{Z}, A) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \lVert v_i - \mathbf{C} z_i \rVert_2^2 + \frac{1}{2} \lVert z_i - A z_{i-1} \rVert_2^2 \right) $$
Rewritten as a linear dynamic system (LDS):
$$ v_i = C z_i + \omega_i, \qquad \omega_i \overset{\text{IID}}{\sim} N(0, R) $$
$$ z_{i+1} = A z_i + \epsilon_i, \qquad \epsilon_i \overset{\text{IID}}{\sim} N(0, Q) $$
Structural residual → structural noise $\omega_i$.
State residual → state noise $\epsilon_i$.

Learning
$$ v_i = C z_i + \omega_i, \qquad z_{i+1} = A z_i + \epsilon_i, \qquad \omega_i \overset{\text{IID}}{\sim} N(0, R), \qquad \epsilon_i \overset{\text{IID}}{\sim} N(0, Q) $$
Initial learning: given a training sequence $V = \{v_1, v_2, \dots, v_n\}$, identify $C_n, A_n, R_n, Q_n$ such that
$$ v_i = C_n z_i + \omega_i, \qquad z_{i+1} = A_n z_i + \epsilon_i $$
Online learning: given a new brick $v_{n+1}$, incrementally learn $C_{n+1}, A_{n+1}, R_{n+1}, Q_{n+1}$ such that
$$ v_{i+1} = C_{n+1} z_{i+1} + \omega_{i+1}, \qquad z_{i+2} = A_{n+1} z_{i+1} + \epsilon_{i+1} $$

Learning
Initial learning: sub-optimal analytical solution.
S. Soatto, G. Doretto, and Y. Wu. “Dynamic textures”. IJCV 2003.
Online learning:
Learning $C_{n+1}$: incremental subspace learning, with Candid Covariance-free IPCA (CCIPCA) and IPCA.
J. Weng et al. “Candid covariance-free incremental principal component analysis”. TPAMI 2003.
Y. Li. “On incremental and robust subspace learning”. Pattern Recognition 2004.
Learning $A_{n+1}$: a linear problem over the latest $l$ states.

Inference
For a new brick $v_{n+1}$, the segmentation of the moving object is decided by the structural noise and the state noise.
Structural noise:
$$ z'_{n+1} = C_n^T v_{n+1}, \qquad \omega_{n+1} = v_{n+1} - C_n z'_{n+1} $$
State noise:
$$ \epsilon_n = z'_{n+1} - A_n z_n $$

Experimental Results
Datasets
Busy scenes: Airport, Train Station.
Dynamic scenes: Water Surface, Swaying Trees, Heavy Rain, Active Fountain, Waving Curtain, Floating Bottle.
Illumination changes: Sudden Light, Gradual Light.

Experimental Results
Scene                 GMM    ImGMM  OnlineAR  JDR    Struct 1-SVM  SILTP  STDB (RGB)  STDB (Ftr.)
1# Airport            46.99  47.36  62.72     60.23  65.35         68.14  75.52       66.40
2# Floating Bottle    57.91  57.77  43.79     45.64  47.87         59.57  69.04       75.85
3# Waving Curtain     62.75  74.58  77.86     72.72  77.34         78.01  87.74       79.57
4# Active Fountain    52.77  60.11  70.41     68.53  74.94         76.33  76.85       79.68
5# Heavy Rain         71.11  81.54  78.68     75.88  82.62         76.71  86.86       81.35
6# Sudden Light       47.11  51.37  37.30     52.26  47.61         52.63  51.56       70.23
7# Gradual Light      51.10  50.12  13.16     47.48  62.44         54.86  54.84       72.52
8# Train Station      65.12  68.80  36.01     57.68  61.79         67.05  73.43       66.46
9# Swaying Trees      19.51  23.25  63.54     45.61  24.38         42.54  43.70       48.49
10# Water Surface     79.54  86.01  77.31     84.27  83.13         74.30  88.54       87.88
Average               55.39  59.56  57.02     60.23  59.79         63.08  70.81       72.84

Experimental Results
[Result figures not preserved.]

Experimental Results
Selection of structural update approach
Scene                 CCIPCA accuracy (%)  IPCA accuracy (%)
1# Airport            75.52                65.13
2# Floating Bottle    69.04                70.02
3# Waving Curtain     87.74                78.47
4# Active Fountain    76.85                81.38
5# Heavy Rain         86.86                79.84
6# Sudden Light       51.56                53.63
7# Gradual Light      54.84                59.79
8# Train Station      73.43                68.69
9# Swaying Trees      43.70                70.17
10# Water Surface     88.54                89.43
Average               70.81                71.66
Efficiency (fps)      4.1                  2.3

Dynamic scenes: IPCA is much better than CCIPCA.
Busy scenes: CCIPCA is much better than IPCA.
Illumination changes: IPCA is slightly better than CCIPCA.
Efficiency: CCIPCA is much faster than IPCA.

Contribution
1. Formulating the problem of modeling background by pursuing local spatio-temporal manifolds of video brick sequences.
2. Representing spatio-temporal statistics in video bricks with the CS-STLTP descriptor.
3. Pursuing local spatio-temporal manifolds with two LDSs: a time-invariant LDS for initial learning and a time-variant LDS for online learning.
4. Learning online the structure of local spatio-temporal manifolds with incremental subspace learning, and the state variations by re-solving linear problems.

Problems
1. CS-STLTP handles illumination changes well, but it is not sufficient to capture variation statistics.
2. In highly dynamic scenes, the assumption of locally linear variation hardly holds.
3. CCIPCA struggles to follow large changes in the structure of the manifold. IPCA behaves better than CCIPCA but has a higher computational complexity.

Published Papers
1. Yuanlu Xu, Hongfei Zhou, Qing Wang, Liang Lin. “Realtime Object-of-Interest Tracking by Learning Composite Patch-based Templates”. ICIP 2012 (accepted).
2. Liang Lin, Yuanlu Xu, Xiaodan Liang. “Complex Background Subtraction by Pursuing Dynamic Spatio-temporal Manifolds”. ECCV 2012 (submitted).

QUESTIONS?

Difficulties
Dynamic backgrounds.
Illumination changes (especially sudden changes).

Difficulties
Indistinctive moving objects.
Moving camera (e.g., shaking, hand-held).

Contribution
1. Formulating the problem of modeling background by pursuing local spatio-temporal manifolds of video brick sequences.
2. Representing spatio-temporal statistics in video bricks.
3. Pursuing local spatio-temporal manifolds.
4. Maintaining local spatio-temporal manifolds online.

Mathematical formulation
As in sparse coding, to prevent $\mathbf{C}$ from becoming arbitrarily large, which would make $\mathbf{Z}$ arbitrarily small, we add the constraint $\lVert C_k \rVert_2 \le 1$. The constraint set $\Gamma$ is formulated as
$$ \Gamma \triangleq \{ \mathbf{C} \in \mathbb{R}^{m \times d} : \forall k = 1, 2, \dots, d,\; \lVert C_k \rVert_2 \le 1 \} $$
For all $\lVert C_1 \rVert_2 \le 1$, $\lVert C_2 \rVert_2 \le 1$ and all $0 \le \theta \le 1$,
$$ \lVert \theta C_1 + (1-\theta) C_2 \rVert_2 \le \lVert \theta C_1 \rVert_2 + \lVert (1-\theta) C_2 \rVert_2 = \theta \lVert C_1 \rVert_2 + (1-\theta) \lVert C_2 \rVert_2 \le \theta + (1-\theta) = 1 $$
Thus $\Gamma$ is a convex set.
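The convexity argument for $\Gamma$ can be checked numerically. The sketch below is only an illustration in Python with NumPy: the helper `project_onto_gamma`, the matrix sizes and the random test matrices are assumptions made here, not part of the slides. It rescales any column whose norm exceeds 1 and then verifies that convex combinations of two matrices in $\Gamma$ keep every column norm at or below 1.

```python
import numpy as np

def project_onto_gamma(C):
    """Rescale every column of C whose L2 norm exceeds 1 back to norm 1.

    Hypothetical helper: one simple way to enforce ||C_k||_2 <= 1.
    """
    norms = np.linalg.norm(C, axis=0)
    return C / np.maximum(norms, 1.0)

rng = np.random.default_rng(0)
C1 = project_onto_gamma(rng.normal(size=(6, 3)))
C2 = project_onto_gamma(rng.normal(size=(6, 3)))

# Largest column norm over a grid of convex combinations theta*C1 + (1-theta)*C2.
thetas = np.linspace(0.0, 1.0, 11)
max_norm = max(
    np.linalg.norm(t * C1 + (1.0 - t) * C2, axis=0).max() for t in thetas
)
print("largest column norm over all combinations:", max_norm)
```

A column-wise rescaling of this kind is one common way to keep a basis inside such a constraint set after an unconstrained update; whether the original work uses it is not stated in the slides.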
Mathematical formulation
Because $\mathbf{Z}$ is unknown, we rewrite the problem as a joint optimization problem over $\mathbf{C}$, $\mathbf{Z}$, $A$:
$$ \min\; f(\mathbf{C}, \mathbf{Z}, A) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \lVert v_i - \mathbf{C} z_i \rVert_2^2 + \frac{1}{2} \lVert z_i - A z_{i-1} \rVert_2^2 \right) \quad \text{subject to } \mathbf{C} \in \Gamma $$
The problem is not jointly convex, but it is convex with respect to $(\mathbf{C}, A)$ when $\mathbf{Z}$ is fixed, and with respect to $\mathbf{Z}$ when $(\mathbf{C}, A)$ is fixed.
A numerical solution: alternate between the two groups of variables, minimizing over one while keeping the other fixed.

Mathematical formulation
In practice, the joint optimization problem above is simplified into a two-step optimization:
1. Rewrite the problem as a time-variant linear dynamic system and solve for the structure of the system, ignoring the state (coding) variation.
2. Given the structure of the system, solve for the state variation, based on the corresponding state of each brick.

Representation
Local Binary Pattern (LBP) / Local Ternary Pattern (LTP)

Representation
Scale Invariant LTP (SILTP)
S. Liao et al. “Modeling pixel process with scale invariant local patterns for background subtraction in complex scenes”. CVPR 2010.

Representation
Scale Invariant LTP (SILTP)
SILTP is more robust in handling scale changes (illumination changes).

Representation
[Figure: the CS-STLTP coding example from the earlier Representation slide, repeated.]

Representation
Center Symmetric Coding
Pixel layout around the center $P_c$:
P0 P1 P2
P7 Pc P3
P6 P5 P4
The 8 neighboring pixels around the center are formed into 4 pairs $(P_0, P_4)$, $(P_1, P_5)$, $(P_2, P_6)$, $(P_3, P_7)$. Comparing each pair gives the codes S0, S1, S2, S3.

Representation
$$ \min\; f(\mathbf{C}, \mathbf{Z}, A) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \lVert v_i - \mathbf{C} z_i \rVert_2^2 + \frac{1}{2} \lVert z_i - A z_{i-1} \rVert_2^2 \right) $$
Rewritten as a linear dynamic system (LDS):
$$ v_i = C z_i + \omega_i, \qquad \omega_i \overset{\text{IID}}{\sim} N(0, R) $$
$$ z_{i+1} = A z_i + \epsilon_i, \qquad \epsilon_i \overset{\text{IID}}{\sim} N(0, Q) $$
Structure of the manifold → appearance matrix $C$.
State variations of the manifold → dynamics matrix $A$.
Structural residual → structural noise $\omega_i$.
State residual → state noise $\epsilon_i$.

Initial learning
Sub-optimal analytical solution
Assumption:
1. The dimension of the manifold is $d$ and the dimension of the state noise is $d_\epsilon$, with $d > d_\epsilon$. The appearance matrix satisfies $C_n^T C_n = I_d$.
2. The analytical solution for the structure of the manifold is [equation missing in the source]. The decomposition is obtained by SVD:
$$ W = U S V^T, \qquad C_n = U(:, 1{:}d), \qquad Z_n = S(1{:}d, 1{:}d)\, V(:, 1{:}d)^T $$
S. Soatto, G. Doretto, and Y. Wu. “Dynamic textures”. IJCV 2003.

Initial learning
Given the states $z_1, z_2, \dots, z_n$, the dynamics matrix $A_n$ is obtained by solving a linear problem [equation missing in the source].
To estimate the noise covariance $Q_n$, we treat $\epsilon_i$ as the reconstruction error $e_i = z_{i+1} - A_n z_i$, and $Q_n$ is represented as
$$ Q_n = E[e_i e_i^T] = \lim_{j \to +\infty} \frac{1}{j} \sum_{k=1}^{j} e_k e_k^T \approx \frac{1}{n-1} \sum_{k=1}^{n-1} e_k e_k^T $$
To reduce the dimension of $e_i$, let $Q_n = B_n B_n^T$ and apply PCA to $Q_n$, so that $\epsilon_i = B^{-1} e_i$.

Initial learning
Since different manifolds have different dynamic properties, the dimension of each manifold is determined by the training samples.
Static: low dimension.
Dynamic: high dimension.

Online learning
Against foreground occlusions
We define a noise-free video brick under the current model to compensate for the missing background samples. The noise-free video brick $v_{n+1}$ is defined as [equation missing in the source].

Online learning
To update the structure of the manifold, we regard $W_{n+1}$ as the extension of $W_n$ by a new column (the update sample). The problem of updating $C_{n+1}$ is formulated as incremental subspace learning. To find a more effective approach, we employ two incremental subspace learning methods:
1. Candid Covariance-free Incremental PCA (CCIPCA), which does not estimate the covariance matrix.
2. Incremental PCA (IPCA), which estimates the covariance matrix.

Online learning
CCIPCA
J. Weng et al. “Candid covariance-free incremental principal component analysis”. IEEE TPAMI 2003.

Online learning
IPCA
For a $d$-dimensional manifold with eigenvectors $C_n$ and eigenvalues $\Lambda_n$, the covariance matrix is estimated as [equation missing in the source].
With the new sample, the new covariance matrix is estimated as [equation missing in the source].
The new covariance matrix is then used to estimate the new eigenvectors $C_{n+1}$ and eigenvalues $\Lambda_{n+1}$.
Y. Li. “On incremental and robust subspace learning”. Pattern Recognition 2004.

Online learning
Update the state variation $A_{n+1}$, $B_{n+1}$ by re-estimating the new state $z_{n+1}$. $A_{n+1}$ is updated by re-solving the linear problem, and $B_{n+1}$ by re-estimating the covariance matrix:
$$ [\, e_{n-l+1} \;\; e_{n-l+2} \;\; \cdots \;\; e_n \,] = [\, z_{n-l+2} \;\; z_{n-l+3} \;\; \cdots \;\; z_{n+1} \,] - A_n [\, z_{n-l+1} \;\; z_{n-l+2} \;\; \cdots \;\; z_n \,] $$
$$ Q_{n+1} = E[e_i e_i^T] = \frac{1}{l} \sum_{k=n-l+1}^{n} e_k e_k^T $$

Online learning
Anti-degeneration

Algorithm

Experimental Results
The method behaves poorly on highly dynamic backgrounds!
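As a rough illustration of the initial learning and inference steps described in these slides, the sketch below identifies an LDS from a brick sequence with a truncated SVD and then computes the structural and state noise of a new brick. It is a minimal sketch under stated assumptions, not the author's implementation: the function names `learn_lds` and `brick_noises`, the synthetic data, and the use of ordinary least squares (via a pseudo-inverse) for the dynamics matrix are all choices made here for illustration.

```python
import numpy as np

def learn_lds(V, d):
    """Identify C, Z, A, Q from bricks stored as the columns of V (m x n)."""
    U, S, Vt = np.linalg.svd(V, full_matrices=False)
    C = U[:, :d]                       # appearance matrix, C.T @ C = I_d
    Z = S[:d, None] * Vt[:d, :]        # states z_1..z_n as columns (d x n)
    # Assumption: A is the least-squares solution of Z[:, 1:] ~ A @ Z[:, :-1].
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
    E = Z[:, 1:] - A @ Z[:, :-1]       # reconstruction errors e_1..e_{n-1}
    Q = E @ E.T / (V.shape[1] - 1)     # sample estimate of the state noise covariance
    return C, Z, A, Q

def brick_noises(v_new, z_prev, C, A):
    """Structural noise and state noise of a new brick under the learned model."""
    z_new = C.T @ v_new                # z'_{n+1} = C_n^T v_{n+1}
    omega = v_new - C @ z_new          # structural noise
    eps = z_new - A @ z_prev           # state noise
    return omega, eps

# Synthetic background (assumed): a slowly rotating 2-D state embedded in 20 dimensions.
rng = np.random.default_rng(0)
m, n, d = 20, 200, 2
C_true, _ = np.linalg.qr(rng.normal(size=(m, d)))
angle = 0.1
A_true = np.array([[np.cos(angle), -np.sin(angle)],
                   [np.sin(angle),  np.cos(angle)]])
Z_true = np.empty((d, n + 1))
Z_true[:, 0] = [1.0, 0.0]
for i in range(n):
    Z_true[:, i + 1] = A_true @ Z_true[:, i]
V = C_true @ Z_true[:, :n] + 0.01 * rng.normal(size=(m, n))

C, Z, A, Q = learn_lds(V, d)

v_bg = C_true @ Z_true[:, n]             # next brick, pure background
v_fg = v_bg + 2.0 * rng.normal(size=m)   # same brick with a simulated occluder
om_bg, eps_bg = brick_noises(v_bg, Z[:, -1], C, A)
om_fg, eps_fg = brick_noises(v_fg, Z[:, -1], C, A)

print("structural noise norm (background, occluded):",
      np.linalg.norm(om_bg), np.linalg.norm(om_fg))
print("state noise norm (background, occluded):",
      np.linalg.norm(eps_bg), np.linalg.norm(eps_fg))
```

In this toy setting the occluded brick leaves a much larger structural noise than the background brick, which is the cue the Inference slide relies on. How the two noise terms are thresholded or combined into a segmentation decision is not specified in the slides and is left out here.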
