Derivations for the Layered Dynamic Texture and Temporally-Switching Layered Dynamic Texture

Antoni B. Chan and Nuno Vasconcelos
Statistical Visual Computing Lab
Department of Electrical and Computer Engineering
University of California, San Diego

SVCL-TR 2009/01, June 2009

Abstract

This is the supplemental material for the paper "Variational Layered Dynamic Textures" in CVPR 2009 [1]. The supplement contains derivations for the variational approximation of the layered dynamic texture (LDT), and for the EM algorithm and variational approximation of the temporally-switching layered dynamic texture (TS-LDT).

Author email: abchan@ucsd.edu

(c) University of California San Diego, 2009. This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of the Statistical Visual Computing Laboratory of the University of California, San Diego; an acknowledgment of the authors and individual contributors to the work; and all applicable portions of the copyright notice. Copying, reproducing, or republishing for any other purpose shall require a license with payment of fee to the University of California, San Diego. All rights reserved.

SVCL technical reports are available on the SVCL web page at http://www.svcl.ucsd.edu

University of California, San Diego
Statistical Visual Computing Laboratory
9500 Gilman Drive, Mail code 0407
EBU 1, Room 5512
La Jolla, CA 92093-0407

1 Derivation of the variational approximation for LDT

In this section, we derive a variational approximation for the layered dynamic texture (LDT). Substituting the approximate factorial distribution

  q(X, Z) = \prod_{j=1}^{K} q(x^{(j)}) \prod_{i=1}^{m} q(z_i)    (S.1)

into the L function of (10) yields

  L(q(X, Z)) = \int \prod_{j=1}^{K} q(x^{(j)}) \prod_{i=1}^{m} q(z_i) \, \log \frac{\prod_{j=1}^{K} q(x^{(j)}) \prod_{i=1}^{m} q(z_i)}{p(X, Y, Z)} \, dX \, dZ.    (S.2)

This is minimized by sequentially optimizing each of the factors q(x^{(j)}) and q(z_i), while holding the remaining factors constant [2]. For convenience, we define the variable W = {X, Z}. Rewriting (S.2) in terms of a single factor q(w_l), while holding all others constant,

  L(q(W)) \propto \int q(w_l) \log q(w_l) \, dw_l - \int q(w_l) \left[ \int \prod_{k \neq l} q(w_k) \log p(W, Y) \, dW_{k \neq l} \right] dw_l    (S.3)
          = \int q(w_l) \log q(w_l) \, dw_l - \int q(w_l) \log \tilde{p}(w_l, Y) \, dw_l    (S.4)
          = D\!\left( q(w_l) \,\|\, \tilde{p}(w_l, Y) \right),    (S.5)

where in (S.3) we have dropped terms that do not depend on q(w_l) (and hence do not affect the optimization), and defined \tilde{p}(w_l, Y) by

  \log \tilde{p}(w_l, Y) \propto E_{W_{k \neq l}}[ \log p(W, Y) ],    (S.6)

where E_{W_{k \neq l}}[ \log p(W, Y) ] = \int \prod_{k \neq l} q(w_k) \log p(W, Y) \, dW_{k \neq l}. Since (S.5) is minimized when q^*(w_l) = \tilde{p}(w_l, Y), the log of the optimal factor q(w_l) is, up to a normalization constant, the expectation of the joint log-likelihood with respect to the other factors W_{k \neq l}. We next discuss the joint distribution of the LDT, and then derive the forms of the optimal factors q(x^{(j)}) and q(z_i). For convenience, we ignore normalization constants during the derivation, and reinstate them after the forms of the factors are known.
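To make the alternating optimization concrete, the following is a minimal sketch (ours, not part of the original derivation) of the coordinate-ascent loop implied by (S.3)-(S.6), written in NumPy-style Python. The functions `update_state_factor` and `update_assignment_factor` are hypothetical placeholders for the closed-form updates derived in Sections 1.2 and 1.3.

```python
import numpy as np

# Sketch of the mean-field coordinate ascent for the LDT; the two factor
# updates below are hypothetical stand-ins for the closed forms of
# Sections 1.2 and 1.3.
def variational_inference_ldt(Y, params, K, n_iters=20):
    """Alternately refresh q(x^(j)) and q(z_i) while holding the other fixed."""
    m = Y.shape[0]                     # number of pixels
    h = np.ones((m, K)) / K            # h[i, j] = E_{q(z_i)}[z_i^(j)]
    state_factors = [None] * K         # sufficient statistics of each q(x^(j))

    for _ in range(n_iters):
        # (S.15)-(S.17): q(x^(j)) from the expected joint log-likelihood,
        # with pixel i weighted by its current responsibility h[i, j]
        for j in range(K):
            state_factors[j] = update_state_factor(Y, params, h[:, j], j)   # hypothetical
        # (S.19)-(S.26): q(z_i) from the expected observation log-likelihood
        # under each q(x^(j)) and the MRF neighbor terms
        h = update_assignment_factor(Y, params, state_factors, h)           # hypothetical
    return state_factors, h
```

Each pass can only decrease the divergence in (S.2), so in practice the loop runs until the change in the assignment probabilities falls below a tolerance.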
1.1 Joint distribution of the LDT

The LDT model assumes that the state processes X = {x^{(j)}}_{j=1}^{K} and the layer assignments Z are independent, i.e. the layer dynamics are independent of their locations. Under this assumption, the joint distribution factors as

  p(X, Y, Z) = p(Y|X, Z) \, p(X) \, p(Z)    (S.7)
             = \prod_{i=1}^{m} \prod_{j=1}^{K} p(y_i | x^{(j)}, z_i = j)^{z_i^{(j)}} \prod_{j=1}^{K} p(x^{(j)}) \, p(Z),    (S.8)

where Y = {y_i}_{i=1}^{m}. Each state sequence is a Gauss-Markov process, with distribution

  p(x^{(j)}) = p(x_1^{(j)}) \prod_{t=2}^{\tau} p(x_t^{(j)} | x_{t-1}^{(j)}),    (S.9)

where the individual state densities are

  p(x_t^{(j)} | x_{t-1}^{(j)}) = G(x_t^{(j)}, A^{(j)} x_{t-1}^{(j)}, Q^{(j)}), \qquad p(x_1^{(j)}) = G(x_1^{(j)}, \mu^{(j)}, Q^{(j)}),    (S.10)

and G(x, \mu, \Sigma) is a Gaussian of mean \mu and covariance \Sigma. When conditioned on state sequences and layer assignments, pixel values are independent, and the pixel trajectories are distributed as

  p(y_i | x^{(j)}, z_i = j) = \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_i = j),    (S.11)

where

  p(y_{i,t} | x_t^{(j)}, z_i = j) = G(y_{i,t}, C_i^{(j)} x_t^{(j)}, r^{(j)}).    (S.12)

Finally, the layer assignments are jointly distributed as

  p(Z) = \frac{1}{Z_Z} \prod_{i=1}^{m} V_i(z_i) \prod_{(i,i') \in E} V_{i,i'}(z_i, z_{i'}),    (S.13)

where E is the set of edges of the MRF, Z_Z is a normalization constant (partition function), and V_i and V_{i,i'} are potential functions of the form

  V_i(z_i) = \prod_{j=1}^{K} (\alpha_i^{(j)})^{z_i^{(j)}} = \begin{cases} \alpha_i^{(1)}, & z_i = 1 \\ \;\vdots & \\ \alpha_i^{(K)}, & z_i = K \end{cases}    (S.14)

  V_{i,i'}(z_i, z_{i'}) = \gamma_2 \prod_{j=1}^{K} \left( \frac{\gamma_1}{\gamma_2} \right)^{z_i^{(j)} z_{i'}^{(j)}} = \begin{cases} \gamma_1, & z_i = z_{i'} \\ \gamma_2, & z_i \neq z_{i'}. \end{cases}

V_i is the prior probability of each layer, while V_{i,i'} attributes higher probability to configurations with neighboring pixels in the same layer.

1.2 Optimization of q(x^{(j)})

Rewriting (S.6) with w_l = x^{(j)},

  \log q^*(x^{(j)}) \propto \log \tilde{p}(x^{(j)}, Y) = E_{Z, X_{k \neq j}}[ \log p(X, Y, Z) ]    (S.15)
                    \propto E_{Z, X_{k \neq j}}\!\left[ \sum_{i=1}^{m} z_i^{(j)} \log p(y_i | x^{(j)}, z_i = j) + \log p(x^{(j)}) \right]    (S.16)
                    = \sum_{i=1}^{m} E_{z_i}[ z_i^{(j)} ] \log p(y_i | x^{(j)}, z_i = j) + \log p(x^{(j)}),    (S.17)

where in (S.16) we have dropped the terms of the joint log-likelihood (S.8) that are not a function of x^{(j)}. Finally, defining h_i^{(j)} = E_{z_i}[ z_i^{(j)} ] = \int q(z_i) z_i^{(j)} \, dz_i, and the normalization constant

  Z_q^{(j)} = \int p(x^{(j)}) \prod_{i=1}^{m} p(y_i | x^{(j)}, z_i = j)^{h_i^{(j)}} \, dx^{(j)},    (S.18)

the optimal q(x^{(j)}) is given by (12).

1.3 Optimization of q(z_i)

Rewriting (S.6) with w_l = z_i and dropping terms that do not depend on z_i,

  \log q^*(z_i) \propto \log \tilde{p}(z_i, Y) = E_{X, Z_{k \neq i}}[ \log p(X, Y, Z) ]    (S.19)
                \propto E_{X, Z_{k \neq i}}\!\left[ \sum_{j=1}^{K} z_i^{(j)} \log p(y_i | x^{(j)}, z_i = j) + \log p(Z) \right]
                = \sum_{j=1}^{K} z_i^{(j)} E_{x^{(j)}}[ \log p(y_i | x^{(j)}, z_i = j) ] + E_{Z_{k \neq i}}[ \log p(Z) ].    (S.20)

For the last term, we have

  E_{Z_{k \neq i}}[ \log p(Z) ] \propto E_{Z_{k \neq i}}\!\left[ \log \left( V_i(z_i) \prod_{(i,i') \in E} V_{i,i'}(z_i, z_{i'}) \right) \right]    (S.21)
    = \log V_i(z_i) + \sum_{(i,i') \in E} E_{z_{i'}}[ \log V_{i,i'}(z_i, z_{i'}) ]    (S.22)
    = \sum_{j=1}^{K} z_i^{(j)} \log \alpha_i^{(j)} + \sum_{(i,i') \in E} E_{z_{i'}}\!\left[ \sum_{j=1}^{K} z_i^{(j)} z_{i'}^{(j)} \log \frac{\gamma_1}{\gamma_2} + \log \gamma_2 \right]    (S.23)
    \propto \sum_{j=1}^{K} z_i^{(j)} \log \alpha_i^{(j)} + \sum_{j=1}^{K} \sum_{(i,i') \in E} z_i^{(j)} E_{z_{i'}}[ z_{i'}^{(j)} ] \log \frac{\gamma_1}{\gamma_2}    (S.24)
    = \sum_{j=1}^{K} z_i^{(j)} \log \alpha_i^{(j)} + \sum_{j=1}^{K} z_i^{(j)} \sum_{(i,i') \in E} h_{i'}^{(j)} \log \frac{\gamma_1}{\gamma_2}.    (S.25)

Hence,

  \log q^*(z_i) \propto \sum_{j=1}^{K} z_i^{(j)} \left[ E_{x^{(j)}}[ \log p(y_i | x^{(j)}, z_i = j) ] + \sum_{(i,i') \in E} h_{i'}^{(j)} \log \frac{\gamma_1}{\gamma_2} + \log \alpha_i^{(j)} \right]    (S.26)
                = \sum_{j=1}^{K} z_i^{(j)} \log \left( g_i^{(j)} \alpha_i^{(j)} \right),

where g_i^{(j)} is defined in (15). This is a multinomial distribution with normalization constant \sum_{j=1}^{K} \alpha_i^{(j)} g_i^{(j)}, leading to (13) with h_i^{(j)} as given in (14).
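As an illustration of (S.26), the sketch below (ours, not from the report) normalizes \alpha_i^{(j)} g_i^{(j)} in the log domain to obtain h_i^{(j)}. It assumes `loglik[i, j]` already holds the expected observation log-likelihood E_{x^{(j)}}[\log p(y_i | x^{(j)}, z_i = j)], `alpha[i, j]` the prior \alpha_i^{(j)}, and `neighbors[i]` the MRF neighbors of pixel i; all of these names are our own.

```python
import numpy as np

def update_assignments(loglik, alpha, h, neighbors, gamma1, gamma2):
    """One pass of the q(z_i) update of (S.26): h_i^(j) proportional to alpha_i^(j) g_i^(j)."""
    m, K = loglik.shape
    log_ratio = np.log(gamma1 / gamma2)
    h_new = np.empty_like(h)
    for i in range(m):
        # expected log-likelihood + neighbor evidence + log prior, one entry per layer j
        s = loglik[i] + log_ratio * sum(h[n] for n in neighbors[i]) + np.log(alpha[i])
        s -= s.max()                               # log-sum-exp stabilization
        h_new[i] = np.exp(s) / np.exp(s).sum()     # normalize the multinomial
    return h_new
```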
1.4 Normalization constant for q(x^{(j)})

Taking the log of (S.18),

  \log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{i=1}^{m} p(y_i | x^{(j)}, z_i = j)^{h_i^{(j)}} \, dx^{(j)}    (S.27)
                 = \log \int p(x^{(j)}) \prod_{i=1}^{m} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}} \, dx^{(j)}.    (S.28)

Note that the term p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}} does not affect the integral when h_i^{(j)} = 0. Defining I_j as the set of indices with non-zero h_i^{(j)}, i.e. I_j = \{ i \,|\, h_i^{(j)} > 0 \}, (S.28) becomes

  \log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{i \in I_j} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}} \, dx^{(j)},    (S.29)

where

  p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}} = G(y_{i,t}, C_i^{(j)} x_t^{(j)}, r^{(j)})^{h_i^{(j)}}    (S.30)
    = (2 \pi r^{(j)})^{-\frac{1}{2} h_i^{(j)}} \left( \frac{2 \pi r^{(j)}}{h_i^{(j)}} \right)^{\frac{1}{2}} G\!\left( y_{i,t}, C_i^{(j)} x_t^{(j)}, \frac{r^{(j)}}{h_i^{(j)}} \right).    (S.31)

For convenience, we will define an LDS over the subset I_j, parameterized by \tilde{\Theta}_j = \{ A^{(j)}, Q^{(j)}, \tilde{C}^{(j)}, \tilde{R}_j, \mu^{(j)} \}, where \tilde{C}^{(j)} = [ C_i^{(j)} ]_{i \in I_j}, and \tilde{R}_j is diagonal with entries \tilde{r}_i^{(j)} = r^{(j)} / h_i^{(j)} for i \in I_j. Noting that this LDS has conditional observation likelihood \tilde{p}(y_{i,t} | x_t^{(j)}, z_i = j) = G(y_{i,t}, C_i^{(j)} x_t^{(j)}, \tilde{r}_i^{(j)}), we can rewrite

  p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}} = (2 \pi r^{(j)})^{\frac{1}{2}(1 - h_i^{(j)})} (h_i^{(j)})^{-\frac{1}{2}} \, \tilde{p}(y_{i,t} | x_t^{(j)}, z_i = j),    (S.32)

and, from (S.29),

  \log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{i \in I_j} \prod_{t=1}^{\tau} \left[ (2 \pi r^{(j)})^{\frac{1}{2}(1 - h_i^{(j)})} (h_i^{(j)})^{-\frac{1}{2}} \, \tilde{p}(y_{i,t} | x_t^{(j)}, z_i = j) \right] dx^{(j)}.    (S.33)

Since, under the restricted LDS, the likelihood of Y_j = [ y_i ]_{i \in I_j} is

  \tilde{p}_j(Y_j) = \int p(x^{(j)}) \prod_{i \in I_j} \prod_{t=1}^{\tau} \tilde{p}(y_{i,t} | x_t^{(j)}, z_i = j) \, dx^{(j)},    (S.34)

it follows that

  \log Z_q^{(j)} = \log \left[ \tilde{p}_j(Y_j) \prod_{i \in I_j} \prod_{t=1}^{\tau} (2 \pi r^{(j)})^{\frac{1}{2}(1 - h_i^{(j)})} (h_i^{(j)})^{-\frac{1}{2}} \right]    (S.35)
                 = \sum_{i \in I_j} \left[ \frac{\tau}{2} (1 - h_i^{(j)}) \log(2 \pi r^{(j)}) - \frac{\tau}{2} \log h_i^{(j)} \right] + \log \tilde{p}_j(Y_j).    (S.36)
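In practice, (S.36) says that \log Z_q^{(j)} is the log-likelihood of the restricted LDS with rescaled observation noise, plus a closed-form correction. A minimal sketch under that reading follows; `lds_log_likelihood` is a hypothetical routine (e.g. an innovations-form Kalman filter, not given in this report) that would return \log \tilde{p}_j(Y_j) for the LDS \tilde{\Theta}_j.

```python
import numpy as np

def log_Zq(Y, h_j, A, Q, mu, C, r):
    """Compute log Z_q^(j) via (S.36).

    Y : (m, tau) pixel trajectories, h_j : (m,) responsibilities h_i^(j),
    C : (m, n) rows C_i^(j), r : scalar observation variance r^(j).
    """
    tau = Y.shape[1]
    idx = np.flatnonzero(h_j > 0)                  # the index set I_j
    h = h_j[idx]
    C_tilde = C[idx]                               # restricted observation matrix of (S.32)
    R_tilde = np.diag(r / h)                       # rescaled observation noise
    loglik = lds_log_likelihood(Y[idx], A, Q, mu, C_tilde, R_tilde)   # hypothetical
    # correction terms of (S.36)
    correction = (tau / 2.0) * np.sum((1.0 - h) * np.log(2.0 * np.pi * r) - np.log(h))
    return loglik + correction
```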
2 Derivation of the EM algorithm for the TS-LDT

In this section, we derive the EM algorithm for the temporally-switching layered dynamic texture (TS-LDT). We begin by deriving the complete data log-likelihood, followed by the E and M steps.

2.1 Complete data log-likelihood of the TS-LDT

We introduce an indicator variable z_{i,t}^{(j)} of value 1 if and only if z_{i,t} = j, and 0 otherwise. Under the assumption that the state processes X and layer assignments Z are independent, the joint distribution factors as

  p(X, Y, Z) = p(Y | X, Z) \, p(X) \, p(Z)    (S.37)
             = \prod_{i=1}^{m} \prod_{j=1}^{K} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{z_{i,t}^{(j)}} \prod_{j=1}^{K} p(x^{(j)}) \, p(Z),    (S.38)

where the conditional observation likelihood is

  p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) = G(y_{i,t}, C_i^{(j)} x_t^{(j)} + \gamma_i^{(j)}, r^{(j)}),    (S.39)

and the distribution p(x^{(j)}) is the same as for the LDT, given in (S.9, S.10). Finally, for the layer assignments Z, we assume that each frame Z_t = \{ z_{i,t} \}_{i=1}^{m} has the same MRF structure, with temporal edges only connecting nodes corresponding to the same pixel (e.g. z_{i,t} and z_{i,t+1}). The layer assignments are then jointly distributed as

  p(Z) = \frac{1}{Z_Z} \left[ \prod_{t=1}^{\tau} \prod_{i=1}^{m} V_{i,t}(z_{i,t}) \right] \left[ \prod_{t=1}^{\tau} \prod_{(i,i') \in E_t} V_{i,i'}(z_{i,t}, z_{i',t}) \right] \left[ \prod_{i=1}^{m} \prod_{(t,t') \in E_i} V_{t,t'}(z_{i,t}, z_{i,t'}) \right],    (S.40)

where E_t is the set of MRF edges in frame t, E_i is the set of MRF edges between frames for pixel i, and Z_Z a normalization constant (partition function). The potential functions V_{i,t}, V_{i,i'}, V_{t,t'} are of the form

  V_{i,t}(z_{i,t}) = \prod_{j=1}^{K} (\alpha_{i,t}^{(j)})^{z_{i,t}^{(j)}} = \begin{cases} \alpha_{i,t}^{(1)}, & z_{i,t} = 1 \\ \;\vdots & \\ \alpha_{i,t}^{(K)}, & z_{i,t} = K \end{cases}    (S.41)

  V_{i,i'}(z_{i,t}, z_{i',t}) = \gamma_2 \prod_{j=1}^{K} \left( \frac{\gamma_1}{\gamma_2} \right)^{z_{i,t}^{(j)} z_{i',t}^{(j)}} = \begin{cases} \gamma_1, & z_{i,t} = z_{i',t} \\ \gamma_2, & z_{i,t} \neq z_{i',t} \end{cases}
  V_{t,t'}(z_{i,t}, z_{i,t'}) = \beta_2 \prod_{j=1}^{K} \left( \frac{\beta_1}{\beta_2} \right)^{z_{i,t}^{(j)} z_{i,t'}^{(j)}} = \begin{cases} \beta_1, & z_{i,t} = z_{i,t'} \\ \beta_2, & z_{i,t} \neq z_{i,t'} \end{cases}    (S.42)

where V_{i,t} is the prior probability of each layer in each frame t, and V_{i,i'} and V_{t,t'} attribute higher probability to configurations with neighboring pixels (both spatially and temporally) in the same layer.

Taking the logarithm of (S.38), the complete data log-likelihood is

  \log p(X, Y, Z) = \sum_{i=1}^{m} \sum_{j=1}^{K} \sum_{t=1}^{\tau} z_{i,t}^{(j)} \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \sum_{j=1}^{K} \left[ \log p(x_1^{(j)}) + \sum_{t=2}^{\tau} \log p(x_t^{(j)} | x_{t-1}^{(j)}) \right] + \log p(Z).    (S.43)

Using (S.10) and (S.39), and dropping terms that do not depend on the parameters \Theta (and thus play no role in the M-step),

  \log p(X, Y, Z) = -\frac{1}{2} \sum_{j=1}^{K} \sum_{i=1}^{m} \sum_{t=1}^{\tau} z_{i,t}^{(j)} \left[ \left\| y_{i,t} - C_i^{(j)} x_t^{(j)} - \gamma_i^{(j)} \right\|_{r^{(j)}}^2 + \log r^{(j)} \right]
    - \frac{1}{2} \sum_{j=1}^{K} \left[ \left\| x_1^{(j)} - \mu^{(j)} \right\|_{Q^{(j)}}^2 + \sum_{t=2}^{\tau} \left\| x_t^{(j)} - A^{(j)} x_{t-1}^{(j)} \right\|_{Q^{(j)}}^2 + \tau \log |Q^{(j)}| \right].    (S.44)

Note that p(Z) can be ignored since the parameters of the MRF are constants. Finally, the complete data log-likelihood is

  \log p(X, Y, Z) = -\frac{1}{2} \sum_{j=1}^{K} \sum_{i=1}^{m} \sum_{t=1}^{\tau} \frac{z_{i,t}^{(j)}}{r^{(j)}} \left[ (y_{i,t} - \gamma_i^{(j)})^2 - 2 (y_{i,t} - \gamma_i^{(j)}) C_i^{(j)} x_t^{(j)} + C_i^{(j)} P_{t,t}^{(j)} C_i^{(j)\mathsf{T}} \right]
    - \frac{1}{2} \sum_{j=1}^{K} \mathrm{tr}\!\left[ Q^{(j)-1} \left( P_{1,1}^{(j)} - x_1^{(j)} \mu^{(j)\mathsf{T}} - \mu^{(j)} x_1^{(j)\mathsf{T}} + \mu^{(j)} \mu^{(j)\mathsf{T}} \right) \right]
    - \frac{1}{2} \sum_{j=1}^{K} \sum_{t=2}^{\tau} \mathrm{tr}\!\left[ Q^{(j)-1} \left( P_{t,t}^{(j)} - P_{t,t-1}^{(j)} A^{(j)\mathsf{T}} - A^{(j)} P_{t,t-1}^{(j)\mathsf{T}} + A^{(j)} P_{t-1,t-1}^{(j)} A^{(j)\mathsf{T}} \right) \right]
    - \frac{1}{2} \sum_{j=1}^{K} \sum_{i=1}^{m} \sum_{t=1}^{\tau} z_{i,t}^{(j)} \log r^{(j)} - \frac{\tau}{2} \sum_{j=1}^{K} \log |Q^{(j)}|,    (S.45)

where we define P_{t,t}^{(j)} = x_t^{(j)} x_t^{(j)\mathsf{T}} and P_{t,t-1}^{(j)} = x_t^{(j)} x_{t-1}^{(j)\mathsf{T}}.
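For reference, the parameter-dependent part of (S.44) can be evaluated directly. The sketch below uses \|v\|_\Sigma^2 = v^\top \Sigma^{-1} v and an array layout of our own choosing (one trajectory per pixel, one state sequence per layer); the variable shapes are assumptions rather than the report's notation.

```python
import numpy as np

def complete_data_loglik(Y, X, Z, A, Q, mu, C, gamma, r):
    """Evaluate (S.44) up to additive constants.

    Y : (m, tau) pixels      Z : (m, tau, K) indicators z_{i,t}^(j)
    X : (K, tau, n) states   C : (K, m, n) rows C_i^(j)
    A, Q : (K, n, n)         mu : (K, n)   gamma : (K, m)   r : (K,)
    """
    m, tau = Y.shape
    K = X.shape[0]
    ll = 0.0
    for j in range(K):
        # observation term, weighted by the indicators z_{i,t}^(j)
        pred = C[j] @ X[j].T + gamma[j][:, None]           # (m, tau) predicted pixel values
        ll -= 0.5 * np.sum(Z[:, :, j] * ((Y - pred) ** 2 / r[j] + np.log(r[j])))
        # state term: initial condition, Markov transitions, and log-determinant
        Qinv = np.linalg.inv(Q[j])
        e1 = X[j, 0] - mu[j]
        ll -= 0.5 * e1 @ Qinv @ e1
        for t in range(1, tau):
            et = X[j, t] - A[j] @ X[j, t - 1]
            ll -= 0.5 * et @ Qinv @ et
        ll -= 0.5 * tau * np.log(np.linalg.det(Q[j]))
    return ll
```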
2.2 E-step

From (S.45), it follows that the E-step of (4) requires conditional expectations of two forms:

  E_{X,Z|Y}[ f(x^{(j)}) ] = E_{X|Y}[ f(x^{(j)}) ],    (S.46)
  E_{X,Z|Y}[ z_{i,t}^{(j)} f(x^{(j)}) ] = E_{Z|Y}[ z_{i,t}^{(j)} ] \, E_{X|Y, z_{i,t}=j}[ f(x^{(j)}) ],    (S.47)

for some function f of x^{(j)}, where E_{X|Y, z_{i,t}=j} is the conditional expectation of X given the observation Y and that the i-th pixel at time t belongs to layer j. Defining the conditional expectations in (18) and the aggregated statistics in (19), and substituting (S.45) into (4), leads to the Q function

  Q(\Theta; \hat{\Theta}) = -\frac{1}{2} \sum_{j=1}^{K} \frac{1}{r^{(j)}} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat{z}_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - 2 C_i^{(j)} \Gamma_i^{(j)} + C_i^{(j)} \Phi_i^{(j)} C_i^{(j)\mathsf{T}} \right]
    - \frac{1}{2} \sum_{j=1}^{K} \mathrm{tr}\!\left[ Q^{(j)-1} \left( \hat{P}_{1,1}^{(j)} - \hat{x}_1^{(j)} \mu^{(j)\mathsf{T}} - \mu^{(j)} (\hat{x}_1^{(j)})^{\mathsf{T}} + \mu^{(j)} \mu^{(j)\mathsf{T}} + \phi_2^{(j)} - \psi^{(j)} A^{(j)\mathsf{T}} - A^{(j)} \psi^{(j)\mathsf{T}} + A^{(j)} \phi_1^{(j)} A^{(j)\mathsf{T}} \right) \right]
    - \frac{1}{2} \sum_{j=1}^{K} \hat{N}_j \log r^{(j)} - \frac{\tau}{2} \sum_{j=1}^{K} \log |Q^{(j)}|.    (S.48)

2.3 M-step

The maximization of the Q function with respect to the TS-LDT parameters leads to two optimization problems. The first is a maximization with respect to a square matrix X, of the form

  X^* = \arg\max_X \left[ -\frac{1}{2} \mathrm{tr}(X^{-1} A) - \frac{b}{2} \log |X| \right] \;\Rightarrow\; X^* = \frac{1}{b} A.    (S.49)

The second is a maximization with respect to a matrix X, of the form

  X^* = \arg\max_X \left[ -\frac{1}{2} \mathrm{tr}\!\left( D (-B X^{\mathsf{T}} - X B^{\mathsf{T}} + X C X^{\mathsf{T}}) \right) \right] \;\Rightarrow\; X^* = B C^{-1},    (S.50)

where D and C are symmetric and invertible matrices.

The optimal parameters are found by collecting the relevant terms in (S.48) and maximizing. This leads to a number of problems of the form of (S.50), namely

  A^{(j)*} = \arg\max_{A^{(j)}} -\frac{1}{2} \mathrm{tr}\!\left[ Q^{(j)-1} \left( -\psi^{(j)} A^{(j)\mathsf{T}} - A^{(j)} \psi^{(j)\mathsf{T}} + A^{(j)} \phi_1^{(j)} A^{(j)\mathsf{T}} \right) \right],    (S.51)
  \mu^{(j)*} = \arg\max_{\mu^{(j)}} -\frac{1}{2} \mathrm{tr}\!\left[ Q^{(j)-1} \left( -\hat{x}_1^{(j)} \mu^{(j)\mathsf{T}} - \mu^{(j)} (\hat{x}_1^{(j)})^{\mathsf{T}} + \mu^{(j)} \mu^{(j)\mathsf{T}} \right) \right],    (S.52)
  C_i^{(j)*} = \arg\max_{C_i^{(j)}} -\frac{1}{2 r^{(j)}} \left( -2 C_i^{(j)} \Gamma_i^{(j)} + C_i^{(j)} \Phi_i^{(j)} C_i^{(j)\mathsf{T}} \right).    (S.53)

Using (S.50) leads to the solutions

  A^{(j)*} = \psi^{(j)} \phi_1^{(j)-1}, \qquad \mu^{(j)*} = \hat{x}_1^{(j)}, \qquad C_i^{(j)*} = \Gamma_i^{(j)\mathsf{T}} \Phi_i^{(j)-1}.    (S.54)

The remaining problems are of the form of (S.49),

  Q^{(j)*} = \arg\max_{Q^{(j)}} -\frac{1}{2} \mathrm{tr}\!\left[ Q^{(j)-1} \left( \hat{P}_{1,1}^{(j)} - \hat{x}_1^{(j)} \mu^{(j)\mathsf{T}} - \mu^{(j)} (\hat{x}_1^{(j)})^{\mathsf{T}} + \mu^{(j)} \mu^{(j)\mathsf{T}} + \phi_2^{(j)} - \psi^{(j)} A^{(j)\mathsf{T}} - A^{(j)} \psi^{(j)\mathsf{T}} + A^{(j)} \phi_1^{(j)} A^{(j)\mathsf{T}} \right) \right] - \frac{\tau}{2} \log |Q^{(j)}|,    (S.55)
  r^{(j)*} = \arg\max_{r^{(j)}} -\frac{1}{2 r^{(j)}} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat{z}_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - 2 C_i^{(j)} \Gamma_i^{(j)} + C_i^{(j)} \Phi_i^{(j)} C_i^{(j)\mathsf{T}} \right] - \frac{1}{2} \hat{N}_j \log r^{(j)}.    (S.56)

In the first case, it follows from (S.49) that

  Q^{(j)*} = \frac{1}{\tau} \left[ \hat{P}_{1,1}^{(j)} - \hat{x}_1^{(j)} \mu^{(j)\mathsf{T}} - \mu^{(j)} (\hat{x}_1^{(j)})^{\mathsf{T}} + \mu^{(j)} \mu^{(j)\mathsf{T}} + \phi_2^{(j)} - \psi^{(j)} A^{(j)\mathsf{T}} - A^{(j)} \psi^{(j)\mathsf{T}} + A^{(j)} \phi_1^{(j)} A^{(j)\mathsf{T}} \right]    (S.57)
           = \frac{1}{\tau} \left[ \hat{P}_{1,1}^{(j)} - \mu^{(j)*} \mu^{(j)*\mathsf{T}} + \phi_2^{(j)} - A^{(j)*} \psi^{(j)\mathsf{T}} \right].    (S.58)

In the second case,

  r^{(j)*} = \frac{1}{\hat{N}_j} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat{z}_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - 2 C_i^{(j)*} \Gamma_i^{(j)} + C_i^{(j)*} \Phi_i^{(j)} C_i^{(j)*\mathsf{T}} \right]    (S.59)
           = \frac{1}{\hat{N}_j} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat{z}_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - C_i^{(j)*} \Gamma_i^{(j)} \right].    (S.60)

Finally, noting that \partial \Gamma_i^{(j)} / \partial \gamma_i^{(j)} = -\sum_{t=1}^{\tau} \hat{z}_{i,t}^{(j)} \hat{x}_{t|i}^{(j)} = -\xi_i^{(j)}, the estimate of the mean parameter follows from

  \frac{\partial Q}{\partial \gamma_i^{(j)}} = -\frac{1}{2 r^{(j)}} \left[ \sum_{t=1}^{\tau} -2 \hat{z}_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)}) + 2 C_i^{(j)} \xi_i^{(j)} \right] = 0    (S.61)
  \;\Rightarrow\; \gamma_i^{(j)} = \frac{1}{\sum_{t=1}^{\tau} \hat{z}_{i,t}^{(j)}} \left[ \sum_{t=1}^{\tau} \hat{z}_{i,t}^{(j)} y_{i,t} - C_i^{(j)} \xi_i^{(j)} \right].    (S.62)
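Collecting the closed forms above, the M-step for one layer reduces to a few matrix operations. The sketch below is ours and assumes the E-step has already produced the aggregated statistics of (19) (here named `psi`, `phi1`, `phi2`, `x1hat`, `P11hat`, `Gamma`, `Phi`, `xi`, `zhat`, `Nhat`); their exact definitions are in the main paper and are not reproduced here, so the shapes used below are assumptions.

```python
import numpy as np

def m_step_layer(stats, Y):
    """Closed-form TS-LDT updates for one layer j, following (S.54)-(S.62)."""
    psi, phi1, phi2 = stats["psi"], stats["phi1"], stats["phi2"]        # (n, n) aggregates
    x1hat, P11hat = stats["x1hat"], stats["P11hat"]                     # (n,), (n, n)
    Gamma, Phi, xi = stats["Gamma"], stats["Phi"], stats["xi"]          # (m, n), (m, n, n), (m, n)
    zhat, Nhat = stats["zhat"], stats["Nhat"]                           # (m, tau), scalar
    m, tau = Y.shape

    A = psi @ np.linalg.inv(phi1)                                       # (S.54)
    mu = x1hat
    # C_i^(j)* = Gamma_i^T Phi_i^{-1}; since Phi_i is symmetric this is Phi_i^{-1} Gamma_i
    C = np.stack([np.linalg.solve(Phi[i], Gamma[i]) for i in range(m)])
    Q = (P11hat - np.outer(mu, mu) + phi2 - A @ psi.T) / tau            # (S.58)
    # (S.62): per-pixel mean offsets gamma_i^(j)
    gamma = (np.sum(zhat * Y, axis=1) - np.einsum("ik,ik->i", C, xi)) / np.sum(zhat, axis=1)
    # (S.60): observation noise variance r^(j)
    resid = np.sum(zhat * (Y - gamma[:, None]) ** 2, axis=1) - np.einsum("ik,ik->i", C, Gamma)
    r = np.sum(resid) / Nhat
    return A, mu, C, Q, gamma, r
```

As in the derivation, C is computed before gamma and r, since (S.60) and (S.62) use the newly estimated observation matrix.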
3 Variational approximation for the TS-LDT

In this section, we derive the variational approximation for the TS-LDT, which follows closely that of the LDT. Substituting (20) into (10) leads to

  L(q(X, Z)) = \int \prod_{j} q(x^{(j)}) \prod_{i,t} q(z_{i,t}) \, \log \frac{\prod_{j} q(x^{(j)}) \prod_{i,t} q(z_{i,t})}{p(X, Y, Z)} \, dX \, dZ.    (S.63)

The L function (S.63) is minimized by sequentially optimizing each of the factors q(x^{(j)}) and q(z_{i,t}), while holding the others constant [2].

3.1 Optimization of q(x^{(j)})

Rewriting (S.6) with w_l = x^{(j)},

  \log q^*(x^{(j)}) \propto \log \tilde{p}(x^{(j)}, Y) = E_{Z, X_{k \neq j}}[ \log p(X, Y, Z) ]    (S.64)
                    \propto E_{Z, X_{k \neq j}}\!\left[ \sum_{t=1}^{\tau} \sum_{i=1}^{m} z_{i,t}^{(j)} \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \log p(x^{(j)}) \right]    (S.65)
                    = \sum_{t=1}^{\tau} \sum_{i=1}^{m} E_{z_{i,t}}[ z_{i,t}^{(j)} ] \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \log p(x^{(j)}),    (S.66)

where in (S.65) we have dropped the terms of the complete data log-likelihood (S.43) that are not a function of x^{(j)}. Finally, defining h_{i,t}^{(j)} = E_{z_{i,t}}[ z_{i,t}^{(j)} ] = \int q(z_{i,t}) z_{i,t}^{(j)} \, dz_{i,t}, and the normalization constant

  Z_q^{(j)} = \int p(x^{(j)}) \prod_{t=1}^{\tau} \prod_{i=1}^{m} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} \, dx^{(j)},    (S.67)

the optimal q(x^{(j)}) is given by (21).

3.2 Optimization of q(z_{i,t})

Rewriting (S.6) with w_l = z_{i,t} and dropping terms that do not depend on z_{i,t},

  \log q^*(z_{i,t}) \propto \log \tilde{p}(z_{i,t}, Y) = E_{X, Z_{k \neq i, s \neq t}}[ \log p(X, Y, Z) ]    (S.68)
                    \propto E_{X, Z_{k \neq i, s \neq t}}\!\left[ \sum_{j=1}^{K} z_{i,t}^{(j)} \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \log p(Z) \right]    (S.69)
                    = \sum_{j=1}^{K} z_{i,t}^{(j)} E_{x_t^{(j)}}[ \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) ] + E_{Z_{k \neq i, s \neq t}}[ \log p(Z) ].    (S.70)

For the last term,

  E_{Z_{k \neq i, s \neq t}}[ \log p(Z) ] \propto E_{Z_{k \neq i, s \neq t}}\!\left[ \log \left( V_{i,t}(z_{i,t}) \prod_{(i,i') \in E_t} V_{i,i'}(z_{i,t}, z_{i',t}) \prod_{(t,t') \in E_i} V_{t,t'}(z_{i,t}, z_{i,t'}) \right) \right]    (S.71)
    = \log V_{i,t}(z_{i,t}) + \sum_{(i,i') \in E_t} E_{z_{i',t}}[ \log V_{i,i'}(z_{i,t}, z_{i',t}) ] + \sum_{(t,t') \in E_i} E_{z_{i,t'}}[ \log V_{t,t'}(z_{i,t}, z_{i,t'}) ]    (S.72)
    = \sum_{j=1}^{K} z_{i,t}^{(j)} \log \alpha_{i,t}^{(j)} + \sum_{(i,i') \in E_t} E_{z_{i',t}}\!\left[ \sum_{j=1}^{K} z_{i,t}^{(j)} z_{i',t}^{(j)} \log \frac{\gamma_1}{\gamma_2} + \log \gamma_2 \right] + \sum_{(t,t') \in E_i} E_{z_{i,t'}}\!\left[ \sum_{j=1}^{K} z_{i,t}^{(j)} z_{i,t'}^{(j)} \log \frac{\beta_1}{\beta_2} + \log \beta_2 \right]    (S.73)
    \propto \sum_{j=1}^{K} z_{i,t}^{(j)} \log \alpha_{i,t}^{(j)} + \sum_{j=1}^{K} \sum_{(i,i') \in E_t} z_{i,t}^{(j)} E_{z_{i',t}}[ z_{i',t}^{(j)} ] \log \frac{\gamma_1}{\gamma_2} + \sum_{j=1}^{K} \sum_{(t,t') \in E_i} z_{i,t}^{(j)} E_{z_{i,t'}}[ z_{i,t'}^{(j)} ] \log \frac{\beta_1}{\beta_2}    (S.74)
    = \sum_{j=1}^{K} z_{i,t}^{(j)} \left[ \log \alpha_{i,t}^{(j)} + \sum_{(i,i') \in E_t} h_{i',t}^{(j)} \log \frac{\gamma_1}{\gamma_2} + \sum_{(t,t') \in E_i} h_{i,t'}^{(j)} \log \frac{\beta_1}{\beta_2} \right].    (S.75)

Hence,

  \log q^*(z_{i,t}) \propto \sum_{j=1}^{K} z_{i,t}^{(j)} \left[ E_{x_t^{(j)}}[ \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) ] + \sum_{(i,i') \in E_t} h_{i',t}^{(j)} \log \frac{\gamma_1}{\gamma_2} + \sum_{(t,t') \in E_i} h_{i,t'}^{(j)} \log \frac{\beta_1}{\beta_2} + \log \alpha_{i,t}^{(j)} \right]    (S.76)
                    = \sum_{j=1}^{K} z_{i,t}^{(j)} \log \left( g_{i,t}^{(j)} \alpha_{i,t}^{(j)} \right),    (S.77)

where g_{i,t}^{(j)} is defined in (24). This is a multinomial distribution with normalization constant \sum_{j=1}^{K} \alpha_{i,t}^{(j)} g_{i,t}^{(j)}, leading to (22) with h_{i,t}^{(j)} as given in (23).

3.3 Normalization constant for q(x^{(j)})

Taking the log of (S.67),

  \log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{i=1}^{m} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} \, dx^{(j)}.    (S.78)

Note that the term p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} does not affect the integral when h_{i,t}^{(j)} = 0. Defining I_j as the set of indices (i, t) with non-zero h_{i,t}^{(j)}, i.e. I_j = \{ (i, t) \,|\, h_{i,t}^{(j)} > 0 \}, (S.78) becomes

  \log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{(i,t) \in I_j} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} \, dx^{(j)},    (S.79)

where

  p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} = G(y_{i,t}, C_i^{(j)} x_t^{(j)} + \gamma_i^{(j)}, r^{(j)})^{h_{i,t}^{(j)}}    (S.80)
    = (2 \pi r^{(j)})^{-\frac{1}{2} h_{i,t}^{(j)}} \left( \frac{2 \pi r^{(j)}}{h_{i,t}^{(j)}} \right)^{\frac{1}{2}} G\!\left( y_{i,t}, C_i^{(j)} x_t^{(j)} + \gamma_i^{(j)}, \frac{r^{(j)}}{h_{i,t}^{(j)}} \right).    (S.81)

For convenience, we define an LDS over the subset of observations indexed by I_j. Note that the dimension of the observation vector changes over time, depending on how many h_{i,t}^{(j)} are active in each frame, and hence the LDS is parameterized by \tilde{\Theta}_j = \{ A^{(j)}, Q^{(j)}, \tilde{C}_t^{(j)}, \tilde{R}_t^{(j)}, \mu^{(j)} \}, where \tilde{C}_t^{(j)} = [ C_i^{(j)} ]_{(i,t) \in I_j} is a time-varying observation matrix, and \tilde{R}_t^{(j)} is a time-varying diagonal covariance matrix with diagonal entries [ r^{(j)} / h_{i,t}^{(j)} ]_{(i,t) \in I_j}. Noting that this LDS has conditional observation likelihood \tilde{p}(y_{i,t} | x_t^{(j)}, z_{i,t} = j) = G(y_{i,t}, C_i^{(j)} x_t^{(j)} + \gamma_i^{(j)}, \tilde{r}_{i,t}^{(j)}), we can rewrite

  p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} = (2 \pi r^{(j)})^{\frac{1}{2}(1 - h_{i,t}^{(j)})} (h_{i,t}^{(j)})^{-\frac{1}{2}} \, \tilde{p}(y_{i,t} | x_t^{(j)}, z_{i,t} = j),    (S.82)

and, from (S.79),

  \log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{(i,t) \in I_j} \left[ (2 \pi r^{(j)})^{\frac{1}{2}(1 - h_{i,t}^{(j)})} (h_{i,t}^{(j)})^{-\frac{1}{2}} \, \tilde{p}(y_{i,t} | x_t^{(j)}, z_{i,t} = j) \right] dx^{(j)}.    (S.83)

Since, under the restricted LDS, the likelihood of the observation Y_j = [ y_{i,t} ]_{(i,t) \in I_j} is

  \tilde{p}_j(Y_j) = \int p(x^{(j)}) \prod_{(i,t) \in I_j} \tilde{p}(y_{i,t} | x_t^{(j)}, z_{i,t} = j) \, dx^{(j)},    (S.84)

it follows that

  \log Z_q^{(j)} = \log \left[ \tilde{p}_j(Y_j) \prod_{(i,t) \in I_j} (2 \pi r^{(j)})^{\frac{1}{2}(1 - h_{i,t}^{(j)})} (h_{i,t}^{(j)})^{-\frac{1}{2}} \right]    (S.85)
                 = \sum_{(i,t) \in I_j} \left[ \frac{1}{2} (1 - h_{i,t}^{(j)}) \log(2 \pi r^{(j)}) - \frac{1}{2} \log h_{i,t}^{(j)} \right] + \log \tilde{p}_j(Y_j).    (S.86)

References

[1] A. B. Chan and N. Vasconcelos, "Variational layered dynamic textures," in IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[2] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.