Derivations for the Layered Dynamic Texture and Temporally-Switching Layered Dynamic Texture

Antoni B. Chan and Nuno Vasconcelos
Statistical Visual Computing Lab
Department of Electrical and Computer Engineering
University of California, San Diego
SVCL-TR 2009/01, June 2009
Abstract
This is the supplemental material for the paper “Variational Layered Dynamic Textures” in CVPR 2009 [1]. The supplemental contains derivations for the variational
approximation of the layered dynamic texture (LDT), and the EM algorithm and variational approximations of the temporally-switching layered dynamic texture (TS-LDT).
Author email: abchan@ucsd.edu
© University of California, San Diego, 2009
This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and
research purposes provided that all such whole or partial copies include the following: a notice that
such copying is by permission of the Statistical Visual Computing Laboratory of the University of
California, San Diego; an acknowledgment of the authors and individual contributors to the work;
and all applicable portions of the copyright notice. Copying, reproducing, or republishing for any
other purpose shall require a license with payment of fee to the University of California, San Diego.
All rights reserved.
SVCL Technical reports are available on the SVCL’s web page at
http://www.svcl.ucsd.edu
University of California, San Diego
Statistical Visual Computing Laboratory
9500 Gilman Drive, Mail code 0407
EBU 1, Room 5512
La Jolla, CA 92093-0407
1 Derivation of the variational approximation for LDT
In this section, we derive a variational approximation for the layered dynamic texture
(LDT). Substituting the approximate factorial distribution
\[
q(X, Z) = \prod_{j=1}^{K} q(x^{(j)}) \prod_{i=1}^{m} q(z_i) \tag{S.1}
\]
into the L function of (10) yields
\[
L(q(X,Z)) = \int \prod_{j=1}^{K} q(x^{(j)}) \prod_{i=1}^{m} q(z_i) \, \log \frac{\prod_{j=1}^{K} q(x^{(j)}) \prod_{i=1}^{m} q(z_i)}{p(X,Y,Z)} \, dX \, dZ. \tag{S.2}
\]
This is minimized by sequentially optimizing each of the factors $q(x^{(j)})$ and $q(z_i)$, while holding the remaining factors constant [2]. For convenience, we define the variable $W = \{X, Z\}$. Rewriting (S.2) in terms of a single factor $q(w_l)$, while holding all others constant,
\begin{align*}
L(q(W)) &\propto \int q(w_l) \log q(w_l)\, dw_l - \int q(w_l) \int \prod_{k\neq l} q(w_k)\, \log p(W,Y)\, dW_{k\neq l}\, dw_l \tag{S.3} \\
&= \int q(w_l) \log q(w_l)\, dw_l - \int q(w_l) \log \tilde{p}(w_l, Y)\, dw_l \tag{S.4} \\
&= D\big( q(w_l) \,\big\|\, \tilde{p}(w_l, Y) \big), \tag{S.5}
\end{align*}
where in (S.3) we have dropped terms that do not depend on $q(w_l)$ (and hence do not affect the optimization), and defined $\tilde{p}(w_l, Y)$ as
\[
\log \tilde{p}(w_l, Y) \propto E_{W_{k\neq l}}[\log p(W,Y)], \tag{S.6}
\]
where $E_{W_{k\neq l}}[\log p(W,Y)] = \int \prod_{k\neq l} q(w_k)\, \log p(W,Y)\, dW_{k\neq l}$. Since (S.5) is minimized when $q^*(w_l) = \tilde{p}(w_l, Y)$, the log of the optimal factor $q(w_l)$ is, up to a constant, the expectation of the joint log-likelihood with respect to the other factors $W_{k\neq l}$.
We next discuss the joint distribution of the LDT, followed by deriving the forms of the optimal factors $q(x^{(j)})$ and $q(z_i)$. For convenience, we ignore normalization constants during the derivation, and reinstate them after the forms of the factors are known.
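For intuition, the following toy sketch applies exactly this coordinate-ascent scheme, $q^*(w_l) \propto \exp\{E_{W_{k\neq l}}[\log p(W,Y)]\}$, to a fully factorized approximation of a small discrete joint distribution. The joint table and factor sizes are arbitrary illustrative assumptions, not the LDT model itself.

```python
import numpy as np

# Toy mean-field illustration of q*(w_l) ∝ exp(E_{W_{k≠l}}[log p(W)]):
# a fully factorized approximation q(w1)q(w2) to a 2-variable discrete joint.
rng = np.random.default_rng(0)
p = rng.random((3, 4))
p /= p.sum()                      # joint distribution p(w1, w2) (illustrative)

q1 = np.full(3, 1.0 / 3)          # factor q(w1)
q2 = np.full(4, 1.0 / 4)          # factor q(w2)

for _ in range(50):
    # optimal q(w1): exponentiate the expected log-joint under q(w2), then normalize
    log_q1 = np.log(p) @ q2
    q1 = np.exp(log_q1 - log_q1.max())
    q1 /= q1.sum()
    # optimal q(w2): same update with the roles of the factors exchanged
    log_q2 = q1 @ np.log(p)
    q2 = np.exp(log_q2 - log_q2.max())
    q2 /= q2.sum()

print("q(w1) =", q1.round(3), " q(w2) =", q2.round(3))
```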
1.1 Joint distribution of the LDT
The LDT model assumes that the state processes $X = \{x^{(j)}\}_{j=1}^{K}$ and the layer assignments $Z$ are independent, i.e. the layer dynamics are independent of their location. Under this assumption, the joint distribution factors as
\begin{align*}
p(X,Y,Z) &= p(Y|X,Z)\, p(X)\, p(Z) \tag{S.7} \\
&= \prod_{i=1}^{m} \prod_{j=1}^{K} p(y_i | x^{(j)}, z_i = j)^{z_i^{(j)}} \prod_{j=1}^{K} p(x^{(j)})\, p(Z), \tag{S.8}
\end{align*}
where $Y = \{y_i\}_{i=1}^{m}$. Each state sequence is a Gauss-Markov process, with distribution
\[
p(x^{(j)}) = p(x_1^{(j)}) \prod_{t=2}^{\tau} p(x_t^{(j)} | x_{t-1}^{(j)}), \tag{S.9}
\]
where the individual state densities are
\[
p(x_1^{(j)}) = G(x_1^{(j)}, \mu^{(j)}, Q^{(j)}), \qquad
p(x_t^{(j)} | x_{t-1}^{(j)}) = G(x_t^{(j)}, A^{(j)} x_{t-1}^{(j)}, Q^{(j)}), \tag{S.10}
\]
and $G(x, \mu, \Sigma)$ is a Gaussian of mean $\mu$ and covariance $\Sigma$. When conditioned on the state sequences and layer assignments, pixel values are independent, and the pixel trajectories are distributed as
\[
p(y_i | x^{(j)}, z_i = j) = \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_i = j), \tag{S.11}
\]
where
\[
p(y_{i,t} | x_t^{(j)}, z_i = j) = G(y_{i,t}, C_i^{(j)} x_t^{(j)}, r^{(j)}). \tag{S.12}
\]
Finally, the layer assignments are jointly distributed as
\[
p(Z) = \frac{1}{Z_Z} \prod_{i=1}^{m} V_i(z_i) \prod_{(i,i')\in E} V_{i,i'}(z_i, z_{i'}), \tag{S.13}
\]
where $E$ is the set of edges of the MRF, $Z_Z$ a normalization constant (partition function), and $V_i$ and $V_{i,i'}$ potential functions of the form
\[
V_i(z_i) = \prod_{j=1}^{K} \big(\alpha_i^{(j)}\big)^{z_i^{(j)}} =
\begin{cases}
\alpha_i^{(1)}, & z_i = 1 \\
\quad\vdots \\
\alpha_i^{(K)}, & z_i = K,
\end{cases} \tag{S.14}
\]
\[
V_{i,i'}(z_i, z_{i'}) = \gamma_2 \prod_{j=1}^{K} \left(\frac{\gamma_1}{\gamma_2}\right)^{z_i^{(j)} z_{i'}^{(j)}} =
\begin{cases}
\gamma_1, & z_i = z_{i'} \\
\gamma_2, & z_i \neq z_{i'}.
\end{cases}
\]
$V_i$ is the prior probability of each layer, while $V_{i,i'}$ attributes higher probability to configurations with neighboring pixels in the same layer.
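As an illustration of (S.13)-(S.14), the sketch below evaluates the unnormalized log of the MRF prior for a given labeling on a small 4-connected grid; the grid size and the values of $\alpha$, $\gamma_1$, $\gamma_2$ are illustrative assumptions, and the partition function $Z_Z$ is not computed.

```python
import numpy as np

# Unnormalized log of the MRF prior p(Z) in (S.13) for a labeling z on a 4-connected grid.
def ldt_log_prior_unnorm(z, alpha, gamma1, gamma2):
    """z: (H, W) int labels in {0..K-1}; alpha: (H, W, K) per-pixel layer priors."""
    H, W = z.shape
    # sum_i log V_i(z_i): pick the prior of the assigned layer at each pixel
    logp = np.log(alpha[np.arange(H)[:, None], np.arange(W)[None, :], z]).sum()
    for di, dj in [(0, 1), (1, 0)]:          # right and down neighbors (edge set E)
        a = z[:H - di, :W - dj]
        b = z[di:, dj:]
        same = (a == b)
        logp += same.sum() * np.log(gamma1) + (~same).sum() * np.log(gamma2)
    return logp                               # log of the unnormalized prior (Z_Z omitted)

K = 2
z = np.array([[0, 0, 1], [0, 1, 1]])
alpha = np.full((2, 3, K), 1.0 / K)           # uniform layer prior (illustrative)
print(ldt_log_prior_unnorm(z, alpha, gamma1=0.8, gamma2=0.2))
```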
1.2 Optimization of $q(x^{(j)})$
Rewriting (S.6) with $w_l = x^{(j)}$,
\begin{align*}
\log q^*(x^{(j)}) &\propto \log \tilde{p}(x^{(j)}, Y) = E_{Z, X_{k\neq j}}[\log p(X,Y,Z)] \tag{S.15} \\
&\propto E_{Z, X_{k\neq j}}\left[ \sum_{i=1}^{m} z_i^{(j)} \log p(y_i | x^{(j)}, z_i = j) + \log p(x^{(j)}) \right] \tag{S.16} \\
&= \sum_{i=1}^{m} E_{z_i}[z_i^{(j)}] \log p(y_i | x^{(j)}, z_i = j) + \log p(x^{(j)}), \tag{S.17}
\end{align*}
where in (S.16) we have dropped the terms of the joint log-likelihood (S.8) that are not a function of $x^{(j)}$. Finally, defining $h_i^{(j)} = E_{z_i}[z_i^{(j)}] = \int q(z_i)\, z_i^{(j)}\, dz_i$, and the normalization constant
\[
Z_q^{(j)} = \int p(x^{(j)}) \prod_{i=1}^{m} p(y_i | x^{(j)}, z_i = j)^{h_i^{(j)}}\, dx^{(j)}, \tag{S.18}
\]
the optimal $q(x^{(j)})$ is given by (12).
1.3 Optimization of $q(z_i)$
Rewriting (S.6) with $w_l = z_i$ and dropping terms that do not depend on $z_i$,
\begin{align*}
\log q^*(z_i) &\propto \log \tilde{p}(z_i, Y) = E_{X, Z_{k\neq i}}[\log p(X,Y,Z)] \\
&\propto E_{X, Z_{k\neq i}}\left[ \sum_{j=1}^{K} z_i^{(j)} \log p(y_i | x^{(j)}, z_i = j) + \log p(Z) \right] \tag{S.19} \\
&= \sum_{j=1}^{K} z_i^{(j)} E_{x^{(j)}}[\log p(y_i | x^{(j)}, z_i = j)] + E_{Z_{k\neq i}}[\log p(Z)]. \tag{S.20}
\end{align*}
For the last term, we have
\begin{align*}
E_{Z_{k\neq i}}[\log p(Z)]
&\propto E_{Z_{k\neq i}}\Big[ \log\Big( V_i(z_i) \prod_{(i,i')\in E} V_{i,i'}(z_i, z_{i'}) \Big) \Big] \tag{S.21} \\
&= \log V_i(z_i) + \sum_{(i,i')\in E} E_{z_{i'}}[\log V_{i,i'}(z_i, z_{i'})] \tag{S.22} \\
&= \sum_{j=1}^{K} z_i^{(j)} \log \alpha_i^{(j)} + \sum_{(i,i')\in E} E_{z_{i'}}\Big[ \sum_{j=1}^{K} z_i^{(j)} z_{i'}^{(j)} \log\frac{\gamma_1}{\gamma_2} + \log\gamma_2 \Big] \tag{S.23} \\
&\propto \sum_{j=1}^{K} z_i^{(j)} \log \alpha_i^{(j)} + \sum_{(i,i')\in E} \sum_{j=1}^{K} z_i^{(j)}\, E_{z_{i'}}[z_{i'}^{(j)}] \log\frac{\gamma_1}{\gamma_2} \tag{S.24} \\
&= \sum_{j=1}^{K} z_i^{(j)} \Big( \log \alpha_i^{(j)} + \sum_{(i,i')\in E} h_{i'}^{(j)} \log\frac{\gamma_1}{\gamma_2} \Big). \tag{S.25}
\end{align*}
Hence,
\begin{align*}
\log q^*(z_i) &\propto \sum_{j=1}^{K} z_i^{(j)} \Big( E_{x^{(j)}}[\log p(y_i | x^{(j)}, z_i = j)] + \sum_{(i,i')\in E} h_{i'}^{(j)} \log\frac{\gamma_1}{\gamma_2} + \log\alpha_i^{(j)} \Big) \tag{S.26} \\
&= \sum_{j=1}^{K} z_i^{(j)} \log\big( g_i^{(j)} \alpha_i^{(j)} \big),
\end{align*}
where $g_i^{(j)}$ is defined in (15). This is a multinomial distribution with normalization constant $\sum_{j=1}^{K} \alpha_i^{(j)} g_i^{(j)}$, leading to (13) with $h_i^{(j)}$ as given in (14).
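In code, this update amounts to combining the prior $\alpha_i^{(j)}$, the expected observation log-likelihood, and the MRF neighbor terms, and then normalizing over layers. A minimal sketch, assuming the expected log-likelihoods $E_{x^{(j)}}[\log p(y_i|x^{(j)}, z_i=j)]$ and the neighbor lists are precomputed (names and shapes are illustrative):

```python
import numpy as np

# Sketch of the q(z_i) update in (S.26): h_i^(j) ∝ alpha_i^(j) * g_i^(j), with
# log g_i^(j) = E_x[log p(y_i|x^(j), z_i=j)] + sum_{i' in N(i)} h_{i'}^(j) log(gamma1/gamma2).
def update_qz(expected_loglik, h, neighbors, alpha, gamma1, gamma2):
    m, K = expected_loglik.shape               # m pixels, K layers
    h_new = np.empty_like(h)
    for i in range(m):
        log_g = expected_loglik[i] + np.log(gamma1 / gamma2) * h[neighbors[i]].sum(axis=0)
        log_post = np.log(alpha[i]) + log_g
        log_post -= log_post.max()             # for numerical stability
        h_new[i] = np.exp(log_post)
        h_new[i] /= h_new[i].sum()             # multinomial normalization over layers
    return h_new

# tiny usage example with two pixels that are mutual neighbors (illustrative values)
h0 = np.full((2, 2), 0.5)
ell = np.array([[-1.0, -3.0], [-2.5, -0.5]])
print(update_qz(ell, h0, neighbors=[[1], [0]], alpha=np.full((2, 2), 0.5),
                gamma1=0.8, gamma2=0.2))
```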
1.4 Normalization constant for $q(x^{(j)})$
Taking the log of (S.18),
\begin{align*}
\log Z_q^{(j)} &= \log \int p(x^{(j)}) \prod_{i=1}^{m} p(y_i | x^{(j)}, z_i = j)^{h_i^{(j)}}\, dx^{(j)} \tag{S.27} \\
&= \log \int p(x^{(j)}) \prod_{i=1}^{m} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}}\, dx^{(j)}. \tag{S.28}
\end{align*}
Note that the term $p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}}$ does not affect the integral when $h_i^{(j)} = 0$. Defining $I_j$ as the set of indices with non-zero $h_i^{(j)}$, i.e. $I_j = \{i \,|\, h_i^{(j)} > 0\}$, (S.28) becomes
\[
\log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{i\in I_j} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}}\, dx^{(j)}, \tag{S.29}
\]
where
\begin{align*}
p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}} &= G(y_{i,t}, C_i^{(j)} x_t^{(j)}, r^{(j)})^{h_i^{(j)}} \tag{S.30} \\
&= (2\pi r^{(j)})^{-\frac{1}{2} h_i^{(j)}} \left( \frac{2\pi r^{(j)}}{h_i^{(j)}} \right)^{\frac{1}{2}} G\!\left( y_{i,t},\, C_i^{(j)} x_t^{(j)},\, \frac{r^{(j)}}{h_i^{(j)}} \right). \tag{S.31}
\end{align*}
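The identity behind (S.31) (and (S.32) below) is just a rescaling of a Gaussian raised to a power. A quick numerical check, using arbitrary scalar test values (purely illustrative):

```python
import numpy as np

# Check: G(y, mu, r)^h = (2*pi*r)^(-h/2) * (2*pi*r/h)^(1/2) * G(y, mu, r/h)
#                      = (2*pi*r)^((1-h)/2) * h^(-1/2)      * G(y, mu, r/h)
def gauss(y, mu, var):
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

y, mu, r, h = 1.3, 0.4, 0.7, 0.25                 # arbitrary test inputs
lhs = gauss(y, mu, r) ** h
rhs_s31 = (2 * np.pi * r) ** (-h / 2) * (2 * np.pi * r / h) ** 0.5 * gauss(y, mu, r / h)
rhs_s32 = (2 * np.pi * r) ** (0.5 * (1 - h)) * h ** (-0.5) * gauss(y, mu, r / h)
print(lhs, rhs_s31, rhs_s32)                      # all three values agree
```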
For convenience, we will define an LDS over the subset $I_j$, parameterized by $\tilde\Theta_j = \{A^{(j)}, Q^{(j)}, \tilde C^{(j)}, \tilde R_j, \mu^{(j)}\}$, where $\tilde C^{(j)} = [C_i^{(j)}]_{i\in I_j}$, and $\tilde R_j$ is diagonal with entries $\tilde r_i^{(j)} = \frac{r^{(j)}}{h_i^{(j)}}$ for $i \in I_j$. Noting that this LDS has conditional observation likelihood $\tilde p(y_{i,t} | x_t^{(j)}, z_i = j) = G(y_{i,t}, C_i^{(j)} x_t^{(j)}, \tilde r_i^{(j)})$, we can rewrite
\[
p(y_{i,t} | x_t^{(j)}, z_i = j)^{h_i^{(j)}} = (2\pi r^{(j)})^{\frac{1}{2}(1 - h_i^{(j)})} (h_i^{(j)})^{-\frac{1}{2}}\, \tilde p(y_{i,t} | x_t^{(j)}, z_i = j), \tag{S.32}
\]
and, from (S.29),
\[
\log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{i\in I_j} \prod_{t=1}^{\tau} \left[ (2\pi r^{(j)})^{\frac{1}{2}(1 - h_i^{(j)})} (h_i^{(j)})^{-\frac{1}{2}}\, \tilde p(y_{i,t} | x_t^{(j)}, z_i = j) \right] dx^{(j)}. \tag{S.33}
\]
Since, under the restricted LDS, the likelihood of $Y_j = [y_i]_{i\in I_j}$ is
\[
\tilde p_j(Y_j) = \int p(x^{(j)}) \prod_{i\in I_j} \prod_{t=1}^{\tau} \tilde p(y_{i,t} | x_t^{(j)}, z_i = j)\, dx^{(j)}, \tag{S.34}
\]
it follows that
\begin{align*}
\log Z_q^{(j)} &= \log\left[ \tilde p_j(Y_j) \prod_{i\in I_j} \prod_{t=1}^{\tau} (2\pi r^{(j)})^{\frac{1}{2}(1 - h_i^{(j)})} (h_i^{(j)})^{-\frac{1}{2}} \right] \tag{S.35} \\
&= \sum_{i\in I_j} \frac{\tau}{2} (1 - h_i^{(j)}) \log(2\pi r^{(j)}) - \sum_{i\in I_j} \frac{\tau}{2} \log h_i^{(j)} + \log \tilde p_j(Y_j). \tag{S.36}
\end{align*}
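As a sketch, (S.36) can be assembled directly from the restricted-LDS log-likelihood $\log \tilde p_j(Y_j)$ and the responsibilities $h_i^{(j)}$ on $I_j$. Computing $\log \tilde p_j(Y_j)$ itself (e.g. with a Kalman filter) is assumed done elsewhere; the numeric inputs below are illustrative placeholders.

```python
import numpy as np

# Sketch of (S.36): correction terms plus the restricted-LDS likelihood give log Z_q^(j).
def log_Zq(log_ptilde_Yj, h_Ij, r_j, tau):
    h = np.asarray(h_Ij)                       # h_i^(j) for i in I_j (all strictly positive)
    correction = (tau / 2.0) * np.sum((1.0 - h) * np.log(2 * np.pi * r_j)) \
                 - (tau / 2.0) * np.sum(np.log(h))
    return correction + log_ptilde_Yj

print(log_Zq(log_ptilde_Yj=-120.0, h_Ij=[0.9, 0.6, 1.0], r_j=0.05, tau=50))
```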
2 Derivation of the EM algorithm for the TS-LDT
In this section, we derive the EM algorithm for the temporally-switching layered dynamic texture (TS-LDT). We begin by deriving the complete data log-likelihood, followed by the E and M steps.
2.1 Complete data log-likelihood of the TS-LDT
We introduce an indicator variable $z_{i,t}^{(j)}$ of value 1 if and only if $z_{i,t} = j$, and 0 otherwise. Under the assumption that the state processes $X$ and layer assignments $Z$ are independent, the joint distribution factors as
\begin{align*}
p(X,Y,Z) &= p(Y|X,Z)\, p(X)\, p(Z) \tag{S.37} \\
&= \prod_{i=1}^{m} \prod_{j=1}^{K} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{z_{i,t}^{(j)}} \prod_{j=1}^{K} p(x^{(j)})\, p(Z), \tag{S.38}
\end{align*}
where the conditional observation likelihood is
\[
p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) = G(y_{i,t},\, C_i^{(j)} x_t^{(j)} + \gamma_i^{(j)},\, r^{(j)}), \tag{S.39}
\]
and the distribution $p(x^{(j)})$ is the same as for the LDT, given in (S.9, S.10). Finally, for the layer assignments $Z$, we assume that each frame $Z_t = \{z_{i,t}\}_{i=1}^{m}$ has the same MRF structure, with temporal edges only connecting nodes corresponding to the same pixel (e.g. $z_{i,t}$ and $z_{i,t+1}$). The layer assignments are then jointly distributed as
\[
p(Z) = \frac{1}{Z_Z} \left[ \prod_{t=1}^{\tau} \prod_{i=1}^{m} V_{i,t}(z_{i,t}) \right] \left[ \prod_{t=1}^{\tau} \prod_{(i,i')\in E_t} V_{i,i'}(z_{i,t}, z_{i',t}) \right] \left[ \prod_{i=1}^{m} \prod_{(t,t')\in E_i} V_{t,t'}(z_{i,t}, z_{i,t'}) \right], \tag{S.40}
\]
where $E_t$ is the set of MRF edges within frame $t$, $E_i$ is the set of MRF edges between frames for pixel $i$, and $Z_Z$ a normalization constant (partition function). The potential functions $V_{i,t}$, $V_{i,i'}$, and $V_{t,t'}$ are of the form
\[
V_{i,t}(z_{i,t}) = \prod_{j=1}^{K} \big(\alpha_{i,t}^{(j)}\big)^{z_{i,t}^{(j)}} =
\begin{cases}
\alpha_{i,t}^{(1)}, & z_{i,t} = 1 \\
\quad\vdots \\
\alpha_{i,t}^{(K)}, & z_{i,t} = K,
\end{cases} \tag{S.41}
\]
\begin{align*}
V_{i,i'}(z_{i,t}, z_{i',t}) &= \gamma_2 \prod_{j=1}^{K} \left(\frac{\gamma_1}{\gamma_2}\right)^{z_{i,t}^{(j)} z_{i',t}^{(j)}} =
\begin{cases}
\gamma_1, & z_{i,t} = z_{i',t} \\
\gamma_2, & z_{i,t} \neq z_{i',t},
\end{cases} \\
V_{t,t'}(z_{i,t}, z_{i,t'}) &= \beta_2 \prod_{j=1}^{K} \left(\frac{\beta_1}{\beta_2}\right)^{z_{i,t}^{(j)} z_{i,t'}^{(j)}} =
\begin{cases}
\beta_1, & z_{i,t} = z_{i,t'} \\
\beta_2, & z_{i,t} \neq z_{i,t'},
\end{cases} \tag{S.42}
\end{align*}
where $V_{i,t}$ is the prior probability of each layer in each frame $t$, and $V_{i,i'}$ and $V_{t,t'}$ attribute higher probability to configurations with neighboring pixels (both spatially and temporally) in the same layer.
Taking the logarithm of (S.38), the complete data log-likelihood is
\[
\log p(X,Y,Z) = \sum_{i=1}^{m} \sum_{j=1}^{K} \sum_{t=1}^{\tau} z_{i,t}^{(j)} \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \sum_{j=1}^{K} \left[ \log p(x_1^{(j)}) + \sum_{t=2}^{\tau} \log p(x_t^{(j)} | x_{t-1}^{(j)}) \right] + \log p(Z). \tag{S.43}
\]
Using (S.10) and (S.39) and dropping terms that do not depend on the parameters $\Theta$ (and thus play no role in the M-step),
\begin{align*}
\log p(X,Y,Z) = &-\frac{1}{2} \sum_{j=1}^{K} \sum_{i=1}^{m} \sum_{t=1}^{\tau} z_{i,t}^{(j)} \left( \big\| y_{i,t} - C_i^{(j)} x_t^{(j)} - \gamma_i^{(j)} \big\|^2_{r^{(j)}} + \log r^{(j)} \right) \tag{S.44} \\
&-\frac{1}{2} \sum_{j=1}^{K} \left( \big\| x_1^{(j)} - \mu^{(j)} \big\|^2_{Q^{(j)}} + \sum_{t=2}^{\tau} \big\| x_t^{(j)} - A^{(j)} x_{t-1}^{(j)} \big\|^2_{Q^{(j)}} + \tau \log \big| Q^{(j)} \big| \right),
\end{align*}
where $\|x\|^2_{\Sigma} = x^T \Sigma^{-1} x$.
Note that $p(Z)$ can be ignored since the parameters of the MRF are constants. Finally, the complete data log-likelihood is
\begin{align*}
\log p(X,Y,Z) = &-\frac{1}{2} \sum_{j=1}^{K} \sum_{i=1}^{m} \sum_{t=1}^{\tau} \frac{z_{i,t}^{(j)}}{r^{(j)}} \left[ (y_{i,t} - \gamma_i^{(j)})^2 - 2\, (y_{i,t} - \gamma_i^{(j)})\, C_i^{(j)} x_t^{(j)} + C_i^{(j)} P_{t,t}^{(j)} C_i^{(j)T} \right] \tag{S.45} \\
&-\frac{1}{2} \sum_{j=1}^{K} \mathrm{tr}\!\left[ Q^{(j)-1} \left( P_{1,1}^{(j)} - x_1^{(j)} \mu^{(j)T} - \mu^{(j)} x_1^{(j)T} + \mu^{(j)} \mu^{(j)T} \right) \right] \\
&-\frac{1}{2} \sum_{j=1}^{K} \sum_{t=2}^{\tau} \mathrm{tr}\!\left[ Q^{(j)-1} \left( P_{t,t}^{(j)} - P_{t,t-1}^{(j)} A^{(j)T} - A^{(j)} P_{t,t-1}^{(j)T} + A^{(j)} P_{t-1,t-1}^{(j)} A^{(j)T} \right) \right] \\
&-\frac{1}{2} \sum_{j=1}^{K} \sum_{i=1}^{m} \sum_{t=1}^{\tau} z_{i,t}^{(j)} \log r^{(j)} - \frac{\tau}{2} \sum_{j=1}^{K} \log \big| Q^{(j)} \big|,
\end{align*}
where we define $P_{t,t}^{(j)} = x_t^{(j)} x_t^{(j)T}$ and $P_{t,t-1}^{(j)} = x_t^{(j)} x_{t-1}^{(j)T}$.
2.2 E-step
From (S.45), it follows that the E-step of (4) requires conditional expectations of two forms:
\begin{align*}
E_{X,Z|Y}[f(x^{(j)})] &= E_{X|Y}[f(x^{(j)})], \tag{S.46} \\
E_{X,Z|Y}[z_{i,t}^{(j)} f(x^{(j)})] &= E_{Z|Y}[z_{i,t}^{(j)}]\, E_{X|Y, z_{i,t}=j}[f(x^{(j)})] \tag{S.47}
\end{align*}
for some function $f$ of $x^{(j)}$, where $E_{X|Y, z_{i,t}=j}$ is the conditional expectation of $X$ given the observation $Y$ and that the $i$-th pixel at time $t$ belongs to layer $j$. Defining the conditional expectations in (18) and the aggregated statistics in (19), substituting (S.45) into (4) leads to the $Q$ function
\begin{align*}
Q(\Theta; \hat\Theta) = &-\frac{1}{2} \sum_{j=1}^{K} \frac{1}{r^{(j)}} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat z_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - 2\, C_i^{(j)} \Gamma_i^{(j)} + C_i^{(j)} \Phi_i^{(j)} C_i^{(j)T} \right] \tag{S.48} \\
&-\frac{1}{2} \sum_{j=1}^{K} \mathrm{tr}\!\left[ Q^{(j)-1} \left( \hat P_{1,1}^{(j)} - \hat x_1^{(j)} \mu^{(j)T} - \mu^{(j)} (\hat x_1^{(j)})^T + \mu^{(j)} \mu^{(j)T} + \phi_2^{(j)} - \psi^{(j)} A^{(j)T} - A^{(j)} \psi^{(j)T} + A^{(j)} \phi_1^{(j)} A^{(j)T} \right) \right] \\
&-\frac{1}{2} \sum_{j=1}^{K} \hat N_j \log r^{(j)} - \frac{\tau}{2} \sum_{j=1}^{K} \log \big| Q^{(j)} \big|.
\end{align*}
2.3 M-step
The maximization of the $Q$ function with respect to the TS-LDT parameters leads to two optimization problems. The first is a maximization with respect to a square matrix $X$ of the form
\[
X^* = \arg\max_X\; -\frac{1}{2} \mathrm{tr}\!\left( X^{-1} A \right) - \frac{b}{2} \log |X| \quad\Rightarrow\quad X^* = \frac{1}{b} A. \tag{S.49}
\]
The second is a maximization with respect to a matrix $X$ of the form
\[
X^* = \arg\max_X\; -\frac{1}{2} \mathrm{tr}\!\left( D(-B X^T - X B^T + X C X^T) \right) \quad\Rightarrow\quad X^* = B C^{-1}, \tag{S.50}
\]
where $D$ and $C$ are symmetric and invertible matrices.
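The two closed forms can be sanity-checked numerically. The sketch below builds small random symmetric positive-definite matrices (an assumption consistent with their role as covariances and second-moment matrices), evaluates both objectives, and verifies that the claimed maximizers beat random perturbations; all matrix sizes and scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def spd(n):
    # random symmetric positive-definite matrix (illustrative)
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

n, b = 3, 5.0
A, C, D = spd(n), spd(n), spd(n)
B = rng.standard_normal((n, n))

def f1(X):  # objective of (S.49)
    return -0.5 * np.trace(np.linalg.solve(X, A)) - 0.5 * b * np.log(np.linalg.det(X))

def f2(X):  # objective of (S.50)
    return -0.5 * np.trace(D @ (-B @ X.T - X @ B.T + X @ C @ X.T))

X1 = A / b                      # claimed maximizer of (S.49)
X2 = B @ np.linalg.inv(C)       # claimed maximizer of (S.50)
for _ in range(5):
    P = 0.02 * rng.standard_normal((n, n))
    assert f1(X1) >= f1(X1 + 0.5 * (P + P.T))   # symmetric perturbation keeps X positive definite
    assert f2(X2) >= f2(X2 + P)
print("closed-form maximizers of (S.49) and (S.50) verified against random perturbations")
```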
The optimal parameters are found by collecting the relevant terms in the $Q$ function (S.48) and maximizing. This leads to a number of problems of the form of (S.50), namely
\begin{align*}
A^{(j)*} &= \arg\max_{A^{(j)}}\; -\frac{1}{2} \mathrm{tr}\!\left[ Q^{(j)-1} \left( -\psi^{(j)} A^{(j)T} - A^{(j)} \psi^{(j)T} + A^{(j)} \phi_1^{(j)} A^{(j)T} \right) \right], \tag{S.51} \\
\mu^{(j)*} &= \arg\max_{\mu^{(j)}}\; -\frac{1}{2} \mathrm{tr}\!\left[ Q^{(j)-1} \left( -\hat x_1^{(j)} \mu^{(j)T} - \mu^{(j)} (\hat x_1^{(j)})^T + \mu^{(j)} \mu^{(j)T} \right) \right], \tag{S.52} \\
C_i^{(j)*} &= \arg\max_{C_i^{(j)}}\; -\frac{1}{2 r^{(j)}} \left( -2\, C_i^{(j)} \Gamma_i^{(j)} + C_i^{(j)} \Phi_i^{(j)} C_i^{(j)T} \right). \tag{S.53}
\end{align*}
Using (S.50) leads to the solutions
\[
A^{(j)*} = \psi^{(j)} \phi_1^{(j)-1}, \qquad
\mu^{(j)*} = \hat x_1^{(j)}, \qquad
C_i^{(j)*} = \Gamma_i^{(j)T} \Phi_i^{(j)-1}. \tag{S.54}
\]
The remaining problems are of the form of (S.49):
\begin{align*}
Q^{(j)*} = \arg\max_{Q^{(j)}}\; &-\frac{1}{2} \mathrm{tr}\!\Big[ Q^{(j)-1} \Big( \hat P_{1,1}^{(j)} - \hat x_1^{(j)} \mu^{(j)T} - \mu^{(j)} (\hat x_1^{(j)})^T + \mu^{(j)} \mu^{(j)T} + \phi_2^{(j)} \\
&\qquad\qquad - \psi^{(j)} A^{(j)T} - A^{(j)} \psi^{(j)T} + A^{(j)} \phi_1^{(j)} A^{(j)T} \Big) \Big] - \frac{\tau}{2} \log \big| Q^{(j)} \big|, \tag{S.55} \\
r^{(j)*} = \arg\max_{r^{(j)}}\; &-\frac{1}{2 r^{(j)}} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat z_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - 2\, C_i^{(j)} \Gamma_i^{(j)} + C_i^{(j)} \Phi_i^{(j)} C_i^{(j)T} \right] - \frac{1}{2} \hat N_j \log r^{(j)}. \tag{S.56}
\end{align*}
In the first case, it follows from (S.49) that
\begin{align*}
Q^{(j)*} &= \frac{1}{\tau} \Big( \hat P_{1,1}^{(j)} - \hat x_1^{(j)} \mu^{(j)T} - \mu^{(j)} (\hat x_1^{(j)})^T + \mu^{(j)} \mu^{(j)T} + \phi_2^{(j)} - \psi^{(j)} A^{(j)T} - A^{(j)} \psi^{(j)T} + A^{(j)} \phi_1^{(j)} A^{(j)T} \Big) \tag{S.57} \\
&= \frac{1}{\tau} \Big( \hat P_{1,1}^{(j)} - \mu^{(j)*} \mu^{(j)*T} + \phi_2^{(j)} - A^{(j)*} \psi^{(j)T} \Big). \tag{S.58}
\end{align*}
In the second case,
\begin{align*}
r^{(j)*} &= \frac{1}{\hat N_j} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat z_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - 2\, C_i^{(j)} \Gamma_i^{(j)} + C_i^{(j)} \Phi_i^{(j)} C_i^{(j)T} \right] \tag{S.59} \\
&= \frac{1}{\hat N_j} \sum_{i=1}^{m} \left[ \sum_{t=1}^{\tau} \hat z_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)})^2 - C_i^{(j)*} \Gamma_i^{(j)} \right]. \tag{S.60}
\end{align*}
Finally, noting that $\frac{\partial \Gamma_i^{(j)}}{\partial \gamma_i^{(j)}} = -\sum_{t=1}^{\tau} \hat z_{i,t}^{(j)} \hat x_{t|i}^{(j)} = -\xi_i^{(j)}$, the estimate of the mean parameters follows from
\[
\frac{\partial Q}{\partial \gamma_i^{(j)}} = -\frac{1}{2 r^{(j)}} \left( \sum_{t=1}^{\tau} -2\, \hat z_{i,t}^{(j)} (y_{i,t} - \gamma_i^{(j)}) + 2\, C_i^{(j)} \xi_i^{(j)} \right) = 0, \tag{S.61}
\]
which gives
\[
\gamma_i^{(j)} = \frac{1}{\sum_{t=1}^{\tau} \hat z_{i,t}^{(j)}} \left( \sum_{t=1}^{\tau} \hat z_{i,t}^{(j)} y_{i,t} - C_i^{(j)} \xi_i^{(j)} \right). \tag{S.62}
\]
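A compact sketch of these closed-form M-step updates for a single layer $j$ follows. The aggregated E-step statistics ($\psi^{(j)}$, $\phi_1^{(j)}$, $\phi_2^{(j)}$, $\hat P_{1,1}^{(j)}$, $\hat x_1^{(j)}$, $\Gamma_i^{(j)}$, $\Phi_i^{(j)}$, $\xi_i^{(j)}$, $\hat z_{i,t}^{(j)}$, $\hat N_j$) are assumed to have been computed in the E-step per (18)-(19) of the main paper; the shapes and the use of the current $\gamma$ estimate inside the aggregates are illustrative assumptions.

```python
import numpy as np

# Closed-form TS-LDT M-step updates for one layer j (mu^(j)* = x1hat^(j) directly, per (S.54)).
def A_update(psi, phi1):
    # A^(j)* = psi^(j) (phi_1^(j))^{-1}                                          (S.54)
    return psi @ np.linalg.inv(phi1)

def C_update(Gamma_i, Phi_i):
    # C_i^(j)* = Gamma_i^(j)^T (Phi_i^(j))^{-1}, stored as a 1-D row vector      (S.54)
    return Gamma_i @ np.linalg.inv(Phi_i)

def Q_update(P11, mu_star, phi2, A_star, psi, tau):
    # Q^(j)* = (P̂_11 - mu* mu*^T + phi_2 - A* psi^T) / tau                       (S.58)
    return (P11 - np.outer(mu_star, mu_star) + phi2 - A_star @ psi.T) / tau

def gamma_update(zhat_i, y_i, C_i, xi_i):
    # gamma_i^(j) = (sum_t zhat_{i,t} y_{i,t} - C_i xi_i) / sum_t zhat_{i,t}      (S.62)
    return (np.dot(zhat_i, y_i) - C_i @ xi_i) / zhat_i.sum()

def r_update(zhat, y, gamma, C, Gamma, Nj):
    # r^(j)* = (sum_{i,t} zhat (y - gamma_i)^2 - sum_i C_i* Gamma_i) / N̂_j        (S.60)
    resid = np.sum(zhat * (y - gamma[:, None]) ** 2)       # zhat, y: (m, tau); gamma: (m,)
    cross = sum(C[i] @ Gamma[i] for i in range(len(C)))    # sum_i C_i^(j)* Gamma_i^(j)
    return (resid - cross) / Nj
```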
3 Variational approximation for the TS-LDT
In this section, we derive the variational approximation for the TS-LDT, which closely follows that of the LDT. Substituting (20) into (10) leads to
\[
L(q(X,Z)) = \int \prod_{j} q(x^{(j)}) \prod_{i,t} q(z_{i,t}) \, \log \frac{\prod_{j} q(x^{(j)}) \prod_{i,t} q(z_{i,t})}{p(X,Y,Z)} \, dX\, dZ. \tag{S.63}
\]
The $L$ function (S.63) is minimized by sequentially optimizing each of the factors $q(x^{(j)})$ and $q(z_{i,t})$, while holding the others constant [2].
3.1 Optimization of $q(x^{(j)})$
Rewriting (S.6) with $w_l = x^{(j)}$,
\begin{align*}
\log q^*(x^{(j)}) &\propto \log \tilde p(x^{(j)}, Y) = E_{Z, X_{k\neq j}}[\log p(X,Y,Z)] \tag{S.64} \\
&\propto E_{Z, X_{k\neq j}}\left[ \sum_{t=1}^{\tau} \sum_{i=1}^{m} z_{i,t}^{(j)} \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \log p(x^{(j)}) \right] \tag{S.65} \\
&= \sum_{t=1}^{\tau} \sum_{i=1}^{m} E_{z_{i,t}}[z_{i,t}^{(j)}] \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \log p(x^{(j)}), \tag{S.66}
\end{align*}
where in (S.65) we have dropped the terms of the complete data log-likelihood (S.43) that are not a function of $x^{(j)}$. Finally, defining $h_{i,t}^{(j)} = E_{z_{i,t}}[z_{i,t}^{(j)}] = \int q(z_{i,t})\, z_{i,t}^{(j)}\, dz_{i,t}$, and the normalization constant
\[
Z_q^{(j)} = \int p(x^{(j)}) \prod_{t=1}^{\tau} \prod_{i=1}^{m} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}}\, dx^{(j)}, \tag{S.67}
\]
the optimal $q(x^{(j)})$ is given by (21).
3.2 Optimization of $q(z_{i,t})$
Rewriting (S.6) with $w_l = z_{i,t}$ and dropping terms that do not depend on $z_{i,t}$,
\begin{align*}
\log q^*(z_{i,t}) &\propto \log \tilde p(z_{i,t}, Y) = E_{X, Z_{k\neq i, s\neq t}}[\log p(X,Y,Z)] \tag{S.68} \\
&\propto E_{X, Z_{k\neq i, s\neq t}}\left[ \sum_{j=1}^{K} z_{i,t}^{(j)} \log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) + \log p(Z) \right] \tag{S.69} \\
&= \sum_{j=1}^{K} z_{i,t}^{(j)} E_{x_t^{(j)}}[\log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)] + E_{Z_{k\neq i, s\neq t}}[\log p(Z)]. \tag{S.70}
\end{align*}
For the last term,
\begin{align*}
E_{Z_{k\neq i, s\neq t}}[\log p(Z)]
&\propto E_{Z_{k\neq i, s\neq t}}\Big[ \log\Big( V_{i,t}(z_{i,t}) \prod_{(i,i')\in E_t} V_{i,i'}(z_{i,t}, z_{i',t}) \prod_{(t,t')\in E_i} V_{t,t'}(z_{i,t}, z_{i,t'}) \Big) \Big] \tag{S.71} \\
&= \log V_{i,t}(z_{i,t}) + \sum_{(i,i')\in E_t} E_{z_{i',t}}[\log V_{i,i'}(z_{i,t}, z_{i',t})] + \sum_{(t,t')\in E_i} E_{z_{i,t'}}[\log V_{t,t'}(z_{i,t}, z_{i,t'})] \tag{S.72} \\
&= \sum_{j=1}^{K} z_{i,t}^{(j)} \log \alpha_{i,t}^{(j)} + \sum_{(i,i')\in E_t} E_{z_{i',t}}\Big[ \sum_{j=1}^{K} z_{i,t}^{(j)} z_{i',t}^{(j)} \log\frac{\gamma_1}{\gamma_2} + \log\gamma_2 \Big] + \sum_{(t,t')\in E_i} E_{z_{i,t'}}\Big[ \sum_{j=1}^{K} z_{i,t}^{(j)} z_{i,t'}^{(j)} \log\frac{\beta_1}{\beta_2} + \log\beta_2 \Big] \tag{S.73} \\
&\propto \sum_{j=1}^{K} z_{i,t}^{(j)} \log \alpha_{i,t}^{(j)} + \sum_{(i,i')\in E_t} \sum_{j=1}^{K} z_{i,t}^{(j)}\, E_{z_{i',t}}[z_{i',t}^{(j)}] \log\frac{\gamma_1}{\gamma_2} + \sum_{(t,t')\in E_i} \sum_{j=1}^{K} z_{i,t}^{(j)}\, E_{z_{i,t'}}[z_{i,t'}^{(j)}] \log\frac{\beta_1}{\beta_2} \tag{S.74} \\
&= \sum_{j=1}^{K} z_{i,t}^{(j)} \Big( \log \alpha_{i,t}^{(j)} + \sum_{(i,i')\in E_t} h_{i',t}^{(j)} \log\frac{\gamma_1}{\gamma_2} + \sum_{(t,t')\in E_i} h_{i,t'}^{(j)} \log\frac{\beta_1}{\beta_2} \Big). \tag{S.75}
\end{align*}
Hence,
\begin{align*}
\log q^*(z_{i,t}) &\propto \sum_{j=1}^{K} z_{i,t}^{(j)} \Big( E_{x_t^{(j)}}[\log p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)] + \sum_{(i,i')\in E_t} h_{i',t}^{(j)} \log\frac{\gamma_1}{\gamma_2} + \sum_{(t,t')\in E_i} h_{i,t'}^{(j)} \log\frac{\beta_1}{\beta_2} + \log\alpha_{i,t}^{(j)} \Big) \tag{S.76} \\
&= \sum_{j=1}^{K} z_{i,t}^{(j)} \log\big( g_{i,t}^{(j)} \alpha_{i,t}^{(j)} \big), \tag{S.77}
\end{align*}
where $g_{i,t}^{(j)}$ is defined in (24). This is a multinomial distribution with normalization constant $\sum_{j=1}^{K} \alpha_{i,t}^{(j)} g_{i,t}^{(j)}$, leading to (22) with $h_{i,t}^{(j)}$ as given in (23).
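As with the LDT, the update (S.76)-(S.77) reduces to normalizing a product of the prior and $g_{i,t}^{(j)}$, now with both spatial and temporal neighbor terms. A minimal sketch follows, assuming the expected observation log-likelihoods and the spatial/temporal neighbor lists are precomputed; names and shapes are illustrative.

```python
import numpy as np

# Sketch of the q(z_{i,t}) update: h_{i,t}^(j) ∝ alpha_{i,t}^(j) * g_{i,t}^(j), with
# spatial neighbors weighted by log(gamma1/gamma2) and temporal ones by log(beta1/beta2).
def update_qz_ts(expected_loglik, h, spatial_nbrs, temporal_nbrs, alpha,
                 gamma1, gamma2, beta1, beta2):
    # expected_loglik, h, alpha: arrays of shape (m, tau, K);
    # spatial_nbrs / temporal_nbrs: dicts mapping (i, t) to lists of neighboring (i', t') pairs
    m, tau, K = expected_loglik.shape
    h_new = np.empty_like(h)
    for i in range(m):
        for t in range(tau):
            log_g = expected_loglik[i, t].copy()
            for (i2, t2) in spatial_nbrs[(i, t)]:
                log_g += np.log(gamma1 / gamma2) * h[i2, t2]
            for (i2, t2) in temporal_nbrs[(i, t)]:
                log_g += np.log(beta1 / beta2) * h[i2, t2]
            log_post = np.log(alpha[i, t]) + log_g
            log_post -= log_post.max()              # numerical stability
            h_new[i, t] = np.exp(log_post)
            h_new[i, t] /= h_new[i, t].sum()        # normalize over layers
    return h_new
```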
3.3 Normalization constant for $q(x^{(j)})$
Taking the log of (S.67),
\[
\log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{i=1}^{m} \prod_{t=1}^{\tau} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}}\, dx^{(j)}. \tag{S.78}
\]
Note that the term $p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}}$ does not affect the integral when $h_{i,t}^{(j)} = 0$. Defining $I_j$ as the set of indices $(i,t)$ with non-zero $h_{i,t}^{(j)}$, i.e. $I_j = \{(i,t) \,|\, h_{i,t}^{(j)} > 0\}$, (S.78) becomes
\[
\log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{(i,t)\in I_j} p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}}\, dx^{(j)}, \tag{S.79}
\]
where
\begin{align*}
p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} &= G(y_{i,t}, C_i^{(j)} x_t^{(j)}, r^{(j)})^{h_{i,t}^{(j)}} \tag{S.80} \\
&= (2\pi r^{(j)})^{-\frac{1}{2} h_{i,t}^{(j)}} \left( \frac{2\pi r^{(j)}}{h_{i,t}^{(j)}} \right)^{\frac{1}{2}} G\!\left( y_{i,t},\, C_i^{(j)} x_t^{(j)},\, \frac{r^{(j)}}{h_{i,t}^{(j)}} \right). \tag{S.81}
\end{align*}
For convenience, we define an LDS over the subset of observations indexed by $I_j$. Note that the dimension of the observation $y_t$ changes over time, depending on how many $h_{i,t}^{(j)}$ are active in each frame, and hence the LDS is parameterized by $\tilde\Theta_j = \{A^{(j)}, Q^{(j)}, \tilde C_t^{(j)}, \tilde R_t^{(j)}, \mu^{(j)}\}$, where $\tilde C_t^{(j)} = [C_i^{(j)}]_{(i,t)\in I_j}$ is a time-varying observation matrix, and $\tilde R_t^{(j)}$ is a time-varying diagonal covariance matrix with diagonal entries $\big[\frac{r^{(j)}}{h_{i,t}^{(j)}}\big]_{(i,t)\in I_j}$. Since this LDS has conditional observation likelihood $\tilde p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) = G(y_{i,t}, C_i^{(j)} x_t^{(j)}, \tilde r_{i,t}^{(j)})$, we can rewrite
\[
p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)^{h_{i,t}^{(j)}} = (2\pi r^{(j)})^{\frac{1}{2}(1 - h_{i,t}^{(j)})} (h_{i,t}^{(j)})^{-\frac{1}{2}}\, \tilde p(y_{i,t} | x_t^{(j)}, z_{i,t} = j), \tag{S.82}
\]
and, from (S.79),
\[
\log Z_q^{(j)} = \log \int p(x^{(j)}) \prod_{(i,t)\in I_j} \left[ (2\pi r^{(j)})^{\frac{1}{2}(1 - h_{i,t}^{(j)})} (h_{i,t}^{(j)})^{-\frac{1}{2}}\, \tilde p(y_{i,t} | x_t^{(j)}, z_{i,t} = j) \right] dx^{(j)}. \tag{S.83}
\]
Since, under the restricted LDS, the likelihood of the observation $Y_j = [y_{i,t}]_{(i,t)\in I_j}$ is
\[
\tilde p_j(Y_j) = \int p(x^{(j)}) \prod_{(i,t)\in I_j} \tilde p(y_{i,t} | x_t^{(j)}, z_{i,t} = j)\, dx^{(j)}, \tag{S.84}
\]
it follows that
\begin{align*}
\log Z_q^{(j)} &= \log\left[ \tilde p_j(Y_j) \prod_{(i,t)\in I_j} (2\pi r^{(j)})^{\frac{1}{2}(1 - h_{i,t}^{(j)})} (h_{i,t}^{(j)})^{-\frac{1}{2}} \right] \tag{S.85} \\
&= \sum_{(i,t)\in I_j} \frac{1}{2} (1 - h_{i,t}^{(j)}) \log(2\pi r^{(j)}) - \sum_{(i,t)\in I_j} \frac{1}{2} \log h_{i,t}^{(j)} + \log \tilde p_j(Y_j). \tag{S.86}
\end{align*}
References
[1] A. B. Chan and N. Vasconcelos, “Variational layered dynamic textures,” in IEEE Conf. Computer Vision and Pattern Recognition, 2009.
[2] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.