IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, TO APPEAR, 2013

Clustering Dynamic Textures with the Hierarchical EM Algorithm for Modeling Video

Adeel Mumtaz, Emanuele Coviello, Gert R. G. Lanckriet, Antoni B. Chan

• A. Mumtaz and A. B. Chan are with the Department of Computer Science, City University of Hong Kong. E-mail: adeelmumtaz@gmail.com, abchan@cityu.edu.hk.
• E. Coviello and G. R. G. Lanckriet are with the Department of Electrical and Computer Engineering, University of California, San Diego. E-mail: emanuetre@gmail.com, gert@ece.ucsd.edu.

APPENDIX A
SENSITIVITY ANALYSIS OF THE KALMAN SMOOTHING FILTER

In this section, we derive a novel efficient algorithm for performing sensitivity analysis of the Kalman smoothing filter, which is used to compute the E-step expectations. We begin with a summary of the Kalman smoothing filter, followed by the derivation of the sensitivity analysis of the Kalman filter and the Kalman smoothing filter. Finally, we derive an efficient algorithm for computing the expected log-likelihood using these results.

A.1 Kalman smoothing filter

The Kalman filter [36, 26] computes the mean and covariance of the state x_t of an LDS, conditioned on the partially observed sequence y_{1:t-1} = {y_1, ..., y_{t-1}},

    x̃_{t|t-1} = E_{x|y_{1:t-1}}[x_t],    Ṽ_{t|t-1} = cov_{x|y_{1:t-1}}(x_t),    (42)

while the Kalman smoothing filter estimates the state conditioned on the fully observed sequence y_{1:τ},

    x̃_{t|τ} = E_{x_t|y_{1:τ}}[x_t],    Ṽ_{t|τ} = cov_{x_t|y_{1:τ}}(x_t),    Ṽ_{t,t-1|τ} = cov_{x_{t-1,t}|y_{1:τ}}(x_t, x_{t-1}).    (43)

Both filters are summarized in Algorithm 2. The Kalman filter consists of a set of forward recursive equations (Alg. 2, line 4), while the Kalman smoothing filter contains an additional backward recursion (Alg. 2, line 8). For sensitivity analysis, it will be convenient to rewrite the state estimators in (50) and (56) as functions only of x̃_{t|t-1} and x̃_{t|τ},

    x̃_{t|t-1} = F_{t-1} x̃_{t-1|t-2} + G_{t-1}(y_{t-1} − ȳ),    (44)
    x̃_{t-1|τ} = H_{t-1} x̃_{t|t-1} + J_{t-1} x̃_{t|τ},    (45)

where {F_t, G_t, J_t, H_t} are defined in (52), (53). Finally, note that the conditional covariances Ṽ_{t|τ} and Ṽ_{t,t-1|τ} in (54) and (55), and the matrices {F_t, G_t, J_t, H_t}, are not functions of the observed sequence y_{1:τ}. Hence, we have

    V̂_t^{(r)} = E_{y|Θ_b}[Ṽ_{t|τ}^{(r)}] = Ṽ_{t|τ}^{(r)},    (46)
    V̂_{t,t-1}^{(r)} = E_{y|Θ_b}[Ṽ_{t,t-1|τ}^{(r)}] = Ṽ_{t,t-1|τ}^{(r)}.    (47)
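For concreteness, the generative LDS model that these filters operate on (the state-space equations (1)–(2) of the main text) can be sketched in NumPy. This is only an illustrative sampler, not the authors' code; the function name `simulate_dt` and all parameter values are ours.

```python
import numpy as np

def simulate_dt(A, Q, C, R, mu, S, ybar, tau, rng):
    """Sample a video y_{1:tau} from the dynamic-texture LDS:
    x_1 ~ N(mu, S),  x_t = A x_{t-1} + v_t,  y_t = C x_t + w_t + ybar,
    with v_t ~ N(0, Q) and w_t ~ N(0, R)."""
    n, m = A.shape[0], C.shape[0]
    x = rng.multivariate_normal(mu, S)
    Y = np.zeros((tau, m))
    for t in range(tau):
        Y[t] = C @ x + rng.multivariate_normal(np.zeros(m), R) + ybar
        if t < tau - 1:
            x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return Y
```

For a dynamic texture, each row of Y would be a vectorized video frame and C the (tall) observation matrix.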
Algorithm 2 Kalman filter and Kalman smoothing filter

1: Input: DT parameters Θ = {A, Q, C, R, µ, S, ȳ}, video y_{1:τ}.
2: Initialize: x̃_{1|0} = µ, Ṽ_{1|0} = S.
3: for t = {1, ..., τ} do
4:   Kalman filter – forward recursion:
         Ṽ_{t|t-1} = A Ṽ_{t-1|t-1} Aᵀ + Q,    (48)
         K_t = Ṽ_{t|t-1} Cᵀ (C Ṽ_{t|t-1} Cᵀ + R)^{-1},    Ṽ_{t|t} = (I − K_t C) Ṽ_{t|t-1},    (49)
         x̃_{t|t-1} = A x̃_{t-1|t-1},    x̃_{t|t} = x̃_{t|t-1} + K_t (y_t − C x̃_{t|t-1} − ȳ).    (50)
5: end for
6: Initialize: Ṽ_{τ,τ-1|τ} = (I − K_τ C) A Ṽ_{τ-1|τ-1}.
7: for t = {τ, ..., 2} do
8:   Kalman smoothing filter – backward recursion:
         J_{t-1} = Ṽ_{t-1|t-1} Aᵀ Ṽ_{t|t-1}^{-1},    (51)
         F_t = A − A K_t C,    G_t = A K_t,    (52)
         H_{t-1} = A^{-1} − J_{t-1},    (53)
         Ṽ_{t-1|τ} = Ṽ_{t-1|t-1} + J_{t-1}(Ṽ_{t|τ} − Ṽ_{t|t-1}) J_{t-1}ᵀ,    (54)
         Ṽ_{t-1,t-2|τ} = Ṽ_{t-1|t-1} J_{t-2}ᵀ + J_{t-1}(Ṽ_{t,t-1|τ} − A Ṽ_{t-1|t-1}) J_{t-2}ᵀ,    (55)
         x̃_{t-1|τ} = x̃_{t-1|t-1} + J_{t-1}(x̃_{t|τ} − A x̃_{t-1|t-1}).    (56)
9: end for
10: Output: Kalman filter matrices {Ṽ_{t|t-1}, Ṽ_{t|τ}, Ṽ_{t,t-1|τ}, G_t, F_t, H_t}, and state estimators {x̃_{t|t-1}, x̃_{t|t}, x̃_{t|τ}}.

A.2 Sensitivity Analysis of the Kalman smoothing filter

We consider two LDS, Θ_b and Θ_r, and their associated Kalman smoothing filters {F_t^{(b)}, G_t^{(b)}, H_t^{(b)}, x̃_{t|t-1}^{(b)}, x̃_{t|τ}^{(b)}} and {F_t^{(r)}, G_t^{(r)}, H_t^{(r)}, x̃_{t|t-1}^{(r)}, x̃_{t|τ}^{(r)}}. The goal is to compute the mean and covariance of the Kalman smoothing filter for Θ_r, when the source distribution is Θ_b,

    x̂_t = E_{y|Θ_b}[x̃_{t|τ}^{(r)}],    κ̂_t = cov_{y|Θ_b}(y_t, x̃_{t|τ}^{(r)}),
    χ̂_t = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}),    χ̂_{t,t-1} = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, x̃_{t-1|τ}^{(r)}).    (57)

To achieve this, we first analyze the forward recursion, followed by the backward recursion.
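Algorithm 2 is a standard Kalman filter with a Rauch-Tung-Striebel backward pass, and can be transcribed directly into NumPy. The following is a minimal sketch, not the authors' implementation: it assumes small, well-conditioned matrices (for pixel-sized observations one would factor the innovation covariance in (49) instead of calling `inv`), and it returns the covariance recursions of (48)–(56) without the lag-one term Ṽ_{t,t-1|τ}.

```python
import numpy as np

def kalman_smoother(A, Q, C, R, mu, S, ybar, Y):
    """Algorithm 2: Kalman filter (forward pass) and smoothing filter
    (backward pass) for an LDS with parameters {A, Q, C, R, mu, S, ybar}.
    Y has shape (tau, m); returns predicted/filtered/smoothed estimates."""
    tau, m = Y.shape
    n = A.shape[0]
    x_p = np.zeros((tau, n)); V_p = np.zeros((tau, n, n))  # x~_{t|t-1}, V~_{t|t-1}
    x_f = np.zeros((tau, n)); V_f = np.zeros((tau, n, n))  # x~_{t|t},   V~_{t|t}
    for t in range(tau):                                   # forward recursion
        if t == 0:
            x_p[0], V_p[0] = mu, S                         # x~_{1|0}, V~_{1|0}
        else:
            x_p[t] = A @ x_f[t-1]
            V_p[t] = A @ V_f[t-1] @ A.T + Q
        K = V_p[t] @ C.T @ np.linalg.inv(C @ V_p[t] @ C.T + R)
        V_f[t] = (np.eye(n) - K @ C) @ V_p[t]
        x_f[t] = x_p[t] + K @ (Y[t] - C @ x_p[t] - ybar)
    x_s = x_f.copy(); V_s = V_f.copy()                     # backward recursion
    for t in range(tau - 1, 0, -1):
        J = V_f[t-1] @ A.T @ np.linalg.inv(V_p[t])
        x_s[t-1] = x_f[t-1] + J @ (x_s[t] - A @ x_f[t-1])
        V_s[t-1] = V_f[t-1] + J @ (V_s[t] - V_p[t]) @ J.T
    return x_p, V_p, x_f, V_f, x_s, V_s
```

At t = τ the smoothed and filtered estimates coincide, which is the seed of the backward pass.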
A.2.1 Forward recursion

For the forward recursion, the Kalman filters for Θ_b and Θ_r are recursively defined by (44),

    [ x̃_{t|t-1}^{(b)} ]   [ F_{t-1}^{(b)} x̃_{t-1|t-2}^{(b)} + G_{t-1}^{(b)} (y_{t-1}^{(b)} − ȳ_b) ]
    [ x̃_{t|t-1}^{(r)} ] = [ F_{t-1}^{(r)} x̃_{t-1|t-2}^{(r)} + G_{t-1}^{(r)} (y_{t-1}^{(b)} − ȳ_r) ],

where {y_t^{(b)}} are the observations from the source Θ_b. Substituting the observation equation (2) of the base model, i.e., y_{t-1}^{(b)} = C_b x_{t-1}^{(b)} + w_{t-1}^{(b)} + ȳ_b, and including the recursion of the associated state space given in (1), we have

    [ x_t^{(b)}        ]   [ A_b x_{t-1}^{(b)} + v_t^{(b)} ]
    [ x̃_{t|t-1}^{(b)} ] = [ F_{t-1}^{(b)} x̃_{t-1|t-2}^{(b)} + G_{t-1}^{(b)} (C_b x_{t-1}^{(b)} + w_{t-1}^{(b)}) ]
    [ x̃_{t|t-1}^{(r)} ]   [ F_{t-1}^{(r)} x̃_{t-1|t-2}^{(r)} + G_{t-1}^{(r)} (C_b x_{t-1}^{(b)} + w_{t-1}^{(b)} + ȳ_b − ȳ_r) ],

which can be rewritten succinctly as

    x_t = A_{t-1} x_{t-1} + B v_t^{(b)} + C_{t-1} w_{t-1}^{(b)} + D_{t-1} (ȳ_b − ȳ_r),    (58)

where x_t = [(x_t^{(b)})ᵀ, (x̃_{t|t-1}^{(b)})ᵀ, (x̃_{t|t-1}^{(r)})ᵀ]ᵀ, and the block matrices {A_{t-1}, B, C_{t-1}, D_{t-1}} are defined in (59). Taking the expectation of (58), with respect to {x_{1:τ}, y_{1:τ}} ∼ Θ_b, yields the recursive equation for x̂_t in (60). Similarly, taking the covariance of (58) yields the recursive equation for V̂_t in (61), where we have used the fact that {x_{t-1}^{(b)}, x̃_{t-1|t-2}^{(b)}, x̃_{t-1|t-2}^{(r)}} ⊥⊥ {v_t^{(b)}, w_{t-1}^{(b)}}, and v_t^{(b)} ⊥⊥ w_t^{(b)}. The recursive equations for the sensitivity analysis of the Kalman filter are summarized in Algorithm 3.

Algorithm 3 Sensitivity Analysis of the Kalman filter

1: Input: DT parameters Θ_b and Θ_r, Kalman filter matrices {G_t^{(b)}, F_t^{(b)}, G_t^{(r)}, F_t^{(r)}}, length τ.
2: Initialize: x̂_1 = [µ_b ; µ_b ; µ_r],  V̂_1 = [S_b 0 0 ; 0 0 0 ; 0 0 0].
3: for t = {2, ..., τ+1} do
4:   Form block matrices:
         A_{t-1} = [ A_b  0  0 ; G_{t-1}^{(b)} C_b  F_{t-1}^{(b)}  0 ; G_{t-1}^{(r)} C_b  0  F_{t-1}^{(r)} ],
         B = [ I ; 0 ; 0 ],    C_{t-1} = [ 0 ; G_{t-1}^{(b)} ; G_{t-1}^{(r)} ],    D_{t-1} = [ 0 ; 0 ; G_{t-1}^{(r)} ].    (59)
5:   Update means and covariances:
         x̂_t = A_{t-1} x̂_{t-1} + D_{t-1} (ȳ_b − ȳ_r),    (60)
         V̂_t = A_{t-1} V̂_{t-1} A_{t-1}ᵀ + B Q_b Bᵀ + C_{t-1} R_b C_{t-1}ᵀ.    (61)
6: end for
7: Output: {x̂_t, V̂_t}.
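Algorithm 3 can be sketched as a short NumPy loop over the block recursions (59)–(61). This is an illustrative transcription under our own calling convention (the filter matrices are passed as lists, with `Fb[t-1]` holding F_t^{(b)}, etc.); it is not the authors' implementation.

```python
import numpy as np

def kf_sensitivity_forward(Ab, Qb, Cb, Rb, mu_b, Sb, mu_r, dy, Fb, Gb, Fr, Gr, tau):
    """Algorithm 3: propagate the mean xhat_t and covariance Vhat_t of the
    joint vector [x_t^(b); x~_{t|t-1}^(b); x~_{t|t-1}^(r)] under y ~ Theta_b,
    following eqs. (58)-(61).  dy stands for (ybar_b - ybar_r)."""
    n, m = Ab.shape[0], Cb.shape[0]
    Zn, Znm = np.zeros((n, n)), np.zeros((n, m))
    B = np.vstack([np.eye(n), Zn, Zn])
    xh = [np.concatenate([mu_b, mu_b, mu_r])]                    # xhat_1
    Vh = [np.block([[Sb, Zn, Zn], [Zn, Zn, Zn], [Zn, Zn, Zn]])]  # Vhat_1
    for t in range(2, tau + 2):                                  # xhat_2 ... xhat_{tau+1}
        At = np.block([[Ab, Zn, Zn],
                       [Gb[t-2] @ Cb, Fb[t-2], Zn],
                       [Gr[t-2] @ Cb, Zn, Fr[t-2]]])             # (59)
        Ct = np.vstack([Znm, Gb[t-2], Gr[t-2]])
        Dt = np.vstack([Znm, Znm, Gr[t-2]])
        xh.append(At @ xh[-1] + Dt @ dy)                         # (60)
        Vh.append(At @ Vh[-1] @ At.T + B @ Qb @ B.T + Ct @ Rb @ Ct.T)  # (61)
    return xh, Vh
```

A useful sanity check: if Θ_r = Θ_b (same filter matrices, µ_r = µ_b, ȳ_r = ȳ_b), the second and third blocks must track each other exactly.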
A.2.2 Backward recursion

Taking the expectation of (45) yields a recursion for x̂_t,

    x̂_{t-1} = E_{y|Θ_b}[x̃_{t-1|τ}^{(r)}] = H_{t-1}^{(r)} E_{y|Θ_b}[x̃_{t|t-1}^{(r)}] + J_{t-1}^{(r)} E_{y|Θ_b}[x̃_{t|τ}^{(r)}]
             = H_{t-1}^{(r)} x̂_t^{[3]} + J_{t-1}^{(r)} x̂_t,    (62)

with initial condition

    x̂_τ = E_{y|Θ_b}[x̃_{τ|τ}^{(r)}] = E_{y|Θ_b}[A_r^{-1} x̃_{τ+1|τ}^{(r)}] = A_r^{-1} x̂_{τ+1}^{[3]}.    (63)

Taking the covariance of (45), we obtain the recursion

    χ̂_{t-1} = cov_{y|Θ_b}(x̃_{t-1|τ}^{(r)}) = cov_{y|Θ_b}(H_{t-1}^{(r)} x̃_{t|t-1}^{(r)} + J_{t-1}^{(r)} x̃_{t|τ}^{(r)})
             = [ H_{t-1}^{(r)}  J_{t-1}^{(r)} ] [ V̂_t^{[3,3]}  ω̂_tᵀ ; ω̂_t  χ̂_t ] [ (H_{t-1}^{(r)})ᵀ ; (J_{t-1}^{(r)})ᵀ ],    (64)

where ω̂_t = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, x̃_{t|t-1}^{(r)}), and the initial condition is

    χ̂_τ = cov_{y|Θ_b}(x̃_{τ|τ}^{(r)}) = cov_{y|Θ_b}(A_r^{-1} x̃_{τ+1|τ}^{(r)}) = A_r^{-1} V̂_{τ+1}^{[3,3]} A_r^{-ᵀ}.    (65)

Substituting (45) into the definition of χ̂_{t,t-1} yields

    χ̂_{t,t-1} = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, x̃_{t-1|τ}^{(r)}) = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, H_{t-1}^{(r)} x̃_{t|t-1}^{(r)} + J_{t-1}^{(r)} x̃_{t|τ}^{(r)})
               = ω̂_t (H_{t-1}^{(r)})ᵀ + χ̂_t (J_{t-1}^{(r)})ᵀ.    (66)

Finally, the cross-covariances, ω̂_t = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, x̃_{t|t-1}^{(r)}) and κ̂_tᵀ = ρ̂_t = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, y_t), are calculated efficiently using the recursions in (67) and (68), where {L_t, M_t} are recursively given by (72) and (73). The derivation is quite involved and appears in Appendix B. The algorithm for sensitivity analysis of the Kalman smoothing filter is summarized in Algorithm 4.

Algorithm 4 Sensitivity Analysis of the Kalman smoothing filter

1: Input: DT parameters Θ_b and Θ_r, Kalman smoothing filter matrices {G_t^{(b)}, F_t^{(b)}} and {G_t^{(r)}, F_t^{(r)}, H_t^{(r)}, J_t^{(r)}}, Kalman filter sensitivity analysis {x̂_t, V̂_t}, length τ.
2: Initialize: x̂_τ = A_r^{-1} x̂_{τ+1}^{[3]},  χ̂_τ = A_r^{-1} V̂_{τ+1}^{[3,3]} A_r^{-ᵀ},  L_τ = A_r^{-1},  M_τ = 0.
3: for t = {τ, ..., 1} do
4:   Compute cross-covariances:
         ρ̂_t = L_t F_t^{(r)} V̂_t^{[3,2]} C_bᵀ + (L_t G_t^{(r)} C_b + M_t) V̂_t^{[1,1]} C_bᵀ + L_t G_t^{(r)} R_b,    (67)
         ω̂_t = L_t F_t^{(r)} V̂_t^{[3,3]} + (L_t G_t^{(r)} C_b + M_t) V̂_t^{[2,3]}.    (68)
5:   if t > 1 then
6:     Compute sensitivity:
           x̂_{t-1} = H_{t-1}^{(r)} x̂_t^{[3]} + J_{t-1}^{(r)} x̂_t,    (69)
           χ̂_{t-1} = [ H_{t-1}^{(r)}  J_{t-1}^{(r)} ] [ V̂_t^{[3,3]}  ω̂_tᵀ ; ω̂_t  χ̂_t ] [ (H_{t-1}^{(r)})ᵀ ; (J_{t-1}^{(r)})ᵀ ],    (70)
           χ̂_{t,t-1} = ω̂_t (H_{t-1}^{(r)})ᵀ + χ̂_t (J_{t-1}^{(r)})ᵀ.    (71)
7:     Update matrices:
           L_{t-1} = H_{t-1}^{(r)} + J_{t-1}^{(r)} L_t F_t^{(r)},    (72)
           M_{t-1} = J_{t-1}^{(r)} (L_t G_t^{(r)} C_b + M_t) A_b.    (73)
8:   end if
9: end for
10: Output: {x̂_t, χ̂_t, χ̂_{t,t-1}, κ̂_t = ρ̂_tᵀ}.
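The compact backward updates (72)–(73) stand in for the nested sums derived in Appendix B, so they can be checked directly against those closed forms: L_t against the sum in (85) and M_t against the double sum in (92). The sketch below does exactly that with random matrices as stand-ins for the actual filter quantities (including a random matrix in place of A_r^{-1}), since the identity is purely algebraic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, tau = 3, 2, 6
# random stand-ins for the Theta_r smoothing-filter matrices and A_r^{-1}
F = {t: rng.standard_normal((n, n)) for t in range(1, tau + 1)}
G = {t: rng.standard_normal((n, m)) for t in range(1, tau + 1)}
J = {t: rng.standard_normal((n, n)) for t in range(1, tau)}
H = {t: rng.standard_normal((n, n)) for t in range(1, tau)}
Ar_inv = rng.standard_normal((n, n))
Cb = rng.standard_normal((m, n))
Ab = rng.standard_normal((n, n))
Hhat = {t: (H[t] if t < tau else Ar_inv) for t in range(1, tau + 1)}

def Jp(a, b):   # J_{a,b} = J_a J_{a+1} ... J_b  (identity if a > b)
    P = np.eye(n)
    for i in range(a, b + 1):
        P = P @ J[i]
    return P

def Fp(a, b):   # F_{a,b} = F_a F_{a-1} ... F_b  (identity if a < b)
    P = np.eye(n)
    for i in range(a, b - 1, -1):
        P = P @ F[i]
    return P

# closed forms: (85) for L_t and the double sum defining M_t in (92)
L_sum = {t: sum(Jp(t, s - 2) @ Hhat[s - 1] @ Fp(s - 1, t + 1)
                for s in range(t + 1, tau + 2)) for t in range(1, tau + 1)}
M_sum = {t: sum(Jp(t, s - 2) @ Hhat[s - 1] @ Fp(s - 1, j + 1) @ G[j] @ Cb
                @ np.linalg.matrix_power(Ab, j - t)
                for j in range(t + 1, tau + 1) for s in range(j + 1, tau + 2))
         for t in range(1, tau + 1)}

# backward recursions (72)-(73), seeded by L_tau = A_r^{-1}, M_tau = 0
L = {tau: Ar_inv}
M = {tau: np.zeros((n, n))}
for t in range(tau, 1, -1):
    L[t - 1] = H[t - 1] + J[t - 1] @ L[t] @ F[t]
    M[t - 1] = J[t - 1] @ (L[t] @ G[t] @ Cb + M[t]) @ Ab
```

Each backward step is O(n^3), versus the O(τ^2) terms in the naive sums.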
A.3 Expected Log-Likelihood

The expected log-likelihood E_{y|Θ_b}[log p(y|Θ_r)] can be calculated efficiently using the results from the sensitivity analysis of the Kalman filter. First, the observation log-likelihood of the DT is expressed in "innovation" form,

    log p(y|Θ_r) = Σ_{t=1}^{τ} log p(y_t | y_{1:t-1}, Θ_r) = Σ_{t=1}^{τ} log N(y_t | C_r x̃_{t|t-1}^{(r)} + ȳ_r, Σ_t)    (74)
                 = Σ_{t=1}^{τ} −(1/2) tr[ Σ_t^{-1} (y_t − ȳ_r − C_r x̃_{t|t-1}^{(r)})(y_t − ȳ_r − C_r x̃_{t|t-1}^{(r)})ᵀ ]
                   − (1/2) log |Σ_t| − (m/2) log(2π),    (75)

where Σ_t = C_r Ṽ_{t|t-1}^{(r)} C_rᵀ + R_r. Taking the expectation of (75), and noting that Ṽ_{t|t-1}^{(r)} and Σ_t are not functions of the observations y_{1:t-1},

    ℓ = E_{y|Θ_b}[log p(y|Θ_r)]
      = Σ_{t=1}^{τ} −(1/2) tr[ Σ_t^{-1} (Û_t − λ̂_t C_rᵀ − C_r λ̂_tᵀ + C_r Λ̂_t C_rᵀ) ] − (1/2) log |Σ_t| − (m/2) log(2π),    (76)

where Û_t = E_{y|Θ_b}[(y_t − ȳ_r)(y_t − ȳ_r)ᵀ], and

    Λ̂_t = E_{y|Θ_b}[x̃_{t|t-1}^{(r)} (x̃_{t|t-1}^{(r)})ᵀ] = cov_{y|Θ_b}(x̃_{t|t-1}^{(r)}) + E_{y|Θ_b}[x̃_{t|t-1}^{(r)}] E_{y|Θ_b}[x̃_{t|t-1}^{(r)}]ᵀ
         = V̂_t^{[3,3]} + x̂_t^{[3]} (x̂_t^{[3]})ᵀ,    (77)

    λ̂_t = E_{y|Θ_b}[(y_t − ȳ_r)(x̃_{t|t-1}^{(r)})ᵀ] = cov_{y|Θ_b}(y_t − ȳ_r, x̃_{t|t-1}^{(r)}) + E_{y|Θ_b}[y_t − ȳ_r] E_{y|Θ_b}[x̃_{t|t-1}^{(r)}]ᵀ
         = C_b V̂_t^{[2,3]} + (C_b x̂_t^{[1]} + ȳ_b − ȳ_r)(x̂_t^{[3]})ᵀ.    (78)
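The passage from (75) to (76) uses only linearity of the expectation inside the trace, so the same identity holds exactly when the exact moments are replaced by empirical moments over any finite sample. That gives a quick numerical check of the moment form; everything below is a synthetic stand-in, not filter output.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, N = 4, 3, 500
Cr = rng.standard_normal((m, n))
Sig = np.eye(m) + 0.1 * np.ones((m, m))      # a valid innovation covariance Sigma_t
Sinv = np.linalg.inv(Sig)
Y = rng.standard_normal((N, m))              # samples standing in for y_t - ybar_r
X = rng.standard_normal((N, n))              # samples standing in for x~_{t|t-1}^(r)

# average of the per-sample quadratic form in (75)
lhs = np.mean([(y - Cr @ x) @ Sinv @ (y - Cr @ x) for y, x in zip(Y, X)])

# the moment form in (76), with empirical second moments
U = Y.T @ Y / N            # Uhat_t   = E[(y - ybar_r)(y - ybar_r)^T]
lam = Y.T @ X / N          # lamhat_t = E[(y - ybar_r) x~^T]
Lam = X.T @ X / N          # Lamhat_t = E[x~ x~^T]
rhs = np.trace(Sinv @ (U - lam @ Cr.T - Cr @ lam.T + Cr @ Lam @ Cr.T))
```

In the paper the moments Û_t, λ̂_t, Λ̂_t are not empirical but come from (76)–(78) via Algorithm 3; the algebra being verified is the same.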
= τ +1 X as a function of only and observations yt:τ . Filling in the backward recursion of the Kalman smoothing filter in (45), (r) (r) = + (r) (r) (r) (r) (r) (r) Jt (Ht+1 x̃t+2|t+1 τ +1 X Lt = (r) + (r) (r) (r) (r) (r) = Ht x̃t+1|t + (r) s−2 Y ( (r) (r) (r) = Ht +( Ji )A−1 r x̃τ +1|τ (r) (r) i=t τ +1 X = ! (r) Ft+1 Jt+1,s−2 Ĥs−1 Fs−1,t+2 (r) (r) + Jt Lt+1 Ft+1 , (87) Lτ = Jτ,τ −1 Ĥτ Fτ,τ +1 = Ĥτ = A−1 r . (r) Jt,s−2 Ĥs−1 x̃s|s−1 , (79) β̂t = (r) Ht A−1 r where we define Ĥt = ( I ,t > s . Next, (r) (r) (r) Jt Jt+1 · · · Js ,t ≤ s ,t < τ , and Jt,s ,t = τ we rewrite (r) x̃s|s−1 , the (88) Next, we substitute the βs terms into (79), s=t+1 ( (86) where in (85) we have separated the first term of the summation (r) (s = t + 1) , and in (86) we have used Jt,s−2 = Jt Jt+1,s−2 and (r) Fs−1,t+1 = Fs−1,t+2 Ft+1 , when s ≥ t. The initial condition for the Lt backward recursion is (r) Ji )Hs−1 x̃s|s−1 s=t+2 i=t τY −1 τ +1 X s=t+2 (r) (r) τ X Jt,s−2 Ĥs−1 Fs−1,t+1 (r) Jt = Ĥt + (r) + Jτ −2 (Hτ −1 x̃τ |τ −1 + Jτ −1 A−1 r x̃τ +1|τ ) · · · )) (r) τ +1 X s=t+2 = Ht x̃t+1|t + Jt (Ht+1 x̃t+2|t+1 + Jt+1 (· · · (r) τ +1 X = Ĥt + (r) Jt+1 (· · · + Jτ −2 (Hτ −1 x̃τ |τ −1 + Jτ −1 x̃τ |τ ) · · · )) (r) (r) Jt,s−2 Ĥs−1 Fs−1,t+1 s=t+1 (r) (r) (85) } s=t+2 (r) (r) Jt,s−2 Ĥs−1 Fs−1,t+1 {z = Ht x̃t+1|t + Jt (Ht+1 x̃t+2|t+1 + Jt+1 x̃t+2|τ ) (r) (r) Ht x̃t+1|t (84) Lt = Jt,t−1 Ĥt Ft,t+1 + (r) (r) x̃t|τ = Ht x̃t+1|t + Jt x̃t+1|τ (r) (r) (r) where Lt can be computed recursively, (r) x̃t|t−1 (r) (r) (r) Jt,s−2 Ĥs−1 Fs−1,t+1 (Ft x̃t|t−1 + Gt yt ), s=t+1 (r) First, we derive an expression of the Kalman smoothing filter x̃t|τ (r) Jt,s−2 Ĥs−1 αs s=t+1 | (r) (r) τ +1 X α̂t = COVARIANCE TERMS In this section, we derive efficient expressions for calculating the cross-covariance terms, ω̂t = covy|Θb (x̃t|τ , x̃t|t−1 ), } βs τ +1 X Jt,s−2 Ĥs−1 βs s=t+1 = = filter terms where s > t, as a function of and {yt , · · · , ys−1 }. 
Note that we drop the constant ȳr term since it will not affect the covariance operator later. Substituting the forward recursions in (44), we have Jt,s−2 Ĥs−1 s=t+1 Kalman (r) x̃t|t−1 τ +1 X = τ +1 X s−1 X (r) Fs−1,j+1 Gj yj j=t+1 s−1 X (r) Jt,s−2 Ĥs−1 Fs−1,j+1 Gj yj s=t+2 j=t+1 = τ X τ +1 X (r) Jt,s−2 Ĥs−1 Fs−1,j+1 Gj yj , (89) j=t+1 s=j+1 (r) (r) (r) x̃t+1|t = Ft x̃t|t−1 + Gt yt , (r) x̃t+2|t+1 (r) x̃t+3|t+2 = (r) Ft+1 x̃t+1|t = (r) (r) (r) Ft+1 (Ft x̃t|t−1 + Gt yt ) (r) (r) Ft+2 x̃t+2|t+1 + Gt+2 yt+2 = (r) (r) + (80) (r) Gt+1 yt+1 (r) (r) + Gt+1 yt+1 , (r) x̃t|τ = (r) = Ft+2 (Ft+1 (Ft x̃t|t−1 + Gt yt ) (r) (81) (r) + Gt+1 yt+1 ) + Gt+2 yt+2 , (r) where in (89) we have collected the Gj yj terms by switching the double summation. Finally, using {α̂t , β̂t }, we rewrite the Kalman (r) smoothing filter of (79) as a function of x̃t|t−1 and yt+1:τ , (82) τ +1 X Jt,s−2 Ĥs−1 (αs + βs ) = α̂t + β̂t s=t+1 (r) (r) (r) = Lt (Ft x̃t|t−1 + Gt yt ) + β̂t . (90) IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, TO APPEAR, 2013 Note that β̂t is a function of {yt+1 , · · · , yτ }. 4 Looking at the covariance with β̂t , covy|Θb (β̂t , yt ) B.1 Calculating ω̂t = (r) (r) covy|Θb (x̃t|τ , x̃t|t−1 ) = covy|Θb ( ω̂t = (r) Jt,s−2 Ĥs−1 Fs−1,j+1 Gj yj , yt ) j=t+1 s=j+1 Next, we will derive an expression for ω̂t (r) (r) covy|Θb (x̃t|τ , x̃t|t−1 ) τ +1 X τ X = covy|Θb (α̂t + (r) β̂t , x̃t|t−1 ). (91) τ +1 X τ X = (r) Jt,s−2 Ĥs−1 Fs−1,j+1 Gj covy|Θb (yj , yt ) j=t+1 s=j+1 Looking at the covariance with the β̂t term, τ +1 X τ X = (r) [1,1] Jt,s−2 Ĥs−1 Fs−1,j+1 Gj Cb Abj−t V̂t CbT j=t+1 s=j+1 (r) [1,1] covy|Θb (β̂t , x̃t|t−1 ) = covy|Θb ( = Mt V̂t τ +1 X τ X (r) (r) Jt,s−2 Ĥs−1 Fs−1,j+1 Gj yj , x̃t|t−1 ) = τ +1 X (r) ρ̂t = covy|Θb (x̃t|τ , yt ) (r) (r) (r) (r) = τ +1 X (r) = covy|Θb (Lt (Ft x̃t|t−1 + Gt yt ) + β̂t , yt ) Jt,s−2 Ĥs−1 Fs−1,j+1 Gj covy|Θb (yj , x̃t|t−1 ) j=t+1 s=j+1 τ X (98) Finally, using (90), the cross-covariance is j=t+1 s=j+1 τ X CbT . 
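The decomposition (90), on which both ω̂_t and ρ̂_t rest, is a purely algebraic identity between the backward recursion (45) and the {L_t, β̂_t} form, so it can be verified numerically with arbitrary matrices. The sketch below uses random stand-ins for the filter matrices (including a random matrix for A_r^{-1}) and checks (90) at a fixed time t0.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, tau, t0 = 3, 2, 7, 2          # check eq. (90) at t = t0

F = {s: rng.standard_normal((n, n)) for s in range(1, tau + 1)}
G = {s: rng.standard_normal((n, m)) for s in range(1, tau + 1)}
J = {s: rng.standard_normal((n, n)) for s in range(1, tau)}
H = {s: rng.standard_normal((n, n)) for s in range(1, tau)}
Ar_inv = rng.standard_normal((n, n))          # stand-in for A_r^{-1}
y = {s: rng.standard_normal(m) for s in range(1, tau + 1)}
Hhat = {s: (H[s] if s < tau else Ar_inv) for s in range(1, tau + 1)}

# forward recursion (44), constant ybar_r dropped
xp = {t0: rng.standard_normal(n)}             # x~_{t0|t0-1}
for s in range(t0, tau + 1):
    xp[s + 1] = F[s] @ xp[s] + G[s] @ y[s]

# backward recursion (45), seeded by x~_{tau|tau} = A_r^{-1} x~_{tau+1|tau}
xs = {tau: Ar_inv @ xp[tau + 1]}
for s in range(tau - 1, t0 - 1, -1):
    xs[s] = H[s] @ xp[s + 1] + J[s] @ xs[s + 1]

def Jp(a, b):                                  # J_{a,b}, identity if a > b
    P = np.eye(n)
    for i in range(a, b + 1):
        P = P @ J[i]
    return P

def Fp(a, b):                                  # F_{a,b}, identity if a < b
    P = np.eye(n)
    for i in range(a, b - 1, -1):
        P = P @ F[i]
    return P

# L_{t0} via the recursion (87)-(88); beta^_{t0} via the double sum (89)
L = {tau: Ar_inv}
for s in range(tau, t0, -1):
    L[s - 1] = H[s - 1] + J[s - 1] @ L[s] @ F[s]
beta = sum(Jp(t0, s - 2) @ Hhat[s - 1] @ Fp(s - 1, j + 1) @ G[j] @ y[j]
           for j in range(t0 + 1, tau + 1) for s in range(j + 1, tau + 2))

# eq. (90): x~_{t|tau} = L_t (F_t x~_{t|t-1} + G_t y_t) + beta^_t
lhs = xs[t0]
rhs = L[t0] @ (F[t0] @ xp[t0] + G[t0] @ y[t0]) + beta
```

Because the identity holds for arbitrary matrices, passing this check for random inputs is strong evidence that the index bookkeeping in (79)–(90) is right.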
B.2 Calculating ρ̂_t

We now derive an expression for ρ̂_t,

    ρ̂_t = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, y_t) = cov_{y|Θ_b}(α̂_t + β̂_t, y_t).    (97)

Looking at the covariance with β̂_t,

    cov_{y|Θ_b}(β̂_t, y_t) = Σ_{j=t+1}^{τ} Σ_{s=j+1}^{τ+1} J_{t,s-2} Ĥ_{s-1} F_{s-1,j+1} G_j^{(r)} cov_{y|Θ_b}(y_j, y_t)
        = Σ_{j=t+1}^{τ} Σ_{s=j+1}^{τ+1} J_{t,s-2} Ĥ_{s-1} F_{s-1,j+1} G_j^{(r)} C_b A_b^{j-t} V̂_t^{[1,1]} C_bᵀ
        = M_t V̂_t^{[1,1]} C_bᵀ.    (98)

Finally, using (90), the cross-covariance is

    ρ̂_t = cov_{y|Θ_b}(x̃_{t|τ}^{(r)}, y_t) = cov_{y|Θ_b}(L_t (F_t^{(r)} x̃_{t|t-1}^{(r)} + G_t^{(r)} y_t) + β̂_t, y_t)
        = L_t F_t^{(r)} cov_{y|Θ_b}(x̃_{t|t-1}^{(r)}, y_t) + L_t G_t^{(r)} cov_{y|Θ_b}(y_t) + cov_{y|Θ_b}(β̂_t, y_t)    (99)
        = L_t F_t^{(r)} V̂_t^{[3,2]} C_bᵀ + L_t G_t^{(r)} (C_b V̂_t^{[1,1]} C_bᵀ + R_b) + M_t V̂_t^{[1,1]} C_bᵀ
        = L_t F_t^{(r)} V̂_t^{[3,2]} C_bᵀ + (L_t G_t^{(r)} C_b + M_t) V̂_t^{[1,1]} C_bᵀ + L_t G_t^{(r)} R_b,    (100)

where (99) follows by using (107), (98), and cov_{y|Θ_b}(y_t) = C_b cov_{x|Θ_b}(x_t) C_bᵀ + R_b = C_b V̂_t^{[1,1]} C_bᵀ + R_b.

B.3 Useful properties

In this section, we derive some properties used in the previous sections. Note that in the sequel we remove the mean terms, ȳ_b and ȳ_r, which do not affect the covariance operator. First, we derive the covariance between two observations, for k > 0,

    cov_{y|Θ_b}(y_{t+k}^{(b)}, y_t^{(b)}) = cov(C_b x_{t+k}^{(b)} + w_{t+k}^{(b)}, C_b x_t^{(b)} + w_t^{(b)})
        = C_b cov(x_{t+k}^{(b)}, x_t^{(b)}) C_bᵀ + cov_{Θ_b}(w_{t+k}^{(b)}, w_t^{(b)})    (101)
        = C_b cov(A_b^k x_t^{(b)} + Σ_{l=1}^{k} A_b^{k-l} v_{t+l}^{(b)}, x_t^{(b)}) C_bᵀ    (102)
        = C_b A_b^k cov(x_t^{(b)}) C_bᵀ = C_b A_b^k V̂_t^{[1,1]} C_bᵀ,    (103)

where in (101) we have used x_t^{(b)} ⊥⊥ w_{t+k}^{(b)} and x_{t+k}^{(b)} ⊥⊥ w_t^{(b)} for k > 0, in (102) we have rewritten x_{t+k}^{(b)} as a function of x_t^{(b)}, i.e., x_{t+k}^{(b)} = A_b^k x_t^{(b)} + Σ_{l=1}^{k} A_b^{k-l} v_{t+l}^{(b)}, and in (103) we have used x_t^{(b)} ⊥⊥ v_{t+l}^{(b)} for l ≥ 1. Next, we derive the covariance between an observation and the one-step-ahead state estimator, for k ≥ 0,

    cov_{y|Θ_b}(y_{t+k}^{(b)}, x̃_{t|t-1}^{(r)}) = cov(C_b x_{t+k}^{(b)} + w_{t+k}^{(b)}, x̃_{t|t-1}^{(r)})    (104)
        = cov(C_b (A_b^k x_t^{(b)} + Σ_{l=1}^{k} A_b^{k-l} v_{t+l}^{(b)}) + w_{t+k}^{(b)}, x̃_{t|t-1}^{(r)})
        = C_b A_b^k cov_{x_t, y_{1:t-1}|Θ_b}(x_t^{(b)}, x̃_{t|t-1}^{(r)})    (105)
        = C_b A_b^k cov_{y_{1:t-1}|Θ_b}(E_{x_t|y_{1:t-1}}[x_t^{(b)}], x̃_{t|t-1}^{(r)})    (106)
        = C_b A_b^k cov_{y|Θ_b}(x̃_{t|t-1}^{(b)}, x̃_{t|t-1}^{(r)}) = C_b A_b^k V̂_t^{[2,3]},    (107)

where in (105) we use v_{t+l}^{(b)} ⊥⊥ x̃_{t|t-1}^{(r)} for l ≥ 1 and w_{t+k}^{(b)} ⊥⊥ x̃_{t|t-1}^{(r)} for k ≥ 0, and in (106) we use cov_{x,y}(x, y) = cov_y(E_{x|y}[x], y).
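Property (103) can be verified exactly, without sampling, by writing x_t and x_{t+k} as linear maps of the independent variables {x_1, v_2, ..., v_{t+k}} and evaluating both sides of the identity. All parameter values below are arbitrary stand-ins for {A_b, C_b, Q_b, S_b}.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, t, k = 3, 2, 3, 2
Ab = rng.standard_normal((n, n))             # stand-in for A_b
Cb = rng.standard_normal((m, n))             # stand-in for C_b
Q = np.eye(n)                                # cov of v_t
S = 2.0 * np.eye(n)                          # cov of x_1

# write x_s as a linear map of z = [x_1; v_2; ...; v_{t+k}] (independent blocks):
#   x_s = A_b^{s-1} x_1 + sum_{l=2}^{s} A_b^{s-l} v_l
def xmap(s):
    M = np.zeros((n, n * (t + k)))
    M[:, :n] = np.linalg.matrix_power(Ab, s - 1)
    for l in range(2, s + 1):
        M[:, n * (l - 1):n * l] = np.linalg.matrix_power(Ab, s - l)
    return M

cov_z = np.kron(np.eye(t + k), Q)
cov_z[:n, :n] = S

# exact cov(y_{t+k}, y_t): the w terms vanish for k > 0 (eq. (101)), so
# cov(y_{t+k}, y_t) = C_b cov(x_{t+k}, x_t) C_b^T
lhs = Cb @ (xmap(t + k) @ cov_z @ xmap(t).T) @ Cb.T
# eq. (103): C_b A_b^k cov(x_t) C_b^T
P_t = xmap(t) @ cov_z @ xmap(t).T
rhs = Cb @ np.linalg.matrix_power(Ab, k) @ P_t @ Cb.T
```

The same linear-map construction would also verify (107), with the second argument replaced by any linear function of y_{1:t-1}.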