Blended Particle Filters for Large Dimensional Chaotic Dynamical Systems – Supplementary Material

Andrew J. Majda¹, Di Qi¹, and Themistoklis P. Sapsis²

¹ Department of Mathematics and Center for Atmosphere and Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
² Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139

1 Proof of Proposition 1

Proposition 1. Assume the prior distribution from the forecast is the blended particle filter conditional Gaussian distribution

$$p_-(u) = \sum_{j=1}^{Q} p_{j,-}\, \delta(u_1 - u_{1,j})\, \mathcal{N}\big(\bar{u}_{2,j}^-, R_{2,j}^-\big), \qquad [1.1]$$

and assume the observations have the structure

$$v = G(u) + \sigma^0 = G_0(u_1) + G_1(u_1)\, u_2 + \sigma^0; \qquad [1.2]$$

then the posterior distribution in the analysis step, taking into account the observations in [1.2], is also a blended particle filter conditional Gaussian distribution, i.e. there are explicit formulas for the updated weights $p_{j,+}$, $1 \le j \le Q$, and the conditional means $\bar{u}_{2,j}^+$ and covariances $R_{2,j}^+$, so that

$$p_+(u) = \sum_{j=1}^{Q} p_{j,+}\, \delta(u_1 - u_{1,j})\, \mathcal{N}\big(\bar{u}_{2,j}^+, R_{2,j}^+\big). \qquad [1.3]$$

In fact, the distributions $\mathcal{N}(\bar{u}_{2,j}^+, R_{2,j}^+)$ are updated by suitable Kalman filter formulas, with the mean update for $\bar{u}_{2,j}$ depending nonlinearly on $u_{1,j}$ in general.

Proof. Let $u = (u_1, u_2)$, $u_j \in \mathbb{R}^{N_j}$, $N_1 + N_2 = N$, and let $u_1 = \{u_{1,j}\}_{j=1}^{Q}$ be sampled in $Q$ discrete states such that [1.1] is satisfied. Using the discrete representation of $u_1$, the posterior distribution can be written as

$$p_+(u \mid v) = \sum_{j=1}^{Q} p_+(u, u_{1,j} \mid v) = \sum_{j=1}^{Q} p_+(u \mid v, u_{1,j})\, p_{j,+}, \qquad [1.4]$$

where $p_{j,+} = p_+(u_{1,j} \mid v)$ is the posterior probability of the $j$-th state.

1. First, consider a general likelihood function $p(v \mid u)$ (not necessarily Gaussian). A simple application of Bayes' theorem gives

$$p_+(u \mid v, u_{1,j}) = \frac{p(v \mid u, u_{1,j})\, p_-(u, u_{1,j})}{p(v, u_{1,j})} = \frac{p(v \mid u, u_{1,j})\, p_-(u \mid u_{1,j})}{\int p(v \mid u_{1,j}, u_2)\, p(u_2 \mid u_{1,j})\, du_2} = \frac{p(v \mid u_{1,j}, u_2)\, \Pi_{j,-}}{\tilde{I}_j}, \qquad [1.5]$$

with $\tilde{I}_j = \int p(v \mid u_{1,j}, u_2)\, \mathcal{N}\big(\bar{u}_{2,j}^-, R_{2,j}^-\big)\, du_2$, $1 \le j \le Q$, and $\Pi_{j,-} = p_-(u \mid u_{1,j}) = \delta(u_1 - u_{1,j})\, \mathcal{N}\big(\bar{u}_{2,j}^-, R_{2,j}^-\big)$ by the conditional Gaussian assumption.

2. Then, to determine the posterior weights $p_{j,+}$, the formula for conditional probability gives, for the posterior,

$$p(u, u_{1,j}, v) = p(u \mid v, u_{1,j})\, p_+(u_{1,j} \mid v)\, p(v) = p(u \mid v, u_{1,j})\, p_{j,+}\, p(v), \qquad [1.6]$$

and for the prior,

$$p(u, u_{1,j}, v) = p(v \mid u, u_{1,j})\, p_-(u, u_{1,j}) = p(v \mid u, u_{1,j})\, p_{j,-}\, \Pi_{j,-}. \qquad [1.7]$$

Comparing [1.6] with [1.7], we get the identity

$$p(v \mid u, u_{1,j})\, p_{j,-}\, \Pi_{j,-} = p(u \mid v, u_{1,j})\, p_{j,+}\, p(v). \qquad [1.8]$$

Using the fact that $\sum_{j=1}^{Q} \int p(u \mid v, u_{1,j})\, p_{j,+}\, du_2 = 1$ and integrating both sides of [1.8],

$$p(v) = \sum_{j=1}^{Q} \int p(v \mid u_{1,j}, u_2)\, p_{j,-}\, \Pi_{j,-}\, du_2 = \sum_{j=1}^{Q} p_{j,-}\, \tilde{I}_j. \qquad [1.9]$$

Therefore, substituting [1.5] and [1.9] into [1.8],

$$p_{j,+} = \frac{p_{j,-} \int p(v \mid u_{1,j}, u_2)\, p(u_2 \mid u_{1,j})\, du_2}{p(v)} = \frac{p_{j,-}\, \tilde{I}_j}{\sum_{k=1}^{Q} p_{k,-}\, \tilde{I}_k}. \qquad [1.10]$$

3. Finally, with observations in the form of [1.2] and the Gaussian likelihood function $p(v \mid u_{1,j}, u_2) = p_G\big(v - G_0(u_{1,j}) - G_1(u_{1,j})\, u_2\big)$, $p_G \sim \mathcal{N}(0, R^0)$, the numerator of [1.5] becomes the standard Kalman filter update. That is,

$$p_+(u \mid v, u_{1,j}) = \frac{\delta(u_1 - u_{1,j})\, \mathcal{N}\big(\bar{u}_{2,j}^-, R_{2,j}^-\big)\, p_G\big(v - G_0(u_{1,j}) - G_1(u_{1,j})\, u_2\big)}{\tilde{I}_j} = \delta(u_1 - u_{1,j})\, \mathcal{N}\big(\bar{u}_{2,j}^+, R_{2,j}^+\big), \qquad [1.11]$$

where $\bar{u}_{2,j}^+$, $R_{2,j}^+$, depending nonlinearly on $u_{1,j}$ in general, are the posterior mean and covariance obtained through the suitable Kalman filtering process.

Also, the weight updating factor $\tilde{I}_j$ can be calculated explicitly by integrating the product of two Gaussian distributions,

$$\tilde{I}_j = \int \exp\Big(-\tfrac{1}{2}\big(v - G(u)\big)^T \big(R^0\big)^{-1} \big(v - G(u)\big)\Big)\, \exp\Big(-\tfrac{1}{2}\big(u_2 - \bar{u}_{2,j}^-\big)^T \big(R_{2,j}^-\big)^{-1} \big(u_2 - \bar{u}_{2,j}^-\big)\Big)\, du_2$$
$$= \det\big(R_{2,j}^+\big)^{-1/2} \exp\Big(\tfrac{1}{2}\Big[\bar{u}_{2,j}^{+T} \big(R_{2,j}^+\big)^{-1} \bar{u}_{2,j}^+ - \bar{u}_{2,j}^{-T} \big(R_{2,j}^-\big)^{-1} \bar{u}_{2,j}^- - \big(v - GEu_{1,j}\big)^T \big(R^0\big)^{-1} \big(v - GEu_{1,j}\big)\Big]\Big). \qquad [1.12]$$

Therefore, the expression for the posterior distribution [1.4] is derived explicitly in the form [1.10], [1.11], and [1.12]. ∎
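To illustrate the mechanics of [1.10]–[1.12], here is a minimal numerical sketch, written for this note rather than taken from the paper (the function name and all variable names are our own), of the per-particle Kalman update and the logarithm of the weight factor $\tilde{I}_j$, assuming the observation blocks $G_0(u_{1,j})$ and $G_1(u_{1,j})$ are supplied as a vector and a matrix:

```python
import numpy as np

def conditional_gaussian_update(v, G0j, G1j, m2, R2, R0):
    """Kalman step for the j-th conditional Gaussian N(m2, R2) under the
    observation v = G0(u_{1,j}) + G1(u_{1,j}) u2 + noise, noise ~ N(0, R0),
    returning the posterior (mean, covariance) of [1.11] and log I~_j of [1.12]."""
    dv = v - G0j - G1j @ m2                          # innovation
    S = G1j @ R2 @ G1j.T + R0                        # innovation covariance
    K = R2 @ G1j.T @ np.linalg.inv(S)                # Kalman gain
    m2_post = m2 + K @ dv
    R2_post = (np.eye(len(m2)) - K @ G1j) @ R2
    # [1.12]: I~_j = det(R2_post)^(-1/2) * exp( (m2_post' R2_post^{-1} m2_post
    #         - m2' R2^{-1} m2 - (v - G0j)' R0^{-1} (v - G0j)) / 2 )
    log_Ij = (0.5 * (m2_post @ np.linalg.solve(R2_post, m2_post)
                     - m2 @ np.linalg.solve(R2, m2)
                     - (v - G0j) @ np.linalg.solve(R0, v - G0j))
              - 0.5 * np.linalg.slogdet(R2_post)[1])
    return m2_post, R2_post, log_Ij
```

The posterior weights then follow from [1.10], $p_{j,+} \propto p_{j,-} \exp(\log \tilde{I}_j)$, normalized over $j$; subtracting $\max_j \log \tilde{I}_j$ before exponentiating avoids overflow.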
2 Details of Blended Filter Algorithms

Here we describe the details of the blended filter algorithms. After the forecast step, we obtain the predicted values for the mean state $\bar{u}$ and covariance matrix $R$, together with the particle representation under the DO basis $E(t) = \{e_1(t), \ldots, e_s(t)\}$,

$$u(t) = \bar{u}(t) + \sum_{i=1}^{s} Y_i(t; \omega)\, e_i(t),$$

where $\{e_i(t)\}_{i=1}^{s}$ is the subdimensional dynamical basis and $\{Y_i\}$ are the corresponding stochastic coefficients obtained through Monte Carlo simulation. The particle statistics represented by the $Y_i$ must be consistent with the covariance matrix $R$, and the $Y_i$, $Y_j$ are normalized to be independent of each other, that is,

$$\langle Y_i Y_j^* \rangle = (e_i \cdot R e_j)\, \delta_{ij}, \qquad 1 \le i, j \le s.$$

For the analysis step, filtering is implemented in two separate subspaces. The projection operator is obtained by completing the dynamical basis, $P = [E, E^\perp]$; the orthogonal complement $E^\perp$ can be chosen freely here. Initially each particle is uniformly weighted, $(Y_i, p_i) = (Y_i, \tfrac{1}{Q})$. The following is the analysis step algorithm for the blended methods (a numerical sketch of the main updates is given after the algorithm).

Algorithm. Blended filtering (analysis step)

• Project the mean and covariance matrix onto the two subspaces:

$$\begin{pmatrix} \bar{u}_1 \\ \bar{u}_2 \end{pmatrix} = P^T \bar{u}, \qquad R_t = P^T R P = \begin{pmatrix} R_1 & R_{12} \\ R_{12}^T & R_2 \end{pmatrix}. \qquad [2.1]$$

• Solve the following linear system to find the conditional mean $\bar{u}_2(u_{1,j}) = \bar{u}_{2,j}^-$ (denote by $u_1' = u_1 - \langle u_1 \rangle$, $\bar{u}_2'(u_1) = \bar{u}_2(u_1) - \langle u_2 \rangle$ the fluctuations about the mean):

$$\begin{pmatrix} p_1 u_{1,1}'^{1} & \cdots & p_Q u_{1,1}'^{Q} \\ \vdots & \ddots & \vdots \\ p_1 u_{1,N_1}'^{1} & \cdots & p_Q u_{1,N_1}'^{Q} \\ p_1 & \cdots & p_Q \end{pmatrix}_{(N_1+1) \times Q} \begin{pmatrix} u_{2,1}'^{1} & \cdots & u_{2,N_2}'^{1} \\ u_{2,1}'^{2} & \cdots & u_{2,N_2}'^{2} \\ \vdots & \ddots & \vdots \\ u_{2,1}'^{Q} & \cdots & u_{2,N_2}'^{Q} \end{pmatrix}_{Q \times N_2} = \begin{pmatrix} R_{12} \\ 0 \end{pmatrix}. \qquad [2.2]$$

Note that this is an underdetermined system for a sufficiently large number of particles, $Q \gg N_1 + 1$; the '−' superscript is suppressed here.

• Calculate the conditional covariance $R_2^-$ in the $u_2$ subspace by

$$R_2^- = R_2 + \langle u_2 \rangle \otimes \langle u_2 \rangle - \int \bar{u}_2(u_1) \otimes \bar{u}_2(u_1)\, p(u_1)\, du_1 = R_2 - \int \bar{u}_2'(u_1) \otimes \bar{u}_2'(u_1)\, p(u_1)\, du_1 = R_2 - \sum_j \bar{u}_{2,j}' \otimes \bar{u}_{2,j}'\, p_{j,-}. \qquad [2.3]$$

• Use Kalman filter updates in the $u_2$ subspace:

$$\bar{u}_{2,j}^+ = \bar{u}_{2,j}^- + K\big(v - GEu_{1,j} - GE^\perp \bar{u}_{2,j}^-\big), \qquad [2.4a]$$
$$\tilde{R}_2^+ = \big(I - K\, GE^\perp\big) R_2^-, \qquad [2.4b]$$
$$K = R_2^- \big(GE^\perp\big)^T \Big(GE^\perp R_2^- \big(GE^\perp\big)^T + R^0\Big)^{-1}, \qquad [2.4c]$$

with the (linear) observation operator $G(u_1, u_2) = GEu_1 + GE^\perp u_2$, where $R^0$ is the covariance matrix of the observation noise.

• Update the particle weights in the $u_1$ subspace by $p_{j,+} \propto p_{j,-} I_j$, with

$$I_j = \exp\Big(\tfrac{1}{2}\Big[\bar{u}_{2,j}^{+T} \big(\tilde{R}_2^+\big)^{-1} \bar{u}_{2,j}^+ - \bar{u}_{2,j}^{-T} \big(R_2^-\big)^{-1} \bar{u}_{2,j}^- - \big(v - GEu_{1,j}\big)^T \big(R^0\big)^{-1} \big(v - GEu_{1,j}\big)\Big]\Big). \qquad [2.5]$$

• Normalize the weights, $p_{j,+} = \dfrac{p_{j,-} I_j}{\sum_j p_{j,-} I_j}$, and use residual resampling.

• Obtain the posterior mean and covariance matrix from the posterior particle representation,

$$\bar{u}_1^+ = \sum_j u_{1,j}\, p_{j,+}, \qquad \bar{u}_2^+ = \sum_j \bar{u}_{2,j}^+\, p_{j,+}, \qquad [2.6a]$$

and

$$R_{1,ij}^+ = \sum_k Y_{i,k} Y_{j,k}^*\, p_{k,+}, \qquad 1 \le i, j \le s, \qquad [2.6b]$$
$$R_{12,ij}^+ = \sum_k Y_{i,k}\, \bar{u}_{j,k}'^{+*}\, p_{k,+}, \qquad 1 \le i \le s, \; s+1 \le j \le N, \qquad [2.6c]$$
$$R_2^+ = \tilde{R}_2^+ + \sum_j \bar{u}_{2,j}'^{+} \otimes \bar{u}_{2,j}'^{+}\, p_{j,+}. \qquad [2.6d]$$

• Rotate the stochastic coefficients and basis to the principal directions in the $s$-dimensional dynamical subspace.
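The following minimal sketch (our own illustration; the function name and the stand-ins `GE`, `GEp` for $GE$ and $GE^\perp$ are hypothetical) carries out the Kalman update [2.4] and the weight update [2.5] for all particles at once, exploiting the fact that $R_2^-$, and hence the gain $K$, is shared by all particles:

```python
import numpy as np

def blended_analysis_step(v, u1, m2, p, R2m, GE, GEp, R0):
    """Analysis step in the u2 subspace following [2.4]-[2.5].
    v   : (M,)  observation vector
    u1  : (Q, N1) particle values u_{1,j} in the resolved subspace
    m2  : (Q, N2) prior conditional means ubar_{2,j}^- (shared covariance R2m)
    p   : (Q,)  prior particle weights
    R2m : (N2, N2) conditional covariance R2^-
    GE, GEp : (M, N1) and (M, N2) observation blocks G E and G E_perp
    R0  : (M, M) observation noise covariance"""
    S = GEp @ R2m @ GEp.T + R0                       # innovation covariance
    K = R2m @ GEp.T @ np.linalg.inv(S)               # Kalman gain, [2.4c]
    R2p = (np.eye(R2m.shape[0]) - K @ GEp) @ R2m     # posterior covariance, [2.4b]
    m2p = np.empty_like(m2)
    log_I = np.empty(len(p))
    for j in range(len(p)):                          # per-particle mean and weight factor
        dv = v - GE @ u1[j]
        m2p[j] = m2[j] + K @ (dv - GEp @ m2[j])      # [2.4a]
        log_I[j] = 0.5 * (m2p[j] @ np.linalg.solve(R2p, m2p[j])
                          - m2[j] @ np.linalg.solve(R2m, m2[j])
                          - dv @ np.linalg.solve(R0, dv))   # log I_j, [2.5]
    w = p * np.exp(log_I - log_I.max())              # stabilized unnormalized weights
    return m2p, R2p, w / w.sum()                     # posterior means, covariance, weights
```

Residual resampling and the reconstruction of the posterior mean and covariance via [2.6] then proceed from the returned weights.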
Remarks.

1. As derived in [1.12] of Proposition 1, the weight updating factor $\tilde{I}_j$ has an additional factor $\det\big(R_{2,j}^+\big)^{-1/2}$, which is expensive to compute given the high dimensionality of the orthogonal subspace $u_2$. For the blended algorithms in the main text, $R_2^-$ is independent of the choice of particle $j$, so $R_2^+$ is also independent of $j$ by [2.4b]. Therefore the expensive determinant term can be neglected in [2.5] as a normalization factor common to the updated weights $p_{j,+}$.

2. In the blended forecast model, the dynamical basis $e_j(t)$ keeps tracking the principal directions of the system (that is, the directions with the largest eigenvalues of the covariance matrix $R(t)$). However, after the analysis step, the principal directions change according to the observation data. Therefore, it is necessary to rotate the basis according to the posterior covariance after every analysis step, as described in the last step above. Although numerical tests show reasonable results without the rotation, correcting at every step ensures that the scheme is robust and more accurate.

2.1 The conditional covariance $R_2(u_1)$ given particles in the $u_1$ subspace

The forecast model gives the particle representation $u_{1,j}$ in the low-dimensional subspace, and the conditional Gaussian mixture $(\bar{u}_{2,j}, R_{2,j})$ in the orthogonal subspace is obtained as the least squares solution of the linear system [2.2] with minimum weighted $L^2$-norm, $\sum_{j=1}^{Q} |\bar{u}_{2,j}'|^2 p_j$. This idea comes from the maximum entropy principle. Another advantage of this max-entropy solution is that it implies that the conditional covariance $R_2(u_1) = R_2^-$ is independent of the particles in the $u_1$ subspace (a numerical sketch of this construction follows this subsection).

One important point to note is that the prior covariance $R_2$ from [2.1] is different from the conditional covariance $R_2(u_1) = R_2^-$ given the particle values in the $u_1$ subspace. This can be seen clearly from their definitions,

$$R_2 = \iint (u_2 - \langle u_2 \rangle) \otimes (u_2 - \langle u_2 \rangle)\, p(u_2 \mid u_1)\, p(u_1)\, du_2\, du_1, \qquad [2.7]$$
$$R_2(u_1) = \int (u_2 - \bar{u}_2(u_1)) \otimes (u_2 - \bar{u}_2(u_1))\, p(u_2 \mid u_1)\, du_2, \qquad [2.8]$$

with $\langle u_2 \rangle = \int \bar{u}_2(u_1)\, p(u_1)\, du_1$. By a simple calculation combining [2.7] and [2.8], we have

$$\int R_2(u_1)\, p(u_1)\, du_1 = R_2 + \langle u_2 \rangle \otimes \langle u_2 \rangle - \int \bar{u}_2(u_1) \otimes \bar{u}_2(u_1)\, p(u_1)\, du_1 = R_2 - \int \bar{u}_2'(u_1) \otimes \bar{u}_2'(u_1)\, p(u_1)\, du_1,$$

with $\bar{u}_2'(u_1) = \bar{u}_2(u_1) - \langle u_2 \rangle$. Noting that $R_2(u_1) = R_2^-$ is independent of the variable $u_1$, we get the approximation for the conditional covariance

$$R_2^- = R_2 - \int \bar{u}_2'(u_1) \otimes \bar{u}_2'(u_1)\, p(u_1)\, du_1 = R_2 - \sum_j \bar{u}_{2,j}' \otimes \bar{u}_{2,j}'\, p_j. \qquad [2.9]$$

This conditional covariance formula is theoretically consistent, but it introduces the problem of realizability. As discussed in the main text, we may then need to introduce a realizability check and correction for $R_2^-$ after each analysis step, or simply inflate the matrix as $R_2^- = R_2$ (labeled 'inflated $R_2$' in the figures below). Both strategies work well for the MQG-DO method, while the QG-DO method requires the larger inflation approach in practical implementations, as shown in the results below.
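As a concrete reading of this construction (a sketch written for this note; names are ours, and we assume strictly positive weights), the minimum weighted-norm solution of [2.2] can be computed with a pseudoinverse after rescaling by $\sqrt{p_j}$, which converts the weighted norm into a plain Frobenius norm; [2.9] then gives $R_2^-$:

```python
import numpy as np

def max_entropy_conditional_stats(u1, p, R12, R2):
    """Minimum weighted-norm solution of [2.2] and conditional covariance [2.9].
    u1  : (Q, N1) particle values in the u1 subspace
    p   : (Q,)  particle weights (assumed strictly positive)
    R12 : (N1, N2) cross covariance from [2.1]
    R2  : (N2, N2) prior covariance in the u2 subspace from [2.1]
    Returns the fluctuations ubar'_{2,j} (as rows) and R2^-."""
    u1f = u1 - p @ u1                     # fluctuations u1' about the weighted mean
    # constraint matrix of [2.2]: rows p_j * u1'_{i,j}, plus the row (p_1, ..., p_Q)
    # enforcing that the conditional-mean fluctuations average to zero
    A = np.vstack([p * u1f.T, p])         # (N1 + 1, Q)
    B = np.vstack([R12, np.zeros((1, R12.shape[1]))])
    # substituting z_j = sqrt(p_j) x_j turns the weighted norm sum_j p_j |x_j|^2
    # into the Frobenius norm, so the pseudoinverse yields the minimizer
    sp = np.sqrt(p)
    Z = np.linalg.pinv(A / sp) @ B        # minimum-norm solution of (A D^{-1/2}) Z = B
    m2f = Z / sp[:, None]                 # recover x_j = ubar'_{2,j}
    R2m = R2 - (p * m2f.T) @ m2f          # conditional covariance, [2.9]
    return m2f, R2m
```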
2.2 Resampling strategies

Here we add some details about the resampling process. In the resampling step, particles with small weights are abandoned, while the remaining particles are duplicated according to their weights so as to keep the ensemble size unchanged. One useful and popular approach is residual resampling: in a first step, the $j$-th particle is replicated $\lfloor p_j Q \rfloor$ times according to its weight; the remaining $Q - \sum_j \lfloor p_j Q \rfloor$ members are then assigned particle values randomly, with probabilities proportional to the residual weights $p_j Q - \lfloor p_j Q \rfloor$.

An additional issue arises in the process of duplicating particles: after resampling, several particles share the same value. For stochastic systems this is not a problem, since the random external forcing adds uncertainty to each particle and produces different predictions even when the initial values coincide. However, for deterministic systems with internal instability (for example, the L-96 system), such resampling would be of no real value, since no external randomness is introduced to generate different outputs in the next step. One possible strategy to avoid this problem is to add small perturbations to the duplicated particles. Due to the internal instability of the system, small perturbations at the initial time grow into large deviations in the forecast before the next analysis step, so the duplicated particles become separated. The amplitude of the perturbation is set according to the uncertainties of the variables, which can be approximated from the filtered results before the resampling step. After updating the weights, we have the set of particles together with their new weights, $(u_j', p_j)$; a perturbation is then added to each particle as white noise with variance $\sigma^2 = \sum_j u_j'^2 p_j$. Note that this perturbation can be viewed as an inflation of the particles; since the posterior covariance matrix is calculated from the resampled particles, the covariance also gets inflated accordingly. Numerical simulations show that this strategy is effective in avoiding filter divergence (a sketch of the combined resampling and perturbation step is given below).
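The following minimal sketch (our own illustration, not the paper's code) combines residual resampling with the jitter step; the perturbation variance $\sigma^2 = \sum_j u_j'^2 p_j$ is applied component-wise here, which is one reading of the scalar formula above, and for simplicity the noise is added to every resampled member rather than only to duplicates:

```python
import numpy as np

def residual_resample_with_jitter(particles, weights, rng=None):
    """Residual resampling followed by the small-perturbation step described above.
    particles : (Q, d) particle values;  weights : (Q,) normalized weights."""
    rng = rng if rng is not None else np.random.default_rng()
    Q = len(weights)
    counts = np.floor(weights * Q).astype(int)        # floor(p_j Q) deterministic copies
    n_rest = Q - counts.sum()                         # leftover ensemble slots
    if n_rest > 0:
        resid = weights * Q - counts                  # residual weights p_j Q - floor(p_j Q)
        counts += np.bincount(rng.choice(Q, size=n_rest, p=resid / resid.sum()),
                              minlength=Q)
    new = particles[np.repeat(np.arange(Q), counts)].astype(float)
    fluct = particles - weights @ particles           # u'_j about the weighted mean
    sigma2 = weights @ fluct**2                       # component-wise variance estimate
    new += rng.normal(scale=np.sqrt(sigma2), size=new.shape)  # jitter copies apart
    return new, np.full(Q, 1.0 / Q)                   # uniform weights after resampling
```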
3 Blended Filter Performance on the L-96 Model

Here we present additional results for the blended filter methods compared with the MQG filter as well as EAKF with inflation and localization. The L-96 model is used as the test model,

$$\frac{du_i}{dt} = u_{i-1}\big(u_{i+1} - u_{i-2}\big) - u_i + F, \qquad i = 0, \ldots, J-1, \qquad [3.1]$$

with $J = 40$ and $F$ the deterministic forcing parameter (a minimal integration sketch is given at the end of this section). We test the filter performance in both the weakly chaotic regime ($F = 5$) and the strongly chaotic regime ($F = 8$), as described in the main text. The statistics for these two cases are shown in Figure 3.1: strongly non-Gaussian statistics are visible in the weakly chaotic regime $F = 5$, while near-Gaussian distributions appear in the strongly chaotic regime $F = 8$.

Figure 3.2 shows the RMS errors and pattern correlations for the various filtering methods in regime $F = 5$ with moderate observational error and observation frequency, $r^0 = 2$, $\Delta t = 1$, $p = 4$. Figure 3.3 gives the joint distributions of the first two principal Fourier modes $\hat{u}_7$ and $\hat{u}_8$ in this regime as scatter plots, to further visualize the non-Gaussian statistics captured by the different schemes. Finally, Figure 3.4 gives additional examples of the filters' performance in regimes with sparse, infrequent, high-quality observations with slightly larger observation noise, $r^0 = 0.25$. In addition, we compare the performance of the QG-DO method with the crude covariance inflation $R_2^- = R_2$: incorrect energy transfers in the QG forecast model over the long forecast time with the original covariance approximation [2.9] lead in this case to serious filter divergence.

[Figure 3.1 here] Fig. 3.1: Statistics for the L-96 model: energy spectra in Fourier space (first row); the pdfs for the first two principal Fourier modes (second row); and the autocorrelation functions of the state variable (third row). Regimes of weakly chaotic ($F = 5$, left) and strongly chaotic ($F = 8$, right) dynamics are shown.

[Figure 3.2 here] Fig. 3.2: RMS errors and pattern correlations in the weakly chaotic regime ($F = 5$) with parameters $r^0 = 2$, $\Delta t = 1$, $p = 4$. Results from the different filtering methods are compared.

[Figure 3.3 here] Fig. 3.3: Joint distributions of the first two principal Fourier modes $\hat{u}_7$ and $\hat{u}_8$ shown as scatter plots for the different filter schemes in regime $F = 5$; panels: (a) truth, (b) MQG-DO, (c) QG-DO, (d) MQG, (e) EAKF.

[Figure 3.4 here] Fig. 3.4: RMS errors and pattern correlations in the strongly chaotic regime ($F = 8$) with parameters $r^0 = 0.25$, $\Delta t = 0.15$, $p = 4$. Results from the different filtering methods are compared; in addition, the last row compares the RMSEs of the QG-DO method with inflated (blue) and original (red) conditional covariance $R_2^-$.
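For reference, here is a minimal sketch of the L-96 dynamics [3.1] used throughout these experiments; the fourth-order Runge–Kutta integrator and all names are our choices for illustration, as the time stepper is not specified in this supplement:

```python
import numpy as np

def l96_rhs(u, F):
    """Tendency of the L-96 model [3.1]:
    du_i/dt = u_{i-1} (u_{i+1} - u_{i-2}) - u_i + F, with periodic index i."""
    return np.roll(u, 1) * (np.roll(u, -1) - np.roll(u, 2)) - u + F

def l96_step_rk4(u, F, dt):
    """One classical fourth-order Runge-Kutta step for the L-96 state u."""
    k1 = l96_rhs(u, F)
    k2 = l96_rhs(u + 0.5 * dt * k1, F)
    k3 = l96_rhs(u + 0.5 * dt * k2, F)
    k4 = l96_rhs(u + dt * k3, F)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```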