Blended Particle Filters for Large Dimensional Chaotic Dynamical Systems – Supplementary Material

Andrew J. Majda^1, Di Qi^1, and Themistoklis P. Sapsis^2

^1 Department of Mathematics and Center for Atmosphere and Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012
^2 Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139

1 Proof of Proposition 1
Proposition 1. Assume the prior distribution from the forecast is the blended particle filter conditional Gaussian distribution

p^-(u) = \sum_{j=1}^{Q} p_{j,-} \, \delta(u_1 - u_{1,j}) \, \mathcal{N}(\bar{u}_{2,j}^-, R_{2,j}^-),   [1.1]

and assume the observations have the structure

v = G(u) + \sigma^0 = G_0(u_1) + G_1(u_1) u_2 + \sigma^0;   [1.2]

then the posterior distribution in the analysis step taking into account the observations in [1.2] is also a blended particle filter conditional Gaussian distribution, i.e. there are explicit formulas for the updated weights, p_{j,+}, 1 ≤ j ≤ Q, and conditional mean, \bar{u}_{2,j}^+, and covariance, R_{2,j}^+, so that

p^+(u) = \sum_{j=1}^{Q} p_{j,+} \, \delta(u_1 - u_{1,j}) \, \mathcal{N}(\bar{u}_{2,j}^+, R_{2,j}^+).   [1.3]

In fact, the distributions \mathcal{N}(\bar{u}_{2,j}^+, R_{2,j}^+) are updated by suitable Kalman filter formulas, with the mean update for \bar{u}_{2,j}^+ depending nonlinearly on u_{1,j} in general.
Proof. Let u = (u_1, u_2), u_j ∈ \mathbb{R}^{N_j}, N_1 + N_2 = N, and let u_1 = \{u_{1,j}\}_{j=1}^{Q} be sampled in Q discrete states such that [1.1] is satisfied. Using the discrete representation of u_1, the posterior distribution can be written as

p^+(u \mid v) = \sum_{j=1}^{Q} p^+(u, u_{1,j} \mid v) = \sum_{j=1}^{Q} p^+(u \mid v, u_{1,j}) \, p_{j,+},   [1.4]

where p_{j,+} = p^+(u_{1,j} \mid v) is the posterior probability of the j-th state.
1. First, consider the general likelihood function p(v \mid u) (not necessarily Gaussian). A simple application of Bayes' theorem gives

p^+(u \mid v, u_{1,j}) = \frac{p(v \mid u, u_{1,j}) \, p^-(u, u_{1,j})}{p(v, u_{1,j})}
                      = \frac{p(v \mid u, u_{1,j}) \, p^-(u \mid u_{1,j})}{\int p(v \mid u_{1,j}, u_2) \, p(u_2 \mid u_{1,j}) \, du_2}
                      = \frac{p(v \mid u_{1,j}, u_2) \, \Pi_{j,-}}{\tilde{I}_j},   [1.5]

with \tilde{I}_j = \int p(v \mid u_{1,j}, u_2) \, \mathcal{N}(\bar{u}_{2,j}^-, R_{2,j}^-) \, du_2, 1 ≤ j ≤ Q, and \Pi_{j,-} = p^-(u \mid u_{1,j}) = \delta(u_1 - u_{1,j}) \, \mathcal{N}(\bar{u}_{2,j}^-, R_{2,j}^-) by the conditional Gaussian assumption.
2. Then, to determine the posterior weights p_{j,+}, the formula of conditional probability for the posterior gives

p(u, u_{1,j}, v) = p(u \mid v, u_{1,j}) \, p^+(u_{1,j} \mid v) \, p(v) = p(u \mid v, u_{1,j}) \, p_{j,+} \, p(v),   [1.6]

and for the prior

p(u, u_{1,j}, v) = p(v \mid u, u_{1,j}) \, p^-(u, u_{1,j}) = p(v \mid u, u_{1,j}) \, p_{j,-} \, \Pi_{j,-}.   [1.7]

Comparing [1.6] with [1.7], we get the identity

p(v \mid u, u_{1,j}) \, p_{j,-} \, \Pi_{j,-} = p(u \mid v, u_{1,j}) \, p_{j,+} \, p(v).   [1.8]

Using the fact that \sum_{j=1}^{Q} \int p(u \mid v, u_{1,j}) \, p_{j,+} \, du_2 = 1 and integrating both sides of [1.8],

p(v) = \sum_{j=1}^{Q} \int p(v \mid u_{1,j}, u_2) \, p_{j,-} \, \Pi_{j,-} \, du_2 = \sum_{j=1}^{Q} p_{j,-} \, \tilde{I}_j.   [1.9]

Therefore, substituting [1.5] and [1.9] into [1.8],

p_{j,+} = \frac{p_{j,-} \int p(v \mid u_{1,j}, u_2) \, p(u_2 \mid u_{1,j}) \, du_2}{p(v)} = \frac{p_{j,-} \tilde{I}_j}{\sum_{k=1}^{Q} p_{k,-} \tilde{I}_k}.   [1.10]
3. Finally, with observations in the form of [1.2] and Gaussian likelihood function p(v \mid u_{1,j}, u_2) = p_G(v - G_0(u_{1,j}) - G_1(u_{1,j}) u_2), p_G ∼ \mathcal{N}(0, R^0), the numerator of [1.5] becomes the standard Kalman filter update. That is,

p^+(u \mid v, u_{1,j}) = \frac{\delta(u_1 - u_{1,j}) \, \mathcal{N}(\bar{u}_{2,j}^-, R_{2,j}^-) \, p_G(v - G_0(u_{1,j}) - G_1(u_{1,j}) u_2)}{\tilde{I}_j}
                      = \delta(u_1 - u_{1,j}) \, \mathcal{N}(\bar{u}_{2,j}^+, R_{2,j}^+),   [1.11]

where \bar{u}_{2,j}^+, R_{2,j}^+, depending nonlinearly on u_{1,j} in general, are the posterior mean and covariance obtained through a suitable Kalman filtering process. Also, the weight updating factor \tilde{I}_j can be calculated explicitly by integrating the product of the two Gaussian distributions,

\tilde{I}_j = \int \exp\!\left(-\tfrac{1}{2} (v - G(u))^T (R^0)^{-1} (v - G(u))\right) \exp\!\left(-\tfrac{1}{2} (u_2 - \bar{u}_{2,j}^-)^T (R_{2,j}^-)^{-1} (u_2 - \bar{u}_{2,j}^-)\right) du_2
            = \det\!\left(R_{2,j}^+\right)^{-1/2} \exp\!\left(\tfrac{1}{2}\left(\bar{u}_{2,j}^{+T} (R_{2,j}^+)^{-1} \bar{u}_{2,j}^+ - \bar{u}_{2,j}^{-T} (R_{2,j}^-)^{-1} \bar{u}_{2,j}^- - (v - GEu_{1,j})^T (R^0)^{-1} (v - GEu_{1,j})\right)\right).   [1.12]

Therefore, the expression for the posterior distribution [1.4] is derived explicitly in the form [1.10], [1.11], and [1.12].
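As a concrete illustration of the weight update [1.10], note that for the Gaussian likelihood of step 3 the factor \tilde{I}_j reduces to the marginal likelihood of v under particle j, \mathcal{N}(G_0(u_{1,j}) + G_1(u_{1,j})\bar{u}_{2,j}^-, \; G_1 R_{2,j}^- G_1^T + R^0). The following minimal sketch (all function and variable names are our own, not from the text) evaluates it in log space for numerical stability:

```python
import numpy as np

def update_weights(p_minus, v, G0, G1, u2_means, R2, R0):
    """Posterior weights p_{j,+} from Eq. [1.10], up to the common det factor.

    Itilde_j is the Gaussian marginal likelihood of v given particle j:
    N(G0_j + G1_j @ u2bar_j, G1_j @ R2_j @ G1_j.T + R0).
    """
    Q = len(p_minus)
    logI = np.empty(Q)
    for j in range(Q):
        mean = G0[j] + G1[j] @ u2_means[j]        # predicted observation
        S = G1[j] @ R2[j] @ G1[j].T + R0          # innovation covariance
        r = v - mean
        _, logdet = np.linalg.slogdet(S)
        logI[j] = -0.5 * (r @ np.linalg.solve(S, r) + logdet)
    w = np.log(p_minus) + logI
    w = np.exp(w - w.max())                       # stabilize before normalizing
    return w / w.sum()

# toy example: two particles, scalar u2, scalar observation
p_plus = update_weights(
    p_minus=np.array([0.5, 0.5]), v=np.array([1.0]),
    G0=[np.array([0.0]), np.array([3.0])],
    G1=[np.eye(1), np.eye(1)],
    u2_means=[np.array([0.0]), np.array([0.0])],
    R2=[np.eye(1), np.eye(1)], R0=0.5 * np.eye(1))
```

In the toy call the first particle predicts an observation closer to v, so its posterior weight exceeds the second particle's.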
2 Details of Blended Filter Algorithms
Here we describe the details of the blended filter algorithms. After the forecast step, we get the predicted values for the mean state \bar{u} and covariance matrix R, together with the particle representation under the DO basis E(t) = \{e_1(t), \ldots, e_s(t)\},

u(t) = \bar{u}(t) + \sum_{i=1}^{s} Y_i(t; \omega) \, e_i(t),

where \{e_i(t)\}_{i=1}^{s} is the subdimensional dynamical basis, and \{Y_i\} are the corresponding stochastic coefficients obtained through Monte Carlo simulations. The particle statistics represented by Y_i must be consistent with the covariance matrix R, and Y_i, Y_j are normalized to be independent of each other, that is,

\langle Y_i Y_j^* \rangle = (e_i \cdot R e_j) \, \delta_{ij},   1 ≤ i, j ≤ s.
For the analysis step, filtering is implemented in two separate subspaces. The projection operator is calculated by completing the dynamical basis, P = (E, E^{\perp}); the orthogonal complement E^{\perp} can be chosen freely here. Initially each particle is uniformly weighted as (Y_i, p_i) = (Y_i, 1/Q). The following is the analysis step algorithm for the blended methods.
Algorithm. Blended filtering (analysis step)
• Project the mean and covariance matrix onto the two subspaces

\begin{pmatrix} \bar{u}_1 \\ \bar{u}_2 \end{pmatrix} = P^T \bar{u},   R_t = P^T R P = \begin{pmatrix} R_1 & R_{12} \\ R_{12}^T & R_2 \end{pmatrix}.   [2.1]
• Solve the following linear system to find the conditional mean \bar{u}_2(u_{1,j}) = \bar{u}_{2,j}^- (denote by u_1' = u_1 - \langle u_1 \rangle, \bar{u}_2'(u_1) = \bar{u}_2(u_1) - \langle u_2 \rangle the fluctuations about the mean)

\begin{pmatrix}
p_1 u_{1,1}'^1 & \cdots & p_Q u_{1,1}'^Q \\
\vdots & \ddots & \vdots \\
p_1 u_{1,N_1}'^1 & \cdots & p_Q u_{1,N_1}'^Q \\
p_1 & \cdots & p_Q
\end{pmatrix}_{(N_1+1) \times Q}
\begin{pmatrix}
u_{2,1}'^1 & \cdots & u_{2,N_2}'^1 \\
u_{2,1}'^2 & \cdots & u_{2,N_2}'^2 \\
\vdots & \ddots & \vdots \\
u_{2,1}'^Q & \cdots & u_{2,N_2}'^Q
\end{pmatrix}_{Q \times N_2}
= \begin{pmatrix} R_{12} \\ 0 \end{pmatrix}.   [2.2]

Note that this is an underdetermined system for a sufficiently large number of particles, Q ≫ N_1 + 1, and the '−' notation is suppressed here.
• Calculate the conditional covariance R_2^- in the u_2 subspace by

R_2^- = R_2 + \langle u_2 \rangle \otimes \langle u_2 \rangle - \int \bar{u}_2(u_1) \otimes \bar{u}_2(u_1) \, p(u_1) \, du_1
      = R_2 - \int \bar{u}_2'(u_1) \otimes \bar{u}_2'(u_1) \, p(u_1) \, du_1
      = R_2 - \sum_j \bar{u}_{2,j}' \otimes \bar{u}_{2,j}' \, p_{j,-}.   [2.3]
• Use Kalman filter updates in the u_2 subspace

\bar{u}_{2,j}^+ = \bar{u}_{2,j}^- + K\left(v - GEu_{1,j} - GE^{\perp}\bar{u}_{2,j}^-\right),   [2.4a]
\tilde{R}_2^+ = \left(I - KGE^{\perp}\right) R_2^-,   [2.4b]
K = R_2^- \left(GE^{\perp}\right)^T \left(GE^{\perp} R_2^- \left(GE^{\perp}\right)^T + R^0\right)^{-1},   [2.4c]

with the (linear) observation operator G(u_1, u_2) = GEu_1 + GE^{\perp}u_2, where R^0 is the covariance matrix for the observation noise.
• Update the particle weights in the u_1 subspace by p_{j,+} ∝ p_{j,-} I_j, with

I_j = \exp\!\left(\tfrac{1}{2}\left(\bar{u}_{2,j}^{+T} (\tilde{R}_2^+)^{-1} \bar{u}_{2,j}^+ - \bar{u}_{2,j}^{-T} (R_2^-)^{-1} \bar{u}_{2,j}^- - (v - GEu_{1,j})^T (R^0)^{-1} (v - GEu_{1,j})\right)\right).   [2.5]

• Normalize the weights p_{j,+} = p_{j,-} I_j / \sum_j p_{j,-} I_j and use residual resampling.
• Get the posterior mean and covariance matrix from the posterior particle representation

\bar{u}_1^+ = \sum_j u_{1,j} \, p_{j,+},   \bar{u}_2^+ = \sum_j \bar{u}_{2,j}^+ \, p_{j,+},   [2.6a]

and

R_{1,ij}^+ = \sum_k Y_{i,k} Y_{j,k}^* \, p_{k,+},   1 ≤ i, j ≤ s,   [2.6b]
R_{12,ij}^+ = \sum_k Y_{i,k} \bar{u}_{j,k}'^{+*} \, p_{k,+},   1 ≤ i ≤ s, s + 1 ≤ j ≤ N,   [2.6c]
R_2^+ = \tilde{R}_2^+ + \sum_j \bar{u}_{2,j}'^+ \otimes \bar{u}_{2,j}'^+ \, p_{j,+}.   [2.6d]
• Rotate the stochastic coefficients and basis to principal directions in the s-dimensional dynamical subspace.
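For a single particle, the Kalman update [2.4] can be sketched as follows (a minimal illustration with hypothetical names; the quantities GE u_{1,j} and the operator GE^⊥ are assumed precomputed and passed in):

```python
import numpy as np

def kalman_update_u2(u2bar, v, Gu1, GEp, R2, R0):
    """Kalman update of Eq. [2.4] in the u2 subspace.

    u2bar: (N2,) prior conditional mean for one particle
    Gu1  : (M,) resolved part G E u_{1,j} of the observation operator
    GEp  : (M, N2) operator G E^perp acting on u2
    """
    S = GEp @ R2 @ GEp.T + R0                       # innovation covariance
    K = R2 @ GEp.T @ np.linalg.inv(S)               # gain, Eq. [2.4c]
    u2_plus = u2bar + K @ (v - Gu1 - GEp @ u2bar)   # mean update, Eq. [2.4a]
    R2_plus = (np.eye(len(u2bar)) - K @ GEp) @ R2   # covariance, Eq. [2.4b]
    return u2_plus, R2_plus

# scalar sanity check: direct observation of u2 with unit prior and noise
u2_plus, R2_plus = kalman_update_u2(
    u2bar=np.array([0.0]), v=np.array([2.0]), Gu1=np.array([0.0]),
    GEp=np.eye(1), R2=np.eye(1), R0=np.eye(1))
```

In the scalar check the gain is 1/2, so the posterior mean halves the innovation and the posterior variance is halved, as expected for equal prior and observation variance.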
Remark. 1. As derived in [1.12] of Proposition 1, the weight updating factor \tilde{I}_j has an additional component \det(R_{2,j}^+)^{-1/2}, which is expensive to compute given the high dimensionality of the orthogonal subspace u_2. For the blended algorithms in the main text, R_2^- is independent of the choice of particle j, so R_2^+ is also independent of j by [2.4b]. Therefore the expensive determinant term appears only as a normalization factor for the updated weights p_{j,+} and can be neglected in [2.5].
2. By the blended forecast model, the dynamical basis ej (t) keeps tracking the principal directions of
the system (that is, the direction with the largest eigenvalues of the covariance matrix R (t)). However, after
the analysis step, the principal directions are changed according to the observation data. Therefore, it is
necessary to rotate the basis according to the posterior covariance after every analysis step as described in
the last step above. Although numerical tests show reasonable results without the rotation, correcting at every step ensures that the scheme is robust and more accurate.
2.1 The conditional covariance R_2(u_1) given particles in the u_1 subspace
The forecast model gives the particle representation u_{1,j} in the low-dimensional subspace. The conditional Gaussian mixture (\bar{u}_{2,j}, R_{2,j}) in the orthogonal subspace is obtained by taking the least squares solution of the linear system [2.2] with minimum weighted L^2-norm, \sum_{j=1}^{Q} |\bar{u}_{2,j}'|^2 p_j. This idea comes from the maximum entropy principle. Another advantage of this max-entropy solution is that the conditional covariance R_2(u_1) = R_2^- is independent of the particles in the u_1 subspace.
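A minimal numerical sketch of this max-entropy solution (hypothetical names, not the authors' code): after rescaling the j-th unknown by \sqrt{p_j}, the weighted norm \sum_j |\bar{u}_{2,j}'|^2 p_j becomes the ordinary Euclidean norm, so the Moore–Penrose pseudoinverse of the correspondingly rescaled matrix in [2.2] yields the minimum-weighted-norm solution directly:

```python
import numpy as np

def conditional_means(u1, p, R12):
    """Minimum-weighted-norm solution of the linear system [2.2].

    u1 : (Q, N1) particle values in the resolved subspace
    p  : (Q,) particle weights
    R12: (N1, N2) cross-covariance block from [2.1]
    returns (Q, N2) conditional-mean fluctuations u2bar'_j
    """
    u1p = u1 - p @ u1                        # fluctuations about weighted mean
    # rows of A: p_j * u1'_{j,k}, k = 1..N1, plus the zero-mean constraint row p_j
    A = np.vstack([p * u1p.T, p])            # (N1+1, Q)
    B = np.vstack([R12, np.zeros((1, R12.shape[1]))])
    # change of variables w_j = sqrt(p_j) x_j turns the weighted norm into |w|^2
    s = np.sqrt(p)
    W = np.linalg.pinv(A / s) @ B            # minimum-norm solution in w
    return W / s[:, None]                    # back to x_j = w_j / sqrt(p_j)

# example: Q = 6 particles, N1 = 1, N2 = 2
u1 = np.array([[-1.0], [-0.5], [0.0], [0.5], [1.0], [2.0]])
p = np.full(6, 1.0 / 6.0)
R12 = np.array([[0.3, -0.1]])
X = conditional_means(u1, p, R12)
```

The returned fluctuations satisfy both constraint blocks of [2.2]: their weighted average vanishes, and their weighted cross-moments with u_1' reproduce R_{12}.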
One important point to note is that the prior covariance R_2 from [2.1] is different from the conditional covariance R_2(u_1) = R_2^- given the particle values in the u_1 subspace. This can be seen clearly from their definitions

R_2 = \iint (u_2 - \langle u_2 \rangle) \otimes (u_2 - \langle u_2 \rangle) \, p(u_2 \mid u_1) \, p(u_1) \, du_2 \, du_1,   [2.7]
R_2(u_1) = \int (u_2 - \bar{u}_2(u_1)) \otimes (u_2 - \bar{u}_2(u_1)) \, p(u_2 \mid u_1) \, du_2,   [2.8]

with \langle u_2 \rangle = \int \bar{u}_2(u_1) \, p(u_1) \, du_1. By a simple calculation combining [2.7] and [2.8] we have

\int R_2(u_1) \, p(u_1) \, du_1 = R_2 + \langle u_2 \rangle \otimes \langle u_2 \rangle - \int \bar{u}_2(u_1) \otimes \bar{u}_2(u_1) \, p(u_1) \, du_1
                               = R_2 - \int \bar{u}_2'(u_1) \otimes \bar{u}_2'(u_1) \, p(u_1) \, du_1,

with \bar{u}_2'(u_1) = \bar{u}_2(u_1) - \langle u_2 \rangle. Noting that R_2(u_1) = R_2^- is independent of the variable u_1, we get the approximation for the conditional covariance

R_2^- = R_2 - \int \bar{u}_2'(u_1) \otimes \bar{u}_2'(u_1) \, p(u_1) \, du_1 = R_2 - \sum_j \bar{u}_{2,j}' \otimes \bar{u}_{2,j}' \, p_j.   [2.9]
This conditional covariance formula is theoretically consistent, but it can introduce the problem of realizability. As discussed in the main text, we may then need to introduce a realizability check and correction method for R_2^- after each analysis step, or simply inflate the matrix as R_2^- = R_2 (labeled as 'inflated R2' in the figures below). Both strategies work well for the MQG-DO method, while the QG-DO method requires the larger inflation approach in practical implementations, as shown in the results below.
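One possible form of such a realizability check (a sketch under our own conventions, not the authors' implementation) is to symmetrize R_2^- from [2.9] and clip its negative eigenvalues to zero, with the crude inflation R_2^- = R_2 as a fallback:

```python
import numpy as np

def realizable_R2(R2, u2bar_prime, p, inflate=False):
    """Conditional covariance of Eq. [2.9] with a realizability safeguard.

    R2          : (N2, N2) prior covariance block from [2.1]
    u2bar_prime : (Q, N2) conditional-mean fluctuations
    p           : (Q,) particle weights
    inflate     : if True, use the crude inflation R2^- = R2
    """
    if inflate:
        return R2.copy()
    R2m = R2 - (u2bar_prime.T * p) @ u2bar_prime    # Eq. [2.9]
    # realizability correction: clip any negative eigenvalues to zero
    lam, V = np.linalg.eigh((R2m + R2m.T) / 2)
    return (V * np.maximum(lam, 0.0)) @ V.T

# example where [2.9] alone loses positive semidefiniteness
R2 = np.eye(2)
u2p = np.array([[2.0, 0.0], [-2.0, 0.0]])
w = np.array([0.5, 0.5])
R2m = realizable_R2(R2, u2p, w)
```

In the example the subtracted term overshoots the prior variance in the first coordinate, and the clipped result is positive semidefinite.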
2.2 Resampling strategies
Here we add some details about the resampling process. In the resampling step, particles with small weights are abandoned, while the other particles are duplicated according to their weights to keep the ensemble size unchanged. One useful and popular approach is the residual resampling method. The idea is to replicate the j-th particle \lfloor p_j Q \rfloor times according to its weight in the first step; the remaining Q - \sum_j \lfloor p_j Q \rfloor members are then assigned particle values randomly with probability proportional to the residual weights p_j Q - \lfloor p_j Q \rfloor.

An additional issue arises in the process of duplicating particles. After resampling, we will get several particles with the same value. For stochastic systems this is not a problem, since the random external forcing term adds uncertainty to each particle and produces different predictions even when the initial values are the same. However, for deterministic systems with internal instability (for example, the L-96 system), such resampling would be of no real value, since no external randomness is introduced to separate the duplicated particles in the next step. One possible strategy to avoid this problem is to add small perturbations to the duplicated particles. Due to the internal instability of the system, small perturbations at the initial time grow into large deviations in the forecast before the next analysis step, so the duplicated particles become separated. The amplitude of the perturbation is set according to the uncertainties of the variables, which can be approximated by the filtered results before the resampling step. After updating the weights, we get the set of particles together with their new weights (u_j', p_j); then a perturbation is added to each particle as white noise with variance \sigma^2 = \sum_j u_j'^2 p_j. Note that this perturbation can be viewed as an inflation of the particles. We calculate the posterior covariance matrix from the resampled particles, so the covariance is also inflated accordingly. Numerical simulations show that this strategy is effective in avoiding filter divergence.
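The residual resampling step with the perturbation strategy described above can be sketched as follows (hypothetical names; the jitter variance follows the weighted-spread rule \sigma^2 = \sum_j u_j'^2 p_j applied per coordinate):

```python
import numpy as np

def residual_resample_with_jitter(particles, weights, rng=None):
    """Residual resampling with small perturbations for deterministic systems.

    particles: (Q, d) particle values; weights: (Q,) normalized weights.
    Each duplicate receives white-noise jitter scaled by the weighted spread
    of each coordinate, so identical copies separate under the dynamics.
    """
    rng = np.random.default_rng() if rng is None else rng
    Q = len(weights)
    counts = np.floor(weights * Q).astype(int)       # deterministic copies
    residual = weights * Q - counts
    n_rest = Q - counts.sum()
    if n_rest > 0:                                   # draw the rest from residuals
        extra = rng.choice(Q, size=n_rest, p=residual / residual.sum())
        counts += np.bincount(extra, minlength=Q)
    idx = np.repeat(np.arange(Q), counts)
    new = particles[idx].astype(float)
    # jitter by the weighted spread of each coordinate (an inflation)
    mean = weights @ particles
    sigma2 = weights @ (particles - mean) ** 2
    new += rng.normal(scale=np.sqrt(sigma2), size=new.shape)
    return new

# degenerate example: all weight on one particle, zero spread -> pure copies
rng = np.random.default_rng(0)
particles = np.array([[1.0], [2.0], [3.0], [4.0]])
weights = np.array([1.0, 0.0, 0.0, 0.0])
resampled = residual_resample_with_jitter(particles, weights, rng)
```

When all weight sits on one particle the weighted spread vanishes, so the resampled ensemble consists of exact copies; in the generic case the jitter variance is positive and separates the duplicates.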
3 Blended Filter Performance on the L-96 model
Here we present additional results for the blended filter methods compared with the MQG filter as well as EAKF with inflation and localization. The L-96 model is used as the test model

\frac{du_i}{dt} = u_{i-1}(u_{i+1} - u_{i-2}) - u_i + F,   i = 0, \ldots, J - 1,   [3.1]

with J = 40 and F the deterministic forcing parameter. We test the filter performance in both the weakly chaotic regime (F = 5) and the strongly chaotic regime (F = 8) as described in the main text. The statistics for these two cases are shown in Figure 3.1. Strongly non-Gaussian statistics can be seen in the weakly chaotic regime F = 5, while a near-Gaussian distribution is found in the strongly chaotic regime F = 8. Figure 3.2 shows the RMS errors and pattern correlations for various filtering methods in the regime F = 5 with moderate observational error and observation frequency, r_0 = 2, ∆t = 1, p = 4. Figure 3.3 gives the joint distributions of the first two principal Fourier modes û_7 and û_8 in this regime by scatter plots, to further visualize the non-Gaussian statistics captured by the different schemes. Finally, Figure 3.4 gives additional examples of the performance of the filters in regimes with sparse, infrequent, high-quality observations with slightly larger observation noise r_0 = 0.25. In addition, we compare the performance of the QG-DO method with crude covariance inflation R_2^- = R_2. Incorrect energy transfers in the QG forecast model over the long forecast time using the original covariance approximation [2.9] in this case end up with serious filter divergence.
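For reference, the L-96 model [3.1] is straightforward to integrate; a minimal sketch with periodic indexing follows (the fourth-order Runge–Kutta scheme here is a standard choice, not specified in the text):

```python
import numpy as np

def l96_rhs(u, F):
    """Right-hand side of the L-96 model [3.1] with periodic boundary indices."""
    return np.roll(u, 1) * (np.roll(u, -1) - np.roll(u, 2)) - u + F

def l96_step(u, F, dt):
    """One fourth-order Runge-Kutta step of the L-96 model."""
    k1 = l96_rhs(u, F)
    k2 = l96_rhs(u + 0.5 * dt * k1, F)
    k3 = l96_rhs(u + 0.5 * dt * k2, F)
    k4 = l96_rhs(u + dt * k3, F)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

At the equilibrium u_i ≡ F the right-hand side vanishes, which gives a quick sanity check; chaotic trajectories for F = 5 or F = 8 are obtained by stepping from a small perturbation of this state with J = 40.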
Fig. 3.1: Statistics for the L-96 model: energy spectra in Fourier space (first row); the pdfs for the first two
principal Fourier modes (second row); and the autocorrelation functions of the state variable (third
row). Regimes for weakly chaotic (F = 5, left) and strongly chaotic (F = 8, right) dynamics are
shown.
[Figure 3.2 here: panels 'RMSE for F=5, p=4, r0=2, ∆t=1, S=5' and 'XC for F=5, p=4, r0=2, ∆t=1, S=5'; curves: obs. error, MQG-DO, EAKF, QG-DO, MQG.]
Fig. 3.2: RMS errors and pattern correlations in the weakly chaotic regime (F = 5) with parameters r0 =
2, ∆t = 1, p = 4. Results by different filtering methods are compared.
[Figure 3.3 here: scatter plots of |u_7| vs. |u_8| for (a) truth, (b) MQG-DO, (c) QG-DO, (d) MQG, (e) EAKF, all for F = 5.]
Fig. 3.3: Joint distributions of the first two principal Fourier modes û7 and û8 shown by scatter plots with
different filter schemes in regime F = 5.
[Figure 3.4 here: panels 'RMSE for F=8, p=4, r0=0.25, ∆t=0.15, S=5' and 'XC for F=8, p=4, r0=0.25, ∆t=0.15, S=5' (curves: obs. error, MQG, MQG-DO, QG-DO, EAKF), plus 'RMSE for QG-DO' comparing the conditional R_2^- with the inflated R_2.]
Fig. 3.4: RMS errors and pattern correlations in the strongly chaotic regime (F = 8) with parameters r0 = 0.25, ∆t = 0.15, p = 4. Results by different filtering methods are compared; in addition, the last row compares the RMSEs for the QG-DO method with inflated (blue) and original (red) conditional covariance R_2^-.