A Random Subgrouping Scheme for Ensemble Kalman Filters
Yun Liu
Dept. of Atmospheric and Oceanic Science, University of Maryland
Atmospheric and Oceanic Sciences and Center for Climatic Research, UW-Madison
Collaborators: X. Rong, Z. Liu, S. Wu, S. Zhang, R. Jacob, F. Lu

Outline
• Background
• A Random Subgrouping Scheme for Ensemble Kalman Filters
• OSSE test results

Ensemble Kalman Filter
The ensemble Kalman filter is a Monte Carlo approximation of the Kalman filter. It samples the probability density functions (PDFs) of the forecast and the analysis with an ensemble (Evensen 1994).
[Figure panels: before the ensemble Kalman filter; after the ensemble Kalman filter]

Stochastic filter and deterministic filter
Stochastic filter (EnKF) vs deterministic filter (EnSRF)
EnKF: the observations are treated as random variables that are perturbed according to their uncertainties (Burgers et al. 1998).
  $x_i^a = x_i^f + K(y_i - H x_i^f)$
EnSRF: the forecast ensemble is transformed to match the analysis and its theoretical uncertainty from the Kalman filter (Anderson 2001, Bishop et al. 2001, Whitaker and Hamill 2002, Tippett et al. 2003, Hunt et al. …).
  Mean: $\bar{x}^a = \bar{x}^f + K(y^o - H\bar{x}^f)$
  Perturbations: $x^{a\prime} = A x^{f\prime}$ or $x^{a\prime} = x^{f\prime} A$, chosen so that $P^a = (I - KH) P^f$

Error sources for ensemble Kalman filters
Sampling error from limited ensemble size:

              EnKF   EnSRF
Forecast      Yes    Yes
Observation   Yes    No

EnSRFs perform better than the EnKF for a linear system, especially when the ensemble size is small.
Linear hypothesis: error from non-Gaussian PDFs (in a nonlinear system). Here the EnKF performs better than EnSRFs.

Lorenz63 model
Experiment design:
• Model time step: 0.01
• Observation interval: 0.1
• Observation error standard deviation: (2.0, 2.0, 2.0) for x, y, z, respectively

A case study for the Lorenz63 model
[Figure: error evolution of the 80-member ensemble for EnKF / EAKF assimilation of the Lorenz63 model. Individual members are black lines and the ensemble means are red lines. Figure annotation: outlier]

EnKF vs EAKF for Lorenz63
• The EAKF performs worse when the ensemble size increases.
• The outlier effect becomes more severe as the ensemble size increases.

Kurtosis is an index for the presence of outliers
  $\mathrm{Kur} = \dfrac{\sum_i (x_i - \bar{x})^4}{\left(\sum_i (x_i - \bar{x})^2\right)^2}$

Challenge
• The extreme outlier problem arising from non-Gaussian PDFs is a challenge for ensemble Kalman filters, especially for EnSRFs with a large ensemble size (Lawson and Hansen 2004, Anderson 2010, Lei et al. 2010).

Previous works
• Sakov and Oke (2008): a random transformation.
  $x^{a\prime} = x^{f\prime} A \rightarrow P^a = (I - KH) P^f$
  $x^{a\prime} = x^{f\prime} A T$, with $T T^T = I$ and $T\mathbf{1} = \mathbf{1}$
  T is an arbitrary orthonormal mean-preserving matrix.
• Anderson (2010): rank histogram filter.

A random subgrouping technique on EnSRF (sEnSRFn)
The sEnSRFn randomly divides the full ensemble into sub-ensembles of equal size n at each analysis step. All sub-ensembles are updated independently using a deterministic filter algorithm.
The random subgrouping technique removes the effect of extreme outliers in two ways:
• the smaller size of each sub-ensemble limits the divergence of an outlier from the ensemble mean;
• the random subgrouping prevents an outlier from persisting.

Flow chart: EnKF / EnSRF / sEnSRF5
[Figure: flow charts of the three schemes]

Test in a simple model
Model: Lorenz63 model
Observation: random errors with a standard deviation of (2, 2, 2) added to the “truth”
Observation time interval: 0.1 (10 time steps)
Simulation length: 50 time units (500 analysis steps)
500 sets of experiments are performed using the EnKF, EAKF and sEAKFn schemes. Within each set of experiments, all three schemes use identical initial conditions and observations.

A case study
• The error evolution of the 80-member ensemble for EnKF / EAKF / sEAKF5 assimilation of the Lorenz63 model.
• Individual members (black lines) and the ensemble means (red lines).

• The ensemble kurtosis (a) and analysis RMSE (b) for different ensemble sizes, averaged over 500 sets of experiments.
• The x-axis represents the sample size of the subgroups of sEAKFn (EnKF, EAKF).
The solid lines are for sEAKFn assimilation, except the black line, which indicates the kurtosis of 3 for a Gaussian white noise distribution. The dashed lines are for the simulations that use the subgrouping scheme without randomization.
• The kurtosis (RMSE) from EAKF for ensemble sizes of 80 and 160 is 19.0 (0.84) and 67.0 (1.33), respectively, which is too large to be shown on the plots.

Assimilation time interval
• The ensemble kurtosis (a) and analysis RMSE ratio (b) for different assimilation schemes with an ensemble size of 80 and different observation time intervals. The results are averaged over 500 sets of experiments. The x-axis represents the observation time interval of the experiments.
• (a) The ensemble kurtosis of y for different assimilation schemes. The blue line with dots is EAKF, the green line with stars is EnKF, and the red line with circles is sEAKFn with the best sub-ensemble size.
• (b) The RMSE ratio between sEAKFn with the best sub-ensemble size and the corresponding EAKF (blue line with dots) or EnKF (green line with stars).

Random subgrouping EnKF
• The ensemble kurtosis (a) and analysis RMSE (b) of sEnKFn and EnKF for different ensemble sizes, averaged over 500 sets of experiments.

Test in an intermediate model: QG-slab ocean model
• ψ is the streamfunction, h_topo the topography, ϰ the coupling coefficient for the QG atmosphere, T the ocean temperature, and α the coupling coefficient for the ocean.
• H(y) is the mixed-layer depth and Qw the solar forcing for the ocean; both use patterns that are symmetric between the two hemispheres.
• Resolution: 64 × 54 Gaussian grid, 0.5-hour interval
• Observation for ψ: error scale ~ 10e6, assimilation interval: 12 hours
• Ensemble size: 80

Outliers in the QG-slab model with EAKF
The ensemble kurtosis of ψ from a case study (EAKF and sEAKF40), in which both schemes share identical initial conditions and observations. The contour interval is 1.0, with a minimum value of 5.0.

Case study
• The analysis RMSE difference of ψ between different assimilation schemes.
• (a) The analysis RMSE difference between EAKF and sEAKF40 from one set of experiments, which share identical initial conditions and observations.
• (b) The analysis RMSE difference between EAKF and EnKF from the same set of experiments as (a).

RMSE decrease with the new scheme
• The analysis RMSE difference of ψ between EAKF and sEAKF40, averaged over 32 sets of experiments. The green solid contours indicate the 95% confidence level.

Conclusion
• Ensemble Kalman filters can be divided into two categories: stochastic and deterministic. Both types of filters can generate extreme outliers in the ensemble when applied to a nonlinear system.
• An sEnSRF scheme is developed to solve the outlier problem with random subgrouping. The sEnSRF randomly divides the full ensemble into sub-ensembles of equal size at each analysis step. All sub-ensembles are updated independently using a deterministic filter algorithm.
• The random subgrouping technique removes the effects of extreme outliers in two ways: the smaller size of each sub-ensemble limits the deviation of an outlier from the ensemble mean, and the random subgrouping prevents the long-term persistence of an outlier.
• Test results on both a simple model and an intermediate model show that the new scheme significantly improves filtering performance compared to the traditional stochastic and deterministic filtering algorithms.

Thank you!
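
Backup: a sketch of the random subgrouping step
The analysis step summarized in the conclusion (randomly split the ensemble into equal-size sub-ensembles, then update each one independently with a deterministic filter) might look roughly like the Python below. This is a minimal illustration under stated assumptions, not the authors' implementation: every state variable is assumed to be observed directly with uncorrelated errors, each sub-ensemble is assumed to use its own mean and covariance, the deterministic filter is a serial square-root update in the style of Whitaker and Hamill (2002) rather than the EAKF used in the experiments, and the names `ensrf_update` and `random_subgroup_update` are hypothetical.

```python
import numpy as np


def ensrf_update(ens, obs, obs_var):
    """Serial square-root update of an ensemble of shape (members, state).

    Assumption: state variable j is observed directly by obs[j] with
    error variance obs_var[j], and observation errors are uncorrelated.
    """
    ens = np.array(ens, dtype=float)  # work on a copy
    n_members, n_state = ens.shape
    for j in range(n_state):
        mean = ens.mean(axis=0)
        pert = ens - mean
        hx_pert = pert[:, j]                      # perturbations of the observed variable
        hx_var = hx_pert @ hx_pert / (n_members - 1)
        cov = pert.T @ hx_pert / (n_members - 1)  # cov(state, observed variable)
        gain = cov / (hx_var + obs_var[j])        # Kalman gain for this observation
        # Reduced-gain factor so the perturbations match the Kalman analysis variance.
        alpha = 1.0 / (1.0 + np.sqrt(obs_var[j] / (hx_var + obs_var[j])))
        mean = mean + gain * (obs[j] - mean[j])
        pert = pert - alpha * np.outer(hx_pert, gain)
        ens = mean + pert
    return ens


def random_subgroup_update(ens, obs, obs_var, n_sub, rng):
    """Randomly split the ensemble into groups of n_sub members and
    update each group independently (assumed reading of sEnSRFn)."""
    n_members = ens.shape[0]
    if n_members % n_sub != 0:
        raise ValueError("ensemble size must be a multiple of n_sub")
    order = rng.permutation(n_members)  # new random grouping at every call
    out = np.empty(ens.shape, dtype=float)
    for start in range(0, n_members, n_sub):
        idx = order[start:start + n_sub]
        out[idx] = ensrf_update(ens[idx], obs, obs_var)
    return out


# Illustrative use: 80 members, 3 variables, sub-ensembles of 5 (made-up numbers).
rng = np.random.default_rng(0)
truth = np.array([1.0, -2.0, 20.0])
forecast = truth + rng.normal(scale=3.0, size=(80, 3))
obs_var = np.full(3, 2.0 ** 2)  # observation error standard deviation of 2
obs = truth + rng.normal(scale=2.0, size=3)
analysis = random_subgroup_update(forecast, obs, obs_var, n_sub=5, rng=rng)
print("forecast spread:", forecast.std(axis=0, ddof=1))
print("analysis spread:", analysis.std(axis=0, ddof=1))
```

Calling `random_subgroup_update` once per analysis step draws a fresh permutation each time, which is what keeps any one member from staying in the same sub-ensemble; setting `n_sub` equal to the ensemble size falls back to the plain deterministic update.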