Assimilation of non-linear observations using approximate background error covariances. Part II: Radar data assimilation into WRF ARW for short-term forecasting.

Chang-Hwan Park, Mark S. Kulie, and Ralf Bennartz
Atmospheric and Oceanic Sciences, University of Wisconsin – Madison

Abstract

Temporally and spatially highly resolving radar measurements are the only means to continuously observe dynamically evolving meteorological phenomena such as severe thunderstorms. Assimilation of radar data into mesoscale models might be a key factor to improve precipitation forecasting especially at shorter time scales. However, major obstacles for the assimilation of radar data lie in the strong non-linearity of the observation operators and the intermittent nature of the precipitation processes. These result in a severe violation of the assumption of Gaussian error characteristics in the data assimilation schemes, which manifests itself in unrealistic background error covariance matrices and in unstable solutions. In this paper, we propose a new method to address this problem and to assimilate radar observations into mesoscale models. The proposed solution includes two steps to obtain a well-conditioned background error covariance matrix: A normalizing step and a rescaling step. We also introduce a new weighting technique to avoid filter divergence, another common issue especially in Ensemble Kalman Filter (EnKF) type assimilation schemes. The proposed method is applied to simulate an intensive precipitation event near Madison, WI in July 2006 using the Weather Research and Forecasting (WRF) Model in an EnKF mode. The new method results in a significant improvement of short-range forecasting skills for this severe weather event.
Keywords: non-Gaussianity, non-linearity, discontinuity, filter divergence, flow-dependent optimization, perturbation method, inverse method, WRF and EnKF

1) Introduction

Forecasting severe thunderstorms and convective systems plays an important role in weather prediction. Severe thunderstorms can develop very quickly and can produce high winds, hail, and flash flooding, which are potentially life-threatening and can cause substantial property damage in a short time. According to the 30-year average (1977-2006) of the U.S. Natural Hazard Statistics, “Flash floods/floods are the #1 cause of deaths associated with thunderstorms.” (Sky warn review by Phil Hysell). Also, NOAA’s annual compilations of flood loss statistics show that flood damage tends to increase with time, possibly mostly due to urban sprawl. Fast and accurate nowcasting and short-term forecasting systems, including the prediction of the timing, location, and intensity of severe storms, are becoming more and more important in this regard.

Mesoscale models together with advanced methods to incorporate radar and other real-time observations have become an increasingly popular research avenue in this area. Only radars allow convective-scale thunderstorms to be observed at high temporal and spatial resolution (Sun 2005). Several studies indicate that Ensemble Kalman Filter (EnKF) approaches in particular are able to successfully integrate radar and model data for short-term prediction of mesoscale convective precipitation (Snyder and Zhang 2003; Dowell, Zhang et al. 2004; Zhang, Snyder et al. 2004; Tong and Xue 2005; Xue, Jung et al. 2007; Meng and Zhang 2008). The rapid evolution of mesoscale systems as well as the non-linear relation between radar reflectivity and precipitation intensity make the assimilation of radar data into models a challenging problem, both mathematically and physically.
Specifically, problems include the non-linearity of the observation operator (Evensen 2003), non-Gaussian background errors (Harlim and Hunt 2007), filter divergence (Houtekamer and Mitchell 1998), as well as the flow-dependence of the background-error covariances (Houtekamer and Mitchell 1998; Houtekamer and Mitchell 2001; Houtekamer, Mitchell et al. 2005). There are two interrelated key factors in EnKF approaches: firstly, the ensemble perturbation, which determines the initial spread of the ensemble and hence is of significant importance in determining the background error covariance; secondly, the actual ensemble update, in which observations are used to steer the ensemble in a direction consistent with the observations. Both are interrelated because the ensemble spread crucially determines the weight given to observations in the update.

Various approaches to incorporate radar data into mesoscale models have been reported. Lin, Ray et al. (1993) initialized a cloud-scale model using radar data, combining a thermodynamic retrieval method (Galchen and Kropfli 1984; Hane and Ray 1985) that provides pressure and potential temperature with model velocities retrieved from the radial velocity. Snyder and Zhang (2003) demonstrated an EnKF for the assimilation of single-Doppler radar observations in cloud-scale models by initially assimilating radial velocity. The observation operator for radial velocity is linear and based directly on prognostic model variables (i.e. wind). The simulation of radar reflectivity is more challenging than that of radial velocity, because the observation operator of radar reflectivity is highly non-linear and has a non-Gaussian error probability density function (PDF). Recent work (Tong and Xue 2005; Xue, Tong et al. 2006) employed radar reflectivities within the EnKF framework in the log domain because of the severe non-linearity of radar reflectivity. However, (Xue, Jung et al.
2007) indicate that the observation sampling error of radar reflectivity in the log domain violates the Gaussian assumption. One of the most serious problems in assimilating non-Gaussian variables is the non-physical update due to the non-Gaussian PDFs in the EnKF. Several methods for this problem are currently being investigated, e.g. localization (Anderson 2003; Ott, Hunt et al. 2004; Chen and Snyder 2007) and regularization (Beezley and Mandel 2008; Johns and Mandel 2008). Another problem in the forecast update using the EnKF is filter divergence. The divergence problem can be mitigated by a double EnKF (Houtekamer and Mitchell 1998), covariance inflation (Anderson 2001), multiplicative covariance inflation (Hamill and Whitaker 2005; Houtekamer, Mitchell et al. 2005), covariance relaxation (Zhang, Snyder et al. 2004), or weighting between static and ensemble covariances in a hybrid ensemble transform Kalman filter (Wang, Barker et al. 2008).

In this study we explore a new method to incorporate non-linear observations with non-Gaussian error covariances. This new method approximates the ill-defined background error covariance matrix of the observations using correlative information from other model variables. The approach is tested using radar observations of a convective precipitation event observed in Madison, WI in July 2006. Section 2 provides a synoptic and observational overview of this case study and describes the WRF model and its initialization for the convective case. Section 3 briefly reviews the theory of the EnKF and introduces the new method, the background error approximation. Section 4 validates the improvement in forecast capability obtained by the EnKF with the new method.
2) The Madison Flood Case

a) Overview of Synoptic situation and Radar observations

The heavy convective rain event modeled in this study occurred in the southern tier of the state of Wisconsin on 27 July 2006. As shown in Figure 1, the main synoptic feature at 15 UTC on 27 July 2006 was a stationary front that stretched across southern Wisconsin and served as the focal point for initial convective development.

Figure 1. Surface weather map for 15 UTC on 27 July 2006.

National Weather Service Weather Surveillance Radar (WSR-88D) Level II data from the Milwaukee/Sullivan, WI (KMKX) site were obtained from the National Climatic Data Center data archive. Base reflectivity data from 1600 to 1700 UTC were utilized in this study. Figure 2 illustrates the light clutter and clear air echoes that existed near the radar site early in the analyzed time period. Quality control of the radar reflectivity was performed manually by visual inspection. In this study, manual quality control is sufficiently reliable, because the reflectivity assimilated in the assimilation cycles, which can be attributed to precipitation, lies far from the ground clutter, as shown in Figure 2. It is therefore easy to remove non-precipitation features from the radar observations.

Figure 2. Base radar reflectivity from the KMKX radar site at 1630 UTC on 27 July 2006. The approximate location of the stationary front at 15 UTC is also indicated.

The synoptic environment was not tremendously dynamic. The stationary front moved gradually between 1200 UTC and 0000 UTC. In addition, the vertical wind profiles from nearby 12 UTC soundings indicated that very little vertical shear existed in the low to mid levels. Surface winds were very light, and a slight wind direction shift was evident in the vicinity of the boundary. Also, upper-level winds were not excessive in the region, so severe organized convection was not forecast by the National Weather Service, and WRF does not simulate it.
Nevertheless, the radar shows that convection initiated around 1630 UTC, as shown in Figure 2, and developed very quickly into severe deep convection. For example, by 1730, Doppler radar rainfall estimates exceeded 6 inches in isolated locales throughout south-central Wisconsin, and severe urban flash flooding was reported in the city of Madison. Additional intense rainfall formed after about 1900 UTC along a lake breeze front that progressed steadily inland through the western suburbs of Milwaukee (about 30 km from the shore of Lake Michigan), and further flash flooding from these convective cells was reported at 20 UTC.

The deep convection over the flood regions of south-central Wisconsin was driven by large convective available potential energy (CAPE). According to the morning National Weather Service Forecast Discussion on 27 July 2006 (available from the National Climatic Data Center archives), mixed-layer CAPE values based on modified soundings were near 2000 J kg^-1 in the region, with minimal to no convective inhibition. CAPE increases when the unsaturated lower layer contains a large amount of water vapor. By the definition of potential instability (Saucier 1955), if an air column with a drier upper layer and a moist lower layer is raised by stationary frontal lifting, the instability of the atmospheric state increases until the lower layer is entirely saturated. Even weak frontal lifting can therefore produce an unstable atmospheric state, because the dry upper air cools quickly at the dry-adiabatic lapse rate while the humid lower air cools slowly along the saturation-adiabatic lapse rate, and this instability can trigger severe convection. This synoptic and observational overview indicates that the severe convection that generated the flood over south-central Wisconsin was caused by high potential instability due to warm and moist conditions in the surface layer.
In order to improve the forecasting capability of WRF for this event, we need to modify the potential instability in the WRF model state using radar reflectivity and the EnKF.

b) The WRF-ARW model

The Weather Research and Forecasting Model (WRF) has been developed for advanced mesoscale numerical weather prediction. In this study, WRF is used to simulate a severe mesoscale convective system that produced heavy rain in south-central Wisconsin and urban street flooding in Madison, WI.

Table 1. WPS and WRF model setup for the Madison flood case

WRF Preprocessing System (WPS)   Option             Description
Data                             GFS-ANL            NCEP high-resolution Global Forecast System (1-degree GFS)
Center point                     43.071N            Latitude
                                 89.37W             Longitude
Spatial resolution               7.12 km            dx, dy in the mother domain
                                 1.42 km            dx, dy in the nested domain
Domain size                      700 x 1582 km2     Mother domain
                                 142 x 285 km2      Nested domain

WRF                              Option             Description
Microphysics                     WSM6               Mother and nested domains
Cumulus parameterization         Grell-3            Mother domain
Time control                     18 hr              2006/07/27 06:00 to 07/28 00:00
History interval                 15 min             Corresponding to the radar observations
Vertical levels                  27                 Number of vertical levels

Table 1 shows some of the important WRF model characteristics used in this study. The microphysics option chosen is the WRF Single Moment (WSM) 6-class graupel scheme. The WSM scheme has been developed as a 3-class scheme (vapor, cloud ice/water, snow/rain, based on a simple freezing/melting process), a 5-class scheme (vapor, rain, snow, cloud ice, and cloud water as five different arrays), and a 6-class scheme (similar to the 5-class scheme with graupel added). The WSM6 scheme is used in this study to compare the rain distribution between the radar observations and the WRF simulation, because the six-class scheme is believed to show the most suitable behavior for cloud-resolving grids (Hong 2006). The mother domain in this study has a resolution of about 7 km. For the mother domain we use the Grell (G3) cumulus convection scheme, which allows sub-grid convective effects at grid resolutions of 5 to 10 km.
We chose a grid scale of 1.4 km for the nested domain.

c) Ensemble Forecast using WRF

The timeline of the ensemble forecast is described in Figure 3. After the initial perturbation of the potential temperature field for the 15 ensemble members, a sufficient spin-up time (10 hours in this study) is required to establish hydrological balance in the model. From 1600 to 1700 UTC, radar reflectivity is assimilated into WRF every 15 minutes. After the assimilation, the 15 ensemble runs predict precipitation in the targeted forecasting cycles.

Figure 3. Forecast timeline modified by the application of the EnKF (R: radar data assimilation in WRF; S: spin-up time after the initial perturbation and after the initialization by observation assimilation; F: forecast after radar data assimilation).

Table 2 shows the restart options used in this study as well as the time interval between restarts. The restart option makes it possible to assimilate radar observations into the model. After the first model spin-up, WRF is stopped and updated through the EnKF. Because the update is performed only on the nested domain, the feedback and smooth options in WRF are used to also update the mother domain indirectly from the updated nested domain. These options help to provide proper boundary conditions for the nested domain from the mother domain.

Table 2. WRF model setup for the radar data assimilation (RDA)

WRF                    Option       Description
Restart                .correct.,   Restart after imposing spin-up time and RDA
Restart_interval (1)   1600         10 hr spin-up time from 06:00 to 16:00
Restart_interval (2)   15           RDA every 15 min from 16:00 to 17:00
Feedback & Smooth      1            Reflect the nested domain's assimilation product in the mother domain as well, to reduce the difference between the mother and nested domains

We represent the forecast ensemble by 15 ensemble members generated from the potential temperature field at 0600 UTC. The stability associated with the different ensemble members was varied randomly by perturbing the potential temperature field with random perturbations drawn from a Gaussian distribution.
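The perturbation step can be sketched as follows. This is only a minimal illustration, not the configuration used in the study: the function name `perturb_theta`, the surface standard deviation `sigma`, the idealized 27-level profile, and the choice of a vertical weight that decreases linearly from the surface to zero at the model top are all assumptions made for the example.

```python
import numpy as np

def perturb_theta(theta, n_members=15, sigma=1.0, seed=0):
    """Return an (n_members, n_levels) array of perturbed theta profiles.

    theta : 1-D potential temperature profile (K), index 0 = surface.
    sigma : standard deviation (K) of the Gaussian amplitude at the
            surface (hypothetical value, not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n_levels = theta.shape[0]
    # Assumed vertical weight: 1 at the surface, linearly down to 0 at the top,
    # so a positive amplitude warms the low levels and reduces stability.
    weight = np.linspace(1.0, 0.0, n_levels)
    # One Gaussian amplitude per ensemble member.
    amplitude = rng.normal(0.0, sigma, size=n_members)
    return theta[None, :] + amplitude[:, None] * weight[None, :]

# Idealized 27-level profile (hypothetical values).
theta = np.linspace(300.0, 340.0, 27)
members = perturb_theta(theta)
```

Each member differs from the unperturbed profile by a single random amplitude, so the ensemble spread in this sketch is controlled entirely by `sigma`.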
Figure 4. Sampling of the 15 ensemble members of the model variables by the initial perturbation of the potential temperature field.

As can be seen from Figure 4, the perturbation in this study changes linearly with height, thus modifying the stability of the model domain. Positively perturbed members are shifted toward greater potential instability through an increased surface temperature. In contrast, negatively perturbed members have decreased potential instability.

Table 3. Experimental design of the statistical components in the EnKF implementation for radar reflectivity assimilation

                                        Control                EXP.1               EXP.2          EXP.3
Radar observation                       without assimilation   Z (surface)         Z (surface)    Z (surface)
Model variables                         -                      θ, Qr               θ, Qr          θ, Qr
Perturbation of model state (A)         -                      θ (3D), Qr (3D)     θ (3D)         θ (3D)
Perturbation of model simulation (HA)   -                      Qr (surface)        θ (surface)    θ (surface)
BG and OB error                         -                      Unweighted          Unweighted     Weighted

In Table 3, we present three experiments in addition to a control run. The control run is an ensemble run without data assimilation (Control). In this case WRF could not simulate deep convective precipitation in the forecasting cycles. Therefore, in the first experiment (EXP.1) we assimilated radar reflectivity to initiate deep convection in WRF, but without any approximation of the non-linear observation operator in the Kalman gain. The second experiment (EXP.2) assimilates radar reflectivity using the approximate observation operator to resolve the non-Gaussian problem in the EnKF. The third experiment (EXP.3) additionally weights the background and observation errors.

3) Methods

a) The Ensemble Kalman Filter

The general optimal estimation can be expressed by the following equation (Kalman 1960; Gelb 1974):

$$\hat{X} = X + K\,[D - h(X)] \tag{1}$$

where X denotes the model state, $\hat{X}$ the updated state, D the observations, and h the observation operator. Using the Jacobian matrix H, the observation operator can be approximated linearly as:

$$h(X) \approx HX \tag{2}$$

leading to a fully linearized form of Equation (1):

$$\hat{X} = X + K\,(D - HX) \tag{3}$$

The optimal filtering is implemented mainly in two parts, the Kalman gain K and the observation increment $D - h(X)$.
The observation increment is the only component that includes absolute information about the observation and the model state. The Kalman gain matrix K only holds information about the uncertainties and error covariances and does not contain information about the current state of the observation or the model. Thus, K determines the degree to which observational information gets filtered into the model. The Kalman gain itself can be written as:

$$K = BH^T \left(HBH^T + R\right)^{-1} \tag{4}$$

If the observation error covariance matrix R is small compared to the background error, the update from the observation increment will become large through K. In other words, the model state will be effectively improved by the observation. Conversely, if the observation error covariance is relatively large, the observations will be largely ignored. Large uncertainties in the observation correspond to a large R. As a result, the diminished Kalman gain will not update the model state from the observation data and the forecast model will not be modified by the observation.

In the EnKF framework, the background error covariance matrix is calculated from the sampling uncertainty of a forecast ensemble using a Monte-Carlo method (Evensen 1994):

$$B = \frac{1}{s-1} A A^T \tag{5}$$

where $A = X - \bar{X}$ are the ensemble perturbations and s is the number of ensemble members. The Kalman gain K in the EnKF can then be expressed as:

$$K = \frac{1}{s-1} A A^T H^T \left(\frac{1}{s-1} H A A^T H^T + R\right)^{-1} \tag{6}$$

Using an implementation that does not require the linearized observation operator, and thereby avoids the high computational cost of constructing the H matrix, K can be rewritten as:

$$K = \frac{1}{s-1} A\,(HA)^T \left(\frac{1}{s-1} HA\,(HA)^T + R\right)^{-1} \tag{7}$$

Data assimilation using an ensemble approach (Eq. (5)) often suffers from issues related to ensemble sizes that are too small (Whitaker and Hamill 2002). If the number of ensemble members is too small, the updated ensemble often becomes biased or unrepresentative, or the ensemble spread no longer represents the true uncertainty in the model background state.
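As an illustration of Eqs. (4) to (7), the following sketch computes the gain directly from the ensemble in observation space and checks that, for a linear operator, it agrees with the explicit form of Eq. (4). The function name `ensemble_gain`, the dimensions, and the random test data are assumptions made for the example; they are not taken from the study.

```python
import numpy as np

def ensemble_gain(X, HX, R):
    """Kalman gain in the form of Eq. (7).

    X  : (n, s) ensemble of model states
    HX : (m, s) the same ensemble mapped into observation space
    R  : (m, m) observation error covariance
    """
    s = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)      # state perturbations
    HA = HX - HX.mean(axis=1, keepdims=True)   # observation-space perturbations
    BHt = A @ HA.T / (s - 1)
    HBHt = HA @ HA.T / (s - 1)
    return BHt @ np.linalg.inv(HBHt + R)

# Synthetic test case with a linear operator H (hypothetical sizes).
rng = np.random.default_rng(0)
n, m, s = 6, 2, 15
X = rng.normal(size=(n, s))
H = rng.normal(size=(m, n))
R = 0.5 * np.eye(m)

K7 = ensemble_gain(X, H @ X, R)                 # Eq. (7), H never formed inside

A = X - X.mean(axis=1, keepdims=True)
B = A @ A.T / (s - 1)                           # Eq. (5)
K4 = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Eq. (4)
```

For a non-linear operator, `HX` would simply be replaced by the operator applied to each member, which is the case where the distribution of HA can become non-Gaussian.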
This issue will partly be addressed later, when the relative magnitude of the background and observation error covariances is discussed. In the following, we deal with the approximation of HA under conditions with non-linear forward operators h(…).

b) Background error approximation

The matrix HA can be interpreted as the deviation of the ensemble members from the ensemble mean state mapped into observation space. The crucial assumption underlying the matrix HA is that the observation operator can be approximated by a linear function around the mean state. This requires the deviations between the individual ensemble members and the mean state to be small, an assumption that is frequently violated when dealing, for example, with intermittent phenomena such as clouds and precipitation. Furthermore, when dealing with strongly non-linear observation operators, the distribution of HA will most likely not be Gaussian, even if A is Gaussian. In addition, in many instances the distribution of A itself will be bi-modal or skewed. For example, in convective precipitation situations some ensemble members will produce precipitation and some will not. As a consequence, A as well as HA will also be bi-modal, even if the observation operator h(…) is not too strongly non-linear, as is the case for example with radiative transfer models. In the case of a strongly non-linear forward model, such as radar reflectivity, the issue becomes even more relevant.

The impact of a non-linear observation operator is schematically depicted in Figure 5. The strongly non-linear observation operator in Figure 5c maps the entire negative part of the model PDF onto zero in observation space and creates a highly skewed PDF in observation space. This is a common problem with radar observations. The assimilation of intermittent observations might also lead to rank deficiencies in B, for example in cases where none of the ensemble members produces precipitation.

Figure 5.
Schematic distributions of the perturbations in observation space obtained after perturbation of the model states: a) linear observation operator, forming a Gaussian distribution; b) weakly non-linear observation operator, forming a skewed Gaussian distribution; c) strongly non-linear observation operator, forming a highly non-Gaussian distribution.

The above considerations make it clear that a stable estimate of HA is crucial for the successful assimilation of a large set of observations, especially regarding clouds and precipitation. In particular, $HBH^T = \frac{1}{s-1} HA\,(HA)^T$ needs to be well conditioned.

The physical interpretation of the background error covariance matrix B is two-fold. Firstly, B determines the error correlations of different components of the state vector. Thus, B determines how much a certain change in an observed variable will affect variables in the state vector that are not observed. Secondly, the magnitude of $HBH^T$ compared to R determines the relative weight that is put onto the observations in the update. This relative weighting between ensemble mean and observations depends mostly on the initial ensemble spread as well as on the development of the ensemble and can thus not be determined objectively if the a-priori distribution of the ensemble is chosen arbitrarily.

The two purposes of B are independent of each other and can be decoupled explicitly in a slightly revised formulation of the EnKF problem. Assume an observable linked to the state of the model via a strongly non-linear forward operator, such as precipitation rate. In this case HA represents the deviation of the individual ensemble members from the ensemble mean. Since the distribution of precipitation rate is typically highly skewed, the resulting distribution of HA will be highly skewed and not well suited for assimilation purposes. While precipitation rate is observed, the underlying process (e.g. ‘convection triggering precipitation’) might not be observed directly.
However, various other model variables will be correlated with this process. For a typical convective precipitation situation, deviations in temperature in the boundary layer might be positively correlated with deviations in precipitation.

In this paper, a new method is proposed to address this issue. The basic idea relies on the aforementioned correlative model variables and can be put into very simple terms: ‘If the background error correlation in observation space cannot be used directly, it might be approximated by another variable, which is correlated with the observations and has better error characteristics.’ This method corresponds to a translation of the non-Gaussian PDF into a Gaussian form, maintaining its original physical units. The process is explained schematically in Figure 6. Two variables are shown, the observable variable y and the observation proxy Y. The PDF of the observable variable y is non-Gaussian (blue curve), whereas the observation proxy Y (red curve) has a Gaussian distribution, and the positive part of its distribution is well correlated with the observable y (not shown).

Figure 6. General form of the new method: modulation of the probability density function (PDF) between different physical variables by a normalization process with rescaling factors.

The proxy background error covariance in observation space can now be derived via a rescaling. The ratio of the traces of the original background error covariance and the proxy is used to renormalize the total variance of the approximated background error covariance so that it matches the physical units of R in the subsequent EnKF calculations. Note that the ratio of the traces of these matrices is only a scalar factor; other, more sophisticated methods might be devised too.

Another issue related to ensemble assimilation is the triggering of clouds or precipitation in cases where initially none of the ensemble members produces any cloud or precipitation.
In such cases HA collapses to identically zero and no assimilation can be performed. Replacing HA with a proxy variable yields a well-defined background error covariance and potentially allows clouds or precipitation to be initiated in the assimilation. This is highlighted in Figure 6, middle panel. Assume the PDF of precipitation (black curve) to be approximated by an ensemble with all identical zeros. Then the background error covariance will collapse. Assume the red curve to be the distribution of potential temperature in the boundary layer. Areas with high potential temperature will correspond to high precipitation and areas with lower potential temperature to zero precipitation. By replacing the black curve with the red curve, a better proxy of the true error covariance is obtained. Using the variable $A_o$ to denote the proxy for HA, this leads to the following approximation for HA in the Kalman gain:

$$HA \approx A_o t \tag{8}$$

The revised Kalman gain can now be expressed as:

$$K = \frac{1}{s-1} A A_o^T t \left(\frac{1}{s-1} A_o A_o^T t^2 + R\right)^{-1} \tag{9}$$

The rescaling factor t allows us to translate HA into $A_o$ while maintaining the units of the observation space.

The next step is to find a suitable proxy for HA. This proxy can, for example, be part of the ensemble itself as long as certain criteria are fulfilled. A variable needs to be picked that is well correlated with the model simulation HA. In the present study, we use potential temperature deviations in the boundary layer, which are well correlated with the initiation of convective precipitation. HA and $A_o$ need to have identical dimensions. The whole process described in Equation (8) can be thought of as replacing H by a sparse matrix of dimensions m × n with one value t in each row and all other values in that row set to zero. To maintain the correct dimension, only m columns can contain an entry larger than zero. More complicated observation proxies are conceivable too, but are not pursued in this initial publication.
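A minimal numerical sketch of Eqs. (8) and (9) is given below. It assumes one possible reading of the trace-ratio rescaling, namely that t is chosen so that the total variance of $A_o t$ equals that of HA; the function name `proxy_gain`, the synthetic ensemble, and the quadratic "rain" observable are hypothetical and serve only to produce a skewed HA.

```python
import numpy as np

def proxy_gain(A, HA, Ao, R):
    """Kalman gain of Eq. (9), with the proxy Ao * t standing in for HA.

    A  : (n, s) state perturbations
    HA : (m, s) perturbations in observation space (possibly skewed)
    Ao : (m, s) proxy perturbations with better error characteristics
    R  : (m, m) observation error covariance
    """
    s = A.shape[1]
    # Assumed reading of the trace-ratio rescaling: match total variances,
    # i.e. t^2 = tr(HA HA^T) / tr(Ao Ao^T).
    t = np.sqrt(np.trace(HA @ HA.T) / np.trace(Ao @ Ao.T))
    BHt = A @ Ao.T * t / (s - 1)
    HBHt = Ao @ Ao.T * t**2 / (s - 1)
    return BHt @ np.linalg.inv(HBHt + R), t

# Synthetic ensemble (hypothetical sizes and values).
rng = np.random.default_rng(0)
n, m, s = 5, 2, 15
X = rng.normal(size=(n, s))
A = X - X.mean(axis=1, keepdims=True)
Ao = A[:m]                                # proxy taken from the ensemble itself
rain = np.maximum(X[:m], 0.0) ** 2        # skewed, intermittent observable
HA = rain - rain.mean(axis=1, keepdims=True)
R = 0.1 * np.eye(m)

K, t = proxy_gain(A, HA, Ao, R)
```

Because the proxy rows are taken directly from A, the gain depends on HA only through the scalar t, which is why the proxy's error characteristics, and not those of the skewed observable, shape the update.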
The rescaling factor t is recalculated in every assimilation cycle, so that the EnKF can effectively adjust to changing model states.

The Kalman gain can be slightly reformulated to gain insight into the data assimilation process. Inserting the identity $(HBH^T)^{-1}\,HBH^T$ between the two factors of Eq. (4), using the proxy of Eq. (8), and writing $B_o = \frac{1}{s-1} A_o A_o^T$ for the proxy background error covariance, we can rewrite the Kalman gain as:

$$K = \underbrace{A A_o^T \left(A_o A_o^T\right)^{-1}}_{\text{correlation information between BG and OB}} \; \underbrace{\left[I + R\,B_o^{-1} t^{-2}\right]^{-1}}_{\text{error information for the OB increment}} \; \underbrace{t^{-1}}_{\text{rescaling factor}} \tag{10}$$

This reformulation allows us to see clearly the correlation information (CI) between the background state and the observation and the error information (EI) for the observation increment. The possible situations in data assimilation are presented in Table 4.

Table 4. Influence of the background and observation errors on the observation assimilation

Case   Error norms       Error information              Error-reflected observation    Observation update
                         N = I + R B_o^-1 t^-2          increment N^-1 (D - HX)        K (D - HX)
C.1    R << B_o t^2      ≈ I                            ≈ D - HX                       Large update
C.2    R = B_o t^2       2 I                            (D - HX) / 2                   Intermediate update
C.3    R > B_o t^2       Large                          Moderate                       Moderate update
C.4    R >> B_o t^2      Extremely large                ≈ 0                            No update

If the observation error is much smaller than the background error (Table 4, C.1), the EnKF actively transfers the observation increment to the model state. Conversely, if the Kalman gain K includes an extremely small background error (Table 4, C.4), the EnKF transfers almost no information from the observation increment, wrongly interpreting the current model state as highly trustworthy because of its small background error. Case C.4 is the most common situation in data assimilation, for two reasons. First, the ensemble members produced by the initial model perturbations usually have only a small spread because of the limitations of the model physics.
Therefore, the physically acceptable background errors tend to be much smaller than the observation errors. Secondly, the background error is systematically underestimated during the sequential update cycles, which is known as the filter divergence problem. Finally, because of the different temporal evolution of the errors, the magnitude of the background error can be underestimated compared to the observation error in the data assimilation. This problem is schematically illustrated in Figure 7.

Figure 7. Background error underestimation problem: a) absolute error simulation, b) relative error simulation (the interval between the blue curves indicates the BG error in the convective initiation stage, the interval between the red curves the OB error in the developing stage).

Because of the non-linear relationship in the perturbation process, the spread of the blue curves (HA) in Figure 7 grows non-linearly with time, even though the perturbation A keeps a constant spread over all cycles. The observation error R is also not constant; it is a relative error that is proportional to the variation of the true state. Therefore, the constant absolute error shown in Figure 7a is an unrealistic error model, even though it does not suffer from the underestimation of the BG error. With the realistic error model (Figure 7b), on the other hand, the underestimation of the BG error is an inevitable problem in the data assimilation. For example, in the Madison flood case study, the model simulation produced the initial precipitation approximately 1 hour later than the KMKX radar measurement. This time difference causes the Kalman filter to consistently ignore the well quality-controlled and highly resolved radar observations during the assimilation cycles. Consequently, the radar data assimilation by the EnKF could not improve the forecast results.
Therefore, the next step of the new method is a weighting technique that accounts for this temporal evolution of the different error sources in the data assimilation.

Weighting of the background error covariance matrix:

$$BH^T \rightarrow (2-f)\,BH^T = \frac{1}{s-1}\,\sqrt{2-f}\,A \left(\sqrt{2-f}\,A_o t\right)^T$$

$$HBH^T \rightarrow (2-f)\,HBH^T = \frac{1}{s-1}\,\sqrt{2-f}\,A_o t \left(\sqrt{2-f}\,A_o t\right)^T \tag{11}$$

Weighting of the observation error covariance matrix:

$$R \rightarrow f R \tag{12}$$

The weighting factor f linearly modulates the magnitude of the observation and background errors (see Eqs. (11) and (12)). In the assimilation cycles of this study (see Figure 8), the observation error is at a more advanced stage of its evolution than the background model error.

Figure 8. Weighting factor for the OB error and the underestimated BG error to improve the update process in the data assimilation cycles.

Therefore, f in Figure 8 is chosen to be less than 1 to reduce the difference between the OB and BG errors. This means that the weighting technique can turn case C.4 in Table 4 into case C.3, allowing an effective update over multiple assimilation cycles without the abrupt model imbalance that can occur in C.1 and C.2. In the opposite case (the model simulation develops earlier than the observation), f is simply chosen to be larger than 1 to increase the difference between the BG and OB errors. With the weighting, the EnKF formulation of Eq. (10) simplifies to the following expression:

$$K = \underbrace{A A_o^T \left(A_o A_o^T\right)^{-1}}_{\text{correlation information}} \; \underbrace{\left[I + \frac{f}{2-f}\,R\,B_o^{-1} t^{-2}\right]^{-1}}_{\text{renormalized error information}} \; t^{-1} \tag{13}$$

This expression shows that the weighting technique adjusts only the relative error ratio ($R\,B_o^{-1} t^{-2}$) in the error normalization term, leaving the ensemble perturbations A and HA ($A_o t$ in the new method) in the observation influence term unchanged. In other words, this modification only renormalizes the observation increment. In principle, this method addresses all kinds of inactive-update problems caused by an insufficiently perturbed ensemble spread, by filter divergence, and by the underestimation of the BG error due to the time lag in its evolution.
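The effect of the weighting factor can be illustrated in the scalar case, where multiplying the background terms by (2 - f) and R by f, as in Eqs. (11) and (12), changes the error ratio by the factor f / (2 - f). The function name `error_information` and the numerical values below are hypothetical; the sketch only shows how a small f moves a case with a strongly underestimated background error toward a more active update.

```python
def error_information(R, Bo, t, f=1.0):
    """Scalar error-information factor 1 / (1 + f/(2-f) * R / (Bo * t**2)).

    f = 1 recovers the unweighted case; f < 1 reduces the weight of the
    observation error relative to the background error.
    """
    if not 0.0 < f < 2.0:
        # Outside this range (2 - f) or f would be non-positive.
        raise ValueError("f must lie strictly between 0 and 2")
    ratio = f / (2.0 - f) * R / (Bo * t ** 2)
    return 1.0 / (1.0 + ratio)

# Hypothetical case: observation error 100 times the background error.
unweighted = error_information(R=100.0, Bo=1.0, t=1.0, f=1.0)
weighted = error_information(R=100.0, Bo=1.0, t=1.0, f=0.1)
balanced = error_information(R=1.0, Bo=1.0, t=1.0, f=1.0)
```

In this example the unweighted factor is close to zero, so almost none of the increment would be used, whereas the weighted factor is more than an order of magnitude larger while remaining well below the balanced value of one half.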
Using the Sherman–Morrison–Woodbury formula to reduce the computational expense when dealing with a large number of data points (Mandel 2006), the final Kalman gain in EnKF with the weighting factor and the rescaling factor is derived as

$$K^{\bullet} = K\,t = \lambda\,A A_o^T R^{-1}\left[I - \lambda\,A_o\left(I + \lambda\,A_o^T R^{-1} A_o\right)^{-1} A_o^T R^{-1}\right] \qquad (2)$$

where

$$\lambda = \frac{(2-f)\,t^{2}}{f\,(s-1)}$$

This newly expressed Kalman gain $K^{\bullet}$ is non-dimensional. Also, all factors in $K^{\bullet}$ are incorporated into just one constant, $\lambda$. After applying the new method, the full expression of the approximate EnKF for the highly non-linear case is

$$\hat{X} = X + K^{\bullet}\,(D - HX)/t \qquad (3)$$

4) Forecast Results and Analysis

In order to validate the new method, a control run and three experiments are designed: Control case (without assimilation), EXP.1 (with assimilation using non-approximate HA), EXP.2 (with assimilation using approximate HA) and EXP.3 (with assimilation using approximate HA and weighted errors).

a) Filter Stabilization and Forecast Correlation by the HA approximation

The HA approximation involves two important considerations: a Gaussian form of HA for stable recursive filter assimilation, and proper forecast correlation information for forecast improvement.

Figure 9. Filter stabilization test. EXP.1 (without HA approximation): a) original HA (radar reflectivity, Z), b) non-Gaussian spread update at 1630 UTC (potential temperature, Kelvin), c) non-Gaussian spread update at 1645 UTC. EXP.2 (with HA approximation): a) approximate HA, b) Gaussian spread update at 1630 UTC, c) Gaussian spread update at 1645 UTC

Firstly, the EnKF with the non-linear observation operator suffers from an unstable filter update process. For instance, the non-Gaussian HA (Fig. 9-a in EXP.1), caused by the non-linearity between HA and A, creates a skewed spread update at 1630 UTC (Fig. 9-b in EXP.1). Through the recursive filter process, the spread update term at 1645 UTC produces an unrealistically large θ update of over 100 Kelvin (□ profile in Figure 9-c), which prevents further model runs.
Also, K loses consistent control of the filter, showing different spread update patterns at 1630 and 1645 UTC (compare the □ profiles in Figure 9-b and c). In EXP.2 in Figure 9, on the other hand, the data assimilation by EnKF with the approximate HA consistently maintains the Gaussian distribution at 1630 UTC and 1645 UTC; the data assimilation continues through the recursive filter updates without any filter instability.

Figure 10. Forecast correlation test: a) positive innovation based on a higher observation and a lower current model state, b) inverted innovation controlled by the correlation and error information of K, c) improved forecast by the input innovation (EXP.1: blue, no approximation with f=1.999; EXP.3: red, approximation with f=0.1)

The second problem is the inappropriate forecast correlation between HA and A. For example, if HA is not approximated, K naturally adopts the instantaneous negative correlation: a positive ensemble member of HA(Qr) originates from instantaneous condensation in a negatively perturbed ensemble member of A(θ), and vice versa. Through this negative HA-A correlation, the positive input innovation (D-HX) (orange curve in Figure 10-a during the 1 hr assimilation cycles) inversely induces a decrease of θ in all ensemble members of A(θ) (blue curves in the assimilation cycles of Figure 10-b). Consequently, this cooling effect produces the intensified bubbles in the early assimilation cycles (blue curves in Figure 10-c). As discussed in the previous synoptic overview, however, the cause of the Madison flood is potential instability. This means that a positively perturbed ensemble member in the current A(θ) (high potential instability) will create a positive ensemble member in the future HA(Qr) (high convective precipitation).
Therefore, in this study the negative instantaneous correlation (condensation tendency) of the original K is replaced with the positive forecast correlation (convective tendency) by selecting an HA proxy that is positively correlated with A. Based on this HA-A relationship, the positive input innovation (D-HX) is assimilated to increase θ in the current model state (red curves after spin-up in Figure 10-b) and produces a large improvement in the Qr forecast (red curves after spin-up in Figure 10-c).

b) Renormalization of the error information by the Weighting technique

Another important feature of the new method is the error renormalization by the weighting technique. The results of the weighting experiments are examined in three temporal stages.

Figure 11. Error weighting test: a) the model state updated by the inverted innovation, b) the improvement of the mean reflectivity intensity for various weighting factors, c) the improvement of the storm coverage distribution with a threshold value of 10 dBZ for meaningful storm-scale precipitation, excluding the unrealistic bubble feature

The contribution of the radar observation to the data assimilation appears as a convective initiation in the potential temperature field (Figure 11-a). From 1600 UTC to 1700 UTC, the observation innovations renormalized by the various weighting factors produce updates with different θ increases in the surface layer. The improvement of the forecasting performance, on the other hand, is evaluated in terms of the storm intensity and the storm coverage in Figure 11-b and c. In the forecasting cycles, the intensity and the coverage of the thunderstorm are closest to the radar measurement when f equals 0.1, which increases the BG error and reduces the OB error. This simple choice of the weighting factor thus provides a powerful means of modulating the forecast.
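The effect of the weighting factor on the size of the update can be illustrated with a scalar worked example. The sketch below assumes, as our reading of the weighting technique, that the background error variance is multiplied by (2 - f) and the observation error variance by f, with f between 0 and 2. The function name and the variances b and r are hypothetical and only serve to show the trend.

```python
def update_fraction(b, r, f):
    """Fraction of the innovation applied in observation space (scalar case).

    b: background error variance in observation space (HBH^T)
    r: observation error variance (R)
    f: weighting factor; b is scaled by (2 - f) and r by f (assumed form)
    """
    if not 0.0 <= f <= 2.0:
        raise ValueError("f is assumed to lie between 0 and 2")
    weighted_b = (2.0 - f) * b
    weighted_r = f * r
    return weighted_b / (weighted_b + weighted_r)


# Hypothetical case in which the BG error is underestimated relative to the OB error.
b, r = 1.0, 4.0
for f in (0.1, 1.0, 1.999):
    print(f"f = {f}: fraction of innovation applied = {update_fraction(b, r, f):.4f}")
```

With these hypothetical variances the applied fraction is roughly 0.83 for f = 0.1, 0.20 for f = 1.0 and 0.0001 for f = 1.999, so a small f lets the observation dominate the update while f close to 2 effectively switches the update off.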
Moreover, the spin-up cycles, during which the thermal energy is converted into convective dynamic energy, may be a practically important period for monitoring whether the newly initialized model state will develop into severe thunderstorms. Also, at the end of the spin-up, the weighting experiments produce a tipping point in Figure 11-b that coincides with the onset of the coverage score in Figure 11-c for the meaningful precipitation regions. These mathematically simple patterns, which appear before the main event, could be indexed to support an earlier weather hazard warning system.

Figure 12. Observation assimilation effect in the forecast system (Control vs. EXP.3): WRF (control without the radar assimilation), ENKF (EXP.3 with the radar assimilation based on the new method), RADAR (base radar reflectivity observation); 1630 UTC (assimilation), 1730 UTC (spin-up), 1900 UTC and 2100 UTC (forecasts)

After applying the HA approximation and the weighting technique introduced in the new method, we can examine how the forecasting capability is improved by the radar reflectivity assimilation. The ENKF panel (EXP.3 in Figure 12) shows the radar data assimilation at 1630 UTC. In the Madison flood case study, 1600-1700 UTC is the most effective period for the quality control of the radar reflectivity. Because the precipitation in these cycles is still located far from the center of the radar, where the ground clutter of the radar reflectivity is found, non-precipitation features can easily be removed from the precipitation distribution. Also, if precipitation does not appear in the model simulation, the link between the observation and the model state (rescaling factor t) is absent in the EnKF implementation. Therefore, the radar reflectivity observations start being assimilated into WRF at 1600 UTC.
After the radar observations are assimilated, a spin-up time (1700-1730 UTC in this study) is needed for hydrological balance because of the disparity between the existing background features and the newly introduced observational features in the model states. This process is clearly visible in the 1730 UTC scene of ENKF (EXP.3) in Figure 12. Compared to the control run, weak convective precipitation appears on the right side of the middle part of the domain. In contrast, the precipitation that was simulated in the control run tends to be suppressed in EXP.3 on the right side of the lower part of the domain. Even though EXP.3 still does not produce as much precipitation as the radar observation shows, the model conditions at this stage appear to be transitioning from the background state to the observed state. The main difference between the assimilation (ENKF) and non-assimilation (WRF) cases manifests at 1900 UTC. At this stage, the control run simulates only weak frontal precipitation with an unrealistic bubble feature (WRF in Figure 12). Instead of bubbles, the ENKF run shows well-organized deep convection developing with high simulated radar reflectivity. Also, the orientation of this convection is very similar to the distribution pattern of the radar reflectivity measurement at 2100 UTC. The fully developed supercell storm forecast in ENKF displays an outflow boundary feature, which expands rapidly due to the strong convective updraft. During 1900-2200 UTC, the model domain contains the whole convective cell. Therefore, the forecast results in these cycles match the real radar observations well. After 2200 UTC, however, the forecast results no longer correspond quantitatively to the radar observations, because the convective cell crosses the boundary of the model domain and disappears.
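The forecasts discussed in this section rest on the Sherman–Morrison–Woodbury form of the modified gain, Eq. (2), and on the update of Eq. (3). The following is a minimal NumPy sketch of one such update step under stated assumptions, not the code used for these experiments. It assumes a diagonal R, a proxy A_o with HA approximated by A_o t, and a combined constant lam = (2 - f) t^2 / (f (s - 1)), which is our reading of the constant in Eq. (2). All function names, sizes and values are hypothetical, and the result is compared against a direct dense inversion.

```python
import numpy as np


def analysis_update_smw(X, HX, D, A, Ao, r_diag, f, t):
    """X_hat = X + K_dot (D - HX) / t, using only s x s linear solves.

    r_diag holds the diagonal of R (diagonal R is an assumption of this sketch).
    """
    s = A.shape[1]
    lam = (2.0 - f) * t**2 / (f * (s - 1))   # assumed form of the combined constant
    innovation = D - HX                      # m x s
    AoT_Rinv = Ao.T / r_diag                 # s x m, equals Ao^T R^-1 for diagonal R
    G = AoT_Rinv @ Ao                        # s x s
    v = AoT_Rinv @ innovation                # s x s
    M = np.eye(s) + lam * G                  # s x s matrix from the SMW identity
    # Ao^T R^-1 [I - lam Ao M^-1 Ao^T R^-1] d = (I - lam G M^-1) v = M^-1 v
    weights = lam * np.linalg.solve(M, v)
    return X + A @ weights / t


def analysis_update_direct(X, HX, D, A, Ao, r_diag, f, t):
    """Reference update with the weighted gain and a dense m x m inversion."""
    s = A.shape[1]
    HA = Ao * t
    BHt = A @ HA.T / (s - 1)
    HBHt = HA @ HA.T / (s - 1)
    K = (2.0 - f) * BHt @ np.linalg.inv((2.0 - f) * HBHt + f * np.diag(r_diag))
    return X + K @ (D - HX)


rng = np.random.default_rng(1)
n, m, s = 6, 40, 8                           # hypothetical sizes with m > s
X = rng.standard_normal((n, s))              # hypothetical ensemble of model states
A = X - X.mean(axis=1, keepdims=True)        # ensemble perturbations
Ao = rng.standard_normal((m, s))
Ao -= Ao.mean(axis=1, keepdims=True)         # zero-mean proxy perturbations
HX = rng.standard_normal((m, s))             # hypothetical model equivalents
D = HX + rng.standard_normal((m, s))         # hypothetical perturbed observations
r_diag = rng.uniform(0.5, 2.0, size=m)       # hypothetical observation error variances
f, t = 0.1, 3.0

X_smw = analysis_update_smw(X, HX, D, A, Ao, r_diag, f, t)
X_ref = analysis_update_direct(X, HX, D, A, Ao, r_diag, f, t)
print(np.allclose(X_smw, X_ref))
```

The design choice in the first function is to apply the gain to the innovation rather than to form it explicitly, so the largest matrix that is factorized has the size of the ensemble and not the size of the observation vector.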
5) Summary

In this study we find that, in order to use a highly non-linear observation operator and intermittent physical properties in data assimilation, the following considerations are essential for the background error covariance matrix: an initial perturbation that is associated with the future targeted observation, an appropriate forecast correlation in the Kalman gain, the maintenance of the Gaussian distribution during sequential updates, and the error renormalization for the underestimated background error. All these considerations aim at a well-conditioned background error covariance matrix. For instance, the perturbation method needs to produce initial ensemble members in the model space that clearly correlate with what we want to forecast. The observational and synoptic overviews show that potential instability is the main cause of the convective precipitation that led to the Madison flood in July 2006. Therefore, the initial ensemble members need to be spread randomly over a range of potential instabilities to provide statistical information for the data assimilation. Specifically, we perturbed the potential temperature linearly with height to impose various potential instabilities. However, the ensemble perturbation in the observation space (HA) forms a non-Gaussian distribution due to its non-linear physical response to the initial ensemble perturbation in the model space (A). This non-Gaussian distribution leads to unstable filter behavior during the assimilation cycles. The background error covariance matrix approximated with the HA proxy satisfies the basic Gaussian assumption of EnKF. Also, from the physical point of view, the correlation in the background error covariance matrix is conditioned as a forecast correlation so that it consistently steers the forecast system toward a desirable state.
After the filter is given stability, by approximating HA to have a Gaussian distribution, and controllability, by imposing a forecast correlation between HA and A, the proposed weighting technique boosts the update process by resolving the underestimation of the BG error. The results in Figure 12 show that the convective and suppressive patterns in "ENKF" originate from the positive input innovation, which is the difference between "WRF" and "RADAR", and from their uncertainty information, which determines where the forecast state lies between the "WRF" and "RADAR" states. Then, through the correlation information in K, the present model state (the potential instability in this study) is updated from the error-weighted innovation. We presented the following experiments systematically to solve each problem.

Table 5: Summary of the problems, solutions and results of each experiment

                              | Control       | EXP.1                      | EXP.2                      | EXP.3
HA in K                       | -             | Original                   | Approximate                | Approximate
HA–A relationship             | -             | Instant correlation        | Forecast correlation       | Forecast correlation
HA response to A perturbation | -             | Non-linear                 | Linear                     | Linear
HA PDF                        | -             | Non-Gaussian               | Gaussian                   | Gaussian
Filter behavior               | -             | Unstable                   | Stable                     | Stable
B in K                        | -             | Ill-conditioned            | Well-conditioned           | Well-conditioned
Weighting factor              | -             | f = 1.0                    | f = 1.0                    | f = 0.1
B in N                        | -             | Physically unrealistic     | Underestimated initial BGE | Enhanced initial BGE
R in N                        | -             | Original scale             | Large matured OBE          | Diminished matured OBE
Improvement                   | -             | Problematic model update   | Insufficient update        | Significant update
Forecast                      | No convection | Intensified bubble feature | Shallow convection         | Severe deep convection

Owing to the new method, we can fully explore the real case of the Madison flood forecast in July 2006. The results of the experiments in this study show that the WRF system improved by EnKF predicts a greater instability for the overall ensemble mean, even though the initial random perturbations, before assimilation, were spread statistically equally between stable and unstable situations.
Also, they show that even though the initialization by the radar observations in the assimilation cycles lasted only one hour, the WRF simulation for the 4-5 hr forecast cycles changed tremendously. The results of the new method emphasize how important a proper model initialization by highly resolved observation data during assimilation is for the forecasting system. Furthermore, the mathematically derived solution presents meaningful patterns that could provide an early, decisive forecast of whether the weather situation will develop to the scale of a natural hazard. In summary, radar observations and EnKF are used in this study for the fast but accurate forecast of severe convective precipitation, but applying them together is quite a challenge because of the "non-linearity" of the observation operator and the "Gaussian assumption" of EnKF. The new method proposed in this paper does not smooth or generalize the non-linear nature of the observation. It approximates the background error covariance matrix to linearize the Kalman gain (relative information) and to renormalize the observation increment (absolute information), while conserving the non-linearity and/or discontinuity in the observation increment. This means that the new method aims at a linear response of the ensemble spread among the various model variables and a non-linear response of the ensemble mean in the model space to the observation increment variance. The EnKF with the well-conditioned background error covariance matrix remarkably improves the short-term forecast performance for mesoscale convective precipitation. This result is a good initial forecast for the Madison flood case on 27 July 2006.

Appendix A.

Symbol | Description                                                      | Type     | Dimension
X̂      | Updated ensemble                                                 | Matrix   | n × s
X      | Initial ensemble                                                 | Matrix   | n × s
D      | Observations                                                     | Matrix   | m × s
A      | Ensemble perturbations                                           | Matrix   | n × s
h      | Observation operator                                             | Operator | (n × s) → (m × s)
…      | Ensemble averaging operator                                      | Operator | (k × s) → (k × s)
Tr(…)  | Trace operator                                                   | Operator | (k × k) → 1
H      | Linearized observation operator (Jacobian matrix)                | Matrix   | m × n
B      | Background error covariance                                      | Matrix   | n × n
HA     | Ensemble perturbations in observation space                      | Matrix   | m × s
Y      | Observation vector                                               | Vector   | m
Ao     | Proxy for HA                                                     | Matrix   | m × s
y      | Proxy for observation vector                                     | Vector   | m
Bo     | Background error covariance in observation space based on proxy  | Matrix   | m × m
R      | Observation error covariance                                     | Matrix   | m × m
K      | Kalman gain                                                      | Matrix   | n × m
K•     | Final modified Kalman gain                                       | Matrix   | n × m
n      | Number of elements in state vector                               | Scalar   | 1
m      | Number of elements in observation vector                         | Scalar   | 1
k      | Unspecified dimension                                            | Scalar   | 1
s      | Number of ensemble members                                       | Scalar   | 1
f      | Weighting factor                                                 | Scalar   | 1
λ      | Renormalization factor combining s, t, and f                     | Scalar   | 1
t      | Rescaling factor                                                 | Scalar   | 1
I      | Identity matrix                                                  | Matrix   | Square

Acknowledgements

References

Anderson, J. L. (2001). "An ensemble adjustment Kalman filter for data assimilation." Monthly Weather Review 129(12): 2884-2903.
Anderson, J. L. (2003). "A local least squares framework for ensemble filtering." Monthly Weather Review 131(4): 634-642.
Beezley, J. D. and J. Mandel (2008). "Morphing ensemble Kalman filters." Tellus Series A: Dynamic Meteorology and Oceanography 60(1): 131-140.
Chen, Y. and C. Snyder (2007). "Assimilating vortex position with an ensemble Kalman filter." Monthly Weather Review 135: 1825-1845.
Dowell, D. C., F. Q. Zhang, et al. (2004). "Wind and temperature retrievals in the 17 May 1981 Arcadia, Oklahoma, supercell: Ensemble Kalman filter experiments." Monthly Weather Review 132(8): 1982-2005.
Evensen, G. (1994). "Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte-Carlo methods to forecast error statistics." Journal of Geophysical Research-Oceans 99(C5): 10143-10162.
Evensen, G.
(2003). "The ensemble Kalman filter: Theoretical formulation and practical implementation." Ocean Dyn. 53: 343–367.
Galchen, T. and R. A. Kropfli (1984). "Buoyancy and pressure perturbations derived from dual-Doppler radar observations of the planetary boundary-layer - Applications for matching models with observations." Journal of the Atmospheric Sciences 41(20): 3007-3020.
Gelb, A., ed. (1974). "Applied Optimal Estimation." M.I.T. Press, Cambridge, USA.
Hamill, T. M. and J. S. Whitaker (2005). "Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches." Monthly Weather Review 133: 3132-3147.
Hane, C. E. and P. S. Ray (1985). "Pressure and buoyancy fields derived from Doppler radar data in a tornadic thunderstorm." Journal of the Atmospheric Sciences 42(1): 18-35.
Harlim, J. and B. R. Hunt (2007). "A non-Gaussian ensemble filter for assimilating infrequent noisy observations." Tellus Series A: Dynamic Meteorology and Oceanography 59(2): 225-237.
Hong, S.-Y. and J. Lim (2006). "The WRF Single-Moment 6-class Microphysics Scheme (WSM6)." J. Korean Meteor. Soc. 42: 129-151.
Houtekamer, P. L. and H. L. Mitchell (1998). "Data assimilation using an ensemble Kalman filter technique." Monthly Weather Review 126(3): 796-811.
Houtekamer, P. L. and H. L. Mitchell (2001). "A sequential ensemble Kalman filter for atmospheric data assimilation." Monthly Weather Review 129(1): 123-137.
Houtekamer, P. L., H. L. Mitchell, et al. (2005). "Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations." Monthly Weather Review 133(3): 604-620.
Johns, C. J. and J. Mandel (2008). "A two-stage ensemble Kalman filter for smooth data assimilation." Environmental and Ecological Statistics 15(1): 101-110.
Kalman, R. E. (1960). "A new approach to linear filtering and prediction problems." Journal of Basic Engineering 82(1): 35–45.
Lin, Y., P. S. Ray, et al. (1993). "Initialization of a modeled convective storm using Doppler radar derived fields." Monthly Weather Review 121(10): 2757-2775.
Mandel, J. (2006). "Efficient implementation of the ensemble Kalman filter." University of Colorado at Denver and Health Sciences Center CCM Report 231.
Meng, Z. Y. and F. Q. Zhang (2008). "Tests of an ensemble Kalman filter for mesoscale and regional-scale data assimilation. Part III: Comparison with 3DVAR in a real-data case study." Monthly Weather Review 136(2): 522-540.
Ott, E., B. R. Hunt, et al. (2004). "A local ensemble Kalman filter for atmospheric data assimilation." Tellus Series A: Dynamic Meteorology and Oceanography 56(5): 415-428.
Saucier, W. J. (1955). "Principles of Meteorological Analysis." 76–78.
Snyder, C. and F. Q. Zhang (2003). "Assimilation of simulated Doppler radar observations with an ensemble Kalman filter." Monthly Weather Review 131(8): 1663-1677.
Sun, J. Z. (2005). "Convective-scale assimilation of radar data: Progress and challenges." Quarterly Journal of the Royal Meteorological Society 131(613): 3439-3463.
Tong, M. J. and M. Xue (2005). "Ensemble Kalman filter assimilation of Doppler radar data with a compressible nonhydrostatic model: OSS experiments." Monthly Weather Review 133(7): 1789-1807.
Wang, X. G., D. M. Barker, et al. (2008). "A Hybrid ETKF-3DVAR Data Assimilation Scheme for the WRF Model. Part I: Observing System Simulation Experiment." Monthly Weather Review 136(12): 5116-5131.
Whitaker, J. S. and T. M. Hamill (2002). "Ensemble data assimilation without perturbed observations." Monthly Weather Review 130(7): 1913-1924.
Xue, M., Y. S. Jung, et al. (2007). "Error modeling of simulated reflectivity observations for ensemble Kalman filter assimilation of convective storms." Geophysical Research Letters 34(10): 5.
Xue, M., M. J. Tong, et al. (2006). "An OSSE framework based on the ensemble square root Kalman filter for evaluating the impact of data from radar networks on thunderstorm analysis and forecasting." Journal of Atmospheric and Oceanic Technology 23(1): 46-66.
Zhang, F., C. Snyder, et al. (2004). "Impacts of initial estimate and observation availability on convective-scale data assimilation with an ensemble Kalman filter." Monthly Weather Review 132(5): 1238-1253.
The U.S. Natural Hazard Statistics, NOAA, http://www.nws.noaa.gov/om/hazstats.shtml
Hydrological information center, NOAA, http://www.nws.noaa.gov/oh/hic/flood_stats/Flood_loss_time_series.shtml