1 - WMO

Improved prediction skill using regularized error covariance estimates in an ensemble Kalman filter
Jacob V. Tornfeldt Sørensen
DHI Water & Environment, Hørsholm, Denmark
Henrik Madsen
DHI Water & Environment, Hørsholm, Denmark
Henrik Madsen
Informatics and Mathematical Modelling, Technical University of Denmark, Kgs. Lyngby, Denmark
In ensemble Kalman filtering, error covariance estimates are obtained from sample statistics of an
ensemble of states. These states are perturbed according to a set of underlying error assumptions, which
are believed to capture the first order model errors. The approach often gives dynamically appealing
covariance structures and its updated states are in many respects consistent with the foundation of the
numerical models. However, the description of the model error can never be better than the imposed error
assumption. This implies that the resulting model error covariance estimate is biased and hence from a
statistical point of view the bias-variance trade-off can be exploited. Thus, through regularization we can
introduce a small additional bias and obtain a decreased variance of our covariance estimate. If this is
done successfully, the actual state estimate will in turn have an improved prediction skill.
In this extended abstract, a barotropic regional ocean model of the North Sea and Baltic Sea system is
used to examine the regularization ideas presented. We will consider the regularization implied by
assuming a slowly varying error and a distance dependence of the error covariance. The approach is
designed to simultaneously give a significant speed-up of the scheme. It is acknowledged that the
regularization will lead to updated states, which do not exactly fulfill the equations being solved. However,
if the deviation is small enough, the prediction based on the scheme may nevertheless be improved. The skill of the
assimilation techniques with and without regularized error covariance estimates is examined in hindcast as well as
forecast mode. In this particular case, the efficient approximate techniques have a clear advantage. This is
especially true in data sparse areas and for forecasts.
Introduction
A large part of the world's population lives
close to the ocean and is affected by the coastal
environment. Therefore, forecasting of key
parameters in the coastal ocean has been on the
agenda for decades and in many countries
warning systems are being operated for selected
key parameters, e.g. Vested et al. [11], Gerritsen
et al. [5] and Erichsen and Rasch [3]. For most
forecast products, the forecast skill is of prime
importance. Since numerical modelling is only
slowly improving and has fundamental limitations,
the present on-going development also focuses on
the on-line assimilation of available data.
The basic idea in most assimilation systems
with a forecasting objective is to provide the best
possible estimate of the ocean state at the time of
forecast. Such an approach was implemented by
Heemink [7] in a storm surge model for the Dutch
coast. He used a Steady Kalman filter and showed
an improved skill relative to a standard forecast
model at both a three and six hour forecast
horizon. Vested et al. [11] and Gerritsen et al. [5]
also investigated the forecast skill in the Southern
North Sea. They similarly found that Kalman filter
based initialization improves the forecast skill at
short time scales. However, at longer time scales
the skill deteriorates for a while before converging
to that of the standard forecast model without
data assimilation. Cañizares et al. [1] applied the
Steady Kalman filter for assimilating tide gauge
data in the North Sea and Baltic Sea system,
where they showed a good filtering performance
in areas of fairly dense data coverage. However,
far from observations, the filtering skill was
degraded. This problem was treated in Sørensen
et al. [10] and a regularization technique (distance
regularization) was introduced to solve it. The
effect on forecast skill of applying distance
regularization to a steady approximation of an
Ensemble Kalman Filter, (Evensen [4]), is
investigated in this paper.
Filtering Technique
The schemes used for the assimilation of water
level data in the present study can be categorized
as sequential estimation techniques. The
technique is basically composed of two parts. One
part is a specification and a model propagation of
the stochastic model state in between
measurement times. The other part provides an estimate, $x_i^a$, of the state based on the distributions of the model estimate, $x_i^f$, and the measured variables, $y_i^o$, at time $t_i$. Let $H x_i^f$ be the model representation of the measurement and let $P_i^f$ and $R_i$ be the error covariances of $x_i^f$ and $y_i^o$, respectively. The standard approach is to assume no bias and use the best estimator in a minimum variance sense. This estimator can be written

$$x_i^a = x_i^f + K_i\left(y_i^o - H x_i^f\right) \qquad (1)$$

where the Kalman gain, $K_i$, is given by

$$K_i = P_i^f H^T\left(H P_i^f H^T + R_i\right)^{-1} \qquad (2)$$
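As an illustration of Eqs. (1) and (2), the analysis step can be sketched in Python with NumPy. This is a minimal sketch on small dense matrices, not the operational implementation; the function name `kalman_update` is hypothetical. The inverse in Eq. (2) is applied through a linear solve, which is numerically preferable to forming it explicitly.

```python
import numpy as np

def kalman_update(x_f, P_f, y_o, H, R):
    """Analysis step of Eqs. (1)-(2): minimum variance update of x_f with y_o.

    x_f: (n,) forecast state, P_f: (n, n) forecast error covariance,
    y_o: (p,) observations, H: (p, n) observation operator, R: (p, p) obs. error covariance.
    """
    S = H @ P_f @ H.T + R                   # innovation covariance H P^f H^T + R
    # K = P^f H^T S^{-1}, computed as (S^{-1} H P^f)^T since S and P^f are symmetric
    K = np.linalg.solve(S, H @ P_f.T).T
    x_a = x_f + K @ (y_o - H @ x_f)         # Eq. (1)
    return x_a, K
```

For a two-variable state where only the first variable is observed, the correlation in $P^f$ spreads part of the correction to the unobserved variable, which is the mechanism that lets tide gauge data update water levels away from the gauges.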
The observational error also needs to be quantified. The specification of error models for the numerical model and for the observations is built on a number of assumptions. In the present study, tide gauge stations are assumed to have a constant standard deviation and to be mutually uncorrelated. In a dynamical model the uncertainty is continuously altered by the model dynamics and hence the error description needs to be propagated in time. A Monte Carlo approach is followed here, leading to the Ensemble Kalman Filter (EnKF) (Evensen [4]).
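In the EnKF, $P^f$ in Eq. (2) is replaced by the sample covariance of the forecast ensemble. The Python sketch below shows only how the gain follows from ensemble anomalies; the hypothetical function `ensemble_gain` never forms the full $n \times n$ covariance, and the update of the individual members is omitted.

```python
import numpy as np

def ensemble_gain(X, H, R):
    """Kalman gain (Eq. 2) with P^f replaced by the ensemble sample covariance.

    X: (n_state, N) forecast ensemble, one member per column.
    """
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)   # ensemble anomalies
    HA = H @ A                              # anomalies mapped to observation space
    PHt = A @ HA.T / (N - 1)                # sample estimate of P^f H^T
    S = HA @ HA.T / (N - 1) + R             # sample estimate of H P^f H^T + R
    return np.linalg.solve(S, PHt.T).T      # K = P^f H^T S^{-1}
```

Working with the $n \times N$ anomaly matrix instead of the full covariance is what keeps the error propagation tractable; the price is that the gain inherits the sampling noise of a finite ensemble.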
The ensemble approach is an efficient way of
making the workload of the model error
propagation tractable, by reducing the degrees of
freedom in the description dramatically. However,
the resulting scheme requires on the order of 100
times the computational cost of a standard model simulation and is still too
expensive for many operational systems, which
are typically pushed close to the limit in terms of
computational resources in order to resolve as
many processes as possible. Further, the EnKF
scheme may introduce spurious correlations in
data sparse regions due to an inaccurate model
error description and the stochastic nature of the
scheme. Hence, despite the risk of introducing non-dynamical modes into the system, two forms of regularisation of the gain are used (Sørensen et al. [10]). The resulting Steady approximation and distance regularisation are presented below.
Regularisation methods allow prior knowledge about the elements of $K_i$ and their interdependence to be taken into account (Hastie et al. [6]). The techniques can usually be cast in a Bayesian framework. For example, if prior information about the model error covariance, $P_{PRIOR}$, is available for $P^f$, then the posterior estimate, $P_{POSTERIOR}$, is

$$P_{POSTERIOR} = \left(P_{PRIOR}^{-1} + \left(P^f\right)^{-1}\right)^{-1} \qquad (3)$$

Such an approach is not tractable in the high-dimensional state space under consideration. However, the approximate schemes presented below can be regarded as attempts to incorporate or exploit prior knowledge.
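To make Eq. (3) concrete, the following Python sketch evaluates it for a toy low-dimensional case only; the function name `combine_covariances` is hypothetical and nothing here reflects the model's actual state dimension.

```python
import numpy as np

def combine_covariances(P_prior, P_f):
    """Eq. (3): P_posterior = (P_prior^{-1} + (P^f)^{-1})^{-1}, for small full-rank matrices."""
    return np.linalg.inv(np.linalg.inv(P_prior) + np.linalg.inv(P_f))
```

Two equal, isotropic inputs of variance 2 combine to variance 1. Beyond the cost of inverting matrices of the size of the state vector, a sample covariance from an ensemble with fewer members than state variables has rank at most $N-1$ and cannot be inverted at all, which is one concrete reason the direct Bayesian route is impractical here.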
The Steady Kalman filter can be regarded as an ad-hoc regularisation method. Instead of calculating the Kalman gain at every measurement time, it can be assumed that the state and measurement error covariances are the same at every update. This yields a constant Kalman gain, which simultaneously reduces the computational time to the same order of magnitude as a standard model execution and hence makes the scheme applicable in an operational forecast setting. The Kalman gain is calculated as a long-term average of the gain from an EnKF. Since the gain actually varies in time, this introduces a bias in the gain elements, but the time averaging that creates the steady gain smooths the gain and lowers the variance of its elements. This variance reduction lowers the prediction error, provided that the bias introduced by neglecting the time variation is not too large.
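The averaging step and the variance reduction it buys can be sketched in Python. This is a toy illustration under a strong simplifying assumption: the synthetic gains fluctuate randomly around one fixed "true" gain, so the time-variation bias discussed above is absent by construction. The function name `steady_gain` and the noise level are hypothetical.

```python
import numpy as np

def steady_gain(gains):
    """Steady Kalman gain as the long-term average of a sequence of EnKF gains.

    gains: (n_updates, n_state, n_obs), one gain K_i per update time.
    """
    return np.mean(gains, axis=0)

# Toy data: a gain stored at every 10-minute update over three days gives 3 * 144 = 432 gains.
rng = np.random.default_rng(1)
K_true = rng.normal(size=(6, 3))
gains = K_true + 0.3 * rng.normal(size=(432, 6, 3))   # sampling noise only, no bias
K_s = steady_gain(gains)
```

With independent noise the error of the averaged gain shrinks roughly as $1/\sqrt{432}$ relative to a single gain. In the real system the gain also varies systematically in time, and that part is turned into bias rather than removed.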
The distance regularisation is an ad-hoc procedure for expressing that we do not believe a tide gauge observation should be used for updating state variables that are positioned far away from it (Houtekamer and Mitchell [8]). This is implemented by constructing, for each observation $m$, a vector with coefficients between 0 and 1, which are a Gaussian function, $\rho$, of the geographical distance, $d_m$, to observation $m$:

$$\rho(d_m) = \exp\left(-\frac{d_m^2}{D^2}\right) \qquad (4)$$

The parameter $D$ specifies the spatial decorrelation scale. This regularisation can be used in both the EnKF and the Steady Kalman filter presented above, by multiplying each element of the Kalman gain by $\rho(d_m)$.
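Eq. (4) and the elementwise multiplication of the gain can be sketched in Python as below. The function names are hypothetical, and the sketch assumes the distances from every state variable to every gauge have already been computed and stored in a matrix shaped like the gain; the default $D = 250$ km is the value reported in the Setup section.

```python
import numpy as np

def distance_weights(d, D):
    """Eq. (4): Gaussian weight rho(d) = exp(-d^2 / D^2); d and D in the same units."""
    return np.exp(-(np.asarray(d) / D) ** 2)

def regularise_gain(K, dist, D=250.0):
    """Multiply each gain element K[j, m] by rho(d_jm).

    K: (n_state, n_obs) Kalman gain (steady or ensemble based).
    dist: (n_state, n_obs) distance in km from state variable j to observation m.
    """
    return K * distance_weights(dist, D)
```

At one decorrelation scale the weight is $e^{-1} \approx 0.37$ and at two scales it is $e^{-4} \approx 0.018$, so with $D = 250$ km a gauge has practically no influence beyond about 500 km.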
Figure 1. Bathymetry and available tide gauge stations,
including 10 measurement stations (M1-M10) and 7
validation stations (V1-V7).
Setup
The area under consideration in the
present study is the North Sea, Baltic Sea and
interconnecting waters. We restrict our attention to
the barotropic hydrodynamics and hence employ
the depth averaged numerical model, MIKE 21,
developed at DHI Water & Environment, DHI [2].
The area and bathymetry are shown in Figure 1 with
the available tide gauge measurement points
indicated. The gauges were divided into
measurement stations (M) used in the assimilation
and validation stations (V), which were only used
for performance assessment. The spatial
resolution varies from 9 to 1/3 nautical miles
through a two-way dynamic nesting technique.
The temporal resolution is 2.5 minutes and
measurements are available every 30 minutes.
The measurements are linearly interpolated and
assimilated every 10 minutes, i.e. every fourth
model time step.
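The timing arithmetic above (2.5-minute model step, 30-minute gauge records, 10-minute assimilation interval) can be checked with a short Python sketch. The constant and function names are hypothetical; only the interval values and the use of linear interpolation come from the text.

```python
import numpy as np

DT_MODEL = 2.5    # model time step (minutes)
DT_ASSIM = 10.0   # assimilation interval (minutes)
DT_OBS = 30.0     # tide gauge sampling interval (minutes)

steps_per_update = int(DT_ASSIM / DT_MODEL)   # an update every fourth model step

def obs_at_assimilation_times(t_obs, y_obs, t_end):
    """Linearly interpolate a tide gauge record (times in minutes) to the 10-minute update times."""
    t_assim = np.arange(0.0, t_end + DT_ASSIM / 2, DT_ASSIM)
    return t_assim, np.interp(t_assim, t_obs, y_obs)
```

Each 30-minute gauge interval thus yields two interpolated values in between the measured ones, and the filter sees a new observation vector every fourth model step.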
The period of January 2002 was used in
the study. A steady Kalman gain was estimated as
an average of the gain calculated in an execution
of the EnKF in a three day period from 1 January
to 4 January. All measurements were adjusted to
have the same average as a standard model
prediction in January 2002 to diminish datum
problems. The experiments were designed to test
the forecasting performance of three prediction
schemes:
1. A standard model execution without
data assimilation
2. A Steady Kalman filter
3. A Steady Kalman filter using distance
regularisation
Twenty forecasts were performed at one-day intervals. Each model run included one day of hindcast and a four-day forecast. Hindcast wind fields were also used to force the forecasts. In the assimilation schemes, the model error was assumed to derive solely from errors in the wind velocity and open boundary water level forcing terms. These errors were assumed to be colored, with temporal correlation scales of 5.7 hours and 1.7 hours, respectively, and to have spatial correlation scales of 300 km and 95 km. All measurement errors were assumed to have a standard deviation of 0.05 m. See Sørensen et al. [9] for a more detailed description of the Kalman filter settings and their effect on hindcast performance. The spatial decorrelation scale of the distance regularisation was set to 250 km.
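The text does not state which process generates the colored forcing errors. A common choice in ensemble filtering is a first-order autoregressive (AR(1)) model, sketched below in Python purely as an assumption; the function name `ar1_noise`, the standard deviation `sigma` and the update at every model step are hypothetical, and the spatial correlation is left out.

```python
import numpy as np

def ar1_noise(n_steps, dt, tau, sigma, rng):
    """Hypothetical AR(1) model for a temporally colored forcing error.

    dt and tau in the same time unit; alpha = exp(-dt / tau) gives an e-folding
    correlation time tau, and the sqrt(1 - alpha^2) factor keeps the stationary
    standard deviation equal to sigma.
    """
    alpha = np.exp(-dt / tau)
    e = np.empty(n_steps)
    e[0] = sigma * rng.standard_normal()
    for k in range(1, n_steps):
        e[k] = alpha * e[k - 1] + np.sqrt(1.0 - alpha ** 2) * sigma * rng.standard_normal()
    return e
```

Under this assumption the wind error would use `tau=5.7` and the boundary error `tau=1.7` (hours) with `dt=2.5 / 60`; the shorter scale lets the boundary error decorrelate about three times faster.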
The performance of the schemes was assessed using the root mean square error (RMSE) of the $A = 20$ forecasts for each forecast horizon, $t_i$, and tide gauge station, $s$:

$$\mathrm{RMSE}_{t_i,s} = \sqrt{\frac{1}{A}\sum_{a=1}^{A}\left(y_{i,a}^o(s) - H(s)\,x_{i,a}^a\right)^2} \qquad (5)$$
Bulk performance measures were constructed as
averages of measurement and validation stations.
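Eq. (5) and the bulk measures can be sketched in Python as follows. The array layout and function names are hypothetical, and the bulk measure is assumed to be a plain arithmetic mean of the per-station RMSE values over a station group (measurement or validation stations).

```python
import numpy as np

def rmse_per_horizon(y_obs, y_model):
    """Eq. (5): RMSE over the A forecasts for every horizon t_i and station s.

    y_obs, y_model: (A, n_horizons, n_stations) observed water level y^o_{i,a}(s)
    and the modelled equivalent H(s) x_{i,a}.
    """
    return np.sqrt(np.mean((y_obs - y_model) ** 2, axis=0))

def bulk_rmse(rmse, station_idx):
    """Bulk measure: mean of the per-station RMSE over a group of stations, per horizon."""
    return rmse[:, station_idx].mean(axis=1)
```

Averaging over the forecast index $a$ first keeps the forecast horizon as an axis, which is what allows the skill to be plotted against lead time.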
Results
The bulk RMSE statistics of the three experiments are shown in Figures 2 and 3 for measurement and validation stations, respectively. The overall picture is that data assimilation clearly improves the state estimate in hindcast compared with the standard model execution in run 1, but with the classical Steady Kalman filter in run 2 this improved skill on average only lasts 6-8 hours. This period of improved predictive skill is followed by a period of degraded water level predictions. However, when distance regularisation is applied in run 3, the forecast skill is improved for 2-3 days and no deterioration is observed at any forecast horizon for any station. The distance regularisation provides an improved global state estimate (Sørensen et al. [10]), and hence no erroneous signals are set free to propagate in the domain at the time of forecast.
A closer look at the RMSE statistics of run 3
indicates two modes of error correction by the
filter in the forecast period. In the first 12 hours
there is a relatively sharp decrease in prediction
skill. Thereafter, the skill only slowly moves
towards that of the reference forecast run 1. This
shows that the long-term prediction improvement
is due to a bias correction by the assimilation
scheme.
Figure 2. Aggregated RMSE of the reference run 1 (thin black), the Steady run 2 (thick grey) and the Steady distance regularised run 3 (thick black) in all measurement points.
Figure 3. Aggregated RMSE of the reference run 1 (thin black), the Steady run 2 (thick grey) and the Steady distance regularised run 3 (thick black) in all validation points.
Conclusions
This paper has investigated the water level forecast skill obtained when Kalman filtering is used to initialise the state at the time of forecast. Two schemes were tested for this purpose: a steady approximation of the Ensemble Kalman Filter with and without distance regularisation. The performance of the schemes was investigated in an operational model of the North Sea, Baltic Sea and interconnecting waters. Forecast initialisation by the Steady Kalman filter gave an improved prediction on average for a period of 6-8 hours. The distance regularising scheme improves the forecast skill significantly, adding value to the prediction for 2-3 days.
Distance regularisation is therefore to be encouraged for operational forecasting purposes in the system under consideration. It can easily be combined with any filtering scheme, but the Steady Kalman filter will often be sufficiently accurate and is hence recommended due to its low computational cost.
Acknowledgements
This research was carried out jointly at DHI Water & Environment and the Technical University of Denmark under the Industrial Ph.D. Programme (EF835). Contribution of tide gauge data from the Danish Meteorological Institute, the Royal Danish Administration of Navigation and Hydrography and the Swedish Meteorological and Hydrological Institute is acknowledged.
References
[1] Cañizares, R., Madsen, H., Jensen, H.R. and Vested, H.J., Developments in operational shelf sea modelling in Danish waters, Estuar. Coast. Shelf Sci., Vol. 53, (2001), pp 595-605.
[2] DHI, MIKE 21 coastal hydraulics and oceanography, DHI Water & Environment, (2001).
[3] Erichsen, A.C. and Rasch, P.S., Two and three-dimensional model system predicting the water quality of tomorrow, Proceedings of the seventh international conference on estuarine and coastal modeling, St. Petersburg, Florida, USA, (2002), pp 165-184.
[4] Evensen, G., Sequential data assimilation with a non-linear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res., Vol. 99(C5), (1994), pp 10,143-10,162.
[5] Gerritsen, H., de Vries, H. and Philippart, M., The Dutch continental shelf model, D.R. Lynch & A.M. Davies, eds, Quantitative Skill Assessment for Coastal Ocean Models, American Geoph. Union, (1995), pp 425-467.
[6] Hastie, T., Tibshirani, R. and Friedman, J., The elements of statistical learning: data mining, inference, and prediction, 1st edition, Springer Verlag, (2001).
[7] Heemink, A., Storm surge prediction using Kalman filtering, PhD thesis, Twente University of Technology, The Netherlands, (1986).
[8] Houtekamer, P.L. and Mitchell, H.L., Data assimilation using an ensemble Kalman filter technique, Monthly Weather Review, Vol. 126, (1998).
[9] Sørensen, J.V.T., Madsen, H. and Madsen, H., Parameter sensitivity of three Kalman schemes for the assimilation of tide gauge data in coastal and shelf sea models, Ocean Modelling, Submitted, (2003).
[10] Sørensen, J.V.T., Madsen, H. and Madsen, H., Efficient sequential techniques for the assimilation of tide gauge data in three dimensional modeling of the North Sea and Baltic Sea system, J. Geophys. Res. (Oceans), In press, (2004).
[11] Vested, H.J., Nielsen, J.W., Jensen, H.R. and Kristensen, K.B., Skill assessment of an operational hydrodynamic forecast system for the North Sea and Danish belts, D.R. Lynch & A.M. Davies, eds, Quantitative Skill Assessment for Coastal Ocean Models, American Geoph. Union, (1995), pp 373-396.