STATISTICAL INTERPRETATION METHODS APPLIED TO ENSEMBLE FORECASTS
USES AND INTERPRETATION OF ENSEMBLE OUTPUT
(Laurence Wilson, Environment Canada)
Abstract: It can be argued that ensemble prediction systems (EPS) have been more
useful to the modeling and data assimilation community than to the operational forecasting
community. There are a few possible reasons for this. First, the expectation that an ensemble
system would permit a confidence estimate to be issued with each operational forecast has not
been fully realized. Second, EPSs produce enormous quantities of output, which must be further
processed before they can be used effectively in operations. Over the years since EPSs became
operational, various processing methods have been tried and evaluated, with varying degrees of
success. One can also think of many ways of interpreting EPS forecasts for operational use that
have not yet been tried.
In this talk various aspects of interpretation of ensemble output will be discussed, focusing on
those methods which should be most promising for operational use of ensembles. The discussion
will begin with the search for a relationship between the ensemble spread and the skill of the
forecast, which is a prerequisite to the ability to predict forecast skill. Then, following a brief review
of existing statistical interpretation methods, a survey will be presented of statistical interpretation
efforts that have been applied to ensemble forecasts. The presentation will also include some
suggestions for new methods of interpretation of ensemble output.
1. Introduction
It is probably fair to say that the output of ensemble systems has not been as widely used and
accepted in the operational meteorological community as one would have hoped. It can also be
said that ensemble systems have been quite useful to numerical weather prediction centers, as an
aid to the evaluation of models and especially data assimilation systems. Now that ensembles
have been run operationally for about eight years, it is worthwhile to examine the impediments to
their operational use, and to search for and identify ways to encourage their use by forecasters, in
countries which run ensemble systems as well as in countries which have access to the output of
ensemble systems run elsewhere. To help stimulate thinking about the use of ensembles in short
and medium range forecasting, this paper proposes reasons for the slow adoption of ensemble
output in operations, and discusses by example ways of overcoming these problems.
Several examples of the interpretation and processing of ensemble output are discussed.
Included are some methods that have already been tried and tested on ensemble output, and
some post-processing techniques that have been applied successfully to deterministic forecasts
but have not yet been tried with ensemble output. In all cases, the examples have been selected
for their potential to facilitate the interpretation by forecasters of the very large volumes of data
generated by ensemble systems.
2. Challenges to the use of ensembles in forecasting
The output of an ensemble system consists of a set of alternative forecast values of each model
variable at each forecast projection time, which is usually interpreted as a representation of the
probability distribution of each model variable. As such, ensembles always produce an enormous
quantity of data, linearly related to the number of members in the ensemble. This presents the
first challenge: How to summarize or otherwise process the data so it can be interpreted and used
in operational forecasting, that is, how to process the data so that it yields meaningful information.
Since forecasters normally already have available to them a full range of forecast guidance from
the full resolution model, and perhaps also from statistical interpretation of that model output,
ensemble output is likely to be used only if it is demonstrated that it offers additional value or
information over existing products. If, for example, it could be shown that ensemble forecasts are
superior to the available model output, or that the ensemble gives reliable
information on the expected skill of the full resolution forecast, then potential users might be
persuaded to search the ensemble output for these added benefits.
A third challenge to the use of ensembles in forecasting is related to the stochastic nature of the
output. Forecasters may not be accustomed to evaluating and using probabilistic forecasts. If the issued
forecast products must be deterministic or categorical, the forecaster must convert the
probabilistic ensemble output into a categorical forecast. In short, dealing with forecasts in
probabilistic format may require some basic change in the forecasting process, which might be
resisted, especially if the final forecast is not probabilistic.
3. Presentation and processing of ensemble forecast output for use in operational forecasting
There are three general types of products that are produced from ensemble systems:

- Forecasts of forecast skill or confidence
- Graphical displays of output
- Probability forecasts

These are described in this section with examples.
3.1 Forecasts of forecast skill
One of the potential benefits of ensemble prediction that was recognized right from the beginning
is the ability to use the spread of the ensemble as an indication of the relative confidence in the
forecast. It would be expected that a large spread in the ensemble, as represented for example by
its standard deviation, would be associated with greater errors in the forecast, either the forecast
from the ensemble model or the full resolution version of the model. The search for a strong
“spread-skill” relationship has continued for many years, and it has proven somewhat elusive.
However, more recent efforts have met with more success and some centers, including the UK
Met Office and NCEP, are now issuing confidence factors along with the forecast output.
The spread-skill relationship can be represented in various ways and several have been tried in
experiments. The spread of the ensemble is usually represented either by the variance or by its
square root, the standard deviation. The spread can be taken with respect to the ensemble mean
or the unperturbed control forecast. The spread about the control will always be greater than or equal
to the spread about the ensemble mean. Forecast skill may be expressed in many ways; the most
common are the anomaly correlation and the root mean square error (rmse). Probably the most
relevant to forecasters would be to compare the skill of the full resolution model with the ensemble
spread, but the comparison is often made with the skill of the unperturbed control forecast. The
relationship may be examined via a simple scatter plot, or by means of a 2 by 2 contingency table,
where the skill and spread are each characterized by above- and below-average categories, and
total occurrences are tallied in each of the four categories: 1. high skill, high spread; 2. high skill, low spread; 3. low skill, high spread; and 4. low skill, low spread. In such a format, a strong spread-skill relationship is indicated by large counts in categories 2 and 3, and low counts in categories 1
and 4. While studies of the spread-skill relationship have typically been carried out over relatively
large domains, it may well be true that a local spread-skill relationship could be found, even if a
more general relationship does not exist. This should be investigated.
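A minimal sketch of such a tally is given below, in Python on synthetic data; the spread and skill arrays are hypothetical stand-ins for the per-case ensemble standard deviation and the control forecast's anomaly correlation.

```python
# Sketch of the 2 by 2 spread-skill contingency tally described above.
# All values are synthetic; real inputs would be per-case spread and skill.
import numpy as np

rng = np.random.default_rng(0)
n_cases = 200
spread = rng.gamma(shape=4.0, scale=10.0, size=n_cases)       # ensemble std. dev.
skill = 0.9 - 0.004 * spread + rng.normal(0.0, 0.1, n_cases)  # anomaly correlation

high_spread = spread > spread.mean()
high_skill = skill > skill.mean()

# A strong relationship puts most cases in categories 2 and 3.
for label, mask in [
    ("1. high skill, high spread", high_skill & high_spread),
    ("2. high skill, low spread ", high_skill & ~high_spread),
    ("3. low skill, high spread ", ~high_skill & high_spread),
    ("4. low skill, low spread  ", ~high_skill & ~high_spread),
]:
    print(label, int(mask.sum()))
```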
Figure 1. A plot of ensemble spread in terms of anomaly correlation vs. skill of the control
forecast, for the ECMWF ensemble, August to October, 1995. Forecasts are for day 7 500 mb
geopotential height. The different symbols refer to the months of the verified season. (After
Buizza, 1997)
Figure 1 shows an example of a spread-skill plot, for an earlier version of the ECMWF
ensemble system. In this example, the data are presented in terms of the anomaly correlation
coefficient, and the relationship sought is between the skill of the control forecast and the
ensemble spread, both expressed as an anomaly correlation. The figure is graphically divided into
four quadrants by the vertical and horizontal lines indicating the mean skill and mean spread
respectively. If a strong relationship existed, it would show up as an elongation of the cloud of
points in the direction of the positive diagonal on the graph, with more points in the upper right and
lower left quadrants of the graph, and fewer points in the upper left and lower right quadrants.
This is indeed the case here, but there is also considerable scatter, and the relationship cannot be
considered particularly strong.
Fig. 2, from a more recent assessment of the spread-skill relationship for the ECMWF ensemble,
appears more encouraging; the plotted points for each forecast projection do tend to line up nearly
parallel to the diagonal, especially for the longer projections. However, averaging over several
cases to obtain each point has eliminated most of the scatter. It is difficult to know from this graph
whether one could reliably estimate the skill from the ensemble spread for a particular case.
Figure 2. Relationship between the ensemble standard deviation and the control error standard
deviation for the ECMWF ensemble forecasts during winter 1996/97. The sample for each 24 h
projection is divided into 10 equally populated categories, represented by the points on the graph.
(After Atger, 1999a).
Although a completely reliable spread-skill relationship has proven somewhat elusive, it remains
tempting to use the ensemble output in this way, and at least two centers, the UK Met Office and
NCEP in the U.S., regularly give out confidence estimates with their forecasts, based on the
ensemble output. More evaluation of the spread-skill relationship is needed, and it would be best
if it were done with respect to the operational model, since that is what forecasters already have
available. If the ensemble spread is related to the forecast error of the full-resolution model, this
information would certainly be of use in operational forecasting. In Canada an experiment has just
started to determine a relationship between the ensemble spread and the skill of the operational
model, but in terms of surface weather element forecasts. If successful, this information will be
used directly to affect the terminology used in our computer-generated worded forecasts.
3.2 Graphical displays of output
To help with the interpretation of the distribution of forecasts, ensemble output is frequently
displayed in various graphical forms, particularly “postage stamp” maps, “spaghetti” diagrams and
“plume” diagrams. Postage stamp maps consist of forecast maps for all of the ensemble
members, presented on one page, covering a limited domain of interest, and at a particular level.
They most often depict the 500 mb or surface fields, for a particular forecast projection. These
present enormous amounts of information, and may be hard to read and interpret, especially if the
ensemble has many members. To help with the interpretation of spatial fields from the ensemble,
the forecasts may be processed to help group “similar” patterns or, alternatively, identify patterns
that are markedly different from the others. Two such processing methods in operational use are
clustering and tubing (Atger, 1999b). Clustering involves calculating the differences among all
pairs of ensemble members, grouping together those for which differences are small, and
identifying as separate clusters those members with larger differences. Tubing involves identifying
a central cluster which contains the higher density part of the ensemble distribution, then
identifying “outliers” - extreme departures from the centroid of the main cluster. The line joining
the centroid of the main cluster and each outlier then defines the axis of a tube, which extends in
the direction of the outlier from the main cluster. The tubes may be interpreted as an indication of
the directions in which the forecast may differ from the ensemble mode. Whatever the similarities
and differences between these two methods of categorizing or grouping the members of the
ensemble distribution, the aim is the same: to organize the massive information content of the
ensemble so that it becomes more easily interpretable by forecasters.
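As an illustration of the clustering idea, the sketch below groups synthetic ensemble fields by their pairwise RMS differences using off-the-shelf hierarchical clustering; the field dimensions and the cut-off threshold are arbitrary assumptions, and this is not the operational ECMWF clustering or tubing algorithm.

```python
# Minimal sketch: cluster ensemble members by pairwise RMS difference.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
n_members, n_grid = 20, 500                       # hypothetical sizes
fields = rng.normal(5500.0, 60.0, size=(n_members, n_grid))  # fake 500 mb fields

# Pairwise RMS differences among all member pairs (the "differences" in the text).
dists = pdist(fields) / np.sqrt(n_grid)

# Group members whose mutual differences are small; cut the tree at an
# illustrative RMS threshold to obtain the clusters.
tree = linkage(dists, method="average")
labels = fcluster(tree, t=85.0, criterion="distance")
for c in np.unique(labels):
    print(f"cluster {c}: members {np.where(labels == c)[0].tolist()}")
```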
Figure 3. An example of a plume diagram, as issued operationally at ECMWF. Panels are for 2 m
temperature (top), total precipitation over 12 h periods (middle), and 500 mb heights (bottom).
The full resolution forecast, control forecast, ensemble mean and verifying analysis are included
on the top and bottom panels.
4
Spaghetti plots represent a sampling of the full ensemble output in a different way. In a spaghetti
plot, a single 500 mb contour is plotted for all the ensemble members. Thus the main ridge and
trough features of the 500 mb flow can be seen along with the way in which they are forecast by
each ensemble member. Areas where there is greater uncertainty immediately stand out as large
scatter in the position of the contour. One must be cautious in the interpretation of the chart,
however, because large spatial scatter may not be significant when it happens in areas of flat
gradient. For this reason, a measure of the ensemble spread such as the standard deviation is
usually plotted on the map. Then, areas of large apparent uncertainty in the 500 mb contour
forecast can be checked to verify that they also coincide with larger ensemble spread. An example
of a spaghetti plot is shown in Wilson (2001a) elsewhere in this volume.
Plume diagrams present equivalent information to spaghetti plots in a different form. Instead of plotting the spatial distribution of the
ensemble member forecasts at a particular forecast projection, the distribution of the ensemble
forecasts is plotted for a particular location as a function of projection time. Plume diagrams are
most often prepared for weather elements such as temperature, for specific stations. For ease of
interpretation, the probability density of the ensemble distribution might be contoured on a plume
chart. Fig. 3 is an example of a plume diagram as prepared and run operationally at ECMWF,
showing plumes for temperature, total precipitation and 500 mb heights for Reading UK. The
temperature and 500 mb height graphs are contoured to show probability density in four intervals,
and the ensemble mean, control, full resolution forecast and verifying analysis (not the
observations) are included. The plumes indicate graphically the increasing ensemble spread with
increasing projection time and show clearly where the full resolution forecast lies with respect to
the ensemble for a specific location. In the example shown, the mode of the ensemble remains
close to the verifying value and the full resolution forecast also agrees. Only once, at about day 5
on the 500 mb forecast, does the verifying value lie outside the plume and therefore outside the
ensemble. The precipitation graph clearly shows where there are timing differences among the
precipitation events, and facilitates comparison of the ensemble forecast with the full resolution
forecast.
Figure 4. “Box-and-whisker” depiction of ensemble forecasts of cloud amount (top), total
precipitation (2nd from top), wind speed (2nd from bottom) and 2 m temperature (bottom), for
Beijing, based on ECMWF ensemble forecasts of January 30, 2001. The control forecast (red)
and the full resolution forecast (blue) are also shown.
Another format in which single-location ensemble output is presented is shown in Fig. 4. These
are “box-and-whisker” plots of direct model output weather element forecasts. The boxes shown
at each 6 h interval indicate the range of values forecast by the ensemble between the 25th and 75th centiles
(the middle 50% of the distribution), while the median of the ensemble is indicated by the
horizontal line in the box. The ends of the lines extending out of both ends of the box (the
“whiskers”) indicate the maximum and minimum values forecast by the ensemble. These kinds of
plots are an effective graphical way of depicting the essential characteristics of a distribution.
Asymmetries and the spread of the distribution can be immediately seen, and variations over the
period of the forecast are also immediately apparent.
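The quantities behind such a plot are simple order statistics of the ensemble at each projection time. A minimal sketch, assuming a hypothetical array temps[member, lead] of 2 m temperature forecasts:

```python
# Compute the box-and-whisker quantities per forecast projection.
import numpy as np

rng = np.random.default_rng(2)
# Synthetic ensemble: 50 members, 20 six-hourly projections; spread grows with lead.
temps = rng.normal(15.0, 1.0, size=(50, 20)) + np.cumsum(
    rng.normal(0.0, 0.4, size=(50, 20)), axis=1)

for lead in range(0, 20, 5):
    col = temps[:, lead]
    q25, med, q75 = np.percentile(col, [25, 50, 75])
    print(f"+{6 * lead:3d} h: min={col.min():5.1f}  q25={q25:5.1f}  "
          f"median={med:5.1f}  q75={q75:5.1f}  max={col.max():5.1f}")
```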
3.3 Probability forecasts
Aside from the ability to predict the skill of the forecast in advance, the most important use of
ensemble forecasts is to estimate probabilities of events of particular interest. Specifically,
probabilities of the occurrence of severe or extreme weather events are usually of greatest
interest. Prior to the advent of ensemble forecasts, the only way to obtain probability forecasts
was to carry out statistical interpretation of the output from a model (MOS or Perfect Prog
techniques - see below). These statistical methods depend on the availability of enough
observational data to obtain stable statistical relationships, which is difficult to achieve for
extremes. If the ensemble model is capable of predicting extreme events reliably, and provided
the perturbations can represent the uncertainty reasonably well, then it should be possible to
obtain reliable estimates of the probability of an extreme event from the ensemble.
5
The procedure usually used for estimating probabilities is straightforward. First, a threshold is
chosen to describe the event of interest, and to divide the variable into two categories. This might
be “temperature anomaly greater than 8 degrees” for example, where the threshold of (mean + 8)
is used to separate temperatures which are extremely warm from all other temperatures. As
another example, extreme rainfall events might be delineated by “12h precipitation accumulation
greater than 25 mm”. Once the event is objectively defined, the probability may be estimated
simply by determining the percentage of the members of the ensemble for which the forecast
satisfies the criterion defined by the threshold.
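The counting step amounts to a single line of code. A minimal sketch, with an assumed ten-member ensemble of 12 h precipitation forecasts:

```python
# Empirical probability estimate: fraction of members exceeding the threshold.
import numpy as np

members = np.array([3.1, 0.0, 27.5, 12.0, 30.2, 8.4, 0.5, 26.1, 14.9, 2.2])  # mm
threshold = 25.0  # "12 h accumulation greater than 25 mm"
prob = np.mean(members > threshold)
print(f"P(precip > {threshold} mm) = {prob:.2f}")  # 3 of 10 members -> 0.30
```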
Table 1. Suggested distributions to fit ensemble weather element forecasts when the ensemble is small.

- Temperature, geopotential height, upper-air temperatures: Normal. Two-parameter (mean and standard deviation); symmetric, bell-shaped.
- Precipitation (QPF): Gamma, Kappa, or cube-root normal. Gamma: two-parameter ("shape and spread"); positively skewed; applies to variables bounded below and approaches the normal well away from the lower bound. Kappa: similar to the gamma in form, but not as well known. Cube-root normal: the cube root of precipitation amount has been found to be approximately normally distributed.
- Wind speed: Weibull. Two-parameter; negatively skewed; applies to variables bounded below.
- Cloud amount: Beta. Two-parameter; a family of distributions including the uniform and U-shaped as special cases; intended for variables which are bounded above and below; negatively or positively skewed, depending on parameters.
- Visibility: Lognormal. A normal distribution with a logarithmic x-axis; applies to positive-definite variables.
Figure 5. Schematic of the process of probability estimation from the ensemble. Either the raw
ensemble distribution or a fitted distribution may be used to estimate probabilities.
Probability estimation is shown schematically in Fig. 5, where the threshold of the variable X is 1
and we are interested in the probability of the event (X>1). Although Fig. 5 shows a normal
distribution, it is usually the empirical distribution of the ensemble values that is used to estimate probabilities. This
is preferable, except for small ensembles (perhaps fewer than 30 members), where fitting a
distribution to the ensemble might lead to better resolution in the probability estimates. Table 1
indicates theoretical distributions that have been shown to fit empirical climatological distributions
of some weather elements quite well.
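The sketch below contrasts the raw counting estimate with a fitted gamma (as Table 1 suggests for precipitation) for a small, hypothetical ensemble; the fitted distribution can assign a non-zero probability beyond the ensemble maximum, which is the gain in resolution referred to above.

```python
# Probability from a fitted distribution vs. raw counting, for a small ensemble.
import numpy as np
from scipy import stats

members = np.array([0.2, 1.1, 2.5, 3.0, 4.8, 6.2, 7.9, 11.4, 14.0, 19.5])  # mm, made up
threshold = 25.0

# Raw empirical estimate: no member exceeds 25 mm, so counting gives zero.
p_raw = np.mean(members > threshold)

# Fitted gamma (lower bound fixed at zero) allows a non-zero tail probability.
shape, loc, scale = stats.gamma.fit(members, floc=0.0)
p_fit = stats.gamma.sf(threshold, shape, loc=loc, scale=scale)
print(f"empirical: {p_raw:.2f}   fitted gamma: {p_fit:.3f}")
```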
It should be noted that probability estimation represents a form of interpretation of the ensemble
forecast. The predicted distribution is sampled at a specific value, then integrated to a single
probability of occurrence of the chosen event. Details of the probability distribution are lost in the
process. Some of these details can be kept by choosing multiple categories of interest using
several thresholds, for example 1, 2, 5, 10 and 25 mm of precipitation, and determining
probabilities for each category. However, this still amounts to sampling the distribution. If the
number of categories becomes a substantial fraction of the ensemble size, e.g. more than 1/5, it is
preferable to fit a distribution in order to obtain stable probability estimates. Once probabilities
have been estimated for particular events, they can be presented as probability maps. Several
examples of such maps are contained elsewhere in this volume. If it is the probability of an anomaly
exceeding a specific threshold that is plotted, one should be aware of the source of the climatological
information, i.e., observations or analyses.
4. Statistical interpretation applied to ensembles
Statistical interpretation methods can be applied to ensemble forecasts just as they have been
applied to deterministic forecasts from operational models. There are, however, a number of
factors that must be considered in choosing the statistical interpretation method. There are also
many possibilities for further application. This section describes several statistical processing
techniques that have been applied or could be used with ensemble forecasts. Some examples are
also discussed.
4.1 PPM and MOS
Perfect prog (PPM) and model output statistics (MOS) both involve the development of statistical
relationships between surface observations of a weather element of interest (the predictand) and
various predictors, which may come from analyses (PPM) or from model output (MOS). Some
details of the characteristics of the two methods are described in tables 2 and 3. Both methods
can be used with a variety of statistical techniques, including regression, discriminant analysis, and
non-linear techniques such as classification and regression trees or neural networks. PPM and MOS
refer to the way in which predictors are obtained; the statistical technique determines the form of
the relationship that is sought.
Table 2. Comparison of Perfect Prog and MOS

Development of equations:
- Classical: predictand observed at T0; predictors observed (analysed) at T0 - dT.
- Perfect Prog: predictand observed at T0; predictors observed (analysed) at T0.
- MOS: predictand observed at T0; predictors are forecast values valid at T0, from a prog issued at T0 - dT.

Application in operational forecast mode:
- Classical: predictor values observed now (T0) give a forecast valid at T0 + dT.
- Perfect Prog: predictor values valid for T0 + dT, from a prog issued now, give a forecast valid at T0 + dT.
- MOS: predictor values valid for T0 + dT, from a prog issued now, give a forecast valid at T0 + dT.

Comments:
- Classical: dT less than or equal to 6 hours is preferable unless persistence works well; the time lag is built into the equations.
- Perfect Prog: dT can take any value for which forecast predictors are available.
- MOS: same application as Perfect Prog, but separate equations are used for each dT.
One might apply PPM and MOS development to ensemble forecasts in different ways. The
simplest would be to take an existing set of PPM equations, developed perhaps for deterministic
forecasts, and simply apply them to all the members of the ensemble. In this way, an ensemble of
interpreted weather element forecasts can be obtained. For MOS, one must ensure that the
development model is the same as the model on which the equations are run. Therefore it would
likely be necessary to develop a new set of MOS equations using output from the ensemble
model. Furthermore, if the model itself is perturbed (as in the Canadian system), different
equations would need to be developed for each different version of the model. The extra
development cost is very likely prohibitive, which means that PPM is the method of choice for use
on ensembles except where MOS equations exist for the ensemble model.
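A sketch of the "simplest" approach described above, applying one existing PPM-style regression equation to every ensemble member, follows; the equation, its coefficients and the predictor values are all hypothetical.

```python
# Apply one fixed PPM regression equation to each ensemble member, yielding an
# ensemble of interpreted weather element forecasts. Everything here is made up.
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical equation: Tmax = a0 + a1*(850 mb temp) + a2*(1000-500 thickness/100)
coeffs = np.array([-2.0, 0.9, 0.15])

n_members = 16
predictors = np.column_stack([
    np.ones(n_members),                  # intercept
    rng.normal(8.0, 2.0, n_members),     # 850 mb temperature (C), per member
    rng.normal(54.0, 1.5, n_members),    # thickness / 100 (dam), per member
])

tmax_ensemble = predictors @ coeffs      # one interpreted Tmax per member
print(f"Tmax distribution: mean={tmax_ensemble.mean():.1f} C, "
      f"spread={tmax_ensemble.std(ddof=1):.1f} C")
```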
Table 3. Other characteristics of MOS and Perfect Prog

Potential for good fit:
- Classical: relationships weaken rapidly as the predictor-predictand time lag increases.
- Perfect Prog: relationships are strong because only observed data concurrent in time are used.
- MOS: relationships weaken with increasing projection time due to increasing model error variance.

Model dependency:
- Classical: model-independent.
- Perfect Prog: model-independent.
- MOS: model-dependent.

Model bias:
- Classical: does not use model output.
- Perfect Prog: does not account for model bias; model errors decrease accuracy.
- MOS: accounts for model bias.

Potential sample size:
- Classical: large development sample possible.
- Perfect Prog: large development sample possible.
- MOS: generally small development samples; depends on the frequency of model changes.

Potential predictors:
- Classical: access to observed or analysed variables.
- Perfect Prog: access to observed or analysed variables.
- MOS: access to model output variables that may not be observed.
The simultaneous production of many forecasts leads to the possibility of using a collection of
ensemble forecasts as a development sample for MOS. One might argue that ensembles provide
a means of obtaining a large enough dataset for MOS equation development much sooner than
would be possible with only a single model run. While a MOS development sample collected in
this way might represent some improvement on a MOS sample collected over the same period for
a single model, the main effect is to increase the unexplainable variance in the data. Each
member of a specific ensemble forecast will be matched with only one value of the predictand.
The variation within the ensemble represents error variance with respect to the predictand value,
which will be a minimum when the observation coincides with the ensemble mean, but will not be
explainable by the predictors. Thus, a MOS development sample composed of pooled ensemble
forecasts is not likely to lead to a significantly better fit than a MOS development sample based on
a single full resolution model.
The ensemble mean has been shown in many evaluations to perform better than both the control
forecast and the full resolution model in terms of the mean square error, mainly because the
ensemble mean tends not to predict extreme conditions. Although the ensemble mean is a
statistic of the ensemble and shouldn't be used or evaluated as a single forecast, it is likely to work
well as a MOS predictor, especially in a "least-squares" regression-based system, which seeks to
minimize the squared difference between the fitted forecast and the predictand.
Another option for the use of MOS would be to design a system which builds on what already
exists. If a MOS system exists for the full resolution model, then MOS predictors from any
ensemble model(s) could be added to the existing system. This would be a way of determining
whether the ensemble forecasts add information to the forecast that is not already available from
the full resolution model. Hybrid systems may also include perfect prog predictors; there are no
restrictions on the predictors that can be used as long as they are available to run the statistical
forecasts after development.
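A toy version of such a test is sketched below: it compares the fit of a regression on a full-resolution predictor alone with a hybrid that adds hypothetical ensemble-derived predictors (ensemble mean and spread). The data are synthetic and the predictor choices are illustrative only.

```python
# Does adding ensemble-derived predictors improve the regression fit?
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 300
x_full = rng.normal(0.0, 1.0, (n, 1))            # full-resolution model predictor
ens_mean = x_full[:, 0] + rng.normal(0, 0.3, n)  # ensemble mean, correlated with it
ens_sprd = rng.gamma(2.0, 0.5, n)                # ensemble spread
y = 0.8 * ens_mean + rng.normal(0, ens_sprd)     # predictand; noise scales with spread

x_hybrid = np.column_stack([x_full[:, 0], ens_mean, ens_sprd])
base = LinearRegression().fit(x_full, y)
hybrid = LinearRegression().fit(x_hybrid, y)
print(f"R^2 full-resolution only:     {base.score(x_full, y):.3f}")
print(f"R^2 with ensemble predictors: {hybrid.score(x_hybrid, y):.3f}")
```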
Figure 6. Example of single station MOS forecasts produced by running MOS equations on all
members of the U.S. NCEP ensemble. Forecasts are for maximum and minimum temperature
(top), 12h precipitation (middle) and 24h precipitation (bottom). The ensemble mean and the full
resolution model forecast are also indicated.
In summary, the easiest way to link ensemble forecasts to observations via statistical interpretation
is to run an existing PPM system on all the ensemble members, to obtain forecast distributions of
weather elements. MOS forecasts require more effort to obtain, but they may be worthwhile,
especially for single-model ensemble systems.
In the interest of determining what information the ensemble might contribute to what is already
available from probability forecasts from statistical interpretation systems, it would be useful to
compare the characteristics of MOS forecasts on the full resolution model with those directly from
the ensemble. Some such comparisons have been done (see, for example, Hamill and Colucci,
1998), but more comparisons are needed. Hamill and Colucci (1998) found that the ensemble
forecasts could perform better than MOS for higher precipitation thresholds, but the ensemble
system could not outperform MOS on forecasts at lower precipitation thresholds. They also were
not able to find a spread-skill relationship using surface parameters; evidently such a relationship
is even more elusive than it is for middle atmosphere, medium range forecasts.
An example of PPM maximum-minimum temperature forecasts computed from the Canadian
ensembles is shown in another paper in this volume, Wilson (2001a). These forecasts used the
same regression equations that have been used for many years with the operational model output,
and are run daily for eight Canadian stations. As another example, NCEP has test-run existing
MOS equations on their ensemble system, presenting the output in the form of box and whisker
plots (Fig. 6). These forecasts are not currently operational.
4.2 Analogue forecasts
The analogue method has been applied to deterministic forecasts for a long time. The idea is to
search for past situations which are “best matches” to the current forecast, then to use the
weather associated with the analogue cases as the forecast. Usually matches are sought with the
500 mb pattern, but the surface map or the thickness may also be used. “Best” is usually defined
in terms of the correlation or the Euclidean distance. The forecast is then given by a weighted
function of the weather conditions associated with the analogue maps. Analogue methods are a
form of PPM, since previous analyses are used (rather than model output) to define the analogue
cases. The analogue method requires a large historical dataset to work well, and the more detail
that is demanded in the matching process, the harder it is to find close matches.
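The matching step itself is straightforward. A minimal sketch, ranking a synthetic archive of 500 mb fields by RMS distance to one forecast field:

```python
# Rank historical fields by RMS distance to a forecast field ("best matches").
import numpy as np

rng = np.random.default_rng(5)
archive = rng.normal(5500.0, 80.0, size=(3000, 400))  # hypothetical reanalysis cases
forecast = rng.normal(5500.0, 80.0, size=400)         # one ensemble member's field

rms = np.sqrt(np.mean((archive - forecast) ** 2, axis=1))
best = np.argsort(rms)[:5]  # indices of the five closest historical matches
print("best analogues (case index, RMS):",
      [(int(i), round(float(rms[i]), 1)) for i in best])
```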
It would seem that the analogue method would be attractive for use with ensembles. For example,
one could find the best analogue cases for all members of the ensemble individually using a
reanalysis database. If a few matches were chosen for each ensemble member, these cases
could be taken as an ensemble of actual outcomes that resemble most closely the ensemble
forecast. Such an analogue ensemble could be used to evaluate and interpret the ensemble
forecast in several ways:
- To validate the ensemble forecast on individual cases, by comparing the model and occurrence distributions.
- To define clusters, for example if the same analogue matches several of the ensemble members.
- To produce PPM estimates of probabilities from the weather associated with the analogue ensemble, and to compare them with the probability estimates obtained directly from the ensemble.
- To check the realism of the outcomes predicted by the ensemble members, by comparing the goodness of fit with analogue cases.
Figure 7. Assessment of combination full resolution model/EPS-based MOS forecasts ("MOS"), vs.
full resolution ("no eps") and EPS-based ("DMO eps") forecasts of probability of temperature
anomaly greater than +2 degrees. Verification in terms of ROC area (top) and Brier score
(bottom) for one summer of independent data for De Bilt. (After Kok, 1999).
Some interesting questions also come to mind:

- Is there more or less variance in the analogues that correspond to forecasts from the unperturbed control, compared to analogues for the ensemble members?
- Is there any consistency in the evolution of the ensemble with respect to the evolution of the analogue cases?
The analogue method has not been used in this way with ensembles yet, to my knowledge, except
in connection with standard PPM and MOS methods. Some MOS development experiments using
analogue predictors were carried out in the Netherlands (Kok, 1999), applied to the ECMWF
ensemble forecasts. Two tests were done to determine the impact of the ensemble forecasts:
First, MOS forecast equations were developed for 2 m temperature at De Bilt using direct model
output predictors and analogue predictors based on the full-resolution model and 2 m temperature
forecasts from the eps. (Indicated “MOS” on Fig. 7) A second set of equations was developed
without eps predictors (“no eps” on Fig. 7). Both these sets of equations were developed on two
summers of data, and tested on a third summer. Probabilities of temperature anomalies
exceeding various thresholds were computed and evaluated using the ROC and Brier score (See
Wilson, 2001b elsewhere in this volume). Fig. 7 shows sample results from this comparison. It
can be seen that, in terms of both the Brier score and ROC, the “no eps” equations perform better
than the ensemble alone (“DMO eps”) at all forecast ranges out to day 7, and are about equal in
performance after day 7. The “MOS” forecasts, however, show improvement over both the other
sets of forecasts at all ranges. Improvements over the eps alone are largest at day 3 and smallest
at day 10, while improvements over the “no eps” are marginal until day 7. These results suggest
that the eps does add information to what is available from the full resolution model, but that it
does not contribute significantly except after day 7. This may be partly because the full resolution
model gives sharper forecasts (higher resolution), which are subject to larger errors and lower
reliability at longer ranges than the ensemble forecasts.
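For reference, the Brier score used in the Kok (1999) comparison is simply the mean squared difference between forecast probabilities and binary outcomes; a toy computation follows, with made-up values.

```python
# Brier score on a toy sample: p = forecast probability, o = observed (1) or not (0).
import numpy as np

p = np.array([0.9, 0.2, 0.7, 0.1, 0.6, 0.3])  # hypothetical probability forecasts
o = np.array([1, 0, 1, 0, 0, 1])              # observed occurrences
brier = np.mean((p - o) ** 2)                 # 0 is perfect; lower is better
print(f"Brier score = {brier:.3f}")
```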
4.3 Neural Networks
A neural network is a non-linear statistical interpretation method that seeks to classify or organize
predictand data into a set of response nodes which are triggered by specific characteristics of the
stimulus or predictor data. A particular form of neural network, called self-organizing maps (Kohonen,
1997), has been shown by Eckert et al (1996) to be useful for the operational interpretation of
ensemble forecasts. Eckert et al (1996) used three years of 10 by 10 grids of 500 mb height
analyses to train the network. The result of this was an organized 8 by 8 array of 64 nodes, each
of which represents a particular map type for the area of interest around Switzerland. The 64
maps are organized in the sense that maps which are close to each other in the 8 by 8 array are
similar, and those which are far apart are markedly different. Fig. 8 shows the bottom right hand
corner of the array, 16 of the maps. It can be seen, for example, that the top row is characterized
by a ridge over the area, which decreases in amplitude from left to right. Moving downward along
the left side of the array, the ridge becomes displaced more to the west, and a low located to the
east becomes more intense.
Figure 8. Sixteen of the 64 map types determined from three years of data, for the Eckert et al
(1996) application of Neural nets to the interpretation of ECMWF ensemble forecasts. Arrows
indicate the direction and intensity of the geostrophic wind over Switzerland. (After Eckert et al,
1996).
Once the array is trained on the development data, the ensemble members for a particular
forecast can be compared to the array of map types. The RMS distance of each ensemble
member from each node of the array is computed, and the ensemble members are classified into
corresponding map types by choosing the closest (lowest RMS difference). Fig. 9 shows an
example of the classification result, for three different cases, for 4, 6 and 10 days all verifying at
the same times, using the ECMWF ensemble forecasts. The spread in the ensemble forecast is
immediately apparent by the spread of the triggered nodes on each display. The top case is an
example of little ensemble spread, the second case (middle) shows a forecast where the
ensemble changes its idea as the verification time approaches, and the third case shows a
bimodal ensemble forecast, with the ensemble split nearly equally between two different map
types at all three forecast projections.
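The classification step described above reduces to a nearest-node search. A minimal sketch, with a random 8 by 8 node array standing in for a trained Kohonen map:

```python
# Assign each ensemble member to the closest SOM node by RMS difference,
# then count members per node (the gray-shade intensities of Fig. 9).
import numpy as np

rng = np.random.default_rng(6)
n_nodes, n_grid = 64, 100                             # 8x8 array; 10x10 grid flattened
nodes = rng.normal(5500.0, 70.0, (n_nodes, n_grid))   # stand-in for trained map types
members = rng.normal(5500.0, 70.0, (50, n_grid))      # ensemble member fields

# RMS distance of every member to every node; pick the closest node per member.
d = np.sqrt(((members[:, None, :] - nodes[None, :, :]) ** 2).mean(axis=2))
assigned = d.argmin(axis=1)

counts = np.bincount(assigned, minlength=n_nodes).reshape(8, 8)
print(counts)
```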
Figure 9. Three examples of application of the Kohonen Neural net to ensemble forecasts from
the ECMWF system, expressed as an 8 by 8 grid of gray-shaded squares. The intensity of the
shading indicates the number of ensemble members that are closest to each map type, the map
type that fits the full resolution model is shown by a square, and the analysis type is indicated by a
cross. Four day (left), 6 day (center) and 10 day (right) forecasts are shown.
The self-organizing maps were used to verify the mode of the ensemble vs. the full resolution
model forecast, against the analysis. That is, the map type which matched the greatest number of
ensemble members was compared with the map type that matched the full resolution model
forecast, and both were compared with the map type that fit the analysis. The verification sample
consisted of 8 months; the 4 day and 6 day projections were verified. For the 96h projection, it
was found that when the spread was small, the full resolution model usually coincided with the
ensemble mode, but the mode was more accurate when the two didn't correspond. Overall accuracy
was above average when the spread was small, and quality was low when the spread was large.
When the ensemble spread was large, the full resolution model was superior.
spread-skill relationship was the same as for 96h, but the ensemble mode was more accurate than
the full resolution model, regardless of the spread. These results tend to confirm the existence of
a spread-skill relationship, using a synoptically-oriented verification, but it is also clear that the
relationship is not strong.
Neural nets, especially the version described here, are similar in many ways to the analogue
method. Instead of using a complete set of archive cases as in the analogue method, the self-organizing maps represent a way of pooling the historical cases into a set of map types, a
synoptically-organized form of clustering the historical cases. With only three years of historical
data, map typing is needed to eliminate some of the variance in the dataset. If a full reanalysis
dataset were to be used as a training sample for a Kohonen self-organizing map, it should be
possible not only to increase the number of nodes, but to stratify by season and to perhaps use
additional predictors to train the network. This is potentially a very powerful method for extracting
synoptically meaningful information from ensemble forecasts and deserves additional
development. In practical terms, the use of a network of maps also eliminates the need to search
the entire history for matching cases, as is required in the analogue method, which is another
advantage of neural nets.
5. Concluding remarks
This paper has presented a discussion of the challenges faced by those who wish to use the
output of ensemble prediction systems in operational forecasting, along with a description of some
graphical and statistical methods which can be used to overcome these challenges. The most
important issue concerns the need to extract the information that is relevant to the forecasting
process from the vast quantities of data that are produced by ensemble systems, and to present
that information in ways which facilitate and encourage its use. Secondly, since ensemble output
is presented as additional information to other model output, it is necessary to demonstrate that
the ensembles add value to that output. And thirdly, in those countries where probability forecasts
are not in general use, some adaptation of the forecasting process to prepare and interpret
probability forecasts will be necessary.
Several ways of graphically sampling the ensemble distribution have been described, along with
statistical processing methods that summarize the information in the ensemble. These include
clustering, tubing and probability estimation. Following the success of statistical interpretation of
deterministic model output, there should be considerable potential for the adaptation of statistical
interpretation methods to ensembles. The application of PPM and MOS formulations to
ensembles has been discussed, and a few examples presented. Finally, a rather promising
application of neural nets has been described.
It is clear that ensemble forecasts show great potential for effective use in forecasting, which is
only beginning to be tapped. The ability to forecast the skill of the forecast in advance, and the
possibility of obtaining reliable forecasts of extreme weather are only two of the most exciting uses
of ensemble forecasts. To realize this potential, much development remains to be done. Let’s get
on with it!
6. References
Atger, F., 1999a: The skill of ensemble prediction systems. Mon. Wea. Rev., 127, 1941-1953.
Atger, F., 1999b: Tubing: An alternative to clustering for the classification of ensemble forecasts.
Wea. Forecasting, 14, 741-757.
Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions
of the ECMWF ensemble prediction system. Mon. Wea. Rev., 125, 99-119.
Eckert, P., D. Cattani, and J. Ambuhl, 1996: Classification of ensemble forecasts by means of an
artificial neural network. Meteor. Appl., 3, 169-178.
Hamill, T.M., and S.J. Colucci, 1998: Evaluation of Eta-RSM ensemble probabilistic precipitation
forecasts. Mon. Wea. Rev., 126, 711-724.
Kohonen, T., 1997: Self-organizing maps, Second edition. Springer-Verlag, New York, 426pp.
Kok, C.J., 1999: Statistical post-processing on EPS. Unpublished manuscript, presented at the
ECMWF expert meeting on EPS, 1999, 15pp.
Wilson, L.J., 2001a: The Canadian Meteorological Center Ensemble Prediction System. WMO,
Proceedings of the Workshop on Ensemble Prediction, Beijing, in press.
Wilson, L.J., 2001b: Strategies for the verification of ensemble forecasts. WMO, Proceedings of
the Workshop on Ensemble Prediction, Beijing, in press.