WORLD METEOROLOGICAL ORGANIZATION
TCM-VI/Doc. 2.4
(13.X.2009)
SIXTH TROPICAL CYCLONE RSMCs/TCWCs
TECHNICAL COORDINATION MEETING
ITEM 2.4
BRISBANE, AUSTRALIA, 2 TO 5 NOVEMBER 2009
ENGLISH ONLY
Standard format for verification of TC forecasts
(Submitted by the Secretariat)
This document provides the standard format for verification
of TC forecasts which was proposed by RSMC La Réunion and
submitted to TCM-5.
ACTION PROPOSED
It was noted at TCM-3 that the proposed verification methodologies would be
acceptable in principle. At its fifth session, TCM reviewed the standard format proposed by
RSMC La Réunion in response to the request from TCM-4 and in consideration of the comments
from the centres concerned. Nevertheless, TCM-5 reserved acceptance of the format, because
the verification parameters differed between the centres and they also had different views
about the measurement of gain in skill. It recognized difficulties in reaching a consensus. The
Committee may discuss how to deal with this issue.
__________________
Standard format for warning verification statistics of TC forecasts
provided by TC RSMCs, TCWCs and NWP Centres
I. Introduction
As requested by the Second TC RSMCs Technical Coordination Meeting (STCM), a
methodology to standardize the verification of the forecasts provided by TC RSMCs and TCWCs
was submitted to the TCM-3 by the representative of the RSMC La Réunion. TCM-3 noted that the
proposed methodology would be acceptable in principle. However, it also noted that the proposed
methodology should be further studied by other TC RSMCs and TCWCs. Thus, TCM-4 requested
that the centres review the proposed format and forward their comments to the RSMC La Réunion
(Mr Philippe Caroff) before 1 February 2003, and that the final format be sent to the TC RSMCs
and TCWCs by 1 September 2003. The proposed standard format is as follows:
- Global overview of the season (or add individual verification statistics for each system?)
- Parameters: track forecasts, intensity forecasts (winds; optional: pressure).
- Forecast periods to verify:
  Errors at 0 h (analysis), 12 h, 24 h and 48 h forecast periods.
  Recommended: 72 h.
  Optional: 36 h, long-range forecasts.
- Sampling of data to be verified
  Proposal: to verify all the forecasts disseminated, excluding those concerning a system
  classified, at the analysis stage, as extratropical, and excluding those concerning a system
  whose maximum wind is, both at analysis time and at the forecast verifying time, strictly
  below near-gale force (gale force, respectively) for winds averaged over 10 minutes
  (1 minute, respectively). A sketch of this rule is given after this list.
  Optional: discrimination by intensities, i.e. to provide equivalent statistics calculated
  respectively for:
  - all the forecasts for a system whose maximum wind speed, at analysis or at forecast,
    is greater than or equal to 64 kt (winds averaged over 10 minutes);
  - all forecasts for a system whose maximum wind speed is between 34 and 63 kt
    (over 10 minutes) at analysis (or at forecast) and not greater than or equal to 64 kt
    at forecast (or at analysis).
  Sample sizes (the number of forecasts behind each statistic) should be systematically
  indicated (for example, in brackets adjacent to the corresponding statistic).
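As an illustration only, a minimal Python sketch of the sampling rule above (the field names
and the 28 kt near-gale threshold for 10-minute winds are assumptions, not part of the
proposal):

def keep_forecast(fc, near_gale_kt=28.0):
    """Apply the proposed sampling rule to one disseminated forecast.
    fc is a dict with hypothetical keys: 'status_at_analysis',
    'wind_analysis_kt' and 'wind_forecast_kt' (10-min mean winds)."""
    if fc['status_at_analysis'] == 'extratropical':
        return False  # exclude systems extratropical at the analysis stage
    # exclude only if the wind is strictly below the threshold at BOTH times
    return not (fc['wind_analysis_kt'] < near_gale_kt
                and fc['wind_forecast_kt'] < near_gale_kt)

forecasts = [  # hypothetical disseminated forecasts
    {'status_at_analysis': 'tropical', 'wind_analysis_kt': 25, 'wind_forecast_kt': 40},
    {'status_at_analysis': 'tropical', 'wind_analysis_kt': 22, 'wind_forecast_kt': 25},
]
sample = [fc for fc in forecasts if keep_forecast(fc)]  # keeps only the first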
II. Statistics on verification of track forecasts
All statistical data on track forecast errors are to be indicated in kilometres.
Simple Errors
A global statement on track forecast errors will be provided. This will contain the mean Direct
Positional Error (DPE), measured as the great-circle distance between the forecast and the
observed position (Best Track point), together with its standard deviation. Indication of the
median is also recommended.
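For reference, a minimal sketch of the DPE computation, using the haversine great-circle
formula on a spherical Earth (the 6371 km radius is a conventional value, not specified by the
document; positions are invented):

import math

def dpe_km(lat_f, lon_f, lat_o, lon_o):
    """Direct Positional Error: great-circle distance (km) between the
    forecast position and the observed (Best Track) position."""
    R = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat_f), math.radians(lat_o)
    dphi = p2 - p1
    dlam = math.radians(lon_o - lon_f)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

print(round(dpe_km(-18.0, 55.0, -18.5, 56.0), 1))  # hypothetical 24 h case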
For systems considered individually, it is recommended to provide a visual representation, in the
form of a graph, of the tracks and associated forecast errors (at the different forecast ranges).
A further, non-compulsory option would be to provide a histogram of errors for the
different forecast ranges. A distribution of errors in 50 km segments (0-50, 50-100, etc.) and by
frequency seems particularly advisable. The percentage of forecasts with errors below typical
thresholds might also be included (errors less than 100 km at the 12 h forecast period, 150 km at
24 h, 300 km at 48 h, 450 km at 72 h).
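A sketch of this threshold statistic (the thresholds repeat those suggested above; the sample
values are invented):

def pct_below(errors_km, threshold_km):
    """Percentage of forecasts whose DPE is below the given threshold."""
    return 100.0 * sum(e < threshold_km for e in errors_km) / len(errors_km)

thresholds_km = {12: 100, 24: 150, 48: 300, 72: 450}  # per forecast period (h)
errors_12h = [45.0, 80.0, 120.0, 95.0]  # hypothetical 12 h DPE sample (km)
print(f"12 h: {pct_below(errors_12h, thresholds_km[12]):.0f}% below 100 km")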
Measuring biases
Two types of biases may be identified: zonal and meridional biases, and biases calculated
along the observed track of the phenomenon in question.
Thus, the classic definitions are:
- DX = positional error in the east-west direction, with the sign convention that DX is
  positive when the forecast position is to the east of the observed position.
- DY = positional error in the north-south direction, with the sign convention that DY is
  positive when the forecast position is located on the poleward side of the observed
  position.
- AT = positional error along the axis of the track (Along Track), with the sign convention
  that AT is positive when the forecast is ahead of the observed position.
- CT = positional error transverse to the track (Cross Track), with the sign convention that
  CT is positive when the forecast is located to the right (left, respectively) of the
  observed track in the northern hemisphere (southern hemisphere, respectively).
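A sketch of these sign conventions (assuming the error is already expressed as eastward and
northward offsets in km and the observed track direction is known; the poleward DY convention
would require a sign flip of dy_km in the southern hemisphere):

import math

def at_ct(dx_km, dy_km, track_dir_deg, southern_hemisphere=False):
    """Decompose a positional error (forecast minus observed, east and
    north components in km) into along-track (AT) and cross-track (CT)
    components; track_dir_deg is the direction the storm moves towards,
    in degrees clockwise from north. AT > 0: forecast ahead of the
    observed position; CT > 0: forecast right of track in the NH,
    left of track in the SH."""
    th = math.radians(track_dir_deg)
    ux, uy = math.sin(th), math.cos(th)  # unit vector along the track
    at = dx_km * ux + dy_km * uy         # projection on the track
    ct = dx_km * uy - dy_km * ux         # positive to the right
    return (at, -ct) if southern_hemisphere else (at, ct)

print(at_ct(30.0, 0.0, 0.0))  # storm moving north, forecast 30 km east -> AT 0, CT +30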
As the quantities defined above are signed, a simple arithmetic average of these errors
over all forecasts is of limited value, as it can hide biases through artificial compensation
between positive and negative values. A scalar average (average of absolute errors), conversely,
provides information on the average size of the deviations between forecasts and observations.
It can give an indication of major timing errors (low scalar average of CT with high scalar
average of AT) or of major track errors (high scalar average of CT with low scalar average
of AT).
It could be more informative to distinguish between positive and negative errors, and to
present average values and standard deviations separately for positive AT errors, negative AT
errors, positive CT errors and negative CT errors (likewise for DX and DY). The relative
frequency of occurrence of positive and negative errors is an indication of possible bias and is
worth noting here, even though it is easily derived by comparing the sample sizes given in
brackets.
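A sketch of such a signed-error summary (the sample values are invented):

from statistics import mean, stdev

def signed_summary(errors):
    """Mean, standard deviation and relative frequency of the positive
    and the negative values of a signed error sample (AT, CT, DX or DY)."""
    out = {}
    for label, part in (('positive', [e for e in errors if e > 0]),
                        ('negative', [e for e in errors if e < 0])):
        out[label] = {'n': len(part),
                      'mean': mean(part) if part else 0.0,
                      'std': stdev(part) if len(part) > 1 else 0.0,
                      'freq_%': 100.0 * len(part) / len(errors)}
    return out

# hypothetical AT sample (km): the excess of positive values hints at a fast bias
print(signed_summary([35.0, -10.0, 60.0, 20.0, -5.0]))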
To avoid an overabundance of numbers, graphical overviews would undoubtedly be more
appropriate. To highlight biases towards over-estimation or under-estimation of track speed
(linked to AT), or towards forecasts too far left or right of the real track (linked to CT, and useful
for indicating a tendency to anticipate, delay or miss track recurvatures), a simple method
consists of viewing the sample of forecast errors, per forecast period, as a graphical
representation of the distribution of errors on AT, CT (or DX, DY) axes (scatter diagram).
An equally visual method, but carrying more than purely qualitative information, consists of
building, for each forecast period, a "wind rose" in which the forecast error is treated as a vector
defined by its norm (the DPE) and by the angle of deviation relative to the real track (deduced
from AT and CT). For this, errors need to be sorted into classes, and the respective frequencies
of the various classes calculated. The definition of the classes should be adapted to the size and
type of sample. For 12 h forecasts, classes of 30° (or 45° for a limited sample) in angular
deviation are proposed, with 50 km steps for the DPE (0-50 km, 50-100 km, 100-150 km,
150-200 km, > 200 km), which already amounts to 60 classes (40 in the latter case).
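A sketch of the class assignment for such a wind rose (30° and 50 km classes as proposed
above; the sample is invented):

from collections import Counter

def rose_class(dpe_km, angle_deg, angle_step=30, edges=(50, 100, 150, 200)):
    """Assign one vector error (norm DPE, deviation angle from the real
    track) to a wind-rose class; the last distance class is open (> 200 km)."""
    a_bin = int(angle_deg % 360) // angle_step   # 12 angular sectors
    d_bin = sum(dpe_km >= e for e in edges)      # 5 distance classes
    return a_bin, d_bin

sample = [(40, 10), (120, 350), (75, 200), (260, 25)]  # hypothetical 12 h errors
freq = Counter(rose_class(d, a) for d, a in sample)
for cls, n in sorted(freq.items()):
    print(cls, f"{100 * n / len(sample):.0f}%")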
The ideal (absence of bias and small errors) is a wind rose whose distribution is balanced
between the right and left halves (no directional bias) and between the upper and lower halves
(no speed bias), concentrated as much as possible around the vertical axis (highest frequencies
for small deviations) and focused on the small-error classes.
The measurement of gain in skill
In order to compare forecasts made under extremely variable conditions and to evaluate
their quality while taking their degree of difficulty into account (in particular with the aim of
detecting trends in forecast quality over time), several options are possible. The first is the
measurement of the gain (or loss) in skill achieved by the forecast relative to a reference model.
A reference model must first be chosen. PERSISTENCE (calculated using the Best Track
and the movement during the last 12 hours), which is the simplest forecast model, or a more
developed model, CLIMATOLOGY or CLIPER (combining climatology and persistence), is
usually used as the reference.
The gain in skill in relation to CLIPER is quantified in percentage terms by:

Gain in skill (%) = 100 × (DPE_CLIPER - DPE) / DPE_CLIPER
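In code, the same computation (the DPE values are invented):

def skill_gain_pct(dpe_km, dpe_cliper_km):
    """Gain in skill (%) of the forecast over the CLIPER reference;
    positive values mean the forecast beats the reference."""
    return 100.0 * (dpe_cliper_km - dpe_km) / dpe_cliper_km

print(skill_gain_pct(150.0, 200.0))  # hypothetical 24 h DPEs -> 25.0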
Several caveats apply:
- The samples for calculating the gain of official forecasts (or NWP forecasts) over the
  reference model (CLIPER or PERSISTENCE) will be less comprehensive, because obtaining
  the reference data generally requires knowledge of the positions observed 12 h and 24 h
  before the base time; it is therefore not possible to calculate skill values at the beginning
  of the trajectory.
- These skill calculations do not exactly reflect (they underestimate) the actual gain in skill
  provided by the forecasters, because they are calculated using Best Track points. This
  does not, of course, prevent the Centres from taking an interest, internally, in verifying
  forecasts against real-time data, which are the only true working basis for forecasters
  when making forecasts.
- One problem is that the definition of CLIPER models can differ between cyclone basins.
  The ideal situation would be to have a universal model serving as the sole reference. The
  MOCCANA climatological model (a model of analogues) developed at La Réunion, which
  gives results similar to CLIPER, could be proposed as this reference. Otherwise, the
  simplest solution would be to adopt PERSISTENCE as the reference.
- For a given cyclone basin, once the reference model is chosen, a graph of the time
  evolution of the gain in skill could be presented, but it will not by itself validate an
  improvement (or deterioration) in the quality of forecasts. A season that is rather easy in
  terms of forecast difficulty, with very persistent and/or climatological trajectories, may
  show less apparent skill than a difficult season, in which large gains are more easily
  achieved.
Normalization or weighting of forecasts
This is why other options could be explored to try to overcome this natural variation in the
difficulty of cyclone seasons.
One solution is to exploit the fact that, with a large enough sample of systems and
trajectories, this variability tends to disappear (by integrating a large number of trajectories, it is
assumed that all forecast situations are included, from the easiest to the most difficult).
Consequently, by establishing a running mean of the gains (or even directly of the forecast
errors) over several cyclone seasons, a statistically significant seasonal tendency is likely to
appear, without interference from the varying difficulty of individual seasons. The period over
which to carry out this running mean still has to be determined; the 5-year running mean used by
some centres may not necessarily be sufficient.
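A sketch of such a running mean over seasonal values (the 9-season series is invented):

def running_mean(values, window=5):
    """Trailing running mean over the last `window` seasons, applied to
    annual mean errors or gains in skill."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

annual_errors = [185, 178, 190, 172, 169, 160, 165, 150, 148]  # hypothetical km
print(running_mean(annual_errors))  # the smoothed series exposes the trend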
While the previous method appears relevant for assessing the evolution of forecast quality
within a given cyclone basin taken individually, other problems could be considered.
It appears a priori unrealistic to hope that comparisons can be made between sets of
forecasts made in different cyclone basins, whose difficulty varies (see Neumann's work on the
comparative difficulty of the different cyclone basins). However, it may nevertheless be possible
to quantify the quality of forecasts in relation to their degree of difficulty.
To do this, a correction factor could be applied to the average forecast error, with a formula
that takes the season's degree of difficulty into account, harmonized against a standard
reference. Ideally, the coefficient could, for example, be defined by comparing the season's
average persistence forecast error with the climatological average persistence forecast error
over a 30-year period. If the season has been easier than usual, the correction coefficient is
above 1, and below 1 if the season has been more difficult. In this way, "standardized" average
annual forecast errors become testable in terms of seasonal comparison.
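The document does not fix the exact formula; one possible reading, as a sketch (all values
invented):

def standardized_error(season_error, season_persist_error, climo_persist_error):
    """One possible standardization: multiply the season's mean forecast
    error by the ratio of the 30-year climatological persistence error to
    the season's persistence error. The coefficient is above 1 for an
    easier-than-usual season (low persistence error), inflating its error
    so that seasons become comparable."""
    return season_error * (climo_persist_error / season_persist_error)

print(round(standardized_error(140.0, 230.0, 260.0), 1))  # hypothetical 24 h values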
III. Verification of intensity forecast statistics
Intensity forecasts issued by RSMCs or Warning Centres apply to the central pressure and
the maximum wind of tropical systems, and in some cases to the forecast Dvorak Current
Intensity (CI) number. The verification of intensity forecasts will apply mainly to the first two
parameters, with priority given to maximum wind.
Verification of maximum wind forecasts
The first question concerns the choice of units. Either m/s (the SI unit) or the knot, which is
used almost universally in advisories and forecast bulletins, is to be recommended.
As these forecast errors are signed, they will be treated in the same way as the AT, CT, DX and
DY track forecast errors.
A distinction will be made between absolute average errors (with standard deviation and
median), giving the average magnitude of the forecast error (again relative to Best Track
intensities), and data on the biases, for which positive errors (predicted maximum wind above
the recorded or estimated maximum wind) and negative errors will be separated, each with its
average, standard deviation and relative frequency (indicating a possible bias towards over-
estimation or under-estimation of intensities).
Optionally, as with track forecasts, histograms of intensity errors at the different forecast periods
can be used effectively. A division of errors into steps of 5 kt or 2.5 m/s (…, -2.5 m/s, 0 m/s,
+2.5 m/s, etc.) and by frequency appears well adapted and could be recommended.
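A sketch of this binning in 5 kt steps (the sample is invented):

import math
from collections import Counter

def intensity_histogram(errors_kt, step=5.0):
    """Bin signed maximum-wind errors (forecast minus Best Track, kt)
    into classes of `step` kt; keys are the lower edge of each class."""
    return Counter(step * math.floor(e / step) for e in errors_kt)

# hypothetical 24 h sample: the surplus of positive errors suggests over-estimation
print(sorted(intensity_histogram([7, -3, 12, 4, -8, 6]).items()))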
Verification of central pressure forecasts
The accepted unit will be the hectopascal, and the statistics will be presented following the
methodology described above for maximum wind (5 hPa being the recommended step).
IV. Verification of cyclogenesis forecasts
Centres disseminating cyclogenesis forecasts generally issue their forecasts of tropical
depression formation in a probabilistic form (probability of cyclogenesis "poor", "fair" or "good").
This particular type of forecast is therefore monitored through adapted statistical tools, such as
contingency tables, with aggregated scores calculated on those tables.
These aggregated scores will provide the following elements: the percentage of correct
forecasts, the false alarm rate, the non-detection rate, the Heidke skill score (comparison with a
random forecast) and the Rousseau index (comparison with a random forecast consistent with
climatology).
The quality indices take the form Q = (B - H) / (T - H), where B is the number of correct
forecasts, T is the total number of forecasts and H is the number of correct forecasts that the
reference forecast would obtain. For the percentage of correct forecasts, the reference forecast
is one that is always incorrect (H = 0). For the Heidke index (respectively the Rousseau index),
H is the number of correct forecasts expected by chance (respectively by chance consistent with
climatology), deduced from the row and column totals of the table.
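As an illustration, a sketch of this index on a 2x2 yes/no cyclogenesis table (the document's
three probability classes would extend it to more categories; the counts are invented):

def q_index(hits, false_alarms, misses, correct_negatives, reference='heidke'):
    """Q = (B - H) / (T - H) on a 2x2 contingency table. B = correct
    forecasts, T = total. H is the number of correct forecasts expected
    from the reference: 0 for an always-incorrect reference (Q then
    equals the fraction of correct forecasts), or the random-chance
    expectation from the row/column totals for the Heidke case."""
    T = hits + false_alarms + misses + correct_negatives
    B = hits + correct_negatives
    if reference == 'heidke':
        H = ((hits + false_alarms) * (hits + misses)
             + (misses + correct_negatives) * (false_alarms + correct_negatives)) / T
    else:
        H = 0.0
    return (B - H) / (T - H)

print(q_index(12, 5, 3, 80))                     # Heidke skill score
print(q_index(12, 5, 3, 80, reference='none'))   # fraction of correct forecasts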
N.B. Other types of forecasts could be subjected to similar processing. Forecasts of track
recurvature, forecasts of intensity above a given threshold, and forecasts of reaching the
hurricane-intensity threshold, for example, could be verified in this way.