
Additions to the Glossary – EUMETCAL modules on forecast verification.
Contingency table: For a categorical variable of K categories, a table of K by K cells
representing all the possible forecast-observation combinations of categories. Each cell
contains a count of all the cases in the verification sample which fit that particular
forecast category-observed category combination. Contingency tables are usually
structured with observation categories as columns and forecast categories as rows.
Marginal sums of total occurrences in each forecast and observed category are usually
added on the right and at the bottom of the table, respectively, and the total verification
sample size is given in the lower right-hand corner cell.
Hits: The number of cases of a verification sample where the forecast category agrees
with the observed category, usually reported separately for each category. The term is
usually applied to forecast events with a relatively low climatological frequency of
occurrence.
Missed events (or “misses”): The number of cases of a verification sample where a
categorical event was observed, but was not forecast.
False alarms: The number of cases of a verification sample where a categorical event
was predicted to occur, but was not observed.
Correct non-events (or correct negatives): The number of cases of a verification
sample where a categorical event was forecast not to occur, and was not observed to
occur.
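For a dichotomous (two-category) event, these four counts are conventionally arranged
in a 2 by 2 contingency table. A sketch, using the common a, b, c, d notation
(observations as columns and forecasts as rows, as described above):

                 Observed yes    Observed no               Total
Forecast yes     a (hits)        b (false alarms)          a + b
Forecast no      c (misses)      d (correct non-events)    c + d
Total            a + c           b + d                     n = a + b + c + d

This notation is used in the formulas sketched below.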
Underforecast: The forecast value of the variable is lower than the observed value. For
categorical forecasts, the forecast frequency of occurrence of the event is lower than the
observed frequency.
Overforecast: The forecast value of the variable is higher than the observed value. For
categorical forecasts, the forecast frequency of occurrence of the event is higher than the
observed frequency.
Hit rate: The number of times an event is correctly forecast, divided by the total number
of observations of that event (same as probability of detection).
Probability of detection (POD): The number of times an event is correctly forecast,
divided by the total number of observations of that event (same as hit rate).
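In the 2 by 2 notation above, POD = a / (a + c).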
Proportion (percentage) correct (PC): The total number of times a categorical
variable is correctly forecast, summed over all categories and divided by the total number
of cases in the verification sample. For percentage, the proportion is multiplied by 100.
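For a 2 by 2 table, PC = (a + d) / n; more generally, it is the sum of the diagonal counts
of the contingency table divided by the sample size n.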
False alarm ratio (FAR): The number of times an event is forecast but is not observed,
divided by the total number of forecasts of that event.
Post agreement (PAG): The fraction of forecasts of an event which are correct.
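In the 2 by 2 notation, FAR = b / (a + b) and PAG = a / (a + b) = 1 - FAR.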
False alarm rate (FA): The number of times an event was forecast but not observed,
divided by the total number of times the event was not observed. That is, the fraction of
non-events which were forecast as false alarms (same as the probability of false detection).
Probability of false detection (POFD): The number of times an event was forecast but
not observed, divided by the total number of times the event was not observed. That is,
the fraction of non-events which were forecast as false alarms (same as the false alarm
rate).
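In the 2 by 2 notation, POFD = b / (b + d).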
Critical success index (CSI): The number of correct forecasts of an event (hits), divided
by the total of the hits, false alarms and misses (same as the threat score).
Threat score (TS): The number of correct forecasts of an event (hits), divided by the
total of the hits, false alarms and misses (same as the critical success index).
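In the 2 by 2 notation, CSI = TS = a / (a + b + c).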
Equitable threat score (ETS): The threat score (critical success index) adjusted for the
number of correct forecasts expected by chance. The score is intended to offset the
sensitivity of the TS to the underlying climatology of the event.
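In the 2 by 2 notation, ETS = (a - a_r) / (a + b + c - a_r), where a_r = (a + b)(a + c) / n
is the number of hits expected by chance.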
Heidke skill score (HSS): The skill of a categorical forecast based on the proportion
correct, referenced to a chance (random) forecast. Usually expressed as the difference
between the proportion correct for the forecast and the proportion correct expected by
chance, divided by the difference between a perfect score (=1) and the chance value.
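For a 2 by 2 table this reduces to HSS = 2(ad - bc) / [(a + c)(c + d) + (a + b)(b + d)].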
Hanssen-Kuiper score (KSS): The difference between the hit rate and the false alarm
rate for a set of categorical forecasts. This can be computed for each category separately
when there are more than two categories (same as the true skill statistic).
True skill statistic (TSS): The difference between the hit rate and the false alarm rate
for a set of categorical forecasts. This can be computed for each category separately
when there are more than two categories (same as the Hanssen-Kuiper score).
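In the 2 by 2 notation, TSS = POD - POFD = (ad - bc) / [(a + c)(b + d)].

As an illustrative sketch only (not part of the module), the 2 by 2 scores defined above
can be computed directly from the four counts; the function name and the example
numbers below are assumptions for illustration:

def contingency_scores(a, b, c, d):
    """Compute common 2x2 verification scores from hits (a), false alarms (b),
    misses (c) and correct non-events (d)."""
    n = a + b + c + d
    pod = a / (a + c)                    # probability of detection (hit rate)
    far = b / (a + b)                    # false alarm ratio
    pofd = b / (b + d)                   # probability of false detection
    pc = (a + d) / n                     # proportion correct
    csi = a / (a + b + c)                # critical success index (threat score)
    a_r = (a + b) * (a + c) / n          # hits expected by chance
    ets = (a - a_r) / (a + b + c - a_r)  # equitable threat score
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    tss = pod - pofd                     # Hanssen-Kuiper / true skill statistic
    return {"POD": pod, "FAR": far, "POFD": pofd, "PC": pc,
            "CSI": csi, "ETS": ets, "HSS": hss, "TSS": tss}

# Example with assumed counts: 12 hits, 8 false alarms, 5 misses, 75 correct non-events.
print(contingency_scores(12, 8, 5, 75))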
Relative operating characteristic (curve) (ROC): For a set of probability forecasts of a
dichotomous variable, a plot of the hit rate vs. the false alarm rate obtained by varying
the probability threshold defining the prediction of the occurrence or non-occurrence of
the event. Sometimes called the receiver operating characteristic.
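As a sketch of how the curve is constructed (illustrative code, not the module's; the
names and data layout are assumptions):

def roc_points(probs, obs, thresholds):
    """For each probability threshold, forecast the event when the forecast
    probability meets the threshold; return (false alarm rate, hit rate) pairs."""
    n_event = sum(obs)               # obs entries are 0/1 observation indicators
    n_nonevent = len(obs) - n_event
    points = []
    for t in thresholds:
        hits = sum(1 for p, o in zip(probs, obs) if p >= t and o == 1)
        false_alarms = sum(1 for p, o in zip(probs, obs) if p >= t and o == 0)
        points.append((false_alarms / n_nonevent, hits / n_event))
    return points

# Example: eleven thresholds from 0.0 to 1.0.
# points = roc_points(probs, obs, [i / 10 for i in range(11)])

Plotting hit rate against false alarm rate for these points, from the highest threshold to
the lowest, traces the ROC curve from (0, 0) towards (1, 1).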
Skill: The accuracy of a forecast compared to the accuracy of a standard forecast which
is considered to be unskilled.
Reliability: As an attribute of probability forecasts, the degree of correspondence
between the forecast probability and the observed frequency of an event over a
verification sample. Similar to bias, but cannot be evaluated on a single probability
forecast.
Resolution: As an attribute of probability forecasts, the degree to which a forecast system
can systematically group the cases of a verification sample into subsets with different
observed frequencies of occurrence of the event.
Sharpness: The variance in a set of forecasts. Applies to both continuous and
probabilistic forecasts, and is a function of the forecasts alone.
Uncertainty: The variance in a set of (usually categorical) observations.
Discrimination: As an attribute of probability forecasts, the ability of a probability
forecast system to systematically distinguish between occurrences and non-occurrences
of an event.
Base rate: The frequency of occurrence of an event of interest averaged over a
verification sample. Sometimes called the “sample climatological frequency” or just
“sample climatology”.
False skill: The tendency to obtain credit for systematic spatial or temporal
climatological variations in forecasts by referencing to an underlying climatology where
these variations have been averaged out. For example, the computation of temperature
skill scores over many stations at different latitudes means that latitudinal differences in
temperature climatology are included as “skill” compared to the underlying mean
observed temperature. This is a potential problem for all skill scores and the relative
operating characteristic.
Reliability table (diagram): A plot of the observed frequency of an event vs. the
forecast probability of the event. The forecast-observation dataset is binned according to
the forecast probability in order to obtain estimates of the observed frequency. The graph
and associated statistics are used to estimate reliability, resolution and sharpness of
probability forecasts.
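A minimal sketch of the binning step (the names and the ten-bin choice are assumptions):

def reliability_points(probs, obs, n_bins=10):
    """Return (mean forecast probability, observed frequency, count) per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, obs):              # obs entries are 0/1
        k = min(int(p * n_bins), n_bins - 1)  # bin index for forecast probability p
        bins[k].append((p, o))
    points = []
    for cases in bins:
        if cases:
            mean_p = sum(p for p, _ in cases) / len(cases)
            obs_freq = sum(o for _, o in cases) / len(cases)
            points.append((mean_p, obs_freq, len(cases)))
    return points

Perfect reliability corresponds to points lying on the diagonal, where the observed
frequency equals the forecast probability.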
Ranked probability score (RPS): For multi-category probability forecasts, a measure of
the squared difference between the cumulative distribution of the forecast probabilities
and the cumulative distribution of the observation, summed over all categories. Measures
accuracy.
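In one common form, RPS = sum over m = 1..K of (P_m - O_m)^2, where P_m is the
cumulative forecast probability up to category m and O_m is the corresponding
cumulative observation (0 or 1); some authors divide the sum by K - 1 to normalize.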
Ranked probability skill score (RPSS): For multi-category probability forecasts, the
difference between the ranked probability score for a standard unskilled forecast, such as
climatology or persistence, and the ranked probability score for the forecast, divided by
the ranked probability score for the standard forecast. Measures skill.
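Equivalently, RPSS = (RPS_ref - RPS) / RPS_ref = 1 - RPS / RPS_ref, where RPS_ref is
the ranked probability score of the standard unskilled forecast.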
Conditional distribution: A frequency distribution of a subset of the cases of a
verification sample where membership in the subset is determined (conditioned) by
selected values of another variable. For example, the conditional distribution of forecast
probabilities given that the event occurred.
Frequency distribution: A tally, usually in the form of a histogram plot, of all the
values of a variable represented in a sample. The plot is either in the form of the number
of cases in the sample vs. the values of the variable, or if all counts are divided by the
sample size, the frequency of occurrence of each value in the sample vs. the values of the
variable.
Likelihood diagram: A histogram plot of both the conditional distribution given the
occurrence and the conditional distribution given the non-occurrence of an event of
interest for a verification sample.
Bi-normal model: A method of fitting the relative operating characteristic curve to a set
of points for the purpose of estimating the area. The bi-normal model assumes that the
underlying conditional distributions can be transformed to normal (Gaussian)
distributions, an assumption usually satisfied in meteorological verification, and its
accuracy has been validated by many studies in different fields. The verification sample
must be large enough (typically >100 cases in total), but this limit depends on the base
rate. See the library for further information.
Trapezoidal rule: A method of fitting the relative operating characteristic curve to a set
of points for the purpose of estimating the area, which involves connecting the points
with straight lines. This method tends to underestimate the ROC area, especially on
small samples, and is accurate only when the sample size is large, typically >100 cases
with at least 10 occurrences of the event in the sample.
Brier skill score (BSS): The skill score based on the Brier score, given by the difference
between the Brier score for the standard forecast and the Brier score for the forecast,
divided by the Brier score for the standard forecast.
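That is, BSS = (BS_ref - BS) / BS_ref = 1 - BS / BS_ref, where the Brier score
BS = (1/N) sum of (p_i - o_i)^2 over the N cases, with p_i the forecast probability and
o_i the binary (0/1) observation of the event.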
Sharpness diagram: A histogram plot of the number (or relative frequency) of forecasts
in each forecast probability bin vs. the forecast probability value, for a verification
dataset. Usually shown as an inset to a reliability diagram to indicate sharpness.