Additions to the Glossary – EUMETCAL modules on forecast verification.
Contingency Table: For a categorical variable of K categories, a table of K by K cells representing all the possible forecast-observation combinations of categories. Each cell contains a count of all the cases in the verification sample which fit that particular forecast category-observed category combination. Contingency tables are usually structured with observation categories as columns and forecast categories as rows. Marginal sums of total occurrences in each forecast and observed category are usually added on the right and at the bottom of the table respectively, and the total verification sample size is given in the lower right-hand corner cell.
Hits: The number of cases of a verification sample where the forecast category agrees with the observed category, usually reported separately for each category. The term is usually applied to forecast events with a relatively low climatological frequency of occurrence.
Missed events (or “misses”): The number of cases of a verification sample where a categorical event was observed, but was not forecast.
False alarms: The number of cases of a verification sample where a categorical event was predicted to occur, but was not observed.
Correct non-events (or correct negatives): The number of cases of a verification sample where a categorical event was forecast not to occur, and was not observed to occur.
Underforecast: The forecast value of the variable is lower than the observed value. For categorical forecasts, the forecast frequency of occurrence of the event is lower than the observed frequency.
Overforecast: The forecast value of the variable is higher than the observed value. For categorical forecasts, the forecast frequency of occurrence of the event is higher than the observed frequency.
Hit rate: The number of times an event is correctly forecast, divided by the total number of observations of that event (same as probability of detection).
Probability of detection (POD): The number of times an event is correctly forecast, divided by the total number of observations of that event (same as hit rate).
Proportion (percentage) correct (PC): The total number of times a categorical variable is correctly forecast, summed over all categories and divided by the total number of cases in the verification sample. For percentage, the proportion is multiplied by 100.
False alarm ratio (FAR): The number of times an event is forecast but is not observed, divided by the total number of forecasts of that event.
Post agreement (PAG): The fraction of forecasts of an event which are correct.
False alarm rate (FA): The number of times an event was forecast but not observed, divided by the total number of times the event was not observed; that is, the fraction of non-events which were forecast as false alarms (same as probability of false detection).
Probability of false detection (POFD): The number of times an event was forecast but not observed, divided by the total number of times the event was not observed; that is, the fraction of non-events which were forecast as false alarms (same as the false alarm rate).
Critical success index (CSI): The number of correct forecasts of an event (hits), divided by the total of the hits, false alarms and misses (same as the Threat score).
Threat Score (TS): The number of correct forecasts of an event (hits), divided by the total of the hits, false alarms and misses (same as the critical success index).
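Most of the measures above are simple ratios of the four cells of a 2 by 2 contingency table. The following minimal sketch (in Python, which the modules themselves do not use) makes the definitions concrete; the cell labels a (hits), b (false alarms), c (misses) and d (correct non-events) are the conventional ones, and the function name and example counts are illustrative only.

    def table_scores(a, b, c, d):
        # a = hits, b = false alarms, c = misses, d = correct non-events.
        n = a + b + c + d
        return {
            "POD / hit rate": a / (a + c),
            "FAR (false alarm ratio)": b / (a + b),
            "PAG (post agreement)": a / (a + b),   # equals 1 - FAR
            "POFD / false alarm rate": b / (b + d),
            "PC (proportion correct)": (a + d) / n,
            "CSI / TS (threat score)": a / (a + b + c),
        }

    # Hypothetical counts for a rare event such as daily rain occurrence:
    print(table_scores(a=30, b=20, c=10, d=940))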
Equitable threat score (ETS): The threat score (critical success index) adjusted for the number of hits expected by chance. The score is intended to offset the sensitivity of the TS to the underlying climatology of the event.
Heidke skill score (HSS): The skill of a categorical forecast based on the proportion correct, referenced to a chance (random) forecast. Usually expressed as the difference between the proportion correct for the forecast and the chance proportion correct, divided by the difference between a perfect proportion correct (=1) and the chance proportion correct.
Hanssen-Kuipers score (KSS): The difference between the hit rate and the false alarm rate for a set of categorical forecasts. This can be computed for each category separately when there are more than two categories (same as the true skill statistic).
True skill statistic (TSS): The difference between the hit rate and the false alarm rate for a set of categorical forecasts. This can be computed for each category separately when there are more than two categories (same as the Hanssen-Kuipers score).
Relative operating characteristic (curve) (ROC): For a set of probability forecasts of a dichotomous variable, a plot of the hit rate vs. the false alarm rate obtained by varying the probability threshold defining the prediction of the occurrence or non-occurrence of the event. Sometimes called the receiver operating characteristic.
Skill: The accuracy of a forecast compared to the accuracy of a standard forecast which is considered to be unskilled.
Reliability: As an attribute of probability forecasts, the degree of correspondence between the forecast probability and the observed frequency of an event over a verification sample. Similar to bias, but cannot be evaluated on a single probability forecast.
Resolution: As an attribute of probability forecasts, the degree to which a forecast system can systematically group the cases of a verification sample into subsets with different observed frequencies of occurrence of the event.
Sharpness: The variance in a set of forecasts. Applies to both continuous and probabilistic forecasts, and is a function of the forecasts alone.
Uncertainty: The variance in a set of (usually categorical) observations.
Discrimination: As an attribute of probability forecasts, the ability of a probability forecast system to systematically distinguish between occurrences and non-occurrences of an event.
Base rate: The frequency of occurrence of an event of interest averaged over a verification sample. Sometimes called the “sample climatological frequency” or just “sample climatology”.
False skill: The tendency to obtain credit for systematic spatial or temporal climatological variations in forecasts by referencing to an underlying climatology where these variations have been averaged out. For example, the computation of temperature skill scores over many stations at different latitudes means that latitudinal differences in temperature climatology are included as “skill” compared to the underlying mean observed temperature. This is a potential problem for all skill scores and the relative operating characteristic.
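The chance corrections in the ETS and HSS, the KSS, and the construction of a ROC can likewise be written out in a few lines. In the sketch below (same a, b, c, d cell conventions as above; function names are illustrative), the expected numbers of hits and of correct forecasts under a random forecast with the same marginal totals follow the standard formulas, and the verification sample is assumed to contain both occurrences and non-occurrences of the event.

    def skill_scores(a, b, c, d):
        # Chance-corrected scores from a 2x2 contingency table.
        n = a + b + c + d
        a_random = (a + b) * (a + c) / n          # hits expected by chance
        ets = (a - a_random) / (a + b + c - a_random)
        pc = (a + d) / n                          # proportion correct
        pc_random = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
        hss = (pc - pc_random) / (1 - pc_random)  # Heidke skill score
        kss = a / (a + c) - b / (b + d)           # hit rate minus false alarm rate
        return ets, hss, kss

    def roc_points(probs, obs, thresholds=(0.1, 0.3, 0.5, 0.7, 0.9)):
        # One (false alarm rate, hit rate) point per probability threshold;
        # plotting these points traces the relative operating characteristic.
        points = []
        for t in thresholds:
            a = sum(1 for p, o in zip(probs, obs) if p >= t and o)
            b = sum(1 for p, o in zip(probs, obs) if p >= t and not o)
            c = sum(1 for p, o in zip(probs, obs) if p < t and o)
            d = sum(1 for p, o in zip(probs, obs) if p < t and not o)
            points.append((b / (b + d), a / (a + c)))
        return points

    def roc_area_trapezoid(points):
        # Area under the ROC estimated by connecting the points, plus the
        # (0,0) and (1,1) endpoints, with straight lines; see the
        # trapezoidal rule entry below.
        pts = sorted(list(points) + [(0.0, 0.0), (1.0, 1.0)])
        return sum((x2 - x1) * (y1 + y2) / 2
                   for (x1, y1), (x2, y2) in zip(pts, pts[1:]))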
Reliability table (diagram): A plot of the observed frequency of an event vs. the forecast probability of the event. The forecast-observation dataset is binned according to the forecast probability in order to obtain estimates of the observed frequency. The graph and associated statistics are used to estimate the reliability, resolution and sharpness of probability forecasts.
Ranked probability score (RPS): For multi-category probability forecasts, a measure of the squared difference between the cumulative distribution of the forecasts and that of the observations, summed over all categories. Measures accuracy.
Ranked probability skill score (RPSS): For multi-category probability forecasts, the difference between the ranked probability score for a standard unskilled forecast, such as climatology or persistence, and the ranked probability score for the forecast, divided by the ranked probability score for the standard forecast. Measures skill.
Conditional distribution: A frequency distribution of a subset of the cases of a verification sample, where membership in the subset is determined (conditioned) by selected values of another variable. For example, the conditional distribution of forecast probabilities given that the event occurred.
Frequency distribution: A tally, usually in the form of a histogram plot, of all the values of a variable represented in a sample. The plot shows either the number of cases in the sample vs. the values of the variable or, if all counts are divided by the sample size, the frequency of occurrence of each value vs. the values of the variable.
Likelihood diagram: A histogram plot of both the conditional distribution given the occurrence and the conditional distribution given the non-occurrence of an event of interest for a verification sample.
Bi-normal model: A method of fitting the relative operating characteristic curve to a set of points for the purpose of estimating the area under the curve. The bi-normal model requires the assumption that the distributions underlying the ROC are normal, or can be transformed to normal, an assumption which is usually satisfied in meteorological verification; its accuracy has been validated by studies in many different fields. The verification sample must be large enough to fit the model reliably (typically >100 cases in total), but this limit depends on the base rate. See the library for further information.
Trapezoidal rule: A method of fitting the relative operating characteristic curve to a set of points for the purpose of estimating the area, which involves connecting the points with straight lines. Because the straight-line segments lie below a smooth, concave curve, this method tends to underestimate the ROC area; the underestimation is most serious when the sample size is small, typically <100 cases in total, or when there are fewer than 10 occurrences of the event in the sample.
Brier skill score (BSS): The skill score based on the Brier score, given by the difference between the Brier score for the standard forecast and the Brier score for the forecast, divided by the Brier score for the standard forecast.
Sharpness diagram: A histogram plot of the number (or frequency) of forecasts vs. forecast probability value for a verification dataset. Usually used as an inset to a reliability diagram to show sharpness.
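The probability-forecast attributes defined above connect through the Brier score: binning the forecast-observation pairs as for a reliability diagram yields the standard reliability-resolution-uncertainty decomposition. The sketch below shows that decomposition, the Brier skill score against sample climatology, and the ranked probability score for a single multi-category forecast; the number of bins and the function names are choices made here, not part of the glossary.

    def brier_decomposition(probs, obs, n_bins=10):
        # Murphy decomposition: Brier score = reliability - resolution + uncertainty
        # (exact up to the within-bin approximation used for the reliability diagram).
        n = len(probs)
        base_rate = sum(obs) / n                     # sample climatology
        bins = [[] for _ in range(n_bins)]
        for p, o in zip(probs, obs):
            bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
        rel = res = 0.0
        for cases in bins:
            if not cases:
                continue
            nk = len(cases)
            p_mean = sum(p for p, _ in cases) / nk   # mean forecast probability in bin
            o_freq = sum(o for _, o in cases) / nk   # observed frequency in bin
            rel += nk * (p_mean - o_freq) ** 2
            res += nk * (o_freq - base_rate) ** 2
        return rel / n, res / n, base_rate * (1 - base_rate)

    def brier_skill_score(probs, obs):
        # BSS relative to a constant forecast of the sample climatology.
        n = len(probs)
        base_rate = sum(obs) / n
        bs = sum((p - o) ** 2 for p, o in zip(probs, obs)) / n
        bs_clim = sum((base_rate - o) ** 2 for o in obs) / n
        return 1.0 - bs / bs_clim

    def ranked_probability_score(fcst_probs, obs_category):
        # Squared differences between the cumulative forecast distribution and
        # the cumulative observation (a step function at the observed category),
        # summed over the K categories of a single forecast.
        cum_f = cum_o = score = 0.0
        for k, p in enumerate(fcst_probs):
            cum_f += p
            cum_o += 1.0 if k == obs_category else 0.0
            score += (cum_f - cum_o) ** 2
        return score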