
BAYESIAN PROCESSOR OF OUTPUT:
PROBABILITY OF PRECIPITATION OCCURRENCE
By
Roman Krzysztofowicz
University of Virginia
Charlottesville, Virginia
and
Coire J. Maranzano
Johns Hopkins University
Baltimore, Maryland
Research Paper RK–0601
http://www.faculty.virginia.edu/rk/
January 2006
Revised October 2006
Copyright © 2006 by R. Krzysztofowicz and C.J. Maranzano
———————————————–
Corresponding author address: Professor Roman Krzysztofowicz, University of Virginia, P.O.
Box 400747, Charlottesville, VA 22904–4747. E-mail: rk@virginia.edu
ABSTRACT
The Bayesian Processor of Output (BPO) is a theoretically-based technique for probabilistic
forecasting of weather variates. The first version of the BPO described herein is for a binary predictand; it is illustrated by producing the probability of precipitation (PoP) occurrence forecast.
This PoP is a posterior probability obtained through Bayesian fusion of a prior (climatic) probability and a realization of predictors output from a numerical weather prediction (NWP) model.
The strength of the BPO derives from (i) the theoretic structure of the forecasting equation (which
is Bayes theorem), (ii) the flexibility of the meta-Gaussian family of likelihood functions (which
allows any form of the marginal distribution functions of predictors, and a non-linear and heteroscedastic dependence structure between predictors), (iii) the simplicity of estimation, and (iv)
the effective use of asymmetric samples (typically, a long climatic sample of the predictand and a
short operational sample of the NWP model output).
Modeling and estimation of the BPO are explained in a setup parallel to that of the Model
Output Statistics (MOS) technique used operationally by the National Weather Service. The performance of the prototype BPO system is compared with the performance of the operational MOS
system in terms of calibration and informativeness on two samples (estimation and validation).
These preliminary results highlight the advantages of the BPO in terms of (i) performance for a
specific location (and hence a user), (ii) efficiency of extracting predictive information from the
NWP model output (fewer predictors needed), and (iii) parsimony of the predictors (no need for
experimentation to find suitable transformations of the NWP model output). Potential implications
for operational forecasting and ensemble processing are discussed.
TABLE OF CONTENTS
ABSTRACT
1. INTRODUCTION
   1.1 Towards Bayesian Forecasting Techniques
   1.2 BPO for Binary Predictand
2. BAYESIAN THEORY
   2.1 Variates
   2.2 Samples
   2.3 Input Elements
   2.4 Theoretic Structure
3. META-GAUSSIAN MODEL
   3.1 Input Elements
   3.2 Forecasting Equation
   3.3 Basic Properties
   3.4 Model Validation
4. EXAMPLE WITH ONE PREDICTOR
   4.1 Prior Probability
   4.2 Conditional Density Functions
   4.3 Informativeness of Predictor
   4.4 Posterior Probability
   4.5 Another Predictor
   4.6 Binary-Continuous Predictor
   4.7 Monotonicity of Likelihood Ratio
5. EXAMPLE WITH TWO PREDICTORS
   5.1 Conditional Correlation Coefficients
   5.2 Conditional Dependence Measures
   5.3 Conditional Dependence Structures
   5.4 Second Example
   5.5 Predictors Selection
6. MOS SYSTEM
   6.1 Forecasting Equation
   6.2 Grid-Binary Transform
   6.3 Estimation
   6.4 Predictors Selection
7. COMPARISON OF BPO WITH MOS
   7.1 System Versus Technique
   7.2 Performance Measures
   7.3 Comparative Verifications
   7.4 Explanations
8. SUMMARY
   8.1 Bayesian Technique
   8.2 Preliminary Results
   8.3 Potential Implications
ACKNOWLEDGMENTS
APPENDIX A: NUMERICAL APPROXIMATION TO Q−1
REFERENCES
TABLES
FIGURES
1. INTRODUCTION
1.1 Towards Bayesian Forecasting Techniques
Rational decision making by industries, agencies, and the public in anticipation of heavy
precipitation, snow storm, flood, or other disruptive weather phenomenon, requires information
about the degree of certitude that the user can place in a weather forecast. It is vital, therefore,
to advance the meteorologist’s capability of quantifying forecast uncertainty to meet the society’s
rising expectations for reliable information.
Our objective is to develop and test a coherent set of theoretically-based techniques for probabilistic forecasting of weather variates. The basic technique, called Bayesian Processor of Output
(BPO), processes output from a numerical weather prediction (NWP) model and optimally fuses it
with climatic data in order to quantify uncertainty about a predictand.
As is well known, Bayes theorem provides the optimal theoretic framework for fusing information from different sources and for obtaining the probability distribution of a predictand,
conditional on a realization of predictors, or conditional on an ensemble of realizations. The “optimality” of Bayes theorem for fusing information, or updating uncertainty, or revising probability,
rests on logical and mathematical arguments (see, for example, Savage, 1954; DeGroot, 1970; de
Finetti, 1974). These arguments have long ago been adopted by engineers and decision theorists
for information, or signal, or forecast processing, and for decision making based on forecasts (see,
for example, Edwards et al., 1968; Sage and Melsa, 1971; Krzysztofowicz, 1983; Alexandridis and
Krzysztofowicz, 1985). Introducing what we would call today a Bayesian processor of forecast
for a binary predictand, DeGroot (1988) explains:
The argument in favor of the Bayesian approach proceeds in two steps: (1) The
quantitative assessment of uncertainty is in itself a sterile exercise unless that
assessment is to be used to make decisions. (2) The Bayesian approach provides
the only coherent methodology for decision making under uncertainty.
Lindley (1987), defending “the inevitability of probability” as a measure of uncertainty, presents
logical arguments and a succinct verdict: “Most intelligent behavior is simply obeying Bayes
theorem. Any other procedure is incoherent.” The challenge lying before us is to develop and test
Bayesian procedures suitable for operational forecasting in meteorology.
1.2 BPO for Binary Predictand
The present article describes the BPO for a binary predictand. This BPO is illustrated by
producing the probability of precipitation (PoP) occurrence forecast. The overall setup for the
illustration is parallel to the operational setup for the Model Output Statistics (MOS) technique
(Glahn and Lowry, 1972) used in operational forecasting by the National Weather Service (NWS).
In the currently deployed AVN-MOS system (Antolik, 2000), the predictors for the MOS forecasting equations are based on output fields from the Global Spectral Model run under the code
name AVN. The performance of the operational AVN-MOS system is the primary benchmark for
evaluation of the performance of the BPO.
The article is organized as follows. Section 2 presents the gist of the Bayesian theory of
forecasting for a binary predictand. Section 3 details the input elements, the forecasting equation,
and the basic properties of the BPO. Section 4 presents a tutorial example of the BPO for PoP using
a single predictor. Section 5 presents a tutorial example of the BPO for PoP using two predictors.
The prototype BPO system is compared and contrasted with the operational MOS system in terms
of the structure of the forecasting equations in Section 6, and in terms of performance on matched
verifications in Section 7. Section 8 summarizes implications of these comparisons and potential
advantages of the BPO.
2. BAYESIAN THEORY
2.1 Variates
Let V be the predictand — a binary variate serving as the indicator of some future event,
such that V = 1 if and only if the event occurs, and V = 0 otherwise; its realization is denoted v,
where v ∈ {0, 1}. Let Xi be the predictor — a variate whose realization xi is used to forecast
V . Let X = (X1 , ..., XI ) be the vector of I predictors; its realization is denoted x = (x1 , ..., xI ).
Each Xi (i = 1, ..., I) is assumed to be a continuous variate — an assumption that simplifies the
presentation but can be relaxed if necessary.
2.2 Samples
Suppose the forecasting problem has already been structured, and the task is to develop the
forecasting equation in a setup similar to that of the MOS technique (Antolik, 2000). In the
examples throughout the article, the event to be forecasted is the occurrence of precipitation (accumulation of at least 0.254 mm of water) in Buffalo, New York, during the 6-h period 1200–1800
UTC, beginning 60 h after the run of the AVN model at 0000 UTC. The predictors are the variates
whose realizations are output from the AVN model. Forecasts are to be made every day in the cool
season (October–March).
Let {v} denote the climatic sample of the predictand. The climatic sample comes from the
database of the National Climatic Data Center (NCDC). This database contains hourly precipitation
observations in Buffalo from over 56 years; however, the record is heterogeneous and must be
processed in order to obtain a homogeneous sample. To avoid this task, only observations recorded
by the Automated Surface Observing System (ASOS) are included in the prior sample. In effect,
it is a 7-year long sample extending from 1 January 1997 through 31 December 2003. Each day
provides one realization. The sample size for the cool season is M = 1132.
Let {(x, v)} denote the joint sample of the predictor vector and the predictand. The joint
sample comes from the database that the Meteorological Development Laboratory (MDL) used
to estimate the operational forecasting equations of the AVN-MOS system. It is a 4-year long
sample extending from 1 April 1997 through 31 March 2001. The sample size for the cool season
is N = 698.
The point of the above example is that typically the joint sample is much shorter than the
climatic sample: N << M. Classical statistical methods, such as the MOS technique, deal with
this sample asymmetry by simply ignoring the long climatic sample. In effect, these methods
ignore vast amounts of information about the predictand. In contrast, the BPO uses both samples;
it extracts information from each sample and then optimally fuses information according to the
laws of probability. (Pooling of samples from different months and stations in order to increase
the sample size is a separate issue.)
2.3 Input Elements
With P denoting the probability and p denoting a generic density function, define the following objects.
g = P (V = 1) is the prior probability of event V = 1; it is to be estimated from the climatic
sample {v}. Probability g quantifies the uncertainty about the predictand V that exists before
the NWP model output is available. Equivalently, it characterizes the natural variability of the
predictand.
fv (x) = p(x|V = v) for v = 0, 1; function fv is the I-variate density function of the predictor
vector X, conditional on the hypothesis that the event is V = v. The two conditional density
functions, f0 and f1 , are to be estimated from the joint sample {(x, v)}. For a fixed realization
X = x, object fv (x) is the likelihood of event V = v.
Thus (f0 , f1 ) comprises the family
of likelihood functions. This family quantifies the stochastic dependence between the predictor
vector X and the predictand V . Equivalently, it characterizes the informativeness of the predictors
with respect to the predictand. (The informativeness is defined in Section 4.3.)
2.4 Theoretic Structure
The probability g and the family of likelihood functions (f0 , f1 ) carry information about the
prior uncertainty and the informativeness of the predictors into the Bayesian revision procedure.
The expected density function κ of the predictor vector X is given by the total probability law:
κ(x) = f0(x)(1 − g) + f1(x)g,   (1)
and the posterior probability π = P (V = 1|X = x) of event V = 1, conditional on a realization
of the predictor vector X = x, is given by Bayes theorem:
π = f1(x)g / κ(x).   (2)
By inserting (1) into (2), one obtains an alternative expression:
π = [1 + ((1 − g)/g) · (f0(x)/f1(x))]⁻¹,   (3)
where (1 − g)/g is the prior odds against event V = 1, and f0 (x)/f1 (x) is the likelihood ratio against event V = 1. Equation (3) defines the theoretic structure of the BPO for a binary
predictand.
Inasmuch as Eqs. (1) and (2) follow directly from the axioms of probability theory (sans
additional assumptions), Eq. (3) is the most general solution for the conditional probability π. In
that sense, it provides the optimal theoretic framework for fusing model output (which supplies a
value of x) with climatic data (which supply a value of g).
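As a numerical illustration of Eq. (3), the posterior probability can be computed from the three ingredients alone; the sketch below uses hypothetical likelihood values (the actual densities f0 and f1 are estimated in Section 4):

```python
def posterior_probability(g, f0_x, f1_x):
    """Posterior probability of V = 1 via Eq. (3).

    g    -- prior (climatic) probability of the event V = 1
    f0_x -- likelihood f0(x) of the hypothesis V = 0 at realization x
    f1_x -- likelihood f1(x) of the hypothesis V = 1 at realization x
    """
    prior_odds_against = (1.0 - g) / g
    likelihood_ratio_against = f0_x / f1_x
    return 1.0 / (1.0 + prior_odds_against * likelihood_ratio_against)

# Hypothetical inputs: the cool-season prior g = 0.27 (Section 4.1) and a
# likelihood ratio f0(x)/f1(x) = 0.5, i.e., the model output favors rain.
pi = posterior_probability(0.27, 0.01, 0.02)   # about 0.425
```

When the likelihood ratio equals one, the predictors carry no information and π reduces to the prior g, exactly as Eq. (3) requires.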
3. META-GAUSSIAN MODEL
To implement the BPO, a flexible and convenient model is needed for each multivariate conditional density function, f0 and f1 . We employ the meta-Gaussian model developed by Kelly and
Krzysztofowicz (1994, 1995, 1997) and used successfully in probabilistic river stage forecasting
(Krzysztofowicz and Herr, 2001; Krzysztofowicz, 2002) and probabilistic rainfall modeling (Herr
and Krzysztofowicz, 2005).
3.1 Input Elements
A multivariate meta-Gaussian distribution is constructed from specified marginal distributions, a correlation matrix, and the Gaussian dependence structure. To obtain expressions for f0
and f1 , this construction must be replicated twice; every element of the construction must be duplicated, with one copy being conditioned on event V = 0 and another copy being conditioned on
event V = 1. Accordingly, the input elements are defined as follows.
fiv (xi ) = p(xi |V = v) for i = 1, ..., I; v = 0, 1. For fixed i ∈ {1, ..., I} and v ∈ {0, 1},
function fiv is the marginal density function of the predictor Xi , conditional on the hypothesis that
the event is V = v. For a fixed realization Xi = xi of the predictor, object fiv (xi ) is the marginal
likelihood of event V = v.
Fiv (xi ) = P (Xi ≤ xi |V = v) for i = 1, ..., I; v = 0, 1.
For fixed i ∈ {1, ..., I} and
v ∈ {0, 1}, function Fiv is the marginal distribution function of the predictor Xi , conditional on
the hypothesis that the event is V = v; this Fiv corresponds to fiv .
γ ijv = Cor(Zi , Zj |V = v) for i = 1, ..., I − 1; j = i + 1, ..., I; v = 0, 1.
This is the
Pearson’s product-moment correlation coefficient between the standard normal predictors Zi and
Zj , conditional on the hypothesis that the event is V = v. The standard normal predictor Zi ,
conditional on event V = v, is obtained from the original predictor Xi , conditional on event
V = v, through the normal quantile transform (NQT):
Zi = Q−1(Fiv(Xi)),   i = 1, ..., I; v = 0, 1,
where Q is the standard normal distribution function, and Q−1 is the inverse of Q. The conditional
correlation coefficients are arranged in two conditional correlation matrices
Γv = [γ ijv ],
v = 0, 1,
whose elements have the following properties: γiiv = 1 for i = 1, ..., I; −1 < γijv < 1 for i ≠ j;
and γijv = γjiv for i, j = 1, ..., I.
It follows that matrix Γv has dimension I × I; is square,
symmetric, and positive definite; and is uniquely determined by its I(I − 1)/2 upper diagonal
elements.
3.2 Forecasting Equation
When each of the two multivariate conditional density functions, f0 and f1 , is meta-Gaussian,
the BPO defined by Eq.(3) takes the following form. Given a prior probability g of event V = 1,
and given a realization x = (x1 , ..., xI ) of the predictor vector, the posterior probability of event
V = 1 is specified by the equation
"
I
Y
fi0 (xi )
1−g
π = 1+
λ(x)
g
f (x )
i=1 i1 i
#−1
,
(4)
where λ is the likelihood ratio weighting function defined by the equation
λ(x) = √(det Γ1 / det Γ0) · exp{−(1/2)[z0ᵀΓ0⁻¹z0 − z0ᵀz0 − z1ᵀΓ1⁻¹z1 + z1ᵀz1]},   (5)
and where the mapping of the vector x = (x1 , ..., xI ) into two vectors z0 = (z10 , ..., zI0 ) and
z1 = (z11 , ..., zI1 ) is defined by the NQT:
ziv = Q−1(Fiv(xi)),   i = 1, ..., I; v = 0, 1.   (6)
In numerical calculations, Q−1 is approximated by a rational function (Abramowitz and Stegun,
1972), which is reproduced in Appendix A.
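For readers who want the forecasting equation without a statistics library, that rational function (formula 26.2.23 of Abramowitz and Stegun, absolute error below 4.5 × 10⁻⁴) can be transcribed directly; this sketch extends it to both tails by symmetry:

```python
import math

def q_inverse(p):
    """Approximate Q−1(p), the standard normal quantile, via the
    rational function 26.2.23 of Abramowitz and Stegun (1972).
    Absolute error < 4.5e-4 for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in the open interval (0, 1)")
    tail = p if p < 0.5 else 1.0 - p          # tail probability <= 0.5
    t = math.sqrt(-2.0 * math.log(tail))
    num = 2.515517 + 0.802853 * t + 0.010328 * t * t
    den = 1.0 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t ** 3
    z = t - num / den                          # upper-tail quantile
    return -z if p < 0.5 else z
```

The BPO needs Q−1 only inside the NQT of Eq. (6), where its argument Fiv(xi) always lies strictly between 0 and 1.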
Equation (4) reveals that the posterior probability is determined by the product of the prior
odds (1 − g)/g against event V = 1, the marginal likelihood ratios fi0 (xi )/fi1 (xi ) against event
V = 1, and the likelihood ratio weight λ(x).
The marginal likelihood ratio function fi0 /fi1
carries information from predictor Xi ; the likelihood ratio weighting function λ accounts for the
conditional dependence among the predictors X1 , ..., XI .
If the predictors X1 , ..., XI are independent, conditional on event V = 0 and conditional
on event V = 1, then each of the two conditional correlation matrices, Γ0 and Γ1 , simplifies
to the identity matrix; consequently λ(x) = 1 at every point x, and the multivariate likelihood
ratio f0 (x)/f1 (x) simplifies to the product of the marginal likelihood ratios fi0 (xi )/fi1 (xi ) for
i = 1, ..., I.
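A compact sketch of Eqs. (4)–(6), assuming NumPy/SciPy and purely hypothetical conditional marginals and correlation matrices (the fitted catalog models of Section 4 would take their place in practice):

```python
import numpy as np
from scipy.stats import norm, weibull_min, gamma

def bpo_posterior(g, x, F0, F1, Gamma0, Gamma1):
    """Posterior probability of V = 1 from Eqs. (4)-(6).

    g              -- prior probability of V = 1
    x              -- realization (x1, ..., xI) of the predictors
    F0, F1         -- lists of frozen marginals Fi0, Fi1 (scipy.stats)
    Gamma0, Gamma1 -- conditional correlation matrices, shape (I, I)
    """
    x = np.asarray(x, dtype=float)
    # Normal quantile transform, Eq. (6): ziv = Q^{-1}(Fiv(xi)).
    z0 = norm.ppf([F.cdf(xi) for F, xi in zip(F0, x)])
    z1 = norm.ppf([F.cdf(xi) for F, xi in zip(F1, x)])
    # Likelihood ratio weighting function, Eq. (5).
    quad = (z0 @ np.linalg.inv(Gamma0) @ z0 - z0 @ z0
            - z1 @ np.linalg.inv(Gamma1) @ z1 + z1 @ z1)
    lam = np.sqrt(np.linalg.det(Gamma1) / np.linalg.det(Gamma0)) * np.exp(-0.5 * quad)
    # Product of the marginal likelihood ratios fi0(xi)/fi1(xi), Eq. (4).
    ratio = np.prod([f0.pdf(xi) / f1.pdf(xi) for f0, f1, xi in zip(F0, F1, x)])
    return 1.0 / (1.0 + ((1.0 - g) / g) * lam * ratio)

# Hypothetical elements for I = 2 predictors (humidity-like, vorticity-like).
F0 = [weibull_min(c=2.0, scale=50.0), gamma(a=2.0, scale=3.0)]
F1 = [weibull_min(c=4.0, scale=85.0), gamma(a=3.0, scale=4.0)]
G0 = np.array([[1.0, 0.3], [0.3, 1.0]])
G1 = np.array([[1.0, 0.5], [0.5, 1.0]])
pi = bpo_posterior(0.27, [80.0, 10.0], F0, F1, G0, G1)
```

Setting Γ0 = Γ1 = I makes λ(x) = 1 and recovers the conditional-independence product form described above.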
3.3 Basic Properties
The meta-Gaussian model for f0 and f1 , which is embedded in Eqs. (4) – (6), offers these
properties.
1. The marginal conditional distribution function Fiv of predictor Xi may take any form;
this form may be different for each i ∈ {1, ..., I} and each v ∈ {0, 1}; the marginal conditional
density function fiv is simply derived from Fiv .
2. The two transforms (the NQTs) for each predictor Xi are uniquely specified once its
marginal conditional distribution functions, Fi0 and Fi1 , have been estimated.
3. The conditional dependence structure among the predictors X1 , ..., XI is pairwise; the
degree of dependence is quantified by the conditional correlation matrix Γv .
4. The conditional dependence structure between any two predictors Xi and Xj , i ≠ j, may
be non-linear (in the conditional mean) and heteroscedastic (in the conditional variance).
5. The probabilistic forecast (the posterior probability π) is given by an analytic expression.
Properties 1 and 4 imply the flexibility in fitting the model to data — an attribute necessary
to produce forecasts that are well calibrated and most informative. Properties 2 and 3 imply the
simplicity of estimation. Property 5 implies the computational efficiency — an important attribute
for operational forecasting.
3.4 Model Validation
The meta-Gaussian model can be validated on a given joint sample — an advantage because
one can gain additional insight into the BPO and the data. First, each marginal conditional distribution function Fiv should be tested for the goodness of fit to the empirical distribution function of
Xi , conditional on V = v. Second, the two conditional dependence structures should be validated
based on the following fact (Kelly and Krzysztofowicz, 1997): conditional on V = v (v = 0, 1),
the joint distribution of X1 , ..., XI is meta-Gaussian if and only if the joint distribution of Z1 , ..., ZI
is Gaussian. The NQT guarantees that the marginal distribution of each Zi is standard normal.
Therefore, the validation amounts to testing the hypothesis that the distribution of each pair (Zi ,
Zj ), for i = 1, ..., I − 1 and j = i + 1, ..., I, is bivariate standard normal. This test can be broken
down into three tests of the following requirements:
1. Linearity — the regression of Zi on Zj must be linear.
2. Homoscedasticity — the variance of the residual Θij = Zi − γ ijv Zj must be independent
of Zj .
3. Normality — the distribution of Θij must be normal with mean 0 and variance 1 − γijv².
Inasmuch as these are requirements of a linear model, testing procedures are well known.
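As an illustration (not part of the paper), the three requirements can be checked with routine statistics; the sketch below assumes NumPy/SciPy and a pair (Zi, Zj) already produced by the NQT, and uses simple surrogates for the formal tests (a residual correlation for linearity, a split-sample variance ratio for homoscedasticity, and a Kolmogorov–Smirnov test for normality):

```python
import numpy as np
from scipy import stats

def validate_pair(zi, zj):
    """Diagnostics for bivariate standard normality of an NQT pair
    (Zi, Zj), following the three requirements of Section 3.4."""
    zi, zj = np.asarray(zi), np.asarray(zj)
    gamma = np.corrcoef(zi, zj)[0, 1]        # estimate of gamma_ijv
    theta = zi - gamma * zj                  # residual Theta_ij
    # 1. Linearity: the residual should be uncorrelated with Zj.
    linearity = abs(np.corrcoef(theta, zj)[0, 1])
    # 2. Homoscedasticity: residual variance stable across the range of Zj.
    low, high = theta[zj < np.median(zj)], theta[zj >= np.median(zj)]
    variance_ratio = np.var(low) / np.var(high)
    # 3. Normality: Theta ~ N(0, 1 - gamma^2).
    _, p_normality = stats.kstest(theta, "norm",
                                  args=(0.0, np.sqrt(1.0 - gamma ** 2)))
    return linearity, variance_ratio, p_normality

# Synthetic data that satisfy the meta-Gaussian hypothesis exactly.
rng = np.random.default_rng(1)
zj = rng.standard_normal(2000)
zi = 0.6 * zj + np.sqrt(1.0 - 0.36) * rng.standard_normal(2000)
linearity, variance_ratio, p_normality = validate_pair(zi, zj)
```

For a pair that violates the hypothesis (e.g., a heteroscedastic dependence), the variance ratio departs from one or the normality test rejects, flagging the need for a different model.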
4. EXAMPLE WITH ONE PREDICTOR
When there is only one predictor, the BPO is given by Eq.(3), with the vector x being replaced
by the variable x which denotes a realization of predictor X. In effect, three elements are needed
for forecasting: a prior probability g, and two univariate conditional density functions, f0 and f1 .
4.1 Prior Probability
The prior probability g is estimated for the cool season from the climatic sample. It is
g = 0.27, the value to be used for forecasting every day during the cool season. In general, when
the climatic sample is large, the prior probability g could be estimated for a subseason, even for
a day, by applying a moving window to the climatic sample. For instance, Table 1 shows that
g varies from month to month. Thus, using a given g every day during a month and changing
g from month to month would improve the calibration of the forecast probability within each
month. (This statement is justified because the expectation of the posterior probability equals the
prior probability.) Despite this potential advantage, all examples reported herein use g for the
cool season because this parallels the setup of the operational MOS system (whose equations are
estimated for the cool season) and because the available validation sample (from 2 1/2 years) is too
short to verify the calibration of forecasts for each month.
Overall, the prior probability has four attributes important for application.
(i) It may be
location-specific and season-specific (or day-specific) and thereby can capture the “micro-climate”.
(ii) It may be estimated from a large climatic sample. (iii) It is independent of the choice of the
predictors and the length of the NWP model output available for estimation (the size of the joint
sample). (iv) It need not be re-estimated when the NWP model changes; thus it ensures a stable
calibration of the forecast probabilities for as long as the climate remains stationary.
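Estimating g is simply a matter of relative frequency in the climatic sample {v}; the count below is illustrative (chosen to reproduce the reported cool-season value g ≈ 0.27 with M = 1132), not the actual NCDC tally:

```python
def prior_probability(v_sample):
    """Relative-frequency estimate of g = P(V = 1) from the climatic
    sample {v} of binary precipitation indicators."""
    return sum(v_sample) / len(v_sample)

# Illustrative count: 306 wet days among M = 1132 cool-season days.
g = prior_probability([1] * 306 + [0] * (1132 - 306))   # about 0.27
```

A monthly or daily g, as discussed above, would apply the same estimator to a moving window of the climatic sample.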
4.2 Conditional Density Functions
The single predictor X is the mean relative humidity of a variable depth layer from sigma
1.0 to sigma 0.44 at 60 h after the 0000 UTC model run (for short, mean relative humidity at 60
h). Its two conditional density functions, f0 and f1 , are for the cool season; they are derived from
the corresponding conditional distribution functions, F0 and F1 . The procedure for modeling and
estimation of F0 and F1 is as follows.
1. The joint sample {(x, v)} of 698 realizations is stratified into two subsamples: {(x, 0)}
containing 518 realizations and {(x, 1)} containing 180 realizations.
2. From each subsample, an empirical distribution function of X is constructed (Fig. 1).
3. A parametric model for Fv is chosen, its parameters are estimated, and its goodness-of-fit to
the empirical distribution function is evaluated. Here, both F0 and F1 are ratio type II log-Weibull;
each is defined on the interval (0, 100) and is specified by two parameters (Fig. 1).
The above procedure has been automated by creating a catalog of parametric models and by
developing algorithms for estimation of the parameters and choice of the best model. The catalog
includes expressions for the distribution functions and for the density functions. In effect, once a
parametric model for Fv is chosen, the expression for fv is known. Fig. 2 shows f0 and f1 .
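The stratify-and-fit procedure can be sketched with SciPy; since the ratio type II log-Weibull of the catalog is not available in scipy.stats, a plain Weibull stands in here, and the subsamples are synthetic rather than the AVN–MDL data:

```python
import numpy as np
from scipy import stats

def fit_conditional_cdf(x_sub, model=stats.weibull_min):
    """Fit a parametric Fv to one stratified subsample and report a
    Kolmogorov-Smirnov goodness-of-fit p-value (steps 2-3, Section 4.2)."""
    params = model.fit(x_sub, floc=0.0)      # estimate the parameters
    frozen = model(*params)                  # frozen Fv; fv is frozen.pdf
    _, p_value = stats.kstest(x_sub, frozen.cdf)
    return frozen, p_value

# Step 1 (stratification) is emulated with synthetic humidity-like data:
# 518 realizations conditional on V = 0 and 180 conditional on V = 1.
rng = np.random.default_rng(7)
x0 = 50.0 * rng.weibull(2.0, 518)            # drier days, {(x, 0)}
x1 = 85.0 * rng.weibull(4.0, 180)            # moister days, {(x, 1)}
F0, p0 = fit_conditional_cdf(x0)
F1, p1 = fit_conditional_cdf(x1)
```

Once the frozen F0 and F1 are in hand, f0 and f1 come for free as the corresponding density functions, mirroring the catalog-based automation described above.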
4.3 Informativeness of Predictor
A predictor X used in the BPO is characterized in terms of its informativeness. Intuitively, the
informativeness of predictor X may be visualized by judging the degree of separation between the
two conditional density functions, f0 and f1 , shown in Fig. 2: the larger the separation, the more
informative the predictor. Formally, the informativeness of predictor X is characterized by the
Receiver Operating Characteristic (ROC) — a graph of the probability of detection P (D|x) versus
the probability of false alarm P (F |x) for all x. When the likelihood ratio L(x) = f0 (x)/f1 (x)
is a strictly monotonic function of x, the ROC may be constructed directly from the conditional
distribution functions F0 and F1 : if L = f0 /f1 is strictly increasing, then
P(D|x) = F1(x),   P(F|x) = F0(x);   (7a)
if L = f0 /f1 is strictly decreasing, then
P(D|x) = 1 − F1(x),   P(F|x) = 1 − F0(x).   (7b)
Given f0 and f1 shown in Fig. 2, Eq. (7b) holds; the resultant ROC is shown in Fig. 3.
Clearly, the mean relative humidity X is an informative predictor of the precipitation occurrence
indicator V , as the ROC lies decisively above the diagonal line (which characterizes an uninformative predictor); but X is far from being a perfect predictor of V , as the ROC passes far from the
upper left corner of the graph (which characterizes a perfect predictor).
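Given parametric F0 and F1, tracing the ROC of Eq. (7) is a one-line sweep over x; the sketch below uses hypothetical Gaussian stand-ins for the two conditionals (for a humidity-like predictor the likelihood ratio f0/f1 is decreasing, so Eq. (7b) applies):

```python
import numpy as np
from scipy.stats import norm

def roc_curve(F0, F1, xs, ratio_decreasing=True):
    """ROC points (P(F|x), P(D|x)) from the conditional distribution
    functions F0 and F1, per Eqs. (7a)-(7b)."""
    if ratio_decreasing:                  # L = f0/f1 strictly decreasing
        pod, pofa = 1.0 - F1.cdf(xs), 1.0 - F0.cdf(xs)
    else:                                 # L = f0/f1 strictly increasing
        pod, pofa = F1.cdf(xs), F0.cdf(xs)
    return pofa, pod

# Hypothetical conditionals: X | V=0 ~ N(45, 20^2), X | V=1 ~ N(75, 15^2).
F0, F1 = norm(45.0, 20.0), norm(75.0, 15.0)
xs = np.linspace(0.0, 120.0, 241)
pofa, pod = roc_curve(F0, F1, xs)
```

An informative predictor keeps the curve above the diagonal P(D|x) = P(F|x); the area under the curve summarizes the separation between f0 and f1.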
When there are two or more alternative predictors, they can be compared (and possibly
ranked) in terms of a binary relation of informativeness. This relation derives from the Bayesian
theory of sufficient comparisons, the essence of which is as follows (Blackwell, 1951, 1953;
Krzysztofowicz and Long, 1990, 1991). Let Xi and Xj be two alternative predictors of V . Suppose a rational decision maker will use the probabilistic forecast of V from the BPO in a Bayesian
decision procedure with a prior probability g and a loss function l. Let VAi (g, l) denote the value
of a probabilistic forecast generated by predictor Xi (as defined in the Bayesian theory).
Definition. Predictor Xi is said to be more informative than predictor Xj if and only if the
value of a forecast generated by predictor Xi is at least as high as the value of a forecast generated
by predictor Xj , for every prior probability g and every loss function l; formally, if and only if
VAi(g, l) ≥ VAj(g, l),   for every g, l.
Inasmuch as any two rational decision makers may employ different prior probabilities and
different loss functions, the condition “for every g, l” is synonymous with the statement “for every
rational decision maker”. Blackwell (1953) proved the following.
Theorem. Predictor Xi is more informative than predictor Xj if and only if the ROC of Xi
is superior to the ROC of Xj .
The binary relation of informativeness establishes a quasi order on a set of predictors X1 , ..., XI .
The quasi order is reflexive and transitive, but is not complete. That is, there may exist two predictors such that neither is more informative than the other, which is the case when one ROC crosses
the other. Also there may exist two predictors that are equally informative, which is the case when
one ROC is identical to the other.
In summary, an advantage of the BPO is that its elements, F0 and F1 , enable us to characterize the informativeness of a predictor for a given predictand. When two or more predictors
are available, they can easily be compared (and possibly ranked) in terms of the informativeness
relation.
4.4 Posterior Probability
Once the three elements (g, f0 , f1 ) are specified, the posterior probability π of precipitation
occurrence may be calculated from Eq. (3), given any value x of the mean relative humidity output
from the AVN model. Figure 4 shows the plot of π versus x for three values of g.
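Equation (3) lies outside this excerpt; for a binary predictand it is Bayes' theorem, π(x) = g·f1(x) / [g·f1(x) + (1 − g)·f0(x)], equivalently π = 1/[1 + ((1 − g)/g)·L(x)] with the likelihood ratio L = f0/f1 of Section 4.7. The following sketch illustrates the computation with hypothetical equal-variance Gaussian densities standing in for the fitted f0, f1 (the paper's fitted densities for this predictor are not reproduced here):

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior_prob(x, g, f0, f1):
    # Bayes' theorem for a binary predictand:
    # pi = g*f1(x) / (g*f1(x) + (1-g)*f0(x)) = 1 / (1 + ((1-g)/g)*L(x)),
    # where L = f0/f1 is the likelihood ratio of Section 4.7.
    num = g * f1(x)
    return num / (num + (1.0 - g) * f0(x))

# Hypothetical conditional densities of a humidity-like predictor:
f0 = lambda x: normal_pdf(x, 55.0, 12.0)   # given V = 0 (no precipitation)
f1 = lambda x: normal_pdf(x, 85.0, 12.0)   # given V = 1 (precipitation)

for g in (0.23, 0.27, 0.29):               # monthly priors from Table 1
    print(g, [round(posterior_prob(x, g, f0, f1), 3) for x in (50.0, 70.0, 90.0)])
```

As in Fig. 4, π increases with x for every g, and the prior g rescales the curve nonlinearly; at the point where f0(x) = f1(x), the posterior equals the prior exactly.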
Regardless of the value of g, the posterior probability π is an increasing, non-linear
function of x. The basic shape of this function is determined by the conditional density functions
f0 , f1 . The prior probability g scales (nonlinearly) this basic shape. This has two practical implications. First, it illustrates the assertion made in Section 4.1 that the role of the prior probability g
is to calibrate the forecast probability. Thus by estimating g from a large climatic sample (and by
properly modeling and estimating f0 and f1 ), the meteorologist can ensure the necessary condition
for the forecast probability to be well calibrated against the climatic probability of precipitation
occurrence at a specific location and within a specific season. Second, even though the conditional
density functions f0 , f1 remain fixed during a season (here the cool season), the prior probability g
may change from month to month (or some other subseason). Consequently, the posterior probability π can be calibrated against the climatic probability for each month (or subseason) rather than
for the 6-month long cool season.
4.5 Another Predictor
Different predictors behave differently. That is why each predictor should be modeled individually, and the catalog of parametric models from which the conditional distribution functions
F0 , F1 are drawn should be large enough to afford flexibility. To underscore this point, let us model
another predictor: the relative vorticity on the isobaric surface of 850 hPa at 63 h after the 0000
UTC model run (for short, 850 hPa relative vorticity at 63 h). Figure 5 shows the empirical conditional distribution functions and the parametric conditional distribution functions F0 , F1 ; here, F0
is Weibull and F1 is log-logistic; each is defined on the interval (−5, ∞) and is specified by two
parameters. Figure 6 shows the conditional density functions f0 , f1 . Figure 7 shows the plot of π
versus x for three values of g. Clearly, this predictor behaves differently than the previous one.
A comparison of the ROCs (Fig. 3) reveals that the mean relative humidity at 60 h is approximately more informative than the 850 hPa relative vorticity at 63 h for predicting precipitation
occurrence during the period 60–66 h. (The adverb “approximately” is inserted because one ROC
crosses the other near the left end.) Would a combination of the two predictors be more informative
than either predictor alone? This question is answered at the end of Section 5.
4.6 Binary-Continuous Predictor
Another informative predictor X of precipitation occurrence is an estimate of the total precipitation amount during a specified period, output from the NWP model with a specified lead time.
Typically, X is a binary-continuous variate: it takes on the value zero on some days, and positive
values on other days. Thus, the sample space of X is the interval [0, ∞), and the probability distribution of X assigns a nonzero probability to the event X = 0 and spreads the complementary
probability over the interval (0, ∞) according to some density function.
When the probability of event X = 0 is small, X may be modeled approximately as a
continuous variate. When the probability of event X = 0 is large, X should be modeled as a
binary-continuous variate in order to extract from it all information. The BPO can be suitably
modified to incorporate a binary-continuous predictor, alone or in combination with other continuous predictors. The case with a single binary-continuous predictor is described by Maranzano and
Krzysztofowicz (2004).
4.7 Monotonicity of Likelihood Ratio
When there exists a physical or a logical requirement for the posterior probability π to be a
monotone function of the predictor value x, as is the case in Figs. 4 and 7, this requirement can
be enforced via the likelihood ratio function L = f0 /f1 . As may be inferred from Eq. (3), if
L(x) decreases with x, then π increases with x; if L(x) increases with x, then π decreases with
x. A monotonicity requirement may not be satisfied automatically by L simply because f0 and f1
are obtained without any constraint on their ratio f0 /f1 . Thus, when a monotonicity requirement
exists, it is necessary to check that L satisfies it. Algorithms have been developed to perform this
checking and to force a monotonicity requirement on L.
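The checking half of this procedure can be sketched directly: evaluate L = f0/f1 on a grid and test for monotonicity. (The paper's enforcement algorithms are not reproduced; the densities below are the same hypothetical Gaussians used for illustration, not the fitted ones.)

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio_is_monotone(f0, f1, grid):
    # Evaluate L = f0/f1 on a grid and test whether it is non-increasing
    # or non-decreasing; this is only the "checking" half of Section 4.7.
    L = [f0(x) / f1(x) for x in grid]
    non_inc = all(b <= a for a, b in zip(L, L[1:]))
    non_dec = all(b >= a for a, b in zip(L, L[1:]))
    return non_inc or non_dec

grid = [40.0 + 5.0 * i for i in range(37)]          # 40 to 220
f0 = lambda x: normal_pdf(x, 55.0, 12.0)            # hypothetical densities
f1 = lambda x: normal_pdf(x, 85.0, 12.0)            # equal variances
f1b = lambda x: normal_pdf(x, 85.0, 8.0)            # unequal variances
print(likelihood_ratio_is_monotone(f0, f1, grid))   # True: log L is linear
print(likelihood_ratio_is_monotone(f0, f1b, grid))  # False: L turns near x = 109
```

The second case shows why the check is necessary: with unequal variances, f0 and f1 estimated without a joint constraint yield a ratio that is not monotone over the whole sample space.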
5. EXAMPLE WITH TWO PREDICTORS
5.1 Conditional Correlation Coefficients
Let X1 denote the mean relative humidity predictor analyzed in Section 4.2, and let X2 denote
the relative vorticity predictor analyzed in Section 4.5. The analyses of individual predictors supply the conditional distribution functions (F10 , F11 ; F20 , F21 ) and the conditional density functions
(f10 , f11 ; f20 , f21 ). In order to obtain the BPO with two predictors (X1 , X2 ), it is necessary to
estimate two conditional correlation coefficients (γ 120 , γ 121 ) from which the two conditional correlation matrices, Γ0 and Γ1 , are constructed. The estimation procedure, applicable to any number
of predictors I ≥ 2, is as follows.
1. The joint sample {(x1 , ..., xI , v)} is stratified into two conditional joint samples {(x1 , ..., xI , 0)}
and {(x1 , ..., xI , 1)} according to the value of the precipitation indicator v. Every step that follows
is performed twice, for v = 0 and v = 1.
2. Each conditional joint realization (x1 , ..., xI , v) is processed through the NQT
ziv = Q^{-1}(Fiv(xi)),   i = 1, ..., I,
to obtain a transformed conditional joint realization (z1v , ..., zIv ).
3. The transformed conditional joint sample {(z1v , ..., zIv )} is used to estimate the conditional
Pearson’s product-moment correlation coefficients γ ijv for i = 1, ..., I − 1; j = i + 1, ..., I.
When applied to the joint sample at hand, the above estimation procedure yields γ 120 = 0.577
and γ 121 = 0.596. Thereby all input elements have been estimated, and the BPO is ready for
forecasting.
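The three-step estimation procedure can be sketched as follows. As a stand-in for the fitted parametric distribution functions Fiv, the sketch uses empirical plotting positions rank/(n+1) in the NQT (an assumption; ties are ignored for brevity):

```python
import statistics

def pearson(a, b):
    # Pearson's product-moment correlation coefficient.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def nqt(values):
    # Step 2: normal quantile transform z = Q^{-1}(F(x)); here F is replaced
    # by empirical plotting positions rank/(n+1), a stand-in for the fitted
    # parametric F_iv of the paper.
    n = len(values)
    rank = {x: r for r, x in enumerate(sorted(values), start=1)}
    return [statistics.NormalDist().inv_cdf(rank[x] / (n + 1)) for x in values]

def conditional_correlation(joint_sample, i, j, v):
    # Step 1: stratify by the precipitation indicator v (last component);
    # Steps 2-3: transform each predictor, then correlate.
    stratum = [rec for rec in joint_sample if rec[-1] == v]
    zi = nqt([rec[i] for rec in stratum])
    zj = nqt([rec[j] for rec in stratum])
    return pearson(zi, zj)

# Synthetic stratum in which X2 is a monotone function of X1:
sample = [(x, x * x, 1) for x in (3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.3)]
print(round(conditional_correlation(sample, 0, 1, 1), 3))   # 1.0 after the NQT
```

The example illustrates a key property of the NQT: any strictly monotone dependence between two predictors becomes exactly linear in the transformed space, so Pearson's coefficient captures it fully.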
5.2 Conditional Dependence Measures
Under the I-variate meta-Gaussian density function fv , the parameter γ ijv characterizes the
stochastic dependence between Xi and Xj , conditional on the hypothesized precipitation event
V = v. For the purpose of interpretation, γ ijv may be transformed into the Spearman’s rank
correlation coefficient ρijv between Xi and Xj , conditional on the hypothesized precipitation event
V = v. The transformation is given by (Kelly and Krzysztofowicz, 1997)
ρijv = (6/π) arcsin(γijv/2).   (8)
In the present example, ρ120 = 0.559 and ρ121 = 0.578.
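Equation (8) is a one-line transcription; applied to the estimates of Section 5.1 it reproduces the values quoted above:

```python
import math

def spearman_from_metagaussian(gamma):
    # Eq. (8): Spearman's rank correlation implied by the meta-Gaussian
    # correlation parameter gamma (Kelly and Krzysztofowicz, 1997).
    return (6.0 / math.pi) * math.asin(gamma / 2.0)

print(round(spearman_from_metagaussian(0.577), 3))  # 0.559, as in Section 5.2
print(round(spearman_from_metagaussian(0.596), 3))  # 0.578
```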
From the estimates of γ 120 and γ 121 (or ρ120 and ρ121 ) one can infer that the mean relative
humidity X1 and the 850 hPa relative vorticity X2 are stochastically dependent, conditional on the
predictand V , and that the degree of dependence is somewhat stronger when precipitation occurs,
V = 1, than when precipitation does not occur, V = 0.
5.3 Conditional Dependence Structures
The purpose of the NQT is to transform a given dependence structure of the predictors into the
Gaussian dependence structure. To learn the dependence structure and to judge the performance
of the NQT, scatterplots of the conditional joint samples are examined. There are two scatterplots
of the original sample points (x1 , x2 ), conditional on V = 0 and V = 1 (Figs. 8a and 8b). Each
exhibits a non-Gaussian dependence structure: the scatters are not elliptic, especially the one conditional on V = 1, and the right-most points form a vertical frontier — an implication of X1 being
bounded above by 100%. Likewise, there are two scatterplots of the transformed sample points
(z1v , z2v ), for v = 0 and v = 1 (Figs. 8c and 8d). In each case, the scatter is elliptic, and the
hypothesis of the Gaussian dependence structure cannot be refuted. Thus the NQT performs well.
When the number of predictors I > 2, the analysis of the scatterplots should be performed
for every pair of variates (Xi , Xj ), i = 1, ..., I − 1; j = i + 1, ..., I. Pairwise analyses are sufficient
to validate the I-variate meta-Gaussian dependence structure.
5.4 Second Example
The event to be forecasted is the occurrence of precipitation in Quillayute, Washington, during
the 24-h period 1200–1200 UTC, beginning 36 h after the 0000 UTC model run in the warm season
(April–September). Let X1 denote the relative humidity on the isobaric surface of 850 hPa at 36
h, estimated by the AVN model. Let X2 denote the relative vorticity on the isobaric surface of 850
hPa at 36 h, estimated by the AVN model. The scatterplots are shown in Fig. 9.
As in the first example, the NQT performs well: each of the two non-Gaussian dependence
structures of the original sample points (especially the one in Fig. 9b) is transformed into the
Gaussian dependence structure. What makes this example different from the previous one is the
vastly different degrees of conditional dependence: X1 and X2 are (approximately) independent
(ρ120 = 0.011), conditional on precipitation nonoccurrence, V = 0; X1 and X2 are positively
dependent (ρ121 = 0.358), conditional on precipitation occurrence, V = 1.
The BPO takes the two conditional correlation coefficients explicitly into account, but non-Bayesian techniques (such as MOS regression and logistic regression) fail to do so. When ρ120 and
ρ121 are significantly different, this may be one of the reasons for the superior performance of the
BPO.
5.5 Predictors Selection
For every predictand, 34 potential predictors are defined by appropriately concatenating five
variables (total precipitation amount, mean relative humidity, relative vorticity, relative humidity,
and vertical velocity), three lead times, and four isobaric surfaces. From this set, the best combination of no more than five predictors is selected. The selection is accomplished via an algorithm that
(i) maximizes RS (the area under the ROC, defined in Section 7.1) subject to the constraint that an
additional predictor must increase RS by at least a specified threshold, (ii) employs objective op-
timization and heuristic search, and (iii) estimates the parameters of the BPO and the performance
score RS from a given joint sample (an estimation sample — here from 4 years).
In the examples for Buffalo with one predictor, X1 (mean relative humidity at 60 h) or X2
(850 hPa relative vorticity at 63 h), and with two predictors, X1 and X2 , the scores are as follows:
RS(X1 ) = 0.818,
RS(X2 ) = 0.742,
RS(X1 , X2 ) = 0.825.
Although the combination of two predictors (X1 , X2 ) outperforms each of the single predictors,
X1 and X2 , the gain is below a threshold of significance. Thus, given only these two potential
predictors, it is best to select the single predictor X1 .
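The selection rule of Section 5.5 can be sketched as a greedy forward search. The scoring function and the significance threshold are assumptions for illustration (the paper's objective optimization and heuristic search, and its actual threshold, are not reproduced); the RS values below are the Buffalo scores quoted above:

```python
def select_predictors(candidates, score, threshold, max_predictors=5):
    # Greedy forward selection: at each step add the predictor that most
    # increases RS, stopping when the gain falls below `threshold` or when
    # `max_predictors` have been selected.
    selected = []
    best = 0.5                      # RS of an uninformative system
    while len(selected) < max_predictors:
        gains = [(score(tuple(selected + [c])), c)
                 for c in candidates if c not in selected]
        if not gains:
            break
        new_best, pick = max(gains)
        if new_best - best < threshold:
            break
        selected.append(pick)
        best = new_best
    return selected, best

# RS values from the Buffalo example of Section 5.5:
rs = {("X1",): 0.818, ("X2",): 0.742, ("X1", "X2"): 0.825, ("X2", "X1"): 0.825}
sel, best = select_predictors(["X1", "X2"], lambda t: rs[t], threshold=0.01)
print(sel, best)   # ['X1'] 0.818 -- the 0.007 gain from adding X2 is below threshold
```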
6. MOS SYSTEM
6.1 Forecasting Equation
The primary benchmark for evaluation of the BPO is the MOS system (Glahn and Lowry,
1972; Antolik, 2000) currently used in operational forecasting by the NWS. For a binary predictand, the MOS forecasting equation has the general form
π = a0 + Σ_{i=1}^{I} ai ti(xi),   (9)
where ti is some transform determined experientially for each predictor Xi (i = 1, ..., I), and
a0 , a1 , ..., aI are regression coefficients. The predictand and the predictors are defined at a station.
For the predictand defined in Section 2.2, the MOS utilizes five predictors:
1. Total precipitation amount during 6-h period, 60–66 h; cutoff 2.54 mm.
2. Total precipitation amount during 3-h period, 60–63 h; cutoff 0.254 mm.
3. Relative humidity at the pressure level of 700 hPa at 66 h; cutoff 70%.
4. Relative humidity at the pressure level of 850 hPa at 60 h; cutoff 90%.
5. Vertical velocity at the pressure level of 850 hPa at 57 h; cutoff –0.2.
6.2 Grid-Binary Transform
In some cases, a predictor enters Eq. (9) untransformed, i.e., ti (xi ) = xi . In the present
case, each predictor is subjected to a grid-binary transformation, which is specified in terms of a
heuristic algorithm (Jensenius, 1992). The algorithm takes the gridded field of predictor values and
performs on it three operations: (i) mapping of each gridpoint value into "1" or "0", which indicates
the exceedance or nonexceedance of a specified cutoff level; (ii) smoothing of the resultant binary
field; and (iii) interpolation of the gridpoint values to the value ti (xi ) at a station. It follows that
the transformed predictor value ti (xi ) at a station depends upon the original predictor values at
all grid points in a vicinity. Thus when viewed as a transform of the original predictor Xi into a
grid-binary predictor ti (Xi ) at a fixed station, the transform ti is nonlinear and nonstationary (from
one forecast time to the next). The grid-binary predictor ti (Xi ) is dimensionless and its sample
space is the closed unit interval [0,1].
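Section 6.2 names the three operations but not their operational details; the sketch below is only a schematic stand-in for the algorithm of Jensenius (1992). The 3x3 mean smoother and the nearest-gridpoint "interpolation" are assumptions made for illustration:

```python
def grid_binary_value(field, cutoff, station_rc):
    # Schematic of the three operations of Section 6.2:
    # (i) map each gridpoint value to 1.0 (exceedance) or 0.0 (nonexceedance);
    # (ii) smooth the binary field, here with a 3x3 mean filter (assumption);
    # (iii) take the smoothed value at the station's gridpoint (assumption)
    #       as t_i(x_i), which lies in [0, 1].
    nr, nc = len(field), len(field[0])
    binary = [[1.0 if v >= cutoff else 0.0 for v in row] for row in field]
    def smooth(r, c):
        cells = [binary[rr][cc]
                 for rr in range(max(0, r - 1), min(nr, r + 2))
                 for cc in range(max(0, c - 1), min(nc, c + 2))]
        return sum(cells) / len(cells)
    r, c = station_rc
    return smooth(r, c)

# Hypothetical 3x3 relative-humidity field, cutoff 70%:
rh = [[60.0, 75.0, 80.0],
      [65.0, 72.0, 78.0],
      [55.0, 68.0, 74.0]]
print(grid_binary_value(rh, 70.0, (1, 1)))   # 5/9: five of nine cells exceed 70
```

Even this crude sketch exhibits the property noted in the text: the station value depends on the field in a vicinity, not on the station's gridpoint alone.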
6.3 Estimation
The regression coefficients in Eq. (9) are estimated from a joint sample {(t1 (x1 ), ..., tI (xI ), v)}
of realizations of the transformed predictors and the predictand. Like the sample for the BPO, this
sample includes all daily realizations in the cool season (October – March) in 4 years. Unlike
the sample for the BPO, this sample includes not only the realizations at the Buffalo station, but
the realizations at all stations within the region to which Buffalo belongs. The pooling of station
samples into a regional sample is needed to ensure a “stable” estimation of the MOS regression
coefficients (Antolik, 2000). The estimates obtained by the MDL are:
a0 = 0.23806, a1 = 0.33791, a2 = 0.10016,
a3 = 0.15049, a4 = 0.15344, a5 = −0.21371.
These estimates are assumed to be valid for every station within the region.
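With these coefficients, evaluating Eq. (9) is a dot product over the grid-binary predictor values:

```python
def mos_pop(a, t):
    # Eq. (9): pi = a0 + sum_i a_i * t_i(x_i), with the grid-binary
    # predictor values t_i in [0, 1].
    return a[0] + sum(ai * ti for ai, ti in zip(a[1:], t))

# MDL estimates quoted in Section 6.3 (a0, a1, ..., a5):
a = [0.23806, 0.33791, 0.10016, 0.15049, 0.15344, -0.21371]
print(mos_pop(a, [0.0] * 5))   # all predictors at 0: pi = a0 = 0.23806
print(mos_pop(a, [1.0] * 5))   # all predictors at 1
```

Note that, unlike the Bayesian posterior probability, a linear regression equation of this form is not constrained by its structure to remain within [0, 1].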
6.4 Predictors Selection
For every predictand, there are about 176 potential predictors. The main reason this number is about five times larger than the 34 in the BPO is that the MOS employs grid-binary predictors:
for each variable there are several cutoff levels, each of which generates a new predictor. The
best predictors are selected sequentially according to the maximum variance reduction criterion of
linear regression and the stopping criterion whereby an additional predictor must reduce variance
by at least a specified threshold. Up to 15 predictors can be selected.
7. COMPARISON OF BPO WITH MOS
7.1 System Versus Technique
There is a fundamental distinction between a forecasting technique and a forecasting system,
which for our purposes is this. A forecasting technique is essentially a forecasting equation with a
generic statistical interpretation, Eqs. (4)–(6) for BPO and Eq. (9) for MOS. A forecasting system
is a conjunction of a forecasting technique and a processing software that an organization employs
to process real-time data into operational forecasts. For instance, any comparison involving the
MOS technique, as defined by Eq. (9) but outside its processing software, would be a sterile experiment, unrepresentative of the actual MOS system of the NWS. For, as explained in Section 6.2,
the grid-binary transformations are an intrinsic, though often overlooked, part of that system: they
require processing of the entire gridded fields of model outputs, they cannot be reproduced except
through software, and they cannot be executed on data from an isolated station or an isolated grid
point at which comparison of techniques might be undertaken.
Whereas it is of scientific interest to compare the BPO technique against the MOS technique
and other traditional statistical techniques — several such comparisons have already been performed and will be reported in future publications — it is far more important to mission-oriented
agencies to compare the prototype BPO system with the operational MOS system. In his review,
C. Doswell concurred: “... it probably would be revealing to compare forecasts generated by the
BPO method against the real operational MOS ... it would be a more convincing ‘yardstick’ for
comparison and contrast”.
7.2 Performance Measures
It is apparent that each system, BPO and MOS, processes information in a totally different
manner. The objective of the following experiment is to compare the two systems with respect to
the efficiency of extracting the predictive information from the same data record — the archive of
the AVN model output. Towards this end, two comparative verifications of forecasts are performed
based on two input samples: (i) the estimation joint sample {(x, v)} from 4 years (April 1997 –
March 2001); this is the same joint sample that was used for estimation of the family of likelihood
functions (f0 , f1 ) of the BPO; and (ii) the validation joint sample {(x, v)} from 2 1/2 years (April
2001 – September 2003); this joint sample is used solely for validation.
Given an input sample (either the estimation sample or the validation sample), each system
(BPO and MOS) is used to calculate the forecast probability π based on every realization of its
predictors. (The MOS forecasts calculated from the validation sample are actually the operational
AVN-MOS forecasts produced by the NWS during 2 1/2 years; we simply re-calculated them.)
Then the joint sample {(π, v)} of realizations of the forecast probability and the predictand is used
to calculate the following performance measures.
The calibration function (CF) — a graph of the conditional probability η(π) = P (V = 1|Π =
π) versus the forecast probability π.
The receiver operating characteristic (ROC) — a graph of the probability of detection versus
the probability of false alarm.
The calibration score (CS) — the Euclidean distance (the square root of the expected quadratic
difference) between the line of perfect calibration and the calibration function:
CS = {E([Π − η(Π)]²)}^{1/2};   0 ≤ CS ≤ 1.
The ROC score (RS) — the area under the ROC (calculated from a piecewise linear estimate
of the ROC using the trapezoidal rule); 1/2 ≤ RS ≤ 1. Some basic facts pertaining to this
performance measure are as follows: (i) System A is more informative than system B if and only
if the ROC of A is superior to the ROC of B. (ii) If system A is more informative than system B,
then the RS of A is not smaller than the RS of B.
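The RS can be computed from the joint sample {(π, v)} by sweeping a threshold over the distinct forecast values and applying the trapezoidal rule, as the definition above states. A minimal sketch (assuming both outcomes are present in the sample):

```python
def roc_score(forecasts, outcomes):
    # RS: area under the empirical ROC (probability of detection versus
    # probability of false alarm), computed by the trapezoidal rule with
    # one ROC point per distinct forecast threshold.
    pairs = sorted(zip(forecasts, outcomes), reverse=True)
    n1 = sum(outcomes)             # occurrences (V = 1)
    n0 = len(outcomes) - n1        # nonoccurrences (V = 0)
    area, tp, fp = 0.0, 0, 0
    prev_pod, prev_pofa = 0.0, 0.0
    i = 0
    while i < len(pairs):
        thresh = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == thresh:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        pod, pofa = tp / n1, fp / n0
        area += 0.5 * (pod + prev_pod) * (pofa - prev_pofa)
        prev_pod, prev_pofa = pod, pofa
    return area

print(roc_score([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))   # 1.0: perfect separation
print(roc_score([0.6, 0.6, 0.4, 0.4], [1, 0, 1, 0]))   # 0.5: uninformative
```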
7.3 Comparative Verifications
Complete results are presented for the 6-h forecast period, 60–66 h after the model run. The
BPO uses one predictor (mean relative humidity at 60 h, as detailed in Section 4); the MOS uses
five predictors (as detailed in Section 6).
Figure 11 shows the CF and the CS from every verification. Both BPO and MOS exhibit
stable calibration across the two samples, the estimation sample and the validation sample. The
MOS probabilities smaller than 0.4 are well calibrated, but those greater than 0.4 are poorly calibrated on both samples. The BPO probabilities are well calibrated on both samples. Based on the
CS from the validation sample, BPO is calibrated better than MOS, by 0.034 on average (on the
probability scale).
Figure 12 shows the ROC and the RS from every verification. Both BPO and MOS exhibit
stable informativeness across the two samples, the estimation sample and the validation sample.
For each sample, the two ROCs cross each other. Thus neither system is more informative than
the other. For each sample, the RS of BPO is slightly higher than the RS of MOS.
A summary of results is presented for three forecast periods, 6-h, 12-h, 24-h, each beginning
60 h after the model run. Table 2 lists the predictors used by the BPO; Tables 3 and 4 report the
scores from verifications on the estimation samples and on the validation samples. In all six cases,
BPO is calibrated significantly better than MOS: the CS of BPO is at least 50% smaller than the
CS of MOS. In five out of six cases, the RS of BPO is slightly higher than the RS of MOS.
Finally, there is a consistent difference in terms of the number of “optimal predictors” selected
for each system during its development: BPO uses 1–2 predictors, which are always extracted
directly from the output fields of the AVN model; MOS uses 4–5 predictors, most of which are
obtained through grid-binary transformations of the output fields of the AVN model (Section 6.2).
7.4 Explanations
Calibration. Why is it that BPO is calibrated significantly better than MOS? Why is it that
MOS is poorly calibrated, contrary to the verification results of past studies? The explanation is
twofold.
First, as elaborated in Sections 4.1 and 4.4, the theoretic structure of the BPO forecasting
equation (3) ensures the necessary condition for the forecast probability to be well calibrated
against the prior (climatic) probability g input into the equation for a specific location and season. The ad-hoc structure of the MOS forecasting equation (9) does not offer this property.
Second, the good calibrations of MOS reported in past studies (e.g., Murphy and Brown,
1984; Antolik, 2000) may have been the artifact of the analyses. For these studies did not verify
the calibration of MOS at any specific location (which is of import to the users of forecasts at
that location), but instead pooled the verification samples from many locations into one national
sample from which verification statistics were calculated. If the prior probability and the degree of
calibration varied across locations, then the verification statistics obtained from a pooled sample
did not pertain to any location and therefore would be misleading to users.
Informativeness. Why is it that MOS needs two to four additional predictors to barely match
the informativeness of BPO? The explanation once again is twofold.
First, the laws of probability theory, from which the BPO is derived, ensure the optimal structure of the BPO forecasting equation (3). The structure of the MOS forecasting equation (9) is
different. Therefore, given any single predictor, the BPO system, if properly operationalized, can
never be less informative than the MOS system (or any other non-Bayesian system for that matter). To make up for the non-optimal theoretic structure, a non-Bayesian system needs additional
predictors (which are conditionally informative in that system).
Second, the grid-binary transform (Jensenius, 1992) was invented to improve the calibration
of the MOS system. But by mapping an original predictor (which is binary-continuous or continuous) into a binary predictor, this transform also removes part of predictive information contained
in the original predictor. In the examples reported herein, two to four additional predictors are
needed to make up for the lost information and the nonoptimal structure of the MOS forecasting
equation.
To dissect the predictive performance of the grid-binary transform, each system, MOS and
BPO, was estimated and evaluated twice: first, utilizing an original predictor, and next utilizing
the grid-binary transformation of that predictor. There were two findings. (i) The use of the
grid-binary transform in the MOS leads to a compromise: the transform improves the CS but
deteriorates the RS.
(ii) The use of the grid-binary transform in the BPO is unnecessary for
calibration (because the BPO automatically calibrates the posterior probability against the specified
prior probability) and is detrimental for informativeness (because it removes part of the predictive
information contained in the original predictor).
8. SUMMARY
8.1 Bayesian Technique
1. The BPO for a binary predictand described herein is the first technique of its kind for
probabilistic forecasting of weather variates: it produces the posterior probability of an event
through Bayesian fusion of a prior (climatic) probability and a realization of predictors output
from a NWP model.
2. The BPO implements Bayes theorem, which provides the correct theoretic structure of
the forecasting equation, and employs the meta-Gaussian family of multivariate density functions,
which provides a flexible and convenient parametric model. It can be estimated effectively from
asymmetric samples — the climatic sample of the predictand (which is typically long), and the
joint sample of the predictor vector and the predictand (which is typically short).
3. The development of the BPO has focused on quality of modeling and simplicity of estimation. The BPO allows (i) the marginal conditional distribution functions of the predictors to
be of any form (as typically they are non-Gaussian), and (ii) the conditional dependence structure
between any two predictors to be non-linear and heteroscedastic (as typically is the case in meteorology). Despite this flexibility, the BPO requires the estimation of only distribution parameters
and correlation coefficients. And the entire process of selecting predictors, choosing parametric
distribution functions, and estimating parameters can be automated.
8.2 Preliminary Results
1. The PoP produced by the prototype BPO system is better calibrated than, and at least as
informative as, the PoP produced by the operational MOS system for a specific location (and hence
for a specific user).
2. The BPO utilizing one or two predictors performs, in terms of both calibration and infor-
mativeness, at least as well as the MOS utilizing four or five predictors. This suggests that BPO is
more efficient than MOS in extracting predictive information from the output of a NWP model.
3. Every predictor in the BPO is a direct model output (interpolated to the station), whereas
most predictors in the MOS are grid-binary predictors whose definitions require subjective experimentation (to set the cutoff levels and smoothing parameters) and algorithmic processing of
the entire output fields (to calculate the predictor values).
Thus in terms of the definitions of
predictors, the BPO is more parsimonious than the MOS.
8.3 Potential Implications
1. Inasmuch as the grid-binary predictors can be dispensed with because only the basic and
derived predictors need be considered by the BPO, the set of potential predictors for the BPO is
about 5 times smaller than the set of potential predictors for the MOS. Consequently, the overall
effort needed to select the most informative subset of predictors can be reduced substantially.
2. With fewer predictors (say between one and four for BPO, instead of between four and
fifteen for MOS), an extension of the BPO to processing an ensemble of the NWP model output will
present a less demanding task (in terms of data storage requirements and computing requirements)
than it would be if an extension of the MOS technique were attempted.
Acknowledgments.
This material is based upon work supported by the National Science
Foundation under Grant No. ATM-0135940, “New Statistical Techniques for Probabilistic Weather
Forecasting”. The Meteorological Development Laboratory of the National Weather Service provided the AVN-MOS database and the MOS forecasting equations for comparative verifications.
The collaboration of Drs. Harry R. Glahn and Paul Dallavalle in this regard is much appreciated;
the advice of Mark S. Antolik on accessing and interpreting the data is gratefully acknowledged.
APPENDIX A
NUMERICAL APPROXIMATION TO Q^{-1}
Two approximations to the inverse of the standard normal distribution function Q^{-1} can be
found in Abramowitz and Stegun (1972, Chapter 26, p. 933). For operational forecasting, the
3-term rational function approximation is sufficiently accurate, and we reproduce it below at the
request of a reviewer.
Given the probability p, the standard normal quantile z = Q^{-1}(p) is approximated by

z ≈ −ẑ if 0 < p ≤ 0.5,   z ≈ ẑ if 0.5 ≤ p < 1,

where

ẑ = t − (a0 + a1 t)/(1 + b1 t + b2 t²);

t = [−2 ln p]^{1/2} if 0 < p ≤ 0.5,   t = [−2 ln(1 − p)]^{1/2} if 0.5 ≤ p < 1;

a0 = 2.30753,   a1 = 0.27061,   b1 = 0.99229,   b2 = 0.04481.

The error of this approximation is |ẑ − z| < 3 × 10^{−3}.
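A direct transcription of the approximation, compared against an accurate inverse normal CDF:

```python
import math
import statistics

def q_inverse_approx(p):
    # 3-term rational approximation to the standard normal quantile
    # z = Q^{-1}(p) (Abramowitz and Stegun, 1972), as reproduced in
    # Appendix A; advertised error |z_hat - z| < 3e-3.
    a0, a1 = 2.30753, 0.27061
    b1, b2 = 0.99229, 0.04481
    q = p if p <= 0.5 else 1.0 - p
    t = math.sqrt(-2.0 * math.log(q))
    z_hat = t - (a0 + a1 * t) / (1.0 + b1 * t + b2 * t * t)
    return -z_hat if p <= 0.5 else z_hat

exact = statistics.NormalDist().inv_cdf
for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(p, round(q_inverse_approx(p), 4), round(exact(p), 4))
```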
REFERENCES
Abramowitz, M., and I.A. Stegun (eds.), 1972: Handbook of Mathematical Functions, Dover,
Mineola, New York.
Alexandridis, M.G., and R. Krzysztofowicz, 1985: Decision models for categorical and probabilistic weather forecasts. Applied Mathematics and Computation, 17, 241–266.
Antolik, M.S., 2000: An overview of the National Weather Service’s centralized statistical quantitative precipitation forecasts. Journal of Hydrology, 239(1–4), 306–337.
Blackwell, D., 1951: Comparison of experiments. Proceedings of the Second Berkeley Symposium
on Mathematical Statistics and Probability, J. Neyman (ed.), University of California Press,
Berkeley, pp. 93–102.
Blackwell, D., 1953: Equivalent comparisons of experiments. Annals of Mathematical Statistics,
24, 265–272.
de Finetti, B., 1974: Theory of Probability, vol. 1, John Wiley and Sons, New York.
DeGroot, M.H., 1970: Optimal Statistical Decisions, McGraw-Hill, New York.
DeGroot, M.H., 1988: A Bayesian view of assessing uncertainty and comparing expert opinion.
Journal of Statistical Planning and Inference, 20, 295–306.
Edwards, W., L.D. Phillips, W.L. Hays, and B.C. Goodman, 1968: Probabilistic information
processing systems: Design and evaluation. IEEE Transactions on Systems Science and
Cybernetics, SSC-4 (3), 248–265.
Glahn, H.R., and D.A. Lowry, 1972: The use of model output statistics (MOS) in objective weather
forecasting. Journal of Applied Meteorology, 11(8), 1203–1211.
Herr, H.D., and R. Krzysztofowicz, 2005: Generic probability distribution of rainfall in space: The
bivariate model. Journal of Hydrology, 306(1–4), 234–263.
Jensenius, J.S., Jr., 1992: The use of grid-binary variables as predictors for statistical weather forecasting. Preprints of the 12th Conference on Probability and Statistics in the Atmospheric
Sciences, Toronto, Ontario, American Meteorological Society, pp. 225–230.
Kelly, K.S., and R. Krzysztofowicz, 1994: Probability distributions for flood warning systems.
Water Resources Research, 30(4), 1145–1152.
Kelly, K.S., and R. Krzysztofowicz, 1995: Bayesian revision of an arbitrary prior density. Proceedings of the Section on Bayesian Statistical Science, American Statistical Association,
50–53.
Kelly, K.S., and R. Krzysztofowicz, 1997: A bivariate meta-Gaussian density for use in hydrology.
Stochastic Hydrology and Hydraulics, 11(1), 17–31.
Krzysztofowicz, R., 1983: Why should a forecaster and a decision maker use Bayes theorem.
Water Resources Research, 19(2), 327–336.
Krzysztofowicz, R., 1999: Bayesian forecasting via deterministic model. Risk Analysis, 19(4),
739–749.
Krzysztofowicz, R., 2002: Bayesian system for probabilistic river stage forecasting. Journal of
Hydrology, 268(1–4), 16–40.
Krzysztofowicz, R., and H.D. Herr, 2001: Hydrologic uncertainty processor for probabilistic river
stage forecasting: Precipitation-dependent model. Journal of Hydrology, 249(1–4), 46–68.
Krzysztofowicz, R., and D. Long, 1990: Fusion of detection probabilities and comparison of multisensor systems. IEEE Transactions on Systems, Man, and Cybernetics, 20(3), 665–677.
Krzysztofowicz, R., and D. Long, 1991: Forecast sufficiency characteristic: Construction and
application. International Journal of Forecasting, 7(1), 39–45.
Lindley, D.V., 1987: The probability approach to the treatment of uncertainty in artificial intelligence and expert systems. Statistical Science, 2(1), 17–24.
Maranzano, C.J., and R. Krzysztofowicz, 2004: Bayesian processor of output for probabilistic
forecasting of precipitation occurrence. Preprints of the 17th Conference on Probability
and Statistics in the Atmospheric Sciences, Seattle, Washington, American Meteorological
Society; paper number 4.3.
McCullagh, P., and J.A. Nelder, 1989: Generalized Linear Models, 2nd ed., Chapman and Hall,
Boca Raton.
Murphy, A.H., and B.G. Brown, 1984: A comparative evaluation of objective and subjective
weather forecasts in the United States. Journal of Forecasting, 3, 369–393.
Sage, A.P. and J.L. Melsa, 1971: Estimation Theory with Applications to Communications and
Control, McGraw-Hill, New York.
Savage, L.J., 1954: The Foundations of Statistics, John Wiley and Sons, New York.
Table 1. Sample sizes and estimates of the prior probability g of precipitation occurrence;
6-h period 1200–1800 UTC; Buffalo, NY.

  Month          Size      g
  Oct             184   0.23
  Nov             180   0.29
  Dec             197   0.28
  Jan             205   0.27
  Feb             175   0.25
  Mar             191   0.29
  Cool Season    1132   0.27
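The cool-season row of Table 1 is simply the size-weighted pooling of the monthly estimates; a quick check in Python reproduces it from the monthly columns:

```python
# Monthly sample sizes and prior probabilities g from Table 1 (Oct..Mar)
sizes = [184, 180, 197, 205, 175, 191]
g = [0.23, 0.29, 0.28, 0.27, 0.25, 0.29]

season_size = sum(sizes)  # total number of 6-h periods in the cool season
# Size-weighted average of the monthly priors gives the seasonal prior
season_g = sum(n * p for n, p in zip(sizes, g)) / season_size
print(season_size, round(season_g, 2))  # 1132 0.27
```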
Table 2. Predictors used in the BPO system for PoP forecasts; forecast periods beginning
60 h after the 0000 UTC model run; cool season; Buffalo, NY.

  Period          Predictor
  6-h (60–66)     Mean relative humidity at 60 h
  12-h (60–72)    Total precipitation during 60–72 h
                  Mean relative humidity at 60 h
  24-h (60–84)    Total precipitation during 60–72 h
                  Mean relative humidity at 72 h
Table 3. Comparison of the BPO system with the MOS system for PoP forecasts;
verification on the estimation sample (Apr. 1997 – Mar. 2001); forecast periods
beginning 60 h after the 0000 UTC model run; cool season; Buffalo, NY.

  Period          System   Number of    Calibration   ROC         Sample
                           Predictors   Score, CS     Score, RS   Size
  6-h (60–66)     BPO      1            0.031         0.818       698
                  MOS      5            0.085         0.815       713
  12-h (60–72)    BPO      2            0.051         0.835       688
                  MOS      4            0.105         0.828       703
  24-h (60–84)    BPO      2            0.049         0.780       667
                  MOS      5            0.112         0.781       682
Table 4. Comparison of the BPO system with the MOS system for PoP forecasts;
verification on the validation sample (Apr. 2001 – Sept. 2003); forecast periods
beginning 60 h after the 0000 UTC model run; cool season; Buffalo, NY.

  Period          System   Number of    Calibration   ROC         Sample
                           Predictors   Score, CS     Score, RS   Size
  6-h (60–66)     BPO      1            0.031         0.815       363
                  MOS      5            0.065         0.808       363
  12-h (60–72)    BPO      2            0.042         0.829       360
                  MOS      4            0.092         0.814       360
  24-h (60–84)    BPO      2            0.063         0.810       348
                  MOS      5            0.133         0.807       348
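Tables 3 and 4 report a ROC score RS for each system. Assuming RS is the area under the receiver operating characteristic (a common convention; the tables themselves do not spell out the definition), it can be computed from the ordered (probability of false alarm, probability of detection) operating points by the trapezoidal rule. The operating points below are hypothetical, for illustration only:

```python
def roc_score(pod, pofa):
    """Area under the ROC curve by the trapezoidal rule.

    pod, pofa: probability-of-detection and probability-of-false-alarm
    values at the operating points, ordered from (0, 0) to (1, 1).
    """
    area = 0.0
    for i in range(1, len(pofa)):
        # trapezoid between consecutive operating points
        area += (pofa[i] - pofa[i - 1]) * (pod[i] + pod[i - 1]) / 2.0
    return area

# Hypothetical operating points of a forecast system
pofa = [0.0, 0.1, 0.3, 0.6, 1.0]
pod = [0.0, 0.5, 0.75, 0.9, 1.0]
print(roc_score(pod, pofa))  # area under this hypothetical curve
```

An uninformative system whose ROC is the diagonal receives RS = 0.5; a perfect system receives RS = 1.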
FIGURE CAPTIONS
Figure 1. Empirical distribution functions and parametric distribution functions Fv of
the mean relative humidity at 60 h, X, output from the AVN model, conditional on
precipitation event V = v (precipitation nonoccurrence, v = 0; precipitation
occurrence, v = 1); 6-h forecast period 1200–1800 UTC, beginning 60 h after
the 0000 UTC model run; cool season; Buffalo, NY.
Figure 2. Conditional density functions fv corresponding to the parametric conditional distribution
functions Fv (v = 0, 1) shown in Fig. 1.
Figure 3. Receiver operating characteristics of two predictors: mean relative humidity
at 60 h, and 850 hPa relative vorticity at 63 h, for forecasting precipitation occurrence
during 6-h period 1200–1800 UTC, beginning 60 h after the 0000 UTC model run;
cool season; Buffalo, NY.
Figure 4. Posterior probability π of precipitation occurrence as a function of the mean relative
humidity at 60 h, x, output from the AVN model, for three values of the prior
probability g; 6-h forecast period 1200–1800 UTC, beginning 60 h after the 0000
UTC model run; cool season; Buffalo, NY.
Figure 5. Empirical distribution functions and parametric distribution functions Fv of
the 850 hPa relative vorticity at 63 h, X, output from the AVN model, conditional on
precipitation event V = v (precipitation nonoccurrence, v = 0; precipitation
occurrence, v = 1); 6-h forecast period 1200–1800 UTC, beginning 60 h after
the 0000 UTC model run; cool season; Buffalo, NY.
Figure 6. Conditional density functions fv corresponding to the parametric conditional distribution
functions Fv (v = 0, 1) shown in Fig. 5.
Figure 7. Posterior probability π of precipitation occurrence as a function of the 850 hPa relative
vorticity at 63 h, x, output from the AVN model, for three values of the prior
probability g; 6-h forecast period 1200–1800 UTC, beginning 60 h after the 0000
UTC model run; cool season; Buffalo, NY.
Figure 8. Scatterplots of the 850 hPa relative vorticity at 63 h, X2 , versus the mean relative
humidity at 60 h, X1 , conditional on: (a) precipitation nonoccurrence, V = 0; (b)
precipitation occurrence, V = 1; and scatterplots of the standard normal predictors
Z2 and Z1 , conditional on: (c) precipitation nonoccurrence, V = 0; (d) precipitation
occurrence, V = 1; 6-h forecast period 1200–1800 UTC, beginning 60 h after the
0000 UTC model run; cool season; Buffalo, NY.
Figure 9. Scatterplots of the 850 hPa relative vorticity at 36 h, X2 , versus the 850 hPa relative
humidity at 36 h, X1 , conditional on: (a) precipitation nonoccurrence, V = 0; (b)
precipitation occurrence, V = 1; and scatterplots of the standard normal predictors
Z2 and Z1 , conditional on: (c) precipitation nonoccurrence, V = 0; (d) precipitation
occurrence, V = 1; 24-h forecast period 1200–1200 UTC, beginning 36 h after the
0000 UTC model run; warm season; Quillayute, WA.
Figure 10. Calibration functions of the BPO using one predictor and of the MOS using five
predictors; PoP forecasts for 6-h period 1200–1800 UTC, beginning 60 h after the
0000 UTC model run; cool season; Buffalo, NY: (a) BPO — estimation sample,
(b) MOS — estimation sample, (c) BPO — validation sample, (d) MOS —
validation sample. (Above each point is the number of forecasts used to estimate
that point.)
Figure 11. Receiver operating characteristics of the BPO using one predictor and of the
MOS using five predictors; PoP forecasts for 6-h period 1200–1800 UTC,
beginning 60 h after the 0000 UTC model run; cool season; Buffalo, NY:
(a) estimation sample, (b) validation sample.
[Figure 1: plot of P(X ≤ x | V = v) versus mean relative humidity x [%], with curves for v = 0 and v = 1.]
Figure 1. Empirical distribution functions and parametric distribution functions Fv of the mean
relative humidity at 60 h, X, output from the AVN model, conditional on precipitation
event V = v (precipitation nonoccurrence, v = 0; precipitation occurrence, v = 1);
6-h forecast period 1200–1800 UTC, beginning 60 h after the 0000 UTC model run;
cool season; Buffalo, NY.
[Figure 2: plot of density fv(x) versus mean relative humidity x [%], with curves for v = 0 and v = 1.]
Figure 2. Conditional density functions fv corresponding to the parametric conditional distribution
functions Fv (v = 0, 1) shown in Fig. 1.
[Figure 3: plot of probability of detection versus probability of false alarm, with curves for mean relative humidity and 850 hPa relative vorticity.]
Figure 3. Receiver operating characteristics of two predictors: mean relative humidity at
60 h, and 850 hPa relative vorticity at 63 h, for forecasting precipitation occurrence
during 6-h period 1200–1800 UTC, beginning 60 h after the 0000 UTC model
run; cool season; Buffalo, NY.
[Figure 4: plot of posterior probability π = P(V = 1 | X = x) versus mean relative humidity x [%], with curves for prior probability g = P(V = 1) equal to 0.25, 0.50, and 0.75.]
Figure 4. Posterior probability π of precipitation occurrence as a function of the mean
relative humidity at 60 h, x, output from the AVN model, for three values of
the prior probability g; 6-h forecast period 1200–1800 UTC, beginning 60 h
after the 0000 UTC model run; cool season; Buffalo, NY.
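The curves in Fig. 4 follow from Bayes theorem for a binary predictand: π = g f1(x) / [g f1(x) + (1 − g) f0(x)], where f0 and f1 are the conditional densities of the predictor (Fig. 2). A minimal numerical sketch of this fusion, with normal densities standing in for the meta-Gaussian conditional densities estimated in the paper (the means and standard deviations below are hypothetical, not the estimates for Buffalo):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_pop(x, g, f0, f1):
    """Bayes theorem for a binary predictand:
    pi = g * f1(x) / (g * f1(x) + (1 - g) * f0(x))."""
    num = g * f1(x)
    return num / (num + (1.0 - g) * f0(x))

# Hypothetical conditional densities of mean relative humidity [%]
f0 = lambda x: normal_pdf(x, 55.0, 18.0)  # given precipitation nonoccurrence
f1 = lambda x: normal_pdf(x, 80.0, 12.0)  # given precipitation occurrence

# Posterior PoP at x = 85% humidity, for the three prior probabilities of Fig. 4
for g in (0.25, 0.50, 0.75):
    print(g, round(posterior_pop(85.0, g, f0, f1), 3))
```

For a fixed x, the posterior probability increases monotonically with the prior g, which is why the three curves in Fig. 4 are ordered and never cross.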
[Figure 5: plot of P(X ≤ x | V = v) versus 850 relative vorticity x, with curves for v = 0 and v = 1.]
Figure 5. Empirical distribution functions and parametric distribution functions Fv of
the 850 hPa relative vorticity at 63 h, X, output from the AVN model, conditional on
precipitation event V = v (precipitation nonoccurrence, v = 0; precipitation
occurrence, v = 1); 6-h forecast period 1200–1800 UTC, beginning 60 h after
the 0000 UTC model run; cool season; Buffalo, NY.
[Figure 6: plot of density fv(x) versus 850 relative vorticity x, with curves for v = 0 and v = 1.]
Figure 6. Conditional density functions fv corresponding to the parametric conditional
distribution functions Fv (v = 0, 1) shown in Fig. 5.
[Figure 7: plot of posterior probability π = P(V = 1 | X = x) versus 850 relative vorticity x, with curves for prior probability g = P(V = 1) equal to 0.25, 0.50, and 0.75.]
Figure 7. Posterior probability π of precipitation occurrence as a function of the 850 hPa
relative vorticity at 63 h, x, output from the AVN model, for three values of
the prior probability g; 6-h forecast period 1200–1800 UTC, beginning 60 h
after the 0000 UTC model run; cool season; Buffalo, NY.
[Figure 8: four scatterplot panels. (a) X2 versus X1 for v = 0, with sample correlation ρ120 = 0.559; (b) X2 versus X1 for v = 1, ρ121 = 0.578; (c) z20 = Q-1(F20(x2)) versus z10 = Q-1(F10(x1)) for v = 0, γ120 = 0.577; (d) z21 = Q-1(F21(x2)) versus z11 = Q-1(F11(x1)) for v = 1, γ121 = 0.596.]
Figure 8. Scatterplots of the 850 hPa relative vorticity at 63 h, X2 , versus the mean relative
humidity at 60 h, X1 , conditional on: (a) precipitation nonoccurrence, V = 0; (b)
precipitation occurrence, V = 1; and scatterplots of the standard normal predictors
Z2 and Z1 , conditional on: (c) precipitation nonoccurrence, V = 0; (d) precipitation
occurrence, V = 1; 6-h forecast period 1200–1800 UTC, beginning 60 h after the
0000 UTC model run; cool season; Buffalo, NY.
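Panels (c) and (d) of Fig. 8 show each predictor after the normal quantile transform z = Q⁻¹(F(x)), with Q the standard normal distribution function. A minimal sketch of that transform, using the empirical distribution function with the Weibull plotting position i/(n + 1) in place of the parametric Fv (the humidity sample below is hypothetical):

```python
from statistics import NormalDist

def nqt(sample):
    """Normal quantile transform: map each value x through
    z = Q^{-1}(F(x)), with F the empirical distribution function
    (Weibull plotting position rank/(n + 1)) and Q the standard
    normal distribution function.
    """
    n = len(sample)
    # indices of the sample in ascending order of value
    order = sorted(range(n), key=lambda i: sample[i])
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = NormalDist().inv_cdf(rank / (n + 1))
    return z

# Hypothetical mean relative humidity sample [%]
x1 = [34, 51, 62, 70, 78, 85, 91]
print([round(z, 2) for z in nqt(x1)])
# [-1.15, -0.67, -0.32, 0.0, 0.32, 0.67, 1.15]
```

The transform is rank-based, so it preserves the ordering of the data while giving each predictor standard normal margins; any remaining structure in panels (c) and (d) reflects the dependence between the predictors, summarized by the correlations γ.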
[Figure 9: four scatterplot panels. (a) X2 versus X1 for v = 0, with sample correlation ρ120 = 0.011; (b) X2 versus X1 for v = 1, ρ121 = 0.358; (c) z20 = Q-1(F20(x2)) versus z10 = Q-1(F10(x1)) for v = 0, γ120 = 0.012; (d) z21 = Q-1(F21(x2)) versus z11 = Q-1(F11(x1)) for v = 1, γ121 = 0.373.]
Figure 9. Scatterplots of the 850 hPa relative vorticity at 36 h, X2 , versus the 850 hPa relative
humidity at 36 h, X1 , conditional on: (a) precipitation nonoccurrence, V = 0; (b)
precipitation occurrence, V = 1; and scatterplots of the standard normal predictors
Z2 and Z1 , conditional on: (c) precipitation nonoccurrence, V = 0; (d) precipitation
occurrence, V = 1; 24-h forecast period 1200–1200 UTC, beginning 36 h after the
0000 UTC model run; warm season; Quillayute, WA.
[Figure 10: four calibration plots of conditional probability P(V = 1 | Π = π) versus forecast probability π: (a) CS = 0.031, (b) CS = 0.085, (c) CS = 0.031, (d) CS = 0.065.]
Figure 10. Calibration functions of the BPO using one predictor and of the MOS using five
predictors; PoP forecasts for 6-h period 1200–1800 UTC, beginning 60 h after the
0000 UTC model run; cool season; Buffalo, NY: (a) BPO — estimation sample,
(b) MOS — estimation sample, (c) BPO — validation sample, (d) MOS —
validation sample. (Above each point is the number of forecasts used to estimate
that point.)
[Figure 11: two plots of probability of detection versus probability of false alarm: (a) estimation sample, BPO RS = 0.818, MOS RS = 0.815; (b) validation sample, BPO RS = 0.815, MOS RS = 0.808.]
Figure 11. Receiver operating characteristics of the BPO using one predictor and of
the MOS using five predictors; PoP forecasts for 6-h period 1200–1800 UTC,
beginning 60 h after the 0000 UTC model run; cool season; Buffalo, NY:
(a) estimation sample, (b) validation sample.