Probabilistic Weather Forecasting Using Bayesian Model Averaging
J. McLean Sloughter
Adviser: Tilmann Gneiting
GSR: Susan Joslyn
Committee members: Adrian Raftery & Cliff Mass
8 May, 2009
This work was supported by MURI, JEFS, & NSF grants.

Outline
Background: motivation; ensemble forecasting; Bayesian model averaging; dissertation outline
BMA for vector wind: data; decomposing the problem; bias-correction; error distributions; model; results
Future directions
References
Acknowledgements

Why probabilistic forecasting?
Situations where certain ranges or thresholds are of interest
Situations where it matters to know not just the most likely outcome but also the possible extremes
Situations involving a cost/loss analysis, where the probabilities of different outcomes need to be known
Examples: wind energy, military operations, sailing, airports, winter road maintenance

Ensemble Forecasting
[Figure: 48-hour forecasts for maximum wind speeds on 7 August 2003]

Ensemble Forecasting
A single forecast model is run multiple times with different initial conditions.
Forecasts are created on a 12-km grid and bilinearly interpolated to the locations of interest.
The ensemble mean tends to outperform the individual members.
Spread-skill relationship: the spread of the forecasts tends to be correlated with the magnitude of the error.

Ensemble Forecasting
We would like the ensemble members to look like draws from the same distribution as the observed values.
However, the ensemble captures only one source of variability: uncertainty in the initial conditions.
As a result, the ensemble distribution is underdispersed relative to the observed values; the ensemble members agree with one another more than they agree with the observations.

Bayesian model averaging (BMA)
A weighted average of multiple component models, with one component per ensemble member
Each component is a distribution of the observed value conditioned on one ensemble member's forecast.
The model is fit on training data (past sets of forecasts and observations), using a sliding window of training data.
The weights are determined by how well each member fits the training data.

Bayesian model averaging
$$p(y \mid f_1, \ldots, f_K) = \sum_{k=1}^{K} w_k \, g_k(y \mid f_k)$$
where f_k is the deterministic forecast from member k, w_k is the weight associated with member k (the weights are nonnegative and sum to 1), and g_k(y | f_k) is the estimated density function for y given member k.
[Figure: example BMA predictive density; horizontal axis: observed speed (0 to 5), vertical axis: density]

Dissertation Outline
Precipitation forecasting (Sloughter et al., 2007, MWR): extends BMA to a specific case of skewed and non-continuous distributions
Wind speed forecasting (Sloughter et al., 2009, JASA): extends the methods of Sloughter et al. (2007) to other forms of skewed and non-continuous distributions, and examines the robustness of BMA to details of model selection
Vector wind forecasting (this talk): extends BMA to multivariate distributions

BMA for vector wind
Methods exist for using Bayesian model averaging to create probabilistic forecasts for weather quantities whose predictive distributions can be expressed as a mixture of normals (Raftery et al., 2005), such as temperature and pressure.
These methods were extended to non-continuous and skewed quantities such as precipitation and wind speed in Sloughter et al. (2007) and Sloughter et al. (2009).
A method is still needed for modeling multivariate quantities such as wind vectors.
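The BMA mixture described in the slides above (a weighted sum of one component density per ensemble member) is straightforward to evaluate once the weights and component densities are known. The sketch below is a minimal illustration, assuming normal components with linearly bias-corrected means a_k + b_k f_k and a common standard deviation, as in the normal-mixture setting of Raftery et al. (2005); skewed quantities such as wind speed call for other component families. The function names and the parameter values mentioned afterwards are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bma_density(y, forecasts, weights, a, b, sd):
    """BMA predictive density p(y | f_1, ..., f_K) = sum_k w_k g_k(y | f_k).

    Assumption for illustration: each component g_k is normal with a
    linearly bias-corrected mean a[k] + b[k] * f_k and a common standard
    deviation sd shared by all components.
    """
    # The weights must sum to 1 so that the mixture integrates to 1.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("BMA weights must sum to 1")
    return sum(
        w * normal_pdf(y, a_k + b_k * f, sd)
        for f, w, a_k, b_k in zip(forecasts, weights, a, b)
    )
```

For example, bma_density(2.5, [2.0, 2.5, 3.0], [0.5, 0.3, 0.2], [0.0] * 3, [1.0] * 3, 0.5) evaluates a three-member mixture at 2.5. Members with larger weights contribute more mass near their own bias-corrected forecasts.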
Knot
A knot is a unit of speed used in nautical, meteorological, and aviation settings:
1 knot = 1.852 kilometers per hour = 1.151 miles per hour = 0.514 meters per second
Sailors would throw out a chip log (a board designed to stay stationary in the water) tied to a rope with knots spaced 7 fathoms (42 feet) apart.
They would then count how many knots were fed out in 30 seconds.

Knot
4-6 knots is a light breeze: leaves move, and the breeze can be felt on one's face.
11-16 knots is a moderate breeze: dust and paper are blown about, and whitecaps form on the water.
20-21 knots is generally the threshold for issuing a small craft advisory.
34-40 knots is a gale: small branches break from trees, and walking becomes difficult.

Data
This work uses wind data from the Pacific Northwest for the full year 2003, plus November and December 2002 (results are reported for 2003; the 2002 data are used only for training).
"Instantaneous" vector wind measurements, in knots
Each forecast consists of 8 ensemble members.
Data were available for 343 days and missing for 83 days.
A total of 38,091 observations, averaging 111 observations per day
All results that follow are based on 30-day training periods and 2-day-ahead forecasts.

Data
Data come from Surface Airway Observation stations at airports in British Columbia, Washington, Oregon, Idaho, and California.

Decomposing the problem
Wind has two dimensions: east/west and north/south.
BMA uses a mixture distribution with one component per ensemble member.
Consider each mixture component to be a bivariate distribution parameterized by a mean vector and a covariance matrix.
Assume that the mean of the distribution is some function of the forecast vector, and that the covariance matrix does not depend on the forecast (exploratory plots support these assumptions).

Decomposing the problem
$$y \mid f_k \sim h(f_k) + \mathrm{BV}(0, Q)$$
h(f_k) is the mean (a bias-corrected forecast), and BV(0, Q) is the distribution of the forecast error, with mean 0 and covariance matrix Q.
We model the distribution of the errors rather than of the observed values; this has the advantage that the parameters are constant across forecast values.
The problem can then be decomposed into two separate problems: bias-correcting the forecast, and modeling the error distribution.

Bias-correction
For simplicity, consider affine bias corrections. There are two potential forms of bivariate bias correction:
Additive bias-correction: $$y = f_k + a_k + \epsilon_k$$
Full affine bias-correction: $$y = a_k + B_k f_k + \epsilon_k$$
where y is the observed wind vector, f_k is the kth vector forecast, a_k is an additive bias vector, B_k is a 2 x 2 transformation matrix, and epsilon_k is the forecast error.

Bias-correction
[Table: bivariate root mean squared error (in knots) for one ensemble member; out-of-sample results using a 30-day training period]
Similar results hold for the other ensemble members.
The full affine bias-correction shows a marked improvement.

Error distributions
We now deal with the error field (observations minus bias-corrected forecasts).
Exploratory work suggests that the error distributions are ellipsoidal but have heavier tails than normal distributions.
Write the error vector in polar form and transform it by raising its magnitude to the 4/5 power while preserving its angle:
$$(r_k \cos\theta_k,\ r_k \sin\theta_k)^T \mapsto (r_k^{4/5} \cos\theta_k,\ r_k^{4/5} \sin\theta_k)^T$$
Model the transformed error vector as a bivariate normal distribution.

Model
Thus, our final model is:
$$p(y \mid f_1, \ldots, f_K) = \sum_{k=1}^{K} w_k \, g_k(y \mid f_k)$$
where the g_k are the distributions on y implied by the bivariate normal distributions of the transformed error vectors.
Model parameters are estimated globally, using all observation locations.

Model
The bias-correction is fit via linear regression, with a separate bias correction for each mixture component.
The weights and the covariance matrix are estimated by maximum likelihood using the EM algorithm.
The EM algorithm uses latent variables z_kst, which indicate that forecast k was the best forecast at station s at time t.

Model
E step (iteration j):
$$\hat{z}_{kst}^{(j+1)} = \frac{w_k^{(j)} \, g_k^{(j)}(y_{st} \mid f_{kst})}{\sum_{l=1}^{K} w_l^{(j)} \, g_l^{(j)}(y_{st} \mid f_{lst})}$$
M step:
$$w_k^{(j+1)} = \frac{1}{n} \sum_{s,t} \hat{z}_{kst}^{(j+1)}$$
where n is the number of station/time pairs in the training set.

Model
M step (continued):
$$Q^{(j+1)} = \frac{1}{n} \sum_{k=1}^{K} \sum_{s,t} \hat{z}_{kst}^{(j+1)} \, e_{kst} e_{kst}^T$$
where e_kst is the transformed error vector for member k at station s and time t.

Results
We simulate a large number of forecasts from our predictive distribution.
We can then evaluate forecasts of either the wind vector or derived quantities (marginal speed or direction) from the empirical distribution of the simulated forecasts.
In effect, this creates a new, larger ensemble of forecasts that should be better calibrated than the original ensemble.

Example
To illustrate what the BMA distribution is doing, consider forecasting at Omak, Washington on 4 February 2003.

Results
Our goal is to maximize sharpness subject to calibration (the Gneiting principle).
By calibration, we mean that the predictive distributions should be statistically consistent with the observations: if we forecast an event with probability 0.9, it should happen 90% of the time.
By sharpness, we mean that predictive intervals should be as narrow as possible.

Results
For univariate quantities, the verification rank histogram is a tool for assessing the calibration of an ensemble forecast.
Find the rank of each observation relative to the ensemble members.
If the ensemble is properly calibrated, the observation and the forecasts should be interchangeable, so each possible rank of the observation should have equal probability.
Thus, a histogram of the ranks should look flat.

Results
For multivariate quantities, there is an analogous multivariate rank histogram (MVRH), again based on the assumption of exchangeability.
Define x ≼ y if and only if x_i ≤ y_i in every dimension i.
For each member x_j of the combined set {x_0, x_1, ..., x_m} of the observation x_0 and the m forecasts, find the pre-rank
$$\rho_j = \sum_{i=0}^{m} \mathbb{1}\{x_i \preceq x_j\}$$
The multivariate rank is the rank of the observation's pre-rank rho_0 among rho_0, ..., rho_m, with any ties resolved at random.
With 8 forecasts and 1 observation, there are 9 possible ranks of the observation relative to the forecasts.

Results
[Figure: MVRH for the raw ensemble (left) and the BMA forecast distribution (right)]
The raw ensemble is underdispersed.
The BMA forecast distribution is much better calibrated.

Results
The energy score (ES) is a scoring rule for multivariate probabilistic forecasts that takes into account both calibration and sharpness:
$$\mathrm{ES}(P, x) = \mathbb{E}_P \lVert X - x \rVert - \tfrac{1}{2} \mathbb{E}_P \lVert X - X' \rVert$$
where P is the predictive distribution, x is the observed wind vector, X and X' are independent random vectors with distribution P, and the norm is Euclidean. Smaller scores are better.
In the univariate case, it reduces to the continuous ranked probability score (CRPS).

Results
There may still be interest in a point forecast as well.
We can use the spatial median of the predictive distribution as a point forecast.
We can assess the quality of a multivariate point forecast using the multivariate mean absolute error (MMAE):
$$\mathrm{MMAE} = \frac{1}{n} \sum_{i=1}^{n} \lVert \hat{x}_i - x_i \rVert$$
where \hat{x}_i is the point forecast and x_i the observed wind vector for the ith forecast case.

Results
BMA outperforms climatology and the raw ensemble in terms of both the probabilistic forecast and the deterministic forecast.

Results: marginal speed and direction
Again, consider verification rank histograms to assess calibration.
Both speed (top) and direction (bottom) are much improved by BMA.

Results: marginal speed and direction
The CRPS is the scalar equivalent of the energy score; the DCRPS is its angular (directional) equivalent.
Scalar point forecasts can be assessed by the mean absolute error (MAE), and angular point forecasts by the mean directional error (MDE).
We can also look at the coverage and width of 77.8% prediction intervals for scalar forecasts: coverage assesses calibration, and width assesses sharpness. (77.8% = 7/9 is the nominal coverage of the range of an 8-member ensemble.)

Results: marginal speed and direction
[Tables: verification scores for wind speed and for wind direction]

Results: marginal speed and direction
For both speed and direction, BMA improves the quality of both the probabilistic and the deterministic forecasts.
BMA produces marginal distributions that are better calibrated than the raw ensemble and sharper than climatology.

Future Directions
Develop a BMA method that explicitly models marginal instantaneous wind speed, and compare its performance to that of the forecasts from this model (the current BMA method for marginal wind speed is for maximum wind speeds, not instantaneous ones).
Incorporate spatial information, either by explicitly modeling spatial structure in the parameters or by estimating parameters locally rather than globally.
Investigate exponential forgetting of training data instead of a sliding window; this could allow faster computation through updating formulae for the parameter estimates.
Extend the multivariate methods to forecast multiple weather quantities jointly.

References
Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133, 1155-1174.
Sloughter, J. M., Raftery, A. E., Gneiting, T., and Fraley, C. (2007). Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Monthly Weather Review, 135, 3209-3220.
Sloughter, J. M., Gneiting, T., and Raftery, A. E. (2009). Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. Journal of the American Statistical Association, accepted.
Mass, C., Joslyn, S., Pyle, J., Tewson, P., Gneiting, T., Raftery, A., Baars, J., Sloughter, J. M., Jones, D., and Fraley, C. (2009). PROBCAST: A web-based portal to mesoscale probabilistic forecasts. Bulletin of the American Meteorological Society, in press. http://probcast.com

Acknowledgements
Committee: Tilmann Gneiting (adviser); Adrian Raftery and Cliff Mass (committee members); Susan Joslyn (GSR)
Statistics folks: Veronica Berrocal, Chris Fraley, Thordis Thorarinsdottir, Will Kleiber, Larissa Stanberry, Matt Johnson, Robert Yuen, Michael Polakowski, Nicholas Johnson
Atmospheric Sciences folks: Jeff Baars, Eric Grimit, Jeff Thomason, Tony Eckel
APL folks: Patrick Tewson, John Pyle, David Jones, Janet Olsonbaker, Scott Sandgathe
Psychology folks: Limor Nadav-Greenberg, Buzz Hunt, Queena Chen, Jared Le Clerc, Rebecca Nichols, Sonia Savelli
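The verification tools described in the Results slides (multivariate rank, energy score, spatial median, and MMAE) can all be computed from a finite sample of wind vector forecasts, such as the raw 8-member ensemble or a large sample simulated from the BMA distribution. The Python sketch below uses only the standard library. The function names are hypothetical, the energy score is its sample-based form (treating the sample as the predictive distribution P), and the spatial median is approximated with Weiszfeld's iteration, which is a swapped-in choice since the slides do not say how it was computed.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def precedes(x, y):
    """True iff x_i <= y_i in every dimension (the preorder used for pre-ranks)."""
    return all(xi <= yi for xi, yi in zip(x, y))


def multivariate_rank(obs, ensemble, rng=random):
    """Multivariate rank of the observation within {obs} plus the ensemble.

    The pre-rank of each pooled point counts how many pooled points
    (itself included) precede it; the observation's rank is the rank of its
    pre-rank, with ties resolved uniformly at random.
    """
    pooled = [obs] + list(ensemble)
    pre_ranks = [sum(precedes(x, y) for x in pooled) for y in pooled]
    obs_pre = pre_ranks[0]
    below = sum(r < obs_pre for r in pre_ranks)
    ties = sum(r == obs_pre for r in pre_ranks)  # includes the observation itself
    return below + rng.randint(1, ties)


def energy_score(sample, obs):
    """Sample energy score: mean ||X - x|| minus half the mean ||X - X'||."""
    m = len(sample)
    term1 = sum(euclid(x, obs) for x in sample) / m
    term2 = sum(euclid(x, xp) for x in sample for xp in sample) / (2 * m * m)
    return term1 - term2


def spatial_median(points, max_iter=500, tol=1e-10):
    """Approximate spatial median via Weiszfeld's iteration.

    Points that coincide with the current iterate are skipped, a common
    simplification that avoids division by zero.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = euclid(p, current)
            if d < 1e-12:
                continue
            for i in range(dim):
                num[i] += p[i] / d
            den += 1.0 / d
        if den == 0.0:  # every point coincides with the iterate
            return current
        new = [n / den for n in num]
        if euclid(new, current) < tol:
            return new
        current = new
    return current


def mmae(point_forecasts, observations):
    """Multivariate mean absolute error: average Euclidean distance."""
    pairs = list(zip(point_forecasts, observations))
    return sum(euclid(f, o) for f, o in pairs) / len(pairs)
```

Tallying multivariate_rank over many forecast cases gives the MVRH; with 8 members the ranks run from 1 to 9. The double sum in energy_score is quadratic in the sample size, so for a large simulated BMA sample one might subsample before scoring.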