Probabilistic Prediction
Cliff Mass
University of Washington

Uncertainty in Forecasting
• Most numerical weather prediction (NWP) today and most forecast products reflect a deterministic approach.
• This means that we do the best job we can for a single forecast and do not consider uncertainties in the model, initial conditions, or the very nature of the atmosphere.
• However, the uncertainties are usually very significant and information on such uncertainty can be very useful.

This is really ridiculous!

A Fundamental Issue
• The work of Lorenz (1963, 1965, 1968) demonstrated that the atmosphere is a chaotic system, in which small differences in the initialization, well within observational error, can have large impacts on the forecasts, particularly for longer forecasts.
• In a series of experiments he found that small errors in initial conditions can grow so that all deterministic forecast skill is lost at about two weeks.

Butterfly Effect: a small change at one place in a complex system can have large effects elsewhere

Not unlike a pinball game

Uncertainty Extends Beyond Initial Conditions
• Also uncertainty in our model physics.
  – such as microphysics and boundary layer parameterizations.
• And further uncertainty produced by our numerical methods.

Probabilistic NWP
• To deal with forecast uncertainty, Epstein (1969) suggested stochastic-dynamic forecasting, in which forecast errors are explicitly considered during model integration.
• Essentially, uncertainty estimates are added to each term in the primitive equations.
• This stochastic method was not and still is not computationally practical.

Probabilistic-Ensemble Numerical Prediction (NWP)
• Another approach, ensemble prediction, was proposed by Leith (1974), who suggested that prediction centers run a collection (ensemble) of forecasts, each starting from a different initial state.
• The variations in the resulting forecasts could be used to estimate the uncertainty of the prediction.
• But even the ensemble approach was not possible at that time due to limited computer resources.
• It became practical in the late 1980s as computer power increased.

Ensemble Prediction
• Can use ensembles to estimate the probabilities that some weather feature will occur.
• The ensemble mean is more accurate on average than any individual ensemble member.
• Forecast skill of the ensemble mean is related to the spread of the ensembles.
• When ensemble forecasts are similar, ensemble mean skill tends to be higher.
• When forecasts differ greatly, ensemble mean forecast skill tends to be lower.

Deterministic Forecasting
[Figure: forecast and true (T) trajectories in phase space, marked with the 12h, 24h, 36h, and 48h forecasts and observations.]
• An analysis produced to run an NWP model is somewhere in a cloud of likely states. Any point in the cloud is equally likely to be the truth.
• The true state of the atmosphere exists as a single point in phase space that we never know exactly.
• Nonlinear error growth and model deficiencies drive apart the forecast and true trajectories (i.e., Chaos Theory).
• A point in phase space completely describes an instantaneous state of the atmosphere (pres, temp, etc. at all points at one time).

Ensemble Forecasting, a Stochastic Approach
[Figure: an ensemble of trajectories in phase space, together with the true state (T).]
• An ensemble of likely analyses leads to an ensemble of likely forecasts.
• Ensemble forecasting:
  – Encompasses truth
  – Reveals flow-dependent uncertainty
  – Yields objective stochastic forecast

Probability Density Functions
[Figure: probability density functions; vertical axis 0 to 0.4.]
• Usually we fit the distribution of ensemble members with a gaussian or other reasonably smooth theoretical distribution as a first step.
(22 May 2003 1:30 PM, General Examination Presentation)

A critical issue is the development of ensemble systems that create probabilistic guidance that is both reliable and sharp.
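The first step described above, fitting a gaussian to the ensemble members and reading a probability off it, can be sketched in a few lines. This is only an illustration under stated assumptions: the eight wind-speed values, the 50 kt threshold, and the function names are hypothetical and do not come from any operational system. The member-counting estimate is included for comparison.

```python
import math
import statistics


def fit_gaussian(members):
    """Fit a Gaussian to ensemble members; returns (mean, standard deviation)."""
    mean = statistics.mean(members)
    spread = statistics.stdev(members)  # sample standard deviation (n - 1 divisor)
    return mean, spread


def prob_exceed_gaussian(members, threshold):
    """P(value > threshold) under the Gaussian fitted to the members."""
    mean, spread = fit_gaussian(members)
    z = (threshold - mean) / (spread * math.sqrt(2.0))
    return 0.5 * math.erfc(z)  # upper-tail probability of a normal distribution


def prob_exceed_counting(members, threshold):
    """Fraction of members above the threshold (no distribution assumed)."""
    return sum(1 for m in members if m > threshold) / len(members)


# Hypothetical 8-member ensemble of surface wind speed (kt), for illustration only.
winds_kt = [42.0, 47.0, 51.0, 55.0, 49.0, 58.0, 44.0, 53.0]

mean_kt, spread_kt = fit_gaussian(winds_kt)
print(f"ensemble mean = {mean_kt:.2f} kt, spread = {spread_kt:.2f} kt")
print(f"P(wind > 50 kt), Gaussian fit: {prob_exceed_gaussian(winds_kt, 50.0):.2f}")
print(f"P(wind > 50 kt), member count: {prob_exceed_counting(winds_kt, 50.0):.2f}")
```

With only a handful of members the counting estimate can take very few distinct values, which is one reason a smooth fitted distribution is a convenient starting point. Neither estimate is calibrated.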
We Need to Create Probability Density Functions (PDFs) of Each Variable That Have These Characteristics

Elements of a Good Probability Forecast
• Sharpness (also known as resolution)
  – The width of the predicted distribution should be as small as possible.
[Figure: a sharp and a less sharp probability density function (PDF) for some forecast quantity.]

Elements of a Good Probability Forecast
• Reliability (also known as calibration)
  – A probability forecast p ought to verify with relative frequency p.
  – Forecasts from climatology are reliable (by definition), so calibration alone is not enough.

Reliability Diagram

Verification Rank Histogram (a.k.a. Talagrand Diagram): Another Measure of Reliability
• Over many trials, record the verification’s position (the “rank”) among the ordered EF members.
[Figure: three cases (reliable EF, under-spread EF, over-spread EF). Each case shows the EF PDF (curve) with 8 sample members (bars) and the true PDF (curve) with the verification value (bar), plotted as probability against cumulative precipitation (mm), together with a histogram of frequency against verification rank (1 to 9).]

Brier Skill Score (BSS) directly examines reliability, resolution, and overall skill

Continuous Brier Score
  $BS = \frac{1}{M} \sum_{j=1}^{M} (p_{e,j} - o_j)^2$
  $M$: number of fcst/obs pairs
  $p_{e,j}$: forecast probability {0.0…1.0}
  $o_j$: observation {0.0 = no, 1.0 = yes}
  BS = 0 for perfect forecasts
  BS = 1 for perfectly wrong forecasts

Decomposed Brier Score (by discrete, contiguous bins)
  $BS = \frac{1}{M} \sum_{i=1}^{I} N_i (p'_{e,i} - \bar{o}_i)^2 - \frac{1}{M} \sum_{i=1}^{I} N_i (\bar{o}_i - \bar{o})^2 + \bar{o}(1 - \bar{o})$
  The three terms are, in order: reliability (rel), resolution (res), and uncertainty (unc), so that BS = rel − res + unc.
  $I$: number of probability bins (normally 11)
  $N_i$: number of data pairs in bin $i$
  $p'_{e,i}$: binned forecast probability (0.0, 0.1, … 1.0 for 11 bins)
  $\bar{o}_i$: observed relative frequency for bin $i$
  $\bar{o}$: sample climatology (total occurrences / total forecasts)

Brier Skill Score
  $BSS = \frac{BS_{fcst} - BS_{clim}}{BS_{perf} - BS_{clim}}$
  BSS = 1 for perfect forecasts
  BSS < 0 for forecasts worse than climo

Brier Skill Score′
  $BSS = 1 - \frac{BS_{fcst}}{BS_{clim}} = 1 - \frac{rel_{fcst} - res_{fcst} + unc_{fcst}}{rel_{clim} - res_{clim} + unc_{clim}} = \frac{res - rel}{unc}$
  The first equality uses $BS_{perf} = 0$. The last uses $rel_{clim} = 0$ and $res_{clim} = 0$ for the sample climatology, with the same uncertainty term for both forecasts.
  ADVANTAGES:
  1) No need for long-term climatology
  2) Can compute and visualize in reliability diagram

Probabilistic Information Can Produce Substantial Economic and Public Protection Benefits

There is a decision theory on using probabilistic information for economic savings
• C = cost of protection
• L = loss if a damaging event occurs
• Decision theory says you should protect if the probability of occurrence is greater than C/L.

Critical Event: surface winds > 50 kt
Cost (of protecting): $150K
Loss (if damage): $1M
C/L = 0.15 (15%)

Case   Deterministic    Observation   Cost    Probabilistic
       Forecast (kt)    (kt)          ($K)    Forecast
1      65               54            150     42%
2      58               63            150     71%
3      73               57            150     95%
4      55               37            150     13%
5      39               31            0       3%
6      31               55            1000    36%
7      62               71            150     85%
8      53               42            150     22%
9      21               27            0       51%
10     52               39            150     77%
Total Cost ($K): 2,050

Decision Theory Example
                 Observed? YES     Observed? NO
Forecast? YES    Hit $150K         False Alarm $150K
Forecast? NO     Miss $1000K       Correct Rejection $0K

Cost ($K) by Threshold for Protective Action
Case     0%      20%     40%     60%     80%     100%
1        150     150     150     1000    1000    1000
2        150     150     150     150     1000    1000
3        150     150     150     150     150     1000
4        150     0       0       0       0       0
5        150     0       0       0       0       0
6        150     150     1000    1000    1000    1000
7        150     150     150     150     150     1000
8        150     150     0       0       0       0
9        150     150     150     0       0       0
10       150     150     150     150     0       0
Total    1,500   1,200   1,900   2,600   3,300   5,000
Optimal Threshold = 15%

History of Probabilistic Weather Prediction (in the U.S.)

Early Forecasting Started Probabilistically!!!
• Early forecasters, faced with large gaps in their young science, understood the uncertain nature of the weather prediction process and were comfortable with a probabilistic approach to forecasting.
• Cleveland Abbe, who organized the first forecast group in the United States as part of the U.S. Signal Corps, did not use the term “forecast” for his first prediction in 1871, but rather used the term “probabilities,” resulting in him being known as “Old Probabilities” or “Old Probs” to the public.

“Ol Probs”
• Professor Cleveland Abbe issued the first public “Weather Synopsis and Probabilities” on February 19, 1871.
• A few years later, the term indications was substituted for probabilities, and by 1889 the term forecasts received official approval (Murphy 1997).

History of Probabilistic Prediction
• The first modern operational probabilistic forecasts in the United States were produced in 1965.
  These forecasts, for the probability of precipitation, were produced by human weather forecasters and thus were subjective probabilistic predictions.
• The first objective probabilistic forecasts were produced as part of the Model Output Statistics (MOS) system that began in 1969.

NOTE: Model Output Statistics (MOS)
• Based on multiple linear regression with 12 predictors.
• Y = a0 + a1X1 + a2X2 + a3X3 + a4X4 + …

Ensemble Prediction
• Ensemble prediction began at NCEP in the early 1990s. ECMWF rapidly joined the club.
• During the past decades the size and sophistication of the NCEP and ECMWF ensemble systems have grown considerably, with the medium-range global ensemble system becoming an integral tool for many forecasters.
• Also during this period, NCEP has constructed a higher resolution, short-range ensemble system (SREF) that uses breeding to create initial condition variations.

Example: NCEP Global Ensemble System
• Begun in 1993 with the MRF (now GFS).
• First tried “lagged” ensembles as a basis, using runs from various initialization times that verify at the same time.
• Then used the “breeding” method to find perturbations to the initial conditions of each ensemble member.
• Breeding adds random perturbations to an initial state, lets them grow, reduces their amplitude to a small level, lets them grow again, and so on.
• This gives an idea of what type of perturbations are growing rapidly in the period BEFORE the forecast.
• Does not include physics uncertainty.
• Now replaced by the Ensemble Transform Filter approach.

NCEP Global Ensemble
• 20 members at 00, 06, 12, and 18 UTC plus two control runs for each cycle
• 28 levels
• T190 resolution (roughly 80 km)
• 384 hours
• Uses stochastic physics to get some physics diversity

ECMWF Global Ensemble
• 50 members and 1 control
• 60 levels
• T399 (roughly 40 km) through 240 hours, T255 afterwards
• Singular vector approach to creating perturbations
• Stochastic physics

Several Nations Have Global Ensembles Too!
• China, Canada, Japan and others!
• And there are combinations of global ensembles like:
  – TIGGE: THORPEX Interactive Grand Global Ensemble from ten national NWP centers
  – NAEFS: North American Ensemble Forecast System combining U.S. and Canadian Global Ensembles

Popular Ensemble-Based Products

Spaghetti Diagram

Ensemble Mean
• Ensemble “best guess” = high-resolution control forecast or ensemble mean
• Ensemble spread = standard deviation of the members at each grid point
• Shows where “best guess” can be trusted (i.e., areas of low or high predictability)
• Details unpredictable aspects of waves: amplitude vs. phase
[Figure: Global Forecast System (GFS) Ensemble, http://www.cdc.noaa.gov/map/images/ens/ens.html]

Spread Chart

Meteograms Versus “Plume Plots”
[Figure: plume plot of 1000/500 hPa geopotential thickness [m] at Yokosuka against forecast day (0 to 10), initial DTG 00Z 28 JAN 1999. FNMOC Ensemble Forecast S…]
• Data range = meteogram-type trace of each ensemble member’s raw output
• Excellent tool for point forecasting, if calibrated
• Can easily (and should) calibrate for model bias
• Calibrating for ensemble spread problems is difficult
• Must use box & whisker, or confidence interval plot for large ensembles

Box and Whisker Plots
[Figure: http://www.weatheroffice.gc.ca/ensemble/index_naefs_e.html]

[Figure: AFWA Forecast Multimeteogram for Misawa AB, Japan. JME cycle: 11Nov06, 18Z; RWY: 100/280; 15 km resolution. Wind speed (kt) and wind direction against valid time (UTC), showing the mean, the extreme max, the extreme min, and the 90% confidence interval (CI, gray shaded area).]

Hurricane Track Forecast & Potential

Ensemble-Based Probabilities

Postage Stamp Plots
[Figure: SLP and winds for the verification and 13 members. 1: cent, 2: eta, 3: ukmo, 4: tcwb, 5: ngps, 6: cmcg, 7: avn, 8: eta*, 9: ukmo*, 10: tcwb*, 11: ngps*, 12: cmcg*, 13: avn*]
  – Reveals high uncertainty in storm track and intensity
  – Indicates low probability of Puget Sound wind event

A Number of Nations Are Experimenting with Higher-Resolution Ensembles

European MOGREPS
  – 24 km resolution
  – Uses ETKF for diversity (breeding)
  – Stochastic physics

NCEP Short-Range Ensembles (SREF)
• Resolution of 32 km
• Out to 87 h twice a day (09 and 21 UTC initialization)
• Uses both initial condition uncertainty (breeding) and physics uncertainty.
• Uses the Eta and Regional Spectral Models and recently the WRF model (21 total members)

SREF Current System
Model      Res (km)   Levels   Members    Cloud Physics   Convection
RSM-SAS    45         28       Ctl,n,p    GFS physics     Simple Arakawa-Schubert
RSM-RAS    45         28       n,p        GFS physics     Relaxed Arakawa-Schubert
Eta-BMJ    32         60       Ctl,n,p    Op Ferrier      Betts-Miller-Janjic
Eta-SAT    32         60       n,p        Op Ferrier      BMJ-moist prof
Eta-KF     32         60       Ctl,n,p    Op Ferrier      Kain-Fritsch
Eta-KFD    32         60       n,p        Op Ferrier      Kain-Fritsch with enhanced detrainment
PLUS
* NMM-WRF control and 1 pert. pair
* ARW-WRF control and 1 pert. pair

The UW Ensemble System
• Perhaps the highest resolution operational ensemble systems are running at the University of Washington
• UWME: 8 members at 36 and 12-km
• UW EnKF system: 60 members at 36 and 4-km

Calibration (Post-Processing) of Ensembles Is Essential

Calibration of Mesoscale Ensemble Systems: The Problem
• The component models of virtually all ensemble systems have systematic biases that substantially degrade the resulting probabilistic forecasts.
• Since different models or runs have different systematic biases, this produces forecast variance that DOES NOT represent true forecast uncertainty.
• Systematic bias reduces sharpness and degrades reliability.
• Also, most ensemble systems produce forecasts that are underdispersive. Not enough variability!

Example of Bias Correction for UW Ensemble System
[Figure: Uncorrected T2. Average RMSE (°C) and bias (shaded) at 12 h, 24 h, 36 h, and 48 h for members plus01 to plus08 and the mean.]
[Figure: Bias-Corrected T2. Average RMSE (°C) and bias (shaded) at 12 h, 24 h, 36 h, and 48 h for members plus01 to plus08 and the mean.]

Skill for Probability of T2 < 0°C
[Figure: Brier Skill Score (BSS) and uncertainty against forecast hour (00 to 48). Legend: *ACMEcore (*UW Basic Ensemble with bias correction); ACMEcore (UW Basic Ensemble, no bias correction); *ACMEcore+ (*UW Enhanced Ensemble with bias cor.); ACMEcore+ (UW Enhanced Ensemble without bias cor).]

The Next Step: Bayesian Model Averaging
• Although bias correction is useful it is possible to do more.
  – Optimize the variance of the forecast distributions
  – Weight the various ensemble members using their previous performance.
  – An effective way to do this is through Bayesian Model Averaging (BMA).

Bayesian Model Averaging
• Assumes a gaussian (or other) PDF for each ensemble member.
• Assumes the variance of each member is the same (in current version).
• Includes a simple bias correction for each member.
• Weights each member by its performance during a training period (we are using 25 days)
• Adds the pdfs from each member to get a total pdf.
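The BMA recipe listed above (one gaussian per member, a shared variance, a per-member bias correction, performance weights, and a weighted sum of the member PDFs) can be written down compactly. The sketch below is not the UW implementation. It assumes an additive bias correction, and the forecasts, biases, weights, and shared standard deviation are hypothetical numbers chosen for illustration. Estimating the weights and variance from the training period is not shown.

```python
import math


def normal_pdf(y, mean, sigma):
    """Density of a normal distribution at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mean, sigma):
    """Cumulative probability of a normal distribution at y."""
    return 0.5 * (1.0 + math.erf((y - mean) / (sigma * math.sqrt(2.0))))


def check_weights(weights):
    """BMA weights are treated as probabilities, so they must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("BMA weights must sum to 1")


def bma_pdf(y, forecasts, biases, weights, sigma):
    """Weighted sum of member gaussians centred on bias-corrected forecasts."""
    check_weights(weights)
    return sum(w * normal_pdf(y, f - b, sigma)
               for f, b, w in zip(forecasts, biases, weights))


def bma_cdf(y, forecasts, biases, weights, sigma):
    """P(value <= y) under the BMA mixture."""
    check_weights(weights)
    return sum(w * normal_cdf(y, f - b, sigma)
               for f, b, w in zip(forecasts, biases, weights))


def bma_mean(forecasts, biases, weights):
    """Mean of the BMA mixture: weighted mean of bias-corrected forecasts."""
    check_weights(weights)
    return sum(w * (f - b) for f, b, w in zip(forecasts, biases, weights))


# Hypothetical 3-member example for 2-m temperature (deg C); all values are
# assumptions for illustration, not output from a real ensemble.
forecasts_c = [1.5, 0.2, -0.8]   # raw member forecasts
biases_c = [0.5, -0.3, 0.2]      # mean forecast-minus-observed error per member
weights = [0.5, 0.3, 0.2]        # assumed to come from training-period skill
sigma_c = 1.5                    # shared standard deviation of each member PDF

print(f"BMA mean: {bma_mean(forecasts_c, biases_c, weights):.2f} C")
print(f"P(T2 < 0 C): {bma_cdf(0.0, forecasts_c, biases_c, weights, sigma_c):.2f}")
```

Because the result is a mixture, the total PDF can be wider than any single member PDF and can be multimodal when the bias-corrected members disagree.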
Application of BMA-Max 2-m Temperature (all stations in 12 km domain)

Being Able to Create Reliable and Sharp Probabilistic Information is Only Half the Problem!
Even more difficult will be communication and getting people and industries to use it.

Deterministic Nature?
• People seem to prefer deterministic products: “tell me what is going to happen”
• People complain they find probabilistic information confusing. Many don’t understand POP (probability of precipitation).
• Media and internet are not moving forward very quickly on this.

National Weather Service Icons are not effective in communicating probabilities
And a “slight” chance of freezing drizzle reminds one of a trip to Antarctica

Commercial sector is no better (Weather.Com)

A great deal of research and development is required to develop effective approaches for communicating probabilistic forecasts that will not overwhelm people and will allow them to get value out of them.