Application of Forecast Verification Science to Operational River Forecasting in the National Weather Service
Julie Demargne, James Brown, Yuqiong Liu and D-J Seo
UCAR NROW, November 4-5, 2009

Slide 2: Approach to river forecasting
[Diagram: observations and input forecasts feed hydrologic models (Sacramento Soil Moisture Accounting schematic with upper/lower zone tension and free water storages, surface runoff, interflow, baseflow, and subsurface outflow); forecasters run the models and deliver forecast products to users]

Slide 3: Where is the …? In the past
Verification???
• Limited verification of hydrologic forecasts
• How good are the forecasts for application X?

Slide 4: Where is the …? Now
Verification!!!
• Papers, verification experts, verification products, verification systems

Slide 5: Hydrologic forecasting: a multi-scale problem
[Maps at nested scales: national forecast group; major river system; river basin with river forecast points; headwater basin with radar rainfall grid; high-resolution flash flood basins]
Hydrologic forecasts must be verified consistently across all spatial scales and resolutions.

Slide 6: Hydrologic forecasting: a multi-scale problem
[Figure: forecast uncertainty grows with lead time, from minutes and hours to days, weeks, months, seasons, and years; benefits span protection of life and property, flood mitigation and navigation, hydropower, recreation, agriculture, ecosystems, reservoir control, state/local planning, health, environment, and commerce]
Seamless probabilistic water forecasts are required for all lead times and all users; so is verification information.

Slide 7: Need for hydrologic forecast verification
• In 2006, the NRC recommended that the NWS expand verification of its uncertainty products and make it easily available to all users in near real time
• Users decide whether to take action through risk-based decision making
• Users must be educated on how to interpret forecast and verification information

Slide 8: River forecast verification service
• http://www.nws.noaa.gov/oh/rfcdev/docs/Final_Verification_Report.pdf
• http://www.nws.noaa.gov/oh/rfcdev/docs/NWS-Hydrologic-Forecast-Verification-Team_Final-report_Sep09.pdf

Slide 9: River forecast verification service
• To help us answer:
– How good are the forecasts for application X?
– What are the strengths and weaknesses of the forecasts?
– What are the sources of error and uncertainty in the forecasts?
– How are new science and technology improving the forecasts and the verifying observations?
– What should be done to improve the forecasts?
– Do forecasts help users in their decision making?
Slide 10: River forecast verification service
[Diagram: the river forecasting system (observations, models, input forecasts, forecasters, forecast products, users) with verification systems added alongside, producing verification products for forecasters and users; the hydrologic model is shown as the Sacramento Soil Moisture Accounting schematic]

Slide 11: River forecast verification service
• Verification Service within the Community Hydrologic Prediction System (CHPS) to:
– Compute metrics
– Display data & metrics
– Disseminate data & metrics
– Provide real-time access to metrics
– Analyze uncertainty and error in forecasts
– Track performance

Slide 12: Verification challenges
• Verification is useful if the information generated leads to decisions about the forecast/system being verified
– Verification needs to be user oriented
• No single verification measure provides complete information about the quality of a forecast product
– Several verification metrics and products are needed
• To facilitate communication of forecast quality, common verification practices and products are needed, from weather and climate forecasts to water forecasts
– Collaboration between the meteorology and hydrology communities is needed (e.g., Thorpex-Hydro, HEPEX)

Slide 13: Verification challenges: two classes of verification
• Diagnostic verification: to diagnose and improve model performance
– Done off-line with archived forecasts or hindcasts to analyze forecast quality relative to different conditions/processes
• Real-time verification: to help forecasters and users make decisions in real time
– Done in real time (before the verifying observation occurs) using information from historical analogs and/or past forecasts and verifying observations under similar conditions

Slide 14: Diagnostic verification products
• Key verification metrics for 4 levels of information for single-valued and probabilistic forecasts:
1. Observation-forecast comparisons (scatter plots, box plots, time series plots)
2. Summary verification (e.g., MAE/mean CRPS, skill score)
3. More detailed verification (e.g., measures of reliability, resolution, discrimination, correlation; results for specific conditions)
4. Sophisticated verification (e.g., for specific events with ROC)
• To be evaluated by forecasters and forecast users

Slide 15: Diagnostic verification products
• Examples for level 1: scatter plot, box-and-whiskers plot
[Figure: scatter plot of forecast value vs. observed value, with a user-defined threshold]

Slide 16: Diagnostic verification products
• Examples for level 1: box-and-whiskers plot
[Figure: box-and-whiskers plot of forecast error (forecast - observed) [mm] vs. observed daily total precipitation [mm], for 24-hr precipitation ensembles on the American River in California (lead day 1); each box marks the errors for one forecast at the min, 10th, 20th, median, 80th, 90th, and max values; a zero-error line separates high bias from low bias and highlights "blown" forecasts]
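To make the level 2 summary metrics above concrete, here is a minimal Python sketch (not IVP or EVS code) that computes the MAE of the ensemble median, the mean CRPS from the standard ensemble estimator CRPS = E|X - y| - 0.5 E|X - X'|, and a CRPS skill score against sample climatology; the function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """Sample CRPS for one ensemble forecast: E|X - y| - 0.5 E|X - X'|."""
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# toy data: 100 forecast-observation pairs, 20 ensemble members (synthetic)
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 5.0, size=100)                         # stand-in 24-h precipitation [mm]
fcst = obs[:, None] + rng.normal(0.0, 3.0, size=(100, 20))  # stand-in ensembles

# level 2 summary metrics
mae = np.mean(np.abs(np.median(fcst, axis=1) - obs))        # MAE of the ensemble median
mean_crps = np.mean([crps_ensemble(f, y) for f, y in zip(fcst, obs)])

# skill relative to sample climatology (1 = perfect, 0 = no better than climatology)
crps_clim = np.mean([crps_ensemble(obs, y) for y in obs])
skill = 1.0 - mean_crps / crps_clim

print(f"MAE = {mae:.2f} mm, mean CRPS = {mean_crps:.2f} mm, CRPS skill = {skill:.2f}")
```

The climatology reference here treats the pooled observations as the reference "ensemble" for every case, which is one common choice; the slides note that defining meaningful reference forecasts is itself an open science issue.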
Slide 17: Diagnostic verification products
• Examples for level 2: skill score maps by month
[Maps for January, April, and October; smaller score is better]

Slide 18: Diagnostic verification products
• Examples for level 3: more detailed plots
[Plots: score vs. performance under different conditions; score vs. performance for different months]

Slide 19: Diagnostic verification products
• Examples for level 4: event-specific plots
• Event: > 85th percentile of the observed distribution
[Reliability diagram: observed frequency vs. predicted probability, with the perfect-reliability line; discrimination (ROC) diagram: probability of detection (POD) vs. probability of false detection (POFD), with the perfect point marked]

Slide 20: Diagnostic verification products
• Examples for level 4: user-friendly spread-bias plot
• 60% of the time, the observation should fall in the window covering the middle 60% of the ensemble (i.e., median ±30%)
[Plot annotations: "hit rate" = 90%; "underspread"]

Slide 21: Diagnostic verification analyses
• Analyze any new forecast process with verification
• Use different temporal aggregations
– Analyze each verification statistic as a function of lead time
– If performance is similar across lead times, data can be pooled
• Perform spatial aggregation carefully
– Analyze results for each basin and plot results on spatial maps
– Use normalized metrics (e.g., skill scores)
– Aggregate verification results across basins with similar hydrologic processes (e.g., by response time)
• Report verification scores with sample size
– In the future, confidence intervals

Slide 22: Diagnostic verification analyses
• Evaluate forecast performance under different conditions
– With time conditioning: by month, by season
– With atmospheric/hydrologic conditioning: low/high probability threshold; absolute thresholds (e.g., PoP, flood stage)
– Check that the sample size is not too small
• Analyze sources of uncertainty and error
– Verify forcing input forecasts and output forecasts
– For extreme events, verify both stage and flow
• Sensitivity analyses to be set up at all RFCs: 1) What is the optimal QPF horizon for hydrologic forecasts? 2) Do run-time modifications made on the fly improve forecasts?

Slide 23: Diagnostic verification software
• Interactive Verification Program (IVP), developed at OHD: verifies single-valued forecasts at given locations/areas

Slide 24: Diagnostic verification software
• Ensemble Verification System (EVS), developed at OHD: verifies ensemble forecasts at given locations/areas

Slide 25: Dissemination of diagnostic verification
• Example: WR water supply website http://www.nwrfc.noaa.gov/westernwater/
– Data visualization
– Error: MAE, RMSE; conditional on lead time, year
– Skill: skill relative to climatology; conditional
– Categorical: FAR, POD, contingency table (based on climatology or user-definable)

Slide 26: Dissemination of diagnostic verification
• Example: OHRFC bubble plot online http://www.erh.noaa.gov/ohrfc/bubbles.php
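The spread-bias check on slide 20 can be illustrated with a few lines of code: for each nominal probability window, count how often the observation lands inside the central interval of the ensemble. This is a minimal sketch under assumed array shapes (pairs × members), with illustrative names and synthetic data; it is not part of IVP or EVS.

```python
import numpy as np

def coverage(forecasts: np.ndarray, observations: np.ndarray, nominal: float) -> float:
    """Fraction of observations inside the central `nominal` interval of each
    ensemble (e.g., nominal=0.6 -> window between the 20th and 80th percentiles,
    i.e., median +/- 30%). A reliable ensemble matches `nominal`."""
    lo = np.percentile(forecasts, 50.0 * (1.0 - nominal), axis=1)
    hi = np.percentile(forecasts, 50.0 * (1.0 + nominal), axis=1)
    return float(np.mean((observations >= lo) & (observations <= hi)))

# toy usage: a deliberately underspread ensemble (spread half the true variability)
rng = np.random.default_rng(1)
obs = rng.normal(size=200)
fcst = rng.normal(scale=0.5, size=(200, 20))
for nominal in (0.2, 0.6, 0.9):
    print(f"nominal {nominal:.0%} -> observed coverage {coverage(fcst, obs, nominal):.0%}")
```

With the underspread toy ensemble, the observed coverage falls well short of the nominal value, which is exactly the signature the slide's spread-bias plot is designed to expose.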
Slide 27: Real-time verification
• How good could the 'live' forecast be?
[Figure: live forecast hydrograph with observations]

Slide 28: Real-time verification
• Select analogs from a pre-defined set of historical events and compare with the 'live' forecast
[Figure: live forecast alongside three historical analog forecasts and their observations; annotation: "Live forecast for flood is likely to be too high"]

Slide 29: Real-time verification
• Adjust the 'live' forecast based on information from the historical analogs
[Figure: live forecast vs. what happened; the live forecast was too high]

Slide 30: Real-time verification
• Example for ensemble forecasts
[Plot of temperature (°F) vs. forecast lead day: live forecast (L); analog forecasts (H) with μH = μL ± 1.0°C; analog observations; annotation: "Day 1 forecast is probably too high"]

Slide 31: Real-time verification
• Build an analog query prototype using multiple criteria (see the sketch at the end of these notes)
– Seeking analogs for precipitation: "Give me past forecasts for the 10 largest events relative to hurricanes for this basin."
– Seeking analogs for temperature: "Give me all past forecasts with lead time 12 hours whose ensemble mean was within 5% of the live ensemble mean."
– Seeking analogs for flow: "Give me all past forecasts with lead times of 12-48 hours whose probability of flooding was >= 0.95, where the basin-averaged soil moisture was > x and the immediately prior observed flow exceeded y at the forecast issue time."
• Requires forecasters' input!

Slide 32: Outstanding science issues
• Define meaningful reference forecasts for skill scores
• Account for observational error (measurement and representativeness errors) and rating curve error
• Account for non-stationarity (e.g., climate change)
• Separate timing error and amplitude error in forecasts
• Verify rare events and specify sampling uncertainty in metrics
• Analyze sources of uncertainty and error in forecasts
• Consistently verify forecasts on multiple space and time scales
• Verify multivariate forecasts (issued at multiple locations and for multiple time steps) by accounting for statistical dependencies

Slide 33: Verification service development
[Diagram: partners include OHD, OCWWS, NCEP, forecasters, users, academia, forecast agencies, and the private sector]
• COMET-OHD-OCWWS collaboration on training
• OHD-NCEP Thorpex-Hydro project
• OHD-Deltares collaboration for CHPS enhancements
• HEPEX Verification Test Bed (CMC, Hydro-Quebec, ECMWF)

Slide 34: Looking ahead
• 2012:
– Info on the quality of the forecast service available online
– Real-time and diagnostic verification implemented in CHPS
– RFC standard verification products available online along with forecasts
• 2015: Leveraging grid-based verification tools

Slide 35: Thank you
Questions?
Julie.Demargne@noaa.gov

Slide 36: Extra slide

Slide 37: Diagnostic verification products
• Key verification metrics from the NWS Verification Team report
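As a closing sketch, the analog queries on slide 31 could be prototyped as simple filters over a forecast archive. This is a toy Python illustration: the PastForecast fields, the function names, and the thresholds are assumptions for exposition, not any NWS schema or system.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PastForecast:
    issue_time: datetime
    lead_hours: int
    ensemble_mean: float
    prob_flooding: float        # forecast probability of exceeding flood stage
    soil_moisture: float        # basin-averaged soil moisture at issue time
    prior_observed_flow: float  # observed flow immediately before issue time

def temperature_analogs(archive, live_mean, lead_hours=12, tol=0.05):
    """Slide 31 temperature query: same lead time, ensemble mean
    within 5% of the live ensemble mean."""
    return [f for f in archive
            if f.lead_hours == lead_hours
            and abs(f.ensemble_mean - live_mean) <= tol * abs(live_mean)]

def flow_analogs(archive, soil_min, flow_min, prob_min=0.95, lead_range=(12, 48)):
    """Slide 31 flow query: lead times of 12-48 h, P(flooding) >= 0.95,
    soil moisture > x and prior observed flow > y (x and y are the
    forecaster-supplied thresholds from the slide)."""
    return [f for f in archive
            if lead_range[0] <= f.lead_hours <= lead_range[1]
            and f.prob_flooding >= prob_min
            and f.soil_moisture > soil_min
            and f.prior_observed_flow > flow_min]
```

In practice such filters would run against the forecast archive, with the thresholds chosen by forecasters, which is exactly why the slide stresses that the prototype requires forecasters' input.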