STREAMFLOW FORECASTING AT A GLOBAL SCALE
Nathalie Voisin

A/ RETROSPECTIVE MEDIUM-RANGE STREAMFLOW FORECASTING AT A GLOBAL SCALE

1/ DATASETS

Climatology: ECMWF reanalysis, ERA40
- 1980-2002 period (after TOVS start)
- Daily dataset
- T2M, 10U, 10V, TP every 6 hours
- 1.125 degree resolution at the Equator (Gaussian grid)
(Note that the NCEP reanalysis could also be used, if needed.)

Downscaling reference (spatial variability) at 0.5 degree resolution: Adam et al. (2006)
- 1979-1999 period
- Monthly dataset, disaggregated to daily using statistics
- Originally 2 degree resolution, downscaled to 0.5 degree using inverse distance square weighting interpolation of Co-op stations, a 1st order Markov chain for the occurrence of rain, and a 2-parameter gamma distribution for the intensity, with the parameters derived from the interpolation of station data (Nijssen et al., 2001).
- This precipitation dataset is corrected for gauge undercatch and orography.
- A first order autoregressive model was used for temperature.
- Statistical parameters were derived from station observations, or interpolated from parameters derived from observations in nearby regions.
- Tmin, Tmax, WND, TP

Forecasts: presently GFS reforecasts
- 1979-2005
- Daily dataset
- 2T, 10U, 10V, TP every 12 hours
- 2.5 degree resolution
- 15 ensemble members, based on 7 pairs of different initial conditions, and a control run initialized with the NCEP-NCAR reanalysis
- Validation: NMC reanalysis
Note that later on we would like to use ECMWF medium range, monthly and seasonal forecasts.

2/ STRATEGY:
- The climatology dataset forces the hydrological model VIC until the day of the forecast (spin up and initial conditions).
- VIC is then forced for a 15 day period with
  o the climatology (control run for later analysis)
  o the 15 ensemble member forecast
  o a 21 member ensemble of 15 day "forecasts", where each member is the 15 day climatology starting on the day of the forecast, for each of the (22-1=) 21 years of the climatology.
- Outputs will be analyzed:
  * skill at different lead times (1, 3, 7, 10 and 15 days) with respect to the climatology VIC run and to observations where available
  * at different locations, with different climates, and for specific events (low flow, high flow and average flow periods)
  * skill will be measured with different statistical tools (Brier score and others) in order to assess the accuracy of the streamflow forecasts (peak magnitude, timing).
  * with the medium range forecasts, the emphasis will be mainly on floods, while seasonal forecasts will allow us to look at both floods and droughts (streamflow and soil moisture).
  * where no stations are available for comparison, results will be presented in terms of anomalies and percentiles.

3/ DATA PROCESSING:
The climatology needs to be downscaled to 0.5 degree.
The forecasts need to be bias corrected to match the climatology statistics, and then downscaled to 0.5 degree.
The climatology statistics to be conserved in the forecasts are:
- the frequency of occurrence of rain
- the amplitude of the signals (intensity range)
- a possible systematic bias (average precipitation)
MOS (Model Output Statistics) could also be used to statistically correct the forecasts (to be investigated).

4/ BIAS CORRECTION OF THE FORECASTS:

4.1 CDF of the forecasts:
Each ensemble member has a specific set of initial condition anomalies. The forecast climatology is a cumulative distribution function (cdf) for each cell (about 2445), for each day of the year (DOY), for each lead time, computed over a 9 day window in order to increase the sample size. Each cdf is thus built from 25 years times 9 days = 225 values.

The forecasts $F_{LS}$ yield a forecast climatology $\overline{F}_{LS}$ of 15 members x 366 days x 15 lead times x 2445 cells.

Technical Note/ Choice: Having a climatology for each ensemble member makes it possible to correct the systematic bias specific to each deviation in the initial conditions. The corrected ensemble member forecasts should be closer to each other, less spread out.
The uncertainty in the bias correction is minimized by doing so.

Brief reminder on the cdf, for this specific case (9 day window):
- Rank the 225 values (assuming they are independent and could have occurred in any year); the rank is i, n=225. These values are not truly independent, as a weather system can last an entire week. n could be divided by at least 3 and the cdf then smoothed, or we accept this approximation. If a one day extreme occurs, the approximation underestimates the probability of occurrence (and the non exceedance probability). If a multiple day extreme occurs, the approximation overestimates the probability of occurrence. With 225 occurrences, we can assume that treating the data as independent is reasonable (say 1 extreme in 25 years, which would induce a 3/225 (0.013) bias in the probability).
- Use the Weibull percentiles: p = i / (n+1)

Note that by using all 25 years of data, the bias correction result is better than if the bias correction used 24 years (excluding the 9 day window centered on the day of the forecast to be bias corrected), as would happen in a real time process. The heavy computation already required to derive the 15*366*15*2445 = about 2.0 x 10^8 climatologies motivates us to accept this approximation. (I could do it by continent, but I need to read the entire file (15 days, 15 ensemble members for 2445 cells for each day) anyway, and I cannot store the 25 years worth of data on our system (barely one year).)

The climatologies are computed by doy: I download the 9 day window of the doy for 25 years, then I compute the cdf for each cell. Then I remove the files. It should take about 10-12 days to compute the climatology of the forecasts because of the time it takes to download the files (365 days *2 in 24 hours).
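The cdf construction described above (pool the 9 day window over 25 years, rank the values, assign Weibull percentiles) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `weibull_cdf` and the synthetic `pooled` sample are hypothetical stand-ins, not part of the actual processing chain, and the values are treated as independent, as accepted in the note.

```python
# Sketch: empirical cdf for one cell / member / lead time / doy.
# weibull_cdf and pooled are hypothetical names used for illustration only.
import random


def weibull_cdf(values):
    """Return the sorted values and their Weibull percentiles p = i / (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    probs = [i / (n + 1) for i in range(1, n + 1)]  # rank i runs from 1 to n
    return ordered, probs


# Synthetic stand-in for 25 years x 9 days of forecast values (assumed data).
rng = random.Random(0)
pooled = [rng.gammavariate(0.8, 5.0) for _ in range(25 * 9)]

ordered, probs = weibull_cdf(pooled)
print(len(ordered), probs[0], probs[-1])
```

With n = 225, the smallest value gets p = 1/226 and the largest p = 225/226, so no value is assigned a probability of exactly 0 or 1.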
The final product is the cdf of the forecast, for each cell, lead time, member, and variable (precipitation, temperature and wind).

4.2 CDF of Climatology
The present climatology (ERA40) has a resolution of 1.125 degree at the equator (Gaussian grid) and therefore needs to be regridded to 2.5 degree resolution using inverse distance square weighting interpolation (Symap algorithm). The cdf is derived as explained above, for the ERA40 reanalysis.

The regridded climatology $CLIM_{LS}$ yields a climatology cdf $\overline{CLIM}_{LS}$ for 366 days x 2445 cells.

4.3 Bias correction
Let $F_{LS}$ be the forecast for a specific doy, cell, member, and lead time at a resolution of 2.5 degree (large scale). Let $\overline{F}_{LS}$ be the forecast climatology, for that same doy, cell, member, lead time, at 2.5 degree resolution. On the cdf of $\overline{F}_{LS}$, the value of $F_{LS}$ corresponds to a certain percentile (non exceedance probability). This percentile corresponds to a value $C_{LS}$ on the cdf of $\overline{CLIM}_{LS}$ (of that same cell, but for doy + lead time). $C_{LS}$ is the bias corrected forecast value, for that doy, cell, member, lead time and at a resolution of 2.5 degree.

Note on finding P: Because this is a retrospective analysis and all the values were included in the forecast climatology, the value of $F_{LS}$ has an exact corresponding percentile P; there is no fitting or interpolation to be done on the forecast side. Find the rank of $F_{LS}$ in the forecast climatology and deduce P. In the climatology, find the 2 closest p's before and after that P, and linearly interpolate the corresponding CLIM values to deduce $C_{LS}$.

Note on the rain occurrence: It might happen that the forecast model underestimates low precipitation values. In this case a forecast value $F_{LS}$ of zero precipitation corresponds to a non exceedance probability ranging from 0 to, say, p*, which corresponds to a climatology value range of 0 to $C_{LS}$*. In this case (the case exists if (p*-pclim*) is larger than 0.01, for example), a percentile p is drawn from a uniform distribution (rand function in C, seeded with srand), and $C_{LS}$ is then derived.
On the other hand, if the forecast climatology overestimates low precipitation values, the forecast is simply assigned a $C_{LS}$ value of zero.

5/ DOWNSCALING OF THE CLIMATOLOGY AND THE FORECAST
The Adam et al. (2006) precipitation dataset and the corresponding Nijssen et al. (2001) temperature and wind datasets have been downscaled from 2 degree to 0.5 degree spatial resolution using interpolation between stations, when available, and using a stochastic model where data were sparse, along with an adjustment for orography. The statistics of spatial variability at 0.5 degree of the Adam et al. (2006) and Nijssen et al. (2001) datasets can only be better than a "simple" disaggregation from 2.5 degree down to 0.5 degree. The Adam et al. / Nijssen et al. datasets $X_{DS}$ are being regridded to 2.5 degree ($X_{LS}$) and 1 degree ($X_{1deg}$) resolutions.

5.1 Downscaling of the climatology
The climatology is originally at 1.125 degree (at the equator, Gaussian grid). For simplicity, the climatology is regridded to a 1 degree resolution lat/lon grid. This creates a slight averaging at the equator and does not change anything at higher latitudes. $CLIM_{LS}$ becomes $CLIM_{1deg}$. For a specific doy and cell, a random year in the Adam et al. / Nijssen et al. dataset is chosen ($X_{1deg}$). In this approach, the larger scale value is that of the climatology, and the downscaled value is the larger scale value to which a random spatial variability anomaly is added.

For temperature and wind:
$CLIM_{DS} = CLIM_{1deg} + (X_{DS} - X_{1deg})$, where $X_{DS} - X_{1deg}$ is the spatial variability
For precipitation:
$CLIM_{DS} = CLIM_{1deg} \cdot \frac{X_{DS}}{X_{1deg}}$, where $\frac{X_{DS}}{X_{1deg}}$ is the spatial variability

5.2 Downscaling of the Forecast
The downscaling of the forecasts is very similar to that of the climatology, but starts from the 2.5 degree resolution (large scale) and is done separately for each ensemble member.

$C_{DS} = C_{LS} + (X_{DS} - X_{LS})$ for temperature and wind
$C_{DS} = C_{LS} \cdot \frac{X_{DS}}{X_{LS}}$ for precipitation

This is a different process from the Wood et al. (2002) downscaling approach. Wood et al.
(2002) used the same dataset (based on observations) both to perform the bias correction at 2.5 degree and to downscale to 1/8th of a degree; the downscaled (Symap interpolation) forecast anomaly is added to the small scale observation, while the downscaled observation is corrected for a large scale anomaly (see the equation below).

Wood et al. (2002) approach:
$F_{DS} = O_{DS,RAND} - (O_{LS,RAND} - \overline{O}_{LS}) + (O_{LS} - \overline{O}_{LS}) = O_{DS,RAND} - \Delta_{rand,LS} + \Delta_{FCST,LS}$
where
- $O_{LS}$ is the bias corrected forecast value (specific doy, member, cell)
- $\overline{O}_{LS}$ is the observation mean for the specific doy and cell
- $\Delta_{FCST,LS}$ is the forecast anomaly
- $\Delta_{rand,LS}$ is the large scale random doy anomaly

5.3 Downscaling of GFS
GFS forecasts provide the average temperature only, not the minimum and maximum temperatures. In the downscaling process for temperature, (Tmin - Tavg)DS and (Tmax - Tavg)DS will be additional anomalies added in the equation.

$Tmin_{FCST,DS} = Tavg_{FCST,LS} + (Tavg_{OBS,DS} - Tavg_{OBS,LS}) + (Tmin_{OBS,DS} - Tavg_{OBS,DS}) = Tavg_{FCST,LS} + (Tmin_{OBS,DS} - Tavg_{OBS,LS})$
and similarly for Tmax.

_______________________________________________________________________________

B/ REAL-TIME MEDIUM-RANGE STREAMFLOW FORECASTING AT A GLOBAL SCALE

1/ DATASETS:
Climatology: the same as in the retrospective case
Data assimilation: ECMWF analysis, with the motivation to use satellite information for soil moisture, snow water equivalent and precipitation
Forecasts: ECMWF ensemble prediction system, ECMWF monthly forecast, ECMWF seasonal forecast

2/ STRATEGY:
The climatology forces the hydrological model until the end of the dataset (2002).
The ECMWF analysis forces the hydrological model until the day of the forecast.
The ECMWF forecasts (ensemble prediction system, monthly, seasonal) force VIC for a 10-day (monthly, seasonal) forecast.
The 21 member ensemble from the climatology forces VIC for a 10-day (monthly, seasonal) forecast.
(Presently, I am doing 15 day forecasts using GFS.)
Forecasts will be expressed in quantities or anomalies compared to climatology.
Skill will be assessed once the observations, analysis, or reanalysis are available for that date.

3/ DATA PROCESSING:
Both the ECMWF analysis and the forecasts need to be bias corrected with respect to the climatology, following the procedure explained for the retrospective forecasts. Either a fit or linear interpolation will be used to derive the p's from the cdf's, and the corresponding $C_{LS}$ values. The ECMWF analysis would be bias corrected at the same scale as the ECMWF reanalysis.
Both the ECMWF analysis and the forecasts need to be downscaled to 0.5 degree, in the same way as explained in the retrospective forecast section.

___________________________________________________________________________________

CHECKS TO BE MADE/FURTHER INVESTIGATIONS:
- Use of MOS
- Check that the precipitation time series of the ECMWF reanalysis looks consistent over the chosen period (since instruments evolved).
- Check the skill of the forecasts:
  . specify a skill score in the forecast climatology
  . specify a score threshold below which making a streamflow forecast using the ensemble is not useful at all: do a one to one plot between the streamflow forecast score and the precipitation forecast score to determine the real minimum useful forecast score.
  . depending on the skill score, a different number of ensemble members (randomly selected) might be used. Or the forecast will be the climatology (force VIC with 15 days starting on the day of the forecast, for each of the 21 years of the climatology).
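
ILLUSTRATION: PERCENTILE MAPPING FOR THE BIAS CORRECTION
The bias correction used in both the retrospective and the real-time parts (find the percentile of the forecast value on the forecast cdf, then read the climatology value at that percentile by linear interpolation between the two closest p's) can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual processing code: all function names (`weibull_cdf`, `percentile_of`, `value_at`, `bias_correct`) and the synthetic samples are hypothetical, values outside the sampled range are simply clamped to the end points, and the special handling of zero precipitation is left out.

```python
# Sketch of the percentile mapping; all names and data here are hypothetical.
import bisect
import random


def weibull_cdf(values):
    """Sorted values and Weibull percentiles p = i / (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered, [i / (n + 1) for i in range(1, n + 1)]


def percentile_of(value, ordered, probs):
    """Non exceedance probability of value, linearly interpolated on the cdf."""
    if value <= ordered[0]:
        return probs[0]
    if value >= ordered[-1]:
        return probs[-1]
    j = bisect.bisect_right(ordered, value)  # ordered[j-1] <= value < ordered[j]
    w = (value - ordered[j - 1]) / (ordered[j] - ordered[j - 1])
    return probs[j - 1] + w * (probs[j] - probs[j - 1])


def value_at(p, ordered, probs):
    """Inverse cdf: interpolate between the two closest p's around p."""
    if p <= probs[0]:
        return ordered[0]
    if p >= probs[-1]:
        return ordered[-1]
    j = bisect.bisect_right(probs, p)  # probs[j-1] <= p < probs[j]
    w = (p - probs[j - 1]) / (probs[j] - probs[j - 1])
    return ordered[j - 1] + w * (ordered[j] - ordered[j - 1])


def bias_correct(f_ls, fcst_sample, clim_sample):
    """Map a large scale forecast value onto the climatology cdf."""
    f_sorted, f_p = weibull_cdf(fcst_sample)
    c_sorted, c_p = weibull_cdf(clim_sample)
    return value_at(percentile_of(f_ls, f_sorted, f_p), c_sorted, c_p)


# Synthetic temperature example: the "forecast" is the climatology plus 2 degrees.
rng = random.Random(1)
clim = [rng.gauss(15.0, 4.0) for _ in range(225)]
fcst = [c + 2.0 for c in clim]

corrected = bias_correct(fcst[10], fcst, clim)
print(fcst[10], corrected, clim[10])
```

In this synthetic case the forecast has a constant warm bias, so the mapping returns the matching climatology value and the bias is removed. For a real-time forecast value that is not part of the forecast climatology, `percentile_of` does the interpolation on the forecast side as well.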