Report on the approach for realtime and retrospective forecasts

STREAMFLOW FORECASTING AT A GLOBAL SCALE
Nathalie Voisin
A/ RETROSPECTIVE MEDIUM-RANGE STREAMFLOW FORECASTING AT A GLOBAL SCALE
1/ DATASETS
Climatology is ECMWF reanalysis, ERA40:
1980-2002 period (after TOVS start)
Daily dataset
T2M, 10U, 10V, TP every 6 hours
1.125 degree resolution at the Equator (Gaussian grid)
(Note that NCEP reanalysis could also be used, if needed)
Downscaling reference (spatial variability) at 0.5 degree resolution, Adam et al. (2006):
1979-1999 period
Monthly dataset, disaggregated to daily using statistics
Originally at 2 degree resolution, downscaled to 0.5 degree using inverse distance square
weighting interpolation of Co-op stations, with a first-order Markov chain for the occurrence of rain and
a two-parameter gamma distribution for its intensity, the parameters being derived by
interpolation from station data (Nijssen et al., 2001). This precipitation dataset is corrected for gauge
undercatch and orography. A first-order autoregressive model was used for temperature.
Statistical parameters were derived from station observations, or interpolated from
parameters derived from observations in nearby regions.
Tmin, Tmax, WND, TP
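The stochastic daily disaggregation described above (first-order Markov chain for rain occurrence, two-parameter gamma distribution for intensity) can be illustrated with a minimal sketch. The function name, the transition probabilities and the gamma parameters are hypothetical placeholders, not values from Adam et al. (2006) or Nijssen et al. (2001).

```python
import random

def generate_daily_precip(n_days, p_wet_given_dry, p_wet_given_wet,
                          shape, scale, seed=0):
    """Daily precipitation from a first-order Markov chain (occurrence)
    and a two-parameter gamma distribution (wet-day intensity)."""
    rng = random.Random(seed)
    series = []
    wet = False  # assumption: the day before the series starts was dry
    for _ in range(n_days):
        # The probability of rain today depends only on yesterday's state.
        p_wet = p_wet_given_wet if wet else p_wet_given_dry
        wet = rng.random() < p_wet
        series.append(rng.gammavariate(shape, scale) if wet else 0.0)
    return series

# Illustrative parameters only (not fitted to any station).
precip = generate_daily_precip(30, 0.3, 0.6, shape=0.8, scale=5.0)
```

In practice the four parameters would vary by cell and by month, as the text indicates they are derived from station data.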
Forecasts are presently GFS reforecasts:
1979-2005
Daily dataset
2T, 10U, 10V, TP every 12 hours
2.5 degree resolution
15 ensemble members, based on 7 pairs of different initial conditions, and a control run
initialized with NCEP-NCAR reanalysis.
Validation: NMC reanalysis
Note that later on we would like to use ECMWF medium range, monthly and seasonal forecasts.
2/ STRATEGY:
- The climatology dataset forces the hydrological model VIC until the day of the forecast (spin-up and
initial conditions).
- VIC is then forced for a 15-day period with
o the climatology (control run for later analysis)
o the 15-member ensemble forecast
o a 21-member ensemble of 15-day “forecasts”, where each member is the 15-day climatology
starting on the day of the forecast, for each of the (22-1=) 21 years of the climatology.
- Outputs will be analyzed:
* skill at different lead times (1, 3, 7, 10 and 15 days) with respect to the climatology VIC
run and to observations where available
* at different locations, with different climates, and for specific events (low flow, high
flow and average flow periods)
* skill will be measured with different statistical tools (Brier score and others) in order to
measure the accuracy of the streamflow forecasts (peak magnitude, timing).
* with the medium-range forecasts, the emphasis will be mainly on floods, while seasonal
forecasts will allow us to look at both floods and droughts (streamflow and soil moisture).
* where no stations are available for comparison, results will be presented in terms of
anomalies and percentiles.
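As an illustration of the Brier score mentioned in the list above, the sketch below turns an ensemble of streamflow values into a probability of exceeding a threshold and scores it against binary outcomes. The function names and the choice of a flow threshold as the event definition are assumptions made for this example.

```python
def exceedance_probability(ensemble, threshold):
    """Fraction of ensemble members strictly above the threshold."""
    return sum(1 for member in ensemble if member > threshold) / len(ensemble)

def brier_score(forecast_probs, outcomes):
    """Mean squared difference between forecast probabilities and
    observed outcomes (1 if the event occurred, 0 otherwise)."""
    if len(forecast_probs) != len(outcomes):
        raise ValueError("forecast_probs and outcomes must have equal length")
    squared = [(p - o) ** 2 for p, o in zip(forecast_probs, outcomes)]
    return sum(squared) / len(squared)

# Two forecast dates, hypothetical flows in arbitrary units.
probs = [exceedance_probability([1.0, 2.0, 3.0, 4.0], 2.5),
         exceedance_probability([0.5, 0.7, 0.9, 1.1], 2.5)]
score = brier_score(probs, [1, 0])
```

A score of 0 is a perfect forecast; a constant probability of 0.5 gives 0.25 whatever happens.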
3/ DATA PROCESSING:
The climatology needs to be downscaled to 0.5 degree.
The forecasts need to be bias corrected to match the climatology statistics, and then downscaled to 0.5
degree.
The climatology statistics to be conserved in the forecasts are:
- the frequency of occurrence of rain
- the amplitude of the signals (intensity range)
- any systematic bias (average precipitation)
Model Output Statistics (MOS) could also be used to correct the forecasts statistically (to be
investigated).
4/ BIAS CORRECTION OF THE FORECASTS:
4.1 CDF of the forecasts:
Each ensemble member has a specific set of initial-condition anomalies. The forecast climatology is a
cumulative distribution function (cdf) for each cell (about 2445), for each day of the year (DOY), for each
lead time, and over a 9-day window in order to increase the number of data.
Each cdf is therefore based on 25 years times 9 days = 225 values.
$\overline{F}_{LS} = \overline{F}_{LS}(15\ \text{members} \times 366\ \text{days} \times 15\ \text{lead times} \times 2445\ \text{cells})$
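The pooling of a 9-day window over 25 years into one 225-value sample can be sketched as follows. The nested-dictionary layout and the function names are hypothetical. For simplicity, the window wraps around within the same year at the year boundaries, which may differ from the actual processing.

```python
def window_days(doy, half_width=4, days_in_year=366):
    """1-based days of year in the window centred on doy,
    wrapping around the end of the year."""
    return [((doy - 1 + k) % days_in_year) + 1
            for k in range(-half_width, half_width + 1)]

def pooled_sample(values_by_year, doy, half_width=4):
    """Sorted sample for one cell, member and lead time.

    values_by_year: {year: {doy: value}} (hypothetical layout).
    """
    sample = []
    for year_values in values_by_year.values():
        for d in window_days(doy, half_width):
            if d in year_values:  # day 366 is absent in non-leap years
                sample.append(year_values[d])
    return sorted(sample)

# 25 synthetic years with a value for each of the 366 days.
years = {1979 + y: {d: float(d) for d in range(1, 367)} for y in range(25)}
sample = pooled_sample(years, doy=1)
```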
Technical note / choice:
Having a climatology for each ensemble member makes it possible to correct specifically the systematic bias
due to a specific deviation in the initial conditions. The corrected ensemble member forecasts should be
closer to each other (less spread out). The uncertainty in the bias correction is minimized by doing so.
Brief reminder on the cdf, for this specific case (9-day window):
- Rank the 225 values (assume they are independent and could have occurred in any
year); the rank is i, with n = 225.
These values are not strictly independent, as a weather system can last an entire week. n could be
divided by at least 3 and the cdf then smoothed, or we accept this approximation. If a one-day
extreme occurs, the approximation underestimates the probability of occurrence (and the
non-exceedance probability). If a multiple-day extreme occurs, the approximation
overestimates the probability of occurrence. With 225 occurrences, we can assume that the
approximation of independent data is reasonable (say 1 extreme in 25 years, which would
induce a 3/225 (0.013) bias in the probability).
- Use the Weibull percentiles: p = i / (n+1)
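The ranking step and the Weibull percentiles p = i / (n+1) can be written as a short sketch. The function name is hypothetical.

```python
def weibull_cdf(sample):
    """Return the sorted values and their Weibull non-exceedance
    probabilities p = i / (n + 1), with i the 1-based rank."""
    values = sorted(sample)
    n = len(values)
    probs = [i / (n + 1) for i in range(1, n + 1)]
    return values, probs

values, probs = weibull_cdf([3.0, 1.0, 2.0])
```

With n = 225 the probabilities run from 1/226 to 225/226, so the empirical cdf never reaches exactly 0 or 1.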
Note that by using all 25 years of data, the bias correction result is better than if it
used 24 years (excluding the 9-day window centered on the day of the forecast to be bias corrected), as
would happen in a real-time process. The heavy computation already required to derive the
15*366*15*2445 ≈ 2.0 × 10^8 climatologies motivates us to accept this approximation. I could do it by
continent, but I need to read the entire file anyway (15 days, 15 ensemble members, 2445 cells for each day),
and I cannot store the 25 years' worth of data on our system (barely one year). The climatologies are
computed by doy: I download the 9-day window of the doy for 25 years, then I compute the cdf for each cell,
then I remove the files. It should take about 10-12 days to compute the climatology of the forecasts
because of the time it takes to download the files (365 days * 2 in 24 hours).
The final product is the cdf of the forecast, for each cell, lead time, member, and variable (precipitation,
temperature and wind).
4.2 CDF of Climatology
The present climatology (ERA40) has a resolution of 1.125 degree at the equator (Gaussian grid)
and therefore needs to be regridded to 2.5 degree resolution using inverse distance square weighting
interpolation (Symap algorithm).
The cdf is derived as explained above, for the ERA40 reanalysis.
$\overline{CLIM}_{LS} = \overline{CLIM}_{LS}(366\ \text{days} \times 2445\ \text{cells})$
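The inverse distance square weighting used for the regridding can be illustrated with the sketch below. It is a plain inverse-distance-squared average using planar distances in degrees, not the full Symap algorithm, and the function name and data layout are assumptions.

```python
def idw_squared(target, points, eps=1e-12):
    """Inverse-distance-squared weighted average at a target location.

    target: (lat, lon) of the new grid cell centre.
    points: list of ((lat, lon), value) from the source grid.
    Distances are planar and in degrees (a simplification).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (lat, lon), value in points:
        d2 = (lat - target[0]) ** 2 + (lon - target[1]) ** 2
        if d2 < eps:
            return value  # target coincides with a source point
        weight = 1.0 / d2
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# A source point at distance 1 (value 0) and one at distance 2 (value 10).
estimate = idw_squared((0.0, 0.0), [((0.0, 1.0), 0.0), ((0.0, 2.0), 10.0)])
```

The nearer point carries four times the weight of the farther one, so the estimate is pulled towards it.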
4.3 Bias correction
Let $F_{LS}$ be the forecast for a specific doy, cell, member, and lead time at a resolution of 2.5
degree (large scale).
Let $\overline{F}_{LS}$ be the forecast climatology for that same doy, cell, member and lead time, at 2.5 degree resolution.
On the cdf of $\overline{F}_{LS}$, the value of $F_{LS}$ corresponds to a certain percentile P (non-exceedance probability).
This percentile corresponds to a value $C_{LS}$ on the cdf of $\overline{CLIM}_{LS}$ (for that same cell, but at doy + lead
time). $C_{LS}$ is the bias corrected forecast value for that doy, cell, member and lead time, at a resolution of
2.5 degree.
Note on finding P:
Because this is a retrospective analysis and all the values were included in the forecast climatology, $F_{LS}$
is itself one of the ranked values, so no fitting or interpolation is needed to find P: find the rank of $F_{LS}$ in the
forecast climatology and deduce P. Then, in the ERA40 climatology, find the two closest p’s below and above that P, and linearly
interpolate the corresponding CLIM values to deduce $C_{LS}$.
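The percentile matching described in this section can be sketched as below, assuming both climatologies are available as sorted samples for the relevant cell (forecast climatology at the forecast doy, member and lead time; ERA40 climatology at doy + lead time). The function names are hypothetical, and clamping at the two ends of the climatology cdf is an assumption of this sketch.

```python
from bisect import bisect_left

def non_exceedance(sorted_sample, value):
    """Weibull probability of a value assumed to belong to the sample."""
    rank = bisect_left(sorted_sample, value) + 1  # ties take the lowest rank
    return rank / (len(sorted_sample) + 1)

def quantile(sorted_sample, p):
    """Value at non-exceedance probability p, by linear interpolation
    between the two neighbouring Weibull positions (clamped at the ends)."""
    n = len(sorted_sample)
    pos = p * (n + 1)  # fractional 1-based rank
    if pos <= 1:
        return sorted_sample[0]
    if pos >= n:
        return sorted_sample[-1]
    lower = int(pos)
    frac = pos - lower
    below, above = sorted_sample[lower - 1], sorted_sample[lower]
    return below + frac * (above - below)

def bias_correct(f_ls, forecast_clim, era40_clim):
    """Map a forecast value to the climatology value with the same P."""
    return quantile(era40_clim, non_exceedance(forecast_clim, f_ls))

c_ls = bias_correct(3.0, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

Here the forecast value 3.0 has rank 3 of 4, so P = 0.6, which maps to 30.0 in the second sample.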
Note on the rain occurrence:
The forecast model may underestimate low precipitation values. In this case a forecast
value $F_{LS}$ of zero precipitation corresponds to a non-exceedance probability ranging from 0 to, say, p*,
which corresponds to a climatology value range of 0 to $C_{LS}^*$. In this case (considered to exist if (p* - pclim*) is
larger than 0.01, for example), a percentile p is drawn from a uniform distribution (e.g. with the rand function
in C, seeded with srand), and $C_{LS}$ is then derived.
On the other hand, if the forecast climatology overestimates low precipitation values, the forecast will simply
be assigned a value $C_{LS}$ of zero.
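One possible reading of the rain occurrence rule is sketched below. It assumes that p* and pclim* are the Weibull probabilities of the last zero in the forecast and ERA40 climatologies respectively, and that p is drawn uniformly on [0, p*]. Both are interpretations of the text rather than confirmed details, and all names are hypothetical.

```python
import random

def zero_probability(sorted_sample):
    """Weibull probability of the last zero in the sample (0 if none)."""
    n_zero = sum(1 for v in sorted_sample if v == 0.0)
    return n_zero / (len(sorted_sample) + 1)

def quantile(sorted_sample, p):
    """Linear interpolation between Weibull positions, clamped at the ends."""
    n = len(sorted_sample)
    pos = p * (n + 1)
    if pos <= 1:
        return sorted_sample[0]
    if pos >= n:
        return sorted_sample[-1]
    lower = int(pos)
    below, above = sorted_sample[lower - 1], sorted_sample[lower]
    return below + (pos - lower) * (above - below)

def correct_dry_forecast(forecast_clim, era40_clim, rng, tolerance=0.01):
    """Bias corrected value for a forecast of zero precipitation."""
    p_star = zero_probability(forecast_clim)
    p_clim_star = zero_probability(era40_clim)
    if p_star - p_clim_star <= tolerance:
        return 0.0  # forecast is not too dry: zero stays zero
    p = rng.uniform(0.0, p_star)  # assumed range of the random percentile
    return quantile(era40_clim, p)

dry_model = [0.0] * 8 + [5.0]                          # p* = 0.8
era40 = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # pclim* = 0.2
value = correct_dry_forecast(dry_model, era40, random.Random(42))
```

Because part of the range [0, p*] still falls on the dry portion of the climatology cdf, some zero forecasts remain zero after correction.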
5/ DOWNSCALING OF THE CLIMATOLOGY AND THE FORECAST
The Adam et al. (2006) precipitation dataset and the corresponding Nijssen et al. (2001) temperature and wind
datasets were downscaled from 2 degree to 0.5 degree spatial resolution using interpolation
between stations when available, and a stochastic model where data were sparse, along with an
adjustment for orography. The statistics of spatial variability at 0.5 degree of the Adam et al. (2006) and
Nijssen et al. (2001) datasets can only be better than a “simple” disaggregation from 2.5 degree down to 0.5 degree.
The Adam et al. / Nijssen et al. datasets $X_{DS}$ are being regridded to 2.5 degree ($X_{LS}$) and 1 degree ($X_{1deg}$)
resolutions.
5.1 Downscaling of the climatology.
The climatology is originally at 1.125 degree (at the equator, Gaussian grid). For simplicity, the
climatology is regridded to a 1 degree resolution lat/lon grid. This creates a slight averaging at the equator
and does not change anything at higher latitudes. $CLIM_{LS}$ becomes $CLIM_{1deg}$.
For a specific doy and cell, a random year in the Adam et al. / Nijssen et al. dataset is chosen ($X_{1deg}$).
In this approach, the large scale value is that of the climatology, and the downscaled value is the
large scale value to which a random spatial variability anomaly is added.
For temperature and wind:
$CLIM_{DS} = CLIM_{1deg} + (X_{DS} - X_{1deg})$, where $(X_{DS} - X_{1deg})$ is the spatial variability
For precipitation:
$CLIM_{DS} = CLIM_{1deg} \cdot \frac{X_{DS}}{X_{1deg}}$, where $\frac{X_{DS}}{X_{1deg}}$ is the spatial variability
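The two downscaling rules, an additive anomaly for temperature and wind and a multiplicative ratio for precipitation, can be sketched as follows. The function names are hypothetical, and the fallback when the large scale reference precipitation is zero is an assumption, since the text does not cover that case.

```python
def downscale_additive(large_scale_value, x_ds, x_ls):
    """Temperature and wind: add the spatial anomaly (x_ds - x_ls)."""
    return large_scale_value + (x_ds - x_ls)

def downscale_multiplicative(large_scale_value, x_ds, x_ls):
    """Precipitation: scale by the spatial ratio x_ds / x_ls."""
    if x_ls == 0.0:
        # Assumption: with no reference rain the ratio is undefined,
        # so the large scale value is passed through unchanged.
        return large_scale_value
    return large_scale_value * x_ds / x_ls

temperature = downscale_additive(10.0, x_ds=12.0, x_ls=11.0)
precipitation = downscale_multiplicative(4.0, x_ds=3.0, x_ls=2.0)
```

The same two functions apply to the forecasts, with the 2.5 degree reference in place of the 1 degree one.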
5.2 Downscaling of the Forecast
The downscaling of the forecasts is very similar to the climatology, but at the 2.5 degree
resolution (large scale), and separate for each ensemble member.
$C_{DS} = C_{LS} + (X_{DS} - X_{LS})$ for temperature and wind
$C_{DS} = C_{LS} \cdot \frac{X_{DS}}{X_{LS}}$ for precipitation
This is a different process from the Wood et al. (2002) downscaling approach. Wood et al. (2002) used the
same dataset (based on observations) both to perform the bias correction at 2.5 degree and to downscale
to 1/8th of a degree; the downscaled (Symap interpolation) forecast anomaly is added to the small scale
observation, while the downscaled observation is corrected for a large scale anomaly (see below).
Wood et al. (2002) approach:
$F_{DS} = O_{DS,RAND} - (O_{LS,RAND} - \overline{O}_{LS}) + (O_{LS} - \overline{O}_{LS}) = O_{DS,RAND} - \Delta_{rand,LS} + \Delta_{FCST,LS}$, where
$O_{LS}$ is the bias corrected forecast value (specific doy, member, cell)
$\overline{O}_{LS}$ is the observation mean for the specific doy and cell
$\Delta_{FCST,LS}$ is the forecast anomaly
$\Delta_{rand,LS}$ is the large scale random doy anomaly
5.3 Downscaling of GFS
GFS forecasts provide average temperature only, not minimum and maximum temperatures. In the downscaling
process for temperature, $(Tmin - Tavg)_{DS}$ and $(Tmax - Tavg)_{DS}$ will be additional anomalies added in
the equation:
$Tmin_{FCST,DS} = Tavg_{FCST,LS} + (Tavg_{OBS,DS} - Tavg_{OBS,LS}) + (Tmin_{OBS,DS} - Tavg_{OBS,DS}) = Tavg_{FCST,LS} + (Tmin_{OBS,DS} - Tavg_{OBS,LS})$
and similarly for Tmax.
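The Tmin relation can be checked numerically with a short sketch. The function name and the temperatures are illustrative only.

```python
def downscale_tmin(tavg_fcst_ls, tavg_obs_ds, tavg_obs_ls, tmin_obs_ds):
    """Downscaled forecast Tmin from a large scale forecast Tavg."""
    spatial_anomaly = tavg_obs_ds - tavg_obs_ls   # 0.5 degree minus large scale
    diurnal_anomaly = tmin_obs_ds - tavg_obs_ds   # Tmin minus Tavg at 0.5 degree
    return tavg_fcst_ls + spatial_anomaly + diurnal_anomaly

# Illustrative values in degrees Celsius.
tmin = downscale_tmin(tavg_fcst_ls=15.0, tavg_obs_ds=12.0,
                      tavg_obs_ls=10.0, tmin_obs_ds=7.0)
```

The observed 0.5 degree average cancels out, so the result equals the forecast average plus (observed 0.5 degree Tmin minus observed large scale Tavg): 15 + (7 - 10) = 12.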
_______________________________________________________________________________
B/ REAL-TIME MEDIUM-RANGE STREAMFLOW FORECASTING AT A GLOBAL SCALE
1/ DATASETS:
Climatology is the same
Data assimilation: ECMWF analysis, motivation to use satellite information for soil moisture, snow
water equivalent and precipitation
Forecasts: ECMWF ensemble prediction system, ECMWF monthly forecast, ECMWF seasonal forecast
2/ STRATEGY:
The climatology forces the hydrological model until the end of the dataset (2002).
The ECMWF analysis forces the hydrological model until the day of the forecast.
The ECMWF forecasts (ensemble prediction system, monthly, seasonal) force VIC for a 10-day forecast
(or a monthly or seasonal one).
The 21-member ensemble from the climatology forces VIC for a 10-day forecast (or a monthly or seasonal one).
(Presently, I am doing 15-day forecasts using GFS.)
Forecasts will be expressed as quantities or as anomalies compared to the climatology. Skill will be assessed
once the observations, analysis or reanalysis are available for that date.
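The construction of the climatology ensemble (one member per climatology year, each starting on the calendar day of the forecast) can be sketched as below. Excluding the forecast year is one reading of the "22-1" count, and the year range in the example is a placeholder chosen only to give 21 members. The function name is hypothetical.

```python
from datetime import date, timedelta

def climatology_ensemble(forecast_date, first_year, last_year, n_days=15):
    """Return {year: [dates]} with, for every climatology year except the
    forecast year, the n_days starting on the same calendar day."""
    members = {}
    for year in range(first_year, last_year + 1):
        if year == forecast_date.year:
            continue  # assumption: the forecast year itself is left out
        try:
            start = forecast_date.replace(year=year)
        except ValueError:
            start = date(year, 2, 28)  # 29 February in a non-leap year
        # Windows starting late in December run into the following year.
        members[year] = [start + timedelta(days=k) for k in range(n_days)]
    return members

members = climatology_ensemble(date(1990, 7, 1), 1980, 2001)
```

Each list of dates would then select the climatology forcing fed to VIC for that member.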
3/ DATA PROCESSING:
Both the ECMWF analysis and the forecasts need to be bias corrected with respect to the climatology,
following the procedure explained in the retrospective forecast. Either a fit or linear interpolation will be
used to derive the p’s from the cdf’s, and corresponding CLS’s.
ECMWF analysis would be bias corrected at the same scale as the ECMWF reanalysis.
Both the ECMWF analysis and the forecasts need to be downscaled to 0.5 degrees, the same way as
explained in the retrospective forecast section.
___________________________________________________________________________________
CHECKS TO BE MADE/FURTHER INVESTIGATIONS:
- use of MOS
- Check that the precipitation time series of ECMWF reanalysis looks consistent over the chosen period
(since instruments evolved).
- Check skill of forecasts:
. specify a skill score in the forecast climatology
. specify a score threshold below which making a streamflow forecast using the ensemble is not
useful at all: do a one-to-one plot of the streamflow forecast score against the precipitation forecast
score to determine the real minimum useful forecast score.
. depending on the skill score, a different number of ensemble members (randomly selected) might be
used. Otherwise the forecast will be the climatology (force VIC with 15 days starting on the day of forecast, for
each of the 21 years of the climatology).