A Comparison of Four Streamflow Record Extension Techniques

WATER RESOURCES RESEARCH, VOL. 18, NO. 4, PAGES 1081-1088, AUGUST 1982
ROBERT M. HIRSCH
U.S. Geological Survey, Reston, Virginia 22092
One approach to developing time series of streamflow, which may be used for simulation and optimization studies of water resources development activities, is to extend an existing gage record in time by exploiting the interstation correlation between the station of interest and some nearby (long-term) base station. Four methods of extension are described, and their properties are explored. The methods are regression (REG), regression plus noise (RPN), and two new methods, maintenance of variance extension types 1 and 2 (MOVE.1, MOVE.2). MOVE.1 is equivalent to a method which is widely used in psychology, biometrics, and geomorphology and which has been called by various names, e.g., 'line of organic correlation,' 'reduced major axis,' 'unique solution,' and 'equivalence line.' The methods are examined for bias and standard error of estimate of moments and order statistics, and an empirical examination is made of the preservation of historic low-flow characteristics using 50-year-long monthly records from seven streams. The REG and RPN methods are shown to have serious deficiencies as record extension techniques. MOVE.2 is shown to be marginally better than MOVE.1, according to the various comparisons of bias and accuracy.
INTRODUCTION
Current practice in many aspects of water resources planning and management involves the use of hydrologic time series to simulate the outcome of decisions. The decisions may include waste treatment plant designs, establishment of operating policies for water supply systems, hydropower production scheduling, entry into river compacts and intergovernmental agreements, and construction of water storage and withdrawal works. The kinds of outcomes that may be of interest include frequency and duration of unacceptable water quality conditions, frequency and duration of supply shortfalls for municipal, industrial, or agricultural water users, dependable rate of hydropower production during peak demand periods, frequency and severity of river compact violations, or, more abstractly, the expectation and variance of project benefits or costs. There are several different methods for developing time series for use in simulation. The following is a list of the general categories of methods for developing such time series for streamflows at a single site.
1. Use the historic record of streamflows [Rippl, 1883; Hazen, 1914].
2. Use the historic record of streamflows and extend it in time by exploiting the correlation between flows at the site and concurrent flows at some nearby long-term gage (a base station) [Riggs, 1972; Matalas and Jacobs, 1964; Hirsch, 1979].
3. Reconstruct historic flow records by transposing records from a base station to the site of interest by using some function in which the coefficients are derived from a regional streamflow basin characteristics regression equation [Hirsch, 1979].
4. Generate multiple synthetic streamflow records [Fiering, 1967], where the parameters are based on historic flow values at the site or on regional streamflow characteristics [Benson and Matalas, 1967].
5. Develop and calibrate a conceptual model of the basin and use it to generate a streamflow record by using historical meteorological records as inputs [Crawford and Linsley, 1966].
6. Develop and calibrate a conceptual model of the basin and use it to generate multiple streamflow records by using synthetic meteorological records as inputs [Leclerc and Schaake, 1973].

This paper is not subject to U.S. copyright. Published in 1982 by the American Geophysical Union. Paper number 2W0479.

Each of these categories, and the specific methods in each category, have certain advantages and disadvantages for various applications. The selection of an appropriate method would depend on the relevant time step of analysis (hours, days, weeks, seasons, years, or decades) and the benefits of increased accuracy in estimation of outcomes in comparison to the cost of applying a more complex method. The more historically based methods (categories 1, 2, 3, and 5) have their greatest applicability when the time scales are fine (for example, small storages, run-of-river withdrawals, or water quality analysis). The synthetic methods (categories 4 and 6) have their greatest applicability where the time scales are coarse (for example, large storages for control of multiyear droughts). An additional consideration in the selection of methods is the potential for analysis of errors. In general, this becomes more difficult as the number of parameters and model assumptions grows, and this number grows rapidly as the time step of analysis becomes finer. Another consideration in many cases is the adaptability of the method to changes in watershed characteristics. In general, only those methods that involve conceptual models have this capability.
This paper will focus on methods within the second category and comparison of these with category 1. It is the intent of this paper to evaluate the characteristics of some methods within category 2. This is not done out of a belief that this category is generally superior to the others, but because it may be superior for some uses and is easy to apply and well accepted by many practicing water resource engineers. The evaluation of these techniques will rely on measures of the accuracy of low-flow duration and severity estimates as indicators of the suitability of the techniques for use in water resource system simulations.
In the next section of the paper, four methods of record extension are defined and some of their properties are discussed. A set of Monte Carlo trials is then carried out to evaluate the bias and error in estimating means, variances, and order statistics based on extended records where marginal distributions of flows are normal. Finally, the methods are applied to historical data sets and the results are examined for accuracy in reproducing certain historical order statistics.

HIRSCH: COMPARISON OF STREAMFLOW RECORD EXTENSION TECHNIQUES

PROBLEM DEFINITION

For the base station the flow values are denoted $x(i)$ where $i$ is an index of time. For the short-record station the flow values are denoted $y(i)$. The observed events for the two sequences are represented as

$$x(1), \ldots, x(N_1), x(N_1+1), \ldots, x(N_1+N_2)$$
$$y(1), \ldots, y(N_1)$$

where $N_1$ is the length of the shorter sequence and $(N_1 + N_2)$ is the length of the longer sequence. $N_1$ is also the length of concurrent record. It is not necessary for the two sequences to begin or end simultaneously, nor need the observations be consecutive, but there is no loss of generality if the two sequences are represented as above.

The estimates of the missing values are denoted $\hat{y}(i)$, $i = N_1+1, \ldots, N_1+N_2$, and the complete extended record is denoted $\tilde{y}(i)$, $i = 1, \ldots, N_1, N_1+1, \ldots, N_1+N_2$, where

$$\tilde{y}(i) = y(i), \qquad i = 1, \ldots, N_1$$
$$\tilde{y}(i) = \hat{y}(i), \qquad i = N_1+1, \ldots, N_1+N_2$$

Table 1 gives the naming conventions for the various sample statistics referred to in the report. Much of the notation and some of the derivations used in this paper were developed by Matalas and Jacobs [1964]. There is, however, a fundamental difference between the intent of that paper and the intent of the present one. Matalas and Jacobs were concerned with the quality (bias and variance) of the estimates $m(\tilde{y})$ and $S^2(\tilde{y})$, and it was not their intent to consider methods of producing an extended record $\tilde{y}(i)$. In this paper the goal is the development of this extended record. The properties of statistics of this record, such as $m(\tilde{y})$ and $S^2(\tilde{y})$, are used as measures (but not the only measures) of the quality of the extended record. In the next section, four methods will be presented, and in each case the expectation of $m(\tilde{y})$ and of $S^2(\tilde{y})$ will be evaluated under the assumption that concurrent observations of $x$ and $y$ are stationary, serially independent, and have a bivariate normal probability distribution with parameters $\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$, and $\rho$, where $\mu_x$ and $\sigma_x^2$ denote the population mean and variance for $x$, and $\mu_y$ and $\sigma_y^2$, the population mean and variance, respectively, for $y$. The parameter $\rho$ is the population product moment correlation coefficient, and $\rho\sigma_y/\sigma_x$ is the population value of the slope of the linear regression of $y$ on $x$.

THE FOUR METHODS OF EXTENSION

Regression (REG)

The first method of extension is linear regression. The missing values are filled in by the equation

$$\hat{y}(i) = a + b\,x(i) \qquad (1)$$

where the parameters $a$ and $b$ are those values which minimize

$$Z = \sum_{i=1}^{N_1} \left(\hat{y}(i) - y(i)\right)^2$$

The solution for $a$ and $b$ is found by solving the normal equations [Draper and Smith, 1966, p. 59]. The optimal solution to (1), rearranged for convenience, is

$$\hat{y}(i) = m(y_1) + r\,\frac{S(y_1)}{S(x_1)}\,\left(x(i) - m(x_1)\right) \qquad (2)$$

Matalas and Jacobs [1964] show that $m(\tilde{y})$ is an unbiased estimate of $\mu_y$ but $S^2(\tilde{y})$ is a biased estimate of $\sigma_y^2$ for $\rho^2 < 1.0$. Specifically,

$$E[m(\tilde{y})] = \mu_y$$
$$E[S^2(\tilde{y})] = \sigma_y^2 \left\{ 1 - \frac{(1 - \rho^2)\,N_2\,(N_1 - 4)}{(N_1 + N_2 - 1)(N_1 - 3)} \right\}$$

Table 2 gives some example values of $E[S^2(\tilde{y})]/\sigma_y^2$ for various combinations of $\rho$, $N_1$, and $N_2$. Given that a common purpose of record extension is the evaluation of the severity and duration of hydrologic extremes, this consistent underestimation of variance is an alarming feature of REG. It is in fact the intent of each of the following three methods to eliminate (or at least minimize) this bias in the variance.
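The size of the REG variance bias can be checked directly from the expression for $E[S^2(\tilde{y})]$ given above for REG. The following is a minimal sketch, not part of the original paper; the function name `reg_variance_ratio` is an illustrative assumption, and the formula is only meaningful under the stated bivariate normal, serially independent assumptions with $N_1 > 4$.

```python
def reg_variance_ratio(rho, n1, n2):
    """Ratio E[S^2(extended record)] / sigma_y^2 for the REG method.

    rho : population correlation between base and short-record station
    n1  : length of the concurrent record
    n2  : number of values filled in by regression
    (Hypothetical helper; assumes the bivariate normal setting of the text.)
    """
    shrink = (1.0 - rho**2) * n2 * (n1 - 4) / ((n1 + n2 - 1) * (n1 - 3))
    return 1.0 - shrink


if __name__ == "__main__":
    # Same (N1, N2) and rho combinations as listed in Table 2.
    for n1, n2 in [(10, 10), (20, 20), (30, 30), (50, 10), (10, 50)]:
        row = [round(reg_variance_ratio(rho, n1, n2), 2) for rho in (0.5, 0.7, 0.9)]
        print(n1, n2, row)
```

The ratio falls furthest below 1 when the concurrent record is short, the extension is long, and the correlation is weak, which is the situation in which record extension is most tempting.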
TABLE 1. Definitions of Sample Statistics

Statistic            Definition
                     Sample mean of
$m(x_1)$             $x(1), \ldots, x(N_1)$
$m(x_2)$             $x(N_1+1), \ldots, x(N_1+N_2)$
$m(x)$               $x(1), \ldots, x(N_1), x(N_1+1), \ldots, x(N_1+N_2)$
$m(y_1)$             $y(1), \ldots, y(N_1)$
$m(\tilde{y})$       $y(1), \ldots, y(N_1), \hat{y}(N_1+1), \ldots, \hat{y}(N_1+N_2)$
                     Sample variance of*
$S^2(x_1)$           $x(1), \ldots, x(N_1)$
$S^2(x_2)$           $x(N_1+1), \ldots, x(N_1+N_2)$
$S^2(x)$             $x(1), \ldots, x(N_1), x(N_1+1), \ldots, x(N_1+N_2)$
$S^2(y_1)$           $y(1), \ldots, y(N_1)$
$S^2(\tilde{y})$     $y(1), \ldots, y(N_1), \hat{y}(N_1+1), \ldots, \hat{y}(N_1+N_2)$
                     Product moment correlation coefficient of
$r$                  $x(1), \ldots, x(N_1)$ and $y(1), \ldots, y(N_1)$

*Variance computed with sample size minus 1 as the divisor.

TABLE 2. Values of $E[S^2(\tilde{y})]/\sigma_y^2$ Using the Regression (REG) Method of Record Extension

N1    N2    rho = 0.5    rho = 0.7    rho = 0.9
10    10    0.66         0.77         0.91
20    20    0.64         0.75         0.91
30    30    0.63         0.75         0.91
50    10    0.88         0.92         0.97
10    50    0.46         0.63         0.86

Regression Plus Independent Noise (RPN)

Matalas and Jacobs [1964] demonstrated that unbiased estimates of mean and variance are achieved if the following equation is used to calculate $\hat{y}(i)$:

$$\hat{y}(i) = m(y_1) + r\,\frac{S(y_1)}{S(x_1)}\,\left(x(i) - m(x_1)\right) + \alpha\,(1 - r^2)^{1/2}\,S(y_1)\,e(i) \qquad (3)$$

where $e(i)$ is a normal independent random variable with zero mean and unit variance and

$$\alpha^2 = \frac{N_2\,(N_1 - 4)(N_1 - 1)}{(N_2 - 1)(N_1 - 3)(N_1 - 2)}$$

This procedure of adding independent noise to regression estimates [Matalas and Jacobs, 1964, p. E4]

... is not too appealing. Independent studies of the same sequence of $x$ and $y$ by several investigators lead to different values of [$m(\tilde{y})$, $S^2(\tilde{y})$ and $\hat{y}(i)$, $i = N_1+1, \ldots, N_1+N_2$], because the same sequence of pseudo-random numbers is unlikely to be used by the investigators.

The estimates $\hat{y}(i)$ are hybrids: a weighted sum of the historically observed random variable $x(i)$ and an unrelated, computer generated random variable, $e(i)$. One may feel uncomfortable making a management decision when only one realization of these series of random numbers is used, and yet the interpretation of results from multiple realizations would be, at best, ambiguous because the realizations would not be independent of each other. Nevertheless, the RPN method does have the desired properties of an unbiased mean and variance, and it has found use in practice [Beard et al., 1970].

Maintenance of Variance Extension, Type 1 (MOVE.1)

An alternative to the RPN approach is to specify that the extension equation must be of the form given in (1) but that $a$ and $b$ are to be set not to minimize squared errors, but rather to maintain the sample mean and variance. The idea which led to the development of MOVE.1 was to find some values of $a$ and $b$ in (1) which satisfy the following two equalities:

$$\sum_{i=1}^{N_1} \hat{y}(i) = \sum_{i=1}^{N_1} y(i)$$
$$\sum_{i=1}^{N_1} \left(\hat{y}(i) - m(y_1)\right)^2 = \sum_{i=1}^{N_1} \left(y(i) - m(y_1)\right)^2$$

One such solution is

$$\hat{y}(i) = m(y_1) + \frac{S(y_1)}{S(x_1)}\,\left(x(i) - m(x_1)\right) \qquad (4)$$

In spite of the obvious similarity between (2) and (4), it should be recognized that they arise from completely different motivations. For the former it is the minimization of squared errors of $\hat{y}(i)$; for the latter it is the desire for the sample mean and variance of the $\hat{y}(i)$ to equal the sample mean and variance of the $y(i)$ for $i = 1, \ldots, N_1$. When this equation is used for record extension it can be shown that

$$E[m(\tilde{y})] = \mu_y \qquad (5a)$$
$$E[S^2(\tilde{y})] = \frac{\sigma_y^2}{N_1 + N_2 - 1}\left[N_1 - 1 + N_2\,\frac{N_1 - 1 - 2\rho^2}{N_1 - 3}\right] \qquad (5b)$$

Table 3 gives values of the ratio $E[S^2(\tilde{y})]/\sigma_y^2$ for various combinations of $\rho$, $N_1$, and $N_2$. It is clear from Tables 2 and 3 that the magnitude of the bias is substantially lower for MOVE.1 than for REG, and that REG underestimates variance, while MOVE.1 overestimates it. Note that $S^2(\tilde{y})$ is an asymptotically unbiased estimator of $\sigma_y^2$ as $N_1 \to \infty$.

It should be noted that the estimates $\hat{y}(i)$ in (4) lie between the estimates of $y(i)$ from a regression of $y$ on $x$ and the estimate of $y(i)$ from the inverse of a regression of $x$ on $y$. This property was noted and discussed in a hydrologic context by Kritskiy and Menkel [1968], who called (4) 'the unique solution.' Till [1973] refers to this line as the reduced major axis and gives the standard error of the slope and intercept estimates; he also gives some applications in geomorphology. Additional information on the mathematics is found in the work by Kruskal [1953]. The first known reference to this line is by Pearson [1901] and discussions of its applications can be found in works by Imbrie [1956] (biometrics) and Greenall [1949] (psychology).

Kirby [1974] has proposed least normal squares, which also results in an equation for a line falling between the regression of $x$ on $y$ and $y$ on $x$. His equation coincides with (4) only where $r = 1.0$ and/or $S(x_1) = S(y_1)$. Neither the line represented by (4) nor Kirby's line coincides with the line which bisects the angle between these two regression lines except where $r = 1.0$ and/or $S(x_1) = S(y_1)$.

TABLE 3. Values of $E[S^2(\tilde{y})]/\sigma_y^2$ Using the Maintenance of Variance Type 1 (MOVE.1) Method of Record Extension

N1    N2    rho = 0.5    rho = 0.7    rho = 0.9
10    10    1.19         1.08         1.03
20    20    1.05         1.03         1.03
30    30    1.03         1.02         1.01
50    10    1.01         1.00         1.00
10    50    1.18         1.12         1.05

Maintenance of Variance Extension, Type 2 (MOVE.2)

In MOVE.1 the only four parameters are the means and variances of $x$ and $y$ estimated from the first $N_1$ observations. In MOVE.2 these same parameters are used, but their estimation is based on more information. The mean and variance estimates for $x$ are based on all $N_1 + N_2$ observations, and the mean and variance estimates for $y$ are based on the historical values of $y$ and on information transfer from the $x$ sequence. The parameters $\hat{\mu}(y)$ and $\hat{\sigma}^2(y)$ were developed by Matalas and Jacobs [1964] and are themselves unbiased estimates of $\mu_y$ and $\sigma_y^2$. The extension equation is

$$\hat{y}(i) = \hat{\mu}(y) + \frac{\hat{\sigma}(y)}{S(x)}\,\left[x(i) - m(x)\right] \qquad (6)$$

where

$$\hat{\mu}(y) = m(y_1) + \frac{N_2}{N_1 + N_2}\,r\,\frac{S(y_1)}{S(x_1)}\,\left(m(x_2) - m(x_1)\right) \qquad (7)$$

$$\hat{\sigma}^2(y) = \frac{1}{N_1 + N_2 - 1}\left[(N_1 - 1)\,S^2(y_1) + (N_2 - 1)\,r^2\,\frac{S^2(y_1)}{S^2(x_1)}\,S^2(x_2) + (N_2 - 1)\,\alpha^2\,(1 - r^2)\,S^2(y_1) + \frac{N_1 N_2}{N_1 + N_2}\,r^2\,\frac{S^2(y_1)}{S^2(x_1)}\,\left(m(x_2) - m(x_1)\right)^2\right] \qquad (8)$$

The complexity of these expressions prevented the discovery of an analytical solution for the bias of the mean and variance. In Monte Carlo experiments of 2000 trials with each of 15 different combinations of $\rho$, $N_1$, and $N_2$, the hypothesis that $E[m(\tilde{y})] = \mu_y$ could not be rejected at the 0.05 significance level. The results for $S^2(\tilde{y})$ are shown in Table 4. These results suggest that for all practical purposes, MOVE.2 satisfies the need for an extension method which produces unbiased values of $m(\tilde{y})$ and $S^2(\tilde{y})$.

TABLE 4. Values of Sample Mean of $S^2(\tilde{y})/\sigma_y^2$ Using the Maintenance of Variance, Type 2 (MOVE.2) Method of Record Extension

N1    N2    rho = 0.5    rho = 0.7    rho = 0.9
10    10    0.99         0.99         1.02*
20    20    0.99         1.00         1.00
30    30    0.99         0.99         1.00
50    10    1.00         0.99         1.01
10    50    0.99         1.01         1.00

Based on 2000 Monte Carlo trials.
*The hypothesis $H_0$: $E[S^2(\tilde{y})]/\sigma_y^2 = 1.00$ is rejected at the 5% level.

MONTE CARLO EVALUATION OF ERRORS

In estimating frequency distributions of streamflows, the hydrologist will not only consider the statistical moments of the sample but also some of the extreme order statistics. If some method of record extension introduces a bias into the value of the more extreme order statistics, this will lead to bias in the estimates of the probability of exceedance of selected extreme values or, conversely, bias in the estimation of distribution quantiles.

Let $\omega$ represent a statistic of a (simulated) historic time series (such as a moment or order statistic) and $\hat{\omega}$ represent the same statistic of a (simulated) extended record. The error in any realization of $\omega$ is $(\omega - E[\omega])$, and the error in any realization of $\hat{\omega}$ is $(\hat{\omega} - E[\omega])$, where $E[\omega]$ is the expectation of the statistic for the sample size and population being considered. The estimated bias of $\hat{\omega}$, $B(\hat{\omega})$, is the average error $(\hat{\omega} - E[\omega])$ over a large number of Monte Carlo trials (2000 trials in this case). The root mean square error for statistic $\omega$ is denoted $R(\omega)$, and for the statistic $\hat{\omega}$ the root mean square error is $R(\hat{\omega})$.

The statistics $\omega$ considered here are $m(y)$, $S^2(y)$, $y_1$, $y_2$, and $y_5$, where $y_k$ is the $k$th order statistic of the series $y(i)$, $i = 1, \ldots, N_1 + N_2$. The statistics $\hat{\omega}$ are $m(\tilde{y})$, $S^2(\tilde{y})$, $\tilde{y}_1$, $\tilde{y}_2$, and $\tilde{y}_5$, where $\tilde{y}_k$ is the $k$th order statistic of the series $\tilde{y}(i)$, $i = 1, \ldots, N_1 + N_2$. The Monte Carlo trials were carried out for $(N_1, N_2)$ values of (10, 50) and (20, 40) and for $\rho$ values of 0.5, 0.7, and 0.9. In all cases the flows are assumed to be bivariate normal with $\mu_x = \mu_y = 0$, $\sigma_x^2 = \sigma_y^2 = 1$, and no autocorrelation. The results of these trials are given in Table 5. For each statistic $\omega$, the null hypothesis that $B(\omega)$ or $B(\hat{\omega})$ equals zero (unbiased) was tested and those cases found significant at the 5% level are marked by the asterisk. The $B(\omega)$ are unbiased by construction.

For REG all of the statistics except $m(\tilde{y})$ appear to be biased for all of the cases considered. The absolute value of the bias decreases with an increase in $\rho$ or an increase in $N_1$. The biases for REG all show the regression towards the mean: the estimated variances are too low, and the low-order statistics are too high. For example, with $N_1 = 10$, $N_2 = 50$, and $\rho = 0.5$, the second order statistic is, on the average, -1.261 rather than -1.935, which is its expectation; thus $B(\tilde{y}_2) = (-1.261) - (-1.935) = 0.674$.

For the other three methods the absolute values of the biases for all $\hat{\omega}$ except $m(\tilde{y})$ are in all cases substantially less than the bias with REG. For RPN there were no cases of significant bias for $S^2(\tilde{y})$, and for the order statistics the bias was either significant and positive or not significant. For MOVE.1 the bias for $S^2(\tilde{y})$ was in all cases positive and significant, and the bias for the order statistics was either negative and significant or not significant. In no case did the results show the bias to be significantly (at the 5% level) different from the value determined by (5b). For MOVE.2, in only one case was there significant bias in $S^2(\tilde{y})$ (negative bias for $N_1 = 20$, $N_2 = 40$, $\rho = 0.5$) and it was not significant at the 2.5% level. The biases in the order statistics were either positive and significant or not significant.

All biases observed in $m(\tilde{y})$ were very small. Two of them were significant, but even if all methods were in fact unbiased in $m(\tilde{y})$, it would not be unreasonable to find two significantly biased values (at the 5% level) out of 24 cases. Neither of these two cases is significant at the 2.5% level.

Thus, in summary, one can conclude that all four methods produce extended records that are unbiased in the mean, that REG substantially reduces variability, MOVE.1 slightly increases variability, and RPN and MOVE.2 preserve the variance but slightly decrease variability in the more extreme events. The method with the lowest value of the root mean square error (RMSE) for all statistics except $m(\tilde{y})$ is MOVE.2. For $m(\tilde{y})$ there is little difference between the RMSE of REG and of MOVE.2. For all methods and all statistics the RMSE decreases with an increase in $N_1$ or an increase in $\rho$.

EMPIRICAL CHECK OF THE METHODS

In the previous sections of this paper, the random variable being considered had very simple and hydrologically unrealistic properties (normal, independent, and without a cyclic component). In this section some real data are used to explore the techniques under conditions of nonnormal distribution, serial dependence, and seasonal cycles. The data used are monthly volume data. Some decisions were necessary on the particulars of how to apply the four techniques to such data.

The first decision was whether or not to use only one extension equation for all months, 12 different ones for the 12 months, or to make some compromise such as two or four seasonal equations. The choice involves a trade-off between greater sample size for estimating the parameters versus the ability to preserve real month to month differences that may exist in the base station to short-record station relationship. The choice made here, based on some experiments with the data, was to use a single extension equation for all of the months in each of the four techniques. This problem has
been explored by Alley and Burns [1981], who provide a means for selecting the best method for any given case.

TABLE 5. Biases $B(\cdot)$ and Root Mean Square Error $R(\cdot)$ From 2000 Monte Carlo Trials

                       B(omega) or B(omega-hat)                           R(omega) or R(omega-hat)
N1  N2  rho  Method    m(y)      S2(y)     y5        y2        y1         m(y)   S2(y)  y5     y2     y1
10  50  0.5  HIST       0.001     0.004    -0.002    -0.002    -0.006     0.131  0.183  0.239  0.332  0.459
10  50  0.5  REG       -0.003    -0.557*    0.603*    0.674*    0.656*    0.294  0.658  0.784  0.871  0.910
10  50  0.5  RPN       -0.005    -0.001     0.030*    0.067*    0.097*    0.306  0.517  0.487  0.616  0.734
10  50  0.5  MOVE.1    -0.009     0.164*   -0.059*   -0.069*   -0.090*    0.334  0.783  0.593  0.758  0.930
10  50  0.5  MOVE.2    -0.004    -0.017     0.044*    0.083*    0.102*    0.292  0.487  0.467  0.590  0.726

10  50  0.7  HIST      -0.001     0.003    -0.000    -0.005    -0.010     0.128  0.184  0.233  0.323  0.442
10  50  0.7  REG        0.004    -0.362*    0.358*    0.447*    0.476*    0.251  0.533  0.578  0.707  0.804
10  50  0.7  RPN        0.003     0.007     0.021*    0.041*    0.057*    0.262  0.458  0.439  0.563  0.705
10  50  0.7  MOVE.1     0.002     0.137*   -0.045*   -0.069*   -0.090*    0.271  0.670  0.501  0.641  0.803
10  50  0.7  MOVE.2     0.004     0.004     0.031*    0.046*    0.058*    0.250  0.445  0.427  0.533  0.670

10  50  0.9  HIST       0.004    -0.002     0.001     0.010     0.013     0.129  0.186  0.235  0.333  0.448
10  50  0.9  REG        0.003    -0.148*    0.129*    0.176*    0.217*    0.186  0.352  0.367  0.491  0.610
10  50  0.9  RPN        0.004    -0.013     0.026*    0.037*    0.050*    0.191  0.346  0.342  0.472  0.589
10  50  0.9  MOVE.1     0.002     0.040*   -0.012    -0.018*   -0.017     0.188  0.382  0.348  0.473  0.603
10  50  0.9  MOVE.2     0.003    -0.011     0.022*    0.033*    0.047*    0.186  0.333  0.340  0.456  0.577

20  40  0.5  HIST       0.002    -0.000    -0.001     0.004     0.001     0.130  0.187  0.245  0.334  0.452
20  40  0.5  REG        0.006    -0.484*    0.476*    0.470*    0.446*    0.208  0.533  0.582  0.624  0.687
20  40  0.5  RPN        0.006    -0.006     0.023*    0.044*    0.067*    0.228  0.347  0.375  0.472  0.586
20  40  0.5  MOVE.1     0.005     0.054*   -0.009    -0.016    -0.014     0.227  0.418  0.386  0.510  0.641
20  40  0.5  MOVE.2     0.005    -0.013*    0.027*    0.049*    0.077*    0.208  0.325  0.341  0.443  0.556

20  40  0.7  HIST      -0.004     0.003    -0.008    -0.003    -0.011     0.128  0.184  0.237  0.334  0.458
20  40  0.7  REG       -0.007*   -0.314*    0.267*    0.327*    0.344*    0.185  0.412  0.423  0.531  0.624
20  40  0.7  RPN       -0.007     0.010    -0.005     0.010     0.026*    0.199  0.330  0.343  0.447  0.560
20  40  0.7  MOVE.1    -0.005     0.052*   -0.027*   -0.022*   -0.031*    0.194  0.375  0.355  0.470  0.604
20  40  0.7  MOVE.2    -0.006     0.003    -0.002     0.022*    0.037*    0.186  0.313  0.328  0.427  0.544

20  40  0.9  HIST       0.002    -0.003     0.006     0.015*    0.005     0.129  0.180  0.240  0.331  0.459
20  40  0.9  REG        0.004    -0.124*    0.099*    0.138*    0.156*    0.151  0.268  0.296  0.395  0.521
20  40  0.9  RPN        0.006*   -0.004     0.010     0.019*    0.019     0.158  0.259  0.285  0.389  0.535
20  40  0.9  MOVE.1     0.004     0.014*   -0.007    -0.001    -0.012     0.152  0.257  0.283  0.383  0.524
20  40  0.9  MOVE.2     0.004    -0.005     0.005     0.019*    0.017     0.151  0.245  0.280  0.375  0.510

The rows marked HIST represent calculations based on $(N_1 + N_2)$ observations and involving no record extension. $E(\omega)$ = 0.000, 1.000, -1.430, -1.935, -2.319, for the five statistics, respectively.
*Bias significant at the 5% level.
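A Monte Carlo experiment of the kind summarized in Table 5 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's original program: the function names (`reg`, `rpn`, `move1`, `move2`, `monte_carlo`), the use of NumPy, and the random seed are all assumptions. The four functions follow equations (2), (3), (4), and (6) through (8), with a positive slope assumed for MOVE.1 and MOVE.2, and only the bias and RMSE of the extended-record variance are computed.

```python
import numpy as np


def _stats(x1, y1):
    """Means, standard deviations (divisor n - 1), and correlation of the concurrent record."""
    r = np.corrcoef(x1, y1)[0, 1]
    return x1.mean(), y1.mean(), x1.std(ddof=1), y1.std(ddof=1), r


def alpha_sq(n1, n2):
    """alpha^2 as defined after equation (3); requires n1 > 4 and n2 > 1."""
    return n2 * (n1 - 4) * (n1 - 1) / ((n2 - 1) * (n1 - 3) * (n1 - 2))


def reg(x1, y1, x2, rng=None):
    mx, my, sx, sy, r = _stats(x1, y1)
    return my + r * sy / sx * (x2 - mx)                      # equation (2)


def rpn(x1, y1, x2, rng):
    mx, my, sx, sy, r = _stats(x1, y1)
    a = np.sqrt(alpha_sq(len(x1), len(x2)))
    noise = a * np.sqrt(1.0 - r**2) * sy * rng.standard_normal(len(x2))
    return my + r * sy / sx * (x2 - mx) + noise              # equation (3)


def move1(x1, y1, x2, rng=None):
    mx, my, sx, sy, _ = _stats(x1, y1)
    return my + sy / sx * (x2 - mx)                          # equation (4)


def move2(x1, y1, x2, rng=None):
    mx, my, sx, sy, r = _stats(x1, y1)
    n1, n2 = len(x1), len(x2)
    x = np.concatenate([x1, x2])
    b = r * sy / sx                                          # regression slope
    d = x2.mean() - mx
    mu = my + n2 / (n1 + n2) * b * d                         # equation (7)
    var = ((n1 - 1) * sy**2 + (n2 - 1) * b**2 * x2.var(ddof=1)
           + (n2 - 1) * alpha_sq(n1, n2) * (1.0 - r**2) * sy**2
           + n1 * n2 / (n1 + n2) * b**2 * d**2) / (n1 + n2 - 1)   # equation (8)
    return mu + np.sqrt(var) / x.std(ddof=1) * (x2 - x.mean())    # equation (6)


def monte_carlo(method, n1=10, n2=50, rho=0.5, trials=2000, seed=0):
    """Bias and RMSE of S^2 of the extended record for standard bivariate normal flows."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    s2 = np.empty(trials)
    for t in range(trials):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n1 + n2)
        x, y = xy[:, 0], xy[:, 1]
        est = method(x[:n1], y[:n1], x[n1:], rng)
        s2[t] = np.concatenate([y[:n1], est]).var(ddof=1)
    err = s2 - 1.0                       # the population variance is 1 in this setup
    return err.mean(), np.sqrt(np.mean(err**2))
```

For example, `monte_carlo(reg)` and `monte_carlo(move1)` return the (bias, RMSE) pair for $S^2(\tilde{y})$ with $N_1 = 10$, $N_2 = 50$, $\rho = 0.5$. Because the random number stream differs from the one used in the paper, the values should be expected to resemble the corresponding Table 5 entries only to within sampling error.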
The second decision was whether or not to do the extension with the real data or with their logarithms (that is, take logarithms of all of the data, do the extension, and then take antilogs of these extended records). This question has been dealt with by Hirsch [1979] and, in a somewhat different context, by Stedinger [1980]. In both cases the conclusion was to work with the logarithms, and this was the approach taken here. The consequences of this choice are that for any of the techniques, the sample mean of the extended record of the logarithms is an unbiased estimate of the mean of the logarithms, but the sample mean of the extended record of flows is not an unbiased estimate of the mean of the flows. However, the objective of the technique is to produce records with sample cumulative distribution functions (CDF's) which are close approximations of the CDF's of actual records (particularly in the tails). Where the station to station relationship more nearly approximates a bivariate lognormal distribution rather than a bivariate normal, transforming to logarithms will better achieve the desired result.

The evaluation described here will focus on the ability of the techniques to produce low-flow duration intensity characteristics like those of the actual records they are intended to represent. The example used concurrent, 50-year-long (commencing in 1928) U.S. Geological Survey monthly volume records from seven stream gaging stations in west central Virginia. Table 6 provides some information on the stations, their drainage basins, and the observed flow characteristics.

TABLE 6. Information on the Seven Gages

                                                                                           Monthly Volumes                 Logarithms of Monthly Volumes
Station    Location                                   Drainage  Mean basin  Forest    Mean,    Coeff. of  Coeff. of   Standard   Coeff. of  Coeff. of
Number                                                area,     elevation,  cover,    10^6 m3  variation  skewness    deviation  skewness   kurtosis
                                                      mi2       feet        percent
02018000   Craig Creek at Parr, Va.                   329       2150        89        28.3     0.92       1.26        1.04       -0.12      1.81
02017500   Johns Creek at New Castle, Va.             104       2210        91         9.3     0.97       1.33        1.11       -0.12      1.80
02055000   Roanoke River at Roanoke, Va.              395       1680        75        26.3     0.83       1.65        0.83       -0.02      2.12
02013000   Dunlap Creek near Covington, Va.           164       2230        88        12.3     1.02       1.48        1.12       -0.00      1.76
02016000   Cowpasture River near Clifton Forge, Va.   461       2030        82        38.6     0.88       1.33        0.93       -0.03      1.87
03167000   Reed Creek at Grahams Forge, Va.           247       2500        44        19.7     0.78       1.59        0.74        0.21      2.01
03170000   Little River at Graysonton, Va.            300       2470        47        27.0     0.57       1.54        0.54       -0.03      2.59

The experimental design is the following.

1. Each station is, in turn, considered to be the short-record station with only 10 years of data available. The historic values of the first, second, and fifth-order statistics in the annual series of minimum 1-month, 3-month, and 6-month volumes from the entire 50-year record are recorded. They are denoted $Q(r, d, is)$ where $r = 1, 2, 5$ (index of order or rank), $d = 1, 3, 6$ (duration in months), and $is = 1, 2, \ldots, 7$ (index of short-record station). The year is considered to be the climatic year commencing in April.
2. For each given short-record station, each of the other six stations is considered, in turn, to be the base station with the full 50-year record available.
3. For each short-record station-base station pair, the period of record overlap is considered, in turn, to be years 1-10, 11-20, 21-30, 31-40, and 41-50, respectively.
4. For each short-record station-base station-overlap period (there are 7 x 6 x 5 = 210 such 3-tuples), each of the four extension methods is applied to the logarithms of the streamflow volume data. From each of these extended flow records the same flow statistics are computed. They are denoted $\hat{Q}(r, d, is, ib, sq, m)$, $ib = 1, 2, \ldots, 7$ (index of base station), $ib \ne is$, $sq = 1, 2, \ldots, 5$ (index of sequence), and $m$ = {REG, RPN, MOVE.1, MOVE.2}.
5. The measure of accuracy of each estimate is $U(r, d, is, ib, sq, m)$:

$$U(r, d, is, ib, sq, m) = \frac{\hat{Q}(r, d, is, ib, sq, m)}{Q(r, d, is)}$$

The correlation coefficient (of the log values) was also recorded for each of the 210 3-tuples. The median correlation coefficient is 0.90, the lower and upper quartiles are 0.85 and 0.93, and the minimum and maximum are 0.71 and 0.99.
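The low-flow statistics and the accuracy ratio $U$ of the experimental design can be sketched in code. This is a hedged illustration rather than the procedure actually used in the study: the function names are hypothetical, the input is assumed to be a complete monthly series that starts in April and covers whole climatic years, the $d$-month totals are assumed not to cross climatic-year boundaries, and MOVE.1 on logarithms is used as the example extension method.

```python
import numpy as np


def low_flow_order_stats(monthly, d, ranks=(1, 2, 5)):
    """Order statistics of the annual series of minimum d-month volumes.

    monthly : monthly volumes, length a multiple of 12, first value = April
    d       : duration in months (1, 3, or 6 in the text)
    ranks   : ranks to report, 1 = lowest value in the record
    """
    years = np.asarray(monthly, dtype=float).reshape(-1, 12)
    # Prefix sums per year give every d-month total inside that climatic year.
    csum = np.cumsum(np.hstack([np.zeros((len(years), 1)), years]), axis=1)
    totals = csum[:, d:] - csum[:, :-d]
    annual_min = np.sort(totals.min(axis=1))
    return {r: annual_min[r - 1] for r in ranks}


def move1_log_extend(base, short, overlap):
    """Extend `short` outside the `overlap` slice with MOVE.1 applied to logarithms."""
    base = np.asarray(base, dtype=float)
    short = np.asarray(short, dtype=float)
    lx = np.log(base)
    lx1, ly1 = lx[overlap], np.log(short[overlap])
    est = ly1.mean() + ly1.std(ddof=1) / lx1.std(ddof=1) * (lx - lx1.mean())
    extended = np.exp(est)                  # take antilogs of the extended record
    extended[overlap] = short[overlap]      # observed values are kept where available
    return extended


def accuracy_ratio(extended, historic, r, d):
    """U = Q-hat / Q for rank r and duration d."""
    return low_flow_order_stats(extended, d)[r] / low_flow_order_stats(historic, d)[r]
```

With 50-year records, `accuracy_ratio(move1_log_extend(base, short, slice(0, 120)), short, r=2, d=3)` would give one $U$ value for a 10-year overlap in years 1-10; looping over station pairs, overlap periods, ranks, and durations gives the distributions shown in the box plots.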
Fig. 1. Box plots of U values for 1-month-duration low flows. Box plots show median, upper and lower quartiles, and maximum and minimum values. The sample size is 210. (Panels: order statistic 1, lowest in 50-year record; order statistic 2, second lowest; order statistic 5, fifth lowest. Methods: REG, RPN, MOVE.1, MOVE.2.)

Fig. 2. Box plots of U values for 3-month-duration low flows. Box plots show median, upper and lower quartiles, and maximum and minimum values. The sample size is 210. (Panels: order statistic 1, lowest in 50-year record; order statistic 2, second lowest; order statistic 5, fifth lowest. Methods: REG, RPN, MOVE.1, MOVE.2.)

The accuracy of the methods is summarized in Figures 1, 2, and 3. In these figures the box plots represent the distribution of all 210 U values for a given rank, duration, and method. The accuracy of the method may be judged by the degree of dispersion in the box plot for that method, by the degree that the median approaches a value of 1.0, and by the symmetry of the box about a value of 1.0. The following are some observations about the results.

1. Both MOVE.1 and MOVE.2 have median values for U very close to 1.0 (in fact, between 0.980 and 1.007) for all flow statistics.
2. At all durations REG had median U values >1.00 (the most extreme being 1.11). For durations of 1 and 3 months and for all three order statistics, approximately 75% of the computed values were higher than the historical values. This result was expected, of course, given the tendency for regression to reduce variance resulting in low extremes which are 'too high.'
3. For RPN at a 1-month duration, the median U values as well as the upper quartile U values are <1.00. This arises because of the kurtosis of the log volume values. For all seven streams, the kurtosis is significantly ($\alpha = 0.01$) less than 3.0. For the three other methods, REG, MOVE.1, and MOVE.2, the kurtosis of the extended record will closely approximate the kurtosis from the base station. But in RPN the kurtosis of the extended record will be some weighted average (depending on $r$) of the base station kurtosis and the kurtosis of the noise term (a value of 3). Thus the kurtosis of the extended records using RPN will be higher than those of the historic records. Given the approximate preservation of variance and inflation of kurtosis, the extremes of the extended records will be more extreme than the historic. It is not known how common it is for kurtosis values for such time series to fall in the range 1.5-2.5 (as these seven records do). It is not, however, unexpected. If one postulates that log monthly volumes are the sum of a normal random component and a sine curve of a period of 1 year (with the random component standard deviation equal to about one third of the semiamplitude of the sine curve), the kurtosis of the sum is approximately 2.
4. For RPN at a duration
of 6 months the median U values
for RPN are > 1.00, and in fact, nearly 75% of all U values
are > 1.00. Thus RPN
extended
records have too low low-
flow values at a 1-month duration and too high low-flow
values at a 6-month duration. The reasonfor these too high
values at 6-month duration is that the noisein (3) is independent. Thus the estimatedportion (i = N• + 1, ß ß., N• + N2)
of the extended record will have a serial correlation coeffi-
cient lower (in absolute value) than the base station, but for
the other
three
methods
it will
match
the base station
exactly.
5. At a 3-month duration the U values for RPN are
reasonably symmetrical about a value of 1.0. Presumably,
the two contrary effects described above (3 and 4) are
approximately in balance at this duration.
6. The box plots of U values for MOVE.1 and MOVE.2
are very similar in all cases. In general, the interquartile
range and overall range for MOVE.2 are slightly smaller than
for MOVE.1.
[Figure 3: box plots of U (vertical axis, 0.00 to 2.00) for REG, RPN, MOVE.1, and MOVE.2, in three panels: order statistic 1 (lowest in 50-year record), order statistic 2 (second lowest), and order statistic 5 (fifth lowest).]
Fig. 3. Box plots of U values for 6-month-duration
low flows. Box plots show median, upper and lower quartiles, and
maximum and minimum values. The sample size is 210.
An additional question considered here was the matter of
selecting a base station from among a variety of possible
long-record stations located nearby. The hypothesis considered was that using the base station with the highest correlation coefficient (of the log values) for the 10-year overlap
period would result in a smaller dispersion of U values than
was found using all possible base stations. This hypothesis
was borne out by the results: The interquartile range of U
values using only the best correlated base station was
smaller for all nine order statistics considered. The average
ratio of the interquartile range using these selected base
stations to the interquartile range for all possible base
stations was 0.63. Thus maximum sample correlation appears to be a reasonable selection criterion; obviously,
others might exist which are better.
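The selection rule examined above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the function name pick_base_station, the station names, and the flow values are all hypothetical, and the correlation is computed on natural logs of the flows for the overlap period only.

```python
import math
from statistics import mean


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def pick_base_station(short_flows, candidates):
    """Return (name, r) for the candidate base station whose log flows
    correlate best with the log flows of the short-record station.

    short_flows: positive flows at the short-record station (overlap period).
    candidates:  dict mapping a station name to its flows for the same months.
    """
    log_y = [math.log(q) for q in short_flows]
    scores = {
        name: pearson([math.log(q) for q in flows], log_y)
        for name, flows in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up flows for illustration only (hypothetical stations).
short = [10.0, 20.0, 15.0, 30.0, 25.0, 40.0]
candidates = {
    "station_A": [31.0, 58.0, 47.0, 95.0, 70.0, 118.0],
    "station_B": [5.0, 4.0, 6.0, 5.0, 7.0, 6.0],
}
best_name, best_r = pick_base_station(short, candidates)
print(best_name, round(best_r, 3))
```

In practice the overlap period would hold many more values (120 months for a 10-year overlap), and, as noted above, criteria other than maximum sample correlation might perform better.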
SUMMARY AND CONCLUSIONS
In using record extension, as applied in the preceding
section, one is transferring the characteristics of distribution
shape, serial correlation, and seasonality from the base
station to the short-record station with adjustments of location and scale appropriate to the short-record station. The
analytical derivations, the Monte Carlo study of samples
from bivariate normal populations, and the empirical analysis all show that REG and RPN fall substantially short of
achieving the desired result of creating a realistic extended
record. Specifically, REG cannot be expected to provide
records with the appropriate variability and RPN cannot be
expected to provide records with the appropriate distribution shape or serial correlation.
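These two deficiencies can be illustrated with a small synthetic experiment. The sketch below is not the Monte Carlo study reported in this paper. It assumes hypothetical log flows built from a 1-year sine curve plus independent normal noise (noise standard deviation one third of the semiamplitude, as postulated earlier), a 10-year overlap, and a short-record station that is a noisy linear function of the base station. REG is taken as m(y1) + r (S(y1)/S(x1)) (x - m(x1)), MOVE.1 as the same line without the factor r, and RPN as REG plus independent normal noise of standard deviation (1 - r^2)^(1/2) S(y1); any further scaling constant in (3) is omitted, which is an assumption. These forms should be checked against the equations given earlier in the paper.

```python
import math
import random
from statistics import mean, stdev


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def lag1(z):
    """Lag-1 serial correlation coefficient."""
    m = mean(z)
    num = sum((a - m) * (b - m) for a, b in zip(z, z[1:]))
    return num / sum((a - m) ** 2 for a in z)


def kurtosis(z):
    """Fourth central moment divided by the squared variance (normal = 3)."""
    m = mean(z)
    m2 = mean([(v - m) ** 2 for v in z])
    m4 = mean([(v - m) ** 4 for v in z])
    return m4 / m2 ** 2


rng = random.Random(1982)   # fixed seed so the run is repeatable
N1, N2 = 120, 480           # months of overlap, months to be estimated

# Hypothetical log flows: seasonal sine plus noise at the base station,
# and a correlated short-record station.
x = [math.sin(2 * math.pi * t / 12) + rng.gauss(0, 1 / 3) for t in range(N1 + N2)]
y = [0.5 + xi + rng.gauss(0, 0.6) for xi in x]
x1, y1, x2 = x[:N1], y[:N1], x[N1:]

mx1, my1 = mean(x1), mean(y1)
r = pearson(x1, y1)
ratio = stdev(y1) / stdev(x1)
noise_sd = math.sqrt(1 - r * r) * stdev(y1)   # assumed RPN noise scale

reg = [my1 + r * ratio * (xi - mx1) for xi in x2]
rpn = [v + rng.gauss(0, noise_sd) for v in reg]
move1 = [my1 + ratio * (xi - mx1) for xi in x2]

print(f"{'base':7s} lag-1={lag1(x2):.3f} kurtosis={kurtosis(x2):.2f}")
for name, est in (("REG", reg), ("RPN", rpn), ("MOVE.1", move1)):
    print(f"{name:7s} sd={stdev(est):.3f} lag-1={lag1(est):.3f} "
          f"kurtosis={kurtosis(est):.2f}")
```

Because REG and MOVE.1 are linear functions of the base-station values, their lag-1 coefficient and kurtosis equal those of the base station, and the REG standard deviation is r times that of MOVE.1. Only the RPN series has its serial correlation and distribution shape altered by the added noise.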
MOVE.1 and MOVE.2 differ in the ways that they utilize
the available data for parameter estimation (the latter uses
more of the base station data than does the former). The
differences in their performance, as seen in the evaluation of
biases of moments and order statistics and in the empirical
study, all show MOVE.2 to have slightly more desirable
properties than MOVE.1 and rather substantially better
properties than REG or RPN. Of course, if the base station
and short-record station have substantial differences in terms
of distribution shapes, serial correlation, or seasonality, then
these methods cannot be expected to perform very well,
because they will transfer these characteristics from the base
station to the short-record station.
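The difference in data use between the two MOVE methods can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the equations of this paper: MOVE.1 is taken to use only the overlap means and standard deviations, while MOVE.2 is taken to use Matalas-Jacobs type estimates of the mean and variance at the short-record station together with the mean and standard deviation of the full base-station record. The expressions for mu_y, var_y, and alpha2 are written from the commonly quoted form of those estimators and should be checked against the equations given earlier in the paper. The function names and the synthetic data are hypothetical, and a positive correlation is assumed.

```python
import math
import random
from statistics import mean, stdev, variance


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def move1_line(x1, y1):
    """Intercept and slope from the overlap period only (assumed MOVE.1 form)."""
    slope = stdev(y1) / stdev(x1)
    return mean(y1) - slope * mean(x1), slope


def move2_line(x1, y1, x2):
    """Intercept and slope using the whole base-station record
    (assumed MOVE.2 form; requires len(x1) >= 5 and len(x2) >= 2)."""
    n1, n2 = len(x1), len(x2)
    r = pearson(x1, y1)
    beta = r * stdev(y1) / stdev(x1)               # regression slope
    alpha2 = n2 * (n1 - 4) * (n1 - 1) / ((n2 - 1) * (n1 - 3) * (n1 - 2))
    shift = mean(x2) - mean(x1)
    mu_y = mean(y1) + n2 / (n1 + n2) * beta * shift
    var_y = (
        (n1 - 1) * variance(y1)
        + (n2 - 1) * beta ** 2 * variance(x2)
        + (n2 - 1) * alpha2 * (1 - r * r) * variance(y1)
        + n1 * n2 / (n1 + n2) * beta ** 2 * shift ** 2
    ) / (n1 + n2 - 1)
    x_all = list(x1) + list(x2)
    slope = math.sqrt(var_y) / stdev(x_all)
    return mu_y - slope * mean(x_all), slope, mu_y, var_y


# Synthetic log flows for illustration only.
rng = random.Random(7)
x = [rng.gauss(3.0, 1.0) for _ in range(60)]
y = [1.0 + 0.8 * xi + rng.gauss(0, 0.5) for xi in x]
x1, y1, x2 = x[:12], y[:12], x[12:]

a1, b1 = move1_line(x1, y1)
a2, b2, mu_y, var_y = move2_line(x1, y1, x2)
y_hat1 = [a1 + b1 * xi for xi in x1]   # MOVE.1 line over the overlap period
y_hat2 = [a2 + b2 * xi for xi in x]    # MOVE.2 line over the full record

print(f"MOVE.1: y = {a1:.3f} + {b1:.3f} x")
print(f"MOVE.2: y = {a2:.3f} + {b2:.3f} x")
```

By construction, the MOVE.1 line reproduces the overlap mean and standard deviation of the short record, whereas the MOVE.2 line, applied to the full base-station record, reproduces the estimated mean mu_y and variance var_y.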
The intent of record extension is to produce a time series
which is relatively long and is also plausible; that is, one that
possesses statistical characteristics believed to be like those
of an actual record for the station. The reason for producing
such a record is for use in simulation and optimizations
related to potential water management decisions. The use of
regression for these purposes is inappropriate. The aim of
regression is to provide the 'best estimate' (minimum
squared error) for each individual streamflow value and not
to preserve any particular statistical characteristics of the
record. The three other methods are attempts to produce
series which do preserve certain desired characteristics. The
results of this study suggest that MOVE.2 is the most
effective of all four of the methods in terms of producing time
series with properties (such as variance and extreme order
statistics) most like the properties of the records they are
intended to represent.
Acknowledgments. Ed Gilroy of the U.S. Geological Survey
helped with the derivation of the bias of the variance for MOVE.1.
Bill Kirby of the U.S. Geological Survey suggested the idea of
MOVE.2 as a variant of MOVE.1.
REFERENCES
Aitchison, J., and J. A. C. Brown, The Log-normal Distribution,
CambridgeUniversity Press, London, 1957.
Alley, W. A., and A. W. Burns, Mixed-stationextensionof monthly
streamflow records, report, U.S. Geol. Surv., Reston, Virginia,
1981.
Beard, L. R., A. J. Fredrich, and E. F. Hawkins, Estimating
monthly streamflowswithin a region, Tech. Pap. 18, 14 pp., The
Hydrol. Eng. Center, U.S. Army Corps of Eng., Davis, Calif.,
1970.
Benson, M. A., and N. C. Matalas, Synthetic hydrology based on
regional statisticalparameters, Water Resour. Res., 3, 931-936,
1967.
Crawford, N.H., and R. K. Linsley, Jr., Digital simulationin
hydrology:Stanfordwatershedmodeliv, Tech.Rep. 39, Dep. of
Civ. Eng., StanfordUniv., Stanford,Calif., July 1966.
Draper, N. R., and H. Smith, Applied RegressionAnalysis, John
Wiley, New York, 1966.
Fiering, M. B, Streamflow Synthesis, Harvard University Press,
Cambridge, Massachusetts, 1967.
Greenall, P. D., The concept of equivalent scoresin similar tests,
Br. J. Stat. Psychol., 2, 30-40, 1949.
Hazen, A., Storagerequiredfor the regulationof streamflow,Trans.
Am. Soc. Civ. Eng., 77, 1914.
Hirsch, R. M., An evaluation of some record reconstructiontechniques, Water Resour. Res., 15, 1781-1790, 1979.
Imbrie, J., Biometrical methodsin the studyof invertebratefossils,
Bull. Am. Mus. Nat. Hist., 108, 215-52, 1956.
Kirby, W., Straight line fitting of an observation path by least
normal squares,U.S. Geol. Surv. Open-FileRep., 74-187, 1974.
Kritskiy, S. N., and M. F. Menkel, Some statisticalmethodsin the
analysisof hydrologicseries,Sov. Hydrol. Select.Pap., 7, 80-98,
1968.
Kruskal, W. H., On the uniquenessof the line of organiccorrelation, Biometrics, 9, 47-58, 1953.
Leclerc, G., and J. C. Schaake, Jr., Methodology for assessingthe
potential impact of urban developmenton urban runoff and the
relative efficiencyof runoffcontrolalternatives,Rep. 167, Dep. of
Civ. Eng., Mass. Inst. of Technol., Cambridge,Mass., 1973.
Matalas, N. C., Mathematical assessmentof synthetic hydrology,
Water Resour. Res., 3, 937-945, 1967.
Matalas, N. C., and B. A. Jacobs, A correlation procedure for
augmentinghydrologicdata, U.S. Geol. Surv. Prof. Pap., 434-E,
1964.
Pearson,K., On linesand planesof closestfit to systemsof pointsin
space, Philos. Mag., 2, pp. 559-572, 1901.
Riggs, H. C., Low-flow investigations,U.S. Geol. Surv. Tech.
Water Resour. Invest., 4, 1972.
Rippl, W., The capacity of storagereservoirsfor water supply,
Proc. Inst. Civ. Eng., 71, 1883.
Stedinger,J. R., Fitting log normaldistributionsto hydrologicdata,
Water Resour. Res., 16, 481-490, 1980.
Till, R., The use of linear regressionin geomorphology,Area, 5,
303-308, 1973.
(Received October 29, 1981;
revised March 9, 1982;
accepted March 22, 1982.)