wrcr13140-sup-0003-txts01

Justification of the linear relationship between ET/P and PET/P using
Monte Carlo experiments
Spurious correlation (or artificial correlation) has been studied for decades [Pearson, 1897;
Yule, 1910; Firebaugh and Gibbs, 1985; Brett, 2004]. It refers to a correlation between two
variables that have no causal connection but appear related because of a third, unseen or
confounding factor. For regression on indices (or ratio variables), spurious correlation can be
caused by the common component, which has no causal relation with the other variables [Kronmal,
1993]. In our study, ET/P is regressed on PET/P. In hydrology, ET is expressed as a function of
PET and P, i.e., ET = f(PET, P, other factors) [Penman, 1948; Brutsaert and Stricker, 1979]. P is
a causal factor of ET rather than an “unseen” or “confounding” factor in the relationship
between ET and PET. Thus, it is implausible to conclude that the relationship between ET and
PET is caused by a third, confounding variable P. It is the strong physical relation
between ET and P that may prevent an artificial correlation between ET/P and PET/P.
Additional analysis is conducted through Monte Carlo experiments to detect spurious
correlation between ET/P and PET/P. Based on the analysis of 547 catchments, our conclusion
is that the linear relationship of ET/P vs. PET/P is not subject to spurious correlation. This
is essentially due to the causal relations between the components, i.e., the physical relation
among PET, ET and P, and particularly the physical meaning of the two ratios, PET/P (usually
defined as the dryness index) and ET/P, as defined in the context of the Budyko curve.
For the case of X/Z vs. Y/Z, if X, Y and Z are independent random variables,
how well Y/Z fits X/Z linearly depends largely on the magnitude of the coefficient of
variation (CV) of Z relative to the CV of X or Y [Benson, 1965; Brett, 2004]. Based on mathematical
derivation, if CV_Z = CV_X = CV_Y, the expected correlation coefficient r (with R² = r²) of a
linear relationship between X/Z and Y/Z is 0.5; if CV_Z > CV_X and/or CV_Y, the expected r would be
larger than 0.5. For example, if CV_Z = 2CV_X = 2CV_Y, the expected r would be 0.8; if CV_Z = 3CV_X
= 3CV_Y, the expected r would be about 0.9 [Benson, 1965]. Using Monte Carlo simulation,
Brett [2004] summarized how the R² of the X/Z vs. Y/Z regression is determined by the CVs of X,
Y and Z. Furthermore, in most instances X, Y and Z are not entirely uncorrelated, and in
such cases a larger R² would be found [Benson, 1965]. In general, R² increases monotonically
with CV_Z/CV_X and/or CV_Z/CV_Y when the relationship between X/Z and Y/Z is
spurious, as shown in Figures 2 and 4 of Brett [2004], and the residual error of the
correlation should be random, since the relationship is spurious and controlled by Z.
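The dependence of the spurious correlation on the CVs can be illustrated numerically. The following is a minimal sketch, not taken from Benson [1965] or Brett [2004]. It assumes independent lognormal variables with unit mean (an illustrative choice that keeps Z positive) and compares the simulated correlation between X/Z and Y/Z with Pearson's [1897] first-order approximation for uncorrelated variables, r ≈ CV_Z² / sqrt((CV_X² + CV_Z²)(CV_Y² + CV_Z²)). All function names are hypothetical.

```python
import numpy as np


def pearson_ratio_r(cv_x, cv_y, cv_z):
    """First-order approximation (Pearson, 1897) of corr(X/Z, Y/Z)
    for mutually uncorrelated X, Y and Z."""
    return cv_z**2 / np.sqrt((cv_x**2 + cv_z**2) * (cv_y**2 + cv_z**2))


def lognormal_with_cv(rng, cv, n):
    """Draw n positive values with mean 1 and the requested CV.
    The lognormal form is an assumption made only for illustration."""
    sigma2 = np.log1p(cv**2)
    return rng.lognormal(mean=-0.5 * sigma2, sigma=np.sqrt(sigma2), size=n)


def simulate_ratio_r(cv_x, cv_y, cv_z, n=200_000, seed=0):
    """Monte Carlo estimate of corr(X/Z, Y/Z) for independent X, Y, Z."""
    rng = np.random.default_rng(seed)
    x = lognormal_with_cv(rng, cv_x, n)
    y = lognormal_with_cv(rng, cv_y, n)
    z = lognormal_with_cv(rng, cv_z, n)
    return np.corrcoef(x / z, y / z)[0, 1]


if __name__ == "__main__":
    for ratio in (1, 2, 3):
        cv_xy, cv_z = 0.05, 0.05 * ratio
        r_approx = pearson_ratio_r(cv_xy, cv_xy, cv_z)
        r_sim = simulate_ratio_r(cv_xy, cv_xy, cv_z)
        print(f"CV_Z/CV_X = {ratio}: r (approx) = {r_approx:.3f}, "
              f"r (simulated) = {r_sim:.3f}, R^2 = {r_sim**2:.3f}")
```

The approximation gives r = 0.5, 0.8 and 0.9 for CV_Z/CV_X = CV_Z/CV_Y = 1, 2 and 3, and r approaches 1 as CV_Z grows relative to CV_X and CV_Y. The sketch prints R² = r² alongside r so that the two measures are not conflated.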
ET/P and PET/P share a common denominator, i.e., precipitation. Generally, precipitation
has larger variability in arid regions and lower variability in humid regions. Figure S1 plots
the interannual CV of P against mean annual PET/P in 547 catchments in the continental U.S.,
and shows a clear monotonic increase of CV with PET/P. For
a given catchment, the interannual variability of P is usually much larger than that of PET and
ET. A further check on the magnitude of the CVs of the three components shows that the average
value of CV_P/CV_PET is 3.75, within a range of 1.2 to 12.5, and the average value of CV_P/CV_ET is 3.01,
within a range of 1.17 to 13.39. Figure S2 shows the relative magnitude of CV_P/CV_PET and
CV_P/CV_ET in the 547 catchments. Therefore, the linear relationship of ET/P vs. PET/P could,
with high probability, be subject to spurious correlation. However, more detailed analysis does not
support this possibility, as illustrated below.
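The CV ratios quoted above can be computed per catchment from the annual series. The following is a minimal sketch, assuming that the interannual CV is the sample standard deviation divided by the mean (the supplement does not state which estimator was used). The annual values are made up purely for illustration and the function names are hypothetical.

```python
import numpy as np


def interannual_cv(series):
    """Coefficient of variation: sample standard deviation over the mean."""
    series = np.asarray(series, dtype=float)
    return series.std(ddof=1) / series.mean()


def cv_ratios(p, pet, et):
    """Return (CV_P/CV_PET, CV_P/CV_ET) for one catchment."""
    cv_p = interannual_cv(p)
    return cv_p / interannual_cv(pet), cv_p / interannual_cv(et)


if __name__ == "__main__":
    # Hypothetical annual totals in mm (not data from the 547 catchments).
    p = [620.0, 810.0, 540.0, 930.0, 700.0, 480.0]
    pet = [1010.0, 980.0, 1040.0, 960.0, 1000.0, 1050.0]
    et = [450.0, 520.0, 410.0, 560.0, 480.0, 390.0]
    ratio_pet, ratio_et = cv_ratios(p, pet, et)
    print(f"CV_P/CV_PET = {ratio_pet:.2f}, CV_P/CV_ET = {ratio_et:.2f}")
```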
Figure S1 Scatter plot of interannual CV of precipitation against catchment aridity (mean annual PET/P)
Figure S2 Scatter plot of the relative magnitude of CV_P/CV_PET and CV_P/CV_ET.
(1) R² versus CV_P/CV_ET or CV_P/CV_PET
For a given catchment, the interannual variability of P is usually much larger than that of
PET and ET. If we suppose that the relationship of ET/P vs. PET/P is artificial, then by the
definition of spurious correlation we would expect the R² of the linear relationship to be
larger in arid regions than in humid regions, since CV_P/CV_ET and CV_P/CV_PET are larger in
arid regions, and the expected R² would be high on average (a correlation coefficient of about
0.9 for CV ratios of about 3 [Benson, 1965]).
Moreover, because P, ET and PET are correlated, the expected R² would be higher than
under the uncorrelated condition [Benson, 1965]. However, Figure S3 does not show the
pattern expected of spurious correlation. As can be seen, R² is higher and the linear
relationship stronger in humid and semi-humid regions (PET/P < 2.0) and in very arid
regions (PET/P > 6.0), while R² varies widely in arid and semi-arid regions (2.0 < PET/P
< 6.0). In fact, the higher R² and stronger linearity arise because ET variability is largely
controlled by limited energy supply when PET/P < 2.0 and by limited water supply when
PET/P > 6.0 [Yang et al., 2007]. When 2.0 < PET/P < 6.0, the large disparity in R² is caused
by complex interactions of vegetation, human interference, landscape conditions, etc., with
the catchment water and energy supply (P and PET). Previous studies [Milly, 1994a; 1994b;
Potter and Zhang, 2007; Yang et al., 2007; Potter and Zhang, 2009; Yang et al., 2009] also
support the distribution of R² versus precipitation as shown in Figure S3.
Figure S3 Scatter plot of R² (based on UM_ET) against PET/P
Figure S4 (a) and (b) show the relation of R² with CV_P/CV_PET and CV_P/CV_ET,
respectively; neither agrees with the features of spurious correlation. In Figure S4
(a), R² shows no trend with CV_P/CV_PET; in Figure S4 (b), R² shows a clear trend
at high and low values of CV_P/CV_ET, but no trend when
3 < CV_P/CV_ET < 5. From Figure S2, CV_P is more than twice CV_PET or CV_ET for most of the
catchments, but R² does not show the clear increasing trend with CV_P/CV_PET and CV_P/CV_ET
(see Figure S4) that would be expected for a spurious correlation. Therefore, based on the
above data analysis, the linear relationship between PET/P and ET/P may not be due to
spurious correlation.
Figure S4 Scatter plots of R² (based on UM_ET) against CV_P/CV_PET (a) and CV_P/CV_ET (b)
(2) Physical meaning of slope and intercept of linear models
If the relationship ET/P = α•PET/P + β is reformulated as ET = α•PET + β•P, the slope (α)
reflects the correlation between ET and PET, and the intercept (β) reflects the correlation
between ET and P. Thus, a higher value of α represents a higher sensitivity of ET to PET, and a
higher value of β represents a higher sensitivity of ET to P. Figure S5 shows that the slope
decreases and the intercept increases as mean annual PET/P increases. This essentially reflects the
spatial relation of ET with P and PET [Budyko, 1974]. In humid regions, energy is the limiting
factor for ET and the slope is relatively large; in arid regions, water supply is the
limiting factor for ET and the intercept is relatively large [Zhang et al., 2004; Yang et al., 2008;
Potter and Zhang, 2009]. Therefore, the slope and intercept of the linear relationship can be
interpreted within a physical framework.
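The algebraic step from ET/P = α•PET/P + β to ET = α•PET + β•P can be checked numerically. The sketch below is illustrative only. It fits the ratio form by ordinary least squares for one catchment (the supplement does not specify the fitting routine, so OLS is an assumption) and confirms that multiplying the fitted line by P gives the reformulated expression. The synthetic series, the values α = 0.4 and β = 0.3, and the function name are hypothetical.

```python
import numpy as np


def fit_ratio_line(p, pet, et):
    """OLS fit of ET/P = alpha * PET/P + beta; returns (alpha, beta, R^2)."""
    p, pet, et = (np.asarray(v, dtype=float) for v in (p, pet, et))
    x, y = pet / p, et / p
    alpha, beta = np.polyfit(x, y, 1)  # slope, intercept
    ss_res = np.sum((y - (alpha * x + beta)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return alpha, beta, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 24-year series in mm; ET is built from assumed alpha and beta.
    p = rng.uniform(500.0, 1200.0, 24)
    pet = rng.uniform(900.0, 1100.0, 24)
    et = 0.4 * pet + 0.3 * p + rng.normal(0.0, 10.0, 24)

    alpha, beta, r2 = fit_ratio_line(p, pet, et)
    # Multiplying the fitted ratio line by P recovers ET = alpha*PET + beta*P.
    et_from_ratio = p * (alpha * pet / p + beta)
    et_reformulated = alpha * pet + beta * p
    print(np.allclose(et_from_ratio, et_reformulated))
    print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, R^2 = {r2:.3f}")
```

The reformulation is an algebraic identity for the fitted line. Regressing ET directly on PET and P without an intercept would weight the years differently and need not return the same α and β.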
Furthermore, the major physical or anthropogenic factors contributing to the linear
relationship are discussed in the manuscript, which provides a solid interpretation of the causal
relation between PET/P and ET/P.
Figure S5 Scatter plots of slope (a) and intercept (b) of the linear relationship against PET/P.
(3) Monte Carlo experiments
Because the analysis above may still not be convincing enough to establish a
genuine relationship between ET/P and PET/P, we conduct further analysis using Monte Carlo
experiments. For these experiments, we need to break the physical connection
between the three variables by randomly resampling the data time series.
For each of the 547 catchments, we resample P, PET and ET separately to generate a new
set of P, PET and ET from the original data with the same length, i.e., 24 years. We used two
resampling methods in MATLAB: random sampling without replacement and
bootstrap sampling with replacement. Random resampling without replacement keeps the CV the
same as in the observed set, whereas the bootstrap method changes the CV slightly. We resampled
10,000 times using each method and then recalculated the mean R², slope and intercept. Generally, the
results from the bootstrap method are very similar to those from the random resampling method, so
the results from the original data are compared with those from the resampled data
based on the random resampling method. Figure S6 shows the R² of the resampled
relationship versus CV_P/CV_PET and CV_P/CV_ET, respectively. As can be seen, there are clear
increasing trends of R² with CV_P/CV_PET and CV_P/CV_ET, as expected for spurious correlation. The
scatter plots in Figure S6 differ from those in Figure S4 (a) and (b), which show the trends
based on the original data and are not likely subject to spurious correlation, as explained
above together with Figure S3. That is to say, the linearity between PET/P and ET/P is subject to
spurious correlation, as demonstrated by Benson [1965] and Brett [2004], when the physical
connections are destroyed by random resampling. The relationship between R² and CV_P/CV_ET in
Figure S4 (b) and Figure S6 (b) looks slightly spurious, but this is
likely because P is a controlling factor of ET (as stated in the manuscript). Thus, owing
to the strong physical causal relation of ET with PET and P, the linear relationship identified
for ET/P vs. PET/P should not be subject to spurious correlation.
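The resampling procedure described above can be sketched as follows. This is an illustrative Python version, not the authors' MATLAB code. P, PET and ET are each resampled independently, either by permutation (random sampling without replacement) or by bootstrap (with replacement). The line ET/P vs. PET/P is then refitted by ordinary least squares (an assumption), and the slope, intercept and R² are averaged over the replicates. The function names, the seed and the synthetic series are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """OLS slope, intercept and R^2 of y on x."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return slope, intercept, r2


def resample(rng, series, method):
    """Permutation keeps the values (and CV); bootstrap draws with replacement."""
    if method == "permutation":
        return rng.permutation(series)
    if method == "bootstrap":
        return rng.choice(series, size=series.size, replace=True)
    raise ValueError(f"unknown method: {method}")


def monte_carlo_fit(p, pet, et, method="permutation", n_rep=10_000, seed=0):
    """Mean (slope, intercept, R^2) after breaking the link among P, PET, ET."""
    rng = np.random.default_rng(seed)
    p, pet, et = (np.asarray(v, dtype=float) for v in (p, pet, et))
    results = np.empty((n_rep, 3))
    for i in range(n_rep):
        p_i = resample(rng, p, method)
        pet_i = resample(rng, pet, method)
        et_i = resample(rng, et, method)
        results[i] = fit_line(pet_i / p_i, et_i / p_i)
    return results.mean(axis=0)


if __name__ == "__main__":
    # Synthetic 24-year series (mm) with CV_P much larger than CV_PET and CV_ET.
    p = np.linspace(400.0, 1200.0, 24)
    pet = np.linspace(950.0, 1050.0, 24)
    et = np.linspace(450.0, 550.0, 24)
    for method in ("permutation", "bootstrap"):
        slope, intercept, r2 = monte_carlo_fit(p, pet, et, method, n_rep=2_000)
        print(f"{method}: slope = {slope:.3f}, intercept = {intercept:.3f}, "
              f"R^2 = {r2:.3f}")
```

Because permutation only reorders the values, each resampled series keeps exactly the CV of the observed one, whereas a bootstrap sample can repeat or omit years and therefore changes the CV slightly.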
Figure S6 Scatter plots of resampled R² against CV_P/CV_PET (a) and CV_P/CV_ET (b)
As shown in Figure S7, with the resampled data the slope and intercept no longer sustain the
physical pattern shown in Figure S5 for the original data. Compared to Figure S5, a
somewhat similar pattern of slope can be found, but there is no pattern at all for the
resampled intercept. The decreasing trend of the resampled slope with CV_P/CV_PET
(Figure S7 (a)) is similar to the trend identified from the original data (Figure S5 (a)) because the
interannual variability of PET is small and the randomly resampled series are very similar to the
original ones. The random distribution of the resampled intercepts in Figure S7 (b) also indicates
that the resampled linearity is controlled by the common denominator, i.e., P. Basically, the
resampled linear relationships do not reflect an actual correlation, although the
coefficient of determination is high.
Figure S7 Scatter plots of resampled mean slope (a) and intercept (b) against catchment PET/P.
In summary, the Monte Carlo experiments suggest that the relationship between ET/P and
PET/P is, at least to some extent, not a spurious correlation, as shown especially by the
comparison between the resampled and original results. The strong physical connection of ET
with P and PET safeguards the relationship and its physical interpretation.
References:
Benson, M. A. (1965), Spurious correlation in hydraulics and hydrology, Journal of the
Hydraulics Division, 91(4), 35-42.
Brett, M. T. (2004), When is a correlation between non-independent variables "spurious"? Oikos,
105(3), 647-656, doi:10.1111/j.0030-1299.2004.12777.x.
Brutsaert, W., and H. Stricker (1979), An advection-aridity approach to estimate actual regional
evapotranspiration, Water Resour Res, 15(2), 443-450, doi:10.1029/WR015i002p00443.
Budyko, M. I. (1974), Climate and life, 508 pp., Academic, New York.
Firebaugh, G., and J. P. Gibbs (1985), User's Guide to Ratio Variables, American Sociological
Review, 50(5), 713-722.
Kronmal, R. A. (1993), Spurious Correlation and the Fallacy of the Ratio Standard Revisited,
Journal of the Royal Statistical Society. Series A (Statistics in Society), 156(3), 379-392.
Milly, P. C. D. (1994a), Climate, Soil Water Storage, and the Average Annual Water Balance,
Water Resour Res, 30(7), 2143-2156, doi:10.1029/94WR00586.
Milly, P. C. D. (1994b), Climate, interseasonal storage of soil water, and the annual water
balance, Adv Water Resour, 17(1-2), 19-24.
Pearson, K. (1897), Mathematical contributions to the theory of evolution — On a form of
spurious correlation which may arise when indices are used in the measurement of organs,
Proceedings of the Royal Society of London, 60, 489-498.
Penman, H. L. (1948), Natural evaporation from open water, bare soil and grass, Proceedings of
the Royal Society of London. Series A, Mathematical and Physical Sciences, 193(1032),
120-145, doi:10.1098/rspa.1948.0037.
Potter, N. J., and L. Zhang (2007), Water balance variability at the interstorm timescale, Water
Resour Res, 43, W05405, doi:10.1029/2006WR005276.
Potter, N. J., and L. Zhang (2009), Interannual variability of catchment water balance in
Australia, J Hydrol, 369(1-2), 120-129, doi:10.1016/j.jhydrol.2009.02.005.
Yang, D., F. Sun, Z. Liu, Z. Cong, G. Ni, and Z. Lei (2007), Analyzing spatial and temporal
variability of annual water-energy balance in nonhumid regions of China using the Budyko
hypothesis, Water Resour Res, 43, W04426, doi:10.1029/2006WR005224.
Yang, D., W. Shao, P. J. F. Yeh, H. Yang, S. Kanae, and T. Oki (2009), Impact of vegetation
coverage on regional water balance in the nonhumid regions of China, Water Resour Res, 45,
W00A14, doi:10.1029/2008WR006948.
Yang, H., D. Yang, Z. Lei, and F. Sun (2008), New analytical derivation of the mean annual
water-energy balance equation, Water Resour Res, 44, W03410,
doi:10.1029/2007WR006135.
Yule, G. U. (1910), On the interpretation of correlations between indices or ratios, Journal of the
Royal Statistical Society, 73, 644-647.
Zhang, L., K. Hickel, W. R. Dawes, F. H. S. Chiew, A. W. Western, and P. R. Briggs (2004), A
rational function approach for estimating mean annual evapotranspiration, Water Resour Res,
40, W02502, doi:10.1029/2003WR002710.