Evasion modeling and uncertainty estimation


Supporting Information

Evasion of CO₂ from streams – The dominant component of the carbon export through the aquatic conduit in a boreal landscape


Marcus Wallin, Thomas Grabs, Ishi Buffam, Hjalmar Laudon, Anneli Ågren, Mats Öquist & Kevin Bishop


Methods


Uncertainties of evasion rates calculated by the deterministic model equations presented in the main text (equations 1 to 3) were estimated through a Monte Carlo experiment. A Monte Carlo-based approach was chosen because the nonlinearity of the model equations makes the application of analytical error propagation methods more complex. For the Monte Carlo experiment, an ensemble of 50'000 random parameter sets (13 parameters per set) was created by drawing parameters from a multivariate normal distribution. A random parameter set was selected at the beginning of each model simulation and kept constant during the simulation period. Random parameter sets were drawn using Latin hypercube sampling, which samples the parameter space more efficiently than standard random sampling. Robust uncertainty estimates can therefore be calculated with smaller sample sizes and fewer model runs than standard random sampling requires. Convergence and robustness of the uncertainty estimates were tested by repeating the Monte Carlo experiment with 6 differently sized test sample sets. The 6 test sample sets were created from 6 independent Latin hypercube samplings and contained 50, 100, 500, 1'000, 5'000 and 10'000 parameter sets, respectively. The Monte Carlo procedure was deemed robust and convergent for a selected uncertainty estimate if repeating the experiment with increasing sample sizes led to relatively small (≤ 5 %) and decreasing absolute relative differences in the two uncertainty estimates $\theta$. Relative differences $\delta_\theta$ in the uncertainty estimates were calculated by comparing the estimate $\theta_N$ of a Monte Carlo experiment with a test sample set of size $N$ to the estimate resulting from the Monte Carlo experiment with the largest sample size (i.e., $N$ = 50'000) (Equation 1). Convergence and robustness were verified visually by plotting $\delta_\theta$ against $N$. Because preliminary results indicated that the modeled stochastic total and specific evasion rates were approximately normally distributed, their mean and standard deviation were chosen as appropriate uncertainty estimates ($\theta \in \{\mu, \sigma\}$).

$$\delta_{\theta}(N) = \frac{\theta_{N} - \theta_{50\,000}}{\theta_{50\,000}} \cdot 100\,\% \qquad \text{(Equation 1)}$$
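A minimal Python sketch of Equation 1 and of the "≤ 5 % and decreasing" convergence criterion described above. The function names, the rule that the criterion must hold for every larger sample size, and the example numbers are illustrative assumptions rather than the study's code or data; the authors verified convergence visually.

```python
from typing import Dict, Optional


def relative_difference(theta_n: float, theta_ref: float) -> float:
    """Equation 1: relative difference (%) between an estimate from a run with
    N parameter sets and the reference estimate from the largest run."""
    return (theta_n - theta_ref) / theta_ref * 100.0


def smallest_converged_size(estimates: Dict[int, float], theta_ref: float,
                            tol_percent: float = 5.0) -> Optional[int]:
    """Return the smallest sample size N from which all absolute relative
    differences stay <= tol_percent and do not increase with N.

    Assumption: this formalizes the '<= 5 % and decreasing' criterion;
    it is not taken from the original analysis."""
    sizes = sorted(estimates)
    abs_diff = [abs(relative_difference(estimates[n], theta_ref)) for n in sizes]
    for i, n in enumerate(sizes):
        tail = abs_diff[i:]
        small = all(d <= tol_percent for d in tail)
        non_increasing = all(a >= b for a, b in zip(tail, tail[1:]))
        if small and non_increasing:
            return n
    return None


# Invented values for one uncertainty estimate (e.g. a standard deviation);
# theta_ref stands in for the estimate from the 50'000-set run.
sigma_by_size = {50: 2.6, 100: 1.7, 500: 2.15, 1000: 1.92, 5000: 2.05, 10000: 1.98}
print(smallest_converged_size(sigma_by_size, theta_ref=2.0))  # 1000
```

The same check would be applied separately to the mean and to the standard deviation of each stochastic model output.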

The 13 parameters (Table 1) that were drawn randomly from probability distributions included parameters to estimate $k_{\mathrm{CO_2}}$ (2 parameters: $a_k$, $b_k$), $\tau_{\mathrm{norm}}$ (2 parameters: $a_\tau$, $b_\tau$), $\Delta\mathrm{CO_2}$ (4 parameters: $\varepsilon_{\Delta\mathrm{CO_2},k}$), stream flow (1 parameter: $\varepsilon_Q$) and stream width (4 parameters: $w_k$). The uncertainty in stream temperature estimates, calculated as daily mean stream temperature per stream order, was approximately ±3 °C and had relatively little effect on estimates of $k_{\mathrm{CO_2}}$ (±3 %). For this reason, and to keep computational costs low, stream temperature was used deterministically and excluded from the Monte Carlo experiment.

Probability distributions for random parameter sampling were determined from experimental data for all parameters except for the errors related to uncertainty in streamflow. Streamflow was assumed to be associated with a log-normally distributed error of 5 %. Bivariate normal distributions of the regression model parameters for estimating gas transfer coefficients $k_{\mathrm{CO_2}}$ and stream reach travel times $\tau_{\mathrm{norm}}$ were determined through standard linear regression (Wallin et al. 2011). For each stream order, a log-normal distribution was fitted to observed stream widths to quantify the mean stream width $w_k$ and the variation around the mean. Errors arising from estimating average daily $\Delta\mathrm{CO_2}$ for different stream orders from observed $\Delta\mathrm{CO_2}$ were quantified through cross-validation. For this, observed $\Delta\mathrm{CO_2}$ time series were split in two, using every second measurement for validation (calculation of $\Delta\mathrm{CO_2}$ residuals) and all remaining measurements for estimation. Relative errors per stream order $\varepsilon_{\Delta\mathrm{CO_2},k}$ were calculated by dividing the residuals (i.e., the difference between observed and predicted $\Delta\mathrm{CO_2}$) by observed $\Delta\mathrm{CO_2}$. The relative errors per stream order were assumed to be normally distributed with zero mean, and their standard deviations were estimated from the computed relative errors. All fitted theoretical probability distributions were compared to the respective empirical distributions with Kolmogorov-Smirnov tests. The test results were used to identify cases in which a theoretical probability distribution did not fit the underlying empirical distribution well, as indicated by rejection of the null hypothesis (null hypothesis: 'the theoretical probability distribution is the same as the empirical distribution'). Cases for which the null hypothesis was rejected were assessed in more detail by visual analysis of quantile-quantile plots and density plots of the corresponding theoretical and empirical distributions. Where the underlying empirical distribution was more complex, a less well-fitting theoretical distribution (rejection of the null hypothesis) was kept if it resulted in wider rather than narrower final uncertainty estimates.
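The error-estimation and goodness-of-fit steps described above can be sketched in plain Python. This is a hedged illustration, not the authors' code: the prediction step that turns the estimation subset into predicted $\Delta\mathrm{CO_2}$ is not specified here, so the example uses a hypothetical placeholder predictor (the estimation-subset mean) on a synthetic series, and the p-value comes from the asymptotic Kolmogorov distribution with Stephens' small-sample correction rather than an exact test. Because the standard deviation of the fitted normal is estimated from the same errors, standard Kolmogorov-Smirnov p-values are conservative in this setting.

```python
import math
from statistics import NormalDist, mean, stdev


def split_every_second(series):
    """Return (estimation, validation): every second measurement is held out
    for validation and all remaining measurements are used for estimation."""
    return list(series[0::2]), list(series[1::2])


def relative_errors(observed, predicted):
    """Relative residuals: (observed - predicted) / observed."""
    return [(o - p) / o for o, p in zip(observed, predicted)]


def ks_statistic(values, dist):
    """One-sample Kolmogorov-Smirnov statistic D against a fitted NormalDist."""
    x = sorted(values)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = dist.cdf(xi)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution,
    using Stephens' correction lambda = (sqrt(n) + 0.12 + 0.11/sqrt(n)) * D."""
    root_n = math.sqrt(n)
    lam = (root_n + 0.12 + 0.11 / root_n) * d
    if lam < 0.2:  # the series converges poorly here, and Q(lam) is ~1 anyway
        return 1.0
    q = sum(2.0 * (-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2)
            for j in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)


# Synthetic Delta-CO2 series (mg L-1) for one stream order; NOT the study's data.
series = [2.1, 2.3, 1.9, 2.4, 2.0, 2.2, 1.8, 2.5, 2.1, 2.0]
estimation, validation = split_every_second(series)
# Hypothetical placeholder predictor: the mean of the estimation subset.
predicted = [mean(estimation)] * len(validation)
errors = relative_errors(validation, predicted)
sigma = stdev(errors)                             # spread of the relative errors
d = ks_statistic(errors, NormalDist(0.0, sigma))  # zero-mean normal, as assumed
p = ks_pvalue_asymptotic(d, len(errors))
print(f"sigma = {sigma:.3f}, D = {d:.3f}, p ~ {p:.2f}")
```

A rejected test (small p) would then trigger the visual inspection of quantile-quantile and density plots described above rather than automatic replacement of the distribution.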


Results and discussion


Means and standard deviations of evasion rates estimated using Monte Carlo experiments with sample sizes greater than 1'000 differed by less than 5 % from the estimates obtained with a sample size of 50'000 parameter sets (Figure 1). This held for evasion rates calculated both per site and per stream order.

Monte Carlo realizations of evasion rates were approximately normally distributed (Figure 2), which indicated that the mean and standard deviation were appropriate statistics. Consequently, uncertainties were expressed as standard deviations around the mean rather than as confidence intervals.

Kolmogorov-Smirnov tests performed on the distributions used to generate random parameter sets indicated that the fitted normal and log-normal distributions (Tables 2 and 3) approximately followed the shape of the respective empirical distributions. This was true for all fitted distributions except those describing mean errors when estimating $\Delta\mathrm{CO_2}$ in first- and third-order streams from point measurements. Visual assessment of quantile-quantile plots and density plots (Figure 3) showed, however, that in these cases the fitted normal distributions had a larger spread than the respective empirical data distributions. Using the relatively wider fitted distributions as the basis for Latin hypercube sampling therefore leads to slightly larger uncertainty intervals than the original experimental data would suggest.

In summary, the Monte Carlo experiment produced stable uncertainty estimates (standard deviations around the mean) that are more likely too large than too small, given the general assumption that the chosen CO₂ evasion model adequately represents the corresponding complex natural processes.


Table 1. Overview of model parameters.

Parameter | Unit | Type / In- or output | Description and where used in main paper
$E_{\mathrm{CO_2}}$ | mg s⁻¹ | Stochastic / Output | Stream CO₂ evasion (equation 1)
$k_{\mathrm{CO_2}}$ | min⁻¹ | Stochastic / Input | Gas-specific transfer coefficient (equation 2)
$\Delta\mathrm{CO_2}$ | mg L⁻¹ | Stochastic / Input | Difference between in-stream and atmospheric CO₂ (stochastic in combination with $\varepsilon_{\Delta\mathrm{CO_2},k}$)
$\tau_{\mathrm{norm}}$ | min m⁻¹ | Stochastic / Input | Normalized stream reach residence time (equation 3)
$L_{\mathrm{segment}}$ | m | Deterministic / Input | Stream reach length (GIS-derived)
$Q$ | L s⁻¹ | Stochastic / Input | Stream flow (stochastic in combination with $\varepsilon_Q$)
$\tan\beta$ | m m⁻¹ | Deterministic / Input | Stream slope (GIS-derived) (equation 2)
$a_k$ | min⁻¹ | Stochastic / Input | Gas transfer coefficient regression model slope (equation 2)
$b_k$ | min⁻¹ | Stochastic / Input | Gas transfer coefficient regression model intercept (equation 2)
$a_\tau$ | ? a) | Stochastic / Input | Reach travel time regression model slope (equation 3)
$b_\tau$ | ? a) | Stochastic / Input | Reach travel time regression model intercept (equation 3)
$\varepsilon_{\Delta\mathrm{CO_2},k}$ | [–] b) | Stochastic / Input | Mean error (multiplicative) when estimating $\Delta\mathrm{CO_2}$ (for stream Strahler order $k$)
$\varepsilon_Q$ | [–] b) | Stochastic / Input | Mean error (multiplicative) when estimating streamflow
$T_k$ | °C | Deterministic / Input | Stream temperature (interpolated per stream order)
$w_k$ | m | Stochastic / Input | Mean stream width (for stream Strahler order $k$)

a) The units are not informative because the regression model was applied to ln($Q$).
b) Dimensionless.


Table 2. Parameter distributions used for Latin hypercube sampling.

Parameter | Distribution | Mean | Standard deviation | n | Kolmogorov-Smirnov test (H0: normally distributed)
$a_k$ | $(a_k, b_k) \sim \mathcal{N}(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ | 0.014 | 2.05·10⁻³ | 14 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.13, p = 0.96) a)
$b_k$ | (jointly with $a_k$) | 0.01 | 6.85·10⁻³ | 14 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.13, p = 0.96) a)
$a_\tau$ | $(a_\tau, b_\tau) \sim \mathcal{N}(\boldsymbol{\mu}_\tau, \boldsymbol{\Sigma}_\tau)$ | -0.39 | 0.02 | 114 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.05, p = 0.96) a)
$b_\tau$ | (jointly with $a_\tau$) | -0.97 | 0.06 | 114 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.05, p = 0.96) a)
$\varepsilon_{\Delta\mathrm{CO_2},1}$ | $\mathcal{N}$ | 0 d) | 2.19 % | 101 | H0 rejected ($s_{\mathrm{ks}}$ = 0.16, p = 0.009) b) c)
$\varepsilon_{\Delta\mathrm{CO_2},2}$ | $\mathcal{N}$ | 0 d) | 1.40 % | 101 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.13, p = 0.067) b)
$\varepsilon_{\Delta\mathrm{CO_2},3}$ | $\mathcal{N}$ | 0 d) | 2.37 % | 100 | H0 rejected ($s_{\mathrm{ks}}$ = 0.17, p = 0.004) b) c)
$\varepsilon_{\Delta\mathrm{CO_2},4}$ | $\mathcal{N}$ | 0 d) | 2.28 % | 99 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.13, p = 0.056) b)
$\varepsilon_Q$ | $\exp(\mathcal{N})$ | 0 d) | 0.05 d) | 0 | not tested
$w_1$ | $\exp(\mathcal{N})$ | 4.24 | 0.08 | 61 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.08, p = 0.9) b)
$w_2$ | $\exp(\mathcal{N})$ | 4.91 | 0.07 | 65 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.08, p = 0.43) b)
$w_3$ | $\exp(\mathcal{N})$ | 5.42 | 0.23 | 16 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.17, p = 0.7) b)
$w_4$ | $\exp(\mathcal{N})$ | 6.23 | 0.05 | 16 | H0 not rejected ($s_{\mathrm{ks}}$ = 0.18, p = 0.46) b)

a) Test performed on regression residuals. The covariance matrix $\boldsymbol{\Sigma}$ accounts for non-zero covariance/correlation between regression parameters.
b) Test performed using the sample standard deviation. For random sampling, the standard deviation around the mean was used.
c) Visual assessment of quantile-quantile plots and density plots indicated that the fitted distributions had a larger spread than the experimental data.
d) Parameter was not estimated but chosen based on theoretical considerations.

Table 3. Correlations between regression parameters used for Latin hypercube sampling.

Parameter pair | Correlation
$a_k$, $b_k$ | -0.81
$a_\tau$, $b_\tau$ | -0.86
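A minimal pure-Python sketch of how a 13-parameter ensemble could be drawn by Latin hypercube sampling from the distributions in Tables 2 and 3. Several choices here are assumptions, not the documented procedure: correlation within each regression pair is imposed by mixing independent normal scores with the 2×2 Cholesky factor of the correlation matrix (one common approach; it keeps both marginals normal, but the second parameter of each pair is no longer exactly stratified), log-normal parameters are drawn as exp(μ + σz) using the log-space values listed in Table 2, and how the ε factors enter the model equations is not shown. All function and key names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Values transcribed from Tables 2 and 3.
K_MEAN, K_SD, K_RHO = (0.014, 0.01), (2.05e-3, 6.85e-3), -0.81     # (a_k, b_k)
TAU_MEAN, TAU_SD, TAU_RHO = (-0.39, -0.97), (0.02, 0.06), -0.86    # (a_tau, b_tau)
EPS_DCO2_SD = (0.0219, 0.0140, 0.0237, 0.0228)  # eps_dCO2,k ~ N(0, sd), k = 1..4
EPS_Q_SD = 0.05                                 # eps_Q ~ exp(N(0, 0.05))
W_LOG_MEAN = (4.24, 4.91, 5.42, 6.23)           # w_k ~ exp(N(mean, sd))
W_LOG_SD = (0.08, 0.07, 0.23, 0.05)


def lhs_standard_normal(n, dims, rng):
    """Latin hypercube sample of n points in `dims` dimensions, as N(0, 1) scores.
    Each dimension is cut into n equal-probability strata, one value is drawn
    uniformly inside every stratum, and strata are shuffled per dimension."""
    columns = []
    for _ in range(dims):
        strata = list(range(n))
        rng.shuffle(strata)
        probs = [max((s + rng.random()) / n, 1e-12) for s in strata]  # avoid p = 0
        columns.append([STD_NORMAL.inv_cdf(p) for p in probs])
    return [list(row) for row in zip(*columns)]


def correlated_pair(z1, z2, mean, sd, rho):
    """Bivariate normal draw from two independent N(0, 1) scores via the
    Cholesky factor of [[1, rho], [rho, 1]]."""
    y2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    return mean[0] + sd[0] * z1, mean[1] + sd[1] * y2


def sample_parameter_sets(n, seed=0):
    """Draw n parameter sets with 13 stochastic parameters each."""
    rng = random.Random(seed)
    sets = []
    for z in lhs_standard_normal(n, 13, rng):
        a_k, b_k = correlated_pair(z[0], z[1], K_MEAN, K_SD, K_RHO)
        a_tau, b_tau = correlated_pair(z[2], z[3], TAU_MEAN, TAU_SD, TAU_RHO)
        sets.append({
            "a_k": a_k, "b_k": b_k, "a_tau": a_tau, "b_tau": b_tau,
            "eps_dco2": [sd * zi for sd, zi in zip(EPS_DCO2_SD, z[4:8])],
            "eps_q": math.exp(EPS_Q_SD * z[8]),
            "w": [math.exp(m + sd * zi)
                  for m, sd, zi in zip(W_LOG_MEAN, W_LOG_SD, z[9:13])],
        })
    return sets


ensemble = sample_parameter_sets(1000, seed=42)
```

Each dictionary in `ensemble` corresponds to one parameter set that would be held fixed for a full model simulation.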


Figure 1. Relative differences of means and standard deviations of yearly CO₂ evasion rates estimated from differently sized Monte Carlo experiments, compared to the estimates from the Monte Carlo experiment with the largest sample size (5·10⁴). Mean values and standard deviations in this figure were calculated from modeled yearly CO₂ evasion rates per stream surface area of first-order streams. All other stochastic model outputs showed similar patterns when comparing mean values and standard deviations against sample size.


Figure 2. Comparison of the empirical distribution (dashed line) with the fitted normal distribution (solid line) of modeled yearly CO₂ evasion rates per stream surface area of first-order streams. The empirical distribution is illustrated by a kernel density estimate, a smoothed counterpart of a histogram. The area under each curve was scaled to unity for better comparison. All other stochastic model outputs showed similar patterns when comparing empirical and fitted normal distributions.


Figure 3. Comparison of the empirical distribution (dashed line) with the fitted normal distribution (solid line) of relative (normalized) residuals when estimating $\Delta\mathrm{CO_2}$ in individual first-order streams. The empirical distribution is illustrated by a kernel density estimate, a smoothed counterpart of a histogram. The dotted curve represents the fitted normal distribution around the residual mean. The area under each curve was scaled to unity for better comparison. A similar result was found when plotting empirical and fitted distributions of relative (normalized) residuals when estimating $\Delta\mathrm{CO_2}$ in individual third-order streams.
