Uncertainty due to gap-filling in long-term hydrologic datasets Craig See1, Jeremy Hayward1, Ruth Yanai1, Mark Green2, Doug Moore3 Amey Bailey4 1. SUNY College of Environmental Science and Forestry, Syracuse, NY 2. Plymouth State University, Plymouth, NH 3. University of New Mexico, Albuquerque, NM 4. USDAForest Service, Northern Research Station All long term datasets contain gaps Credit: Don Buso Why Gap Filling? Sometimes we need a continuous record (calculating fluxes) (Kiang et. al 2013) Gap filling methods • • • • Use of historical averages Bayesian Bootstrapping Expectation-maximization algorithm Use neighboring values – Direct substitution – Regression All gap filling methods introduce new error into the final total! Stream flow Gap Causes Coweeta Hubbard Brook 1% 1% 6% 10% 13% 8% 25% Equipment failure Maintenance 6% Operator error 12% Ice in v-notch Debris in v-notch 68% Misc. 50% Precipitation Volume Gaps at Hubbard Brook Precipitation Chemistry Gaps at Sevilletta NWR 5% 2% 8% Contaminated Sample Technician Error Overflow Missing in Field 85% Credit: Odonfiction.wordpress.com Credit: Doug Moore Streamflow gaps at Gomadansan Experimental Forest 0 50 100 150 0 50 100 300 200 weir 12 100 0 150 100 weir 16 50 0 150 100 weir 17 50 0 100 weir 20 50 0 0 100 200 300 0 50 100 150 (Yanai et al. 2013, in review) Example: Stream Flow at Hubbard Brook • Long term record from 2 watersheds • Model predicting one from the other • Record of actual gaps occurring – Gap length – Flow rate at start of gap Credit: HBRF Watershed 6 stream flow (ft3/s) Stream flow at Hubbard Brook 1962-2001 Watershed 5 stream flow (ft3/s) Absolute difference (ft3) Flow rate during gap impacts uncertainty in total volume (half hour gap) Initial stream flow (ft3/s) Methods-flow distribution Frequency Randomly sample 100,000 “fake gaps” from the data to create a distribution of change in flow during a gap. Change in stream flow (ft3/s) Creating a sampling pool For each “real gap” in the record: • Randomly create a “fake gap” of same length from the master dataset • Randomly select a flow-change value from the flow distribution • Does the max. flow rate inside the fake gap exceed the max. flow rate of the real gap ± the flow change value? – If yes, reject and pick another fake gap – If no, include this fake gap in sampling pool Sampling Pools • Create a sampling pool of 1000 possible uncertainties for each real gap in the record • To calculate uncertainty for a year, sum random values pulled from sampling pools for each real gap in that year. • Do this for 100,0000 iterations. Frequency 1996 (wet year) -6.7% 6.7% Anualstream flow uncertainty (mm) Frequency 2001 (dry year) -14.5% 14.5% Annualstream flow uncertainty (mm) Dry Year (2001) Frequency Wet Year (1996) -6.7% 6.7% Annual stream flow uncertainty (mm) 14.5% 14.5% Annual stream flow uncertainty (mm) Precipitation gaps at Sevilleta (1989-1996) How do we incorporate “gap uncertainty” into annual nitrate deposition estimates? Precipitation gaps at Sevilleta How do we incorporate “gap uncertainty” into annual nitrate deposition estimates? Methods-Wet Deposition Gaps • Construct a variogram based on observations that include all gauges (84 observations) • Use basic kriging interpolation to estimate deposition for all observations • Re-run analysis separately for all combinations of funnels. • Create distribution of observed-expected for each combination Gauge 8 removed Frequency Frequency Gauge 3 removed -1.5% 1.5% Observed – Expected NO3 (mg/m2) -2.6% 2.6% Observed – Expected NO3 (mg/m2) Precipitation gaps at Sevilleta How do we incorporate “gap uncertainty” into annual nitrate deposition estimates? Removing multiple gauges rarely changes the range of possible errors, but increases the chance of higher error within that range for a given event. Gauge 3&8 removed Frequency Frequency Gauge 8 removed -2.6% Observed – Expected NO3 2.6% (mg/m2) -2.6% 2.6% Observed – Expected NO3 (mg/m2) There is no correlation between amount of deposition from a collection, and the percent error in gap filling with kriging. Gauge 8 120 Percent error 100 80 60 40 20 0 0 5 10 15 20 25 Observed NO3 (mg/m2) Its OK to express your sampling pool as a percentage. Frequency To account for differences in collection, express the distribution as a percent of N deposition during a given event Gauge 8 Percent of observed value Use these percentages and the deposition estimate for a real gap to create a sampling pool! Summary • Data gaps can contribute significant uncertainty to estimates of ecosystem nutrient inputs and outputs. • With long term datasets, this uncertainty can be estimated empirically Advantages: Disadvantages: • Doesn’t rely on parametric estimates • Can use any model • Easy to understand • Requires data • Computationally intensive • The past doesn’t always predict the future! Thank you! • • • • Co-authors John Campbell Franklin Diggs Becka Walling