Water Resources Research

Supporting Information for

A Bayesian Approach to Improved Calibration and Prediction of Groundwater Models With Structural Error

T. Xu, A. J. Valocchi

Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Contents of this file

Text S1
Figures S1-S4

Text S1. Generation of recharge and evapotranspiration rates in the synthetic case study

This section describes in more detail how the spatiotemporally varying recharge rates used in the virtual reality and in the simple model (Section 3.1) were generated. First, annual recharge rates for 20 years were generated from a first-order autoregressive (AR(1)) model with a long-term mean of 0.1 m/year. Second, the annual recharge rates were distributed to individual months using a fixed set of monthly multipliers. For the recharge rate used in the virtual reality, a spatially varying factor field was simulated for each month using SGeMS, with a mean of 1, a variance of 1, and an isotropic spherical variogram with a range of 1 km. The spatially varying recharge field for each month was then calculated by multiplying the factor field by the recharge rate of that month.

For the recharge rate in the simple model, on the other hand, the spatial factor fields were first contaminated with noise, sampled at five virtual climate stations, and then extrapolated throughout the whole domain to obtain "smoothed" factor fields. This introduces input error of the kind caused by, e.g., a limited number of precipitation sampling locations or a coarse-resolution soil type map. The smoothed factor fields were then multiplied by the monthly recharge rates to calculate the spatiotemporally varying recharge in the simplified model. The evapotranspiration (EVT) fields were generated in a similar way.

Figure S1. Top: Drawdown calibration error of the standard LSR method plotted versus drawdown computed by the LSR-calibrated model.
Bottom: Drawdown calibration error of the Bayesian approach plotted versus the Bayesian posterior mean. The Bayesian calibration error is calculated as the difference between the calibration targets and the Bayesian posterior mean. The Bayesian approach yields errors of smaller magnitude that are more evenly spread around 0.

Figure S2. Stream gain-and-loss (ΔQ) calibration error time series of the standard LSR method (top) and the Bayesian approach (bottom). The Bayesian calibration error is calculated as the difference between the calibration targets and the Bayesian posterior mean. The LSR calibration error shows an increasing trend. The Gaussian process (GP) posterior mean captures this trend; therefore, the remaining residual is more evenly distributed around 0.

Figure S3. Autocorrelation function (ACF) of the calibration error of the conventional LSR (blue) and the Bayesian approach (red). The dash-dotted lines enclose the 95% confidence interval under the null hypothesis that the true autocorrelation is 0. One lag equals three months, because drawdowns and stream gain-and-loss are observed quarterly. The LSR calibration error shows strong temporal correlation within a time span of one year, which makes the use of a diagonal error covariance matrix questionable. In contrast, the Bayesian residual has significantly weaker temporal correlation, mostly falling within the 95% confidence bounds, because the GP error model captures the temporal correlation structure of the model structural error.

Figure S4. Correlation coefficients among the calibration errors for the seven drawdown locations and stream gain-and-loss, resulting from the conventional LSR (a) and the Bayesian approach (b). Strong correlation can be observed among the drawdown locations and between drawdowns and stream gain-and-loss. As with the temporal correlation, such correlation makes the use of a diagonal error covariance matrix questionable.
For the Bayesian calibration error, the correlation among calibration targets is significantly smaller, within the range (-0.5, 0.5), indicating that the GP error model captures the correlation structure of the model structural error and leaves a residual error that is close to white noise.
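The recharge-generation steps of Text S1 can be sketched in Python with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the AR(1) coefficient and innovation standard deviation, the monthly multipliers, the grid, the noise level, and the station locations are hypothetical, because Text S1 does not report them. SGeMS is replaced here by a Cholesky (LU) simulation of a Gaussian random field, and the unspecified interpolation from the five stations is replaced by inverse-distance weighting; neither substitute is necessarily what the study used.

```python
import numpy as np

rng = np.random.default_rng(42)

# ---- Step 1: annual recharge from an AR(1) model ----
N_YEARS = 20
MEAN_R = 0.1          # long-term mean recharge (m/year), from Text S1
PHI = 0.5             # hypothetical lag-1 autoregressive coefficient
SIGMA_EPS = 0.02      # hypothetical innovation standard deviation (m/year)

annual = np.empty(N_YEARS)
# start from the stationary distribution of the AR(1) process
annual[0] = MEAN_R + rng.normal(0.0, SIGMA_EPS / np.sqrt(1.0 - PHI**2))
for t in range(1, N_YEARS):
    annual[t] = MEAN_R + PHI * (annual[t - 1] - MEAN_R) + rng.normal(0.0, SIGMA_EPS)
annual = np.clip(annual, 0.0, None)   # guard against negative recharge

# ---- Step 2: distribute each year to months with fixed multipliers ----
# Hypothetical multipliers, normalized to sum to 1 so the months add up to the year.
mult = np.array([0.06, 0.08, 0.14, 0.16, 0.12, 0.07,
                 0.05, 0.04, 0.05, 0.07, 0.08, 0.08])
mult = mult / mult.sum()
monthly = np.outer(annual, mult).ravel()   # 240 monthly recharge depths (m)

# ---- Step 3: monthly factor fields (mean 1, variance 1, spherical, range 1 km) ----
NX = NY = 20
DX = 100.0            # hypothetical cell size (m): 2 km x 2 km domain
RANGE = 1000.0        # variogram range (m), from Text S1

xs, ys = np.meshgrid(np.arange(NX) * DX, np.arange(NY) * DX)
coords = np.column_stack([xs.ravel(), ys.ravel()])
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def spherical_cov(h, sill=1.0, a=RANGE):
    """C(h) = sill * (1 - 1.5 h/a + 0.5 (h/a)^3) for h < a, and 0 for h >= a."""
    r = np.minimum(h / a, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r**3)

# Cholesky (LU) simulation stands in for the SGeMS simulation used in the study.
C = spherical_cov(dist) + 1e-8 * np.eye(len(coords))   # small jitter for stability
L = np.linalg.cholesky(C)
n_months = N_YEARS * 12
# A Gaussian field with mean 1 and variance 1 is negative in some cells;
# how (or whether) the study handled this is not stated in Text S1.
factors = 1.0 + L @ rng.standard_normal((len(coords), n_months))

# Virtual-reality recharge: each month's factor field times that month's recharge.
recharge_truth = factors * monthly[None, :]            # (cells, months)

# ---- Step 4: "smoothed" factor fields for the simple model ----
NOISE_SD = 0.3                                          # hypothetical noise level
stations = rng.choice(len(coords), size=5, replace=False)  # 5 virtual stations

def idw(station_values, power=2.0):
    """Inverse-distance weighting from the stations to every cell
    (a stand-in; Text S1 does not name the interpolation method)."""
    d = dist[:, stations]                               # (cells, 5)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return (w @ station_values) / w.sum(axis=1, keepdims=True)

noisy = factors + rng.normal(0.0, NOISE_SD, size=factors.shape)
smoothed = idw(noisy[stations, :])                      # (cells, months)
recharge_simple = smoothed * monthly[None, :]

print(recharge_truth.shape, recharge_simple.shape)
```

Drawing all monthly fields from one Cholesky factor keeps the sketch short; on a realistic model grid, a sequential Gaussian simulation such as the one in SGeMS scales far better than a dense Cholesky factorization. The EVT fields could be produced with the same steps and different parameters.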
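The residual diagnostics behind Figures S1 to S4 can be sketched as follows: the calibration error as targets minus the prediction (the LSR-calibrated output or the Bayesian posterior mean), the sample ACF of an error series, and the 8 x 8 correlation matrix among the seven drawdown errors and the stream gain-and-loss error. The 95% band is assumed here to be the common large-sample white-noise bound of plus or minus 1.96/sqrt(N); the captions do not state which bound was used. All data in the sketch are synthetic and hypothetical, not results from the study.

```python
import numpy as np

def calibration_error(targets, prediction):
    """Calibration error = calibration targets minus model prediction
    (the LSR-calibrated output, or the Bayesian posterior mean)."""
    return np.asarray(targets, dtype=float) - np.asarray(prediction, dtype=float)

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def acf_bound(n, z=1.96):
    """Approximate 95% band +/- z/sqrt(n) under the null of zero autocorrelation."""
    return z / np.sqrt(n)

# Targets and a hypothetical posterior mean at four observation times.
print(calibration_error([1.2, 0.8, 1.5, 0.9], [1.0, 1.0, 1.4, 1.1]))

rng = np.random.default_rng(0)
n = 80   # hypothetical: 20 years of quarterly data, so one lag = 3 months

# Hypothetical temporally correlated residual (AR(1), phi = 0.8) vs white noise.
ar = np.empty(n)
ar[0] = rng.standard_normal()
for t in range(1, n):
    ar[t] = 0.8 * ar[t - 1] + rng.standard_normal()
white = rng.standard_normal(n)

bound = acf_bound(n)
for name, e in [("correlated", ar), ("white", white)]:
    r = sample_acf(e, max_lag=4)              # lags 1-4 span one year
    inside = np.mean(np.abs(r[1:]) < bound)
    print(f"{name}: ACF lags 1-4 = {np.round(r[1:], 2)}, fraction inside band = {inside:.2f}")

# Cross-correlation among 8 error series (7 drawdown wells + stream gain-and-loss).
common = rng.standard_normal((n, 1))                  # hypothetical shared component
errors = 0.8 * common + 0.6 * rng.standard_normal((n, 8))
R = np.corrcoef(errors, rowvar=False)                 # 8 x 8 matrix, as in Figure S4
off_diag = R[~np.eye(8, dtype=bool)]
print("max |off-diagonal correlation| =", round(float(np.abs(off_diag).max()), 2))
```

A correlated residual like `ar` typically has several lags outside the band, as the LSR error does in Figure S3, and a shared component across series produces large off-diagonal entries in `R`, as in Figure S4(a); both patterns argue against a diagonal error covariance matrix.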