Classical and Bayesian nonlinear regression applied to hydraulic rating curve inference
Construction and uncertainty analysis of stage-discharge rating curves
Trond Reitan (Division of statistics and insurance mathematics, Department of Mathematics, University of Oslo)

Motivation for this work
• River hydrology
  – Management of fresh water resources
    • Decision-making concerning flood risk
    • Decision-making concerning drought
• River hydrology => How much water is flowing through the rivers?
• Key definition: discharge, the amount of water passing through a cross-section of the river per unit of time.

Key problem
• Discharge is expensive to measure, but hydrologists want discharge time series!
• Solution: Find a relationship between discharge and something that is inexpensive to measure.
• Usually, that something is water level.
• This job must be done over and over again: Need solid tools for finding such relationships.
• Discharge measurements are uncertain => need statistical tools.
• Program must be easy for hydrologists to use => User friendliness in statistics?
Water level definitions
• Stage: the height of the water level at a river site, h, measured from a datum (height = 0).
[Figure: river cross-section showing stage h, discharge Q, the level h0 and the datum (height = 0).]

Stage-discharge relationship
[Figure: the same cross-section (h, h0, datum at height = 0) next to the curve $Q = C(h - h_0)^b$; axis label: Discharge, Q.]

Stage-discharge relationship
• Simple physical attributes:
  – $Q = 0$ for $h \le h_0$
  – $Q(h_2) > Q(h_1)$ for $h_2 > h_1 > h_0$
• Parametric form suggested by hydraulics (Lambie (1978) and ISO 1100/2 (1998)): $Q = C(h - h_0)^b$
• The parameters may be fixed only within stage intervals: segmentation.
[Figure: panels with axis labels h, Q and width.]

Calibration data and statistical model
• n stage-discharge measurements.
• Discharge measurements are error-prone => nonlinear regression.
• Statistical inference on $C$, $b$, $h_0$.
• $Q_i = C (h_i - h_0)^b E_i$, where $E_i \sim \mathrm{logN}(0, \sigma^2)$ is i.i.d. noise and $i \in \{1, \ldots, n\}$.
• On the log scale: $q_i = a + b \log(h_i - h_0) + \epsilon_i$, where $q_i = \log Q_i$, $a = \log C$ and $\epsilon_i \sim N(0, \sigma^2)$ i.i.d.
• Problem: Enable hydrological engineers to estimate $Q(h) = C(h - h_0)^b$ and evaluate the calibration uncertainty.

One segment fitting, the old way
• Guess or make an approximate measurement of $h_0$. Then do linear regression of $q_i$ on $\log(h_i - h_0)$.
• Alternatively, for each plausible value of $h_0$, do linear regression. Choose the $h_0$ that gives the least SSE for the regression.
  – Same as doing maximum likelihood inference on the model $q_i = a + b \log(h_i - h_0) + \epsilon_i$, $i \in \{1, \ldots, n\}$.
  – Means that uncertainty analysis becomes available (?).
  – Studied by Venetis (1970).
  – Also by Clarke (1999).
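The grid search over $h_0$ described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' program: the function name `fit_one_segment`, the grid of candidate $h_0$ values and the synthetic, noise-free data are all assumptions made for the example.

```python
import numpy as np

def fit_one_segment(h, Q, h0_grid):
    """Profile least squares: for each candidate h0, regress log Q on log(h - h0)
    and keep the candidate with the smallest sum of squared errors (SSE)."""
    h = np.asarray(h, dtype=float)
    q = np.log(np.asarray(Q, dtype=float))
    best = None
    for h0 in h0_grid:
        if h0 >= h.min():
            continue  # log(h - h0) is undefined for this candidate
        x = np.log(h - h0)
        X = np.column_stack([np.ones_like(x), x])      # design matrix [1, log(h - h0)]
        coef, *_ = np.linalg.lstsq(X, q, rcond=None)   # ordinary least squares
        resid = q - X @ coef
        sse = float(resid @ resid)
        if best is None or sse < best["sse"]:
            best = {"h0": float(h0), "a": float(coef[0]),
                    "b": float(coef[1]), "sse": sse}
    return best

# Hypothetical noise-free data generated from Q = 3 (h - 0.5)^2.
h = np.array([1.2, 1.5, 2.0, 2.8, 3.5, 4.1])
Q = 3.0 * (h - 0.5) ** 2.0
fit = fit_one_segment(h, Q, np.linspace(0.0, 1.0, 101))
print(fit["h0"], np.exp(fit["a"]), fit["b"])
```

With real, noisy measurements the SSE profile may keep decreasing toward an edge of the grid, so it is worth checking whether the selected $h_0$ sits on the grid boundary before trusting the fit.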
Problems concerning classic one segment curve fitting
• Sometimes exhibits heteroscedasticity.
• Sometimes there's no finite solution!
  – Found a set of requirements that ensures finite estimates.
  – In practice, broken requirements mean no finite estimates.
  – The model can produce data that break the requirements for any set of stage measurements!
  – Parameter estimators have infinite expectation -> Uncertainty inference becomes difficult!
  – Explored in paper 1, Reitan and Petersen-Øverleir (2006) and in the appendix, Reitan and Petersen-Øverleir (2005).

Bayesian one segment fitting
• Based on the same data model, but with a prior distribution on the set of parameters.
• The Bayesian study of this resulted in paper 2, Reitan and Petersen-Øverleir (2008a).
• Bayesian analysis of other models was done by Moyeed and Clarke (2005) and Árnason (2005).

Pros and cons
• Upsides:
  – Encodes hydraulic knowledge.
  – Can put softer 'restrictions' on the parameters.
  – Finite estimates.
  – Natural uncertainty measures.
• Downsides:
  – Requires heavier numerical methods (MCMC).
  – Coming up with a prior distribution can be hard.
  – Also sometimes exhibits heteroscedasticity.

Input - prior distribution
• Prior distribution form as simple as possible.
• Prior knowledge either local or regional.
• Regional knowledge can be extracted once and for all.
• At-site prior knowledge can be set by asking for credibility intervals.
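As a rough illustration of the Bayesian workflow above, the sketch below turns credibility intervals into priors and samples the one-segment posterior with a random-walk Metropolis algorithm. Everything specific here is an assumption for the example and is not taken from paper 2: independent normal priors on $(a, b, h_0, \log\sigma)$, the use of 95% intervals, the interval values, the step size, the synthetic data and all function names.

```python
import numpy as np

Z95 = 1.959963984540054  # 97.5% quantile of the standard normal distribution

def normal_prior_from_interval(lower, upper):
    """Mean and sd of a normal prior whose central 95% interval is (lower, upper)."""
    return 0.5 * (lower + upper), (upper - lower) / (2.0 * Z95)

def log_posterior(theta, h, q, priors):
    """Unnormalised log posterior for theta = (a, b, h0, log_sigma)."""
    a, b, h0, log_sigma = theta
    if h0 >= h.min():
        return -np.inf  # log(h - h0) undefined: zero posterior mass here
    resid = q - (a + b * np.log(h - h0))
    log_lik = -len(q) * log_sigma - 0.5 * np.sum(resid**2) / np.exp(2.0 * log_sigma)
    log_prior = sum(-0.5 * ((t - m) / s) ** 2 for t, (m, s) in zip(theta, priors))
    return log_lik + log_prior

def metropolis(h, q, priors, start, step, n_iter=5000, seed=0):
    """Random-walk Metropolis sampler; returns the chain and the acceptance rate."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(start, dtype=float)
    lp = log_posterior(theta, h, q, priors)
    samples = np.empty((n_iter, theta.size))
    accepted = 0
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = log_posterior(proposal, h, q, priors)
        if np.log(rng.uniform()) < lp_new - lp:  # Metropolis acceptance rule
            theta, lp = proposal, lp_new
            accepted += 1
        samples[i] = theta
    return samples, accepted / n_iter

# Hypothetical data simulated from q = log(3) + 2 log(h - 0.5) + noise.
data_rng = np.random.default_rng(1)
h = np.linspace(1.0, 4.0, 20)
q = np.log(3.0) + 2.0 * np.log(h - 0.5) + 0.05 * data_rng.standard_normal(h.size)

# Hypothetical 95% credibility intervals elicited for each parameter.
priors = [
    normal_prior_from_interval(-1.0, 3.0),  # a = log C
    normal_prior_from_interval(1.0, 4.0),   # b
    normal_prior_from_interval(-0.5, 1.0),  # h0
    normal_prior_from_interval(-5.0, 0.0),  # log sigma
]
samples, acc_rate = metropolis(h, q, priors, start=[1.0, 2.0, 0.4, -2.0], step=0.02)
burned = samples[1000:]  # discard an (arbitrary) burn-in period
h0_interval = np.percentile(burned[:, 2], [2.5, 97.5])  # 95% credibility interval
print(acc_rate, np.median(burned, axis=0), h0_interval)
```

A plain random-walk sampler like this can mix slowly because $a$, $b$ and $h_0$ are strongly correlated in the posterior, so the chain length and step size would need checking in any real application.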
Output – estimates and uncertainties
• Estimates – expectations or medians from the posterior distribution.
• Uncertainty – credibility intervals for the parameters and for the curve itself, $Q(h) = C(h - h_0)^b$.

Segmentation
• Original idea: Divide the stage-discharge measurements into sets and fit $Q(h) = C(h - h_0)^b$ separately for each set (segment). This can fit a wider range of measurement sets.
[Figure: stage h against discharge Q, with Segment 1, Segment 2 and their intersection marked.]

Problems with manual segmentation
1) Uncertainty analysis of manual decisions not statistically available.
2) Curves fitted to two neighbouring sets may not intersect.
3) Two such curves may have intersections only inside the sets.
[Figure: two plots of h against Q, one showing a jump in the curve.]

Statistical model for segmentation – the interpolation model
• Idea: Make a model with segments and let the data be attached to that model.
• Model: for k segments, introduce k-1 segment limit parameters, $h_{s,1}, \ldots, h_{s,k-1}$. For a measurement with $h_{s,j-1} < h_i < h_{s,j}$, assume $q_i = a_j + b_j \log(h_i - h_{0,j}) + \epsilon_i$.
• Ensure continuity by sacrificing one of the parameters in the upper segments ($a_j$ for $j > 1$).
• Goal: Make inference on all parameters in this model. Also, make inference on k.

Frequentist inference on the interpolation model
• Segmented model first formulated and treated using the maximum likelihood method in paper 3, Petersen-Øverleir and Reitan (2005).
• Problems:
  3) Possibility of infinite parameter estimates, inherited from the one segment model.
     (Much more likely than usual for upper segments.)
  4) Multi-modality and discontinuous derivative of the marginal likelihood of the changepoint parameters, $\{h_{s,j}\}$.
  5) Inference of k through AIC or BIC?

Bayesian inference on the interpolation model
• Need a prior distribution for the changepoints, $\{h_{s,j}\}$, and the number of segments, k.
• MCMC for each sub-model characterized by k.
• Importance sampling for the posterior sub-model probability, $\Pr(k \mid D)$.
• Input: Data, prior probability of each sub-model and prior for the parameter set of each sub-model. (Can be regional or partially regional. Set by asking for credibility intervals.)
• Output: $\Pr(k \mid D)$ and the posterior distribution of all parameters for each k. (Summarised by estimates and credibility intervals.)

Output example for interpolation model inference
[Figure: output example; axis label Q.]

Problems with Bayesian treatment of segmented models
• Difficult to make efficient inference algorithms (but a semi-efficient one has been made).
• Changepoints only inside the dataset (thus "the interpolation model").
• Extrapolation uncertainty is underestimated because there can be changepoints outside the dataset.
• Solution(?): The process model, a new model where the segments appear through a process.
  – Problems with the process model: Very inefficient algorithms. Difficult to implement all sorts of relevant prior knowledge.
• Middle ground? Use the changepoints of the most probable sub-model from the interpolation model as data in inference about the changepoint process. The process model is then used for extrapolation of the curve.
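To make the continuity constraint of the interpolation model concrete: requiring neighbouring segments to agree at the limit $h_{s,j-1}$ gives $a_j = a_{j-1} + b_{j-1}\log(h_{s,j-1} - h_{0,j-1}) - b_j\log(h_{s,j-1} - h_{0,j})$ for $j > 1$, which is why those intercepts are no longer free parameters. The Python sketch below evaluates such a curve. The function names and the two-segment parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def continuous_intercepts(a1, b, h0, hs):
    """Intercepts a_j implied by continuity at each segment limit.

    a1 : intercept of the lowest segment (the only free intercept)
    b, h0 : per-segment exponents and zero-flow levels (length k)
    hs : increasing segment limits (length k - 1)
    """
    b, h0, hs = (np.asarray(v, dtype=float) for v in (b, h0, hs))
    a = np.empty(len(b))
    a[0] = a1
    for j in range(1, len(b)):
        limit = hs[j - 1]
        # log-discharge of the lower segment at the limit ...
        q_at_limit = a[j - 1] + b[j - 1] * np.log(limit - h0[j - 1])
        # ... must equal that of the upper segment at the same stage.
        a[j] = q_at_limit - b[j] * np.log(limit - h0[j])
    return a

def segmented_discharge(h, a, b, h0, hs):
    """Evaluate Q(h) by selecting, for each stage, the segment it falls in."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    b, h0 = np.asarray(b, dtype=float), np.asarray(h0, dtype=float)
    seg = np.searchsorted(hs, h)  # 0 for h <= hs[0], 1 for hs[0] < h <= hs[1], ...
    return np.exp(a[seg] + b[seg] * np.log(h - h0[seg]))

# Hypothetical two-segment curve with one segment limit at h = 2.
b = [2.0, 1.5]
h0 = [0.5, 1.0]
hs = [2.0]
a = continuous_intercepts(np.log(3.0), b, h0, hs)

Q_at_limit = segmented_discharge(2.0, a, b, h0, hs)[0]
Q_just_above = segmented_discharge(2.0 + 1e-9, a, b, h0, hs)[0]
print(Q_at_limit, Q_just_above)  # the two values agree up to rounding
```

Parameterising the upper segments by $(b_j, h_{0,j})$ and deriving $a_j$ keeps the curve continuous for every parameter value, so an optimiser or sampler never has to reject discontinuous proposals.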
References
1) Árnason S (2005), Estimating nonlinear hydrological rating curves and discharge using the Bayesian approach. Masters Degree, Faculty of Engineering, University of Iceland
2) Clarke RT (1999), Uncertainty in the estimation of mean annual flood due to rating curve indefinition. J Hydrol, 222: 185-190
3) ISO 1100/2 (1998), Stage-discharge Relation, Geneva
4) Lambie JC (1978), Measurement of flow - velocity-area methods. Hydrometry: Principles and Practices, first edition, edited by R.W. Herschy, John Wiley & Sons, UK
5) Moyeed RA, Clarke RT (2005), The use of Bayesian methods for fitting rating curves, with case studies. Adv Water Res, 28:8: 807-818
6) Petersen-Øverleir A, Reitan T (2005), Objective segmentation in compound rating curves. J Hydrol, 311: 188-201
7) Reitan T, Petersen-Øverleir A (2005), Estimating the discharge rating curve by nonlinear regression - The frequentist approach. Statistical Research Report, University of Oslo, Preprint 2, 2005. Available at: http://www.math.uio.no/eprint/stat report/2005/02-05.html
8) Reitan T, Petersen-Øverleir A (2006), Existence of the frequentistic regression estimate of a power-law with a location parameter, with applications for making discharge rating curves. Stoc Env Res Risk Asses, 20:6: 445-453
9) Reitan T, Petersen-Øverleir A (2008a), Bayesian power-law regression with a location parameter, with applications for construction of discharge rating curves. Stoc Env Res Risk Asses, 22: 351-365
10) Venetis C (1970), A note on the estimation of the parameters in logarithmic stage-discharge relationships with estimation of their error. Bull Inter Assoc Sci Hydrol, 15: 105-111