Short course on “Model building, inference and hypothesis testing in hydrology” 21-25 May, 2012 Martyn Clark Outline • A typical hydrological modeling recipe • Different types of hydrological models – “Conceptual” models – Physics-based models – Other modeling typologies (lumped v. distributed, process-based models) The typical hydrologic modeling recipe • Perceptual Model – Understanding of the Earth System – Not constrained by mathematics – Different for different people (based on training, experience) • Conceptual Model – Synthesis (simplification) of the perceptual model – Requires specifying • System boundaries • Relevant inputs, state variables and outputs • Physical laws to be obeyed and simplifying assumptions – Different conceptual models = different hypotheses of system behavior • Numerical Model – Spatial discretization and temporal integration of the governing model equations 3 What controls the spatial variability of snow? What controls sub-surface flow to the stream? 4 Jollie River, near Mt Cook, New Zealand How can we represent the interception of water by the vegetation canopy? What about the expansion and contraction of saturated areas near the stream? 5 Mahurangi River, Northland, NZ The typical hydrologic modeling recipe • Perceptual Model – Understanding of the Earth System – Not constrained by mathematics – Different for different people (based on training, experience) • Conceptual Model – Synthesis (simplification) of the perceptual model – Requires specifying • System boundaries • Relevant inputs, state variables and outputs • Physical laws to be obeyed and simplifying assumptions – Different conceptual models = different hypotheses of system behavior • Numerical Model – Spatial discretization and temporal integration of the governing model equations 6 Water Cycle (from 7 NASA) Land component of the water cycle (Clark et al., 2008) 8 The typical hydrologic modeling recipe • Perceptual Model – Understanding of the Earth System – Not constrained by mathematics – Different for different people (based on training, experience) • Conceptual Model – Synthesis (simplification) of the perceptual model – Requires specifying • System boundaries • Relevant inputs, state variables and outputs • Physical laws to be obeyed and simplifying assumptions – Different conceptual models = different hypotheses of system behavior • Numerical Model – Spatial discretization and temporal integration of the governing model equations 9 State equations dS s dt dSc dt dS r dt dS a dt ps ms pr ec Sc pt Sc pt Sc qix S r qsx S r er S r d S r d S r qb S a 10 Outline • A typical hydrological modeling recipe • Different types of hydrological models – “Conceptual” models – Physics-based models – Other modeling typologies (lumped v. distributed, process-based models) A common typology of hydrologic models (really a continuum)… • Physics-based models – Rely on physical “laws” wherever possible (e.g., Darcy’s Law) – Include “astute” representations of individual processes • Conceptual models – Use relatively simple expressions to represent the dominant physical processes at larger spatial scales (e.g., a catchment) • Both modeling approaches have a number of unknown model parameters – Specify parameters a-priori (common in physics-based models) – Infer parameters using inverse methods (common in conceptual models) • Both modeling approaches can provide a poor representation of dominant processes, while appearing to provide adequate predictions of the system response to external forcing – The right answers for the wrong reasons – Compensatory effects of model parameters Physics-based models • Typically have a more complete representation of dominant processes – Explicitly simulate all land-atmosphere energy fluxes, rather than using empirical equations for snow melt and potential evapotranspiration • Model parameters can generally be related to “observable” quantities – Interception capacity of vegetation computed as a function of the leaf area index, which, in turn, can be related to vegetation type or inferred from satellite measurements – Soil hydraulic parameters related to soil type • State variables can be related to observable quantities • Generally built with the goal of spatial transposability, rather than the goal of simulating the idiosyncrasies in individual catchments Example: Use of Darcy’s Law to represent the flow of water through soil z z q K K z z z • Combine with the continuity eqn & add a sink term K K S t z z q = vertical liquid water flux (m s-1) K = hydraulic conductivity (m s-1) = volumetric liquid water content (-) = matric head (m) S = sink, usually representing ET (-) z = vertical coordinate (m) • Add some constitutive functions to define some of the unknown terms f • Substitute terms K f d K K S t z d z Source: Hydrus-1d • Vertical flux of water d d K d d z d K K d z q K Where are the uncertainties? • Darcy’s Law is empirical... • but, apart from that… How do you estimate the “effective” hydraulic conductivity over larger spatial scales How do you estimate the relation between matric head and volumetric water content? How do you estimate the root water uptake? d K K S t z d z q = vertical liquid water flux (m s-1) K = hydraulic conductivity (m s-1) = volumetric liquid water content (-) = matric head (m) S = sink, usually representing ET (-) z = vertical coordinate (m) and.. how can we specify equations for other parts of the model: • Interception of water by the vegetation canopy • Snow albedo • Other… Implications for model architecture • Requires (at least) a vertically resolved model to compute the spatial gradients Conceptual models • Use empirical formulations for individual processes – For example, snow melt • Model parameters may not be observable – Parameters defining the expansion of saturated area – Baseflow coeffcients • Model state variables may not be observable • But… conceptual models have the potential to provide a more realistic representation of hydrological processes at larger spatial scales Example: The Sacramento model Other modeling typologies • Lumped vs. distributed – Lumped: Catchment as a single computational unit – Semi-distributed: A collection of 1-D models applied over multiple grid-cells (or multiple sub-basins), with streamflow routed through a digital river network – Fully distributed: 3-D (or layered 2-D) implementation of the model equations. – Gross classification, as model type really depends on the model’s representation of spatial variability across a hierarchy of spatial scales, including representing sub-grid variability • Emergence of alternative vocabulary (physically based models, process-based models) – Recognizes the empiricism in both physics-based and conceptual models, and defines models that seek to have an astute representation of all processes – That is, physics-based modeling is impossible The model’s representation of spatial variability can have a large impact on simulations of water fluxes at larger spatial scales Consider the “mosaic” approach to representing heterogeneity in vegetation, as in the VIC model … and contrast against other model architectures: • should different vegetation types compete for the same soil moisture? • should the architecture allow for sub-grid variability in soil moisture, to simulate the co-variability between vegetation type and water availability? Short course on “Model building, inference and hypothesis testing in hydrology” 21-25 May, 2012 Martyn Clark Approach • Stick to very simple (yet robust) numerical methods – Simpler than those presented in “Numerical Recipes” – Yet very easy for people to code from scratch, and retrofit existing hydrologic models • Learn by doing – “I never really understand something until I code it up myself” – Exercises to implement the methods that are introduced • Encourage discussion The typical hydrologic modeling recipe • Perceptual Model – Understanding of the Earth System – Not constrained by mathematics – Different for different people (based on training, experience) • Conceptual Model – Synthesis (simplification) of the perceptual model – Requires specifying • System boundaries • Relevant inputs, state variables and outputs • Physical laws to be obeyed and simplifying assumptions – Different conceptual models = different hypotheses of system behavior • Numerical Model – Temporal integration of the governing model equations 23 Outline • Numerical methods for bucket-style hydrologic models – – – – The explicit Euler method The implicit Euler method Adaptive sub-stepping Ad-hoc numerical implementations in hydrologic models • Exploring impacts of unreliable numerical implementation – Macro-scale and micro-scale discontinuities in model response surfaces and associated difficulties in model calibration and sensitivity analyses – Computational cost – Fragility of unreliable numerical implementations in predictive mode • Summary and outlook State equations dS s dt dSc dt dS r dt dS a dt ps ms pr ec Sc pt Sc pt Sc qix S r qsx S r er S r d S r d S r qb S a 25 The exact solution of the ODE system (and the need for approximations) • The ODE system from Clark et al. (2008) can be written as dS g S, t dt • The exact solution of the average flux over the interval tn (start of the time step) to tn+1 (end of the time step) is g • n n 1 1 t t n1 g(S, ), d tn Given an estimate of the average flux, the model state variables can be temporally integrated as S t n 1 S t n tg nn 1 • The exact solution is computationally expensive, so approximations to the exact solution are used • The approximation controls the stability, accuracy, smoothness, and efficiency of the solution The bread and butter of hydrology: The explicit Euler method • The most straightforward approximation to the exact solution is the explicit Euler method nn1 g EE (1) g(S n ) where the flux at the start of the time step is used to approximate the average flux over the time step • The explicit Euler method linearly extrapolates from the initial condition along the initial slope of g(S) Try this… • A famous hydrologist (weight = 90 kg) has decided to go skydiving, and has asked you to determine how fast they will fall. • The hydrologist in question thinks that it is acceptable for you to use Newton’s second law of motion considering only the gravitational force (Fgrav) and the force due to air resistance (Fres) . That is, dv m Fgrav Fres dt where m = mass (kg) and v is the fall velocity (m s-1). • Further, the hydrologist pursues the paradigm of parsimony, and is quite happy for you to use an empirical representation of the resistance force. After some discussion, you agree that the terms on the right-hand-side can be written as Fgrav mg Fres kv where k is a proportionality constant (for freefall, assume k = 15 kg s-1), and g is the gravitational acceleration 9.81 m s-2). • Your mission, should you choose to accept it: Based on an initial velocity of zero (the hydrologist is hopping out of a hovering helicopter), temporally integrate the ODE for a 3 minute period using the explicit Euler method with 10 second time steps. The parachute problem (freefall) • ODE: dv m mg kv dt • Constants: – m = mass of the hydrologist (90 kg) – g = gravitational acceleration (9.81 m s-2) – k = proportionality constant for freefall (15 kg s-1) • Hence dv f (v ) dt • Numerical approximation: explicit Euler v n1 v n f (v n )t • Length of time step: Δt = 10 seconds • Initial velocity = zero m s-1 – the hydrologist is hopping out of a hovering helicopter …and, the end result… Any observations? Let’s see if we can do better… SOLVE THE PROBLEM AGAIN, EXCEPT THIS TIME USE THE EXPLICIT EULER METHOD WITH A TIME STEP OF 0.1 SECONDS Ta-Da! The parachute problem (landing) • Velocity ODE: m • dv mg kv dt Position ODE: dz v dt • Initial conditions: – v = initial velocity (terminal velocity from the freefall example = 58 m s -1) – z = height to open the parachute (500 m) • Use the same constants as in the freefall example, except for k: – m = mass of the hydrologist (90 kg) – g = gravitational acceleration (9.81 m s-2) – k = proportionality constant for the open chute (250 kg s-1) • Numerical approximation: explicit Euler with Δt = 10 seconds Poor numerical implementation can be downright dangerous! Once again, short steps save the day… Terminal velocity = 3.53 m/s = 12.71 km/hr = 7.89 mi/hr Can we do better than the explicit Euler method? Can we both increase accuracy and decrease computing costs? The implicit Euler method • Approximates the exact solution of the average flux by estimating the flux at the end of the time step nn1 g( IE1) g(S n 1 ) • The implicit Euler method requires that the computed flux at the end of the time step is consistent with the estimated solution: S n 1 S n t g S n 1 • In general, this requires iteration and can be computationally expensive. Newton-Raphson iteration • The Newton-Raphson method finds the root of a function by evaluating the function f (x) and the function derivatives f ́ (x) at arbitrary points of x. • The function: In the implicit Euler method f (x) is the residual between a trial state variable (x) and the solution at the end of the time step estimated using the trial value x: f ( x) x x n t g ( x) • The derivative: The derivative f ́ (x) can be computed using finite differences, as f ( x) • f ( x) f ( x ) The new value of x can then be computed as x new f ( x) x f ( x) and iterate until either the function or the derivative are below a userspecified error tolerance Let’s give it a go: Can the implicit Euler method save our famous hydrologist? • Step 1: – Provide an initial guess of v (vm) – Compute the function evaluation (the residual) at vm and vm+ɛ • requires the derivatives g(vm) and g(vm+ɛ) f (v m ) v m v n t g (v m ) • Notation: m=iteration index, and n=time step index; n denotes the start of the time step, and v0 is the initial guess of v – Compute the derivative of the function evaluation f (v ) m f (v m ) f (v m ) • Step 2: compute a new estimate of v x new m f ( v ) m v f (v m ) and go back to step one, replacing the initial guess of v with the new estimate of v • Keep going until you are happy! … and, the result… Adaptive sub-stepping Consider the freefall example: Are there conditions where we can afford to take longer time steps and still have a reliable solution? A good ODE integrator should exert some adaptive control over its own progress, making frequent changes in step size. Usually, the purpose of this adaptive step size control is to achieve some pre-determined accuracy in the solution with minimal computational effort. Many small steps should tiptoe through treacherous terrain, while a few great strides should speed through uninteresting countryside. The resulting gains in efficiency are not mere tens of percents or factors of two; they can sometimes be factors of ten, a hundred, or more. From “Numerical Recipes in Fortran: The Art of Scientific Computing” Implementing adaptive sub-stepping schemes • When stepping through time, consider an “outer loop” that is defined by the temporal resolution of the data, and an “inner loop” that includes a variable number of sub-steps necessary to obtain an accurate solution. • The step size in the inner loop depends on the numerical error in the ODE solution, and therefore need a method to monitor numerical error. • Approaches for numerical error monitoring compare a more accurate solution to a less accurate solution: – Estimate the average flux over the time step by (1) taking one full step (less accurate); and (2) taking two sub-steps (more accurate). – Compute two different approximations of the average flux. Example: The explicit Heun method • A predictor-corrector approach – Uses the explicit Euler method as a predictor, but applies a correction • Step 1: Estimate the average flux over the time step using the explicit Euler method: nn1 g EE (1) g(S n ) and use the EE flux to estimate the state at the end of the step S n 1 EE (1) nn1 S tg EE (1) n • Step 2: Use the explicit Euler estimate of the state at the end of the time step to estimate the flux at the end of the time step, and average the start-of-step and end-of step fluxes: nn1 g EH (2) 1 g (S n ) g (S nEE1(1) ) 2 Example: The explicit Heun method Incorporating additional information => higher accuracy Numerical error monitoring with the explicit Heun method • Given: S n 1 EE (1) nn1 S tg EE (1) n and S n 1 EH (2) nn1 S tg EH (2) n an estimate of the error in the solution at the end of the time step can be given as enEH1(2) SnEH1(2) SnEE1(1) • The sub-step is accepted if the user-specified absolute and relative truncation tolerances are satisfied, following enj 1 R s nj 1 A 0 j 1, N s and the new step size determined based on the size of the error So…. now that we are familiar with some simple and robust time stepping schemes, let’s consider how the governing equations are typically solved in hydrologic models… • Many hydrologic models use ad-hoc approaches to add and subtract individual model fluxes from model stores in a pre-determined sequence. • For example, the Sacramento model is implemented as follows: 1) Compute evaporation and subtract it from tension storages 2) Add precipitation to the upper zone tension storage 3) Compute baseflow and subtract it from the lower zone storage 4) If precipitation was in excess of the upper zone tension storage (in step 2), then add the excess precipitation to the upper zone free storage 5) Compute drainage from the upper zone and move it to the lower zone 6) Compute surface runoff and update water storage in the impervious area. • This approach is known as “operator splitting” which allows the application of specialized numerical approximations to each individual process. Advantages of the operator-splitting method… • Permits the use of analytical solutions for individual fluxes – It is extremely difficult (if not impossible) to find an analytical solution for a coupled set of differential equations, but it is possible to find analytical solutions for individual fluxes – For example, baseflow may be represented as a linear function of storage in the saturated zone: qb kS and the storage at tn+1 can be given as S n1 S n exp(k t ) • Simplifies handling solution constraints – For example, consider the case when the drainage from the unsaturated zone over a time step exceeds the water in the soil qd min qd , S n • Reduces the dimensionality of the problem Disadvantages of the operator-splitting method… • Analytical solutions are generally not available for all model fluxes – if analytical solutions are not available, the approximation is typically based on the explicit Euler scheme, and the solution may be contaminated by large temporal truncation errors • Implementation of the operator-splitting approach often resembles “spaghetti code” in which the model conceptualization is conflated with the numerical implementation – Modeling errors associated with the physical conceptualization and numerical implementation are difficult to separate, diagnose, and resolve – It is difficult to design flexible and modular hydrologic toolkits, such as FUSE and FLEX • Unless numerical error control is implemented (very unusual), model predictions computed using the OS approach may depend on the order of processing individual model fluxes – creates an undesirable arbitrariness in model construction and behavior Preference: avoid the OS method to the extent possible • Construct a model in a way that clearly delineates the model assumptions from the numerical approximations – – – Develop a model as a closed set of coupled ODEs The “physics routines” of the model are used to compute the temporal derivatives of the model states The “numerics routines” of the model are used for the temporal integration of the ODEs. • Facilitates tapping into the vast literature on differential equations to control numerical errors and reduce model run times • Simplifies the design of modular software for hydrologic simulations Outline • Numerical methods for bucket-style hydrologic models – – – – The explicit Euler method The implicit Euler method Adaptive sub-stepping Ad-hoc numerical implementations in hydrologic models • Exploring impacts of unreliable numerical implementation – Macro-scale and micro-scale discontinuities in model response surfaces and associated difficulties in model calibration and sensitivity analyses – Computational cost – Fragility of unreliable numerical implementations in predictive mode • Summary and outlook The numerical model • Time stepping schemes used in bucket-style hydrologic models have received surprisingly little attention in the literature • Reflects an (untested) assumption that numerical errors are always overwhelmed by uncertainties in the data and governing equations, and attention to numerical robustness is unwarranted • However… seemingly trivial numerical aspects can have a profound impact on model performance, contaminating scientific analyses and undermining operational predictions • Important for the hydrologic community to avoid preventable numerical artifacts before tackling more genuine challenges 52 Study approach • Evaluate the accuracy, smoothness, reliability and efficiency of different time stepping schemes, and identify the impact of numerical errors on model analysis, model calibration, and model predictions • Use the FUSE toolbox to implement the same methods in multiple model structures • Test all methods in a densely gauged catchment, as well as 12 other catchments spanning a range of physical and hydroclimate regimes FUSE models evaluated Fidelity of fixed-step time stepping schemes Impact on the objective function surface Implicit Euler objective function surface is similar to the near-exact solution obtained with adaptive sub-stepping Summary and outlook • In many cases model analysis complexities are consequences of numerical artifacts in the model implementation • Numerical errors of uncontrolled time stepping schemes routinely dwarf the structural errors of the model conceptualization • The robust numerics paradigm is yet to be accepted as the required standard in conceptual hydrology • Reliable hydrologic modeling is a major challenge without adding confounding numerical artifacts—let’s avoid preventable numerical troubles before tackling more genuine challenges!