RULES FOR RESPONSIBLE MODEL BUILDING

William James
University Professor Emeritus
President, CHI
Guelph, Canada
bill@computationalhydraulics.com

"All models are wrong, though some may be said to be useful." (G.E. Box)
It is not enough to know when or how a model may be said to be useful; it is more important to know how reliable it is. R:1

A model is a concept. Concepts are used in thinking, scientific deduction, engineering design and forensics. They are improved by experience. We do not necessarily require the model that most approaches perfection; rather, we seek the model that provides an acceptably accurate explanation. Simple models are often said to be "better" than complex models. Optimal model complexity depends on the questions to be resolved and the resources available.

Your model should meet your own ethical standards. It should:
• accept the limits of the discipline of engineering;
• improve and restore the natural balances and biodiversity;
• correct the human behaviour that caused the problem to the ecosystem;
• imitate the structure of the natural, native or indigenous system;
• be good for all parts of the natural system;
• not enrich one individual or group to the distress or impoverishment of another;
• be in harmony with good character, cultural value, and moral law.
R:70

• the living world is the matrix for all design,
• design should follow the laws of life,
• biological equity must determine design,
• design must reflect bioregionality,
• projects should use renewable energy systems,
• design should integrate living systems,
• projects should heal the planet, and
• design should follow a sacred ecology.
R:69

Fundamental tenet: variance can be systematically reduced by including (explaining) more and more relevant processes, at a higher time and spatial resolution. R:12

The implicit problem in critical thinking is to find the most probable flaws in an argument, to discern the best lines of thought, and to improve the argument.
The solution may be stated: if we test the argument, perhaps over a long time, which parts of the argument are less likely to be valid, and how may the experience be better explained otherwise?

The implicit problem in scientific method is to find the optimum or sufficient description of the dominant processes. The solution may be stated: if we test the current explanation of dominant processes over a long period of time, e.g. 75 years for an engineering environmental problem, is the description optimal in the sense that it is the most parsimonious description that meets the required, or imposed, uncertainty?

The implicit problem in engineering design is to find the optimum cost-effective array of best practices. A solution may be stated: if the 75-year rainfall time series that occurred at the International Airport had in fact occurred at Foxran Estates, then plan 126 would have been the most cost-effective of the 329 plans examined, had they, of course, all existed over this time.

The implicit problem in engineering forensics is to find the most credible explanation for an acute problem, and to suggest a cost-effective solution, which is generally to replace the acute problem by a chronic problem. The price to be paid is vigilance.

Concerns include:
• What array of models should be used?
• What is the model applicability in the context of the study objectives?
• What accuracy is achievable?
• What is the uncertainty of the model?
• What investment of model effort is most cost-efficient?
• Is cost-efficiency appropriate for optimizing an uncertain model?

Rule: A model is used to help select the best among competing proposals. It is fundamentally irresponsible and unethical for modelers not to interpret the inherent uncertainty. R:2

R:5 Steps in model construction
1. review and re-state the problem
2. construct the as-is model input data set
3. select model performance evaluation criteria
4. select an objective function
5. calibrate and evaluate the model
6. satisfied?
If no, go back to 1; if yes, go to 7.
7. model several theoretical or to-be situations
8. select the likely best alternative
9. report the best solution and its uncertainty.

Rule: Computed and observed time series are more ethically represented as smudges than as single-valued lines. R:7

Rule: Objectives must be simplified and related to the computed output and objective functions. The model must include code that adequately describes all significant processes. R:8

Complexity C:

$$C = \sum_{m=1}^{N_m} \sum_{s=1}^{N_s} \sum_{pr=1}^{N_{pr}} pa_{pr,s,m}$$

where:
Nm = the number of modules active in the model,
Ns = the number of sub-spaces modeled in each module,
Npr = the number of processes modeled in each sub-space,
pa = the input parameters required for each process.
R:20

Cost is taken to be a combination of:
1. engineering fees to design alternative solutions;
2. construction costs of the selected alternative;
3. intangible costs;
4. costs due to uncertainty of the selected option.
R:48

[Figure: evaluation function F = em + ed, in $ (10^0 to 10^6), plotted against complexity C (10^0 to 10^9) on logarithmic axes. Curves shown: the model error term em, the design costs term ed, and their sum F, whose minimum Fmin marks the optimum complexity. Note: Bill's suggested relations and numbers.] R:62

Rule: In determining the best level of complexity, test simple models first, proceeding to more complex, until the required accuracy of the computed response function is achieved. Use the least number of processes, the least number of discretized spaces, and the biggest time step that delivers the required uncertainty. R:51

Sensitivity analysis consists of:
1. varying model coefficients one at a time, with the amount varied being representative of the uncertainty in the parameter being analyzed,
2. dividing the resulting dimensionless change in computed response by the dimensionless parameter variation, and then
3. ranking the resulting sensitivity gradients.
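The three sensitivity-analysis steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from PCSWMM: the function name `sensitivity_gradients`, the toy response function `toy_peak_flow`, and all parameter names and values are hypothetical, and the computed response is assumed to be non-zero at the base parameter set.

```python
from typing import Callable, Dict, List, Tuple


def sensitivity_gradients(
    model: Callable[[Dict[str, float]], float],
    base_params: Dict[str, float],
    fractions: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank parameters by dimensionless sensitivity gradient, largest first."""
    base_response = model(base_params)
    if base_response == 0.0:
        raise ValueError("base response must be non-zero")
    gradients = []
    for name, fraction in fractions.items():
        if fraction == 0.0:
            raise ValueError(f"variation for {name} must be non-zero")
        # Step 1: vary one parameter at a time, by a fraction chosen to
        # reflect the uncertainty in that parameter.
        perturbed = dict(base_params)
        perturbed[name] = base_params[name] * (1.0 + fraction)
        # Step 2: dimensionless change in response divided by the
        # dimensionless change in the parameter.
        relative_change = (model(perturbed) - base_response) / base_response
        gradients.append((name, relative_change / fraction))
    # Step 3: rank by the magnitude of the gradient.
    return sorted(gradients, key=lambda item: abs(item[1]), reverse=True)


def toy_peak_flow(p: Dict[str, float]) -> float:
    """Hypothetical response function, used only to exercise the sketch."""
    return p["coeff"] * p["intensity"] * p["area"] - p["loss"]


base = {"coeff": 0.5, "intensity": 2.0, "area": 10.0, "loss": 2.0}
ranking = sensitivity_gradients(toy_peak_flow, base, {"area": 0.10, "loss": 0.20})
for name, gradient in ranking:
    print(f"{name}: {gradient:+.3f}")
```

In a real application the `model` callable would wrap a run of the calibrated model for one event, and the test fractions would be kept small, as the rules later in the talk recommend.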
R:129

[Figure: Non-linear sensitivity gradients for peak flow. Medium duration, medium intensity (0.3 in/hr for 1 hr), Location 100. Axes: flow [cfs] versus percent change in parameter (-7.5 to 7.5). Series: WW1, WAREA, WW3, WSLOPE, WW5, WW6, WW7, WW8, WW9, WW10, WW11.] Wkbk:59

Rule: Do not test a generalized program per se for sensitivity, parameter optimization, or error, because individual applications are likely to be radically different. Values of parameters in the input datafile determine which processes will be dominant or dormant. Relative parameter values change both the model sensitivity and the model uncertainty. Each model application must be separately tested over the relevant range of model R:3

Categorize input parameters in four groups:
1. can be measured with almost total certainty;
2. can be readily measured in the field or laboratory;
3. cannot be easily measured in the field or laboratory;
4. cannot be measured with any certainty at all.

[Diagram: terms shown are model, process, calibration, parameter estimation, sensitivity analysis, event calibration, continuous model. © W James '97]

[Flowchart in two parts, 1. Calibration and 2. Inferences. Elements: Start; Model parameters; Datafile; Calibration IFs; User input; Programs; Postprocessor; RFs, OFs, EFs; Parameter Optimization; Sensitivity Analysis; Model Error Analysis; OK? (Yes/No); Long-term IFs; Continuous RFs; Fuzzy Inference; End.] R:105
PCSWMM 2005 Utilities Terminology

24 Response Functions
• Nodes: depth, head, volume, lateral inflow, total inflow, flooding
• Links: flow, depth, velocity, capacity
• System: temperature, rainfall, snow depth, losses, runoff, dry weather inflow, groundwater inflow, RDII inflow, direct inflow, total inflow, flooding, outflow, storage

[Figure: typical cycle in a response or input function. The functions may be observed, synthetic or computed; RFcrit and IFcrit are arbitrary. Vertical axis: RF(t), IF(t), with threshold RFcrit, IFcrit. Time marks: t1,1, t1,2, t1,3, t1,4, t2,1, t2,2.] R:117

OF1: (t2,1 - t1,1) duration of wet event
OF2: (t2,2 - t1,3) duration of dry event
OF3: RF(t1,3) peak flow, flux, or concentration
OF4: RF(t1,1) minimum flow, flux or concentration
OF5: *INT (t1,4 - t1,1) total wet event flow or flux
OF6: (t1,4 - t1,2) duration of exceedance
OF7: (t2,2 - t1,4) duration of deficit
OF8: n[RF>RFcrit] number of exceedances
OF9: n[RF<RFcrit] number of deficits
OF10: *INT (t2,2 - t1,4) volume of deficit
OF11: *INT (t1,4 - t1,2) volume of excess
OF12: OF5/OF1 wet event mean concentration
OF13: *INT (t2,1 - t1,4) total dry event flow or flux
OF14: OF13/OF2 dry event mean concentration
R:117

*INT denotes an integral over the interval shown:

$$OF_5 = \int_{t_{1,1}}^{t_{1,4}} RF(t)\,dt$$

$$OF_{10} = \int_{t_{1,4}}^{t_{2,2}} \left[ RF_{crit} - RF(t) \right] dt$$

$$OF_{11} = \int_{t_{1,2}}^{t_{1,4}} \left[ RF(t) - RF_{crit} \right] dt$$

$$OF_{13} = \int_{t_{1,4}}^{t_{2,1}} RF(t)\,dt$$

R:118

Dominant process                            Objective function
Overland flow over impervious areas         OF3
Infiltration into the upper soil mantle     OF4
Pollutant washoff                           OF5
Erosion                                     OF1
Overland flow over pervious areas           OF3
Pollutant build-up                          OF5
Recovery of storages                        OF2
Recovery of loss (infiltration) rates       OF4
Recession of storages                       OF7
Evaporation                                 *IF8
Snowmelt                                    *IF11
Snow accumulation                           *IF7
R:119

Rule: Select the best objective function thoughtfully, by relating it back to the original design questions. Use the minimum acceptable number of objective functions. R:119

Sources of error:
1. observation error, related to field instrumentation, comprising two components, one random and one systematic;
2. sampling error, associated with the timing and location of the field equipment;
3. numerical error, identified with the numerical math used in the code;
4. structural error, related to disaggregation (the number and resolution of the processes active);
5. structural error, related to discretization (the spatial resolution);
6. structural error, related to poor formulation of one or more of the component process relations and code; and
7. propagated error, related to erroneous parameters.
R:123

External description

Prior knowledge:
1. uncertainty due to natural variability, or unobserved input disturbances
2. measurement and sampling errors of observed input and output

Calibration process (identify as-is model):
3. start-up error
4. input TS datafile error
5. model error

Internal description:
1. aggregation error
2. numerical error
3. structural error
4. discretization error
5. input environment datafile error
6. model structure and state-parameter error
7. parameter optimization error

Design process (inference to the to-be and as-was scenarios):
6. uncertainty of to-be parameters
7. user output-interpretation error
8. parameter propagation error
9. error analysis
R:124

Rule: Sixteen sources of error are listed in the framework for uncertainty analysis presented here. When interpreting the computed output from your model, all sixteen sources should be explicitly interpreted. R:127

Model users must be able to:
1. isolate the important empirical parameters that require refining (calibration),
2. associate these parameters with their correct processes (may be more than one),
3. isolate the conditions under which the processes are active (again, may be more than one), and then
4. select state-variable events (SV sub-spaces) for sensitivity (which may be hypothetical events), and
5. select state-variable events from the observed record for calibration analyses.
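Returning to the event-based objective functions listed under R:117, the sketch below shows how a few of them might be approximated from a uniformly sampled response function. It is an illustration only, under several assumptions that are not in the talk: the function name `event_objective_functions` is hypothetical, the flow values are invented, crossing times are resolved only to the nearest sample, integrals use a simple rectangle rule, and the duration and excess volume are totalled over the whole series rather than reported per event.

```python
from typing import Dict, Sequence


def event_objective_functions(
    rf: Sequence[float], rf_crit: float, dt: float
) -> Dict[str, float]:
    """Approximate OF3, OF6, OF8 and OF11 for a uniformly sampled series."""
    if len(rf) == 0:
        raise ValueError("rf must contain at least one sample")
    above = [value > rf_crit for value in rf]
    # OF8: count upward crossings of RFcrit; a series that starts above
    # the threshold counts as one exceedance.
    exceedances = sum(
        1 for i, flag in enumerate(above) if flag and (i == 0 or not above[i - 1])
    )
    return {
        # OF3: peak of the response function (here, over the whole series).
        "OF3_peak": max(rf),
        # OF6: time spent above RFcrit, to the nearest sample.
        "OF6_duration_of_exceedance": sum(above) * dt,
        "OF8_number_of_exceedances": exceedances,
        # OF11: rectangle-rule integral of RF(t) - RFcrit while RF > RFcrit.
        "OF11_volume_of_excess": sum(max(v - rf_crit, 0.0) for v in rf) * dt,
    }


flows = [0.0, 1.0, 3.0, 5.0, 2.0, 0.0, 0.0, 4.0, 1.0]  # hypothetical values
result = event_objective_functions(flows, rf_crit=2.0, dt=1.0)
for key, value in result.items():
    print(key, value)
```

A finer time step, or interpolation of the crossing times t1,2 and t1,4, would tighten the approximations of OF6 and OF11.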
R:136

[Figure: computed objective function (OFi)c plotted against observed objective function (OFi)o, with regions labelled A, B, C and D.]
A represents "small" events
B represents "medium" events
C represents "big" events
D represents fuzzy overlaps
R:137

Event class                         Code   Duration; intensity
Short-duration-high-intensity       SDHI   20 min; 3 in/h
Medium-duration-high-intensity      MDHI   60 min; 1.0 in/h
Long-duration-high-intensity        LDHI   600 min; 0.2 in/h
Short-duration-medium-intensity     SDMI   20 min; 0.4 in/h
Medium-duration-medium-intensity    MDMI   60 min; 0.3 in/h
Long-duration-medium-intensity      LDMI   600 min; 0.1 in/h
Short-duration-low-intensity        SDLI   20 min; 0.1 in/h
Medium-duration-low-intensity       MDLI   60 min; 0.1 in/h
Long-duration-low-intensity         LDLI   600 min; 0.1 in/h

Evapo-transpiration:
Short-duration-high-intensity       SDHI   1 d; 0.5 in/d
Long-duration-high-intensity        LDHI   10 d; 0.3 in/d
Short-duration-low-intensity        SDLI   1 d; 0.05 in/d
Long-duration-low-intensity         LDLI   10 d; 0.05 in/d
R:139

Causative event             Dominant processes
Rain:
  Light rate of rain        Overland flow over impervious areas
  Medium rate of rain       Infiltration into upper soil mantle; pollutant washoff
  Heavy rate of rain        Erosion; pollutant washoff; pervious area flow
  Long duration rain        Overland flow over pervious areas
No rain:
  Long duration drought     Pollutant build-up; groundwater depletion
  Short duration drought    Storage recessions
Temperature:
  High temperatures         Evapo-transpiration; snowmelt
  Low temperatures          Snow accumulation and ripening
Wind:
  High wind                 Snowmelt
R:140

Rule: Associate parameters with processes, and processes with causative events, and causative events with limited state-variable sub-spaces. R:140

A total error statistic (EFt) may be used to quantify overall goodness of fit:

$$EF_t = (1.0 - w)\left[\frac{1}{n}\sum_{i=1}^{n}\left(COF_i - OOF_i\right)^2\right]^{1/2} + w\,\left|OPF_p - CPF_p\right|$$

where:
EFt = total error statistic (m3/s);
w = weighting factor;
n = number of measured hourly flows;
OOF = measured flow (m3/s);
COF = computed flow (m3/s);
OPF = measured peak flow (m3/s); and
CPF = computed peak flow (m3/s).
R:142

Rule: Use first-order error analysis to report the estimated propagated error in your recommended design solution.
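As a hedged illustration of the two ideas above, the sketch below computes the total error statistic EFt, read here as (1.0 - w) times the root-mean-square difference between computed and observed flows plus w times the absolute difference in peak flows. It also shows one common textbook form of first-order error propagation, which the talk names but does not spell out: the square root of the sum of squared (sensitivity gradient times parameter standard deviation), assuming independent parameter errors. The function names, the choice of the series maxima as the peak flows, the restriction of w to the range 0 to 1, and all numbers are assumptions for illustration.

```python
import math
from typing import Sequence


def total_error_statistic(
    computed: Sequence[float], observed: Sequence[float], w: float
) -> float:
    """EFt = (1 - w) * RMS error + w * |observed peak - computed peak|."""
    if len(computed) != len(observed) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w is assumed to lie between 0 and 1")
    n = len(observed)
    rms = math.sqrt(sum((c - o) ** 2 for c, o in zip(computed, observed)) / n)
    # Assumption: OPF and CPF are taken as the maxima of each series.
    peak_error = abs(max(observed) - max(computed))
    return (1.0 - w) * rms + w * peak_error


def first_order_propagated_error(
    gradients: Sequence[float], std_devs: Sequence[float]
) -> float:
    """Textbook first-order estimate: sqrt(sum((dRF/dp_i * sigma_i)^2)).

    Assumes the parameter errors are independent of one another.
    """
    if len(gradients) != len(std_devs):
        raise ValueError("one standard deviation is needed per gradient")
    return math.sqrt(sum((g * s) ** 2 for g, s in zip(gradients, std_devs)))


observed_flow = [1.0, 2.0, 4.0, 2.0]  # hypothetical hourly flows, m3/s
computed_flow = [1.0, 3.0, 3.0, 2.0]
ef_t = total_error_statistic(computed_flow, observed_flow, w=0.5)
sigma_rf = first_order_propagated_error([3.0, 4.0], [1.0, 1.0])
print(round(ef_t, 4), sigma_rf)
```

Setting w closer to 1.0 emphasizes the fit to the peak, while w closer to 0.0 emphasizes the fit to the whole hydrograph.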
R:156

[Figure: state-variable sub-spaces A to H laid out on axes of rate of rain (positive), evapotranspiration rate (negative) and duration of rain, with zero marked on each axis; two regions are marked "not used".] R:165

[Figure: curves between 0.0 and 1.0, labelled C,F,G; B,H; A,E; I; and D, over the categories zero and medium evapo-transpiration, short, medium and long duration, and rate of rain.] R:166

General form: If X period is Y, analyze Z parameters,
where X, Y and Z have the following meanings:

     X      Y        Z
1.   rain   long     erosion
2.   rain   medium   pervious area flow
3.   rain   medium   pollutant washoff
4.   rain   short    impervious area flow
5.   rain   short    rain-out
6.   ET     exists   recovery of storages
7.   ET     exists   recovery of loss rates
8.   ET     exists   groundwater depletion
9.   ET     medium   pollutant build-up
R:167

Rule: Analyse only sensitive parameters, and then only against relevant events. R:167

Framework for continuous modeling:

At your desk:
1. Make a list of simplified design questions, and postulate the relationship between your list and your proposed objective functions. R:169
2. Select the best objective functions and response functions for your study problem. Minimize the computed output and computer execution times. Allocate storage space for computed time series management. R:169
3. Obtain or generate a credible, very-long-term time series to drive your model for design inference. R:169
4. Obtain a short but sufficient record of good, observed events to calibrate your model. R:169

Using the PCSWMM4 shell:
5. List all parameters that need to be optimized, and their associated processes. R:169
6. Associate all processes with the limited state-variable sub-spaces where they dominate. R:169
7. Search the good observed record for a sufficient number of appropriate events. R:169
8. For each input parameter, estimate: 1. the mean most likely value, 2. a higher most likely value, and 3. a lower most likely value. Choose the sensitivity test range, but keep it small. R:169
9. Carry out the sensitivity tests, and rank all parameters in terms of their dimensionless sensitivity gradients. R:169
10. Optimize the parameters to give the smallest error. R:169
11. Run the calibrated model for the long-term time series for each array of BMPs. R:169
12. Infer which is the best array. Rerun the model for this array, estimating the error in the computed response functions. R:169
13. Study all the input and output information again; make certain that it is logical, and gain knowledge about the performance of the drainage system. Interpret the impact of the errors. R:169

At your client's office:
14. Report your recommendations, and, provided you follow the logic, become rich and famous. R:169

The following 8 rules form a personal catechism for honest, very-long-term, continuous surface water quality modeling. R:171

Rule 1: Do not calibrate all parameters simultaneously against a long-term continuous observed record, notwithstanding any early advice to the contrary in the literature. R:171

Rule 2: Transpose or synthesize a long-term, hydro-meteorologic input time series from the same hydrologic region, and use this for inferring the comparative performance of various arrays of BMPs. Many records of 50 years duration or longer are available. R:171

Rule 3: Carefully choose the best objective functions that represent the design questions and the model variability. Get the advisory committee to justify the selections in writing. R:171

Rule 4: In order to control the amount of computing, associate the input parameters with processes, and processes with causative events, and causative events with limited state-variable sub-spaces. For this activity, the sensitivity analysis code in PCSWMM4 is helpful. Do not analyze parameters outside these spaces. R:171

Rule 5: Use three estimates of the most likely parameter values. It is more meaningful to compare the computed responses from several reasonable models than responses computed using extreme values. R:171

Rule 6: Assume that the WQM is approximately linear for the purposes of optimizing parameters and estimating the propagated error.
Then analyze for sensitivity near the mean expected values of all input parameters. R:171

Rule 7: Calibrate only sensitive parameters, and then only against relevant events for which you have good, short-term observed data. That data must include good rate-of-rain records with adequate coverage and spatial resolution. R:171

Rule 8: Use first-order linear error analysis, and report the estimated propagated error in your recommended design solution. R:171

The end. See you on-line at:
• www.computationalhydraulics.com
• www.eos.uoguelph.ca/webfiles/james
• bill@computationalhydraulics.com
• wjames@uoguelph.ca