Models evaluation and selection

Assessment of models from a Latvian perspective

To select the most suitable models for national assessments of water quality, several factors need to be considered: the purpose of the modelling and the expected targets, the spatial and temporal resolution, the data requirements of the model and the availability of those data, whether the model has been tested under similar conditions, the modularity of the model (so that it can benefit from later developments), the user interface, and the competence of the user. The choice between a freeware model and a commercial product cuts both ways with respect to support, availability, further development and the cost of applying the model. According to LEGMA, the major problems in the surface water ecosystems of Latvia are eutrophication and the effects of loads of priority hazardous substances (Annex X of the EU Water Framework Directive); the models briefly described in this report therefore cover those areas of interest. However, the number of applicable models is large and not all of them could be covered in this report.

Processes and models important for eutrophication modelling include the water cycle (hydrological models), catchment load from land-use activities, i.e. diffuse sources and point sources (catchment models), and turnover processes in lakes and watercourses (retention models). The EUROHARP EU FP6 project evaluated and tested models in order to compare the different model approaches used for international reporting obligations in Europe and to harmonise reporting procedures on diffuse sources of nutrients (EUROHARP 1-2003). Nine models were tested in seventeen different countries. Based on the EUROHARP evaluation results, only two of the EUROHARP models are selected in this report. Only four models included hydrological modelling (ANIMO, TRK, SWAT, EveNFlow) and only four models included enough processes to be classified as very suitable for scenario assessments (MONERIS, NL-CAT, TRK, SWAT).
Thus only SWAT and TRK, out of the nine EUROHARP models, were very suitable for modelling eutrophication issues. Other models included in this report are the hydrological models HBV and SCS, because of the importance of hydrology and their difference in complexity; INCA, because it is an alternative to TRK and SWAT with modest data requirements in a single interface package; CE-QUAL-W2, as an example of a fully distributed water quality model; and WATSHMAN, as an example of a source apportionment and scenario tool.

Priority hazardous substances can be monitored and modelled by combining monitoring with modelling tools to determine the chemical fate of a substance and its effects on ecological status. The models discussed in this report are chemical fate models based on fugacity, screening monitoring methods, QSAR models to determine the chronic toxicity and physical data of unknown new substances, and OMEGA as a new tool for determining ecological status.

Reference

EUROHARP 1-2003. Review and Literature Evaluation of Quantification Tools for the Assessment of Nutrient Losses at Catchment Scale. Oscar F. Schoumans, ALTERRA, the Netherlands; Martyn Silgram, ADAS, United Kingdom.

Short model descriptions

Hydrological models

The SCS Curve number method

The SCS curve number method is a simple, widely used and efficient method for determining the approximate amount of runoff from a rainfall event in a particular area. It is often included in more distributed hydrological models to evaluate surface runoff (e.g. SWAT). Although the method is designed for a single storm event, it can be scaled to find average annual runoff values. The data requirements are very low: rainfall amount and curve number. The curve number is based on the area's hydrologic soil group, land use, treatment and hydrologic condition, the first two being of greatest importance.
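As a minimal sketch of how little the curve number method needs, the calculation can be written in a few lines of Python. The sketch assumes the standard form of the method with depths in inches, S = 1000/CN - 10, and an initial abstraction of 0.2 S; the function name `scs_runoff` and the parameter `ia_ratio` are illustrative and not taken from any particular software package.

```python
def scs_runoff(rainfall_in: float, curve_number: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth Q (inches) for a single storm of depth P (inches).

    Assumes the standard SCS relations:
      S  = 1000 / CN - 10          potential maximum retention (inches)
      Ia = ia_ratio * S            initial abstraction (0.2 * S by default)
      Q  = (P - Ia)^2 / (P - Ia + S)   for P > Ia, otherwise 0
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must be in the interval (0, 100]")
    retention = 1000.0 / curve_number - 10.0
    initial_abstraction = ia_ratio * retention
    if rainfall_in <= initial_abstraction:
        # All rainfall is absorbed before any runoff starts
        return 0.0
    excess = rainfall_in - initial_abstraction
    return excess ** 2 / (excess + retention)


# Example: a 3 inch storm on land with CN = 80
runoff = scs_runoff(3.0, 80)
```

With CN = 80 the retention is 2.5 inches and the initial abstraction 0.5 inches, so a 3 inch storm gives 1.25 inches of runoff. A curve number of 100 represents a fully impervious surface, where all rainfall becomes runoff.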
The general equation for the SCS curve number method is as follows:

\[ Q = \frac{(P - I_a)^2}{(P - I_a) + S} \qquad (1) \]
\[ I_a = 0.2\,S \qquad (2) \]
\[ Q = \frac{(P - 0.2\,S)^2}{P + 0.8\,S} \qquad (3) \]
\[ S = \frac{1000}{CN} - 10 \qquad (4) \]

where Q = runoff, P = rainfall, S = potential maximum retention after runoff begins and Ia = initial abstraction, all expressed as depths in inches.

The initial equation (1) is based on trends observed in data from collected sites; it is therefore an empirical equation rather than a physically based one. After further empirical evaluation of the trends in the database, the initial abstraction, Ia, could be defined as a percentage of S (2). With this assumption, the equation could be written in a simplified form with only three variables (3). The parameter CN is a transformation of S, and it is used to make interpolating, averaging, and weighting operations more linear (4). Curve numbers are available for most land-use types.

Description of the SCS method captured from: http://www.ecn.purdue.edu/runoff/documentation/scs.htm

The HBV model

The HBV model (Bergström, 1976, 1992) is a rainfall-runoff model, which includes conceptual numerical descriptions of hydrological processes at the catchment scale. The general water balance can be described as:

\[ P - E - Q = \frac{d}{dt}\left[ SP + SM + UZ + LZ + \mathit{lakes} \right] \]

where:
P = precipitation
E = evapotranspiration
Q = runoff
SP = snow pack
SM = soil moisture
UZ = upper groundwater zone
LZ = lower groundwater zone
lakes = lake volume

In different model versions HBV has been applied in more than 40 countries all over the world, with climatic conditions as different as those of Sweden, Zimbabwe, India and Colombia. The model has been applied at scales ranging from lysimeter plots (Lindström and Rodhe, 1992) to the entire Baltic Sea drainage basin (Bergström and Carlson, 1994; Graham, 1999). HBV can be used as a semi-distributed model by dividing the catchment into subbasins. Each subbasin is then divided into zones according to altitude, lake area and vegetation. The model is normally run on daily values of rainfall and air temperature, and daily or monthly estimates of potential evaporation.
The model is used for flood forecasting in the Nordic countries, and many other purposes, such as spillway design floods simulation (Bergström et al., 1992), water resources evaluation (for example Jutman, 1992, Brandt et al., 1994), nutrient load estimates (Arheimer, 1998). Input data are observations of precipitation, air temperature and estimates of potential evapotranspiration. The time step is usually one day, but it is possible to use shorter time steps. The evaporation values used are normally monthly averages although it is possible to use daily values. Air temperature data are used for calculations of snow accumulation and melt. It can also be used to adjust potential evaporation when the temperature deviates from normal values, or to calculate potential evaporation. If none of these last options are used, temperature can be omitted in snowfree areas. The model consists of subroutines for meteorological interpolation, snow accumulation and melt, evapotranspiration estimation, a soil moisture accounting procedure, routines for runoff generation and finally, a simple routing procedure between subbasins and in lakes. It is possible to run the model separately for several subbasins and then add the contributions from all subbasins. Calibration as well as forecasts can be made for each subbasin. For basins of considerable elevation range a subdivision into elevation zones can also be made. This subdivision is made for the snow and soil moisture routines only. Each elevation zone can further be divided into different vegetation zones (e.g., forested and non-forested areas). The standard snowmelt routine of the HBV model is a degree-day approach, based on air temperature, with a water holding capacity of snow which delays runoff. Melt is further distributed according to the temperature lapse rate and is modelled differently in forests and open areas. A threshold temperature, TT, is used to distinguish rainfall from snowfall. 
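The degree-day idea described above can be sketched as a single daily update of the snow store. This is an illustrative simplification and not the HBV code itself: the parameter names `tt`, `cfmax` (melt factor, mm per degree and day) and `cwh` (water holding capacity as a fraction of the snow pack) and their default values are assumptions, and refreezing, lapse-rate distribution and the forest/open-area distinction are left out.

```python
def snow_step(precip_mm, temp_c, pack_mm, liquid_mm, tt=0.0, cfmax=3.5, cwh=0.1):
    """One daily step of a simple degree-day snow routine.

    Returns (snow pack, liquid water held in the pack, outflow to the soil), all in mm.
    """
    if temp_c < tt:
        # Below the threshold temperature all precipitation falls as snow
        pack_mm += precip_mm
        rain = 0.0
        melt = 0.0
    else:
        # Above the threshold: rain, plus melt proportional to degrees above TT
        rain = precip_mm
        melt = min(pack_mm, cfmax * (temp_c - tt))
        pack_mm -= melt
    liquid_mm += rain + melt
    # The pack retains liquid water up to a fraction of its water equivalent,
    # which delays the runoff
    capacity = cwh * pack_mm
    outflow = max(0.0, liquid_mm - capacity)
    liquid_mm -= outflow
    return pack_mm, liquid_mm, outflow


# Example: a cold day with 10 mm of snowfall followed by a mild dry day
pack, liquid, out = snow_step(10.0, -5.0, 0.0, 0.0)
pack, liquid, out = snow_step(0.0, 2.0, pack, liquid)
```

On the second day 7 mm melts (3.5 mm per degree times 2 degrees), the remaining 3 mm pack holds 0.3 mm of liquid water, and about 6.7 mm leaves the snow routine.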
Although the automatic calibration routine is not a part of the model itself, it is an essential component in practical work. The standard criterion (Lindström, 1997) is a compromise between the traditional efficiency, R2, by Nash and Sutcliffe (1970) and the relative volume error, RD:

\[ R^2_V = R^2 - w\,|RD| \]

In practice, optimising R2 alone often leaves a remaining volume error. The criterion above gives results with almost as high R2 values and practically no volume error. The best results are obtained with w close to 0.1. The automatic calibration method for the HBV model developed by Harlin (1991) used different criteria for different parameters. With the simplification to one single criterion, the search method could be made more efficient. The optimisation is made for one parameter at a time, while keeping the others constant. The one-dimensional search is based on a modification of the Brent parabolic interpolation (Press et al., 1992).

Description of the HBV model captured from: http://www.smhi.se/sgn0106/if/hydrologi/hbv.htm
Link to official homepage: http://www.smhi.se/foretag/m/hbv_demo/html/welcome.html

Hydrology and water quality models

These models include hydrology and retention modelling, producing results for runoff and water quality.

CE-QUAL-W2

CE-QUAL-W2 is a two-dimensional water quality and hydrodynamic code supported by the USACE Waterways Experiments Station (Cole and Buchak). W2 models basic eutrophication processes such as temperature-nutrient-algae-dissolved oxygen-organic matter and sediment relationships. The model has been widely applied to stratified surface water systems such as lakes, reservoirs, and estuaries, and computes water levels, horizontal and vertical velocities, temperature, and 21 other water quality parameters (such as dissolved oxygen, nutrients, organic matter, algae, pH, the carbonate cycle, bacteria, and dissolved and suspended solids).
Version 3 has the capability of modeling entire river basins with rivers and inter-connected lakes, reservoirs, and/or estuaries. Similar fully distributed models are the MIKE models (www.dhi.dk) and SHETRAN, which is fully distributed even in three dimensions (http://www.ncl.ac.uk/wrgi/wrsrl/rbms/rbms.html). CE-QUAL-W2 has been in use for the last two decades as a tool for water quality managers to assess the impacts of management strategies on reservoir, lake, and estuarine systems. A predominant feature of the model is its ability to compute the two-dimensional velocity field for narrow systems that stratify. Many reservoir models are zero-dimensional with regard to hydrodynamics, yet the ability to accurately simulate transport can be as important as the water column kinetics in accurately simulating water quality.

There were many approaches that could have been implemented to incorporate riverine branches within CE-QUAL-W2. By choosing a theoretical basis for the riverine branches that uses the existing CE-QUAL-W2 2-D computational scheme for hydraulics and water quality, the following benefits accrued:

• Code updates in the computational scheme affected the entire model rather than just one of the computational schemes for either the riverine or the reservoir sections, leading to easier code maintenance.
• No changes were made to the temperature or water quality solution algorithms.
• By using the two-dimensional framework, the riverine branches had the ability to predict the velocity and water quality field in two dimensions. This has advantages in modeling the following processes: sediment deposition and scour, particulate (algae, detritus, suspended solids) sedimentation, and sediment flux processes, as well as making Manning’s friction factor stage invariant.
• Since the entire watershed model had the same theoretical basis, setting up branches and interfacing branches involved the same process whether for reservoir or riverine sections, thus making code maintenance and model set-up easier.

The theoretical approach was to re-derive the governing equations assuming that the 2-D grid is adjusted by the channel slope.

Cole, T. and Buchak, E. "CE-QUAL-W2: A Two-Dimensional, Laterally Averaged, Hydrodynamic and Water Quality Model, Version 2.0," Tech. Rpt. EL-95-May 1995, Waterways Experiments Station, Vicksburg, MS, 1995.

Link to CE-QUAL-W2 homepage: http://www.ce.pdx.edu/w2/

Water quality catchment models

TRK

The Swedish TRK system has been developed to calculate gross and net load and source apportionment of nutrients for national assessments of progress towards environmental targets regarding reductions in eutrophication in surface waters. The TRK system has further been developed to provide the option of scenario analysis, e.g. of mitigation options at subcatchment level associated with agricultural practices. To permit assessment of the most effective measures, and to avoid the large effects of inter-annual variations in climate, results from the system are presented as long-term climate-normalised load for a specific year. The models included in the system can, however, provide a daily output resolution. The results from the system have been used for international reports on the transport to the sea, for assessment of the reduction of the anthropogenic load on the sea and for guidance on effective measures for reducing the load on the sea on a national scale (Naturvårdsverket 1997, Brandt and Ejhed 2002). The TRK system consists of a GIS and database that prepares input data for the models included in the system, and calculates gross and net load and source apportionment.
Calculations of both N and P are included, from both diffuse and point sources, including calculations of hydrology and nitrogen retention in soils, rivers and lakes. The TRK system includes three dynamic simulation models: SOILNDB, ICECREAM and HBV-NP. The SOILNDB model is a one-dimensional model describing nitrogen dynamics and losses in soil profiles in arable land. Nutrient losses from arable land are calculated, from the root zone or deeper, for areas with a unique combination of crop, soil type, region, climate and fertiliser regime. A method for calculating a number of leaching estimates for different typical cropping situations has been developed. The outputs, leaching concentrations (in mg/l) for different combinations of arable crops, soils and fertiliser regimes, are input data to HBV-NP. In HBV-NP, which is a more conceptual model, root-zone concentrations for the various land-use categories (i.e. pasture, forest and other land) are assigned to water percolating from the unsaturated zone to the groundwater. The runoff model calculates daily runoff from the various land uses in subbasins. The summed soil leakage is mixed with the load from rural households, and point sources and atmospheric deposition are added to the river discharge. In addition to the mixing of waters and various loads through the river network, turnover processes (retention) in the groundwater (below the root zone) and in ditches, rivers and lakes are simulated for both inorganic and organic nitrogen.

Schematically, the TRK system involves the following steps:
1. Import spatially distributed input data to produce point and diffuse sources, hydrology and retention.
2. Preparation and coupling of distributed land-use categories to other data and subcatchments, and coupling of point sources to subcatchments using GIS.
3. Import land use to the HBV-model calculations, followed by export of hydrology for the SOILNDB and ICECREAM calculations.
4.
Import agricultural data (crops, soils, practices, meteorological data and hydrological data) to the SOILNDB and ICECREAM models. Calculation and export of leaching concentrations from arable land.
5. Export all compiled data on diffuse sources (leaching concentrations and land-use area) and point source discharge to the HBV and HBV-NP models. Calculation of nitrogen and phosphorus transport and retention in soils, rivers and lakes.
6. Import retention from the HBV-NP model calculations.
7. Compilation of gross and net load and source apportionment.

The results are presented in the GIS, and source apportionment is made for each subbasin as well as for whole river basins.

Generalised N root-zone leaching estimates for arable land are calculated using the SOILNDB modelling tool (Johnsson et al., 2002). The method is based on calculating a number of standard N leaching rates (i.e. the nitrogen leaching from the root zone for a specified year if the weather and harvest had been normal) for a number of combinations of soils, crops, fertilisation forms and regions (catchment, area etc.). The calculation uses SOILNDB, a crop rotation generator, long-term meteorological data, agricultural statistics on crops and area distribution, standard yields, normal fertilisation rates and crop management information. Leaching is simulated for a large number of years using the meteorological time series to get acceptable mean values of the standard leaching rates for the different crop-soil combinations. Thus, leaching estimates are normalised with respect to year-to-year variation in weather conditions and crop production. The method of calculating leaching estimates was developed by Hoffmann & Johnsson (1999) and Johnsson & Hoffmann (1998) and has been further developed by Johnsson & Mårtensson (2002).
The system has been used for calculating leaching estimates for combinations of different climates, soil textural classes, crops, organic matter classes and fertilisation regimes in the Nordic countries and Sweden (Johnsson & Hoffmann, 1996; Johnsson & Hoffmann, 1998; Johnsson & Mårtensson, 2002). SOILNDB is a management-oriented modelling tool based on the one-dimensional SOIL-SOILN models describing N dynamics and losses in arable soils, a parameter database and parameter estimation algorithms. The soil N model, SOILN (Johnsson et al., 1987), is coupled in series with the soil water and heat model, SOIL (Jansson & Halldin, 1979; Jansson, 1991). SOIL provides driving variables for the SOILN model, i.e. infiltration, water flow between layers and to drainage tiles, unfrozen soil water content and soil temperature. The SOIL model includes snow dynamics, frost, evapotranspiration, infiltration, surface runoff and drainage flows as well as water uptake by vegetation. The SOILN model includes the major processes determining inputs, transformations and outputs of N in arable soils: inputs of fertiliser and deposition; mineralisation dependent on soil temperature and moisture; decomposition to CO2, humus and recycling within the pool; a soil temperature function, Q10, for regulation of all biological processes; plant uptake from empirical functions; and denitrification dependent on soil temperature, soil oxygen status and soil nitrate content (Figure 13). Nitrate transport is calculated as the product of water flow and nitrate concentration in the soil layer. Ammonium is considered to be immobile in the soil profile. Gross load from arable land is calculated using the spatial distribution of crops and soil types.

HBV-NP simulates nitrogen (N) and phosphorus (P) transport and transformation at the catchment scale (from 1 km2 to > 1 000 000 km2).
The objectives are usually to estimate transport, retention and source apportionment, to separate anthropogenic impact from natural variability, and to evaluate climate and management scenarios. The model is based on the hydrological HBV model, which has gradually been equipped with an N routine (Bergström et al. 1987, Brandt 1990, Arheimer and Wittgren 1994, Arheimer and Brandt, 1998). The P routine has recently been developed within VASTRA, the Swedish Water Management Research Programme. HBV-NP is a dynamic mass-balance model, run at a daily time step, which includes all sources in the catchment coupled to the water balance:

where:
c = concentration of nutrient fraction
V = water volume of groundwater, river or active part of lake
in = inflow (e.g. for groundwater: soil leakage from various land uses; for lakes/wetlands: upstream rivers and local discharge, precipitation on the surface)
out = outflow to river, lake or downstream subbasin, evaporation
D = atmospheric deposition on water surfaces
P = emissions from point sources or rural households
F = retention (removal or release), see Table 1.

The spatial resolution of the model depends on the subbasin division in each application. HBV-N has been applied in large-scale studies covering southern Sweden (145 000 km2 divided into 3700 catchments; Arheimer and Brandt, 1998), the whole of Sweden (450 000 km2 divided into 1000 subbasins; the TRK project), and the Baltic Sea drainage basin (~1 720 000 km2 divided into 30 subbasins; Pettersson et al., 2000). The model has also been used for more detailed studies, such as that of the Genevadsån River (200 km2 divided into 70 subbasins; Arheimer and Wittgren, 2002; Arheimer et al., 2003). Additionally, the model has been applied to the Matsalu River in Estonia (Lidén et al., 1999). When the model is applied, the river basin may be divided into several coupled subbasins for which the calculations are made separately; this gives the spatial distribution of the model results.
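A daily mass-balance update of the kind defined by the variables above can be sketched for a single well-mixed store (groundwater, river reach or lake). The sketch assumes the balance has the form "change in stored mass = inflow loads - outflow load + D + P - F", solved with a simple explicit daily step. The first-order retention term `k_ret * mass` is only a placeholder for F, since the actual retention expressions (Table 1) are not reproduced here, and the function and parameter names are hypothetical.

```python
def mass_balance_step(mass, volume, inflows, q_out,
                      deposition=0.0, point=0.0, k_ret=0.0, dt=1.0):
    """Advance nutrient mass (kg) and water volume (m3) of one store by dt days.

    inflows    : list of (concentration kg/m3, flow m3/day) pairs
    q_out      : outflow (m3/day), leaving at the store's own concentration
    deposition : atmospheric deposition D (kg/day)
    point      : point-source and rural household emissions P (kg/day)
    k_ret      : placeholder first-order retention coefficient (1/day), F = k_ret * mass
    """
    conc = mass / volume if volume > 0 else 0.0
    load_in = sum(c * q for c, q in inflows)
    load_out = conc * q_out
    retention = k_ret * mass
    new_mass = mass + dt * (load_in - load_out + deposition + point - retention)
    new_volume = volume + dt * (sum(q for _, q in inflows) - q_out)
    # Guard against a negative mass caused by the explicit step
    return max(new_mass, 0.0), new_volume


# Example: a 100 m3 store holding 10 kg, one inflow of 10 m3/day at 0.2 kg/m3
mass, volume = mass_balance_step(10.0, 100.0, [(0.2, 10.0)], q_out=10.0)
```

In the example the store receives 2 kg/day and loses 1 kg/day through the outflow, so the stored mass rises from 10 to 11 kg while the volume stays at 100 m3. Source apportionment follows from tracking each inflow load separately through the same update.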
The hydrological part (i.e. HBV-96) consists of routines for accumulation and melt of snow, accounting of soil moisture, lake routing and runoff response. The model includes a number of free parameters, which are calibrated against observed timeseries of river discharge and riverine nutrient concentrations. For large-scale catchment applications, the calibration procedure is made step-wise for surface runoff, tile drains and groundwater, rivers and lakes, with simultaneous consideration to several monitoring sites in a region. In the nutrient routine, soil leaching concentrations are assigned to the water percolating from the unsaturated zone to the response reservoir of the hydrological HBV model (Fig. 1). Different concentrations are applied to water originating from different combinations of land use and soils. The arable land may be further divided into a variety of crops and management practices, for which the nutrient leaching is achieved by using field-scale models, e.g., SOILN (Johnsson et al., 1987); or ICECREAM (Tattari et al., 2001) extended with macropore flow. For P, also soil surface erosion and water transport is considered, using a GIS-based model component, e.g. DelPi (Hellström, 2003). In addition to the diffuse soil-leaching, nutrient load is also added from point-sources, such as rural households, industries, and wastewater treatment plants. Atmospheric deposition is added to lake surfaces, while deposition on land is implicitly included in the soil-leaching. The model simulates residence, transformation and transport of N and P in groundwater, rivers, wetlands and lakes. The model considers that stream bank erosion, as well as sedimentation and suspension processes in the rivers may have an impact on the river load. The equations used to account for the nutrient turnover processes are mainly based on empirical relations between physical parameters and concentration dynamics. 
The fractions modelled are: dissolved inorganic nitrogen (DIN), dissolved organic nitrogen (DON), particulate phosphorus (PP), and soluble reactive phosphorus (SRP). Calculations are made with a daily time-step. Simultaneous calibration of water balance and nutrient concentrations may be performed (Pettersson et al., 2001). Andersson, L. and Arheimer, B. (2003): Modelling of human and climatic impact on nitrogen load in a Swedish river 1885-1994. Hydrobiologia (in press). Andersson, L. and Arheimer, B., (2001). Consequences of changed wetness on riverine nitrogen - human impact on retention vs. natural climatic variability. Regional Environmental Change 2:93-105. Andersson, L., Hellström, M., Persson, K. (2002): A nested model approach for phosphorus load simulation in catchments: HBV-P. In: Proceedings Nordic Hydrological Conference. Röros, Nor-way. August 2002, pp. 229-238. Andersson, L., Persson, K., Hellström, M. (2002): Fosfortransport och koncentrationer i vattendrag. Utveckling och test av modellverktyg för uppföljning av miljömål, samt scenarier av hur uppställda mål kan nås. VASTRA working paper. (In Swedish) Andreasson, J. (2002): Skogsläckaget och retentionen av kväve norr om Dalälven. VASTRA working paper. (In Swedish) Arheimer, B. (1998) Riverine Nitrogen - analysis and modelling under Nordic conditions. Ph.D. thesis. Kanaltryckeriet, Motala. pp. 200. Arheimer, B. and Bergström, S. (1999). Modelling nitrogen transport in Sweden: influence of a new approach to runoff response. In: Heathwaite, L. (Ed.) Impact of Land-Use Change on Nutrient Loads from Diffuse Sources. International Association of Hydrological Sciences, IAHS Publication no. 257. Arheimer, B. and Brandt, M., (1998). Modelling nitrogen transport and retention in the catchments of southern Sweden. Ambio 27(6):471-480. Arheimer, B. and Brandt, M., (2000). Watershed modelling of non-point nitrogen pollution from arable land to the Swedish coast in 1985 and 1994. 
Ecological Engineering 14:389-404. Arheimer, B. and Wittgren, H. B., (1994). Modelling the effects of wetlands on regional nitrogen transport. Ambio 23(6):378-386. Arheimer, B. and Wittgren, H.B., (2002). Modelling Nitrogen Retention in Potential Wetlands at the Catchment Scale. Ecological Engineering 19(1):63-80. Arheimer, B., Torstensson, G. and Wittgren, H.B (2003): Landscape planning to reduce coastal eutrophication: Constructed Wetlands vs. Agricultural Practices. Landscape and Urban Planning (in press). Bergström, S., Brandt, M. & Gustafson, A., (1987). Simulation of runoff and nitrogen leaching from two fields in southern Sweden. Hydrological Science Journal 32(2-6):191-205. Brandt, M. and Ejhed, H. (2003): TRK-Transport, Retention, Källfördelning. Belastning på havet. Swedish Environmental Protection Agency, Report No. 5247. Brandt, M., (1990). Simulation of runoff and nitrogen transport from mixed basins in Sweden. Nordic Hydrology, 21:13-34. Fogelberg, S. (2003): Modelling nitrogen retention at the catchment-scale: Comparison of HBV-N and MONERIS. Master thesis, Uppsala Technical University, Report (in press). Hellström, 2002, DelPi. An ArcView GIS 3.x extension for Estimating diffuse Loads of Sediment and Phosphorus from arable catchments. Johnsson, H., Bergström, L. and Jansson, P.-E., 1987. Simulated nitrogen dynamics and losses in a layered agricultural soil. Agriculture, Ecosystems and Environment 18:333-356. Lidén, R., Vasilyev, A., Loigu, E., Stålnacke, P., Grimvall, A. and Wittgren, H. B., (1999). Nitrogen source apportionment - a comparison between a dynamic and a statistical model. Ecological Modelling 114:235-250. Marmefelt, E., Arheimer, B. and Langner, J., (1998). An integrated biogeochemical model system for the Baltic Sea. Hydrobiologia 393:45-56. Pettersson, A., Arheimer, B. and Johansson, B., (2001). Nitrogen concentrations simulated with HBV-N: new response function and calibration strategy. Nordic Hydrology 32(3):227-248. 
Tattari, S., Bärlund, I., Rekolainen, S., Posch, M, Siimes, K., Tuhkanen, H-R, YliHalla, M. (2001). Modeling sediment yield and phosphorus transport in Finnish clayey soils. Transactions of the ASAE Vol. 44, no. 2, pp. 297-307. Wittgren, H. B., Gippert, L., Jonasson, L., Pettersson, A., Thunvik, R., and Torstensson, G. (2001). An actor game on implementation of environmental quality standards for nitrogen. In: Steenvoorden, J., Claessen, F. and Willems, J. (Eds) Agricultural Effects on Ground and Surface Waters. IAHS Publ. no. 273. ICECREAM is a simulation model for quantification of field-scale losses of phosphorus. It is based on the CREAMS model, which was developed in the USA, then it has been further developed and tested in Finland and Sweden. Losses of phosphorus have mainly been related to surface runoff, and CREAMS was therefore developed to calculate the effects of different management options on the erosion and surface losses of phosphorus. However, during recent years it has been found that leaching of phosphorus through the soil profile to drainage pipes can be relatively large, especially on structured clay soils. The ICECREAM-model was therefore complemented with descriptions for flow of both dissolved and particulate phosphorus through macropores. The knowledge about the processes governing sorption, transformation and transport of P is incomplete, and the quantification of the losses is therefore more insecure than for nitrogen. Since the knowledge about some processes governing the loss of P is small, ICECREAM is a mix of physically based descriptions and empirical equations. The surface runoff is for example based on a factor called ‘curve number’, which is based on field experiments in the USA. However, it is possible that these ‘curve numbers’ do not represent conditions outside the place where they were measured, and the model should therefore be calibrated when it is used outside USA. 
The ICECREAM model runs on a daily time-step with standard meteorological data as input. The model contains a description of a full water balance including precipitation, evaporation, transpiration, surface runoff, and percolation out of the root zone. A modification of the Soil Conservation Service curve number method is used to partition net rainfall between surface runoff and infiltration. Downward water flow between soil layers and percolation is calculated with a ‘storage-routing’ concept (i.e. a capacity-type approach), and takes place if the water storage in a layer exceeds the field capacity. In ICECREAM, soil phosphorus is divided into six pools (kg m-2), three of which are inorganic, in the form of stable, PS, active, PA, and labile, PL, phosphorus, and three are organic: a litter pool consisting of fresh organic material such as decomposing roots and straw, PFO, a humus pool comprising more stable organic matter, PSO, and a faeces pool containing added manure, PMAN. Flows of P between pools include plant uptake of PL, decomposition and humification of litter and faeces, immobilization of PL to litter P and mineralization of humus. Phosphorus in fertilizer is directly added to the labile phosphorus pool, PL. The losses of phosphorus are divided into surface runoff losses and losses to tile drains, and two fractions are considered, dissolved and particulate P. Larsson, M.H., Persson, K., Ulén, B., Lindsjö, A., Jarvis, N.J. 2007. A dual porosity model to quantify phosphorus losses from macroporous soils. Ecological Modelling 205, 123- 134. Posch, M., Rekolainen, S. 1993. Erosivity factor in the Universal Soil Loss Equation estimated from Finnish rainfall data, Agric. Sci. Finland 2,271–279. Rekolainen, S., Posch, M., 1993. Adapting the CREAMS model for Finnish conditions. Nordic Hydrol. 24, 309-322. Tattari, S., Bärlund, I., Rekolainen, S., Posch, M., Siimes, K., Tuhkanen, H.-R., Yli-Halla, M., 2001. 
Modelling sediment yield and phosphorus transport in Finnish clayey soils. Trans. ASAE 44, 297-307.

Link to EUROHARP homepage with TRK model descriptions etc.: www.euroharp.org
Link to original TRK reports (in Swedish): http://www.naturvardsverket.se/Documents/bokhandeln/bokhandeln.htm
Link to information on HBV-NP: http://www.smhi.se/sgn0106/if/hydrologi/hbv_np.htm
Link to information on ICECREAM (in Swedish): http://vv.mv.slu.se/ShowPage.cfm?OrgenhetSida_ID=6506

SWAT

The SWAT model (Soil and Water Assessment Tool) is a three-dimensional continuous-time watershed model, originally developed in the USA, that operates on a daily time step at basin scale. The major objective of the model is to predict the long-term impacts in large basins of management and of the timing of agricultural practices within a year (i.e. crop rotations, planting and harvest dates, irrigation, fertiliser and pesticide application rates and timing). It can be used to simulate water and nutrient cycles at the basin scale in landscapes where the dominant land use is agriculture. It can also help in assessing the environmental efficiency of best management plans and alternative management policies. The chemicals considered in the model include nutrients (N-based, P-based, O-based and algae) and pesticides.

In order to apply SWAT, each watershed is discretised into sub-watersheds for which the top surface corresponds to the upper boundary. The lower boundary is represented by the top of the deep aquifer (several metres). The losses (water, sediment and nutrients) for a specific sub-watershed are computed at the sub-watershed outlet. The point sources and the losses for each sub-watershed are then routed through a channel network where retention and transformation of nutrients are simulated. The model takes into account not only the retention taking place in the soil, but also the retention occurring in the river system.
The hydrology in the model is based on the water balance equation comprising surface runoff, precipitation, evapotranspiration, infiltration and subsurface runoff. Evapotranspiration can be calculated by the Priestley-Taylor method or Penman-Monteith method. Precipitation can be estimated using a weather generator included in SWAT; however, measured time series can also be used, thereby reducing uncertainties. For calculation of the infiltration, the soil profile is represented by up to 10 layers, a shallow aquifer and a deep aquifer. When the field capacity in one layer is exceeded, the water is routed to the next soil layer. If this layer is already saturated, a lateral flow occurs. Bottom layer percolation goes into the shallow and deep aquifers. Water reaching the deep aquifer is lost, but a return flow from the shallow aquifer due to the deep aquifer saturation is added directly to the subbasin channel. Runoff volumes are computed by the SCS Curve Number Method. Surface runoff is estimated as a non-linear function of precipitation and a retention coefficient. Also the Green & Ampt approach is available. SWAT also incorporates models to predict channel losses, runoff in frozen soils, snow melt, or capillary rise. A simplified EPIC model is used to simulate crop growth (e.g. wheat, barley, alfalfa, corn) using unique sets of parameters for each crop. Natural vegetations (i.e. forest, grass, pasture) are also included in the crop database. Once all hydrological processes are calculated for an homogeneous part of the subbasin, the resulting flows are considered to contribute directly to the main channel. SWAT includes a routing module based on the ROTO model. This routing procedure moves downstream the water budget taking into account how subbasins and reservoirs are connected. Sediment yield is determined for each subbasin with the Modified Universal Soil Loss Equation, including runoff, soil erodibility, slope and crop factors. 
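The curve number step used for runoff volumes can be illustrated with a short sketch. It applies the standard SCS relation in metric units and assumes the conventional initial abstraction ratio of 0.2; the function name and the example values are illustrative only, and SWAT's own modified implementation is not reproduced here.

```python
def scs_runoff_mm(precip_mm: float, curve_number: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (mm) for one rainfall event, SCS curve number relation."""
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must be in (0, 100]")
    # Potential maximum retention S in mm (metric form of 1000/CN - 10 inches).
    retention_mm = 25400.0 / curve_number - 254.0
    # Rainfall intercepted or infiltrated before any runoff starts (assumed 0.2 * S).
    initial_abstraction_mm = ia_ratio * retention_mm
    if precip_mm <= initial_abstraction_mm:
        return 0.0
    excess_mm = precip_mm - initial_abstraction_mm
    # Non-linear function of precipitation and the retention parameter.
    return excess_mm ** 2 / (excess_mm + retention_mm)


if __name__ == "__main__":
    # Hypothetical event: 50 mm of rain on an area with curve number 80.
    print(round(scs_runoff_mm(50.0, 80.0), 2))
```

A higher curve number means less retention and therefore more runoff for the same rainfall, which is why land use and hydrologic soil group dominate the result.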
Nutrient loading to the channel is calculated from the concentrations in the upper soil layer and the runoff volumes. Uptake of P and N by crops is estimated using a supply and demand approach. The nitrogen module also includes processes such as mineralisation, denitrification and volatilisation. The phosphorus module also considers the association of phosphorus with the sediment phase. Both modules are based on the CREAMS model. Nutrient retention in ponds and wetlands is included. After the N and P dynamics have been calculated, the chemicals are routed into the subbasin channels.

Neitsch S.L., Arnold J.G., Kiniry J.R., Williams J.R. (2001). Soil and Water Assessment Tool – Theoretical Documentation, Version 2000. Blackland Research Center, Agricultural Research Service, Texas, USA.
Neitsch S.L., Arnold J.G., Kiniry J.R., Williams J.R. (2001). Soil and Water Assessment Tool – User Manual, Version 2000. Blackland Research Center, Agricultural Research Service, Texas, USA.

Link to SWAT official homepage: http://www.brc.tamus.edu/swat/

INCA

The Integrated Nitrogen in Catchments model (INCA) was one of the first models to simulate the integrated effects of point and diffuse N sources on streamwater nitrate (NO3) and ammonium (NH4) concentrations and loads, and to estimate N process loads in the plant/soil system (Whitehead et al., 1998a). Since INCA is based on mass balance, it is potentially applicable to a broad range of spatial and temporal scales. The key features of the original INCA are the spatial characterisation of N input variations with land use and a concentration dependency of the N transformation rates in the plant/soil system and in-stream. The equations are solved numerically with the fourth-order Runge-Kutta technique, since this allows a simultaneous solution of the model equations and thereby ensures that no single process represented by the equations takes precedence over another.
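The point about simultaneous solution can be made concrete with a small sketch of a fourth-order Runge-Kutta step. The two-pool system below (first-order nitrification of NH4 to NO3 and first-order denitrification of NO3) and its rate constants are hypothetical and are not the INCA equations; they only show that every pool is advanced from the same set of intermediate slopes, so neither process is applied before the other.

```python
from typing import Callable, List

State = List[float]


def rk4_step(deriv: Callable[[float, State], State], t: float, y: State, dt: float) -> State:
    """Advance all state variables together by one classical RK4 step."""
    def shifted(base: State, slope: State, factor: float) -> State:
        return [b + factor * s for b, s in zip(base, slope)]

    k1 = deriv(t, y)
    k2 = deriv(t + dt / 2, shifted(y, k1, dt / 2))
    k3 = deriv(t + dt / 2, shifted(y, k2, dt / 2))
    k4 = deriv(t + dt, shifted(y, k3, dt))
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def toy_nitrogen(t: float, y: State, k_nit: float = 0.1, k_den: float = 0.05) -> State:
    """Hypothetical NH4 and NO3 pools with first-order rates (per day)."""
    nh4, no3 = y
    return [-k_nit * nh4, k_nit * nh4 - k_den * no3]


def simulate_toy_nitrogen(days: int, y0: State, dt: float = 1.0) -> State:
    """Integrate the toy system over a number of days with a fixed time step."""
    y, t = list(y0), 0.0
    for _ in range(int(round(days / dt))):
        y = rk4_step(toy_nitrogen, t, y, dt)
        t += dt
    return y


if __name__ == "__main__":
    print(simulate_toy_nitrogen(10, [10.0, 5.0]))
```

Because the derivative function returns the rates for both pools at once, nitrification and denitrification within a time step are evaluated on the same intermediate states.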
The key processes and N transformations assumed to occur in the plant/soil system are plant uptake of NO3 and NH4, nitrification, denitrification, mineralisation and immobilisation within each land-use type within each sub-catchment. These processes are represented by a generalised set of six equations, one set to simulate the flow and N in a 1 km2 cell in each of six land-use types. Parameter sets for the equations are derived through calibration, the process whereby the model parameters are adjusted until the difference between observed and simulated data is considered acceptable (Oreskes et al., 1994).

The soil reactive zone is assumed to leach water to the deeper groundwater zone and the river. In the groundwater zone, it is assumed that no biogeochemical reactions occur and that a mass balance of NH4 and NO3 is adequate. The split between the volume of water stored in the soil and the groundwater is calculated using the Base Flow Index, which is an attempt to estimate the proportions of water in a stream derived from surface and deeper groundwater sources. Whilst the index is an oversimplification, since rapid stormflow does not comprise soil water only, it represents a pragmatic method for achieving such a split and is based on the analysis of observed river flows in the UK (Gustard et al., 1987).

The soil and groundwater retention volumes allow the simulation of long-term changes in the water and N stored in the soil. This process is based on the TNT model developed by Beaujouan et al. (2001). In the INCAv1.6 model, the soil drainage volume represents the water volume stored in the soil that responds rapidly to water inflow. As such, it may be thought of as macropore, drain or piston flow: the flow that most strongly influences a rising hydrograph limb. The soil retention volume represents the water volume stored in the soil that responds more slowly and may make up the majority of water storage in the soil, very similar to the field capacity concept.
As such, this water may be thought of as stored in the soil micropores, and therefore dependent on the soil wetting and drying characteristics. The groundwater volume represents the sum of the mobile and immobile water stored in the aquifer, while the time constant used in the discharge equation applies to the mobile water only. The user supplies the input data as a daily time series, though in the case of the effluent inputs, average annual values can be used if no time series data are available. During model calibration, the model parameters are determined by the user. Temporal input data to drive the hydrological and N process component models INCA requires: 1. The hydrology: the hydrologically effective rainfall, the actual rainfall, the soil moisture deficit and the air temperature. 2. The water chemistry: streamwater NO3 and NH4 concentrations 3. Land management practices; the growing season for different crop and vegetation types, and fertiliser application quantities and timings; 4. Sewage effluent flow rates and NO3 and NH4 concentrations for effluents; 5. wet and dry atmospheric deposition of NO3 and NH4. Spatial data are catchment and subcatchment boundaries and land-use classes in aggregated classes. The Integrated Catchments Model of Phosphorus dynamics (INCA-P) provides a process-based representation of the factors and processes controlling P dynamics in both the land and in-stream components of river catchments while minimising data requirements and model structural complexity (Wade et al., 2002a). The model structure is appropriate to the problem of quantifying the timing and load of P delivered to river-systems; P transport is highly variable in space and time, and such transportation is often dominated by storm-events. INCA-P is designed to simulate P transport during storms by considering saturation-excess overland-flow, throughflow and groundwater flow-paths, and also incorporates in-stream processes to account for internal P loads (e.g. 
historic P inputs from effluent); the latter must be considered when assessing the impact of diffuse P sources on the aquatic ecology. INCA-P aims to extend current research and is designed to investigate (1) the transport and retention of P in the terrestrial and aquatic environment, and (2) the relative contributions to the in-stream P of external-diffuse, external-point and internal sources (e.g. P release from sediments and decaying organic matter). INCA-P builds on the established Integrated Nitrogen in Catchments Model (INCA), which is a dynamic, process-based hydrochemical model that has been used to simulate nitrogen in river systems and plot studies throughout Europe, and on the ‘Kennet’ model, which simulates in-stream P and macrophyte/epiphyte dynamics (Wade et al., 2001; Wade et al., 2002b). As such, INCA-P represents an advance towards a generalised framework for simulating water quality determinands in heterogeneous river systems, which started with the INCA model. The ‘Kennet’ model has been used previously to investigate management options, such as the impacts of flow and the removal of P from effluent on macrophyte biomass (Wade et al., 2002c, d; Wade et al., 2004), and INCA-P was used to assess the integrated effects of variable hydrological connectivity, soil P conditions and bio-solid applications (Hewett et al., 2004).

The input fluxes that the INCA-P model takes into account are inorganic-P fertiliser and farmyard manure (FYM), slurry applications and livestock wastes. Various output fluxes (plant uptake, and movement of labile P to forms that are not taken up by plants, that are bound to organic complexes by microbial immobilisation or that become inactive due to chemical immobilisation) are subtracted from these inputs before the amount available for stream output is calculated. These inputs and outputs are differentiated by land-use type and varied according to environmental conditions (e.g. soil moisture and temperature).
The model accounts for stocks of inorganic and organic P in the soil (in readily available and firmly bound forms) and in groundwater; total P and soluble reactive P concentrations are simulated in the stream. The model simulates the flow of water through the plant/soil system from different land-use types to deliver the P load to the river system. This is then routed downstream after accounting for direct effluent discharges and in-stream biological and sediment interactions. As such, the INCA-P model produces daily estimates of discharge, and of stream water total P (TP) and soluble reactive P (SRP) concentrations and fluxes, at discrete points along a river's main channel. Because the model is semi-distributed, spatial variations in land use and management can be taken into account, although the hydrological connectivity of different land-use patches is not modelled as in a fully distributed approach. Instead, the hydrological and nutrient fluxes from different land-use classes and sub-catchment boundaries are modelled simultaneously, and information is fed sequentially into a multi-reach river model.

Text on INCA captured from: A.J. Wade, P. Durand, V. Beaujouan, W.W. Wessel, K.J. Raat, P.G. Whitehead, D. Butterfield, K. Rankinen and A. Lepisto (2002), A nitrogen model for European catchments: INCA, new model structure and equations. Hydrol. Earth Syst. Sci., 6(3), 559-582.
Text on INCA-P captured from: A.J. Wade, D. Butterfield, T. Griffiths and P.G. Whitehead (2007), Eutrophication control in river-systems: an application of INCA-P to the River Lugg. Hydrol. Earth Syst. Sci., 11(1), 584-600.

References:
Beaujouan, V., Durand, P. and Ruiz, L., 2001. Modelling the effect of the spatial distribution of agricultural practices on nitrogen fluxes in rural catchments. Ecol. Model., 137, 91-103.
Gustard, A., Marshall, D.C.W. and Sutcliffe, M.F., 1987. Low flow estimation in Scotland. Institute of Hydrology Report 101, Institute of Hydrology, Wallingford, UK.
Oreskes, N., Shrader-Frechette, K. and Belitz, K., 1994. Verification, Validation and Confirmation of Numerical Models in the Earth Sciences. Science, 263, 641-646.
Wade, A.J., Hornberger, G.M., Whitehead, P.G., Jarvie, H.P. and Flynn, N., 2001. On modelling the mechanisms that control in-stream phosphorus, macrophyte and epiphyte dynamics: An assessment of a new model using general sensitivity analysis. Water Resour. Res., 37, 2777-2792.
Wade, A.J., Whitehead, P.G. and Butterfield, D., 2002a. The Integrated Catchments model of Phosphorus dynamics (INCA-P), a new approach for multiple source assessment in heterogeneous river systems: model structure and equations. Hydrol. Earth Syst. Sci., 6, 583-606.
Wade, A.J., Durand, P., Beaujouan, V., Wessel, W.W., Raat, K.J., Whitehead, P.G., Butterfield, D., Rankinen, K. and Lepisto, A., 2002b. A nitrogen model for European catchments: INCA, new model structure and equations. Hydrol. Earth Syst. Sci., 6, 559-582.
Wade, A.J., Whitehead, P.G., Hornberger, G.M. and Snook, D.L., 2002c. On modelling the flow controls on macrophyte and epiphyte dynamics in a lowland permeable catchment: the River Kennet, southern England. Sci. Total Envir., 282/283, 375-393.
Wade, A.J., Whitehead, P.G., Hornberger, G.M., Jarvie, H.P. and Flynn, N., 2002d. On modelling the impacts of phosphorus stripping at sewage works on in-stream phosphorus and macrophyte/epiphyte dynamics: a case study for the River Kennet. Sci. Total Envir., 282/283, 395-415.
Hewett, C.J.M., Quinn, P.F., Whitehead, P.G., Heathwaite, A.L. and Flynn, N.J., 2004. Towards a nutrient export risk matrix approach to managing agricultural pollution at source. Hydrol. Earth Syst. Sci., 8, 834-845.
Whitehead, P.G., Wilson, E.J. and Butterfield, D., 1998a. A semi-distributed Nitrogen Model for Multiple Source Assessments in Catchments (INCA): Part 1 Model Structure and Process Equations. Sci. Total Environ., 210/211, 547-558.
WATSHMAN The new Watshman-system developed at IVL consists of five modules; Run-off modelling, Gross load modelling, Net load modelling, Point source data analysis and Monitoring data analysis. The modelling developed for nitrogen and phosphorous, is performed step-wise meaning that the result from the runoff calculation is used as input to the gross load calculation which is used in the net load calculation. Model calibrations are done between each step. The Point Source Data Analysis and the Monitoring Data Analysis modules can be used independently of each other and the modelling modules. The strength of the WATSHMAN system is its user-friendly interface as a tool to administrate monitoring data and modelling results of hydrology, nutrient gross load calculations and retention calculations to present source apportionment and scenario calculations. The Watshman database is based on the ArcHydro data model and extended with objects needed by Watshman for modelling etc. The ArcHydro Data Model can be defined as a geographic database containing a GIS representation of a hydrological information system under a case-specific database design which is extensible, flexible, and adaptable to the user requirements.Watshman is a module built system and not all modules are needed in order to get benefit out of the system. SWAT has been used for hydrologic modelling in some model applications in Sweden instead of the simpler SCS-model included in Watshman. A GIS/Web-application (ArcIMS /.NET) was also included in Watshman to make it possible to publish data and maps from the database on Internet. The gross load modelling uses typical mean concentrations of leakage of nutrients from different land-uses, point source loads and the modelled monthly water flow as input to the calculations. It is also possible to insert monthly water flow from other models or measured flow into the database as input to the leakage modelling. 
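The gross load step described above (typical concentrations per land use multiplied by the modelled monthly water flow, plus point sources) can be sketched as follows. This is an illustrative calculation only, not the Watshman implementation; the function name, the land-use classes and all numbers are assumptions. It relies on the unit identity that 1 mm of runoff over 1 km2 is 1000 m3 of water, so a concentration in mg/l times runoff in mm times area in km2 gives a load in kg.

```python
from typing import Dict


def gross_load_kg(runoff_mm: float,
                  landuse_area_km2: Dict[str, float],
                  typical_conc_mg_per_l: Dict[str, float],
                  point_source_kg: float = 0.0) -> float:
    """Monthly gross nutrient load (kg) for one catchment.

    mg/l * mm * km2 = kg, because 1 mm over 1 km2 equals 1e6 litres.
    """
    diffuse_kg = sum(
        typical_conc_mg_per_l[landuse] * runoff_mm * area_km2
        for landuse, area_km2 in landuse_area_km2.items()
    )
    return diffuse_kg + point_source_kg


if __name__ == "__main__":
    # Hypothetical catchment: areas in km2, typical concentrations in mg/l.
    areas = {"arable": 10.0, "forest": 30.0}
    concentrations = {"arable": 5.0, "forest": 0.5}
    print(gross_load_kg(20.0, areas, concentrations, point_source_kg=50.0))
```

Keeping the concentrations in a per-land-use table makes scenario calculations straightforward: a changed land-use distribution or a reduced point source only changes the inputs, not the calculation.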
The leakage calculations can be made more complex if detailed nutrient loading data for atmospheric deposition and agricultural areas are available. The output from this modelling is the gross load from each HRU and catchment (including load from point sources). The interfaces for the different steps are described in the coming figures:

When the modelling purpose demands more detailed hydrological results than the SCS method can provide, e.g. for detailed modelling of agricultural leakage when the effects of changed management practices are to be analysed, or when a more physically based model is needed, it is possible to incorporate more advanced models in Watshman. If ArcSWAT, for instance, is used for the model calculations, Watshman can be used solely for storing input data and for analysing and presenting the SWAT results.

Link to WATSHMAN at IVL: http://www.ivl.se/affar/miljo_it/WatshmanDemo/Start.asp

Models strength and weaknesses

The table below summarises the short model descriptions and compares the models.

[Table: the models SCS, HBV, TRK, SWAT, INCA, WATSHMAN and CE-QUAL-W2 rated Y or N against the following criteria: modest data requirements; high time resolution; high spatial resolution; process-based conceptual model; calibration data required; scenario possibilities; complete catchment model; distributed (semi or fully); time consuming; single interface; exchangeable submodels (for one model, N with the note that some process models can be exchanged); applied for national assessments; applied in Northern Europe; non-expert user; freeware. Entries marked "-" have not been verified.]
Priority substances

Models and monitoring

Monitoring the occurrence of all priority substances in all media (air, water, sediment, land and biota) is usually costly and demanding, and it is not necessarily the best or the only way to determine their environmental presence. The use of chemical fate models in combination with monitoring is a cost-efficient solution. Priority substances may have an affinity for certain media, such as sediment or biota, and may not always be detected in the water phase. The affinity of a substance for the different media is determined by its physical characteristics, chemical structure and active groups. To determine the presence of priority substances in the water environment, chemical fate models can be used, based on the chemical and physical properties of the substance, degradation rates and emission data. If experimental physical data for a new substance are not available in the literature, QSAR or quantum mechanical models can be used to estimate such data from knowledge of the active groups of the substance.

It is suggested that monitoring of the substance is performed within the WFD surveillance monitoring, as a screening programme. The screening programme is set up based on the geographic distribution of the load on the environment from known or suspected pollution sources and on the results from the chemical fate model, combined with existing knowledge on the environmental occurrence of the target compound. The pollution source can be determined from a substance flow analysis. The dispersion of the substance in the aquatic phase from a point of pollution may be modelled by a hydrologically fully distributed model such as CE-QUAL-W2 or the MIKE products.

Ecological status models

The toxic pressure from substances on an ecosystem is determined by the free water concentration. The free water concentration is not preferably quantified by monitoring for many substances, e.g.
hydrophobic substances, because of the affinity of the substance for media other than water and because of the large volume, and thereby dilution, in the water phase. Fugacity models (see below) or bioaccumulation models (e.g. AQUATOX, www.epa.gov/waterscience/models/aquatox/) are used to determine the free water concentration of a substance from the available monitoring data (sediment samples or biota samples). These models also need physical data for the substance, for example vapour pressure, to determine the partitioning of the substance between different media, together with the load of the substance. The physical data may not be available from experimental literature for new substances, but can be derived from models such as QSAR models or quantum mechanical models.

Within the EU Water Framework Directive, the targets for priority substances for a water body are set by risk levels (like AA-QS and MAC-QS). These risk levels are based on observed effects in ecotoxicological risk assessments. The ecotoxicological risk assessments are carried out by establishing the concentration-effect relation for individual chemical components and individual test species (single-species toxicity tests, measuring effects on individuals) [1-4]. However, the WFD is based on preserving or improving the ecological status of the whole water body. On the water body scale, ‘simple’ dose-effect relations between individual toxicants and individual species have to be aggregated into an evaluation of the toxic effect of chemical components for the whole water body. To resolve this incongruity between individual-based data and the complex biological entities addressed in ecological risk assessment, a model framework (OMEGA) is introduced to link individual species sensitivity distributions to risk levels on the water body scale. The total toxic pressure of substances on an ecosystem and the ecological status may be determined using the OMEGA tool under development in the REBECCA EU project.
OMEGA stands for Optimal Modelling for Ecotoxicological Assessment, predicting effects on plants, animals and populations or ecological functions. The method is based on a stepwise approach:
1. Calculation of the potentially affected fraction of species (PAF).
2. Identification of sensitive species or species groups.
3. Calculation of accumulation in food chains.
4. Calculation of effects on the development of populations.
Within REBECCA, the focus is on the first two steps within the model.

Chemical fate models

Chemical fate models are used to determine the partitioning of a substance between different media such as air, water, sediment, soil and biota. Some models include additional compartments such as urban surfaces, plants/forests or entire aquatic food webs. In their simplest form, the results from such a model can be used, for example, to obtain information on the free water concentration, and thereby the toxic pressure on the ecosystem, or on the overall environmental fate of a substance for monitoring purposes. Chemical fate models are often based on concentration or, as in this case, on fugacity. The fugacity modelling approach was introduced by Mackay et al. (Mackay, 1991 and 2001). The fugacity f, expressed in pascal [Pa], can be described as the partial pressure exerted by a substance when leaving one phase of the environment (e.g. air) for another (e.g. water), i.e. the escaping tendency of the substance from a phase. It is related to the concentration $C_i$ of the substance within a phase i (soil, sediment, water, air etc.) through $C_i = Z_i f_i$, where $Z_i$ is the fugacity capacity [mol m-3 Pa-1] of phase i. Each phase has a set of defined transport parameters, the D-values [mol Pa-1 h-1].
When combined with mass flow equations, degradation kinetics (e.g. the half-lives of the substance in the included phases) and the spatial parameterisation of a certain area, the models can give valuable information on the distribution of the substance between different phases (‘compartments’) in the environment. The models also provide information on residence time, accumulation and concentrations. In order to include the unsteady-state dynamics of substances emitted to the compartments of the model, the calculating algorithm needs to solve, for each compartment i, the differential equation

$$V_i Z_i \frac{\mathrm{d}f_i}{\mathrm{d}t} = I_i + \sum_{j \neq i} D_{ij} f_j - D_{Ti} f_i$$

where $V_i$ is the volume of compartment i, $Z_i$ its bulk fugacity capacity and $I_i$ the input rate; each term $D_{ij} f_j$ represents an intermedia input transfer and $D_{Ti} f_i$ is the total output. If initial fugacities (concentrations) are defined for each medium, the equations can be numerically integrated to give fugacities (concentrations) as a function of time, i.e. a level IV model. Since the environmental temperature of a region can vary greatly over a one-year period, the algorithms also need to include at least the temperature variations above 273 K and, where available, temperature-dependent physical input data.

Within the REBECCA EU project, IVL has developed a tool in the MATLAB Simulink software that hosts the time-resolved calculations of this complexity and displays the results. See the REBECCA toxics final report, reference no. Q3477.40, for information on all equations for calculating bulk fugacity capacities and D-values. In order to take full advantage of the time (and temperature) resolution of the level IV algorithms, it is very important that the area of interest is spatially parameterised. Highly resolved geographical data can often be retrieved from GIS-based databases (Geographic Information System). A specific graphical user interface (GUI) dedicated to fugacity modelling was developed by IVL.
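As an illustration of the time-resolved calculation, the sketch below integrates the level IV mass balance in its commonly written form, V_i Z_i df_i/dt = I_i + sum_j D_ij f_j - D_Ti f_i, with a simple explicit Euler scheme. It is not the IVL Simulink tool: the two compartments (water and sediment), all V·Z products, D-values and the emission rate are hypothetical numbers chosen only to show the mechanics, and a production model would use a more robust integrator and temperature-dependent inputs.

```python
from typing import List


def level_iv_step(f: List[float], vz: List[float], inputs: List[float],
                  d_transfer: List[List[float]], d_total: List[float],
                  dt_h: float) -> List[float]:
    """One explicit Euler step of V_i*Z_i*df_i/dt = I_i + sum_j D_ij*f_j - D_Ti*f_i."""
    n = len(f)
    new_f = []
    for i in range(n):
        # Intermedia input: d_transfer[i][j] is the D-value for transfer from j into i.
        intermedia_in = sum(d_transfer[i][j] * f[j] for j in range(n) if j != i)
        dfdt = (inputs[i] + intermedia_in - d_total[i] * f[i]) / vz[i]
        new_f.append(f[i] + dt_h * dfdt)
    return new_f


def integrate_level_iv(f0: List[float], vz: List[float], inputs: List[float],
                       d_transfer: List[List[float]], d_total: List[float],
                       dt_h: float, hours: float) -> List[float]:
    """Fugacities (Pa) after the given number of hours, starting from f0."""
    f = list(f0)
    for _ in range(int(round(hours / dt_h))):
        f = level_iv_step(f, vz, inputs, d_transfer, d_total, dt_h)
    return f


if __name__ == "__main__":
    # Hypothetical system: index 0 = water, index 1 = sediment.
    vz = [1.0e4, 1.0e3]                      # V_i * Z_i in mol/Pa
    inputs = [1.0, 0.0]                      # emission in mol/h, to water only
    d_transfer = [[0.0, 2.0], [5.0, 0.0]]    # D_ij in mol/(Pa h)
    d_total = [15.0, 3.0]                    # total loss D-values, mol/(Pa h)
    f_end = integrate_level_iv([0.0, 0.0], vz, inputs, d_transfer, d_total, 1.0, 20000)
    print(f_end)  # concentrations follow from C_i = Z_i * f_i
```

With a constant emission the fugacities approach the steady-state solution of the same balance; emission scenarios that vary in time can be represented by changing `inputs` between steps.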
In the first step, a fugacity toolbox is created within the Simulink library environment in order to enable customary drag-and-drop operations when composing the model. In areas with a highly variable climate it is desirable to include the temperature variations in the model. The MATLAB Simulink code is fully functional with respect to implementing temperature dependencies also for substance-specific properties such as log KOW(T), Pvap(T) and Sol(T). However, the general lack of data makes it difficult to use any figures other than the properties at 298 K. After an iterative procedure (usually 5-10 different emission scenarios per substance are required before a satisfactory match is reached), the best-fit model provides the free water concentration of the substance.

An advantage of the fugacity modelling approach is thus that a spatial distribution of the concentration can be calculated. Furthermore, the fugacity modelling approach yields extensive information on the concentrations of toxicants in all environmental media, not only water. This can provide valuable overall information when assessing the environmental burden associated with a certain emission scenario. When emissions change, the environmental stress is affected, which in turn has an impact on the decision-making process. The model can thus be used to predict the impact of emissions changing over time.

Mackay D, Paterson S (1991). Evaluating the multimedia fate of organic chemicals: A level III fugacity model. Environ. Sci. Technol., vol 25, pp 427-436.
Mackay D (2001). Multimedia environmental models: the fugacity approach, 2nd ed.
CRC, Boca Raton, FL, USA.

Link to level III fugacity model EQC (Mackay et al., downloadable freeware): http://www.trentu.ca/academic/aminss/envmodel/models/VBL3.html
Link to REBECCA project homepage: http://www.environment.fi/default.asp?contentid=230500&lan=EN
Link to IVL: http://www.ivl.se

QSAR models

For many substances, chronic toxicity data are not available, and for new substances physical properties may also be missing. A QSAR model is a relation between chemical structure and a property of the chemical compound. The features of a chemical structure are captured by so-called chemical descriptors, which can be of a number of different types. This section describes the methods and software used for descriptor calculation and regression modelling; the text has been captured from the toxics final report of the REBECCA EU project. All choices of software and methodology have been guided by the ambition to make the developed models easy to apply for prediction of the endpoints for new substances.

Chemical descriptors can be classified according to different properties. One possible classification is into measured and calculated descriptors. A measured descriptor is usually a physical property of the compound, e.g. a partition coefficient, refractive index or light absorption, and requires that the substance is available in a laboratory or similar. Calculated descriptors, on the other hand, do not require that the substance is isolated in the laboratory; it may not even have been synthesised, since all that is needed is the chemical structure. Calculated descriptors are further classified as zero-, one-, two- and three-dimensional, depending on how they depend on the chemical structure. Zero- and one-dimensional descriptors depend only on the number of different atoms and functional groups. Two-dimensional descriptors depend on the connectivity between atoms, while three-dimensional descriptors also depend on the conformation of the molecule.
Undoubtedly, the single most important descriptor used in QSAR is hydrophobicity, which is usually measured as the logarithm of the octanol/water partition coefficient, log KOW. Recently, more advanced calculated descriptors that account for the 3D structure of a molecule have gained increased use, especially for receptor-mediated effects such as endocrine disruption (Perkins et al. 2003). Usually only calculated descriptors are used, since many of the compounds for which it is of interest to apply QSARs are new and measured descriptors are often not available.

Descriptor calculation can be made using the software Dragon (Talete srl, Italy). Dragon requires a 3D structure as input. Often, quantum or molecular mechanics software is used for 3D optimisation of chemical structures. However, such software is often expensive, and optimisation can be very time consuming for large structures. An alternative is rule-based 3D structure estimation, which is faster and considered to be sufficiently accurate. The methods and software used for descriptor calculation are summarised as follows:
1. The CAS number is transformed to a SMILES string using information from public databases.
2. CORINA is used to transform the SMILES string to 3D mol files.
3. DRAGON is used to calculate descriptors from the mol files.

Partial least squares (PLS) regression, which is a latent variable regression method, is used. One of the main advantages of latent variable regression methods is the possibility they offer for detecting prediction outliers. It is extremely important to note that empirical models are not validated outside the domain in which they are trained, i.e. a QSAR model cannot be applied to substances that are too dissimilar to the substances in the training data. Outlier diagnostics can be used to estimate whether or not new substances are similar enough to the training data to yield reliable predictions. If not, the prediction obtained should not be trusted.
Outlier detection according to the methodology described in Furusjö et al. (2006) was used. If no a priori information about variable importance is available, so-called auto-scaling can be used, i.e. all variables are centred by subtracting their mean and scaled with the inverse of their standard deviation in the training set. Variables that do not have any variance in the training set are normally excluded from the training data, since their influence on the response cannot be determined. However, if a descriptor is constant in the training set, this is an indication that the model is not valid for compounds with other values of this variable. To detect this, the descriptor must be kept in the model. As an example, if the count of nitro groups is zero for all substances in the training set, the model should not be trusted for nitro compounds. If the descriptor containing the number of nitro groups is excluded, information that could be used to detect this fact is discarded. Cross-validation prediction errors are superior for outlier detection compared to Y residuals from fitting a model with all substances.

Validation is of utmost importance for all regression modelling, but especially for regression methods with many parameters, like latent variable regression and neural networks. Both cross-validation and test set validation should be used. This gives reliable models and good estimates of model accuracy.

Furusjö, E., Svenson, A., Rahmberg, M., Andersson, M. (2006). The importance of outlier detection and training set selection for reliable environmental QSAR predictions. Chemosphere, vol 63, pp 99-108.
REBECCA Toxics final report. EU 6th FP. Reference no. Q3477.40.

Link to information on REBECCA: http://ec.europa.eu/research/fp6/ssp/rebecca_en.htm
http://www.environment.fi (search for REBECCA)
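The auto-scaling step and the reason for keeping constant descriptors, as described in the QSAR section, can be sketched in a few lines. This is a minimal illustration and not the REBECCA implementation: the function names are assumptions, the descriptor values are made up, and the check only covers the simple case of a descriptor that is constant in the training set (for example a nitro group count of zero), not the full outlier diagnostics of a PLS model.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def fit_autoscaler(training_rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Mean and sample standard deviation of each descriptor in the training set."""
    columns = list(zip(*training_rows))
    return [mean(col) for col in columns], [stdev(col) for col in columns]


def autoscale(row: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    """Centre by the training mean and scale by the training standard deviation.

    Descriptors that are constant in the training set (std == 0) are only centred.
    """
    return [(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stds)]


def violated_constant_descriptors(row: Sequence[float], means: List[float],
                                  stds: List[float]) -> List[int]:
    """Indices of descriptors that were constant in training but differ for this substance."""
    return [j for j, (x, m, s) in enumerate(zip(row, means, stds)) if s == 0 and x != m]


if __name__ == "__main__":
    # Hypothetical descriptors per substance: [log KOW, nitro group count, ring count].
    training = [[1.0, 0, 2.0], [3.0, 0, 4.0], [5.0, 0, 6.0]]
    means, stds = fit_autoscaler(training)
    new_substance = [3.0, 1, 4.0]  # contains a nitro group, unlike the training set
    print(autoscale(new_substance, means, stds))
    print(violated_constant_descriptors(new_substance, means, stds))
```

A non-empty result from `violated_constant_descriptors` signals that the new substance lies outside the domain covered by the training data, so a prediction for it should not be trusted.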
