ARTIFICIAL NEURAL NETWORK (ANN) ESTIMATION OF SATURATED HYDRAULIC CONDUCTIVITY

Wilson Agyei Agyare¹, Soojin Park² and Paul Vlek³

¹ CSIR-Savanna Agricultural Research Institute, Tamale, Ghana
² Department of Geography, Seoul National University, Shilim-Dong, Kwanak-Gu, Seoul, Korea
³ Center for Development Research, University of Bonn, Bonn, Germany

Introduction

Saturated hydraulic conductivity (Ks), among other soil hydraulic properties, is important for initializing climate and hydrologic models. However, measuring Ks is time consuming and expensive. Past work on modelling Ks has been limited to empirical and physical relationships referred to as pedotransfer functions (PTFs) and, more recently, Artificial Neural Networks (ANN). ANNs are used as a special class of PTF because they can approximate any continuous (nonlinear) function (Pachepsky and Schaap, 2004). Terrain attributes may serve as a suitable alternative input for modelling Ks, as terrain data are fairly easy to collect compared to intensive soil sampling. The important question is: will the inclusion of terrain attributes in estimating Ks improve ANN model performance? Two further setbacks in the use of ANN are the sensitivity of the model to its inputs and the number of input parameters required to make a good estimate.

Objectives

Identify sensitive parameters among soil and terrain parameters, and the data size suitable for estimating Ks.
Estimate Ks for sites different from those of the training data.

Study area

The study was carried out at two locations in the Volta Basin of Ghana, namely Tamale (9°28’N and 0°55’W) and Ejura (7°19’N and 1°16’W) (see Agyare, 2004).

Methodology

Terrain analysis
  Point elevation data generation using differential GPS
  Digital elevation model (DEM) generation
  Terrain parameter generation from DEM

Soil sampling and analysis
  Transecting: mini-pit soil identification and profile description
  Sample depth: 0-15 cm (topsoil) and 30-45 cm (subsoil)
  Disturbed sample: particle size distribution (sand, silt and clay), organic carbon, CEC and pH
  Undisturbed sampling: saturated hydraulic conductivity (Ks) and bulk density

Artificial neural network (ANN)
  Model: Multi-Layer Perceptron (MLP) with cross validation
  Evaluation: sensitivity analysis, R² and NMSE

Statistical analysis: CV, ANOVA, R²

Results and Discussions

Sensitivity analysis of ANN

Figure 1A shows a rapid improvement in R² for both training and testing data as the most sensitive parameters are added. The increase then becomes gradual: R² for the training data reaches a plateau, whereas R² for the testing data declines as further input parameters are added. R² for the training data increases linearly with the size of the training data set, indicating an increasing ability to train the ANN (Figure 1B). For the testing data, however, R² increases at a decreasing rate. This trend indicates that beyond a certain training data size there will be no further increase in the ability to estimate Ks.

Figure 1A. Variation in R² for Ks estimation with increasing number of input parameters using ANN. (Plot not reproduced: R² against number of input parameters, 0 to 30, for training and testing data.)

Figure 1B. ANN training data size effect on R² for training and test data for estimating Ks using combined data (Tamale + Ejura). (Plot not reproduced: R² against sample data size, 0 to 1200, for training data with a linear trend line and testing data with a logarithmic trend line.)

ANN modeling with soil and terrain parameters

According to Table 1, using only terrain attributes (D) gives an R² that is significantly lower than that of the other three parameter groups, for both training and testing datasets.

Table 1. Coefficient of determination (R²) for Ks using different data groups from two sites and sampling depths as input data, with standard error in ( )

Different input data                                    Mean R² (training data)   Mean R² (test data)
All parameters (A)                                      0.60a (0.019)             0.47a (0.020)
Ten (10) most sensitive parameters (B)                  0.58a (0.015)             0.51a (0.022)
Six (6) most sensitive continuous soil parameters (C)   0.56a (0.018)             0.50a (0.021)
Using only terrain parameters (D)                       0.15b (0.023)             0.07b (0.013)
F-statistic (Significance)                              85.1 (0.00)               92.0 (0.00)

All parameters (A): profile curvature†, plan curvature, curvature, elevation (m), wetness index, upslope contribution area (m²), stream power index†, slope gradient (°), LS factor†, aspect† (°), pH, bulk density†* (g cm⁻³), organic carbon†* (%), CEC†* (cmol(+) kg⁻¹), silt†* (%), clay†* (%), sand†* (%), site (Tamale or Ejura), gravel and/or concretion, soil sampling depth (topsoil or subsoil), soil structural grade (strong), structural type (sub-angular blocky), and structural size (coarse); parameters in groups B and C are indicated by † and *, respectively.

Figure 2 shows R² by site and soil depth when Ks is estimated with a testing dataset from the same site as the training dataset or from a different site. The letters a, b and c on the graph give the Bonferroni mean separation results, and the bars mark standard errors. R² for the testing data is higher when it comes from the same site as the training data. For the topsoil at both sites, R² is significantly higher when the training and testing data are from the same site than when the testing dataset is from a different site. For the subsoil at both sites, R² was not significantly different whether or not the testing and training datasets were from the same site.

Figure 2. Comparison of R² for estimated Ks for different testing data using training data from the same site and a different site using ANN. (Plot not reproduced: R² by site and depth of sampling, i.e. Ejura topsoil, Ejura subsoil, Tamale topsoil and Tamale subsoil, for training and testing data from the same site and from different sites.)

Conclusion

Sensitive parameters are important for ANN modeling of Ks.
A large training data set (> 1000 samples) is required for good estimation of Ks using ANN.
In Ks estimation using ANN, training with a dataset from the same environment is important when the topsoil is being considered.
Inclusion of terrain parameters can improve the estimation of Ks using ANN, but terrain parameters cannot be relied upon as the sole input data.

References

1. Agyare, W.A. 2004. Soil characterization and modelling of spatial distribution of saturated hydraulic conductivity at two sites in the Volta Basin of Ghana. Ecology and Development Series, No. 17, Cuvillier Verlag, Göttingen, Germany.
2. Pachepsky, Y., Schaap, M.G. 2004. Data mining and exploration techniques. In: Pachepsky, Y. and Rawls, W.J. (Eds.), Development of pedotransfer functions in soil hydrology. Developments in Soil Science, Vol. 30, Elsevier, pp. 21-32.
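
Note on the evaluation statistics

The Methodology lists R² and NMSE as evaluation measures, but the poster does not state the formulas used. The Python sketch below assumes two common definitions: R² as one minus the ratio of the residual sum of squares to the total sum of squares, and NMSE as the mean squared error divided by the population variance of the observed values. The function names and the four Ks values are hypothetical and serve only as an illustration; they are not data from the study.

```python
from statistics import fmean, pvariance


def r_squared(observed, predicted):
    """Assumed definition: R^2 = 1 - SS_res / SS_tot."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nmse(observed, predicted):
    """Assumed definition: mean squared error / population variance of observations."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return fmean(squared_errors) / pvariance(observed)


# Hypothetical observed and ANN-estimated Ks values (illustration only).
ks_observed = [1.0, 2.0, 3.0, 4.0]
ks_estimated = [1.5, 1.5, 3.5, 3.5]

if __name__ == "__main__":
    print("R2   =", round(r_squared(ks_observed, ks_estimated), 3))
    print("NMSE =", round(nmse(ks_observed, ks_estimated), 3))
```

Under these assumed definitions NMSE equals 1 - R² when both are computed on the same data, so a low NMSE and a high R² express the same result. If R² was instead computed as the squared correlation between observed and estimated Ks, that identity does not hold in general.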