Front. Earth Sci. 2014, 8(3): 439–456
DOI 10.1007/s11707-014-0416-0

RESEARCH ARTICLE

Artificial Neural Network (ANN) and Regression Tree (CART) applications for the indirect estimation of unsaturated soil shear strength parameters

D.P. KANUNGO (✉), Shaifaly SHARMA, Anindya PAIN
Geotechnical Engineering Group, CSIR – Central Building Research Institute (CBRI), Roorkee 247667, India

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2014

Abstract The shear strength parameters of soil (cohesion and angle of internal friction) are quite essential in solving many civil engineering problems. In order to determine these parameters, laboratory tests are used. The main objective of this work is to evaluate the potential of Artificial Neural Network (ANN) and Regression Tree (CART) techniques for the indirect estimation of these parameters. Four different models, considering different combinations of 6 inputs, such as gravel %, sand %, silt %, clay %, dry density, and plasticity index, were investigated to evaluate the degree of their effects on the prediction of shear parameters. A performance evaluation was carried out using Correlation Coefficient and Root Mean Squared Error measures. It was observed that for the prediction of friction angle, the performance of both the techniques is about the same. However, for the prediction of cohesion, the ANN technique performs better than the CART technique. It was further observed that the model considering all of the 6 input soil parameters is the most appropriate model for the prediction of shear parameters. Also, connection weight and bias analyses of the best neural network (i.e., 6/2/2) were attempted using Connection Weight, Garson, and proposed Weight-bias approaches to characterize the influence of input variables on shear strength parameters.
It was observed that the Connection Weight Approach provides the best overall methodology for accurately quantifying variable importance, and should be favored over the other approaches examined in this study.

Keywords cohesion, friction angle, Artificial Neural Network, Regression Tree, Connection Weight, Weight-bias Approach

Received May 15, 2013; accepted September 25, 2013
E-mail: debi.kanungo@gmail.com

1 Introduction

Shear strength parameters are the most important engineering properties of soil. Determination of the shear strength properties of soil is of foremost importance in geotechnical investigation. Shear strength properties are further required for the determination of bearing capacity in foundation analysis. An understanding of the shear strength of the soil is critical for the design of road embankments, retaining walls, cofferdams, etc. Shear strength is the property that enables the soil to keep its equilibrium when its surface is not level or, for that matter, under any loading situation that produces shearing stresses.

Several procedures have been proposed in the literature to determine the shear strength parameters of unsaturated soil. These shear parameters can be determined in the field, in the laboratory, or both. The tests employed in the laboratory may include an unconfined compression test, a triaxial test, a laboratory vane shear test, a direct shear box test, and a direct simple shear test. In-situ tests are normally conducted to confirm the validity of the laboratory tests, and for design purposes. The in-situ tests include a field vane shear test, a standard penetration test, a cone penetration test, and piezocone and pressuremeter readings (Jain et al., 2010).
Different authors have worked considerably on the prediction of shear strength parameters of unsaturated soils, using mathematical relationships such as elliptical and hyperbolic functions (Abramento et al., 1989; Escario and Juca, 1989; Lu, 1992; Shen and Yu, 1996; Xu, 1997). Recently, soft computing techniques, such as Artificial Neural Network (ANN), Fuzzy System, Genetic Expression Programming, and others, have been used frequently to solve a wide variety of problems in geosciences and geotechnical engineering. These include the estimation of the probability of liquefaction (Youd and Gilstrap, 1999; Juang et al., 2000; Goh et al., 1994; Hanna et al., 2007), strength parameter modeling of different soils (Agrawal et al., 1994; Attoh-Okine, 2004; Goktepe et al., 2008; Kaya, 2009), hydraulic conductivity (Akbulut, 2005), predicting the angle of internal friction in soils using a hybrid genetic fuzzy system (Goktepe and Sezer, 2010), identification of compaction characteristics (Najjar and Basheer, 1996; Alavi et al., 2009), and the problem of slope stability (Ferentinou and Sakellariou, 2007; Cho, 2009). Kayadelen et al. (2009) studied the prediction of the angle of internal friction of soils using soft computing techniques.

Among all of these soft computing techniques, ANN has a wide range of applicability in cases of landslide susceptibility zonation (LSZ), prediction of debris flow, landslide movement monitoring, prediction of strength and deformation properties of rock, etc. Lu and Rosenbaum (2003) built models using artificial neural networks and grey systems for the prediction of slope stability. Neaupane and Achet (2004) used a back propagation neural network for monitoring a landslide in the higher Himalaya. ANN models (Arora et al., 2004; Gómez and Kavzoglu, 2005; Yesilnacar and Topal, 2005; Kanungo et al., 2006; Nefeslioglu et al., 2008) have been implemented for LSZ studies. Sonmez et al.
(2006) used ANN for the determination of the deformation modulus of intact rock specimens. Das and Basudhar (2008) used artificial neural networks for the prediction of the residual friction angle of clays. Tiryaki (2008) used multivariate statistics, artificial neural networks, and regression trees to predict intact rock strength and deformation properties for mechanical excavations. Baykasoğlu et al. (2008) used genetic programming for the prediction of compressive and tensile strength of Gaziantep limestone. Maji and Sitharam (2008) used the ANN model for the prediction of elastic modulus of jointed rock mass. Çanakcı et al. (2009) predicted the compressive and tensile strength of Gaziantep basalts using neural networks and gene expression programming. Rafiai and Jafari (2011) developed a new set of rock failure criteria using the ANN approach.

In the present study, indirect estimation of shear strength parameters, such as friction angle (f) and cohesion (c) of soil, under unsaturated conditions was carried out using ANN and Regression Tree (CART) approaches.

2 Data used for the study

Laboratory testing of soils from surface and subsurface areas is a fundamental element of the geotechnical investigation of a site before the design and practice of any civil engineering construction. These laboratory tests may vary from simple soil classification tests to complex strength and deformation tests. In this research, soil samples have been obtained from both surface and subsurface soil resources from 6 different states of India, namely, Himachal Pradesh, Uttar Pradesh, Bihar, Jharkhand, Orissa, and Andhra Pradesh. A series of laboratory tests were conducted to determine the engineering properties of these soil samples. All tests were performed according to Indian Standard 2720 (IS 2720: Parts IV, V and XIII). The tests performed included the grain size distribution, Atterberg limits, dry density, and direct shear test.
Soil parameters including gravel % (GP), sand % (SP), silt % (STP), clay % (CP), dry density (DD), plasticity index (PI), c, and f were measured on 115 samples. These data were taken from the unpublished reports of the Institute and utilized for this research work. The basic idea in this research is the evaluation of the capability of ANN and CART techniques to make indirect estimates of the shear strength parameters of soils under unsaturated conditions.

Statistical descriptions of the soil parameters of all 115 soil samples are given in Table 1. It can be seen from this table that the median and average values of most parameters are similar, which suggests that their statistical distributions are nearly normal; GP and c, whose medians (0 and 0.08 kg/cm2) lie well below their averages, are the main exceptions. As shown in Table 1, the measured values of c range from 0.0 to 0.7 kg/cm2, with an average value of 0.14 kg/cm2 and a median value of 0.08 kg/cm2. The f values range from 9 degrees to 40.5 degrees. The mean and median values of f are 25.4 and 26 degrees, respectively. Amongst all of the 6 soil parameters, DD has the least spatial variation and SP has the largest spatial variation.

Table 1 Basic descriptive statistics for different parameters of soil samples

Statistics           GP/%      SP/%      STP/%     CP/%      DD/(gm·cm–3)  PI        c/(kg·cm–2)  f/degree
Minimum              0         5         0         0         1.24          0         0            9
Maximum              59        97        82        48        2.04          45.22     0.7          40.5
Average              4.456522  59.56957  25.84783  10.24783  1.838817      10.23383  0.139087     25.40261
Median               0         61        23        10        1.9           11        0.08         26
Standard deviation   8.730257  23.65652  17.82655  10.6948   0.166569      9.609107  0.164424     7.200542
Number of samples    115       115       115       115       115           115       115          115

D.P. KANUNGO et al. ANN & CART for prediction of soil shear parameters

A correlation matrix was produced by applying a bivariate correlation technique to the original data set, in order to analyze the strength of the linear relationships between the variables considered in this case.
The correlation coefficient (R) values between c and f, the dependent variables, and the other 6 soil properties (GP, SP, STP, CP, DD, and PI), the independent variables, were investigated. The correlation coefficient matrix for all soil parameters, along with their significance levels, is given in Table 2. The results in Table 2 show that SP, STP, and CP have influences on c. STP and CP are significant factors contributing to c, with positive R values of 0.589 and 0.512, respectively. SP also appears to influence c, but with a negative R of –0.672. As per the significance levels of different soil parameters (refer to Table 2), GP is the only insignificant parameter with respect to c. SP, STP, CP, and PI also have influences on f. SP is the most significant soil parameter contributing to f of soil, with a positive R value of 0.661. STP, CP, and PI are related to f with negative R values of –0.651, –0.603, and –0.631, respectively. It is also observed in Table 2 that DD is the only insignificant parameter with respect to f.

3 Methodology

3.1 Artificial Neural Network (ANN)

ANNs were originally devised as models of the human brain. It was hoped that ANN could reveal useful information about the structure of the brain and the processes that occur within it. The use of ANN as a tool for exploring brain function has become increasingly widespread within cognitive psychology and neurophysiology in recent years. However, this study is primarily interested in ANN as a tool for solving mathematical problems. In particular, ANN is used to identify unknown multivariate functions from samples of data. Aspects concerning the biological validity of ANN architecture, or of a training algorithm, are only occasionally considered. Neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs, or to find patterns in data.
Table 2 Correlation matrix and significance levels for the considered data set

Correlation matrix
Parameters  GP      SP      STP     CP      DD      PI      c       f
GP          1       –0.102  –0.159  –0.327  –0.252  –0.182  –0.012  0.274
SP                  1       –0.890  –0.648  0.223   –0.679  –0.672  0.661
STP                         1       0.433   –0.275  0.531   0.589   –0.651
CP                                  1       0.183   0.779   0.512   –0.603
DD                                          1       0.006   –0.298  0.019
PI                                                  1       0.470   –0.631
c                                                           1       –0.678
f                                                                   1

Significance levels (correlation is significant at the 0.01 level)
Parameters  GP      SP      STP     CP      DD      PI      c       f
GP          .       0.277   0.090   0.000   0.007   0.052   0.900   0.003
SP                  .       0.000   0.000   0.016   0.000   0.000   0.000
STP                         .       0.000   0.003   0.000   0.000   0.000
CP                                  .       0.050   0.000   0.000   0.000
DD                                          .       0.948   0.001   0.838
PI                                                  .       0.000   0.000
c                                                           .       0.000
f                                                                   .

A set of connected units forms a neural network with the capability of nonlinear input/output approximations. If the units are grouped into layers, and all units of a layer are connected with the units of the subsequent layer, a feed forward network is developed (i.e., a static process). Each layer in a network contains a sufficient number of neurons depending on its specific application. The neurons in a layer are connected to the neurons in the next successive layer; and each connection carries a weight (Atkinson and Tatnall, 1997). The input layer receives the data from different sources. Hence, the number of neurons in the input layer depends on the number of input data sources. The hidden and output layers actively process the data. The number of hidden layers and their neurons are often determined by trial and error (Gong, 1996). The number of neurons in output layers is fixed by the application. Each hidden neuron responds to the weighted inputs it receives from the connected neurons from the preceding input layer (Lee et al., 2004). Once the combined effect on each hidden neuron is determined, the activation at this neuron is determined via a transfer function (Yesilnacar and Topal, 2005).
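The weighted-sum-then-transfer-function step can be illustrated with a minimal sketch. It assumes a log-sigmoid transfer function and one bias per neuron; the function names, the input vector, and all weights and biases are hypothetical values chosen for illustration and are not the trained values of this study.

```python
import math

def log_sigmoid(x):
    """Log-sigmoid transfer function; its output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Activations of one layer.

    weights[j][m] is the weight from unit m of the preceding layer
    to unit j of this layer; biases[j] is the bias of unit j.
    """
    outputs = []
    for w_row, b in zip(weights, biases):
        net = sum(w * x for w, x in zip(w_row, inputs)) + b  # combined weighted input
        outputs.append(log_sigmoid(net))                     # activation of unit j
    return outputs

# Hypothetical 6/2/2 network: 6 inputs, 2 hidden neurons, 2 outputs.
# All numbers below are made up for illustration only.
x = [0.05, 0.60, 0.25, 0.10, 0.90, 0.22]   # normalized GP, SP, STP, CP, DD, PI
w_hidden = [[0.1, -0.4, 0.3, 0.2, -0.1, 0.5],
            [-0.2, 0.6, -0.3, -0.5, 0.1, -0.4]]
b_hidden = [0.0, 0.1]
w_out = [[0.8, -0.7],
         [-0.6, 0.9]]
b_out = [0.0, 0.0]

hidden = layer_forward(x, w_hidden, b_hidden)
c_norm, f_norm = layer_forward(hidden, w_out, b_out)  # normalized c and f estimates
print(hidden, c_norm, f_norm)
```

Because the log-sigmoid is bounded, both outputs stay strictly between 0 and 1 whatever weights are used, which is why targets for such a network are usually scaled to that range.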
Any differentiable nonlinear function can be used as a transfer function; a sigmoid function is generally used, although many other functions exist (Schalkoff, 1997). The sigmoid function constrains the outputs of a network to a range between 0 and 1. This type of feed forward network propagates the input vector $X_m^n$ from the input layer, through one or more hidden layers, to the output layer, in one direction only. A dynamic process can be carried out by adding external feedback of delayed outputs; such networks are referred to as external recurrent networks. The relationship between the input vector ($X_m^n$) and output vector ($X_j^{n+1}$) of this element can be described as follows:

\[ X_j^{n+1} = F\left( \sum_{m} W_{jm}^{n} X_m^{n} \right), \tag{1} \]

where F(x) is the log sigmoid function $F(x) = \dfrac{1}{1 + e^{-x}}$ or another nonlinear transfer function (e.g., the tan-sigmoid function), $X_j^{n+1}$ is the output of unit j in the (n + 1)th layer, and $W_{jm}^{n}$ is a weight from unit m in the nth layer to unit j in the (n + 1)th layer, as shown in Fig. 1.

Fig. 1 A schematic description of the relationship between the input and output vectors of one neuron.

Network training is a process by which the connection weights and biases of the ANN are adapted through a continuous process of stimulation by the environment in which the network is embedded. The primary goal of training is to minimize an error function, by searching for a set of connection strengths and biases that cause the ANN to produce outputs that are equal or close to targets. The network connection strengths are adjusted in the training process, which can be executed through a number of learning algorithms based on back propagation learning (Ripley, 1996; Haykin, 1998; Zhou, 1999; Lee et al., 2004; Gómez and Kavzoglu, 2005; Yesilnacar and Topal, 2005). The most widely used back propagation algorithms are gradient descent and gradient descent with momentum. These are often too slow for the solution of practical problems. The faster algorithms use standard numerical optimizers such as conjugate gradient, quasi-Newton, and Levenberg-Marquardt approaches. In this study, the Levenberg-Marquardt algorithm (implemented as TRAINLM in MATLAB software) was used for training the neural network. The details of this algorithm can be found in Hagan and Menhaj (1994) and Hagan et al. (1996). In other words, training aims at estimating the weights and biases by minimizing an error function of the output values. The sum squared error E is taken over all P patterns in the training set, where $M_p$ is the target (measured) output for the pth pattern and $O_p$ is the network (predicted) output:

\[ E = \sum_{p=1}^{P} (M_p - O_p)^2. \tag{2} \]

The process of back propagating the error is repeated iteratively until the error is minimized to an acceptable value and the adjusted weights are obtained, which are then used to determine the network outputs. The performance of the network depends on the accuracy obtained over a set of test data. If the network is trained to an acceptable level of accuracy, then the adjusted weights are used to determine the outputs of the test data set. R and the Root Mean Square Error (RMSE) were considered for evaluating the ability of trained and tested networks in predicting shear parameters. The correlation coefficient is a measure of the accuracy of prediction of the trained network models. Higher R values indicate better prediction. In addition, the mean relative percentage error was also used to measure the accuracy of prediction. For each network, the training and testing errors/accuracies for the individual output shear strength parameters (i.e., either c or f) were calculated using Eq. (3) and Eq. (5). Accuracies for multiple output shear strength parameters (i.e., both c and f) were calculated using Eq. (4) and Eq. (6).

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{j=1}^{n} (M_j - O_j)^2 }, \tag{3} \]

\[ \mathrm{RMSE}_{\mathrm{combined}} = \sqrt{ \frac{1}{2n} \left( \sum_{c=1}^{n} (M_c - O_c)^2 + \sum_{f=1}^{n} (M_f - O_f)^2 \right) }, \tag{4} \]

\[ R = \sqrt{ 1 - \frac{ \sum_{j} (M_j - O_j)^2 }{ \sum_{j} (O_j)^2 } }, \tag{5} \]

\[ R_{\mathrm{combined}} = \frac{ \sum_{i} \sum_{j} (M_{ij} - \bar{M})(O_{ij} - \bar{O}) }{ \sqrt{ \sum_{i} \sum_{j} (M_{ij} - \bar{M})^2 \; \sum_{i} \sum_{j} (O_{ij} - \bar{O})^2 } }, \tag{6} \]

where j = number of data patterns in the independent data set (in Eq. (6) the double sum is read here as running over the output parameters, index i, and the data patterns, index j); M = actual data set; O = predicted data set; $\bar{M}$ = mean of actual data set; $\bar{O}$ = mean of predicted data set.

In this study, a feed forward back-propagation (FFBP) multi-layer ANN, with one input layer, one hidden layer, and one output layer, was used to predict the shear strength parameters of the soil. The input layer contains a maximum of 6 neurons, each representing a parameter (GP, SP, STP, CP, DD, and PI) that contributes to the shear strength of the soil. The output layer contains two neurons, representing c and f of soil under an unsaturated condition for a given set of input values. It is important to ensure that the training samples given to the network cover the range of possibilities and are representative of the frequency of occurrences of the variables. As stated above, the Levenberg-Marquardt algorithm was used for training; the log-sigmoid function was applied for neurons between the input and hidden layers, and also for neurons between the hidden and output layers. By varying the number of neurons in the hidden layer, the neural networks are run several times to identify the most appropriate neural network architecture based on training and testing accuracies.
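The error measures of this section can be sketched in a few lines. The sketch below follows Eq. (3) and Eq. (4) for the RMSE; for the correlation coefficient it uses the standard Pearson form (the single-output analogue of Eq. (6)), which is an assumption made here for illustration rather than a transcription of Eq. (5). The function names and the sample values are hypothetical.

```python
import math

def rmse(measured, predicted):
    """Root mean squared error for one output parameter, as in Eq. (3)."""
    n = len(measured)
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(measured, predicted)) / n)

def rmse_combined(meas_c, pred_c, meas_f, pred_f):
    """Combined RMSE over both outputs (c and f), as in Eq. (4)."""
    n = len(meas_c)
    squared = (sum((m - o) ** 2 for m, o in zip(meas_c, pred_c))
               + sum((m - o) ** 2 for m, o in zip(meas_f, pred_f)))
    return math.sqrt(squared / (2 * n))

def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    m_bar = sum(measured) / len(measured)
    o_bar = sum(predicted) / len(predicted)
    num = sum((m - m_bar) * (o - o_bar) for m, o in zip(measured, predicted))
    den = math.sqrt(sum((m - m_bar) ** 2 for m in measured)
                    * sum((o - o_bar) ** 2 for o in predicted))
    return num / den

# Hypothetical normalized values, for illustration only.
meas_c = [0.10, 0.20, 0.40, 0.80]
pred_c = [0.15, 0.15, 0.45, 0.75]   # every prediction is off by 0.05
meas_f = [0.50, 0.60, 0.70, 0.90]
pred_f = [0.50, 0.60, 0.70, 0.90]   # perfect predictions

print(rmse(meas_c, pred_c))                           # about 0.05
print(rmse_combined(meas_c, pred_c, meas_f, pred_f))  # about 0.0354
print(pearson_r(meas_c, pred_c))
```

Note that the combined RMSE pools the squared errors of both outputs before averaging, so it is only meaningful when c and f are expressed on comparable (for example, normalized) scales.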
The neural network processing was implemented using coding developed in MATLAB software; the computation procedure of the ANN process is given in Fig. 2. In this study, 95 samples selected in random order formed the training data, while the remaining 20 samples were used for both validation and testing.

3.2 Regression tree

In recent times, there has been increasing interest in the use of classification and regression tree (CART) analysis. CART analysis is a tree-building technique, which is different from traditional data analysis methods. Breiman et al. (1984) developed CART, which is a sophisticated program for fitting trees to data. Depending on the values of the predictor variables, a decision tree partitions the data set into regions, so that the response variable is roughly constant within each region. The attractive feature of the CART methodology is the sequence of hierarchical questions that the algorithm asks. This method is relatively simple to understand and interpret. The questions can be answered as 'yes' or 'no'; and depending on the answer, the algorithm either proceeds to another question or arrives at a fitted response value. Details about the mathematical treatment of regression tree analysis are discussed in Breiman et al. (1984).

The unique starting point of a classification tree is called a root node and consists of the entire learning data set at the top of the tree. A node is a subset of the set of variables, and it can be a terminal or non-terminal node. A non-terminal node is a node that splits into two daughter nodes (binary split). Such a binary split is determined by a condition on a single variable, where the condition is satisfied or not satisfied by the observed value of that variable. All observations in the dataset that have reached a particular node and satisfy the condition for that variable drop down to one of the daughter nodes.
The remaining observations at that node that do not satisfy the condition drop down to the other daughter node. A node that does not split is called a terminal node. In each terminal node, t, the fitted value, or the predicted response value y(t), is constant. Suppose the data points (x1,y1), (x2,y2),…,(xn,yn) are all of the samples belonging to one terminal node, say node l. Then the simple model for l is given in Eq. (7), the sample mean of the dependent variable in that cell:

\[ \bar{y}(l) = \frac{1}{N(l)} \sum_{x_n \in l} y_n, \tag{7} \]

where N(l) is the number of samples in node l. Since the predictor, $\bar{y}(l)$, is constant over each terminal node, the regression tree can be thought of as a histogram estimate of the regression surface. For every node l, $\sum_{x_n \in l} (y_n - \bar{y}(l))^2$ is the within node sum of squares, that is, the total squared deviation of the $y_n$ in l from their average. Summing over all terminal nodes ($l \in T$) gives the total within node sum of squares; and dividing by N gives the average (Eq. (8)):

\[ E = \frac{1}{N} \sum_{l \in T} \sum_{x_n \in l} (y_n - \bar{y}(l))^2. \tag{8} \]

Fig. 2 Neural network processing as implemented using coding developed in MATLAB software.

Therefore, a particular regression tree is formed by iteratively splitting nodes to maximize the decrease in E (Breiman et al., 1984). There are two methods to grow a decision tree: (a) stop recursive partitioning when the largest decrease in E would be less than some user-defined threshold value, say δ, and (b) split a node only if it contains at least P sample data points, where P is a user-defined integer (splitmin), so that nodes with fewer than P points become terminal nodes. In this study, the second method is used to grow the decision trees.

4 Implementation of different techniques for the prediction of shear strength parameters of soil

The estimation of unsaturated shear strength parameters of soils was performed using both ANN and CART techniques.
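To make the splitting criterion of Section 3.2 concrete, the following minimal sketch searches for the single binary split on one predictor that minimizes the total within-node sum of squares of the two daughter nodes, and refuses to split a node with fewer than `splitmin` points. The function names and the data (sand % against friction angle) are hypothetical; this is not the MATLAB routine used in the study.

```python
def node_sse(ys):
    """Within-node sum of squares about the node mean (the fitted value)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys, splitmin=2):
    """Best binary split 'x <= threshold' of one node on a single predictor.

    Returns (threshold, total_sse_after_split), or None when the node holds
    fewer than splitmin points or no split is possible.
    """
    if len(xs) < splitmin:
        return None
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        sse = node_sse(left) + node_sse(right)
        if best is None or sse < best[1]:
            best = (threshold, sse)
    return best

# Hypothetical data: sand % (predictor) against friction angle in degrees.
sand = [20, 30, 40, 70, 80, 90]
friction = [15, 16, 17, 30, 31, 32]

print(node_sse(friction))          # sum of squares before splitting
print(best_split(sand, friction))  # (55.0, 4.0)
```

Growing a full tree amounts to applying this search recursively to each daughter node, over all predictors, until every node is either pure or smaller than `splitmin`.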
Both of the techniques used different combinations of GP, SP, STP, CP, DD, and PI as the input parameters, and the unsaturated c and f of soil as the target outputs. Though GP is insignificant with respect to c, and DD is insignificant with respect to f as per the correlation matrix of the original data set (refer to Table 2), neither of these two parameters (GP and DD) could be excluded from the ANN models. The first reason is that this study attempts to develop a single ANN model for estimating both c and f, considering 6 input parameters (GP, SP, STP, CP, DD, and PI) in combination. Secondly, in order to cross-verify the inference of the correlation matrix against the capability of the ANN models, through the connection weight and bias analysis of input parameter importance for estimating c and f, all 6 parameters in combination are considered for the ANN models. With different combinations of these 6 input parameters, four different models were tried to establish the efficacy of different input combinations for predicting the shear strength parameters of soil. These models are as follows: (a) Model I: GP, SP, STP, and CP; (b) Model II: GP, SP, STP, CP, and DD; (c) Model III: GP, SP, STP, CP, and PI; and (d) Model IV: GP, SP, STP, CP, DD, and PI. These techniques were implemented with MATLAB software to predict the unsaturated shear strength parameters of the soil samples.

4.1 Prediction using ANN technique

A feed forward back-propagation multi-layer ANN with one input layer, one hidden layer, and one output layer was considered in the present study. The number of neurons in the input layer equals the number of input parameters considered in each model (i.e., 4 inputs in the 1st case, 5 inputs in each of the 2nd and 3rd cases, and all 6 inputs in the 4th case). The data at each neuron of the input layer correspond to the normalized value of each input.
The input parameters GP, SP, STP, and CP are normalized with respect to 100 as these represent the percent value. The input parameters DD and PI are normalized with respect to their maximum value of occurrence (i.e., 2.04 in the case of DD and 45.22 in the case of PI). The output layer consists of two different neurons, representing the experimentally determined value of unsaturated c, and f of soil as the target outputs. The values of c and f are also normalized with respect to their maximum values of occurrence (i.e., 0.7 kg/cm2 and 40.5 degrees respectively). The number of neurons in the hidden layers is varied by running the networks several times to achieve the desired training and testing data accuracies. One set each of training and testing data were randomly generated from the available data set using the stratified random sampling technique. The training dataset consists of data pertaining to 95 soil samples and the testing dataset consists of data pertaining to 20 soil samples out of a total of 115 samples. All of the data points in the datasets were mutually exclusive (Foody and Arora, 1997). The training dataset was used to train different network architectures, while the testing dataset was used simultaneously with the training dataset to control the overtraining of the network. The testing dataset was also used to evaluate the accuracy of the networks. The well-known back propagation Levenberg-Marquardt algorithm was used to train the neural networks. 40 neural network architectures were created by varying the number of neurons in the hidden layer, and then trained and tested. The training process was initiated with arbitrary initial connection weights, which were constantly updated until an acceptable level of accuracy was reached. The final adjusted weights of the trained network were used to derive outputs of the testing data to evaluate the performance of the network. 
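The normalization described above can be sketched as follows, using the divisors stated in the text (100 for the percentage inputs and the maximum observed values for DD, PI, c, and f). The function names and the sample values are hypothetical and serve only to illustrate the scaling.

```python
# Normalization constants taken from the text (maximum values of occurrence).
MAX_DD = 2.04    # dry density
MAX_PI = 45.22   # plasticity index
MAX_C = 0.7      # cohesion, kg/cm2
MAX_F = 40.5     # friction angle, degrees

def normalize_sample(gp, sp, stp, cp, dd, pi, c, f):
    """Scale one soil sample to the 0-1 range used at the network layers."""
    inputs = [gp / 100.0, sp / 100.0, stp / 100.0, cp / 100.0,
              dd / MAX_DD, pi / MAX_PI]
    targets = [c / MAX_C, f / MAX_F]
    return inputs, targets

def denormalize_outputs(c_norm, f_norm):
    """Convert normalized network outputs back to engineering units."""
    return c_norm * MAX_C, f_norm * MAX_F

# Hypothetical sample, for illustration only.
inputs, targets = normalize_sample(gp=0, sp=61, stp=23, cp=16,
                                   dd=1.9, pi=11, c=0.08, f=26)
print(inputs)
print(targets)
print(denormalize_outputs(*targets))  # recovers approximately (0.08, 26)
```

Scaling by the maximum keeps every value within the 0 to 1 output range of the log-sigmoid function, so the targets are reachable by the output neurons.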
4.2 Prediction using the regression tree technique

A number of regression trees were calculated separately for c and f, using MATLAB and employing different combinations of input parameters (GP, SP, STP, CP, DD, and PI) as predictors. The original experimentally determined values of all of the parameters were used for the analysis. Determining the right tree size is not straightforward. Therefore, 50 different trees with varying splitmin values were built for each individual combination of predictors. The performances of the trees developed in this study were assessed using two statistical performance evaluation criteria, R and RMSE, the same as those adopted in the ANN technique.

5 Results and discussion

5.1 Results of ANN technique

5.1.1 Performance evaluation of neural networks

The performance of the networks was evaluated by determining both the training and testing data accuracies in terms of R and RMSE values (Freund, 1992). The combined accuracies in terms of R and RMSE values for both of the shear parameters (c and f) were calculated with MATLAB software for each of the four models, using different combinations of input parameters (as mentioned in Section 4). The training and testing accuracies for all of the 4 models, for some selected networks out of a total of 40 neural networks with the number of neurons in the hidden layer varying from 1 to 40, are given in Table 3. A variation in both training and testing data accuracies across different models can be seen as the neural network architectures change.
The training and testing accuracies in terms of R values observed across different neural networks for different models were as follows: (a) Model I: –0.05 to 0.99 for training and 0.12 to 0.82 for testing, (b) Model II: 0.13 to 0.99 for training and –0.09 to 0.91 for testing, (c) Model III: 0.76 to 0.99 for training and 0.48 to 0.81 for testing, and (d) Model IV: 0.13 to 0.99 for training and 0.27 to 0.88 for testing. This implies that an optimal network architecture exists for a given dataset. In the present case, the network architectures 4/6/2, 5/16/2, 5/18/2, and 6/2/2 were observed to be the most appropriate ones for Models I, II, III, and IV, respectively, with the training and testing data accuracies as highlighted in Table 3 for the prediction of shear parameters of soil under unsaturated conditions.

Table 3 Combined accuracies in terms of R and RMSE for both the shear parameters (c and f) for all 4 models for some selected neural networks

ANN architecture   R (Training)   R (Testing)   RMSE (Training)   RMSE (Testing)

Model I (GP, SP, STP and CP as inputs)
4/1/2              0.904788       0.709213      0.128307          0.202459
4/6/2 *            0.950486       0.820566      0.094369          0.094369
4/7/2              0.938912       0.73586       0.103917          0.203841
4/20/2             0.983056       0.723992      0.056938          0.229616
4/24/2             0.98938        0.709713      0.045527          0.226808
4/32/2             0.984646       0.724235      0.052902          0.215475
4/40/2             0.991034       0.58296       0.040481          0.27849

Model II (GP, SP, STP, CP and DD as inputs)
5/1/2              0.894149       0.779981      0.134873          0.177807
5/3/2              0.894153       0.894153      0.134871          0.177657
5/6/2              0.933653       0.85501       0.108095          0.15316
5/16/2 *           0.978625       0.915253      0.062174          0.121163
5/19/2             0.993896       0.861629      0.033275          0.16182
5/32/2             0.995472       0.886153      0.02864           0.135251
5/40/2             0.890366       0.671108      0.282153          0.37535

Model III (GP, SP, STP, CP and PI as inputs)
5/1/2              0.887056       0.735596      0.139076          0.192906
5/7/2              0.967071       0.773272      0.076868          0.201282
5/13/2             0.98884        0.800253      0.047095          0.19212
5/18/2 *           0.988724       0.810958      0.045282          0.180246
5/21/2             0.986577       0.804543      0.04952           0.19896
5/29/2             0.993524       0.779988      0.034499          0.20616
5/40/2             0.996953       0.639055      0.023531          0.2481

Model IV (GP, SP, STP, CP, DD and PI as inputs)
6/1/2              0.900495       0.801914      0.130986          0.170389
6/2/2 *            0.939658       0.879199      0.103081          0.136468
6/5/2              0.974072       0.87192       0.069304          0.147565
6/7/2              0.98054        0.877744      0.059174          0.156894
6/15/2             0.975545       0.861556      0.066319          0.16628
6/22/2             0.996851       0.848852      0.023911          0.169109
6/40/2             0.999205       0.761987      0.012016          0.216977

* Most appropriate architecture for the model.

For the above most appropriate network architectures, the training R values across all of the models varied from 0.94 to 0.99, whereas the testing R values across these four models varied from 0.81 to 0.91. It can be inferred from these results that there exists a good correlation between training and testing results, along with the good prediction capability achieved by the neural networks with the present data set.

If the results of all of the four models are compared for the prediction of c and f, it can be observed from Table 3 that Model II, considering the 5 input parameters GP, SP, STP, CP, and DD, has given the highest accuracies (5/16/2 neural network). The training and testing R values obtained for Model II for the 5/16/2 neural network are illustrated in Fig. 3. However, in the case of Model IV, considering all of the 6 input parameters, the difference between training and testing R values (6/2/2 neural network) is the least when compared to the other three models (Kanungo et al., 2006). The training and testing R values obtained for Model IV for the 6/2/2 neural network are illustrated in Fig. 4. If several networks fit the training set equally well, then the simplest network (i.e., the network that has the smallest number of weights and biases) will, on average, give the best generalization performance (Sietsma and Dow, 1991). Hence, it can be stated from these observations that Model IV is the most appropriate model for the prediction of both the shear parameters of soil under unsaturated conditions.
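The two arguments used above to prefer Model IV (the smallest gap between training and testing R, and the smallest number of weights and biases) can be checked with a short worked calculation on the Table 3 values of the four most appropriate networks. The parameter count assumes a fully connected network with one bias per hidden and output neuron, which is an assumption made here; the variable and function names are illustrative.

```python
# (inputs, hidden neurons, outputs, training R, testing R) from Table 3.
best_networks = {
    "Model I (4/6/2)":    (4, 6, 2, 0.950486, 0.820566),
    "Model II (5/16/2)":  (5, 16, 2, 0.978625, 0.915253),
    "Model III (5/18/2)": (5, 18, 2, 0.988724, 0.810958),
    "Model IV (6/2/2)":   (6, 2, 2, 0.939658, 0.879199),
}

def n_parameters(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected one-hidden-layer network
    (assumes one bias per hidden neuron and per output neuron)."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

summary = {}
for name, (n_in, n_hid, n_out, r_train, r_test) in best_networks.items():
    summary[name] = {
        "gap": r_train - r_test,                        # training R minus testing R
        "parameters": n_parameters(n_in, n_hid, n_out),
    }

smallest_gap = min(summary, key=lambda k: summary[k]["gap"])
fewest_parameters = min(summary, key=lambda k: summary[k]["parameters"])

for name, row in summary.items():
    print(f"{name}: gap = {row['gap']:.4f}, parameters = {row['parameters']}")
print(smallest_gap, fewest_parameters)
```

Under these assumptions the 6/2/2 network has 20 adjustable parameters against 130 for the 5/16/2 network, while its training-testing gap (about 0.060) is marginally smaller than that of 5/16/2 (about 0.063).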
In order to assess the individual accuracy of each of the shear parameters (c and f) separately instead of combined (as discussed above) for all of the four models, the predicted and the target output values of both c and f were analyzed in each case for all the neural networks. The training and testing accuracies in terms of R and RMSE values for all of the 4 models for some selected networks are given in Table 4. A variation in both training and testing data accuracies, for both c and f across different models, can be seen as the neural network architectures change. The training and testing accuracies in terms of R values observed across different neural networks for different models in predicting c were as follows: (a) Model I: –0.60 to 0.98 for training and –0.60 to 0.75 for testing, (b) Model II: –0.31 to 0.99 for training and –0.34 to 0.92 for testing, (c) Model III: –0.09 to 0.99 for training and –0.32 to 0.82 for testing and (d) Model IV: –0.37 to 0.99 for training and –0.48 to 0.91 for testing (Fig. 5). Similarly, for f, the training and testing accuracies in terms of R values observed across different neural networks for different models were as follows: (a) Model I: 0.0 to 0.97 for training and –0.41 to 0.80 for testing, (b) Model II: 0.08 to 0.99 for training and –0.42 to 0.87 for testing, (c) Model III: –0.32 to 0.99 for training and –0.32 to 0.89 for testing and (d) Model IV: –0.31 to 0.99 for training and –0.60 to 0.94 for testing (Fig. 6). For the prediction of c, the network architectures 4/6/2, 5/16/2, 5/21/2, and 6/5/2 were observed to be the most appropriate ones for Models I, II, III, and IV, respectively, with the training and testing data accuracies as highlighted in Table 4. For these most appropriate network architectures, the training R values across all of the models varied from 0.90 to 0.97; whereas, the testing R values across these four models varied from 0.75 to 0.91. 
It can also be observed, from the results given in Table 4, that for the prediction of f of soil, the network architectures 4/6/2, 5/16/2, 5/27/2, and 6/8/2 were the most appropriate ones for Models I, II, III, and IV, respectively. For these most appropriate network architectures, the training R values across all four models varied from 0.84 to 0.98, whereas the testing R values across these four models varied from 0.80 to 0.92. It can be stated from these results that there exists a good correlation between training and testing results even when the individual prediction of each of the shear parameters of the soil is considered. If the results of all four models are compared for the prediction of c and f individually, Table 4 shows that Model IV, which considers all 6 input parameters, is the most appropriate model with the highest accuracies for the prediction of both of the shear parameters of soil under unsaturated conditions. In summary, the overall results indicate that, whether the training and testing data accuracies for the prediction of both shear parameters of soil (c and f) are analyzed combined or individually, the final result in terms of the most appropriate model for prediction is hardly affected: it is Model IV, which considers all 6 input parameters. However, it is more convenient to adopt the combined accuracy analysis for the prediction of both c and f, with multiple neurons in the output layer (in this case 2 neurons, for c and f), to select the best single neural network architecture.

Fig. 3 Correlation coefficients as obtained for Model II for the 5/16/2 neural network: (a) training and (b) testing.

Fig. 4 Correlation coefficients as obtained for Model IV for the 6/2/2 neural network: (a) training and (b) testing.
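The difference between individual and combined accuracy analysis can be illustrated with a short sketch. The sample values below are hypothetical and are not data from this study. Pooling both outputs into a single series is only one plausible reading of the "combined" accuracy, since the text does not state how the combined R and RMSE were formed.

```python
import math


def pearson_r(target, predicted):
    """Correlation coefficient R between target and predicted values."""
    n = len(target)
    mean_t = sum(target) / n
    mean_p = sum(predicted) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(target, predicted))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_t * var_p)


def rmse(target, predicted):
    """Root mean squared error between target and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)
    )


# Hypothetical normalized targets and network outputs (illustration only).
c_true, c_pred = [0.10, 0.35, 0.50, 0.80], [0.15, 0.30, 0.55, 0.70]
f_true, f_pred = [0.90, 0.60, 0.40, 0.20], [0.80, 0.65, 0.45, 0.25]

# Individual analysis: one (R, RMSE) pair per output neuron.
individual = {
    "c": (pearson_r(c_true, c_pred), rmse(c_true, c_pred)),
    "f": (pearson_r(f_true, f_pred), rmse(f_true, f_pred)),
}

# Combined analysis (assumed): pool the two outputs before computing R and RMSE.
combined = (
    pearson_r(c_true + f_true, c_pred + f_pred),
    rmse(c_true + f_true, c_pred + f_pred),
)

print("individual:", individual)
print("combined:", combined)
```

With equal sample counts for both outputs, the squared combined RMSE in this sketch is simply the mean of the two individual squared RMSE values, whereas the combined R has no such simple relation to the individual R values.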
Fig. 5 ANN training and testing accuracies in terms of correlation coefficients observed across different neural networks for the case of c for Model IV.

Fig. 6 ANN training and testing accuracies in terms of correlation coefficients observed across different neural networks for the case of f for Model IV.

Table 4 Individual accuracies in terms of R and RMSE values for both of the shear parameters (c and f) for all 4 models for some selected neural networks

                   c                                              f
ANN architecture   R train   R test   RMSE train   RMSE test      R train   R test   RMSE train   RMSE test
Model I (GP, SP, STP and CP as inputs)
4/1/2              0.710     0.564    0.336        1.181          0.808     0.693    0.079        0.516
4/6/2              0.896     0.752    1.730        5.821          0.844     0.798    0.038        0.923
4/7/2              0.833     0.639    1.148        2.083          0.864     0.716    0.320        1.102
4/24/2             0.985     0.670    1.551        0.044          0.958     0.494    0.000        0.414
4/28/2             0.958     0.742    2.006        2.188          0.906     0.377    0.036        0.558
4/32/2             0.968     0.694    0.784        1.208          0.954     0.693    0.007        0.702
4/40/2             0.981     0.555    0.645        0.029          0.975     0.236    0.040        1.592
Model II (GP, SP, STP, CP and DD as inputs)
5/1/2              0.712     0.714    0.036        0.870          0.727     0.719    0.020        0.435
5/6/2              0.845     0.850    0.820        0.589          0.808     0.791    0.355        0.517
5/16/2             0.950     0.897    0.740        0.624          0.943     0.871    0.118        0.717
5/22/2             0.989     0.896    0.041        1.832          0.987     0.582    0.091        0.503
5/32/2             0.988     0.924    0.090        0.046          0.991     0.660    0.049        0.052
5/33/2             0.988     0.880    0.119        0.755          0.989     0.636    0.073        0.461
5/40/2             0.993     0.681    0.179        0.038          0.079     0.411    34.165       8.704
Model III (GP, SP, STP, CP and PI as inputs)
5/1/2              0.657     0.626    0.305        0.888          0.755     0.691    0.002        0.380
5/9/2              0.851     0.510    0.304        0.971          0.888     0.792    0.231        1.025
5/13/2             0.977     0.809    1.783        0.989          0.967     0.684    0.120        0.240
5/18/2             0.975     0.808    0.587        0.180          0.968     0.603    0.076        0.698
5/21/2             0.970     0.820    0.735        1.261          0.963     0.731    0.355        0.380
5/27/2             0.989     0.680    0.422        1.735          0.985     0.886    0.017        0.469
5/40/2             0.994     0.638    0.168        0.199          0.990     0.354    0.069        0.623
Model IV (GP, SP, STP, CP, DD and PI as inputs)
6/1/2              0.719     0.751    0.008        0.969          0.765     0.744    0.011        0.536
6/2/2              0.861     0.853    0.311        0.857          0.824     0.854    0.178        0.803
6/5/2              0.959     0.908    1.582        0.658          0.899     0.679    0.214        0.082
6/8/2              0.979     0.799    0.222        1.265          0.916     0.922    0.045        0.190
6/15/2             0.963     0.839    0.410        0.872          0.902     0.812    0.046        0.700
6/22/2             0.995     0.828    0.163        0.466          0.988     0.815    0.048        0.249
6/40/2             0.999     0.728    17.029       1.422          0.997     0.791    34.165       0.358
Note: the entries highlighted in the original table are those of the most appropriate architectures named in the text (for c: 4/6/2, 5/16/2, 5/21/2, and 6/5/2; for f: 4/6/2, 5/16/2, 5/27/2, and 6/8/2).

5.1.2 Connection weight analysis

The ANN connection weights (Table 5) obtained for the most appropriate neural network for the best model (6/2/2 neural network for Model IV using all 6 input parameters in the case of combined analysis) to predict both c and f have been used to characterize the input data sources (GP, SP, STP, CP, DD, and PI) in terms of their influence on the shear strength parameters of soil (c and f). In this process, the connection weight approach (Olden and Jackson, 2002) and Garson’s Algorithm (Garson, 1991; Gevrey et al., 2003) have been used to assess the influence of each variable, in the most appropriate neural network (i.e., 6/2/2), on the prediction of c and f. Both of these approaches use only the connection weights between input-hidden and hidden-output neurons, and not the biases. The connection weight approach calculates the product of the raw input-hidden and hidden-output connection weights between each input neuron and output neuron, and sums the products across all hidden neurons (Olden and Jackson, 2002; Olden et al., 2004). Garson’s algorithm partitions hidden-output connection weights into components associated with each input neuron using the absolute values of connection weights (Garson, 1991; Gevrey et al., 2003; Olden et al., 2004). In this case, the neural network with 6/2/2 architecture (representing 6 neurons in the input layer, 2 neurons in the hidden layer and 2 neurons in the output layer) has connection weight matrices of size 6 × 2 and 2 × 2. The product of the 6 × 2 and 2 × 2 matrices, using the connection weight approach and Garson’s approach, gives a resultant matrix of size 6 × 2, which corresponds to the weights of the 6 input variables. The values of these weights are used to rank the input variables: the variable with the maximum weight is assigned rank 1 and the variable with the minimum weight rank 6. The weights and ranks corresponding to all 6 input variables for the prediction of both c and f are given in Table 6.

Table 5 Connection weights and biases for cohesion (c) and friction angle (f) in the case of a 6/2/2 neural network

                          Weights (wik)                                                                Weights (vko)                  Bias
Neuron                    Input 1 (GP)  Input 2 (SP)  Input 3 (STP)  Input 4 (CP)  Input 5 (DD)  Input 6 (PI)  Output 1 (c)  Output 2 (f)  bk
Hidden neuron 1 (k = 1)   –124.98       –5.75         –0.98          20.12         –101.27       14.18         2.34          –1.02         87.64
Hidden neuron 2 (k = 2)   1.85          –4.18         2.66           –67.58        6.05          56.99         2.94          –1.59         –2.69
Output-layer biases bo: –4.80 for c and 2.10 for f.

Table 6 Weights and ranks (in brackets) corresponding to all of the 6 input variables for the prediction of both c and f as obtained using the Connection weight approach, Garson’s approach and the Weight-bias approach

                  Connection weight approach           Garson’s approach                    Weight-bias approach
Input parameter   c                 f                  c                f                   c               f
GP                –287.569167 (6)   125.21965 (1)      0.480890716 (2)  0.480890716 (2)     –41.045 (5)     12.532 (3)
SP                –25.7740852 (3)   12.54380092 (4)    0.051503799 (5)  0.051503799 (5)     –1.7203 (4)     22.123 (2)
STP               5.51500891 (2)    –3.22028038 (5)    0.02275593 (6)   0.02275593 (6)      –118.6067 (6)   83.4911 (1)
CP                –151.67199 (4)    86.88804508 (3)    0.560407326 (1)  0.560407326 (1)     200.5407 (3)    –88.731 (4)
DD                –219.64503 (5)    94.23460104 (2)    0.422294122 (4)  0.422294122 (4)     232.2052 (2)    –105.052 (5)
PI                200.9270384 (1)   –105.211826 (6)    0.462148106 (3)  0.462148106 (3)     366.723 (1)     –171.444 (6)

Ideally, the finer fractions (CP and STP), along with the PI of the soil material, contribute more towards its c than the coarser fractions (GP and SP) along with DD. Conversely, the coarser fractions (GP and SP), along with the DD of the soil material, contribute more towards its f than the finer fractions (CP and STP) along with PI. This is depicted in the correlation matrix of the original data set (refer to Table 2) used for this study. It is observed from Table 6 that, for the prediction of c, the descending order of influence of the 6 input variables in the case of the connection weight approach is PI, STP, SP, CP, DD, and GP, whereas in the case of Garson’s approach it is CP, GP, PI, DD, SP, and STP. Similarly, for the prediction of f, the descending order of influence of the 6 input variables in the case of the connection weight approach is GP, DD, CP, SP, STP, and PI, and that in the case of Garson’s approach is CP, GP, PI, DD, SP, and STP (exactly the same as in the case of c). This result illustrates, to some extent, the critical difference between the connection weight approach and Garson’s approach that results in their differential ability to correctly identify variable importance in the neural network. The connection weight product matrix (Table 6) shows that the finer fractions (CP and STP), along with PI, have comparatively more influence than a coarser fraction (GP), along with DD, on the prediction of c, as should ideally be the case. However, there exists an ambiguity in this case: SP, being a coarser fraction of soil, exhibits more influence on the prediction of c than the finer fraction CP. Similarly, the connection weight product matrix (Table 6) also shows that the coarser fraction (GP), along with DD, has comparatively more influence than the finer fractions (CP and STP), along with PI, on the prediction of f of the soil, as should ideally be the case.
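The two weight-only computations discussed in this subsection can be sketched as follows, using the weights printed in Table 5. Because those weights are rounded to two decimals, the products agree with Table 6 only approximately. The Garson variant below sums each input's share per hidden neuron without a final normalization to unit total; this is an assumption that appears consistent with the Garson values in Table 6, which add up to about 2. All function names are hypothetical.

```python
INPUTS = ["GP", "SP", "STP", "CP", "DD", "PI"]

# Input-hidden weights from Table 5, one row per hidden neuron (as printed).
W = [
    [-124.98, -5.75, -0.98, 20.12, -101.27, 14.18],
    [1.85, -4.18, 2.66, -67.58, 6.05, 56.99],
]
# Hidden-output weights from Table 5: V[k] = (weight to c, weight to f).
V = [(2.34, -1.02), (2.94, -1.59)]


def connection_weight(o):
    """Sum over hidden neurons of raw input-hidden times hidden-output weights."""
    return [
        sum(W[k][i] * V[k][o] for k in range(len(W)))
        for i in range(len(INPUTS))
    ]


def garson_share(o):
    """Sum over hidden neurons of each input's share of |w * v| (unnormalized)."""
    scores = [0.0] * len(INPUTS)
    for k in range(len(W)):
        contrib = [abs(W[k][i] * V[k][o]) for i in range(len(INPUTS))]
        total = sum(contrib)
        for i, value in enumerate(contrib):
            scores[i] += value / total
    return scores


def rank_order(scores):
    """Input names sorted from the largest to the smallest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [INPUTS[i] for i in order]


cw_c, cw_f = connection_weight(0), connection_weight(1)
g_c, g_f = garson_share(0), garson_share(1)

print("Connection weight, c:", rank_order(cw_c))
print("Connection weight, f:", rank_order(cw_f))
print("Garson, c:", rank_order(g_c))
print("Garson, f:", rank_order(g_f))
```

In this formulation, the hidden-output weight of a given hidden neuron multiplies every input's contribution and therefore cancels when the shares are formed, so the Garson scores cannot differ between the two outputs. The connection weight products keep the sign and magnitude of the hidden-output weights and therefore do differ between c and f.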
However, a similar ambiguity exists for f: the finer fraction CP exhibits more influence on its prediction than the coarser fraction SP (Table 6). On the other hand, the inability of Garson’s algorithm to correctly depict the true order of variable importance may simply be because it uses absolute connection weights in its calculations, and fails to account for the contrasting influences of input neurons through different hidden neurons to output neurons. In this study, the Garson’s weight product matrices for both c and f are exactly the same. It is also an interesting finding that, in the case of Garson’s algorithm for weight analysis, no contrast in the order of influence of input variables is depicted for neural network architectures with multiple output neurons (as in this case, with two output neurons for c and f). In contrast, the connection weight approach uses raw connection weights, which accounts for the direction of the input-hidden-output relationship and results in the correct identification of variable contributions.

5.1.3 Connection weight-bias analysis

In the present study, an attempt was made to develop an approach that takes into account both the connection weights between input-hidden and hidden-output neurons, and the biases at hidden and output neurons (Table 5), in order to assess the influence of input variables in the most appropriate neural network (i.e., 6/2/2) on the prediction of c and f. According to Goh et al. (2005), a model equation can be established with these weights and biases as the model parameters to obtain the network output.
The mathematical equation relating the input variables and the output can be written as:

$$c_n \ \text{or}\ f_n = f_{\text{log-sig}}\left\{ b_0 + \sum_{k=1}^{h} \left[ v_{ko}\, f_{\text{log-sig}}\left( b_k + \sum_{i=1}^{m} w_{ik} X_i \right) \right] \right\}, \tag{9}$$

where $c_n$ or $f_n$ is the normalized (in the range of 0 to 1 in this case) value of c or f depending on the weights and biases; $b_0$ = bias at the output layer; $v_{ko}$ = connection weight between the kth neuron of the hidden layer and the output layer (o); $b_k$ = bias at the kth neuron of the hidden layer; $h$ = number of neurons in the hidden layer; $w_{ik}$ = connection weight between the ith input neuron and the kth neuron of the hidden layer; $X_i$ = normalized input variable i in the range [0, 1]; $m$ = total number of input variables; and $f_{\text{log-sig}}$ = log-sigmoid transfer function, $f(x) = \dfrac{1}{1 + e^{-x}}$.

Using the values of the weights and biases presented in Table 5, one can easily calculate the normalized values of unsaturated c and f respectively, which need to be denormalized with respect to their maximum values in the original data set. Taking a lead from the above Eq. (9), the following equations (Eqs. (10) and (11)) are designed to calculate the resultant matrix of size 6 × 2, which corresponds to the weights of the 6 input variables for the prediction of both c and f, using connection weights as well as biases (named the Weight-bias Approach).
$$\begin{Bmatrix} \mathrm{GP} \\ \mathrm{SP} \\ \mathrm{STP} \\ \mathrm{CP} \\ \mathrm{DD} \\ \mathrm{PI} \end{Bmatrix} = \begin{Bmatrix} b_0(c) + v_{11}(b_1 + w_{11}) + v_{21}(b_2 + w_{21}) \\ b_0(c) + v_{11}(b_1 + w_{12}) + v_{21}(b_2 + w_{22}) \\ b_0(c) + v_{11}(b_1 + w_{13}) + v_{21}(b_2 + w_{23}) \\ b_0(c) + v_{11}(b_1 + w_{14}) + v_{21}(b_2 + w_{24}) \\ b_0(c) + v_{11}(b_1 + w_{15}) + v_{21}(b_2 + w_{25}) \\ b_0(c) + v_{11}(b_1 + w_{16}) + v_{21}(b_2 + w_{26}) \end{Bmatrix}, \tag{10}$$

$$\begin{Bmatrix} \mathrm{GP} \\ \mathrm{SP} \\ \mathrm{STP} \\ \mathrm{CP} \\ \mathrm{DD} \\ \mathrm{PI} \end{Bmatrix} = \begin{Bmatrix} b_0(f) + v_{12}(b_1 + w_{11}) + v_{22}(b_2 + w_{21}) \\ b_0(f) + v_{12}(b_1 + w_{12}) + v_{22}(b_2 + w_{22}) \\ b_0(f) + v_{12}(b_1 + w_{13}) + v_{22}(b_2 + w_{23}) \\ b_0(f) + v_{12}(b_1 + w_{14}) + v_{22}(b_2 + w_{24}) \\ b_0(f) + v_{12}(b_1 + w_{15}) + v_{22}(b_2 + w_{25}) \\ b_0(f) + v_{12}(b_1 + w_{16}) + v_{22}(b_2 + w_{26}) \end{Bmatrix}. \tag{11}$$

Note that in Eqs. (10) and (11) the subscripts of w are ordered as (hidden neuron, input variable).

The weights and ranks corresponding to all of the 6 input variables for the prediction of both c and f, as obtained using the Weight-bias Approach, are also given in Table 6. Here, it is observed that for the prediction of c, the descending order of influence of the 6 input variables is PI, DD, CP, SP, GP, and STP. For the prediction of f, the descending order of influence of these variables is STP, SP, GP, CP, DD, and PI. The resultant matrix (Table 6) indicates that the finer fraction (CP), along with PI, has comparatively more influence than the coarser fractions (SP and GP) on the prediction of c, as should ideally be the case. However, there exist ambiguities, in that STP, being a finer fraction of soil, exhibits less influence on the prediction of c than the coarser fractions GP and SP. Also, DD has more influence in predicting c, which should not be the case.
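Eq. (9) and a literal reading of Eqs. (10) and (11) can be sketched in a few lines of Python. The weights are those printed in Table 5. The assignment of the bias values (87.64 and –2.69 to the hidden neurons, –4.80 and 2.10 to the outputs c and f) is an assumption based on the layout of that table, and all function names are hypothetical.

```python
import math

# Table 5 values as printed; the bias assignment is an assumption (see above).
W = [
    [-124.98, -5.75, -0.98, 20.12, -101.27, 14.18],  # hidden neuron 1
    [1.85, -4.18, 2.66, -67.58, 6.05, 56.99],        # hidden neuron 2
]
V = [(2.34, -1.02), (2.94, -1.59)]  # V[k] = (weight to c, weight to f)
B_HIDDEN = [87.64, -2.69]
B_OUT = (-4.80, 2.10)


def log_sig(x):
    """Log-sigmoid 1 / (1 + e^-x), evaluated without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(x, o):
    """Eq. (9): normalized output o (0 for c, 1 for f) for normalized inputs x."""
    hidden = [
        log_sig(B_HIDDEN[k] + sum(w * xi for w, xi in zip(W[k], x)))
        for k in range(len(W))
    ]
    return log_sig(B_OUT[o] + sum(V[k][o] * hidden[k] for k in range(len(W))))


def weight_bias(o):
    """Eqs. (10)-(11) read literally: b0 + sum_k v_ko * (b_k + w), per input."""
    return [
        B_OUT[o] + sum(V[k][o] * (B_HIDDEN[k] + W[k][i]) for k in range(len(W)))
        for i in range(len(W[0]))
    ]


def connection_weight(o):
    """Connection weight products, for comparison with weight_bias."""
    return [
        sum(V[k][o] * W[k][i] for k in range(len(W)))
        for i in range(len(W[0]))
    ]


x_mid = [0.5] * 6  # hypothetical normalized input vector
print("normalized c:", forward(x_mid, 0), "normalized f:", forward(x_mid, 1))
print("weight-bias, c:", weight_bias(0))
```

Read this way, every row of Eq. (10) equals the corresponding connection weight product plus the same constant, b0 + Σ vko·bk, so the sketch is not expected to reproduce the Weight-bias values printed in Table 6, which presumably rest on computational details not given in the text.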
It can be observed that the coarser fractions (SP and GP) have comparatively more influence than the finer fractions (CP) along with DD of the soil on the prediction of its f, as the case should be. However, there also exist ambiguities in this case, in that STP, being a finer fraction of soil, exhibits more influence on the prediction of its f than the coarser fractions GP and SP. Also, PI has more influence in predicting f, which again should not be the case. Hence, in the case of the Weight-bias Approach, it can be stated that the results to a great extent match the correlation matrix of the original data set (refer to Table 2). This clearly reflects the use of raw connection weights and biases for generating the resultant matrix, as in the case of the Connection Weight Approach.

5.1.4 Comparison of Connection Weight and Weight-bias approaches

As the results for the Connection Weight Approach, Garson’s Approach, and the proposed Weight-bias Approach are compared, it can be summarized that, for the assessment of the degree of influence of input variables on the prediction of shear strength parameters of soils under unsaturated conditions, the most appropriate weight analysis approach is the Connection Weight Approach. It was found that the Connection Weight Approach provides the best overall methodology for accurately quantifying variable importance, and should be favored over the other approaches examined in this study. The proposed Weight-bias Approach performed relatively well. Garson’s Approach showed both poor accuracy and precision, and showed no contrast in weights across multiple outputs. The most notable result of this study is that Garson’s Algorithm is the poorest performing approach.

5.2 Results of the Regression Tree technique

As in the ANN technique, experimentally determined c and f values of soils under unsaturated conditions are also used as the target outputs in the regression tree technique.
Both the experimentally determined and predicted values of c and f are compared using R and RMSE measures in the training and testing phases to assess the prediction capability of this technique. The variation in RMSE and R values across different trees with varying splitmins is shown in Fig. 7 for Model IV, with all 6 input parameters, to assess their prediction capability for both c and f of soils. The best prediction tree in each combination case is chosen on the basis of these performance measures. Tables 7 and 8 contain the statistical performance details (RMSE and R) of the best prediction trees for c and f, respectively, for the four different models. It can be inferred from the results given in both tables that, for different combinations of input parameters, the splitmin value of the best regression tree varies for the prediction of both c and f.

Fig. 7 Variation in RMSE and R values across different regression trees with varying splitmins for Model IV for: (a) cohesion and (b) friction angle.

Table 7 Training and testing accuracies for prediction of c using Regression Tree technique

                                                  RMSE/(kg·cm–2)           R
Predictors                            Splitmin    Training    Testing      Training    Testing
GP, SP, STP, and CP (Model I)         19          0.080       0.192        0.84        0.45
GP, SP, STP, CP, and DD (Model II)    12          0.056       0.165        0.93        0.67
GP, SP, STP, CP, and PI (Model III)   10          0.075       0.189        0.87        0.48
GP, SP, STP, CP, DD, and PI (Model IV) 23         0.069       0.162        0.88        0.73

Table 8 Training and testing accuracies for prediction of f using Regression Tree technique

                                                  RMSE/degree              R
Predictors                            Splitmin    Training    Testing      Training    Testing
GP, SP, STP, and CP (Model I)         11          2.709       6.078        0.92        0.64
GP, SP, STP, CP, and DD (Model II)    9           2.361       3.749        0.94        0.89
GP, SP, STP, CP, and PI (Model III)   15          2.804       5.645        0.91        0.70
GP, SP, STP, CP, DD, and PI (Model IV) 9          2.349       3.833        0.94        0.90

The results shown in Table 7 indicate that the best result for the prediction of c has been obtained using Model IV, which considers all 6 input parameters (GP, SP, STP, CP, DD, and PI). The training and testing R values are 0.88 and 0.73, respectively. Also, the RMSE values for the training and testing datasets are quite reasonable, and the difference between them is less in this case than it is using the other 3 combinations of input parameters. In cases using the other 3 combinations of input parameters, it can be inferred from the results that the prediction trees are over-trained, and could not attain generalization capability, because the difference between training and testing R is quite large.

Results in Table 8 show that, for the prediction of f, the best result was again obtained with Model IV, considering all 6 input parameters, with training and testing R values of 0.94 and 0.90, respectively. These values are comparatively higher than those in the case of prediction of c (refer to Table 7). In the case of Model II, with input parameters GP, SP, STP, CP, and DD, the prediction accuracies in terms of R values for training and testing are 0.94 and 0.89, respectively, and are of the same order as those for Model IV. Hence, it can be inferred from these results that the input parameter PI has little influence on the prediction or estimation of f. In the cases of the other two combinations of input parameters, it can be stated that the regression trees were over-trained, and did not attain generalization capability, because the difference between training and testing R values is quite large.
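The role of the splitmin setting can be illustrated with a very small regression tree written in plain Python. This is a sketch under stated assumptions, not the implementation used in the study: splitmin is taken here to mean the minimum number of samples a node must hold before it may be split, the split criterion is the sum of squared errors, and the data and all function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, splitmin):
    """Grow a regression tree; a node is split only if it holds >= splitmin samples."""
    node = {"value": mean(y)}
    if len(y) < splitmin or sse(y) == 0.0:
        return node
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest one,
        # so that both child nodes are non-empty.
        for threshold in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= threshold]
            right = [i for i, row in enumerate(X) if row[j] > threshold]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left, right)
    if best is None:  # all predictors constant in this node
        return node
    _, j, threshold, left, right = best
    node["feature"], node["threshold"] = j, threshold
    node["left"] = build_tree([X[i] for i in left], [y[i] for i in left], splitmin)
    node["right"] = build_tree([X[i] for i in right], [y[i] for i in right], splitmin)
    return node


def predict(node, row):
    while "feature" in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


def count_leaves(node):
    if "feature" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


# Hypothetical one-predictor data set (illustration only).
X = [[float(i)] for i in range(12)]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.3, 2.9, 3.1, 8.0, 8.2, 7.9, 8.1]

deep = build_tree(X, y, splitmin=2)
shallow = build_tree(X, y, splitmin=8)
print("leaves with splitmin=2:", count_leaves(deep))
print("leaves with splitmin=8:", count_leaves(shallow))
```

A small splitmin lets the tree grow until it reproduces every training value, which is the over-training behaviour described above, whereas a larger splitmin stops the growth earlier and yields a simpler tree.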
Hence, it can be summarized from the above results that the regression tree analysis in the case of Model IV, considering all 6 predictors (GP, SP, STP, CP, DD, and PI), gives the most appropriate result for the prediction of both c and f. The most appropriate regression trees obtained for c and f for Model IV, using all 6 input parameters as predictors, are given in Fig. 8. The experimentally determined and predicted c and f values for the testing dataset of 20 soil samples in the case of Model IV are represented in Fig. 9. This figure clearly shows a good correlation between the experimentally determined and predicted values of both shear parameters of soil.

Fig. 8 Most appropriate regression tree in the case of Model IV for predicting (a) cohesion and (b) angle of internal friction.

Fig. 9 Experimentally determined and predicted values for the testing dataset in the case of Model IV using the regression tree technique: (a) cohesion and (b) angle of friction.

6 Conclusions

In this research, ANN and CART techniques for the prediction of shear strength parameters (c and f) of soils under unsaturated conditions were investigated. The performance evaluation of these techniques was carried out using R and RMSE measures. In this work, four different models were adopted, considering different combinations of 6 input parameters of soil, namely GP, SP, STP, CP, DD, and PI. For all four models, with varied combinations of input parameters, both ANN and CART techniques were attempted and evaluated. The results indicate that amongst these models, Model IV, considering all 6 input parameters, is the most appropriate model for the prediction of shear parameters with both ANN and CART techniques.
In the case of the ANN technique, the 6/2/2 neural network architecture gave the best training and testing accuracy, with combined R values of 0.939 and 0.879, respectively, and combined RMSE values of 0.103 and 0.136, respectively, for the prediction of both c and f. Similarly, using regression tree analysis, it was found that, for Model IV, the regression tree with a splitmin of 23 obtained the best training and testing accuracy, with R values of 0.88 and 0.73, respectively, and RMSE values of 0.069 and 0.162, respectively, for the prediction of soil c. For the prediction of f, the regression tree with a splitmin of 9 gave the best training and testing accuracy, with R values of 0.94 and 0.90, respectively, and RMSE values of 2.349 and 3.833, respectively. With regression trees like those generated in this study, having many branches, there is the possibility of overfitting the tree to the training data set, so that it loses its generalization capability. Keeping this in view, simpler regression trees have been tried in this case, and results with considerably good accuracies have been obtained. From the results obtained using ANN and CART techniques, it can be observed that for the prediction of f, the performances of both techniques are of the same order, which is indicated by the similar R values for both training and testing samples. However, for the prediction of c, the ANN technique performs better than the regression tree technique. Hence, it can be concluded from these results that the ANN technique, with the Levenberg-Marquardt learning rule, is a comparatively better approach than the CART technique for the prediction of both of the shear parameters of the soils.
Further, the performance evaluation of the ANN technique, for the indirect estimation of both of the shear parameters (c and f) of soils, has also been attempted on an individual shear parameter basis, to assess and compare the combined prediction capability vis-à-vis the individual prediction capability of the ANN technique. It was observed from the results that the individual or combined performance evaluation of both training and testing datasets did not affect the overall result in selecting the most appropriate model (Model IV, considering all 6 input parameters in this case) for the prediction of both of the shear parameters of soil (c and f). However, it does have effects in terms of the R values, and also in terms of the most appropriate neural network architecture. So, it can finally be stated that a combined performance evaluation is convenient to use in order to select the best single neural network architecture for the prediction of both c and f. Connection weight and bias analyses of the best neural network (i.e., 6/2/2) were attempted using the Connection Weight Approach, Garson’s Approach, and a proposed Weight-bias Approach to characterize the input variables in terms of their influence on the shear strength parameters of soil (c and f). It can be summarized that the Connection Weight Approach appeared to be the most appropriate and accurate weight analysis approach for quantifying variable importance. This method should be favored over the other approaches examined in this study. The Connection Weight Approach uses raw connection weights, which accounts for the direction of the input-hidden-output relationship, and results in the correct identification of variable contribution. The proposed Weight-bias Approach performed relatively well, and the results were comparable with the Connection Weight Approach results.
The most notable result of this analysis was that Garson’s Algorithm was the poorest performing approach, and no contrast in the order of influence of the input variables was depicted in cases of neural network architecture with multiple output neurons (as in this case, with two output neurons for c and f).

Acknowledgements The authors are grateful to the Director, CSIR-CBRI for his kind permission to publish this work. We also thank the editors and two anonymous reviewers for their valuable comments, which helped to improve the quality of the paper.

References
Agrawal G, Weeraratne S, Khilnani K (1994). Estimating clay liner and cover permeability using computational neural networks. In: Proceedings of the 1st Congress on Computing in Civil Engineering, Washington Akbulut S (2005). Artificial neural networks for predicting the hydraulic conductivity of coarse grained soils. Eurasian Soil Sci, 38: 392–398 Alavi A H, Gandomi A H, Gandomi M, Sadat H S S (2009). Prediction of maximum dry density and optimum moisture content of stabilized soil using RBF neural networks. IES Journal Part A: Civil & Structural Engineering, 2(2): 98–106 Arora M K, Das Gupta A S, Gupta R P (2004). An artificial neural network approach for landslide hazard zonation in the Bhagirathi (Ganga) Valley, Himalayas. Int J Remote Sens, 25(3): 559–572 Atkinson P M, Tatnall A R L (1997). Neural networks in remote sensing. Int J Remote Sens, 18(4): 699–709 Attoh-Okine N O (2004). Application of genetic-based neural network to lateritic soil strength modeling. Construct Build Mater, 18(8): 619–623 Baykasoğlu A, Güllü H, Çanakçı H, Özbakır L (2008). Prediction of compressive and tensile strength of limestone via genetic programming. Expert Syst Appl, 35(1–2): 111–123 Breiman L, Friedman J H, Olshen R A, Stone C J (1984). Classification and Regression Trees. New York: Chapman and Hall Ltd/CRC Çanakcı H, Baykasoglu A, Gullu H (2009).
Prediction of compressive and tensile strength of Gaziantep basalts via neural networks and gene expression programming. Neural Comput Appl, 18(8): 1031–1041 Cho S E (2009). Probabilistic stability analyses of slopes using the ANN-based response surface. Comput Geotech, 36(5): 787–797 Das S K, Basudhar P K (2008). Prediction of residual friction angle of clays using artificial neural network. Eng Geol, 100(3–4): 142–145 Escario V, Juca J E T (1989). Strength and deformation of partly saturated soils. In: Proceedings 12th International Conf. Soil Mech. Foundation Engineering. Rio de Janeiro, 1: 43–46 Ferentinou M D, Sakellariou M G (2007). Computational intelligence tools for the prediction of slope performance. Comput Geotech, 34(5): 362–384 Foody G M, Arora M K (1997). An evaluation of some factors affecting the accuracy of classification by an artificial neural network. Int J Remote Sens, 18(4): 799–810 Freund J E (1992). Mathematical Statistics (5th edition). New Delhi: Prentice-Hall of India Pvt. Ltd., 658 Garson G D (1991). Interpreting neural-network connection weights. Artif Intell Expert, 6: 47–51 Gevrey M, Dimopoulos I, Lek S (2003). Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol Model, 160: 249–264 Goh A T C, Kulhawy F H, Asce F, Chua C G (1994). Seismic liquefaction potential assessed by neural networks. J Geotech Geoenviron Eng, 120(9): 1467–1480 Goh A T C, Kulhawy F H, Chua C G (2005). Bayesian neural network analysis of undrained side resistance of drilled shafts. J Geotech Geoenviron Eng, 131(1): 84–93 Goktepe A B, Altun S, Altintas G, Tan O (2008). Shear strength estimation of plastic clays with statistical and neural approaches. Build Environ, 43(5): 849–860 Goktepe A B, Sezer A (2010). Effect of particle shape on density and permeability of sands. Proceedings of the Institution of Civil Engineers, Geotechnical Engineering, 163: 307–320 Gómez H, Kavzoglu T (2005).
Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng Geol, 78(1–2): 11–27 Gong P (1996). Integrated analysis of spatial data from multiple sources: using evidential reasoning and artificial neural network techniques for geological mapping. Photogrammetric Engineering & Remote Sensing, 62(5): 513–523 Hagan M T, Demuth H B, Beale M H (1996). Neural Network Design. Boston: PWS Publishing, 730 Hagan M T, Menhaj M B (1994). Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw, 5(6): 989–993 Hanna A M, Ural D, Saygili G (2007). Neural network model for liquefaction potential in soil deposits using Turkey and Taiwan earthquake data. Soil Dyn Earthquake Eng, 27(6): 521–540 Haykin S (1998). Neural Networks: A Comprehensive Foundation. New Jersey: Prentice Hall, 842 IS 2720 (Part IV) (1985). Indian Standard for Grain Size Analysis (2nd Revision). New Delhi: Bureau of Indian Standards, 73–94 IS 2720 (Part V) (1985). Indian Standard for Determination of Liquid and Plastic Limit (2nd Revision). New Delhi: Bureau of Indian Standards, 109–114 IS 2720 (Part XIII) (1986). Indian Standard for Direct Shear Test (2nd Revision). New Delhi: Bureau of Indian Standards, 195–198 Jain V, Seung H S, Turaga S C (2010). Machines that learn to segment images: a crucial technology for connectomics. Curr Opin Neurobiol, 20(5): 653–666 Juang C H, Chen C J, Jiang T, Andrus R D (2000). Risk-based liquefaction potential evaluation using standard penetration tests. Can Geotech J, 37(6): 1195–1208 Kanungo D P, Arora M K, Sarkar S, Gupta R P (2006). A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedure for landslide susceptibility zonation in Darjeeling Himalayas. Eng Geol, 85(3–4): 347–366 Kaya A (2009). Residual and fully softened strength evaluation of soils using artificial neural networks. 
Geotech Geol Eng, 27(2): 281–288
Kayadelen C, Günaydın O, Fener M, Demir A, Özvan A (2009). Modeling of the angle of shearing resistance of soils using soft computing systems. Expert Systems with Applications, 36: 11814–11826
Lee S, Ryu J H, Won J S, Park H J (2004). Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Eng Geol, 71(3–4): 289–302
Lu P, Rosenbaum M S (2003). Artificial neural networks and grey systems for the prediction of slope stability. Nat Hazards, 30(3): 383–398
Lu Z (1992). The relationship of shear strength to swelling pressure for unsaturated soils. Chinese Journal of Geotechnical Engineering, 14(3): 1–8 (in Chinese)
Maji V B, Sitharam T G (2008). Prediction of elastic modulus of jointed rock mass using artificial neural networks. Geotech Geol Eng, 26(4): 443–452
Najjar Y M, Basheer I A (1996). Discussion of stress-strain modeling of sands using artificial neural networks. J Geotech Eng, 122(11): 949–951
Neaupane K M, Achet S H (2004). Use of backpropagation neural network for landslide monitoring: a case study in the higher Himalaya. Eng Geol, 74(3–4): 213–226
Nefeslioglu H A, Duman T Y, Durmaz S (2008). Landslide susceptibility mapping for a part of tectonic Kelkit Valley (Eastern Black Sea region of Turkey). Geomorphology, 94(3–4): 401–418
Olden J D, Jackson D A (2002). Illuminating the “black box”: understanding variable contributions in artificial neural networks. Ecol Model, 154: 135–150
Olden J D, Joy M K, Death R G (2004). An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data. Ecol Model, 178(3–4): 389–397
Rafiai H, Jafari A (2011). Artificial neural networks as a basis for new generation of rock failure criteria. Int J Rock Mech Min Sci, 48(7): 1153–1159
Ripley B (1996). Pattern Recognition and Neural Networks.
Cambridge: Cambridge University Press, 403
Schalkoff R J (1997). Artificial Neural Networks. New York: Wiley, 422
Shen Z, Yu S (1996). The problems in the present studies on mechanics of unsaturated soils. In: Proceedings of the Symposium on Geotechnical Aspects of Regional Soils. Beijing: Atomic Energy Press (in Chinese)
Sietsma J, Dow R J F (1991). Creating artificial neural networks that generalize. Neural Netw, 4(1): 67–79
Sonmez H, Gokceoglu C, Nefeslioglu H A, Kayabasi A (2006). Estimation of rock modulus: for intact rocks with an artificial neural network and for rock masses with a new empirical equation. Eng Geol, 43: 224–235
Tiryaki B (2008). Predicting intact rock strength for mechanical excavation using multivariate statistics, artificial neural networks, and regression trees. Eng Geol, 99(1–2): 51–60
Xu Y (1997). Mechanical Properties of Unsaturated Expansive Soils and Its Application to Engineering. Dissertation for Ph.D. degree. Nanjing: Hohai University (in Chinese)
Yesilnacar E, Topal T (2005). Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng Geol, 79(3–4): 251–266
Youd T L, Gilstrap S D (1999). Liquefaction and deformation of silty and fine-grained soils. In: Proceedings of the 2nd International Conference on Earthquake Geotechnical Engineering, 3: 1013–1020
Zhou W (1999). Verification of the nonparametric characteristics of back propagation neural networks for image classification. IEEE Transactions on Geoscience and Remote Sensing, 37: 771–779