NEURAL NETWORKS NON-LINEAR SCALING

Alok Bhaskar Nakate
B.E., Pune University, India, 2006

PROJECT

Submitted in partial satisfaction of the requirements for the degree of
MASTER OF SCIENCE in COMPUTER SCIENCE
at
CALIFORNIA STATE UNIVERSITY, SACRAMENTO

SPRING 2011

NEURAL NETWORKS NON-LINEAR SCALING
A Project by Alok Bhaskar Nakate

Approved by:
__________________________________, Committee Chair
V. Scott Gordon, Ph.D.
__________________________________, Second Reader
Kwai Ting Lan, Ph.D.
____________________________ Date

Student: Alok Bhaskar Nakate

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and that credit is to be awarded for the project.

__________________________, Graduate Coordinator
Nikrouz Faroughi, Ph.D.
Department of Computer Science
________________ Date

Abstract of NEURAL NETWORKS NON-LINEAR SCALING
by Alok Bhaskar Nakate

Training a neural network with the backpropagation algorithm is a systematic process for modeling a set of given data. This training process involves, among other things, scaling the input and output datasets provided to the neural network. Scaling is required because real-world problem datasets are often not in the range [0, 1], whereas neural networks work only with data in that range, i.e. the neurons either fire or they do not. Linear scaling is typically used for this, but for certain datasets it can make it difficult for the neural network to properly differentiate between values that are close together. This project shows how non-linear median scaling can be applied to the training datasets and compares its performance against the linear scaling methodology for a variety of training datasets, both for speed of learning and subsequent ability to generalize.
This project demonstrates that after introducing non-linear scaling into backpropagation and applying it to the various datasets, there is an improvement in performance to some extent.

_______________________, Committee Chair
V. Scott Gordon, Ph.D.
_______________________ Date

ACKNOWLEDGEMENTS

I would like to thank Professor Scott Gordon, who helped me choose the topic for this project and spent countless hours guiding me through it. I would also like to thank him for his valuable advice, and for reviewing the project report and providing suggestions. I would also like to thank Professor Kwai Ting Lan for reviewing this project report. Finally, I would like to thank my numerous friends who endured this long process with me, always offering support.

TABLE OF CONTENTS

Acknowledgements
List of Tables
List of Figures

Chapter
1. INTRODUCTION
2. BACKGROUND
   2.1. Neural Networks
   2.2. Artificial neuron
   2.3. Backpropagation algorithm
      2.3.1. Criteria
      2.3.2. Learning rate
      2.3.3. Generalization
   2.4. Linear scaling
      2.4.1. Drawback of the linear scaling
3. PROPOSED SOLUTION
   3.1. Scaling in backpropagation
      3.1.1. Initialization
      3.1.2. Scaling
      3.1.3. Training
      3.1.4. Un-scaling process
   3.2. Which Neural Network elements are scaled?
   3.3. Scaling range
4. EXPERIMENTAL METHODOLOGY
   4.1. Datasets
      4.1.1. Minimum weight steel beam problem
      4.1.2. Two-Dimensional projectile motion datasets
      4.1.3. Wine recognition data
   4.2. Measuring performance
5. RESULTS
   5.1. Result for minimum weight problem datasets
   5.2. Result for two-dimensional projectile motion dataset
   5.3. Result for wine recognition data
6. CONCLUSION
7. FUTURE WORK
Appendix A Source Code
Appendix B Datasets
Appendix C Results
Bibliography

LIST OF TABLES

Table 1 Linear scaling example
Table 2 Linear scaling with large difference in datasets
Table 3 Training datasets for steel beam design problem
Table 4 Testing datasets for steel beam design problem
Table 5 Example training datasets for 2-D projectile data motion
Table 6 Example testing datasets for 2-D projectile data motion
Table 7 Example training datasets for wine recognition
Table 8 Example testing datasets for wine recognition
Table 9 Result for minimum weight problem
Table 10 Result for 2-D projectile motion problem
Table 11 Result for wine recognition data

LIST OF FIGURES

Figure 1 Artificial neural network
Figure 2 Artificial neuron with activation function F
Figure 3 Sigmoidal activation function
Figure 4 Backpropagation algorithm
Figure 5 Neural network with large difference in input datasets
Figure 6 Formula for non-linear scaling
Figure 7 Formula for linear scaling
Figure 8 Formula for non-linear scaling in detail
Figure 9 Formula for scaling criteria value
Figure 10 Formula for non-linear unscaling
Figure 11 Formula for linear unscaling

Chapter 1
INTRODUCTION

Neural networks are biologically inspired and often capable of modeling complex real-world functions. Artificial neural networks are composed of elements that perform in a manner that resembles a set of biological neurons, organized in a way that is inspired by the anatomy of the brain [1]. An interesting characteristic of neural networks is that they can learn their behavior by example, i.e. when a certain set of inputs is applied, they self-adjust to produce responses consistent with the desired outputs. When a set of training data is applied to a neural network, it learns that particular problem; afterwards, when the trained network encounters unknown inputs arising from similar circumstances, it can often respond appropriately. This characteristic of neural networks is known as generalization [4]. To do so effectively, they need to be trained initially with training datasets that include the inputs and the corresponding desired outputs.
The neural network then tries to minimize the error between the desired output specified in the training data and the actual output produced by the network. This is often a slow process, so researchers are interested in finding fast training methods. One of the well-known methods of training a neural network is backpropagation. Backpropagation is a systematic method for training multilayer neural networks. When a set of inputs is applied to the neural network, the backpropagation algorithm adjusts the weights based on the resulting error.

Backpropagation usually performs scaling of input (optional) and output values. The reason for scaling the training datasets is that artificial neurons only output in the range [0, 1], while real-world problems often include data outside this range. Backpropagation typically uses linear scaling before the algorithm starts its training. However, this linear scaling approach often fails when the input or output values are clustered and not adequately distributed: if all the datasets are scaled linearly, the result can be training data that is clustered too closely together for the network to model. Therefore, this project introduces a new approach of non-linear scaling and tests this method on a variety of problems with different training datasets.

Chapter 2
BACKGROUND

2.1. Neural Networks

Artificial neural networks are inspired by the biological neurons that reside in the brain's central nervous system. A neural network has the capability to solve real-world problems by building a computational model and processing the information provided to it. It can be called an adaptive system that changes its structure based on the information that flows through the network during learning. Learning implies that neural networks are capable of changing their input/output behavior as a result of changes in the environment. Neural networks learn by example.
Training sets are provided to the neural network, and then, through the use of training algorithms like backpropagation, it learns to replicate them. Neural networks operate in three scenarios, which are as follows:

1. During the training process, datasets including both the inputs and the desired outputs are provided. The network then adjusts the weights with the help of the algorithm (backpropagation) so that it reaches the desired output. This process may require many iterations. This is also known as the correct/known behavior: the neural network learns from this process when provided with an appropriate training dataset producing the correct output.

2. After training, test datasets are applied to the neural network to test whether the designed network was successful in learning. The test datasets also include inputs and the desired outputs. The only difference from the first step is that instead of iterating through the loop to adjust the weights, the network uses the weights learned above to try to reach the desired output for similar untrained cases.

3. In the third scenario, only input datasets are provided to the designed neural network, with the expectation that it will produce the desired output since it has learned from the above steps. This scenario can be a real-life example, for instance deploying the trained neural network in a field where the desired outputs are unknown and the only data available is input data.

The main advantage of neural networks lies in their ability to represent both linear and non-linear relationships, and to learn these relationships directly from the data being modeled. The basic artificial neural network is shown in Figure 1 below. It consists of four main components:

1. The input layer: the artificial neurons that accept the inputs given to the neural network.
2. The output layer: all the results are produced at this layer.
3. One or more hidden layers: the computations occur here, as described in section 2.2.
4. The weights: the values at each link between the nodes. Weights get adjusted in the network according to the error calculations.

Figure 1: Artificial neural network

In the traditional computational (non-neural-network) approach, complex problems are solved by "divide and conquer": a complex problem is decomposed into simpler or smaller elements so that it is easier to understand, and these partially solved simple elements are then gathered together to produce the complex system. In neural networks, however, the solution is distributed among the neurons. Each neuron computes and produces output and passes it on to another neuron or to the output layer. Often, neural networks can modify their behavior in response to changes in the environment, which is similar to the working of the brain.

Some advantages of neural networks are:

1. A trained neural network can become an "expert" in the category of information it has learned.
2. Adaptive learning: they have the ability to learn how to do tasks based on the data given for training or initial experience.
3. Self-organization: they can create their own representation of the information they receive during the learning process.
4. Flexibility across a large number of domains.

2.2. Artificial neuron

The artificial neuron is an abstraction of the first-order characteristics of the biological neuron, and represents a mathematical unit in the neural network model. We can say that neurons form the computational model inspired by the natural biological neuron. Figure 2 below shows the neuron used as the fundamental building block for the backpropagation algorithm. It accepts input from the datasets at the input layer or from other neurons, multiplies each input by its corresponding weight, and adds the products, producing a weighted sum.
This weighted sum of products, also called NET, must be calculated for each neuron in the network. After NET is calculated, it is passed on to the activation function to determine to what degree the neuron fires, thereby producing the signal OUT [1].

Figure 2: Artificial neuron with activation function F (inputs X1, X2, X3; weights W1, W2, W3; weighted sum NET = XW; threshold)

As shown in figure 2, X1, X2 and X3 are the inputs given to the neuron, and W1, W2 and W3 are the weights applied initially to the inputs, which are later adjusted. The summation block accepts these inputs and weights, calculates the weighted sum NET, and passes it on to the activation function: each input is multiplied by its corresponding weight, and all the weighted inputs are summed to determine the activation level of that neuron. In figure 2, the "F" block is the activation function, which processes the NET signal to produce the neuron's output signal, OUT:

OUT = F(NET)

In some neural network models, F may be a threshold function:

OUT = 1 if NET > T
OUT = 0 otherwise

Here, T is a constant threshold value, or a function that more accurately simulates the non-linear characteristic of the biological neuron and permits more general network functions [1]. However, it is more common to use a continuous function instead of a hard threshold. The F processing block compresses the range of NET so that OUT never exceeds some low limit regardless of the value of NET, and this block is therefore sometimes known as the squashing function. The squashing function is often chosen to be a logistic function, or sigmoid (S-shaped), mathematically represented as:

OUT = F(NET) = 1 / (1 + e^(-NET))

As shown in figure 3, the sigmoid squashing function compresses the range of NET so that OUT lies between zero and one.
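As an illustration, the NET computation and sigmoid squashing described above can be sketched in a few lines (a minimal sketch in Python; the function and variable names are illustrative and not taken from the project's source code in Appendix A):

```python
import math

def neuron_output(inputs, weights):
    """Compute OUT = F(NET) for a single artificial neuron.

    NET is the weighted sum of the inputs, and F is the logistic
    (sigmoid) squashing function OUT = 1 / (1 + e^(-NET)).
    """
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-net))

# The sigmoid keeps OUT strictly between 0 and 1 regardless of NET.
# Here NET = 1.0*0.5 + 2.0*(-0.25) + 3.0*0.1 = 0.3
print(neuron_output([1.0, 2.0, 3.0], [0.5, -0.25, 0.1]))  # ≈ 0.574
```

Note that for NET = 0 the sigmoid outputs exactly 0.5, and for large positive or negative NET it approaches, but never reaches, 1 or 0.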
This is why the calculated output of a neural network can lie only between zero and one. The advantage of using the logistic function rather than a hard threshold is that the logistic function is differentiable, which led to the derivation of the backpropagation algorithm [1].

Figure 3: Sigmoidal activation function

In biological terms, the inputs of artificial neurons resemble the synapses of a biological neuron. Each input is multiplied by a weight, which resembles the strength of the respective signal in the natural neuron, and the result is then processed by the mathematical function that implements the activation. The weights of the artificial neurons can be adjusted to obtain the desired output for the specific inputs provided, so depending on the adjustment of the weights, the summation in the computational block will differ. Normally, an artificial neural network is not composed of a single neuron: typical neural networks can have as few as half a dozen neurons, or as many as hundreds or thousands in complicated applications. There are various algorithms that adjust the weights according to the desired output, and this process is called training an artificial neural network. The backpropagation algorithm is one of the popular training methods.

2.3. Backpropagation algorithm

Backpropagation is a systematic method for training multilayer artificial neural networks. It is a supervised learning method, which adjusts the weights when the calculated result is compared against the result expected in the training datasets. The difference between the produced output and the desired output, known as the error, is calculated. The algorithm then uses this information to modify the weights and train the network to reach the expected result as quickly as possible.
So, we can say that when a set of datasets is provided to a neural network, it systematically trains itself so as to respond intelligently to the inputs. The backpropagation algorithm is mainly applied to feed-forward neural networks: as the name suggests, the network sends its calculations forward and the errors are propagated backwards. The backpropagation algorithm follows an iterative process, i.e. in each iteration the weights of the nodes are modified using data from the training datasets. The main objective of the backpropagation algorithm is to reduce the error — the difference between the actual result calculated by the network and the desired output — by adjusting the weights throughout the entire network. Often, the data provided to the neural network are scattered; hence, before backpropagation executes the learning process, these scattered data are scaled to some desired range. The objective of training the neural network using backpropagation is to adjust the weights so that the application of a set of inputs produces the desired set of outputs. The input-output sets are often referred to as vectors. The training process assumes that each input vector is paired with a target vector, or desired output; together these are called a training pair [1]. The neural network is trained over a number of training pairs. The backpropagation process is shown in figure 4.

Initialize the weights in the network to small random numbers
Do
  For each training pair (di, do) in the training sets
    // di is the desired input and do is the desired output
    // this is the feed-forward pass
    O = Neural_network_outputs(network, di)
    Calculate the error (do - O) at the output layer for each output in the training pair
    Compute delta for all weights from the hidden layer to the output layer
    Compute delta for all weights from the input layer to the hidden layer
    // the above two steps are the backward pass
    Update the weights in the network
Until all datasets are classified correctly and the desired criteria is satisfied
Return the set of trained network weights

Figure 4: Backpropagation algorithm

2.3.1. Criteria

This parameter indicates when the learning process should stop. All the outputs must be within this criteria parameter to terminate.

2.3.2. Learning rate

This is a constant that affects the speed of learning. The mathematical calculations of backpropagation are based on small changes being made to the weights at each step of the error calculations. If the changes made to the weights are too large, the algorithm may bounce around the error minimum; in this case, it is necessary to reduce the learning rate. On the other hand, the smaller the learning rate, the more steps it takes to reach the criteria.

2.3.3. Generalization

If you have a good training set with examples that cover most or all of the various possible inputs, and the neural network learns them all, then it is likely to generalize and successfully model other similar instances. This means that it will give the correct output for other inputs of the same application.

2.4. Linear scaling

The backpropagation algorithm is a supervised learning method wherein a set of input datasets is applied to the network to train it, after which test datasets are applied to see how well it learned. Furthermore, if the network learns them all, then when unknown input data are applied, it will hopefully generalize. Often, the input and output datasets are not uniform, i.e. they do not have a fixed range, and are certainly rarely limited to the range [0, 1].
These datasets are usually distributed across a large range of values, which makes scaling essential. This scaling process is needed because the input and output values provided to the neural network can be arbitrary numbers, whereas the neurons can output data only in the range [0, 1], i.e. they fire or they do not fire. Hence, linear scaling is applied to the output values of the training datasets, and optionally to the input datasets, to bring all the data into the desired range [0, 1] before backpropagation starts the learning process.

The formula for applying linear scaling to each input and output training value provided to the neural network is:

Xi = (X - Xmin) / (Xmax - Xmin)

where:
X = the value being scaled
Xi = the scaled value of X
Xmin = the minimum value of the particular input being scaled
Xmax = the maximum value of the particular input being scaled

Table 1 shows an example dataset and the scaled result. The first column is the original training dataset and the second column is the scaled value of each value on the left. The scaled values lie in the range [0.05, 0.95] instead of the theoretical range [0, 1]: the sigmoidal function shown in figure 3 approaches 0 and 1 but never actually reaches them, so the linear scaling formula is modified to scale all the datasets from 0.05 to 0.95. As shown in table 1, the smallest value in the first column is 11.02, so its scaled value is 0.050, and the largest value is 14.83, so its scaled value is 0.95.

Table 1: Linear scaling example

Datasets provided to Neural Network | Linear scaled values
14.23 | 0.807
13.19 | 0.563
14.36 | 0.841
13.24 | 0.573
14.19 | 0.800
14.39 | 0.845
14.06 | 0.765
14.83 | 0.95
11.02 | 0.050
2.4.1. Drawback of the linear scaling

For training datasets that contain large differences, linearly scaling the values that are close together will bring their differences near zero. The motivation for using backpropagation to train neural networks is to reduce the error between the output calculated by the network and the desired output in the datasets. However, if the difference between some of the scaled input/output values provided to the network becomes too small for the network to distinguish them, the subsequent weight manipulations in backpropagation become harder, and many more iterations are required to achieve the desired output. To demonstrate this, consider the datasets in table 2. Applying the linear scaling formula (into the range [0.05, 0.95], with Xmin = 0.15 and Xmax = 600) to each of the values gives the following scaled values.

Table 2: Linear scaling with large difference in datasets

Datasets provided to Neural Network | Linear scaled values
0.25 | 0.0502
0.15 | 0.050
0.75 | 0.0509
0.99 | 0.0513
600 | 0.95
430 | 0.6949
598 | 0.9470

According to the linear scaling formula, for X = 0.15, Xi = 0.05; for X = 0.75, Xi = 0.0509; and for X = 600, Xi = 0.95.

Figure 5: Neural network with large difference in input datasets

In figure 5, the width of the arrows represents the magnitude of the weights applied to each input: the links for the 0.15 inputs are thicker and the links for the 600 inputs are thinner. In the above example, the difference between the values 0.15 and 600 is large, so the smaller values like 0.15 and 0.75 are all scaled close to the bottom of the range, and this adversely affects the ability of backpropagation to find weights that differentiate between them. This project explores whether this drawback can be reduced by applying non-linear scaling to the input/output datasets, i.e. scaling the larger, sparse values into a smaller portion of the range and the smaller, clustered values into a larger portion.
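The clustering effect described above can be reproduced with a short sketch of the linear scaling formula, including the [0.05, 0.95] range adjustment from section 2.4 (a minimal illustration in Python; the function name is chosen here and is not from the project's source code):

```python
def linear_scale(x, x_min, x_max, lo=0.05, hi=0.95):
    """Linearly scale x from [x_min, x_max] into [lo, hi]."""
    return lo + (hi - lo) * (x - x_min) / (x_max - x_min)

# The values from Table 2: a few small values and a few near 600.
data = [0.25, 0.15, 0.75, 0.99, 600.0, 430.0, 598.0]
scaled = [linear_scale(x, min(data), max(data)) for x in data]

# The small values (0.15 .. 0.99) all land within about 0.0013 of each
# other just above 0.05, making them hard for the network to tell apart.
for x, s in zip(data, scaled):
    print(f"{x:8.2f} -> {s:.4f}")
```

Running this shows 0.15, 0.25, 0.75 and 0.99 all mapping to roughly 0.050-0.051, which is the drawback the non-linear scaling of chapter 3 is designed to address.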
Chapter 3
PROPOSED SOLUTION

One way of making the scaling process non-linear is to scale the datasets according to the median of each dataset's inputs and outputs. This approach may help resolve the scaling issue for datasets with large differences, as it scales the densely clustered values by a smaller amount and the sparse values by a larger amount. The formula for median-type scaling is shown in figure 6. First, we find the median, minimum and maximum of each dataset, i.e. of each column in the training datasets provided to the neural network. Then, for each dataset value, we apply the given formula, linearly scaling the right and left halves of the data separately. After the right half of the formula computes the scaled value for data on the right-hand side of the median, 0.5 is added to the computed value to shift it into the right-hand half of the range, as shown in figure 6. Seen closely, each sub-portion of the data (the values that lie on the left-hand side and the values that lie on the right-hand side of the median) is actually scaled linearly; however, the overall mapping of the dataset is non-linear, pivoting around the median value.

Figure 6: Formula for non-linear scaling

3.1. Scaling in backpropagation

The backpropagation algorithm is divided into four major steps. They are as follows:

1. Initialization: the neural network is defined and all the necessary parameters are set. The weights are initially set to small, random values.
2. Scaling: the scaling formula shown in figure 6 is applied to the datasets provided to the neural network to bring all the data into the desired range [0, 1].
3. Training: the forward pass and error correction procedure for reaching the desired output.
This process iterates until all the datasets are classified correctly and the desired criterion is reached (described in figure 4).
4. Un-scaling: scaling the output data calculated during training back to the original range, along with the inputs that were scaled in step 2.

3.1.1. Initialization

1. Set the training datasets and test datasets, i.e. initialize the number of rows and columns in the datasets.
2. Design the neural network by defining the network topology, i.e. the number of nodes in the input, output and hidden layers.
3. Initialize the backpropagation parameters such as the criteria and learning rate.
4. Initialize the weights to small random values.

3.1.2. Scaling

1. Initialize the extreme[] array. The extreme[] array holds three values for each input and output dataset: the minimum, the maximum, and the median.
2. Open the training case file and read all the data into an array Train[][].
3. For each column in the array Train[][]:
   a. Find the minimum and maximum values and insert them into extreme[0] and extreme[1].
4. For each input dataset:
   a. Scale down only the input datasets (not the output datasets) in the array Train[][] with the linear scaling formula, i.e. for each value in the particular column apply the formula shown in figure 7, which scales them into the range [0, 1].

Figure 7: Formula for linear scaling

5. Store the output dataset columns in a temporary array MedianArray[].
6. Sort MedianArray[] to find the median.
7. For each output dataset:
   a. Calculate the median value and insert it into extreme[2].
8. For each output dataset:
   a. Scale down the output datasets using the three values in the extreme[] array with the non-linear median-type scaling algorithm shown in figure 8, and insert each scaled value into the Train[][] array.

Figure 8: Formula for non-linear scaling in detail

3.1.3. Backpropagation algorithm

1. For each training pair (di, do) in the training sets:
   a. Process the feed-forward pass.
   b. For each output in the training pair, calculate the error at the output layer.
   c. Scale the criteria value for each corresponding calculated output value. The criteria value is scaled into the range [0, 0.9], not [0.05, 0.95]:

Figure 9: Formula for scaling the criteria value

   d. Process the backward pass:
      i. Compute delta for all weights from the hidden layer to the output layer.
      ii. Compute delta for all weights from the input layer to the hidden layer.
   e. Update the weights in the network in a way that minimizes the error.
2. Repeat step 1 for all datasets until they are classified correctly and within the desired criteria.
3. Return the set of trained network weights.

3.1.4. Un-scaling process

1. After the weight manipulation process executes, and the outputs lie within the desired criteria or the network reaches the defined number of iterations, the dataset values need to be scaled back to the original range. The formulas for reversing the scaling process are shown in figures 10 and 11.
   i. For non-linear scaling back to the original range:

Figure 10: Formula for non-linear unscaling

   ii. For linear scaling back to the original range:

Figure 11: Formula for linear unscaling

2. Print the calculated output.
3. Exit the program.

3.2. Which Neural Network elements are scaled?

In the above algorithm, the non-linear median-type scaling is applied to the output datasets only, while the input datasets are scaled with the linear formula. These results are then compared against a fully linear scaling process applied to the same datasets to observe the performance of the neural network. Although non-linear scaling is applied only to the output datasets here, scaling the inputs non-linearly would also be a useful experiment.

A criterion is a parameter used by the backpropagation algorithm to determine the termination point. Every calculated output error is compared with the criteria to see if training has succeeded.
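The median-based scaling and unscaling steps above can be sketched as follows. This is a minimal illustration, not the project's implementation (which is in Appendix A): it assumes the [0.05, 0.95] implementation range, mapping [minimum, median] onto [0.05, 0.5] and [median, maximum] onto [0.5, 0.95], which is one reasonable reading of the formulas in figures 6, 8 and 10. Function names are chosen here for clarity.

```python
def median_scale(x, x_min, x_med, x_max, lo=0.05, hi=0.95):
    """Non-linear median scaling: each half of the data range is scaled
    linearly, but into its own half of [lo, hi], so a densely clustered
    half of the data receives a proportionally larger share of the range.
    (A real implementation must also guard against x_med == x_min or
    x_max == x_med, the divide-by-zero case noted in section 3.3.)"""
    mid = (lo + hi) / 2.0
    if x <= x_med:
        return lo + (mid - lo) * (x - x_min) / (x_med - x_min)
    return mid + (hi - mid) * (x - x_med) / (x_max - x_med)

def median_unscale(s, x_min, x_med, x_max, lo=0.05, hi=0.95):
    """Inverse of median_scale: map a scaled network output back to the
    original data range."""
    mid = (lo + hi) / 2.0
    if s <= mid:
        return x_min + (s - lo) * (x_med - x_min) / (mid - lo)
    return x_med + (s - mid) * (x_max - x_med) / (hi - mid)

# The Table 2 column again; its sorted median is 0.99.
data = [0.15, 0.25, 0.75, 0.99, 430.0, 598.0, 600.0]
x_min, x_med, x_max = 0.15, 0.99, 600.0
for x in data:
    print(f"{x:8.2f} -> {median_scale(x, x_min, x_med, x_max):.4f}")
```

Under this scheme, 0.25 and 0.75 map to roughly 0.10 and 0.37 instead of both landing near 0.05 as under linear scaling, so the network can distinguish them, while the large values still fit in the upper half of the range.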
In the algorithm, the criteria value must also be scaled: since all the input and output datasets have been scaled, all values being compared need to lie in the same range. Hence, the criteria value for each calculated output is also scaled non-linearly.

3.3. Scaling range

All the formulas described above in the algorithm are theoretical formulas, so for implementation purposes the algorithm needs to be refined to ensure that the error-correction calculation is correct. For instance, if the difference between the Xmax and Xmin values comes to zero, the scaling formula produces a divide-by-zero error during the actual implementation. Therefore, all the formulas need to be adjusted. As shown in the figures above, all the datasets are scaled (linearly or non-linearly) into the range [0.05, 0.95]. The purpose for doing so is that the logistic function used in backpropagation can never actually output a zero or a one; it only approaches these values (please refer to Section 2.2 for the logistic function explanation). In the median scaling formula, values in the range [Median, Maximum] would otherwise be computed as if they fell in [Minimum, Median], so the formula is refined so that computed values for the range [Median, Maximum] are shifted to the right of 0.5 (please refer to Appendix A for the formula implementations).

Chapter 4

EXPERIMENTAL METHODOLOGY

The effectiveness of non-linear scaling is assessed by applying different problem sets to a particular neural network, first under the normal (linear) scaling technique and then under the non-linear scaling technique, and measuring how well the neural network does at learning the data and generalizing. In this section, a description of each of the datasets is provided.

4.1. Datasets

Each dataset has one training set and one testing set, which are applied to the designed neural network.
The tables show only some of the training pairs, since in some cases the full training dataset is too large to be included in this document. The backpropagation algorithm discussed above is applied to three problem datasets.

4.1.1. Minimum weight steel beam problem

In this dataset, the artificial neural network is used for learning in the domain of structural engineering. Each instance is an acceptable design that satisfies the requirements of the AISC LRFD design-code specification for steel structures; specifically, it is the minimum-weight steel beam from the wide-flange (W) shape database for a given loading condition [3]. The designed artificial neural network learns to select the lightest W shape among all the available shapes. Each instance of the training dataset consists of four input patterns:

The member length (L)
The unbraced length (Lb)
The maximum bending moment in the member (Mmax)
The maximum shear force (Vmax)

Each instance of the training dataset also includes the following corresponding output pattern:

The plastic modulus of the corresponding least-weight member (Zx)

Backpropagation uses the following parameter values:

Learning rate = 0.3
Criteria = 0.001

Table 3: Training datasets for steel beam design problem

Instance   L      Lb     Mmax    Vmax    Zx
1          0.40   0.40   0.190   0.190   0.6313
2          0.20   0.20   0.120   0.240   0.3630
3          0.35   0.35   0.035   0.035   0.1000
4          0.15   0.15   0.045   0.120   0.1400
5          0.15   0.15   0.035   0.095   0.1000

Table 4: Testing datasets for steel beam design problem

Instance   L      Lb     Mmax    Vmax    Zx
1          0.20   0.20   0.030   0.060   0.1000
2          0.30   0.30   0.095   0.127   0.3900
3          0.15   0.15   0.010   0.027   0.0415
4          0.40   0.40   0.120   0.120   0.4830

4.1.2. Two-dimensional projectile motion datasets

This dataset describes the two-dimensional projectile motion of a ball fired from a gun with particular specifications.
These datasets were gathered by keeping the gun on a stationary platform and firing at a stationary target. The gun can swivel anywhere from 0 to 90 degrees toward the Z axis; the angle formed is the elevation angle theta. The gun can also swivel 360 degrees about the origin of the X-Y plane; the angle formed between the X axis and the direction the gun is pointing is called the azimuthal angle phi. Wind is also taken into consideration: it blows in the X-Y plane with its own azimuthal angle alpha, and, if it is blowing, it imparts some acceleration to the projectile, which we refer to as "a". Hence, each instance of the training dataset consists of five input patterns:

The initial velocity of the projectile (Vo)
The elevation angle of the gun (Θ)
The azimuthal angle of the gun (Φ)
The horizontal acceleration of the projectile due to wind (a)
The azimuthal angle of the wind acceleration vector (α)

Each instance of the training dataset consists of two output patterns:

The target co-ordinates (Xt, Yt, 0)
The projectile impact co-ordinates (Xi, Yi, 0)

Table 5: Example training datasets for 2-D projectile data motion

Instance   Vo         Θ          Φ      a     α        Target    Impact
1          -185.680   62.3995    33.0   7.0   3.203    57.8742   136.039
2          46.4323    -21.428    40.0   0.0   0.8099   9.1267    335.227
3          -197.269   80.5293    49.0   0.0   3.791    30.211    157.793
4          -45.512    -66.324    31.0   0.0   1.0952   62.4432   235.541
5          130.623    -164.342   49.0   3.0   0.0      68.9361   270.0
6          61.9628    42.6237    31.0   2.0   0.6027   22.502    34.523
7          32.177     -3.891     35.0   0.0   5.7311   7.5142    353.103

Table 6: Example testing datasets for 2-D projectile data motion

Instance   Vo         Θ          Φ      a     α        Target    Impact
1          -136.583   10.491     39.0   7.0   1.928    26.795    194.710
2          34.339     210.004    49.0   1.0   4.509    34.542    80.5470
3          82.987     -124.332   32.0   9.0   4.930    33.781    316.559
4          -48.627    40.255     35.0   9.0   0.5604   37.4732   182.333
5          153.010    174.322    46.0   4.0   1.3616   73.017    7.921
6          -165.138   119.676    37.0   4.0   2.754    53.303    136.59

Backpropagation uses the following parameter
values:

Learning rate = 0.3
Criteria = 1

4.1.3. Wine recognition data

The wine recognition datasets are the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. This chemical analysis determined the quantities of 13 constituents found in each of the three types of wine. The output is classified into one of three classes. Each instance of the training dataset consists of thirteen input patterns:

Alcohol
Malic acid
Ash
Alkalinity of ash
Magnesium
Total phenols
Flavonoids
Non-flavonoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline

Each instance of the training dataset consists of one output pattern:

Class distribution (1, 2, or 3)

Table 7: Example training datasets for wine recognition

N   Inputs (13 constituents, in the order listed above)                    O
1   14.2 1.71 2.43 15.6 127 2.8 3.06 .28 2.2 5.6 1.04 3.9 1065             1
2   13.2 1.78 2.14 11.2 100 2.6 2.76 .26 1.2 4.3 1.05 3.4 1050             1
3   12.3 .94 1.36 10.6 88 .28 .42 1.9 1.05 1.8 520                         2
4   12.3 1.1 2.28 16 101 2.0 1.09 .63 .41 3.2 1.25 1.6 680                 2
5   12.8 1.35 2.32 18 122 1.5 1.25 .21 .94 4.1 .76 1.2 630                 3
6   12.8 2.31 2.4 98 1.3 560 24 1.9 .57 1.1 1.09 .27 .83 5.7 .66           3

Table 8: Example testing datasets for wine recognition

N   Inputs (13 constituents, in the order listed above)                    O
1   14 1.95 2.5 16.8 113 3.8 3.49 .24 2.1 7.8 .86 3.45 1480                1
2   12 1.81 2.2 18.8 86 2.2 2.53 .26 1.7 3.9 1.1 3.14 714                  2
3   13 3.9 2.3 21.5 113 1.4 1.39 .34 1.1 9.4 .57 1.33 550                  3

Backpropagation uses the following parameter values:

Learning rate = 0.3
Criteria = 0.1

4.2. Measuring performance

The main objective of this study is to improve the scaling process of the backpropagation algorithm, so that it can perform error correction as fast as possible and reach the desired output in the least number of iterations.
To compare the linear scaling process and the non-linear scaling process under backpropagation, we study the main factors of the algorithm: the speed of learning, and how well the designed neural network generalizes. We set the learning rate and other parameters before applying the scaling, keep those values fixed, and then observe the error calculations produced first by the linear scaling process and then by the non-linear scaling process for each particular dataset.

Another factor that needs to be observed is the criteria value for each particular dataset being tested. Setting the criteria value too low may force the neural network into an excessive, possibly unbounded, number of iterations, in which case the comparison between linear and non-linear scaling would never conclude or would not be meaningful. On the other hand, too large a criteria value would allow training to conclude before learning is complete. Therefore, the best criteria value for a particular dataset must be found by trial and error. For example, in the projectile datasets the training values are dispersed over a large range (one value is -150 and another is 220), so if the criteria is set too low the neural network will have to undergo too many iterations. For datasets whose input values do not have a large distribution, the criteria value can be as low as 0.001. Changing the criteria value for each test and each dataset makes a significant difference in the learning process.

Another important observation for the comparison is to study, by manual inspection, the difference between the calculated outputs and the corresponding desired outputs. After the proposed number of iterations, the backpropagation algorithm terminates even if the results were not met within the specified criteria. For such results, the small differences in error in the generated output sets must be examined.
After the learning process finishes calculating the weights, they are applied to the test datasets. During this test, we observe how well the network learned under each scaling method. During test execution, the algorithm does not perform any weight-manipulation iterations; it simply uses the calculated weights to determine the output.

Chapter 5

RESULTS

After the parameters required by the backpropagation algorithm for non-linear scaling have been set properly, the datasets are applied to the designed artificial neural network. Both the training datasets and the testing datasets are given as input to the algorithm. Complete outputs for training runs are given in Appendix C.

5.1. Result for minimum weight problem datasets

When the backpropagation method was executed with linear scaling, the network did not converge within the declared iteration limit of 900000 iterations. With non-linear scaling, on the other hand, the network converged earlier than the defined number of iterations. As discussed earlier, the number of iterations of the neural network depends on the criteria value; both executions used the same criteria value. With non-linear scaling, the neural network learned the dataset within 359299 iterations, whereas with linear scaling the network did not converge. Furthermore, this methodology was tested using the test datasets: the network applied the weights that were adjusted during the learning phase. The percentage of the testing cases that meet the criteria for non-linear scaling is 100, meaning all the outputs generated by the neural network were within the desired criteria.

Looking at the results of backpropagation for the minimum weight problem dataset, we can conclude that with non-linear scaling the network learned faster than when the datasets were scaled linearly.
Table 9: Result for minimum weight problem

Operation                                      Linear Scaling             Non-Linear Scaling
Total number of iterations                     900000                     359299
Converged                                      Network did not converge   Network converged in 359299 iterations
Percentage Generalization for training cases   80%                        100%
Percentage Generalization for test cases       83.33%                     100%

5.2. Result for two-dimensional projectile motion dataset

For this dataset, in both cases (linear scaling and non-linear scaling) the network does not converge within 90000000 iterations. Therefore, it is hard for us to tell which one did better. However, if we observe the outputs generated by the non-linear scaling method, there is a small improvement in the difference between the desired and calculated outputs. For example, from the output (refer to Appendix C for the result), the desired value is 59.3108; the linear scaling run produces 64.2965, whereas the non-linear scaling run produces 62.1667, so it is clear that 62.1667 is closer to the desired value than 64.2965.

Table 10: Result for 2-D projectile motion problem

Operation                                      Linear Scaling             Non-Linear Scaling
Total number of iterations                     90000000                   90000000
Converged                                      Network did not converge   Network did not converge
Percentage Generalization for training cases   4.84%                      12.168%
Percentage Generalization for test cases       22.22%                     15.556%

5.3. Result for wine recognition data

We apply the wine recognition problem to the backpropagation algorithm with both the linear and non-linear scaling methodologies. This particular problem helps show that non-linear scaling is better than the linear scaling method. For the linear scaling method, the percentage of training cases that meet the criteria is 99.4382; for the non-linear scaling method, it is 100.00 percent. The number of iterations for the linear scaling methodology is less than for the non-linear scaling.
This implies that the linear scaling methodology learned faster than the non-linear one, but did not learn as thoroughly; the output generated by the algorithm with the non-linear method was more accurate than the output generated with the linear method.

Table 11: Result for wine recognition data

Operation                                      Linear Scaling                           Non-Linear Scaling
Total number of iterations (limit: 9000000)    275049                                   306377
Converged                                      Network converged in 275049 iterations   Network converged in 306377 iterations
Percentage Generalization for training cases   99.4382%                                 100%
Percentage Generalization for test cases       100%                                     100%

Chapter 6

CONCLUSION

Artificial neural networks learn by example, and with good training datasets they complete the learning process faster. The linear scaling process currently used in backpropagation struggles when the values in the training datasets are widely dispersed or tightly clustered. In real-world problems, the data that the neural network uses is often scattered over a large range. If this distributed data is scaled linearly, the learning effort of the neural network may increase. A new method of non-linear scaling, called median scaling, tries to lessen the clustering issues in the datasets. The algorithm takes all the input and output datasets at once and determines the number of inputs and the number of training cases. Then, for each output column in the datasets, the algorithm finds three values, the minimum, the maximum, and the median, and scales the data accordingly. After experimenting with non-linear (median type) scaling on various datasets, we can say that non-linear scaling improves the error calculation to some extent. Therefore, non-linear scaling is one way to decrease the training effort. Linear scaling can work for data whose range is not dispersed but tightly grouped.

Chapter 7

FUTURE WORK

There is a lot of scope for improving the backpropagation algorithm in terms of scaling the datasets.
For instance, this experiment applies non-linear scaling to the output datasets; further work could apply non-linear scaling to the input datasets as well and study the resulting error calculations. Another approach to non-linear scaling would be to perform clustering analysis on the training datasets. Each populated dataset could be clustered and classified into groups, the points split into multiple sections, and scaling applied to each section. That is, instead of finding a single median and then scaling the datasets, we would find multiple split points according to the clustering analysis and scale each section separately.

APPENDIX A

Source Code

/**************************************************
 Neural Network with Backpropagation
 using Non-Linear Scaling methodology
 --------------------------------------------------
 Modified by Alok Nakate
 California State University Sacramento
 Date: October 2010
 Change: Non-Linear Scaling
 --------------------------------------------------
 Adapted from D. Whitley, Colorado State University
 Modifications by S. Gordon
 --------------------------------------------------
 Version 3.0 - October 2009 - includes momentum
 --------------------------------------------------
 compile with g++ nn.c
****************************************************/

#include <iostream>
#include <fstream>
#include <cmath>
using namespace std;

#define NumOfCols 3        /* number of layers +1, i.e., include input layer */
#define NumOfRows 14       /* max number of rows net +1, last is bias node */
#define NumINs 13          /* number of inputs, not including bias node */
#define NumOUTs 1          /* number of outputs, not including bias node */
#define LearningRate 0.3   /* most books suggest 0.3 */
#define Criteria 0.01      /* all outputs must be within this to terminate */
#define MaxIterate 1000000 /* maximum number of iterations */
#define ReportIntv 1001    /* print report every time this many cases done */
#define TestCriteria 0.02  /* all outputs must be within this to terminate */
#define Momentum 0.8       /* momentum constant */
#define TrainCases 178     /* number of training cases */
#define TestCases 15       /* number of test cases */

// network topology by column -----------------------------------
#define NumNodes1 14       /* col 1 - must equal NumINs+1 */
#define NumNodes2 14       /* col 2 - hidden layer 1, etc. */
#define NumNodes3 1        /* output layer must equal NumOUTs */
#define NumNodes4 0
#define NumNodes5 0        /* note: layers include bias node */
#define NumNodes6 0

#define TrainFile "winetrain.dat"  /* file containing training data */
#define TestFile "winetest.dat"    /* file containing testing data */

int NumRowsPer[NumOfRows]; /* number of rows used in each column incl. bias */
                           /* note - bias is not included on output layer */
                           /* note - leftmost value must equal NumINs+1 */
                           /* note - rightmost value must equal NumOUTs */

double TrainArray[TrainCases][NumINs + NumOUTs];

// an array for finding the median for a particular column
// (from the given inputs and outputs)
double MedianArray[TrainCases];

double TestArray[TestCases][NumINs + NumOUTs];

int CritrIt = 2 * TrainCases;

ifstream train_stream;     /* source of training data */
ifstream test_stream;      /* source of test data */
ofstream result_stream("result.txt");

void CalculateInputsAndOutputs();
void TestInputsAndOutputs();
void TestForward();
double ScaleOutput(double X, int which);
// Alok
double ScaleOutputGeneralise(double X, int which);
double ScaleDown(double X, int which);
// Alok
double ScaleDownGeneralise(double X, int which);
// Alok
double ScaleCriteria(double ActualOutput, int which);
double ScaleTestCriteria(double ActualOutput, int which);
void GenReport(int Iteration);
void TrainForward();
void FinReport(int Iteration);
void DumpWeights();
// Alok
double * sort(double arr[], int numR);
void quicksort(int arr[], int low, int high);
double FindMedian(double arr[], int numR);
double getMedian(double arr[], int numR);

struct CellRecord
{
  double Output;
  double Error;
  double Weights[NumOfRows];
  double PrevDelta[NumOfRows];
};

struct CellRecord CellArray[NumOfRows][NumOfCols];
double Inputs[NumINs];
double DesiredOutputs[NumOUTs];
double extrema[NumINs+NumOUTs][3];  // [0] is low, [1] is hi, [2] is median
long Iteration;

/************************************************************
 Get data from Training and Testing Files, put into arrays.
 The scaling process also occurs in this step.
*************************************************************/
void GetData()
{
  for (int i=0; i < (NumINs+NumOUTs); i++)
  {
    extrema[i][0]=99999.0;
    extrema[i][1]=-99999.0;
  }

  // read in training data
  train_stream.open(TrainFile);
  for (int i=0; i < TrainCases; i++)
  {
    for (int j=0; j < (NumINs+NumOUTs); j++)
    {
      train_stream >> TrainArray[i][j];
      if (TrainArray[i][j] < extrema[j][0]) extrema[j][0] = TrainArray[i][j];
      if (TrainArray[i][j] > extrema[j][1]) extrema[j][1] = TrainArray[i][j];
    }
  }
  train_stream.close();

  // read in test data
  test_stream.open(TestFile);
  for (int i=0; i < TestCases; i++)
  {
    for (int j=0; j < (NumINs+NumOUTs); j++)
    {
      test_stream >> TestArray[i][j];
      if (TestArray[i][j] < extrema[j][0]) extrema[j][0] = TestArray[i][j];
      if (TestArray[i][j] > extrema[j][1]) extrema[j][1] = TestArray[i][j];
    }
  }
  // guard against both extrema being equal
  for (int i=0; i < (NumINs+NumOUTs); i++)
    if (extrema[i][0] == extrema[i][1]) extrema[i][1]=extrema[i][0]+1;
  test_stream.close();

  // scale training and test data to range 0..1

  /********************************************************************
   Apply Scaling to Training cases
  ********************************************************************/
  for (int i=0; i < TrainCases; i++)
  {
    for (int j=0; j < NumINs; j++)
      TrainArray[i][j] = ScaleDown(TrainArray[i][j],j);
  }

  // Alok
  // Find the Median of a particular column
  // Currently it finds the medians of all the output columns.
  for (int k=NumINs; k < NumINs+NumOUTs; k++)
  {
    for (int i=0; i < TrainCases; i++)
    {
      MedianArray[i] = TrainArray[i][k];
    }
    extrema[k][2] = FindMedian(MedianArray, TrainCases);
  }

  // Apply the NON-LINEAR Scaling using the median
  for (int i=0; i < TrainCases; i++)
  {
    for (int k=NumINs; k < NumINs+NumOUTs; k++)
      TrainArray[i][k] = ScaleDownGeneralise(TrainArray[i][k],k);
  }

  /********************************************************************
   Apply Scaling to Test cases
  ********************************************************************/
  for (int i=0; i < TestCases; i++)
  {
    for (int j=0; j < NumINs; j++)
      TestArray[i][j] = ScaleDown(TestArray[i][j],j);
    for (int k=NumINs; k < NumINs+NumOUTs; k++)
      TestArray[i][k] = ScaleDownGeneralise(TestArray[i][k],k);
  }
}

double FindMedian(double arr[], int numR)
{
  double valMedian;
  arr = sort(arr, numR);
  valMedian = getMedian(arr, numR);
  return valMedian;
}

/***************************************************************
 Function to find the median of a particular sorted column
***************************************************************/
double getMedian(double arr[], int numR)
{
  int middle = numR/2;
  double average;
  if (numR%2==0) average = (arr[middle-1]+arr[middle])/2;
  else average = (arr[middle]);
  return average;
}

/***************************************************************
 Function to sort a particular column
***************************************************************/
double * sort(double arr[], int numR)
{
  double temp;
  for (int i = (TrainCases - 1); i >= 0; i--)
  {
    for (int j = 1; j <= i; j++)
    {
      if (MedianArray[j-1] > MedianArray[j])
      {
        temp = MedianArray[j-1];
        MedianArray[j-1] = MedianArray[j];
        MedianArray[j] = temp;
      }
    }
  }
  return arr;
}

/**************************************************************
 Assign the next training pair
***************************************************************/
void CalculateInputsAndOutputs()
{
  static int S=0;
  for (int i=0; i < NumINs; i++) Inputs[i]=TrainArray[S][i];
  for (int i=0; i < NumOUTs; i++) DesiredOutputs[i]=TrainArray[S][i+NumINs];
  S++;
  if (S==TrainCases) S=0;
}

/**************************************************************
 Assign the next testing pair
***************************************************************/
void TestInputsAndOutputs()
{
  static int S=0;
  for (int i=0; i < NumINs; i++) Inputs[i]=TestArray[S][i];
  for (int i=0; i < NumOUTs; i++) DesiredOutputs[i]=TestArray[S][i+NumINs];
  S++;
  if (S==TestCases) S=0;
}

/************************* MAIN *************************************/
int main()
{
  int I, J, K, existsError, ConvergedIterations=0;
  long seedval;
  double Sum, newDelta, scaledCriteria;
  Iteration=0;
  NumRowsPer[0] = NumNodes1;  NumRowsPer[3] = NumNodes4;
  NumRowsPer[1] = NumNodes2;  NumRowsPer[4] = NumNodes5;
  NumRowsPer[2] = NumNodes3;  NumRowsPer[5] = NumNodes6;

  /* initialize the weights to small random values. */
  /* initialize previous changes to 0 (momentum).   */
  seedval = 555;
  srand(seedval);
  for (I=1; I < NumOfCols; I++)
    for (J=0; J < NumRowsPer[I]; J++)
      for (K=0; K < NumRowsPer[I-1]; K++)
      {
        CellArray[J][I].Weights[K] =
          2.0 * ((double)((int)rand() % 100000 / 100000.0)) - 1.0;
        CellArray[J][I].PrevDelta[K] = 0;
      }

  GetData();  // read training and test data into arrays

  cout << endl << "Iteration    Inputs    ";
  result_stream << "Iteration    Inputs    ";
  cout << "Desired Outputs    Actual Outputs" << endl;
  result_stream << "Desired Outputs    Actual Outputs" << endl;

  // -------------------------------
  // beginning of main training loop
  do
  {
    /* retrieve a training pair */
    CalculateInputsAndOutputs();
    for (J=0; J < NumRowsPer[0]-1; J++)
      CellArray[J][0].Output = Inputs[J];

    /* set up bias nodes */
    for (I=0; I < NumOfCols-1; I++)
    {
      CellArray[NumRowsPer[I]-1][I].Output = 1.0;
      CellArray[NumRowsPer[I]-1][I].Error = 0.0;
    }

    /**************************
     * FORWARD PASS           *
     **************************/

    /* hidden layers */
    for (I=1; I < NumOfCols-1; I++)
      for (J=0; J < NumRowsPer[I]-1; J++)
      {
        Sum = 0.0;
        for (K=0; K < NumRowsPer[I-1]; K++)
          Sum += CellArray[J][I].Weights[K] * CellArray[K][I-1].Output;
        CellArray[J][I].Output = 1.0 / (1.0+exp(-Sum));
        CellArray[J][I].Error = 0.0;
      }

    /* output layer */
    for (J=0; J < NumOUTs; J++)
    {
      Sum = 0.0;
      for (K=0; K < NumRowsPer[NumOfCols-2]; K++)
        Sum += CellArray[J][NumOfCols-1].Weights[K] * CellArray[K][NumOfCols-2].Output;
      CellArray[J][NumOfCols-1].Output = 1.0 / (1.0+exp(-Sum));
      CellArray[J][NumOfCols-1].Error = 0.0;
    }

    /**************************
     * BACKWARD PASS          *
     **************************/

    /* calculate error at each output node */
    for (J=0; J < NumOUTs; J++)
      CellArray[J][NumOfCols-1].Error =
        DesiredOutputs[J]-CellArray[J][NumOfCols-1].Output;

    /* check to see how many consecutive oks seen so far */
    existsError = 0;
    for (J=0; J < NumOUTs; J++)
    {
      // Alok
      // Apply non linear scaling to the criteria too as we applied it to
      // the outputs initially
      scaledCriteria = ScaleCriteria(CellArray[J][NumOfCols-1].Output, NumINs+J);
      if (fabs(CellArray[J][NumOfCols-1].Error) > scaledCriteria)
      {
        existsError = 1;
      }
    }
    if (existsError == 0) ConvergedIterations++;
    else ConvergedIterations = 0;

    /* apply derivative of squashing function to output errors */
    for (J=0; J < NumOUTs; J++)
      CellArray[J][NumOfCols-1].Error =
        CellArray[J][NumOfCols-1].Error
        * CellArray[J][NumOfCols-1].Output
        * (1.0 - CellArray[J][NumOfCols-1].Output);

    /* backpropagate error */

    /* output layer */
    for (J=0; J < NumRowsPer[NumOfCols-2]; J++)
      for (K=0; K < NumRowsPer[NumOfCols-1]; K++)
        CellArray[J][NumOfCols-2].Error = CellArray[J][NumOfCols-2].Error
          + CellArray[K][NumOfCols-1].Weights[J]
          * CellArray[K][NumOfCols-1].Error
          * (CellArray[J][NumOfCols-2].Output)
          * (1.0-CellArray[J][NumOfCols-2].Output);

    /* hidden layers */
    for (I=NumOfCols-3; I>=0; I--)
      for (J=0; J < NumRowsPer[I]; J++)
        for (K=0; K < NumRowsPer[I+1]-1; K++)
          CellArray[J][I].Error = CellArray[J][I].Error
            + CellArray[K][I+1].Weights[J]
            * CellArray[K][I+1].Error
            * (CellArray[J][I].Output)
            * (1.0-CellArray[J][I].Output);

    /* adjust weights */
    for (I=1; I < NumOfCols; I++)
      for (J=0; J < NumRowsPer[I]; J++)
        for (K=0; K < NumRowsPer[I-1]; K++)
        {
          newDelta = (Momentum * CellArray[J][I].PrevDelta[K])
            + (LearningRate * CellArray[K][I-1].Output * CellArray[J][I].Error);
          CellArray[J][I].Weights[K] = CellArray[J][I].Weights[K] + newDelta;
          CellArray[J][I].PrevDelta[K] = newDelta;
        }

    GenReport(Iteration);
    Iteration++;
  } while (!((ConvergedIterations >= CritrIt) || (Iteration >= MaxIterate)));
  // end of main training loop
  // -------------------------------

  FinReport(ConvergedIterations);
  TrainForward();
  TestForward();
  return 0;
}

double ScaleCriteria(double ActualOutput, int which)
{
  double range, allPos;
  if (ActualOutput < 0.5)
  {
    range = (extrema[which][2]-extrema[which][0]);
    allPos = ((.9*(Criteria/range)))/2;
  }
  else
  {
    range = (extrema[which][1]-extrema[which][2]);
    allPos = (((.9*(Criteria/range)))/2)+0.5;
  }
  return (allPos);
}

double ScaleTestCriteria(double ActualOutput, int which)
{
  double range, allPos;
  if (ActualOutput < 0.5)
  {
    range = (extrema[which][2]-extrema[which][0]);
    allPos = ((.9*(TestCriteria/range)))/2;
  }
  else
  {
    range = (extrema[which][1]-extrema[which][2]);
    allPos = (((.9*(TestCriteria/range)))/2)+0.5;
  }
  return (allPos);
}

/*******************************************
 Scale Desired Output to 0..1
*******************************************/
double ScaleDown(double X, int which)
{
  double allPos;
  allPos = .9*(X-extrema[which][0])/(extrema[which][1]-extrema[which][0])+.05;
  return (allPos);
}

/************************************************
 This function scales the input non-linear wise.
*************************************************/
double ScaleDownGeneralise(double X, int which)
{
  double range, allPos;
  if (X < extrema[which][2])
  {
    range = (extrema[which][2]-extrema[which][0]);
    allPos = ((.9*((X-extrema[which][0])/range))+.05)/2;
  }
  else
  {
    range = (extrema[which][1]-extrema[which][2]);
    allPos = (((.9*((X-extrema[which][2])/range))+.05)/2)+0.5;
  }
  return (allPos);
}

/*******************************************
 Scale actual output to original range
*******************************************/
double ScaleOutput(double X, int which)
{
  double range = extrema[which][1] - extrema[which][0];
  double scaleUp = ((X-.05)/.9) * range;
  return (extrema[which][0] + scaleUp);
}

/*************************************************
 Scale back to the original value
**************************************************/
double ScaleOutputGeneralise(double X, int which)
{
  double range, scaleUp, res;
  if (X < 0.5)
  {
    range = extrema[which][2] - extrema[which][0];
    scaleUp = ((((X*2)-.05)/.9) * range);
    res = (extrema[which][0] + scaleUp);
  }
  else
  {
    range = extrema[which][1] - extrema[which][2];
    scaleUp = (((((X-0.5)*2)-.05)/.9) * range);
    res = (extrema[which][2] + scaleUp);
  }
  return (res);
}

/*******************************************
 Run Test Data forward pass only
*******************************************/
void TestForward()
{
  int GoodCount=0;
  double Sum, TotalError=0, scaledTestCriteria;
  cout << "Running Test Cases" << endl;
  result_stream << "Running Test Cases" << endl;
  for (int H=0; H < TestCases; H++)
  {
    TestInputsAndOutputs();
    for (int J=0; J < NumRowsPer[0]-1; J++)
      CellArray[J][0].Output = Inputs[J];

    /* hidden layers */
    for (int I=1; I < NumOfCols-1; I++)
      for (int J=0; J < NumRowsPer[I]-1; J++)
      {
        Sum = 0.0;
        for (int K=0; K < NumRowsPer[I-1]; K++)
          Sum += CellArray[J][I].Weights[K] * CellArray[K][I-1].Output;
        CellArray[J][I].Output = 1.0 / (1.0+exp(-Sum));
        CellArray[J][I].Error = 0.0;
      }

    /* output layer */
    for (int J=0; J < NumOUTs; J++)
    {
      Sum = 0.0;
      for (int K=0; K < NumRowsPer[NumOfCols-2]; K++)
        Sum += CellArray[J][NumOfCols-1].Weights[K] * CellArray[K][NumOfCols-2].Output;
      CellArray[J][NumOfCols-1].Output = 1.0 / (1.0+exp(-Sum));
      CellArray[J][NumOfCols-1].Error = DesiredOutputs[J]-CellArray[J][NumOfCols-1].Output;
      scaledTestCriteria = ScaleTestCriteria(CellArray[J][NumOfCols-1].Output, NumINs+J);
      if (fabs(CellArray[J][NumOfCols-1].Error) <= scaledTestCriteria) GoodCount++;
      TotalError += CellArray[J][NumOfCols-1].Error * CellArray[J][NumOfCols-1].Error;
    }
    GenReport(-1);
  }
  cout << endl;  result_stream << endl;
  cout << "Sum Squared Error for Testing cases = " << TotalError << endl;
  result_stream << "Sum Squared Error for Testing cases = " << TotalError << endl;
  cout << "% of Testing Cases that meet criteria = "
       << ((double)GoodCount/(double)TestCases)*100;
  result_stream << "% of Testing Cases that meet criteria = "
       << ((double)GoodCount/(double)TestCases)*100;
  cout << endl;  result_stream << endl;
  cout << endl;  result_stream << endl;
}

/*****************************************************
 Run Training Data forward pass only, after training
******************************************************/
void TrainForward()
{
  int GoodCount=0;
  double Sum, TotalError=0, scaledCriteria;
  cout << endl << "Confirm Training Cases" << endl;
  result_stream << endl << "Confirm Training Cases" << endl;
  for (int H=0; H < TrainCases; H++)
  {
    CalculateInputsAndOutputs();
    for (int J=0; J < NumRowsPer[0]-1; J++)
      CellArray[J][0].Output = Inputs[J];

    /* hidden layers */
    for (int I=1; I < NumOfCols-1; I++)
      for (int J=0; J < NumRowsPer[I]-1; J++)
      {
        Sum = 0.0;
        for (int K=0; K < NumRowsPer[I-1]; K++)
          Sum += CellArray[J][I].Weights[K] * CellArray[K][I-1].Output;
        CellArray[J][I].Output = 1.0 / (1.0+exp(-Sum));
        CellArray[J][I].Error = 0.0;
      }

    /* output layer */
    for (int J=0; J < NumOUTs; J++)
    {
      Sum = 0.0;
      for (int K=0; K < NumRowsPer[NumOfCols-2]; K++)
        Sum += CellArray[J][NumOfCols-1].Weights[K] * CellArray[K][NumOfCols-2].Output;
      CellArray[J][NumOfCols-1].Output = 1.0 / (1.0+exp(-Sum));
      CellArray[J][NumOfCols-1].Error = DesiredOutputs[J]-CellArray[J][NumOfCols-1].Output;
      scaledCriteria = ScaleCriteria(CellArray[J][NumOfCols-1].Output, NumINs+J);
      if (fabs(CellArray[J][NumOfCols-1].Error) <= scaledCriteria) GoodCount++;
      TotalError += CellArray[J][NumOfCols-1].Error * CellArray[J][NumOfCols-1].Error;
    }
    GenReport(-1);
  }
  cout << endl;  result_stream << endl;
  cout << "Sum Squared Error for Training cases = " << TotalError << endl;
  result_stream << "Sum Squared Error for Training cases = " << TotalError << endl;
  cout << "% of Training Cases that meet criteria = "
       << ((double)GoodCount/(double)TrainCases)*100 << endl;
  result_stream << "% of Training Cases that meet criteria = "
       << ((double)GoodCount/(double)TrainCases)*100 << endl;
  cout << endl;  result_stream << endl;
}

/*******************************************
 Final Report
*******************************************/
void FinReport(int CIterations)
{
  cout.setf(ios::fixed);  cout.setf(ios::showpoint);  cout.precision(4);
  result_stream.setf(ios::fixed);  result_stream.setf(ios::showpoint);
  result_stream.precision(4);
  if (CIterations<CritrIt)
  {
    cout << "Network did not converge" << endl;
    result_stream << "Network did not converge" << endl;
  }
  else
  {
    cout << "Converged to within criteria" << endl;
    result_stream << "Converged to within criteria" << endl;
  }
  cout << "Total number of iterations = " << Iteration << endl;
  result_stream << "Total number of iterations = " << Iteration << endl;
}

/*******************************************
 Generation Report
 pass in a -1 if running test cases
*******************************************/
void GenReport(int Iteration)
{
  int J;
  cout.setf(ios::fixed);  cout.setf(ios::showpoint);  cout.precision(4);
  result_stream.setf(ios::fixed);  result_stream.setf(ios::showpoint);
  result_stream.precision(4);
  if (Iteration == -1)
  {
    for (J=0; J < NumRowsPer[0]-1; J++)
    {
      cout << " " << ScaleOutput(Inputs[J],J);
      result_stream << " " << ScaleOutput(Inputs[J],J);
    }
    cout << "   ";  result_stream << "   ";
    for (J=0; J < NumOUTs; J++)
    {
      cout << " " << ScaleOutputGeneralise(DesiredOutputs[J],NumINs+J);
      result_stream << " " << ScaleOutputGeneralise(DesiredOutputs[J],NumINs+J);
    }
    cout << "   ";  result_stream << "   ";
    for (J=0; J < NumOUTs; J++)
    {
      cout << " " << ScaleOutputGeneralise(CellArray[J][NumOfCols-1].Output,NumINs+J);
      result_stream << " " << ScaleOutputGeneralise(CellArray[J][NumOfCols-1].Output,NumINs+J);
    }
    cout << endl;  result_stream << endl;
  }
  else if ((Iteration % ReportIntv) == 0)
  {
    cout << " " << Iteration << " ";
    result_stream << " " << Iteration << " ";
    for (J=0; J < NumRowsPer[0]-1; J++)
    {
      cout << " " << ScaleOutput(Inputs[J],J);
      result_stream << " " << ScaleOutput(Inputs[J],J);
    }
    cout << "   ";  result_stream << "   ";
    for (J=0; J < NumOUTs; J++)
    {
      cout << " " << ScaleOutputGeneralise(DesiredOutputs[J],NumINs+J);
      result_stream << " " << ScaleOutputGeneralise(DesiredOutputs[J],NumINs+J);
    }
    cout << "   ";  result_stream << "   ";
    for (J=0; J < NumOUTs; J++)
    {
      cout << " " << ScaleOutputGeneralise(CellArray[J][NumOfCols-1].Output,NumINs+J);
      result_stream << " " << ScaleOutputGeneralise(CellArray[J][NumOfCols-1].Output,NumINs+J);
    }
    cout << endl;  result_stream << endl;
  }
}

APPENDIX B

Datasets

The three datasets used in the Backpropagation algorithm:

1. Minimum weight steel beam problem datasets

0.40 .40 .190 .190 .6313
0.20 .20 .120 .240 .3630
0.35 .35 .035 .035 .1000
0.15 .15 .045 .120 .1400
0.15 .15 .035 .095 .1000
0.06 .06 .018 .123 .0490
0.12 .06 .018 .067 .1000
0.10 .10 .007 .028 .0233
0.06 .06 .004 .024 .0110
0.20 .20 .050 .100 .1590

2.
2D Projectile motion dataset

-185.68085 62.39953 0.0 33.0 7.0 3.2038 57.87422 136.03995
46.43233 -21.4281 0.0 40.0 0.0 0.80996 9.12676 335.2271
-197.26916 80.52932 0.0 49.0 0.0 3.7913 30.21106 157.79371
-45.51251 -66.32454 0.0 31.0 0.0 1.09526 62.44321 235.54173
130.62391 -164.34226 0.0 49.0 3.0 0.0 68.93617 270.0
61.96282 42.62375 0.0 31.0 2.0 0.6027 22.50215 34.52309
-42.51268 236.75163 0.0 46.0 5.0 1.97327 82.40679 41.62494
-8.97536 7.36584 0.0 33.0 1.0 3.96902 2.99796 140.31947
-29.53386 -73.83029 0.0 32.0 4.0 0.0 46.10695 225.0
171.80714 -108.08147 0.0 43.0 8.0 0.0 27.05438 315.0
10.25567 115.54738 0.0 46.0 7.0 3.34476 86.01389 285.1852
5.9313 -21.85311 0.0 40.0 0.0 0.5036 18.15832 71.96181
307.65628 170.32188 0.0 43.0 9.0 0.68538 44.1592 19.79431
-79.70994 0.37885 0.0 31.0 0.0 5.64623 27.18855 179.72769
0.0 108.02264 0.0 39.0 0.0 0.0 67.94637 90.0
-3.06196 11.22578 0.0 32.0 9.0 3.95672 3.29125 102.67458
-48.77648 101.84351 0.0 39.0 8.0 0.99979 54.85066 196.16885
267.25881 182.24948 0.0 40.0 8.0 0.79311 70.8592 7.23498
109.22732 -94.95332 0.0 43.0 3.0 1.81711 39.85647 310.59396
-66.24069 74.12901 0.0 37.0 8.0 1.09514 50.63463 200.09005
-291.67982 -76.96141 0.0 40.0 7.0 3.60032 53.89839 183.51809
-3.03388 131.93313 0.0 31.0 9.0 1.46092 76.80569 240.05522
32.17785 -3.89194 0.0 35.0 0.0 5.73113 7.51423 353.10353
53.00022 72.62676 0.0 49.0 9.0 5.33339 11.70271 64.28479
85.76912 101.11365 0.0 33.0 5.0 1.29635 33.46894 41.62803
-17.92859 338.2669 0.0 40.0 8.0 1.70127 69.49202 83.30323
-13.07071 255.31984 0.0 46.0 9.0 1.75899 27.03595 89.25971
192.01133 -162.59587 0.0 49.0 9.0 4.53268 46.49823 16.71117
-35.43567 31.33487 0.0 43.0 5.0 4.98891 56.16365 114.25625
-52.46014 66.44012 0.0 33.0 0.0 6.06239 24.81207 128.2941
89.23896 -78.41001 0.0 49.0 4.0 0.0 13.45551 315.0
-34.93675 501.70877 0.0 49.0 9.0 1.36357 68.64398 133.90507
-20.93785 -7.27552 0.0 35.0 9.0 1.21631 5.42342 203.02062
-39.84531 -15.16282 0.0 37.0 1.0 4.19117 8.77469 200.2623
-38.27994 37.26014 0.0 40.0 2.0 0.65763 9.60368 137.73263
-96.35594 -40.79194 0.0 31.0 4.0 2.34045 47.31952 227.32617
10.36563 6.58753 0.0 31.0 6.0 0.44178 61.9614 204.23973
-50.26037 -4.93124 0.0 31.0 9.0 5.36504 31.81111 156.64365
-56.69363 0.0 0.0 32.0 6.0 0.0 34.24794 180.0
-86.42147 100.59401 0.0 40.0 4.0 0.65914 28.73856 143.5829
80.85112 42.29208 0.0 31.0 8.0 1.58313 49.83195 328.02689
196.48956 0.0 0.0 43.0 3.0 0.0 30.8483 0.0
229.20318 -15.612 0.0 49.0 5.0 5.27109 63.24488 51.15651
110.58687 16.18088 0.0 37.0 1.0 2.66207 56.59575 3.13154
17.6044 -95.25832 0.0 31.0 3.0 0.05457 40.73669 265.3123
39.30388 -37.403 0.0 36.0 1.0 5.58348 83.9627 313.05209
175.17317 21.09589 0.0 43.0 1.0 5.10087 57.41801 15.72236
-167.18443 -4.74984 0.0 46.0 6.0 3.23038 78.13924 11.75959
-66.90301 20.68802 0.0 32.0 8.0 1.34695 21.84847 181.86529
-43.18237 -32.46084 0.0 36.0 0.0 3.19018 12.05541 216.93274

3. Wine recognition dataset

14.23 1.71 2.43 15.6 127 2.8 3.06 .28 2.29 5.64 1.04 3.92 1065 1
13.2 1.78 2.14 11.2 100 2.65 2.76 .26 1.28 4.38 1.05 3.4 1050 1
13.16 2.36 2.67 18.6 101 2.8 3.24 .3 2.81 5.68 1.03 3.17 1185 1
14.37 1.95 2.5 16.8 113 3.85 3.49 .24 2.18 7.8 .86 3.45 1480 1
13.24 2.59 2.87 21 118 2.8 2.69 .39 1.82 4.32 1.04 2.93 735 1
14.2 1.76 2.45 15.2 112 3.27 3.39 .34 1.97 6.75 1.05 2.85 1450 1
14.39 1.87 2.45 14.6 96 2.5 2.52 .3 1.98 5.25 1.02 3.58 1290 1
14.06 2.15 2.61 17.6 121 2.6 2.51 .31 1.25 5.05 1.06 3.58 1295 1
14.83 1.64 2.17 14 97 2.8 2.98 .29 1.98 5.2 1.08 2.85 1045 1
13.86 1.35 2.27 16 98 2.98 3.15 .22 1.85 7.22 1.01 3.55 1045 1
14.1 2.16 2.3 18 105 2.95 3.32 .22 2.38 5.75 1.25 3.17 1510 1
14.12 1.48 2.32 16.8 95 2.2 2.43 .26 1.57 5 1.17 2.82 1280 1
13.75 1.73 2.41 16 89 2.6 2.76 .29 1.81 5.6 1.15 2.9 1320 1
14.75 1.73 2.39 11.4 91 3.1 3.69 .43 2.81 5.4 1.25 2.73 1150 1
14.38 1.87 2.38 12 102 3.3 3.64 .29 2.96 7.5 1.2 3 1547 1
13.63 1.81 2.7 17.2 112 2.85 2.91 .3 1.46 7.3 1.28 2.88 1310 1
14.3 1.92 2.72 20 120 2.8 3.14 .33 1.97 6.2 1.07 2.65 1280 1
13.83 1.57 2.62 20 115 2.95 3.4 .4 1.72 6.6 1.13 2.57 1130 1
14.19 1.59 2.48 16.5 108 3.3 3.93 .32 1.86 8.7 1.23 2.82 1680 1
13.64 3.1 2.56 15.2 116 2.7 3.03 .17 1.66 5.1 .96 3.36 845 1
14.06 1.63 2.28 16 126 3 3.17 .24 2.1 5.65 1.09 3.71 780 1
12.93 3.8 2.65 18.6 102 2.41 2.41 .25 1.98 4.5 1.03 3.52 770 1
13.71 1.86 2.36 16.6 101 2.61 2.88 .27 1.69 3.8 1.11 4 1035 1
12.85 1.6 2.52 17.8 95 2.48 2.37 .26 1.46 3.93 1.09 3.63 1015 1
13.5 1.81 2.61 20 96 2.53 2.61 .28 1.66 3.52 1.12 3.82 845 1
13.05 2.05 3.22 25 124 2.63 2.68 .47 1.92 3.58 1.13 3.2 830 1
13.39 1.77 2.62 16.1 93 2.85 2.94 .34 1.45 4.8 .92 3.22 1195 1
13.3 1.72 2.14 17 94 2.4 2.19 .27 1.35 3.95 1.02 2.77 1285 1
13.87 1.9 2.8 19.4 107 2.95 2.97 .37 1.76 4.5 1.25 3.4 915 1
14.02 1.68 2.21 16 96 2.65 2.33 .26 1.98 4.7 1.04 3.59 1035 1
13.73 1.5 2.7 22.5 101 3 3.25 .29 2.38 5.7 1.19 2.71 1285 1
13.58 1.66 2.36 19.1 106 2.86 3.19 .22 1.95 6.9 1.09 2.88 1515 1
13.68 1.83 2.36 17.2 104 2.42 2.69 .42 1.97 3.84 1.23 2.87 990 1
13.76 1.53 2.7 19.5 132 2.95 2.74 .5 1.35 5.4 1.25 3 1235 1
13.51 1.8 2.65 19 110 2.35 2.53 .29 1.54 4.2 1.1 2.87 1095 1
13.48 1.81 2.41 20.5 100 2.7 2.98 .26 1.86 5.1 1.04 3.47 920 1
13.28 1.64 2.84 15.5 110 2.6 2.68 .34 1.36 4.6 1.09 2.78 880 1
13.05 1.65 2.55 18 98 2.45 2.43 .29 1.44 4.25 1.12 2.51 1105 1
13.07 1.5 2.1 15.5 98 2.4 2.64 .28 1.37 3.7 1.18 2.69 1020 1
14.22 3.99 2.51 13.2 128 3 3.04 .2 2.08 5.1 .89 3.53 760 1
13.56 1.71 2.31 16.2 117 3.15 3.29 .34 2.34 6.13 .95 3.38 795 1
13.41 3.84 2.12 18.8 90 2.45 2.68 .27 1.48 4.28 .91 3 1035 1
13.88 1.89 2.59 15 101 3.25 3.56 .17 1.7 5.43 .88 3.56 1095 1
13.24 3.98 2.29 17.5 103 2.64 2.63 .32 1.66 4.36 .82 3 680 1
13.05 1.77 2.1 17 107 3 3 .28 2.03 5.04 .88 3.35 885 1
14.21 4.04 2.44 18.9 111 2.85 2.65 .3 1.25 5.24 .87 3.33 1080 1
14.38 3.59 2.28 16 102 3.25 3.17 .27 2.19 4.9 1.04 3.44 1065 1
13.9 1.68 2.12 16 101 3.1 3.39 .21 2.14 6.1 .91 3.33 985 1
14.1 2.02 2.4 18.8 103 2.75 2.92 .32 2.38 6.2 1.07 2.75 1060 1
13.94 1.73 2.27 17.4 108 2.88 3.54 .32 2.08 8.90 1.12 3.1 1260 1

APPENDIX C

Results

The output results for each dataset are as follows:

1. Result for the minimum weight steel beam problem dataset

The algorithm is first executed using the linear scaling process. Below is the output of the execution:

886886 0.1200 0.0600 0.0180 0.0670 0.1000 0.0999
887887 0.1000 0.1000 0.0070 0.0280 0.0233 0.0216
888888 0.0600 0.0600 0.0040 0.0240 0.0110 0.0129
889889 0.2000 0.2000 0.0500 0.1000 0.1590 0.1590
890890 0.4000 0.4000 0.1900 0.1900 0.6313 0.6312
891891 0.2000 0.2000 0.1200 0.2400 0.3630 0.3630
892892 0.3500 0.3500 0.0350 0.0350 0.1000 0.1001
893893 0.1500 0.1500 0.0450 0.1200 0.1400 0.1403
894894 0.1500 0.1500 0.0350 0.0950 0.1000 0.0999
895895 0.0600 0.0600 0.0180 0.1230 0.0490 0.0487
896896 0.1200 0.0600 0.0180 0.0670 0.1000 0.0999
897897 0.1000 0.1000 0.0070 0.0280 0.0233 0.0216
898898 0.0600 0.0600 0.0040 0.0240 0.0110 0.0129
899899 0.2000 0.2000 0.0500 0.1000 0.1590 0.1590
Network did not converge
Total number of iterations = 900000

Confirm Training Cases
0.4000 0.4000 0.1900 0.1900 0.6313 0.6312
0.2000 0.2000 0.1200 0.2400 0.3630 0.3630
0.3500 0.3500 0.0350 0.0350 0.1000 0.1001
0.1500 0.1500 0.0450 0.1200 0.1400 0.1403
0.1500 0.1500 0.0350 0.0950 0.1000 0.0999
0.0600 0.0600 0.0180 0.1230 0.0490 0.0488
0.1200 0.0600 0.0180 0.0670 0.1000 0.1000
0.1000 0.1000 0.0070 0.0280 0.0233 0.0216
0.0600 0.0600 0.0040 0.0240 0.0110 0.0129
0.2000 0.2000 0.0500 0.1000 0.1590 0.1590
Sum Squared Error for Training cases = 0.0172
% of Training Cases that meet criteria = 80.00

Running Test Cases
0.2000 0.2000 0.0300 0.0600 0.1000 0.0791
0.3000 0.3000 0.0950 0.1270 0.3900 0.3873
0.1500 0.1500 0.0100 0.0270 0.0415 0.0318
0.4000 0.4000 0.1200 0.1200 0.4830 0.4943
0.2800 0.2800 0.1200 0.1710 0.3900 0.4794
0.1700 0.1700 0.0200 0.0470 0.0700 0.0522
Sum Squared Error for Testing cases = 0.0189
% of Testing Cases that meet criteria = 83.33

Then we apply the non-linear scaling to the algorithm and execute it again.
Below is the output for the execution.

344344 0.1500 0.1500 0.0350 0.0950 0.1000 0.0919
345345 0.0600 0.0600 0.0180 0.1230 0.0490 0.0490
346346 0.1200 0.0600 0.0180 0.0670 0.1000 0.1000
347347 0.1000 0.1000 0.0070 0.0280 0.0233 0.0238
348348 0.0600 0.0600 0.0040 0.0240 0.0110 0.0099
349349 0.2000 0.2000 0.0500 0.1000 0.1590 0.1613
350350 0.4000 0.4000 0.1900 0.1900 0.6313 0.6287
351351 0.2000 0.2000 0.1200 0.2400 0.3630 0.3621
352352 0.3500 0.3500 0.0350 0.0350 0.1000 0.1000
353353 0.1500 0.1500 0.0450 0.1200 0.1400 0.1461
354354 0.1500 0.1500 0.0350 0.0950 0.1000 0.0922
355355 0.0600 0.0600 0.0180 0.1230 0.0490 0.0490
356356 0.1200 0.0600 0.0180 0.0670 0.1000 0.1000
357357 0.1000 0.1000 0.0070 0.0280 0.0233 0.0237
358358 0.0600 0.0600 0.0040 0.0240 0.0110 0.0100
Converged to within criteria
Total number of iterations = 359299

Confirm Training Cases
0.2000 0.2000 0.0500 0.1000 0.1590 0.1612
0.4000 0.4000 0.1900 0.1900 0.6313 0.6288
0.2000 0.2000 0.1200 0.2400 0.3630 0.3623
0.3500 0.3500 0.0350 0.0350 0.1000 0.1003
0.1500 0.1500 0.0450 0.1200 0.1400 0.1462
0.1500 0.1500 0.0350 0.0950 0.1000 0.0931
0.0600 0.0600 0.0180 0.1230 0.0490 0.0491
0.1200 0.0600 0.0180 0.0670 0.1000 0.1003
0.1000 0.1000 0.0070 0.0280 0.0233 0.0237
0.0600 0.0600 0.0040 0.0240 0.0110 0.0100
Sum Squared Error for Training cases = 0.0001
% of Training Cases that meet criteria = 100

Running Test Cases
0.2000 0.2000 0.0300 0.0600 0.1000 0.1020
0.3000 0.3000 0.0950 0.1270 0.3900 0.3746
0.1500 0.1500 0.0100 0.0270 0.0415 0.0632
0.4000 0.4000 0.1200 0.1200 0.4830 0.4994
0.2800 0.2800 0.1200 0.1710 0.3900 0.4814
0.1700 0.1700 0.0200 0.0470 0.0700 0.0905
Sum Squared Error for Testing cases = 0.0307
% of Testing Cases that meet criteria = 100

2. Result for the two-dimensional projectile motion dataset

This dataset is first applied to the linear scaling version of the backpropagation algorithm. Below is the output of this execution.
89977888 72.7617 44.5306 0.0000 36.0000 2.0000 4.3939 58.4399 43.8725 48.7751 104.7192
89978889 -172.7895 -32.8454 0.0000 39.0000 3.0000 4.3372 47.5066 174.3488 55.0915 149.3253
89979890 -21.1724 23.4599 0.0000 35.0000 5.0000 0.0000 7.6959 135.0000 27.2050 180.0532
89980891 110.5869 16.1809 0.0000 37.0000 1.0000 2.6621 56.5958 3.1315 50.9714 199.3150
89981892 261.0820 46.3084 0.0000 40.0000 8.0000 0.0000 81.7613 90.0000 69.1375 100.5568
89982893 23.7422 -22.4497 0.0000 32.0000 2.0000 5.0736 84.6955 62.7784 54.7421 225.9713
89983894 -42.9145 -19.9635 0.0000 32.0000 5.0000 0.2630 15.9800 203.5113 35.1318 186.7456
89984895 43.8675 111.6771 0.0000 36.0000 7.0000 0.0000 28.8077 90.0000 46.1808 105.9781
89985896 -13.0707 255.3198 0.0000 46.0000 9.0000 1.7590 27.0359 89.2597 51.0034 90.3390
89986897 -79.0395 41.7969 0.0000 33.0000 9.0000 1.7895 21.1974 167.8664 29.6163 205.0539
89987898 -19.6245 -177.9393 0.0000 40.0000 2.0000 3.6936 42.0222 272.0479 45.2451 273.9994
89988899 -32.5601 71.8381 0.0000 40.0000 6.0000 4.7127 18.2334 109.6095 35.5439 127.9712
89989900 31.1128 -148.6197 0.0000 43.0000 1.0000 0.0000 64.0141 270.0000 53.6213 287.1136
89990901 171.8071 -108.0815 0.0000 43.0000 8.0000 0.0000 27.0544 315.0000 41.7095 284.4379
89991902 -171.9108 18.1377 0.0000 36.0000 8.0000 3.4680 71.1371 82.0294 68.8775 160.8337
89992903 -74.4172 101.2078 0.0000 40.0000 9.0000 2.2764 18.1455 125.0931 46.5662 171.3997
89993904 -148.1408 -91.9054 0.0000 32.0000 8.0000 3.1597 49.0996 240.6482 53.6533 225.2897
89994905 125.9776 -16.5148 0.0000 46.0000 2.0000 5.4039 78.1282 33.9058 46.4541 176.2502
89995906 -27.2345 -52.1089 0.0000 32.0000 6.0000 2.5699 51.3839 292.1707 41.0238 202.6173
89996907 30.2260 1.8990 0.0000 32.0000 2.0000 2.7425 70.3060 348.8768 58.3618 168.6078
89997908 -2.2989 71.5865 0.0000 40.0000 1.0000 1.2326 82.4052 107.9166 39.6973 98.3865
89998909 152.0903 42.7557 0.0000 31.0000 8.0000 0.0000 77.0752 90.0000 56.6407 102.4234
89999910 91.4196 -72.5273 0.0000 46.0000 1.0000 5.5390 15.8439 321.6954 40.5753 224.6502
Network did not converge
Total number of iterations = 90000000

Confirm Training Cases
178.9687 23.4961 0.0000 46.0000 5.0000 0.0125 21.9821 8.8686 64.6341 113.7757
230.7672 130.0264 0.0000 46.0000 3.0000 0.0000 60.8047 45.0000 51.8464 -19.9321
-21.1724 23.4599 0.0000 35.0000 5.0000 0.0000 7.6959 135.0000 36.6128 177.0297
-8.8855 86.9114 0.0000 36.0000 7.0000 6.0688 26.2457 115.3928 42.0572 134.7640
104.1476 67.7624 0.0000 31.0000 3.0000 0.0000 51.1223 45.0000 66.8488 -19.9321
11.7676 -92.6438 0.0000 40.0000 4.0000 0.0000 17.2861 270.0000 54.8268 288.5219
69.6204 -9.6036 0.0000 43.0000 9.0000 5.3348 53.1830 109.3070 29.8834 186.5350
25.2443 -25.2443 0.0000 40.0000 0.0000 0.0000 83.6846 315.0000 58.6743 245.8078
2.2759 -52.7335 0.0000 35.0000 5.0000 0.6854 13.6466 266.7787 51.6609 252.8812
10.1286 -133.8500 0.0000 32.0000 7.0000 4.9717 82.0924 164.3093 75.9074 242.7174
-117.9191 81.4699 0.0000 39.0000 8.0000 2.2939 71.2305 290.0352 43.1565 167.2508
51.0201 0.5967 0.0000 35.0000 2.0000 0.0000 89.8632 90.0000 58.0049 100.6049
-234.9254 -39.5119 0.0000 40.0000 7.0000 3.8667 64.5243 136.9512 61.2702 182.8140
245.8805 -161.6241 0.0000 37.0000 9.0000 5.9184 71.0984 291.4706 78.8932 318.4534
-258.3213 171.8963 0.0000 43.0000 8.0000 2.3965 82.4826 223.0216 72.9146 118.7620
44.6163 -90.0276 0.0000 33.0000 4.0000 4.1680 64.2199 341.8543 63.1254 282.0471
-234.4236 -131.9078 0.0000 40.0000 7.0000 3.6872 82.8753 198.4846 62.1360 184.4935
92.7752 -74.8569 0.0000 35.0000 3.0000 0.0000 28.9386 315.0000 64.6133 361.5471
-83.1771 13.3336 0.0000 37.0000 2.0000 0.6102 19.8037 173.8225 38.5355 178.5257
41.6193 25.2473 0.0000 40.0000 7.0000 0.9806 7.9414 28.8339 36.5113 162.7288
5.8573 59.3921 0.0000 33.0000 3.0000 1.8828 83.0231 350.5439 46.6688 130.2809
78.1169 34.8598 0.0000 33.0000 5.0000 2.2508 30.5585 7.1259 52.5647 142.8254
-30.3512 -140.0638 0.0000 40.0000 0.0000 1.1769 59.3108 257.7733 64.2965 295.4913
77.9839 -170.2764 0.0000 40.0000 8.0000 5.1735 77.7614 121.4381 75.5442 248.7630
Sum Squared Error for Training cases = 49.8366
% of Training Cases that meet criteria = 4.84

Running Test Cases
-136.5831 10.4916 0.0000 39.0000 7.0000 1.9283 26.7958 194.7109 47.2232 187.9361
34.3398 210.0049 0.0000 49.0000 1.0000 4.5090 34.5423 80.5470 55.5612 84.1371
82.9870 -124.3327 0.0000 32.0000 9.0000 4.9309 33.7814 316.5595 75.4170 245.3352
-48.6279 40.2555 0.0000 35.0000 9.0000 0.5604 37.4732 182.3330 31.0888 190.3178
153.0100 174.3223 0.0000 46.0000 4.0000 1.3613 73.0171 7.9212 55.5231 84.4360
368.8978 -0.0000 0.0000 49.0000 9.0000 0.0000 34.0978 0.0000 86.1550 153.1886
-165.1387 119.6764 0.0000 37.0000 4.0000 2.7543 53.3036 136.5933 64.9642 119.2699
15.6687 -31.5330 0.0000 37.0000 7.0000 0.5740 7.4114 291.1254 38.8739 209.5419
-62.0044 -41.2591 0.0000 32.0000 4.0000 5.8651 27.5143 203.2999 43.6235 201.7343
-317.2747 23.8801 0.0000 49.0000 6.0000 3.8348 40.9752 154.0088 62.2519 179.2567
-59.0752 -40.2403 0.0000 31.0000 6.0000 3.9809 18.7016 211.4226 44.0200 204.6961
152.6800 -0.0000 0.0000 32.0000 4.0000 0.0000 62.5278 0.0000 63.4093 -19.9315
109.5231 -110.5396 0.0000 40.0000 7.0000 0.4121 31.4505 290.6868 61.7914 292.2250
-51.7418 -13.9461 0.0000 40.0000 1.0000 2.0395 79.8504 229.0027 21.3260 237.4694
203.8975 -150.3364 0.0000 46.0000 2.0000 5.6647 59.8957 323.2586 65.7052 370.1521
-0.2333 0.3988 0.0000 43.0000 9.0000 0.3752 0.0702 120.3868 27.8146 196.5201
108.3641 -148.6955 0.0000 43.0000 9.0000 5.1323 68.8050 96.5263 69.6681 241.5652
17.3378 -97.6280 0.0000 37.0000 7.0000 0.0000 49.3766 225.0000 52.3217 289.5082
Sum Squared Error for Testing cases = 1.8085
% of Testing Cases that meet criteria = 22.22

Then we apply the non-linear scaling to the algorithm and execute it again. Below is the output for the execution.
89977888 72.7617 44.5306 0.0000 36.0000 2.0000 4.3939 58.4399 43.8725 46.0570 91.6389
89978889 -172.7895 -32.8454 0.0000 39.0000 3.0000 4.3372 47.5066 174.3489 54.7174 151.1034
89979890 -21.1724 23.4599 0.0000 35.0000 5.0000 0.0000 7.6959 135.0000 26.1861 188.3340
89980891 110.5869 16.1809 0.0000 37.0000 1.0000 2.6621 56.5958 3.1315 50.7372 224.9430
89981892 261.0820 46.3084 0.0000 40.0000 8.0000 0.0000 81.7612 90.0000 75.3395 -7.4797
89982893 23.7422 -22.4497 0.0000 32.0000 2.0000 5.0736 84.6955 62.7784 49.8706 255.6838
89983894 -42.9145 -19.9635 0.0000 32.0000 5.0000 0.2630 15.9800 203.5113 23.9576 205.8927
89984895 43.8675 111.6771 0.0000 36.0000 7.0000 0.0000 28.8077 90.0000 46.8638 126.0627
89985896 -13.0707 255.3198 0.0000 46.0000 9.0000 1.7590 27.0360 89.2597 53.3963 72.4230
89986897 -79.0395 41.7969 0.0000 33.0000 9.0000 1.7895 21.1974 167.8664 31.3773 224.1386
89987898 -19.6245 -177.9393 0.0000 40.0000 2.0000 3.6936 42.0222 272.0479 42.8211 313.6628
89988899 -32.5601 71.8381 0.0000 40.0000 6.0000 4.7127 18.2334 109.6095 41.5594 107.9797
89989900 31.1128 -148.6197 0.0000 43.0000 1.0000 0.0000 64.0141 270.0000 52.2251 274.2804
89990901 171.8071 -108.0815 0.0000 43.0000 8.0000 0.0000 27.0544 315.0000 46.8173 299.3389
89991902 -171.9108 18.1377 0.0000 36.0000 8.0000 3.4680 71.1371 82.0294 67.1349 150.1788
89992903 -74.4172 101.2078 0.0000 40.0000 9.0000 2.2764 18.1455 125.0931 50.9134 184.5170
89993904 -148.1408 -91.9054 0.0000 32.0000 8.0000 3.1597 49.0996 240.6482 40.3980 266.6870
89994905 125.9776 -16.5148 0.0000 46.0000 2.0000 5.4039 78.1282 33.9058 38.9918 181.3696
89995906 -27.2345 -52.1089 0.0000 32.0000 6.0000 2.5699 51.3839 292.1707 25.0806 236.7288
89996907 30.2260 1.8990 0.0000 32.0000 2.0000 2.7425 70.3060 348.8768 47.1975 200.4385
89997908 -2.2989 71.5865 0.0000 40.0000 1.0000 1.2326 82.4052 107.9166 36.2531 112.0635
89998909 152.0903 42.7557 0.0000 31.0000 8.0000 0.0000 77.0752 90.0000 55.1508 -9.7306
89999910 91.4196 -72.5273 0.0000 46.0000 1.0000 5.5390 15.8439 321.6954 46.0183 185.1858
Network did not converge
Total number of iterations = 90000000

Confirm Training Cases
178.9687 23.4961 0.0000 46.0000 5.0000 0.0125 21.9821 8.8686 53.6210 131.0318
230.7672 130.0264 0.0000 46.0000 3.0000 0.0000 60.8046 45.0000 60.1170 -9.9999
-21.1724 23.4599 0.0000 35.0000 5.0000 0.0000 7.6959 135.0000 39.6184 186.0025
-8.8855 86.9114 0.0000 36.0000 7.0000 6.0688 26.2457 115.3928 41.9491 135.4553
104.1476 67.7624 0.0000 31.0000 3.0000 0.0000 51.1223 45.0000 69.8545 -9.9995
11.7676 -92.6438 0.0000 40.0000 4.0000 0.0000 17.2861 270.0000 49.0758 279.7641
69.6204 -9.6036 0.0000 43.0000 9.0000 5.3348 53.1830 109.3070 32.4829 180.1147
25.2443 -25.2443 0.0000 40.0000 0.0000 0.0000 83.6846 315.0000 53.8916 197.8363
2.2759 -52.7335 0.0000 35.0000 5.0000 0.6854 13.6466 266.7787 46.9608 260.0039
10.1286 -133.8500 0.0000 32.0000 7.0000 4.9717 82.0924 164.3093 69.2704 266.0002
-117.9191 81.4699 0.0000 39.0000 8.0000 2.2939 71.2305 290.0352 43.9545 179.6308
51.0201 0.5967 0.0000 35.0000 2.0000 0.0000 89.8632 90.0000 63.5299 56.4707
-234.9254 -39.5119 0.0000 40.0000 7.0000 3.8667 64.5243 136.9512 62.5969 171.8357
245.8805 -161.6241 0.0000 37.0000 9.0000 5.9184 71.0984 291.4706 69.7580 278.4772
-258.3213 171.8963 0.0000 43.0000 8.0000 2.3965 82.4826 223.0216 72.1274 131.7584
44.6163 -90.0276 0.0000 33.0000 4.0000 4.1680 64.2199 341.8543 67.9425 263.2658
-234.4236 -131.9078 0.0000 40.0000 7.0000 3.6872 82.8753 198.4846 62.4416 177.4301
92.7752 -74.8569 0.0000 35.0000 3.0000 0.0000 28.9385 315.0000 67.7422 326.8099
-83.1771 13.3336 0.0000 37.0000 2.0000 0.6102 19.8037 173.8225 38.4212 193.8217
41.6193 25.2473 0.0000 40.0000 7.0000 0.9806 7.9414 28.8339 40.2185 170.1645
5.8573 59.3921 0.0000 33.0000 3.0000 1.8828 83.0230 350.5439 46.7700 150.4367
78.1169 34.8598 0.0000 33.0000 5.0000 2.2508 30.5585 7.1259 49.2888 169.7477
-30.3512 -140.0638 0.0000 40.0000 0.0000 1.1769 59.3108 257.7733 62.1667 304.8369
77.9839 -170.2764 0.0000 40.0000 8.0000 5.1735 77.7614 121.4381 67.8393 268.6543
Sum Squared Error for Training cases = 59.0457
% of Training Cases that meet criteria = 12.168

Running Test Cases
-136.5831 10.4916 0.0000 39.0000 7.0000 1.9283 26.7958 194.7109 44.8325 226.2114
34.3398 210.0049 0.0000 49.0000 1.0000 4.5090 34.5423 80.5470 55.0247 66.7159
82.9870 -124.3327 0.0000 32.0000 9.0000 4.9309 33.7814 316.5595 68.5860 268.8288
-48.6279 40.2555 0.0000 35.0000 9.0000 0.5604 37.4732 182.3330 36.8600 190.5780
153.0100 174.3223 0.0000 46.0000 4.0000 1.3613 73.0171 7.9212 47.6661 111.9258
368.8978 -0.0000 0.0000 49.0000 9.0000 0.0000 34.0978 0.0000 78.9937 178.7031
-165.1387 119.6764 0.0000 37.0000 4.0000 2.7543 53.3036 136.5933 66.8384 121.7134
15.6687 -31.5330 0.0000 37.0000 7.0000 0.5740 7.4114 291.1255 33.3119 224.1601
-62.0044 -41.2591 0.0000 32.0000 4.0000 5.8651 27.5143 203.2999 47.1139 191.4429
-317.2747 23.8801 0.0000 49.0000 6.0000 3.8348 40.9752 154.0088 62.9408 171.2888
-59.0752 -40.2403 0.0000 31.0000 6.0000 3.9809 18.7016 211.4226 44.8314 188.5400
152.6800 -0.0000 0.0000 32.0000 4.0000 0.0000 62.5278 0.0000 71.7174 -9.8828
109.5231 -110.5396 0.0000 40.0000 7.0000 0.4121 31.4505 290.6868 63.3577 296.6970
-51.7418 -13.9461 0.0000 40.0000 1.0000 2.0395 79.8504 229.0027 30.5632 213.2331
203.8975 -150.3364 0.0000 46.0000 2.0000 5.6647 59.8957 323.2586 68.3636 314.5242
-0.2333 0.3988 0.0000 43.0000 9.0000 0.3752 0.0702 120.3868 29.7459 198.8738
108.3641 -148.6955 0.0000 43.0000 9.0000 5.1323 68.8050 96.5263 63.8980 260.3573
17.3378 -97.6280 0.0000 37.0000 7.0000 0.0000 49.3766 225.0000 50.1579 285.8929
Sum Squared Error for Testing cases = 2.1026
% of Testing Cases that meet criteria = 15.556

3. Result for the wine recognition dataset

This dataset is first applied to the linear scaling version of the backpropagation algorithm. Below is the output of this execution.
260260 13.5000 1.8100 2.6100 20.0000 96.0000 2.5300 2.6100 0.2800 1.6600 3.5200 1.1200 3.8200 845.0000 1.0000 1.0407
261261 12.6000 2.4600 2.2000 18.5000 94.0000 1.6200 0.6600 0.6300 0.9400 7.1000 0.7300 1.5800 695.0000 3.0000 3.0017
262262 13.3400 0.9400 2.3600 17.0000 110.0000 2.5300 1.3000 0.5500 0.4200 3.1700 1.0200 1.9300 750.0000 2.0000 2.0011
263263 13.2000 1.7800 2.1400 11.2000 100.0000 2.6500 2.7600 0.2600 1.2800 4.3800 1.0500 3.4000 1050.0000 1.0000 0.9780
264264 11.7600 2.6800 2.9200 20.0000 103.0000 1.7500 2.0300 0.6000 1.0500 3.8000 1.2300 2.5000 607.0000 2.0000 1.9211
265265 14.2100 4.0400 2.4400 18.9000 111.0000 2.8500 2.6500 0.3000 1.2500 5.2400 0.8700 3.3300 1080.0000 1.0000 1.0663
266266 13.8400 4.1200 2.3800 19.5000 89.0000 1.8000 0.8300 0.4800 1.5600 9.0100 0.5700 1.6400 480.0000 3.0000 3.0046
267267 12.0800 1.3300 2.3000 23.6000 70.0000 2.2000 1.5900 0.4200 1.3800 1.7400 1.0700 3.2100 625.0000 2.0000 2.0348
268268 13.7100 1.8600 2.3600 16.6000 101.0000 2.6100 2.8800 0.2700 1.6900 3.8000 1.1100 4.0000 1035.0000 1.0000 0.9802
269269 12.7000 3.5500 2.3600 21.5000 106.0000 1.7000 1.2000 0.1700 0.8400 5.0000 0.7800 1.2900 600.0000 3.0000 3.0420
270270 13.1100 1.0100 1.7000 15.0000 78.0000 2.9800 3.1800 0.2600 2.2800 5.3000 1.1200 3.1800 502.0000 2.0000 2.0189
271271 14.1300 4.1000 2.7400 24.5000 96.0000 2.0500 0.7600 0.5600 1.3500 9.2000 0.6100 1.6000 560.0000 3.0000 2.9957
272272 11.4600 3.7400 1.8200 19.5000 107.0000 3.1800 2.5800 0.2400 3.5800 2.9000 0.7500 2.8100 562.0000 2.0000 2.0450
273273 13.2400 3.9800 2.2900 17.5000 103.0000 2.6400 2.6300 0.3200 1.6600 4.3600 0.8200 3.0000 680.0000 1.0000 1.0531
274274 12.5800 1.2900 2.1000 20.0000 103.0000 1.4800 0.5800 0.5300 1.4000 7.6000 0.5800 1.5500 640.0000 3.0000 3.0433
Converged to within criteria
Total number of iterations = 275049

Confirm Training Cases
12.8500 1.6000 2.5200 17.8000 95.0000 2.4800 2.3700 0.2600 1.4600 3.9300 1.0900 3.6300 1015.0000 1.0000 1.0222
13.5000 1.8100 2.6100 20.0000 96.0000 2.5300 2.6100 0.2800 1.6600 3.5200 1.1200 3.8200 845.0000 1.0000 1.0351
13.0500 2.0500 3.2200 25.0000 124.0000 2.6300 2.6800 0.4700 1.9200 3.5800 1.1300 3.2000 830.0000 1.0000 1.0925
13.3900 1.7700 2.6200 16.1000 93.0000 2.8500 2.9400 0.3400 1.4500 4.8000 0.9200 3.2200 1195.0000 1.0000 0.9826
13.3000 1.7200 2.1400 17.0000 94.0000 2.4000 2.1900 0.2700 1.3500 3.9500 1.0200 2.7700 1285.0000 1.0000 1.0050
13.8700 1.9000 2.8000 19.4000 107.0000 2.9500 2.9700 0.3700 1.7600 4.5000 1.2500 3.4000 915.0000 1.0000 0.9845
14.0200 1.6800 2.2100 16.0000 96.0000 2.6500 2.3300 0.2600 1.9800 4.7000 1.0400 3.5900 1035.0000 1.0000 0.9831
13.7300 1.5000 2.7000 22.5000 101.0000 3.0000 3.2500 0.2900 2.3800 5.7000 1.1900 2.7100 1285.0000 1.0000 0.9824
13.5800 1.6600 2.3600 19.1000 106.0000 2.8600 3.1900 0.2200 1.9500 6.9000 1.0900 2.8800 1515.0000 1.0000 0.9825
13.6800 1.8300 2.3600 17.2000 104.0000 2.4200 2.6900 0.4200 1.9700 3.8400 1.2300 2.8700 990.0000 1.0000 1.0146
13.7600 1.5300 2.7000 19.5000 132.0000 2.9500 2.7400 0.5000 1.3500 5.4000 1.2500 3.0000 1235.0000 1.0000 0.9849
13.5100 1.8000 2.6500 19.0000 110.0000 2.3500 2.5300 0.2900 1.5400 4.2000 1.1000 2.8700 1095.0000 1.0000 0.9915
13.4800 1.8100 2.4100 20.5000 100.0000 2.7000 2.9800 0.2600 1.8600 5.1000 1.0400 3.4700 920.0000 1.0000 1.0307
13.2800 1.6400 2.8400 15.5000 110.0000 2.6000 2.6800 0.3400 1.3600 4.6000 1.0900 2.7800 880.0000 1.0000 0.9861
13.0500 1.6500 2.5500 18.0000 98.0000 2.4500 2.4300 0.2900 1.4400 4.2500 1.1200 2.5100 1105.0000 1.0000 1.0132
13.0700 1.5000 2.1000 15.5000 98.0000 2.4000 2.6400 0.2800 1.3700 3.7000 1.1800 2.6900 1020.0000 1.0000 1.1099
Sum Squared Error for Training cases = 0.0596
% of Training Cases that meet criteria = 99.4382

Running Test Cases
14.3700 1.9500 2.5000 16.8000 113.0000 3.8500 3.4900 0.2400 2.1800 7.8000 0.8600 3.4500 1480.0000 1.0000 1.0179
14.1000 2.1600 2.3000 18.0000 105.0000 2.9500 3.3200 0.2200 2.3800 5.7500 1.2500 3.1700 1510.0000 1.0000 0.9749
13.0500 1.7300 2.0400 12.4000 92.0000 2.7200 3.2700 0.1700 2.9100 7.2000 1.1200 2.9100 1150.0000 1.0000 0.9917
13.7400 1.6700 2.2500 16.4000 118.0000 2.6000 2.9000 0.2100 1.6200 5.8500 0.9200 3.2000 1060.0000 1.0000 0.9817
13.5600 1.7300 2.4600 20.5000 116.0000 2.9600 2.7800 0.2000 2.4500 6.2500 0.9800 3.0300 1120.0000 1.0000 0.9911
12.7200 1.8100 2.2000 18.8000 86.0000 2.2000 2.5300 0.2600 1.7700 3.9000 1.1600 3.1400 714.0000 2.0000 1.9294
12.4700 1.5200 2.2000 19.0000 162.0000 2.5000 2.2700 0.3200 3.2800 2.6000 1.1600 2.6300 937.0000 2.0000 1.9731
11.6100 1.3500 2.7000 20.0000 94.0000 2.7400 2.9200 0.2900 2.4900 2.6500 0.9600 3.2600 680.0000 2.0000 2.0100
11.8700 4.3100 2.3900 21.0000 82.0000 2.8600 3.0300 0.2100 2.9100 2.8000 0.7500 3.6400 380.0000 2.0000 2.0596
12.0700 2.1600 2.1700 21.0000 85.0000 2.6000 2.6500 0.3700 1.3500 2.7600 0.8600 3.2800 378.0000 2.0000 2.0362
12.8600 1.3500 2.3200 18.0000 122.0000 1.5100 1.2500 0.2100 0.9400 4.1000 0.7600 1.2900 630.0000 3.0000 2.9178
13.0800 3.9000 2.3600 21.5000 113.0000 1.4100 1.3900 0.3400 1.1400 9.4000 0.5700 1.3300 550.0000 3.0000 3.0236
14.3400 1.6800 2.7000 25.0000 98.0000 2.8000 1.3100 0.5300 2.7000 13.0000 0.5700 1.9600 660.0000 3.0000 2.9585
13.4800 1.6700 2.6400 22.5000 89.0000 2.6000 1.1000 0.5200 2.2900 11.7500 0.5700 1.7800 620.0000 3.0000 2.9972
13.7100 5.6500 2.4500 20.5000 95.0000 1.6800 0.6100 0.5200 1.0600 7.7000 0.6400 1.7400 740.0000 3.0000 2.9643
Sum Squared Error for Testing cases = 0.0045
% of Testing Cases that meet criteria = 100.0000

Then we apply the non-linear scaling to the algorithm and execute it again. Below is the output for the execution.
290290 13.5000 3.1200 2.6200 24.0000 123.0000 1.4000 1.5700 0.2200 1.2500 8.6000 0.5900 1.3000 500.0000 3.0000 3.0386
291291 13.0500 3.8600 2.3200 22.5000 85.0000 1.6500 1.5900 0.6100 1.6200 4.8000 0.8400 2.0100 515.0000 2.0000 2.0194
292292 14.3000 1.9200 2.7200 20.0000 120.0000 2.8000 3.1400 0.3300 1.9700 6.2000 1.0700 2.6500 1280.0000 1.0000 0.9919
293293 11.7900 2.1300 2.7800 28.5000 92.0000 2.1300 2.2400 0.5800 1.7600 3.0000 0.9700 2.4400 466.0000 2.0000 1.9906
294294 12.3300 1.1000 2.2800 16.0000 101.0000 2.0500 1.0900 0.6300 0.4100 3.2700 1.2500 1.6700 680.0000 2.0000 2.0120
295295 12.7700 2.3900 2.2800 19.5000 86.0000 1.3900 0.5100 0.4800 0.6400 9.9000 0.5700 1.6300 470.0000 3.0000 3.0509
296296 12.5100 1.7300 1.9800 20.5000 85.0000 2.2000 1.9200 0.3200 1.4800 2.9400 1.0400 3.5700 672.0000 2.0000 2.0025
297297 13.0500 1.6500 2.5500 18.0000 98.0000 2.4500 2.4300 0.2900 1.4400 4.2500 1.1200 2.5100 1105.0000 1.0000 1.0075
298298 13.3200 3.2400 2.3800 21.5000 92.0000 1.9300 0.7600 0.4500 1.2500 8.4200 0.5500 1.6200 650.0000 3.0000 3.0420
299299 12.7200 1.8100 2.2000 18.8000 86.0000 2.2000 2.5300 0.2600 1.7700 3.9000 1.1600 3.1400 714.0000 2.0000 2.0321
300300 14.3800 1.8700 2.3800 12.0000 102.0000 3.3000 3.6400 0.2900 2.9600 7.5000 1.2000 3.0000 1547.0000 1.0000 0.9786
301301 12.0700 2.1600 2.1700 21.0000 85.0000 2.6000 2.6500 0.3700 1.3500 2.7600 0.8600 3.2800 378.0000 2.0000 2.0124
302302 13.7200 1.4300 2.5000 16.7000 108.0000 3.4000 3.6700 0.1900 2.0400 6.8000 0.8900 2.8700 1285.0000 1.0000 0.9806
303303 13.4000 4.6000 2.8600 25.0000 112.0000 1.9800 0.9600 0.2700 1.1100 8.5000 0.6700 1.9200 630.0000 3.0000 3.0243
304304 12.3400 2.4500 2.4600 21.0000 98.0000 2.5600 2.1100 0.3400 1.3100 2.8000 0.8000 3.3800 438.0000 2.0000 2.0512
305305 13.4800 1.8100 2.4100 20.5000 100.0000 2.7000 2.9800 0.2600 1.8600 5.1000 1.0400 3.4700 920.0000 1.0000 1.0297
306306 13.8800 5.0400 2.2300 20.0000 80.0000 0.9800 0.3400 0.4000 0.6800 4.9000 0.5800 1.3300 415.0000 3.0000 3.0467
Converged to within criteria
Total number of iterations = 306377

Confirm Training Cases
13.5000 1.8100 2.6100 20.0000 96.0000 2.5300 2.6100 0.2800 1.6600 3.5200 1.1200 3.8200 845.0000 1.0000 1.0242
13.0500 2.0500 3.2200 25.0000 124.0000 2.6300 2.6800 0.4700 1.9200 3.5800 1.1300 3.2000 830.0000 1.0000 1.0880
13.3900 1.7700 2.6200 16.1000 93.0000 2.8500 2.9400 0.3400 1.4500 4.8000 0.9200 3.2200 1195.0000 1.0000 0.9815
13.3000 1.7200 2.1400 17.0000 94.0000 2.4000 2.1900 0.2700 1.3500 3.9500 1.0200 2.7700 1285.0000 1.0000 0.9941
13.8700 1.9000 2.8000 19.4000 107.0000 2.9500 2.9700 0.3700 1.7600 4.5000 1.2500 3.4000 915.0000 1.0000 0.9858
14.0200 1.6800 2.2100 16.0000 96.0000 2.6500 2.3300 0.2600 1.9800 4.7000 1.0400 3.5900 1035.0000 1.0000 0.9832
13.7300 1.5000 2.7000 22.5000 101.0000 3.0000 3.2500 0.2900 2.3800 5.7000 1.1900 2.7100 1285.0000 1.0000 0.9855
13.5800 1.6600 2.3600 19.1000 106.0000 2.8600 3.1900 0.2200 1.9500 6.9000 1.0900 2.8800 1515.0000 1.0000 0.9818
13.6800 1.8300 2.3600 17.2000 104.0000 2.4200 2.6900 0.4200 1.9700 3.8400 1.2300 2.8700 990.0000 1.0000 1.0105
13.7600 1.5300 2.7000 19.5000 132.0000 2.9500 2.7400 0.5000 1.3500 5.4000 1.2500 3.0000 1235.0000 1.0000 0.9828
13.5100 1.8000 2.6500 19.0000 110.0000 2.3500 2.5300 0.2900 1.5400 4.2000 1.1000 2.8700 1095.0000 1.0000 0.9871
13.4800 1.8100 2.4100 20.5000 100.0000 2.7000 2.9800 0.2600 1.8600 5.1000 1.0400 3.4700 920.0000 1.0000 1.0290
13.2800 1.6400 2.8400 15.5000 110.0000 2.6000 2.6800 0.3400 1.3600 4.6000 1.0900 2.7800 880.0000 1.0000 0.9843
13.0500 1.6500 2.5500 18.0000 98.0000 2.4500 2.4300 0.2900 1.4400 4.2500 1.1200 2.5100 1105.0000 1.0000 1.0071
13.0700 1.5000 2.1000 15.5000 98.0000 2.4000 2.6400 0.2800 1.3700 3.7000 1.1800 2.6900 1020.0000 1.0000 1.0992
Sum Squared Error for Training cases = 0.0435
% of Training Cases that meet criteria = 100.0000

Running Test Cases
14.3700 1.9500 2.5000 16.8000 113.0000 3.8500 3.4900 0.2400 2.1800 7.8000 0.8600 3.4500 1480.0000 1.0000 0.9942
14.1000 2.1600 2.3000 18.0000 105.0000 2.9500 3.3200 0.2200 2.3800 5.7500 1.2500 3.1700 1510.0000 1.0000 0.9787
13.0500 1.7300 2.0400 12.4000 92.0000 2.7200 3.2700 0.1700 2.9100 7.2000 1.1200 2.9100 1150.0000 1.0000 0.9940
13.7400 1.6700 2.2500 16.4000 118.0000 2.6000 2.9000 0.2100 1.6200 5.8500 0.9200 3.2000 1060.0000 1.0000 0.9816
13.5600 1.7300 2.4600 20.5000 116.0000 2.9600 2.7800 0.2000 2.4500 6.2500 0.9800 3.0300 1120.0000 1.0000 0.9912
12.7200 1.8100 2.2000 18.8000 86.0000 2.2000 2.5300 0.2600 1.7700 3.9000 1.1600 3.1400 714.0000 2.0000 2.0376
12.4700 1.5200 2.2000 19.0000 162.0000 2.5000 2.2700 0.3200 3.2800 2.6000 1.1600 2.6300 937.0000 2.0000 1.9519
11.6100 1.3500 2.7000 20.0000 94.0000 2.7400 2.9200 0.2900 2.4900 2.6500 0.9600 3.2600 680.0000 2.0000 1.9965
11.8700 4.3100 2.3900 21.0000 82.0000 2.8600 3.0300 0.2100 2.9100 2.8000 0.7500 3.6400 380.0000 2.0000 2.0498
12.0700 2.1600 2.1700 21.0000 85.0000 2.6000 2.6500 0.3700 1.3500 2.7600 0.8600 3.2800 378.0000 2.0000 2.0286
12.8600 1.3500 2.3200 18.0000 122.0000 1.5100 1.2500 0.2100 0.9400 4.1000 0.7600 1.2900 630.0000 3.0000 2.9333
13.0800 3.9000 2.3600 21.5000 113.0000 1.4100 1.3900 0.3400 1.1400 9.4000 0.5700 1.3300 550.0000 3.0000 3.0373
14.3400 1.6800 2.7000 25.0000 98.0000 2.8000 1.3100 0.5300 2.7000 13.0000 0.5700 1.9600 660.0000 3.0000 2.9894
13.4800 1.6700 2.6400 22.5000 89.0000 2.6000 1.1000 0.5200 2.2900 11.7500 0.5700 1.7800 620.0000 3.0000 3.0314
13.7100 5.6500 2.4500 20.5000 95.0000 1.6800 0.6100 0.5200 1.0600 7.7000 0.6400 1.7400 740.0000 3.0000 2.9724
Sum Squared Error for Testing cases = 0.0040
% of Testing Cases that meet criteria = 100.0000

BIBLIOGRAPHY

1. Philip D. Wasserman, Neural Computing: Theory and Practice. Anza Research, Inc. New York: Van Nostrand Reinhold, 1989.

2. Raul Rojas, Neural Networks: A Systematic Introduction. Berlin; New York: Springer-Verlag, 1996.

3. Hojjat Adeli and Shih-Lin Hung, Machine Learning: Neural Networks, Genetic Algorithms and Fuzzy Systems. John Wiley & Sons, Inc., 1995.

4. Christos Stergiou and Dimitrios Siganos, "Neural Networks," http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#What is a Neural Network. Accessed 10 June 2010.

5. David J. Livingstone (ed.), Artificial Neural Networks: Methods and Applications. Totowa, NJ: Humana Press, 2008.