An ANALYSIS of MULTI-LAYER PERCEPTRON with BACKPROPAGATION

Abstract

Artificial neural networks are tools used mostly in the machine learning and pattern recognition branches of computer science. They employ a computing technique inspired by the basic elements of the brain: neurons. The technique is a parametric one, and learning in artificial neural networks therefore consists of adjusting these parameters by certain methods (algorithms). "Artificial neural networks can be most adequately characterized as `computational models` with particular properties such as the ability to adapt or learn, to generalize, or to cluster or organize data, and which operation is based on parallel processing" [1]. I simulated an ANN on two different major tasks and obtained results for different initial parameters. Deriving relations between the parameters and the results was difficult, since in most of the test cases the training ended with immediate success, i.e. a rapid convergence.

The rest of this analysis report is organized as follows. The first part describes the criteria (the parameters) that can be adjusted from the user interface and whose effects can therefore be tested. The second part describes the inner structure of the network. The third and fourth parts contain the selected test results and the comments on them. The appendices consist of the program's GUI description and an additional list of test results.

1. PROGRAM CAPABILITIES (TESTED CRITERIA)

Neuro-Trainer (NEUTRA) v1.0 is an ANN interface in which some initial parameters can be set and the specified network can be trained. The NEUTRA user interface is explained in full detail in Appendix A. NEUTRA performs four major tasks: it can create a network, train the network, test the network and display the test results. The parameters set from the user interface are listed below; they are introduced in more detail later.

The parameters set during the creation process are:

Output Layer: The number code of the output layer.
Input Layer: The number code of the input layer. Together with the "Output Layer" value it gives the number of layers.
Layer Sizes: The neuron counts of the layers, written as a comma-separated list.
Bias: The bias used in all of the neurons of the system.
Alpha: The coefficient used in the redefined error calculation.
Disable: The neurons to be disabled in the upper hidden layer.
Error: The type of the error, either standard or redefined.
Direct Input Conn: If checked, the input neurons are connected directly to the output layer neurons.

The training parameters are:

Data Size: The number of training loops.
Learning Rate: The coefficient used in the backpropagation weight updates.
Momentum: The momentum coefficient used in the weight-updating rule; it weights the previous weight change.
Training/Data: The number that defines the percentage of the data to be used in training versus testing. If the number is 100 then all the data is also used in testing. It is not active in the first task, since that task has only 4 different input patterns.

NEUTRA's evaluation functions and data generators are implemented for two major tasks. The first is a [2*2*1] network that stands for the logic operator XOR; it is also capable of simulating the AND, OR, NAND and NOR operators. The second is a [5*5*1] network that stands for ((O1 AND O2) OR (O3 AND O5)).
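As an illustration of the two tasks, the following is a minimal C# sketch of how their data generators might look. The bipolar (+1/-1) encoding of inputs and outputs and the two target functions are taken from this report; the class and member names are hypothetical and do not come from the actual NEUTRA source.

    using System;

    // Illustrative data generators for the two NEUTRA tasks.
    // Inputs and targets are bipolar (+1 / -1), as described in Section 2.
    static class TaskData
    {
        static readonly Random Rng = new Random();

        // Task 1: XOR over two bipolar inputs (the [2*2*1] network).
        public static (double[] Input, double Target) XorSample()
        {
            double a = Rng.Next(2) == 0 ? -1.0 : 1.0;
            double b = Rng.Next(2) == 0 ? -1.0 : 1.0;
            double target = (a != b) ? 1.0 : -1.0;   // XOR is true exactly when the inputs differ
            return (new[] { a, b }, target);
        }

        // Task 2: ((O1 AND O2) OR (O3 AND O5)) over five bipolar inputs (the [5*5*1] network).
        public static (double[] Input, double Target) BooleanSample()
        {
            var o = new double[5];
            for (int i = 0; i < 5; i++)
                o[i] = Rng.Next(2) == 0 ? -1.0 : 1.0;

            bool value = (o[0] > 0 && o[1] > 0) || (o[2] > 0 && o[4] > 0);
            return (o, value ? 1.0 : -1.0);
        }
    }

A training set of the size given by the Data Size parameter would then be produced by calling the appropriate generator in a loop.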
The test and display functions, and all methods that require a heuristic function, are specifically designed for an output layer with a single neuron. Although NEUTRA is capable of more (specifying the layer count, the momentum, etc.), I mostly focused on the differences caused by the error types, the biases and the disabling of hidden layer neurons.

2. NETWORK STRUCTURE

The network is a generic fully connected multi-layer perceptron with backpropagation. Its neurons are basically McCulloch-Pitts neurons with evaluation functions, and the network itself is the same network introduced by Frank Rosenblatt [2]. The characteristics of such a network can be listed as its evaluation function, its weight-updating rule, its error function and so on. In NEUTRA these characteristics are defined as follows.

For the i-th neuron of the j-th layer, the firing function is

    V_{ji} = f\left( \sum_{k=1}^{N(j-1)} V_{(j-1)k} W_{jik} \right), \qquad f(x) = \tanh(x).

The standard error of the i-th neuron of the j-th layer in a multi-layer perceptron is

    \delta_{ji} = f'(V_{ji}) (exp_i - V_{ji})   if j is the output layer,
    \delta_{ji} = f'(V_{ji}) \sum_{k=1}^{N(j+1)} \delta_{(j+1)k} W_{(j+1)ki}   otherwise,

where f'(x) = 1 - \tanh^2(x), V is the value fired by the neuron, exp is the expected (target) output and W are the connection weights.

The weight-updating rule is

    W_{ji}(n) = W_{ji}(n-1) + \Delta W_{ji}(n), \qquad \Delta W_{ji}(n) = \mu \Delta W_{ji}(n-1) + \beta \delta_{ji}(n) V_i(n).

Here \mu is the momentum coefficient and \beta is the learning rate.

The redefined error is

    E_{ref} = E_{std} + \alpha \sum_{i,j} W_{ij}^2.

There is no stopping criterion defined: the training ends when the maximum number of loops is reached. I did not implement a stopping criterion for practical reasons; in almost every case the training made successful adjustments within a very small number of loops, which made a stopping criterion unnecessary.

When a neuron is disabled, its connections to the upper layers are disabled. When 'Direct Input Connection' is selected, new connections from the input neurons to the output neurons are added.

For both of the tasks the inputs are either 1 or -1, and so are the target output values. This is basically a design decision made to establish a correspondence between the neuron values and the evaluation function tanh, whose range is (-1,1).

Initial weights are randomized in [0,1]. The data separation specified with the 'Training/Data' parameter for the training and test phases is randomized too; in each training run, the data given to the training and test phases will therefore be different.
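To make these definitions concrete, the following is a minimal C# sketch of one forward pass and one backpropagation update for a single training pattern. It follows the tanh firing function, the delta rules and the momentum-based weight update given above; the class layout, member names and the way the bias is added to every neuron's sum are my own illustrative choices and are not taken from the NEUTRA source. The redefined (weight-penalized) error variant is not shown.

    using System;

    // Minimal fully connected MLP following the definitions in Section 2:
    // V_ji = tanh(sum_k V_(j-1)k * W_jik), the delta rules for output and hidden
    // layers, and dW(n) = mu * dW(n-1) + beta * delta * V.
    class Mlp
    {
        readonly int[] sizes;            // neuron counts per layer, e.g. {2, 2, 1}
        readonly double[][] values;      // values[j][i]: value fired by neuron i of layer j
        readonly double[][] deltas;      // deltas[j][i]: error term of neuron i of layer j
        readonly double[][,] weights;    // weights[j][i,k]: from neuron k of layer j-1 to neuron i of layer j
        readonly double[][,] prevChange; // previous weight changes, kept for the momentum term
        readonly double bias;

        public Mlp(int[] layerSizes, double bias, Random rng)
        {
            sizes = layerSizes;
            this.bias = bias;
            values = new double[sizes.Length][];
            deltas = new double[sizes.Length][];
            weights = new double[sizes.Length][,];
            prevChange = new double[sizes.Length][,];
            for (int j = 0; j < sizes.Length; j++)
            {
                values[j] = new double[sizes[j]];
                deltas[j] = new double[sizes[j]];
                if (j == 0) continue;
                weights[j] = new double[sizes[j], sizes[j - 1]];
                prevChange[j] = new double[sizes[j], sizes[j - 1]];
                for (int i = 0; i < sizes[j]; i++)
                    for (int k = 0; k < sizes[j - 1]; k++)
                        weights[j][i, k] = rng.NextDouble();  // initial weights in [0,1], as in the report
            }
        }

        static double F(double x) => Math.Tanh(x);
        static double FPrime(double v) => 1.0 - v * v;        // 1 - tanh^2(x), written in terms of the fired value

        public double[] Forward(double[] input)
        {
            Array.Copy(input, values[0], input.Length);
            for (int j = 1; j < sizes.Length; j++)
                for (int i = 0; i < sizes[j]; i++)
                {
                    double sum = bias;
                    for (int k = 0; k < sizes[j - 1]; k++)
                        sum += values[j - 1][k] * weights[j][i, k];
                    values[j][i] = F(sum);
                }
            return values[sizes.Length - 1];
        }

        // One backpropagation step for a single pattern with target outputs 'expected'.
        public void Backward(double[] expected, double beta, double mu)
        {
            int last = sizes.Length - 1;

            // Error terms, computed from the output layer downwards.
            for (int j = last; j >= 1; j--)
                for (int i = 0; i < sizes[j]; i++)
                {
                    double err = (j == last)
                        ? expected[i] - values[j][i]   // output layer: (exp_i - V_ji)
                        : SumFromUpperLayer(j, i);     // hidden layer: sum of delta_(j+1)k * W_(j+1)ki
                    deltas[j][i] = FPrime(values[j][i]) * err;
                }

            // Weight updates with momentum.
            for (int j = 1; j <= last; j++)
                for (int i = 0; i < sizes[j]; i++)
                    for (int k = 0; k < sizes[j - 1]; k++)
                    {
                        double change = mu * prevChange[j][i, k] + beta * deltas[j][i] * values[j - 1][k];
                        weights[j][i, k] += change;
                        prevChange[j][i, k] = change;
                    }
        }

        double SumFromUpperLayer(int j, int i)
        {
            double sum = 0.0;
            for (int k = 0; k < sizes[j + 1]; k++)
                sum += deltas[j + 1][k] * weights[j + 1][k, i];
            return sum;
        }
    }

Training would then consist of drawing Data Size patterns from one of the generators sketched in Section 1 and calling Forward and Backward once per pattern, which matches the loop-count interpretation of that parameter.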
3. RESULTS & COMMENTS

Tests are done in order to see the effects of changes in some specific parameters, as mentioned before: the error type in the first task and the Training/Data percentage in the second task. However, during the test procedures I saw extraordinary effects of some other parameters as well. I also tested the first task's network structure on other similar tasks such as the AND and OR problems; those tests were done only to verify the structure. The results of these tests are fully presented in Appendix B.

3.1 First Task

The first class of tests is performed with 10 different amounts of training data (Figure 3.1), namely (5, 10, 15, 25, 50, 100, 250, 1000, 5000, 10000). I used such small values since the network converges to a successful limit immediately. The Epochs/Tests and Epochs/Errors graphics are below (Figure 3.2, Figure 3.3). Since there is no significant difference among the other tests of this class, these snapshots can also be regarded as mean values.

Figure 3.1: Configurations of the first class tests.
Figure 3.2: Epochs / Tests graphics for data size of 50.
Figure 3.3: Epochs / Errors graphics for data size of 50.

The 'jumps' in Figure 3.3 point out the oscillation due to the value of β. As seen in Figure 3.2, the network gives the exact solution only after the third loop.

The second class of tests is performed similarly. The data size sets are the same as in the first class; the only difference is that the 'Error Type' parameter is set to "Redefined". The figures of the test results, again corresponding to the fifth data size (50), are below (Figure 3.4, Figure 3.5).

Figure 3.4: Epochs / Tests graphics for data size of 50.
Figure 3.5: Epochs / Errors graphics for data size of 50.
Figure 3.6: Output / Input graphics for the output layer of the network trained with data size of 50.

As seen in the figures, the redefinition of the error has no significant effect on the network, or rather it is impossible to observe the difference since the training converges immediately, in the third loop. The decision surfaces do not differ either (Figure 3.6). This situation may have several reasons. Firstly, the simplicity of the task and the structure of the network do not allow us to differentiate the changes in this parameter: there is one neuron in the output layer and there is no specific heuristic function. Secondly, the input patterns are bipolarized, i.e. they can take only two values, which leads to a network that responds only to such patterns. However, I have to mention that increasing the α coefficient leads the network to different convergence states (Figure 3.7); high values of α (> 0.1) make the network unsuccessful altogether.

Figure 3.7: The convergence state for a large α value (0.1).

3.2 Second Task

In the second task the main target is the 'Training/Data' parameter. The class tests are performed with three different values for the training data percentage: 70, 90 and 100 (Figure 3.8). In the first phase, their differences are figured separately (Figure 3.9, Figure 3.10 and Figure 3.11). Since there is no difference among the Epochs/Tests graphics, they are not figured. Similar to the first task, the network converges in the second or the third loop (Figure B.1).

Figure 3.8: Configurations of the second class tests.
Figure 3.9: Epochs / Errors graphic for 70% training percentage with data size of 10000.
Figure 3.10: Epochs / Errors graphic for 90% training percentage with data size of 10000.
Figure 3.11: Epochs / Errors graphic for 100% training percentage with data size of 10000.

The main conclusion that can be drawn from these figures is that the network converges rapidly when it is tested with the training data itself (the third case). Another observation is that it converges more rapidly in the first case than in the second case; this is because the training set is smaller and the network is able to adapt itself to the data more easily.
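For reference, the randomized separation of the data into training and test portions that lies behind the 'Training/Data' parameter (see Section 2) could be implemented roughly as follows. This is an illustrative C# sketch under the assumptions stated in the report (a percentage split, re-randomized on every run, with 100 meaning that the whole set is reused for testing); it is not the actual NEUTRA code.

    using System;
    using System.Linq;

    static class DataSplit
    {
        // Randomly separates 'data' into training and test portions according to the
        // Training/Data percentage. With 100, the whole set is also used for testing.
        public static (T[] Training, T[] Test) Split<T>(T[] data, int trainingPercent, Random rng)
        {
            if (trainingPercent >= 100)
                return (data, data);

            // Shuffle so that each training run uses a different separation.
            T[] shuffled = data.OrderBy(_ => rng.Next()).ToArray();
            int trainCount = data.Length * trainingPercent / 100;
            return (shuffled.Take(trainCount).ToArray(), shuffled.Skip(trainCount).ToArray());
        }
    }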
4. FURTHER COMMENTS

Throughout the implementation and testing processes, some characteristics of the tasks made it quite impossible to reach successful observations. Most importantly, the effect of the redefinition of the error criterion cannot be observed, due to the simplicity of the problem. The network, as mentioned before, converges rapidly. As the training data size increases, the red area in the Input/Output graphic shrinks to some degree, i.e. the network becomes more and more stable. However, it is not sensible to hope for better results, since the input patterns are bipolarized.

I used relatively smaller data sizes in the first task than in the second one. The reason is that the convergence of the first task is established in the early steps of training; choosing small values gave me the chance to observe small differences between the cases (if any existed). Independently, I found out that the rapid convergence of the network highly depends on the choice of the bias. Setting the bias to 0 makes the network divergent. Choosing the bias too large delays the convergence, whereas choosing a small value leads to different convergence states (Figure B.2).

The polarized character of the input patterns shows itself in the testing phase: all 9 types of tests performed give the same result in this phase (Figure 4.1). This is, ironically, a deceptive situation, which impedes the convergence of the network and the observation of the effects of the different parameter configurations. I believe that more heuristic problems would be more suitable for observing the effects of changes in these parameters.

Figure 4.1: Test results.

Kemal Taşkın
M.S. in Cognitive Sciences
METU - ANKARA
mail@kemaltaskin.net
http://www.kemaltaskin.net
18.03.2004

APPENDIX

A. NEUTRA 1.0 PROGRAM USER INTERFACE

Neuro-Trainer v1.0 is a multi-layer perceptron simulator (see Section 1) written in C#. Its user interface contains fields for adjusting the network parameters and buttons for starting simple network actions (create, train, test, display) (Figure A.1).

Figure A.1: Standard interface of NEUTRA v1.0.

The configurations panel was described before (see Section 1).

Tasks Panel: There are two radio buttons in this panel, corresponding to the XOR problem and the second, generic problem. The combo box, which is active only when the first task is selected, includes the additional tasks mentioned before. When the second task is selected, the 'Training/Data' parameter becomes active; it is not enabled in the first task.

Actions: This panel involves three major actions concerning the network. The <Create> button creates the network, the <Train> button begins the training, and the <Delete> button deletes the network in order to create new networks for further tests.

Display: This panel involves four buttons. The <Weights> button displays the connection weights of the network at that moment, the <Tests> button displays the graphics window for the test results (Correct/All), the <Errors> button displays the graphics window for the drifts of the results, and the <Decision Surfaces> button displays a window corresponding to the decision surfaces of each upper hidden layer.

Test: This panel has two buttons on it. The <Test> button tests the system with the specified number of test data and displays the results. It is also possible to test a single input pattern with the <Ind. Test> button by writing the pattern into the text box on the left; a sketch of this flow is given at the end of this appendix.

The source code and the executable are available at http://thesis.kemaltaskin.net/.
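As an illustration of the <Ind. Test> flow described above, the following hedged C# fragment parses a comma-separated bipolar pattern such as "1,-1" and feeds it through a trained network. It reuses the illustrative Mlp class sketched in Section 2; none of the names are taken from the actual NEUTRA source.

    using System;
    using System.Linq;

    static class IndividualTest
    {
        // Parses a comma-separated bipolar pattern (e.g. "1,-1") and returns the
        // network's response for it, using the illustrative Mlp class from Section 2.
        public static double Run(Mlp network, string patternText)
        {
            double[] input = patternText
                .Split(',')
                .Select(s => double.Parse(s.Trim()))
                .ToArray();
            return network.Forward(input)[0];  // single output neuron, as in both tasks
        }
    }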
B. FURTHER TEST RESULTS & FIGURES

Figure B.1: Epochs / Tests graphics (common) for the second task.
Figure B.2: Epochs / Tests graphics for the first task with bias 0.1.
Figure B.3: Epochs / Tests graphics for the first task with the AND problem.

C. REFERENCES

1: Smagt, P. van der & Kröse, B. (1996), An Introduction to Neural Networks [p. 13]
2: Rosenblatt, F. (1958), The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain [pp. 386-408]
3: Kawaguchi, K. (2000), A Multithreaded Software Model for Backpropagation Neural Network Applications [Chp. 2.4.4]
4: Bodén, M. (2001), A Guide to Recurrent Neural Networks and Backpropagation