Information reduction in visualization of neural network learning

Martin Kromka, martinkromka@gmail.com
Supervisor: Ing. Rudolf Jakša, PhD., jaksa@neuron.tuke.sk
Tutor: Ing. Matúš Užák, uzak@neuron.tuke.sk

1. Introduction

The performance of a typical neural network is usually evaluated by the mean squared error (MSE) or by estimating the overall classification accuracy. However, two different neural networks may have the same MSE and yet form completely different internal mappings of the data. To find out whether two such networks actually differ, we need to analyse their internal mappings, and visualization is the simplest way to do so [4].

2. Selected Methods and Approaches

2.1 The basic type of visualization

The basis of the visualization is to convert the responses (output values) of the individual parts of the neural network into visual information. The sigmoid function f(x) = 1 / (1 + e^(-x)) (Figure 2.1) is the most commonly used activation function; it also limits a neuron's output to the interval [0, 1]. A piece of visual information is obtained by multiplying the output by the maximum grey level of the target image (Gmax) (Figure 2.2).

Figure 2.1: The shape of the sigmoid function [1].

Figure 2.2: Visualization of a neuron's response [1].

The left part of Figure 2.2 shows the response of a neuron to the functional signal before learning (right after random initialization). The right part shows the response of the same neuron after convergence to the nearest extremum.

Information reduction during the visualization of neural network learning can be achieved by clustering the individual neuron responses. We used two clustering techniques: clustering of the two most similar neuron responses using the Euclidean metric, and clustering of many neurons using a Kohonen network [2].

2.2 Clustering using the Euclidean metric

Using the Euclidean metric, pairs of the two most similar responses are formed within each hidden layer over the whole test set. To visualize a pair of most similar neuron responses as a single image, their arithmetic mean is calculated. The merged responses are multiplied by the maximum grey level (Gmax) and min-max normalization is applied [3]. Figures 2.3 and 2.4 give a more detailed look at the visualization of the normalized responses of the neuron pairs; a code sketch of this pipeline follows after the figures.

Figure 2.3: Visualization of the first hidden layer on the 2D circle training model.

Figure 2.4: Visualization of the second hidden layer on the 2D circle training model.
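The following is a minimal sketch of the pairing pipeline from Section 2.2. It assumes the responses of the hidden neurons over the whole test set are already collected in a matrix; the function name, the array layout, and the exact order of min-max normalization and grey-level scaling are our own assumptions, not taken from the original implementation [1].

```python
import numpy as np

def pair_and_visualize(responses: np.ndarray, g_max: int = 255) -> np.ndarray:
    """Merge the two most similar hidden-neuron responses into one grey-scale image.

    responses -- (n_neurons, n_samples) array of sigmoid outputs in [0, 1],
                 one row per hidden neuron, evaluated over the whole test set.
    g_max     -- maximum grey level of the target image.
    """
    n = responses.shape[0]
    # 1. Find the pair of neurons whose response vectors have the smallest
    #    Euclidean distance over the test set.
    best_pair, best_dist = (0, 1), np.inf
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(responses[i] - responses[j])
            if d < best_dist:
                best_dist, best_pair = d, (i, j)
    # 2. Replace the pair by the arithmetic mean of its two responses.
    merged = (responses[best_pair[0]] + responses[best_pair[1]]) / 2.0
    # 3. Min-max normalize, then scale to grey levels [0, Gmax].
    span = merged.max() - merged.min()
    if span > 0:
        merged = (merged - merged.min()) / span
    return (merged * g_max).astype(np.uint8)
```

In a full implementation this merging step would presumably be repeated until the desired number of images per hidden layer remains; the sketch shows a single merge only.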
2.3 Clustering using a Kohonen network

The Kohonen network contains as many neurons as the number of images the user wants to display on the screen. Two Kohonen networks were used, one exclusively for the first hidden layer and one exclusively for the second hidden layer. The training patterns for a Kohonen network are the responses of the individual neurons (Figure 2.5). The Kohonen network clusters the responses of the first and the second hidden layer over the whole test set. Two visualization variants are possible (a code sketch of both follows at the end of this section):

1. visualization of the cluster centres,
2. visualization of the cluster means.

Figure 2.5: The clustering process using a Kohonen network.

1. Visualization of centres. During the learning of the neural network (NN), not only the synaptic weights (SW) of the trained NN adapt, but also the SW of the Kohonen network. For visualization, these Kohonen network SW are multiplied by the maximum grey level (Gmax).

2. Visualization of means. During learning, the Kohonen network forms clusters of neurons. To visualize a cluster, the arithmetic mean of the responses of the neurons in that cluster is calculated and multiplied by the maximum grey level (Gmax).
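Below is a minimal sketch of the Kohonen clustering step and of both visualization variants. It assumes a plain competitive (winner-take-all) update without a neighbourhood function and a linearly decaying learning rate; these simplifications, and all names in the code, are our own assumptions layered on the standard Kohonen algorithm [2].

```python
import numpy as np

def kohonen_cluster(responses: np.ndarray, k: int,
                    epochs: int = 100, eta: float = 0.5, seed: int = 0):
    """Cluster neuron response vectors with a small Kohonen network.

    responses -- (n_neurons, n_samples) array, one response vector per neuron.
    k         -- number of Kohonen units = number of images shown on screen.
    """
    rng = np.random.default_rng(seed)
    n, m = responses.shape
    weights = rng.random((k, m))                      # one weight vector per unit
    for epoch in range(epochs):
        lr = eta * (1.0 - epoch / epochs)             # linearly decaying rate
        for x in responses[rng.permutation(n)]:
            # Winner-take-all: pull the best matching unit toward the sample.
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            weights[bmu] += lr * (x - weights[bmu])
    # Assign every hidden neuron to its nearest Kohonen unit (its cluster).
    labels = np.array([int(np.argmin(np.linalg.norm(weights - x, axis=1)))
                       for x in responses])
    return weights, labels

def cluster_images(weights, labels, responses, g_max: int = 255):
    """Both visualization variants from Section 2.3."""
    centres = (weights * g_max).astype(np.uint8)      # variant 1: unit weights
    means = np.zeros_like(weights)
    for c in range(weights.shape[0]):                 # variant 2: member means
        members = responses[labels == c]
        if len(members) > 0:
            means[c] = members.mean(axis=0)
    return centres, (means * g_max).astype(np.uint8)
```

Variant 1 draws the adapted unit weights themselves, while variant 2 averages the actual responses of the neurons assigned to each unit.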
3. Implementation of interactive interventions

3.1 Resetting the SW leading to a selected neuron

The visual view of the neurons' behaviour allows the user to identify neurons that are in a saturated state and respond inappropriately to the signals propagating through the network. The interactive implementation allows the user to unpack a chosen cluster (Figure 3.1) and reset the SW leading to a selected neuron (Figure 3.2, Figure 3.3).

Figure 3.1: Unpacked clusters.

Figure 3.2: An unpacked neuron cluster with a wrongly reacting neuron on the left.

Figure 3.3: Reset of the neuron on the left.

3.2 Setting individual learning rates

After unpacking a cluster, the user can set an individual learning rate (from a set of discrete values) for each neuron of that cluster (Figure 3.4 and Figure 3.5). From the neurons' reactions to the input signals, the user can see which of them represent appropriate building blocks of the desired knowledge. The SW leading to those neurons should not change in the same way as the SW of the neurons that deform the desired knowledge.

Figure 3.4: The default gamma (learning rate) parameter setting for the neurons of a cluster.

Figure 3.5: Different gamma parameter settings for the individual neurons of a cluster.

3.3 Pruning

Pruning allows us to find a suitable NN architecture and thus reduce the costs of the network (time, memory, hardware implementation). Unimportant neurons can be removed, which also helps to build a picture of the relevance of the inputs [5]. For simplicity, pruning was implemented in a basic form: neurons can be removed after unpacking a cluster. This implementation is a test version; removing neurons does not speed up the learning process in any way (Figure 3.6).

Figure 3.6: Neuron pruning.

4. Experiments

The experiments were designed and carried out together with the design of the model for information reduction in the visualization of NN learning and the visualization of this process. The experiments build on each other and progressively demonstrate the benefits of visualizing NN learning.

4.1 Euclidean metric

Figure 4.1 shows the visualization of a NN learning the circle model, clustered using the Euclidean metric. The first row of squares represents the input layer, the second the first hidden layer (its clusters), and the third the second hidden layer (its clusters). The final output is in the last, bottom square.

Figure 4.1: Visualization of the circle model with clustering by the Euclidean metric.

4.2 Kohonen network

The Euclidean metric becomes a problem when the number of neuron pairs to visualize is larger than can be displayed. Visualization with a Kohonen network offers two views: the centres view and the means view. In this visualization, the first row of squares represents the input layer, the second the first hidden layer (its clusters), the third the second hidden layer (its clusters), and the last, bottom square the final response of the output neuron (Figures 4.2, 4.3).

Figure 4.2: Means visualization of the circle model.

Figure 4.3: Centres visualization of the circle model.

In the next experiment, the possibility of resetting the SW of individual neurons and of changing their learning rates was used to improve the results (Figures 4.4, 4.5, 4.6).

Figure 4.4: The desired output.

Figure 4.5: Visualization of the learning process without user intervention.

Figure 4.6: Visualization of the learning process with interactive user intervention.

Visually, the final neuron looks better with the SW reset interaction and the individual learning rate changes than without any intervention. On the other hand, the MSE on the test model is 0.016697 without intervention and 0.018216 with intervention.

In the next experiment, pruning was used in addition to the interactive interventions. With the help of pruning, we tried to achieve results as good as without pruning while removing neurons (Figures 4.7, 4.8).

Figure 4.7: Visualization of the 2-10-4-2 NN without pruning.

Figure 4.8: Visualization of the 2-2-1-2 NN after pruning.

Figure 4.7 shows the visualization of a network with 2 input neurons, 10 neurons in the first hidden layer, 4 neurons in the second hidden layer, and 2 output neurons. The visualization itself shows 2 input neurons, 6 clusters in the first hidden layer, 3 clusters in the second hidden layer, and 1 output neuron. Pruning was then applied: the response values of the inappropriately responding neurons were set to 0 (Figure 4.8). By removing neurons we obtained the minimal network topology: 2 input neurons, 2 neurons in the first hidden layer, 1 neuron in the second hidden layer, and 2 output neurons.

5. Conclusion

Visualization allows us to observe the creation of knowledge. This helps the user to better understand, and properly modify, the building blocks represented by the interactions of the hidden-layer neurons. We attempted to visualize large NNs using clusters, which brought clarity to the visualization of large NNs. If the user understands well how the individual building blocks are created, he will be able to intervene in the learning process and thereby improve the results.

References

1. Užák, M.: Vizualizácia a interakcia v procese učenia neurónových sietí (Visualization and interaction in the process of neural network learning). Diploma thesis, Technical University of Košice, 2005.
2. Sinčák, P., Andrejková, G.: Neurónové siete – inžiniersky prístup (Dopredné neurónové siete) (Neural networks – an engineering approach: Feed-forward neural networks), Vol. 1, Elfa-press, ISBN 80-88786-38-X, 1996.
3. Paralič, J.: Objavovanie znalostí v databázach (Knowledge discovery in databases), elfa s.r.o., ISBN 80-89066-60-7, 2003.
4. Duch, W.: Visualization of hidden node activity in neural networks. International Joint Conference on Neural Networks, Portland, Oregon, 2003, Vol. I, pp. 1735-1740.
5. Pruning algorithm [online]: http://poprad.fei.tuke.sk/~ligus/fir/obsah/met23.htm