Self-Optimizing Neural Network for Dynamic Data Classification Using Discrete-Time K-Winners-Take-All Neural Circuits

Pavlo V. Tymoshchuk, Sergiy V. Shatnyi, Mykhaylo V. Lobur, Orest Y. Tsurkovs'kyi
Department of Computer Aided Design Systems, L'viv Polytechnic National University, 12, S. Bandera Str., L'viv, 79013, Ukraine
{pautym@polynet.lviv.ua}

The project statement

The design of a modified mathematical model and a corresponding functional block-diagram of a self-optimizing neural network (SONN) is proposed. The network outputs are obtained using discrete-time dynamic K-winners-take-all (KWTA) neural circuits, which are capable of identifying the K largest of N input features, where 1 <= K < N is a positive integer; the case K = 1 is used here. Implementation prospects of the SONN in up-to-date digital hardware and its application to classification problems with dynamic features are outlined. In contrast to comparable analogs, the network is expected to combine high speed and high precision of data processing.

1. Background

Artificial neural networks are important for neural computation. They possess an ability to reduce their own topology in the process of classification problem solving. Furthermore, such networks produce multilayer hybrid topologies which, together with the weights, are determined automatically by the constructing algorithm, thus avoiding the search for a proper neural architecture. Another advantage of these algorithms is that convergence is guaranteed by the method [1]-[5]. A growing amount of current research in the area of neural networks is oriented towards this topic. Constructive methods for building hybrid neural networks can potentially create more compact models which may be easily implemented in hardware and used in various embedded systems.

The SONN is a neural network classifier based on a training data analysis which quickly estimates the values of individual real, integer or binary input features. The method carries out all computation fully automatically, from the data analysis and the input dimension reduction to the computation of the neural network topology and its weight parameters. The SONN computational cost equals O(n log2 n), where n is the sum of the data quantity, the data input dimension and the data output dimension. The SONN is capable of fully automatically developing a network architecture, suitably computing its weights, converting any integer and real data into discriminative binary vectors, and optimizing both these vectors and the input data dimension. The automatic factorization of real input vectors provides small and well-optimized SONN topologies [6], [7].

The process of the SONN design starts from a training data analysis. First, the SONN examines all input features (inputs) of all training samples for all classes, then automatically factorizes integer and real inputs into binary bipolar vectors. Next, it counts and estimates all factorized input features and starts the process of developing a hybrid neural network architecture and computing the weights. The SONN integrates and compresses identical data features of training samples as much as possible by grouping them together and by transforming features with the same discrimination coefficient values into single connections. This process is accompanied by the division of training samples into subgroups that are used to create neurons.
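As an illustration of this first design step, the minimal sketch below factorizes one real-valued feature into bipolar binary vectors and merges duplicate factorized features. The equal-width range rule and the function names are illustrative assumptions, not the SONN factorization algorithm of [6], [7].

```python
import numpy as np

def factorize_feature(values, n_ranges=3):
    """Map a real-valued feature to bipolar (+1/-1) indicator vectors, one per value range.
    The equal-width ranges are a hypothetical stand-in for the SONN's automatic factorization."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_ranges + 1)
    cols = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_range = (values >= lo) & (values <= hi)
        cols.append(np.where(in_range, 1, -1))      # +1 if the value falls into the range
    return np.stack(cols, axis=1)                   # shape: (samples, n_ranges)

def merge_duplicates(bipolar):
    """Group identical factorized features so that each group becomes a single connection."""
    return np.unique(bipolar, axis=1)

bipolar = factorize_feature([0.1, 0.1, 0.9, 0.9], n_ranges=4)
print(bipolar.shape, merge_duplicates(bipolar).shape)   # (4, 4) (4, 3): two identical columns merged
```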
The SONN topology consists of the following three types of neurons, which carry out the input data transformation from real input vectors into final classified outputs: discriminatively-factorization neurons, aggregation-strengthening neurons, and maximum-selection neurons. The neurons are arranged in a number of layers. The number of layers depends on the training data correlations. If training data of different classes are more correlated, then the neural network architecture is more complicated, with more layers and neurons, and vice versa. If there is little correlation between training samples of the same class, there are more neurons in the layers, and vice versa.

The SONN outputs determine the similarities of a given input vector to the classes defined in the training data set. The SONN qualifies some groups of inputs as more important and pays special attention to them. On the other hand, inputs which are less important or not important at all for classification are simply reduced and not used in the classification process. The SONN can automatically minimize the real data input dimension and the factorized binary data dimension, so this solution is free from the curse of dimensionality.

Part of the data transformation is carried out by aggregation-strengthening neurons (ASNs) placed in the middle part of the SONN architecture. These neurons require bipolar binary inputs during their adaptation process. They aggregate inputs with the same discrimination coefficient values (without loss of information) and appropriately strengthen those of them that better discriminate training samples between the various classes. The strengthening factors are called global discrimination coefficients. The aggregation-strengthening neurons produce their outputs in the range [-1, +1]. The neurons are interconnected in such a way as to promote the most discriminative features and to propagate the discrimination properties of previous connections to the next layers (i.e., each neuron propagates the sum of the discrimination coefficients of all its previous connections). The outputs of the aggregation-strengthening neurons are given by expression (1) (see [6], [7]).

The bipolar binary inputs necessary for ASN adaptation are computed by discriminatively-factorization neurons (DFNs), which automatically factorize integer and real training inputs in such a way that existing discriminative properties are not lost after this transformation. If all associated inputs simultaneously lie in the appropriate factorization ranges of a considered DFN, it produces the value +1 on its output, and the value -1 otherwise, according to function (5).

The maximum similarity to each class is computed by maximum-selection neurons (MSNs), which select the maximal value from all connected ASNs that represent the aggregated most important input features of training samples of the corresponding class, according to expression (6).

The SONNs require all data to be available at the beginning of the adaptation process, so they can carry out various global optimization processes. As a result, the SONNs can globally estimate the importance of each input feature for all training samples and adapt the network structure and weights to reflect these estimations.
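To make the roles of the DFNs and MSNs concrete, a minimal Python sketch of their output rules, as verbally described above, is given below. The factorization ranges and the ASN similarity values in the example are hypothetical, and the ASN expression (1) itself is not reproduced here.

```python
def dfn_output(inputs, ranges):
    """Discriminatively-factorization neuron: +1 if every associated input lies in its
    factorization range, -1 otherwise (the rule of function (5), as described in the text)."""
    ok = all(lo <= x <= hi for x, (lo, hi) in zip(inputs, ranges))
    return 1 if ok else -1

def msn_output(asn_outputs):
    """Maximum-selection neuron: the maximal value over the connected ASN outputs,
    interpreted as the similarity of the input vector to one class (expression (6))."""
    return max(asn_outputs)

print(dfn_output([0.2, 5.0], ranges=[(0.0, 0.5), (4.0, 6.0)]))   # +1: both inputs lie in range
print(msn_output([-0.3, 0.8, 0.1]))                               # 0.8: the class similarity
```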
The importance of each k-th input feature for each n-th training sample of the m-th class is globally measured by determining global discrimination coefficients, where M denotes the number of trained classes, Q is the number of all training samples, K is the number of factorized input features, u_k^n is the k-th feature value for the n-th training sample, and P_k^m, N_k^m, Q^m are auxiliary quantities whose defining formulas are given in [6], [7]. These discrimination coefficients quantify the discrimination property of each k-th feature for the n-th training sample of the m-th class and have the following properties: they are insensitive to differing quantities of training samples representing the various classes, and the discrimination property of each feature of each training sample is estimated globally.

The SONN topology optimization process is designed to compute a minimal network structure using only those features that have maximal discrimination properties for each class. The three types of neurons described above are used to develop a hybrid, partially connected, multilayer neural network topology that can be precisely adjusted to almost any training data set and classification task. The SONN topology optimization process is based on discrimination coefficients computed for all training data samples as well as for some subgroups of them. In order to find an optimal SONN topology, it is necessary to create only those connections which are needed to classify all training data correctly and unambiguously, using only the input features with maximal discrimination properties. The weight parameters and the topology are computed simultaneously during the SONN construction process. Such a strategy makes it possible to assign to each feature representing a subgroup of training samples an accurate optimal weight value arising from its discrimination properties.

The SONN classification results have been compared with other AI methods on the Wine and Iris data from the Machine Learning Repository. The comparison shows that the classification error of the SONN approach is close to that of the FSM method and the k-nearest-neighbors method. The whole process of neural network development and optimization and weight computation for classification using the SONN with 5 inputs takes 1-2 seconds [6], [7]. The SONN has been successfully implemented on digital computers and adapted to many classification and recognition tasks, in particular to OCR, medical and pharmacological diagnosing, and classical benchmarks.

2. Objectives

The aim of this project is to design a mathematical model and a corresponding functional block-diagram of a discrete-time SONN for classification problems with dynamic features. Outputs of such a network are to be obtained based on discrete-time dynamic K-winners-take-all (KWTA) neural circuits. Prospects of implementing the network in up-to-date digital hardware should be outlined. Computer simulations confirming the theoretical results have to be provided. It has to be shown that the designed SONN is faster and more computationally precise than comparable competitors.

3. Design of the SONN for dynamic input features

3.1. K-winners-take-all neural circuit

In order to increase the SONN performance, as outlined in Subsection 3.3, the SONN can be designed in up-to-date hardware. In this case, an MSN of the SONN modeled by (6) is effectively constructed using a KWTA neural circuit.
KWTA neural networks are known to select the K largest out of a set of N inputs, where 1 <= K < N is a positive integer [8]-[12]. In the special case when K is equal to unity, which is the case used for classification problem solving, the KWTA network becomes the winner-takes-all (WTA) network, which chooses the maximal among the N inputs [13]-[15]. In the project, it is proposed to calculate the outputs (6) using the discrete-time dynamic KWTA neural circuit presented in [9]. It is known that, compared to their continuous-time analogs, discrete-time neural networks demonstrate a higher precision of signal processing, are more reliable, are more suitable for software implementation, and can be implemented in up-to-date digital hardware for real-time data processing [15]. The circuit is globally stable and converges to the KWTA operation in a finite number of iterations. It is composed of N feedforward neurons and one feedback hard-limiting neuron used to determine the desired shift of the inputs. The circuit can be implemented in digital hardware using summers, an integrator, switches and external sources of voltage or current, which are appropriate for real-time data processing using VLSI technologies [16]. In contrast to comparable analogs, the circuit can correctly process any finite-valued distinct inputs from any finite range, it has low computational and hardware implementation complexity and a high speed of processing inputs, and it possesses the order-preserving property with respect to the inputs. The circuit does not require resetting and a corresponding supervisory circuit, which additionally simplifies the hardware and increases the speed of processing inputs.

Let us use the circuit mathematical model in the case K = 1 and simplify it to the discrete-time state equation

s(k+1) = s(k) - \frac{A}{2^{k}}\,\mathrm{sgn}\big(R(s(k))\big)   (11)

and output equation

b_{n_j}(k) = c_{n_j} - s(k),   (12)

where b = (b_1, b_2, \dots, b_N)^T,

\mathrm{sgn}\big(R(s(k))\big) = \begin{cases} 1, & \text{if } R(s(k)) > 0; \\ 0, & \text{if } R(s(k)) = 0; \\ -1, & \text{if } R(s(k)) < 0 \end{cases}

is a signum (hard-limiting) function, R(s(k)) = 1 - \sum_{j=1}^{N} S\big(b_{n_j}(k)\big) is the k-th discrete value of the residual function,

S\big(b_{n_j}(k)\big) = \begin{cases} 1, & \text{if } b_{n_j}(k) > 0; \\ 0, & \text{otherwise} \end{cases}

is a step function, the sum \sum_{j=1}^{N} S\big(b_{n_j}(k)\big) determines the number of positive outputs, s(k) is the k-th discrete value of the scalar dynamic shift of the inputs, c_{n_j} = a_{n_j} - a_{\min}, c_{n_1} > c_{n_2} > \dots > c_{n_N} \ge 0 are the preprocessed inputs, a_{n_1} > a_{n_2} > \dots > a_{n_N} are the inputs, which are assumed to be located in the known range [a_{\min}, a_{\max}], where the numbers a_{\min} and a_{\max} represent the minimal and maximal possible values of the inputs, respectively, with a_{\max} - a_{\min} = A, N \ge 1 is the number of inputs, A/2^{k} is an iteratively updated parameter that guarantees convergence of the algorithm to the KWTA operation, and 0 \le s(1) \le A is the initial condition.

The outputs (6) can be obtained by using the step function

x_{n_j}(k) = \begin{cases} u_{n_j}, & \text{if } S\big(b_{n_j}(k)\big) > 0; \\ 0, & \text{otherwise} \end{cases}   (13)

and the sum

y_m(k) = \sum_{j=1}^{N} x_{n_j}(k),   (14)

where u_{n_j} is the value of the n_j-th input and y_m(k) is the k-th discrete value of the maximal input.
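A minimal software sketch of the WTA case (K = 1) of this circuit, following the reconstruction of equations (11)-(14) above, is given below. The halving step A/2^k, the mid-range initial shift and the stopping test are an interpretation of the circuit of [9], not a verbatim transcription of it.

```python
import numpy as np

def wta_discrete(a, a_min, a_max, max_iter=60):
    """Winner-take-all (K = 1) selection via the discrete-time shift dynamics (11)-(14).

    a : sequence of distinct input values assumed to lie in [a_min, a_max].
    Returns the index of the winner and the value y_m of the maximal input.
    """
    a = np.asarray(a, dtype=float)
    A = a_max - a_min
    c = a - a_min                           # preprocessed inputs c_n = a_n - a_min
    s = A / 2.0                             # initial shift, 0 <= s(1) <= A
    b = c - s                               # output equation (12)
    for k in range(1, max_iter + 1):
        R = 1 - int(np.sum(b > 0))          # residual (K = 1): 1 minus the number of positive outputs
        if R == 0:                          # exactly one positive output -> winner found
            break
        s -= (A / 2.0 ** k) * np.sign(R)    # state equation (11): halving step moves the shift
        b = c - s                           # recompute outputs (12)
    x = np.where(b > 0, a, 0.0)             # step-function selection (13)
    y_m = float(np.sum(x))                  # sum (14): value of the maximal input
    return int(np.argmax(b)), y_m

print(wta_discrete([0.7, 0.9, 0.1], a_min=0.0, a_max=1.0))   # (1, 0.9)
```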
3.2. A functional block-diagram of the SONN

A functional block-diagram of the SONN can be composed of three parts, namely the aggregation-strengthening neurons, the discriminatively-factorization neurons, and the maximum-selection neurons. In particular, a functional block-diagram of the aggregation-strengthening neurons described by expression (1) is presented in Fig. 1. The diagram consists of the following blocks: input data u (training/test); classes C; the norm function (8); summing in (9) and (10); controlled switches S that implement the conditions in (7) and the comparisons in (9) and (10); multiplication, division and summing in (7); multiplication and division in (3) and (4); and multiplication and summation in (1). The aggregation-strengthening neurons are implemented in digital hardware using summers, multipliers, dividers and controlled switches.

Fig. 1. Architecture of the aggregation-strengthening neurons described by expression (1).

A functional block-diagram of the discriminatively-factorization neurons described by expression (5) is presented in Fig. 2. The diagram consists of digital comparators, AND- and NO-switches, and constant voltage or current sources.

Fig. 2. Architecture of the discriminatively-factorization neurons of the SONN described by (5).

A functional block-diagram of the maximum-selection neurons described by (13), (14), built based on the discrete-time dynamic KWTA neural circuit presented in [9], is shown in Fig. 3, where Σ is a block of discrete-time summation and S is a switch described by the step function (13). Thus, the circuit can be implemented in digital hardware using such digital components as summers, controlled switches, an integrator, and external sources of voltage or current.

Fig. 3. Architecture of the maximum-selection neurons based on the KWTA neural circuit presented in [9], which solves (13), (14).

As one can see from (14), the output of the SONN has to be computed for each class m = 1, ..., M. Therefore, in order to reduce the time of obtaining the outputs (14), parallel computation of each output (14) can be applied. For this purpose, the use of M discrete-time dynamic KWTA neural circuits is proposed in the project. The overall SONN can be implemented in up-to-date digital hardware using such electronic circuit components as summers, multipliers, dividers, controlled switches, comparators, AND- and NO-switches, an integrator, and external sources of voltage or current.

Note that the functional block-diagrams presented in Fig. 1 - Fig. 3 are subject to hardware implementation restrictions. In particular, all the implemented blocks of the block-diagram will have time delays which in total define the speed of processing inputs by a corresponding real digital SONN. The time delay needed to process one set of input features by the network can be expressed as

T = T_1 + T_2 + T_3,   (15)

where T_1 is the time delay of the discriminatively-factorization neurons, T_2 is the time delay of the aggregation-strengthening neurons, and T_3 is the time delay of the maximum-selection neurons. If more than one set of input features is to be processed, this can be done sequentially in time; in other words, each next set of inputs can be processed after processing the previous set. Therefore, in order to obtain a correct operation of the SONN in the case of more than one set of inputs, the repetition period T_r of the sets of input features should meet the following inequality:

T_r \ge T.   (16)

The SONN can be used in the case of time-varying input features c_l(k), l = 1, 2, ..., N, k = 1, 2, ..., if the modulus of the rate of change of such input features is much less than the rate of performing the mathematical operations present in the SONN model. In other words, in this case the condition

|dx_l(k)/dt| \ll |dp_n/dt|,   l = 1, 2, ..., N,  n = 1, 2, ..., Q,   (17)

where Q is the quantity of mathematical operations performed in the network, should be satisfied for each k.
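A small sketch of the timing requirements (15) and (16) follows; the per-layer delays and the repetition period used here are hypothetical placeholders rather than measured values for any particular hardware.

```python
def total_delay(t_dfn: float, t_asn: float, t_msn: float) -> float:
    """T = T1 + T2 + T3: total delay to process one set of input features, per (15)."""
    return t_dfn + t_asn + t_msn

def repetition_period_ok(t_r: float, t_total: float) -> bool:
    """Condition (16): the repetition period of input sets must satisfy Tr >= T."""
    return t_r >= t_total

# Hypothetical per-layer delays in seconds, for illustration only.
T = total_delay(t_dfn=2e-6, t_asn=5e-6, t_msn=3e-6)
print(T, repetition_period_ok(t_r=20e-6, t_total=T))   # 1e-05 True
```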
The resolution of the SONN is theoretically infinite and does not depend on the values of its parameters. In other words, the network can always classify distinct input features correctly. Moreover, the SONN can operate correctly with any initial value of the state variable 0 <= s(1) <= A, where A > 0 [8]. Therefore, the functioning of the designed SONN is independent of the initial conditions, which can take arbitrary values in a given range. As a result, the network does not require periodic resetting for repetitive processing of input features or additional supervisory circuits for resetting, and it does not spend additional processing time on this operation. This simplifies the hardware and increases the speed of processing input features, which is important for real-time operation of the designed SONN.

3.3. Implementation prospects of the SONN

In Section 1, a static (i.e., time-independent) learning set for designing the SONN was used. According to [7], the error of classifying static input features by SONNs on some test Wine data is larger than 6%, against a value of less than 1.5% for the FSM method. Moreover, the classification error of SONNs on some test Iris data is higher than 5%, in contrast to the 4% value obtained using the k-nearest-neighbors method. Therefore, we propose to address the problem of raising the classification precision of SONNs in this project. For this purpose, we suggest obtaining a solution of the problem using SONNs with higher-order nonlinearities compared to existing ones.

Let us consider the case of generalizing and applying the SONN to the classification of discrete-time dynamic input features. In this case, for every time point t(k), k = 1, 2, ..., K, a different SONN structure can be created. At each time point t(k), the corresponding SONN is created and optimized, and the values of its weights are computed for classification problem solving. The process of designing the SONNs for K time points of the input features will require a time T = Kτ, where τ is the time of creating, optimizing and computing the weights of the SONN at one discrete time point of the input features. Thus, the time of designing K SONNs for K time points of input features by sequential computer software can become unreasonably large for a large K. Moreover, in this case, storing the designed structure and parameters of the SONNs for each time point k in a computer requires a large amount of memory. For instance, according to the numerical results presented in [7] for a concrete classification example with 5 input features, "the whole process of the SONN development, optimization and weights computation have taken 1 - 2 seconds". Therefore, in this case, for K = 10000 the time T = 10000 - 20000 seconds, i.e. more than 2.5 - 5 hours, which can be unacceptable from a practical point of view. Furthermore, in order to solve a classification problem with dynamic inputs by SONNs in real time, i.e. in online mode, the inequality τ <= t(k+1) - t(k), k = 1, 2, ..., K-1, should be satisfied. For instance, to classify the EEG signals given in [17], [18] in online mode, it is necessary to have τ <= 78 ms, which is much less than the SONNs presented in [7] can provide. Therefore, the problem of reducing the problem-solving time is important both for offline and online classification.
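The timing figures quoted above can be restated with a short calculation; the per-time-point design time τ of 1-2 s is the value reported in [7], and 78 ms is the EEG sampling interval from [17], [18].

```python
# Worked check of the timing argument: offline design time T = K * tau and the
# online requirement tau <= t(k+1) - t(k), using the figures quoted in the text.

K = 10_000                        # number of discrete time points of the input features
for tau in (1.0, 2.0):            # per-time-point SONN design time in seconds, from [7]
    T = K * tau                   # total sequential design time, seconds
    print(tau, "s ->", T, "s =", round(T / 3600, 1), "h")
# 1.0 s -> 10000.0 s = 2.8 h
# 2.0 s -> 20000.0 s = 5.6 h

sampling_interval = 0.078         # online deadline per time point: 78 ms
print("online feasible with tau = 1 s:", 1.0 <= sampling_interval)   # False
```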
An implementation of the designed SONN will be simulated in software. Such simulations should confirm the theoretical results obtained in the project. It is known that software implementations of neural networks offer flexibility. Software implementations of SONNs can be trained and simulated on general-purpose sequential computers. In this project, in order to reduce the classification time, we propose to solve the problem of classifying dynamic input features by implementing the SONN in parallel software, exploiting its parallel structure. However, the speed with which software-implemented SONNs process input features may not be high enough in many applications, especially to meet real-time demands. Therefore, microprocessors and digital signal processors may not be suitable for parallel SONN designs, in particular for real-time applications.

In order to speed up the SONN operation, let us implement it in up-to-date digital hardware for use in such applications. Compared with an analogue implementation, digital hardware is known to be more computationally precise and reliable, as long as the requirements for size and power efficiency are not high. The digital implementation of the designed SONNs is expected to have better repeatability, lower noise sensitivity, better testability, higher flexibility, as well as compatibility with other types of preprocessors [16]. Despite the tremendous growth in the digital computing power of general-purpose processors, neural network hardware has been found to be promising in many applications, such as image processing, speech synthesis and analysis, pattern recognition, high-energy physics and others. Hardware implementations are essential for applicability and for taking advantage of the SONN's inherent parallelism. Specific-purpose fixed hardware implementations (i.e., VLSI) can be dedicated to specific SONN models. VLSI implementations of SONNs are capable of providing not only high speed in real-time applications but also compactness [19]. Therefore, SONNs implemented in hardware, compared to their software-implemented counterparts, will offer the following advantages [20]:

- Speed: Specialized hardware offers very high computational power at limited price and thus achieves several orders of magnitude of speed-up, especially in the neural domain, where parallelism and distributed computing are inherently involved; for example, VLSI implementations of cellular neural networks can achieve speeds of up to several teraflops [21], which is otherwise a very high speed for conventional DSPs, PCs, or even workstations.
- Cost: A hardware implementation can provide margins for reducing system cost by lowering the total component count and decreasing power requirements.
- Graceful degradation: Even with the advancement and introduction of multi-core PC processor architectures, the need for effective fault-tolerant mechanisms is still present, and in this respect parallel hardware implementations offer a considerable advantage.
- Compactness: Recent advances in reprogrammable logic enable implementing large ANNs on a single device. The main reason for this is the miniaturization of component manufacturing technology, where the data density of electronic components doubles every 18 months.

Therefore, to solve the problem of classifying dynamic input features, we propose to implement the SONNs in up-to-date digital hardware. For the SONN hardware, FPGA-based, ASIC-based, and DSP-based implementations can be used. Since a DSP-based implementation is sequential, it does not preserve the parallel architecture of the SONNs.
An ASIC implementation can be used for the SONN hardware realization, although it does not offer reconfigurability by the user for improving performance. An FPGA implementation achieves accuracy comparable to that of traditional solutions based on general-purpose computers. An FPGA as an implementation platform combines the reprogrammability advantage of general-purpose processors with the parallel processing and speed advantages of custom hardware. The size and speed evaluation of FPGAs reveals their low cost in terms of logic and memory [22]. To implement the SONN in hardware, an FPGA-based reconfigurable computing architecture is quite suitable, because the parallel structure of the FPGA matches the topology of the SONN and offers flexibility in reconfiguration. The architecture of the SONN and its training algorithms can be implemented on an FPGA chip performing online training. Such computational characteristics of the SONN as modularity and dynamic adaptation can also be realized in FPGA hardware. Using an FPGA, the network may be implemented through parallel computing in a real-time hand-tracking system [23]. Due to the relatively high capacity, high density, short design cycle, and short time to market when using EDA tools, the FPGA can be considered the most applicable microelectronic technology for designing the SONN.

References

[1] Duch, W., Korbicz, J., Rutkowski, L., Tadeusiewicz, R. (eds.): Biocybernetics and Biomedical Engineering. EXIT, Warszawa (2000).
[2] Fiesler, E., Beale, R. (eds.): Handbook of Neural Computation. IOP Publishing Ltd. / Oxford University Press, Bristol, New York (1997).
[3] Horzyk, A.: A New Extension of Self-Optimizing Neural Networks for Topology Optimization. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3696, pp. 415-420. Springer, Heidelberg (2005).
[4] Horzyk, A., Tadeusiewicz, R.: Self-Optimizing Neural Networks. In: Yin, F.-L., Wang, J., Guo, C. (eds.) ISNN 2004. LNCS, vol. 3173, pp. 150-155. Springer, Heidelberg (2004).
[5] Jankowski, N.: Ontogenic Neural Networks. EXIT, Warszawa (2003).
[6] Horzyk, A.: Self Optimizing Neural Networks SONN-3 for Classification Tasks. In: Corchado, E., Abraham, A., Pedrycz, W. (eds.) Proc. of HAIS 2008. LNAI, vol. 5271, pp. 229-236. Springer-Verlag, Berlin Heidelberg (2008).
[7] Horzyk, A.: Introduction to Constructive and Optimization Aspects of SONN-3. In: Kurkova, V., et al. (eds.) Proc. of ICANN 2008. LNCS, vol. 5164, pp. 763-772. Springer-Verlag, Berlin Heidelberg (2008).
[8] E. Majani, R. Erlanson, and Y. Abu-Mostafa, "On the k-winners-take-all network," in Advances in Neural Information Processing Systems, vol. 1, D. S. Touretzky, Ed. San Mateo, CA: Morgan Kaufmann, 1989, pp. 634-642.
[9] P. V. Tymoshchuk, "A discrete-time dynamic K-winners-take-all neural circuit," Neurocomputing, vol. 72, pp. 3191-3202, 2009.
[10] P. Tymoshchuk, "Continuous-Time Model of Analogue K-Winners-Take-All Neural Circuit," in Proc. 13th Int. Conf. EANN, CCIS 311, London, 2012, pp. 94-103.
[11] P. V. Tymoshchuk, "A model of analogue K-winners-take-all neural circuit," Neural Networks, vol. 42, pp. 44-61, June 2013.
[12] P. V. Tymoshchuk, "A fast analogue K-winners-take-all neural circuit," in Proc. Int. Joint Conf. Neural Networks, Dallas, TX, 2013, pp. 882-889.
[13] R. P. Lippmann, "An introduction to computing with neural nets," IEEE Acoustics, Speech and Signal Processing Magazine, vol. 3, no. 4, pp. 4-22, Apr. 1987.
[14] P. Tymoshchuk and E. Kaszkurewicz, "A Winner-take-all circuit based on second order Hopfield neural networks as building blocks," in Proc. Int. Joint Conf. Neural Networks, vol. II, Portland, OR, 2003, pp. 891-896.
[15] P. Tymoshchuk and E. Kaszkurewicz, "A winner-take-all circuit using neural networks as building blocks," Neurocomputing, vol. 64, pp. 375-396, 2005.
[16] A. Cichocki and R. Unbehauen, Neural Networks for Optimization and Signal Processing. New York: John Wiley and Sons, 1993.
[17] S. M. Lee and S. J. Roberts, "Sequential Dynamic Classification Using Latent Variable Models," Technical Report PARG 08-02, Robotics Research Group, Department of Engineering Science, University of Oxford, 2008.
[18] J. W. Yoon, S. J. Roberts, M. Dyson, and J. Q. Gan, "Adaptive classification for Brain Computer Interface systems using Sequential Monte Carlo sampling," Neural Networks, vol. 22, pp. 1286-1294, June 2009.
[19] S. Sahin, Y. Becerikli, and S. Yazici, "Neural Network Implementation in Hardware Using FPGA," in Proc. of ICONIP 2006, LNCS, vol. 4234, I. King et al., Eds. Springer-Verlag, Berlin Heidelberg, 2006, pp. 1105-1112.
[20] J. Misra and I. Saha, "Artificial neural networks in hardware: A survey of two decades of progress," Neurocomputing, vol. 74, pp. 239-255, 2010.
[21] M. Hanggi and G. Moschytz, Cellular Neural Networks: Analysis, Design, and Optimization. Norwell, MA, USA: Kluwer Academic Publishers, 2000.
[22] A. Muthuramalingam, S. Himavathi, and E. Srinivasan, "Neural network implementation using FPGA: issues and application," International Journal of Information Technology, vol. 4, no. 2, pp. 95-101, 2008.
[23] M. Krips, T. Lammert, and A. Kummert, "FPGA implementation of a neural network for a real-time hand tracking system," in Proceedings of the 1st IEEE International Workshop on Electronic Design, Test and Applications, 2002, pp. 313-317.