Finding an Optimal Neural Network Structure Using Decision Trees

RASTISLAV STRUHARIK
Department of Electrical Engineering, University of Novi Sad
Trg Dositeja Obradovića 6, SERBIA&MONTENEGRO

LADISLAV NOVAK
Department of Electrical Engineering, University of Novi Sad
Trg Dositeja Obradovića 6, SERBIA&MONTENEGRO

ALESSANDRA FANNI
Department of Electrical and Electronic Engineering, University of Cagliari
Piazza d'Armi - 09123 Cagliari, ITALY

Abstract: - In this paper an algorithm for constructing neural networks from decision trees is presented. First, a decision tree is constructed using a standard algorithm; then an equivalent set of if-then rules is extracted from that tree, and a neural network is formed from this set of rules. The network is then used as the starting point for further learning that improves the classification performance on the given problem. Using this algorithm, the neural network structure and an initial set of weight values can be estimated quickly, eliminating the extensive experiments that are otherwise needed to find these parameters.

Key-Words: - Decision Trees, ID3, Neural Networks, backpropagation, classification

1 Introduction
Many techniques have been used in the field of machine learning. Among the most popular are certainly those related to decision trees and neural networks.

Decision tree learning was originally introduced by Quinlan [1]. In this approach the target function to be learned is represented by a tree of finite depth; the same function can also be represented by an equivalent set of if-then rules. Each node in the tree specifies a test of some attribute of the function to be learned, and each branch descending from that node corresponds to one of the possible values of this attribute. An instance is classified by starting at the root node, testing the attribute specified by this node, and then descending down the branch corresponding to the value of that attribute in the given instance. This process is repeated until a leaf of the tree is reached, as sketched below. The advantage of this approach is that learning is fast, but the solution is often not optimal and the accuracy of the formed decision tree is not good enough.
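As an illustration, the descent through such a tree can be written as follows (a minimal sketch assuming binary threshold splits on numerical attributes; the node structure and names are only illustrative, not part of the algorithm presented later):

from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    attribute: Optional[int] = None   # index of the attribute tested at this node
    threshold: float = 0.0            # go left if x[attribute] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[int] = None       # class label (0 or 1) if this node is a leaf

def classify(root: Node, x: Sequence[float]) -> int:
    # Start at the root and test one attribute per node until a leaf is reached.
    node = root
    while node.label is None:
        node = node.left if x[node.attribute] <= node.threshold else node.right
    return node.label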
Artificial neural networks are among the most effective learning methods currently known. They consist of many simple processing elements which are heavily interconnected. Using one of the existing learning algorithms (e.g. backpropagation), the network weights are adjusted so as to minimize the total squared error on the training examples. The advantage of this method is that the obtained solutions are very good, but estimating the optimal network structure and the initial weight values for a given problem is difficult. For this reason one is "doomed" to try many different structures in order to find the best one by trial and error. There is also a problem with the convergence of the backpropagation algorithm, since the learning phase of the neural network can get stuck in a local minimum.

In order to overcome the disadvantages of both techniques, DT and ANN, we propose in this paper an approach that combines the best properties of both. There have been similar attempts in the past, for instance [2]. Roughly speaking, the basic idea is as follows: for the given problem, which is represented by a set of instances, we construct a decision tree using a standard technique (e.g. ID3) and then extract an equivalent set of if-then rules. Using this set of rules we construct a feedforward neural network. This network is then further trained using the same training set that was used to construct the decision tree. The decision tree thus provides a good enough starting structure for the neural network, eliminating the cumbersome experiments one would otherwise have to perform to find this structure. The same decision tree is also used to find a good enough set of starting weight values, so there is no need for random initialization (the standard procedure for determining initial weights). In the first stage we confine ourselves to classification into two classes only; later the algorithm could be extended to classification into any finite number of classes.

2 ANN_from_DT Algorithm
The algorithm presented in this article can be used for problems that have only numerical attributes and a two-valued target function (classification into two classes). As stated before, every decision tree can be represented by an equivalent set of if-then rules. The general structure of any rule in that set is

if $(A_1 \circ_1 a_1)$ and $(A_2 \circ_2 a_2)$ and $\ldots$ and $(A_n \circ_n a_n)$ then $y = y_i$,    (2.1)

where $A_1, \ldots, A_n$ are the numerical attributes describing the problem, $a_1, \ldots, a_n$ are real numbers, each $\circ_j$ is one of the relations $>$, $<$, $=$, and $y$ can take only the two values 0 and 1. A condition $A_n = a_n$ can be replaced by the equivalent conjunction $A_n > a_n - \varepsilon$ and $A_n < a_n + \varepsilon$, where $\varepsilon$ is some small real number.

To form a neural network that satisfies if-then rules of the form (2.1), we need structures able to realize the following operations:
1. translation of conditions with real-valued attributes into conditions with logical attributes,
2. realization of the logical AND operation,
3. realization of the logical OR operation.

2.1 Translation of Conditions
Three types of conditions with real-valued attributes can be found in the general structure of the if-then rules of interest: 1. $A > a_0$, 2. $A < a_0$ and 3. $A = a_0$, where $A$ is some numerical attribute and $a_0$ is some real number. As before, condition 3 can be replaced by a conjunction of conditions of types 1 and 2. Conditions 1 and 2 can then each be represented by a logical variable that has value one when the condition is met and zero when it is not. This translation can be achieved by a single sigmoid neuron; if a better translation is needed, the neural network of Fig. 1 can be used, as sketched below.

[Fig. 1: Neural network translating the condition A > a0 into a logical variable; the original figure shows a sigmoid neuron with input weight 181.643 and bias -90.827.]
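A minimal numeric sketch of this translation step (the weight value is taken from Fig. 1; the helper names are only illustrative):

import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def greater_than(A: float, a0: float, w: float = 181.643) -> float:
    """Logical variable for A > a0: close to 1 when true, close to 0 when false."""
    return sigmoid(w * (A - a0))

def less_than(A: float, a0: float, w: float = 181.643) -> float:
    """Logical variable for A < a0."""
    return sigmoid(w * (a0 - A))

# A condition A = a0 is expanded into the conjunction
# (A > a0 - eps) and (A < a0 + eps), realized by the AND unit of Section 2.2.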
2.2 Realization of the Logical AND Operation
After the conditions with numerical attributes have been translated into conditions with logical attributes, the general form of an if-then rule is

if $x_1$ and $\ldots$ and $x_n$ and $\bar{x}_{n+1}$ and $\ldots$ and $\bar{x}_{n+m}$ then $y = y_i$, $y_i \in \{0, 1\}$.    (2.2)

The rule has $n + m$ logical variables, of which $n$ are non-negated and $m$ are negated. The neural network realizing such a rule consists of one sigmoid neuron with $n + m$ inputs. As recommended in [3], the weights of the input paths are determined as follows: the weight of each input coming from a non-negated logical variable is set to $\omega$, the weight of each input coming from a negated logical variable is set to $-\omega$, and the bias of the sigmoid neuron is set to $\theta - n\omega$. The values of $\omega$ and $\theta$ were determined empirically as $\omega = 3$ and $\theta = 2.3$. The output of this network is then high when the rule is satisfied and low when it is not.

[Fig. 2: Sigmoid neuron realizing the logical AND: inputs x1, ..., xn with weight ω, inputs xn+1, ..., xn+m with weight -ω, and bias θ - nω.]

2.3 Realization of the Logical OR Operation
The logical OR operation can be realized in two different ways. One way is to realize a logical OR with only two inputs; OR operations with more than two inputs can then be built from an appropriate number of two-input ORs. Again only one sigmoid neuron with two inputs is needed, with both input weights set to $\omega$ and the bias set to $\theta$. The other way is to realize a multi-input OR operation directly: one sigmoid neuron with $m$ inputs ($m$ being the number of inputs of the OR operation), input weights $\omega$ and bias $\theta$ (see Fig. 3).

[Fig. 3: Sigmoid neuron realizing a multi-input logical OR over inputs x1, x2, ..., xm, each with weight ω, and bias θ.]

The values of $\omega$ and $\theta$ are once again determined empirically: $\omega = 3$, $\theta = -2.3$, as recommended in [3]. In this article the values $\omega = 9.643$, $\theta = -7.036$ were also used, in an effort to improve the realized OR operation.

2.4 ANN_from_DT Algorithm
This algorithm can be used only for problems with numerical attributes and a two-valued target function. For such problems the sets of rules equivalent to the corresponding decision trees consist of rules with the structure given by (2.1), and the neural networks described in the previous sections can realize these rules. Every condition that an attribute must satisfy can be realized by the network of Fig. 1, the conjunction of the conditions within a particular rule by the network of Fig. 2, and the disjunction of the rules by the network of Fig. 3. The result is a complex neural network that satisfies the given set of rules, i.e., has performance close to that of the decision tree formed at the beginning. The network has as many inputs as there are attributes describing the problem, and a single output. It can then be trained, for example with the backpropagation algorithm, to refine its performance. A sketch of the construction follows the algorithm below.

ANN_from_DT(Examples, Target_attribute, Attributes)
1. Form a decision tree using one of the existing algorithms (ID3, C4.5).
2. Form the set of rules equivalent to the decision tree: for every path that begins at the root node and ends at some leaf node there is a rule of the general form
   if $(A_1 \circ_1 a_1)$ and $(A_2 \circ_2 a_2)$ and $\ldots$ and $(A_n \circ_n a_n)$ then $y = y_i$, $y_i \in \{0, 1\}$.
3. Replace every condition $A_n = a_n$ with the equivalent conjunction $A_n > a_n - \varepsilon$ and $A_n < a_n + \varepsilon$, where $\varepsilon$ is a small real number.
4. Realize every condition that some attribute $A_i \in$ Attributes must fulfill with the neural network of Fig. 1.
5. Realize every rule, with logical variables replacing the numerical conditions, with the neural network of Fig. 2.
6. Realize the disjunction of the rules with the neural network of Fig. 3.
7. Connect every input that corresponds to an attribute not used in any rule to every neuron of the first hidden layer, and set the weights of these connections to zero.
8. Return the formed neural network.
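Steps 4-6 can be sketched as follows (a compact sketch using the weight scheme of Sections 2.1-2.3; the rule encoding and helper names are only illustrative, and only non-negated conditions are shown, i.e. m = 0 in (2.2)):

import math

W_TRANSLATE = 181.643              # condition-translation weight (Fig. 1)
OMEGA_AND, THETA_AND = 3.0, 2.3    # AND-unit parameters (Section 2.2)
OMEGA_OR, THETA_OR = 3.0, -2.3     # OR-unit parameters (Section 2.3)

def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# A rule is a list of conditions (attribute_index, op, value) with op in {'>', '<'};
# '=' conditions are assumed to be already expanded as in step 3.
def rule_output(rule, x):
    # First hidden layer (Fig. 1): translate each condition into a logical variable.
    logicals = [sigmoid(W_TRANSLATE * ((x[i] - a) if op == '>' else (a - x[i])))
                for (i, op, a) in rule]
    # Second hidden layer (Fig. 2): AND unit over the n logical variables.
    n = len(logicals)
    return sigmoid(OMEGA_AND * sum(logicals) + (THETA_AND - n * OMEGA_AND))

def network_output(rules, x):
    # Output layer (Fig. 3): multi-input OR over the outputs of all rules.
    return sigmoid(sum(OMEGA_OR * rule_output(r, x) for r in rules) + THETA_OR)

# Example: two rules over two attributes, both predicting y = 1.
rules = [[(0, '>', 0.5), (1, '<', 0.2)], [(0, '<', 0.1)]]
print(network_output(rules, [0.7, 0.1]))   # first rule fires, output > 0.5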
3 Conclusion
Generally speaking, neural networks achieve better results than decision trees. This is because ID3 decision trees use only hyperplanes orthogonal to the attribute axes (axis-parallel splits), while neural networks can also use non-orthogonal hyperplanes. The advantage of decision trees is that learning is fast, but the solution is often not optimal and the accuracy of the formed tree is not good enough. To overcome the disadvantages of both techniques, DT and ANN, we have proposed in this paper an approach that combines the best properties of both. For a given classification problem, represented by a set of instances, we construct a decision tree using a standard technique (e.g. ID3) and then extract an equivalent set of if-then rules. Using this set of rules we construct a feedforward neural network, which is further trained using the same training set that was used to construct the decision tree.

Initial experiments show that neural networks whose structure is determined directly from the corresponding decision tree mostly achieve better results than classically formed networks. More experiments on different problems are needed before general conclusions can be drawn. Still, it appears that using decision trees to construct the neural network structure and the set of initial weight values yields fairly good classification performance compared with traditional techniques, with the important advantage that the complete structure, including the weights, is known in advance.

References:
[1] Quinlan, J. R.: Induction of Decision Trees, Machine Learning, 1(1), pp. 81-106, 1986.
[2] Sethi, I. K.: Entropy Nets: From Decision Trees to Neural Networks, Proceedings of the IEEE, Vol. 78, No. 10, October 1990.
[3] Towell, G. G., Shavlik, J. W., Noordewier, M. O.: Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks, Proceedings of AAAI-90, pp. 861-866, 1990.