
Finding an optimal neural network structure using decision trees
RASTISLAV STRUHARIK
Department of Electrical Engineering
University of Novi Sad
Trg Dositeja Obradovića 6
SERBIA & MONTENEGRO
LADISLAV NOVAK
Department of Electrical Engineering
University of Novi Sad
Trg Dositeja Obradovića 6
SERBIA & MONTENEGRO
ALESSANDRA FANNI
Department of Electrical and Electronic Engineering
University of Cagliari
Piazza d'Armi - 09123 Cagliari
ITALY
Abstract: - In this paper an algorithm for the construction of neural networks from decision trees is presented. First, a decision tree is constructed using a standard algorithm, and an equivalent set of rules is extracted from that tree. A neural network is then formed from that set of rules. This neural network is then used as a basis for further learning that improves the classification performance on the given problem. Using this algorithm, the neural network structure and an initial set of weight values can be estimated quickly, eliminating the extensive experiments that are currently needed to find these parameters.
Key-Words: - Decision Trees, ID3, Neural Networks, backpropagation, classification
1 Introduction
There are many techniques that have been used in
the field of machine learning. Among them the most
popular are certainly those related to decision trees
and neural networks. Quinlan [1] originally
introduced decision tree learning. In this approach to machine learning, the target function to be learned is represented by a tree of finite depth. The same function can also be represented by a set of
equivalent if-then rules. Each node in the tree
specifies a test of some attribute of the function to
be learned, and each branch descending from that
node corresponds to one of the possible values for
this attribute. An instance is classified by starting at the root node, testing the attribute specified by this node, and then descending down the branch corresponding to the value of that attribute in the given instance. This process is repeated until a leaf node is reached. The advantage of this approach is that learning is fast, but the solution is often not optimal, and the accuracy of the formed decision tree may not be good enough.
Artificial neural networks are among the most
effective learning methods currently known. They
consist of many simple processing elements that are heavily interconnected. Using one of the existing learning algorithms (e.g. backpropagation), the network weights are adjusted so as to minimize the total squared error on the training examples. The advantage of this method is that the obtained solutions are usually very good, but there is a problem with estimating the optimal neural network structure and the initial weight values for a given problem. For this reason, one is "doomed" to try many different structures in order to find the best one by trial and error. There is also a problem with the convergence of the backpropagation algorithm, since the learning phase of the neural network can get stuck in a local minimum.
In order to overcome the disadvantages of both techniques, DT and ANN, we propose in this paper an approach that combines the best properties of both. There have been similar attempts in the past, for instance [2]. Roughly speaking, the basic idea is as follows: for the given problem, which is represented by a set of instances, we construct a decision tree using a standard technique (e.g. ID3) and then extract an equivalent set of if-then rules. Using this set of rules we construct a feed-forward neural network. This network is then further trained using the same training set that was used to construct the decision tree. In this algorithm the decision tree is constructed to find a good enough starting structure for the neural network, thus eliminating the cumbersome experiments one would otherwise have to perform in order to find this structure. The same decision tree is also used to find a good enough set of starting weight values, so there is no need for random initialization (which is the standard procedure for determining initial weight values). At this first stage we confine ourselves to classification into two classes only. Later, the algorithm could be extended to classification into any finite number of classes.
2 ANN_from_DT Algorithm
The algorithm presented in this article can be used for solving problems that have only numerical attributes and a two-valued target function (classification into two classes). It was stated before that every decision tree can be represented by an equivalent set of if-then rules; a short sketch of this extraction step is given after the list below. The general structure of any if-then rule from that set is

if (A1 R1 a1) and (A2 R2 a2) and ... and (An Rn an) then y = yi    (2.1)

where A1, ..., An are numerical attributes describing the problem, a1, ..., an are some real numbers, each Ri is one of the relations <, >, =, and y can take only the two values 0 and 1. A condition An = an can be replaced by the equivalent condition

(An > an - ε) and (An < an + ε),

where ε is some small real number. To form a neural network that satisfies if-then rules of the form given by (2.1), we need structures that are able to realize the following operations:
1. translation of conditions with real-valued attributes into conditions with logical attributes,
2. realization of the logical AND operation,
3. realization of the logical OR operation.
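To make the rule-extraction step concrete, the following minimal Python sketch collects one if-then rule per root-to-leaf path of a binary decision tree. The Node class and its fields are illustrative assumptions, not a prescribed representation; any standard tree structure could be traversed in the same way.

class Node:
    """Minimal decision tree node (illustrative assumption)."""
    def __init__(self, attr=None, thresh=None, left=None, right=None, label=None):
        self.attr, self.thresh = attr, thresh    # internal test: attr < thresh ?
        self.left, self.right = left, right      # left: test true, right: test false
        self.label = label                       # class 0/1 at a leaf node

def extract_rules(node, conds=()):
    """Return one (conditions, class) rule per root-to-leaf path."""
    if node.label is not None:                   # leaf reached: emit the rule
        return [(list(conds), node.label)]
    return (extract_rules(node.left, conds + ((node.attr, '<', node.thresh),)) +
            extract_rules(node.right, conds + ((node.attr, '>=', node.thresh),)))

# A stump testing A1 < 0.5 yields two rules, one per leaf.
tree = Node('A1', 0.5, left=Node(label=1), right=Node(label=0))
for conds, label in extract_rules(tree):
    print(' and '.join('(%s %s %s)' % c for c in conds), '->', label)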
2.1 Translation of Conditions
There are three types of conditions with real-valued attributes that can be found in the general structure of the if-then rules of interest:
1. A > a0,
2. A < a0, and
3. A = a0,
where A is some numerical attribute and a0 is some real number. As before, condition 3 can be replaced by a conjunction of conditions of types 1 and 2. Furthermore, conditions 1 and 2 can be represented by a logical variable that has value one when the condition is met and zero when it is not. This translation can be achieved by a single sigmoid neuron. If we need a better translation, we can use the neural network shown in Fig. 1.

Fig. 1: A sigmoid neuron translating a condition on a numerical attribute A with threshold a0 into a logical variable.
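As an illustration, the following Python sketch shows how a single sigmoid neuron translates the condition A > a0 into a logical variable. The weight value w is an illustrative choice rather than a value from the text; a larger w gives a sharper, more step-like translation, and the condition A < a0 is obtained by negating the weight.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gt_condition(A, a0, w=3.0):
    # Logical variable for A > a0: sigmoid(w*A - w*a0).
    # A < a0 is obtained with weight -w; larger w sharpens the translation.
    return sigmoid(w * (A - a0))

a0 = 0.5
for A in (0.3, 0.49, 0.51, 0.7):
    print(A, round(gt_condition(A, a0, w=3.0), 3),
             round(gt_condition(A, a0, w=180.0), 3))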
then y  y i , yi  0,1.
(2.2)
Rule has n+m logical variables, n are no negated and
m are negated.
Neural network that realizes this rule has one
sigmoid neuron with n+m inputs. As recommended
in [3] weight values for the inputs paths are
determined in following way, weight of inputs that
come from no negated logical variables is set to ,
and weight of inputs that come from negated logical
variables is set to -. Bias value of sigmoid neuron
has value n-. Values for  and  are determined
empirically and have following values  = 3,  = 2,3. Now output of this network is high when the
rule is satisfied, and output is low when rule is not
satisfied.
Fig. 2: A sigmoid neuron realizing the logical AND of n non-negated inputs x1, ..., xn (weights α) and m negated inputs xn+1, ..., xn+m (weights -α); the bias is -(nα - β).
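The following Python sketch illustrates this AND neuron with the empirical values α = 3 and β = 2.3 from the text; the function and parameter names are ours.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def and_neuron(pos, neg, alpha=3.0, beta=2.3):
    # AND of n non-negated and m negated logical variables: weights are
    # +alpha and -alpha, the bias is -(n*alpha - beta), so the net input
    # is +beta when the rule is satisfied and at most beta - alpha otherwise.
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    bias = -(len(pos) * alpha - beta)
    return sigmoid(alpha * pos.sum() - alpha * neg.sum() + bias)

print(and_neuron([1, 1], [0]))   # rule satisfied: sigmoid(2.3) ~ 0.91
print(and_neuron([1, 0], [0]))   # a condition fails: sigmoid(-0.7) ~ 0.33
print(and_neuron([1, 1], [1]))   # negated variable active: ~ 0.33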
2.3 Realization of logical OR operation
The logical OR operation can be realized in two different ways. One way is to realize a logical OR that has only two inputs; OR operations with more than two inputs can then be built from an appropriate number of two-input OR operations. Once again we need only one sigmoid neuron, this time with two inputs. The weight values of the two inputs are set to α, and the bias has the value β. The other way is to realize a multi-input OR operation directly. We again need one sigmoid neuron, but this time with m inputs (the number of inputs of the OR operation). The weight values of the inputs are set to α, and the bias has the value β (see Fig. 3).
Fig. 3: A sigmoid neuron realizing the multi-input logical OR; every input weight is α and the bias is β.
The values of α and β are once again determined empirically: α = 3, β = -2.3, as recommended in [3]. In this article the values α = 9.643, β = -7.036 were also used, in an effort to improve the realized OR operation.
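The sketch below illustrates the multi-input OR neuron with both sets of empirical values; again, the code is only an illustration of the construction, with names of our own choosing.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def or_neuron(inputs, alpha=3.0, beta=-2.3):
    # Multi-input OR: every weight is alpha and the bias is beta.
    return sigmoid(alpha * np.asarray(inputs, dtype=float).sum() + beta)

for x in ([0, 0, 0], [1, 0, 0], [1, 1, 0]):
    print(x, round(or_neuron(x), 3),                            # alpha=3, beta=-2.3
             round(or_neuron(x, alpha=9.643, beta=-7.036), 3))  # sharper variant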
2.4 ANN_from_DT Algorithm
This algorithm can be used only for problems with numerical attributes and a two-valued target function. For such problems, the sets of rules equivalent to the corresponding decision trees consist of rules with the structure given by (2.1). The neural networks described in the previous sections can realize these rules. Every condition that any of the attributes must satisfy can be realized by the neural network from Fig. 1. The conjunction of the conditions in a particular rule can be realized by the neural network from Fig. 2. The disjunction of the rules from the rule set can be realized by the neural network from Fig. 3. At the end we have a complex neural network that satisfies the given set of rules, i.e. has a performance close to the performance of the decision tree formed at the beginning. The resulting neural network has as many inputs as there are attributes describing the problem, and it has only one output. This neural network can now be trained, for example with the backpropagation algorithm, to refine its performance.
ANN_from_DT(Examples, Target_attribute, Attributes)
- Form a decision tree using one of the existing algorithms (ID3, C4.5).
- Form a set of rules that is equivalent to the decision tree. For every path that begins at the root node and ends at some leaf node there is a rule with the following general structure:
  if (A1 R1 a1) and (A2 R2 a2) and ... and (An Rn an) then y = yi, yi ∈ {0, 1}.
- Replace every condition An = an with the equivalent condition (An > an - ε) and (An < an + ε), where ε is a small real number.
- Realize every condition that some attribute Ai ∈ Attributes must fulfill with the neural network from Fig. 1.
- Realize every rule, with logical variables replacing the numerical conditions, with the neural network from Fig. 2.
- Realize the disjunction of the rules with the neural network from Fig. 3.
- Connect every input that corresponds to an attribute that has not been used in any of the rules to every neuron of the first hidden layer, and set the weights of these connections to zero.
- Return the formed neural network.
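The following Python sketch assembles the three structures into a complete feed-forward network for a given rule set, reusing the rule representation from the earlier extraction sketch. All function and variable names, and the translation weight w, are illustrative assumptions; the weight matrices are returned so that the network could be refined further, e.g. with backpropagation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_ann_from_rules(rules, attrs, w=30.0, alpha=3.0, beta=2.3,
                         gamma=9.643, delta=-7.036):
    # Rules are (conditions, class) pairs; only class-1 rules are wired up,
    # so the single output is high exactly when some class-1 rule fires.
    pos_rules = [conds for conds, label in rules if label == 1]
    conds = sorted({c for rc in pos_rules for c in rc})
    # Layer 1: one translation neuron per distinct condition (Fig. 1).
    # Inputs of unused attributes keep zero weights, as in the algorithm.
    W1 = np.zeros((len(conds), len(attrs)))
    b1 = np.zeros(len(conds))
    for i, (a, op, t) in enumerate(conds):
        s = w if op in ('>', '>=') else -w   # A > t vs. A < t
        W1[i, attrs.index(a)] = s
        b1[i] = -s * t
    # Layer 2: one AND neuron per rule (Fig. 2), bias -(n*alpha - beta).
    # Tree rules contain no negated logical variables, so only +alpha appears.
    W2 = np.zeros((len(pos_rules), len(conds)))
    b2 = np.zeros(len(pos_rules))
    for j, rc in enumerate(pos_rules):
        for c in rc:
            W2[j, conds.index(c)] = alpha
        b2[j] = -(len(rc) * alpha - beta)
    # Output layer: one OR neuron over all rule neurons (Fig. 3).
    W3 = np.full((1, len(pos_rules)), gamma)
    b3 = np.array([delta])
    def net(x):
        h1 = sigmoid(W1 @ np.asarray(x, dtype=float) + b1)
        h2 = sigmoid(W2 @ h1 + b2)
        return float(sigmoid(W3 @ h2 + b3)[0])
    return net, (W1, b1, W2, b2, W3, b3)   # weights ready for refinement

# Example: the single rule "if (A1 < 0.5) then y = 1".
net, params = build_ann_from_rules([([('A1', '<', 0.5)], 1)], attrs=['A1'])
print(net([0.2]), net([0.8]))   # ~0.85 vs ~0.02: high inside the rule region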
3 Conclusion
Generally speaking, neural networks achieve better results than decision trees. This is because ID3 decision trees split the input space only with axis-parallel (orthogonal) hyperplanes, while neural networks can use arbitrarily oriented hyperplanes. The advantage of DTs is that learning is fast, but the solution is often not optimal, and the accuracy of the formed decision tree may not be good enough. In order to overcome the disadvantages of both techniques, DT and ANN, we propose in this paper an approach that combines the best properties of both.
For a given classification problem, represented by a set of instances, we construct a decision tree using a standard technique (e.g. ID3) and then extract an equivalent set of if-then rules. Using this set of rules we construct a feed-forward neural network, which is further trained using the same training set that was used to construct the decision tree. Initial experiments show that neural networks whose structure is determined directly from the corresponding decision tree mostly achieve better results than classically formed networks. To be able to draw general conclusions, more experiments with different problems must be done. Still, it seems that using decision trees to construct the neural network structure and the set of initial weight values results in fairly good classification performance compared with traditional techniques, with the important advantage of knowing the complete structure, including the weights, in advance.
References:
[1] J. R. Quinlan: Induction of Decision Trees, Machine Learning, 1(1), pp. 81-106, 1986.
[2] Ishwar K. Sethi: Entropy Nets: From Decision
Trees to Neural Networks, Proceedings of the
IEEE, vol. 78, no. 10, October 1990.
[3] Geoffrey G. Towell, Jude W. Shavlik, Michiel
O. Noordewier: Refinement of Approximate
Domain Theories by Knowledge-Based Neural
Networks, AAAI-90, pp. 861-866, 1990.