
From: AAAI Technical Report SS-02-03. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Using Hierarchical Recurrent Neuro-Fuzzy Systems
for Decision Making
Andreas Nürnberger
University of California at Berkeley
EECS, Computer Science Division
Berkeley, CA 94720-1770, USA
email: anuernb@eecs.berkeley.edu
Abstract
Over the last years, a number of methods have been proposed to automatically learn and optimize fuzzy rule bases from data. The obtained rule bases are usually robust and allow an interpretation even for data sets that contain imprecise or uncertain information. However, most of the proposed methods are still restricted to learning and/or optimizing single-layer feed-forward rule bases. The main disadvantages of this architecture are that the complexity of the rule base increases exponentially with the number of input and output variables, and that the system is not able to store and reuse information of the past. Thus, temporal dependencies have to be encoded in every data pattern. In this article we briefly discuss the advantages and disadvantages of hierarchical recurrent fuzzy systems that tackle these problems. Furthermore, we present a neuro-fuzzy model that has been designed to learn and optimize hierarchical recurrent fuzzy rule bases from data.
Introduction
In the beginning of the application of fuzzy systems (Zadeh 1965; Zadeh 1973) to real-world problems, we were mainly confronted with simple rule bases that used just a few input variables to compute, usually, just a single output value. In the meantime, the complexity of fuzzy systems that are used in decision making and control theory has increased drastically. Therefore, a number of methods have been proposed to automatically learn and optimize fuzzy rule bases from data.
Besides approaches that are based on decision trees (e.g. Yuan, and Shaw 1995; Chi, and Yan 1996; Wang et al. 1999), fuzzy cluster analysis (e.g. Bezdek et al. 1998; Höppner et al. 1999), and genetic or evolutionary algorithms (e.g. Lee, and Takagi 1993; Cordon et al. 2001), neuro-fuzzy systems (e.g. Lin, and Lee 1996; Nauck, Klawonn, and Kruse 1997; Pedrycz 1998) have meanwhile proven their usability in practice. The obtained rule bases are usually robust and allow an interpretation even for data sets that contain imprecise or uncertain information. However, most of the proposed methods are still restricted to learning and/or optimizing single-layer feed-forward rule bases. The main disadvantages of this architecture are that the complexity of the rule base increases exponentially with the number of input and output variables – if we want to ensure a coverage of the considered data space – and that the system is not able to store and reuse information of the past. Thus, temporal dependencies have to be encoded in every data pattern that is presented to the system; e.g., if feed-forward systems are applied to time series data, the data have to be preprocessed and recoded so that the relevant dynamic information is available in each data set that is presented to the system. In the case of applications in system control, this is usually realized by the additional use of the derivative or integral of the system state. If this information is not available, a vector of prior states has to be stored and used as additional system input. The same holds for fuzzy systems that are to be applied to dynamic decision making problems. Unfortunately, it is usually hard to define an 'optimal' number of prior states that should be used, and for continuously running systems the time delay between the used state information additionally has to be determined. The use of prior state vectors again leads to a (sometimes drastically) increased number of inputs, which soon becomes intractable. Furthermore, the increased number of inputs harms the interpretability of the rule base. Thus, these systems cannot be used in applications where the decision making process has to be clearly visible.
In this article we discuss the advantages and disadvantages
of hierarchical recurrent fuzzy systems that tackle these
problems. Furthermore, we present a neuro-fuzzy model
that has been designed to learn and optimize hierarchical
recurrent fuzzy rule bases from data.
Hierarchical Fuzzy Systems
The development of hierarchical fuzzy systems was motivated by the dimensionality problem of single-stage fuzzy
systems, i.e. fuzzy systems that support only a direct mapping from the input to the output variables. In these systems
the total number of fuzzy rules increases exponentially with
the number of input and output variables, since usually all
combinations of input variables have to be considered to
cover the complete input space. Therefore, also the total
number of adjustable system parameters might increase
exponentially, if individual fuzzy sets are assigned to each
rule.
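The exponential growth can be illustrated with a small sketch (assuming, for illustration, a grid partition with the same number of fuzzy sets per input variable):

```python
# Number of rules needed by a single-stage fuzzy system whose rule base
# covers all combinations of input fuzzy sets (grid partition assumed).
def rule_count(num_inputs: int, sets_per_variable: int) -> int:
    return sets_per_variable ** num_inputs

# The rule base grows exponentially with the number of input variables:
for n in range(1, 7):
    print(n, rule_count(n, 3))
```

With three fuzzy sets per variable, six inputs already require 729 rules; if each rule additionally carries individual fuzzy sets, the parameter count grows at the same rate.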
[Figure: directed-graph representation of the rule base; input variables: x, y; output variable: z; inner variable: u.
Subsystem u:
R1: If x is large then u is small
R2: If x is small and y is large then u is zero
R3: If y is small then u is large
Subsystem z:
R4: If u is large then z is small
R5: If u is small and y is large then z is zero
R6: If y is small then z is large]
Figure 1. A hierarchical rule base and its representation as directed graph
The idea of hierarchical fuzzy systems is to introduce internal (or intermediate) variables into the rule base. This makes it possible to use the output of one rule as an input to another rule, thus reducing the total number of fuzzy rules required. However, it has to be ensured that no recurrence occurs, i.e. that the computation of a rule output requires no input that is computed based on the output of this rule¹. This type of architecture is also known as rule chaining in fuzzy expert systems (see, for example, Kruse, Schwecke, and Heinsohn 1991; Pan, DeSouza, and Kak 1998; Steimann, and Adlassnig 1998).
For the creation of hierarchical systems we can distinguish between two main (interpretation) principles: the combination of fuzzy rules into (local) fuzzy systems in a hierarchical architecture (including defuzzification), and arbitrary rule structures, where each rule is interpreted as a single fact. The latter is usually used in the context of fuzzy expert systems. Here, the main goal is to evaluate rule bases efficiently for changing rule structures or facts. This may result in problems concerning the resolution of conflicting rules and overlapping fuzzy sets (see, for example, Pan, DeSouza, and Kak 1998). The problems are mainly caused by the lack of performance due to the complexity of the rule bases usually used in expert system shells, and by the representation of each variable as a fuzzy set during the complete inference process.
Another possibility to evaluate hierarchical fuzzy systems is to interpret each subsystem independently in the context of fuzzy interpolation. Here, a hierarchical rule base is interpreted as a combination of local subsystems, where each subsystem computes a crisp output before this output is used by another rule or subsystem. This interpretation is again related to control theory and reflects an analogy to cascaded control systems, where each sub-controller is designed (mostly) independently of the others, but access to inner (defuzzified) variables is also required.

¹ If a fuzzy rule base is interpreted as a directed graph, where the vertices are defined by the variables and rules, and the edges are defined by links from the variables used in the antecedent to the rule node and from the rule node to the variables of the consequent, then a rule base without recurrence is equivalent to a directed acyclic graph.
The sequence of computation can be obtained by a simple topological ordering of the rule base. A simple example is depicted in Figure 1. The rule base can be computed in the order indicated by the rule numbers. The computation of a crisp output (defuzzification) has to be done when the last rule of a subsystem has been evaluated (here, after the evaluation of rules 3 and 6). The computation of a succeeding subsystem can be initiated (triggered) when all input variables of this system have been computed.
Unfortunately, one disadvantage of the local defuzzification approach is that we neglect information about the specificity or vagueness of the inner variables. Furthermore, alternative solutions cannot be propagated to succeeding layers. Depending on the defuzzification methods used, we even have to avoid alternative or contradicting solutions defined by the used fuzzy rules. However, for problems that are to be solved in the context of fuzzy interpolation, e.g. function approximation or control problems, we can usually find a rule base without alternatives or contradictions, and even for most decision making problems this is frequently possible.
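The triggering scheme described above amounts to a topological ordering of the subsystems; a sketch using Python's standard library, with the dependency structure modeled on the example of Figure 1:

```python
from graphlib import TopologicalSorter

# Dependency structure modeled on Figure 1:
# subsystem 'u' reads the external inputs x and y;
# subsystem 'z' reads the (defuzzified) inner variable u and the input y.
dependencies = {
    "u": {"x", "y"},
    "z": {"u", "y"},
}

# External inputs have no prerequisites; a topological sort yields a
# valid evaluation order in which each subsystem is triggered only
# after all of its input variables have been computed.
ts = TopologicalSorter(dependencies)
order = [v for v in ts.static_order() if v in dependencies]
print(order)  # subsystem u is evaluated before subsystem z
```

A rule base without recurrence corresponds to a directed acyclic graph, so such an ordering always exists.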
Hierarchical Recurrent Fuzzy Systems
Pure feed-forward fuzzy systems are not able to store and reuse information of the past. Thus, temporal dependencies have to be encoded in every data pattern that is presented to the system. If such systems are applied to time series data, the data have to be preprocessed and recoded so that the relevant dynamic information is available in each data set that is presented to the feed-forward system. This problem led to the idea of storing prior system states in internal variables of the fuzzy system (Gorrini, and Bersini 1994).
To be able to store 'information of the past', the simple introduction of internal variables as in hierarchical rule structures is obviously not sufficient, unless the information of prior system states (the values of output and inner variables of the fuzzy system) is reused as input. Therefore, all variables are now regarded as time-dependent, and thus the value of each variable depends on the time t. Furthermore, recurrent links are introduced, which can be used to feed back the output of variables of the previous time step. The propagation can still be done as in pure hierarchical systems, since the recurrence is hidden from the subsystems. However, initial values for the recurrent links have to be defined for the first propagation of the rule base.
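A minimal sketch of this propagation scheme, assuming a generic (already defuzzified) subsystem function; the stand-in subsystem below is made up for illustration:

```python
def propagate(subsystem, inputs, z_init=0.0):
    """Propagate a sequence of external inputs through a subsystem whose
    rule base also uses its own previous output z(t-1) as an input.

    subsystem: callable (x, y, z_prev) -> z, a stand-in for a
    (defuzzified) fuzzy subsystem.  z_init is the initial value
    required for the recurrent link at the first propagation.
    """
    z_prev = z_init
    outputs = []
    for x, y in inputs:
        z = subsystem(x, y, z_prev)  # the recurrence is hidden from the subsystem
        outputs.append(z)
        z_prev = z                   # time-delayed feedback: z(t-1) := z(t)
    return outputs

# Usage with a trivial numeric stand-in subsystem:
print(propagate(lambda x, y, z: 0.5 * (x + y) + 0.1 * z, [(1, 1), (0, 2)]))
```
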
A simple example of a hierarchical recurrent rule base is
depicted in Figure 2.
[Figure: directed (cyclic) graph representation; input variables: x, y; output variable: z; inner variable: u; recurrent link z(t-1).
Subsystem u:
R1: If x(t) is large and z(t-1) is large then u(t) is small
R2: If x(t) is small and y(t) is large then u(t) is zero
R3: If y(t) is small then u(t) is large
Subsystem z:
R4: If z(t-1) is large then z(t) is small
R5: If u(t) is small and y(t) is large then z(t) is zero
R6: If y(t) is small then z(t) is large]
Figure 2. A hierarchical recurrent rule base and its representation as directed (cyclic) graph

A Hybrid Recurrent Neuro-Fuzzy System

In (Nürnberger 2001) we have presented a neuro-fuzzy system that can be used to learn and optimize hierarchical recurrent fuzzy rule bases – as discussed in the previous section – from data. The structure is based on local Mamdani-like fuzzy systems for function approximation (Mamdani, and Assilian 1975) that can be combined into an arbitrary hierarchical and recurrent rule base. The encoding of a fuzzy system in a neural-network-like architecture is motivated by the generic fuzzy perceptron (Nauck 1994), which was also used in our feed-forward neuro-fuzzy models for fuzzy control, approximation and classification (see, e.g. Nauck, Klawonn, and Kruse 1997; Nürnberger, Nauck, and Kruse 1999).
The learning algorithm proposed in (Nürnberger 2001) supports logistic or Gaussian-like fuzzy sets to define the membership functions of the antecedents and Gaussian-like fuzzy sets to define the consequents. However, the algorithm can easily be adapted to other (differentiable) membership functions.
In the following we briefly describe the structure of the model and the learning methods used.

Model Structure

The main idea of this model is to combine simple feed-forward fuzzy systems – which may consist of just one rule – into arbitrary hierarchical models. Therefore, the interpretability of every part is ensured before, during and after optimization. Backward connections between the models are realized by time-delayed feed-back links (see also the discussions in the previous sections). The interpretability of the fuzzy sets is ensured by the use of coupled weights in the consequents (fuzzy sets that are assigned to the same linguistic term share their parameters) and in the antecedents (layer two). Furthermore, constraints can be defined that have to be observed by the learning method, e.g. that the fuzzy sets have to cover the considered state space. A possible structure is shown in Figure 3. A more formal definition is given in the following.

Definition 1. Let Ant(r) be the set of the fuzzy sets used in the antecedent and Con(r) be the set of the fuzzy sets used in the consequent of rule r. Then the recurrent neuro-fuzzy system is a fuzzy system with the following specifications:

(i) All fuzzy sets $\mu_j^{(i)}(x_i) \in Ant(r)$ are defined either as Gaussian-like or as logistic fuzzy sets

$\mu_j^{(i)}(x_i) = e^{-(x_i - c_j^{(i)})^2 / (2 (\alpha_j^{(i)})^2)}$ or $\mu_j^{(i)}(x_i) = \frac{1}{1 + e^{\beta_j^{(i)} (x_i - d_j^{(i)})}}$,

where $\mu_j^{(i)}$ is the j-th fuzzy set assigned to input $x_i$.

(ii) All fuzzy sets $\nu_l^{(k)}(y_k) \in Con(r)$ are defined by Gaussian-like fuzzy sets

$\nu_l^{(k)}(y_k) = e^{-(y_k - \hat{c}_l^{(k)})^2 / (2 (\hat{\alpha}_l^{(k)})^2)}$,

where $\nu_l^{(k)}$ is the l-th fuzzy set assigned to output $y_k$. Each rule is a tuple

$R_r = (\mu_{j_{r1}}^{(i_{r1})}, \ldots, \mu_{j_{rn}}^{(i_{rn})}, \nu_{l_{r1}}^{(k_{r1})}, \ldots, \nu_{l_{rn}}^{(k_{rn})})$

and has the form

$R_r$: if $x_{i_{r1}}$ is $\mu_{j_{r1}}^{(i_{r1})}$ and $\ldots$ and $x_{i_{rn}}$ is $\mu_{j_{rn}}^{(i_{rn})}$ then $y_{k_{r1}}$ is $\nu_{l_{r1}}^{(k_{r1})}$ and $\ldots$ and $y_{k_{rn}}$ is $\nu_{l_{rn}}^{(k_{rn})}$.

(iii) The activation $a_r(t)$ of a fuzzy rule at time t is computed by

$a_r(t) = \prod_{i,j:\, \mu_j^{(i)} \in Ant(r)} \mu_j^{(i)}(x_i)$.

(iv) The output of the system for each output variable $y_k$ is computed by a weighted sum:

$y_k(t) = \frac{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t)) \cdot center(\nu_{l_r}^{(k)}, a_r(t))}{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t))} = \frac{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t)) \cdot \hat{c}_{l_r}^{(k)}}{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t))}$,

where area computes the area and center the center of gravity of the output fuzzy set $\hat{\nu}_{l_r}^{(k)}$ of rule r for $y_k$ at time t, which is defined as

$\hat{\nu}_{l_r}^{(k)}(y_k) = \min(\nu_{l_r}^{(k)}(y_k), a_r(t))$.

(v) Two types of feedback connections are allowed:
a. Time-delayed feedback: Any output $y_k$ can be assigned to an input $x_i$ by defining $x_i(t) := y_k(t-1)$.
b. Hierarchical feedback: An output $y_k$ can be assigned to an input $x_i$ by defining $x_i(t) := y_k(t)$ if $x_i(t)$ is not used for the computation of $y_k(t)$, neither directly nor indirectly via another (hierarchical) input $x_j(t)$.
The feedback connections defined in (v) enable the definition of hierarchical fuzzy systems (b.) and the use of time-delayed inputs (a.). It should be noted that only time-delayed feedback connections are considered 'real' recurrent connections. Hierarchical feedback connections are simply a 'recurrent' definition of a hierarchical fuzzy rule base, and the condition defined in (b.) ensures that the rule base can be sorted topologically.
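A compact sketch of one inference step of this model (Gaussian-like antecedents, product activation, and the area-weighted defuzzification of Definition 1 (iv)); the area and center of the clipped output set are approximated here by numerical integration, which is an implementation choice not fixed by the definition, and the example rules are made up:

```python
import math

def gauss(x, c, alpha):
    """Gaussian-like fuzzy set as in Definition 1 (i)/(ii)."""
    return math.exp(-(x - c) ** 2 / (2 * alpha ** 2))

def clipped_area_center(c, alpha, activation, lo=-5.0, hi=5.0, n=1000):
    """Area and center of gravity of the clipped output fuzzy set
    min(nu(y), a_r(t)); midpoint-rule integration is an implementation
    choice, the definition only names the two quantities."""
    dy = (hi - lo) / n
    area = moment = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        m = min(gauss(y, c, alpha), activation)
        area += m * dy
        moment += y * m * dy
    return area, (moment / area if area > 0.0 else c)

def infer(xs, rules):
    """One inference step: product activation (iii), area-weighted output (iv).
    rules: [(antecedents, consequent)] with antecedents = [(c, alpha), ...]
    matching the inputs xs, consequent = (c_hat, alpha_hat)."""
    num = den = 0.0
    for antecedents, (c_hat, alpha_hat) in rules:
        act = math.prod(gauss(x, c, a) for x, (c, a) in zip(xs, antecedents))
        area, center = clipped_area_center(c_hat, alpha_hat, act)
        num += area * center
        den += area
    return num / den if den else 0.0

# Two made-up rules with symmetric consequents around 0:
rules = [([(0.0, 1.0)], (-1.0, 0.5)),   # if x is 'low'  then y is 'neg'
         ([(1.0, 1.0)], (1.0, 0.5))]    # if x is 'high' then y is 'pos'
print(infer([0.5], rules))  # both rules fire equally: output close to 0
```
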
[Figure: network structure with input layer $x_1, \ldots, x_n$, antecedent fuzzy sets $\mu_1^{(1)}, \ldots, \mu_{n_n}^{(n)}$, a rule layer of product nodes, consequent fuzzy sets $\nu$, sum nodes for the outputs $y_1, \ldots, y_n$, and a $z^{-1}$ time-delay link.]
Figure 3. Possible structure of the recurrent neuro-fuzzy system (using one time-delayed and one hierarchical feedback)

The used defuzzification method is a modified center of gravity approach (COG; see e.g. Kruse, Gebhardt, and Klawonn 1994). In the commonly used COG approaches the output fuzzy sets are merged prior to defuzzification. Therefore, overlapping areas of fuzzy sets are neglected. Here, the area is weighted by the number of occurrences, thus considering the number of 'votes' for a specific output.

Learning Method for Parameter Optimization

The considered learning problem is a function approximation problem, and for each input vector (time series vector) the desired output vector is known. Therefore, supervised learning methods can be used to optimize the parameters of the membership functions. The realized learning method is motivated by the real-time recurrent learning method that has already been applied successfully for the optimization of the weights in recurrent neural networks (Williams, and Zipser 1989). The idea is to propagate the error obtained at the output units back through the rules of the (hierarchical) fuzzy system and to adapt the fuzzy sets accordingly. If feedback connections have to be considered, the fuzzy rule base is unfolded in time and the error is propagated back through time as described briefly below. A more detailed description of the learning algorithm is given in (Nürnberger 2001).

Let E be the total error (the cost function to be minimized) of all output neurons k over all time steps t = 0, ..., T:

$E = \sum_{t=0}^{T} E(t)$,   (1)

with

$E(t) = \frac{1}{2} \sum_k (E_k(t))^2$,   (2)

and let

$E_k(t) = \begin{cases} o_k(t) - y_k(t) & \text{if node } k \text{ has a target output } o_k \text{ at time } t, \\ 0 & \text{otherwise}, \end{cases}$   (3)

be an error measure for output node k. Then the error gradient of the error E(t) for an arbitrary parameter p of the considered fuzzy rule base is defined as

$\frac{\partial E(t)}{\partial p} = -\sum_k E_k(t) \frac{\partial y_k(t)}{\partial p}$.   (4)

Using Definition 1 we obtain

$\frac{\partial y_k(t)}{\partial p} = \frac{\partial y_k(t)}{\partial f_p(p)} + \sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} \frac{\partial y_k(t)}{\partial a_r(t)} \cdot \left( \frac{\partial a_r(t)}{\partial p} + \sum_{i:\, \mu_{j_r}^{(i)} \in Ant(r)} \frac{\partial a_r(t)}{\partial x_i(t)} \cdot \frac{\partial x_i(t)}{\partial p} \right)$,   (5)

where the derivatives $\frac{\partial x_i(t)}{\partial p}$ have to consider the derivatives resulting from the input of connected subsystems, or of prior time steps if a time-delayed link is used. Thus, four cases have to be considered:

$\frac{\partial x_i(t)}{\partial p} = \begin{cases} \frac{\partial y_j(t)}{\partial p} & \text{if } x_i(t) \text{ is a hierarchical feedback } y_j(t), \\ \frac{\partial y_j(t-1)}{\partial p} & \text{if } x_i(t) \text{ is a time-delayed feedback } y_j(t-1), \\ \frac{\partial x_j(t-1)}{\partial p} & \text{if } x_i(t) \text{ is a self-feedback}, \\ 0 & \text{if } x_i(t) \text{ is an external input}. \end{cases}$   (6)

If $x_i(t)$ is an external input to the system, then $\frac{\partial x_i(t)}{\partial p} = 0$, since the external input is independent from any parameter of the system (neglecting any external recurrences, which could not be handled by the system itself).
If $x_i(t)$ is assigned to an internal variable of the system, three cases have to be considered:
• For a hierarchical feedback with $x_i(t) := y_j(t)$, $\frac{\partial x_i(t)}{\partial p}$ can be computed by (5) by replacing $x_i(t)$ with the output variable $y_j(t)$ of the preceding layer, since a hierarchical feedback is independent from $x_i$.
• If $x_i(t)$ is a self-feedback input, i.e. $x_i(t) := x_j(t-1)$, then $\frac{\partial x_i(t)}{\partial p}$ can be computed iteratively similar to RTRL learning (Williams, and Zipser 1989), by assuming $\frac{\partial x_i(0)}{\partial p} = 0$, since the initial state can be considered as independent from p.
• If $x_i(t)$ is a time-delayed feedback input, i.e. $x_i(t) := y_j(t-1)$, then $\frac{\partial x_i(t)}{\partial p} = \frac{\partial y_j(t-1)}{\partial p}$, and $\frac{\partial y_j(t)}{\partial p}$ can be computed by use of Equation (5).

Finally, the parameters of the fuzzy sets are adapted according to

$p(t+1) = p(t) - \eta_p \frac{\partial E(t)}{\partial p}$,   (7)

where $\eta_p$ is the learning rate for a parameter p.

Implementation

The learning method discussed above was implemented in a software tool for the interactive design and computation of hierarchical recurrent fuzzy rule bases. To ensure a good interpretability of a learned rule base, the tool supports – besides shared weights as defined above – the use of simple constraints that have to be observed by the learning process, and a simple pruning strategy: rules are removed from the rule base if the support of an assigned fuzzy set decreases below a threshold value during learning. (The influence of these rules on the output is very low, and therefore they can usually be neglected.)
Furthermore, some modifications of the learning process have been implemented: batch or on-line learning combined with a momentum term and a teacher forcing strategy (see, e.g. Williams, and Zipser 1989). A screenshot of the software implementation is shown in Figure 4.
Besides the manual definition of a rule base, the software implementation also provides a heuristic to learn a fuzzy rule base from data. This method is briefly described in the following.
Figure 4. Screenshot of the software implementation
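The parameter adaptation of Equation (7), p(t+1) = p(t) − η_p ∂E(t)/∂p, can be sketched as follows; instead of the RTRL-style recursion of Equations (5) and (6), this illustration approximates the gradient by central finite differences over a complete propagation through time, and the one-parameter toy recurrent system is a made-up stand-in for a recurrent rule base:

```python
def run(a, steps=10):
    """Toy recurrent system y(t) = a * y(t-1) + 1 with y(0) = 0
    (a made-up stand-in for a recurrent fuzzy rule base)."""
    y, ys = 0.0, []
    for _ in range(steps):
        y = a * y + 1.0
        ys.append(y)
    return ys

targets = run(0.5)  # training data generated with the 'true' parameter a = 0.5

def total_error(a):
    """Summed squared error E = sum_t 1/2 (o(t) - y(t))^2, cf. (1)-(3)."""
    return 0.5 * sum((o - y) ** 2 for o, y in zip(targets, run(a)))

def adapt(a, eta=0.01, eps=1e-6, cycles=300):
    """p(t+1) = p(t) - eta * dE/dp, cf. Equation (7); the gradient is
    approximated by a central finite difference instead of the RTRL
    recursion (an assumption made here for brevity)."""
    for _ in range(cycles):
        grad = (total_error(a + eps) - total_error(a - eps)) / (2 * eps)
        a -= eta * grad
    return a

print(round(adapt(0.2), 3))  # recovers the generating parameter a = 0.5
```

As in the training of the spring-mass model below, each 'cycle' here is a complete propagation through time before the parameter is updated.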
Rule Base Learning
The main idea of this approach is to learn the hierarchically structured rule base of local subsystems by the use of rule templates. These templates enable the definition of the variables and the corresponding time delays that should be used for the creation of the subsystems.
For recurrent systems it is usually necessary to be able to have different fuzzy partitions for input and output variables that are assigned to the same domain. Therefore, the rule templates allow the definition of individual subsets of fuzzy sets for each domain. Thus, the rule learning algorithm can be restricted to choose only fuzzy sets of a specific subset during rule creation (for discussions concerning the use of constraints in neuro-fuzzy learning approaches, see e.g. Nauck, Klawonn, and Kruse 1997). In the realized software implementation, subsets can be created by defining a simple filter pattern for every variable, such that only fuzzy sets with matching linguistic terms are used in the learning process for a specific rule template. The templates can be defined using generic fuzzy rules. For example, the subsystem template
IF (x1(t-1) LIKE '*2') AND (x2(t-1) LIKE '*1')
THEN (y1(t) LIKE '*0')
is used to create rules like
IF (x1(t-1) IS small2) AND (x2(t-1) IS large1)
THEN (y1(t) IS zero0)
IF (x1(t-1) IS zero2) AND (x2(t-1) IS large1)
THEN (y1(t) IS large0).
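The filter-pattern mechanism can be sketched with shell-style name matching; the helper `candidate_sets` and the set names are hypothetical illustrations following the template example above:

```python
from fnmatch import fnmatch

def candidate_sets(fuzzy_set_names, pattern):
    """Return the fuzzy sets whose linguistic terms match a template
    filter pattern such as '*2' (hypothetical helper)."""
    return [name for name in fuzzy_set_names if fnmatch(name, pattern)]

# Independent fuzzy-set subsets per variable, as in the template example:
sets_x1 = ["small2", "zero2", "large2"]
sets_x2 = ["small1", "large1"]

print(candidate_sets(sets_x1, "*2"))  # ['small2', 'zero2', 'large2']
print(candidate_sets(sets_x2, "*1"))  # ['small1', 'large1']
```

During rule creation, the learning algorithm would then choose only among the sets returned for the respective variable.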
Based on the rule templates, the specific instances of the
membership functions have to be selected for each rule.
The used learning method is motivated by the heuristics
implemented in the NEFPROX model (Nauck, and Kruse
1997). The algorithm requires an existing fuzzy partitioning for each considered domain. However, the rule base
learning algorithm can be modified to create fuzzy sets and
thus to avoid an initial partitioning of the domains (Nauck,
and Kruse 1997).
The algorithm consists of two parts. During the first part
(rule base learning), the algorithm creates rules iteratively
based on the given training data. The algorithm selects for
each variable the fuzzy set (of the subset) that obtains the
highest membership degree for the given value and assigns
it to the antecedent or consequent, respectively. Then the
fuzzy rules are created based on the given templates. For
fuzzy rules used to compute inner variables, the respective
consequents are assigned randomly.
During the second part (consequent re-assignment), the
algorithm re-assigns the consequents based on the obtained
mean over all inputs for each output variable (considering a
minimal rule activation) during a complete propagation
through time, i.e. the fuzzy set that obtains the highest
membership degree for the mean value is assigned to the
consequent.
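The fuzzy-set selection step used in both parts (choosing, for a given value, the fuzzy set with the highest membership degree) can be sketched as follows; the partition and its parameters are made up for illustration:

```python
import math

def gauss(x, c, alpha=0.5):
    """Gaussian-like membership function with center c and width alpha."""
    return math.exp(-(x - c) ** 2 / (2 * alpha ** 2))

# Hypothetical fuzzy partition of a domain: linguistic term -> center.
partition = {"neg": -1.0, "zero": 0.0, "pos": 1.0}

def best_fuzzy_set(value, partition):
    """Select the fuzzy set that obtains the highest membership degree
    for the given value (antecedent/consequent assignment)."""
    return max(partition, key=lambda name: gauss(value, partition[name]))

print(best_fuzzy_set(0.8, partition))   # 'pos'
print(best_fuzzy_set(-0.1, partition))  # 'zero'
```
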
Application Example
In the following example we use the discussed neuro-fuzzy system to identify a dynamic process in order to be able to predict its future states. The example describes an application to a system identification problem in control theory (a simple physical spring-mass model). However, an analogous problem in decision making is the identification of a model that can be used to predict the next state of a dynamic process.
For the derivation of the model, we consider a mass on a spring. Let the height x = 0 denote the position of the mass m when it is at rest. Suppose the mass is lifted to a certain height x(0) = x0 and is then dropped (i.e. the initial velocity is v(0) = 0). Since the gravitational force on the mass m is the same for all heights x, we may neglect it.
The force exerted by the spring is governed by Hooke's law, according to which the force of a spring is proportional to the change of the length of the spring and opposite to the direction of this change. This means that $F = -c \cdot \Delta l = -c \cdot x$ holds, where c is a constant that depends on the spring. According to Newton's second law $F = m \cdot a = m \cdot \ddot{x}$, this force causes an acceleration of the mass m. Consequently, we obtain the differential equation

$\ddot{x} = \frac{-c}{m} x$,

with the initial values $x(0) = x_0$ and $v(0) = \dot{x}(0) = 0$. This second order differential equation can be transformed into a system of two coupled first order differential equations by introducing an intermediary variable v:

$\dot{x} = v$ and $\dot{v} = \frac{-c}{m} x$.   (8)
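The numerical simulation of system (8) used to generate the training data can be sketched with a classical fourth-order Runge-Kutta scheme; the defaults follow the parameter values used in this example (c = 40, m = 1, x(0) = 1, fixed step size ∆t = 0.1, 20 sec.):

```python
def simulate(c=40.0, m=1.0, x0=1.0, dt=0.1, t_end=20.0):
    """Integrate x' = v, v' = -(c/m) x with a classical explicit
    fourth-order Runge-Kutta scheme and store x(t) and v(t)."""
    def f(state):
        x, v = state
        return (v, -(c / m) * x)

    x, v = x0, 0.0
    xs, vs = [x], [v]
    for _ in range(int(round(t_end / dt))):
        k1 = f((x, v))
        k2 = f((x + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1]))
        k3 = f((x + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1]))
        k4 = f((x + dt * k3[0], v + dt * k3[1]))
        x += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        xs.append(x)
        vs.append(v)
    return xs, vs

xs, vs = simulate()  # 201 samples each of x(t) and v(t)
```
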
As system parameters the values c = 40, m = 1 and x0 = 1 were used. A data set was generated by simulating the system for a period of 20 sec. with an explicit fourth-order Runge-Kutta method (fixed step size ∆t = 0.1) and storing the values x(t) and v(t). To obtain a rule base that can be interpreted with respect to each variable, two rule templates – one for the computation of x and one for the computation of v – were defined for the rule base learning method. The domains of x and v were partitioned by three Gaussian-like fuzzy sets (neg, zero, pos), while for each subsystem and delay independent fuzzy sets were defined with similar initial parameters. The use of independent fuzzy sets for output and (time-delayed) input is necessary to allow each subsystem to scale between the output and input values. The rule base learning method yielded the following rules:
IF (x[t-1] IS neg2 AND v[t-1] IS neg1) THEN x IS neg0
IF (x[t-1] IS neg2 AND v[t-1] IS zero1) THEN x IS neg0
IF (x[t-1] IS neg2 AND v[t-1] IS pos1) THEN x IS zero0
IF (x[t-1] IS zero2 AND v[t-1] IS neg1) THEN x IS neg0
IF (x[t-1] IS zero2 AND v[t-1] IS zero1) THEN x IS zero0
IF (x[t-1] IS zero2 AND v[t-1] IS pos1) THEN x IS pos0
IF (x[t-1] IS pos2 AND v[t-1] IS neg1) THEN x IS zero0
IF (x[t-1] IS pos2 AND v[t-1] IS zero1) THEN x IS pos0
IF (x[t-1] IS pos2 AND v[t-1] IS pos1) THEN x IS pos0
IF (x[t-1] IS neg1 AND v[t-1] IS neg2) THEN v IS zero0
IF (x[t-1] IS neg1 AND v[t-1] IS zero2) THEN v IS pos0
IF (x[t-1] IS neg1 AND v[t-1] IS pos2) THEN v IS pos0
IF (x[t-1] IS zero1 AND v[t-1] IS neg2) THEN v IS neg0
IF (x[t-1] IS zero1 AND v[t-1] IS zero2) THEN v IS zero0
IF (x[t-1] IS zero1 AND v[t-1] IS pos2) THEN v IS pos0
IF (x[t-1] IS pos1 AND v[t-1] IS neg2) THEN v IS neg0
IF (x[t-1] IS pos1 AND v[t-1] IS zero2) THEN v IS neg0
IF (x[t-1] IS pos1 AND v[t-1] IS pos2) THEN v IS zero0
If we interpret this rule base, we see that the learned rule bases of the subsystems for the computation of x and v are similar to the rule base of a simple integrator (adder). So we can suppose that x is computed by integration of v, and v by integration of (negated) x. If we compare this interpretation to Equation (8), we see that this behavior corresponds to the physical model.
After rule base learning, the rule base was optimized for 4000 training cycles (complete propagations through time) using x and v as training data. Figure 6 depicts the learning progress. The learning terminated with a summed squared error of E = 12.872, which is caused by the deviation from the frequency and amplitude of the physical system. In Figure 5 the position x and the velocity v of the trained system and the validation data over time are shown (x and v given).
To evaluate the performance of the presented approach for inner variables, in subsequent runs either x or v was presented to the system for learning, so that the other variable was considered unknown. The simulation run using just x for training yielded the best approximation of the amplitudes of x and v, while the system that was trained using v fits the frequencies (see Figure 5). Both approaches achieved an appropriate approximation of the physical model. The performance of the resulting models is similar to that of the model that was trained using both variables (x and v) as known output values for learning.
It has to be emphasized that the only inputs to the system are the initial values x(0) = 1 and v(0) = 0. The dynamic behavior results from the reuse of the output signals as input (time-delayed feedback). Even during learning, the system used its own output as input, and only the error signal $E_k(t)$ – the difference between the system output and the training data – for learning.
[Plot: position x over time; curves: x train, x (x given), x (v given), x (x and v given).]
[Plot: velocity v over time.]
[Plot: summed squared error E over training cycles 0 to 4000.]
Figure 6. Change of E(t) (x and v given)
Discussion
In the preceding section we have discussed the application of the proposed neuro-fuzzy system to a simple dynamical system in order to show its principal usability for the approximation of dynamic systems. The analysis covered the identification of hierarchical as well as recurrent rule structures.
In the same way, the model can be applied to decision
making problems:
• If prior knowledge is available, this can be used to
define an initial rule base.
• Using rule templates, existing training data can be used
to learn (additional) fuzzy rules.
• The membership functions used to define an existing or
learned fuzzy rule base can be optimized using available training data.
• The obtained fuzzy rule base can be analyzed to obtain
information about the analyzed data set or to justify the
decision process.
However, for the application to decision making problems we have to consider that the proposed model is based on local subsystems and that the inner variables are defuzzified before being used in a succeeding layer (subsystem). Thus, we neglect information about the specificity or vagueness of an attribute in the inference process. Furthermore, we cannot forward alternative solutions to succeeding layers and therefore have to avoid contradicting rules (e.g. implementing alternative decisions).
To further improve the usability of the discussed system, the structure of the neuro-fuzzy model could be extended so that symbolic attributes can also be used directly. Currently, these attributes have to be encoded as real values (if an order can be defined) or as a binary vector. Furthermore, the model should be extended to support the use of patterns with missing values. Here, approaches similar to the methods discussed in (Nauck, and Kruse 1999) might be appropriate. These aspects will be considered in future work.
[Plot legend for Figure 5: v train, v (x given), v (v given), v (x and v given).]
Figure 5. System output

Conclusions
In this article we have discussed hierarchical recurrent
fuzzy systems that tackle the complexity problems of pure
feed-forward architectures and enable the approximation of
dynamic processes by storing information of prior system
states internally.
Furthermore, we have discussed a neuro-fuzzy model that can be used to optimize hierarchical recurrent fuzzy rule bases. The model allows the use of prior knowledge in the form of fuzzy rules or rule templates, e.g. to define an appropriate initial solution or to enhance the learning performance. The learning methods ensure the interpretability of the optimized rule base by the use of shared weights (fuzzy sets) and constraints.
The proposed neuro-fuzzy system can be a valuable tool to (interactively) optimize arbitrary hierarchical recurrent fuzzy rule bases. However, we have to be aware that even if we use neuro-fuzzy techniques for optimization, we still apply a fuzzy system as a matter of fact. In general, fuzzy systems are not meant to outperform other approximation approaches. This is mainly prevented by the usually small number of linguistic terms that are shared by all rules to ensure an interpretable solution. This trade-off between (often quantitative) methods that achieve good performance and (often qualitative) models that give insights into a specific domain or problem was formulated as the principle of incompatibility of precision and meaning by Zadeh (Zadeh 1973). The benefit gained by using a fuzzy system lies in the interpretability and readability of the rule base. This is widely considered more important than the 'last percent' of increase in performance.
Acknowledgements
The work presented in this article was partially supported
by BTexact Technologies, Adastral Park, Martlesham, UK.
References
Bezdek, J. C., Keller, J. M., Krishnapuram, R., and Pal, N.
R. 1998. Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, The Handbooks of
Fuzzy Sets, Kluwer Academic Publishers, Norwell, MA.
Chi, Z., and Yan, H. 1996. ID3-derived fuzzy rules and
optimal defuzzification for handwritten numeral recognition, IEEE Trans. Fuzzy Systems, 4(1), pp. 24-31.
Cordon, O., Herrera, F., Hoffmann, F., and Magdalena, L.
2001. Genetic Fuzzy Systems: Evolutionary Tuning and
Learning of Fuzzy Knowledge Bases, Advances in Fuzzy
Systems, World Scientific, Singapore.
Gorrini, V., and Bersini, H. 1994. Recurrent Fuzzy Systems, In: Proceedings of the 3rd Conference on Fuzzy
Systems (FUZZ-IEEE 94), IEEE, Orlando.
Höppner, F., Klawonn, F., Kruse, R., and Runkler, T. 1999.
Fuzzy Cluster Analysis, Wiley, Chichester.
Kruse, R., Gebhardt, J., and Klawonn, F. 1994. Foundations of Fuzzy Systems, Wiley, New York, Chichester, et
al.
Kruse, R., Schwecke, E., and Heinsohn, J. 1991. Uncertainty and Vagueness in Knowledge-Based Systems:
Numerical Methods, Springer-Verlag, Berlin.
Lee, M., and Takagi, H. 1993. Integrating Design Stages of
Fuzzy Systems Using Genetic Algorithms, In: Proc.
IEEE Int. Conf. on Fuzzy Systems 1993, pp. 612-617,
San Francisco.
Lin, C.-T., and Lee, C.-C. 1996. Neural Fuzzy Systems. A
Neuro-Fuzzy Synergism to Intelligent Systems, Prentice
Hall, New York.
Mamdani, E. H., and Assilian, S. 1975. An Experiment in
Linguistic Synthesis with a Fuzzy Logic Controller, Int.
J. Man Machine Studies, 7, pp. 1-13.
Nauck, D. 1994. A Fuzzy Perceptron as a Generic Model
for Neuro-Fuzzy Approaches, In: Proc. Fuzzy-Systeme '94, Munich.
Nauck, D., and Kruse, R. 1997. Function Approximation
by NEFPROX, In: Proc. Second European Workshop on
Fuzzy Decision Analysis and Neural Networks for Management, Planning, and Optimization (EFDAN'97), pp.
160-169, Dortmund.
Nauck, D., and Kruse, R. 1999. Learning in Neuro-Fuzzy
Systems with Symbolic Attributes and Missing Values,
In: Proc. Sixth International Conference on Neural Information
Processing (ICONIP'99), pp. 142-147, Perth.
Nauck, D., Klawonn, F., and Kruse, R. 1997. Foundations
of Neuro-Fuzzy Systems, Wiley, Chichester.
Nürnberger, A. 2001. A Hierarchical Recurrent Neuro-Fuzzy System, In: Proc. of Joint 9th IFSA World Congress and 20th NAFIPS International Conference, pp.
1407-1412, IEEE.
Nürnberger, A., Nauck, D., and Kruse, R. 1999. Neuro-Fuzzy Control Based on the NEFCON-Model, Soft
Computing, 2(4), pp. 182-186, Springer, Berlin.
Pan, J., DeSouza, G. N., and Kak, A. C. 1998. FuzzyShell:
A Large-Scale Expert System Shell using Fuzzy Logic
for Uncertainty Reasoning, Trans. on Fuzzy Systems,
6(4), pp. 563-581.
Pedrycz, W. 1998. Computational Intelligence: An Introduction, CRC Press, Boca Raton, FL.
Steimann, F., and Adlassnig, K.-P. 1998. Fuzzy Medical
Diagnosis, In: Ruspini, E., Bonissone, P. P., and
Pedrycz, W. (eds.), Handbook of Fuzzy Computation,
IOP Publishing Ltd and Oxford University Press.
Wang, C.-H., Liu, J.-F., Hong, T.-P., and Tseng, S.-S.
1999. A fuzzy inductive learning strategy for modular
rules, Fuzzy Sets and Systems, 103, pp. 91-105.
Williams, R. J., and Zipser, D. 1989. Experimental analysis
of the real time recurrent learning algorithm, Connection
Science, 1, pp. 87-111.
Yuan, Y., and Shaw, M. J. 1995. Induction of Fuzzy Decision Trees, Fuzzy Sets and Systems, 69(2), pp. 125-139.
Zadeh, L. A. 1965. Fuzzy Sets, Information and Control, 8,
pp. 338-353.
Zadeh, L. A. 1973. Outline of a New Approach to the
Analysis of Complex Systems and Decision Processes,
IEEE Trans. Systems, Man & Cybernetics, 3(1), pp. 28-44.