From: AAAI Technical Report SS-02-03. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.

Using Hierarchical Recurrent Neuro-Fuzzy Systems for Decision Making

Andreas Nürnberger
University of California at Berkeley, EECS, Computer Science Division, Berkeley, CA 94720-1770, USA
email: anuernb@eecs.berkeley.edu

Abstract
Over the last few years, a number of methods have been proposed to automatically learn and optimize fuzzy rule bases from data. The obtained rule bases are usually robust and allow an interpretation even for data sets that contain imprecise or uncertain information. However, most of the proposed methods are still restricted to learning and/or optimizing single-layer feed-forward rule bases. The main disadvantages of this architecture are that the complexity of the rule base increases exponentially with the number of input and output variables and that the system is not able to store and reuse information about the past. Thus, temporal dependencies have to be encoded in every data pattern. In this article we briefly discuss the advantages and disadvantages of hierarchical recurrent fuzzy systems, which tackle these problems. Furthermore, we present a neuro-fuzzy model that has been designed to learn and optimize hierarchical recurrent fuzzy rule bases from data.

Introduction
In the beginning of the application of fuzzy systems (Zadeh 1965; Zadeh 1973) to real-world problems, we were mainly confronted with simple rule bases that used just a few input variables to compute, usually, just a single output value. Meanwhile, the complexity of fuzzy systems used in decision making and control theory has increased drastically. Therefore, a number of methods have been proposed to automatically learn and optimize fuzzy rule bases from data. Besides approaches that are based on decision trees (e.g. Yuan, and Shaw 1995; Chi, and Yan 1996; Wang et al. 1999), fuzzy cluster analysis (e.g. Bezdek et al. 1998; Höppner et al.
1999), genetic or evolutionary algorithms (e.g. Lee, and Takagi 1993; Cordon et al. 2001), neuro-fuzzy systems (e.g. Lin, and Lee 1996; Nauck, Klawonn, and Kruse 1997; Pedrycz 1998) have meanwhile proven their usability in practice. The obtained rule bases are usually robust and allow an interpretation even for data sets that contain imprecise or uncertain information. However, most of the proposed methods are still restricted to learning and/or optimizing single-layer feed-forward rule bases. The main disadvantages of this architecture are that the complexity of the rule base increases exponentially with the number of input and output variables – if we want to ensure a coverage of the considered data space – and that the system is not able to store and reuse information about the past. Thus, temporal dependencies have to be encoded in every data pattern that is presented to the system. If feed-forward systems are applied to time series data, for example, these data have to be preprocessed and recoded so that the relevant dynamic information is available in each data set that is presented to the system. In system control applications, this is usually realized by additionally using the derivative or integral of the system state. If this information is not available, a vector of prior states has to be stored and used as additional system input. The same holds for fuzzy systems that are to be applied to dynamic decision making problems. Unfortunately, it is usually hard to define an 'optimal' number of prior states, and for continuously running systems the time delay between the used state information additionally has to be determined. The use of prior state vectors again leads to a (sometimes drastically) increased number of inputs, which soon becomes intractable. Furthermore, the increased number of inputs reduces the interpretability of the rule base.
Thus, these systems cannot be used in applications where the decision making process has to be clearly visible. In this article we discuss the advantages and disadvantages of hierarchical recurrent fuzzy systems that tackle these problems. Furthermore, we present a neuro-fuzzy model that has been designed to learn and optimize hierarchical recurrent fuzzy rule bases from data.

Hierarchical Fuzzy Systems
The development of hierarchical fuzzy systems was motivated by the dimensionality problem of single-stage fuzzy systems, i.e. fuzzy systems that support only a direct mapping from the input to the output variables. In these systems the total number of fuzzy rules increases exponentially with the number of input and output variables, since usually all combinations of input variables have to be considered to cover the complete input space. Therefore, the total number of adjustable system parameters might also increase exponentially, if individual fuzzy sets are assigned to each rule.

Figure 1. A hierarchical rule base and its representation as a directed graph (input variables: x, y; output variable: z; inner variable: u). The rule base consists of the subsystems u and z:
R1: If x is large then u is small
R2: If x is small and y is large then u is zero
R3: If y is small then u is large
R4: If u is large then z is small
R5: If u is small and y is large then z is zero
R6: If y is small then z is large

The idea of hierarchical fuzzy systems is to introduce internal (or intermediate) variables into the rule base. This makes it possible to use the output of a rule as an input to another rule and thus to reduce the total number of fuzzy rules required. However, it has to be ensured that no recurrence occurs, i.e. that the computation of a rule output does not require an input which is itself computed based on the output of this rule.¹
This type of architecture is also known as rule chaining in fuzzy expert systems (see, for example, Kruse, Schwecke, and Heinsohn 1991; Pan, DeSouza, and Kak 1998; Steimann, and Adlassnig 1998). For the creation of hierarchical systems we can distinguish between two main (interpretation) principles: the combination of fuzzy rules into (local) fuzzy systems in a hierarchical architecture (including defuzzification), and arbitrary rule structures, where each rule is interpreted as a single fact. The latter is usually used in the context of fuzzy expert systems. Here, the main goal is to evaluate rule bases efficiently for changing rule structures or facts. This may result in problems concerning the resolution of conflicting rules and the overlapping of fuzzy sets (see, for example, Pan, DeSouza, and Kak 1998). These problems are mainly caused by the lack of performance due to the complexity of the rule bases usually used in expert system shells and by the representation of each variable as a fuzzy set during the complete inference process.

Another possibility to evaluate hierarchical fuzzy systems is to interpret each subsystem independently in the context of fuzzy interpolation. Here, a hierarchical rule base is interpreted as a combination of local subsystems, where each system computes a crisp output before the output is used by another rule or subsystem. This interpretation is again related to control theory and reflects an analogy to cascaded control systems, where each sub-controller is designed (mostly) independently of the others, but access to inner (defuzzified) variables is also required.

¹ If a fuzzy rule base is interpreted as a directed graph, where the vertices are defined by the variables and rules, and the edges are defined by links from the variables used in the antecedent to the rule node and from the rule node to the variables of the consequent, then a rule base without recurrence is equivalent to a directed acyclic graph.
The sequence of computation can be obtained by a simple topological ordering of the rule base. A simple example is depicted in Figure 1. The rule base can be computed in the order indicated by the rule numbers. The computation of a crisp output (defuzzification) has to be done when the last rule of a subsystem has been evaluated (here, after the evaluation of rules 3 and 6). The computation of a succeeding subsystem can be initiated (triggered) when all input variables of this subsystem have been computed. Unfortunately, one disadvantage of the local defuzzification approach is that we neglect information about the specificity or vagueness of the inner variables. Furthermore, alternative solutions cannot be propagated to succeeding layers. Depending on the used defuzzification methods, we even have to avoid alternative or contradicting solutions defined by the used fuzzy rules. However, for problems that should be solved in the context of fuzzy interpolation, e.g. function approximation or control problems, we can usually find a rule base without alternatives or contradictions, and this is frequently possible even for most decision making problems.

Hierarchical Recurrent Fuzzy Systems
Pure feed-forward fuzzy systems are not able to store and reuse information about the past. Thus, temporal dependencies have to be encoded in every data pattern that is presented to the system. If such systems are applied to time series data, these data have to be preprocessed and recoded so that the relevant dynamic information is available in each data set that is presented to the feed-forward system. This problem led to the idea of storing prior system states in internal variables of the fuzzy system (Gorrini, and Bersini 1994).
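The local-defuzzification evaluation scheme for the rule base of Figure 1 can be sketched as follows (a minimal Python sketch with hypothetical helper names; triangular fuzzy sets and Mamdani min/max inference with center-of-gravity defuzzification are assumed only for illustration, since the concrete fuzzy sets are not specified in the figure):

```python
import numpy as np

def tri(a, b, c):
    """Triangular membership function on [a, c] with peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Illustrative partitions; all variables are assumed to live on [0, 1].
small, zero_, large = tri(-0.5, 0.0, 0.5), tri(0.0, 0.5, 1.0), tri(0.5, 1.0, 1.5)

def cog(rule_acts):
    """Local defuzzification: clip each output fuzzy set by its rule
    activation and return the center of gravity of the pointwise maximum."""
    xs = np.linspace(0.0, 1.0, 201)
    agg = np.zeros_like(xs)
    for mu_out, act in rule_acts:
        agg = np.maximum(agg, np.minimum([mu_out(x) for x in xs], act))
    return float(np.sum(xs * agg) / np.sum(agg)) if agg.sum() > 0 else 0.0

def evaluate(x, y):
    # Topological order: subsystem u is defuzzified first, then subsystem z.
    u = cog([(small, large(x)),                   # R1
             (zero_, min(small(x), large(y))),    # R2
             (large, small(y))])                  # R3
    z = cog([(small, large(u)),                   # R4
             (zero_, min(small(u), large(y))),    # R5
             (large, small(y))])                  # R6
    return u, z
```

Note that the crisp value of u is computed before subsystem z is evaluated, which is exactly the local defuzzification step discussed above.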
To be able to store 'information of the past', the simple introduction of internal variables as in hierarchical rule structures is obviously not sufficient; the information of prior system states (the values of output and inner variables of the fuzzy system) has to be reused as input. Therefore, all variables are now regarded as time dependent, i.e. the value of each variable depends on the time t. Furthermore, recurrent links are introduced, which can be used to feed back the values of variables from the previous time step. The propagation can still be done as in pure hierarchical systems, since the recurrence is hidden from the subsystems. However, initial values for the recurrent links have to be defined for the first propagation of the rule base. A simple example of a hierarchical recurrent rule base is depicted in Figure 2.

Figure 2. A hierarchical recurrent rule base and its representation as a directed (cyclic) graph (input variables: x, y; output variable: z; inner variable: u). The rule base consists of the subsystems u and z:
R1: If x(t) is large and z(t-1) is large then u(t) is small
R2: If x(t) is small and y(t) is large then u(t) is zero
R3: If y(t) is small then u(t) is large
R4: If z(t-1) is large then z(t) is small
R5: If u(t) is small and y(t) is large then z(t) is zero
R6: If y(t) is small then z(t) is large

A Hybrid Recurrent Neuro-Fuzzy System
In (Nürnberger 2001) we have presented a neuro-fuzzy system that can be used to learn and optimize hierarchical recurrent fuzzy rule bases – as discussed in the previous section – from data. The structure is based on local Mamdani-like fuzzy systems for function approximation (Mamdani, and Assilian 1975) that can be combined into an arbitrary hierarchical and recurrent rule base. The encoding of a fuzzy system in a neural-network-like architecture is motivated by the generic fuzzy perceptron (Nauck 1994), which was also used in our feed-forward neuro-fuzzy models for fuzzy control, approximation and classification (see, e.g. Nauck, Klawonn, and Kruse 1997; Nürnberger, Nauck, and Kruse 1999). The learning algorithm proposed in (Nürnberger 2001) supports logistic or Gaussian-like fuzzy sets to define the membership functions of the antecedents and Gaussian-like fuzzy sets to define the consequents. However, the algorithm can easily be adapted to other (differentiable) membership functions. In the following we briefly describe the structure of the model and the used learning methods.

Model Structure
The main idea of this model is to combine simple feed-forward fuzzy systems – which may consist of just one rule – into arbitrary hierarchical models. Thereby, the interpretability of every part is ensured before, during and after optimization. Backward connections between the models are realized by time-delayed feedback links (see also the discussions in the previous sections). The interpretability of the fuzzy sets is ensured by the use of coupled weights in the consequents (fuzzy sets which are assigned to the same linguistic term share their parameters) and in the antecedents (layer two). Furthermore, constraints can be defined, which have to be observed by the learning method, e.g. that the fuzzy sets have to cover the considered state space. A possible structure is shown in Figure 3. A more formal definition is given in the following.

Definition 1. Let Ant(r) be the set of the fuzzy sets used in the antecedent and Con(r) be the set of the fuzzy sets used in the consequent of rule r. Then the recurrent neuro-fuzzy system is a fuzzy system with the following specifications:

(i) All fuzzy sets $\mu_j^{(i)}(x_i) \in Ant(r)$ are defined either as Gaussian-like or as logistic fuzzy sets,

$\mu_j^{(i)}(x_i) = e^{-\frac{(x_i - c_j^{(i)})^2}{2 (\alpha_j^{(i)})^2}}$   or   $\mu_j^{(i)}(x_i) = \frac{1}{1 + e^{\beta_j^{(i)} (x_i - d_j^{(i)})}}$,

where $\mu_j^{(i)}$ is the j-th fuzzy set assigned to input $x_i$.

(ii) All fuzzy sets $\nu_l^{(k)}(y_k) \in Con(r)$ are defined by Gaussian-like fuzzy sets,

$\nu_l^{(k)}(y_k) = e^{-\frac{(y_k - \hat{c}_l^{(k)})^2}{2 (\hat{\alpha}_l^{(k)})^2}}$,

where $\nu_l^{(k)}$ is the l-th fuzzy set assigned to output $y_k$. Each rule is a tuple

$R_r = (\mu_{j_{r1}}^{(i_{r1})}, \ldots, \mu_{j_{rn}}^{(i_{rn})}, \nu_{l_{r1}}^{(k_{r1})}, \ldots, \nu_{l_{rn}}^{(k_{rn})})$

and has the form

$R_r$: if $x_{i_{r1}}$ is $\mu_{j_{r1}}^{(i_{r1})}$ and ... and $x_{i_{rn}}$ is $\mu_{j_{rn}}^{(i_{rn})}$ then $y_{k_{r1}}$ is $\nu_{l_{r1}}^{(k_{r1})}$ and ... and $y_{k_{rn}}$ is $\nu_{l_{rn}}^{(k_{rn})}$.

(iii) The activation $a_r(t)$ of a fuzzy rule at time t is computed by

$a_r(t) = \prod_{i,j:\, \mu_j^{(i)} \in Ant(r)} \mu_j^{(i)}(x_i)$.

(iv) The output of the system for each output variable $y_k$ is computed by a weighted sum:

$y_k(t) = \frac{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t)) \cdot center(\nu_{l_r}^{(k)}, a_r(t))}{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t))} = \frac{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t)) \cdot \hat{c}_{l_r}^{(k)}}{\sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} area(\nu_{l_r}^{(k)}, a_r(t))}$,

where area computes the area and center the center of gravity of the output fuzzy set $\hat{\nu}_{l_r}^{(k)}$ of rule r for $y_k$ at time t, which is defined as $\hat{\nu}_{l_r}^{(k)}(y_k) = \min(\nu_{l_r}^{(k)}(y_k), a_r(t))$.

(v) Two types of feedback connections are allowed:
a. Time-delayed feedback: any output $y_k$ can be assigned to an input $x_i$ by defining $x_i(t) := y_k(t-1)$.
b. Hierarchical feedback: an output $y_k$ can be assigned to an input $x_i$ by defining $x_i(t) := y_k(t)$ if $x_i(t)$ is not used for the computation of $y_k(t)$, neither directly nor indirectly via another (hierarchical) input $x_j(t)$.
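Parts (i)–(iv) of Definition 1 can be sketched numerically as follows (a minimal Python sketch with hypothetical helper names; the area and center-of-gravity integrals are approximated on a discrete grid):

```python
import numpy as np

def gauss(c, alpha):
    """Gaussian-like fuzzy set as in (i)/(ii)."""
    return lambda v: np.exp(-(v - c) ** 2 / (2.0 * alpha ** 2))

def activation(antecedents, x):
    """(iii): product of the antecedent membership degrees."""
    act = 1.0
    for i, mu in antecedents:       # i indexes the input variable
        act *= mu(x[i])
    return act

def output(rules, x, y_grid):
    """(iv): area-weighted sum over the clipped consequent fuzzy sets."""
    dy = y_grid[1] - y_grid[0]
    num = den = 0.0
    for antecedents, nu in rules:
        a_r = activation(antecedents, x)
        clipped = np.minimum(nu(y_grid), a_r)     # min(nu, a_r)
        area = clipped.sum() * dy
        if area > 0.0:
            center = (y_grid * clipped).sum() * dy / area
            num += area * center                  # for symmetric sets, center = c-hat
            den += area
    return num / den if den > 0.0 else 0.0

# Two illustrative one-input rules sharing the input x_0:
rules = [([(0, gauss(0.0, 1.0))], gauss(-1.0, 0.5)),   # if x0 near 0 then y near -1
         ([(0, gauss(2.0, 1.0))], gauss(1.0, 0.5))]    # if x0 near 2 then y near 1
y = output(rules, [0.0], np.linspace(-4.0, 4.0, 801))  # dominated by the first rule
```

Because the consequents are symmetric Gaussians, the center of gravity of each clipped set equals its center $\hat{c}$, which is why the two forms of the weighted sum in (iv) coincide.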
The feedback connections defined in (v) enable the definition of hierarchical fuzzy systems (b.) and the use of time-delayed inputs (a.). It should be noted that only time-delayed feedback connections are considered as 'real' recurrent connections. Hierarchical feedback connections are simply a 'recurrent' definition of a hierarchical fuzzy rule base, and the condition defined in (b.) ensures that the rule base can be sorted topologically.

The used defuzzification method is a modified center of gravity approach (COG; see e.g. Kruse, Gebhardt, and Klawonn 1994). In the commonly used COG approaches the output fuzzy sets are merged prior to defuzzification. Therefore, overlapping areas of fuzzy sets are neglected. Here, each area is weighted by the number of occurrences, thus considering the number of 'votes' for a specific output.

Learning Method for Parameter Optimization
The considered learning problem is a function approximation problem, and for each input vector (time series vector) the desired output vector is known. Therefore, supervised learning methods can be used to optimize the parameters of the membership functions. The realized learning method is motivated by the real-time recurrent learning (RTRL) method, which has already been applied successfully for the optimization of the weights in recurrent neural networks (Williams, and Zipser 1989). The idea is to propagate the error obtained at the output units back through the rules of the (hierarchical) fuzzy system and to adapt the fuzzy sets accordingly. If feedback connections have to be considered, the fuzzy rule base is unfolded in time and the error is propagated back through time, as described briefly below. A more detailed description of the learning algorithm is given in (Nürnberger 2001).
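This idea – unfold the recurrent system over time, accumulate the output error, and adapt the parameters by gradient descent – can be illustrated with a minimal Python sketch (a toy scalar recurrent system with illustrative names; a central finite difference stands in for the analytic gradients derived in the following):

```python
# Minimal sketch of gradient-based learning for a recurrent system.
# The analytic RTRL gradients are replaced by a finite difference for brevity.

def propagate(p, y0, steps):
    """Unfold the recurrent system y(t) = p * y(t-1) over time."""
    ys, y = [], y0
    for _ in range(steps):
        y = p * y               # stand-in for one pass through the rule base
        ys.append(y)
    return ys

def total_error(p, y0, targets):
    # E = sum_t 1/2 (o(t) - y(t))^2, cf. Equations (1)-(3)
    ys = propagate(p, y0, len(targets))
    return 0.5 * sum((o - y) ** 2 for o, y in zip(targets, ys))

def train(p, y0, targets, eta=0.05, cycles=200, h=1e-6):
    for _ in range(cycles):
        # finite-difference stand-in for dE/dp
        grad = (total_error(p + h, y0, targets)
                - total_error(p - h, y0, targets)) / (2 * h)
        p -= eta * grad         # p(t+1) = p(t) - eta * dE/dp, cf. Equation (7)
    return p

# Recover the decay rate of a sequence generated with p = 0.8:
targets = propagate(0.8, 1.0, 10)
p_learned = train(0.3, 1.0, targets)
```

As in the full model, the system's own outputs are fed back during training, so the error at time t depends on the parameter through all earlier time steps.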
Let E be the total error (the cost function to be minimized) of all output neurons k over all time steps t = 0, ..., T:

$E = \sum_{t=0}^{T} E(t)$,   (1)

with

$E(t) = \frac{1}{2} \sum_k (E_k(t))^2$,   (2)

and let

$E_k(t) = \begin{cases} o_k(t) - y_k(t) & \text{if node } k \text{ has a target output } o_k \text{ at time } t, \\ 0 & \text{otherwise}, \end{cases}$   (3)

be an error measure for output node k. Then the gradient of the error E(t) for an arbitrary parameter p of the considered fuzzy rule base is defined as

$\frac{\partial E(t)}{\partial p} = -\sum_k E_k(t) \frac{\partial y_k(t)}{\partial p}$.   (4)

Using Definition 1 we obtain

$\frac{\partial y_k(t)}{\partial p} = \frac{\partial y_k(t)}{\partial f_p(p)} + \sum_{r:\, \nu_{l_r}^{(k)} \in Con(r)} \frac{\partial y_k(t)}{\partial a_r(t)} \cdot \left( \frac{\partial a_r(t)}{\partial p} + \sum_{i:\, \mu_{j_r}^{(i)} \in Ant(r)} \frac{\partial a_r(t)}{\partial x_i(t)} \cdot \frac{\partial x_i(t)}{\partial p} \right)$,   (5)

where the derivatives $\frac{\partial x_i(t)}{\partial p}$ have to consider the derivatives resulting from the input of connected subsystems or of prior time steps if a time-delayed link is used. Thus, four cases have to be considered:

$\frac{\partial x_i(t)}{\partial p} = \begin{cases} \frac{\partial y_j(t)}{\partial p} & \text{if } x_i(t) \text{ is a hierarchical feedback } y_j(t), \\ \frac{\partial y_j(t-1)}{\partial p} & \text{if } x_i(t) \text{ is a time-delayed feedback } y_j(t-1), \\ \frac{\partial x_j(t-1)}{\partial p} & \text{if } x_i(t) \text{ is a self-feedback}, \\ 0 & \text{if } x_i(t) \text{ is an external input}. \end{cases}$   (6)

If $x_i(t)$ is an external input to the system, then $\frac{\partial x_i(t)}{\partial p} = 0$, since the external input is independent of any parameter of the system (neglecting any external recurrences, which could not be handled by the system itself). If $x_i(t)$ is assigned to an internal variable of the system, three cases have to be considered:

• For a hierarchical feedback with $x_i(t) := y_j(t)$, $\frac{\partial x_i(t)}{\partial p}$ can be computed by (5) by replacing $x_i(t)$ with the output variable $y_j(t)$ of the preceding layer, since a hierarchical feedback is independent of $x_i$.

• If $x_i(t)$ is a self-feedback input, i.e. $x_i(t) := x_j(t-1)$, then $\frac{\partial x_i(t)}{\partial p}$ can be computed iteratively similar to RTRL learning (Williams, and Zipser 1989), by assuming $\frac{\partial x_i(0)}{\partial p} = 0$, since the initial state can be considered as independent of p.

• If $x_i(t)$ is a time-delayed feedback input, i.e. $x_i(t) := y_j(t-1)$, then $\frac{\partial x_i(t)}{\partial p} = \frac{\partial y_j(t-1)}{\partial p}$, and $\frac{\partial y_j(t-1)}{\partial p}$ can be computed by use of Equation (5).

Finally, the parameters of the fuzzy sets are adapted according to

$p(t+1) = p(t) - \eta_p \frac{\partial E(t)}{\partial p}$,   (7)

where $\eta_p$ is the learning rate for parameter p.

Figure 3. Possible structure of the recurrent neuro-fuzzy system (using one time-delayed and one hierarchical feedback)

Implementation
The learning method discussed above was implemented in a software tool for the interactive design and computation of hierarchical recurrent fuzzy rule bases. To ensure a good interpretability of a learned rule base, the tool supports – besides shared weights as defined above – the use of simple constraints that have to be observed by the learning process (for discussions concerning the use of constraints in neuro-fuzzy learning approaches see, e.g., Nauck, Klawonn, and Kruse 1997) and a simple pruning strategy: rules are removed from the rule base if the support of an assigned fuzzy set decreases below a threshold value during learning. (The influence of these rules on the output is very low, and therefore they can usually be neglected.) Furthermore, some modifications of the learning process have been implemented: batch or on-line learning combined with a momentum term and a teacher forcing strategy (see, e.g. Williams, and Zipser 1989).

Figure 4. Screenshot of the software implementation

Besides the manual definition of a rule base, the software implementation also provides a heuristic to learn a fuzzy rule base from data. This method is briefly described in the following.

Rule Base Learning
The main idea of this approach is to learn the hierarchically structured rule base of local subsystems by use of rule templates.
These templates enable the definition of the variables and of the respective time delays that should be used for the creation of the subsystems. For recurrent systems it is usually necessary to have different fuzzy partitions for input and output variables that are assigned to the same domain. Therefore, the rule templates allow the definition of individual subsets of fuzzy sets for each domain. Thus, the rule learning algorithm can be restricted to choose only fuzzy sets of a specific subset during rule creation. In the realized software implementation, subsets can be created by defining a simple filter pattern for every variable, such that only fuzzy sets with matching linguistic terms are used in the learning process for a specific rule template. The templates can be defined using generic fuzzy rules. For example, the subsystem template

IF (x1(t-1) LIKE '*2') AND (x2(t-1) LIKE '*1') THEN (y1(t) LIKE '*0')

is used to create rules like

IF (x1(t-1) IS small2) AND (x2(t-1) IS large1) THEN (y1(t) IS zero0)
IF (x1(t-1) IS zero2) AND (x2(t-1) IS large1) THEN (y1(t) IS large0).

Based on the rule templates, the specific instances of the membership functions have to be selected for each rule. The used learning method is motivated by the heuristics implemented in the NEFPROX model (Nauck, and Kruse 1997). The algorithm requires an existing fuzzy partitioning for each considered domain. However, the rule base learning algorithm can be modified to create fuzzy sets and thus to avoid an initial partitioning of the domains (Nauck, and Kruse 1997). The algorithm consists of two parts. During the first part (rule base learning), the algorithm creates rules iteratively based on the given training data.
The algorithm selects for each variable the fuzzy set (of the subset) that obtains the highest membership degree for the given value and assigns it to the antecedent or consequent, respectively. Then the fuzzy rules are created based on the given templates. For fuzzy rules used to compute inner variables, the respective consequents are assigned randomly. During the second part (consequent re-assignment), the algorithm re-assigns the consequents based on the mean obtained over all inputs for each output variable (considering a minimal rule activation) during a complete propagation through time, i.e. the fuzzy set that obtains the highest membership degree for the mean value is assigned to the consequent.

Application Example
In the following example we use the discussed neuro-fuzzy system to identify a dynamic process in order to be able to predict its future states. The example describes an application to a system identification problem from control theory (a simple physical spring-mass model). However, an analogous problem in decision making is the identification of a model that can be used to predict the next state of a dynamic process. For the derivation of the model, we consider a mass on a spring. Let the height x = 0 denote the position of the mass m when it is at rest. Suppose the mass is lifted to a certain height x(0) = x0 and is then dropped (i.e. the initial velocity is v(0) = 0). Since the gravitational force on the mass m is the same for all heights x, we may neglect it. The force exerted by the spring is governed by Hooke's law, according to which the force of a spring is proportional to the change of the length of the spring and opposite to the direction of this change. This means $F = -c \cdot \Delta l = -c \cdot x$ holds, where c is a constant that depends on the spring. According to Newton's second law, $F = m \cdot a = m \cdot \ddot{x}$, this force causes an acceleration of the mass m.
Consequently, we obtain the differential equation

$\ddot{x} = -\frac{c}{m} x$,

with the initial values $x(0) = x_0$ and $v(0) = \dot{x}(0) = 0$. This second order differential equation can be transformed into a system of two coupled first order differential equations by introducing an intermediary variable v:

$\dot{x} = v$ and $\dot{v} = -\frac{c}{m} x$.   (8)

As system parameters the values c = 40, m = 1 and x0 = 1 were used. A data set was generated by simulating the system for a period of 20 sec. with an explicit fourth-order Runge-Kutta method (fixed step size Δt = 0.1) and storing the values x(t) and v(t). To obtain a rule base that can be interpreted with respect to each variable, two rule templates – one for the computation of x and one for the computation of v – were defined for the rule base learning method. The domains of x and v were partitioned by three Gaussian-like fuzzy sets (neg, zero, pos), while for each subsystem and delay independent fuzzy sets with similar initial parameters were defined. The use of independent fuzzy sets for output and (time-delayed) input is necessary to allow each system to scale between the output and input values.
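The generation of the training data described above can be sketched as follows (a minimal Python sketch of the classical fourth-order Runge-Kutta scheme applied to the system (8), using the stated parameters c = 40, m = 1, x0 = 1 and Δt = 0.1):

```python
def rk4_step(state, h, f):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(state)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = f([s + h * k for s, k in zip(state, k3)])
    return [s + h / 6.0 * (a + 2 * b + 2 * c_ + d)
            for s, a, b, c_, d in zip(state, k1, k2, k3, k4)]

c, m, dt = 40.0, 1.0, 0.1
spring = lambda s: [s[1], -c / m * s[0]]   # (x, v) -> (v, -c/m * x), cf. (8)

state, data = [1.0, 0.0], []               # x(0) = x0 = 1, v(0) = 0
for _ in range(200):                       # 20 sec. with step size 0.1
    data.append(tuple(state))
    state = rk4_step(state, dt, spring)
```

The exact solution is x(t) = x0·cos(√(c/m)·t), i.e. an undamped oscillation with angular frequency √40 ≈ 6.32 rad/s, so the stored trajectory contains roughly twenty oscillation periods.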
The rule base learning method yielded the following rules:

IF (x[t-1] IS neg2 AND v[t-1] IS neg1) THEN x IS neg0
IF (x[t-1] IS neg2 AND v[t-1] IS zero1) THEN x IS neg0
IF (x[t-1] IS neg2 AND v[t-1] IS pos1) THEN x IS zero0
IF (x[t-1] IS zero2 AND v[t-1] IS neg1) THEN x IS neg0
IF (x[t-1] IS zero2 AND v[t-1] IS zero1) THEN x IS zero0
IF (x[t-1] IS zero2 AND v[t-1] IS pos1) THEN x IS pos0
IF (x[t-1] IS pos2 AND v[t-1] IS neg1) THEN x IS zero0
IF (x[t-1] IS pos2 AND v[t-1] IS zero1) THEN x IS pos0
IF (x[t-1] IS pos2 AND v[t-1] IS pos1) THEN x IS pos0
IF (x[t-1] IS neg1 AND v[t-1] IS neg2) THEN v IS zero0
IF (x[t-1] IS neg1 AND v[t-1] IS zero2) THEN v IS pos0
IF (x[t-1] IS neg1 AND v[t-1] IS pos2) THEN v IS pos0
IF (x[t-1] IS zero1 AND v[t-1] IS neg2) THEN v IS neg0
IF (x[t-1] IS zero1 AND v[t-1] IS zero2) THEN v IS zero0
IF (x[t-1] IS zero1 AND v[t-1] IS pos2) THEN v IS pos0
IF (x[t-1] IS pos1 AND v[t-1] IS neg2) THEN v IS neg0
IF (x[t-1] IS pos1 AND v[t-1] IS zero2) THEN v IS neg0
IF (x[t-1] IS pos1 AND v[t-1] IS pos2) THEN v IS zero0

If we interpret this rule base, we see that the learned rule bases of the subsystems for the computation of x and v are similar to the rule base of a simple integrator (adder). So we can suppose that x is computed by integration of v, and v by integration of (negated) x. If we compare this interpretation to Equation (8), we see that this behavior corresponds to the physical model. After rule base learning, the rule base was optimized for 4000 training cycles (complete propagations through time) using x and v as training data. Figure 6 depicts the learning progress. The learning terminated with a summed square error of E = 12.872, which is caused by the deviation from the frequency and amplitude of the physical system. Figure 5 shows the position x and velocity v of the trained system and the validation data over time (x and v given).
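The template mechanism that produced rules of the form shown above can be sketched as follows (a minimal Python sketch with hypothetical names: a LIKE pattern selects the admissible fuzzy sets per variable, and for a training sample the best-matching set of each subset is chosen, as in the rule learning heuristic):

```python
import math
from fnmatch import fnmatch

# Illustrative Gaussian fuzzy sets; the trailing digit encodes the subset
# ('1' and '2' for the time-delayed inputs, '0' for the output).
def gauss(c, alpha):
    return lambda v: math.exp(-(v - c) ** 2 / (2 * alpha ** 2))

sets = {name + suffix: gauss(c, 0.5)
        for c, name in [(-1.0, 'neg'), (0.0, 'zero'), (1.0, 'pos')]
        for suffix in ('0', '1', '2')}

def best_fit(value, pattern):
    """Among the fuzzy sets matching the LIKE pattern, pick the one with
    the highest membership degree for the given value."""
    candidates = {n: mu for n, mu in sets.items() if fnmatch(n, pattern)}
    return max(candidates, key=lambda n: candidates[n](value))

def instantiate(template, sample):
    """template: list of (variable, pattern); sample: variable -> value."""
    return [(var, best_fit(sample[var], pat)) for var, pat in template]

# Template for the x-subsystem:
# IF (x[t-1] LIKE '*2') AND (v[t-1] LIKE '*1') THEN (x LIKE '*0')
template = [('x[t-1]', '*2'), ('v[t-1]', '*1'), ('x', '*0')]
rule = instantiate(template, {'x[t-1]': -0.9, 'v[t-1]': 0.1, 'x': -0.8})
```

For this sample the sketch yields the antecedents neg2 and zero1 and the consequent neg0, i.e. one instance of the first rule group listed above.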
To evaluate the performance of the presented approach for inner variables, in subsequent runs either only x or only v was presented to the system for learning; the other variable was thus considered as unknown. The simulation run using just x for training yielded the best approximation of the amplitudes of x and v, while the system that was trained by use of v fits the frequencies better (see Figure 5). Both approaches achieved an appropriate approximation of the physical model. The performances of the resulting models are similar to that of the model which was trained using both variables (x and v) as known output values for learning. It has to be emphasized that the only input to the system are the initial values x(0) = 1 and v(0) = 0. The dynamic behavior results from the reuse of the output signals as input (time-delayed feedback). Even during learning, the system used its own output as input and just the error signal Ei(t) – the difference between the system output and the training data – for learning.

Figure 6. Change of E(t) (x and v given)

Discussion
In the preceding section we have discussed the application of the proposed neuro-fuzzy system to a simple dynamical system to show its principal usability for the approximation of dynamic systems. The analysis covered the identification of hierarchical as well as recurrent rule structures. In the same way, the model can be applied to decision making problems:
• If prior knowledge is available, it can be used to define an initial rule base.
• Using rule templates, existing training data can be used to learn (additional) fuzzy rules.
• The membership functions used to define an existing or learned fuzzy rule base can be optimized using available training data.
• The obtained fuzzy rule base can be analyzed to obtain information about the analyzed data set or to justify the decision process.

However, for the application to decision making problems we have to consider that the proposed model is based on local subsystems and that the inner variables are defuzzified before they are used in a succeeding layer (subsystem). Thus, we neglect information about the specificity or vagueness of an attribute in the inference process. Furthermore, we cannot forward alternative solutions to succeeding layers, and we therefore have to avoid contradicting rules (e.g. implementing alternative decisions). To further improve the usability of the discussed system, the structure of the neuro-fuzzy model could be extended so that symbolic attributes can also be used directly. Currently, these attributes have to be encoded as real values (if an order can be defined) or as a binary vector. Furthermore, the model should be extended to support the use of patterns with missing values. Here, approaches similar to the methods discussed in (Nauck, and Kruse 1999) might be appropriate. These aspects will be considered in future work.

Figure 5. System output (x and v given)

Conclusions
In this article we have discussed hierarchical recurrent fuzzy systems, which tackle the complexity problems of pure feed-forward architectures and enable the approximation of dynamic processes by storing information about prior system states internally. Furthermore, we have discussed a neuro-fuzzy model that can be used to optimize hierarchical recurrent fuzzy rule bases. The model allows the use of prior knowledge in form of fuzzy rules or rule templates, e.g. to define an appropriate initial solution or to enhance the learning performance. The learning method ensures the interpretability of the optimized rule base by the use of shared weights (fuzzy sets) and constraints.
The proposed neuro-fuzzy system can be a valuable tool to (interactively) optimize arbitrary hierarchical recurrent fuzzy rule bases. However, we have to be aware that even if we use neuro-fuzzy techniques for optimization, we still apply a fuzzy system. In general, fuzzy systems cannot be expected to outperform other approximation approaches. This is mainly prevented by the usually small number of linguistic terms that are shared by all rules to ensure an interpretable solution. This trade-off between (often quantitative) methods that achieve good performance and (often qualitative) models that give insights into a specific domain or problem was formulated as the principle of the incompatibility of precision and meaning by Zadeh (Zadeh 1973). The benefit gained by using a fuzzy system lies in the interpretability and readability of the rule base. This is widely considered more important than the 'last percent' of increase in performance.

Acknowledgements
The work presented in this article was partially supported by BTexact Technologies, Adastral Park, Martlesham, UK.

References
Bezdek, J. C., Keller, J. M., Krishnapuram, R., and Pal, N. R. 1998. Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, The Handbooks on Fuzzy Sets, Kluwer Academic Publishers, Norwell, MA.
Chi, Z., and Yan, H. 1996. ID3-derived fuzzy rules and optimal defuzzification for handwritten numeral recognition, IEEE Trans. Fuzzy Systems, 4(1), pp. 24-31.
Cordon, O., Herrera, F., Hoffmann, F., and Magdalena, L. 2001. Genetic Fuzzy Systems: Evolutionary Tuning and Learning of Fuzzy Knowledge Bases, Advances in Fuzzy Systems, World Scientific, Singapore.
Gorrini, V., and Bersini, H. 1994. Recurrent Fuzzy Systems, In: Proceedings of the 3rd Conference on Fuzzy Systems (FUZZ-IEEE 94), IEEE, Orlando.
Höppner, F., Klawonn, F., Kruse, R., and Runkler, T. 1999. Fuzzy Cluster Analysis, Wiley, Chichester.
Kruse, R., Gebhardt, J., and Klawonn, F. 1994.
Foundations of Fuzzy Systems, Wiley, New York.
Kruse, R., Schwecke, E., and Heinsohn, J. 1991. Uncertainty and Vagueness in Knowledge-Based Systems: Numerical Methods, Springer-Verlag, Berlin.
Lee, M., and Takagi, H. 1993. Integrating Design Stages of Fuzzy Systems Using Genetic Algorithms, In: Proc. IEEE Int. Conf. on Fuzzy Systems 1993, pp. 612-617, San Francisco.
Lin, C.-T., and Lee, C.-C. 1996. Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems, Prentice Hall, New York.
Mamdani, E. H., and Assilian, S. 1975. An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller, Int. J. Man-Machine Studies, 7, pp. 1-13.
Nauck, D. 1994. A Fuzzy Perceptron as a Generic Model for Neuro-Fuzzy Approaches, In: Proc. Fuzzy-Systeme'94, Munich.
Nauck, D., and Kruse, R. 1997. Function Approximation by NEFPROX, In: Proc. Second European Workshop on Fuzzy Decision Analysis and Neural Networks for Management, Planning, and Optimization (EFDAN'97), pp. 160-169, Dortmund.
Nauck, D., and Kruse, R. 1999. Learning in Neuro-Fuzzy Systems with Symbolic Attributes and Missing Values, In: Proc. Sixth International Conference on Neural Information Processing (ICONIP'99), pp. 142-147, Perth.
Nauck, D., Klawonn, F., and Kruse, R. 1997. Foundations of Neuro-Fuzzy Systems, Wiley, Chichester.
Nürnberger, A. 2001. A Hierarchical Recurrent Neuro-Fuzzy System, In: Proc. of Joint 9th IFSA World Congress and 20th NAFIPS International Conference, pp. 1407-1412, IEEE.
Nürnberger, A., Nauck, D., and Kruse, R. 1999. Neuro-Fuzzy Control Based on the NEFCON-Model, Soft Computing, 2(4), pp. 182-186, Springer, Berlin.
Pan, J., DeSouza, G. N., and Kak, A. C. 1998. FuzzyShell: A Large-Scale Expert System Shell using Fuzzy Logic for Uncertainty Reasoning, IEEE Trans. on Fuzzy Systems, 6(4), pp. 563-581.
Pedrycz, W. 1998. Computational Intelligence: An Introduction, CRC Press, Boca Raton, FL.
Steimann, F., and Adlassnig, K.-P. 1998.
Fuzzy Medical Diagnosis, In: Ruspini, E., Bonissone, P. P., and Pedrycz, W. (eds.), Handbook of Fuzzy Computation, IOP Publishing Ltd and Oxford University Press.
Wang, C.-H., Liu, J.-F., Hong, T.-P., and Tseng, S.-S. 1999. A fuzzy inductive learning strategy for modular rules, Fuzzy Sets and Systems, 103, pp. 91-105.
Williams, R. J., and Zipser, D. 1989. Experimental analysis of the real-time recurrent learning algorithm, Connection Science, 1, pp. 87-111.
Yuan, Y., and Shaw, M. J. 1995. Induction of Fuzzy Decision Trees, Fuzzy Sets and Systems, 69(2), pp. 125-139.
Zadeh, L. A. 1965. Fuzzy Sets, Information and Control, 8, pp. 338-353.
Zadeh, L. A. 1973. Outline of a New Approach to the Analysis of Complex Systems and Decision Processes, IEEE Trans. Systems, Man & Cybernetics, 3(1), pp. 28-44.