MIT eScience REPORT

FUZZY SIGNATURES

Bai Qifeng

KEY WORDS: cluster, c-means, data mining, fuzzy logic, fuzzy signature, if-then rules, projection-based

ABSTRACT: A major advantage of fuzzy theory is that it allows problems to be described naturally and linguistically instead of with precise numerical values. This ability to deal with a complicated system in an intuitive way is the main reason why fuzzy theory is widely applied. However, fuzzy systems suffer from rule explosion in complicated systems. There are two ways to reduce rule sets. One is to create a sparse system. The other is to construct fuzzy signatures. A hierarchical fuzzy signature structure can also assist human experts by handling missing data and reducing the presentation of unnecessary information.

1. Introduction

Over the past few decades, fuzzy logic theory has been widely used in process control, management and decision making, operations research, and economics. Dealing with simple ‘yes’ and ‘no’ answers is no longer satisfactory; a degree of membership (Zadeh, 1965) became a new way of solving problems. Fuzzy logic derives from the observation that human common-sense reasoning is approximate in nature.

However, when fuzzy theory is used to handle practical problems, large data sets are inevitable. Classical fuzzy systems also cannot tackle problems with complicated and interdependent features, or problems where data is missing (Wong, Gedeon and Kóczy, 2001). In these situations the result is a heavy workload and overly complicated rules.

Fuzzy signatures were introduced to solve problems in the economic and medical domains, which are full of complicated and interdependent objects that need to be classified and evaluated (Gedeon et al. 2001). A fuzzy signature has a hierarchical structure in which relevant data are arranged as vectors of fuzzy values, which are in turn contained in higher-level vectors.
This tree structure is created to mimic the decision-making process of human experts, and it can handle situations in which the number of data items differs between cases or some items are missing.

In this report, an approach named the Projection Based Method for Fuzzy Systems is introduced to construct a fuzzy rule base automatically from a set of input-output sample data. It adapts data mining technology to cluster the output space produced by a set of training data, and then projects each output cluster back onto each input dimension. The clusters from the different input dimensions can be used to create fuzzy rules (Chong, 2001).

2. Fuzzy Logic Theory

With the development of electronic devices, computer systems handle precisely measured values, for example a speed of 50.12 miles/hour. Conventional bivalent sets can tell us whether this speed is ‘fast’ or ‘slow’. The most obvious limitation of bivalent sets is that they are mutually exclusive: it is not possible to have membership of more than one set. Opinions also differ widely as to whether 50 miles/hour is ‘fast’ or ‘slow’, so the expert knowledge needed to define the system is mathematically at odds with the human world. It is therefore not accurate to define a sharp transition from a quantity such as ‘slow’ to ‘fast’. If 50 miles/hour is the boundary, then 49.9 means ‘slow’ but 50.12 means ‘fast’ in a bivalent set. In the real world, however, there should be a smooth change from ‘slow’ to ‘fast’. This natural phenomenon can be described more accurately by fuzzy set theory.

A fuzzy set is a set whose elements have degrees of membership. An element of a fuzzy set can be a full member (100% membership) or a partial member (between 0% and 100% membership). That is, the membership value assigned to an element is no longer restricted to just two values, but can be 0, 1 or any value in between.
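The contrast between a bivalent set and a fuzzy set can be made concrete with a short sketch. The 50 miles/hour boundary and the two sample speeds come from the text; the linear ramp from 40 to 60 miles/hour used for the fuzzy set ‘fast’, and the function names, are assumptions chosen only for illustration.

```python
def crisp_fast(speed_mph: float, boundary: float = 50.0) -> int:
    """Bivalent set: a speed is either 'fast' (1) or not (0)."""
    return 1 if speed_mph >= boundary else 0


def fuzzy_fast(speed_mph: float, low: float = 40.0, high: float = 60.0) -> float:
    """Fuzzy set: membership in 'fast' rises linearly from 0 at `low` to 1 at `high`.

    The ramp end points are illustrative assumptions, not values from the report.
    """
    if speed_mph <= low:
        return 0.0
    if speed_mph >= high:
        return 1.0
    return (speed_mph - low) / (high - low)


for speed in (49.9, 50.12):
    # The crisp set jumps from 0 to 1 across the boundary,
    # while the fuzzy membership changes only slightly.
    print(speed, crisp_fast(speed), round(fuzzy_fast(speed), 3))
```

With these assumed end points the two speeds on either side of the boundary receive almost the same fuzzy membership, while the bivalent set assigns them to opposite classes.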
The mathematical function which defines the degree of an element's membership in a fuzzy set is called the membership function.

Let U be a universal set that contains all elements, and let A be a crisp set. It can be presented as

$$A = \{\, x \in U \mid x \text{ meets some conditions} \,\}$$

Definition of Characteristic Function:

$$\mu_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases}$$

Definition of Membership Function: The value of a fuzzy set in U is presented by the membership function $\mu_A(x)$. Another intuitive presentation is to regard $\mu_A(x)$ as the percentage to which x belongs to A. Usually, a fuzzy set A in U is presented as a set of ordered pairs of x and its membership value:

$$A = \{\, (x, \mu_A(x)) \mid x \in U \,\}$$

For example, Figure 1 describes the graph of the membership functions of fever.

[Figure 1: Membership functions of fever for the levels Slight, Moderate, Severe and Extreme. Horizontal axis: body temperature with breakpoints 37.3, 37.9, 38.6, 39.1 and 40; vertical axis: membership value from 0 to 1. Fever rule data came from www.bhp.doh.gov.tw]

The algorithm is:

    For each level of fever:
        if data > min and data <= median:
            value = (data - min) / (median - min)
        else if data > median and data < max:
            value = (max - data) / (max - median)
        else:
            value = 0

Here min and max are the temperatures at which the membership of a level falls to 0, and median is the temperature at which it reaches 1.

[Figure: The membership functions of Figure 1 with the sample temperatures 37.8, 38.4 and 39.8 marked]

Assume $U = \{37.8, 38.4, 39.8\}$. With the memberships ordered as (Slight, Moderate, Severe, Extreme), the set A is:

$$\mu_A(37.8) = (0.83,\ 0,\ 0,\ 0)$$
$$\mu_A(38.4) = (0.29,\ 0.71,\ 0,\ 0)$$
$$\mu_A(39.8) = (0,\ 0,\ 0.22,\ 0.78)$$

Membership functions often have a more complex shape than fever(x). They tend at least to be upward-pointing triangles, and they can be much more complex than that. Furthermore, the membership functions discussed so far are based on a single criterion. This is the most common case, but it is not the only one. One could, for example, want the membership function for fever to depend not only on a person's temperature but also on other symptoms.

3. Fuzzy Control

Fuzzy control, which directly uses fuzzy rules, is the most important current application of fuzzy theory. I use a procedure originated by Ebrahim Mamdani in the 1970s as a demonstration of how fuzzy systems work.
Three steps are used to create a fuzzy controlled machine:

1) Fuzzification (using membership functions to graphically describe a situation)
2) Rule evaluation (application of fuzzy rules)
3) Defuzzification (obtaining the crisp or actual results)

Now we want to construct an inverted pendulum system. The problem is to balance a pole on a mobile platform that can move in only two directions, to the left or to the right. The angle between the platform and the pendulum and the angular velocity of this angle are chosen as the inputs of the system. The output corresponds to the speed of the platform (Mamdani, 1972).

3.1. Fuzzification

First of all, the different levels of the output of the platform (speed) are defined by specifying the membership functions for the fuzzy sets. The graph of the function is shown below.

Similarly, the different angles between the platform and the pendulum and... the angular velocities of specific angles are also defined.

Note: For simplicity, it is assumed that all membership functions are spread equally.

3.2. Rule evaluation

The next step is to define the fuzzy rules. The fuzzy rules are a series of if-then statements. These statements are usually derived by an expert to achieve optimum results. Some examples of these rules are:

If angle is zero and angular velocity is zero then speed is also zero.
If angle is zero and angular velocity is low then the speed shall be low.

The full set of rules is summarized in the table below. The dashes mark conditions that have no rules associated with them, which simplifies the situation.
Speed as a function of angle (columns) and angular velocity (rows):

Velocity \ Angle | negative high | negative low | zero          | positive low | positive high
negative high    | ---           | ---          | negative high | ---          | ---
negative low     | ---           | ---          | negative low  | zero         | ---
zero             | negative high | negative low | zero          | positive low | positive high
positive low     | ---           | zero         | positive low  | ---          | ---
positive high    | ---           | ---          | positive high | ---          | ---

An application of these rules is shown using specific values for the angle and the angular velocity. The values used for this example are 0.75 and 0.25 for the zero and positive-low angles, and 0.4 and 0.6 for the zero and negative-low angular velocities. These points are on the graphs below.

Consider the rule "if angle is zero and angular velocity is zero, the speed is zero". The actual value belongs to the fuzzy set zero to a degree of 0.75 for "angle" and 0.4 for "angular velocity". Since this is an AND operation, the minimum criterion is used: the fuzzy set zero of the variable "speed" is cut at 0.4 and the patch is shaded up to that level. This is illustrated in the figure below.

Similarly, the minimum criterion is used for the other three rules. The following figures show the result patches yielded by the rules "if angle is zero and angular velocity is negative low, the speed is negative low", "if angle is positive low and angular velocity is zero, then speed is positive low" and "if angle is positive low and angular velocity is negative low, the speed is zero".

The four results overlap and are reduced to the following figure.

3.3. Defuzzification

The result of the fuzzy controller so far is a fuzzy set (of speed). In order to choose an appropriate representative value as the final output (a crisp value), defuzzification must be done. There are numerous defuzzification methods, but the most commonly used one is the center of gravity of the set, as shown below.

4. Hierarchical Fuzzy System

A major issue in fuzzy applications is how to produce fuzzy rules.
The classical approaches of fuzzy control deal with dense rule bases, where the universe of discourse is fully covered by the antecedent fuzzy sets of the rule base in each dimension, so that there is at least one activated rule for every input (Muresan, 2001). This causes the high computational complexity of these traditional approaches, because the number of rules increases exponentially with the number of inputs and terms. In the above example there are two inputs and 5 terms, which gives 25 rules; with 5 inputs and 5 terms, the number of rules is 3,125. This complexity limits the use of classical fuzzy theory to cases where the number of inputs does not exceed about 6 to 10 (Wong et al. 2003).

If a fuzzy model contains k variables and at most T linguistic (or other fuzzy) terms in each dimension, the number of necessary rules is $O(T^k)$. The number of rules can be decreased by decreasing T, or k, or both, while the method used should avoid losing the easy interpretability of the components. One method leads to sparse rule bases by decreasing T and adapts rule interpolation to create rule bases (Kóczy and Hirota, 1993). The other aims to reduce the dimension k of the sub-rule bases by using meta-levels or hierarchical fuzzy rule bases (Sugeno, Murofushi, Nishino and Miwa, 1991).

As for the hierarchical structure, the basic idea is the following. Often the multi-dimensional input state space $X = X_1 \times X_2 \times \dots \times X_k$ can be decomposed, so that some of its components, e.g. $Z_0 = X_1 \times X_2 \times \dots \times X_{k_0}$, determine a subspace of X ($k_0 < k$), so that in $Z_0$ a partition $\Pi = \{D_1, D_2, \dots, D_n\}$ can be determined:

$$\bigcup_{i=1}^{n} D_i = Z_0$$

In each element of $\Pi$, i.e. $D_i$, a sub-rule base $R_i$ can be constructed with local validity. In the worst case, each sub-rule base refers to exactly $X / Z_0 = X_{k_0+1} \times \dots \times X_k$, and so the hierarchical rule base has the following structure:

R0: If $z_0$ is $D_1$ then use $R_1$
    If $z_0$ is $D_2$ then use $R_2$
    ...
    If $z_0$ is $D_n$ then use $R_n$
    where $z_0 \in Z_0$

R1: If $z_1$ is $A_{11}$ then y is $B_{11}$
    If $z_1$ is $A_{12}$ then y is $B_{12}$
    ...
    If $z_1$ is $A_{1m_1}$ then y is $B_{1m_1}$
    where $z_1 \in X / Z_0$

R2: If $z_1$ is $A_{21}$ then y is $B_{21}$
    If $z_1$ is $A_{22}$ then y is $B_{22}$
    ...
    If $z_1$ is $A_{2m_2}$ then y is $B_{2m_2}$
    where $z_1 \in X / Z_0$

...

Rn: If $z_1$ is $A_{n1}$ then y is $B_{n1}$
    If $z_1$ is $A_{n2}$ then y is $B_{n2}$
    ...
    If $z_1$ is $A_{nm_n}$ then y is $B_{nm_n}$

The fuzzy rules in R0 are pointers to other sub-rule bases. This hierarchical approach does not by itself help with the $O(T^k)$ complexity of the whole rule base: the size of $R_0$ is $O(T^{k_0})$ and each $R_i$, $i > 0$, is of order $O(T^{k-k_0})$, so the resulting complexity is $O(T^{k_0}) \times O(T^{k-k_0}) = O(T^k)$. Only if a suitable $\Pi$ and $Z_0$ are found, where the number of variables in each $Z_i$ is $k_i < k - k_0$ and $\max_{i=1}^{n} k_i = K < k - k_0$, so that each sub-rule base is of order $O(T^K)$, does the application of the structured rule base in effect reduce k to a smaller exponent: $k_0 + K < k$. The main difficulty in the automatic construction of such a system therefore lies in finding a suitable $Z_0$ and $\Pi$.

5. Fuzzy Signature

Fuzzy signatures structure data into vectors of fuzzy values, each of which can be a further vector. They can extend the application of fuzzy theory to domains which contain complex and interdependent features. Fuzzy signatures can be used in cases which have different numbers of data components.

The original definition of fuzzy sets was $A : X \to [0,1]$. It was extended to L-fuzzy sets by Goguen (Goguen, 1967):

$$A_L : X \to L$$

L being an arbitrary algebraic lattice. Vector valued fuzzy sets are described as

$$A_{v,k} : X \to [0,1]^k$$

where the range of membership values is the lattice of k-dimensional vectors with components in the unit interval (Kóczy, 1982). A fuzzy signature allows each component to be a further vector:

$$A_s : X \to [a_i]_{i=1}^{k}, \qquad a_i = \begin{cases} [0,1] \\ [a_{ij}]_{j=1}^{k_i} \end{cases}, \qquad a_{ij} = \begin{cases} [0,1] \\ [a_{ijl}]_{l=1}^{k_{ij}} \end{cases}$$

Generally, this means that each fuzzy signature is a nested vector structure which contains a series of fuzzy signatures, the internal structure of which indicates the semantic and logical connection of the state variables, which are equivalent to the leaves of the signature graph.
It can be denoted as a fuzzy set vector which may have recursive component vectors:

$$A : X \to S^{(n)}, \quad \text{where } n \ge 1, \qquad S^{(n)} = \prod_{i=1}^{n} S_i, \qquad S_i = \begin{cases} [0,1] \\ S^{(m)} \end{cases}$$

and $\prod$ describes the Cartesian product.

A fuzzy signature is a special kind of multidimensional fuzzy data; some of the data in sub-groups affect a feature on their higher level. The relationship between higher and lower levels is controlled by a set of fuzzy aggregations. The value of the parent signature at each level is computed from its branches by an appropriate aggregation of its child signatures. The aggregation methods are not necessarily identical. They can be changed based on expert opinion and the detailed circumstances. With each aggregation, the higher signatures keep less information. In some circumstances, it is useful to reduce and aggregate information in order to maintain compatibility with other sources in which some detail variables are missing or omitted. In most cases, the rule of the maximal common sub-tree applies: all signatures can be interpolated between the corresponding branches.

6. Fuzzy Signature in SARS Pre-clinical Diagnosis

There are two ways to determine the sub-trees of the fuzzy signature. In the first, the structure is determined by human experts. In the second, the structure of the fuzzy signature is decided by identifying the separability of the data (Chong et al. 2002). In this demonstration, the first method is used. The following scheme is the daily symptom signature of a patient:

A_S:
    fever:  8 am, 12 pm, 4 pm, 8 pm
    cough:  12 am, 9 pm
    nausea
    sore

Doctors know which symptoms need to be checked and how often they should be monitored. In reality more symptoms should be tested; to keep the demonstration simple, only some representative symptoms are included.
A few examples with linguistic values and fuzzy signatures are listed below:

A1: none 0.0, none 0.0, slight 0.2, slight 0.2, normal 0.5, slight 0.25, slight 0.25
A2: moderate 0.4, moderate 0.4, high 0.7, severe 0.9, slight 0.25, none 0

Normally, fever values (temperatures) are expressed as, e.g., 38.9 degrees; they can also be converted to linguistic values by considering contextual information such as the different normal body temperatures of adults and children.

Note that the structures which occur in real-world data differ. For patient 2, there are only two measurements of fever. The structure of the fuzzy signature contains some information in the associated vector component. An aggregation method can compare components regardless of the different numbers of sub-components. Aggregation methods should be designed for vectors with the assistance of domain experts. Here, we assume that the time of examination during the day is less significant and that the highest temperature value is more important. The two signatures are reduced to:

$$A_{1f} = [0.2,\ 0.5,\ 0.5,\ 0.25,\ 0.25]$$
$$A_{2f} = [0.4,\ 0.6,\ 0.8,\ 0.25,\ 0]$$

Now the component “fever” can be rewritten linguistically as, e.g., “slight” or “moderate”. The signatures above still contain the information needed to describe the “worst case fever” of each patient, although the information about the daily tendency is lost. We can continue this process further and finally obtain an overall “abnormal condition” measure:

$$A_{1o} = 0.25, \qquad A_{2o} = 0.4$$

Notes: The aggregation methods used across different symptoms differ from those used within the signature of a single symptom. Basically, an expert system can be used to weight each symptom; many artificial intelligence techniques can also be used in this application. This example just shows how to convert patients’ data into individual fuzzy signatures. Then, by using fuzzy operations and aggregation, the fuzzy signature can produce an indicative value for the measurement of the “abnormal condition”.
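The reduction of the fever branch can be sketched in a few lines. This is a minimal illustration, assuming a fuzzy signature is stored as a nested Python list and that a branch is collapsed with a single aggregation function (the maximum, matching the “highest temperature is more important” assumption above). The function names `aggregate` and `reduce_branch`, and the grouping of the patient 2 values into a fever branch followed by two leaves, are assumptions made for this example; the report does not specify an implementation.

```python
from statistics import mean


def aggregate(node, agg=max):
    """Collapse a (possibly nested) signature into one value using `agg` at every level."""
    if isinstance(node, (int, float)):
        return float(node)
    return agg([aggregate(child, agg) for child in node])


def reduce_branch(signature, index, agg=max):
    """Return a copy of `signature` in which the branch at `index` is replaced by its aggregate."""
    reduced = list(signature)
    reduced[index] = aggregate(signature[index], agg)
    return reduced


# Fever measurements taken from the two example patients in the text.
fever_patient_1 = [0.0, 0.0, 0.2, 0.2]  # four measurements
fever_patient_2 = [0.4, 0.4]            # only two measurements

# Hypothetical grouping: fever branch followed by two single-valued symptoms.
signature_2 = [fever_patient_2, 0.25, 0.0]

print(aggregate(fever_patient_1))        # worst-case fever of patient 1
print(aggregate(fever_patient_2))        # worst-case fever of patient 2
print(reduce_branch(signature_2, 0))     # fever branch collapsed, other leaves kept

# A different aggregation (here the arithmetic mean) can be plugged in per branch.
print(aggregate(fever_patient_1, agg=mean))
```

Because the aggregation is applied per branch, the two fever vectors can be compared even though they hold different numbers of measurements, and a different function can be chosen for each branch on expert advice.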
The main advantage of the fuzzy signature is that it can model more vague information, and in some cases the symptoms of patients are allowed to differ. The other reason is that the structure of a fuzzy signature is flexible, which allows new fuzzy signatures to be inserted without the need for prior structure design (Wong et al. 2003).

7. Automatic Method to Construct Fuzzy Signature

We have shown how to construct a fuzzy signature application manually for SARS pre-clinical diagnosis, building its internal structure from the patients’ symptoms. However, in an application with a large data set, the hierarchical structure and its sub-structures may be hidden. As discussed above, the subspace Z0 is used to select the most appropriate sub-rule base to deduce the output. Generally speaking, the more separable the elements in Π are, the easier the selection of a sub-rule base is. Therefore, ranking the subspaces by their capability to separate components can be used to decide the proper subspace. The problem, however, is that finding Π and finding Z0 affect each other.

Sugeno and Yasukawa (1991) introduced a solution for sparse rule-base generation. The SY solution clusters the output data samples and induces the rules by projecting the output clusters onto the input domains. However, it only produces the rules necessary for the input-output sample data. Projection-based fuzzy rule extraction (PB), extended from the SY approach, aims to construct a fuzzy rule base automatically from a set of input-output sample data. Before introducing PB, we first discuss fuzzy clustering and a fuzzy clustering method called fuzzy c-Means.

8. Fuzzy Clustering

In the last section, we mentioned that a clustering algorithm is used to cluster the output data samples. The main requirement of a reasonable cluster is that each of its elements can be modeled by a rule base with local validity (Wong et al. 2003).
Its aim can therefore be regarded as finding a subspace that contains homogeneous data. Once such a subspace is found, it can be used to select an appropriate sub-rule base to infer the output for a given input.

Given data $X = \{x_1, x_2, x_3, \dots, x_n\}$, the question is how to classify the data points in X into K groups ($2 \le K \le n$). The rule is that highly relevant points belong to the same group, and highly irrelevant points belong to different groups. Traditional mathematical classification assigns each datum ‘strictly’ to one group. This is called hard clustering. However, most problems in our lives are uncertain, fuzzy problems. Fuzzy clustering allows a value to belong to different groups, so it better preserves the features of that value. On the other hand, although feature selection methods make the search for a ‘reasonable’ subspace for clustering more effective, most such techniques, c-Means included, need to know the number of clusters in advance.

8.1. Fuzzy C-Means

Bezdek introduced the Fuzzy C-Means clustering method (FCM) in 1981, extending the Hard C-Means clustering method (Dunn, 1974). Under some conditions, the convergence of FCM is better than that of Maximum Likelihood (ML) (Huggins, 1983). A suite of FCM algorithms issued by Cannon et al. is widely used in research on clustering analysis applications. The fuzzy clustering algorithm issued by Bezdek is an amended version of Dunn’s c-Means, but it still has some flaws: for example, weights are not considered, and it is mainly used for static data.

In 1978, Roubens introduced a new objective function:

$$\sum_{v=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)$$

However, its convergence is not very good. In 1981, Leonard et al. issued an amended objective formula, which can improve Roubens’ objective function (Cheng, 1991):

$$\sum_{v=1}^{k} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)}{2 \sum_{j=1}^{n} u_{jv}^{2}}$$

In this case, many symptoms affect each other from a medical viewpoint, and the different physical features of persons also affect the implicit meaning of the measurement of symptoms.
So we adopt fuzzy clustering theory instead of classical clustering. Furthermore, fuzzy clustering still maintains the information in the original data, so that it can be used for further research.

Let $(\mu_1, \mu_2, \dots, \mu_c)$ be a fuzzy c-partition, written as the matrix

$$U_{c \times n} = \begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \cdots & \cdots & \mu_{2n} \\ \vdots & & & \vdots \\ \mu_{c1} & \cdots & \cdots & \mu_{cn} \end{bmatrix}$$

Dunn defined a fuzzy objective function:

$$J_D(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{2}\, \lVert x_j - v_i \rVert^{2}$$

where $v_i$ is the cluster center of set i. Bezdek (1981) then extended it to:

$$J_m(U, V; X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m}\, \lVert x_j - v_i \rVert^{2}, \quad 1 < m < \infty$$

Here $\lVert x_j - v_i \rVert^{2}$ represents the deviation of datum $x_j$ from $v_i$, $v_i$ is the cluster center of set i, and the number m governs the influence of the membership grades.

To obtain the minimizing pair (U, V),

$$\min J_m(U, V; X) = \min \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m}\, \lVert x_j - v_i \rVert^{2}$$

the two conditions listed below should be satisfied:

$$v_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^{m}\, x_j}{\sum_{j=1}^{n} (\mu_{ij})^{m}}, \quad 1 \le i \le c$$

and

$$\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_j - v_i \rVert^{2}}{\lVert x_j - v_k \rVert^{2}} \right)^{\frac{1}{m-1}} \right]^{-1}, \quad 1 \le i \le c,\ 1 \le j \le n$$

The FCM algorithm has a limitation: it needs to know the number of clusters. Some work has been done on how to find an optimal number of clusters. This is referred to as the cluster validity problem. A cluster validity index was proposed by Fukuyama and Sugeno (FS):

$$S(c) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^{m} \left( \lVert x_j - v_i \rVert^{2} - \lVert v_i - \bar{x} \rVert^{2} \right), \quad 2 \le c < n$$

where $\bar{x}$ is the mean of the data. The optimal number of clusters can therefore be found by minimizing the distance between the data and their cluster centre while maximizing the distance between data in different clusters. Iteration of FCM stops when the error is below a defined tolerance or when its improvement over the previous iteration is below a certain threshold.

9. Projection Based Rule Extraction

Briefly, this technique can be implemented by following these steps:

1. Perform c-Means to cluster the data along the output space. The FS index of fuzzy c-Means can be used to obtain an optimal number of clusters.
2. For each fuzzy output cluster, all points contained in the cluster are projected back onto the input dimensions.
3. The projected points in each dimension are clustered again. In this procedure, the FS index is used in conjunction with the merging index.
This process will produce multiple fuzzy clusters in each dimension.
4. Each of the clusters in an input dimension is a projection of a multi-dimensional input cluster onto that input dimension. The clusters from the individual dimensions are then combined to form the multi-dimensional input cluster.
5. For each of the multi-dimensional clusters identified, a rule can be created.

10. Conclusion

In this report, I have described the basic concept of fuzzy sets and what fuzzy control is. To address rule explosion, I introduced the concept of the fuzzy signature and showed how to construct a fuzzy signature manually. The hierarchical fuzzy signature structure presented can use feature selection and interclass separability to reduce complexity. A SARS pre-clinical diagnosis model was constructed using fuzzy signatures to show their flexibility. In addition, a data mining algorithm, c-Means, and its role in fuzzy signatures were introduced. PB provides a method to construct rule sets automatically. With the assistance of the relevant theory, applications generated from or adapted to fuzzy logic will be widely used, and will provide more opportunities for modeling conditions which are inherently imprecisely defined.

References

Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press.

Chong, A., Gedeon, T.D. and Kóczy, L.T. (2002) “Projection Based Method for Sparse Fuzzy System Generation,” Proceedings of 2nd WSEAS Int. Conf. on Scientific Computation and Soft Computing, pp. 321-325.

Chong, A., Gedeon, T.D. and Kóczy, L.T. (2003) “Feature selection and subspace clustering for hierarchical fuzzy rule extraction,” CIMSA, Paris.

Gedeon, T.D. (1999) “Clustering Significant Words using their Co-occurrence in Document Sub-Collections,” Proceedings 7th European Congress on Intelligent Techniques and Soft Computing (EUFIT’99), Aachen, pp. 302-306.

Gedeon, T.D., Kóczy, L.T., Wong, K.W. and Liu, P. (2001) “Effective Fuzzy Systems for Complex Structured Data,” Proceedings of IASTED International Conference Control and Applications (CA 2001), pp. 184-187.

Goguen, J.A. (1967) “L-fuzzy sets,” J. Math. Anal. Appl. 18, pp. 145-174.

Kóczy, L.T. and Hirota, K. (1993) “Approximate reasoning by linear rule interpolation and general approximation,” Int. J. Approx. Reason., Vol. 9, pp. 197-223.

Muresan, Leila “Interpolation in Hierarchical Fuzzy Rule Bases,” Technical University of Budapest, Hungary.

Sugeno, M., Murofushi, T., Nishino, J. and Miwa, H. (1991) “Helicopter flight control based on fuzzy logic,” Proceedings of Fuzzy Engineering toward Human Friendly Systems ’91, pp. 1120-1124.

Wang, L.X. and Mendel, J.M. (1992) “Fuzzy basis functions, universal approximation, and orthogonal least squares learning,” IEEE Trans. on Neural Networks, (5), pp. 807-814.

Wong, K.W., Chong, A., Gedeon, T.D., Kóczy, L.T. and Vamos, T. (2003) “Hierarchical Fuzzy Signatures Structure for Complex Structured Data,” Proceedings of International Symposium on Computational Intelligence and Intelligent Informatics 2003 (ISCIII’03), Nabeul, Tunisia, pp. 105-109.

Appendix

Plan for future work

Extend literature survey:
1. To find good methods to optimize weights.
2. To find different aggregation methods for the fuzzy signatures.

Research work will focus on:
1. Finding an appropriate artificial intelligence algorithm to fine-tune the weights of the fuzzy signatures. I will use artificial intelligence because these approaches are widely used in automatically constructing systems based on feedback and self-learning algorithms.
2. Use of different aggregation methods.

Experimental work:
1. Real data will be used to train the selected artificial intelligence method.
2. To verify the final result deduced by the artificial intelligence method.