Australian National University
Department of Computer Science

Constructing Fuzzy Signature Based on Medical Data

Student: Bai Qifeng
Supervisor: Prof. Tom Gedeon

Index

1. INTRODUCTION
2. FUZZY LOGIC THEORY
2.1. DEFINITION OF MEMBERSHIP FUNCTION
2.2. FUZZY CONTROL
2.2.1. Fuzzification
2.2.2. Rule evaluation
2.2.3. Defuzzification
3. HIERARCHICAL FUZZY SYSTEM
4. FUZZY SIGNATURE
5. FUZZY CLUSTERING
5.1. FUZZY C-MEAN
5.2. CLUSTER VALIDITY PROBLEM
6. FACTOR ANALYSIS
7. FEATURE RANKING AND SELECTION
8. OUTLIERS AND MISSING DATA
9. EXPERIMENT AND CONSTRUCTION OF FUZZY SIGNATURES
10. CONCLUSION
REFERENCES

KEY WORDS: clustering, FCM, fuzzy logic, fuzzy signature, factor analysis

ABSTRACT: A major advantage of fuzzy theory is that it allows the natural and linguistic description of problems instead of precise numerical values. This ability to deal with a complicated system in an intuitive way is the main reason why fuzzy theory is widely applied. However, fuzzy systems suffer from rule explosion in complicated systems. The hierarchical fuzzy signature is introduced to reduce the number of rules. Fuzzy signatures are vector valued fuzzy sets, where each vector component can be a further vector valued fuzzy set. Clustering methods can help us find clusters which have similar characteristics. Further, principal components in the clusters can be located via factor analysis. Then, a fuzzy signature can be constructed based on the degrees of coupling of the components.

1. Introduction

Over the past few decades, fuzzy logic theory has been widely used in process control, management and decision making, operations research, and economics.
Dealing with simple ‘yes’ and ‘no’ answers is no longer satisfactory; a degree of membership (Zadeh, 1965) became a new way of solving problems. Fuzzy logic derives from the fact that human common-sense reasoning is approximate in nature.

However, conventional fuzzy systems suffer from rule explosion. Thus, their applicability is still limited to control systems with few input dimensions and simply structured data (Wong, Gedeon and Kóczy, 2001). If the number of inputs is larger than about 6 to 10, the workload becomes heavy and the rules too complicated. The fuzzy signature is introduced to solve problems in the economic and medical domains, which are full of complicated and interdependent objects that need to be classified and evaluated (Gedeon et al. 2001).

A fuzzy signature has a hierarchical structure in which relevant data are arranged as vectors of fuzzy values, which are in turn contained in higher-level vectors. This tree structure is created with the objective of mimicking the decision-making process of human experts, and it can handle situations in which the number of data items differs or in which some items are missing. By classifying interdependent features and evaluating their similarities and dissimilarities, features with high similarity can be grouped together to form a sub-branch, and independent features can form a new sub-branch. The different sub-branches can then form a higher-level branch, until finally they make up a hierarchical tree.

However, complex data may hide its hierarchical structure. This report aims to show how to find that hidden internal structure. Clustering is first performed on the dataset; factor analysis methods are then used to find the principal components in the clusters; finally, a fuzzy signature is constructed based on the different degrees of coupling and importance.

2. Fuzzy Logic Theory

The success of fuzzy systems comes from several factors.
One is their ability to model non-linear systems with reasonable accuracy using human interpretable rules. Given a system input, the fuzzy system is not only able to infer a result, but can also explain, in a way understandable to humans, how the conclusion is reached. It is this inferential explanation capability that has distinguished fuzzy systems from other Artificial Intelligence techniques as well as from traditional mathematical models (Alex 2004).

In comparison to traditional crisp symbolic rules, fuzzy rules are better able to capture uncertainty because they use fuzzy sets, which allow for a gradual transition between different regions in the problem domain.

For example, suppose that a body temperature is 38.5 degrees. Conventional bivalent sets can only tell us whether this temperature is ‘high’ or ‘low’. The most obvious limiting feature of bivalent sets is that they are mutually exclusive: it is not possible to have membership of more than one set. So they cannot accurately describe a transition from a quantity such as ‘low’ to ‘high’. If 38.5 degrees is the boundary, then in a bivalent set 38.49 means ‘low’ but 38.51 means ‘high’. This natural phenomenon can be described more accurately by fuzzy set theory.

A fuzzy set is a set whose elements have degrees of membership. An element of a fuzzy set can be a full member (100% membership) or a partial member (between 0% and 100% membership). That is, the membership value assigned to an element is no longer restricted to just two values, but can be 0, 1 or any value in between. The mathematical function which defines the degree of an element's membership in a fuzzy set is called the membership function.

Let U be the universal set, which contains all elements, and let A be a crisp set. It can be presented as

$$A = \{\, x \in U \mid x \text{ meets some conditions} \,\}$$

Definition of Characteristic Function:

$$\mu_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases}$$

2.1. Membership Function:

The value of a fuzzy set in U is presented by the membership function $\mu_A(x)$.
Another intuitive presentation is to regard $\mu_A(x)$ as the percentage to which x belongs to A. Usually, a fuzzy set A in U is presented as a set of ordered pairs of x and its membership value:

$$A = \{\, (x, \mu_A(x)) \mid x \in U \,\}$$

For example, Figure 1 describes the graph of the membership functions of fever.

[Figure 1: Membership functions of the fever sets Slight, Moderate, Severe and Extreme over body temperature, with the readings 37.8, 38.4 and 39.8 marked. Fever rules data came from www.bhp.doh.gov.tw]

Assume $U = \{37.8, 38.4, 39.8\}$. With the membership values ordered as the sets appear in Figure 1 (Slight, Moderate, Severe, Extreme), the set $A = \{(x, \mu_A(x))\}$ is given by

$$\mu_A(37.8) = (0.83,\ 0,\ 0,\ 0)$$
$$\mu_A(38.4) = (0.29,\ 0.71,\ 0,\ 0)$$
$$\mu_A(39.8) = (0,\ 0,\ 0.22,\ 0.78)$$

Using these functions, each value along the specific domain can be assigned a membership value that represents the degree to which it belongs to a particular fuzzy set. So, 39.8 is transformed into a 22% membership of ‘Severe’ fever and a 78% membership of ‘Extreme’ fever.

The fuzzy rule $R_i$ of a fuzzy system has the following form:

If X is $A_i$ then Y is $B_i$

where $X = \{x_1, x_2, \dots, x_n\}$ is the input, Y is the output, and $A_i = A_{i1} \times A_{i2} \times \dots \times A_{in}$ and $B_i$ are the fuzzy sets of the antecedent and the consequent of the rule respectively. Each fuzzy set can have a linguistic label. For example, a fuzzy rule from a medical system may look like the following:

If Temperature is Low then Dose is Low

where Low in Temperature and Low in Dose are fuzzy sets defined by the Temperature and Dose membership functions.

The use of human interpretable rules with linguistic labels allows for easy encoding of the expert's knowledge into a fuzzy model. Instead of directly modeling the behavior of the system mathematically, a fuzzy system offers the alternative of modeling an experienced human operator. When dealing with complex systems, the latter is a much more convenient approach. For this reason, fuzzy systems have attracted much attention from both academia and industry.

2.2. Fuzzy Control

Fuzzy control, which directly uses fuzzy rules, is the most important current application of fuzzy theory.
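The mapping from a crisp reading to membership degrees described in Section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triangular shape, the set names and the breakpoints below are hypothetical and are not the membership functions of Figure 1, so the degrees it produces are not meant to reproduce the values quoted above.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fever sets in degrees Celsius: (left, peak, right).
# The breakpoints are illustrative assumptions, not the data of Figure 1.
FEVER_SETS = {
    "slight": (37.0, 37.5, 38.0),
    "moderate": (37.5, 38.0, 38.5),
    "severe": (38.0, 38.5, 39.0),
}


def fuzzify(temperature, fuzzy_sets=FEVER_SETS):
    """Return the membership degree of a reading in every fuzzy set."""
    return {name: triangular(temperature, *abc)
            for name, abc in fuzzy_sets.items()}


degrees = fuzzify(37.75)
print(degrees)
```

Unlike a bivalent set, the reading is allowed to belong partly to two neighbouring sets at once, which is the gradual transition the text describes.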
I use a procedure originated by Ebrahim Mamdani in the 1970s as a demonstration of the complete inference algorithm. Three steps are used to create a fuzzy controlled machine:

1) Fuzzification (using membership functions to graphically describe a situation)
2) Rule evaluation (application of fuzzy rules)
3) Defuzzification (obtaining the crisp or actual results)

As an example, we construct a controller for an inverted pendulum system. Here, the problem is to balance a pole on a mobile platform that can move in only two directions, to the left or to the right. The angle between the platform and the pendulum and the angular velocity of this angle are chosen as the inputs of the system. The output corresponds to the speed of the platform (Mamdani, 1972).

2.2.1. Fuzzification

First of all, the different levels of output of the platform (speed) are defined by specifying the membership functions for the fuzzy sets. The graph of the function is shown below.

[Figure: membership functions of speed]

Similarly, the different angles between the platform and the pendulum, and the angular velocities of specific angles, are also defined.

[Figures: membership functions of angle and of angular velocity]

Note: For simplicity, it is assumed that all membership functions are spread equally.

2.2.2. Rule evaluation

The next step is to define the fuzzy rules. The fuzzy rules are a series of if-then statements. These statements are usually derived by an expert to achieve optimum results. Some examples of these rules are:

If angle is zero and angular velocity is zero then speed is also zero.
If angle is zero and angular velocity is low then the speed shall be low.

The full set of rules is summarized in the table below. The dashes mark conditions which have no rules associated with them; this simplifies the situation.
Speed (table entries) by angle (columns) and angular velocity (rows):

                     Angle
  Velocity           negative high   negative low    zero            positive low    positive high
  negative high      -               -               negative high   -               -
  negative low       -               -               negative low    zero            -
  zero               negative high   negative low    zero            positive low    positive high
  positive low       -               zero            positive low    -               -
  positive high      -               -               positive high   -               -

An application of these rules is shown using specific values for angle and angular velocity. The values used for this example are 0.75 and 0.25 for the zero and positive low angles, and 0.4 and 0.6 for the zero and negative low angular velocities. These points are marked on the graphs below.

[Figures: the example values marked on the membership functions of angle and angular velocity]

Consider the rule "if angle is zero and angular velocity is zero, the speed is zero". The actual value belongs to the fuzzy set zero to a degree of 0.75 for "angle" and 0.4 for "angular velocity". Since this is an AND operation, the minimum criterion is used, so the fuzzy set zero of the variable "speed" is cut at 0.4 and the patch is shaded up to that level. This is illustrated in the figure below.

[Figure: the fuzzy set zero of speed cut at 0.4]

Similarly, the minimum criterion is used for the other three rules. The following figures show the result patches yielded by the rules "if angle is zero and angular velocity is negative low, the speed is negative low", "if angle is positive low and angular velocity is zero, then speed is positive low" and "if angle is positive low and angular velocity is negative low, the speed is zero".

[Figures: result patches of the three remaining rules]

The four results overlap and are reduced to the following figure.

[Figure: combined result of the four rules]

2.2.3. Defuzzification

The result of the fuzzy controller so far is a fuzzy set (of speed). In order to choose an appropriate representative value as the final output (a crisp value), defuzzification must be done. There are numerous defuzzification methods, but the most common one is the center of gravity of the set, as shown below.

[Figure: center of gravity of the combined fuzzy set]

3. Hierarchical Fuzzy System

A major issue in fuzzy applications is how to produce fuzzy rules.
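To make concrete what even a small hand-written rule base involves, the pendulum example of Section 2.2 can be traced end to end in Python. The input degrees (0.75, 0.25, 0.4, 0.6) and the four rules are taken from the text; the triangular output sets for speed, their breakpoints and the sampling grid used for the center of gravity are assumptions made only for this sketch, since the original figures are not available.

```python
def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Fuzzified inputs, as given in the worked example.
angle = {"zero": 0.75, "positive_low": 0.25}
velocity = {"zero": 0.4, "negative_low": 0.6}

# (angle term, velocity term) -> speed term, read from the rule table.
RULES = {
    ("zero", "zero"): "zero",
    ("zero", "negative_low"): "negative_low",
    ("positive_low", "zero"): "positive_low",
    ("positive_low", "negative_low"): "zero",
}

# Assumed output sets for speed (arbitrary units): (left, peak, right).
SPEED_SETS = {
    "negative_low": (-2.0, -1.0, 0.0),
    "zero": (-1.0, 0.0, 1.0),
    "positive_low": (0.0, 1.0, 2.0),
}


def rule_strengths(angle, velocity, rules):
    """AND is the minimum; rules with the same consequent combine by maximum."""
    strength = {}
    for (a_term, v_term), s_term in rules.items():
        fired = min(angle[a_term], velocity[v_term])
        strength[s_term] = max(strength.get(s_term, 0.0), fired)
    return strength


def center_of_gravity(strength, sets, lo=-2.0, hi=2.0, steps=400):
    """Cut each output set at its rule strength, take the union, return the centroid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [max(min(strength.get(name, 0.0), triangular(x, *abc))
              for name, abc in sets.items()) for x in xs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(ys)


strength = rule_strengths(angle, velocity, RULES)
crisp_speed = center_of_gravity(strength, SPEED_SETS)
print(strength, crisp_speed)
```

With these assumed output sets the negative low patch is cut highest, so the crisp speed comes out negative. Every additional input or linguistic term multiplies the number of table cells that an expert would have to fill in.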
The classical approaches of fuzzy control deal with dense rule bases, where the universe of discourse is fully covered by the antecedent fuzzy sets of the rule base in each dimension, so that there is at least one activated rule for every input (Muresan, 2001). This causes the high computational complexity of these traditional approaches, because the number of rules increases exponentially with the number of inputs and terms. In the above example there are two inputs and 5 terms, so a full rule base has 25 rules; with 5 inputs and 5 terms, the number of rules is 3,125. This complexity limits the usage of classical fuzzy theory to cases where the number of inputs does not exceed about 6 to 10 (Wong et al. 2003).

If a fuzzy model contains k variables and at most T linguistic (or other fuzzy) terms in each dimension, the number of necessary rules is $O(T^k)$. The number of rules can be decreased by decreasing T, or k, or both, while the method used should not lose the easy interpretability of the components. One method leads to sparse rule bases by decreasing T, and adopts rule interpolation to create the rule bases (Kóczy and Hirota, 1993). The other aims to reduce the dimension k of the sub-rule bases by using meta-levels or hierarchical fuzzy rule bases (Sugeno, Murofushi, Nishino and Miwa, 1991).

As for the hierarchical structure, the basic idea is the following. Often the multi-dimensional input state space $X = X_1 \times X_2 \times \dots \times X_k$ can be decomposed, so that some of its components, e.g. $Z_0 = X_1 \times X_2 \times \dots \times X_{k_0}$, determine a subspace of $X$ ($k_0 < k$), so that in $Z_0$ a partition $\Pi = \{D_1, D_2, \dots, D_n\}$ can be determined:

$$\bigcup_{i=1}^{n} D_i = Z_0$$

In each element of $\Pi$, i.e. $D_i$, a sub-rule base $R_i$ can be constructed with local validity. In the worst case, each sub-rule base refers to exactly $X / Z_0 = X_{k_0+1} \times \dots \times X_k$, and so the hierarchical rule base has the following structure:

$R_0$: If $z_0$ is $D_1$ then use $R_1$
       If $z_0$ is $D_2$ then use $R_2$
       …
       If $z_0$ is $D_n$ then use $R_n$
       where $z_0 \in Z_0$

$R_1$: If $z_1$ is $A_{11}$ then $y$ is $B_{11}$
       If $z_1$ is $A_{12}$ then $y$ is $B_{12}$
       …
       If $z_1$ is $A_{1m_1}$ then $y$ is $B_{1m_1}$
       where $z_1 \in X / Z_0$

$R_2$: If $z_1$ is $A_{21}$ then $y$ is $B_{21}$
       If $z_1$ is $A_{22}$ then $y$ is $B_{22}$
       …
       If $z_1$ is $A_{2m_2}$ then $y$ is $B_{2m_2}$
       where $z_1 \in X / Z_0$

…

$R_n$: If $z_1$ is $A_{n1}$ then $y$ is $B_{n1}$
       If $z_1$ is $A_{n2}$ then $y$ is $B_{n2}$
       …
       If $z_1$ is $A_{nm_n}$ then $y$ is $B_{nm_n}$

The fuzzy rules at the top level of the hierarchical structure are pointers to the sub-rule bases. This hierarchical approach does not by itself help with the $O(T^k)$ complexity of the whole rule base: the size of $R_0$ is $O(T^{k_0})$ and each $R_i$, $i > 0$, is of order $O(T^{k-k_0})$, so the resulting complexity is

$$O(T^{k_0}) \cdot O(T^{k-k_0}) = O(T^k).$$

Only if a suitable $\Pi$ and $Z_0$ are found, where the number of variables in each $Z_i$ is $k_i < k - k_0$ and $\max_{i=1}^{n} k_i = K$, so that each sub-rule base is of order $O(T^K)$, does the application of the structured rule base lead in effect to the reduction of k to a smaller exponent: $k_0 + K < k$.

The main difficulty in the automatic construction of such a system therefore lies in finding a suitable $Z_0$ and $\Pi$. One requirement of a suitable $\Pi$ is that each of its elements $D_i$ can be modeled by a rule base with local validity. In this case, it is reasonable to expect $D_i$ to contain homogeneous data. The problem of finding $\Pi$ can thus be reduced to finding homogeneous structures within the data (Gedeon, 2001). This can be achieved by clustering algorithms, which are introduced in detail later.

4. Fuzzy Signature

Fuzzy signatures structure data into vectors of fuzzy values, each of which can be a further vector:

$$A_s : X \to [a_i]_{i=1}^{k}, \qquad a_i = \begin{cases} [0,1] \\ [a_{ij}]_{j=1}^{k_i} \end{cases}, \qquad a_{ij} = \begin{cases} [0,1] \\ [a_{ijl}]_{l=1}^{k_{ij}} \end{cases}$$

This can extend the application of fuzzy theory to domains which contain complex and interdependent features. Fuzzy signatures can be used in cases which have different numbers of data components.

The original definition of fuzzy sets was $A : X \to [0,1]$, and it was extended to L-fuzzy sets by Goguen (Goguen, 1967), $A_L : X \to L$, L being an arbitrary algebraic lattice.
Vector valued fuzzy sets are described as $A_{v,k} : X \to [0,1]^k$, where the range of membership values is the lattice of k-dimensional vectors with components in the unit interval (Kóczy, 1982).

Generally, each fuzzy signature is a nested vector structure which contains a series of further fuzzy signatures. Its internal structure indicates the semantic and logical connection of the state variables, which correspond to the leaves of the signature graph. It can be denoted as a fuzzy set vector with possibly recursive component vectors:

$$A : X \to S^{(n)}, \qquad n \ge 1, \qquad S^{(n)} = \prod_{i=1}^{n} S_i, \qquad S_i = \begin{cases} [0,1] \\ S^{(m)} \end{cases}$$

where $\prod$ denotes the Cartesian product.

A fuzzy signature is a special kind of multidimensional fuzzy data; some of the data in sub-groups will affect a feature on the level above them. The relationship between higher and lower levels is controlled by a set of fuzzy aggregations. The result for the parent signature at each level is computed from its branches with an appropriate aggregation of the child signatures. The aggregation methods are not necessarily identical. They can be changed based on expert opinion and on the detailed circumstances. With each aggregation, the higher signatures keep less information. In some circumstances, it is useful to reduce and aggregate information in order to maintain compatibility with other sources in which some detailed variables are missing or omitted. In most cases, the rule of the maximal common sub-tree is that all signatures can be interpolated between the corresponding branches.

5. Fuzzy Clustering

In Section 3, we mentioned that a clustering algorithm is used to cluster output data samples. The main requirement of a reasonable cluster is that each of its elements can be modeled by a rule base with local validity (Wong et al. 2003). Its aim can therefore be regarded as finding a subspace that contains homogeneous data. Once a subspace is found, it can be used to select an appropriate sub-rule base to infer the output for a given input.
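Before the clustering step is formalised, the nested signature structure and the bottom-up aggregation of Section 4 can be made concrete with a short Python sketch. The representation chosen here (a leaf is a number in [0, 1], an inner node is a pair of an aggregation function and a list of children), the leaf values and the use of max, min and mean as aggregations are all assumptions for illustration; the report leaves the choice of aggregation to expert opinion.

```python
from statistics import mean


def aggregate(node):
    """Collapse a fuzzy signature into a single membership value.

    A leaf is a number in [0, 1]; an inner node is a pair
    (aggregation function, list of child nodes).
    """
    if isinstance(node, (int, float)):
        return float(node)
    func, children = node
    return func([aggregate(child) for child in children])


# Hypothetical patient signature: one sub-branch with four temperature
# readings, one stand-alone symptom, and one sub-branch of two coupled symptoms.
signature = (mean, [
    (max, [0.5, 0.75, 0.25, 0.5]),
    0.5,
    (min, [0.75, 1.0]),
])

# The same structure with two temperature readings missing still aggregates.
partial_signature = (mean, [
    (max, [0.5, 0.75]),
    0.5,
    (min, [0.75, 1.0]),
])

overall = aggregate(signature)
overall_partial = aggregate(partial_signature)
print(overall, overall_partial)
```

Each sub-branch carries its own aggregation, and a branch with fewer components is reduced in the same way, which is how a signature can be compared with one whose detailed variables are missing.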
Given data $X = \{x_1, x_2, x_3, \dots, x_n\}$, the task is to classify the data points in X into K groups ($n \ge K \ge 2$). The rule is that highly relevant points fall in the same group and highly irrelevant points fall in different groups. Traditional mathematical classification assigns each datum ‘strictly’ to one group. This is called hard clustering. However, most problems in our lives are uncertain, fuzzy problems. Fuzzy clustering allows a value to belong to different groups, so it keeps more of the features of that value. On the other hand, when feature selection methods are used to seek a ‘reasonable’ subspace for clustering, the algorithm works more effectively, although most such techniques, c-Means among them, need to know the number of clusters.

5.1. FCM Algorithm

Bezdek introduced the Fuzzy C-Means clustering method (FCM) in 1981, extending the Hard C-Means clustering method (Dunn, 1974). In some conditions, the convergence of FCM is better than that of Maximum Likelihood (ML) (Huggins, 1983). A suite of FCM algorithms issued by Cannon et al. is widely used in research on clustering analysis applications. The fuzzy clustering algorithm issued by Bezdek is an amended version of Dunn's c-Means, but it still has some flaws: for example, weights are not considered, and it is mainly used for static data.

In 1978, Roubens introduced a new objective function:

$$\sum_{v=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)$$

however, its convergence is not very good. In 1981, Leonard et al. issued an amended objective formula:

$$\sum_{v=1}^{k} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^{2}\, u_{jv}^{2}\, d(i,j)}{2 \sum_{j=1}^{n} u_{jv}^{2}}$$

which can improve Roubens' objective function (Cheng, 1991). Here $u_{iv}$ is the membership of object i in cluster v and $d(i,j)$ is the dissimilarity between objects i and j.

In this case, many symptoms affect each other from a medical viewpoint, and the different physical features of persons also affect the implicit meaning of the measurement of symptoms. So, we adopt fuzzy clustering theory instead of classical clustering.
Furthermore, fuzzy clustering still maintains the information in the original data, so that it can be used for further research.

Let $(\mu_1, \mu_2, \dots, \mu_c)$ be a fuzzy c-partition, written as the matrix

$$U_{c \times n} = \begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & & \vdots \\ \mu_{c1} & \mu_{c2} & \cdots & \mu_{cn} \end{bmatrix}$$

Dunn defined a fuzzy objective function:

$$J_D(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{2}\, \| x_j - v_i \|^{2}$$

where $v_i$ is the cluster center of set i. Then, Bezdek (1981) extended it to:

$$J_m(U, V; X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m}\, \| x_j - v_i \|^{2}, \qquad 1 < m \le 2$$

where $v_i$ is again the cluster center of set i and $\| x_j - v_i \|$ represents the deviation of the data point $x_j$ from $v_i$. The number m governs the influence of the membership grades.

To obtain the minimizing pair (U, V),

$$\min J_m(U, V; X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m}\, \| x_j - v_i \|^{2}$$

the two conditions listed below must be reached:

$$v_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^{m}\, x_j}{\sum_{j=1}^{n} (\mu_{ij})^{m}}, \qquad 1 \le i \le c$$

and

$$\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\| x_j - v_i \|^{2}}{\| x_j - v_k \|^{2}} \right)^{\frac{1}{m-1}} \right]^{-1}, \qquad 1 \le i \le c,\ 1 \le j \le n$$

5.2. Cluster Validity Problem

The FCM algorithm has a limitation: it needs to know the number of clusters. Some work has been done on how to find an optimal number of clusters. This is referred to as the cluster validity problem. A cluster validity index was proposed by Fukuyama and Sugeno (FS):

$$S(c) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^{m} \left( \| x_j - v_i \|^{2} - \| v_i - \bar{x} \|^{2} \right)$$

for c between 2 and n, where $\bar{x}$ is the mean of the data. The optimal number of clusters can therefore be found by minimizing the distance between the data and their cluster centre and maximizing the distance between data in different clusters. Iteration of FCM stops when the error is below a defined tolerance or when its improvement over the previous iteration is below a certain threshold.

6. Factor Analysis

From the fuzzy signature for SARS, we can find this kind of basic structure. The four tests for temperature are grouped in a sub-branch by default, since experts believe these four tests are highly relevant and affect each other. From the medical point of view, a disease rarely shows only one symptom; several symptoms may appear concurrently, or there may be some main symptoms with accompanying symptoms. Based on this point, we believe we can find highly relevant symptoms which can be grouped into a sub-branch.
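Before related symptoms can be grouped, the clusters themselves have to be found. The FCM iteration of Section 5.1, with the stopping rule mentioned in Section 5.2, can be sketched in plain Python as follows. The function name `fcm`, the random initial partition, the default m = 2, the tolerance and the toy data are assumptions for illustration; the two update steps follow the conditions for $v_i$ and $\mu_{ij}$ given above.

```python
import math
import random


def fcm(points, c, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Fuzzy c-means on a list of equal-length tuples.

    Returns (centers, U), where U[i][j] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition; the memberships of each point sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= col
    prev_J = math.inf
    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the data weighted by mu_ij ** m.
        centers = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[j] * points[j][d] for j in range(n)) / total
                for d in range(dim)))
        # Squared distance of every point to every centre.
        d2 = [[sum((points[j][d] - centers[i][d]) ** 2 for d in range(dim))
               for j in range(n)] for i in range(c)]
        # Membership update from the ratios of squared distances.
        for j in range(n):
            hits = [i for i in range(c) if d2[i][j] == 0.0]
            for i in range(c):
                if hits:  # the point coincides with one or more centres
                    U[i][j] = 1.0 / len(hits) if i in hits else 0.0
                else:
                    U[i][j] = 1.0 / sum(
                        (d2[i][j] / d2[k][j]) ** (1.0 / (m - 1.0))
                        for k in range(c))
        # Stop when the objective J_m no longer improves noticeably.
        J = sum(U[i][j] ** m * d2[i][j] for i in range(c) for j in range(n))
        if abs(prev_J - J) < tol:
            break
        prev_J = J
    return centers, U


def hard_labels(U):
    """Index of the cluster with the largest membership for each point."""
    return [max(range(len(U)), key=lambda i: U[i][j]) for j in range(len(U[0]))]


# Toy data: two well separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers, U = fcm(points, c=2)
labels = hard_labels(U)
print(centers, labels)
```

The membership matrix keeps the graded information for every point; the hard labels are derived only for inspection. The number of clusters c still has to be supplied, which is the limitation that the validity index of Section 5.2 addresses.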
The main aim of factor analysis is to find a set of closely related models intended for exploring or establishing the correlational structure among the observed random variables (Basilevsky, 1994). Initially, factor analysis was developed by psychologists and was primarily concerned with hypotheses about the organization of mental ability suggested by an examination of matrices of correlation between cognitive test variables. Hotelling (1957) pointed out that factor analysis is the most widely used of the multivariate techniques, although it is not always used appropriately. With the advent of computers, factor analysis has spread to many domains other than psychology: to economics, botany and biology as well as to the social sciences (Basilevsky, 1994).

7. Feature Ranking and Selection

During the automatic construction of the fuzzy signature, feature ranking and selection play important roles both in constructing the sub-branches and in assigning weights to each sub-branch of the fuzzy signature. Devijver and Kittler (1982) developed the interclass separability criterion for these purposes. Tikk and Gedeon (2000) improved the original criterion by fuzzifying it and used it for feature selection in fuzzy rule extraction.

8. Outliers and Missing Data

9. Experiment and Construction of Fuzzy Signatures

10. Conclusion

In this report, I have described the basic concept of fuzzy sets and what fuzzy control is. To address rule explosion, I introduced the concept of the fuzzy signature and how to construct a fuzzy signature manually. The hierarchical fuzzy signature structure presented can use feature selection and interclass separability to reduce the complexity. Here a SARS pre-clinical diagnosis model was constructed using fuzzy signatures to show the flexibility of the fuzzy signature. In addition, a data mining algorithm, FCM, and its role in fuzzy signatures were introduced.
With the assistance of the relevant theory, the applications which may be generated from or adapted to fuzzy logic will be widely used, and will provide more opportunities for modeling conditions which are inherently imprecisely defined.

References

Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press.

Chong, A., Gedeon, T.D. and Kóczy, L.T. (2002) “Projection Based Method for Sparse Fuzzy System Generation,” Proceedings of 2nd WSEAS Int. Conf. on Scientific Computation and Soft Computing, pp. 321-325.

Chong, A., Gedeon, T.D. and Kóczy, L.T. (2003) “Feature selection and subspace clustering for hierarchical fuzzy rule extraction,” CIMSA, Paris.

Gedeon, T.D. (1999) “Clustering Significant Words using their Co-occurrence in Document Sub-Collections,” Proceedings 7th European Congress on Intelligent Techniques and Soft Computing (EUFIT’99), Aachen, pp. 302-306.

Gedeon, T.D., Kóczy, L.T., Wong, K.W. and Liu, P. (2001) “Effective Fuzzy Systems for Complex Structured Data,” Proceedings of IASTED International Conference Control and Applications (CA 2001), pp. 184-187.

Goguen, J.A. (1967) “L-fuzzy sets,” J. Math. Anal. Appl. 18, pp. 145-174.

Kóczy, L.T. and Hirota, K. (1993) “Approximate reasoning by linear rule interpolation and general approximation,” Int. J. Approx. Reason., Vol. 9, pp. 197-223.

Muresan, L. “Interpolation in Hierarchical Fuzzy Rule Bases,” Technical University of Budapest, Hungary.

Sugeno, M., Murofushi, T., Nishino, J. and Miwa, H. (1991) “Helicopter flight control based on fuzzy logic,” Proceedings of Fuzzy Engineering toward Human Friendly System’91, pp. 1120-1124.

Wang, L.X. and Mendel, J.M. (1992) “Fuzzy basis functions, universal approximation, and orthogonal least squares learning,” IEEE Trans. on Neural Networks, (5), pp. 807-814.

Wong, K.W., Chong, A., Gedeon, T.D., Kóczy, L.T. and Vamos, T. (2003) “Hierarchical Fuzzy Signatures Structure for Complex Structured Data,” Proceedings of International Symposium on Computational Intelligence and Intelligent Informatics 2003 (ISCIII’03), Nabeul, Tunisia, pp. 105-109.

Tikk, D. and Gedeon, T.D. (2000) “Feature ranking based on interclass separability for fuzzy control application,” Proceedings of the International Conference on Artificial Intelligence in Science and Technology (AISAT'2000), Hobart, pp. 29-32.

Devijver, P.A. and Kittler, J. (1982) Pattern Recognition: A Statistical Approach. London: Prentice Hall.

Basilevsky, A. (1994) Statistical Factor Analysis and Related Methods. New York: Wiley-Interscience.