Australian National University
Department of Computer Science
Constructing Fuzzy Signature Based on Medical Data
Student: Bai Qifeng
Supervisor: Prof. Tom Gedeon
Index
1. INTRODUCTION
2. FUZZY LOGIC THEORY
2.1. DEFINITION OF MEMBERSHIP FUNCTION
2.2. FUZZY CONTROL
2.2.1. Fuzzification
2.2.2. Rule evaluation
2.2.3. Defuzzification
3. HIERARCHICAL FUZZY SYSTEM
4. FUZZY SIGNATURE
5. FUZZY CLUSTERING
5.1. FUZZY C-MEANS
5.2. CLUSTER VALIDITY PROBLEM
6. FACTOR ANALYSIS
7. FEATURE RANKING AND SELECTION
8. OUTLIERS AND MISSING DATA
9. EXPERIMENT AND CONSTRUCTION OF FUZZY SIGNATURES
10. CONCLUSION
REFERENCES
Constructing Fuzzy Signature Based on Medical Data
KEY WORDS: clustering, FCM, fuzzy logic, fuzzy signature, factor analysis
ABSTRACT:
A major advantage of fuzzy theory is that it allows problems to be described in natural, linguistic terms instead of precise numerical values. This ability to deal with complicated systems in an intuitive way is the main reason why fuzzy theory is widely applied. However, fuzzy systems suffer from rule explosion in complicated systems. Hierarchical fuzzy signatures are introduced to reduce the number of rules. Fuzzy signatures are vector-valued fuzzy sets in which each vector component can itself be a further vector-valued fuzzy set. Clustering methods can help us find clusters with similar characteristics, and the principal components within the clusters can then be located via factor analysis. Finally, a fuzzy signature can be constructed based on the degrees of coupling between components.
1. Introduction
Over the past few decades, fuzzy logic theory has been widely used in process control, management and decision making, operations research, and economics. Simple 'yes' and 'no' answers are no longer satisfactory; the degree of membership (Zadeh, 1965) became a new way of solving problems. Fuzzy logic derives from the observation that human common-sense reasoning is approximate in nature.
However, conventional fuzzy systems suffer from rule explosion. Their applicability is therefore still limited to control systems with few input variables and simply structured data (Wong, Gedeon and Kóczy, 2001). If the number of inputs exceeds about 6 to 10, the rule base becomes too large and too complicated. Fuzzy signatures were introduced to solve problems in economic and medical domains, which are full of complicated and interdependent objects that need to be classified and evaluated (Gedeon et al. 2001). A fuzzy signature has a hierarchical structure in which relevant data are arranged as vectors of fuzzy values, which are in turn contained in higher-level vectors. This tree structure is designed to mimic the decision-making process of human experts, and it can handle situations in which the number of data items differs or some items are missing. By classifying interdependent features and evaluating their similarities and dissimilarities, features with high similarity can be grouped together to form a sub-branch, while independent features form new sub-branches. The sub-branches are then combined into higher-level branches until a hierarchical tree is formed.
However, complex data may hide a hierarchical structure. This report aims to show how to find such an internal hidden structure. Clustering is first performed on the dataset; factor analysis is then used to find the principal components in the clusters; finally, a fuzzy signature is constructed based on the different degrees of coupling and importance.
2. Fuzzy Logic Theory
The success of fuzzy systems comes from several factors. One is their ability to model non-linear systems with reasonable accuracy using human-interpretable rules. Given a system input, a fuzzy system can not only infer a result but also explain, in a way understandable to humans, how the conclusion was reached. It is this inferential explanation capability that distinguishes fuzzy systems from other Artificial Intelligence techniques as well as from traditional mathematical models (Alex 2004).
In comparison to traditional crisp symbolic rules, fuzzy rules are better able to capture uncertainty because they use fuzzy sets, which allow gradual transitions between different regions of the problem domain. For example, consider a body temperature of 38.5 degrees. Conventional bivalent sets can only tell us whether this temperature is 'high' or 'low'. The most obvious limitation of bivalent sets is that they are mutually exclusive: an element cannot be a member of more than one set. It is therefore not accurate to define the transition from a quantity such as 'low' to 'high' with a sharp boundary. If 38.5 degrees is the boundary, then in a bivalent set 38.49 means 'low' but 38.51 means 'high'.
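The boundary problem can be seen in a few lines of Python. This is a minimal sketch assuming a crisp 'high' threshold at 38.5 degrees and a hypothetical fuzzy 'high' set whose membership rises linearly from 0 at 37.5 to 1 at 39.5; the threshold and the ramp endpoints are illustrative choices, not values from this report.

```python
def crisp_high(t: float, threshold: float = 38.5) -> int:
    """Bivalent set: a temperature is either 'high' (1) or not 'high' (0)."""
    return 1 if t >= threshold else 0


def fuzzy_high(t: float, low: float = 37.5, high: float = 39.5) -> float:
    """Hypothetical ramp membership: 0 at or below `low`, 1 at or above `high`."""
    if t <= low:
        return 0.0
    if t >= high:
        return 1.0
    return (t - low) / (high - low)


for t in (38.49, 38.51):
    print(f"{t}: crisp={crisp_high(t)}, fuzzy={fuzzy_high(t):.3f}")
```

Moving from 38.49 to 38.51 flips the crisp value from 0 to 1, while the fuzzy degree changes only from 0.495 to 0.505.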
This natural phenomenon can be described more accurately by fuzzy set theory. A fuzzy set is a set whose elements have degrees of membership. An element of a fuzzy set can be a full member (100% membership) or a partial member (between 0% and 100% membership). That is, the membership value assigned to an element is no longer restricted to two values; it can be 0, 1 or any value in between. The mathematical function that defines the degree of an element's membership in a fuzzy set is called the membership function.
Let U be a universal set containing all elements, and let A be a crisp set. A can be written as
$$A = \{x \in U \mid x \text{ meets some conditions}\}$$
Definition of the characteristic function:
$$\chi_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases}$$
2.1. Membership Function
A fuzzy set A in U is described by its membership function $\mu_A(x)$, which assigns each $x \in U$ a value in $[0, 1]$. Intuitively, $\mu_A(x)$ can be regarded as the degree (percentage) to which x belongs to A. A fuzzy set A in U is usually written as a set of ordered pairs of x and its membership value:
$$A = \{(x, \mu_A(x)) \mid x \in U\}$$
For example:
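Figure 1 shows the membership functions for fever.
[Figure 1: membership functions of fever (Slight, Moderate, Severe, Extreme); x-axis: body temperature, marked at 37.3, 37.9, 38.6, 39.1 and 40, with sample points at 37.8, 38.4 and 39.8; y-axis: membership degree]
(Figure 1, fever rules data came from www.bhp.doh.gov.tw)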
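Assume $U = \{37.8, 38.4, 39.8\}$ and let each membership vector list the degrees of membership in the fuzzy sets (Slight, Moderate, Severe, Extreme):
$$\mu_A(37.8) = (0.83, 0, 0, 0), \quad \mu_A(38.4) = (0.29, 0.71, 0, 0), \quad \mu_A(39.8) = (0, 0, 0.22, 0.78)$$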
Using these functions, each value in the domain can be assigned a membership value representing the degree to which it belongs to a particular fuzzy set. For example, 39.8 belongs to 'Severe' fever with a degree of 0.22 and to 'Extreme' fever with a degree of 0.78.
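The membership values above can be reproduced with a short sketch. It assumes triangular sets whose breakpoints are read off the temperature axis of Figure 1 (37.3, 37.9, 38.6, 39.1, 40), with 'Extreme' treated as a right shoulder that stays at 1 above 40. These shapes and breakpoints are our reconstruction, not a table given in the report, but they reproduce all three membership vectors listed above.

```python
from typing import Dict, Optional, Tuple

# (a, b, c): membership rises on [a, b] and falls on [b, c]; c=None means it stays at 1.
# Breakpoints are our reading of the Figure 1 axis, not values stated in the report.
FEVER_SETS: Dict[str, Tuple[float, float, Optional[float]]] = {
    "Slight": (37.3, 37.9, 38.6),
    "Moderate": (37.9, 38.6, 39.1),
    "Severe": (38.6, 39.1, 40.0),
    "Extreme": (39.1, 40.0, None),
}


def membership(x: float, a: float, b: float, c: Optional[float]) -> float:
    """Triangular membership, or a right shoulder when c is None."""
    if x <= a:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    if c is None:
        return 1.0
    if x >= c:
        return 0.0
    return (c - x) / (c - b)


def fuzzify(x: float) -> Dict[str, float]:
    """Membership degrees of temperature x in each fever set, rounded to 2 places."""
    return {name: round(membership(x, *abc), 2) for name, abc in FEVER_SETS.items()}


for t in (37.8, 38.4, 39.8):
    print(t, fuzzify(t))
```

The printed degrees for 37.8, 38.4 and 39.8 are (0.83, 0, 0, 0), (0.29, 0.71, 0, 0) and (0, 0, 0.22, 0.78) in the order Slight, Moderate, Severe, Extreme.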
The fuzzy rule $R_i$ of a fuzzy system has the following form:
If X is $A_i$ then Y is $B_i$
where $X = (x_1, x_2, \ldots, x_n)$ is the input, Y is the output, and $A_i = A_{i1} \times A_{i2} \times \cdots \times A_{in}$ and $B_i$ are the fuzzy sets of the antecedent and the consequent of the rule respectively. Each fuzzy set can have a linguistic label. For example, a fuzzy rule from a medical system may look like the following:
If Temperature is Low then Dose is Low
where 'Low' for Temperature and 'Low' for Dose are fuzzy sets defined by the Temperature and Dose membership functions respectively.
The use of human-interpretable rules with linguistic labels allows an expert's knowledge to be encoded easily into a fuzzy model. Instead of modelling the behaviour of the system directly and mathematically, a fuzzy system offers the alternative of modelling an experienced human operator. When dealing with complex systems, the latter is a much more convenient approach. For this reason, fuzzy systems have attracted much attention from both academia and industry.
2.2. Fuzzy Control
Fuzzy control, which uses fuzzy rules directly, is currently the most important application of fuzzy theory. I use the procedure originated by Ebrahim Mamdani in the 1970s as a demonstration of the complete inference algorithm.
Three steps are used to create a fuzzy controlled machine:
1) Fuzzification (Using membership functions to graphically describe a situation)
2) Rule evaluation (Application of fuzzy rules)
3) Defuzzification (Obtaining the crisp or actual results)
As an example, consider an inverted pendulum system. The problem is to balance a pole on a mobile platform that can move in only two directions, to the left or to the right. The angle between the platform and the pendulum and the angular velocity of this angle are chosen as the inputs of the system, and the output corresponds to the speed of the platform (Mamdani, 1972).
2.2.1. Fuzzification
First, the different levels of the platform output (speed) are defined by specifying the membership functions of the fuzzy sets; the graph of these functions is shown below. Similarly, membership functions are defined for the different angles between the platform and the pendulum and for the angular velocities.
Note: For simplicity, it is assumed that all membership functions are equally spaced.
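The fuzzification step can be sketched as follows, assuming (hypothetically) that both angle and angular velocity are normalised to [-1, 1] and covered by five equally spaced triangular sets centred at -1, -0.5, 0, 0.5 and 1. The report gives neither the scaling nor the raw readings; the sample readings 0.125 and -0.3 are made up so that they produce the membership degrees used in the rule-evaluation example.

```python
from typing import Dict

# Hypothetical normalised universe [-1, 1] with five equally spaced triangular sets.
LABELS = ["negative high", "negative low", "zero", "positive low", "positive high"]
CENTRES = [-1.0, -0.5, 0.0, 0.5, 1.0]
HALF_WIDTH = 0.5  # each triangle falls to 0 at the neighbouring centre


def fuzzify(x: float) -> Dict[str, float]:
    """Degree to which x belongs to each of the five triangular sets."""
    return {label: max(0.0, 1.0 - abs(x - centre) / HALF_WIDTH)
            for label, centre in zip(LABELS, CENTRES)}


angle = fuzzify(0.125)      # hypothetical reading
velocity = fuzzify(-0.3)    # hypothetical reading
print({k: round(v, 2) for k, v in angle.items() if v > 0})
print({k: round(v, 2) for k, v in velocity.items() if v > 0})
```

With these assumptions the angle is 'zero' to degree 0.75 and 'positive low' to degree 0.25, and the angular velocity is 'negative low' to degree 0.6 and 'zero' to degree 0.4.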
2.2.2. Rule evaluation
The next step is to define the fuzzy rules, a series of if-then statements usually derived by an expert to achieve optimum results. Some examples of these rules are:
If angle is zero and angular velocity is zero, then speed is also zero.
If angle is zero and angular velocity is positive low, then speed is positive low.
The full set of rules is summarized in the table below. The dashes mark conditions that have no associated rule, which simplifies the situation.
Speed (rows: angular velocity; columns: angle)

Angular velocity | negative high | negative low | zero          | positive low | positive high
negative high    | -             | -            | negative high | -            | -
negative low     | -             | -            | negative low  | zero         | -
zero             | negative high | negative low | zero          | positive low | positive high
positive low     | -             | zero         | positive low  | -            | -
positive high    | -             | -            | positive high | -            | -
An application of these rules is shown using specific values for angle and angular velocity. In this example, the measured angle belongs to the fuzzy sets zero and positive low with degrees 0.75 and 0.25, and the measured angular velocity belongs to the fuzzy sets zero and negative low with degrees 0.4 and 0.6. These points are shown on the graphs below.
Consider the rule "if angle is zero and angular velocity is zero, then speed is zero". The actual values belong to the fuzzy set zero with a degree of 0.75 for "angle" and 0.4 for "angular velocity". Since this is an AND operation, the minimum criterion is used: the fuzzy set zero of the variable "speed" is cut at 0.4, and the area under it is shaded up to that level. This is illustrated in the figure below.
Similarly, the minimum criterion is used for the other three rules. The following figures show the resulting patches yielded by the rules "if angle is zero and angular velocity is negative low, then speed is negative low", "if angle is positive low and angular velocity is zero, then speed is positive low" and "if angle is positive low and angular velocity is negative low, then speed is zero".
The four results overlap and are combined into the single fuzzy set shown in the following figure.
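The rule-evaluation step can be sketched in a few lines. The four rules and the input degrees are taken from the example above; merging the cut levels of rules that share a consequent by taking the maximum is our assumption about how the overlapping results are combined (it is the usual Mamdani choice).

```python
from typing import Dict, List, Tuple

# Input membership degrees from the example in the text.
angle = {"zero": 0.75, "positive low": 0.25}
velocity = {"zero": 0.4, "negative low": 0.6}

# (angle term, angular-velocity term, speed term) for the four rules that fire.
rules: List[Tuple[str, str, str]] = [
    ("zero", "zero", "zero"),
    ("zero", "negative low", "negative low"),
    ("positive low", "zero", "positive low"),
    ("positive low", "negative low", "zero"),
]


def evaluate(rules, angle, velocity) -> Dict[str, float]:
    """Return the level at which each output set of 'speed' is cut."""
    cut: Dict[str, float] = {}
    for a_term, v_term, speed_term in rules:
        strength = min(angle.get(a_term, 0.0), velocity.get(v_term, 0.0))  # AND -> minimum
        cut[speed_term] = max(cut.get(speed_term, 0.0), strength)           # overlap -> maximum
    return cut


cut = evaluate(rules, angle, velocity)
print(cut)
```

The output is {'zero': 0.4, 'negative low': 0.6, 'positive low': 0.25}: the 'zero' speed set receives 0.4 from the first rule and 0.25 from the fourth, and the larger value is kept.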
2.2.3. Defuzzification
The result of the fuzzy controller so far is a fuzzy set (of speed). To choose an appropriate representative value as the final (crisp) output, defuzzification must be performed. There are numerous defuzzification methods, but the most commonly used one is the center of gravity of the set, as shown below.
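Center-of-gravity defuzzification can be sketched numerically. This assumes (hypothetically) that speed is normalised to [-1, 1] with five equally spaced triangular sets, and that the speed sets are cut at 0.6 (negative low), 0.4 (zero) and 0.25 (positive low), which are the levels the four example rules give when overlapping results for the same set are merged by taking the maximum. The integral is approximated by a sum over a uniform grid.

```python
from typing import Dict

# Hypothetical speed sets on a normalised universe [-1, 1].
CENTRES = {"negative high": -1.0, "negative low": -0.5, "zero": 0.0,
           "positive low": 0.5, "positive high": 1.0}
HALF_WIDTH = 0.5


def triangle(x: float, centre: float) -> float:
    return max(0.0, 1.0 - abs(x - centre) / HALF_WIDTH)


def aggregated(x: float, cut: Dict[str, float]) -> float:
    """Union (max) of the output sets, each clipped (min) at its rule strength."""
    return max((min(triangle(x, CENTRES[name]), level) for name, level in cut.items()),
               default=0.0)


def centroid(cut: Dict[str, float], lo: float = -1.0, hi: float = 1.0,
             steps: int = 2001) -> float:
    """Center of gravity of the aggregated set, approximated on a uniform grid."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    mus = [aggregated(x, cut) for x in xs]
    total = sum(mus)
    if total == 0:
        raise ValueError("no rule fired; the output set is empty")
    return sum(x * mu for x, mu in zip(xs, mus)) / total


speed = centroid({"negative low": 0.6, "zero": 0.4, "positive low": 0.25})
print(round(speed, 3))
```

Under these assumptions the crisp speed is about -0.126: slightly negative, because the 'negative low' set is cut higher than the 'positive low' set.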
3. Hierarchical Fuzzy System
A major issue in fuzzy applications is how to produce fuzzy rules. Classical approaches to fuzzy control deal with dense rule bases, in which the universe of discourse is fully covered by the antecedent fuzzy sets of the rule base in each dimension, so that at least one rule is activated for every input (Muresan, 2001). This causes the high computational complexity of these traditional approaches, because the number of rules increases exponentially with the number of inputs and terms. For example, the inverted pendulum above has two inputs with 5 terms each, giving $5^2 = 25$ rules; with 5 inputs and 5 terms, the number of rules is $5^5 = 3{,}125$. This complexity limits classical fuzzy systems to problems with at most about 6 to 10 inputs (Wong et al. 2003).
If a fuzzy model contains k variables and at most T linguistic (or other fuzzy) terms in each dimension, the number of necessary rules is $O(T^k)$. The number of rules can be decreased by decreasing T, k, or both, while avoiding the loss of the easy interpretability of the components. One method leads to sparse rule bases by decreasing T and applying rule interpolation to the resulting rule bases (Kóczy and Hirota, 1993). The other aims to reduce the dimension k of the sub-rule bases by using meta-levels or hierarchical fuzzy rule bases (Sugeno, Murofushi, Nishino and Miwa, 1991).
As for the hierarchical structure, the basic idea is the following. Often the multi-dimensional input state space $X = X_1 \times X_2 \times \cdots \times X_k$ can be decomposed so that some of its components, e.g. $Z_0 = X_1 \times X_2 \times \cdots \times X_{k_0}$, determine a subspace of $X$ ($k_0 < k$), such that in $Z_0$ a partition $\Pi = \{D_1, D_2, \ldots, D_n\}$ can be determined:
$$\bigcup_{i=1}^{n} D_i = Z_0$$
In each element of $\Pi$, i.e. $D_i$, a sub-rule base $R_i$ can be constructed with local validity. In the worst case, each sub-rule base refers to exactly $X/Z_0 = X_{k_0+1} \times \cdots \times X_k$, and so the hierarchical rule base has the following structure:
$R_0$:
If $z_0$ is $D_1$ then use $R_1$
If $z_0$ is $D_2$ then use $R_2$
...
If $z_0$ is $D_n$ then use $R_n$
where $z_0 \in Z_0$

$R_1$:
If $z_1$ is $A_{11}$ then y is $B_{11}$
If $z_1$ is $A_{12}$ then y is $B_{12}$
...
If $z_1$ is $A_{1m_1}$ then y is $B_{1m_1}$
where $z_1 \in X/Z_0$

$R_2$:
If $z_2$ is $A_{21}$ then y is $B_{21}$
If $z_2$ is $A_{22}$ then y is $B_{22}$
...
If $z_2$ is $A_{2m_2}$ then y is $B_{2m_2}$
where $z_2 \in X/Z_0$

...

$R_n$:
If $z_n$ is $A_{n1}$ then y is $B_{n1}$
If $z_n$ is $A_{n2}$ then y is $B_{n2}$
...
If $z_n$ is $A_{nm_n}$ then y is $B_{nm_n}$
where $z_n \in X/Z_0$
The rules in $R_0$ are pointers to the other sub-rule bases. This hierarchical approach does not by itself help with the $O(T^k)$ complexity of the whole rule base: the size of $R_0$ is $O(T^{k_0})$ and each $R_i$, $i > 0$, is of order $O(T^{k-k_0})$, so the resulting complexity is $O(T^{k_0}) \times O(T^{k-k_0}) = O(T^k)$. Only if a suitable $\Pi$ and $Z_0$ are found such that the number of variables in each $Z_i$ is $k_i < k - k_0$ and $\max_{i=1}^{n} k_i = K$, giving sub-rule bases of size $O(T^K)$, does the structured rule base in effect reduce $k$ to the smaller exponent $k_0 + K < k$. The main difficulty in the automatic construction of such a system therefore lies in finding a suitable $Z_0$ and $\Pi$.
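The effect of these exponents can be checked with a small count. The sketch compares a flat rule base ($T^k$ rules) with a hierarchical one made of $R_0$ plus $n$ sub-rule bases. It assumes a grid partition of $Z_0$, so that $n = T^{k_0}$, and that every sub-rule base uses $K$ variables, with $K = k - k_0$ as the worst case; the function names and sample values of T, k, $k_0$ and K are hypothetical.

```python
from typing import Optional


def flat_rules(T: int, k: int) -> int:
    """Dense rule base: one rule for every combination of T terms over k inputs."""
    return T ** k


def hierarchical_rules(T: int, k: int, k0: int, K: Optional[int] = None) -> int:
    """Rules in R_0 plus all sub-rule bases R_1..R_n.

    Assumes a grid partition of Z_0, so R_0 has n = T**k0 entries, and that
    every sub-rule base uses K variables (K = k - k0 is the worst case).
    """
    if K is None:
        K = k - k0
    n = T ** k0
    return n + n * T ** K


print(flat_rules(5, 2), flat_rules(5, 5))  # the two examples in the text
print(flat_rules(5, 6), hierarchical_rules(5, 6, 2), hierarchical_rules(5, 6, 2, K=2))
```

The first line prints 25 and 3125. The second shows that for T = 5, k = 6, k_0 = 2 the worst-case hierarchy (15650 rules) is no better than the flat base (15625 rules), whereas sub-rule bases with only K = 2 variables bring the total down to 650, in line with the reduced exponent $k_0 + K < k$.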
One requirement of a suitable Π is that each of its elements $D_i$ can be modeled by a rule base with local validity. In this case, it is reasonable to expect each $D_i$ to contain homogeneous data. The problem of finding Π can thus be reduced to finding homogeneous structures within the data (Gedeon, 2001). This can be achieved by clustering algorithms, which are introduced in detail later.
4. Fuzzy Signature
Fuzzy signatures structure data into vectors of fuzzy values, each of which can be a further vector. They extend the application of fuzzy theory to domains containing complex and interdependent features, and they can be used in cases with different numbers of data components.
The definition of a fuzzy set, $A: X \to [0,1]$, was extended to L-fuzzy sets by Goguen (Goguen, 1967), $A_L: X \to L$, with $L$ being an arbitrary algebraic lattice. Vector-valued fuzzy sets are described as $A_{V,k}: X \to [0,1]^k$, where the range of membership values is the lattice of k-dimensional vectors with components in the unit interval (Kóczy, 1982). A fuzzy signature generalizes this by allowing each component to be a further vector:
$$A_S: X \to [a_i]_{i=1}^{k}, \quad a_i \in [0,1] \text{ or } a_i = [a_{ij}]_{j=1}^{k_i}, \quad a_{ij} \in [0,1] \text{ or } a_{ij} = [a_{ijl}]_{l=1}^{k_{ij}}, \ \ldots$$
In general, each fuzzy signature is a nested vector structure containing a series of fuzzy signatures, whose internal structure indicates the semantic and logical connections of the state variables; these correspond to the leaves of the signature graph. It can be denoted as a fuzzy set vector with possibly recursive component vectors:
$$A: X \to S^{(n)}, \quad n \ge 1, \qquad S^{(n)} = \prod_{i=1}^{n} S_i, \qquad S_i = [0,1] \text{ or } S_i = S^{(m_i)}$$
where $\prod$ denotes the Cartesian product.
A fuzzy signature is a special kind of multidimensional fuzzy data in which data in sub-groups affect features at their higher level.
The relationship between higher and lower levels is governed by a set of fuzzy aggregations: the value of each parent signature is computed from its branches by an appropriate aggregation of its child signatures. The aggregation methods need not be identical; they can be chosen based on expert opinion and the specific circumstances.
With each aggregation, higher-level signatures retain less information. In some circumstances, it is useful to reduce and aggregate information in order to maintain compatibility with other sources in which some detailed variables are missing or omitted. In most cases, reducing the signatures to their maximal common sub-tree allows all signatures to be interpolated between the corresponding branches.
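The bottom-up aggregation described above can be sketched as a small recursive function. It assumes a node is either a leaf degree in [0, 1] or a pair (aggregation operator, children). The structure, the operators (max, mean, min) and all leaf values are hypothetical, chosen only to echo the idea of grouping four temperature tests into one sub-branch; they are not the report's SARS signature.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple, Union

Aggregator = Callable[[Sequence[float]], float]
# A node is either a leaf degree in [0, 1] or a pair (aggregation operator, child nodes).
Node = Union[float, Tuple[Aggregator, List["Node"]]]


def evaluate(node: Node) -> float:
    """Aggregate a fuzzy signature bottom-up into a single degree at its root."""
    if isinstance(node, (int, float)):
        if not 0.0 <= node <= 1.0:
            raise ValueError(f"leaf degree {node} is outside [0, 1]")
        return float(node)
    op, children = node
    if not children:
        raise ValueError("a branch must have at least one child")
    return op([evaluate(child) for child in children])


# Hypothetical signature: four temperature readings aggregated by max, two other
# symptoms aggregated by mean, and a root that combines both branches by min.
temperature = (max, [0.6, 0.8, 0.7, 0.9])
other_symptoms = (mean, [0.75, 1.0])
signature = (min, [temperature, other_symptoms])
print(evaluate(signature))

# The same structure with one temperature reading missing still evaluates.
missing = (min, [(max, [0.6, 0.8, 0.7]), other_symptoms])
print(evaluate(missing))
```

The full signature evaluates to 0.875, and the version with a missing reading to 0.8. Because each branch simply aggregates whatever children it has, signatures with different numbers of data components are handled without any change to the code.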
5. Fuzzy Clustering
Earlier, we mentioned that a clustering algorithm is used to cluster the output data samples. The main requirement of a reasonable cluster is that each of its elements can be modeled by a rule base with local validity (Wong et al. 2003). The aim of clustering can therefore be regarded as finding subspaces that contain homogeneous data. Once such a subspace is found, it can be used to select an appropriate sub-rule base to infer the output for a given input.
Given data $X = \{x_1, x_2, x_3, \ldots, x_n\}$, the task is to classify the points of X into K groups ($2 \le K < n$) such that closely related points fall in the same group and unrelated points fall in different groups. Traditional mathematical classification assigns each datum 'strictly' to one group; this is called hard clustering. However, most problems in real life are uncertain, fuzzy problems. Fuzzy clustering allows a value to belong to several groups to different degrees, which better preserves the characteristics of that value. Moreover, clustering works more effectively when feature selection methods are first used to find a 'reasonable' subspace; however, most clustering techniques, such as c-means, need to know the number of clusters in advance.
5.1. FCM Algorithm
Bezdek introduced the Fuzzy C-Means clustering method (FCM) in 1981, extending the Hard C-Means clustering method (Dunn, 1974). Under some conditions, FCM converges better than Maximum Likelihood (ML) methods (Huggins, 1983). A suite of FCM algorithms published by Cannon et al. is widely used in clustering analysis research and applications. Bezdek's fuzzy clustering algorithm is an amended version of Dunn's c-means, but it still has some flaws: for example, weights are not considered, and it is mainly used for static data.
In 1978, Roubens introduced a new objective function:
$$\sum_{v=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^2 u_{jv}^2 \, d(i,j)$$
where $u_{iv}$ is the membership of object i in cluster v and $d(i,j)$ is the dissimilarity between objects i and j; however, its convergence is not very good. In 1981, Leonard et al. proposed an amended objective function that improves on Roubens' objective function (Cheng, 1991):
$$\sum_{v=1}^{k} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^2 u_{jv}^2 \, d(i,j)}{2 \sum_{j=1}^{n} u_{jv}^2}$$
In this study, many symptoms affect each other from a medical viewpoint, and the different physical characteristics of patients also affect the implicit meaning of the symptom measurements. We therefore adopt fuzzy clustering instead of classical clustering. Furthermore, fuzzy clustering retains the information in the original data, so that it can be used for further research.
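Let $\Pi = (\pi_1, \pi_2, \ldots, \pi_c)$ be a fuzzy partition of the n data points into c clusters, represented by the membership matrix
$$U_{c \times n} = \begin{pmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{c1} & \mu_{c2} & \cdots & \mu_{cn} \end{pmatrix}$$
where $\mu_{ij} \in [0,1]$ is the membership of $x_j$ in cluster i and each column sums to 1, i.e. $\sum_{i=1}^{c} \mu_{ij} = 1$.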
Dunn defined a fuzzy objective function:
$$J_D(U,V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^2 \, \|x_j - v_i\|^2$$
where $v_i$ is the centre of cluster i. Bezdek (1981) then extended it to:
$$J_m(U,V;X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \, \|x_j - v_i\|^2, \quad 1 < m < \infty$$
where $\|x_j - v_i\|^2$ represents the deviation of data point $x_j$ from the cluster centre $v_i$, and the exponent m governs the influence of the membership grades.
To obtain the minimizing pair (U, V),
$$\min_{(U,V)} \left\{ J_m(U,V;X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \, \|x_j - v_i\|^2 \right\}$$
the following two conditions must be satisfied:
$$v_i = \frac{\sum_{j=1}^{n} \mu_{ij}^m \, x_j}{\sum_{j=1}^{n} \mu_{ij}^m}, \quad 1 \le i \le c$$
and
$$\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\|x_j - v_i\|^2}{\|x_j - v_k\|^2} \right)^{\frac{1}{m-1}} \right]^{-1}, \quad 1 \le i \le c, \ 1 \le j \le n$$
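The two conditions above suggest the usual alternating optimisation: start from a random fuzzy partition, then repeatedly apply the centre update and the membership update. Below is a compact NumPy sketch of that loop. The random initialisation, the default m = 2, the stopping rule (largest change in any membership below `tol`), the small distance floor and the toy data are all our assumptions, not values from the report.

```python
import numpy as np


def fcm(X: np.ndarray, c: int, m: float = 2.0, tol: float = 1e-6,
        max_iter: int = 300, seed: int = 0):
    """Fuzzy c-means. Returns (U, V) with U of shape (c, n) and V of shape (c, d)."""
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)            # each column sums to 1

    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)                     # centre update
        D = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)         # ||x_j - v_i||^2
        D = np.fmax(D, 1e-12)                                          # guard x_j == v_i
        ratio = (D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1.0))   # indexed [i, k, j]
        U_new = 1.0 / ratio.sum(axis=1)                                # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    W = U ** m
    V = (W @ X) / W.sum(axis=1, keepdims=True)   # centres consistent with the final U
    return U, V


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
U, V = fcm(X, c=2)
print(U.argmax(axis=0))
print(np.round(V, 2))
```

On this toy data the three points near the origin and the three near (10, 10) receive their highest membership in different clusters, and the two centres end up close to the group means (1/3, 1/3) and (31/3, 31/3).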
5.2. Cluster Validity Problem
The FCM algorithm has a limitation: it needs to know the number of clusters in advance. Some work has been done on how to find an optimal number of clusters; this is referred to as the cluster validity problem. One cluster validity index was proposed by Fukuyama and Sugeno (FS):
$$S(c) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m \left( \|x_j - v_i\|^2 - \|v_i - \bar{x}\|^2 \right), \quad 2 \le c < n$$
where $\bar{x}$ is the mean of all data points. The optimal number of clusters is found by minimizing S(c), which favours small distances between the data and their cluster centres (the first term) and large distances between the cluster centres and the overall mean, that is, well-separated clusters (the second term).
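The FS index can be computed directly from a membership matrix U. In this sketch the centres are recomputed from U with the FCM centre update and $\bar{x}$ is the grand mean of the data. The demonstration scores two hand-made hard partitions of a hypothetical toy dataset; in practice one would run FCM for each candidate c and keep the c with the smallest S(c).

```python
import numpy as np


def fukuyama_sugeno(X: np.ndarray, U: np.ndarray, m: float = 2.0) -> float:
    """S(c) = sum_i sum_j u_ij^m (||x_j - v_i||^2 - ||v_i - x_bar||^2); smaller is better."""
    W = U ** m                                                     # shape (c, n)
    V = (W @ X) / W.sum(axis=1, keepdims=True)                     # weighted centres
    x_bar = X.mean(axis=0)                                         # grand mean
    compact = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # ||x_j - v_i||^2
    separate = ((V - x_bar) ** 2).sum(axis=1, keepdims=True)       # ||v_i - x_bar||^2
    return float((W * (compact - separate)).sum())


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
good = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)   # the natural split
bad = np.array([[1, 1, 0, 1, 0, 0],
                [0, 0, 1, 0, 1, 1]], dtype=float)    # a mixed split
print(fukuyama_sugeno(X, good), fukuyama_sugeno(X, bad))
```

For the natural split S equals 8/3 - 300 (about -297.3), far lower than for the mixed split, because its clusters are both compact and far from the overall mean.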
Iteration of the FCM algorithm stops when the error falls below a defined tolerance or when its improvement over the previous iteration falls below a certain threshold.
6. Factor Analysis
This kind of basic structure can be seen in the fuzzy signature for SARS. The four temperature tests are grouped into a sub-branch by default, since experts believe these four tests are highly relevant and affect each other. From a medical point of view, a disease rarely presents with only one symptom: several symptoms may appear concurrently, or there may be some main symptoms with accompanying symptoms. Based on this, we believe that highly relevant symptoms can be found and grouped into a sub-branch.
The main aim of factor analysis is to provide a set of closely related models intended for exploring or establishing the correlational structure among observed random variables (Basilevsky, 1994). Factor analysis was initially developed by psychologists and was primarily concerned with hypotheses about the organization of mental ability suggested by examining correlation matrices of cognitive test scores. Hotelling (1957) pointed out that factor analysis is the most widely used of the multivariate techniques, although not always appropriately. With the advent of computers, factor analysis has spread to many domains other than psychology, such as economics, botany, biology and the social sciences (Basilevsky, 1994).
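As a rough illustration of how correlated symptoms might be grouped into sub-branches, the sketch below extracts principal-component loadings from the correlation matrix and assigns each variable to the component on which it loads most strongly. This is a simplified stand-in (no rotation and no formal factor model), not the factor analysis procedure used in the report. The synthetic data, generated from two hidden factors, is purely hypothetical.

```python
import numpy as np


def group_variables(data: np.ndarray, n_factors: int) -> np.ndarray:
    """Assign each column of `data` to the retained component with its largest |loading|."""
    corr = np.corrcoef(data, rowvar=False)           # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]    # keep the largest ones
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return np.abs(loadings).argmax(axis=1)


rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 500))                   # two hypothetical latent factors
noise = 0.1 * rng.normal(size=(500, 5))
data = np.column_stack([f1, f1, f1, f2, f2]) + noise  # columns 0-2 follow f1, 3-4 follow f2
print(group_variables(data, n_factors=2))
```

Columns 0 to 2 land in one group and columns 3 and 4 in the other, recovering the two hidden factors; each such group would be a candidate sub-branch.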
7. Feature Ranking and Selection
During the automatic construction of a fuzzy signature, feature ranking and selection play important roles, both in constructing sub-branches and in assigning weights to each sub-branch. Devijver and Kittler (1982) developed an interclass separability criterion that can serve these purposes. Tikk and Gedeon (2000) improved the original criterion by fuzzifying it and used it for feature selection in fuzzy rule extraction.
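A simplified illustration of ranking features by how well they separate classes: for each feature, the between-class scatter is divided by the within-class scatter, both weighted by class proportions. This one-dimensional ratio is only in the spirit of an interclass separability criterion; it is neither the exact Devijver-Kittler criterion nor the fuzzified version of Tikk and Gedeon. The toy data and function names are hypothetical.

```python
import numpy as np


def separability_scores(X, y) -> np.ndarray:
    """Per-feature ratio of between-class to within-class scatter (larger = more separable)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        p = len(Xc) / len(X)                          # class proportion
        between += p * (Xc.mean(axis=0) - overall) ** 2
        within += p * Xc.var(axis=0)
    return between / np.maximum(within, 1e-12)


def rank_features(X, y) -> list:
    """Feature indices ordered from most to least separable."""
    scores = separability_scores(X, y)
    return [int(i) for i in np.argsort(-scores)]


X = [[0.0, 5.0], [0.1, 1.0], [0.2, 3.0],
     [5.0, 4.0], [5.1, 2.0], [5.2, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))
print(np.round(separability_scores(X, y), 3))
```

Feature 0 cleanly separates the two classes and is ranked first; feature 1 overlaps heavily between classes and scores only 0.094.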
8. Outliers and Missing Data
9. Experiment and Construction of Fuzzy Signatures
10. Conclusion
In this report, I have described the basic concepts of fuzzy sets and fuzzy control. To address rule explosion, I introduced the concept of fuzzy signatures and how to construct a fuzzy signature manually. The hierarchical fuzzy signature structure presented can use feature selection and interclass separability to reduce complexity. Here, a SARS pre-clinical diagnosis model was constructed using fuzzy signatures to show their flexibility. In addition, a data mining algorithm, FCM, and its role in constructing fuzzy signatures were introduced. With the support of the relevant theory, applications generated from or adapted to fuzzy logic are likely to become widely used, providing more opportunities to model conditions that are inherently imprecisely defined.
References
Basilevsky, A. (1994) Statistical Factor Analysis and Related Methods. New York: Wiley-Interscience.
Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press.
Chong, A., Gedeon, T.D. and Kóczy, L.T. (2002) "Projection Based Method for Sparse Fuzzy System Generation," Proceedings of the 2nd WSEAS International Conference on Scientific Computation and Soft Computing, pp. 321-325.
Chong, A., Gedeon, T.D. and Kóczy, L.T. (2003) "Feature Selection and Subspace Clustering for Hierarchical Fuzzy Rule Extraction," CIMSA 2003, Paris.
Devijver, P.A. and Kittler, J. (1982) Pattern Recognition: A Statistical Approach. London: Prentice Hall.
Gedeon, T.D. (1999) "Clustering Significant Words using their Co-occurrence in Document Sub-Collections," Proceedings of the 7th European Congress on Intelligent Techniques and Soft Computing (EUFIT'99), Aachen, pp. 302-306.
Gedeon, T.D., Kóczy, L.T., Wong, K.W. and Liu, P. (2001) "Effective Fuzzy Systems for Complex Structured Data," Proceedings of the IASTED International Conference on Control and Applications (CA 2001), pp. 184-187.
Goguen, J.A. (1967) "L-fuzzy sets," Journal of Mathematical Analysis and Applications, 18, pp. 145-174.
Kóczy, L.T. and Hirota, K. (1993) "Approximate reasoning by linear rule interpolation and general approximation," International Journal of Approximate Reasoning, 9, pp. 197-223.
Muresan, L. "Interpolation in Hierarchical Fuzzy Rule Bases," Technical University of Budapest, Hungary.
Sugeno, M., Murofushi, T., Nishino, J. and Miwa, H. (1991) "Helicopter flight control based on fuzzy logic," Proceedings of Fuzzy Engineering toward Human Friendly Systems '91, pp. 1120-1124.
Tikk, D. and Gedeon, T.D. (2000) "Feature ranking based on interclass separability for fuzzy control application," Proceedings of the International Conference on Artificial Intelligence in Science and Technology (AISAT'2000), Hobart, pp. 29-32.
Wang, L.X. and Mendel, J.M. (1992) "Fuzzy basis functions, universal approximation, and orthogonal least squares learning," IEEE Transactions on Neural Networks, no. 5, pp. 807-814.
Wong, K.W., Chong, A., Gedeon, T.D., Kóczy, L.T. and Vámos, T. (2003) "Hierarchical Fuzzy Signatures Structure for Complex Structured Data," Proceedings of the International Symposium on Computational Intelligence and Intelligent Informatics (ISCIII'03), Nabeul, Tunisia, pp. 105-109.