5. Fuzzy Signature

MIT eScience REPORT
FUZZY SIGNATURES
Bai Qifeng
KEY WORDS: cluster, c-mean, data mining, fuzzy logic, fuzzy signature, if-then rules, projection-based
ABSTRACT:
A major advantage of fuzzy theory is that it allows problems to be described naturally and
linguistically instead of through precise numerical values. This ability to deal with a complicated
system in an intuitive way is the main reason why fuzzy theory is widely applied. However, fuzzy
systems suffer from rule explosion in complicated systems. There are two ways to reduce rule sets.
One is to create a sparse system. The other is to construct fuzzy signatures. A hierarchical fuzzy
signature structure can also assist human experts by handling missing data and reducing the amount
of unnecessary information presented.
1. Introduction
Over the past few decades, fuzzy logic theory has been widely used in process control,
management and decision making, operations research, and economics. Dealing with simple
‘yes’ and ‘no’ answers is no longer satisfactory; a degree of membership (Zadeh,
1965) became a new way of solving problems. Fuzzy logic derives from the fact that
human common-sense reasoning is approximate in nature.
However, when fuzzy theory is used to handle practical problems, large data sets are
inevitable. Classical fuzzy theory also cannot tackle problems with complicated and
interdependent features, or problems where data are missing (Wong, Gedeon and Kóczy, 2001).
These situations lead to a heavy workload and overly complicated rules. Fuzzy signatures
were introduced to solve problems in the economic and medical domains, which are full of
complicated and interdependent objects that need to be classified and evaluated
(Gedeon et al. 2001). A fuzzy signature has a hierarchical structure in which related data
are arranged as vectors of fuzzy values, which are in turn contained in higher-level
vectors. This tree structure is designed to mimic the decision-making process of human
experts, who can handle situations in which the number of data items differs or some
items are missing.
This report introduces an approach named the Projection Based Method for Fuzzy Systems,
which automatically constructs a fuzzy rule base from a set of input-output sample data.
It adapts data mining technology to cluster the output space produced by a set of training
data, and then projects each output cluster back to each input dimension. The clusters
from the different input dimensions can be used to create fuzzy rules (Chong, 2001).
2. Fuzzy Logic Theory
With the development of electronic devices, computer systems handle precisely measured
values, for example a speed of 50.12 mile/hour. Conventional bivalent sets can
tell us whether this speed is ‘fast’ or ‘slow’. The most obvious limitation of bivalent
sets is that they are mutually exclusive: an element cannot be a member of more
than one set. Opinions also differ widely as to whether 50 mile/hour is 'fast'
or 'slow', so the expert knowledge needed to define the system is mathematically at
odds with the human world. A sharp transition from 'slow' to 'fast' is therefore not an
accurate description. If 50 mile/hour is the boundary, then in a bivalent set 49.9 means
‘slow’ but 50.12 means ‘fast’. In the real world, however, there should be a smooth
change from ‘slow’ to ‘fast’.
This natural phenomenon can be described more accurately by Fuzzy Set Theory. A fuzzy
set is a set whose elements have degrees of membership. An element of a fuzzy set can be a
full member (100% membership) or a partial member (between 0% and 100%
membership). That is, the membership value assigned to an element is no longer
restricted to just two values, but can be 0, 1 or any value in-between. The mathematical
function which defines the degree of an element's membership in a fuzzy set is called the
membership function.
Let U be a universal set that contains all elements, and let A be a crisp set. A can be written as
$$A = \{\, x \in U \mid x \text{ meets some conditions} \,\}$$
Definition of Characteristic Function:
$$\mu_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases}$$
Definition of Membership Function:
The value of a fuzzy set A in U is given by its membership function $\mu_A(x)$.
Another intuitive interpretation is to regard $\mu_A(x)$ as the percentage to which x belongs to A.
Usually, a fuzzy set A in U is presented as a set of ordered pairs of x and its membership value:
$$A = \{\, (x, \mu_A(x)) \mid x \in U \,\}$$
For example, Figure 1 shows the membership functions of fever.
[Figure 1: Membership functions of the fever levels Slight, Moderate, Severe and Extreme over temperature (axis marks at 37.3, 37.9, 38.6, 39.1 and 40), with membership values from 0 to 1. Fever rule data from www.bhp.doh.gov.tw]
The algorithm is:
For each level of fever (a triangle with corner points min < median < max)
    If data > min && data <= median
        value = (data - min) / (median - min)
    else if data > median && data < max
        value = (max - data) / (max - median)
    else
        value = 0
[Figure: the membership functions of Figure 1 with the sample temperatures 37.8, 38.4 and 39.8 marked.]
Assume $U = \{37.8, 38.4, 39.8\}$ and let $\mu_A(x)$ list the memberships in the order (Slight, Moderate, Severe, Extreme):
$$\mu_A(37.8) = (0.83,\ 0,\ 0,\ 0)$$
$$\mu_A(38.4) = (0.29,\ 0.71,\ 0,\ 0)$$
$$\mu_A(39.8) = (0,\ 0,\ 0.22,\ 0.78)$$
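The membership values above can be reproduced with a short Python sketch. It assumes triangular membership functions whose corner points are the axis marks of Figure 1; the function names and the upper corner of the "extreme" set (41.0) are assumptions made only for illustration.

```python
def triangular(x, lo, peak, hi):
    """Membership in a triangle that rises on (lo, peak] and falls on (peak, hi)."""
    if lo < x <= peak:
        return (x - lo) / (peak - lo)
    if peak < x < hi:
        return (hi - x) / (hi - peak)
    return 0.0

# Corner points read off the axis marks of Figure 1.
# The upper corner of "extreme" (41.0) is an assumption.
FEVER_SETS = {
    "slight":   (37.3, 37.9, 38.6),
    "moderate": (37.9, 38.6, 39.1),
    "severe":   (38.6, 39.1, 40.0),
    "extreme":  (39.1, 40.0, 41.0),
}

def fever_memberships(temperature):
    """Membership of one temperature in every fever level, rounded to 2 places."""
    return {name: round(triangular(temperature, *corners), 2)
            for name, corners in FEVER_SETS.items()}

for t in (37.8, 38.4, 39.8):
    print(t, fever_memberships(t))
```

Note that the falling side uses (hi - x) / (hi - peak), which is what makes 38.4 belong to "slight" with degree 0.29 and to "moderate" with degree 0.71.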
Membership functions often have more complex shapes than fever(x). At their simplest they
tend to be upward-pointing triangles, but they can be much more complex than that.
Furthermore, the membership functions discussed so far are all based on a single
criterion; this is the most common case, but it is not always so. One could, for example,
want the membership function for fever to depend not only on a person's temperature
but also on other symptoms.
3. Fuzzy Control
Fuzzy control, which directly uses fuzzy rules, is currently the most important application
of fuzzy theory. I use a procedure originated by Ebrahim Mamdani in the late 70s as a
demonstration of how fuzzy systems work.
Three steps are used to create a fuzzy controlled machine:
1) Fuzzification (using membership functions to graphically describe a situation)
2) Rule evaluation (application of fuzzy rules)
3) Defuzzification (obtaining the crisp or actual results)
Now, we want to construct an inverted pendulum system. The problem is to balance
a pole on a mobile platform that can move in only two directions, to the left or to the right.
The angle between the platform and the pendulum and the angular velocity of this angle
are chosen as the inputs of the system. The output corresponds to the speed of the
platform (Mamdani, 1972).
3.1. Fuzzification
First of all, the different levels of the output of the platform (speed) are defined by specifying
the membership functions for the fuzzy sets. The graph of the function is shown below.
Similarly, membership functions are defined for the different angles between the platform and
the pendulum, and for the angular velocities at specific angles.
Note: For simplicity, it is assumed that all membership functions are spread equally.
3.2. Rule evaluation
The next step is to define the fuzzy rules. The fuzzy rules are a series of if-then
statements. These statements are usually derived by an expert to achieve optimum results.
Some examples of these rules are:
If angle is zero and angular velocity is zero then speed is also zero.
If angle is zero and angular velocity is low then the speed shall be low.
The full set of rules is summarized in the table below. The dashes are for conditions,
which have no rules associated with them. This is for simplifying the situation.
Speed, by angle (columns) and angular velocity (rows):

velocity \ angle | negative high | negative low | zero          | positive low | positive high
negative high    | ---           | ---          | negative high | ---          | ---
negative low     | ---           | ---          | negative low  | zero         | ---
zero             | negative high | negative low | zero          | positive low | positive high
positive low     | ---           | zero         | low           | ---          | ---
positive high    | ---           | ---          | high          | ---          | ---
The rules are now applied to specific values of the angle and the angular velocity. In this
example, the angle belongs to "zero" with degree 0.75 and to "positive low" with degree 0.25,
and the angular velocity belongs to "zero" with degree 0.4 and to "negative low" with degree
0.6. These points are marked on the graphs below.
Consider the rule "if angle is zero and angular velocity is zero, the speed is zero". The
actual value belongs to the fuzzy set zero to a degree of 0.75 for "angle" and 0.4 for
"angular velocity". Since this is an AND operation, the minimum criterion is used, so
the fuzzy set zero of the variable "speed" is cut at 0.4 and the patch is shaded up to
that level. This is illustrated in the figure below.
Similarly, the minimum criterion is used for the other three rules. The following figures
show the result patches yielded by the rule "if angle is zero and angular velocity is
negative low, the speed is negative low", "if angle is positive low and angular velocity is
zero, then speed is positive low" and "if angle is positive low and angular velocity is
negative low, the speed is zero".
The four results overlap and are reduced to the following figure
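As a minimal sketch of the rule evaluation step, the four rules can be fired in Python with the membership degrees of the example. The minimum is used for AND, as in the text. Combining overlapping patches of the same output set with the maximum is an assumption here (the usual Mamdani choice), and the names are hypothetical.

```python
# Degrees of membership from the worked example.
angle = {"zero": 0.75, "positive low": 0.25}
velocity = {"zero": 0.4, "negative low": 0.6}

# (angle term, angular-velocity term) -> speed term, for the four rules that fire.
rules = {
    ("zero", "zero"): "zero",
    ("zero", "negative low"): "negative low",
    ("positive low", "zero"): "positive low",
    ("positive low", "negative low"): "zero",
}

def fire(rules, angle, velocity):
    """Return the level at which each output fuzzy set is cut."""
    cut = {}
    for (a_term, v_term), speed_term in rules.items():
        strength = min(angle[a_term], velocity[v_term])        # AND = minimum
        cut[speed_term] = max(cut.get(speed_term, 0.0), strength)  # overlap = maximum (assumed)
    return cut

cut_levels = fire(rules, angle, velocity)
print(cut_levels)
```

Two rules conclude "zero" (with strengths 0.4 and 0.25), so the larger cut level is kept for that set.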
3.3. Defuzzification
The result of the fuzzy controller so far is a fuzzy set (of speed). To choose
an appropriate representative value as the final crisp output, defuzzification must
be performed. There are numerous defuzzification methods, but the most common one is
the center of gravity of the set, as shown below.
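The center of gravity can be approximated numerically. The sketch below assumes equally spread triangular speed sets centred at -2, -1, 0, 1 and 2 in arbitrary units (the report does not give the actual scale) and uses the cut levels 0.4, 0.6 and 0.25 from the example rules, combined with the maximum. All names are hypothetical.

```python
def triangle(x, centre, half_width=1.0):
    """Symmetric triangular membership function with its peak at `centre`."""
    return max(0.0, 1.0 - abs(x - centre) / half_width)

# Assumed, equally spread output sets (arbitrary speed units).
CENTRES = {"negative high": -2.0, "negative low": -1.0, "zero": 0.0,
           "positive low": 1.0, "positive high": 2.0}

def centroid(cut_levels, lo=-3.0, hi=3.0, steps=6000):
    """Center of gravity of the clipped and merged output sets, by sampling."""
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        # Each set is clipped at its cut level; the patches are merged with max.
        mu = max(min(level, triangle(x, CENTRES[term]))
                 for term, level in cut_levels.items())
        num += x * mu
        den += mu
    return num / den if den else 0.0

speed = centroid({"zero": 0.4, "negative low": 0.6, "positive low": 0.25})
print(round(speed, 3))
```

Because the "negative low" patch is the tallest, the crisp speed comes out slightly negative under these assumptions.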
4. Hierarchical Fuzzy System
A major issue in fuzzy applications is how to produce fuzzy rules. The classical
approaches to fuzzy control deal with dense rule bases, where the universe of discourse is
fully covered by the antecedent fuzzy sets of the rule base in each dimension, so that there
is at least one activated rule for every input (Muresan, 2001). This causes the high
computational complexity of these traditional approaches, because the number of rules
increases exponentially with the number of inputs and terms. In the example above there are
two inputs and 5 terms, giving 5^2 = 25 rules; with 5 inputs and 5 terms, the number of
rules is 5^5 = 3,125. This complexity limits classical fuzzy theory to applications where
the number of inputs does not exceed about 6 to 10 (Wong et al. 2003).
If a fuzzy model contains k variables and at most T linguistic (or other fuzzy) terms in
each dimension, the number of necessary rules is $O(T^k)$. The number of rules can be
decreased by decreasing T, or k, or both, while taking care not to lose the easy
interpretability of the components. One method leads to sparse rule bases
by decreasing T and uses rule interpolation to create rule bases (Kóczy and Hirota,
1993). The other aims to reduce the dimension k of the sub-rule bases by using
meta-levels or hierarchical fuzzy rule bases (Sugeno, Murofushi, Nishino and Miwa,
1991).
As for the hierarchical structure, the basic idea is the following.
Often the multi-dimensional input state space
$$X = X_1 \times X_2 \times \cdots \times X_k$$
can be decomposed so that some of its components, e.g. $Z_0 = X_1 \times X_2 \times \cdots \times X_{k_0}$, determine a
subspace of X ($k_0 < k$), and so that in $Z_0$ a partition $\Pi = \{D_1, D_2, \ldots, D_n\}$ can be
determined:
$$\bigcup_{i=1}^{n} D_i = Z_0$$
In each element of $\Pi$, i.e. $D_i$, a sub-rule base $R_i$ can be constructed with local validity.
In the worst case, each sub-rule base refers to exactly $X / Z_0 = X_{k_0+1} \times \cdots \times X_k$, and so the
hierarchical rule base has the following structure:
R0:
If z0 is D1 then use R1
If z0 is D2 then use R2
…….
If z0 is Dn then use Rn
where $z_0 \in Z_0$
R1:
If z1 is A11 then y is B11
If z1 is A12 then y is B12
…….
If z1 is A1m1 then y is B1m1
where $z_1 \in X / Z_0$
R2:
If z1 is A21 then y is B21
If z1 is A22 then y is B22
…….
If z1 is A2m2 then y is B2m2
where $z_1 \in X / Z_0$
..
Rn:
If z1 is An1 then y is Bn1
If z1 is An2 then y is Bn2
…….
If z1 is Anmn then y is Bnmn
The fuzzy rules in the hierarchical structure are pointers to other sub-rule bases. This
hierarchical approach does not in itself reduce the $O(T^k)$ complexity of the
whole rule base: the size of R0 is $O(T^{k_0})$ and each $R_i$, $i > 0$, is of order $O(T^{k-k_0})$, so
the resulting complexity is $O(T^{k_0}) \times O(T^{k-k_0}) = O(T^k)$. Only if a suitable $\Pi$ and $Z_0$ are
found, such that the number of variables in each $Z_i$ is $k_i < k - k_0$ and $\max_{i=1}^{n} k_i = K$,
does the complexity become $O(T^{k_0+K}) < O(T^k)$; the structured rule base then in effect
reduces k to the smaller exponent $k_0 + K < k$. The main difficulty in the automatic
construction of such a system lies in finding a suitable $Z_0$ and $\Pi$.
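The rule counts quoted in this section can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, the hierarchical bound assumes the exponent $k_0 + K$ for a suitable decomposition, and the values $k_0 = 2$ and $K = 2$ are chosen purely for illustration.

```python
def dense_rule_count(terms, inputs):
    """Number of rules in a dense rule base: T ** k."""
    return terms ** inputs

def hierarchical_rule_bound(terms, k0, K):
    """Bound for a hierarchical rule base with k0 selector inputs and
    at most K inputs per sub-rule base: T ** (k0 + K)."""
    return terms ** (k0 + K)

print(dense_rule_count(5, 2))                 # 25
print(dense_rule_count(5, 5))                 # 3125
print(hierarchical_rule_bound(5, k0=2, K=2))  # 625
```

With five inputs and five terms, a decomposition with $k_0 + K = 4$ would cut the bound from 3,125 to 625 rules.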
5. Fuzzy Signature
Fuzzy signatures structure data into vectors of fuzzy values, each of which can be a
further vector. They can extend the application of fuzzy theory to domains which contain
complex and interdependent features. Fuzzy signatures can be used in cases which have
different numbers of data components.
The original definition of fuzzy sets was $A: X \to [0,1]$, which was extended to L-fuzzy sets by Goguen
(Goguen, 1967):
$$A_L : X \to L$$
L being an arbitrary algebraic lattice. Vector valued fuzzy sets are described as
$$A_{v,k} : X \to [0,1]^k$$
where the range of membership values is the lattice of k-dimensional vectors with
components in the unit interval (Kóczy, 1982). Fuzzy signatures generalize this to
$$A_S : X \to [a_i]_{i=1}^{k}, \quad a_i = \begin{cases} [0,1] \\ [a_{ij}]_{j=1}^{k_i} \end{cases}, \quad a_{ij} = \begin{cases} [0,1] \\ [a_{ijl}]_{l=1}^{k_{ij}} \end{cases}$$
In general, each fuzzy signature is a nested vector structure whose components may
themselves be fuzzy signatures. Its internal structure indicates the semantic and logical
connection of the state variables, which correspond to the leaves of the signature graph.
It can be denoted as a fuzzy set vector with possibly recursive component vectors:
$$A : X \to S^{(n)}, \quad n \ge 1, \quad S^{(n)} = \prod_{i=1}^{n} S_i, \quad S_i = \begin{cases} [0,1] \\ S^{(m)} \end{cases}$$
where $\prod$ denotes the Cartesian product.
A fuzzy signature is a special kind of multidimensional fuzzy data; some of the data in
sub-groups affect features at the level above them.
The relationship between higher and lower levels is controlled by a set of fuzzy
aggregations. The result of the parent signature at each level is computed from its
branches by an appropriate aggregation of its child signatures. The aggregation methods
are not necessarily identical; they can be changed based on expert opinion and the detailed
circumstances.
With each aggregation, the higher-level signature keeps less information. In some
circumstances, it is useful to reduce and aggregate information in order to maintain
compatibility with other sources in which some detail variables are missing or
omitted. In most cases, the maximal common sub-tree rule applies: all signatures can
be interpolated between the corresponding branches.
6. Fuzzy Signature in SARS Pre-clinical Diagnosis
There are two ways to determine the sub-trees of the fuzzy signature. One is for human
experts to determine them. The other is to decide the structure of the fuzzy signature
by identifying the separability of the data (Chong et al. 2002). In this demonstration,
the first method is used.
The following scheme shows the daily symptom signature of a patient:
$$A_S = \big[\ \text{fever}\,[\text{8am},\ \text{12pm},\ \text{4pm},\ \text{8pm}],\ \ \text{cough}\,[\text{12am},\ \text{9pm}],\ \ \text{nausea},\ \ \text{sore}\ \big]^T$$
Doctors know which symptoms need to be checked and how many times they should be monitored.
In reality more symptoms would be tested; to keep the demonstration simple, only
some representative symptoms are included.
A few examples with linguistic values and fuzzy signatures are listed below:
$$A_1 = \big[\ [\text{none},\ \text{none},\ \text{slight},\ \text{slight}],\ [\text{normal},\ \text{normal}],\ \text{slight},\ \text{slight}\ \big]^T = \big[\ [0.0,\ 0.0,\ 0.2,\ 0.2],\ [0.5,\ 0.5],\ 0.25,\ 0.25\ \big]^T$$
$$A_2 = \big[\ [\text{moderate},\ \text{moderate}],\ [\text{high},\ \text{severe}],\ \text{slight},\ \text{none}\ \big]^T = \big[\ [0.4,\ 0.4],\ [0.7,\ 0.9],\ 0.25,\ 0\ \big]^T$$
Normally, fever values (temperature) are expressed numerically, e.g. as 38.9 degrees; they can
also be converted to linguistic values by considering contextual information such as the
different normal body temperatures of adults and children.
Note that the structures that occur in real-world data differ. For patient 2, there
are only two measurements of fever. The structure of the fuzzy signature carries some
information in the associated vector component. An aggregation method can compare
components regardless of the different numbers of sub-components. Aggregation
methods should be designed for the vectors with the assistance of domain experts. Here, we
assume that the time of day of an examination is less significant and that the highest
temperature value is more important. The two signatures are reduced to:
$$A_{1f} = \big[\ 0.2,\ [0.5,\ 0.5],\ 0.25,\ 0.25\ \big]^T$$
$$A_{2f} = \big[\ 0.4,\ [0.6,\ 0.8],\ 0.25,\ 0\ \big]^T$$
Now, the component “fever” can be rewritten linguistically as e.g. “slight”, “moderate”.
The signatures above still contain information to describe the “worst case fever” of each
patient, although information of the daily tendency is lost. We can continue our processes
further and finally, get an overall “abnormal condition” measure:
$$A_{1o} = 0.25, \quad A_{2o} = 0.4$$
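The "worst case fever" reduction can be sketched in Python with the patient signatures held as nested structures. This is a minimal illustration, not the report's implementation: the dictionary layout and the function name are hypothetical, the second cough value of patient 1 is an assumption, and only the fever component is aggregated (with the maximum).

```python
# Patient signatures; sub-vectors may have different lengths.
# The second cough value of patient 1 (0.5) is an assumption.
A1 = {"fever": [0.0, 0.0, 0.2, 0.2], "cough": [0.5, 0.5], "nausea": 0.25, "sore": 0.25}
A2 = {"fever": [0.4, 0.4], "cough": [0.7, 0.9], "nausea": 0.25, "sore": 0.0}

def reduce_component(signature, name, aggregate=max):
    """Return a copy of the signature with one sub-vector replaced by its aggregate."""
    reduced = dict(signature)
    value = reduced[name]
    if isinstance(value, list):
        reduced[name] = aggregate(value)
    return reduced

A1f = reduce_component(A1, "fever")
A2f = reduce_component(A2, "fever")
print(A1f["fever"], A2f["fever"])  # 0.2 0.4
```

Because the aggregation only looks at the values inside the sub-vector, four fever measurements and two fever measurements are handled in the same way.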
Notes: The aggregation methods used across different symptoms differ from those used within
the signature of a single symptom. In principle, an expert system can be used to weight each
symptom; many artificial intelligence techniques can also be used in this application.
This example just shows how to convert patients’ data into individual fuzzy signatures.
Then, by using fuzzy operations and aggregation, the fuzzy signature can produce a
value indicating the degree of “abnormal condition”. The main advantage of
fuzzy signatures is that they can model more vague information, and the symptoms recorded
for different patients are allowed to differ. Another advantage is that the structure of a
fuzzy signature is flexible, allowing new fuzzy signatures to be inserted without
prior structure design (Wong et al. 2003).
7. Automatic Method to Construct Fuzzy Signature
We have shown how to manually construct a fuzzy signature application for SARS
pre-clinical diagnosis, building its internal structure from the patients’ symptoms.
In a large data set, however, the sub-structure of a hierarchical structure may be
hidden. As discussed, the subspace Z0 is used to select the most appropriate sub-rule
base for deducing the output. Generally speaking, the more separable the elements in Π
are, the easier sub-rule base selection is. The proper subspace can therefore be chosen
by ranking subspaces according to their capability to separate components. The problem,
however, is that finding Π and finding Z0 depend on each other. Sugeno and Yasukawa (1991)
introduced a solution for sparse rule-base generation. The SY solution clusters the output
data samples and induces the rules by projecting the output clusters onto the input
domains. However, it only produces the rules necessary for the input-output sample data.
Projection-based fuzzy rule extraction (PB), which extends the SY approach, aims to
automatically construct a fuzzy rule base from a set of input-output sample data.
Before introducing PB, we first discuss fuzzy clustering and a fuzzy clustering
method called fuzzy c-Means.
8. Fuzzy Clustering
In the previous section, we mentioned that a clustering algorithm is used to cluster the
output data samples. The main requirement of a reasonable cluster is that each of its
elements can be modeled by a rule base with local validity (Wong et al. 2003). Its aim can
therefore be regarded as finding a subspace that contains homogeneous data. Once such a
subspace is found, it can be used to select an appropriate sub-rule base to infer the
output for a given input.
Given data $X = \{x_1, x_2, x_3, \ldots, x_n\}$, the task is to classify the data points in X into k groups
($n > k \ge 2$) such that highly related points fall in the same group and highly
unrelated points fall in different groups. Traditional mathematical classification
assigns each datum ‘strictly’ to one group. This is called Hard Clustering. However, most
problems in real life are uncertain, fuzzy problems. Fuzzy clustering allows a
value to belong to different groups, and so better preserves the features of that
value. On the other hand, when feature selection methods are used to seek a
‘reasonable’ subspace for clustering, the algorithm works more effectively; however, most
of these techniques, such as c-Means, need to know the number of clusters.
8.1. Fuzzy C-Means
Bezdek introduced the Fuzzy C-Means clustering method (FCM) in 1981, extending the Hard
C-Means clustering method (Dunn, 1974). Under some conditions, the convergence of FCM is
better than that of Maximum Likelihood (ML) (Huggins, 1983). A suite of FCM
algorithms published by Cannon et al. is widely used in research on clustering analysis
applications. The fuzzy clustering algorithm published by Bezdek amends Dunn’s
c-Means, but it still has some flaws: for example, weights are not considered and it is
mainly used for static data. In 1978, Roubens introduced a new objective function:
$$\sum_{v=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^2\, u_{jv}^2\, d(i,j)$$
however, its convergence is not very good. In 1981, Leonard et al. published an amended
objective function, which can improve Roubens’ objective function (Cheng, 1991):
$$\sum_{v=1}^{k} \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} u_{iv}^2\, u_{jv}^2\, d(i,j)}{2 \sum_{j=1}^{n} u_{jv}^2}$$
In this case, many symptoms affect each other from a medical viewpoint, and the different
physical characteristics of individuals also affect the implicit meaning of symptom
measurements. We therefore adopt fuzzy clustering instead of classical clustering.
Furthermore, fuzzy clustering preserves the information in the original data,
so that it can be used for further research.
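Let $\mu = (\mu_1, \mu_2, \ldots, \mu_c)$ be a fuzzy c-partition, represented by the matrix
$$U_{c \times n} = \begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{c1} & \mu_{c2} & \cdots & \mu_{cn} \end{bmatrix}$$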
Dunn defined a fuzzy objective function:
$$J_D(U,V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^2\, \|x_j - v_i\|^2$$
where $v_i$ is the cluster center of the i-th set.
Then, Bezdek (1981) extended it to:
$$J_m(U,V;X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m\, \|x_j - v_i\|^2, \quad 1 \le m < \infty$$
where $v_i$ is again the cluster center of the i-th set, $\|x_j - v_i\|^2$ represents the
deviation of datum $x_j$ from $v_i$, and the number m governs the influence of the
membership grades.
To obtain the minimizing pair (U, V):
$$\min \Big\{ J_m(U,V;X) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^m\, \|x_j - v_i\|^2 \Big\}$$
the following two conditions must be satisfied:
$$v_i = \frac{\sum_{j=1}^{n} (\mu_{ij})^m x_j}{\sum_{j=1}^{n} (\mu_{ij})^m}, \quad 1 \le i \le c$$
and
$$\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\|x_j - v_i\|^2}{\|x_j - v_k\|^2} \right)^{\frac{1}{m-1}} \right]^{-1}, \quad 1 \le i \le c,\ 1 \le j \le n$$
The FCM algorithm has a limitation: it needs to know the number of clusters. Some
work has been done on how to find an optimal number of clusters; this is referred to as
the cluster validity problem. One cluster validity index was proposed by Fukuyama and Sugeno
(FS):
$$S(c) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^m \left( \|x_j - v_i\|^2 - \|v_i - \bar{x}\|^2 \right), \quad 2 \le c < n$$
where $\bar{x}$ is the mean of the data. The optimal number of clusters can therefore be found
by minimizing the distance between the data and their cluster centres while maximizing the
distance between data in different clusters.
Iteration of the FCM algorithm stops when the error is below a defined tolerance or when its
improvement over the previous iteration is below a certain threshold.
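The two update conditions and the FS index can be put together in a small pure-Python sketch. It is an illustration under stated assumptions rather than a reference implementation: the function names, the random initialisation, the default m = 2, the stopping rule (largest change in any membership) and the toy data are all choices made here, and m must be greater than 1.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def fuzzy_c_means(points, c, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Alternate the centre and membership updates until the memberships settle."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition matrix u[i][j]; each column is normalised to sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        col = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= col
    centres = []
    for _ in range(max_iter):
        # Centre update: mean of the data weighted by mu_ij ** m.
        centres = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            centres.append(tuple(
                sum(w[j] * points[j][d] for j in range(n)) / total
                for d in range(dim)))
        # Membership update from ratios of squared distances.
        change = 0.0
        for j in range(n):
            d2 = [sq_dist(points[j], v) for v in centres]
            zeros = [i for i in range(c) if d2[i] == 0.0]
            for i in range(c):
                if zeros:  # the point coincides with one or more centres
                    new = 1.0 / len(zeros) if i in zeros else 0.0
                else:
                    new = 1.0 / sum((d2[i] / d2[k]) ** (1.0 / (m - 1.0))
                                    for k in range(c))
                change = max(change, abs(new - u[i][j]))
                u[i][j] = new
        if change < tol:
            break
    return u, centres

def fs_index(points, u, centres, m=2.0):
    """Fukuyama-Sugeno validity index; smaller values indicate a better partition."""
    n, dim = len(points), len(points[0])
    mean = tuple(sum(p[d] for p in points) / n for d in range(dim))
    return sum(u[i][j] ** m * (sq_dist(points[j], centres[i]) - sq_dist(centres[i], mean))
               for i in range(len(centres)) for j in range(n))

# Toy one-dimensional data with two obvious groups (illustrative only).
data = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
u, centres = fuzzy_c_means(data, c=2)
print(sorted(round(v[0], 2) for v in centres))
print(round(fs_index(data, u, centres), 3))
```

To choose the number of clusters, the same call can be repeated for several values of c and the value with the smallest FS index kept.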
9. Projection Based Rule Extraction
Briefly, this technique can be implemented following these steps:
1. Perform c-Means to cluster data along output space. The FS index of Fuzzy c-Means
can be used to get an optimal number of clusters.
2. For each fuzzy output cluster, all points contained in the cluster are projected back to
input dimensions.
3. The projected points in each dimension are clustered again. In this procedure, the FS
index is used in conjunction with the merging index. This process will produce
multiple fuzzy clusters in each dimension.
4. Each of the clusters in the input dimension is a projection of the multi-dimensional
input cluster to that input dimension. Then, the clusters from the individual
dimensions are combined to form the multi-dimensional input cluster.
5. For each of the multi-dimensional clusters identified, a rule can be created.
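The projection idea in steps 2, 4 and 5 can be sketched in Python under strong simplifications, stated here as assumptions: each sample already carries a hard output-cluster label (in PB the output clusters are fuzzy), the re-clustering and merging of step 3 are skipped so each output cluster yields one cluster per input dimension, and each projected cluster is summarised by a plain interval. The names and data are hypothetical.

```python
def project_clusters(inputs, labels):
    """Group input vectors by output-cluster label and project onto each dimension."""
    projected = {}
    for vector, label in zip(inputs, labels):
        dims = projected.setdefault(label, [[] for _ in vector])
        for d, value in enumerate(vector):
            dims[d].append(value)
    return projected

def interval_rules(projected):
    """Summarise every projected cluster by its (min, max) interval per dimension."""
    return {label: [(min(values), max(values)) for values in dims]
            for label, dims in projected.items()}

# Toy samples: two inputs per sample, with assumed output-cluster labels.
inputs = [(0.1, 5.0), (0.2, 5.5), (0.9, 1.0), (1.0, 1.5)]
labels = ["low", "low", "high", "high"]

rules = interval_rules(project_clusters(inputs, labels))
for label, intervals in rules.items():
    antecedent = " AND ".join(f"x{d + 1} in [{lo}, {hi}]"
                              for d, (lo, hi) in enumerate(intervals))
    print(f"IF {antecedent} THEN y is {label}")
```

Combining the per-dimension intervals of one output cluster gives one multi-dimensional input cluster, and therefore one rule per output cluster in this reduced setting.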
10. Conclusion
In this report, I have described the basic concept of fuzzy sets and what fuzzy control is.
To address rule explosion, I introduced the concept of the fuzzy signature and showed how
to construct one manually. The hierarchical fuzzy signature structure
presented can use feature selection and interclass separability to reduce
complexity. A SARS pre-clinical diagnosis model was constructed using fuzzy
signatures to show their flexibility. In addition, a data mining
algorithm, c-Means, and its role in fuzzy signatures were introduced. PB provides a
method to construct rule sets automatically. With the assistance of relevant theory,
applications generated from or adapted to fuzzy logic will become
widely used and will provide more opportunities for modeling conditions that are
inherently imprecisely defined.
References
Bezdek, J.C. (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. New
York: Plenum Press.
Chong, A., Gedeon, T.D. and Kóczy, L.T. (2002) “Projection Based Method for Sparse Fuzzy
System Generation,” Proceedings of 2nd WSEAS Int. Conf. on Scientific Computation
and Soft Computing, pp. 321-325.
Chong, A., Gedeon, T.D. and Kóczy, L.T. (2003) “Feature selection and subspace clustering for
hierarchical fuzzy rule extraction,” CIMSA, Paris.
Gedeon, T.D. (1999) “Clustering Significant Words using their Co-occurrence in
Document Sub-Collections,” Proceedings 7th European Congress on Intelligent
Techniques and Soft Computing (EUFIT’99), Aachen, pp. 302-306.
Gedeon, T.D., Kóczy, L.T., Wong, K.W. and Liu, P. (2001) “Effective Fuzzy Systems for
Complex Structured Data,” Proceedings of IASTED International Conference Control
and Applications (CA 2001), pp. 184-187.
Goguen, J.A. (1967) “L-fuzzy sets,” J. Math. Anal. Appl. 18, pp. 145-174.
Kóczy, L.T. and Hirota, K. (1993) “Approximate reasoning by linear rule interpolation
and general approximation,” Int. J. Approx. Reason., Vol. 9, pp. 197-223.
Muresan, L. “Interpolation in Hierarchical Fuzzy Rule Bases,” Technical University of Budapest,
Hungary.
Sugeno, M., Murofushi, T., Nishino, J. and Miwa, H. (1991) “Helicopter flight control
based on fuzzy logic,” Proceedings of Fuzzy Engineering toward Human Friendly
System’91, pp. 1120-1124.
Wang, L.X. and Mendel, J.M. (1992) “Fuzzy basis functions, universal approximation, and
orthogonal least squares learning,” IEEE Trans. on Neural Networks, (5), pp. 807-814.
Wong, K.W., Chong, A., Gedeon, T.D., Kóczy, L.T. and Vamos, T. (2003)
“Hierarchical Fuzzy Signatures Structure for Complex Structured Data,” Proceedings of
International Symposium on Computational Intelligence and Intelligent Informatics
2003 (ISCIII’03), Nabeul, Tunisia, pp. 105-109.
Appendix
Plan for future work
Extend literature survey:
1. To find good methods to optimize weights.
2. To find different aggregation methods for the fuzzy signatures.
Research work will focus on:
1. Finding an appropriate artificial intelligence algorithm to fine tune the weights of
the fuzzy signatures. I will use artificial intelligence, because these approaches are
used widely in automatically constructing systems based on feedback and
self-learning algorithms.
2. Use of different aggregation methods.
Experimental work:
1. Real data will be used to train the selected method of artificial intelligence.
2. To verify the final result deduced by the artificial intelligence method.