On a Similarity Measure between LR-Type
Fuzzy Numbers and Its Application to
Database Acquisition
Miin-Shen Yang,1,* Wen-Liang Hung,2,† Shou-Jen Chang-Chien1,‡
1 Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taiwan 32023, ROC
2 Department of Mathematics Education, National Hsinchu Teachers College, Hsin-Chu, Taiwan, ROC
This article presents a new similarity measure for LR-type fuzzy numbers. The proposed similarity measure is based on a defined metric between LR-type fuzzy numbers. It is known that an
exponential operation is highly useful in dealing with the classical Shannon entropy and cluster
analysis. We adopted, therefore, the exponential operation on this metric. Furthermore, we analyze its properties and make numerical comparisons to several similarity measures. The results
show that the proposed similarity measure can overcome the drawbacks of the existing similarity measures. We then apply it to compound attributes for handling null queries to database
systems. These applications can also be widely used in fuzzy queries to databases. © 2005
Wiley Periodicals, Inc.
1. INTRODUCTION
Real-world situations are very often uncertain. Probability has traditionally been used to model uncertainty, whereas fuzziness has been widely used to handle another type of uncertainty. Fuzzy data and fuzzy representations commonly exist in real-world systems. Among these fuzzy representations, LR-type fuzzy numbers are the most widely used, for example in linguistics, decision making, knowledge representation, medical diagnosis, control systems, and databases.
A similarity measure is important for presenting a degree of similarity between two objects or concepts. Traditionally, similarity measures were used mostly in cluster analysis, such as hierarchical clustering (see Ref. 1). Since Zadeh (Ref. 2) proposed fuzzy sets, many different similarity measures between fuzzy sets have been proposed and applied in various areas (see Refs. 3–7). However, most of them are defined for fuzzy values (i.e., discrete fuzzy sets).
*Author to whom all correspondence should be addressed: e-mail: msyang@math.cycu.edu.tw.
†e-mail: wlhung@mail.nhctc.edu.tw.
‡e-mail: chang-chien0102@yahoo.com.tw.
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 20, 1001–1016 (2005)
© 2005 Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/int.20102
LR-type fuzzy (trapezoidal) numbers are commonly used to represent real (interval) numbers in a fuzzy environment for analyzing fuzziness and fuzzy data, with applications in linguistics, control, database systems, and so forth. However, few similarity measures have been defined for LR-type fuzzy numbers. Recently, Chen and Chen (Refs. 8 and 9) proposed a similarity measure between generalized fuzzy numbers. They used the center of gravity to define a similarity for generalized fuzzy numbers, which is complicated and not useful for general LR-types. It is known that an exponential operation is highly useful in dealing with the classical Shannon entropy (see Refs. 10 and 11) and cluster analysis (see Ref. 12). In this article, we therefore adopt the exponential operation on a defined metric between LR-type fuzzy numbers and propose a new similarity measure. We compare it to the measures of Chen and Chen (Refs. 8 and 9), Chen (Ref. 13), Lee (Ref. 14), and Hsieh and Chen (Ref. 15) and apply it in fuzzy queries to databases. In Section 2, we present the proposed similarity measure and its properties. Section 3 gives numerical comparisons and examples. In Section 4, we apply it to database queries by generating compound attributes for handling null queries. The results are compared to those of Wang and Tsai (Ref. 16). A real data set from a health physical fitness norm report in Taiwan is used and analyzed using the proposed method. We present our conclusions in Section 5.
2. PROPOSED SIMILARITY MEASURE AND ITS PROPERTIES
A fuzzy number $F$ is defined as a convex normalized fuzzy set of real numbers $\Re$ with membership $\mu_F(x)$ piecewise continuous. Let $L$ (and $R$) both be decreasing functions from all positive real numbers $\Re^+$ to the interval $[0,1]$ with $L(0) = 1$, $L(x) < 1$ for $x > 0$, $L(1) = 0$, and $L(x) > 0$ for $x < 1$ (and the same for $R$). A fuzzy number $X$ is called LR-type if there are real numbers $m$, $\alpha > 0$, $\beta > 0$ with

$$\mu_X(x) = \begin{cases} L\left(\dfrac{m - x}{\alpha}\right), & x \le m \\[2mm] R\left(\dfrac{x - m}{\beta}\right), & x \ge m \end{cases}$$

where $m$ is called the mean value of $X$ and $\alpha$ and $\beta$ are called the left and right spreads. We then symbolically denote $X$ by $X = (m, \alpha, \beta)_{LR}$. Let $X$ and $Y$ be two LR-type fuzzy numbers with $X = (m_x, \alpha_x, \beta_x)_{LR}$ and $Y = (m_y, \alpha_y, \beta_y)_{LR}$. Using the extension principle, we have that $X + Y = (m_x + m_y, \alpha_x + \alpha_y, \beta_x + \beta_y)_{LR}$, $-Y = (-m_y, \beta_y, \alpha_y)_{RL}$, $(m_x, \alpha_x, \beta_x)_{LR} - (m_y, \beta_y, \alpha_y)_{RL} = (m_x - m_y, \alpha_x + \alpha_y, \beta_x + \beta_y)_{LR}$, and $\lambda X = (\lambda m_x, \lambda\alpha_x, \lambda\beta_x)_{LR}$ when $\lambda > 0$ and $\lambda X = (\lambda m_x, -\lambda\beta_x, -\lambda\alpha_x)_{RL}$ when $\lambda < 0$ (see Ref. 17).
For an LR-type fuzzy number $M = (m, \alpha, \beta)_{LR}$, if $L$ and $R$ are of the form

$$T(x) = \begin{cases} 1 - x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

then $M$ is called a triangular fuzzy number and denoted by $M = (m, \alpha, \beta)_T$. A fuzzy number $X$ is called an LR-type trapezoidal fuzzy number if there are real numbers $m_1$, $m_2$, $\alpha > 0$, $\beta > 0$ with

$$\mu_X(x) = \begin{cases} L\left(\dfrac{m_1 - x}{\alpha}\right), & x \le m_1 \\[2mm] 1, & m_1 \le x \le m_2 \\[1mm] R\left(\dfrac{x - m_2}{\beta}\right), & x \ge m_2 \end{cases}$$

where $m_1$ and $m_2$ are called mean values of $X$ and $\alpha$, $\beta$ are called the left and right spreads of $X$. Symbolically, $X$ is denoted by $X = (m_1, m_2, \alpha, \beta)_{LR}$. LR-type fuzzy numbers are used to represent real numbers in a fuzzy environment, and trapezoidal fuzzy numbers are used to represent fuzzy intervals; both are widely applied in linguistics, knowledge representation, control systems, databases, and so forth.
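As an illustration of the trapezoidal definition above, the following minimal Python sketch evaluates the membership of an LR-type trapezoidal fuzzy number. The function names `triangular_shape` and `trapezoidal_membership`, the default choice $L = R = T$, and the treatment of a zero spread as a crisp edge are assumptions made for this sketch; they are not taken from the article.

```python
from typing import Callable


def triangular_shape(x: float) -> float:
    """Reference function T(x) = 1 - x on [0, 1] and 0 otherwise."""
    return 1.0 - x if 0.0 <= x <= 1.0 else 0.0


def trapezoidal_membership(
    x: float,
    m1: float,
    m2: float,
    alpha: float,
    beta: float,
    L: Callable[[float], float] = triangular_shape,
    R: Callable[[float], float] = triangular_shape,
) -> float:
    """Membership of x in X = (m1, m2, alpha, beta)_LR.

    Setting m1 == m2 gives the plain LR-type number (m, alpha, beta)_LR.
    A zero spread is treated as a crisp edge (assumption of this sketch).
    """
    if x < m1:
        return L((m1 - x) / alpha) if alpha > 0 else 0.0
    if x > m2:
        return R((x - m2) / beta) if beta > 0 else 0.0
    return 1.0


# Hypothetical trapezoid with core [0.3, 0.5] and spreads 0.1 and 0.2.
for point in (0.0, 0.25, 0.4, 0.6):
    print(point, trapezoidal_membership(point, 0.3, 0.5, 0.1, 0.2))
```

With these inputs the membership is 1 on the core, falls linearly across each spread, and is 0 outside the support.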
Remark 1. If we take $L$ and $R$ to be of the form

$$T(x) = \begin{cases} w(1 - x), & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

where $0 < w \le 1$, then

$$\mu_X(x) = \begin{cases} w\left(1 - \dfrac{m_1 - x}{\alpha}\right) & \text{for } x \le m_1 \\[2mm] w & \text{for } m_1 \le x \le m_2 \\[1mm] w\left(1 - \dfrac{x - m_2}{\beta}\right) & \text{for } x \ge m_2 \end{cases}$$

The fuzzy number $X$ is called a generalized trapezoidal fuzzy number by Chen (Refs. 18 and 19). We denote $X = (m_1, m_2, \alpha, \beta; w)_{GT}$. If $w = 1$, then the generalized trapezoidal fuzzy number is called a (normal) trapezoidal fuzzy number.
A similarity measure is an important tool for presenting a degree of similarity between two objects. There are many similarity measures defined for (discrete) fuzzy sets on a finite set. Zwick et al. (Ref. 3) first used the geometric distance and the Hausdorff metric to define similarity measures among fuzzy sets. Pappis and Karacapilidis (Ref. 5) proposed three kinds of similarity measures for discrete fuzzy sets. Many other similarities for discrete fuzzy sets and their applications have been proposed (see Refs. 4, 16, and 20, among others). However, few similarities have been defined for LR-type fuzzy numbers. Recently, Chen and Chen (Refs. 8 and 9) proposed a similarity between fuzzy numbers. They used the center of gravity (COG) to define the degree of similarity between trapezoidal or triangular fuzzy numbers. However, their definition seems to be too complicated and difficult to interpret, and their methods cannot be used for LR-type fuzzy numbers. In this section, we propose a simple way to define a new similarity for LR-type fuzzy numbers.
Consider two LR-type fuzzy numbers $A = (m_a, \alpha_a, \beta_a)_{LR}$ and $B = (m_b, \alpha_b, \beta_b)_{LR}$. Yang and Ko (Refs. 21 and 22) defined a metric $d_{LR}$ for $A$ and $B$ with

$$d_{LR}^2(A, B) = (m_a - m_b)^2 + \big((m_a - \ell\alpha_a) - (m_b - \ell\alpha_b)\big)^2 + \big((m_a + r\beta_a) - (m_b + r\beta_b)\big)^2$$

where $\ell = \int_0^1 L^{-1}(w)\,dw$ and $r = \int_0^1 R^{-1}(w)\,dw$ if $L^{-1}$ and $R^{-1}$ are integrable over the interval $[0,1]$. To define $d_{LR}$ for two LR-type trapezoidal fuzzy numbers $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a)_{LR}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b)_{LR}$, we extend $d_{LR}$ to

$$d_{LR}^2(A, B) = \big((m_{1a} - m_{1b})^2 + (m_{2a} - m_{2b})^2\big)/2 + \big((m_{1a} - \ell\alpha_a) - (m_{1b} - \ell\alpha_b)\big)^2 + \big((m_{2a} + r\beta_a) - (m_{2b} + r\beta_b)\big)^2$$
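The two forms of the metric can be sketched in a few lines of Python. This is only an illustrative sketch: the names `shape_integral`, `d2_triangular`, and `d2_trapezoidal` are hypothetical, fuzzy numbers are passed as plain tuples, and the defaults $\ell = r = 1/2$ assume the triangular reference function $T(x) = 1 - x$, whose inverse is $T^{-1}(w) = 1 - w$.

```python
from typing import Callable, Sequence


def shape_integral(inverse: Callable[[float], float], n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of `inverse` over [0, 1]."""
    h = 1.0 / n
    return sum(inverse((k + 0.5) * h) for k in range(n)) * h


def d2_triangular(a: Sequence[float], b: Sequence[float],
                  ell: float = 0.5, r: float = 0.5) -> float:
    """Squared metric for A = (m, alpha, beta) and B = (m, alpha, beta)."""
    ma, aa, ba = a
    mb, ab, bb = b
    return ((ma - mb) ** 2
            + ((ma - ell * aa) - (mb - ell * ab)) ** 2
            + ((ma + r * ba) - (mb + r * bb)) ** 2)


def d2_trapezoidal(a: Sequence[float], b: Sequence[float],
                   ell: float = 0.5, r: float = 0.5) -> float:
    """Squared metric for A = (m1, m2, alpha, beta) and B likewise."""
    m1a, m2a, aa, ba = a
    m1b, m2b, ab, bb = b
    return (((m1a - m1b) ** 2 + (m2a - m2b) ** 2) / 2
            + ((m1a - ell * aa) - (m1b - ell * ab)) ** 2
            + ((m2a + r * ba) - (m2b + r * bb)) ** 2)


# For T(x) = 1 - x the inverse is 1 - w, so ell = r = 1/2.
ell = shape_integral(lambda w: 1.0 - w)
print(ell)
print(d2_triangular((0.0, 0.0, 5 / 12), (0.0, 0.0, 0.5), ell, ell))
```

Passing `ell` and `r` explicitly keeps the same functions usable for other reference functions once their inverse integrals are known.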
It is known that an exponential operation is highly useful in dealing with the classical Shannon entropy (see Refs. 10 and 11) and cluster analysis (see Ref. 12). We therefore adopted the exponential operation on the metric $d_{LR}$. The proposed similarity measure $S(A, B)$ between two LR-type (trapezoidal) fuzzy numbers $A$ and $B$ is defined as

$$S(A, B) = \begin{cases} 1 & \text{if } A = B \\ \exp\left(-d_{LR}^2(A, B)/s\right) & \text{if } A \ne B \end{cases}$$

where $s$ is a constant bigger than 0 chosen as follows:
(i) If $A = (m_a, \alpha_a, \beta_a)_{LR}$ and $B = (m_b, \alpha_b, \beta_b)_{LR}$, then

$$s = (D_* + D^*)/2 + \big(|(m_a - \alpha_a) - (m_b - \alpha_b)| + |(m_a + \beta_a) - (m_b + \beta_b)|\big)/2^3$$

where $D_* = |(m_a - \ell\alpha_a) - (m_b - \ell\alpha_b)|$, $D^* = |(m_a + r\beta_a) - (m_b + r\beta_b)|$.
(ii) If $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a)_{LR}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b)_{LR}$, then

$$s = (D_* + D^*)/2 + \big(|(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)| + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|\big)/2^4$$

where $D_* = |(m_{1a} - \ell\alpha_a) - (m_{1b} - \ell\alpha_b)|$, $D^* = |(m_{2a} + r\beta_a) - (m_{2b} + r\beta_b)|$.
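A minimal Python sketch of the definition follows, assuming triangular reference functions so that $\ell = r = 1/2$. The function names and the tuple representation are hypothetical. The two sample calls use numbers that appear later in the article: the compound tuple (Short, Light) against Small from Tables III and IV, and the linguistic term Medium against the total risk $R_1$ from Table II.

```python
import math
from typing import Sequence


def similarity_triangular(a: Sequence[float], b: Sequence[float],
                          ell: float = 0.5, r: float = 0.5) -> float:
    """S(A, B) for A = (m, alpha, beta), case (i) of the definition."""
    if tuple(a) == tuple(b):
        return 1.0
    ma, aa, ba = a
    mb, ab, bb = b
    d_low = abs((ma - ell * aa) - (mb - ell * ab))   # D_*
    d_up = abs((ma + r * ba) - (mb + r * bb))        # D^*
    d2 = (ma - mb) ** 2 + d_low ** 2 + d_up ** 2
    s = (d_low + d_up) / 2 + (abs((ma - aa) - (mb - ab))
                              + abs((ma + ba) - (mb + bb))) / 2 ** 3
    return math.exp(-d2 / s)


def similarity_trapezoidal(a: Sequence[float], b: Sequence[float],
                           ell: float = 0.5, r: float = 0.5) -> float:
    """S(A, B) for A = (m1, m2, alpha, beta), case (ii) of the definition."""
    if tuple(a) == tuple(b):
        return 1.0
    m1a, m2a, aa, ba = a
    m1b, m2b, ab, bb = b
    d_low = abs((m1a - ell * aa) - (m1b - ell * ab))  # D_*
    d_up = abs((m2a + r * ba) - (m2b + r * bb))       # D^*
    d2 = ((m1a - m1b) ** 2 + (m2a - m2b) ** 2) / 2 + d_low ** 2 + d_up ** 2
    s = (d_low + d_up) / 2 + (abs((m1a - aa) - (m1b - ab))
                              + abs((m2a + ba) - (m2b + bb))) / 2 ** 4
    return math.exp(-d2 / s)


# (Short, Light) = (0, 0, 5/12)_T against Small = (0, 0, 0.5)_T.
print(round(similarity_triangular((0, 0, 5 / 12), (0, 0, 0.5)), 3))   # 0.946
# Medium = (0.41, 0.58, 0.09, 0.07) against R1 from Table II.
print(round(similarity_trapezoidal((0.41, 0.58, 0.09, 0.07),
                                   (0.2683, 0.7052, 0.1069, 0.3898)), 4))  # 0.6209
```

Both printed values agree with the entries reported in Tables IV and II, which is a useful check on the reading of the constant $s$.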
In general, most similarity measures $S(A, B)$ satisfy that, if $A = B$, then $S(A, B) = 1$. However, some of them may not satisfy the converse, that is, if $S(A, B) = 1$, then $A = B$. The proposed similarity has the property that $A = B$ if and only if $S(A, B) = 1$. It also has the property $S(A, B) = S(B, A)$.
Property 1. For any two LR-type fuzzy (trapezoidal) numbers $A$ and $B$, we have $A = B$ if and only if $S(A, B) = 1$.
Proof.
(i) If $A = B$, then $S(A, B) = 1$ by the definition.
(ii) Let $A = (m_a, \alpha_a, \beta_a)_{LR}$ and $B = (m_b, \alpha_b, \beta_b)_{LR}$ with $A \ne B$. Thus,

$$s = \big(|(m_a - \ell\alpha_a) - (m_b - \ell\alpha_b)| + |(m_a + r\beta_a) - (m_b + r\beta_b)|\big)/2 + \big(|(m_a - \alpha_a) - (m_b - \alpha_b)| + |(m_a + \beta_a) - (m_b + \beta_b)|\big)/2^3$$

It implies that

$$\begin{aligned} s = 0 &\Rightarrow m_a - \ell\alpha_a = m_b - \ell\alpha_b,\ m_a + r\beta_a = m_b + r\beta_b,\ m_a - \alpha_a = m_b - \alpha_b,\ m_a + \beta_a = m_b + \beta_b \\ &\Rightarrow (1 - \ell)\alpha_a = (1 - \ell)\alpha_b,\ (r - 1)\beta_a = (r - 1)\beta_b,\ m_a - \alpha_a = m_b - \alpha_b,\ m_a + \beta_a = m_b + \beta_b \\ &\Rightarrow m_a = m_b,\ \alpha_a = \alpha_b,\ \beta_a = \beta_b \quad \text{if } \ell \ne 1 \text{ and } r \ne 1 \\ &\Rightarrow A = B \end{aligned}$$

Therefore, if $A \ne B$, then $s \ne 0$ and $d_{LR}^2(A, B) \ne 0$. Using the definition

$$S(A, B) = \begin{cases} 1 & \text{if } A = B \\ \exp\left(-d_{LR}^2(A, B)/s\right) & \text{if } A \ne B \end{cases}$$

we have that if $A \ne B$, then $S(A, B) \ne 1$. That means, if $S(A, B) = 1$, then $A = B$. Similarly, the proof is the same for $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a)_{LR}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b)_{LR}$. $\blacksquare$
Property 2. If $A$ and $B$ are two LR-type fuzzy numbers, then $S(A, B) = S(B, A)$.
Proof.
(i) If $S(A, B) = 1$, then $A = B$ (by Property 1). That is, $S(A, B) = S(B, A)$.
(ii) If $S(A, B) \ne 1$, then $A \ne B$. We then have $s \ne 0$ and $d_{LR}^2(A, B) \ne 0$. It is clear that $d_{LR}^2(A, B) = d_{LR}^2(B, A)$. Thus, $S(A, B) = S(B, A)$. $\blacksquare$
3. NUMERICAL COMPARISONS
In this section, we make some numerical comparisons of the proposed similarity with the other similarities proposed by Chen and Chen (Refs. 8 and 9), Chen (Ref. 13), Lee (Ref. 14), and Hsieh and Chen (Ref. 15).
Let $A$ and $B$ be two triangular fuzzy numbers where $A = (m_a, \alpha_a, \beta_a)_T$ and $B = (m_b, \alpha_b, \beta_b)_T$ with $m_a - \alpha_a \ge 0$, $m_a + \beta_a \le 1$, and $m_b - \alpha_b \ge 0$, $m_b + \beta_b \le 1$. Chen (Ref. 13) defined the degree of similarity $S_C(A, B)$ between $A$ and $B$ as follows:

$$S_C(A, B) = 1 - \frac{\|A - B\|}{3}$$

where

$$\|A - B\| = |(m_a - \alpha_a) - (m_b - \alpha_b)| + |m_a - m_b| + |(m_a + \beta_a) - (m_b + \beta_b)|$$

If $A$ and $B$ are two trapezoidal fuzzy numbers where $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; 1)_{GT}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; 1)_{GT}$ with $m_{1a} - \alpha_a \ge 0$, $m_{2a} + \beta_a \le 1$, and $m_{1b} - \alpha_b \ge 0$, $m_{2b} + \beta_b \le 1$, then the degree of similarity $S_C(A, B)$ between $A$ and $B$ can be calculated as follows:

$$S_C(A, B) = 1 - \frac{\|A - B\|}{4}$$

where

$$\|A - B\| = |(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)| + |m_{1a} - m_{1b}| + |m_{2a} - m_{2b}| + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|$$
Lee (Ref. 14) defined a similarity measure $S_L(A, B)$ for trapezoidal fuzzy numbers $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; 1)_{GT}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; 1)_{GT}$ as follows:

$$S_L(A, B) = 1 - \frac{\|A - B\|_{l_p}}{\|U\|} \times 4^{-1/p}$$

where

$$\|U\| = \max(U) - \min(U)$$

$$U = \{m_{1a} - \alpha_a,\ m_{1a},\ m_{2a},\ m_{2a} + \beta_a,\ m_{1b} - \alpha_b,\ m_{1b},\ m_{2b},\ m_{2b} + \beta_b\}$$

and

$$\|A - B\|_{l_p} = \big(|(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)|^p + |m_{1a} - m_{1b}|^p + |m_{2a} - m_{2b}|^p + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|^p\big)^{1/p}$$
Based on the idea of graded mean integration-representation distance, Hsieh and Chen (Ref. 15) proposed a similarity measure $S_{HC}(A, B)$ between two fuzzy numbers $A$ and $B$ as follows:

$$S_{HC}(A, B) = \frac{1}{1 + d(A, B)}$$

where $d(A, B) = |P(A) - P(B)|$ and $P(A)$ and $P(B)$ are the graded mean integration representations of $A$ and $B$, respectively. If $A = (m_a, \alpha_a, \beta_a)_T$ and $B = (m_b, \alpha_b, \beta_b)_T$, then

$$P(A) = \frac{-\alpha_a + 6m_a + \beta_a}{6} \quad \text{and} \quad P(B) = \frac{-\alpha_b + 6m_b + \beta_b}{6}$$

If $A$ and $B$ are trapezoidal fuzzy numbers, where $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; 1)_{GT}$, $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; 1)_{GT}$, then

$$P(A) = \frac{-\alpha_a + 3m_{1a} + 3m_{2a} + \beta_a}{6} \quad \text{and} \quad P(B) = \frac{-\alpha_b + 3m_{1b} + 3m_{2b} + \beta_b}{6}$$
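The three comparison measures for trapezoidal fuzzy numbers can be sketched together in Python. The function names are hypothetical, inputs are tuples $(m_1, m_2, \alpha, \beta)$ with $w = 1$, and the sample pair is an illustrative assumption (the article gives its twelve test sets only in Figure 1). The default $p = 1$ for Lee's measure is also an assumption of this sketch.

```python
from typing import Sequence, Tuple


def corners(t: Sequence[float]) -> Tuple[float, float, float, float]:
    """Corner points (m1 - alpha, m1, m2, m2 + beta) of a trapezoid."""
    m1, m2, alpha, beta = t
    return (m1 - alpha, m1, m2, m2 + beta)


def s_chen(a: Sequence[float], b: Sequence[float]) -> float:
    """Chen's S_C: one minus the mean absolute corner difference."""
    return 1 - sum(abs(x - y) for x, y in zip(corners(a), corners(b))) / 4


def s_lee(a: Sequence[float], b: Sequence[float], p: float = 1) -> float:
    """Lee's S_L; undefined when all eight corner points coincide."""
    ca, cb = corners(a), corners(b)
    universe = ca + cb
    width = max(universe) - min(universe)
    if width == 0:
        raise ValueError("S_L is undefined when ||U|| = 0")
    norm = sum(abs(x - y) ** p for x, y in zip(ca, cb)) ** (1 / p)
    return 1 - norm / width * 4 ** (-1 / p)


def graded_mean(t: Sequence[float]) -> float:
    """Graded mean integration representation P of a trapezoid."""
    m1, m2, alpha, beta = t
    return (-alpha + 3 * m1 + 3 * m2 + beta) / 6


def s_hsieh_chen(a: Sequence[float], b: Sequence[float]) -> float:
    """Hsieh and Chen's S_HC = 1 / (1 + |P(A) - P(B)|)."""
    return 1 / (1 + abs(graded_mean(a) - graded_mean(b)))


# Illustrative pair (assumed): corners (0.1, 0.2, 0.3, 0.4) and (0.4, 0.55, 0.55, 0.7).
A = (0.2, 0.3, 0.1, 0.1)
B = (0.55, 0.55, 0.15, 0.15)
print(round(s_chen(A, B), 4), round(s_lee(A, B), 4), round(s_hsieh_chen(A, B), 4))
# 0.7 0.5 0.7692
```

The `ValueError` branch makes explicit the case in which Lee's measure cannot be calculated, namely two identical crisp values.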
Recently, Chen and Chen (Refs. 8 and 9) used the simple center of gravity method (SCGM) to calculate the COG points of generalized trapezoidal fuzzy numbers and then to calculate the degree of similarity between two generalized trapezoidal fuzzy numbers. Let $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; w_a)_{GT}$, $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; w_b)_{GT}$ with $m_{1a} - \alpha_a \ge 0$, $m_{2a} + \beta_a \le 1$, and $m_{1b} - \alpha_b \ge 0$, $m_{2b} + \beta_b \le 1$. Using the SCGM, we can obtain the COG points $COG(A)$ and $COG(B)$ of $A$ and $B$, respectively, where $COG(A) = (x_A^*, y_A^*)$ and $COG(B) = (x_B^*, y_B^*)$ are defined as follows:

$$y_A^* = \begin{cases} \dfrac{w_A \times \left(\dfrac{m_{2a} - m_{1a}}{(m_{2a} + \beta_a) - (m_{1a} - \alpha_a)} + 2\right)}{6} & \text{if } m_{1a} - \alpha_a \ne m_{2a} + \beta_a \\[4mm] \dfrac{w_A}{2} & \text{if } m_{1a} - \alpha_a = m_{2a} + \beta_a \end{cases}$$

$$x_A^* = \frac{y_A^*(m_{2a} + m_{1a}) + (m_{2a} + \beta_a + m_{1a} - \alpha_a)(w_A - y_A^*)}{2w_A}$$

Similarly, $(x_B^*, y_B^*)$ is for $B$. Then, the degree of similarity $S_{CC}(A, B)$ between $A$ and $B$ can be calculated as follows:

$$S_{CC}(A, B) = \left(1 - \frac{\|A - B\|}{4}\right) \times (1 - |x_A^* - x_B^*|)^{B(S_A, S_B)} \times \frac{\min\{y_A^*, y_B^*\}}{\max\{y_A^*, y_B^*\}}$$

where $\|A - B\| = |(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)| + |m_{1a} - m_{1b}| + |m_{2a} - m_{2b}| + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|$. $B(S_A, S_B)$ was defined as follows:

$$B(S_A, S_B) = \begin{cases} 1 & \text{if } S_A + S_B > 0 \\ 0 & \text{if } S_A + S_B = 0 \end{cases}$$

where $S_A$ and $S_B$ are the lengths of the bases of $A$ and $B$, respectively, defined as:

$$S_A = (m_{2a} + \beta_a) - (m_{1a} - \alpha_a), \qquad S_B = (m_{2b} + \beta_b) - (m_{1b} - \alpha_b)$$
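To make the SCGM-based measure concrete, here is a hedged Python sketch of the COG point and of $S_{CC}$ as written above. The names `cog` and `s_cc` are hypothetical, inputs are tuples $(m_1, m_2, \alpha, \beta)$ plus a height $w$, and the sample pair is an illustrative assumption rather than data quoted from the article.

```python
from typing import Sequence, Tuple


def cog(t: Sequence[float], w: float = 1.0) -> Tuple[float, float]:
    """COG point (x*, y*) of a generalized trapezoid by the SCGM formulas."""
    m1, m2, alpha, beta = t
    left, right = m1 - alpha, m2 + beta
    if left != right:
        y = w * ((m2 - m1) / (right - left) + 2) / 6
    else:
        y = w / 2  # degenerate (crisp) case
    x = (y * (m2 + m1) + (right + left) * (w - y)) / (2 * w)
    return x, y


def s_cc(a: Sequence[float], b: Sequence[float],
         wa: float = 1.0, wb: float = 1.0) -> float:
    """Chen and Chen's S_CC for trapezoids given as (m1, m2, alpha, beta)."""
    xa, ya = cog(a, wa)
    xb, yb = cog(b, wb)
    ca = (a[0] - a[2], a[0], a[1], a[1] + a[3])
    cb = (b[0] - b[2], b[0], b[1], b[1] + b[3])
    distance = sum(abs(p - q) for p, q in zip(ca, cb)) / 4
    base_a = ca[3] - ca[0]  # S_A
    base_b = cb[3] - cb[0]  # S_B
    exponent = 1 if base_a + base_b > 0 else 0  # B(S_A, S_B)
    return (1 - distance) * (1 - abs(xa - xb)) ** exponent * min(ya, yb) / max(ya, yb)


# Illustrative pair (assumed): corners (0.1, 0.2, 0.3, 0.4) and (0.4, 0.55, 0.55, 0.7).
A = (0.2, 0.3, 0.1, 0.1)
B = (0.55, 0.55, 0.15, 0.15)
print(cog(A), cog(B))
print(round(s_cc(A, B), 4))  # 0.42
```

The number of intermediate quantities (two COG coordinates, four corner differences, two base lengths) is what the article refers to when it calls this measure complicated.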
To compare the proposed similarity $S(A, B)$ with $S_C(A, B)$, $S_L(A, B)$, $S_{HC}(A, B)$, and $S_{CC}(A, B)$, we give the following examples.
Example 1. Chen and Chen (Ref. 9) used 12 sets of generalized trapezoidal fuzzy numbers to illustrate that the similarity measure $S_{CC}(A, B)$ is more reasonable than $S_C(A, B)$, $S_L(A, B)$, and $S_{HC}(A, B)$. In this example we also use these datasets, which are shown in Figure 1, to compare the proposed similarity measure $S(A, B)$ with the above similarity measures. The similarities for these generalized trapezoidal fuzzy numbers with $S(A, B)$, $S_C(A, B)$, $S_L(A, B)$, $S_{HC}(A, B)$, and $S_{CC}(A, B)$ are presented in Table I.
Figure 1. Twelve sets of fuzzy numbers.

Table I. Similarities of 12 datasets (12 sets of generalized trapezoidal fuzzy numbers).

Set       S_C(A,B)    S_L(A,B)    S_HC(A,B)    S_CC(A,B)    S(A,B)
Set 1     0.975       0.9167      1^i          0.8357       0.8607
Set 2     1           1           1            1            1
Set 3     0.7^i       0.5^i       0.7692^i     0.42         0.4444
Set 4     0.7^i       0.5^i       0.7692^i     0.49         0.4493
Set 5     1^i         1^i         1^i          0.8          0.9753
Set 6     1           *           1            1            1
Set 7     0.9         0^i         0.909        0.9          0.7659
Set 8     0.9^i       0.5         0.909^i      0.54         0.7558
Set 9     0.9^i       0.6667      0.909^i      0.81         0.7866
Set 10    0.9^i       0.8333      1^i          0.9          0.8752
Set 11    0.9^i       0.75        1^i          0.72         0.7659
Set 12    0.9^i       0.8         0.9375       0.78         0.8465

*The similarity measure cannot calculate the degree of similarity between two generalized trapezoidal fuzzy numbers.
^i Incorrect results.
Table II. A nine-member linguistic term set.

Linguistic terms    GTFNs                                  S_CC(L_i, R_1)    S(L_i, R_1)    S_CC(L_i, R_2)    S(L_i, R_2)
Absolutely-low      L_1 = (0, 0, 0, 0; 1)_GT               0.1565            0.1663         0.0927            0.1717
Very-low            L_2 = (0, 0.02, 0, 0.05; 1)_GT         0.1962            0.1789         0.1366            0.1841
Low                 L_3 = (0.1, 0.18, 0.06, 0.05; 1)_GT    0.3226            0.2560         0.2070            0.2657
Fairly-low          L_4 = (0.22, 0.36, 0.05, 0.06; 1)_GT   0.5092            0.3519         0.3254            0.3779
Medium              L_5 = (0.41, 0.58, 0.09, 0.07; 1)_GT   0.7056^ℓ          0.6209^ℓ       0.4683^ℓ          0.6459^ℓ
Fairly-high         L_6 = (0.63, 0.80, 0.05, 0.06; 1)_GT   0.5828            0.4326         0.4318            0.4609
High                L_7 = (0.78, 0.92, 0.06, 0.05; 1)_GT   0.4545            0.2646         0.3395            0.3144
Very-high           L_8 = (0.98, 1, 0.05, 0; 1)_GT         0.2937            0.1648         0.2591            0.1944
Absolutely-high     L_9 = (1, 1, 0, 0; 1)_GT               0.2391            0.1538         0.1816            0.1813

^ℓ Largest value.
Based on Table I, the similarities $S_C$, $S_L$, and $S_{HC}$ appear with many symbols “*” and “i,” where “*” denotes that the similarity measure cannot calculate the degree of similarity between two generalized trapezoidal fuzzy numbers and “i” means incorrect results. Similar to the discussions of Chen and Chen (Ref. 9), we find that the proposed similarity measure $S(A, B)$ can actually overcome the drawbacks of $S_C(A, B)$, $S_L(A, B)$, and $S_{HC}(A, B)$. Next, we use Examples 5.1 and 5.2 in Chen and Chen (Ref. 9) to compare the proposed similarity measure $S(A, B)$ to $S_{CC}(A, B)$.
Example 2. In Example 5.1 of Chen and Chen (Ref. 9), a nine-member linguistic term set based on Chen (Ref. 13) is used to represent the linguistic terms. The linguistic terms and their corresponding generalized trapezoidal fuzzy numbers (GTFNs) of Ref. 13 are illustrated in Table II.
According to the fuzzy risk analysis method in Ref. 9, the total risk $R_1$ was given by $(0.2683, 0.7052, 0.1069, 0.3898; 1)_{GT}$ (see p. 53 in Ref. 9). Based on Chen and Chen’s similarity measure $S_{CC}(L_i, R_1)$ and our proposed similarity measure $S(L_i, R_1)$, we obtain the degrees of similarity between $R_1$ and the linguistic terms shown in Table II. These similarity values are also presented in Table II. Based on Table II, we find that $S_{CC}(R_1, \text{medium}) = 0.7056$ has the largest value, so the generalized trapezoidal fuzzy number $R_1$ is translated into the linguistic term “medium.” We also find that $S(R_1, \text{medium}) = 0.6209$ has the largest value, so $R_1$ is again translated into the linguistic term “medium.”
In Example 5.2 of Ref. 9, the total risk was $R_2 = (0.2889, 0.7497, 0.1044, 0.3997; 0.7)_{GT}$ (see p. 54 in Ref. 9). Using the same method, we obtain the degrees of similarity between $R_2$ and the nine linguistic terms. Table II shows these similarity values. From Table II, we also find that $S_{CC}(R_2, \text{medium}) = 0.4683$ and $S(R_2, \text{medium}) = 0.6459$ are the largest values for the respective measures.
From Examples 1 and 2, we can see that the proposed similarity measure is as good as the similarity measure of Chen and Chen (Ref. 9). Among LR-type fuzzy numbers, normal fuzzy numbers are also commonly used. Next, we focus on normal fuzzy numbers. If

$$L(x) = R(x) = \exp\left(-\left(\frac{x - m}{g}\right)^2\right)$$

for an LR-type fuzzy number $X$, then $X$ is called a normal fuzzy number, denoted by $X = (m, g)_N$, that is,

$$\mu_X(x) = \exp\left(-\left(\frac{x - m}{g}\right)^2\right), \qquad -\infty < x < \infty$$

Based on the metric $d_{LR}$ defined in Section 2, we can obtain $d_{LR}(X, Y)$ for any two normal fuzzy numbers $X = (m_x, g_x)_N$ and $Y = (m_y, g_y)_N$ as follows:

$$\begin{aligned} d_{LR}^2(X, Y) &= (m_x - m_y)^2 + \left(\left(m_x - \frac{\sqrt{\pi}}{2} g_x\right) - \left(m_y - \frac{\sqrt{\pi}}{2} g_y\right)\right)^2 + \left(\left(m_x + \frac{\sqrt{\pi}}{2} g_x\right) - \left(m_y + \frac{\sqrt{\pi}}{2} g_y\right)\right)^2 \\ &= 3(m_x - m_y)^2 + \frac{\pi}{2}(g_x - g_y)^2 \end{aligned}$$
Example 3. Let us consider the normal fuzzy numbers $X = (0, 1)_N$, $Y = (0.5, 1)_N$, and $Z = (1, 1)_N$ shown in Figure 2. From Figure 2, the normal fuzzy number $X$ is closer to $Y$ than to $Z$, so that it is more similar to $Y$ than to $Z$. According to our similarity measure, we have the degrees of similarity between these normal fuzzy numbers as follows:

$$S(X, Y) = 0.3012, \qquad S(Y, Z) = 0.3012, \qquad S(X, Z) = 0.0907$$

Because $S(X, Y) > S(X, Z)$, we can see that the normal fuzzy number $X$ is more similar to $Y$ than to the normal fuzzy number $Z$. If we apply the similarity measure of Chen and Chen (Refs. 8 and 9), we cannot calculate the degrees of similarity between these normal fuzzy numbers.
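The values in Example 3 can be reproduced with a short Python sketch. It rests on two assumptions that are not spelled out in the text: the reference function is taken as $\exp(-x^2)$, so that $\ell = r = \sqrt{\pi}/2$, and a normal fuzzy number $(m, g)_N$ is treated as an LR-type number with both spreads equal to $g$ in the constant $s$ of case (i). The function name `similarity_normal` is hypothetical.

```python
import math
from typing import Tuple

# Integral of sqrt(-ln w) over [0, 1], i.e. Gamma(3/2) = sqrt(pi) / 2.
ELL = math.sqrt(math.pi) / 2


def similarity_normal(x: Tuple[float, float], y: Tuple[float, float]) -> float:
    """Proposed similarity S for normal fuzzy numbers (m, g)_N."""
    if x == y:
        return 1.0
    mx, gx = x
    my, gy = y
    # Closed form of the squared metric for normal fuzzy numbers.
    d2 = 3 * (mx - my) ** 2 + (math.pi / 2) * (gx - gy) ** 2
    d_low = abs((mx - ELL * gx) - (my - ELL * gy))   # D_*
    d_up = abs((mx + ELL * gx) - (my + ELL * gy))    # D^*
    # Constant s of case (i), with both spreads set to g (assumption).
    s = (d_low + d_up) / 2 + (abs((mx - gx) - (my - gy))
                              + abs((mx + gx) - (my + gy))) / 2 ** 3
    return math.exp(-d2 / s)


X, Y, Z = (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)
print(round(similarity_normal(X, Y), 4))  # 0.3012
print(round(similarity_normal(Y, Z), 4))  # 0.3012
print(round(similarity_normal(X, Z), 4))  # 0.0907
```

For equal spreads the exponent reduces to $3\Delta^2 / (1.25\,\Delta) = 2.4\,\Delta$ with $\Delta = |m_x - m_y|$, which gives $e^{-1.2}$ and $e^{-2.4}$ for the two distances in the example.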
In fact, our proposed similarity measure $S(A, B)$ takes account of the $L$ (or $R$) shape in the LR-type fuzzy numbers. Its form is simple, with only an exponential-type distance. The similarity $S_{CC}$ defined by Chen and Chen (Refs. 8 and 9) did not take the $L$ (or $R$) shape into account. Moreover, the form of $S_{CC}$ is very complicated and difficult to interpret. This is why we propose this simple similarity measure for LR-type fuzzy numbers. In applications, we could apply it to data retrieval such as database queries, data mining, web mining, and so forth. In the next section, we apply the proposed similarity to a database query with compound attributes.
Figure 2. Three normal-type fuzzy numbers.
4. APPLICATIONS TO A DATABASE QUERY WITH COMPOUND ATTRIBUTES
Tahani (Ref. 23) proposed the use of fuzzy sets for querying regular databases with a conceptual framework as early as 1977. Afterward, Kacprzyk and Ziolkowski (Ref. 24) presented database queries with fuzzy linguistic quantifiers. Fuzzy databases subsequently were widely studied (see Ref. 25). An important approach to database querying is based on similarity measures (see Ref. 6). To make database querying friendlier through fuzzy queries, Nomura et al. (Ref. 26) proposed a method to generate compound attributes, which are ambiguous attributes not defined in the original database schema but derivable from multiple rigid attributes in the schema. Recently, Wang and Tsai (Ref. 16) presented an approach for handling null queries on the basis of generating compound attributes from fuzzy numbers and fuzzy trapezoidal numbers. They then used similarity measures to define the degrees of similarity between these fuzzy numbers. This kind of database management system for handling compound attributes in null queries is able to reduce the occurrences of null answers and also provide a user-friendly query environment. In this section, we apply the similarity measure proposed in Section 2 to these compound attributes. We then make comparisons to the method proposed in Ref. 16.
Wang and Tsai (Ref. 16) generated compound attributes from fuzzy numbers and fuzzy intervals. For example, a compound attribute “Size” for a person can be generated from two fuzzy numbers “Height” and “Weight” such that it is used to represent semantic (or intuitive) meanings in the database. They proceeded as follows. The first step is to fuzzify numerical or interval-valued rigid attributes into fuzzy numbers or fuzzy trapezoidal numbers. The second step is to select a suitable aggregation function and similarity measure. Because Wang and Tsai (Ref. 16) chose discrete-type similarity measures, they were forced to discretize these fuzzy attributes. The third step involves measuring the degree of similarity between the compound attribute and the query statement. If the similarity measure is greater than the assigned threshold, it is considered an answer to the query. Otherwise it is considered not an answer to the query. In the first step, Wang and Tsai (Ref. 16) first used an S-function to convert all rigid attributes into the same unit-interval range $[0,1]$ so that they could be compounded under the same domain and scale. They then fuzzified all rigid attributes into LR-type fuzzy numbers with bell shapes. In the second step, they used discrete-type similarity measures between two fuzzy sets, so that it was necessary to discretize the LR-type fuzzy numbers first and then choose the better aggregate function and similarity measure for compound attributes. Our method directly defines the similarity measure and the aggregate operator for LR-type fuzzy numbers. It is not necessary to discretize LR-type fuzzy numbers in our method, which means that it does not lose any information included in the fuzzy numbers.
In our proposed method, we assigned three fuzzy terms {Short, Normal, Tall} for the rigid attribute “Height” as the three triangular fuzzy numbers with Short $= (0, 0, 0.5)_T$, Normal $= (0.5, 0.5, 0.5)_T$, and Tall $= (1, 0.5, 0)_T$. The rigid attribute “Weight” is assigned using four triangular fuzzy numbers {Light, Average, Heavy, Fat} with Light $= (0, 0, \tfrac{1}{3})_T$, Average $= (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})_T$, Heavy $= (\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3})_T$, and Fat $= (1, \tfrac{1}{3}, 0)_T$. The compound attribute called “Size” is assigned using three triangular fuzzy numbers {Small, Average, Big} with Small $= (0, 0, 0.5)_T$, Average $= (0.5, 0.5, 0.5)_T$, and Big $= (1, 0.5, 0)_T$. These attributes are shown in Figure 3. We used the following aggregate operation $\oplus$:

$$(m_x, \alpha_x, \beta_x)_{LR} \oplus (m_y, \alpha_y, \beta_y)_{LR} = \left(\frac{m_x + m_y}{2}, \frac{\alpha_x + \alpha_y}{2}, \frac{\beta_x + \beta_y}{2}\right)_{LR}$$

Thus, we generated the compound tuples from rigid attributes “Height” and “Weight” as shown in Table III. We then used the similarity $S(A, B)$ proposed in Section 2 to find the similarity measures between the compound tuples and the compound attribute “Size.” The results are shown in Table IV. When we compare the results in Table IV with Ref. 16, the results from our method are better than the results of Wang and Tsai (Ref. 16).
Figure 3. Triangular fuzzy numbers for attributes Height, Weight, and Size.
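The construction of a compound tuple and its comparison with the “Size” terms can be sketched as follows. The dictionaries and the names `aggregate`, `similarity`, and `best_size` are hypothetical. Exact fractions are used so that the compound tuples can be checked against Table III, and $\ell = r = 1/2$ is assumed for triangular fuzzy numbers.

```python
import math
from fractions import Fraction as F

HEIGHT = {"Short": (F(0), F(0), F(1, 2)),
          "Normal": (F(1, 2), F(1, 2), F(1, 2)),
          "Tall": (F(1), F(1, 2), F(0))}
WEIGHT = {"Light": (F(0), F(0), F(1, 3)),
          "Average": (F(1, 3), F(1, 3), F(1, 3)),
          "Heavy": (F(2, 3), F(1, 3), F(1, 3)),
          "Fat": (F(1), F(1, 3), F(0))}
SIZE = {"Small": (F(0), F(0), F(1, 2)),
        "Average": (F(1, 2), F(1, 2), F(1, 2)),
        "Big": (F(1), F(1, 2), F(0))}


def aggregate(x, y):
    """Aggregate operation: component-wise average of two (m, alpha, beta) tuples."""
    return tuple((p + q) / 2 for p, q in zip(x, y))


def similarity(a, b, ell=F(1, 2), r=F(1, 2)):
    """Proposed similarity S for triangular fuzzy numbers (case (i))."""
    if a == b:
        return 1.0
    ma, aa, ba = a
    mb, ab, bb = b
    d_low = abs((ma - ell * aa) - (mb - ell * ab))
    d_up = abs((ma + r * ba) - (mb + r * bb))
    d2 = (ma - mb) ** 2 + d_low ** 2 + d_up ** 2
    s = (d_low + d_up) / 2 + (abs((ma - aa) - (mb - ab))
                              + abs((ma + ba) - (mb + bb))) / 2 ** 3
    return math.exp(-float(d2 / s))


def best_size(height_term, weight_term):
    """Size term with the largest similarity to the compound tuple."""
    compound = aggregate(HEIGHT[height_term], WEIGHT[weight_term])
    return max(SIZE, key=lambda name: similarity(compound, SIZE[name]))


compound = aggregate(HEIGHT["Short"], WEIGHT["Light"])
print(compound)                                        # (0, 0, 5/12) as Fractions
print(round(similarity(compound, SIZE["Small"]), 3))   # 0.946
print(best_size("Short", "Light"))                     # Small
```

Running `similarity` over all twelve compound tuples and three size terms reproduces a grid of the same shape as Table IV.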
We applied the proposed method to a real dataset. The real data are from a report from the National College of Physical Education and Sports, Taiwan, on health physical fitness norms for Taiwan residents (Ref. 27). This dataset includes the statistical data for height and weight, with means and variances, for residents in the Greater Taiwan area from ages 6 to 65. For an approximately normal distribution, the probability of lying within two standard deviations of the mean is around 0.95. Thus, we set up triangular fuzzy numbers for the attributes “Height” and “Weight” by mapping the mean to 0.5 and values beyond minus and plus two standard deviations to 0 and 1, respectively. Suppose that the sample means and the sample standard deviations for “Height” and “Weight” are $m_h$, $m_w$ and $s_h$, $s_w$.
Table III. Compound tuples of (Height, Weight).

                    Weight
Height     Light                  Average                 Heavy                   Fat
Short      (0, 0, 5/12)_T         (1/6, 1/6, 5/12)_T      (1/3, 1/6, 5/12)_T      (1/2, 1/6, 1/4)_T
Normal     (1/4, 1/4, 5/12)_T     (5/12, 5/12, 5/12)_T    (7/12, 5/12, 5/12)_T    (3/4, 5/12, 1/4)_T
Tall       (1/2, 1/4, 1/6)_T      (2/3, 5/12, 1/6)_T      (5/6, 5/12, 1/6)_T      (1, 5/12, 0)_T
Table IV. Similarity measures between compound tuples and compound attribute “Size” based on the proposed similarity measure.

                       S((Height, Weight), Size)
(Height, Weight)       Small      Average    Big
(Short, Light)         0.946      0.275      0.089
(Short, Average)       0.644      0.421      0.134
(Short, Heavy)         0.449      0.634      0.194
(Short, Fat)           0.300      0.820      0.291
(Normal, Light)        0.523      0.521      0.163
(Normal, Average)      0.342      0.792      0.239
(Normal, Heavy)        0.239      0.792      0.342
(Normal, Fat)          0.163      0.521      0.523
(Tall, Light)          0.291      0.820      0.300
(Tall, Average)        0.194      0.634      0.449
(Tall, Heavy)          0.134      0.421      0.644
(Tall, Fat)            0.089      0.275      0.946
On the basis of the health physical fitness norm report executed by the National College of Physical Education and Sports, Taiwan (see Ref. 27), we have their sample means and sample standard deviations as shown in Table V. Because we used fuzzy numbers for “Height” and “Weight” according to Figure 3, the statistical data for height and weight were transformed into fuzzy terms according to the following rules:
Height: If “height” $\le m_h - s_h$, then the person is Short;
If $m_h - s_h <$ “height” $< m_h + s_h$, then the person is Normal;
If “height” $\ge m_h + s_h$, then the person is Tall.
Weight: If “weight” $\le m_w - \tfrac{4}{3} s_w$, then the person is Light;
If $m_w - \tfrac{4}{3} s_w <$ “weight” $\le m_w$, then the person is Average;
If $m_w <$ “weight” $\le m_w + \tfrac{4}{3} s_w$, then the person is Heavy;
If “weight” $\ge m_w + \tfrac{4}{3} s_w$, then the person is Fat.
For example, a 6-year-old boy has a height of 113 cm and a weight of 18 kg. Because 6-year-old boys have $m_h = 119.5$ cm and $s_h = 6.34$ cm, so that $m_h - s_h = 113.16$ cm, the boy is “Short.” Because 6-year-old boys have $m_w = 24.1$ kg and $s_w = 4.48$ kg, so that $m_w - \tfrac{4}{3} s_w = 18.13$ kg, the boy is “Light.” From Table IV, the similarity S((Short, Light), Small) = 0.946 is the largest one among S((Short, Light), Small), S((Short, Light), Average), and S((Short, Light), Big). Thus the size of the 6-year-old boy is “Small.” Hence, if we key in the height and weight of a person to the database, it can respond with “Size.” Moreover, if we give a null query with “Size,” the database will generate a response to the query. In general, these applications can also be used in fuzzy queries to various databases.
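The transformation rules and the worked example can be sketched in Python. The function names are hypothetical. One detail is a choice of this sketch rather than of the article: the written rules assign a weight exactly equal to $m_w + \tfrac{4}{3} s_w$ to both Heavy and Fat, and the order of the checks below resolves that tie as Heavy.

```python
def height_term(height: float, mean: float, sd: float) -> str:
    """Map a height to Short / Normal / Tall using mean -/+ one standard deviation."""
    if height <= mean - sd:
        return "Short"
    if height >= mean + sd:
        return "Tall"
    return "Normal"


def weight_term(weight: float, mean: float, sd: float) -> str:
    """Map a weight to Light / Average / Heavy / Fat using mean -/+ (4/3) sd."""
    offset = 4.0 / 3.0 * sd
    if weight <= mean - offset:
        return "Light"
    if weight <= mean:
        return "Average"
    if weight <= mean + offset:  # tie at the upper bound resolved as Heavy (assumption)
        return "Heavy"
    return "Fat"


# The 6-year-old boy from the text: 113 cm and 18 kg,
# with m_h = 119.5, s_h = 6.34, m_w = 24.1, s_w = 4.48.
print(height_term(113, 119.5, 6.34))  # Short
print(weight_term(18, 24.1, 4.48))    # Light
```

The resulting pair of terms selects a row of Table IV, and the size term with the largest similarity in that row is returned as the answer.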
Table V. Health physical fitness norm report with means m_h, m_w and standard deviations s_h, s_w of height and weight.

          Men                                             Women
          Height                 Weight                   Height                 Weight
Age       m_h (cm)   s_h (cm)    m_w (kg)   s_w (kg)      m_h (cm)   s_h (cm)    m_w (kg)   s_w (kg)
6         119.5      6.34        24.1       4.48          117.2      6.57        22.7       4.35
7         126.8      5.41        28.7       6.68          124.9      6.36        25.6       5.33
8         131.5      6.60        30.7       7.71          130.7      5.96        29.2       5.20
9         135.3      7.77        32.6       7.08          136.1      6.20        32.6       5.99
10        140.4      7.46        37.3       9.08          143.2      7.04        38.2       8.48
11        147.9      9.36        43.3       11.93         149.2      7.19        41.9       8.88
12        154.1      10.24       47.9       11.77         152.5      7.08        44.9       7.84
13        161.5      8.67        53.3       11.57         157.1      5.90        50.2       8.18
14        165.7      8.14        55.8       9.40          157.1      5.49        50.8       9.68
15        170.4      5.81        61.9       10.61         159.2      5.05        52.8       8.41
16        170.6      6.63        63.3       12.14         159.3      5.42        54.8       8.59
17        171.6      6.69        64.9       11.00         157.8      7.10        52.2       7.61
18        171.6      5.59        64.5       9.52          158.9      5.74        53.3       8.62
19        172.0      6.60        67.8       9.77          158.7      4.46        52.1       6.30
20–25     172.4      5.54        67.5       9.96          159.8      5.18        52.9       6.88
26–30     170.9      6.06        69.2       9.71          158.7      5.60        53.9       7.70
31–35     169.4      5.63        69.7       9.55          157.6      5.51        54.7       7.35
36–40     168.8      5.79        68.9       9.45          157.2      5.58        56.3       7.82
41–45     167.8      5.99        68.8       9.99          156.0      5.45        56.4       7.63
46–50     166.8      5.58        69.6       8.86          154.9      4.90        57.6       8.12
51–55     165.3      6.22        68.1       8.55          154.2      5.10        58.4       8.12
56–60     165.4      5.51        67.5       9.54          154.1      5.09        58.0       7.26
61–65     164.7      6.23        68.2       9.81          153.9      5.50        58.0       8.02
5. CONCLUSIONS
Similarity measures are used for presenting degrees of similarity between objects or concepts. These measures can be widely applied in various areas such as clustering, control systems, database query, and so forth. Since the idea of fuzzy sets was proposed by Zadeh (Ref. 2) in 1965, fuzziness has been widely used for real-world systems. LR-type fuzzy numbers are the most widely used to represent fuzziness. We proposed a new similarity measure for LR-type fuzzy numbers. The proposed similarity measure can overcome the drawbacks of the existing similarity measures. We then applied the proposed method to a fuzzy database query with compound attributes. Comparisons to Wang and Tsai (Ref. 16) were made. A real data application was also presented. In future research we will apply the proposed method to data mining and web mining.
References
1. Kaufman L, Rousseeuw PJ. Finding groups in data. New York: Wiley; 1990.
2. Zadeh LA. Fuzzy sets. Inform Control 1965;8:338–356.
3. Zwick R, Carlstein E, Budescu DV. Measures of similarity among fuzzy concepts: A comparative analysis. Int J Approx Reas 1987;1:221–242.
4. Turksen IB, Zhong Z. An approximate analogical reasoning approach based on similarity measures. IEEE Trans Syst Man Cybern 1988;18:1049–1056.
5. Pappis CP, Karacapilidis NI. A comparative assessment of measures of similarity of fuzzy values. Fuzzy Set Syst 1993;56:171–174.
6. Candan KS, Li WS, Priya ML. Similarity-based ranking and query processing in multimedia databases. Data Knowl Eng 2000;35:259–298.
7. Yang MS, Shih HM. Cluster analysis based on fuzzy relations. Fuzzy Set Syst 2001;120:197–212.
8. Chen SJ, Chen SM. A new method to measure the similarity between fuzzy numbers. In: Proc 10th IEEE Int Conf on Fuzzy Systems, Melbourne; 2001. pp 73–76.
9. Chen SJ, Chen SM. Fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. IEEE Trans Fuzzy Syst 2003;11:45–56.
10. Pal NR, Pal SK. Entropy: A new definition and its applications. IEEE Trans Syst Man Cybern 1991;21:1260–1270.
11. Pal NR, Pal SK. Some properties of the exponential entropy. Inform Sci 1992;66:119–137.
12. Wu KL, Yang MS. Alternative c-means clustering algorithms. Pattern Recogn 2002;35:2267–2278.
13. Chen SM. New methods for subjective mental workload assessment and fuzzy risk analysis. Cybern Syst 1996;27:449–472.
14. Lee HS. An optimal aggregation method for fuzzy opinions of group decision. In: Proc 1999 IEEE Int Conf Systems, Man, Cybernetics, Vol 3; 1999. pp 314–319.
15. Hsieh CH, Chen SH. Similarity of generalized fuzzy numbers with graded mean integration representation. In: Proc 8th Int Fuzzy Systems Association World Congress, Taipei, Vol 2; 1999. pp 551–555.
16. Wang SL, Tsai YJ. Generating compound attributes from fuzzy data for null queries. Intell Autom Soft Comput 2001;7:1–8.
17. Zimmermann HJ. Fuzzy set theory and its applications. Dordrecht, The Netherlands: Kluwer; 1991.
18. Chen SH. Operations on fuzzy numbers with function principle. Tamkang J Manag Sci 1985;6:13–25.
19. Chen SH. Ranking generalized fuzzy number with graded mean integration. In: Proc 8th Int Fuzzy Systems Association World Congress, Taipei, Vol 2; 1999. pp 899–902.
20. Fan J, Xie W. Some notes on similarity measure and proximity measure. Fuzzy Set Syst 1999;101:403–412.
21. Yang MS, Ko CH. On a class of fuzzy c-numbers clustering procedures for fuzzy data. Fuzzy Set Syst 1996;84:49–60.
22. Yang MS, Ko CH. On cluster-wise fuzzy regression analysis. IEEE Trans Syst Man Cybern B 1997;27:1–13.
23. Tahani V. A conceptual framework for fuzzy query processing—A step toward very intelligent database systems. Inform Process Manag 1977;12:289–303.
24. Kacprzyk J, Ziolkowski A. Database queries with fuzzy linguistic quantifier. IEEE Trans Syst Man Cybern 1986;16:474–479.
25. Petry FE. Fuzzy databases: Principles and applications. Dordrecht, The Netherlands: Kluwer; 1996.
26. Nomura T, Odaka T, Ohki N, Yokoyama T, Matsushita Y. Generating ambiguous attributes for fuzzy queries. In: Proc 1992 IEEE Int Conf on Fuzzy Systems; 1992. pp 753–760.
27. Health Physical Fitness Norm Report, executed by National College of Physical Education and Sports, Taiwan, Published by National Council on Physical Fitness and Sports, Taiwan, ROC; 1999.