On a Similarity Measure between LR-Type Fuzzy Numbers and Its Application to Database Acquisition

Miin-Shen Yang,1,* Wen-Liang Hung,2,† Shou-Jen Chang-Chien1,‡
1 Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taiwan 32023, ROC
2 Department of Mathematics Education, National Hsinchu Teachers College, Hsin-Chu, Taiwan, ROC

*Author to whom all correspondence should be addressed: e-mail: msyang@math.cycu.edu.tw.
†e-mail: wlhung@mail.nhctc.edu.tw.
‡e-mail: chang-chien0102@yahoo.com.tw.

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 20, 1001–1016 (2005). © 2005 Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/int.20102

This article presents a new similarity measure for LR-type fuzzy numbers. The proposed similarity measure is based on a defined metric between LR-type fuzzy numbers. It is known that an exponential operation is highly useful in dealing with the classical Shannon entropy and cluster analysis. We therefore apply the exponential operation to this metric. Furthermore, we analyze its properties and make numerical comparisons to several similarity measures. The results show that the proposed similarity measure can overcome the drawbacks of the existing similarity measures. We then apply it to compound attributes for handling null queries to database systems. These applications can also be widely used in fuzzy queries to databases.

1. INTRODUCTION

Real-world situations are very often uncertain. Probability has traditionally been used to model uncertainty, whereas fuzziness has been widely used to handle another type of uncertainty. Fuzzy data and fuzzy presentation commonly exist in real-world systems. Among all of these fuzzy types of presentation, LR-type fuzzy numbers are the most commonly used, for example in linguistics, decision making, knowledge representation, medical diagnosis, control systems, databases, and so forth.

A similarity measure is important for presenting a degree of similarity between two objects or concepts. Traditionally, similarity measures were used mostly in cluster analysis such as hierarchical clustering (see Ref. 1). Since Zadeh [2] proposed fuzzy sets, many different similarity measures between fuzzy sets have been proposed and applied in various areas (see Refs. 3–7). However, most of them are defined for fuzzy values (i.e., discrete fuzzy sets). LR-type fuzzy (trapezoidal) numbers are commonly used to present real (interval) numbers in a fuzzy environment for analyzing fuzziness and fuzzy data, with applications in linguistics, control, database systems, and so forth. However, few similarity measures have been proposed for LR-type fuzzy numbers. Recently, Chen and Chen [8,9] proposed a similarity measure between generalized fuzzy numbers. They used the center of gravity to define a similarity for generalized fuzzy numbers; the resulting measure is complicated and cannot be used for general LR-type fuzzy numbers.

It is known that an exponential operation is highly useful in dealing with the classical Shannon entropy (see Refs. 10 and 11) and cluster analysis (see Ref. 12). In this article, we therefore apply the exponential operation to a defined metric between LR-type fuzzy numbers and propose a new similarity measure. We compare it to the measures of Chen and Chen [8,9], Chen [13], Lee [14], and Hsieh and Chen [15] and apply it in fuzzy queries to databases. In Section 2, we present the proposed similarity measure and its properties. Section 3 gives numerical comparisons and examples. In Section 4, we apply it to database queries by generating compound attributes for handling null queries. The results are compared to those of Wang and Tsai [16]. A real data set from a health physical fitness norm report in Taiwan is used and analyzed using the proposed method. We present our conclusions in Section 5.

2.
PROPOSED SIMILARITY MEASURE AND ITS PROPERTIES

A fuzzy number $F$ is defined as a convex normalized fuzzy set of real numbers $\mathbb{R}$ with membership $\mu_F(x)$ piecewise continuous. Let $L$ (and $R$) both be decreasing functions from all positive real numbers $\mathbb{R}^+$ to the interval $[0,1]$ with $L(0) = 1$, $L(x) < 1$ for $x > 0$, $L(1) = 0$, and $L(x) > 0$ for $x < 1$ (and the same for $R$). A fuzzy number $X$ is called LR-type if there are real numbers $m$, $\alpha > 0$, $\beta > 0$ with

$$\mu_X(x) = \begin{cases} L\left(\dfrac{m - x}{\alpha}\right), & x \le m \\[2mm] R\left(\dfrac{x - m}{\beta}\right), & x \ge m \end{cases}$$

where $m$ is called the mean value of $X$ and $\alpha$ and $\beta$ are called the left and right spreads. We then symbolically denote $X$ by $X = (m, \alpha, \beta)_{LR}$. Let $X$ and $Y$ be two LR-type fuzzy numbers with $X = (m_x, \alpha_x, \beta_x)_{LR}$ and $Y = (m_y, \alpha_y, \beta_y)_{LR}$. Using the extension principle, we have that

$$X + Y = (m_x + m_y, \alpha_x + \alpha_y, \beta_x + \beta_y)_{LR},$$
$$-Y = (-m_y, \beta_y, \alpha_y)_{RL},$$
$$(m_x, \alpha_x, \beta_x)_{LR} - (m_y, \beta_y, \alpha_y)_{RL} = (m_x - m_y, \alpha_x + \alpha_y, \beta_x + \beta_y)_{LR},$$

and $\lambda X = (\lambda m_x, \lambda\alpha_x, \lambda\beta_x)_{LR}$ when $\lambda > 0$ and $\lambda X = (\lambda m_x, -\lambda\beta_x, -\lambda\alpha_x)_{RL}$ when $\lambda < 0$ (see Ref. 17).

For an LR-type fuzzy number $M = (m, \alpha, \beta)_{LR}$, if $L$ and $R$ are of the form

$$T(x) = \begin{cases} 1 - x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

then $M$ is called a triangular fuzzy number and denoted by $M = (m, \alpha, \beta)_T$. A fuzzy number $X$ is called an LR-type trapezoidal fuzzy number if there are real numbers $m_1$, $m_2$, $\alpha > 0$, $\beta > 0$ with

$$\mu_X(x) = \begin{cases} L\left(\dfrac{m_1 - x}{\alpha}\right), & x \le m_1 \\[2mm] 1, & m_1 \le x \le m_2 \\[1mm] R\left(\dfrac{x - m_2}{\beta}\right), & x \ge m_2 \end{cases}$$

where $m_1$ and $m_2$ are called mean values of $X$ and $\alpha$, $\beta$ are called the left and right spreads of $X$. Symbolically, $X$ is denoted by $X = (m_1, m_2, \alpha, \beta)_{LR}$. LR-type fuzzy numbers are used to present real numbers in a fuzzy environment, and trapezoidal fuzzy numbers are used to present fuzzy intervals; both are widely applied in linguistics, knowledge representation, control systems, databases, and so forth.

Remark 1. If we take $L$ and $R$ to be of the form

$$T(x) = \begin{cases} w(1 - x), & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

where $0 < w \le 1$, then

$$\mu_X(x) = \begin{cases} w\left(1 - \dfrac{m_1 - x}{\alpha}\right), & \text{for } x \le m_1 \\[2mm] w, & \text{for } m_1 \le x \le m_2 \\[1mm] w\left(1 - \dfrac{x - m_2}{\beta}\right), & \text{for } x \ge m_2 \end{cases}$$

The fuzzy number $X$ is called a generalized trapezoidal fuzzy number by Chen [18,19]. We denote $X = (m_1, m_2, \alpha, \beta; w)_{GT}$. If $w = 1$, then the generalized trapezoidal fuzzy number is called a (normal) trapezoidal fuzzy number.

A similarity measure is an important tool for presenting a degree of similarity between two objects. There are many similarity measures defined for (discrete) fuzzy sets on a finite set. Zwick et al. [3] first used the geometric distance and the Hausdorff metric to define similarity measures among fuzzy sets. Pappis and Karacapilidis [5] proposed three kinds of similarity measures for discrete fuzzy sets. Many other similarity measures for discrete fuzzy sets and their applications have been proposed (see Refs. 4, 16, and 20, among others). However, few similarity measures have been defined for LR-type fuzzy numbers. Recently, Chen and Chen [8,9] proposed a similarity measure between fuzzy numbers. They used the center of gravity (COG) to define the degree of similarity between trapezoidal or triangular fuzzy numbers. However, their definition seems to be too complicated and difficult to interpret, and their methods cannot be used for general LR-type fuzzy numbers. In this section, we propose a simple way to define a new similarity measure for LR-type fuzzy numbers.

Consider two LR-type fuzzy numbers $A = (m_a, \alpha_a, \beta_a)_{LR}$ and $B = (m_b, \alpha_b, \beta_b)_{LR}$. Yang and Ko [21,22] defined a metric $d_{LR}$ for $A$ and $B$ with

$$d_{LR}^2(A, B) = (m_a - m_b)^2 + \big((m_a - \ell\alpha_a) - (m_b - \ell\alpha_b)\big)^2 + \big((m_a + r\beta_a) - (m_b + r\beta_b)\big)^2$$

where $\ell = \int_0^1 L^{-1}(\omega)\,d\omega$ and $r = \int_0^1 R^{-1}(\omega)\,d\omega$, provided that $L^{-1}$ and $R^{-1}$ are integrable over the interval $[0,1]$. To define $d_{LR}$ for two LR-type trapezoidal fuzzy numbers $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a)_{LR}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b)_{LR}$, we extend $d_{LR}$ to

$$d_{LR}^2(A, B) = \frac{(m_{1a} - m_{1b})^2 + (m_{2a} - m_{2b})^2}{2} + \big((m_{1a} - \ell\alpha_a) - (m_{1b} - \ell\alpha_b)\big)^2 + \big((m_{2a} + r\beta_a) - (m_{2b} + r\beta_b)\big)^2$$

It is known that an exponential operation is highly useful in dealing with the classical Shannon entropy (see Refs. 10 and 11) and cluster analysis (see Ref. 12). We therefore apply the exponential operation to the metric $d_{LR}$. The proposed similarity measure $S(A, B)$ between two LR-type (trapezoidal) fuzzy numbers $A$ and $B$ is defined as

$$S(A, B) = \begin{cases} 1, & \text{if } A = B \\ \exp\big(-d_{LR}^2(A, B)/s\big), & \text{if } A \ne B \end{cases}$$

where $s$ is a constant bigger than 0 chosen as follows:

(i) If $A = (m_a, \alpha_a, \beta_a)_{LR}$ and $B = (m_b, \alpha_b, \beta_b)_{LR}$, then

$$s = \frac{D_* + D^*}{2} + \frac{|(m_a - \alpha_a) - (m_b - \alpha_b)| + |(m_a + \beta_a) - (m_b + \beta_b)|}{2^3}$$

where $D_* = |(m_a - \ell\alpha_a) - (m_b - \ell\alpha_b)|$ and $D^* = |(m_a + r\beta_a) - (m_b + r\beta_b)|$.

(ii) If $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a)_{LR}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b)_{LR}$, then

$$s = \frac{D_* + D^*}{2} + \frac{|(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)| + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|}{2^4}$$

where $D_* = |(m_{1a} - \ell\alpha_a) - (m_{1b} - \ell\alpha_b)|$ and $D^* = |(m_{2a} + r\beta_a) - (m_{2b} + r\beta_b)|$.

In general, most similarity measures $S(A, B)$ satisfy that, if $A = B$, then $S(A, B) = 1$. However, some of them may not satisfy the converse, that is, if $S(A, B) = 1$, then $A = B$. The proposed similarity measure has the property that $A = B$ if and only if $S(A, B) = 1$. It also has the property $S(A, B) = S(B, A)$.

Property 1. For any two LR-type fuzzy (trapezoidal) numbers $A$ and $B$, we have $A = B$ if and only if $S(A, B) = 1$.

Proof. (i) If $A = B$, then $S(A, B) = 1$ by the definition.

(ii) Let $A = (m_a, \alpha_a, \beta_a)_{LR}$ and $B = (m_b, \alpha_b, \beta_b)_{LR}$ with $A \ne B$. Thus,

$$s = \frac{|(m_a - \ell\alpha_a) - (m_b - \ell\alpha_b)| + |(m_a + r\beta_a) - (m_b + r\beta_b)|}{2} + \frac{|(m_a - \alpha_a) - (m_b - \alpha_b)| + |(m_a + \beta_a) - (m_b + \beta_b)|}{2^3}$$

It implies that

$$\begin{aligned}
s = 0 &\iff m_a - \ell\alpha_a = m_b - \ell\alpha_b,\quad m_a + r\beta_a = m_b + r\beta_b,\quad m_a - \alpha_a = m_b - \alpha_b,\quad m_a + \beta_a = m_b + \beta_b \\
&\iff (1 - \ell)\alpha_a = (1 - \ell)\alpha_b,\quad (r - 1)\beta_a = (r - 1)\beta_b,\quad m_a - \alpha_a = m_b - \alpha_b,\quad m_a + \beta_a = m_b + \beta_b \\
&\iff m_a = m_b,\quad \alpha_a = \alpha_b,\quad \beta_a = \beta_b \quad \text{if } \ell \ne 1 \text{ and } r \ne 1 \\
&\iff A = B
\end{aligned}$$

Therefore, if $A \ne B$, then $s \ne 0$ and $d_{LR}^2(A, B) \ne 0$. Using the definition

$$S(A, B) = \begin{cases} 1, & \text{if } A = B \\ \exp\big(-d_{LR}^2(A, B)/s\big), & \text{if } A \ne B \end{cases}$$

we have that if $A \ne B$, then $S(A, B) \ne 1$. That means, if $S(A, B) = 1$, then $A = B$. The proof is the same for $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a)_{LR}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b)_{LR}$. ∎

Property 2. If $A$ and $B$ are two LR-type fuzzy numbers, then $S(A, B) = S(B, A)$.

Proof. (i) If $S(A, B) = 1$, then $A = B$ (by Property 1). That is, $S(A, B) = S(B, A)$.

(ii) If $S(A, B) \ne 1$, then $A \ne B$. We then have $s \ne 0$ and $d_{LR}^2(A, B) \ne 0$. It is clear that $d_{LR}^2(A, B) = d_{LR}^2(B, A)$, and $s$ is likewise unchanged when $A$ and $B$ are interchanged. Thus, $S(A, B) = S(B, A)$. ∎

3. NUMERICAL COMPARISONS

In this section, we make some numerical comparisons of the proposed similarity measure with the other similarity measures proposed by Chen and Chen [8,9], Chen [13], Lee [14], and Hsieh and Chen [15]. Let $A$ and $B$ be two triangular fuzzy numbers where $A = (m_a, \alpha_a, \beta_a)_T$ and $B = (m_b, \alpha_b, \beta_b)_T$ with $m_a - \alpha_a \ge 0$, $m_a + \beta_a \le 1$, and $m_b - \alpha_b \ge 0$, $m_b + \beta_b \le 1$. Chen [13] defined the degree of similarity $S_C(A, B)$ between $A$ and $B$ as follows:

$$S_C(A, B) = 1 - \frac{\|A - B\|}{3}$$

where

$$\|A - B\| = |(m_a - \alpha_a) - (m_b - \alpha_b)| + |m_a - m_b| + |(m_a + \beta_a) - (m_b + \beta_b)|$$

If $A$ and $B$ are two trapezoidal fuzzy numbers where $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; 1)_{GT}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; 1)_{GT}$ with $m_{1a} - \alpha_a \ge 0$, $m_{2a} + \beta_a \le 1$, and $m_{1b} - \alpha_b \ge 0$, $m_{2b} + \beta_b \le 1$, then the degree of similarity $S_C(A, B)$ between $A$ and $B$ can be calculated as follows:

$$S_C(A, B) = 1 - \frac{\|A - B\|}{4}$$

where

$$\|A - B\| = |(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)| + |m_{1a} - m_{1b}| + |m_{2a} - m_{2b}| + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|$$

Lee [14] defined a similarity measure $S_L(A, B)$ for trapezoidal fuzzy numbers $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; 1)_{GT}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; 1)_{GT}$ as follows:

$$S_L(A, B) = 1 - \frac{\|A - B\|_{l_p}}{\|U\|} \times 4^{-1/p}$$

where

$$\|U\| = \max(U) - \min(U),$$
$$U = \{m_{1a} - \alpha_a,\ m_{1a},\ m_{2a},\ m_{2a} + \beta_a,\ m_{1b} - \alpha_b,\ m_{1b},\ m_{2b},\ m_{2b} + \beta_b\}$$

and

$$\|A - B\|_{l_p} = \big(|(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)|^p + |m_{1a} - m_{1b}|^p + |m_{2a} - m_{2b}|^p + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|^p\big)^{1/p}$$

Based on the idea of graded mean integration-representation distance, Hsieh and Chen [15] proposed a similarity measure $S_{HC}(A, B)$ between two fuzzy numbers $A$ and $B$ as follows:

$$S_{HC}(A, B) = \frac{1}{1 + d(A, B)}$$

where $d(A, B) = |P(A) - P(B)|$ and $P(A)$ and $P(B)$ are the graded mean integration representations of $A$ and $B$, respectively. If $A = (m_a, \alpha_a, \beta_a)_T$ and $B = (m_b, \alpha_b, \beta_b)_T$, then

$$P(A) = \frac{-\alpha_a + 6m_a + \beta_a}{6} \quad \text{and} \quad P(B) = \frac{-\alpha_b + 6m_b + \beta_b}{6}$$

If $A$ and $B$ are trapezoidal fuzzy numbers, where $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; 1)_{GT}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; 1)_{GT}$, then

$$P(A) = \frac{-\alpha_a + 3m_{1a} + 3m_{2a} + \beta_a}{6} \quad \text{and} \quad P(B) = \frac{-\alpha_b + 3m_{1b} + 3m_{2b} + \beta_b}{6}$$

Recently, Chen and Chen [8,9] used the simple center of gravity method (SCGM) to calculate the COG points of generalized trapezoidal fuzzy numbers and then to calculate the degree of similarity between two generalized trapezoidal fuzzy numbers. Let $A = (m_{1a}, m_{2a}, \alpha_a, \beta_a; w_A)_{GT}$ and $B = (m_{1b}, m_{2b}, \alpha_b, \beta_b; w_B)_{GT}$ with $m_{1a} - \alpha_a \ge 0$, $m_{2a} + \beta_a \le 1$, and $m_{1b} - \alpha_b \ge 0$, $m_{2b} + \beta_b \le 1$. Using the SCGM, we can obtain the COG points $COG(A)$ and $COG(B)$ of $A$ and $B$, respectively, where $COG(A) = (x_A^*, y_A^*)$ and $COG(B) = (x_B^*, y_B^*)$ are defined as follows:

$$y_A^* = \begin{cases} \dfrac{w_A \times \left(\dfrac{m_{2a} - m_{1a}}{(m_{2a} + \beta_a) - (m_{1a} - \alpha_a)} + 2\right)}{6}, & \text{if } m_{1a} - \alpha_a \ne m_{2a} + \beta_a \\[4mm] \dfrac{w_A}{2}, & \text{if } m_{1a} - \alpha_a = m_{2a} + \beta_a \end{cases}$$

$$x_A^* = \frac{y_A^*(m_{2a} + m_{1a}) + (m_{2a} + \beta_a + m_{1a} - \alpha_a)(w_A - y_A^*)}{2w_A}$$

Similarly, $(x_B^*, y_B^*)$ is defined for $B$. Then, the degree of similarity $S_{CC}(A, B)$ between $A$ and $B$ can be calculated as follows:

$$S_{CC}(A, B) = \left(1 - \frac{\|A - B\|}{4}\right) \times \big(1 - |x_A^* - x_B^*|\big)^{B(S_A, S_B)} \times \frac{\min\{y_A^*, y_B^*\}}{\max\{y_A^*, y_B^*\}}$$

where $\|A - B\| = |(m_{1a} - \alpha_a) - (m_{1b} - \alpha_b)| + |m_{1a} - m_{1b}| + |m_{2a} - m_{2b}| + |(m_{2a} + \beta_a) - (m_{2b} + \beta_b)|$. $B(S_A, S_B)$
was defined as follows:

$$B(S_A, S_B) = \begin{cases} 1, & \text{if } S_A + S_B > 0 \\ 0, & \text{if } S_A + S_B = 0 \end{cases}$$

where $S_A$ and $S_B$ are the lengths of the bases of $A$ and $B$, respectively, defined as

$$S_A = (m_{2a} + \beta_a) - (m_{1a} - \alpha_a), \qquad S_B = (m_{2b} + \beta_b) - (m_{1b} - \alpha_b)$$

To compare the proposed similarity measure $S(A, B)$ with $S_C(A, B)$, $S_L(A, B)$, $S_{HC}(A, B)$, and $S_{CC}(A, B)$, we give the following examples.

Example 1. Chen and Chen [9] used 12 sets of generalized trapezoidal fuzzy numbers to illustrate that the similarity measure $S_{CC}(A, B)$ is more reasonable than $S_C(A, B)$, $S_L(A, B)$, and $S_{HC}(A, B)$. In this example we also use these datasets, which are shown in Figure 1, to compare the proposed similarity measure $S(A, B)$ with the above similarity measures. The similarities for these generalized trapezoidal fuzzy numbers with $S(A, B)$, $S_C(A, B)$, $S_L(A, B)$, $S_{HC}(A, B)$, and $S_{CC}(A, B)$ are presented in Table I.

Figure 1. Twelve sets of fuzzy numbers.

Table I. Similarities of 12 datasets (12 sets of generalized trapezoidal fuzzy numbers).

Set       S_C(A,B)   S_L(A,B)   S_HC(A,B)   S_CC(A,B)   S(A,B)
Set 1     0.975      0.9167     1#          0.8357      0.8607
Set 2     1          1          1           1           1
Set 3     0.7#       0.5#       0.7692#     0.42        0.4444
Set 4     0.7#       0.5#       0.7692#     0.49        0.4493
Set 5     1#         1#         1#          0.8         0.9753
Set 6     1          *          1           1           1
Set 7     0.9        0#         0.909       0.9         0.7659
Set 8     0.9#       0.5        0.909#      0.54        0.7558
Set 9     0.9#       0.6667     0.909#      0.81        0.7866
Set 10    0.9#       0.8333     1#          0.9         0.8752
Set 11    0.9#       0.75       1#          0.72        0.7659
Set 12    0.9#       0.8        0.9375      0.78        0.8465

* The similarity measure cannot calculate the degree of similarity between two generalized trapezoidal fuzzy numbers.
# Incorrect results.

Based on Table I, many of the entries for $S_C$, $S_L$, and $S_{HC}$ are marked with the symbols "*" and "#", where "*" denotes that the similarity measure cannot calculate the degree of similarity between two generalized trapezoidal fuzzy numbers and "#" means incorrect results. Similar to the discussions of Chen and Chen [9], we find that the proposed similarity measure $S(A, B)$ can actually overcome the drawbacks of $S_C(A, B)$, $S_L(A, B)$, and $S_{HC}(A, B)$. Next, we use Examples 5.1 and 5.2 in Chen and Chen [9] to compare the proposed similarity measure $S(A, B)$ to $S_{CC}(A, B)$.

Table II. A nine-member linguistic term set.

Linguistic term   GTFN                                  S_CC(L_i,R_1)  S(L_i,R_1)  S_CC(L_i,R_2)  S(L_i,R_2)
Absolutely-low    L_1 = (0, 0, 0, 0; 1)_GT              0.1565         0.1663      0.0927         0.1717
Very-low          L_2 = (0, 0.02, 0, 0.05; 1)_GT        0.1962         0.1789      0.1366         0.1841
Low               L_3 = (0.1, 0.18, 0.06, 0.05; 1)_GT   0.3226         0.2560      0.2070         0.2657
Fairly-low        L_4 = (0.22, 0.36, 0.05, 0.06; 1)_GT  0.5092         0.3519      0.3254         0.3779
Medium            L_5 = (0.41, 0.58, 0.09, 0.07; 1)_GT  0.7056ℓ        0.6209ℓ     0.4683ℓ        0.6459ℓ
Fairly-high       L_6 = (0.63, 0.80, 0.05, 0.06; 1)_GT  0.5828         0.4326      0.4318         0.4609
High              L_7 = (0.78, 0.92, 0.06, 0.05; 1)_GT  0.4545         0.2646      0.3395         0.3144
Very-high         L_8 = (0.98, 1, 0.05, 0; 1)_GT        0.2937         0.1648      0.2591         0.1944
Absolutely-high   L_9 = (1, 1, 0, 0; 1)_GT              0.2391         0.1538      0.1816         0.1813

ℓ Largest value.

Example 2. In Example 5.1 of Chen and Chen [9], a nine-member linguistic term set based on Chen [13] is used to represent the linguistic terms. The linguistic terms and their corresponding generalized trapezoidal fuzzy numbers (GTFNs) of Ref. 13 are listed in Table II. According to the fuzzy risk analysis method in Ref. 9, the total risk $R_1$ was given by $(0.2683, 0.7052, 0.1069, 0.3898; 1)_{GT}$ (see p. 53 in Ref. 9). Using Chen and Chen's similarity measure $S_{CC}(L_i, R_1)$ and our proposed similarity measure $S(L_i, R_1)$, we obtain the degrees of similarity between $R_1$ and these linguistic terms. The resulting similarity values are also presented in Table II. Based on Table II, we find that $S_{CC}(R_1, \text{medium}) = 0.7056$ has the largest value.
Then the generalized trapezoidal fuzzy number $R_1$ is translated into the linguistic term "medium." On the other hand, we also find that $S(R_1, \text{medium}) = 0.6209$ has the largest value. Then the generalized trapezoidal fuzzy number $R_1$ is also translated into the linguistic term "medium." In Example 5.2 of Ref. 9, the total risk was $R_2 = (0.2889, 0.7497, 0.1044, 0.3997; 0.7)_{GT}$ (see p. 54 in Ref. 9). Using the same method, we obtain the degrees of similarity between $R_2$ and the nine linguistic terms. Table II shows the results of these similarity values. From Table II, we also find that $S_{CC}(R_2, \text{medium}) = 0.4683$ and $S(R_2, \text{medium}) = 0.6459$ are the largest values of their respective measures.

From Examples 1 and 2, we can see that the proposed similarity measure is as good as Chen and Chen's [9] similarity measure. Among LR-type fuzzy numbers, normal fuzzy numbers are also commonly used. Next, we will focus on normal fuzzy numbers. If

$$L(x) = R(x) = \exp\left(-\left(\frac{x - m}{\gamma}\right)^2\right)$$

for an LR-type fuzzy number $X$, then $X$ is called a normal fuzzy number, denoted by $X = (m, \gamma)_N$, that is,

$$\mu_X(x) = \exp\left(-\left(\frac{x - m}{\gamma}\right)^2\right), \quad -\infty < x < \infty$$

Based on the metric $d_{LR}$ defined in Section 2, with $\ell = r = \sqrt{\pi}/2$ and both spreads equal to $\gamma$, we can obtain $d_{LR}(X, Y)$ for any two normal fuzzy numbers $X = (m_x, \gamma_x)_N$ and $Y = (m_y, \gamma_y)_N$ as follows:

$$\begin{aligned}
d_{LR}^2(X, Y) &= (m_x - m_y)^2 + \left(\left(m_x - \frac{\sqrt{\pi}}{2}\gamma_x\right) - \left(m_y - \frac{\sqrt{\pi}}{2}\gamma_y\right)\right)^2 + \left(\left(m_x + \frac{\sqrt{\pi}}{2}\gamma_x\right) - \left(m_y + \frac{\sqrt{\pi}}{2}\gamma_y\right)\right)^2 \\
&= 3(m_x - m_y)^2 + \frac{\pi}{2}(\gamma_x - \gamma_y)^2
\end{aligned}$$

Example 3. Let us consider the normal fuzzy numbers $X = (0, 1)_N$, $Y = (0.5, 1)_N$, and $Z = (1, 1)_N$ shown in Figure 2. From Figure 2, the normal fuzzy number $X$ is closer to $Y$ than to $Z$, so that it is more similar to $Y$ than to $Z$. According to our similarity measure, we have the degrees of similarity between these normal fuzzy numbers as follows:

$$S(X, Y) = 0.3012, \quad S(Y, Z) = 0.3012, \quad S(X, Z) = 0.0907$$

Because $S(X, Y) > S(X, Z)$, we can see that the normal fuzzy number $X$ is more similar to $Y$ than to the normal fuzzy number $Z$.
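The values in Example 3 and the "Medium" row of Table II can be checked with a short script. The following is only a minimal sketch, not the authors' implementation: the function names `similarity_lr` and `similarity_gt` are hypothetical, the divisor of the second term of $s$ is taken to be $2^3$ for LR-type numbers and $2^4$ for trapezoidal ones, and for a generalized trapezoidal number with height $w$ it is assumed that $\ell = r = \int_0^1 (1 - \omega/w)\,d\omega = 1 - 1/(2w)$. Under these assumptions the sketch reproduces the reported values.

```python
import math

# ell = r for L(x) = R(x) = exp(-x^2): integral of sqrt(-ln w) over [0, 1]
SQRT_PI_HALF = math.sqrt(math.pi) / 2


def similarity_lr(a, b, ell, r):
    """Sketch of S(A, B) for LR-type numbers given as (m, alpha, beta)."""
    if a == b:
        return 1.0
    (ma, aa, ba), (mb, ab, bb) = a, b
    d_lo = (ma - ell * aa) - (mb - ell * ab)  # |d_lo| plays the role of D_*
    d_hi = (ma + r * ba) - (mb + r * bb)      # |d_hi| plays the role of D^*
    d2 = (ma - mb) ** 2 + d_lo ** 2 + d_hi ** 2
    s = (abs(d_lo) + abs(d_hi)) / 2 + (
        abs((ma - aa) - (mb - ab)) + abs((ma + ba) - (mb + bb))
    ) / 2 ** 3
    if s == 0:  # degenerate case (e.g. ell = r = 1); treated as identical
        return 1.0
    return math.exp(-d2 / s)


def similarity_gt(a, b):
    """Sketch of S(A, B) for numbers given as (m1, m2, alpha, beta, w)."""
    if a == b:
        return 1.0
    m1a, m2a, aa, ba, wa = a
    m1b, m2b, ab, bb, wb = b
    # Assumption: ell = r = 1 - 1/(2w), which equals 1/2 when w = 1.
    la, lb = 1 - 1 / (2 * wa), 1 - 1 / (2 * wb)
    d_lo = (m1a - la * aa) - (m1b - lb * ab)
    d_hi = (m2a + la * ba) - (m2b + lb * bb)
    d2 = ((m1a - m1b) ** 2 + (m2a - m2b) ** 2) / 2 + d_lo ** 2 + d_hi ** 2
    s = (abs(d_lo) + abs(d_hi)) / 2 + (
        abs((m1a - aa) - (m1b - ab)) + abs((m2a + ba) - (m2b + bb))
    ) / 2 ** 4
    if s == 0:  # degenerate case; treated as identical
        return 1.0
    return math.exp(-d2 / s)


# Example 3: normal fuzzy numbers written as (m, gamma, gamma)
X, Y, Z = (0, 1, 1), (0.5, 1, 1), (1, 1, 1)
print(round(similarity_lr(X, Y, SQRT_PI_HALF, SQRT_PI_HALF), 4))  # 0.3012
print(round(similarity_lr(X, Z, SQRT_PI_HALF, SQRT_PI_HALF), 4))  # 0.0907

# Table II, "Medium" row
L5 = (0.41, 0.58, 0.09, 0.07, 1)
R1 = (0.2683, 0.7052, 0.1069, 0.3898, 1)
R2 = (0.2889, 0.7497, 0.1044, 0.3997, 0.7)
print(round(similarity_gt(L5, R1), 4))  # 0.6209
print(round(similarity_gt(L5, R2), 4))  # 0.6459
```

The two functions differ only in the mean-value term of $d_{LR}^2$ and in the divisor of the second term of $s$, which mirrors cases (i) and (ii) of the definition in Section 2.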
If we apply Chen and Chen's [8,9] similarity measure, we cannot calculate the degrees of similarity between these normal fuzzy numbers. In fact, our proposed similarity measure $S(A, B)$ takes the $L$ (or $R$) shape of the LR-type fuzzy numbers into account. Its form is simple, involving only an exponential-type distance. The similarity measure $S_{CC}$ defined by Chen and Chen [8,9] does not take the $L$ (or $R$) shape into account. Moreover, the form of $S_{CC}$ is very complicated and difficult to interpret. This is why we propose this simple similarity measure for LR-type fuzzy numbers. In applications, it could be applied to data retrieval tasks such as database queries, data mining, web mining, and so forth. In the next section, we apply the proposed similarity measure to a database query with compound attributes.

Figure 2. Three normal-type fuzzy numbers.

4. APPLICATIONS TO A DATABASE QUERY WITH COMPOUND ATTRIBUTES

Tahani [23] proposed the use of fuzzy sets for querying regular databases with a conceptual framework as early as 1977. Afterward, Kacprzyk and Ziolkowski [24] presented database queries with fuzzy linguistic quantifiers. Fuzzy databases subsequently were widely studied (see Ref. 25). An important approach to database querying is based on similarity measures (see Ref. 6). To make database querying with fuzzy queries friendlier, Nomura et al. [26] proposed a method to generate compound attributes, that is, ambiguous attributes that are not defined in the original database schema but can be derived from multiple rigid attributes in the schema. Recently, Wang and Tsai [16] presented an approach for handling null queries on the basis of generating compound attributes from fuzzy numbers and fuzzy trapezoidal numbers. They then used similarity measures to define the degrees of similarity between these fuzzy numbers.
This kind of database management system for handling compound attributes in null queries is able to reduce the occurrences of null answers and also provide a user-friendly query environment. In this section, we apply the similarity measure proposed in Section 2 to these compound attributes. We then make comparisons to the method proposed in Ref. 16.

Wang and Tsai [16] generated compound attributes from fuzzy numbers and fuzzy intervals. For example, a compound attribute "Size" for a person can be generated from two fuzzy numbers "Height" and "Weight" such that it is used to represent semantic (or intuitive) meanings in the database. They proceeded as follows. The first step is to fuzzify numerical or interval-valued rigid attributes into fuzzy numbers or fuzzy trapezoidal numbers. The second step is to select a suitable aggregation function and similarity measure. Because Wang and Tsai [16] chose discrete-type similarity measures, they were forced to discretize these fuzzy attributes. The third step involves measuring the degree of similarity between the compound attribute and the query statement. If the similarity measure is greater than the assigned threshold, it is considered an answer to the query. Otherwise it is not considered an answer to the query.

In the first step, Wang and Tsai [16] first used an S-function to convert all rigid attributes into the same unit-interval range $[0,1]$ so that they could be compounded under the same domain and scale. They then fuzzified all rigid attributes into LR-type fuzzy numbers with bell shapes. In the second step, they used discrete-type similarity measures between two fuzzy sets, so that it was necessary to discretize the LR-type fuzzy numbers first and then choose the better aggregate function and similarity measure for compound attributes. Our method directly defines the similarity measure and the aggregate operator for LR-type fuzzy numbers.
It is not necessary to discretize LR-type fuzzy numbers in our method, which means that it does not lose any information included in the fuzzy numbers.

In our proposed method, we assigned three fuzzy terms {Short, Normal, Tall} for the rigid attribute "Height" as the three triangular fuzzy numbers Short $= (0, 0, 0.5)_T$, Normal $= (0.5, 0.5, 0.5)_T$, and Tall $= (1, 0.5, 0)_T$. The rigid attribute "Weight" is assigned using four triangular fuzzy numbers {Light, Average, Heavy, Fat} with Light $= (0, 0, \tfrac{1}{3})_T$, Average $= (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})_T$, Heavy $= (\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3})_T$, and Fat $= (1, \tfrac{1}{3}, 0)_T$. The compound attribute called "Size" is assigned using three triangular fuzzy numbers {Small, Average, Big} with Small $= (0, 0, 0.5)_T$, Average $= (0.5, 0.5, 0.5)_T$, and Big $= (1, 0.5, 0)_T$. These attributes are shown in Figure 3.

Figure 3. Triangular fuzzy numbers for attributes Height, Weight, and Size.

We used the following aggregate operation $\oplus$:

$$(m_x, \alpha_x, \beta_x)_{LR} \oplus (m_y, \alpha_y, \beta_y)_{LR} = \left(\frac{m_x + m_y}{2}, \frac{\alpha_x + \alpha_y}{2}, \frac{\beta_x + \beta_y}{2}\right)_{LR}$$

Thus, we generated the compound tuples from the rigid attributes "Height" and "Weight" as shown in Table III. We then used the similarity measure $S(A, B)$ proposed in Section 2 to find the similarity measures between the compound tuples and the compound attribute "Size." The results are shown in Table IV. When we compare the results in Table IV with Ref. 16, the results from our method are better than Wang and Tsai's [16] results.

We applied the proposed method to a real dataset. The real data are from a report from the National College of Physical Education and Sports, Taiwan, on health physical fitness norms for Taiwan residents [27]. This dataset includes the statistical data for height and weight, with means and variances, for residents in the Greater Taiwan area from ages 6 to 65.
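The aggregation $\oplus$ and the similarity values reported for the compound tuples can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions rather than the authors' program: the names `aggregate`, `similarity_triangular`, and `best_size` are hypothetical, triangular numbers are stored as `(m, alpha, beta)` tuples, $\ell = r = 1/2$ is used for the triangular shape, and the second term of $s$ is divided by $2^3$.

```python
import math
from fractions import Fraction as F

# Fuzzy terms as triangular numbers (m, alpha, beta), taken from the text.
HEIGHT = {"Short": (F(0), F(0), F(1, 2)),
          "Normal": (F(1, 2), F(1, 2), F(1, 2)),
          "Tall": (F(1), F(1, 2), F(0))}
WEIGHT = {"Light": (F(0), F(0), F(1, 3)),
          "Average": (F(1, 3), F(1, 3), F(1, 3)),
          "Heavy": (F(2, 3), F(1, 3), F(1, 3)),
          "Fat": (F(1), F(1, 3), F(0))}
SIZE = {"Small": (F(0), F(0), F(1, 2)),
        "Average": (F(1, 2), F(1, 2), F(1, 2)),
        "Big": (F(1), F(1, 2), F(0))}


def aggregate(x, y):
    """Componentwise average of two (m, alpha, beta) tuples."""
    return tuple((p + q) / 2 for p, q in zip(x, y))


def similarity_triangular(a, b):
    """Sketch of S(A, B) for triangular numbers, with ell = r = 1/2."""
    if a == b:
        return 1.0
    (ma, aa, ba), (mb, ab, bb) = a, b
    half = F(1, 2)
    d_lo = (ma - half * aa) - (mb - half * ab)
    d_hi = (ma + half * ba) - (mb + half * bb)
    d2 = (ma - mb) ** 2 + d_lo ** 2 + d_hi ** 2
    s = (abs(d_lo) + abs(d_hi)) / 2 + (
        abs((ma - aa) - (mb - ab)) + abs((ma + ba) - (mb + bb))
    ) / 8
    return math.exp(-float(d2 / s))


def best_size(height, weight):
    """Return the Size term most similar to the compound tuple, plus all scores."""
    compound = aggregate(HEIGHT[height], WEIGHT[weight])
    scores = {name: similarity_triangular(compound, term)
              for name, term in SIZE.items()}
    return max(scores, key=scores.get), scores


label, scores = best_size("Short", "Light")
print(aggregate(HEIGHT["Short"], WEIGHT["Light"]))  # (0, 0, 5/12) as Fractions
print(label, {k: round(v, 3) for k, v in scores.items()})
# Small {'Small': 0.946, 'Average': 0.275, 'Big': 0.089}
```

Exact rational arithmetic via `Fraction` is used only so that the compound tuples print as the same fractions that appear in the table of compound tuples; plain floats would give the same similarity values to three decimals.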
For an approximately normal distribution, the probability of falling within two standard deviations of the mean is around 0.95. Thus, we set up triangular fuzzy numbers for the attributes "Height" and "Weight" so that the mean corresponds to the value 0.5, and values at or beyond two standard deviations below and above the mean correspond to 0 and 1, respectively. Suppose that the sample means and the sample standard deviations for "Height" and "Weight" are $m_h$, $m_w$ and $s_h$, $s_w$.

Table III. Compound tuples of (Height, Weight).

Height \ Weight   Light                 Average                Heavy                  Fat
Short             (0, 0, 5/12)_T        (1/6, 1/6, 5/12)_T     (1/3, 1/6, 5/12)_T     (1/2, 1/6, 1/4)_T
Normal            (1/4, 1/4, 5/12)_T    (5/12, 5/12, 5/12)_T   (7/12, 5/12, 5/12)_T   (3/4, 5/12, 1/4)_T
Tall              (1/2, 1/4, 1/6)_T     (2/3, 5/12, 1/6)_T     (5/6, 5/12, 1/6)_T     (1, 5/12, 0)_T

Table IV. Similarity measures S((Height, Weight), Size) between compound tuples and the compound attribute "Size" based on the proposed similarity measure.

(Height, Weight)     Small    Average   Big
(Short, Light)       0.946    0.275     0.089
(Short, Average)     0.644    0.421     0.134
(Short, Heavy)       0.449    0.634     0.194
(Short, Fat)         0.300    0.820     0.291
(Normal, Light)      0.523    0.521     0.163
(Normal, Average)    0.342    0.792     0.239
(Normal, Heavy)      0.239    0.792     0.342
(Normal, Fat)        0.163    0.521     0.523
(Tall, Light)        0.291    0.820     0.300
(Tall, Average)      0.194    0.634     0.449
(Tall, Heavy)        0.134    0.421     0.644
(Tall, Fat)          0.089    0.275     0.946

On the basis of the health physical fitness norm report executed by the National College of Physical Education and Sports, Taiwan (see Ref. 27), we have the sample means and sample standard deviations shown in Table V. Because we used fuzzy numbers for "Height" and "Weight" according to Figure 3, the statistical data for height and weight were transformed into fuzzy terms according to the following rules:

Height:
If "height" $\le m_h - s_h$, then the person is Short;
If $m_h - s_h <$ "height" $< m_h + s_h$, then the person is Normal;
If "height" $\ge m_h + s_h$, then the person is Tall.

Weight:
If "weight" $\le m_w - \tfrac{4}{3}s_w$, then the person is Light;
If $m_w - \tfrac{4}{3}s_w <$ "weight" $\le m_w$, then the person is Average;
If $m_w <$ "weight" $\le m_w + \tfrac{4}{3}s_w$, then the person is Heavy;
If "weight" $> m_w + \tfrac{4}{3}s_w$, then the person is Fat.

For example, consider a 6-year-old boy with a height of 113 cm and a weight of 18 kg. Because 6-year-old boys have $m_h = 119.5$ cm and $s_h = 6.34$ cm, so that $m_h - s_h = 113.16$ cm, the boy is "Short." Because 6-year-old boys have $m_w = 24.1$ kg and $s_w = 4.48$ kg, so that $m_w - \tfrac{4}{3}s_w = 18.13$ kg, the boy is "Light." From Table IV, the similarity S((Short, Light), Small) = 0.946 is the largest among S((Short, Light), Small), S((Short, Light), Average), and S((Short, Light), Big). Thus the size of the 6-year-old boy is "Small." In this way, if we key in the height and weight of a person to the database, it can respond with "Size." Moreover, if we give a null query with "Size," the database will generate the response for the query. In general, these applications can also be used in fuzzy queries to various databases.

Table V. Health physical fitness norm report with means $m_h$, $m_w$ and standard deviations $s_h$, $s_w$ of height and weight.

         Men                                       Women
Age      m_h (cm)  s_h (cm)  m_w (kg)  s_w (kg)    m_h (cm)  s_h (cm)  m_w (kg)  s_w (kg)
6        119.5     6.34      24.1      4.48        117.2     6.57      22.7      4.35
7        126.8     5.41      28.7      6.68        124.9     6.36      25.6      5.33
8        131.5     6.60      30.7      7.71        130.7     5.96      29.2      5.20
9        135.3     7.77      32.6      7.08        136.1     6.20      32.6      5.99
10       140.4     7.46      37.3      9.08        143.2     7.04      38.2      8.48
11       147.9     9.36      43.3      11.93       149.2     7.19      41.9      8.88
12       154.1     10.24     47.9      11.77       152.5     7.08      44.9      7.84
13       161.5     8.67      53.3      11.57       157.1     5.90      50.2      8.18
14       165.7     8.14      55.8      9.40        157.1     5.49      50.8      9.68
15       170.4     5.81      61.9      10.61       159.2     5.05      52.8      8.41
16       170.6     6.63      63.3      12.14       159.3     5.42      54.8      8.59
17       171.6     6.69      64.9      11.00       157.8     7.10      52.2      7.61
18       171.6     5.59      64.5      9.52        158.9     5.74      53.3      8.62
19       172.0     6.60      67.8      9.77        158.7     4.46      52.1      6.30
20–25    172.4     5.54      67.5      9.96        159.8     5.18      52.9      6.88
26–30    170.9     6.06      69.2      9.71        158.7     5.60      53.9      7.70
31–35    169.4     5.63      69.7      9.55        157.6     5.51      54.7      7.35
36–40    168.8     5.79      68.9      9.45        157.2     5.58      56.3      7.82
41–45    167.8     5.99      68.8      9.99        156.0     5.45      56.4      7.63
46–50    166.8     5.58      69.6      8.86        154.9     4.90      57.6      8.12
51–55    165.3     6.22      68.1      8.55        154.2     5.10      58.4      8.12
56–60    165.4     5.51      67.5      9.54        154.1     5.09      58.0      7.26
61–65    164.7     6.23      68.2      9.81        153.9     5.50      58.0      8.02

5. CONCLUSIONS

Similarity measures are used for presenting degrees of similarity between objects or concepts. These measures can be widely applied in various areas such as clustering, control systems, database queries, and so forth. Since the idea of fuzzy sets was proposed by Zadeh [2] in 1965, fuzziness has been widely used for real-world systems. LR-type fuzzy numbers are the most commonly used to present fuzziness. We proposed a new similarity measure for LR-type fuzzy numbers. The proposed similarity measure can overcome the drawbacks of the existing similarity measures. We then applied the proposed method to a fuzzy database query with compound attributes. Comparisons to Wang and Tsai [16] were made.
A real data application was also presented. In future research we will apply the proposed method to data mining and web mining.

References

1. Kaufman L, Rousseeuw PJ. Finding groups in data. New York: Wiley; 1990.
2. Zadeh LA. Fuzzy sets. Inform Control 1965;8:338–356.
3. Zwick R, Carlstein E, Budescu DV. Measures of similarity among fuzzy concepts: A comparative analysis. Int J Approx Reas 1987;1:221–242.
4. Turksen IB, Zhong Z. An approximate analogical reasoning approach based on similarity measures. IEEE Trans Syst Man Cybern 1988;18:1049–1056.
5. Pappis CP, Karacapilidis NI. A comparative assessment of measures of similarity of fuzzy values. Fuzzy Set Syst 1993;56:171–174.
6. Candan KS, Li WS, Priya ML. Similarity-based ranking and query processing in multimedia databases. Data Knowl Eng 2000;35:259–298.
7. Yang MS, Shih HM. Cluster analysis based on fuzzy relations. Fuzzy Set Syst 2001;120:197–212.
8. Chen SJ, Chen SM. A new method to measure the similarity between fuzzy numbers. In: Proc 10th IEEE Int Conf on Fuzzy Systems, Melbourne; 2001. pp 73–76.
9. Chen SJ, Chen SM. Fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. IEEE Trans Fuzzy Syst 2003;11:45–56.
10. Pal NR, Pal SK. Entropy: A new definition and its applications. IEEE Trans Syst Man Cybern 1991;21:1260–1270.
11. Pal NR, Pal SK. Some properties of the exponential entropy. Inform Sci 1992;66:119–137.
12. Wu KL, Yang MS. Alternative c-means clustering algorithms. Pattern Recogn 2002;35:2267–2278.
13. Chen SM. New methods for subjective mental workload assessment and fuzzy risk analysis. Cybern Syst 1996;27:449–472.
14. Lee HS. An optimal aggregation method for fuzzy opinions of group decision. In: Proc 1999 IEEE Int Conf Systems, Man, Cybernetics, Vol 3; 1999. pp 314–319.
15. Hsieh CH, Chen SH. Similarity of generalized fuzzy numbers with graded mean integration representation. In: Proc 8th Int Fuzzy Systems Association World Congress, Taipei, Vol 2; 1999. pp 551–555.
16. Wang SL, Tsai YJ. Generating compound attributes from fuzzy data for null queries. Intell Autom Soft Comput 2001;7:1–8.
17. Zimmermann HJ. Fuzzy set theory and its applications. Dordrecht, The Netherlands: Kluwer; 1991.
18. Chen SH. Operations on fuzzy numbers with function principle. Tamkang J Manag Sci 1985;6:13–25.
19. Chen SH. Ranking generalized fuzzy number with graded mean integration. In: Proc 8th Int Fuzzy Systems Association World Congress, Taipei, Vol 2; 1999. pp 899–902.
20. Fan J, Xie W. Some notes on similarity measure and proximity measure. Fuzzy Set Syst 1999;101:403–412.
21. Yang MS, Ko CH. On a class of fuzzy c-numbers clustering procedures for fuzzy data. Fuzzy Set Syst 1996;84:49–60.
22. Yang MS, Ko CH. On cluster-wise fuzzy regression analysis. IEEE Trans Syst Man Cybern B 1997;27:1–13.
23. Tahani V. A conceptual framework for fuzzy query processing: A step toward very intelligent database systems. Inform Process Manag 1977;12:289–303.
24. Kacprzyk J, Ziolkowski A. Database queries with fuzzy linguistic quantifier. IEEE Trans Syst Man Cybern 1986;16:474–479.
25. Petry FE. Fuzzy databases: Principles and applications. Dordrecht, The Netherlands: Kluwer; 1996.
26. Nomura T, Odaka T, Ohki N, Yokoyama T, Matsushita Y. Generating ambiguous attributes for fuzzy queries. In: Proc 1992 IEEE Int Conf on Fuzzy Systems; 1992. pp 753–760.
27. Health Physical Fitness Norm Report, executed by National College of Physical Education and Sports, Taiwan. Published by National Council on Physical Fitness and Sports, Taiwan, ROC; 1999.