Functional Dependencies in Fuzzy Databases

Brian Hartlieb

ABSTRACT
Integrity constraints play a critical role in logical database design, and among them data dependencies are of the most interest. One of the most important data dependencies in relational databases is the functional dependency, which represents the dependency relationships among attribute values in a relation. We examine the two most popular frameworks for extending classical functional dependencies into fuzzy functional dependencies, along with some of the extensions made to the classical relational data model to incorporate fuzzy logic. While the frameworks’ foundational premises differ, their similarities are strong enough that a unified model may emerge in the future.

Keywords
functional dependency, fuzzy functional dependency, relational database, fuzzy database.

1. INTRODUCTION
This paper deals with the application of fuzzy logic in a relational database environment, with the objective of showing how the definition of the classical functional dependency (FD) has been extended into a fuzzy functional dependency (FFD). The paper is organized as follows. Section 2 covers some of the basic definitions and concepts of fuzzy logic. Section 3 introduces the two major classes of fuzzy relational data models. Sections 4 and 5 describe the two main fuzzy functional dependency frameworks. Finally, Section 6 offers some concluding remarks.

2. BACKGROUND
Databases are one form of modeling the real world, a world that is imprecise and vague. Database models can be either precise or imprecise, and query languages are designed to express the user’s retrieval requests either in a crisp manner or not. Most database models today are crisp. A crisp database model is one that is highly quantifiable: all relationships are fixed and every attribute has exactly one value. The classical approach to uncertainty in databases is to reduce retrieval to 3-value logic, in which each database object is surely, maybe, or surely-not in the response to a query.

Several extensions have been made to the relational data model to capture the imprecise parts of the real world. In general there are two approaches. They differ mainly in the method they use, while both are based on the work of a single man, Lotfi A. Zadeh, the father of fuzzy logic. The first is the similarity-based approach, first introduced by Zadeh in 1970. [1] The second approach was introduced later by Zadeh in 1978, [2] when he explained how a possibility distribution can be used in conjunction with his earlier work on fuzzy sets. Zadeh’s initial work introduced a theory whose objects, fuzzy sets, are sets with imprecise boundaries; membership in a fuzzy set is not a matter of true or false, but rather a matter of degree. This concept was called fuzziness and the theory was called fuzzy set theory.

While there are many types of uncertainty, fuzzy set theory attempts to address only one aspect of it. It would seem beneficial to database systems if they could incorporate fuzzy logic. Databases that hold both crisp and non-crisp data can benefit from a query language that uses fuzzy logic. A branch of fuzzy set theory is the use of “hedging” adjectives. A query language that supports these adjectives gives the user a more natural language than traditional SQL. It could be said that the purpose of a fuzzy database is to provide an intelligent interface to a relational database system by facilitating approximate query handling and producing approximate answers when exact data may be unavailable. [3]

As an example, consider a student record database system. We want to find bright and young students in the whole batch. For a crisp system we would specify the query as

PROJECT (Student_Name) WHERE 19 ≤ AGE ≤ 23 and 3 ≤ GPA ≤ 4
But this system has a major flaw. Consider a student, Bob, whose age is 24 and who has a good GPA of 4 out of 4. He should have been selected but is not, because of the rigid boundary conditions set by the crisp logic of the query. In fuzzy logic, we would specify two fuzzy sets, YOUNG (Fig. 1a) and GPA (Fig. 1b), and each student will have some membership grade associated with the two sets. Accordingly, Bob will have a non-zero membership grade, although it will be less than that of students in the age group 19-23. Hence Bob will now be included in the result set to be considered. Bob now satisfies the query to some extent, which is represented by his membership grade. [4]

Figure 1. (a) Young, membership over AGE (axis marks 18, 19, 23, 24); (b) GPA, membership over GPA (axis marks 3, 3.5, 4)

2.1 Fuzzy Definitions
When A is a fuzzy set and x is a relevant object, the proposition “x is a member of A” is not necessarily either true or false, as two-valued logic requires; it may be true only to some degree. The degree to which x is actually a member of A is a real number in the interval [0, 1]. Theoretically, if X is a collection of objects denoted generically by x, then a fuzzy set F in X is a set of ordered pairs,

F = {(x, µF(x)) | x ∈ X}

where µF(x) is called the membership function (or grade of membership) of x in F and maps X to the membership space M. The range of the membership function is a subset of the nonnegative real numbers whose supremum is finite.

21st Computer Science Seminar SE1-T3-1

2.2 Fuzzy Set Operators and Fuzzy Logic
For crisp sets, the basic operations are union (OR), intersection (AND), and complement (NOT). Fuzzy sets have fuzzy operators that allow the manipulation of the fuzzy sets. There are fuzzy complement, intersection and union operators, but they are not uniquely defined. However, there is an important distinction between traditional set logic and fuzzy set theory.
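The student example above can be made concrete with a minimal Python sketch. The trapezoidal shape and the breakpoints used for YOUNG and for the GPA set are assumptions chosen for illustration (they are not read off Figure 1), the student Alice is hypothetical, and min is used as the fuzzy AND, which is only one of the possible operator choices.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def young(age):
    # Assumed breakpoints: fully young from 19 to 23, fading out by 26.
    return trapezoid(age, 17, 19, 23, 26)


def good_gpa(gpa):
    # Assumed breakpoints: rises from 3.0 to 3.5, stays at 1 up to 4.0.
    return trapezoid(gpa, 3.0, 3.5, 4.0, 4.0)


def crisp_match(age, gpa):
    """The crisp query: 19 <= AGE <= 23 and 3 <= GPA <= 4."""
    return 19 <= age <= 23 and 3 <= gpa <= 4


def fuzzy_match(age, gpa):
    """Degree to which a student is both young and has a good GPA (min as AND)."""
    return min(young(age), good_gpa(gpa))


# Bob comes from the text; Alice is a hypothetical student inside the crisp range.
students = {"Bob": (24, 4.0), "Alice": (21, 3.8)}
for name, (age, gpa) in students.items():
    print(name, crisp_match(age, gpa), round(fuzzy_match(age, gpa), 2))
```

Under these assumed breakpoints Bob fails the crisp test but still receives a non-zero grade that is lower than Alice's, which is the behavior the example describes.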
In traditional set theory there is a distinction between the union operation on sets and the OR operator, as well as between the intersection operation and the AND operator. But in fuzzy theory there is no such distinction between the logical and set operators:

Fuzzy union ≡ Fuzzy OR
Fuzzy intersection ≡ Fuzzy AND
Fuzzy complement ≡ Fuzzy NOT

Some standard fuzzy operations are:

Fuzzy Complement, ~A(x) = 1 - A(x)
Fuzzy Union, (A∪B)(x) = max[A(x), B(x)]
Fuzzy Intersection, (A∩B)(x) = min[A(x), B(x)]

Data can be classified as crisp when there is no vagueness in the information (e.g., X = 13). With fuzzy data there is vagueness in the information, and this can be further divided into two types:

(1) Approximate Value: The information is not totally vague; some approximate value is known and the data lies near that value (e.g., 10 < X < 15). Such values are considered to have a triangular possibility distribution, as in Figure 2. The parameter d gives the range around which the information value lies.

Figure 2. Possibility Distribution for an approximate value. (Approximately X)

(2) Linguistic Variable: A linguistic variable is a variable that, apart from representing a fuzzy number, also represents linguistic concepts interpreted in a particular context. Each linguistic variable is defined in terms of a variable that either has a physical interpretation (speed, weight, etc.) or is any other numerical variable (salary, absences, GPA, etc.). The information in this case is totally vague and we associate a fuzzy set with the information. A linguistic term is the name given to the fuzzy set (e.g., X is SMALL). Such terms are considered to have a trapezoidal possibility distribution, as in Figure 3.

Figure 3. Possibility Distribution for a Linguistic Term SMALL for the Linguistic Variable HEIGHT

There are four parameters associated with a linguistic term, a, b, c and d, as shown in Fig. 3. For the range [b, c] the membership value is 1.0, while for the ranges [a, b] and [c, d] the membership value lies within [0.0, 1.0]. This example introduces fuzzy sets and linguistic terms on the attribute domains and linguistic variables (e.g., on the attribute domain AGE we may define the fuzzy sets YOUNG, MIDDLE and OLD). [3]

Figure 4. Age (overlapping trapezoidal fuzzy sets YOUNG, MIDDLE and OLD, each with its own parameters a, b, c, d)

3. APPROACH AND METHODOLOGY
Many extensions have been developed for the fuzzy relational data model. These extensions can be classified into two categories: the similarity-based and the possibility-based models. In a similarity-based model, similarity relationships are specified for some attributes so that the values of these attributes may be grouped into similarity classes. Each similarity class contains values that are similar to each other to at least a given degree. Thus they are indistinct, and form an uncertain representation of a real-world value. In a possibility-based model, an ill-known data item is represented by a possibility distribution, which describes the possibility for each crisp attribute value to be the actual value of the data. In both types of models, membership degrees may be associated with tuples of a fuzzy relation.

Integrity constraints play a critical role in logical database design. Among these constraints, data dependencies are of most interest. Various types of data dependencies, such as functional and multivalued dependencies, are used to design classical relational schemas that are conceptually meaningful and free of certain anomalies. For example, if one attribute determines another, we say that there exists a functional dependency between these attributes. Functional dependencies in databases relate the values of one set of attributes to the values of another set.
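The classical notion that one set of attributes determines another can be sketched in Python as a check over a relation instance. The function name `holds_fd` and the sample employee rows are hypothetical; the check simply verifies that no two rows agree on the left-hand attributes while differing on the right-hand ones.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this key fixes the value; later rows must agree.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical crisp relation used only for illustration.
employees = [
    {"Name": "Ann", "Job": "Engineer", "Experience": 5, "Salary": 70000},
    {"Name": "Raj", "Job": "Engineer", "Experience": 5, "Salary": 70000},
    {"Name": "Lee", "Job": "Analyst", "Experience": 5, "Salary": 60000},
]

print(holds_fd(employees, ["Job", "Experience"], ["Salary"]))
print(holds_fd(employees, ["Experience"], ["Salary"]))
```

In this sample, Job and Experience together determine Salary, while Experience alone does not, because two rows share the same experience but have different salaries.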
Fuzzy functional dependencies can represent the dependency relationships among attribute values in fuzzy relations, such as “the salary almost depends on the job position and experience.” [5] We will review the two main approaches used to extend relational functional dependencies into fuzzy functional dependencies. These two approaches are the similarity-based approach and the possibility-based approach.

4. THE POSSIBILITY-BASED APPROACH
In a relational data model that can support imprecise information, it is necessary to accommodate two types of impreciseness: impreciseness in data values and impreciseness in the association among data values. As an example of impreciseness in data values, consider the Employee(Name, Salary) database, where the salary of an employee, John, may be known only to the extent that it lies in the range $60,000-80,000, or it may be known only that John has a “high salary.” Similarly, as an example of impreciseness in the association among data values, let Likes(Student, Course) represent how much a student likes a particular course. Here the data values may be precisely known, but the degree to which a student, John, likes the course DBMS is imprecise. It is not difficult to envision examples where ambiguity in data values and impreciseness in the association among them are both present. [6]

Fuzzy data is represented by possibility distributions, and a grade of membership is used to represent the association between values. This grade of membership may itself be a possibility distribution. Fuzzy similarity relations facilitate the estimation of the extent to which possible values of an attribute can be regarded as interchangeable. By introducing an extra element, e, on which a nonzero possibility means that the attribute may not be applicable, the traditional null value no longer has to mean that an attribute is completely unknown.
It has also been proposed that possibility distributions be used to represent fuzzy values as well as uncertainty concerning the value of an attribute. [5]

Depending on the complexity of dom(Ai), i = 1, ..., n, we classify fuzzy relations into two categories. In type-1 fuzzy relations, dom(Ai) can only be a fuzzy set (or a classical set). A type-1 fuzzy relation may be considered a first-level extension of classical relations, in which we are able to capture the impreciseness in the association among entities. Type-2 fuzzy relations provide further generalization by allowing dom(Ai) to be even a set of fuzzy sets (or possibility distributions). By enlarging dom(Ai), type-2 relations enable us to represent a wider range of impreciseness in data values. Such relations can be considered a second-level generalization of classical relations. For example, type-2 allows a domain that holds both numerical and linguistic values; we may define the domain AGE as the positive integers together with functions for YOUNG, MIDDLE and OLD. The three functions will convert the values of YOUNG, MIDDLE and OLD to numerical values. [6]

4.1 Fuzzy Integrity Constraints
The integrity constraints in relational database systems can be broadly classified into two groups:

(1) Domain dependency: Domain dependency restricts the admissible domain values of the attributes, e.g., “age of an employee is less than 65 years,” or “no one is 10 feet tall.”

(2) Data dependency: Data dependency requires that if some tuples in the database fulfill certain equalities, then either some other tuples must also exist in the database, or some values of the given tuples must be equal. [6]

As we generalize relational database systems to deal with fuzzy information, it will be necessary to consider integrity constraints that involve fuzzy constructs.
Thus in a relation PLAYERS(Name, Age, Height, Sport, Income), an integrity constraint may be stated as “Most basketball players are tall,” or “Many baseball players have high income.” These integrity constraints impose restrictions on the admissible values of height or income of the basketball or baseball players, respectively. Similarly, as an example of a fuzzy data dependency, consider the relation scheme EMPLOYEE(Name, Department, Job, Experience, Salary), where an integrity constraint may be stated as “in any department employees having similar jobs and experience must have almost equal salary.” [6]

4.2 Fuzzy Functional Dependencies
In the fuzzy domain, equality of domain values defines a fuzzy proposition and may even be specified as “approximately equal,” “more or less equal,” etc. For instance, a fuzzy data dependency in the relation EMPLOYEE(Name, Job, Experience, Salary) can be stated as “Job and Experience more or less determine Salary.” [6]

There are two families of approaches in which possibility-based attribute values are compared for “equality”. The classification is based on the nature of the comparison between two ill-known values. In one approach the comparison is made in terms of representations (i.e., the result is the degree to which the two underlying fuzzy sets are equal). One interpretation of this FFD is: “when two tuples have the same value (or representation) on X, they should have the same value (or representation) on Y”. This FFD may admit tuples whose X-representations share some more or less possible values while their Y-representations do not share a single value. In the other approach the comparison is made in terms of values (i.e., the result is the degree of possibility of the equality between two ill-known values). The interpretation of this FFD is: “when two tuples have the same value (or representation) on X, they should have the same value (or representation) on Y”.
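The difference between comparing representations and comparing values can be illustrated on small discrete possibility distributions. This is a sketch under stated assumptions: the value-based degree uses the usual sup-min possibility of equality, while the representation-based degree shown here (the smallest value of 1 - |πA(x) - πB(x)| over the domain) is only one of several measures that could be used. The function names and the two salary distributions are hypothetical.

```python
def possibility_of_equality(pa, pb):
    """Value-based comparison: sup over x of min(pa(x), pb(x))."""
    common = set(pa) & set(pb)
    return max((min(pa[x], pb[x]) for x in common), default=0.0)


def representation_equality(pa, pb):
    """Representation-based comparison (assumed measure):
    min over the joint domain of 1 - |pa(x) - pb(x)|."""
    domain = set(pa) | set(pb)
    return min(
        (1.0 - abs(pa.get(x, 0.0) - pb.get(x, 0.0)) for x in domain),
        default=1.0,
    )


# Hypothetical ill-known salaries (in thousands), as possibility distributions.
salary_a = {60: 0.5, 70: 1.0, 80: 0.5}
salary_b = {80: 1.0, 90: 0.5}

print(possibility_of_equality(salary_a, salary_b))
print(representation_equality(salary_a, salary_b))
print(representation_equality(salary_a, salary_a))
```

The two salaries may well be equal (80 is possible for both), so the value-based degree is positive, yet the two distributions are clearly different objects, so the representation-based degree is low. Only identical distributions reach a representation-based degree of 1.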
This FFD uses a critical threshold value, which is left undecided and remains an arbitrary design choice. [7]

4.3 Inference Rules
An important concept related to data dependencies is that of inference rules. Given a set of dependencies, inference rules introduce other dependencies that are logical consequences of the given ones. These rules are dependency generators, so they are closely related to the definition and semantics of the dependencies. Given a set of data dependencies that hold on a database, it is often possible to derive other data dependencies that also hold on the same database.

Inference rules are only useful if they are sound and complete. By sound, we mean that every generated dependency is valid in all relation instances in which the given set of dependencies is valid. By complete, we mean that all of the valid dependencies can be generated using only these rules. Thus, when defining the dependencies and their inference rules, it is crucial that the dependencies are well defined in terms of definition and semantics, and that their inference rules are sound and complete.

In particular, we focus on the establishment of inference rules for the following reasons:

(1) With the inference rules, we sometimes get simpler functional dependencies than the original ones. It is convenient to use simpler functional dependencies whenever possible in order to infer the actual values of unknown values.

(2) We can obtain minimal sets of functional dependencies by using inference rules. In integrity checking, if we can use the minimal sets of functional dependencies, we are free from excessive evaluations.

(3) When we obtain functional dependencies by data mining or knowledge discovery, we can get further functional dependencies by using the inference rules.
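For classical (crisp) functional dependencies, a standard way to apply inference rules mechanically is the attribute-closure computation: starting from a set of attributes, repeatedly add the right-hand side of every dependency whose left-hand side is already covered. The sketch below illustrates that textbook technique and is not a procedure taken from the cited papers; the dependency set, including the TaxBand attribute, is hypothetical.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under the dependencies fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is a logical consequence of fds."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical dependency set for illustration.
fds = [
    ({"Job", "Experience"}, {"Salary"}),
    ({"Salary"}, {"TaxBand"}),
]

print(implies(fds, {"Job", "Experience"}, {"TaxBand"}))
print(implies(fds, {"Job"}, {"Salary"}))
```

The first query succeeds because the derived dependency follows by chaining the two given ones; the second fails because Job alone does not cover the left-hand side of any given dependency.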
The inference rules for classical FDs are Armstrong’s axioms, which are sound and complete. [8]

Armstrong’s inference rules [9]
A1. Reflexivity: If Y ⊆ X, then X → Y.
A2. Augmentation: If X → Y, then X → XY.
A3. Transitivity: If X → Y and Y → Z, then X → Z.

As augmentation, “If X → Y, then XZ → YZ” is sometimes used, which is deduced from A1-A3.

Some useful inference rules deduced from Armstrong’s inference rules:
D1. Union: If X → Y and X → Z, then X → YZ.
D2. Decomposition: If X → YZ, then X → Y and X → Z.

Fuzzy inference rules [6], obtained by extending Armstrong’s rules to a multivalued logic system:
FF1. Reflexivity: If Y ⊆ X, then X →F Y.
FF2. Augmentation: If X →F Y, then XZ →F YZ.
FF3. Transitivity: If X →F Y and Y →F Z, then X →F Z.
FF4. Union: If X →F Y and X →F Z, then X →F YZ.
FF5. Decomposition: If X →F YZ, then X →F Y and X →F Z.
FF6. Generalized augmentation: If X →F Y and X ⊆ U and V ⊆ XY, then U →F V.

5. THE SIMILARITY-BASED APPROACH
The similarity-based fuzzy relational model is not an extension of the original relational model, but actually a generalization of it. It allows a set of values for an attribute rather than only atomic values, and replaces the identity concept with a similarity concept. The similarity-based relational model allows a set of values for a single attribute, provided that all the values are from the same domain. The model thus allows multiple values while keeping the strongly typed attribute values of the classical relational model. This property is useful for query processing and update operations. If the attribute value is precise and crisp, the value is atomic; if it is imprecise and inexact, a set of values that are similar to this value is stated in its place. The level of similarity among the values is given by the explicitly defined similarity relation for the domain of the attribute values.

Similarity relations are useful for describing how similar two elements from the same domain are, as the name implies. A similarity relation, s(x, y), for a given domain D, is a mapping of every pair of elements in the domain onto the unit interval [0, 1]. The identity relation used in non-fuzzy relational databases is a special case of this similarity relation. [5] Given two elements, the similarity relation maps these two elements into an element in the interval [0, 1]. The more similar two elements are, the higher the value of the mapped element. If the two elements are the same, that is, if we compare an element with itself, the mapped element is 1, the highest possible value. An ordinary relation is considered to be a similarity relation when it satisfies the three conditions stated below.

Definition 5.1. A similarity relation is a mapping s: D × D → [0, 1] such that for x, y, z ∈ D,

s(x, x) = 1 (reflexivity)
s(x, y) = s(y, x) (symmetry)
s(x, z) ≥ max_{y∈D} min(s(x, y), s(y, z)) (max-min transitivity)

Example 5.1. For a domain D, we have D = {a, b, c, d}. We define a relation s on domain D such that:

s    a     b     c     d
a    1     0.8   0     0
b    0.8   1     0     0
c    0     0     1     0.7
d    0     0     0.7   1

Relation s satisfies the three conditions stated in Definition 5.1. Thus, it is a similarity relation.

Example 5.2. The equivalence classes induced by s in Example 5.1 are:

Similarity level    Equivalence classes
[1, 0.8)            {a}, {b}, {c}, {d}
[0.8, 0.7)          {a, b}, {c}, {d}
[0.7, 0)            {a, b}, {c, d}
0                   {a, b, c, d}

Example 5.3. This is the instance of fuzzy relation car and the similarity relations of its attribute domains.
[5]

Tuple   Type          Color           Price
t1      {sportscar}   {blue, green}   {expensive}
t2      {wagon}       {red}           {modest, affordable}
t3      {truck}       {blue}          {modest, affordable}
t4      {wagon}       {green}         {expensive}

TYPE            S     W     T
Sportscar (S)   1     0     0
Wagon (W)       0     1     0
Truck (T)       0     0     1

COLOR       B     G     R
Blue (B)    1     0.7   0
Green (G)   0.7   1     0
Red (R)     0     0     1

PRICE            C     M     A     E
Cheap (C)        1     0.3   0.3   0
Modest (M)       0.3   1     0.8   0
Affordable (A)   0.3   0.8   1     0
Expensive (E)    0     0     0     1

5.1 Fuzzy Functional Dependencies
Fuzzy functional dependencies reflect some kind of semantic knowledge about attribute subsets of the real world. FFDs are used to design similarity-based fuzzy databases in which data redundancy and update anomalies are reduced. In a fuzzy relational data model, the degree of “X determines Y” may not necessarily be 1 as in the crisp case. Naturally, a value ranging over the interval [0, 1] may be accepted. Then the definition of FFD turns into “similar Y values correspond to similar X values.” [5]

FFDs are functional constraints that are specified among the attributes of a fuzzy relation schema. The similarity-based relational model compares two attribute values by measuring their closeness in terms of the explicitly declared similarity relation of the attribute domain. The degree of closeness between two tuples in a fuzzy relation instance is called their conformance. The conformance is defined both on a single attribute and on a set of attributes. For precise FFDs, the similarity of Y values has to be greater than or equal to the similarity of X values, where similarity is measured in terms of conformance. For imprecise FFDs, the impreciseness of the dependency is a threshold on the similarity of Y values, thus weakening the dependency. [5] The FFDs should also be checked whenever tuples are inserted into the fuzzy relational database or modified, so that the integrity constraints imposed by these FFDs are not violated.
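The conformance-based check can be sketched in Python on the car relation of Example 5.3. The text does not spell out the conformance formula, so the one used here is an assumption drawn from a formulation common in the similarity-based literature: for one attribute, every value on each side must have a close counterpart on the other side, and for a set of attributes the minimum over the attributes is taken. The dependency is taken to hold when, for every pair of tuples, the conformance on Y is at least the smaller of the threshold and the conformance on X. All function names are hypothetical.

```python
from itertools import combinations


def make_similarity(pairs):
    """Build a reflexive, symmetric similarity lookup from {(x, y): degree}."""
    def s(x, y):
        if x == y:
            return 1.0
        return pairs.get((x, y), pairs.get((y, x), 0.0))
    return s


# Similarity relations of Example 5.3 (pairs not listed have similarity 0).
SIM = {
    "Type": make_similarity({}),
    "Color": make_similarity({("blue", "green"): 0.7}),
    "Price": make_similarity({
        ("cheap", "modest"): 0.3,
        ("cheap", "affordable"): 0.3,
        ("modest", "affordable"): 0.8,
    }),
}

CAR = [
    {"Type": {"sportscar"}, "Color": {"blue", "green"}, "Price": {"expensive"}},
    {"Type": {"wagon"}, "Color": {"red"}, "Price": {"modest", "affordable"}},
    {"Type": {"truck"}, "Color": {"blue"}, "Price": {"modest", "affordable"}},
    {"Type": {"wagon"}, "Color": {"green"}, "Price": {"expensive"}},
]


def conformance(attr, t1, t2):
    """Assumed formula: closeness of two value sets on a single attribute."""
    s = SIM[attr]
    d1, d2 = t1[attr], t2[attr]
    forward = min(max(s(x, y) for y in d2) for x in d1)
    backward = min(max(s(x, y) for y in d1) for x in d2)
    return min(forward, backward)


def set_conformance(attrs, t1, t2):
    """Conformance on a set of attributes: the weakest single attribute."""
    return min(conformance(a, t1, t2) for a in attrs)


def ffd_holds(relation, lhs, rhs, threshold=1.0):
    """Check lhs -> rhs over all tuple pairs (threshold=1.0 is the precise case)."""
    return all(
        set_conformance(rhs, t1, t2) >= min(threshold, set_conformance(lhs, t1, t2))
        for t1, t2 in combinations(relation, 2)
    )


print(conformance("Color", CAR[0], CAR[2]))
print(ffd_holds(CAR, ["Type"], ["Price"]))
print(ffd_holds(CAR, ["Type", "Color"], ["Price"]))
```

Under these assumptions Type alone does not determine Price, because t2 and t4 are both wagons with dissimilar prices, whereas Type and Color together do, since no two tuples conform on both attributes.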
[8] The definition of the FFD turns into: “if t[X] is similar to t’[X], t[Y] is also similar to t’[Y]. The similarity between Y values is greater than or equal to the similarity between X values.” This dependency is shown as X →F Y. A typical example of such a dependency is: “employees with similar experiences must have similar salaries.” In this case, while the values of the attributes experience and salary may be imprecise, the defined dependency is precise, which can be seen from the “must have” clause in the example.

This definition of FFDs still misses one case, namely the one where the dependency itself is imprecise. An example of this kind of FFD is: “the intelligence level of a person more or less determines the degree of success,” where the “more or less” clause makes the dependency imprecise. Assume that there are two people with identical intelligence levels, and the first person is very successful. We cannot conclude that the second person will be very successful too, but we can state that the success level of the second person will be “more or less similar” to the success level of the first person. A change in the definition therefore has to be made in order to accommodate imprecise FFDs in addition to precise FFDs. One way to do this is to accept the linguistic strength of the dependency as a threshold value. For example, the dependency “employees with similar experiences must have similar salaries” has linguistic strength 1, and the dependency “the intelligence level of a person more or less determines the degree of success” has linguistic strength 0.6. We choose this method to describe imprecise FFDs as well as precise ones. This threshold value naturally determines the strength of the dependency. Thus, this value, θ, will be called the strength of the FFD, shown as X →θF Y. [6]

5.2 Inference Rules
For the similarity-based fuzzy model, we must examine inference rules under two interpretations of functional dependencies.
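The two interpretations are tied to two multivalued logic systems, Gödel's and Dienes', which the discussion that follows refers to by name. As a minimal sketch, and assuming each interpretation is built on the corresponding implication operator, the Python below shows the standard Gödel and Dienes implications; how the cited work combines them over tuple pairs is not reproduced here.

```python
def godel_implication(a, b):
    """Goedel implication: fully true when a <= b, otherwise the degree b."""
    return 1.0 if a <= b else b


def dienes_implication(a, b):
    """Dienes implication: max(1 - a, b)."""
    return max(1.0 - a, b)


# a: degree to which two tuples agree on X; b: degree to which they agree on Y.
for a, b in [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (0.7, 0.5)]:
    print(a, b, godel_implication(a, b), dienes_implication(a, b))
```

On the crisp degrees 0 and 1 both operators reduce to classical implication. They part ways on intermediate degrees: with equal agreement of 0.7 on both sides, the Gödel implication is fully satisfied while the Dienes implication is not, which hints at why the two interpretations can behave differently once similarity degrees strictly between 0 and 1 appear.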
Under the interpretation corresponding to Gödel’s multivalued logic system, Armstrong’s inference rules are sound and complete for any functional dependency with no weights, and the extended versions of Armstrong’s inference rules are sound and complete for any functional dependency with weights. On the other hand, under the interpretation corresponding to Dienes’ multivalued logic system, Armstrong’s inference rules are sound and complete for functional dependencies with identity relations and no weights, and the extended versions of Armstrong’s rules are sound and complete for functional dependencies with identity relations and weights. However, Armstrong’s inference rules and their extended versions are not sound for functional dependencies with resemblance relations and no weights and with resemblance relations and weights, respectively. In these cases, other sound inference rules hold. (Gödel and Dienes were mathematicians who worked separately on multivalued logic systems; Armstrong’s rules were formulated for classical functional dependencies.) [9]

6. CONCLUSION
Like classical databases, fuzzy databases that are not properly designed suffer from the problems of data redundancy and update anomalies. To provide a good fuzzy relational database design, the concept of the FFD is used to define the fuzzy normal forms and the dependency-preserving and lossless join properties.

In this article, we reviewed the two leading frameworks for fuzzy relational databases, the similarity-based approach and the possibility-based approach. Zadeh’s fuzzy logic models are the basis for both approaches. While the possibility-based model came later, work has continued on both approaches by a number of researchers. The two approaches have a number of parallels and similarities. As far as fuzzy functional dependencies are defined in both approaches, most of the problems have been addressed. There seems to be no major advantage to using one approach over the other.
It is clear that, based on these basic fuzzy relational models, there may be one type of extended fuzzy relational model in which possibility distributions and resemblance relations arise in relational databases simultaneously. In the future there is a good possibility that a single universal fuzzy relational database model will be proposed.

7. REFERENCES
[1] Zadeh, L.A. Similarity relations and fuzzy orderings. Inform. Sci., 3(2), 1970. 177-200.
[2] Zadeh, L.A. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst., 1(1), 1978. 3-28.
[3] Ma, Z. M., Zhang, W. J., Ma, W. Y., Mili, F. Data Dependencies in Extended Possibility-Based Fuzzy Relational Databases. International Journal of Intelligent Systems, Vol. 17, Wiley Periodicals, Inc. DOI: 10.1002/int.1088. 2002. 321-332.
[4] Bedi, Punam, Kaur, Harmeet, Malhotra, Ankit. Fuzzy Dimension To Databases. Published at the 37th National Convention of Computer Society of India, Bangalore, India, November 2002.
[5] Bahar, Adnan, Yazici, Adnan. Normalization and Lossless Join Decomposition of Similarity-Based Fuzzy Relational Databases. International Journal of Intelligent Systems, Vol. 19, Wiley Periodicals, Inc. DOI: 10.1002/int.20029. 2004. 885-917.
[6] Raju, K.V.S.V.N., Majumdar, A.K. Fuzzy functional dependencies and lossless join decomposition of fuzzy relational database systems. ACM Trans. Database Syst., 1988. 129-166.
[7] Wang, Shyue-Liang, Tsai, Jenn-Shing, Hong, Tzung-Pei. Mining Functional Dependencies from Fuzzy Relational Databases. SAC'00, March 19-21, Como, Italy. ACM 1-58113239-5/00/003. 2000.
[8] Wang, Shyue-Liang, Shen, Ju-Wen, Hong, Tzung-Pei, Chang, Bill C.H. Incremental Discovery of Functional Dependencies From Similarity-Based Fuzzy Relational Databases Using Partitions. Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Volume 3, 25-28 July 2001. 1322-1326.
[9] Nakata, Michinori. Functional Dependencies in Fuzzy Databases. 1997 First International Conference on Knowledge-Based Intelligent Electronic Systems, 21-23 May 1997, Adelaide, Australia. Editor, L.C. Jain. 1997.