Knowledge Bases with Missing, Uncertain, Fuzzy, or Incomplete Information

Based on:
- Fuzzy Databases: Principles and Applications by Frederick E. Petry
- Chapter 7, "Reasoning with Uncertain or Incomplete Information," of Artificial Intelligence by George F. Luger & William A. Stubblefield
- Querying Disjunctive Databases in Polynomial Time by Lars E. Olson

Outline of Topics (A Sampling of Some of the Approaches)
- Imprecision in Conventional Databases
- Probabilistic Databases
- Reasoning with Uncertain or Incomplete Information
- Fuzzy Set Theory: basic definitions, operations
- Models of Fuzziness: null values, range values, similarity-based models, possibility-based models, rough sets
- Disjunctive Databases

Missing or Incomplete Information
Missing or incomplete values (i.e. null values) can have many meanings:
- Unknown (e.g. salary is not known)
- Not applicable (e.g. student ID for non-students)
- Does not exist (e.g. middle name of a person)
- Maybe, possibly, probably, probably not, ...

Representation (⊥)
- Normally: same symbol, same meaning.
- For nulls, neither ⊥ = ⊥ nor ⊥ ≠ ⊥ is reasonable.
- Can have marked nulls: ⊥i = ⊥i.

Three-Valued Logic for Nulls
- Also: x θ y is unknown when x or y is null and θ is one of =, ≠, <, ≤, >, ≥.
- OK for σ[Cost<10](Parts).
- But not for σ[Cost<10](Parts) ∪ σ[Cost≥10](Parts), which should equal Parts but does not when Cost can be null.

Range Values
For some unknowns, a range may be known.

R = Person  Age
    P1      25
    P2      30-35
    P3      32-40
    P4      45-55
    P5      45-55
    P6      65

Interesting consequences:
- 25 ∈ π[Age](R), but whether 35 ∈ π[Age](R) is unknown.
- Relations don't have duplicates, but both 45-55 elements must remain in π[Age](R).
- π[Age](σ[Person=P2](R)) < π[Age](σ[Person=P4](R)), but it is unknown whether π[Age](σ[Person=P2](R)) < π[Age](σ[Person=P3](R)).

Probabilistic Databases
Associate probabilities with attributes. Example: estimates of 10 experts of cleanup costs; 4 estimated M357 at $50-75K and 6 estimated it at $75-100K.

CleanupCosts:
Site  Cost
M357  0.4 [$50-75K]
      0.6 [$75-100K]
L121  0.3 [$200-300K]
      0.5 [$300-400K]
      0.2 [*-*]

Each group sums to one (missing probability mass is added as a [*-*] entry).
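The expert-estimate example above can be sketched in Python; `to_distribution` is a hypothetical helper, not part of any of the sources:

```python
from collections import Counter
from fractions import Fraction

def to_distribution(estimates):
    """Collapse a list of independent estimates into a probability
    distribution over the estimated values; the weights sum to one."""
    counts = Counter(estimates)
    total = sum(counts.values())
    return {value: Fraction(count, total) for value, count in counts.items()}

# The 10 expert estimates for site M357 from the example above.
m357 = ["$50-75K"] * 4 + ["$75-100K"] * 6
dist = to_distribution(m357)   # {"$50-75K": 2/5, "$75-100K": 3/5}
```

Using exact fractions keeps each probability group summing to exactly one, mirroring the invariant stated above.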
Operators are extended by allowing conditions on both values and probabilities.
Example: π[SITE](σ[COST ≥ $200K with probability ≥ 0.5](CleanupCosts)) = {L121, ...}.

Certainty Theory (Stanford Certainty Factor Algebra)
Attempts to formalize the human approach to uncertainty when thinking of facts and rules as being "highly probable," "unlikely," "possible," ...
Simple assumptions for creating confidence measures and for combining confidence measures:
- Evidence is either for or against a particular hypothesis; confidence ranges from 1 to -1.
- MB(H|E): measure of belief in hypothesis H given evidence E.
- MD(H|E): measure of disbelief in H given E.
- Either 1 > MB(H|E) > 0 while MD(H|E) = 0, or 1 > MD(H|E) > 0 while MB(H|E) = 0.
- The certainty factor is then CF(H|E) = MB(H|E) - MD(H|E).
Premises of rules are conjunctions and disjunctions of uncertain facts and are modeled respectively by min and max:
- Conjunction of premises: CF(P1 ∧ P2) = min(CF(P1), CF(P2))
- Disjunction of premises: CF(P1 ∨ P2) = max(CF(P1), CF(P2))
Conclusions of rules have certainty factors based on assumed complete certainty of the premises, but are then tempered multiplicatively by the uncertainty of the premises.
Example of a rule: (P1 ∧ P2) ∨ P3 → R1 (.7), R2 (.3)
- Result 1 (R1) has "estimated certainty" .7; Result 2 (R2) has estimated certainty .3.
- P1, P2, and P3 are assumed completely certain for estimating the result certainties.
Example of a certainty calculation, assuming P1, P2, and P3 have certainty .6, .4, and .2 respectively:
- CF(P1 ∧ P2) = min(.6, .4) = .4
- CF((P1 ∧ P2) ∨ P3) = max(.4, .2) = .4
- When R1 is added as a derived fact: CF(R1) = CF((P1 ∧ P2) ∨ P3) × .7 = .4 × .7 = .28
- When R2 is added: CF(R2) = .4 × .3 = .12

Combinations of multiple rules for the same evidence have combining rules ...
- If CF(R1) and CF(R2) are positive: CF(R1) + CF(R2) - (CF(R1) × CF(R2))
- If CF(R1) and CF(R2) are negative: CF(R1) + CF(R2) + (CF(R1) × CF(R2))
- Otherwise: (CF(R1) + CF(R2)) / (1 - min(|CF(R1)|, |CF(R2)|))
... with desirable properties:
- Results are always between 1 and -1.
- Contradictory certainty factors cancel each other.
- Combined positive (negative) certainty factors increase (decrease) monotonically, as would be expected.
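The certainty-factor algebra above is compact enough to sketch directly. The function names below (`cf_and`, `cf_or`, `cf_conclude`, `cf_combine`) are hypothetical; the formulas are the ones just given:

```python
def cf_and(*cfs):
    # conjunction of premises: min of the certainty factors
    return min(cfs)

def cf_or(*cfs):
    # disjunction of premises: max of the certainty factors
    return max(cfs)

def cf_conclude(premise_cf, rule_cf):
    # temper the rule's estimated certainty by the premise certainty
    return premise_cf * rule_cf

def cf_combine(cf1, cf2):
    # combine two certainty factors derived for the same conclusion
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

# Worked example from the text: (P1 and P2) or P3 -> R1 (.7), R2 (.3)
premises = cf_or(cf_and(0.6, 0.4), 0.2)   # max(min(.6, .4), .2) = .4
cf_r1 = cf_conclude(premises, 0.7)        # about .28
cf_r2 = cf_conclude(premises, 0.3)        # about .12
```

Note that the mixed-sign branch divides by 1 minus the smaller magnitude, so combining a strong and a weak opposing factor still yields a moderate result rather than a full cancellation.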
Certainty Theory (Stanford Certainty Factor Algebra)
Example:
- MotherOfChild(x,y) ∧ MotherOfChild(x,z) → FullSibling(y,z) (.9)
- LiveInSameHousehold(x,y) ∧ BirthdateWithin15Years(x,y) → FullSibling(x,y) (.8)
- MotherOfChild(mary,john) (.7)
- MotherOfChild(mary,alice) (.8)
- LiveInSameHousehold(john,alice) (.9)
- BirthdateWithin15Years(john,alice) (1.0)

1st FullSibling rule: min(.7, .8) × .9 = .63
2nd FullSibling rule: min(.9, 1.0) × .8 = .72
Positive certainty factor rule: .63 + .72 - (.63 × .72) = 1.35 - .45 ≈ .9

Same example, but with BirthdateWithin15Years(john,alice) (-.6):
1st FullSibling rule: min(.7, .8) × .9 = .63
2nd FullSibling rule: min(.9, -.6) × .8 = -.48
Mixed certainty factor rule: (.63 + (-.48)) / (1 - min(|.63|, |-.48|)) = .15/.52 ≈ .29

The theory does not attempt to do "correct" reasoning, but rather to model the combining of confidence measures in an approximate, heuristic, and informal way.

Fuzzy Set Theory
Basic idea: extend the notion of a characteristic function.
- For an ordinary set S: Char_S(x): U → {0, 1}; e.g. for U = abcde, 01101 is {b, c, e}.
- For a fuzzy set F: Char_F(x): U → [0, 1]; i.e. the degree of set membership ranges from 0 to 1 and represents an opinion, a judgment, or the confidence in membership. The membership function is usually denoted by μF(x).
Example: estimates of high cost of environmental cleanup:
HIGH = {0.0/1K, 0.125/5K, 0.5/10K, 0.8/25K, 0.9/50K, 1.0/100K}
- Can interpolate to get estimates for other values, e.g. 0.95/75K.
- Can extend to get other estimates, e.g.
μHIGH(x) = 1.0 for x ≥ 100K.

Fuzzy Set Operations
(Grades for domain values not listed explicitly are obtained by interpolation; e.g. in {0.0/0K, 0.8/10K} the grade at 5K is 0.4.)
- Equality: A = B iff μA(x) = μB(x) for all x.
  {0.0/0K, 1.0/10K} = {0.0/0K, 0.5/5K, 1.0/10K}
- Containment: A ⊆ B iff μA(x) ≤ μB(x) for all x.
  {0.0/0K, 0.8/10K} ⊆ {0.0/0K, 0.5/5K, 0.9/10K}
- Intersection: μ[A∩B](x) = min(μA(x), μB(x)).
  {0.0/0K, 0.8/10K} ∩ {0.0/0K, 0.5/5K, 0.9/10K} = {0.0/0K, 0.4/5K, 0.8/10K}
- Union: μ[A∪B](x) = max(μA(x), μB(x)).
  {0.0/0K, 0.8/10K} ∪ {0.0/0K, 0.5/5K, 0.9/10K} = {0.0/0K, 0.5/5K, 0.9/10K}
- Complement: μ[¬A](x) = 1 - μA(x).
  ¬{0.7/10K} = {0.3/10K}, i.e. the judgment that 10K is not in the set is 0.3 (when the judgment that it is in the set is 0.7).
- Concentration: μ[CON(A)](x) = (μA(x))^2. Concentrates fuzzy elements by reducing the membership grade.
- Dilation: μ[DIL(A)](x) = (μA(x))^(1/2). Dilates fuzzy elements by increasing the membership grade.

Similarity-Based Models
Allow us to measure nearness of domain elements, using a similarity relation Sim(x, y):

Sim  A    B    C    D    E
A    1.0  0.8  0.4  0.5  0.8
B    0.8  1.0  0.4  0.5  0.9
C    0.4  0.4  1.0  0.4  0.4
D    0.5  0.5  0.4  1.0  0.5
E    0.8  0.9  0.4  0.5  1.0

The similarity of A and E is 0.8.

Properties of Similarity Relations
- Reflexive: Sim(x,x) = 1.0
- Symmetric: Sim(x,y) = Sim(y,x)
- (Max-min) transitive: Sim(x,z) = Max{y}(Min[Sim(x,y), Sim(y,z)])
  e.g. Sim(A,C) = Max(Min[Sim(A,A), Sim(A,C)], Min[Sim(A,B), Sim(B,C)], Min[Sim(A,C), Sim(C,C)], Min[Sim(A,D), Sim(D,C)], Min[Sim(A,E), Sim(E,C)])
  = Max(Min[1.0, 0.4], Min[0.8, 0.4], Min[0.4, 1.0], Min[0.5, 0.4], Min[0.8, 0.4])
  = Max(0.4, 0.4, 0.4, 0.4, 0.4) = 0.4
- Reflexive, symmetric, and transitive together imply an equivalence relation.

Equivalence Classes for Similarity Relations
The α-level set S[α] = {(x,y) | Sim(x,y) ≥ α} is an equivalence relation.
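The set operations and α-level partitions above can be sketched as follows, assuming membership functions are stored as dicts over a shared, already-interpolated set of domain points (the interpolation step itself is omitted):

```python
def f_and(a, b):   # intersection: pointwise min of membership grades
    return {x: min(a[x], b[x]) for x in a}

def f_or(a, b):    # union: pointwise max
    return {x: max(a[x], b[x]) for x in a}

def f_not(a):      # complement: 1 - grade
    return {x: 1 - g for x, g in a.items()}

def con(a):        # concentration: square the grades
    return {x: g ** 2 for x, g in a.items()}

def dil(a):        # dilation: square root of the grades
    return {x: g ** 0.5 for x, g in a.items()}

def alpha_partition(sim, alpha):
    """Equivalence classes of S_alpha = {(x, y) | Sim(x, y) >= alpha}.
    Greedy grouping against one representative per class is sound only
    because S_alpha is an equivalence relation (max-min transitivity)."""
    classes = []
    for x in sorted(sim):
        for c in classes:
            if sim[x][c[0]] >= alpha:
                c.append(x)
                break
        else:
            classes.append([x])
    return classes

# The similarity matrix from the text.
sim = {
    "A": {"A": 1.0, "B": 0.8, "C": 0.4, "D": 0.5, "E": 0.8},
    "B": {"A": 0.8, "B": 1.0, "C": 0.4, "D": 0.5, "E": 0.9},
    "C": {"A": 0.4, "B": 0.4, "C": 1.0, "D": 0.4, "E": 0.4},
    "D": {"A": 0.5, "B": 0.5, "C": 0.4, "D": 1.0, "E": 0.5},
    "E": {"A": 0.8, "B": 0.9, "C": 0.4, "D": 0.5, "E": 1.0},
}
# alpha_partition(sim, 0.8) groups A, B, E together, with C and D alone.
```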
Partition tree:
α = 0.4: {A, B, C, D, E}
α = 0.5: {A, B, D, E}, {C}
α = 0.8: {A, B, E}, {C}, {D}
α = 0.9: {B, E}, {A}, {C}, {D}
α = 1.0: {A}, {B}, {C}, {D}, {E}
e.g. S[0.8] = {{A,B,E}, {C}, {D}}

Fuzzy Relations
- Fuzzy relation: (essentially) a subset of the cross product P(D1) × P(D2) × ... × P(Dn), where P(D) is the power set of the domain D.
- Fuzzy tuple: any member of a fuzzy relation.
- Interpretation (of a fuzzy tuple): (essentially) select any one element from each set of the tuple. The space of interpretations is the cross product D1 × D2 × ... × Dn.

Fuzzy Queries
- Relational algebra: expressions plus a clause defining minimum thresholds for attributes over similarity relations. Evaluate and merge tuples (forming sets in the power set) so long as there are no minimum-threshold violations.
- Relational calculus: an augmentation similar to the one for relational algebra; predicate-calculus statements with minimum level values specified for relations and/or comparison expressions.

Query Example (Find Consensus in Opinion Survey)
How well do the experts and residents agree about the effects of pollutants?
Given a similarity relation for EFFECT: if Thres(EFFECT) ≥ 0.85, for example, then the values in the same blocks of the partition determined by α = 0.85 are considered to be equal. Thus, for this example, {Minimal, Limited, Tolerable, Moderate} are equal, as are {Major, Extreme}.
- Consensus of experts: R1 = π[POLLUTANT,NAME,EFFECT](σ[TYPE=Expert](Survey)) with Thres(EFFECT) ≥ 0.85 and Thres(NAME) ≥ 0.0
- Consensus of residents: R2 = π[POLLUTANT,NAME,EFFECT](σ[TYPE=Resident](Survey)) with Thres(EFFECT) ≥ 0.85 and Thres(NAME) ≥ 0.0
- Consensus in survey: R1 ⋈[POLLUTANT,EFFECT] R2 with Thres(EFFECT) ≥ 0.85 and Thres(NAME) ≥ 0.0

Possibility-Based Models
- Similarity-based models capture imprecision in the distinction of elements of domains. Example: is "limited" different from "moderate"?
- Possibility-based models capture uncertainty about the elements themselves (or about relationships among the elements). Example: is the salary high?
How certain is "likes(person, movie)"?

Possibility Theory (for Single Values)
The available information about the value of a single-valued attribute A for a tuple t is represented by a possibility distribution π[A(t)] on D ∪ {e}, where D is the domain of attribute A and e is an extra element which stands for the case when the attribute does not apply to t. The possibility distribution π[A(t)] defines a mapping from D ∪ {e} to [0, 1].
Example: middle-aged might have a possibility distribution defined over the domain age = [0, 100], for every tuple t that contains age, as follows:

π[middle-aged(t)](e) = 0
π[middle-aged(t)](age(t)) =
  0,                 for age(t) ≤ 35
  (age(t) - 35)/10,  for 35 ≤ age(t) ≤ 45
  1,                 for 45 ≤ age(t) ≤ 50
  (55 - age(t))/5,   for 50 ≤ age(t) ≤ 55
  0,                 for 55 ≤ age(t)

[Slides with figures: possibility distributions for usual situations; possibility distributions for ill-known values.]

Computing Possible and Certain Conditions
(from the possibility distribution π[A(t)] and a set S, ordinary or fuzzy)
- Possible(d) = min(π[A(t)](d), μS(d)); PossibilityDegree = sup{d∈D} Possible(d)
- Certain(d) = max(1 - π[A(t)](d), μS(d)); CertaintyDegree = inf{d∈D} Certain(d)

Possibility-Based Queries
If John's age and the fuzzy predicate P (middle-aged) are represented as a possibility distribution and a fuzzy set respectively, then we can evaluate the condition "John's age = middle-aged" by computing Possible, Certain, PossibilityDegree, and CertaintyDegree.
Note: unlike earlier, where middle-aged was the possibility distribution, here middle-aged is a fuzzy set and John's age has a possibility distribution (e.g. we are assuming that someone has an idea what the possibilities are for John's age).

Rough Sets
Capture indiscernibility among set values. Formalized as follows:
- R is an indiscernibility relation, i.e. an equivalence relation among indiscernible values.
- [x]R denotes the equivalence class of R containing x.
- The elementary sets are the equivalence classes of R.
- The definable sets are unions of elementary sets.
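Over a finite, discretized domain, the possibility and certainty degrees above reduce to max/min scans. The numbers below for John's age and for the fuzzy set middle-aged are hypothetical stand-ins for the tables on the original slides:

```python
def possibility_degree(pi, mu_s):
    """Sup over d of Possible(d) = min(pi(d), mu_S(d))."""
    return max(min(pi[d], mu_s[d]) for d in pi)

def certainty_degree(pi, mu_s):
    """Inf over d of Certain(d) = max(1 - pi(d), mu_S(d))."""
    return min(max(1 - pi[d], mu_s[d]) for d in pi)

# Hypothetical: John's age is known to be 40 or 45, both fully possible.
johns_age = {35: 0.0, 40: 1.0, 45: 1.0, 50: 0.0}
# Hypothetical fuzzy set for the predicate "middle-aged".
middle_aged = {35: 0.0, 40: 0.5, 45: 1.0, 50: 1.0}
# possibility_degree(johns_age, middle_aged) == 1.0
# certainty_degree(johns_age, middle_aged) == 0.5
```

Here the condition "John's age = middle-aged" is fully possible (age 45 is both fully possible and fully middle-aged) but only certain to degree 0.5, because age 40 is fully possible yet only half middle-aged.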
Lower and upper approximations of a set X over the elements of the universe U are respectively
  R̲X = { x ∈ U | [x]R ⊆ X }
  R̄X = { x ∈ U | [x]R ∩ X ≠ ∅ }

Examples for Rough Sets
Given an indiscernibility relation R:
- An equivalence class (an elementary set): [SAND]R = {BEACH, COAST, SAND}
- A definable set: [URBAN]R ∪ [SAND]R = {BUILDING, URBAN, BEACH, COAST, SAND}
- For X = {BUILDING, URBAN, SAND}:
  lower approximation: R̲X = {BUILDING, URBAN}
  upper approximation: R̄X = {BUILDING, URBAN, BEACH, COAST, SAND}

Rough Relational Model
- A rough relation R is (essentially) a subset of the set cross product P(D1) × P(D2) × ... × P(Dn).
- A rough tuple ti for R is (di1, di2, ..., din) where dij ⊆ Dj.
- An interpretation (a1, a2, ..., an) of a rough tuple ti = (di1, di2, ..., din) is any value assignment such that aj ∈ dij for all j.
- Tuples ti = (di1, di2, ..., din) and tk = (dk1, dk2, ..., dkn) are redundant if [dij] = [dkj] for all j.

Sample Rough Relation
Rough relation X: tuple i (with FEATURE and COUNTRY equivalence classes given by R).
- Interpretations of tuple i: (U157, US, SAND), ..., (U157, UNITED-STATES, SAND), ...
- Tuple i and the tuple (U157, {USA, MEXICO}, {ROAD, BEACH}) are redundant.

Rough Selection
σ[A=a](X) is a rough relation Y where
  R̲Y = { t ∈ X | ∪i [ai] = ∪j [bj] }, ai ∈ a, bj ∈ t(A)
  R̄Y = { t ∈ X | ∪i [ai] ⊆ ∪j [bj] }, ai ∈ a, bj ∈ t(A)

Example (parenthesized tuples belong only to the upper approximation):
σ[FEATURE={BEACH, ROAD}](X):
  ID    COUNTRY        FEATURE
  U157  {USA, MEXICO}  {SAND, ROAD}
  M007  MEXICO         {COAST, AIRPORT}
( U147  US             {BEACH, ROAD, URBAN} )

Rough Projection
π[B](X) is the rough relation { t(B) | t ∈ X }. Rough projection must maintain lower and upper approximations; when two result tuples are equivalent, one from the lower and one from the upper, the tuple is maintained in the lower approximation.

π[COUNTRY](X):
  US
  {US, MEXICO}
  MEXICO
  BELIZE
  {BELIZE, INT}

π[FEATURE](σ[FEATURE={BEACH, ROAD}](X)):
  {SAND, ROAD}
( {SAND, ROAD, URBAN} )

Rough Join
The specification of ⋈ is lengthy but straightforward: for testing whether tuples join, the lower approximation uses equality (as usual) and the upper approximation uses rough-set subset (as usual), but in either direction.
Both lower and upper tuples are compared.

Example: π[COUNTRY,FEATURE](σ[FEATURE={SAND,ROAD}](X)) ⋈[COUNTRY] π[COUNTRY,FEATURE](σ[FEATURE=MARSH](X))

π[COUNTRY,FEATURE](σ[FEATURE={SAND,ROAD}](X)):
  {US, MEXICO}  {SAND, ROAD}
  MEXICO        {SAND, ROAD}
( US            {SAND, ROAD, URBAN} )

π[COUNTRY,FEATURE](σ[FEATURE=MARSH](X)):
  US   MARSH
( US   {MARSH, LAKE} )
( USA  {MARSH, PASTURE, RIVER} )

Join result:
  US            {SAND, ROAD, URBAN}  MARSH
  US            {SAND, ROAD, URBAN}  {MARSH, LAKE}
  US            {SAND, ROAD, URBAN}  {MARSH, PASTURE, RIVER}
( {US, MEXICO}  {SAND, ROAD}         MARSH )
( {US, MEXICO}  {SAND, ROAD}         {MARSH, LAKE} )
( {US, MEXICO}  {SAND, ROAD}         {MARSH, PASTURE, RIVER} )

Rough Relational Set Operators
- Rough difference, T = X - Y: R̲T = { t ∈ R̲X | t ∉ R̲Y } and R̄T = { t ∈ R̄X | t ∉ R̄Y }
- Rough union, T = X ∪ Y: R̲T = { t | t ∈ R̲X ∪ R̲Y } and R̄T = { t | t ∈ R̄X ∪ R̄Y }
- Rough intersection, T = X ∩ Y: R̲T = { t | t ∈ R̲X ∩ R̲Y } and R̄T = { t | t ∈ R̄X ∩ R̄Y }

Example: π[FEATURE](σ[COUNTRY=US](X)) - π[FEATURE](σ[COUNTRY=MEXICO](X))

π[FEATURE](σ[COUNTRY=US](X)):
  {MARSH, LAKE}
  MARSH
  {MARSH, PASTURE, RIVER}
  {FOREST, RIVER}
  {SAND, ROAD, URBAN}
( {SAND, ROAD} )

π[FEATURE](σ[COUNTRY=MEXICO](X)):
  {SAND, ROAD}
  BEACH
  SAND

Note that ( {SAND, ROAD} ), which is an upper approximation after the selection, disappears in the projection, and is thus not removed in the difference operation.
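The lower and upper approximations that underlie all of the rough operators can be sketched directly, reusing the land-cover equivalence classes from the earlier example:

```python
# Elementary sets (equivalence classes of the indiscernibility relation R),
# as in the earlier rough set example.
classes = [{"BUILDING", "URBAN"}, {"BEACH", "COAST", "SAND"}]
eq_class = {x: frozenset(c) for c in classes for x in c}
U = set(eq_class)

def lower_approx(X):
    # R-lower(X) = {x in U | [x]_R is a subset of X}
    return {x for x in U if eq_class[x] <= X}

def upper_approx(X):
    # R-upper(X) = {x in U | [x]_R intersects X}
    return {x for x in U if eq_class[x] & X}

X = {"BUILDING", "URBAN", "SAND"}
# lower_approx(X) == {"BUILDING", "URBAN"}
# upper_approx(X) == {"BUILDING", "URBAN", "BEACH", "COAST", "SAND"}
```

SAND drops out of the lower approximation because its class also contains BEACH and COAST, which are outside X; the same class pulls BEACH and COAST into the upper approximation.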