Knowledge Bases with Missing, Uncertain, Fuzzy, or Incomplete Information

Based on:
- Fuzzy Databases: Principles and Applications by Frederick E. Petry
- Chapter 7, "Reasoning with Uncertain or Incomplete Information," of Artificial Intelligence by George F. Luger & William A. Stubblefield
- Querying Disjunctive Databases in Polynomial Time by Lars E. Olson
Outline of Topics
(A Sampling of Some of the Approaches)

- Imprecision in Conventional Databases
  - Null values
  - Range values
- Probabilistic Databases
- Reasoning with Uncertain or Incomplete Information
- Fuzzy Set Theory
  - Basic definitions
  - Operations
- Models of Fuzziness
  - Similarity-based models
  - Possibility-based models
  - Rough sets
- Disjunctive Databases
Missing or Incomplete Information

- Missing or incomplete values (i.e. null values) can have many meanings:
  - Unknown (e.g. salary is not known)
  - Not applicable (e.g. student ID for non-students)
  - Does not exist (e.g. middle name of a person)
  - Maybe, possibly, probably, probably not, …
- Representation (⊥)
  - Normally: same symbol, same meaning
  - Neither ⊥ = ⊥ nor ⊥ ≠ ⊥ is reasonable
  - Can have marked nulls: ⊥i = ⊥i
Three-Valued Logic for Nulls
- Also: x θ y ≡ unknown for x or y null and θ one of =, ≠, <, ≤, >, ≥.
- OK for σCost<10(Parts)
- But not for σCost<10 ∨ Cost≥10(Parts), as illustrated below.
Range Values

For some unknowns, a range may be known.

R = | Person | Age   |
    | P1     | 25    |
    | P2     | 30-35 |
    | P3     | 32-40 |
    | P4     | 45-55 |
    | P5     | 45-55 |
    | P6     | 65    |

Interesting Consequences:
- 25 ∈ πAge(R), but whether 35 ∈ πAge(R) is unknown.
- Relations don't have duplicates, but both 45-55 elements must remain in πAge(R).
- πAge(σPerson=P2(R)) < πAge(σPerson=P4(R)), but it is unknown whether πAge(σPerson=P2(R)) < πAge(σPerson=P3(R)).
Probabilistic Databases

Associate probabilities with attributes. Example: estimates of 10 experts of cleanup costs – 4 estimated site M357 at $50-75K and 6 estimated it at $75-100K.

CleanupCosts:
| Site | Costs              |
| M357 | 0.4 [$50 – 75 K]   |
|      | 0.6 [$75 – 100 K]  |
| L121 | 0.3 [$200 – 300 K] |
|      | 0.5 [$300 – 400 K] |
|      | 0.2 [* – *]        |
| …    | …                  |

Each group of probabilities sums to one; missing probability mass is added as a [* – *] entry.

Operators are extended by allowing conditions on both values and probabilities.
Example: πSITE(σCOST ≥ $200K with probability ≥ 0.5(CleanupCosts)) = {SITE: L121, …}, sketched below.
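A sketch (Python; the table layout, helper name, and the reading of the selection condition as "probability at least 0.5 that cost is at least $200K" are assumptions) of a selection over both values and probabilities:

    # Each site maps to (probability, (low, high)) cost estimates in $K;
    # (None, None) stands for the open [* - *] "don't know" interval.
    cleanup_costs = {
        "M357": [(0.4, (50, 75)), (0.6, (75, 100))],
        "L121": [(0.3, (200, 300)), (0.5, (300, 400)), (0.2, (None, None))],
    }

    def prob_cost_at_least(estimates, threshold):
        # Probability mass on intervals that certainly lie at or above threshold.
        return sum(p for p, (lo, hi) in estimates
                   if lo is not None and lo >= threshold)

    sites = [s for s, est in cleanup_costs.items()
             if prob_cost_at_least(est, 200) >= 0.5]
    print(sites)  # ['L121'], since 0.3 + 0.5 = 0.8 >= 0.5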
Certainty Theory
(Stanford Certainty Factor Algebra)

- Attempts to formalize the human approach to uncertainty when thinking of facts and rules as being "highly probable," "unlikely," "possible," …
- Simple assumptions for creating confidence measures and for combining confidence measures:
  - Evidence is either for or against a particular hypothesis – ranges from 1 to -1
    - MB(H|E): Measure of Belief of hypothesis H given evidence E.
    - MD(H|E): Measure of Disbelief of H given E.
    - Either:
      - 1 > MB(H|E) > 0 while MD(H|E) = 0, or
      - 1 > MD(H|E) > 0 while MB(H|E) = 0
    - Then: the certainty factor, CF(H|E) = MB(H|E) – MD(H|E).
  - Premises of rules are conjunctions and disjunctions of uncertain facts and are modeled respectively by min and max:
    - Conjunction of Premises: CF(P1 ∧ P2) = min(CF(P1), CF(P2))
    - Disjunction of Premises: CF(P1 ∨ P2) = max(CF(P1), CF(P2))
Certainty Theory
(Stanford Certainty Factor Algebra)

- Conclusions of rules have certainty factors based on assumed complete certainty of premises, but are then tempered multiplicatively by the uncertainty of premises.
- Example of a rule: (P1 ∧ P2) ∨ P3 → R1 (.7), R2 (.3)
  - Result 1 (R1) has "estimated certainty" of .7; Result 2 (R2), of .3.
  - P1, P2, and P3 are assumed completely certain for estimating result certainties.
- Example of certainty calculation assuming P1, P2, and P3 have certainty .6, .4, and .2 respectively:
  - CF(P1 ∧ P2) = min(.6, .4) = .4
  - CF((P1 ∧ P2) ∨ P3) = max(.4, .2) = .4
  - When R1 is added as a derived fact: CF(R1) = CF((P1 ∧ P2) ∨ P3) × .7 = .4 × .7 = .28
  - When R2 is added: CF(R2) = .4 × .3 = .12
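A minimal sketch (Python) of the premise algebra and multiplicative tempering above, reproducing the .28 and .12 results:

    def cf_and(*cfs):
        # Conjunction of premises: minimum of the certainty factors.
        return min(cfs)

    def cf_or(*cfs):
        # Disjunction of premises: maximum of the certainty factors.
        return max(cfs)

    def apply_rule(premise_cf, rule_cf):
        # The rule's certainty is tempered multiplicatively by the premise's.
        return premise_cf * rule_cf

    p1, p2, p3 = 0.6, 0.4, 0.2
    premise = cf_or(cf_and(p1, p2), p3)        # max(min(.6, .4), .2) = .4
    print(round(apply_rule(premise, 0.7), 2))  # CF(R1) = 0.28
    print(round(apply_rule(premise, 0.3), 2))  # CF(R2) = 0.12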
Certainty Theory
(Stanford Certainty Factor Algebra)

- Combinations of multiple rules for the same evidence have rules …
  - If CF(R1) and CF(R2) are both positive, then
    CF(R1) + CF(R2) – (CF(R1) × CF(R2))
  - If CF(R1) and CF(R2) are both negative, then
    CF(R1) + CF(R2) + (CF(R1) × CF(R2))
  - Otherwise, (CF(R1) + CF(R2)) / (1 – min(|CF(R1)|, |CF(R2)|))
- … with desirable properties:
  - Results are always between 1 and -1.
  - Contradictory certainty factors cancel each other.
  - Combined positive (negative) certainty factors increase (decrease) monotonically, as would be expected.
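A sketch (Python) of the three combination cases; the two example slides that follow correspond to the two calls at the bottom:

    def combine_cf(a, b):
        # Combine certainty factors of two rules supporting the same hypothesis.
        if a > 0 and b > 0:
            return a + b - a * b
        if a < 0 and b < 0:
            return a + b + a * b
        return (a + b) / (1 - min(abs(a), abs(b)))

    print(round(combine_cf(0.63, 0.72), 2))   # 0.9  (both positive)
    print(round(combine_cf(0.63, -0.48), 2))  # 0.29 (mixed signs)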
Certainty Theory
(Stanford Certainty Factor Algebra)

Example:
  MotherOfChild(x,y) ∧ MotherOfChild(x,z) → FullSibling(y,z) (.9)
  LiveInSameHousehold(x,y) ∧ BirthdateWithin15Years(x,y) → FullSibling(x,y) (.8)
  MotherOfChild(mary,john) (.7)
  MotherOfChild(mary,alice) (.8)
  LiveInSameHousehold(john,alice) (.9)
  BirthdateWithin15Years(john,alice) (1.0)

1st FullSibling Rule: min(.7, .8) × .9 = .63
2nd FullSibling Rule: min(.9, 1.0) × .8 = .72
Positive Certainty Factor Rule: .63 + .72 – (.63 × .72) = 1.35 – .45 ≈ .90
Certainty Theory
(Stanford Certainty Factor Algebra)

Example:
  MotherOfChild(x,y) ∧ MotherOfChild(x,z) → FullSibling(y,z) (.9)
  LiveInSameHousehold(x,y) ∧ BirthdateWithin15Years(x,y) → FullSibling(x,y) (.8)
  MotherOfChild(mary,john) (.7)
  MotherOfChild(mary,alice) (.8)
  LiveInSameHousehold(john,alice) (.9)
  BirthdateWithin15Years(john,alice) (-.6)

1st FullSibling Rule: min(.7, .8) × .9 = .63
2nd FullSibling Rule: min(.9, -.6) × .8 = -.48
Mixed Certainty Factor Rule: (.63 + (-.48)) / (1 – min(|.63|, |-.48|)) = .15/.52 ≈ .29

The theory does not attempt to do "correct" reasoning, but rather to model the combining of confidence measures in an approximate, heuristic, and informal way.
Fuzzy Set Theory

Basic Idea: extend the notion of a characteristic function:
  For an ordinary set S: CharS(x) : U → {0, 1}
    e.g. for U = abcde, 01101 is {b, c, e}
  For a fuzzy set F: CharF(x) : U → [0, 1]
    i.e. the degree of set membership ranges from 0 to 1 and represents an opinion, a judgment, or the confidence in membership.
    The membership function is usually denoted by μF(x).
  e.g. estimates of high cost of environmental cleanup:
    HIGH = {0.0/1K, 0.125/5K, 0.5/10K, 0.8/25K, 0.9/50K, 1.0/100K}
    Can interpolate to get estimates for other values, e.g. 0.95/75K
    Can extend to get other estimates, e.g. μHIGH(x) = 1.0 for x ≥ 100K
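A sketch (Python) of the HIGH fuzzy set with the interpolation and extension just described; the breakpoint list mirrors the slide:

    # HIGH as (cost in $K, membership grade) breakpoints.
    HIGH = [(1, 0.0), (5, 0.125), (10, 0.5), (25, 0.8), (50, 0.9), (100, 1.0)]

    def mu_high(x):
        if x <= HIGH[0][0]:
            return HIGH[0][1]
        if x >= HIGH[-1][0]:
            return 1.0  # extension: fully "high" at 100K and beyond
        # Linear interpolation between neighboring breakpoints.
        for (x0, y0), (x1, y1) in zip(HIGH, HIGH[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    print(mu_high(75))   # 0.95, interpolated between 0.9/50K and 1.0/100K
    print(mu_high(150))  # 1.0, by extension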
Fuzzy Set Operations

- Equality
  - A = B iff μA(x) = μB(x)
  - {0.0/0K, 1.0/10K} = {0.0/0K, 0.5/5K, 1.0/10K} (equal under interpolation)
- Containment
  - A ⊆ B iff μA(x) ≤ μB(x)
  - {0.0/0K, 0.8/10K} ⊆ {0.0/0K, 0.5/5K, 0.9/10K}
- Set Intersection
  - μA∩B(x) = Min(μA(x), μB(x))
  - {0.0/0K, 0.8/10K} ∩ {0.0/0K, 0.5/5K, 0.9/10K} = {0.0/0K, 0.4/5K, 0.8/10K}
- Set Union
  - μA∪B(x) = Max(μA(x), μB(x))
  - {0.0/0K, 0.8/10K} ∪ {0.0/0K, 0.5/5K, 0.9/10K} = {0.0/0K, 0.5/5K, 0.9/10K}
- Complement
  - ¬A = {(1 – μA(x))/x}
  - ¬{0.7/10K} = {0.3/10K}, i.e. the judgment that 10K is not in the set is 0.3 (when the judgment that it is in the set is 0.7)
- Concentration (CON(A))
  - μCON(A)(x) = (μA(x))²
  - Concentrates fuzzy elements by reducing the membership grade
- Dilation (DIL(A))
  - μDIL(A)(x) = (μA(x))½
  - Dilates fuzzy elements by increasing the membership grade
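A sketch (Python; fuzzy sets as dicts from element to grade, with membership 0 for unlisted elements rather than the interpolation used in the examples above):

    def support(a, b):
        return sorted(set(a) | set(b))

    def f_intersect(a, b):
        return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in support(a, b)}

    def f_union(a, b):
        return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in support(a, b)}

    def f_complement(a):
        return {x: 1.0 - m for x, m in a.items()}

    def concentrate(a):
        return {x: m ** 2 for x, m in a.items()}

    def dilate(a):
        return {x: m ** 0.5 for x, m in a.items()}

    A = {"0K": 0.0, "5K": 0.4, "10K": 0.8}  # 5K grade written out explicitly
    B = {"0K": 0.0, "5K": 0.5, "10K": 0.9}
    print(f_intersect(A, B))  # {'0K': 0.0, '10K': 0.8, '5K': 0.4}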
Similarity-Based Models

- Allows us to measure nearness of domain elements
- Uses a similarity relation

Sim(x, y) |  A  |  B  |  C  |  D  |  E
    A     | 1.0 | 0.8 | 0.4 | 0.5 | 0.8
    B     | 0.8 | 1.0 | 0.4 | 0.5 | 0.9
    C     | 0.4 | 0.4 | 1.0 | 0.4 | 0.4
    D     | 0.5 | 0.5 | 0.4 | 1.0 | 0.5
    E     | 0.8 | 0.9 | 0.4 | 0.5 | 1.0

The similarity of A and E is 0.8.
Properties of Similarity Relations

Sim(x, y) |  A  |  B  |  C  |  D  |  E
    A     | 1.0 | 0.8 | 0.4 | 0.5 | 0.8
    B     | 0.8 | 1.0 | 0.4 | 0.5 | 0.9
    C     | 0.4 | 0.4 | 1.0 | 0.4 | 0.4
    D     | 0.5 | 0.5 | 0.4 | 1.0 | 0.5
    E     | 0.8 | 0.9 | 0.4 | 0.5 | 1.0

- Reflexive: Sim(x,x) = 1.0
- Symmetric: Sim(x,y) = Sim(y,x)
- Transitive (max-min): Sim(x,z) = Max{y}(Min[Sim(x,y), Sim(y,z)])
  e.g. Sim(A,C) = Max(Min[Sim(A,A), Sim(A,C)], Min[Sim(A,B), Sim(B,C)], Min[Sim(A,C), Sim(C,C)], Min[Sim(A,D), Sim(D,C)], Min[Sim(A,E), Sim(E,C)]) = Max(Min[1.0, 0.4], Min[0.8, 0.4], Min[0.4, 1.0], Min[0.5, 0.4], Min[0.8, 0.4]) = Max(0.4, 0.4, 0.4, 0.4, 0.4) = 0.4.
- Reflexive, Symmetric, Transitive: implies an equivalence relation
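A sketch (Python) that encodes the table and checks the max-min transitivity computation for Sim(A, C):

    SIM = {
        "A": {"A": 1.0, "B": 0.8, "C": 0.4, "D": 0.5, "E": 0.8},
        "B": {"A": 0.8, "B": 1.0, "C": 0.4, "D": 0.5, "E": 0.9},
        "C": {"A": 0.4, "B": 0.4, "C": 1.0, "D": 0.4, "E": 0.4},
        "D": {"A": 0.5, "B": 0.5, "C": 0.4, "D": 1.0, "E": 0.5},
        "E": {"A": 0.8, "B": 0.9, "C": 0.4, "D": 0.5, "E": 1.0},
    }

    def maxmin(x, z):
        # Max over intermediate elements y of min(Sim(x,y), Sim(y,z)).
        return max(min(SIM[x][y], SIM[y][z]) for y in SIM)

    print(maxmin("A", "C"), SIM["A"]["C"])  # 0.4 0.4: transitivity holds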
Equivalence Classes for Similarity Relations

(Same Sim(x, y) table as above.)

The α-level set Sα = {(x,y) | Sim(x,y) ≥ α} is an equivalence relation.

Partition Tree:
α = 0.4:  {A, B, C, D, E}
α = 0.5:  {A, B, D, E}  {C}
α = 0.8:  {A, B, E}  {D}  {C}
α = 0.9:  {B, E}  {A}  {D}  {C}
α = 1.0:  {A}  {B}  {E}  {D}  {C}

e.g. S0.8 = {{A,B,E}, {C}, {D}}
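A sketch (Python, reusing the SIM table above) that derives the partition tree: because each α-level set is an equivalence relation, an element's block is simply everything it relates to at level α:

    def alpha_partition(sim, alpha):
        blocks = []
        for x in sim:
            block = frozenset(y for y in sim if sim[x][y] >= alpha)
            if block not in blocks:
                blocks.append(block)
        return blocks

    for alpha in (0.4, 0.5, 0.8, 0.9, 1.0):
        print(alpha, [sorted(b) for b in alpha_partition(SIM, alpha)])
    # 0.8 -> [['A', 'B', 'E'], ['C'], ['D']], i.e. S0.8 as above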
Fuzzy Relations

- Fuzzy Relation: (essentially) a subset of the cross product P(D1) × P(D2) × … × P(Dn), where P(D) is the power set of the domain D.
- Fuzzy Tuple: any member of a fuzzy relation.
- Interpretation (of a fuzzy tuple): (essentially) select any one element from each set of the tuple. The space of interpretations is the cross product D1 × D2 × … × Dn.
Fuzzy Queries

- Relational Algebra
  - Expressions plus a clause defining minimum thresholds for attributes over similarity relations
  - Evaluate and merge tuples (forming sets in the powerset) so long as there are no minimum threshold value violations
- Relational Calculus
  - An augmentation similar to the augmentation for relational algebra
  - Predicate-calculus statements with minimum level values specified for relations and/or comparison expressions
Query Example
(Find Consensus in Opinion Survey)

How well do the experts and residents agree about the effects of pollutants?

[figure: similarity relation for EFFECT]

If Thres(EFFECT) ≥ 0.85, for example, then the values in the same blocks of the partition determined by α = 0.85 are considered to be equal. Thus, for this example, {Minimal, Limited, Tolerable, Moderate} are equal, as are {Major, Extreme}.

Consensus of Experts:
R1 = πPOLLUTANT,NAME,EFFECT(σTYPE=Expert(Survey))
     with Thres(EFFECT) ≥ 0.85 and Thres(NAME) ≥ 0.0

Consensus of Residents:
R2 = πPOLLUTANT,NAME,EFFECT(σTYPE=Resident(Survey))
     with Thres(EFFECT) ≥ 0.85 and Thres(NAME) ≥ 0.0

Consensus in Survey:
R1 ⋈POLLUTANT,EFFECT R2
     with Thres(EFFECT) ≥ 0.85 and Thres(NAME) ≥ 0.0
Possibility-Based Models

- Similarity-based models
  - Capture imprecision in distinction of elements of domains
  - Example: is "limited" different from "moderate"?
- Possibility-based models
  - Capture uncertainty about the elements themselves (or about relationships among the elements)
  - Example: Is the salary high? How certain is "likes(person, movie)"?
Possibility Theory
(for Single Values)

- The available information about the value of a single-valued attribute A for a tuple t is represented by a possibility distribution πA(t) on D ∪ {e}, where D is the domain of attribute A and e is an extra element which stands for the case when the attribute does not apply to t.
- The possibility distribution πA(t) defines a mapping from D ∪ {e} to [0, 1].
- Example: middle-aged might have a possibility distribution defined over the domain age = [0, 100], for every tuple t that contains age, as follows:

  πmiddle-aged(t)(e) = 0

  πmiddle-aged(t)(age(t)) =
    0,                 for age(t) ≤ 35
    (age(t) – 35)/10,  for 35 ≤ age(t) ≤ 45
    1,                 for 45 ≤ age(t) ≤ 50
    (55 – age(t))/5,   for 50 ≤ age(t) ≤ 55
    0,                 for 55 ≤ age(t)
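A sketch (Python) of the middle-aged possibility distribution as reconstructed above:

    def pi_middle_aged(age):
        # Piecewise-linear possibility distribution over age in [0, 100];
        # pi(e) = 0 is handled separately (the attribute always applies here).
        if age <= 35 or age >= 55:
            return 0.0
        if age <= 45:
            return (age - 35) / 10
        if age <= 50:
            return 1.0
        return (55 - age) / 5

    print(pi_middle_aged(40))  # 0.5
    print(pi_middle_aged(47))  # 1.0
    print(pi_middle_aged(53))  # 0.4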
Possibility Distributions for Usual Situations
[figure]

Possibility Distributions for Ill-known Values
[figure]
Computing Possible and Certain Conditions
(from the possibility distribution πA(t) and a set S – ordinary or fuzzy)

- Possible(d) = min(πA(t)(d), μS(d))
- PossibilityDegree = sup{d ∈ D} Possible(d)
- Certain(d) = max(1 – πA(t)(d), μS(d))
- CertaintyDegree = inf{d ∈ D} Certain(d)
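A sketch (Python, over a finite domain; the ill-known age for John is an invented illustration) of these computations, reusing pi_middle_aged above as the fuzzy set μS:

    def possibility_degree(pi, mu_s, domain):
        # sup over d of min(pi(d), mu_S(d))
        return max(min(pi(d), mu_s(d)) for d in domain)

    def certainty_degree(pi, mu_s, domain):
        # inf over d of max(1 - pi(d), mu_S(d))
        return min(max(1 - pi(d), mu_s(d)) for d in domain)

    ages = range(0, 101)
    pi_john = lambda d: 1.0 if 42 <= d <= 46 else 0.0  # John is 42-46, say

    print(possibility_degree(pi_john, pi_middle_aged, ages))  # 1.0
    print(certainty_degree(pi_john, pi_middle_aged, ages))    # 0.7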
Possibility-Based Queries

If John's age and the fuzzy predicate P (middle-aged) are represented as [distributions in figure], then we can evaluate the condition "John's age is middle-aged" by computing Possible, Certain, PossibilityDegree, and CertaintyDegree as defined above.

Note: Unlike earlier, where middle-aged was the possibility distribution, here middle-aged is a fuzzy set and John's age has a possibility distribution (e.g. we are assuming that someone has an idea what the possibilities are for John's age).
Rough Sets

- Captures indiscernibility among set values
- Formalized as follows:
  - R is an indiscernibility relation, i.e. an equivalence relation among indiscernible values.
  - [x]R denotes the equivalence class of R containing x.
  - The elementary sets are the equivalence classes of R.
  - The definable sets are unions of elementary sets.
  - Lower and upper approximations of a set X over the elements of the universe U are respectively:
    - R̲X = { x ∈ U | [x]R ⊆ X }
    - R̄X = { x ∈ U | [x]R ∩ X ≠ ∅ }
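A sketch (Python) of the approximations, using the elementary sets from the example on the next slide:

    # Elementary sets: the equivalence classes of the indiscernibility relation R.
    ELEMENTARY = [
        {"BUILDING", "URBAN"},
        {"BEACH", "COAST", "SAND"},
    ]

    def lower(x):
        # Union of elementary sets entirely contained in X.
        return {e for c in ELEMENTARY if c <= x for e in c}

    def upper(x):
        # Union of elementary sets that intersect X.
        return {e for c in ELEMENTARY if c & x for e in c}

    X = {"BUILDING", "URBAN", "SAND"}
    print(sorted(lower(X)))  # ['BUILDING', 'URBAN']
    print(sorted(upper(X)))  # ['BEACH', 'BUILDING', 'COAST', 'SAND', 'URBAN']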
Examples for Rough Sets

[figure: indiscernibility relation R and its elementary sets]

- An equivalence class: [SAND]R = {BEACH, COAST, SAND}
- A definable set: [URBAN]R ∪ [SAND]R = {BUILDING, URBAN, BEACH, COAST, SAND}
- For X = {BUILDING, URBAN, SAND}:
  - lower approximation: R̲X = {BUILDING, URBAN}
  - upper approximation: R̄X = {BUILDING, URBAN, BEACH, COAST, SAND}
Rough Relational Model

- A rough relation R is (essentially) a subset of the set cross product P(D1) × P(D2) × … × P(Dn).
- A rough tuple ti for R is (di1, di2, …, din) where dij ⊆ Dj.
- An interpretation (a1, a2, …, an) of a rough tuple ti = (di1, di2, …, din) is any value assignment such that aj ∈ dij for all j.
- Tuples ti = (di1, di2, …, din) and tk = (dk1, dk2, …, dkn) are redundant if [dij] = [dkj] for all j.
Sample Rough Relation

[figure: rough relation X with a highlighted tuple i, plus the FEATURE and COUNTRY equivalence classes]

Interpretations of Tuple i:
  (U157, US, SAND)
  …
  (U157, UNITED-STATES, SAND)
  …

Tuple i and the tuple (U157, {USA, MEXICO}, {ROAD, BEACH}) are redundant.
Rough Selection

σA=a(X) is a rough relation Y where
  R̲Y = { t ∈ X | ∪i [ai] = ∪j [bj] }, ai ∈ a, bj ∈ t(A)
  R̄Y = { t ∈ X | ∪i [ai] ⊆ ∪j [bj] }, ai ∈ a, bj ∈ t(A)

X:
| ID   | COUNTRY       | FEATURE              |
| U157 | {USA, MEXICO} | {SAND, ROAD}         |
| M007 | MEXICO        | {COAST, AIRPORT}     |
| U147 | US            | {BEACH, ROAD, URBAN} |

σFEATURE={BEACH, ROAD}(X):
| U157   | {USA, MEXICO} | {SAND, ROAD}           |
| ( U147 | US            | {BEACH, ROAD, URBAN} ) |
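A sketch (Python; the equivalence classes for ROAD, URBAN, and AIRPORT are assumptions consistent with the earlier rough-set example) of this selection: a tuple enters the lower approximation when its FEATURE classes equal the selection's classes, and the upper approximation when they contain them:

    CLASSES = {
        # value -> its equivalence class (classes assumed for illustration)
        "SAND":    frozenset({"BEACH", "COAST", "SAND"}),
        "BEACH":   frozenset({"BEACH", "COAST", "SAND"}),
        "COAST":   frozenset({"BEACH", "COAST", "SAND"}),
        "ROAD":    frozenset({"ROAD"}),
        "URBAN":   frozenset({"BUILDING", "URBAN"}),
        "AIRPORT": frozenset({"AIRPORT"}),
    }

    X = [
        ("U157", frozenset({"USA", "MEXICO"}), frozenset({"SAND", "ROAD"})),
        ("M007", frozenset({"MEXICO"}), frozenset({"COAST", "AIRPORT"})),
        ("U147", frozenset({"US"}), frozenset({"BEACH", "ROAD", "URBAN"})),
    ]

    def classes_of(values):
        return {CLASSES[v] for v in values}

    target = classes_of({"BEACH", "ROAD"})
    print([t[0] for t in X if classes_of(t[2]) == target])  # ['U157'] (lower)
    print([t[0] for t in X if classes_of(t[2]) >= target])  # ['U157', 'U147'] (upper)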
Rough Projection

πB(X) is the rough relation { t(B) | t ∈ X }.

Rough projection must maintain lower and upper approximations; when two projected tuples become equivalent, one from the lower approximation and one from the upper, the tuple is maintained in the lower approximation.

X: [table in figure]

πCOUNTRY(X):
  US
  {US, MEXICO}
  MEXICO
  BELIZE
  {BELIZE, INT}

πFEATURE(σFEATURE={BEACH, ROAD}(X)):
  {SAND, ROAD}
  ( {SAND, ROAD, URBAN} )
Rough Join

The specification of ⋈ is lengthy, but straightforward: for testing whether tuples join, the lower approximation uses equality (as usual) and the upper approximation uses rough-set subset (as usual), but in either direction. Both lower and upper tuples are compared.

Operands:

πCOUNTRY,FEATURE(σFEATURE={SAND,ROAD}(X)):
| {US, MEXICO} | {SAND, ROAD}          |
| MEXICO       | {SAND, ROAD}          |
| ( US         | {SAND, ROAD, URBAN} ) |

πCOUNTRY,FEATURE(σFEATURE=MARSH(X)):
| US    | MARSH                     |
| ( US  | {MARSH, LAKE} )           |
| ( USA | {MARSH, PASTURE, RIVER} ) |

πCOUNTRY,FEATURE(σFEATURE={SAND,ROAD}(X)) ⋈COUNTRY πCOUNTRY,FEATURE(σFEATURE=MARSH(X)):
| US             | {SAND, ROAD, URBAN} | MARSH                     |
| US             | {SAND, ROAD, URBAN} | {MARSH, LAKE}             |
| US             | {SAND, ROAD, URBAN} | {MARSH, PASTURE, RIVER}   |
| ( {US, MEXICO} | {SAND, ROAD}        | MARSH )                   |
| ( {US, MEXICO} | {SAND, ROAD}        | {MARSH, LAKE} )           |
| ( {US, MEXICO} | {SAND, ROAD}        | {MARSH, PASTURE, RIVER} ) |
Rough Relational Set Operators

- Rough difference, T = X – Y:
  - R̲T = { t | t ∈ R̲X and t ∉ R̲Y } and
  - R̄T = { t | t ∈ R̄X and t ∉ R̄Y }
- Rough union, T = X ∪ Y:
  - R̲T = { t | t ∈ R̲X ∪ R̲Y } and
  - R̄T = { t | t ∈ R̄X ∪ R̄Y }
- Rough intersection, T = X ∩ Y:
  - R̲T = { t | t ∈ R̲X ∩ R̲Y } and
  - R̄T = { t | t ∈ R̄X ∩ R̄Y }

Example: πFEATURE(σCOUNTRY=US(X)) – πFEATURE(σCOUNTRY=MEXICO(X))

πFEATURE(σCOUNTRY=US(X)):
  {MARSH, LAKE}
  MARSH
  {MARSH, PASTURE, RIVER}
  {FOREST, RIVER}
  {SAND, ROAD, URBAN}
  ( {SAND, ROAD} )

πFEATURE(σCOUNTRY=MEXICO(X)):
  {SAND, ROAD}
  BEACH
  SAND

Note that ( {SAND, ROAD} ), which is an upper approximation after the selection, disappears in the projection, and is thus not removed in the difference operation.