Similarity in CBR
Sources:
–Chapter 4
–www.iiia.csic.es/People/enric/AICom.html
–www.ai-cbr.org
Computing Similarity
•Similarity is a key (the key?) concept in CBR
•We saw that a case consists of:
Problem
Solution
Adequacy
•We saw that the CBR problem-solving cycle consists of:
Retrieval
Reuse
Revise
Retain
•We will distinguish between:
Meaning of similarity
Formal axioms capturing this meaning
Meaning of Similarity
Observation 1: Similarity always concentrates on one aspect or
task:
There is no absolute similarity
Example:
•Two cars are similar if they have similar capacity
(two compact cars may be similar to each other but
not to a full-size car)
•Two cars are similar if they have similar price (a new
compact car may be similar to an old full-size car but
not to an old compact car)
When computing similarity we are concentrating on one such
aspect or aggregating several such aspects
Meaning of Similarity (2)
Observation 2: Similarity is not always transitive:
Example:
I define similar to mean “within walking distance”
•“Lehigh’s book store” is similar to “Lupita’s”
•“Lupita’s” is similar to “Perkins”
•“Perkins” is similar to “Monrovia book store”
•…
•But: “Lehigh’s book store” is not similar to “Best
Buy” in Allentown !
The problem is that the property “small difference” cannot be
propagated
Meaning of Similarity (3)
Observation 3: Similarity is not always symmetric:
Example:
• “Mike Tyson fights like a lion”
• But do we really want to say that “a lion fights like
Mike Tyson”?
The problem is that in general the distance from an element to
a prototype of a category is larger than the other way around
Similarity and Utility in CBR
•Utility: measure of the improvement in efficiency as a result of a
body of knowledge (We’ll come back to this point)
The goal of similarity is to select cases that can be easily
adapted to solve a new problem
Similarity = Prediction of the utility of the case
•However:
 The similarity is an a priori criterion
 The utility is an a posteriori criterion
• Ideal: Similarity makes a good prediction of the utility
Axioms for Similarity
•There are 3 types of axioms:
Binary similarity predicate “x and y are similar”
Binary dissimilarity predicate “x and y are dissimilar”
Similarity as order relation: “x is at least as similar to y as it
is to z”
•Observation:
The first and the second are equivalent
The third provides more information: grade of similarity
Similarity Relations
•We want to define a relation:
R(x,y,z) iff “x is at least as similar to y as x is
similar to z”
•First let’s consider the following relation:
S(x,y,u,v) iff “x is at least as similar to y as u is similar
to v”
Definition of R in terms of S:
R(x,y,z) iff S(x,y,x,z)
Similarity Relations (2)
•Possible requirements on the relation S:
1. Reflexive: S(x,x,u,v)
2. Symmetry: S(x,y,y,x)
3. Transitivity: S(x,y,u,v) & S(u,v,s,t) ⇒ S(x,y,s,t)
4. Symmetry: S(x,y,u,v) iff S(y,x,u,v) iff S(x,y,v,u)
Similarity Relations (3)
In CBR we have an object x fixed when computing
similarity. Which x?
The new problem
We are looking for a y such that y is the most similar to x.
In terms of R this can be seen as:
∀z: R(x,y,z)
•Given a problem x we can define an ordering relation ≥x as
follows:
y ≥x z iff R(x,y,z)
y >x z iff (y ≥x z and ¬(z ≥x y))
y ~x z iff (y ≥x z and z ≥x y)
Similarity Metric
•We want to assign a number to indicate the similarity between
a case and a problem
Definition: A similarity metric over a set M is a function:
sim: M × M → [0,1]
Such that:
For all x in M: sim(x,x) = 1 holds
For all x, y in M: sim(x,y) = sim(y,x)
“ the closer the value of sim(x,y) to 1, the more similar is x to y”
Similarity Metric (2)
Given a similarity metric sim: M × M → [0,1], it induces a
similarity relation Ssim(x,y,u,v) and an ordering ≥x as follows:
For all x, y, u, v: Ssim(x,y,u,v) holds iff sim(x,y) ≥ sim(u,v)
For all x, y, z: y ≥x z iff sim(x,y) ≥ sim(x,z)

•sim provides a quantitative value for similarity:
[Figure: the values sim(x, y1), …, sim(x, y4) plotted on the
interval [0,1]; sim(x, y4) is closest to 1, so y4 is the most
similar to x]
Distance Metric
•Definition: A distance function over a set M is a
function:
d: M × M → [0,∞)
Such that:
For all x in M: d(x,x) = 0 holds
For all x, y in M: d(x,y) = d(y,x)
•Definition: A distance function over a set M is a
metric if:
For all x, y in M: if d(x,y) = 0 then x = y
For all x, y, z in M: d(x,z) + d(z,y) ≥ d(x,y)
Relation between Similarity and
Distance Metric
Given a distance metric d, it induces a similarity
relation Sd(x,y,u,v) and an ordering ≥x as follows:
For all x, y, u, v: Sd(x,y,u,v) holds iff d(x,y) ≤ d(u,v)
For all x, y, z: y ≥x z iff d(x,y) ≤ d(x,z)
Definition: A similarity metric sim and a distance metric d
are compatible iff:
for all x,y, u, v: Sd(x,y,u,v) iff Ssim(x,y,u,v)
Relation between Similarity and
Distance Metric (2)
Property: Let
f: [0,∞) → (0,1]
be a bijective and order-inverting (if u < v then f(v) < f(u))
function such that:
•f(0) = 1
•f(d(x,y)) = sim(x,y)
then d and sim are compatible:
if d(x,y) < d(u,v) then f(d(x,y)) > f(d(u,v)),
that is, sim(x,y) > sim(u,v)
Relation between Similarity and
Distance Metric (3)
f can be used to construct sim given d. An example of such a
function:
•if you have the Euclidean distance:
d((x,y),(u,v)) = sqrt((x-u)² + (y-v)²)
•since f(x) = 1 - (x/(x+1)) meets the property above
•then:
sim((x,y),(u,v)) = f(d((x,y),(u,v)))
= 1 - (d((x,y),(u,v)) / (d((x,y),(u,v)) + 1))
is a similarity metric
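As a minimal sketch, this conversion can be computed directly (the function names are illustrative, not from the slides):

```python
import math

def euclidean(p, q):
    # Euclidean distance between two points p = (x, y) and q = (u, v)
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def sim(p, q):
    # Similarity induced by f(x) = 1 - x/(x + 1): equals 1 at distance 0
    # and decreases toward 0 as the distance grows
    d = euclidean(p, q)
    return 1 - d / (d + 1)
```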
Relation between Similarity and
Distance Metric (4)
•The function f(x) = 1 - (x/(x+1)) is a bijective function from
[0,∞) into (0,1]:
[Figure: plot of f, decreasing from f(0) = 1 toward 0 as x → ∞]
Other Similarity Metrics
•Suppose that we have cases represented as attribute-value
pairs (e.g., the restaurant domain)
•Suppose initially that the values are binary
•We want to define similarity between two cases of the form:
X = (X1, …, Xn) where Xi = 0 or 1
Y = (Y1, …,Yn) where Yi = 0 or 1
Preliminaries
Let:
A = Σ(i=1,n) Xi·Yi (number of attributes for which
Xi = 1 and Yi = 1)
B = Σ(i=1,n) Xi·(1-Yi) (number of attributes for which
Xi = 1 and Yi = 0)
C = Σ(i=1,n) (1-Xi)·Yi (number of attributes for which
Xi = 0 and Yi = 1)
D = Σ(i=1,n) (1-Xi)·(1-Yi) (number of attributes for which
Xi = 0 and Yi = 0)
Then, A + B + C + D = n
A + D = “matching attributes”
B + C = “mismatching attributes”
Hamming Distance
H(X,Y) = n - Σ(i=1,n) Xi·Yi -
Σ(i=1,n) (1-Xi)·(1-Yi)
Properties:
Range of H: [0,n]
H counts the mismatch between the attribute values
H is a distance metric:
•H(X,X) = 0
•H(X,Y) = H(Y,X)
H((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) =
H((X1, …, Xn), (Y1, …,Yn))
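A minimal sketch of H for binary attribute vectors (the function name is illustrative):

```python
def hamming(X, Y):
    # H(X, Y) = n - sum(Xi*Yi) - sum((1-Xi)*(1-Yi)): the number of
    # attributes on which the two binary vectors disagree
    n = len(X)
    a = sum(x * y for x, y in zip(X, Y))              # 1/1 matches (A)
    d = sum((1 - x) * (1 - y) for x, y in zip(X, Y))  # 0/0 matches (D)
    return n - a - d
```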
Simple-Matching-Coefficient (SMC)
# of mismatches
H(X,Y) = n – (A + D) = B + C
•Another distance-similarity compatible function is
f(x) = 1 – x/max (where max is the maximum value for x)
We can define the SMC similarity, simH, as the proportion of
matching attributes:
simH(X,Y) = 1 - ((n - (A+D))/n) = (A+D)/n = 1 - ((B+C)/n)
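A direct sketch of simH, counting the matching positions (the function name is illustrative):

```python
def sim_smc(X, Y):
    # simH(X, Y) = (A + D)/n: the proportion of matching attributes
    matches = sum(1 for x, y in zip(X, Y) if x == y)  # A + D
    return matches / len(X)
```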
Simple-Matching-Coefficient (SMC)
(II)
•If we write simH(X,Y) = (A+D)/n = 1 - ((B+C)/n) = factor(A, B, C, D),
then factor is:
Monotonic:
If A ≤ A’ then: factor(A,B,C,D) ≤ factor(A’,B,C,D)
If B ≤ B’ then: factor(A,B’,C,D) ≤ factor(A,B,C,D)
If C ≤ C’ then: factor(A,B,C’,D) ≤ factor(A,B,C,D)
If D ≤ D’ then: factor(A,B,C,D) ≤ factor(A,B,C,D’)
Symmetric:
simH (X,Y) = simH(Y,X)
Variations of the SMC
•The Hamming similarity assigns equal value to matches (both 0 or
both 1)
•There are situations in which you want to count a match on 1
differently from a match on 0
Thus, sim((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) =
sim((X1, …, Xn), (Y1, …,Yn)) may not hold
Example: The symptoms of two patients are similar if both
have fever (Xi = 1 and Yi = 1) but not similar if
neither has fever (Xi = 0 and Yi = 0)
Specific attributes may be more important than other attributes
Example: manufacturing domain: some parts of the workpiece
are more important than others
Variations of SMC (III)
•simH(X,Y) = (A+D)/n = (A+D)/(A+B+C+D)
•We introduce a weight, α, with 0 < α < 1:
simα(X,Y) = (α(A+D)) / (α(A+D) + (1 - α)(B+C))
For which α is simα(X,Y) = simH(X,Y)?
α = 0.5
simα(X,Y) preserves the monotonicity and symmetry conditions
The Similarity Depends Only on A,
B, C and D (3)
•What is the role of α? What happens if α > 0.5? If α < 0.5?
simα(X,Y) = (α(A+D)) / (α(A+D) + (1 - α)(B+C))
[Figure: simα plotted against the number of matching attributes
(from 0 to n); the curve for α > 0.5 lies above the line for
α = 0.5, which lies above the curve for α < 0.5]
•If α > 0.5 we give more
weight to the matching
attributes
•If α < 0.5 we give more
weight to the mismatching attributes
Discarding 0-match
•Thus, sim((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) =
sim((X1, …, Xn), (Y1, …,Yn)) may not hold
•Only attributes present in both cases (i.e., Xi = 1 and Yi = 1)
contribute to the similarity
Possible definition of the similarity:
sim(X,Y) = A / (A + B + C)
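A sketch of this definition (the handling of the boundary case A + B + C = 0 is an assumption; the slides leave it undefined):

```python
def sim_jaccard(X, Y):
    # A / (A + B + C): 0/0 matches (the count D) are discarded entirely.
    # Returning 1.0 when no attribute occurs in either case is an
    # assumption, not prescribed by the slides.
    A = sum(1 for x, y in zip(X, Y) if x == 1 and y == 1)
    B = sum(1 for x, y in zip(X, Y) if x == 1 and y == 0)
    C = sum(1 for x, y in zip(X, Y) if x == 0 and y == 1)
    return A / (A + B + C) if A + B + C > 0 else 1.0
```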
Specific Attributes may be More
Important Than Other Attributes
•Significance of the attributes varies
•Weighted Hamming distance:
There is a weight vector (ω1, …, ωn) such that
Σ(i=1,n) ωi = 1
HW(X,Y) = 1 - Σ(i=1,n) ωi·Xi·Yi -
Σ(i=1,n) ωi·(1-Xi)·(1-Yi)
•Example: “Process planning: some features are more
important than others”
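A sketch of HW, assuming a weight vector that sums to 1 (the function name is illustrative):

```python
def weighted_hamming(X, Y, w):
    # HW(X, Y) = 1 - sum(wi*Xi*Yi) - sum(wi*(1-Xi)*(1-Yi));
    # since the weights wi sum to 1, HW stays in [0, 1]
    assert abs(sum(w) - 1.0) < 1e-9
    a = sum(wi * x * y for wi, x, y in zip(w, X, Y))
    d = sum(wi * (1 - x) * (1 - y) for wi, x, y in zip(w, X, Y))
    return 1 - a - d
```

With equal weights ωi = 1/n, HW reduces to the plain Hamming distance divided by n.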
Non Monotonic Similarity
•The monotonicity condition on similarity, formally, says that:
sim(A,B) ≤ sim(A’,B)
always holds if A counts the number of matches and A ≤ A’
•Informally the monotonicity condition can be expressed as:
For any attribute-value vectors X, Y, X’: if we obtain X’ by
modifying X on the value of one attribute such that X’ and Y
have the same value on that attribute, then: sim(X,Y) ≤ sim(X’,Y)
Non Monotonic Similarity (2)
Is the Hamming similarity monotonic? Yes:
simH(X,Y) = Σ(i=1,n) eq(Xi,Yi) / n
Consider the XOR function:
(0,0) and (1,1) are in the same class (+)
(0,1) and (1,0) are in the same class (-)
Thus a similarity that respects the classes needs
d((1,1),(1,0)) > d((1,1),(0,0)), even though (1,1) matches
(1,0) on more attributes than (0,0)
Is this monotonic? No
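The XOR example can be checked mechanically (the helper names are illustrative):

```python
def xor_class(v):
    # class of a point under XOR: (0,0), (1,1) -> 0; (0,1), (1,0) -> 1
    return v[0] ^ v[1]

def sim_smc(X, Y):
    # plain SMC similarity: proportion of matching attributes
    return sum(1 for x, y in zip(X, Y) if x == y) / len(X)

# (1,1) and (0,0) share a class while (1,1) and (1,0) do not,
# yet SMC rates (1,1) as more similar to (1,0) than to (0,0)
```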
Non Monotonic Similarity (3)
•You may think: “well, that was mathematics; what about the real
world?”
•Suppose that we have two interconnected batteries B and B’
and 3 lamps X, Y and Z that have the following properties:
 If X is on, B and B’ work
 If Y is on, B or B’ work
 If Z is on, B works
Situation   X   Y   Z   B      B’
1           0   1   1   Ok     Fail
2           0   1   0   Fail   Ok
3           0   0   0   Fail   Fail
Thus:
• sim(1,3) > sim(1,2)
• Non monotonic!
Tversky Contrast Model
•Defines a non monotonic distance
•Comparison of a situation S with a prototype P (i.e, a case)
•S and P are sets of features
•The following sets:
A = S ∩ P
B = P - S
C = S - P
[Figure: Venn diagram of the sets P and S; A is their
intersection, B the part of P outside S, and C the part of S
outside P]
Tversky Contrast Model (2)
•Tversky-distance:
T(P,S) = α·f(A) - β·f(B) - γ·f(C)
•where f maps sets to [0,∞)
•f, α, β, and γ are fixed and defined by the user
•Example:
If f(A) = # elements in A
and α = β = γ = 1
T counts the number of elements in common minus the
differences
The Tversky-distance is not symmetric
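A sketch of T over Python sets, with f = set cardinality as the default (the parameter defaults are illustrative):

```python
def tversky(P, S, f=len, alpha=1.0, beta=1.0, gamma=1.0):
    # T(P, S) = alpha*f(A) - beta*f(B) - gamma*f(C), where
    # A = S & P (common features), B = P - S, C = S - P
    return alpha * f(S & P) - beta * f(P - S) - gamma * f(S - P)
```

With β ≠ γ, swapping P and S can change the value, which exhibits the asymmetry noted above.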
Local versus Global Similarity Metrics
• In many situations we have similarity metrics between attributes
of the same type (called local similarity metrics). Example:
For a complex engine, we may have a similarity for the
temperature of the engine
• In such situations a reasonable approach to define a global
similarity sim(x,y) is to “aggregate” the local similarity
metrics simi(xi,yi). A widely used practice
• What requirements should we give to sim(x,y) in terms of
the use of simi(xi,yi)?
sim(x,y) to increate monotonically with each simi(xi,yi).
Local versus Global Similarity Metrics
(Formal Definitions)
•A local similarity metric on an attribute Ti is a similarity metric simi:
Ti × Ti → [0,1]
•A function Φ: [0,1]ⁿ → [0,1] is an aggregation function if:
Φ(0,0,…,0) = 0
Φ is monotonic non-decreasing in every argument
•Given a collection of n similarity metrics sim1, …, simn, for attributes
taking values from Ti, a global similarity metric is a similarity metric
sim: V × V → [0,1], V ⊆ T1 × … × Tn, such that there is an aggregation
function Φ with:
sim(X,Y) = Φ(sim1(X1,Y1), …, simn(Xn,Yn))
Example: Φ(X1,X2,…,Xn) = (X1+X2+…+Xn)/n
Example
• Cases may contain attributes of type:
– real number A: the voltage output of a device
• define a local similarity metric, simvoltage()
– Integer B: revolutions per second
• define a local similarity metric, simrps()
– A bunch of symbolic attributes m = (C1,…,Cm): front light
blinking or not, year of manufacture, etc.
• define a Hamming similarity, simH(), combining all
these attributes
• Define an aggregated similarity metric sim():
sim(C,C’) = ω1·simvoltage(A,A’) + ω2·simrps(B,B’) + ω3·simH(m, m’)
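A sketch of this aggregated metric; all local metrics, weights, and attribute names below are illustrative assumptions, not prescribed by the slides:

```python
def sim_voltage(a, b):
    # illustrative local metric for the real-valued voltage attribute
    return 1 / (1 + abs(a - b))

def sim_rps(a, b):
    # illustrative local metric for the integer revolutions-per-second
    return 1 / (1 + abs(a - b))

def sim_h(m1, m2):
    # SMC over the tuple of symbolic attributes
    return sum(1 for u, v in zip(m1, m2) if u == v) / len(m1)

def sim_case(c1, c2, w=(0.5, 0.3, 0.2)):
    # weighted sum w1*sim_voltage + w2*sim_rps + w3*sim_h; the weights
    # sum to 1 so the result stays in [0, 1]
    return (w[0] * sim_voltage(c1["voltage"], c2["voltage"])
            + w[1] * sim_rps(c1["rps"], c2["rps"])
            + w[2] * sim_h(c1["symbols"], c2["symbols"]))
```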
Homework (1 of 2)
1. In Slide 12 we define the similarity relation Ssim(x,y,u,v).
Which of the 4 kinds of relations defined in Slide 9 are
satisfied by Ssim(x,y,u,v)?
2. Let us define:
SH(x,y,u,v) iff H(x,y) ≤ H(u,v)
where H is the Hamming distance (defined in Slide 20).
Which of the 4 kinds of relations defined in Slide 9 are
satisfied by SH(x,y,u,v)?
3. Let us define:
ST(x,y,u,v) iff T(x,y) ≥ T(u,v)
where T is the Tversky Contrast Model (defined in Slide
31). Which of the 4 kinds of relations defined in Slide 9
are satisfied by ST(x,y,u,v)?
Homework (2 of 2)
4. Define a formula for the Hamming distance when the
attributes are symbolic but may take more than 2 values:
•X = (X1, …, Xn) where Xi  Ti
•Y = (Y1, …,Yn) where Yi  Ti
•Each Ti is finite