Similarity in CBR
Sources:
–Chapter 4
–www.iiia.csic.es/People/enric/AICom.html
–www.ai-cbr.org
Computing Similarity
•Similarity is a key (the key?) concept in CBR
•We saw that a case consists of:
Problem
Solution
Adequacy
•We saw that the CBR problem-solving cycle consists of:
Retrieval
Reuse
Revise
Retain
•We will distinguish between:
Meaning of similarity
Formal axioms capturing this meaning
Meaning of Similarity
Observation 1: Similarity always concentrates on one aspect or
task:
There is no absolute similarity
Example:
•Two cars are similar if they have similar capacity
(two compact cars may be similar to each other but
not to a full-size car)
•Two cars are similar if they have similar price (a new
compact car may be similar to an old full-size car but
not to an old compact car)
When computing similarity we are concentrating on one such
aspect or aggregating several such aspects
Meaning of Similarity (2)
Observation 2: Similarity is not always transitive:
Example:
I define similar to mean “within walking distance”
•“Lehigh’s book store” is similar to “Lupita’s”
•“Lupita’s” is similar to “Perkins”
•“Perkins” is similar to “Monrovia book store”
•…
•But: “Lehigh’s book store” is not similar to “Best
Buy” in Allentown !
The problem is that the property “small difference” cannot be
propagated
Meaning of Similarity (3)
Observation 3: Similarity is not always symmetric:
Example:
• “Mike Tyson fights like a lion”
• But do we really want to say that “a lion fights like
Mike Tyson”?
The problem is that in general the distance from an element to
a prototype of a category is larger than the other way around
Similarity and Utility in CBR
•Utility: measure of the improvement in efficiency as a result of a
body of knowledge (We’ll come back to this point)
The goal of similarity is to select cases that can be easily
adapted to solve a new problem
Similarity = Prediction of the utility of the case
•However:
 The similarity is an a priori criterion
 The utility is an a posteriori criterion
• Ideal: Similarity makes a good prediction of the utility
Axioms for Similarity
•There are 3 types of axioms:
Binary similarity predicate “x and y are similar”
Binary dissimilarity predicate “x and y are dissimilar”
Similarity as order relation: “x is at least as similar to y as it
is to z”
•Observation:
The first and the second are equivalent
The third provides more information: grade of similarity
Similarity Relations
•We want to define a relation:
R(x,y,z) iff “x is at least as similar to y as x is
similar to z”
•First let’s consider the following relation:
S(x,y,u,v) iff “x is at least as similar to y as u is similar
to v”
Definition of R in terms of S:
R(x,y,z) iff S(x,y,x,z)
Similarity Relations (2)
•Possible requirements on the relation S:
1. Reflexive: S(x,x,u,v)
2. Symmetry: S(x,y,y,x)
3. Transitivity: S(x,y,u,v) & S(u,v,s,t) ⇒ S(x,y,s,t)
4. Symmetry: S(x,y,u,v) iff S(y,x,u,v) iff S(x,y,v,u)
Similarity Relations (3)
In CBR we have an object x fixed when computing
similarity. Which x?
The new problem
We are looking for a y such that y is the most similar to x.
In terms of R this can be seen as:
∀z: R(x,y,z)
•Given a problem x we can define an ordering relation ≥x as
follows:
y ≥x z iff R(x,y,z)
y >x z iff (y ≥x z and ¬(z ≥x y))
y ~x z iff (y ≥x z and z ≥x y)
Similarity Metric
•We want to assign a number to indicate the similarity between
a case and a problem
Definition: A similarity metric over a set M is a function:
sim: M × M → [0,1]
Such that:
For all x in M: sim(x,x) = 1 holds
For all x, y in M: sim(x,y) = sim(y,x)
“ the closer the value of sim(x,y) to 1, the more similar is x to y”
Similarity Metric (2)
Given a similarity metric sim: M × M → [0,1], it induces a
similarity relation Ssim(x,y,u,v) and an ordering ≥x as follows:
For all x, y, u, v: Ssim(x,y,u,v) holds iff sim(x,y) ≥ sim(u,v)
For all x, y, z: y ≥x z iff sim(x,y) ≥ sim(x,z)

•sim provides a quantitative value for similarity:
[Figure: the values sim(x, y1), …, sim(x, y4) plotted on the
interval [0,1]; sim(x, y4) is closest to 1, so y4 is the most
similar to x]
Distance Metric
•Definition: A distance function over a set M is a
function:
d: M × M → [0,∞)
Such that:
For all x in M: d(x,x) = 0 holds
For all x, y in M: d(x,y) = d(y,x)
•Definition: A distance function over a set M is a
metric if:
For all x, y in M: if d(x,y) = 0 then x = y
For all x, y, z in M: d(x,z) + d(z,y) ≥ d(x,y)
Relation between Similarity and
Distance Metric
Given a distance metric d, it induces a similarity
relation Sd(x,y,u,v) and an ordering ≥x as follows:
For all x, y, u, v: Sd(x,y,u,v) holds iff d(x,y) ≤ d(u,v)
For all x, y, z: y ≥x z iff d(x,y) ≤ d(x,z)
Definition: A similarity metric sim and a distance metric d
are compatible iff:
for all x,y, u, v: Sd(x,y,u,v) iff Ssim(x,y,u,v)
Relation between Similarity and
Distance Metric (2)
Property: Let
f: [0,∞) → (0,1]
be a bijective and order-inverting (if u < v then f(v) < f(u))
function such that:
•f(0) = 1
•f(d(x,y)) = sim(x,y)
then d and sim are compatible:
if d(x,y) < d(u,v) then f(d(x,y)) > f(d(u,v)),
that is, sim(x,y) > sim(u,v)
Relation between Similarity and
Distance Metric (3)
f can be used to construct sim given d. An example of such a
function:
•if you have the Euclidean distance:
d((x,y),(u,v)) = sqrt((x-u)² + (y-v)²)
•since f(x) = 1 - (x/(x+1)) meets the property above
•then:
sim((x,y),(u,v)) = f(d((x,y),(u,v)))
= 1 - (d((x,y),(u,v)) / (d((x,y),(u,v)) + 1))
is a similarity metric
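As a minimal sketch, this conversion can be computed directly (the function names are illustrative, not from the slides):

```python
import math

def euclidean(p, q):
    # Euclidean distance between two points p = (x, y) and q = (u, v)
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def sim(p, q):
    # Similarity induced by f(x) = 1 - x/(x + 1): equals 1 at distance 0
    # and decreases toward 0 as the distance grows
    d = euclidean(p, q)
    return 1 - d / (d + 1)
```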
Relation between Similarity and
Distance Metric (4)
•The function f(x) = 1 - (x/(x+1)) is a bijective function from
[0,∞) into (0,1]:
[Figure: plot of f, decreasing from f(0) = 1 toward 0 as x → ∞]
Other Similarity Metrics
•Suppose that we have cases represented as attribute-value
pairs (e.g., the restaurant domain)
•Suppose initially that the values are binary
•We want to define similarity between two cases of the form:
X = (X1, …, Xn) where Xi = 0 or 1
Y = (Y1, …,Yn) where Yi = 0 or 1
Preliminaries
Let:
A = Σ(i=1,n) Xi·Yi (number of attributes for which
Xi = 1 and Yi = 1)
B = Σ(i=1,n) Xi·(1-Yi) (number of attributes for which
Xi = 1 and Yi = 0)
C = Σ(i=1,n) (1-Xi)·Yi (number of attributes for which
Xi = 0 and Yi = 1)
D = Σ(i=1,n) (1-Xi)·(1-Yi) (number of attributes for which
Xi = 0 and Yi = 0)
Then, A + B + C + D = n
A + D = “matching attributes”
B + C = “mismatching attributes”
Hamming Distance
H(X,Y) = n - Σ(i=1,n) Xi·Yi -
Σ(i=1,n) (1-Xi)·(1-Yi)
Properties:
Range of H: [0,n]
H counts the mismatch between the attribute values
H is a distance metric:
•H(X,X) = 0
•H(X,Y) = H(Y,X)
H((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) =
H((X1, …, Xn), (Y1, …,Yn))
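A minimal sketch of H for binary attribute vectors (the function name is illustrative):

```python
def hamming(X, Y):
    # H(X, Y) = n - sum(Xi*Yi) - sum((1-Xi)*(1-Yi)): the number of
    # attributes on which the two binary vectors disagree
    n = len(X)
    a = sum(x * y for x, y in zip(X, Y))              # 1/1 matches (A)
    d = sum((1 - x) * (1 - y) for x, y in zip(X, Y))  # 0/0 matches (D)
    return n - a - d
```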
Simple-Matching-Coefficient (SMC)
# of mismatches
H(X,Y) = n – (A + D) = B + C
•Another distance-similarity compatible function is
f(x) = 1 – x/max (where max is the maximum value for x)
We can define the SMC similarity, simH, as the proportion of
matching attributes:
simH(X,Y) = 1 - ((n - (A+D))/n) = (A+D)/n = 1 - ((B+C)/n)
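A direct sketch of simH, counting the matching positions (the function name is illustrative):

```python
def sim_smc(X, Y):
    # simH(X, Y) = (A + D)/n: the proportion of matching attributes
    matches = sum(1 for x, y in zip(X, Y) if x == y)  # A + D
    return matches / len(X)
```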
Simple-Matching-Coefficient (SMC)
(II)
•If we write simH(X,Y) = (A+D)/n = 1 - ((B+C)/n) = factor(A, B, C, D),
then factor is:
Monotonic:
If A ≤ A’ then: factor(A,B,C,D) ≤ factor(A’,B,C,D)
If B ≤ B’ then: factor(A,B’,C,D) ≤ factor(A,B,C,D)
If C ≤ C’ then: factor(A,B,C’,D) ≤ factor(A,B,C,D)
If D ≤ D’ then: factor(A,B,C,D) ≤ factor(A,B,C,D’)
Symmetric:
simH (X,Y) = simH(Y,X)
Variations of the SMC
•The Hamming similarity assigns equal value to matches (both 0 or
both 1)
•There are situations in which you want to count a match on 1
differently from a match on 0
Thus, sim((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) =
sim((X1, …, Xn), (Y1, …,Yn)) may not hold
Example: The symptoms of two patients are similar if both
have fever (Xi = 1 and Yi = 1) but not similar if
neither has fever (Xi = 0 and Yi = 0)
Specific attributes may be more important than other attributes
Example: manufacturing domain: some parts of the workpiece
are more important than others
Variations of SMC (III)
•simH(X,Y) = (A+D)/n = (A+D)/(A+B+C+D)
•We introduce a weight, α, with 0 < α < 1:
simα(X,Y) = (α(A+D)) / (α(A+D) + (1 - α)(B+C))
For which α is simα(X,Y) = simH(X,Y)?
α = 0.5
simα(X,Y) preserves the monotonicity and symmetry conditions
The Similarity Depends Only on A,
B, C and D (3)
•What is the role of α? What happens if α > 0.5? If α < 0.5?
simα(X,Y) = (α(A+D)) / (α(A+D) + (1 - α)(B+C))
[Figure: simα plotted against the number of matching attributes
(from 0 to n); the curve for α > 0.5 lies above the line for
α = 0.5, which lies above the curve for α < 0.5]
•If α > 0.5 we give more
weight to the matching
attributes
•If α < 0.5 we give more
weight to the mismatching attributes
Discarding 0-match
•Thus, sim((1-X1, …, 1-Xn), (1-Y1, …,1-Yn)) =
sim((X1, …, Xn), (Y1, …,Yn)) may not hold
•Only attributes present in both cases (i.e., Xi = 1 and Yi = 1)
contribute to the similarity
Possible definition of the similarity:
sim(X,Y) = A / (A + B + C)
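A sketch of this definition (the handling of the boundary case A + B + C = 0 is an assumption; the slides leave it undefined):

```python
def sim_jaccard(X, Y):
    # A / (A + B + C): 0/0 matches (the count D) are discarded entirely.
    # Returning 1.0 when no attribute occurs in either case is an
    # assumption, not prescribed by the slides.
    A = sum(1 for x, y in zip(X, Y) if x == 1 and y == 1)
    B = sum(1 for x, y in zip(X, Y) if x == 1 and y == 0)
    C = sum(1 for x, y in zip(X, Y) if x == 0 and y == 1)
    return A / (A + B + C) if A + B + C > 0 else 1.0
```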
Specific Attributes may be More
Important Than Other Attributes
•Significance of the attributes varies
•Weighted Hamming distance:
There is a weight vector (ω1, …, ωn) such that
Σ(i=1,n) ωi = 1
HW(X,Y) = 1 - Σ(i=1,n) ωi·Xi·Yi -
Σ(i=1,n) ωi·(1-Xi)·(1-Yi)
•Example: “Process planning: some features are more
important than others”
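A sketch of HW, assuming a weight vector that sums to 1 (the function name is illustrative):

```python
def weighted_hamming(X, Y, w):
    # HW(X, Y) = 1 - sum(wi*Xi*Yi) - sum(wi*(1-Xi)*(1-Yi));
    # since the weights wi sum to 1, HW stays in [0, 1]
    assert abs(sum(w) - 1.0) < 1e-9
    a = sum(wi * x * y for wi, x, y in zip(w, X, Y))
    d = sum(wi * (1 - x) * (1 - y) for wi, x, y in zip(w, X, Y))
    return 1 - a - d
```

With equal weights ωi = 1/n, HW reduces to the plain Hamming distance divided by n.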
Non Monotonic Similarity
•The monotonicity condition on similarity, formally, says that:
sim(A,B) ≤ sim(A’,B)
always holds if A counts the number of matches and A ≤ A’
•Informally the monotonicity condition can be expressed as:
For any attribute-value vectors X, Y, X’: if we obtain X’ by
modifying X on the value of one attribute such that X’ and Y
have the same value on that attribute, then: sim(X,Y) ≤ sim(X’,Y)
Non Monotonic Similarity (2)
Is the Hamming similarity monotonic? Yes:
simH(X,Y) = Σ(i=1,n) eq(Xi,Yi) / n
Consider the XOR function:
(0,0) and (1,1) are in the same class (+)
(0,1) and (1,0) are in the same class (-)
Thus a similarity that respects the classes needs
d((1,1),(1,0)) > d((1,1),(0,0)), even though (1,1) matches
(1,0) on more attributes than (0,0)
Is this monotonic? No
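The XOR example can be checked mechanically (the helper names are illustrative):

```python
def xor_class(v):
    # class of a point under XOR: (0,0), (1,1) -> 0; (0,1), (1,0) -> 1
    return v[0] ^ v[1]

def sim_smc(X, Y):
    # plain SMC similarity: proportion of matching attributes
    return sum(1 for x, y in zip(X, Y) if x == y) / len(X)

# (1,1) and (0,0) share a class while (1,1) and (1,0) do not,
# yet SMC rates (1,1) as more similar to (1,0) than to (0,0)
```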
Non Monotonic Similarity (3)
•You may think: “well, that was mathematics; what about the real
world?”
•Suppose that we have two interconnected batteries B and B’
and 3 lamps X, Y and Z that have the following properties:
 If X is on, B and B’ work
 If Y is on, B or B’ work
 If Z is on, B works
Situation   X   Y   Z   B      B’
1           0   1   1   Ok     Fail
2           0   1   0   Fail   Ok
3           0   0   0   Fail   Fail
Thus:
• sim(1,3) > sim(1,2)
• Non monotonic!
Tversky Contrast Model
•Defines a non monotonic distance
•Comparison of a situation S with a prototype P (i.e, a case)
•S and P are sets of features
•The following sets:
A = S ∩ P
B = P - S
C = S - P
[Figure: Venn diagram of the sets P and S; A is their
intersection, B the part of P outside S, and C the part of S
outside P]
Tversky Contrast Model (2)
•Tversky-distance:
T(P,S) = α·f(A) - β·f(B) - γ·f(C)
•where f maps sets to [0,∞)
•f, α, β, and γ are fixed and defined by the user
•Example:
If f(A) = # elements in A
and α = β = γ = 1
T counts the number of elements in common minus the
differences
The Tversky-distance is not symmetric
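A sketch of T over Python sets, with f = set cardinality as the default (the parameter defaults are illustrative):

```python
def tversky(P, S, f=len, alpha=1.0, beta=1.0, gamma=1.0):
    # T(P, S) = alpha*f(A) - beta*f(B) - gamma*f(C), where
    # A = S & P (common features), B = P - S, C = S - P
    return alpha * f(S & P) - beta * f(P - S) - gamma * f(S - P)
```

With β ≠ γ, swapping P and S can change the value, which exhibits the asymmetry noted above.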
Local versus Global Similarity Metrics
• In many situations we have similarity metrics between attributes
of the same type (called local similarity metrics). Example:
For a complex engine, we may have a similarity for the
temperature of the engine
• In such situations a reasonable approach to define a global
similarity sim(x,y) is to “aggregate” the local similarity
metrics simi(xi,yi). A widely used practice
• What requirements should we give to sim(x,y) in terms of
the use of simi(xi,yi)?
sim(x,y) to increate monotonically with each simi(xi,yi).
Local versus Global Similarity Metrics
(Formal Definitions)
•A local similarity metric on an attribute Ti is a similarity metric simi:
Ti × Ti → [0,1]
•A function Φ: [0,1]ⁿ → [0,1] is an aggregation function if:
Φ(0,0,…,0) = 0
Φ is monotonic non-decreasing in every argument
•Given a collection of n similarity metrics sim1, …, simn, for attributes
taking values from Ti, a global similarity metric is a similarity metric
sim: V × V → [0,1], V ⊆ T1 × … × Tn, such that there is an aggregation
function Φ with:
sim(X,Y) = Φ(sim1(X1,Y1), …, simn(Xn,Yn))
Example: Φ(X1,X2,…,Xn) = (X1+X2+…+Xn)/n
Example
• Cases may contain attributes of type:
– real number A: the voltage output of a device
• define a local similarity metric, simvoltage()
– Integer B: revolutions per second
• define a local similarity metric, simrps()
– A bunch of symbolic attributes m = (C1,…,Cm): front light
blinking or not, year of manufacture, etc.
• define a Hamming similarity, simH(), combining all
these attributes
• Define an aggregated similarity metric sim():
sim(C,C’) = ω1·simvoltage(A,A’) + ω2·simrps(B,B’) + ω3·simH(m, m’)
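A sketch of this aggregated metric; all local metrics, weights, and attribute names below are illustrative assumptions, not prescribed by the slides:

```python
def sim_voltage(a, b):
    # illustrative local metric for the real-valued voltage attribute
    return 1 / (1 + abs(a - b))

def sim_rps(a, b):
    # illustrative local metric for the integer revolutions-per-second
    return 1 / (1 + abs(a - b))

def sim_h(m1, m2):
    # SMC over the tuple of symbolic attributes
    return sum(1 for u, v in zip(m1, m2) if u == v) / len(m1)

def sim_case(c1, c2, w=(0.5, 0.3, 0.2)):
    # weighted sum w1*sim_voltage + w2*sim_rps + w3*sim_h; the weights
    # sum to 1 so the result stays in [0, 1]
    return (w[0] * sim_voltage(c1["voltage"], c2["voltage"])
            + w[1] * sim_rps(c1["rps"], c2["rps"])
            + w[2] * sim_h(c1["symbols"], c2["symbols"]))
```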
Homework (1 of 2)
1. In Slide 12 we define the similarity relation Ssim(x,y,u,v).
Which of the 4 kinds of relations defined in Slide 9 are
satisfied by Ssim(x,y,u,v)?
2. Let us define:
SH(x,y,u,v) iff H(x,y) ≤ H(u,v)
where H is the Hamming distance (defined in Slide 20).
Which of the 4 kinds of relations defined in Slide 9 are
satisfied by SH(x,y,u,v)?
3. Let us define:
ST(x,y,u,v) iff T(x,y) ≥ T(u,v)
where T is the Tversky Contrast Model (defined in Slide
31). Which of the 4 kinds of relations defined in Slide 9
are satisfied by ST(x,y,u,v)?
Homework (2 of 2)
4. Define a formula for the Hamming distance when the
attributes are symbolic but may take more than 2 values:
•X = (X1, …, Xn) where Xi  Ti
•Y = (Y1, …,Yn) where Yi  Ti
•Each Ti is finite