Machine Learning Algorithms: Classification and Association Rules

Classification rules and how to learn them from data

Terminology: presbyopic = weakened eyesight due to age; myope = short-sighted.

Illustrative example: contact lenses data

    Person   Age             Spect. presc.  Astigm.  Tear prod.  Lenses
    O1       young           myope          no       reduced     NONE
    O2       young           myope          no       normal      SOFT
    O3       young           myope          yes      reduced     NONE
    O4       young           myope          yes      normal      HARD
    O5       young           hypermetrope   no       reduced     NONE
    O6-O13   ...             ...            ...      ...         ...
    O14      pre-presbyopic  hypermetrope   no       normal      SOFT
    O15      pre-presbyopic  hypermetrope   yes      reduced     NONE
    O16      pre-presbyopic  hypermetrope   yes      normal      NONE
    O17      presbyopic      myope          no       reduced     NONE
    O18      presbyopic      myope          no       normal      NONE
    O19-O23  ...             ...            ...      ...         ...
    O24      presbyopic      hypermetrope   yes      normal      NONE

Classes: N (none), S (soft), H (hard) contact lenses.

Decision tree for contact lenses recommendation

    tear prod. = reduced: NONE
    tear prod. = normal:
        astigmatism = no: SOFT
        astigmatism = yes:
            spect. presc. = myope: HARD
            spect. presc. = hypermetrope: NONE

Problems with decision trees?
- Decision trees can be transformed into rules, but to apply those rules we need "complete information" about the case.
- The resulting rule sets can be rather complex (1 rule = 1 branch of the tree) and difficult for a human user to understand.
- Sets of rules in DNF are sometimes easier to grasp:
    If X then C1
    If X and Y then C2
    If not X and Z and Y then C3
    If B then C2
- But learning such sets is more difficult!

Ordered or unordered sets of rules?
A disjunction of two rules does not have to improve their performance!
Example: let us have 1000 cases and two rules R1 and R2, each covering 100 cases and each correct on 90 of them. What happens when R1 and R2 are combined?
- In the best case the incorrectly covered cases are identical, and the accuracy of R1 OR R2 is (90+90)/(90+90+10) ≈ 0.95.
- In the worst case R1 and R2 are correct on the same cases and wrong on different ones. The accuracy of R1 OR R2 is then 90/(90+10+10) ≈ 0.82.

Ruleset representation
A rule base is a disjunctive set of conjunctive rules. Standard forms of rules:

    IF Conditions THEN Class
    Class IF Conditions
    Class <- Conditions

Examples:

    IF Outlook=Sunny AND Humidity=Normal THEN PlayTennis=Yes
    IF Outlook=Overcast THEN PlayTennis=Yes
    IF Outlook=Rain AND Wind=Weak THEN PlayTennis=Yes

Form of CN2 rules: IF Conditions THEN MajClass [ClassDistr]
Rule base: {R1, R2, R3, ..., DefaultRule}

Decision tree vs. rule learning: splitting vs. covering
[Figure: the same set of + and - examples, partitioned by recursive splits (splitting: ID3, C4.5, J48, See5) vs. covered rule by rule (covering: AQ, CN2).]

Classification rule learning
- Rule set representation
- Two rule learning approaches:
    - learn a decision tree and convert it to rules
    - learn a set/list of rules directly:
        - learning an unordered set of rules
        - learning an ordered list of rules
- Heuristics, overfitting, pruning

PlayTennis: training examples

    Day  Outlook   Temperature  Humidity  Wind    PlayTennis
    D1   Sunny     Hot          High      Weak    No
    D2   Sunny     Hot          High      Strong  No
    D3   Overcast  Hot          High      Weak    Yes
    D4   Rain      Mild         High      Weak    Yes
    D5   Rain      Cool         Normal    Weak    Yes
    D6   Rain      Cool         Normal    Strong  No
    D7   Overcast  Cool         Normal    Strong  Yes
    D8   Sunny     Mild         High      Weak    No
    D9   Sunny     Cool         Normal    Weak    Yes
    D10  Rain      Mild         Normal    Weak    Yes
    D11  Sunny     Mild         Normal    Strong  Yes
    D12  Overcast  Mild         High      Weak    Yes
    D13  Overcast  Hot          Normal    Weak    Yes
    D14  Rain      Mild         High      Strong  No
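To make the rule notation concrete, here is a minimal sketch (not from the original course materials; all names are illustrative) that encodes the PlayTennis examples above and measures a conjunctive rule's coverage and accuracy, the two quantities behind the R1 OR R2 discussion above.

```python
# The PlayTennis examples as Python dicts, plus a helper computing how many
# examples a conjunctive rule covers and how many of those it gets right.

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Weak", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
examples = [dict(zip(ATTRS, row)) for row in ROWS]

def rule_stats(conditions, target_class, examples):
    """conditions: dict attribute -> value, read as a conjunction."""
    covered = [e for e in examples
               if all(e[a] == v for a, v in conditions.items())]
    correct = sum(e["PlayTennis"] == target_class for e in covered)
    return len(covered), correct

# IF Outlook=Sunny AND Humidity=High THEN PlayTennis=No
print(rule_stats({"Outlook": "Sunny", "Humidity": "High"}, "No", examples))
# -> (3, 3): covers D1, D2, D8 and is correct on all of them
```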
PlayTennis: using a decision tree for classification

    Outlook = Sunny:
        Humidity = High: No
        Humidity = Normal: Yes
    Outlook = Overcast: Yes
    Outlook = Rain:
        Wind = Strong: No
        Wind = Weak: Yes

Is Saturday morning OK for playing tennis?
    Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong
    PlayTennis = No, because Outlook=Sunny AND Humidity=High.

Contact lenses: classification rules

    tear production=reduced => lenses=NONE [S=0, H=0, N=12]
    tear production=normal & astigmatism=no => lenses=SOFT [S=5, H=0, N=1]
    tear production=normal & astigmatism=yes & spect. presc.=myope => lenses=HARD [S=0, H=3, N=2]
    tear production=normal & astigmatism=yes & spect. presc.=hypermetrope => lenses=NONE [S=0, H=1, N=2]
    DEFAULT lenses=NONE

Unordered rulesets
- A rule "Class IF Conditions" is learned by first determining Class and then Conditions.
- The classes C1, ..., Cn form an ordered sequence in the RuleSet.
- But execution is unordered (rules are independent) when classifying a new instance: all rules are tried, the predictions of those covering the example are collected, and voting is used to obtain the final classification.
- If no rule fires, the DefaultClass (majority class in E) is predicted.

Contact lenses: decision list
Ordered (order-dependent) rules:

    IF tear production=reduced THEN lenses=NONE
    ELSE /* tear production=normal */
        IF astigmatism=no THEN lenses=SOFT
        ELSE /* astigmatism=yes */
            IF spect. presc.=myope THEN lenses=HARD
            ELSE /* spect. presc.=hypermetrope */ lenses=NONE

Ordered sets of rules: if-then-else decision lists
- A rule "Class IF Conditions" is learned by first determining Conditions and then Class.
- Notice: a mixed sequence of classes C1, ..., Cn in the RuleBase.
- But execution is ordered when classifying a new instance: rules are tried sequentially and the first rule that "fires" (covers the example) is used for classification.
- Decision list {R1, R2, R3, ..., D}: the rules Ri are interpreted as if-then-else rules.
- If no rule fires, the DefaultClass (majority class in Ecur) is predicted.

Original covering algorithm (AQ, Michalski 1969, 1986)
Basic covering algorithm:

    for each class Ci do
        Ei := Pi ∪ Ni  (Pi positive, Ni negative examples)
        RuleBase(Ci) := empty
        repeat                                    {learn-set-of-rules}
            learn-one-rule R covering some positive examples and no negatives
            add R to RuleBase(Ci)
            delete from Pi all positive examples covered by R
        until Pi = empty

Learning an unordered set of rules (CN2, Clark and Niblett 1989)
Top-down approach to search; specialization applies a beam search:

    RuleBase := empty
    for each class Ci do
        Ei := Pi ∪ Ni; RuleSet(Ci) := empty
        repeat                                    {learn-set-of-rules}
            R := "Class = Ci IF Conditions", Conditions := true
            repeat                                {learn-one-rule}
                R' := "Class = Ci IF Conditions AND Cond"
                      (general-to-specific beam search for the best R')
            until stopping criterion is satisfied
                  (no negatives covered, or Performance(R') < ThresholdR)
            add R' to RuleSet(Ci)
            delete from Pi all positive examples covered by R'
        until stopping criterion is satisfied
              (all positives covered, or Performance(RuleSet(Ci)) < ThresholdRS)
        RuleBase := RuleBase ∪ RuleSet(Ci)

Learn-one-rule: greedy vs. beam search
- Greedy general-to-specific search: at each step select the "best" descendant; no backtracking.
- Beam search: maintain a list of the k best candidates at each step; the descendants (specializations) of each of these k candidates are generated, and the resulting set is again reduced to the k best candidates. A sketch of this procedure follows below.

Recommended reading on search in AI: V. Mařík: Řešení úloh a využívání znalostí, a chapter in Mařík et al.: Umělá inteligence (1), Academia 1993, 2003.
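Below is a minimal sketch of learn-one-rule with general-to-specific beam search, in the spirit of CN2. The slides leave the Performance heuristic abstract; Laplace accuracy is used here as one common choice, and all names are illustrative, not CN2's actual implementation.

```python
# Learn one conjunctive rule for target_class by beam search: repeatedly
# specialize the k best rule bodies with one more attribute=value test.

def laplace(pos, neg, n_classes=2):
    return (pos + 1) / (pos + neg + n_classes)

def learn_one_rule(examples, target_attr, target_class, beam_width=3):
    attrs = [a for a in examples[0] if a != target_attr]
    values = {a: sorted({e[a] for e in examples}) for a in attrs}

    def score(conds):
        cov = [e for e in examples if all(e[a] == v for a, v in conds)]
        pos = sum(e[target_attr] == target_class for e in cov)
        return laplace(pos, len(cov) - pos)

    beam, best = [frozenset()], frozenset()   # start from the empty rule body
    improved = True
    while improved:
        improved = False
        # specialize each rule body in the beam by one more condition
        candidates = {conds | {(a, v)}
                      for conds in beam
                      for a in attrs if a not in dict(conds)
                      for v in values[a]}
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        if score(beam[0]) > score(best):
            best, improved = beam[0], True
    return dict(best)    # conjunction of attribute=value conditions

# With the PlayTennis examples from the earlier sketch:
# learn_one_rule(examples, "PlayTennis", "Yes")
# -> e.g. {'Outlook': 'Overcast'} (covers 4 positives and no negatives)
```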
Illustrative example: contact lenses data (the table from the beginning of the lecture, repeated on the slide as the search data).

Learn-one-rule as heuristic search

    Lenses = hard IF true                                                [S=..., H=..., N=...]
        Lenses = hard IF Astigmatism = no                                [S=5, H=0, N=7]
        Lenses = hard IF Astigmatism = yes                               [S=0, H=4, N=8]
        Lenses = hard IF Tear prod. = reduced                            [S=0, H=0, N=12]
        Lenses = hard IF Tear prod. = normal                             [S=5, H=4, N=3]
            Lenses = hard IF Tear prod. = normal AND Spect. presc. = myope   [S=2, H=3, N=1]
            Lenses = hard IF Tear prod. = normal AND Spect. presc. = hyperm. [S=3, H=1, N=2]
            Lenses = hard IF Tear prod. = normal AND Astigmatism = no        [S=5, H=0, N=1]
            Lenses = hard IF Tear prod. = normal AND Astigmatism = yes       [S=0, H=4, N=2]

Rule learning: summary
- Hypothesis construction: find a set of n rules; usually simplified into n separate rule constructions.
- Rule construction: find a pair (Class, Cond); either select the rule head (class) and construct the rule body, or construct the rule body and then assign the rule head (in ordered algorithms).
- Body construction: find a set of m features; usually simplified by adding one feature at a time to the rule body.

Associations and Frequent Item Analysis

Outline
- Transactions
- Frequent itemsets
- Subset property
- Association rules
- Applications

Transactions example

    TID  Produce
    1    MILK, BREAD, EGGS
    2    BREAD, SUGAR
    3    BREAD, CEREAL
    4    MILK, BREAD, SUGAR
    5    MILK, CEREAL
    6    BREAD, CEREAL
    7    MILK, CEREAL
    8    MILK, BREAD, CEREAL, EGGS
    9    MILK, BREAD, CEREAL

Transaction database: example
With items A = milk, B = bread, C = cereal, D = sugar, E = eggs, the same database (instances = transactions) becomes:

    TID  Products
    1    A, B, E
    2    B, D
    3    B, C
    4    A, B, D
    5    A, C
    6    B, C
    7    A, C
    8    A, B, C, E
    9    A, B, C

Transaction database: example
Attributes converted to binary flags:

    TID  A  B  C  D  E
    1    1  1  0  0  1
    2    0  1  0  1  0
    3    0  1  1  0  0
    4    1  1  0  1  0
    5    1  0  1  0  0
    6    0  1  1  0  0
    7    1  0  1  0  0
    8    1  1  1  0  1
    9    1  1  1  0  0

Definitions
- Item: an attribute=value pair, or simply a value.
- Itemset I: a subset of the possible items. Usually attributes are converted to binary flags for each value, e.g. Product = "A" is written simply as "A". Example: I = {A, B, E} (order unimportant).
- Transaction: a pair (TID, itemset), where TID is the transaction ID.

Support and frequent itemsets
- Support of an itemset I: sup(I) = the number of transactions t that support (i.e. contain) I.
- In the example database: sup({A,B,E}) = 2, sup({B,C}) = 4.
- A frequent itemset I is one with at least the minimum support count: sup(I) >= minsup.
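A minimal sketch (illustrative, not course code) computing itemset support over the example transaction database above; a Python set models each transaction.

```python
# Itemset support: the number of transactions containing the itemset.

transactions = [
    {"A", "B", "E"},         # TID 1
    {"B", "D"},              # TID 2
    {"B", "C"},              # TID 3
    {"A", "B", "D"},         # TID 4
    {"A", "C"},              # TID 5
    {"B", "C"},              # TID 6
    {"A", "C"},              # TID 7
    {"A", "B", "C", "E"},    # TID 8
    {"A", "B", "C"},         # TID 9
]

def support(itemset, trans):
    """Count the transactions that contain (support) the itemset."""
    return sum(itemset <= t for t in trans)   # <= is the subset test

minsup = 2
print(support({"A", "B", "E"}, transactions))   # 2  -> frequent
print(support({"B", "C"}, transactions))        # 4  -> frequent
```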
Subset property
Every subset of a frequent set is frequent!
Q: Why is it so?
Example: suppose {A,B} is frequent. Since each occurrence of {A,B} includes both A and B, both A and B must be frequent as well. A similar argument applies to larger itemsets.
Almost all association rule algorithms are based on this subset property!

Finding frequent itemsets
Start by finding one-item frequent sets (easy).
Q: How? A: Simply count the frequencies of all items.

Finding itemsets: the next level
Apriori algorithm (Agrawal & Srikant). Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, and so on.
- If {A,B} is a frequent itemset, then {A} and {B} have to be frequent itemsets as well!
- In general: if X is a frequent k-item set, then all (k-1)-item subsets of X are also frequent.
- Compute the k-item sets by merging (k-1)-item sets.

An example
Given five frequent three-item sets (A B C), (A B D), (A C D), (A C E), (B C D); keeping them in lexicographic order improves efficiency. Which four-item sets are candidates?
- (A B C D)? Yes, because all of its 3-item subsets are frequent.
- (A C D E)? No, because (C D E) is not frequent.

Classification vs. association rules

    Classification rules                 Association rules
    focus on one target field            many target fields
    specify class in all cases           applicable in some cases
    measure: accuracy                    measures: support, confidence, lift

Association rules
An association rule R: Itemset1 => Itemset2, where Itemset1 and Itemset2 are disjoint and Itemset2 is non-empty. Reading: "if a transaction includes Itemset1 then it also has Itemset2".
Examples (over the binary transaction database above):

    A, B => C
    A, B => C, E
    A => B, C
    A, B => D

From frequent itemsets to association rules
Q: Given the frequent set {A,B,E}, what are the possible association rules?

    A => B, E
    A, B => E
    A, E => B
    B => A, E
    B, E => A
    E => A, B
    __ => A, B, E   (the empty rule, also written true => A, B, E)

Rule support and confidence
Suppose R: I => J is an association rule.
- sup(R) = sup(I ∪ J) is the support count: the support of the itemset I ∪ J.
- conf(R) = sup(I ∪ J) / sup(I) is the confidence of R: the fraction of transactions with I that also have J.
Association rules with a given minimum support and confidence are sometimes called "strong" rules.

Measures for the rule Ant => Suc
Let a be the total number of transactions containing both Ant and Suc:

              Suc      Non(Suc)
    Ant       a        b          r = a+b
    Non(Ant)  c        d          s = c+d
              k = a+c  l = b+d    n = r+s

    support = a/n, confidence = a/r, cover = a/k

4ft quantifiers in LISp-Miner: the "above average" quantifier a/r > (1+p)*k/n means: "when comparing the share of transactions meeting Suc in the full dataset with the share among the transactions meeting Ant, the latter is at least 100*p % higher."
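Returning to Apriori's candidate-generation step a few slides back, here is a minimal sketch (illustrative names, not the original algorithm's code) of the join/prune step that merges frequent (k-1)-item sets and discards candidates with an infrequent subset; it reproduces the four-item-set example above.

```python
# Apriori candidate generation: join frequent (k-1)-item sets that share
# their first k-2 items, then prune by the subset property.

from itertools import combinations

def apriori_gen(frequent):
    """frequent: set of sorted tuples, all of length k-1."""
    k = len(next(iter(frequent))) + 1
    candidates = set()
    for p in frequent:
        for q in frequent:
            # join step: identical (k-2)-prefix, p's last item before q's
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # prune step: every (k-1)-subset must itself be frequent
                if all(s in frequent for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

threes = {("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"),
          ("A", "C", "E"), ("B", "C", "D")}
print(apriori_gen(threes))
# {('A', 'B', 'C', 'D')} -- ('A', 'C', 'D', 'E') is pruned because e.g.
# ('C', 'D', 'E') is not among the frequent three-item sets
```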
Association rules: example
Recall conf(I => J) = sup(I ∪ J) / sup(I).
Q: Given the frequent set {A,B,E} in the example database above, which association rules have minsup = 2 and minconf = 50%?

Qualifying:

    A, B => E : conf = 2/4 = 50%
    A, E => B : conf = 2/2 = 100%
    B, E => A : conf = 2/2 = 100%
    E => A, B : conf = 2/2 = 100%

Not qualifying:

    A => B, E : conf = 2/6 ≈ 33% < 50%
    B => A, E : conf = 2/7 ≈ 29% < 50%
    __ => A, B, E : conf = 2/9 ≈ 22% < 50%

Find strong association rules
Problem: find all association rules with sup(R) >= minsup and conf(R) >= minconf for given parameters minsup and minconf.
First step: find all frequent itemsets.

Generating association rules
A two-stage process:
- Determine the frequent itemsets, e.g. with the Apriori algorithm.
- For each frequent itemset I and each non-empty subset J of I, determine all association rules of the form I-J => J.
The main idea used in both stages: the subset property.

Example: generating rules from an itemset
Weather (golf/tennis) data:

    Outlook   Temp  Humidity  Windy  Play
    Sunny     Hot   High      False  No
    Sunny     Hot   High      True   No
    Overcast  Hot   High      False  Yes
    Rainy     Mild  High      False  Yes
    Rainy     Cool  Normal    False  Yes
    Rainy     Cool  Normal    True   No
    Overcast  Cool  Normal    True   Yes
    Sunny     Mild  High      False  No
    Sunny     Cool  Normal    False  Yes
    Rainy     Mild  Normal    False  Yes
    Sunny     Mild  Normal    True   Yes
    Overcast  Mild  High      True   Yes
    Overcast  Hot   Normal    False  Yes
    Rainy     Mild  High      True   No

Is {Humidity=Normal, Windy=False, Play=Yes} a frequent itemset? Its support is 4.

Example: generating rules from the frequent set {Humidity=Normal, Windy=False, Play=Yes}
Seven potential rules (confidence on the right):

    If Humidity=Normal and Windy=False then Play=Yes              4/4
    If Humidity=Normal and Play=Yes then Windy=False              4/6
    If Windy=False and Play=Yes then Humidity=Normal              4/6
    If Humidity=Normal then Windy=False and Play=Yes              4/7
    If Windy=False then Humidity=Normal and Play=Yes              4/8
    If Play=Yes then Humidity=Normal and Windy=False              4/9
    If true then Humidity=Normal and Windy=False and Play=Yes     4/14

Rules for the weather data
Rules with support > 1 and confidence = 100%:

    #    Association rule                                    Sup.  Conf.
    1    Humidity=Normal, Windy=False => Play=Yes            4     100%
    2    Temperature=Cool => Humidity=Normal                 4     100%
    3    Outlook=Overcast => Play=Yes                        4     100%
    4    Temperature=Cool, Play=Yes => Humidity=Normal       3     100%
    ...  ...                                                 ...   ...
    58   Outlook=Sunny, Temperature=Hot => Humidity=High     2     100%

In total: 3 rules with support four, 5 with support three, and 50 with support two.

Weka associations
[Screenshot: the Weka Associate panel run on weather.nominal.arff with MinSupport = 0.2.]
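Stage two of the generation process above, sketched in code (illustrative names; it re-declares the small A..E transaction database from the earlier slides so it runs on its own): for a frequent itemset I, emit every rule (I-J) => J whose confidence clears minconf.

```python
# Enumerate rules (I - J) => J from a frequent itemset I and keep those
# with confidence sup(I) / sup(I - J) >= minconf.

from itertools import combinations

transactions = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
                {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
                {"A", "B", "C"}]

def support(itemset, trans):
    return sum(itemset <= t for t in trans)   # <= is the subset test

def rules_from_itemset(itemset, trans, minconf=0.5):
    items = sorted(itemset)
    sup_i = support(set(items), trans)
    rules = []
    for r in range(1, len(items) + 1):        # r = size of the consequent J
        for j in combinations(items, r):
            ant = set(items) - set(j)         # antecedent I - J
            conf = sup_i / support(ant, trans)
            if conf >= minconf:
                rules.append((sorted(ant), sorted(j), sup_i, conf))
    return rules

for ant, suc, sup, conf in rules_from_itemset({"A", "B", "E"}, transactions):
    print(f"{ant} => {suc}  sup={sup}  conf={conf:.0%}")
# Reproduces the four qualifying rules from the {A,B,E} example above;
# the empty antecedent gives conf 2/9 and is filtered out.
```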
Filtering association rules
Problem: any large dataset can lead to a very large number of association rules, even with reasonable minimum confidence and support.
Confidence by itself is not sufficient! E.g. if all transactions include Z, then any rule I => Z will have confidence 100%. Hence other measures are used to filter rules.

Further WEKA measures for the rule Ant => Suc
(Using the contingency counts a, b, c, d and the sums r, k, l, n defined earlier; support = a/n, confidence = a/r, cover = a/k.)
- lift = (a/r)/(k/n) = a*n/(r*k): "Lift estimates the increase in precision of the default prediction of Suc on the set of transactions meeting Ant, compared with that on the whole dataset."
- leverage = (a - r*k/n)/n: "The ratio of 'extra' transactions covered by the rule, compared to those that would be covered if Ant and Suc were independent."
- conviction = r*l/(b*n): "Similar to lift, but it considers transactions that are not covered by Suc."

Weka associations: output
[Screenshot: rules found by Weka's Apriori associator, with their support, confidence and further measures.]

Association rule lift
The lift of an association rule I => J is defined as lift = P(J|I) / P(J), i.e. the ratio of confidence to expected confidence. Note: P(I) = (support of I) / (number of transactions).
Interpretation:
- lift > 1: I and J are positively correlated,
- lift < 1: I and J are negatively correlated,
- lift = 1: I and J are independent.

Other issues
- The ARFF format is very inefficient for typical market-basket data: attributes represent the items in a basket, and most items are usually missing.
- Interestingness of associations: find unusual associations, e.g. milk usually goes with bread, but soy milk does not.

Beyond binary data
- Hierarchies: drink, milk, low-fat milk, Stop&Shop low-fat milk, ...; find associations on any level of the hierarchy.
- Sequences over time
- ...

Applications
- Market basket analysis: store layout, client offers; bookstores offering similar titles (see e.g. Amazon); the "diapers and beer" urban legend.
- Recommendations concerning new services or new customers, e.g.
    if (Car=Porsche & Gender=Male & Age < 20) then (Risk=high & Insurance=high)
- Finding unusual events: WSARE (What is Strange About Recent Events), ...

Summary
- Frequent itemsets
- Association rules
- Subset property
- Apriori algorithm
- Application difficulties
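To close with a worked example (illustrative sketch; it assumes the a, b, c, d contingency counts from the "Measures for the rule Ant => Suc" table above), the filtering measures discussed in this part can be computed directly from those four counts:

```python
# Rule measures from the 2x2 contingency counts, as defined in the slides.

def rule_measures(a, b, c, d):
    """a: Ant&Suc, b: Ant&non-Suc, c: non-Ant&Suc, d: non-Ant&non-Suc."""
    r, s = a + b, c + d            # transactions with / without Ant
    k, l = a + c, b + d            # transactions with / without Suc
    n = r + s                      # all transactions
    return {
        "support":    a / n,
        "confidence": a / r,
        "cover":      a / k,
        "lift":       a * n / (r * k),
        "leverage":   (a - r * k / n) / n,
        "conviction": r * l / (b * n) if b else float("inf"),
    }

# Rule B => C in the A..E example database: a=4, b=3, c=2, d=0
for name, value in rule_measures(4, 3, 2, 0).items():
    print(f"{name:10s} {value:6.3f}")
# lift = 4*9/(7*6) is about 0.857 < 1: B and C are negatively correlated here
```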