Association Rules
Carissa Wang
February 23, 2010

What is an Association Rule?
- In data mining, a method for discovering relations between different sets of items in a large database.

Database
- A large collection of transactions
- Example: market basket database

Definition
- X => Y
- X = {x1, x2, …, xn}
- Y = {y1, y2, …, ym}
- xi and yj are distinct items for all i and all j
- X is the left-hand side (LHS)
- Y is the right-hand side (RHS)

Example

Transaction ID   Items Bought
1                Milk, bread, cookies, juice
2                Milk, juice
3                Milk, eggs
4                Bread, cookies, coffee

Measuring the rule
- Support
  - How frequently an item set occurs in the database
  - Item set: LHS ∪ RHS
- Confidence
  - Probability that RHS occurs given LHS, i.e. support(LHS ∪ RHS) / support(LHS)

Support
- Rules: Milk => juice, Bread => juice
- {milk, juice} occurs in 2 of the 4 example transactions: 2 / 4 = 0.50
- {bread, juice} occurs in 1 of the 4 example transactions: 1 / 4 = 0.25

Confidence
- Rules: Milk => juice, Bread => juice
- Milk => juice: support({milk, juice}) / support({milk}) = 0.50 / 0.75 = 0.67
- Bread => juice: support({bread, juice}) / support({bread}) = 0.25 / 0.50 = 0.50

What these numbers mean
- Support
  - High: strong evidence for LHS => RHS
  - Low: not enough evidence of LHS => RHS
- Confidence
  - High: given the condition LHS, RHS usually occurs
  - Low: RHS does not occur consistently

Other measures of association rules
- Lift
- Conviction
- All-confidence
- Collective strength
- Leverage

Algorithms to generate association rules
- Apriori Algorithm
- Eclat Algorithm
- Frequent Pattern Growth Algorithm
- One Attribute Rule
- Zero Attribute Rule

Apriori Algorithm
- Designed for databases with a large number of transactions
- Breadth-first search
- Two properties
  - Downward closure
  - Antimonotonicity

Apriori Property
- Downward closure
  - Every subset of a large (frequent) item set is also large
- Antimonotonicity
  - Every superset of a small (infrequent) item set is also small

How Apriori algorithm works
- Find the item sets that meet the minimum frequency in the given transactions
- Extend those item sets by one item and keep the ones that still meet the minimum frequency
- Repeat the last step until no frequent superset can be found

How Apriori algorithm works (example, Min Frequency = 3)

Item   Support
1      3
2      6
3      4
4      5

Item     Support
{1,2}    3
{1,3}    2
{1,4}    3
{2,3}    4
{2,4}    5
{3,4}    3

{1,3} falls below the minimum frequency, so no superset of it is considered.

Item       Support
{1,2,4}    3
{2,3,4}    3

Applications
- Web usage mining
- Intrusion detection
- Bioinformatics

References
- Apriori algorithm, Wikipedia. http://en.wikipedia.org/wiki/Apriori_algorithm
- Association rule learning, Wikipedia. http://en.wikipedia.org/wiki/Association_rules
- Elmasri and Navathe, Fundamentals of Database Systems, 5th ed.
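
The support and confidence figures for the market basket example, and the level-wise Apriori search, can be reproduced with a short Python sketch. This is an illustration, not a reference implementation. The names `TRANSACTIONS`, `support`, `confidence`, and `apriori` are made up for this example. The transaction database behind the numeric Apriori example is not given in the slides, so the sketch runs Apriori on the four market basket transactions with an assumed minimum count of 2. `confidence` also assumes the LHS occurs in at least one transaction.

```python
from itertools import combinations

# The four market basket transactions from the example slide.
TRANSACTIONS = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(LHS ∪ RHS) / support(LHS) for the rule LHS => RHS.

    Assumes LHS occurs at least once; otherwise this divides by zero.
    """
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def apriori(transactions, min_count):
    """Return {item set: count} for every item set found in at least
    `min_count` transactions, searching one item-set size at a time."""

    def count(candidate):
        return sum(candidate <= t for t in transactions)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {}
    k = 1
    while level:
        frequent.update({s: count(s) for s in level})
        k += 1
        # Extend: unions of two frequent (k-1)-sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (downward closure): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if count(c) >= min_count}
    return frequent


print(support({"milk", "juice"}, TRANSACTIONS))                 # 0.5
print(support({"bread", "juice"}, TRANSACTIONS))                # 0.25
print(round(confidence({"milk"}, {"juice"}, TRANSACTIONS), 2))  # 0.67
print(confidence({"bread"}, {"juice"}, TRANSACTIONS))           # 0.5

for itemset, n in apriori(TRANSACTIONS, min_count=2).items():
    print(sorted(itemset), n)
```

The prune step is where the Apriori property does its work: a candidate is counted against the database only if all of its smaller subsets were already found frequent, which avoids scanning for item sets that cannot meet the minimum frequency.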