COMP 578 Data Warehousing & Data Mining
Ch 2 Discovering Association Rules
Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University

The AR Mining Problem
• Given a database of transactions, each transaction being a list of items (e.g., the items purchased by a customer in one visit).
• Find all rules that correlate the presence of one set of items with that of another set of items.
• E.g., 30% of people who buy diapers also buy beer.

Motivation & Applications (1)
• If we can find such associations, we will be able to answer:
  • ??? => beer (What should the company do to boost beer sales?)
  • Diapers => ??? (What other products should the store stock up on?)
• Attached mailing in direct marketing.

Motivation & Applications (2)
• Originally for marketing, to understand purchasing trends.
• What products or services do customers tend to purchase at the same time, or later on?
• Use market basket analysis to plan:
  • Coupons and discounting: do not offer simultaneous discounts on beer and diapers if they tend to be bought together. Discount one to pull in sales of the other.
  • Product placement: place products that have a strong purchasing relationship close together, or place such products far apart to increase traffic past other items.

Measure of Interestingness
• For a data mining algorithm to mine for interesting association rules, users have to define a measure of “interestingness”.
• Two popular interestingness measures have been proposed:
  • Support and Confidence
  • Lift Ratio (Interest)
• MineSet from SGI uses the terms predictability and prevalence instead of support and confidence.

The Support and Confidence
Given the rule X & Y => Z:
• Support, $S = P(X \cup Y \cup Z)$, where $A \cup B$ indicates that a transaction contains both A and B (the union of item sets A and B).
  [# of tuples containing both A & B / total # of tuples]
• Confidence, $C = P(Z \mid X \cup Y)$, the conditional probability that a transaction having {X, Y} also contains Z.
  [# of tuples containing X & Y & Z / # of tuples containing X & Y]

The Support and Confidence
[Figure: Venn diagram of "Customer buys diaper" and "Customer buys beer"; the overlap is "Customer buys both"]

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Let minimum support be 50% and minimum confidence be 50%. Find the S and C of:
1. A => C
2. C => A
Answer:
A => C (50%, 66.6%)
C => A (50%, 100%)

How Good is a Predictive Model?
• Response curves: how does the response rate of a targeted selection compare to a random selection?

What is A Lift Ratio? (1)
• Consider the rule: "When people buy diapers they also buy beer 50 percent of the time."
  • It states an explicit percentage (50% of the time).
• Consider this other rule: "People who purchase a VCR are three times more likely to also purchase a camcorder."
  • This rule uses the comparative phrase “three times more likely”.

What is A Lift Ratio? (2)
• The probability is compared to the baseline likelihood.
• The baseline likelihood is the probability of the event occurring independently.
• E.g., if people normally buy beer 5% of the time, then the first rule could have said “10 times more likely.”
• The ratio in this kind of comparison is called lift.
• A key goal of an association rule mining exercise is to find rules that have the desired lift.
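
The support and confidence figures in the four-transaction exercise above can be checked with a short sketch. The function names `support` and `confidence` and the `TRANSACTIONS` constant are illustrative choices, not part of the slides; items are single letters, so a string such as "AC" stands for the item set {A, C}.

```python
from typing import FrozenSet, Iterable, List

# The four transactions from the exercise (IDs 2000, 1000, 4000, 5000).
# Each item is one letter, so frozenset("ABC") is the item set {A, B, C}.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset("ABC"),
    frozenset("AC"),
    frozenset("AD"),
    frozenset("BEF"),
]


def support(itemset: Iterable[str],
            transactions: List[FrozenSet[str]] = TRANSACTIONS) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(lhs: Iterable[str], rhs: Iterable[str],
               transactions: List[FrozenSet[str]] = TRANSACTIONS) -> float:
    """P(rhs | lhs): support of lhs and rhs together over support of lhs."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # assumption: an LHS that never occurs has confidence 0
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, transactions) / lhs_support


print(support("AC"))          # support of A => C and of C => A
print(confidence("A", "C"))   # confidence of A => C
print(confidence("C", "A"))   # confidence of C => A
```

Both rules share the same support because support depends only on the combined item set; only the denominator of the confidence changes with the direction of the rule.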

Lift Ratio As Interestingness
An Example:

X   1 1 1 1 0 0 0 0
Y   1 1 0 0 0 0 0 0
Z   0 1 1 1 1 1 1 1

Rule    Support   Confidence
X=>Y    25%       50%
X=>Z    37.50%    75%

• X and Y are positively correlated; X and Z are negatively related.
• Yet the support and confidence of X=>Z dominate.

Lift Ratio As Interestingness
• It is a measure of dependent or correlated events.
• The lift of rule X => Y:

$$\mathrm{lift}_{X,Y} = \frac{P(Y \mid X)}{P(Y)} \quad\text{or}\quad \frac{P(X \cup Y)}{P(X)\,P(Y)}$$

  where the a priori probability is P(Y) and the confidence is P(Y|X).
• Lift = 1 means X and Y are independent events.
• Lift < 1 means X and Y are negatively correlated.
• Lift > 1 means X and Y are positively correlated (better than random).

AR Mining with Lift Ratio (1)
• To understand what lift ratio is, consider the following:
  • 500,000 transactions
  • 20,000 transactions contain diapers (4 percent)
  • 30,000 transactions contain beer (6 percent)
  • 10,000 transactions contain both diapers and beer (2 percent)
• Confidence measures how much a particular item is dependent on another.
• When people buy diapers, they also buy beer 50% of the time (10,000/20,000).
• The confidence for this rule is 50%.

AR Mining with Lift Ratio (2)
• The inverse rule could be stated as: when people buy beer they also buy diapers 1/3 of the time (Conf = 33.33% = 10,000/30,000).
• In the absence of any knowledge about what else was bought, the following can be computed:
  • People buy diapers 4 percent of the time.
  • People buy beer 6 percent of the time.
• 4% and 6% are called the expected confidence (or baseline likelihood, or a priori probability) of buying diapers or beer.

AR Mining with Lift Ratio (3)
• Lift measures the difference between the confidence of a rule and the expected confidence.
• Lift is one measure of the strength of an effect.
• If people who bought diapers also bought beer 8% of the time, then the effect is small if the expected confidence is 6%.
• If the confidence is 50%, so that the lift is more than 8 times (when measured as a ratio), then the interaction between diapers and beer is very strong.
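
As a rough check of the numbers above, lift can be computed directly from transaction counts. This is a minimal sketch; the function name `lift` and its parameter names are assumptions made for illustration.

```python
def lift(n_total: int, n_lhs: int, n_rhs: int, n_both: int) -> float:
    """Lift of LHS => RHS measured as a ratio: confidence / expected confidence."""
    confidence = n_both / n_lhs            # P(RHS | LHS)
    expected_confidence = n_rhs / n_total  # P(RHS), the baseline likelihood
    return confidence / expected_confidence


# Diapers and beer: 500,000 transactions, 20,000 with diapers,
# 30,000 with beer, 10,000 with both.
diapers_to_beer = lift(500_000, n_lhs=20_000, n_rhs=30_000, n_both=10_000)
beer_to_diapers = lift(500_000, n_lhs=30_000, n_rhs=20_000, n_both=10_000)

# The eight-row X, Y, Z example: X occurs 4 times, Y 2 times, Z 7 times;
# X and Y co-occur twice, X and Z three times.
x_to_y = lift(8, n_lhs=4, n_rhs=2, n_both=2)
x_to_z = lift(8, n_lhs=4, n_rhs=7, n_both=3)

print(round(diapers_to_beer, 2), round(beer_to_diapers, 2))
print(round(x_to_y, 2), round(x_to_z, 2))
```

Lift is symmetric in the two sides of a rule, which is why both diaper and beer rules give the same ratio even though their confidences differ. In the X, Y, Z example the rule X=>Z has the higher support and confidence but a lift below 1, while X=>Y has a lift above 1.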

AR Mining with Lift Ratio : An Example
• Consider item sets with three items:
  • 10,000 transactions contain wipes.
  • 8,000 transactions contain wipes and diapers (80% of the wipes transactions).
  • 220 transactions contain wipes and beer (2.2% of the wipes transactions).
  • 200 transactions contain wipes, diapers and beer (2% of the wipes transactions).
• The complete set of 12 rules is presented in a table along with their confidence, support and lift.

AR Mining with Lift Ratio : An Example

#    LHS               RHS               Exp Conf (%)   Conf (%)   Lift Ratio   Supp (%)
1    Diapers           Beer              6.00           50.00      8.33         2.00
2    Beer              Diapers           4.00           33.33      8.33         2.00
3    Diapers           Wipes             2.00           40.00      20.00        1.60
4    Wipes             Diapers           4.00           80.00      20.00        1.60
5    Wipes             Beer              6.00           2.20       0.37         0.04
6    Beer              Wipes             2.00           0.73       0.37         0.04
7    Diapers & Wipes   Beer              6.00           2.50       0.42         0.04
8    Diapers & Beer    Wipes             2.00           2.00       1.00         0.04
9    Beer & Wipes      Diapers           4.00           90.91      22.73        0.04
10   Diapers           Wipes & Beer      0.044          1.00       22.73        0.04
11   Wipes             Diapers & Beer    2.00           2.00       1.00         0.04
12   Beer              Diapers & Wipes   1.60           0.67       0.42         0.04

AR Mining with Lift Ratio : An Example
• The greatest amount of lift, if measured as a ratio, is found in the 9th and 10th rules.
• Both have a lift greater than 22, computed as 90.91/4 and 1/0.044.
• For the 9th rule, the lift of 22 means: people who purchase wipes and beer are 22 times more likely to also purchase diapers than shoppers in general (the 4% baseline).
• Note the negative lift (lift ratio less than 1) in the 5th, 6th, 7th and last rules.
• The latter two rules both have a lift ratio of approximately 0.42.

AR Mining with Lift Ratio : An Example
• Negative lift on the 7th rule means that people who buy diapers and wipes are less likely to buy beer than one would expect.
• Rules with very high or very low confidence may model an anomaly.
• Suppose a rule says, with a confidence of 1 (100%), that whenever people bought pet food they also bought pet supplies.
• Further investigation shows that this held for one day only: there was a special giveaway.

AR Mining with Lift Ratio : An Example
• Most rules have dairy on the right hand side.
• Milk or eggs are so commonly purchased that “dairy” is quite likely to show up in many rules.
• The ability to exclude specific items is very useful.
• Interesting rules are those that:
  • Have a very high or very low lift.
  • Do not involve items that appear on most transactions.
  • Have support that exceeds a threshold.
• Low support might simply be due to a statistical anomaly.
• Rules that are more general are frequently desirable.
• Sometimes it is interesting to differentiate between diapers sold in boxes vs. diapers sold in bulk.

Lift Ratio and Sample Size
• Consider the association A => B.
• A lift ratio can be very large even if the number of transactions having A and B, together or separately, is very small.
• To take sample size into consideration, one can consider using the support and confidence as interestingness measures.

Complexity of AR Mining Algorithms
• An association algorithm is simply a counting algorithm.
• Probabilities are computed by taking ratios among various counts.
• If item hierarchies are in use, then some translation (or lookup) is needed.
• One must carefully control the sizes of the item sets because of the combinatorial explosion problem.

Complexity of AR Mining Algorithms
• Large grocery stores stock more than 100,000 different items.
• There can be 5 billion possible item pairs, and $1.7 \times 10^{14}$ sets of three items.
• An item hierarchy can be used to reduce this number to a manageable size.
• There is unlikely to be a specific relationship between Pampers in the 30-count box and Blue Ribbon in 12oz cans.

Complexity of AR Mining Algorithms
• If there is such a relationship, it is probably subsumed by the more general relationship between diapers and beer.
• Using an item hierarchy reduces the number of combinations.
• It also helps to find more general higher-level relationships, such as those between any kind of diapers and any kind of beer.
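
The pair and triple counts quoted for a 100,000-item store are binomial coefficients, and can be reproduced with the standard library. The helper name `count_itemsets` is an illustrative choice.

```python
from math import comb


def count_itemsets(n_items: int, size: int) -> int:
    """Number of distinct item sets of the given size drawn from n_items items."""
    return comb(n_items, size)


pairs = count_itemsets(100_000, 2)    # about 5 billion
triples = count_itemsets(100_000, 3)  # about 1.7 x 10^14

print(f"{pairs:,} pairs, {triples:.2e} triples")
```

The same function can be called with a smaller item count to see how much an item hierarchy shrinks the search space.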

Complexity of AR Mining Algorithms
• The combinatorial explosion problem:
  • Suppose you use an item hierarchy to group items together so that the average group size is 50, reducing 100,000 items to 2,000 item groups.
  • With 2,000 item groups there are still almost 2 million paired item sets.
  • An algorithm might require up to 2 million counting registers.
  • There are 1.3 billion three-item item sets!
  • Many combinations will never occur.
  • Some sort of dynamic memory or counter allocation and addressing scheme will be needed.

The Apriori Algorithm

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Min. support 50%, min. confidence 50%

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A,C}              50%

• For rule A => C:
  • support = support({A ^ C}) = 50%
  • confidence = support({A ^ C})/support({A}) = 66.6%
• The Apriori principle: any subset of a frequent itemset must be frequent.

Applying Apriori Algorithm

Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D to count C1:
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1:
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (generated from L1): {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D to count C2:
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2:
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (generated from L2): {2 3 5}

Scan D to obtain L3:
itemset   sup
{2 3 5}   2

Improving Apriori’s Efficiency
• Hash-based itemset counting: a k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent.
• Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans.
• Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB.
• Sampling: mine a subset of the given data with a lower support threshold, plus a method to determine the completeness.
• Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent.

Is Apriori Fast Enough?
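
To make the cost question concrete, the level-wise procedure traced on database D can be sketched as follows. This is a minimal illustration, not an optimized implementation: the function name `apriori`, the `DATABASE` constant, and the minimum support count of 2 are assumptions (the slides do not state the threshold, but the L1, L2 and L3 sets shown are consistent with it). The sketch also records how many candidates are counted in each pass.

```python
from itertools import combinations

# Database D from the worked example (TIDs 100, 200, 300, 400).
DATABASE = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def apriori(transactions, min_count):
    """Return ({frequent itemset: count}, [number of candidates per pass])."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]  # C1: every single item
    frequent = {}
    candidates_per_pass = []
    k = 1
    while candidates:
        candidates_per_pass.append(len(candidates))
        # One database scan: count every candidate k-itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        frequent.update(level)
        # Join step: unions of two frequent k-itemsets that have k + 1 items.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (Apriori principle): every k-subset must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k))
        ]
        k += 1
    return frequent, candidates_per_pass


frequent, per_pass = apriori(DATABASE, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("candidates counted per pass:", per_pass)
```

On this toy database the candidate sets stay tiny, but the join step grows quadratically with the number of frequent itemsets at each level, and each level costs a full scan of the data.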
• The core of the Apriori algorithm:
  • Use frequent (k – 1)-itemsets to generate candidate frequent k-itemsets.
  • Use database scan and pattern matching to collect counts for the candidate itemsets.
• The bottleneck of Apriori: candidate generation.
  • Huge candidate sets:
    • $10^4$ frequent 1-itemsets will generate $10^7$ candidate 2-itemsets.
    • To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate $2^{100} \approx 10^{30}$ candidates.
  • Multiple scans of database:
    • Needs (n + 1) scans, where n is the length of the longest pattern.

Multiple-Level ARs
• Items often form a hierarchy.
• Items at the lower level are expected to have lower support.
• Rules regarding itemsets at appropriate levels could be quite useful.
• The transaction database can be encoded based on dimensions and levels.
• It is smart to explore shared multi-level mining (Han & Fu, VLDB’95).

Mining Multi-Level Association
• A top-down, progressive deepening approach:
  • First find high-level strong rules: milk => bread [20%, 60%].
  • Then find their lower-level “weaker” rules: 2% milk => wheat bread [6%, 50%].
• Variations at mining multiple-level association rules:
  • Level-crossed association rules: 2% milk => Wonder wheat bread
  • Association rules with multiple, alternative hierarchies: 2% milk => Wonder bread

Multi-level Association: Uniform Support vs. Reduced Support (1)
• Uniform Support: the same minimum support for all levels.
  + One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.
  – Lower level items do not occur as frequently. If the support threshold is
    • too high: low level associations are missed.
    • too low: too many high level associations are generated.

Multi-level Association: Uniform Support vs. Reduced Support (2)
• Reduced Support: reduced minimum support at lower levels.
• There are 4 search strategies:
  • Level-by-level independent
  • Level-cross filtering by k-itemset
  • Level-cross filtering by single item
  • Controlled level-cross filtering by single item

Uniform Support
Multi-level mining with uniform support
• Level 1 (min_sup = 5%): Milk [support = 10%]
• Level 2 (min_sup = 5%): 2% Milk [support = 6%], Skim Milk [support = 4%]

Reduced Support
Multi-level mining with reduced support
• Level 1 (min_sup = 5%): Milk [support = 10%]
• Level 2 (min_sup = 3%): 2% Milk [support = 6%], Skim Milk [support = 4%]

Multi-level Association: Redundancy Filtering
• Some rules may be redundant due to “ancestor” relationships between items.
• Example:
  • milk => wheat bread [support = 8%, confidence = 70%]
  • 2% milk => wheat bread [support = 2%, confidence = 72%]
• We say the first rule is an ancestor of the second rule.
• A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor.

Multi-Level Mining: Progressive Deepening
• A top-down, progressive deepening approach:
  • First mine high-level frequent items: milk (15%), bread (10%)
  • Then mine their lower-level “weaker” frequent itemsets: 2% milk (5%), wheat bread (4%)
• Different min_support thresholds across multi-levels lead to different algorithms:
  • If adopting the same min_support across multi-levels, then discard an item t if any of t’s ancestors is infrequent.
  • If adopting reduced min_support at lower levels, then examine only those descendants whose ancestor’s support is frequent/non-negligible.

AR Representation Scheme
• In words:
  60% of people who buy diapers also buy beers, and 0.5% buy both.
• In first-order logic or PROLOG-like statement:
  buys(x, “diapers”) -> buys(x, “beers”) [0.5%, 60%]
• Also representation as if-then rules:
  IF diapers in Itemset THEN beers in Itemset [0.5%, 60%]
  If people buy diapers, they also buy beers 60% of the time; 0.5% of the people buy both.

Presentation of Association Rules (Tabular)
[Figure: association rules presented as a table]

Visualization of Association Rule Using Plane Graph
[Figure: plane graph of association rules]

Visualization of Association Rule Using Rule Graph
[Figure: rule graph of association rules]

Sequential Apriori Algorithm (1)
The problem of mining sequential patterns can be split into the following phases:
1. Sort Phase. This step implicitly converts the original transaction database into a database of sequences.
2. Litemset Phase. In this phase we find the set of all litemsets L. We are also simultaneously finding the set of all large 1-sequences.
3. Transformation Phase. We need to repeatedly determine which of a given set of large sequences are contained in a customer sequence. We transform each customer sequence into an alternative representation.
4. Sequence Phase. Use the set of litemsets to find the desired sequences. Algorithms for this phase are given below.
5. Maximal Phase. Find the maximal sequences among the set of large sequences. In some algorithms this phase is combined with the sequence phase to reduce the time wasted in counting non-maximal sequences.
REFERENCE: Mining Sequential Patterns

Sequential Apriori Algorithm (2)
• There are two families of algorithms: count-all and count-some.
• The count-all algorithms count all the large sequences, including non-maximal sequences. The non-maximal sequences must then be pruned out (in the maximal phase).
• AprioriAll is a count-all algorithm, based on the Apriori algorithm for finding large itemsets.
• Apriori-Some is a count-some algorithm. The intuition behind these algorithms is that since we are only interested in maximal sequences, we can avoid counting sequences which are contained in a longer sequence if we first count longer sequences.
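
Two operations recur in the phases above: testing whether a sequence is contained in a customer sequence, and keeping only the maximal sequences. The sketch below assumes the usual definition of containment (each itemset of the shorter sequence is a subset of a distinct itemset of the longer one, in the same order), which the slides do not spell out. The function names and the item numbers in the sample data are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Sequence = Tuple[FrozenSet[int], ...]


def contains(sequence: Sequence, candidate: Sequence) -> bool:
    """True if `candidate` is contained in `sequence` (order-preserving)."""
    pos = 0
    for itemset in candidate:
        # Greedily advance to the earliest itemset that covers this one.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next candidate itemset must match strictly later
    return True


def maximal(sequences: List[Sequence]) -> List[Sequence]:
    """Maximal phase: drop every sequence contained in another one."""
    return [
        s for s in sequences
        if not any(s != other and contains(other, s) for other in sequences)
    ]


def seq(*itemsets) -> Sequence:
    """Helper to build a sequence from tuples of items."""
    return tuple(frozenset(i) for i in itemsets)


# Hypothetical data for illustration only.
customer = seq((30,), (40, 70), (90,))
large = [seq((30,)), seq((40,)), seq((30,), (40,)), seq((30,), (40, 70))]

print(contains(customer, seq((30,), (90,))))  # order respected
print(contains(customer, seq((90,), (30,))))  # order violated
print(maximal(large))
```

A count-all algorithm would apply `maximal` only once at the end, whereas a count-some algorithm tries to avoid counting the shorter sequences in the first place.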

AprioriAll Algorithm (1)
[Figure: worked example, Steps 1 to 3, with minimum support = 25%]

AprioriAll Algorithm (2)
[Figure: worked example, Step 4]

    L_1 = {large 1-sequences};   // Result of litemset phase
    for (k = 2; L_{k-1} ≠ ∅; k++) do begin
        C_k = New candidates generated from L_{k-1} (see Apriori Candidate Generation)
        foreach customer-sequence c in the database do
            Increment the count of all candidates in C_k that are contained in c.
        L_k = Candidates in C_k with minimum support.
    end
    Answer = Maximal sequences in ∪_k L_k;

AprioriAll Algorithm (3)
Apriori Candidate Generation
• The apriori-generate function takes as argument L_{k-1}, the set of all large (k-1)-sequences. It works as follows.
• First join L_{k-1} with L_{k-1}:

    insert into C_k
    select p.litemset_1, ..., p.litemset_{k-1}, q.litemset_{k-1}
    from L_{k-1} p, L_{k-1} q
    where p.litemset_1 = q.litemset_1, ..., p.litemset_{k-2} = q.litemset_{k-2};

• Next delete all sequences c ∈ C_k such that some (k-1)-subsequence of c is not in L_{k-1}.

Count Operation in Sequential Apriori
Hash Tree used for fast search of candidate occurrences. Similar to association rule discovery, except for the following differences.
• Every event-timestamp pair in the timeline is hashed at the root.
• Events eligible for hashing at the next level are determined by the maximum gap (xg), window size (ws), and span (ms) constraints.
REFERENCE: Sequential Hash Tree for fast access http://www-users.cs.umn.edu/~mjoshi/hpdmtut/sld144.htm

Exercises:
1. What is the difference between the algorithms of Apriori and AprioriAll?
2. What happens if min. support and confidence are set too low / high?
3. Give a short example to show that items in a strong association rule may actually be negatively correlated.

END OF CHAPTER 2