Technology Forge Version 0.1 www.technologyforge.net Weka Tutorial 5 – Association ? © 2009 – Mark Polczynski Weka Tutorial 5 - Association 1 Association Got Beer? Market Basket Analysis Got Diapers? What things tend to “go together” in your market basket? Weka Tutorial 5 - Association 2 People who bought beer also usually bought diapers. Example of Market Basket Analysis People who bought diapers also usually bought beer. Weka Tutorial 5 - Association 3 Cash Register Checkout Data: • Items in shopping cart • Time of day • Day of week • Season of year • Method of payment • Location of store • etc… Associations? What can you use associations for? Fact-Based Marketing Strategies: • Floor plans • Discounts • Coupons • etc… Weka Tutorial 5 - Association 4 If the weather outside is: Then to play, or not to play, that is the question! Weka Tutorial 5 - Association 5 TPONTPNom.xls dataset: Outlook sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy Temp. hot hot hot mild cool cool cool mild cool mild mild mild hot mild Humidity high high high high normal normal normal high normal normal normal high normal high Weka Tutorial 5 - Association Windy false true false false false true true false false false true true false true Play no no yes yes yes no yes no yes yes yes yes yes no 6 Outlook sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy Temp. hot hot hot mild cool cool cool mild cool mild mild mild hot mild Outlook sunny = 5 overcast = 4 rainy = 5 Humidity high high high high normal normal normal high normal normal normal high normal high Temp. hot = 4 mild = 6 cool = 4 Windy false true false false false true true false false false true true false true Play no no yes yes yes no yes no yes yes yes yes yes no Humidity high = 7 normal = 7 Weka Tutorial 5 - Association Attribute values and counts Windy true = 6 false = 8 Play yes = 9 no = 5 7 Classification: Input attributes Outlook sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy Temp. hot hot hot mild cool cool cool mild cool mild mild mild hot mild Humidity high high high high normal normal normal high normal normal normal high normal high Class Windy false true false false false true true false false false true true false true Play no no yes yes yes no yes no yes yes yes yes yes no Goal: Predict whether the person will play, given weather conditions. Weka Tutorial 5 - Association 8 Outlook sunny sunny overcast rainy rainy rainy overcast sunny sunny rainy sunny overcast overcast rainy Temp. hot hot hot mild cool cool cool mild cool mild mild mild hot mild Humidity high high high high normal normal normal high normal normal normal high normal high Windy false true false false false true true false false false true true false true Play no no yes yes association, we For yes look for things no that “go together” yes no yes yes yes yes yes no Some things that “go together”: Outlook = sunny AND Temp. = hot go together 2 times, Outlook = sunny AND Temp. = cool go together only once. Weka Tutorial 5 - Association 9 1 Outlook: sunny overcast rainy Two-item sets of attributes that can go together 5 2 Play: yes no Temp: hot mild cool Windy: true false 4 3 Weka Tutorial 5 - Association node-to-node links Humidity: high normal 1-2 2-3 3-4 4-5 5-1 1-3 2-4 3-5 4-1 5-2 10 10 Combinations of Outlook and Temperature that can go together 1 sunny/hot sunny/mild sunny/cool overcast/hot overcast/mild overcast/cool rainy/hot rainy/mild rainy/cool Outlook: sunny overcast rainy 3x3=9 5 2 Play: yes no Temp: hot mild cool Windy: true false 4 3 Weka Tutorial 5 - Association Humidity: high normal 11 All possible two-item sets 1 Outlook: 3 Play: 2 Temp: 3 5 2 1-2: 2-3: 3-4: 4-5: 5-1: 1-3: 2-4: 3-5: 4-1: 5-2: 3x3 = 9 3x2 = 6 2x2 = 4 2x2 = 4 2x3 = 6 3x2 = 6 3x2 = 6 2x2 = 4 2x3 = 6 2x3 = 6 57 node-to-node links 4 Windy: 2 3 Humidity: 2 Weka Tutorial 5 - Association combinations for link 12 3 4 3 4 3 4 6 1 3 1 1 3 2 2 4 2 3 3 4 2 0 4 2 2 3 1 Weka Tutorial 5 - Association 6 2 2 2 1 3 2 3 3 8 3 2 3 3 3 2 4 4 3 3 6 2 no 7 2 2 3 1 2 4 Play yes 7 3 2 2 3 4 0 6 2 1 3 false 0 3 2 2 3 2 3 3 2 4 1 1 2 4 2 2 0 Windy true 2 1 1 2 2 2 2 4 0 normal 2 2 1 3 2 2 3 2 3 high 5 cool 4 mild 5 Humid. hot rainy 5 4 5 4 6 4 7 7 6 8 9 5 overcast Oulook sunny overcast rainy Temp. hot mild cool Humid. high normal Windy yes no Play yes no Temp. sunny Outlook Occurrences of each pair 9 2 4 3 2 4 3 3 6 3 6 5 3 0 2 2 2 1 4 1 3 2 13 6 2 2 2 1 3 2 3 3 8 3 2 3 3 3 2 4 4 no 7 2 2 3 1 2 4 Play yes 7 3 2 2 3 4 0 false 4 1 1 2 Windy true Oulook sunny 5 2 2 overcast 4 2 1 rainy 5 0 3 Temp. hot 4 mild 6 cool 4 Humid. high 7 normal 7 Windy yes 6 no 8 Humidity = normal and Play = yes Play yes 9 occurs5more often than no normal mild hot rainy sunny Temperature = cool and Humidity = normal occurs more often than Temperature = cool and 5 Humidity 4 5 =4high6 Humid. high Temp. cool overcast Outlook 9 2 4 3 2 4 3 3 6 3 6 5 3 0 2 2 2 1 4 1 3 2 Humidity = normal and Play = no Weka Tutorial 5 - Association 14 “Support” is the number of instances 7 3 2 2 3 4 0 7 2 2 3 1 2 4 Weka Tutorial 5 - Association 6 2 2 2 1 3 2 3 3 8 3 2 3 3 3 2 4 4 no 4 1 1 2 Play yes 6 2 1 3 false 4 2 2 0 Windy true normal Oulook sunny 5 overcast 4 rainy 5 Temp. hot 4 mild 6 cool 4 Humid. high 7 normal 7 Temp. = cool AND Humid. = normal Windy yes 6 8 dataset, occursno 4 times in the Play yes 9 5 is 4. sonothe support high 5 cool rainy 4 mild overcast 5 Humid. hot sunny Outlook where a particular rule is correct.Temp. 9 2 4 3 2 4 3 3 6 3 6 5 3 0 2 2 2 1 4 1 3 2 15 Here are the 47 two-item sets 4 1 1 2 7 3 2 2 3 4 0 7 2 2 3 1 2 4 6 2 2 2 1 3 2 3 3 8 3 2 3 3 3 2 4 4 Combinations Combinationswith withhighest highestcoverage: support: Humidity = normal AND Play = yes Windy = no and Play = yes Weka Tutorial 5 - Association no 6 2 1 3 Play yes normal 4 2 2 0 Windy false high rainy 5 cool overcast 4 mild 5 4 5 4 6 4 7 7 6 8 9 5 5 Humid. hot Oulook sunny overcast rainy Temp. hot mild cool Humid. high normal Windy yes no Play yes no Temp. sunny Outlook true that have support => 2. 9 2 4 3 2 4 3 3 6 3 6 5 3 0 2 2 2 1 4 1 3 2 16 “Confidence” of the association rule 4 1 1 2 7 3 2 2 3 4 0 7 2 2 3 1 2 4 6 2 2 2 1 3 2 3 3 8 3 2 3 3 3 2 4 4 Number of occurrences of Humidity = normal is 7. Weka Tutorial 5 - Association no 6 2 1 3 Play yes 4 2 2 0 Windy false normal Oulook sunny 5 overcast 4 Combination rainy 5 Humidity = normal AND Play = yes Temp. hot 4 occurs 6 times. mild 6 cool 4 Humid. high 7 normal 7 Windy yes 6 no 8 Play yes 9 no 5 high 5 cool rainy 4 mild overcast 5 hot sunny is 6/7, or 86% Humid. true “If Humidity = normal, Outlook then Play = yes” Temp. 9 2 4 3 2 4 3 3 6 3 6 5 3 0 2 2 2 1 4 1 3 2 17 Confidence of rule 7 3 2 2 3 4 0 7 2 2 3 1 2 4 6 2 2 2 1 3 2 3 3 8 3 2 3 3 3 2 4 4 Oulook sunny 5 overcast 4 rainy 5 Number of occurrences of Temp. hot 4 Play = yes mild 6 is 9. cool 4 Humid. high 7 normal 7 Windy yes 6 no 8 Combination Play yes 9 Humidity = normal AND Play = yes no 5 occurs 6 times. Weka Tutorial 5 - Association no 4 1 1 2 Play yes 6 2 1 3 false 4 2 2 0 Windy true normal 5 high rainy 4 cool overcast 5 mild is 6/9, or 67% Humid. hot sunny If Play = yes, then Humidity = normalTemp. Outlook 9 2 4 3 2 4 3 3 6 3 6 5 3 0 2 2 2 1 4 1 3 2 18 8 0.60 0.50 0.60 0.75 0.50 0.50 0.57 0.57 no 4 7 7 6 0.40 0.20 0.60 0.40 0.40 0.25 0.25 0.50 0.50 0.50 0.60 0.40 0.40 0.60 0.40 0.75 0.25 0.25 0.67 0.33 0.50 0.00 1.00 0.50 0.57 0.00 0.43 0.29 0.57 0.43 0.50 0.33 0.50 0.50 0.38 0.25 0.50 0.50 0.44 0.33 0.33 0.67 0.33 0.40 0.20 0.80 0.20 0.60 Play false true normal Windy yes IF Oulook sunny 5 0.40 overcast 4 0.50 rainy 5 0.00 If Play = yes, Temp. hot 4 0.50 0.50 0.00 then mild 6 Humidity 0.33 0.17= normal 0.50 cool 4 0.25 0.25 0.50 Humid. high 7 0.43 0.29 0.29 0.43 normal 7 0.29 0.29 0.43 0.14 Windy yes 6 0.33 0.33 0.33 0.17 no 8 0.38 0.25 0.38 0.38 Play yes 9 0.22 0.44 0.33 0.22 no 5 0.60 0.00 0.40 0.40 high hot If Humidity = normal, 5 4 5Play = 4 yes6 then cool Humid. mild rainy Temp. overcast sunny THEN Outlook 9 0.40 1.00 0.60 0.50 0.67 0.75 0.43 0.86 0.50 0.75 5 0.60 0.00 0.40 0.50 0.33 0.25 0.57 0.14 0.50 0.25 0.67 0.40 The two-item set of Humidity and Play produced two different rules. Weka Tutorial 5 - Association 19 5 4 5 4 6 4 7 7 6 8 9 5 5 4 5 0.50 0.33 0.25 0.43 0.29 0.33 0.38 0.22 0.60 0.50 0.17 0.25 0.29 0.29 0.33 0.25 0.44 0.00 0.00 0.50 0.50 0.29 0.43 0.33 0.38 0.33 0.40 Weka Tutorial 5 - Association Humid. 4 6 4 7 0.40 0.40 0.20 0.60 0.50 0.25 0.25 0.50 0.00 0.60 0.40 0.40 0.75 0.67 0.00 0.43 0.57 0.00 0.14 0.29 0.57 0.17 0.50 0.33 0.50 0.38 0.38 0.25 0.50 0.22 0.44 0.33 0.33 0.40 0.40 0.20 0.80 Windy 7 0.40 0.50 0.60 0.25 0.33 1.00 Play 6 0.40 0.50 0.40 0.25 0.50 0.50 0.43 0.43 8 0.60 0.50 0.60 0.75 0.50 0.50 0.57 0.57 0.50 0.50 0.67 0.33 0.67 0.20 0.60 0.40 no IF Oulook sunny overcast rainy Temp. hot mild cool Humid. high normal Windy yes no Play yes no Temp. yes Two-item set accuracy false THEN Outlook true 6 2 normal 3 3 If Temperature = cool, then Humidity = normal If Outlook = overcast, than Play = yes high 3 4 6 1 Two-item set association rules: cool 3 4 3 4 5 3 0 2 2 2 1 4 1 3 2 mild 0 4 2 2 3 1 9 2 4 3 2 4 3 3 6 3 6 hot 4 2 3 3 4 2 8 3 2 3 3 3 2 4 4 rainy 3 1 1 3 2 2 6 2 2 2 1 3 2 3 3 no 7 2 2 3 1 2 4 For thresholds set at: accuracy = 100% coverage = >2 overcast 7 3 2 2 3 4 0 yes 4 1 1 2 false 6 2 1 3 Play sunny 0 3 2 2 3 2 3 3 2 4 2 2 0 Windy true 2 1 1 2 2 2 2 4 0 normal 2 2 1 3 2 2 3 2 3 high 5 cool 4 mild 5 Humid. hot 5 4 5 4 6 4 7 7 6 8 9 5 rainy Oulook sunny overcast rainy Temp. hot mild cool Humid. high normal Windy yes no Play yes no overcast Two-item set coverage Temp. sunny Outlook 9 0.40 1.00 0.60 0.50 0.67 0.75 0.43 0.86 0.50 0.75 5 0.60 0.00 0.40 0.50 0.33 0.25 0.57 0.14 0.50 0.25 20 Optimum thresholds? Support Dataset Confidence Association Rule Generator Association Rules Predictive apriori association algorithm: • Optimally combines support and confidence into predictive accuracy, • Requires only that user specify number of rules generated. Weka Tutorial 5 - Association 21 Load TPONTPNom.arff dataset Weka Tutorial 5 - Association 22 Show Show Outlook Outlook relative relativetotoPlay Play Weka Tutorial 5 - Association 23 Choose PredictiveApriori algorithm Weka Tutorial 5 - Association 24 Best 100 rules for TPONTP dataset Weka Tutorial 5 - Association 25 Support for Outlook = overcast AND Play = yes Occurrences of Outlook = overcast Predictive accuracy = 95.6% Confidence ConfidenceofofIf Humidity If Outlook = normal = overcast, then then Play Play = yes = yes isis4/4 6/7==100% 86% Weka Tutorial 5 - Association 26 Weka Tutorial 5 - Association 27 Input attributes Outlook Temp. sunny hot Classification Goal: sunny Predict hot whether the personovercast will play, hot given weather conditions. rainy mild rainy cool rainy cool overcast cool sunny mild sunny cool Association rule: rainy mild If Humidity = normalsunny mild AND Windy = false,overcast mild then Play = yes overcast hot rainy mild Humidity high high high high normal normal normal high normal normal normal high normal high Class Windy false true false false false true true false false false true true false true Play no no yes yes yes no yes no yes yes yes yes yes no Question: Is there a linkage between classification and association? Weka Tutorial 5 - Association 28 Mine only the association rules that have the designated class attribute as the antecedent. Keep the number of rules at 100. Weka Tutorial 5 - Association 29 Weka Tutorial 5 - Association 30 Weka Tutorial 5 - Association 31 What’s the difference between classification and association? • In one sense, association is like classification, except that instead of considering just one attribute as the class attribute, it considers every attribute and every combination of attributes as a class, and then performs a form of classification on all possible classes. • Unlike classification, association does not necessarily consider all attributes when attempting to create rules, but operates on sub-set combinations of attributes. • Association only works on nominal attributes, not numerical attributes, although numerical attributes can be discretized. Weka Tutorial 5 - Association 32 The Weka Experimenter Environment supports efficient automation of algorithm testing. Topic of the next Weka tutorial Weka Tutorial 5 - Association 33 Weka Documentation: Weka Tutorial 5 - Association 34 Contact the Author: Mark Polczynski, PhD The Technology Forge mhp.techforge@gmail.com Weka Tutorial 5 - Association 35