Unsupervised Learning
- Clustering: unsupervised classification, i.e., classification without the class attribute; we want to discover the classes.
- Association Rule Discovery: discover correlations and other relationships in the data.

The Clustering Process
- Pattern representation
- Definition of a pattern proximity measure
- Clustering
- Data abstraction
- Cluster validation

Pattern Representation
- Number of classes
- Number of available patterns
- Feature selection (circles, ellipses, squares, etc.): can we use wrappers and filters?
- Feature extraction: produce new features, e.g., principal component analysis (PCA).

Pattern Proximity
- We want clusters of instances that are similar to each other but dissimilar to instances in other clusters, so we need a similarity (or distance) measure.
- Continuous case:
  - The Euclidean distance works well for compact, isolated clusters.
  - The squared Mahalanobis distance $d_M(x_i, x_j) = (x_i - x_j)\,\Sigma^{-1}\,(x_i - x_j)^T$, where $\Sigma$ is the sample covariance matrix, alleviates problems caused by correlated attributes.
  - Many more measures exist.
- Nominal attributes: $d(x_i, x_j) = \frac{n - n_x}{n}$, where $n$ is the number of attributes and $n_x$ is the number of attributes on which the two instances take the same value.

Clustering Techniques
- Hierarchical: Single Link, Complete Link, CobWeb
- Partitional: Square Error (K-means), Mixture Maximization (Expectation Maximization)

Technique Characteristics
- Agglomerative vs divisive:
  - Agglomerative: each instance starts as its own cluster and the algorithm merges clusters.
  - Divisive: begins with all instances in one cluster and divides it up.
- Hard vs fuzzy:
  - Hard clustering assigns each instance to exactly one cluster, whereas fuzzy clustering assigns each instance a degree of membership in every cluster.

More Characteristics
- Monothetic vs polythetic:
  - Polythetic: all attributes are used simultaneously, e.g., to calculate distances (most algorithms).
  - Monothetic: attributes are considered one at a time.
- Incremental vs non-incremental:
  - With large data sets it may be necessary to consider only part of the data at a time (data mining).
  - Incremental algorithms work instance by instance.

Hierarchical Clustering
- [Figure: a dendrogram over instances A through G; the vertical axis is similarity, and the height at which branches join shows how similar the merged clusters are.]

Hierarchical Algorithms
- Single-link: the distance between two clusters is the minimum of the distances between all pairs of instances, one from each cluster.
  - More versatile.
  - Produces (sometimes too) elongated clusters.
- Complete-link: the distance between two clusters is the maximum of all distances between instances in the two clusters.
  - Produces tightly bound, compact clusters.
  - Often more useful in practice.

Example: Clusters Found
- [Figure: the same two-dimensional data set, with two groups of points (labelled 1 and 2) plus noise points (*); single-link and complete-link recover the clusters differently.]

Partitional Clustering
- Outputs a single partition of the data into clusters.
- Good for large data sets.
- Determining the number of clusters is a major challenge.

K-Means
- The number of clusters is predetermined.
- Start with seed clusters of one element each (the seeds).
- Assign each instance to the cluster with the nearest centroid.
- Find the new centroids of the resulting clusters.
- Form the new clusters and repeat until the assignments stop changing.

Discussion: K-means
- Applicable to fairly large data sets.
- Sensitive to the initial centers; other heuristics can be used to find good initial centers.
- Converges to a local optimum.
- Specifying the number of centers is very subjective.
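To make the k-means steps above concrete, here is a minimal sketch in Python/NumPy (my own illustration, not from the slides; the seeding, convergence test, and toy data are arbitrary choices):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k clusters; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Seed clusters: k randomly chosen instances act as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign each instance to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned instances.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments no longer change: converged to a local optimum
        centroids = new_centroids
    return centroids, labels

# Toy usage: two obvious groups in 2-D.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
centroids, labels = k_means(X, k=2)
print(centroids, labels)
```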
Clustering in Weka
- Clustering algorithms in Weka: K-Means, Expectation Maximization (EM), and Cobweb (hierarchical, incremental, and agglomerative).

CobWeb
- Main characteristics: hierarchical and incremental; uses category utility to decide where each new instance goes.
- The category utility of a partition into k clusters $C_1, \ldots, C_k$ is
  $CU(C_1, \ldots, C_k) = \frac{1}{k} \sum_l \Pr[C_l] \sum_i \sum_j \left( \Pr[a_i = v_{ij} \mid C_l]^2 - \Pr[a_i = v_{ij}]^2 \right)$,
  where the inner sums run over every attribute $a_i$ and every possible value $v_{ij}$ of that attribute.
- The term in parentheses measures the improvement in the probability estimate that results from knowing the instance-cluster assignment.
- Why divide by k?

Category Utility
- If each instance is in its own cluster, then $\Pr[a_i = v_{ij} \mid C_l] = 1$ when $v_{ij}$ is the actual value in that instance and 0 otherwise.
- The category utility function then becomes
  $CU(C_1, \ldots, C_k) = \frac{n - \sum_i \sum_j \Pr[a_i = v_{ij}]^2}{k}$,
  where n is the number of attributes.
- Without the division by k it would therefore always be best for each instance to have its own cluster: overfitting!

The Weather Problem

Outlook | Temp. | Humidity | Windy | Play
Sunny | Hot | High | FALSE | No
Sunny | Hot | High | TRUE | No
Overcast | Hot | High | FALSE | Yes
Rainy | Mild | High | FALSE | Yes
Rainy | Cool | Normal | FALSE | Yes
Rainy | Cool | Normal | TRUE | No
Overcast | Cool | Normal | TRUE | Yes
Sunny | Mild | High | FALSE | No
Sunny | Cool | Normal | FALSE | Yes
Rainy | Mild | Normal | FALSE | Yes
Sunny | Mild | Normal | TRUE | Yes
Overcast | Mild | High | TRUE | Yes
Overcast | Hot | Normal | FALSE | Yes
Rainy | Mild | High | TRUE | No

Weather Data (without Play)
- Label the instances a, b, ..., n.
- Start by putting the first instance in its own cluster (a).
- Add the next instance in its own cluster (a, b).

Adding the Third Instance
- Evaluate the category utility of adding the instance to one of the two existing clusters versus putting it in its own cluster, and keep the arrangement with the highest utility.

Adding Instance f
- f is the first instance not to get its own cluster. Look at the instances:
  e) Rainy, Cool, Normal, FALSE
  f) Rainy, Cool, Normal, TRUE
- Quite similar!

Add Instance g
- Look at the instances:
  e) Rainy, Cool, Normal, FALSE
  f) Rainy, Cool, Normal, TRUE
  g) Overcast, Cool, Normal, TRUE
- g is placed in the same part of the tree as e and f.

Add Instance h
- Look at the instances:
  a) Sunny, Hot, High, FALSE
  d) Rainy, Mild, High, FALSE
  h) Sunny, Mild, High, FALSE
- Rearrange: the best matching node (a) and the runner-up (d) are merged into a single cluster before h is added. (Splitting is also possible.)

Final Hierarchy
- [Figure: the final hierarchy over the instances a through n.] What next?

Dendrogram to Clusters
- [Figure: cutting the hierarchy at a suitable level yields the clusters.] What do a, b, c, d, h, k, and l have in common?

Numerical Attributes
- Assume a normal distribution; the sums over attribute values are then replaced by integrals, giving
  $CU(C_1, \ldots, C_k) = \frac{1}{k} \sum_l \Pr[C_l]\, \frac{1}{2\sqrt{\pi}} \sum_i \left( \frac{1}{\sigma_{il}} - \frac{1}{\sigma_i} \right)$,
  where $\sigma_{il}$ is the standard deviation of attribute $a_i$ within cluster $C_l$ and $\sigma_i$ its standard deviation over all the data.
- Problems arise when a variance is zero! The acuity parameter imposes a minimum variance.

Hierarchy Size (Scalability)
- CobWeb may create a very large hierarchy.
- The cutoff parameter is used to suppress growth: if $CU(C_1, \ldots, C_k) <$ cutoff, the node is cut off.

Discussion
- Advantages:
  - Incremental: scales to a large number of instances.
  - The cutoff limits the size of the hierarchy.
  - Handles mixed attributes.
- Disadvantages:
  - Incremental: possibly sensitive to the order of the instances?
  - Arbitrary choices of parameters: the division by k, the artificial minimum variance (acuity) for numeric attributes, and the ad hoc cutoff value.
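As a concrete illustration of the nominal-attribute category utility above, here is a small Python sketch (my own illustration; the toy partition in the usage example is arbitrary):

```python
from collections import Counter

def category_utility(clusters):
    """clusters: list of clusters; each cluster is a list of instances,
    each instance a tuple of nominal attribute values."""
    k = len(clusters)
    data = [x for c in clusters for x in c]
    n_total = len(data)
    n_attrs = len(data[0])

    def sum_sq_probs(instances):
        # Sum over attributes i and values v of Pr[a_i = v]^2 within `instances`.
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(x[i] for x in instances)
            total += sum((c / len(instances)) ** 2 for c in counts.values())
        return total

    base = sum_sq_probs(data)  # Pr[a_i = v_ij]^2 over the whole data set
    cu = 0.0
    for cluster in clusters:
        p_cluster = len(cluster) / n_total
        cu += p_cluster * (sum_sq_probs(cluster) - base)
    return cu / k  # dividing by k penalises having many tiny clusters

# Toy usage on three weather instances (outlook, temp, humidity, windy):
a = ("sunny", "hot", "high", "false")
b = ("sunny", "hot", "high", "true")
c = ("overcast", "hot", "high", "false")
print(category_utility([[a, b], [c]]))    # one possible partition
print(category_utility([[a], [b], [c]]))  # every instance in its own cluster
```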
Probabilistic Perspective
- Find the most likely set of clusters given the data.
- Report the probability of each instance belonging to each cluster.
- Assumption: the instances are drawn from one of several distributions.
- Goal: estimate the parameters of these distributions.
- Usually the distributions are assumed to be normal.

Mixture Resolution
- A mixture is a set of k probability distributions, representing the k clusters.
- Each distribution gives the probability that an instance would take certain attribute values given that it belongs to that cluster.
- Question: what is the probability that an instance belongs to a given cluster (or distribution)?

One Numeric Attribute
- Two-cluster mixture model: [figure: two overlapping normal densities, cluster A and cluster B, over the attribute.]
- Given some data, how can you determine the parameters
  $\mu_A$ (mean of cluster A), $\sigma_A$ (standard deviation of cluster A), $\mu_B$, $\sigma_B$, and $p_A$ (the probability of being in cluster A)?

Problems
- If we knew which cluster each instance came from, we could estimate these values.
- If we knew the parameters, we could calculate the probability that an instance belongs to each cluster:
  $\Pr[A \mid x] = \frac{\Pr[x \mid A]\,\Pr[A]}{\Pr[x]} = \frac{f(x; \mu_A, \sigma_A)\, p_A}{\Pr[x]}$, where $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$.

EM Algorithm
- Expectation Maximization (EM):
  - Start with initial values for the parameters.
  - Calculate the cluster probabilities for each instance (expectation step).
  - Re-estimate the values of the parameters (maximization step).
  - Repeat.
- A general-purpose maximum likelihood estimation algorithm for missing data.
- Can also be used to train Bayesian networks (later).
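A minimal sketch of this EM loop for the two-cluster, one-attribute case (my own illustration; the initialization and fixed iteration count are arbitrary choices, not from the slides):

```python
import numpy as np

def em_two_gaussians(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture to the data x with EM."""
    x = np.asarray(x, dtype=float)
    # Crude initial guesses for mu_A, sigma_A, mu_B, sigma_B, p_A.
    mu_a, mu_b = x.min(), x.max()
    sd_a = sd_b = x.std() or 1.0
    p_a = 0.5
    norm = lambda v, mu, sd: np.exp(-(v - mu) ** 2 / (2 * sd ** 2)) / (np.sqrt(2 * np.pi) * sd)
    for _ in range(n_iter):
        # E-step: Pr[A | x] for every instance, via Bayes' rule.
        like_a = p_a * norm(x, mu_a, sd_a)
        like_b = (1 - p_a) * norm(x, mu_b, sd_b)
        w_a = like_a / (like_a + like_b)
        # M-step: re-estimate the parameters from the weighted instances.
        p_a = w_a.mean()
        mu_a = np.average(x, weights=w_a)
        mu_b = np.average(x, weights=1 - w_a)
        sd_a = np.sqrt(np.average((x - mu_a) ** 2, weights=w_a))
        sd_b = np.sqrt(np.average((x - mu_b) ** 2, weights=1 - w_a))
    return mu_a, sd_a, mu_b, sd_b, p_a

# Toy usage: data drawn from two clusters around 0 and 5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 200)])
print(em_two_gaussians(x))
```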
Beyond Normal Models
- More than one class: straightforward.
- More than one numeric attribute: easy if the attributes are assumed independent; if the attributes are dependent, treat them jointly using the bivariate normal.
- Nominal attributes: no more normal distribution!

EM using Weka
- Options:
  - numClusters: set the number of clusters (the default, -1, selects it automatically).
  - maxIterations: maximum number of iterations.
  - seed: random number seed.
  - minStdDev: minimum allowable standard deviation.

Other Clustering
- Artificial Neural Networks (ANN).
- Random search: Genetic Algorithms (GA), Simulated Annealing (SA), Tabu Search (TS); GAs have been used to find initial centroids for k-means.
- Support Vector Machines (SVM).
- We will discuss GA and SVM later.

Applications
- Image segmentation.
- Object and character recognition.
- Data mining: stand-alone, to gain insight into the data, or as a preprocessing step before classification that operates on the detected clusters.

DM Clustering Challenges
- Data mining deals with large databases, so scalability with respect to the number of instances is essential; using a random sample is possible but may introduce bias.
- Dealing with mixed data: many algorithms only make sense for numeric data.
- High-dimensional problems: can the algorithm handle many attributes, and how do we interpret a cluster in high dimensions?

Other (General) Challenges
- Arbitrary shapes of clusters.
- Requiring minimal domain knowledge (e.g., having to know the number of clusters in advance).
- Noisy data.
- Insensitivity to instance order.
- Interpretability and usability.

Clustering for DM
- The main issue is scalability to large databases.
- Many algorithms have been developed for scalable clustering:
  - Partitional methods: CLARA, CLARANS.
  - Hierarchical methods: AGNES, DIANA, BIRCH, CURE, Chameleon.

Practical Partitional Clustering Algorithms
- Classic k-means (1967).
- Work from 1990 and later: k-medoids
  - Uses the medoid instead of the centroid.
  - Less sensitive to outliers and noise, but the computations are more costly.
  - PAM (Partitioning Around Medoids) algorithm.

Large-Scale Problems
- CLARA (Clustering LARge Applications): select several random samples of instances, apply PAM to each, and return the best clusters.
- CLARANS: similar to CLARA, but draws samples randomly while searching; more effective than PAM and CLARA.

Hierarchical Methods
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies):
  - Clustering feature: a triplet summarizing information about a subcluster.
  - Clustering feature (CF) tree: a height-balanced tree that stores the clustering features.

BIRCH Mechanism
- Phase I: scan the database to build an initial CF tree, a multilevel compression of the data.
- Phase II: apply a selected clustering algorithm to the leaf nodes of the CF tree.
- BIRCH has been found to be very scalable.

Conclusion
- The use of clustering in data mining practice seems to be somewhat limited due to scalability problems.
- The more commonly used form of unsupervised learning is Association Rule Discovery.

Association Rule Discovery
- Aims to discover interesting correlations or other relationships in large databases.
- Finds rules of the form: if A and B then C and D.
- Which attributes will be included in the relation is not known in advance.

Mining Association Rules
- Similar to classification rules, so can we use the same procedure?
  - Every attribute would have to be treated the same, and the procedure applied to every possible expression on the right-hand side.
  - This yields a huge number of rules: infeasible.
- We only want rules with high coverage/support.

Market Basket Analysis
- Basket data: items purchased on a per-transaction basis (not cumulative, etc.).
- Typical questions: How do you boost the sales of a given product? What other products does discontinuing a product impact? Which products should be shelved together?
- Terminology (market basket analysis):
  - Item: an attribute/value pair.
  - Item set: a combination of items with minimum coverage.

How Many k-Item Sets Have Minimum Coverage?
- Consider the weather data again:

Outlook | Temp. | Humidity | Windy | Play
Sunny | Hot | High | FALSE | No
Sunny | Hot | High | TRUE | No
Overcast | Hot | High | FALSE | Yes
Rainy | Mild | High | FALSE | Yes
Rainy | Cool | Normal | FALSE | Yes
Rainy | Cool | Normal | TRUE | No
Overcast | Cool | Normal | TRUE | Yes
Sunny | Mild | High | FALSE | No
Sunny | Cool | Normal | FALSE | Yes
Rainy | Mild | Normal | FALSE | Yes
Sunny | Mild | Normal | TRUE | Yes
Overcast | Mild | High | TRUE | Yes
Overcast | Hot | Normal | FALSE | Yes
Rainy | Mild | High | TRUE | No
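To show how item-set coverage (support) is counted on a table like this, here is a small Python sketch over the same weather data (my own illustration; the attribute names are just labels):

```python
rows = [
    ("sunny", "hot", "high", False, "no"), ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"), ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"), ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"), ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"), ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"), ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"), ("rainy", "mild", "high", True, "no"),
]
attrs = ("outlook", "temp", "humidity", "windy", "play")
# Each instance becomes a set of items, an item being an (attribute, value) pair.
weather = [set(zip(attrs, r)) for r in rows]

def coverage(item_set, data):
    """Coverage (support count) = number of instances containing every item in the set."""
    return sum(1 for instance in data if item_set <= instance)

print(coverage({("outlook", "sunny")}, weather))                   # -> 5
print(coverage({("outlook", "sunny"), ("temp", "hot")}, weather))  # -> 2
```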
Item Sets
- Examples of item sets with minimum coverage 2 (coverage in parentheses):
  - 1-item: Outlook=sunny (5), Outlook=overcast (4), Outlook=rainy (5), Temp=cool (4), Temp=mild (6)
  - 2-item: Outlook=sunny Temp=mild (2), Outlook=sunny Temp=hot (2), Outlook=sunny Humidity=normal (2), Outlook=sunny Windy=true (2)
  - 3-item: Outlook=sunny Temp=hot Humidity=high (2), Outlook=sunny Temp=hot Play=no (2), Outlook=sunny Humidity=normal Play=yes (2), Outlook=sunny Humidity=high Windy=false (2), Outlook=sunny Humidity=high Play=no (3)
  - 4-item: Outlook=sunny Temp=hot Humidity=high Play=no (2), Outlook=sunny Humidity=high Windy=false Play=no (2), Outlook=overcast Temp=hot Windy=false Play=yes (2), Outlook=rainy Temp=mild Windy=false Play=yes (2), Outlook=rainy Humidity=normal Windy=false Play=yes (2)

From Sets to Rules
- The 3-item set with coverage 4: humidity = normal, windy = false, play = yes.
- Association rules derived from it, with their accuracy:
  - If humidity = normal and windy = false then play = yes (4/4)
  - If humidity = normal and play = yes then windy = false (4/6)
  - If windy = false and play = yes then humidity = normal (4/6)
  - If humidity = normal then windy = false and play = yes (4/7)
  - If windy = false then humidity = normal and play = yes (4/8)
  - If play = yes then humidity = normal and windy = false (4/9)
  - If (nothing) then humidity = normal and windy = false and play = yes (4/14)

From Sets to Rules (continued)
- The 4-item set with coverage 2: temperature = cool, humidity = normal, windy = false, play = yes.
- Association rules derived from it, with their accuracy:
  - If temperature = cool and windy = false then humidity = normal and play = yes (2/2)
  - If temperature = cool and humidity = normal and windy = false then play = yes (2/2)
  - If temperature = cool and windy = false and play = yes then humidity = normal (2/2)

Overall
- With minimum coverage 2 there are 12 one-item sets, 47 two-item sets, 39 three-item sets, and 6 four-item sets.
- With minimum accuracy 100% there are 58 association rules.
- "Best" rules (coverage = 4, accuracy = 100%):
  - If humidity = normal and windy = false then play = yes
  - If temperature = cool then humidity = normal
  - If outlook = overcast then play = yes

Association Rule Mining
- Step 1: find all item sets that meet the minimum coverage.
- Step 2: find all rules derived from those item sets that meet the minimum accuracy.
- Step 3: prune.

Generating Item Sets
- How do we generate minimum-coverage item sets in a scalable manner? We need an efficient algorithm, because the total number of item sets is huge and grows exponentially in the number of attributes.
- Start by generating the minimum-coverage 1-item sets, use those to generate the 2-item sets, and so on.
- Why do we only need to consider minimum-coverage 1-item sets?

Justification
- Item set 1: {Humidity = high}; coverage(1) = the number of times humidity is high.
- Item set 2: {Windy = false}; coverage(2) = the number of times windy is false.
- Item set 3: {Humidity = high, Windy = false}; coverage(3) = the number of times humidity is high and windy is false.
- Coverage(3) ≤ coverage(1) and coverage(3) ≤ coverage(2), so if item sets 1 and 2 do not both meet the minimum coverage, item set 3 cannot either.
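The "From Sets to Rules" step above can be sketched directly: enumerate every non-empty consequent of a frequent item set and compute accuracy as coverage(whole item set) / coverage(antecedent). This is my own illustration; the commented-out usage assumes the `weather` transaction list built in the earlier coverage sketch:

```python
from itertools import combinations

def coverage(item_set, data):
    return sum(1 for instance in data if item_set <= instance)

def rules_from_item_set(item_set, data, min_accuracy=1.0):
    """Generate rules antecedent -> consequent from one frequent item set."""
    rules = []
    items = sorted(item_set)
    support = coverage(item_set, data)
    # Every non-empty subset of the item set can be the consequent.
    for r in range(1, len(items) + 1):
        for consequent in combinations(items, r):
            antecedent = item_set - set(consequent)
            accuracy = support / coverage(antecedent, data)  # cov(all) / cov(antecedent)
            if accuracy >= min_accuracy:
                rules.append((antecedent, set(consequent), support, accuracy))
    return rules

# Usage with the `weather` transactions built in the earlier coverage sketch:
item_set = {("humidity", "normal"), ("windy", False), ("play", "yes")}
# for ante, cons, sup, acc in rules_from_item_set(item_set, weather, min_accuracy=0.0):
#     print(ante, "->", cons, sup, acc)
```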
Generating Item Sets (continued)
- Start with all 3-item sets that meet the minimum coverage, e.g. (A B C), (A B D), (A C D), (A C E).
- Merge them to generate 4-item sets, considering only pairs of sets that start with the same two attributes; here there are only two 4-item sets that could possibly work: (A B C D) and (A C D E).
- These candidate 4-item sets must still be checked against the data for minimum coverage.

Algorithm for Generating Item Sets
- Build up from 1-item sets so that we only consider item sets found by merging two minimum-coverage sets.
- Only consider pairs of sets that have all but one item in common.
- Computational efficiency is further improved using hash tables.

Generating Rules
- Example: if the double-consequent rule "If windy = false and play = no then outlook = sunny and humidity = high" meets the minimum coverage and accuracy, then so do the single-consequent rules "If windy = false and play = no then outlook = sunny" and "If windy = false and play = no then humidity = high".

How Many Rules?
- We want to consider every possible subset of the item set's attributes as a consequent.
- With 4 attributes: four single-consequent rules, six double-consequent rules, four triple-consequent rules, and one rule with all four attributes in the consequent, i.e. fifteen possible rules from a single 4-item set!
- The number of possible rules explodes exponentially.

Must We Check All?
- For the rule "If A and B then C and D":
  - coverage = number of times A, B, C, and D are all true
  - accuracy = (number of times A, B, C, and D are true) / (number of times A and B are true)
- For the rule "If A, B and C then D":
  - coverage = number of times A, B, C, and D are all true
  - accuracy = (number of times A, B, C, and D are true) / (number of times A, B, and C are true)
- The coverages are identical, and since the count of instances where A, B, and C are all true is at most the count where A and B are true, the rule with the smaller consequent can only have equal or higher accuracy.

Efficiency Improvement
- A double-consequent rule can only be acceptable if both of its single-consequent rules are acceptable.
- Procedure: start with single-consequent rules, build up candidate double-consequent rules, and so on, and check the candidate rules for accuracy.
- In practice far fewer rules need to be checked.

Apriori Algorithm
- This is a simplified description of the Apriori algorithm.
- Developed in the early 1990s, it is the most commonly used approach.
- New developments focus on generating item sets more efficiently and on generating rules from item sets more efficiently.

Association Rule Discovery using Weka
- Parameters to be specified for Apriori:
  - upperBoundMinSupport: start with this value of minimum support.
  - delta: in each step, decrease the required minimum support by this value.
  - lowerBoundMinSupport: the final minimum support.
  - numRules: how many rules to generate.
  - metricType: confidence, lift, leverage, or conviction.
  - minMetric: the smallest acceptable metric value for a rule.
- Handles only nominal attributes.

Difficulties
- The Apriori algorithm improves performance by using candidate item sets, but some problems remain:
  - It is costly to generate large numbers of item sets; to generate a frequent pattern of size 100, more than $2^{100} \approx 10^{30}$ candidates are needed!
  - It requires repeated scans of the database to check the candidates.
  - Again, this is most problematic for long patterns.

Solution?
- Can candidate generation be avoided?
- New approach:
  - Create a frequent pattern tree (FP-tree) that stores information on the frequent patterns.
  - Use the FP-tree for mining frequent patterns with a partitioning-based, divide-and-conquer strategy (as opposed to bottom-up candidate generation).
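Before moving on to the FP-tree, here is a minimal sketch of the Apriori merge ("join") step described above, assuming item sets are kept as sorted tuples (my own illustration, not Weka's implementation):

```python
def join_step(frequent_k_sets):
    """Merge frequent k-item sets that share all but their last item into
    candidate (k+1)-item sets (item sets are kept as sorted tuples)."""
    candidates = set()
    for a in frequent_k_sets:
        for b in frequent_k_sets:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(a + (b[-1],))
    # Full Apriori would additionally prune candidates having any k-item
    # subset that is not frequent, then check the survivors against the data.
    return candidates

# Toy usage with the 3-item sets from the slide:
threes = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"), ("A", "C", "E")]
print(join_step(threes))   # -> {('A','B','C','D'), ('A','C','D','E')}
```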
Database
- Example transaction database (minimum support = 3):

TID | Items | Frequent items (ordered)
100 | F,A,C,D,G,I,M,P | F,C,A,M,P
200 | A,B,C,F,L,M,O | F,C,A,B,M
300 | B,F,H,J,O | F,B
400 | B,C,K,S,P | C,B,P
500 | A,F,C,E,L,P,M,N | F,C,A,M,P

FP-Tree
- Header table: the frequent items F, C, A, B, M, P, each with a head-of-node-links pointer into the tree.
- The tree built from the ordered transactions:
  - Root
    - F:4
      - C:3
        - A:3
          - M:2
            - P:2
          - B:1
            - M:1
      - B:1
    - C:1
      - B:1
        - P:1

Computational Effort
- Each node has three fields: item name, count, and node link.
- There is also a header table with two fields: item name and head of node links.
- Only two scans of the database are needed: one to collect the set of frequent items, and one to construct the FP-tree.

Comments
- The FP-tree is a compact data structure.
- The FP-tree contains all the information needed for mining frequent patterns (given the support threshold).
- The size of the tree is bounded by the occurrences of the frequent items.
- The height of the tree is bounded by the maximum number of items in a transaction.

Mining Patterns
- Mine the complete set of frequent patterns.
- For any frequent item A, all possible patterns containing A can be obtained by following A's node links, starting from A's head of node links in the header table.

Example
- Following the node links for P gives the paths <F:4, C:3, A:3, M:2, P:2> (where P occurs twice) and <C:1, B:1, P:1> (where P occurs once), yielding the frequent pattern (P:3).

Rule Generation
- Mining the complete set of association rules has some problems: there may be a large number of frequent item sets and a huge number of association rules.
- One potential solution is to look only at closed item sets.

Frequent Closed Item Sets
- An item set X is a closed item set if there is no item set X' such that $X \subset X'$ and every transaction containing X also contains X'.
- A rule $X \rightarrow Y$ is an association rule on frequent closed item sets if both X and $X \cup Y$ are frequent closed item sets, and there does not exist a frequent closed item set Z such that $X \subset Z \subset X \cup Y$.

Example

ID | Items
10 | A,C,D,E,F
20 | A,B,E
30 | C,E,F
40 | A,C,D,F
50 | C,E,F

- Frequent item sets (minimum support = 2): A (3), E (4), AE (2), ACDF (2), CF (4), CEF (3), D (2), AC (2), plus 12 more.
- A, E, AE, ACDF, CF, and CEF are all the closed sets; D and AC, for example, are not closed. Why? (Every transaction containing D, or containing AC, also contains all of A, C, D, and F.)

Mining Frequent Closed Item Sets (CLOSET)
- [Figure: CLOSET run on the example database. The items are ordered by support (C:4, E:4, F:4, A:3, D:2) for building conditional databases; the D-, A-, F-, E-, and EA-conditional databases are mined recursively, and together they output the frequent closed item sets listed above.]

Mining with Taxonomies
- Example taxonomy: Clothes has children Outerwear and Shirts; Outerwear has children Jackets and Ski Pants; Footwear has children Shoes and Hiking Boots.
- A generalized association rule is a rule $X \rightarrow Y$ where no item in Y is an ancestor of an item in X.
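A compact sketch of the two-scan FP-tree construction described above (my own illustration; the node class and header-table layout are arbitrary choices, and ties between equally frequent items are broken alphabetically here, so the tree shape can differ slightly from the figure):

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}

def build_fp_tree(transactions, min_support):
    # Scan 1: collect the frequent items and their counts.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root, header = FPNode(None, None), {i: [] for i in frequent}
    # Scan 2: insert each transaction, keeping only its frequent items,
    # ordered by descending frequency (ties broken alphabetically).
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)   # node links for this item
            node = node.children[item]
            node.count += 1
    return root, header

# Toy usage with the example database (minimum support = 3):
db = [set("FACDGIMP"), set("ABCFLMO"), set("BFHJO"), set("BCKSP"), set("AFCELPMN")]
root, header = build_fp_tree(db, 3)
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```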
Why Taxonomy?
- 'Classic' association rule mining restricts the rules to the leaf nodes of the taxonomy.
- However:
  - Rules at lower levels may not have minimum support, and interesting associations may thus go undiscovered.
  - Taxonomies can be used to prune uninteresting and redundant rules.

Example
- Transactions:

ID | Items
10 | Shirt
20 | Jacket, Hiking Boots
30 | Ski Pants, Hiking Boots
40 | Shoes
50 | Shoes
60 | Jacket

- Item sets with minimum support:

Item set | Support
{Jacket} | 2
{Outerwear} | 3
{Clothes} | 4
{Shoes} | 2
{Hiking Boots} | 2
{Footwear} | 4
{Outerwear, Hiking Boots} | 2
{Clothes, Hiking Boots} | 2
{Outerwear, Footwear} | 2
{Clothes, Footwear} | 2

- Generalized rules:

Rule | Support | Confidence
Outerwear -> Hiking Boots | 2 | 2/3
Outerwear -> Footwear | 2 | 2/3
Hiking Boots -> Outerwear | 2 | 2/2
Hiking Boots -> Clothes | 2 | 2/2

Interesting Rules
- There are many ways in which the interestingness of a rule can be evaluated based on its ancestors. For example:
  - A rule with no ancestors is interesting.
  - A rule with ancestor(s) is interesting only if it has enough 'relative support' compared with what its ancestor suggests.

Rule ID | Rule | Support
1 | Clothes -> Footwear | 10
2 | Outerwear -> Footwear | 8
3 | Jackets -> Footwear | 4

Item | Support
Clothes | 5
Outerwear | 2
Jackets | 1

- Which rules are interesting?
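One common way to make "relative support" concrete follows the generalized-rules literature (R-interestingness in the style of Srikant and Agrawal); the exact criterion and the threshold R below are my assumptions, not stated on the slide:

```python
def expected_support(ancestor_rule_support, item_support, ancestor_item_support):
    """Support we would expect for a specialised rule, scaled down from its
    ancestor rule in proportion to the specialised item's own support."""
    return ancestor_rule_support * item_support / ancestor_item_support

def is_interesting(rule_support, expected, r=1.1):
    """R-interesting: the actual support must exceed R times the expected support."""
    return rule_support > r * expected

# Numbers from the slide: Clothes->Footwear (10), Outerwear->Footwear (8),
# Jackets->Footwear (4); item supports Clothes 5, Outerwear 2, Jackets 1.
exp_outerwear = expected_support(10, 2, 5)  # 4.0, actual support 8
exp_jackets = expected_support(8, 1, 2)     # 4.0, actual support 4
print(is_interesting(8, exp_outerwear), is_interesting(4, exp_jackets))
```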
Discussion
- Association rule mining finds expressions of the form $X \rightarrow Y$ from large data sets.
- It is one of the most popular data mining tasks and originates in market basket analysis.
- Key measures of performance: support and confidence (or accuracy).
- Are support and confidence enough?

Type of Rules Discovered
- The 'classic' association rule problem: all rules satisfying minimum thresholds of support and confidence.
- Focus on a subset of the rules, e.g., optimized rules, maximal frequent item sets, or closed item sets.
- What makes for an interesting rule?

Algorithm Construction
- Determine the frequent item sets (all or part of them); this takes by far the most computational time, and algorithm variations focus on this part.
- Generate rules from the frequent item sets.

Generating Item Sets
- Algorithms can be characterized by how the search space is traversed and how support is determined:

Search space traversed | Support by counting | Support by intersecting
Bottom-up | Apriori*, Partition, AprioriTID, DIC | Eclat
Top-down | FP-Growth* |

- (* = discussed here.) No algorithm dominates the others!

Applications
- Market basket analysis: the classic marketing application.
- Applications to recommender systems.

Recommender
- Customized goods and services: recommend products.
- Collaborative filtering:
  - Finds similarities among users' tastes and recommends based on other users.
  - Used by many on-line systems; simple algorithms.

Classification Approach
- View recommendation as a classification problem: a product is either of interest or not.
- Induce a model, e.g., a decision tree, and classify a new product as either interesting or not interesting.
- What is the difficulty with this approach?

Association Rule Approach
- Product associations: "90% of users who like product A and product B also like product C", i.e. A and B -> C (90%).
- User associations: "90% of products liked by user A and user B are also liked by user C".
- Use a combination of product and user associations.

Advantages
- 'Classic' collaborative filtering must identify users with similar tastes; this approach uses the overlap of other users' tastes to match the given user's taste.
- It can be applied to users whose tastes don't correlate strongly with those of other users.
- It can take advantage of information from, say, user A for a recommendation to user B, even if A and B do not correlate.

What's Different Here?
- Is this really a 'classic' association rule problem?
- We want to learn which products are liked by which users.
- It is 'semi-supervised': there is a target item, namely the user (for user associations) or the product (for product associations).

Single-Consequent Rules
- Only a single (target) item appears in the consequent; go through all such items.
- This places associations for recommenders between classic association rules, where any combination of items can appear in the consequent, and classification, where the consequent is a single item (the class).
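Finally, a small sketch of mining only single-consequent rules for a fixed target item, as the last slide suggests (my own illustration; the brute-force search, transaction format, and thresholds are arbitrary choices):

```python
from itertools import combinations

def single_consequent_rules(transactions, target, min_support=2, min_conf=0.8, max_len=2):
    """Mine rules antecedent -> {target} where the antecedent has at most
    max_len items; brute force over small antecedents, for illustration only."""
    items = sorted({i for t in transactions for i in t if i != target})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            ante = set(antecedent)
            cov_ante = sum(1 for t in transactions if ante <= t)
            cov_rule = sum(1 for t in transactions if ante <= t and target in t)
            if cov_rule >= min_support and cov_ante and cov_rule / cov_ante >= min_conf:
                rules.append((ante, target, cov_rule, cov_rule / cov_ante))
    return rules

# Toy usage: "users who like A and B also like C".
likes = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "C"}, {"B"}, {"A", "B", "C"}]
for ante, cons, sup, conf in single_consequent_rules(likes, target="C"):
    print(ante, "->", cons, "support", sup, "confidence", round(conf, 2))
```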