Rule Learning: Itemset Mining, Apriori Algorithm & Association Rules

Rule Learning Module Introduction

Objectives
By the end of this module, you will be prepared to:
• Sketch itemset mining
• Sketch association rule learning
• Formulate the Apriori algorithm
• Define support
• Define how empirically determined probabilities are computed for rules and itemsets

Rule Learning Identifies Patterns in Data
• Often considered “unsupervised” because there is no target class
• Provides analytic insights into data
• Explainability

Introduction to Rule Learning and Itemset Mining

Objectives
By the end of this module, you will be prepared to:
• Sketch itemset mining
• Define support
• Define how empirically determined probabilities are computed for rules and itemsets

Gaining Insights into Data
• Rule and itemset learning are explainable methods that show how their conclusions were reached
• These methods are used to identify unique patterns in data
• They do not rely on ground truth, and are sometimes considered “unsupervised” methods
Source: BBC https://www.bbc.com/news/technology-33804287

Basic Setup
• Itemsets

Example: Market Basket Analysis
Consider the following transaction database:
• {apple, banana, bread}
• {bread, apple, peanut_butter}
• {bread, jelly, peanut_butter}
• {salmon, capers, cheese, orange_juice}
• {salmon, oil, parsley}
• {milk, peanut_butter, bread}
• {eggs, bread, oil}

Support
• The support of an itemset is the number of transactions* in the database that contain the itemset
• For a given itemset X, the support is denoted sup(X) or #X
* In some implementations, support is instead the fraction of transactions that contain the itemset.
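The support definition above can be computed directly from the transaction database. The following is a minimal sketch, not a reference implementation; the function names support and relative_support are illustrative choices and do not come from the slides.

```python
# Transaction database from the market basket example above.
transactions = [
    {"apple", "banana", "bread"},
    {"bread", "apple", "peanut_butter"},
    {"bread", "jelly", "peanut_butter"},
    {"salmon", "capers", "cheese", "orange_juice"},
    {"salmon", "oil", "parsley"},
    {"milk", "peanut_butter", "bread"},
    {"eggs", "bread", "oil"},
]


def support(itemset, database):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in database if items <= t)


def relative_support(itemset, database):
    """Support as a fraction of all transactions (the footnoted variant)."""
    return support(itemset, database) / len(database)


print(support({"bread"}, transactions))                   # 5
print(support({"bread", "peanut_butter"}, transactions))  # 3
print(relative_support({"bread"}, transactions))          # 5/7
```

Representing each transaction as a Python set makes the containment check a single subset comparison (items <= t).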
Association Rules and Confidence
• The confidence of a rule X → Y is sup(X ∪ Y) / sup(X), the fraction of transactions containing X that also contain Y

Example: Market Basket Analysis
Consider the following transaction database:
• {apple, banana, bread}
• {bread, apple, peanut_butter, jelly}
• {bread, jelly, peanut_butter}
• {salmon, capers, cheese, orange_juice}
• {salmon, oil, parsley}
• {milk, peanut_butter, jelly}
• {eggs, bread, oil}
Consider the rule {peanut_butter, jelly} → {bread}
• sup({peanut_butter, jelly, bread}) = 2 and sup({peanut_butter, jelly}) = 3, so the confidence is 2/3 ≈ 0.67

Itemset Mining and the Apriori Algorithm

Objectives
By the end of this module, you will be prepared to:
• Formulate the Apriori algorithm

Problem: Finding Itemsets
Given a transaction database, can we find all itemsets that meet a certain level of minimum support? This is called mining frequent itemsets.

How Do You Find Itemsets? - Brute Force
• Brute force method to find itemsets in a transaction database of n transactions over m different items for a given level of support minSupt:
• Result = {}
• For i in 1,…,m
    • For each combination C of items of size i
        • Check each transaction t to see if C is a subset of t
        • If C is a subset of at least minSupt transactions, then add C to Result
• Return Result

Problems with Brute Force Approach
• Exponential runtime
• Needlessly examines itemsets that are not present
• Needlessly examines itemsets that we already know do not meet the support

Downward Closure Property
• Every subset of a frequent itemset is also frequent; equivalently, no superset of an infrequent itemset can be frequent

The Apriori Algorithm
• Leverages downward closure for a more efficient process
• Iterative, level-wise search
• At each iteration k, only consider candidate itemsets of size k whose subsets of size k-1 are all frequent
• Join step: generate all possible candidates of length k
• Prune step: remove those candidates that cannot be frequent (as they contain a non-frequent subset)

Apriori Algorithm: Pseudocode
• candidateGen subroutine

Example: Market Basket Analysis, First Pass
Consider the following transaction database:
• {apple, banana, bread}
• {bread, apple, peanut_butter, jelly}
• {bread, jelly, peanut_butter}
• {salmon, capers, cheese, orange_juice}
• {salmon, oil, parsley}
• {milk, peanut_butter, jelly}
• {eggs, bread, oil}
Find all itemsets with a minimum support of 2
First pass:
• {apple}
• {bread}
• {peanut_butter}
• {jelly}
• {salmon}
• {oil}

Example: Market Basket Analysis, k=2
Consider the following transaction database:
• {apple, banana, bread}
• {bread, apple, peanut_butter, jelly}
• {bread, jelly, peanut_butter}
• {salmon, capers, cheese, orange_juice}
• {salmon, oil, parsley}
• {milk, peanut_butter, jelly}
• {eggs, bread, oil}
Find all itemsets with a minimum support of 2
Candidates:
• {apple, bread}
• {apple, peanut_butter}
• {apple, jelly}
• {apple, salmon}
• {apple, oil}
• {bread, peanut_butter}
• {bread, jelly}
• {bread, salmon}
• {bread, oil}
• {peanut_butter, jelly}
• {peanut_butter, salmon}
• {peanut_butter, oil}
• {jelly, salmon}
• {jelly, oil}
• {salmon, oil}

Example: Market Basket Analysis, k=3
Consider the following transaction database:
• {apple, banana, bread}
• {bread, apple, peanut_butter, jelly}
• {bread, jelly, peanut_butter}
• {salmon, capers, cheese, orange_juice}
• {salmon, oil, parsley}
• {milk, peanut_butter, jelly}
• {eggs, bread, oil}
Find all itemsets with a minimum support of 2
Candidates:
• {bread, peanut_butter, jelly}

Termination criteria met (no more candidates)
Result:
• {apple}
• {bread}
• {peanut_butter}
• {jelly}
• {salmon}
• {oil}
• {apple, bread}
• {bread, peanut_butter}
• {bread, jelly}
• {peanut_butter, jelly}
• {bread, peanut_butter, jelly}

Apriori
• Typically, the size of the largest itemset is bounded at much less than m (usually ~10)
• Very fast algorithm; under certain conditions it can run in linear time
• Setting minSupt=1 will make Apriori perform poorly
• Key: a higher minimum support yields a sparsity that Apriori leverages

Association Rule Mining

Objectives
By the end of this module, you will be prepared to:
• Sketch association rule learning

Problem
• Given a transaction database, suppose we have a set of frequent itemsets, all having a support of at least minSupt
• How do we then find association rules that meet some minimum level of confidence?
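As a starting point for this problem, the frequent itemsets from the market basket example can be produced with a level-wise search. The sketch below is one possible reading of the join and prune steps described earlier, not the pseudocode from the original slides; the names apriori, candidate_gen, and min_supt are illustrative assumptions.

```python
from itertools import combinations

# Transaction database from the Apriori example above.
transactions = [
    {"apple", "banana", "bread"},
    {"bread", "apple", "peanut_butter", "jelly"},
    {"bread", "jelly", "peanut_butter"},
    {"salmon", "capers", "cheese", "orange_juice"},
    {"salmon", "oil", "parsley"},
    {"milk", "peanut_butter", "jelly"},
    {"eggs", "bread", "oil"},
]


def candidate_gen(prev_frequent, k):
    """Build size-k candidates from the frequent itemsets of size k-1."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            # Join step: two (k-1)-itemsets that differ in exactly one item.
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in prev_frequent
                   for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(database, min_supt):
    """Return {itemset: support} for all itemsets with support >= min_supt."""
    # First pass: count single items.
    counts = {}
    for t in database:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_supt}
    result = dict(frequent)
    k = 2
    # Later passes: stop once a level yields no frequent itemsets.
    while frequent:
        candidates = candidate_gen(set(frequent), k)
        counts = {c: sum(1 for t in database if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_supt}
        result.update(frequent)
        k += 1
    return result


frequent_itemsets = apriori(transactions, min_supt=2)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this database with a minimum support of 2, the sketch returns the six frequent single items, the four frequent pairs, and {bread, peanut_butter, jelly} with a support of 2. The pairwise join is quadratic in the number of frequent itemsets per level, which is acceptable for a teaching example but would usually be replaced by a prefix-based join on larger data.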
Simple Extension to Apriori

Example: Market Basket Analysis, Association Rules
Consider the following transaction database:
• {apple, banana, bread}
• {bread, apple, peanut_butter, jelly}
• {bread, jelly, peanut_butter}
• {salmon, capers, cheese, orange_juice}
• {salmon, oil, parsley}
• {milk, peanut_butter, jelly}
• {eggs, bread, oil}
Find all itemsets with a minimum support of 2
Itemsets (support in parentheses):
• {apple} (2)
• {bread} (4)
• {peanut_butter} (3)
• {jelly} (3)
• {salmon} (2)
• {oil} (2)
• {apple, bread} (2)
• {bread, peanut_butter} (2)
• {bread, jelly} (2)
• {peanut_butter, jelly} (3)
• {bread, peanut_butter, jelly} (2)
Rules from the itemsets of size 2 (with confidence):
• apple → bread 1.0
• bread → apple 0.5
• bread → peanut_butter 0.5
• peanut_butter → bread 0.67
• bread → jelly 0.5
• jelly → bread 0.67
• peanut_butter → jelly 1.0
• jelly → peanut_butter 1.0

Class Association Rules (CAR)
• A certain item or set of items is a “target class” and appears only in the consequent
• The set of class labels is disjoint from the set of items
• Key idea to mine:
    • Find itemsets that meet minimum support
    • Compute the confidence of each itemset as an antecedent from the fraction of transactions containing the itemset that also carry the target class
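The confidence values listed for the market basket rules can be reproduced by splitting each frequent itemset into an antecedent and a consequent and dividing the two supports. This is a minimal sketch under the assumption that confidence is sup(X ∪ Y) / sup(X); the function name rules_from_itemset and the min_conf threshold are hypothetical and not taken from the slides.

```python
from itertools import combinations

# Transaction database from the association rule example above.
transactions = [
    {"apple", "banana", "bread"},
    {"bread", "apple", "peanut_butter", "jelly"},
    {"bread", "jelly", "peanut_butter"},
    {"salmon", "capers", "cheese", "orange_juice"},
    {"salmon", "oil", "parsley"},
    {"milk", "peanut_butter", "jelly"},
    {"eggs", "bread", "oil"},
]


def support(itemset, database):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in database if itemset <= t)


def rules_from_itemset(itemset, database, min_conf):
    """List (antecedent, consequent, confidence) for one frequent itemset.

    Assumes `itemset` occurs in at least one transaction, so every
    antecedent has non-zero support.
    """
    rules = []
    whole = support(itemset, database)
    for size in range(1, len(itemset)):
        for chosen in combinations(sorted(itemset), size):
            antecedent = frozenset(chosen)
            confidence = whole / support(antecedent, database)
            if confidence >= min_conf:
                rules.append((antecedent, itemset - antecedent, confidence))
    return rules


for pair in [{"apple", "bread"}, {"bread", "peanut_butter"},
             {"bread", "jelly"}, {"peanut_butter", "jelly"}]:
    for x, y, conf in rules_from_itemset(frozenset(pair), transactions, 0.0):
        print(sorted(x), "->", sorted(y), round(conf, 2))
```

Raising min_conf filters the output; for instance, a threshold of 0.6 keeps apple → bread (1.0) and drops bread → apple (0.5).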
