Association Rules by Carissa Wang

Association Rules
Carissa Wang
February 23, 2010
What is an Association Rule
In data mining, association rule mining is a method for
discovering relations between different
sets of items in a large database.
Database
A large collection of transactions
Example - Market basket database
Definition
X => Y
X = {x1, x2, …, xn}
Y = {y1, y2, …, ym}
xi and yj are distinct items for all i and all j (X and Y share no items)
X is the left-hand-side (LHS)
Y is the right-hand-side (RHS)
Example
Transaction ID   Items Bought
1                Milk, bread, cookies, juice
2                Milk, juice
3                Milk, eggs
4                Bread, cookies, coffee
Measuring the rule
Support
How frequently an item set occurs in the
database
Item set – LHS ∪ RHS
Confidence
Probability that RHS occurs given that LHS occurs:
support(LHS ∪ RHS) / support(LHS)
Support
Rules
Milk => juice
Bread => juice
Using the four transactions from the Example:
{milk, juice}: 2 / 4 = 0.50
{bread, juice}: 1 / 4 = 0.25
Confidence
Rules
Milk => juice
Bread => juice
Using the four transactions from the Example:
Milk => juice: support{milk, juice} / support{milk} = 0.50 / 0.75 = 0.67
Bread => juice: support{bread, juice} / support{bread} = 0.25 / 0.50 = 0.50
What these numbers mean
Support
High – many transactions contain both LHS and RHS, so there is plenty of evidence for LHS => RHS
Low – not enough evidence of LHS => RHS
Confidence
High – when LHS occurs, RHS usually occurs as well
Low – RHS does not occur consistently when LHS occurs
Other measures of association rules
Lift
Conviction
All-confidence
Collective strength
Leverage
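The slides only name these measures. As a rough illustration, three of them can be computed for Milk => juice using their commonly cited definitions, which are assumptions here rather than something the slides state: lift = confidence / support(RHS), leverage = support(LHS ∪ RHS) - support(LHS) × support(RHS), and conviction = (1 - support(RHS)) / (1 - confidence).

```python
# The four market-basket transactions from the Example.
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def measures(lhs, rhs):
    """Lift, leverage and conviction of lhs => rhs (assumed standard definitions)."""
    lhs, rhs = set(lhs), set(rhs)
    s_both, s_lhs, s_rhs = support(lhs | rhs), support(lhs), support(rhs)
    conf = s_both / s_lhs
    lift = conf / s_rhs
    leverage = s_both - s_lhs * s_rhs
    # Conviction is unbounded when the rule never fails (confidence of 1).
    conviction = float("inf") if conf == 1 else (1 - s_rhs) / (1 - conf)
    return lift, leverage, conviction

lift, leverage, conviction = measures({"milk"}, {"juice"})
print(round(lift, 3), round(leverage, 3), round(conviction, 3))  # 1.333 0.125 1.5
```

A lift above 1 suggests that milk and juice appear together more often than they would if they were independent.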
Algorithms to generate association rules
Apriori Algorithm
Eclat Algorithm
Frequent Pattern Growth Algorithm
One Attribute Rule
Zero Attribute Rule
Apriori Algorithm
Works on databases with a large number of transactions
Breadth-first search
Two properties
Downward closure
Antimonotonicity
Apriori Property
Downward Closure
Every subset of a large (frequent) item set is also large
Antimonotonicity
Every superset of a small (infrequent) item set is also small
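Both properties can be observed on the market-basket Example. The sketch below assumes a minimum support of 0.5, a threshold picked only for this illustration, and uses hypothetical helper names.

```python
from itertools import combinations

# The four market-basket transactions from the Example.
transactions = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]
MIN_SUPPORT = 0.5  # assumed threshold, not taken from the slides

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def proper_subsets(itemset):
    """Yield every non-empty proper subset of itemset."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield set(combo)

# Downward closure: {milk, juice} is large, so each of its subsets is large too.
large = {"milk", "juice"}
print(all(support(s) >= MIN_SUPPORT for s in proper_subsets(large)))  # True

# Antimonotonicity: {eggs} is small, so the superset {milk, eggs} is small too.
print(support({"eggs"}) < MIN_SUPPORT and support({"milk", "eggs"}) < MIN_SUPPORT)  # True
```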
How the Apriori algorithm works
Find the single items that meet the minimum frequency in
the given transactions
Extend the frequent subsets by one item and keep
the subsets that meet the minimum
frequency
Repeat the last step until no frequent superset can be found
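The three steps above can be sketched in Python as follows. This is an illustrative, unoptimized version rather than a reference implementation: the function name `apriori`, the use of an absolute `min_count`, and the choice to run it on the market-basket Example with a minimum count of 2 are all assumptions made here.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for item sets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def keep_frequent(candidates):
        # Count each candidate and keep those that meet the minimum frequency.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Step 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Step 2: extend by one item by joining frequent (k-1)-item sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Drop candidates that have an infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1  # Step 3: repeat until no frequent superset remains.
    return frequent

baskets = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]
for itemset, count in sorted(apriori(baskets, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step relies on the Apriori property: a candidate with any infrequent subset cannot be frequent, so it is discarded before its support is counted.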
How the Apriori algorithm works: example
Min Frequency = 3

Item   Support
1      3
2      6
3      4
4      5

Item      Support
{1,2}     3
{1,3}     2
{1,4}     3
{2,3}     4
{2,4}     5
{3,4}     3

Item      Support
{1,2,4}   3
{2,3,4}   3
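The step from 2-item sets to 3-item candidates in this example can be traced in Python. The sketch below takes the pair supports from the table as given (the underlying transactions are not shown on the slide, so they are not reconstructed here) and only generates the candidates; the variable names are illustrative.

```python
from itertools import combinations

# Pair supports copied from the example table.
pair_support = {
    frozenset({1, 2}): 3,
    frozenset({1, 3}): 2,
    frozenset({1, 4}): 3,
    frozenset({2, 3}): 4,
    frozenset({2, 4}): 5,
    frozenset({3, 4}): 3,
}
min_frequency = 3

# Keep the pairs that meet the minimum frequency; {1,3} is dropped here.
frequent_pairs = {p for p, n in pair_support.items() if n >= min_frequency}

# Join frequent pairs into 3-item sets, then discard any set that
# contains an infrequent pair.
candidates = {a | b for a in frequent_pairs for b in frequent_pairs if len(a | b) == 3}
candidates = {
    c for c in candidates
    if all(frozenset(s) in frequent_pairs for s in combinations(c, 2))
}

print(sorted(sorted(c) for c in candidates))  # [[1, 2, 4], [2, 3, 4]]
```

Because {1,3} is below the minimum frequency, {1,2,3} and {1,3,4} are never counted, which leaves only the two 3-item sets listed in the last table.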
Applications
Web usage mining
Intrusion detection
Bioinformatics
References
Apriori algorithm, Wikipedia, http://en.wikipedia.org/wiki/Apriori_algorithm
Fundamentals of Database Systems, 5th ed., Elmasri and Navathe
Association rule learning, Wikipedia, http://en.wikipedia.org/wiki/Association_rules