Unsupervised Learning: Clustering and Association Rule Discovery

Unsupervised Learning

Clustering
- Unsupervised classification, that is, classification without the class attribute
- Want to discover the classes

Association Rule Discovery
- Discover correlations among attributes
The Clustering Process
- Pattern representation
- Definition of a pattern proximity measure
- Clustering
- Data abstraction
- Cluster validation
Pattern Representation
- Number of classes
- Number of available patterns
- Feature selection
  - Circles, ellipses, squares, etc.
  - Can we use wrappers and filters?
- Feature extraction
  - Produce new features
  - E.g., principal component analysis (PCA)
Pattern Proximity
- Want clusters of instances that are similar to each other but dissimilar to instances in other clusters
- Need a similarity (distance) measure
- Continuous case:
  - Euclidean distance (favors compact, isolated clusters)
  - The squared Mahalanobis distance

      d_M(x_i, x_j) = (x_i - x_j)^T Σ^{-1} (x_i - x_j)

    alleviates problems caused by correlated attributes (Σ is the sample covariance matrix)
  - Many more measures exist
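The sketch below (not part of the slides) computes the two measures just mentioned with NumPy; the correlated 2-D data is invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[1.0, 0.8], [0.0, 0.6]])  # correlated features

xi, xj = X[0], X[1]

# Euclidean distance: treats all attributes as independent and equally scaled.
d_euclid = np.linalg.norm(xi - xj)

# Squared Mahalanobis distance: rescales by the sample covariance, so
# correlated or differently scaled attributes no longer dominate.
cov = np.cov(X, rowvar=False)
diff = xi - xj
d_mahal_sq = diff @ np.linalg.inv(cov) @ diff

print(f"Euclidean: {d_euclid:.3f}, squared Mahalanobis: {d_mahal_sq:.3f}")
```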
Pattern Proximity
- Nominal attributes: use the simple matching measure

    d(x_i, x_j) = x / n

  where
    n = number of attributes
    x = number of attributes on which the two instances have the same value
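A minimal sketch (not from the slides) of the matching measure d(x_i, x_j) = x / n, applied to two invented weather-style instances:

```python
def nominal_similarity(xi, xj):
    """Fraction of attributes on which the two instances agree (x / n)."""
    assert len(xi) == len(xj)
    matches = sum(a == b for a, b in zip(xi, xj))
    return matches / len(xi)

a = ("sunny", "hot", "high", "false")
b = ("sunny", "mild", "high", "true")
print(nominal_similarity(a, b))  # 2 of 4 attributes match -> 0.5
```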
Clustering Techniques
- Hierarchical
  - Single Link
  - Complete Link
  - CobWeb
- Partitional
  - Square Error
    - K-means
  - Mixture Maximization
    - Expectation Maximization
Technique Characteristics
- Agglomerative vs. Divisive
  - Agglomerative: each instance starts as its own cluster and the algorithm merges clusters
  - Divisive: begins with all instances in one cluster and divides it up
- Hard vs. Fuzzy
  - Hard clustering assigns each instance to exactly one cluster, whereas fuzzy clustering assigns each instance a degree of membership in every cluster
More Characteristics
- Monothetic vs. Polythetic
  - Polythetic: all attributes are used simultaneously, e.g., to calculate distance (most algorithms)
  - Monothetic: attributes are considered one at a time
- Incremental vs. Non-Incremental
  - With large data sets it may be necessary to consider only part of the data at a time (data mining)
  - Incremental algorithms work instance by instance
Hierarchical Clustering

[Figure: a dendrogram over instances A-G; the vertical axis shows the similarity level at which clusters are merged, from single instances at the bottom up to one cluster at the top.]
Hierarchical Algorithms
- Single-link
  - Distance between two clusters is the minimum of the distances between all pairs of instances, one from each cluster
  - More versatile
  - Produces (sometimes too) elongated clusters
- Complete-link
  - Distance between two clusters is the maximum of the distances between all pairs of instances, one from each cluster
  - Produces tightly bound, compact clusters
  - Often more useful in practice
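A minimal sketch (not from the slides) contrasting the two linkage criteria with SciPy's hierarchical clustering; the two blobs of points are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=(0, 0), scale=0.5, size=(30, 2)),
               rng.normal(loc=(4, 0), scale=0.5, size=(30, 2))])

for method in ("single", "complete"):
    Z = linkage(X, method=method)                     # build the dendrogram bottom-up
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut it into 2 clusters
    print(f"{method}-link cluster sizes: {np.bincount(labels)[1:]}")
```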
Example: Clusters Found
Single-Link vs. Complete-Link

[Figure: two point clouds (points labeled 1 and 2) with scattered noise points (*) between them. Single-link chains the clouds together through the noise points into elongated clusters, while complete-link recovers two compact clusters.]
Partitional Clustering
- Outputs a single partition of the data into clusters
- Good for large data sets
- Determining the number of clusters is a major challenge
K-Means
- Predetermined number of clusters
- Start with seed clusters of one element each
[Figure: the initial seeds]

Assign Instances to Clusters
[Figure: each instance is assigned to the nearest seed/centroid]

Find New Centroids
[Figure: the centroids are recomputed from the assigned instances]

New Clusters
[Figure: the resulting clusters after reassignment; the process repeats]
Discussion: k-means
- Applicable to fairly large data sets
- Sensitive to the initial centers
  - Use other heuristics to find good initial centers
- Converges to a local optimum
- Specifying the number of centers is very subjective
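A minimal from-scratch sketch (not the slides' implementation) of the k-means loop just illustrated: seed the centroids, assign instances, recompute centroids, and repeat until the assignments stop changing. The data is invented.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # seed clusters
    labels = None
    for _ in range(n_iter):
        # Assignment step: each instance goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged (to a local optimum)
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned instances.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal((0, 0), 0.4, size=(40, 2)),
               rng.normal((3, 3), 0.4, size=(40, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)  # the result depends on the initial seeds
```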
Clustering in Weka
- Clustering algorithms in Weka:
  - K-Means
  - Expectation Maximization (EM)
  - Cobweb (hierarchical, incremental, and agglomerative)
CobWeb
- Algorithm (main) characteristics:
  - Hierarchical and incremental
  - Uses category utility: the improvement in the attribute-value probability estimates that is due to the instance-to-cluster assignment

For the k clusters C_1, ..., C_k:

  CU(C_1, C_2, ..., C_k) = (1/k) Σ_l Pr[C_l] Σ_i Σ_j ( Pr[a_i = v_ij | C_l]^2 − Pr[a_i = v_ij]^2 )

where the inner sum over j runs over all possible values v_ij of attribute a_i.

Why divide by k?
Category Utility
- If each instance is in its own cluster:

    Pr[a_i = v_ij | C_l] = 1 if v_ij is the instance's actual value, 0 otherwise

- The category utility function then becomes

    CU(C_1, C_2, ..., C_k) = ( n − Σ_i Σ_j Pr[a_i = v_ij]^2 ) / k

  where n is the number of attributes.
- Without the division by k it would always be best for each instance to have its own cluster: overfitting!
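A minimal sketch (not Cobweb itself) that evaluates the nominal-attribute category utility directly from its definition above; the two toy clusters of (outlook, windy) instances are invented.

```python
from collections import Counter

def category_utility(clusters):
    instances = [x for c in clusters for x in c]
    n_attrs = len(instances[0])

    def sum_sq_probs(rows):
        """Sum over attributes i and values v_ij of Pr[a_i = v_ij]^2 within rows."""
        total = 0.0
        for i in range(n_attrs):
            counts = Counter(row[i] for row in rows)
            total += sum((c / len(rows)) ** 2 for c in counts.values())
        return total

    baseline = sum_sq_probs(instances)          # Pr[a_i = v_ij]^2 over all the data
    cu = 0.0
    for c in clusters:
        p_cluster = len(c) / len(instances)     # Pr[C_l]
        cu += p_cluster * (sum_sq_probs(c) - baseline)
    return cu / len(clusters)                   # divide by k to discourage overfitting

c1 = [("sunny", "false"), ("sunny", "true")]
c2 = [("rainy", "false"), ("rainy", "false")]
print(category_utility([c1, c2]))               # higher CU = more useful clustering
```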
The Weather Problem

Outlook   Temp.  Humidity  Windy  Play
Sunny     Hot    High      FALSE  No
Sunny     Hot    High      TRUE   No
Overcast  Hot    High      FALSE  Yes
Rainy     Mild   High      FALSE  Yes
Rainy     Cool   Normal    FALSE  Yes
Rainy     Cool   Normal    TRUE   No
Overcast  Cool   Normal    TRUE   Yes
Sunny     Mild   High      FALSE  No
Sunny     Cool   Normal    FALSE  Yes
Rainy     Mild   Normal    FALSE  Yes
Sunny     Mild   Normal    TRUE   Yes
Overcast  Mild   High      TRUE   Yes
Overcast  Hot    Normal    FALSE  Yes
Rainy     Mild   High      TRUE   No
Weather Data (without Play)
- Label the instances a, b, ..., n
- Start by putting the first instance in its own cluster: [a]
- Add another instance in its own cluster: [a] [b]
Adding the Third Instance
Evaluate the category utility of adding the instance to one of the two existing clusters versus putting it in its own cluster.

[Figure: the three candidate partitions (c added to a's cluster, c added to b's cluster, or c in its own cluster); the own-cluster option has the highest utility and is kept.]
Adding Instance f
First instance not to get its own cluster.

[Figure: clusters a, b, c, and d, plus one node containing both e and f.]

Look at the instances:
  E) Rainy  Cool  Normal  FALSE
  F) Rainy  Cool  Normal  TRUE
Quite similar!
Add Instance g
Look at the instances:
  E) Rainy     Cool  Normal  FALSE
  F) Rainy     Cool  Normal  TRUE
  G) Overcast  Cool  Normal  TRUE

[Figure: g joins the node containing e and f; a, b, c, and d remain as before.]
Add Instance h
Look at the instances:
  A) Sunny  Hot   High  FALSE
  D) Rainy  Mild  High  FALSE
  H) Sunny  Mild  High  FALSE

Rearrange: the best matching node and the runner-up (a and d) are merged into a single cluster before h is added. (Splitting is also possible.)

[Figure: the tree now contains a node holding a, d, and h, alongside b, c, and the node containing e, f, and g.]
Final Hierarchy

[Figure: the final CobWeb hierarchy over all 14 instances a-n.]

What next?
Dendrogram → Clusters

[Figure: cutting the hierarchy at a chosen level turns the tree into clusters; one of the resulting clusters contains a, b, c, d, h, k, and l.]

What do a, b, c, d, h, k, and l have in common?
Numerical Attributes
- Assume each numeric attribute follows a normal distribution within a cluster; the category utility becomes

    CU(C_1, C_2, ..., C_k) = (1/k) Σ_l Pr[C_l] (1 / (2√π)) Σ_i ( 1/σ_il − 1/σ_i )

  where σ_il is the standard deviation of attribute a_i within cluster C_l and σ_i its standard deviation over all the data.
- Problem: a cluster with zero variance makes the 1/σ_il term blow up!
- The acuity parameter imposes a minimum variance
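A rough sketch (not Weka's implementation) of the numeric category-utility term for a single attribute, with the acuity applied here as a floor on the standard deviation so that single-instance clusters do not cause a division by zero; the data and the acuity value are invented.

```python
import math
import statistics

def numeric_cu(clusters, acuity=1.0):
    data = [x for c in clusters for x in c]            # one numeric attribute
    sigma_all = max(statistics.pstdev(data), acuity)
    total = 0.0
    for c in clusters:
        sigma_c = max(statistics.pstdev(c), acuity)    # acuity imposes a minimum spread
        total += (len(c) / len(data)) * (1 / sigma_c - 1 / sigma_all)
    return total / (2 * math.sqrt(math.pi) * len(clusters))

print(numeric_cu([[64.0, 65.0, 68.0], [80.0, 83.0, 85.0]], acuity=1.0))
```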
Hierarchy Size (Scalability)
- CobWeb may create a very large hierarchy
- The cutoff parameter is used to suppress growth:
  if CU(C_1, C_2, ..., C_k) < Cutoff, cut the node off
Discussion
- Advantages
  - Incremental → scales to a large number of instances
  - Cutoff → limits the size of the hierarchy
  - Handles mixed attributes
- Disadvantages
  - Incremental → sensitive to the order of the instances?
  - Arbitrary choice of parameters:
    - the division by k,
    - the artificial minimum value (acuity) for the variance of numeric attributes,
    - the ad hoc cutoff value
Probabilistic Perspective
- Most likely set of clusters given the data
- Probability of each instance belonging to a cluster
- Assumption: instances are drawn from one of several distributions
- Goal: estimate the parameters of these distributions
- Usually: assume the distributions are normal
Mixture Resolution
- Mixture: a set of k probability distributions
- The distributions represent the k clusters
- Each distribution gives the probability that an instance takes certain attribute values, given that it is in that cluster
- Question: what is the probability that an instance belongs to a cluster (i.e., to a distribution)?
One Numeric Attribute

[Figure: a two-cluster mixture model over one attribute, showing the overlapping normal densities of Cluster A and Cluster B.]

Given some data, how can you determine the parameters:
  μ_A = mean for Cluster A
  σ_A = standard deviation for Cluster A
  μ_B = mean for Cluster B
  σ_B = standard deviation for Cluster B
  p_A = probability of being in Cluster A
Problems
- If we knew which cluster each instance came from, we could estimate these values
- If we knew the parameters, we could calculate the probability that an instance belongs to each cluster:

    Pr[A | x] = Pr[x | A] Pr[A] / Pr[x] = f(x; μ_A, σ_A) p_A / Pr[x]

  where f is the normal density

    f(x; μ, σ) = (1 / (√(2π) σ)) exp( −(x − μ)^2 / (2σ^2) )
EM Algorithm
- Expectation Maximization (EM):
  - Start with initial values for the parameters
  - Calculate the cluster probabilities for each instance
  - Re-estimate the values of the parameters
  - Repeat
- General-purpose maximum likelihood estimation algorithm for missing data
  - Can also be used to train Bayesian networks (later)
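A minimal from-scratch sketch (not Weka's EM) for the one-attribute, two-cluster case from the previous slides: the E-step computes each instance's cluster probabilities with the Bayes-rule formula above, and the M-step re-estimates μ, σ, and p_A. The data and starting values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(5.0, 1.0, 300), rng.normal(10.0, 1.5, 200)])

def normal_pdf(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Initial parameter guesses
mu_a, sigma_a, mu_b, sigma_b, p_a = 4.0, 1.0, 11.0, 1.0, 0.5

for _ in range(50):
    # E-step: Pr[A | x] for every instance, via Bayes' rule
    w_a = p_a * normal_pdf(x, mu_a, sigma_a)
    w_b = (1 - p_a) * normal_pdf(x, mu_b, sigma_b)
    resp = w_a / (w_a + w_b)

    # M-step: weighted re-estimates of the parameters
    mu_a, mu_b = np.average(x, weights=resp), np.average(x, weights=1 - resp)
    sigma_a = np.sqrt(np.average((x - mu_a) ** 2, weights=resp))
    sigma_b = np.sqrt(np.average((x - mu_b) ** 2, weights=1 - resp))
    p_a = resp.mean()

print(f"mu_A={mu_a:.2f} sigma_A={sigma_a:.2f} mu_B={mu_b:.2f} sigma_B={sigma_b:.2f} p_A={p_a:.2f}")
```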
Beyond Normal Models
- More than one class: straightforward
- More than one numeric attribute:
  - Easy if we assume the attributes are independent
  - If the attributes are dependent, treat them jointly using the bivariate normal
- Nominal attributes:
  - No more normal distribution!
EM using Weka
- Options:
  - numClusters: set the number of clusters (default = -1 selects it automatically)
  - maxIterations: maximum number of iterations
  - seed: random number seed
  - minStdDev: minimum allowable standard deviation
Other Clustering
- Artificial Neural Networks (ANN)
- Random search
  - Genetic Algorithms (GA)
    - GA used to find initial centroids for k-means
  - Simulated Annealing (SA)
  - Tabu Search (TS)
- Support Vector Machines (SVM)
- Will discuss GA and SVM later
Applications
- Image segmentation
- Object and character recognition
- Data mining:
  - Stand-alone, to gain insight into the data
  - Preprocessing step before classification, which then operates on the detected clusters
DM Clustering Challenges
- Data mining deals with large databases
- Scalability with respect to the number of instances
  - Use a random sample (possible bias)
- Dealing with mixed data
  - Many algorithms only make sense for numeric data
- High-dimensional problems
  - Can the algorithm handle many attributes?
  - How do we interpret a cluster in high dimensions?
Other (General) Challenges
- Shape of clusters
- Minimum domain knowledge (e.g., knowing the number of clusters)
- Noisy data
- Insensitivity to instance order
- Interpretability and usability
Clustering for DM
- Main issue is scalability to large databases
- Many algorithms have been developed for scalable clustering:
  - Partitional methods: CLARA, CLARANS
  - Hierarchical methods: AGNES, DIANA, BIRCH, CURE, Chameleon
Practical Partitional Clustering Algorithms
- Classic k-Means (1967)
- Work from 1990 and later: k-Medoids
  - Uses the medoid instead of the centroid
  - Less sensitive to outliers and noise
  - Computations are more costly
  - PAM (Partitioning Around Medoids) algorithm
Large-Scale Problems
- CLARA: Clustering LARge Applications
  - Select several random samples of instances
  - Apply PAM to each sample
  - Return the best clusters
- CLARANS:
  - Similar to CLARA
  - Draws samples randomly while searching
  - More effective than PAM and CLARA
Hierarchical Methods
- BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
  - Clustering feature: a triplet summarizing information about a subcluster
  - Clustering feature (CF) tree: a height-balanced tree that stores the clustering features
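The slide does not spell out what the triplet contains; the sketch below assumes the standard BIRCH definition CF = (N, LS, SS), i.e., the subcluster's size, linear sum, and squared sum, which is what makes incremental merging a matter of simple addition.

```python
import numpy as np

def make_cf(points):
    pts = np.asarray(points, dtype=float)
    return len(pts), pts.sum(axis=0), (pts ** 2).sum()   # (N, LS, SS)

def merge_cf(cf1, cf2):
    # Two subclusters are summarized by adding their clustering features.
    return cf1[0] + cf2[0], cf1[1] + cf2[1], cf1[2] + cf2[2]

def centroid(cf):
    n, ls, _ = cf
    return ls / n

cf_a = make_cf([(1.0, 2.0), (2.0, 2.0)])
cf_b = make_cf([(1.5, 1.0)])
print(centroid(merge_cf(cf_a, cf_b)))  # centroid of the merged subcluster
```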
BIRCH Mechanism
- Phase I:
  - Scan the database to build an initial CF tree
  - A multilevel compression of the data
- Phase II:
  - Apply a selected clustering algorithm to the leaf nodes of the CF tree
- Has been found to be very scalable
Conclusion
- The use of clustering in data mining practice seems to be somewhat limited due to scalability problems
- A more commonly used form of unsupervised learning: Association Rule Discovery
Association Rule Discovery
- Aims to discover interesting correlations or other relationships in large databases
- Finds rules of the form
    if A and B then C and D
- Which attributes will be included in the relation is not known in advance
Mining Association Rules
- Similar to classification rules
- Use the same procedure?
  - Every attribute is treated the same, so the procedure would have to be applied to every possible expression on the right-hand side
  - Huge number of rules → infeasible
- We only want rules with high coverage/support
Market Basket Analysis
- Basket data: items purchased on a per-transaction basis (not cumulative, etc.)
  - How do you boost the sales of a given product?
  - What other products does discontinuing a product impact?
  - Which products should be shelved together?
- Terminology (market basket analysis):
  - Item: an attribute/value pair
  - Item set: a combination of items with minimum coverage
How Many k-Item Sets Have Minimum Coverage?

(The weather data again: the same 14 instances of Outlook, Temp., Humidity, Windy, and Play shown in "The Weather Problem" above.)
Item Sets

1-item sets (coverage):
  Outlook=sunny (5)
  Outlook=overcast (4)
  Outlook=rainy (5)
  Temp=cool (4)
  Temp=mild (6)
  ...

2-item sets (coverage):
  Outlook=sunny, Temp=mild (2)
  Outlook=sunny, Temp=hot (2)
  Outlook=sunny, Humidity=normal (2)
  Outlook=sunny, Windy=true (2)
  ...

3-item sets (coverage):
  Outlook=sunny, Temp=hot, Humidity=high (2)
  Outlook=sunny, Temp=hot, Play=no (2)
  Outlook=sunny, Humidity=normal, Play=yes (2)
  Outlook=sunny, Humidity=high, Windy=false (2)
  Outlook=sunny, Humidity=high, Play=no (3)
  ...

4-item sets (coverage):
  Outlook=sunny, Temp=hot, Humidity=high, Play=no (2)
  Outlook=sunny, Humidity=high, Windy=false, Play=no (2)
  Outlook=overcast, Temp=hot, Windy=false, Play=yes (2)
  Outlook=rainy, Temp=mild, Windy=false, Play=yes (2)
  Outlook=rainy, Humidity=normal, Windy=false, Play=yes (2)
  ...

(Only part of each column is shown.)
From Sets to Rules

3-item set with coverage 4:
  Humidity = normal, Windy = false, Play = yes

Association rules:                                                    Accuracy
  If humidity = normal and windy = false then play = yes                4/4
  If humidity = normal and play = yes then windy = false                4/6
  If windy = false and play = yes then humidity = normal                4/6
  If humidity = normal then windy = false and play = yes                4/7
  If windy = false then humidity = normal and play = yes                4/8
  If play = yes then humidity = normal and windy = false                4/9
  If - then humidity = normal and windy = false and play = yes          4/12
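A minimal sketch (not from the slides) of how the coverage and accuracy numbers above are obtained; the data is the 14-instance weather table from earlier, restricted to the three attributes involved, and the helper `rule_stats` is invented for illustration.

```python
weather = [  # (humidity, windy, play)
    ("high", "false", "no"), ("high", "true", "no"), ("high", "false", "yes"),
    ("high", "false", "yes"), ("normal", "false", "yes"), ("normal", "true", "no"),
    ("normal", "true", "yes"), ("high", "false", "no"), ("normal", "false", "yes"),
    ("normal", "false", "yes"), ("normal", "true", "yes"), ("high", "true", "yes"),
    ("normal", "false", "yes"), ("high", "true", "no"),
]

def rule_stats(data, antecedent, consequent):
    """Coverage = #instances matching antecedent and consequent; accuracy = coverage / #antecedent matches."""
    ante = [row for row in data if all(row[i] == v for i, v in antecedent)]
    both = [row for row in ante if all(row[i] == v for i, v in consequent)]
    return len(both), len(both) / len(ante)

# "If humidity = normal and windy = false then play = yes"
cov, acc = rule_stats(weather, antecedent=[(0, "normal"), (1, "false")], consequent=[(2, "yes")])
print(cov, acc)  # 4, 1.0  (coverage 4, accuracy 4/4)
```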
From Sets to Rules (continued)

4-item set with coverage 2:
  Temperature = cool, Humidity = normal, Windy = false, Play = yes

Association rules:                                                                  Accuracy
  If temperature = cool and windy = false then humidity = normal and play = yes       2/2
  If temperature = cool and humidity = normal and windy = false then play = yes       2/2
  If temperature = cool and windy = false and play = yes then humidity = normal       2/2
Overall
- Minimum coverage (2):
  - 12 1-item sets, 47 2-item sets, 39 3-item sets, 6 4-item sets
- Minimum accuracy (100%):
  - 58 association rules
- "Best" rules (coverage = 4, accuracy = 100%):
  - If humidity = normal and windy = false then play = yes
  - If temperature = cool then humidity = normal
  - If outlook = overcast then play = yes
Association Rule Mining
STEP 1: Find all item sets that meet minimum coverage
STEP 2: Find all rules that meet minimum accuracy
STEP 3: Prune
Generating Item Sets
- How do we generate minimum-coverage item sets in a scalable manner?
  - The total number of item sets is huge
  - It grows exponentially in the number of attributes
- Need an efficient algorithm:
  - Start by generating minimum-coverage 1-item sets
  - Use those to generate 2-item sets, and so on
- Why do we only need to consider minimum-coverage 1-item sets?
Justification
Item Set 1: {Humidity = high}
  Coverage(1) = number of times humidity is high
Item Set 2: {Windy = false}
  Coverage(2) = number of times windy is false
Item Set 3: {Humidity = high, Windy = false}
  Coverage(3) = number of times humidity is high AND windy is false

Coverage(3) ≤ Coverage(1) and Coverage(3) ≤ Coverage(2), so if Item Set 1 and Item Set 2 do not both meet minimum coverage, Item Set 3 cannot either.
Generating Item Sets
Start with all 3-item sets that meet minimum coverage:
  (A B C)
  (A B D)
  (A C D)
  (A C E)
Merge them to generate 4-item sets, considering only sets that start with the same two attributes:
  (A B C D)
  (A C D E)
These are the only two candidate 4-item sets that could possibly meet minimum coverage, and they must still be checked against the data.
Algorithm for Generating Item Sets
- Build up from 1-item sets, so that we only consider item sets that are found by merging two minimum-coverage sets
- Only merge sets that have all but one item in common
- Computational efficiency is further improved using hash tables
Generating Rules

If this double-consequent rule meets minimum coverage and accuracy:
  If windy = false and play = no then outlook = sunny and humidity = high

then both of its single-consequent rules also meet minimum coverage and accuracy:
  If windy = false and play = no then outlook = sunny
  If windy = false and play = no then humidity = high
How Many Rules?
- Want to consider every possible subset of attributes as the consequent
- With 4 attributes:
  - Four single-consequent rules
  - Six double-consequent rules
  - Two triple-consequent rules
  - Twelve possible rules for a single 4-item set!
- Exponential explosion in the number of possible rules
Must We Check All?

If A and B then C and D
  Coverage = number of times A, B, C, and D are true
  Accuracy = (number of times A, B, C, and D are true) / (number of times A and B are true)

If A, B, and C then D
  Coverage = number of times A, B, C, and D are true
  Accuracy = (number of times A, B, C, and D are true) / (number of times A, B, and C are true)
Efficiency Improvement
- A double-consequent rule can only be OK if both of its single-consequent rules are OK
- Procedure:
  - Start with single-consequent rules
  - Build up double-consequent (and larger) candidate rules from them and check the candidates for accuracy
- In practice: far fewer rules need to be checked
Apriori Algorithm
- This is a simplified description of the Apriori algorithm
- Developed in the early 90s and is the most commonly used approach
- Newer developments focus on:
  - Generating item sets more efficiently
  - Generating rules from item sets more efficiently
Association Rule Discovery using Weka
- Parameters to be specified for Apriori:
  - upperBoundMinSupport: start with this value of minimum support
  - delta: in each step, decrease the required minimum support by this value
  - lowerBoundMinSupport: final minimum support
  - numRules: how many rules to generate
  - metricType: confidence, lift, leverage, or conviction
  - minMetric: smallest acceptable metric value for a rule
- Handles only nominal attributes
Difficulties
- The Apriori algorithm improves performance by using candidate item sets
- Still some problems:
  - Costly to generate large numbers of candidate item sets
    - To generate a frequent pattern of size 100, more than 2^100 ≈ 10^30 candidates are needed!
  - Requires repeated scans of the database to check the candidates
    - Again, most problematic for long patterns
Solution?
- Can candidate generation be avoided?
- New approach:
  - Create a frequent pattern tree (FP-tree) that stores information on frequent patterns
  - Use the FP-tree for mining frequent patterns
    - partitioning-based, divide-and-conquer
    - (as opposed to bottom-up candidate generation)
Database (minimum support = 3):

TID   Items               Frequent items (ordered)
100   F,A,C,D,G,I,M,P     F,C,A,M,P
200   A,B,C,F,L,M,O       F,C,A,B,M
300   B,F,H,J,O           F,B
400   B,C,K,S,P           C,B,P
500   A,F,C,E,L,P,M,N     F,C,A,M,P

FP-Tree: a header table lists each frequent item (F, C, A, B, M, P) with the head of its node links.

[Figure: the FP-tree built from the ordered transactions. From the root, the main branch is F:4 - C:3 - A:3 - M:2 - P:2, with side branches B:1 - M:1 under A:3 and B:1 directly under F:4; a second branch from the root is C:1 - B:1 - P:1.]
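A minimal sketch (not a full FP-growth implementation) of the first database scan implied by the table above: count item supports, drop infrequent items, and re-order each transaction by descending support, which reproduces the "Frequent items" column. The tie-breaking order F, C, A, B, M, P is taken from the header table on the slide.

```python
from collections import Counter

db = {100: "FACDGIMP", 200: "ABCFLMO", 300: "BFHJO", 400: "BCKSP", 500: "AFCELPMN"}
min_support = 3

# First scan: count supports and keep only the frequent items.
counts = Counter(item for items in db.values() for item in items)
frequent = {i for i, c in counts.items() if c >= min_support}

# Re-order each transaction; each resulting list is inserted as a path into the FP-tree.
rank = {item: i for i, item in enumerate("FCABMP")}
ordered_db = {
    tid: sorted((i for i in items if i in frequent), key=rank.__getitem__)
    for tid, items in db.items()
}
print(ordered_db)  # e.g. 100 -> ['F', 'C', 'A', 'M', 'P']
```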
Computational Effort
- Each FP-tree node has three fields:
  - item name
  - count
  - node link
- There is also a header table with:
  - item name
  - head of node link
- Only two scans of the database are needed:
  - Collect the set of frequent items
  - Construct the FP-tree
Comments
- The FP-tree is a compact data structure
- The FP-tree contains all the information related to mining frequent patterns (given the support)
- The size of the tree is bounded by the occurrences of frequent items
- The height of the tree is bounded by the maximum number of items in a transaction
Mining Patterns
- Mine the complete set of frequent patterns
- For any frequent item A, all possible patterns containing A can be obtained by following A's node links, starting from A's head of node links
Example

[Figure: the FP-tree again, with the node links for item P highlighted.]

Following P's node links gives two paths:
  <F:4, C:3, A:3, M:2, P:2>   (P occurs twice on this path)
  <C:1, B:1, P:1>             (P occurs once on this path)

Frequent pattern: (P:3)
Rule Generation
- Mining the complete set of association rules has some problems:
  - There may be a large number of frequent item sets
  - There may be a huge number of association rules
- One potential solution is to look at closed item sets only
Frequent Closed Item Sets
- An item set X is a closed item set if there is no item set X' such that X ⊂ X' and every transaction containing X also contains X'
- A rule X ⇒ Y is an association rule on a frequent closed item set if
  - both X and X ∪ Y are frequent closed item sets, and
  - there does not exist a frequent closed item set Z such that X ⊂ Z ⊂ X ∪ Y
Example

ID   Items
10   A,C,D,E,F
20   A,B,E
30   C,E,F
40   A,C,D,F
50   C,E,F

Frequent item sets (min. support = 2):
  A (3), E (4), AE (2), ACDF (2), CF (4), CEF (3)   <- all the closed sets
  D (2)                                             <- not closed! Why?
  AC (2), + 12 more
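A minimal brute-force sketch (not the CLOSET algorithm of the next slide) that finds the frequent closed item sets of this small database directly from the definition: an item set is closed if no proper superset has the same support.

```python
from itertools import combinations

tdb = {10: "ACDEF", 20: "ABE", 30: "CEF", 40: "ACDF", 50: "CEF"}
transactions = [frozenset(t) for t in tdb.values()]
min_support = 2

def support(itemset):
    return sum(itemset <= t for t in transactions)

items = sorted({i for t in transactions for i in t})
frequent = {frozenset(c): support(frozenset(c))
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if support(frozenset(c)) >= min_support}

# Closed: no proper superset with exactly the same support.
closed = {s: sup for s, sup in frequent.items()
          if not any(s < t and sup == tsup for t, tsup in frequent.items())}

for s, sup in sorted(closed.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(s)), sup)   # A 3, E 4, AE 2, CF 4, CEF 3, ACDF 2
```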
Mining Frequent Closed Item Sets (CLOSET)

[Figure: a condensed trace of CLOSET on the example TDB. Items are ordered by descending support for building conditional databases (C:4, E:4, F:4, A:3, D:2), and each transaction is rewritten in that order: CEFAD, EA, CEF, CFAD, CEF. Conditional databases are then built per item, starting from the least frequent:
  D-cond DB (D:2)                       -> output: CFAD:2
  A-cond DB (A:3)                       -> output: A:3; its EA-cond DB (EA:2) contains C -> output: EA:2
  F-cond DB (F:4), containing CE:3, C:4 -> output: CF:4, CEF:3
  E-cond DB (E:4), containing C         -> output: E:4 ]
Mining with Taxonomies

Taxonomy:
  Clothes
    Outerwear
      Jackets
      Ski Pants
    Shirts
  Footwear
    Shoes
    Hiking Boots

Generalized association rule: X ⇒ Y, where no item in Y is an ancestor of an item in X.
Why Taxonomy?
- The 'classic' association rule mining restricts the rules to the leaf nodes of the taxonomy
- However:
  - Rules at lower levels may not have minimum support, and thus interesting associations may go undiscovered
  - Taxonomies can be used to prune uninteresting and redundant rules
Example

Transactions:
ID   Items
10   Shirt
20   Jacket, Hiking Boots
30   Ski Pants, Hiking Boots
40   Shoes
50   Shoes
60   Jacket

Item sets (support):
  {Jacket}                    2
  {Outerwear}                 3
  {Clothes}                   4
  {Shoes}                     2
  {Hiking Boots}              2
  {Footwear}                  4
  {Outerwear, Hiking Boots}   2
  {Clothes, Hiking Boots}     2
  {Outerwear, Footwear}       2
  {Clothes, Footwear}         2

Rules (support, confidence):
  Outerwear ⇒ Hiking Boots     2, 2/3
  Outerwear ⇒ Footwear         2, 2/3
  Hiking Boots ⇒ Outerwear     2, 2/2
  Hiking Boots ⇒ Clothes       2, 2/2
Interesting Rules
- There are many ways in which the interestingness of a rule can be evaluated based on its ancestors
- For example:
  - A rule with no ancestors is interesting
  - A rule with ancestor(s) is interesting only if it has enough 'relative support'

Rule ID   Rule                      Support
1         Clothes ⇒ Footwear        10
2         Outerwear ⇒ Footwear      8
3         Jackets ⇒ Footwear        4

Item        Support
Clothes     5
Outerwear   2
Jackets     1

Which rules are interesting?
Discussion
- Association rule mining finds expressions of the form X ⇒ Y from large data sets
- One of the most popular data mining tasks
- Originates in market basket analysis
- Key measures of performance:
  - Support
  - Confidence (or accuracy)
- Are support and confidence enough?
Type of Rules Discovered
- 'Classic' association rule problem:
  - All rules satisfying minimum thresholds of support and confidence
- Focus on a subset of rules, e.g.:
  - Optimized rules
  - Maximal frequent item sets
  - Closed item sets
What makes for an interesting rule?
Algorithm Construction
- Determine the frequent item sets (all or part)
  - By far the most computational time
  - Variations focus on this part
- Generate rules from the frequent item sets
Generating Item Sets

Algorithms classified by how the search space is traversed and how support is determined:

Search space traversed   Support by counting                 Support by intersecting
Bottom-up                Apriori*, AprioriTID, DIC           Partition
                         (Apriori-like algorithms)
Top-down                 FP-Growth*                          Eclat

* Have discussed. No algorithm dominates the others!
Applications
- Market basket analysis
  - The classic marketing application
- Applications to recommender systems
Recommender
- Customized goods and services
- Recommend products
- Collaborative filtering:
  - similarities among users' tastes
  - recommend based on other users
  - many on-line systems
  - simple algorithms
Classification Approach
- View recommendation as a classification problem:
  - A product is either of interest or not
  - Induce a model, e.g., a decision tree
  - Classify a new product as either interesting or not interesting
- What is the difficulty with this approach?
Association Rule Approach
- Product associations:
  - 90% of users who like product A and product B also like product C
  - A and B ⇒ C (90%)
- User associations:
  - 90% of products liked by user A and user B are also liked by user C
- Use a combination of product and user associations
Advantages
- 'Classic' collaborative filtering must identify users with similar tastes
- This approach instead uses the overlap of other users' tastes to match a given user's taste
  - Can be applied to users whose tastes don't correlate strongly with those of other users
  - Can take advantage of information from, say, user A for a recommendation to user B, even if they do not correlate
What's Different Here?
- Is this really a 'classic' association rule problem?
- We want to learn what products are liked by what users
- 'Semi-supervised': there is a target item
  - User (for user associations)
  - Product (for product associations)
Single-Consequent Rules
- Only a single (target) item in the consequent
- Go through all such target items

[Figure: a spectrum from Association Rules (any possible item combination as the consequent), through Associations for a Recommender, to Classification (one single item as the consequent).]