Weka Tutorial 5

advertisement
Technology
Forge
Version 0.1
www.technologyforge.net
Weka Tutorial 5 –
Association
?
© 2009 – Mark Polczynski
Weka Tutorial 5 - Association
1
Association
Got
Beer?
Market Basket Analysis
Got
Diapers?
What things tend to
“go together”
in your market basket?
Weka Tutorial 5 - Association
2
People who bought beer
also usually bought diapers.
Example of
Market Basket
Analysis
People who bought diapers
also usually bought beer.
Weka Tutorial 5 - Association
3
Cash Register Checkout Data:
• Items in shopping cart
• Time of day
• Day of week
• Season of year
• Method of payment
• Location of store
• etc…
Associations?
What can you use
associations for?
Fact-Based Marketing Strategies:
• Floor plans
• Discounts
• Coupons
• etc…
Weka Tutorial 5 - Association
4
If the weather outside is:
Then to play, or not to play, that is the question!
Weka Tutorial 5 - Association
5
TPONTPNom.xls dataset:
Outlook
sunny
sunny
overcast
rainy
rainy
rainy
overcast
sunny
sunny
rainy
sunny
overcast
overcast
rainy
Temp.
hot
hot
hot
mild
cool
cool
cool
mild
cool
mild
mild
mild
hot
mild
Humidity
high
high
high
high
normal
normal
normal
high
normal
normal
normal
high
normal
high
Weka Tutorial 5 - Association
Windy
false
true
false
false
false
true
true
false
false
false
true
true
false
true
Play
no
no
yes
yes
yes
no
yes
no
yes
yes
yes
yes
yes
no
6
Outlook
sunny
sunny
overcast
rainy
rainy
rainy
overcast
sunny
sunny
rainy
sunny
overcast
overcast
rainy
Temp.
hot
hot
hot
mild
cool
cool
cool
mild
cool
mild
mild
mild
hot
mild
Outlook
sunny = 5
overcast = 4
rainy = 5
Humidity
high
high
high
high
normal
normal
normal
high
normal
normal
normal
high
normal
high
Temp.
hot = 4
mild = 6
cool = 4
Windy
false
true
false
false
false
true
true
false
false
false
true
true
false
true
Play
no
no
yes
yes
yes
no
yes
no
yes
yes
yes
yes
yes
no
Humidity
high = 7
normal = 7
Weka Tutorial 5 - Association
Attribute
values
and counts
Windy
true = 6
false = 8
Play
yes = 9
no = 5
7
Classification:
Input attributes
Outlook
sunny
sunny
overcast
rainy
rainy
rainy
overcast
sunny
sunny
rainy
sunny
overcast
overcast
rainy
Temp.
hot
hot
hot
mild
cool
cool
cool
mild
cool
mild
mild
mild
hot
mild
Humidity
high
high
high
high
normal
normal
normal
high
normal
normal
normal
high
normal
high
Class
Windy
false
true
false
false
false
true
true
false
false
false
true
true
false
true
Play
no
no
yes
yes
yes
no
yes
no
yes
yes
yes
yes
yes
no
Goal: Predict whether the person will play, given weather conditions.
Weka Tutorial 5 - Association
8
Outlook
sunny
sunny
overcast
rainy
rainy
rainy
overcast
sunny
sunny
rainy
sunny
overcast
overcast
rainy
Temp.
hot
hot
hot
mild
cool
cool
cool
mild
cool
mild
mild
mild
hot
mild
Humidity
high
high
high
high
normal
normal
normal
high
normal
normal
normal
high
normal
high
Windy
false
true
false
false
false
true
true
false
false
false
true
true
false
true
Play
no
no
yes
yes association, we
For
yes
look for things
no
that
“go together”
yes
no
yes
yes
yes
yes
yes
no
Some things that “go together”:
Outlook = sunny AND Temp. = hot go together 2 times,
Outlook = sunny AND Temp. = cool go together only once.
Weka Tutorial 5 - Association
9
1
Outlook:
sunny
overcast
rainy
Two-item sets of
attributes that can
go together
5
2
Play:
yes
no
Temp:
hot
mild
cool
Windy:
true
false
4
3
Weka Tutorial 5 - Association
node-to-node
links
Humidity:
high
normal
1-2
2-3
3-4
4-5
5-1
1-3
2-4
3-5
4-1
5-2
10
10
Combinations of Outlook
and Temperature that
can go together
1
sunny/hot
sunny/mild
sunny/cool
overcast/hot
overcast/mild
overcast/cool
rainy/hot
rainy/mild
rainy/cool
Outlook:
sunny
overcast
rainy
3x3=9
5
2
Play:
yes
no
Temp:
hot
mild
cool
Windy:
true
false
4
3
Weka Tutorial 5 - Association
Humidity:
high
normal
11
All possible
two-item sets
1
Outlook: 3
Play: 2
Temp: 3
5
2
1-2:
2-3:
3-4:
4-5:
5-1:
1-3:
2-4:
3-5:
4-1:
5-2:
3x3 = 9
3x2 = 6
2x2 = 4
2x2 = 4
2x3 = 6
3x2 = 6
3x2 = 6
2x2 = 4
2x3 = 6
2x3 = 6
57
node-to-node
links
4
Windy: 2
3
Humidity: 2
Weka Tutorial 5 - Association
combinations
for link
12
3
4
3
4
3
4
6
1
3
1
1
3
2
2
4
2
3
3
4
2
0
4
2
2
3
1
Weka Tutorial 5 - Association
6
2
2
2
1
3
2
3
3
8
3
2
3
3
3
2
4
4
3
3
6
2
no
7
2
2
3
1
2
4
Play
yes
7
3
2
2
3
4
0
6
2
1
3
false
0
3
2
2
3
2
3
3
2
4
1
1
2
4
2
2
0
Windy
true
2
1
1
2
2
2
2
4
0
normal
2
2
1
3
2
2
3
2
3
high
5
cool
4
mild
5
Humid.
hot
rainy
5
4
5
4
6
4
7
7
6
8
9
5
overcast
Oulook sunny
overcast
rainy
Temp. hot
mild
cool
Humid. high
normal
Windy yes
no
Play
yes
no
Temp.
sunny
Outlook
Occurrences
of each pair
9
2
4
3
2
4
3
3
6
3
6
5
3
0
2
2
2
1
4
1
3
2
13
6
2
2
2
1
3
2
3
3
8
3
2
3
3
3
2
4
4
no
7
2
2
3
1
2
4
Play
yes
7
3
2
2
3
4
0
false
4
1
1
2
Windy
true
Oulook sunny
5
2
2
overcast
4
2
1
rainy
5
0
3
Temp. hot
4
mild
6
cool
4
Humid. high
7
normal
7
Windy yes
6
no
8
Humidity = normal and Play = yes
Play
yes
9
occurs5more often than
no
normal
mild
hot
rainy
sunny
Temperature = cool and Humidity = normal
occurs more often than
Temperature = cool and
5 Humidity
4
5 =4high6
Humid.
high
Temp.
cool
overcast
Outlook
9
2
4
3
2
4
3
3
6
3
6
5
3
0
2
2
2
1
4
1
3
2
Humidity = normal and Play = no
Weka Tutorial 5 - Association
14
“Support” is the number of instances
7
3
2
2
3
4
0
7
2
2
3
1
2
4
Weka Tutorial 5 - Association
6
2
2
2
1
3
2
3
3
8
3
2
3
3
3
2
4
4
no
4
1
1
2
Play
yes
6
2
1
3
false
4
2
2
0
Windy
true
normal
Oulook sunny
5
overcast
4
rainy
5
Temp. hot
4
mild
6
cool
4
Humid. high
7
normal
7
Temp. = cool
AND Humid.
= normal
Windy yes
6
8 dataset,
occursno
4 times in the
Play
yes
9
5 is 4.
sonothe support
high
5
cool
rainy
4
mild
overcast
5
Humid.
hot
sunny
Outlook
where a particular rule
is correct.Temp.
9
2
4
3
2
4
3
3
6
3
6
5
3
0
2
2
2
1
4
1
3
2
15
Here are the 47 two-item sets
4
1
1
2
7
3
2
2
3
4
0
7
2
2
3
1
2
4
6
2
2
2
1
3
2
3
3
8
3
2
3
3
3
2
4
4
Combinations
Combinationswith
withhighest
highestcoverage:
support:
Humidity = normal AND Play = yes
Windy = no and Play = yes
Weka Tutorial 5 - Association
no
6
2
1
3
Play
yes
normal
4
2
2
0
Windy
false
high
rainy
5
cool
overcast
4
mild
5
4
5
4
6
4
7
7
6
8
9
5
5
Humid.
hot
Oulook sunny
overcast
rainy
Temp. hot
mild
cool
Humid. high
normal
Windy yes
no
Play
yes
no
Temp.
sunny
Outlook
true
that have support => 2.
9
2
4
3
2
4
3
3
6
3
6
5
3
0
2
2
2
1
4
1
3
2
16
“Confidence” of the association rule
4
1
1
2
7
3
2
2
3
4
0
7
2
2
3
1
2
4
6
2
2
2
1
3
2
3
3
8
3
2
3
3
3
2
4
4
Number of occurrences of
Humidity = normal
is 7.
Weka Tutorial 5 - Association
no
6
2
1
3
Play
yes
4
2
2
0
Windy
false
normal
Oulook sunny
5
overcast
4
Combination
rainy
5
Humidity = normal AND Play = yes
Temp. hot
4
occurs 6 times.
mild
6
cool
4
Humid. high
7
normal
7
Windy yes
6
no
8
Play
yes
9
no
5
high
5
cool
rainy
4
mild
overcast
5
hot
sunny
is 6/7, or 86%
Humid.
true
“If Humidity = normal, Outlook
then Play = yes”
Temp.
9
2
4
3
2
4
3
3
6
3
6
5
3
0
2
2
2
1
4
1
3
2
17
Confidence of rule
7
3
2
2
3
4
0
7
2
2
3
1
2
4
6
2
2
2
1
3
2
3
3
8
3
2
3
3
3
2
4
4
Oulook sunny
5
overcast
4
rainy
5
Number of occurrences of
Temp. hot
4
Play = yes
mild
6
is 9.
cool
4
Humid. high
7
normal
7
Windy yes
6
no
8
Combination
Play
yes
9
Humidity = normal AND Play = yes
no
5
occurs 6 times.
Weka Tutorial 5 - Association
no
4
1
1
2
Play
yes
6
2
1
3
false
4
2
2
0
Windy
true
normal
5
high
rainy
4
cool
overcast
5
mild
is 6/9, or 67%
Humid.
hot
sunny
If Play = yes, then Humidity
= normalTemp.
Outlook
9
2
4
3
2
4
3
3
6
3
6
5
3
0
2
2
2
1
4
1
3
2
18
8
0.60
0.50
0.60
0.75
0.50
0.50
0.57
0.57
no
4
7
7
6
0.40 0.20 0.60 0.40 0.40
0.25 0.25 0.50 0.50 0.50
0.60 0.40 0.40 0.60 0.40
0.75 0.25 0.25
0.67 0.33 0.50
0.00 1.00 0.50
0.57 0.00
0.43
0.29 0.57
0.43
0.50 0.33 0.50 0.50
0.38 0.25 0.50 0.50
0.44 0.33 0.33 0.67 0.33
0.40 0.20 0.80 0.20 0.60
Play
false
true
normal
Windy
yes
IF
Oulook sunny
5
0.40
overcast
4
0.50
rainy
5
0.00
If Play = yes,
Temp. hot
4 0.50 0.50 0.00
then
mild
6 Humidity
0.33 0.17= normal
0.50
cool
4 0.25 0.25 0.50
Humid. high
7 0.43 0.29 0.29 0.43
normal
7 0.29 0.29 0.43 0.14
Windy yes
6 0.33 0.33 0.33 0.17
no
8 0.38 0.25 0.38 0.38
Play
yes
9 0.22 0.44 0.33 0.22
no
5 0.60 0.00 0.40 0.40
high
hot
If Humidity = normal,
5
4
5Play =
4 yes6
then
cool
Humid.
mild
rainy
Temp.
overcast
sunny
THEN
Outlook
9
0.40
1.00
0.60
0.50
0.67
0.75
0.43
0.86
0.50
0.75
5
0.60
0.00
0.40
0.50
0.33
0.25
0.57
0.14
0.50
0.25
0.67
0.40
The two-item set of Humidity and Play
produced two different rules.
Weka Tutorial 5 - Association
19
5
4
5
4
6
4
7
7
6
8
9
5
5
4
5
0.50
0.33
0.25
0.43
0.29
0.33
0.38
0.22
0.60
0.50
0.17
0.25
0.29
0.29
0.33
0.25
0.44
0.00
0.00
0.50
0.50
0.29
0.43
0.33
0.38
0.33
0.40
Weka Tutorial 5 - Association
Humid.
4
6
4
7
0.40 0.40 0.20 0.60
0.50 0.25 0.25 0.50
0.00 0.60 0.40 0.40
0.75
0.67
0.00
0.43 0.57 0.00
0.14 0.29 0.57
0.17 0.50 0.33 0.50
0.38 0.38 0.25 0.50
0.22 0.44 0.33 0.33
0.40 0.40 0.20 0.80
Windy
7
0.40
0.50
0.60
0.25
0.33
1.00
Play
6
0.40
0.50
0.40
0.25
0.50
0.50
0.43
0.43
8
0.60
0.50
0.60
0.75
0.50
0.50
0.57
0.57
0.50
0.50
0.67 0.33 0.67
0.20 0.60 0.40
no
IF
Oulook sunny
overcast
rainy
Temp. hot
mild
cool
Humid. high
normal
Windy yes
no
Play
yes
no
Temp.
yes
Two-item set
accuracy
false
THEN
Outlook
true
6
2
normal
3
3
If Temperature = cool,
then Humidity = normal
If Outlook = overcast,
than Play = yes
high
3
4
6
1
Two-item set association rules:
cool
3
4
3
4
5
3
0
2
2
2
1
4
1
3
2
mild
0
4
2
2
3
1
9
2
4
3
2
4
3
3
6
3
6
hot
4
2
3
3
4
2
8
3
2
3
3
3
2
4
4
rainy
3
1
1
3
2
2
6
2
2
2
1
3
2
3
3
no
7
2
2
3
1
2
4
For thresholds set at:
accuracy = 100%
coverage = >2
overcast
7
3
2
2
3
4
0
yes
4
1
1
2
false
6
2
1
3
Play
sunny
0
3
2
2
3
2
3
3
2
4
2
2
0
Windy
true
2
1
1
2
2
2
2
4
0
normal
2
2
1
3
2
2
3
2
3
high
5
cool
4
mild
5
Humid.
hot
5
4
5
4
6
4
7
7
6
8
9
5
rainy
Oulook sunny
overcast
rainy
Temp. hot
mild
cool
Humid. high
normal
Windy yes
no
Play
yes
no
overcast
Two-item set
coverage
Temp.
sunny
Outlook
9
0.40
1.00
0.60
0.50
0.67
0.75
0.43
0.86
0.50
0.75
5
0.60
0.00
0.40
0.50
0.33
0.25
0.57
0.14
0.50
0.25
20
Optimum
thresholds?
Support
Dataset
Confidence
Association Rule
Generator
Association
Rules
Predictive apriori association algorithm:
•
Optimally combines support and confidence into predictive accuracy,
•
Requires only that user specify number of rules generated.
Weka Tutorial 5 - Association
21
Load
TPONTPNom.arff
dataset
Weka Tutorial 5 - Association
22
Show
Show
Outlook
Outlook
relative
relativetotoPlay
Play
Weka Tutorial 5 - Association
23
Choose
PredictiveApriori
algorithm
Weka Tutorial 5 - Association
24
Best 100 rules for
TPONTP dataset
Weka Tutorial 5 - Association
25
Support for
Outlook = overcast
AND
Play = yes
Occurrences of
Outlook = overcast
Predictive accuracy
= 95.6%
Confidence
ConfidenceofofIf
Humidity
If Outlook
= normal
= overcast,
then
then
Play
Play
= yes
= yes
isis4/4
6/7==100%
86%
Weka Tutorial 5 - Association
26
Weka Tutorial 5 - Association
27
Input attributes
Outlook
Temp.
sunny
hot
Classification Goal: sunny
Predict
hot
whether the personovercast
will play, hot
given weather conditions.
rainy
mild
rainy
cool
rainy
cool
overcast
cool
sunny
mild
sunny
cool
Association rule: rainy
mild
If Humidity = normalsunny
mild
AND Windy = false,overcast
mild
then Play = yes
overcast
hot
rainy
mild
Humidity
high
high
high
high
normal
normal
normal
high
normal
normal
normal
high
normal
high
Class
Windy
false
true
false
false
false
true
true
false
false
false
true
true
false
true
Play
no
no
yes
yes
yes
no
yes
no
yes
yes
yes
yes
yes
no
Question: Is there a linkage between classification and association?
Weka Tutorial 5 - Association
28
Mine only the association
rules that have the
designated class attribute
as the antecedent.
Keep the number
of rules at 100.
Weka Tutorial 5 - Association
29
Weka Tutorial 5 - Association
30
Weka Tutorial 5 - Association
31
What’s the difference between classification and association?
•
In one sense, association is like classification, except that instead of
considering just one attribute as the class attribute, it considers every
attribute and every combination of attributes as a class, and then
performs a form of classification on all possible classes.
•
Unlike classification, association does not necessarily consider all
attributes when attempting to create rules, but operates on sub-set
combinations of attributes.
•
Association only works on nominal attributes, not numerical attributes,
although numerical attributes can be discretized.
Weka Tutorial 5 - Association
32
The Weka Experimenter Environment supports
efficient automation of algorithm testing.
Topic of the next
Weka tutorial
Weka Tutorial 5 - Association
33
Weka Documentation:
Weka Tutorial 5 - Association
34
Contact the Author:
Mark Polczynski, PhD
The Technology Forge
mhp.techforge@gmail.com
Weka Tutorial 5 - Association
35
Download