732A02 Data Mining Clustering and Association Analysis
Constrained frequent itemset mining
Jose M. Peña
jospe@ida.liu.se

Constraints

A constraint C(.) is
• Monotone
  If C(A) then C(B) for all A ⊆ B.
  E.g. A' ⊆ A.
• Antimonotone
  If C(A) then C(B) for all B ⊆ A.
  Or, if not C(B) then not C(A) for all B ⊆ A.
  E.g. support ≥ min_support.

The apriori property applies to any antimonotone constraint.
Constraints
• sum(S.Price) ≥ v is monotone (positive prices).
• sum(S.Price) ≤ v is antimonotone (positive prices).
• min(S.Price) ≤ v is monotone.
• sum(S.Price) ≥ v is not antimonotone.
• range(S.Price) ≥ 15 is monotone.
• range(S.Price) ≤ 15 is antimonotone.

Item   Price
a       40
b        0
c      -20
d       10
e      -30
f       30
g       20
h      -10

• With C = range(S.Price) ≥ 15, itemset ab satisfies C. So does every superset of ab.
• With C = range(S.Price) ≤ 15, itemset ab violates C. So does every superset of ab.
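The definitions can be checked by brute force on the price table. The following is a minimal sketch, not part of the original slides: the helper names (`nonempty_subsets`, `is_monotone`, `is_antimonotone`) and the threshold 20 used for the sum constraint are assumptions made for illustration.

```python
from itertools import combinations

# Item prices copied from the table above.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def price_sum(itemset):
    return sum(PRICE[i] for i in itemset)


def price_range(itemset):
    prices = [PRICE[i] for i in itemset]
    return max(prices) - min(prices)


def is_monotone(constraint, items):
    """True if C(A) implies C(B) for all non-empty A ⊆ B ⊆ items."""
    sets = list(nonempty_subsets(items))
    return all(constraint(b) for a in sets if constraint(a) for b in sets if a <= b)


def is_antimonotone(constraint, items):
    """True if C(A) implies C(B) for all non-empty B ⊆ A ⊆ items."""
    sets = list(nonempty_subsets(items))
    return all(constraint(b) for a in sets if constraint(a) for b in sets if b <= a)


print(is_monotone(lambda s: price_range(s) >= 15, PRICE))      # range ≥ 15
print(is_antimonotone(lambda s: price_range(s) <= 15, PRICE))  # range ≤ 15
print(is_monotone(lambda s: price_sum(s) >= 20, "adfg"))       # positive prices only
print(is_monotone(lambda s: price_sum(s) >= 20, PRICE))        # negative prices included
```

On this table the first three calls print True and the last one prints False: itemset a satisfies sum ≥ 20, but adding e (price -30) brings the sum down to 10. This is why the sum constraints are qualified with "positive prices".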
Constraints

Constraint                      Antimonotone          Monotone
v ∈ S                           no                    yes
S ⊇ V                           no                    yes
S ⊆ V                           yes                   no
min(S) ≤ v                      no                    yes
min(S) ≥ v                      yes                   no
max(S) ≤ v                      yes                   no
max(S) ≥ v                      no                    yes
count(S) ≤ v                    yes                   no
count(S) ≥ v                    no                    yes
sum(S) ≤ v ( a ∈ S, a ≥ 0 )     yes                   no
sum(S) ≥ v ( a ∈ S, a ≥ 0 )     no                    yes
range(S) ≤ v                    yes                   no
range(S) ≥ v                    no                    yes
avg(S) θ v, θ ∈ { =, ≤, ≥ }     No but convertible    No but convertible
support(S) ≥ ξ                  yes                   no
support(S) ≤ ξ                  no                    yes

Apriori algorithm + any constraint
Constraint: Sum{S.price} < 5, where item price equals item id.

Database D
TID    Items
100    1 3 4
200    2 3 5
300    1 2 3 5
400    2 5

Scan D → C1
itemset    sup.
{1}        2
{2}        3
{3}        3
{4}        1
{5}        3

L1
itemset    sup.
{1}        2
{2}        3
{3}        3
{5}        3

C2
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D → C2
itemset    sup
{1 2}      1
{1 3}      2
{1 5}      1
{2 3}      2
{2 5}      3
{3 5}      2

L2
itemset    sup
{1 3}      2
{2 3}      2
{2 5}      3
{3 5}      2

C3
itemset
{2 3 5}

Scan D → L3
itemset    sup
{2 3 5}    2
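The trace above can be reproduced with a short script that first mines all frequent itemsets and only afterwards applies the constraint. This is a minimal sketch rather than the slides' own code: the minimum support count of 2 is an assumption inferred from the trace ({4} with support 1 is dropped), and the function names are illustrative.

```python
from itertools import combinations

# Database D from the trace; the price of an item equals its id.
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
MIN_SUP = 2  # assumption: threshold inferred from the trace


def support(itemset, db):
    """Number of transactions that contain `itemset`."""
    return sum(1 for t in db.values() if itemset <= t)


def apriori(db, min_sup):
    """Return {itemset: support} for all frequent itemsets."""
    level = [frozenset([i]) for i in sorted(set().union(*db.values()))]
    frequent, k = {}, 1
    while level:
        counts = {c: support(c, db) for c in level}            # scan D
        current = {c: s for c, s in counts.items() if s >= min_sup}
        frequent.update(current)
        k += 1
        # Join frequent itemsets, then keep candidates whose
        # (k-1)-subsets are all frequent (apriori property).
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def price_sum(itemset):
    return sum(itemset)  # price equals item id


frequent = apriori(D, MIN_SUP)
# The constraint is applied only after mining, as a filter on the result.
answer = {s: sup for s, sup in frequent.items() if price_sum(s) < 5}
for s in sorted(answer, key=sorted):
    print(sorted(s), answer[s])
```

Treating the constraint as a post-filter works for any constraint, but every candidate is still generated and counted.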
Apriori algorithm + antimonotone constraint

Constraint: Sum{S.price} < 5, where item price equals item id.

Database D is the same as in the previous trace. Because the constraint is antimonotone, it is used to prune the search space: a candidate itemset that violates the constraint is discarded, since every superset of it violates the constraint too.
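Pruning with the antimonotone constraint can be sketched as follows. This is an illustrative sketch under stated assumptions: the minimum support count of 2 is inferred from the earlier trace, and `apriori_antimonotone` and the `counted` statistic are hypothetical names, not taken from the slides.

```python
from itertools import combinations

# Database D from the trace; the price of an item equals its id.
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
MIN_SUP = 2  # assumption: threshold inferred from the trace


def support(itemset, db):
    """Number of transactions that contain `itemset`."""
    return sum(1 for t in db.values() if itemset <= t)


def apriori_antimonotone(db, min_sup, constraint):
    """Apriori that drops violating candidates before counting them.

    Returns the frequent itemsets that satisfy the constraint and the
    number of candidates whose support had to be counted.
    """
    level = [frozenset([i]) for i in sorted(set().union(*db.values()))]
    result, counted, k = {}, 0, 1
    while level:
        # A violating candidate cannot have a satisfying superset.
        level = [c for c in level if constraint(c)]
        counted += len(level)
        counts = {c: support(c, db) for c in level}
        current = {c: s for c, s in counts.items() if s >= min_sup}
        result.update(current)
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return result, counted


result, counted = apriori_antimonotone(D, MIN_SUP, lambda s: sum(s) < 5)
for s in sorted(result, key=sorted):
    print(sorted(s), result[s])
print("candidates counted:", counted)
```

Here {5} is discarded before the first scan, so no candidate containing item 5 is ever generated, and {2 3} is discarded before the second scan.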
Apriori algorithm + monotone constraint

Constraint: Sum{S.price} ≥ 5, where item price equals item id.

Database D is the same as in the previous trace. A monotone constraint does not prune the search space but avoids constraint checking. Candidates marked ☺ need no check, because they contain an itemset that already satisfies the constraint:

C2: {1 2}, {1 3}, {1 5} ☺, {2 3}, {2 5} ☺, {3 5} ☺
C3: {2 3 5} ☺

Frequent itemsets that do not satisfy the constraint are not in the output.

FP grow algorithm + antimonotone constraint
• Similar to Apriori: prune the search space.
• Specific to FP grow: avoid constraint checks.
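For the Apriori variant with the monotone constraint Sum{S.price} ≥ 5, the idea of avoiding constraint checks can be sketched as follows. This is a hedged illustration, not the slides' code: the minimum support count of 2 is inferred from the trace, and `apriori_monotone`, `unchecked` and `checks` are hypothetical names.

```python
from itertools import combinations

# Database D from the trace; the price of an item equals its id.
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
MIN_SUP = 2  # assumption: threshold inferred from the trace


def support(itemset, db):
    """Number of transactions that contain `itemset`."""
    return sum(1 for t in db.values() if itemset <= t)


def apriori_monotone(db, min_sup, constraint):
    """Apriori that skips the check for supersets of satisfying itemsets."""
    level = [frozenset([i]) for i in sorted(set().union(*db.values()))]
    satisfied = set()   # candidates known to satisfy the constraint
    unchecked = set()   # candidates accepted without evaluating it
    output, checks, k = {}, 0, 1
    while level:
        current = {}
        for c in level:
            if any(frozenset(s) in satisfied for s in combinations(c, k - 1)):
                unchecked.add(c)          # a subset satisfies C, so c does too
                ok = True
            else:
                checks += 1
                ok = constraint(c)
            if ok:
                satisfied.add(c)
            sup = support(c, db)
            if sup >= min_sup:            # no pruning: all frequent sets are kept
                current[c] = sup
                if ok:
                    output[c] = sup       # only satisfying sets are reported
        k += 1
        joined = {a | b for a in current for b in current if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return output, unchecked, checks


output, unchecked, checks = apriori_monotone(D, MIN_SUP, lambda s: sum(s) >= 5)
print(sorted(sorted(s) for s in output))
print(sorted(sorted(s) for s in unchecked))
print("constraint checks:", checks)
```

The search space is the same as in plain Apriori. The saving is only in the number of constraint evaluations, and the unchecked candidates are the ones containing item 5.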
FP grow algorithm + monotone constraint
• If C(α) then do not check C(.) in TDB|α.

Constraints
• avg(S.Price) ≤ v and avg(S.Price) ≥ v are neither monotone nor antimonotone.
• Convertible monotone
  If there exists an item order R such that
  if C(A) then C(B) for all A and B respecting R such that A is a suffix of B.
  E.g. avg(S.Price) ≥ v wrt decreasing price order.
• Convertible antimonotone
  If there exists an item order R such that
  if C(A) then C(B) for all A and B respecting R such that B is a suffix of A.
  Or, if not C(B) then not C(A) for all A and B respecting R such that B is a suffix of A.
  E.g. avg(S.Price) ≥ v wrt increasing price order.

Constraints
• avg(X) ≥ 25 is convertible monotone wrt descending item price order R: < a, f, g, d, b, h, c, e >.
  If an itemset d satisfies a constraint C, so do itemsets fd and afd, which have d as a suffix.
• avg(X) ≥ 25 is convertible antimonotone wrt ascending item price order R-1: < e, c, h, b, d, g, f, a >.
  If an itemset dfa satisfies a constraint C, so do itemsets fa and a, which are suffixes of dfa.
• Thus, avg(X) ≥ 25 is strongly convertible.
• Check that avg(X) ≤ 25 is also strongly convertible.
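The suffix-based definitions can be checked exhaustively on the eight items of the price table. The sketch below is illustrative only: the function names are assumptions, and itemsets are written as tuples so that "respecting R" and "suffix" have a direct meaning.

```python
from itertools import combinations

# Item prices from the earlier table.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

R = sorted(PRICE, key=PRICE.get, reverse=True)  # descending price order
R_INV = R[::-1]                                 # ascending price order


def avg_at_least(v):
    """Constraint avg(X) ≥ v, written without division."""
    return lambda seq: sum(PRICE[i] for i in seq) >= v * len(seq)


def sequences(order):
    """Every non-empty itemset, written as a tuple that respects `order`."""
    for k in range(1, len(order) + 1):
        yield from combinations(order, k)


def is_convertible_monotone(constraint, order):
    """C(A) implies C(B) whenever A is a suffix of B."""
    seqs = list(sequences(order))
    return all(constraint(b) for a in seqs if constraint(a)
               for b in seqs if len(b) > len(a) and b[-len(a):] == a)


def is_convertible_antimonotone(constraint, order):
    """C(A) implies C(B) whenever B is a non-empty suffix of A."""
    return all(constraint(a[j:]) for a in sequences(order) if constraint(a)
               for j in range(1, len(a)))


C = avg_at_least(25)
print(R)
print(is_convertible_monotone(C, R))          # descending order
print(is_convertible_antimonotone(C, R_INV))  # ascending order
print(is_convertible_antimonotone(C, R))      # wrong order for this property
print(C(("d", "f", "a")), C(("f", "a")), C(("a",)))
```

The property depends on the order: with the descending order R, itemset afg has average 30 but its suffix g has average 20, so avg(X) ≥ 25 is not convertible antimonotone wrt R. The same helpers can be reused with a `≤` constraint for the check suggested in the last bullet.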
Constraints

Constraint                                          Convertible      Convertible    Strongly
                                                    antimonotone     monotone       convertible
avg(S) ≤ , ≥ v                                      Yes              Yes            Yes
median(S) ≤ , ≥ v                                   Yes              Yes            Yes
sum(S) ≤ v (items could be of any value, v ≥ 0)     Yes              No             No
sum(S) ≤ v (items could be of any value, v ≤ 0)     No               Yes            No
sum(S) ≥ v (items could be of any value, v ≥ 0)     No               Yes            No
sum(S) ≥ v (items could be of any value, v ≤ 0)     Yes              No             No
……

[Figure: classification of constraints into monotone, antimonotone, strongly convertible, convertible antimonotone, convertible monotone and inconvertible (e.g. avg(S)-median(S)=0)]
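FP grow algorithm + convertible antimonotone constraint
• Instead of ordering the items according to decreasing frequency, now the items are ordered according to the order R of the constraint.

FP grow algorithm + convertible monotone constraint
• Instead of ordering the items according to decreasing frequency, now the items are ordered according to the order R of the constraint.
• If C(α) then do not check C(.) in TDB|α because α will be added as suffix to any itemset derived from TDB|α and the result respects R.

[Figure: steps of the algorithm with a monotone constraint, each annotated for the case of a convertible monotone constraint]
• False: Such items can appear not only as suffix.
• False: No check is needed for those itemsets that are a suffix of α ∪ β. The check is needed for the rest of items.
• True: α will be added as suffix to any itemset derived from TDB|α and the result respects R.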
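The two bullets for the convertible monotone case can be illustrated with a small pattern-growth sketch. It is not a real FP-tree implementation: it recurses over projected transaction lists, which is enough to show the item ordering by R and the skipped checks. The database D and "price equals item id" are reused from the Apriori example, while the constraint avg(S.price) ≥ 3, the minimum support count of 2 and the names `mine` and `grow` are assumptions made for illustration.

```python
from collections import Counter

# Database D from the Apriori example; the price of an item equals its id.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_SUP = 2          # assumption: same threshold as in the Apriori trace
R = [5, 4, 3, 2, 1]  # decreasing price order, as required by avg(S) ≥ v


def avg_at_least_3(pattern):
    """Hypothetical constraint avg(S.price) ≥ 3, written without division."""
    return sum(pattern) >= 3 * len(pattern)


def mine(db, order, min_sup, constraint):
    """Grow patterns by prepending items that come earlier in `order`."""
    rank = {item: i for i, item in enumerate(order)}
    output, checks = {}, 0

    def grow(suffix, projected, suffix_ok):
        nonlocal checks
        counts = Counter(i for t in projected for i in t)
        for item in order:
            sup = counts.get(item, 0)
            if sup < min_sup:
                continue
            pattern = (item,) + suffix   # respects R and has `suffix` as suffix
            if suffix_ok:
                ok = True                # C(suffix) holds, so no check is needed
            else:
                checks += 1
                ok = constraint(pattern)
            if ok:
                output[pattern] = sup
            # Projected database of `pattern`: items that precede `item` in R.
            sub = [[j for j in t if rank[j] < rank[item]]
                   for t in projected if item in t]
            grow(pattern, sub, ok)

    grow((), [list(t) for t in db], False)
    return output, checks


patterns, checks = mine(D, R, MIN_SUP, avg_at_least_3)
for p in sorted(patterns):
    print(p, patterns[p])
print("constraint checks:", checks)
```

Every pattern derived from the projected database of α is α with higher-priced items prepended, so its average cannot drop below that of α. That is the reason the flag `suffix_ok` can be passed down without re-evaluating the constraint.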
Exercise
• How would you incorporate convertible constraints in the Apriori algorithm?