732A02 Data Mining Clustering and Association Analysis
Constrained frequent itemset mining
Jose M. Peña
jospe@ida.liu.se


Constraints

A constraint C(.) is

  Monotone
    If C(A) then C(B) for all A ⊆ B.
    E.g. A' ⊆ A.

  Antimonotone
    If C(A) then C(B) for all B ⊆ A.
    Or, if not C(B) then not C(A) for all B ⊆ A.
    E.g. support ≥ min_support.

The apriori property applies to any antimonotone constraint.


Constraints

  sum(S.Price) ≥ v is monotone (positive prices).
  min(S.Price) ≤ v is monotone.
  range(S.Price) ≥ 15 is monotone.
    Itemset ab satisfies C. So does every superset of ab.

  sum(S.Price) ≤ v is antimonotone (positive prices).
  sum(S.Price) ≥ v is not antimonotone.
  range(S.Price) ≤ 15 is antimonotone.
    Itemset ab violates C. So does every superset of ab.

  Item   Price
  a       40
  b        0
  c      -20
  d       10
  e      -30
  f       30
  g       20
  h      -10


Constraints

  Constraint                            Antimonotone         Monotone
  v ∈ S                                 no                   yes
  S ⊇ V                                 no                   yes
  S ⊆ V                                 yes                  no
  min(S) ≤ v                            no                   yes
  min(S) ≥ v                            yes                  no
  max(S) ≤ v                            yes                  no
  max(S) ≥ v                            no                   yes
  count(S) ≤ v                          yes                  no
  count(S) ≥ v                          no                   yes
  sum(S) ≤ v ( a ∈ S, a ≥ 0 )           yes                  no
  sum(S) ≥ v ( a ∈ S, a ≥ 0 )           no                   yes
  range(S) ≤ v                          yes                  no
  range(S) ≥ v                          no                   yes
  avg(S) θ v, θ ∈ { =, ≤, ≥ }           No but convertible   No but convertible
  support(S) ≥ ξ                        yes                  no
  support(S) ≤ ξ                        no                   yes


Apriori algorithm + any constraint

  Database D
  TID   Items
  100   1 3 4
  200   2 3 5
  300   1 2 3 5
  400   2 5

  Scan D: C1
  itemset   sup.
  {1}       2
  {2}       3
  {3}       3
  {4}       1
  {5}       3

  L1
  itemset   sup.
  {1}       2
  {2}       3
  {3}       3
  {5}       3

  C2
  itemset
  {1 2}
  {1 3}
  {1 5}
  {2 3}
  {2 5}
  {3 5}

  Scan D: C2
  itemset   sup
  {1 2}     1
  {1 3}     2
  {1 5}     1
  {2 3}     2
  {2 5}     3
  {3 5}     2

  L2
  itemset   sup
  {1 3}     2
  {2 3}     2
  {2 5}     3
  {3 5}     2

  C3
  itemset
  {2 3 5}

  Scan D: L3
  itemset   sup
  {2 3 5}   2

  Constraint: Sum{S.price} < 5, where item price equals item id


Apriori algorithm + antimonotone constraint

  Same database D and the same tables C1, L1, C2, L2, C3, L3 as in the previous example.

  Constraint: Sum{S.price} < 5, where item price equals item id

  Prune search space


Apriori algorithm + monotone constraint

  Same database D and the same tables C1, L1, C2, L2, C3, L3 as in the previous example.

  Constraint: Sum{S.price} ≥ 5, where item price equals item id

  C2: {1 2}  {1 3}  {1 5}☺  {2 3}  {2 5}☺  {3 5}☺
  C3: {2 3 5}☺

  ☺ = no constraint check needed: the itemset contains {5}, which already satisfies the constraint.

  Does not prune search space but avoids constraint checking

  Frequent itemsets that violate the constraint are not in the output, since they don't satisfy the constraint


FP grow algorithm + antimonotone constraint

  [Figure: annotated FP grow algorithm. Annotations: "Similar in Apriori (prune search space)" and "Specific of FP grow (avoids constraint check)".]


FP grow algorithm + monotone constraint

  If C(α) then do not check C(.) in TDB|α


Constraints

avg(S.Price) ≤ v and avg(S.Price) ≥ v are neither monotone nor antimonotone.

  Convertible monotone
    If there exists an item order R such that
    if C(A) then C(B) for all A and B respecting R such that A is a suffix of B.
    E.g. avg(S.Price) ≥ v wrt decreasing price order.

  Convertible antimonotone
    If there exists an item order R such that
    if C(A) then C(B) for all A and B respecting R such that B is a suffix of A.
    Or, if not C(B) then not C(A) for all A and B respecting R such that B is a suffix of A.
    E.g. avg(S.Price) ≥ v wrt increasing price order.


Constraints

  avg(X) ≥ 25 is convertible monotone wrt descending item price order
  R: < a, f, g, d, b, h, c, e >
    If an itemset d satisfies a constraint C, so do itemsets fd and afd, which have d as a suffix.

  avg(X) ≥ 25 is convertible antimonotone wrt ascending item price order
  R-1: < e, c, h, b, d, g, f, a >
    If an itemset dfa satisfies a constraint C, so do itemsets fa and a, which are suffixes of dfa.

  Thus, avg(X) ≥ 25 is strongly convertible.
  Check that avg(X) ≤ 25 is also strongly convertible.

  Item   Price
  a       40
  b        0
  c      -20
  d       10
  e      -30
  f       30
  g       20
  h      -10


Constraints

  Constraint                                             Convertible     Convertible   Strongly
                                                         antimonotone    monotone      convertible
  avg(S) ≤ , ≥ v                                         Yes             Yes           Yes
  median(S) ≤ , ≥ v                                      Yes             Yes           Yes
  sum(S) ≤ v (items could be of any value, v ≥ 0)        Yes             No            No
  sum(S) ≤ v (items could be of any value, v ≤ 0)        No              Yes           No
  sum(S) ≥ v (items could be of any value, v ≥ 0)        No              Yes           No
  sum(S) ≥ v (items could be of any value, v ≤ 0)        Yes             No            No
  ……


Constraints

  [Figure: classification of constraints. Regions: Monotone, Antimonotone, Strongly convertible, Convertible antimonotone, Convertible monotone, Inconvertible. Example of an inconvertible constraint: avg(S) - median(S) = 0.]


FP grow algorithm + convertible antimonotone constraint

  Instead of ordering the items according to decreasing frequency, now the items are ordered according to the order R of the constraint.


FP grow algorithm + convertible monotone constraint

  Instead of ordering the items according to decreasing frequency, now the items are ordered according to the order R of the constraint.

  If C(α) then do not check C(.) in TDB|α because α will be added as suffix to any itemset derived from TDB|α and the result respects R.

  [Figure: comparison "With monotone constraint" vs. "With convertible monotone constraint". Surviving annotations:]
    False: Such items can appear not only as suffix.
    False: No check is needed for those itemsets that are a suffix of α ∪ β. The check is needed for the rest of items.
    True: α will be added as suffix to any itemset derived from TDB|α and the result respects R.


Exercise

  How would you incorporate convertible constraints in the Apriori algorithm?
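
The monotone and antimonotone definitions can be checked by brute force on the item price table of the slides. The following is a minimal sketch, not part of the original slides: the names PRICE, price_range, is_monotone and is_antimonotone are hypothetical helpers, and the check enumerates all non-empty itemsets over the eight items a..h. Adding or removing one item at a time is enough, since any superset or subset is reached by a chain of such steps.

```python
from itertools import combinations

# Item prices from the slides' table.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}


def nonempty_subsets(items):
    """Yield every non-empty itemset over the given items as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def price_range(itemset):
    """range(S.Price): highest price minus lowest price in the itemset."""
    prices = [PRICE[i] for i in itemset]
    return max(prices) - min(prices)


def is_monotone(c):
    """If C(A) then C(B) for all A ⊆ B (checked by adding one item)."""
    return all(c(a | {x}) for a in nonempty_subsets(PRICE) if c(a) for x in PRICE)


def is_antimonotone(c):
    """If C(A) then C(B) for all non-empty B ⊆ A (checked by removing one item)."""
    return all(
        c(a - {x})
        for a in nonempty_subsets(PRICE)
        if len(a) > 1 and c(a)
        for x in a
    )


def range_ge_15(s):
    return price_range(s) >= 15


def range_le_15(s):
    return price_range(s) <= 15


def sum_ge_25(s):
    # Threshold 25 is an arbitrary value of v chosen for illustration.
    return sum(PRICE[i] for i in s) >= 25


print(is_monotone(range_ge_15), is_antimonotone(range_le_15))
print(is_monotone(sum_ge_25), is_antimonotone(sum_ge_25))
```

With this table, sum_ge_25 comes out as neither monotone nor antimonotone, which is consistent with the slides restricting the sum rows to positive prices: the table contains negative prices.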
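
Pushing an antimonotone constraint into Apriori can be sketched on the example database D. This is an illustrative sketch under stated assumptions, not the algorithm as printed on the slides: the minimum support count of 2 is inferred from the example (item {4} with support 1 is dropped), and the names D, MIN_SUP, support, constraint and constrained_apriori are hypothetical. An itemset that violates Sum{S.price} < 5 is discarded immediately, so none of its supersets is ever generated.

```python
from itertools import combinations

# Database D from the slides: TID -> items (item price equals item id).
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
MIN_SUP = 2  # assumption: inferred from the example, not stated on the slides


def support(itemset):
    """Number of transactions in D that contain every item of the itemset."""
    return sum(1 for items in D.values() if itemset <= items)


def constraint(itemset):
    """Antimonotone constraint of the example: sum of prices < 5."""
    return sum(itemset) < 5


def constrained_apriori():
    """Level-wise search that keeps only frequent itemsets satisfying C."""
    items = sorted({i for t in D.values() for i in t})
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        # An itemset survives only if it satisfies C and is frequent.
        kept = [c for c in level if constraint(c) and support(c) >= MIN_SUP]
        for c in kept:
            result[c] = support(c)
        kept_set = set(kept)
        # Join step: merge pairs of kept itemsets that differ in one item.
        k = len(kept[0]) + 1 if kept else 0
        candidates = {a | b for a, b in combinations(kept, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must have been kept.
        level = [
            c for c in candidates
            if all(frozenset(s) in kept_set for s in combinations(c, k - 1))
        ]
    return result


for itemset, sup in sorted(constrained_apriori().items(), key=lambda p: sorted(p[0])):
    print(sorted(itemset), sup)
```

Because {5} and {2 3} violate the constraint, the candidate {2 3 5} is never generated in this sketch, whereas plain Apriori followed by a final constraint check would still count it.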
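
The suffix-based definitions of convertible constraints can also be checked by brute force for avg(X) ≥ 25 on the item price table. This is a minimal sketch with hypothetical helper names (R, avg_ge_25, ordered_itemsets, is_convertible_monotone, is_convertible_antimonotone); itemsets are written as tuples whose items follow the given order, and a suffix is a proper tail of such a tuple.

```python
from itertools import combinations

# Item prices from the slides' table.
PRICE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

# Descending price order R = <a, f, g, d, b, h, c, e>.
R = sorted(PRICE, key=PRICE.get, reverse=True)


def avg_ge_25(itemset):
    """avg(X) >= 25, written with integers to avoid rounding issues."""
    return sum(PRICE[i] for i in itemset) >= 25 * len(itemset)


def ordered_itemsets(order):
    """All non-empty itemsets as tuples respecting the given item order."""
    for k in range(1, len(order) + 1):
        # combinations() emits items in the same relative order as its input.
        yield from combinations(order, k)


def is_convertible_monotone(c, order):
    """If C(A) then C(B) whenever A is a suffix of B."""
    return all(
        c(b)
        for b in ordered_itemsets(order)
        for j in range(1, len(b))
        if c(b[j:])
    )


def is_convertible_antimonotone(c, order):
    """If C(A) then C(B) whenever B is a suffix of A."""
    return all(
        c(a[j:])
        for a in ordered_itemsets(order)
        if c(a)
        for j in range(1, len(a))
    )


print(R)
print(is_convertible_monotone(avg_ge_25, R))            # descending order
print(is_convertible_antimonotone(avg_ge_25, R[::-1]))  # ascending order
```

The choice of order matters: under the descending order R the same constraint is not convertible antimonotone, since afg satisfies it while its suffix g does not.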