Association Rules
Olson, Yanhong Li

Fuzzy Association Rules
• Association rule mining provides information for assessing significant correlations in large databases
• IF X THEN Y
• SUPPORT: degree to which the relationship appears in the data (the proportion of records in which both X and Y hold)
• CONFIDENCE: probability that if X, then Y (support of X and Y divided by support of X)

Association Rule Algorithms
• Apriori
  – Agrawal et al., 1993; Agrawal & Srikant, 1994
  – Finds correlations among transactions with binary values
• Weighted association rules
  – Cai et al., 1998; Lu et al., 2001
• Cardinal data
  – Srikant & Agrawal, 1996
  – Partitions the attribute domain and combines adjacent partitions, then maps them to binary attributes

Fuzzy Association Rules
• Most are based on the Apriori algorithm
• Treat all attributes as uniform (equally important)
• The number of rules can be increased by lowering minimum support or minimum confidence
  – Generates many uninteresting rules
  – Software takes much longer to run

Gyenesei (2000)
• Studied weighted quantitative association rules in a fuzzy domain
  – With and without normalization
  – NONNORMALIZED
    • Used the product operator to define the combined weight and fuzzy value
    • If weights are small, support levels are small and tend to cause data overflow
  – NORMALIZED
    • Used the geometric mean of the item weights as the combined weight
    • Support is then very small

Algorithm
• Get the membership functions, minimum support, and minimum confidence
• Assign a weight to each fuzzy membership category of each attribute (categorical)
• Calculate the support for each fuzzy region
• If support > minimum support, the region qualifies
• If confidence > minimum confidence, the rule qualifies
• If both qualify, generate the rule

Demo Model: Loan App
Case  Age  Income  Risk     Credit  Result
1     20   52623   -38954   Red     0
2     26   23047   -23636   Green   1
3     46   56810   45669    Green   1
4     31   38388   -7968    Amber   1
5     28   80019   -35125   Green   1
6     21   74561   -47592   Green   1
7     46   65341   58119    Green   1
8     25   46504   -30022   Green   1
9     38   65735   30571    Green   1
10    27   26047   -6       Red     1

Fuzzified Age
[Figure 2: The membership functions of attribute Age. Curves: Young, Middle, Old; x-axis: Age (0 to 100); y-axis: membership value (0 to 1)]

Fuzzify Age
Case  Age  Young  Middle  Old
1     20   1.0    0.0     0.0
2     26   0.9    0.1     0.0
3     46   0.0    0.4     0.6
4     31   0.4    0.6     0.0
5     28   0.7    0.3     0.0
6     21   1.0    0.0     0.0
7     46   0.0    0.4     0.6
8     25   1.0    0.0     0.0
9     38   0.0    1.0     0.0
10    27   0.8    0.2     0.0

Calculate Support for Each Pair of Fuzzy Categories
• Membership value
  – Identify the weights for each attribute
  – Identify the highest fuzzy membership category for each case
• Membership value = minimum weight associated with the highest fuzzy membership category
• Support
  – Average membership value over all cases

Support
• If the support for a pair of categories is above minimum support, retain the pair
• This identifies all pairs of fuzzy categories with a sufficiently strong relationship

Pairs (minsup = 0.25)
Pair     Support    Pair     Support
R11R22   0.235      R22R42   0.184
R11R31   0.207      R22R51   0.449
R11R41   0.212      R31R41   0.266
R11R42   0.131      R31R42   0.096
R11R51   0.230      R31R51   0.264
R22R31   0.237      R41R51   0.560
R22R41   0.419      R42R51   0.174

Confidence
• Identify the direction of the rule (which category is the antecedent)
• For the training cases involving the pair of categories, what proportion came out as predicted?

Confidence Values: Pairs (minimum confidence = 0.9)
Rule     Confidence    Rule     Confidence
R22R41   0.855         R41R31   0.462
R41R22   0.727         R31R51   0.825
R22R51   0.916         R51R31   0.410
R51R22   0.697         R41R51   0.972
R31R41   0.831         R51R41   0.870

Rules vs. Support
[Figure 7: The relationship between the number of association rules and minsup using the proposed method. One curve per minconf (0.55, 0.65, 0.75, 0.85, 0.95, 1); x-axis: minsup (0.2 to 0.55); y-axis: number of association rules (0 to 20)]

Rules vs. Confidence
[Figure 8: The relationship between the number of association rules and minconf using the proposed method. One curve per minsup (0.2, 0.25, 0.3, 0.35, 0.4, 0.55); x-axis: minconf (0.55 to 1); y-axis: number of association rules (0 to 20)]

Higher Order Combinations
• Try triplets
  – If ambitious, sets of 4 and beyond
• Problem:
  – Computational complexity explodes

Research
• The higher the minimum support, the fewer rules you get
• The higher the minimum confidence, the fewer rules you get
• Weights can yield more rules
• Greatest accuracy seemed to be at intermediate levels of support
  – combined with higher levels of confidence
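To make the crisp SUPPORT and CONFIDENCE definitions from the opening slide concrete, here is a minimal sketch applied to the Credit and Result columns of the Demo Model: Loan App table. The rule IF Credit = Green THEN Result = 1 and the helper names `support` and `confidence` are hypothetical choices for illustration only.

```python
# Credit and Result columns copied from the Demo Model: Loan App table (cases 1-10).
credit = ["Red", "Green", "Green", "Amber", "Green",
          "Green", "Green", "Green", "Green", "Red"]
result = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]


def support(x_mask, y_mask):
    """Fraction of cases in which both X and Y hold."""
    both = sum(1 for x, y in zip(x_mask, y_mask) if x and y)
    return both / len(x_mask)


def confidence(x_mask, y_mask):
    """Estimate of P(Y | X): cases with X and Y divided by cases with X."""
    n_x = sum(1 for x in x_mask if x)
    if n_x == 0:
        return 0.0
    both = sum(1 for x, y in zip(x_mask, y_mask) if x and y)
    return both / n_x


# Hypothetical rule for illustration: IF Credit = Green THEN Result = 1
x = [c == "Green" for c in credit]
y = [r == 1 for r in result]
print(support(x, y), confidence(x, y))  # 0.7 1.0
```

Seven of the ten cases are Green and all seven have Result = 1. The rule therefore covers 70% of the data, and it is never contradicted in this small sample.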
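The Fuzzify Age table can be reproduced with trapezoidal membership functions. The breakpoints below are an assumption. They were inferred from the axis ticks of Figure 2 (0, 25, 35, 40, 50, 100) and checked against the table, but the slides do not state them. In this reading, Young is 1 up to age 25 and falls to 0 at 35. Middle rises from 25 to 35, stays at 1 until 40, and falls to 0 at 50. Old rises from 40 to 50 and stays at 1 up to 100. The names `trapezoid`, `AGE_SETS` and `fuzzify_age` are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on (a, b), equals 1 on [b, c], falls on (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# ASSUMED breakpoints (a, b, c, d), inferred from Figure 2 and the Fuzzify Age table.
AGE_SETS = {
    "Young": (0, 0, 25, 35),
    "Middle": (25, 35, 40, 50),
    "Old": (40, 50, 100, 100),
}


def fuzzify_age(age):
    """Map a crisp age (0 to 100) to its Young/Middle/Old membership values."""
    return {name: round(trapezoid(age, *abcd), 3) for name, abcd in AGE_SETS.items()}


ages = [20, 26, 46, 31, 28, 21, 46, 25, 38, 27]
for case, age in enumerate(ages, start=1):
    print(case, age, fuzzify_age(age))
```

For every case, the printed memberships match the Young, Middle and Old columns of the Fuzzify Age table. This suggests the inferred breakpoints are consistent with the slide, though not necessarily identical to the authors' definitions.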
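The Calculate Support slide describes a weighted membership-value procedure whose details are not fully specified. The sketch below swaps in the widely used unweighted definition instead, so it is not necessarily the method behind the slide's numbers. Under this definition, the support of a fuzzy itemset is the average, over all cases, of the minimum membership across its items (the min t-norm). Fuzzy confidence is support(X and Y) / support(X). The example rule IF Age = Young THEN Result = 1 is hypothetical. It uses the Young column of the Fuzzify Age table and treats the crisp Result as a 0/1 membership.

```python
# Young memberships from the Fuzzify Age table; Result = 1 encoded as 0/1 membership.
young = [1.0, 0.9, 0.0, 0.4, 0.7, 1.0, 0.0, 1.0, 0.0, 0.8]
result_is_1 = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def fuzzy_support(*columns):
    """Average over cases of the minimum membership across the items (min t-norm)."""
    n = len(columns[0])
    return sum(min(vals) for vals in zip(*columns)) / n


def fuzzy_confidence(antecedent, consequent):
    """support(X and Y) / support(X); 0 if the antecedent never occurs."""
    sup_x = fuzzy_support(antecedent)
    if sup_x == 0:
        return 0.0
    return fuzzy_support(antecedent, consequent) / sup_x


sup = fuzzy_support(young, result_is_1)
conf = fuzzy_confidence(young, result_is_1)
print(round(sup, 3), round(conf, 3))  # 0.48 0.828
```

Weighted variants such as Gyenesei's multiply in attribute weights, either through a product or through a geometric mean. The result then replaces the plain minimum above. That extension is omitted here because the slides do not give the exact formula.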
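The two threshold steps can be replayed on the values reported in the Pairs and Confidence Values tables. A pair is retained when its support exceeds minsup = 0.25, and a rule is kept when its confidence exceeds the minimum confidence of 0.9. The helper names are hypothetical, and the numbers are copied directly from the slides.

```python
# Values copied from the "Pairs" and "Confidence Values: Pairs" slides.
PAIR_SUPPORT = {
    "R11R22": 0.235, "R22R42": 0.184, "R11R31": 0.207, "R22R51": 0.449,
    "R11R41": 0.212, "R31R41": 0.266, "R11R42": 0.131, "R31R42": 0.096,
    "R11R51": 0.230, "R31R51": 0.264, "R22R31": 0.237, "R41R51": 0.560,
    "R22R41": 0.419, "R42R51": 0.174,
}
RULE_CONFIDENCE = {
    "R22R41": 0.855, "R41R31": 0.462, "R41R22": 0.727, "R31R51": 0.825,
    "R22R51": 0.916, "R51R31": 0.410, "R51R22": 0.697, "R41R51": 0.972,
    "R31R41": 0.831, "R51R41": 0.870,
}


def frequent_pairs(supports, minsup):
    """Pairs whose support is above the minimum support."""
    return sorted(p for p, s in supports.items() if s > minsup)


def strong_rules(confidences, minconf):
    """Rules whose confidence is above the minimum confidence."""
    return sorted(r for r, c in confidences.items() if c > minconf)


print(frequent_pairs(PAIR_SUPPORT, 0.25))
print(strong_rules(RULE_CONFIDENCE, 0.9))
```

Five pairs pass minsup = 0.25: R22R41, R22R51, R31R41, R31R51 and R41R51. The ten entries on the Confidence Values slide are exactly the two directions of these five pairs. Only R22R51 (0.916) and R41R51 (0.972) clear the 0.9 confidence threshold.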
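Moving to triplets is where Apriori pruning helps keep the explosion in check. A 3-item set can only be frequent if every 2-item subset is frequent, so only those triplets need their support computed. The sketch below starts from the five pairs that passed minsup = 0.25 on the Pairs slide. For simplicity it enumerates all 3-combinations of the items rather than using Apriori's prefix join, and it yields the same candidate set. The function name is hypothetical, and the slides do not report supports for any triplets.

```python
from itertools import combinations

# The five pairs retained at minsup = 0.25 on the Pairs slide.
FREQUENT_PAIRS = [("R22", "R41"), ("R22", "R51"), ("R31", "R41"),
                  ("R31", "R51"), ("R41", "R51")]


def candidate_triplets(frequent_pairs):
    """3-item candidates whose every 2-item subset is a frequent pair."""
    frequent = {frozenset(p) for p in frequent_pairs}
    items = sorted(set().union(*frequent))
    candidates = []
    for triple in combinations(items, 3):
        # Apriori pruning: discard the triple if any 2-item subset is infrequent.
        if all(frozenset(sub) in frequent for sub in combinations(triple, 2)):
            candidates.append(triple)
    return candidates


print(candidate_triplets(FREQUENT_PAIRS))
```

Of the four possible triplets over R22, R31, R41 and R51, only (R22, R41, R51) and (R31, R41, R51) survive, because R22R31 (support 0.237) is below minsup. Without pruning, the number of k-item candidates over m categories grows as C(m, k).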