The Principles for Measuring Association Rules¹

Victor Shi, Shan Duanmu and William Perrizo
Computer Science Dept., North Dakota State University, Fargo, ND 58105

Abstract – This paper presents six principles for characterizing interestingness measures of association rules. By applying the principles to the analysis of five measures, we found that at most three measures are needed to fully characterize association rule mining. While the support-confidence framework is not sufficient to rank rules, we propose three alternatives that do so with no loss of implication, correlation, novelty and utility. Based on the proposed alternatives, we also discuss possible techniques for obtaining the most interesting rules.

1 Introduction

Association rule mining searches for interesting relationships among items in a given data set. Such relationships are typically expressed as an association rule of the form X → Y, where X and Y are sets of items. The rule reads: whenever a transaction T contains X, it probably also contains Y. The probability is defined as the percentage of transactions containing Y in addition to X with respect to the overall number of transactions containing X. This probability is called confidence (or strength). While the confidence measure represents the certainty of a rule, support is used to represent the usefulness of the rule [1]. Formally, the support of a rule is defined as the percentage of transactions containing both X and Y with respect to the number of transactions in the database. The goal of association rule mining is to find interesting rules. A rule is considered interesting if its confidence and support exceed certain thresholds. Such thresholds are generally assumed to be given by domain experts. Association rule mining faces two main problems: complexity and usefulness. The number of rules grows exponentially with the number of items, and humans can examine only a small fraction of them.
Thus many algorithms use a support threshold to reduce both the algorithmic complexity and the number of generated rules [2, 3]. While the support-confidence framework has been widely used for measuring the interestingness of association rules, it is known that the resulting rules may be misleading [4-8]. A rule X → Y with high support and high confidence may still not indicate that X and Y are dependent. The use of support and confidence thresholds for pruning may obscure important rules, and many unimportant rules may remain in the resulting rule set. S. Brin et al. [4, 5] proposed the chi-square test, Interest and Conviction measures to overcome the weaknesses of the support-confidence framework. Ahmed et al. [6] defined the Reliability measure to fix the symmetry problems of Brin's framework, namely that Interest gives the same value to the rules X → Y and Y → X, and that Conviction gives the same value to the rules X → Y and ¬Y → ¬X.

¹ This work is partially supported by GSA Grant ACT# 96130308.

Giving the same value to the rules X → Y and Y → X is a serious drawback. For example, let X be BUYING_COMPUTER and Y be BUYING_SOFTWARE, with P[X] = 0.2, P[Y] = 0.6 and P[X∧Y] = 0.2. By definition [4] the rules X → Y and Y → X have the same Interest value, 0.2/(0.2 × 0.6) ≈ 1.67. This is against our intuition: we know a customer will likely buy software when he buys a computer, but it is much less likely that a customer will buy a computer when he buys software. In fact, in this example the confidence of the rule X → Y is 100%, much larger than the confidence of the rule Y → X (33.3%). Ahmed et al. [6] claim that Reliability is a better choice than Conviction because Conviction gives the same value to the rules X → Y and ¬Y → ¬X while Reliability does not. An interesting question about this claim is why it should be bad for a measure to give the same value to X → Y and ¬Y → ¬X: logically, X → Y is equivalent to ¬Y → ¬X [9]. In the literature, many measures [8, 10] have been proposed to capture the correlation property of X and Y.
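The asymmetry argument above can be checked numerically. The following sketch (our own illustration, using the probabilities from the example) shows that Interest assigns the same value to both directions of a rule while confidence does not:

```python
# Probabilities from the buying-computer / buying-software example:
# P[X] = 0.2, P[Y] = 0.6, P[X ^ Y] = 0.2.
p_x, p_y, p_xy = 0.2, 0.6, 0.2

# Interest is symmetric: intr(X -> Y) = P[X^Y] / (P[X]P[Y]) = intr(Y -> X).
intr_xy = p_xy / (p_x * p_y)
intr_yx = p_xy / (p_y * p_x)

# Confidence is not: conf(X -> Y) = P[X^Y] / P[X].
conf_xy = p_xy / p_x   # 1.0    (100%)
conf_yx = p_xy / p_y   # ~0.333 (33.3%)

assert intr_xy == intr_yx   # ~1.67 in both directions: Interest cannot tell them apart
assert conf_xy > conf_yx    # confidence distinguishes the two directions
```

The same two lines of arithmetic reappear throughout section 3, so this snippet also serves as a reference for the notation used there.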
Are there other properties that need to be captured? What measures should be used together to fully characterize the interestingness properties of association rules? Is there a synthetic measure that incorporates all the properties and thus can be used to rank rules for presenting the top N rules to users? To the best of our knowledge, few papers have addressed these questions together. In this paper we present six objective principles for evaluating interestingness measures. With the proposed principles, we try to answer the above questions based on the analysis of five interestingness measures. Through detailed analysis we will also answer the question of whether Reliability is a better interestingness measure than Conviction.

The paper is organized as follows. We present the six principles in section 2. In section 3 we choose five interestingness measures and compare them using the proposed principles. Section 4 discusses possible pruning techniques. We conclude the paper in section 5.

2 Objective principles

An association rule X → Y can be interpreted as: the presence of itemset X in a transaction implies the occurrence of itemset Y. Logically, X → Y can be defined as ¬X ∨ Y [9], which can be rewritten as ¬(X ∧ ¬Y). Thus conf(X → Y) = P[X∧Y]/P[X] (confidence) and conv(X → Y) = P[X]P[¬Y]/P[X∧¬Y] (conviction) should both be good measures with regard to implication. Further, we impose a constraint on any implication measure: when P[X] < P[Y], the implication of the rule X → Y should be larger than the implication of the rule Y → X. This can be explained with the example from section 1, where P[X] = 0.2, P[Y] = 0.6 and P[X∧Y] = 0.2: the implication of BUY_COMPUTER → BUY_SOFTWARE should definitely be larger than the implication of BUY_SOFTWARE → BUY_COMPUTER. We will see in the next section that both confidence and conviction follow this constraint. Thus we state the implication principle as follows.
Principle 1 (implication principle): If a set of measures is defined to reflect the interestingness of an association rule X → Y, then at least one measure m_i in the set should satisfy the constraint m_i(X → Y) > m_i(Y → X) when P[X] < P[Y].

To mine a rule X → Y, we assume there is some relationship between X and Y. It makes no sense to say that we have a rule whose antecedent and consequent are independent. The confidence measure has exactly this drawback, even though it satisfies the implication principle. Many research efforts have been devoted to developing new measures that reflect the strength of the correlation between the antecedent and the consequent of a rule [9]. Theoretically, the covariance of X and Y best reflects the correlation strength. In practice, we also hope that the measure has some sort of closure property that can be exploited to reduce computation complexity. In general, we only need the correlation measure to be proportional to the covariance.

Principle 2 (correlation principle): If a set of measures is defined to reflect the interestingness of an association rule X → Y, then at least one measure m_i in the set should be directly proportional to the covariance of X and Y.

When presenting rules to the user, it is desirable that the rules contain new information. Most efforts in this regard are devoted to removing redundant rules from the generated rule set [1]: one of two rules is considered redundant if they are "too close" to each other. Here we define novelty from a different perspective. In general, we say a rule is less novel if it is closer to common knowledge. The difficulty is how to quantitatively measure this "closeness to common knowledge". We define "common knowledge" to be the rules we can get from the database when P[X] = 1 (or P[Y] = 1). This is reasonable, since we are sure there is a rule X → Y if P[X] = 1, no matter how small P[Y] is, and vice versa.
The closer P[X] (or P[Y]) is to 1, the less novel the rule X → Y is. We thus state the following novelty principle.

Principle 3 (novelty principle): If a set of measures is defined to reflect the interestingness of an association rule, then for a given P[X∧Y], at least one measure m_i in the set should reflect its novelty. The novelty measure m_i should be inversely proportional to p = max{P[X], P[Y]}.

In this paper we deliberately use P[X], P[Y] and P[X∧Y] instead of the occurrence frequencies of X, Y and X∧Y. This is because the rule X → Y has statistical significance only after X, Y and X∧Y occur "frequently enough". The frequency of X∧Y may also have to exceed a certain threshold to intrigue the user into checking whether there is a rule X → Y. Thus the occurrence frequency threshold of X∧Y is determined by the "statistical" significance or the user's "financial" interest, whichever is larger. Therefore we have the following utility principle.

Principle 4 (utility principle): If a set of measures is defined to reflect the interestingness of an association rule, then at least one measure m_i in the set should reflect its utility, i.e., m_i is a monotone increasing function with respect to P[X∧Y].

In practice, association rule mining may produce too many rules to be examined by a human. It is desirable to present only "the most interesting N rules" to the user for further examination. A synthetic measure with which we can sort the rules in descending order can help meet this need.

Principle 5 (top-N-rule principle): If a synthetic measure is defined to sort the rules for presenting the top N rules to the user, then it is desirable that this measure obeys principles 1-4.

Arithmetic complexity is an important issue in association rule mining: it is impossible to mine large databases without efficient algorithms, and enormous efforts have been devoted to reducing computation complexity. We thus define the efficiency principle as follows.
Principle 6 (efficiency principle): If a set of measures is defined to reflect the interestingness of an association rule, then it is desirable that thresholds on the measures can help reduce computation complexity.

Compared to principles 5 and 6, the principles of implication, correlation, novelty and utility are more fundamental. While there are potentially many measures that may satisfy those principles, the efficiency principle and the top-N-rule principle may help decide which measures will be most successful. In the next section we will see examples in this regard.

3 Evaluating measures

In the preceding section we elaborated six principles for characterizing association rules. Are they complete? Is there redundancy among them? If implication, correlation, novelty and utility are four fundamental principles, does that mean we need four measures to completely measure rules? So far we do not have a single measure that incorporates all four fundamental principles; what should we do then? What is the best measure? In this section we select five example measures and analyze them using the principles we set out. Most of these questions can be answered after the analysis is done. For convenience, we restate the definitions of the selected measures as follows:

  Support:     sup(X → Y)  = P[X∧Y]                   (3.1)
  Confidence:  conf(X → Y) = P[X∧Y] / P[X]            (3.2)
  Interest:    intr(X → Y) = P[X∧Y] / (P[X]P[Y])      (3.3)
  Conviction:  conv(X → Y) = P[X]P[¬Y] / P[X∧¬Y]      (3.4)
  Reliability: rel(X → Y)  = P[X∧Y]/P[X] − P[Y]       (3.5)

3.1 Support

Formula (3.1) defines support as the joint occurrence probability of X and Y in a transaction. As many researchers have pointed out, it is clearly a utility measure, so it satisfies principle 4. It also satisfies the efficiency principle (principle 6) because of its downward closure property [4]. Further examination shows that the other principles do not apply to support.

3.2 Confidence

Formula (3.2) defines confidence as the conditional probability of Y given X.
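Formulas (3.1)–(3.5) are all computable from the three probabilities P[X], P[Y] and P[X∧Y] alone. The following sketch (our own helper functions, not part of the original frameworks) collects them in one place; note that conviction uses P[X∧¬Y] = P[X] − P[X∧Y]:

```python
def support(p_x, p_y, p_xy):
    """(3.1) sup(X -> Y) = P[X ^ Y]."""
    return p_xy

def confidence(p_x, p_y, p_xy):
    """(3.2) conf(X -> Y) = P[X ^ Y] / P[X]."""
    return p_xy / p_x

def interest(p_x, p_y, p_xy):
    """(3.3) intr(X -> Y) = P[X ^ Y] / (P[X] P[Y])."""
    return p_xy / (p_x * p_y)

def conviction(p_x, p_y, p_xy):
    """(3.4) conv(X -> Y) = P[X]P[~Y] / P[X ^ ~Y]; undefined when conf = 1."""
    return p_x * (1 - p_y) / (p_x - p_xy)

def reliability(p_x, p_y, p_xy):
    """(3.5) rel(X -> Y) = P[X ^ Y]/P[X] - P[Y]."""
    return p_xy / p_x - p_y
```

Swapping p_x and p_y in any of these calls gives the measure for the reverse rule Y → X, which is how the symmetry and asymmetry arguments in this section can be checked quickly.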
Here we prove that it satisfies the implication principle (principle 1):

  conf(X → Y) − conf(Y → X) = P[X∧Y]/P[X] − P[X∧Y]/P[Y]
                            = P[X∧Y](P[Y] − P[X]) / (P[X]P[Y])
                            > 0 when P[X] < P[Y].

Thus confidence is an implication measure, as expected. On the other hand, confidence does not reflect the correlation principle or the utility principle, and it has no closure property that can be used to reduce complexity [4]. It also does not satisfy the novelty principle. Here is the proof. Suppose P[Y] increases by δ_Y·P[Y] and P[X∧Y] correspondingly increases by δ_XY·P[X∧Y]. Then

  (1 + δ_XY)P[X∧Y]/P[X] − P[X∧Y]/P[X] = δ_XY·P[X∧Y]/P[X] ≥ 0.

This indicates that confidence is not inversely proportional to p = max{P[X], P[Y]} when P[X] < P[Y]. ▌

3.3 Interest

Formula (3.3) defines Interest. Interest was devised to capture the strength of the correlation between X and Y [4], so we need not prove that it reflects principle 2. Interestingly, we can prove that Interest satisfies the novelty principle as follows. Suppose P[X] increases by δ_X·P[X] and P[X∧Y] correspondingly increases by δ_XY·P[X∧Y], with δ_XY ≤ δ_X. Then

  (1 + δ_XY)P[X∧Y] / ((1 + δ_X)P[X]P[Y]) − P[X∧Y]/(P[X]P[Y])
    = (P[X∧Y]/(P[X]P[Y])) · ((1 + δ_XY)/(1 + δ_X) − 1)
    = (P[X∧Y]/(P[X]P[Y])) · (δ_XY − δ_X)/(1 + δ_X) ≤ 0.

This indicates that Interest is inversely proportional to p = max{P[X], P[Y]} when P[X] > P[Y]. By symmetry, Interest is also inversely proportional to p = max{P[X], P[Y]} when P[X] < P[Y]. ▌

An example can help explain how Interest reveals the novelty of a rule. Suppose P[X∧Y] = P[X] and P[Y] = 0.9. Then intr(X → Y) = P[X∧Y]/(P[X]P[Y]) ≈ 1.11, only a little higher than 1, even though the presence of X always signifies the occurrence of Y in this example. This is because the rule does not provide much novel information when P[X] (or P[Y]) is close to 1.

3.4 Conviction

Formula (3.4) defines conviction.
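As a quick numeric illustration of (3.4), take hypothetical probabilities of our own choosing with P[X] < P[Y] and positive correlation (P[X∧Y] = 0.3 > P[X]P[Y] = 0.24):

```python
p_x, p_y, p_xy = 0.4, 0.6, 0.3  # hypothetical, positively correlated

def conviction(p_a, p_b, p_ab):
    # conv(A -> B) = P[A]P[~B] / P[A ^ ~B], with P[A ^ ~B] = P[A] - P[A ^ B]
    return p_a * (1 - p_b) / (p_a - p_ab)

conv_xy = conviction(p_x, p_y, p_xy)  # = 0.4 * 0.4 / 0.1 = 1.6
conv_yx = conviction(p_y, p_x, p_xy)  # = 0.6 * 0.6 / 0.3 = 1.2

# Consistent with the implication principle: conv(X -> Y) > conv(Y -> X)
# here, where P[X] < P[Y] and the correlation is positive.
assert conv_xy > conv_yx
```

The derivation that follows shows why this asymmetry holds in general for positively correlated rules.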
It can be rewritten as

  conv(X → Y) = P[X](1 − P[Y]) / (P[X] − P[X∧Y]).

If conviction is a good implication measure, we must have conv(X → Y) − conv(Y → X) > 0 when P[X] < P[Y], as the implication principle requires. We have

  conv(X → Y) − conv(Y → X)
    = P[X](1 − P[Y])/(P[X] − P[X∧Y]) − P[Y](1 − P[X])/(P[Y] − P[X∧Y])
    = (P[Y] − P[X])(P[X∧Y] − P[X]P[Y]) / ((P[X] − P[X∧Y])(P[Y] − P[X∧Y])).

From the above, we can see that conv(X → Y) − conv(Y → X) > 0 when P[X] < P[Y] and X and Y are positively correlated (P[X∧Y] > P[X]P[Y]). Thus conviction can be used as an implication measure when we are only interested in positively correlated rules. The authors of [8] concluded that the correlation between conviction and the φ-coefficient is positive, so conviction satisfies the correlation principle and can be used as a correlation measure. As for the novelty principle, we have conv(X → Y) = P[X](1 − P[Y])/(P[X] − P[X∧Y]) ≈ 1 when P[Y] << 1 and P[X∧Y] << P[X]. Thus conviction does not do well with regard to principle 3.

3.5 Reliability

Formula (3.5) defines Reliability. It reflects the correlation between X and Y, as discussed in [6]. Reliability satisfies the implication principle when we are only interested in rules with positive correlation. The proof is as follows. To satisfy the implication principle, rel(X → Y) − rel(Y → X) must be greater than 0 when P[X] < P[Y]. We have

  rel(X → Y) − rel(Y → X)
    = (P[X∧Y]/P[X] − P[Y]) − (P[X∧Y]/P[Y] − P[X])
    = (P[Y] − P[X]) · (P[X∧Y]/(P[X]P[Y]) − 1).

Thus rel(X → Y) − rel(Y → X) > 0 when P[X] < P[Y] and P[X∧Y]/(P[X]P[Y]) > 1 (positively correlated). ▌

We now show that Reliability may not satisfy the novelty principle. Proof. Suppose P[X] increases by δ_X·P[X] and P[X∧Y] correspondingly increases by δ_XY·P[X∧Y]. The proof is completed in two steps. The first step is to prove that it satisfies the novelty principle when p = max{P[X], P[Y]} = P[X].
The second step is to prove that it satisfies the novelty principle when p = max{P[X], P[Y]} = P[Y].

Step 1: for p = max{P[X], P[Y]} = P[X],

  ((1 + δ_XY)P[X∧Y] / ((1 + δ_X)P[X]) − P[Y]) − (P[X∧Y]/P[X] − P[Y])
    = (P[X∧Y]/P[X]) · ((1 + δ_XY)/(1 + δ_X) − 1)
    = (P[X∧Y]/P[X]) · (δ_XY − δ_X)/(1 + δ_X) ≤ 0.

This indicates that Reliability is inversely proportional to p = max{P[X], P[Y]} when P[X] > P[Y].

Step 2: for p = max{P[X], P[Y]} = P[Y], suppose P[Y] increases by δ_Y·P[Y] and P[X∧Y] correspondingly increases by δ_XY·P[X∧Y]. Then

  ((1 + δ_XY)P[X∧Y]/P[X] − (1 + δ_Y)P[Y]) − (P[X∧Y]/P[X] − P[Y])
    = δ_XY·P[X∧Y]/P[X] − δ_Y·P[Y].

If δ_XY·P[X∧Y] / (δ_Y·P[X]P[Y]) > 1, then Reliability does not satisfy the novelty principle. The ratio δ_XY/δ_Y is always greater than 1, because the increase of X∧Y is caused by the increase of Y. This tells us that if X and Y are highly positively correlated, Reliability cannot satisfy the novelty principle; otherwise, it does. ▌

We summarize the discussion of this section in Table 1.

Table 1: Measures versus the four fundamental principles

  Measure     | Implication                    | Correlation | Novelty                        | Utility
  Support     |                                |             |                                | X
  Confidence  | X                              |             |                                |
  Interest    |                                | X           | X                              |
  Conviction  | X (when positively correlated) | X           |                                |
  Reliability | X (when positively correlated) | X           | X (when negatively correlated) |

A few conclusions can be drawn from Table 1.
1. No measure is absolutely better than the others for obtaining the top-N rule set.
2. When using a synthetic measure such as Reliability or Conviction, support is still needed as a utility measure, and Interest should still be used as a novelty measure in order to fully characterize rules.
3. Interest can serve not only as a good correlation measure but also as a good novelty measure; it is always 1 when the rule contains no novel information.
4. When Interest is used as a synthetic measure for ranking rules, confidence should be included in addition to support, because Interest is a poor measure for examining implication.
5.
While we may have three alternative frameworks for fully characterizing rules (support-confidence-interest, support-conviction-interest and support-reliability-interest), the support-confidence-interest framework is the best: the other two work well only when rules are positively correlated.

4 Most interesting rules

As the conclusions drawn in section 3 show, we do not have a synthetic measure that incorporates all four fundamental principles. Instead, a framework of three separate measures is needed to fully capture "all" properties of association rules. This "three-measure framework" corresponds to the three quantities P[X], P[Y] and P[X∧Y] needed to calculate the rules. Data mining has to deal with enormous amounts of data, so the complexity of the algorithms for generating interesting rules is a major concern. In the literature, techniques to reduce complexity can be classified into three categories. The first sets up thresholds for each interestingness measure and uses those thresholds to reduce the search space. The second defines a partial order on selected measures [11]; with the partial order, rules are classified into multiple classes, and algorithms are then employed to find the most interesting rule in each class. The algorithms in the third category find a synthetic measure and use it to sort the rule set, so that the top N rules can be presented to the end user. The first kind of algorithm has the drawback of generating too many rules when the thresholds are low, or losing important rules when the thresholds are high. The third kind has the drawback of requiring an appropriate synthetic measure, and in the preceding section we saw no good synthetic measure that fully captures all the properties. We have instead recommended three frameworks for evaluating the goodness of rules, each requiring three measures.
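The first category of techniques is straightforward to sketch. Assuming each candidate rule is summarized by the three quantities (P[X], P[Y], P[X∧Y]) — the rule statistics and thresholds below are illustrative, not from the paper — threshold pruning simply keeps the rules that clear every threshold. Note that a rule with higher support and confidence can still be pruned by an interest threshold:

```python
# Candidate rules as (name, P[X], P[Y], P[X ^ Y]); values are illustrative.
rules = [
    ("X->Y", 0.50, 0.250, 0.250),
    ("X->Z", 0.50, 0.875, 0.375),
]

MIN_SUP, MIN_CONF, MIN_INTR = 0.2, 0.4, 1.0  # illustrative thresholds

kept = []
for name, p_x, p_y, p_xy in rules:
    sup = p_xy                  # support (3.1)
    conf = p_xy / p_x           # confidence (3.2)
    intr = p_xy / (p_x * p_y)   # interest (3.3)
    if sup >= MIN_SUP and conf >= MIN_CONF and intr >= MIN_INTR:
        kept.append(name)

# X->Z has higher support (0.375) and confidence (0.75) than X->Y
# (0.25, 0.5), yet only X->Y survives: its interest is 2.0, while
# X->Z's interest is about 0.86, below MIN_INTR.
print(kept)  # ['X->Y']
```

This also illustrates the drawback noted above: the outcome is entirely at the mercy of the chosen thresholds.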
In [11] a partial order technique is used in the support-confidence framework to mine the most interesting rules in each class. It says that the lift (Interest) is monotone in both rule support and confidence, so the Interest order can be deduced from the partial order of the support-confidence framework. This contradicts our support-confidence-interest framework, because the Interest measure would be redundant if it could be deduced from the support-confidence framework. On further examination, we found the claim that, when confidence is fixed, lift is monotone in rule support to be incorrect. For example,

  conf(X → Y) = P[X∧Y]/P[X] = 0.08/0.1 = 0.04/0.05 = 0.088/0.11 = 0.8.

The rule support could be 0.08, 0.04 or 0.088 even though the confidence is fixed at 0.8. For the same reason, conviction is not monotone in support, so we cannot have a support-conviction framework. Another obvious example is given in Table 2 [7], where the rule X → Y, with (support, confidence) = (25%, 50%), comes before the rule X → Z, with (support, confidence) = (37.5%, 75%), but X → Y's Interest of 2 comes after X → Z's Interest of 0.86 if we sort the rules in ascending order.

Table 2: Interest does not follow the partial order in the support-confidence framework

  X: 1 1 1 1 0 0 0 0
  Y: 1 1 0 0 0 0 0 0
  Z: 0 1 1 1 1 1 1 1

5 Conclusions and future work

In this paper we proposed six principles for studying the interestingness measures of association rule mining. By applying the proposed principles to the analysis of five measures, we found that at most three measures are needed to fully characterize rule mining. While the support-confidence framework is not sufficient to rank rules, we proposed three alternatives that do so with no loss of implication, correlation, novelty and utility. Based on the proposed alternatives, we discussed possible techniques for obtaining the most interesting rules or the top N rules. In this paper we only discussed five measures; in the literature, numerous measures have been proposed to characterize correlation, implication and novelty.
For example, P. Tan et al. [8] compared ten measures (Laplace, Gini, RI, Interest, Conviction, etc.) with respect to the φ-coefficient and concluded that the IS measure most closely reflects the correlation property of a rule. We feel that, for ranking purposes, it is unnecessary to use the measure closest to the φ-coefficient to reflect the correlation property of a rule: any measure is fine as long as it is a monotone function of the φ-coefficient. Thus we chose Interest, Conviction and Reliability as the alternative measures for the correlation study. For the same reason, we chose support as the alternative measure for utility, and confidence and conviction as the alternatives for implication, etc. These choices are merely for the convenience of demonstrating the effectiveness of the six principles. In fact, the efficiency principle (principle 6) encourages us to investigate other alternatives so that rule mining can be done efficiently without loss of full characterization. For example, if we had a set of measures that can be computed more efficiently and that fully characterizes implication, correlation, novelty and utility, then it certainly should replace the three alternatives proposed in this paper. Our future work is thus to explore possible alternative frameworks and find the best one based on the efficiency principle.

References

[1] J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2001.
[2] J. Hipp, U. Guntzer and G. Nakhaeizadeh, "Algorithms for association rule mining – A general survey and comparison", ACM SIGKDD, 2000, pp. 58-64.
[3] Z. Zheng, R. Kohavi and L. Mason, "Real world performance of association rule algorithms", ACM SIGKDD, 2001.
[4] S. Brin, R. Motwani, and C. Silverstein, "Beyond market baskets: Generalizing association rules to correlations", ACM SIGMOD, 1997.
[5] S. Brin, R. Motwani, J. Ullman, and S. Tsur, "Dynamic itemset counting and implication rules for market basket data", ACM SIGMOD, 1997.
[6] K. Ahmed, N. El-Makky and Y. Taha, "A note on 'Beyond market baskets: Generalizing association rules to correlations'", ACM SIGKDD Explorations, Vol. 1, Issue 2, pp. 46-48, 2000.
[7] C. Aggarwal and P. Yu, "A new framework for itemset generation", ACM PODS, pp. 18-24, 1998.
[8] P. Tan and V. Kumar, "Interestingness measures for association patterns: A perspective", KDD'2000 Workshop on Postprocessing in Machine Learning and Data Mining, Boston, 2000.
[9] E. Cohen, "Programming in the 1990s: An Introduction to the Calculation of Programs", Springer-Verlag, ISBN 0-387-97382-6, 1990.
[10] R. Hilderman and H. Hamilton, "Knowledge discovery and interestingness measures: A survey", Technical Report CS 99-04, Computer Science Dept., University of Regina, 1999.
[11] R. Bayardo Jr. and R. Agrawal, "Mining the most interesting rules", ACM SIGKDD, pp. 145-154, 1999.