The Principles for Measuring Association Rules¹
Victor Shi, Shan Duanmu and William Perrizo
Computer Science Dept., North Dakota State University
Fargo, ND 58105
Abstract – This paper presents six principles for characterizing interestingness measures
of association rules. By applying the principles to the analysis of five measures, we found
that at most three measures are needed to fully characterize association rule mining.
While the support-confidence framework is not sufficient to rank rules, we propose three
alternatives that do so with no loss of implication, correlation, novelty, or utility. Based on
the proposed alternatives, we also discuss possible techniques for attaining the most
interesting rules.
1 Introduction
Association rule mining searches for interesting relationships among items in a given
data set. Such relationships are typically expressed as an association rule of the
form X → Y, where X and Y are sets of items. It can be read as: whenever a
transaction T contains X, it probably also contains Y. The probability is defined as the
percentage of transactions containing Y in addition to X with regard to the overall number
of transactions containing X. This probability is called confidence (or strength). While
the confidence measure represents the certainty of a rule, support is used to represent the
usefulness of the rule [1]. Formally, the support of a rule is defined as the percentage of
transactions containing both X and Y with regard to the number of transactions in the
database.
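As a concrete illustration of these two definitions, both quantities can be computed directly from a transaction list. The following sketch uses hypothetical transactions and item names, not data from the paper:

```python
# Sketch (not from the paper): computing support and confidence of a
# rule X => Y from a transaction list. Item names are hypothetical.

def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, X, Y):
    """P[X and Y] / P[X]: the certainty that X's presence implies Y."""
    return support(transactions, X | Y) / support(transactions, X)

transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"printer"},
    {"software"},
]
X, Y = {"computer"}, {"software"}
print(support(transactions, X | Y))    # 0.5
print(confidence(transactions, X, Y))  # 1.0
```

Here the rule X → Y has confidence 1.0 because every transaction containing "computer" also contains "software", while its support is 0.5 because half of all transactions contain both.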
The target of association rule mining is to find interesting rules. A rule is considered
interesting if its confidence and support exceed certain thresholds, which are generally
assumed to be given by domain experts. When mining association rules
there are two main problems: complexity and usefulness. The number of rules grows
exponentially with the number of items, and humans can handle only a small fraction of
them. Thus many algorithms use a support threshold to reduce the algorithmic complexity
and the number of generated rules [2, 3].
While the support-confidence framework has been widely used for measuring the
interestingness of association rules, it is known that the resulting rules may be misleading
[4-8]. A rule X → Y with high support and high confidence may still not indicate that X
and Y are dependent. The use of support and confidence thresholds for pruning may
obscure important rules, while many unimportant rules remain in the resulting
rule set. S. Brin et al [4, 5] proposed the chi-square test, interest, and conviction measures to
overcome the weaknesses of the support-confidence framework. K. Ahmed [6] defined the
reliability measure to fix the symmetry problems of Brin's framework, namely that
Interest gives the same value to rules X → Y and Y → X, and that Conviction
gives the same value to rules X → Y and ¬Y → ¬X.
¹ This work is partially supported by GSA Grant ACT# 96130308.
Giving the same value to rules X → Y and Y → X is a serious drawback. For
example, let X be BUYING_COMPUTER and Y be BUYING_SOFTWARE, with supports
P[X] = 0.2, P[Y] = 0.6, and P[X∧Y] = 0.2. By definition [4] the rules X → Y and
Y → X have the same Interest (0.2/(0.2 × 0.6) ≈ 1.67). This is against our intuition: we know a customer
will likely buy software when he buys a computer, but it is much less likely that a customer
will buy a computer when he buys software. In fact, in this example the confidence of rule
X → Y is 100%, much larger than the confidence of rule Y → X (33.3%). K.
Ahmed et al [6] claim that the reliability measure is a better choice than conviction because
Conviction gives the same value to rules X → Y and ¬Y → ¬X while Reliability
does not. An interesting question about this claim is why it should be bad for a measure to give the
same value to rules X → Y and ¬Y → ¬X; logically, X → Y is equivalent to
¬Y → ¬X [9].
In the literature, many measures [8, 10] have been proposed to capture the correlation
property of X and Y. Are there other properties that need to be captured? What measures
should be used together to fully characterize the interestingness properties of association
rules? Is there a synthetic measure that incorporates all the properties and thus can be
used to rank rules for presenting the top N rules to users? To the best of our knowledge,
few papers have addressed these questions together. In this paper we present six objective
principles for evaluating interestingness measures. With the proposed principles, we try
to answer the above questions based on the analysis of five interestingness measures. We
will also answer, by detailed analysis, the question of whether Reliability is a better
interestingness measure than Conviction.
The paper is organized as follows. We present the six principles in Section 2. In
Section 3 we choose five interestingness measures and compare them using the proposed
principles. Section 4 discusses possible pruning techniques. We conclude the paper in
Section 5.
2 Objective principles
An association rule X → Y can be interpreted as: the presence of itemset X in a
transaction implies the occurrence of itemset Y. Logically, X → Y can be defined
as ¬X ∨ Y [9], which can be rewritten as ¬(X ∧ ¬Y). Thus P[X∧Y]/P[X]
(confidence) and P[X]·P[¬Y]/P[X∧¬Y] (conviction) both should be good measures
with regard to implication.
Further, we impose a constraint on the implication measure: when P[X] < P[Y], the
implication of rule X → Y should be larger than the implication of rule Y → X.
This can be explained by the example in Section 1, where P[X] = 0.2, P[Y] = 0.6, and
P[X∧Y] = 0.2. The implication of BUY_COMPUTER → BUY_SOFTWARE should
definitely be larger than the implication of BUY_SOFTWARE → BUY_COMPUTER.
We will see in the next section that both confidence and conviction follow this
constraint. Thus we write the implication principle as follows.
Principle 1 (implication principle): If a set of measures is defined to reflect the
interestingness of an association rule X → Y, then at least one measure mᵢ(X → Y) in
the set should satisfy the constraint mᵢ(X → Y) > mᵢ(Y → X) when P[X] < P[Y].
To mine a rule X → Y, we assume there is some relationship between X and Y; it
makes no sense to say that we have a rule whose antecedent and consequent are
independent. The confidence measure has this drawback, though it satisfies the
implication principle. Many research efforts have been devoted to developing new
measures that reflect the strength of the correlation between the antecedent and the
consequent of a rule [9]. Theoretically, the covariance of X and Y best reflects the
correlation strength. In practice, we also hope that the measure has some sort of
closure property that can be utilized to reduce computational complexity. In general,
we only need the correlation measure to be proportional to the covariance.
Principle 2 (correlation principle): If a set of measures is defined to reflect the
interestingness of an association rule X → Y, then at least one measure mᵢ in the set
should be directly proportional to the covariance of X and Y.
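Since X and Y enter a rule as 0/1 indicator variables, their covariance reduces to P[X∧Y] − P[X]·P[Y]. A sketch with hypothetical numbers:

```python
# cov(X, Y) = E[XY] - E[X]E[Y] = P[X and Y] - P[X]P[Y] for the 0/1
# indicator variables of X and Y. Its sign gives the direction of
# the correlation.
def covariance(p_x, p_y, p_xy):
    return p_xy - p_x * p_y

print(covariance(0.2, 0.6, 0.2))    # ≈ 0.08: positively correlated
print(covariance(0.5, 0.5, 0.25))   # 0.0: independent
```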
When presenting rules to users, it is desirable that the rules contain new information.
Most efforts in this regard are devoted to removing redundant rules from the generated rule
set [1]; one of two rules is considered redundant if they are "too close" to each
other. Here we define novelty from a different perspective. In general, we say a rule is
less novel if it is closer to common knowledge. The difficulty is how to quantitatively
measure "closer to common knowledge". We define "common knowledge" to be the
rules we can get from the database when P[X] = 1 (or P[Y] = 1). This is reasonable,
since we are sure there is a rule X → Y if P[X] = 1, no matter how small P[Y] is, and
vice versa. The closer P[X] (or P[Y]) is to 1, the less novel the rule X → Y is. We
draw the following novelty principle.
Principle 3 (novelty principle): If a set of measures is defined to reflect the
interestingness of an association rule, then for a given P[X∧Y], at least one measure mᵢ
in the set should reflect its novelty. The novelty measure mᵢ should be inversely
proportional to p = max{P[X], P[Y]}.
In this paper we deliberately use P[X], P[Y], and P[X∧Y] instead of the occurrence
frequencies of X, Y, and X∧Y. This is because a rule X → Y has statistical significance
only after X, Y, and X∧Y occur "frequently enough". The frequency of X∧Y may have to
exceed a certain threshold before the user is interested in finding whether there is a rule
X → Y. Thus the occurrence frequency threshold of X∧Y is determined by the "statistical"
significance or the user's "financial" interest, whichever is larger. Therefore we have the
following utility principle.
Principle 4 (utility principle): If a set of measures is defined to reflect the
interestingness of an association rule, then at least one measure mᵢ in the set should
reflect its utility, i.e., mᵢ is a monotone increasing function with respect to P[X∧Y].
In practice, association rule mining may produce too many rules to be examined by
humans. It is desirable to present only "the most interesting N rules" to the user for further
examination. A synthetic measure by which we can sort the rules in descending order
can help meet this need.
Principle 5 (top-N-rule principle): If a synthetic measure is defined to sort the rules for
presenting the top N rules to the user, then it is desirable that this measure obey Principles
1-4.
Computational complexity is an important issue in association rule mining; it is
impossible to mine large databases without efficient algorithms. Enormous efforts in the
past have been devoted to reducing computational complexity. We thus define the efficiency
principle as follows.
Principle 6 (efficiency principle): If a set of measures is defined to reflect the
interestingness of an association rule, then it is desirable that thresholds on the measures
can help reduce computational complexity.
Compared to Principles 5 and 6, the principles of implication, correlation, novelty, and
utility are more fundamental. While there are potentially many measures that may satisfy
those principles, the efficiency principle and the top-N-rule principle may help decide which
measure will be most successful. In the next section we will see examples in this regard.
3 Evaluating measures
In the preceding section we elaborated six principles for characterizing association rules.
Are they complete? Is there redundancy among them? We say implication,
correlation, novelty, and utility are the four fundamental principles; does that mean we need
four measures to complete the measuring of rules? So far we do not have a single measure that
incorporates all four fundamental principles; what should we do then? What is the
best measure? In this section we analyze five example measures using the
principles we set. Most of these questions can be answered after the
analysis is done. For convenience, we restate the definitions of the selected measures as
follows.
Support
    sup(X → Y) = P[X∧Y]                          (3.1)
Confidence
    conf(X → Y) = P[X∧Y] / P[X]                  (3.2)
Interest
    intr(X → Y) = P[X∧Y] / (P[X]·P[Y])           (3.3)
Conviction
    conv(X → Y) = P[X]·P[¬Y] / P[X∧¬Y]           (3.4)
Reliability
    rel(X → Y) = P[X∧Y]/P[X] − P[Y]              (3.5)
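Read as functions of the three probabilities P[X], P[Y], and P[X∧Y], the five definitions can be sketched as follows (for conviction we expand P[¬Y] = 1 − P[Y] and P[X∧¬Y] = P[X] − P[X∧Y]):

```python
# The five measures (3.1)-(3.5) in terms of P[X], P[Y], P[X and Y].

def sup(p_x, p_y, p_xy):
    return p_xy                               # (3.1) support

def conf(p_x, p_y, p_xy):
    return p_xy / p_x                         # (3.2) confidence

def intr(p_x, p_y, p_xy):
    return p_xy / (p_x * p_y)                 # (3.3) interest

def conv(p_x, p_y, p_xy):
    # Undefined (infinite) when confidence is 1, i.e. p_xy == p_x.
    return p_x * (1 - p_y) / (p_x - p_xy)     # (3.4) conviction

def rel(p_x, p_y, p_xy):
    return p_xy / p_x - p_y                   # (3.5) reliability
```

For example, with P[X] = 0.5, P[Y] = 0.25, and P[X∧Y] = 0.25 these give support 0.25, confidence 0.5, interest 2.0, conviction 1.5, and reliability 0.25.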
3.1 Support
Formula (3.1) defines support as the joint occurrence probability of X and Y in a
transaction. As many researchers have pointed out, it is a utility measure; it
satisfies Principle 4. It also satisfies the efficiency principle (Principle 6) because of its
downward closure property [4]. Further examination shows that the other principles do not
apply to support.
3.2 Confidence
Formula (3.2) defines confidence as the conditional probability of Y given X. Here
we prove that it satisfies the implication principle (Principle 1):

conf(X → Y) − conf(Y → X)
= P[X∧Y]/P[X] − P[X∧Y]/P[Y]
= (P[X∧Y]/(P[X]·P[Y])) · (P[Y] − P[X]) > 0
when P[X] < P[Y]
Thus confidence is an implication measure, as we expected. Similarly, we know that
confidence cannot reflect the correlation principle or the utility principle, and it has no closure
property we can use to reduce complexity [4]. It also does not satisfy the novelty principle.
Here is the proof.
Suppose P[Y] increases by δ_Y·P[Y] and P[X∧Y] correspondingly increases by
δ_XY·P[X∧Y]. Then

(1 + δ_XY)·P[X∧Y]/P[X] − P[X∧Y]/P[X]
= (P[X∧Y]/P[X]) · δ_XY > 0

This indicates that confidence is not inversely proportional to p = max{P[X], P[Y]} when
P[X] < P[Y]. ▌
3.3 Interest
Formula (3.3) defines Interest. Interest was devised to capture the strength of the
correlation between X and Y [4], so we need not prove that it reflects Principle 2.
Interestingly, we can prove that Interest also satisfies the novelty principle, as follows.
Suppose P[X] increases by δ_X·P[X] and P[X∧Y] correspondingly increases by
δ_XY·P[X∧Y], the induced increase δ_XY being smaller than δ_X. Then

(1 + δ_XY)·P[X∧Y]/((1 + δ_X)·P[X]·P[Y]) − P[X∧Y]/(P[X]·P[Y])
= (P[X∧Y]/(P[X]·P[Y])) · ((1 + δ_XY)/(1 + δ_X) − 1)
= (P[X∧Y]/(P[X]·P[Y])) · ((δ_XY − δ_X)/(1 + δ_X)) < 0

This indicates that Interest is inversely proportional to p = max{P[X], P[Y]} when
P[X] > P[Y]. By symmetry, Interest is also inversely proportional to
p = max{P[X], P[Y]} when P[X] < P[Y]. ▌
An example can help explain how Interest reveals the novelty of a rule. Suppose
P[X] = P[Y] = P[X∧Y] = 0.9. Then intr(X → Y) = P[X∧Y]/(P[X]·P[Y]) ≈ 1.11, only a little
higher than 1, even though the presence of X always signifies the occurrence of Y in this
example. This is because the rule does not provide much novel information when
P[X] (or P[Y]) is close to 1.
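The arithmetic of this example is easy to verify (a one-off check, not the paper's code):

```python
# P[X] = P[Y] = P[X and Y] = 0.9: X always implies Y, yet interest
# barely exceeds 1 because the rule is close to common knowledge.
p = 0.9
intr = p / (p * p)           # interest = P[X and Y] / (P[X] P[Y])
conf = p / p                 # confidence = 1.0: X always implies Y
print(round(intr, 2), conf)  # 1.11 1.0
```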
3.4 Conviction
Formula (3.4) defines conviction. It can be rewritten as

conv(X → Y) = P[X]·(1 − P[Y]) / (P[X] − P[X∧Y])

If conviction is a good implication measure, we must have conv(X → Y) −
conv(Y → X) > 0 when P[X] < P[Y], as the implication principle requires.

conv(X → Y) − conv(Y → X)
= P[X]·(1 − P[Y])/(P[X] − P[X∧Y]) − P[Y]·(1 − P[X])/(P[Y] − P[X∧Y])
= (P[Y] − P[X])·(P[X∧Y] − P[X]·P[Y]) / ((P[X] − P[X∧Y])·(P[Y] − P[X∧Y]))

From the above, we can see conv(X → Y) − conv(Y → X) > 0 when P[X] < P[Y] and X and Y are
positively correlated (P[X∧Y] > P[X]·P[Y]). Thus conviction can be used as an
implication measure when we are only interested in positively correlated rules.
The authors of [8] concluded that the correlation between conviction and the φ
coefficient is positive. So conviction satisfies the correlation principle and can be used
as a correlation measure.
As for the novelty principle, we have

conv(X → Y) = P[X]·(1 − P[Y]) / (P[X] − P[X∧Y]) ≈ 1 when P[Y] ≪ 1 and P[X∧Y] ≪ P[X].

Thus conviction does not do well with regard to Principle 3.
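A numeric illustration of this limitation, with hypothetical probabilities satisfying the two conditions (P[Y] ≪ 1 and P[X∧Y] ≪ P[X]):

```python
# Conviction stays near 1 here no matter how novel the rule is.
def conv(p_x, p_y, p_xy):
    return p_x * (1 - p_y) / (p_x - p_xy)

print(conv(0.3, 0.05, 0.01))    # ≈ 0.98
print(conv(0.3, 0.02, 0.005))   # ≈ 0.997
```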
3.5 Reliability
Formula (3.5) defines Reliability. It reflects the correlation between X and Y, as discussed
in [6]. Reliability satisfies the implication principle when we are only interested
in rules with positive correlations. The proof is given as follows.
To satisfy the implication principle, rel(X → Y) − rel(Y → X) must be greater than 0
when P[X] < P[Y]. We have

rel(X → Y) − rel(Y → X)
= (P[X∧Y]/P[X] − P[Y]) − (P[X∧Y]/P[Y] − P[X])
= (P[Y] − P[X]) · (P[X∧Y]/(P[X]·P[Y]) − 1)

Thus we have rel(X → Y) − rel(Y → X) > 0 when P[X] < P[Y] and
P[X∧Y]/(P[X]·P[Y]) > 1 (positively correlated). ▌
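With the positively correlated example from Section 1 (P[X] = 0.2, P[Y] = 0.6, P[X∧Y] = 0.2), the required asymmetry is easy to verify numerically:

```python
# rel(X => Y) exceeds rel(Y => X) when P[X] < P[Y] and the pair is
# positively correlated (P[X and Y] > P[X]P[Y]).
def rel(p_x, p_y, p_xy):
    return p_xy / p_x - p_y

p_x, p_y, p_xy = 0.2, 0.6, 0.2
assert p_xy > p_x * p_y          # positively correlated
print(rel(p_x, p_y, p_xy))       # 0.4     (X => Y)
print(rel(p_y, p_x, p_xy))       # ≈ 0.133 (Y => X)
```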
We prove that Reliability may not satisfy the novelty principle.
Proof. The proof can be completed in two steps. The first step considers
p = max{P[X], P[Y]} = P[X]; the second considers p = max{P[X], P[Y]} = P[Y].

Step 1: p = max{P[X], P[Y]} = P[X]. Suppose P[X] increases by δ_X·P[X] and P[X∧Y]
correspondingly increases by δ_XY·P[X∧Y], the induced increase δ_XY being smaller than δ_X.
Then

((1 + δ_XY)·P[X∧Y]/((1 + δ_X)·P[X]) − P[Y]) − (P[X∧Y]/P[X] − P[Y])
= (P[X∧Y]/P[X]) · ((1 + δ_XY)/(1 + δ_X) − 1)
= (P[X∧Y]/P[X]) · ((δ_XY − δ_X)/(1 + δ_X)) < 0

This indicates that Reliability is inversely proportional to p = max{P[X], P[Y]} when
P[X] > P[Y].

Step 2: p = max{P[X], P[Y]} = P[Y]. Suppose P[Y] increases by δ_Y·P[Y] and P[X∧Y]
correspondingly increases by δ_XY·P[X∧Y]. Then

((1 + δ_XY)·P[X∧Y]/P[X] − (1 + δ_Y)·P[Y]) − (P[X∧Y]/P[X] − P[Y])
= (P[X∧Y]/P[X])·δ_XY − P[Y]·δ_Y

If (P[X∧Y]/(P[X]·P[Y])) · (δ_XY/δ_Y) > 1, this difference is positive and Reliability does
not satisfy the novelty principle. The ratio δ_XY/δ_Y is in general greater than 1, due to the
fact that δ_XY is induced by δ_Y. This tells us that if X and Y are highly positively
correlated, Reliability cannot satisfy the novelty principle. Otherwise, it does. ▌
We summarize the discussion of this section in Table 1.

Table 1: Principles satisfied by the five measures

Measure     | Implication                    | Correlation | Novelty                     | Utility
------------|--------------------------------|-------------|-----------------------------|--------
Support     |                                |             |                             | X
Confidence  | X                              |             |                             |
Interest    |                                | X           | X                           |
Conviction  | X (when positively correlated) | X           |                             |
Reliability | X (when positively correlated) | X           | X (when negatively related) |
A few conclusions can be drawn from Table 1.
1. No measure is absolutely better than the others for obtaining the top-N rule set.
2. When using a synthetic measure such as reliability or conviction, support is still an
important utility measure, and Interest should still be used as a novelty measure, in order to
fully characterize rules.
3. Interest can be used not only as a good correlation measure but also as a good novelty
measure. It is always 1 when the rule contains no novel information.
4. When Interest is used as a synthetic measure for ranking rules, confidence should
also be included in addition to support, because Interest is a poor measure for
examining implication.
5. While we have three alternative frameworks for fully characterizing rules (support-confidence-interest, support-conviction-interest, support-reliability-interest), the support-confidence-interest framework is best. The other two work well only when rules are
positively correlated.
4 Most interesting rules
As we can see from the conclusions drawn in Section 3, we do not have a synthetic
measure that incorporates all four fundamental principles. Instead, a framework of
three separate measures is needed to fully capture "all" properties of association rules.
This "three-measure framework" corresponds to the three random variables P[X],
P[Y], and P[X∧Y] used for calculating the rules.
Data mining has to deal with enormous amounts of data, so the complexity of
algorithms for generating interesting rules is a major concern. In the literature,
techniques to reduce complexity can be classified into three categories. The first sets up
thresholds for each measure of interestingness and uses those thresholds to reduce the
search space. The second defines a partial order on selected measures [11]; with
the partial order, rules are classified into multiple classes, and algorithms are then employed
to find the most interesting rule in each class. The algorithms in the third category
find a synthetic measure and use it to sort the rule set, so that the top N rules can be presented
to the end user.
The first kind of algorithm has the drawback of generating too many rules when the
thresholds are low, or of losing important rules when the thresholds are high. The third kind
has the drawback of requiring an appropriate synthetic measure, and in the
preceding section we saw no good synthetic measure that can fully capture all the
properties.
We have recommended three frameworks to evaluate the goodness of rules; each
framework requires three measures. In [11] a partial-order technique is used in the
support-confidence framework to mine the most interesting rules in each class. It says that the lift
(interest) is monotone in both rule support and confidence, so the interest order can be
deduced from the partial order of the support-confidence framework. This contradicts our
support-confidence-interest framework, because the interest measure would be redundant if
it could be deduced from the support-confidence framework. By further examination, we
found the claim that lift is monotone in rule support when confidence is fixed to be
incorrect. For example,

conf(X → Y) = P[X∧Y]/P[X] = 0.8 = 0.08/0.1 = 0.04/0.05 = 0.088/0.11

The rule support could be 0.08, 0.04, or 0.088, even though the confidence is fixed at 0.8. For
the same reason, conviction is not monotone in support, so we cannot have a support-conviction framework. Another obvious example is introduced in Table 2 [7], where rule
X → Y's (support, confidence) = (25%, 50%) comes before rule X → Z's (support,
confidence) = (37.5%, 75%), but X → Y's interest = 2 comes after X → Z's interest = 0.86, if
we sort them in ascending order.
Table 2: Interest does not follow the partial order in the support-confidence framework

X | Y | Z
--|---|--
1 | 1 | 0
1 | 1 | 1
1 | 0 | 1
1 | 0 | 1
0 | 0 | 1
0 | 0 | 1
0 | 0 | 1
0 | 0 | 1
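The figures quoted above can be reproduced from the eight transactions of Table 2; each tuple below encodes one row (X, Y, Z) of the table:

```python
# Recomputing Table 2's supports, confidences, and interests.
rows = [
    (1, 1, 0), (1, 1, 1), (1, 0, 1), (1, 0, 1),
    (0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 1),
]
n = len(rows)
p_x  = sum(x for x, y, z in rows) / n        # 0.5
p_y  = sum(y for x, y, z in rows) / n        # 0.25
p_z  = sum(z for x, y, z in rows) / n        # 0.875
p_xy = sum(x * y for x, y, z in rows) / n    # 0.25
p_xz = sum(x * z for x, y, z in rows) / n    # 0.375

# X => Y: (support, confidence) = (25%, 50%), interest = 2
print(p_xy, p_xy / p_x, p_xy / (p_x * p_y))            # 0.25 0.5 2.0
# X => Z: (support, confidence) = (37.5%, 75%), interest ≈ 0.86
print(p_xz, p_xz / p_x, round(p_xz / (p_x * p_z), 2))  # 0.375 0.75 0.86
```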
5 Conclusions and future work
In this paper we proposed six principles for studying the interestingness measures of
association rule mining. By applying the proposed principles to the analysis of five
measures, we found that at most three measures are needed to fully characterize rule
mining. While the support-confidence framework is not sufficient to rank rules, we proposed
three alternatives that do so with no loss of implication, correlation, novelty, or utility.
Based on the proposed alternatives, we discussed possible techniques for attaining the most
interesting rules, or top N rules.
In this paper we discussed only five measures. In the literature, numerous measures
have been proposed to characterize correlation, implication, and novelty. For example, P. Tan
et al [8] compared 10 measures (Laplace, Gini, RI, Interest, Conviction, etc.) with respect
to the φ-coefficient and concluded that the IS measure most closely reflects the correlation
property of a rule. We feel that, for the ranking purpose, it is unnecessary to use the
measure closest to the φ-coefficient to reflect the correlation property of a rule;
any measure is fine as long as it is a monotone function of the φ-coefficient. Thus we
chose Interest, conviction, and reliability as the alternative measures for the correlation study.
For the same reason we chose support as the alternative measure for utility, confidence
and conviction as the alternatives for implication, etc. The choices are merely for the
convenience of demonstrating the effectiveness of the six principles. In fact, the efficiency
principle (Principle 6) encourages us to investigate other alternatives so that rule mining
can be done efficiently without loss of full characterization. For example, if we had
a set of measures that can be computed more efficiently and that fully characterizes
implication, correlation, novelty, and utility, then it certainly should replace the
three alternatives proposed in this paper. Thus our future work is to explore possible
alternative frameworks and find the best one based on the efficiency principle.
References
[1] J. Han and M. Kamber, “Data Mining – concepts and techniques”, Morgan Kaufmann
Publishers, 2001.
[2] J. Hipp, U. Guntzer and G. Nakhaeizadeh, “Algorithms for association rule mining –
A general survey and comparison”, ACM SIGKDD, 2000, pp.58-64.
[3] Z. Zheng, R. Kohavi and L. Mason, “Real world performance of association rule
mining”, ACM SIGKDD, 2001.
[4] S. Brin, R. Motwani, and C. Silverstein, “Beyond Market Baskets: Generalizing
association rules to correlations”, ACM SIGMOD, 1997.
[5] S. Brin, R. Motwani, J. Ullman, and S. Tsur, “Dynamic Itemset Counting and
Implication rules for Market Basket Data”, ACM SIGMOD, 1997.
[6] K. Ahmed, N. El-Makky and Y. Taha, "A note on 'Beyond Market Baskets:
Generalizing association rules to correlations'", ACM SIGKDD Explorations, Vol. 1,
Issue 2, pp. 46-48, 2000.
[7] C. Aggarwal, and P. Yu, “A new framework for Itemset Generation”, ACM PODS,
pp. 18-24, 1998.
[8] P. Tan, and V. Kumar, “Interestingness Measures for Association Patterns: A
Perspective”, KDD’2000 Workshop on Postprocessing in Machine Learning and Data
Mining, Boston, 2000.
[9] E. Cohen, “Programming in the 1990s: An introduction to the calculation of
programs”, Springer-Verlag, ISBN 0-387-97382-6, 1990.
[10] R. Hilderman, and R. Hamilton, “Knowledge discovery and interestingness
measures: A survey”, Technical Report CS 99-04, C.S. Dept., Univ. of Regina, 1999.
[11] R. Bayardo Jr., and R. Agrawal, “Mining the most interesting rules”, ACM
SIGKDD, pp.145-154, 1999.