MLDM 2003, Lecture Notes in Computer Science

Discovering Association Patterns based on Mutual
Information
Bon K. Sy1
1 Queens College/CUNY
Computer Science Department
Flushing NY 11367
U.S.A.
bon@bunny.cs.qc.edu
Abstract. Identifying and expressing data patterns in the form of association rules is a commonly used technique in data mining. Typically, the discovery of association rules is based on two criteria: support and confidence. In this paper we will briefly discuss the insufficiency of these two criteria, and argue for the importance of including interestingness/dependency as a criterion for (association) pattern discovery. From the practical computational perspective, we will show how the proposed criterion, grounded on interestingness, can be used to improve the efficiency of the pattern discovery mechanism. Furthermore, we will show a probabilistic inference mechanism that provides an alternative route to pattern discovery. An example illustration and a preliminary study evaluating the proposed approach will be presented.
1 Introduction
In data mining, an association rule is typically expressed in the form A -> B. But the definition of an association rule may vary slightly among different disciplines and applications. For example, in philosophical logic [1] an association rule over two binary-valued logic variables, A -> B with 80% certainty, could mean that 20% of the instances in its “frame of discernment” bear a relationship of (A: True, B: False). In other words, the certainty factor is a measure of the “truthfulness” of a rule in the world. In uncertain reasoning, on the other hand, A -> B with 80% certainty means there is an 80% chance that B will happen if A happens; i.e., Pr(B|A) = 0.8. Yet in data mining an association rule A -> B could be associated with two measures: support and confidence, where support is a measure of the significance of the presence of (A B) in the sample population of interest, while confidence is a measure of the antecedence/consequence relationship, much as in uncertain reasoning. An example of such an association rule in data mining could be: 80% of the moviegoers for “The Lord of the Rings” went on to buy the book, and such a population accounts for 20% of the entire sample population.
Support and confidence are two measures widely used in data mining with the objective of detecting data patterns that exhibit antecedence/consequence relationships. However, these two measures also present conceptual and computational challenges. Let’s consider the above example. Let A=1 denote the moviegoers watching “The Lord of the Rings”, and B=1 denote the buyers of the book. Ideally, from the perspective of the utility of an association rule, we want both Pr(A=1 ∩ B=1) and Pr(B=1|A=1) to be high. Consider the case where Pr(A=1) = Pr(B=1) = 0.8 and Pr(A=1 ∩ B=1) = 0.64. We can easily see that the antecedence/consequence relationship Pr(B=1|A=1) = 0.8 is quite misleading, since A and B are independent of each other at the event level (because Pr(B=1|A=1) = Pr(B=1) = 0.8). Even subtler, an association rule A -> B manifests an antecedence/consequence relationship that suggests a time precedence relationship; i.e., B happens after A. But let’s suppose the population is the English literature students who have an assignment on writing critiques of the story. Let’s assume C=1 represents the English literature students with such an assignment. It is then no surprise to expect that the antecedence/consequence relationships are indeed C -> A and C -> B. And since watching the movie prior to reading the book could save time on getting an idea of the story, it is natural that students may watch the movie first! But from the observed data, if we do not know about C=1, we may end up concluding A -> B, a fallacy about the situation. This situation is referred to as a spurious association [2], and has been known for a long time in the philosophy community. It is well known that a fallacy due to spurious association can only be disproved; we may never be able to prove the truthfulness of an association rule that manifests an antecedence/consequence relationship. Nevertheless, it is possible to examine the “interestingness” of an association, i.e., whether the events in a data pattern are independent of each other or not [3], [4].
The objective of this paper is to investigate information-statistical criteria for discovering data patterns that exhibit interesting associations. Our primary goal is to introduce an information-statistical measure that bears an elegant statistical convergence property for discovering association patterns. The proposed approach is more than just adding another constraint. We will show how it can lead to a reduction in computational cost based on probabilistic inference of high order patterns from low order patterns.
In section 2 we will first formulate the problem and conduct an analysis of the complexity of discovering association rules/patterns. In section 3 the state-of-the-art approach for discovering association rules/patterns using the a priori property will be discussed. In section 4 we will present a novel probabilistic inference approach based on model abstraction for reasoning about high order association patterns from low order association patterns. In section 5 a modified a priori algorithm that integrates the probabilistic inference approach will be detailed. To evaluate the effectiveness of the proposed approach, a preliminary study using a data set about cereals will be reported in section 6. In section 7 we will summarize the contributions of this research as the conclusion of this paper.
2 Problem formulation and analysis
Let X = {x1, x2, ..., xn} be a set of n categories, and D = {D1, D2, … Dn} be the domain set of the corresponding categories. A domain Di is a mutually exclusive set of items of category xi, including a null value, if necessary, to indicate no item selection from the category. For the sake of discussion, we will assume each domain carries m items; i.e., |D1| = |D2| = … = |Dn| = m.
An item set transaction is represented by Di x Dj x … x Dk, where {Di, Dj, … Dk} is a subset of D. Let T = {t1 … tn} be the set of all possible transactions. An association pattern is a transaction with at least two items. Let A = {a1 …. av} be the set of all possible association patterns. It is not difficult to find that the number of all possible association patterns is v = ∑k=2..n m^k (n,k) = (m+1)^n – mn – 1, where (n,k) = n!/k!(n-k)!. Consider a case of 11 categories (i.e., n = 11) and m = 4: the number of possible association patterns is 5^11 – 45. In other words, the number of association patterns grows exponentially with the number of categories [5].
A k-tuple association pattern (k > 1) is an item set over k categories. This k-tuple association pattern will also be referred to as a pattern of kth order. For a given k-tuple association pattern, there are ∑i=1..k-1 (k,i) possibilities for deriving an association rule.
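As a quick sanity check on the two counts just given, here is a short computation; Python is used for illustration here and throughout:

```python
from math import comb

n, m = 11, 4
# Number of possible association patterns: sum over pattern orders k = 2..n
v = sum(m ** k * comb(n, k) for k in range(2, n + 1))
assert v == (m + 1) ** n - m * n - 1   # closed form; 5^11 - 45 = 48828080 here
print(v)

# Number of rules derivable from one k-tuple pattern (k = 3): choose which
# non-empty proper subset of the items forms the antecedent
k = 3
print(sum(comb(k, i) for i in range(1, k)))   # 6
```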
Since we have already mentioned the issue of spurious association, this paper will focus only on discovering significant association patterns rather than association rules. Even so, we need to answer a fundamental question: what properties are desirable for a significant association pattern? In other words, what association patterns should be considered significant?
In this research, an association pattern ai consisting of items {i1, i2, … ip} is considered α-significant if it satisfies the following conditions:
1. The support for ai, defined as Pr(ai), is at least α; i.e., Pr(ai) ≥ α. (C1)
2. The interdependency of {i1, i2, … ip}, as measured by the mutual information measure MI(ai) = Log2 [Pr(i1, i2, … ip)/Pr(i1)Pr(i2)… Pr(ip)], is significant. (C2)
As reported elsewhere [6], [7], the mutual information measure asymptotically converges to χ2. A convenient way to determine whether MI(ai) is significant is to compare the mutual information measure with a χ2 measure; i.e., MI(ai) is significant if MI(ai) ≥ β(χ2)^γ, where β and γ are some scaling factors and, due to Pearson, χ2 = ∑i (oi – ei)^2/ei.
In other words, to determine whether any one of the (m+1)^n – mn – 1 association patterns is significant, we test it against the above two conditions. Clearly this is computationally prohibitive if we have to test all the patterns against the two conditions. Fortunately, the famous a priori property [8], [9] allows us to prune away patterns in a lattice hierarchy that are extensions of a pattern that does not survive the test against the first condition (C1).
3 State-of-the-art: A priori and Mutual Information Measure
An association pattern is basically a collection of items. Suppose there is a 2-tuple association pattern a1 = {d1, d2}, where d1 is an item element of the set D1 and d2 is an item element of the set D2. We can consider an association pattern as an event in a probability space with random variable x1 assuming the value d1 and x2 assuming the value d2; i.e., Pr(a1) = Pr(x1:d1 ∩ x2:d2). An extension ea1 of a pattern a1 is a pattern consisting of an item set D’ that is a proper superset of {d1, d2}; i.e., {d1, d2} ⊂ D’. It is not difficult to observe the property Pr(a1) ≥ Pr(ea1), since Pr(a1) = ∑D’–{d1,d2} Pr(ea1). Therefore, if a1 is not α-significant because Pr(a1) < α, ea1 cannot be α-significant, thus facilitating a pruning criterion during the process of identifying significant association patterns --- the essence of the a priori property.
On the other hand, if the mutual information measure of a1 is not significant, it does not follow that an extension of a1 is not significant. Consider ea1 = {x1 x2 x3}: if Pr(x1:d1 ∩ x2:d2 ∩ x3:d3)/Pr(d3) > Pr(x1:d1 ∩ x2:d2), Pr(x1:d1 ∩ x2:d2 ∩ x3:d3) > Pr(x1:d1)Pr(x2:d2)Pr(x3:d3), and Pr(x1:d1 ∩ x2:d2) > Pr(x1:d1)Pr(x2:d2), then MI(ea1) > MI(a1). Furthermore, it is possible that an association pattern satisfies (C1) but fails (C2) (the mutual information measure). Therefore, (C2) provides a complementary pruning criterion for discovering significant association patterns.
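A small numeric illustration of why (C2) cannot be pruned the way (C1) can; the distribution below is contrived for this purpose:

```python
from math import log2

# a and b are pairwise independent (MI = 0), yet adding c yields a strongly
# interdependent extension: failing (C2) at order 2 says nothing about order 3.
pr_a, pr_b, pr_c = 0.5, 0.5, 0.25
pr_ab, pr_abc = 0.25, 0.25          # consistent with a valid joint distribution
mi_ab = log2(pr_ab / (pr_a * pr_b))             # 0.0 -- not significant
mi_abc = log2(pr_abc / (pr_a * pr_b * pr_c))    # 2.0 -- highly interdependent
print(mi_ab, mi_abc)
```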
In the process of deriving significant association patterns, we need one pass over all the transaction records to obtain the marginal probabilities required for the mutual information measure. To identify second order (2-tuple) association patterns, we need to permute every pair of items in a transaction record and keep track of the frequency information in the same first pass [10]. The frequency information is then used to derive the joint probability information needed for the mutual information measure and for determining α-significance. At the end of the pass, we can then determine which association patterns --- as well as the patterns that are their extensions --- to discard before the commencement of the next pass for identifying third-order patterns.
In each pass, the complexity is proportional to the number of transaction records. In many applications, such as on-line shopping, the number of transaction records tends to be very large. In such a case, the computational cost for deriving significant association patterns could be high even though the complexity is linear with respect to the number of transaction records. A fundamental question is whether we could deduce high order association patterns from low order patterns without the need to repetitively scan the transaction records. This is particularly so should the number of transaction records be large. To answer this question, we explore a novel model abstraction process that permits probabilistic inference of high order association patterns.
4 Model abstraction for probabilistic inference
Let’s consider a case of 11 discrete random variables (categories) {x1, … x11} in which the domain of each variable consists of 4 states; i.e., xi can assume a value from the set {1 2 3 4} for i = 1 .. 11. Let’s further assume (x1:1 x2:1), (x1:1 x3:1), and (x2:1 x3:1) have been identified as significant association patterns. We want to know whether the extension (x1:1 x2:1 x3:1) is a significant association pattern. A naïve approach is to conduct another scanning pass to obtain the frequency information for the α-significance test and the mutual information measure.
By the time (x1:1 x2:1), (x1:1 x3:1), and (x2:1 x3:1) are determined to be significant association patterns, we would have already obtained the information of all marginal probabilities Pr(xi) (for i = 1 .. 11) and the joint probabilities Pr(x1:1 ∩ x2:1), Pr(x1:1 ∩ x3:1), and Pr(x2:1 ∩ x3:1). Let’s assume Pr(x1:1) = 0.818, Pr(x2:1) = 0.909, Pr(x3:1) = 0.42, Pr(x1:1 ∩ x2:1) = 0.779, Pr(x1:1 ∩ x3:1) = 0.364, and Pr(x2:1 ∩ x3:1) = 0.403. Pr(x1:1 ∩ x2:1 ∩ x3:1) is the only missing information needed for determining whether (x1:1 x2:1 x3:1) is a significant association pattern. Suppose the value of α used for the α-significance test is 0.2. If (x1:1 x2:1 x3:1) is a significant association pattern, it must satisfy the following conditions:
Pr(x1:1) = 0.818 → ∑x2,x3 Pr(x1:1 ∩ x2 ∩ x3) = 0.818
Pr(x2:1) = 0.909 → ∑x1,x3 Pr(x1 ∩ x2:1 ∩ x3) = 0.909
Pr(x3:1) = 0.42 → ∑x1,x2 Pr(x1 ∩ x2 ∩ x3:1) = 0.42
Pr(x1:1 ∩ x2:1) = 0.779 → ∑x3 Pr(x1:1 ∩ x2:1 ∩ x3) = 0.779
Pr(x1:1 ∩ x3:1) = 0.364 → ∑x2 Pr(x1:1 ∩ x2 ∩ x3:1) = 0.364
Pr(x2:1 ∩ x3:1) = 0.403 → ∑x1 Pr(x1 ∩ x2:1 ∩ x3:1) = 0.403
Pr(x1:1 ∩ x2:1 ∩ x3:1) ≥ 0.2 → Pr(x1:1 ∩ x2:1 ∩ x3:1) – S = 0.2,
where S is a non-negative slack variable
∑x1,x2,x3 Pr(x1 ∩ x2 ∩ x3) = 1
Although the domain of each variable x1, x2, and x3 consists of 4 states, we are interested in only one particular state of each variable; namely, x1 = 1, x2 = 1, and x3 = 1. We can define a new state 0 to represent the irrelevant states {2, 3, 4}. In other words, the above example consists of only 2^3 = 8 joint probability terms rather than 4^3 = 64 joint terms, thus reducing the number of dimensions. In the above example, there are eight equality constraints and nine unknowns (one for each joint probability term plus a slack variable). It is an underdetermined algebraic system that has multiple solutions, where a solution is a vector of size 9. Among all the solutions, one corresponds to the true distribution that we are interested in. As discussed in our previous research [11], the underdetermined algebraic system provides a basis for formulating an optimization problem that aims at maximizing the likelihood estimate of the statistical distribution of the data.
Although the probabilistic inference approach just demonstrated offers an alternative to scanning the transaction records, there are three related questions about its utility. First, under what circumstances is the probabilistic inference approach more attractive in comparison to a straightforward scan? Second, how feasible and how expensive is it computationally to solve the optimization problem? Third, how accurate is the estimate of the joint probability information (for example, Pr(x1:1 ∩ x2:1 ∩ x3:1) in the above case)?
To answer the first question, we first note that probabilistic inference is applied only to the high order association patterns that we are interested in. Unless the order of the association patterns is relatively low, the process of probabilistic inference has to be applied one at a time to each association pattern of interest. Therefore, the probabilistic inference approach will have a distinct advantage over a straightforward scan when (1) the number of transaction records is large, (2) each transaction record consists of a large number of categories, and (3) only a few high order association patterns are of interest.
As we reported elsewhere [11], the problem of probabilistic inference formulated as an optimization problem under the principle of minimum biased information can be solved quite efficiently. In practice, we can solve an optimization problem with some 300 variables within a minute using a 450 MHz personal computer. For data mining problems, some 300 variables translates to 8th-order association patterns (i.e., trunc(Log2 300) = 8). In practice, it is highly unlikely to have significant association patterns with an order of seven or above.
The third question is perhaps the most challenging one. From the perspective of computational geometry, probabilistic inference is a search process in a high dimensional probability sub-space defined by the (in)equality constraints [12]. The error percentage, defined by the normalized distance between the estimated optimal joint probability and the true joint probability, increases as the order of the association patterns increases. This is because the joint probability (support) of the association patterns decreases as the order increases, thus increasing the error sensitivity. As a result, when the estimated joint probability of an association pattern is used in the mutual information measure to determine its significance, the asymptotic convergence of the mutual information measure towards the chi-square distribution needs to be calibrated. As reported elsewhere [6], [7], the mutual information measure of two random variables (x1 x2) has the following asymptotic convergence property: I(x1; x2) -> χ2(K-1)(J-1)(1-α)/2N, where χ2(K-1)(J-1)(1-α) denotes the (1-α) quantile of the chi-square distribution with (K-1)(J-1) degrees of freedom, K and J are the numbers of states of x1 and x2 respectively, N is the sample population size, and α is the significance level. The calibration for adjusting the error sensitivity of the joint probability, as it is used in calculating the mutual information measure of a high order association pattern MI(x1 x2 … xn) at the event level, is shown below:
MI(x1, x2 … xn) ≥ [1/Pr(x1, x2 … xn)] (χ2/2N) (Ê/E′)^(o/2)    (1)

where MI(x1, x2 … xn) = Log2 Pr(x1 x2 … xn)/Pr(x1)Pr(x2)…Pr(xn)
N = sample population size
χ2 = Pearson chi-square test statistic defined as ∑i (oi – ei)^2/ei,
with oi = observed count = N Pr(x1 x2 … xn)
ei = expected count under the assumption of independence = N Pr(x1)Pr(x2)…Pr(xn)
Ê = expected entropy measure of the estimated probability model
E′ = maximum possible entropy of the estimated probability model
o = order of the association pattern (i.e., n in this case)
Referring to the previous example, the optimal solution that maximizes the likelihood estimate under the assumption of minimum biased information is [Pr(x1:0 ∩ x2:0 ∩ x3:0) = 0.035, Pr(x1:0 ∩ x2:0 ∩ x3:1) = 0.017, Pr(x1:0 ∩ x2:1 ∩ x3:0) = 0.091, Pr(x1:0 ∩ x2:1 ∩ x3:1) = 0.039, Pr(x1:1 ∩ x2:0 ∩ x3:0) = 0.039, Pr(x1:1 ∩ x2:0 ∩ x3:1) = 0, Pr(x1:1 ∩ x2:1 ∩ x3:0) = 0.415, Pr(x1:1 ∩ x2:1 ∩ x3:1) = 0.364]. The expected entropy measure of the estimated probability model is Ê = -∑x1,x2,x3 Pr(x1 ∩ x2 ∩ x3) Log2 Pr(x1 ∩ x2 ∩ x3) = 2.006223053. The maximum possible entropy E′ of the estimated probability model corresponds to the uniform distribution over the 2^3 = 8 joint states; i.e., E′ = Log2 8 = 3.
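The entropy figures above, and the adjusted criterion of equation (1), can be recomputed from this solution vector. In the sketch below, the sample size N = 77 is borrowed from the cereal study in section 6 purely for illustration, since the worked example itself does not state one.

```python
from math import log2

# Solution vector from the worked example: Pr(x1 ∩ x2 ∩ x3) over the 8 states
p = {(0, 0, 0): 0.035, (0, 0, 1): 0.017, (0, 1, 0): 0.091, (0, 1, 1): 0.039,
     (1, 0, 0): 0.039, (1, 0, 1): 0.0,   (1, 1, 0): 0.415, (1, 1, 1): 0.364}

E_hat = -sum(q * log2(q) for q in p.values() if q > 0)  # 0 log 0 taken as 0
E_max = log2(len(p))                    # uniform distribution over 8 states
print(E_hat, E_max)                     # ~2.0062 and 3.0, matching the text

# Event-level mutual information of the inferred pattern (x1:1, x2:1, x3:1)
pr1, pr2, pr3 = 0.818, 0.909, 0.42      # marginals from the example
mi = log2(p[(1, 1, 1)] / (pr1 * pr2 * pr3))

# Adjusted chi-square threshold per equation (1), under an assumed N
N, o = 77, 3                            # N is a placeholder, not from the example
oi = N * p[(1, 1, 1)]                   # observed count
ei = N * pr1 * pr2 * pr3                # expected count under independence
chi2 = (oi - ei) ** 2 / ei
C = (1 / p[(1, 1, 1)]) * (chi2 / (2 * N)) * (E_hat / E_max) ** (o / 2)
print(mi, C, mi > C)                    # significance verdict under these assumptions
```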
There is an interesting observation about the heuristics of equation (1). Let’s consider the case of second-order association patterns; i.e., o = 2. When the expected entropy measure of the estimated probability model is identical to that of the maximum likelihood estimate, Pr(x1, x2) Log2 [Pr(x1, x2)/Pr(x1)Pr(x2)] → χ2/2N. If we now sum over all possible association patterns defined by (x1, x2) to examine the mutual information measure at the variable level (as opposed to the event level), we obtain the asymptotic convergence property I(x1; x2) -> χ2/2N discussed earlier.
5 Modified a priori algorithm
Based on the methods discussed in the previous sections, below is an algorithm that combines the a priori property with the mutual information measure for identifying significant association patterns:
Step 1:
Conduct a scanning pass to derive the marginal probabilities Pr(xi = dk) (i = 1..n) for all possible dk, and the joint probabilities Pr(xi = dl, xj = dm) (i < j, i = 1..n-1, j = 2..n) for all possible dl and dm, by checking each transaction record one at a time.
Remark: This can be easily achieved by creating a bin as a place holder of the frequency count for each unique xi and (xi xj) [12], and discarding the bin (xi xj) when its frequency count at the time of k% completion of the transaction record scan is less than N(α - 1 + k/100) --- a condition that guarantees the final frequency count will be less than the count threshold Nα required for α-significance.
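A sketch of this first pass with progressive pruning is shown below; the record representation (one category-to-item mapping per transaction) is an assumption for illustration:

```python
from collections import Counter
from itertools import combinations

def first_pass(records, alpha):
    """Single scan deriving item marginals and pair joint probabilities.

    records -- list of transactions, each a dict {category: item}
    alpha   -- support threshold for the alpha-significance test (C1)
    """
    n_total = len(records)
    singles, pairs, pruned = Counter(), Counter(), set()
    for scanned, record in enumerate(records, start=1):
        items = sorted(record.items())          # [(category, item), ...]
        for entry in items:
            singles[entry] += 1
        for pair in combinations(items, 2):
            if pair not in pruned:
                pairs[pair] += 1
        # A bin counted fewer than N(alpha - 1 + k/100) times after k% of the
        # scan can never reach the final threshold N*alpha, so discard it.
        bound = n_total * (alpha - 1) + scanned
        for pair in [q for q, c in pairs.items() if c < bound]:
            pruned.add(pair)
            del pairs[pair]
    marginals = {k: c / n_total for k, c in singles.items()}
    joints = {k: c / n_total for k, c in pairs.items()}
    return marginals, joints
```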
Step 2:
Rank all w (≤ n(n-1)/2) association patterns (xi, xj) that survived (i) step 1 and (ii) the test due to (C2) concerning the mutual information measure, in descending order of their corresponding joint probabilities, and put them in a collection set AS.
Step 3:
Select w’ (≤ w) association patterns from the top of AS, and enumerate each such pattern (referred to as a source pattern) with a new item Ij from a category/attribute variable not already in the association pattern, subject to the following condition (a sketch follows this step):
Every second-order association pattern formed by Ij and an item in its source pattern is a significant association pattern in AS. For example, suppose the source pattern is (x1:d1, x2:d2); it can be enumerated to (x1:d1, x2:d2, xj:Ij) if both (x1:d1, xj:Ij) and (x2:d2, xj:Ij) are significant association patterns.
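The step-3 condition lends itself to a direct sketch; representing AS as a set of frozensets of (category, item) pairs is an assumption for illustration:

```python
def enumerate_candidates(source, items_by_category, significant_pairs):
    """Extend a source pattern by one item, per the step-3 condition.

    source            -- tuple of (category, item) pairs, e.g.
                         (('x1', 'd1'), ('x2', 'd2'))
    items_by_category -- dict mapping each category to its item domain
    significant_pairs -- the patterns in AS, as a set of frozensets of
                         two (category, item) pairs
    """
    used = {cat for cat, _ in source}
    candidates = []
    for cat, domain in items_by_category.items():
        if cat in used:
            continue
        for item in domain:
            new = (cat, item)
            # every second-order pattern joining the new item with an item
            # of the source pattern must itself be significant (in AS)
            if all(frozenset([old, new]) in significant_pairs for old in source):
                candidates.append(source + (new,))
    return candidates
```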
Step 4:
Based on the number of newly enumerated patterns and the order of the patterns, determine, according to the scenario discussed in the previous section, whether the joint probabilities for the newly enumerated patterns should be derived from a new pass of transaction record scanning or from the probabilistic inference described earlier. In either case, proceed to derive the joint probabilities for the newly enumerated patterns and test them against condition (C1) in section 2. If a pattern does not pass the test, discard it from the list for further processing.
Step 5:
For each newly enumerated association pattern that survived step 4, test against condition (C2) (the mutual information measure) in section 2. If a pattern passes the test, insert the newly enumerated significant association pattern into a temporary bin TB in such a way that the descending order of the joint probabilities of the patterns in TB is preserved.
Step 6:
Insert the items in TB at the top of AS. If computational resources are still available, empty TB and go to step 3. Otherwise stop and return AS.
6 Preliminary study and result discussion
In order to better understand the computational behavior of the proposed approach
discussed in this paper, a preliminary study was conducted using a dataset about different brands of cereals. This dataset was originally published in the anonymous ftp
from unix.hensa.ac.uk, and re-distributed as cereal.tar.gz/cereal.zip by [13].
This dataset is chosen because it is relatively small to allow an exhaustive data
analysis to establish a “ground truth” for the purpose of evaluation. This dataset consists of 77 records. Each record consists of 11 categories/attributes. The number of
possible second-order association patterns, therefore, is 42(11x10)/2 = 880. In this
preliminary study, we set α = 0.2 for α-significant test. 57 out of 880 association patterns survived the test due to condition (C1). Among the 57 association patterns, 15
failed the test due to (C2) (mutual information measure).
Based on the extension of the 42 second-order significant association patterns, third-order association patterns were derived, and 25 of the third-order patterns survived the test due to (C1). Among the 25 association patterns, 19 passed the test due to (C2). Based on the 19 third-order significant association patterns, three significant association patterns of fourth order were found. This completes the construction of the “ground truth” for evaluation.
To evaluate how effective the proposed algorithm presented in section 5 is, we applied it to the same dataset. Step 1 of the algorithm produced the same set of second-order association patterns. In step 2, we chose w = 1; i.e., only the most probable significant association pattern (Pr(x9:2, x10:3) = 0.779) was used for the enumeration of third-order association pattern candidates. Following the condition stipulated in step 3, eight candidate third-order association patterns were found. Among the eight candidates, five of the 19 actual third-order significant association patterns were found. In other words, we were able to find 26% (5/19) of the third-order significant association patterns using only 2% (1/42) of the candidate set for enumeration.
To better understand the behavior of probabilistic inference, we repeated step 4, except that probabilistic inference was applied to the same eight candidates rather than scanning the dataset. The following results were found.
Table 1. Comparison between using probabilistic inference vs exhaustive scan

Case  Association pattern   Mutual information MI   Adjusted chi-square C   MI > C?   Ground truth
1     x1:3 x9:2 x10:3       0.315                   0.208                   Yes       No
2     x3:1 x9:2 x10:3       -0.005484               0.003804                No        No
3     x3:2 x9:2 x10:3       0.135                   0.085                   Yes       Yes
4     x4:3 x9:2 x10:3       0.221                   0.135                   Yes       Yes
5     x6:2 x9:2 x10:3       0.391                   0.311                   Yes       Yes
6     x7:2 x9:2 x10:3       0.178                   0.143                   Yes       No
7     x7:3 x9:2 x10:3       0.218                   0.194                   Yes       Yes
8     x9:2 x10:3 x11:3      0.211                   0.139                   Yes       Yes
When the seven cases where MI > C in Table 1 are used to enumerate fourth-order patterns, 11 such patterns are obtained. Among the 11 fourth-order patterns, two of the three true significant association patterns are covered.
When probabilistic inference was applied again, only one of the two true fourth-order significant association patterns was found. This leads to a 50% false-negative error rate. Among the nine cases that were not significant association patterns, the probabilistic inference process drew the same conclusion in six cases, yielding a 33% false-positive error rate. This results in a weighted error rate of (2/11)(50%) + (9/11)(33.3%) = 36%, or a 64% accuracy rate.
As also noted in the study, the condition stipulated in step 3 plays an essential role in keeping the enumeration space small. 42 second-order significant association patterns were found. An exhaustive enumeration of the 42 second-order patterns would yield at least 42x(11-2)x4 – 42 = 1470 third-order association pattern candidates. In our study we used only one of the 42 patterns for enumeration. This yields 1x(11-2)x4 = 36 possible third-order pattern candidates, while the condition stipulated in step 3 restricted the enumeration to only eight third-order pattern candidates.
7 Conclusion
This paper discussed new criteria based on mutual information measure for defining
significant association patterns, and a novel probabilistic inference approach utilizing
model abstraction for discovering significant association patterns. The new criteria are
proposed to address the interestingness, defined by interdependency among the attributes, of an association pattern. The novel probabilistic inference approach is introduced to offer an alternative approach to deduce the essential information needed for
discovering significant patterns without the need of an exhaustive scan of the entire
database. The preliminary study has showed interesting results. Our follow-up study
will focus on applying the proposed approach to real world data sets.
Acknowledgement: This work is supported in part by PSC CUNY Research Award
and NSF DUE CCLI #0088778.
References
1. Genesereth M., Nilsson N.: Logical Foundations of Artificial Intelligence. Morgan
Kaufmann (1987)
2. Freedman D.: From association to causation: Some remarks on the history of statistics. Statistical Science, Vol. 14-3 (1999) 243-258
3. Cover T.M., Thomas J.A.: Elements of Information Theory. New York: John Wiley
& Sons (1991)
4. Rish I., Hellerstein J., Jayram T.: An Analysis of Data Characteristics that affect
Naive Bayes Performance. Technical Report RC21993, IBM T.J. Watson Research
Center (2001)
5. Yang J., Wang W., Yu P.S., Han J.: Mining Long Sequential Patterns in a Noisy
Environment. ACM SIGMOD June 4-6, Madison, Wisconsin (2002) 406-417
6. Kullback S.: Information Theory and Statistics. John Wiley & Sons Inc (1959)
7. Basharin G.: Theory of Probability and its Applications. Vol. 4 (1959) 333-336
8. Agrawal R., Imielinski T., Swami A.: Mining Association Rules between Sets of
Items in large Databases. Proc. ACM SIGMOD Conf. Washington DC, May (1993)
9. Agrawal R., Srikant R.: Fast Algorithms for Mining Association Rules. Proc. VLDB (1994) 487-499
10. Toivonen H.: Sampling Large Databases for Association Rules. Proc. 22nd VLDB
(1996) 134-145
11. Sy B.K.: Probability Model Selection Using Information-Theoretic Optimization Criterion. J. of Statistical Computing & Simulation, Gordon & Breach, Vol. 69-3 (2001)
12. Hoeffding W.: Probability Inequalities for sums of bounded Random Variables.
Journal of the American Statistical Associations. Vol. 58 (1963) 13-30
13. Zaki M.: SPADE: an efficient algorithm for Mining Frequent Sequences. Machine
Learning Journal, Vol. 42-1/2 (2001) 31-60
14. http://davis.wpi.edu/~xmdv/datasets.html