On Characterization and Discovery of Minimal Unexpected Patterns in Data Mining Applications
Balaji Padmanabhan
Operations and Information Management Department
The Wharton School, University of Pennsylvania
http://www.wharton.upenn.edu/faculty/padmanabhan.html
balaji@wharton.upenn.edu
Alexander Tuzhilin
Information Systems Department
Stern School of Business, New York University
http://www.stern.nyu.edu/~atuzhili/
atuzhili@stern.nyu.edu
Abstract
A drawback of traditional data mining methods is that they do not leverage prior knowledge of users. In
many business settings, managers and analysts have significant intuition based on several years of
experience. In prior work we proposed a method that could discover unexpected patterns in data by using this
domain knowledge in a systematic manner. In this paper we continue our focus on discovering unexpected
patterns and propose new methods for discovering a minimal set of unexpected patterns; these methods discover orders
of magnitude fewer patterns and yet retain most of the truly unexpected ones. We demonstrate the strengths
of this approach experimentally using a case study application in a marketing domain.
Keywords: data mining, databases, rule discovery, association rules, unexpectedness, interestingness,
minimality.
1. Introduction
A well-known criticism of many rule discovery algorithms in data mining is that they generate too many
patterns, many of which are obvious or irrelevant [PSM94, ST95, BMU+97, S97, BA99]. It stands to reason
that more effective methods are needed to discover fewer and more relevant patterns from data. One way to
approach this problem is by focusing on discovering unexpected patterns [ST95, ST96, LH96, LHC97,
Suz97, CSD98, Sub98, BT98, PT98, PT99, P99], where unexpectedness of discovered patterns is usually
defined relative to a system of prior expectations. In particular, we proposed in our prior research [PT98,
PT99, P99] a characterization of unexpectedness based on logical contradiction of a discovered pattern and
prior beliefs and presented new algorithms for discovering such unexpected patterns. As demonstrated in
[PT98, PT99, P99], this approach generated far fewer and more interesting patterns than traditional
approaches.
In this paper we extend our prior work on discovering unexpected patterns [PT98, PT99] and propose a new
approach that further reduces the number of such patterns in a significant way, while retaining most of the truly
interesting patterns. This is achieved through defining a minimal set of unexpected patterns as those
unexpected patterns that are not refinements of other unexpected patterns and hence cannot be monotonically
inferred from other unexpected patterns. Moreover, we present efficient algorithms that discover the minimal
set of unexpected patterns and test these algorithms on “real-world” data to see how well they perform in
practice. In the context of discovering a minimal set of patterns in data mining, [BA99, LHM99, SLR+99,
TKR+95, SA96, BAG99] also provide alternate approaches to characterizing this concept. In Section 3 these
approaches are described and contrasted to our method of discovering minimal sets of unexpected patterns.
The power of the approach presented in this paper lies in combining two independent concepts of
unexpectedness and minimality of a set of patterns into one integrated concept that provides for the discovery
of small but important sets of interesting patterns. Moreover, our proposed methods are efficient in the sense
that they focus directly on discovering minimal unexpected patterns rather than using any of the postprocessing approaches, such as filtering, to determine the minimal unexpected patterns from the set of all the
discovered patterns. Further, the approach presented in this paper is effective in supporting decision making
in real business applications where unexpected patterns with respect to managerial intuition can be of great
value.
The rest of this paper is organized as follows. In Section 2 we present an overview of the concept of
unexpectedness and provide a summary of the relevant related work. In Section 3 we present work related to the concept of minimality of a set of rules, together with definitions and formal characterizations of the minimal set of patterns and the minimal set of unexpected patterns. In Section 4 we first present naïve and
semi-naïve approaches to discovering the minimal set of unexpected patterns; we then present MinZoomUR,
the algorithm for discovering the minimal set of unexpected patterns. Experimental results are presented in
Section 5 followed by conclusions in Section 6.
2. Overview of Unexpectedness
Unexpectedness of patterns has been studied in [ST95, ST96, LH96, LHC97, Suz97, CSD98, Sub98, BT98,
PT98, PT99, P99] and a comparison of these approaches is provided in [P99]. In this paper we follow our
previous approach to unexpectedness presented in [PT98, PT99, P99] because it is simple and intuitive (as it
is based on a logical contradiction between a discovered pattern and a belief) and lends itself to efficient
algorithms that, as demonstrated in [PT98, P99], discover interesting patterns in various applications.
Moreover, this paper builds on our previous work on unexpectedness [PT98, PT99] by proposing new
methods that significantly improve the effectiveness of the methods described in [PT98, PT99].
To make the paper self-contained, we present an overview of our previous approach to unexpectedness from
[PT98, PT99]. To define unexpectedness, we start with a set of beliefs that represent knowledge about the
domain and use these beliefs to seed the search for all unexpected patterns defined as rules. In particular, let I
= {i1, i2, …, im} be a set of discrete attributes (also called “items” [AIS93]), some of them being ordered and
others unordered. Let D = {T1, T2, ..., TN} be a relation consisting of N transactions [AMS+95] T1, T2, ..., TN
over the relation schema {i1, i2, …, im}. Also, let an atomic condition be a proposition of the form value1 ≤
attribute ≤ value2 for ordered attributes and attribute = value for unordered attributes where value, value1,
value2 belong to the finite set of discrete values taken by attribute in D. Finally, an itemset is a conjunction of
atomic conditions. Then we assume that rules and beliefs are defined as extended association rules of the
form X → A, where X is the conjunction of atomic conditions (an itemset) and A is an atomic condition. As
defined in [AIS93], the rule has confidence c if c% of the transactions in D that contain X also contain A and
the rule has support s in D if s% of the transactions in D contain both X and A. Finally, a rule is said to hold
on a dataset D if the confidence of the rule is greater than a user-specified threshold value chosen to be any
value greater than 0.5.
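To make these definitions concrete, the following Python sketch (our illustration, not part of the algorithms discussed in this paper; the data representation and function names are assumptions) computes the support and confidence of an extended association rule over a relation stored as a list of dictionaries.

# Illustrative sketch only: support and confidence of an extended association
# rule X -> A over a relation D (a list of dictionaries, one per transaction).
# An atomic condition is (attribute, value) for unordered attributes or
# (attribute, (value1, value2)) for ordered attributes; an itemset is a list of conditions.

def matches(condition, transaction):
    attr, spec = condition
    if isinstance(spec, tuple):                 # ordered: value1 <= attribute <= value2
        lo, hi = spec
        return lo <= transaction[attr] <= hi
    return transaction[attr] == spec            # unordered: attribute = value

def support(itemset, D):
    """Fraction of transactions in D that satisfy every condition in the itemset."""
    hits = sum(1 for t in D if all(matches(c, t) for c in itemset))
    return hits / len(D)

def confidence(body, head, D):
    """support(body AND head) / support(body)."""
    s_body = support(body, D)
    return support(body + [head], D) / s_body if s_body > 0 else 0.0

D = [
    {"profession": "prof", "day": "weekend", "age": 35},
    {"profession": "prof", "day": "weekend", "age": 42},
    {"profession": "prof", "day": "weekday", "age": 29},
]
body = [("profession", "prof"), ("age", (25, 45))]
head = ("day", "weekend")
print(support(body + [head], D), confidence(body, head, D))   # approx. 0.67 and 0.67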
Given these preliminaries, we define unexpectedness as follows.
Definition [PT98, P99].¹ The rule A → B is unexpected with respect to the belief X → Y on the dataset D if
the following conditions hold:
(a) B AND Y |= FALSE. This condition imposes the constraint that B and Y logically contradict each
other.
(b) A AND X holds on a statistically large subset of tuples in D.² We use the term “intersection of a rule
with respect to a belief” to refer to this subset. This intersection defines the subset of tuples in D in which
the belief and the rule are both “applicable” in the sense that the antecedents of the belief and the rule are
both true on all the tuples in this subset.
(c) The rule A, X → B holds (at the same support and confidence threshold levels). Since condition (a)
constrains B and Y to logically contradict each other, the rule A, X → Y does not hold.
A key assumption in this definition is the monotonicity of beliefs, which is motivated in [PT98, P99]. In
particular, if we have a belief X → Y that we expect to hold on a dataset D, then monotonicity assumes that the
belief should also be expected to hold on any statistically large subset of D.
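For illustration, the following Python sketch (ours, simplified to unordered attributes; all names are hypothetical) checks conditions (a)-(c) of the definition for a candidate rule against a belief, representing a conjunction of atomic conditions as a dictionary from attributes to values.

# Illustrative sketch, restricted to unordered attributes; assumes minsup > 0.
# A conjunction of atomic conditions is a dict {attribute: value};
# a rule or belief is (body_dict, (head_attribute, head_value)).

def support(conds, D):
    return sum(all(t.get(a) == v for a, v in conds.items()) for t in D) / len(D)

def contradicts(head1, head2):
    """Condition (a): heads on the same attribute with different values."""
    return head1[0] == head2[0] and head1[1] != head2[1]

def is_unexpected(rule, belief, D, minsup, minconf):
    (A, B), (X, Y) = rule, belief
    if not contradicts(B, Y):                    # (a) B AND Y |= FALSE
        return False
    AX = {**A, **X}                              # clashing attributes in A and X are not handled here
    sup_ax = support(AX, D)
    if sup_ax < minsup:                          # (b) A AND X is statistically large
        return False
    s_axb = support({**AX, B[0]: B[1]}, D)       # (c) the rule A, X -> B holds on D
    return s_axb >= minsup and s_axb / sup_ax >= minconf

# Example call (D is a list of transactions represented as dictionaries):
# rule   = ({"month": "dec"}, ("day", "weekday"))
# belief = ({"profession": "prof"}, ("day", "weekend"))
# is_unexpected(rule, belief, D, minsup=0.05, minconf=0.5)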
The above definition of unexpectedness states that the rule and the belief logically contradict each other,
that the rule should hold on its intersection with the belief, and that this intersection is supposed to be
statistically large. This condition is “unexpected” because, according to the monotonicity assumption, it is
expected that the belief should hold on the subset of data defined by the belief’s intersection with the rule.

¹ We would like to point out that this definition is applicable not only to the specific structure of the rules defined above, but also to a broader set of rules. Moreover, this observation also holds for the definitions of the minimal set of rules and monotonicity inference introduced in Section 3. However, the discovery algorithms presented in this paper are designed for the rules having the specific structure introduced above.
² One of the ways to define a “large subset of tuples” is through the user-specified support threshold value.
Given the definition of unexpectedness, [PT98, P99] propose algorithm ZoomUR that discovers all the
unexpected rules with respect to a set of beliefs that satisfy user-specified minimum support and confidence
requirements. ZoomUR consists of algorithms for two phases of the discovery strategy - ZoominUR and
ZoomoutUR.
In the first phase of ZoomUR, ZoominUR discovers all unexpected patterns that are refinements to any
belief. More specifically, given any belief X → Y, ZoominUR discovers all unexpected rules of the form X, A
→ B such that B AND Y |= FALSE. As done in the Apriori algorithm [AMS+95], ZoominUR first generates
“large”³ itemsets incrementally and then generates rules from the discovered itemsets. As originally proposed
in Apriori [AMS+95], ZoominUR also uses the observation that subsets of large itemsets should also be large
to limit the search for large itemsets. However, unlike Apriori, which starts from an empty set, ZoominUR
starts with an initial set of candidate itemsets that are derived from beliefs such that every candidate itemset
considered by ZoominUR must contain the body of the belief and an atomic condition that contradicts the
head of the belief.
In the second phase of ZoomUR, starting from all the unexpected rules that are refinements to a belief,
ZoomoutUR discovers more general rules (generalizations) that are also unexpected. Specifically from each
unexpected refinement of the form X, A → B ZoomoutUR discovers all the unexpected rules of the form X’,
A → B where X’ ⊂ X. The rules that ZoomoutUR discovers are not refinements of beliefs, but more general
rules that satisfy the conditions of unexpectedness as defined above. For example, if a belief is that
“professional → weekend” (professionals tend to shop more on weekends than on weekdays), ZoominUR
may discover a refinement such as “professional, december → weekday” (in December, professionals tend to
shop more on weekdays than on weekends). ZoomoutUR may then discover a more general rule “december
→ weekday”, which is totally different from the initial belief “professional → weekend”.
Though ZoomUR discovers only the unexpected rules and also far fewer rules than Apriori⁴, it still discovers
large numbers of rules, many of which are redundant in the sense that they can be obtained from other
discovered rules. For example, given the belief diaper → beer and two unexpected patterns diaper, weekday
→ not_beer and diaper, weekday, male → not_beer, the second unexpected pattern can be inferred from the
first one under the monotonicity assumption. Therefore, to improve the discovery process, we introduce in
this paper the concept of a minimal set of unexpected patterns and present efficient algorithms that discover
this set of rules.

³ An itemset is said to be large if the percentage of transactions that contain it exceeds a user-specified minimum support level.
⁴ This is not surprising: the objective of Apriori is to discover all strong rules, while ZoomUR discovers only unexpected rules.
3. Minimal Set of Patterns
The notion of minimality is very broad and has been studied for centuries, a well-known example being
Occam and his razor. Since in this paper we focus on the discovery of unexpected patterns, we will
focus only on minimality as related to this problem.
One of the early influential works related to minimality of a set of patterns was presented by Mitchell in
[M82]. In particular, [M82] presents a unifying approach to the problem of generalizing knowledge (that, for
example, can be represented as rules) by viewing generalization as a search problem. Moreover, based on the
specific search methods used, Mitchell [M82] also categorizes several rule learning systems [BM78, P70,
W75, HRM7, V78, M77, MUB82] that deal with generalization. In particular, [M82] deals with a broader set
of objects (that can also include rules) and formulates the generalization problem as follows. Given a set of
instances specified in an instance language, the generalization problem is formulated in [M82] as a search for
descriptions in a generalization language such that these generalizations are consistent with a set of training
examples that are labeled with the appropriate generalization. [M82] adopts a strong notion of consistency by
defining a generalization to be consistent with training examples if it matches all the positive examples and
does not match any negative examples. Moreover, [M82] introduces a partial order for generalizations (one
generalization being more general than another one) and defines minimally specific generalizations based on
this partial order. At the extremes of this partial order are the G-set (the most general set) and the S-set (the
most specific set). Although [M82] deals with abstract objects, Mitchell’s approach is also applicable to rules
and their generalizations.
In the context of discovering a minimal set of rules in data mining, the approach presented in [M82] has the
following two limitations. First, in most cases it may not be possible to have training examples (a set of
discovered rules) that are classified into known generalizations. Therefore, rather than learning these
generalization relationships among different objects, it is necessary to define them. Recent characterizations of
various notions of minimality in the knowledge discovery literature take this approach [BA99, LHM99,
SLR+99], and we will describe them shortly. Second, the concept of consistency in [M82] is too strong to be useful for
eliminating rules that are typically flagged as uninteresting by users. For example, requiring that all refinements of a
discovered rule diaper → beer also be discovered in order to characterize the rule as minimal is too strong
since there may be several rules of the form diaper, X → beer that do not have adequate support or
confidence. In practice it is rarely the case that all possible refinements to a discovered rule are also
discovered. Hence we need a weaker notion of consistency than the one proposed in [M82].
To address these limitations, [BA99, LHM99, SLR+99, TKR+95, SA96, BAG99] provide alternate
approaches to characterizing a minimal set of discovered rules. In particular, [BA99] presents an approach
that finds the “most interesting rules”, defined as rules that lie on a support and confidence frontier. Further
[BA99] proves that these rules necessarily contain the strongest rules discovered using several objective
criteria other than just confidence and support.
In [SLR+99] several heuristics for pruning large numbers of association rules have been proposed. One of
these heuristics prunes out certain refinements of rules, thus alluding to the concept of minimality of a set of
rules. However, [SLR+99] focuses on the heuristics that prune redundant rules from a discovered set of rules
and does not explore the concept of minimality fully, nor does it propose any algorithms for discovering a minimal
set of patterns.
In [LHM99] a technique is presented to prune and then summarize an already discovered set of association
rules. In particular, [LHM99] defines the concept of direction-setting rules and demonstrates how non-direction-setting rules can be inferred from them. Therefore, the set of direction-setting rules constitutes a set
of rules that are “minimal” in some sense. This work is related to [SLR+99] in the sense that certain rule
refinements are pruned out in the [LHM99] approach and therefore, they are not direction-setting. However,
the approach presented in [LHM99] is different from [SLR+99] and from our approach in the sense that not
all refined rules are non-direction setting according to [LHM99]. Moreover, [LHM99] focuses on pruning
already discovered rules and does not address the issue of direct discovery of minimal sets.
An approach to eliminating redundant association rules is presented in [TKR+95]. In particular, [TKR+95]
introduces a concept of the “structural cover” for association rules and presents post-processing algorithms to
find the structural cover. In this paper we present an alternative formal characterization of the minimal set of
patterns that corresponds to structural covers of [TKR+95] for the association rules but is also broader and
applicable to more general classes of rules. Moreover, [TKR+95] focuses on pruning already discovered
rules and does not address the issue of direct discovery of minimal sets.
Finally, the work of [SA96] and [BAG99] is also related to the problem of discovering minimal sets of rules.
In particular, [SA96] and [BAG99] provide methods for eliminating rules such that the support and/or
confidence values of these rules are not unexpected with respect to the support and confidence values of
previously discovered rules. However, this work is only marginally related to our approach because we focus
on a more general definition of minimality that does not directly depend on confidence and support of
discovered rules.
We now present our approach to minimality, which is targeted at the discovery of unexpected patterns.
Since the concept of unexpectedness of discovered patterns is based on the monotonicity assumption about
the set of underlying beliefs, and since the monotonicity assumption is strongly linked to the concept of
refinement of patterns, our concept of minimality is based on refinement of patterns. Therefore, we do not
consider other kinds of minimality, such as direction-setting rules [LHM99] or SC-optimality [BA99].
In the rest of this section we formally define minimality of a set of patterns and the minimal set of
unexpected patterns. However, before defining these concepts, we introduce some preliminary definitions in
the next section.
3.1 Inference Under Monotonicity Assumption
Before introducing minimal rules, we need to define formally which rules can be inferred to hold on a dataset
due to the monotonicity assumption.
Definition. Let X and Y be two itemsets. Then itemset Y is a generalized refinement of itemset X (denoted as
Y = genref(X)) if for all atomic conditions a from X
(a) if a is of the form attribute = value then a ∈ Y.
(b) if a is of the form value1 ≤ attribute ≤ value2 then Y contains an atomic condition value1 + δ ≤ attribute
≤ value2 - ε for some non-negative values of δ and ε.
Lemma 3.1. Y is a generalized refinement of X if and only if there exists itemset Z such that Y = X AND Z.
Sketch of the Proof. For unordered attributes this observation is trivial. For the ordered attributes from X
having the form value1 ≤ attribute ≤ value2, Z contains an atomic condition from Y of the form value1 + δ ≤
attribute ≤ value2 - ε.
Proposition 3.2. If X → Y holds on D and Z is a generalized refinement of X, then under the monotonicity
assumption, Z → Y holds on D.
Proof. According to Lemma 3.1, there exists itemset W such that Z = X AND W. Let D′ = { t ∈ D | Z = X
AND W holds on t }. Then the monotonicity assumption states that X → Y should hold on D′ and hence X, W
→ Y holds on D.
The above proposition states that rule X, W → Y can be inferred to hold on D under the monotonicity assumption, provided
that rule X → Y holds. However, this inference is applicable only to the rules having the specific structure
that was defined in Section 2 and considered throughout this paper (because the definition of generalized
refinement explicitly assumes this rule structure). We will provide an alternative characterization of
generalized refinement below that can define inference under the monotonicity assumption in more general
terms.
The next proposition provides an alternative characterization of generalized refinement that will be used
subsequently.
Proposition 3.3. Itemset C is a generalized refinement of itemset A if and only if C |= A.
Proof. The necessary condition immediately follows from Lemma 3.1. To prove the sufficient condition,
assume that C |= A. If a is a condition from A involving an unordered attribute of the form attribute = value
and a does not belong to C, then we can find an interpretation I such that attribute = value1 ≠ value and C is
true in I. In this case, A is false in I, thus producing the contradiction. If a is a condition from A involving an
ordered attribute of the form value1 ≤ attribute ≤ value2, and C does not contain any condition of the form
value1 + δ ≤ attribute ≤ value2 - ε then we can find an interpretation I such that the value of attribute does
not belong to the interval [value1, value2] and C is true for I. In this case, A is false in I, thus also producing
the contradiction.
We next present a definition and later show that it captures the concept of inference under the monotonicity
assumption in more general terms than was done using Proposition 3.2.
Definition. Rule (A → B) |=M (C → D) if
1. C |= A, and
2. D = B.
Note that we defined relationship |=M in terms of logical implication. However, Proposition 3.3 provides an
alternative characterization of |=M in terms of generalized refinements.
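For rules of the structure introduced in Section 2, the relation |=M can be tested mechanically. The Python sketch below (ours; the condition encoding is an assumption) checks C |= A in the sense of Proposition 3.3 and then applies the two clauses of the definition.

# Illustrative sketch of the |=M test for rules of the form body -> head.
# A condition is (attr, ("=", value)) for unordered attributes or
# (attr, ("range", lo, hi)) for ordered attributes; an itemset is a list of conditions.

def condition_implied(cond, itemset):
    attr, spec = cond
    for a, s in itemset:
        if a != attr:
            continue
        if spec[0] == "=" and s == spec:
            return True                           # the same equality condition is present
        if spec[0] == "range" and s[0] == "range":
            if spec[1] <= s[1] and s[2] <= spec[2]:
                return True                       # [s_lo, s_hi] is contained in [lo, hi]
    return False

def implies(C, A):
    """C |= A: every atomic condition of A is implied by some condition of C."""
    return all(condition_implied(a, C) for a in A)

def infers_m(rule1, rule2):
    """(A -> B) |=M (C -> D) iff C |= A and D = B."""
    (A, B), (C, D) = rule1, rule2
    return implies(C, A) and D == B

belief_rule = ([("diaper", ("=", "yes"))], ("beer", ("=", "no")))
refinement  = ([("diaper", ("=", "yes")), ("day", ("=", "weekday"))], ("beer", ("=", "no")))
print(infers_m(belief_rule, refinement))   # True: the refinement is inferred under monotonicity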
Theorem 3.4. If X → Y holds on dataset D and X → Y |=M Z → V, then under the monotonicity assumption Z
→ V holds on D.
Proof. By definition of |=M, V = Y and Z |= X. Then from Proposition 3.3, we conclude that Z is a
generalized refinement of X. By Proposition 3.2, Z → Y holds on D.
This theorem demonstrates that the relational operator |=M defines inference of one rule from another under
the monotonicity assumption. Moreover, this theorem gives a broader characterization of this inference than
the one implied by Proposition 3.2 because |=M is defined in terms of logical implication as opposed to
generalized refinement and hence can be applied to a broader class of rules than the ones introduced in
Section 2.
Example. Assume the rule diaper, weekday → not_beer holds on a dataset D. Consider the rule diaper,
weekday, male → not_beer. Since diaper, weekday, male |= diaper, weekday it follows that diaper, weekday
→ not_beer |=M diaper, weekday, male → not_beer. Therefore according to Theorem 3.4, diaper, weekday,
male → not_beer should also hold. Also notice that the itemset diaper, weekday, male is a generalized
refinement of the itemset diaper, weekday.
Proposition 3.5. The relation |=M is reflexive, transitive and not symmetric.
Proof. The proof follows from the observation that the logical implication relation |= is reflexive and
transitive but not symmetric.
The following proposition establishes the relationship between |=M and classical logical inference relation |=.
Proposition 3.6. If x |=M y then it has to be the case that x |= y, but if x |= y it does not follow that x |=M y (i.e.
classical logical inference relation is a necessary but not sufficient condition for two rules to be related by
|=M).
Proof. The “necessary” part (if x |=M y then it has to be the case that x |= y) trivially holds. For the second part
of the proof we provide a counter-example of a case where a rule logically follows from another rule, but
where the relationship is not because of monotonicity assumption. A → B > 10 |= A → B > 9. However it
follows from the definition that A → B > 10 |≠M A → B > 9.
Notice that the inference relation |=M deals with rules that hold on data probabilistically, whereas logical
implication |= deals with interpretations of logical formulas and does not directly deal with probabilistic
relationships usually encountered in data mining. Therefore, we introduced the relationship |=M that deals
with the type of probabilistic inference captured by the monotonicity assumption.
Using these definitions and results, we next present the definitions for the minimal set of rules and the
minimal set of unexpected patterns.
3.2 Minimal Set of Rules
In order to define the minimal set of (unexpected) rules, we first introduce a partial ordering relationship on sets of rules.
Definition. The set of rules Y covers the set of rules X (denoted as Y ≺ X) if Y ⊆ X and ∀ xi ∈ X, ∃ yi ∈ Y
such that yi |=M xi.
Definition. Let X and Y be sets of rules. Then Y is the minimal set of X if Y covers X and there is no set of
rules Z ≠ Y that covers Y.
The following proposition establishes an alternative characterization of a minimal set of rules.
Proposition 3.7. Y is the minimal set of X if and only if the following conditions hold:
(1) Y ⊆ X.
(2) ∀ xi ∈ X, ∃ yi ∈ Y such that yi |=M xi.
(3) ∀ y1, y2 ∈ Y such that y1 ≠ y2, y1 |≠M y2.
Proof. Immediately follows from the definition of the minimal set of rules.
The next proposition establishes uniqueness of the minimal set of rules.
Proposition 3.8. For any set of rules X, the minimal set of X is unique.
Proof. To prove this proposition, we define a directed graph G=(V, E) as follows. The set of nodes V consists
of all the rules from X. Given two distinct nodes n1 = A → B and n2 = C → D from V, there is an edge from node n1 to
node n2 in E if A → B |=M C → D. Then it is easy to see that the minimal set of rules for X consists of all the
nodes of G having no incoming edges (the in-degrees of these nodes are 0). Since such a set of nodes in G is
unique, it follows that the minimal set of rules for X is also unique.
Given the definition of minimality introduced above, we next define the minimal set of unexpected patterns.
Definition. If B is a belief and X is the set of all unexpected patterns with respect to B, the minimal set of
unexpected patterns with respect to B is the minimal set of X.
Example: For the belief diaper → beer let the set of all unexpected patterns be {diaper and weekday →
not_beer, diaper and unemployed → not_beer, diaper and weekday and unemployed → not_beer, weekday
→ not_beer, unemployed → not_beer, weekday and unemployed → not_beer}. The minimal set of
unexpected patterns in this case is {weekday → not_beer, unemployed → not_beer}.
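The example can be reproduced mechanically. The following Python sketch (ours; restricted to unordered attributes, for which |=M reduces to a superset check on rule bodies with identical heads) filters a set of rules down to its minimal set.

# Illustrative sketch: minimal set of a set of rules, unordered attributes only.
# A rule is (frozenset_of_body_items, head_item); for unordered attributes,
# r1 |=M r2 iff body(r2) is a superset of body(r1) and the heads are equal.

def infers_m(r1, r2):
    return r2[0] >= r1[0] and r1[1] == r2[1]

def minimal_set(rules):
    return [x for x in rules
            if not any(y != x and infers_m(y, x) for y in rules)]

unexpected = [
    (frozenset({"diaper", "weekday"}),               "not_beer"),
    (frozenset({"diaper", "unemployed"}),            "not_beer"),
    (frozenset({"diaper", "weekday", "unemployed"}), "not_beer"),
    (frozenset({"weekday"}),                         "not_beer"),
    (frozenset({"unemployed"}),                      "not_beer"),
    (frozenset({"weekday", "unemployed"}),           "not_beer"),
]
for body, head in minimal_set(unexpected):
    print(sorted(body), "->", head)
# Prints only ['weekday'] -> not_beer and ['unemployed'] -> not_beer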
In this section we discussed minimality of a set of patterns and presented a definition for the minimal set of
unexpected patterns. In the next section we present algorithms for discovering the minimal set of unexpected
patterns.
4. Discovering the Minimal Set of Unexpected Patterns
In this section we present three algorithms for discovering the minimal set of unexpected patterns. We first
present, in Section 4.1, a naive algorithm that discovers such a set of patterns. In Section 4.3 we present an
efficient algorithm for discovering the minimal set of unexpected patterns. However before presenting this,
we describe in Section 4.2 a semi-naive algorithm that discovers only the minimal set of unexpected
refinements. We describe this semi-naive algorithm because in many applications we are interested only in
refinements (hence the algorithm is important in its own right) and also because the semi-naive algorithm
illustrates some important points used in the algorithm presented in Section 4.3.
The inputs to all the algorithms presented in this section are the same and consist of:
1. A set of beliefs, B.
2. The dataset D.
3. Minimum support and confidence values minsup and minconf.
4. Minimum and maximum width for all ordered attributes.
Regarding input (4), in the case of ordered attributes the width of any condition of the form value1 ≤ attribute
≤ value2 is defined to be value2 - value1. We take as user inputs the minimum and maximum width for all
ordered attributes. This is necessary and useful for the following reason. Assume that age is defined to be an
ordered attribute and takes values ranging from 1 to 100 in the dataset. Clearly at the extreme a rule
involving a condition of the form 1 ≤ age ≤ 100 is not useful since the condition 1 ≤ age ≤ 100 will hold
for every record in the dataset. Extending this argument, larger ranges of age may hold for most records in
the dataset, hence we allow the user to specify the maximum width for age that the user may be interested in
considering. Similarly the user may not be interested in too small a range for ordered attributes and we allow
the user to specify a minimum width for the attribute. Note that this is not a restrictive assumption in any way
since the default can be the smallest width and largest width respectively for these two parameters. However,
the specification of this condition improves the efficiency of the algorithm.
We now present the first algorithm, FilterMinZoomUR.
4.1 Algorithm FilterMinZoomUR
FilterMinZoomUR is a post-processing method that discovers the minimal set of unexpected patterns by
filtering patterns from the set of all unexpected patterns. This algorithm operates in two phases. In the first
phase, Algorithm ZoomUR (outlined in Section 2) is applied to a set of beliefs to discover all unexpected
patterns. For this set, in the second phase, each unexpected pattern is compared to the rest of the unexpected
patterns and dropped from consideration if there exists another pattern from that set, from which it can be
inferred under the monotonicity assumption. This strategy is equivalent to applying a minimal filter to the set
of patterns discovered by ZoomUR to select the minimal set of the unexpected patterns.
Algorithm FilterMinZoomUR is presented in Figure 4.1. For each belief, B, step 2 applies ZoomUR to
generate the set of all unexpected patterns, Unexp(B). For each unexpected pattern, X, in Unexp(B) steps 4
through 9 add X to the set of minimal patterns if it cannot be inferred under monotonicity from any other
unexpected pattern.
FilterMinZoomUR is a naïve algorithm in two senses: (1) it is a post-processing method that relies on
ZoomUR to provide the set of all unexpected patterns and (2) each unexpected pattern is compared to the rest
of the patterns in the process of selecting the minimal set of unexpected patterns.
Inputs: Beliefs Bel_Set, Dataset D, minwidth and maxwidth for all ordered attributes, and minimum support min_sup and minimum confidence min_conf
Outputs: For each belief, B, MinUnexp(B)
1  forall beliefs B ∈ Bel_Set {
2    Unexp(B) = ZoomUR(Inputs)
3    MinUnexp(B) = {}
4    forall x ∈ Unexp(B) {
5      Other_unexp = Unexp(B) - x
6      if not(∃ y ∈ Other_unexp such that y |=M x) {
7        MinUnexp(B) = MinUnexp(B) ∪ {x}
8      }
9    }
10 }
Figure 4.1 Algorithm FilterMinZoomUR
⁵ The inputs to the algorithms presented in this section are the same as the inputs to ZoomUR [PT98], described in Section 2.

Below we state and prove the completeness of FilterMinZoomUR.
Theorem 4.1. For any belief, B, FilterMinZoomUR discovers the minimal set of unexpected rules.
Proof. To prove that MinUnexp(B) is a minimal set of Unexp(B) we will show that all three conditions listed
in the definition of the minimal set presented in Proposition 3.7 hold.
The first condition is trivially satisfied. To establish the second condition, it needs to be shown that
for any X ∈ Unexp(B) there exists Y ∈ MinUnexp(B) such that Y |=M X. For any X ∈ Unexp(B) consider the
two cases that arise from Steps 4 through 9 of FilterMinZoomUR:
1. X is added to the set of minimal patterns. Due to the reflexivity of |=M it is easily seen that there exists
Y=X ∈ MinUnexp(B) such that Y |=M X.
2. X is not added to the set of minimal patterns since there exists a pattern P in Unexp(B) - X such that P |=M
X. In this case, observing that |=M is transitive and Unexp(B) is finite, Steps 4 through 9 are applied
iteratively and add some Q to MinUnexp(B) where Q |=M P. Since P |=M X, it follows that Q |=M X.
Hence there exists Y ∈ MinUnexp(B) such that Y |=M X.
The third condition of the definition holds trivially since Steps 6 and 7 of FilterMinZoomUR add a pattern X
to MinUnexp(B) only if there exists no other pattern, Y, in Unexp(B) such that Y |=M X.
One of the main sources of inefficiencies of this algorithm is that we first have to generate Unexp(B) and
then filter the non-minimal patterns out of it. In the next two sections we present more efficient algorithms
that avoid generating many unexpected patterns that may be known to be non-minimal.
4.2 Algorithm MinZoominUR
In this section we present MinZoominUR, an algorithm for discovering the minimal set of unexpected
refinements to a set of beliefs.
Consider the belief body → head, having the structure specified in Section 2. We use the term CONTR(head)
to refer to the set of atomic conditions that contradict the atomic condition specified by head. Assume that v1,
v2, ..., vk is the set of unique values (sorted in ascending order if the attribute is ordered) that the attribute in head takes on in
D. CONTR(head) is generated as follows:
(1) If the head of the belief is of the form value1 ≤ attribute ≤ value2 (attribute is ordered), then condition
value3 ≤ attribute ≤ value4 belongs to CONTR(head) if the ranges [value1, value2] and [value3, value4] do
not overlap.
(2) If the head of the belief is of the form attribute = val (attribute is unordered), then condition attribute
= vp belongs to CONTR(head) if vp ∈ {v1, v2, ..., vk} and vp ≠ val.
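To illustrate the two cases above, the following Python sketch (ours; the encoding of head conditions is an assumption) builds CONTR(head) from the distinct values that the head attribute takes on in D.

# Illustrative sketch: building CONTR(head).
# head is ("=", attr, value) for an unordered attribute or
# ("range", attr, lo, hi) for an ordered attribute; values is the sorted list
# of distinct values the head attribute takes on in D.

def contr(head, values):
    if head[0] == "=":                           # case (2): unordered attribute
        _, attr, val = head
        return [("=", attr, v) for v in values if v != val]
    _, attr, lo, hi = head                       # case (1): ordered attribute
    out = []
    for i, v1 in enumerate(values):
        for v2 in values[i:]:
            if v2 < lo or v1 > hi:               # [v1, v2] does not overlap [lo, hi]
                out.append(("range", attr, v1, v2))
    return out                                   # degenerate single-value ranges are kept for simplicity

print(contr(("=", "beer", "yes"), ["no", "yes"]))
# [('=', 'beer', 'no')]
print(contr(("range", "sales", 10, 20), [1, 5, 15, 30, 40]))
# contains, among others, ('range', 'sales', 1, 5) and ('range', 'sales', 30, 40)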
Algorithm MinZoominUR is based on the Apriori algorithm [AMS+95] with several major differences. First,
generation of large itemsets starts with a set of beliefs that seed the search. Second, MinZoominUR does not
generate those itemsets that are guaranteed to produce non-minimal rules. Third, the rule generation process is
integrated into the itemset generation part of the algorithm – this process is immaterial for Apriori but results
in significant efficiency improvements for MinZoominUR.
Before presenting MinZoominUR, we first present a broad overview of the algorithm. Each iteration of
MinZoominUR generates itemsets in the following manner. In the k-th iteration we generate itemsets of the
form {C,body,P}, where C ∈ CONTR(head) and P is a conjunction of k atomic conditions. Observe that to
determine the confidence of the rule body, P → C, the supports of both the itemsets {C,body,P} and
{body,P} will have to be determined. Hence in the k-th iteration of generating large itemsets, two sets of
candidate itemsets are considered for support determination:
(1) The set Ck of candidate itemsets. Each itemset in Ck (e.g. {C,body,P}) contains
(i) a condition that contradicts the head of the belief (i.e., any C ∈ CONTR(head)),
(ii) the body {body} of the belief, and
(iii) k other atomic conditions (P is a conjunction of k atomic conditions).
(2) A set Ck' of additional candidates. Each itemset in Ck' (e.g. {body,P}) is generated from an itemset in Ck
by dropping the condition, C, that contradicts the head of the belief.
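Both candidate sets are needed because the confidence of each candidate rule follows directly from two itemset supports, as the small illustration below (ours) shows.

# Illustrative: the confidence of the rule body, P -> C is obtained from the
# supports of the two candidate itemsets {C, body, P} and {body, P}.
def rule_confidence(support_c_body_p, support_body_p):
    return support_c_body_p / support_body_p if support_body_p > 0 else 0.0

print(rule_confidence(0.12, 0.20))   # 0.6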
In each iteration, minimal unexpected rules are generated from the set of large itemsets. If an itemset
generates an unexpected rule, it is deleted from consideration and therefore no superset of this itemset is
ever considered in subsequent iterations. As we prove in Theorem 4.2, this step avoids the generation of
itemsets producing non-minimal rules and significantly improves the efficiency of the algorithm.
We explain the steps of MinZoominUR in Fig. 4.2 now. The following is a list of notations that are used in
describing the algorithm:
• UNORD is the set of unordered attributes.
• ORD is the set of ordered attributes.
• minwidth(a) and maxwidth(a) are the minimum and maximum widths for ordered attribute a.
• Attributes(x) is the set of all attributes present in any of the conditions in itemset x.
• Values(a) is the set of distinct values the attribute a takes in the dataset D.
First, given a belief, B, the set of atomic conditions that contradict the head of the belief, CONTR(head(B)),
is computed (as described previously). Then, the first candidate itemsets generated in C0 (step 3) will each
contain the body of the belief and a condition from CONTR(head(B)). Hence the cardinality of the set C0 is
the same as the cardinality of the set CONTR(head(B)).
Inputs: Beliefs Bel_Set, Dataset D, minwidth and maxwidth for all ordered attributes ORD and thresholds min_support and min_conf
Outputs: For each belief, B, MinUnexp(B)
1  forall beliefs B ∈ Bel_Set {
2    MinUnexp(B) = {}
3    C0 = { {x, body(B)} | x ∈ CONTR(head(B)) };
4    C0' = { {body(B)} };
5    k = 0
6    while (Ck != ∅) do {
7      forall c ∈ Ck ∪ Ck', compute support(c)
8      Lk = {x | x ∈ Ck, support(x) ≥ min_support}
9      Lk' = {x | x ∈ Ck', support(x) ≥ min_support}
10     forall (x ∈ Lk) {
11       Let a = x ∩ CONTR(head(B))  /* this intersection is a single element */
12       rule_conf = support(x)/support(x - a)
13       if (rule_conf > min_conf) {
14         MinUnexp(B) = MinUnexp(B) ∪ {x - a → a}
15         Lk = Lk - x
16       }
17     }
18     k++
19     Ck = generate_new_candidates(Lk-1, B)
20     Ck' = generate_bodies(Ck, B)
21   }
22   forall x ∈ MinUnexp(B) {
23     Other_unexp = MinUnexp(B) - x
24     if (∃ y ∈ Other_unexp such that y |=M x) {
25       MinUnexp(B) = MinUnexp(B) - {x}
26     }
27   }
28 }
Figure 4.2 Algorithm MinZoominUR
To illustrate this, consider an example involving only binary attributes. For the belief x=0 → y=0, the set
CONTR({y=0}) consists of a single condition {y=1}. The initial candidate sets, therefore, are C0 = {{y=1,
x=0}}, C0' = {{x=0}}.
Steps (6) through (20) in Fig. 4.2 are iterative: Steps 7 through 9 determine the supports in dataset D for all
the candidate itemsets currently being considered and select the large itemsets Lk and Lk'. Each itemset in Lk
contains the body and the head of a potentially unexpected rule, while each itemset in Lk’ contains only the
body of the potentially unexpected rule.
Steps 10 through 17 generate unexpected rules; large itemsets that contribute to unexpected rules are
subsequently deleted in Step 15. Specifically, for each large itemset in Lk, if the unexpected refinement rule
that is generated from the itemset has sufficient confidence, then two actions are performed:
1. Step 14 adds this rule to the set of potentially minimal unexpected refinements.
2. Step 15 deletes the corresponding itemset from Lk since any itemset that is a superset of this itemset can
only generate unexpected refinements that can be monotonically inferred from the new rule generated in
Step 14; Theorem 4.2 below proves this in detail.
In step (19), function generate_new_candidates(Lk-1, B) generates the set Ck of new candidate itemsets to be
considered in the next pass from the previously determined set of large itemsets, Lk-1, with respect to the
belief B (“x → y”) in the following manner:
(A) Initial condition (k=1): In the example (involving binary attributes) considered above, assume that L0 =
{{x=0, y=1},{x=0}}, i.e. both the initial candidates had adequate support. Further assume that p is the only
other attribute (also binary) in the domain. The next set of candidate itemsets to be considered would be C1 =
{{x=0,y=1,p=0}, {x=0,y=1,p=1}}, and C1’ = {{x=0, p=0}, {x=0, p=1}}.
In general we generate C1 from L0 by adding additional conditions of the form attribute = value for
unordered attributes or of the form value1 ≤ attribute ≤ value2 for ordered attributes to each of the itemsets in
L0. More specifically, for a belief B, the set C1 is computed using the following rules. If itemset x ∈ L0 and x
contains a condition that contradicts the head of the belief:
1. The itemset x ∪ {{a = val}} ∈ C1 if a ∈ UNORD (set of unordered attributes), val ∈ Values(a) and a ∉
Attributes(x).
2. The itemset x ∪ {{value1 ≤ a ≤ value2}} ∈ C1 if a ∉ Attributes(head(B)), a ∈ ORD (set of ordered
attributes), value1 ∈ Values(a), value2 ∈ Values(a), value1 ≤ value2, and the resulting width for the
attribute a satisfies the minimum and maximum width restrictions for that attribute.
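A compact Python sketch (ours; the itemset representation and the bookkeeping of attributes and values are simplifying assumptions) of the two extension rules used to build C1 from L0:

# Illustrative sketch of building C1 from L0 (extension rules 1 and 2 above).
# A condition is ("=", attr, value) or ("range", attr, lo, hi);
# an itemset is a frozenset of conditions.

def attributes(itemset):
    return {cond[1] for cond in itemset}

def extend(itemset, unord, ordr, values, head_attrs, minw, maxw):
    out = set()
    for a in unord:                                          # rule 1: unordered attributes
        if a not in attributes(itemset):
            for v in values[a]:
                out.add(itemset | {("=", a, v)})
    for a in ordr:                                           # rule 2: ordered attributes
        if a in head_attrs:
            continue
        for lo in values[a]:
            for hi in values[a]:
                if lo <= hi and minw[a] <= hi - lo <= maxw[a]:
                    out.add(itemset | {("range", a, lo, hi)})
    return out

L0 = [frozenset({("=", "x", 0), ("=", "y", 1)})]             # seeded from the belief x=0 -> y=0
C1 = set()
for itemset in L0:
    C1 |= extend(itemset, unord={"x", "y", "p"}, ordr={"age"},
                 values={"p": [0, 1], "age": [20, 30, 40]},
                 head_attrs={"y"}, minw={"age": 0}, maxw={"age": 20})
# C1 now contains {x=0, y=1, p=0}, {x=0, y=1, p=1} and extensions with
# range conditions on age that satisfy the width limits.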
This process is efficient and complete for the following reasons.
1. The attributes are assumed to have a finite number of unique discrete values in the dataset D. Only
conditions involving these discrete values are considered.
2. For unordered attributes no condition involving an attribute already present in the itemset is added. This
ensures that itemsets that are guaranteed to have zero support are never considered. For example, this
condition ensures that for the belief month=9 → sales=low, the itemset {{month = 3}} is not added to
the itemset {{sales = high}, {month = 9}}.
3. For ordered attributes, however, an itemset such as {{3 ≤ a ≤ 6 }} can be added to {{b=1}, {5 ≤ a ≤
8}} to result in {{b=1}, {5 ≤ a ≤ 6 }} where the initial belief may be 5 ≤ a ≤ 8→ b=0 for example.
Without loss of generality in this case we represent the new itemset as {{b=1}, {5 ≤ a ≤ 8}, {3 ≤ a ≤ 6
}} rather than as {{b=1}, {5 ≤ a ≤ 6 }}. We use this “long form” notation since (1) we assume that all
itemsets in a given iteration have the same cardinality and (2) the body of the belief is explicitly present
in each itemset.
(B) Incremental generation of Ck from Lk-1 when k > 1: This function is very similar to the apriori-gen
function described in [AMS+95]. For example, assume that for a belief, B, "x → y", c is a condition that
contradicts y and that L1 = {{c, x, p}, {c, x, q}, {x, p}, {x, q}}. Similar to the apriori-gen function, the next
set of candidate itemsets that contain x and c is C2 ={{x, c, p, q}} since this is the only itemset such that all its
subsets of one less cardinality that contain both x and c are in L1.
In general, an itemset X is in Ck if and only if for the belief B, X contains body(B) and a condition A such that
A ∈ CONTR(head(B)) and all subsets of X with one less cardinality, containing A and body(B), are in Lk-1.
More specifically, Ck is generated from Lk-1 using the following rule:
If a ∈ CONTR(head(B)), a ∈{x1, x2,..., xp} and {x1, x2,..., xp, v}, {x1, x2,..., xp, w}∈ Lk-1 then
{x1, x2,..., xp, v, w} ∈ Ck if w ∉ Attributes({x1, x2,..., xp, v}).
The above rule for generating Ck essentially limits itemsets to a single condition for each “new” attribute not
present in the belief B. This however does not eliminate any relevant large itemset from being generated as
the following example shows. Consider the case where starting from the belief x=1 → y=0 the set of large
itemsets (generated by MinZoominUR at the end of the first iteration) L1 contains {y=1, x=1, 3 ≤ a ≤ 6} and
{y=1, x=1, 5 ≤ a ≤ 7}. Combining these itemsets yields the equivalent itemset {y=1, x=1, 5 ≤ a ≤ 6} of the
same cardinality as any itemset in L1 and if this itemset is large, it would already be present in L1.
In step (20), as described previously, we would also need the support of additional candidate itemsets in Ck'
to determine the confidence of unexpected rules that will be generated. The function generate_bodies(Ck,B)
generates Ck' by considering each itemset in Ck and dropping the condition that contradicts the head of the
belief and adding the resulting itemset to Ck'.
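The following Python sketch (ours; itemsets are frozensets of conditions as in the previous sketch, and the handling of ordered attributes is simplified) shows the apriori-gen-style join that produces Ck from Lk-1 for k > 1, together with generate_bodies:

# Illustrative sketch of generate_new_candidates (k > 1) and generate_bodies.
# A condition is ("=", attr, value) or ("range", attr, lo, hi);
# an itemset is a frozenset of conditions.

def attributes(itemset):
    return {cond[1] for cond in itemset}

def generate_new_candidates(Lk_1, body, contr_head):
    # Join two itemsets of Lk-1 that share all but one condition and contain
    # body(B) together with a condition that contradicts head(B).
    Ck = set()
    for x in Lk_1:
        for y in Lk_1:
            common = x & y
            if x == y or len(x) != len(y) or len(common) != len(x) - 1:
                continue
            if not (body <= common and any(c in contr_head for c in common)):
                continue
            (w,) = tuple(y - common)                  # the condition contributed by y
            if w[1] not in attributes(x):             # w must involve a "new" attribute
                Ck.add(x | y)
    return Ck

def generate_bodies(Ck, contr_head):
    # Drop the condition that contradicts head(B) from each candidate itemset.
    return {frozenset(c for c in x if c not in contr_head) for x in Ck}

# With L1 = {{c, x, p}, {c, x, q}, ...} as in the example above, the join
# produces C2 = {{x, c, p, q}}, and generate_bodies yields {{x, p, q}}.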
Steps (22 – 27) are needed to detect any remaining non-minimal rules that arise due to the following special
case of certain itemsets containing ordered attributes. To illustrate this special case, consider the following
two itemsets: {{a=1}, {5 ≤ b ≤ 10}} and {{a=1}, {7 ≤ b ≤ 8}}. The special case is that neither of these sets
is a superset of the other, yet (5 ≤ b ≤ 10 → a=1) |=M (7 ≤ b ≤ 8 → a=1) since (7 ≤ b ≤ 8) |= (5 ≤ b ≤ 10).
Therefore, the rule 7 ≤ b ≤ 8 → a=1 should be eliminated in order to produce the minimal set of unexpected
rules. Since Steps (6 – 21) of the algorithm do not eliminate such rules, the additional Steps (22 – 27) do this.
Observe that in the case of only unordered attributes in the itemsets, Steps (22 – 27) of the algorithm are not
needed since MinUnexp(B) after Step 21 is guaranteed to be minimal (see the proof of Theorem 4.2).
Moreover, notice that elimination of remaining non-minimal rules can also be done in two other parts of the
algorithm instead of Steps (22 – 27). First, it can be done in generate_new_candidates procedure (Step 19)
by avoiding the generation of the itemsets that are generalized refinements of itemsets that have previously
produced unexpected rules. Second, the elimination procedure can also be done as new rules are added to the
set MinUnexp(B) in Step 14 by comparing the new rule with all the other rules in MinUnexp(B). We
analyzed both of these possibilities and found that the approach taken in Steps (22 – 27) of Figure 4.2 is at
least as efficient as these two other possibilities. Moreover, another advantage of doing elimination in Steps
(22 – 27) is that it can be dropped altogether in case of only unordered attributes (as explained above).
The computational complexity of Steps (1 – 21) is determined by the total number of candidate itemsets K
generated in Steps (19 - 20) taken over all the iterations of the While-loop. The computational complexity of
the elimination procedure in Steps (22 – 27) is O(n²), where n is the size of the set MinUnexp(B). In practice
K >> n². Therefore, the bottleneck of the MinZoominUR algorithm lies in Steps (6 – 21).
Moreover, the complexity of MinZoominUR in the worst case is comparable to the worst-case complexity of
Apriori that is bounded by O(||C|| * ||D||), where ||C|| denotes the sum of the sizes of candidates considered,
and ||D|| denotes the size of the database [AMS+95]. However, in the average case, the computational
complexity of MinZoominUR is significantly lower than that of Apriori. This is the case because the average
number of candidates considered in MinZoominUR is significantly lower than that for Apriori due to (a)
the minimality-based elimination procedure, and (b) the presence of the initial set of beliefs that seed the search
process. In Section 5 we experimentally compare the main algorithm, MinZoomUR (presented in the next
section), with Apriori not in terms of itemsets but directly in terms of the number of rules generated.
A key strength of MinZoominUR, compared to ZoomUR [PT99] and Apriori [AMS+95], is that rule
discovery is integrated into the itemset generation procedure and hence it can greatly reduce the number of
itemsets generated in subsequent iterations - as [AMS+95] show, itemsets can grow exponentially in
association rule discovery algorithms. The reason this is possible is the objective of MinZoominUR - to
generate only the minimal set of unexpected patterns (as opposed to generating all patterns or all unexpected
patterns).
Below we prove the completeness of MinZoominUR.
Theorem 4.2. For any belief, B, MinZoominUR discovers the minimal set of unexpected rules that are
refinements to the belief.
Sketch of the Proof. We will first show that for the case where there are unordered attributes only,
MinZoominUR generates the minimal set of unexpected patterns without needing to apply the minimal filter
(Steps 22 through 27 of Figure 4.2).
For unordered attributes only, it is easy to see that a rule X1=x1, X2=x2, …, Xn=xn → Y = y1 is minimal if and
only if there is no rule of the form Z → Y = y1, where Z ⊂ {X1=x1, X2=x2, …, Xn=xn}.⁶
For unordered attributes only, consider the belief A=a1 → B=b1 and any minimal unexpected rule A=a1, X
→ B=b2 which is a refinement that holds, i.e. its support and confidence values are greater than the
specified threshold values, where the itemset X is a conjunction of atomic conditions involving unordered
attributes. Below we will show that A=a1, X → B=b2 will be discovered by MinZoominUR.
Since A=a1, X → B=b2 holds, the itemset {A=a1, X, B=b2}, and all its subsets, have adequate support.
Further since the rule is assumed to hold, it is clear that Steps 13 and 14 generate the rule A=a1, X →
B=b2 if the itemset {A=a1, X, B=b2} is generated. Hence it needs to be shown that the itemset {A=a1 , X,
B=b2} will be generated.
Before the iterations of MinZoominUR, Step 3 generates {A=a1, B=b2} in the initial set of candidates.
Based on Apriori and the completeness proof of ZoomUR [P99], it is clear that the itemset {A=a1, X,
B=b2} will be generated unless Step 15 deleted a “parent”⁷ of this itemset. We will next show that this is
impossible, i.e. Step 15 could not have deleted a parent of this itemset in any previous iteration.
Consider any parent, {A=a1, Y, B=b2} of the itemset {A=a1, X, B=b2} defined such that Y ⊂ X. Assume
that the itemset {A=a1, Y, B=b2} was deleted in Step 15 in some iteration. Hence it has to be the case (see
Steps 13-16 of Figure 4.2) that A=a1, Y → B=b2 holds. However if Y ⊂ X and A=a1, Y → B=b2 then the
rule A=a1, X → B=b2 cannot be minimal. This is a contradiction and hence no itemset of the form {A=a1,
Y, B=b2} will be deleted in previous iterations. Hence the itemset {A=a1, X, B=b2} will be generated.
Hence MinZoominUR will generate any minimal unexpected rule A=a1, X → B=b2 which is a refinement to
the belief. Given the observation that for unordered attributes only, a rule X1=x1, X2=x2,…, Xn=xn → Y = y1 is
minimal if and only if there is no rule of the form Z → Y = y1, where Z ⊂ {X1=x1, X2=x2,…, Xn=xn}, it is easy
to see that MinZoominUR does not generate any non-minimal rule.
Hence, for the case where there are unordered attributes only, MinZoominUR generates the minimal set
of unexpected patterns without needing to apply the minimal filter. For the case of ordered attributes, it
can easily be seen that only non-minimal rules are automatically excluded by MinZoominUR. However,
there is a special case involving ordered attributes for which minimality cannot be guaranteed before Steps
22-27. This special case arises since a syntactic subset check cannot capture containment when dealing
with ranges of values for ordered attributes; an example of this special case was given above in Section
4.2. Hence the filter in Steps 22-27 removes any remaining non-minimal rules, and it is clear that
MinZoominUR generates only the minimal set of unexpected refinements to a belief.

⁶ Note that this “syntactic” subset property is not true when dealing with ordered attributes, which is why the minimal filter in Steps 22-27 is necessary.
⁷ Parent is defined formally in Section 4.3.1.
As mentioned earlier, MinZoominUR only discovers the minimal set of unexpected refinements to a belief.
We next present MinZoomUR, an algorithm that discovers for each belief the minimal set of unexpected
rules.
4.3 Algorithm MinZoomUR
In this section we present MinZoomUR, an algorithm that discovers for each belief, the minimal set of
unexpected rules. Before describing the algorithm we present some preliminaries.
4.3.1 Preliminaries
For a belief B, let x be any large itemset containing body(B) and one condition from CONTR(head(B)).
We use the term parents(x) to denote the set of all subsets of x that contain the body of the belief and one
condition that contradicts the head of the belief, and that were considered in previous iterations during the
candidate generation phase of the algorithm.⁸ Specifically,
parents(x) = { a | a ⊂ x, body(B) ⊂ a, ∃c such that c ∈ CONTR(head(B)) and c ∈ a}
An itemset y is said to be a parent of x if y ∈ parents(x).
We use the term zoomin rules to denote unexpected rules that are refinements of beliefs and zoomout rules
for unexpected rules that are more general unexpected rules. The large itemset x is said to generate a zoomin
rule if confidence (x - c → c) > min_conf, where c ∈ CONTR(head(B)). The large itemset x is said to
generate a zoomout rule if x generates a zoomin rule x - c → c and confidence( x - c - d → c) > min_conf,
where c ∈ CONTR(head(B)), d ⊆ body(B) and d is not empty.
⁸ Recall that the candidate generation phase of these algorithms (Apriori, MinZoominUR and MinZoomUR) is iterative such that itemsets in subsequent iterations have greater cardinality (number of items).
Associated with each itemset, x, are two attributes: x.rule, which keeps track of whether a zoomin rule is
generated from x, and x.dropped_subsets, which keeps track of the subsets of body(B) that are dropped
during the discovery of zoomout rules. These two attributes are further explained below.
We define x.rule as follows:
x.rule = 1 if x generates a zoomin rule,
= 0 otherwise.
Further we define parentrules(x) to be TRUE if and only if x has a parent y such that y.rule = 1.
If x generates a zoomin rule, zoomout rules are generated in ZoomoutUR [PT98, P99] by dropping one or
more attributes belonging to body(B) from the zoomin rule generated. This process essentially drops nonempty subsets of body(B) from the zoomin rule generated to determine the zoomout rules that are
unexpected. The set x.dropped_subsets is the set of subsets of body(B) that are dropped from itemset x to
generate a given set of zoomout rules. Specifically, if P is a set of zoomout rules generated from x, then
x.dropped_subsets is defined as follows:
x.dropped_subsets(P) = {d | { x - c - d → c} ∈ P, where c ∈ CONTR(head(B)), d ⊆ body(B), d ≠ ∅ }.
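A small sketch (ours; items are plain strings rather than full conditions) of how x.dropped_subsets(P) can be computed from a set of zoomout rules generated from x:

# Illustrative: the subsets of body(B) dropped from itemset x to obtain each
# zoomout rule in P. A zoomout rule is represented here by its body itemset
# (a frozenset); c is the condition from CONTR(head(B)) contained in x.
def dropped_subsets(x, c, body, P):
    return {body - rule_body for rule_body in P if rule_body < (x - {c})}

x = frozenset({"a", "b", "c", "y"})             # belief a, b -> z; y contradicts z
body = frozenset({"a", "b"})
P = [frozenset({"b", "c"}), frozenset({"c"})]   # zoomout rules b, c -> y and c -> y
print(dropped_subsets(x, "y", body, P))
# {frozenset({'a'}), frozenset({'a', 'b'})} (in some order)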
Given these preliminaries, we now present MinZoomUR.
4.3.2 Overview of the Discovery Strategy
Unlike what was done in MinZoominUR, an itemset that generates a zoomin rule in MinZoomUR cannot
always be deleted from subsequent consideration since it is possible for minimal zoomout rules to be derived
from non-minimal zoomin rules. Consider the following example. For a belief a, b → x, let a, b, c → y and a,
b, c, d → y be two zoomin rules. Though a, b, c, d → y is a non-minimal zoomin rule, the rule may result in a
zoomout rule such as b, c, d → y which may belong to the minimal set of unexpected rules.
Extending this example one more step, we observe that the zoomout rule b, c, d → y can, however, be
guaranteed to be non-minimal if the first zoomin rule a, b, c → y resulted in a zoomout rule of the form p, c
→ y such that b, c, d |= p, c where p is a proper subset of the body of the belief. Examples of such p are {b}
and {} corresponding to the zoomout rules b, c → y and c → y respectively (generated from a, b, c → y).
However if the first zoomin rule generated only the zoomout rule a, c → y, it may still be possible for the
zoomout rule b, c, d → y to be minimal since b, c, d |≠ a, c.
The discovery strategy of MinZoomUR is based on the following conditions under which some generated
rules are guaranteed to be non-minimal and hence can be excluded from the minimal set. Theorem 4.3 below
proves that these conditions do indeed exclude only non-minimal rules, hence we state these rules with only
limited explanation here. The “exclusion rules” used in MinZoomUR are:
1. If x and y are two large itemsets such that x is a parent of y and x.rule = 1 then the zoomin rule generated
from y cannot be minimal. Hence for any itemset y such that parentrules(y) is TRUE, the zoomin rule
generated from y cannot be minimal. This is the only exclusion rule used previously in MinZoominUR.
2. If x is a large itemset that generates a zoomin rule and some zoomout rules, then the zoomin rule
generated cannot be minimal.
3. If x is a large itemset that generates zoomout rules p and q and elem_p ∈ x.dropped_subsets(p)⁹ and
elem_q ∈ x.dropped_subsets(q) and elem_p ⊂ elem_q then p cannot be minimal. For example, for a
belief a, b → z, let itemset x = {a, b, c, y} be large where y ∈ CONTR(z). If x generates the zoomout
rules p ("b, c → y") and q ("c → y") then elem_p is {a} and elem_q is {a, b}. Since elem_p ⊂ elem_q,
the rule b, c → y cannot belong to the minimal set of unexpected rules since it can be inferred from c →
y using the monotonicity assumption.
4. If x and y are two large itemsets such that x is a parent of y, then zoomout rules generated from y by
dropping any subset, p, from the body of the belief such that p is a subset of some element belonging to
x.dropped_subsets cannot be minimal rules. For example, for a belief a, b, c → z, let itemset x = {a, b, c,
d, m} be large where m ∈ CONTR(z). Let x generate the zoomout rule c, d → m. Hence {a, b} ∈
x.dropped_subsets. This exclusion rule states that from any large itemset y, generated in subsequent
iterations from x, the zoomout rules derived from y by dropping either {a} or {b} cannot be minimal. For
example, assume the large itemset y = {a, b, c, d, e, m}. The zoomout rules generated from y by dropping
{a} or {b} are, respectively, b, c, d, e → m and a, c, d, e → m. Observe that neither of these two zoomout
rules can be minimal since these can be derived under the monotonicity assumption from the prior
zoomout rule c, d → m.
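The following sketch (Python; a hypothetical illustration under the same (body, head) and frozenset representation as above, not the paper's implementation) shows one way the subset tests behind exclusion rules #3 and #4 could be coded.

    from typing import FrozenSet, Set

    def keep_maximal_dropped(dropped: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
        """Exclusion rule #3: among the zoomout rules generated from one itemset,
        a rule whose dropped subset is a proper subset of another rule's dropped
        subset cannot be minimal, so only the maximal dropped subsets are kept."""
        return {p for p in dropped if not any(p < q for q in dropped)}

    def excluded_by_parent(p: FrozenSet[str], parent_dropped: Set[FrozenSet[str]]) -> bool:
        """Exclusion rule #4: a zoomout rule obtained from a child itemset by
        dropping subset p cannot be minimal if p is contained in some subset
        already dropped by the parent itemset."""
        return any(p <= q for q in parent_dropped)

    # In the rule #3 example above, the dropped subsets {a} and {a, b} are pruned to {a, b},
    # i.e., only the rule c -> y (obtained by dropping {a, b}) survives.
    print(keep_maximal_dropped({frozenset({'a'}), frozenset({'a', 'b'})}))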
MinZoomUR generates candidate itemsets in the same manner as MinZoominUR. A main difference between the algorithms is that MinZoomUR also considers zoomout rules for a given itemset immediately after the itemset generates a zoomin rule. This is necessary because some of the exclusion rules applied to a generated unexpected rule depend on knowing the zoomout rules generated for that itemset and its parents.
Inputs: Beliefs Bel_Set, Dataset D, minwidth and maxwidth for all ordered attributes, and thresholds min_support and min_conf
Outputs: For each belief B, MinUnexp(B)

1   forall beliefs B ∈ Bel_Set {
2     MinUnexp(B) = {}; k = 0
3     C0 = {{x, body(B)} | x ∈ CONTR(head(B))}; C0' = {{body(B)}}
4     while (Ck != ∅) do {
5       forall c ∈ Ck ∪ Ck', compute support(c)
6       Lk = {x | x ∈ Ck, support(x) ≥ min_support}
7       Lk' = {x | x ∈ Ck', support(x) ≥ min_support}
8       forall (x ∈ Lk) {
9         x.rule = 0; let a = x ∩ CONTR(head(B))
10        rule_conf = support(x) / support(x - a)
11        if (rule_conf > min_conf) {
12          x.rule = 1
13          zoutrules = minzoomoutrules(x, B)
14          x.dropped_subsets = dropped_subsets(zoutrules, B)
15          MinUnexp(B) = MinUnexp(B) ∪ zoutrules
16          if (zoutrules == ∅ and parentrules(x) == FALSE)
17            MinUnexp(B) = MinUnexp(B) ∪ {x - a → a}
18          if ({body(B)} ∈ x.dropped_subsets) Lk = Lk - x
19        }
20      }
21      k++
22      Ck = generate_new_candidates(Lk-1, B)
23      Ck' = generate_bodies(Ck, B)
24    }
25    forall x ∈ MinUnexp(B) {
26      Other_unexp = Unexp(B) - x
27      if (∃ y ∈ Other_unexp | y |=M x)
28        MinUnexp(B) = MinUnexp(B) - {x}
29    }
30  }

Figure 4.3. Algorithm MinZoomUR
MinZoomUR is presented in Figure 4.3. The beginning of the algorithm (Steps 1 through 7) and its end (Steps 25 through 30), where the minimal filter is applied, are the same as in MinZoominUR. We explain Steps 8 through 24 next, where unexpected rules are generated from each large itemset and the potentially minimal ones are stored in MinUnexp(B).
Steps 8 through 12 determine whether a large itemset x generates a zoomin rule and set the attribute x.rule accordingly. If a zoomin rule is generated, Step 13 applies the procedure minzoomoutrules to consider the potentially minimal zoomout rules generated from this itemset. This procedure first applies exclusion rule #3 to generate some potentially minimal zoomout rules; exclusion rule #4 is then applied to this set of rules to filter out any rules guaranteed to be non-minimal. The resulting set of potentially minimal zoomout rules generated from this itemset is zoutrules. Step 14 sets the attribute x.dropped_subsets to the set of subsets of the body of the belief dropped in any of the rules in zoutrules; note that these subsets are actually computed in the process of applying exclusion rule #3 in the previous step. Step 15 adds the rules in zoutrules to the potentially minimal set. To decide whether the zoomin rule generated for x should be added to the potentially minimal set, Steps 16 and 17 apply exclusion rules #1 and #2. Finally, for each large itemset x, Step 18 applies a corollary to exclusion rule #4: if the entire body of the belief is dropped to generate a zoomout rule, then no children of x can generate any minimal rule. Hence, in this event the itemset x is deleted from subsequent consideration.
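As a rough illustration of this control flow, the sketch below (Python; the procedures from Figure 4.3 — minzoomoutrules, dropped_subsets and parentrules — are assumed to be supplied by the caller, and the itemset, belief and rule representations are our own assumptions) traces Steps 8 through 18 for a single large itemset.

    def process_large_itemset(x, belief, support, min_conf, min_unexp,
                              minzoomoutrules, dropped_subsets, parentrules):
        """Sketch of Steps 8-18 of Figure 4.3 for one large itemset x.
        Returns (x_attrs, keep), where keep is False if x should be dropped
        from subsequent consideration (Step 18)."""
        a = x & belief['contr']                          # Step 9: contradicting item(s) in x
        x_attrs = {'rule': 0, 'dropped_subsets': set()}
        keep = True
        rule_conf = support[x] / support[x - a]          # Step 10: confidence of x - a -> a
        if rule_conf > min_conf:                         # Steps 11-12: a zoomin rule is generated
            x_attrs['rule'] = 1
            zoutrules = minzoomoutrules(x, belief)       # Step 13: applies exclusion rules #3 and #4
            x_attrs['dropped_subsets'] = dropped_subsets(zoutrules, belief)   # Step 14
            min_unexp |= zoutrules                       # Step 15: add zoomout candidates
            if not zoutrules and not parentrules(x):     # Steps 16-17: exclusion rules #2 and #1
                min_unexp.add((x - a, a))                # keep the zoomin rule x - a -> a
            if belief['body'] in x_attrs['dropped_subsets']:   # Step 18: whole body was dropped
                keep = False                             # delete x from subsequent consideration
        return x_attrs, keep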
Below we prove the completeness of MinZoomUR.
Theorem 4.3. For any belief, MinZoomUR discovers the minimal set of unexpected patterns.
Proof. As proved in Theorem 4.1, steps 25 through 30 generate the minimal set of unexpected patterns from
MinUnexp(B). Hence in order to prove the theorem, it just needs to be shown that the exclusion rules applied
prior to step 25 exclude only the non-minimal rules. In the remainder of the proof we will consider each
exclusion rule and demonstrate that the rule excludes only non-minimal rules.
Exclusion rule #1. This rule is the exclusion rule used in MinZoominUR. Hence, according to Theorem 4.2,
this rule excludes only rules guaranteed to be non-minimal.
Exclusion rule #2. This rule follows immediately from the definitions.
Exclusion rule #3. Let x be a large itemset that generates zoomout rule p by dropping elem_p from body(B)
and generates zoomout rule q by dropping elem_q from body(B). Further, as in exclusion rule #3, let elem_p
⊂ elem_q. Let c ∈ x such that c ∈ CONTR(B). The zoomout rule p therefore is x - elem_p - c → c and the
zoomout rule q is x - elem_q - c → c. Since elem_p ⊂ elem_q, we have x - elem_p - c ⊃ x - elem_q - c. Hence
x - elem_p - c |= x - elem_q - c and as a result x - elem_q - c → c |=M x - elem_p - c → c. Thus x - elem_p - c
→ c is not minimal and can be excluded.
Exclusion rule #4. Let x and y be two large itemsets such that x is a parent of y and assume that zoomout rule
zy,p is generated from y by dropping a subset, p, from the body of the belief such that p is a subset of q where
q ∈ x.dropped_subsets. It needs to be shown that zy,p cannot be minimal. Since x is a parent of y, x = y - k,
where k is some non-empty itemset and there exists c ∈ CONTR(B) such that c ∈ x and c ∈ y. Further since p
is a subset of q, assume p = q - t, where t is some itemset.
The rule zy,p therefore is y - p - c → c. Since q ∈ x.dropped_subsets, there is a zoomout rule zx,q generated from x by dropping the subset q from the body of the belief. The rule zx,q therefore is x - q - c → c. Substituting for x and q, zx,q is equivalent to (y - k) - (p + t) - c → c, which is the rule y - p - c - (k + t) → c. Since k is non-empty, y - p - c ⊃ y - p - c - (k + t) and hence y - p - c |= y - p - c - (k + t). Therefore, zx,q |=M zy,p. Hence zy,p is not minimal and can be excluded.
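The monotonic-implication test |=M used in Steps 25 through 30 and throughout this proof can be sketched as follows (Python; a minimal sketch under our (body, head) rule representation, assuming that one rule monotonically implies another exactly when the heads agree and the body of the second rule refines, i.e. is a superset of, the body of the first).

    from typing import FrozenSet, Set, Tuple

    Rule = Tuple[FrozenSet[str], str]

    def monotonically_implies(general: Rule, specific: Rule) -> bool:
        """general |=M specific: the heads agree and the body of the specific rule
        is a (refining) superset of the body of the general rule."""
        return general[1] == specific[1] and general[0] <= specific[0]

    def minimal_filter(all_unexpected: Set[Rule], candidates: Set[Rule]) -> Set[Rule]:
        """Steps 25-30 of Figure 4.3: drop every candidate that is monotonically
        implied by some other unexpected rule."""
        return {x for x in candidates
                if not any(y != x and monotonically_implies(y, x) for y in all_unexpected)}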
In this section we presented methods to generate the minimal set of unexpected patterns. The strength of these methods is that they eliminate redundant unexpected patterns and therefore generate a smaller set of unexpected patterns. In the next section we present experiments on a real-world consumer purchase dataset demonstrating that these ideas can be used to discover orders of magnitude fewer patterns than ZoomUR [PT98] and Apriori [AMS+95] and yet find most of the truly unexpected ones.
5. Experiments
To illustrate the usefulness of our approach to discovering patterns, in this section we consider an illustrative case study in which the methods are applied to consumer purchase data from a major market research firm. We pre-processed this data by combining the different data sets made available to us (transaction data joined with demographics) into one table containing 38 different attributes and 313,409 records. These attributes pertain to the items purchased by a shopper at a store over a period of one year, together with certain characteristics of the store and demographic data about the shopper and his or her family. The demographic attributes include the age and gender of the shopper, the occupation, income and marital status of the household head, the presence of children in the family, and the size of the household. The transaction-specific attributes include the product purchased, coupon usage (whether or not the shopper used any coupons to get a lower price), the availability of store coupons or manufacturer's coupons, and the presence of advertisements for the purchased product in the store. For simplicity in generating beliefs and in making comparisons to other techniques that generate association rules, in these experiments we restrict our consideration to rules involving discrete attributes only. An initial set of 28 beliefs was generated by domain experts after examining 300 rules generated from the data using methods described in [P99].
In this section we present results from applying MinZoomUR, ZoomUR [PT98, P99] and Apriori [AMS+95] to this dataset, starting, where applicable, from the initial set of beliefs. In Section 5.1 we compare these methods in terms of the number of rules generated and provide some guidelines as to when each may be applicable, and in Section 5.2 we discuss the scalability of MinZoomUR and ZoomUR with respect to the size of the database and the initial set of beliefs.
5.1 Number of Patterns Generated
In this section we compare and contrast MinZoomUR, ZoomUR and Apriori in terms of the number of patterns generated and other criteria. We also present practical implications of this comparison in the form of guidelines for when each method may be preferable. For a fixed minimum confidence level of 0.6, Figures 5.1 through 5.3 show the number of patterns generated by Apriori, ZoomUR and MinZoomUR for varying minimum support thresholds. Apriori generated 50,000 to 250,000 rules even for reasonably high minimum support values. This is not surprising since the objective of Apriori is to discover all strong association rules. For reasonable values of support (5 to 10%), ZoomUR generated 50 to 5,000 unexpected patterns. MinZoomUR, on the other hand, generated only 15 to 700 unexpected patterns even for extremely low values of minimum support.
[Figure omitted: Number of Rules (0 to 300,000) vs. Minimum Support (5.0% to 13.0%)]
Figure 5.1. Number of rules generated by Apriori
[Figure omitted: # of rules (0 to 6,000) vs. Minimum Support (5.3% to 10.3%)]
Figure 5.2. Number of unexpected rules generated by ZoomUR
[Figure omitted: # of rules (0 to 800) vs. Minimum Support (0.0% to 5.0%)]
Figure 5.3. Number of unexpected rules generated by MinZoomUR
Figure 5.4 compares the three methods in terms of the number of rules generated. Due to the order-of-magnitude differences in the number of generated rules, the graph plots the number of rules using a logarithmic scale for the Y axis.
As we would expect, as the minimum support threshold is lowered, all the methods discover a greater number of rules. Despite this, MinZoomUR discovers orders of magnitude fewer patterns than both ZoomUR and Apriori. The graphs in Figures 5.1 – 5.3 also demonstrate that a majority of the patterns generated by ZoomUR are redundant. Observe that as the support threshold is lowered, the number of patterns generated by both ZoomUR and Apriori seems to increase more than linearly. While this is also the case for MinZoomUR in some regions, MinZoomUR plateaus at lower support levels, as Figure 5.3 demonstrates. This plateau signifies that very few new minimal unexpected patterns are generated even though the number of unexpected patterns generated by ZoomUR keeps increasing in that region. This observation, coupled with the comparison of the number of rules generated, indicates that MinZoomUR is indeed effective in removing redundant patterns, which represent a large majority of the set of all unexpected patterns.
[Figure omitted: # of rules (logarithmic scale, 1 to 1,000,000) vs. Minimum Support (0 to 12%); series: Apriori, ZoomUR, MinZoomUR]
Figure 5.4. Comparison of number of rules generated by Apriori, ZoomUR and MinZoomUR
Discussion. Based on these experiments, we discuss below some possible tradeoffs between these methods and provide some guidelines for their usage.
The clear advantage of MinZoomUR over ZoomUR is that it generates far fewer patterns and yet retains
most of the truly interesting ones. Since ZoomUR generates all unexpected patterns for a belief and
MinZoomUR generates the minimal set of unexpected patterns, MinZoomUR will always generate a subset
of patterns that ZoomUR generates. As shown above, this subset can be extremely small (from 15 to a few
hundred patterns for the entire set of beliefs). We would also like to note here that the classical notion of "minimality" often assumes that it is possible to reconstruct the set of all objects having a certain property from the minimal set of objects having this property. In our case too, the set of all unexpected patterns can be reconstructed from the minimal set of unexpected patterns. However, this can only be done using a generate-and-test procedure (starting from the minimal set) that requires data access again. This limitation of our approach is the result of using efficient search algorithms that directly discover the minimal set of unexpected patterns without even examining all unexpected patterns. Moreover, this limitation can also be circumvented by letting the domain expert examine the (small) set of minimal unexpected patterns, select the most interesting minimal patterns, and refine them to discover all the unexpected patterns obtained from this selected set.
The drawback of MinZoomUR compared to ZoomUR is that MinZoomUR makes an implicit assumption
that minimal unexpected patterns are the most interesting patterns. From a subjective point of view this may not necessarily be true. Consider the following example of two unexpected patterns:
• When coupons are available for cereals, they don't get used (confidence = 60%)
• On weekends, when coupons are available for cereals, they don't get used (confidence = 98%)
MinZoomUR will not generate the second unexpected pattern since it is monotonically implied by the first
pattern. However, the second unexpected pattern has a much higher confidence and may be considered "more
unexpected" by some users, in the spirit of [SA96, BAG99]. In a more general sense, monotonicity and confidence are just two criteria for ranking unexpected patterns. In general, there may be other criteria, some of which may even depend on other subjective preferences of a user. Hence, since ZoomUR generates all the unexpected patterns, its output is guaranteed to contain all the patterns that are "most unexpected" under any specific definition of that term. In subsequent work, we will study
the issue of generating the "most unexpected patterns" by characterizing the degree of unexpectedness for
patterns along the lines of [ST96]. In the context of objective measures of interestingness, [BA99] discuss
interesting approaches to finding the “most interesting” patterns.
Given the relative advantages of the two methods for discovering unexpected patterns, a practical implication of the above is that ZoomUR can be used to generate unexpected patterns for larger support values, while MinZoomUR can be used if patterns of very low support need to be generated. As shown in Figure 5.3, MinZoomUR generates a reasonable number of unexpected patterns even for extremely small values of minimum support, as low as 0.5%. Also, the support of some beliefs about a domain may be very low, perhaps reflecting some condition that occurs rarely. In such cases, methods such as MinZoomUR that can find patterns at very low support values are necessary.
Apriori, on the other hand, has the drawback of generating a very large number of patterns, since its objective is to discover all strong rules. As Figure 5.1 shows, for very low support values this could result in millions of rules. However, there are two sides to this coin. Generating a very large number of patterns results in a data mining problem of a second order and should hence be avoided. At the same time, the domain knowledge that ZoomUR and MinZoomUR start with will almost always be incomplete for most business applications. Hence it is possible that either of the two methods that seek unexpected patterns could miss other interesting patterns that may be unrelated to the captured domain knowledge. However, the set of patterns generated by Apriori is guaranteed to contain all the interesting patterns since it contains all the patterns. We believe that this tradeoff is in some sense unavoidable, since the problem of generating all interesting patterns (not just "unexpected" ones) is a difficult problem to solve.
5.2 Scalability Issues
In this section we experimentally examine the scalability of ZoomUR and MinZoomUR with respect to the
size of the database and the number of initial beliefs.
Scalability with the size of the database
For a sample of 10 beliefs, we ran ZoomUR and MinZoomUR multiple times, varying the number of records in the dataset from 40,000 to 200,000. Figures 5.5 and 5.6 show the execution times for ZoomUR and MinZoomUR respectively, and indicate that both methods seem to scale linearly with the size of the database in the range considered. This is not surprising since these algorithms are based on Apriori, which, as shown in [AMS+95], scales linearly.
[Figure omitted: Time (minutes, 0 to 30) vs. Database size (0 to 200,000)]
Figure 5.5. Execution time of ZoomUR as a function of database size
[Figure omitted: Time (minutes, 0 to 25) vs. Database size (0 to 250,000)]
Figure 5.6. Execution time of MinZoomUR as a function of database size
Scalability as a function of number of initial beliefs
Since ZoomUR and MinZoomUR generate unexpected patterns for each belief individually, these methods will scale linearly with the number of beliefs. However, the time taken to generate unexpected patterns for a belief depends on the specific belief under consideration, for reasons such as the support and confidence values of the belief on the data and the number of unexpected patterns that actually exist. To examine this in greater detail, we measured the percentage of execution time that was used to generate unexpected patterns for each belief. Figures 5.7 and 5.8 illustrate the percentage of time taken for each of the beliefs. Each sector represents the time taken for generating unexpected patterns for a specific belief.
Figure 5.7. Proportion of execution time of ZoomUR for each belief
Figure 5.8. Proportion of execution time of MinZoomUR for each belief
Each of the pie charts in Figures 5.7 and 5.8 has slices corresponding to the individual beliefs. The distribution of time is similar in both cases, but the disproportionate share of some beliefs was interesting, and we discuss here why this should be expected to occur. Assume that a belief A → B has support s and confidence c in the data. The upper bound on the support of any unexpected rule generated by ZoominUR is the fraction of the records where A is true and B is not true; this upper bound is (1-c)*s/c, since support(A, ¬B) = support(A) – support(A, B) = (s/c) – s = (1-c)*(s/c). For any given minimum support value, for beliefs where this fraction is large, intuitively many more itemsets can be generated and hence the execution time would be greater. When we ranked the beliefs according to this criterion, the top six beliefs were the same six beliefs that contributed to most of the execution time. Hence the distribution shown in Figures 5.7 and 5.8 is simply indicative of the fact that the beliefs in the initial set may vary in their support and confidence values on the data.
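The following small computation illustrates this bound (a sketch in Python; the belief support and confidence values used are purely illustrative and are not drawn from the dataset described above).

    def unexpected_support_upper_bound(s: float, c: float) -> float:
        """Upper bound on the support of any unexpected rule for a belief A -> B with
        support s and confidence c: support(A, not B) = support(A) - support(A, B)
        = s/c - s = (1 - c) * s / c."""
        return (1 - c) * s / c

    # Illustrative values only: a belief with support 0.20 and confidence 0.80 leaves
    # at most 5% of the records available for contradicting rules, while a belief with
    # the same support but confidence 0.95 leaves only about 1%.
    print(unexpected_support_upper_bound(0.20, 0.80))   # 0.05
    print(unexpected_support_upper_bound(0.20, 0.95))   # ~0.0105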
In this section we presented results pertaining to the effectiveness of MinZoomUR and compared it to Apriori and ZoomUR. In this real-world case study we demonstrate that MinZoomUR can be used to discover far fewer patterns than Apriori and ZoomUR while still finding most of the truly interesting patterns.
6. Conclusions
In this paper we presented a detailed formal characterization of the minimal set of unexpected patterns and
proposed three algorithms for discovering the minimal set of such patterns. In a real-world application we demonstrate that the main discovery algorithm, MinZoomUR, discovers orders of magnitude fewer patterns than other comparable methods, yet retains most of the truly unexpected patterns. We also discuss tradeoffs
between various discovery methods and present some guidelines for their usage.
The power of this approach lies in combining two independent concepts of unexpectedness and minimality
of a set of patterns into one integrated concept that provides for the discovery of small but important sets of
interesting patterns. Moreover, MinZoominUR and MinZoomUR are efficient since we focus directly on
discovering minimal unexpected patterns rather than adopting any of the post-processing approaches, such as
filtering. Finally, the approach is effective in supporting decision making in real business applications where
unexpected patterns with respect to managerial intuition can be of great value.
One of the key assumptions in our approach to unexpectedness is the construction of a comprehensive set of
beliefs since few truly interesting patterns can be discovered for a poorly specified set of beliefs. In [P99] we
describe a method of generating a comprehensive set of beliefs. We also utilized this method to generate the
initial set of beliefs used in the experiments described in Section 5. As future research, we plan to study how different sets of initial beliefs affect the generation of the minimal sets of unexpected patterns discovered from these beliefs. We also plan to study how unexpected patterns can be generated from multiple beliefs.
References
[AIS93] Agrawal, R., Imielinski, T. and Swami, A., 1993. Mining Association Rules Between Sets of Items
in Large Databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 207-216.
[AMS+95] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo,A.I., 1995. Fast Discovery of
Association Rules. In Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. eds., Advances in
Knowledge Discovery and Data Mining. AAAI Press.
[AT97] Adomavicius, G., and Tuzhilin, A., 1997. Discovery of Actionable Patterns in Databases: The Action
Hierarchy Approach. In Proc. of the Third International Conference on Knowledge Discovery and Data
Mining.
[BA99] Bayardo, R. and Agrawal, R., 1999. Mining the Most Interesting Rules. In Proc. of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[BAG99] Bayardo, R., Agrawal, R. and Gunopulos, D., 1999. Constraint-Based Rule Mining in Large,
Dense Databases. In Proceedings of ICDE, 1999.
[BT98] Berger, G. and Tuzhilin, A., 1998. Discovering Unexpected Patterns in Temporal Data Using Temporal Logic. In Etzion, O., Jajodia, S. and Sripada, S. eds., Temporal Databases: Research and Practice. Springer, 1998.
[BM78] Buchanan, B.G. and Mitchell, T.M., 1978. Model Directed Learning of Production Rules. In
Waterman and Hayes-Roth (eds.) Pattern-Directed Inference Systems, Academic Press, New York.
[BMU+97] Brin, S., Motwani, R., Ullman, J.D., and Tsur, S., 1997. Dynamic Itemset Counting and
Implication Rules for Market Basket Data. Procs. ACM SIGMOD Int. Conf. on Mgmt. of Data, pp.255-264.
[CSD98] Chakrabarti, S., Sarawagi, S. and Dom, B. Mining Surprising Patterns Using Temporal Description
Length. In International Conference on Very Large Databases, 1998.
[F97] Forbes Magazine, Sep. 8, 1997. Believe in yourself, believe in the merchandise, pp.118-124.
[FPS96] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., 1996. From Data Mining to Knowledge Discovery:
An Overview. In Fayyad, U.M.,Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. eds., Advances in
Knowledge Discovery and Data Mining. AAAI/MIT Press.
[HRM75] Hayes-Roth, M. and Mostow, D., 1975. An Automatically Compilable Recognition Network for
Structured Patterns. In Proceedings of International Joint Conference on Artificial Intelligence-1975., pp.
356-362.
[KMR+94] Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H. and Verkamo, A.I., 1994. Finding
Interesting Rules from Large Sets of Discovered Association Rules. In Proc. of the Third International
Conference on Information and Knowledge Management, pp. 401-407.
[LH96] Liu, B. and Hsu, W., 1996. Post-Analysis of Learned Rules. In Proc. of the Thirteenth National
Conference on Artificial Intelligence (AAAI ’96), pp. 828-834.
[LHC97] Liu, B., Hsu, W. and Chen, S, 1997. Using General Impressions to Analyze Discovered
Classification Rules. In Proc. of the Third International Conference on Knowledge Discovery and Data
Mining (KDD 97), pp. 31-36.
[LHM99] Liu, B., Hsu, W., and Ma, Y., 1999. Pruning and Summarizing the Discovered Rules. In Proc. of
the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[M77] Mitchell, T.M., 1977. Version Spaces: A Candidate Elimination Approach to Rule Learning. In
Proceedings of International Joint Conference on Artificial Intelligence-1977., pp. 305-310.
[M82] Mitchell, T., 1982. Generalization as Search. Artificial Intelligence, pp. 203-226.
[MPM96] Matheus, C.J., Piatetsky-Shapiro, G. and McNeill, D., 1996. Selecting and Reporting What is
Interesting: The KEFIR Application to Healthcare Data. In Advances in Knowledge Discovery and Data
Mining. AAAI Press, 1996.
[MUB82] Mitchell, T.M., Utgoff, P.E., and Banerji, R.B., 1982. Learning Problem-Solving Heuristics by
Experimentation. In Michalski et al. (Eds.) Machine Learning, Tioga Press, Palo Alto.
[P70] Plotkin, G.D., 1970. A Note on Inductive Generalization. In Meltzer and Michie (eds.) Machine
Intelligence, Edinburgh University Press, Edinburgh.
[P99] Padmanabhan, B., 1999. Discovering Unexpected Patterns in Data Mining Applications. Doctoral
Dissertation, New York University, May 1999.
[PSM94] Piatetsky-Shapiro, G. and Matheus, C.J., 1994. The Interestingness of Deviations. In Procs. of the
AAAI-94 Workshop on Knowledge Discovery in Databases, pp. 25-36.
[PT98] Padmanabhan, B. and Tuzhilin, A., 1998. “A Belief-Driven Method for Discovering Unexpected
Patterns.” In Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, 1998.
[PT99] Padmanabhan, B. and Tuzhilin, A., 1999. Unexpectedness as a Measure of Interestingness in
Knowledge Discovery. Decision Support Systems, 27(3), pp. 303-318.
[S97] Stedman, C., 1997. Data Mining for Fool's Gold. Computerworld, Vol. 31,No. 48, Dec. 1997.
[SA96] Srikant, R. and Agrawal, R., 1996. Mining Quantitative Association Rules in Large Relational
Tables. In Proc. of the ACM SIGMOD Conference on Management of Data, 1996.
[SLR+99] Shah, D., Lakshmanan, L.V.S, Ramamritham, K., and Sudarshan, S., 1999. Interestingness and
Pruning of Mined Patterns. In Proceedings of the 1999 ACM SIGMOD Workshop on Research Issues in
Data Mining and Knowledge Discovery (DMKD), Philadelphia, 1999.
[ST95] Silberschatz, A. and Tuzhilin, A., 1995. On Subjective Measures of Interestingness in Knowledge
Discovery. In Proc. of the First International Conference on Knowledge Discovery and Data Mining, pp.
275-281.
[ST96] Silberschatz, A. and Tuzhilin, A., 1996. What Makes Patterns Interesting in Knowledge Discovery
Systems. IEEE Transactions on Knowledge and Data Engineering. Special Issue on Data Mining, vol. 5, no.
6, pp. 970-974.
[SVA97] Srikant, R., Vu, Q. and Agrawal, R. Mining Association Rules with Item Constraints. In Proc. of
the Third International Conference on Knowledge Discovery and Data Mining (KDD 97), pp. 67-73.
[Sub98] Subramonian, R. Defining diff as a data mining primitive. In Proceedings of the Fourth
International Conference on Knowledge Discovery and Data Mining, 1998.
[Suz97] Suzuki, E., 1997. Autonomous Discovery of Reliable Exception Rules. In Proc. of the Third
International Conference on Knowledge Discovery and Data Mining, pp. 259-262.
[TKR+95] Toivonen, H., Klemettinen, M., Ronkainen, P., Hatonen, K. and Mannila, H., 1995. Pruning and
Grouping Discovered Association Rules. In MLNet Workshop on Statistics, Machine Learning and
Discovery in Databases, pp. 47-52.
[V78] Vere, S.A., 1978. Inductive Learning of Relational Productions. In Waterman and Hayes-Roth (Eds.)
Pattern-Directed Inference Systems, Academic Press, New York.
[W75] Winston, P.H., 1975. Learning Structural Descriptions from Examples. In Winston (ed.) The
Psychology of Computer Vision, McGraw Hill, New York.