
Knowledge Refinement Based on the Discovery of Unexpected Patterns
in Data Mining
Balaji Padmanabhan
Operations and Information Management Department
The Wharton School, University of Pennsylvania
3620 Locust Walk, Philadelphia, PA 19104
email: balaji@wharton.upenn.edu
tel:+1(215)573-9646
Alexander Tuzhilin
Information Systems Department
Stern School of Business, New York University
44 West 4th Street, New York, NY 10012
email: atuzhili@stern.nyu.edu
tel:+1(212)998-0832
Abstract
In prior work we provided methods that generate unexpected patterns with respect to
managerial intuition by eliciting managers' beliefs about the domain and using these
beliefs to seed the search for unexpected patterns in data. Unexpected patterns discovered
in this manner represent contradictions or “holes” in domain knowledge which need to be
resolved. Given a belief and a set of unexpected patterns, the motivation behind
knowledge refinement is that the belief can be made stronger by refining the belief based
on the discovered patterns. In this paper we address the problem of incorporating the
discovered contradictions into the belief system based on a formal logic approach.
Specifically, we present a framework for refinement based on a generic knowledge
refinement strategy, describe abstract properties of refinement algorithms that can be
used to compare specific instantiations and then describe and compare two specific
refinement algorithms based on this framework.
Keywords: Knowledge refinement, unexpected patterns, data mining, association rules,
rule discovery, refinement strategies, iterative refinement
1. Introduction
Over the past few years, research in data mining [7, 10] has produced several new
techniques for pattern discovery in large datasets and has demonstrated several successful
applications of data mining. However, a drawback of several data mining techniques is
that they do not systematically leverage prior domain knowledge of users. In prior
research [13, 14, 15] we proposed new methods to discover unexpected patterns in data
based on prior domain knowledge. As demonstrated in [13, 14, 15], methods that
generate unexpected patterns can discover far fewer and more effective patterns than
conventional pattern discovery approaches in data mining such as association rule
discovery [3].
In general, for knowledge-driven data mining methods, the concept of knowledge
refinement is critical since these methods have to deal with reconciling discovered
patterns from data with the initial domain knowledge. In particular, unexpected patterns
discovered from data using methods proposed in prior work [13, 14, 15] represent
contradictions to domain knowledge that need to be resolved. Resolving contradictions in
this manner can be used as a process of iteratively refining domain knowledge through
repeated data mining. To address this problem, in this paper we present knowledge
refinement methods that reconcile unexpected patterns generated by methods in [13, 14,
15] with prior domain knowledge. Specifically, in this paper we present a generic
knowledge refinement strategy, describe abstract properties of refinement algorithms that
can be used to compare specific instantiations based on the generic strategy and then
present and compare two algorithms that refine prior domain knowledge based on the
discovery of unexpected association rules.
In a broad sense, the idea of refining knowledge has been addressed before in several
different communities. In the KDD (Knowledge Discovery and Data Mining) community,
[19] suggests a feedback process in which the discovered patterns would be used to
update the belief system, but stops short of describing how this can be done. Further, the
context in which [19] describes the feedback process is different - [19] presents a data
monitoring and discovery triggering framework that is applicable in problems where the
data changes over time (new data enters the system and this could “trigger” discovery
algorithms). In this paper, the data remains constant while the knowledge discovered
from this data gets refined by the feedback process. In our work, we will describe how to
refine the knowledge base iteratively.
In the area of association rule discovery in data mining, there has been little work done in
knowledge refinement based on discovered rules. A main reason for this is that most
methods for rule discovery in data mining do not systematically integrate prior domain
knowledge into the search for patterns in data. Recently, unexpected pattern discovery
[13, 14, 15] has provided methods that discover unexpected association rules based on the
systematic incorporation of domain knowledge. However, these approaches do not
describe how to refine knowledge based on the discovered patterns.
Prior research in the areas of expert systems [5, 8], scientific discovery [18], belief
revision [1, 4], classification [6, 16] and concept learning [9, 11] address issues related to
knowledge refinement. Below we briefly describe the issues in these areas and discuss
their relation to our work. The idea of new knowledge discovery, its integration with
existing knowledge and iterative repetition of this process can be traced back to the early
work on expert systems [5, 8] and on scientific discovery [18]. However, in this paper we
deal with discovering unexpected patterns using association rules and with the specific
issues of integrating these types of patterns into the existing system of beliefs. There has
also been work done on the problem of belief revision in AI (e.g., [1, 4]). However in our
approach we start with beliefs that are statistically valid. Hence, given the data and the
fact that the beliefs “hold”, no update is necessary. However, we demonstrate that
refining beliefs can make domain knowledge better. Further, beliefs and the new
evidence are represented as “rules”. The methods proposed in [1, 4] do not describe how
to update rules specifically. This lies outside of the scope of the research issues addressed
in [1,4] and other related papers in belief revision.
In classification, several approaches [6, 16] address the issue of refinement in the context
of predicting a dependent class. Specifically, these approaches are based on iteratively
applying steps to classify the misclassified cases better. In general in each iteration, the
classification method is biased towards incorrectly classified data. These approaches
differ from the methods for refinement proposed in this paper since we do not address the
issue of classification but focus on building methods to refine domain knowledge in the
form of rules using unexpected (association) rule discovery.
In the problem of concept learning [9, 11] several approaches address rule refinement. In
concept learning the task is to learn a set of rules that completely characterize a predicted
class (concept). Some approaches start with domain theories about the concept and
identify exceptions in the data. Since the task of discovering all rules that can
explain the exceptions is exponential, these approaches use specific heuristics in rule
refinement. Our work is the first approach to use association rules, in which it is possible
to search through all possible rules since we can exploit constraints not available to other
methods [3, 13]. Further we deal with a broader set of beliefs (and not just characterizing
a single concept) and do not assume that rules in the final set need to completely
characterize the data as done in concept learning.
The rest of this paper is structured as follows. In Section 2 we provide, for completeness,
some preliminaries including an overview of association rules and unexpected pattern
discovery. Section 3 provides a technical motivation for knowledge refinement. In
Section 4 we present the generic refinement strategy. Section 5 discusses properties of
refinement algorithms and Section 6 presents and compares two specific refinement
algorithms. Conclusions are presented in Section 7.
2. Preliminaries
In this section we present an overview of the structure of rules considered in this paper,
unexpectedness and approaches to discover unexpected patterns presented in [14, 15].
Let I = {i1, i2, …, im} be a set of discrete attributes (also called "items" [2]), some of them
being ordered and others unordered. Let D = {T1, T2, …, TN} be a relation consisting of N
transactions [3] T1, …, TN over the relation schema {i1, i2, …, im}. Also, let an atomic
condition be a proposition of the form value1 ≤ attribute ≤ value2 for ordered attributes
and attribute = value for unordered attributes, where value, value1, value2 belong to the
set of distinct values taken by attribute in D. Finally, an itemset is a conjunction of
atomic conditions. We then assume that rules and beliefs are defined as extended
association rules of the form X → A, where X is a conjunction of atomic conditions (an
itemset) and A is an atomic condition. Moreover, the rule has confidence [2] c if c% of
the transactions in D that contain X also contain A; also, the rule has support [2] s in D if
s% of the transactions in D contain both X and A. Finally, a rule is said to hold on a
dataset D if the confidence of the rule is greater than a user-specified threshold value
chosen to be any value greater than 0.5. Various efficient algorithms for finding all
association rules in transaction databases have been proposed in [3].
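To make these definitions concrete, the following sketch (ours, not taken from [3]) computes the support and confidence of an extended association rule over a small transaction table; the attribute names and values are purely illustrative, and only unordered (equality) conditions are shown.

    from typing import Dict, List, Tuple

    Transaction = Dict[str, str]
    Condition = Tuple[str, str]   # unordered atomic condition: attribute = value

    def holds(itemset: List[Condition], t: Transaction) -> bool:
        # An itemset (conjunction of atomic conditions) holds on a transaction
        # when every condition is satisfied.
        return all(t.get(attr) == val for attr, val in itemset)

    def support_confidence(body: List[Condition], head: Condition,
                           data: List[Transaction]) -> Tuple[float, float]:
        # Support and confidence of the extended rule body -> head on data.
        n_body = sum(1 for t in data if holds(body, t))
        n_both = sum(1 for t in data if holds(body, t) and holds([head], t))
        return n_both / len(data), (n_both / n_body if n_body else 0.0)

    # Illustrative rule "occupation=professional -> day=weekend"
    data = [{"occupation": "professional", "day": "weekend"},
            {"occupation": "professional", "day": "weekday"},
            {"occupation": "student", "day": "weekend"}]
    s, c = support_confidence([("occupation", "professional")], ("day", "weekend"), data)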
To define unexpectedness, we start with a set of beliefs that represent knowledge about
the domain and use these beliefs to seed the search for all unexpected patterns defined as
rules. [13, 14] provide an approach to defining unexpectedness in terms of a logical
contradiction, on the data, to an existing system of beliefs. More specifically, in this
approach, a rule A → B is defined to be unexpected with respect to the belief X → Y on
the database D if the following conditions hold:
(a) B and Y logically contradict each other (B AND Y |= FALSE);
(b) A AND X holds on a "large" subset of tuples in D;
(c) The rule A, X → B holds.
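As an illustration only (this is our sketch, not the implementation of [13, 14]), the three conditions can be checked directly when rule heads are single equality conditions, so that a logical contradiction amounts to assigning different values to the same attribute; the overlap and confidence thresholds below are hypothetical parameters.

    def holds(itemset, t):
        return all(t.get(a) == v for a, v in itemset)

    def contradicts(b, y):
        # Atomic heads (attribute, value) contradict, i.e. B AND Y |= FALSE,
        # when they assign different values to the same attribute.
        return b[0] == y[0] and b[1] != y[1]

    def is_unexpected(rule, belief, data, min_overlap=0.05, min_conf=0.5):
        # Conditions (a)-(c) for rule A -> B with respect to belief X -> Y.
        (A, B), (X, Y) = rule, belief
        if not contradicts(B, Y):                         # (a) heads contradict
            return False
        joint = [t for t in data if holds(A + X, t)]
        if len(joint) < min_overlap * len(data):          # (b) A AND X holds on a "large" subset
            return False
        n_head = sum(1 for t in joint if holds([B], t))
        return len(joint) > 0 and n_head / len(joint) > min_conf   # (c) A, X -> B holds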
Given the definition of unexpectedness, [13] proposes algorithm ZoomUR that discovers
all the unexpected rules with respect to a set of beliefs that satisfy user-specified
minimum support and confidence requirements. ZoomUR consists of algorithms for two
phases of the discovery strategy - ZoominUR and ZoomoutUR.
In the first phase of ZoomUR, ZoominUR discovers all unexpected patterns that are
refinements to any belief. More specifically, given any belief X → Y, ZoominUR
discovers all unexpected rules of the form X, A → B such that B AND Y |= FALSE. In the
second phase of ZoomUR, starting from all the unexpected rules that are refinements to a
belief, ZoomoutUR discovers more general rules (generalizations) that are also
unexpected. Specifically, from each unexpected refinement of the form X, A → B,
ZoomoutUR discovers all the unexpected rules of the form X', A → B where X' ⊂ X. The
rules that ZoomoutUR discovers are not refinements of beliefs, but more general rules
that satisfy the conditions of unexpectedness as defined above.
For example, if a belief is that "professional → weekend" (professionals tend to shop
more on weekends than on weekdays), ZoominUR may discover a refinement such as
"professional, December → weekday" (in December, professionals tend to shop more on
weekdays than on weekends). ZoomoutUR may then discover a more general rule
"December → weekday", which is totally different from the initial belief
"professional → weekend" (in the sense that the rules that ZoomoutUR discovers cannot
be inferred from either the original belief or the refinement generated by ZoominUR). In
addition to being unexpected by definition, these more general rules could perhaps
provide additional reasons why the belief was contradicted in the refinement (perhaps it is
not a "professionals in December effect" but a "December effect" that causes the belief to
be contradicted).
Though ZoomUR discovers only the unexpected rules, and also far fewer rules than
Apriori, it still discovers large numbers of rules, many of which are redundant in the sense
that they can be inferred from other discovered rules. For example, given the belief
diaper → beer and two unexpected patterns diaper, weekday → not_beer and diaper,
weekday, male → not_beer, the second unexpected pattern can be inferred from the first
one under monotonicity. Formally, the monotonicity assumption is defined in [15] using
the relation |=M defined below.
Definition [15]. Rule (A → B) |=M (C → D) if
1. C |= A, and
2. D = B.

For example, (diaper → beer) |=M (diaper, weekday → beer) since diaper, weekday logically
implies diaper.
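Under the equality-condition representation used in the sketches above, the relation |=M reduces to a containment test on rule bodies together with identical heads. A minimal sketch (ours; the boolean-style item encoding is hypothetical):

    def m_implies(rule1, rule2):
        # (A -> B) |=M (C -> D): C |= A, i.e. every condition of A appears in C,
        # and the heads are identical (D = B).
        (A, B), (C, D) = rule1, rule2
        return set(A) <= set(C) and B == D

    # e.g. (diaper -> beer) |=M (diaper, weekday -> beer)
    r1 = ([("diaper", "yes")], ("beer", "yes"))
    r2 = ([("diaper", "yes"), ("day", "weekday")], ("beer", "yes"))
    assert m_implies(r1, r2)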
Therefore, to improve the discovery process, [15] introduces the concept of a minimal set
of unexpected patterns and presents efficient algorithms that discover this set of rules.
[15] presents MinZoominUR, an algorithm that discovers the minimal set of unexpected
refinements and MinZoomUR, an algorithm that discovers the minimal set of all
unexpected patterns (not just refinements). A more detailed description of the algorithms
that discover minimal unexpected patterns is in [15].
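Conceptually, a minimal set in this sense can be obtained by discarding any rule that is M-implied by another rule in the set; [15] achieves this inside the discovery algorithm itself, but the following post-hoc sketch (using m_implies from the sketch above) conveys the idea:

    def minimal_set(rules):
        # Keep only rules that are not M-implied by some other, more general rule.
        return [r for r in rules
                if not any(other is not r and m_implies(other, r) for other in rules)]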
Given these preliminaries, in the next section we present a technical motivation for
knowledge refinement based on unexpected pattern discovery.
3. Motivation for Knowledge Refinement
Given a belief and a set of unexpected patterns, the motivation behind knowledge
refinement is that the belief can be made stronger by refining the belief based on the
discovered patterns. This is presented formally in Theorem 3.1.
Theorem 3.1. Given a database D, a belief A → B with support s1 and confidence c1 such
that the belief holds on D, and an unexpected pattern X → C with support s2 and
confidence c2, the refined belief A, ¬X → B has confidence c3 such that c3 > c1.
Proof.
For any itemset P, let cnt(P) denote the number of records in D that satisfy P. To prove c3
> c1, we need to prove the following:
cnt(A, B, ¬X) / cnt(A, ¬X) > cnt(A, B) / cnt(A).
Rewriting the LHS of the above inequality, we need to prove:
[cnt(A, B) − cnt(A, B, X)] / [cnt(A) − cnt(A, X)] > cnt(A, B) / cnt(A).
Rearranging the above after multiplying both sides of the inequality by cnt(A)·[cnt(A) − cnt(A, X)] results in the following inequality to be proved:
cnt(A)·cnt(A, B, X) < cnt(A, X)·cnt(A, B).
Rearranging the terms, we need to prove that cnt(A, B)/cnt(A) > cnt(A, B, X)/cnt(A, X).
Observe that the LHS of the above inequality is the confidence of A → B and the RHS is
the confidence of A, X → B. The above inequality would hold (and hence the proof would
be complete) if it can be shown that the rule A, X → B has lower confidence than the belief
A → B. Given that the belief holds, to complete the proof we show below that the rule A,
X → B in fact does not hold (and hence has confidence < 0.5).
Given that X → C is unexpected, it follows that A, X → C holds. Since C |= ¬B, it
follows that cnt(A, X, ¬B) ≥ cnt(A, X, C). Hence the rule A, X → ¬B also holds. Clearly, if
this is the case, A, X → B cannot hold, which completes the proof.

In the next section we present the generic refinement strategy.
4. Generic refinement strategy
The general strategy is to generate unexpected patterns followed by selecting some of
these patterns and refining the beliefs. The process continues until no more unexpected
patterns can be generated (i.e. a fixpoint [20] is reached).
This generic refinement strategy is presented in Figure 1. Rk(B) denotes the set of beliefs
generated at the end of the k-th iteration of refining belief B. fixpoint(B) denotes the
refined set of beliefs for a given belief B such that there are no unexpected patterns with
respect to any belief in fixpoint(B).
At the end of each iteration, the set of beliefs is checked for validity, where validity may
be defined in different terms such as minimum support and confidence. This is necessary
to ensure that the refinement procedure does not add beliefs that may not satisfy threshold
support or confidence requirements.
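One plausible reading of this validity check (the valid_beliefs step in Figure 1), sketched below, simply re-evaluates each candidate belief on D and drops those falling below the thresholds; support_confidence is the helper from the sketch in Section 2, and the threshold values are illustrative.

    def valid_beliefs(candidates, data, min_sup=0.01, min_conf=0.6):
        # Retain only the candidate beliefs that satisfy the minimum
        # support and confidence requirements on the dataset.
        kept = []
        for body, head in candidates:
            s, c = support_confidence(body, head, data)
            if s >= min_sup and c >= min_conf:
                kept.append((body, head))
        return kept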
The process therefore consists of three procedures in each iteration:
1. Pattern generation procedure: generation of unexpected patterns for a belief. The
procedure to generate unexpected patterns for a belief can be one of the methods
described in Section 2 (ZoominUR, ZoomUR, MinZoominUR, MinZoomUR). The
pattern generation procedure used in the refinement methods presented in this paper is
discussed in Section 6.1.
2. Selection procedure: selecting a subset of unexpected patterns that will be used to
refine the belief. There can be several criteria for selecting a subset of patterns from
the set of all unexpected patterns and there can, hence, be several selection
procedures that can be used.
3. Refinement procedure: refining the belief using selected patterns. Given a belief and a
set of unexpected patterns, the refinement procedure details how the new set of
beliefs will be computed. Section 6.2 discusses this in greater detail.
<< INSERT FIGURE 1 ABOUT HERE >>
A specific instantiation of each of these three procedures creates a specific refinement
algorithm. There are several possible instantiations of each of these procedures, which
results in a large number of refinement algorithms. A strength of this generic refinement
strategy is that it can be viewed as a broad framework that can allow for a large number
of different refinement approaches.
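Viewed as code, the strategy of Figure 1 is simply a loop parameterized by the three procedures (plus the validity check); the sketch below is ours and makes these degrees of freedom explicit. The two algorithms of Section 6 differ only in the select argument passed in.

    def refine_to_fixpoint(belief, data, generate, select, refine, validate):
        # Generic refinement strategy: iterate generation, selection and
        # refinement until no belief has unexpected patterns (a fixpoint).
        beliefs = validate([belief], data)
        while True:
            refined, any_unexpected = [], False
            for b in beliefs:
                patterns = generate(b, data)                      # pattern generation procedure
                if patterns:
                    any_unexpected = True
                    refined.extend(refine(b, select(patterns)))   # selection + refinement
                else:
                    refined.append(b)
            beliefs = validate(refined, data)
            if not any_unexpected:
                return beliefs                                    # fixpoint(B)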
Rather than comparing different refinement approaches using a single metric, we adopt
the approach of listing several properties, presented in the following section, that can be
used to compare refinement algorithms.
5. Properties of refinement algorithms
In this section we present five properties of refinement algorithms. These properties are
not exhaustive since specific applications may have additional requirements or desirable
properties of a belief system.
1. Convergence.
Guarantees convergence of a belief system to a fixpoint, i.e., after a finite number of
iterations no unexpected patterns are discovered relative to the refined system of beliefs.
2. Consistency.
Ensures that the beliefs RK(B) are consistent at any iteration of the belief revision process. A
set of beliefs, B, is said to be consistent if for all b1, b2 ∈ B, whenever head(b1) |=
¬head(b2) then it is also true that body(b1) ∧ body(b2) |= FALSE.
For example, the beliefs A, X → B and A, ¬X → ¬B are consistent since A, X and A, ¬X
cannot hold at the same time. The beliefs A, X → B and A, C → ¬B are not consistent
since A, X and A, C can hold at the same time even though they have contradictory heads.
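With bodies restricted to conjunctions of equality conditions, two bodies can hold on the same transaction unless they assign different values to some common attribute, which yields a direct sketch-level test of the consistency property (our illustration, not part of the original algorithms):

    def bodies_exclusive(body1, body2):
        # Conjunctive bodies cannot hold simultaneously iff they assign
        # different values to some common attribute.
        return any(a1 == a2 and v1 != v2 for a1, v1 in body1 for a2, v2 in body2)

    def consistent(beliefs):
        # head(b1) |= NOT head(b2) must imply body(b1) AND body(b2) |= FALSE.
        for body1, head1 in beliefs:
            for body2, head2 in beliefs:
                heads_contradict = head1[0] == head2[0] and head1[1] != head2[1]
                if heads_contradict and not bodies_exclusive(body1, body2):
                    return False
        return True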
3. Path Independence.
Ensures that the order in which the selected patterns are incorporated into the belief
system does not affect the refined system of beliefs.
4. Minimality.
Ensures that a refinement strategy creates a minimal set of beliefs where minimality is
defined with respect to the |=M operator (as described in Section 2).
5. Monotonicity.
Guarantees that once an unexpected pattern is incorporated into the belief system and
becomes “expected,” the pattern will never re-appear subsequently as “unexpected”
again.
In the next section we present two refinement algorithms IterateRUB (iteratively Refines
beliefs Using the "Best" unexpected pattern in each iteration) and IterateRUA (iteratively
Refines beliefs Using All unexpected patterns in each iteration) both of which generate
fixpoints for a given belief.
6. Refinement Algorithms
In this section we present two refinement algorithms that differ only in the selection
procedure of the iterative refinement process. The pattern generation procedure and
refinement procedure are common and we describe these procedures first, followed by a
description of the individual selection procedures.
6.1 The pattern generation procedure
The pattern generation procedure used to generate unexpected patterns with respect to
each belief is MinZoominUR, which generates only the minimal set of rules that are
refinements to a belief (“zoomin” rules, see Section 2). Some other choices for the pattern
generation procedure are ZoominUR, ZoomUR, MinZoomUR but MinZoominUR is a
good choice for the pattern generation procedure for the following reasons:
1. Intuitively, "refinement" is associated with specialization. The Webster dictionary
provides a definition of "refine" as "to improve or perfect by pruning or polishing".
For this reason, zoomin rules are refinements that contradict a belief while zoomout
rules are more general unexpected patterns. MinZoominUR generates only zoomin
rules.
2. Since MinZoominUR generates only minimal patterns, using MinZoominUR
guarantees that no selection procedure can select patterns that are subsumed by other
unexpected patterns. Intuitively using only the minimal set of unexpected patterns is
equivalent to resolving the most general contradictions first. For example, consider a
belief that professionals shop on weekends and two unexpected patterns are that in
December they shop on weekdays and professionals with medium income in
December shop on weekdays. Intuitively the first unexpected pattern should be
resolved first since it is more general.
For the above reasons, MinZoominUR is used as the pattern generation procedure in the
refinement algorithms presented in this paper. We next describe the refinement procedure
used in IterateRUB and IterateRUA.
6.2 The refinement procedure
The refinement procedure, NM, used in IterateRUB and IterateRUA is similar to ideas in
non-monotonic reasoning [17].
Given a belief b represented by A → B and a set, S, of unexpected patterns X1 → C1, X2
→ C2, …, XN → CN, the refinement procedure NM_Refine(b, S) replaces the initial belief
with the following:
• X1 → C1, X2 → C2, …, XN → CN, and
• A, ¬X1, ¬X2, …, ¬XN → B
Given the constraint that the bodies of rules and beliefs considered are restricted to
conjunctions and the fact that the refinement procedure incorporates negations in the
body of the refined belief, the refined belief is equivalently represented as a set of beliefs
derived according to the following procedure.
The body (A, ¬X1, ¬X2, …, ¬XN) of the refined belief is first converted to the equivalent
disjunctive normal form P1 ∨ P2 ∨ … ∨ PK. The belief A, ¬X1, ¬X2, …, ¬XN → B is
therefore equivalent to P1 ∨ P2 ∨ … ∨ PK → B, which is represented as K beliefs P1 → B,
P2 → B, …, PK → B, all of which have only conjunctions of conditions in their bodies.
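Concretely, since each ¬Xi is by De Morgan a disjunction of negated atomic conditions, distributing over A yields one refined body per choice of a negated condition from each Xi. The sketch below is ours; it uses a hypothetical representation that tags negated conditions explicitly and treats each Xi as the conjunction of extra conditions that a pattern adds to the belief body.

    from itertools import product

    def nm_refine(belief, patterns):
        # NM_Refine(b, S): keep the selected patterns as beliefs and replace the
        # original belief A -> B by the DNF expansion of A, not-X1, ..., not-XN -> B.
        A, B = belief
        new_beliefs = [(Xi, Ci) for Xi, Ci in patterns]     # X1 -> C1, ..., XN -> CN
        negated_choices = [[("not",) + cond for cond in Xi] for Xi, _ in patterns]
        for choice in product(*negated_choices):            # one negated condition per pattern
            new_beliefs.append((list(A) + list(choice), B))
        return new_beliefs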
The strengths of the NM refinement procedure are:
1. Since all the unexpected patterns are incorporated into the belief system, the NM
procedure guarantees that all selected unexpected patterns are now expected.
2. Completeness: the original belief is refined to incorporate all the conditions in the
selected patterns that contradicted the belief.
The two refinement algorithms IterateRUB and IterateRUA incorporate the same pattern
generation and refinement procedures described above and differ in the selection
procedure used.
These two specific algorithms are described in this paper since they represent two
extremes: IterateRUB uses only a single "best" (see Section 6.3 below) pattern at each
stage of the refinement process (similar to the greedy heuristic that is used in recursive
partitioning methods such as CART [6]) and IterateRUA uses all generated patterns in
each iteration.
In the next two sections we present IterateRUB and IterateRUA.
6.3 IterateRUB
IterateRUB is presented in Figure 2. The algorithm follows the generic refinement
strategy presented in Section 4 and uses MinZoominUR as the pattern generation
procedure and NM as the refinement procedure. The selection procedure used in
IterateRUB selects a single "strongest" unexpected pattern from the set of MinZoominUR
patterns by applying the following heuristic:
1. Select the set HC of unexpected patterns that have the highest confidence from the set
of patterns generated.
2. Select the set HCS of unexpected patterns that have the highest support from the set
HC generated previously.
3. Since there may be multiple patterns with the same confidence and support values,
select any single pattern from HCS.
<< INSERT FIGURE 2 ABOUT HERE >>
In general there may be measures other than confidence or support to choose the
"strongest" unexpected pattern.
To evaluate IterateRUB, in Theorems 6.1 through 6.5 below we prove various properties
of the algorithm. Some experimental results of IterateRUB in a real-world application are
presented after Theorem 6.6 in Section 6.4 below.
Theorem 6.1. For a belief, B, and a dataset D with a finite number of discrete attributes,
IterateRUB converges to a fixpoint after a finite number of iterations (i.e. fixpoint(B)
exists).
Proof. Since IterateRUB uses MinZoominUR rules to refine beliefs, it follows that the
number of conditions in the body of both the refined beliefs is strictly greater than the
number of conditions in the body of the parent belief (because zoomin rules are
specializations of a belief). Given a finite number of discrete attributes, and therefore a
finite number of conditions that can be considered, the upper bound on the number of
iterations of IterateRUB is the number of possible conditions in the domain minus the
minimum number of conditions in the body of any original belief.
In the unordered case, where conditions involve only the equality operator, the upper
bound is in fact the number of attributes in the domain (since no two conditions in a
belief can involve the same attribute; e.g., A=1, B=4, A=3 → X=0 is not a valid
belief).
That the final set of beliefs is a fixpoint follows trivially since this set is characterized by
a lack of unexpected patterns for MinZoominUR to generate.

In practice, a second effect occurs which further reduces the number of iterations before
convergence. We briefly explain this below.
Consider a belief A → B and the selected unexpected pattern A, X → ¬B. In the next
iteration, the belief A → B is replaced with the beliefs A, X → ¬B and A, ¬X → B. The
support of both refined beliefs is less than or equal to the support of the parent belief,
as is shown below.
Assume that cnt(X) is the number of records in D where X holds. By definition, support(
A → B) = cnt(A, B)/|D|. Also, since the belief A → B holds, cnt(A, B) > cnt(A, ¬B).
Below we show for both the refined beliefs that the support is less than or equal to the
support of the parent belief.
• support(A, ¬X → B) = cnt(A, ¬X, B)/|D| = (1/|D|)·(cnt(A, B) − cnt(A, X, B)) ≤ support(A → B).
• support(A, X → ¬B) = cnt(A, X, ¬B)/|D| ≤ cnt(A, ¬B)/|D| ≤ cnt(A, B)/|D| = support(A → B).
Given this, MinZoominUR typically generates unexpected patterns with smaller and smaller
support until the support values fall below the minimum specified threshold. This is
another factor that, in practice, aids fast convergence to a fixpoint.
Theorem 6.2. For a belief B, RK(B) is consistent where K is any iteration of IterateRUB.
Proof. This property is a direct consequence of the fact that in IterateRUB, for any
iteration K, RK(B) consists of beliefs that are mutually exclusive. Intuitively mutually
exclusive beliefs are consistent since by virtue of no two beliefs being applicable at the
same time there can be no potential inconsistency. The consistency condition is hence
satisfied trivially if RK(B) can be shown to consist of beliefs that are mutually exclusive.
To prove the theorem, we now prove by induction on the number of iterations, k, that for
all b1, b2  RK(B) the beliefs b1 and b2 are mutually exclusive.
Base step: To prove that R1(B) consists of mutually exclusive beliefs.
Consider an initial belief A → B and the best unexpected pattern A, X → ¬B. The set of
beliefs at the end of the first iteration are A, X → ¬B and A, ¬X → B. Since (A, X) |=
¬(A, ¬X), the two beliefs are mutually exclusive.
Induction step: Assume that RP(B) consists of mutually exclusive beliefs. We need to
prove that RP+1(B) consists of mutually exclusive beliefs.
Consider any belief C → D that belongs to RP(B) and that has unexpected patterns. If
there is no such belief, then RP+1(B) = RP(B) and the result trivially holds. If there are
unexpected patterns, assume that the best unexpected pattern is C, X → ¬D.
At the end of iteration (P+1) the belief C → D is replaced by C, X → ¬D and C, ¬X →
D. Observe that the bodies of these two beliefs (C, X and C, ¬X) are specializations of
the body of the belief (C). Since C → D was in RP(B), by the inductive assumption it
follows that C → D is mutually exclusive to all other beliefs in RP(B). Therefore C, X
→ ¬D and C, ¬X → D are both mutually exclusive to all other beliefs in RP(B). Further,
since the body of any belief in RP+1(B) is a specialization of the body of some belief in
RP(B), it follows that both C, X → ¬D and C, ¬X → D are mutually exclusive to any belief
derived from any belief other than C → D in RP(B).
By symmetry the same argument applies to all beliefs refined in iteration P+1. Hence
RP+1(B) consists of mutually exclusive beliefs.

Theorem 6.3. IterateRUB has the path-independence property, i.e. the order in which the
selected patterns are incorporated into the belief system does not affect the final belief
system.
Proof. Since only one pattern is incorporated into the belief system at each iteration this
holds trivially. 
Theorem 6.4. For a belief B, RK(B) is minimal where K is any iteration of IterateRUB.
Proof. As proved in Theorem 6.2, all beliefs in RK(B) are mutually exclusive. Hence it is
impossible to find two beliefs b1, b2 in RK(B) such that body(b1) |= body(b2). Hence
RK(B) has to be minimal.

Theorem 6.5. IterateRUB is monotonic.
Proof. To prove this we need to show that no unexpected pattern incorporated into the
belief system at any iteration appears again as unexpected.
Clearly since the set of beliefs at any iteration is mutually exclusive, the same pattern
cannot appear as unexpected for two different beliefs in any iteration. Below we prove
that any unexpected pattern cannot re-appear at any subsequent iteration too.
To trace how a single belief is refined iteratively in IterateRUB, consider a tree with the
belief at the root such that children of any node in this tree are beliefs that the refinement
procedure creates for the parent belief and the depth of a node indicates the number of
iterations from the initial belief. A property of any belief (node) in this tree is that they
are refinements of all their parent nodes. Therefore, since an unexpected pattern here is a
zoomin rule, it can never re-surface at any node in the sub-tree under itself. Hence to
prove that an unexpected pattern cannot re-surface at a subsequent iteration, all we need
to show now is that they cannot result from any other belief in the iteration that the
unexpected pattern was incorporated. Consider an unexpected pattern, p, incorporated
into the belief system in iteration k. Recall that all beliefs in RK(B) are mutually exclusive
for any iteration k. Therefore all nodes in the trees that result from each belief in RK(B) \ {p}
are also mutually exclusive with p, and therefore none of these patterns can be the same as
p. Hence the result.

To summarize, IterateRUB always converges to a fixpoint for any belief, generates
minimal and consistent beliefs at any iteration, is path-independent and monotonic. In the
next section we present another refinement algorithm, IterateRUA and discuss its
properties.
6.4 IterateRUA
IterateRUA is presented in Figure 3. The algorithm follows the generic refinement
strategy presented in Section 4 and also uses MinZoominUR as the pattern generation
procedure and NM as the refinement procedure. Skipping the selection step in Figure 3
defaults to using all generated patterns in the refinement procedure. Hence, the selection
procedure is trivially one that selects all the generated patterns.
In theorems 6.6 through 6.10 below we prove various properties of IterateRUA.
Theorem 6.6. For a belief, B, and a dataset D with a finite number of discrete attributes,
IterateRUA converges to a fixpoint after a finite number of iterations (i.e. fixpoint(B)
exists).
The proof is the same as that of Theorem 6.1 for IterateRUB.

<< INSERT FIGURE 3 ABOUT HERE >>
In order to experimentally study and compare the convergence properties of IterateRUB
and IterateRUA we applied the methods to consumer purchase data from a major market
research firm. We pre-processed this data by combining the different data sets made
available to us (transaction data joined with demographics) into one table containing 38
different attributes and 313,409 records. These attributes pertain to the items purchased by
a shopper at a store over a period of one year, together with certain characteristics of the
store and demographic data about the shopper and his or her family. Some demographic
attributes include the age and gender of the shopper; the occupation, income and marital
status of the household head; the presence of children in the family; and the size of the
household. Some transaction-specific attributes include the product purchased, coupon
usage (whether the shopper used any coupons to get a lower price), the availability of
store coupons or manufacturer's coupons, and the presence of in-store advertisements for
the product purchased.
We started with an initial set of 28 beliefs and a minimum support value of 1% and a
minimum confidence threshold of 0.6. We used IterateRUB and IterateRUA to compute
the fixpoints. Both approaches terminated in a few minutes in fixpoints which satisfy the
condition that no more unexpected patterns exist for any of the beliefs in the final set.
IterateRUB generated a final set of 79 beliefs and converged more rapidly in this
experiment while IterateRUA generated a final set of 2549 beliefs. For the same set of
beliefs we also ran the methods for six different minimum support values below 3% and
the average number of patterns in the fixpoints for IterateRUB and IterateRUA were 42
and 1033 respectively.
Continuing with the properties of IterateRUA, we next consider its consistency.
Theorem 6.7. For a belief B, RK(B) is not always consistent where K is any iteration of
IterateRUA.
Proof. To show that IterateRUA can generate an inconsistent set of beliefs in an iteration,
we provide an example of a case where this can occur. Since IterateRUA incorporates all
unexpected patterns into the belief system, consider the following two beliefs in an
iteration: A, X → B and A, Y → B. Assume that there are no unexpected patterns for
the first belief but that the second belief generates A, Y, P → ¬B. Hence the belief system
at the next iteration contains both A, X → B and A, Y, P → ¬B, which are two inconsistent
beliefs (since when A, X, Y and P are true the system claims B and ¬B at
the same time).

The implications of Theorem 6.7 are discussed in Section 7.
Theorem 6.8. IterateRUA has the path-independence property, i.e. the order in which the
selected patterns are incorporated into the belief system does not affect the final belief
system.
Proof. Given a belief A → B and a set of unexpected patterns X1 → C1, X2 → C2, …, XN
→ CN, the refinement procedure replaces the belief with the following:
• X1 → C1, X2 → C2, …, XN → CN, and
• A, ¬X1, ¬X2, …, ¬XN → B
Since all unexpected patterns are therefore incorporated simultaneously into the belief
system, as shown above, the order does not matter trivially. Hence the result.
Theorem 6.9. For a belief B, RK(B) is not always minimal where K is any iteration of
IterateRUA.
Proof. We provide a simple example where RK(B) can be non-minimal. Since
IterateRUA incorporates all unexpected patterns into the belief system, consider the
following two beliefs in an iteration: A, X → B and A, Y → B. The next iteration can
generate A, X, Y, P → ¬B and A, Y, P → ¬B as unexpected patterns for each of the previous
beliefs, respectively. Since all discovered unexpected patterns are incorporated into the belief
system by IterateRUA, clearly RK(B) is non-minimal since A, X, Y, P |= A, Y, P, so the
first of these patterns is M-implied by the second. Hence the result.

Theorem 6.10. IterateRUA is not monotonic.
Proof. Consider the following two beliefs in an iteration: A, X → B and A, Y → B.
Assume that the pattern A, X, Y → ¬B holds. Since A, X, Y → ¬B will be generated as
unexpected for both beliefs in the same iteration, even after it is incorporated into the
belief system (when it is generated the first time) the pattern will be generated again as
unexpected for the second belief. Hence IterateRUA is not monotonic.

To summarize, IterateRUA converges to a fixpoint, has the path-independence property
but is not consistent, minimal or monotonic. In the next section we discuss the
implications of these and other additional properties of the refinement algorithms.
7. Discussion
The generic refinement strategy presented in Figure 1 has three degrees of freedom: the
pattern generation procedure, the selection procedure and the refinement procedure.
Given that a fixed pattern generation procedure (MinZoominUR) and a fixed refinement
procedure (NM_Refine) were chosen for their strengths described in Sections 6.1 and 6.2, we
presented two refinement algorithms that represented extremes in the selection procedure
- IterateRUB selected only the best pattern to incorporate each time while IterateRUA
selected all. Clearly the generic refinement strategy (Figure 1) can be used in many other
refinement algorithms that select some subset of generated patterns each time. However
since IterateRUB was shown to have all the good properties presented in this paper we
believe it is an “optimal” algorithm with respect to satisfying all the objective functions
or properties that were chosen.
A globally “best” refinement algorithm needs clear specification of what “best” should be
and in general there may be several other properties that are useful (such as convergence
in a fixed number of iterations, the size of the fixpoint, predictive accuracy). The
approach adopted here selected five good properties of refinement algorithms to make
inferences on their relative strengths. IterateRUB satisfied all these properties.
IterateRUA, however, does not score well on the consistency, minimality and monotonicity
properties. Notice, though, that the reason IterateRUB generated consistent and minimal
beliefs and had the monotonicity property is that at any iteration it generates
mutually exclusive beliefs (as shown in the proofs of Theorems 6.2, 6.4 and 6.5). IterateRUA
can be modified such that at the end of each iteration the patterns can be made mutually
exclusive by applying conflict resolution strategies. However this is an expensive
operation which involves comparison of each belief with the other beliefs at each
iteration. Therefore, with respect to the properties presented here IterateRUB is
preferable to IterateRUA. We also showed experimentally, on a real-world dataset, that
IterateRUB converges to a much smaller fixpoint than does IterateRUA.
As mentioned above, we selected IterateRUA and IterateRUB for detailed comparison
since they represent extremes in the selection procedure. An interesting finding is that the
method that uses all discovered patterns in refinement is inferior, with respect to the criteria
considered, to the one that uses just the best pattern. In a sense, this is a counterintuitive
result since it does not advocate using the entire set of discovered patterns in refinement.
The work presented in this paper
represents one approach to knowledge refinement in the specific context of unexpected
association rules discovered from data. In general, as knowledge-driven data mining
develops, additional work is needed to investigate new refinement strategies for other
methods and to evaluate different comparison metrics.
In this paper we addressed the problem of incorporating the discovered contradictions
into the belief system based on a formal logic approach. Specifically, we presented a
framework for refinement based on a generic knowledge refinement strategy, described
abstract properties of refinement algorithms that can be used to compare specific
instantiations and then presented and compared two specific refinement algorithms based
on this framework.
References
[1] Alchourron, C., Gardenfors, P. and Makinson, D., 1985. On the logic of theory
change: Partial meet contraction and revision functions. Journal of Symbolic Logic,
50:510530.
[2] Agrawal, R., Imielinski, T. and Swami, A., 1993. Mining association rules between
sets of items in large databases. In Proc. of the 1993 ACM SIGMOD Conference on
Management of Data, pp. 207-216.
[3] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo,A.I., 1995. Fast
discovery of association rules. In Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and
Uthurusamy, R. eds., Advances in Knowledge Discovery and Data Mining. AAAI Press.
[4] Boutilier, C., 1994. Unifying default reasoning and belief revision in a modal
framework. Artificial Intelligence, 68:33-85.
[5] Buchanan, B.G. and Feigenbaum, E.A., 1978. DENDRAL and META-DENDRAL:
Their applications dimension. Artificial Intelligence, 11:5-24.
[6] Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and
Regression Trees, Wadsworth International Group.
[7] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., 1996. From data mining to
knowledge discovery: An overview. In Fayyad, U.M.,Piatetsky-Shapiro, G., Smyth, P.,
and Uthurusamy, R. eds., Advances in Knowledge Discovery and Data Mining.
AAAI/MIT Press.
[8] Lenat, D.B., 1983. AM: Discovery in mathematics as heuristic search. In R. Davis
and D. Lenat, editors. Knowledge-Based Systems in Artificial Intelligence. McGraw-Hill.
[9] Mitchell, T., 1980. The need for biases in learning generalizations. Technical Report
CBM-TR-117, Dept. of Computer Science, Rutgers University.
[10] Mitchell, T. Machine learning and data mining. Communications of the ACM, Vol
42, No. 11, November 1999.
[11] Michalski, R.S. and Kaufman, K.A. Data mining and knowledge discovery: A
review of issues and a multistrategy approach. Technical Report P97-3 MLI 97-2,
Machine Learning and Inference Laboratory, George Mason University.
[12] Padmanabhan, B., 1999. Discovering Unexpected Patterns in Data Mining
Applications. Doctoral Dissertation, New York University, May 1999.
[13] Padmanabhan, B. and Tuzhilin, A., 1998. A belief-driven method for discovering
unexpected patterns. In Proc. of the 4th International Conference on Knowledge
Discovery and Data Mining, 1998.
[14] Padmanabhan, B. and Tuzhilin, A., 1999. Unexpectedness as a measure of
interestingness in knowledge discovery. Decision Support Systems, 27(3), pp. 303-318.
[15] Padmanabhan, B. and Tuzhilin, A., 2000. Small is beautiful: Discovering the
minimal set of unexpected patterns. In Proceedings of the 6th ACM SIGKDD Conference
on Knowledge Discovery and Data Mining, 2000.
[16] Quinlan, J.R., 1993. C4.5: Programs for Machine Learning, Morgan Kaufmann, San
Mateo, California.
[17] Reiter, R., 1987. Nonmonotonic reasoning. In Annual Review of Computer Science,
1987.
[18] Shrager, J. and Langley, P., 1990. Computational Models of Scientific Discovery and
Theory Formation. San Mateo, CA: Morgan Kaufmann, 1990.
[19] Tuzhilin, A. and Silberschatz, A., 1996. A belief-driven discovery framework based
on data monitoring and triggering. Working Paper #IS-96-26, Dept. of Information
Systems, Leonard N. Stern School of Business, NYU.
[20] Ullman, J., 1988. Principles of Database and Knowledge-Base Systems, Volume 1.
Computer Science Press.
Input: Belief B, dataset D
Output: fixpoint(B)

K = 0
R0(B) = B
Repeat {
    B' = {}
    For each belief b ∈ RK(B) {
        X = patterns unexpected with respect to b
        S = select some patterns from X to refine b
        B' = B' ∪ patterns from refining b with S
    }
    K = K + 1
    RK(B) = valid_beliefs(B')
} until no unexpected patterns in RK(B)
fixpoint(B) = RK(B)

Figure 1. Generic refinement strategy
Input: Belief B, dataset D, minimum support s, minimum confidence c
Output: fixpoint(B)

K = 0
Repeat {
    B' = {}
    RK(B) = valid_beliefs(B)
    For each belief b ∈ RK(B) {
        X = unexpected patterns from MinZoominUR(b, s, c, D)
        S = select_one_strongest_pattern(X)
        B' = B' ∪ NM_refine(b, S)
    }
    K = K + 1
    RK(B) = B'
} until no unexpected patterns in B
fixpoint(B) = RK(B)

Figure 2. IterateRUB
Input: Belief B, dataset D, minimum support s, minimum confidence c
Output: fixpoint(B)

K = 0
Repeat {
    B' = {}
    RK(B) = valid_beliefs(B)
    For each belief b ∈ RK(B) {
        S = unexpected patterns from MinZoominUR(b, s, c, D)
        B' = B' ∪ NM_refine(b, S)
    }
    K = K + 1
    RK(B) = B'
} until no unexpected patterns in B
fixpoint(B) = RK(B)

Figure 3. IterateRUA
Biographies
Balaji Padmanabhan is an Assistant Professor of Operations and Information
Management at The Wharton School, University of Pennsylvania. He holds a Ph.D. in
Information Systems from New York University and a B.S. in Computer Science from
the Indian Institute of Technology, Madras. His research interests are in the areas of Data
Mining and Knowledge Management with a focus on building effective tools for the
discovery of interesting patterns in data by combining domain knowledge about problems
with automated search. His current research is on the discovery of unexpected patterns in
data, knowledge-driven data mining, web usage mining and evaluation of personalization
technologies. His work has been published in Decision Support Systems, Procs. of the
ACM SIGKDD Knowledge Discovery and Data Mining Conference, European Journal
of Marketing and Proceedings of International Conference on Information Systems,
Workshop on Information Technology and Systems and AIS. He has served on the
program committees of KDD, WITS and IIWAS conferences and on the Editorial Board
of the Journal of Database Management.
Alexander Tuzhilin is an Associate Professor of Information Systems at Stern School of
Business, New York University. He holds a Ph.D. in Computer Science from the
Courant Institute of Mathematical Sciences, NYU. His research interests include
knowledge discovery in databases (data mining), personalization techniques for CRM,
temporal databases, marketing information systems, query-driven simulations, and
conceptual modeling of information systems. His papers have been published in ACM
Transactions on Database Systems, ACM Transactions on Information Systems, ACM
Transactions on Modeling and Computer Simulation, IEEE Transactions on Knowledge
and Data Engineering, Acta Informatica, Information Systems, Information Systems
Research, DSS, and marketing and OR journals. He serves on the Editorial Boards of the
Journal of Data Mining and Knowledge Discovery, the INFORMS Journal on
Computing, the Journal of AIS (JAIS) and the Electronic Commerce Research Journal
and served as a guest editor of special issues of several journals and on the program
committees of the KDD, SIAM Data Mining, VLDB, TIME, ICECR and ER conferences
and of numerous workshops.