Knowledge Refinement Based on the Discovery of Unexpected Patterns in Data Mining

Balaji Padmanabhan
Operations and Information Management Department
The Wharton School, University of Pennsylvania
3620 Locust Walk, Philadelphia, PA 19104
email: balaji@wharton.upenn.edu
tel:+1(215)573-9646

Alexander Tuzhilin
Information Systems Department
Stern School of Business, New York University
44 West 4 Street, New York, NY 10012
email: atuzhili@stern.nyu.edu
tel:+1(212)998-0832

Abstract

In prior work we provided methods that generate unexpected patterns with respect to managerial intuition. These methods elicit managers' beliefs about the domain and use these beliefs to seed the search for unexpected patterns in data. Unexpected patterns discovered in this manner represent contradictions or "holes" in domain knowledge that need to be resolved. Given a belief and a set of unexpected patterns, the motivation behind knowledge refinement is that the belief can be made stronger by refining it based on the discovered patterns. In this paper we address the problem of incorporating the discovered contradictions into the belief system using a formal logic approach. Specifically, we present a framework for refinement based on a generic knowledge refinement strategy. We describe abstract properties of refinement algorithms that can be used to compare specific instantiations, and then describe and compare two specific refinement algorithms based on this framework.

Keywords: Knowledge refinement, unexpected patterns, data mining, association rules, rule discovery, refinement strategies, iterative refinement

1. Introduction

Over the past few years, research in data mining [7, 10] has produced several new techniques for pattern discovery in large datasets and has demonstrated several successful applications of data mining. However, a drawback of several data mining techniques is that they do not systematically leverage the prior domain knowledge of users.
In prior research [13, 14, 15] we proposed new methods to discover unexpected patterns in data based on prior domain knowledge. As demonstrated in [13, 14, 15], methods that generate unexpected patterns can discover far fewer and more effective patterns than conventional pattern discovery approaches in data mining, such as association rule discovery [3]. In general, knowledge refinement is critical for knowledge-driven data mining methods, because these methods must reconcile the patterns discovered from data with the initial domain knowledge. In particular, unexpected patterns discovered from data using the methods proposed in prior work [13, 14, 15] represent contradictions to domain knowledge that need to be resolved. Resolving these contradictions makes it possible to refine domain knowledge iteratively through repeated data mining. To address this problem, we present knowledge refinement methods that reconcile unexpected patterns generated by the methods in [13, 14, 15] with prior domain knowledge. Specifically, we present a generic knowledge refinement strategy. We describe abstract properties of refinement algorithms that can be used to compare specific instantiations of the generic strategy. We then present and compare two algorithms that refine prior domain knowledge based on the discovery of unexpected association rules.

In a broad sense, the idea of refining knowledge has been addressed before in several different communities. In the KDD (Knowledge Discovery and Data Mining) community, [19] suggests a feedback process in which the discovered patterns would be used to update the belief system, but stops short of describing how this can be done.
Further, the context in which [19] describes the feedback process is different. [19] presents a data monitoring and discovery triggering framework that applies to problems where the data changes over time: new data entering the system can "trigger" discovery algorithms. In this paper, the data remains constant while the knowledge discovered from it is refined through the feedback process, and we describe how to refine the knowledge base iteratively.

In the area of association rule discovery, there has been little work on knowledge refinement based on discovered rules. A main reason is that most rule discovery methods in data mining do not systematically integrate prior domain knowledge into the search for patterns in data. Recent work on unexpected pattern discovery [13, 14, 15] provides methods that discover unexpected association rules by systematically incorporating domain knowledge. However, these approaches do not describe how to refine knowledge based on the discovered patterns.

Prior research in expert systems [5, 8], scientific discovery [18], belief revision [1, 4], classification [6, 16] and concept learning [9, 11] addresses issues related to knowledge refinement. Below we briefly describe the issues in these areas and discuss their relation to our work. The idea of discovering new knowledge, integrating it with existing knowledge, and iteratively repeating this process can be traced back to early work on expert systems [5, 8] and on scientific discovery [18]. In this paper, however, we deal with discovering unexpected patterns using association rules and with the specific issues of integrating these types of patterns into an existing system of beliefs. There has also been work on the problem of belief revision in AI (e.g., [1, 4]). However, in our approach we start with beliefs that are statistically valid.
Hence, given the data and the fact that the beliefs "hold", no update is necessary. Nevertheless, we demonstrate that refining beliefs can improve domain knowledge. Further, both the beliefs and the new evidence are represented as "rules", and the methods proposed in [1, 4] do not describe how to update rules specifically. This lies outside the scope of the research issues addressed in [1, 4] and other related papers on belief revision.

In classification, several approaches [6, 16] address the issue of refinement in the context of predicting a dependent class. These approaches iteratively apply steps that improve the classification of misclassified cases. In general, in each iteration the classification method is biased towards incorrectly classified data. These approaches differ from the refinement methods proposed in this paper. We do not address classification; instead, we focus on methods that refine domain knowledge, expressed as rules, using unexpected (association) rule discovery.

In concept learning [9, 11], several approaches address rule refinement. In concept learning the task is to learn a set of rules that completely characterizes a predicted class (concept). Some approaches start with domain theories about the concept and identify exceptions in the data. Since discovering all rules that can explain the exceptions is exponential, these approaches use specific heuristics for rule refinement. Our work is the first approach to use association rules, for which it is possible to search through all possible rules because we can exploit constraints not available to other methods [3, 13]. Further, we deal with a broader set of beliefs, not just those characterizing a single concept. We also do not assume that the rules in the final set must completely characterize the data, as is done in concept learning.

The rest of this paper is structured as follows.
In Section 2 we provide, for completeness, some preliminaries, including an overview of association rules and unexpected pattern discovery. Section 3 provides a technical motivation for knowledge refinement. Section 4 presents the generic refinement strategy. Section 5 discusses properties of refinement algorithms, and Section 6 presents and compares two specific refinement algorithms. Conclusions are presented in Section 7.

2. Preliminaries

In this section we present an overview of the structure of the rules considered in this paper, the notion of unexpectedness, and the approaches to discovering unexpected patterns presented in [14, 15].

Let I = {i1, i2, …, im} be a set of discrete attributes (also called "items" [2]), some of them ordered and others unordered. Let D = {T1, T2, …, TN} be a relation consisting of N transactions [3] T1, …, TN over the relation schema {i1, i2, …, im}. Let an atomic condition be a proposition of the form value1 ≤ attribute ≤ value2 for ordered attributes, and attribute = value for unordered attributes. Here value, value1 and value2 belong to the set of distinct values taken by the attribute in D. Finally, an itemset is a conjunction of atomic conditions. We then assume that rules and beliefs are defined as extended association rules of the form X → A, where X is a conjunction of atomic conditions (an itemset) and A is an atomic condition. The rule has confidence [2] c if c% of the transactions in D that contain X also contain A. It has support [2] s in D if s% of the transactions in D contain both X and A. Finally, a rule is said to hold on a dataset D if its confidence is greater than a user-specified threshold, which can be chosen to be any value greater than 0.5. Various efficient algorithms for finding all association rules in transaction databases have been proposed in [3].
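The definitions of support, confidence and "holds" can be made concrete with a short sketch. The Python below is a minimal illustration, not the implementation from [2, 3]. It assumes each transaction is a dictionary from attribute names to values. The names `Cond`, `cnt`, `support`, `confidence`, `holds` and the toy dataset are hypothetical. Support and confidence are returned as fractions rather than percentages. The default threshold of 0.6 is an assumption; it matches the minimum confidence used in the experiments of Section 6.4.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence

Transaction = Dict[str, Any]  # one record of D: attribute name -> value


@dataclass(frozen=True)
class Cond:
    """Atomic condition: attribute = value (unordered attributes), or
    low <= attribute <= high (ordered attributes, when low/high are given)."""
    attr: str
    value: Any = None
    low: Optional[Any] = None
    high: Optional[Any] = None

    def holds(self, t: Transaction) -> bool:
        if self.attr not in t:
            return False
        v = t[self.attr]
        if self.low is not None or self.high is not None:
            return (self.low is None or self.low <= v) and (self.high is None or v <= self.high)
        return v == self.value


def cnt(D: Sequence[Transaction], conds: Sequence[Cond]) -> int:
    """Number of records in D that satisfy the conjunction of conds."""
    return sum(1 for t in D if all(c.holds(t) for c in conds))


def support(D: Sequence[Transaction], body: Sequence[Cond], head: Cond) -> float:
    """Fraction of records that satisfy both the body X and the head A."""
    return cnt(D, [*body, head]) / len(D)


def confidence(D: Sequence[Transaction], body: Sequence[Cond], head: Cond) -> float:
    """Fraction of records satisfying X that also satisfy A (0.0 if X never holds)."""
    n_body = cnt(D, body)
    return cnt(D, [*body, head]) / n_body if n_body else 0.0


def holds(D: Sequence[Transaction], body: Sequence[Cond], head: Cond,
          threshold: float = 0.6) -> bool:
    """A rule holds if its confidence exceeds a threshold chosen above 0.5."""
    if threshold <= 0.5:
        raise ValueError("threshold must be greater than 0.5")
    return confidence(D, body, head) > threshold


# Hypothetical toy data: 8 shopping records.
D: List[Transaction] = (
    [{"diaper": "yes", "beer": "yes", "day": "weekend", "age": 30}] * 4
    + [{"diaper": "yes", "beer": "no", "day": "weekday", "age": 45}] * 2
    + [{"diaper": "no", "beer": "yes", "day": "weekday", "age": 25},
       {"diaper": "no", "beer": "no", "day": "weekend", "age": 60}]
)

diaper_beer = ([Cond("diaper", "yes")], Cond("beer", "yes"))
print(support(D, *diaper_beer), confidence(D, *diaper_beer), holds(D, *diaper_beer))
# 0.5 0.6666666666666666 True
```

A linear scan per rule is used only for clarity. It does not scale to the large candidate sets that association rule algorithms such as those in [3] are designed to handle.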
To define unexpectedness, we start with a set of beliefs that represent knowledge about the domain. We use these beliefs to seed the search for all unexpected patterns, defined as rules. [13, 14] provide an approach that defines unexpectedness in terms of a logical contradiction, on the data, to an existing system of beliefs. More specifically, in this approach a rule A → B is defined to be unexpected with respect to the belief X → Y on the database D if the following conditions hold:
(a) B and Y logically contradict each other (B AND Y |= FALSE);
(b) A AND X holds on a "large" subset of tuples in D;
(c) the rule A, X → B holds.

Given this definition of unexpectedness, [13] proposes the algorithm ZoomUR. ZoomUR discovers all the unexpected rules with respect to a set of beliefs that satisfy user-specified minimum support and confidence requirements. It consists of algorithms for the two phases of the discovery strategy, ZoominUR and ZoomoutUR. In the first phase of ZoomUR, ZoominUR discovers all unexpected patterns that are refinements of any belief. More specifically, given any belief X → Y, ZoominUR discovers all unexpected rules of the form X, A → B such that B AND Y |= FALSE. In the second phase, ZoomoutUR starts from all the unexpected rules that are refinements of a belief and discovers more general rules (generalizations) that are also unexpected. Specifically, from each unexpected refinement of the form X, A → B, ZoomoutUR discovers all the unexpected rules of the form X', A → B where X' ⊂ X. The rules that ZoomoutUR discovers are not refinements of beliefs. They are more general rules that satisfy the conditions of unexpectedness defined above.

For example, suppose a belief is "professional → weekend" (professionals tend to shop more on weekends than on weekdays). ZoominUR may then discover a refinement such as "professional, December → weekday" (in December, professionals tend to shop more on weekdays than on weekends).
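The sketch below makes conditions (a) to (c) concrete. It checks them on a toy dataset and runs a brute-force, single-condition version of the zoomin search. It is a hypothetical illustration, not ZoominUR from [13]. Conditions are restricted to attribute = value. "Contradiction" is assumed to mean two different values of the same attribute. "Large" in condition (b) is interpreted as the support of A AND X reaching a `min_support` parameter. The names `is_unexpected` and `naive_zoomin` are illustrative only.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cond = Tuple[str, str]                # equality condition (attribute, value)
Rule = Tuple[FrozenSet[Cond], Cond]   # (conjunctive body, head)
Transaction = Dict[str, str]


def cnt(D: Sequence[Transaction], conds) -> int:
    """Number of records satisfying every equality condition in conds."""
    return sum(1 for t in D if all(t.get(a) == v for a, v in conds))


def contradicts(c1: Cond, c2: Cond) -> bool:
    """c1 AND c2 |= FALSE, assuming each record has one value per attribute."""
    return c1[0] == c2[0] and c1[1] != c2[1]


def is_unexpected(rule: Rule, belief: Rule, D: Sequence[Transaction],
                  min_support: float, min_conf: float) -> bool:
    (a, b), (x, y) = rule, belief
    body = a | x                                     # A AND X
    n_body = cnt(D, body)
    return (
        contradicts(b, y)                            # (a) B AND Y |= FALSE
        and n_body / len(D) >= min_support           # (b) A AND X holds on a "large" subset
        and n_body > 0
        and cnt(D, body | {b}) / n_body > min_conf   # (c) A, X -> B holds
    )


def naive_zoomin(belief: Rule, D: Sequence[Transaction],
                 min_support: float = 0.2, min_conf: float = 0.6) -> List[Rule]:
    """Brute-force search for refinements X, A -> B with one extra condition A.
    Illustration only: unlike ZoominUR, it does not handle multi-condition A."""
    x, y = belief
    values = sorted({(attr, v) for t in D for attr, v in t.items()})
    used = {attr for attr, _ in x} | {y[0]}          # attributes already constrained
    heads = [c for c in values if contradicts(c, y)]
    found = []
    for extra in values:
        if extra[0] in used:
            continue
        for b in heads:
            rule = (x | {extra}, b)
            if is_unexpected(rule, belief, D, min_support, min_conf):
                found.append(rule)
    return found


D = (
    [{"diaper": "yes", "beer": "yes", "day": "weekend"}] * 4
    + [{"diaper": "yes", "beer": "no", "day": "weekday"}] * 2
    + [{"diaper": "no", "beer": "yes", "day": "weekday"},
       {"diaper": "no", "beer": "no", "day": "weekend"}]
)
belief = (frozenset({("diaper", "yes")}), ("beer", "yes"))
print(naive_zoomin(belief, D))  # one rule: diaper=yes, day=weekday -> beer=no
```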
Continuing this example, ZoomoutUR may then discover the more general rule "December → weekday". This rule is totally different from the initial belief "professional → weekend", in the sense that the rules ZoomoutUR discovers cannot be inferred from either the original belief or the refinement generated by ZoominUR. These more general rules are unexpected by definition. They may also provide additional reasons why the belief was contradicted in the refinement: perhaps it is not a "professionals in December" effect but a "December" effect that causes the belief to be contradicted.

ZoomUR discovers only unexpected rules, and far fewer rules than Apriori. Even so, it still discovers large numbers of rules, many of which are redundant in the sense that they can be inferred from other discovered rules. For example, consider the belief diaper → beer and the two unexpected patterns diaper, weekday → not_beer and diaper, weekday, male → not_beer. The second unexpected pattern can be inferred from the first one under monotonicity. Formally, the monotonicity assumption is defined in [15] using the relation |=M defined below.

Definition [15]. Rule (A → B) |=M (C → D) if
1. C |= A, and
2. D = B.
For example, (diaper → beer) |=M (diaper, weekday → beer), since diaper, weekday logically implies diaper.

Therefore, to improve the discovery process, [15] introduces the concept of a minimal set of unexpected patterns and presents efficient algorithms that discover this set of rules. These are MinZoominUR, which discovers the minimal set of unexpected refinements, and MinZoomUR, which discovers the minimal set of all unexpected patterns (not just refinements). A more detailed description of these algorithms is given in [15]. Given these preliminaries, in the next section we present a technical motivation for knowledge refinement based on unexpected pattern discovery.

3. Motivation for Knowledge Refinement

Given a belief and a set of unexpected patterns, the motivation behind knowledge refinement is that the belief can be made stronger by refining it based on the discovered patterns. This is presented formally in Theorem 3.1.

Theorem 3.1. Let D be a database and let A → B be a belief with support s1 and confidence c1 that holds on D. Let X → C be a pattern, with support s2 and confidence c2, that is unexpected with respect to A → B. Then the refined belief A, ¬X → B has confidence c3 such that c3 > c1.

Proof. For any itemset P, let cnt(P) denote the number of records in D that satisfy P. To prove c3 > c1, we need to show that

$$\frac{\mathrm{cnt}(A,B,\neg X)}{\mathrm{cnt}(A,\neg X)} > \frac{\mathrm{cnt}(A,B)}{\mathrm{cnt}(A)}.$$

Rewriting the left-hand side, this becomes

$$\frac{\mathrm{cnt}(A,B) - \mathrm{cnt}(A,B,X)}{\mathrm{cnt}(A) - \mathrm{cnt}(A,X)} > \frac{\mathrm{cnt}(A,B)}{\mathrm{cnt}(A)}.$$

Next, multiply both sides by cnt(A)·[cnt(A) − cnt(A,X)], which is positive whenever c3 is defined. Then cancel the common term cnt(A)·cnt(A,B). This reduces the inequality to be proved to

$$\mathrm{cnt}(A)\cdot\mathrm{cnt}(A,B,X) < \mathrm{cnt}(A,X)\cdot\mathrm{cnt}(A,B),$$

that is,

$$\frac{\mathrm{cnt}(A,B)}{\mathrm{cnt}(A)} > \frac{\mathrm{cnt}(A,B,X)}{\mathrm{cnt}(A,X)}.$$

The left-hand side is the confidence of A → B and the right-hand side is the confidence of A, X → B. Hence the proof is complete if we show that the rule A, X → B has lower confidence than the belief A → B. Since the belief holds, its confidence is greater than 0.5. It therefore suffices to show that A, X → B does not hold and in fact has confidence below 0.5.

Given that X → C is unexpected, condition (c) implies that A, X → C holds. Since C and B contradict each other (condition (a)), C |= ¬B, and therefore cnt(A,X,¬B) ≥ cnt(A,X,C). Hence the rule A, X → ¬B also holds, so its confidence exceeds a threshold greater than 0.5. Consequently, A, X → B has confidence below 0.5 and cannot hold, which completes the proof.

In the next section we present the generic refinement strategy.

4. Generic refinement strategy

The general strategy is to generate unexpected patterns, select some of these patterns, and refine the beliefs using them.
The process continues until no more unexpected patterns can be generated (i.e., a fixpoint [20] is reached). This generic refinement strategy is presented in Figure 1. R_k(B) denotes the set of beliefs generated at the end of the k-th iteration of refining belief B. fixpoint(B) denotes the refined set of beliefs for a given belief B such that there are no unexpected patterns with respect to any belief in fixpoint(B). At the end of each iteration, the set of beliefs is checked for validity, where validity may be defined in different terms, such as minimum support and confidence. This check ensures that the refinement procedure does not add beliefs that fail the threshold support or confidence requirements.

The process therefore consists of three procedures in each iteration:
1. Pattern generation procedure: generating unexpected patterns for a belief. This procedure can be one of the methods described in Section 2 (ZoominUR, ZoomUR, MinZoominUR, MinZoomUR). The pattern generation procedure used in the refinement methods presented in this paper is discussed in Section 6.1.
2. Selection procedure: selecting the subset of unexpected patterns that will be used to refine the belief. There can be several criteria for selecting a subset of patterns from the set of all unexpected patterns, and hence several possible selection procedures.
3. Refinement procedure: refining the belief using the selected patterns. Given a belief and a set of unexpected patterns, the refinement procedure specifies how the new set of beliefs is computed. Section 6.2 discusses this in greater detail.

<< INSERT FIGURE 1 ABOUT HERE >>

A specific instantiation of each of these three procedures creates a specific refinement algorithm. There are several possible instantiations of each procedure, which results in a large number of possible refinement algorithms.
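The three procedures and the validity check can be wired together as a higher-order loop. The Python sketch below is one possible reading of the generic strategy, not a transcription of Figure 1. `refine_to_fixpoint`, its parameters, and the toy instantiation are hypothetical. `max_iterations` is an added safeguard that the strategy itself does not require.

```python
from typing import Any, Callable, Iterable, List, Set


def refine_to_fixpoint(
    initial: Iterable[Any],
    generate: Callable[[Any], List[Any]],        # 1. pattern generation for one belief
    select: Callable[[List[Any]], List[Any]],    # 2. selection of a subset of patterns
    refine: Callable[[Any, List[Any]], Iterable[Any]],  # 3. refinement of one belief
    is_valid: Callable[[Any], bool] = lambda b: True,   # e.g. min support/confidence
    max_iterations: int = 1000,
) -> Set[Any]:
    """Repeat generate -> select -> refine -> validity check until no belief
    has selected unexpected patterns, i.e. until a fixpoint is reached."""
    beliefs: Set[Any] = set(initial)
    for _ in range(max_iterations):
        next_beliefs: Set[Any] = set()
        changed = False
        for b in beliefs:
            patterns = generate(b)
            chosen = select(patterns) if patterns else []
            if chosen:
                next_beliefs.update(refine(b, chosen))
                changed = True
            else:
                next_beliefs.add(b)          # nothing unexpected: keep b as is
        beliefs = {b for b in next_beliefs if is_valid(b)}
        if not changed:
            return beliefs                   # fixpoint reached
    raise RuntimeError("no fixpoint reached within max_iterations")


# Toy instantiation (hypothetical): a belief is a tuple of conditions, and every
# belief with fewer than two conditions "has" one unexpected zoomin pattern.
def toy_generate(b):
    return [b + (f"x{len(b)}",)] if len(b) < 2 else []


def toy_refine(b, chosen):
    return [*chosen, b + ("not",)]


print(sorted(refine_to_fixpoint([()], toy_generate, lambda ps: ps[:1], toy_refine)))
# [('not', 'not'), ('not', 'x1'), ('x0', 'not'), ('x0', 'x1')]
```

Plugging in a selection function that keeps either a single pattern or all patterns corresponds to the two extremes compared in Section 6.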
A strength of this generic refinement strategy is that it can be viewed as a broad framework that accommodates a large number of different refinement approaches. Rather than comparing refinement approaches using a single metric, we list several properties, presented in the following section, that can be used to compare refinement algorithms.

5. Properties of refinement algorithms

In this section we present five properties of refinement algorithms. These properties are not exhaustive, since specific applications may have additional requirements or desirable properties of a belief system.
1. Convergence. Guarantees convergence of a belief system to a fixpoint, i.e., after a finite number of iterations no unexpected patterns are discovered relative to the refined system of beliefs.
2. Consistency. Ensures that the beliefs R_K(B) are consistent at any iteration of the belief revision process. A set of beliefs B is said to be consistent if, for all b1, b2 ∈ B, whenever head(b1) |= ¬head(b2), it is also true that body(b1) ∧ body(b2) |= FALSE. For example, the beliefs A, X → B and A, ¬X → ¬B are consistent, since A, X and A, ¬X cannot hold at the same time. The beliefs A, X → B and A, C → ¬B are not consistent, since A, X and A, C can hold at the same time even though the two beliefs have contradictory heads.
3. Path Independence. Ensures that the order in which the selected patterns are incorporated into the belief system does not affect the refined system of beliefs.
4. Minimality. Ensures that a refinement strategy creates a minimal set of beliefs, where minimality is defined with respect to the |=M operator (as described in Section 2).
5. Monotonicity. Guarantees that once an unexpected pattern is incorporated into the belief system and becomes "expected," it never subsequently re-appears as "unexpected."
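For beliefs whose bodies are conjunctions of equality and negated-equality conditions, two of these properties can be checked mechanically. These are the consistency definition above and the |=M relation of Section 2 that underlies minimality. The following Python sketch is a hypothetical illustration. It assumes each record has exactly one value per attribute and checks entailment syntactically, so it can miss contradictions that depend on an attribute's full domain. The names `unsatisfiable`, `consistent`, `entails_m` and `minimal` are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Literal = Tuple[str, str, bool]              # (attribute, value, positive?)
Belief = Tuple[FrozenSet[Literal], Literal]  # (conjunctive body, head)


def unsatisfiable(lits: Iterable[Literal]) -> bool:
    """Syntactic contradiction check under the one-value-per-attribute assumption."""
    lits = list(lits)
    pos = {}
    for attr, value, positive in lits:
        if positive and pos.setdefault(attr, value) != value:
            return True                      # a = v1 AND a = v2 with v1 != v2
    return any(not p and pos.get(a) == v for a, v, p in lits)  # a = v AND NOT a = v


def consistent(beliefs: Iterable[Belief]) -> bool:
    """For all b1, b2: head(b1) |= NOT head(b2) implies body(b1) AND body(b2) |= FALSE."""
    for (body1, head1), (body2, head2) in combinations(list(beliefs), 2):
        if unsatisfiable([head1, head2]) and not unsatisfiable(body1 | body2):
            return False
    return True


def entails_literal(body: FrozenSet[Literal], lit: Literal) -> bool:
    attr, value, positive = lit
    if lit in body:
        return True
    # a = w entails NOT a = v when w != v (one value per attribute)
    return not positive and any(a == attr and p and w != value for a, w, p in body)


def entails_m(r1: Belief, r2: Belief) -> bool:
    """(A -> B) |=M (C -> D) iff C |= A and D = B (entailment checked syntactically)."""
    (a_body, b_head), (c_body, d_head) = r1, r2
    return d_head == b_head and all(entails_literal(c_body, lit) for lit in a_body)


def minimal(beliefs: Iterable[Belief]) -> Set[Belief]:
    """Drop every rule that is |=M-entailed by a strictly more general rule."""
    rules = set(beliefs)
    return {r for r in rules
            if not any(o != r and entails_m(o, r) and not entails_m(r, o) for o in rules)}


# Examples from the text, with each proposition encoded as attribute "t".
A, X, C, B = ("A", "t", True), ("X", "t", True), ("C", "t", True), ("B", "t", True)
NX, NB = ("X", "t", False), ("B", "t", False)
print(consistent([(frozenset({A, X}), B), (frozenset({A, NX}), NB)]))  # True
print(consistent([(frozenset({A, X}), B), (frozenset({A, C}), NB)]))   # False
```

Equivalent rules that differ only syntactically are both kept by `minimal` rather than both dropped. This is a deliberate choice in the sketch.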
In the next section we present two refinement algorithms, IterateRUB and IterateRUA. IterateRUB iteratively Refines beliefs Using the "Best" unexpected pattern in each iteration. IterateRUA iteratively Refines beliefs Using All unexpected patterns in each iteration. Both algorithms generate fixpoints for a given belief.

6. Refinement Algorithms

In this section we present two refinement algorithms that differ only in the selection procedure of the iterative refinement process. The pattern generation and refinement procedures are common to both. We describe these procedures first, followed by a description of the individual selection procedures.

6.1 The pattern generation procedure

The pattern generation procedure used to generate unexpected patterns with respect to each belief is MinZoominUR. It generates only the minimal set of rules that are refinements of a belief ("zoomin" rules, see Section 2). Other choices for the pattern generation procedure are ZoominUR, ZoomUR and MinZoomUR, but MinZoominUR is a good choice for the following reasons:
1. Intuitively, "refinement" is associated with specialization. The Webster dictionary defines "refine" as "to improve or perfect by pruning or polishing". In this sense, zoomin rules are refinements that contradict a belief, while zoomout rules are more general unexpected patterns. MinZoominUR generates only zoomin rules.
2. Since MinZoominUR generates only minimal patterns, using it guarantees that no selection procedure can select patterns that are subsumed by other unexpected patterns. Intuitively, using only the minimal set of unexpected patterns is equivalent to resolving the most general contradictions first. For example, consider the belief that professionals shop on weekends, and two unexpected patterns. The first is that professionals shop on weekdays in December. The second is that professionals with medium income shop on weekdays in December.
Intuitively, the first unexpected pattern should be resolved first, since it is more general.

For the above reasons, MinZoominUR is used as the pattern generation procedure in the refinement algorithms presented in this paper. We next describe the refinement procedure used in IterateRUB and IterateRUA.

6.2 The refinement procedure

The refinement procedure used in IterateRUB and IterateRUA, NM, is similar to ideas in non-monotonic reasoning [17]. Let b be a belief represented by A → B, and let S be a set of unexpected patterns X1 → C1, X2 → C2, …, XN → CN. The refinement procedure NM_Refine(b, S) replaces the initial belief with the following:

X1 → C1, X2 → C2, …, XN → CN and A, ¬X1, ¬X2, …, ¬XN → B

The bodies of the rules and beliefs considered are restricted to conjunctions, but the refinement procedure introduces negations into the body of the refined belief. The refined belief is therefore represented as an equivalent set of beliefs derived as follows. The body (A, ¬X1, ¬X2, …, ¬XN) of the refined belief is first converted to the equivalent disjunctive normal form P1 ∨ P2 ∨ … ∨ PK. The belief A, ¬X1, ¬X2, …, ¬XN → B is therefore equivalent to P1 ∨ P2 ∨ … ∨ PK → B. This is represented as the K beliefs P1 → B, P2 → B, …, PK → B, each of which has only a conjunction of conditions in its body.

The strengths of the NM refinement procedure are:
1. Since all the selected unexpected patterns are incorporated into the belief system, the NM procedure guarantees that all of them are now expected.
2. Completeness: the original belief is refined to incorporate all the conditions in the selected patterns that contradicted the belief.

The two refinement algorithms IterateRUB and IterateRUA incorporate the same pattern generation and refinement procedures described above and differ only in the selection procedure used.
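The NM procedure, including the DNF split, can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' implementation. Literals are (attribute, value, positive) triples, and each record is assumed to have one value per attribute. Each DNF disjunct of A ∧ ¬X1 ∧ … ∧ ¬XN picks one literal from every Xi to negate. Disjuncts that are contradictory are dropped, such as those that negate a condition of A itself. The names `nm_refine`, `negate` and `unsatisfiable` are illustrative.

```python
from itertools import product
from typing import FrozenSet, Iterable, List, Sequence, Tuple

Literal = Tuple[str, str, bool]            # (attribute, value, positive?)
Rule = Tuple[FrozenSet[Literal], Literal]  # (conjunctive body, head)


def negate(lit: Literal) -> Literal:
    attr, value, positive = lit
    return (attr, value, not positive)


def unsatisfiable(lits: Iterable[Literal]) -> bool:
    """Syntactic contradiction check, assuming one value per attribute per record."""
    lits = list(lits)
    pos = {}
    for attr, value, positive in lits:
        if positive and pos.setdefault(attr, value) != value:
            return True
    return any(not p and pos.get(a) == v for a, v, p in lits)


def nm_refine(belief: Rule, patterns: Sequence[Rule]) -> List[Rule]:
    """Replace A -> B by X1 -> C1, ..., XN -> CN and A, NOT X1, ..., NOT XN -> B,
    with the last rule split into one conjunctive belief per DNF disjunct."""
    a_body, head = belief
    refined: List[Rule] = list(patterns)
    # NOT(l1 AND ... AND lm) = NOT l1 OR ... OR NOT lm, so each disjunct of the
    # DNF chooses one literal of every pattern body to negate.
    disjuncts = set()
    for choice in product(*(sorted(x_body) for x_body, _ in patterns)):
        p = a_body | {negate(lit) for lit in choice}
        if not unsatisfiable(p):             # drop e.g. A AND NOT A
            disjuncts.add(p)
    refined.extend((p, head) for p in sorted(disjuncts, key=sorted))
    return refined


A, X, Y, B = ("A", "t", True), ("X", "t", True), ("Y", "t", True), ("B", "t", True)
belief = (frozenset({A}), B)

# One selected pattern (IterateRUB-style): A, X -> NOT B
rub = nm_refine(belief, [(frozenset({A, X}), negate(B))])
# All patterns (IterateRUA-style): A, X -> NOT B and A, Y -> NOT B
rua = nm_refine(belief, [(frozenset({A, X}), negate(B)), (frozenset({A, Y}), negate(B))])
```

The first call mirrors IterateRUB and reproduces the replacement of A → B by A, X → ¬B and A, ¬X → B used in the proofs of Section 6.3. The second call mirrors IterateRUA. When some Xi contains several conditions, the resulting disjuncts can overlap. The sketch does not try to make them disjoint or to prune subsumed ones.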
IterateRUB and IterateRUA are the two algorithms described in this paper because they represent two extremes. IterateRUB uses only a single "best" pattern (see Section 6.3 below) at each stage of the refinement process, similar to the greedy heuristic used in recursive partitioning methods such as CART [6]. IterateRUA uses all generated patterns in each iteration. In the next two sections we present IterateRUB and IterateRUA.

6.3 IterateRUB

IterateRUB is presented in Figure 2. The algorithm follows the generic refinement strategy presented in Section 4, using MinZoominUR as the pattern generation procedure and NM as the refinement procedure. The selection procedure used in IterateRUB selects a single "strongest" unexpected pattern from the set of MinZoominUR patterns by applying the following heuristic:
1. Select the set HC of unexpected patterns that have the highest confidence among the patterns generated.
2. Select the set HCS of unexpected patterns that have the highest support within the set HC.
3. Since several patterns may have the same confidence and support values, select any single pattern from HCS.

<< INSERT FIGURE 2 ABOUT HERE >>

In general, there may be measures other than confidence or support for choosing the "strongest" unexpected pattern.

To evaluate IterateRUB, in Theorems 6.1 through 6.5 below we prove various properties of IterateRUB. Some experimental results for IterateRUB in a real-world application are presented after Theorem 6.6 in Section 6.4 below.

Theorem 6.1. For a belief B and a dataset D with a finite number of discrete attributes, IterateRUB converges to a fixpoint after a finite number of iterations (i.e., fixpoint(B) exists).

Proof.
Since IterateRUB uses MinZoominUR rules to refine beliefs, each of the two refined beliefs has strictly more conditions in its body than the parent belief, because zoomin rules are specializations of a belief. With a finite number of discrete attributes, only a finite number of conditions can be considered. An upper bound on the number of iterations of IterateRUB is therefore the number of possible conditions in the domain minus the minimum number of conditions in the body of any original belief. In the unordered case, where conditions involve only the equality operator, the upper bound is in fact the number of attributes in the domain. This is because no two conditions in a belief can involve the same attribute; e.g., A=1, B=4, A=3 → X=0 is not a valid belief. That the final set of beliefs is a fixpoint follows trivially, since this set is characterized by the absence of unexpected patterns for MinZoominUR to generate.

In practice, a second effect further reduces the number of iterations before convergence, which we briefly explain below. Consider a belief A → B and the selected unexpected pattern A, X → ¬B. In the next iteration, the belief A → B is replaced with the beliefs A, X → ¬B and A, ¬X → B. The support of each refined belief is less than or equal to the support of the parent belief, as shown below. Let cnt(X) be the number of records in D where X holds. By definition, support(A → B) = cnt(A, B)/|D|. Also, since the belief A → B holds, cnt(A, B) > cnt(A, ¬B). For the two refined beliefs we then have:

$$\mathrm{support}(A,\neg X \rightarrow B) = \frac{\mathrm{cnt}(A,\neg X,B)}{|D|} = \frac{\mathrm{cnt}(A,B) - \mathrm{cnt}(A,X,B)}{|D|} \le \mathrm{support}(A \rightarrow B),$$

$$\mathrm{support}(A,X \rightarrow \neg B) = \frac{\mathrm{cnt}(A,X,\neg B)}{|D|} \le \frac{\mathrm{cnt}(A,\neg B)}{|D|} < \frac{\mathrm{cnt}(A,B)}{|D|} = \mathrm{support}(A \rightarrow B).$$
Given this, MinZoominUR usually generates unexpected patterns with progressively lower support until the support values fall below the minimum threshold. This is another factor that, in practice, aids fast convergence to a fixpoint.

Theorem 6.2. For a belief B, R_K(B) is consistent, where K is any iteration of IterateRUB.

Proof. This property is a direct consequence of the fact that in IterateRUB, for any iteration K, R_K(B) consists of mutually exclusive beliefs. Intuitively, mutually exclusive beliefs are consistent: since no two beliefs can apply at the same time, there can be no potential inconsistency. The consistency condition is hence satisfied trivially if R_K(B) can be shown to consist of mutually exclusive beliefs. We now prove by induction on the number of iterations k that, for all b1, b2 ∈ R_k(B), the beliefs b1 and b2 are mutually exclusive.

Base step: We prove that R_1(B) consists of mutually exclusive beliefs. Consider an initial belief A → B and the best unexpected pattern A, X → ¬B. The set of beliefs at the end of the first iteration is A, X → ¬B and A, ¬X → B. Since (A, ¬X) |= ¬(A, X), the two beliefs are mutually exclusive.

Induction step: Assume that R_P(B) consists of mutually exclusive beliefs. We need to prove that R_{P+1}(B) consists of mutually exclusive beliefs. Consider any belief C → D that belongs to R_P(B) and that has unexpected patterns. If there is no such belief, then R_{P+1}(B) = R_P(B) and the result trivially holds. Otherwise, assume that the best unexpected pattern is C, X → ¬D. At the end of iteration P+1, the belief C → D is replaced by C, X → ¬D and C, ¬X → D. Observe that the bodies of these two beliefs (C, X and C, ¬X) are specializations of the body C of the original belief. Since C → D was in R_P(B), by the inductive assumption C → D is mutually exclusive with all other beliefs in R_P(B).
Therefore C, X → ¬D and C, ¬X → D are both mutually exclusive with all other beliefs in R_P(B). Further, the body of any belief in R_{P+1}(B) is a specialization of the body of some belief in R_P(B). It follows that both C, X → ¬D and C, ¬X → D are mutually exclusive with any belief derived from a belief in R_P(B) other than C → D. By symmetry, the same argument applies to all beliefs refined in iteration P+1. Hence R_{P+1}(B) consists of mutually exclusive beliefs.

Theorem 6.3. IterateRUB has the path-independence property, i.e., the order in which the selected patterns are incorporated into the belief system does not affect the final belief system.

Proof. Since only one pattern is incorporated into the belief system at each iteration, this holds trivially.

Theorem 6.4. For a belief B, R_K(B) is minimal, where K is any iteration of IterateRUB.

Proof. As proved in Theorem 6.2, all beliefs in R_K(B) are mutually exclusive. Hence it is impossible to find two beliefs b1, b2 in R_K(B) such that body(b1) |= body(b2), and so R_K(B) must be minimal.

Theorem 6.5. IterateRUB is monotonic.

Proof. We need to show that no unexpected pattern incorporated into the belief system at any iteration appears again as unexpected. Since the set of beliefs at any iteration is mutually exclusive, the same pattern cannot appear as unexpected for two different beliefs in the same iteration. Below we prove that an unexpected pattern cannot re-appear at any subsequent iteration either. To trace how a single belief is refined iteratively in IterateRUB, consider a tree with the belief at the root. The children of any node in this tree are the beliefs that the refinement procedure creates for the parent belief, and the depth of a node indicates the number of iterations from the initial belief. Any belief (node) in this tree is a refinement of all its ancestor nodes.
Therefore, since an unexpected pattern here is a zoomin rule, it can never re-surface at any node in the sub-tree under itself. Hence, to prove that an unexpected pattern cannot re-surface at a subsequent iteration, it remains to show that it cannot result from any other belief in the iteration in which it was incorporated. Consider an unexpected pattern p incorporated into the belief system in iteration k. Recall that all beliefs in R_k(B) are mutually exclusive for any iteration k. Therefore all nodes in the trees that result from each belief in R_k(B) − {p} are also mutually exclusive with p, and none of these patterns can be the same as p. Hence the result.

To summarize, IterateRUB always converges to a fixpoint for any belief, generates minimal and consistent beliefs at any iteration, and is path-independent and monotonic. In the next section we present another refinement algorithm, IterateRUA, and discuss its properties.

6.4 IterateRUA

IterateRUA is presented in Figure 3. The algorithm follows the generic refinement strategy presented in Section 4 and also uses MinZoominUR as the pattern generation procedure and NM as the refinement procedure. Figure 3 skips the selection step, so all generated patterns are used in the refinement procedure. Hence the selection procedure is trivially one that selects all the generated patterns. In Theorems 6.6 through 6.10 below we prove various properties of IterateRUA.

Theorem 6.6. For a belief B and a dataset D with a finite number of discrete attributes, IterateRUA converges to a fixpoint after a finite number of iterations (i.e., fixpoint(B) exists).

Proof. The proof is the same as the proof of Theorem 6.1 for IterateRUB.

<< INSERT FIGURE 3 ABOUT HERE >>

To experimentally study and compare the convergence properties of IterateRUB and IterateRUA, we applied the methods to consumer purchase data from a major market research firm.
We pre-processed this data by combining the different data sets made available to us (transaction data joined with demographics) into one table containing 38 attributes and 313,409 records. These attributes pertain to the item purchased by a shopper at a store over a period of one year, together with certain characteristics of the store and demographic data about the shopper and his or her family. Demographic attributes include the age and gender of the shopper, the occupation, income and marital status of the household head, the presence of children in the family and the size of the household. Transaction-specific attributes include the product purchased, coupon usage (whether or not the shopper used a coupon to obtain a lower price), the availability of store or manufacturer's coupons and the presence of in-store advertisements for the product purchased.

We started with an initial set of 28 beliefs, a minimum support of 1% and a minimum confidence of 0.6, and used IterateRUB and IterateRUA to compute the fixpoints. Both approaches terminated within a few minutes in fixpoints satisfying the condition that no more unexpected patterns exist for any of the beliefs in the final set. IterateRUB converged more rapidly in this experiment and generated a final set of 79 beliefs, while IterateRUA generated a final set of 2549 beliefs. For the same set of beliefs we also ran the methods for six different minimum support values below 3%; the average numbers of patterns in the fixpoints for IterateRUB and IterateRUA were 42 and 1033, respectively.

Continuing with the properties of IterateRUA, we next consider consistency.

Theorem 6.7. For a belief B, R^K(B) is not always consistent, where K is any iteration of IterateRUA.

Proof. To show that IterateRUA can generate an inconsistent set of beliefs in an iteration, we provide an example where this occurs.
IterateRUA incorporates all unexpected patterns into the belief system. Consider the following two beliefs in an iteration: A, X → B and A, Y → B. Assume that there are no unexpected patterns for the first belief but that the second belief generates A, Y, P → ¬B. The belief system at the next iteration then contains both A, X → B and A, Y, P → ¬B, which are inconsistent: when A, X, Y and P are all true, the system concludes both B and ¬B. The implications of theorem 6.7 are considered in Section 7.

Theorem 6.8. IterateRUA has the path-independence property, i.e. the order in which the selected patterns are incorporated into the belief system does not affect the final belief system.

Proof. Given a belief A → B and a set of unexpected patterns X1 → C1, X2 → C2, …, XN → CN, the refinement procedure replaces the belief with the following:

X1 → C1, X2 → C2, …, XN → CN and A, ¬X1, ¬X2, …, ¬XN → B

Since all unexpected patterns are incorporated into the belief system simultaneously, as shown above, the order trivially does not matter. Hence the result.

Theorem 6.9. For a belief B, R^K(B) is not always minimal, where K is any iteration of IterateRUA.

Proof. We provide a simple example where R^K(B) is non-minimal. Consider the following two beliefs in an iteration: A, X → B and A, Y → B. The next iteration can generate A, X, Y, P → ¬B and A, Y, P → ¬B as unexpected patterns for the first and the second belief, respectively. Since IterateRUA incorporates all discovered unexpected patterns into the belief system, R^K(B) is clearly non-minimal, because A, X, Y, P |= A, Y, P. Hence the result.

Theorem 6.10. IterateRUA is not monotonic.

Proof. Consider the following two beliefs in an iteration: A, X → B and A, Y → B. Assume that the pattern A, X, Y → ¬B holds.
Since A, X, Y → ¬B is generated as unexpected for both beliefs in the same iteration, it is generated again as unexpected for the second belief even after it has been incorporated into the belief system when it was generated for the first belief. Hence IterateRUA is not monotonic.

To summarize, IterateRUA converges to a fixpoint and has the path-independence property, but is not consistent, minimal or monotonic. In the next section we discuss the implications of these and other properties of the refinement algorithms.

7 Discussion

The generic refinement strategy presented in Figure 1 has three degrees of freedom: the pattern generation procedure, the selection procedure and the refinement procedure. Having fixed the pattern generation procedure (MinZoominUR) and the refinement procedure (NM_Refine) because of the strengths described in Section 4, we presented two refinement algorithms that represent the extremes of the selection procedure: IterateRUB selects only the best pattern to incorporate at each iteration, while IterateRUA selects all of them. Clearly the generic refinement strategy (Figure 1) admits many other refinement algorithms that select some subset of the generated patterns at each iteration. However, since IterateRUB was shown to have all the desirable properties presented in this paper, we believe it is an “optimal” algorithm with respect to the objective functions, or properties, that were chosen. A globally “best” refinement algorithm requires a clear specification of what “best” means, and in general there may be several other useful properties (such as convergence in a fixed number of iterations, the size of the fixpoint, or predictive accuracy). The approach adopted here selected five desirable properties of refinement algorithms to make inferences about their relative strengths. IterateRUB satisfies all of these properties, whereas IterateRUA fails the consistency, minimality and monotonicity properties.
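The counterexamples used in the proofs of theorems 6.7, 6.9 and 6.10 can be checked mechanically. The Python sketch below is illustrative only and rests on assumptions the paper does not make: attributes are boolean, a rule body is a set of literals read as a conjunction, and is_unexpected is a simplified, hypothetical stand-in for unexpectedness that ignores support and confidence in the data. It is not the authors' implementation.

```python
from itertools import combinations

# Sketch assumptions (not from the paper): boolean attributes; a literal is
# (attribute, value), with ("B", False) standing for ¬B; a rule is a pair
# (body, head) where body is a frozenset of literals read as a conjunction.

def rule(body, head):
    return (frozenset(body), head)

def exclusive(r1, r2):
    """Bodies are mutually exclusive if one contains the negation of a literal of the other."""
    return any((attr, not val) in r2[0] for attr, val in r1[0])

def contradicts(h1, h2):
    """Two heads contradict if they assign different values to the same attribute."""
    return h1[0] == h2[0] and h1[1] != h2[1]

def is_consistent(rules):
    """No two rules can fire together while concluding contradictory heads."""
    return not any(not exclusive(r1, r2) and contradicts(r1[1], r2[1])
                   for r1, r2 in combinations(rules, 2))

def is_minimal(rules):
    """No body entails another; for conjunctions of literals, body1 |= body2 iff body2 ⊆ body1."""
    return not any(r1[0] <= r2[0] or r2[0] <= r1[0]
                   for r1, r2 in combinations(rules, 2))

def is_unexpected(pattern, belief):
    """Hypothetical simplified test: the pattern specializes the belief's body and
    contradicts its head (support and confidence are ignored here)."""
    return belief[0] <= pattern[0] and contradicts(pattern[1], belief[1])

A, X, Y, P = ("A", True), ("X", True), ("Y", True), ("P", True)
B, NOT_B = ("B", True), ("B", False)
b1, b2 = rule({A, X}, B), rule({A, Y}, B)

# Theorem 6.7: A, X -> B and A, Y, P -> ¬B both fire when A, X, Y, P hold.
print(is_consistent([b1, rule({A, Y, P}, NOT_B)]))                      # False

# Theorem 6.9: A, X, Y, P |= A, Y, P, so keeping both patterns is not minimal.
print(is_minimal([rule({A, X, Y, P}, NOT_B), rule({A, Y, P}, NOT_B)]))  # False

# Theorem 6.10: A, X, Y -> ¬B is unexpected for both beliefs in the same iteration.
p = rule({A, X, Y}, NOT_B)
print(is_unexpected(p, b1), is_unexpected(p, b2))                       # True True

# A pairwise mutually exclusive set, as IterateRUB maintains, passes both checks.
NOT_X, NOT_P = ("X", False), ("P", False)
rub_like = [rule({A, X}, B), rule({A, NOT_X, P}, NOT_B), rule({A, NOT_X, NOT_P}, B)]
print(is_consistent(rub_like), is_minimal(rub_like))                    # True True
```

The last check mirrors the argument of the discussion: once every pair of bodies is mutually exclusive, no two rules can fire together and no body can entail another, so consistency and minimality follow.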
Note, however, that IterateRUB generated consistent and minimal beliefs and had the monotonicity property because at every iteration it generated mutually exclusive patterns (as shown in the proofs of theorems 6.2, 6.4 and 6.5). IterateRUA could be modified so that at the end of each iteration the patterns are made mutually exclusive by applying conflict resolution strategies. However, this is an expensive operation that involves comparing each belief with every other belief at each iteration. Therefore, with respect to the properties presented here, IterateRUB is preferable to IterateRUA. We also showed experimentally on a real-world dataset that IterateRUB converges to a much smaller fixpoint than IterateRUA does. As mentioned earlier, we selected IterateRUA and IterateRUB for a detailed comparison in this paper since they represent the extremes of the selection procedure. An interesting finding of this paper is that, with respect to the criteria considered, the method that uses all discovered patterns in refinement is inferior to the one that uses only the best pattern. In a sense this result is counterintuitive, since it argues against using the entire set of discovered patterns in refinement.

The work presented in this paper represents one approach to knowledge refinement in the specific context of unexpected association rules discovered from data. In general, as knowledge-driven data mining develops, additional work is needed to investigate new refinement strategies for other methods and to evaluate different comparison metrics. In this paper we addressed the problem of incorporating the discovered contradictions into the belief system based on a formal logic approach.
Specifically, we presented a framework for refinement based on a generic knowledge refinement strategy, described abstract properties of refinement algorithms that can be used to compare specific instantiations, and then presented and compared two specific refinement algorithms based on this framework.

References

[1] Alchourrón, C., Gärdenfors, P. and Makinson, D., 1985. On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50:510-530.
[2] Agrawal, R., Imielinski, T. and Swami, A., 1993. Mining association rules between sets of items in large databases. In Proc. of the 1993 ACM SIGMOD Conference on Management of Data, pp. 207-216.
[3] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A.I., 1995. Fast discovery of association rules. In Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. and Uthurusamy, R., eds., Advances in Knowledge Discovery and Data Mining. AAAI Press.
[4] Boutilier, C., 1994. Unifying default reasoning and belief revision in a modal framework. Artificial Intelligence, 68:33-85.
[5] Buchanan, B.G. and Feigenbaum, E.A., 1978. DENDRAL and META-DENDRAL: Their applications dimensions. Artificial Intelligence, 11:5-24.
[6] Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J., 1984. Classification and Regression Trees. Wadsworth International Group.
[7] Fayyad, U.M., Piatetsky-Shapiro, G. and Smyth, P., 1996. From data mining to knowledge discovery: An overview. In Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. and Uthurusamy, R., eds., Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.
[8] Lenat, D.B., 1983. AM: Discovery in mathematics as heuristic search. In Davis, R. and Lenat, D., eds., Knowledge-Based Systems in Artificial Intelligence. McGraw-Hill.
[9] Mitchell, T., 1980. The need for biases in learning generalizations. Technical Report CBM-TR-117, Dept. of Computer Science, Rutgers University.
[10] Mitchell, T., 1999. Machine learning and data mining.
Communications of the ACM, Vol. 42, No. 11, November 1999.
[11] Michalski, R.S. and Kaufman, K.A. Data mining and knowledge discovery: A review of issues and a multistrategy approach. Technical Report P97-3 MLI 97-2, Machine Learning and Inference Laboratory, George Mason University.
[12] Padmanabhan, B., 1999. Discovering Unexpected Patterns in Data Mining Applications. Doctoral Dissertation, New York University, May 1999.
[13] Padmanabhan, B. and Tuzhilin, A., 1998. A belief-driven method for discovering unexpected patterns. In Proc. of the 4th International Conference on Knowledge Discovery and Data Mining, 1998.
[14] Padmanabhan, B. and Tuzhilin, A., 1999. Unexpectedness as a measure of interestingness in knowledge discovery. Decision Support Systems, 27(3), pp. 303-318.
[15] Padmanabhan, B. and Tuzhilin, A., 2000. Small is beautiful: Discovering the minimal set of unexpected patterns. In Proc. of the 6th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2000.
[16] Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, California.
[17] Reiter, R., 1987. Nonmonotonic reasoning. In Annual Review of Computer Science, 1987.
[18] Shrager, J. and Langley, P., 1990. Computational Models of Scientific Discovery and Theory Formation. Morgan Kaufmann, San Mateo, CA.
[19] Tuzhilin, A. and Silberschatz, A., 1996. A belief-driven discovery framework based on data monitoring and triggering. Working Paper #IS-96-26, Dept. of Information Systems, Leonard N. Stern School of Business, NYU.
[20] Ullman, J., 1988. Principles of Database and Knowledge-Base Systems, vol. 1. Computer Science Press.
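The generic strategy of Figure 1 and its two instantiations in Figures 2 and 3 differ only in how many generated patterns are selected at each iteration. The Python sketch below illustrates this under loudly stated assumptions: a hard-coded, hypothetical table (ORACLE) replaces MinZoominUR and the dataset, nm_refine is a simplified hypothetical stand-in for NM_refine that handles one extra literal per pattern, and the valid_beliefs step is omitted. It is a sketch, not the authors' implementation.

```python
from typing import FrozenSet, List, Tuple

# Sketch assumptions (not from the paper): boolean attributes; a literal is
# (attribute, value); a rule is (body, head) with body a frozenset of literals.
Literal = Tuple[str, bool]
Rule = Tuple[FrozenSet[Literal], Literal]

def rule(body, head) -> Rule:
    return (frozenset(body), head)

def nm_refine(belief: Rule, patterns: List[Rule]) -> List[Rule]:
    """Hypothetical NM-style refinement: keep the selected patterns and add one
    remainder belief whose body negates each pattern's extra condition."""
    body, head = belief
    remainder = set(body)
    for p_body, _ in patterns:
        (attr, val), = p_body - body          # exactly one extra literal assumed
        remainder.add((attr, not val))
    return list(patterns) + [rule(remainder, head)]

def select_strongest(found: List[Rule]) -> List[Rule]:
    """IterateRUB-style selection; assumes `found` is ordered strongest first."""
    return found[:1]

def select_all(found: List[Rule]) -> List[Rule]:
    """IterateRUA-style selection: every generated pattern is used."""
    return list(found)

def refine_to_fixpoint(beliefs, find_unexpected, select, max_iter=100):
    """Generic refinement loop in the spirit of Figure 1 (valid_beliefs omitted)."""
    current = list(beliefs)
    for _ in range(max_iter):
        if not any(find_unexpected(b) for b in current):
            return current                    # fixpoint: nothing unexpected left
        refined = []
        for b in current:
            found = find_unexpected(b)
            refined.extend(nm_refine(b, select(found)) if found else [b])
        current = refined
    raise RuntimeError("no fixpoint within max_iter iterations")

# Hypothetical stand-in for MinZoominUR over a dataset: a fixed table of patterns.
A, P, Q = ("A", True), ("P", True), ("Q", True)
NOT_P, NOT_Q = ("P", False), ("Q", False)
B, NOT_B = ("B", True), ("B", False)
ORACLE = {
    rule({A}, B): [rule({A, P}, NOT_B), rule({A, Q}, NOT_B)],
    rule({A, NOT_P}, B): [rule({A, NOT_P, Q}, NOT_B)],
}

def find_unexpected(belief: Rule) -> List[Rule]:
    return ORACLE.get(belief, [])

rub = refine_to_fixpoint([rule({A}, B)], find_unexpected, select_strongest)
rua = refine_to_fixpoint([rule({A}, B)], find_unexpected, select_all)
```

With this toy table, the IterateRUB-style run refines A → B over two iterations into A, P → ¬B, A, ¬P, Q → ¬B and A, ¬P, ¬Q → B, whose bodies are pairwise mutually exclusive. The IterateRUA-style run stops after one iteration with A, P → ¬B, A, Q → ¬B and A, ¬P, ¬Q → B, where the first two bodies can hold at the same time.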
Input: Belief B, dataset D
Output: fixpoint(B)
K = 0
R^0(B) = B
Repeat {
    B' = {}
    For each belief b ∈ R^K(B) {
        X = patterns unexpected with respect to b
        S = select some patterns from X to refine b
        B' = B' ∪ patterns from refining b with S
    }
    K = K + 1
    R^K(B) = valid_beliefs(B')
} until no unexpected patterns in R^K(B)
fixpoint(B) = R^K(B)

Figure 1. Generic refinement strategy

Input: Belief B, dataset D, minimum support s, minimum confidence c
Output: fixpoint(B)
K = 0
R^0(B) = B
Repeat {
    B' = {}
    R^K(B) = valid_beliefs(R^K(B))
    For each belief b ∈ R^K(B) {
        X = unexpected patterns from MinZoominUR(b, s, c, D)
        S = select_one_strongest_pattern(X)
        B' = B' ∪ NM_refine(b, S)
    }
    K = K + 1
    R^K(B) = B'
} until no unexpected patterns in R^K(B)
fixpoint(B) = R^K(B)

Figure 2. IterateRUB

Input: Belief B, dataset D, minimum support s, minimum confidence c
Output: fixpoint(B)
K = 0
R^0(B) = B
Repeat {
    B' = {}
    R^K(B) = valid_beliefs(R^K(B))
    For each belief b ∈ R^K(B) {
        S = unexpected patterns from MinZoominUR(b, s, c, D)
        B' = B' ∪ NM_refine(b, S)
    }
    K = K + 1
    R^K(B) = B'
} until no unexpected patterns in R^K(B)
fixpoint(B) = R^K(B)

Figure 3. IterateRUA

Biographies

Balaji Padmanabhan is an Assistant Professor of Operations and Information Management at The Wharton School, University of Pennsylvania. He holds a Ph.D. in Information Systems from New York University and a B.S. in Computer Science from the Indian Institute of Technology, Madras. His research interests are in the areas of Data Mining and Knowledge Management, with a focus on building effective tools for the discovery of interesting patterns in data by combining domain knowledge about problems with automated search. His current research is on the discovery of unexpected patterns in data, knowledge-driven data mining, web usage mining and evaluation of personalization technologies. His work has been published in Decision Support Systems, Procs.
of the ACM SIGKDD Knowledge Discovery and Data Mining Conference, the European Journal of Marketing, and the Proceedings of the International Conference on Information Systems, the Workshop on Information Technology and Systems and AIS. He has served on the program committees of the KDD, WITS and IIWAS conferences and on the Editorial Board of the Journal of Database Management.

Alexander Tuzhilin is an Associate Professor of Information Systems at Stern School of Business, New York University. He holds a Ph.D. in Computer Science from the Courant Institute of Mathematical Sciences, NYU. His research interests include knowledge discovery in databases (data mining), personalization techniques for CRM, temporal databases, marketing information systems, query-driven simulations, and conceptual modeling of information systems. His papers have been published in ACM Transactions on Database Systems, ACM Transactions on Information Systems, ACM Transactions on Modeling and Computer Simulation, IEEE Transactions on Knowledge and Data Engineering, Acta Informatica, Information Systems, Information Systems Research, DSS, and marketing and OR journals. He serves on the Editorial Boards of the Journal of Data Mining and Knowledge Discovery, the INFORMS Journal on Computing, the Journal of AIS (JAIS) and the Electronic Commerce Research Journal, and has served as a guest editor of special issues of several journals and on the program committees of the KDD, SIAM Data Mining, VLDB, TIME, ICECR and ER conferences and of numerous workshops.