On Characterization and Discovery of Minimal Unexpected Patterns in Data Mining Applications

Balaji Padmanabhan
Operations and Information Management Department
The Wharton School, University of Pennsylvania
http://www.wharton.upenn.edu/faculty/padmanabhan.html
balaji@wharton.upenn.edu

Alexander Tuzhilin
Information Systems Department
Stern School of Business, New York University
http://www.stern.nyu.edu/~atuzhili/
atuzhili@stern.nyu.edu

Abstract

A drawback of traditional data mining methods is that they do not leverage prior knowledge of users. In many business settings, managers and analysts have significant intuition based on several years of experience. In prior work we proposed a method that could discover unexpected patterns in data by using this domain knowledge in a systematic manner. In this paper we continue our focus on discovering unexpected patterns and propose new methods for discovering a minimal set of unexpected patterns; these methods generate orders of magnitude fewer patterns and yet retain most of the truly unexpected ones. We demonstrate the strengths of this approach experimentally using a case study application in a marketing domain.

Keywords: data mining, databases, rule discovery, association rules, unexpectedness, interestingness, minimality.

1. Introduction

A well-known criticism of many rule discovery algorithms in data mining is that they generate too many patterns, many of which are obvious or irrelevant [PSM94, ST95, BMU+97, S97, BA99]. More effective methods are therefore needed to discover fewer and more relevant patterns from data. One way to approach this problem is by focusing on discovering unexpected patterns [ST95, ST96, LH96, LHC97, Suz97, CSD98, Sub98, BT98, PT98, PT99, P99], where unexpectedness of discovered patterns is usually defined relative to a system of prior expectations. In particular, we proposed in our prior research [PT98, PT99, P99] a characterization of unexpectedness based on the logical contradiction of a discovered pattern and prior beliefs and presented new algorithms for discovering such unexpected patterns. As demonstrated in [PT98, PT99, P99], this approach generated far fewer and more interesting patterns than traditional approaches.

In this paper we extend our prior work on discovering unexpected patterns [PT98, PT99] and propose a new approach that further reduces the number of such patterns in a significant way, while retaining most of the truly interesting patterns. This is achieved by defining a minimal set of unexpected patterns as those unexpected patterns that are not refinements of other unexpected patterns and hence cannot be monotonically inferred from other unexpected patterns. Moreover, we present efficient algorithms that discover the minimal set of unexpected patterns and test these algorithms on "real-world" data to see how well they perform in practice. In the context of discovering a minimal set of patterns in data mining, [BA99, LHM99, SLR+99, TKR+95, SA96, BAG99] also provide alternate approaches to characterizing this concept. In Section 3 these approaches are described and contrasted with our method of discovering minimal sets of unexpected patterns. The power of the approach presented in this paper lies in combining two independent concepts, unexpectedness and minimality of a set of patterns, into one integrated concept that provides for the discovery of small but important sets of interesting patterns.
Moreover, our proposed methods are efficient in the sense that they focus directly on discovering minimal unexpected patterns rather than using post-processing approaches, such as filtering, to determine the minimal unexpected patterns from the set of all the discovered patterns. Further, the approach presented in this paper is effective in supporting decision making in real business applications, where patterns that are unexpected with respect to managerial intuition can be of great value.

The rest of this paper is organized as follows. In Section 2 we present an overview of the concept of unexpectedness and provide a summary of the relevant related work. We then present in Section 3 related work on the concept of minimality of a set of rules, together with definitions and formal characterizations of the minimal set of patterns and the minimal set of unexpected patterns. In Section 4 we first present naïve and semi-naïve approaches to discovering the minimal set of unexpected patterns; we then present MinZoomUR, the algorithm for discovering the minimal set of unexpected patterns. Experimental results are presented in Section 5, followed by conclusions in Section 6.

2. Overview of Unexpectedness

Unexpectedness of patterns has been studied in [ST95, ST96, LH96, LHC97, Suz97, CSD98, Sub98, BT98, PT98, PT99, P99], and a comparison of these approaches is provided in [P99]. In this paper we follow our previous approach to unexpectedness presented in [PT98, PT99, P99] because it is simple and intuitive (as it is based on a logical contradiction between a discovered pattern and a belief) and lends itself to efficient algorithms that, as demonstrated in [PT98, P99], discover interesting patterns in various applications. Moreover, this paper builds on our previous work on unexpectedness [PT98, PT99] by proposing new methods that significantly improve the effectiveness of the methods described in [PT98, PT99]. To make the paper self-contained, we present an overview of our previous approach to unexpectedness from [PT98, PT99].

To define unexpectedness, we start with a set of beliefs that represent knowledge about the domain and use these beliefs to seed the search for all unexpected patterns defined as rules. In particular, let I = {i1, i2, …, im} be a set of discrete attributes (also called "items" [AIS93]), some of them being ordered and others unordered. Let D = {T1, T2, ..., TN} be a relation consisting of N transactions [AMS+95] T1, T2, ..., TN over the relation schema {i1, i2, …, im}. Also, let an atomic condition be a proposition of the form value1 ≤ attribute ≤ value2 for ordered attributes and attribute = value for unordered attributes, where value, value1, value2 belong to the finite set of discrete values taken by attribute in D. Finally, an itemset is a conjunction of atomic conditions. Then we assume that rules and beliefs are defined as extended association rules of the form X → A, where X is a conjunction of atomic conditions (an itemset) and A is an atomic condition. As defined in [AIS93], the rule has confidence c if c% of the transactions in D that contain X also contain A, and the rule has support s in D if s% of the transactions in D contain both X and A. Finally, a rule is said to hold on a dataset D if the confidence of the rule is greater than a user-specified threshold value chosen to be any value greater than 0.5.
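As an illustration of these definitions, the following minimal Python sketch (ours, not part of the original formulation) shows how the support and confidence of an extended association rule X → A could be computed over a transaction table; the condition encoding, attribute names and toy data are illustrative assumptions only.

# Illustrative sketch: support and confidence of an extended association rule X -> A.
# A transaction is a dict mapping attributes to values; an atomic condition is either
# ("eq", attr, value) for unordered attributes or ("range", attr, lo, hi) for ordered
# attributes; an itemset is a list of atomic conditions.

def satisfies(transaction, condition):
    # True if one transaction satisfies a single atomic condition
    if condition[0] == "eq":
        _, attr, value = condition
        return transaction.get(attr) == value
    _, attr, lo, hi = condition
    return attr in transaction and lo <= transaction[attr] <= hi

def support(itemset, data):
    # fraction of transactions satisfying every condition in the itemset
    hits = sum(all(satisfies(t, c) for c in itemset) for t in data)
    return hits / len(data)

def confidence(body, head, data):
    # confidence of body -> head: support(body AND head) / support(body)
    s_body = support(body, data)
    return support(body + [head], data) / s_body if s_body > 0 else 0.0

# Toy data: does "professional -> weekend" hold with confidence > 0.5?
data = [
    {"occupation": "professional", "day": "weekend", "age": 34},
    {"occupation": "professional", "day": "weekend", "age": 41},
    {"occupation": "professional", "day": "weekday", "age": 29},
    {"occupation": "student",      "day": "weekday", "age": 22},
]
body = [("eq", "occupation", "professional")]
head = ("eq", "day", "weekend")
print(support(body + [head], data), confidence(body, head, data))   # 0.5  0.666...

On this toy table the belief "professional → weekend" has support 0.5 and confidence of about 0.67, and therefore holds under the definition above.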
Given these preliminaries, we define unexpectedness as follows.

Definition [PT98, P99]. The rule A → B is unexpected with respect to the belief X → Y on the dataset D if the following conditions hold:
(a) B AND Y |= FALSE. This condition imposes the constraint that B and Y logically contradict each other.
(b) A AND X holds on a statistically large subset of tuples in D. (One way to define a "large subset of tuples" is through the user-specified support threshold value.) We use the term "intersection of a rule with respect to a belief" to refer to this subset. This intersection defines the subset of tuples in D in which the belief and the rule are both "applicable" in the sense that the antecedents of the belief and the rule are both true on all the tuples in this subset.
(c) The rule A, X → B holds (with the same threshold levels of support and confidence). Since condition (a) constrains B and Y to logically contradict each other, the rule A, X → Y does not hold.

We would like to point out that this definition is applicable not only to the specific structure of the rules defined above, but also to a broader set of rules; this observation also holds for the definitions of the minimal set of rules and monotonicity inference introduced in Section 3. However, the discovery algorithms presented in this paper are designed for rules having the specific structure introduced above.

A key assumption in this definition is the monotonicity of beliefs, which is motivated in [PT98, P99]. In particular, if we have a belief Y → B that we expect to hold on a dataset D, then monotonicity assumes that the belief should also be expected to hold on any statistically large subset of D. The above definition of unexpectedness states that the rule and the belief logically contradict each other, that the rule should hold on its intersection with the belief, and that this intersection is statistically large. This condition is "unexpected" because, according to the monotonicity assumption, it is expected that the belief should hold on the subset of data defined by the belief's intersection with the rule.
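To make conditions (a)-(c) concrete, the following illustrative sketch checks whether a rule A → B is unexpected with respect to a belief X → Y. It assumes the satisfies/support/confidence helpers and the condition encoding from the previous sketch are in scope; treating equality conditions on the same unordered attribute with different values as contradictory is just one simple way to realize condition (a), and the threshold values are hypothetical.

# Illustrative sketch: checking conditions (a)-(c) of the unexpectedness definition.
# Assumes support() and confidence() from the previous sketch are in scope.

def contradicts(head_b, head_y):
    # condition (a): B AND Y |= FALSE, realized here as equality conditions on the
    # same unordered attribute with different values
    return (head_b[0] == head_y[0] == "eq"
            and head_b[1] == head_y[1]
            and head_b[2] != head_y[2])

def is_unexpected(rule, belief, data, min_sup=0.05, min_conf=0.5):
    a_body, b_head = rule        # rule   A -> B
    x_body, y_head = belief      # belief X -> Y
    cond_a = contradicts(b_head, y_head)                      # (a) heads contradict
    cond_b = support(a_body + x_body, data) >= min_sup        # (b) A AND X is statistically large
    cond_c = (support(a_body + x_body + [b_head], data) >= min_sup
              and confidence(a_body + x_body, b_head, data) > min_conf)  # (c) A, X -> B holds
    return cond_a and cond_b and cond_c

# e.g. belief "professional -> weekend" versus the rule "december -> weekday":
belief = ([("eq", "occupation", "professional")], ("eq", "day", "weekend"))
rule = ([("eq", "month", "december")], ("eq", "day", "weekday"))
# is_unexpected(rule, belief, data)   # using a transaction table such as the toy data above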
Given the definition of unexpectedness, [PT98, P99] propose the algorithm ZoomUR that discovers all the unexpected rules with respect to a set of beliefs that satisfy user-specified minimum support and confidence requirements. ZoomUR consists of algorithms for the two phases of the discovery strategy, ZoominUR and ZoomoutUR. In the first phase of ZoomUR, ZoominUR discovers all unexpected patterns that are refinements to any belief. More specifically, given any belief X → Y, ZoominUR discovers all unexpected rules of the form X, A → B such that B AND Y |= FALSE. As done in the Apriori algorithm [AMS+95], ZoominUR first generates "large" itemsets incrementally (an itemset is said to be large if the percentage of transactions that contain it exceeds a user-specified minimum support level) and then generates rules from the discovered itemsets. As originally proposed in Apriori [AMS+95], ZoominUR also uses the observation that subsets of large itemsets should also be large to limit the search for large itemsets. However, unlike Apriori, which starts from an empty set, ZoominUR starts with an initial set of candidate itemsets that are derived from beliefs, such that every candidate itemset considered by ZoominUR must contain the body of the belief and an atomic condition that contradicts the head of the belief. In the second phase of ZoomUR, starting from all the unexpected rules that are refinements to a belief, ZoomoutUR discovers more general rules (generalizations) that are also unexpected. Specifically, from each unexpected refinement of the form X, A → B, ZoomoutUR discovers all the unexpected rules of the form X', A → B, where X' ⊂ X. The rules that ZoomoutUR discovers are not refinements of beliefs, but more general rules that satisfy the conditions of unexpectedness as defined above. For example, if a belief is that "professional → weekend" (professionals tend to shop more on weekends than on weekdays), ZoominUR may discover a refinement such as "professional, december → weekday" (in December, professionals tend to shop more on weekdays than on weekends). ZoomoutUR may then discover a more general rule "december → weekday", which is totally different from the initial belief "professional → weekend".

Though ZoomUR discovers only the unexpected rules, and far fewer rules than Apriori (this is not surprising: the objective of Apriori is to discover all strong rules, while ZoomUR discovers only unexpected rules), it still discovers large numbers of rules, many of which are redundant in the sense that they can be obtained from other discovered rules. For example, given the belief diaper → beer and two unexpected patterns diaper, weekday → not_beer and diaper, weekday, male → not_beer, the second unexpected pattern can be inferred from the first one under the monotonicity assumption. Therefore, to improve the discovery process, we introduce in this paper the concept of a minimal set of unexpected patterns and present efficient algorithms that discover this set of rules.

3. Minimal Set of Patterns

The notion of minimality is very broad and has been studied for a long time, going back at least to William of Ockham and his famous razor. Since in this paper we focus on the discovery of unexpected patterns, we consider only minimality as it relates to this problem.

One of the early influential works related to minimality of a set of patterns was presented by Mitchell in [M82]. In particular, [M82] presents a unifying approach to the problem of generalizing knowledge (that, for example, can be represented as rules) by viewing generalization as a search problem. Moreover, based on the specific search methods used, Mitchell [M82] also categorizes several rule learning systems [BM78, P70, W75, HRM7, V78, M77, MUB82] that deal with generalization. In particular, [M82] deals with a broader set of objects (that can also include rules) and formulates the generalization problem as follows. Given a set of instances specified in an instance language, the generalization problem is a search for descriptions in a generalization language such that these generalizations are consistent with a set of training examples that are labeled with the appropriate generalization. [M82] adopts a strong notion of consistency by defining a generalization to be consistent with training examples if it matches all the positive examples and does not match any negative examples. Moreover, [M82] introduces a partial order for generalizations (one generalization being more general than another one) and defines minimally specific generalizations based on this partial order. At the extremes of this partial order are the G-set (the most general set) and the S-set (the most specific set). Although [M82] deals with abstract objects, Mitchell's approach is also applicable to rules and their generalizations.
In the context of discovering a minimal set of rules in data mining, the approach presented in [M82] has the following two limitations. First, in most cases it may not be possible to have training examples (a set of discovered rules) that are classified into known generalizations. Therefore, rather than learning these generalization relationships among different objects, it is necessary to define them. Recent characterizations of various notions of minimality in the knowledge discovery literature take this approach [BA99, LHM99, SLR+99], and we will describe them shortly. Second, the concept of consistency in [M82] is too strong for eliminating rules that are typically flagged as uninteresting by users. For example, requiring that all refinements of a discovered rule diaper → beer also be discovered in order to characterize the rule as minimal is too strong, since there may be several rules of the form diaper, X → beer that do not have adequate support or confidence. In practice it is rarely the case that all possible refinements to a discovered rule are also discovered. Hence we need a weaker notion of consistency than the one proposed in [M82].

To address these limitations, [BA99, LHM99, SLR+99, TKR+95, SA96, BAG99] provide alternate approaches to characterizing a minimal set of discovered rules. In particular, [BA99] presents an approach that finds the "most interesting rules", defined as rules that lie on a support and confidence frontier. Further, [BA99] proves that these rules necessarily contain the strongest rules discovered using several objective criteria other than just confidence and support. In [SLR+99] several heuristics for pruning large numbers of association rules have been proposed. One of these heuristics prunes out certain refinements of rules, thus alluding to the concept of minimality of a set of rules. However, [SLR+99] focuses on heuristics that prune redundant rules from a discovered set of rules; it does not explore the concept of minimality fully, nor does it propose any algorithms for discovering a minimal set of patterns. In [LHM99] a technique is presented to prune and then summarize an already discovered set of association rules. In particular, [LHM99] defines the concept of direction-setting rules and demonstrates how non-direction-setting rules can be inferred from them. Therefore, the set of direction-setting rules constitutes a set of rules that is "minimal" in some sense. This work is related to [SLR+99] in the sense that certain rule refinements are pruned out in the [LHM99] approach and therefore are not direction-setting. However, the approach presented in [LHM99] is different from [SLR+99] and from our approach in the sense that not all refined rules are non-direction-setting according to [LHM99]. Moreover, [LHM99] focuses on pruning already discovered rules and does not address the issue of direct discovery of minimal sets. An approach to eliminating redundant association rules is presented in [TKR+95]. In particular, [TKR+95] introduces a concept of the "structural cover" for association rules and presents post-processing algorithms to find the structural cover. In this paper, we present an alternative formal characterization of the minimal set of patterns that corresponds to the structural covers of [TKR+95] for association rules but is broader and applicable to more general classes of rules. Moreover, [TKR+95] also focuses on pruning already discovered rules and does not address the issue of direct discovery of minimal sets.
Finally, the work of [SA96] and [BAG99] is also related to the problem of discovering minimal sets of rules. In particular, [SA96] and [BAG99] provide methods for eliminating rules whose support and/or confidence values are not unexpected with respect to the support and confidence values of previously discovered rules. However, this work is only marginally related to our approach because we focus on a more general definition of minimality that does not directly depend on the confidence and support of discovered rules.

We now present our approach to minimality, which is targeted at the discovery of unexpected patterns. Since the concept of unexpectedness of discovered patterns is based on the monotonicity assumption about the set of underlying beliefs, and since the monotonicity assumption is strongly linked to the concept of refinement of patterns, our concept of minimality is based on refinement of patterns. Therefore, we do not consider other kinds of minimality, such as direction-setting rules [LHM99] or SC-optimality [BA99]. In the rest of this section we formally define minimality of a set of patterns and the minimal set of unexpected patterns. However, before defining these concepts, we introduce some preliminary definitions in the next section.

3.1 Inference Under Monotonicity Assumption

Before introducing minimal rules, we need to define formally which rules can be inferred to hold on a dataset due to the monotonicity assumption.

Definition. Let X and Y be two itemsets. Then itemset Y is a generalized refinement of itemset X (denoted Y = genref(X)) if, for all atomic conditions a from X:
(a) if a is of the form attribute = value, then a ∈ Y;
(b) if a is of the form value1 ≤ attribute ≤ value2, then Y contains an atomic condition value1 + δ ≤ attribute ≤ value2 - ε for some non-negative values of δ and ε.

Lemma 3.1. Y is a generalized refinement of X if and only if there exists an itemset Z such that Y = X AND Z.

Sketch of the Proof. For unordered attributes this observation is trivial. For the ordered attributes from X having the form value1 ≤ attribute ≤ value2, Z contains an atomic condition from Y of the form value1 + δ ≤ attribute ≤ value2 - ε.

Proposition 3.2. If X → Y holds on D and Z is a generalized refinement of X, then under the monotonicity assumption, Z → Y holds on D.

Proof. According to Lemma 3.1, there exists an itemset W such that Z = X AND W. Let D′ = { t ∈ D | Z = X AND W holds on t }. Then the monotonicity assumption states that X → Y should hold on D′, and hence X, W → Y, that is, Z → Y, holds on D.

The above proposition states that the rule X, W → Y can be inferred to hold on D under monotonicity, assuming that the rule X → Y holds. However, this inference is applicable only to rules having the specific structure that was defined in Section 2 and considered throughout this paper (because the definition of generalized refinement explicitly assumes this rule structure). We will provide an alternative characterization of generalized refinement below that can define inference under the monotonicity assumption in more general terms.

The next proposition provides an alternative characterization of generalized refinement that will be used subsequently.

Proposition 3.3. Itemset C is a generalized refinement of itemset A if and only if C |= A.

Proof. The necessary condition immediately follows from Lemma 3.1. To prove the sufficient condition, assume that C |= A.
If a is a condition from A involving an unordered attribute of the form attribute = value and a does not belong to C, then we can find an interpretation I such that attribute = value1 ≠ value and C is true in I. In this case, A is false in I, thus producing the contradiction. If a is a condition from A involving an ordered attribute of the form value1 ≤ attribute ≤ value2, and C does not contain any condition of the form value1 + δ ≤ attribute ≤ value2 - ε, then we can find an interpretation I such that the value of attribute does not belong to the interval [value1, value2] and C is true in I. In this case, A is false in I, thus also producing the contradiction.

We next present a definition and later show that it captures the concept of inference under the monotonicity assumption in more general terms than was done using Proposition 3.2.

Definition. Rule (A → B) |=M (C → D) if
1. C |= A, and
2. D = B.

Note that we defined the relationship |=M in terms of logical implication. However, Proposition 3.3 provides an alternative characterization of |=M in terms of generalized refinements.

Theorem 3.4. If X → Y holds on dataset D and X → Y |=M Z → V, then under the monotonicity assumption Z → V holds on D.

Proof. By the definition of |=M, V = Y and Z |= X. Then from Proposition 3.3, we conclude that Z is a generalized refinement of X. By Proposition 3.2, Z → Y holds on D.

This theorem demonstrates that the relational operator |=M defines inference of one rule from another under the monotonicity assumption. Moreover, this theorem gives a broader characterization of this inference than the one implied by Proposition 3.2 because |=M is defined in terms of logical implication as opposed to generalized refinement and hence can be applied to a broader class of rules than the ones introduced in Section 2.

Example. Assume the rule diaper, weekday → not_beer holds on a dataset D. Consider the rule diaper, weekday, male → not_beer. Since diaper, weekday, male |= diaper, weekday, it follows that diaper, weekday → not_beer |=M diaper, weekday, male → not_beer. Therefore, according to Theorem 3.4, diaper, weekday, male → not_beer should also hold. Also notice that the itemset diaper, weekday, male is a generalized refinement of the itemset diaper, weekday.

Proposition 3.5. The relation |=M is reflexive, transitive and not symmetric.

Proof. The proof follows from the observation that the logical implication relation |= is reflexive and transitive but not symmetric.

The following proposition establishes the relationship between |=M and the classical logical inference relation |=.

Proposition 3.6. If x |=M y then it has to be the case that x |= y, but if x |= y it does not follow that x |=M y (i.e., the classical logical inference relation is a necessary but not sufficient condition for two rules to be related by |=M).

Proof. The "necessary" part (if x |=M y then it has to be the case that x |= y) trivially holds. For the second part of the proof we provide a counter-example of a case where a rule logically follows from another rule, but where the relationship is not due to the monotonicity assumption: A → B > 10 |= A → B > 9; however, it follows from the definition that A → B > 10 |≠M A → B > 9.

Notice that the inference relation |=M deals with rules that hold on data probabilistically, whereas logical implication |= deals with interpretations of logical formulas and does not directly deal with the probabilistic relationships usually encountered in data mining. Therefore, we introduced the relationship |=M to deal with the type of probabilistic inference captured by the monotonicity assumption.
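The |=M check can be spelled out with the following illustrative Python sketch, which uses the syntactic characterization of generalized refinements from Proposition 3.3 and the same hypothetical condition encoding as the earlier sketches; the helper names are ours.

# Illustrative sketch: rule1 |=M rule2 holds when the body of rule2 is a generalized
# refinement of the body of rule1 and the heads coincide.

def is_generalized_refinement(y_itemset, x_itemset):
    # True if itemset Y is a generalized refinement of itemset X
    for cond in x_itemset:
        if cond[0] == "eq":
            if cond not in y_itemset:            # (a) every equality condition of X appears in Y
                return False
        else:
            _, attr, lo, hi = cond               # (b) Y narrows (or keeps) every range of X
            if not any(c[0] == "range" and c[1] == attr and c[2] >= lo and c[3] <= hi
                       for c in y_itemset):
                return False
    return True

def implies_m(rule1, rule2):
    # rule1 |=M rule2: rule2 can be monotonically inferred from rule1
    (body1, head1), (body2, head2) = rule1, rule2
    return head1 == head2 and is_generalized_refinement(body2, body1)

# diaper, weekday -> not_beer  |=M  diaper, weekday, male -> not_beer
r1 = ([("eq", "item", "diaper"), ("eq", "day", "weekday")], ("eq", "beer", "no"))
r2 = ([("eq", "item", "diaper"), ("eq", "day", "weekday"), ("eq", "gender", "male")],
      ("eq", "beer", "no"))
print(implies_m(r1, r2))   # True
print(implies_m(r2, r1))   # False: |=M is not symmetric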
Using these definitions and results, we next present the definitions for the minimal set of rules and the minimal set of unexpected patterns.

3.2 Minimal Set of Rules

In order to define the minimal set of (unexpected) rules, we first introduce a partial ordering relationship on a set of rules.

Definition. The set of rules Y covers the set of rules X (denoted as Y ≺ X) if Y ⊆ X and ∀ xi ∈ X, ∃ yi ∈ Y such that yi |=M xi.

Definition. Let X and Y be sets of rules. Then Y is the minimal set of X if Y covers X and there is no set of rules Z ⊂ Y that covers Y.

The following proposition establishes an alternative characterization of a minimal set of rules.

Proposition 3.7. Y is the minimal set of X if and only if the following conditions hold:
(1) Y ⊆ X;
(2) ∀ xi ∈ X, ∃ yi ∈ Y such that yi |=M xi;
(3) ∀ y1, y2 ∈ Y such that y1 ≠ y2, y1 |≠M y2.

Proof. Immediately follows from the definition of the minimal set of rules.

The next proposition establishes the uniqueness of the minimal set of rules.

Proposition 3.8. For any set of rules X, the minimal set of X is unique.

Proof. To prove this proposition, we define a directed graph G = (V, E) as follows. The set of nodes V consists of all the rules from X. Given two distinct nodes n1 = A → B and n2 = C → D from V, there is an edge from node n1 to node n2 in E if A → B |=M C → D. Then it is easy to see that the minimal set of rules for X consists of all the nodes of G having no incoming edges (the in-degrees of these nodes are 0). Since this set of nodes in G is unique, it follows that the minimal set of rules for X is also unique.

Given the definition of minimality introduced above, we next define the minimal set of unexpected patterns.

Definition. If B is a belief and X is the set of all unexpected patterns with respect to B, the minimal set of unexpected patterns with respect to B is the minimal set of X.

Example. For the belief diaper → beer, let the set of all unexpected patterns be {diaper and weekday → not_beer, diaper and unemployed → not_beer, diaper and weekday and unemployed → not_beer, weekday → not_beer, unemployed → not_beer, weekday and unemployed → not_beer}. The minimal set of unexpected patterns in this case is {weekday → not_beer, unemployed → not_beer}.
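A small sketch of the corresponding computation follows: it keeps exactly those rules that no other rule in the set monotonically implies, i.e., the in-degree-0 nodes of the graph used in the proof of Proposition 3.8. It assumes the implies_m helper (and its dependency) from the previous sketch and encodes the example above with unordered conditions only.

# Illustrative sketch: the minimal set of a set of rules keeps the rules that cannot be
# inferred via |=M from any other rule in the set. Assumes implies_m() from the previous
# sketch is in scope.

def minimal_set(rules):
    return [x for x in rules
            if not any(implies_m(y, x) for y in rules if y is not x)]

not_beer = ("eq", "beer", "no")
rules = [
    ([("eq", "item", "diaper"), ("eq", "day", "weekday")], not_beer),
    ([("eq", "item", "diaper"), ("eq", "status", "unemployed")], not_beer),
    ([("eq", "item", "diaper"), ("eq", "day", "weekday"), ("eq", "status", "unemployed")], not_beer),
    ([("eq", "day", "weekday")], not_beer),
    ([("eq", "status", "unemployed")], not_beer),
    ([("eq", "day", "weekday"), ("eq", "status", "unemployed")], not_beer),
]
print(minimal_set(rules))
# keeps only "weekday -> not_beer" and "unemployed -> not_beer", as in the example above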
In this section we discussed minimality of a set of patterns and presented a definition for the minimal set of unexpected patterns. In the next section we present algorithms for discovering the minimal set of unexpected patterns.

4. Discovering the Minimal Set of Unexpected Patterns

In this section we present three algorithms for discovering the minimal set of unexpected patterns. We first present, in Section 4.1, a naive algorithm that discovers such a set of patterns. In Section 4.3 we present an efficient algorithm for discovering the minimal set of unexpected patterns. However, before presenting it, we describe in Section 4.2 a semi-naive algorithm that discovers only the minimal set of unexpected refinements. We describe this semi-naive algorithm because in many applications we are interested only in refinements (hence the algorithm is important in its own right) and also because the semi-naive algorithm illustrates some important points used in the algorithm presented in Section 4.3.

The inputs to all the algorithms presented in this section are the same; they consist of:
1. A set of beliefs, B.
2. The dataset D.
3. Minimum support and confidence values minsup and minconf.
4. Minimum and maximum width for all ordered attributes.

Regarding input (4), in the case of ordered attributes the width of any condition of the form value1 ≤ attribute ≤ value2 is defined to be value2 - value1. We take as user inputs the minimum and maximum width for all ordered attributes. This is necessary and useful for the following reason. Assume that age is defined to be an ordered attribute and takes values ranging from 1 to 100 in the dataset. Clearly, at the extreme, a rule involving a condition of the form 1 ≤ age ≤ 100 is not useful, since this condition will hold for every record in the dataset. Extending this argument, large ranges of age may hold for most records in the dataset; hence we allow the user to specify the maximum width for age that the user is interested in considering. Similarly, the user may not be interested in too small a range for ordered attributes, and we allow the user to specify a minimum width for the attribute. Note that this is not a restrictive assumption in any way, since the defaults can be the smallest and largest possible widths respectively for these two parameters. However, the specification of this condition improves the efficiency of the algorithm. We now present the first algorithm, FilterMinZoomUR.

4.1 Algorithm FilterMinZoomUR

FilterMinZoomUR is a post-processing method that discovers the minimal set of unexpected patterns by filtering patterns from the set of all unexpected patterns. This algorithm operates in two phases. In the first phase, algorithm ZoomUR (outlined in Section 2) is applied to a set of beliefs to discover all unexpected patterns. In the second phase, each unexpected pattern in this set is compared to the rest of the unexpected patterns and is dropped from consideration if there exists another pattern in that set from which it can be inferred under the monotonicity assumption. This strategy is equivalent to applying a minimal filter to the set of patterns discovered by ZoomUR to select the minimal set of the unexpected patterns. Algorithm FilterMinZoomUR is presented in Figure 4.1. For each belief B, step 2 applies ZoomUR to generate the set of all unexpected patterns, Unexp(B). For each unexpected pattern X in Unexp(B), steps 4 through 9 add X to the set of minimal patterns if it cannot be inferred under monotonicity from any other unexpected pattern. FilterMinZoomUR is a naïve algorithm in two senses: (1) it is a post-processing method that relies on ZoomUR to provide the set of all unexpected patterns, and (2) each unexpected pattern is compared to the rest of the patterns in the process of selecting the minimal set of unexpected patterns.

Inputs: Beliefs Bel_Set, Dataset D, minwidth and maxwidth for all ordered attributes,
        minimum support min_sup and minimum confidence min_conf
Outputs: For each belief B, MinUnexp(B)

1  forall beliefs B ∈ Bel_Set {
2    Unexp(B) = ZoomUR(Inputs)
3    MinUnexp(B) = {}
4    forall x ∈ Unexp(B) {
5      Other_unexp = Unexp(B) - x
6      if not(∃ y ∈ Other_unexp such that y |=M x) {
7        MinUnexp(B) = MinUnexp(B) ∪ {x}
8      }
9    }
10 }

Figure 4.1 Algorithm FilterMinZoomUR

(Note that the inputs to the algorithms presented in this section are the same as the inputs to ZoomUR [PT98] described in Section 2.)

Below we state and prove the completeness of FilterMinZoomUR.

Theorem 4.1. For any belief B, FilterMinZoomUR discovers the minimal set of unexpected rules.

Proof. To prove that MinUnexp(B) is the minimal set of Unexp(B), we will show that all three conditions listed in the characterization of the minimal set presented in Proposition 3.7 hold. The first condition is trivially satisfied.
To satisfy the second condition, it needs to be shown that for any X ∈ Unexp(B) there exists Y ∈ MinUnexp(B) such that Y |=M X. For any X ∈ Unexp(B), consider the two cases that arise from Steps 4 through 9 of FilterMinZoomUR:
1. X is added to the set of minimal patterns. Due to the reflexivity of |=M, it is easily seen that there exists Y = X ∈ MinUnexp(B) such that Y |=M X.
2. X is not added to the set of minimal patterns since there exists a pattern P in Unexp(B) - X such that P |=M X. In this case, observing that |=M is transitive and Unexp(B) is finite, iterative application of Steps 4 through 9 adds some Q to MinUnexp(B) such that Q |=M P. Since P |=M X, it follows that Q |=M X. Hence there exists Y ∈ MinUnexp(B) such that Y |=M X.
The third condition of the definition holds trivially, since steps 6 and 7 of FilterMinZoomUR add a pattern X to MinUnexp(B) only if there exists no other pattern Y in Unexp(B) such that Y |=M X.

One of the main sources of inefficiency of this algorithm is that we first have to generate Unexp(B) and then filter the non-minimal patterns out of it. In the next two sections we present more efficient algorithms that avoid generating many unexpected patterns that are known to be non-minimal.

4.2 Algorithm MinZoominUR

In this section we present MinZoominUR, an algorithm for discovering the minimal set of unexpected refinements to a set of beliefs. Consider the belief body → head, having the structure specified in Section 2. We use the term CONTR(head) to refer to the set of atomic conditions that contradict the atomic condition specified by head. Assume that v1, v2, ..., vk is the set of unique values (sorted in ascending order if the attribute a appearing in head is ordered) that a takes on in D. CONTR(head) is generated as follows:
(1) If the head of the belief is of the form value1 ≤ attribute ≤ value2 (attribute is ordered), then the condition value3 ≤ attribute ≤ value4 belongs to CONTR(head) if the ranges [value1, value2] and [value3, value4] do not overlap.
(2) If the head of the belief is of the form attribute = val (attribute is unordered), then the condition attribute = vp belongs to CONTR(head) if vp ∈ {v1, v2, ..., vk} and vp ≠ val.

Algorithm MinZoominUR is based on the Apriori algorithm [AMS+95], with several major differences. First, the generation of large itemsets starts with a set of beliefs that seed the search. Second, MinZoominUR does not generate those itemsets that are guaranteed to produce non-minimal rules. Third, the rule generation process is integrated into the itemset generation part of the algorithm; this integration is immaterial for Apriori but results in significant efficiency improvements for MinZoominUR.

Before presenting MinZoominUR, we first present a broad overview of the algorithm. Each iteration of MinZoominUR generates itemsets in the following manner. In the k-th iteration we generate itemsets of the form {C, body, P}, where C ∈ CONTR(head) and P is a conjunction of k atomic conditions. Observe that to determine the confidence of the rule body, P → C, the supports of both the itemsets {C, body, P} and {body, P} will have to be determined. Hence, in the k-th iteration of generating large itemsets, two sets of candidate itemsets are considered for support determination:
(1) The set Ck of candidate itemsets. Each itemset in Ck (e.g., {C, body, P}) contains (i) a condition that contradicts the head of the belief (i.e., any C ∈ CONTR(head)), (ii) the body {body} of the belief, and (iii) k other atomic conditions (P is a conjunction of k atomic conditions).
(2) A set Ck' of additional candidates. Each itemset in Ck' (e.g., {body, P}) is generated from an itemset in Ck by dropping the condition C that contradicts the head of the belief.

In each iteration, minimal unexpected rules are generated from the set of large itemsets. If an itemset generates an unexpected rule, it is deleted from consideration, and therefore no superset of this itemset is even considered in subsequent iterations. As we prove in Theorem 4.2, this step avoids the generation of itemsets producing non-minimal rules and significantly improves the efficiency of the algorithm. We now explain the steps of MinZoominUR, shown in Fig. 4.2. The following notations are used in describing the algorithm:
• UNORD is the set of unordered attributes.
• ORD is the set of ordered attributes.
• minwidth(a) and maxwidth(a) are the minimum and maximum widths for ordered attribute a.
• Attributes(x) is the set of all attributes present in any of the conditions in itemset x.
• Values(a) is the set of distinct values the attribute a takes in the dataset D.

First, given a belief B, the set of atomic conditions that contradict the head of the belief, CONTR(head(B)), is computed (as described previously). Then, the first candidate itemsets generated in C0 (step 3) will each contain the body of the belief and a condition from CONTR(head(B)). Hence the cardinality of the set C0 is the same as the cardinality of the set CONTR(head(B)).

Inputs: Beliefs Bel_Set, Dataset D, minwidth and maxwidth for all ordered attributes ORD,
        and thresholds min_support and min_conf
Outputs: For each belief B, MinUnexp(B)

1  forall beliefs B ∈ Bel_Set {
2    MinUnexp(B) = {}
3    C0 = { {x, body(B)} | x ∈ CONTR(head(B)) }
4    C0' = { {body(B)} }
5    k = 0
6    while (Ck != ∅) do {
7      forall c ∈ Ck ∪ Ck', compute support(c)
8      Lk = { x | x ∈ Ck, support(x) ≥ min_support }
9      Lk' = { x | x ∈ Ck', support(x) ≥ min_support }
10     forall (x ∈ Lk) {
11       Let a = x ∩ CONTR(head(B))   /* this intersection is a single element */
12       rule_conf = support(x) / support(x - a)
13       if (rule_conf > min_conf) {
14         MinUnexp(B) = MinUnexp(B) ∪ { x - a → a }
15         Lk = Lk - x
16       }
17     }
18     k++
19     Ck = generate_new_candidates(Lk-1, B)
20     Ck' = generate_bodies(Ck, B)
21   }
22   forall x ∈ MinUnexp(B) {
23     Other_unexp = MinUnexp(B) - x
24     if (∃ y ∈ Other_unexp such that y |=M x) {
25       MinUnexp(B) = MinUnexp(B) - {x}
26     }
27   }
28 }

Figure 4.2 Algorithm MinZoominUR

To illustrate this, consider an example involving only binary attributes. For the belief x=0 → y=0, the set CONTR({y=0}) consists of the single condition {y=1}. The initial candidate sets, therefore, are C0 = {{y=1, x=0}} and C0' = {{x=0}}. Steps (6) through (20) in Fig. 4.2 are iterative: Steps 7 through 9 determine the supports in dataset D for all the candidate itemsets currently being considered and select the large itemsets Lk and Lk'. Each itemset in Lk contains the body and the head of a potentially unexpected rule, while each itemset in Lk' contains only the body of the potentially unexpected rule. Steps 10 through 17 generate unexpected rules, and large itemsets that contribute to unexpected rules are deleted in Step 15. Specifically, for each large itemset in Lk, if the unexpected refinement rule that is generated from the itemset has sufficient confidence, then two actions are performed:
1. Step 14 adds this rule to the set of potentially minimal unexpected refinements.
2. Step 15 deletes the corresponding itemset from Lk, since any itemset that is a superset of this itemset can only generate unexpected refinements that can be monotonically inferred from the new rule generated in step 14. Theorem 4.2 below provides a detailed proof.

In step (19), the function generate_new_candidates(Lk-1, B) generates the set Ck of new candidate itemsets to be considered in the next pass from the previously determined set of large itemsets, Lk-1, with respect to the belief B ("x → y"), in the following manner:

(A) Initial condition (k=1): In the example (involving binary attributes) considered above, assume that L0 = {{x=0, y=1}, {x=0}}, i.e., both the initial candidates had adequate support. Further assume that p is the only other attribute (also binary) in the domain. The next set of candidate itemsets to be considered would be C1 = {{x=0, y=1, p=0}, {x=0, y=1, p=1}} and C1' = {{x=0, p=0}, {x=0, p=1}}. In general, we generate C1 from L0 by adding additional conditions of the form attribute = value for unordered attributes, or of the form value1 ≤ attribute ≤ value2 for ordered attributes, to each of the itemsets in L0. More specifically, for a belief B, the set C1 is computed using the following rules. If itemset x ∈ L0 and x contains a condition that contradicts the head of the belief:
1. The itemset x ∪ {{a = val}} ∈ C1 if a ∈ UNORD (the set of unordered attributes), val ∈ Values(a) and a ∉ Attributes(x).
2. The itemset x ∪ {{value1 ≤ a ≤ value2}} ∈ C1 if a ∉ Attributes(head(B)), a ∈ ORD (the set of ordered attributes), value1 ∈ Values(a), value2 ∈ Values(a), value1 ≤ value2, and the resulting width for the attribute a satisfies the minimum and maximum width restrictions for that attribute.

This process is efficient and complete for the following reasons.
1. The attributes are assumed to have a finite number of unique discrete values in the dataset D. Only conditions involving these discrete values are considered.
2. For unordered attributes, no condition involving an attribute already present in the itemset is added. This ensures that itemsets that are guaranteed to have zero support are never considered. For example, this condition ensures that for the belief month=9 → sales=low, the itemset {{month = 3}} is not added to the itemset {{sales = high}, {month = 9}}.
3. For ordered attributes, however, an itemset such as {{3 ≤ a ≤ 6}} can be added to {{b=1}, {5 ≤ a ≤ 8}} to result in {{b=1}, {5 ≤ a ≤ 6}}, where the initial belief may be, for example, 5 ≤ a ≤ 8 → b=0. Without loss of generality, in this case we represent the new itemset as {{b=1}, {5 ≤ a ≤ 8}, {3 ≤ a ≤ 6}} rather than as {{b=1}, {5 ≤ a ≤ 6}}. We use this "long form" notation since (1) we assume that all itemsets in a given iteration have the same cardinality and (2) the body of the belief is explicitly present in each itemset.

(B) Incremental generation of Ck from Lk-1 when k > 1: This function is very similar to the apriori-gen function described in [AMS+95]. For example, assume that for a belief B, "x → y", c is a condition that contradicts y and that L1 = {{c, x, p}, {c, x, q}, {x, p}, {x, q}}. Similar to the apriori-gen function, the next set of candidate itemsets that contain x and c is C2 = {{x, c, p, q}}, since this is the only itemset such that all its subsets of one less cardinality that contain both x and c are in L1. In general, an itemset X is in Ck if and only if, for the belief B, X contains body(B) and a condition A such that A ∈ CONTR(head(B)), and all subsets of X with one less cardinality containing A and body(B) are in Lk-1. More specifically, Ck is generated from Lk-1 using the following rule: if a ∈ CONTR(head(B)), a ∈ {x1, x2, ..., xp} and {x1, x2, ..., xp, v}, {x1, x2, ..., xp, w} ∈ Lk-1, then {x1, x2, ..., xp, v, w} ∈ Ck if w ∉ Attributes({x1, x2, ..., xp, v}).
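The following simplified Python sketch illustrates this join step; it is restricted to unordered attributes, it omits the prune step that checks all subsets of one less cardinality, and it uses the same hypothetical condition encoding as the earlier sketches.

# Illustrative sketch of the join step of generate_new_candidates for k > 1, restricted
# to unordered attributes (conditions encoded as ("eq", attr, value) tuples).

from itertools import combinations

def generate_new_candidates(L_prev, body, contr_conditions):
    candidates, seen = [], set()
    for s, t in combinations(L_prev, 2):
        common = [c for c in s if c in t]
        extra_s = [c for c in s if c not in common]
        extra_t = [c for c in t if c not in common]
        if len(extra_s) != 1 or len(extra_t) != 1:
            continue                                    # itemsets must differ in exactly one condition each
        if not set(body) <= set(common):
            continue                                    # the shared part must contain body(B)
        if not any(c in contr_conditions for c in common):
            continue                                    # ... and a condition from CONTR(head(B))
        v, w = extra_s[0], extra_t[0]
        if v[1] != w[1]:                                # the two new conditions involve different attributes
            key = frozenset(common + [v, w])
            if key not in seen:
                seen.add(key)
                candidates.append(common + [v, w])
    return candidates

# For the belief x=1 -> y=0, with c = ("eq", "y", "1") contradicting the head and
# L1 = [{c, x, p}, {c, x, q}, {x, p}, {x, q}], the single candidate {c, x, p, q} is returned:
x, c = ("eq", "x", "1"), ("eq", "y", "1")
p, q = ("eq", "p", "0"), ("eq", "q", "0")
print(generate_new_candidates([[c, x, p], [c, x, q], [x, p], [x, q]],
                              body=[x], contr_conditions=[c]))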
This rule for generating Ck essentially limits itemsets to a single condition for each "new" attribute not present in the belief B. This, however, does not eliminate any relevant large itemset from being generated, as the following example shows. Consider the case where, starting from the belief x=1 → y=0, the set of large itemsets L1 (generated by MinZoominUR at the end of the first iteration) contains {y=1, x=1, 3 ≤ a ≤ 6} and {y=1, x=1, 5 ≤ a ≤ 7}. Combining these itemsets yields the equivalent itemset {y=1, x=1, 5 ≤ a ≤ 6} of the same cardinality as any itemset in L1, and if this itemset is large, it would already be present in L1.

In step (20), as described previously, we also need the supports of the additional candidate itemsets in Ck' to determine the confidence of the unexpected rules that will be generated. The function generate_bodies(Ck, B) generates Ck' by considering each itemset in Ck, dropping the condition that contradicts the head of the belief, and adding the resulting itemset to Ck'.

Steps (22 – 27) are needed to detect any remaining non-minimal rules that arise due to the following special case involving itemsets containing ordered attributes. To illustrate this special case, consider the following two itemsets: {{a=1}, {5 ≤ b ≤ 10}} and {{a=1}, {7 ≤ b ≤ 8}}. The special case is that neither of these sets is a superset of the other, yet (5 ≤ b ≤ 10 → a=1) |=M (7 ≤ b ≤ 8 → a=1), since (7 ≤ b ≤ 8) |= (5 ≤ b ≤ 10). Therefore, the rule 7 ≤ b ≤ 8 → a=1 should be eliminated in order to produce the minimal set of unexpected rules. Since Steps (6 – 21) of the algorithm do not eliminate such rules, the additional Steps (22 – 27) do so. Observe that in the case of only unordered attributes in the itemsets, Steps (22 – 27) of the algorithm are not needed, since MinUnexp(B) after Step 21 is guaranteed to be minimal (see the proof of Theorem 4.2). Moreover, notice that the elimination of remaining non-minimal rules can also be done in two other parts of the algorithm instead of Steps (22 – 27). First, it can be done in the generate_new_candidates procedure (Step 19) by avoiding the generation of itemsets that are generalized refinements of itemsets that have previously produced unexpected rules. Second, the elimination can also be done as new rules are added to the set MinUnexp(B) in Step 14, by comparing the new rule with all the other rules in MinUnexp(B). We analyzed both of these possibilities and found that the approach taken in Steps (22 – 27) of Figure 4.2 is at least as efficient as these two alternatives. Moreover, another advantage of doing the elimination in Steps (22 – 27) is that it can be dropped altogether in the case of only unordered attributes (as explained above).

The computational complexity of Steps (1 – 21) is determined by the total number of candidate itemsets K generated in Steps (19 – 20), taken over all the iterations of the while loop. The computational complexity of the elimination procedure in Steps (22 – 27) is O(n²), where n is the size of the set MinUnexp(B). In practice K >> n².
Therefore, the bottleneck of the MinZoominUR algorithm lies in Steps (6 – 21). Moreover, the complexity of MinZoominUR in the worst case is comparable to the worst-case complexity of Apriori, which is bounded by O(||C|| * ||D||), where ||C|| denotes the sum of the sizes of the candidates considered and ||D|| denotes the size of the database [AMS+95]. However, in the average case, the computational complexity of MinZoominUR is significantly lower than that of Apriori. This is the case because the average number of candidates considered in MinZoominUR is significantly lower than that for Apriori, due to (a) the minimality-based elimination procedure and (b) the presence of the initial set of beliefs that seed the search process. In Section 5 we experimentally compare the main algorithm, MinZoomUR (presented in the next section), with Apriori, not in terms of itemsets but directly in terms of the number of rules generated.

A key strength of MinZoominUR, compared to ZoomUR [PT99] and Apriori [AMS+95], is that rule discovery is integrated into the itemset generation procedure, and hence it can greatly reduce the number of itemsets generated in subsequent iterations; as [AMS+95] show, itemsets can grow exponentially in association rule discovery algorithms. The reason this is possible is the objective of MinZoominUR: to generate only the minimal set of unexpected patterns (as opposed to generating all patterns or all unexpected patterns). Below we prove the completeness of MinZoominUR.

Theorem 4.2. For any belief B, MinZoominUR discovers the minimal set of unexpected rules that are refinements to the belief.

Sketch of the Proof. We will first show that, for the case where there are unordered attributes only, MinZoominUR generates the minimal set of unexpected patterns without needing to apply the minimal filter (Steps 22 through 27 of Figure 4.2). For unordered attributes only, it is easy to see that a rule X1=x1, X2=x2, …, Xn=xn → Y=y1 is minimal if and only if there is no rule of the form Z → Y=y1, where Z ⊂ {X1=x1, X2=x2, …, Xn=xn}. (Note that this "syntactic" subset property does not hold when dealing with ordered attributes, which is why the minimal filter in Steps 22-27 is necessary.) For unordered attributes only, consider the belief A=a1 → B=b1 and any minimal unexpected rule A=a1, X → B=b2 which is a refinement that holds, i.e., its support and confidence values are greater than the specified threshold values, where the itemset X is a conjunction of atomic conditions involving unordered attributes. Below we will show that A=a1, X → B=b2 will be discovered by MinZoominUR. Since A=a1, X → B=b2 holds, the itemset {A=a1, X, B=b2}, and all its subsets, have adequate support. Further, since the rule is assumed to hold, it is clear that Steps 13 and 14 generate the rule A=a1, X → B=b2 if the itemset {A=a1, X, B=b2} is generated. Hence it needs to be shown that the itemset {A=a1, X, B=b2} will be generated. Before the iterations of MinZoominUR, Step 3 generates {A=a1, B=b2} in the initial set of candidates. Based on Apriori and the completeness proof of ZoomUR [P99], it is clear that the itemset {A=a1, X, B=b2} will be generated unless Step 15 deleted a "parent" of this itemset (parent is defined formally in Section 4.3.1). We will next show that this is impossible, i.e., Step 15 could not have deleted a parent of this itemset in any previous iteration. Consider any parent {A=a1, Y, B=b2} of the itemset {A=a1, X, B=b2}, defined such that Y ⊂ X. Assume that the itemset {A=a1, Y, B=b2} was deleted in Step 15 in some iteration. Hence it has to be the case (see Steps 13-16 of Figure 4.2) that A=a1, Y → B=b2 holds. However, if Y ⊂ X and A=a1, Y → B=b2 holds, then the rule A=a1, X → B=b2 cannot be minimal.
This is a contradiction, and hence no itemset of the form {A=a1, Y, B=b2} will be deleted in previous iterations. Hence the itemset {A=a1, X, B=b2} will be generated, and MinZoominUR will generate any minimal unexpected rule A=a1, X → B=b2 which is a refinement to the belief. Given the observation that, for unordered attributes only, a rule X1=x1, X2=x2, …, Xn=xn → Y=y1 is minimal if and only if there is no rule of the form Z → Y=y1, where Z ⊂ {X1=x1, X2=x2, …, Xn=xn}, it is easy to see that MinZoominUR does not generate any non-minimal rule. Hence, for the case where there are unordered attributes only, MinZoominUR generates the minimal set of unexpected patterns without needing to apply the minimal filter. For the case of ordered attributes, it can easily be seen that MinZoominUR automatically excludes only non-minimal rules. However, there is a special case involving ordered attributes for which minimality cannot be guaranteed before Steps 22-27. This special case arises since a syntactic subset check cannot capture containment when dealing with ranges of values for ordered attributes. An example of this special case was given above in Section 4.2. Hence the filter in Steps 22-27 removes any remaining non-minimal rules, and it is clear that MinZoominUR generates only the minimal set of unexpected refinements to a belief.

As mentioned earlier, MinZoominUR only discovers the minimal set of unexpected refinements to a belief. We next present MinZoomUR, an algorithm that discovers for each belief the minimal set of unexpected rules.

4.3 Algorithm MinZoomUR

In this section we present MinZoomUR, an algorithm that discovers, for each belief, the minimal set of unexpected rules. Before describing the algorithm we present some preliminaries.

4.3.1 Preliminaries

For a belief B, let x be any large itemset containing body(B) and one condition from CONTR(head(B)). We use the term parents(x) to denote the set of all subsets of x that contain the body of the belief and one condition that contradicts the head of the belief, and that were considered in previous iterations during the candidate generation phase of the algorithm. (Recall that the candidate generation phase of these algorithms (Apriori, MinZoominUR and MinZoomUR) is iterative, such that itemsets in subsequent iterations have greater cardinality, i.e., contain more items.) Specifically,

parents(x) = { a | a ⊂ x, body(B) ⊂ a, ∃ c such that c ∈ CONTR(head(B)) and c ∈ a }.

An itemset y is said to be a parent of x if y ∈ parents(x). We use the term zoomin rules to denote unexpected rules that are refinements of beliefs, and the term zoomout rules for unexpected rules that are more general. The large itemset x is said to generate a zoomin rule if confidence(x - c → c) > min_conf, where c ∈ CONTR(head(B)). The large itemset x is said to generate a zoomout rule if x generates a zoomin rule x - c → c and confidence(x - c - d → c) > min_conf, where c ∈ CONTR(head(B)), d ⊆ body(B) and d is not empty.

Associated with each itemset x are two attributes: x.rule, which keeps track of whether a zoomin rule is generated from x, and x.dropped_subsets, which keeps track of the subsets of body(B) that are dropped during the discovery of zoomout rules. These two attributes are further explained below.
We define x.rule as follows: x.rule = 1 if x generates a zoomin rule, and x.rule = 0 otherwise. Further, we define parentrules(x) to be TRUE if and only if x has a parent y such that y.rule = 1. If x generates a zoomin rule, zoomout rules are generated in ZoomoutUR [PT98, P99] by dropping one or more attributes belonging to body(B) from the generated zoomin rule. This process essentially drops nonempty subsets of body(B) from the generated zoomin rule to determine the zoomout rules that are unexpected. The set x.dropped_subsets is the set of subsets of body(B) that are dropped from itemset x to generate a given set of zoomout rules. Specifically, if P is a set of zoomout rules generated from x, then x.dropped_subsets is defined as follows: x.dropped_subsets(P) = { d | {x - c - d → c} ∈ P, where c ∈ CONTR(head(B)), d ⊆ body(B), d ≠ ∅ }. Given these preliminaries, we now present MinZoomUR.

4.3.2 Overview of the Discovery Strategy

Unlike what was done in MinZoominUR, an itemset that generates a zoomin rule in MinZoomUR cannot always be deleted from subsequent consideration, since it is possible for minimal zoomout rules to be derived from non-minimal zoomin rules. Consider the following example. For a belief a, b → x, let a, b, c → y and a, b, c, d → y be two zoomin rules. Though a, b, c, d → y is a non-minimal zoomin rule, it may result in a zoomout rule such as b, c, d → y which may belong to the minimal set of unexpected rules. Extending this example one more step, we observe that the zoomout rule b, c, d → y can, however, be guaranteed to be non-minimal if the first zoomin rule a, b, c → y resulted in a zoomout rule of the form p, c → y such that b, c, d |= p, c, where p is a proper subset of the body of the belief. Examples of such p are {b} and {}, corresponding to the zoomout rules b, c → y and c → y respectively (generated from a, b, c → y). However, if the first zoomin rule generated only the zoomout rule a, c → y, it may still be possible for the zoomout rule b, c, d → y to be minimal, since b, c, d |≠ a, c.

The discovery strategy of MinZoomUR is based on the following conditions under which some generated rules are guaranteed to be non-minimal and hence can be excluded from the minimal set. Theorem 4.3 below proves that these conditions do indeed exclude only non-minimal rules; hence we state these rules with only limited explanation here. The "exclusion rules" used in MinZoomUR are:
1. If x and y are two large itemsets such that x is a parent of y and x.rule = 1, then the zoomin rule generated from y cannot be minimal. Hence, for any itemset y such that parentrules(y) is TRUE, the zoomin rule generated from y cannot be minimal. This is the only exclusion rule used previously in MinZoominUR.
2. If x is a large itemset that generates a zoomin rule and some zoomout rules, then the zoomin rule generated cannot be minimal.
3. If x is a large itemset that generates zoomout rules p and q, and elem_p ∈ x.dropped_subsets(p), elem_q ∈ x.dropped_subsets(q) and elem_p ⊂ elem_q, then p cannot be minimal. (For a belief B and itemset x, x.dropped_subsets(p), where p is a single zoomout rule, contains only one element, namely the subset of body(B) that was dropped to create the zoomout rule.) For example, for a belief a, b → z, let itemset x = {a, b, c, y} be large, where y ∈ CONTR(z). If x generates the zoomout rules p ("b, c → y") and q ("c → y"), then elem_p is {a} and elem_q is {a, b}. Since elem_p ⊂ elem_q, the rule b, c → y cannot belong to the minimal set of unexpected rules, since it can be inferred from c → y using the monotonicity assumption.
4. If x and y are two large itemsets such that x is a parent of y, then zoomout rules generated from y by dropping any subset p from the body of the belief such that p is a subset of some element belonging to x.dropped_subsets cannot be minimal rules. For example, for a belief a, b, c → z, let itemset x = {a, b, c, d, m} be large, where m ∈ CONTR(z). Let x generate the zoomout rule c, d → m. Hence {a, b} ∈ x.dropped_subsets. This exclusion rule states that, from any large itemset y generated in subsequent iterations from x, the zoomout rules derived from y by dropping either {a} or {b} cannot be minimal. For example, assume the large itemset y = {a, b, c, d, e, m}. The zoomout rules generated from y by dropping {a} or {b} are, respectively, b, c, d, e → m and a, c, d, e → m. Observe that neither of these two zoomout rules can be minimal, since both can be derived under the monotonicity assumption from the prior zoomout rule c, d → m.
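The bookkeeping behind exclusion rules #3 and #4 can be sketched as follows; representing dropped subsets as frozensets of condition labels, and the helper names, are our own illustrative assumptions, not the paper's implementation.

# Illustrative sketch of the dropped-subset checks behind exclusion rules #3 and #4.
# A dropped subset is represented as a frozenset of atomic-condition labels from body(B).

def prune_by_rule_3(dropped_subsets):
    # exclusion rule #3: a zoomout rule of an itemset whose dropped subset is strictly
    # contained in another of its dropped subsets is non-minimal and can be removed
    return [d for d in dropped_subsets
            if not any(d < other for other in dropped_subsets)]

def excluded_by_rule_4(p, parent_dropped_subsets):
    # exclusion rule #4: a zoomout rule of a child itemset obtained by dropping p is
    # non-minimal if p is contained in something already dropped at a parent
    return any(p <= q for q in parent_dropped_subsets)

# Belief a, b -> z; itemset x = {a, b, c, y} generated zoomout rules by dropping {a}
# ("b, c -> y") and {a, b} ("c -> y"):
dropped = [frozenset({"a"}), frozenset({"a", "b"})]
print(prune_by_rule_3(dropped))                                       # [frozenset({'a', 'b'})]
print(excluded_by_rule_4(frozenset({"a"}), [frozenset({"a", "b"})]))  # True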
MinZoomUR generates candidate itemsets in the same manner as MinZoominUR. A main difference between the algorithms is that MinZoomUR also considers zoomout rules for a given itemset immediately after the itemset generates a zoomin rule. This is necessary because some of the exclusion rules applied to a generated unexpected rule depend on knowing the zoomout rules generated for that itemset and its parents.

Inputs: Beliefs Bel_Set, Dataset D, minwidth and maxwidth for all ordered attributes,
        and thresholds min_support and min_conf
Outputs: For each belief B, MinUnexp(B)

1  forall beliefs B ∈ Bel_Set {
2    MinUnexp(B) = {}; k = 0
3    C0 = { {x, body(B)} | x ∈ CONTR(head(B)) }; C0' = { {body(B)} }
4    while (Ck != ∅) do {
5      forall c ∈ Ck ∪ Ck', compute support(c)
6      Lk = { x | x ∈ Ck, support(x) ≥ min_support }
7      Lk' = { x | x ∈ Ck', support(x) ≥ min_support }
8      forall (x ∈ Lk) {
9        x.rule = 0; Let a = x ∩ CONTR(head(B))
10       rule_conf = support(x) / support(x - a)
11       if (rule_conf > min_conf) {
12         x.rule = 1
13         zoutrules = minzoomoutrules(x, B)
14         x.dropped_subsets = dropped_subsets(zoutrules, B)
15         MinUnexp(B) = MinUnexp(B) ∪ zoutrules
16         if (zoutrules == ∅ and parentrules(x) == FALSE)
17           MinUnexp(B) = MinUnexp(B) ∪ { x - a → a }
18         if ({body(B)} ∈ x.dropped_subsets) Lk = Lk - x
19       }
20     }
21     k++
22     Ck = generate_new_candidates(Lk-1, B)
23     Ck' = generate_bodies(Ck, B)
24   }
25   forall x ∈ MinUnexp(B) {
26     Other_unexp = MinUnexp(B) - x
27     if (∃ y ∈ Other_unexp such that y |=M x)
28       MinUnexp(B) = MinUnexp(B) - {x}
29   }
30 }

Figure 4.3 Algorithm MinZoomUR

MinZoomUR is presented in Figure 4.3. The beginning (Steps 1 through 7) and the end (Steps 25 through 30) of the algorithm, where the minimal filter is applied, are the same as in MinZoominUR. We explain steps 8 through 24 next, where unexpected rules are generated from each large itemset and the potentially minimal ones are stored in MinUnexp(B). Steps 8 through 12 consider whether a large itemset x generates a zoomin rule and set the attribute x.rule accordingly. If a zoomin rule is generated, step 13 applies the procedure minzoomoutrules to consider potentially minimal zoomout rules generated from this itemset. This procedure first applies exclusion rule #3 to generate some potentially minimal zoomout rules. Step 14 initializes x.dropped_subsets; notice that these subsets are actually computed in the process of applying exclusion rule #3 in the previous step.
MinZoomUR generates candidate itemsets in the same manner as MinZoominUR. A main difference between the algorithms is that MinZoomUR also considers zoomout rules for a given itemset immediately after the itemset generates a zoomin rule. This is necessary because some of the exclusion rules applied to a generated unexpected rule depend on knowing the zoomout rules generated for that itemset and its parents.

Inputs: Beliefs Bel_Set, dataset D, minwidth and maxwidth for all ordered attributes, and thresholds min_support and min_conf
Outputs: For each belief B, MinUnexp(B)
1   forall beliefs B ∈ Bel_Set {
2     MinUnexp(B) = {}; k = 0
3     C0 = {{x, body(B)} | x ∈ CONTR(head(B))}; C0' = {{body(B)}}
4     while (Ck != ∅) do {
5       forall c ∈ Ck ∪ Ck', compute support(c)
6       Lk = {x | x ∈ Ck, support(x) ≥ min_support}
7       Lk' = {x | x ∈ Ck', support(x) ≥ min_support}
8       forall (x ∈ Lk) {
9         x.rule = 0; let a = x ∩ CONTR(head(B))
10        rule_conf = support(x) / support(x - a)
11        if (rule_conf > min_conf) {
12          x.rule = 1
13          zoutrules = minzoomoutrules(x, B)
14          x.dropped_subsets = dropped_subsets(zoutrules, B)
15          MinUnexp(B) = MinUnexp(B) ∪ zoutrules
16          if (zoutrules == ∅ and parentrules(x) == FALSE)
17            MinUnexp(B) = MinUnexp(B) ∪ {x - a → a}
18          if ({body(B)} ∈ x.dropped_subsets) Lk = Lk - x
19        }
20      }
21      k++
22      Ck = generate_new_candidates(Lk-1, B)
23      Ck' = generate_bodies(Ck, B)
24    }
25    forall x ∈ MinUnexp(B) {
26      Other_unexp = Unexp(B) - {x}
27      if (∃ y ∈ Other_unexp | y |=M x)
28        MinUnexp(B) = MinUnexp(B) - {x}
29    }
30  }
Figure 4.3 Algorithm MinZoomUR

MinZoomUR is presented in Figure 4.3. The beginning (Steps 1 through 7) and the end (Steps 25 through 30, where the minimal filter is applied) of the algorithm are the same as in MinZoominUR; a sketch of this filter is given at the end of this section. We explain Steps 8 through 24 next, where unexpected rules are generated from each large itemset and the potentially minimal ones are stored in MinUnexp(B). Steps 8 through 12 consider whether a large itemset x generates a zoomin rule and set the attribute x.rule accordingly. If a zoomin rule is generated, Step 13 applies the procedure minzoomoutrules to consider potentially minimal zoomout rules generated from this itemset. This procedure first applies exclusion rule #3 to generate some potentially minimal zoomout rules. Step 14 initializes x.dropped_subsets, but notice that these are actually computed in the process of applying exclusion rule #3 in the previous step. Then exclusion rule #4 is applied to this set of rules to filter out any guaranteed non-minimal rules. The resulting set of potentially minimal zoomout rules generated from this itemset is zoutrules. From this set of rules, Step 14 sets the attribute x.dropped_subsets to the set of subsets of the body of the belief dropped in any of the rules in zoutrules. Step 15 adds the rules in zoutrules to the potentially minimal set. In order to decide whether the zoomin rule generated for x should be added to the potentially minimal set, Steps 16 and 17 apply exclusion rules #1 and #2. Finally, for each large itemset x, Step 18 applies a corollary to exclusion rule #4: if the entire body of a belief is dropped to generate a zoomout rule, then no children of x can generate any minimal rule. Hence, in this event the itemset x is deleted from subsequent consideration. Below we prove the completeness of MinZoomUR.

Theorem 4.3. For any belief, MinZoomUR discovers the minimal set of unexpected patterns.

Proof. As proved in Theorem 4.1, Steps 25 through 30 generate the minimal set of unexpected patterns from MinUnexp(B). Hence, in order to prove the theorem, it only needs to be shown that the exclusion rules applied prior to Step 25 exclude only non-minimal rules. In the remainder of the proof we consider each exclusion rule and demonstrate that it excludes only non-minimal rules.

Exclusion rule #1. This is the exclusion rule used in MinZoominUR. Hence, according to Theorem 4.2, this rule excludes only rules guaranteed to be non-minimal.

Exclusion rule #2. This rule immediately follows from the definitions.

Exclusion rule #3. Let x be a large itemset that generates zoomout rule p by dropping elem_p from body(B) and zoomout rule q by dropping elem_q from body(B). Further, as in exclusion rule #3, let elem_p ⊂ elem_q. Let c ∈ x such that c ∈ CONTR(head(B)). The zoomout rule p therefore is x - elem_p - c → c and the zoomout rule q is x - elem_q - c → c. Since elem_p ⊂ elem_q, we have x - elem_p - c ⊃ x - elem_q - c. Hence x - elem_p - c |= x - elem_q - c, and as a result x - elem_q - c → c |=M x - elem_p - c → c. Thus x - elem_p - c → c is not minimal and can be excluded.

Exclusion rule #4. Let x and y be two large itemsets such that x is a parent of y, and assume that zoomout rule zy,p is generated from y by dropping a subset p from the body of the belief such that p is a subset of q, where q ∈ x.dropped_subsets. It needs to be shown that zy,p cannot be minimal. Since x is a parent of y, x = y - k, where k is some non-empty itemset, and there exists c ∈ CONTR(head(B)) such that c ∈ x and c ∈ y. Further, since p is a subset of q, assume p = q - t, where t is some itemset. The rule zy,p therefore is y - p - c → c. Since q ∈ x.dropped_subsets, there is a zoomout rule zx,q generated from x that is obtained by dropping the subset q from the body of the belief. The rule zx,q therefore is x - q - c → c. Substituting for x and q, zx,q is equivalent to (y - k) - (p + t) - c → c, which is the rule y - p - c - (k + t) → c. Since k is non-empty, y - p - c ⊃ y - p - c - (k + t) and hence y - p - c |= y - p - c - (k + t). Therefore zx,q |=M zy,p. Hence zy,p is not minimal and can be excluded.

In this section we presented methods to generate the minimal set of unexpected patterns. The strength of these methods is that they eliminate redundant unexpected patterns and therefore generate a smaller set of unexpected patterns.
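The minimal filter in Steps 25 through 30 can be summarized by the following sketch, again in Python and only illustrative: it assumes that rule bodies are sets of equality conditions, so that X' |= X reduces to the subset relation X ⊆ X'. A rule is removed from the potentially minimal set if another rule with the same head has a strictly more general body, since the former can then be inferred under the monotonicity assumption.

def minimal_filter(rules):
    """Keep only rules that are not monotonically implied (|=M) by another rule.
    Each rule is a pair (body, head), where body is a frozenset of atomic
    conditions; (body1, head) |=M (body2, head) whenever body1 is a proper
    subset of body2."""
    return [(body, head) for body, head in rules
            if not any(h == head and b < body for b, h in rules)]

For instance, if both a, b, c → y and c → y are present in the potentially minimal set, the filter removes the former because it is monotonically implied by the latter.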
In the next section we present experiments on a real-world consumer purchase dataset showing that these ideas can be used to discover orders of magnitude fewer patterns than ZoomUR [PT98] and Apriori [AMS+95] and yet find most of the truly unexpected ones.

5. Experiments

To illustrate the usefulness of our approach to discovering patterns, in this section we consider a case study application of the methods to consumer purchase data from a major market research firm. We pre-processed this data by combining the different data sets made available to us (transaction data joined with demographics) into one table containing 38 attributes and 313,409 records. These attributes pertain to the items purchased by shoppers at a store over a period of one year, together with certain characteristics of the store and demographic data about the shopper and his or her family. Demographic attributes include the age and gender of the shopper; the occupation, income and marital status of the household head; the presence of children in the family; and the size of the household. Transaction-specific attributes include the product purchased, coupon usage (whether the shopper used any coupons to obtain a lower price), the availability of store or manufacturer's coupons, and the presence of in-store advertisements for the product purchased. For simplicity in generating beliefs and in making comparisons to other techniques that generate association rules, in these experiments we restrict our consideration to rules involving discrete attributes only. An initial set of 28 beliefs was generated by domain experts after examining 300 rules generated from the data using methods described in [P99]. In this section we present results from applying MinZoomUR, ZoomUR [PT98, P99] and Apriori [AMS+95] to this dataset, starting from the initial set of beliefs where applicable. In Section 5.1 we compare these methods in terms of the number of rules generated and provide some guidelines as to when each may be applicable, and in Section 5.2 we discuss the scalability of MinZoomUR and ZoomUR with respect to the size of the database and the number of initial beliefs.

5.1 Number of Patterns Generated

In this section we compare and contrast MinZoomUR, ZoomUR and Apriori in terms of the number of patterns generated and other criteria. We also present practical implications of this comparison in terms of guidelines for when each may be preferable. For a fixed minimum confidence level of 0.6, Figures 5.1 through 5.3 show the number of patterns generated by Apriori, ZoomUR and MinZoomUR for varying minimum support thresholds. Apriori generated 50,000 to 250,000 rules even for reasonably high minimum support values. This is not surprising since the objective of Apriori is to discover all strong association rules. For reasonable values of support (5 to 10%), ZoomUR generated 50 to 5,000 unexpected patterns. MinZoomUR, on the other hand, generated only 15 to 700 unexpected patterns even for extremely low values of minimum support.

Figure 5.1. Number of rules generated by Apriori (number of rules vs. minimum support).

Figure 5.2. Number of unexpected rules generated by ZoomUR (number of rules vs. minimum support).

Figure 5.3. Number of unexpected rules generated by MinZoomUR (number of rules vs. minimum support).
Figure 5.4 compares the three methods in terms of the number of generated rules. Due to the order-of-magnitude differences in the number of generated rules, the graph plots the number of rules using a logarithmic scale on the Y axis. As we would expect, as the minimum support threshold is lowered, all the methods discover a greater number of rules. Despite this, MinZoomUR discovers orders of magnitude fewer patterns than both ZoomUR and Apriori. The graphs in Figures 5.1 through 5.3 also demonstrate that a majority of the patterns generated by ZoomUR are redundant. Observe that as the support threshold is lowered, the number of patterns generated by both ZoomUR and Apriori seems to increase more than linearly. While this is the case for MinZoomUR in some regions, MinZoomUR plateaus for lower values of support, as Figure 5.3 demonstrates. This plateau signifies that very few new minimal unexpected patterns are generated even though the number of unexpected patterns generated by ZoomUR keeps increasing in that region. This observation, coupled with the comparison of the number of rules generated, indicates that MinZoomUR is indeed effective in removing redundant patterns, which represent a large majority of the set of all unexpected patterns.

Figure 5.4. Comparison of the number of rules generated by Apriori, ZoomUR and MinZoomUR (number of rules, logarithmic scale, vs. minimum support).

Discussion. Based on these experiments, we discuss below some possible tradeoffs between these methods and provide some guidelines for their usage. The clear advantage of MinZoomUR over ZoomUR is that it generates far fewer patterns and yet retains most of the truly interesting ones. Since ZoomUR generates all unexpected patterns for a belief and MinZoomUR generates the minimal set of unexpected patterns, MinZoomUR will always generate a subset of the patterns that ZoomUR generates. As shown above, this subset can be extremely small (from 15 to a few hundred patterns for the entire set of beliefs). We would also like to note here that the classical notion of "minimality" often assumes that it is possible to reconstruct the set of all objects having a certain property from the minimal set of objects having this property. In our case also, the set of all unexpected patterns can be reconstructed from the minimal set of unexpected patterns. However, this can only be done using a generate-and-test procedure (starting from the minimal set) that requires data access again. This limitation of our approach is the result of using efficient search algorithms that directly discover the minimal set of unexpected patterns without even examining all unexpected patterns. Moreover, this limitation can also be circumvented by letting the domain expert examine the (small) set of minimal unexpected patterns, select the most interesting minimal patterns, and refine them to discover all the unexpected patterns obtained from this selected set; a sketch of one such refinement step is given below.
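As a rough illustration of this generate-and-test step, the following Python sketch refines a selected minimal rule by conjoining one additional atomic condition at a time and keeping the refinements whose confidence on the data still exceeds the threshold. The representation is simplified (transactions and rule bodies as sets of atomic conditions, and only the confidence threshold is checked); it is not the procedure used in our implementation.

def confidence(body, head, data):
    """Fraction of transactions satisfying `body` that also satisfy `head`;
    `data` is a list of transactions, each a set of atomic conditions."""
    covered = [t for t in data if body <= t]
    return sum(1 for t in covered if head in t) / len(covered) if covered else 0.0

def refine_rule(body, head, candidate_conditions, data, min_conf=0.6):
    """One refinement step: extend the body of a minimal unexpected rule with one
    more atomic condition and keep the refinements that still hold on the data."""
    refinements = []
    for cond in candidate_conditions:
        if cond in body or cond == head:
            continue
        new_body = frozenset(body | {cond})
        if confidence(new_body, head, data) > min_conf:
            refinements.append((new_body, head))
    return refinements

Applying such refinement steps repeatedly to the selected minimal patterns regenerates, at the cost of additional passes over the data, the unexpected patterns that were pruned as redundant.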
The drawback of MinZoomUR compared to ZoomUR is that MinZoomUR makes an implicit assumption that minimal unexpected patterns are the most interesting patterns. From a subjective point of view this may not necessarily be true. Consider the following example of two unexpected patterns:

• When coupons are available for cereals, they don't get used (confidence = 60%).

• On weekends, when coupons are available for cereals, they don't get used (confidence = 98%).

MinZoomUR will not generate the second unexpected pattern since it is monotonically implied by the first pattern. However, the second unexpected pattern has a much higher confidence and may be considered "more unexpected" by some users, in the spirit of [SA96, BAG99]. In a more general sense, the criteria implied by monotonicity and confidence are just two methods of ranking unexpected patterns. In general there may be other criteria, some of which may even depend on other subjective preferences of the user. Hence, since ZoomUR generates all the unexpected patterns, its output is guaranteed to contain all the patterns that are "most unexpected" under any specific definition of that term. In subsequent work, we will study the issue of generating the "most unexpected patterns" by characterizing the degree of unexpectedness of patterns along the lines of [ST96]. In the context of objective measures of interestingness, [BA99] discuss approaches to finding the "most interesting" patterns.

Given the relative advantages of the two methods for discovering unexpected patterns, a practical implication of the above is that ZoomUR can be used to generate unexpected patterns for larger support values, and MinZoomUR can be used if patterns of very low support need to be generated. As shown in Figure 5.3, MinZoomUR generates a reasonable number of unexpected patterns even for extremely small values of minimum support, as low as 0.5%. Also, the support of some beliefs about a domain may be very low, perhaps reflecting some condition that occurs rarely. In such cases, methods such as MinZoomUR that can find patterns at very low support values are necessary. Apriori, on the other hand, has the drawback of generating a very large number of patterns since its objective is to discover all strong rules. As Figure 5.1 shows, for very low support values this could result in millions of rules. However, there are two sides to this coin. Generating a very large number of patterns creates a second-order data mining problem and should hence be avoided. At the same time, the domain knowledge that ZoomUR and MinZoomUR start with will almost always be incomplete for most business applications. Hence it is possible that either of the two methods that seek unexpected patterns could miss other interesting patterns that are unrelated to the captured domain knowledge. However, the set of patterns generated by Apriori is guaranteed to contain all the interesting patterns since it contains all the strong patterns. We believe that this tradeoff is in some sense unavoidable, since the problem of generating all interesting patterns (not just "unexpected" ones) is a difficult problem to solve.

5.2 Scalability Issues

In this section we experimentally examine the scalability of ZoomUR and MinZoomUR with respect to the size of the database and the number of initial beliefs.

Scalability with the size of the database. For a sample of 10 beliefs, we ran ZoomUR and MinZoomUR multiple times, varying the number of records in the dataset from 40,000 to 200,000. Figures 5.5 and 5.6 show the execution times for ZoomUR and MinZoomUR respectively. The experiments indicate that the methods are scalable in the range considered.
Figures 5.5 and 5.6 indicate that both ZoomUR and MinZoomUR seem to scale linearly with the size of the database. This is not surprising since these algorithms are based on Apriori, which, as shown in [AMS+95], scales linearly.

Figure 5.5. Execution time of ZoomUR as a function of database size (time in minutes vs. number of records).

Figure 5.6. Execution time of MinZoomUR as a function of database size (time in minutes vs. number of records).

Scalability as a function of the number of initial beliefs. Since ZoomUR and MinZoomUR generate unexpected patterns for each belief individually, these methods scale linearly with the number of beliefs. However, the time taken to generate unexpected patterns for each belief depends on the specific belief in consideration for many reasons, such as the support and confidence values of the belief on the data and the number of unexpected patterns that actually exist. To examine this in greater detail, we measured the percentage of execution time that was used to generate unexpected patterns for each belief. Figures 5.7 and 5.8 illustrate the percentage of time taken for each of the beliefs; each sector represents the time taken to generate unexpected patterns for a specific belief.

Figure 5.7. Proportion of execution time of ZoomUR for each belief.

Figure 5.8. Proportion of execution time of MinZoomUR for each belief.

Each of the pie charts in Figures 5.7 and 5.8 has slices corresponding to each belief. The distribution of time is similar in both cases, but the disproportionate share of some beliefs was interesting, and we discuss here a reason why this would be expected to occur. Assume that a belief A → B has support s and confidence c in the data. The upper bound on the support of any unexpected rule generated by ZoominUR is the fraction of the records where A is true and B is not true, that is, support(A, ¬B) = support(A) - support(A, B) = s/c - s = (1-c)*s/c (a small numeric illustration of this bound is given at the end of this section). For any given minimum support value, for beliefs where this fraction is large, intuitively many more itemsets can be generated and hence the execution time would be greater. When we ranked the beliefs according to this criterion, the top six beliefs were the same six beliefs that contributed most of the execution time. Hence the distribution shown in Figures 5.7 and 5.8 is simply indicative of the fact that the initial beliefs may vary in their support and confidence values on the data.

In this section we presented results pertaining to the effectiveness of MinZoomUR and compared it to Apriori and ZoomUR. In this real-world case study we demonstrated that MinZoomUR can be used to discover far fewer patterns than Apriori and ZoomUR while finding most of the truly interesting patterns.
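As a small numeric illustration of the bound just discussed, the following Python fragment computes the maximum possible support of an unexpected rule for a belief with a given support and confidence; the numbers used are purely illustrative and do not correspond to any specific belief from our experiments.

def unexpected_support_bound(s, c):
    """Upper bound on the support of any unexpected rule for a belief A -> B with
    support s and confidence c: support(A, not B) = support(A) - support(A, B)
    = s/c - s = (1 - c) * s / c."""
    return (1.0 - c) * s / c

# A belief with 6% support and 60% confidence leaves at most 4% of the
# transactions in which rules contradicting it can hold.
print(unexpected_support_bound(0.06, 0.60))  # approximately 0.04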
6. Conclusions

In this paper we presented a detailed formal characterization of the minimal set of unexpected patterns and proposed three algorithms for discovering the minimal set of such patterns. In a real-world application we demonstrated that the main discovery algorithm, MinZoomUR, discovers orders of magnitude fewer patterns than other comparable methods yet retains most of the truly unexpected patterns. We also discussed tradeoffs between the various discovery methods and presented some guidelines for their usage. The power of this approach lies in combining two independent concepts of unexpectedness and minimality of a set of patterns into one integrated concept that provides for the discovery of small but important sets of interesting patterns. Moreover, MinZoominUR and MinZoomUR are efficient since we focus directly on discovering minimal unexpected patterns rather than adopting any of the post-processing approaches, such as filtering. Finally, the approach is effective in supporting decision making in real business applications, where unexpected patterns with respect to managerial intuition can be of great value.

One of the key assumptions in our approach to unexpectedness is the construction of a comprehensive set of beliefs, since few truly interesting patterns can be discovered for a poorly specified set of beliefs. In [P99] we describe a method of generating a comprehensive set of beliefs. We also utilized this method to generate the initial set of beliefs used in the experiments described in Section 5. As future research, we plan to study how different sets of initial beliefs affect the minimal sets of unexpected patterns generated from these beliefs. We also plan to study how unexpected patterns can be generated from multiple beliefs.

References

[AIS93] Agrawal, R., Imielinski, T. and Swami, A., 1993. Mining Association Rules Between Sets of Items in Large Databases. In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 207-216.

[AMS+95] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A.I., 1995. Fast Discovery of Association Rules. In Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., eds., Advances in Knowledge Discovery and Data Mining. AAAI Press.

[AT97] Adomavicius, G. and Tuzhilin, A., 1997. Discovery of Actionable Patterns in Databases: The Action Hierarchy Approach. In Proc. of the Third International Conference on Knowledge Discovery and Data Mining.

[BA99] Bayardo, R. and Agrawal, R., 1999. Mining the Most Interesting Rules. In Proc. of the Fifth ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining.

[BAG99] Bayardo, R., Agrawal, R. and Gunopulos, D., 1999. Constraint-Based Rule Mining in Large, Dense Databases. In Proceedings of ICDE, 1999.

[BT98] Berger, G. and Tuzhilin, A., 1998. Discovering Unexpected Patterns in Temporal Data Using Temporal Logic. In Etzion, O., Jajodia, S. and Sripada, S., eds., Temporal Databases: Research and Practice. Springer, 1998.

[BM78] Buchanan, B.G. and Mitchell, T.M., 1978. Model Directed Learning of Production Rules. In Waterman and Hayes-Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York.

[BMU+97] Brin, S., Motwani, R., Ullman, J.D. and Tsur, S., 1997. Dynamic Itemset Counting and Implication Rules for Market Basket Data. In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 255-264.

[CSD98] Chakrabarti, S., Sarawagi, S. and Dom, B., 1998. Mining Surprising Patterns Using Temporal Description Length. In Proc. of the International Conference on Very Large Databases, 1998.

[F97] Forbes Magazine, Sep. 8, 1997. Believe in yourself, believe in the merchandise, pp. 118-124.

[FPS96] Fayyad, U.M., Piatetsky-Shapiro, G. and Smyth, P., 1996. From Data Mining to Knowledge Discovery: An Overview. In Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., eds., Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.

[HRM75] Hayes-Roth, M. and Mostow, D., 1975. An Automatically Compilable Recognition Network for Structured Patterns. In Proceedings of the International Joint Conference on Artificial Intelligence, 1975, pp. 356-362.
[KMR+94] Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H. and Verkamo, A.I., 1994. Finding Interesting Rules from Large Sets of Discovered Association Rules. In Proc. of the Third International Conference on Information and Knowledge Management, pp. 401-407.

[LH96] Liu, B. and Hsu, W., 1996. Post-Analysis of Learned Rules. In Proc. of the Thirteenth National Conference on Artificial Intelligence (AAAI '96), pp. 828-834.

[LHC97] Liu, B., Hsu, W. and Chen, S., 1997. Using General Impressions to Analyze Discovered Classification Rules. In Proc. of the Third International Conference on Knowledge Discovery and Data Mining (KDD 97), pp. 31-36.

[LHM99] Liu, B., Hsu, W. and Ma, Y., 1999. Pruning and Summarizing the Discovered Rules. In Proc. of the Fifth ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining.

[M77] Mitchell, T.M., 1977. Version Spaces: A Candidate Elimination Approach to Rule Learning. In Proceedings of the International Joint Conference on Artificial Intelligence, 1977, pp. 305-310.

[M82] Mitchell, T., 1982. Generalization as Search. Artificial Intelligence, pp. 203-226.

[MPM96] Matheus, C.J., Piatetsky-Shapiro, G. and McNeill, D., 1996. Selecting and Reporting What is Interesting: The KEFIR Application to Healthcare Data. In Advances in Knowledge Discovery and Data Mining. AAAI Press, 1996.

[MUB82] Mitchell, T.M., Utgoff, P.E. and Banerji, R.B., 1982. Learning Problem-Solving Heuristics by Experimentation. In Michalski et al. (eds.), Machine Learning, Tioga Press, Palo Alto.

[P70] Plotkin, G.D., 1970. A Note on Inductive Generalization. In Meltzer and Michie (eds.), Machine Intelligence, Edinburgh University Press, Edinburgh.

[P99] Padmanabhan, B., 1999. Discovering Unexpected Patterns in Data Mining Applications. Doctoral Dissertation, New York University, May 1999.

[PSM94] Piatetsky-Shapiro, G. and Matheus, C.J., 1994. The Interestingness of Deviations. In Procs. of the AAAI-94 Workshop on Knowledge Discovery in Databases, pp. 25-36.

[PT98] Padmanabhan, B. and Tuzhilin, A., 1998. A Belief-Driven Method for Discovering Unexpected Patterns. In Proc. of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998.

[PT99] Padmanabhan, B. and Tuzhilin, A., 1999. Unexpectedness as a Measure of Interestingness in Knowledge Discovery. Decision Support Systems, 27(3), pp. 303-318.

[S97] Stedman, C., 1997. Data Mining for Fool's Gold. Computerworld, Vol. 31, No. 48, Dec. 1997.

[SA96] Srikant, R. and Agrawal, R., 1996. Mining Quantitative Association Rules in Large Relational Tables. In Proc. of the ACM SIGMOD Conference on Management of Data, 1996.

[SLR+99] Shah, D., Lakshmanan, L.V.S., Ramamritham, K. and Sudarshan, S., 1999. Interestingness and Pruning of Mined Patterns. In Proceedings of the 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD), Philadelphia, 1999.

[ST95] Silberschatz, A. and Tuzhilin, A., 1995. On Subjective Measures of Interestingness in Knowledge Discovery. In Proc. of the First International Conference on Knowledge Discovery and Data Mining, pp. 275-281.

[ST96] Silberschatz, A. and Tuzhilin, A., 1996. What Makes Patterns Interesting in Knowledge Discovery Systems. IEEE Transactions on Knowledge and Data Engineering, Special Issue on Data Mining, vol. 5, no. 6, pp. 970-974.

[SVA97] Srikant, R., Vu, Q. and Agrawal, R., 1997. Mining Association Rules with Item Constraints. In Proc. of the Third International Conference on Knowledge Discovery and Data Mining (KDD 97), pp. 67-73.
[Sub98] Subramonian, R., 1998. Defining diff as a Data Mining Primitive. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998.

[Suz97] Suzuki, E., 1997. Autonomous Discovery of Reliable Exception Rules. In Proc. of the Third International Conference on Knowledge Discovery and Data Mining, pp. 259-262.

[TKR+95] Toivonen, H., Klemettinen, M., Ronkainen, P., Hatonen, K. and Mannila, H., 1995. Pruning and Grouping Discovered Association Rules. In MLNet Workshop on Statistics, Machine Learning and Discovery in Databases, pp. 47-52.

[V78] Vere, S.A., 1978. Inductive Learning of Relational Productions. In Waterman and Hayes-Roth (eds.), Pattern-Directed Inference Systems, Academic Press, New York.

[W75] Winston, P.H., 1975. Learning Structural Descriptions from Examples. In Winston (ed.), The Psychology of Computer Vision, McGraw Hill, New York.