Generalized frequent episodes in event sequences Srivatsan Laxman, K.P.Unnikrishnan∗and P.S.Sastry Dept. Electrical Engineering Indian Institute of Science Bangalore 560012, INDIA Abstract Discovering patterns in temporal data is an important task in Data Mining. A successful method for this was proposed by Mannila et al. [1] in 1997. In their framework, mining for temporal patterns in a database of sequences of events is done by discovering the so called frequent episodes. These episodes characterize interesting collections of events occurring relatively close to each other in some partial order. However, in this framework (and in many others for finding patterns in event sequences), the ordering of events in an event sequence is the only allowed temporal information. But there are many applications where the events are not instantaneous; they have time durations. Interesting episodes that we want to discover may need to contain information regarding event durations etc. In this paper we extend Mannila et al.’s framework to tackle such issues. In our generalized formulation, episodes are defined so that much more temporal information about events can be incorporated into the structure of an episode. This significantly enhances the expressive capability of the rules that can be discovered in the frequent episode framework. We also present algorithms for discovering such generalized frequent episodes. 1. INTRODUCTION Data sets with temporal dependencies frequently occur in business, engineering and scientific scenarios. Development of data mining techniques specially suited for such data sets is rapidly gaining momentum today (see, e.g., [1] and references therein). These data sets may often be viewed as logs (or records) of events occurring in one or more sequences. The goal of data mining here would be to identify patterns in the form of some partially ordered collection of events that are interesting or can serve some useful predictive function. An often cited example of temporal data is the customer transaction log in a grocery store. Each purchase by a customer may be regarded as an event in the transaction logs. One can either maintain different transaction sequences for ∗ General Motors R&D Center, Warren, MI Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00. different customers, or a single sequence for all the customers put together. Discovering frequent (or popular) buying patterns from these transaction logs can be of great use to the store [2, 3]. Another example for a temporal database is an alarm sequence in a telecommunication network [1]. The goal of data mining here would be to find relationships between alarms in the sequence. This would help in predicting impending alarms and faults in an on-line analysis of the incoming alarm stream. It is important to note that in both the examples cited above, the data is viewed as a time-ordered sequence of events. This ordering of events is the only temporal information that is used for defining patterns or episodes that we want to discover through mining techniques. In other words, the events logged are viewed as instantaneous, and the duration of existence of an event is is not considered relevant for the discovery process. This is true of most temporal data mining techniques today. There have been some recent attempts to allow time durations for events while analyzing temporal data [4] though such information does not seem to have been used for defining what constitute interesting sequential patterns. There are many applications where the event duration can often be a very important attribute. Consider, for example, the logs of an assembly line in a manufacturing plant. At every change in state of the line, we can log (as an event) the current state along with its start time. This would yield a temporal database in which each event is associated with (among other attributes) its duration of existence. The assembly line could either be running (with the current throughput metric logged as an attribute) or it could have stopped for a variety of reasons, such as an electrical system malfunction, a mechanical system malfunction, a lunch break or an end of shift. The role of data mining here would typically be to discover patterns of frequent breakdown and to suggest potential process improvements for achieving better throughputs and lower down times. The significance of event durations in such applications cannot be overemphasized. For example, extended lunch breaks certainly do not help the plant’s throughput! Or frequent short stoppages due to a series of electrical system malfunctions could foretell a severe power problem in the plant. Clearly, many of the interesting rules that we will want to discover (in this assembly line example) require constructs with some timing information built into them. This motivates our idea of introducing time durations and time intervals into the definition of an episode. In this paper we extend the approach of discovering fre- 0 5 10 E1 E3 E1 15 20 E2 25 TIME 30 35 E1 E2 40 45 E3 50 55 60 E4 65 E1 different kinds of valid time overlaps that successive events in the sequence may possess. It is easy to see that the duration of the ith event (denoted by ∆i ) in the event sequence (s, Ts , Te ), can be simply written as 70 E4 Figure 1: An event sequence ∆i = τi − ti ∀i. quent episodes as discussed in [1] for incorporating time duration information into episodes. We propose the generalized serial episode which incorporates explicit timing information for the events that constitute it. In the sections to follow we first define the generalized serial episode by associating event durations with each node of the episode. Next we present algorithms for discovering rules in this generalized framework. We show that the algorithms described in [1] for finding frequent serial episodes can be adapted to meet our requirements. We then briefly discuss how we can easily extend our definition of the episode to incorporate time intervals between nodes in an episode. We also discuss how we can associate with each node, not just a single event, but a collection of events. All these generalizations enhance the expressive power of the rules that can be discovered in the generalized serial episode framework. 2. EPISODES AND SUBEPISODES The problem we consider is the following. We are given a time-ordered sequence of events with each event persisting for some duration. We want to discover interesting patterns, which will be called generalized serial episodes, that denote some ordered collection of events that occur frequently in the data sequence. The definition for episodes allows specification of both the events that constitute it and the duration of such events. While the basic framework is similar to that in [1], the main difference is in the incorporation of time durations. This section presents our definitions for the generalized serial episode and its subepisode. Let E denote a finite set of possible event types, E = {E1 , E2 , . . . , EM }. (1) An event is defined as a triple (A, t, τ ), where A ∈ E, t is the event start time (an integer) and τ is the event end time (also an integer). We shall require of t and τ that τ > t in all events. It is important to note that unlike in [1] the events here are not instantaneous and hence require (for their full specification) both start and end times. An event sequence on E is simply an ordered collection of events and is represented by the triple (s, Ts , Te ), with Ts and Te fixing the lower and upper time extremities for the sequence and s = {(A1 , t1 , τ1 ), (A2 , t2 , τ2 ), . . . , (An , tn , τn )} (2) where Ai ∈ E & ti , τi ∈ [Ts , Te ], 1 ≤ i ≤ n, and ti ≥ ti−1 , 2 ≤ i ≤ n. Clearly, our definition of the event sequence allows for successive events in it to occur in either one of simultaneous, disjoint or partially overlapping time intervals. These notions do not exist when the events in the sequence are viewed as instantaneous as in [1]. Fig. 1 illustrates a typical example for an event sequence. The line segments represent the time intervals in which the events occur. The number above each line segment corresponds to the event’s duration of occurrence. Notice the (3) In the manufacturing plant example that we considered in Section 1, the event sequence basically describes the state of the line as a function of time, with the state at any given instant being fully specified by the conjunction of all event types that occur simultaneously (at that instant). For example, there could well be a simultaneous breakdown of different subsystems, each being denoted by a separate event in the sequence. An important special case of our event sequence would be one where we have ti+1 = τi , ∀i. This would be the case for our manufacturing system example if the event type definition is such that only one type of event can exist at any given time. By incorporating event start and end times in our event sequences, we expect to be able to define and discover episodes with greater expressive capability in them. A window on the event sequence (s, Ts , Te ) is simply a time slice from that sequence and is specified by another triple (w, ts , te ). The window may be regarded as an event sequence in its own right. ts and te are the window’s time extremities and w is an appropriate contiguous slice of events from s. It is important however to note that events in s that start after ts but terminate only beyond te need to be dropped out of w. For example, for the event sequence specified in Fig. 1 the window (w, 0, 10) includes the event (E4 , 06, 08) but excludes (E2 , 06, 12) and (E1 , 06, 15), since their end times go beyond the upper extremity of the window. 2.1 The generalized serial episode We are now ready to define the generalized serial episode along the same lines as the serial episode definition of [1]. We can think of the episode1 as a finite ordered collection of nodes. Each node is associated with an event type and a collection of time intervals (which describe the event duration possibilities for that node). Let B = {B1 , . . . , BK } denote a collection of time intervals. Recall that E is the set of possible event types. The generalized serial episode is defined as a quadruple α = (Vα , <α , gα , dα ). Vα is a finite collection of nodes and <α specifies a total order over it. gα and dα are both 1-1 maps where gα : Vα → E specifies the rule for associating an event type with each node and dα : Vα → 2B maps each node to the possible event durations that episode α allows. As a notation, we denote the number of nodes in Vα by n. Fig. 2 pictorially depicts a few example episodes. The event type associated with a node is written inside the circle that represents it. The set written directly below each node represents the subset of 2B to which (the episode’s) dα maps that node. In these examples, we have chosen the collection of time intervals as B = {[0, 3], [4, 7], [8, 11], [12, 15]}. It is important to observe that α2 and α4 represent different episodes, purely on the basis of having different timing information (i. e. different d(·) maps). Having proposed this modified definition for the episode, the most natural ques1 From here on the term “episode” always refers to the generalized serial episode. “α1 ” “α2 ” n n E2 E2 {[0, 3], [8, 11]} {[0, 3], [4, 7], [12, 15]} α (s, Ts , Te ) and β α =⇒ β (s, Ts , Te ). B \ {[0, 3]} E1 - En 4 B B n E2 {[0, 3], [8, 11]} - En 3 {[12, 15]} hβ (v) = hα (fβα (v)) ∀v ∈ Vβ . - En 1 B \ {[12, 15]} B - En 4 {[0, 3], [4, 7]} 3. SIMILAR AND PRINCIPAL EPISODES Figure 2: Some example episodes tion to ask next is when does such an episode occur in a given event sequence? Consider an episode α = (Vα , <α , gα , dα ). Let the number of nodes in α (i. e. |Vα |) be n. The episode α is said to occur in the event sequence (s, Ts , Te ) if ∃ a 11 hα : Vα → {1, . . . , n} such that the following conditions hold ∀v, w ∈ Vα : Ahα (v) = gα (v), ∃b ∈ dα (v) such that ∆hα (v) ∈ b and v <α w in Vα =⇒ thα (v) thα (w) in s. (4) (5) (6) The occurrence of episode α in the event sequence (s, Ts , Te ) is written as α (s, Ts , Te ). It can be immediately seen that among the example episodes defined in Fig. 2 all except α4 occur in the example event sequence of Fig. 1. Also, since a window in (s, Ts , Te ) may be viewed as an event sequence on its own, the same definitions are used to determine whether or not an episode occurred in a window of (s, Ts , Te ) as well. Observe that α1 , α2 , α3 and α5 all occur in the window (w, 0, 10) itself. 2.2 The subepisode Consider an episode α = (Vα , <α , gα , dα ). An episode β = (Vβ , <β , gβ , dβ ) is said to be a subepisode of α (written β α) if ∃ a 1-1 fβα : Vβ → Vα such that ∀v, w ∈ Vβ we have: gα (fβα (v)) = gβ (v), (7) dα (fβα (v)) ⊆ dβ (v), and (8) v <β w in Vβ =⇒ fβα (v) <α fβα (w) in Vα . (9) Basically, according to our definition of the subepisode, it does not suffice if just the relative ordering of events in the candidate subepisode matches that of the episode (which is the case in [1]). In addition, we also require that the timing constraints on the events of the candidate subepisode must be consistent with what those in the episode itself would allow. Notice that while α1 6 α2 we have α3 α5 and α2 α4 in Fig. 2. Strict subepisodes, i. e. β α and β 6= α, are denoted by β ≺ α. When β α we will sometimes prefer to say that α is a superepisode of β (written α β)2 . Also, from the 2 Strict superepisodes are similarly denoted like α β. (11) This is a “suitable” map hβ (·) which demonstrates that β (s, Ts , Te ). “α5 ” n E2 (10) Since α (s, Ts , Te ) there is a “suitable” hα : Vα → {1, 2, . . . , n} and since β α there is also a “suitable” fβα : Vβ → Vα . Thus, we can simply define a new map hβ : Vα → {1, 2, . . . , n} as “α4 ” “α3 ” n definitions thus far it can be easily see that - En 3 With definitions for the episode, subepisode and occurrence of an episode now in place, we proceed to introduce the important concepts of similar and principal episodes. It is by virtue of these concepts that our framework for rule discovery becomes both interesting and feasible. These ideas are closely tied to the notion of frequency of an episode in an event sequence. Consider an event sequence (s, Ts , Te ). For the sake of notational succinctness this event sequence triple is denoted by S. Let λS (α) represent some measure of how often α occurs in the event sequence S. This could be based either on the number of occurrences of α, or on the number of windows of S which contain α (both in the same sense as in [1]) or on just any other consistent frequency measure. Whatever the definition of λS (α) we shall require of it the property that β ≺ α =⇒ λS (β) ≥ λS (α) (12) i. e. in a given event sequence, the subepisodes are always at least as frequent (if not more frequent) as their corresponding superepisodes. This is a very important property which needs to be exploited well in order to facilitate efficient rule discovery using frequent episodes. We now motivate the idea of similar and principal episodes by examining the occurrence (in the event sequence example of Fig. 1) of some simple episodes defined in Fig. 3. Notice that all the episodes have identical event type assignments i. e. they have the same gα (·) maps. However, since the episodes describe events with different timing information they differ in their dα (·) maps. Let us represent the event sequence in Fig. 1 by S. Further, in this example, our choice for the λS (·) function is simply the number of occurrences (of the episode) in the event sequence S. 3.1 Similar episodes First we consider the βi ’s in Fig. 3. Since β3 ≺ β2 ≺ β1 ≺ β0 and since β0 S it can be inferred (from Eq. 10) that βi S 0 ≤ i ≤ 3. The βi ’s differ only in the durations that they allow for the second node (with event type E1 ). As more and more intervals are deemed allowed in the episode, we naturally expect the episode frequency to increase in S. This can be readily seen from Eq. (12). By counting the number of occurrences in S (i. e. the event sequence in Fig. 1) we find that, for the βi ’s in Fig. 3 λS (β3 ) = λS (β2 ) > λS (β1 ) > λS (β0 ). (13) Essentially, with respect to their frequencies of occurrence in S, the episodes β2 and β3 are indistinguishable. Such episodes are referred to as being similar in S. It can be “β0 ” n E3 cipal episode is one in which each time interval in each event duration set (of the episode) has earned its place by virtue of a strictly positive contribution to the episode’s frequency in the event sequence. An episode α∗ is said to be principal in an event sequence S if the following two conditions hold: - En 1 {[4, 7]} {[4, 7]} “β1 ” E3 - En 1 {[4, 7]} n {[4, 7], [8, 11]} α∗ S and ∗ (17) S ∗ @α α such that α ∼ α . “β2 ” n E3 {[4, 7]} - En 1 B \ {[12, 15]} “β3 ” E3 - En 1 {[4, 7]} n B Figure 3: Similar and principal episodes likewise verified that among the γi ’s in Fig. 3 the episodes γ1 and γ2 are also similar in S. In fact, of the episodes defined in Fig. 3 the pairs {β2 , β3 } and {γ1 , γ2 } are the only sets of similar episodes with respect to the event sequence S. It is important to note that unlike in the case of episodes and subepisodes, similar episodes are always defined with respect to a given event sequence only. Two episodes are said to be similar if they are identically defined except for the fact that one allows some more time intervals (for the event durations) than the other, and if their frequencies of occurrence are equal in the given event sequence. Let us now formally define the concept of similarity between episodes in an event sequence. Two episodes α and β are said to be similar in the event sequence S (written S α ∼ β) if all of the following hold: α β or β α, |Vα | = |Vβ | and λS (α) = λS (β). (14) (15) (16) It is clear that every episode α is trivially similar to itself (reflexive). From Eqs. (14)–(16) it is also evident that if S S α ∼ β then β ∼ α (symmetric). Further, it is also easy S S S to show that if α ∼ β and β ∼ γ then α ∼ γ (transitive). Similarity between episodes (in a given event sequence) is therefore an equivalence relation over the set of all episodes. 3.2 The principal episodes Consider once more the example episodes of Fig. 3. The event duration sets in all those episodes, except for β3 and γ2 , have the property that on removing any one time interval (or more) from any of them, the frequency of the corresponding episode (in the event sequence S defined in Fig. 1) reduces by a strictly non-zero quantity. In other words, the additional time intervals that β3 and γ2 permit (compared to say those permitted by β2 and γ1 respectively) for the durations of their second nodes are clearly not interesting given the event sequence S. We therefore wish to keep episodes like β3 and γ2 out of our rule discovery process. This leads us to the idea of principal episodes. Simply speaking, a prin- (18) It can be easily verified that all episodes in Fig. 3 excluding β3 and γ2 (and including δ0 ) are principal in the event sequence of Fig. 1. Notice that as was the case with similar episodes, the notion of principal episodes is also always tied to a specific event sequence. The similarity relation (being an equivalence relation) partitions the space of all episodes into equivalence classes. The episodes within each class are similar to one another. There will exist exactly one principal episode in each partition. All the other episodes in the partition can in fact be generated by appending time intervals (that do not contribute to the eventual episode frequency) to the event duration sets of the principal episode. When we are concerned with finite length event sequences, the space of all possible episodes that can occur in the sequence is definitely finite. Similarity with respect to the event sequence naturally yields a finite partitioning of this episode space. Each partition has exactly one principal episode which may be regarded as the generator of that equivalence class. Therefore it is guaranteed that the collection of all principal episodes in a given finite length event sequence is always a well defined and finite set. Subepisodes of principal episodes are in general not principal. Indeed, the non-principal episodes in each partition of our similarity-based quotient space are after all subepisodes of the partition’s generator (or principal episode). However, all subepisodes generated by dropping nodes (rather than by appending time intervals to the d(·) sets) of a principal episode are most certainly principal. Consider an l-node principal episode α∗l . Let α∗l−1 be an (l −1)-node subepisode of α∗l obtained by dropping one of its nodes (and keeping everything else identical). Recall that every time interval in the d(·) sets of α∗l contributes a strictly positive quantity to its frequency in the (given) event sequence (say S). Therefore, dropping a time interval from the d(·) set of a node in α∗l will reduce its frequency by a strictly positive quantity. Removing the same time interval from the corresponding node in α∗l−1 will naturally cause a similar fall in frequency since α∗l−1 S every time α∗l S. This shows that there can be no “redundant” time intervals in the d(·) sets of α∗l−1 (and therefore it is also principal in S). This is a very important property which needs to be exploited during the candidate generation step (in Algorithm 3) of our rule discovery process. 4. DISCOVERING RULES We are now ready to describe the rule discovery process in our generalized framework. It is very much in the same lines as in [1] except that we work with the collection of frequent principal episodes FS∗ rather than just the set of frequent episodes FS in the event sequence S. In this section we describe the various algorithms that constitute our rule discovery process. Each of these algorithms is an adaptation of its counterpart in [1]. For completeness, we provide all the algorithms here. However, to keep the discussion brief, we only indicate the prominent differences with respect to the algorithms in [1] in each case. Thus, this section assumes that the reader is familiar with the algorithms in [1]. 4.1 The rules Let FS∗ denote the set of all frequent principal episodes in S. We are looking for rules like β ∗ → α∗ , where β ∗ ≺ α∗ and α∗ , β ∗ ∈ FS∗ . To obtain such rules we need to first compute all the frequent principal episodes in the event sequence S. For each frequent episode-subepisode pair, we declare that the subepisode implies the episode if the ratio of their frequencies of occurrence in the event sequence exceeds a threshold. This is what Algorithm 1 does. The important difference with respect to the algorithm proposed in [1] is that here the episodes and subepisodes considered for rule discovery are all not just frequent but also principal in S. Algorithm 1 To discover rules based on the frequent principal episodes of an event sequence Input: Set E of event types, event sequence S, set E of allowed episodes, window width bw , minimum frequency λmin , minimum confidence ρmin , set B of allowed time intervals Output: Rules that hold in S 1: /∗ Find set FS∗ of all frequent principal episodes in S ∗/ 2: Compute FS∗ using Algorithm 2 3: /∗ Generate rules ∗/ 4: for all α∗ ∈ FS∗ do 5: for all β ∗ ∈ FS∗ such that β ∗ ≺ α∗ do 6: if (λS (α∗ )/λS (β ∗ )) ≥ ρmin then 7: Output the rule β ∗ → α∗ 4.2 The frequent principal episodes The process of computing the set of frequent principal episodes is incremental and is done through Algorithm 2. We start with the set C1 of all possible 1-node episodes. F1∗ is the collection of frequent principal episodes in C1 . Once F1∗ is obtained, the collection C2 of candidate 2-node episodes is generated from it. This incremental process of first obtaining Fl∗ from the candidates in Cl and then generating Cl+1 from Fl∗ is repeated till the set of new candidates generated is empty. The crucial step here is to designate an episode as a candidate only if all the subepisodes obtained by dropping a node from it have already been declared as being both frequent and principal. 4.3 Candidate generation The candidate generation (i. e. Algorithm 3) is also along the same lines as that in [1]. The l-node frequent principal episodes are arranged as a lexicographically sorted collection Fl∗ of l-length arrays (of event types). However, the ith episode in the collection now needs two components for its specification – the first is Fl∗ [i].g[j] which refers to the event type association for j th node, and the second is Fl∗ [i].d[j] which stores the collection of intervals associated with the j th node, the kth element of which is denoted by Fl∗ [i].d[j][k]. The episode collection is viewed as to be constituted in blocks (like in [1]) such that within each block in Fl∗ the first (l − 1) nodes are identical i. e. both the event type as well as the timing information of the first (l − 1) nodes are the same for the episodes in a block. Potential (l + 1)-node candidates are constructed by combining two Algorithm 2 To compute the set of frequent principal episodes in an event sequence Input: Set E of event types, event sequence S, set E of allowed episodes, window width bw , minimum frequency λmin , set B of allowed time intervals Output: Set FS∗ of all frequent principal episodes in S 1: Initialize FS∗ = φ 2: /∗ Construct set C1 of 1-node candidate episodes ∗/ 3: Compute C1 = {α ∈ E such that |α.g| = 1} 4: Initialize l = 1 5: while Cl 6= φ do 6: /∗ Obtain set Fl∗ of all l-node frequent principal episodes ∗/ 7: Compute Fl∗ = {α ∈ Cl such that λS (α) ≥ λmin & α is principal in S} using Algorithm 4 8: /∗ Obtain set Cl+1 of all (l + 1)-node candidate episodes ∗/ 9: Compute Cl+1 = {α ∈ El+1 such that ∀β ≺ α with |β.g| < l ∗ we have β ∈ F|β.g| } using Algorithm 3 10: Update l = l + 1 11: for all l do S 12: Update FS∗ = FS∗ Fl∗ ∗ 13: Output FS episodes within a block. The newly constructed (l + 1)-node episode is declared a candidate only if every subepisode generated by dropping one of its nodes is already known to be principal and frequent (and hence must have been found in Fl∗ ). 4.4 Counting frequencies of principal episodes We now need an algorithm that can compute the frequency of the candidates that have been generated. Algorithm 4 implements a frequency computing scheme. The episode frequency is defined based on the number of windows of the event sequence that contain the episode. This is identical to the frequency definition used in [1]. Since our episodes and their occurrence in an event sequence are essentially generalizations of the concepts defined in [1], the basic idea of using finite state automata to recognize episode occurrences continue to be very useful. However, the exact initialization and updating procedures for the automata differ slightly to account for our generalizations. In Algorithm 4, waits(A, δ) (rather than the waits(A) of [1]) stores (α, i) pairs, indicating that an automaton for episode α is waiting in its ith state for the event type A to occur with a duration in the δ time interval. This is the main difference between the algorithm proposed in [1] and our algorithm here. All the other structures have identical functions in both the algorithms. The number of windows of S that contain α is stored in α.f req. (α, j) ∈ beginsat(t) indicates that an automaton for α has reached its j th state at time t. trans stores the various automata transitions in the current window shift. Each element in trans is a triple (α, j, t) implying that an automaton for α that was initialized at time t has transited to state j in the current window. An important step in Algorithm 4 (that was not needed in its counterpart in [1]) is the check for principal episodes. Once all the frequencies have been computed, an l-node episode is declared as principal only if all its l-node superepisodes differ in frequency with respect to itself. It can be seen that this principal episode check in Algorithm 4 is the only serious computational overhead that our Algorithm 3 To generate (l + 1)-node candidates from lnode frequent principal episodes Input: Set Fl∗ of appropriately sorted l-node frequent principal episodes, set B of allowed time intervals Output: Set Cl+1 of appropriately sorted (l + 1)-node candidate episodes 1: Initialize Cl+1 = φ and k = 0 2: if l = 1 then 3: for h = 1 to |Fl∗ | do 4: Initialize Fl∗ [i].BlkStart = 1 5: for i = 1 to |Fl∗ | do 6: Initialize CurrentBlkStart = k + 1 7: for (j = Fl∗ [i].BlkStart; Fl∗ [j].BlkStart = Fl∗ [i].BlkStart; j = j + 1) do 8: /∗ Fl∗ [i] and Fl∗ [j] have their first (l − 1) nodes common ∗/ 9: /∗ Build a potential candidate α from these two ∗/ 10: for x = 1 to l do 11: Assign α.g[x] = Fl∗ [i].g[x] and α.d[x] = Fl∗ [i].d[x] 12: Assign α.g[l + 1] = Fl∗ [j].g[l] and α.d[l + 1] = Fl∗ [j].d[l] 13: /∗ Build and test subepisodes of α ∗/ 14: for y = 1 to (l − 1) do 15: for x = 1 to (y − 1) do 16: Assign β.g[x] = α.g[x] and β.d[x] = α.d[x] 17: for x = y to l do 18: Assign β.g[x] = α.g[x + 1] and β.d[x] = α.d[x + 1] 19: if β 6∈ Fl∗ then 20: Goto next j in line 8 21: Update k = k + 1 22: Assign Cl+1 [k] = α and Cl+1 [k].BlkStart = CurrentBlkStart 23: Output Cl+1 rule discovery process has when compared to the algorithms in [1]. This, and of course, the natural swelling of the set of all episodes due to the inclusion of timing information in them, are the only significant sources for the additional computational cost that we need to bear. 5. DISCUSSION In this paper our objective is to show how the framework for discovering frequent episodes, due to Mannila et al. [1], can be extended to handle applications where time durations of events is an important attribute. As briefly outlined in Section 1, being able to include time durations of events is useful, e.g., in analyzing failure modes of manufacturing processes. The framework presented here is being developed specifically to handle such an application. At the level of formalism, the main change needed is the incorporation of event durations explicitly in the definition of episodes. Thus, in defining the generalized serial episode (cf. Section 2.1), in addition to the node set, Vα , the total order <α , and the map gα that assigns an event type to each node, we also included dα that prescribes the allowable time durations for events associated with different nodes. Given that the data of event sequence contains time durations of events, it is now fairly straight forward to count the frequency of such generalized events. However, incorporation of dα results in a structure whereby interesting frequent episodes now become equivalence classes rather than single episodes. This is the reason we defined the concept of principal episodes in Section 3 so that the search for frequent episodes does not return too many redundant ones. The main computational task here is to discover all frequent principal episodes. The algorithms presented follow essentially the same strategy as in [1], namely, constructing frequent episodes of progressively increasing length starting with episodes of length one. However, the inclusion of dα and the need to ensure that only principal episodes are considered as candidates, together imply that the algorithms presented in [1] need to be modified. As explained in Section 4, the needed modifications are easily implemented without too much additional computational burden. It may be worth noting that the structure developed here is a strict generalization of the serial episodes discussed in [1]. If we always choose dα as the set of all possible time durations, then we essentially get back the serial episodes of [1]. There are some modifications to the structure of generalized serial episodes that are easily implemented and which enhance the expressive capabilities of the formalism. We briefly describe two such modifications below. We can associate more than one event type with each node in defining episodes. This is done by making the gα a function from Vα to 2E . That is, both gα and dα would become set-valued functions now. This would make our generalized serial episodes to include some parallel episodes as defined in [1] as well, in addition to being a strict generalization of serial episodes. However, this would not result in a structure that is a strict generalization of parallel episodes also. Our structure would not be able to include all parallel episodes of [1] though it would contain episodes that are not expressible either as serial episodes or as parallel episodes. For example, consider a two node episode where the first node is associated with the set {A, C} and the second node is associated with {B}. This would represent a pattern of either A or C but not both occurring before B. Such an episode cannot be represented in the formalism of [1]. On the other hand, in our formalism, we cannot represent the episode that says both A and C should occur, in any order, before B, as a single episode. Discovering such generalized episodes is also easily done. We have explained in Section 3 that because dα is a setvalued function, one can construct subepisodes that are not interesting by simply tagging more time intervals with each node. That is why we defined the concept of principal episodes. In a similar way, when gα is a set-valued function, we can make uninteresting subepisodes by simply tagging irrelevant event types with each node. Hence, we need to modify our definition of similar and principal episodes to take care of this redundancy due to the new set-valued gα function also. This is done easily. Now the modifications needed for the algorithms would be similar to what we have done to take care of the requirements due to presence of dα in our generalized serial episodes. There is another direction in which we can extend the formalism presented here. In the generalized serial episode framework presented here, we have included only intra-event time durations. That is, for an episode to occur in an event sequence, the only additional constraints here are in terms of duration of individual events. Another kind of constraint that may be relevant in applications is in terms of the time interval between successive events in the episodes. This is very easily done by changing the definition of ∆i to ∆i = ti+1 − ti in Eq. (3). Now the dα function prescribes constraints on the duration between starting times of successive events. The only modification needed in the algorithms now is in the part where we decide whether an episode has occurred in an event sequence while counting the frequency. As is easy to see, we can extend our structure to take care of both intra-event and inter-event time constraints. In the definition of an episode, instead of having only one function dα , we can have two functions, say, d1α and d2α , one to take care of intra-event time constraints and another to take care of inter-event time constraints. Thus, the structure proposed in this paper allows one to discover episodes or patterns in which information regarding time durations of events can form an important part. Most of the currently available data mining techniques do not allow for this though such a capability may be useful in analyzing data pertaining to many engineering processes. It is in this sense that we feel our extension of the framework of Mannila et al. [1] is significant in enriching the expressive capabilities of the technique of discovering frequent episodes. Acknowledgements The work reported here is supported in part by project funded by GM R&D Center, Warren, through SID, IISc, Bangalore. 6. REFERENCES [1] H. Mannila, H. Toivonen, and A. I. Verkamo, “Discovery of frequent episodes in event sequences,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259–289, 1997. [2] R. Agrawal and R. Srikant, “Mining sequential patterns,” in 11th Int’l Conference on Data Engineering, Taipei, Taiwan, Mar. 1995. [3] R. Srikanth and R. Agrawal, “Mining generalized association rules,” in The 21st VLDB Conference, Zurich, Switzerland, Sept. 1995. [4] R. Villafane, K. A. Hua, D. Tran, and B. Maulik, “Mining interval time series,” in Data Warehousing and Knowledge Discovery, 1999. Algorithm 4 To generate the set of all l-node frequent principal episodes from the set of l-node candidates Input: Set C of (candidate) episodes, event sequence S = (s, Ts , Te ), window width bw , minimum frequency λmin Output: Subset F (of C) of frequent episodes in the sequence S 1: for all α ∈ C do 2: for i = 1 to |α.g| do 3: Initialize α.init[i] = 0 4: for all δ ∈ α.d[1] do 5: Initialize waits(α.g[i], δ) = φ 6: for all α ∈ C do 7: for all δ ∈ α.d[1] do 8: Update S waits(α.g[1], δ) = waits(α.g[1], δ) {(α, 1)} 9: Initialize α.f req = 0 10: for t = Ts − bw to Ts − 1 do 11: Initialize beginsat(t) = φ 12: for start = Ts − bw + 1 to Te do 13: /∗ Start of new window∗/ 14: Initialize beginsat(start + bw − 1) = φ and trans = φ 15: for all (A, t, τ ) ∈ s such that τ = start + bw − 1 and (τ − t) ≤ bw do 16: Assign d = {b ∈ B such that (τ − t) ∈ b} 17: for all (α, j) ∈ waits(A, d) do 18: if j = |α.g| and α.init[j] = 0 then 19: Assign α.inwindow = start 20: if j = 1 then S 21: Update trans = trans {(α, j, t)} 22: else S 23: Update trans = trans {(α, j, α.init[j − 1])} 24: Update beginsat(α.init[j − 1]) = beginsat(α.init[j − 1]) \ {(α, j − 1)} 25: Reset α.init[j − 1] = 0 26: for all δ ∈ α.d[j] do 27: Update waits(A, δ) = waits(A, δ) \ {(α, j − 1)} 28: for all (α, j, t) ∈ trans do 29: Update α.init[j] = t S 30: Update beginsat(t) = beginsat(t) {(α, j)} 31: if j < |α.g| then 32: for all δ ∈ α.d[j] do 33: Update S waits(α.g[j + 1], δ) = waits(α.g[j + 1], δ) {(α, j + 1)} 34: for all (α, j) ∈ beginsat(start − 1) do 35: if j = |α.g| then 36: Update α.f req = α.f req + start − α.inwindow 37: else if α.init[j] = start − 1 then 38: for all δ ∈ α.d[j + 1] do 39: Update waits(α.g[j + 1], δ) = waits(α.g[j + 1], δ) \ {(α, j + 1)} 40: Reset α.init[j] = 0 41: /∗ Output ∗/ 42: Initialize F ∗ = φ 43: for all α ∈ C such that α.f req/(Te −Ts +bw −1) ≥ λmin do 44: for all β α do 45: if β.f req = α.f req then 46: Skip to next S α in line 43 47: Update F ∗ = F ∗ {α} 48: Output F ∗