From: KDD-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.
Discovering
generalized
Heikki
episodes
Mannila
using
and Hannu
minimal
occurrences
Toivonen
University of Helsinki
Department of Computer Science
P.O. Box 26, FIN-00014 Helsinki, Finland
Heikki.Mannila@cs.Helsinki.FI,
Hannu.Toivonen@cs.HeIsinki.FI
Abstract
Sequences of events are an important special form
of data that arises in several contexts, including
telecommunications,
user interface studies, and
epidemiology.
We present a general and flexible framework of specifying classes of generalized
episodes. These are recurrent combinations of
events satisfying certain conditions. The framework can be instantiated
to a wide variety of
applications by selecting suitable primitive conditions.
We present algorithms for discovering
frequently occurring episodes and episode rules.
The algorithms are based on the use of minimal
occurrences of episodes; this makes it possible
to evaluate confidences of a wide variety of rules
using only a single analysis pass. We present empirical results on t,he behavior of t.he algorithms
on events stemming from a WWW log.
Introduction
Sequences of events are a common
form of data that can
contain important knowledge to be discovered. Examples of such data are telecommunications network
alarms, user interface actions, crimes committed by
a person, occurrences of recurrent illnesses, etc. Recently, interest in knowledge discovery from sequences
of events has increased:
see, e.g., (Dousson, Gaborit,, & Ghallab 1993; Laird 1993; Wang et al. 1994;
Morris, Khatib, & Ligozat 1995; Bettini, Wang, Sr; Jajodia 1996).
In a previous paper (Mannila, Toivonen, & Verkamo
1995) we showed how sequences of events can be analyzed by locating frequently occurring episodes from
them. An episode is a combination of events with a partially specified order; it occurs in a sequence, if there
are occurrences of the events in an order consistent
with the given order, within a given time bound.
In a telecommunication application, the rules found
using the methods of (Mannila, Toivonen, & Verkamo
1995) have proven to be useful and they have been integrated in alarm handling software (HritGnen et al.
146
KDD-96
1996). However, application studies both in the telecommunications domain and elsewhere have shown that
there is a need for extensions to the methods.
In this paper we study what types of episodes can
be efficiently discovered from long sequences of events.
We present a simple and powerful framework for building episodes out of predefined predicates, and describe
how such episodes can be discovered efficiently. The
framework allows one to express arbitrary unary conditions on the individual events, and to pose binary
conditions on the pairs of events, giving one the possibility to exactly target the types of event combinations
that are interesting in the applications.
The algorithms described in the paper are based on
mznzm,al occurrences of episodes. In addition to being
simple and efficient, this formulation has the advantage
that, t,he confidences and frequencies of rules wit#h different, time bounds can be obtained quickly, i.e., there
is no need to rerun the analysis if one only wants to
modify the time bounds. In case of complicated episodes, the time needed for recognizing the occurrence of
an episode can be significant; the use of stored minimal
occurrences of episodes eliminates unnecessary repetition of the recognition effort.
Episodes:
patterns
in event
sequences
We use a fairly standard way of modeling events in
time. Given the set R = {Al, . . . , A,} of event attributes with domains DAM,. . . , DA,, an event e over R is
a (m + 1)-tuple (al, . . . , alnr t), where ai E DA, and t
is a real number, the tame of e. We refer to the time of
e by e.T, and to an attribute A E R of e by e.A. An
event sequence S is a collection of events over R, i.e.,
a relation over RU {T}, where the domain of attribute
T is the set of real numbers.
An epasode P on variables {xl,. . . , xk}, denoted
xk), is a conjunction
P(Xl,...,
i\ Pi(%,Zi),
i=l
xk} are event variables, and
where y%,zi E {xi,...,
each conjunct po2(x, y) h as one of the forms LY(z.A),
or x.T 5 y.T. Here A and B are event
P(x.A,y.B),
attributes, o is a predefined unary predicate on DA,
and p is a predefined binary predicate on DA x Dg.
We assume that the available unary predicates include
testing for equality with a constant, and that the binary
predicates include equality. The size ] P ] of an episode
P is the number of conjuncts not involving time in P.
the data could not easily be used. The recent work
on sequence data m databases (see (Seshadri, Livny, &
Ramakrishnan 1996)) p rovides interesting openings towards the use of database techniques in the processing
of queries on sequences.
An occurrence of an episode P at [t, t’] is mz’namal
if P does not occur at any proper subinterval [u, u’] C
[t, t’]. The set of (intervals of) minimal occurrences of
an episode P is denoted by ma(P):
Example
1 In the telecommunications domain, event
attributes are, e.g., type, module, and sewerzty, indicating the type of alarm, the module that sent the alarm,
and the severity of the alarm, respectively. An episode
might look as follows:
ma(P)= Ut,t’l I [t,t’l is
x.type = 2356 A y.type = 7401,
or
x. type = 2356 A y.type = 7401 A
x.T 2 y.T A x,module = y.module.
The first episode indicates that two alarms of type 2356
and 7401 have to occur, whereas the second says that
alarm 2356 has to precede 7401 and that the alarms
have to come from the same module. Specification of,
e g., this condition was not possible in the framework
of (Mannila, Toivonen, & Verkamo 1995).
The episode
x.type = 2356 A y.type = 7401 A
nezghbor(x. module, y.module)
captures the situation when alarms of type 2356 and
0
7401 are sent from adjacent modules.
Example 2 In analyzing a WWW log, the events have
attributes page (the accessed page), host (t,he accessing
host), and time. An example episode is
x.page = p1 A y.page = p, A x.host = y.host,
indicating
accesses to p, and p, from the same host.
un
a minimal occurrence of P}.
An epzsode rule is an expression P [V] + Q [WI,
where P and Q are episodes, and V and W are real
numbers. The informal interpretation of the rule is that,
if episode P has a minimal occurrence at interval [t, t’]
with t’ -t 5 V, then episode Q occurs at interval [t , t”]
for some t” such that t” - t 5 W.’
Example
3 An example rule in the WWW
log is
x.page = p1 A y.page = p2 A
x.host
+
= y.host [60]
x.page = p1 A y.page = p, A
z.page = p3 A x.host
y.host = z.host
= y host
r\
[120]
expressing that if some host accesses pages pr and pz
within a minute, page ps is likely to be accessed from
0
the same host within two minutes.
An episode P(xl, . . .) xk) is seraal, if P includes conjuncts enforcing a tot,al order between the xi’s. The
episode is parallel, if there are no conditions on the relative order of the events.
The frequency freq(P) of an episode P ill a given
event. sequence S is defined as the number of minimal
occurrences of P in the sequence S, freq( P) = ]mo( P) 1.
Given a frequency threshold man-fr, an episode P is
frequent if freq(P) 1 min-fr.
The epasode rule dascovery task can now be stat,ed
as follows. Given an event sequence S, a class S of
episodes, and time bounds I/ and W, find all frequent
episode rules ofthe form P [V] + Q [WI, where P, Q E
An episode P(xl, . . . , xk) occurs in a sequence of
events S = (er,..., e,), at znterval [t, t’], if there are
disjoint events ej, , . . . , elk such that P(e,, , . . , eJk) is
true,l and t 5 mz’n,{ej, .T} and t’ 2 max,{eJ, .T}.
Note that one could write an SQL query that, t,ests
whether an episode of the above form occurs in a sequence, when the sequence is represented as a relation
over R U {T}. The evaluation of such a query would,
however, be quite inefficient, as the sequence form of
To make episode rule discovery feasible, t,he class of
episodes has to be restrict,ed somehow. We consider the
restricted but. interestmg task: of discovermg episodes
from the following classes Es (l?, A) and &p (r , A) The
class Es(l?, A) consists of serial episodes with unary
predicates from r and binary predicates from A. The
class Ep(r, A) is defined to consist of parallel episodes,
with predicates from l? and A.
‘For reasons of brevity we omit the standard formal
definition of satisfaction of a formula, and just use t.he notation P(e3,, . . . , eJk) to indicate this.
“There is a number of variations for the relationship
between t.he intervals; e.g., rules that point backwards in
t.ime can be defied in a similar way.
E.
Mining with Noise and Missing Data
147
The episodes
Differences
to the previous model
described in (Mannila, Toivonen, & Verkamo 1995)
were based on conditions on the event types. That
is, of our examples only the first episode of Example 1
would be representable within that framework. In the
applications it is necessary to be able to state both more
relaxed and more restricted episode specifications by
hm,;nm
l‘u.” ‘1’6
or\nA;t;r,no
cl”l,~,“,“llU
,,l,t,rl
I~l.zL”~U
+r\
U”,
ac.s*,n
thn
lJL,G
nn+...nnlr
II~lJ”Y”IR
+A..b”p”-
logy. That is, binary predicates are necessary. Also,
there was only one fixed time bound associated with
a rule. In the applications it is preferable to have two
bounds, one for the left-hand side and one for the whole
rule, such as “if A and B occur within 15 seconds, then
C follows within 30 seconds”.
One notable difference in the algorithms is that here
we are able to give methods that can be used to compute the confidences of rules of the above form for various values of the time bounds; the previous methods
had to make one pass through the data for each time
bound. There are also some smaller technical differences, e.g., the exact definition of the confidence of a
rule is somewhat changed (and is now more natural, it
seems to us).
The work most closely related to ours is perhaps
(Srikant & Agrawal 1996). They search in multiple sequences for patterns that are similar to the serial episodes of (Mannila, Toivonen, & Verkamo 1995) with
some extra restrictions and an event taxonomy. Our
mpt.hnrlc
.-.vY**vuu
ran
VUll he
U” avtenrla-l
“I.Y”.lUVU
4th
lll”ll
uca tavnnnmv
“U’L”“““‘J
hv
o L&IIrI;vUJ CA
ect application of the similar extensions to association
rules. Also, our methods can be applied on analyzing
several sequencies; there is actually a variety of choices
for the definition of frequency of an episode in a set of
sequencies.
There are also some interesting similarities between
the discovery of frequent episodes and the work done
on inductive logic programming (see, e.g., (Muggleton
1992)); a noticeable difference is caused by the sequentiality of the underlying data model, and the emphasis
on time-iimited occurrences. Similariy, the probiem of
looking for one occurrence of an episode can be viewed
as a constraint satisfaction problem.
Finding
minimal
frequent
occurrences
episodes
of
In this section we describe the algorithms used to locate the minimum occurrences of frequent episodes from
the classes Cp and Es. The algorithms are based on the
idea of first locating minimal occurrences of small epis1~ and using this information to generate candidates
oaes,
for possibly frequent larger episodes.
An episode P is sample, if it includes no binary predicates. Recognition of simple episodes turns out to
148
KDD-96
be considerably easier than for arbitrary
is shown by the following two theorems.
episodes, as
Theorem 4 Finding whether a simple serial or parallel episode P has an occurrence in an event sequence
q
S can be done in time IPllSl.
Theorem
,A,
“Urj
5 Finding whether a serial or parallel epis-
.,n
1D h,,
IIOX? au
r\nm%m,.nmrn
“~~UAI~IILS
NP-complete problem.
:,111 au
e.m C”t;lllJ
rrrr--+
--I..---acquallLa
0C :-Ib ^_
all
q
One should not be discouraged by the NPcompleteness result, however; the situations used in the
reduction are highly contrived. It seems that in practice episodes similar to the ones in our examples can
be recognized and discovered fast.
We move to the discovery alprithm
for simple episodes. Given an episode P = As=1 (pi(yi, 2%)) a subepasode PI of P determined by a set I 5 (1,. . .,k} is
7 1 l-ho
n,.nncl&nc YIb,o YI
nf
simnlv
“..“yL”,
.P,1 -- IA\tEl -/n.IQr.
.LIIU hcanrr
ULWIS., yrvyu
‘i’z\Y~> “al.
simple episodes are given in the following lemmas.
Lemma 6 (i) If an episode P is frequent in an event
sequence S, then all subepisodes PI are frequent. (ii)
If [t,t’] E ma(P), then PI occurs in [t,t’] and hence
there is an interval [tr,t$] E mo(PI) such that t 5 tI 5
0
t'l < t’.
Lemma 7 Let P be a
and let [t, t'] E ma(P).
and PZ of P of size k and tz Ejt,i’j we have
mo(P2).
simple serial episode of size k,
Then there are subepisodes PI
1 such that for some tr E [t, t’[
[t,trj E mo(Pr j and [tz,t’j E
q
Note that Lemma 7 does not hold for general episodes: a minimal occurrence of a general episode does
not necessarily start .with a minimal occurrence of a
subepisode.
Lemma 8 Let P be a simple parallel episode of size k,
and let [t, t'] E ma(P). Then there are subepisodes PI
andPzofPofsizek-lsuchthatforsometi,t2,ti,t/2E
[t,t']
we have [tl,t{]
E mo(Pl) and [tz,tL] E mo(Pz),
and furthermore t = min{t 1, t2) and t’ = max{ti , tk}.
0
We use the same algorithm skeleton as in the discovery of association rules (Agrawal & Srikant 1994;
Mannila, Toivonen, & Verkamo 1994). Namely, having
found the set Lk of frequent simple episodes of size k,
we form the set ck+l of candidate episodes of size k+l,
i.e., episodes whose all subepisodes are frequent, and
then find out which candidate episodes P E Ck+l are
really frequent by forming the set ma(P)
Aigorithm 9, beiow, discovers aii frequent simpie
serial episodes. The discovery of simple parallel eplsodes is similar; the details of steps 9 and 10 are a bit.
different.
Algorithm
9 Discovery of frequent simple serial episodes.
Input:
An event sequence 5, unary predicates l? and
binary predicates A, and a frequency threshold min-fr.
Output:
All frequent simple episodes from Cs(l?, A)
(and their minimal occurrences).
Method:
1.
2.
3.
4.
5.
6.
7.
8.
9.
Cl := the set of episodes of size 1 in Es((r, A);
for all P E Ci compute ma(P);
Li := {P E Ci ] P is frequent};
i := 1;
while Le # 0 do
ci+1 := {P [ P E Es@‘, A), all subepisodes of P are in La};
i:=i+l;
for all P E Ci do
select subepisodes PI and P2;
compute ma(P) from mo(Pl) and
10.
mo(P2);
11.
12.
13.
14.
od;
Li := {P E Ci ] P is frequent};
od;
for all i and all P E L,, output P and ma(P);
Finding
the minimal
occurrences
For simple
serial episodes, the two subepisodes are selected on
line 9 so that PI contains all events except the last
one and P2 in turn contains all except the first one. PI
and P2 also contain all the predicates that apply on the
events in the episode. The minimal occurrences of P
are then computed on line 10 by
mo( P) = {[t , u’] ] there are [t , t’] E mo( PI) and
[u, u’] E mo(P2) such that t < u,
t’ < u’, and [t , u’] is minimal}.
Note the correspondence to Lemma 7.
For simple parallel episodes, the subepisodes PI and
Pz contain all events except one; the omitted events
must be different. Again, PI and Pz contain all the
applicable predicates. See Lemma 8 for the idea of how
to compute the minimal occurrences of P. To optimize
the efficiency, ] mo( PI) I+ I mo( Pp,) I should be minimized.
The minimal occurrences of a candidate episode P
can be found in a linear pass over the minimal occurrences of the selected subepisodes PI and P2. The
time required for one candidate is thus 0( ]mo( PI) 1 +
!mo(Pz)! + !mo(P)!),
which is O(n), where n is the
length of the event sequence.
While minimal occurrences of episodes can be located quite efficiently from minimal occurrences, the size
of the data structures may be even larger than the original database, especially in the first couple of itera-
tions. A solution is to use in the beginning other pattern recognition methods.
Finding general episodes
The recognition problem
for general episodes is considerably harder than that
for simple episodes. The difficulty is caused by the
failure of Lemma 7. Consider as an example the event
sequence
(mm,
hostI, 1) , (vvel,
host2,2),
(wge2,
hostl,
ho&,
4)) (ww3,
(vw2,
hosf2,3),
5))
and the episode P
x.host
= y.host
A y.host
= z.host.
The minimal occurrences of the two (actually equivalent) subepisodes x.host = y.host and y.host = z.host of
P are at [2,3], whereas ma(P) = {[I, 51).
While the minimal occurrences of a general episode
cannot be built as easily from the minimal occurrences
of subepisodes as for simple episodes, the occurrence
of subepisodes is still a necessary condition for the occurrence of the whole episode (Lemma 6). We use
this property to implement a simple exhaustive search
method for finding minimal occurrences of general episodes; the use of minimal occurrences of subepisodes
guarantees that the exhaustive search has to be applied
only to small slices of the long event sequence.
Finding
minimal
occurrences
of general episodes In line 9 of Algorithm 9 choose incomparable
subepisodes PI and P2 of P such that jmo(P1)1 +
Imo(Pz) I is as small as possible. Then, for each pair
of intervals [tl,ti] E mo(Pi) and [tz, ta] E mo(P2) such
that max(ti, tk) - min(ti, t2) < W search for a minimal occurrence of P in the time interval [max(t{ , ti) W,min(ti,t~)+W].
H ere W is the upper bound on the
length of minimal occurrences. The search can be done
using basically exhaustive search, as the sizes of eplsodes and the number of events in such small intervals
are small. We omit the details.
A problem similar to the computation of frequencies
occurs in the area of active databases. There triggers
can be specified as composite events, somewhat similar
t,o episodes. In (Geham, Jagadish, & Shmueli 1992) it
is shown how finite automata can be constructed from
composite events to recognize when a trigger should be
fired. This method is not practical for episodes since
the deterministic automata could be very large.
Finding
confidences
of rules
In this section we show how the information about minimal occurrences of frequent episodes can be used to
Mining with Noise and Missing Data
149
man&
50
100
250
500
Number of frequent episodes and informative rules
Time bounds U (s)
I lr n,l nn *‘-.A
au,
60 120
15 30
10, JU, ou, 1m
F)n vu
cl-l
1131’ 617 2278 1982 5899’ 7659 5899
14205
418 217
739
642 1676 2191 1676
3969
611
375
289
134
289
57
160
111
46
21
59
Table 1: Experimental
ir? nne
T-,RGQ
t.hnwlch
1-w“..- .,“b”
t.ho
rt.nwtmrm
Y&IV YY~..IVU.“U
mnlPj
“““\’
,
and
as follows.
For each [t,t’] E ma(P) with t’ - t < V, locate the
minimal occurrence [s, s’] of Q such that t 5 s and
[s, s’] is the first interval in ma(Q) with this property.
Then check whether st - t 2 W.
The time complexity of the confidence computation
is 0( 1mo( P) I + I ma(Q) I). If one wants t,o find t,he confidences of rules of the form P [V] + Q [W] for all
V, W E U for some set of times U, then by usmg a
table of size IU12 one can in fact evaluate all these in
time O(imo(Pji+imo(Qji+iUj2j.
For reasons ofbrevIty we omit the details.
ma(Q),
Experimental
results
We have experimented with t,he methods using as test
dat.a a part of the WWW server log from t,he Department of Computer Science at the University of Helsinki. The log contains requests to see WWW pages
at the department’s server; such requests can be made
by WWW browsers at any host in the Internet..
An mwnt.
ln,~ ,-ran
2~ rnnciat;nm
_---. -__1 in
-.. t.hhp
“.-” “b
YUll he
-1 CPP,,
U-Ill U”
““““L”““lb
80
87
138
80
results: number of episodes and rules
obtain confidences for various types of episode rules
without looking at the data again.
Recall that an episode rule is an expression P [V] +
Q [VVj, where P and Q are episodes, and V and W
are real numbers To find surh rules, first note that
for the rule to be interesting, also the episode P A Q
has to be frequent. So rules of the above form can be
enumerated by looking at all frequent episodes M =
~ic~aa, and then looking at all partitions of I as I =
J U K, and forming the left and right-hand sides as
subepisodes: P = MJ and Q = MI<. The evaluation
of the confidence of the rule P [V] + Q [W] can be
aone
49
nf
“1 the
YI1b
attributes page, host, and tzme. The number of events
in our data set is 116308, and it covers three weeks
in February and March, 1996. In total, 7634 different pages are referred to from 11635 hosts Requests
for images have been excluded from consideration. For
simplicity, we only considered t,he page and trme attributes; we used relatively short, time bounds to reduce the probability of unrelated requests cont,ributing
Execution times (s)
m’)rl k
“0”‘“-J’
T;mo
Illllcl
15,30 I30,60
Table 2: Experimental
ha..n,lo
V”Lul.UU
160,120
TT
”
l-1
(U]
1 15,30,60,120
results: execution times
to the frequencies.
We experimented with frequency thresholds min-fr
bet,ween 50 and 500, and with time bounds between
15 s and 2 min. (In three cases we used two time
bounds, i.e., U = {V, W}, and in one case we searched
simultaneously for all combinations of four time bounds
in U.) Episode rules discovered with t,hese paramet,ers
should reveal the paths through which people navigate
when they know where they want to go.
Table 1 shows the number of frequent episodes and
the number of informative rules with confidence at,
ieast 0.2. (A ruie with time bounds V, W is considered
informative if its confidence is higher than with time
bounds V’, W ’ where V’ < V and W ’ > W.)
The number of frequent episodes is in the range from
40 to 6000, and it seems to grow rather fast. when the
frequency threshold becomes lower. Our dat,a is relatively dense, and therefore the effect. of the t,ime bounds
on the number of frequent. episodes is roughly 1inea.r.
The largest frequent, episodes consist of 7 eve&. Note
that the method is robust in the sense that. a change
in one parameter extends or shrinks the collection of
frequent episodes but does not replace any.
Table 2 shows the execution times for t,he experiments on a PC (90 MHz Pentium, 32 MB memory,
Linux operating system). The data resided in a 3.0 MB
flat text file.
The experiments show that the method is efficient,.
The execution times are between 50 s and 5 min Not,e,
in particular, that searching for episodes wldh several
different time bounds (ihe right-most columns in the
tables) is as fast as searching for episodes with only
the largest time bound. Minimal occurrences are thus
a very suitable representation for queries with different
time bounds.
Following are some examples of the episode rules
found (we use the titles of the pages here, or their English translations, which should be self-explanatory).
References
Agrawal, R., and Srikant, R. 1994. Fast algorithms for
mining association rules in large databases. In Proceedings
of the Twentzeth
Bettini, C.; Wang, X. S.; and Jajodia, S. 1996. Testing
complex temporal relationships involving multiple granularities and its application to data mining. In Proceedings
of the Fifteenth
pOSiW?I
“Department Home Page”, “Spring term 96” [15 s]
j. “Classes in spring 96” [30 s] (confidence 0 83). In
other words, in 83 % of the cases where the departmental home page and the spring term page had been
accessed within 15 seconds, the classes page was requested within 30 seconds (that is, within 30 seconds
from the request for the departmental home page).
“Research at the department” * “Staff of the department” [2 min] (confidence 0.29). (There is no
time bound for the left-hand side since there is only
one event .)
“Department Home Page”, “Department Home Page
in Finnish”, ‘(Classes in spring 96”, “Basic courses”
[15 s] + “Introduction
to Document Preparation
(IDP)” , “IDP Course Description”, “IDP Exercises”
[2 min] (confidence 0.42).
Experiments
with the data set of (Mannila,
Toivonen, &L Verkamo 1995) show that - with comparable parameters - the present method is as fast or
faster than the one presented in (Mannila, Toivonen,
” -_ .
& Verkamo 1995j. ‘The new method has, however, two
important advantages: the rule formalism is more useful, and rules with several different time bounds can be
found with the same effort.
Concluding
remarks
We have presented a framework for generalized episodes, and algorithms for discovering episode rules from
sequences of events. The present framework supplies
sufficient power for representing desired connections
between events.
The work presented here is in many ways preliminary. Perhaps the most important extensions are facilities for rule querying and compilation, i.e , methods by
which the user could specify the episode class in highlevel language and the definition would automatically
be compiled into a specialization of the algorithm that
would take advantage of the restrictions on the episode
class. Other open problems include a theoretical analysis of what subclasses of episodes are recognizable
from episodes in polynomial time, and the combination
of episode techniques with intensity models.
on Very Large
International
Conference
487 - 499.
Data Bases (VLDB’94),
on
ACM
PrinCipkS
Sym-
SIGACT-SIGMOD-SIGART
Of
Database Systems
(l’UlJ3’Yb).
10
appear.
Dousson, C.; Gaborit, P.; and Ghallab, M. 1993 Situation
recognition: Representation and algorithms. In Proceedings of the Thirteenth
Artificial
Intelligence
Internatzonal
(IJCAI-93),
Joznt Conference
166 - 172.
on
Gehani, N.; Jagadish, H.; and Shmueli, 0. 1992. Composite event specification in active databases. In Proceedings
Internarzonal Conference
Data Bases (VLjjB:gfjj,
32.7 - 338.
of the Eightteenth
on Very Large
Hit&en,
K.; Klemettinen, M.; Mannila, H.; Ronkainen,
P.; and Toivonen, H. 1996. Knowledge discovery from
telecommunication
network alarm databases. In 12th International Conference on Data Engzneering
(ICDE’96),
115 - 122.
Laird; I’. 1993. IdentifvinP
in sequen._.” oatterns
‘
~~~~. ~~~
Y ---,I and usine
tial data. In Jantke, K.; Kobayashi, S.; Tomita, E.; and
Yokomori, T., eds., Algorithmic
Learning Theory, 4th Internatzonal Worbhop, 1 - 18. Berlin: Springer-Verlag.
Mannila, H.; Toivonen, H.; and Verkamo, A. I. 1994.
Efficient algorithms for discovering association rules. In
Fayyad, U. M., and Uthurusamy, R., eds., Iinowledge Discovery zn Databases, Papers from the 1994 AAAI
Workshop (IiDD’94),
181 - 192.
.Mannhj H,; Toivonen; H;; and Ve&amo. I --A. -I. 1995, Discovering frequent episodes in sequences. In Proceedings of
the Farst International
Conference
on Knowledge Discovery and Data Mining (KDD’95),
210 - 215.
Morris, R. A ; Khatib, L.; and Ligozat, G. 1995. Generating scenarios from specifications of repeating events. In
Second International
tion and Reasoning
Workshop
fTl.ME-9.5);
Muggleton, S 1992. Indvctzve
don: Academic Press.
on Temporal
Representa-
Logic Programming.
Lon-
Seshadri, P.; Livny, M.; and Ramakrishnan,
R. 1996.
SEQ: Design & implementation of sequence database system. In Proceedings of the d&nd Internatzonal Conference
on Very Large Data Bases (VLDB’96).
To appear.
Srikant, R., and Agrawal, R. 1996. Mining sequential patt.erns: Generalizations and performance improvements. In
international
Conference on Extencizng Database Technology (EDBT’1996).
Wang, J. T.-L.; Chirn, G.-W.; Marr, T. G.; Shapiro, B.;
Shasha, D.; and Zhang, K. 1994. Combinatorial
pattern
discovery for scientific data: Some preliminary results. In
Proceedings
of ACM SIGMOD
Conference
on Management of Data (SIGMOD’94),
115 - 125.
Mining with Noise and Missing Data
151