From: KDD-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Maintenance of Discovered Knowledge: A Case in Multi-level Association Rules

David W. Cheung†, Vincent T. Ng‡, Benjamin W. Tam†

† Department of Computer Science, The University of Hong Kong, Hong Kong. Email: {dcheung|wktam}@cs.hku.hk.
‡ Department of Computing, Hong Kong Polytechnic University, Hong Kong. Email: cstyng@comp.polyu.edu.hk.

Abstract

An incremental technique and a fast algorithm, FUP, have been proposed previously for the update of discovered single-level association rules (SLAR). In this study, a more efficient algorithm, FUP*, which generates a smaller number of candidate sets than FUP, is proposed. In addition, we demonstrate that the incremental technique in FUP and FUP* can be generalized to some other KDD systems. An efficient algorithm, MLUp, is proposed for this purpose, for the updating of discovered multi-level association rules (MLAR). Our performance study shows that MLUp has superior performance over ML-T2 in updating discovered MLAR.

1 Introduction

An association rule (AR) is a strong rule which implies certain association relationships among a set of objects in a database. Since finding interesting AR in databases may disclose some useful patterns for decision support, marketing analysis, financial forecast, system fault prediction, and many other applications, it has attracted a lot of attention in recent data mining research (6). Efficient mining of AR in transaction and/or relational databases has been studied substantially (1; 2; 8; 10; 12; 7; 11; 13).

In our previous study, we investigated the maintenance problem in SLAR discovery (4). An efficient algorithm, FUP (Fast Update), was proposed, which can incrementally update the discovered AR if updates to a database are restricted to insertions of new transactions. In this paper, we report two advances in our study of the maintenance problem in the mining of association rules. (1) A faster version of FUP, FUP*, is proposed. The improvement of FUP* over FUP is in the candidate set generation procedure. (2) An algorithm MLUp (which stands for Multi-Level association rules Update) is proposed for the update of the discovered MLAR in relational databases (7).

The success of the incremental updating technique used in SLAR and MLAR suggests that, potentially, the technique could be generalized to solve the update problem in some other KDD systems. The remainder of the paper is organized as follows. In Section 2, the faster version FUP* is proposed. In Section 3, the problem of updating MLAR is discussed and the algorithm MLUp for the update of discovered MLAR is presented. In Section 4, an in-depth performance study of MLUp is presented. Section 5 contains the discussion and conclusions.

2 Update of Discovered SLAR

In the following discussion, we use the same notation as in (4). We summarize the finding of (4) in Lemma 1. For a complete description of FUP, please see (4).

Lemma 1 (4) A $k$-itemset $X$ not in the original large $k$-itemsets $L_k$ can become a winner (i.e., become large) in the updated database $DB \cup db$ only if $X.support_d \ge s \times d$. □
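
As an illustration only, the necessary condition of Lemma 1 can be sketched in Python. The names `count_support` and `prune_by_lemma1` are hypothetical and do not come from (4). The sketch assumes the increment fits in memory as a list of item lists, and uses exact fractions for the threshold $s$ so that borderline counts are compared without rounding error.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def count_support(candidates: Iterable[Itemset],
                  transactions: List[List[str]]) -> Dict[Itemset, int]:
    """Count, for each candidate, the transactions that contain it."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        items = set(t)
        for c in counts:
            if c <= items:
                counts[c] += 1
    return counts


def prune_by_lemma1(candidates: Set[Itemset],
                    db: List[List[str]],
                    s: Fraction) -> Set[Itemset]:
    """Keep only candidates with support_d >= s * d (Lemma 1).

    Survivors are not yet winners: their counts in the original
    database DB still have to be checked.
    """
    d = len(db)
    support_d = count_support(candidates, db)
    return {c for c in candidates if support_d[c] >= s * d}
```

Candidates that fail this test can be discarded after scanning only the increment, which is what makes the update cheaper than re-mining.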

A faster update algorithm FUP*

The improvement of FUP* over FUP (4) is in the candidate set generation mechanism. FUP uses the Apriori-gen function defined in (2) to establish a set of candidate sets (4). In fact, it looks for itemsets in Apriori-gen($L'_{k-1}$) which do not belong to $L_k$ but appear in some transaction(s) in $db$, and whose support count in $db$ is larger than or equal to $s \times d$, where $L'_{k-1}$ is the set of size-$(k-1)$ large itemsets in the updated database found in the $(k-1)$-th iteration of FUP. We find that the domain of this search, which is the set Apriori-gen($L'_{k-1}$), can be further reduced to a smaller set. This finding is supported by the following result. (A result similar to Lemma 2 for partitioned databases has been reported in (5).)

Lemma 2 A $k$-itemset $X$ not in the original large $k$-itemsets $L_k$ can become a winner (i.e., become large) in the updated database $DB \cup db$ only if $Y.support_d \ge s \times d$ for all the subsets $Y \subseteq X$.

Proof. It follows from Lemma 1 that $X.support_d \ge s \times d$. If $Y \subseteq X$, then $Y.support_d \ge X.support_d$. Hence, the condition holds for all the subsets $Y$ of $X$. □

From Lemma 2, the candidate sets can be restricted to the sets in Apriori-gen($L^*_{k-1}$), where $L^*_{k-1}$ are the itemsets in $L'_{k-1}$ whose support counts in $db$ are larger than or equal to $s \times d$. In general, $L^*_{k-1}$ is smaller than $L'_{k-1}$, and hence the number of candidate sets in Apriori-gen($L^*_{k-1}$) is smaller than that in Apriori-gen($L'_{k-1}$). In the following, we use Example 1 to illustrate the execution of FUP*. In particular, the example shows that FUP* can significantly reduce the number of candidate sets.
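
The restriction above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: `apriori_gen` is a simplified in-memory stand-in for the Apriori-gen function of (2), which returns every $k$-itemset whose $(k-1)$-subsets are all in its input, and `fup_star_candidates` is a hypothetical helper name. Support counts in $db$ are assumed to be available in a dictionary.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, Set

Itemset = FrozenSet[str]


def apriori_gen(large_prev: Set[Itemset]) -> Set[Itemset]:
    """Return every k-itemset whose (k-1)-subsets all lie in large_prev."""
    candidates: Set[Itemset] = set()
    for a, b in combinations(large_prev, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # a and b do not share k-2 items, so they do not join
        if all(frozenset(sub) in large_prev
               for sub in combinations(union, len(a))):
            candidates.add(union)
    return candidates


def fup_star_candidates(new_large_prev: Set[Itemset],
                        support_d: Dict[Itemset, int],
                        old_large_k: Set[Itemset],
                        s: Fraction,
                        d: int) -> Set[Itemset]:
    """Generate size-k candidates from L*_{k-1} instead of L'_{k-1}."""
    # L*_{k-1}: itemsets of L'_{k-1} with enough support in the increment.
    l_star = {x for x in new_large_prev if support_d.get(x, 0) >= s * d}
    # Itemsets already in L_k are handled separately as old large itemsets.
    return apriori_gen(l_star) - old_large_k
```

The only difference from an FUP-style generation step is the filter that builds `l_star` before the join; everything after candidate generation (counting in $db$, then in $DB$) is unchanged.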

Example 1 A database $DB$ is updated with an increment $db$ such that $D = 1000$, $d = 100$ and $s = 3\%$. $X$, $Y$, $Z$, and $W$ are four items, and the size-1 and size-2 large itemsets in $DB$ are $L_1 = \{X, Y, Z\}$ and $L_2 = \{XY, YZ\}$, respectively. Also $XY.support_D = 32$ and $YZ.support_D = 31$. Suppose FUP* has completed the first iteration and found the "new" size-1 large itemsets $L'_1 = \{X, Y, W\}$. Moreover, assume that the support counts of $X$, $Y$, and $W$ found in $db$ are 2, 4 and 5, respectively. This example illustrates how FUP* finds $L'_2$ in the second iteration, and also its effectiveness in reducing the number of candidate sets.

FUP* first filters out losers from $L_2$. Note that $Z \in L_1 - L'_1$, i.e., $Z$ has become a loser; therefore, the set $YZ \in L_2$ must also be a loser and is filtered out. For the remaining set $XY \in L_2$, FUP* scans $db$ to update its support count. Assume that $XY.support_d = 2$. Since $XY.support_{UD} = 2 + 32 = 34 > 3\% \times 1100 = 33$, $XY$ is large in $DB \cup db$ and is stored in $L'_2$.

Secondly, FUP* needs to find the "new" large itemsets from $db$. For this purpose, FUP* has to find the set $L^*_1$ from $L'_1$, which contains the itemsets in $L'_1$ that have enough support counts in $db$. Since $X.support_d = 2 < 3\% \times 100$, $X \notin L^*_1$, i.e., even though $X$ is a winner in the first iteration, it will not be used to generate the size-2 candidate sets. On the other hand, the support counts of both $Y$ and $W$ in $db$ are larger than the threshold $3\% \times 100$; therefore they will be used to generate the size-2 candidate sets, i.e., $L^*_1 = \{Y, W\}$. Following that, FUP* applies Apriori-gen on $L^*_1$ and generates the candidate set $C_2 = \{YW\}$. Note that in FUP, Apriori-gen is applied on $L'_1 = \{X, Y, W\}$ instead of $L^*_1$, and the set of candidate sets generated has three itemsets, three times as many as are generated by FUP* in this example. This illustrates that FUP* can significantly and effectively reduce the number of candidate sets compared with FUP.

Suppose $YW.support_d = 4 > 3\% \times 100$. It follows from Lemma 1 that $YW$ will not be pruned and remains in $C_2$. Following the pruning of the candidate sets in $C_2$, FUP* has to update the remaining candidate sets in $C_2$ against the original database $DB$. Suppose $YW.support_D = 29$. Since $YW.support_{UD} = 29 + 4 = 33 \ge 3\% \times 1100 = 33$, it is a large itemset in the updated database. Therefore $YW$ is added into $L'_2$. At the end of the second iteration, $L'_2 = \{XY, YW\}$ is returned. □
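
The arithmetic of Example 1 can be replayed with a short Python script. It is a sketch for checking the numbers only: the support counts are entered by hand exactly as assumed in the example rather than obtained by scanning any database, and the variable names are ours. For 1-itemsets, Apriori-gen reduces to forming all pairs, which is what `size2_candidates` does.

```python
from fractions import Fraction
from itertools import combinations

s = Fraction(3, 100)          # minimum support, 3%
D, d = 1000, 100              # sizes of DB and db

L2_old = {frozenset("XY"): 32, frozenset("YZ"): 31}        # support counts in DB
L1_new = {frozenset("X"), frozenset("Y"), frozenset("W")}  # L'_1 from iteration 1
support_d_1 = {frozenset("X"): 2, frozenset("Y"): 4, frozenset("W"): 5}
support_d_2 = {frozenset("XY"): 2, frozenset("YW"): 4}     # counts in db
support_D_2 = {frozenset("YW"): 29}                        # count in DB


def size2_candidates(one_itemsets):
    """Apriori-gen on 1-itemsets: every pair of distinct items."""
    return {a | b for a, b in combinations(one_itemsets, 2)}


# Step 1: filter losers from L_2, then update the survivors against db.
large_items = set().union(*L1_new)
L2_new = set()
for itemset, sup_D in L2_old.items():
    if not itemset <= large_items:
        continue                                  # YZ is dropped: Z is a loser
    if sup_D + support_d_2[itemset] >= s * (D + d):
        L2_new.add(itemset)                       # XY: 34 >= 33

# Step 2: build L*_1 and the candidate sets.
L1_star = {x for x in L1_new if support_d_1[x] >= s * d}
C2_fup_star = size2_candidates(L1_star) - set(L2_old)
C2_fup = size2_candidates(L1_new)                 # what FUP would start from

# Step 3: prune against db (Lemma 1), then check against DB.
for c in C2_fup_star:
    in_db = support_d_2[c]
    if in_db >= s * d and support_D_2[c] + in_db >= s * (D + d):
        L2_new.add(c)                             # YW: 33 >= 33

print(sorted("".join(sorted(x)) for x in L1_star))   # ['W', 'Y']
print(len(C2_fup), len(C2_fup_star))                 # 3 1
print(sorted("".join(sorted(x)) for x in L2_new))    # ['WY', 'XY']
```

Note that $YW$ meets the threshold in the updated database with equality (33 against 33), which is why the comparison must be "larger than or equal to".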

3 Update of Discovered MLAR

The method used in FUP* (and FUP) could be applied to many other KDD systems to update the knowledge discovered. In particular, it can be used in systems that are designed to discover various types of associations between generalized items and events. This includes the discovery of MLAR, generalized AR, sequential patterns, episodes, and quantitative AR (7; 12; 3; 9; 13). In the following, we show that FUP* can be generalized to solve the update problem for MLAR. For this purpose, an algorithm MLUp, which is an adaptation of FUP*, is proposed.

Mining of MLAR

In the study of mining MLAR, a series of algorithms have been proposed to facilitate a top-down, progressive deepening method based on the algorithms for mining SLAR. The method first finds large data items at the top-most level and then progressively deepens the mining process into their large descendants at lower concept levels. For details on the mining of MLAR, please refer to (7).

Update of discovered MLAR

The problem of updating the discovered MLAR is the same as that in the single-level environment. The only difference is that the rules in all the levels have to be updated instead of the rules in only one level. Also, the minimum support thresholds at different levels may not be equal. We use $s_m$ to denote the minimum support threshold at level $m$, for $m \ge 1$. Since there are several variations of the algorithm for mining MLAR, the update algorithm should be designed according to the strategy used in the initial mining process. The algorithm MLUp we are proposing is associated with the representative mining algorithm ML-T2. The following two results are the bases of MLUp.

Lemma 3 In a multi-level environment, a level-$m$ itemset $X$ not in the original large 1-itemsets $L[m, 1]$, ($m \ge 1$), can become a winner (i.e., become large) in the updated database $DB \cup db$ only if all ancestors of $X$ are winners.

Proof. This follows from the definition of large itemsets in the multi-level environment. □

Following Lemma 3, when MLUp scans the increment $db$ to look for new size-1 winners, it not only has to ensure that a candidate itemset has the required support count, but must also check that all its ancestors are large in the updated database. (Because of transitivity, MLUp only needs to check a candidate's immediate ancestor.)
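
The ancestor check can be sketched in Python as below. The item encoding is an assumption made for illustration: each item is a tuple of per-level codes, so that `(1, 1, 2)` is a level-3 item whose immediate ancestor is `(1, 1)`. This stands in for the encoded transaction tables of (7) and is not their actual format; `immediate_ancestor` and `passes_ancestor_check` are hypothetical names.

```python
from typing import Dict, Optional, Set, Tuple

Item = Tuple[int, ...]   # assumed encoding: one code per taxonomy level


def immediate_ancestor(item: Item) -> Optional[Item]:
    """Return the parent of an item, or None for a level-1 item."""
    return item[:-1] if len(item) > 1 else None


def passes_ancestor_check(item: Item,
                          new_large_by_level: Dict[int, Set[Item]]) -> bool:
    """Check the necessary condition of Lemma 3 for a level-m item.

    new_large_by_level[m] is assumed to hold L'[m, 1], the large
    1-itemsets of level m in the updated database.
    """
    parent = immediate_ancestor(item)
    if parent is None:
        return True                      # level 1: no ancestor to check
    return parent in new_large_by_level.get(len(parent), set())
```

Checking the parent alone suffices in this sketch because levels are processed top-down: a parent can only be in $L'[m-1, 1]$ if its own parent already passed the same check.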

Lemma 4 In a multi-level environment, a level-$m$ $k$-itemset $X$ not in the original large $k$-itemsets $L[m, k]$, ($m \ge 1$), can become a winner (i.e., become large) in the updated database $DB \cup db$ only if $X.support_d \ge s_m \times d$ and $Y.support_d \ge s_m \times d$ for all subsets $Y \subseteq X$.

Proof. This follows directly from Lemmas 1 and 2. □

The implication of the result in Lemma 4 is that the candidate set generation mechanism in FUP* can be applied directly in MLUp for finding new winners in different levels. In the following, we describe the main procedure of the update algorithm MLUp. The input to the algorithm includes the original encoded transaction database $T[1]$, the increment database $db$, and the old large itemsets $L[m, k]$, ($m \ge 1$, $k \ge 1$), and their support counts. Following the conventions in FUP*, the sizes of $T[1]$ and $db$ are denoted by $D$ and $d$ respectively. Moreover, the minimum support threshold for each level is denoted by $s_m$, ($m \ge 1$).

MLUp (main steps):

1. Translate the increment transaction database $db$ into an encoded transaction table $db[1]$ according to the given taxonomy information.

2. At level 1, scan $db[1]$ to update the support counts of the 1-itemsets in $L[1, 1]$ to filter out the winners into $L'[1, 1]$. In the same scan, find all the 1-itemsets in $db[1]$ which do not belong to $L[1, 1]$, whose support count in $db[1]$ is larger than or equal to $s_1 \times d$, and store these 1-itemsets in the candidate set $C_1$. Subsequently, scan $T[1]$ to find out the new winners in $C_1$ and store them into $L'[1, 1]$. Following that, $T[1]$ is filtered by $L'[1, 1]$ to generate the encoded transaction table $T[2]$. Similarly, $db[1]$ is filtered to $db[2]$.
At level $m$, ($m > 1$), scan $db[2]$ to update the support counts of the 1-itemsets in $L[m, 1]$. A 1-itemset in $L[m, 1]$ is a winner only if its immediate ancestor is large in the updated database and its support count in the updated database is larger than or equal to $s_m \times (D + d)$. In the same scan, find all level-$m$ 1-itemsets in $db[2]$ which do not belong to $L[m, 1]$, whose immediate ancestor belongs to $L'[m-1, 1]$ and whose support count in $db[2]$ is larger than or equal to $s_m \times d$. Then store these 1-itemsets in the candidate set $C_1$. Subsequently, scan $T[2]$ to find out the new winners in $C_1$ and store them in $L'[m, 1]$.

3. The large $k$-itemsets, ($k > 1$), for the updated database at level $m$ are derived in three steps: (1) Remove all the $k$-itemsets in $L[m, k]$ for which one of its ancestors is not large in the updated database. Then scan $db[2]$ to update the support counts of the remaining itemsets in $L[m, k]$ to find out the winners. (2) Let $L^*[m, k-1]$ be the subset of itemsets in $L'[m, k-1]$ whose support count in $db$ is larger than or equal to $s_m \times d$. In the same scan on $db[2]$ performed in (1), find all level-$m$ $k$-itemsets in Apriori-gen($L^*[m, k-1]$) which do not belong to $L[m, k]$, whose support count in $db[2]$ is larger than or equal to $s_m \times d$, and store them in the candidate set $C_k$. (3) Scan $DB$ to update the support counts of the candidate sets in $C_k$ and find all the level-$m$ size-$k$ winners in $C_k$, and store them in $L'[m, k]$.

4. At level $m$, return the union of $L'[m, k]$ for all the $k$'s.
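
To make the flow of step 2 at level 1 concrete, the following Python sketch walks through it on in-memory tables. Several things here are assumptions for illustration and are not specified by the steps above: encoded items are tuples whose first component identifies the level-1 ancestor, a transaction supports a level-1 item if it contains any of its descendants, and filtering keeps every transaction (possibly empty) while dropping items whose level-1 ancestor is not large. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Optional, Set, Tuple

Item = Tuple[int, ...]        # assumed encoding; item[:1] is the level-1 ancestor
Table = List[List[Item]]


def level1_counts(table: Table, only: Optional[Set[Item]] = None) -> Counter:
    """Count transactions per level-1 item, optionally for a subset only."""
    counts: Counter = Counter()
    for t in table:
        present = {item[:1] for item in t}
        counts.update(present if only is None else present & only)
    return counts


def update_level1(old_large: Dict[Item, int], T1: Table, db1: Table,
                  s1: Fraction):
    """Sketch of MLUp step 2 at level 1.

    old_large maps each item of L[1, 1] to its support count in T[1].
    Returns (L'[1, 1] with updated counts, T[2], db[2]).
    """
    D, d = len(T1), len(db1)
    count_db = level1_counts(db1)                 # one scan of db[1]

    new_large: Dict[Item, int] = {}
    for item, sup_D in old_large.items():         # old large items: no scan of T[1]
        total = sup_D + count_db[item]
        if total >= s1 * (D + d):
            new_large[item] = total

    # New candidates must have enough support in the increment alone.
    C1 = {item for item, c in count_db.items()
          if item not in old_large and c >= s1 * d}

    if C1:
        count_T = level1_counts(T1, only=C1)      # scan T[1] for candidates only
        for item in C1:
            total = count_T[item] + count_db[item]
            if total >= s1 * (D + d):
                new_large[item] = total

    keep = set(new_large)
    T2 = [[i for i in t if i[:1] in keep] for t in T1]
    db2 = [[i for i in t if i[:1] in keep] for t in db1]
    return new_large, T2, db2
```

The saving over re-mining comes from the two places marked in the comments: old large itemsets are updated from stored counts plus a scan of the increment, and the original table is scanned only for the (usually few) candidates that survive the increment-only test.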

4 Performance Study of MLUp

Extensive experiments have been conducted to assess the performance of MLUp. It was compared with the algorithm ML-T2. The experiments were performed on an AIX system on an RS/6000 workstation with model 410. The result shows that MLUp is much faster than re-running ML-T2 to update the discovered AR. This improvement is not surprising given that FUP also has similar performance in updating SLAR. The databases used in our experiments are synthetic data generated using a technique similar to that in (2).

Figure 1: Performance Comparison (level 1)

Our test environments are denoted by T10.I4.D100.d10.$s_1$-$s_2$-$s_3$-$s_4$, which represents an updated database in which the original database $DB$ has 100 thousand transactions (D100) and the increment $db$ has 10 thousand transactions (d10). The transactions have 10 items on average (T10), and the average size of the large itemsets is 4 (I4). Moreover, there are four levels in the taxonomy and the minimum supports are denoted by $s_i$, ($1 \le i \le 4$). The performance comparison between MLUp and ML-T2 in the update of the level-1 AR is plotted in Figure 1 against different minimum support thresholds. Their performance ratios are also presented as bar charts in the same figure. It can be seen that MLUp is 2-3 times faster than ML-T2. MLUp also has similar speed-up over ML-T2 in the updates in the other levels.

As explained before, MLUp substantially reduces the number of candidate sets generated compared with ML-T2. In Figure 2, the number of candidate sets generated in MLUp in the same experiment is compared with that in ML-T2. The ratios in the comparison are presented as bar charts in the same figure. The chart shows that the number of candidate sets generated by MLUp is only about 2-3% of that in ML-T2.

A series of updates from 10K to 350K were generated on the databases T10.I4.D100, and the execution times for MLUp and ML-T2 to do the updates on these increments were compared. A gradual levelling off of the speed-up of MLUp over ML-T2 only appears when the increment size is about 3.5 times the size of the original database. The fact that MLUp still exhibits performance gain when the increment is much larger than the original database shows that it is very efficient.

Figure 2: Reduction of Candidate Sets (level 1)

5 Discussion and Conclusions

We have shown that FUP* is an efficient algorithm for updating discovered SLAR. It improves the performance of FUP by significantly reducing its candidate sets. We have also proposed an efficient algorithm MLUp for updating discovered MLAR. It is an adaptation of the FUP* algorithm in the multi-level environment. The algorithm MLUp has been implemented and its performance studied and compared with the ML-T2 algorithm. The study shows that MLUp has superior performance in the multi-level environment. The success of the incremental updating technique in both SLAR and MLAR suggests that the technique could be generalized to solve the update problems in some other knowledge discovery systems.

Currently, both FUP* and MLUp are applicable only to a database which allows frequent or occasional updates restricted to insertions of new transactions. We have also investigated the cases of updates including deletions and/or modifications to a transaction database. In FUP* and MLUp, the incremental updating technique makes use of the fact that new winners generated in the updating process must appear and have enough support counts in the increment. However, this does not hold in general in the cases of deletion and modification. For example, in the case of deletion, because the size of the updated database has decreased, some itemsets which are "small" in the original database $DB$ could become large in the updated database, even though they are not contained in any deleted transaction. Consequently, the set of candidate sets cannot be limited to those that appear in the increment, and potentially all itemsets in the updated database have to be considered as candidates. Therefore, the current incremental technique cannot be applied directly to the cases of deletion and modification. However, it is possible to solve the deletion and modification cases if the initial mining process is enhanced to retain more information to support the update.

The extension of our incremental update technique for the maintenance of other types of knowledge such as generalized AR, episodes, sequential patterns, and quantitative AR is an interesting topic for future research. However, as discussed above, a bigger challenge is to extend this technique to cover the cases of deletion and modification.

References

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, pp. 207-216, Washington, D.C., May 1993.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. VLDB, pp. 487-499, Santiago, Chile, Sept. 1994.
[3] R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. 1995 Int. Conf. Data Engineering, pp. 3-14, Taipei, Taiwan, March 1995.
[4] D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association rules in large databases: An incremental updating technique. In Proc. 1996 Int'l Conf. on Data Engineering, New Orleans, Louisiana, Feb. 1996.
[5] D.W. Cheung, J. Han, V. Ng, A. Fu and Y. Fu. A Fast Distributed Algorithm for Mining Association Rules. Technical Report, Dept. of Computer Science, The University of Hong Kong, 1996.
[6] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
[7] J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In Proc. 1995 Int. Conf. VLDB, pp. 420-431, Zurich, Switzerland, Sept. 1995.
[8] M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. In Proc. 3rd Int'l Conf. on Information and Knowledge Management, pp. 401-408, Gaithersburg, Maryland, Nov. 1994.
[9] H. Mannila, H. Toivonen, and A. I. Verkamo. Discovering Frequent Episodes in Sequences. In Proc. 1st Int'l Conf. on KDD, pp. 210-215, Montreal, Quebec, Canada, Aug. 1995.
[10] J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data, pp. 175-186, San Jose, CA, May 1995.
[11] A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In Proc. 1995 Int. Conf. VLDB, pp. 432-443, Zurich, Switzerland, Sept. 1995.
[12] R. Srikant and R. Agrawal. Mining generalized association rules. In Proc. 1995 Int. Conf. VLDB, pp. 407-419, Zurich, Switzerland, Sept. 1995.
[13] R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, Montreal, Canada, June 1996.