1, From: KDD-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved. M a i n te n a n c e o f D i s c o v e re d Kn o w l e d g e : A Case in M u l ti -l e v e l A s s o c i a ti o n Rules David W . Ch e u n g t Vin c e n t T . N& t D e p a rtm e n t o f C o m p u te r S c i e n c e , T h e U n i v e rs i ty t D e p a rtm e n t o f H o n g K o n g , H o n g K o n g . E m a i l : {d c h e u n g l w k ta m )@ c s .h k u .h k . o f C o m p u ti n g , H o n g K o n g P o l y te c h n i c A b s tra c t A n i n c re m e n ta l te c h n i q u e a n d a fa s t a l g o r i t h m F U P h a v e b e e n p ro p o s e d p re v i o u s l y fo r th e u p d a te o f d i s c o v e re d s i n g l e - l e v e l a s s o c i a ti o n ru l e s ( S L A R ) . In th i s s tu d y , a m o re e ffi c i e n t a l g o r i t h m F U P * , w h i c h g e n e ra te s a s m a l l e r n u m b e r o f c a n d i d a te s e ts w h e n c o m p a r i n g w i th F U P , h a s b e e n p ro p o s e d . In a d d i ti o n , w e h a v e d e m o n s tra te d th a t th e i n c re m e n ta l te c h n i q u e i n F U P a n d F U P * c a n b e g e n e ra l i z e d to s o m e o th e r k d d s y s te m s . A n e ffi c i e n t a l g o r i t h m M L U p h a s b e e n p ro p o s e d fo r th i s p u rp o s e fo r th e u p d a ti n g of d i s c o v e re d m u l ti - l e v e l a s s o c i a ti o n ru l e s (M L A R ) . O u r p e rfo rm a n c e s tu d y s h o w s th a t M L U p h a s a s u p e r i o r p e rfo rm a n c e o v e r M L -T 2 i n u p d a ti n g d i s c o v e re d M L A R . 1 An plies U n i v e rs i ty , H o n g K o n g . E m a i l : c s ty n g @ c o m p .p o l y u .e d u .h k . th e te c h n i q u e c o u l d b e g e n e ra l i z e d to s o l v e th e u p d a te p ro b l e m i n s o m e o th e r k d d s y s te m s . T h e re m a i n i n g o f th e p a p e r i s o rg a n i z e d a s fo l l o w s . In S e c ti o n 2 , th e fa s te r v e rs i o n F U P * i s p ro p o s e d . In S e c ti o n 3 , th e p ro b l e m o f u p d a ti n g M L A R i s d i s c u s s e d a n d th e a l g o ri th m M L U p fo r th e u p d a te o f d i s c o v e re d M L A R i s d i s c u s s e d . In S e c ti o n 4 , a n i n -d e p th p e rfo rm a n c e s tu d y o f M L U p i s p re s e n te d . S e c ti o n 5 i s th e d i s c u s s i o n a n d conclusions. 2 U p d a te o f D i s c o v e re d SL AR In th e fo l l o w i n g d i s c u s s i o n , w e u s e th e s a m e n o ta ti o n a s u s e d i n (4 ). W e s u m m a ri z e th e fi n d i n g o f (4 ) i n L e m m a 1 . F o r a c o m p l e te d e s c ri p ti o n o f F U P , p l e a s e s e e (4 ). Lemma In tro d u c ti o n a s s o c i a ti o n ru l e ( A R ) i s a s tro n g c e rta i n a s s o c i a ti o n re l a ti o n s h i p s Be n j a m i n W . T a m t ru l e w h i c h i m among a set of o b j e c ts i n a d a ta b a s e . S i n c e fi n d i n g i n te re s ti n g A R i n d a ta b a s e s m a y d i s c l o s e s o m e u s e fu l p a tte rn s fo r d e c i s i o n s u p p o rt, m a rk e ti n g a n a l y s i s , fi n a n c i a l fo re c a s t, s y s te m fa u l t p re d i c ti o n , a n d m a n y o th e r a p p l i c a ti o n s , i t h a s a ttra c te d a l o t o f a tte n ti o n i n re c e n t d a ta m i n i n g re s e a rc h (6 ). E ffi c i e n t m i n i n g o f A R i n tra n s a c ti o n a n d /o r re l a ti o n a l d a ta b a s e s h a s b e e n s tu d i e d s u b s ta n ti a l l y (1 ; 2 ; 8 ; 1 0 ; 1 2 ; 7 ; 1 1 ; 1 3 ). In o u r p re v i o u s s tu d y , w e h a v e i n v e s ti g a te d th e m a i n te n a n c e p ro b l e m i n S L A R d i s c o v e ry (4 ). A n e ffi c i e n t a l g o ri th m F U P (F a s t U p d a te ) h a s b e e n p ro p o s e d , w h i c h c a n i n c re m e n ta l l y u p d a te th e A R d i s c o v e re d , i f u p d a te s to a d a ta b a s e i s re s tri c te d to i n s e rti o n s o f n e w tra n s a c ti o n s . In th i s p a p e r, w e w i l l re p o rt tw o p ro g re s s e s i n o u r s tu d y o f th e m a i n te n a n c e p ro b l e m i n th e m i n i n g o f a s s o c i a ti o n ru l e s . (1 ) A fa s te r v e rs i o n F U P * o f F U P h a s b e e n p ro p o s e d . T h e i m p ro v e m e n t o f F U P * o v e r F U P i s i n th e c a n d i d a te s e t g e n e ra ti o n p ro c e d u re . (2 ) A n a l g o ri th m M L U p (s ta n d s fo r M u l ti - L e v e l assoc i a ti o n ru l e s U p d a te ) h a s b e e n p ro p o s e d fo r th e u p d a te o f th e d i s c o v e re d M L A R i n re l a ti o n d a ta b a s e s (7 ). T h e s u c c e s s o f th e i n c re m e n ta l u p d a ti n g te c h n i q u e u s e d i n S L A R a n d M L A R s u g g e s ts th a t, p o te n ti a l l y , 1 (4 ) A k - i t e m s e t X n o t i n th e o ri g i n a l E u rg e k - i t e m s e ts L k c a n b e c o m e a w i n n e r, (i .e ., b e c o m e l a rg e ) i n th e u p d a te d d a ta b a s e D B U d b o n l y i ffX .S U p p O T td 2 0 s x d. A fa s te r u p d a te a l g o ri th m F UP* T h e i m p ro v e m e n t o f F U P * o v e r F U P (4 ) i s i n th e c a n d i d a te s e t g e n e ra ti o n m e c h a n i s m . F U P u s e s th e A p ri o ri -g e n fu n c ti o n d e fi n e d i n (2 ) to e s ta b l i s h a s e t o f c a n d i d a te s e ts (4 ). In fa c t, i t l o o k s fo r i te m s e ts i n A p ri o ri -g e n (L i -r) w h i c h d o e s n o t b e l o n g to L k b u t a p p e a r i n s o m e tra n s a c ti o n (s ) i n d b , w h o s e s u p p o rt c o u n t i n d b i s l a rg e r th a n o r e q u a l to s x d , w h e re th e s e t L i ., i s th e s e t o f s i z e -(k -l ) l a rg e i te m s e t i n th e u p d a te d a ta b a s e fo u n d i n th e (k -1 )-th i te ra ti o n o f F U P . W e fi n d o u t th a t th e d o m a i n i n th i s s e a rc h i n g w h i c h i s c a n b e fu rth e r re d u c e d to a th e s e t A p ri o ri -g e n (L /,-i ) s m a l l e r s e t. T h i s fi n d i n g i s s u p p o rte d b y th e fo l l o w i n g re s u l t. ( A re s u l t s i m i l a r to L e m m a 2 fo r p a rti ti o n e d d a ta b a s e s h a s b e e n re p o rte d i n (5 )). Lemma 2 A k - i t e m s e t X n o t i n th e o r i g i n & l a rg e k i te m s e ts L k c a n b e c o m e a w i n n e r ( i . e ., b e c o m e l a rg e ) i n th e u p d a te d d a ta b a s e D B U d b o n l y i f Y .s u p p o rtd 1 s x d fo r a l l th e s u b s e ts Y c X . P r o o f. It fo l l o w s fro m L e m m a 1 th a t x .S U p p O rtd > s x d . If Y C X , th e n Y . S U p p O T t d 2 X.SUppOrtd. Hence, th e c o n d i ti o n 0 h o l d s fo r a l l th e s u b s e ts Y o f X . M i n i n g A s s o c i a ti o n R u l e s 307 From Lemma 2, the candi d ate sets can be restri c ted to the sets i n Apri o ri - gen(L$-,), where L;., are the whose support counts i n db are l a rger i t emsets i n L’,-,, than or equal to s x d. In general , L;-, i s smal l e r and hence the number of candi d ate sets than Li - ,, i n Apri o ri - gen(LE-,) i s smal l e r than that i n Apri o ri gen(L’,-,). In the fol l o wi n g, we wi l use Exampl e 1 to i l u strate the executi o n of FUP*. In parti c ul a r, the exampl e wi l show that FUP* can reduce si g ni f i c antl y the number of candi d ate sets. Exampl e 1 A database DB i s updated wi t h an i n crement db such that D = 1000, d = 100 and s = 3%. X, Y, 2, and W are four i t ems and the si z e-l and si z e2 l a rge i t emsets i n DB are L1 = {X, Y, 2) and L2 = {XY, YZ), respecti v el y . Al s o XY.supportr, = 32 and YZ.SUppO?' t D = 31. Suppose FUP* has compl e ted the fi r st i t erati o n and found the “n ew” si z e-l i t emsets assumi n g that the supM oreover, L$ = {X,Y, W). port counts of X, Y, and W found i n db are 2, 4 and 5, respecti v el y . Thi s exampl e i l u strates how FUP* wi l fi n d out L’, i n the second i t erati o n, and al s o i t s effecti v eness i n reduci n g the number of candi d ate sets. FUP* fi r st fi l t ers out l o sers from L2. Note that Z E L1 - L:, i . e., Z has become a l o ser; therefore, the set Y Z E L:! must al s o be a l o ser and i s fi l t ered out. For the remai n i n g set XY E Lz, FUP* scans db to update i t s support count. Assume that xY.SUppOrtdb = 2. Si n ce XY.supportu~ = (2+32) > 3%x 1100, therefore, XY i s l a rge i n DB U db and i s stored i n L’,. Secondl y , FUP* needs to fi n d out the “n ew” l a rge i t emsets from db. For thi s purpose, FUP* has to fi n d contai n s the i t emsets out the set LT from Li , whi c h i n Li that have enough support counts i n db. Si n ce x.SUppOrtd = 2 < 3% x 100, X 6 L!, i . e., even though X i s a wi n ner i n the 1st i t erati o n, it wi l not be used to generate the si z e-2 candi d ate sets. On the other hand, both the support counts of Y and W i n db are l a rger than the threshol d 3%x 100; therefore they wi l be used to generate the si z e-2 candi d ate sets, i . e., LT = {Y, W}. Fol l o wi n g that, FUP* appl i e s Apri o ri - gen on LT and generates the candi d ate set C2 = {YW}. Note that i n FUP, Apri o ri - gen i s appl i e d on Li = {X, Y, W} i n stead sets generated wi l have of LT , and the set of candi d ate three i t emsets whi c h i s three ti m es l a rger than what i s generated i n FUP* i n thi s exampl e . Thi s i l u strates that FUP* can si g ni f i c antl y and effecti v el y reduce the number of candi d ate sets when compari n g wi t h FUP. SuppOSe Yw.SuppOrtd = 4 > 3% x 100. It fdOWS from Lemma 1 that Y W wi l not be pruned and remai n i n C2. Fol l o wi n g the pruni n g of the candi d ate sets i n C2, FUP* has to update the remai n i n g candi d ate sets i n Cz agai n st the ori g i n al database DB. Suppose Yw.SUppOrtD = 29. Si n ce Yw.SUppOrtuD = 29 + 4 > 3% x 1100, it i s a l a rge i t emset i n the updated database. Therefore Y W i s added i n to L/,. At the end of the Cl second i t erati o n, L’, = {XY, Y W} i s returned. 308 Technol o gy Spotl i g ht 3 Update of Di s covered MLAR The method used i n FUP* (and FUP) coul d be appl i e d to many other kdd systems to update the knowl e dge di s covered. In parti c ul a r, it can be used i n the systems that are desi g ned to di s cover vari o us types of associ a ti o ns between general i z ed i t ems and events. Thi s incl u des the di s covery of MLAR, general i z ed AR, sequenti a l patterns, epi s odes, and quanti t ati v e AR (7; 12; 3; 9; 13). In the fol l o wi n g, we wi l show that FUP” can be general i z ed to sol v e the update probl e m for MLAR. For thi s purpose, an al g ori t hm MLUp, whi c h i s an adaptati o n of FUP*, wi l be proposed. Mi n i n g of MLAR In the study of mi n i n g MLAR, a seri e s of al g ori t hms have been proposed to faci l i t ate a top-down, progressi v e deepeni n g method based on the al g ori t hms for mi n i n g SLAR. The method fi r st fi n ds l a rge data i t ems at the top-most l e vel and then progressi v el y deepens the mi n i n g process i n to thei r l a rge descendants at l o wer concept l e vel s . For detai l s on the mi n i n g of MLAR, pl e ase refer to (7). Update of di s covered MLAR The probl e m of updati n g the di s covered MLAR i s the same as that i n the si n gl e -l e vel envi r onment. The onl y di f ference i s that the rul e s i n al l the l e vel s have to be updated i n stead of updati n g the rul e s i n onl y one l e vel . Al s o, the mi n i m um support threshol d s at di f ferent l e vel s may not be equal . We use sm to denote the mi n i m for m > 1. mum support threshol d at l e vel Si n ce there are several vari a ti o ns of the al g ori t hm i n mi n i n g MLAR, the update al g ori t hm shoul d be desi g ned accordi n g to the strategy used i n the i n i t i a l mi n The al g ori t hm MLUp we are proposi n g i n g process. i s associ a ted wi t h the representati v e mi n i n g al g ori t hm ML-T2. The fol l o wi n g two resul t s are the bases of MLUp. Lemma i t emset 3 In a mul t i - l e vel X not i n the become envi r onment, a l e vel - m l a rge 1-i t emsets L[m, a wi n ner (i . e., become l a rge) DB U db onl y if al l ancestors ori g i n al (m 2 l), can the updated database X are wi n ners. Proof. Thi s fol l o ws from the defi n i t i o n i n the mul t i - l e vel envi r onment. of l a rge l11, in of i t emsets 0 Fol l o wi n g Lemma 3, when MLUp scans the i n crement db to l o ok for new si z e-l wi n ners, it not onl y has to ensure a candi d ate i t emset has the requi r ed support count, but must al s o check that al l i t s ancestors are l a rge i n the updated database. (Because of transi t i v i t y, MLUp onl y needs to check a candi d ate’s i m medi a te ancestor). Lemma 4 In a mul t i - l e vel envi r onment, a l e vel - m ki t emset X not i n the ori g i n al l a rge k-i t emsets L[m, k], (m 2 l ) , can become a wi n ner (i . e., become l a rge) in the updated database DB U db onl y if X.SUppOrtd 2 s,,, x d and Y.SUppOrtd 1 sm x d, for al l subset Y E X. Proof. Thi s fol l o ws di r ectl y from Lemmas 1 and 2. 0 The i m pl i c ati o n of the resul t i n Lemma 4 i s that the candi d ate set generati o n mechani s m i n FUP* can be appl i e d di r ectl y i n MLUp for fi n di n g new wi n ners i n di f ferent l e vel s . In the fol l o wi n g, we descri b e the mai n procedure of the update al g ori t hm MLUp. The i n put to the al g ori t hm i n cl u des the ori g i n al encoded transacti o n database T[l ] , the i n crement database db, and the ol d l a rge i t emsets L[m, k], (m 2 1, k > l ) , and thei r support counts. Fol l o wi n g the conventi o ns by D i n FUP*, the si z es of T[l ] and db are denoted and d respecti v el y . Moreover, the mi n i m um support threshol d for di f ferent l e vel i s denoted by si n , (m 2 1). MLUp (mai n steps) : 1. Transl a te the i n crement transacti o n database db i n to an encoded transacti o n tabl e db[l ] accordi n g to the gi v en taxonomy i n formati o n. 2. At l e vel 1, scan db[l ] to update the support counts of the 1-i t emsets i n L[l , l] to fi l t er out the wi n ners i n to L’[l, 11. In the same scan, fi n d al l the 1-i t emsets do not bel o ng to L[l , 11, whose support i n db[l ] whi c h count i n db[l ] i s l a rger than or equal to s1 x d, and store these 1-i t emsets i n the candi d ate set Cl . Subsequentl y , scan T[l ] to fi n d out the new wi n ners i n Cl and store them i n to L’[l, 11. Fol l o wi n g that, T[l ] i s fi l t ered by L’[l, l] to generate the encoded transacti o n tabl e 2721. Si m i l a rl y , db[l ] i s fi l t ered to db[2]. At l e vel m, (m > 1), scan db[2] to update the support counts of the 1-i t emsets i n L[m, 11. An 1-i t emset in L[m, l ] i s a wi n ner onl y if i t s i m medi a te ancestor is l a rge i n the updated database and i t s support counts i n the updated database i s l a rger than or equal to s,,, x (D + d). In the same scan, fi n d al l l e vel - m 1-i t emsets i n db[2] whi c h do not bel o ng to L[m, 11, whose i m medi a te ancestor bel o ngs to L’[m1, l] and whose support count than or equal to srrs x d. Then store i n db[2] i s l a rger these 1-i t emsets i n the candi d ate set Cl . Subsequentl y , scan T[2] to fi n d out the new wi n ners i n Cl and store them i n L’[m, 11. 3. The l a rge k-i t emsets, (k > l ) , for the updated database at l e vel m i s deri v ed i n three steps: (1) Remove al l the k-i t emsets i n L[m, k] for whi c h one of i t s ancestors i s not l a rge i n the updated database. Then scan db[2] to update the support counts of the remai n i n g i t emsets i n L[m, k] to fi n d out the wi n ners. (2) Let L*[m, k - l] be the subsets of i t emsets i n L’[m, k - l] whose support count i n db i s l a rger then or equal to s, x d. In the same scan on db[2] performed i n (1)) fi n d al l l e vel - m k-i t emsets i n Apri o ri gen(L* [m, k-l ] ) whi c h do not bel o ng to L[m, k], whose support count i n db[2] i s l a rger than or equal to snb x d, and store them i n the candi d ate set Ck. (3) Scan DB to update the support counts of the candi d ate sets i n Ck and fi n d al l the l e vel - m si z e-k wi n ners i n Ck, and store them i n L’[m, k]. 4. At l e vel m, return the uni o n of L’[m, k] for al l the k’s. 4 Performance Study of MLUp Extensi v e experi m ents have been conducted to assess the performance of MLUp. It was compared wi t h the al g ori t hm ML-TP. The experi m ents were performed on an AIX system on an RS/SOOO workstati o n wi t h model 410. The resul t shows that MLUp i s much faster than re-runni n g ML-T2 to update the di s covered AR. Thi s i m provement i s not surpri s i n g gi v en that FUP al s o has si m i l a r performance i n updati n g SLAR. The databases used i n our experi m ents are syntheti c data generated usi n g a techni q ue si m i l a r to that i n (2). Fi g ure 1: Performance Compari s on (l e vel 1) test envi r onments are denoted by Our T10.14.D100.d10.s~s~~s-s~, whi c h represents an updated database i n whi c h the ori g i n al database DB has 100 thousands of transacti o ns (Dl O O), the i n crement db has 10 thousands of transacti o ns (d10). The transacti o ns on average has 10 i t ems (Tl O ), and the average si z e of the l a rge i t emsets i s 4 (14). Moreover, there are four l e vel s i n the taxonomy and the mi n i m um supports are denoted by si , (1 5 i 5 4). The performance compari s on between MLUp and ML-T2 i n the update of the l e vel - l AR i s pl o tted i n Fi g ure 1 agai n st di f ferent mi n i m um support threshol d s. Thei r performance rati o s are al s o presented as bar charts i n the same fi g ure. It can be seen that MLUp i s 2-3 ti m es faster than MLT2. MLUp al s o has si m i l a r speed-up over ML-T2 in the updates i n the other l e vel s . As expl a i n ed before, MLUp reduces substanti a l y the number of candi d ate sets generated when compari n g wi t h ML-T:!. In Fi g ure 2, the number of candi date sets generated i n MLUp i n the same experi m ent i s compared wi t h that i n ML-TP. The rati o s i n the compari s on are presented as bar charts i n the same fi g ure. The chart shows that the number of candi d ate sets generated by MLUp i s onl y about 2-3% of that i n ML-T2. A seri e s of updates from 10K to 350K were generated on the databases T10.14.Dl 0 0, and the executi o n ti m es for MLUp and ML-T2 to do the updates on these i n crements were compared. A gradual l y l e vel off of the speed-up of MLUp over ML-T2 onl y appears when the i n crement si z e i s about 3.5 ti m es the si z e of the ori g i n al database. The fact that MLUp sti l exhi b i t s performance gai n when the i n crement i s much l a rger than the ori g i n al database shows that it i s very effi c i e nt. Mi n i n g Associ a ti o n Rul e s 309 del e ti o n and modi f i c ati o n. References Fi g ure 2: Reducti o n 5 Di s cussi o n of Candi d ate and Sets (l e vel 1) Concl u si o ns We have shown that FUP* i s an effi c i e nt al g ori t hm for updati n g di s covered SLAR. It i m proves the performance of FUP by si g ni f i c antl y reduci n g i t s candi d ate sets. We have al s o proposed an effi c i e nt al g ori t hm MLUp for updati n g di s covered MLAR. It i s an adaptati o n of the FUP* al g ori t hm i n the mul t i - l e vel envi r onment. The al g ori t hm MLUp i s i m pl e mented and i t s performance i s studi e d and compared wi t h the ML-T2 al gori t hm . The study shows that MLUp has superi o r performance i n the mul t i - l e vel envi r onment. The success of the i n cremental updati n g techni q ue i n both the SLAR and MLAR suggests that the techni q ue coul d be general i z ed to sol v e the update probl e ms i n some other knowl e dge di s covery systems. Currentl y , both FUP* and MLUp are appl i c abl e onl y to a database whi c h al l o w frequent or occasi o nal updates restri c ted to i n serti o ns of new transacti o ns. We have al s o i n vesti g ated the cases of updates incl u di n g del e ti o ns and/or modi f i c ati o ns to a transacti o n database. In FUP* and MLUp, the i n cremental updati n g techni q ue has made use of the fact that new wi n ners generated i n the updati n g process must appear and have enough support counts i n the i n crement. However, thi s does not hol d i n general i n the cases of del e ti o n and modi f i c ati o n. For exampl e , i n the case of del e ti o n, because the si z e of the updated database has decreased, some i t emsets whi c h are “s mal l ” i n the orgi n i a l database DB, coul d become l a rge i n the updated database, even though it i s not contai n ed i n any transacti o n del e ted. Consequentl y , the set of candi date sets cannot be l i m i t ed to those appear i n the i n crement, and potenti a l y , al l i t emsets i n the updated database have to be consi d ered as candi d ates. Therefore, the current i n cremental techni q ue cannot be appl i e d di r ectl y to the cases of del e ti o n and modi f i c ati o n. However, it i s possi b l e to sol v e the del e ti o n and modi fi c ati o n cases if the i n i t i a l mi n i n g process i s enhanced to retai n more i n formati o ns to support the update. The extensi o n of our i n cremental update techni q ue for the mai n tenance of other type of knowl e dge such as general i z ed AR, epi s odes, sequenti a l patterns, and quanti t ati v e AR i s an i n teresti n g topi c for future research. However, as di s cussed above, a bi g ger chal l e nge i s to extend thi s techni q ue to cover the cases of 310 Technol o gy Spotl i g ht [l] R. Agrawal , T. Imi e l i n ski , and A. Swami . Mi n i n g associ a ti o n rul e s between sets of i t ems i n l a rge databases. In Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data, pp. 207-216, Washi n gton, D.C., May 1993. [2] R. Agrawal and R. Sri k ant. Fast al g ori t hms for Int. Conf. mi n i n g associ a ti o n rul e s. In Proc. 1994 VLDB, pp. 487-499, Santi a go, Chi l e , Sept. 1994. [3] R. Agrawal and R. Sri k ant. Mi n i n g sequenti a l patterns. In Proc. 1995 Inn-t. Conf. Data Engi n eeri n g, pp. 3-14, Tai p ei , Tai w an, March 1995. [4] D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Mai n tenance of di s covered associ a ti o n rul e s i n l a rge databases: An i n cremental updati n g techni q ue. In Proc. 1996 Int’I Conf. on Data Engi n eeri n g, New Orl e ans, Loui s i a na, Feb. 1996. [5] D.W. Cheung, J. Han, V. Ng, A. Fu and Y. Fu. A Fast Di s tri b uted Al g ori t hm for Mi n i n g Associ a ti o n Rul e s. Techni c al Report, Dept. of Computer Sci e nce, The Uni v ersi t y of Hong Kong, 1996. [6] U. M. Fayyad, G. Pi a tetsky-Shapi r o, P. Smyth, and R. Uthurusamy. Advances i n Knowl e dge Di s covery and Data Mi n i n g. AAAI/MIT Press, 1996. [7] J. Han and Y. Fu. Di s covery of mul t i p l e -l e vel associ a ti o n rul e s from l a rge databases. In Proc. 1995 pp. 420-431, Zuri c h, Swi t zerl a nd, Int. Conf. VLDB, Sept. 1995. H. Manni l a , P. Ronkai n en, [8] M. Kl e metti n en, H. Toi v onen, and A. I. Verkamo. Fi n di n g i n teresti n g rul e s from l a rge sets of di s covered associ a ti o n rul e s. In Proc. 3rd Management, Int’l Conf. pp. on 401-408, Informati o n Gai t hersburg, and Knowl e dge Maryl a nd, Nov. 1994. [9] H. Manni l a , H. Toi v onen, and A. I. Verkamo. Di s coveri n g Frequent Epi s odes i n Sequences. In Proc. 1st Int’l Conf. on KDD, pp. 210-215, Montreal , Quebec, Canada, Aug. 1995. [l o ] J.S. Park, M.S. Chen, and P.S. Yu. An effecti v e hash-based al g ori t hm for mi n i n g associ a ti o n rul e s. In Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data, pp. 175-186, San Jose, CA, May 1995. [ll] A. Savasere, E. Omi e ci n ski , and S. Navathe. An effi c i e nt al g ori t hm for mi n i n g associ a ti o n rul e s i n l a rge databases. In Proc. 1995 Int. Conf. VLDB, pp. 432443, Zuri c h, Swi t zerl a nd, Sept. 1995. [12] R. Sri k ant and R. Agrawal . Mi n i n g general i z ed associ a ti o n rul e s. In Proc. 1995 Int. Conf. VLDB, pp. 407-419, Zuri c h, Swi t zerl a nd, Sept. 1995. [13] R. Sri k ant and R. Agrawal . Mi n i n g quanti t ati v e associ a ti o n rul e s i n l a rge rel a ti o nal tabl e s. In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, Montreal , Canada, June 1996.