UP-Growth: An Efficient Algorithm for High Utility Itemset Mining
Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu
SIGKDD 2010
2010/8/25

Outline
  Motivation
  Problem Definition
  Method
    UP-Tree Structure
    UP-Growth Method
  Experimental Results
  Conclusions

Motivation
  Frequent itemset mining does not take the unit profits and purchased quantities of items into consideration.
  The utility of an item reflects its interestingness, importance, or profitability to the users.

Motivation (Cont.)
  The utility of items in a transaction database consists of two aspects:
    External utility: the importance of distinct items.
    Internal utility: the importance of the items in the transaction.
  The utility of an itemset is defined as the external utility multiplied by the internal utility.
  High utility itemset: an itemset whose utility is no less than a user-specified threshold.

Motivation (Cont.)
  Mining high utility itemsets from databases is not an easy task, since the downward closure property used in frequent itemset mining cannot be applied here.
  The challenge is to prune the search space effectively while still capturing all high utility itemsets without missing any.

Problem Definition
  Utility of an item $i_p$ in a transaction $T_d$:
    $u(i_p, T_d) = p(i_p) \times q(i_p, T_d)$
    Example: u({A}, T1) = 5×1 = 5
  Utility of an itemset $X$ in a transaction $T_d$:
    $u(X, T_d) = \sum_{i_p \in X} u(i_p, T_d)$
    Example: u({AC}, T1) = u({A}, T1) + u({C}, T1) = 5 + 1 = 6
  Utility of an itemset $X$ in the database $D$:
    $u(X) = \sum_{X \subseteq T_d \wedge T_d \in D} u(X, T_d)$
    Example: u({AD}) = u({AD}, T1) + u({AD}, T3) = 7 + 17 = 24
  An itemset is called a high utility itemset if its utility is no less than min_util.
  Transaction utility of $T_d$:
    $TU(T_d) = u(T_d, T_d)$
    Example: TU(T1) = u({ACD}, T1) = 8
  Transaction-weighted utilization of $X$:
    $TWU(X) = \sum_{X \subseteq T_d \wedge T_d \in D} TU(T_d)$
    Example: TWU({AD}) = TU(T1) + TU(T3) = 8 + 30 = 38
  If TWU(X) is no less than the minimum utility threshold, X is called a high transaction-weighted utilization itemset (abbreviated as HTWUI).
  Transaction-weighted downward closure (TWDC): for any itemset X, if X is not an HTWUI, any superset of X is a low utility itemset.
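The definitions above can be checked with a few lines of Python. This is only a minimal sketch: the profit table and the five-transaction database are assumptions, since the example tables did not survive in the slide text, and they were chosen so that the numbers quoted on the slide are reproduced. The function names (`u_item`, `u_in`, `u`, `tu`, `twu`, `is_htwui`) are hypothetical helpers, not part of the original work.

```python
from typing import Dict, Iterable

# ASSUMED example data: unit profit p(i) per item (external utility).
PROFIT: Dict[str, int] = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}

# ASSUMED example database: purchased quantity q(i, T) per item (internal utility).
DB: Dict[str, Dict[str, int]] = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}


def u_item(item: str, tid: str) -> int:
    """u(i_p, T_d) = p(i_p) * q(i_p, T_d)."""
    return PROFIT[item] * DB[tid][item]


def u_in(itemset: Iterable[str], tid: str) -> int:
    """u(X, T_d): sum of the item utilities of X inside one transaction."""
    return sum(u_item(i, tid) for i in itemset)


def contains(tid: str, itemset: Iterable[str]) -> bool:
    """True if every item of the itemset occurs in transaction tid."""
    return all(i in DB[tid] for i in itemset)


def u(itemset: Iterable[str]) -> int:
    """u(X): utility of X summed over all transactions that contain X."""
    items = list(itemset)
    return sum(u_in(items, tid) for tid in DB if contains(tid, items))


def tu(tid: str) -> int:
    """TU(T_d) = u(T_d, T_d): utility of the whole transaction."""
    return u_in(DB[tid], tid)


def twu(itemset: Iterable[str]) -> int:
    """TWU(X): sum of TU over all transactions that contain X."""
    items = list(itemset)
    return sum(tu(tid) for tid in DB if contains(tid, items))


def is_htwui(itemset: Iterable[str], min_util: int) -> bool:
    """X is an HTWUI if TWU(X) is no less than min_util."""
    return twu(itemset) >= min_util
```

With these assumed values, `u(["A", "D"])` evaluates to 24 and `twu(["A", "D"])` to 38, matching the examples above. For a threshold of 40, {AD} is not an HTWUI, so by TWDC none of its supersets needs to be examined.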
Proposed Method
  Construction of UP-Tree
  Generation of potential high utility itemsets (PHUIs) from the UP-Tree by UP-Growth

Construction of UP-Tree
  The UP-Tree can be constructed with two scans of the original database.
  First scan
    The TU of each transaction is computed.
    The TWU of each single item is accumulated.
  Discarding global unpromising items
    Unpromising items are removed from each transaction, and their utilities are subtracted from the TU of the transaction.
    The remaining promising items in each transaction are sorted in descending order of TWU.
  Second scan
    Transactions are inserted into the UP-Tree.

Construction of UP-Tree (Cont.)
  First scan, with min_util = 40
  [Figure: unpromising items are marked; the remaining items are listed in descending order of TWU]

Construction of UP-Tree (Cont.)
  Second scan
  [Figures: the reorganized transactions are inserted into the UP-Tree one by one]
  Strategy 1. Discarding global unpromising items (DGU).

Generating PHUIs from the global UP-Tree
  An item i_p is called a local promising item in {a_i}-CPB if pu(i_p, {a_i}-CPB) is no smaller than min_util; otherwise it is a local unpromising item.
  [Figure: {D}'s conditional pattern base ({D}-CPB)]
  {A} is a local unpromising item in {D}-CPB, so no superset of {AD} is a high utility itemset.

Generating PHUIs from the global UP-Tree (Cont.)
  Generating PHUIs from {D}-Tree: {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53}
  A set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {BE}:54, {BEC}:54, {BC}:54, {A}:65, {AC}:55, {ACE}:47, {AE}:47, {E}:88, {EC}:76, {C}:96}.

Decreasing global node utilities (DGN) in the construction of a global UP-Tree
  Strategy 2. Decreasing global node utilities (DGN): the utilities of a node's descendants are discarded from the utility of the node during the construction of a global UP-Tree.
  [Figure: {B}-CPB]

Decreasing global node utilities (Cont.)
  [Figures: step-by-step insertion; node label (count, node utility): (1, 1)]
  {C}.nu = 1 + p({C}) × q({C}, T2’) = 1 + 1×6 = 7
  Node label (count, node utility): (2, 7)

Decreasing global node utilities (Cont.)
  {E}.nu = p({C}) × q({C}, T2’) + p({E}) × q({E}, T2’) = 1×6 + 3×2 = 12
  Node labels (count, node utility): (2, 7), (1, 12)

Decreasing global node utilities (Cont.)
  {A}.nu = p({C}) × q({C}, T2’) + p({E}) × q({E}, T2’) + p({A}) × q({A}, T2’) = 1×6 + 3×2 + 5×2 = 22
  Node labels (count, node utility): (2, 7), (1, 12), (1, 22)

Decreasing global node utilities (Cont.)
  A set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEBC}:45, {DEC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {A}:65, {E}:88, {C}:96}.

UP-Growth
  UP-Growth generates PHUIs from the global UP-Tree efficiently by means of two strategies:
    DLU (Discarding local unpromising items)
    DLN (Decreasing local node utilities)

DLU
  Because memory is limited, the exact utility values of the items in a conditional pattern base are not maintained; a minimum item utility table (MIUT) is maintained instead.
  Strategy 3. Discarding local unpromising items (DLU): the minimum item utilities of unpromising items are discarded from the path utilities of the paths during the construction of a local UP-Tree.

DLU (Cont.)
  Path {AC}: 8 - miu({A}) × {AC}.count = 8 - 5×1 = 3
  Path {BAEC}: 25 - miu({A}) × {BAEC}.count = 25 - 5×1 = 20

DLN
  Strategy 4. Decreasing local node utilities (DLN): the minimum item utilities of a node's descendants are subtracted from the utility of the node during the construction of a local UP-Tree.
  Node label (count, node utility): (1, 3)

DLN (Cont.)
  3 + {20 - miu({B})×1 - miu({E})×1} = 3 + 13 = 16
  20 - miu({E})×1 = 20 - 3 = 17
  Node labels (count, node utility): (2, 16), (1, 17), (1, 20)

DLN (Cont.)
  16 + {20 - miu({B})×1 - miu({E})×1} = 16 + 13 = 29
  17 + {20 - miu({E})×1} = 17 + 17 = 34
  Node labels (count, node utility): (3, 29), (2, 34), (2, 40)

Experimental Results
  [Figure]

Scalability
  [Figure]

Conclusions
  This paper proposed UP-Growth, an efficient algorithm for mining high utility itemsets.
  A UP-Tree structure is proposed for maintaining the information of high utility itemsets.
  With the four strategies (DGU, DGN, DLU, and DLN), mining performance improves significantly because both the search space and the number of candidates are effectively reduced.