UP-Growth: An Efficient Algorithm for High Utility Itemset Mining

Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and
Philip S. Yu
SIGKDD 2010
2010/8/25
Outline
• Motivation
• Problem Definition
• Method
  - UP-Tree Structure
  - UP-Growth Method
• Experimental Results
• Conclusions
Motivation
• Frequent itemset mining does not take the unit profits and purchased quantities of items into consideration.
• In general, the utility of an item expresses its interestedness, importance, or profitability to users.
(Cont.)
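• The utility of items in a transaction database consists of two aspects:
  - External utility: the importance of distinct items (e.g., unit profit).
  - Internal utility: the importance of the items within a transaction (e.g., purchased quantity).
• The utility of an item in a transaction is its external utility multiplied by its internal utility. The utility of an itemset is the sum of the utilities of its items in every transaction that contains it.
• High utility itemset: an itemset whose utility is no less than a user-specified minimum utility threshold.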
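To make the two aspects concrete, the following minimal Python sketch computes item and itemset utilities on a small example database. The quantities and unit profits are an assumption, because these slides do not list them in text. They were chosen so that they reproduce the values quoted later, such as u({A}, T1) = 5, u({AC}, T1) = 6, and u({AD}) = 24. All function names are hypothetical.

```python
# Hypothetical example database (assumption): transaction -> {item: quantity},
# where the quantity is the internal utility of the item in that transaction.
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
# Hypothetical unit profits: the external utility of each item.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}


def item_utility(item, tid):
    """Utility of one item in one transaction: external x internal utility."""
    return PROFIT[item] * DB[tid][item]


def itemset_utility_in(itemset, tid):
    """Utility of an itemset in one transaction (0 if the transaction lacks it)."""
    if not set(itemset).issubset(DB[tid]):
        return 0
    return sum(item_utility(item, tid) for item in itemset)


def itemset_utility(itemset):
    """Utility of an itemset, summed over every transaction that contains it."""
    return sum(itemset_utility_in(itemset, tid) for tid in DB)


def is_high_utility(itemset, min_util):
    """True if the itemset's utility is no less than the threshold."""
    return itemset_utility(itemset) >= min_util


print(item_utility("A", "T1"))         # 5
print(itemset_utility_in("AC", "T1"))  # 6
print(itemset_utility("AD"))           # 24
```

Itemsets are passed as strings of single-letter item names purely for brevity.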
(Cont.)
• Mining high utility itemsets from databases is not an easy task, because the downward closure property used in frequent itemset mining does not hold for utility. A superset of a low utility itemset can still be a high utility itemset.
• The key challenge is to prune the search space effectively while still capturing all high utility itemsets without missing any.
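The failure of downward closure can be checked on the same hypothetical database used above. The quantities and profits remain an assumption. Here {A} and {D} each have utility 20, but their superset {AD} has utility 24. Consider an illustrative threshold of 22, which is also an assumption and not the min_util used later in the slides. Under that threshold, {AD} is a high utility itemset even though neither of its subsets is. A low utility itemset therefore cannot be used to prune its supersets.

```python
# Hypothetical example database (same assumption as before).
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}


def utility(itemset):
    """u(X): the item utilities of X summed over the transactions containing X."""
    total = 0
    for items in DB.values():
        if set(itemset).issubset(items):
            total += sum(PROFIT[i] * items[i] for i in itemset)
    return total


MIN_UTIL = 22  # hypothetical threshold chosen only for this illustration
for x in ("A", "D", "AD"):
    print(x, utility(x), utility(x) >= MIN_UTIL)
# A 20 False
# D 20 False
# AD 24 True
```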
Problem Definition
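$$u(i_p, T_d) = p(i_p) \times q(i_p, T_d)$$
where p(i_p) is the unit profit (external utility) of item i_p and q(i_p, T_d) is its purchased quantity in transaction T_d (internal utility).
$$u(X, T_d) = \sum_{i_p \in X \,\wedge\, X \subseteq T_d} u(i_p, T_d)$$
$$u(X) = \sum_{X \subseteq T_d \,\wedge\, T_d \in D} u(X, T_d)$$
For example, u({A}, T1) = 5×1 = 5, u({AC}, T1) = u({A}, T1) + u({C}, T1) = 5 + 1 = 6, and u({AD}) = u({AD}, T1) + u({AD}, T3) = 7 + 17 = 24.
An itemset is called a high utility itemset if its utility is no less than min_util.
$$TU(T_d) = u(T_d, T_d)$$
$$TWU(X) = \sum_{X \subseteq T_d \,\wedge\, T_d \in D} TU(T_d)$$
For example, TU(T1) = u({ACD}, T1) = 8 and TWU({AD}) = TU(T1) + TU(T3) = 8 + 30 = 38.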
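The following sketch computes TU and TWU from the definitions above. It uses the same hypothetical database, whose quantities and profits are an assumption chosen to reproduce TU(T1) = 8 and TWU({AD}) = 38. The function names are illustrative.

```python
# Hypothetical example database (same assumption as before).
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}


def transaction_utility(tid):
    """TU(T_d) = u(T_d, T_d): the summed utility of every item in the transaction."""
    return sum(PROFIT[i] * q for i, q in DB[tid].items())


def twu(itemset):
    """TWU(X): the sum of TU over all transactions that contain X."""
    return sum(
        transaction_utility(tid)
        for tid, items in DB.items()
        if set(itemset).issubset(items)
    )


print({tid: transaction_utility(tid) for tid in DB})
# {'T1': 8, 'T2': 27, 'T3': 30, 'T4': 20, 'T5': 11}
print(twu("AD"))  # 38
```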
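If TWU(X) is no less than the minimum utility threshold, X is called a high transaction-weighted utilization itemset (abbreviated as HTWUI).
Transaction-weighted downward closure (TWDC): for any itemset X, if X is not an HTWUI, then every superset of X is a low utility itemset. This holds because u(X, T_d) ≤ TU(T_d) in every transaction. Consequently, for any superset Y of X, u(Y) ≤ TWU(Y) ≤ TWU(X) < min_util.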
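TWDC can be checked by brute force on the hypothetical database, whose data remain an assumption. The sketch below verifies that u(X) ≤ TWU(X) holds for every itemset, and it lists the single-item HTWUIs at min_util = 40. It also confirms that no superset of the non-HTWUI item F reaches the threshold.

```python
from itertools import combinations

# Hypothetical example database (same assumption as before).
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
MIN_UTIL = 40


def utility(itemset):
    """u(X) summed over the transactions containing X."""
    return sum(
        sum(PROFIT[i] * items[i] for i in itemset)
        for items in DB.values()
        if set(itemset).issubset(items)
    )


def twu(itemset):
    """TWU(X): TU summed over the transactions containing X."""
    return sum(
        sum(PROFIT[i] * q for i, q in items.items())
        for items in DB.values()
        if set(itemset).issubset(items)
    )


ITEMS = sorted(PROFIT)
ALL_ITEMSETS = [c for k in range(1, len(ITEMS) + 1) for c in combinations(ITEMS, k)]

# u(X) <= TWU(X) must hold for every itemset, because u(X, T_d) <= TU(T_d).
violations = [x for x in ALL_ITEMSETS if utility(x) > twu(x)]
print(violations)  # []

htwui_items = [i for i in ITEMS if twu(i) >= MIN_UTIL]
print(htwui_items)  # ['A', 'B', 'C', 'D', 'E']

# F is not an HTWUI (TWU 30 < 40), so by TWDC no superset of F reaches min_util.
best_with_f = max(utility(x) for x in ALL_ITEMSETS if "F" in x)
print(best_with_f)  # 30
```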
Proposed Method
• Construction of the UP-Tree
• Generation of potential high utility itemsets (PHUIs) from the UP-Tree by UP-Growth
Construction of UP-Tree
• The UP-Tree is constructed with two scans of the original database.
• First scan
  - The transaction utility (TU) of each transaction is computed.
  - The TWU of each single item is accumulated.
  - Global unpromising items (items whose TWU is less than min_util) are discarded. They are removed from each transaction, and their utilities are subtracted from the TU of that transaction.
  - The remaining promising items in each transaction are sorted in descending order of TWU.
• Second scan
  - The reorganized transactions are inserted into the UP-Tree.
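The first-scan steps can be sketched as follows. The code computes TU and TWU, drops items whose TWU is below min_util = 40, and sorts the survivors by descending TWU. The database is the same hypothetical one, which is an assumption. The function name `first_scan` is also hypothetical, and breaking TWU ties by item name is an additional assumption, although no ties occur here. Each reorganized transaction keeps its item utilities, so that the reduced TU is simply their sum.

```python
from collections import defaultdict

# Hypothetical example database (same assumption as before).
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
MIN_UTIL = 40  # the threshold used in the slides' example


def first_scan(db, profit, min_util):
    """Compute TU and TWU, discard global unpromising items, sort by descending TWU."""
    tu = {tid: sum(profit[i] * q for i, q in items.items()) for tid, items in db.items()}
    twu = defaultdict(int)
    for tid, items in db.items():
        for item in items:
            twu[item] += tu[tid]
    promising = {i for i, w in twu.items() if w >= min_util}
    reorganized = {}
    for tid, items in db.items():
        kept = sorted((i for i in items if i in promising), key=lambda i: (-twu[i], i))
        # Keep (item, utility) pairs; the reduced TU is the sum of these utilities.
        reorganized[tid] = [(i, profit[i] * items[i]) for i in kept]
    return dict(twu), reorganized


twu, reorganized = first_scan(DB, PROFIT, MIN_UTIL)
for tid, path in reorganized.items():
    print(tid, [i for i, _ in path], sum(u for _, u in path))
# T1 ['C', 'A', 'D'] 8
# T2 ['C', 'E', 'A'] 22
# T3 ['C', 'E', 'A', 'B', 'D'] 25
# T4 ['C', 'E', 'B', 'D'] 20
# T5 ['C', 'E', 'B'] 9
```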
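(Cont.)
[Figure: first scan of the example database with min_util = 40. Unpromising items are marked, and the remaining items of each transaction are sorted in descending order of TWU.]
(Cont.)
[Figures: second scan. The reorganized transactions are inserted into the UP-Tree one at a time, and each node is labeled with its count and node utility.]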
Strategy 1. Discarding global unpromising items (DGU).
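Once DGU has reorganized the transactions, the second scan inserts them into the tree. The minimal sketch below starts from the reorganized transactions of the hypothetical database, which are an assumption. In this version, without DGN, every node on an inserted path accumulates the transaction's reduced TU. The class `UPNode`, the function `build_up_tree`, and the use of plain lists for the header table's node links are all illustrative simplifications.

```python
from collections import defaultdict

# Reorganized transactions as (item, utility) pairs in descending TWU order.
# Assumption: derived from the hypothetical example database after DGU.
REORGANIZED = [
    [("C", 1), ("A", 5), ("D", 2)],
    [("C", 6), ("E", 6), ("A", 10)],
    [("C", 1), ("E", 3), ("A", 5), ("B", 4), ("D", 12)],
    [("C", 3), ("E", 3), ("B", 8), ("D", 6)],
    [("C", 2), ("E", 3), ("B", 4)],
]


class UPNode:
    """A UP-Tree node: item name, parent link, count, node utility, children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.nu = 0
        self.children = {}


def build_up_tree(transactions):
    """Insert each transaction; every node on its path gains the reduced TU."""
    root = UPNode(None, None)
    header = defaultdict(list)  # item -> nodes carrying that item (node links)
    for path in transactions:
        rtu = sum(u for _, u in path)
        node = root
        for item, _ in path:
            child = node.children.get(item)
            if child is None:
                child = UPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            child.nu += rtu
            node = child
    return root, header


root, header = build_up_tree(REORGANIZED)
c = root.children["C"]
print(c.count, c.nu)                              # 5 84
print(c.children["E"].count, c.children["E"].nu)  # 4 76
print([(n.count, n.nu) for n in header["D"]])     # [(1, 8), (1, 25), (1, 20)]
```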
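Generating PHUIs from the global UP-Tree
• The path utility of an item i_p in {a_i}-CPB, pu(i_p, {a_i}-CPB), is the sum of the utilities of the paths in {a_i}-CPB that contain i_p. An item i_p is called a local promising item in {a_i}-CPB if pu(i_p, {a_i}-CPB) is no less than min_util; otherwise, it is a local unpromising item.
[Figure: {D}'s conditional pattern base ({D}-CPB).]
• {A} is a local unpromising item in {D}-CPB, so no superset of {AD} is a high utility itemset.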
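The local path utilities in {D}-CPB can be computed directly from the definition of pu. The three prefix paths and their utilities below are derived from the hypothetical example database, which is an assumption. They reproduce the values on the slides: {DC}:53, {DE}:45, and {DB}:45, with {A} unpromising. The function name is illustrative.

```python
from collections import defaultdict

# {D}-CPB of a DGU-only UP-Tree built from the hypothetical database (assumption):
# each entry is (prefix path of a D node, node utility of that D node).
D_CPB = [
    (("C", "A"), 8),
    (("C", "E", "A", "B"), 25),
    (("C", "E", "B"), 20),
]
MIN_UTIL = 40


def local_path_utilities(cpb):
    """pu(i_p, CPB): the sum of the utilities of the paths that contain i_p."""
    pu = defaultdict(int)
    for path, utility in cpb:
        for item in path:
            pu[item] += utility
    return dict(pu)


pu = local_path_utilities(D_CPB)
print(pu)  # {'C': 53, 'A': 33, 'E': 45, 'B': 45}
print(sorted(i for i, v in pu.items() if v < MIN_UTIL))  # ['A']
```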
(Cont.)
• PHUIs generated from the {D}-Tree:
{{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53}
The resulting set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {BE}:54, {BEC}:54, {BC}:54, {A}:65, {AC}:55, {ACE}:47, {AE}:47, {E}:88, {EC}:76, {C}:96}.
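Decreasing global node utilities (DGN) in the construction of a global UP-Tree
Strategy 2. Decreasing global node utilities (DGN): during the construction of a global UP-Tree, the utilities of a node's descendants are discarded from the utility of the node. The node utility still overestimates what matters, because an itemset mined from a node's conditional pattern base contains only that node's item and items from its prefix path. The utilities of its descendants therefore never contribute to such an itemset.
[Figure: {B}'s conditional pattern base ({B}-CPB).]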
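(Cont.)
[Figures: the reorganized transactions are inserted one at a time into the global UP-Tree with DGN, and each node is labeled with its count and node utility. After the first transaction is inserted, node {C} has count 1 and node utility 1.]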
(Cont.)
Inserting T2’ = {C, E, A} (node labels: count, node utility):
{C}.nu = 1 + p({C})×q({C}, T2’) = 1 + 1×6 = 7 (count 2)
{E}.nu = p({C})×q({C}, T2’) + p({E})×q({E}, T2’) = 1×6 + 3×2 = 12 (count 1)
{A}.nu = p({C})×q({C}, T2’) + p({E})×q({E}, T2’) + p({A})×q({A}, T2’) = 1×6 + 3×2 + 5×2 = 22 (count 1)
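The DGN insertion rule can be sketched by accumulating a running prefix utility along each path. Each node then receives only its own utility plus those of its ancestors in that transaction. The reorganized transactions are derived from the hypothetical example database, which is an assumption, and the names are illustrative. After the first two transactions, the sketch reproduces the node utilities 7, 12, and 22 shown above.

```python
# Reorganized transactions as (item, utility) pairs in descending TWU order.
# Assumption: derived from the hypothetical example database after DGU.
REORGANIZED = [
    [("C", 1), ("A", 5), ("D", 2)],
    [("C", 6), ("E", 6), ("A", 10)],
    [("C", 1), ("E", 3), ("A", 5), ("B", 4), ("D", 12)],
    [("C", 3), ("E", 3), ("B", 8), ("D", 6)],
    [("C", 2), ("E", 3), ("B", 4)],
]


class UPNode:
    """A UP-Tree node: item name, count, node utility, children."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.nu = 0
        self.children = {}


def build_up_tree_dgn(transactions):
    """Insert transactions so each node gets only its own and its ancestors' utilities."""
    root = UPNode(None)
    for path in transactions:
        node = root
        prefix_utility = 0
        for item, utility in path:
            prefix_utility += utility  # descendants' utilities are never added (DGN)
            child = node.children.get(item)
            if child is None:
                child = UPNode(item)
                node.children[item] = child
            child.count += 1
            child.nu += prefix_utility
            node = child
    return root


two = build_up_tree_dgn(REORGANIZED[:2])
c = two.children["C"]
e = c.children["E"]
a = e.children["A"]
print((c.count, c.nu), (e.count, e.nu), (a.count, a.nu))  # (2, 7) (1, 12) (1, 22)

full = build_up_tree_dgn(REORGANIZED)
print(full.children["C"].nu, full.children["C"].children["E"].nu)  # 13 27
```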
(Cont.)
With DGN, the set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEBC}:45, {DEC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {A}:65, {E}:88, {C}:96}.
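The effect of DGN on the candidate set can be reproduced with a small recursive miner. This is a projected-database simplification of UP-Growth's recursive mining, not the UP-Tree implementation itself. It works on lists of prefix paths instead of local UP-Trees, removes local unpromising items, and applies neither DLU nor DLN. Singletons are scored by the TWU values listed on the slides. The reorganized transactions are an assumption derived from the hypothetical example database, and `mine` and `up_growth_sketch` are hypothetical names. Under those assumptions, the sketch yields the 19 PHUIs listed earlier without DGN and the 12 PHUIs listed above with DGN.

```python
from collections import defaultdict

MIN_UTIL = 40
# TWU of the promising items, as listed on the slides.
TWU = {"C": 96, "E": 88, "A": 65, "B": 61, "D": 58}
# Reorganized transactions (item, utility) after DGU, in descending TWU order.
# Assumption: derived from the hypothetical example database.
REORGANIZED = [
    [("C", 1), ("A", 5), ("D", 2)],
    [("C", 6), ("E", 6), ("A", 10)],
    [("C", 1), ("E", 3), ("A", 5), ("B", 4), ("D", 12)],
    [("C", 3), ("E", 3), ("B", 8), ("D", 6)],
    [("C", 2), ("E", 3), ("B", 4)],
]


def mine(prefix, cpb, min_util, out):
    """Grow `prefix` with each local promising item of its conditional base `cpb`."""
    pu = defaultdict(int)
    for path, utility in cpb:
        for item in path:
            pu[item] += utility
    promising = {i for i, v in pu.items() if v >= min_util}
    for item in promising:
        itemset = prefix | {item}
        out[itemset] = pu[item]
        # Conditional base of `item`: its promising predecessors on each path.
        sub = []
        for path, utility in cpb:
            if item in path:
                head = [i for i in path[: path.index(item)] if i in promising]
                if head:
                    sub.append((head, utility))
        mine(itemset, sub, min_util, out)


def up_growth_sketch(transactions, twu, min_util, dgn):
    """Return {itemset: estimated utility} for every PHUI of the example."""
    out = {}
    for item, weight in twu.items():
        if weight < min_util:
            continue
        out[frozenset([item])] = weight
        cpb = []
        for path in transactions:
            items = [i for i, _ in path]
            if item in items:
                k = items.index(item)
                # Without DGN a node carries the whole reduced TU of its transaction;
                # with DGN it carries only the utilities of the item and its ancestors.
                kept = path[: k + 1] if dgn else path
                cpb.append((items[:k], sum(u for _, u in kept)))
        mine(frozenset([item]), cpb, min_util, out)
    return out


without_dgn = up_growth_sketch(REORGANIZED, TWU, MIN_UTIL, dgn=False)
with_dgn = up_growth_sketch(REORGANIZED, TWU, MIN_UTIL, dgn=True)
print(len(without_dgn), len(with_dgn))  # 19 12
```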
UP-Growth
• UP-Growth efficiently generates PHUIs from the global UP-Tree with two strategies:
  - DLU (Discarding local unpromising items)
  - DLN (Decreasing local node utilities)
DLU
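• Because of memory limits, UP-Growth does not maintain the exact utility values of the items in a conditional pattern base. Instead, it maintains a minimum item utility table (MIUT), where miu(i_p) is the smallest utility of item i_p in any transaction that contains it.
• Strategy 3. Discarding local unpromising items (DLU): during the construction of a local UP-Tree, each local unpromising item is removed from the paths that contain it. Its minimum item utility, multiplied by the path count, is subtracted from each such path utility.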
(Cont.)
8 - miu({A})×{AC}.count = 8 - 5×1 = 3
25 - miu({A})×{BAEC}.count = 25 - 5×1 = 20
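DLU can be sketched in two steps. The first step builds the MIUT from the database. The second step removes the local unpromising items of {D}-CPB and subtracts miu × count from each path that contained them. The database and the {D}-CPB paths are the same hypothetical data used above, which are an assumption. Under that data the sketch reproduces miu({A}) = 5 and the reduced path utilities 3 and 20 computed on this slide. The function names are illustrative.

```python
from collections import defaultdict

# Hypothetical example database (same assumption as before).
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
MIN_UTIL = 40
# {D}-CPB derived from the hypothetical database: (prefix path, path utility, count).
D_CPB = [
    (["C", "A"], 8, 1),
    (["C", "E", "A", "B"], 25, 1),
    (["C", "E", "B"], 20, 1),
]


def minimum_item_utilities(db, profit):
    """MIUT: the smallest utility each item has in any transaction containing it."""
    miu = {}
    for items in db.values():
        for item, q in items.items():
            u = profit[item] * q
            miu[item] = min(miu.get(item, u), u)
    return miu


def apply_dlu(cpb, miu, min_util):
    """Drop local unpromising items and subtract miu x count from their paths."""
    pu = defaultdict(int)
    for path, utility, _ in cpb:
        for item in path:
            pu[item] += utility
    unpromising = {i for i, v in pu.items() if v < min_util}
    reduced = []
    for path, utility, count in cpb:
        removed = [i for i in path if i in unpromising]
        kept = [i for i in path if i not in unpromising]
        reduced.append((kept, utility - sum(miu[i] * count for i in removed), count))
    return reduced


miu = minimum_item_utilities(DB, PROFIT)
print(miu["A"], miu["B"], miu["E"])  # 5 4 3
result = apply_dlu(D_CPB, miu, MIN_UTIL)
for entry in result:
    print(entry)
# (['C'], 3, 1)
# (['C', 'E', 'B'], 20, 1)
# (['C', 'E', 'B'], 20, 1)
```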
DLN
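• Strategy 4. Decreasing local node utilities (DLN):
  - During the construction of a local UP-Tree, the minimum item utilities of a node's descendants, multiplied by the path count, are subtracted from the utility of the node.
After the first path {C} (path utility 3) is inserted, node {C} has count 1 and node utility 3.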
DLN
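Inserting the second path (path utility 20 after DLU, count 1) along C → B → E, with miu({B}) = 4 and miu({E}) = 3:
{C}.nu = 3 + {20 - miu({B})×1 - miu({E})×1} = 3 + 13 = 16 (count 2)
{B}.nu = 20 - miu({E})×1 = 20 - 3 = 17 (count 1)
{E}.nu = 20 (count 1)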
DLN
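Inserting the third path (path utility 20, count 1) along C → B → E:
{C}.nu = 16 + {20 - miu({B})×1 - miu({E})×1} = 16 + 13 = 29 (count 3)
{B}.nu = 17 + {20 - miu({E})×1} = 17 + 17 = 34 (count 2)
{E}.nu = 20 + 20 = 40 (count 2)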
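The DLN updates on these slides can be replayed by inserting the three reduced paths into a local tree. Each node gains the path utility minus miu × count of its descendants on that path. The miu values are those implied by the slides' arithmetic. The local order C, B, E is an assumption, inferred from which miu values the slides subtract at each node. The names `LocalNode`, `insert_with_dln`, and `snapshot` are hypothetical.

```python
# miu values implied by the slides' arithmetic: miu({B}) = 4, miu({E}) = 3.
MIU = {"B": 4, "E": 3}
# Paths of {D}-CPB after DLU in the assumed local order C, B, E:
# (path, path utility, count).
PATHS = [
    (["C"], 3, 1),
    (["C", "B", "E"], 20, 1),
    (["C", "B", "E"], 20, 1),
]


class LocalNode:
    """A local UP-Tree node: item name, count, node utility, children."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.nu = 0
        self.children = {}


def insert_with_dln(root, path, utility, count, miu):
    """Each node gains the path utility minus miu x count of its descendants."""
    node = root
    for k, item in enumerate(path):
        child = node.children.get(item)
        if child is None:
            child = LocalNode(item)
            node.children[item] = child
        descendants = path[k + 1:]
        child.count += count
        child.nu += utility - sum(miu[d] * count for d in descendants)
        node = child


def snapshot(root):
    """(item, count, node utility) along the first branch; here the tree is one branch."""
    out, node = [], root
    while node.children:
        node = next(iter(node.children.values()))
        out.append((node.item, node.count, node.nu))
    return out


root = LocalNode(None)
history = []
for path, utility, count in PATHS:
    insert_with_dln(root, path, utility, count, MIU)
    history.append(snapshot(root))
    print(history[-1])
# [('C', 1, 3)]
# [('C', 2, 16), ('B', 1, 17), ('E', 1, 20)]
# [('C', 3, 29), ('B', 2, 34), ('E', 2, 40)]
```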
Experimental Results
Scalability
Conclusions
• This paper proposed an efficient algorithm, UP-Growth, for mining high utility itemsets.
• A tree structure, the UP-Tree, is proposed for maintaining the information of high utility itemsets.
• With four strategies (DGU, DGN, DLU, and DLN), mining performance is enhanced significantly, since both the search space and the number of candidates are effectively reduced.