A Tree-based Approach for Frequent Pattern Mining from Uncertain Data Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk PAKDD 2008 Outline Motivation UF-Growth algorithm Construction of the UF-Tree Mining of Frequent Patterns from the UF-Tree Improvements to UF-Growth algo. Experimental Results Conslusion Motivation Over the past decade, there have been numerous studies on mining frequent patterns from precise data. However, there are situations in which users are uncertain about the presence or absence of some items. suspicion UF-Growth Algorithm The algorithm consists of two operations: The construction of UF-tree The mining of frequent patterns from UF-tree Construction of the UF-Tree minsup = 1 Scan DB a : 2.7 b: 2.625 c: 2.52429 d: 2.20875 e:2.1575 1 1 1 Mining of Frequent Patterns from the UF-Tree {e}-projected DB expSup({a,e}) = (1*0.72*0.9)+(2*0.71875*0.9) =1.94175 expSup({d,e}) = (1*0.72*0.71875)+(2*0.71875*0.72) =1.5525 {a,e} and {d,e} are frequent (Cont.) {d,e}-projected DB {e}-projected DB expSup({d,e}) in {d,e}-projected DB is 0.5175=0.71875*0.72 expSup ({a,d,e})=3*0.5175*0.9=1.39725 {a}, {a,d}, {a,d,e}, {a,e}, {b}, {b,c}, {c}, {d}, {d,e}, and {e} Improvements to UF-Growth Algorithm The UF-tree above may appear to require a large amount of memory Improvement 1. To increase the chance of path sharing, we discretize and round the expected support of each tree node up to k dceimal places (Cont.) 2. The iproved UF-growth does not need to bulid subsequent UF-trees for any non-singleton patterns. {e}-projected DB {a,e},{a,d,e}, {a,d,e},{d,e} {d,e} with their To enumerate enumerateallallitsitssubsets subsetsand{a,e}, their accumulative expected supports 1.94175, andfar. expected supports equal 0.648,equal 0.46575 and1.39725 0.5175 so 1.5525 Experimental Results (Cont.) Conclusion Improvement 1. method may cause false positive.