Incremental Update o..

1 Incremental Update on Probabilistic Frequent Itemsets in Uncertain Databases Ming-Yen Lin , Cheng-Tai Fu , Sue-Chen Hsueh ICUIMC 2010 2 Outline • • • • • Motivation Problem Definition Method Experimental Result Conclusion 3 Motivation • Most algorithms focus on improving the mining efficiency with the assumption that the database is static. • Some patterns may become obsolete and new ones may emerge due to updates. Re-mining the whole uncertain database from scratch is very time consuming owing to the frequentness probabilities computations. 4 Problem Definition • X:itemsets • P(wj):possibe world • Sup(X,wj):support of itemset X in a possible world wj • The probability that the support of X in the uncertain database equals i, denoted as Pi(X) is 5 (Cont.) • P(w8) = { t1= {a, b}, t2 = {a, c, d}} is = 1.0 × 0.5 × 0.4 × 0.2 × 1.0 = 0.04 • P(w6) = { t1 = {a, b}, t2 = {c, d}} = 1.0 × 0.5 × 1 − 0.4 × 0.2 × 1.0 = 0.06 6 (Cont.) • P2({a}) = P(w3) + P(w4) + P(w7) + P(w8) = 0.16 + 0.04 + 0.16 + 0.04 = 0.4 • P1({c,d}) = P(w2) + P(w4) + P(w6) + P(w8) = 0.06 + 0.04 + 0.06 + 0.04 = 0.2 7 (Cont.) • the probability of itemset X is frequent in the uncertain database, named frequentness probability of X and denoted by Pfreq(X), is 8 (Cont.) • minsup=1.0,minprob=0.5 • Pfreq({a}) = P1({a}) + P2({a}) =0.6 + 0.4 = 1.0 • Pfreq({c, d}) = P1({c ,d}) + P2({c, d}) = 0.2 +0 = 0.2 < minprob 9 Approximation Method • F :cumulative distribution function of the Poisson distribution • 𝜇 𝑋 :the expected support of itemset X • The frequentness probability of itemset X in an uncertain database D, denoted as Pfreq(X), can be well-approximated by a Poisson distribution as: 10 (Cont.) • THEOREM 1. Pfreq(X), if approximated by Equation ,increases monotonically with 𝜇 𝑋 11 P-FUP(Probabilistic-Fast Update pattern) Algorithm • DB: the original database • db: the incremental database(having newly appended transactions is produced after the latest mining, using ms and minprob, in DB) • UD: the union of DB and db 12 (Cont.) • Pass One: Pruning Away the Infrequent 1itemset, Generating C1, and Discovering L’1 • LEMMA 1. A 1-itemset X in L1 is infrequent in 𝑈𝐷 UD if and only if 𝑋. 𝐸𝑥𝑝𝑆𝑢𝑝𝑈𝐷 < 𝜇𝑚 • LEMMA 2. A 1-itemset X not in L1 can become a probabilistic frequent itemset in UD only if 𝑑𝑏 𝑋. 𝐸𝑥𝑝𝑆𝑢𝑝𝑑𝑏 ≥ 𝜇𝑚 13 (Cont.) 14 (Cont.) • Pass k: Pruning Other Infrequent k-itemset, Pruning Candidates, and Discovering Remaining L’k 𝑘 ≥ 2 • LEMMA 3. If a (k-1)-itemset X is infrequent at the (k-1)-th pass,that is, X is in Lk-1 but not in L’k-1, a probabilistic frequent k-itemset in Lk containing the itemset cannot become a probabilistic frequent itemset in the k-th pass. • LEMMA 4. A k-itemset X in Lk is infrequent in UD 𝑈𝐷 if and only if 𝑋. 𝐸𝑥𝑝𝑆𝑢𝑝𝑈𝐷 < 𝜇𝑚 • LEMMA 5. A k-itemset X not in Lk can become a probabilistic frequent itemset in UD only if 𝑑𝑏 𝑋. 𝐸𝑥𝑝𝑆𝑢𝑝𝑑𝑏 ≥ 𝜇𝑚 15 (Cont.) 16 Experimental Result 17 Experimental Result 18 Conclusion • In this paper, we have proposed the p-FUP algorithm to efficiently solve the problem.

Incremental Update o..

Related documents

Products

Support

Incremental Update o..

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib