Incremental Update o..

advertisement
1
Incremental Update on Probabilistic
Frequent Itemsets in Uncertain Databases
Ming-Yen Lin , Cheng-Tai Fu ,
Sue-Chen Hsueh
ICUIMC 2010
2
Outline
•
•
•
•
•
Motivation
Problem Definition
Method
Experimental Result
Conclusion
3
Motivation
• Most algorithms focus on improving the mining
efficiency with the assumption that the database
is static.
• Some patterns may become obsolete and new
ones may emerge due to updates. Re-mining the
whole uncertain database from scratch is very
time consuming owing to the frequentness
probabilities computations.
4
Problem Definition
• X:itemsets
• P(wj):possibe world
• Sup(X,wj):support of itemset X in a possible
world wj
• The probability that the support of X in the
uncertain database equals i, denoted as Pi(X) is
5
(Cont.)
• P(w8) = { t1= {a, b}, t2 = {a, c, d}} is
= 1.0 × 0.5 × 0.4 × 0.2 × 1.0
= 0.04
• P(w6) = { t1 = {a, b}, t2 = {c, d}}
= 1.0 × 0.5 × 1 − 0.4 × 0.2 × 1.0
= 0.06
6
(Cont.)
• P2({a}) = P(w3) + P(w4) + P(w7) + P(w8)
= 0.16 + 0.04 + 0.16 + 0.04 = 0.4
• P1({c,d}) = P(w2) + P(w4) + P(w6) + P(w8)
= 0.06 + 0.04 + 0.06 + 0.04 = 0.2
7
(Cont.)
• the probability of itemset X is frequent in the
uncertain database, named frequentness
probability of X and denoted by Pfreq(X), is
8
(Cont.)
• minsup=1.0,minprob=0.5
• Pfreq({a}) = P1({a}) + P2({a}) =0.6 + 0.4 = 1.0
• Pfreq({c, d}) = P1({c ,d}) + P2({c, d})
= 0.2 +0 = 0.2 < minprob
9
Approximation Method
• F :cumulative distribution function of the Poisson
distribution
• πœ‡ 𝑋 :the expected support of itemset X
• The frequentness probability of itemset X in an
uncertain database D, denoted as Pfreq(X), can be
well-approximated by a Poisson distribution
as:
10
(Cont.)
• THEOREM 1. Pfreq(X), if approximated by
Equation ,increases monotonically with πœ‡ 𝑋
11
P-FUP(Probabilistic-Fast Update
pattern) Algorithm
• DB: the original database
• db: the incremental database(having newly
appended transactions is produced after the
latest mining, using ms and minprob, in DB)
• UD: the union of DB and db
12
(Cont.)
• Pass One: Pruning Away the Infrequent 1itemset, Generating C1, and Discovering L’1
• LEMMA 1. A 1-itemset X in L1 is infrequent in
π‘ˆπ·
UD if and only if 𝑋. 𝐸π‘₯π‘π‘†π‘’π‘π‘ˆπ· < πœ‡π‘š
• LEMMA 2. A 1-itemset X not in L1 can become
a probabilistic frequent itemset in UD only if
𝑑𝑏
𝑋. 𝐸π‘₯𝑝𝑆𝑒𝑝𝑑𝑏 ≥ πœ‡π‘š
13
(Cont.)
14
(Cont.)
• Pass k: Pruning Other Infrequent k-itemset,
Pruning Candidates, and Discovering Remaining
L’k π‘˜ ≥ 2
• LEMMA 3. If a (k-1)-itemset X is infrequent at the
(k-1)-th pass,that is, X is in Lk-1 but not in L’k-1, a
probabilistic frequent k-itemset in Lk containing the
itemset cannot become a probabilistic frequent
itemset in the k-th pass.
• LEMMA 4. A k-itemset X in Lk is infrequent in UD
π‘ˆπ·
if and only if 𝑋. 𝐸π‘₯π‘π‘†π‘’π‘π‘ˆπ· < πœ‡π‘š
• LEMMA 5. A k-itemset X not in Lk can become a
probabilistic frequent itemset in UD only if
𝑑𝑏
𝑋. 𝐸π‘₯𝑝𝑆𝑒𝑝𝑑𝑏 ≥ πœ‡π‘š
15
(Cont.)
16
Experimental Result
17
Experimental Result
18
Conclusion
• In this paper, we have proposed the p-FUP
algorithm to efficiently solve the problem.
Download